The Python wrapper around XGBoost implements a scikit-learn interface, and this interface more or less supports the scikit-learn cross validation system. Moreover, XGBoost has its own cross validation system, and the Python wrapper supports it too. In other words, we have two cross validation systems. Both are only partially supported, and the functionalities supported for XGBoost are not the same as for LightGBM. Currently, it's a puzzle.
The example presented covers both cases. The first, step_GradientBoostingCV, calls the XGBoost cross validation. The second, step_GridSearchCV, calls the scikit-learn cross validation.
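For reference, here is what the two mechanisms look like when called directly, outside the Tune class. This is only a sketch: X and y stand for any binary classification dataset and the parameter values are placeholders.
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
# 1) XGBoost's own cross validation: cv() works on a DMatrix and returns a
#    DataFrame with the train/test metric for each boosting round
dtrain = xgb.DMatrix(X, label=y)
cv_results = xgb.cv({'max_depth':10, 'objective':'binary:logistic'},
                    dtrain, num_boost_round=500, nfold=5,
                    early_stopping_rounds=20, verbose_eval=False)
# 2) scikit-learn cross validation: GridSearchCV drives the sklearn wrapper
#    (XGBClassifier) like any other estimator
grid = GridSearchCV(XGBClassifier(max_depth=10),
                    {'min_child_weight':[1, 3, 5]}, cv=5)
grid.fit(X, y)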
The data preparation is the same as for the nbex_xgb_model.ipynb example. We take only two images to speed up the process.
The Tune class manages everything. The step_GradientBoostingCV method calls the XGBoost cv() function. The step_GridSearchCV method calls the scikit-learn GridSearchCV() function.
Note that this is still in development and changes can be significant.
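In summary, the workflow used below looks like this. This is a sketch only; the parameter values are placeholders and the method calls are the same ones used later in this example.
t = ml.Tune(ml.HyperXGBClassifier, {'max_depth':10}, X_train, y_train)
t.step_GradientBoostingCV({'learning_rate':0.2,'n_estimators':500},
                          {'verbose_eval':False}, True)      # XGBoost cv()
t.p_update({'n_estimators':9})            # keep what the cv run suggests
t.step_GridSearchCV({'max_depth':[24,25,26]}, 'Step', True)  # GridSearchCV()
print(t.get_p_current())                  # the tuned hyperparameters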
%matplotlib inline
from __future__ import print_function
import os
import os.path as osp
import numpy as np
import pysptools.ml as ml
import pysptools.skl as skl
from sklearn.model_selection import train_test_split
home_path = '/mnt'
source_path = osp.join(home_path, 'dev-data/CZ_hsdb')
result_path = None
def print_step_header(step_id, title):
print('================================================================')
print('{}: {}'.format(step_id, title))
print('================================================================')
print()
# img1
img1_scaled, img1_cmap = ml.get_scaled_img_and_class_map(source_path, result_path, 'img1',
[['Snow',{'rec':(41,79,49,100)}]],
skl.HyperGaussianNB, None,
display=False)
# img2
img2_scaled, img2_cmap = ml.get_scaled_img_and_class_map(source_path, result_path, 'img2',
[['Snow',{'rec':(83,50,100,79)},{'rec':(107,151,111,164)}]],
skl.HyperLogisticRegression, {'class_weight':{0:1.0,1:5}},
display=False)
def step_GradientBoostingCV(tune, update, cv_params, verbose):
print_step_header('Step', 'GradientBoosting cross validation')
tune.print_params('input')
tune.step_GradientBoostingCV(update, cv_params, verbose)
def step_GridSearchCV(tune, params, title, verbose):
print_step_header('Step', 'scikit-learn cross-validation')
tune.print_params('input')
tune.step_GridSearchCV(params, title, verbose)
tune.print_params('output')
The X_train and y_train sets are built.
The Tune class is created with the HyperXGBClassifier estimator. Once created, it is ready for cross validation: we can call the Tune methods repeatedly with different cv hypotheses.
verbose = False
n_shrink = 3
snow_fname = ['img1','img2']
nosnow_fname = ['imga1','imgb1','imgb6','imga7']
all_fname = snow_fname + nosnow_fname
snow_img = [img1_scaled,img2_scaled]
nosnow_img = ml.batch_load(source_path, nosnow_fname, n_shrink)
snow_cmap = [img1_cmap,img2_cmap]
M = snow_img[0]
bkg_cmap = np.zeros((M.shape[0],M.shape[1]))
X,y = skl.shape_to_XY(snow_img+nosnow_img,
snow_cmap+[bkg_cmap,bkg_cmap,bkg_cmap,bkg_cmap])
seed = 5
train_size = 0.25
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=train_size,
random_state=seed)
start_param = {'max_depth':10,
'min_child_weight':1,
'gamma':0,
'subsample':0.8,
'colsample_bytree':0.5,
'scale_pos_weight':1.5}
# Tune can be called with HyperXGBClassifier or HyperLGBMClassifier,
# but the hyperparameters and cv parameters are different
t = ml.Tune(ml.HyperXGBClassifier, start_param, X_train, y_train)
We set a hypothesis and call the gradient boosting cross validation.
# Step 1: Fix learning rate and number of estimators for tuning tree-based parameters
step_GradientBoostingCV(t, {'learning_rate':0.2,'n_estimators':500,'silent':1},
{'verbose_eval':False},
True)
# After reading the cross validation results we manually set n_estimators
t.p_update({'n_estimators':9})
t.print_params('output')
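The value 9 used above comes from reading the printed cross validation table. As an illustration only, here is how a similar value could be extracted programmatically with the raw xgboost cv() API rather than through Tune; the parameter dictionary and the binary:logistic objective are assumptions made for this sketch.
import xgboost as xgb
# booster parameters mirroring the current hypothesis (assumed for this sketch)
params = {'max_depth':10, 'min_child_weight':1, 'gamma':0, 'subsample':0.8,
          'colsample_bytree':0.5, 'scale_pos_weight':1.5,
          'learning_rate':0.2, 'objective':'binary:logistic'}
dtrain = xgb.DMatrix(X_train, label=y_train)
cv_results = xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
                    metrics='logloss', early_stopping_rounds=20,
                    verbose_eval=False)
# one row per boosting round; the best round suggests n_estimators
best_n = cv_results['test-logloss-mean'].idxmin() + 1
print('suggested n_estimators:', best_n)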
The same, but this time we call the scikit-learn cross validation.
# Step 2: Tune max_depth and min_child_weight
step_GridSearchCV(t, {'max_depth':[24,25,26], 'min_child_weight':[1]}, 'Step 2', True)
Finally, the result
print(t.get_p_current())