jupyter_MLExperiment

Summary

Machine learning experiment

What does a machine learning experiment usually do?

  • search for the best hyper-parameters of a model
  • validate/compare performance and generalization between models (overfitting/underfitting)

How to conduct a machine learning experiment:

  • make comparisons using cross-validation (CV).

How to make a fair comparison among models?

  • use the same data splits to build the validation sets

What toolkit is available?

  • The scikit-learn library provides two main functionalities for machine learning experiments:
    Hyper-parameter optimizers: wrap an estimator, a data splitter, and a parameter grid; used to search for the best set of hyper-parameters of a model
    Model validators: wrap an estimator and a data splitter; used to validate the models

Data streaming pipeline

A pipeline wraps multiple steps, such as data transformations (scaler, dimensionality reduction) and a final estimator, into a single class. It ensures that CV applies the same data transformations and the same estimator to every fold, which keeps the comparison fair. Reference: Pipelines: chaining estimators

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target
In [2]:
from sklearn.decomposition import PCA
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# dimensionality reduction
pca = PCA()
# learning algorithm: logistic regression trained with SGD
# (note: loss='log' was renamed to 'log_loss' in newer scikit-learn versions)
logistic = SGDClassifier(loss='log', penalty='l2', early_stopping=True,
                         max_iter=10000, tol=1e-5, random_state=0)
# pipeline
pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])

print(pipe)
Pipeline(steps=[('pca', PCA()),
                ('logistic',
                 SGDClassifier(early_stopping=True, loss='log', max_iter=10000,
                               random_state=0, tol=1e-05))])

ML Experiment

Hyper-parameter optimizers: select hyper-parameters

optimizer().fit():

  • X,y: data
  • param_grid: parameter grid
  • estimator: class with fit function
  • scoring: evaluation metric
  • cv: Splitter (default 5-fold cross validation)
Hyper-parameter optimizer Description
model_selection.GridSearchCV() Exhaustive search over specified parameter values for an estimator.
model_selection.HalvingGridSearchCV() Search over specified parameter values with successive halving.
model_selection.ParameterGrid() Grid of parameters with a discrete number of values for each.
model_selection.ParameterSampler() Generator on parameters sampled from given distributions.
model_selection.RandomizedSearchCV() Randomized search on hyper-parameters.
model_selection.HalvingRandomSearchCV() Randomized search on hyper-parameters with successive halving.
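
Before running the search, it can help to see how a parameter grid expands into individual candidate settings. A minimal sketch (the two-value grid here is illustrative only):

from sklearn.model_selection import ParameterGrid

# every dict produced below is one candidate combination of hyper-parameters
small_grid = {'pca__n_components': [5, 20], 'logistic__alpha': [1e-4, 1e-2]}
for candidate in ParameterGrid(small_grid):
    print(candidate)   # 2 x 2 = 4 candidate settings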
In [3]:
%time
# Note: the line magic %time above only times an empty statement; use the cell magic %%time to time the whole cell.
# Parameters of pipelines can be set using '__'-separated parameter names:
param_grid = {
    'pca__n_components': [5, 20, 30, 40, 50, 64],
    'logistic__alpha': np.logspace(-4, 4, 5),
}
cv = 5
search = GridSearchCV(pipe, param_grid, cv=cv, scoring='recall_macro', n_jobs=cv)
search.fit(X_digits, y_digits)

print("Best parameter (CV score=%0.3f):" % search.best_score_)
print(search.best_params_)
CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 5.01 µs
Best parameter (CV score=0.919):
{'logistic__alpha': 0.01, 'pca__n_components': 50}
In [4]:
d = {'mean_test_score': search.cv_results_.get('mean_test_score'),
     'std_test_score': search.cv_results_.get('std_test_score'),
     'rank_test_score': search.cv_results_.get('rank_test_score')}
pd.DataFrame(data=search.cv_results_.get('params')).join(pd.DataFrame(data=d))
Out[4]:
logistic__alpha pca__n_components mean_test_score std_test_score rank_test_score
0 0.0001 5 0.716044 0.052309 23
1 0.0001 20 0.887411 0.035888 15
2 0.0001 30 0.903600 0.044263 9
3 0.0001 40 0.895183 0.039510 11
4 0.0001 50 0.891895 0.035644 14
5 0.0001 64 0.894120 0.043411 12
6 0.0100 5 0.768088 0.031506 16
7 0.0100 20 0.903073 0.033513 10
8 0.0100 30 0.908520 0.026920 4
9 0.0100 40 0.913647 0.027858 3
10 0.0100 50 0.919154 0.023918 1
11 0.0100 64 0.918032 0.025162 2
12 1.0000 5 0.753747 0.037280 22
13 1.0000 20 0.892815 0.035741 13
14 1.0000 30 0.903896 0.037860 8
15 1.0000 40 0.906120 0.035376 5
16 1.0000 50 0.906104 0.036751 6
17 1.0000 64 0.906104 0.036751 6
18 100.0000 5 0.627318 0.109118 24
19 100.0000 20 0.764470 0.080391 21
20 100.0000 30 0.766754 0.080410 17
21 100.0000 40 0.766199 0.080017 18
22 100.0000 50 0.765643 0.080739 19
23 100.0000 64 0.765643 0.080739 19
24 10000.0000 5 0.355885 0.314470 30
25 10000.0000 20 0.390485 0.356177 29
26 10000.0000 30 0.391597 0.357540 28
27 10000.0000 40 0.392168 0.358197 25
28 10000.0000 50 0.392168 0.358197 25
29 10000.0000 64 0.392168 0.358197 25
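
GridSearchCV evaluated every one of the 30 combinations above. For larger grids, RandomizedSearchCV from the optimizer table samples a fixed number of candidates instead; a minimal sketch reusing pipe (the distribution and n_iter are illustrative, not tuned):

from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    'pca__n_components': [5, 20, 30, 40, 50, 64],
    'logistic__alpha': loguniform(1e-4, 1e2),   # sample alpha on a log scale
}
# evaluate only 10 sampled candidates instead of the full grid
rand_search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=5,
                                 scoring='recall_macro', random_state=0)
rand_search.fit(X_digits, y_digits)
print(rand_search.best_params_, rand_search.best_score_)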

Model validator: check model

validator():

  • X,y: data
  • estimator: class with fit function
  • scoring: evaluation metric
  • cv: Splitter (default 5-fold cross validation)
Model validator Validates
sklearn.model_selection.cross_val_score() a single metric across CV folds
sklearn.model_selection.cross_validate() one or more metrics across CV folds (optionally returns fit times and fitted estimators)
sklearn.model_selection.validation_curve() scores over a range of values of one hyper-parameter
sklearn.model_selection.learning_curve() scores over increasing training-set sizes
In [5]:
%time
from sklearn.model_selection import cross_validate
from sklearn.metrics import recall_score
scoring = ['precision_macro', 'recall_macro']
cv=5
scores = cross_validate(pipe, X_digits, y_digits, scoring=scoring, cv=cv, return_estimator=True, n_jobs=cv)
CPU times: user 13 µs, sys: 34 µs, total: 47 µs
Wall time: 7.15 µs
In [6]:
for x in scoring:
    print(x, ":",
          np.mean(scores.get('test_' + x)).round(4),
          '+/-', np.std(scores.get('test_' + x)).round(4))
precision_macro : 0.901 +/- 0.039
recall_macro : 0.8941 +/- 0.0434
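
cross_validate covers the first two rows of the validator table; the curve-style validators are not demonstrated in this notebook, so here is a minimal sketch of validation_curve with the same pipe and data (the parameter range is illustrative):

from sklearn.model_selection import validation_curve

# score the pipeline while varying a single hyper-parameter, everything else fixed
param_range = [5, 20, 30, 40, 50, 64]
train_scores, valid_scores = validation_curve(
    pipe, X_digits, y_digits,
    param_name='pca__n_components', param_range=param_range,
    cv=5, scoring='recall_macro')
print(valid_scores.mean(axis=1).round(4))   # mean CV score per n_components value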

Cross-validation API

Splitter Classes

Splitting strategies for non-time-series data (a short usage sketch follows the tables below):

  • train_test_split: split the dataset once into a training set and a test set
  • KFold: divide the dataset into a prespecified number of folds
  • ShuffleSplit: randomly draw train/test splits from the entire dataset, with a prespecified test_size and train_size

Two categories of sampling:

  • Stratified sampling: elements are sampled within each stratum (class); aims to increase precision and reduce error
  • Cluster (group) sampling: only selected clusters (groups) are sampled; aims to reduce cost and increase sampling efficiency

One splitter for time-series data:

  • TimeSeriesSplit: folds respect temporal order, so training indices always precede test indices

Less-frequently-used Splitter Classes Description
model_selection.LeaveOneGroupOut() Leave One Group Out cross-validator
model_selection.LeavePGroupsOut() Leave P Group(s) Out cross-validator
model_selection.LeaveOneOut() Leave-One-Out cross-validator
model_selection.LeavePOut() Leave-P-Out cross-validator
model_selection.PredefinedSplit() Predefined split cross-validator
model_selection.RepeatedKFold() Repeated K-Fold cross validator.
model_selection.RepeatedStratifiedKFold() Repeated Stratified K-Fold cross validator.
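
As mentioned above, any of these splitter objects can be passed as the cv argument of the optimizers and validators. A minimal sketch with three common splitters (TimeSeriesSplit is only meaningful when rows are time-ordered, which the digits data is not; it is shown purely for the API):

from sklearn.model_selection import KFold, StratifiedKFold, TimeSeriesSplit, cross_val_score

splitters = [KFold(n_splits=5, shuffle=True, random_state=0),
             StratifiedKFold(n_splits=5),    # preserves class proportions in each fold
             TimeSeriesSplit(n_splits=5)]    # training indices always precede test indices
for splitter in splitters:
    scores = cross_val_score(pipe, X_digits, y_digits, cv=splitter, scoring='recall_macro')
    print(type(splitter).__name__, scores.mean().round(4))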

Model specific cross-validation API

method description
linear_model.ElasticNetCV([l1_ratio, eps, …]) Elastic Net model with iterative fitting along a regularization path
linear_model.LarsCV([fit_intercept, …]) Cross-validated Least Angle Regression model
linear_model.LassoCV([eps, n_alphas, …]) Lasso linear model with iterative fitting along a regularization path
linear_model.LassoLarsCV([fit_intercept, …]) Cross-validated Lasso, using the LARS algorithm
linear_model.LogisticRegressionCV([Cs, …]) Logistic Regression CV (aka logit, MaxEnt) classifier.
linear_model.MultiTaskElasticNetCV([…]) Multi-task L1/L2 ElasticNet with built-in cross-validation.
linear_model.MultiTaskLassoCV([eps, …]) Multi-task L1/L2 Lasso with built-in cross-validation.
linear_model.OrthogonalMatchingPursuitCV([…]) Cross-validated Orthogonal Matching Pursuit model (OMP)
linear_model.RidgeCV([alphas, …]) Ridge regression with built-in cross-validation.
linear_model.RidgeClassifierCV([alphas, …]) Ridge classifier with built-in cross-validation.
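
These estimators cross-validate their own regularization parameter while fitting, which is usually faster than wrapping the plain estimator in GridSearchCV. A minimal sketch with RidgeCV on the diabetes regression data (chosen here only as an example):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV

X_reg, y_reg = load_diabetes(return_X_y=True)
# the candidate alphas are cross-validated internally during fit()
ridge = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X_reg, y_reg)
print(ridge.alpha_)   # the selected regularization strength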

Score API

Classification score

Scoring Function Comment
'accuracy' metrics.accuracy_score
'balanced_accuracy' metrics.balanced_accuracy_score
'top_k_accuracy' metrics.top_k_accuracy_score
'average_precision' metrics.average_precision_score
'neg_brier_score' metrics.brier_score_loss
'f1' metrics.f1_score for binary targets
'f1_micro' metrics.f1_score micro-averaged
'f1_macro' metrics.f1_score macro-averaged
'f1_weighted' metrics.f1_score weighted average
'f1_samples' metrics.f1_score by multilabel sample
'neg_log_loss' metrics.log_loss requires predict_proba support
'precision' etc. metrics.precision_score suffixes apply as with 'f1'
'recall' etc. metrics.recall_score suffixes apply as with 'f1'
'jaccard' etc. metrics.jaccard_score suffixes apply as with 'f1'
'roc_auc' metrics.roc_auc_score
'roc_auc_ovr' metrics.roc_auc_score
'roc_auc_ovo' metrics.roc_auc_score
'roc_auc_ovr_weighted' metrics.roc_auc_score
'roc_auc_ovo_weighted' metrics.roc_auc_score

Regression score

Scoring Function
'explained_variance' metrics.explained_variance_score
'max_error' metrics.max_error
'neg_mean_absolute_error' metrics.mean_absolute_error
'neg_mean_squared_error' metrics.mean_squared_error
'neg_root_mean_squared_error' metrics.mean_squared_error
'neg_mean_squared_log_error' metrics.mean_squared_log_error
'neg_median_absolute_error' metrics.median_absolute_error
'r2' metrics.r2_score
'neg_mean_poisson_deviance' metrics.mean_poisson_deviance
'neg_mean_gamma_deviance' metrics.mean_gamma_deviance
'neg_mean_absolute_percentage_error' metrics.mean_absolute_percentage_error

Clustering score

Scoring Function
'adjusted_mutual_info_score' metrics.adjusted_mutual_info_score
'adjusted_rand_score' metrics.adjusted_rand_score
'completeness_score' metrics.completeness_score
'fowlkes_mallows_score' metrics.fowlkes_mallows_score
'homogeneity_score' metrics.homogeneity_score
'mutual_info_score' metrics.mutual_info_score
'normalized_mutual_info_score' metrics.normalized_mutual_info_score
'rand_score' metrics.rand_score
'v_measure_score' metrics.v_measure_score
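
Each string above can be passed directly as the scoring argument of the optimizers and validators, and custom metrics can be turned into scorers with make_scorer. A minimal sketch reusing pipe and the digits data:

from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import cross_val_score

# a built-in scoring string from the classification table
print(cross_val_score(pipe, X_digits, y_digits, cv=5, scoring='balanced_accuracy').mean())

# the same mechanism with a custom scorer built from a metric function
weighted_f1 = make_scorer(f1_score, average='weighted')
print(cross_val_score(pipe, X_digits, y_digits, cv=5, scoring=weighted_f1).mean())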

In [ ]: