Experiments#
Here, we describe how some experiments from [C1] can be reproduced.
Experiments with hyperparameters optimization#
To run experiments with hyperparameter optimization, from the experiments/ directory,
use for instance
python run_hyperopt_classfiers.py --clf_name WildWood --dataset_name adult
with the WildWood classifier and adult dataset. Some options are
--n_estimators or -t: Number of estimators (maximal number of boosting iterations for gradient boosting), default=100.
--hyperopt_evals or -n: Number of hyperopt (hyperparameter optimization) steps, default=50.
Experiments on default parameters#
To run experiments with default parameters, under directory experiments/, use
python run_benchmark_default_params_classifiers.py --clf_name WildWood --dataset_name adult
with the WildWood classifier and adult dataset.
Datasets and classifiers#
Here are the options available for the scripts run_hyperopt_classfiers.py and
run_benchmark_default_params_classifiers.py.
dataset_name can be set as
adult, bank, breastcancer, car, cardio, churn, default-cb, letter, satimage, sensorless, spambase, amazon, covtype, internet, kick, kddcup, higgs
clf_name can be set as
LGBMClassifier, XGBClassifier, CatBoostClassifier, RandomForestClassifier, HistGradientBoostingClassifier, WildWood
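As a rough sketch, the option values listed above can be combined to enumerate one command line per (classifier, dataset) pair. The helper name `build_commands` is hypothetical and not part of the repository; the flags are the ones documented above, and the commands are only constructed here, not executed.

```python
from itertools import product

# Values listed above for --dataset_name and --clf_name.
DATASETS = [
    "adult", "bank", "breastcancer", "car", "cardio", "churn",
    "default-cb", "letter", "satimage", "sensorless", "spambase",
    "amazon", "covtype", "internet", "kick", "kddcup", "higgs",
]
CLASSIFIERS = [
    "LGBMClassifier", "XGBClassifier", "CatBoostClassifier",
    "RandomForestClassifier", "HistGradientBoostingClassifier", "WildWood",
]


def build_commands(script, classifiers=CLASSIFIERS, datasets=DATASETS,
                   n_estimators=None):
    """Return one argv list per (classifier, dataset) pair (hypothetical helper)."""
    commands = []
    for clf_name, dataset_name in product(classifiers, datasets):
        argv = ["python", script,
                "--clf_name", clf_name,
                "--dataset_name", dataset_name]
        # Only pass --n_estimators when a value is explicitly requested.
        if n_estimators is not None:
            argv += ["--n_estimators", str(n_estimators)]
        commands.append(argv)
    return commands


commands = build_commands("run_hyperopt_classfiers.py")
```

Each returned list could then be passed to `subprocess.run(argv, check=True)` from the experiments/ directory, one run at a time.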
Experiments presented in [C1]#
Figure 1 is produced using fig_aggregation_effect.py.
Figure 2 is produced using n_tree_experiment.py.
Tables 1 and 3 from the paper are produced using run_hyperopt_classfiers.py with n_estimators=5000 for gradient boosting algorithms and with n_estimators=n for RFn and WWn. Use
python run_hyperopt_classfiers.py --clf_name <classifier> --dataset_name <dataset> --n_estimators <n_estimators>
for each pair (<classifier>, <dataset>) to run hyperparameter optimization experiments and use for example
import pickle as pkl
filename = 'exp_hyperopt_xxx.pickle'
with open(filename, "rb") as f:
    results = pkl.load(f)
df = results["results"]
to retrieve experiment information, such as AUC, logloss and their standard deviations.
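When several result files have been produced, the loading step can be wrapped in a small helper. The following is a minimal sketch under assumptions: the function name `load_results` and the glob pattern are hypothetical, and only the "results" key shown above is relied on. The entries are presumably pandas dataframes, as the variable name df suggests, in which case pandas must be installed for unpickling to succeed.

```python
import pickle as pkl
from pathlib import Path


def load_results(directory=".", pattern="exp_hyperopt_*.pickle"):
    """Map each matching pickle file name to its "results" entry.

    The default pattern is an assumption about how result files are named.
    """
    loaded = {}
    for path in sorted(Path(directory).glob(pattern)):
        with open(path, "rb") as f:
            results = pkl.load(f)
        loaded[path.name] = results["results"]
    return loaded
```

Only unpickle files that you produced yourself, since loading a pickle can execute arbitrary code.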
Tables 2 and 4 are produced with benchmark_default_params.py, using
python run_benchmark_default_params_classifiers.py --clf_name <classifier> --dataset_name <dataset>
for each pair (<classifier>, <dataset>) to run experiments with default parameters; similar commands retrieve the experiment information.
Figure 3 is produced by taking the experiment results (AUC and fit time) generated by run_hyperopt_classfiers.py, concatenating the resulting dataframes, and running fig_auc_fit_time.py.
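The concatenation step for Figure 3 might look like the sketch below. It assumes the per-run results are pandas dataframes; the helper name `concat_results` and the added `run` column are hypothetical, and the exact input format expected by fig_auc_fit_time.py is not specified here.

```python
import pandas as pd


def concat_results(frames):
    """Stack per-run dataframes into one, tagging rows with their run label.

    `frames` maps a label (for example a result file name) to a dataframe.
    The `run` column is an illustrative addition, not a repository convention.
    """
    labelled = [df.assign(run=label) for label, df in frames.items()]
    return pd.concat(labelled, ignore_index=True)
```

Tagging each row with its origin keeps runs distinguishable after stacking, which is convenient when plotting one point per (classifier, dataset) pair.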
References#
[C1] Stéphane Gaïffas, Ibrahim Merad, and Yiyang Yu. WildWood: a new random forest algorithm. 2021. arXiv:2109.08010.