The wildwood.preprocessing module contains the Encoder class, which transforms
an input pandas.DataFrame or numpy.ndarray into a wildwood.FeaturesBitArray object.
A class that transforms an input pandas.DataFrame or numpy.ndarray into
a wildwood.FeaturesBitArray object, corresponding to a column-wise binning of the
original columns.
Categorical columns are simply ordinal-encoded using contiguous non-negative
integers, while continuous columns are binned using inter-quantile intervals,
so that each bin contains approximately the same number of samples.
Both mappings from categories to integers (for categorical columns) and from
inter-quantile intervals to integers (for continuous columns) are computed using
the .fit() method.
The .transform() method bins the features and creates the features’ bitarray.
By default, it raises an error whenever it encounters an unknown category,
but this can be changed using the handle_unknown option.
When the input is a pandas.DataFrame, we support the encoding of missing
values both for categorical and numerical columns. However, when the input is a
numpy.ndarray, missing values are supported only for a numerical data type.
Other situations might raise unexpected errors.
If a column contains missing values, the last bin (last integer) is used to
encode them.
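The binning rule described above can be illustrated with a minimal NumPy sketch for a single numerical column. It is not WildWood's Encoder code: the helper name bin_numerical_column is hypothetical, thresholds are assumed to be the interior quantiles at levels 1/max_bins, ..., (max_bins-1)/max_bins (or midpoints between consecutive values when the column has at most max_bins distinct values), a value equal to a threshold is assumed to go to the lower bin, and the subsample option is ignored. Missing values (NaN) go to the extra bin at index max_bins.

```python
import numpy as np


def bin_numerical_column(x, max_bins):
    """Bin a 1D float array into at most max_bins inter-quantile bins.

    NaN entries are sent to the extra "missing" bin at index max_bins.
    Returns (bins, thresholds).
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    observed = x[~missing]
    distinct = np.unique(observed)
    if distinct.size <= max_bins:
        # Few distinct values: one bin per value, thresholds at midpoints
        thresholds = (distinct[:-1] + distinct[1:]) / 2
    else:
        # Interior quantiles, deduplicated, so bins hold similar numbers of samples
        percentiles = np.linspace(0, 100, max_bins + 1)[1:-1]
        thresholds = np.unique(np.percentile(observed, percentiles))
    bins = np.full(x.shape, max_bins, dtype=np.int64)  # default: missing bin
    # side="left": a value equal to thresholds[k] lands in bin k
    bins[~missing] = np.searchsorted(thresholds, observed, side="left")
    return bins, thresholds


x = np.array([0.5, np.nan, 3.0, 1.5, 2.5, np.nan, 1.0])
bins, thresholds = bin_numerical_column(x, max_bins=4)
print(thresholds.tolist())  # [1.0, 1.5, 2.5]
print(bins.tolist())        # [0, 4, 3, 1, 2, 4, 0]: both NaNs end up in bin 4
```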
Parameters:
max_bins (int) – The maximum number of bins for numerical columns, not including the bin used
for missing values, if any. Should be at least 3.
We use at most max_bins bins when the column has no missing values, and
at most max_bins+1 bins if it does.
The last bin (at index max_bins) is used to encode missing values.
If a column has fewer than max_bins distinct inter-quantile intervals or
categories, fewer than max_bins bins are used for it.
is_categorical (None or numpy.ndarray) – If not None, it is a numpy.ndarray of shape (n_features,) with boolean
dtype, which corresponds to a categorical indicator for each column as
specified by the user.
handle_unknown ({"error", "consider_missing"}, default="error") – If set to “error”, an error will be raised at transform whenever a category was
not seen during fit. If set to “consider_missing”, we will consider it as a
missing value (it will end up in the same bin as missing values).
cat_min_categories (int or {"log", "sqrt"}, default="log") – When a column is numerical and is_categorical is None, WildWood decides
that it is categorical whenever its number of unique values is smaller than or
equal to cat_min_categories. Otherwise, it is considered numerical.
If an int larger than 3 is given, we use it as cat_min_categories.
If “log”, we set cat_min_categories=max(2,floor(log(n_samples))).
If “sqrt”, we set cat_min_categories=max(2,floor(sqrt(n_samples))).
Default is “log”.
subsample (int or None, default=200000) – If n_samples>subsample, then subsample samples are chosen at random
to compute the quantiles. If None, the whole dataset is used.
random_state (int, default=None) – Allows to seed the random number generator used to generate a subsample for
quantiles computation.
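The cat_min_categories rule, which only applies when is_categorical is None, can be written as a small helper. This is a sketch under assumptions: "log" is taken to be the natural logarithm, integers not larger than 3 are assumed to be rejected, and the helper names resolve_cat_min_categories and looks_categorical are hypothetical, not part of wildwood.

```python
import math


def resolve_cat_min_categories(cat_min_categories, n_samples):
    """Turn the cat_min_categories option into an integer threshold."""
    if cat_min_categories == "log":
        # Assumption: natural logarithm
        return max(2, math.floor(math.log(n_samples)))
    if cat_min_categories == "sqrt":
        return max(2, math.floor(math.sqrt(n_samples)))
    if isinstance(cat_min_categories, int) and cat_min_categories > 3:
        return cat_min_categories
    # Assumption: any other value is treated as invalid
    raise ValueError(f"invalid cat_min_categories: {cat_min_categories!r}")


def looks_categorical(column, cat_min_categories="log"):
    """A numerical column is deemed categorical if it has few unique values."""
    threshold = resolve_cat_min_categories(cat_min_categories, len(column))
    return len(set(column)) <= threshold


print(resolve_cat_min_categories("log", 1000))   # 6
print(resolve_cat_min_categories("sqrt", 1000))  # 31
print(looks_categorical([0, 1, 2] * 300))        # True: 3 unique values <= 6
print(looks_categorical(list(range(900))))       # False
```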
A dictionary that maps the index of a categorical column to an array
containing its raw categories. For instance, categories_[2][4] is the raw
category corresponding to the bin index 4 for column index 2.
A dictionary that maps the index of a continuous column to an array
containing its binning thresholds. It is usually of length max_bins-1,
unless the column has fewer unique values than that.
A numpy array of shape (n_features,) with boolean dtype, which indicates if
each feature is considered as categorical by WildWood or not. This might
differ from the is_categorical given by the user. See the _checks.py
module for details.
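Ordinal encoding of a categorical column, and the way a categories_-style dictionary maps bin indices back to raw categories, can be sketched with numpy.unique. Assumptions: categories are numbered in sorted order and missing values are not handled; WildWood's actual numbering may differ, and the helper name ordinal_encode is hypothetical.

```python
import numpy as np


def ordinal_encode(column):
    """Map raw categories to contiguous integers 0, 1, ..., n_categories - 1."""
    # np.unique returns the sorted distinct values and, with return_inverse=True,
    # the position of each entry of `column` in that sorted array.
    categories, codes = np.unique(column, return_inverse=True)
    return codes.reshape(-1), categories


column = np.array(["red", "blue", "green", "blue"])
codes, categories = ordinal_encode(column)
categories_ = {2: categories}  # pretend this is column index 2

print(codes.tolist())     # [2, 0, 1, 0]
print(categories_[2][1])  # green: the raw category behind bin index 1 of column 2
```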
This computes, for each column of X, a mapping from raw values to bins:
categories to integers for categorical columns and inter-quantile intervals
to integers for continuous columns. It also determines whether each column is
categorical or continuous, using is_categorical when it is provided by the user
and guessing otherwise.
Parameters:
X (array-like of shape (n_samples, n_features)) – The data to fit and transform. It can be either a pandas.DataFrame
or a 2D numpy.ndarray.
Request metadata passed to the inverse_transform method.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
columns (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for columns parameter in inverse_transform.
features_bitarray (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for features_bitarray parameter in inverse_transform.
index (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for index parameter in inverse_transform.
return_dataframe (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_dataframe parameter in inverse_transform.
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
Bins the columns in X. Both continuous and categorical columns are mapped
to a contiguous range of non-negative integers. The resulting binned data is
stored in a memory-efficient FeaturesBitArray object, which internally uses a
bitarray.
Parameters:
X (array-like of shape (n_samples, n_features)) – The data to transform using binning. It can be either a pandas dataframe or
a 2d numpy array.
y (None) – This is ignored.
Returns:
output – A WildWood FeaturesBitArray object corresponding to the binned data.
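The memory saving of a bitarray comes from storing each bin index with only as many bits as the number of bins requires, instead of a full byte or more. The pure-Python sketch below packs one column of bin indices this way; it only illustrates the idea and does not reproduce the internal layout of FeaturesBitArray, which is not described here. The helper names bits_needed, pack and unpack are hypothetical.

```python
def bits_needed(n_bins):
    """Number of bits required to store any bin index in range(n_bins)."""
    return max(1, (n_bins - 1).bit_length())


def pack(values, n_bits):
    """Pack non-negative integers < 2**n_bits into a compact little-endian byte string."""
    packed = 0
    for i, value in enumerate(values):
        if value >> n_bits:
            raise ValueError(f"{value} does not fit in {n_bits} bits")
        packed |= value << (i * n_bits)
    n_bytes = (len(values) * n_bits + 7) // 8
    return packed.to_bytes(n_bytes, "little")


def unpack(buffer, n_values, n_bits):
    """Recover the integers stored by pack()."""
    packed = int.from_bytes(buffer, "little")
    mask = (1 << n_bits) - 1
    return [(packed >> (i * n_bits)) & mask for i in range(n_values)]


# A column binned with max_bins=4 plus one missing-value bin: 5 bins need 3 bits each
bins = [0, 4, 3, 1, 2, 4, 0]
n_bits = bits_needed(5)
buffer = pack(bins, n_bits)
print(n_bits, len(buffer))  # 3 3: 21 bits fit in 3 bytes instead of 7 one-byte values
assert unpack(buffer, len(bins), n_bits) == bins
```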
It grows n_estimators trees in parallel using bootstrap samples and aggregates
their predictions (bagging). Each tree uses “in-the-bag” samples to grow itself
and “out-of-bag” samples to compute aggregation weights for all possible subtrees of
the whole tree.
The prediction function of each tree in WildWood is very different from that
of a standard decision tree whenever aggregation=True (default). Indeed, the
predictions of a tree are computed here as an aggregation with exponential
weights of all the predictions given by all possible subtrees (prunings) of the
full tree. The required computations are performed efficiently thanks to a
variant of the context tree weighting algorithm.
Also, both continuous and categorical features are binned into at most
max_bins bins, which allows an efficient histogram-based split search.
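The aggregation over all prunings can be made concrete on a toy tree. The sketch below uses hypothetical node predictions and hypothetical cumulative out-of-bag losses, together with the classical context tree weighting prior in which every internal node is either kept as a leaf or split, with probability 1/2 each. It computes the aggregated prediction for one sample in two ways: by brute-force enumeration of all prunings, and by a recursion along the path from the root to the sample's leaf, which is what makes the computation efficient. The argument step plays the role of the step option. This illustrates the technique only; it is not WildWood's implementation, which may for instance work in log-space.

```python
import itertools
import math

# Hypothetical toy tree: node -> (left child, right child), or None for a leaf.
CHILDREN = {0: (1, 2), 1: (3, 4), 2: None, 3: None, 4: None}
# Hypothetical node predictions (e.g. probability of class 1) and
# cumulative out-of-bag losses of each node's own prediction.
PRED = {0: 0.5, 1: 0.3, 2: 0.9, 3: 0.1, 4: 0.6}
LOSS = {0: 4.0, 1: 2.5, 2: 1.0, 3: 0.8, 4: 1.2}


def weight(node, step):
    """Sum over all prunings rooted at `node` of prior * exp(-step * loss)."""
    own = math.exp(-step * LOSS[node])
    if CHILDREN[node] is None:
        return own
    left, right = CHILDREN[node]
    # Stop here with probability 1/2, or split and combine both subtrees
    return 0.5 * own + 0.5 * weight(left, step) * weight(right, step)


def aggregated_prediction(path, step):
    """Efficient recursion along `path`, the node ids from the root to x's leaf."""
    node = path[0]
    if len(path) == 1:
        return PRED[node]
    child = path[1]
    left, right = CHILDREN[node]
    sibling = right if child == left else left
    stop = 0.5 * math.exp(-step * LOSS[node])
    split = 0.5 * weight(sibling, step) * weight(child, step)
    return (stop * PRED[node] + split * aggregated_prediction(path[1:], step)) / (stop + split)


def prunings(node):
    """Yield (prior, leaves) for every pruning of the subtree rooted at `node`."""
    if CHILDREN[node] is None:
        yield 1.0, [node]
        return
    yield 0.5, [node]
    left, right = CHILDREN[node]
    for (p_left, l_left), (p_right, l_right) in itertools.product(
        list(prunings(left)), list(prunings(right))
    ):
        yield 0.5 * p_left * p_right, l_left + l_right


def brute_force_prediction(path, step):
    """Exponentially weighted average of the predictions of all prunings."""
    numerator = denominator = 0.0
    for prior, leaves in prunings(path[0]):
        w = prior * math.exp(-step * sum(LOSS[leaf] for leaf in leaves))
        # Exactly one leaf of each pruning lies on the path of x
        leaf_on_path = next(leaf for leaf in leaves if leaf in path)
        numerator += w * PRED[leaf_on_path]
        denominator += w
    return numerator / denominator


for path in ([0, 1, 3], [0, 1, 4], [0, 2]):
    print(path, aggregated_prediction(path, 1.0), brute_force_prediction(path, 1.0))
```

Both functions print the same values; only the recursive one scales to large trees, since it visits the nodes along a single path (plus cached subtree weights in a real implementation).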
Parameters:
n_estimators (int, default=10) – The number of trees in the forest.
criterion ({"gini", "entropy"}, default="gini") – The impurity criterion used to measure the quality of a split. The supported
impurity criteria are “gini” for the Gini impurity and “entropy” for the
entropy impurity.
loss ({"log"}, default="log") – The loss used for the computation of the aggregation weights. Only “log”
is supported for now, namely the log-loss for classification.
step (float, default=1.0) – Step-size for the aggregation weights. Default is 1.0 for classification with
the log-loss, which is the best theoretical choice. A larger value will lead to
larger aggregation weights for subtrees with better out-of-bag (validation)
loss.
aggregation (bool, default=True) – Controls if aggregation is used in the trees. It is highly recommended to
leave it as True.
dirichlet (float, default=0.5) – Regularization level of the class frequencies used for predictions in each
node. A good default is dirichlet=0.5 for binary classification.
max_depth (int, default=None) – The maximum depth of a tree. If None, then nodes from the tree are split until
they are “pure” (impurity is zero) or until they contain
min_samples_split samples.
min_samples_split (int, default=2) – The minimum number of training samples and out-of-bag samples required to
split a node. This must be >= 2.
min_samples_leaf (int, default=1) – A split point is considered only if it leaves at least min_samples_leaf
training samples and out-of-bag samples in each of the left and right children.
This must be >= 1.
max_bins (int, default=256) – The maximum number of bins for numerical columns, not including the bin used
for missing values, if any. Should be at least 4. Before training, each column
of the input array X is binned into integer-valued bins,
corresponding to inter-quantile intervals, enabling faster split finding.
We use at most max_bins bins when the column has no missing values, and
at most max_bins+1 bins if it does.
The last bin (at index max_bins) is used to encode missing values.
If a column has fewer than max_bins distinct inter-quantile intervals or
categories, fewer than max_bins bins are used for it.
categorical_features (array-like, default=None) – Array-like containing boolean or integer values of shape (n_features,) or
(n_categorical_features,) indicating the categorical features.
If None : no feature will be considered categorical.
If boolean array-like : boolean mask indicating categorical features.
If integer array-like : integer indices indicating categorical features.
max_features ({"auto", "sqrt", "log2"}, int or None, default="auto") – The number of features to consider when looking for the best split.
If int, consider max_features features at each split.
If “auto”, max_features=sqrt(n_features).
If “sqrt”, max_features=sqrt(n_features) (same as “auto”).
If “log2”, max_features=log2(n_features).
If None, max_features=n_features.
handle_unknown ({"error", "consider_missing"}, default="error") – If set to “error”, an error will be raised while encoding the data whenever a
category in a categorical column was not seen during fit. If set to
“consider_missing”, we will consider it as a missing value (it will end up in
the same bin as missing values).
cat_min_categories (int or {"log", "sqrt"}, default="log") – When a column contains numerical values and its type is not specified through
categorical_features, WildWood decides that it is categorical whenever its
number of unique values is smaller than or equal to cat_min_categories.
Otherwise, it is considered numerical.
If an int larger than 3 is given, we use it as cat_min_categories.
If “log”, we set cat_min_categories=max(2,floor(log(n_samples))).
If “sqrt”, we set cat_min_categories=max(2,floor(sqrt(n_samples))).
Default is “log”.
subsample (int or None, default=200000) – If n_samples>subsample, then subsample samples are chosen at random
to compute the quantiles used to bin numerical columns. If None, the whole
dataset is used.
n_jobs (int, default=1) – The number of jobs to run in parallel for fit(), predict(),
predict_proba() and apply(). All these methods are parallelized
over the trees in the forest. n_jobs=-1 means using all processors.
random_state (int, RandomState instance or None, default=None) – Controls both the randomness involved in bootstrapping the samples and
sampling the features when looking for the best splits
(if max_features<n_features). See Bootstrap and feature subsampling for details.
verbose (bool, default=False) – Controls the verbosity when fitting and predicting.
class_weight ("balanced" or None, default=None) – Weights associated with classes. If None, all classes are supposed to have
weight one. The “balanced” mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as n_samples/(n_classes*np.bincount(y)). These weights will be
multiplied with sample_weight when passed through the fit()
method.
multiclass ({"multinomial", "ovr"}, default="multinomial") – Used only for n_classes_-class classification with n_classes_>2 and
data with categorical features.
If “multinomial”, n_estimators trees will be trained to make
multiclass predictions. See also cat_split_strategy in this case.
If “ovr” we use a one-versus-all strategy, where labels are binarized and
n_classes_*n_estimators trees are trained to make binary predictions and
the final predictions are obtained as normalized scores. Use
multiclass="ovr" together with categorical_features for the best results
in multiclass problems with categorical features.
cat_split_strategy ({"binary", "all", "random"}, default="binary") – Used only for n_classes_-class classification with n_classes_>2,
data with categorical features and multiclass="multinomial". If
“binary”, the split search for categorical features uses a single loop over
the bins sorted with respect to the proportion of labels with class 1 in each
bin. If “all”, it uses n_classes_ loops, corresponding to the bins
sorted with respect to the proportion of labels of each class. If “random”,
it performs a single loop, with bins sorted at random.
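The “binary” cat_split_strategy can be sketched for the Gini criterion: per-bin class counts are accumulated, the non-empty bins are sorted by their proportion of class-1 labels, and only splits between a prefix and the remaining suffix of that order are scanned. This is a simplified illustration (no sample weights, no min_samples_leaf, no out-of-bag samples, no missing-value bin) and the helper names gini and best_categorical_split are hypothetical. Under the same reading, “all” would repeat the scan once per class and “random” would use a random order.

```python
import numpy as np


def gini(counts):
    """Gini impurity of a vector of class counts."""
    total = counts.sum()
    if total == 0:
        return 0.0
    proportions = counts / total
    return 1.0 - np.sum(proportions ** 2)


def best_categorical_split(bins, y, n_classes):
    """Return (bins sent left, weighted Gini) using the "binary" ordering."""
    bins = np.asarray(bins)
    y = np.asarray(y)
    counts = np.zeros((bins.max() + 1, n_classes))
    np.add.at(counts, (bins, y), 1.0)  # per-bin class counts
    totals = counts.sum(axis=1)
    present = np.flatnonzero(totals)  # skip empty bins
    # Sort non-empty bins by their proportion of labels equal to class 1
    proportion = counts[present, 1] / totals[present]
    order = present[np.argsort(proportion, kind="stable")]
    best_left, best_impurity = None, np.inf
    # Only "prefix versus suffix" splits of this order are scanned
    for k in range(1, order.size):
        left = counts[order[:k]].sum(axis=0)
        right = counts[order[k:]].sum(axis=0)
        impurity = (left.sum() * gini(left) + right.sum() * gini(right)) / totals.sum()
        if impurity < best_impurity:
            best_left, best_impurity = set(order[:k].tolist()), float(impurity)
    return best_left, best_impurity


# Bins 1 and 3 hold class 0 and bins 0 and 2 hold class 1: no threshold on the
# raw bin order separates them, but sorting by class-1 proportion does.
bins = [0, 0, 1, 1, 2, 2, 3, 3]
y = [1, 1, 0, 0, 1, 1, 0, 0]
print(best_categorical_split(bins, y, n_classes=2))  # ({1, 3}, 0.0)
```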
Apply trees in the forest to X, return leaf indices.
Parameters:
X (array-like of shape (n_samples, n_features)) – The input samples.
Returns:
X_leaves – For each datapoint x in X and for each tree in the forest, return the
index of the leaf x ends up in.
Return type:
ndarray of shape (n_samples, n_estimators)
fit(X, y, sample_weight=None, categorical_features=None, randomized_depth=False)
Trains WildWood’s forest predictor from the training set (X, y).
Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, it will be binned into a uint8
data type.
y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in
regression).
sample_weight (array-like of shape (n_samples,), default=None) – If None, then samples are equally weighted. Otherwise, samples are
weighted: if sample_weight[42] = 3.0, then all computations behave “as if”
there were 3 rows with the same contents as X[42] (for split finding, node
predictions and the aggregation algorithm, i.e. the computation of validation
losses).
categorical_features (array-like, default=None) – Array-like containing boolean or integer values of shape (n_features,) or
(n_categorical_features,) indicating the categorical features.
Note that this can be specified as well as a parameter of the class.
If None : no feature will be considered categorical.
If boolean array-like : boolean mask indicating categorical features.
If integer array-like : integer indices indicating categorical features.
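The “as if” semantics of sample_weight can be checked on the Gini impurity of a node: with class counts weighted by sample_weight, a row of weight 3.0 contributes exactly like three identical rows. The sketch also shows how the “balanced” class_weight mode turns into per-sample weights that are multiplied with sample_weight. It assumes labels are already encoded as 0, ..., n_classes - 1, each appearing at least once, and the helper names weighted_gini and balanced_sample_weight are hypothetical.

```python
import numpy as np


def weighted_gini(y, sample_weight, n_classes):
    """Gini impurity computed from weighted class counts."""
    counts = np.bincount(y, weights=sample_weight, minlength=n_classes)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)


def balanced_sample_weight(y, sample_weight=None):
    """Per-sample weights n_samples / (n_classes * np.bincount(y)), times sample_weight."""
    y = np.asarray(y)
    class_counts = np.bincount(y)
    class_weight = y.size / (class_counts.size * class_counts)
    weights = class_weight[y]
    if sample_weight is not None:
        weights = weights * np.asarray(sample_weight, dtype=float)
    return weights


y = np.array([0, 1, 1, 2])
w = np.array([1.0, 3.0, 1.0, 1.0])   # row 1 counts "as if" it appeared 3 times
y_dup = np.repeat(y, w.astype(int))  # [0, 1, 1, 1, 1, 2]
print(weighted_gini(y, w, 3), weighted_gini(y_dup, None, 3))  # identical values (0.5 up to rounding)

print(balanced_sample_weight([0, 0, 0, 1]))  # class-0 rows get 2/3 each, the class-1 row gets 2.0
```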
The predicted class of an input sample is a vote by the trees in the forest,
weighted by their probability estimates. That is, the predicted class is the
one with highest mean probability estimate across the trees.
Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples.
The predicted class probabilities of an input sample are computed as
the mean predicted class probabilities of the trees in the forest.
If aggregation=False, the class probability of a single tree is a
regularization using the dirichlet parameter of the fraction of samples of
the same class in a leaf. If aggregation=True the class probability of a
single tree is an aggregation with exponential weights of the predictions of
all pruned subtrees it contains. See Prediction function: aggregation with exponential weights for more details.
Parameters:
X ({array-like} of shape (n_samples, n_features)) – The input samples.
Returns:
output – The class probabilities of the input samples.
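For aggregation=False, the regularized class frequencies of a leaf can be sketched with the usual Dirichlet (add-dirichlet) smoothing. The exact formula is an assumption: the text above only states that the dirichlet parameter regularizes the fraction of samples of each class in a leaf. With dirichlet=0.5 this form is the Krichevsky-Trofimov estimator. The helper name leaf_proba is hypothetical.

```python
import numpy as np


def leaf_proba(class_counts, dirichlet=0.5):
    """Smoothed class probabilities: (n_k + dirichlet) / (n + n_classes * dirichlet)."""
    counts = np.asarray(class_counts, dtype=float)
    return (counts + dirichlet) / (counts.sum() + dirichlet * counts.size)


print(leaf_proba([3, 1]))  # [0.7 0.3]
print(leaf_proba([0, 0]))  # [0.5 0.5]: an empty leaf falls back to the uniform distribution
```

A larger dirichlet pulls leaf probabilities more strongly toward the uniform distribution, which avoids predicting exact zeros from small leaves.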
Gives the predict_proba(X) of each tree in the forest.
This simply returns an ndarray of shape (n_estimators, n_samples, n_classes)
containing the predict_proba of each tree in the forest;
see predict_proba() for details.
Parameters:
X ({array-like} of shape (n_samples, n_features)) – The input samples.
Returns:
output – The predicted class probabilities by each tree for the input samples.
Return type:
ndarray of shape (n_estimators, n_samples, n_classes)
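The relation between predict_proba_trees, predict_proba and predict can be sketched with NumPy on a hypothetical array of per-tree probabilities (three trees, two samples, two classes). Assumption: predict maps the argmax back through a classes_ array, as scikit-learn classifiers do; the class labels below are made up.

```python
import numpy as np

# Hypothetical output of predict_proba_trees: shape (n_estimators, n_samples, n_classes)
per_tree = np.array([
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.8, 0.2], [0.6, 0.4]],
])
classes_ = np.array(["neg", "pos"])  # hypothetical class labels

proba = per_tree.mean(axis=0)          # predict_proba: average over the trees
pred = classes_[proba.argmax(axis=1)]  # predict: class with highest mean probability

print(proba)          # approximately [[0.8 0.2] [0.4 0.6]]
print(pred.tolist())  # ['neg', 'pos']
```

The third tree prefers "neg" for the second sample, but the averaged probabilities favor "pos": the vote is weighted by probability estimates rather than by hard per-tree decisions.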
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy
which is a harsh metric since you require for each sample that
each label set be correctly predicted.
Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns:
score – Mean accuracy of self.predict(X) w.r.t. y.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
categorical_features (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for categorical_features parameter in fit.
randomized_depth (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for randomized_depth parameter in fit.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
It grows n_estimators trees in parallel using bootstrap samples and aggregates
their predictions (bagging). Each tree uses “in-the-bag” samples to grow itself
and “out-of-bag” samples to compute aggregation weights for all possible subtrees
of the whole tree.
The prediction function of each tree in WildWood is very different from that
of a standard decision tree whenever aggregation=True (default). Indeed, the
predictions of a tree are computed here as an aggregation with exponential
weights of all the predictions given by all possible subtrees (prunings) of the
full tree. The required computations are performed efficiently thanks to a
variant of the context tree weighting algorithm.
Also, continuous features are binned into at most max_bins bins (plus one if
a feature contains missing values), which allows an efficient histogram-based
split search.
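The histogram-based split search mentioned above can be sketched for the “mse” criterion. A single pass over the samples accumulates, for each bin, the number of samples and the sum of targets; the best threshold is then found by scanning the bins rather than the samples. The decrease in squared error of a split simplifies to s_left**2 / n_left + s_right**2 / n_right - s**2 / n, where s and n denote sums of targets and sample counts. This sketch ignores sample weights, out-of-bag samples, min_samples_leaf and the missing-value bin, and best_split_mse is a hypothetical name.

```python
import numpy as np


def best_split_mse(bins, y, n_bins):
    """Return (b, decrease): samples with bin <= b go left; decrease in squared error."""
    bins = np.asarray(bins)
    y = np.asarray(y, dtype=float)
    # One pass over the samples builds the histogram: count and sum of y per bin
    count = np.bincount(bins, minlength=n_bins).astype(float)
    total = np.bincount(bins, weights=y, minlength=n_bins)
    n, s = count.sum(), total.sum()
    best_bin, best_gain = None, -np.inf
    n_left = s_left = 0.0
    # The scan then costs O(n_bins), independently of the number of samples
    for b in range(n_bins - 1):
        n_left += count[b]
        s_left += total[b]
        n_right, s_right = n - n_left, s - s_left
        if n_left == 0 or n_right == 0:
            continue
        # Parent squared error minus children squared errors simplifies to this
        gain = s_left ** 2 / n_left + s_right ** 2 / n_right - s ** 2 / n
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, float(best_gain)


bins = [0, 0, 1, 1, 2, 2, 3, 3]
y = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
print(best_split_mse(bins, y, n_bins=4))  # (1, 32.0)
```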
Parameters:
n_estimators (int, default=10) – The number of trees in the forest.
criterion ({"mse"}, default="mse") – The impurity criterion used to measure the quality of a split. Only “mse”,
which corresponds to variance reduction for split finding, is available for now.
loss ({"mse"}, default="mse") – The loss used for the computation of the aggregation weights. Only “mse”
is supported for now, which corresponds to the least-squares loss.
step (float, default=1.0) – Step-size for the aggregation weights. Default is 1.0. A larger value will
lead to larger aggregation weights for subtrees with better out-of-bag
(validation) loss.
aggregation (bool, default=True) – Controls if aggregation is used in the trees. It is highly recommended to
leave it as True.
max_depth (int, default=None) – The maximum depth of a tree. If None, then nodes from the tree are split until
they are “pure” (impurity is zero) or until they contain
min_samples_split samples.
min_samples_split (int, default=2) – The minimum number of training samples and out-of-bag samples required to
split a node. This must be >= 2.
min_samples_leaf (int, default=1) – A split point is considered only if it leaves at least min_samples_leaf
training samples and out-of-bag samples in each of the left and right children.
This must be >= 1.
max_bins (int, default=256) – The maximum number of bins for numerical columns, not including the bin used
for missing values, if any. Should be at least 3. Before training, each column
of the input array X is binned into integer-valued bins,
corresponding to inter-quantile intervals, enabling faster split finding.
We use at most max_bins bins when the column has no missing values, and
at most max_bins+1 bins if it does.
The last bin (at index max_bins) is used to encode missing values.
If a column has fewer than max_bins distinct inter-quantile intervals or
categories, fewer than max_bins bins are used for it.
categorical_features (array-like, default=None) – Array-like containing boolean or integer values of shape (n_features,) or
(n_categorical_features,) indicating the categorical features.
If None : no feature will be considered categorical.
If boolean array-like : boolean mask indicating categorical features.
If integer array-like : integer indices indicating categorical features.
max_features ({"auto", "sqrt", "log2"}, int or None, default="auto") – The number of features to consider when looking for the best split.
If int, consider max_features features at each split.
If “auto”, max_features=sqrt(n_features).
If “sqrt”, max_features=sqrt(n_features) (same as “auto”).
If “log2”, max_features=log2(n_features).
If None, max_features=n_features.
handle_unknown ({"error", "consider_missing"}, default="error") – If set to “error”, an error will be raised while encoding the data whenever a
category in a categorical column was not seen during fit. If set to
“consider_missing”, we will consider it as a missing value (it will end up in
the same bin as missing values).
cat_min_categories (int or {"log", "sqrt"}, default="log") – When a column contains numerical values and its type is not specified through
categorical_features, WildWood decides that it is categorical whenever its
number of unique values is smaller than or equal to cat_min_categories.
Otherwise, it is considered numerical.
If an int larger than 3 is given, we use it as cat_min_categories.
If “log”, we set cat_min_categories=max(2,floor(log(n_samples))).
If “sqrt”, we set cat_min_categories=max(2,floor(sqrt(n_samples))).
Default is “log”.
subsample (int or None, default=200000) – If n_samples>subsample, then subsample samples are chosen at random
to compute the quantiles used to bin numerical columns. If None, the whole
dataset is used.
n_jobs (int, default=1) – The number of jobs to run in parallel for fit(), predict() and
apply(). All these methods are parallelized over the trees in the
forest. n_jobs=-1 means using all processors.
random_state (int, RandomState instance or None, default=None) – Controls both the randomness involved in bootstrapping the samples and
sampling the features when looking for the best splits
(if max_features<n_features). See Bootstrap and feature subsampling for details.
verbose (bool, default=False) – Controls the verbosity when fitting and predicting.
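The accepted forms of categorical_features and max_features can be normalized as in the sketch below. The helper names categorical_mask and resolve_max_features are hypothetical, and rounding the square root and log2 down with a floor of 1 is an assumption borrowed from scikit-learn's convention, not something stated above.

```python
import math

import numpy as np


def categorical_mask(categorical_features, n_features):
    """Turn None, a boolean mask or integer indices into a boolean mask of shape (n_features,)."""
    mask = np.zeros(n_features, dtype=bool)
    if categorical_features is None:
        return mask
    arr = np.asarray(categorical_features)
    if arr.size == 0:
        return mask
    if arr.dtype == np.bool_:
        if arr.shape != (n_features,):
            raise ValueError("a boolean mask must have shape (n_features,)")
        return arr.copy()
    if np.issubdtype(arr.dtype, np.integer):
        mask[arr] = True  # integer indices of the categorical features
        return mask
    raise ValueError("categorical_features must contain booleans or integers")


def resolve_max_features(max_features, n_features):
    """Number of features considered at each split."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError(f"invalid max_features: {max_features!r}")
    return int(max_features)


print(categorical_mask([0, 2], 4).tolist())                                  # [True, False, True, False]
print(categorical_mask([True, False, True], 3).tolist())                     # [True, False, True]
print(resolve_max_features("auto", 100), resolve_max_features("log2", 100))  # 10 6
```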
Apply trees in the forest to X, return leaf indices.
Parameters:
X (array-like of shape (n_samples, n_features)) – The input samples.
Returns:
X_leaves – For each datapoint x in X and for each tree in the forest, return the
index of the leaf x ends up in.
Return type:
ndarray of shape (n_samples, n_estimators)
fit(X, y, sample_weight=None, categorical_features=None, randomized_depth=False)
Trains WildWood’s forest predictor from the training set (X, y).
Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, it will be binned into a uint8
data type.
y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in
regression).
sample_weight (array-like of shape (n_samples,), default=None) – If None, then samples are equally weighted. Otherwise, samples are
weighted: if sample_weight[42] = 3.0, then all computations behave “as if”
there were 3 rows with the same contents as X[42] (for split finding, node
predictions and the aggregation algorithm, i.e. the computation of validation
losses).
categorical_features (array-like, default=None) – Array-like containing boolean or integer values of shape (n_features,) or
(n_categorical_features,) indicating the categorical features.
Note that this can be specified as well as a parameter of the class.
If None : no feature will be considered categorical.
If boolean array-like : boolean mask indicating categorical features.
If integer array-like : integer indices indicating categorical features.
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as
\((1 - \frac{u}{v})\), where \(u\) is the residual
sum of squares ((y_true-y_pred)**2).sum() and \(v\)
is the total sum of squares ((y_true-y_true.mean())**2).sum().
The best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always predicts
the expected value of y, disregarding the input features, would get
a \(R^2\) score of 0.0.
Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed
kernel matrix or a list of generic objects instead with shape
(n_samples,n_samples_fitted), where n_samples_fitted
is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses
multioutput='uniform_average' from version 0.23 to keep consistent
with default value of r2_score().
This influences the score method of all the multioutput
regressors (except for
MultiOutputRegressor).
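The definition of \(R^2\) given above translates directly into a few lines of NumPy. This sketch handles a single output only, ignores sample_weight and does not guard against a constant y_true (where v is zero); r2_from_definition is a hypothetical name, and sklearn.metrics.r2_score remains the reference implementation used by score.

```python
import numpy as np


def r2_from_definition(y_true, y_pred):
    """R^2 = 1 - u / v, with u the residual and v the total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()
    v = ((y_true - y_true.mean()) ** 2).sum()
    return float(1.0 - u / v)


y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_from_definition(y_true, y_true))                # 1.0: perfect predictions
print(r2_from_definition(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0: constant mean prediction
print(r2_from_definition(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0: worse than the constant model
```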
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
categorical_features (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for categorical_features parameter in fit.
randomized_depth (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for randomized_depth parameter in fit.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.