Estimator utils
This file is part of the TPOT library.
The current version of TPOT was developed at Cedars-Sinai by:
- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
- Anil Saini (anil.saini@cshs.org)
- Jose Hernandez (jgh9094@gmail.com)
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)
The original version of TPOT was primarily developed at the University of Pennsylvania by:
- Randal S. Olson (rso@randalolson.com)
- Weixuan Fu (weixuanf@upenn.edu)
- Daniel Angell (dpa34@drexel.edu)
- Jason Moore (moorejh28@gmail.com)
- and many more generous open-source contributors
TPOT is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
TPOT is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with TPOT. If not, see http://www.gnu.org/licenses/.
apply_make_pipeline(ind, preprocessing_pipeline=None, export_graphpipeline=False, **pipeline_kwargs)
Helper function that converts a TPOT2 individual into an sklearn pipeline; it is used to build a column of sklearn pipelines from a column of individuals.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ind | | The individual to convert to a pipeline. | required |
preprocessing_pipeline | | The preprocessing pipeline to include before the individual's pipeline. | None |
export_graphpipeline | | Force the pipeline to be exported as a graph pipeline. Flattens all nested pipelines, FeatureUnions, and GraphPipelines into a single GraphPipeline. | False |
pipeline_kwargs | | Keyword arguments to pass to the export_pipeline or export_flattened_graphpipeline method. | {} |
Returns:
Type | Description |
---|---|
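sklearn estimator | The individual exported as an sklearn estimator, preceded by preprocessing_pipeline if one is given. |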
Source code in tpot2/tpot_estimator/estimator_utils.py
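A minimal sketch of the conversion described above, assuming only what the table states: the individual exposes `export_pipeline` and `export_flattened_graphpipeline` methods, and `preprocessing_pipeline` is placed in front of the exported estimator. `ToyIndividual` and `make_pipeline_from_individual` are hypothetical names, not TPOT2 API, and the toy individual returns a plain estimator rather than a real GraphPipeline.

```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class ToyIndividual:
    """Hypothetical stand-in for a TPOT2 individual wrapping a single estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def export_pipeline(self, **kwargs):
        # A real individual would build its (possibly nested) sklearn pipeline here.
        return clone(self.estimator)

    def export_flattened_graphpipeline(self, **kwargs):
        # A real individual would return a single flattened GraphPipeline here.
        return clone(self.estimator)


def make_pipeline_from_individual(ind, preprocessing_pipeline=None,
                                  export_graphpipeline=False, **pipeline_kwargs):
    """Hypothetical sketch: export one individual and optionally prepend preprocessing."""
    if export_graphpipeline:
        est = ind.export_flattened_graphpipeline(**pipeline_kwargs)
    else:
        est = ind.export_pipeline(**pipeline_kwargs)
    if preprocessing_pipeline is None:
        return est
    # Clone the preprocessing step so pipelines built from the same template do not share state.
    return make_pipeline(clone(preprocessing_pipeline), est)


# Build a "column" (here a plain list) of pipelines from a column of individuals.
individuals = [ToyIndividual(LogisticRegression()), ToyIndividual(LogisticRegression(C=0.1))]
pipelines = [make_pipeline_from_individual(ind, StandardScaler()) for ind in individuals]
```

With pandas, the same helper could be applied element-wise, e.g. `df["pipeline"] = df["individual"].apply(make_pipeline_from_individual, preprocessing_pipeline=StandardScaler())`, where `df` and its column names are hypothetical.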
check_if_y_is_encoded(y)
Checks whether the target y consists of the consecutive integers 0, 1, ..., N. XGBoost requires the target to be encoded this way.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | | The target vector. | required |
Returns:
Type | Description |
---|---|
bool | True if the target is encoded as consecutive integers from 0 to N, False otherwise. |
Source code in tpot2/tpot_estimator/estimator_utils.py
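The check can be sketched in plain Python: sort the unique labels and compare them with 0, 1, ..., N. `is_sequentially_encoded` is a hypothetical name used for illustration, not the TPOT2 function itself.

```python
def is_sequentially_encoded(y):
    """Hypothetical sketch: True if the unique labels are exactly 0, 1, ..., N."""
    labels = sorted(set(y))
    # Compare the sorted unique labels with the integers 0..len(labels)-1.
    return labels == list(range(len(labels)))


print(is_sequentially_encoded([2, 0, 1, 1]))  # True
print(is_sequentially_encoded([1, 2, 3]))     # False: does not start at 0
print(is_sequentially_encoded([0, 2]))        # False: 1 is missing
```

When the check fails, `sklearn.preprocessing.LabelEncoder` can map arbitrary labels to 0, ..., n_classes - 1 before the target is passed to XGBoost.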
convert_parents_tuples_to_integers(row, object_to_int)
Helper function that converts a row of parents into integers representing each parent's index in the population.
The original pandas DataFrame uses a custom index for the parents; this function converts that custom index to an integer index for easier manipulation by end users.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
row | | The row to convert. | required |
object_to_int | | A dictionary mapping the object to an integer index. | required |
Returns:
Type | Description |
---|---|
tuple | The row with the custom index converted to an integer index. |
Source code in tpot2/tpot_estimator/estimator_utils.py
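A minimal sketch of the mapping step, assuming a row of parents is a list or tuple of custom index values and that `object_to_int` maps each value to its integer position. The function name and the `ind_*` index values are hypothetical, and passing non-sequence rows (such as a missing value for individuals without parents) through unchanged is an assumption of this sketch.

```python
def parents_to_integers(row, object_to_int):
    """Hypothetical sketch: map each parent's custom index to its integer position."""
    if isinstance(row, (list, tuple)):
        return tuple(object_to_int[obj] for obj in row)
    # Assumption: rows without parents (e.g. a missing value) are passed through unchanged.
    return row


# Hypothetical custom indices for a population of three individuals.
population_index = ["ind_a", "ind_b", "ind_c"]
object_to_int = {obj: i for i, obj in enumerate(population_index)}

print(parents_to_integers(("ind_c", "ind_a"), object_to_int))  # (2, 0)
print(parents_to_integers(None, object_to_int))                # None
```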
objective_function_generator(pipeline, x, y, scorers, cv, other_objective_functions, step=None, budget=None, is_classification=True, export_graphpipeline=False, **pipeline_kwargs)
Uses cross-validation to evaluate the pipeline with the scorers, and concatenates the results with the scores from the standalone objective functions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline | | The individual to evaluate. | required |
x | | The feature matrix. | required |
y | | The target vector. | required |
scorers | | The scorers to use for cross-validation. | required |
cv | | The cross-validator to use, for example sklearn.model_selection.KFold or sklearn.model_selection.StratifiedKFold. If an int, sklearn.model_selection.KFold with n_splits=cv is used. | required |
other_objective_functions | | A list of standalone objective functions used to evaluate the pipeline, each with signature obj(pipeline) -> float or obj(pipeline) -> np.ndarray. These functions take the unfitted estimator. | required |
step | | The fold to return the scores for. If None, the mean score across all folds is returned for each scorer. | None |
budget | | The fraction of the data to subsample: budget*len(x) samples are used. If None, the full dataset is used. | None |
is_classification | | If True, the subsampling is stratified. | True |
export_graphpipeline | | Force the pipeline to be exported as a graph pipeline. Flattens all nested sklearn pipelines, FeatureUnions, and GraphPipelines into a single GraphPipeline. | False |
pipeline_kwargs | | Keyword arguments to pass to the export_pipeline or export_flattened_graphpipeline method. | {} |
Returns:
Type | Description |
---|---|
ndarray | The concatenated scores for the pipeline. The first len(scorers) elements are the cross-validation scores, and the remaining elements are the scores from the standalone objective functions. |
Source code in tpot2/tpot_estimator/estimator_utils.py
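A minimal sketch of the evaluation flow described above, built on scikit-learn's `cross_validate` and `train_test_split`. It assumes `scorers` is a list of scikit-learn scoring names and that the budget subsample is drawn with a (stratified) `train_test_split`; `cv_objective_sketch` and the `n_steps` objective are hypothetical, and TPOT2's own implementation may differ, for example in error handling and in how the subsample is drawn.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_objective_sketch(pipeline, X, y, scorers, cv, other_objective_functions,
                        step=None, budget=None, is_classification=True):
    """Hypothetical sketch: CV scores followed by standalone objective scores."""
    # Optional subsampling: keep budget * len(X) samples, stratified for classification.
    if budget is not None and budget < 1:
        X, _, y, _ = train_test_split(
            X, y, train_size=budget, random_state=0,
            stratify=y if is_classification else None,
        )
    # Cross-validate a fresh clone so the caller's pipeline stays unfitted.
    results = cross_validate(clone(pipeline), X, y, cv=cv, scoring=scorers)
    # Shape (n_scorers, n_folds): one row of fold scores per scorer name.
    per_fold = np.array([results[f"test_{name}"] for name in scorers])
    cv_scores = per_fold.mean(axis=1) if step is None else per_fold[:, step]
    # Standalone objectives receive the unfitted pipeline; floats and arrays are both flattened.
    others = [np.ravel(np.asarray(f(pipeline), dtype=float)) for f in other_objective_functions]
    return np.concatenate([cv_scores, *others])


def n_steps(est):
    # Hypothetical complexity objective: number of steps in the pipeline.
    return float(len(est.steps))


X, y = make_classification(n_samples=120, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression())
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cv_objective_sketch(pipe, X, y, ["accuracy", "roc_auc"], cv, [n_steps], budget=0.5)
fold0 = cv_objective_sketch(pipe, X, y, ["accuracy", "roc_auc"], cv, [n_steps], step=0)
# Layout of both arrays: [accuracy, ROC AUC, number of steps]
```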
remove_underrepresented_classes(x, y, min_count)
Helper function to remove classes with fewer than min_count samples from the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | The feature matrix. | required |
y | | The target vector. | required |
min_count | | The minimum number of samples a class must have to be kept. | required |
Returns:
Type | Description |
---|---|
(ndarray, ndarray) | The feature matrix and target vector with the rows of classes with fewer than min_count samples removed. |
Source code in tpot2/tpot_estimator/estimator_utils.py
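A minimal NumPy sketch of this filtering: count the samples per class, keep the classes with at least `min_count` samples, and mask the rows of X and y accordingly. `drop_rare_classes` is a hypothetical name, not the TPOT2 function.

```python
import numpy as np


def drop_rare_classes(X, y, min_count):
    """Hypothetical sketch: drop rows whose class has fewer than min_count samples."""
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    # Only classes with at least min_count samples are kept.
    keep = np.isin(y, classes[counts >= min_count])
    return X[keep], y[keep]


X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 1, 2, 2])
X_kept, y_kept = drop_rare_classes(X, y, min_count=2)
print(y_kept)  # [0 0 0 2 2]
```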
val_objective_function_generator(pipeline, X_train, y_train, X_test, y_test, scorers, other_objective_functions, export_graphpipeline=False, **pipeline_kwargs)
Trains a pipeline on a training set and evaluates it on a test set using the scorers and other objective functions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline | | The individual to evaluate. | required |
X_train | | The feature matrix of the training set. | required |
y_train | | The target vector of the training set. | required |
X_test | | The feature matrix of the test set. | required |
y_test | | The target vector of the test set. | required |
scorers | | The scorers used to evaluate the fitted pipeline on the test set. | required |
other_objective_functions | | A list of standalone objective functions used to evaluate the pipeline, each with signature obj(pipeline) -> float or obj(pipeline) -> np.ndarray. These functions take the unfitted estimator. | required |
export_graphpipeline | | Force the pipeline to be exported as a graph pipeline. Flattens all nested sklearn pipelines, FeatureUnions, and GraphPipelines into a single GraphPipeline. | False |
pipeline_kwargs | | Keyword arguments to pass to the export_pipeline or export_flattened_graphpipeline method. | {} |
Returns:
Type | Description |
---|---|
ndarray | The concatenated scores for the pipeline. The first len(scorers) elements are the test-set scores, and the remaining elements are the scores from the standalone objective functions. |
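A minimal sketch of the hold-out evaluation described above: fit a clone of the pipeline on the training set, score it on the test set with each scorer, then append the standalone objective scores. It assumes `scorers` is a list of scikit-learn scoring names resolved with `sklearn.metrics.get_scorer`; `holdout_objective_sketch` and `n_steps` are hypothetical names, not TPOT2 API.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def holdout_objective_sketch(pipeline, X_train, y_train, X_test, y_test,
                             scorers, other_objective_functions):
    """Hypothetical sketch: fit on the training set, score on the test set."""
    fitted = clone(pipeline).fit(X_train, y_train)
    # One test-set score per scorer name, in the order given.
    test_scores = np.array([get_scorer(name)(fitted, X_test, y_test) for name in scorers])
    # Standalone objectives receive the unfitted pipeline.
    others = [np.ravel(np.asarray(f(pipeline), dtype=float)) for f in other_objective_functions]
    return np.concatenate([test_scores, *others])


def n_steps(est):
    # Hypothetical complexity objective: number of steps in the pipeline.
    return float(len(est.steps))


X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = holdout_objective_sketch(pipe, X_train, y_train, X_test, y_test,
                                  ["accuracy", "balanced_accuracy"], [n_steps])
# Layout: [test accuracy, test balanced accuracy, number of steps]
```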