TPOT API
Classification
class tpot.TPOTClassifier(generations=100, population_size=100, offspring_size=None, mutation_rate=0.9, crossover_rate=0.1, scoring='accuracy', cv=5, subsample=1.0, n_jobs=1, max_time_mins=None, max_eval_time_mins=5, random_state=None, config_dict=None, template=None, warm_start=False, memory=None, use_dask=False, periodic_checkpoint_folder=None, early_stop=None, verbosity=0, disable_update_check=False, log_file=None)
Automated machine learning for supervised classification tasks.
The TPOTClassifier performs an intelligent search over machine learning pipelines that can contain supervised classification models, preprocessors, feature selection techniques, and any other estimator or transformer that follows the scikit-learn API. The TPOTClassifier will also search over the hyperparameters of all objects in the pipeline.
By default, TPOTClassifier will search over a broad range of supervised classification algorithms, transformers, and their hyperparameters. However, the algorithms, transformers, and hyperparameters that TPOTClassifier searches over can be fully customized using the config_dict parameter.
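A custom config_dict maps each operator's import path to the hyperparameter values TPOT may search over. The sketch below is illustrative only; the two operators and value grids shown are arbitrary choices for the example, not TPOT's defaults.

from tpot import TPOTClassifier

# Keys are operator import paths; values map each hyperparameter to its
# candidate values. An empty dictionary means "use the operator as-is".
tpot_config = {
    'sklearn.naive_bayes.GaussianNB': {
    },
    'sklearn.tree.DecisionTreeClassifier': {
        'criterion': ['gini', 'entropy'],
        'max_depth': range(1, 11),
    },
}

tpot = TPOTClassifier(config_dict=tpot_config)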
Read more in the User Guide.
Parameters:
generations: int or None, optional (default=100)
Number of iterations to run the pipeline optimization process. It must be a positive number or None. If None, the parameter max_time_mins must be defined as the runtime limit.
population_size: int, optional (default=100)
Number of individuals to retain in the genetic programming population every generation. Must be a positive number.
offspring_size: int, optional (default=None)
Number of offspring to produce in each genetic programming generation. Must be a positive number. By default, the number of offspring is equal to the population size.
mutation_rate: float, optional (default=0.9)
Mutation rate for the genetic programming algorithm in the range [0.0, 1.0]. This parameter tells the GP algorithm how many pipelines to apply random changes to every generation.
crossover_rate: float, optional (default=0.1)
Crossover rate for the genetic programming algorithm in the range [0.0, 1.0]. This parameter tells the genetic programming algorithm how many pipelines to "breed" every generation.
scoring: string or callable, optional (default='accuracy')
Function used to evaluate the quality of a given pipeline for the classification problem. Built-in scoring functions (following scikit-learn's scoring-string conventions) or a custom callable can be used.
cv: int, cross-validation generator, or an iterable, optional (default=5)
Cross-validation strategy used when evaluating pipelines.
subsample: float, optional (default=1.0)
Fraction of training samples that are used during the TPOT optimization process. Must be in the range (0.0, 1.0].
n_jobs: integer, optional (default=1)
Number of processes to use in parallel for evaluating pipelines during the TPOT optimization process.
max_time_mins: integer or None, optional (default=None)
How many minutes TPOT has to optimize the pipeline.
max_eval_time_mins: float, optional (default=5)
How many minutes TPOT has to evaluate a single pipeline.
random_state: integer or None, optional (default=None)
The seed of the pseudo-random number generator used in TPOT.
config_dict: Python dictionary, string, or None, optional (default=None)
A configuration dictionary for customizing the operators and parameters that TPOT searches over in the optimization process.
template: string, optional (default=None)
Template of a predefined pipeline structure, used to specify a desired structure for the machine learning pipelines evaluated by TPOT.
warm_start: boolean, optional (default=False)
Flag indicating whether the TPOT instance will reuse the population from previous calls to fit().
memory: a joblib.Memory object or string, optional (default=None)
If supplied, the pipeline will cache each transformer after calling fit. This feature is used to avoid recomputing fitted transformers within a pipeline when the parameters and input data are identical to those of another pipeline fitted during the optimization process. See the scikit-learn documentation for more details on memory caching.
use_dask: boolean, optional (default=False)
Whether to use Dask-ML's pipeline optimizations. This avoids re-fitting the same estimator on the same split of data multiple times. It will also provide more detailed diagnostics when using Dask's distributed scheduler.
periodic_checkpoint_folder: path string, optional (default=None)
If supplied, a folder in which TPOT will periodically save the Pareto-front pipelines found so far while optimizing.
early_stop: integer, optional (default=None)
Number of generations without improvement after which TPOT stops the optimization process.
verbosity: integer, optional (default=0)
How much information TPOT communicates while it is running.
disable_update_check: boolean, optional (default=False)
Flag indicating whether the TPOT version checker should be disabled.
log_file: file-like object (io.TextIOWrapper or io.StringIO) or string, optional (default=None)
Save progress content to a file. If it is a string giving the path and file name of the desired output file, TPOT will create the file and write the log into it. If None, TPOT will write the log to sys.stdout.
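As a rough sketch of how several of these options combine (the values below are illustrative, not recommendations):

tpot = TPOTClassifier(
    generations=None,            # run until the time budget is exhausted...
    max_time_mins=60,            # ...which is one hour here
    scoring='f1_macro',          # scikit-learn-style scoring string or a callable
    early_stop=10,               # stop after 10 generations without improvement
    periodic_checkpoint_folder='tpot_checkpoints',  # save Pareto-front pipelines as it runs
    random_state=42,
    verbosity=2,
)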
Attributes:
fitted_pipeline_: scikit-learn Pipeline object
The best pipeline that TPOT discovered during the pipeline optimization process, fitted on the entire training dataset.
pareto_front_fitted_pipelines_: Python dictionary
Dictionary containing all the pipelines on the TPOT Pareto front, where the key is the string representation of the pipeline and the value is the corresponding pipeline fitted on the entire training dataset.
evaluated_individuals_: Python dictionary
Dictionary containing all pipelines that were evaluated during the pipeline optimization process, where the key is the string representation of the pipeline and the value is a tuple containing (# of steps in the pipeline, accuracy metric for the pipeline).
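After fit() completes, these attributes can be inspected directly. A brief sketch, assuming a fitted instance named tpot and the tuple layout described above:

print(tpot.fitted_pipeline_)    # the single best scikit-learn Pipeline
# Each evaluated pipeline maps to (# of steps, its cross-validated score).
for pipeline_str, (n_steps, cv_score) in tpot.evaluated_individuals_.items():
    print(n_steps, cv_score, pipeline_str)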
Example
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)
tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_digits_pipeline.py')
Functions
fit(features, classes[, sample_weight, groups]) | Run the TPOT optimization process on the given training data. |
predict(features) | Use the optimized pipeline to predict the classes for a feature set. |
predict_proba(features) | Use the optimized pipeline to estimate the class probabilities for a feature set. |
score(testing_features, testing_classes) | Returns the optimized pipeline's score on the given testing data using the user-specified scoring function. |
export(output_file_name) | Export the optimized pipeline as Python code. |
fit(features, classes, sample_weight=None, groups=None)
Uses genetic programming to optimize a machine learning pipeline that maximizes the score on the provided features and target. This pipeline optimization procedure uses internal k-fold cross-validation to avoid overfitting on the provided data. At the end of the pipeline optimization procedure, the best pipeline is then trained on the entire set of provided samples.
Parameters:
features: array-like {n_samples, n_features}
Feature matrix
classes: array-like {n_samples}
List of class labels for prediction
sample_weight: array-like {n_samples}, optional
Per-sample weights. Higher weights indicate more importance. If specified, sample_weight will be passed to any pipeline element whose fit() function accepts a sample_weight argument. By default, using sample_weight does not affect TPOT's scoring functions, which determine preferences between pipelines.
groups: array-like, with shape {n_samples, }, optional
Group labels for the samples used when performing cross-validation.
Returns:
self: object
Returns a copy of the fitted TPOT object
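The groups argument pairs naturally with a group-aware cross-validation strategy passed as cv. A minimal sketch, where X_train, y_train, and patient_ids are hypothetical training arrays:

from sklearn.model_selection import GroupKFold
from tpot import TPOTClassifier

# GroupKFold keeps all samples sharing a group label in the same CV fold,
# so pipelines are never scored on groups they were trained on.
tpot = TPOTClassifier(cv=GroupKFold(n_splits=5), generations=5, population_size=20)
tpot.fit(X_train, y_train, groups=patient_ids)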
predict(features)
Parameters:
features: array-like {n_samples, n_features}
Feature matrix
Returns:
predictions: array-like {n_samples}
Predicted classes for the samples in the feature matrix
predict_proba(features)
Note: This function will only work for pipelines whose final classifier supports the predict_proba function. TPOT will raise an error otherwise.
Parameters:
features: array-like {n_samples, n_features}
Feature matrix
Returns:
predictions: array-like {n_samples, n_classes}
The class probabilities of the input samples
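A short usage sketch, assuming a fitted tpot whose final classifier supports predict_proba:

proba = tpot.predict_proba(X_test)   # shape: (n_samples, n_classes)
print(proba[:5])                     # per-class probabilities for the first five samples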
score(testing_features, testing_classes)
The default scoring function for TPOTClassifier is 'accuracy'.
Parameters:
testing_features: array-like {n_samples, n_features}
Feature matrix of the testing set
testing_classes: array-like {n_samples}
List of class labels for prediction in the testing set
Returns:
accuracy_score: float
The estimated test set accuracy according to the user-specified scoring function.
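Because score() reuses whatever scoring function was given to the constructor, changing the scorer changes what this method reports. An illustrative sketch, reusing the train/test split from the example above:

tpot = TPOTClassifier(scoring='balanced_accuracy', generations=5, population_size=20)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))    # reports balanced accuracy, not plain accuracy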
export(output_file_name, data_file_path)
See the usage documentation for example usage of the export function.
Parameters:
output_file_name: string
String containing the path and file name of the desired output file.
data_file_path: string
By default, the data path placeholder in the exported code is 'PATH/TO/DATA/FILE'. If data_file_path is another string, that path is used instead.
Returns:
exported_code_string: string
The whole pipeline text as a string, returned only if output_file_name is not specified.
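For example, the exported script's data path placeholder can be overridden at export time (file names below are hypothetical):

tpot.export('tpot_pipeline.py', data_file_path='data/my_training_data.csv')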
Regression
class tpot.TPOTRegressor(generations=100, population_size=100, offspring_size=None, mutation_rate=0.9, crossover_rate=0.1, scoring='neg_mean_squared_error', cv=5, subsample=1.0, n_jobs=1, max_time_mins=None, max_eval_time_mins=5, random_state=None, config_dict=None, template=None, warm_start=False, memory=None, use_dask=False, periodic_checkpoint_folder=None, early_stop=None, verbosity=0, disable_update_check=False)
Automated machine learning for supervised regression tasks.
The TPOTRegressor performs an intelligent search over machine learning pipelines that can contain supervised regression models, preprocessors, feature selection techniques, and any other estimator or transformer that follows the scikit-learn API. The TPOTRegressor will also search over the hyperparameters of all objects in the pipeline.
By default, TPOTRegressor will search over a broad range of supervised regression models, transformers, and their hyperparameters. However, the models, transformers, and hyperparameters that TPOTRegressor searches over can be fully customized using the config_dict parameter.
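Besides a dictionary, config_dict also accepts the name of a built-in configuration as a string; for instance, the 'TPOT light' preset restricts the search to fast, simple operators:

from tpot import TPOTRegressor

# 'TPOT light' is one of TPOT's built-in configuration strings.
tpot = TPOTRegressor(config_dict='TPOT light')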
Read more in the User Guide.
Parameters:
generations: int or None, optional (default=100)
Number of iterations to run the pipeline optimization process. It must be a positive number or None. If None, the parameter max_time_mins must be defined as the runtime limit.
population_size: int, optional (default=100)
Number of individuals to retain in the genetic programming population every generation. Must be a positive number.
offspring_size: int, optional (default=None)
Number of offspring to produce in each genetic programming generation. Must be a positive number. By default, the number of offspring is equal to the population size.
mutation_rate: float, optional (default=0.9)
Mutation rate for the genetic programming algorithm in the range [0.0, 1.0]. This parameter tells the GP algorithm how many pipelines to apply random changes to every generation.
crossover_rate: float, optional (default=0.1)
Crossover rate for the genetic programming algorithm in the range [0.0, 1.0]. This parameter tells the genetic programming algorithm how many pipelines to "breed" every generation.
scoring: string or callable, optional (default='neg_mean_squared_error')
Function used to evaluate the quality of a given pipeline for the regression problem. Built-in scoring functions (following scikit-learn's scoring-string conventions) or a custom callable can be used.
cv: int, cross-validation generator, or an iterable, optional (default=5)
Cross-validation strategy used when evaluating pipelines.
subsample: float, optional (default=1.0)
Fraction of training samples that are used during the TPOT optimization process. Must be in the range (0.0, 1.0].
n_jobs: integer, optional (default=1)
Number of processes to use in parallel for evaluating pipelines during the TPOT optimization process.
max_time_mins: integer or None, optional (default=None)
How many minutes TPOT has to optimize the pipeline.
max_eval_time_mins: float, optional (default=5)
How many minutes TPOT has to evaluate a single pipeline.
random_state: integer or None, optional (default=None)
The seed of the pseudo-random number generator used in TPOT.
config_dict: Python dictionary, string, or None, optional (default=None)
A configuration dictionary for customizing the operators and parameters that TPOT searches over in the optimization process.
template: string, optional (default=None)
Template of a predefined pipeline structure, used to specify a desired structure for the machine learning pipelines evaluated by TPOT.
warm_start: boolean, optional (default=False)
Flag indicating whether the TPOT instance will reuse the population from previous calls to fit().
memory: a joblib.Memory object or string, optional (default=None)
If supplied, the pipeline will cache each transformer after calling fit. This feature is used to avoid recomputing fitted transformers within a pipeline when the parameters and input data are identical to those of another pipeline fitted during the optimization process. See the scikit-learn documentation for more details on memory caching.
use_dask: boolean, optional (default=False)
Whether to use Dask-ML's pipeline optimizations. This avoids re-fitting the same estimator on the same split of data multiple times. It will also provide more detailed diagnostics when using Dask's distributed scheduler.
periodic_checkpoint_folder: path string, optional (default=None)
If supplied, a folder in which TPOT will periodically save the Pareto-front pipelines found so far while optimizing.
early_stop: integer, optional (default=None)
Number of generations without improvement after which TPOT stops the optimization process.
verbosity: integer, optional (default=0)
How much information TPOT communicates while it is running.
disable_update_check: boolean, optional (default=False)
Flag indicating whether the TPOT version checker should be disabled.
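When use_dask=True, TPOT is typically paired with a dask.distributed client so pipeline evaluations can be distributed across workers. A minimal sketch, assuming dask and dask-ml are installed:

from dask.distributed import Client
from tpot import TPOTRegressor

client = Client()   # start (or connect to) a local Dask cluster
tpot = TPOTRegressor(use_dask=True, n_jobs=-1, generations=5, population_size=20)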
Attributes:
fitted_pipeline_: scikit-learn Pipeline object
The best pipeline that TPOT discovered during the pipeline optimization process, fitted on the entire training dataset.
pareto_front_fitted_pipelines_: Python dictionary
Dictionary containing all the pipelines on the TPOT Pareto front, where the key is the string representation of the pipeline and the value is the corresponding pipeline fitted on the entire training dataset.
evaluated_individuals_: Python dictionary
Dictionary containing all pipelines that were evaluated during the pipeline optimization process, where the key is the string representation of the pipeline and the value is a tuple containing (# of steps in the pipeline, score for the pipeline).
Example
from tpot import TPOTRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    train_size=0.75, test_size=0.25)
tpot = TPOTRegressor(generations=5, population_size=50, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')
Functions
fit(features, target[, sample_weight, groups]) | Run the TPOT optimization process on the given training data. |
predict(features) | Use the optimized pipeline to predict the target values for a feature set. |
score(testing_features, testing_target) | Returns the optimized pipeline's score on the given testing data using the user-specified scoring function. |
export(output_file_name) | Export the optimized pipeline as Python code. |
fit(features, target, sample_weight=None, groups=None)
Uses genetic programming to optimize a machine learning pipeline that maximizes the score on the provided features and target. This pipeline optimization procedure uses internal k-fold cross-validation to avoid overfitting on the provided data. At the end of the pipeline optimization procedure, the best pipeline is then trained on the entire set of provided samples.
Parameters:
features: array-like {n_samples, n_features}
Feature matrix
target: array-like {n_samples}
List of target labels for prediction
sample_weight: array-like {n_samples}, optional
Per-sample weights. Higher weights indicate more importance. If specified, sample_weight will be passed to any pipeline element whose fit() function accepts a sample_weight argument. By default, using sample_weight does not affect TPOT's scoring functions, which determine preferences between pipelines.
groups: array-like, with shape {n_samples, }, optional
Group labels for the samples used when performing cross-validation.
Returns:
self: object
Returns a copy of the fitted TPOT object
predict(features)
Parameters:
features: array-like {n_samples, n_features}
Feature matrix
Returns:
predictions: array-like {n_samples}
Predicted target values for the samples in the feature matrix
score(testing_features, testing_target)
The default scoring function for TPOTRegressor is 'neg_mean_squared_error'.
Parameters:
testing_features: array-like {n_samples, n_features}
Feature matrix of the testing set
testing_target: array-like {n_samples}
List of target labels for prediction in the testing set
Returns:
score: float
The estimated test set score according to the user-specified scoring function.
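Note that with the default 'neg_mean_squared_error' scorer, scores are negated so that higher is always better. A quick sketch, assuming a fitted tpot:

print(tpot.score(X_test, y_test))    # e.g. -12.3: the negated mean squared error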
export(output_file_name, data_file_path)
See the usage documentation for example usage of the export function.
Parameters:
output_file_name: string
String containing the path and file name of the desired output file.
data_file_path: string
By default, the data path placeholder in the exported code is 'PATH/TO/DATA/FILE'. If data_file_path is another string, that path is used instead.
Returns:
exported_code_string: string
The whole pipeline text as a string, returned only if output_file_name is not specified.