sklearn.model_selection.train_test_split

sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None) [source]

Split arrays or matrices into random train and test subsets.
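Because the signature takes `*arrays`, several inputs of the same length can be split in one call. The outputs come back as a flat list holding a (train, test) pair for each input, in input order, with rows kept aligned across inputs. The sketch below assumes NumPy and SciPy are installed (both are scikit-learn dependencies); `weights` is an illustrative extra array, not something this page defines.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.model_selection import train_test_split

X = sp.csr_matrix(np.arange(16).reshape((8, 2)))  # sparse matrix; row i is [2i, 2i + 1]
y = list(range(8))                                # plain Python list
weights = np.linspace(0.1, 0.8, 8)                # illustrative per-sample weights

# Three inputs in, six outputs out: (train, test) for each input, in order.
X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(
    X, y, weights, test_size=0.25, random_state=0)

print(X_train.shape, X_test.shape)  # (6, 2) (2, 2)
print(sp.issparse(X_train))         # True: sparse input stays sparse

# Rows stay aligned: the first column of X_train is 2 * y_train,
# and w_train holds the weights of the same original rows.
print(np.array_equal(X_train.toarray()[:, 0], 2 * np.asarray(y_train)))  # True
```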

Quick utility that wraps input validation, next(ShuffleSplit().split(X, y)), and the application of the resulting indices to the input data into a single call, so that data can be split (and optionally subsampled) in a one-liner.
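A sketch of that equivalence, assuming the defaults `stratify=None` and `shuffle=True`: the first split drawn from a `ShuffleSplit` with the same `test_size` and `random_state` selects the same rows that `train_test_split` returns. This reflects how scikit-learn currently implements the function rather than a documented guarantee, so treat it as illustrative.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, train_test_split

X = np.arange(10).reshape((5, 2))

# The one-liner.
X_train, X_test = train_test_split(X, test_size=0.33, random_state=42)

# The manual route: draw one split of indices, then index the data with them.
cv = ShuffleSplit(n_splits=1, test_size=0.33, random_state=42)
train_idx, test_idx = next(cv.split(X))

print(np.array_equal(X[train_idx], X_train))  # True
print(np.array_equal(X[test_idx], X_test))    # True
```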

Examples

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]

>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
...
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]
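In the call above, `test_size=0.33` on 5 samples yields 2 test rows and 3 train rows, because a float `test_size` is converted to a count by rounding up (ceil(1.65) = 2). In scikit-learn's size validation, a float `train_size` is rounded down, an int is taken as an absolute count, the unspecified side receives the remaining samples, and when both are `None` the test fraction defaults to 0.25. Setting both sizes below the total subsamples the data. The helper `split_sizes` below is hypothetical, written only to check these counts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape((10, 2))  # 10 samples

def split_sizes(**kwargs):
    """Hypothetical helper: return (n_train, n_test) for the given size arguments."""
    X_train, X_test = train_test_split(X, random_state=0, **kwargs)
    return len(X_train), len(X_test)

print(split_sizes())                               # (7, 3): default 0.25, ceil(2.5) = 3 test
print(split_sizes(test_size=4))                    # (6, 4): an int is an absolute count
print(split_sizes(train_size=0.5))                 # (5, 5): the rest goes to test
print(split_sizes(train_size=0.5, test_size=0.25)) # (5, 3): 2 samples left out
```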

>>> train_test_split(y, shuffle=False)
[[0, 1, 2], [3, 4]]
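The `stratify` parameter in the signature takes an array of class labels; the split then preserves the label proportions in both subsets. It only works with `shuffle=True`: combining it with `shuffle=False`, as in the last call above, raises a `ValueError`. A minimal sketch with made-up labels (8 samples of class 0, 4 of class 1):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(24).reshape((12, 2))
y = np.array([0] * 8 + [1] * 4)  # illustrative labels with a 2:1 class ratio

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Both subsets keep the 2:1 ratio: 6 + 3 in train, 2 + 1 in test.
print(np.bincount(y_train))  # [6 3]
print(np.bincount(y_test))   # [2 1]

# Stratification is rejected when shuffling is turned off.
try:
    train_test_split(X, y, shuffle=False, stratify=y)
except ValueError as exc:
    print("ValueError:", exc)
```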

Examples using `sklearn.model_selection.train_test_split`

Release Highlights for scikit-learn 0.23

Release Highlights for scikit-learn 0.24

Release Highlights for scikit-learn 0.22

Probability Calibration curves

Probability calibration of classifiers

Recognizing hand-written digits

Classifier comparison

Principal Component Regression vs Partial Least Squares Regression

Post pruning decision trees with cost complexity pruning

Understanding the decision tree structure

Comparing random forests and the multi-output meta estimator

Gradient Boosting regression

Early stopping of Gradient Boosting

Feature transformations with ensembles of trees

Gradient Boosting Out-of-Bag estimates

Faces recognition example using eigenfaces and SVMs

Prediction Latency

Pipeline Anova SVM

Univariate Feature Selection

Non-negative least squares

Comparing various online solvers

MNIST classification using multinomial logistic + L1

Multiclass sparse logistic regression on 20newsgroups

Early stopping of Stochastic Gradient Descent

Poisson regression and non-normal loss

Tweedie regression on insurance claims

Permutation Importance with Multicollinear or Correlated Features

Permutation Importance vs Random Forest Feature Importance (MDI)

Partial Dependence and Individual Conditional Expectation Plots

Common pitfalls in interpretation of coefficients of linear models

Scalable learning with polynomial kernel approximation

ROC Curve with Visualization API

Visualizations with Display Objects

Confusion matrix

Detection error tradeoff (DET) curve

Parameter estimation using grid search with cross-validation

Receiver Operating Characteristic (ROC)

Precision-Recall

Classifier Chain

Comparing Nearest Neighbors with and without Neighborhood Components Analysis

Dimensionality Reduction with Neighborhood Components Analysis

Restricted Boltzmann Machine features for digit classification

Varying regularization in Multi-layer Perceptron

Column Transformer with Mixed Types

Effect of transforming the targets in regression model

Importance of Feature Scaling

Map data to a normal distribution

Feature discretization

Semi-supervised Classification on a Text Dataset

© 2007–2020 The scikit-learn developers
Licensed under the 3-clause BSD License.
https://scikit-learn.org/0.24/modules/generated/sklearn.model_selection.train_test_split.html
