
scikit-learn official documentation.pdf

Welcome to scikit-learn
Installing scikit-learn
Frequently Asked Questions
Support
Related Projects
About us
Who is using scikit-learn?
Release history
scikit-learn Tutorials
An introduction to machine learning with scikit-learn
A tutorial on statistical-learning for scientific data processing
Working With Text Data
Choosing the right estimator
External Resources, Videos and Talks
User Guide
Supervised learning
Unsupervised learning
Model selection and evaluation
Dataset transformations
Dataset loading utilities
Strategies to scale computationally: bigger data
Computational Performance
Examples
General examples
Examples based on real world datasets
Biclustering
Calibration
Classification
Clustering
Covariance estimation
Cross decomposition
Dataset examples
Decomposition
Ensemble methods
Tutorial exercises
Feature Selection
Gaussian Process for Machine Learning
Generalized Linear Models
Manifold learning
Gaussian Mixture Models
Model Selection
Multioutput methods
Nearest Neighbors
Neural Networks
Preprocessing
Semi Supervised Classification
Support Vector Machines
Working with text documents
Decision Trees
API Reference
sklearn.base: Base classes and utility functions
sklearn.calibration: Probability Calibration
sklearn.cluster: Clustering
sklearn.cluster.bicluster: Biclustering
sklearn.covariance: Covariance Estimators
sklearn.cross_decomposition: Cross decomposition
sklearn.datasets: Datasets
sklearn.decomposition: Matrix Decomposition
sklearn.discriminant_analysis: Discriminant Analysis
sklearn.dummy: Dummy estimators
sklearn.ensemble: Ensemble Methods
sklearn.exceptions: Exceptions and warnings
sklearn.feature_extraction: Feature Extraction
sklearn.feature_selection: Feature Selection
sklearn.gaussian_process: Gaussian Processes
sklearn.isotonic: Isotonic regression
sklearn.kernel_approximation Kernel Approximation
sklearn.kernel_ridge Kernel Ridge Regression
sklearn.linear_model: Generalized Linear Models
sklearn.manifold: Manifold Learning
sklearn.metrics: Metrics
sklearn.mixture: Gaussian Mixture Models
sklearn.model_selection: Model Selection
sklearn.multiclass: Multiclass and multilabel classification
sklearn.multioutput: Multioutput regression and classification
sklearn.naive_bayes: Naive Bayes
sklearn.neighbors: Nearest Neighbors
sklearn.neural_network: Neural network models
sklearn.pipeline: Pipeline
sklearn.preprocessing: Preprocessing and Normalization
sklearn.random_projection: Random projection
sklearn.semi_supervised Semi-Supervised Learning
sklearn.svm: Support Vector Machines
sklearn.tree: Decision Trees
sklearn.utils: Utilities
Recently deprecated
Developer’s Guide
Contributing
Developers’ Tips and Tricks
Utilities for Developers
How to optimize for speed
Advanced installation instructions
Maintainer / core-developer information
Bibliography
Index
scikit-learn user guide
Release 0.19.1
scikit-learn developers
Nov 21, 2017
CHAPTER ONE

WELCOME TO SCIKIT-LEARN

1.1 Installing scikit-learn

Note: If you wish to contribute to the project, it’s recommended you install the latest development version.

1.1.1 Installing the latest release

Scikit-learn requires:

• Python (>= 2.7 or >= 3.3),
• NumPy (>= 1.8.2),
• SciPy (>= 0.13.3).

If you already have a working installation of numpy and scipy, the easiest way to install scikit-learn is using pip:

    pip install -U scikit-learn

or conda:

    conda install scikit-learn

If you have not installed NumPy or SciPy yet, you can also install these using conda or pip. When using pip, please ensure that binary wheels are used, and that NumPy and SciPy are not recompiled from source, which can happen with particular configurations of operating system and hardware (such as Linux on a Raspberry Pi). Building numpy and scipy from source can be complex (especially on Windows) and requires careful configuration to ensure that they link against an optimized implementation of linear algebra routines. Instead, use a third-party distribution as described below.

If you must install scikit-learn and its dependencies with pip, you can install it as scikit-learn[alldeps]. The most common use case for this is in a requirements.txt file used as part of an automated build process for a PaaS application or a Docker image. This option is not intended for manual installation from the command line.

1.1.2 Third-party Distributions

If you don’t already have a python installation with numpy and scipy, we recommend installing either via your package manager or via a python bundle. These come with numpy, scipy, scikit-learn, matplotlib and many other helpful scientific and data processing libraries.

Available options are:

Canopy and Anaconda for all supported platforms

Canopy and Anaconda both ship a recent version of scikit-learn, in addition to a large set of scientific python libraries for Windows, Mac OSX and Linux. Anaconda offers scikit-learn as part of its free distribution.

Warning: To upgrade or uninstall scikit-learn installed with Anaconda or conda you should not use the pip command. Instead:

To upgrade scikit-learn:

    conda update scikit-learn

To uninstall scikit-learn:

    conda remove scikit-learn

Upgrading with pip install -U scikit-learn or uninstalling with pip uninstall scikit-learn is likely to fail to properly remove files installed by the conda command. pip upgrade and uninstall operations only work on packages installed via pip install.

WinPython for Windows

The WinPython project distributes scikit-learn as an additional plugin.

For installation instructions for particular operating systems or for compiling the bleeding edge version, see the Advanced installation instructions.

1.2 Frequently Asked Questions

Here we try to give some answers to questions that regularly pop up on the mailing list.

1.2.1 What is the project name (a lot of people get it wrong)?

scikit-learn, but not scikit or SciKit nor sci-kit learn. Also not scikits.learn or scikits-learn, which were previously used.

1.2.2 How do you pronounce the project name?

sy-kit learn. sci stands for science!

1.2.3 Why scikit?

There are multiple scikits, which are scientific toolboxes built around SciPy. You can find a list at https://scikits.appspot.com/scikits. Apart from scikit-learn, another popular one is scikit-image.
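Returning to the requirements listed in section 1.1.1, the following is a minimal sketch of how one might check an existing environment against those minimum versions before installing. The helper names (parse_version, meets_minimum, python_supported, report) are hypothetical and not part of scikit-learn; the only assumptions about the libraries themselves are that numpy and scipy expose a __version__ string, and the version parsing is deliberately simplistic (it keeps only the leading integer of each dotted component).

```python
import importlib
import sys

# Minimum versions quoted in section 1.1.1 for release 0.19.1.
MINIMUMS = {"numpy": (1, 8, 2), "scipy": (0, 13, 3)}


def parse_version(text):
    """Turn '1.13.3' or '1.14.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version string is at least the minimum tuple."""
    return parse_version(installed) >= minimum


def python_supported(version_info):
    """Apply the 'Python (>= 2.7 or >= 3.3)' rule from section 1.1.1."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    return (major, minor) >= (3, 3)


def report():
    """Return one human-readable status line per requirement."""
    status = "ok" if python_supported(sys.version_info) else "too old"
    lines = ["python %d.%d: %s" % (sys.version_info[0], sys.version_info[1], status)]
    for name, minimum in sorted(MINIMUMS.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            lines.append("%s: not installed" % name)
            continue
        status = "ok" if meets_minimum(module.__version__, minimum) else "too old"
        lines.append("%s %s: %s" % (name, module.__version__, status))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Running the script prints one line per requirement. A "not installed" or "too old" line suggests installing NumPy and SciPy first, with conda or binary wheels as described in section 1.1.1, rather than letting pip build them from source.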