Hands-On Machine Learning with Scikit-Learn and TensorFlow.pdf

Published: 2022-05-31 | Uploaded by: admin | Category: Manuals | File size: 7.20 MB | Format: PDF

Table of Contents

Preface

The Machine Learning Tsunami

Machine Learning in Your Projects

Objective and Approach

Prerequisites

Roadmap

Other Resources

Conventions Used in This Book

Using Code Examples

O’Reilly Safari

How to Contact Us

Acknowledgments

Part I. The Fundamentals of Machine Learning

Chapter 1. The Machine Learning Landscape

What Is Machine Learning?

Why Use Machine Learning?

Types of Machine Learning Systems

Supervised/Unsupervised Learning

Batch and Online Learning

Instance-Based Versus Model-Based Learning

Main Challenges of Machine Learning

Insufficient Quantity of Training Data

Nonrepresentative Training Data

Poor-Quality Data

Irrelevant Features

Overfitting the Training Data

Underfitting the Training Data

Stepping Back

Testing and Validating

Exercises

Chapter 2. End-to-End Machine Learning Project

Working with Real Data

Look at the Big Picture

Frame the Problem

Select a Performance Measure

Check the Assumptions

Get the Data

Create the Workspace

Download the Data

Take a Quick Look at the Data Structure

Create a Test Set

Discover and Visualize the Data to Gain Insights

Visualizing Geographical Data

Looking for Correlations

Experimenting with Attribute Combinations

Prepare the Data for Machine Learning Algorithms

Data Cleaning

Handling Text and Categorical Attributes

Custom Transformers

Feature Scaling

Transformation Pipelines

Select and Train a Model

Training and Evaluating on the Training Set

Better Evaluation Using Cross-Validation

Fine-Tune Your Model

Grid Search

Randomized Search

Ensemble Methods

Analyze the Best Models and Their Errors

Evaluate Your System on the Test Set

Launch, Monitor, and Maintain Your System

Try It Out!

Exercises

Chapter 3. Classification

MNIST

Training a Binary Classifier

Performance Measures

Measuring Accuracy Using Cross-Validation

Confusion Matrix

Precision and Recall

Precision/Recall Tradeoff

The ROC Curve

Multiclass Classification

Error Analysis

Multilabel Classification

Multioutput Classification

Exercises

Chapter 4. Training Models

Linear Regression

The Normal Equation

Computational Complexity

Gradient Descent

Batch Gradient Descent

Stochastic Gradient Descent

Mini-batch Gradient Descent

Polynomial Regression

Learning Curves

Regularized Linear Models

Ridge Regression

Lasso Regression

Elastic Net

Early Stopping

Logistic Regression

Estimating Probabilities

Training and Cost Function

Decision Boundaries

Softmax Regression

Exercises

Chapter 5. Support Vector Machines

Linear SVM Classification

Soft Margin Classification

Nonlinear SVM Classification

Polynomial Kernel

Adding Similarity Features

Gaussian RBF Kernel

Computational Complexity

SVM Regression

Under the Hood

Decision Function and Predictions

Training Objective

Quadratic Programming

The Dual Problem

Kernelized SVM

Online SVMs

Exercises

Chapter 6. Decision Trees

Training and Visualizing a Decision Tree

Making Predictions

Estimating Class Probabilities

The CART Training Algorithm

Computational Complexity

Gini Impurity or Entropy?

Regularization Hyperparameters

Regression

Instability

Exercises

Chapter 7. Ensemble Learning and Random Forests

Voting Classifiers

Bagging and Pasting

Bagging and Pasting in Scikit-Learn

Out-of-Bag Evaluation

Random Patches and Random Subspaces

Random Forests

Extra-Trees

Feature Importance

Boosting

AdaBoost

Gradient Boosting

Stacking

Exercises

Chapter 8. Dimensionality Reduction

The Curse of Dimensionality

Main Approaches for Dimensionality Reduction

Projection

Manifold Learning

PCA

Preserving the Variance

Principal Components

Projecting Down to d Dimensions

Using Scikit-Learn

Explained Variance Ratio

Choosing the Right Number of Dimensions

PCA for Compression

Incremental PCA

Randomized PCA

Kernel PCA

Selecting a Kernel and Tuning Hyperparameters

LLE

Other Dimensionality Reduction Techniques

Exercises

Part II. Neural Networks and Deep Learning

Chapter 9. Up and Running with TensorFlow

Installation

Creating Your First Graph and Running It in a Session

Managing Graphs

Lifecycle of a Node Value

Linear Regression with TensorFlow

Implementing Gradient Descent

Manually Computing the Gradients

Using autodiff

Using an Optimizer

Feeding Data to the Training Algorithm

Saving and Restoring Models

Visualizing the Graph and Training Curves Using TensorBoard

Name Scopes

Modularity

Sharing Variables

Exercises

Chapter 10. Introduction to Artificial Neural Networks

From Biological to Artificial Neurons

Biological Neurons

Logical Computations with Neurons

The Perceptron

Multi-Layer Perceptron and Backpropagation

Training an MLP with TensorFlow’s High-Level API

Training a DNN Using Plain TensorFlow

Construction Phase

Execution Phase

Using the Neural Network

Fine-Tuning Neural Network Hyperparameters

Number of Hidden Layers

Number of Neurons per Hidden Layer

Activation Functions

Exercises

Chapter 11. Training Deep Neural Nets

Vanishing/Exploding Gradients Problems

Xavier and He Initialization

Nonsaturating Activation Functions

Batch Normalization

Gradient Clipping

Reusing Pretrained Layers

Reusing a TensorFlow Model

Reusing Models from Other Frameworks

Freezing the Lower Layers

Caching the Frozen Layers

Tweaking, Dropping, or Replacing the Upper Layers

Model Zoos

Unsupervised Pretraining

Pretraining on an Auxiliary Task

Faster Optimizers

Momentum Optimization

Nesterov Accelerated Gradient

AdaGrad

RMSProp

Adam Optimization

Learning Rate Scheduling

Avoiding Overfitting Through Regularization

Early Stopping

ℓ1 and ℓ2 Regularization

Dropout

Max-Norm Regularization

Data Augmentation

Practical Guidelines

Exercises

Chapter 12. Distributing TensorFlow Across Devices and Servers

Multiple Devices on a Single Machine

Installation

Managing the GPU RAM

Placing Operations on Devices

Parallel Execution

Control Dependencies

Multiple Devices Across Multiple Servers

Opening a Session

The Master and Worker Services

Pinning Operations Across Tasks

Sharding Variables Across Multiple Parameter Servers

Sharing State Across Sessions Using Resource Containers

Asynchronous Communication Using TensorFlow Queues

Loading Data Directly from the Graph

Parallelizing Neural Networks on a TensorFlow Cluster

One Neural Network per Device

In-Graph Versus Between-Graph Replication

Model Parallelism

Data Parallelism

Exercises

Chapter 13. Convolutional Neural Networks

The Architecture of the Visual Cortex

Convolutional Layer

Filters

Stacking Multiple Feature Maps

TensorFlow Implementation

Memory Requirements

Pooling Layer

CNN Architectures

LeNet-5

AlexNet

GoogLeNet

ResNet

Exercises

Chapter 14. Recurrent Neural Networks

Recurrent Neurons

Memory Cells

Input and Output Sequences

Basic RNNs in TensorFlow

Static Unrolling Through Time

Dynamic Unrolling Through Time

Handling Variable-Length Input Sequences

Handling Variable-Length Output Sequences

Training RNNs

Training a Sequence Classifier

Training to Predict Time Series

Creative RNN

Deep RNNs

Distributing a Deep RNN Across Multiple GPUs

Applying Dropout

The Difficulty of Training over Many Time Steps

LSTM Cell

Peephole Connections

GRU Cell

Natural Language Processing

Word Embeddings

An Encoder–Decoder Network for Machine Translation

Exercises

Chapter 15. Autoencoders

Efficient Data Representations

Performing PCA with an Undercomplete Linear Autoencoder

Stacked Autoencoders

TensorFlow Implementation

Tying Weights

Training One Autoencoder at a Time

Visualizing the Reconstructions

Visualizing Features

Unsupervised Pretraining Using Stacked Autoencoders

Denoising Autoencoders

TensorFlow Implementation

Sparse Autoencoders

TensorFlow Implementation

Variational Autoencoders

Generating Digits

Other Autoencoders

Exercises

Chapter 16. Reinforcement Learning

Learning to Optimize Rewards

Policy Search

Introduction to OpenAI Gym

Neural Network Policies

Evaluating Actions: The Credit Assignment Problem

Policy Gradients

Markov Decision Processes

Temporal Difference Learning and Q-Learning

Exploration Policies

Approximate Q-Learning

Learning to Play Ms. Pac-Man Using Deep Q-Learning

Exercises

Thank You!

Appendix A. Exercise Solutions

Chapter 1: The Machine Learning Landscape

Chapter 2: End-to-End Machine Learning Project

Chapter 3: Classification

Chapter 4: Training Models

Chapter 5: Support Vector Machines

Chapter 6: Decision Trees

Chapter 7: Ensemble Learning and Random Forests

Chapter 8: Dimensionality Reduction

Chapter 9: Up and Running with TensorFlow

Chapter 10: Introduction to Artificial Neural Networks

Chapter 11: Training Deep Neural Nets

Chapter 12: Distributing TensorFlow Across Devices and Servers

Chapter 13: Convolutional Neural Networks

Chapter 14: Recurrent Neural Networks

Chapter 15: Autoencoders

Chapter 16: Reinforcement Learning

Appendix B. Machine Learning Project Checklist

Frame the Problem and Look at the Big Picture

Get the Data

Explore the Data

Prepare the Data

Short-List Promising Models

Fine-Tune the System

Present Your Solution

Launch!

Appendix C. SVM Dual Problem

Appendix D. Autodiff

Manual Differentiation

Symbolic Differentiation

Numerical Differentiation

Forward-Mode Autodiff

Reverse-Mode Autodiff

Appendix E. Other Popular ANN Architectures

Hopfield Networks

Boltzmann Machines

Restricted Boltzmann Machines

Deep Belief Nets

Self-Organizing Maps

Index

About the Author

Colophon

Hands-On Machine Learning with Scikit-Learn and TensorFlow

Concepts, Tools, and Techniques to Build Intelligent Systems

Aurélien Géron

Beijing, Boston, Farnham, Sebastopol, Tokyo

Hands-On Machine Learning with Scikit-Learn and TensorFlow

by Aurélien Géron

Copyright © 2017 Aurélien Géron. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Nicole Tache

Production Editor: Nicholas Adams

Copyeditor: Rachel Monaghan

Proofreader: Charles Roumeliotis

Indexer: Wendy Catalano

Interior Designer: David Futato

Cover Designer: Randy Comer

Illustrator: Rebecca Demarest

March 2017: First Edition

Revision History for the First Edition

2017-03-10: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491962299 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Hands-On Machine Learning with Scikit-Learn and TensorFlow, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-96229-9

[LSI]

