

Machine Learning in Python®
Contents
Introduction
Chapter 1 The Two Essential Algorithms for Making Predictions
Why Are These Two Algorithms So Useful?
What Are Penalized Regression Methods?
What Are Ensemble Methods?
How to Decide Which Algorithm to Use
The Process Steps for Building a Predictive Model
Framing a Machine Learning Problem
Feature Extraction and Feature Engineering
Determining Performance of a Trained Model
Chapter Contents and Dependencies
Summary
Chapter 2 Understand the Problem by Understanding the Data
The Anatomy of a New Problem
Different Types of Attributes and Labels Drive Modeling Choices
Things to Notice about Your New Data Set
Classification Problems: Detecting Unexploded Mines Using Sonar
Physical Characteristics of the Rocks Versus Mines Data Set
Statistical Summaries of the Rocks versus Mines Data Set
Visualization of Outliers Using Quantile-Quantile Plot
Statistical Characterization of Categorical Attributes
How to Use Python Pandas to Summarize the Rocks Versus Mines Data Set
Visualizing Properties of the Rocks versus Mines Data Set
Visualizing with Parallel Coordinates Plots
Visualizing Interrelationships between Attributes and Labels
Visualizing Attribute and Label Correlations Using a Heat Map
Summarizing the Process for Understanding Rocks versus Mines Data Set
Real-Valued Predictions with Factor Variables: How Old Is Your Abalone?
Parallel Coordinates for Regression Problems—Visualize Variable Relationships for Abalone Problem
How to Use Correlation Heat Map for Regression—Visualize Pair-Wise Correlations for the Abalone Problem
Real-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes
Multiclass Classification Problem: What Type of Glass Is That?
Summary
Chapter 3 Predictive Model Building: Balancing Performance, Complexity, and Big Data
The Basic Problem: Understanding Function Approximation
Working with Training Data
Assessing Performance of Predictive Models
Factors Driving Algorithm Choices and Performance—Complexity and Data
Contrast Between a Simple Problem and a Complex Problem
Contrast Between a Simple Model and a Complex Model
Factors Driving Predictive Algorithm Performance
Choosing an Algorithm: Linear or Nonlinear?
Measuring the Performance of Predictive Models
Performance Measures for Different Types of Problems
Simulating Performance of Deployed Models
Achieving Harmony Between Model and Data
Choosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size
Using Forward Stepwise Regression to Control Overfitting
Evaluating and Understanding Your Predictive Model
Control Overfitting by Penalizing Regression Coefficients—Ridge Regression
Summary
Chapter 4 Penalized Linear Regression
Why Penalized Linear Regression Methods Are So Useful
Extremely Fast Coefficient Estimation
Variable Importance Information
Extremely Fast Evaluation When Deployed
Reliable Performance
Sparse Solutions
Problem May Require Linear Model
When to Use Ensemble Methods
Penalized Linear Regression: Regulating Linear Regression for Optimum Performance
Training Linear Models: Minimizing Errors and More
Adding a Coefficient Penalty to the OLS Formulation
Other Useful Coefficient Penalties—Manhattan and ElasticNet
Why Lasso Penalty Leads to Sparse Coefficient Vectors
ElasticNet Penalty Includes Both Lasso and Ridge
Solving the Penalized Linear Regression Problem
Understanding Least Angle Regression and Its Relationship to Forward Stepwise Regression
How LARS Generates Hundreds of Models of Varying Complexity
Choosing the Best Model from the Hundreds LARS Generates
Using Glmnet: Very Fast and Very General
Comparison of the Mechanics of Glmnet and LARS Algorithms
Initializing and Iterating the Glmnet Algorithm
Extensions to Linear Regression with Numeric Input
Solving Classification Problems with Penalized Regression
Working with Classification Problems Having More Than Two Outcomes
Understanding Basis Expansion: Using Linear Methods on Nonlinear Problems
Incorporating Non-Numeric Attributes into Linear Methods
Summary
Chapter 5 Building Predictive Models Using Penalized Linear Methods
Python Packages for Penalized Linear Regression
Multivariable Regression: Predicting Wine Taste
Building and Testing a Model to Predict Wine Taste
Training on the Whole Data Set before Deployment
Basis Expansion: Improving Performance by Creating New Variables from Old Ones
Binary Classification: Using Penalized Linear Regression to Detect Unexploded Mines
Build a Rocks versus Mines Classifier for Deployment
Multiclass Classification: Classifying Crime Scene Glass Samples
Summary
Chapter 6 Ensemble Methods
Binary Decision Trees
How a Binary Decision Tree Generates Predictions
How to Train a Binary Decision Tree
Tree Training Equals Split Point Selection
How Split Point Selection Affects Predictions
Algorithm for Selecting Split Points
Multivariable Tree Training—Which Attribute to Split?
Recursive Splitting for More Tree Depth
Overfitting Binary Trees
Measuring Overfit with Binary Trees
Balancing Binary Tree Complexity for Best Performance
Modifications for Classification and Categorical Features
Bootstrap Aggregation: “Bagging”
How Does the Bagging Algorithm Work?
Bagging Performance—Bias versus Variance
How Bagging Behaves on a Multivariable Problem
Bagging Needs Tree Depth for Performance
Summary of Bagging
Gradient Boosting
Basic Principle of Gradient Boosting Algorithm
Parameter Settings for Gradient Boosting
How Gradient Boosting Iterates Toward a Predictive Model
Getting the Best Performance from Gradient Boosting
Gradient Boosting on a Multivariable Problem
Summary for Gradient Boosting
Random Forest
Random Forests: Bagging Plus Random Attribute Subsets
Random Forests Performance Drivers
Random Forests Summary
Summary
Chapter 7 Building Ensemble Models with Python
Solving Regression Problems with Python Ensemble Packages
Building a Random Forest Model to Predict Wine Taste
Constructing a RandomForestRegressor Object
Modeling Wine Taste with RandomForestRegressor
Visualizing the Performance of a Random Forests Regression Model
Using Gradient Boosting to Predict Wine Taste
Using the Class Constructor for GradientBoostingRegressor
Using GradientBoostingRegressor to Implement a Regression Model
Assessing the Performance of a Gradient Boosting Model
Coding Bagging to Predict Wine Taste
Incorporating Non-Numeric Attributes in Python Ensemble Models
Coding the Sex of Abalone for Input to Random Forest Regression in Python
Assessing Performance and the Importance of Coded Variables
Coding the Sex of Abalone for Gradient Boosting Regression in Python
Assessing Performance and the Importance of Coded Variables with Gradient Boosting
Solving Binary Classification Problems with Python Ensemble Methods
Detecting Unexploded Mines with Python Random Forest
Constructing a Random Forests Model to Detect Unexploded Mines
Determining the Performance of a Random Forests Classifier
Detecting Unexploded Mines with Python Gradient Boosting
Determining the Performance of a Gradient Boosting Classifier
Solving Multiclass Classification Problems with Python Ensemble Methods
Classifying Glass with Random Forests
Dealing with Class Imbalances
Classifying Glass Using Gradient Boosting
Assessing the Advantage of Using Random Forest Base Learners with Gradient Boosting
Comparing Algorithms
Summary
Index
EULA
Machine Learning in Python® Essential Techniques for Predictive Analysis Michael Bowles
Machine Learning in Python®: Essential Techniques for Predictive Analysis

Published by John Wiley & Sons, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com

Copyright © 2015 by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada

ISBN: 978-1-118-96174-2
ISBN: 978-1-118-96176-6 (ebk)
ISBN: 978-1-118-96175-9 (ebk)

Manufactured in the United States of America

10 9 8 7 6 5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom.
The fact that an organization or website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2015930541

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. Python is a registered trademark of Python Software Foundation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
To my children, Scott, Seth, and Cayley. Their blossoming lives and selves bring me more joy than anything else in this world. To my close friends David and Ron for their selfless generosity and steadfast friendship. To my friends and colleagues at Hacker Dojo in Mountain View, California, for their technical challenges and repartee. To my climbing partners. One of them, Katherine, says climbing partners make the best friends because “they see you paralyzed with fear, offer encouragement to overcome it, and celebrate when you do.”