Machine Learning in Python®
Contents
Introduction
Chapter 1 The Two Essential Algorithms for Making Predictions
Why Are These Two Algorithms So Useful?
What Are Penalized Regression Methods?
What Are Ensemble Methods?
How to Decide Which Algorithm to Use
The Process Steps for Building a Predictive Model
Framing a Machine Learning Problem
Feature Extraction and Feature Engineering
Determining Performance of a Trained Model
Chapter Contents and Dependencies
Summary
Chapter 2 Understand the Problem by Understanding the Data
The Anatomy of a New Problem
Different Types of Attributes and Labels Drive Modeling Choices
Things to Notice about Your New Data Set
Classification Problems: Detecting Unexploded Mines Using Sonar
Physical Characteristics of the Rocks versus Mines Data Set
Statistical Summaries of the Rocks versus Mines Data Set
Visualization of Outliers Using a Quantile-Quantile Plot
Statistical Characterization of Categorical Attributes
How to Use Python Pandas to Summarize the Rocks versus Mines Data Set
Visualizing Properties of the Rocks versus Mines Data Set
Visualizing with Parallel Coordinates Plots
Visualizing Interrelationships between Attributes and Labels
Visualizing Attribute and Label Correlations Using a Heat Map
Summarizing the Process for Understanding the Rocks versus Mines Data Set
Real-Valued Predictions with Factor Variables: How Old Is Your Abalone?
Parallel Coordinates for Regression Problems—Visualize Variable Relationships for Abalone Problem
How to Use a Correlation Heat Map for Regression—Visualize Pair-Wise Correlations for the Abalone Problem
Real-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes
Multiclass Classification Problem: What Type of Glass Is That?
Summary
Chapter 3 Predictive Model Building: Balancing Performance, Complexity, and Big Data
The Basic Problem: Understanding Function Approximation
Working with Training Data
Assessing Performance of Predictive Models
Factors Driving Algorithm Choices and Performance—Complexity and Data
Contrast Between a Simple Problem and a Complex Problem
Contrast Between a Simple Model and a Complex Model
Factors Driving Predictive Algorithm Performance
Choosing an Algorithm: Linear or Nonlinear?
Measuring the Performance of Predictive Models
Performance Measures for Different Types of Problems
Simulating Performance of Deployed Models
Achieving Harmony Between Model and Data
Choosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size
Using Forward Stepwise Regression to Control Overfitting
Evaluating and Understanding Your Predictive Model
Control Overfitting by Penalizing Regression Coefficients—Ridge Regression
Summary
Chapter 4 Penalized Linear Regression
Why Penalized Linear Regression Methods Are So Useful
Extremely Fast Coefficient Estimation
Variable Importance Information
Extremely Fast Evaluation When Deployed
Reliable Performance
Sparse Solutions
Problem May Require Linear Model
When to Use Ensemble Methods
Penalized Linear Regression: Regulating Linear Regression for Optimum Performance
Training Linear Models: Minimizing Errors and More
Adding a Coefficient Penalty to the OLS Formulation
Other Useful Coefficient Penalties—Manhattan and ElasticNet
Why Lasso Penalty Leads to Sparse Coefficient Vectors
ElasticNet Penalty Includes Both Lasso and Ridge
Solving the Penalized Linear Regression Problem
Understanding Least Angle Regression and Its Relationship to Forward Stepwise Regression
How LARS Generates Hundreds of Models of Varying Complexity
Choosing the Best Model from the Hundreds LARS Generates
Using Glmnet: Very Fast and Very General
Comparison of the Mechanics of Glmnet and LARS Algorithms
Initializing and Iterating the Glmnet Algorithm
Extensions to Linear Regression with Numeric Input
Solving Classification Problems with Penalized Regression
Working with Classification Problems Having More Than Two Outcomes
Understanding Basis Expansion: Using Linear Methods on Nonlinear Problems
Incorporating Non-Numeric Attributes into Linear Methods
Summary
Chapter 5 Building Predictive Models Using Penalized Linear Methods
Python Packages for Penalized Linear Regression
Multivariable Regression: Predicting Wine Taste
Building and Testing a Model to Predict Wine Taste
Training on the Whole Data Set before Deployment
Basis Expansion: Improving Performance by Creating New Variables from Old Ones
Binary Classification: Using Penalized Linear Regression to Detect Unexploded Mines
Building a Rocks versus Mines Classifier for Deployment
Multiclass Classification: Classifying Crime Scene Glass Samples
Summary
Chapter 6 Ensemble Methods
Binary Decision Trees
How a Binary Decision Tree Generates Predictions
How to Train a Binary Decision Tree
Tree Training Equals Split Point Selection
How Split Point Selection Affects Predictions
Algorithm for Selecting Split Points
Multivariable Tree Training—Which Attribute to Split?
Recursive Splitting for More Tree Depth
Overfitting Binary Trees
Measuring Overfit with Binary Trees
Balancing Binary Tree Complexity for Best Performance
Modifications for Classification and Categorical Features
Bootstrap Aggregation: “Bagging”
How Does the Bagging Algorithm Work?
Bagging Performance—Bias versus Variance
How Bagging Behaves on a Multivariable Problem
Bagging Needs Tree Depth for Performance
Summary of Bagging
Gradient Boosting
Basic Principle of Gradient Boosting Algorithm
Parameter Settings for Gradient Boosting
How Gradient Boosting Iterates Toward a Predictive Model
Getting the Best Performance from Gradient Boosting
Gradient Boosting on a Multivariable Problem
Summary for Gradient Boosting
Random Forest
Random Forests: Bagging Plus Random Attribute Subsets
Random Forests Performance Drivers
Random Forests Summary
Summary
Chapter 7 Building Ensemble Models with Python
Solving Regression Problems with Python Ensemble Packages
Building a Random Forest Model to Predict Wine Taste
Constructing a RandomForestRegressor Object
Modeling Wine Taste with RandomForestRegressor
Visualizing the Performance of a Random Forests Regression Model
Using Gradient Boosting to Predict Wine Taste
Using the Class Constructor for GradientBoostingRegressor
Using GradientBoostingRegressor to Implement a Regression Model
Assessing the Performance of a Gradient Boosting Model
Coding Bagging to Predict Wine Taste
Incorporating Non-Numeric Attributes in Python Ensemble Models
Coding the Sex of Abalone for Input to Random Forest Regression in Python
Assessing Performance and the Importance of Coded Variables
Coding the Sex of Abalone for Gradient Boosting Regression in Python
Assessing Performance and the Importance of Coded Variables with Gradient Boosting
Solving Binary Classification Problems with Python Ensemble Methods
Detecting Unexploded Mines with Python Random Forest
Constructing a Random Forests Model to Detect Unexploded Mines
Determining the Performance of a Random Forests Classifier
Detecting Unexploded Mines with Python Gradient Boosting
Determining the Performance of a Gradient Boosting Classifier
Solving Multiclass Classification Problems with Python Ensemble Methods
Classifying Glass with Random Forests
Dealing with Class Imbalances
Classifying Glass Using Gradient Boosting
Assessing the Advantage of Using Random Forest Base Learners with Gradient Boosting
Comparing Algorithms
Summary
Index
EULA