Data Science for Business.pdf

Published: 2022-06-06 | Uploaded by: admin | Category: Manuals | File size: 15.81M | Format: pdf

Table of Contents

Preface

Our Conceptual Approach to Data Science

To the Instructor

Other Skills and Concepts

Sections and Notation

Using Examples

Safari® Books Online

How to Contact Us

Acknowledgments

Chapter 1. Introduction: Data-Analytic Thinking

The Ubiquity of Data Opportunities

Example: Hurricane Frances

Example: Predicting Customer Churn

Data Science, Engineering, and Data-Driven Decision Making

Data Processing and “Big Data”

From Big Data 1.0 to Big Data 2.0

Data and Data Science Capability as a Strategic Asset

Data-Analytic Thinking

This Book

Data Mining and Data Science, Revisited

Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist

Summary

Chapter 2. Business Problems and Data Science Solutions

From Business Problems to Data Mining Tasks

Supervised Versus Unsupervised Methods

Data Mining and Its Results

The Data Mining Process

Business Understanding

Data Understanding

Data Preparation

Modeling

Evaluation

Deployment

Implications for Managing the Data Science Team

Other Analytics Techniques and Technologies

Statistics

Database Querying

Data Warehousing

Regression Analysis

Machine Learning and Data Mining

Answering Business Questions with These Techniques

Summary

Chapter 3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation

Models, Induction, and Prediction

Supervised Segmentation

Selecting Informative Attributes

Example: Attribute Selection with Information Gain

Supervised Segmentation with Tree-Structured Models

Visualizing Segmentations

Trees as Sets of Rules

Probability Estimation

Example: Addressing the Churn Problem with Tree Induction

Summary

Chapter 4. Fitting a Model to Data

Classification via Mathematical Functions

Linear Discriminant Functions

Optimizing an Objective Function

An Example of Mining a Linear Discriminant from Data

Linear Discriminant Functions for Scoring and Ranking Instances

Support Vector Machines, Briefly

Regression via Mathematical Functions

Class Probability Estimation and Logistic “Regression”

* Logistic Regression: Some Technical Details

Example: Logistic Regression versus Tree Induction

Nonlinear Functions, Support Vector Machines, and Neural Networks

Summary

Chapter 5. Overfitting and Its Avoidance

Generalization

Overfitting

Overfitting Examined

Holdout Data and Fitting Graphs

Overfitting in Tree Induction

Overfitting in Mathematical Functions

Example: Overfitting Linear Functions

* Example: Why Is Overfitting Bad?

From Holdout Evaluation to Cross-Validation

The Churn Dataset Revisited

Learning Curves

Overfitting Avoidance and Complexity Control

Avoiding Overfitting with Tree Induction

A General Method for Avoiding Overfitting

* Avoiding Overfitting for Parameter Optimization

Summary

Chapter 6. Similarity, Neighbors, and Clusters

Similarity and Distance

Nearest-Neighbor Reasoning

Example: Whiskey Analytics

Nearest Neighbors for Predictive Modeling

How Many Neighbors and How Much Influence?

Geometric Interpretation, Overfitting, and Complexity Control

Issues with Nearest-Neighbor Methods

Some Important Technical Details Relating to Similarities and Neighbors

Heterogeneous Attributes

* Other Distance Functions

* Combining Functions: Calculating Scores from Neighbors

Clustering

Example: Whiskey Analytics Revisited

Hierarchical Clustering

Nearest Neighbors Revisited: Clustering Around Centroids

Example: Clustering Business News Stories

Understanding the Results of Clustering

* Using Supervised Learning to Generate Cluster Descriptions

Stepping Back: Solving a Business Problem Versus Data Exploration

Summary

Chapter 7. Decision Analytic Thinking I: What Is a Good Model?

Evaluating Classifiers

Plain Accuracy and Its Problems

The Confusion Matrix

Problems with Unbalanced Classes

Problems with Unequal Costs and Benefits

Generalizing Beyond Classification

A Key Analytical Framework: Expected Value

Using Expected Value to Frame Classifier Use

Using Expected Value to Frame Classifier Evaluation

Evaluation, Baseline Performance, and Implications for Investments in Data

Summary

Chapter 8. Visualizing Model Performance

Ranking Instead of Classifying

Profit Curves

ROC Graphs and Curves

The Area Under the ROC Curve (AUC)

Cumulative Response and Lift Curves

Example: Performance Analytics for Churn Modeling

Summary

Chapter 9. Evidence and Probabilities

Example: Targeting Online Consumers With Advertisements

Combining Evidence Probabilistically

Joint Probability and Independence

Bayes’ Rule

Applying Bayes’ Rule to Data Science

Conditional Independence and Naive Bayes

Advantages and Disadvantages of Naive Bayes

A Model of Evidence “Lift”

Example: Evidence Lifts from Facebook “Likes”

Evidence in Action: Targeting Consumers with Ads

Summary

Chapter 10. Representing and Mining Text

Why Text Is Important

Why Text Is Difficult

Representation

Bag of Words

Term Frequency

Measuring Sparseness: Inverse Document Frequency

Combining Them: TFIDF

Example: Jazz Musicians

* The Relationship of IDF to Entropy

Beyond Bag of Words

N-gram Sequences

Named Entity Extraction

Topic Models

Example: Mining News Stories to Predict Stock Price Movement

The Task

The Data

Data Preprocessing

Results

Summary

Chapter 11. Decision Analytic Thinking II: Toward Analytical Engineering

Targeting the Best Prospects for a Charity Mailing

The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces

A Brief Digression on Selection Bias

Our Churn Example Revisited with Even More Sophistication

The Expected Value Framework: Structuring a More Complicated Business Problem

Assessing the Influence of the Incentive

From an Expected Value Decomposition to a Data Science Solution

Summary

Chapter 12. Other Data Science Tasks and Techniques

Co-occurrences and Associations: Finding Items That Go Together

Measuring Surprise: Lift and Leverage

Example: Beer and Lottery Tickets

Associations Among Facebook Likes

Profiling: Finding Typical Behavior

Link Prediction and Social Recommendation

Data Reduction, Latent Information, and Movie Recommendation

Bias, Variance, and Ensemble Methods

Data-Driven Causal Explanation and a Viral Marketing Example

Summary

Chapter 13. Data Science and Business Strategy

Thinking Data-Analytically, Redux

Achieving Competitive Advantage with Data Science

Sustaining Competitive Advantage with Data Science

Formidable Historical Advantage

Unique Intellectual Property

Unique Intangible Collateral Assets

Superior Data Scientists

Superior Data Science Management

Attracting and Nurturing Data Scientists and Their Teams

Examine Data Science Case Studies

Be Ready to Accept Creative Ideas from Any Source

Be Ready to Evaluate Proposals for Data Science Projects

Example Data Mining Proposal

Flaws in the Big Red Proposal

A Firm’s Data Science Maturity

Chapter 14. Conclusion

The Fundamental Concepts of Data Science

Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data

Changing the Way We Think about Solutions to Business Problems

What Data Can’t Do: Humans in the Loop, Revisited

Privacy, Ethics, and Mining Data About Individuals

Is There More to Data Science?

Final Example: From Crowd-Sourcing to Cloud-Sourcing

Final Words

Appendix A. Proposal Review Guide

Business and Data Understanding

Data Preparation

Modeling

Evaluation and Deployment

Appendix B. Another Sample Proposal

Scenario and Proposal

Flaws in the GGC Proposal

Glossary

Bibliography

Index

About the Authors

www.it-ebooks.info

Praise

“A must-read resource for anyone who is serious about embracing the opportunity of big data.”
Craig Vaughan, Global Vice President at SAP

“This timely book says out loud what has finally become apparent: in the modern world, Data is Business, and you can no longer think business without thinking data. Read this book and you will understand the Science behind thinking data.”
Ron Bekkerman, Chief Data Officer at Carmel Ventures

“A great book for business managers who lead or interact with data scientists, who wish to better understand the principles and algorithms available without the technical details of single-disciplinary books.”
Ronny Kohavi, Partner Architect at Microsoft Online Services Division

“Provost and Fawcett have distilled their mastery of both the art and science of real-world data analysis into an unrivalled introduction to the field.”
Geoff Webb, Editor-in-Chief of Data Mining and Knowledge Discovery Journal

“I would love it if everyone I had to work with had read this book.”
Claudia Perlich, Chief Scientist of M6D (Media6Degrees) and Advertising Research Foundation Innovation Award Grand Winner (2013)

“A foundational piece in the fast developing world of Data Science. A must read for anyone interested in the Big Data revolution.”
Justin Gapper, Business Unit Analytics Manager at Teledyne Scientific and Imaging

“The authors, both renowned experts in data science before it had a name, have taken a complex topic and made it accessible to all levels, but mostly helpful to the budding data scientist. As far as I know, this is the first book of its kind, with a focus on data science concepts as applied to practical business problems. It is liberally sprinkled with compelling real-world examples outlining familiar, accessible problems in the business world: customer churn, targeted marketing, even whiskey analytics! The book is unique in that it does not give a cookbook of algorithms, rather it helps the reader understand the underlying concepts behind data science, and most importantly how to approach and be successful at problem solving. Whether you are looking for a good comprehensive overview of data science or are a budding data scientist in need of the basics, this is a must-read.”
Chris Volinsky, Director of Statistics Research at AT&T Labs and Winning Team Member for the $1 Million Netflix Challenge

“This book goes beyond data analytics 101. It’s the essential guide for those of us (all of us?) whose businesses are built on the ubiquity of data opportunities and the new mandate for data-driven decision-making.”
Tom Phillips, CEO of Media6Degrees and Former Head of Google Search and Analytics

“Intelligent use of data has become a force powering business to new levels of competitiveness. To thrive in this data-driven ecosystem, engineers, analysts, and managers alike must understand the options, design choices, and tradeoffs before them. With motivating examples, clear exposition, and a breadth of details covering not only the “hows” but the “whys”, Data Science for Business is the perfect primer for those wishing to become involved in the development and application of data-driven systems.”
Josh Attenberg, Data Science Lead at Etsy

“Data is the foundation of new waves of productivity growth, innovation, and richer customer insight. Only recently viewed broadly as a source of competitive advantage, dealing well with data is rapidly becoming table stakes to stay in the game. The authors’ deep applied experience makes this a must read—a window into your competitor’s strategy.”
Alan Murray, Serial Entrepreneur; Partner at Coriolis Ventures

“One of the best data mining books, which helped me think through various ideas on liquidity analysis in the FX business. The examples are excellent and help you take a deep dive into the subject! This one is going to be on my shelf for lifetime!”
Nidhi Kathuria, Vice President of FX at Royal Bank of Scotland

Data Science for Business

Foster Provost and Tom Fawcett

Data Science for Business
by Foster Provost and Tom Fawcett

Copyright © 2013 Foster Provost and Tom Fawcett. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Mike Loukides and Meghan Blanchette
Production Editor: Christopher Hearse
Proofreader: Kiel Van Horn
Indexer: WordCo Indexing Services, Inc.
Cover Designer: Mark Paglietti
Interior Designer: David Futato
Illustrator: Rebecca Demarest

July 2013: First Edition

Revision History for the First Edition:
2013-07-25: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449361327 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps. Data Science for Business is a trademark of Foster Provost and Tom Fawcett.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-36132-7
[LSI]
