Data Science

Download from finelybook 7450911@qq.com

The MIT Press Essential Knowledge Series

Auctions, Timothy P. Hubbard and Harry J. Paarsch
The Book, Amaranth Borsuk
Cloud Computing, Nayan Ruparelia
Computing: A Concise History, Paul E. Ceruzzi
The Conscious Mind, Zoltan L. Torey
Crowdsourcing, Daren C. Brabham
Data Science, John D. Kelleher and Brendan Tierney
Free Will, Mark Balaguer
The Future, Nick Montfort
Information and Society, Michael Buckland
Information and the Modern Corporation, James W. Cortada
Intellectual Property Strategy, John Palfrey
The Internet of Things, Samuel Greengard
Machine Learning: The New AI, Ethem Alpaydin
Machine Translation, Thierry Poibeau
Memes in Digital Culture, Limor Shifman
Metadata, Jeffrey Pomerantz
The Mind–Body Problem, Jonathan Westphal
MOOCs, Jonathan Haber
Neuroplasticity, Moheb Costandi
Open Access, Peter Suber
Paradox, Margaret Cuonzo
Post-Truth, Lee McIntyre
Robots, John Jordan
Self-Tracking, Gina Neff and Dawn Nafus
Sustainability, Kent E. Portney
Synesthesia, Richard E. Cytowic
The Technological Singularity, Murray Shanahan
Understanding Beliefs, Nils J. Nilsson
Waves, Frederic Raichlen

Data Science

John D. Kelleher and Brendan Tierney

The MIT Press
Cambridge, Massachusetts
London, England

© 2018 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This book was set in Chaparral Pro by Toppan Best-set Premedia Limited. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data
Names: Kelleher, John D., 1974- author. | Tierney, Brendan, 1970- author.
Title: Data science / John D. Kelleher and Brendan Tierney.
Description: Cambridge, MA : The MIT Press, [2018] | Series: The MIT Press essential knowledge series | Includes bibliographical references and index.
Identifiers: LCCN 2017043665 | ISBN 9780262535434 (pbk. : alk. paper)
eISBN 9780262347013
Subjects: LCSH: Big data. | Machine learning. | Data mining. | Quantitative research.
Classification: LCC QA76.9.B45 K45 2018 | DDC 005.7--dc23
LC record available at https://lccn.loc.gov/2017043665

ePub Version 1.0

Table of Contents

Series page
Title page
Copyright page
Series Foreword
Preface
Acknowledgments
1 What Is Data Science?
2 What Are Data, and What Is a Data Set?
3 A Data Science Ecosystem
4 Machine Learning 101
5 Standard Data Science Tasks
6 Privacy and Ethics
7 Future Trends and Principles of Success
Glossary
Further Readings
References
Index
About Author

List of Tables

Table 1 A Data Set of Classic Books
Table 2 Diabetes Study Data Set
Table 3 A Data Set of Emails: Spam or Not Spam?

List of Illustrations

Figure 1 A skills-set desideratum for a data scientist.
Figure 2 The DIKW pyramid (adapted from Kitchin 2014a).
Figure 3 Data science pyramid (adapted from Han, Kamber, and Pei 2011).
Figure 4 The CRISP-DM life cycle (based on figure 2 in Chapman, Clinton, Kerber, et al. 1999).
Figure 5 The CRISP-DM stages and tasks (based on figure 3 in Chapman, Clinton, Kerber, et al. 1999).
Figure 6 A typical small-data and big-data architecture for data science (inspired by a figure from the Hortonworks newsletter, April 23, 2013, https://hortonworks.com/blog/hadoop-and-the-data-warehouse-when-to-use-which).
Figure 7 The traditional process for building predictive models and scoring data.
Figure 8 Databases, data warehousing, and Hadoop working together (inspired by a figure in the Gluent data platform white paper, 2017, https://gluent.com/wp-content/uploads/2017/09/Gluent-Overview.pdf).
Figure 9 Scatterplots of shoe size and height, weight and exercise, and shoe size and exercise.
Figure 10 Scatterplots of the likelihood of diabetes with respect to height, weight, and BMI.
Figure 11 (a) The best-fit regression line for the model “Diabetes = −7.38431 + 0.55593 BMI.” (b) The dashed vertical lines illustrate the residual for each instance.
Figure 12 Mapping the logistic and tanh functions as applied to the input x.
Figure 13 A simple neural network.
Figure 14 A neural network that predicts a person’s fitness level.
Figure 15 A deep neural network.
Figure 16 A decision tree for determining whether an email is spam or not.
Figure 17 Creating the root node in the tree.
Figure 18 Adding the second node to the tree.
