
Data Science.pdf

Download from finelybook 7450911@qq.com

The MIT Press Essential Knowledge Series

Auctions, Timothy P. Hubbard and Harry J. Paarsch
The Book, Amaranth Borsuk
Cloud Computing, Nayan Ruparelia
Computing: A Concise History, Paul E. Ceruzzi
The Conscious Mind, Zoltan L. Torey
Crowdsourcing, Daren C. Brabham
Data Science, John D. Kelleher and Brendan Tierney
Free Will, Mark Balaguer
The Future, Nick Montfort
Information and Society, Michael Buckland
Information and the Modern Corporation, James W. Cortada
Intellectual Property Strategy, John Palfrey
The Internet of Things, Samuel Greengard
Machine Learning: The New AI, Ethem Alpaydin
Machine Translation, Thierry Poibeau
Memes in Digital Culture, Limor Shifman
Metadata, Jeffrey Pomerantz
The Mind–Body Problem, Jonathan Westphal
MOOCs, Jonathan Haber
Neuroplasticity, Moheb Costandi
Open Access, Peter Suber
Paradox, Margaret Cuonzo
Post-Truth, Lee McIntyre
Robots, John Jordan
Self-Tracking, Gina Neff and Dawn Nafus
Sustainability, Kent E. Portney
Synesthesia, Richard E. Cytowic
The Technological Singularity, Murray Shanahan
Understanding Beliefs, Nils J. Nilsson
Waves, Frederic Raichlen

Data Science

John D. Kelleher and Brendan Tierney

The MIT Press
Cambridge, Massachusetts
London, England

© 2018 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This book was set in Chaparral Pro by Toppan Best-set Premedia Limited. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Names: Kelleher, John D., 1974- author. | Tierney, Brendan, 1970- author.
Title: Data science / John D. Kelleher and Brendan Tierney.
Description: Cambridge, MA : The MIT Press, [2018] | Series: The MIT Press essential knowledge series | Includes bibliographical references and index.
Identifiers: LCCN 2017043665 | ISBN 9780262535434 (pbk. : alk. paper)
eISBN 9780262347013
Subjects: LCSH: Big data. | Machine learning. | Data mining. | Quantitative research.
Classification: LCC QA76.9.B45 K45 2018 | DDC 005.7--dc23
LC record available at https://lccn.loc.gov/2017043665

ePub Version 1.0

Table of Contents

Series page
Title page
Copyright page
Series Foreword
Preface
Acknowledgments
1 What Is Data Science?
2 What Are Data, and What Is a Data Set?
3 A Data Science Ecosystem
4 Machine Learning 101
5 Standard Data Science Tasks
6 Privacy and Ethics
7 Future Trends and Principles of Success
Glossary
Further Readings
References
Index
About Author

List of Tables

Table 1 A Data Set of Classic Books
Table 2 Diabetes Study Data Set
Table 3 A Data Set of Emails: Spam or Not Spam?

List of Illustrations

Figure 1 A skills-set desideratum for a data scientist.
Figure 2 The DIKW pyramid (adapted from Kitchin 2014a).
Figure 3 Data science pyramid (adapted from Han, Kamber, and Pei 2011).
Figure 4 The CRISP-DM life cycle (based on figure 2 in Chapman, Clinton, Kerber, et al. 1999).
Figure 5 The CRISP-DM stages and tasks (based on figure 3 in Chapman, Clinton, Kerber, et al. 1999).
Figure 6 A typical small-data and big-data architecture for data science (inspired by a figure from the Hortonworks newsletter, April 23, 2013, https://hortonworks.com/blog/hadoop-and-the-data-warehouse-when-to-use-which).
Figure 7 The traditional process for building predictive models and scoring data.
Figure 8 Databases, data warehousing, and Hadoop working together (inspired by a figure in the Gluent data platform white paper, 2017, https://gluent.com/wp-content/uploads/2017/09/Gluent-Overview.pdf).
Figure 9 Scatterplots of shoe size and height, weight and exercise, and shoe size and exercise.
Figure 10 Scatterplots of the likelihood of diabetes with respect to height, weight, and BMI.
Figure 11 (a) The best-fit regression line for the model “Diabetes = −7.38431 + 0.55593 BMI.” (b) The dashed vertical lines illustrate the residual for each instance.
Figure 12 Mapping the logistic and tanh functions as applied to the input x.
Figure 13 A simple neural network.
Figure 14 A neural network that predicts a person’s fitness level.
Figure 15 A deep neural network.
Figure 16 A decision tree for determining whether an email is spam or not.
Figure 17 Creating the root node in the tree.
Figure 18 Adding the second node to the tree.