The Work, Computer Age Statistical Inference, was first published by Cambridge University Press. © in the Work, Bradley Efron and Trevor Hastie, 2016. Cambridge University Press’s catalogue entry for the Work can be found at http://www.cambridge.org/9781107149892. NB: The copy of the Work, as displayed on this website, can be purchased through Cambridge University Press and other standard distribution channels. This copy is made available for personal use only and must not be adapted, sold or re-distributed. Corrected November 10, 2017.

The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and in influence. “Big data,” “data science,” and “machine learning” have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going?

This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories – Bayesian, frequentist, Fisherian – individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.

Efron & Hastie
Computer Age Statistical Inference

“How and why is computational statistics taking over the world?
In this serious work of synthesis that is also fun to read, Efron and Hastie give their take on the unreasonable effectiveness of statistics and machine learning in the context of a series of clear, historically informed examples.”
— Andrew Gelman, Columbia University

“Computer Age Statistical Inference is written especially for those who want to hear the big ideas, and see them instantiated through the essential mathematics that defines statistical analysis. It makes a great supplement to the traditional curricula for beginning graduate students.”
— Rob Kass, Carnegie Mellon University

“This is a terrific book. It gives a clear, accessible, and entertaining account of the interplay between theory and methodological development that has driven statistics in the computer age. The authors succeed brilliantly in locating contemporary algorithmic methodologies for analysis of ‘big data’ within the framework of established statistical theory.”
— Alastair Young, Imperial College London

“This is a guided tour of modern statistics that emphasizes the conceptual and computational advances of the last century. Authored by two masters of the field, it offers just the right mix of mathematical analysis and insightful commentary.”
— Hal Varian, Google

“Efron and Hastie guide us through the maze of breakthrough statistical methodologies following the computing evolution: why they were developed, their properties, and how they are used. Highlighting their origins, the book helps us understand each method’s roles in inference and/or prediction.”
— Galit Shmueli, National Tsing Hua University

“A masterful guide to how the inferential bases of classical statistics can provide a principled disciplinary frame for the data science of the twenty-first century.”
— Stephen Stigler, University of Chicago, author of Seven Pillars of Statistical Wisdom

“A refreshing view of modern statistics. Algorithmics are put on equal footing with intuition, properties, and the abstract arguments behind them.
The methods covered are indispensable to practicing statistical analysts in today’s big data and big computing landscape.”
— Robert Gramacy, The University of Chicago Booth School of Business

Bradley Efron is Max H. Stein Professor, Professor of Statistics, and Professor of Biomedical Data Science at Stanford University. He has held visiting faculty appointments at Harvard, UC Berkeley, and Imperial College London. Efron has worked extensively on theories of statistical inference, and is the inventor of the bootstrap sampling technique. He received the National Medal of Science in 2005 and the Guy Medal in Gold of the Royal Statistical Society in 2014.

Trevor Hastie is John A. Overdeck Professor, Professor of Statistics, and Professor of Biomedical Data Science at Stanford University. He is coauthor of Elements of Statistical Learning, a key text in the field of modern data analysis. He is also known for his work on generalized additive models and principal curves, and for his contributions to the R computing environment. Hastie was awarded the Emanuel and Carol Parzen Prize for Statistical Innovation in 2014.

Institute of Mathematical Statistics Monographs
Editorial Board:
D. R. Cox (University of Oxford)
B. Hambly (University of Oxford)
S. Holmes (Stanford University)
J. Wellner (University of Washington)

Cover illustration: Pacific Ocean wave, North Shore, Oahu, Hawaii. © Brian Sytnyk / Getty Images.
Cover designed by Zoe Naylor.
Printed in the United Kingdom
Computer Age Statistical Inference
Algorithms, Evidence, and Data Science

Bradley Efron
Trevor Hastie
Stanford University
To Donna and Lynda
Contents

Preface xv
Acknowledgments xviii
Notation xix

Part I Classic Statistical Inference 1

1 Algorithms and Inference 3
    1.1 A Regression Example 4
    1.2 Hypothesis Testing 8
    1.3 Notes 11
2 Frequentist Inference 12
    2.1 Frequentism in Practice 14
    2.2 Frequentist Optimality 18
    2.3 Notes and Details 20
3 Bayesian Inference 22
    3.1 Two Examples 24
    3.2 Uninformative Prior Distributions 28
    3.3 Flaws in Frequentist Inference 30
    3.4 A Bayesian/Frequentist Comparison List 33
    3.5 Notes and Details 36
4 Fisherian Inference and Maximum Likelihood Estimation 38
    4.1 Likelihood and Maximum Likelihood 38
    4.2 Fisher Information and the MLE 41
    4.3 Conditional Inference 45
    4.4 Permutation and Randomization 49
    4.5 Notes and Details 51
5 Parametric Models and Exponential Families 53
    5.1 Univariate Families 54
    5.2 The Multivariate Normal Distribution 55
    5.3 Fisher’s Information Bound for Multiparameter Families 59
    5.4 The Multinomial Distribution 61
    5.5 Exponential Families 64
    5.6 Notes and Details 69

Part II Early Computer-Age Methods 73

6 Empirical Bayes 75
    6.1 Robbins’ Formula 75
    6.2 The Missing-Species Problem 78
    6.3 A Medical Example 84
    6.4 Indirect Evidence 1 88
    6.5 Notes and Details 88
7 James–Stein Estimation and Ridge Regression 91
    7.1 The James–Stein Estimator 91
    7.2 The Baseball Players 94
    7.3 Ridge Regression 97
    7.4 Indirect Evidence 2 102
    7.5 Notes and Details 104
8 Generalized Linear Models and Regression Trees 108
    8.1 Logistic Regression 109
    8.2 Generalized Linear Models 116
    8.3 Poisson Regression 120
    8.4 Regression Trees 124
    8.5 Notes and Details 128
9 Survival Analysis and the EM Algorithm 131
    9.1 Life Tables and Hazard Rates 131
    9.2 Censored Data and the Kaplan–Meier Estimate 134
    9.3 The Log-Rank Test 139
    9.4 The Proportional Hazards Model 143
    9.5 Missing Data and the EM Algorithm 146
    9.6 Notes and Details 150
10 The Jackknife and the Bootstrap 155
    10.1 The Jackknife Estimate of Standard Error 156
    10.2 The Nonparametric Bootstrap 159
    10.3 Resampling Plans 162