Probability for Statistics and Machine Learning
Preface
Contents
Suggested Courses with Different Themes
Chapter 1 Review of Univariate Probability
1.1 Experiments and Sample Spaces
1.2 Conditional Probability and Independence
1.3 Integer-Valued and Discrete Random Variables
1.3.1 CDF and Independence
1.3.2 Expectation and Moments
1.4 Inequalities
1.5 Generating and Moment-Generating Functions
1.6 Applications of Generating Functions to a Pattern Problem
1.7 Standard Discrete Distributions
1.8 Poisson Approximation to Binomial
1.9 Continuous Random Variables
1.10 Functions of a Continuous Random Variable
1.10.1 Expectation and Moments
1.10.2 Moments and the Tail of a CDF
1.11 Moment-Generating Function and Fundamental Inequalities
1.11.1 Inversion of an MGF and Post's Formula
1.12 Some Special Continuous Distributions
1.13 Normal Distribution and Confidence Interval for a Mean
1.14 Stein's Lemma
1.15 Chernoff's Variance Inequality
1.16 Various Characterizations of Normal Distributions
1.17 Normal Approximations and Central Limit Theorem
1.17.1 Binomial Confidence Interval
1.17.2 Error of the CLT
1.18 Normal Approximation to Poisson and Gamma
1.18.1 Confidence Intervals
1.19 Convergence of Densities and Edgeworth Expansions
References
Chapter 2 Multivariate Discrete Distributions
2.1 Bivariate Joint Distributions and Expectations of Functions
2.2 Conditional Distributions and Conditional Expectations
2.2.1 Examples on Conditional Distributions and Expectations
2.3 Using Conditioning to Evaluate Mean and Variance
2.4 Covariance and Correlation
2.5 Multivariate Case
2.5.1 Joint MGF
2.5.2 Multinomial Distribution
2.6 The Poissonization Technique
Chapter 3 Multidimensional Densities
3.1 Joint Density Function and Its Role
3.2 Expectation of Functions
3.3 Bivariate Normal
3.4 Conditional Densities and Expectations
3.4.1 Examples on Conditional Densities and Expectations
3.5 Posterior Densities, Likelihood Functions, and Bayes Estimates
3.6 Maximum Likelihood Estimates
3.7 Bivariate Normal Conditional Distributions
3.8 Useful Formulas and Characterizations for Bivariate Normal
3.8.1 Computing Bivariate Normal Probabilities
3.9 Conditional Expectation Given a Set and Borel's Paradox
References
Chapter 4 Advanced Distribution Theory
4.1 Convolutions and Examples
4.2 Products and Quotients and the t- and F-Distribution
4.3 Transformations
4.4 Applications of Jacobian Formula
4.5 Polar Coordinates in Two Dimensions
4.6 n-Dimensional Polar and Helmert's Transformation
4.6.1 Efficient Spherical Calculations with Polar Coordinates
4.6.2 Independence of Mean and Variance in Normal Case
4.6.3 The t Confidence Interval
4.7 The Dirichlet Distribution
4.7.1 Picking a Point from the Surface of a Sphere
4.7.2 Poincaré's Lemma
4.8 Ten Important High-Dimensional Formulas for Easy Reference
References
Chapter 5 Multivariate Normal and Related Distributions
5.1 Definition and Some Basic Properties
5.2 Conditional Distributions
5.3 Exchangeable Normal Variables
5.4 Sampling Distributions Useful in Statistics
5.4.1 Wishart Expectation Identities
5.4.2 * Hotelling's T² and Distribution of Quadratic Forms
5.4.3 Distribution of Correlation Coefficient
5.5 Noncentral Distributions
5.6 Some Important Inequalities for Easy Reference
References
Chapter 6 Finite Sample Theory of Order Statistics and Extremes
6.1 Basic Distribution Theory
6.2 More Advanced Distribution Theory
6.3 Quantile Transformation and Existence of Moments
6.4 Spacings
6.4.1 Exponential Spacings and Rényi's Representation
6.4.2 Uniform Spacings
6.5 Conditional Distributions and Markov Property
6.6 Some Applications
6.6.1 Records
6.6.2 The Empirical CDF
6.7 Distribution of the Multinomial Maximum
References
Chapter 7 Essential Asymptotics and Applications
7.1 Some Basic Notation and Convergence Concepts
7.2 Laws of Large Numbers
7.3 Convergence Preservation
7.4 Convergence in Distribution
7.5 Preservation of Convergence and Statistical Applications
7.5.1 Slutsky's Theorem
7.5.2 Delta Theorem
7.5.3 Variance Stabilizing Transformations
7.6 Convergence of Moments
7.6.1 Uniform Integrability
7.6.2 The Moment Problem and Convergence in Distribution
7.6.3 Approximation of Moments
7.7 Convergence of Densities and Scheffé's Theorem
References
Chapter 8 Characteristic Functions and Applications
8.1 Characteristic Functions of Standard Distributions
8.2 Inversion and Uniqueness
8.3 Taylor Expansions, Differentiability, and Moments
8.4 Continuity Theorems
8.5 Proof of the CLT and the WLLN
8.6 Producing Characteristic Functions
8.7 Error of the Central Limit Theorem
8.8 Lindeberg–Feller Theorem for General Independent Case
8.9 Infinite Divisibility and Stable Laws
8.10 Some Useful Inequalities
References
Chapter 9 Asymptotics of Extremes and Order Statistics
9.1 Central-Order Statistics
9.1.1 Single-Order Statistic
9.1.2 Two Statistical Applications
9.1.3 Several Order Statistics
9.2 Extremes
9.2.1 Easily Applicable Limit Theorems
9.2.2 The Convergence of Types Theorem
9.3 Fisher–Tippett Family and Putting It Together
References
Chapter 10 Markov Chains and Applications
10.1 Notation and Basic Definitions
10.2 Examples and Various Applications as a Model
10.3 Chapman–Kolmogorov Equation
10.4 Communicating Classes
10.5 Gambler's Ruin
10.6 First Passage, Recurrence, and Transience
10.7 Long Run Evolution and Stationary Distributions
References
Chapter 11 Random Walks
11.1 Random Walk on the Cubic Lattice
11.1.1 Some Distribution Theory
11.1.2 Recurrence and Transience
11.1.3 Pólya's Formula for the Return Probability
11.2 First Passage Time and Arc Sine Law
11.3 The Local Time
11.4 Practically Useful Generalizations
11.5 Wald's Identity
11.6 Fate of a Random Walk
11.7 Chung–Fuchs Theorem
11.8 Six Important Inequalities
References
Chapter 12 Brownian Motion and Gaussian Processes
12.1 Preview of Connections to the Random Walk
12.2 Basic Definitions
12.2.1 Condition for a Gaussian Process to be Markov
12.2.2 Explicit Construction of Brownian Motion
12.3 Basic Distributional Properties
12.3.1 Reflection Principle and Extremes
12.3.2 Path Properties and Behavior Near Zero and Infinity
12.3.3 Fractal Nature of Level Sets
12.4 The Dirichlet Problem and Boundary Crossing Probabilities
12.4.1 Recurrence and Transience
12.5 The Local Time of Brownian Motion
12.6 Invariance Principle and Statistical Applications
12.7 Strong Invariance Principle and the KMT Theorem
12.8 Brownian Motion with Drift and Ornstein–Uhlenbeck Process
12.8.1 Negative Drift and Density of Maximum
12.8.2 Transition Density and the Heat Equation
12.8.3 The Ornstein–Uhlenbeck Process
References
Chapter 13 Poisson Processes and Applications
13.1 Notation
13.2 Defining a Homogeneous Poisson Process
13.3 Important Properties and Uses as a Statistical Model
13.4 Linear Poisson Process and Brownian Motion: A Connection
13.5 Higher-Dimensional Poisson Point Processes
13.5.1 The Mapping Theorem
13.6 One-Dimensional Nonhomogeneous Processes
13.7 Campbell's Theorem and Shot Noise
13.7.1 Poisson Process and Stable Laws
References
Chapter 14 Discrete Time Martingales and Concentration Inequalities
14.1 Illustrative Examples and Applications in Statistics
14.2 Stopping Times and Optional Stopping
14.2.1 Stopping Times
14.2.2 Optional Stopping
14.2.3 Sufficient Conditions for Optional Stopping Theorem
14.2.4 Applications of Optional Stopping
14.3 Martingale and Concentration Inequalities
14.3.1 Maximal Inequality
14.3.2 Inequalities of Burkholder, Davis, and Gundy
14.3.3 Inequalities of Hoeffding and Azuma
14.3.4 Inequalities of McDiarmid and Devroye
14.3.5 The Upcrossing Inequality
14.4 Convergence of Martingales
14.4.1 The Basic Convergence Theorem
14.4.2 Convergence in L1 and L2
14.5 Reverse Martingales and Proof of SLLN
14.6 Martingale Central Limit Theorem
References
Chapter 15 Probability Metrics
15.1 Standard Probability Metrics Useful in Statistics
15.2 Basic Properties of the Metrics
15.3 Metric Inequalities
15.4 Differential Metrics for Parametric Families
15.4.1 Fisher Information and Differential Metrics
15.4.2 Rao's Geodesic Distances on Distributions
References
Chapter 16 Empirical Processes and VC Theory
16.1 Basic Notation and Definitions
16.2 Classic Asymptotic Properties of the Empirical Process
16.2.1 Invariance Principle and Statistical Applications
16.2.2 Weighted Empirical Process
16.2.3 The Quantile Process
16.2.4 Strong Approximations of the Empirical Process
16.3 Vapnik–Chervonenkis Theory
16.3.1 Basic Theory
16.3.2 Concrete Examples
16.4 CLTs for Empirical Measures and Applications
16.4.1 Notation and Formulation
16.4.2 Entropy Bounds and Specific CLTs
16.4.3 Concrete Examples
16.5 Maximal Inequalities and Symmetrization
16.6 Connection to the Poisson Process
References
Chapter 17 Large Deviations
17.1 Large Deviations for Sample Means
17.1.1 The Cramér–Chernoff Theorem in R
17.1.2 Properties of the Rate Function
17.1.3 Cramér's Theorem for General Sets
17.2 The Gärtner–Ellis Theorem and Markov Chain Large Deviations
17.3 The t-Statistic
17.4 Lipschitz Functions and Talagrand's Inequality
17.5 Large Deviations in Continuous Time
17.5.1 Continuity of a Gaussian Process
17.5.2 Metric Entropy of T and Tail of the Supremum
References
Chapter 18 The Exponential Family and Statistical Applications
18.1 One-Parameter Exponential Family
18.1.1 Definition and First Examples
18.2 The Canonical Form and Basic Properties
18.2.1 Convexity Properties
18.2.2 Moments and Moment Generating Function
18.2.3 Closure Properties
18.3 Multiparameter Exponential Family
18.4 Sufficiency and Completeness
18.4.1 Neyman–Fisher Factorization and Basu's Theorem
18.4.2 Applications of Basu's Theorem to Probability
18.5 Curved Exponential Family
References
Chapter 19 Simulation and Markov Chain Monte Carlo
19.1 The Ordinary Monte Carlo
19.1.1 Basic Theory and Examples
19.1.2 Monte Carlo P-Values
19.1.3 Rao–Blackwellization
19.2 Textbook Simulation Techniques
19.2.1 Quantile Transformation and Accept–Reject
19.2.2 Importance Sampling and Its Asymptotic Properties
19.2.3 Optimal Importance Sampling Distribution
19.2.4 Algorithms for Simulating from Common Distributions
19.3 Markov Chain Monte Carlo
19.3.1 Reversible Markov Chains
19.3.2 Metropolis Algorithms
19.4 The Gibbs Sampler
19.5 Convergence of MCMC and Bounds on Errors
19.5.1 Spectral Bounds
19.5.2 Dobrushin's Inequality and Diaconis–Fill–Stroock Bound
19.5.3 Drift and Minorization Methods
19.6 MCMC on General Spaces
19.6.1 General Theory and Metropolis Schemes
19.6.2 Convergence
19.6.3 Convergence of the Gibbs Sampler
19.7 Practical Convergence Diagnostics
References
Chapter 20 Useful Tools for Statistics and Machine Learning
20.1 The Bootstrap
20.1.1 Consistency of the Bootstrap
20.1.2 Further Examples
20.1.3 Higher-Order Accuracy of the Bootstrap
20.1.4 Bootstrap for Dependent Data
20.2 The EM Algorithm
20.2.1 The Algorithm and Examples
20.2.2 Monotone Ascent and Convergence of EM
20.2.3 Modifications of EM
20.3 Kernels and Classification
20.3.1 Smoothing by Kernels
20.3.2 Some Common Kernels in Use
20.3.3 Kernel Density Estimation
20.3.4 Kernels for Statistical Classification
20.3.4.1 Reproducing Kernel Hilbert Spaces
20.3.5 Mercer's Theorem and Feature Maps
20.3.5.1 Support Vector Machines
References
Appendix A Symbols, Useful Formulas, and Normal Table
A.1 Glossary of Symbols
A.2 Moments and MGFs of Common Distributions
A.3 Normal Table
Author Index
Subject Index
Springer Texts in Statistics
Series Editors: G. Casella, S. Fienberg, I. Olkin
For further volumes: http://www.springer.com/series/417
Anirban DasGupta
Probability for Statistics and Machine Learning
Fundamentals and Advanced Topics
Anirban DasGupta
Department of Statistics
Purdue University
150 N. University Street
West Lafayette, IN 47907, USA
dasgupta@stat.purdue.edu

Mathematica® is a registered trademark of Wolfram Research, Inc.

ISBN 978-1-4419-9633-6
e-ISBN 978-1-4419-9634-3
DOI 10.1007/978-1-4419-9634-3
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2011924777

© Springer Science+Business Media, LLC 2011
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
To Persi Diaconis, Peter Hall, Ashok Maitra, and my mother, with affection
Preface

This is the companion second volume to my undergraduate text Fundamentals of Probability: A First Course. The purpose of my writing this book is to give graduate students, instructors, and researchers in statistics, mathematics, and computer science a lucidly written unique text at the confluence of probability, advanced stochastic processes, statistics, and key tools for machine learning. Numerous topics in probability and stochastic processes of current importance in statistics and machine learning that are widely scattered in the literature in many different specialized books are all brought together under one fold in this book. This is done with an extensive bibliography for each topic, and numerous worked-out examples and exercises. Probability, with all its models, techniques, and its poignant beauty, is an incredibly powerful tool for anyone who deals with data or randomness. The content and the style of this book reflect that philosophy; I emphasize lucidity, a wide background, and the far-reaching applicability of probability in science.

The book starts with a self-contained and fairly complete review of basic probability, and then traverses its way through the classics, to advanced modern topics and tools, including a substantial amount of statistics itself. Because of its nearly encyclopaedic coverage, it can serve as a graduate text for a year-long probability sequence, or for focused short courses on selected topics, for self-study, and as a nearly unique reference for research in statistics, probability, and computer science. It provides an extensive treatment of most of the standard topics in a graduate probability sequence, and integrates them with the basic theory and many examples of several core statistical topics, as well as with some tools of major importance in machine learning. This is done with unusually detailed bibliographies for the reader who wants to dig deeper into a particular topic, and with a huge repertoire of worked-out examples and exercises. The total number of worked-out examples in this book is 423, and the total number of exercises is 808. An instructor can rotate the exercises between semesters, and use them for setting exams, and a student can use them for additional exam preparation and self-study. I believe that the book is unique in its range, unification, bibliographic detail, and its collection of problems and examples.

Topics in core probability, such as distribution theory, asymptotics, Markov chains, martingales, Poisson processes, random walks, and Brownian motion are covered in the first 14 chapters. In these chapters, a reader will also find basic