High-Dimensional Probability
An Introduction with Applications in Data Science

Roman Vershynin
University of California, Irvine
June 7, 2018
https://www.math.uci.edu/~rvershyn/
Contents

Preface
Appetizer: using probability to cover a geometric set

1 Preliminaries on random variables
1.1 Basic quantities associated with random variables
1.2 Some classical inequalities
1.3 Limit theorems
1.4 Notes

2 Concentration of sums of independent random variables
2.1 Why concentration inequalities?
2.2 Hoeffding's inequality
2.3 Chernoff's inequality
2.4 Application: degrees of random graphs
2.5 Sub-gaussian distributions
2.6 General Hoeffding's and Khintchine's inequalities
2.7 Sub-exponential distributions
2.8 Bernstein's inequality
2.9 Notes

3 Random vectors in high dimensions
3.1 Concentration of the norm
3.2 Covariance matrices and principal component analysis
3.3 Examples of high-dimensional distributions
3.4 Sub-gaussian distributions in higher dimensions
3.5 Application: Grothendieck's inequality and semidefinite programming
3.6 Application: Maximum cut for graphs
3.7 Kernel trick, and tightening of Grothendieck's inequality
3.8 Notes

4 Random matrices
4.1 Preliminaries on matrices
4.2 Nets, covering numbers and packing numbers
4.3 Application: error correcting codes
4.4 Upper bounds on random sub-gaussian matrices
4.5 Application: community detection in networks
4.6 Two-sided bounds on sub-gaussian matrices
4.7 Application: covariance estimation and clustering
4.8 Notes

5 Concentration without independence
5.1 Concentration of Lipschitz functions on the sphere
5.2 Concentration on other metric measure spaces
5.3 Application: Johnson-Lindenstrauss Lemma
5.4 Matrix Bernstein's inequality
5.5 Application: community detection in sparse networks
5.6 Application: covariance estimation for general distributions
5.7 Notes

6 Quadratic forms, symmetrization and contraction
6.1 Decoupling
6.2 Hanson-Wright Inequality
6.3 Concentration of anisotropic random vectors
6.4 Symmetrization
6.5 Random matrices with non-i.i.d. entries
6.6 Application: matrix completion
6.7 Contraction Principle
6.8 Notes

7 Random processes
7.1 Basic concepts and examples
7.2 Slepian's inequality
7.3 Sharp bounds on Gaussian matrices
7.4 Sudakov's minoration inequality
7.5 Gaussian width
7.6 Stable dimension, stable rank, and Gaussian complexity
7.7 Random projections of sets
7.8 Notes

8 Chaining
8.1 Dudley's inequality
8.2 Application: empirical processes
8.3 VC dimension
8.4 Application: statistical learning theory
8.5 Generic chaining
8.6 Talagrand's majorizing measure and comparison theorems
8.7 Chevet's inequality
8.8 Notes

9 Deviations of random matrices and geometric consequences
9.1 Matrix deviation inequality
9.2 Random matrices, random projections and covariance estimation
9.3 Johnson-Lindenstrauss Lemma for infinite sets
9.4 Random sections: M∗ bound and Escape Theorem
9.5 Notes

10 Sparse Recovery
10.1 High-dimensional signal recovery problems
10.2 Signal recovery based on M∗ bound
10.3 Recovery of sparse signals
10.4 Low-rank matrix recovery
10.5 Exact recovery and the restricted isometry property
10.6 Lasso algorithm for sparse regression
10.7 Notes

11 Dvoretzky-Milman's Theorem
11.1 Deviations of random matrices with respect to general norms
11.2 Johnson-Lindenstrauss embeddings and sharper Chevet inequality
11.3 Dvoretzky-Milman's Theorem
11.4 Notes

Bibliography
Index
Preface

Who is this book for?

This is a textbook in probability in high dimensions with a view toward applications in data sciences. It is intended for doctoral and advanced masters students and beginning researchers in mathematics, statistics, electrical engineering, computational biology and related areas, who are looking to expand their knowledge of theoretical methods used in modern research in data sciences.

Why this book?

Data sciences are moving fast, and probabilistic methods often provide a foundation and inspiration for such advances. A typical graduate probability course is no longer sufficient to acquire the level of mathematical sophistication that is expected from a beginning researcher in data sciences today. This book intends to partially fill this gap. It presents some of the key probabilistic methods and results that may form an essential toolbox for a mathematical data scientist. The book can be used as a textbook for a basic second course in probability with a view toward data science applications. It is also suitable for self-study.

What is this book about?

High-dimensional probability is an area of probability theory that studies random objects in R^n where the dimension n can be very large. This book places particular emphasis on random vectors, random matrices, and random projections. It teaches basic theoretical skills for the analysis of these objects, which include concentration inequalities, covering and packing arguments, decoupling and symmetrization tricks, chaining and comparison techniques for stochastic processes, combinatorial reasoning based on the VC dimension, and a lot more.

High-dimensional probability provides vital theoretical tools for applications in data science. This book integrates theory with applications for covariance estimation, semidefinite programming, networks, elements of statistical learning, error correcting codes, clustering, matrix completion, dimension reduction, sparse signal recovery, sparse regression, and more.
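To give a flavor of the results developed in the book, here is one illustrative statement; it is a standard two-sided form of Hoeffding's inequality for bounded independent random variables (a representative example chosen by the editor, not a quotation from the text), of the kind treated in Chapter 2:

\[
  % Hoeffding's inequality: X_1, ..., X_N independent, X_i taking values in [m_i, M_i]
  \mathbb{P}\Bigl\{ \Bigl| \sum_{i=1}^{N} \bigl(X_i - \mathbb{E} X_i\bigr) \Bigr| \ge t \Bigr\}
  \;\le\;
  2 \exp\!\Bigl( - \frac{2 t^2}{\sum_{i=1}^{N} (M_i - m_i)^2} \Bigr),
  \qquad t > 0.
\]

Results of this type quantify how strongly a sum of independent random variables concentrates around its mean, and the book builds from such scalar inequalities toward their high-dimensional analogues for random vectors, matrices, and processes.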
Prerequisites

The essential prerequisites for reading this book are a rigorous course in probability theory (on Masters or Ph.D. level), an excellent command of undergraduate linear algebra, and general familiarity with basic notions about metric, normed and Hilbert spaces and linear operators. Knowledge of measure theory is not essential but would be helpful.

A word on exercises

Exercises are integrated into the text. The reader can do them immediately to check his or her understanding of the material just presented, and to prepare better for later developments. The difficulty of the exercises is indicated by the number of coffee cups, ranging from easiest (fewest cups) to hardest (most cups).

Related reading

This book covers only a fraction of the theoretical apparatus of high-dimensional probability, and it illustrates it with only a sample of data science applications. Each chapter in this book is concluded with a Notes section, which has pointers to other texts on the matter. A few particularly useful sources should be noted here. The now classical book [8] showcases the probabilistic method in applications to discrete mathematics and computer science. The forthcoming book [19] presents a panorama of mathematical data science, and it particularly focuses on applications in computer science. Both these books are accessible to graduate and advanced undergraduate students. The lecture notes [206] are pitched for graduate students and present more theoretical material in high-dimensional probability.

Acknowledgements

The feedback from my many colleagues was instrumental in perfecting this book. My special thanks go to Florent Benaych-Georges, Jennifer Bryson, Lukas Grätz, Ping Hsu, Mike Izbicki, George Linderman, Cong Ma, Galyna Livshyts, Jelani Nelson, Ekkehard Schnoor, Martin Spindler, Dominik Stöger, Tim Sullivan, Terence Tao, Joel Tropp, Katarzyna Wyczesany, Yifei Shen and Haoshu Xu for many valuable suggestions and corrections; especially to Sjoerd Dirksen, Larry Goldstein, Wu Han, Han Wu, and Mahdi Soltanolkotabi for detailed proofreading of the book; and to Can Le, Jennifer Bryson and my son Ivan Vershynin for their help with many pictures in this book.

Irvine, California
April 2018