High-Dimensional Probability
An Introduction with Applications in Data Science

Roman Vershynin
University of California, Irvine
June 7, 2018
https://www.math.uci.edu/~rvershyn/
Contents

Preface
Appetizer: using probability to cover a geometric set

1 Preliminaries on random variables
1.1 Basic quantities associated with random variables
1.2 Some classical inequalities
1.3 Limit theorems
1.4 Notes

2 Concentration of sums of independent random variables
2.1 Why concentration inequalities?
2.2 Hoeffding's inequality
2.3 Chernoff's inequality
2.4 Application: degrees of random graphs
2.5 Sub-gaussian distributions
2.6 General Hoeffding's and Khintchine's inequalities
2.7 Sub-exponential distributions
2.8 Bernstein's inequality
2.9 Notes

3 Random vectors in high dimensions
3.1 Concentration of the norm
3.2 Covariance matrices and principal component analysis
3.3 Examples of high-dimensional distributions
3.4 Sub-gaussian distributions in higher dimensions
3.5 Application: Grothendieck's inequality and semidefinite programming
3.6 Application: Maximum cut for graphs
3.7 Kernel trick, and tightening of Grothendieck's inequality
3.8 Notes

4 Random matrices
4.1 Preliminaries on matrices
4.2 Nets, covering numbers and packing numbers
4.3 Application: error correcting codes
4.4 Upper bounds on random sub-gaussian matrices
4.5 Application: community detection in networks
4.6 Two-sided bounds on sub-gaussian matrices
4.7 Application: covariance estimation and clustering
4.8 Notes

5 Concentration without independence
5.1 Concentration of Lipschitz functions on the sphere
5.2 Concentration on other metric measure spaces
5.3 Application: Johnson-Lindenstrauss Lemma
5.4 Matrix Bernstein's inequality
5.5 Application: community detection in sparse networks
5.6 Application: covariance estimation for general distributions
5.7 Notes

6 Quadratic forms, symmetrization and contraction
6.1 Decoupling
6.2 Hanson-Wright Inequality
6.3 Concentration of anisotropic random vectors
6.4 Symmetrization
6.5 Random matrices with non-i.i.d. entries
6.6 Application: matrix completion
6.7 Contraction Principle
6.8 Notes

7 Random processes
7.1 Basic concepts and examples
7.2 Slepian's inequality
7.3 Sharp bounds on Gaussian matrices
7.4 Sudakov's minoration inequality
7.5 Gaussian width
7.6 Stable dimension, stable rank, and Gaussian complexity
7.7 Random projections of sets
7.8 Notes

8 Chaining
8.1 Dudley's inequality
8.2 Application: empirical processes
8.3 VC dimension
8.4 Application: statistical learning theory
8.5 Generic chaining
8.6 Talagrand's majorizing measure and comparison theorems
8.7 Chevet's inequality
8.8 Notes

9 Deviations of random matrices and geometric consequences
9.1 Matrix deviation inequality
9.2 Random matrices, random projections and covariance estimation
9.3 Johnson-Lindenstrauss Lemma for infinite sets
9.4 Random sections: M∗ bound and Escape Theorem
9.5 Notes

10 Sparse Recovery
10.1 High-dimensional signal recovery problems
10.2 Signal recovery based on M∗ bound
10.3 Recovery of sparse signals
10.4 Low-rank matrix recovery
10.5 Exact recovery and the restricted isometry property
10.6 Lasso algorithm for sparse regression
10.7 Notes

11 Dvoretzky-Milman's Theorem
11.1 Deviations of random matrices with respect to general norms
11.2 Johnson-Lindenstrauss embeddings and sharper Chevet inequality
11.3 Dvoretzky-Milman's Theorem
11.4 Notes

Bibliography
Index
Preface

Who is this book for?

This is a textbook in probability in high dimensions with a view toward applications in data sciences. It is intended for doctoral and advanced masters students and beginning researchers in mathematics, statistics, electrical engineering, computational biology and related areas, who are looking to expand their knowledge of theoretical methods used in modern research in data sciences.

Why this book?

Data sciences are moving fast, and probabilistic methods often provide a foundation and inspiration for such advances. A typical graduate probability course is no longer sufficient to acquire the level of mathematical sophistication that is expected from a beginning researcher in data sciences today. This book intends to partially fill this gap. It presents some of the key probabilistic methods and results that may form an essential toolbox for a mathematical data scientist. The book can be used as a textbook for a basic second course in probability with a view toward data science applications. It is also suitable for self-study.

What is this book about?

High-dimensional probability is an area of probability theory that studies random objects in R^n where the dimension n can be very large. This book places particular emphasis on random vectors, random matrices, and random projections. It teaches basic theoretical skills for the analysis of these objects, which include concentration inequalities, covering and packing arguments, decoupling and symmetrization tricks, chaining and comparison techniques for stochastic processes, combinatorial reasoning based on the VC dimension, and a lot more.

High-dimensional probability provides vital theoretical tools for applications in data science. This book integrates theory with applications for covariance estimation, semidefinite programming, networks, elements of statistical learning, error correcting codes, clustering, matrix completion, dimension reduction, sparse signal recovery, sparse regression, and more.
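To give a flavor of the results developed in the book, here is one illustrative statement; it is a standard two-sided form of Hoeffding's inequality for bounded independent random variables (a representative example chosen by the editor, not a quotation from the text), of the kind treated in Chapter 2:

\[
  % Hoeffding's inequality: X_1, ..., X_N independent, X_i taking values in [m_i, M_i]
  \mathbb{P}\Bigl\{ \Bigl| \sum_{i=1}^{N} \bigl(X_i - \mathbb{E} X_i\bigr) \Bigr| \ge t \Bigr\}
  \;\le\;
  2 \exp\!\Bigl( - \frac{2 t^2}{\sum_{i=1}^{N} (M_i - m_i)^2} \Bigr),
  \qquad t > 0.
\]

Results of this type quantify how strongly a sum of independent random variables concentrates around its mean, and the book builds from such scalar inequalities toward their high-dimensional analogues for random vectors, matrices, and processes.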
Prerequisites

The essential prerequisites for reading this book are a rigorous course in probability theory (on Masters or Ph.D. level), an excellent command of undergraduate linear algebra, and general familiarity with basic notions about metric, normed and Hilbert spaces and linear operators. Knowledge of measure theory is not essential but would be helpful.

A word on exercises

Exercises are integrated into the text. The reader can do them immediately to check his or her understanding of the material just presented, and to prepare better for later developments. The difficulty of the exercises is indicated by the number of coffee cups, ranging from easiest (fewest cups) to hardest (most cups).

Related reading

This book covers only a fraction of the theoretical apparatus of high-dimensional probability, and it illustrates it with only a sample of data science applications. Each chapter in this book is concluded with a Notes section, which has pointers to other texts on the matter. A few particularly useful sources should be noted here. The now classical book [8] showcases the probabilistic method in applications to discrete mathematics and computer science. The forthcoming book [19] presents a panorama of mathematical data science, and it particularly focuses on applications in computer science. Both these books are accessible to graduate and advanced undergraduate students. The lecture notes [206] are pitched for graduate students and present more theoretical material in high-dimensional probability.

Acknowledgements

The feedback from my many colleagues was instrumental in perfecting this book. My special thanks go to Florent Benaych-Georges, Jennifer Bryson, Lukas Grätz, Ping Hsu, Mike Izbicki, George Linderman, Cong Ma, Galyna Livshyts, Jelani Nelson, Ekkehard Schnoor, Martin Spindler, Dominik Stöger, Tim Sullivan, Terence Tao, Joel Tropp, Katarzyna Wyczesany, Yifei Shen and Haoshu Xu for many valuable suggestions and corrections; especially to Sjoerd Dirksen, Larry Goldstein, Wu Han, Han Wu, and Mahdi Soltanolkotabi for detailed proofreading of the book; and to Can Le, Jennifer Bryson and my son Ivan Vershynin for their help with many pictures in this book.

Irvine, California
April 2018