1 Introduction
2 High-Dimensional Space
2.1 Introduction
2.2 The Law of Large Numbers
2.3 The Geometry of High Dimensions
2.4 Properties of the Unit Ball
2.5 Generating Points Uniformly at Random from a Ball
2.6 Gaussians in High Dimension
2.7 Random Projection and Johnson-Lindenstrauss Lemma
2.8 Separating Gaussians
2.9 Fitting a Spherical Gaussian to Data
2.10 Bibliographic Notes
3 Best-Fit Subspaces and Singular Value Decomposition (SVD)
3.1 Introduction
3.2 Preliminaries
3.3 Singular Vectors
3.4 Singular Value Decomposition (SVD)
3.5 Best Rank-k Approximations
3.6 Left Singular Vectors
3.7 Power Method for Singular Value Decomposition
3.8 Singular Vectors and Eigenvectors
3.9 Applications of Singular Value Decomposition
3.9.1 Centering Data
3.9.2 Principal Component Analysis
3.9.3 Clustering a Mixture of Spherical Gaussians
3.9.4 Ranking Documents and Web Pages
3.9.5 An Illustrative Application of SVD
3.9.6 An Application of SVD to a Discrete Optimization Problem
3.10 Bibliographic Notes
4 Random Walks and Markov Chains
5 Machine Learning
5.1 Introduction
5.2 The Perceptron Algorithm
5.3 Kernel Functions and Non-linearly Separable Data
5.4 Generalizing to New Data
5.4.1 Overfitting and Uniform Convergence
5.4.2 Occam’s Razor
5.4.3 Regularization: Penalizing Complexity
5.5 VC-Dimension
5.5.1 Definitions and Key Theorems
5.5.2 VC-Dimension of Some Set Systems
5.5.3 Shatter Function for Set Systems of Bounded VC-Dimension
5.5.4 VC-Dimension of Combinations of Concepts
5.5.5 The Key Theorem
5.6 VC-dimension and Machine Learning
5.7 Other Measures of Complexity
5.8 Deep Learning
5.8.1 Generative Adversarial Networks (GANs)
5.9 Gradient Descent
5.9.1 Stochastic Gradient Descent
5.9.2 Regularizer
5.10 Online Learning
5.10.1 An Example: Learning Disjunctions
5.10.2 The Halving Algorithm
5.10.3 The Perceptron Algorithm
5.10.4 Inseparable Data and Hinge Loss
5.10.5 Online to Batch Conversion
5.10.6 Combining (Sleeping) Expert Advice
5.11 Boosting
5.12 Further Current Directions
5.12.1 Semi-Supervised Learning
5.12.2 Active Learning
5.12.3 Multi-Task Learning
5.13 Bibliographic Notes
6 Algorithms for Massive Data Problems: Streaming, Sketching, and Sampling
6.1 Introduction
6.2 Frequency Moments of Data Streams
6.3 Matrix Algorithms using Sampling
6.4 Sketches of Documents
6.5 Bibliographic Notes
6.6 Exercises
7 Clustering
7.1 Introduction
7.2 k-Means Clustering
7.3 k-Center Clustering
7.4 Finding Low-Error Clusterings
7.5 Spectral Clustering
7.6 Approximation Stability
7.7 High-Density Clusters
7.8 Kernel Methods
7.9 Recursive Clustering Based on Sparse Cuts
7.10 Dense Submatrices and Communities
7.11 Community Finding and Graph Partitioning
7.12 Spectral Clustering Applied to Social Networks
7.13 Bibliographic Notes
7.14 Exercises
8 Random Graphs
8.1 The G(n, p) Model
8.2 Phase Transitions
8.3 Giant Component
8.4 Cycles and Full Connectivity
8.5 Phase Transitions for Increasing Properties
8.6 Branching Processes
8.7 CNF-SAT
8.8 Non-uniform Models of Random Graphs
8.9 Growth Models
8.10 Small World Graphs
8.11 Bibliographic Notes
9 Topic Models, Non-negative Matrix Factorization, Hidden Markov Models, and Graphical Models
9.1 Topic Models
9.2 An Idealized Model
9.3 Non-negative Matrix Factorization - NMF
9.4 NMF with Anchor Terms
9.5 Hard and Soft Clustering
9.6 The Latent Dirichlet Allocation Model for Topic Modeling
9.7 The Dominant Admixture Model
9.8 Formal Assumptions
9.9 Finding the Term-Topic Matrix
9.10 Hidden Markov Models
9.11 Graphical Models and Belief Propagation
9.12 Bayesian or Belief Networks
9.13 Markov Random Fields
9.14 Factor Graphs
9.15 Tree Algorithms
9.16 Message Passing in General Graphs
9.16.1 Graphs with a Single Cycle
9.16.2 Belief Update in Networks with a Single Loop
9.16.3 Maximum Weight Matching
9.17 Warning Propagation
9.18 Correlation Between Variables
9.19 Bibliographic Notes
10 Other Topics
10.5 Gradient
10.6 Linear Programming
10.7 Integer Optimization
10.8 Semi-Definite Programming
10.9 Bibliographic Notes
10.10 Exercises
11 Wavelets
11.1 Dilation
11.2 The Haar Wavelet
11.3 Wavelet Systems
11.4 Solving the Dilation Equation
11.5 Conditions on the Dilation Equation
11.6 Derivation of the Wavelets from the Scaling Function
11.7 Sufficient Conditions for the Wavelets to be Orthogonal
11.8 Expressing a Function in Terms of Wavelets
11.9 Designing a Wavelet System
11.10 Applications
11.11 Bibliographic Notes
12 Appendix
12.1 Definitions and Notation
12.2 Useful Relations
12.3 Useful Inequalities
12.4 Probability
12.5 Bounds on Tail Probability
12.6 Applications of the Tail Bound
12.7 Eigenvalues and Eigenvectors
12.8 Generating Functions
12.9 Miscellaneous
12.10 Exercises
References