Foundations of Data Science∗

Avrim Blum, John Hopcroft, and Ravindran Kannan

Saturday 2nd March, 2019

∗Copyright 2015. All rights reserved.
Contents

1 Introduction  9

2 High-Dimensional Space  12
  2.1 Introduction  12
  2.2 The Law of Large Numbers  12
  2.3 The Geometry of High Dimensions  16
  2.4 Properties of the Unit Ball  17
    2.4.1 Volume of the Unit Ball  17
    2.4.2 Volume Near the Equator  19
  2.5 Generating Points Uniformly at Random from a Ball  22
  2.6 Gaussians in High Dimension  23
  2.7 Random Projection and Johnson-Lindenstrauss Lemma  25
  2.8 Separating Gaussians  27
  2.9 Fitting a Spherical Gaussian to Data  29
  2.10 Bibliographic Notes  31
  2.11 Exercises  32

3 Best-Fit Subspaces and Singular Value Decomposition (SVD)  40
  3.1 Introduction  40
  3.2 Preliminaries  42
  3.3 Singular Vectors  43
  3.4 Singular Value Decomposition (SVD)  46
  3.5 Best Rank-k Approximations  47
  3.6 Left Singular Vectors  49
  3.7 Power Method for Singular Value Decomposition  51
    3.7.1 A Faster Method  52
  3.8 Singular Vectors and Eigenvectors  54
  3.9 Applications of Singular Value Decomposition  54
    3.9.1 Centering Data  54
    3.9.2 Principal Component Analysis  56
    3.9.3 Clustering a Mixture of Spherical Gaussians  57
    3.9.4 Ranking Documents and Web Pages  62
    3.9.5 An Illustrative Application of SVD  63
    3.9.6 An Application of SVD to a Discrete Optimization Problem  64
  3.10 Bibliographic Notes  66
  3.11 Exercises  68

4 Random Walks and Markov Chains  77
  4.1 Stationary Distribution  81
  4.2 Markov Chain Monte Carlo  82
    4.2.1 Metropolis-Hasting Algorithm  84
    4.2.2 Gibbs Sampling  85
  4.3 Areas and Volumes  87
  4.4 Convergence of Random Walks on Undirected Graphs  89
    4.4.1 Using Normalized Conductance to Prove Convergence  95
  4.5 Electrical Networks and Random Walks  98
  4.6 Random Walks on Undirected Graphs with Unit Edge Weights  103
  4.7 Random Walks in Euclidean Space  110
  4.8 The Web as a Markov Chain  113
  4.9 Bibliographic Notes  117
  4.10 Exercises  119

5 Machine Learning  131
  5.1 Introduction  131
  5.2 The Perceptron Algorithm  132
  5.3 Kernel Functions and Non-linearly Separable Data  134
  5.4 Generalizing to New Data  135
    5.4.1 Overfitting and Uniform Convergence  137
    5.4.2 Occam's Razor  139
    5.4.3 Regularization: Penalizing Complexity  140
  5.5 VC-Dimension  141
    5.5.1 Definitions and Key Theorems  142
    5.5.2 VC-Dimension of Some Set Systems  143
    5.5.3 Shatter Function for Set Systems of Bounded VC-Dimension  145
    5.5.4 VC-Dimension of Combinations of Concepts  147
    5.5.5 The Key Theorem  148
  5.6 VC-dimension and Machine Learning  149
  5.7 Other Measures of Complexity  150
  5.8 Deep Learning  151
    5.8.1 Generative Adversarial Networks (GANs)  157
  5.9 Gradient Descent  158
    5.9.1 Stochastic Gradient Descent  160
    5.9.2 Regularizer  162
  5.10 Online Learning  162
    5.10.1 An Example: Learning Disjunctions  163
    5.10.2 The Halving Algorithm  164
    5.10.3 The Perceptron Algorithm  164
    5.10.4 Inseparable Data and Hinge Loss  165
    5.10.5 Online to Batch Conversion  166
    5.10.6 Combining (Sleeping) Expert Advice  167
  5.11 Boosting  170
  5.12 Further Current Directions  173
    5.12.1 Semi-Supervised Learning  173
    5.12.2 Active Learning  175
    5.12.3 Multi-Task Learning  176
  5.13 Bibliographic Notes  177
  5.14 Exercises  178

6 Algorithms for Massive Data Problems: Streaming, Sketching, and Sampling  184
  6.1 Introduction  184
  6.2 Frequency Moments of Data Streams  185
    6.2.1 Number of Distinct Elements in a Data Stream  186
    6.2.2 Number of Occurrences of a Given Element  189
    6.2.3 Frequent Elements  190
    6.2.4 The Second Moment  192
  6.3 Matrix Algorithms using Sampling  195
    6.3.1 Matrix Multiplication using Sampling  196
    6.3.2 Implementing Length Squared Sampling in Two Passes  199
    6.3.3 Sketch of a Large Matrix  200
  6.4 Sketches of Documents  204
  6.5 Bibliographic Notes  206
  6.6 Exercises  207

7 Clustering  211
  7.1 Introduction  211
    7.1.1 Preliminaries  211
    7.1.2 Two General Assumptions on the Form of Clusters  212
    7.1.3 Spectral Clustering  214
  7.2 k-Means Clustering  214
    7.2.1 A Maximum-Likelihood Motivation  214
    7.2.2 Structural Properties of the k-Means Objective  215
    7.2.3 Lloyd's Algorithm  216
    7.2.4 Ward's Algorithm  218
    7.2.5 k-Means Clustering on the Line  218
  7.3 k-Center Clustering  218
  7.4 Finding Low-Error Clusterings  219
  7.5 Spectral Clustering  219
    7.5.1 Why Project?  219
    7.5.2 The Algorithm  221
    7.5.3 Means Separated by Ω(1) Standard Deviations  222
    7.5.4 Laplacians  224
    7.5.5 Local spectral clustering  224
  7.6 Approximation Stability  227
    7.6.1 The Conceptual Idea  227
    7.6.2 Making this Formal  227
    7.6.3 Algorithm and Analysis  228
  7.7 High-Density Clusters  230
    7.7.1 Single Linkage  230
    7.7.2 Robust Linkage  231
  7.8 Kernel Methods  231
  7.9 Recursive Clustering Based on Sparse Cuts  232
  7.10 Dense Submatrices and Communities  233
  7.11 Community Finding and Graph Partitioning  236
  7.12 Spectral Clustering Applied to Social Networks  239
  7.13 Bibliographic Notes  242
  7.14 Exercises  243

8 Random Graphs  248
  8.1 The G(n, p) Model  248
    8.1.1 Degree Distribution  249
    8.1.2 Existence of Triangles in G(n, d/n)  253
  8.2 Phase Transitions  255
  8.3 Giant Component  267
    8.3.1 Existence of a Giant Component  267
    8.3.2 No Other Large Components  269
    8.3.3 The Case of p < 1/n  269
  8.4 Cycles and Full Connectivity  270
    8.4.1 Emergence of Cycles  270
    8.4.2 Full Connectivity  271
    8.4.3 Threshold for O(ln n) Diameter  273
  8.5 Phase Transitions for Increasing Properties  275
  8.6 Branching Processes  277
  8.7 CNF-SAT  283
    8.7.1 SAT-solvers in practice  283
    8.7.2 Phase Transitions for CNF-SAT  284
  8.8 Non-uniform Models of Random Graphs  289
    8.8.1 Giant Component in Graphs with Given Degree Distribution  290
  8.9 Growth Models  291
    8.9.1 Growth Model Without Preferential Attachment  292
    8.9.2 Growth Model With Preferential Attachment  298
  8.10 Small World Graphs  299
  8.11 Bibliographic Notes  305
  8.12 Exercises  306

9 Topic Models, Non-negative Matrix Factorization, Hidden Markov Models, and Graphical Models  315
  9.1 Topic Models  315
  9.2 An Idealized Model  318
  9.3 Non-negative Matrix Factorization - NMF  320
  9.4 NMF with Anchor Terms  322
  9.5 Hard and Soft Clustering  323
  9.6 The Latent Dirichlet Allocation Model for Topic Modeling  325
  9.7 The Dominant Admixture Model  327
  9.8 Formal Assumptions  329
  9.9 Finding the Term-Topic Matrix  332
  9.10 Hidden Markov Models  337
  9.11 Graphical Models and Belief Propagation  342
  9.12 Bayesian or Belief Networks  343
  9.13 Markov Random Fields  344
  9.14 Factor Graphs  345
  9.15 Tree Algorithms  346
  9.16 Message Passing in General Graphs  347
    9.16.1 Graphs with a Single Cycle  349
    9.16.2 Belief Update in Networks with a Single Loop  351
    9.16.3 Maximum Weight Matching  352
  9.17 Warning Propagation  356
  9.18 Correlation Between Variables  356
  9.19 Bibliographic Notes  360
  9.20 Exercises  362

10 Other Topics  365
  10.1 Ranking and Social Choice  365
    10.1.1 Randomization  367
    10.1.2 Examples  368
  10.2 Compressed Sensing and Sparse Vectors  369
    10.2.1 Unique Reconstruction of a Sparse Vector  370
    10.2.2 Efficiently Finding the Unique Sparse Solution  371
  10.3 Applications  373
    10.3.1 Biological  373
    10.3.2 Low Rank Matrices  374
  10.4 An Uncertainty Principle  375
    10.4.1 Sparse Vector in Some Coordinate Basis  375
    10.4.2 A Representation Cannot be Sparse in Both Time and Frequency Domains  376
  10.5 Gradient  378
  10.6 Linear Programming  380
    10.6.1 The Ellipsoid Algorithm  380
  10.7 Integer Optimization  382
  10.8 Semi-Definite Programming  383
  10.9 Bibliographic Notes  385
  10.10 Exercises  386

11 Wavelets  390
  11.1 Dilation  390
  11.2 The Haar Wavelet  391
  11.3 Wavelet Systems  395
  11.4 Solving the Dilation Equation  395
  11.5 Conditions on the Dilation Equation  397
  11.6 Derivation of the Wavelets from the Scaling Function  399
  11.7 Sufficient Conditions for the Wavelets to be Orthogonal  403
  11.8 Expressing a Function in Terms of Wavelets  406
  11.9 Designing a Wavelet System  406
  11.10 Applications  407
  11.11 Bibliographic Notes  407
  11.12 Exercises  408

12 Appendix  411
  12.1 Definitions and Notation  411
    12.1.1 Integers  411
    12.1.2 Substructures  411
    12.1.3 Asymptotic Notation  411
  12.2 Useful Relations  413
  12.3 Useful Inequalities  417
  12.4 Probability  424
    12.4.1 Sample Space, Events, and Independence  425
    12.4.2 Linearity of Expectation  426
    12.4.3 Union Bound  426
    12.4.4 Indicator Variables  426
    12.4.5 Variance  427
    12.4.6 Variance of the Sum of Independent Random Variables  427
    12.4.7 Median  427
    12.4.8 The Central Limit Theorem  428
    12.4.9 Probability Distributions  428
    12.4.10 Bayes Rule and Estimators  432
  12.5 Bounds on Tail Probability  434
    12.5.1 Chernoff Bounds  434
    12.5.2 More General Tail Bounds  437
  12.6 Applications of the Tail Bound  440
  12.7 Eigenvalues and Eigenvectors  442
    12.7.1 Symmetric Matrices  443
    12.7.2 Relationship between SVD and Eigen Decomposition  445
    12.7.3 Extremal Properties of Eigenvalues  446
    12.7.4 Eigenvalues of the Sum of Two Symmetric Matrices  448
    12.7.5 Norms  449
    12.7.6 Important Norms and Their Properties  450
    12.7.7 Additional Linear Algebra  452
    12.7.8 Distance Between Subspaces  454
    12.7.9 Positive Semidefinite Matrix  455
  12.8 Generating Functions  455
    12.8.1 Generating Functions for Sequences Defined by Recurrence Relationships  456
    12.8.2 The Exponential Generating Function and the Moment Generating Function  458
  12.9 Miscellaneous  460
    12.9.1 Lagrange Multipliers  460
    12.9.2 Finite Fields  461
    12.9.3 Application of Mean Value Theorem  461
  12.10 Exercises  463

Index  468