Monographs on Statistics and Applied Probability 143

Statistical Learning with Sparsity
The Lasso and Generalizations

Trevor Hastie
Robert Tibshirani
Martin Wainwright

www.crcpress.com
To our parents:

    Valerie and Patrick Hastie
    Vera and Sami Tibshirani
    Patricia and John Wainwright

and to our families:

    Samantha, Timothy, and Lynda
    Charlie, Ryan, Jess, Julie, and Cheryl
    Haruko and Hana
Contents

Preface

1 Introduction

2 The Lasso for Linear Models
  2.1 Introduction
  2.2 The Lasso Estimator
  2.3 Cross-Validation and Inference
  2.4 Computation of the Lasso Solution
    2.4.1 Single Predictor: Soft Thresholding
    2.4.2 Multiple Predictors: Cyclic Coordinate Descent
    2.4.3 Soft-Thresholding and Orthogonal Bases
  2.5 Degrees of Freedom
  2.6 Uniqueness of the Lasso Solutions
  2.7 A Glimpse at the Theory
  2.8 The Nonnegative Garrote
  2.9 ℓq Penalties and Bayes Estimates
  2.10 Some Perspective
  Exercises

3 Generalized Linear Models
  3.1 Introduction
  3.2 Logistic Regression
    3.2.1 Example: Document Classification
    3.2.2 Algorithms
  3.3 Multiclass Logistic Regression
    3.3.1 Example: Handwritten Digits
    3.3.2 Algorithms
    3.3.3 Grouped-Lasso Multinomial
  3.4 Log-Linear Models and the Poisson GLM
    3.4.1 Example: Distribution Smoothing
  3.5 Cox Proportional Hazards Models
    3.5.1 Cross-Validation
    3.5.2 Pre-Validation
  3.6 Support Vector Machines
    3.6.1 Logistic Regression with Separable Data
  3.7 Computational Details and glmnet
  Bibliographic Notes
  Exercises

4 Generalizations of the Lasso Penalty
  4.1 Introduction
  4.2 The Elastic Net
  4.3 The Group Lasso
    4.3.1 Computation for the Group Lasso
    4.3.2 Sparse Group Lasso
    4.3.3 The Overlap Group Lasso
  4.4 Sparse Additive Models and the Group Lasso
    4.4.1 Additive Models and Backfitting
    4.4.2 Sparse Additive Models and Backfitting
    4.4.3 Approaches Using Optimization and the Group Lasso
    4.4.4 Multiple Penalization for Sparse Additive Models
  4.5 The Fused Lasso
    4.5.1 Fitting the Fused Lasso
      4.5.1.1 Reparametrization
      4.5.1.2 A Path Algorithm
      4.5.1.3 A Dual Path Algorithm
      4.5.1.4 Dynamic Programming for the Fused Lasso
    4.5.2 Trend Filtering
    4.5.3 Nearly Isotonic Regression
  4.6 Nonconvex Penalties
  Bibliographic Notes
  Exercises

5 Optimization Methods
  5.1 Introduction
  5.2 Convex Optimality Conditions
    5.2.1 Optimality for Differentiable Problems
    5.2.2 Nondifferentiable Functions and Subgradients
  5.3 Gradient Descent
    5.3.1 Unconstrained Gradient Descent
    5.3.2 Projected Gradient Methods
    5.3.3 Proximal Gradient Methods
    5.3.4 Accelerated Gradient Methods
  5.4 Coordinate Descent
    5.4.1 Separability and Coordinate Descent
    5.4.2 Linear Regression and the Lasso
    5.4.3 Logistic Regression and Generalized Linear Models
  5.5 A Simulation Study
  5.6 Least Angle Regression
  5.7 Alternating Direction Method of Multipliers
  5.8 Minorization-Maximization Algorithms
  5.9 Biconvexity and Alternating Minimization
  5.10 Screening Rules
  Bibliographic Notes
  Appendix
  Exercises

6 Statistical Inference
  6.1 The Bayesian Lasso
  6.2 The Bootstrap
  6.3 Post-Selection Inference for the Lasso
    6.3.1 The Covariance Test
    6.3.2 A General Scheme for Post-Selection Inference
      6.3.2.1 Fixed-λ Inference for the Lasso
      6.3.2.2 The Spacing Test for LAR
    6.3.3 What Hypothesis Is Being Tested?
    6.3.4 Back to Forward Stepwise Regression
  6.4 Inference via a Debiased Lasso
  6.5 Other Proposals for Post-Selection Inference
  Bibliographic Notes
  Exercises

7 Matrix Decompositions, Approximations, and Completion
  7.1 Introduction
  7.2 The Singular Value Decomposition
  7.3 Missing Data and Matrix Completion
    7.3.1 The Netflix Movie Challenge
    7.3.2 Matrix Completion Using Nuclear Norm
    7.3.3 Theoretical Results for Matrix Completion
    7.3.4 Maximum Margin Factorization and Related Methods
  7.4 Reduced-Rank Regression
  7.5 A General Matrix Regression Framework
  7.6 Penalized Matrix Decomposition
  7.7 Additive Matrix Decomposition
  Bibliographic Notes
  Exercises

8 Sparse Multivariate Methods
  8.1 Introduction
  8.2 Sparse Principal Components Analysis
    8.2.1 Some Background
    8.2.2 Sparse Principal Components
      8.2.2.1 Sparsity from Maximum Variance
      8.2.2.2 Methods Based on Reconstruction
    8.2.3 Higher-Rank Solutions
      8.2.3.1 Illustrative Application of Sparse PCA
    8.2.4 Sparse PCA via Fantope Projection
    8.2.5 Sparse Autoencoders and Deep Learning
    8.2.6 Some Theory for Sparse PCA
  8.3 Sparse Canonical Correlation Analysis
    8.3.1 Example: Netflix Movie Rating Data
  8.4 Sparse Linear Discriminant Analysis
    8.4.1 Normal Theory and Bayes’ Rule
    8.4.2 Nearest Shrunken Centroids
    8.4.3 Fisher’s Linear Discriminant Analysis
      8.4.3.1 Example: Simulated Data with Five Classes
    8.4.4 Optimal Scoring
      8.4.4.1 Example: Face Silhouettes
  8.5 Sparse Clustering
    8.5.1 Some Background on Clustering
      8.5.1.1 Example: Simulated Data with Six Classes
    8.5.2 Sparse Hierarchical Clustering
    8.5.3 Sparse K-Means Clustering
    8.5.4 Convex Clustering
  Bibliographic Notes
  Exercises

9 Graphs and Model Selection
  9.1 Introduction
  9.2 Basics of Graphical Models
    9.2.1 Factorization and Markov Properties
      9.2.1.1 Factorization Property
      9.2.1.2 Markov Property
      9.2.1.3 Equivalence of Factorization and Markov Properties
    9.2.2 Some Examples
      9.2.2.1 Discrete Graphical Models
      9.2.2.2 Gaussian Graphical Models
  9.3 Graph Selection via Penalized Likelihood
    9.3.1 Global Likelihoods for Gaussian Models
    9.3.2 Graphical Lasso Algorithm
    9.3.3 Exploiting Block-Diagonal Structure
    9.3.4 Theoretical Guarantees for the Graphical Lasso
    9.3.5 Global Likelihood for Discrete Models
  9.4 Graph Selection via Conditional Inference
    9.4.1 Neighborhood-Based Likelihood for Gaussians
    9.4.2 Neighborhood-Based Likelihood for Discrete Models
    9.4.3 Pseudo-Likelihood for Mixed Models
  9.5 Graphical Models with Hidden Variables
  Bibliographic Notes
  Exercises

10 Signal Approximation and Compressed Sensing
  10.1 Introduction
  10.2 Signals and Sparse Representations
    10.2.1 Orthogonal Bases
    10.2.2 Approximation in Orthogonal Bases
    10.2.3 Reconstruction in Overcomplete Bases
  10.3 Random Projection and Approximation
    10.3.1 Johnson–Lindenstrauss Approximation
    10.3.2 Compressed Sensing
  10.4 Equivalence between ℓ0 and ℓ1 Recovery
    10.4.1 Restricted Nullspace Property
    10.4.2 Sufficient Conditions for Restricted Nullspace
    10.4.3 Proofs
      10.4.3.1 Proof of Theorem 10.1
      10.4.3.2 Proof of Proposition 10.1
  Bibliographic Notes
  Exercises

11 Theoretical Results for the Lasso
  11.1 Introduction
    11.1.1 Types of Loss Functions
    11.1.2 Types of Sparsity Models
  11.2 Bounds on Lasso ℓ2-Error
    11.2.1 Strong Convexity in the Classical Setting
    11.2.2 Restricted Eigenvalues for Regression
    11.2.3 A Basic Consistency Result
  11.3 Bounds on Prediction Error
  11.4 Support Recovery in Linear Regression
    11.4.1 Variable-Selection Consistency for the Lasso
      11.4.1.1 Some Numerical Studies
    11.4.2 Proof of Theorem 11.3
  11.5 Beyond the Basic Lasso
  Bibliographic Notes
  Exercises

Bibliography

Author Index

Index