
《广义主成分分析》Generalized Principal Component analysis.pdf

Preface
Acknowledgments
Contents
Glossary of Notation
1 Introduction
1.1 Modeling Data with a Parametric Model
1.1.1 The Choice of a Model Class
1.1.2 Statistical Models versus Geometric Models
1.2 Modeling Mixed Data with a Mixture Model
1.2.1 Examples of Mixed Data Modeling
1.2.2 Mathematical Representations of Mixture Models
1.3 Clustering via Discriminative or Nonparametric Methods
1.4 Noise, Errors, Outliers, and Model Selection
Part I Modeling Data with a Single Subspace
2 Principal Component Analysis
2.1 Classical Principal Component Analysis (PCA)
2.1.1 A Statistical View of PCA
2.1.2 A Geometric View of PCA
2.1.3 A Rank Minimization View of PCA
2.2 Probabilistic Principal Component Analysis (PPCA)
2.2.1 PPCA from Population Mean and Covariance
2.2.2 PPCA by Maximum Likelihood
2.3 Model Selection for Principal Component Analysis
2.3.1 Model Selection by Information-Theoretic Criteria
2.3.2 Model Selection by Rank Minimization
2.3.3 Model Selection by Asymptotic Mean Square Error
2.4 Bibliographic Notes
2.5 Exercises
3 Robust Principal Component Analysis
3.1 PCA with Robustness to Missing Entries
3.1.1 Incomplete PCA by Mean and Covariance Completion
3.1.2 Incomplete PPCA by Expectation Maximization
3.1.3 Matrix Completion by Convex Optimization
3.1.4 Incomplete PCA by Alternating Minimization
3.2 PCA with Robustness to Corrupted Entries
3.2.1 Robust PCA by Iteratively Reweighted Least Squares
3.2.2 Robust PCA by Convex Optimization
3.3 PCA with Robustness to Outliers
3.3.1 Outlier Detection by Robust Statistics
3.3.2 Outlier Detection by Convex Optimization
3.4 Bibliographic Notes
3.5 Exercises
4 Nonlinear and Nonparametric Extensions
4.1 Nonlinear and Kernel PCA
4.1.1 Nonlinear Principal Component Analysis (NLPCA)
4.1.2 NLPCA in a High-dimensional Feature Space
4.1.3 Kernel PCA (KPCA)
4.2 Nonparametric Manifold Learning
4.2.1 Multidimensional Scaling (MDS)
4.2.2 Locally Linear Embedding (LLE)
4.2.3 Laplacian Eigenmaps (LE)
4.3 K-Means and Spectral Clustering
4.3.1 K-Means Clustering
4.3.2 Spectral Clustering
4.4 Bibliographic Notes
4.5 Exercises
4.A Laplacian Eigenmaps: Continuous Formulation
Part II Modeling Data with Multiple Subspaces
5 Algebraic-Geometric Methods
5.1 Problem Formulation of Subspace Clustering
5.1.1 Projectivization of Affine Subspaces
5.1.2 Subspace Projection and Minimum Representation
5.2 Introductory Cases of Subspace Clustering
5.2.1 Clustering Points on a Line
5.2.2 Clustering Lines in a Plane
5.2.3 Clustering Hyperplanes
5.3 Subspace Clustering Knowing the Number of Subspaces
5.3.1 An Introductory Example
5.3.2 Fitting Polynomials to Subspaces
5.3.3 Subspaces from Polynomial Differentiation
5.3.4 Point Selection via Polynomial Division
5.3.5 The Basic Algebraic Subspace Clustering Algorithm
5.4 Subspace Clustering Not Knowing the Number of Subspaces
5.4.1 Introductory Examples
5.4.2 Clustering Subspaces of Equal Dimension
5.4.3 Clustering Subspaces of Different Dimensions
5.5 Model Selection for Multiple Subspaces
5.5.1 Effective Dimension of Samples of Multiple Subspaces
5.5.2 Minimum Effective Dimension of Noisy Samples
5.5.3 Recursive Algebraic Subspace Clustering
5.6 Bibliographic Notes
5.7 Exercises
6 Statistical Methods
6.1 K-Subspaces
6.1.1 K-Subspaces Model
6.1.2 K-Subspaces Algorithm
6.1.3 Convergence of the K-Subspaces Algorithm
6.1.4 Advantages and Disadvantages of K-Subspaces
6.2 Mixture of Probabilistic PCA (MPPCA)
6.2.1 MPPCA Model
6.2.2 Maximum Likelihood Estimation for MPPCA
6.2.3 Maximum a Posteriori (MAP) Estimation for MPPCA
6.2.4 Relationship between K-Subspaces and MPPCA
6.3 Compression-Based Subspace Clustering
6.3.1 Model Estimation and Data Compression
6.3.2 Minimum Coding Length via Agglomerative Clustering
6.3.3 Lossy Coding of Multivariate Data
6.3.4 Coding Length of Mixed Gaussian Data
6.4 Simulations and Applications
6.4.1 Statistical Methods on Synthetic Data
6.4.2 Statistical Methods on Gene Expression Clustering, Image Segmentation, and Face Clustering
6.5 Bibliographic Notes
6.6 Exercises
6.A Lossy Coding Length for Subspace-like Data
7 Spectral Methods
7.1 Spectral Subspace Clustering
7.2 Local Subspace Affinity (LSA) and Spectral Local Best-Fit Flats (SLBF)
7.3 Locally Linear Manifold Clustering (LLMC)
7.4 Spectral Curvature Clustering (SCC)
7.5 Spectral Algebraic Subspace Clustering (SASC)
7.6 Simulations and Applications
7.6.1 Spectral Methods on Synthetic Data
7.6.2 Spectral Methods on Face Clustering
7.7 Exercises
8 Sparse and Low-Rank Methods
8.1 Self-Expressiveness and Subspace-Preserving Representations
8.1.1 Self-Expressiveness Property
8.1.2 Subspace-Preserving Representation
8.2 Low-Rank Subspace Clustering (LRSC)
8.2.1 LRSC with Uncorrupted Data
8.2.2 LRSC with Robustness to Noise
8.2.3 LRSC with Robustness to Corruptions
8.3 Sparse Subspace Clustering (SSC)
8.3.1 SSC with Uncorrupted Data
8.3.2 SSC with Robustness to Outliers
8.3.3 SSC with Robustness to Noise
8.3.4 SSC with Robustness to Corrupted Entries
8.3.5 SSC for Affine Subspaces
8.4 Simulations and Applications
8.4.1 Low-Rank and Sparse Methods on Synthetic Data
8.4.2 Low-Rank and Sparse Methods on Face Clustering
8.5 Bibliographic Notes
8.6 Exercises
Part III Applications
9 Image Representation
9.1 Seeking Compact and Sparse Image Representations
9.1.1 Prefixed Linear Transformations
9.1.2 Adaptive, Overcomplete, and Hybrid Representations
9.1.3 Hierarchical Models for Multiscale Structures
9.2 Image Representation with Multiscale Hybrid Linear Models
9.2.1 Linear versus Hybrid Linear Models
9.2.2 Multiscale Hybrid Linear Models
9.2.3 Experiments and Comparisons
9.3 Multiscale Hybrid Linear Models in Wavelet Domain
9.3.1 Imagery Data Vectors in the Wavelet Domain
9.3.2 Hybrid Linear Models in the Wavelet Domain
9.3.3 Comparison with Other Lossy Representations
9.4 Bibliographic Notes
10 Image Segmentation
10.1 Basic Models and Principles
10.1.1 Problem Formulation
10.1.2 Image Segmentation as Subspace Clustering
10.1.3 Minimum Coding Length Principle
10.2 Encoding Image Textures and Boundaries
10.2.1 Construction of Texture Features
10.2.2 Texture Encoding
10.2.3 Boundary Encoding
10.3 Compression-Based Image Segmentation
10.3.1 Minimizing Total Coding Length
10.3.2 Hierarchical Implementation
10.3.3 Choosing the Proper Distortion Level
10.4 Experimental Evaluation
10.4.1 Color Spaces and Compressibility
10.4.2 Experimental Setup
10.4.3 Results and Discussions
10.5 Bibliographic Notes
11 Motion Segmentation
11.1 The 3D Motion Segmentation Problem
11.2 Motion Segmentation from Multiple Affine Views
11.2.1 Affine Projection of a Rigid-Body Motion
11.2.2 Motion Subspace of a Rigid-Body Motion
11.2.3 Segmentation of Multiple Rigid-Body Motions
11.2.4 Experiments on Multiview Motion Segmentation
11.3 Motion Segmentation from Two Perspective Views
11.3.1 Perspective Projection of a Rigid-Body Motion
11.3.2 Segmentation of 3D Translational Motions
11.3.3 Segmentation of Rigid-Body Motions
11.3.4 Segmentation of Rotational Motions or Planar Scenes
11.3.5 Experiments on Two-View Motion Segmentation
11.4 Temporal Motion Segmentation
11.4.1 Dynamical Models of Time-Series Data
11.4.2 Experiments on Temporal Video Segmentation
11.4.3 Experiments on Segmentation of Human Motion Data
11.5 Bibliographic Notes
12 Hybrid System Identification
12.1 Problem Statement
12.2 Identification of a Single ARX System
12.3 Identification of Hybrid ARX Systems
12.3.1 The Hybrid Decoupling Polynomial
12.3.2 Identifying the Hybrid Decoupling Polynomial
12.3.3 Identifying System Parameters and Discrete States
12.3.4 The Basic Algorithm and Its Extensions
12.4 Simulations and Experiments
12.4.1 Error in the Estimation of the Model Parameters
12.4.2 Error as a Function of the Model Orders
12.4.3 Error as a Function of Noise
12.4.4 Experimental Results on Test Data Sets
12.5 Bibliographic Notes
13 Final Words
13.1 Unbalanced and Multimodal Data
13.2 Unsupervised and Semisupervised Learning
13.3 Data Acquisition and Online Data Analysis
13.4 Other Low-Dimensional Models
13.5 Computability and Scalability
13.6 Theory, Algorithms, Systems, and Applications
A Basic Facts from Optimization
A.1 Unconstrained Optimization
A.1.1 Optimality Conditions
A.1.2 Convex Set and Convex Function
A.1.3 Subgradient
A.1.4 Gradient Descent Algorithm
A.1.5 Alternating Direction Minimization
A.2 Constrained Optimization
A.2.1 Optimality Conditions and Lagrange Multipliers
A.2.2 Augmented Lagrange Multiplier Methods
A.2.3 Alternating Direction Method of Multipliers
A.3 Exercises
B Basic Facts from Mathematical Statistics
B.1 Estimation of Parametric Models
B.1.1 Sufficient Statistics
B.1.2 Mean Square Error, Efficiency, and Fisher Information
B.1.3 The Rao–Blackwell Theorem and Uniformly Minimum-Variance Unbiased Estimator
B.1.4 Maximum Likelihood (ML) Estimator
B.1.5 Consistency and Asymptotic Efficiency of the ML Estimator
B.2 ML Estimation for Models with Latent Variables
B.2.1 Expectation Maximization (EM)
B.2.2 Maximum a Posteriori Expectation Maximization (MAP-EM)
B.3 Estimation of Mixture Models
B.3.1 EM for Mixture Models
B.3.2 MAP-EM for Mixture Models
B.3.3 A Case in Which EM Fails
B.4 Model-Selection Criteria
B.4.1 Akaike Information Criterion
B.4.2 Bayesian Information Criterion
B.5 Robust Statistical Methods
B.5.1 Influence-Based Outlier Detection
B.5.2 Probability-Based Outlier Detection
B.5.3 Random-Sampling-Based Outlier Detection
B.6 Exercises
C Basic Facts from Algebraic Geometry
C.1 Abstract Algebra Basics
C.1.1 Polynomial Rings
C.1.2 Ideals and Algebraic Sets
C.1.3 Algebra and Geometry: Hilbert's Nullstellensatz
C.1.4 Algebraic Sampling Theory
C.1.5 Decomposition of Ideals and Algebraic Sets
C.1.6 Hilbert Function, Polynomial, and Series
C.2 Ideals of Subspace Arrangements
C.3 Subspace Embedding and PL-Generated Ideals
C.4 Hilbert Functions of Subspace Arrangements
C.4.1 Hilbert Function and Algebraic Subspace Clustering
C.4.2 Special Cases of the Hilbert Function
C.4.3 Formulas for the Hilbert Function
C.5 Bibliographic Notes
References
Index
Interdisciplinary Applied Mathematics 40

René Vidal · Yi Ma · S. Shankar Sastry

Generalized Principal Component Analysis
Interdisciplinary Applied Mathematics
Volume 40

Editors: S.S. Antman, L. Greengard, P. Holmes
Series Advisors: Leon Glass, Robert Kohn, P.S. Krishnaprasad, James D. Murray, Shankar Sastry, James Sneyd

Problems in engineering, computational science, and the physical and biological sciences are using increasingly sophisticated mathematical techniques. Thus, the bridge between the mathematical sciences and other disciplines is heavily traveled. The correspondingly increased dialog between the disciplines has led to the establishment of the series Interdisciplinary Applied Mathematics. The purpose of this series is to meet the current and future needs for the interaction between various science and technology areas on the one hand and mathematics on the other. This is done, firstly, by encouraging the ways that mathematics may be applied in traditional areas, as well as pointing towards new and innovative areas of application; and, secondly, by encouraging other scientific disciplines to engage in a dialog with mathematicians, outlining their problems to both access new methods and suggest innovative developments within mathematics itself. The series will consist of monographs and high-level texts from researchers working on the interplay between mathematics and other fields of science and technology.

More information about this series at http://www.springer.com/series/1390
René Vidal
Center for Imaging Science
Department of Biomedical Engineering
Johns Hopkins University
Baltimore, MD, USA

Yi Ma
School of Information Science and Technology
ShanghaiTech University
Shanghai, China

S. Shankar Sastry
Department of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, CA, USA

ISSN 0939-6047; ISSN 2196-9973 (electronic)
Interdisciplinary Applied Mathematics
ISBN 978-0-387-87810-2; ISBN 978-0-387-87811-9 (eBook)
DOI 10.1007/978-0-387-87811-9

Library of Congress Control Number: 2015958763

Mathematics Subject Classification (2010): 30C10, 30C40, 62-XX, 62-07, 62-08, 62B10, 62Fxx, 62H12, 62H25, 62H35, 62Jxx, 62J05, 62J07, 14-XX, 14N20, 15-XX

Springer New York Heidelberg Dordrecht London

© Springer-Verlag New York 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer Science+Business Media LLC New York is part of Springer Science+Business Media (www.springer.com)
To Camille, Margot, and Romeo Vidal (R. V.)
To Diana Xiaoyuan Zhu, Barron, and Henry Ma (Y. M.)
To Claire Tomlin, Samuel, and Lucy Sastry (S. S.)
Preface

We are not very pleased when we are forced to accept a mathematical truth by virtue of a complicated chain of formal conclusions and computations, which we traverse blindly, link by link, feeling our way by touch. We want first an overview of the aim and of the road; we want to understand the idea of the proof, the deeper context.

—Hermann Weyl

Classical theory and methods for the analysis of data were established mainly for engineering and scientific problems that arose five or six decades ago. In these classical settings, engineers or scientists usually had full control of the data acquisition process. As a result, the data to be processed and analyzed were typically clean and complete: they contained only moderate amounts of noise and were often adequately collected for the specific task or problem of interest. In that regime, many data analysis methods were based on the assumption that most data sets have fewer effective degrees of freedom than the dimension of the ambient space. For example, the number of pixels in an image can be rather large, yet most computer vision models used only a few parameters to describe the appearance, geometry, and dynamics of a scene. This assumption motivated the development of a number of techniques for identifying low-dimensional structures in high-dimensional data, a problem that is important not only for understanding the data, but also for many practical purposes such as data compression and transmission.

A popular technique for discovering low-dimensional structure in data is principal component analysis (PCA), which assumes that the data are drawn from a single low-dimensional affine subspace of a high-dimensional space (Jolliffe 1986, 2002). PCA is arguably the simplest and most popular dimensionality reduction tool, and it has found widespread applications in many fields such as computer vision (Turk and Pentland 1991).

However, in the past decade or so, there has been a fundamental regime shift in data analysis. Currently, scientists and engineers often must deal with data whose dimension, size, and complexity expand at an explosive rate. Moreover, they have pretty much lost control of the data acquisition process. For instance, in 2012, 350 million photos were uploaded to Facebook every day, and 100 hours …
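As a concrete companion to the single-subspace model the preface describes (and which Chapter 2 develops in full), the following is a minimal sketch of classical PCA computed via the singular value decomposition. It is an illustration under stated assumptions, not code from the book: the column-wise data convention, the function name pca, and the synthetic example are all choices made here for clarity.

```python
# Minimal PCA sketch via SVD (illustrative; not code from the book).
# Assumes the N data points are the columns of a D x N matrix X.
import numpy as np

def pca(X, d):
    """Fit a d-dimensional affine subspace to the columns of X.

    Returns the mean mu (D x 1), an orthonormal basis U_d (D x d),
    and the low-dimensional coordinates Y (d x N) of the data.
    """
    mu = X.mean(axis=1, keepdims=True)              # center of the affine subspace
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    U_d = U[:, :d]                                  # top-d principal directions
    Y = U_d.T @ (X - mu)                            # principal components
    return mu, U_d, Y

# Example: noisy samples from a 2-dimensional subspace of R^10.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 500)) \
    + 0.05 * rng.standard_normal((10, 500))
mu, U_d, Y = pca(X, d=2)
X_hat = mu + U_d @ Y                                # rank-2 reconstruction
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))  # small relative residual
```

The truncated SVD of the centered data matrix gives, in one step, the statistical, geometric, and rank-minimization views of PCA that Sections 2.1.1 through 2.1.3 treat separately; the book's later chapters generalize exactly this computation to multiple subspaces and to corrupted or incomplete data.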