Preface
Acknowledgments
Contents
Glossary of Notation
1 Introduction
1.1 Modeling Data with a Parametric Model
1.1.1 The Choice of a Model Class
1.1.2 Statistical Models versus Geometric Models
1.2 Modeling Mixed Data with a Mixture Model
1.2.1 Examples of Mixed Data Modeling
1.2.2 Mathematical Representations of Mixture Models
1.3 Clustering via Discriminative or Nonparametric Methods
1.4 Noise, Errors, Outliers, and Model Selection
Part I Modeling Data with a Single Subspace
2 Principal Component Analysis
2.1 Classical Principal Component Analysis (PCA)
2.1.1 A Statistical View of PCA
2.1.2 A Geometric View of PCA
2.1.3 A Rank Minimization View of PCA
2.2 Probabilistic Principal Component Analysis (PPCA)
2.2.1 PPCA from Population Mean and Covariance
2.2.2 PPCA by Maximum Likelihood
2.3 Model Selection for Principal Component Analysis
2.3.1 Model Selection by Information-Theoretic Criteria
2.3.2 Model Selection by Rank Minimization
2.3.3 Model Selection by Asymptotic Mean Square Error
2.4 Bibliographic Notes
2.5 Exercises
3 Robust Principal Component Analysis
3.1 PCA with Robustness to Missing Entries
3.1.1 Incomplete PCA by Mean and Covariance Completion
3.1.2 Incomplete PPCA by Expectation Maximization
3.1.3 Matrix Completion by Convex Optimization
3.1.4 Incomplete PCA by Alternating Minimization
3.2 PCA with Robustness to Corrupted Entries
3.2.1 Robust PCA by Iteratively Reweighted Least Squares
3.2.2 Robust PCA by Convex Optimization
3.3 PCA with Robustness to Outliers
3.3.1 Outlier Detection by Robust Statistics
3.3.2 Outlier Detection by Convex Optimization
3.4 Bibliographic Notes
3.5 Exercises
4 Nonlinear and Nonparametric Extensions
4.1 Nonlinear and Kernel PCA
4.1.1 Nonlinear Principal Component Analysis (NLPCA)
4.1.2 NLPCA in a High-Dimensional Feature Space
4.1.3 Kernel PCA (KPCA)
4.2 Nonparametric Manifold Learning
4.2.1 Multidimensional Scaling (MDS)
4.2.2 Locally Linear Embedding (LLE)
4.2.3 Laplacian Eigenmaps (LE)
4.3 K-Means and Spectral Clustering
4.3.1 K-Means Clustering
4.3.2 Spectral Clustering
4.4 Bibliographic Notes
4.5 Exercises
4.A Laplacian Eigenmaps: Continuous Formulation
Part II Modeling Data with Multiple Subspaces
5 Algebraic-Geometric Methods
5.1 Problem Formulation of Subspace Clustering
5.1.1 Projectivization of Affine Subspaces
5.1.2 Subspace Projection and Minimum Representation
5.2 Introductory Cases of Subspace Clustering
5.2.1 Clustering Points on a Line
5.2.2 Clustering Lines in a Plane
5.2.3 Clustering Hyperplanes
5.3 Subspace Clustering Knowing the Number of Subspaces
5.3.1 An Introductory Example
5.3.2 Fitting Polynomials to Subspaces
5.3.3 Subspaces from Polynomial Differentiation
5.3.4 Point Selection via Polynomial Division
5.3.5 The Basic Algebraic Subspace Clustering Algorithm
5.4 Subspace Clustering Not Knowing the Number of Subspaces
5.4.1 Introductory Examples
5.4.2 Clustering Subspaces of Equal Dimension
5.4.3 Clustering Subspaces of Different Dimensions
5.5 Model Selection for Multiple Subspaces
5.5.1 Effective Dimension of Samples of Multiple Subspaces
5.5.2 Minimum Effective Dimension of Noisy Samples
5.5.3 Recursive Algebraic Subspace Clustering
5.6 Bibliographic Notes
5.7 Exercises
6 Statistical Methods
6.1 K-Subspaces
6.1.1 K-Subspaces Model
6.1.2 K-Subspaces Algorithm
6.1.3 Convergence of the K-Subspaces Algorithm
6.1.4 Advantages and Disadvantages of K-Subspaces
6.2 Mixture of Probabilistic PCA (MPPCA)
6.2.1 MPPCA Model
6.2.2 Maximum Likelihood Estimation for MPPCA
6.2.3 Maximum a Posteriori (MAP) Estimation for MPPCA
6.2.4 Relationship between K-Subspaces and MPPCA
6.3 Compression-Based Subspace Clustering
6.3.1 Model Estimation and Data Compression
6.3.2 Minimum Coding Length via Agglomerative Clustering
6.3.3 Lossy Coding of Multivariate Data
6.3.4 Coding Length of Mixed Gaussian Data
6.4 Simulations and Applications
6.4.1 Statistical Methods on Synthetic Data
6.4.2 Statistical Methods on Gene Expression Clustering, Image Segmentation, and Face Clustering
6.5 Bibliographic Notes
6.6 Exercises
6.A Lossy Coding Length for Subspace-Like Data
7 Spectral Methods
7.1 Spectral Subspace Clustering
7.2 Local Subspace Affinity (LSA) and Spectral Local Best-Fit Flats (SLBF)
7.3 Locally Linear Manifold Clustering (LLMC)
7.4 Spectral Curvature Clustering (SCC)
7.5 Spectral Algebraic Subspace Clustering (SASC)
7.6 Simulations and Applications
7.6.1 Spectral Methods on Synthetic Data
7.6.2 Spectral Methods on Face Clustering
7.7 Exercises
8 Sparse and Low-Rank Methods
8.1 Self-Expressiveness and Subspace-Preserving Representations
8.1.1 Self-Expressiveness Property
8.1.2 Subspace-Preserving Representation
8.2 Low-Rank Subspace Clustering (LRSC)
8.2.1 LRSC with Uncorrupted Data
8.2.2 LRSC with Robustness to Noise
8.2.3 LRSC with Robustness to Corruptions
8.3 Sparse Subspace Clustering (SSC)
8.3.1 SSC with Uncorrupted Data
8.3.2 SSC with Robustness to Outliers
8.3.3 SSC with Robustness to Noise
8.3.4 SSC with Robustness to Corrupted Entries
8.3.5 SSC for Affine Subspaces
8.4 Simulations and Applications
8.4.1 Low-Rank and Sparse Methods on Synthetic Data
8.4.2 Low-Rank and Sparse Methods on Face Clustering
8.5 Bibliographic Notes
8.6 Exercises
Part III Applications
9 Image Representation
9.1 Seeking Compact and Sparse Image Representations
9.1.1 Prefixed Linear Transformations
9.1.2 Adaptive, Overcomplete, and Hybrid Representations
9.1.3 Hierarchical Models for Multiscale Structures
9.2 Image Representation with Multiscale Hybrid Linear Models
9.2.1 Linear versus Hybrid Linear Models
9.2.2 Multiscale Hybrid Linear Models
9.2.3 Experiments and Comparisons
9.3 Multiscale Hybrid Linear Models in Wavelet Domain
9.3.1 Imagery Data Vectors in the Wavelet Domain
9.3.2 Hybrid Linear Models in the Wavelet Domain
9.3.3 Comparison with Other Lossy Representations
9.4 Bibliographic Notes
10 Image Segmentation
10.1 Basic Models and Principles
10.1.1 Problem Formulation
10.1.2 Image Segmentation as Subspace Clustering
10.1.3 Minimum Coding Length Principle
10.2 Encoding Image Textures and Boundaries
10.2.1 Construction of Texture Features
10.2.2 Texture Encoding
10.2.3 Boundary Encoding
10.3 Compression-Based Image Segmentation
10.3.1 Minimizing Total Coding Length
10.3.2 Hierarchical Implementation
10.3.3 Choosing the Proper Distortion Level
10.4 Experimental Evaluation
10.4.1 Color Spaces and Compressibility
10.4.2 Experimental Setup
10.4.3 Results and Discussions
10.5 Bibliographic Notes
11 Motion Segmentation
11.1 The 3D Motion Segmentation Problem
11.2 Motion Segmentation from Multiple Affine Views
11.2.1 Affine Projection of a Rigid-Body Motion
11.2.2 Motion Subspace of a Rigid-Body Motion
11.2.3 Segmentation of Multiple Rigid-Body Motions
11.2.4 Experiments on Multiview Motion Segmentation
11.3 Motion Segmentation from Two Perspective Views
11.3.1 Perspective Projection of a Rigid-Body Motion
11.3.2 Segmentation of 3D Translational Motions
11.3.3 Segmentation of Rigid-Body Motions
11.3.4 Segmentation of Rotational Motions or Planar Scenes
11.3.5 Experiments on Two-View Motion Segmentation
11.4 Temporal Motion Segmentation
11.4.1 Dynamical Models of Time-Series Data
11.4.2 Experiments on Temporal Video Segmentation
11.4.3 Experiments on Segmentation of Human Motion Data
11.5 Bibliographic Notes
12 Hybrid System Identification
12.1 Problem Statement
12.2 Identification of a Single ARX System
12.3 Identification of Hybrid ARX Systems
12.3.1 The Hybrid Decoupling Polynomial
12.3.2 Identifying the Hybrid Decoupling Polynomial
12.3.3 Identifying System Parameters and Discrete States
12.3.4 The Basic Algorithm and Its Extensions
12.4 Simulations and Experiments
12.4.1 Error in the Estimation of the Model Parameters
12.4.2 Error as a Function of the Model Orders
12.4.3 Error as a Function of Noise
12.4.4 Experimental Results on Test Data Sets
12.5 Bibliographic Notes
13 Final Words
13.1 Unbalanced and Multimodal Data
13.2 Unsupervised and Semisupervised Learning
13.3 Data Acquisition and Online Data Analysis
13.4 Other Low-Dimensional Models
13.5 Computability and Scalability
13.6 Theory, Algorithms, Systems, and Applications
A Basic Facts from Optimization
A.1 Unconstrained Optimization
A.1.1 Optimality Conditions
A.1.2 Convex Set and Convex Function
A.1.3 Subgradient
A.1.4 Gradient Descent Algorithm
A.1.5 Alternating Direction Minimization
A.2 Constrained Optimization
A.2.1 Optimality Conditions and Lagrange Multipliers
A.2.2 Augmented Lagrange Multiplier Methods
A.2.3 Alternating Direction Method of Multipliers
A.3 Exercises
B Basic Facts from Mathematical Statistics
B.1 Estimation of Parametric Models
B.1.1 Sufficient Statistics
B.1.2 Mean Square Error, Efficiency, and Fisher Information
B.1.3 The Rao–Blackwell Theorem and Uniformly Minimum-Variance Unbiased Estimator
B.1.4 Maximum Likelihood (ML) Estimator
B.1.5 Consistency and Asymptotic Efficiency of the ML Estimator
B.2 ML Estimation for Models with Latent Variables
B.2.1 Expectation Maximization (EM)
B.2.2 Maximum a Posteriori Expectation Maximization (MAP-EM)
B.3 Estimation of Mixture Models
B.3.1 EM for Mixture Models
B.3.2 MAP-EM for Mixture Models
B.3.3 A Case in Which EM Fails
B.4 Model-Selection Criteria
B.4.1 Akaike Information Criterion
B.4.2 Bayesian Information Criterion
B.5 Robust Statistical Methods
B.5.1 Influence-Based Outlier Detection
B.5.2 Probability-Based Outlier Detection
B.5.3 Random-Sampling-Based Outlier Detection
B.6 Exercises
C Basic Facts from Algebraic Geometry
C.1 Abstract Algebra Basics
C.1.1 Polynomial Rings
C.1.2 Ideals and Algebraic Sets
C.1.3 Algebra and Geometry: Hilbert's Nullstellensatz
C.1.4 Algebraic Sampling Theory
C.1.5 Decomposition of Ideals and Algebraic Sets
C.1.6 Hilbert Function, Polynomial, and Series
C.2 Ideals of Subspace Arrangements
C.3 Subspace Embedding and PL-Generated Ideals
C.4 Hilbert Functions of Subspace Arrangements
C.4.1 Hilbert Function and Algebraic Subspace Clustering
C.4.2 Special Cases of the Hilbert Function
C.4.3 Formulas for the Hilbert Function
C.5 Bibliographic Notes
References
Index