
《广义主成分分析》Generalized Principal Component analysis.pdf

Preface
Acknowledgments
Contents
Glossary of Notation
1 Introduction
1.1 Modeling Data with a Parametric Model
1.1.1 The Choice of a Model Class
1.1.2 Statistical Models versus Geometric Models
1.2 Modeling Mixed Data with a Mixture Model
1.2.1 Examples of Mixed Data Modeling
1.2.2 Mathematical Representations of Mixture Models
1.3 Clustering via Discriminative or Nonparametric Methods
1.4 Noise, Errors, Outliers, and Model Selection
Part I Modeling Data with a Single Subspace
2 Principal Component Analysis
2.1 Classical Principal Component Analysis (PCA)
2.1.1 A Statistical View of PCA
2.1.2 A Geometric View of PCA
2.1.3 A Rank Minimization View of PCA
2.2 Probabilistic Principal Component Analysis (PPCA)
2.2.1 PPCA from Population Mean and Covariance
2.2.2 PPCA by Maximum Likelihood
2.3 Model Selection for Principal Component Analysis
2.3.1 Model Selection by Information-Theoretic Criteria
2.3.2 Model Selection by Rank Minimization
2.3.3 Model Selection by Asymptotic Mean Square Error
2.4 Bibliographic Notes
2.5 Exercises
3 Robust Principal Component Analysis
3.1 PCA with Robustness to Missing Entries
3.1.1 Incomplete PCA by Mean and Covariance Completion
3.1.2 Incomplete PPCA by Expectation Maximization
3.1.3 Matrix Completion by Convex Optimization
3.1.4 Incomplete PCA by Alternating Minimization
3.2 PCA with Robustness to Corrupted Entries
3.2.1 Robust PCA by Iteratively Reweighted Least Squares
3.2.2 Robust PCA by Convex Optimization
3.3 PCA with Robustness to Outliers
3.3.1 Outlier Detection by Robust Statistics
3.3.2 Outlier Detection by Convex Optimization
3.4 Bibliographic Notes
3.5 Exercises
4 Nonlinear and Nonparametric Extensions
4.1 Nonlinear and Kernel PCA
4.1.1 Nonlinear Principal Component Analysis (NLPCA)
4.1.2 NLPCA in a High-dimensional Feature Space
4.1.3 Kernel PCA (KPCA)
4.2 Nonparametric Manifold Learning
4.2.1 Multidimensional Scaling (MDS)
4.2.2 Locally Linear Embedding (LLE)
4.2.3 Laplacian Eigenmaps (LE)
4.3 K-Means and Spectral Clustering
4.3.1 K-Means Clustering
4.3.2 Spectral Clustering
4.4 Bibliographic Notes
4.5 Exercises
4.A Laplacian Eigenmaps: Continuous Formulation
Part II Modeling Data with Multiple Subspaces
5 Algebraic-Geometric Methods
5.1 Problem Formulation of Subspace Clustering
5.1.1 Projectivization of Affine Subspaces
5.1.2 Subspace Projection and Minimum Representation
5.2 Introductory Cases of Subspace Clustering
5.2.1 Clustering Points on a Line
5.2.2 Clustering Lines in a Plane
5.2.3 Clustering Hyperplanes
5.3 Subspace Clustering Knowing the Number of Subspaces
5.3.1 An Introductory Example
5.3.2 Fitting Polynomials to Subspaces
5.3.3 Subspaces from Polynomial Differentiation
5.3.4 Point Selection via Polynomial Division
5.3.5 The Basic Algebraic Subspace Clustering Algorithm
5.4 Subspace Clustering Not Knowing the Number of Subspaces
5.4.1 Introductory Examples
5.4.2 Clustering Subspaces of Equal Dimension
5.4.3 Clustering Subspaces of Different Dimensions
5.5 Model Selection for Multiple Subspaces
5.5.1 Effective Dimension of Samples of Multiple Subspaces
5.5.2 Minimum Effective Dimension of Noisy Samples
5.5.3 Recursive Algebraic Subspace Clustering
5.6 Bibliographic Notes
5.7 Exercises
6 Statistical Methods
6.1 K-Subspaces
6.1.1 K-Subspaces Model
6.1.2 K-Subspaces Algorithm
6.1.3 Convergence of the K-Subspaces Algorithm
6.1.4 Advantages and Disadvantages of K-Subspaces
6.2 Mixture of Probabilistic PCA (MPPCA)
6.2.1 MPPCA Model
6.2.2 Maximum Likelihood Estimation for MPPCA
6.2.3 Maximum a Posteriori (MAP) Estimation for MPPCA
6.2.4 Relationship between K-Subspaces and MPPCA
6.3 Compression-Based Subspace Clustering
6.3.1 Model Estimation and Data Compression
6.3.2 Minimum Coding Length via Agglomerative Clustering
6.3.3 Lossy Coding of Multivariate Data
6.3.4 Coding Length of Mixed Gaussian Data
6.4 Simulations and Applications
6.4.1 Statistical Methods on Synthetic Data
6.4.2 Statistical Methods on Gene Expression Clustering, Image Segmentation, and Face Clustering
6.5 Bibliographic Notes
6.6 Exercises
6.A Lossy Coding Length for Subspace-like Data
7 Spectral Methods
7.1 Spectral Subspace Clustering
7.2 Local Subspace Affinity (LSA) and Spectral Local Best-Fit Flats (SLBF)
7.3 Locally Linear Manifold Clustering (LLMC)
7.4 Spectral Curvature Clustering (SCC)
7.5 Spectral Algebraic Subspace Clustering (SASC)
7.6 Simulations and Applications
7.6.1 Spectral Methods on Synthetic Data
7.6.2 Spectral Methods on Face Clustering
7.7 Exercises
8 Sparse and Low-Rank Methods
8.1 Self-Expressiveness and Subspace-Preserving Representations
8.1.1 Self-Expressiveness Property
8.1.2 Subspace-Preserving Representation
8.2 Low-Rank Subspace Clustering (LRSC)
8.2.1 LRSC with Uncorrupted Data
8.2.2 LRSC with Robustness to Noise
8.2.3 LRSC with Robustness to Corruptions
8.3 Sparse Subspace Clustering (SSC)
8.3.1 SSC with Uncorrupted Data
8.3.2 SSC with Robustness to Outliers
8.3.3 SSC with Robustness to Noise
8.3.4 SSC with Robustness to Corrupted Entries
8.3.5 SSC for Affine Subspaces
8.4 Simulations and Applications
8.4.1 Low-Rank and Sparse Methods on Synthetic Data
8.4.2 Low-Rank and Sparse Methods on Face Clustering
8.5 Bibliographic Notes
8.6 Exercises
Part III Applications
9 Image Representation
9.1 Seeking Compact and Sparse Image Representations
9.1.1 Prefixed Linear Transformations
9.1.2 Adaptive, Overcomplete, and Hybrid Representations
9.1.3 Hierarchical Models for Multiscale Structures
9.2 Image Representation with Multiscale Hybrid Linear Models
9.2.1 Linear versus Hybrid Linear Models
9.2.2 Multiscale Hybrid Linear Models
9.2.3 Experiments and Comparisons
9.3 Multiscale Hybrid Linear Models in Wavelet Domain
9.3.1 Imagery Data Vectors in the Wavelet Domain
9.3.2 Hybrid Linear Models in the Wavelet Domain
9.3.3 Comparison with Other Lossy Representations
9.4 Bibliographic Notes
10 Image Segmentation
10.1 Basic Models and Principles
10.1.1 Problem Formulation
10.1.2 Image Segmentation as Subspace Clustering
10.1.3 Minimum Coding Length Principle
10.2 Encoding Image Textures and Boundaries
10.2.1 Construction of Texture Features
10.2.2 Texture Encoding
10.2.3 Boundary Encoding
10.3 Compression-Based Image Segmentation
10.3.1 Minimizing Total Coding Length
10.3.2 Hierarchical Implementation
10.3.3 Choosing the Proper Distortion Level
10.4 Experimental Evaluation
10.4.1 Color Spaces and Compressibility
10.4.2 Experimental Setup
10.4.3 Results and Discussions
10.5 Bibliographic Notes
11 Motion Segmentation
11.1 The 3D Motion Segmentation Problem
11.2 Motion Segmentation from Multiple Affine Views
11.2.1 Affine Projection of a Rigid-Body Motion
11.2.2 Motion Subspace of a Rigid-Body Motion
11.2.3 Segmentation of Multiple Rigid-Body Motions
11.2.4 Experiments on Multiview Motion Segmentation
11.3 Motion Segmentation from Two Perspective Views
11.3.1 Perspective Projection of a Rigid-Body Motion
11.3.2 Segmentation of 3D Translational Motions
11.3.3 Segmentation of Rigid-Body Motions
11.3.4 Segmentation of Rotational Motions or Planar Scenes
11.3.5 Experiments on Two-View Motion Segmentation
11.4 Temporal Motion Segmentation
11.4.1 Dynamical Models of Time-Series Data
11.4.2 Experiments on Temporal Video Segmentation
11.4.3 Experiments on Segmentation of Human Motion Data
11.5 Bibliographic Notes
12 Hybrid System Identification
12.1 Problem Statement
12.2 Identification of a Single ARX System
12.3 Identification of Hybrid ARX Systems
12.3.1 The Hybrid Decoupling Polynomial
12.3.2 Identifying the Hybrid Decoupling Polynomial
12.3.3 Identifying System Parameters and Discrete States
12.3.4 The Basic Algorithm and Its Extensions
12.4 Simulations and Experiments
12.4.1 Error in the Estimation of the Model Parameters
12.4.2 Error as a Function of the Model Orders
12.4.3 Error as a Function of Noise
12.4.4 Experimental Results on Test Data Sets
12.5 Bibliographic Notes
13 Final Words
13.1 Unbalanced and Multimodal Data
13.2 Unsupervised and Semisupervised Learning
13.3 Data Acquisition and Online Data Analysis
13.4 Other Low-Dimensional Models
13.5 Computability and Scalability
13.6 Theory, Algorithms, Systems, and Applications
A Basic Facts from Optimization
A.1 Unconstrained Optimization
A.1.1 Optimality Conditions
A.1.2 Convex Set and Convex Function
A.1.3 Subgradient
A.1.4 Gradient Descent Algorithm
A.1.5 Alternating Direction Minimization
A.2 Constrained Optimization
A.2.1 Optimality Conditions and Lagrange Multipliers
A.2.2 Augmented Lagrange Multiplier Methods
A.2.3 Alternating Direction Method of Multipliers
A.3 Exercises
B Basic Facts from Mathematical Statistics
B.1 Estimation of Parametric Models
B.1.1 Sufficient Statistics
B.1.2 Mean Square Error, Efficiency, and Fisher Information
B.1.3 The Rao–Blackwell Theorem and Uniformly Minimum-Variance Unbiased Estimator
B.1.4 Maximum Likelihood (ML) Estimator
B.1.5 Consistency and Asymptotic Efficiency of the ML Estimator
B.2 ML Estimation for Models with Latent Variables
B.2.1 Expectation Maximization (EM)
B.2.2 Maximum a Posteriori Expectation Maximization (MAP-EM)
B.3 Estimation of Mixture Models
B.3.1 EM for Mixture Models
B.3.2 MAP-EM for Mixture Models
B.3.3 A Case in Which EM Fails
B.4 Model-Selection Criteria
B.4.1 Akaike Information Criterion
B.4.2 Bayesian Information Criterion
B.5 Robust Statistical Methods
B.5.1 Influence-Based Outlier Detection
B.5.2 Probability-Based Outlier Detection
B.5.3 Random-Sampling-Based Outlier Detection
B.6 Exercises
C Basic Facts from Algebraic Geometry
C.1 Abstract Algebra Basics
C.1.1 Polynomial Rings
C.1.2 Ideals and Algebraic Sets
C.1.3 Algebra and Geometry: Hilbert's Nullstellensatz
C.1.4 Algebraic Sampling Theory
C.1.5 Decomposition of Ideals and Algebraic Sets
C.1.6 Hilbert Function, Polynomial, and Series
C.2 Ideals of Subspace Arrangements
C.3 Subspace Embedding and PL-Generated Ideals
C.4 Hilbert Functions of Subspace Arrangements
C.4.1 Hilbert Function and Algebraic Subspace Clustering
C.4.2 Special Cases of the Hilbert Function
C.4.3 Formulas for the Hilbert Function
C.5 Bibliographic Notes
References
Index
Interdisciplinary Applied Mathematics 40

René Vidal · Yi Ma · S. Shankar Sastry

Generalized Principal Component Analysis
Interdisciplinary Applied Mathematics
Volume 40

Editors: S.S. Antman, L. Greengard, P. Holmes
Series Advisors: Leon Glass, Robert Kohn, P.S. Krishnaprasad, James D. Murray, Shankar Sastry, James Sneyd

Problems in engineering, computational science, and the physical and biological sciences are using increasingly sophisticated mathematical techniques. Thus, the bridge between the mathematical sciences and other disciplines is heavily traveled. The correspondingly increased dialog between the disciplines has led to the establishment of the series Interdisciplinary Applied Mathematics. The purpose of this series is to meet the current and future needs for the interaction between various science and technology areas on the one hand and mathematics on the other. This is done, firstly, by encouraging the ways that mathematics may be applied in traditional areas, as well as pointing towards new and innovative areas of application; and, secondly, by encouraging other scientific disciplines to engage in a dialog with mathematicians, outlining their problems to both access new methods and suggest innovative developments within mathematics itself. The series will consist of monographs and high-level texts from researchers working on the interplay between mathematics and other fields of science and technology.

More information about this series at http://www.springer.com/series/1390
René Vidal
Center for Imaging Science
Department of Biomedical Engineering
Johns Hopkins University
Baltimore, MD, USA

Yi Ma
School of Information Science and Technology
ShanghaiTech University
Shanghai, China

S. Shankar Sastry
Department of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, CA, USA

ISSN 0939-6047; ISSN 2196-9973 (electronic)
Interdisciplinary Applied Mathematics
ISBN 978-0-387-87810-2; ISBN 978-0-387-87811-9 (eBook)
DOI 10.1007/978-0-387-87811-9

Library of Congress Control Number: 2015958763

Mathematics Subject Classification (2010): 30C10, 30C40, 62-XX, 62-07, 62-08, 62B10, 62Fxx, 62H12, 62H25, 62H35, 62Jxx, 62J05, 62J07, 14-XX, 14N20, 15-XX

Springer New York Heidelberg Dordrecht London

© Springer-Verlag New York 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer Science+Business Media LLC New York is part of Springer Science+Business Media (www.springer.com)
To Camille, Margot, and Romeo Vidal (R. V.)
To Diana Xiaoyuan Zhu, Barron, and Henry Ma (Y. M.)
To Claire Tomlin, Samuel, and Lucy Sastry (S. S.)
Preface

We are not very pleased when we are forced to accept a mathematical truth by virtue of a complicated chain of formal conclusions and computations, which we traverse blindly, link by link, feeling our way by touch. We want first an overview of the aim and of the road; we want to understand the idea of the proof, the deeper context.

—Hermann Weyl

Classical theory and methods for the analysis of data were established mainly for engineering and scientific problems that arose five or six decades ago. In these classical settings, engineers or scientists usually had full control of the data acquisition process. As a result, the data to be processed and analyzed were typically clean and complete: they contained only moderate amounts of noise and were often adequately collected for the specific task or problem of interest. In that regime, many data analysis methods were based on the assumption that most data sets have fewer effective degrees of freedom than the dimension of the ambient space. For example, the number of pixels in an image can be rather large, yet most computer vision models used only a few parameters to describe the appearance, geometry, and dynamics of a scene. This assumption motivated the development of a number of techniques for identifying low-dimensional structures in high-dimensional data, a problem that is important not only for understanding the data, but also for many practical purposes such as data compression and transmission.

A popular technique for discovering low-dimensional structure in data is principal component analysis (PCA), which assumes that the data are drawn from a single low-dimensional affine subspace of a high-dimensional space (Jolliffe 1986, 2002). PCA is arguably the simplest and most popular dimensionality reduction tool, and it has found widespread applications in many fields such as computer vision (Turk and Pentland 1991).

However, in the past decade or so, there has been a fundamental regime shift in data analysis. Currently, scientists and engineers often must deal with data whose dimension, size, and complexity expand at an explosive rate. Moreover, they have pretty much lost control of the data acquisition process. For instance, in 2012, 350 million photos were uploaded to Facebook every day, and 100 hours …
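As a concrete companion to the single-subspace model the preface describes (and which Chapter 2 develops in full), the following is a minimal sketch of classical PCA computed via the singular value decomposition. It is an illustration under stated assumptions, not code from the book: the column-wise data convention, the function name pca, and the synthetic example are all choices made here for clarity.

```python
# Minimal PCA sketch via SVD (illustrative; not code from the book).
# Assumes the N data points are the columns of a D x N matrix X.
import numpy as np

def pca(X, d):
    """Fit a d-dimensional affine subspace to the columns of X.

    Returns the mean mu (D x 1), an orthonormal basis U_d (D x d),
    and the low-dimensional coordinates Y (d x N) of the data.
    """
    mu = X.mean(axis=1, keepdims=True)              # center of the affine subspace
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    U_d = U[:, :d]                                  # top-d principal directions
    Y = U_d.T @ (X - mu)                            # principal components
    return mu, U_d, Y

# Example: noisy samples from a 2-dimensional subspace of R^10.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 500)) \
    + 0.05 * rng.standard_normal((10, 500))
mu, U_d, Y = pca(X, d=2)
X_hat = mu + U_d @ Y                                # rank-2 reconstruction
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))  # small relative residual
```

The truncated SVD of the centered data matrix gives, in one step, the statistical, geometric, and rank-minimization views of PCA that Sections 2.1.1 through 2.1.3 treat separately; the book's later chapters generalize exactly this computation to multiple subspaces and to corrupted or incomplete data.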