
Nonlinear Dimensionality Reduction

Notations
Acronyms
High-Dimensional Data
Practical motivations
Fields of application
The goals to be reached
Theoretical motivations
How can we visualize high-dimensional spaces?
Curse of dimensionality and empty space phenomenon
Some directions to be explored
Relevance of the variables
Dependencies between the variables
About topology, spaces, and manifolds
Two benchmark manifolds
Overview of the next chapters
Characteristics of an Analysis Method
Purpose
Expected functionalities
Estimation of the number of latent variables
Embedding for dimensionality reduction
Embedding for latent variable separation
Internal characteristics
Underlying model
Algorithm
Criterion
Example: Principal component analysis
Data model of PCA
Criteria leading to PCA
Functionalities of PCA
Algorithms
Examples and limitations of PCA
Toward a categorization of DR methods
Hard vs. soft dimensionality reduction
Traditional vs. generative model
Linear vs. nonlinear model
Continuous vs. discrete model
Implicit vs. explicit mapping
Integrated vs. external estimation of the dimensionality
Layered vs. standalone embeddings
Single vs. multiple coordinate systems
Optional vs. mandatory vector quantization
Batch vs. online algorithm
Exact vs. approximate optimization
The type of criterion to be optimized
Estimation of the Intrinsic Dimension
Definition of the intrinsic dimension
Fractal dimensions
The q-dimension
Capacity dimension
Information dimension
Correlation dimension
Some inequalities
Practical estimation
Other dimension estimators
Local methods
Trial and error
Comparisons
Data Sets
PCA estimator
Correlation dimension
Local PCA estimator
Trial and error
Concluding remarks
Distance Preservation
State-of-the-art
Spatial distances
Metric space, distances, norms and scalar product
Multidimensional scaling
Sammon's nonlinear mapping
Curvilinear component analysis
Graph distances
Geodesic distance and graph distance
Isomap
Geodesic NLM
Curvilinear distance analysis
Other distances
Kernel PCA
Semidefinite embedding
Topology Preservation
State of the art
Predefined lattice
Self-Organizing Maps
Generative Topographic Mapping
Data-driven lattice
Locally linear embedding
Laplacian eigenmaps
Isotop
Method comparisons
Toy examples
The Swiss roll
Manifolds having essential loops or spheres
Cortex unfolding
Image processing
Artificial faces
Real faces
Conclusions
Summary of the book
The problem
A basic solution
Dimensionality reduction
Latent variable separation
Intrinsic dimensionality estimation
Data flow
Variable Selection
Calibration
Linear dimensionality reduction
Nonlinear dimensionality reduction
Latent variable separation
Further processing
Model complexity
Taxonomy
Distance preservation
Topology preservation
Spectral methods
Nonspectral methods
Tentative methodology
Perspectives
Matrix Calculus
Singular value decomposition
Eigenvalue decomposition
Square root of a square matrix
Gaussian Variables
One-dimensional Gaussian distribution
Multidimensional Gaussian distribution
Uncorrelated Gaussian variables
Isotropic multivariate Gaussian distribution
Linearly mixed Gaussian variables
Optimization
Newton's method
Finding extrema
Multivariate version
Gradient ascent/descent
Stochastic gradient descent
Vector quantization
Classical techniques
Competitive learning
Taxonomy
Initialization and "dead units"
Graph Building
Without vector quantization
K-rule
ε-rule
τ-rule
With vector quantization
Data rule
Histogram rule
Implementation Issues
Dimension estimation
Capacity dimension
Correlation dimension
Computation of the closest point(s)
Graph distances
References
Index
Information Science and Statistics
Series Editors: M. Jordan, J. Kleinberg, B. Schölkopf
Information Science and Statistics
Akaike and Kitagawa: The Practice of Time Series Analysis.
Bishop: Pattern Recognition and Machine Learning.
Cowell, Dawid, Lauritzen, and Spiegelhalter: Probabilistic Networks and Expert Systems.
Doucet, de Freitas, and Gordon: Sequential Monte Carlo Methods in Practice.
Fine: Feedforward Neural Network Methodology.
Hawkins and Olwell: Cumulative Sum Charts and Charting for Quality Improvement.
Jensen and Nielsen: Bayesian Networks and Decision Graphs, Second Edition.
Lee and Verleysen: Nonlinear Dimensionality Reduction.
Marchette: Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint.
Rissanen: Information and Complexity in Statistical Modeling.
Rubinstein and Kroese: The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte Carlo Simulation, and Machine Learning.
Studený: Probabilistic Conditional Independence Structures.
Vapnik: The Nature of Statistical Learning Theory, Second Edition.
Wallace: Statistical and Inductive Inference by Minimum Message Length.
John A. Lee · Michel Verleysen
Nonlinear Dimensionality Reduction
John Lee
Molecular Imaging and Experimental Radiotherapy
Université catholique de Louvain
Avenue Hippocrate 54/69
B-1200 Bruxelles, Belgium
john.lee@uclouvain.be

Michel Verleysen
Machine Learning Group – DICE
Université catholique de Louvain
Place du Levant 3
B-1348 Louvain-la-Neuve, Belgium
michel.verleysen@uclouvain.be

Series Editors:
Michael Jordan, Division of Computer Science and Department of Statistics, University of California, Berkeley, Berkeley, CA 94720, USA
Jon Kleinberg, Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
Bernhard Schölkopf, Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany

Library of Congress Control Number: 2006939149
ISBN-13: 978-0-387-39350-6
e-ISBN-13: 978-0-387-39351-3
Printed on acid-free paper.

© 2007 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

9 8 7 6 5 4 3 2 1
springer.com
To our families
Preface

Methods of dimensionality reduction are innovative and important tools in the fields of data analysis, data mining, and machine learning. They provide a way to understand and visualize the structure of complex data sets. Traditional methods like principal component analysis and classical metric multidimensional scaling suffer from being based on linear models. Until recently, very few methods were able to reduce the data dimensionality in a nonlinear way. However, since the late 1990s, many new methods have been developed and nonlinear dimensionality reduction, also called manifold learning, has become a hot topic. New advances that account for this rapid growth are, for example, the use of graphs to represent the manifold topology, and the use of new metrics like the geodesic distance. In addition, new optimization schemes, based on kernel techniques and spectral decomposition, have led to spectral embedding, which encompasses many of the recently developed methods.

This book describes existing and advanced methods to reduce the dimensionality of numerical databases. For each method, the description starts from intuitive ideas, develops the necessary mathematical details, and ends by outlining the algorithmic implementation. Methods are compared with each other with the help of different illustrative examples.

The purpose of the book is to summarize clear facts and ideas about well-known methods as well as recent developments in the topic of nonlinear dimensionality reduction. With this goal in mind, methods are all described from a unifying point of view, in order to highlight their respective strengths and shortcomings.

The book is primarily intended for statisticians, computer scientists, and data analysts. It is also accessible to other practitioners having a basic background in statistics and/or computational learning, such as psychologists (in psychometry) and economists.

Louvain-la-Neuve, Belgium
October 2006

John A. Lee
Michel Verleysen
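The contrast the preface draws between linear projections and nonlinear, graph-based embeddings can be made concrete with a small experiment on the Swiss roll, one of the benchmark manifolds used throughout the book. The following sketch is not taken from the book; it assumes the scikit-learn library (make_swiss_roll, PCA, Isomap) and NumPy, and all parameter values are illustrative choices rather than recommendations.

```python
# A minimal sketch, assuming scikit-learn and NumPy are installed.
# It contrasts a linear projection (PCA) with a nonlinear embedding
# based on graph/geodesic distances (Isomap).
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# Sample points from the Swiss roll; t is the latent "unrolling" parameter.
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# Linear model: project onto the two leading principal axes.
X_pca = PCA(n_components=2).fit_transform(X)

# Nonlinear model: build a K-nearest-neighbor graph, approximate geodesic
# distances along it, and embed them with classical metric MDS (Isomap).
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# Crude check: how strongly each embedding coordinate correlates with the
# latent parameter t; the geodesic embedding should recover it far better,
# because the linear projection collapses the roll onto itself.
for name, Y in [("PCA", X_pca), ("Isomap", X_iso)]:
    corr = max(abs(np.corrcoef(t, Y[:, 0])[0, 1]),
               abs(np.corrcoef(t, Y[:, 1])[0, 1]))
    print(f"{name}: best |corr| with latent parameter = {corr:.2f}")
```

Isomap is used here only as a convenient stand-in for the geodesic-distance methods discussed later in the book; the same comparison could be run with any of the nonlinear methods listed in the contents below.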
Contents

Notations
Acronyms

1 High-Dimensional Data
  1.1 Practical motivations
    1.1.1 Fields of application
    1.1.2 The goals to be reached
  1.2 Theoretical motivations
    1.2.1 How can we visualize high-dimensional spaces?
    1.2.2 Curse of dimensionality and empty space phenomenon
  1.3 Some directions to be explored
    1.3.1 Relevance of the variables
    1.3.2 Dependencies between the variables
  1.4 About topology, spaces, and manifolds
  1.5 Two benchmark manifolds
  1.6 Overview of the next chapters

2 Characteristics of an Analysis Method
  2.1 Purpose
  2.2 Expected functionalities
    2.2.1 Estimation of the number of latent variables
    2.2.2 Embedding for dimensionality reduction
    2.2.3 Embedding for latent variable separation
  2.3 Internal characteristics
    2.3.1 Underlying model
    2.3.2 Algorithm
    2.3.3 Criterion
  2.4 Example: Principal component analysis
    2.4.1 Data model of PCA
    2.4.2 Criteria leading to PCA