
Information Geometry and Its Applications

Preface
Contents
Part I Geometry of Divergence Functions: Dually Flat Riemannian Structure
1 Manifold, Divergence and Dually Flat Structure
1.1 Manifolds
1.1.1 Manifold and Coordinate Systems
1.1.2 Examples of Manifolds
1.2 Divergence Between Two Points
1.2.1 Divergence
1.2.2 Examples of Divergence
1.3 Convex Function and Bregman Divergence
1.3.1 Convex Function
1.3.2 Bregman Divergence
1.4 Legendre Transformation
1.5 Dually Flat Riemannian Structure Derived from Convex Function
1.5.1 Affine and Dual Affine Coordinate Systems
1.5.2 Tangent Space, Basis Vectors and Riemannian Metric
1.5.3 Parallel Transport of Vector
1.6 Generalized Pythagorean Theorem and Projection Theorem
1.6.1 Generalized Pythagorean Theorem
1.6.2 Projection Theorem
1.6.3 Divergence Between Submanifolds: Alternating Minimization Algorithm
2 Exponential Families and Mixture Families of Probability Distributions
2.1 Exponential Family of Probability Distributions
2.2 Examples of Exponential Family: Gaussian and Discrete Distributions
2.2.1 Gaussian Distribution
2.2.2 Discrete Distribution
2.3 Mixture Family of Probability Distributions
2.4 Flat Structure: e-flat and m-flat
2.5 On Infinite-Dimensional Manifold of Probability Distributions
2.6 Kernel Exponential Family
2.7 Bregman Divergence and Exponential Family
2.8 Applications of Pythagorean Theorem
2.8.1 Maximum Entropy Principle
2.8.2 Mutual Information
2.8.3 Repeated Observations and Maximum Likelihood Estimator
3 Invariant Geometry of Manifold of Probability Distributions
3.1 Invariance Criterion
3.2 Information Monotonicity Under Coarse Graining
3.2.1 Coarse Graining and Sufficient Statistics in S_n
3.2.2 Invariant Divergence
3.3 Examples of f-Divergence in S_n
3.3.1 KL-Divergence
3.3.2 χ²-Divergence
3.3.3 α-Divergence
3.4 General Properties of f-Divergence and KL-Divergence
3.4.1 Properties of f-Divergence
3.4.2 Properties of KL-Divergence
3.5 Fisher Information: The Unique Invariant Metric
3.6 f-Divergence in Manifold of Positive Measures
4 α-Geometry, Tsallis q-Entropy and Positive-Definite Matrices
4.1 Invariant and Flat Divergence
4.1.1 KL-Divergence Is Unique
4.1.2 α-Divergence Is Unique in R^n_+
4.2 α-Geometry in S_n and R^n_+
4.2.1 α-Geodesic and α-Pythagorean Theorem in R^n_+
4.2.2 α-Geodesic in S_n
4.2.3 α-Pythagorean Theorem and α-Projection Theorem in S_n
4.2.4 Apportionment Due to α-Divergence
4.2.5 α-Mean
4.2.6 α-Families of Probability Distributions
4.2.7 Optimality of α-Integration
4.2.8 Application to α-Integration of Experts
4.3 Geometry of Tsallis q-Entropy
4.3.1 q-Logarithm and q-Exponential Function
4.3.2 q-Exponential Family (α-Family) of Probability Distributions
4.3.3 q-Escort Geometry
4.3.4 Deformed Exponential Family: χ-Escort Geometry
4.3.5 Conformal Character of q-Escort Geometry
4.4 (u, v)-Divergence: Dually Flat Divergence in Manifold of Positive Measures
4.4.1 Decomposable (u, v)-Divergence
4.4.2 General (u, v) Flat Structure in R^n_+
4.5 Invariant Flat Divergence in Manifold of Positive-Definite Matrices
4.5.1 Bregman Divergence and Invariance Under Gl(n)
4.5.2 Invariant Flat Decomposable Divergences Under O(n)
4.5.3 Non-flat Invariant Divergences
4.6 Miscellaneous Divergences
4.6.1 γ-Divergence
4.6.2 Other Types of (α, β)-Divergences
4.6.3 Burbea--Rao Divergence and Jensen--Shannon Divergence
4.6.4 (ρ, τ)-Structure and (F, G, H)-Structure
Part II Introduction to Dual Differential Geometry
5 Elements of Differential Geometry
5.1 Manifold and Tangent Space
5.2 Riemannian Metric
5.3 Affine Connection
5.4 Tensors
5.5 Covariant Derivative
5.6 Geodesic
5.7 Parallel Transport of Vector
5.8 Riemann--Christoffel Curvature
5.8.1 Round-the-World Transport of Vector
5.8.2 Covariant Derivative and RC Curvature
5.8.3 Flat Manifold
5.9 Levi--Civita (Riemannian) Connection
5.10 Submanifold and Embedding Curvature
5.10.1 Submanifold
5.10.2 Embedding Curvature
6 Dual Affine Connections and Dually Flat Manifold
6.1 Dual Connections
6.2 Metric and Cubic Tensor Derived from Divergence
6.3 Invariant Metric and Cubic Tensor
6.4 α-Geometry
6.5 Dually Flat Manifold
6.6 Canonical Divergence in Dually Flat Manifold
6.7 Canonical Divergence in General Manifold of Dual Connections
6.8 Dual Foliations of Flat Manifold and Mixed Coordinates
6.8.1 k-cut of Dual Coordinate Systems: Mixed Coordinates and Foliation
6.8.2 Decomposition of Canonical Divergence
6.8.3 A Simple Illustrative Example: Neural Firing
6.8.4 Higher-Order Interactions of Neuronal Spikes
6.9 System Complexity and Integrated Information
6.10 Input--Output Analysis in Economics
Part III Information Geometry of Statistical Inference
7 Asymptotic Theory of Statistical Inference
7.1 Estimation
7.2 Estimation in Exponential Family
7.3 Estimation in Curved Exponential Family
7.4 First-Order Asymptotic Theory of Estimation
7.5 Higher-Order Asymptotic Theory of Estimation
7.6 Asymptotic Theory of Hypothesis Testing
8 Estimation in the Presence of Hidden Variables
8.1 EM Algorithm
8.1.1 Statistical Model with Hidden Variables
8.1.2 Minimizing Divergence Between Model Manifold and Data Manifold
8.1.3 EM Algorithm
8.1.4 Example: Gaussian Mixture
8.2 Loss of Information by Data Reduction
8.3 Estimation Based on Misspecified Statistical Model
9 Neyman--Scott Problem: Estimating Function and Semiparametric Statistical Model
9.1 Statistical Model Including Nuisance Parameters
9.2 Neyman--Scott Problem and Semiparametrics
9.3 Estimating Function
9.4 Information Geometry of Estimating Function
9.5 Solutions to Neyman--Scott Problems
9.5.1 Estimating Function in the Exponential Case
9.5.2 Coefficient of Linear Dependence
9.5.3 Scale Problem
9.5.4 Temporal Firing Pattern of Single Neuron
10 Linear Systems and Time Series
10.1 Stationary Time Series and Linear System
10.2 Typical Finite-Dimensional Manifolds of Time Series
10.3 Dual Geometry of System Manifold
10.4 Geometry of AR, MA and ARMA Models
Part IV Applications of Information Geometry
11 Machine Learning
11.1 Clustering Patterns
11.1.1 Pattern Space and Divergence
11.1.2 Center of Cluster
11.1.3 k-Means: Clustering Algorithm
11.1.4 Voronoi Diagram
11.1.5 Stochastic Version of Classification and Clustering
11.1.6 Robust Cluster Center
11.1.7 Asymptotic Evaluation of Error Probability in Pattern Recognition: Chernoff Information
11.2 Geometry of Support Vector Machine
11.2.1 Linear Classifier
11.2.2 Embedding into High-Dimensional Space
11.2.3 Kernel Method
11.2.4 Riemannian Metric Induced by Kernel
11.3 Stochastic Reasoning: Belief Propagation and CCCP Algorithms
11.3.1 Graphical Model
11.3.2 Mean Field Approximation and m-Projection
11.3.3 Belief Propagation
11.3.4 Solution of BP Algorithm
11.3.5 CCCP (Convex--Concave Computational Procedure)
11.4 Information Geometry of Boosting
11.4.1 Boosting: Integration of Weak Machines
11.4.2 Stochastic Interpretation of Machine
11.4.3 Construction of New Weak Machines
11.4.4 Determination of the Weights of Weak Machines
11.5 Bayesian Inference and Deep Learning
11.5.1 Bayesian Duality in Exponential Family
11.5.2 Restricted Boltzmann Machine
11.5.3 Unsupervised Learning of RBM
11.5.4 Geometry of Contrastive Divergence
11.5.5 Gaussian RBM
12 Natural Gradient Learning and Its Dynamics in Singular Regions
12.1 Natural Gradient Stochastic Descent Learning
12.1.1 On-Line Learning and Batch Learning
12.1.2 Natural Gradient: Steepest Descent Direction in Riemannian Manifold
12.1.3 Riemannian Metric, Hessian and Absolute Hessian
12.1.4 Stochastic Relaxation of Optimization Problem
12.1.5 Natural Policy Gradient in Reinforcement Learning
12.1.6 Mirror Descent and Natural Gradient
12.1.7 Properties of Natural Gradient Learning
12.2 Singularity in Learning: Multilayer Perceptron
12.2.1 Multilayer Perceptron
12.2.2 Singularities in M
12.2.3 Dynamics of Learning in M
12.2.4 Critical Slowdown of Dynamics
12.2.5 Natural Gradient Learning Is Free of Plateaus
12.2.6 Singular Statistical Models
12.2.7 Bayesian Inference and Singular Model
13 Signal Processing and Optimization
13.1 Principal Component Analysis
13.1.1 Eigenvalue Analysis
13.1.2 Principal Components, Minor Components and Whitening
13.1.3 Dynamics of Learning of Principal and Minor Components
13.2 Independent Component Analysis
13.2.3 Estimating Function of ICA: Semiparametric Approach
13.3 Non-negative Matrix Factorization
13.4 Sparse Signal Processing
13.4.1 Linear Regression and Sparse Solution
13.4.2 Minimization of Convex Function Under L1 Constraint
13.4.3 Analysis of Solution Path
13.4.4 Minkovskian Gradient Flow
13.4.5 Underdetermined Case
13.5 Optimization in Convex Programming
13.5.1 Convex Programming
13.5.2 Dually Flat Structure Derived from Barrier Function
13.5.3 Computational Complexity and m-curvature
13.6 Dual Geometry Derived from Game Theory
13.6.1 Minimization of Game-Score
13.6.2 Hyvärinen Score
References
Index
Applied Mathematical Sciences
Shun-ichi Amari
Information Geometry and Its Applications

Applied Mathematical Sciences, Volume 194

Editors:
S.S. Antman, Institute for Physical Science and Technology, University of Maryland, College Park, MD, USA (ssa@math.umd.edu)
Leslie Greengard, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA (Greengard@cims.nyu.edu)
P.J. Holmes, Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, NJ, USA (pholmes@math.princeton.edu)

Advisors:
J. Bell, Lawrence Berkeley National Lab, Center for Computational Sciences and Engineering, Berkeley, CA, USA
P. Constantin, Department of Mathematics, Princeton University, Princeton, NJ, USA
J. Keller, Department of Mathematics, Stanford University, Stanford, CA, USA
R. Kohn, Courant Institute of Mathematical Sciences, New York University, New York, USA
R. Pego, Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
L. Ryzhik, Department of Mathematics, Stanford University, Stanford, CA, USA
A. Singer, Department of Mathematics, Princeton University, Princeton, NJ, USA
A. Stevens, Department of Applied Mathematics, University of Münster, Münster, Germany
A. Stuart, Mathematics Institute, University of Warwick, Coventry, United Kingdom
S. Wright, Computer Sciences Department, University of Wisconsin, Madison, WI, USA

Founding Editors: Fritz John, Joseph P. LaSalle and Lawrence Sirovich

More information about this series at http://www.springer.com/series/34

Shun-ichi Amari
Brain Science Institute, RIKEN
Wako, Saitama, Japan

ISSN 0066-5452    ISSN 2196-968X (electronic)
Applied Mathematical Sciences
ISBN 978-4-431-55977-1    ISBN 978-4-431-55978-8 (eBook)
DOI 10.1007/978-4-431-55978-8
Library of Congress Control Number: 2015958849

© Springer Japan 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by SpringerNature
The registered company is Springer Japan KK
Preface

Information geometry is a method of exploring the world of information by means of modern geometry. Theories of information have so far been studied mostly by using algebraic, logical, analytical, and probabilistic methods. Since geometry studies mutual relations between elements such as distance and curvature, it should provide the information sciences with powerful tools.

Information geometry has emerged from studies of invariant geometrical structure involved in statistical inference. It defines a Riemannian metric together with dually coupled affine connections in a manifold of probability distributions. These structures play important roles not only in statistical inference but also in wider areas of information sciences, such as machine learning, signal processing, optimization, and even neuroscience, not to mention mathematics and physics.

It is intended that the present monograph will give an introduction to information geometry and an overview of wide areas of application. For this purpose, Part I begins with a divergence function in a manifold. We then show that this provides the manifold with a dually flat structure equipped with a Riemannian metric. A highlight is a generalized Pythagorean theorem in a dually flat information manifold. The results are understandable without knowledge of differential geometry.

Part II gives an introduction to modern differential geometry without tears. We try to present concepts in a way which is intuitively understandable, not sticking to rigorous mathematics. Throughout the monograph, we do not pursue a rigorous mathematical basis but rather develop a framework which gives practically useful and understandable descriptions.

Part III is devoted to statistical inference, where various topics will be found, including the Neyman–Scott problem, semiparametric models, and the EM algorithm.
Part IV overviews various applications of information geometry in the fields of machine learning, signal processing, and others.

Allow me to review my own personal history in information geometry. It was in 1958, when I was a graduate student on a master's course, that I followed a seminar on statistics. The text was "Information Theory and Statistics" by S. Kullback, and a professor suggested to me that the Fisher information might be regarded as a Riemannian metric. I calculated the Riemannian metric and curvature of the manifold of Gaussian distributions and found that it is a manifold of constant curvature, which is no different from the famous Poincaré half-plane in non-Euclidean geometry. I was enchanted by its beauty. I believed that a beautiful structure must have important practical significance, but I was not able to pursue its consequences further.

Fifteen years later, I was stimulated by a paper by Prof. B. Efron and accompanying discussions by Prof. A.P. Dawid, and restarted my investigation into information geometry. Later, I found that Prof. N.N. Chentsov had developed a theory along similar lines. I was lucky that Sir D. Cox noticed my approach and organized an international workshop on information geometry in 1984, in which many active statisticians participated. This was a good start for information geometry. Now information geometry has been developed worldwide and many symposia and workshops have been organized around the world. Its areas of application have been enlarged from statistical inference to wider fields of information sciences.

To my regret, I have not been able to introduce many excellent works by other researchers around the world. For example, I have not been able to touch upon quantum information geometry. Also I have not been able to refer to many important works, because of my limited capability.

Last but not least, I would like to thank Dr. M. Kumon and Prof. H. Nagaoka, who collaborated in the early period of the infancy of information geometry. I also thank the many researchers who have supported me in the process of construction of information geometry, Profs. D. Cox, C.R. Rao, O. Barndorff-Nielsen, S. Lauritzen, B. Efron, A.P. Dawid, K. Takeuchi, and the late N.N. Chentsov, among many, many others. Finally, I would like to thank Ms.
Emi Namioka who arranged my handwritten manuscripts in the beautiful TeX form. Without her devotion, the monograph would not have appeared.

April 2015
Shun-ichi Amari
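The preface's remark that the manifold of Gaussian distributions has constant curvature, matching the Poincaré half-plane, can be checked with a short standard calculation (sketched here for the reader; it is not taken from the book's text). For the family N(μ, σ²) parametrized by (μ, σ), the Fisher information gives the metric

```latex
\[
g_{\mu\mu} = \frac{1}{\sigma^{2}}, \qquad
g_{\sigma\sigma} = \frac{2}{\sigma^{2}}, \qquad
g_{\mu\sigma} = 0,
\qquad\text{so}\qquad
ds^{2} = \frac{d\mu^{2} + 2\,d\sigma^{2}}{\sigma^{2}}.
\]
\[
\text{Substituting } u = \mu/\sqrt{2}:\qquad
ds^{2} = 2\,\frac{du^{2} + d\sigma^{2}}{\sigma^{2}},
\]
```

which is twice the Poincaré half-plane metric (du² + dσ²)/σ², hence a space of constant negative curvature K = −1/2.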