Information Geometry And Its Application.pdf

发布时间：2022-06-14 发布人：admin 分类：说明书资料大小：6.14M 资料格式：pdf 举报版权申诉

d0da2ba7-b06d-4459-a8a8-e2b16b08d1ac.pdf-第1页.png

第1页 / 共376页

d0da2ba7-b06d-4459-a8a8-e2b16b08d1ac.pdf-第2页.png

第2页 / 共376页

d0da2ba7-b06d-4459-a8a8-e2b16b08d1ac.pdf-第3页.png

第3页 / 共376页

d0da2ba7-b06d-4459-a8a8-e2b16b08d1ac.pdf-第4页.png

第4页 / 共376页

d0da2ba7-b06d-4459-a8a8-e2b16b08d1ac.pdf-第5页.png

第5页 / 共376页

d0da2ba7-b06d-4459-a8a8-e2b16b08d1ac.pdf-第6页.png

第6页 / 共376页

d0da2ba7-b06d-4459-a8a8-e2b16b08d1ac.pdf-第7页.png

第7页 / 共376页

d0da2ba7-b06d-4459-a8a8-e2b16b08d1ac.pdf-第8页.png

第8页 / 共376页

Preface

Contents

Part I Geometry of Divergence Functions: Dually Flat Riemannian Structure

1 Manifold, Divergence and Dually Flat Structure

1.1 Manifolds

1.1.1 Manifold and Coordinate Systems

1.1.2 Examples of Manifolds

1.2 Divergence Between Two Points

1.2.1 Divergence

1.2.2 Examples of Divergence

1.3 Convex Function and Bregman Divergence

1.3.1 Convex Function

1.3.2 Bregman Divergence

1.4 Legendre Transformation

1.5 Dually Flat Riemannian Structure Derived from Convex Function

1.5.1 Affine and Dual Affine Coordinate Systems

1.5.2 Tangent Space, Basis Vectors and Riemannian Metric

1.5.3 Parallel Transport of Vector

1.6 Generalized Pythagorean Theorem and Projection Theorem

1.6.1 Generalized Pythagorean Theorem

1.6.2 Projection Theorem

1.6.3 Divergence Between Submanifolds: Alternating Minimization Algorithm

2 Exponential Families and Mixture Families of Probability Distributions

2.1 Exponential Family of Probability Distributions

2.2 Examples of Exponential Family: Gaussian and Discrete Distributions

2.2.1 Gaussian Distribution

2.2.2 Discrete Distribution

2.3 Mixture Family of Probability Distributions

2.4 Flat Structure: e-flat and m-flat

2.5 On Infinite-Dimensional Manifold of Probability Distributions

2.6 Kernel Exponential Family

2.7 Bregman Divergence and Exponential Family

2.8 Applications of Pythagorean Theorem

2.8.1 Maximum Entropy Principle

2.8.2 Mutual Information

2.8.3 Repeated Observations and Maximum Likelihood Estimator

3 Invariant Geometry of Manifold of Probability Distributions

3.1 Invariance Criterion

3.2 Information Monotonicity Under Coarse Graining

3.2.1 Coarse Graining and Sufficient Statistics in Sn

3.2.2 Invariant Divergence

3.3 Examples of f-Divergence in Sn

3.3.1 KL-Divergence

3.3.2 χ2-Divergence

3.3.3 α-Divergence

3.4 General Properties of f-Divergence and KL-Divergence

3.4.1 Properties of f-Divergence

3.4.2 Properties of KL-Divergence

3.5 Fisher Information: The Unique Invariant Metric

3.6 f-Divergence in Manifold of Positive Measures

4 α-Geometry, Tsallis q-Entropy and Positive-Definite Matrices

4.1 Invariant and Flat Divergence

4.1.1 KL-Divergence Is Unique

4.1.2 α-Divergence Is Unique in Rn+

4.2 α-Geometry in Sn and Rn+

4.2.1 α-Geodesic and α-Pythagorean Theorem in Rn+

4.2.2 α-Geodesic in Sn

4.2.3 α-Pythagorean Theorem and α-Projection Theorem in Sn

4.2.4 Apportionment Due to α-Divergence

4.2.5 α-Mean

4.2.6 α-Families of Probability Distributions

4.2.7 Optimality of α-Integration

4.2.8 Application to α-Integration of Experts

4.3 Geometry of Tsallis q-Entropy

4.3.1 q-Logarithm and q-Exponential Function

4.3.2 q-Exponential Family (α-Family) of Probability Distributions

4.3.3 q-Escort Geometry

4.3.4 Deformed Exponential Family: χ-Escort Geometry

4.3.5 Conformal Character of q-Escort Geometry

4.4 (u, v)-Divergence: Dually Flat Divergence in Manifold of Positive Measures

4.4.1 Decomposable (u, v)-Divergence

4.4.2 General (u, v) Flat Structure in Rn+

4.5 Invariant Flat Divergence in Manifold of Positive-Definite Matrices

4.5.1 Bregman Divergence and Invariance Under Gl(n)

4.5.2 Invariant Flat Decomposable Divergences Under O(n)

4.5.3 Non-flat Invariant Divergences

4.6 Miscellaneous Divergences

4.6.1 γ-Divergence

4.6.2 Other Types of (α, β)-Divergences

4.6.3 Burbea--Rao Divergence and Jensen--Shannon Divergence

4.6.4 (ρ, τ)-Structure and (F, G, H)-Structure

Part II Introduction to Dual Differential Geometry

5 Elements of Differential Geometry

5.1 Manifold and Tangent Space

5.2 Riemannian Metric

5.3 Affine Connection

5.4 Tensors

5.5 Covariant Derivative

5.6 Geodesic

5.7 Parallel Transport of Vector

5.8 Riemann--Christoffel Curvature

5.8.1 Round-the-World Transport of Vector

5.8.2 Covariant Derivative and RC Curvature

5.8.3 Flat Manifold

5.9 Levi--Civita (Riemannian) Connection

5.10 Submanifold and Embedding Curvature

5.10.1 Submanifold

5.10.2 Embedding Curvature

6 Dual Affine Connections and Dually Flat Manifold

6.1 Dual Connections

6.2 Metric and Cubic Tensor Derived from Divergence

6.3 Invariant Metric and Cubic Tensor

6.4 α-Geometry

6.5 Dually Flat Manifold

6.6 Canonical Divergence in Dually Flat Manifold

6.7 Canonical Divergence in General Manifold of Dual Connections

6.8 Dual Foliations of Flat Manifold and Mixed Coordinates

6.8.1 k-cut of Dual Coordinate Systems: Mixed Coordinates and Foliation

6.8.2 Decomposition of Canonical Divergence

6.8.3 A Simple Illustrative Example: Neural Firing

6.8.4 Higher-Order Interactions of Neuronal Spikes

6.9 System Complexity and Integrated Information

6.10 Input--Output Analysis in Economics

Part III Information Geometry of Statistical Inference

7 Asymptotic Theory of Statistical Inference

7.1 Estimation

7.2 Estimation in Exponential Family

7.3 Estimation in Curved Exponential Family

7.4 First-Order Asymptotic Theory of Estimation

7.5 Higher-Order Asymptotic Theory of Estimation

7.6 Asymptotic Theory of Hypothesis Testing

8 Estimation in the Presence of Hidden Variables

8.1 EM Algorithm

8.1.1 Statistical Model with Hidden Variables

8.1.2 Minimizing Divergence Between Model Manifold and Data Manifold

8.1.3 EM Algorithm

8.1.4 Example: Gaussian Mixture

8.2 Loss of Information by Data Reduction

8.3 Estimation Based on Misspecified Statistical Model

9 Neyman-Scott Problem: Estimating Function and Semiparametric Statistical Model

9.1 Statistical Model Including Nuisance Parameters

9.2 Neyman--Scott Problem and Semiparametrics

9.3 Estimating Function

9.4 Information Geometry of Estimating Function

9.5 Solutions to Neyman--Scott Problems

9.5.1 Estimating Function in the Exponential Case

9.5.2 Coefficient of Linear Dependence

9.5.3 Scale Problem

9.5.4 Temporal Firing Pattern of Single Neuron

10 Linear Systems and Time Series

10.1 Stationary Time Series and Linear System

10.2 Typical Finite-Dimensional Manifolds of Time Series

10.3 Dual Geometry of System Manifold

10.4 Geometry of AR, MA and ARMA Models

Part IV Applications of Information Geometry

11 Machine Learning

11.1 Clustering Patterns

11.1.1 Pattern Space and Divergence

11.1.2 Center of Cluster

11.1.3 k-Means: Clustering Algorithm

11.1.4 Voronoi Diagram

11.1.5 Stochastic Version of Classification and Clustering

11.1.6 Robust Cluster Center

11.1.7 Asmptotic Evaluation of Error Probability in Pattern Recognition: Chernoff Information

11.2 Geometry of Support Vector Machine

11.2.1 Linear Classifier

11.2.2 Embedding into High-Dimensional Space

11.2.3 Kernel Method

11.2.4 Riemannian Metric Induced by Kernel

11.3 Stochastic Reasoning: Belief Propagation and CCCP Algorithms

11.3.1 Graphical Model

11.3.2 Mean Field Approximation and m-Projection

11.3.3 Belief Propagation

11.3.4 Solution of BP Algorithm

11.3.5 CCCP (Convex--Concave Computational Procedure)

11.4 Information Geometry of Boosting

11.4.1 Boosting: Integration of Weak Machines

11.4.2 Stochastic Interpretation of Machine

11.4.3 Construction of New Weak Machines

11.4.4 Determination of the Weights of Weak Machines

11.5 Bayesian Inference and Deep Learning

11.5.1 Bayesian Duality in Exponential Family

11.5.2 Restricted Boltzmann Machine

11.5.3 Unsupervised Learning of RBM

11.5.4 Geometry of Contrastive Divergence

11.5.5 Gaussian RBM

12 Natural Gradient Learning and Its Dynamics in Singular Regions

12.1 Natural Gradient Stochastic Descent Learning

12.1.1 On-Line Learning and Batch Learning

12.1.2 Natural Gradient: Steepest Descent Direction in Riemannian Manifold

12.1.3 Riemannian Metric, Hessian and Absolute Hessian

12.1.4 Stochastic Relaxation of Optimization Problem

12.1.5 Natural Policy Gradient in Reinforcement Learning

12.1.6 Mirror Descent and Natural Gradient

12.1.7 Properties of Natural Gradient Learning

12.2 Singularity in Learning: Multilayer Perceptron

12.2.1 Multilayer Perceptron

12.2.2 Singularities in M

12.2.3 Dynamics of Learning in M

12.2.4 Critical Slowdown of Dynamics

12.2.5 Natural Gradient Learning Is Free of Plateaus

12.2.6 Singular Statistical Models

12.2.7 Bayesian Inference and Singular Model

13 Signal Processing and Optimization

13.1 Principal Component Analysis

13.1.1 Eigenvalue Analysis

13.1.2 Principal Components, Minor Components and Whitening

13.1.3 Dynamics of Learning of Principal and Minor Components

13.2 Independent Component Analysis

13.2.3 Estimating Function of ICA: Semiparametric Approach

13.3 Non-negative Matrix Factorization

13.4 Sparse Signal Processing

13.4.1 Linear Regression and Sparse Solution

13.4.2 Minimization of Convex Function Under L1 Constraint

13.4.3 Analysis of Solution Path

13.4.4 Minkovskian Gradient Flow

13.4.5 Underdetermined Case

13.5 Optimization in Convex Programming

13.5.1 Convex Programming

13.5.2 Dually Flat Structure Derived from Barrier Function

13.5.3 Computational Complexity and m-curvature

13.6 Dual Geometry Derived from Game Theory

13.6.1 Minimization of Game-Score

13.6.2 Hyvärinen Score

References

Index

Applied Mathematical Sciences Shun-ichi Amari Information Geometry and Its Applications

Applied Mathematical Sciences Volume 194 Editors S.S. Antman, Institute for Physical Science and Technology, University of Maryland, College Park, MD, USA ssa@math.umd.edu Leslie Greengard, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA Greengard@cims.nyu.edu P.J. Holmes, Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, NJ, USA pholmes@math.princeton.edu Advisors J. Bell, Lawrence Berkeley National Lab, Center for Computational Sciences and Engineering, Berkeley, CA, USA P. Constantin, Department of Mathematics, Princeton University, Princeton, NJ, USA J. Keller, Department of Mathematics, Stanford University, Stanford, CA, USA R. Kohn, Courant Institute of Mathematical Sciences, New York University, New York, USA R. Pego, Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, PA, USA L. Ryzhik, Department of Mathematics, Stanford University, Stanford, CA, USA A. Singer, Department of Mathematics, Princeton University, Princeton, NJ, USA A. Stevens, Department of Applied Mathematics, University of Münster, Münster, Germany A. Stuart, Mathematics Institute, University of Warwick, Coventry, United Kingdom S. Wright, Computer Sciences Department, University of Wisconsin, Madison, WI, USA Founding Editors Fritz John, Joseph P. LaSalle and Lawrence Sirovich

More information about this series at http://www.springer.com/series/34

Shun-ichi Amari Information Geometry and Its Applications 123

Shun-ichi Amari Brain Science Institute RIKEN Wako, Saitama Japan ISSN 0066-5452 Applied Mathematical Sciences ISBN 978-4-431-55977-1 DOI 10.1007/978-4-431-55978-8 ISSN 2196-968X (electronic) ISBN 978-4-431-55978-8 (eBook) Library of Congress Control Number: 2015958849 © Springer Japan 2016 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microﬁlms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, in this publication does not imply, even in the absence of a speciﬁc statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. trademarks, service marks, etc. Printed on acid-free paper This Springer imprint is published by SpringerNature The registered company is Springer Japan KK

Preface Information geometry is a method of exploring the world of information by means of modern geometry. Theories of information have so far been studied mostly by using algebraic, logical, analytical, and probabilistic methods. Since geometry studies mutual relations between elements such as distance and curvature, it should provide the information sciences with powerful tools. Information geometry has emerged from studies of invariant geometrical structure involved in statistical inference. It deﬁnes a Riemannian metric together with dually coupled afﬁne connections in a manifold of probability distributions. These structures play important roles not only in statistical inference but also in wider areas of information sciences, such as machine learning, signal processing, optimization, and even neuroscience, not to mention mathematics and physics. It is intended that the present monograph will give an introduction to information geometry and an overview of wide areas of application. For this purpose, Part I begins with a divergence function in a manifold. We then show that this provides the manifold with a dually ﬂat structure equipped with a Riemannian metric. A highlight is a generalized Pythagorean theorem in a dually ﬂat information manifold. The results are understandable without knowledge of differential geometry. Part II gives an introduction to modern differential geometry without tears. We try to present concepts in a way which is intuitively understandable, not sticking to rigorous mathematics. Throughout the monograph, we do not pursue a rigorous mathematical basis but rather develop a framework which gives practically useful and understandable descriptions. Part III is devoted to statistical inference, where various topics will be found, including the Neyman–Scott problem, semiparametric models, and the EM algo- rithm. Part IV overviews various applications of information geometry in the ﬁelds of machine learning, signal processing, and others. Allow me to review my own personal history in information geometry. It was in 1958, when I was a graduate student on a master’s course, that I followed a seminar on statistics. The text was “Information Theory and Statistics” by S. Kullback, and v

vi Preface a professor suggested to me that the Fisher information might be regarded as a Riemannian metric. I calculated the Riemannian metric and curvature of the manifold of Gaussian distributions and found that it is a manifold of constant from the famous Poincaré half-plane in curvature, which is no different non-Euclidean geometry. I was enchanted by its beauty. I believed that a beautiful structure must have important practical signiﬁcance, but I was not able to pursue its consequences further. Fifteen years later, I was stimulated by a paper by Prof. B. Efron and accom- panying discussions by Prof. A.P. Dawid, and restarted my investigation into information geometry. Later, I found that Prof. N.N. Chentsov had developed a theory along similar lines. I was lucky that Sir D. Cox noticed my approach and organized an international workshop on information geometry in 1984, in which many active statisticians participated. This was a good start for information geometry. Now information geometry has been developed worldwide and many symposia and workshops have been organized around the world. Its areas of application have been enlarged from statistical inference to wider ﬁelds of information sciences. To my regret, I have not been able to introduce many excellent works by other researchers around the world. For example, I have not been able to touch upon quantum information geometry. Also I have not been able to refer to many important works, because of my limited capability. Last but not least, I would like to thank Dr. M. Kumon and Prof. H. Nagaoka, who collaborated in the early period of the infancy of information geometry. I also thank the many researchers who have supported me in the process of construction of information geometry, Profs. D. Cox, C.R. Rao, O. Barndorff-Nielsen, S. Lauritzen, B. Efron, A.P. Dawid, K. Takeuchi, and the late N.N. Chentsov, among many many others. Finally, I would like to thank Ms. Emi Namioka who arranged my handwritten manuscripts in the beautiful TEX form. Without her devotion, the monograph would not have appeared. April 2015 Shun-ichi Amari

Contents Part I Geometry of Divergence Functions: Dually Flat Riemannian Structure 1.2 1.3 1 Manifold, Divergence and Dually Flat Structure . . . . . . . . . . . . . . 1.1 Manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.1 Manifold and Coordinate Systems . . . . . . . . . . . . . . . 1.1.2 Examples of Manifolds . . . . . . . . . . . . . . . . . . . . . . . Divergence Between Two Points . . . . . . . . . . . . . . . . . . . . . . Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.1 1.2.2 Examples of Divergence . . . . . . . . . . . . . . . . . . . . . . Convex Function and Bregman Divergence . . . . . . . . . . . . . . . 1.3.1 Convex Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.2 Bregman Divergence . . . . . . . . . . . . . . . . . . . . . . . . Legendre Transformation. . . . . . . . . . . . . . . . . . . . . . . . . . . . Dually Flat Riemannian Structure Derived from Convex Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Afﬁne and Dual Afﬁne Coordinate Systems . . . . . . . . . 1.5.1 Tangent Space, Basis Vectors 1.5.2 and Riemannian Metric . . . . . . . . . . . . . . . . . . . . . . . Parallel Transport of Vector. . . . . . . . . . . . . . . . . . . . 1.5.3 Generalized Pythagorean Theorem and Projection Theorem . . . . Generalized Pythagorean Theorem . . . . . . . . . . . . . . . 1.6.1 Projection Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 1.6.2 1.6.3 Divergence Between Submanifolds: Alternating Minimization Algorithm . . . . . . . . . . . . . . . . . . . . . . 1.4 1.5 1.6 2 Exponential Families and Mixture Families of Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Exponential Family of Probability Distributions . . . . . . . . . . . . 3 3 3 5 9 9 11 12 12 13 16 19 19 20 23 24 24 26 27 31 31 vii

分享到：

赞收藏

资料库

Information Geometry And Its Application.pdf

相关推荐

开发技术

热门标签

最新资料