Preface
Contents
Part I Geometry of Divergence Functions: Dually Flat Riemannian Structure
1 Manifold, Divergence and Dually Flat Structure
1.1 Manifolds
1.1.1 Manifold and Coordinate Systems
1.1.2 Examples of Manifolds
1.2 Divergence Between Two Points
1.2.1 Divergence
1.2.2 Examples of Divergence
1.3 Convex Function and Bregman Divergence
1.3.1 Convex Function
1.3.2 Bregman Divergence
1.4 Legendre Transformation
1.5 Dually Flat Riemannian Structure Derived from Convex Function
1.5.1 Affine and Dual Affine Coordinate Systems
1.5.2 Tangent Space, Basis Vectors and Riemannian Metric
1.5.3 Parallel Transport of Vector
1.6 Generalized Pythagorean Theorem and Projection Theorem
1.6.1 Generalized Pythagorean Theorem
1.6.2 Projection Theorem
1.6.3 Divergence Between Submanifolds: Alternating Minimization Algorithm
2 Exponential Families and Mixture Families of Probability Distributions
2.1 Exponential Family of Probability Distributions
2.2 Examples of Exponential Family: Gaussian and Discrete Distributions
2.2.1 Gaussian Distribution
2.2.2 Discrete Distribution
2.3 Mixture Family of Probability Distributions
2.4 Flat Structure: e-flat and m-flat
2.5 On Infinite-Dimensional Manifold of Probability Distributions
2.6 Kernel Exponential Family
2.7 Bregman Divergence and Exponential Family
2.8 Applications of Pythagorean Theorem
2.8.1 Maximum Entropy Principle
2.8.2 Mutual Information
2.8.3 Repeated Observations and Maximum Likelihood Estimator
3 Invariant Geometry of Manifold of Probability Distributions
3.1 Invariance Criterion
3.2 Information Monotonicity Under Coarse Graining
3.2.1 Coarse Graining and Sufficient Statistics in S_n
3.2.2 Invariant Divergence
3.3 Examples of f-Divergence in S_n
3.3.1 KL-Divergence
3.3.2 χ²-Divergence
3.3.3 α-Divergence
3.4 General Properties of f-Divergence and KL-Divergence
3.4.1 Properties of f-Divergence
3.4.2 Properties of KL-Divergence
3.5 Fisher Information: The Unique Invariant Metric
3.6 f-Divergence in Manifold of Positive Measures
4 α-Geometry, Tsallis q-Entropy and Positive-Definite Matrices
4.1 Invariant and Flat Divergence
4.1.1 KL-Divergence Is Unique
4.1.2 α-Divergence Is Unique in R^n_+
4.2 α-Geometry in S_n and R^n_+
4.2.1 α-Geodesic and α-Pythagorean Theorem in R^n_+
4.2.2 α-Geodesic in S_n
4.2.3 α-Pythagorean Theorem and α-Projection Theorem in S_n
4.2.4 Apportionment Due to α-Divergence
4.2.5 α-Mean
4.2.6 α-Families of Probability Distributions
4.2.7 Optimality of α-Integration
4.2.8 Application to α-Integration of Experts
4.3 Geometry of Tsallis q-Entropy
4.3.1 q-Logarithm and q-Exponential Function
4.3.2 q-Exponential Family (α-Family) of Probability Distributions
4.3.3 q-Escort Geometry
4.3.4 Deformed Exponential Family: χ-Escort Geometry
4.3.5 Conformal Character of q-Escort Geometry
4.4 (u, v)-Divergence: Dually Flat Divergence in Manifold of Positive Measures
4.4.1 Decomposable (u, v)-Divergence
4.4.2 General (u, v) Flat Structure in R^n_+
4.5 Invariant Flat Divergence in Manifold of Positive-Definite Matrices
4.5.1 Bregman Divergence and Invariance Under Gl(n)
4.5.2 Invariant Flat Decomposable Divergences Under O(n)
4.5.3 Non-flat Invariant Divergences
4.6 Miscellaneous Divergences
4.6.1 γ-Divergence
4.6.2 Other Types of (α, β)-Divergences
4.6.3 Burbea--Rao Divergence and Jensen--Shannon Divergence
4.6.4 (ρ, τ)-Structure and (F, G, H)-Structure
Part II Introduction to Dual Differential Geometry
5 Elements of Differential Geometry
5.1 Manifold and Tangent Space
5.2 Riemannian Metric
5.3 Affine Connection
5.4 Tensors
5.5 Covariant Derivative
5.6 Geodesic
5.7 Parallel Transport of Vector
5.8 Riemann--Christoffel Curvature
5.8.1 Round-the-World Transport of Vector
5.8.2 Covariant Derivative and RC Curvature
5.8.3 Flat Manifold
5.9 Levi--Civita (Riemannian) Connection
5.10 Submanifold and Embedding Curvature
5.10.1 Submanifold
5.10.2 Embedding Curvature
6 Dual Affine Connections and Dually Flat Manifold
6.1 Dual Connections
6.2 Metric and Cubic Tensor Derived from Divergence
6.3 Invariant Metric and Cubic Tensor
6.4 α-Geometry
6.5 Dually Flat Manifold
6.6 Canonical Divergence in Dually Flat Manifold
6.7 Canonical Divergence in General Manifold of Dual Connections
6.8 Dual Foliations of Flat Manifold and Mixed Coordinates
6.8.1 k-cut of Dual Coordinate Systems: Mixed Coordinates and Foliation
6.8.2 Decomposition of Canonical Divergence
6.8.3 A Simple Illustrative Example: Neural Firing
6.8.4 Higher-Order Interactions of Neuronal Spikes
6.9 System Complexity and Integrated Information
6.10 Input--Output Analysis in Economics
Part III Information Geometry of Statistical Inference
7 Asymptotic Theory of Statistical Inference
7.1 Estimation
7.2 Estimation in Exponential Family
7.3 Estimation in Curved Exponential Family
7.4 First-Order Asymptotic Theory of Estimation
7.5 Higher-Order Asymptotic Theory of Estimation
7.6 Asymptotic Theory of Hypothesis Testing
8 Estimation in the Presence of Hidden Variables
8.1 EM Algorithm
8.1.1 Statistical Model with Hidden Variables
8.1.2 Minimizing Divergence Between Model Manifold and Data Manifold
8.1.3 EM Algorithm
8.1.4 Example: Gaussian Mixture
8.2 Loss of Information by Data Reduction
8.3 Estimation Based on Misspecified Statistical Model
9 Neyman--Scott Problem: Estimating Function and Semiparametric Statistical Model
9.1 Statistical Model Including Nuisance Parameters
9.2 Neyman--Scott Problem and Semiparametrics
9.3 Estimating Function
9.4 Information Geometry of Estimating Function
9.5 Solutions to Neyman--Scott Problems
9.5.1 Estimating Function in the Exponential Case
9.5.2 Coefficient of Linear Dependence
9.5.3 Scale Problem
9.5.4 Temporal Firing Pattern of Single Neuron
10 Linear Systems and Time Series
10.1 Stationary Time Series and Linear System
10.2 Typical Finite-Dimensional Manifolds of Time Series
10.3 Dual Geometry of System Manifold
10.4 Geometry of AR, MA and ARMA Models
Part IV Applications of Information Geometry
11 Machine Learning
11.1 Clustering Patterns
11.1.1 Pattern Space and Divergence
11.1.2 Center of Cluster
11.1.3 k-Means: Clustering Algorithm
11.1.4 Voronoi Diagram
11.1.5 Stochastic Version of Classification and Clustering
11.1.6 Robust Cluster Center
11.1.7 Asymptotic Evaluation of Error Probability in Pattern Recognition: Chernoff Information
11.2 Geometry of Support Vector Machine
11.2.1 Linear Classifier
11.2.2 Embedding into High-Dimensional Space
11.2.3 Kernel Method
11.2.4 Riemannian Metric Induced by Kernel
11.3 Stochastic Reasoning: Belief Propagation and CCCP Algorithms
11.3.1 Graphical Model
11.3.2 Mean Field Approximation and m-Projection
11.3.3 Belief Propagation
11.3.4 Solution of BP Algorithm
11.3.5 CCCP (Convex--Concave Computational Procedure)
11.4 Information Geometry of Boosting
11.4.1 Boosting: Integration of Weak Machines
11.4.2 Stochastic Interpretation of Machine
11.4.3 Construction of New Weak Machines
11.4.4 Determination of the Weights of Weak Machines
11.5 Bayesian Inference and Deep Learning
11.5.1 Bayesian Duality in Exponential Family
11.5.2 Restricted Boltzmann Machine
11.5.3 Unsupervised Learning of RBM
11.5.4 Geometry of Contrastive Divergence
11.5.5 Gaussian RBM
12 Natural Gradient Learning and Its Dynamics in Singular Regions
12.1 Natural Gradient Stochastic Descent Learning
12.1.1 On-Line Learning and Batch Learning
12.1.2 Natural Gradient: Steepest Descent Direction in Riemannian Manifold
12.1.3 Riemannian Metric, Hessian and Absolute Hessian
12.1.4 Stochastic Relaxation of Optimization Problem
12.1.5 Natural Policy Gradient in Reinforcement Learning
12.1.6 Mirror Descent and Natural Gradient
12.1.7 Properties of Natural Gradient Learning
12.2 Singularity in Learning: Multilayer Perceptron
12.2.1 Multilayer Perceptron
12.2.2 Singularities in M
12.2.3 Dynamics of Learning in M
12.2.4 Critical Slowdown of Dynamics
12.2.5 Natural Gradient Learning Is Free of Plateaus
12.2.6 Singular Statistical Models
12.2.7 Bayesian Inference and Singular Model
13 Signal Processing and Optimization
13.1 Principal Component Analysis
13.1.1 Eigenvalue Analysis
13.1.2 Principal Components, Minor Components and Whitening
13.1.3 Dynamics of Learning of Principal and Minor Components
13.2 Independent Component Analysis
13.2.3 Estimating Function of ICA: Semiparametric Approach
13.3 Non-negative Matrix Factorization
13.4 Sparse Signal Processing
13.4.1 Linear Regression and Sparse Solution
13.4.2 Minimization of Convex Function Under L1 Constraint
13.4.3 Analysis of Solution Path
13.4.4 Minkovskian Gradient Flow
13.4.5 Underdetermined Case
13.5 Optimization in Convex Programming
13.5.1 Convex Programming
13.5.2 Dually Flat Structure Derived from Barrier Function
13.5.3 Computational Complexity and m-curvature
13.6 Dual Geometry Derived from Game Theory
13.6.1 Minimization of Game-Score
13.6.2 Hyvärinen Score
References
Index