
New 2015 book: Machine Learning A Bayesian and Optimization Perspective.pdf

Cover
Contents
Preface
Acknowledgments
Notation
Dedication
1 Introduction
What Machine Learning is About
Classification
Regression
Structure and a Road Map of the Book
References
2 Probability and Stochastic Processes
Introduction
Probability and Random Variables
Probability
Relative frequency definition
Axiomatic definition
Discrete Random Variables
Joint and conditional probabilities
Bayes theorem
Continuous Random Variables
Mean and Variance
Complex random variables
Transformation of Random Variables
Examples of Distributions
Discrete Variables
The Bernoulli distribution
The Binomial distribution
The Multinomial distribution
Continuous Variables
The uniform distribution
The Gaussian distribution
The central limit theorem
The exponential distribution
The beta distribution
The gamma distribution
The Dirichlet distribution
Stochastic Processes
First and Second Order Statistics
Stationarity and Ergodicity
Power Spectral Density
Properties of the autocorrelation sequence
Power spectral density
Transmission through a linear system
Physical interpretation of the PSD
Autoregressive Models
Information Theory
Discrete Random Variables
Information
Mutual and conditional information
Entropy and average mutual information
Continuous Random Variables
Average mutual information and conditional information
Relative entropy or Kullback-Leibler divergence
Stochastic Convergence
Convergence everywhere
Convergence almost everywhere
Convergence in the mean-square sense
Convergence in probability
Convergence in distribution
Problems
References
3 Learning in Parametric Modeling: Basic Concepts and Directions
3.1 Introduction
3.2 Parameter Estimation: The Deterministic Point of View
3.3 Linear Regression
3.4 Classification
Generative versus discriminative learning
Supervised, semisupervised, and unsupervised learning
3.5 Biased Versus Unbiased Estimation
3.5.1 Biased or Unbiased Estimation?
3.6 The Cramér-Rao Lower Bound
3.7 Sufficient Statistic
3.8 Regularization
Inverse problems: Ill-conditioning and overfitting
3.9 The Bias-Variance Dilemma
3.9.1 Mean-Square Error Estimation
3.9.2 Bias-Variance Tradeoff
3.10 Maximum Likelihood Method
3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case
3.11 Bayesian Inference
3.11.1 The Maximum A Posteriori Probability Estimation Method
3.12 Curse of Dimensionality
3.13 Validation
Cross-validation
3.14 Expected and Empirical Loss Functions
3.15 Nonparametric Modeling and Estimation
Problems
References
4 Mean-Square Error Linear Estimation
Introduction
Mean-Square Error Linear Estimation: The Normal Equations
The Cost Function Surface
A Geometric Viewpoint: Orthogonality Condition
Extension to Complex-Valued Variables
Widely Linear Complex-Valued Estimation
Circularity conditions
Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus
Linear Filtering
MSE Linear Filtering: A Frequency Domain Point of View
Deconvolution: image deblurring
Some Typical Applications
Interference Cancellation
System Identification
Deconvolution: Channel Equalization
Algorithmic Aspects
Forward and backward MSE optimal predictors
The Lattice-Ladder Scheme
Orthogonality of the optimal backward errors
Mean-Square Error Estimation of Linear Models
The Gauss-Markov Theorem
Constrained Linear Estimation: The Beamforming Case
Time-Varying Statistics: Kalman Filtering
Problems
MATLAB Exercises
References
5 Stochastic Gradient Descent: The LMS Algorithm and its Family
Introduction
The Steepest Descent Method
Application to the Mean-Square Error Cost Function
Time-varying step-sizes
The Complex-Valued Case
Stochastic Approximation
Application to the MSE linear estimation
The Least-Mean-Squares Adaptive Algorithm
Convergence and Steady-State Performance of the LMS in Stationary Environments
Convergence of the parameter error vector
Cumulative Loss Bounds
The Affine Projection Algorithm
Geometric interpretation of APA
Orthogonal projections
The Normalized LMS
The Complex-Valued Case
The widely linear LMS
The widely linear APA
Relatives of the LMS
The sign-error LMS
The least-mean-fourth (LMF) algorithm
Transform-domain LMS
Simulation Examples
Adaptive Decision Feedback Equalization
The Linearly Constrained LMS
Tracking Performance of the LMS in Nonstationary Environments
Distributed Learning: The Distributed LMS
Cooperation Strategies
Centralized networks
Decentralized networks
The Diffusion LMS
Convergence and Steady-State Performance: Some Highlights
Consensus-Based Distributed Schemes
A Case Study: Target Localization
Some Concluding Remarks: Consensus Matrix
Problems
MATLAB Exercises
References
6 The Least-Squares Family
Introduction
Least-Squares Linear Regression: A Geometric Perspective
Statistical Properties of the LS Estimator
The LS estimator is unbiased
Covariance matrix of the LS estimator
The LS estimator is BLUE in the presence of white noise
The LS estimator achieves the Cramér-Rao bound for white Gaussian noise
Asymptotic distribution of the LS estimator
Orthogonalizing the Column Space of X: The SVD Method
Pseudo-inverse matrix and SVD
Ridge Regression
Principal components regression
The Recursive Least-Squares Algorithm
Time-iterative computations of Φn, pn
Time updating of θn
Newton's Iterative Minimization Method
RLS and Newton's Method
Steady-State Performance of the RLS
Complex-Valued Data: The Widely Linear RLS
Computational Aspects of the LS Solution
Cholesky factorization
QR factorization
Fast RLS versions
The Coordinate and Cyclic Coordinate Descent Methods
Simulation Examples
Total-Least-Squares
Geometric interpretation of the total-least-squares method
Problems
MATLAB Exercises
References
7 Classification: A Tour of the Classics
Introduction
Bayesian Classification
The Bayesian classifier minimizes the misclassification error
Average Risk
Decision (Hyper)Surfaces
The Gaussian Distribution Case
Minimum distance classifiers
The Naive Bayes Classifier
The Nearest Neighbor Rule
Logistic Regression
Fisher's Linear Discriminant
Classification Trees
Combining Classifiers
Experimental comparisons
Schemes for combining classifiers
The Boosting Approach
The AdaBoost algorithm
The log-loss function
Boosting Trees
Case Study: Protein Folding Prediction
Protein folding prediction as a classification task
Classification of folding prediction via decision trees
Problems
MATLAB Exercises
References
8 Parameter Learning: A Convex Analytic Path
Introduction
Convex Sets and Functions
Convex Sets
Convex Functions
Projections onto Convex Sets
Properties of Projections
Fundamental Theorem of Projections onto Convex Sets
A Parallel Version of POCS
From Convex Sets to Parameter Estimation and Machine Learning
Regression
Classification
Infinitely Many Closed Convex Sets: The Online Learning Case
Convergence of APSM
Some practical hints
Constrained Learning
The Distributed APSM
Optimizing Nonsmooth Convex Cost Functions
Subgradients and Subdifferentials
Minimizing Nonsmooth Continuous Convex Loss Functions: The Batch Learning Case
The subgradient method
The generic projected subgradient scheme
The projected gradient method (PGM)
Projected subgradient method
Online Learning for Convex Optimization
The PEGASOS algorithm
Regret Analysis
Regret analysis of the subgradient algorithm
Online Learning and Big Data Applications: A Discussion
Approximation, estimation and optimization errors
Batch versus online learning
Proximal Operators
Properties of the Proximal Operator
Proximal Minimization
Resolvent of the subdifferential mapping
Proximal Splitting Methods for Optimization
The proximal forward-backward splitting operator
Alternating direction method of multipliers (ADMM)
Mirror descent algorithms
Problems
MATLAB Exercises
Appendix to Chapter 8
References
9 Sparsity-Aware Learning: Concepts and Theoretical Foundations
Introduction
Searching for a Norm
The Least Absolute Shrinkage and Selection Operator (LASSO)
Sparse Signal Representation
In Search of the Sparsest Solution
The l2 norm minimizer
The l0 norm minimizer
The l1 norm minimizer
Characterization of the l1 norm minimizer
Geometric interpretation
Uniqueness of the l0 Minimizer
Mutual Coherence
Equivalence of l0 and l1 Minimizers: Sufficiency Conditions
Condition Implied by the Mutual Coherence Number
The Restricted Isometry Property (RIP)
Constructing matrices that obey the RIP of order k
Robust Sparse Signal Recovery from Noisy Measurements
Compressed Sensing: The Glory of Randomness
Compressed sensing
Dimensionality Reduction and Stable Embeddings
Sub-Nyquist Sampling: Analog-to-Information Conversion
A Case Study: Image De-Noising
Problems
MATLAB Exercises
References
10 Sparsity-Aware Learning: Algorithms and Applications
Introduction
Sparsity-Promoting Algorithms
Greedy Algorithms
OMP can recover optimal sparse solutions: sufficiency condition
The LARS algorithm
Compressed sensing matching pursuit (CSMP) algorithms
Iterative Shrinkage/Thresholding (IST) Algorithms
Which Algorithm?: Some Practical Hints
Variations on the Sparsity-Aware Theme
Online Sparsity-Promoting Algorithms
LASSO: Asymptotic Performance
The Adaptive Norm-Weighted LASSO
Adaptive CoSaMP (AdCoSaMP) Algorithm
Sparse Adaptive Projection Subgradient Method (SpAPSM)
Projection onto the weighted l1 ball
Learning Sparse Analysis Models
Compressed Sensing for Sparse Signal Representation in Coherent Dictionaries
Cosparsity
A Case Study: Time-Frequency Analysis
Gabor transform and frames
Time-frequency resolution
Gabor frames
Time-frequency analysis of echolocation signals emitted by bats
Appendix to Chapter 10: Some Hints from the Theory of Frames
Problems
MATLAB Exercises
References
11 Learning in Reproducing Kernel Hilbert Spaces
11.1 Introduction
11.2 Generalized Linear Models
11.3 Volterra, Wiener, and Hammerstein Models
11.4 Cover's Theorem: Capacity of a Space in Linear Dichotomies
11.5 Reproducing Kernel Hilbert Spaces
11.5.1 Some Properties and Theoretical Highlights
11.5.2 Examples of Kernel Functions
Constructing kernels
String kernels
11.6 Representer Theorem
11.6.1 Semiparametric Representer Theorem
11.6.2 Nonparametric Modeling: A Discussion
11.7 Kernel Ridge Regression
11.8 Support Vector Regression
11.8.1 The Linear ε-Insensitive Optimal Regression
The solution
Solving the optimization task
11.9 Kernel Ridge Regression Revisited
11.10 Optimal Margin Classification: Support Vector Machines
11.10.1 Linearly Separable Classes: Maximum Margin Classifiers
The solution
The optimization task
11.10.2 Nonseparable Classes
The solution
The optimization task
11.10.3 Performance of SVMs and Applications
11.10.4 Choice of Hyperparameters
11.11 Computational Considerations
11.11.1 Multiclass Generalizations
11.12 Online Learning in RKHS
11.12.1 The Kernel LMS (KLMS)
11.12.2 The Naive Online Rreg Minimization Algorithm (NORMA)
Classification: the hinge loss function
Regression: the linear ε-insensitive loss function
Error bounds and convergence performance
11.12.3 The Kernel APSM Algorithm
Regression
Classification
11.13 Multiple Kernel Learning
11.14 Nonparametric Sparsity-Aware Learning: Additive Models
11.15 A Case Study: Authorship Identification
Problems
MATLAB Exercises
References
12 Bayesian Learning: Inference and the EM Algorithm
Introduction
Regression: A Bayesian Perspective
The Maximum Likelihood Estimator
The MAP Estimator
The Bayesian Approach
The Evidence Function and Occam's Razor Rule
Laplacian approximation and the evidence function
Exponential Family of Probability Distributions
The Exponential Family and the Maximum Entropy Method
Latent Variables and the EM Algorithm
The Expectation-Maximization Algorithm
The EM Algorithm: A Lower Bound Maximization View
Linear Regression and the EM Algorithm
Gaussian Mixture Models
Gaussian Mixture Modeling and Clustering
Combining Learning Models: A Probabilistic Point of View
Mixing Linear Regression Models
Mixture of experts
Hierarchical mixture of experts
Mixing Logistic Regression Models
Problems
MATLAB Exercises
Appendix to Chapter 12
PDFs with Exponent of Quadratic Form
The Conditional from the Joint Gaussian pdf
The Marginal from the Joint Gaussian pdf
The Posterior from Gaussian Prior and Conditional pdfs
References
13 Bayesian Learning: Approximate Inference and Nonparametric Models
13.1 Introduction
13.2 Variational Approximation in Bayesian Learning
The mean field approximation
13.2.1 The Case of the Exponential Family of Probability Distributions
13.3 A Variational Bayesian Approach to Linear Regression
Computation of the lower bound
13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling
13.5 When Bayesian Inference Meets Sparsity
13.6 Sparse Bayesian Learning (SBL)
13.6.1 The Spike and Slab Method
13.7 The Relevance Vector Machine Framework
13.7.1 Adopting the Logistic Regression Model for Classification
13.8 Convex Duality and Variational Bounds
13.9 Sparsity-Aware Regression: A Variational Bound Bayesian Path
13.10 Sparsity-Aware Learning: Some Concluding Remarks
Parameter identifiability and sparse Bayesian modeling
13.11 Expectation Propagation
Minimizing the KL divergence
The expectation propagation algorithm
13.12 Nonparametric Bayesian Modeling
13.12.1 The Chinese Restaurant Process
13.12.2 Inference
13.12.3 Dirichlet Processes
13.12.4 The Stick-Breaking Construction of a DP
13.13 Gaussian Processes
13.13.1 Covariance Functions and Kernels
13.13.2 Regression
Dealing with hyperparameters
Computational considerations
13.13.3 Classification
13.14 A Case Study: Hyperspectral Image Unmixing
13.14.1 Hierarchical Bayesian Modeling
13.14.2 Experimental Results
Problems
MATLAB Exercises
References
14 Monte Carlo Methods
Introduction
Monte Carlo Methods: The Main Concept
Random number generation
Random Sampling Based on Function Transformation
Rejection Sampling
Importance Sampling
Monte Carlo Methods and the EM Algorithm
Markov Chain Monte Carlo Methods
Ergodic Markov Chains
The Metropolis Method
Convergence Issues
Gibbs Sampling
In Search of More Efficient Methods: A Discussion
Variational inference or Monte Carlo methods
A Case Study: Change-Point Detection
Problems
MATLAB Exercise
References
15 Probabilistic Graphical Models: Part I
Introduction
The Need for Graphical Models
Bayesian Networks and the Markov Condition
Graphs: Basic Definitions
Some Hints on Causality
d-separation
Sigmoidal Bayesian Networks
Linear Gaussian Models
Multiple-Cause Networks
I-Maps, Soundness, Faithfulness, and Completeness
Undirected Graphical Models
Independencies and I-Maps in Markov Random Fields
The Ising Model and Its Variants
Conditional Random Fields (CRFs)
Factor Graphs
Graphical Models for Error-Correcting Codes
Moralization of Directed Graphs
Exact Inference Methods: Message-Passing Algorithms
Exact Inference in Chains
Exact Inference in Trees
The Sum-Product Algorithm
The Max-Product and Max-Sum Algorithms
Problems
References
16 Probabilistic Graphical Models: Part II
Introduction
Triangulated Graphs and Junction Trees
Constructing a Join Tree
Message-Passing in Junction Trees
Approximate Inference Methods
Variational Methods: Local Approximation
Multiple-cause networks and the noisy-OR model
The Boltzmann machine
Block Methods for Variational Approximation
The mean field approximation and the Boltzmann machine
Loopy Belief Propagation
Dynamic Graphical Models
Hidden Markov Models
Inference
Learning the Parameters in an HMM
Discriminative Learning
Beyond HMMs: A Discussion
Factorial Hidden Markov Models
Time-Varying Dynamic Bayesian Networks
Learning Graphical Models
Parameter Estimation
Learning the Structure
Problems
References
17 Particle Filtering
Introduction
Sequential Importance Sampling
Importance Sampling Revisited
Resampling
Sequential Sampling
Kalman and Particle Filtering
Kalman Filtering: A Bayesian Point of View
Particle Filtering
Degeneracy
Generic Particle Filtering
Auxiliary Particle Filtering
Problems
MATLAB Exercises
References
18 Neural Networks and Deep Learning
Introduction
The Perceptron
The Kernel Perceptron Algorithm
Feed-Forward Multilayer Neural Networks
The Backpropagation Algorithm
The Gradient Descent Scheme
Speeding up the convergence rate
Some practical hints
Beyond the Gradient Descent Rationale
Selecting a Cost Function
Pruning the Network
Universal Approximation Property of Feed-Forward Neural Networks
Neural Networks: A Bayesian Flavor
Learning Deep Networks
The Need for Deep Architectures
Training Deep Networks
Distributed representations
Training Restricted Boltzmann Machines
Computation of the conditional probabilities
Contrastive divergence
Training Deep Feed-Forward Networks
Deep Belief Networks
Variations on the Deep Learning Theme
Gaussian Units
Stacked Autoencoders
The Conditional RBM
Case Study: A Deep Network for Optical Character Recognition
Case Study: A Deep Autoencoder
Example: Generating Data via a DBN
Problems
MATLAB Exercises
References
19 Dimensionality Reduction
19.1 Introduction
19.2 Intrinsic Dimensionality
19.3 Principal Component Analysis
PCA, SVD, and low-rank matrix factorization
Minimum error interpretation
PCA and information retrieval
Orthogonalizing properties of PCA and feature generation
Latent variables
19.4 Canonical Correlation Analysis
19.4.1 Relatives of CCA
Partial least-squares
19.5 Independent Component Analysis
19.5.1 ICA and Gaussianity
19.5.2 ICA and Higher Order Cumulants
ICA ambiguities
19.5.3 Non-Gaussianity and Independent Components
19.5.4 ICA Based on Mutual Information
19.5.5 Alternative Paths to ICA
The cocktail party problem
19.6 Dictionary Learning: The k-SVD Algorithm
Why the name k-SVD
19.7 Nonnegative Matrix Factorization
19.8 Learning Low-Dimensional Models: A Probabilistic Perspective
19.8.1 Factor Analysis
19.8.2 Probabilistic PCA
19.8.3 Mixture of Factors Analyzers: A Bayesian View to Compressed Sensing
19.9 Nonlinear Dimensionality Reduction
19.9.1 Kernel PCA
19.9.2 Graph-Based Methods
Laplacian eigenmaps
Local linear embedding (LLE)
Isometric mapping (ISOMAP)
19.10 Low-Rank Matrix Factorization: A Sparse Modeling Path
19.10.1 Matrix Completion
19.10.2 Robust PCA
19.10.3 Applications of Matrix Completion and Robust PCA
Matrix completion
Robust PCA/PCP
19.11 A Case Study: fMRI Data Analysis
Problems
MATLAB Exercises
References
Appendix A: Linear Algebra
A.1 Properties of Matrices
Matrix inversion lemmas
Matrix derivatives
A.2 Positive Definite and Symmetric Matrices
A.3 Wirtinger Calculus
References
Appendix B: Probability Theory and Statistics
B.1 Cramér-Rao Bound
B.2 Characteristic Functions
B.3 Moments and Cumulants
B.4 Edgeworth Expansion of a pdf
Reference
Appendix C: Hints on Constrained Optimization
C.1 Equality Constraints
C.2 Inequality Constraints
The Karush-Kuhn-Tucker (KKT) conditions
Min-Max duality
Saddle point condition
Lagrangian duality
Convex programming
Wolfe dual representation
References
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
Machine Learning: A Bayesian and Optimization Perspective
Machine Learning: A Bayesian and Optimization Perspective
Sergios Theodoridis
Amsterdam • Boston • Heidelberg • London • New York • Oxford • Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo
Academic Press is an imprint of Elsevier
Academic Press is an imprint of Elsevier
125 London Wall, London, EC2Y 5AS, UK
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
225 Wyman Street, Waltham, MA 02451, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

Copyright © 2015 Elsevier Ltd. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-801522-3

For information on all Academic Press publications visit our website at http://store.elsevier.com/

Publisher: Jonathan Simpson
Acquisition Editor: Tim Pitts
Editorial Project Manager: Charlie Kent
Production Project Manager: Susan Li
Designer: Greg Harris

Typeset by SPi Global, India
Printed and bound in the United States
15 16 17 18 19    10 9 8 7 6 5 4 3 2 1
Contents

Preface xvii
Acknowledgments xix
Notation xxi

CHAPTER 1 Introduction 1
1.1 What Machine Learning is About 1
1.1.1 Classification 2
1.1.2 Regression 3
1.2 Structure and a Road Map of the Book 5
References 8

CHAPTER 2 Probability and Stochastic Processes 9
2.1 Introduction 10
2.2 Probability and Random Variables 10
2.2.1 Probability 11
2.2.2 Discrete Random Variables 12
2.2.3 Continuous Random Variables 14
2.2.4 Mean and Variance 15
2.2.5 Transformation of Random Variables 17
2.3 Examples of Distributions 18
2.3.1 Discrete Variables 18
2.3.2 Continuous Variables 20
2.4 Stochastic Processes 29
2.4.1 First and Second Order Statistics 30
2.4.2 Stationarity and Ergodicity 30
2.4.3 Power Spectral Density 33
2.4.4 Autoregressive Models 38
2.5 Information Theory 41
2.5.1 Discrete Random Variables 42
2.5.2 Continuous Random Variables 45
2.6 Stochastic Convergence 48
Problems 49
References 51

CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions 53
3.1 Introduction 53
3.2 Parameter Estimation: The Deterministic Point of View 54
3.3 Linear Regression 57
3.4 Classification 60
3.5 Biased Versus Unbiased Estimation 64
3.5.1 Biased or Unbiased Estimation? 65
3.6 The Cramér-Rao Lower Bound 67
3.7 Sufficient Statistic 70
3.8 Regularization 72
3.9 The Bias-Variance Dilemma 77
3.9.1 Mean-Square Error Estimation 77
3.9.2 Bias-Variance Tradeoff 78
3.10 Maximum Likelihood Method 82
3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case 84
3.11 Bayesian Inference 84
3.11.1 The Maximum a Posteriori Probability Estimation Method 88
3.12 Curse of Dimensionality 89
3.13 Validation 91
3.14 Expected and Empirical Loss Functions 93
3.15 Nonparametric Modeling and Estimation 95
Problems 97
References 102

CHAPTER 4 Mean-Square Error Linear Estimation 105
4.1 Introduction 105
4.2 Mean-Square Error Linear Estimation: The Normal Equations 106
4.2.1 The Cost Function Surface 107
4.3 A Geometric Viewpoint: Orthogonality Condition 109
4.4 Extension to Complex-Valued Variables 111
4.4.1 Widely Linear Complex-Valued Estimation 113
4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus 116
4.5 Linear Filtering 118
4.6 MSE Linear Filtering: A Frequency Domain Point of View 120
4.7 Some Typical Applications 124
4.7.1 Interference Cancellation 124
4.7.2 System Identification 125
4.7.3 Deconvolution: Channel Equalization 126
4.8 Algorithmic Aspects: The Levinson and the Lattice-Ladder Algorithms 132
4.8.1 The Lattice-Ladder Scheme 137
4.9 Mean-Square Error Estimation of Linear Models 140
4.9.1 The Gauss-Markov Theorem 143
4.9.2 Constrained Linear Estimation: The Beamforming Case 145
4.10 Time-Varying Statistics: Kalman Filtering 148
Problems 154
References 158

CHAPTER 5 Stochastic Gradient Descent: The LMS Algorithm and its Family 161
5.1 Introduction 162
5.2 The Steepest Descent Method 163
5.3 Application to the Mean-Square Error Cost Function 167
5.3.1 The Complex-Valued Case 175
5.4 Stochastic Approximation 177
5.5 The Least-Mean-Squares Adaptive Algorithm 179
5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments 181
5.5.2 Cumulative Loss Bounds 186
5.6 The Affine Projection Algorithm 188
5.6.1 The Normalized LMS 193
5.7 The Complex-Valued Case 194
5.8 Relatives of the LMS 196
5.9 Simulation Examples 199
5.10 Adaptive Decision Feedback Equalization 202
5.11 The Linearly Constrained LMS 204
5.12 Tracking Performance of the LMS in Nonstationary Environments 206
5.13 Distributed Learning: The Distributed LMS 208
5.13.1 Cooperation Strategies 209
5.13.2 The Diffusion LMS 211
5.13.3 Convergence and Steady-State Performance: Some Highlights 218
5.13.4 Consensus-Based Distributed Schemes 220
5.14 A Case Study: Target Localization 222
5.15 Some Concluding Remarks: Consensus Matrix 223
Problems 224
References 227

CHAPTER 6 The Least-Squares Family 233
6.1 Introduction 234
6.2 Least-Squares Linear Regression: A Geometric Perspective 234
6.3 Statistical Properties of the LS Estimator 236
6.4 Orthogonalizing the Column Space of X: The SVD Method 239
6.5 Ridge Regression 243
6.6 The Recursive Least-Squares Algorithm 245