Gaussian Processes for Machine Learning

Series Foreword
Preface
Symbols and Notation
Introduction
A Pictorial Introduction to Bayesian Modelling
Roadmap
Regression
Weight-space View
The Standard Linear Model
Projections of Inputs into Feature Space
Function-space View
Varying the Hyperparameters
Decision Theory for Regression
An Example Application
Smoothing, Weight Functions and Equivalent Kernels
* Incorporating Explicit Basis Functions
Marginal Likelihood
History and Related Work
Exercises
Classification
Classification Problems
Decision Theory for Classification
Linear Models for Classification
Gaussian Process Classification
The Laplace Approximation for the Binary GP Classifier
Posterior
Predictions
Implementation
Marginal Likelihood
* Multi-class Laplace Approximation
Implementation
Expectation Propagation
Predictions
Marginal Likelihood
Implementation
Experiments
A Toy Problem
One-dimensional Example
Binary Handwritten Digit Classification Example
10-class Handwritten Digit Classification Example
Discussion
* Appendix: Moment Derivations
Exercises
Covariance Functions
Preliminaries
* Mean Square Continuity and Differentiability
Examples of Covariance Functions
Stationary Covariance Functions
Dot Product Covariance Functions
Other Non-stationary Covariance Functions
Making New Kernels from Old
Eigenfunction Analysis of Kernels
* An Analytic Example
Numerical Approximation of Eigenfunctions
Kernels for Non-vectorial Inputs
String Kernels
Fisher Kernels
Exercises
Model Selection and Adaptation of Hyperparameters
The Model Selection Problem
Bayesian Model Selection
Cross-validation
Model Selection for GP Regression
Marginal Likelihood
Cross-validation
Examples and Discussion
Model Selection for GP Classification
* Derivatives of the Marginal Likelihood for Laplace's Approximation
* Derivatives of the Marginal Likelihood for EP
Cross-validation
Example
Exercises
Relationships between GPs and Other Models
Reproducing Kernel Hilbert Spaces
Regularization
* Regularization Defined by Differential Operators
Obtaining the Regularized Solution
The Relationship of the Regularization View to Gaussian Process Prediction
Spline Models
* A 1-d Gaussian Process Spline Construction
* Support Vector Machines
Support Vector Classification
Support Vector Regression
* Least-squares Classification
Probabilistic Least-squares Classification
* Relevance Vector Machines
Exercises
Theoretical Perspectives
The Equivalent Kernel
Some Specific Examples of Equivalent Kernels
* Asymptotic Analysis
Consistency
Equivalence and Orthogonality
* Average-case Learning Curves
* PAC-Bayesian Analysis
The PAC Framework
PAC-Bayesian Analysis
PAC-Bayesian Analysis of GP Classification
Comparison with Other Supervised Learning Methods
* Appendix: Learning Curve for the Ornstein-Uhlenbeck Process
Exercises
Approximation Methods for Large Datasets
Reduced-rank Approximations of the Gram Matrix
Greedy Approximation
Approximations for GPR with Fixed Hyperparameters
Subset of Regressors
The Nyström Method
Subset of Datapoints
Projected Process Approximation
Bayesian Committee Machine
Iterative Solution of Linear Systems
Comparison of Approximate GPR Methods
Approximations for GPC with Fixed Hyperparameters
* Approximating the Marginal Likelihood and its Derivatives
* Appendix: Equivalence of SR and GPR Using the Nyström Approximate Kernel
Exercises
Further Issues and Conclusions
Multiple Outputs
Noise Models with Dependencies
Non-Gaussian Likelihoods
Derivative Observations
Prediction with Uncertain Inputs
Mixtures of Gaussian Processes
Global Optimization
Evaluation of Integrals
Student's t Process
Invariances
Latent Variable Models
Conclusions and Future Directions
Appendix Mathematical Background
Joint, Marginal and Conditional Probability
Gaussian Identities
Matrix Identities
Matrix Derivatives
Matrix Norms
Cholesky Decomposition
Entropy and Kullback-Leibler Divergence
Limits
Measure and Integration
Lp Spaces
Fourier Transforms
Convexity
Appendix Gaussian Markov Processes
Fourier Analysis
Sampling and Periodization
Continuous-time Gaussian Markov Processes
Continuous-time GMPs on R
The Solution of the Corresponding SDE on the Circle
Discrete-time Gaussian Markov Processes
Discrete-time GMPs on Z
The Solution of the Corresponding Difference Equation on P_N
The Relationship Between Discrete-time and Sampled Continuous-time GMPs
Markov Processes in Higher Dimensions
Appendix Datasets and Code
Bibliography
Author Index
Subject Index
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. Gaussian Processes for Machine Learning
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. Adaptive Computation and Machine Learning Thomas Dietterich, Editor Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors Bioinformatics: The Machine Learning Approach, Pierre Baldi and Søren Brunak Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto Graphical Models for Machine Learning and Digital Communication, Brendan J. Frey Learning in Graphical Models, Michael I. Jordan Causation, Prediction, and Search, second edition, Peter Spirtes, Clark Glymour, and Richard Scheines Principles of Data Mining, David Hand, Heikki Mannila, and Padhraic Smyth Bioinformatics: The Machine Learning Approach, second edition, Pierre Baldi and Søren Brunak Learning Kernel Classifiers: Theory and Algorithms, Ralf Herbrich Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Bernhard Sch¨olkopf and Alexander J. Smola Introduction to Machine Learning, Ethem Alpaydin Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. Gaussian Processes for Machine Learning Carl Edward Rasmussen Christopher K. I. Williams The MIT Press Cambridge, Massachusetts London, England
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. c 2006 Massachusetts Institute of Technology All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information, please email special sales@mitpress.mit.edu or write to Special Sales Department, The MIT Press, 55 Hayward Street, Cambridge, MA 02142. Typeset by the authors using LATEX 2ε. This book was printed and bound in the United States of America. Library of Congress Cataloging-in-Publication Data Rasmussen, Carl Edward. Gaussian processes for machine learning / Carl Edward Rasmussen, Christopher K. I. Williams. p. cm. —(Adaptive computation and machine learning) Includes bibliographical references and indexes. ISBN 0-262-18253-X 1. Gaussian processes—Data processing. 2. Machine learning—Mathematical models. I. Williams, Christopher K. I. II. Title. III. Series. QA274.4.R37 2006 519.2'3—dc22 10 9 8 7 6 5 4 3 2 2005053433
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man’s mind. — James Clerk Maxwell [1850]
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X.
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. Contents Series Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii Symbols and Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii 1 Introduction 1.1 A Pictorial Introduction to Bayesian Modelling . . . . . . . . . . . . . . . 1.2 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Regression 2.1 Weight-space View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.1 The Standard Linear Model . . . . . . . . . . . . . . . . . . . . . . 2.1.2 Projections of Inputs into Feature Space . . . . . . . . . . . . . . . 2.2 Function-space View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Varying the Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Decision Theory for Regression . . . . . . . . . . . . . . . . . . . . . . . . 2.5 An Example Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Smoothing, Weight Functions and Equivalent Kernels . . . . . . . . . . . Incorporating Explicit Basis Functions . . . . . . . . . . . . . . . . . . . . 2.7.1 Marginal Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8 History and Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ∗ 2.7 1 3 5 7 7 8 11 13 19 21 22 24 27 29 29 30 3 Classification 3.1 Classification Problems 33 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.1.1 Decision Theory for Classification . . . . . . . . . . . . . . . . . . 35 3.2 Linear Models for Classification . . . . . . . . . . . . . . . . . . . . . . . . 37 3.3 Gaussian Process Classification . . . . . . . . . . . . . . . . . . . . . . . . 39 3.4 The Laplace Approximation for the Binary GP Classifier . . . . . . . . . . 41 3.4.1 Posterior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.4.2 Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.4.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.4.4 Marginal Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 47 ∗ 3.5 Multi-class Laplace Approximation . . . . . . . . . . . . . . . . . . . . . . 48 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.6 Expectation Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.6.1 Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.6.2 Marginal Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.6.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.7 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.7.1 A Toy Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.7.2 One-dimensional Example . . . . . . . . . . . . . . . . . . . . . . 62 3.7.3 Binary Handwritten Digit Classification Example . . . . . . . . . . 63 3.7.4 10-class Handwritten Digit Classification Example . . . . . . . . . 70 3.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 ∗Sections marked by an asterisk contain advanced material that may be omitted on a first reading. 3.5.1
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. viii Contents ∗ 3.9 Appendix: Moment Derivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.10 Exercises 74 75 4 Covariance Functions ∗ 4.1 Preliminaries 4.2 Examples of Covariance Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.1 Mean Square Continuity and Differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Stationary Covariance Functions . . . . . . . . . . . . . . . . . . . 4.2.2 Dot Product Covariance Functions . . . . . . . . . . . . . . . . . . 4.2.3 Other Non-stationary Covariance Functions . . . . . . . . . . . . . 4.2.4 Making New Kernels from Old . . . . . . . . . . . . . . . . . . . . 4.3 Eigenfunction Analysis of Kernels . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 An Analytic Example . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.2 Numerical Approximation of Eigenfunctions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 79 81 81 82 89 90 94 96 97 98 99 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 String Kernels 4.4.1 4.4.2 Fisher Kernels 4.4 Kernels for Non-vectorial Inputs 4.5 Exercises ∗ 5 Model Selection and Adaptation of Hyperparameters 105 5.1 The Model Selection Problem . . . . . . . . . . . . . . . . . . . . . . . . . 106 5.2 Bayesian Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.3 Cross-validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 5.4 Model Selection for GP Regression . . . . . . . . . . . . . . . . . . . . . . 112 5.4.1 Marginal Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 112 5.4.2 Cross-validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 5.4.3 Examples and Discussion . . . . . . . . . . . . . . . . . . . . . . . 118 5.5 Model Selection for GP Classification . . . . . . . . . . . . . . . . . . . . . 124 5.5.1 Derivatives of the Marginal Likelihood for Laplace’s Approximation 125 5.5.2 Derivatives of the Marginal Likelihood for EP . . . . . . . . . . . . 127 5.5.3 Cross-validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 5.5.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.6 Exercises ∗ ∗ 6 Relationships between GPs and Other Models 129 6.1 Reproducing Kernel Hilbert Spaces . . . . . . . . . . . . . . . . . . . . . . 129 6.2 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 6.2.1 Regularization Defined by Differential Operators . . . . . . . . . . 133 6.2.2 Obtaining the Regularized Solution . . . . . . . . . . . . . . . . . . 135 6.2.3 The Relationship of the Regularization View to Gaussian Process ∗ ∗ ∗ 6.4 Support Vector Machines Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 6.3 Spline Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 6.3.1 A 1-d Gaussian Process Spline Construction . . . . . . . . . . . . . 138 . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 Support Vector Classification . . . . . . . . 
. . . . . . . . . . . . . 141 Support Vector Regression . . . . . . . . . . . . . . . . . . . . . . 145 ∗ 6.5 Least-squares Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 146 6.5.1 Probabilistic Least-squares Classification . . . . . . . . . . . . . . . 147 6.4.1 6.4.2