Gaussian Processes for Machine Learning

Series Foreword
Preface
Symbols and Notation
Introduction
A Pictorial Introduction to Bayesian Modelling
Roadmap
Regression
Weight-space View
The Standard Linear Model
Projections of Inputs into Feature Space
Function-space View
Varying the Hyperparameters
Decision Theory for Regression
An Example Application
Smoothing, Weight Functions and Equivalent Kernels
* Incorporating Explicit Basis Functions
Marginal Likelihood
History and Related Work
Exercises
Classification
Classification Problems
Decision Theory for Classification
Linear Models for Classification
Gaussian Process Classification
The Laplace Approximation for the Binary GP Classifier
Posterior
Predictions
Implementation
Marginal Likelihood
* Multi-class Laplace Approximation
Implementation
Expectation Propagation
Predictions
Marginal Likelihood
Implementation
Experiments
A Toy Problem
One-dimensional Example
Binary Handwritten Digit Classification Example
10-class Handwritten Digit Classification Example
Discussion
* Appendix: Moment Derivations
Exercises
Covariance Functions
Preliminaries
* Mean Square Continuity and Differentiability
Examples of Covariance Functions
Stationary Covariance Functions
Dot Product Covariance Functions
Other Non-stationary Covariance Functions
Making New Kernels from Old
Eigenfunction Analysis of Kernels
* An Analytic Example
Numerical Approximation of Eigenfunctions
Kernels for Non-vectorial Inputs
String Kernels
Fisher Kernels
Exercises
Model Selection and Adaptation of Hyperparameters
The Model Selection Problem
Bayesian Model Selection
Cross-validation
Model Selection for GP Regression
Marginal Likelihood
Cross-validation
Examples and Discussion
Model Selection for GP Classification
* Derivatives of the Marginal Likelihood for Laplace's Approximation
* Derivatives of the Marginal Likelihood for EP
Cross-validation
Example
Exercises
Relationships between GPs and Other Models
Reproducing Kernel Hilbert Spaces
Regularization
* Regularization Defined by Differential Operators
Obtaining the Regularized Solution
The Relationship of the Regularization View to Gaussian Process Prediction
Spline Models
* A 1-d Gaussian Process Spline Construction
* Support Vector Machines
Support Vector Classification
Support Vector Regression
* Least-squares Classification
Probabilistic Least-squares Classification
* Relevance Vector Machines
Exercises
Theoretical Perspectives
The Equivalent Kernel
Some Specific Examples of Equivalent Kernels
* Asymptotic Analysis
Consistency
Equivalence and Orthogonality
* Average-case Learning Curves
* PAC-Bayesian Analysis
The PAC Framework
PAC-Bayesian Analysis
PAC-Bayesian Analysis of GP Classification
Comparison with Other Supervised Learning Methods
* Appendix: Learning Curve for the Ornstein-Uhlenbeck Process
Exercises
Approximation Methods for Large Datasets
Reduced-rank Approximations of the Gram Matrix
Greedy Approximation
Approximations for GPR with Fixed Hyperparameters
Subset of Regressors
The Nyström Method
Subset of Datapoints
Projected Process Approximation
Bayesian Committee Machine
Iterative Solution of Linear Systems
Comparison of Approximate GPR Methods
Approximations for GPC with Fixed Hyperparameters
* Approximating the Marginal Likelihood and its Derivatives
* Appendix: Equivalence of SR and GPR Using the Nyström Approximate Kernel
Exercises
Further Issues and Conclusions
Multiple Outputs
Noise Models with Dependencies
Non-Gaussian Likelihoods
Derivative Observations
Prediction with Uncertain Inputs
Mixtures of Gaussian Processes
Global Optimization
Evaluation of Integrals
Student's t Process
Invariances
Latent Variable Models
Conclusions and Future Directions
Appendix Mathematical Background
Joint, Marginal and Conditional Probability
Gaussian Identities
Matrix Identities
Matrix Derivatives
Matrix Norms
Cholesky Decomposition
Entropy and Kullback-Leibler Divergence
Limits
Measure and Integration
Lp Spaces
Fourier Transforms
Convexity
Appendix Gaussian Markov Processes
Fourier Analysis
Sampling and Periodization
Continuous-time Gaussian Markov Processes
Continuous-time GMPs on R
The Solution of the Corresponding SDE on the Circle
Discrete-time Gaussian Markov Processes
Discrete-time GMPs on Z
The Solution of the Corresponding Difference Equation on P_N
The Relationship Between Discrete-time and Sampled Continuous-time GMPs
Markov Processes in Higher Dimensions
Appendix Datasets and Code
Bibliography
Author Index
Subject Index
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. Gaussian Processes for Machine Learning
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. Adaptive Computation and Machine Learning Thomas Dietterich, Editor Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors Bioinformatics: The Machine Learning Approach, Pierre Baldi and Søren Brunak Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto Graphical Models for Machine Learning and Digital Communication, Brendan J. Frey Learning in Graphical Models, Michael I. Jordan Causation, Prediction, and Search, second edition, Peter Spirtes, Clark Glymour, and Richard Scheines Principles of Data Mining, David Hand, Heikki Mannila, and Padhraic Smyth Bioinformatics: The Machine Learning Approach, second edition, Pierre Baldi and Søren Brunak Learning Kernel Classifiers: Theory and Algorithms, Ralf Herbrich Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Bernhard Sch¨olkopf and Alexander J. Smola Introduction to Machine Learning, Ethem Alpaydin Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. Gaussian Processes for Machine Learning Carl Edward Rasmussen Christopher K. I. Williams The MIT Press Cambridge, Massachusetts London, England
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. c 2006 Massachusetts Institute of Technology All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information, please email special sales@mitpress.mit.edu or write to Special Sales Department, The MIT Press, 55 Hayward Street, Cambridge, MA 02142. Typeset by the authors using LATEX 2ε. This book was printed and bound in the United States of America. Library of Congress Cataloging-in-Publication Data Rasmussen, Carl Edward. Gaussian processes for machine learning / Carl Edward Rasmussen, Christopher K. I. Williams. p. cm. —(Adaptive computation and machine learning) Includes bibliographical references and indexes. ISBN 0-262-18253-X 1. Gaussian processes—Data processing. 2. Machine learning—Mathematical models. I. Williams, Christopher K. I. II. Title. III. Series. QA274.4.R37 2006 519.2'3—dc22 10 9 8 7 6 5 4 3 2 2005053433
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man’s mind. — James Clerk Maxwell [1850]
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X.
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. Contents Series Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii Symbols and Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii 1 Introduction 1.1 A Pictorial Introduction to Bayesian Modelling . . . . . . . . . . . . . . . 1.2 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Regression 2.1 Weight-space View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.1 The Standard Linear Model . . . . . . . . . . . . . . . . . . . . . . 2.1.2 Projections of Inputs into Feature Space . . . . . . . . . . . . . . . 2.2 Function-space View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Varying the Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Decision Theory for Regression . . . . . . . . . . . . . . . . . . . . . . . . 2.5 An Example Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Smoothing, Weight Functions and Equivalent Kernels . . . . . . . . . . . Incorporating Explicit Basis Functions . . . . . . . . . . . . . . . . . . . . 2.7.1 Marginal Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8 History and Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ∗ 2.7 1 3 5 7 7 8 11 13 19 21 22 24 27 29 29 30 3 Classification 3.1 Classification Problems 33 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.1.1 Decision Theory for Classification . . . . . . . . . . . . . . . . . . 35 3.2 Linear Models for Classification . . . . . . . . . . . . . . . . . . . . . . . . 37 3.3 Gaussian Process Classification . . . . . . . . . . . . . . . . . . . . . . . . 39 3.4 The Laplace Approximation for the Binary GP Classifier . . . . . . . . . . 41 3.4.1 Posterior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.4.2 Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.4.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.4.4 Marginal Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 47 ∗ 3.5 Multi-class Laplace Approximation . . . . . . . . . . . . . . . . . . . . . . 48 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.6 Expectation Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.6.1 Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.6.2 Marginal Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.6.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.7 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.7.1 A Toy Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.7.2 One-dimensional Example . . . . . . . . . . . . . . . . . . . . . . 62 3.7.3 Binary Handwritten Digit Classification Example . . . . . . . . . . 63 3.7.4 10-class Handwritten Digit Classification Example . . . . . . . . . 70 3.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 ∗Sections marked by an asterisk contain advanced material that may be omitted on a first reading. 3.5.1
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, c 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml ISBN 026218253X. viii Contents ∗ 3.9 Appendix: Moment Derivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.10 Exercises 74 75 4 Covariance Functions ∗ 4.1 Preliminaries 4.2 Examples of Covariance Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.1 Mean Square Continuity and Differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Stationary Covariance Functions . . . . . . . . . . . . . . . . . . . 4.2.2 Dot Product Covariance Functions . . . . . . . . . . . . . . . . . . 4.2.3 Other Non-stationary Covariance Functions . . . . . . . . . . . . . 4.2.4 Making New Kernels from Old . . . . . . . . . . . . . . . . . . . . 4.3 Eigenfunction Analysis of Kernels . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 An Analytic Example . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.2 Numerical Approximation of Eigenfunctions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 79 81 81 82 89 90 94 96 97 98 99 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 String Kernels 4.4.1 4.4.2 Fisher Kernels 4.4 Kernels for Non-vectorial Inputs 4.5 Exercises ∗ 5 Model Selection and Adaptation of Hyperparameters 105 5.1 The Model Selection Problem . . . . . . . . . . . . . . . . . . . . . . . . . 106 5.2 Bayesian Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.3 Cross-validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 5.4 Model Selection for GP Regression . . . . . . . . . . . . . . . . . . . . . . 112 5.4.1 Marginal Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 112 5.4.2 Cross-validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 5.4.3 Examples and Discussion . . . . . . . . . . . . . . . . . . . . . . . 118 5.5 Model Selection for GP Classification . . . . . . . . . . . . . . . . . . . . . 124 5.5.1 Derivatives of the Marginal Likelihood for Laplace’s Approximation 125 5.5.2 Derivatives of the Marginal Likelihood for EP . . . . . . . . . . . . 127 5.5.3 Cross-validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 5.5.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.6 Exercises ∗ ∗ 6 Relationships between GPs and Other Models 129 6.1 Reproducing Kernel Hilbert Spaces . . . . . . . . . . . . . . . . . . . . . . 129 6.2 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 6.2.1 Regularization Defined by Differential Operators . . . . . . . . . . 133 6.2.2 Obtaining the Regularized Solution . . . . . . . . . . . . . . . . . . 135 6.2.3 The Relationship of the Regularization View to Gaussian Process ∗ ∗ ∗ 6.4 Support Vector Machines Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 6.3 Spline Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 6.3.1 A 1-d Gaussian Process Spline Construction . . . . . . . . . . . . . 138 . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 Support Vector Classification . . . . . . . . 
. . . . . . . . . . . . . 141 Support Vector Regression . . . . . . . . . . . . . . . . . . . . . . 145 ∗ 6.5 Least-squares Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 146 6.5.1 Probabilistic Least-squares Classification . . . . . . . . . . . . . . . 147 6.4.1 6.4.2