

Lecture Notes in Statistics 118
Edited by P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg, I. Olkin, N. Wermuth, S. Zeger
Springer Science+Business Media, LLC
Radford M. Neal
Bayesian Learning for Neural Networks
Springer
Radford M. Neal
Department of Statistics and Department of Computer Science
University of Toronto
Toronto, Ontario
Canada M5S 1A4

ISBN 978-0-387-94724-2
ISBN 978-1-4612-0745-0 (eBook)
DOI 10.1007/978-1-4612-0745-0
CIP data available.

Printed on acid-free paper.

© 1996 Springer Science+Business Media New York
Originally published by Springer-Verlag New York, Inc. in 1996

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher Springer Science+Business Media, LLC, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Camera ready copy provided by the author.

9 8 7 6 5 4 3
Preface

This book, a revision of my Ph.D. thesis,¹ explores the Bayesian approach to learning flexible statistical models based on what are known as "neural networks". These models are now commonly used for many applications, but understanding why they (sometimes) work well, and how they can best be employed, is still a matter for research. My aim in the work reported here is two-fold - to show that a Bayesian approach to learning these models can yield theoretical insights, and to show also that it can be useful in practice. The strategy for dealing with complexity that I advocate here for neural network models can also be applied to other complex Bayesian models, as can the computational methods that I employ.

In Chapter 1, I introduce the Bayesian framework for learning, the neural network models that will be examined, and the Markov chain Monte Carlo methods on which the implementation is based. This presentation presupposes only that the reader possesses a basic statistical background. Chapter 1 also introduces the major themes of this book, which involve two fundamental characteristics of Bayesian learning. First, Bayesian learning starts with a prior probability distribution for model parameters, which is supposed to capture our beliefs about the problem derived from background knowledge. Second, Bayesian predictions are not based on a single estimate for the model parameters, but rather are found by integrating the model's predictions with respect to the posterior parameter distribution that we obtain when we update the prior to take account of the data. For neural network models, both these aspects present difficulties - the prior over network parameters has no obvious relation to any prior knowledge we are likely to have, and integration over the posterior distribution is computationally very demanding.

I address the first of these problems in Chapter 2, by defining classes of prior distributions for network parameters that reach sensible limits as the size of the network goes to infinity. In this limit, the properties of these priors can be elucidated. Some priors converge to Gaussian processes, in which functions computed by the network may be smooth, Brownian, or fractionally Brownian. Other priors converge to non-Gaussian stable processes. Interesting effects are obtained by combining priors of both sorts in networks with more than one hidden layer. This work shows that within the Bayesian framework there is no theoretical need to limit the complexity of neural network models. Indeed, limiting complexity is likely to conflict with our prior beliefs, and can therefore be justified only to the extent that it is necessary for computational reasons.

The computational problem of integrating over the posterior distribution is addressed in Chapter 3, using Markov chain Monte Carlo methods. I demonstrate that the hybrid Monte Carlo algorithm, originally developed for applications in quantum chromodynamics, is superior to the methods based on simple random walks that are widely used in statistical applications at present. The hybrid Monte Carlo method makes the use of complex Bayesian network models possible in practice, though the computation time required can still be substantial.

In Chapter 4, I use a hybrid Monte Carlo implementation to test the performance of Bayesian neural network models on several synthetic and real data sets. Good results are obtained on small data sets when large networks are used in conjunction with priors designed to reach limits as network size increases, confirming that with Bayesian learning one need not restrict the complexity of the network based on the size of the data set. A Bayesian approach is also found to be effective in automatically determining the relevance of inputs.

Finally, in Chapter 5, I draw some conclusions from this work, and briefly discuss related work by myself and others since the completion of the original thesis.

Readers interested in pursuing research in this area may obtain free software implementing the methods, as described in Appendix B. One should note, however, that this software is not intended for use in routine data analysis. The software is also designed only for use on Unix systems.

¹ Bayesian Learning for Neural Networks, Department of Computer Science, University of Toronto, 1995.
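In symbols, the two characteristics of Bayesian learning described above amount to forming a posterior distribution by Bayes' rule and then averaging the model's predictions over it. The following is a brief sketch in generic notation, with θ denoting the model parameters, (x^(i), y^(i)) the training cases, and D the training data; the notation may differ in detail from that used in Chapter 1:

\[
p(\theta \mid D) \;\propto\; p(\theta)\,\prod_{i=1}^{n} p\big(y^{(i)} \mid x^{(i)}, \theta\big),
\qquad
p\big(y^{(n+1)} \mid x^{(n+1)}, D\big) \;=\; \int p\big(y^{(n+1)} \mid x^{(n+1)}, \theta\big)\, p(\theta \mid D)\, d\theta
\]

The integral on the right is the computation that Markov chain Monte Carlo methods approximate, by averaging the model's predictions over parameter values sampled from the posterior.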
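As a point of reference for the summary of Chapter 3 above, the following is a minimal, generic sketch of a single hybrid (Hamiltonian) Monte Carlo update. It is written here in Python for illustration only; it is not the book's implementation (the free software described in Appendix B is a separate package), and the function names, arguments, and the Gaussian toy target are assumptions made for this sketch.

```python
import numpy as np

def hmc_step(log_post, grad_log_post, theta, step_size, n_leapfrog, rng):
    """One hybrid (Hamiltonian) Monte Carlo update of the parameter vector theta.

    log_post(theta) returns the log of the (unnormalized) posterior density,
    and grad_log_post(theta) returns its gradient.  The proposal is accepted
    or rejected with a Metropolis test, so the posterior is left invariant.
    """
    # Draw auxiliary momentum variables from a standard Gaussian.
    p = rng.standard_normal(theta.shape)

    # Total "energy": potential (-log posterior) plus kinetic (p'p / 2).
    current_H = -log_post(theta) + 0.5 * np.dot(p, p)

    # Follow the dynamics for n_leapfrog steps of the leapfrog discretization.
    theta_new, p_new = theta.copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_post(theta_new)   # initial half step for momentum
    for _ in range(n_leapfrog - 1):
        theta_new += step_size * p_new                     # full step for position
        p_new += step_size * grad_log_post(theta_new)      # full step for momentum
    theta_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_post(theta_new)    # final half step for momentum

    # Metropolis acceptance test corrects for leapfrog discretization error.
    proposed_H = -log_post(theta_new) + 0.5 * np.dot(p_new, p_new)
    if rng.random() < np.exp(min(0.0, current_H - proposed_H)):
        return theta_new
    return theta

# Toy usage: sample from a two-dimensional standard Gaussian "posterior".
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_post = lambda th: -0.5 * np.dot(th, th)
    grad_log_post = lambda th: -th
    theta, samples = np.zeros(2), []
    for _ in range(1000):
        theta = hmc_step(log_post, grad_log_post, theta,
                         step_size=0.1, n_leapfrog=20, rng=rng)
        samples.append(theta)
    print(np.mean(samples, axis=0), np.var(samples, axis=0))
```

The long deterministic trajectories produced by the leapfrog steps are what allow this method to avoid the random-walk behaviour of simpler Metropolis schemes.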
Of the many people who have contributed to this work, I would like first of all to thank my thesis advisor, Geoffrey Hinton. His enthusiasm for understanding learning, his openness to new ideas, and his ability to provide insightful criticism have made working with him a joy. I am also fortunate to have been part of the research group he has led, and of the wider AI group at the University of Toronto. I would particularly like to thank fellow students Richard Mann, Carl Rasmussen, and Chris Williams for their helpful comments on this work and its precursors. My thanks also go to the present and former members of my Ph.D. committee, Mike Evans, Scott Graham, Rudy Mathon, Demetri Terzopoulos, and Rob Tibshirani.

I am especially pleased to thank David MacKay, whose work on Bayesian learning and its application to neural network models has been an inspiration to me. He has also contributed much to this work through many conversations and e-mail exchanges, which have ranged from the philosophy of Bayesian inference to detailed comments on presentation. I have benefited from discussions with other researchers as well, in particular, Wray Buntine, Brian Ripley, Hans Henrik Thodberg, and David Wolpert.

This work was funded by the Natural Sciences and Engineering Research Council of Canada and by the Information Technology Research Centre. For part of my studies, I was supported by an Ontario Government Scholarship.
Contents

Preface
1 Introduction
1.1 Bayesian and frequentist views of learning
1.1.1 Models and likelihood
1.1.2 Bayesian learning and prediction
1.1.3 Hierarchical models
1.1.4 Learning complex models
1.2 Bayesian neural networks
1.2.1 Multilayer perceptron networks
1.2.2 Selecting a network model and prior
1.2.3 Automatic Relevance Determination (ARD) models
1.2.4 An illustration of Bayesian learning for a neural net
1.2.5 Implementations based on Gaussian approximations
1.3 Markov chain Monte Carlo methods
1.3.1 Monte Carlo integration using Markov chains
1.3.2 Gibbs sampling
1.3.3 The Metropolis algorithm
1.4 Outline of the remainder of the book