Springer Texts in Statistics
Series Editors: G. Casella, S. Fienberg, I. Olkin
For other titles published in this series, go to http://www.springer.com/series/417
Peter D. Hoff
A First Course in Bayesian Statistical Methods
Springer
Peter D. Hoff
Department of Statistics
University of Washington
Seattle, WA 98195-4322
USA
hoff@stat.washington.edu

ISSN 1431-875X
ISBN 978-0-387-92299-7        e-ISBN 978-0-387-92407-6
DOI 10.1007/978-0-387-92407-6
Springer Dordrecht Heidelberg London New York

Library of Congress Control Number: 2009929120

© Springer Science+Business Media, LLC 2009

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
Preface

This book originated from a set of lecture notes for a one-quarter graduate-level course taught at the University of Washington. The purpose of the course is to familiarize the students with the basic concepts of Bayesian theory and to quickly get them performing their own data analyses using Bayesian computational tools. The audience for this course includes non-statistics graduate students who did well in their department's graduate-level introductory statistics courses and who also have an interest in statistics. Additionally, first- and second-year statistics graduate students have found this course to be a useful introduction to statistical modeling. Like the course, this book is intended to be a self-contained and compact introduction to the main concepts of Bayesian theory and practice. By the end of the text, readers should have the ability to understand and implement the basic tools of Bayesian statistical methods for their own data analysis purposes. The text is not intended as a comprehensive handbook for advanced statistical researchers, although it is hoped that this latter category of readers could use this book as a quick introduction to Bayesian methods and as a preparation for more comprehensive and detailed studies.

Computing

Monte Carlo summaries of posterior distributions play an important role in the way data analyses are presented in this text. My experience has been that once a student understands the basic idea of posterior sampling, their data analyses quickly become more creative and meaningful, using relevant posterior predictive distributions and interesting functions of parameters. The open-source R statistical computing environment provides sufficient functionality to make Monte Carlo estimation very easy for a large number of statistical models, and example R code is provided throughout the text. Much of the example code can be run "as is" in R, and essentially all of it can be run after downloading the relevant datasets from the companion website for this book.
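To give a taste of this style of analysis, the following is a minimal R sketch in the spirit of the book's examples (the model, data values and variable names here are illustrative assumptions, not code taken from the text): a beta-binomial model in which a posterior mean, a posterior interval, an event probability and a posterior predictive distribution are all approximated by averaging over simulated draws.

  ## Minimal sketch (illustrative, not from the book): Monte Carlo
  ## summary of a beta posterior for a binomial proportion theta.
  ## With y successes in n binary trials and a uniform beta(1,1)
  ## prior, the posterior is theta | y ~ beta(1 + y, 1 + n - y).
  y <- 2 ; n <- 10                            # illustrative data
  theta.mc <- rbeta(10000, 1 + y, 1 + n - y)  # 10,000 posterior draws

  mean(theta.mc)                      # Monte Carlo estimate of E[theta | y]
  quantile(theta.mc, c(.025, .975))   # 95% posterior interval for theta
  mean(theta.mc < 0.5)                # Pr(theta < 0.5 | y), a function of theta

  ## Posterior predictive distribution of a future count out of 10 trials,
  ## obtained by drawing one binomial outcome per posterior draw of theta:
  y.pred <- rbinom(10000, 10, theta.mc)
  table(y.pred) / length(y.pred)      # approximate predictive probabilities

Once the draws theta.mc are in hand, any function of the parameter or of future data is summarized the same way, which is the point the text develops at length.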
Acknowledgments

The presentation of material in this book, and my teaching style in general, have been heavily influenced by the diverse set of students taking CSSS-STAT 564 at the University of Washington. My thanks to them for improving my teaching. I also thank Chris Hoffman, Vladimir Minin, Xiaoyue Niu and Marc Suchard for their extensive comments, suggestions and corrections for this book, and Adrian Raftery for bibliographic suggestions. Finally, I thank my wife Jen for her patience and support.

Seattle, WA                                                    Peter Hoff
March 2009
Contents

1 Introduction and examples .......................................... 1
  1.1 Introduction ................................................... 1
  1.2 Why Bayes? ..................................................... 2
      1.2.1 Estimating the probability of a rare event ............... 3
      1.2.2 Building a predictive model .............................. 8
  1.3 Where we are going ............................................. 11
  1.4 Discussion and further references .............................. 12

2 Belief, probability and exchangeability ............................ 13
  2.1 Belief functions and probabilities ............................. 13
  2.2 Events, partitions and Bayes' rule ............................. 14
  2.3 Independence ................................................... 17
  2.4 Random variables ............................................... 17
      2.4.1 Discrete random variables ................................ 18
      2.4.2 Continuous random variables .............................. 19
      2.4.3 Descriptions of distributions ............................ 21
  2.5 Joint distributions ............................................ 23
  2.6 Independent random variables ................................... 26
  2.7 Exchangeability ................................................ 27
  2.8 de Finetti's theorem ........................................... 29
  2.9 Discussion and further references .............................. 30

3 One-parameter models ............................................... 31
  3.1 The binomial model ............................................. 31
      3.1.1 Inference for exchangeable binary data ................... 35
      3.1.2 Confidence regions ....................................... 41
  3.2 The Poisson model .............................................. 43
      3.2.1 Posterior inference ...................................... 45
      3.2.2 Example: Birth rates ..................................... 48
  3.3 Exponential families and conjugate priors ...................... 51
  3.4 Discussion and further references .............................. 52

4 Monte Carlo approximation .......................................... 53
  4.1 The Monte Carlo method ......................................... 53
  4.2 Posterior inference for arbitrary functions .................... 57
  4.3 Sampling from predictive distributions ......................... 60
  4.4 Posterior predictive model checking ............................ 62
  4.5 Discussion and further references .............................. 65

5 The normal model ................................................... 67
  5.1 The normal model ............................................... 67
  5.2 Inference for the mean, conditional on the variance ............ 69
  5.3 Joint inference for the mean and variance ...................... 73
  5.4 Bias, variance and mean squared error .......................... 79
  5.5 Prior specification based on expectations ...................... 83
  5.6 The normal model for non-normal data ........................... 84
  5.7 Discussion and further references .............................. 86

6 Posterior approximation with the Gibbs sampler ..................... 89
  6.1 A semiconjugate prior distribution ............................. 89
  6.2 Discrete approximations ........................................ 90
  6.3 Sampling from the conditional distributions .................... 92
  6.4 Gibbs sampling ................................................. 93
  6.5 General properties of the Gibbs sampler ........................ 96
  6.6 Introduction to MCMC diagnostics ............................... 98
  6.7 Discussion and further references .............................. 104

7 The multivariate normal model ...................................... 105
  7.1 The multivariate normal density ................................ 105
  7.2 A semiconjugate prior distribution for the mean ................ 107
  7.3 The inverse-Wishart distribution ............................... 109
  7.4 Gibbs sampling of the mean and covariance ...................... 112
  7.5 Missing data and imputation .................................... 115
  7.6 Discussion and further references .............................. 123

8 Group comparisons and hierarchical modeling ........................ 125
  8.1 Comparing two groups ........................................... 125
  8.2 Comparing multiple groups ...................................... 130
      8.2.1 Exchangeability and hierarchical models .................. 131
  8.3 The hierarchical normal model .................................. 132
      8.3.1 Posterior inference ...................................... 133
  8.4 Example: Math scores in U.S. public schools .................... 135
      8.4.1 Prior distributions and posterior approximation .......... 137
      8.4.2 Posterior summaries and shrinkage ........................ 140
  8.5 Hierarchical modeling of means and variances ................... 143
      8.5.1 Analysis of math score data .............................. 145
  8.6 Discussion and further references .............................. 146

9 Linear regression .................................................. 149
  9.1 The linear regression model .................................... 149
      9.1.1 Least squares estimation for the oxygen uptake data ...... 153
  9.2 Bayesian estimation for a regression model ..................... 154
      9.2.1 A semiconjugate prior distribution ....................... 154
      9.2.2 Default and weakly informative prior distributions ....... 155
  9.3 Model selection ................................................ 160
      9.3.1 Bayesian model comparison ................................ 163
      9.3.2 Gibbs sampling and model averaging ....................... 167
  9.4 Discussion and further references .............................. 170

10 Nonconjugate priors and Metropolis-Hastings algorithms ............ 171
  10.1 Generalized linear models ..................................... 171
  10.2 The Metropolis algorithm ...................................... 173
  10.3 The Metropolis algorithm for Poisson regression ............... 179
  10.4 Metropolis, Metropolis-Hastings and Gibbs ..................... 181
      10.4.1 The Metropolis-Hastings algorithm ....................... 182
      10.4.2 Why does the Metropolis-Hastings algorithm work? ........ 184
  10.5 Combining the Metropolis and Gibbs algorithms ................. 187
      10.5.1 A regression model with correlated errors ............... 188
      10.5.2 Analysis of the ice core data ........................... 191
  10.6 Discussion and further references ............................. 192

11 Linear and generalized linear mixed effects models ................ 195
  11.1 A hierarchical regression model ............................... 195
  11.2 Full conditional distributions ................................ 198
  11.3 Posterior analysis of the math score data ..................... 200
  11.4 Generalized linear mixed effects models ....................... 201
      11.4.1 A Metropolis-Gibbs algorithm for posterior approximation 202
      11.4.2 Analysis of tumor location data ......................... 203
  11.5 Discussion and further references ............................. 207

12 Latent variable methods for ordinal data .......................... 209
  12.1 Ordered probit regression and the rank likelihood ............. 209
      12.1.1 Probit regression ....................................... 211
      12.1.2 Transformation models and the rank likelihood ........... 214
  12.2 The Gaussian copula model ..................................... 217
      12.2.1 Rank likelihood for copula estimation ................... 218
  12.3 Discussion and further references ............................. 223

Exercises ............................................................ 225
Common distributions ................................................. 253
References ........................................................... 259
Index ................................................................ 267