Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan

Digital Assets

Acknowledgments

1. Why do we Need Statistical Models and What is this Book About?

1.1 Why We Need Statistical Models

1.2 What This Book is About

Further Reading

2. Prerequisites and Vocabulary

2.1 Software

2.1.1 What Is R?

2.1.2 Working with R

2.2 Important Statistical Terms and How to Handle Them in R

2.2.1 Data Sets, Variables, and Observations

2.2.2 Distributions and Summary Statistics

2.2.3 More on R Objects

2.2.4 R Functions for Graphics

2.2.5 Writing Our Own R Functions

Further Reading

3. The Bayesian and the Frequentist Ways of Analyzing Data

3.1 Short Historical Overview

3.2 The Bayesian Way

3.2.1 Estimating the Mean of a Normal Distribution with a Known Variance

3.2.2 Estimating Mean and Variance of a Normal Distribution Using Simulation

3.3 The Frequentist Way

3.4 Comparison of the Bayesian and the Frequentist Ways

Further Reading

4. Normal Linear Models

4.1 Linear Regression

4.1.1 Background

4.1.2 Fitting a Linear Regression in R

4.1.3 Drawing Conclusions

4.1.4 Frequentist Results

4.2 Regression Variants: ANOVA, ANCOVA, and Multiple Regression

4.2.1 One-Way ANOVA

4.2.2 Frequentist Results from a One-Way ANOVA

4.2.3 Two-Way ANOVA

4.2.4 Frequentist Results from a Two-Way ANOVA

4.2.5 Multiple Comparisons and Post Hoc Tests

4.2.6 Analysis of Covariance

4.2.7 Multiple Regression and Collinearity

4.2.8 Ordered Factors and Contrasts

4.2.9 Quadratic and Higher Polynomial Terms

Further Reading

5. Likelihood

5.1 Theory

5.2 The Maximum Likelihood Method

5.3 The Log Pointwise Predictive Density

Further Reading

6. Assessing Model Assumptions: Residual Analysis

6.1 Model Assumptions

6.2 Independent and Identically Distributed

6.3 The QQ Plot

6.4 Temporal Autocorrelation

6.5 Spatial Autocorrelation

6.6 Heteroscedasticity

Further Reading

7. Linear Mixed Effects Models

7.1 Background

7.1.1 Why Mixed Effects Models?

7.1.2 Random Factors and Partial Pooling

7.2 Fitting a Linear Mixed Model in R

7.3 Restricted Maximum Likelihood Estimation

7.4 Assessing Model Assumptions

7.5 Drawing Conclusions

7.6 Frequentist Results

7.7 Random Intercept and Random Slope

7.8 Nested and Crossed Random Effects

7.9 Model Selection in Mixed Models

Further Reading

8. Generalized Linear Models

8.1 Background

8.2 Binomial Model

8.2.1 Background

8.2.2 Fitting a Binomial Model in R

8.2.3 Assessing Model Assumptions: Overdispersion and Zero-Inflation

Option 1

Option 2

8.2.4 Drawing Conclusions

8.2.5 Frequentist Results

8.3 Fitting a Binary Logistic Regression in R

8.3.1 Some Final Remarks

8.4 Poisson Model

8.4.1 Background

8.4.2 Fitting a Poisson-Model in R

8.4.3 Assessing Model Assumptions

8.4.4 Drawing Conclusions

8.4.5 Modeling Rates and Densities: Poisson Model with an Offset

8.4.6 Frequentist Results

Further Reading

9. Generalized Linear Mixed Models

9.1 Binomial Mixed Model

9.1.1 Background

9.1.2 Fitting a Binomial Mixed Model in R

9.1.3 Assessing Model Assumptions

9.1.4 Drawing Conclusions

9.2 Poisson Mixed Model

9.2.1 Background

9.2.2 Fitting a Poisson Mixed Model in R

9.2.3 Assessing Model Assumptions

9.2.4 Drawing Conclusions

9.2.5 Modeling Bird Densities by a Poisson Mixed Model Including an Offset

Further Reading

10. Posterior Predictive Model Checking and Proportion of Explained Variance

10.1 Posterior Predictive Model Checking

10.2 Measures of Explained Variance

Further Reading

11. Model Selection and Multimodel Inference

11.1 When and Why We Select Models and Why This is Difficult

11.2 Methods for Model Selection and Model Comparisons

11.2.1 Cross-Validation

11.2.2 Information Criteria: Akaike Information Criterion and Widely Applicable Information Criterion

11.2.3 Other Information Criteria

11.2.4 Bayes Factors and Posterior Model Probabilities

11.2.5 Model-Based Methods to Obtain Posterior Model Probabilities and Inclusion Probabilities

11.2.6 “Least Absolute Shrinkage and Selection Operator” (LASSO) and Ridge Regression

11.3 Multimodel Inference

11.4 Which Method to Choose and Which Strategy to Follow

Further Reading

12. Markov Chain Monte Carlo Simulation

12.1 Background

12.2 MCMC Using BUGS

12.2.1 Using BUGS from OpenBUGS

12.2.2 Using BUGS from R

12.3 MCMC Using Stan

12.4 Sim, BUGS, and Stan

Further Reading

13. Modeling Spatial Data Using GLMM

13.1 Background

13.2 Modeling Assumptions

13.3 Explicit Modeling of Spatial Autocorrelation

13.3.1 Starting the Model Fitting

13.3.2 Variogram Modeling

13.3.3 Bayesian Modeling

13.3.4 OpenBUGS Example

Further Reading

14. Advanced Ecological Models

14.1 Hierarchical Multinomial Model to Analyze Habitat Selection Using BUGS

14.2 Zero-Inflated Poisson Mixed Model for Analyzing Breeding Success Using Stan

14.3 Occupancy Model to Measure Species Distribution Using Stan

14.4 Territory Occupancy Model to Estimate Survival Using BUGS

14.5 Analyzing Survival Based on Mark-Recapture Data Using Stan

Further Reading

15. Prior Influence and Parameter Estimability

15.1 How to Specify Prior Distributions

15.2 Prior Sensitivity Analysis

15.3 Parameter Estimability

Further Reading

16. Checklist

16.1 Data Analysis Step by Step

Step 1: Plausibility of Data

Step 2: Relationships

Step 3: Error Distribution

Step 4: Preparation of Explanatory Variables

Step 5: Data Structure

Step 6: Fit the Model

Step 7: Check Model Assumptions, Fit, and Sensitivity

Step 8: Model Uncertainty

Step 9: Draw Conclusions

Further Reading

17. What Should I Report in a Paper

17.1 How to Present the Results

17.2 How to Write Up the Statistical Methods

Further Reading

References

Index

Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan

Fränzi Korner-Nievergelt, Tobias Roth, Stefanie von Felten, Jérôme Guélat, Bettina Almasi, Pius Korner-Nievergelt

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier

32 Jamestown Road, London NW1 7BY, UK
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
225 Wyman Street, Waltham, MA 02451, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

Copyright © 2015 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangement with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0-12-801370-0

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

For information on all Academic Press Publications visit our website at http://store.elsevier.com/

Printed and bound in the USA

Digital Assets

Thank you for selecting Academic Press’ Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan. To complement the learning experience, we have provided a number of online tools to accompany this edition.

To view the R-package “blmeco” that contains all example data and some specific functions presented in the book, visit www.r-project.org. The full R-Code and exercises for each chapter are provided at www.oikostat.ch/blmeco.htm.

Acknowledgments

The basis of this book is a course script written for statistics classes at the International Max Planck Research School for Organismal Biology (IMPRS); see www.orn.mpg.de/2453/Short_portrait. We, therefore, sincerely thank all the IMPRS students who have used the script and worked with us. The text grew as a result of questions and problems that appeared during the application of linear models to the various Ph.D. projects of the IMPRS students. Their enthusiasm in analyzing data and discussion of their problems motivated us to write this book, with the hope that it will be of help to future students. We especially thank Daniel Piechowski and Margrit Hieber-Ruiz for hiring us to give the courses at the IMPRS.

The main part of the book was written by FK and PK during time spent at the Colorado Cooperative Fish and Wildlife Research Unit and the Department of Fish, Wildlife, and Conservation Biology at Colorado State University in the spring of 2014. Here, we were kindly hosted and experienced a motivating time. William Kendall made this possible, for which we are very grateful. Gabriele Engeler and Joyce Pratt managed the administrative challenges of our tenure there and made us feel at home. Allison Level kindly introduced us to the CSU library system, which we used extensively while writing this book. We enjoyed a very inspiring environment and cordially thank all the Fish, Wildlife, and Conservation Biology staff and students whom we met during our stay.

The companies and institutions at which the authors were employed during the work on the book always positively supported the project, even when it produced delays in other projects. We are grateful to all our colleagues at the Swiss Ornithological Institute (www.vogelwarte.ch), oikostat GmbH (www.oikostat.ch), Hintermann & Weber AG (www.hintermannweber.ch), the University of Basel, and the Clinical Trial Unit at the University of Basel (www.scto.ch/en/CTU-Network/CTU-Basel.html).

We are very grateful to the R Development Core Team (http://www.r-project.org/contributors.html) for providing and maintaining this wonderful software and network tool. We appreciate the flexibility and understandability of the language R and the possibility to easily exchange code. Similarly, we would like to thank the developers of BUGS (http://www.openbugs.net/w/BugsDev) and Stan (http://mc-stan.org/team.html) for making their extremely useful software freely available. Coding BUGS or Stan has helped us in many cases to think more clearly about the biological processes that have generated our data.

Example data were kindly provided by the Ulmet-Kommission (www.bnv.ch), the Landschaft und Gewässer of Kanton Aargau, the Swiss Ornithological Institute (www.vogelwarte.ch), Valentin Amrhein, Anja Bock, Christoph Bühler, Karl-Heinz Clever, Thomas Gottschalk, Martin Grüebler, Günther Herbert, Thomas Hoffmeister, Rainer Holler, Beat Naef-Daenzer, Werner Peter, Luc Schifferli, Udo Seum, Maris Strazds, and Jean-Luc Zollinger.

For comments on the manuscript we thank Martin Bulla, Kim Meichtry-Stier, and Marco Perrig. We also thank Roey Angel, Karin Boos, Paul Conn, and two anonymous reviewers for many valuable suggestions regarding the book’s structure and details in the text. Holger Schielzeth gave valuable comments and input for Chapter 10, and David Anderson and Michael Schaub commented on Chapter 11. Bob Carpenter figured out essential parts of the Stan code for the Cormack-Jolly-Seber model. Michael Betancourt and Bob Carpenter commented on the introduction to MCMC and the Stan examples. Valentin Amrhein and Barbara Helm provided input for Chapter 17. All these people greatly improved the quality of the book, made the text more accessible, and helped reduce the error rate.

Finally, we are extremely thankful for the tremendous work that Kate Huyvaert did proofreading our English.

Chapter 1

Why do we Need Statistical Models and What is this Book About?

http://dx.doi.org/10.1016/B978-0-12-801370-0.00001-0

Chapter Outline
1.1 Why We Need Statistical Models
1.2 What This Book is About

1.1 WHY WE NEED STATISTICAL MODELS

There are at least four main reasons why statistical models are used: (1) models help to describe how we think a system works, (2) data can be summarized using models, (3) comparison of model predictions with data helps with understanding the system, and (4) models allow for predictions, including the quantification of their uncertainty, and, therefore, they help with making decisions.

A statistical model is a mathematical construct based on probability theory that aims to reconstruct the system or the process under study; the data are observations of this system or process. When we speak of “models” in this book, we always mean statistical models. Models express what we know (or, better, what we think we know) about a natural system. The difference between the model and the observations shows that what we think about the system may still not be realistic and, therefore, points out what we may want to think about more intensively. In this way, statistical models help with understanding natural systems.

Analyzing data using statistical models is rarely just applying one model to the data and extracting the results. Rather, it is an iterative process of fitting a model, comparing the model with the data, gaining insight into the system from specific discrepancies between the model and the data, and then finding a more realistic model. Analyzing data using statistical models is a learning process. Reality is usually too complex to be perfectly represented by a model. Thus, no model is perfect, but a good model is useful (e.g., Box, 1979). Often, several models may be plausible and fit the data reasonably well. In such cases, the inference can be based on the set of all models, or a model that performs best for a specific purpose is selected. In Chapter 11 we have compiled a number of approaches we found useful for model comparisons and multimodel inference.

Once we have one or several models, we want to draw inferences from the model(s). Estimates of the effects of the predictor variables on the outcome variables, fitted values, or derived quantities that are of biological interest are extracted, together with an uncertainty estimate. In this book we use, except in one example, Bayesian methods to assess uncertainty of the estimates.

Models summarize data. When we have measured the height of 100 trees in a forest and we would like to report these heights to colleagues, we report the mean and the standard deviation instead of reporting all 100 values. The mean and the standard deviation, together with a distributional assumption (e.g., the normal distribution), represent a statistical model that describes the data. We do not need to report all 100 values because the 2 values (mean and standard deviation) describe the distribution of the 100 values sufficiently well so that people have a picture of the heights of the 100 trees. With increasing complexity of the data, we need more complex models that summarize the data in a sensible way.

Statistical models are widely applied because they allow for quantifying uncertainty and making predictions. A well-known application of statistical models is the weather forecast. Additional examples include the prediction of bird or bat collision risks at wind energy turbines based on some covariates, the avalanche bulletins, or all the models used to predict changes of an ecosystem when temperatures rise. Political decisions are often based on models or model predictions. Models are pervasive; they even govern our daily life. For example, we first expected our children to get home before 3:30 p.m. because we knew that the school bus drops them off at 3:24, and a child can walk 200 m in around 4 min. What we had in mind was a model child. After some weeks observing the time our children came home after school, we could compare the model prediction with real data. Based on this comparison and short interviews with the children, we included “playing with the neighbor’s dog” in our model and updated the expected arrival time to 3:45 p.m.

1.2 WHAT THIS BOOK IS ABOUT

This book is about a broad class of statistical models called linear models. Such models have a systematic part and a stochastic part. The systematic part describes how the outcome variable (y, variable of interest) is related to the predictor variables (x, explanatory variables). This part produces the fitted values that are completely defined by the values of the predictor variables. The stochastic part of the model describes the scatter of the observations around the fitted values using a probability distribution. For example, a regression line is the systematic part of the model, and the scatter of the data around the regression line (more precisely: the distribution of the residuals) is the stochastic part.
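The split into a systematic and a stochastic part can be illustrated with a small simulation. The sketch below is not taken from the book, which works in R, BUGS, and Stan. It is a plain Python illustration that assumes a single predictor, normally distributed residuals, and ordinary least squares as the fitting method. The function name `fit_line` and the "true" parameter values are hypothetical and chosen only for this example.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, sigma)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    # Systematic part: fitted values depend only on the predictor x.
    fitted = [a + b * xi for xi in x]
    # Stochastic part: scatter of the observations around the fitted values,
    # summarized by the residual standard deviation (n - 2 degrees of freedom).
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    sigma = (sum(r ** 2 for r in residuals) / (len(x) - 2)) ** 0.5
    return a, b, sigma


# Simulate data from a known model (assumed values, for illustration only).
rng = random.Random(1)
true_a, true_b, true_sigma = 2.0, 0.5, 1.0
x = [rng.uniform(0, 10) for _ in range(200)]
y = [true_a + true_b * xi + rng.gauss(0, true_sigma) for xi in x]

a_hat, b_hat, sigma_hat = fit_line(x, y)
print(f"intercept {a_hat:.2f}, slope {b_hat:.2f}, residual SD {sigma_hat:.2f}")
```

With enough observations, the estimated intercept and slope should lie close to the values used in the simulation, and the residual standard deviation should approximate the spread of the simulated noise. This sketch returns point estimates only and does not quantify their uncertainty.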

Linear models are probably the most commonly used models in biology and in many other research areas. Linear models form the basis for many statistical methods such as survival analysis, structural equation analysis, variance components analysis, time-series analysis, and most multivariate techniques. It is of crucial importance to understand linear models when doing quantitative research in biology, agronomy, social sciences, and so on. This book introduces linear models and describes how to fit linear models in R, BUGS, and Stan. The book is written for scientists (particularly organismal biologists and ecologists; many of our examples come from ecology). The number of mathematical formulae is reduced to what we think is essential to correctly interpret model structure and results.

Chapter 2 provides some basic information regarding software used in this book, important statistical terms, and how to work with them using the statistical software package R, which is used in most chapters of the book.

The linear relationship between the outcome y and the predictor x can be straightforward, as in linear models with normal error distribution (normal linear model, LM, Chapter 4). But the linear relationship can also be indirect via a link function. In this case, the direct linear relationship is between a transformed outcome variable and the predictor variables, and, usually, the model has a nonnormal error distribution such as Poisson or binomial (generalized linear model, GLM, Chapter 8). Generalized linear models can handle outcome variables that are not on a continuous and infinite scale, such as counts and proportions.

For some linear models (LM, GLM) the observations are required to be independent of each other. However, this is often not the case, for example, when more than one measurement is taken on the same individual (i.e., repeated measurements) or when several individuals belong to the same nest, farm, or another grouping factor. Such data should be analyzed using mixed models (LMM, GLMM, Chapters 7 and 9); they account for the nonindependence of the observations. Nonindependence of data may also be introduced when observations are made close to each other (in space or time). In Chapter 6 we show how temporal or spatial autocorrelation is detected and we give a few hints about how temporal correlation can be addressed. In Chapter 13, we analyze spatial data using a species distribution example.

Chapter 14 contains examples of more complex analyses of ecological data sets. These models should be understandable with the theory learned in the first part of the book. The chapter presents ideas on how the linear model can be expanded to more complex models. The software BUGS and Stan, introduced in Chapter 12, are used to fit these complex models. BUGS and Stan are relatively easy to use and flexible enough to build many specific models. We hope that this chapter motivates biologists and others to build their own models for the particular process they are investigating.

Throughout the book, we give high importance to model checking using graphical methods. Residual analysis is discussed in Chapter 6. Chapter 10
