applied econometrics with r.pdf

Preface
Contents
1 Introduction
1.1 An Introductory R Session
1.2 Getting Started
1.3 Working with R
1.4 Getting Help
1.5 The Development Model
1.6 A Brief History of R
2 Basics
2.1 R as a Calculator
2.2 Matrix Operations
2.3 R as a Programming Language
2.4 Formulas
2.5 Data Management in R
2.6 Object Orientation
2.7 R Graphics
2.8 Exploratory Data Analysis with R
2.9 Exercises
3 Linear Regression
3.1 Simple Linear Regression
Analysis of variance
Point and interval estimates
Prediction
Plotting "lm" objects
Testing a linear hypothesis
3.2 Multiple Linear Regression
Dummy variables and contrast coding
The function I()
Comparison of models
3.3 Partially Linear Models
3.4 Factors, Interactions, and Weights
Interactions
Separate regressions for each level
Change of the reference category
Weighted least squares
3.5 Linear Regression with Time Series Data
Encompassing test
3.6 Linear Regression with Panel Data
Static linear models
Dynamic linear models
3.7 System of Linear Equations
3.8 Exercises
4 Diagnostics and Alternative Methods of Regression
4.1 Regression Diagnostics
Leverage and standardized residuals
Deletion diagnostics
The function influence.measures()
4.2 Diagnostic Tests
Testing for heteroskedasticity
Testing the functional form
Testing for autocorrelation
4.3 Robust Standard Errors and Tests
HC estimators
HAC estimators
4.4 Resistant Regression
4.5 Quantile Regression
4.6 Exercises
5 Models of Microeconometrics
5.1 Generalized Linear Models
5.2 Binary Dependent Variables
Visualization
Effects
Goodness of fit and Prediction
Residuals and diagnostics
(Quasi-)complete separation
5.3 Regression Models for Count Data
Dealing with overdispersion
Robust standard errors
Zero-inflated Poisson and negative binomial models
Hurdle models
5.4 Censored Dependent Variables
5.5 Extensions
A semiparametric binary response model
Multinomial responses
Ordinal responses
5.6 Exercises
6 Time Series
6.1 Infrastructure and "Naive" Methods
Classes for time series data
(Linear) filtering
Decomposition
Exponential smoothing
6.2 Classical Model-Based Analysis
6.3 Stationarity, Unit Roots, and Cointegration
Unit-root tests
Stationarity tests
Cointegration
6.4 Time Series Regression and Structural Change
More on fitting dynamic regression models
Structural change tests
Dating structural changes
6.5 Extensions
Structural time series models
GARCH models
6.6 Exercises
7 Programming Your Own Analysis
7.1 Simulations
Data-generating process
Evaluation for a single scenario
Iterated evaluation over all scenarios
Simulation and summary
7.2 Bootstrapping a Linear Regression
7.3 Maximizing a Likelihood
7.4 Reproducible Econometrics Using Sweave()
7.5 Exercises
References
Index
Use R!

Advisors: Robert Gentleman · Kurt Hornik · Giovanni Parmigiani

Albert: Bayesian Computation with R
Bivand/Pebesma/Gómez-Rubio: Applied Spatial Data Analysis with R
Claude: Morphometrics with R
Cook/Swayne: Interactive and Dynamic Graphics for Data Analysis: With R and GGobi
Hahne/Huber/Gentleman/Falcon: Bioconductor Case Studies
Kleiber/Zeileis: Applied Econometrics with R
Nason: Wavelet Methods in Statistics with R
Paradis: Analysis of Phylogenetics and Evolution with R
Peng/Dominici: Statistical Methods for Environmental Epidemiology with R: A Case Study in Air Pollution and Health
Pfaff: Analysis of Integrated and Cointegrated Time Series with R, 2nd edition
Sarkar: Lattice: Multivariate Data Visualization with R
Spector: Data Manipulation with R
Christian Kleiber · Achim Zeileis

Applied Econometrics with R
Christian Kleiber
Universität Basel
WWZ, Department of Statistics and Econometrics
Petersgraben 51
CH-4051 Basel
Switzerland
Christian.Kleiber@unibas.ch

Achim Zeileis
Wirtschaftsuniversität Wien
Department of Statistics and Mathematics
Augasse 2–6
A-1090 Wien
Austria
Achim.Zeileis@wu-wien.ac.at

Series Editors

Robert Gentleman
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., M2-B876
PO Box 19024, Seattle, Washington 98102-1024
USA

Kurt Hornik
Department of Statistics and Mathematics
Wirtschaftsuniversität Wien
Augasse 2–6
A-1090 Wien
Austria

Giovanni Parmigiani
The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University
550 North Broadway
Baltimore, MD 21205-2011
USA

ISBN: 978-0-387-77316-2
e-ISBN: 978-0-387-77318-6
DOI: 10.1007/978-0-387-77318-6
Library of Congress Control Number: 2008934356

© 2008 Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

springer.com
Preface

R is a language and environment for data analysis and graphics. It may be considered an implementation of S, an award-winning language initially developed at Bell Laboratories since the late 1970s. The R project was initiated by Robert Gentleman and Ross Ihaka at the University of Auckland, New Zealand, in the early 1990s, and has been developed by an international team since mid-1997.

Historically, econometricians have favored other computing environments, some of which have fallen by the wayside, and also a variety of packages with canned routines. We believe that R has great potential in econometrics, both for research and for teaching. There are at least three reasons for this: (1) R is mostly platform independent and runs on Microsoft Windows, the Mac family of operating systems, and various flavors of Unix/Linux, and also on some more exotic platforms. (2) R is free software that can be downloaded and installed at no cost from a family of mirror sites around the globe, the Comprehensive R Archive Network (CRAN); hence students can easily install it on their own machines. (3) R is open-source software, so that the full source code is available and can be inspected to understand what it really does, learn from it, and modify and extend it. We also like to think that platform independence and the open-source philosophy make R an ideal environment for reproducible econometric research.

This book provides an introduction to econometric computing with R; it is not an econometrics textbook. Preferably readers have taken an introductory econometrics course before, but not necessarily one that makes heavy use of matrices. However, we do assume that readers are somewhat familiar with matrix notation, specifically matrix representations of regression models. Thus, we hope the book might be suitable as a “second book” for a course with sufficient emphasis on applications and practical issues at the intermediate or beginning graduate level. It is hoped that it will also be useful to professional economists and econometricians who wish to learn R.

We cover linear regression models for cross-section and time series data as well as the common nonlinear models of microeconometrics, such as logit, probit, and tobit models, as well as regression models for count data. In addition, we provide a chapter on programming, including simulations, optimization, and an introduction to Sweave(), an environment that allows integration of text and code in a single document, thereby greatly facilitating reproducible research. (In fact, the entire book was written using Sweave() technology.)

We feel that students should be introduced to challenging data sets as early as possible. We therefore use a number of data sets from the data archives of leading applied econometrics journals such as the Journal of Applied Econometrics and the Journal of Business & Economic Statistics. Some of these have been used in recent textbooks, among them Baltagi (2002), Davidson and MacKinnon (2004), Greene (2003), Stock and Watson (2007), and Verbeek (2004). In addition, we provide all further data sets from Baltagi (2002), Franses (1998), Greene (2003), and Stock and Watson (2007), as well as selected data sets from additional sources, in an R package called AER that accompanies this book. It is available from the CRAN servers at http://CRAN.R-project.org/ and also contains all the code used in the following chapters. These data sets are suitable for illustrating a wide variety of topics, among them wage equations, growth regressions, dynamic regressions and time series models, hedonic regressions, the demand for health care, or labor force participation, to mention a few.

In our view, applied econometrics suffers from an underuse of graphics, one of the strengths of the R system for statistical computing and graphics. Therefore, we decided to make liberal use of graphical displays throughout, some of which are perhaps not well known.
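As a rough illustration of how the accompanying AER package might be obtained and explored, the following sketch assumes a configured CRAN mirror and network access for the (commented-out) installation step. The data set name Journals and the variable names are assumptions made for this example only; the preface itself does not name any particular data set.

```r
# Install AER from CRAN once (commented out so the sketch has no side effects).
# install.packages("AER")

# Check whether AER is installed before trying to use it.
aer_available <- requireNamespace("AER", quietly = TRUE)

if (aer_available) {
  # List the names of the data sets shipped with the package.
  aer_data <- data(package = "AER")$results[, "Item"]
  print(head(aer_data))

  # Load one data set into a private environment
  # (the name "Journals" is an assumption of this sketch).
  env <- new.env()
  data("Journals", package = "AER", envir = env)
  str(env$Journals)
} else {
  message("AER is not installed; run install.packages(\"AER\") first.")
}
```

Guarding the calls with requireNamespace() keeps the sketch safe to run on a machine where AER has not been installed yet.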
The publisher asked for a compact treatment; however, the fact that R has been mainly developed by statisticians forces us to briefly discuss a number of statistical concepts that are not widely used among econometricians, for historical reasons, including factors and generalized linear models, the latter in connection with microeconometrics. We also provide a chapter on R basics (notably data structures, graphics, and basic aspects of programming) to keep the book self-contained.

The production of the book

The entire book was typeset by the authors using LaTeX and R’s Sweave() tools. Specifically, the final manuscript was compiled using R version 2.7.0, AER version 0.9-0, and the most current version (as of 2008-05-28) of all other CRAN packages that AER depends on (or suggests). The first author started under Microsoft Windows XP Pro, but thanks to a case of theft he switched to Mac OS X along the way. The second author used Debian GNU/Linux throughout. Thus, we can confidently assert that the book is fully reproducible, for the version given above, on the most important (single-user) platforms.
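Since reproducibility is tied to the specific versions named above, it can help to record the versions in one's own session for comparison. The following is a minimal sketch using standard R utilities; the variable names are illustrative, and the AER lookup simply yields NA if the package is not installed.

```r
# Record the R version in use (the book reports R 2.7.0).
r_version <- R.version.string
print(r_version)

# Report the installed AER version, if the package is present.
aer_version <- if (requireNamespace("AER", quietly = TRUE)) {
  as.character(packageVersion("AER"))
} else {
  NA_character_
}
print(aer_version)

# sessionInfo() summarizes the platform and the attached packages.
print(sessionInfo())
```

Results obtained with newer versions of R or AER may differ slightly from the printed output in the book.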
Settings and appearance

R is mainly run at its default settings; however, we found it convenient to employ a few minor modifications invoked by

R> options(prompt="R> ", digits=4, show.signif.stars=FALSE)

This replaces the standard R prompt > by the more evocative R>. For compactness, digits = 4 reduces the number of digits shown when printing numbers from the default of 7. Note that this does not reduce the precision with which these numbers are internally processed and stored. In addition, R by default displays one to three stars to indicate the significance of p values in model summaries at conventional levels. This is disabled by setting show.signif.stars = FALSE.

Typographical conventions

We use a typewriter font for all code; additionally, function names are followed by parentheses, as in plot(), and class names (a concept that is explained in Chapters 1 and 2) are displayed as in “lm”. Furthermore, boldface is used for package names, as in AER.

Acknowledgments

This book would not exist without R itself, and thus we thank the R Development Core Team for their continuing efforts to provide an outstanding piece of open-source software, as well as all the R users and developers supporting these efforts. In particular, we are indebted to all those R package authors whose packages we employ in the course of this book. Several anonymous reviewers provided valuable feedback on earlier drafts. In addition, we are grateful to Rob J. Hyndman, Roger Koenker, and Jeffrey S. Racine for particularly detailed comments and helpful discussions. On the technical side, we are indebted to Torsten Hothorn and Uwe Ligges for advice on and infrastructure for automated production of the book. Regarding the accompanying package AER, we are grateful to Badi H. Baltagi, Philip Hans Franses, William H. Greene, James H. Stock, and Mark W. Watson for permitting us to include all the data sets from their textbooks (namely Baltagi 2002; Franses 1998; Greene 2003; Stock and Watson 2007). We also thank Inga Diedenhofen and Markus Hertrich for preparing some of these data in R format. Finally, we thank John Kimmel, our editor at Springer, for his patience and encouragement in guiding the preparation and production of this book. Needless to say, we are responsible for the remaining shortcomings.

May 2008
Christian Kleiber, Basel
Achim Zeileis, Wien
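A note on the settings described under “Settings and appearance”: the following minimal sketch illustrates the point that digits changes only how numbers are displayed, not how they are stored. The variable names are illustrative, and the prompt setting is left out because it only matters in an interactive session.

```r
# Apply the display settings and keep the previous values for restoring later.
old <- options(digits = 4, show.signif.stars = FALSE)

x <- 1 / 3

# format() uses getOption("digits") by default, so only 4 significant
# digits are shown.
printed <- format(x)
print(printed)

# The stored value is unchanged: it is still exactly the double 1/3.
stored_ok <- identical(x, 1 / 3)
print(stored_ok)

# Restore the previous settings.
options(old)
```

Saving the return value of options() and passing it back at the end is a common way to avoid leaving modified settings behind in a session.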