Springer Series in Statistics
Advisors:
P. Bickel, P. Diggle, S. Fienberg, U. Gather,
I. Olkin, S. Zeger
Springer Series in Statistics
Alho/Spencer: Statistical Demography and Forecasting.
Andersen/Borgan/Gill/Keiding: Statistical Models Based on Counting Processes.
Atkinson/Riani: Robust Diagnostic Regression Analysis.
Atkinson/Riani/Cerioli: Exploring Multivariate Data with the Forward Search.
Berger: Statistical Decision Theory and Bayesian Analysis, 2nd edition.
Borg/Groenen: Modern Multidimensional Scaling: Theory and Applications, 2nd
edition.
Brockwell/Davis: Time Series: Theory and Methods, 2nd edition.
Bucklew: Introduction to Rare Event Simulation.
Cappé/Moulines/Rydén: Inference in Hidden Markov Models.
Chan/Tong: Chaos: A Statistical Perspective.
Chen/Shao/Ibrahim: Monte Carlo Methods in Bayesian Computation.
Coles: An Introduction to Statistical Modeling of Extreme Values.
Devroye/Lugosi: Combinatorial Methods in Density Estimation.
Diggle/Ribeiro: Model-based Geostatistics.
Dudoit/Van der Laan: Multiple Testing Procedures with Applications to Genomics.
Efromovich: Nonparametric Curve Estimation: Methods, Theory, and Applications.
Eggermont/LaRiccia: Maximum Penalized Likelihood Estimation, Volume I: Density
Estimation.
Fahrmeir/Tutz: Multivariate Statistical Modeling Based on Generalized Linear Models,
2nd edition.
Fan/Yao: Nonlinear Time Series: Nonparametric and Parametric Methods.
Ferraty/Vieu: Nonparametric Functional Data Analysis: Theory and Practice.
Ferreira/Lee: Multiscale Modeling: A Bayesian Perspective.
Fienberg/Hoaglin: Selected Papers of Frederick Mosteller.
Frühwirth-Schnatter: Finite Mixture and Markov Switching Models.
Ghosh/Ramamoorthi: Bayesian Nonparametrics.
Glaz/Naus/Wallenstein: Scan Statistics.
Good: Permutation Tests: Parametric and Bootstrap Tests of Hypotheses, 3rd edition.
Gouriéroux: ARCH Models and Financial Applications.
Gu: Smoothing Spline ANOVA Models.
Györfi/Kohler/Krzyżak/Walk: A Distribution-Free Theory of Nonparametric
Regression.
Haberman: Advanced Statistics, Volume I: Description of Populations.
Hall: The Bootstrap and Edgeworth Expansion.
Härdle: Smoothing Techniques: With Implementation in S.
Harrell: Regression Modeling Strategies: With Applications to Linear Models,
Logistic Regression, and Survival Analysis.
Hart: Nonparametric Smoothing and Lack-of-Fit Tests.
Hastie/Tibshirani/Friedman: The Elements of Statistical Learning: Data Mining,
Inference, and Prediction.
Hedayat/Sloane/Stufken: Orthogonal Arrays: Theory and Applications.
Heyde: Quasi-Likelihood and its Application: A General Approach to Optimal
Parameter Estimation.
Huet/Bouvier/Poursat/Jolivet: Statistical Tools for Nonlinear Regression: A Practical
Guide with S-PLUS and R Examples, 2nd edition.
Ibrahim/Chen/Sinha: Bayesian Survival Analysis.
Jiang: Linear and Generalized Linear Mixed Models and Their Applications.
Jolliffe: Principal Component Analysis, 2nd edition.
Knottnerus: Sample Survey Theory: Some Pythagorean Perspectives.
Konishi/Kitagawa: Information Criteria and Statistical Modeling.
(continued after index)
Sadanori Konishi
Genshiro Kitagawa
Information Criteria
and Statistical Modeling
Sadanori Konishi
Faculty of Mathematics
Kyushu University
6-10-1 Hakozaki, Higashi-ku
Fukuoka 812-8581
Japan
konishi@math.kyushu-u.ac.jp
Genshiro Kitagawa
The Institute of Statistical Mathematics
4-6-7 Minami-Azabu, Minato-ku
Tokyo 106-8569
Japan
kitagawa@ism.ac.jp
ISBN: 978-0-387-71886-6
e-ISBN: 978-0-387-71887-3
Library of Congress Control Number: 2007925718
© 2008 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street,
New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly
analysis. Use in connection with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
Printed on acid-free paper
9 8 7 6 5 4 3 2 1
springer.com
Preface
Statistical modeling is a critical tool in scientific research. Statistical models
are used to understand phenomena subject to uncertainty, to determine the
structure of complex systems, to control such systems, and to make reliable
predictions in various natural and social science fields. The objective of
statistical analysis is to extract the information contained in data on the
phenomenon or system under consideration and to express it in an understandable
form through a statistical model. A model also allows inferences to be made
about unknown aspects of stochastic phenomena and helps reveal causal
relationships. In practice, model selection and evaluation are central issues,
and a crucial aspect is selecting the most appropriate model from a set of
candidate models.
In the information-theoretic approach advocated by Akaike (1973, 1974),
the Kullback–Leibler (1951) information discrepancy is taken as the basic
criterion for evaluating the goodness of a model as an approximation to the
true distribution that generates the data. The Akaike information criterion
(AIC) was derived as an asymptotic approximate estimate of the Kullback–
Leibler information discrepancy and provides a useful tool for evaluating
models estimated by the maximum likelihood method. Numerous successful
applications of the AIC in statistical sciences have been reported [see, e.g.,
Akaike and Kitagawa (1998) and Bozdogan (1994)]. In practice, the Bayesian
information criterion (BIC) proposed by Schwarz (1978) is also widely used
as a model selection criterion. The BIC is based on Bayesian probability and
can be applied to models estimated by the maximum likelihood method.
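In standard notation (generic to this discussion rather than taken from the preface itself), consider a model f(x | θ) with k free parameters fitted by maximum likelihood to n observations drawn from a true density g, and let L(\hat{\theta}) denote the maximized likelihood. Then
\[
  I(g; f) = \int g(x) \log \frac{g(x)}{f(x \mid \theta)} \, dx,
  \qquad
  \mathrm{AIC} = -2 \log L(\hat{\theta}) + 2k,
  \qquad
  \mathrm{BIC} = -2 \log L(\hat{\theta}) + k \log n ,
\]
and, for both the AIC and the BIC, smaller values indicate a preferable model.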
The wide availability of fast and inexpensive computers enables the con-
struction of various types of nonlinear models for analyzing data with complex
structure. Nonlinear statistical modeling has received considerable attention in
various fields of research, such as statistical science, information science, com-
puter science, engineering, and artificial intelligence. Considerable effort has
been made in establishing practical methods of modeling complex structures of
stochastic phenomena. Realistic models for complex nonlinear phenomena are
generally characterized by a large number of parameters. Since the maximum
likelihood method yields meaningless or unstable parameter estimates and
leads to overfitting, such models are usually estimated by methods such as
the maximum penalized likelihood method [Good and Gaskins (1971), Green
and Silverman (1994)] or the Bayes approach. With the development of these
flexible modeling techniques, it has become necessary to develop model selec-
tion and evaluation criteria for models estimated by methods other than the
maximum likelihood method, relaxing the assumptions imposed on the AIC
and BIC.
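As a schematic illustration of this setting (the quadratic penalty shown is one common choice, not necessarily the form adopted later in the book), the maximum penalized likelihood method replaces the log-likelihood by a penalized version such as
\[
  \ell_{\lambda}(\theta) = \sum_{\alpha=1}^{n} \log f(x_{\alpha} \mid \theta)
  - \frac{n\lambda}{2}\, \theta^{T} K \theta ,
\]
where \lambda > 0 is a smoothing (regularization) parameter and K is a fixed nonnegative definite matrix. Because the resulting estimate is no longer a maximum likelihood estimate, the bias correction underlying the AIC no longer applies as it stands, and the choice of \lambda itself becomes part of the model evaluation problem.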
One of the main objectives of this book is to provide comprehensive expla-
nations of the concepts and derivations of the AIC, BIC, and related criteria,
together with a wide range of practical examples of model selection and eval-
uation criteria. A secondary objective is to provide a theoretical basis for
the analysis and extension of information criteria via a statistical functional
approach. A generalized information criterion (GIC) and a bootstrap infor-
mation criterion are presented, which provide unified tools for modeling and
model evaluation for a diverse range of models, including various types of
nonlinear models, and for estimation procedures such as robust estimation, the
maximum penalized likelihood method, and the Bayes approach. A general
framework for constructing the BIC is also described.
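All of these criteria can be viewed schematically (again in generic notation) as a bias-corrected log-likelihood,
\[
  \mathrm{IC} = -2 \sum_{\alpha=1}^{n} \log f(x_{\alpha} \mid \hat{\theta}) + 2\hat{b} ,
\]
where \hat{b} estimates the bias of the log-likelihood of the fitted model as an estimator of its expected log-likelihood. For the AIC, \hat{b} is simply the number of estimated parameters; for the GIC, it is replaced by a trace term built from the influence function of the estimator; and for the bootstrap information criterion, it is estimated by resampling.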
In Chapter 1, the basic concepts of statistical modeling are discussed. In
Chapter 2, models are presented that express the mechanism of the occurrence
of stochastic phenomena. Chapter 3, the central part of this book, explains
the basic ideas of model evaluation and presents the definition and derivation
of the AIC, in both its theoretical and practical aspects, together with a wide
range of practical applications. Chapter 4 presents various examples of statis-
tical modeling based on the AIC. Chapter 5 presents a unified information-
theoretic approach to statistical model selection and evaluation problems in
terms of a statistical functional and introduces the GIC [Konishi and Kitagawa
(1996)] for the evaluation of a broad class of models, including models esti-
mated by robust procedures, maximum penalized likelihood methods, and the
Bayes approach. In Chapter 6, the GIC is illustrated through nonlinear sta-
tistical modeling in regression and discriminant analyses. Chapter 7 presents
the derivation of the GIC and investigates its asymptotic properties, along
with some theoretical and numerical improvements. Chapter 8 is devoted to
the bootstrap version of information criteria, including a variance reduction
technique that substantially reduces the variance associated with the Monte
Carlo simulation. In Chapter 9, Bayesian approaches to model evaluation, such
as the BIC, the ABIC [Akaike (1980b)], and the predictive information
criterion [Kitagawa (1997)], are discussed. The BIC is also extended such that
it can be applied to the evaluation of models estimated by the method of
regularization. Finally, in Chapter 10, several model selection and evaluation
criteria such as cross-validation, generalized cross-validation, final prediction
error (FPE), Mallows’ Cp, the Hannan–Quinn criterion, and ICOMP are in-
troduced as related topics.
We would like to acknowledge the many people who contributed to the
preparation and completion of this book. In particular, we would like to ac-
knowledge with our sincere thanks Hirotugu Akaike, from whom we have
learned so much about the seminal ideas of statistical modeling.
We have been greatly influenced through discussions with Z. D. Bai,
H. Bozdogan, D. F. Findley, Y. Fujikoshi, W. Gersch, A. K. Gupta,
T. Higuchi, M. Ichikawa, S. Imoto, M. Ishiguro, N. Matsumoto, Y. Maesono,
N. Nakamura, R. Nishii, Y. Ogata, K. Ohtsu, C. R. Rao, Y. Sakamoto,
R. Shibata, M. S. Srivastava, T. Takanami, K. Tanabe, M. Uchida, N. Yoshida,
T. Yanagawa, and Y. Wu.
We are grateful to three anonymous reviewers for comments and sugges-
tions that allowed us to improve the original manuscript. Y. Araki, T. Fujii,
S. Kawano, M. Kayano, H. Masuda, H. Matsui, Y. Ninomiya, Y. Nonaka, and
Y. Tanokura read parts of the manuscript and offered helpful suggestions.
We would especially like to express our gratitude to D. F. Findley for reading
an earlier version of this manuscript and for his constructive comments. We are
also deeply thankful to S. Ono for her help in preparing the manuscript in
LaTeX. John Kimmel patiently encouraged and supported us throughout the
final preparation of this book. We express our sincere thanks to all of these
people.
Fukuoka and Tokyo, Japan
February 2007
Sadanori Konishi
Genshiro Kitagawa