Half Title
Title Page
Copyright Page
Table of Contents
Preface
1 Linear Models
1.1 A simple linear model
Simple least squares estimation
1.1.1 Sampling properties of β̂
1.1.2 So how old is the universe?
1.1.3 Adding a distributional assumption
Testing hypotheses about β
Confidence intervals
1.2 Linear models in general
1.3 The theory of linear models
1.3.1 Least squares estimation of β
1.3.2 The distribution of β̂
1.3.3 (β̂i – βi)/σ̂β̂i ~ tn–p
1.3.4 F-ratio results I
1.3.5 F-ratio results II
1.3.6 The influence matrix
1.3.7 The residuals, ϵ̂, and fitted values, μ̂
1.3.8 Results in terms of X
1.3.9 The Gauss Markov Theorem: What’s special about least squares?
1.4 The geometry of linear modelling
1.4.1 Least squares
1.4.2 Fitting by orthogonal decompositions
1.4.3 Comparison of nested models
1.5 Practical linear modelling
1.5.1 Model fitting and model checking
1.5.2 Model summary
1.5.3 Model selection
1.5.4 Another model selection example
A follow-up
1.5.5 Confidence intervals
1.5.6 Prediction
1.5.7 Co-linearity, confounding and causation
1.6 Practical modelling with factors
1.6.1 Identifiability
1.6.2 Multiple factors
1.6.3 ‘Interactions’ of factors
1.6.4 Using factor variables in R
1.7 General linear model specification in R
1.8 Further linear modelling theory
1.8.1 Constraints I: General linear constraints
1.8.2 Constraints II: ‘Contrasts’ and factor variables
1.8.3 Likelihood
1.8.4 Non-independent data with variable variance
1.8.5 Simple AR correlation models
1.8.6 AIC and Mallows’ statistic
1.8.7 The wrong model
1.8.8 Non-linear least squares
1.8.9 Further reading
1.9 Exercises
2 Linear Mixed Models
2.1 Mixed models for balanced data
2.1.1 A motivating example
The wrong approach: A fixed effects linear model
The right approach: A mixed effects model
2.1.2 General principles
2.1.3 A single random factor
2.1.4 A model with two factors
2.1.5 Discussion
2.2 Maximum likelihood estimation
2.2.1 Numerical likelihood maximization
2.3 Linear mixed models in general
2.4 Linear mixed model maximum likelihood estimation
2.4.1 The distribution of b | y, β̂ given θ
2.4.2 The distribution of β̂ given θ
2.4.3 The distribution of θ̂
2.4.4 Maximizing the profile likelihood
2.4.5 REML
2.4.6 Effective degrees of freedom
2.4.7 The EM algorithm
2.4.8 Model selection
2.5 Linear mixed models in R
2.5.1 Package nlme
2.5.2 Tree growth: An example using lme
2.5.3 Several levels of nesting
2.5.4 Package lme4
2.5.5 Package mgcv
2.6 Exercises
3 Generalized Linear Models
3.1 GLM theory
3.1.1 The exponential family of distributions
3.1.2 Fitting generalized linear models
3.1.3 Large sample distribution of β̂
3.1.4 Comparing models
Deviance
Model comparison with unknown ϕ
AIC
3.1.5 Estimating ϕ, Pearson’s statistic and Fletcher’s estimator
3.1.6 Canonical link functions
3.1.7 Residuals
Pearson residuals
Deviance residuals
3.1.8 Quasi-likelihood
3.1.9 Tweedie and negative binomial distributions
3.1.10 The Cox proportional hazards model for survival data
Cumulative hazard and survival functions
3.2 Geometry of GLMs
3.2.1 The geometry of IRLS
3.2.2 Geometry and IRLS convergence
3.3 GLMs with R
3.3.1 Binomial models and heart disease
3.3.2 A Poisson regression epidemic model
3.3.3 Cox proportional hazards modelling of survival data
3.3.4 Log-linear models for categorical data
3.3.5 Sole eggs in the Bristol Channel
3.4 Generalized linear mixed models
3.4.1 Penalized IRLS
3.4.2 The PQL method
3.4.3 Distributional results
3.5 GLMMs with R
3.5.1 glmmPQL
3.5.2 gam
3.5.3 glmer
3.6 Exercises
4 Introducing GAMs
4.1 Introduction
4.2 Univariate smoothing
4.2.1 Representing a function with basis expansions
A very simple basis: Polynomials
The problem with polynomials
The piecewise linear basis
Using the piecewise linear basis
4.2.2 Controlling smoothness by penalizing wiggliness
4.2.3 Choosing the smoothing parameter, λ, by cross validation
4.2.4 The Bayesian/mixed model alternative
4.3 Additive models
4.3.1 Penalized piecewise regression representation of an additive model
4.3.2 Fitting additive models by penalized least squares
4.4 Generalized additive models
4.5 Summary
4.6 Introducing package mgcv
4.6.1 Finer control of gam
4.6.2 Smooths of several variables
4.6.3 Parametric model terms
4.6.4 The mgcv help pages
4.7 Exercises
5 Smoothers
5.1 Smoothing splines
5.1.1 Natural cubic splines are smoothest interpolators
5.1.2 Cubic smoothing splines
5.2 Penalized regression splines
5.3 Some one-dimensional smoothers
5.3.1 Cubic regression splines
5.3.2 A cyclic cubic regression spline
5.3.3 P-splines
5.3.4 P-splines with derivative based penalties
5.3.5 Adaptive smoothing
5.3.6 SCOP-splines
5.4 Some useful smoother theory
5.4.1 Identifiability constraints
5.4.2 ‘Natural’ parameterization, effective degrees of freedom and smoothing bias
5.4.3 Null space penalties
5.5 Isotropic smoothing
5.5.1 Thin plate regression splines
Thin plate splines
Thin plate regression splines
Properties of thin plate regression splines
Knot-based approximation
5.5.2 Duchon splines
5.5.3 Splines on the sphere
5.5.4 Soap film smoothing over finite domains
5.6 Tensor product smooth interactions
5.6.1 Tensor product bases
5.6.2 Tensor product penalties
5.6.3 ANOVA decompositions of smooths
Numerical identifiability constraints for nested terms
5.6.4 Tensor product smooths under shape constraints
5.6.5 An alternative tensor product construction
What is being penalized?
5.7 Isotropy versus scale invariance
5.8 Smooths, random fields and random effects
5.8.1 Gaussian Markov random fields
5.8.2 Gaussian process regression smoothers
5.9 Choosing the basis dimension
5.10 Generalized smoothing splines
5.11 Exercises
6 GAM theory
6.1 Setting up the model
6.1.1 Estimating β given λ
6.1.2 Degrees of freedom and scale parameter estimation
6.1.3 Stable least squares with negative weights
6.2 Smoothness selection criteria
6.2.1 Known scale parameter: UBRE
6.2.2 Unknown scale parameter: Cross validation
Leave-several-out cross validation
Problems with ordinary cross validation
6.2.3 Generalized cross validation
6.2.4 Double cross validation
6.2.5 Prediction error criteria for the generalized case
6.2.6 Marginal likelihood and REML
6.2.7 The problem with log |Sλ|+
6.2.8 Prediction error criteria versus marginal likelihood
Unpenalized coefficient bias
6.2.9 The ‘one standard error rule’ and smoother models
6.3 Computing the smoothing parameter estimates
6.4 The generalized Fellner-Schall method
6.4.1 General regular likelihoods
6.5 Direct Gaussian case and performance iteration (PQL)
6.5.1 Newton optimization of the GCV score
6.5.2 REML
log |Sλ|+ and its derivatives
The remaining derivative components
6.5.3 Some Newton method details
6.6 Direct nested iteration methods
6.6.1 Prediction error criteria
6.6.2 Example: Cox proportional hazards model
Derivatives with respect to smoothing parameters
Prediction and the baseline hazard
6.7 Initial smoothing parameter guesses
6.8 GAMM methods
6.8.1 GAMM inference with mixed model estimation
6.9 Bigger data methods
6.9.1 Bigger still
6.10 Posterior distribution and confidence intervals
6.10.1 Nychka’s coverage probability argument
Interval limitations and simulations
6.10.2 Whole function intervals
6.10.3 Posterior simulation in general
6.11 AIC and smoothing parameter uncertainty
6.11.1 Smoothing parameter uncertainty
6.11.2 A corrected AIC
6.12 Hypothesis testing and p-values
6.12.1 Approximate p-values for smooth terms
Computing Tr
Simulation performance
6.12.2 Approximate p-values for random effect terms
6.12.3 Testing a parametric term against a smooth alternative
6.12.4 Approximate generalized likelihood ratio tests
6.13 Other model selection approaches
6.14 Further GAM theory
6.14.1 The geometry of penalized regression
6.14.2 Backfitting GAMs
6.15 Exercises
7 GAMs in Practice: mgcv
7.1 Specifying smooths
7.1.1 How smooth specification works
7.2 Brain imaging example
7.2.1 Preliminary modelling
7.2.2 Would an additive structure be better?
7.2.3 Isotropic or tensor product smooths?
7.2.4 Detecting symmetry (with by variables)
7.2.5 Comparing two surfaces
7.2.6 Prediction with predict.gam
Prediction with lpmatrix
7.2.7 Variances of non-linear functions of the fitted model
7.3 A smooth ANOVA model for diabetic retinopathy
7.4 Air pollution in Chicago
7.4.1 A single index model for pollution related deaths
7.4.2 A distributed lag model for pollution related deaths
7.5 Mackerel egg survey example
7.5.1 Model development
7.5.2 Model predictions
7.5.3 Alternative spatial smooths and geographic regression
7.6 Spatial smoothing of Portuguese larks data
7.7 Generalized additive mixed models with R
7.7.1 A space-time GAMM for sole eggs
Soap film improvement of boundary behaviour
7.7.2 The temperature in Cairo
7.7.3 Fully Bayesian stochastic simulation: jagam
7.7.4 Random wiggly curves
7.8 Primary biliary cirrhosis survival analysis
7.8.1 Time dependent covariates
7.9 Location-scale modelling
7.9.1 Extreme rainfall in Switzerland
7.10 Fuel efficiency of cars: Multivariate additive models
7.11 Functional data analysis
7.11.1 Scalar on function regression
Prostate cancer screening
A multinomial prostate screening model
7.11.2 Function on scalar regression: Canadian weather
7.12 Other packages
7.13 Exercises
A Maximum Likelihood Estimation
A.1 Invariance
A.2 Properties of the expected log-likelihood
A.3 Consistency
A.4 Large sample distribution of θ̂
A.5 The generalized likelihood ratio test (GLRT)
A.6 Derivation of 2λ ~ χ2r under H0
A.7 AIC in general
A.8 Quasi-likelihood results
B Some Matrix Algebra
B.1 Basic computational efficiency
B.2 Covariance matrices
B.3 Differentiating a matrix inverse
B.4 Kronecker product
B.5 Orthogonal matrices and Householder matrices
B.6 QR decomposition
B.7 Cholesky decomposition
B.8 Pivoting
B.9 Eigen-decomposition
B.10 Singular value decomposition
B.11 Lanczos iteration
C Solutions to Exercises
C.1 Chapter 1
C.2 Chapter 2
C.3 Chapter 3
C.4 Chapter 4
C.5 Chapter 5
C.6 Chapter 6
C.7 Chapter 7
Bibliography
Index