INFORMATICS AND MATHEMATICAL MODELLING
IMM
Technical University of Denmark
DK-2800 Kgs. Lyngby – Denmark
DACE
A MATLAB Kriging Toolbox
Version 2.0, August 1, 2002
Søren N. Lophaven
Hans Bruun Nielsen
Jacob Søndergaard
Technical Report IMM-TR-2002-12
Please direct communication to Hans Bruun Nielsen (hbn@imm.dtu.dk)
Contents

1. Introduction
2. Modelling and Prediction
   2.1. The Kriging Predictor
   2.2. Regression Models
   2.3. Correlation Models
3. Generalized Least Squares Fit
   3.1. Computational Aspects
4. Experimental Design
   4.1. Rectangular Grid
   4.2. Latin Hypercube Sampling
5. Reference Manual
   5.1. Model Construction
   5.2. Evaluate the Model
   5.3. Regression Models
   5.4. Correlation Models
   5.5. Experimental Design
   5.6. Auxiliary Functions
   5.7. Data Files
6. Examples of Usage
   6.1. Work-through Example
   6.2. Adding a Regression Function
7. Notation
1. Introduction
This report describes the background for and use of the software package DACE
(Design and Analysis of Computer Experiments), which is a Matlab toolbox for
working with kriging approximations to computer models.
The typical use of this software is to construct a kriging approximation model based
on data from a computer experiment, and to use this approximation model as a
surrogate for the computer model. Here, a computer experiment is a collection of
pairs of input and responses from runs of a computer model. Both the input and
the response from the computer model are likely to be high dimensional.
The computer models we address are deterministic, and thus a response from a
model lacks random error, i.e., repeated runs with the same input parameters give
the same response from the model.
Often the approximation models are needed as part of a design problem, in which
the best set of parameters for running the computer model is determined. This
arises, for example, in problems where a computer model is fitted to physical data. This
design problem is related to the more general problem of predicting output from a
computer model at untried inputs.
In Section 2 we consider models for computer experiments and efficient predictors,
Section 3 discusses generalized least squares and implementation aspects, and in
Section 4 we consider experimental design for the predictors. Section 5 is a reference
manual for the toolbox, and finally examples of usage and a list of notation are given
in Sections 6 and 7.
2. Modelling and Prediction
Given a set of m design sites S = [s1 ··· sm]^T with si ∈ IR^n and responses
Y = [y1 ··· ym]^T with yi ∈ IR^q, the data are assumed to satisfy the normalization
conditions¹

    μ[S:,j] = 0,   V[S:,j, S:,j] = 1,   j = 1, . . . , n ,
    μ[Y:,j] = 0,   V[Y:,j, Y:,j] = 1,   j = 1, . . . , q ,                (2.1)

where X:,j is the vector given by the jth column in matrix X, and μ[·] and V[·, ·]
denote respectively the mean and the covariance.

Following [9] we adopt a model ŷ that expresses the deterministic response y(x) ∈ IR^q,
for an n-dimensional input x ∈ D ⊆ IR^n, as a realization of a regression model F and
1The user does not have to think of this: The first step in the model construction is to normalize
the given S, Y so that (2.1) is satisfied, see (5.1) below.
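The toolbox performs this normalization automatically (see (5.1)); purely as an illustration of (2.1), the column-wise scaling can be sketched in NumPy as follows. The toolbox itself is MATLAB; the name `normalize` below is our hypothetical helper, not a toolbox function.

```python
import numpy as np

def normalize(X):
    """Scale each column of X to zero mean and unit variance, cf. (2.1)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)   # sample standard deviation, column-wise
    return (X - mu) / sigma, mu, sigma

# illustrative data: m = 3 design sites in IR^2
S = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
Sn, mu, sigma = normalize(S)
# each column of Sn now has mean 0 and unit variance
```

The returned `mu` and `sigma` are kept so that predictions can be transformed back to the original scale.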
a random function (stochastic process),

    ŷℓ(x) = F(β:,ℓ, x) + zℓ(x),   ℓ = 1, . . . , q .                (2.2)

We use a regression model which is a linear combination of p chosen functions
fj : IR^n → IR,

    F(β:,ℓ, x) = β1,ℓ f1(x) + ··· + βp,ℓ fp(x)
               = [f1(x) ··· fp(x)] β:,ℓ ≡ f(x)^T β:,ℓ .                (2.3)

The coefficients {βk,ℓ} are regression parameters.

The random process z is assumed to have mean zero and covariance

    E[ zℓ(w) zℓ(x) ] = σℓ² R(θ, w, x),   ℓ = 1, . . . , q                (2.4)

between zℓ(w) and zℓ(x), where σℓ² is the process variance for the ℓth component
of the response and R(θ, w, x) is the correlation model with parameters θ.
interpretation of the model (2.2) is that deviations from the regression model, though
the response is deterministic, may resemble a sample path of a (suitably chosen)
stochastic process z. In the following we will focus on the kriging predictor for y.
First, however, we must bear in mind that the true value can be written as
    yℓ(x) = F(β:,ℓ, x) + α(β:,ℓ, x) ,                (2.5)

where α is the approximation error. The assumption is that by proper choice of β:,ℓ
this error behaves like “white noise” in the region of interest, i.e., for x ∈ D.
2.1. The Kriging Predictor

For the set S of design sites we have the expanded m×p design matrix F with

    Fij = fj(si),   F = [f(s1) ··· f(sm)]^T ,                (2.6)

with f(x) defined in (2.3). Further, define R as the matrix of stochastic-process
correlations between z’s at design sites,

    Rij = R(θ, si, sj),   i, j = 1, . . . , m .                (2.7)

At an untried point x let

    r(x) = [R(θ, s1, x) ··· R(θ, sm, x)]^T                (2.8)

be the vector of correlations between z’s at design sites and x.
Now, for the sake of convenience, assume that q = 1, implying that β = β:,1 and
Y = Y:,1, and consider the linear predictor

    ŷ(x) = c^T Y ,                (2.9)

with c = c(x) ∈ IR^m. The error is

    ŷ(x) − y(x) = c^T Y − y(x)
                = c^T (F β + Z) − (f(x)^T β + z)
                = c^T Z − z + (F^T c − f(x))^T β ,

where Z = [z1 . . . zm]^T are the errors at the design sites. To keep the predictor
unbiased we demand that F^T c − f(x) = 0, or

    F^T c(x) = f(x) .                (2.10)

Under this condition the mean squared error (MSE) of the predictor (2.9) is

    ϕ(x) = E[ (ŷ(x) − y(x))² ]
         = E[ (c^T Z − z)² ]
         = E[ z² + c^T Z Z^T c − 2 c^T Z z ]
         = σ²(1 + c^T R c − 2 c^T r) .                (2.11)

The Lagrangian function for the problem of minimizing ϕ with respect to c and
subject to the constraint (2.10) is

    L(c, λ) = σ²(1 + c^T R c − 2 c^T r) − λ^T (F^T c − f) .                (2.12)
The gradient of (2.12) with respect to c is

    L′c(c, λ) = 2σ²(R c − r) − F λ ,

and from the first order necessary conditions for optimality (see e.g. [7, Section 12.2])
we get the following system of equations

    [ R    F ] [ c ]     [ r ]
    [ F^T  0 ] [ λ̃ ]  =  [ f ] ,                (2.13)

where we have defined λ̃ = −λ/(2σ²). The solution to (2.13) is

    λ̃ = (F^T R^{-1} F)^{-1} (F^T R^{-1} r − f) ,
    c = R^{-1}(r − F λ̃) .                (2.14)
The matrix R and therefore R^{-1} is symmetric, and by means of (2.9) we find

    ŷ(x) = (r − F λ̃)^T R^{-1} Y
         = r^T R^{-1} Y − (F^T R^{-1} r − f)^T (F^T R^{-1} F)^{-1} F^T R^{-1} Y .                (2.15)
In Section 3 we show that for the regression problem

    F β ≃ Y

the generalized least squares solution (with respect to R) is

    β* = (F^T R^{-1} F)^{-1} F^T R^{-1} Y ,

and inserting this in (2.15) we find the predictor

    ŷ(x) = r^T R^{-1} Y − (F^T R^{-1} r − f)^T β*
         = f(x)^T β* + r^T R^{-1}(Y − F β*)
         = f(x)^T β* + r(x)^T γ* ,                (2.16)

where γ* is defined by R γ* = Y − F β*. For multiple responses (q > 1) the relation
(2.16) holds for each column in Y, so that (2.16) holds with β* ∈ IR^{p×q} given by
(2.15) and γ* ∈ IR^{m×q} computed via the residuals, R γ* = Y − F β*.
Note that for a fixed set of design data the matrices β* and γ* are fixed. For every
new x we just have to compute the vectors f(x) ∈ IR^p and r(x) ∈ IR^m and add two
simple products.
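To make (2.16) concrete: the toolbox is MATLAB, but the predictor can be sketched in NumPy. The sketch below assumes the gauss correlation from Section 2.3, a constant regression model f(x) = 1 (so p = 1), a fixed θ, and our own test data; all names (`corr_gauss`, `predict`) are illustrative, not DACE’s.

```python
import numpy as np

def corr_gauss(theta, W, X):
    """R(theta, w, x) = prod_j exp(-theta_j (w_j - x_j)^2), the gauss model."""
    d = W[:, None, :] - X[None, :, :]
    return np.exp(-(theta * d ** 2).sum(axis=2))

# illustrative data: m = 5 design sites in IR^1, one response (q = 1)
S = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
Y = np.sin(2.0 * np.pi * S[:, 0])
theta = np.array([10.0])

R = corr_gauss(theta, S, S)          # correlation matrix, cf. (2.7)
F = np.ones((S.shape[0], 1))         # constant regression, f(x) = 1

# beta* = (F' R^-1 F)^-1 F' R^-1 Y  and  R gamma* = Y - F beta*
Ri_Y = np.linalg.solve(R, Y)
Ri_F = np.linalg.solve(R, F)
beta = np.linalg.solve(F.T @ Ri_F, F.T @ Ri_Y)
gamma = np.linalg.solve(R, Y - F @ beta)

def predict(x):
    """y_hat(x) = f(x)' beta* + r(x)' gamma*, cf. (2.16)."""
    r = corr_gauss(theta, S, np.atleast_2d(x))[:, 0]
    return float(beta[0] + r @ gamma)
```

Consistent with Remark 2.1.1 below, `predict(S[i])` reproduces `Y[i]` up to rounding, since the kriging predictor interpolates the design data.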
Getting an estimate of the error involves more computational work. Again we
first let q = 1, and from (2.11) and (2.14) we get the following expression for the
MSE of the predictor,

    ϕ(x) = σ²(1 + c^T (R c − 2r))
         = σ²(1 + (F λ̃ − r)^T R^{-1}(F λ̃ + r))
         = σ²(1 + λ̃^T F^T R^{-1} F λ̃ − r^T R^{-1} r)
         = σ²(1 + u^T (F^T R^{-1} F)^{-1} u − r^T R^{-1} r) ,                (2.17)

where u = F^T R^{-1} r − f and σ² is found by means of (3.7) below. This expression
generalizes immediately to the multiple response case: for the ℓth response function
we replace σ² by σℓ², the process variance for the ℓth response function. Computational
aspects are given in Section 3.1.
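A matching NumPy sketch of (2.17), under an illustrative setup (gauss correlation, constant regression model, our own test data) and assuming the variance estimate σ² = (Y − Fβ*)^T R^{-1} (Y − Fβ*)/m that (3.7) refers to; nothing here is toolbox code.

```python
import numpy as np

def corr_gauss(theta, W, X):
    d = W[:, None, :] - X[None, :, :]
    return np.exp(-(theta * d ** 2).sum(axis=2))

S = np.linspace(0.0, 1.0, 5).reshape(-1, 1)   # m = 5 design sites
Y = np.sin(2.0 * np.pi * S[:, 0])
theta = np.array([10.0])
m = S.shape[0]

R = corr_gauss(theta, S, S)
F = np.ones((m, 1))                            # constant regression, f(x) = 1
Ri = np.linalg.inv(R)                          # acceptable for a small sketch
FRiF = F.T @ Ri @ F
beta = np.linalg.solve(FRiF, F.T @ Ri @ Y)
resid = Y - F @ beta
sigma2 = (resid @ Ri @ resid) / m              # assumed variance estimate, cf. (3.7)

def mse(x):
    """phi(x) = sigma^2 (1 + u'(F'R^-1 F)^-1 u - r'R^-1 r), cf. (2.17)."""
    r = corr_gauss(theta, S, np.atleast_2d(x))[:, 0]
    f = np.ones(1)                             # f(x) for the constant model
    u = F.T @ Ri @ r - f
    return float(sigma2 * (1.0 + u @ np.linalg.solve(FRiF, u) - r @ Ri @ r))
```

At a design site the MSE vanishes (see Remark 2.1.1 below); between sites it is positive and grows with distance from the data.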
Remark 2.1.1. Let x = si, the ith design site. Then r(x) = R:,i, the ith column
of R, and R^{-1} r(x) = ei, the ith column of the unit matrix, ei = I:,i. Using these
relations and (2.6) in (2.16) we find

    ŷ(si) = f(si)^T β* + r(si)^T R^{-1}(Y − F β*)
          = f(si)^T β* + ei^T (Y − F β*)
          = f(si)^T β* + yi − Fi,: β* = yi .

This shows that the kriging predictor interpolates the design data. Further, in
(2.17) we get u = F^T ei − f(si) = 0 and the associated MSE

    ϕ(si) = σ²(1 − R:,i^T ei) = σ²(1 − Rii) = 0 ,

since Rii = 1.
Remark 2.1.2. As indicated by the name MSE (mean squared error) we expect that
ϕ(x) ≥ 0, but in (2.17) it may happen that r^T R^{-1} r > 1 + u^T (F^T R^{-1} F)^{-1} u, in which
case ϕ(x) < 0. This point needs further investigation, but as a first explanation we
offer the following: Equation (2.11) is based on the assumption that the difference
between the regression model and the true value is “white noise”, and if there is
significant approximation error, (2.5), then this assumption and its implications do
not hold.
Remark 2.1.3. From (2.16) it follows that the gradient

    ŷ′(x) = [ ∂ŷ/∂x1 ··· ∂ŷ/∂xn ]^T

can be expressed as

    ŷ′(x) = Jf(x)^T β* + Jr(x)^T γ* ,                (2.18)

where Jf and Jr are the Jacobians of f and r, respectively,

    (Jf(x))ij = (∂fi/∂xj)(x),   (Jr(x))ij = (∂R/∂xj)(θ, si, x) .                (2.19)

From (2.17) it follows that the gradient of the MSE can be expressed as

    ϕ′(x) = 2σ² ( (F^T R^{-1} Jr(x) − Jf(x))^T (F^T R^{-1} F)^{-1} u − Jr(x)^T R^{-1} r ) .                (2.20)

These expressions are implemented in the toolbox, see Section 5.2.
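For a concrete instance of (2.18): with the gauss correlation, ∂R(θ, si, x)/∂xj = 2θj(sij − xj) R(θ, si, x), so Jr is available in closed form. The NumPy sketch below assumes a gauss correlation and a constant regression model (so Jf = O); the data and names are ours, not the toolbox’s.

```python
import numpy as np

def corr_gauss(theta, W, X):
    d = W[:, None, :] - X[None, :, :]
    return np.exp(-(theta * d ** 2).sum(axis=2))

S = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
Y = np.sin(2.0 * np.pi * S[:, 0])
theta = np.array([10.0])
R = corr_gauss(theta, S, S)
F = np.ones((S.shape[0], 1))
beta = np.linalg.solve(F.T @ np.linalg.solve(R, F), F.T @ np.linalg.solve(R, Y))
gamma = np.linalg.solve(R, Y - F @ beta)

def predict(x):
    r = corr_gauss(theta, S, np.atleast_2d(x))[:, 0]
    return float(beta[0] + r @ gamma)

def grad(x):
    """y_hat'(x) = Jf(x)' beta* + Jr(x)' gamma*, cf. (2.18); Jf = O here."""
    r = corr_gauss(theta, S, np.atleast_2d(x))[:, 0]
    Jr = 2.0 * theta * (S - x) * r[:, None]   # (Jr)_ij = dR(theta, s_i, x)/dx_j
    return Jr.T @ gamma
```

A finite-difference check of `grad` against `predict` is a cheap way to validate the analytic Jacobian.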
2.2. Regression Models
The toolbox provides regression models with polynomials of orders 0, 1 and 2. More
specifically, with xj denoting the jth component of x,
    Constant, p = 1 :     f1(x) = 1 ,                (2.21)

    Linear, p = n+1 :     f1(x) = 1,  f2(x) = x1,  . . . ,  fn+1(x) = xn ,                (2.22)
    Quadratic, p = (n+1)(n+2)/2 :

        f1(x) = 1 ,
        f2(x) = x1,  . . . ,  fn+1(x) = xn ,
        fn+2(x) = x1²,  . . . ,  f2n+1(x) = x1xn ,
        f2n+2(x) = x2²,  . . . ,  f3n(x) = x2xn ,
        ···
        fp(x) = xn² .                (2.23)
The corresponding Jacobians are (index n×q denotes the size of the matrix and O
is the matrix of all zeros)

    constant :   Jf = [On×1] ,
    linear :     Jf = [On×1  In×n] ,
    quadratic :  Jf = [On×1  In×n  H] ,

where we illustrate H ∈ IR^{n×(p−n−1)} by

    n = 2 :  H = [ 2x1  x2   0
                    0   x1  2x2 ] ,

    n = 3 :  H = [ 2x1  x2  x3   0    0    0
                    0   x1   0  2x2  x3    0
                    0    0  x1   0   x2  2x3 ] .
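As a sketch of how the quadratic basis (2.23) can be generated in the ordering above; the toolbox’s own regression models are described in Section 5.3, and the Python helper `quad_basis` below is only illustrative.

```python
import numpy as np

def quad_basis(x):
    """Quadratic basis of (2.23): 1, x_1..x_n, then x_i x_j for i <= j."""
    x = np.asarray(x, dtype=float)
    n = x.size
    quad = [x[i] * x[j] for i in range(n) for j in range(i, n)]
    return np.concatenate(([1.0], x, quad))

# n = 2 gives p = (n+1)(n+2)/2 = 6 basis values
f = quad_basis(np.array([2.0, 3.0]))   # → [1. 2. 3. 4. 6. 9.]
```

The same double loop, with the second index differentiated, yields the columns of H shown above.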
2.3. Correlation Models

As in [9] we restrict our attention to correlations of the form

    R(θ, w, x) = ∏_{j=1}^{n} Rj(θ, wj − xj) ,                (2.24)

i.e., to products of stationary, one-dimensional correlations. More specifically, the
toolbox contains the following 7 choices:

    Name        Rj(θ, dj)
    exp         exp(−θj |dj|)
    expg        exp(−θj |dj|^(θn+1)),   0 < θn+1 ≤ 2
    gauss       exp(−θj dj²)
    lin         max{0, 1 − θj |dj|}
    spherical   1 − 1.5ξj + 0.5ξj³,   ξj = min{1, θj |dj|}
    cubic       1 − 3ξj² + 2ξj³,   ξj = min{1, θj |dj|}
    spline      ς(ξj),   ξj = θj |dj|

    Table 2.1. Correlation functions. dj = wj − xj
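A few of the one-dimensional families in Table 2.1, and the product construction (2.24), sketched in Python; this is illustrative only, and the toolbox’s own correlation functions are listed in Section 5.4.

```python
import numpy as np

# one-dimensional correlation families from Table 2.1 (illustrative sketches)
def corr_exp(theta_j, d_j):
    return np.exp(-theta_j * np.abs(d_j))

def corr_gauss1(theta_j, d_j):
    return np.exp(-theta_j * d_j ** 2)

def corr_spherical(theta_j, d_j):
    xi = np.minimum(1.0, theta_j * np.abs(d_j))
    return 1.0 - 1.5 * xi + 0.5 * xi ** 3

# the n-dimensional correlation is the product of one-dimensional factors
def corr_nd(corr1, theta, w, x):
    return np.prod([corr1(tj, wj - xj) for tj, wj, xj in zip(theta, w, x)])
```

Each family equals 1 at dj = 0; the compactly supported ones (lin, spherical, cubic) reach exactly 0 once θj|dj| ≥ 1, which makes R sparse for widely spread design sites.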