Wave-front  estimation from wave-front slope measurements
W. H.  Southwell
Rockwell International Corporation, Rocketdyne Division, Canoga Park, California 91304
(Received  28  August 1979;  revised 21  February 1980)
The  problem  of wave-front  estimation  from  wave-front  slope  measurements  has  been  examined
from  a least-squares  curve  fitting  model  point  of  view.  It  is  shown  that  the  slope  measurement
sampling  geometry  influences  the  model  selection  for  the phase  estimation. Successive  over-relax-
ation (SOR)  is employed  to numerically  solve the  exact zonal  phase estimation  problem. A new zonal
phase gradient  model  is  introduced  and  its error  propagator,  which  relates  the  mean-square  wave-
front error  to the noisy slope measurements,  has been compared  with two  previously  used models. A
technique  for the rapid  extraction  of phase  aperture  functions  is presented.  Error propagation  prop-
erties  for modal  estimation  are  evaluated  and  compared  with zonal  estimation  results.
INTRODUCTION
In this  paper we consider the problem of estimating wave-
front phase  from a set of discrete phase slope measurements.
This  reconstruction  process  is  necessary  to  obtain  phase
profile  estimates  from  certain  types  of  sensors,  such  as  a
shearing  interferometer  or  a  Hartmann  sensor.  It  is  also
necessary  in  image  restoration  using  the Knox-Thompson'
speckle imaging  technique.
A variety  of approaches  have been  proposed  to accomplish
this  phase  reconstruction.  Basically,  each  estimation  ap-
proach  may  be  categorized  as  being  either  zonal  or  modal
depending  on whether  the estimate  is a phase value  in a local
zone or  a coefficient of an  aperture function.  In either case,
least-squares  estimation  is  used  for  the  phase  reconstruc-
tion.
Differences  in  the  methods  can  be  distinguished  by  the
different  models  used  to  "fit"  the  local  slope  measure-
ments.
Some  important considerations  in selecting  a model are  (i)
compatibility;  does  the  model  fit  the geometry  of the slope
measurements  as  given  by  the  sensor?,  (ii)  numerical  com-
plexity;  are  there  convergence  problems, storage,  and com-
putation  speed  requirements?,  and  (iii)  error  propagation;
what effect does noise in the slope measurements have on the
phase estimates?
The reconstruction problem  was first addressed in the lit-
erature by Rimmer,2  and since then by a number  of authors. 3 4
Most of these  papers use  a phase difference  slope model  for
zonal  reconstruction  that is chiefly  applicable  to a shearing
interferometric  sensor, where  the x-slope  measurement  po-
sitions are not coincident  with the y-slope measurement  po-
sitions.  With some  approximations,  Hudgin3  calculated the
noise propagator  for this model.  Using  a  model that assumed
coincident x-  and y-slope  measurements  and some  approxi-
mations,  Fried4  also  calculated  the  noise  propagation  pa-
rameter.  Recently,  Herrmann 8  constructed an exact  solution
for  the  noise  propagator  for  the  models  of both  Fried  and
Hudgin.  Herrmann's results essentially confirmed  the pre-
vious  results.
In this paper  we present a  new phase reconstruction  model
which,  like  that of Fried  is  applicable  to coincident  x-  and
y-slope  measurements  (and  thus  suitable  for  Hartmann
sensors).  The error propagation  properties  of this new  model
have been shown to be superior  to the models of both  Hudgin
and Fried,  partly because it provides  a more uniform  sampling
of the aperture.  We also describe, and illustrate with exam-
ples, a matrix iterative technique  that solves the exact  least-
squares zonal reconstruction  problem,  is easy  to implement,
and  exhibits rapid convergence.
Modal  estimation  is  also  discussed.  Some  numerical
methods  for the  rapid extraction  of the low-order  orthogonal
modes  are described.  Finally, noise propagation effects  using
modal  estimation  are  compared  with those using  zonal esti-
mation.
ZONAL  ESTIMATION
Three grid  configurations  that illustrate  the grid positions
for the  x-  and y-slope  measurements  and the  reconstructed
phases  are shown  in Fig. 1.  It is important to point out that
the formulation  of the  problem depends  on the grid  pattern
used, which, in turn, depends on where the actual slope mea-
surements  are  taken.  Figure  1(A)  would  be  applicable,  for
example,  for a Hartmann  sensor, where each small lens mea-
998 
J. Opt. Soc.  Am.,  Vol. 70, No.  8, August  1980
0030-3941/80/080998-09$00.50
(0 1980  Optical Society  of America  -
998
Syj  = (bij+1- oij)/h 
1J 
i=  1, N
i=  1,N-  1
1 
(6)
where  h  =  DIN and D  is the width  of  the  aperture.  Each
phase  point represents  an  equal subregion  of area h 2  in the
aperture.  This is  the least-squares  model used  by Hudgin.3
(We  note that  it is equivalent to the assumption  of a bilinear
spline phase on the phase grid of Fig.  1(B).  The phase  is ev-
erywhere  continuous, but has discontinuous  slopes at the mesh
points.)
It  is also  interesting to  note that we  have derived the slope
model based  on  actual local slopes.  But  we see  from Eqs. (5)
and  (6)  that the  model  is equally applicable  to the  assumption
that  the  slopes  be  determined  by  adjacent  phase  differ-
ences.
Now  suppose  our x-  and y-slope  measurements  are  given
at coincident points, as in a  Hartmann plate.  One possibility
is  to  let the phase  points  used  in Eq.  (4)  be  replaced  by  the
average  of two phase values which  are positioned perpendic-
ular to  the slope  direction  by one half-step  up and one  half-
step down.  This is the  configuration shown in Fig. 1(C).  This
gives the slope  model  used  by Fried,4
Sx  = [(¢0i+1j, +  oi+lj+,)/ 2  - (kij + 
h
,ij++)/2] 
=  [(0jj+1 + 
±i+lj+i)/2  -(kij  + 0i+ij)/2] 
1] 
h
(7)
(8)
For  a  new  model  appropriate  for  Hartmann  sensor  mea-
surements,  we  consider  the  grid  geometry  of Fig.  1(A)  and
return  to  Eqs.  (1) and  (2)  for  a  representation  of the  phase
between  mesh points.  This time we  have  two measurements
per  interval,  a  slope at each end.  This enables the  determi-
nation of both cl and  c2 in  Eq. (2).
This  produces the relationships
(Sj,X~,j  + S,,j) 
(Oi+,j -r oij) 
2
(Stx+ j + SY0- 
2 
h 
(0i'j+1 -
h
ij)
I 
i =  1, N- 
j= 1,N
i= 1,N
j= 1,N-1
MI
(10)
[We comment  that the phase model used for data configu-
ration  like Fig.  1 (A)  is a biquadratic  spline, whereas configu-
rations  like  Fig.  1(B)  are  based  on  a bilinear  spline  phase.
However, since we are not interpolating,  but only determining
the phase at the mesh points, it does not necessarily mean that
one  is more  accurate  in the  phase estimation.]
LEAST-SQUARES  SOLUTION
Having  a set of measurements  Sx and  SY  and a model for
these  quantities  [Eqs.  (5) and  (6),  (7) and  (8),  or (9)  and  (10)
in terms of the adjustable parameters  Iij], it is straightforward
to apply least-squares techniques.  However,  there are some
potential  difficulties  which require attention.
To formulate the least-squares  problem,  note that Eqs. (5)
and  (6)  or  Eqs. (7) and  (8)  may be  written  as  a single  matrix
equation
S=A4, 
(11)
where S is  a vector containing  all the slope measurements  (x
(B) 
I  I I  1.
*,- 
___ 
j
4  -
+  -
"
-
I 
*-*9-*-
-
I
FIG.  1.  Slope  measurement
sampling  geometry  and  wave-
front mesh points.  The  horizontal
dashes  indicate  positions  of  x-
slope  sampling. 
The  vertical
dashes  are the  y-slope sampling
positions.  The  dots  are  the esti-
mated  phase  points.  Configura-
tion  B has  been  considered  pre-
viously by  Hudgin  and  configura-
tion  C by  Fried.
sures  both the x  andy slopes at the same point.  Figure  1(B)
could  be applicable, for  example, in  a shearing interferometer
where the beam  is first divided  in order to  apply, separately,
the  x  and  y  shears.  Figure  1(C)  shows  coincident  x  and y
slopes, but with a  displaced  phase grid.
The  problem  is  to least-squares-fit  the slope  measurement
data to some  function  or  model of the phase  given  at its grid
points.  Consider  first  the  data  configuration  given  in Fig.
In  that  configuration  there  are  N2  phase  points  and
1(B). 
2N(N-1)  slope measurement  points.  Let us  assume that the
phase  dependence  between  grid  points  in  the  x  direction  is
represented  by  the  polynomial
The slope  is
t1=Co + 
clx 
2
.
Sx  = c1  + 2c 2x. 
(1)
(2)
But  since we  have only one  slope  measurement, which  is  given
at the  midpoint of the interval, we  can only determine  c 1. The
constant c0  is the value of the  phase  at the left side of the in-
terval.  Thus,  all  that the data  can  support  is a  linear phase
model,
q5=  + ST x. 
(3)
The phase at the right of the interval  (x  =h) is  ki+  . Applying
this  boundary  fact leads  to the slope  model,
S-T  =  (0i+1 - ,i)lh 
i = 1, N-  1, 
(4)
where N  is  the number  of phase sample points in  the x direc-
tion.  Generalizing  to all  rows  and to the y  direction,  we  ob-
tain,
= - 1N - 1
j1, N 
()
999 
J. Opt.  Soc.  Am.,  Vol.  70,  No.  8, August  1980
W. H. Southwell 
999
slopes  first)  and 4  is a  vector of length N2  containing  all  the
unknown  phase  values.  The size of the rectangular  matrix
A  is N2  by the number  of slope measurements;  however,  it is
very  sparse.  Multiplying  Eq.  (14)  from  the  left  by  At  (A
transpose),  we obtain the normal  equations
(AtA)o  = AtS,
(12)
which  are  a set of N 2  linear  equations in  N2  unknowns  4jk.
The  standard  least-squares  solution  is  obtained  by  multi-
plying  the  right-hand  side  of Eq.  (12)  by  the  inverse  of the
square  matrix AtA.
However,  the  standard  least-squares  solution  cannot  be
used because  AtA is  singular.  This is a  consequence  of the
fact that any constant  wave front  added  to the correct wave
front also satisfies the models.  This does not mean  that Eq.
(12)  cannot  be solved  for a meaningful  solution.  But it does
mean  we  cannot  use  the  standard  matrix  inversion  tech-
nique.
Herrmann  has shown  that the desired  solution  is  the one
that has a  minimum norm,  which has zero  mean.  He shows
that using  the  zero-mean  condition  leads  to an extended  Ae
matrix that yields  an At A,  matrix that is  not singular.  The
matrix A,  has an  additional row  (of all  ones) and  Se  has an
additional  element  (zero).  However,  this procedure  requires
the computer  storage  of a matrix  the  size  A,  which  has es-
sentially  2N2  elements.
For large  arrays,  we  can use  iterative  matrix  methods  for
solving Eq. (12).  The solution  obtained  is unique to  within
an additive constant,  but it is  an easy matter  to subtract out
the  mean.  We  discuss  these methods  below.  But first,  we
write the matrix equation representing  Eqs. (9)  and  (10), the
Fig.  1(A)  model,
DS  = A4, 
(13)
where  D is a large sparse  matrix that performs  the adjacent
slope  averaging.  The corresponding  normal  equations  are
(AtA)4)  =  AtDS. 
(14)
MATRIX 
ITERATIVE  SOLUTIONS
For large arrays, we  can take advantage  of the sparse nature
of AtA and use matrix iterative methods for  solving Eqs.  (12)
and  (14).  If we  write just the nonzero  matrix element  com-
ponents  of Eq.  (12)  for Fig.  1(B)  configuration, we  obtain
gJ'kJjh 
[bj+lk +  )j-l, + 
'jk+1  + 
'kj,k-1I
=  [SJ1k-- 
SJk  +  Si-I,k  Sjk]  h, 
(15)
where
2 
j=lorN;k=lorN
gj  = 3 
4 
lj  =1 or N;  = 2to N  -
= k  =  1 orN;  j  =  2toN -1
otherwise,
(16)
and  if a nonexistent  matrix element is called  for, then zero is
assumed.  The g  factors  are  the diagonal  elements  of AtA,
which are either 2, 3,  or 4 according  to whether  its accompa-
nying  phase  point qpjh  is  on a  corner,  an edge,  or an  interior
point.  Calling the right-hand side  bjk, we  can rewrite Eq.  (15)
as
'tjk =  kjk  +  bJhlgjk,
where  opii is the nearest-neighbor  phase  average,
jth =  [j+l,k +  Oj-l,k  +  4j,k+l]/gJk,
and
bib  =  [Syk-I - S\ +  S-ik 
- SJk'h,
(17)
(18)
(19)
which  is  a  constant  depending  only  on  the  slope measure-
ments.
Equation (17)  forms the basis for an iterative solution.  The
right-hand side  of Eq.  (17)  is evaluated  using current  values
for the O's  (which  may initially be  zero).  The result becomes
the improved  estimate;  at the  mth iteration, this becomes
Oj(hm+l)  = 
J(km)  +  biklgih-
(20)
If the  left-hand  side  is  held  in a  separate array  until all N2
points are evaluated, then this procedure  is the Jacobi method.
If, however,  the  left-hand side updates the wave-front  array
directly,  such that succeeding evaluations  of the  right-hand
side could use  phase points that have already been updated,
then  the  procedure  is  called  the  Gauss-Seidel  method.
Generally,  the  Gauss-Seidel method  converges  faster and  is
easier  to implement.
Neither of these methods, however,  updates a phase point
based on the  previous value  at that point, since 4)jh  does not
include  4)k-
By adding and subtracting 4)jk  on  the right-hand  side of Eq.
(20)  and introducing  the relaxation  parameter a, we have
(m +1) =  (m) +  . [2 ) + bjk/gjk  - 0)J1 ]
(21)
This  iterative  technique  is  the  successive  over-relaxation
(SOR) method.  Convergence  is  achieved when  the quantity
in brackets on the right-hand side becomes small.  The SOR
method  generally  promises  improvement  in  convergence;
however,  it is necessary  to determine the value  of the relaxa-
tion  parameter w  which maximizes  the rate  of convergence.
Fortunately,  AtA  for  the  phase  reconstruction  problem
belongs  to a  class  of matrices for which  the optimal  value  of
w  is  known.9 The result  is
2
W  1 +  sin[7r/(N  +  1)]
(22)
The results of a numerical  comparison of the three matrix
iteration  methods  discussed  above  are tabulated  in Table  I.
The slope measurements  were  generated  from the analytical
derivatives  of a simulated  wave front, which  in this  example
contained  astigmatism.  The rms wave-front  reconstruction
error  was  calculated  by  comparing  the  wave  front  recon-
structed  from the slopes  with the simulated wave  front.  The
Jacobi method  converged but left a small residual error in this
example, although  it generally  gives the correct  answer.  The
Gauss-Seidel  method converged  in approximately  N2  itera-
tions, whereas  the SOR method  converged in approximately
2N  iterations. 
(An analysis  of the convergence  rates  indicates
that for large  N, SOR will converge  2N/7r fewer iterations than
the Gauss-Seidel  method.)
Hudgin 3  formulates  the wave-front reconstruction  problem
in  such a way that edge  effects are  ignored.  They deal with
Eq.  (15) where  all the gjk  = 4, which treats all phase points as
1000 
J.  Opt.  Soc.  Am., Vol. 70,  No. 8,  August  1980
W.  H. Southwell 
1000
rms  wave-front  reconstruction  error  in waves  as  a  function  of
TABLE  I. 
the  number  of  iterations for  three  different  iterative  methods.  The  sampled
2 )/a2
wave  front  contained  astigmatism  of  the form  W  = 2.3717  (X2
+  6xyla2  over  a  square  aperture  of area  4a 2.
-y
N
4
8
16
Number of
iterations 
Jacobi 
Gauss-Seidel 
SOR 
2 
4 
8 
16 
32 
64 
2 
4 
8 
16 
32 
64 
128 
256 
2 
4 
8 
16 
32 
64 
128 
256 
512 
1024 
0.487 
0.176 
0.126 
0.125 
0.125 
0.125 
1.638 
1.288 
0.823 
0.350 
0.067 
0.014 
0.013 
0.013 
2.068 
1.947 
1.744 
1.422 
0.969 
0.467 
0.113 
0.007 
0.002 
0.002 
0.504 
0.255 
0.043 
8)x  10-4 
3 X 10-7 
0 
1.389 
0.936 
0.454 
0.151 
0.034 
0.002 
4)x  10-6 
0 
1.963 
1.763 
1.446 
0.996 
0.492 
0.135 
0.022 
0.001 
6 x  10-6 
0 
0.620 
0.249
2 x  10-4
0
0
0
1.062 
0.664
0.319
0.022
6 X 10-5
0
0
0
1.563 
1.159
0.716
0.344
0.046
5x 10-4
0
0
0
0
being  in the  interior  of the aperture.  Such  an  assumption  can
lead to large  wave-front  reconstruction  errors.  Table II  shows
the  wave-front  reconstruction  results using  the recursive  al-
gorithm  of  Hudgin  on the  same  slope  data  used  in  Table  I.
The  large  errors  indicate that the  simulated  wave  front  was
not  reconstructed.  This  means  that edge effects  are  impor-
tant-even for N  =  16, where  nearly one-fourth of the phase
points  lie  on  the aperture  edge.
Frost et  al.7  also  neglect  edge  effects  in  order  to utilize  a
fast-Fourier-transform  solution method.  The results of their
numerical  examples  appear  to  be  quite good;  however,  they
used a  large  phase  grid,  N  =  256.
For practical wave-front  sensors,  where  N will  probably be
less  than  35,  the  aperture  boundary  or  edge  effects  are  im-
portant.  To make  this  point  more  explicit,  we used  the  re-
cursive  algorithm  of Hudgin  on a  simulated wave  front  with
defocus  W =  1.732  [2(x 2 + y 2) - 1].  Using an N  = 4  grid, the
correct  wave  front  is
/2.17
1 0.43
0.43
2.17
0.43 
-1.30 
-1.30 
0.43 
0.43  2.17\
-1.30  0.43
-1.30  0.43
0.43  2.17
(23)
The  exact  analytical  derivatives  were  used  to  generate  the
slope data  at the midpoints between  sample points.  The re-
constructed  wave front, using  the recursive  formula  of Hudgin,
gave  the  following  result  (after  applying  an  unimportant
constant  offset  of 1.588):
1001 
J. Opt.  Soc.  Am.,  Vol. 70, No.  8,  August 1980
2.17 
1 1.01 
1.01 
2.17 
1.01 
-0.72 
-0.72 
1.01 
1.01  2.17\
1.01
-0.72 
-0.72 
1.01  )
1.01  2.17
(24)
The  rms wave-front  reconstruction  error was 0.25  waves.
We  next write the nonzero  matrix element  components of
Eq.  (14),  Fig.  1(A)  configuration:
gjkIjk  -
[Oj+l,k + 
'j-lk 
+  kj,k+l  +  Oj,k-1]
1
=  -
2'
[Sjyh+l -Sjyk-1  ± Sjx+l,k  -S1_1,k]  h,
(25)
where
2 
j=lorN;k=lorN
gjk = 
3  kl1orN;j-2toN-1 
4 
otherwise,
(26)
and  if a  nonexistent phase  element is  called  for,  then zero is
assumed,  and if a  nonexistent slope element  is called for, then
the negative of the adjacent slope  is assumed.  The only dif-
ference  between  this  equation  and  Eq.  (15)  is in the formation
of the right-hand  side of Eq. (25),  which is only a manipulation
of the data.  Consequently,  the same  SOR technique, Eq.  (21),
may  be applied,  except that bik  is  now the right-hand side of
Eq.  (25).
ZONAL  ESTIMATION  ERROR
There  are  two  important  sources  of  error  in  the  recon-
struction  process.  One is the  algorithm accuracy.  That is,
how  well does  the least-squares  process  reconstruct  a  phase
of  arbitrary shape?  The  answer,  of course,  depends  on the
shape  of the wave  front  being measured.  For  example,  nu-
merical  experimentation  indicates  that  the  zonal  recon-
struction  models  discussed  here  will  perfectly  reconstruct
TABLE  II.  Wave-front reconstruction results using the  recursive formula
of  Hudgin (which  neglects  edge effects).  Shown here is the  rms wave-front
reconstruction  error  in  waves  from  slope data sampled  from  a wave front
containing astigmatism  of the form  W  =  2.3717(x 2 - y2)/a 2 +  6xy/a2  over
a  square aperture of area  4a 2 .
In each case the recursive algorithm quickly
converged,  but  did  not  reproduce the  sampled wave  front.
Number  of
iterations 
rms error 
2 
4 
8 
16 
2 
4 
8 
16 
2 
4 
8 
16 
1.106 
1.158
1.160
1.159
1.624 
1.602
1.625
1.624
1.944 
1.905
1.897
1.908
N
4
8
16
W. H. Southwell 
1001
wave  fronts  having  any  combination  of  tilts,  defocus,  and
astigmatism.  Furthermore, this is true for any N.  However,
for  wave  fronts with  higher-order  aberrations, the model gives
nonperfect  reconstruction  that  is  N dependent.  Basically,
this  type  of  error  is  a  consequence  of the  finite  sampling
density.
The other important  source  of error is  that produced from
noise in the slope  measurements.
The basis  for determining the error propagation,  that is, the
amount of error  resulting  from errors  in  the slope measure-
ments, is given  by  the model  itself, or  the normal  equations,
Eq.  (12)  [or  Eq. (14)]:
(AtAe) O = AtS, 
(27)
where  we now  use the extended  matrix Ae  (discussed  above)
to assure  that our  solutions will have zero  mean.  We  assume
now that our  slope  measurements  contain  random additive
noise,
(28)
We now ask, what is e, the resulting error in wave  front O?  To
answer  this, write
S = So + a. 
and apply the  linear relationship  Eq. (27).  We  obtain
0 =  t0o 
: 
,
(AlA,)  (Oo  + e) =  At  (So ±  a), 
(AtAe)  e = At  or, 
(29)
(30)
(31)
since  So and 00 are  assumed to  be the true values.  Thus, the
wave-front errors e will obey the same set of normal equations
that the wave  front obeys.  Furthermore,  the new AtAe  ma-
trix  is  not singular.  This permits,  in  principle  at  least,  the
application  of the standard  linear  least-squares solution
where
e=  B a,
B  =  (AtAe)-At
B  =  (AtAeA)-
1  AtD, 
where  Eq.  (34)  is applicable  for the  model in Fig. 1(A).
A particular  phase  error component is
e=  E  Bik  ak. 
k
(32)
(33)
(34)
(35)
We  next  form  the  product  Ej  ej  and  take the  statistical  av-
erage,
(ei  ej)  =  A  E  Bih B1 l  (a,  al); 
k 
I
(36)
bvt  since the slope  errors are independent  and uncorrelated
(and, we assume,  equal),
((k  CO)  =  a2  6k1.
Equation  (36)  becomes
(Ei  EJ)  =  a2EZBikBik,
k
(37)
(38) 
2
[BB+]iLJ
= U
(39)
The matrix on the right-hand  side of Eq.  (39) is  not diago-
nal, which means  that the phase  errors are  not uncorrelated.
Nor are the phase  errors equal at each point.
To obtain the mean-square  phase error E, we set i  = j  in Eq.
(38)  [or  Eq. (39)]  and  average  over i,
E  = 
E_ (BE2)
=  N2  E_  E  B'k
(40)
(41)
This  says that the  mean-square  phase  error  is  proportional
to  the sum of the square of each component  of the B matrix.
The matrix B is  the same  size as  A+ but  it is  not sparse.
For  the  slope  models  that we  considered,  the  matrix A  is
fixed for a given N,  except for h,  the phase point separation.
But since  it enters  in a  multiplicative  manner,  it  is  easy  to
factor it  out of the  right-hand side  of Eq. (41),
E= Ch2 U2,
where  the noise  coefficient C  is  defined by
C = D2  L  L  Bik
(42)
(43)
There  is  another  way  to  evaluate  the  noise  coefficient.
Using the fact that the slope measurement errors are  uncor-
related,  the total wave-front  variance  will  be  the  sum of the
wave-front  variances  due to each  slope  variance  separately.
To show  this, we assume  each  slope variance  Sk is  different
and  use Eq.  (36)  and  (37) in  (40),
where
E = Z Ek,
k
Eh  =  I  L, B2  a2
(44)
(45)
where, due to the independence  of the  ak,  all the cross terms
vanish.  The scalar quantity Ek  is the mean-square  wave front
resulting from  a  set of zero  slopes  except  for Uk.  The  SOR
method, Eq.  (21),  may be  used to calculate  Ek.  When using
Eq. (21),  which  is based on the standard  normal equations, it
is important  to subtract out the  mean.  This  approach  is a
means of calculating  the elements  of B  without forming an
inverse of A'Ae.  The wave-front solution  of Eq. (21), when
all slopes are zero except  0Uk (which is unity),  becomes the kth
column of the B matrix.  This is evident from Eq. (35).  (This
technique  is similar to the approach  used by Fried.4 )
To compare  our results with those previously published  for
phase difference  measurements, we need to convert our  slope
errors a to phase difference  errors  0Ipd-  We do this using the
expression
Opd  =  (2r/X)hs,
(46)
where  ()pd  is  the  phase  difference  measurement  associated
with  the  slope angle  s,  h is  the  spacing,  and  X is the wave-
length.  This gives
1002 
J. Opt.  Soc.  Am.,  Vol. 70,  No. 8, August  1980
W. H. Southwell 
1002
1.6-
1.2-
0.81
0.4
C
B
A
_1 
I I 
2I 
I 
20 
FIG.  2.  Noise  coefficient for  zonal  estimation  versus  N for a  square  array
containing N2  phase  points.  Curves  A,  B, and  C correspond  to  the sampling
geometries  in  Fig.  1.
N 
Cpd  =  CN2/D 2
=  C/h2 ,
(47) 
where  Cpd  is the  noise  coefficient  for  phase difference  mea-
surements  defined  by
r2
= 
)CPd  pd, 
(48)
where  ((634)2)  is  the mean-square  phase  error and  apd is the
variance  of each phase  difference  measurement.  For all the
data plotted  in this paper,  the noise coefficient  is given  by the
right-hand  side  of Eq.  (47).
We  have  numerically  evaluated  the  error  propagator  for
each configuration  shown  in  Fig.  1.  The results are shown  in
Fig.  2.  Our  results  for Figs.  1(B)  and  1(C)  are  in  agreement
with Herrmann. 8  We note that the noise coefficient is  lower
for the  data  configuration  of Fig.  1(A).
MODAL  ESTIMATION
For  comparison purposes,  we  assume we  have  the  same  set
of  slope  measurements  as  depicted  in  Fig.  1.  We  now  use
these  measurements to fit the coefficients  in a  phase expansion
of orthogonal  functions
O(x,y)  =  E  aknkFk(X,y),
M
k=O
(49)
where  ak  are the  coefficients  to  be  determined,  nk  are  the
normalization  constants,  and  Fk  are  the  two-dimensional
functions  that  are  orthogonal  over  the  discretely  sampled
aperture.  For later  convenience,  we  choose  Fk  to have  zero
mean  and to  be normalized  such  that the phase variance  is
co 
M
2
2= 
E_ ak.
k=1
(50)
7  (3ye-1dN)X 
7  (3y2.Nd2)x)- 
15 
d
We  have  constructed  a  set  Fk  from  products  of  Legendre
polynominals  that have  the required properties
8  (5X 2 -gN)x 
nkni  E 
E  Fk (xivyj)  Ft (xi~yj)  =  N 
N  N.
j=1  j=1
41l,
(51)
9  (5y 2 -gN)y)  6gNdN + 25(dN-gN)dN 
0 
15y 2 -gN
1003 
J. Opt. Soc.  Am.,  Vol. 70, No.  8, August  1980
W. H.  Southwell 
1003
where  we  have  assumed  that  the  origin  of  the  coordinate
system  is  at the center of the square  array.  We  list the first
10  of  these  functions  and  their  normalization  constants  in
Table  III.
The slope  model is  obtained by differentiating  Eq.  (49),
M 
SX  =  E  aknk-
k=1 
ax
k
M 
SY=  E  akank
k=1 
ZFk
CY
(52)
(53)
In this case, it is the M phase  expansion coefficients  ak  that
are  the adjustable  parameters,  instead  of N 2  phase  points.
This  greatly  reduces  the  numerical  complexity  of the  esti-
mation.  The partial  derivatives  are analytical  derivatives  and
are  also  shown  in Table III.
Writing  Eqs.  (55)  and  (56)  in  matrix form,  we have
S =  Aa,
(543
where  A is a rectangular  matrix with M columns and 2N 2 rows
for Fig.  1(A),  2N(N  - 1) rows for Fig. 1(B), and 2(N - 1)2 rows
for Fig.  1(C).
At this point, we observe  that ao, the piston component of
the  phase, is  not determined.  But we do not need to be  con-
cerned  about ao.  Since  all  the other terms in the  expansion
have  zero mean,  then so will  0(x, y)  without the piston  term.
Thus, we  are assured  that our solution will be minimum norm.
Furthermore,  the  normal  equations  are not ill conditioned,
AtAa  =  AtS,
(55)
so that we  may  use the standard  least-squares  solution
TABLE  Ill.  Two-dimensional  Legendre  polynomials used  to represent  a
wave  front  b(xy)  =  M 1 aknkFk(xy) over  a square  aperture  of  width  D.
3(D/2)2  [1-  7/(3N2)]  and dN = (D/2) 2 (1 - 1/N2) are
The  constants  gN 
determined  such that  the  polynomials  are normalized  over a  square  grid
of  /N2 sample  points.
k 
0 
1 
2 
Fk 
1 
x 
Y 
3 
3x 2 -dNp 
4  3y 2 -dN  J 
5 
6  (3X2  - dN)y 
xy 
nk 
1 
3
3 
dN
5 
3dNgN-5dN
9NdN 
Fk 
Fk
0 
1 
0 
6x 
0 
0
0
1
0
6y
y 
6xy 
x
3x 2 - dN
3y2 -dN 
6xy
15x2 -gN 
0
a =  (AtA)-'AtS,
(56)
where  AtA  is  an  M by  M  symmetric  matrix,  and  AtS  is  a
vector of length M whose components  are weighted sums (or
moments) over the  slope  data.
From an inspection of the partial derivatives  listed in Table
III, we note that many  of them turn out to be  other Fk  func-
tions.  Thus, those are  orthogonal.  Fitting  data directly to
an orthogonal set has  the advantage  that each  may be  deter-
mined  independently  by a weighted  sum over  the  data.  In
terms of Eq.  (56),  this results in a diagonal  MA matrix.  In-
version  of  a  diagonal  matrix  requires  no  lengthy  numerical
procedure.  The  inverse  is also  diagonal  with  the  diagonal
elements  being the  reciprocals of corresponding elements.
As  an example, suppose  we wish to extract all modes  up to
and  including astigmatism, M  =  5.  For N =  4 this gives
AtA  = 
/1.2
0
°
0
0
0
51.2
0
0
0
0
0
32.0
0
0
0
0
0
32.0
0
0
0
0
0
102. 4)
[AtA-1  = 
0.0195
0
0
0
0
o 
0.0195 
0
0
0 
0 
0 
0.0031
0
0
0
0
0
0.0031
J. 
(58)
0
0.0098/
Furthermore, we  note that these matrix elements  are  not data
dependent.  They depend only on the sensor  geometry such
as N.  This means they may be  fixed, predetermined  numbers
in Eq.  (56).  The mode  extraction then reduces to a  weighted
sum  over the slope data.  In some cases, this sum extends only
x-  or y-slope  data.
Suppose we wish to  extract all nine modes shown in Table
III.  In this case,  we  note that the derivative  of F 8 is not or-
thogonal  to the derivative of F1 . This leads  to an  AtA matrix
that is  no longer diagonal.  Fortunately, this is  not a serious
difficulty.  It does  not mean  we  lose the fast mode extraction
discussed  above  for the  other  modes. 
In this case  the only
off-diagonal  elements  that  appear are  1 and  8,  and  2  and  9.
For  the unaffected  modes, the inverse is  still the reciprocal  of
the  diagonal  element.  For  the  modes that  are coupled, the
inverse  is just the inverse  of the 2  X  2 submatrix.  Thus, the
inverse  of AtA is as sparse  as AtA.  To demonstrate  this, we
write  these matrixes for  N  = 4,
51.2
0
0
0
0
0
0,
145.1
0
51.2
0
0
0
0
0
0
0
145.1
0
0.0258
0
0
320.
0
0
0
0
0
0
0 
0 
0
0
0
0
0
0
0.0031-2 
0 
0 
0 
0 
0 
0 
0
-0.00211
AtA = 
0.0258
0
0
0
0
0
0
-0.00211
(AtA)-l = 
l
0
0
0
0
102.4
0
0
0
0
0
0
0
320.
0
0
0
0
0
0
0
0
0.003 12
0
0
0
0
0
0
0
0
0
0.00977
0
0
0
0
i
0
0
0
0
0
371.2
0
0
0
0
0
0
0
0
0
371.2
0
0
0
0
0
0
0
0.00269
145.1
0
0
0
0
0
0
1691.
0
0
145.1
0
0
0
0
0
0
1691.
0
0
0
0
0
0
-0.00211
0
0
0
0
0
0
0
0.00078
0
-0.00211
0
0
0
0
0
0
0.00078
0
0
0
0.00269
0
0
(59)
(60)
(62)
Thus, mode extraction is still numerically fast.  For uncoupled
modes,  they are  determined by  independent  weighted sums
of the slope data.  For  coupled modes, it is a linear combina-
tion  of the weighted  sums for  those  modes.
MODAL  ESTIMATION  ERROR
In  order  to  compare  the  noise propagation  properties  of
modal  reconstruction,  we next determine  the phase  error  Ei
at the  discrete grid points, as  we did  above  for the  zonal  re-
construction.  Using  Eq. (32)  and
a = a 
(61)
where a is  the induced  random error  in the modal coefficients
due  to  the  slope  measurement  noise,  we  have,  from  Eq.
(49),
cta, 
Ei  =  Ec aInIF1 (xi,yi).
l
Note that ei  automatically  has zero mean due to our definition
of  the  F1  functions.  Thus, the  mean-square  phase  recon-
struction  error E is
E  = N2 E
= 
(63)
where we have  used the orthogonality of the F1 functions.  We
now use  Eq.  (56) with  Eqs. (28)  and  (61)  to obtain al
=(a2)
cal =  E  [(A+A)-'A+J]1 joj.
j
This gives from Eq. (63)
(64)
1004 
J. Opt. Soc. Am., Vol. 70,  No. 8, August  1980
W.  H. Southwell 
1004
0.8 
-
0.6 
-
0.4 
-
B
0.2
5
10
15
20
25
30
FIG.  3.  Noise  coefficient  for
modal  estimation  versus  N for  a
square  array containing N2  phase
points.  Curves  A,  B,  and  C  cor-
respond  to  the sampling  geome-
tries  in  Fig.  1.
E = E_  E  EZ  [(A+A)- 1A+]ij  [A+(A+A)- 1]kI  (ajgfk).
I 
j  k
Ensemble  averaging of the  slope errors gives
(i  ap) )= 
2  5jk,
(65)
(66)
where  we  assume  the  errors  are  equal  and  uncorrelated.
Applying  Eq. (66)  to  Eq. (65),  we have
E= E  E  [(AtA)-lAt]j  [A(AtA)-t~1ji2
=L[(AtA)-1AtA(AtA)-1J11  52
=  a2  
(AtA)ul  =  2  tr(AtA)-. 
(67)
Thus, the  mean-square  phase  error is  proportional  to the
trace  of  (AtA)-',  which  is  the  sum  of  the  diagonal  ele-
ments.
Using  our previous  definition  for  C, Eqs.  (42) and  (47), the
error  propagation  parameter,  we have
CMR  =  -
tr(AtA)-1 ,  modal reconstruction, 
(68)
where  h  is the distance  between sample  points.
Equations  (58)  and  (60)  give us  the breakdown  by modes
of CMR  for N  = 4, where we can examine the diagonal  elements
term by term.  The  data geometry  used was as shown in Fig.
1(A).
We  evaluated  CMR  for all data configurations  shown in Fig.
1  for  several  values  of N.  The  results for  nine  modes  are
shown  in Fig.  3.  For small N, like the zonal  reconstruction,
there  is significant  difference  between  the three cases.
The difference  may be accounted  for by two factors.  The
first is that the number of data points differs in each case.  For
Fig.  1(A)  we  have  2N2  slopes,  for 1(B)  there are  2N(N  - 1),
and  for 1(C) there  are 2(N  - 1)2.  The  second factor  is their
location.  We note from  Fig.  1(C) that we are reconstructing
the phase  outside  the slope sampling  region.  Phase recon-
struction in this region  is, like extrapolation,  more uncertain.
Cubalchini 10  has  also  observed  that  the  error  propagation
depends  on the slope  geometry  (for  modal estimation).
DISCUSSION  AND  CONCLUSIONS
The  above explanation  may also  account for  the difference
between the zonal  reconstruction  models of Hudgin and Fried
previously reported  (and shown as  curves B and C in  Fig. 2).
As might be  expected, these  differences  become  smaller as N
increases  for modal  estimation.  But for zonal  estimation,  we
note  that  the  differences  between  the  Fig.  1(A)  and  2(A)
geometries  become  smaller, but the large  gap to Fig. 1(C)  ge-
ometry  (and model)  appears to persist even  at large  N.
We next calculated  the mean-square  phase  error  as before,
using the sum of the solutions with separate  unity slopes,  ex-
cept that this time  we weighted  the outer  zones by  one-half
and the corners by one-fourth.  This was also done by Fried.4
Essentially this reduces  the size  of the aperture such that the
outer phase points lie on the boundary.  The results  are shown
in Fig.  4.  All three  configurations  improved,  but Fig.  1(C)
1.6  r
1.2  F-
4-z
0. 8  F-
0.4
C
__B~
A
I /
4 
8 
12 
N
I 
I 
16 
I 
I
20
FIG.  4.  Noise coefficient for zonal estimation on a reduced aperture.  The
phase reconstruction  error was evaluated  as  in Fig.  2, except that the  pe-
rimeter  points are weighted  by  one-half and  the corners  by  one-fourth.
1005 
J. Opt.  Soc.  Am.,  Vol. 70,  No. 8, August  1980
W. 14.  Southwell 
1005