MonteCarloStatisticalMethods
ChristianP.Robert
CREST,Insee,Paris
GeorgeCasella
CornellUniversity,Ithaca,NY
DraftVersion.
February,
CHAPTER
Introduction
Version.February,
Untiltheadventofpowerfulandaccessiblecomputingmethods,theex-
perimenterwasconfrontedwithadicultchoice.Eitherdescribeanaccu-
ratemodelofaphenomenon,whichwouldusuallypreventthecomputation
ofexplicitanswers,orchooseastandardmodelwhichwouldallowthiscom-
putation,butwouldoftennotbeacloserepresentationofarealisticmodel.
Thisdilemmaispresentinmanybranchesofstatisticalapplications,for
exampleinelectricalengineering,aeronautics,biology,networks,andas-
tronomy.Touserealisticmodels,theresearchersinthesedisciplineshave
oftendevelopedoriginalapproachesformodelttingthatarecustomized
fortheirownproblems.(Thisisparticularlytrueofphysicists,theorig-
inatorsofMarkovchainMonteCarlomethods.)Traditionalmethodsof
analysis,suchastheusualnumericalanalysistechniques,turnouttobe
notwelladaptedforsuchsettings.Therstsectionofthischapterpresents
anumberofexamplesofstatisticalmodels,someofwhichwereinstrumen-
talindevelopingtheeldofsimulation-basedinference.Theremaining
sectionsdescribethedicultiesspecictomostcommonstatisticalmeth-
ods,whilethenalsectioncontainsacomparisonwithnumericalanalysis
techniques.
.StatisticalModels
Inapurelystatisticalsetup,computationaldicultiesoccuratboththe
levelofprobabilisticmodelingoftheinferredphenomenonandatthelevel
ofstatisticalinferenceonthismodel(estimation,prediction,tests,variable
selection,etc.).Intherstcase,adetailedrepresentationofthecauses
ofthephenomenon,suchasaccountingforpotentialexplanatoryvariables
linkedtothephenomenon,canleadtoaprobabilisticstructurewhichistoo
complextoallowforaparametricrepresentationofthemodel.Moreover,
theremaybenoprovisionforgettingclosed-formestimatesofquantitiesof
interest.Afrequentsetupwiththistypeofcomplexityisanexpertsystems
(inmedicine,physics,nance,etc.)ormoregenerallyagraphstructure.
Figure..givesanexampleofsuchastructureanalyzedinSpiegelhalter
etal.( ).Itisrelatedtothedetectionofaleftventriclehypertrophia
(LVH),wherethelinksbetweencausesrepresentprobabilisticdependen-
n2: Disease?
n4: LVH?
n9: Sick?
n1: Birth
asphyxia?
n5: Duct
flow?
n6: Cardiac
mixing?
n7: Lung
parenchyma?
n8: Lung
flow?
n3: Age at
presentation?
INTRODUCTION
[..
Figure...Probabilisticrepresentationoflinksbetweencausesofleftventricle
hypertrophia(Source:Spiegelhalteretal., )
cies.(Themotivationbehindtheanalysisistoimprovethepredictionof
thisdisease.)Inthiscase,theconditionaldistributionsofnodeswithre-
specttotheirparentsleadtothejointdistribution.SeeRobert( )or
Lauritzen( )forotherexamplesofcomplexexpertsystemswherethis
reconstitutionisimpossible.
Asecondsetupwheremodelcomplexityprohibitsanexplicitrepresen-
tationappearsineconometrics(andinmanyotherareas)forstructures
oflatent(ormissing)variablemodels.Givena\simple"model,aggrega-
tionorremovalofsomecomponentsofthismodelmaysometimesinduce
suchinvolvedstructuresthatsimulationistrulytheonlywaytodrawan
inference.(Chapter providesaseriesofexamplesofsuchmodelswhere
simulationmethodsarenecessary.)
Example..{Censoreddatamodels{Censoreddatamodelscan
beconsideredtobemissingdatamodelswheredensitiesarenotsampled
directly.Toobtainestimates,andmakeinferences,usuallyrequirespro-
grammingorcomputingtimeandprecludesanalyticalanswers.
Barringcaseswherethecensoringphenomenoncanbeignored(seeChap-
ter ),severaltypesofcensoringcanbecategorizedbytheirrelationwith
anunderlying(unobserved)model,Yif(yij):
(i)GivenrandomvariablesYi,whichmaybetimesofobservationorcon-
centrations,theactualobservationsareYi=minfYi;ugwhereuisthe
maximalobservationdurationorthesmallestmeasurableconcentration
Claudine,notChristian!
n20: Grunting
report?
n17: Right
up. quad. O2?
n18: CO2
report?
n10: Hypoxia
distribution?
n11: Hypoxia
in O2?
n16: Lower
body O2?
n19: X-ray
report?
n12: CO2?
n14: Grunting?
n13: Chest
X-ray?
n15: LVH
report?
..]
STATISTICALMODELS
rate.
(ii)TheoriginalvariablesYiarekeptinthesamplewithprobability(yi)
andthenumberofdiscardedvariablesiseitherknownorunknown.
(iii)ThevariablesYiareassociatedwithauxiliaryvariablesXigsuch
thatyi=h(yi;xi)istheobservation.Typically,h(yi;xi)=min(yi;xi).
Thefactthattruncationoccurred,namelythevariableIIyi>xi,maybe
eitherknownorunknown.
Asanexample,ifXN(;)andYN(;),thevariableZ=
X^Y=min(X;Y)isdistributedas
z
_'z
z'z
+
(..)
where'isthedensityofthenormalN( ;)distributionandisthe
correspondingcdf,whichisnoteasytocompute.Similarly,ifXhasa
Weibulldistributionwithtwoparameters,We(;)anddensity
f(x)=xex
onIR+,theobservationofthecensoredvariableZ=X^!,where!is
constant,hasthedensity
f(z)=zezIIz!+Z!xexdx!(z);
(..)
wherea()istheDiracmassata.Inthiscase,theweightoftheDirac
mass,P(X!),cannotbeexplicitlycomputed.
Thedistributions(..)and(..)appearnaturallyinqualitycontrol
applications.There,testingofaproductmaybeofaduration!,where
thequantityofinterestistimetofailure.Iftheproductisstillfunctioning
attheendoftheexperiment,theobservationonfailuretimeiscensored.
Similarly,inalongitudinalstudyofadisease,somepatientsmayleavethe
studyeitherduetootherdeathcausesorbysimplydroppingout.
k
Insomecases,theadditiveformofadensity,whileformallyexplicit,
prohibitsthecomputationofthedensityofasample(X;;Xn)forn
large.(Here,\explicit"hastherestrictivemeaningthat\itcanbecomputed
inareasonabletime".)
Example..{Mixturemodels{Modelsofmixturesofdistributions
arebasedontheassumptionthattheobservationsaregeneratedfromone
ofkelementarydistributionsfiwithprobabilitypi,theoveralldensity
being
pf(x)++pkfk(x):
INTRODUCTION
[..
Anexpansionofthedistributionof(X;;Xn),
nYi=fpf(xi)++pkfk(xi)g;
involvesknelementaryterms,whichisprohibitiveforlargesamples.While
thecomputationofstandardmomentslikethemeanorthevarianceof
thesedistributionsisfeasibleinmostsetups(andthusthederivationof
momentestimators,seeProblem.),therepresentationofthelikelihood
(andthereforetheanalyticalcomputationofmaximumlikelihoodorBayes
estimates)isgenerallyimpossibleformixtures.
k
Lastly,welookataparticularlyimportantexampleintheprocessing
oftemporal(ortimeseries)datawherethelikelihoodcannotbewritten
explicitly.
Example..{Movingaveragemodel{AnMA(q)modeldescribes
variables(Xt)thatcanbemodeledas(t= ;:::;n)
Xt="t+qXj=j"tj;
(..)
wherefori=q;(q);,the"i'sarei.i.d.randomvariables"i
N( ;)andforj=;;q,thejsareunknownparameters.Ifthe
sampleconsistsoftheobservation(X ;;Xn),wheren>q,thesample
densityis(Problem.)
qYi='"i'x Pqi=i"i
ZIRq(n+q)
'x^"oPqi=i"i
(..)
'xnPqi=i^"ni
d"d"q;
with
^" =x qXi=i"i;
^"=xqXi=i"i^" ;
:::
^"n=xnqXi=i^"ni:
Theiterativedenitionofthe^"i'sisarealobstacletoanexplicitintegration
in(..)whichhindersstatisticalinferenceinthesemodels.Notethatfor
i=q;(q);;theperturbations"icanbeinterpretedasmissing
k
data(seeChapter ).
..]
STATISTICALINFERENCE
Beforetheintroductionofsimulation-basedinference,computationaldif-
cultiesencounteredinthemodelingofaproblemoftenforcedtheuse
of\standardmodels"and\standard"distributions.Onecoursewouldbe
tousemodelsbasedonexponentialfamilies(..)(seeLehmann, ,
Brown, orRobert, ),whichenjoynumerousregularityproperties
(seeNote..).Anothercoursewastoabandonparametricrepresenta-
tionsfornon-parametricapproacheswhicharebydenitionrobustagainst
modelingerrors.Ineconometrics,thecomputingbottleneckcreatedbythe
needforexplicitsolutionshasledtotheuseoflinearstructuresofdepen-
dence(seeGourierouxandMonfort, , ).
.StatisticalInference
Thestatisticaltechniquesthatwewillbemostconcernedwitharemax-
imumlikelihoodandBayesianmethods,andtheinferencesthatcanbe
drawnfromtheiruse.Intheirimplementation,theseapproachesarecus-
tomarilyassociatedwithspecicmathematicalcomputations,theformer
withmaximizationproblems|andthustoanimplicitdenitionofestima-
torsassolutionsofmaximizationproblems|,thelaterwithintegration
problems|andthustoa(formally)explicitrepresentationofestimatorsas
anintegral.(SeeLehmann ,Berger ,CasellaandBerger or
Robert foranintroductiontothesetechniques.)Aspreviouslymen-
tioned,reductiontosimple,perhapsnon-realistic,distributionswasoften
necessitatedbycomputationallimitations,butitisalsothecasethatthe
reductiontosimpledistributionsdoesnotnecessarilyeliminatetheissue
ofnon-explicitexpressions,whateverthestatisticaltechnique.Ourmajor
focusistheapplicationofsimulation-basedtechniquestoprovidesolutions
andinferenceforamorerealisticsetofmodels,andhencecircumventthe
problemsassociatedwiththeneedforexplicitorcomputationallysimple
answers.
Alternativeapproaches(see,forinstance,GourierouxandMonfort )
involvesolvingimplicitequationsformethodsofmomentsorminimization
ofgeneralizeddistances(forM-estimators).Approachesbyminimaldis-
tancecaningeneralbereformulatedasmaximizationsofformallikelihoods
asillustratedinExample..below,whilethemethodofmomentscan
sometimesbeexpressedasaderivationofamaximizationproblem,thatis
asadierenceequation.Notehoweverthatsuchaninterpretationisrare
andalsothatthemethodofmomentsisgenerallysub-optimalwhencom-
paredwithBayesianormaximumlikelihoodapproaches,theselattertwo
methodsusingmoreecientlytheinformationcontainedinthedistribu-
tionoftheobservations,accordingtotheLikelihoodPrinciple(seeRobert
).Butthemomentestimatorsarestillofinterestasstartingvalues
foriterativemethodsaimingatmaximizingthelikelihood,sincetheyare
convergentinmostsetups.Forinstance,inthecaseofnormalmixtures,
whilethelikelihoodisnotbounded(seeExample..below)andthere-
forethereisnomaximumlikelihoodestimator,itcanbeshownthatthe
INTRODUCTION
[..
solutionofthelikelihoodequationswhichisclosertothemomentestimator
isaconvergentestimator(seeLehmann ).
Example..{LeastSquaresEstimators{Estimationbyleastsquares
canbetracedbacktoGauss( )andLegendre( )(seeStigler ).
Intheparticularcaseoflinearregressionweobserve(xi;Yi),i=;;n,
where
(..)
Yi=a+bxi+"i;
i=;;n;
andthevariables"irepresenterrors.Theparameter(a;b)isestimatedby
minimizingthedistance
nXi=(yiaxib)
(..)
in(a;b),yieldingtheleastsquaresestimates.Ifweaddmorestructureto
theerrorterm,inparticularthat"iN( ;),independent(equivalently,
YijxiN(axi+b;)),thelog-likelihoodfunctionfor(a;b)isproportional
to
log(n)nXi=(yiaxib)=;
anditfollowsthatthemaximumlikelihoodestimatesofaandbareiden-
ticaltotheleastsquaresestimates.However,thelikelihoodstructurealso
providesanestimatorof.
Therefore,if,in(..)weassumeIE("i)= ,orequivalentlythatthelin-
earrelationshipIE[Yjx]=ax+bholds,minimizationof(..)isequivalent,
fromacomputationalpointofview,toimposinganormalityassumptionon
Yconditionallyonxandapplyingmaximumlikelihood.Inthislattercase
theadditionalestimatorofisconsistentifthenormalapproximationis
asymptoticallyvalid.
k
Althoughsomewhatobvious,thisformalequivalencebetweentheopti-
mizationofafunctiondependingontheobservationsandthemaximization
ofalikelihoodassociatedwiththeobservationshasanontrivialoutcome,
andappliesinmanyothercases.Forexample,inthecasewherethepa-
rametersareconstrainedRobertsonetal.( )considerapqtableof
randomvariablesYijwithmeansij,wherethemeansareincreasingini
andj.Estimationoftheij'sbyminimizingthesumofthe(yijij)'sis
possiblethroughthe(numerical)algorithmcalled\pool-adjacent-violators"
anddevelopedbyRobertsonetal.( )tosolvethisspecicproblem.(See
Problems.and..)Analternativeistouseanalgorithmbasedon
simulationandarepresentationbyanormallikelihoodoftheproblem(see
x..).
.LikelihoodMethods
Themethodofmaximumlikelihoodestimationisquiteapopulartechnique
forderivingestimators.StartingfromaniidsampleX;:::;Xnfroma