Simplifying the Estimation of Difference in Differences Treatment Effects with Stata
1. Introduction
2. diff syntax and equations
2.1 Options
2.2 Option: balancing test
3. Example
3.1 DID with no covariates
3.2 DID with covariates
3.3 Kernel Propensity Score DID
3.4 Quantile DID
3.5 Balancing test
4. Saved results
5. Acknowledgements
6. References
Munich Personal RePEc Archive

Simplifying the estimation of difference in differences treatment effects with Stata

Juan M. Villa
Brooks World Poverty Institute, University of Manchester

November 2012

Online at https://mpra.ub.uni-muenchen.de/43943/
MPRA Paper No. 43943, posted 22. January 2013 22:59 UTC
Simplifying the Estimation of Difference in Differences Treatment Effects with Stata*

Juan M. Villa
Brooks World Poverty Institute
University of Manchester
Manchester, UK.
juan.villalora@postgrad.manchester.ac.uk

***DRAFT VERSION***

*A previous version of this paper was presented at the 2012 UK Stata Users Group Meeting in London, UK. This version: November, 2012.

Abstract. This paper explains how the user-written Stata command diff estimates Difference in Differences (DID) treatment effects. The options and formulas are detailed for the single DID, the Kernel Propensity Score DID, the Quantile DID and the balancing properties. An example of the features of diff is presented using the dataset from Card and Krueger (1994).

Keywords: Difference in differences, causal inference, kernel propensity score, quantile treatment effects, quasi-experiments.

1. Introduction

Difference in Differences treatment effects (DID) have been widely used when the evaluation of a given intervention entails the collection of panel data or repeated cross sections. DID combines the logic of fixed effects estimators with causal inference analysis when unobserved events or characteristics confound the interpretation (Angrist and Pischke, 2008).

Other plausible methods exist for quasi-experimental causal inference with observational data (e.g. matching methods, instrumental variables and regression discontinuity). DID estimation offers an alternative route to unconfoundedness: it controls for time-invariant unobserved characteristics and combines this with observed or complementary information. Additionally, DID is a flexible form of causal inference because it can be combined with other procedures, such as the Kernel Propensity Score (Heckman et al., 1997, 1998) and quantile regression (Meyer et al., 1995).

In this paper, Stata's diff command is explained, and some details on its implementation are given using the dataset from the Card and Krueger (1994) article on the effects of an increase in the minimum wage. The paper also explains how the balancing properties can be tested when observational data are available.

Section 2 explains the equations behind the DID estimation along with the features of the diff command. Section 3 provides an example, including a test of the balancing properties with the options that can be specified with the command (Section 3.5), and Section 4 describes the saved results.

2. diff syntax and equations

diff can be installed or updated from the SSC archive by running the command:

    ssc install diff, replace

The diff syntax is as follows:

    diff outcome_var [if] [in] [weight] , [options]

The command requires the outcome variable (outcome_var) and allows weights, except with some options. The first required option is period(varname), a dummy variable indicating the baseline (period == 0) and follow-up (period == 1) periods. The option treated(varname) is also required; it is a dummy variable identifying the control (treated == 0) and treated (treated == 1) individuals.

For individual $i$, this initial setting performs the following linear regression:

$$Y_i = \beta_0 + \beta_1\,\text{period}_i + \beta_2\,\text{treated}_i + \beta_3\,(\text{period}_i \times \text{treated}_i) + \varepsilon_i$$

The estimated coefficients have the following interpretation:

$\beta_0$: the mean outcome for the control group at baseline.
$\beta_0 + \beta_1$: the mean outcome for the control group at follow-up.
$\beta_2$: the single difference between the treated and control groups at baseline.
$\beta_0 + \beta_2$: the mean outcome for the treated group at baseline.
$\beta_0 + \beta_1 + \beta_2 + \beta_3$: the mean outcome for the treated group at follow-up.
$\beta_3$: the DID, or impact.

The diff command arranges these coefficients in the output table. The number of observations, the R-squared, standard errors, t-statistics (or z-statistics when standard errors are bootstrapped) and p-values are also reported:

    Number of observations in the DIFF-IN-DIFF: #
                Baseline       Follow-up
       Control: #              #              #
       Treated: #              #              #
                #              #
    R-square:    0.0
    DIFFERENCE IN DIFFERENCES ESTIMATION
                      ----------- BASE LINE ----------- ----------- FOLLOW UP -----------
    Outcome Variable | Control | Treated | Diff(BL) | Control | Treated | Diff(FU) | DIFF-IN-DIFF
    -----------------+---------+---------+----------+---------+---------+----------+-------------
    outcome_variable |         |         |          |         |         |          |
    Std. Error       |         |         |          |         |         |          |
    t/z              |         |         |          |         |         |          |
    P>|t/z|          |         |         |          |         |         |          |
    ---------------------------------------------------------------------------------------------
    * Means and Standard Errors are estimated by linear regression
    **Inference: *** p<0.01; ** p<0.05; * p<0.1

2.1 Options

cov(varlist) - Specifies the pre-treatment covariates of the model. These variables are also known as controls or observable characteristics. If $X_{ki}$ denotes the $k$-th covariate for individual $i$, diff runs the following regression with this option:

$$Y_i = \beta_0 + \beta_1\,\text{period}_i + \beta_2\,\text{treated}_i + \beta_3\,(\text{period}_i \times \text{treated}_i) + \sum_{k} \delta_k X_{ki} + \varepsilon_i$$

The covariate coefficients are not reported in the output table, but they can be requested by specifying the option report.

kernel - Performs the Kernel-based Propensity Score DID. In a first stage, this option runs a probit model (or a logit model, if the option logit is specified) of treated(varname) on cov(varlist). It generates the variable _weights, which contains the weights derived from the kernel density function, and the variable _ps, which contains the propensity score when it is not supplied through pscore(varname). This option requires the identifier id(varname) of each individual; hence it is not compatible with repeated cross sections. It also allows the estimation of the DID on the common support by specifying the option support.

In a second stage, diff runs the DID regression with Stata's analytic weights option, [aw=_weights], using the weights obtained from the propensity score.
Option kernel can be customized by selecting the bandwidth, bw(#), and the kernel type, ktype(kernel), according to Stata's kdensity choices. Finally, the first stage is shown explicitly if report is specified.

qdid(quantile) - Performs the Quantile Difference in Differences (QDID) estimation at the specified quantile, from 0.1 to 0.9 (quantile 0.5 performs the QDID at the median). It may be combined with the kernel and cov(varlist) options. qdid(quantile) supports neither weights nor robust standard errors. This option uses Stata's qreg, and bsqreg for bootstrapped standard errors. See Angrist and Pischke (2008) for detailed information on Quantile Treatment Effects and Meyer et al. (1995) for an illustrative example.

cluster(varname) - Calculates clustered standard errors by varname.

robust - Calculates robust standard errors.

bs - Performs a bootstrap estimation of coefficients and standard errors.

reps(int) - Specifies the number of repetitions when bs is selected. The default is 50 repetitions.

nostar - Removes the inference stars from the p-values.

2.2 Option: balancing test

test - Performs a balancing t-test of the difference in means of the specified covariates between the control and treated groups at baseline (period == 0). Without kernel, Stata's ttest command is used to estimate the t-statistics and standard errors: for each variable in cov(varlist), the test option runs the command

    ttest varname if period == 0, by(treated)

Combined with kernel, the option test performs the balancing t-test on the weighted covariates, and the differences, t-statistics and standard errors are generated with linear regression instead.

3. Example

diff offers an example with the dataset from Card and Krueger (1994). It can be downloaded into the working directory and loaded by running:

    net get diff
    use cardkrueger1994, clear

In this example, the authors study the impact of the increase in the minimum wage in the state of New Jersey (the treated group) on the employment level in the fast food industry. They compare the changes in the number of employees at the restaurants in this treated group with those in the neighboring state, Pennsylvania (the control group). They collected a baseline in February 1992 and a follow-up in November 1992.
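The arithmetic behind the DIFF-IN-DIFF column can be checked outside Stata. The following is a minimal sketch in Python, purely illustrative and not part of diff, assuming hypothetical toy data rather than the Card and Krueger sample. It computes Diff(BL), Diff(FU) and their difference from the four period-by-group means, and verifies that the same number is recovered as b3 from an OLS fit of y = b0 + b1*period + b2*treated + b3*(period*treated) + e. The helper names cell_means, did_from_means, ols and did_regression are illustrative assumptions.

```python
"""Illustrative sketch (not part of diff): the 2x2 DID arithmetic on
hypothetical toy data, checked against the interaction regression
    y = b0 + b1*period + b2*treated + b3*(period*treated) + e."""

from statistics import mean


def cell_means(rows):
    """Mean outcome per (period, treated) cell; rows are (y, period, treated)."""
    cells = {}
    for y, period, treated in rows:
        cells.setdefault((period, treated), []).append(y)
    return {cell: mean(values) for cell, values in cells.items()}


def did_from_means(rows):
    """Return (Diff(BL), Diff(FU), DIFF-IN-DIFF), each as treated minus control."""
    m = cell_means(rows)
    diff_bl = m[(0, 1)] - m[(0, 0)]   # baseline: treated minus control
    diff_fu = m[(1, 1)] - m[(1, 0)]   # follow-up: treated minus control
    return diff_bl, diff_fu, diff_fu - diff_bl


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def did_regression(rows):
    """Regress y on a constant, period, treated and period*treated."""
    X = [[1.0, p, t, p * t] for _, p, t in rows]
    y = [yi for yi, _, _ in rows]
    return ols(X, y)   # [b0, b1, b2, b3]


if __name__ == "__main__":
    # Hypothetical observations: (outcome, period, treated)
    rows = [(20.0, 0, 0), (22.0, 0, 0), (18.0, 1, 0), (20.0, 1, 0),
            (17.0, 0, 1), (19.0, 0, 1), (18.0, 0, 1),
            (19.0, 1, 1), (20.0, 1, 1), (18.0, 1, 1)]
    print(did_from_means(rows))   # (-3.0, 0.0, 3.0)
    print(did_regression(rows))   # b3 (last element) is 3.0 up to rounding
```

Because this regression has four parameters for four period-by-group cells, its fitted values equal the cell means, which is why b3 coincides with the difference of the two single differences. Once covariates enter through cov(varlist), b3 is no longer a simple function of the raw cell means, which is why the covariate-adjusted estimate in Section 3.2 (2.935) differs from the unadjusted 2.914 in Section 3.1.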
The variables in the dataset are described as follows:

    Contains data from cardkrueger1994.dta
      obs:           820                          Dataset from Card&Krueger (1994)
     vars:             8
     size:        18,860 (99.9% of memory free)
    ------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    ------------------------------------------------------------------------------
    id              int    %8.0g                  Store ID
    t               byte   %8.0g                  Feb. 1992 = 0; Nov. 1992 = 1
    treated         long   %8.0g       treated    New Jersey = 1; Pennsylvania = 0
    fte             float  %9.0g                  Output: Full Time Employment
    bk              byte   %8.0g                  Burger King == 1
    kfc             byte   %8.0g                  Kentuky Fried Chiken == 1
    roys            byte   %8.0g                  Roy Rogers == 1
    wendys          byte   %8.0g                  Wendy's == 1
    ------------------------------------------------------------------------------
    Sorted by:  id

With 820 observations, the numbers of individuals (stores) are 331 and 79 in the treated and control groups, respectively. The outcome variable is fte, while the covariates are dummy variables indicating whether the observation belongs to a given fast food restaurant chain. The basic statistics are as follows:

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
              id |       820    246.5073    148.1413          1        522
               t |       820          .5    .5003052          0          1
         treated |       820    .8073171    .3946469          0          1
             fte |       801    17.59457    9.022517          0         80
              bk |       820    .4170732    .4933761          0          1
             kfc |       820     .195122    .3965364          0          1
            roys |       820    .2414634    .4282318          0          1
          wendys |       820    .1463415    .3536639          0          1

3.1 DID with no covariates

    diff fte, t(treated) p(t)

The output table of this initial setting is:

    Number of observations in the DIFF-IN-DIFF: 801
                Baseline       Follow-up
       Control: 78             77             155
       Treated: 326            320            646
                404            397
    R-square:    0.00805
    DIFFERENCE IN DIFFERENCES ESTIMATION
                      ----------- BASE LINE ----------- ----------- FOLLOW UP -----------
    Outcome Variable | Control | Treated | Diff(BL) | Control | Treated | Diff(FU) | DIFF-IN-DIFF
    -----------------+---------+---------+----------+---------+---------+----------+-------------
    fte              | 19.949  | 17.065  | -2.884   | 17.542  | 17.573  | 0.030    | 2.914
    Std. Error       | 1.019   | 0.499   | 1.135    | 1.026   | 0.503   | 1.143    | 1.611
    t                | 19.57   | 14.17   | -2.54    | 17.60   | 20.45   | -0.33    | 1.81
    P>|t|            | 0.000   | 0.000   | 0.011**  | 0.000   | 0.000   | 0.979    | 0.071*
    ---------------------------------------------------------------------------------------------
    * Means and Standard Errors are estimated by linear regression
    **Inference: *** p<0.01; ** p<0.05; * p<0.1

The baseline block reports the mean outcome for each group and their difference (-2.884 in this case). These estimates are presented along with their standard errors, t-statistics and p-values. The same information is shown for the follow-up (with a difference of 0.030). The last column is the difference in differences, that is, 0.030 - (-2.884) = 2.914. The p-values are accompanied by stars that indicate statistical significance at different levels.

Alternatively, bootstrapped standard errors can be requested by adding the option bs:

    diff fte, t(treated) p(t) bs rep(50)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    Number of observations in the DIFF-IN-DIFF: 801
                Baseline       Follow-up
       Control: 78             77             155
       Treated: 326            320            646
                404            397
    R-square:    0.00805
    Bootstrapped Standard Errors
    DIFFERENCE IN DIFFERENCES ESTIMATION
                      ----------- BASE LINE ----------- ----------- FOLLOW UP -----------
    Outcome Variable | Control | Treated | Diff(BL) | Control | Treated | Diff(FU) | DIFF-IN-DIFF
    -----------------+---------+---------+----------+---------+---------+----------+-------------
    fte              | 19.949  | 17.065  | -2.884   | 17.542  | 17.573  | 0.030    | 2.914
    Std. Error       | 1.330   | 0.494   | 1.381    | 0.830   | 0.477   | 0.920    | 1.792
    z                | 15.00   | 14.12   | -2.09    | 17.05   | 20.76   | 0.28     | 1.63
    P>|z|            | 0.000   | 0.000   | 0.037**  | 0.000   | 0.000   | 0.974    | 0.104
    ---------------------------------------------------------------------------------------------
    * Means and Standard Errors are estimated by linear regression
    **Inference: *** p<0.01; ** p<0.05; * p<0.1

3.2 DID with covariates

    diff fte, t(treated) p(t) cov(bk kfc roys)

    DIFFERENCE-IN-DIFFERENCES WITH COVARIATES
    Number of observations in the DIFF-IN-DIFF: 801
                Baseline       Follow-up
       Control: 78             77             155
       Treated: 326            320            646
                404            397
    R-square:    0.18784
    DIFFERENCE IN DIFFERENCES ESTIMATION
                      ----------- BASE LINE ----------- ----------- FOLLOW UP -----------
    Outcome Variable | Control | Treated | Diff(BL) | Control | Treated | Diff(FU) | DIFF-IN-DIFF
    -----------------+---------+---------+----------+---------+---------+----------+-------------
    fte              | 21.161  | 18.837  | -2.324   | 18.758  | 19.369  | 0.611    | 2.935
    Std. Error       | 1.142   | 0.851   | 1.031    | 1.158   | 0.853   | 1.037    | 1.460
    t                | 18.53   | 18.43   | -2.25    | 19.09   | 19.87   | 0.51     | 2.01
    P>|t|            | 0.000   | 0.000   | 0.024**  | 0.000   | 0.000   | 0.556    | 0.045**
    ---------------------------------------------------------------------------------------------
    * Means and Standard Errors are estimated by linear regression
    **Inference: *** p<0.01; ** p<0.05; * p<0.1

The option report displays the coefficients of the covariates in cov(varlist):

    Covariates and Coefficients:
    -------------------------------------------------------------------
    Variable(s)          |   Coeff.   | Std. Err. |    t    |  P>|t|
    ---------------------+------------+-----------+---------+----------
    bk                   |   0.917    |   0.889   |  1.032  |  0.303
    kfc                  |  -9.205    |   1.006   | -9.154  |  0.000
    roys                 |  -0.897    |   0.967   | -0.927  |  0.354
    -------------------------------------------------------------------

3.3 Kernel Propensity Score DID

The Kernel Propensity Score DID can also be estimated on the common support of the propensity score (option support). If you have previously estimated the propensity score, you can provide it with the option pscore(varname). The basic syntax is:

    diff fte, t(treated) p(t) cov(bk kfc roys) kernel id(id)

Adding the option report also displays the first-stage estimation:

    diff fte, t(treated) p(t) cov(bk kfc roys) kernel id(id) report

which produces the following output table:

    KERNEL PROPENSITY SCORE DIFFERENCE-IN-DIFFERENCES
    Report - Propensity score estimation:
    Iteration 0:   log likelihood = -198.21978
    Iteration 1:   log likelihood =  -196.7657
    Iteration 2:   log likelihood =  -196.7636

    Probit regression                                 Number of obs   =        404
                                                      LR chi2(3)      =       2.91
                                                      Prob > chi2     =     0.4053
    Log likelihood =  -196.7636                       Pseudo R2       =     0.0073

    ------------------------------------------------------------------------------
         treated |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              bk |   .1812529   .2090916     0.87   0.386    -.2285591    .5910649
             kfc |   .3888298    .246799     1.58   0.115    -.0948873    .8725469
            roys |   .2997977   .2318227     1.29   0.196    -.1545664    .7541618
           _cons |   .6476036   .1777446     3.64   0.000     .2992305    .9959767
    ------------------------------------------------------------------------------
    Number of observations in the DIFF-IN-DIFF: 800
                Baseline       Follow-up