Statistical Analysis With Latent Variables
User’s Guide
Linda K. Muthén
Bengt O. Muthén
Following is the correct citation for this document:
Muthén, L.K. and Muthén, B.O. (1998-2010). Mplus User’s Guide. Sixth Edition.
Los Angeles, CA: Muthén & Muthén
Copyright © 1998-2010 Muthén & Muthén
Program Copyright © 1998-2010 Muthén & Muthén
Version 6
April 2010
The development of this software has been funded in whole or in part with Federal funds
from the National Institute on Alcohol Abuse and Alcoholism, National Institutes of
Health, under Contract No. N44AA52008 and Contract No. N44AA92009.
Muthén & Muthén
3463 Stoner Avenue
Los Angeles, CA 90066
Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support@StatModel.com
TABLE OF CONTENTS
Chapter 1: Introduction
Chapter 2: Getting started with Mplus
Chapter 3: Regression and path analysis
Chapter 4: Exploratory factor analysis
Chapter 5: Confirmatory factor analysis and structural equation modeling
Chapter 6: Growth modeling and survival analysis
Chapter 7: Mixture modeling with cross-sectional data
Chapter 8: Mixture modeling with longitudinal data
Chapter 9: Multilevel modeling with complex survey data
Chapter 10: Multilevel mixture modeling
Chapter 11: Missing data modeling and Bayesian analysis
Chapter 12: Monte Carlo simulation studies
Chapter 13: Special features
Chapter 14: Special modeling issues
Chapter 15: TITLE, DATA, VARIABLE, and DEFINE commands
Chapter 16: ANALYSIS command
Chapter 17: MODEL command
Chapter 18: OUTPUT, SAVEDATA, and PLOT commands
Chapter 19: MONTECARLO command
Chapter 20: A summary of the Mplus language
PREFACE
We started to develop Mplus fifteen years ago with the goal of providing researchers with
powerful new statistical modeling techniques. We saw a wide gap between new
statistical methods presented in the statistical literature and the statistical methods used
by researchers in substantively-oriented papers. Our goal was to help bridge this gap
with easy-to-use but powerful software. Version 1 of Mplus was released in November
1998; Version 2 was released in February 2001; Version 3 was released in March 2004;
Version 4 was released in February 2006; and Version 5 was released in November 2007.
We are now proud to present the new and unique features of Version 6. With Version 6,
we have gone a considerable way toward accomplishing our goal, and we plan to
continue to pursue it in the future.
The new features that have been added between Version 5 and Version 6 would never
have been accomplished without two very important team members, Tihomir
Asparouhov and Thuy Nguyen. It may be hard to believe that the Mplus team has only
two programmers, but these two programmers are extraordinary. Tihomir has developed
and programmed sophisticated statistical algorithms to make the new modeling possible.
Without his ingenuity, they would not exist. His deep insights into complex modeling
issues and statistical theory are invaluable. Thuy has developed the post-processing
graphics module and the Mplus editor and language generator. In addition, Thuy has
programmed the Mplus language and is responsible for keeping control of the entire code,
which has grown enormously. Her unwavering consistency, logic, and steady and calm
approach to problems keep everyone on target. We feel fortunate to work with such a
talented team. Not only are they extremely bright, but they are also hard-working, loyal,
and always striving for excellence. Mplus Version 6 would not have been possible
without them.
Another important team member is Michelle Conn. Michelle was with us at the
beginning when she was instrumental in setting up the Mplus office and has been
managing the office for the past six years. In addition, Michelle is responsible for
creating the pictures of the models in the example chapters of the Mplus User’s Guide.
She has patiently and quickly changed them time and time again as we have repeatedly
changed our minds. She is also responsible for keeping the website updated and
interacting with customers. Her calm under pressure is much appreciated. Jean
Maninger joined the Mplus team after Version 4 was released. Jean works with Michelle
and has proved to be a valuable team member.
We would also like to thank all of the people who have contributed to the development of
Mplus in past years. These include Stephen Du Toit, Shyan Lam, Damir Spisic, Kerby
Shedden, and John Molitor.
Part of the work has been supported by SBIR contracts from NIAAA that we
acknowledge gratefully. We thank Bridget Grant for her encouragement in this work.
Linda K. Muthén
Bengt O. Muthén
Los Angeles, California
April 2010
CHAPTER 1
INTRODUCTION
Mplus is a statistical modeling program that provides researchers with a
flexible tool to analyze their data. Mplus offers researchers a wide
choice of models, estimators, and algorithms in a program that has an
easy-to-use interface and graphical displays of data and analysis results.
Mplus allows the analysis of both cross-sectional and longitudinal data,
single-level and multilevel data, data that come from different
populations with either observed or unobserved heterogeneity, and data
that contain missing values. Analyses can be carried out for observed
variables that are continuous, censored, binary, ordered categorical
(ordinal), unordered categorical (nominal), counts, or combinations of
these variable types. In addition, Mplus has extensive capabilities for
Monte Carlo simulation studies, where data can be generated and
analyzed according to any of the models included in the program.
The Mplus modeling framework draws on the unifying theme of latent
variables. The generality of the Mplus modeling framework comes from
the unique use of both continuous and categorical latent variables.
Continuous latent variables are used to represent factors corresponding
to unobserved constructs, random effects corresponding to individual
differences in development, random effects corresponding to variation in
coefficients across groups in hierarchical data, frailties corresponding to
unobserved heterogeneity in survival time, liabilities corresponding to
genetic susceptibility to disease, and latent response variable values
corresponding to missing data. Categorical latent variables are used to
represent latent classes corresponding to homogeneous groups of
individuals, latent trajectory classes corresponding to types of
development in unobserved populations, mixture components
corresponding to finite mixtures of unobserved populations, and latent
response variable categories corresponding to missing data.
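The difference between the two kinds of latent variables can be sketched
with a small simulation. The Python code below is an illustration only and
is not Mplus syntax. The function names, the factor loading, the residual
standard deviation, the class probabilities, and the class means are all
hypothetical values chosen for this example. A continuous latent variable
is drawn from a normal distribution and drives an observed indicator,
whereas a categorical latent variable is drawn as a class label and
selects which mean the observed outcome is generated from.

```python
import random


def simulate_factor_indicator(rng, loading=0.8, residual_sd=0.6):
    """Draw one continuous latent value f and one observed indicator y.

    Assumed model for this sketch: y = loading * f + residual.
    """
    f = rng.gauss(0.0, 1.0)  # continuous latent variable (factor)
    y = loading * f + rng.gauss(0.0, residual_sd)
    return f, y


def simulate_mixture_outcome(rng, class_probs=(0.3, 0.7),
                             class_means=(-2.0, 2.0), sd=1.0):
    """Draw one categorical latent class c and one observed outcome y.

    Assumed model for this sketch: y is normal with a class-specific mean.
    """
    labels = range(len(class_probs))
    c = rng.choices(labels, weights=class_probs)[0]  # categorical latent variable
    y = rng.gauss(class_means[c], sd)
    return c, y


rng = random.Random(2010)  # fixed seed so the draws are reproducible
factor_draws = [simulate_factor_indicator(rng) for _ in range(5000)]
mixture_draws = [simulate_mixture_outcome(rng) for _ in range(5000)]

# Proportion of simulated individuals that fall in the second latent class.
share_class_1 = sum(c == 1 for c, _ in mixture_draws) / len(mixture_draws)
```

In an actual analysis neither f nor c would be observed. Only y would be
available, and the latent values would have to be inferred from the model.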
THE Mplus MODELING FRAMEWORK
The purpose of modeling data is to describe the structure of data in a
simple way so that it is understandable and interpretable. Essentially,
the modeling of data amounts to specifying a set of relationships
between variables. The figure below shows the types of relationships
that can be modeled in Mplus. The rectangles represent observed
variables. Observed variables can be outcome variables or background
variables. Background variables are referred to as x; continuous and
censored outcome variables are referred to as y; and binary, ordered
categorical (ordinal), unordered categorical (nominal), and count
outcome variables are referred to as u. The circles represent latent
variables. Both continuous and categorical latent variables are allowed.
Continuous latent variables are referred to as f. Categorical latent
variables are referred to as c.
The arrows in the figure represent regression relationships between
variables. Regression relationships that are allowed but not specifically
shown in the figure include regressions among observed outcome
variables, among continuous latent variables, and among categorical
latent variables. For continuous outcome variables, linear regression
models are used. For censored outcome variables, censored (tobit)
regression models are used, with or without inflation at the censoring
point. For binary and ordered categorical outcomes, probit or logistic
regression models are used. For unordered categorical outcomes,
multinomial logistic regression models are used. For count outcomes,
Poisson and negative binomial regression models are used, with or
without inflation at the zero point.
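The correspondence between outcome type and regression model can be
sketched as a set of textbook mean and probability functions. The Python
code below is an illustration only and is not Mplus syntax. All function
names and the intercept and slope parameterization are assumptions made
for this example. The parameterization that Mplus itself uses (for
example, thresholds rather than intercepts for categorical outcomes, or
its choice of reference category) may differ.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_mean(a, b, x):
    """Continuous outcome: linear regression mean."""
    return a + b * x


def censor_from_below(y_star, limit=0.0):
    """Censored outcome: the latent response y_star is observed only above the limit."""
    return max(limit, y_star)


def probit_prob(a, b, x):
    """Binary outcome: probit regression probability of the outcome being 1."""
    return normal_cdf(a + b * x)


def logistic_prob(a, b, x):
    """Binary outcome: logistic regression probability of the outcome being 1."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def multinomial_logit_probs(intercepts, slopes, x):
    """Unordered categorical outcome: multinomial logistic probabilities.

    Assumption of this sketch: one extra reference category whose
    intercept and slope are fixed at zero is appended at the end.
    """
    scores = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def poisson_mean(a, b, x):
    """Count outcome: Poisson regression mean with a log link."""
    return math.exp(a + b * x)


def zero_inflated_poisson_prob_zero(a, b, x, inflation_prob):
    """Count outcome with inflation at zero: probability of observing a zero."""
    rate = poisson_mean(a, b, x)
    return inflation_prob + (1.0 - inflation_prob) * math.exp(-rate)
```

Each function takes a single background variable x for brevity. With
several background variables, the term a + b * x would become an
intercept plus a sum of slope-weighted variables.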