Basic and Advanced Multilevel Modeling with R and Stan
Prerequisites (knowledge of topic)
A strong background in linear regression is a necessity. Background exposure to maximum likelihood models like logistic regression would be very helpful but is not strictly necessary. Some previous exposure to multilevel, longitudinal, panel, or mixed effects models would also be very helpful but is not necessary. People without a background in multilevel models should (time permitting) order a copy of either Multilevel Analysis: Techniques and Applications by Joop Hox, Mirjam Moerbeek, and Rens van de Schoot (2017) or Multilevel Analysis by Tom Snijders and Roel Bosker (2011) and attempt to read the early chapters ahead of time. Again, this is not a requirement to attend the class, but it will help you absorb the lecture material much more easily.
A laptop is required, preferably a PC, as that is what I use. Please ensure that you have administrator access on your machine, or that someone who does can help you install the needed software prior to the workshop. Otherwise you will be unable to follow along with the labs in class.
The course will use R and RStudio, which are both free and open source. We will mainly be using the R packages lme4 and brms as well as some extensions. Please have both programs and the specific packages installed on your machine before you arrive. Note that brms requires Rtools to be installed on your machine, and installing Rtools requires administrator access. An install script for all needed packages will be provided for registered students.
Note: if you are primarily a Stata user, I can provide you with some code (for version 16) to do many of the things covered in the course. However, we will not have time to go through it in class.
This course is designed to provide a practical guide to fitting advanced multilevel models. It is pitched for people from widely different backgrounds, so a significant amount of attention is paid to translating concepts across fields. My approach to the class combines work from econometrics, statistics/biostatistics, and psychometrics. The class is structured using a maximum likelihood framework with practical applied Bayesian extensions on different topics. R packages are selected specifically to make the transition from MLE to Bayesian multilevel models as straightforward and seamless as possible. This is a very applied course with annotated code provided and time in class for lab work. However, it is necessary to spend some class time working through theory and interpretation as well as the logic of mixed effects models.
Specific topics include:
• Random intercept and random slope models
• Cross-classified and multiple membership models
• Generalized linear mixed models
• Special topics chosen by students
The last day of class will have material chosen by the students from a predetermined list of possible topics. In order for your topic to be considered, you must respond to the course survey by the end of lunch on Monday so that we can discuss updates during the afternoon.
While you will not be an expert in multilevel modeling after one week (this takes years of practice), you will have the tools to go home and fit many advanced models in your own work. By the end of the week you will have practical experience fitting both Bayesian and likelihood versions of basic and advanced multilevel models with RStudio. You will be able to produce diagnostics and results and, hopefully, interpret them correctly. If you use the models in your own work and read the supplementary materials for the course, you will end up with a very high level of knowledge in multilevel modeling over time. While we do cover Bayesian extensions for multilevel models, this course is not a substitute for a fully-fledged course on Bayesian data analysis. However, it will leave you very well prepared for such a course or for reading a Bayesian analysis textbook.
Day 1 - Introduction
• The basic multilevel modeling toolkit
o Random intercept models
o Random coefficient models
o Fixed effects models
• The model fitting checklist
o Data structures
o Missing data and selection bias
o Omitted variable bias
o Latent dependency structures
Afternoon Discussion: Modeling Test Scores
• This is the basic students-in-classes-in-schools example on steroids. We will discuss the differences between designs where we have some exogenous intervention for students (that is, a causal inference model) and designs where we have observational data and can only really model correlations, but where potentially very complex structures are at work.
• Software Introduction to lme4 and brms
• Fitting random intercept and random slope models with lme4 and brms
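As a rough preview of the lab, the sketch below simulates a small students-in-schools data set and fits a random intercept model and a random slope model. It is only an illustration under stated assumptions: the variable names (score, ses, school) and all simulated values are hypothetical and are not the course data. The lme4 fits run only if the package is installed, and the brms call is left commented out because it needs a working Stan toolchain.

```r
# Simulated students-in-schools data; every name and value here is hypothetical.
set.seed(42)
n_schools <- 30
n_per     <- 20
school_id <- rep(seq_len(n_schools), each = n_per)

u0  <- rnorm(n_schools, sd = 2)     # school-specific intercept deviations
u1  <- rnorm(n_schools, sd = 0.5)   # school-specific slope deviations
ses <- rnorm(n_schools * n_per)     # student-level predictor

score <- 50 + u0[school_id] + (3 + u1[school_id]) * ses +
  rnorm(n_schools * n_per, sd = 4)

d <- data.frame(score = score, ses = ses, school = factor(school_id))

if (requireNamespace("lme4", quietly = TRUE)) {
  # Random intercept: each school gets its own baseline score.
  m_ri <- lme4::lmer(score ~ ses + (1 | school), data = d)

  # Random slope: the effect of ses is also allowed to vary by school.
  m_rs <- lme4::lmer(score ~ ses + (1 + ses | school), data = d)

  print(lme4::fixef(m_rs))     # population-level coefficients
  print(lme4::VarCorr(m_rs))   # estimated variance components
}

# brms uses the same formula syntax (not run here; requires Stan):
# b_rs <- brms::brm(score ~ ses + (1 + ses | school), data = d)
```

The point of the side-by-side formulas is that the grouping term in parentheses carries over unchanged from lme4 to brms.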
Day 2 - Complex Data Structures
• Review of the toolkit and checklist
• Grouping structures
o What kinds of groups do you have?
o How many groups do you have?
• Latent dependency structures
o Groups and experiments
o Groups and time
o Groups and space
o Groups and networks
Afternoon Discussion: Modeling State Policymaking
• We will work out the logic of how to model environmental policy adoption in US states over time. We will mainly follow along with the design and analysis for my paper on state environmental policy adoption. This example highlights latent dependency structures (time, space, networks, latent classes) and complicated grouping structures.
• Fitting cross-classified models with lme4 and brms
• Fitting multiple membership models with lme4 and brms
• Modeling problems for interference, time, space, and networks
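A cross-classified structure of the kind discussed above can be sketched as follows: observations nested in states and in years at the same time, with neither grouping nested inside the other. The data are simulated and the names (state, year_f, x, y) are hypothetical; this is not the design from the paper. The lme4 fit runs only if the package is installed.

```r
# Simulated state-by-year panel; all names and values are hypothetical.
set.seed(1)
states <- paste0("S", 1:20)
years  <- 2001:2010
d <- expand.grid(state = states, year = years, stringsAsFactors = FALSE)

state_eff <- setNames(rnorm(length(states), sd = 1.0), states)
year_eff  <- setNames(rnorm(length(years), sd = 0.5), as.character(years))

d$x <- rnorm(nrow(d))
d$y <- unname(1 + 0.5 * d$x +
                state_eff[d$state] +
                year_eff[as.character(d$year)] +
                rnorm(nrow(d)))
d$year_f <- factor(d$year)

if (requireNamespace("lme4", quietly = TRUE)) {
  # Two crossed random intercepts: one for state, one for year.
  m_cc <- lme4::lmer(y ~ x + (1 | state) + (1 | year_f), data = d)
  print(lme4::VarCorr(m_cc))
}

# brms accepts the same crossed formula (not run here; requires Stan):
# b_cc <- brms::brm(y ~ x + (1 | state) + (1 | year_f), data = d)
#
# For multiple membership, brms provides mm() inside the grouping term.
# The columns neighbor1 and neighbor2 below are hypothetical:
# b_mm <- brms::brm(y ~ x + (1 | mm(neighbor1, neighbor2)), data = d)
```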
Day 3 - Bias from Selection and Omitted Variables
• Review of the toolkit and checklist
• Selection bias and missing data
o Multilevel multiple imputation
o Multilevel selection models
• Omitted variable bias
o Mundlak and Hausman
o Fixed vs mixed effects models
o Random coefficients and Bayesian shrinkage
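One way to see the link between the Mundlak approach and fixed effects is the within-between decomposition: split a predictor into its group mean and the deviation from that mean. The sketch below uses simulated data with hypothetical names and only base R for the core comparison; it is an illustration of the general idea, not the course lab code. The optional mixed model at the end runs only if lme4 is installed.

```r
# Simulated grouped data where x is correlated with an unobserved group effect.
set.seed(7)
n_groups <- 40
n_per    <- 10
g <- rep(seq_len(n_groups), each = n_per)

u <- rnorm(n_groups)                         # unobserved group effect
x <- 0.8 * u[g] + rnorm(n_groups * n_per)    # x depends on the group effect
y <- 2 + 1.0 * x + u[g] + rnorm(n_groups * n_per)
d <- data.frame(y = y, x = x, g = factor(g))

# Within-between decomposition of x.
d$x_mean   <- ave(d$x, d$g)      # group mean (between component)
d$x_within <- d$x - d$x_mean     # deviation from group mean (within component)

# Fixed effects estimate using group dummies.
b_fe <- coef(lm(y ~ x + g, data = d))["x"]

# Same within slope from the decomposition, without any dummies.
b_wb <- coef(lm(y ~ x_within + x_mean, data = d))["x_within"]

print(c(fixed_effects = unname(b_fe), within_between = unname(b_wb)))

if (requireNamespace("lme4", quietly = TRUE)) {
  # Mixed model version: random intercept plus the group mean as a predictor.
  m_wb <- lme4::lmer(y ~ x_within + x_mean + (1 | g), data = d)
  print(lme4::fixef(m_wb))
}
```

The two least-squares slopes agree because the within component is orthogonal to anything that is constant inside a group, which is the same variation the group dummies leave behind.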
Afternoon Discussion: Modeling Changes in Public Opinion Over Time
• We will go through some basic and not-so-basic strategies to model dynamic changes in public opinion over time. We will mainly use a data set on Jewish Israeli public support for a two-state solution over a 20-year period. This example provides an excellent illustration of time-varying confounding, also known as contextual effect moderation through time.
• Review of previous labs
• Multilevel multiple imputation
• Multilevel selection models
• Fixed, random, and mixed effects model comparisons
Day 4 - Practical Model Fitting, Diagnostics, and Model Comparison
• Review of the toolkit and checklist
• Building an analysis plan
• Building a coherent workflow
• Model comparison
• Model diagnostics in lme4
• Model comparison with lme4
• Priors in brms
• Model diagnostics in brms
• Model comparison with brms
• Explaining and justifying what you’ve done to others
Day 5 - Special Topics
Participants choose among the following advanced topics based on personal interest. I will have the final list of topics that I plan to cover updated by Tuesday afternoon. While I will attempt to accommodate everyone in the class, it is unlikely that there will be sufficient time to cover all requested topics.
• Review of any topic already covered in the class
• Multilevel categorical modeling
• Multilevel ordered choice modeling
• Multilevel survival modeling
• Multilevel propensity scores
• Multilevel regression and poststratification
• Multilevel structural equation modeling
Mandatory readings before course start
Gill, J. and A. J. Womack (2013). The Multilevel Model Framework. The SAGE Handbook of Multilevel Modeling. M. A. Scott, J. S. Simonoff and B. D. Marx, Sage.
Enders, C. K. (2013). Centering Predictors and Contextual Effects. The SAGE Handbook of Multilevel Modeling. M. A. Scott, J. S. Simonoff and B. D. Marx, Sage.
Fielding, A. and H. Goldstein (2006). "Cross-classified and multiple membership structures in multilevel models: An introduction and review."
Bell, A. and K. Jones (2015). "Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data." Political Science Research and Methods 3(01): 133-153.
Students will have two weeks after the last day of class to complete a homework assignment worth 100% of their grade in the course. This assignment will include coding practice, theory, results interpretation, and research design problems. Students may submit it for feedback, provided they do so at least 72 hours before the deadline. As this is a homework assignment, course notes, readings, and R scripts are obviously allowed. Examples of these problems will be worked through during class time.
Homework will be a mix of research design applications, coding, and fitting models.
1. Students will be given research questions and be required to outline a set of potential analyses designed to answer them. This will include tradeoffs and potential weaknesses in their analysis.
2. Students will be required to diagram R code and explain the purpose and use of each segment. They will be required to articulate how different sections of the code work “under the hood” and outline any relevant implications.
3. Students will be required to fit models, perform diagnostics, and report/interpret results accurately.
The material needed for study will be the lecture notes, the required readings in the above list, and the R package documentation for the packages used in the course.