Analyzing Panel Data

Prerequisites (knowledge of topic)
Comfortable familiarity with univariate differential and integral calculus, basic probability theory, and linear algebra is required. Students should have completed Ph.D.-level courses in introductory statistics and in linear and generalized linear regression models (including logistic regression), up to the level of Regression III. Familiarity with discrete and continuous univariate probability distributions will be helpful.

Hardware
Students will be required to provide their own laptop computers.

Software
All analyses will be conducted using the R statistical software. R is free, open-source, and runs on all contemporary operating systems. The instructor will also offer support for students wishing to use Stata.

Learning objectives
Students will learn how to visualize, analyze, and conduct diagnostics on models for observational data that have both cross-sectional and temporal variation.

Course content

Analysts increasingly find themselves presented with data that vary both over cross-sectional units and across time. Such panel data provide unique and valuable opportunities to address substantive questions in the economic, social, and behavioral sciences. This course will begin with a discussion of the relevant dimensions of variation in such data and of the challenges and opportunities that they present. It will then progress to linear models for one-way unit effects (fixed, between, and random), models for complex panel error structures, dynamic panel models, nonlinear models for discrete dependent variables, and models that leverage panel data to make causal inferences in observational contexts. Students will learn the statistical theory behind the various models, details about estimation and inference, and techniques for the visualization and substantive interpretation of their statistical results. Students will also develop statistical software skills for fitting and interpreting the models in question, and will use the models in both simulated and real data applications. Students will leave the course with a thorough understanding of both the theoretical and practical aspects of conducting analyses of panel data.
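
As a small illustration of the one-way unit effects estimators named above, the following sketch simulates a panel in which the regressor is correlated with the unit effect, then compares the pooled, between, and within (fixed-effects) slopes. It is written in plain Python purely for illustration (course analyses use R). The function names, the data-generating process, and the parameter values are all assumptions made for this example and are not course materials.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def unit_means(units, values):
    """Map each unit identifier to the mean of its observations."""
    totals, counts = {}, {}
    for u, v in zip(units, values):
        totals[u] = totals.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return {u: totals[u] / counts[u] for u in totals}


def within_slope(units, x, y):
    """Fixed-effects (within) slope: demean x and y by unit, then OLS."""
    mx, my = unit_means(units, x), unit_means(units, y)
    xd = [a - mx[u] for u, a in zip(units, x)]
    yd = [b - my[u] for u, b in zip(units, y)]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


def between_slope(units, x, y):
    """Between slope: OLS on the unit-level means of x and y."""
    mx, my = unit_means(units, x), unit_means(units, y)
    ids = sorted(mx)
    return ols_slope([mx[u] for u in ids], [my[u] for u in ids])


def simulate_panel(n_units=200, n_periods=5, beta=1.0, seed=42):
    """Hypothetical data-generating process (an assumption for this sketch):
    y_it = beta * x_it + alpha_i + u_it, with x_it = alpha_i + e_it,
    so the regressor is correlated with the unit effect alpha_i."""
    rng = random.Random(seed)
    units, x, y = [], [], []
    for i in range(n_units):
        alpha = rng.gauss(0, 1)  # time-invariant unit effect
        for _ in range(n_periods):
            x_it = alpha + rng.gauss(0, 1)
            y_it = beta * x_it + alpha + rng.gauss(0, 1)
            units.append(i)
            x.append(x_it)
            y.append(y_it)
    return units, x, y


if __name__ == "__main__":
    units, x, y = simulate_panel()
    print("pooled :", round(ols_slope(x, y), 3))
    print("between:", round(between_slope(units, x, y), 3))
    print("within :", round(within_slope(units, x, y), 3))
```

Under this assumed setup, only the within estimator removes the unit effect, so its slope should fall near the true value of 1.0. The pooled and between slopes are pulled upward by the correlation between the regressor and the unit effect.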

Structure
Day One:
Morning:
•     (Very) Brief Review of Linear Regression
•     Overview of Panel Data: Visualization, Pooling, and Variation
•     Regression with Panel Data
Afternoon:
•     Unit Effects Models: Fixed-, Between-, and Random-Effects

Day Two:
Morning:
•     Dynamic Panel Data Models: The Instrumental Variables / Generalized Method of Moments Framework
Afternoon:
•     More Dynamic Models: Orthogonalization-Based Methods

Day Three:
Morning:
•     Unit-Effects and Dynamic Models for Discrete Dependent Variables
Afternoon:
•     GLMs for Panel Data: Generalized Estimating Equations (GEEs)

Day Four:
Morning:
•     Introduction to Causal Inference with Panel Data (Including Unit Effects)
Afternoon:
•     Models for Causal Inference: Differences-In-Differences, Synthetic Controls, and Other Methods

Day Five:
Morning:
•     Practical Issues: Model Selection, Specification, and Interpretation
Afternoon:
•     Course Examination

Literature

Mandatory
Hsiao, Cheng. 2014. Analysis of Panel Data, 3rd Ed. New York: Cambridge University Press.
OR
Croissant, Yves, and Giovanni Millo. 2018. Panel Data Econometrics with R. New York: Wiley.

Supplementary / voluntary
Abadie, Alberto. 2005. "Semiparametric Difference-in-Differences Estimators." Review of Economic Studies 72:1-19.

Anderson, T. W., and C. Hsiao. 1981. "Estimation Of Dynamic Models With Error Components." Journal of the American Statistical Association 76:598-606.

Antonakis, John, Samuel Bendahan, Philippe Jacquart, and Rafael Lalive. 2010. "On Making Causal Claims: A Review and Recommendations." The Leadership Quarterly 21(6):1086-1120.

Arellano, M. and S. Bond. 1991. "Some Tests Of Specification For Panel Data: Monte Carlo Evidence And An Application To Employment Equations." Review of Economic Studies 58:277-297.

Beck, Nathaniel, and Jonathan N. Katz. 1995. "What To Do (And Not To Do) With Time-Series Cross-Section Data." American Political Science Review 89(September): 634-647.

Bliese, P. D., D. J. Schepker, S. M. Essman, and R. E. Ployhart. 2020. "Bridging Methodological Divides Between Macro- and Microresearch: Endogeneity and Methods for Panel Data." Journal of Management 46(1):70-99.

Clark, Tom S. and Drew A. Linzer. 2015. "Should I Use Fixed Or Random Effects?" Political Science Research and Methods 3(2):399-408.

Doudchenko, Nikolay, and Guido Imbens. 2016. "Balancing, Regression, Difference-In-Differences and Synthetic Control Methods: A Synthesis." Working paper: Graduate School of Business, Stanford University.

Gaibulloev, K., Todd Sandler, and D. Sul. 2014. "Of Nickell Bias, Cross-Sectional Dependence, and Their Cures: Reply." Political Analysis 22: 279-280.

Hill, T. D., A. P. Davis, J. M. Roos, and M. T. French. 2020. "Limitations of Fixed-Effects Models for Panel Data." Sociological Perspectives 63:357-369.

Hu, F. B., J. Goldberg, D. Hedeker, B. R. Flay, and M. A. Pentz. 1998. "Comparison of population-averaged and subject-specific approaches for analyzing repeated binary outcomes." American Journal of Epidemiology 147(7):694-703.

Imai, Kosuke, and In Song Kim. 2019. "When Should We Use Unit Fixed Effects Regression Models for Causal Inference with Longitudinal Data?" American Journal of Political Science 63(2):467-490.

Keele, Luke, and Nathan J. Kelly. 2006. "Dynamic Models for Dynamic Theories: The Ins and Outs of Lagged Dependent Variables." Political Analysis 14(2):186-205.

Lancaster, Tony. 2002. "Orthogonal Parameters and Panel Data." Review of Economic Studies 69:647-666.

Liu, Licheng, Ye Wang, and Yiqing Xu. 2019. "A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data." Working paper: Stanford University.

Mummolo, Jonathan, and Erik Peterson. 2018. "Improving the Interpretation of Fixed Effects Regression Results." Political Science Research and Methods 6:829-835.

Neuhaus, J. M., and J. D. Kalbfleisch. 1998. "Between- and Within-Cluster Covariate Effects in the Analysis of Clustered Data." Biometrics 54(2):638-645.

Pickup, Mark, and Vincent Hopkins. 2020. "Transformed-Likelihood Estimators for Dynamic Panel Models with a Very Small T." Political Science Research and Methods, forthcoming.

Xu, Yiqing. 2017. "Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models." Political Analysis 25:57-76.

Zorn, Christopher. 2001. "Generalized Estimating Equation Models for Correlated Data: A Review with Applications." American Journal of Political Science 45(April):470-90.

Mandatory readings before course start

Hsiao, Cheng. 2007. "Panel Data Analysis -- Advantages and Challenges." Test 16:1-22.

Examination part
Students will be evaluated on two written homework assignments that will be completed during the course (20% each) and a final examination (60%). Homework assignments will typically involve a combination of simulation-based exercises and "real data" analyses, and will be completed in the evenings during the week the class is in session. For the final examination, students will have two alternatives:

•    "In-Class": Complete the final examination in the afternoon of the last day of class (from roughly noon until 6:00 p.m. local time), or

•    "Take-Home": Complete the final examination during the week following the end of the course (due date: TBA).

Additional details about the final examination will be discussed in the morning session on the first day of the course.

Supplementary aids

The exam will be a "practical examination" (see below for content). Students will be allowed access to (and encouraged to reference) all course materials, notes, help files, and other documentation in completing their exam.

Examination content

The examination will involve the application of the techniques taught in the class to one or more "live" data example(s). These will typically take the form of either (a) a replication and extension of an existing published work, or (b) an original analysis of observational data with a panel / time-series cross-sectional component. Students will be required to specify, estimate, and interpret various statistical models, to conduct and present diagnostics and robustness checks, and to give detailed justifications for their choices.

Examination relevant literature
See above. Details of the examination literature will be finalized prior to the start of class.