Regression for Publishing

Prerequisites (knowledge of topic)
Mathematics: Comfortable familiarity with univariate differential and integral calculus, basic probability theory, and linear algebra is required. Familiarity with discrete and continuous univariate probability distributions will be helpful.

Statistics: Students should have completed Ph.D.-level courses in introductory statistics and linear regression models, up to the level of GSERM's Regression II.

Students will complete course work on their own laptop computers. Microsoft Windows, Apple macOS, and Linux variants are all supported; please contact the instructor to ascertain the viability of other operating systems for course work.

Basic proficiency with at least one statistical software package/language is not required but is highly recommended. Preferred software packages include the R statistical computing language and Stata. Course content will be presented using R; computer code for all course materials (analyses, graphics, course slides, examples, exercises) will be made available to students. Students choosing to use R are encouraged to arrive at class with current versions of both R and RStudio on their laptops.

Course content
This course builds directly upon the foundations laid in Regression II, with a focus on successfully applying linear and generalized linear regression models. After a brief review of the linear regression model, the course addresses a series of practical issues in the application of such models: presentation and discussion of results (including tabular, graphical, and textual modes of presentation); fitting, presentation, and interpretation of two- and three-way multiplicative interaction terms; model specification for dealing with nonlinearities in covariate effects; and post-estimation diagnostics, including specification and sensitivity testing. The course then moves to a discussion of generalized linear models, including logistic, probit, and Poisson regression, as well as textual, tabular, and graphical methods for presentation and discussion of such models. The course concludes with a "participants' choice" session, where we will discuss specific issues and concerns raised by students' own research projects and agendas.
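
As an illustration of the interaction-term material, the following is a minimal sketch of how the marginal effect of X on Y, and its standard error, vary with a moderator Z in the model Y = b0 + b1*X + b2*Z + b3*X*Z + e. It is written in Python for brevity; the course itself uses R. The coefficient, variance, and covariance values are hypothetical, chosen only for illustration, and do not come from any course data set.

```python
import math

# Hypothetical estimates from Y = b0 + b1*X + b2*Z + b3*X*Z + e.
# These numbers are illustrative assumptions, not course results.
b1, b3 = 0.50, -0.20                      # coefficients on X and X*Z
var_b1, var_b3, cov_b1_b3 = 0.04, 0.01, -0.01


def marginal_effect(z):
    """Return (dY/dX, standard error), both evaluated at moderator value z."""
    effect = b1 + b3 * z
    # Var(b1 + b3*z) = Var(b1) + z^2 * Var(b3) + 2 * z * Cov(b1, b3)
    variance = var_b1 + z ** 2 * var_b3 + 2 * z * cov_b1_b3
    return effect, math.sqrt(variance)


for z in (0, 1, 2, 3):
    est, se = marginal_effect(z)
    lower, upper = est - 1.96 * se, est + 1.96 * se
    print(f"Z = {z}: effect = {est:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The sketch shows that b1 by itself is the effect of X only when Z = 0; at other values of Z, both the effect and its uncertainty depend on Z.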

Day One (morning session): Review of linear regression.
Day One (afternoon session): Presentation and interpretation of linear regression models.

Day Two (morning session): Fitting and interpreting models with multiplicative interactions.
Day Two (afternoon session): Nonlinearity: Specification, presentation, and interpretation.

Day Three (morning session): Anticipating criticisms: Model diagnostics and sensitivity tests.
Day Three (afternoon session): Introduction to logit, probit, and other Generalized Linear Models (GLMs).

Day Four (morning session): GLMs: Presentation, interpretation, and discussion.
Day Four (afternoon session): GLMs: Practical considerations, plus extensions.

Day Five (morning session): "Participants' choice" session.
Day Five (afternoon session): Examination period.


The course has one required text:

Fox, John. 2016. Applied Regression Analysis and Generalized Linear Models, Third Edition. Thousand Oaks, CA: Sage Publications.

Additional readings will be assigned as necessary; a list of those readings will be sent to course participants a few weeks before the course begins. All additional readings will be available on the course GitHub repository and/or through online library services (e.g., JSTOR).

Supplementary / Voluntary

Mandatory readings before course start

Examination part
 - Two written homework assignments (20% each)
 - A final examination (50%)
 - Oral / class participation (10%)

Supplementary aids
The exam will be a "practical examination" (see below for content). Students will be allowed access to (and encouraged to reference) all course materials, notes, help files, and other documentation in completing their exam. Additional useful materials include:

Fox, John, and Sanford Weisberg. 2011. An R Companion to Applied Regression, Second Edition. Thousand Oaks, CA: Sage Publications.

Nagler, Jonathan. 1996. "Coding Style and Good Computing Practices." The Political Methodologist 6(2):2-8.

Examination content
The examination will involve the application of the techniques taught in the class to one or more "live" data example(s). These will typically take the form of either (a) a replication and extension of an existing published work, or (b) an original analysis of observational data using linear and/or generalized linear regression. Students will be required to specify, estimate, and interpret various forms of regression models, to present tabular and graphical interpretations of those model results, to conduct and present diagnostics and robustness checks, and to give detailed explanations and justifications for their responses.

Fox, John. 2016. Applied Regression Analysis and Generalized Linear Models, Third Edition. Thousand Oaks, CA: Sage Publications.

Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.