# Analyzing Survey Research Data

**Prerequisites (knowledge of topic)**

Substantive Background: Students taking this course should have a general familiarity with the types of data that can be obtained through survey research. While not absolutely required, it would be useful for students to bring survey datasets from their own fields to the course. Even if students do not bring their own data, however, the instructor will provide several survey datasets for course use.

Statistical Methods: Students in this course should be familiar with multiple regression analysis and comfortable using regression models to analyze empirical data.

Computing: Students in this course should have some prior exposure to, and basic experience with, the R statistical computing environment. Specific packages and functions, however, will be introduced and explained in detail throughout the course.

**Hardware**

Students in this course should bring their own laptop computers to class so they can access the software required to carry out the analyses in course examples and exercises.

**Software**

This course will rely on the R statistical computing environment. Students should install the latest version of R on their computers before the first class session. While not absolutely required, it is strongly recommended that students also install RStudio, which makes it much easier to work with R productively.

The course material will use several R packages. Students should install the optiscale, psych, mokken, and smacof packages before the first class session. Additional R packages will be introduced and used throughout the course.
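The four required packages can be installed from the R console in a single step. The sketch below is one way to do it, assuming the packages are obtained from CRAN through the `https://cloud.r-project.org` mirror (an assumption; any CRAN mirror works). It uses only base R functions (`installed.packages()`, `install.packages()`, `requireNamespace()`) and skips packages that are already present.

```r
# Packages named in the course software requirements (all distributed on CRAN).
course_pkgs <- c("optiscale", "psych", "mokken", "smacof")

# Assumption: install from CRAN via the cloud.r-project.org mirror;
# any other CRAN mirror would work equally well.
cran_mirror <- "https://cloud.r-project.org"

# Install only the packages that are not already installed.
missing_pkgs <- setdiff(course_pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = cran_mirror)
}

# Check that every package can be loaded; each element should be TRUE.
pkg_ok <- vapply(course_pkgs, requireNamespace, logical(1), quietly = TRUE)
print(pkg_ok)
```

Running this once before the first session should be enough; any package that failed to install shows up as `FALSE` in the printed vector.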

**Course content**

This course demonstrates how to complete three critical tasks with survey data: (1) combining several survey items into a more reliable and powerful scale; (2) assessing the dimensionality of a set of attitudes; and (3) producing geometric maps of attitudes and preferences, so that the fundamental structure of people’s beliefs can be interpreted more readily. More generally, the course aims to help researchers measure the phenomena they are interested in more effectively. Though researchers of all sorts recognize measurement as a fundamental and crucial step of the scientific process, the topic is rarely given formal attention in core graduate courses beyond a cursory treatment of the concepts of reliability and validity.

The course will cover a variety of strategies for producing quantitative (usually interval-level) variables from qualitative survey responses, which are generally assumed to be measured at the nominal or ordinal level. We will begin with a discussion of measurement theory, giving detailed consideration to concepts such as measurement level and measurement accuracy. This leads to optimal scaling strategies for assigning numbers to objects. We will then cover a variety of methods for combining multiple survey responses to produce higher-quality summary measures. These include summated rating (or “Likert”) scales and reliability of measurement; principal components analysis; item response theory; factor analysis; multidimensional scaling; the vector model for profile data; and correspondence analysis. Each of these methods applies a measurement model to empirical data in order to generate a quantitative representation of the observations and survey items. The results provide new variables that can be used as input to subsequent statistical models. These methods are not just “mere” measurement tools: in addition to quantifying observations, they often provide useful new insights into the systematic structure that exists within those observations. From a practical perspective, moreover, consideration of measurement theory and scaling methods can guide researchers in constructing more powerful batteries of survey questions.

**Structure**

On each class day, the morning session will be used to introduce new concepts, models, and techniques. Some of this discussion may extend into the afternoon session, but most of the afternoon will be devoted to class exercises that give students an opportunity to apply the material discussed during the morning session.

*Day 1*

General introduction and basic concepts

Measurement theory

Optimal scaling

Summated rating scales (or additive indexes)

*Day 2*

Reliability

Cumulative scales (Mokken scaling and IRT)

*Day 3*

Biplots

Principal components analysis

*Day 4*

Factor analysis (exploratory and confirmatory)

Multidimensional scaling

*Day 5*

More multidimensional scaling

Correspondence analysis

**Literature**

Mandatory

Unfortunately, there is no single textbook that covers all of the topics in this course. In addition, many of the available texts have drawbacks that limit their usefulness for our purposes: they tend to be very expensive, they usually assume a high level of mathematical sophistication, and they often contain sections that are out of date. For these reasons, the required readings can be taken from either of two sources: (1) the Sage series on Quantitative Applications in the Social Sciences (i.e., the “little green books”); or (2) chapters from The Wiley Handbook of Psychometric Testing, edited by Paul Irwing, Tom Booth, and David J. Hughes.

Sage QASS monographs:

Dunteman, George H. (1989) Principal Components Analysis.

Jacoby, William G. (1991) Data Theory and Dimensional Analysis.

Kim, Jae-On and Charles W. Mueller. (1978a) Introduction to Factor Analysis.

Kim, Jae-On and Charles W. Mueller. (1978b) Factor Analysis: Statistical Methods and Practical Issues.

Kruskal, Joseph B. and Myron Wish. (1978) Multidimensional Scaling.

McIver, John and Edward G. Carmines. (1981) Unidimensional Scaling.

Van Schuur, Wijbrandt. (2011) Ordinal Item Response Theory: Mokken Scale Analysis.

Weller, Susan C. and A. Kimball Romney. (1990) Metric Scaling: Correspondence Analysis.

Chapters from The Wiley Handbook of Psychometric Testing:

DeMars, Christine. “Classical Test Theory and Item Response Theory.”

Hughes, David J. “Psychometric Validity: Establishing the Accuracy and Appropriateness of Psychometric Measures.”

Jacoby, William G. and David J. Ciuk. “Multidimensional Scaling: An Introduction.”

Jennrich, Robert J. “Rotation.”

Meijer, Rob R. and Jorge N. Tendeiro. “Unidimensional Item Response Theory.”

Mulaik, Stanley A. “Fundamentals of Common Factor Analysis.”

Revelle, William and David M. Condon. “Reliability.”

Timmerman, Marieke E.; Urbano Lorenzo-Seva; Eva Ceulemans. “The Number of Factors Problem.”

Supplementary (optional)

Armstrong II, David A.; Ryan Bakker; Royce Carroll; Christopher Hare; Keith T. Poole; Howard Rosenthal. (2014) Analyzing Spatial Models of Choice and Judgment with R.

Bartholomew, David J.; Fiona Steele; Irini Moustaki; Jane I. Galbraith. (2008) Analysis of Multivariate Social Science Data (Second Edition).

Borg, Ingwer and Patrick Groenen. (2005) Modern Multidimensional Scaling: Theory and Applications (Second Edition).

Cudeck, Robert and Robert C. MacCallum, eds. (2007) Factor Analysis at 100.

Lattin, James; J. Douglas Carroll; Paul E. Green. (2003) Analyzing Multivariate Data.

Mulaik, Stanley A. (2010) Foundations of Factor Analysis (Second Edition).

Wickens, Thomas D. (1995) The Geometry of Multivariate Statistics.

Mandatory readings before the course starts

None.

**Examination**

Course participants will be evaluated on the basis of oral participation (20%) and a major homework exercise (80%). In the homework exercise, course participants will apply one or more of the techniques covered in the class to actual survey data. Ideally, students will use their own survey data drawn from their respective substantive fields; those who do not have such data can use survey data provided by the instructor from political science and sociological applications.