# Network Analysis – Statistical Analysis of Social Network Data

**Prerequisites & content**

Prerequisite knowledge for the course includes the fundamentals of probability and statistics, especially hypothesis testing and regression analysis. This intermediate-level course assumes that students can interpret the results of Ordinary Least Squares, Probit, and Logit regressions. They should also be familiar with the problems that are most common in regression, such as multicollinearity, heteroscedasticity, and endogeneity. Finally, students should be comfortable working with computers and data. No prior knowledge of R or network analysis is required.

The concept of “social networks” is increasingly a part of social discussion, organizational strategy, and academic research. The rising interest in social networks has been coupled with a proliferation of widely available network data, but there has not been a concomitant increase in understanding of how to analyze social network data. This course presents concepts and methods applicable to the analysis of a wide range of social networks, such as those based on family ties, business collaboration, political alliances, and social media.

Classical statistical analysis is premised on the assumption that observations are sampled independently of one another. In the case of social networks, however, observations are not independent of one another, but are dependent on the structure of the social network. The dependence of observations on one another is a feature of the data, rather than a nuisance. This course is an introduction to statistical models that attempt to understand this feature as both a cause and an effect of social processes.

Since network data are generated in a different way than many other kinds of social data, the course begins by considering the research designs, sampling strategies, and data formats that are commonly associated with network analysis. A key aspect of performing network analysis is describing various elements of the network’s structure. To this end, the course covers the calculation of a variety of descriptive statistics on networks, such as density, centralization, centrality, connectedness, reciprocity, and transitivity. We consider various ways of visualizing networks, including multidimensional scaling and spring embedding. We learn methods of estimating regressions in which network ties are the dependent variable, including the quadratic assignment procedure and exponential random graph models (ERGMs). We consider extensions of ERGMs, including models for two-mode data and networks over time.
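To make the descriptive statistics named above concrete, here is a minimal sketch of three of them (density, reciprocity, and transitivity) computed from an adjacency matrix. It is illustrative only: the course exercises are carried out in R, the four-node network is hypothetical data, and the function names are assumptions of this sketch. Definitions also vary across texts and software (for example, reciprocity may be measured over ties or over dyads), so treat these as one common set of choices rather than the definitions used in class.

```python
# Hypothetical directed network: adj[i][j] == 1 means node i sends a tie to node j.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]


def density(adj):
    """Share of possible directed ties (self-ties excluded) that are present."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def reciprocity(adj):
    """Share of present ties i->j whose reverse tie j->i is also present."""
    n = len(adj)
    ties = [(i, j) for i in range(n) for j in range(n) if i != j and adj[i][j]]
    if not ties:
        return 0.0
    mutual = sum(1 for i, j in ties if adj[j][i])
    return mutual / len(ties)


def transitivity(adj):
    """Share of two-paths i->j->k (i, j, k distinct) closed by a tie i->k."""
    n = len(adj)
    two_paths = 0
    closed = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue  # skip paths that revisit a node
                if adj[i][j] and adj[j][k]:
                    two_paths += 1
                    closed += adj[i][k]
    return closed / two_paths if two_paths else 0.0


print(density(adj), reciprocity(adj), transitivity(adj))
```

In this toy network, 5 of the 12 possible directed ties are present, 2 of those 5 ties are reciprocated, and 2 of the 4 two-paths are closed.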

Instruction is split between lectures and hands-on computer exercises. Students may find it to their advantage to bring with them a social network data set that is relevant to their research interests, **but doing so is not required**. The instructor will provide data sets necessary for completing the course exercises.

**Structure**

Day 1. Fundamentals of Network Analysis

A. Why undertake network analysis?

B. How network analysis differs from other statistical methods

C. Elements of networks (nodes, links, modes, attributes, matrices, graphs)

D. Key concepts (directionality, symmetry)

E. Visualization

F. Sampling

G. Survey methods

H. Working with network data in R

Day 2. Descriptive and Inferential Statistics

A. Density

B. Degree distributions

C. Centrality (degree, betweenness, closeness, power)

D. Centralization

E. Components and cores

F. Triads, triples, and transitivity

G. Clustering

H. Correlation and the Quadratic Assignment Procedure

I. Random graphs

J. Descriptive and inferential statistics in R
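As a rough illustration of the idea behind the Quadratic Assignment Procedure listed for Day 2, the sketch below correlates the off-diagonal cells of two network matrices and then builds a reference distribution by relabeling the nodes of one matrix, that is, by applying the same random permutation to its rows and columns. This is a minimal sketch under stated assumptions, not the implementation used in the course (which works in R): the two five-node matrices are hypothetical data, the names `qap_test`, `permute`, and `offdiag` are inventions of this sketch, and the one-sided p-value is only one of several conventions.

```python
import random
from statistics import mean


def offdiag(m):
    """Flatten a square matrix into a list of its off-diagonal cells."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permute(m, order):
    """Relabel nodes by applying the same permutation to rows and columns."""
    return [[m[a][b] for b in order] for a in order]


def qap_test(x, y, reps=1000, seed=0):
    """Return the observed correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = pearson(offdiag(x), offdiag(y))
    nodes = list(range(len(x)))
    hits = 0
    for _ in range(reps):
        rng.shuffle(nodes)
        r = pearson(offdiag(permute(x, nodes)), offdiag(y))
        if r >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return observed, hits / reps


# Hypothetical data: two undirected relations measured on the same five nodes.
x = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
y = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

observed, p_value = qap_test(x, y)
print(observed, p_value)
```

Permuting rows and columns together keeps the structure of each network intact while breaking any association between the two, which is why the procedure respects the dependence among observations that an ordinary correlation test ignores.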

Day 3. Exponential Random Graph Models (ERGMs)

- Theory
- Specification
- Estimation
- Goodness of Fit
- Working with one-mode and two-mode ERGMs in R

Day 4. Network Data over Time Using Temporal ERGMs

Day 5. Student Presentations and Extensions of ERGMs

- Student Presentations
- Additional extensions of ERGMs, if time allows
- Concluding Discussion

**Literature**

Butts, Carter T. 2008. “network: A Package for Managing Relational Data in R.” *Journal of Statistical Software*, Vol. 24, No. 2: 1-36.

Dekker, David, David Krackhardt, and Tom A.B. Snijders. 2007. “Sensitivity of MRQAP Tests to Collinearity and Autocorrelation Conditions.” *Psychometrika*, Vol. 72: 563-581.

Freeman, Linton C. 2005. “Graphical Techniques for Exploring Social Network Data.” Pp. 248-269 in Peter J. Carrington, John Scott, and Stanley Wasserman, *Models and Methods in Social Network Analysis*. New York: Cambridge University Press.

Heaney, Michael T. 2014. “Multiplex Networks and Interest Group Influence Reputation: An Exponential Random Graph Model.” *Social Networks*, Vol. 36, No. 1 (January): 66-81.

Hunter, David R., Mark S. Handcock, Carter T. Butts, Steven M. Goodreau, and Martina Morris. 2008. “ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks.” *Journal of Statistical Software*, Vol. 24, No. 3 (February): 1-29.

Hunter, David R., Steven M. Goodreau, and Mark S. Handcock. 2008. “Goodness of Fit of Social Network Models.” *Journal of the American Statistical Association*, Vol. 103, No. 481 (March): 248-258.

Krackhardt, David. 1987. “QAP Partialing as a Test of Spuriousness.” *Social Networks*, Vol. 9: 171-186.

Krivitsky, Pavel N. 2012. “Exponential-family random graph models for valued networks.” *Electronic Journal of Statistics*, Vol. 6: 1100-1128.

Leifeld, Philip, and Volker Schneider. 2012. “Information Exchange in Policy Networks.” *American Journal of Political Science*, Vol. 56, No. 3: 731-744.

Leifeld, Philip, and Skyler J. Cranmer. 2014. “TERGM vs. SIENA.” Paper presented at the 7th Political Networks Conference, McGill University, Montreal, Canada, May 30, 2014.

Lusher, Dean, Johan Koskinen, and Garry Robins, eds. 2012. *Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications*. New York: Cambridge University Press.

Marsden, Peter V. 1990. “Network Data and Measurement.” *Annual Review of Sociology*, Vol. 16: 435-463.

Marsden, Peter V. 2005. “Recent Developments in Network Measurement.” Pp. 8-30 in Peter J. Carrington, John Scott, and Stanley Wasserman, *Models and Methods in Social Network Analysis*. New York: Cambridge University Press.

Morris, Martina, Mark S. Handcock, and David R. Hunter. 2008. “Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects.” *Journal of Statistical Software*, Vol. 24, No. 4 (February): 1-24.

Robins, Garry L. 2016. *Doing Social Network Research: Network-based Research Design for Social Scientists*. Los Angeles: Sage.

Robins, Garry, Tom Snijders, Peng Wang, Mark Handcock, and Philippa Pattison. 2007. “Recent developments in exponential random graph (p*) models for social networks.” *Social Networks*, Vol. 29: 192-215.

Scott, John. 2000. *Social Network Analysis: A Handbook*, Second Edition. London: Sage Publications. Chapters 3-6.

Wang, Peng, Ken Sharpe, Garry L. Robins, and Philippa E. Pattison. 2009. “Exponential random graph (p*) models for affiliation networks.” *Social Networks*, Vol. 31, No. 1: 12-25.

Wasserman, Stanley, and Katherine Faust. 1994. *Social Network Analysis: Methods and Applications*. New York: Cambridge University Press. Chapter 6.

**Exam**

75%: There will be one written, computer-based problem set each day from Monday through Thursday (four assignments in total). Time will be allocated in class to complete the assignments, which must be submitted each day.

25%: On the final day of the course, each student will make a presentation to the class on the results of her or his research project for the week. Giving a presentation is required to receive a satisfactory grade in the course.