Analyzing Unstructured Data

Course Description

The world is awash in massive amounts of data, and a growing share of it is unstructured: it comes in the form of text, audio, video, and images rather than in neatly organized datasets. Working with this kind of data requires new tools, new approaches, and new ways of thinking.

With a substantive focus on questions of health and social outcomes, this short course introduces students to techniques for finding structure in unstructured data, with the goal of providing actionable insights. We will work primarily with text data, and the course leans toward the practical aspects of working with unstructured data. It should be an appropriate introduction for social science scholars who have a good grasp of quantitative social science methods.


Prerequisites: Basic R Programming


A laptop computer is required.


Software: R and RStudio

The course is organized as follows:


Day 1: Introduction, R Refresher, and Processing Text Data

Day 2: Exploring Unstructured Data

Day 3: Finding Latent Structures in Data

Day 4: Visualization of Unstructured Data

Day 5: Machine Learning & Predictive Data Mining


More details, specific readings, and resources will be provided closer to the course start date.