Econometrics of Big Data

Course Description

As in many other fields, economists are increasingly making use of high-dimensional models – models with many unknown parameters that need to be inferred from the data.  Such models arise naturally in modern data sets that include rich information for each unit of observation (a type of “big data”) and in nonparametric applications where researchers wish to learn, rather than impose, functional forms.  High-dimensional models provide a vehicle for modeling and analyzing complex phenomena and for incorporating rich sources of confounding information into economic models.

Our goal in this course is twofold.  First, we provide an overview of, and an introduction to, several modern methods, largely from statistics and machine learning, that are useful for exploring high-dimensional data and for building prediction models in high-dimensional settings.  Second, we present recent proposals that adapt high-dimensional methods to the problem of valid inference about model parameters, and we illustrate how these proposals apply to inference about economically interesting parameters.

Course prerequisites

This is a PhD-level course. Basic knowledge of parametric statistical models and the associated asymptotic theory is expected.

Preliminary Outline

Lecture 1:  Introduction to High-Dimensional Modeling

Lecture 2:  Introduction to Distributed Computing for Very Large Data Sets

Lecture 3:  Tree-based Methods

Lecture 4:  An Overview of High-Dimensional Inference

Lecture 5:  Penalized Estimation Methods

Lecture 6:  Moderate p Asymptotics

Lecture 7:  Examples

Lecture 8:  Inference:  Computation

Lecture 9:  Introduction to Unsupervised Learning

Lecture 10:  Very Large p Asymptotics

Course literature

Course notes and a list of readings will be provided at the beginning of the course.


Written examination (100%)

Examination content

Content of the lectures

Examination relevant literature

To be discussed in class