# Statistical Learning and Applications

**Prerequisites (knowledge of topic)**

The course focuses on the statistical and mathematical foundations of machine learning theory. The aim is to give students a thorough understanding of the basic principles, so as to prepare them to develop innovative methods and algorithms in their own fields of application. The course will be reasonably self-contained and does not require any specific prior knowledge of learning theory. It is nevertheless targeted at students with a sufficient quantitative background, and it will rely on basic statistics and mathematics (probability, regression methods, linear algebra, elements of optimization theory, etc.) as provided by standard undergraduate courses. Students will be encouraged to carry out some numerical experiments on their own, but this is by no means compulsory and can be done with the software/hardware of their choice.

**Hardware**

none mandatory

**Software**

none mandatory

**Course content**

This is a first course on statistical/machine learning and high-dimensional data analysis, aiming to provide a mathematical toolkit for dealing with large datasets.

A tentative list of topics to be covered is as follows (NB: some may be optional and reserved for students with a more mathematical background):

INTRODUCTION

- What is ‘Statistical Learning’ aka ‘Learning Theory’ or ‘Machine Learning’?
- The goal: extract meaningful information/infer good models from huge datasets and understand how a human brain/machine can ‘learn from examples’
- Examples from various scientific disciplines
- Similarities and differences with related fields: multivariate statistics, high-dimensional analysis, computational statistics, Bayesian statistics, data mining, computer vision, artificial intelligence, etc.

MATHEMATICAL FRAMEWORK

- Learning from examples
- Supervised versus unsupervised setting
- Classification and regression problems
- The ideal case: the Bayes risk and the Minimum Mean Square Error Predictor
- Empirical Risk Minimization
- Probabilistic concentration inequalities: Markov, Chebyshev, Hoeffding and McDiarmid
- The PAC (‘Probably Approximately Correct’) paradigm
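As a taste of the concentration inequalities listed above, the following minimal NumPy sketch checks Hoeffding's inequality empirically: for i.i.d. samples bounded in [0, 1], the probability that the empirical mean deviates from the true mean by at least t is at most 2·exp(−2nt²). All parameters (sample size, threshold, number of trials, random seed) are illustrative choices, not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 200, 0.1, 5000

# Bernoulli(0.5) samples are bounded in [0, 1], so Hoeffding gives
#   P(|empirical mean - mu| >= t) <= 2 * exp(-2 * n * t**2).
samples = rng.binomial(1, 0.5, size=(trials, n))
deviations = np.abs(samples.mean(axis=1) - 0.5)

empirical_prob = (deviations >= t).mean()
hoeffding_bound = 2 * np.exp(-2 * n * t**2)

print(empirical_prob, "<=", hoeffding_bound)  # the observed frequency stays below the bound
```

Such bounds are the basic tool behind the PAC paradigm: they quantify how fast empirical averages concentrate around their expectations.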

CLASSIFICATION PROBLEMS AND SUPPORT VECTOR MACHINES (SVM)

- The Perceptron, ancestor of neural networks
- Convex optimization in a nutshell
- Linear SVM: the separable (‘hard margin’) and nonseparable (‘soft margin’) cases
- Nonlinear SVM: the ‘kernel trick’
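The Perceptron mentioned above, the ancestor of both neural networks and the SVM, can be sketched in a few lines of NumPy. The toy dataset, labels, margin filter, and seed below are illustrative assumptions; on linearly separable data the mistake-driven update rule is guaranteed to converge (Novikoff's bound).

```python
import numpy as np

# Toy linearly separable data with a margin: label = sign(x1 + x2),
# keeping only points safely away from the separating line.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
X = X[np.abs(X[:, 0] + X[:, 1]) > 0.5]
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# Perceptron rule: on each mistake, move (w, b) toward the misclassified example.
w, b = np.zeros(2), 0.0
for _ in range(100):                      # epochs; finitely many mistakes
    mistakes = 0                          # occur on separable data
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:        # misclassified (or on the boundary)
            w, b = w + yi * xi, b + yi
            mistakes += 1
    if mistakes == 0:                     # converged: all points correct
        break

print((np.sign(X @ w + b) == y).mean())   # training accuracy
```

Unlike the SVM, the Perceptron stops at *any* separating hyperplane; the hard-margin SVM additionally maximizes the margin.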

KERNEL METHODS

- Kernels
- Reproducing Kernel Hilbert Spaces (RKHS)
- The Representer Theorem
- Examples of popular kernels
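The Representer Theorem can be illustrated with a minimal kernel ridge regression sketch: the RKHS-regularized least-squares solution is a finite expansion over the training points, f(x) = Σᵢ αᵢ k(xᵢ, x), with α obtained from a linear system. The Gaussian kernel, target function, regularization constant, and seed below are illustrative assumptions.

```python
import numpy as np

# 1-D regression in the RKHS of a Gaussian (RBF) kernel. By the Representer
# Theorem the regularized least-squares solution has the finite expansion
#   f(x) = sum_i alpha_i * k(x_i, x),  alpha = (K + lam * n * I)^{-1} y.
rng = np.random.default_rng(2)
X = np.sort(rng.uniform(-3, 3, size=50))
y = np.sin(X) + 0.1 * rng.normal(size=50)

def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

lam = 1e-3
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)

X_test = np.linspace(-3, 3, 200)
y_pred = rbf(X_test, X) @ alpha
print(np.max(np.abs(y_pred - np.sin(X_test))))  # small approximation error
```

Note that the infinite-dimensional optimization over the RKHS has collapsed to solving an n × n linear system — this is the computational content of the theorem.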

CONSISTENCY AND COMPLEXITY ISSUES

- Uniform Convergence of Empirical Means
- ‘Overfitting’ and capacity of a class of functions (VC dimension, Rademacher complexity)
- Risk bounds from Rademacher averages
- Rademacher complexity of a ball in a RKHS
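Rademacher complexity can be made concrete for the class of bounded linear functions, where the supremum over the class has a closed form. The following sketch (illustrative sample, dimension, and seed) estimates the empirical Rademacher average by Monte Carlo and compares it with the standard norm-based bound.

```python
import numpy as np

# Empirical Rademacher complexity of F = { x -> <w, x> : ||w||_2 <= B }
# on a fixed sample x_1..x_n. The supremum is attained in closed form:
#   sup_{||w|| <= B} (1/n) sum_i sigma_i <w, x_i> = (B/n) ||sum_i sigma_i x_i||.
rng = np.random.default_rng(3)
n, d, B = 100, 5, 1.0
X = rng.normal(size=(n, d))

sigmas = rng.choice([-1, 1], size=(10000, n))       # Rademacher sign vectors
sups = B * np.linalg.norm(sigmas @ X, axis=1) / n   # closed-form suprema
estimate = sups.mean()

# By Jensen's inequality the estimate is dominated by the standard bound
#   (B/n) * sqrt(sum_i ||x_i||^2).
bound = B * np.sqrt((X ** 2).sum()) / n
print(estimate, "<=", bound)
```

The bound scales as O(1/√n), which is exactly the rate that reappears in the risk bounds derived from Rademacher averages.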

LINEAR REGRESSION PROBLEMS

- The trouble with Ordinary Least Squares (OLS) for high-dimensional datasets
- Linear regularization/shrinkage methods: Truncated SVD (‘Singular Value Decomposition’) aka Principal Component Regression and Ridge Regression
- Nonlinear regularization/shrinkage methods: Lasso and Elastic Net Regression; variable selection; other (structured-)sparsity enforcing methods
- Computational issues
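The trouble with OLS under (near-)collinearity, and the stabilizing effect of ridge shrinkage, can be seen in a two-variable toy example. The design, noise level, and ridge constant below are illustrative assumptions.

```python
import numpy as np

# OLS versus ridge on a nearly collinear design: OLS blows up along the
# ill-conditioned direction, while ridge keeps the coefficients stable
# (at the cost of a little bias).
rng = np.random.default_rng(4)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)       # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = x1 + 0.1 * rng.normal(size=n)         # true coefficients: (1, 0)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_ridge = np.linalg.solve(X.T @ X + 1.0 * np.eye(2), X.T @ y)

print(np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

Truncated SVD (Principal Component Regression) would instead discard the tiny singular direction outright; ridge shrinks it smoothly, and the Lasso adds variable selection on top.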

APPLICATIONS

- High-dimensional time series prediction in Economics and other disciplines
- Portfolio optimization/selection in Finance
- Optimal combination of forecasts
- Bioinformatics: gene selection, etc.
- Computer Vision: face detection and authentication

RISK BOUNDS

- Risk bounds via surrogate loss functions
- The contraction principle
- Convex risk minimization
- Least-squares regression in a RKHS
- The hinge loss: revisiting SVM
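Convex risk minimization with the hinge surrogate can be sketched as plain subgradient descent, which recovers a linear soft-margin SVM. The toy data, margin filter, regularization constant, step-size schedule (Pegasos-style, 1/(λt)), and seed below are illustrative assumptions; no bias term is included.

```python
import numpy as np

# Subgradient descent on the regularized empirical hinge risk
#   lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i <w, x_i>).
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 2))
X = X[np.abs(X[:, 0] - X[:, 1]) > 0.2]    # enforce a margin for this toy demo
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)

lam = 0.01
w = np.zeros(2)
for t in range(1, 2001):
    active = y * (X @ w) < 1              # points violating the margin
    subgrad = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(X)
    w -= subgrad / (lam * t)              # Pegasos-style step size 1/(lam * t)

accuracy = (np.sign(X @ w) == y).mean()
print(accuracy)
```

The hinge loss is the prototypical convex surrogate for the 0–1 loss: minimizing it is tractable, and the contraction principle transfers Rademacher-based risk bounds through it.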

**Structure**

Tentatively, there will be five whole-day lessons.

**Literature**

Mandatory: none

Supplementary / voluntary:

Besides the lecture notes/slides which will be distributed to the students, the main supplementary reference book for the course will be

M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning. MIT Press, 2012.

Also recommended is

T. Hastie, R. Tibshirani, and M. Wainwright. Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. CRC Press, 2015.

as well as the more encyclopedic references

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer New York, 2013.

K. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

The following textbooks

H. Kobayashi, B.L. Mark, and W. Turin. Probability, Random Processes, and Statistical Analysis: Applications to Communications, Signal Processing, Queueing Theory and Mathematical Finance. Cambridge University Press, 2011.

B. Efron and T. Hastie. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge University Press, 2016.

can be useful as refreshers or to learn more about basic probability and statistics.

Mandatory readings before course start: none

**Examination**

Take-home written examination paper.