Theoretical Aspects of Machine Learning
Prerequisites (knowledge of topic)
A solid command of probability theory, an affinity for mathematical problems, and advanced econometrics.
Laptops are required for the PC sessions.
Exercises will require the use of the statistical software R.
The goal of this course is to provide a comprehensive overview of the mathematical theory behind machine learning. How can we characterize a good prediction? How can we construct good predictions based on machine learning methods? What is the relationship between (1) estimation error, (2) sample size and (3) model complexity? How do these abstract concepts apply to particular Machine Learning methods such as Boosting, Support Vector Machines, Ridge and LASSO? The objective of the course is to give detailed and intuitively clear answers to these questions. As a result, participants will be well prepared for theoretical and empirical work with, and on, Machine Learning methods.
1. Principles of statistical theory (loss function and risk, approximation vs estimation error, no free lunch theorems)
2. Concentration inequalities for bounded loss functions (Hoeffding's lemma, the Azuma–Hoeffding inequality, the bounded difference inequality, Bernstein's inequality, McDiarmid's inequality)
3. Classification (the binary case and its loss function, the Bayes classifier, optimality of the Bayes classifier, oracle inequalities for the Bayes classifier, the finite dictionary learning case, the impact of noise on convergence rates, the infinite dictionary case)
4. General case (general loss functions, symmetrization, Rademacher complexity, covering numbers, chaining)
5. Applications Part 1: Support Vector Machines, Boosting
6. The mathematics and statistics of regularization methods (LASSO, Ridge, elastic net)
7. Applications Part 2: applying LASSO and Ridge
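To give a flavour of topic 2, the following minimal sketch (written in Python for illustration, although the course exercises use R; all names and parameter values are invented) compares a Monte Carlo estimate of the tail probability P(|sample mean − p| ≥ t) for Bernoulli samples with Hoeffding's bound 2·exp(−2nt²):

```python
import math
import random

def hoeffding_bound(n, t):
    # Hoeffding's inequality for i.i.d. [0,1]-valued variables:
    # P(|sample mean - p| >= t) <= 2 * exp(-2 * n * t^2)
    return 2.0 * math.exp(-2.0 * n * t * t)

def empirical_tail(n, p, t, trials, seed=0):
    # Monte Carlo estimate of P(|sample mean - p| >= t) for Bernoulli(p) draws
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        if abs(mean - p) >= t:
            hits += 1
    return hits / trials

n, p, t = 100, 0.5, 0.1
print(empirical_tail(n, p, t, trials=2000))  # empirical tail probability
print(hoeffding_bound(n, t))                 # the (much larger) Hoeffding bound
```

The bound is distribution-free, so for a specific distribution such as Bernoulli(0.5) it is typically far from tight; the simulation makes that gap visible.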
Concepts of statistical learning: concentration inequalities and principles of statistical theory (topics 1 and 2 from the course content)
The mathematics of machine learning and classification (topic 3 from the course content)
Machine learning methods and the general case (topics 4 and 5 from the course content)
LASSO and Ridge (topics 6 and 7 from the course content)
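For the classification block (topic 3), the optimality of the Bayes classifier under 0-1 loss can be sketched on an invented discrete toy distribution (Python used for illustration; the support, probabilities and function names are all assumptions made up for this example). The Bayes classifier predicts label 1 whenever the posterior probability eta(x) = P(Y = 1 | X = x) is at least 1/2, and no other classifier on the same support achieves a smaller risk:

```python
# Invented toy distribution: X takes values 0, 1, 2
px  = {0: 0.5, 1: 0.3, 2: 0.2}   # px[x]  = P(X = x)
eta = {0: 0.9, 1: 0.4, 2: 0.5}   # eta[x] = P(Y = 1 | X = x)

def risk(classifier):
    # 0-1 risk: P(classifier(X) != Y) = sum_x P(X = x) * P(wrong label | X = x)
    return sum(px[x] * ((1 - eta[x]) if classifier(x) == 1 else eta[x]) for x in px)

def bayes(x):
    # The Bayes classifier predicts 1 whenever eta(x) >= 1/2
    return 1 if eta[x] >= 0.5 else 0

# Enumerate all 2^3 = 8 classifiers on this support and compare risks
all_risks = [risk(lambda x, r=rule: (r >> x) & 1) for rule in range(8)]
print(risk(bayes))      # Bayes risk = sum_x P(X=x) * min(eta(x), 1 - eta(x))
print(min(all_risks))   # no classifier does better
```

Exhaustive enumeration is of course only feasible on a finite support; it is meant to make the pointwise argument (choose the more probable label at each x) concrete.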
There will be a lecture script.
Supplementary / voluntary
The book “The Elements of Statistical Learning” by Hastie, Tibshirani and Friedman gives a nice introduction to Boosting and Support Vector Machines.
Further topic-specific, non-obligatory references will be given during the lecture.
Final written examination at the end of the course (100%)
'Closed Book'. No external references allowed.
While the course is very technical, only intuitions are required for the exam. Intuition means: stating which assumptions are necessary for a result and giving a verbal, possibly graphical, justification (participants able to give a precise mathematical argument may do so instead). Participants should be able to give the intuition for the following concepts:
- calculating a loss function;
- the concentration inequalities;
- the optimality of the Bayes classifier;
- the finite dictionary case;
- the impact of noise (Massart's noise condition, the Mammen–Tsybakov noise condition);
- the infinite dictionary learning problem;
- Rademacher complexity: its definition and its relation to cardinality;
- the VC dimension: calculating it and applying it to the empirical risk;
- symmetrization and its relation to Rademacher complexity in the general case;
- how these concepts apply to Support Vector Machines and Boosting;
- the mathematical intuition behind the LASSO and Ridge methods, in particular in the orthonormal design case.
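For the last point, the orthonormal design case admits well-known closed forms, sketched below in Python (illustrative only; the exercises use R, and the coefficient values are invented). With X'X = I, the Ridge estimate shrinks each OLS coefficient by the factor 1/(1 + lambda), while the LASSO applies coordinate-wise soft-thresholding, which sets small coefficients exactly to zero:

```python
def ridge_orthonormal(beta_ols, lam):
    # With X'X = I, Ridge shrinks every OLS coefficient uniformly: b / (1 + lambda)
    return [b / (1.0 + lam) for b in beta_ols]

def soft_threshold(b, lam):
    # Soft-thresholding operator: sign(b) * max(|b| - lambda, 0)
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def lasso_orthonormal(beta_ols, lam):
    # With X'X = I, the LASSO solution soft-thresholds each OLS coefficient,
    # performing variable selection by zeroing out the small ones
    return [soft_threshold(b, lam) for b in beta_ols]

beta_ols = [3.0, -0.4, 1.2, 0.1]
print(ridge_orthonormal(beta_ols, 0.5))  # all coefficients shrunk, none exactly zero
print(lasso_orthonormal(beta_ols, 0.5))  # → [2.5, 0.0, 0.7, 0.0]
```

The contrast in the output is the intuition asked for in the exam: Ridge shrinks but never selects, whereas the LASSO produces exact zeros once a coefficient falls below the threshold lambda.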
Examination-relevant literature