Machine Learning with R - Introduction
Prerequisites (knowledge of topic)
This course assumes no prior experience with machine learning or R, though it may be helpful to be familiar with introductory statistics and programming.
A laptop computer is required to complete the in-class exercises.
R (https://www.r-project.org/) and RStudio (https://www.rstudio.com/products/rstudio/) are available at no cost and are required for this course.
Machine learning, put simply, involves teaching computers to learn from experience, typically for the purpose of identifying or responding to patterns or making predictions about what may happen in the future. This course is intended to be an introduction to machine learning methods through the exploration of real-world examples. We will cover the basic math and statistical theory needed to understand and apply many of the most common machine learning techniques, but no advanced math or programming skills are required. The target audience may include social scientists or practitioners who are interested in understanding more about these methods and their applications. Students with extensive programming or statistics experience may be better served by a more theoretical course on these methods.
The course is designed to be interactive, with ample time for hands-on practice with machine learning methods. Each day will include several lectures on a machine learning topic, in addition to hands-on “lab” sections in which students apply what they have learned to new datasets (or to their own data, if desired).
The schedule will be as follows:
Day 1: Introducing Machine Learning with R
- How machines learn
- Using R, RStudio, and R Markdown
- k-Nearest Neighbors
- Lab sections – installing R, using R Markdown, choosing your own dataset (if desired)
Day 2: Intermediate ML Methods – Classification Models
- Quiz on Day 1 material
- Naïve Bayes
- Decision Trees and Rule Learners
- Lab sections – practicing with Naïve Bayes and decision trees
Day 3: Intermediate ML Methods – Numeric Prediction
- Quiz on Day 2 material
- Linear Regression
- Regression Trees
- Logistic Regression
- Lab sections – practicing with regression methods
Day 4: Advanced Classification Models
- Quiz on Day 3 material
- Neural Networks
- Support Vector Machines
- Random Forests
- Lab section – practice with neural networks, SVMs, and random forests
Day 5: Other ML Methods
- Quiz on Day 4 material
- Association Rules
- Hierarchical Clustering
- k-Means Clustering
- Lab section – practice with these methods, work on final report
Machine Learning with R (2nd ed.) by Brett Lantz (2015). Packt Publishing.
Supplementary / voluntary
Mandatory readings before course start
Please install R and RStudio on your laptop before the first class. Be sure that both are working correctly and that external packages can be installed. Instructions for doing this are in the first chapter of Machine Learning with R.
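As a rough way to confirm that the installation works, something like the following R sketch can be run from the RStudio console. The helper name missing_packages and the example package list are illustrative assumptions, not part of the course materials; the book's first chapter remains the reference for the actual setup steps.

```r
# Minimal setup check (a sketch, not an official course script).
# `missing_packages` is a hypothetical helper name chosen for illustration.

# Return the subset of `pkgs` that cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Show which version of R is running.
print(R.version.string)

# Example package list (an assumption; substitute whatever the course asks for).
wanted <- c("class")
to_install <- missing_packages(wanted)

if (length(to_install) == 0) {
  message("All listed packages are already installed.")
} else if (interactive()) {
  # Installing requires an internet connection and a CRAN mirror.
  install.packages(to_install)
} else {
  message("Not installed: ", paste(to_install, collapse = ", "))
}
```

If install.packages() completes without an error and a second run reports that all listed packages are installed, external packages can most likely be installed on your machine.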
A project and final report (approximately 2-3 pages), to be delivered in R Notebook format within 2-3 weeks after the course, will account for 60% of the course grade. The report will be graded on its use of the methods covered in class and on whether it draws appropriate conclusions from the data. The remaining 40% will be based on four short quizzes, each covering the topics from the previous day.
Students may reference literature and class materials as needed when writing the final project report. The short in-class quizzes will be closed book and will measure the student’s understanding of the material covered in the previous day’s class.
The final project report should illustrate an ability to apply machine learning methods to a new dataset, which may be on a topic of the student’s choosing. The student should explore the data and explain the methods applied.
The quizzes will be based entirely on the lecture material, which in turn follows the required book, Machine Learning with R (2nd edition). Note that the lectures may include a small amount of additional material not found in the book.