# Machine Learning with R - Introduction

**Prerequisites (knowledge of topic)**

This course assumes no prior experience with machine learning or R, though familiarity with introductory statistics and programming may be helpful.

**Hardware**

A laptop computer is required to complete the in-class exercises.

**Software**

R (https://www.r-project.org/) and RStudio (https://www.rstudio.com/products/rstudio/) are required for this course. Both are available at no cost.

**Course content**

Machine learning, put simply, involves teaching computers to learn from experience, typically for the purpose of identifying or responding to patterns or making predictions about what may happen in the future. This course is intended to be an introduction to machine learning methods through the exploration of real-world examples. We will cover the basic math and statistical theory needed to understand and apply many of the most common machine learning techniques, but no advanced math or programming skills are required. The target audience may include social scientists or practitioners who are interested in understanding more about these methods and their applications. Students with extensive programming or statistics experience may be better served by a more theoretical course on these methods.

**Structure**

The course is designed to be interactive, with ample time for hands-on practice with machine learning methods. Each day will include several lectures on a machine learning topic, along with hands-on “lab” sections in which you apply what you have learned to new datasets (or to your own data, if desired).

The schedule will be as follows:

*Day 1:* Introducing Machine Learning with R

- How machines learn
- Using R, RStudio, and R Markdown
- k-Nearest Neighbors
- Lab sections – installing R, using R Markdown, choosing your own dataset (if desired)

*Day 2:* Intermediate ML Methods – Classification Models

- Quiz on Day 1 material
- Naïve Bayes
- Decision Trees and Rule Learners
- Lab sections – practicing with Naïve Bayes and decision trees

*Day 3:* Intermediate ML Methods – Numeric Prediction

- Quiz on Day 2 material
- Linear Regression
- Regression Trees
- Logistic Regression
- Lab sections – practicing with regression methods

*Day 4:* Advanced Classification Models

- Quiz on Day 3 material
- Neural Networks
- Support Vector Machines
- Random Forests
- Lab section – practice with neural networks, SVMs, and random forests

*Day 5:* Other ML Methods

- Quiz on Day 4 material
- Association Rules
- Hierarchical Clustering
- k-Means Clustering
- Lab section – practice with these methods, work on final report

**Literature**

**Mandatory**

Lantz, Brett (2015). *Machine Learning with R* (2nd ed.). Packt Publishing.

**Supplementary / voluntary**

None required.

**Mandatory readings before course start**

Please install R and RStudio on your laptop before the first class. Be sure that both are working correctly and that external packages can be installed. Instructions for doing this are in the first chapter of *Machine Learning with R*.
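One way to confirm the setup is to run a short check in the R console. The sketch below is only an illustration. The helper name `ensure_package` and the CRAN mirror URL are assumptions and are not taken from the course materials or the book's instructions.

```r
# Print the version of R that is currently running
cat(R.version.string, "\n")

# Hypothetical helper: install a package if it is missing, then
# report whether it can be loaded. Returns TRUE or FALSE.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Assumes an internet connection and a reachable CRAN mirror
    install.packages(pkg, repos = repos)
  }
  requireNamespace(pkg, quietly = TRUE)
}
```

For example, calling `ensure_package("e1071")` would attempt to install that package from CRAN if it is not already present. The package name is just an illustrative choice. A result of `TRUE` suggests that package installation is working.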

**Examination part**

A project and final report (approximately 2-3 pages), to be delivered in R Notebook format within 2-3 weeks after the course, will account for 60% of the course grade. The report will be graded on its use of the methods covered in class and on whether it draws appropriate conclusions from the data. The remaining 40% will be based on four short quizzes, each covering the topics from the previous day.

**Supplementary aids**

Students may reference literature and class materials as needed when writing the final project report. The short in-class quizzes will be closed book and will measure the student’s understanding of the material covered in the previous day’s class.

**Examination content**

The final project report should illustrate an ability to apply machine learning methods to a new dataset, which may be on a topic of the student’s choosing. The student should explore the data and explain the methods applied.

**Literature**

The quiz material will be based entirely on the lecture material, which is based on the required book, Machine Learning with R (2nd edition). Note that the lectures may include a small amount of additional material not found in the book.