Introduction to Git and fundamentals of R –1 week, 2 classes
- Introduction to Git and how we will use it for capstones and class material throughout the course
- Overview of R: its advantages and disadvantages, coding fundamentals, and data wrangling
- Introduction to base plotting in R
- Introduction to SQL and connecting R to SQL databases for the implementation of a seamless data pipeline
Static and interactive visualization in R –1 week, 2 classes
- Build intuitive data visualizations in ggplot
- Deeper dive into interactive libraries including plotly and Highcharts
- How to visualize outputs from the algorithms covered in the course
Introduction to foundational statistics, linear regression and a scientific approach to building a model –1 week, 2 classes
Covers statistics that we will be using in subsequent classes, such as:
- t-tests, F-tests, p-values, and confidence intervals
- Review of linear and polynomial regression
- LOESS regression
- Introduction to the model building process
- Train/test sets
- Unsupervised vs. supervised learning
Unsupervised learning methods –1 week, 2 classes
Midway capstone presentations –0.5 week, 1 class
Text Mining in R –1.5 weeks, 3 classes
- Working with the tm package in R
- Cleaning and manipulating text
- Introduction to regular expressions
- Summary metrics of corpora and visualization of text data
Supervised learning methods - Classification –1.5 weeks, 3 classes
- Introduction to classification, including creating dummy variables in R and other techniques used to transform data for a classification problem
- k-Nearest Neighbors
- Logistic regression with LASSO and Ridge penalties
- Decision trees, Random Forest
- Support Vector Machines
Finalize capstone projects and open discussion/work –0.5 weeks, 1 class
Week 1: Topic and success requirements selected.
Weeks 1-2: Project plan developed including the skill sets and technology required, and data set identified.
Weeks 2-4: Data set acquired and exploratory analysis and visualization performed.
Weeks 4-5: Initial analysis performed & peer review.
Week 5: Analysis refined.
Weeks 6-7: Application development and peer review.
Weeks 7-8: Final presentations and conclusions.