Introduction to Git and fundamentals of R – 1 week, 2 classes

  • Introduction to Git and how we will use it for both capstones and class material throughout the course
  • Overview of R: its advantages and disadvantages, coding fundamentals and data wrangling
  • Introduction to base plotting in R
  • Introduction to SQL and connecting R to SQL databases for the implementation of a seamless data pipeline

Static and interactive visualization in R – 1 week, 2 classes

  • Build intuitive data visualizations in ggplot
  • Deeper dive into interactive libraries including plotly and Highcharts
  • How to visualize outputs from the algorithms we covered

Introduction to foundational statistics, linear regression and a scientific approach to building a model – 1 week, 2 classes

Covers statistics that we will be using in subsequent classes, such as:

  • Correlation/covariance
  • Autocorrelation
  • t-tests/F-tests/p-values/confidence intervals
  • Review of linear and polynomial regression
  • LOESS regression
  • Introduction to the model building process
  • Train/test sets
  • Unsupervised vs. supervised learning

Unsupervised learning methods – 1 week, 2 classes

  • Clustering
  • PCA

Midway capstone presentations – 0.5 week, 1 class

Text Mining in R – 1.5 weeks, 3 classes

  • Working with the tm package in R
  • Cleaning and manipulating text
  • Introduction to regular expressions
  • Summary metrics of corpora and visualization of text data

Supervised learning methods: Classification – 1.5 weeks, 3 classes

  • Introduction to classification, including creating dummy variables in R and other techniques used to transform data for a classification problem
  • k-Nearest Neighbors
  • Logistic regression with LASSO and Ridge penalties
  • Decision trees, Random Forest
  • Support Vector Machines

Finalize capstone projects and open discussion/work – 0.5 week, 1 class

Capstone Project Plan: Students are required to log their progress in the Google Sheet at the beginning of each week so that projects stay on track and TAs can proactively provide support.

Week 1: Topic and success requirements selected.

Weeks 1-2: Project plan developed including the skill sets and technology required, and data set identified.

Weeks 2-4: Data set acquired and exploratory analysis and visualization performed.

Weeks 4-5: Initial analysis performed and peer review.

Week 5: Analysis refined.

Weeks 6-7: Application development and peer review.

Weeks 7-8: Final presentations and conclusions.

Content created by Office of the Chief Technology Officer (CTO)
Content last reviewed on September 24, 2018