Data Science CoLab

Making Data Work for HHS


Introduction to Git and fundamentals of R – 1 week, 2 classes

  • Introduction to Git and how we will use it for capstones and class material throughout the course
  • Overview of R: its advantages and disadvantages, coding fundamentals, and data wrangling
  • Introduction to base plotting in R
  • Introduction to SQL and connecting R to SQL databases to implement a seamless data pipeline

Static and interactive visualization in R – 1 week, 2 classes

  • Build intuitive data visualizations in ggplot2
  • Deeper dive into interactive libraries including plotly and Highcharts
  • How to visualize outputs from the algorithms we covered

Introduction to foundational statistics, linear regression, and a scientific approach to building a model – 1 week, 2 classes

  • Statistics that we will use in subsequent classes, such as:
      • Correlation/covariance
      • Autocorrelation
      • t-tests/F-tests/p-values/confidence intervals
  • Review of linear and polynomial regression
  • LOESS regression
  • Introduction to the model building process:
      • Train/test sets
      • Unsupervised vs. supervised learning

Unsupervised learning methods – 1 week, 2 classes

  • Clustering
  • PCA

Midway capstone presentations – 0.5 weeks, 1 class

Text mining in R – 1.5 weeks, 3 classes

  • Working with the tm package in R
  • Cleaning and manipulating text
  • Introduction to regular expressions
  • Summary metrics of corpora and visualization of text data

Supervised learning methods: classification – 1.5 weeks, 3 classes

  • Introduction to classification, including creating dummy variables in R and other techniques for transforming data for a classification problem
  • k-Nearest Neighbors
  • Logistic regression with LASSO and Ridge penalties
  • Decision trees and random forests
  • Support Vector Machines

Finalize capstone projects and open discussion/work – 0.5 weeks, 1 class