Wharton PhD Tech Camp 2015



Jing Peng

Course Description

This course aims to offer hands-on experience to incoming and current PhD students interested in data-driven research. Using a real-world example, it shows how to collect, clean, and analyze data from beginning to end. It also briefly covers machine learning and natural language processing tools that can greatly facilitate data analysis. The goal of this short course is to build familiarity with data-related skills and tools, so that students know where to start and how to get things done in their future data-related research.

There is no prerequisite for this course. Feel free to attend the sessions selectively. Auditing is welcome (please register). There will be assignments for practice after each session, but no exam. Please bring your own laptop for this course.


1. R and Python Basics. Version control.  

Tools: R, Python, RStudio, Wingware IDE, Git, Notepad++

Resources: Google Python videos, R videos, Git docs

2. Data Collection: Facebook, Twitter, and general Web pages. 

Tools: tweepy, Facebook SDK for Python, scrapy, regular expression tester

Resources: Twitter REST and Streaming APIs, tweepy docs, Facebook API, regex tutorial
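Once tweets or pages have been downloaded, regular expressions are a quick way to pull structured fields out of raw text. The sketch below uses only Python's standard `re` module to extract hashtags, mentions, and URLs from a tweet's text. The patterns are simplified assumptions for illustration; Twitter's own entity rules are more involved (for example, an email address such as `a@b.com` would be picked up as a mention here).

```python
import re

# Simplified patterns (assumptions, not Twitter's official entity rules).
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def extract_entities(text):
    """Return the hashtags, mentions, and URLs found in a tweet's text."""
    return {
        "hashtags": HASHTAG.findall(text),
        "mentions": MENTION.findall(text),
        "urls": URL.findall(text),
    }


if __name__ == "__main__":
    tweet = "Data camp today with @wharton #PhD #data http://example.com/camp"
    print(extract_entities(tweet))
```

Pasting a pattern into a regular expression tester first is a convenient way to check it before running it over a whole dataset.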

3. Data Cleaning: data.table (R)

Resources: data.table docs (do read the intro), full manual
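A typical cleaning pass removes duplicate records, drops missing values, and aggregates by group. In data.table this might be written as `unique(DT)[!is.na(likes), .(total = sum(likes)), by = user]`. The sketch below is a plain-Python analogue for readers coming from Python, not data.table itself; the column names `user` and `likes` are hypothetical.

```python
from collections import defaultdict


def clean_and_aggregate(rows, key, value):
    """Drop exact duplicate rows, then rows whose `value` is missing,
    and sum `value` within each group defined by `key`."""
    seen = set()
    totals = defaultdict(float)
    for row in rows:
        signature = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if signature in seen:                   # skip exact duplicates (like unique(DT))
            continue
        seen.add(signature)
        if row.get(value) is None:              # skip missing values (like !is.na(...))
            continue
        totals[row[key]] += row[value]          # group-wise sum (like by = key)
    return dict(totals)


if __name__ == "__main__":
    rows = [
        {"user": "a", "likes": 3},
        {"user": "a", "likes": 3},     # exact duplicate, dropped
        {"user": "b", "likes": None},  # missing value, dropped
        {"user": "b", "likes": 5},
        {"user": "a", "likes": 2},
    ]
    print(clean_and_aggregate(rows, key="user", value="likes"))
```

On large tables, data.table performs these steps far faster than a Python loop, which is one reason this session focuses on it.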

4. Data Analysis: Data Mining vs. Causal Inference

Tools: weka

5. Batch Jobs: Wharton High Performance Computing, and Amazon Web Services

Resources: training videos, basics tutorial
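A common pattern for batch jobs is to split one large task into many independent pieces and run each piece as a separate job. The sketch below assumes a Grid Engine-style array job, where the scheduler exposes a 1-based task index as the environment variable `SGE_TASK_ID`; please check the cluster's own documentation for the exact variable. `N_TASKS` and the input file names are hypothetical.

```python
import os


def chunk_for_task(items, task_id, n_tasks):
    """Return the contiguous slice of `items` that 1-based task `task_id`
    (out of `n_tasks`) should process; slices are as even as possible."""
    if not 1 <= task_id <= n_tasks:
        raise ValueError("task_id must be between 1 and n_tasks")
    size, extra = divmod(len(items), n_tasks)
    # The first `extra` tasks each take one additional item.
    start = (task_id - 1) * size + min(task_id - 1, extra)
    end = start + size + (1 if task_id <= extra else 0)
    return items[start:end]


if __name__ == "__main__":
    # Assumption: the scheduler sets SGE_TASK_ID; N_TASKS is a hypothetical
    # variable you would set yourself when submitting the array job.
    task_id = int(os.environ.get("SGE_TASK_ID", "1"))
    n_tasks = int(os.environ.get("N_TASKS", "1"))
    files = [f"tweets_{i:03d}.json" for i in range(10)]  # hypothetical inputs
    for name in chunk_for_task(files, task_id, n_tasks):
        print("processing", name)
```

Because each task computes its own slice from its index, the same script can be submitted once as an array of jobs, and no coordination between tasks is needed.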

6. Advanced Data Analysis: estimating your own model. 

Tools: optim, Rprof, Rcpp, inline, Armadillo

Setup: Rtools (Windows; add it to the PATH), Xcode and Fortran (Mac)

Resources: information matrix, Maximum Likelihood Estimation
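Estimating your own model by maximum likelihood comes down to maximizing a log-likelihood and then using the information matrix (the negative Hessian of the log-likelihood) at the estimate to get standard errors, as the square roots of the diagonal of its inverse. The sketch below shows this for a one-covariate logistic regression using Newton-Raphson with hand-coded derivatives; the model and data are assumptions for illustration. In R, a comparable route is to minimize the negative log-likelihood with `optim(..., hessian = TRUE)` and invert the returned Hessian.

```python
import math


def logistic_mle(x, y, max_iter=50, tol=1e-10):
    """Fit P(y = 1) = 1 / (1 + exp(-(a + b * x))) by Newton-Raphson.
    Returns ((a, b), (se_a, se_b)); standard errors come from the inverse
    information matrix evaluated at the final iterate."""
    a = b = 0.0
    for _ in range(max_iter):
        g_a = g_b = 0.0          # score vector
        i_aa = i_ab = i_bb = 0.0  # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            i_aa += w
            i_ab += w * xi
            i_bb += w * xi * xi
        det = i_aa * i_bb - i_ab ** 2
        # Newton step: theta_new = theta + I^{-1} * score (2x2 inverse written out).
        da = (i_bb * g_a - i_ab * g_b) / det
        db = (-i_ab * g_a + i_aa * g_b) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    # Standard errors: square roots of the diagonal of I^{-1}.
    se_a = math.sqrt(i_bb / det)
    se_b = math.sqrt(i_aa / det)
    return (a, b), (se_a, se_b)


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5, 6, 7]  # made-up data
    y = [0, 0, 1, 0, 1, 0, 1, 1]
    print(logistic_mle(x, y))
```

For logistic regression the observed and expected information coincide, so this Newton iteration is also Fisher scoring; for models where they differ, the choice affects the reported standard errors.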

7. Big Data Introduction

Tools: Amazon EMR, Hadoop

Resources: Big Data Analytics, MapReduce Example, Amazon Big Data Course
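The MapReduce model splits work into a map phase that emits key/value pairs, a shuffle that groups pairs by key, and a reduce phase that combines each group. The sketch below simulates the classic word-count example on a single machine in plain Python so the three phases are visible; on Hadoop or Amazon EMR, the framework would run the mapper and reducer across many machines and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data camp"]))
```

Because each mapper sees only its own lines and each reducer sees only one key, both phases can run in parallel without sharing state.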

8. Natural Language Processing. 

Tools: NLTK, gensim, MTurk

Setup: wheel (.whl) files

Resources: Vector Space Model, Latent Dirichlet Allocation, NLTK book
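In the vector space model, each document becomes a vector of term weights, and documents are compared by the cosine of the angle between their vectors. The sketch below builds TF-IDF vectors in plain Python using raw term counts and idf = log(N / df), which is one common weighting among several; libraries such as gensim offer their own variants and normalizations. Tokenization here is a simple whitespace split, an assumption made to keep the example short.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) with raw counts and idf = log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = ["stock market rally", "stock market crash", "football match tonight"]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Terms that appear in every document get an idf of zero under this weighting, so they contribute nothing to similarity.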

Time and Location

3:00pm-4:30pm, Monday/Wednesday/Friday, July 31-August 17 (8 sessions), JMHH F55

Additional Sources

Tech Camp 2013

Tech Camp 2014