Wharton PhD Tech Camp 2016


Instructor

Youran Fu

Office: JMHH 532.4;  Email: youranfu@wharton.upenn.edu 

Course Description

The aim of this course is to familiarize incoming and current Wharton PhD students with cutting-edge tools and resources for empirical research, both publicly available and Wharton-specific. The course focuses on acquiring, cleaning, managing, and analyzing data. It provides hands-on experience with a variety of computing tools, including introductory machine learning and natural language processing techniques. By the end of this short course, students will have a better understanding of which tools are most appropriate for the data analysis task at hand.

There are no prerequisites for this course. Feel free to attend only the sessions that interest you. Auditing is welcome (please register). Each session consists of roughly a 60-minute lecture followed by a 30-minute lab session, where you are encouraged to work on exercises. There is no exam. Please bring your own laptop to each session.

Registration Form [please fill out the survey questions at the end too]

Syllabus.pdf

Slides will be sent to your registered email after each session.

1. Version Control, Unix, and R Basics

Pre-class: Read Code and Data, Chapter 3;   Install GitHub;   Install R & RStudio (an IDE for R)

Resources: GitHub Getting Started, GitHub Student Pack, Git doc;   Unix Command Cheat Sheet;   R Intro, R Language, R videos

Lab exercises: Download all files here

2. Data Cleaning and Manipulation (R and SQL)

Pre-class: Apply for a Wharton grid account;   Have a look at the data.table vignettes & SQL Tutorial

Resources: data.table cheat sheet, ggplot2 cheat sheet

Lab exercises: Download all files here
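If you want to try basic SQL before the session without setting up a database server, one option is Python's built-in sqlite3 module. This is a minimal sketch, assuming SQLite as a stand-in for whatever SQL engine the lab uses; the table `firms` and its columns are hypothetical. It shows a typical cleaning-and-summarizing query: drop missing values, then aggregate by group.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of firm-year observations.
cur.execute("CREATE TABLE firms (firm TEXT, year INTEGER, sales REAL)")
rows = [
    ("A", 2014, 10.0),
    ("A", 2015, 12.5),
    ("B", 2014, 7.0),
    ("B", 2015, None),  # Python None is stored as SQL NULL (a missing value)
]
cur.executemany("INSERT INTO firms VALUES (?, ?, ?)", rows)

# Drop rows with missing sales, then count observations and average sales per firm.
query = """
    SELECT firm, COUNT(*) AS n_obs, AVG(sales) AS mean_sales
    FROM firms
    WHERE sales IS NOT NULL
    GROUP BY firm
    ORDER BY firm
"""
result = cur.execute(query).fetchall()
for firm, n_obs, mean_sales in result:
    print(firm, n_obs, mean_sales)

conn.close()
```

The same SELECT / WHERE / GROUP BY pattern corresponds to the `DT[i, j, by]` form in data.table.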

3. Wharton HPCC and Behavioral Lab (Guest Speakers)

Pre-class: Apply for a Wharton grid account;   Watch and follow along with the HPCC 101 video series

For Session 4: if you have never used Python before, install Anaconda for Python 2.7 (the recommended Python distribution) or Enthought Canopy (an alternative Python distribution).

4. Python and Regex Basics

Pre-class: Have a look at the Interactive Python Tutorial & Interactive Regex Tutorial;   Install Anaconda for Python 2.7 (recommended) if you have never used Python before

Resources: Codecademy Python Course, Google Python Class;   Online Regex Editor, Another Regex Tester;   Notepad++: a recommended editor for Windows users

Lab exercises: Download all files here
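As a quick warm-up for the regex part of this session, here is a minimal sketch using Python's standard `re` module. The sample text and patterns are illustrative assumptions, not taken from the lab files. It shows three common operations: extracting captured groups with `findall`, matching without capturing via `(?:...)`, and substituting with a back-reference.

```python
import re

text = "Contact: jane.doe@example.com (2015), john_smith@example.org (2016)."

# \b is a word boundary; group 1 is the user name (letters, digits, _ and .),
# group 2 is the domain, which must end in a dot plus at least two letters.
email_pattern = re.compile(r"\b([\w.]+)@([\w.]+\.[a-z]{2,})\b")

# With capturing groups, findall returns one tuple of groups per match.
emails = email_pattern.findall(text)

# Four-digit years from 1900 to 2099; (?:...) groups without capturing,
# so findall returns the whole match.
years = re.findall(r"\b(?:19|20)\d{2}\b", text)

# sub replaces every match; \2 re-inserts the captured domain.
masked = email_pattern.sub(r"***@\2", text)

print(emails)
print(years)
print(masked)
```

Pasting these patterns into one of the online regex testers listed above is a good way to see which part of the text each piece matches.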

5. Data Collection and Web Scraping (Python and R)

Pre-class: Apply for Twitter API keys & install the Python package tweepy;   Have a look at the Twitter REST and Streaming APIs

Resources: import.io, API Search Engine, Tutorials for Web Development (Scraping-Related);   Tweepy Doc, Scrapy Doc, Facebook SDK for Python, Facebook APIs;   other API examples are available on the Tech Camp 2013 website (linked at the bottom of this page)

Lab exercises: Download all files here
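Once a page has been downloaded, the scraping step is parsing its HTML. The sketch below is one minimal way to extract links using only the standard library. It assumes Python 3, where the module is `html.parser`; in Python 2.7 the same class lives in the `HTMLParser` module. The class name `LinkExtractor` and the sample page are hypothetical, and dedicated tools such as Scrapy handle real-world pages far more robustly.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while inside an <a href=...> tag
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# A hypothetical page; in practice the HTML would come from an HTTP request.
page = """
<html><body>
  <h1>Tech Camp</h1>
  <a href="/sessions/1">Version Control</a>
  <p>See also <a href="https://example.com/regex">a regex tester</a>.</p>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page)
parser.close()
for href, text in parser.links:
    print(href, "->", text)
```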

6. Data Analysis: Causation vs. Prediction

Pre-class: Install Weka 3.8

Resources: Weka MOOC Courses

Lab exercises: Download all files here

7. Intro to Natural Language Processing and Machine Learning

Pre-class: Install the Python packages NLTK, scikit-learn, and gensim

Resources: wheel files for Python packages (ignore these if you have successfully installed all the packages above);   How to get a training set: MTurk

Lab exercises: Download all files here
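Most of the text-analysis tools in this session start by turning documents into word counts, the "bag-of-words" representation. NLTK and scikit-learn provide their own tokenizers and vectorizers. The stdlib-only sketch below does not use those libraries' APIs; it only illustrates the underlying idea under simple assumptions: lowercase the text, keep runs of letters as tokens, and count them over a shared vocabulary. The two sample documents are made up.

```python
import re
from collections import Counter

docs = [
    "The stock market rallied today.",
    "Today the market fell; stocks fell sharply.",
]


def tokenize(text):
    # Lowercase and keep runs of letters only (a deliberately simple tokenizer).
    return re.findall(r"[a-z]+", text.lower())


# Vocabulary: every distinct token across all documents, sorted alphabetically.
vocab = sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(text):
    # Count tokens, then read counts off in vocabulary order (Counter gives 0 if absent).
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


# One row per document, one column per vocabulary word.
matrix = [bag_of_words(doc) for doc in docs]

print(vocab)
for row in matrix:
    print(row)
```

Note that "stock" and "stocks" end up as separate columns. A stemmer such as NLTK's PorterStemmer would merge them, which is one reason the session uses NLTK rather than a hand-rolled tokenizer.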

8. Hands-on R Practice Session: in-class demo and lab session

Attention: this session is held in SHDH 109!

Time and Location

1:30pm-3:00pm Monday/Wednesday/Friday 

August 1-17 (8 sessions), JMHH F55 (except the last session on Aug 17, which is held in SHDH 109)

Additional Sources

Tech Camp 2013

Tech Camp 2014

Tech Camp 2015