Wharton PhD Tech Camp 2018

Instructor

Course Description

This course aims to equip incoming and current Wharton doctoral students with the basic technical skills and tools required for empirical research. These include publicly available analysis tools (e.g., R and Python) and Wharton-specific resources (e.g., the Wharton grid computing cluster and WRDS). The course focuses on acquiring, cleaning, managing, and analyzing real-world datasets. It also provides hands-on experience with a variety of computational techniques, including machine learning and natural language processing. By the end of this short course, students should have a clearer sense of which tools are best suited to a given data analysis task.

There are no prerequisites for this course. Feel free to attend sessions selectively; auditors are welcome. Each session will consist of roughly a 60-minute lecture followed by a 30-minute lab, during which you are encouraged to work on exercises. There is no exam. Please bring your own laptop.

Timetable


Sessions

1. Course Intro, Command Line, R and Python Basics | Mon 30 July, F50 in JMHH


Pre-class: Install R & RStudio (an IDE for R); Install Anaconda; Install PuTTY or MobaXterm

Resources: Unix Command Reference; R Intro; R Reference Card; R Markdown Cheatsheet; Python for Beginners; Tutorial on Installing Anaconda on Windows; Jupyter Notebook Quick Start Guide



2. More Intro to R & Python | Wed 1 August, F50 in JMHH


Resources: R-bloggers; learnpython.org



3. Data Acquisition 1: Consuming APIs | Fri 3 August, F50 in JMHH


Pre-class: Apply for a Yelp Fusion API key by following the Authentication instructions

Resources: Wharton Research Data Services (WRDS); Data Offered by Lippincott Library; Apply for Data from WCAI; REST API Tutorial; Documentation of the Yelp Fusion API; Interfaces of the Requests Package in Python; Other API Examples on the Websites of Past Tech Camps (links under Resources from Past Years at the bottom)
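A minimal sketch of what consuming a REST API involves, using Python's standard-library urllib instead of the Requests package so it runs without extra installs. It assumes the Yelp Fusion business search endpoint (https://api.yelp.com/v3/businesses/search) with Bearer-token authentication and the term, location, and limit query parameters from Yelp's documentation; YOUR_API_KEY and the helper names are placeholders. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: Yelp Fusion business search (verify against current Yelp docs).
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(api_key, term, location, limit=5):
    """Build (but do not send) a GET request for the business search endpoint."""
    # urlencode escapes spaces and commas, e.g. "Philadelphia, PA" -> "Philadelphia%2C+PA".
    query = urllib.parse.urlencode({"term": term, "location": location, "limit": limit})
    return urllib.request.Request(
        f"{SEARCH_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},  # API key sent as a Bearer token
        method="GET",
    )


def extract_names(response_text):
    """Pull business names out of a JSON response body that has a 'businesses' list."""
    payload = json.loads(response_text)
    return [business["name"] for business in payload.get("businesses", [])]


if __name__ == "__main__":
    req = build_search_request("YOUR_API_KEY", "coffee", "Philadelphia, PA")
    print(req.full_url)
    # Sending it requires a valid key and network access:
    # with urllib.request.urlopen(req) as resp:
    #     print(extract_names(resp.read().decode("utf-8")))
```

With the Requests package the same call is usually written as requests.get(SEARCH_URL, headers=..., params=...), which handles the query-string encoding for you.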



4. Data Acquisition 2: Web Scraping | Mon 6 August, F50 in JMHH


Resources: HTML Basics; Documentation of the Beautiful Soup Package; Selenium with Python
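A small scraping sketch. The resources above point to Beautiful Soup; to keep the example free of third-party dependencies, it swaps in Python's built-in html.parser module, which offers a lower-level, event-driven interface. It collects the href of every <a> tag in an HTML string; the sample page and the LinkCollector/extract_links names are made up for illustration. With Beautiful Soup the equivalent step would be roughly [a["href"] for a in soup.find_all("a", href=True)].

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href attributes from <a> tags as the parser walks the HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return hrefs in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page snippet for illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Listings</h1>
  <a href="/item/1">First</a>
  <p>No link here</p>
  <a href="/item/2">Second</a>
  <a name="anchor-only">No href</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))  # ['/item/1', '/item/2']
```

Pages that build their content with JavaScript will not expose it in the raw HTML; that is the situation the Selenium resource addresses.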



5. Data Analysis: Summarization and Visualization, Causation vs. Prediction | Wed 8 August, F50 in JMHH


Pre-class: Briefly review the Wharton HPCC Documentation Website and Wharton HPCC Documentation; Apply for a Wharton HPCC account by following Getting an Account

Resources: ggplot2 Tutorial; Notes on Generalized Linear Models
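Summarization usually starts with grouped statistics: split the rows by a category, then compute a count, mean, and median per group. In practice this is pandas' DataFrame.groupby in Python or dplyr's group_by() with summarise() in R; the sketch below uses only the standard library to show what such a summary computes. The records and the summarize_by_group helper are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, median

# Invented example records: (group, value).
records = [
    ("A", 3.0), ("A", 5.0), ("A", 4.0),
    ("B", 10.0), ("B", 12.0),
]


def summarize_by_group(rows):
    """Return {group: {"n": count, "mean": ..., "median": ...}} for (group, value) rows."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # Sort by group key so the output order is deterministic.
    return {
        key: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for key, vals in sorted(groups.items())
    }


if __name__ == "__main__":
    for group, stats in summarize_by_group(records).items():
        print(group, stats)
```

A table like this is often the first thing to plot (e.g. one bar or box per group in ggplot2) before moving on to models.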



6. Wharton HPCC and Behavioral Lab (Guest Speakers) | Fri 10 August, F50 in JMHH


Pre-class: Briefly review the Wharton HPCC Documentation Website and Wharton HPCC Documentation; Apply for a Wharton HPCC account by following Getting an Account



7. Intro to Machine Learning and Foundations of Deep Learning | Mon 13 August, F50 in JMHH


Resources: An Introduction to Statistical Learning with Applications in R (book); Mining of Massive Datasets (book); Deep Learning Reading List by Dokyun Lee; Stanford CS231n: Convolutional Neural Networks for Visual Recognition; Coursera Deep Learning Course; Deep Learning Textbook; WILDML Blog
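Many deep learning introductions build up from a single unit: a weighted sum passed through a sigmoid and trained by gradient descent on log-loss, which is the same model as logistic regression. The pure-Python sketch below fits that unit to an invented, linearly separable 1-D dataset; the learning rate, epoch count, and function names are arbitrary choices for illustration. A real analysis would use an established implementation (e.g. glm() with family = binomial in R, or scikit-learn's LogisticRegression in Python) and evaluate on held-out data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of mean log-loss: mean((p - y) * x) for w, mean(p - y) for b.
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Invented toy data: the label is 1 exactly when x is positive.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    w, b = train_logistic(xs, ys)
    print("w =", round(w, 3), "b =", round(b, 3))
    print([predict(w, b, x) for x in xs])
```

Because the toy data are perfectly separable, w keeps growing the longer you train; regularization or early stopping is the usual remedy.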



8. Intro to Natural Language Processing | Wed 15 August, F50 in JMHH


Resources: Natural Language Toolkit (NLTK); Text Mining with R; Stanford NLP Software; MALLET; David Blei on Topic Modeling; gensim Package; Word2Vec
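Most of the tools above start from similar preprocessing: lowercase the text, split it into tokens, drop very common words, and count what remains (a bag-of-words representation, which is also the input to topic models such as those in MALLET or gensim). The sketch below does this with the standard library; the tiny stop-word list and the regex tokenizer are simplifying assumptions, and NLTK provides more careful tokenizers and full stop-word lists.

```python
import re
from collections import Counter

# Deliberately tiny stop-word list; real lists (e.g. NLTK's) are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}


def tokenize(text):
    """Lowercase, then keep runs of letters, digits, or apostrophes as tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text):
    """Count tokens that are not stop words."""
    return Counter(token for token in tokenize(text) if token not in STOP_WORDS)


if __name__ == "__main__":
    doc = "The model reads the text, and the text is split into tokens."
    print(bag_of_words(doc).most_common(3))
```

Stacking one such Counter per document (documents as rows, vocabulary terms as columns) gives the document-term matrix used by Text Mining with R and most topic-modeling packages.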


Resources from Past Years