Instructor: Alex Miller

Course Description

The aim of this course is to familiarize incoming and current Wharton PhD students with the basic technical skills and tools required for empirical research. These include publicly available and open-source tools (e.g., AWS, Python, R) as well as Wharton-specific resources (e.g., the Wharton grid computing cluster, WRDS). The course is primarily concerned with acquiring, cleaning, managing, and analyzing data. It will provide hands-on experience with a variety of computing tools, including introductory machine learning and natural language processing techniques. By the end of this short course, students will have a better understanding of which tools are most appropriate for different data analysis tasks.

There is no prerequisite for this course. Feel free to attend the sessions selectively. Auditing is welcome. Each session will consist of roughly a 60-minute lecture followed by a 30-minute lab session, in which you are encouraged to work on exercises. There is no exam. Please bring your own laptop for this course.

Dates and Time


GitHub Repository

All notes and slides for the course can be downloaded from the course's GitHub repository.


  1. Intro, Unix, Git, and R Basics

    Mon 31 July, Room F55

  2. More R & Python Intro

    Wed 2 Aug, Room F55

  3. Wharton HPCC and Behavioral Lab (Guest Speakers)

    Fri 4 Aug, Room F55

  4. Structured Data Collection: Consuming APIs in Python

    Mon 7 Aug, Room F70

  5. Unstructured Data Collection in Python: Crawling and Scraping

    Wed 9 Aug, Room F70

  6. Advanced Scraping and Regex

    Fri 11 Aug, Room F70

  7. Intro to Text Mining and NLP

    Mon 14 Aug, Room F70

  8. Intro to Concepts in Machine Learning

    Wed 16 Aug, Room F70

Resources from Past Years