Lecture 14

Overview

  • Understand the difference between web scraping and using an Application Programming Interface (API)
  • Define HTML and CSS selectors
  • Introduce the rvest package for scraping in R
  • Demonstrate how to extract information from HTML pages
  • Practice scraping data

Before class

Several packages are needed for this lecture and the following lectures on the topic. They are all already installed on R Workbench. If you usually run R on your laptop rather than on R Workbench, I’d suggest following the scraping lectures on Workbench.

Readings

Readings for all lectures on this topic (both direct web scraping and collecting data through APIs) are posted here.

General Introductions:

Articles that summarize the scraping workflow:

API resources: Install-and-play API packages for R

Class materials

In-class materials (exercises and code) will be posted here shortly before class.