Lecture 14

Overview

  • Understand the difference between web scraping and using an Application Programming Interface (API)
  • Define HTML and CSS selectors
  • Introduce the rvest package for scraping in R
  • Demonstrate how to extract information from HTML pages
  • Practice scraping data

Before class

Several packages are needed for this lecture and the following lectures on the topic. They are all already installed on R Workbench. If you normally run R on your laptop rather than on R Workbench, I’d suggest following the scraping lectures on Workbench.

Readings

Readings for all lectures on the topic (both direct web-scraping and scraping using APIs) are posted here.

General Introductions:

Articles that summarize the scraping workflow:

API resources: Install-and-play API packages for R

Class materials

Run the code below in your console to download today’s in-class exercises:

  usethis::use_course("css-materials/getting-data-from-the-web-scraping")