Lecture 14
Overview
- Understand the difference between web scraping and using an Application Programming Interface (API)
- Define HTML and CSS selectors
- Introduce the rvest package for scraping in R
- Demonstrate how to extract information from HTML pages
- Practice scraping data
Before class
Several packages are needed for this lecture and the following lectures on the topic. They are all already installed on R Workbench. If you normally use R on your laptop rather than on R Workbench, I'd suggest following the scraping lectures on Workbench.
Readings
Readings for all lectures on the topic (both direct web-scraping and scraping using APIs) are posted here.
General Introductions:
- Chapters 1 and 4 in Web Scraping with R
- rvest documentation
- httr documentation
- Web Scraping using R Cheat Sheet
Articles that summarize the scraping workflow:
- Web scraping with R by William Marble
- Web scraping using R by Alex Bradley and Richard J. E. James
API resources:
- Install-and-play API packages for R
Class materials
In-class materials (exercises and code) will be posted here shortly before class.