Workshop: Web Scraping with R

Cape Town (14-15 June 2018)

2018-04-12 | Andrew B. Collier | talk: training, talk: workshop

Join Andrew Collier and Hanjo Odendaal for a two-day workshop on using R for web scraping.

Who should attend?

This workshop is aimed at beginner and intermediate R users who want to learn more about using R for data acquisition and management, with a specific focus on web scraping.

What will you learn?

You will learn:

  • data manipulation with dplyr, tidyr and purrr;
  • tools for accessing the DOM;
  • scraping static sites with rvest;
  • scraping dynamic sites with RSelenium; and
  • setting up an automated scraper in the cloud.

See the programme below for further details.

Where - Rise, Floor 5, Woodstock Exchange, 66 Albert Road, Woodstock, Cape Town
When - 14-15 June 2018
Who - Andrew Collier and Hanjo Odendaal

There are just 20 seats available. A 10% discount is available for groups of 4 or more people from a single organisation attending both days.

Email training@exegetic.biz if you have any questions about the workshop.

Register

Programme

Day 1

  • Motivating Example
  • R and the tidyverse
    • Vectors, Lists and Data Frames
    • Loading data from a file
    • Manipulating Data Frames with dplyr
    • Pivoting with tidyr
    • Functional programming with purrr
  • Introduction to scraping
    • Ethics
    • DOM
    • Developer Tools
    • CSS and XPath
    • robots.txt and site map
  • Scraping a static site with rvest
    • What happens under the hood
    • What the hell is curl?
    • Assisted Assignment: Movie information from IMDB

Day 2

  • Case Study: Investigating drug tests using rvest
  • Interacting with APIs
    • Using XHR to find an API
    • Building wrappers around APIs
  • Scraping a dynamic site with RSelenium
    • Why RSelenium is needed
    • Navigating around web pages
    • Combining RSelenium with rvest
    • Useful JavaScript tools
    • Case Study
  • Deploying a Scraper in the Cloud
    • Launching and connecting to an EC2 instance
    • Headless browsers
    • Automation with cron

