Web Scraping R Data From JSON

This is the second article in a series on scraping web data into R; Part I is here, and we offer some suggestions for potential projects here.

JSON (JavaScript Object Notation) has emerged as one of the most common standards for sharing data on the web, particularly data consumed by front-end JavaScript applications. It is a key-value format that gives the reader a high degree of context about what each value means. The key-value structure can be nested, permitting data packets like the following:

{"book": "A Midsummer Night's Dream",
 "author": "William Shakespeare"}
Several R libraries make it easy to process and digest JSON data. We will present an example using one of them, jsonlite, which began as a fork of another leading library, RJSONIO. We selected jsonlite for its relative ease of use.

We start with the preliminaries, since jsonlite is not part of the base R distribution:

install.packages("jsonlite")

library(jsonlite)
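With the package loaded, we can verify that fromJSON() handles nested snippets like the one above. (The publisher object below is our own addition, invented purely to illustrate nesting.)

```r
library(jsonlite)

# A small nested JSON snippet; the "publisher" object is invented for illustration.
snippet <- '{"book": "A Midsummer Night\'s Dream",
             "author": "William Shakespeare",
             "publisher": {"name": "Example Press", "year": 1600}}'

parsed <- fromJSON(snippet)
parsed$book             # a top-level value
parsed$publisher$name   # a nested value, reached with $
```

Nested objects come back as named lists, so values at any depth are reachable with the usual `$` accessor.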
We will be using a placeholder generator for JSON data:

https://jsonplaceholder.typicode.com/posts

This service returns a faux list of JSON records representing blog posts or news articles.

Moving this information into an R data frame is fairly straightforward:

json_file <- "https://jsonplaceholder.typicode.com/posts"

data <- fromJSON(json_file)

This yields a tidy data frame containing the expected fields.

For those of you who prefer to browse the data in a text editor or Excel, you can easily dump it to a CSV file with the following one-liner:

write.csv(data, "data.csv")
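Note that by default write.csv prepends R's row numbers as an extra first column; if you want the file to contain only the scraped fields, pass row.names = FALSE:

```r
library(jsonlite)

data <- fromJSON("https://jsonplaceholder.typicode.com/posts")

# row.names = FALSE omits R's automatic row-number column from the file.
write.csv(data, "data.csv", row.names = FALSE)
```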

The package supports more advanced data retrieval, including:

  • Accessing APIs that require a key
  • Extracting and concatenating multi-page scrapes into a single data frame
  • POST requests with complex headers and data elements
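The multi-page case can be sketched as follows. (The endpoint URL and the page and api_key parameter names below are invented for illustration; substitute those of your actual API.)

```r
library(jsonlite)

# Hypothetical paginated API; the URL and parameter names are assumptions.
fetch_page <- function(page, api_key) {
  url <- paste0("https://api.example.com/articles",
                "?page=", page, "&api_key=", api_key)
  fromJSON(url)   # each page parses into its own data frame
}

# Fetch pages 1-3 and concatenate them into a single data frame:
# pages    <- lapply(1:3, fetch_page, api_key = "YOUR_KEY")
# all_data <- do.call(rbind, pages)
```

For POST requests with custom headers, jsonlite is commonly paired with an HTTP client package such as httr, which handles authentication and request bodies.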

A set of examples (provided by the package author) is detailed here.

Looking for more options for web scraping in R? Check out our other guides:

Ready To Put This Into Action? Check Out Our Project Suggestions!