Second article in a series covering scraping data from the web into R; Part I is here and we give some suggestions on potential projects here.
JSON has emerged as one of the common standards for sharing data on the web, particularly data that may be consumed by front-end JavaScript applications. JSON (JavaScript Object Notation) is a key:value format which provides the reader with a high degree of context about what a value means. Key-value structures can also be nested inside one another. A simple data packet looks like the following:
{"book": "A Midsummer Night's Dream",
"author": "William Shakespeare",
"price": 5.99,
"inventory": 12}
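To make the nesting idea concrete, here is a minimal sketch that parses a nested variant of this record with jsonlite's fromJSON. It assumes jsonlite is already installed (installation is covered further down); the "stock" grouping and the variable names are illustrative assumptions, not part of the original packet.

```r
library(jsonlite)

# A nested variant of the book record; the "stock" sub-object is hypothetical.
json_text <- '{
  "book": "A Midsummer Night\'s Dream",
  "author": "William Shakespeare",
  "stock": {"price": 5.99, "inventory": 12}
}'

# A JSON object is parsed into a named R list.
record <- fromJSON(json_text)

record$author        # top-level value
record$stock$price   # value inside the nested object
```

Each level of nesting becomes another level of the R list, so values are reached by chaining the $ operator.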
So, if you're wondering how to access JSON, or better yet, convert JSON to data frame elements, read on.
R jsonlite – reading json in r
Several libraries have emerged for R users that enable you to easily process and digest JSON data. Here is an example from one of these libraries, jsonlite, which is a fork of another leading library, RJSONIO. We selected this library due to its relative ease of use.
Since jsonlite doesn't come as part of the R standard libraries, we must install it:
# r web / r json - installing jsonlite
install.packages("jsonlite")
library("jsonlite")
We will be using a placeholder generator for JSON data:
https://jsonplaceholder.typicode.com/posts
This service returns a faux list of JSON data, representing a list of blog posts or news articles.
Moving this information into an R data frame is fairly straightforward:
# r web / r json - get json data from url
json_file <- "https://jsonplaceholder.typicode.com/posts"
data <- fromJSON(json_file)
This yields a tidy data frame with one row per post and one column per JSON field (userId, id, title, and body).
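To see why the result is a data frame without hitting the network, here is a small offline sketch. The two records are made-up stand-ins that mimic the shape of the posts endpoint; only fromJSON itself comes from the article.

```r
library(jsonlite)

# Hypothetical sample mimicking two records from the posts endpoint.
posts_json <- '[
  {"userId": 1, "id": 1, "title": "first post",  "body": "hello"},
  {"userId": 1, "id": 2, "title": "second post", "body": "world"}
]'

# An array of similarly shaped objects is simplified into a data frame.
posts <- fromJSON(posts_json)

str(posts)                      # column names and types
posts[posts$id == 2, "title"]   # ordinary data frame indexing works
```

fromJSON accepts a JSON string as well as a URL or file path, which makes it easy to experiment with a small sample before pointing it at a live service.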
Completing The Cycle – r json to csv
For those of you who prefer to browse through the data in a text editor or Excel, you can easily dump the data frame out to a CSV file with the following one-liner:
# r web / r json - json to csv in r - saving it for later
write.csv(data, "data.csv")
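By default write.csv also writes the row names as an extra first column, which may or may not be wanted. The sketch below shows one way to suppress them and check the result by reading the file back. The small data frame and the temporary file path are stand-ins chosen for illustration.

```r
# Small stand-in for the data frame built earlier (values are made up).
posts <- data.frame(id = 1:2, title = c("first post", "second post"))

# tempfile() keeps the sketch from overwriting a real file.
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE drops the extra column of row numbers.
write.csv(posts, csv_path, row.names = FALSE)

# Read the file back to confirm the contents survived the round trip.
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
round_trip
```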
The package can support more advanced data retrieval, including:
- Accessing APIs which require a key
- Extracting and concatenating multi-page scrapes into a single data frame
- POST request operations with complex headers and data elements
A set of examples (provided by the package author) is detailed here.
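As a rough illustration of the multi-page case from the list above, the sketch below combines several pages with jsonlite's rbind_pages. The page payloads are inline, made-up strings so the example runs offline; in a real scrape each page would be fetched from the API, and the paging parameter name would depend on that API.

```r
library(jsonlite)

# Hypothetical page payloads. In a real scrape each one would come from
# something like fromJSON(paste0(base_url, "?page=", i)), where base_url
# and the "page" parameter are assumptions that vary by API.
page_json <- c(
  '[{"id": 1, "title": "a"}, {"id": 2, "title": "b"}]',
  '[{"id": 3, "title": "c"}]'
)

# Parse each page into its own data frame.
pages <- lapply(page_json, fromJSON)

# rbind_pages() stacks the per-page data frames into one.
all_posts <- rbind_pages(pages)
nrow(all_posts)
```

Collecting the pages in a list and binding them once at the end tends to be simpler than growing a data frame inside the loop.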