CSV files are the most basic option for moving data between systems. They are supported by every major database and spreadsheet system. It is trivial to generate a CSV in almost every programming language, including R, and the files can be edited in any text editor.
Wouldn’t it be nice to be able to directly download a CSV file into R? This would make it easy for you to update your project if the source data changed. This might also help in the event you need to download a whole bunch of files in a single batch.
Analysts of the world, rejoice! You’re in luck. R has a couple of ways to get this done.
How To Get A CSV File From A Website
Fortunately, there’s an easy trick with the read.csv() function, which can be used to import data from the web into a data frame.
Simply take the URL and feed it into read.csv(). While the most common use for this function is reading CSV files from your computer, it is robust enough to be used for broader purposes. Its file argument accepts a character string that can be either a local path or a URL, and a URL is parsed as if it were a text file on your hard drive. To download a CSV file from the web and load it into R (properly parsed), all you need to do is pass the URL to read.csv() in the same manner you would pass a filename.
# r read csv from url
# allows you to directly download csv file from website
data <- read.csv("http://apps.fs.fed.us/fiadb-downloads/CSV/LICHEN_SPECIES_SUMMARY.csv")
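Because read.csv() treats a URL like a file path, the same call can be wrapped in a loop for the batch case mentioned earlier. Here is a minimal sketch, assuming a hypothetical helper called read_csv_urls() (not part of base R). To stay runnable without network access it uses file:// URLs that point at temporary files; real http:// URLs would go in the same place.

```r
# Hypothetical helper (not part of base R): read several CSV URLs into a named list.
read_csv_urls <- function(urls) {
  results <- lapply(urls, function(u) {
    # tryCatch keeps one bad URL from aborting the whole batch;
    # a failed read is stored as NULL
    tryCatch(read.csv(u, stringsAsFactors = FALSE),
             error = function(e) NULL)
  })
  names(results) <- urls
  results
}

# Stand-in data: two small CSV files written to temporary locations.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, name = c("a", "b", "c")), f1, row.names = FALSE)
write.csv(data.frame(id = 4:5, name = c("d", "e")), f2, row.names = FALSE)

# Build file:// URLs; read.csv() opens them the same way it opens http:// URLs.
urls <- paste0("file://", normalizePath(c(f1, f2), winslash = "/"))
batch <- read_csv_urls(urls)

# Each list element is a data frame (or NULL if that URL could not be read).
str(batch)
```

Returning NULL for a failed URL is only one design choice; depending on the project you may prefer to stop on the first error instead.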
In older versions of R, this method only works if the file is being served via plain http; recent versions can usually open https URLs directly as well. Unfortunately, plain http is becoming increasingly rare. Due to major companies pushing for more security on the Internet, more websites are using secure https to handle data. Fortunately, if read.csv() cannot open an https URL on your setup, there is a simple tweak we can make to the read.csv() one-liner, using the getURL() function from the RCurl library, that solves this:
Using RCurl To Download An https File
Our next example is a list of lost pets in Seattle, Washington. We’re adapting our example to use RCurl to handle the file transfer and reading the result using read.csv() with its text argument.
# r download csv from url
# gives additional functions to handle secure https
library(RCurl)
download <- getURL("https://data.kingcounty.gov/api/views/yaai-7frk/rows.csv?accessType=DOWNLOAD")
data <- read.csv(text = download)
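The text argument is what makes this work: read.csv() parses a character string that already holds CSV content instead of opening a file. A small sketch of that step on its own, with a hard-coded string standing in for whatever getURL() would return (the column names and rows here are invented purely for illustration):

```r
# Stand-in for the string returned by getURL(); these rows are made up.
download <- "animal_type,city\nCat,Seattle\nDog,Kent\n"

# text = tells read.csv() to parse the string itself rather than treat it as a path
pets <- read.csv(text = download, stringsAsFactors = FALSE)

# Inspect the resulting data frame
str(pets)
```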
This is a good option if you’re working with basic reporting and data extraction systems.
R can also handle more complicated data requests. The next step up from processing CSV files is to use readLines() and the RCurl and XML libraries to handle more complicated import operations. This gives you some capacity to parse and reshape the contents of the web page you are scraping. We also have an article covering JSON-based web scraping options.
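As a rough illustration of the readLines() approach, the sketch below reads raw lines, discards the ones that are not data, and only then hands the remainder to read.csv(). It assumes a made-up source with a comment line ahead of the CSV rows, and it reads from a temporary file so it runs offline; a url() connection could be passed to readLines() in the same position.

```r
# Stand-in for a remote resource: a file with a preamble line before the CSV data.
src <- tempfile(fileext = ".txt")
writeLines(c("# exported report", "id,score", "1,90", "2,85"), src)

# readLines() accepts a file path or a connection and returns one string per line
raw_lines <- readLines(src)

# Drop lines starting with "#" before parsing the rest as CSV
csv_lines <- raw_lines[!grepl("^#", raw_lines)]
scores <- read.csv(text = paste(csv_lines, collapse = "\n"))

# Inspect the cleaned data frame
str(scores)
```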
And finally, if you’re still looking for a project – here are some web scraping project ideas.