CSV files are the most basic option for moving data between systems. They are supported by every major database and spreadsheet program, they can be generated trivially from almost any programming language (including R), and they can be edited in any text editor.
Wouldn’t it be nice to download a CSV file from the web directly into your R session? This would make it easy to update your project whenever the source data changes. It would also help if you ever need to download a whole bunch of files in a single batch.
Analysts of the world, rejoice! You’re in luck: R has a couple of ways to get this done.
How To Get A CSV File From A Website
Fortunately, there’s an easy trick with the read.csv() function which can be used to import data from the web straight into a data frame.
Simply pass the URL to read.csv() as the file argument. While the most common use of this function is reading CSV files from your computer, it is robust enough to be used for broader purposes: it can accept any valid connection or character string and parse it as if it were a text file on your hard drive, picking up column names and row values along the way. To download a CSV file from the web and load it into R (properly parsed), all you need to do is pass the URL to read.csv() in the same manner you would pass a filename.
# r read csv from url
# reads a csv file directly from a website into a data frame
data <- read.csv("http://apps.fs.fed.us/fiadb-downloads/CSV/LICHEN_SPECIES_SUMMARY.csv")
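Once the file has loaded, it’s worth a quick sanity check to confirm the download parsed the way you expected. A minimal sketch, reusing the same URL as above (the server may, of course, reorganize its files at some point):

```r
# Sanity-check an imported data frame: structure, first rows, and size.
data <- read.csv("http://apps.fs.fed.us/fiadb-downloads/CSV/LICHEN_SPECIES_SUMMARY.csv")

str(data)    # column names and the type R guessed for each
head(data)   # first six rows
dim(data)    # number of rows and columns
```

If a column that should be numeric comes back as character, the file was probably not parsed the way you assumed, and it’s time to look at arguments like sep and header.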
This method only works if the file is being served via plain http. Unfortunately, that is becoming increasingly rare. With major companies pushing for more security on the Internet, most websites now serve data over secure https, which read.csv() on its own cannot always fetch. Fortunately, there is a simple tweak we can make to the read.csv() one-liner, using the getURL() function from the RCurl package, that solves this issue and lets us read a CSV from a web address that read.csv() alone could not reach:
Using RCurl to download an https file
Our next example is a list of lost pets in Seattle, Washington. We’re adapting our example to use the RCurl package to handle the secure transfer, then reading the result with read.csv():
# r download csv from url
# RCurl provides getURL(), which can handle secure https transfers
library(RCurl)

download <- getURL("https://data.kingcounty.gov/api/views/yaai-7frk/rows.csv?accessType=DOWNLOAD")
data <- read.csv(text = download)
This is a good option if you’re working with basic reporting and data-extraction pipelines, especially if you need to pull in multiple CSV files with a wide variety of variable names.
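The batch-download scenario mentioned earlier is a natural extension of this same pattern. Here is a sketch using hypothetical URLs (substitute your own) and the RCurl package:

```r
# Read several remote CSVs into a named list of data frames.
library(RCurl)

# Placeholder URLs -- replace with the files you actually need.
urls <- c(
  "https://example.com/data/january.csv",
  "https://example.com/data/february.csv"
)

# Fetch each file and parse it with read.csv(text = ...).
datasets <- lapply(urls, function(u) read.csv(text = getURL(u)))
names(datasets) <- basename(urls)

# Each element is now an ordinary data frame, e.g.:
# datasets[["january.csv"]]
```

Keeping the results in a named list makes it easy to loop over them later, or to stack them into one table with do.call(rbind, datasets) when they share the same columns.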
R can also handle more complicated requests. The next step up from processing CSV files is to use readLines together with the RCurl and XML packages to handle more involved import operations. This gives you some capacity to parse and reshape the contents of the web page you are scraping. We also have an article covering JSON-based web scraping options.
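As a rough sketch of that next step, here is how the XML package’s readHTMLTable() can pull tabular data out of an ordinary web page fetched with getURL(). The URL is a placeholder, and the code assumes the page actually contains an HTML table:

```r
library(RCurl)
library(XML)

# Fetch the raw page over https (placeholder URL -- substitute your own).
page <- getURL("https://example.com/report.html")

# Parse every <table> element on the page into a list of data frames.
tables <- readHTMLTable(page, stringsAsFactors = FALSE)

# For pages that aren't structured as tables, readLines() returns the
# raw text one line at a time for you to parse yourself:
# lines <- readLines(url("http://example.com/report.txt"))
```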
And finally, if you’re still looking for a project – here are some web scraping project ideas.