The R programming language is an invaluable resource for analytics, data science, and machine learning. If you want to regularly analyze large amounts of data then you’d be hard-pressed to find a better programming language than R. However, if you’re dealing with a constant influx of data then you might want to automate aspects of its analysis. This is especially true for tasks involving machine learning. After all, if you’re working with automated processes then it’s only logical to leverage that concept for your own use. So how would you go about creating an automated workflow in R? The answer for many projects is to use the openxlsx package to automate excel exports and formatting. Read on to discover exactly how to accomplish all of that and more.
R, Openxlsx, and Your Data
You might be wondering why you’d want to export data from your script. After all, that’s where you’re doing all of your processing anyway. But take a moment to think about the nature of your workflow. Over the course of your day, you’re probably taking data from a wide variety of sources and moving it into different contexts. That’s exactly what you’re enabling for your digitally processed data by adding import/export functionality. It also gives you the ability to set up automated workflows. For example, imagine that you have an R script that you can only run after another script has finished crunching some numbers. You’d normally need to manually move information from one script to the other. Even if your initial script dumps data into a file the lack of standardization means that it’s going to be of limited universal utility.
Now imagine that you were scheduling your script to run at a predetermined time every day. And that when the script finished up it would export its data into an Excel file in a shared, networked, directory. You or anyone else on the network could then set up a cron job to check for new files and import the data when it becomes available. Just like that, you’ve removed the need to alert other people of newly available data. And you’ve also saved yourself the time and effort needed to manually export and import data between your various scripts. But this does raise a question. How do you go about exporting to a Microsoft-based format?
There was a time when working with Word, Excel, and other file formats from Microsoft was difficult even at the best of times. But R has a library called openxlsx that makes it easy to work with Excel-formatted data. The only caveat is that you’ll need to actually install the openxlsx package before you can take advantage of its additional functionality. But since it’s available in CRAN this is as easy as just typing the following.
install.packages(“openxlsx”)
Once you’ve installed openxlsx you’re ready to discover just how easy it makes Excel file access.
Using Openxlsx With Excel Files
Take a look at this simple example of how we could use the openxlsx library to write a data frame into an excel spreadsheet.
library(openxlsx)
marsMoons <- data.frame(
moonName = c(“Phobos”, “Deimos”),
moonDistance = c(9378, 23463)
)
ourWorkbook <- createWorkbook()
addWorksheet(ourWorkbook, sheetName = “Sheet1”)
writeData(ourWorkbook, sheet = “Sheet1”, x = marsMoons, rowNames = TRUE, colNames = TRUE)
saveWorkbook(ourWorkbook, file = “mars_moons.xlsx”, overwrite = TRUE)
We begin by importing the openxlsx library. Don’t forget to install it before proceeding with these examples. Whether that’s through CRAN or any other method. We proceed to create a data frame to use for our excel export. This data frame, marsMoons, contains the moons of mars and their distance from their parent planet.
Assuming you’ve successfully installed and loaded up openxlsx you can now create a workbook. The library makes this as easy as calling createWorkbook and assigning the results to a variable. In this case, we assign it to ourWorkbook.
We proceed to add a worksheet, Sheet1, to ourWorkbook. You can think of sheets as analogous to pages in a book. So we start out with the first page as Sheet1. We continue on to write our data frame into the workbook. This is done through the writeData command. We simply need to supply arguments specifying a workbook, sheet, data frame, and whether to label the rows and columns. Once that’s finished we can finally export our data frame in Excel format. We use override as TRUE in order to overwrite any existing iterations from previous runs. Note as well that our .xlsx file uses underscores in the name rather than the camel notation found in the rest of the script. The new naming convention will prevent any compatibility issues that might arise from moving between case-sensitive and case-insensitive file systems.
We could also streamline this process to make it a little easier to extend into a more universal tool that’s easy to modify for a variety of tasks. Take a look at the following example.
library(openxlsx)
writeExcel <- function(ourData, file) {
ourWorkbook <- createWorkbook()
addWorksheet(ourWorkbook, sheetName = “Sheet1”)
writeData(ourWorkbook, sheet = “Sheet1”, x = ourData, rowNames = TRUE, colNames = TRUE)
saveWorkbook(ourWorkbook, file = file, overwrite = TRUE)
}
readExcel <- function(file) {
ourWorkbook <- loadWorkbook(file)
ourData <- read.xlsx(ourWorkbook, sheet = “Sheet1”)
return(ourData)
}
marsMoons <- data.frame(
moonName = c(“Phobos”, “Deimos”),
moonDistance = c(9378, 23463)
)
writeExcel(marsMoons, “mars_moons.xlsx”)
ourData <- readExcel(“mars_moons.xlsx”)
print(ourData)
We’re essentially using the same technique as in the first example. The main difference is that we’ve moved the data frame export into a writeData function. We also have a new readExcel function that can ensure that we’ve properly exported and formatted our data frame to an Excel file. This essentially verifies that we’re writing everything out correctly and that other scripts will be able to make use of our data.
The openxlsx’s read system is fairly analogous to its write function. We simply assign a workbook instance and then provide a filename. Then we read out the data by specifying the workbook instance and the sheet we want to use.
Taking Things to the Next Level With Automation
Our earlier musings were centered around the idea of automating daily workflows. And it’s actually fairly easy to build on the prior example in order to do so. The best way to implement it would be to simply use your operating system’s native scheduling system to run your script every day. It’s true that you could initiate this behavior from within the R script itself. But there’s one huge issue with doing so. Your workflow becomes entirely dependent on the stability of R’s interpreter. And a lot of things can interfere with a virtual machine’s stability.
Operating system reboots and updates, RAM usage issues, and even problems with drive overuse can cause a script to stop executing. And once that happens it won’t be able to automatically restart on its own. But if we tie the script into the operating system’s own scheduling system we can ensure that the script will always be run from scratch as long as the operating system is functional. Of course, the specifics will vary between operating systems.
Linux and OSX users will want to use crontab. The editor is brought up by typing “crontab -e”. And you can specify that your R script should be run every night at midnight with the following.
0 0 * * * YourRinterpreter /pathToYourRscriptAndItsName
In Windows, you can use the Task Scheduler. You can launch it from the Windows menu or by typing taskschd.msc on the command line. Then create your new action within the system’s Action – Create Basic Task – Trigger. As long as you’ve associated .r files with your R interpreter you should be able to call your script as you would any compiled executable.
You’ll of course need to modify these options to fit your personal requirements. But these are all of the pieces you’ll need to automate your workflow by formatting and exporting your data as an Excel file.