Graph of data -ProgrammingR header image
Beginner to advanced resources for the R programming language

Articles

Calling Python from R with rPython

Python has generated a good bit of buzz over the past year as an alternative to R. Personal biases aside, an expert makes the best use of the available tools, and sometimes Python is better suited to a task. As a case in point, I recently wanted to pull data via the Reddit API. There isn’t an R package that provides easy access to the Reddit API, but there is a very well designed and documented Python module called PRAW (or, the Python Reddit API Wrapper). Using this module I was able to develop a Python-based solution to get and analyze the data I needed without too much trouble.

However, I prefer working in R, so I was glad to discover the rPython package, which enables calling Python scripts from R. After finding rPython, I was able to rewrite my purely Python script as a primarily R-based program.

If you want to use rPython there are a couple of prerequisites you’ll need to address if you haven’t already. No surprise, you’ll need to have Python installed. After that, you’ll need to install the PRAW module via pip install praw. Finally, install the rPython package from CRAN. (But see the note below first if you’re on Windows.)

After you’ve completed those steps, it’s as easy as writing your Python script and adding a line or two to your R code.

First create a Python script that imports the praw module and does the first data call:

import praw

# Set the user agent information
# IMPORTANT: Change this if you borrow this code. Reddit has very strong 
# guidelines about how to report user agent information 
r = praw.Reddit('Check New Articles script based on code by ProgrammingR.com')
   
# Create a (lazy) generator that will get the data when we call it below
new_subs = r.get_new(limit=100)

# Get the data and put it into a usable format
new_subs=[str(x) for x in new_subs]

Since the Python session is persistent, we can also create a shorter Python script that we can use to fetch updated data without reimporting the praw module

# Create a (lazy) generator that will get the data when we call it below
new_subs = r.get_new(limit=100)

# Get the data and create a list of strings
new_subs=[str(x) for x in new_subs]

Finally, some R code that calls the Python script and gets the data from the Python variables we create:

library(rPython)

# Load/run the main Python script
python.load("GetNewRedditSubmissions.py")

# Get the variable
new_subs_data <- python.get("new_subs")

# Load/run re-fecth script
python.load("RefreshNewSubs.py")

# Get the updated variable
new_subs_data <- python.get("new_subs")

head(new_subs_data)

A few final notes:

  • The main drawback to the rPython package is that it currently doesn’t run on Windows. The developer (Carlos J. Gil Bellosta) is working to fix this, though. If that wrinkle gets resolved, I can see this being a very popular package.
  • You can use RStudio to write your Python programs, which is easier than switching to another IDE for simple scripts. However, it causes an issue with EOL characters. Namely, you need to add a blank line at the end of each .py file to get it to load properly.
  • The Python session rPython initiates is associated with the R session. Any Python modules you load or variables you create will be available until you remove them or close the R session.


ProgrammingR offers two ways for you to stay up to date. To be notified when new articles and book reviews are posted, subscribe to the ProgrammingR articles feed. To be notified when new R-based job listings are posted, subscribe to the ProgrammingR jobs feed. You may also want to post an R consultant listing, hire an R consultant, or ask a question in the forums.

Creating your personal, portable R code library with GitHub

As I discussed in a previous post, I have a few helper functions I’ve created that I commonly use in my work. Until recently, I manually included these functions at the start of my R scripts by either the tried and true copy-and-paste method, or by extracting them from a local file with the <code>source()</code> function. The former approach has the benefit of keeping the helper code inextricably attached to the main script, but it adds a good bit of code to wade through. The latter approach keeps the code cleaner, but requires that whoever is running the code always has access to the sourced file and that it is always in the same relative path – and that makes sharing or moving code more difficult. The start of a recent project requiring me to share my helper function library prompted me to find a better solution.

The resulting approach takes advantage of GitHub Gists and R’s ability to source via a web-based location to enable you to create a personal, portable library of R functions for private use or to share.

The process is very straightforward. If you don’t have a GitHub account, getting one is the first step. After doing so, click on “Gist” in the menu bar to create a new Gist. (There are a couple of reasons for using a Gist instead of a formal repository. For one, repositories cannot be made private unless you have a paid GitHub membership. Gists, however, can be created as “Secret” and therefore available only to people who have the exact URL.) Name your Gist, add your code, and choose whether you want it to be a secret or public Gist to save that revision. After saving your Gist you will be taken to a screen displaying your new Gist.

Now that you’ve created your Gist, you just need to source it in your R code. That step is easily done via the <code>source()</code> function. Rather than including a file path as you would with a local file, you simply include the URL to your Gist. Note that you can’t just copy the URL of the GitHub page you’re currently on. You need to source the URL of the raw code. To get that URL, click on the “<>” button on the top right of your Gist (circled in red below). This will take you to the raw code *for the current revision*. To get the path that always points to the most recent revision, remove everything in the URL after “/raw/”.

GitHubGist

Regardless of which version you link to, you’ll still be left with a long URL that includes a randomly generated alphanumeric string that you don’t want to have to try to remember. Here’s a protip, use a URL shortener that let’s you define a custom URL (e.g., tinyurl) to create a shortened version that is easy to remember. Something like the little library at http://tinyurl.com/ProgRLib  (YES! You can use and add to this library!).

Now you’ve got a portable, easy to reference, personal library of functions that you can easily include in your code or share with others.



ProgrammingR offers two ways for you to stay up to date. To be notified when new articles and book reviews are posted, subscribe to the ProgrammingR articles feed. To be notified when new R-based job listings are posted, subscribe to the ProgrammingR jobs feed. You may also want to post an R consultant listing, hire an R consultant, or ask a question in the forums.

Global Indicator Analyses with R

I was recently asked by a client to create a large number of “proof of concept” visualizations that illustrated the power of R for compiling and analyzing disparate datasets. The client was specifically interested in automated analyses of global data. A little research led me to the WDI package.

The WDI package is a tool to “search, extract and format data from the World Bank’s World Development Indicators” (WDI help). In essence, it is an R-based wrapper for the World Bank Economic Indicators Data API. When used in combination with the information on the World Bank data portal it provides easy access to thousands of global datapoints.

Here is an example use case that illustrates how simple and easy it is to use, especially with a little help from the countrycode and ggplot2 packages:

library(WDI)
library(ggplot2)
library(countrycode)

# Use the WDIsearch function to get a list of fertility rate indicators
indicatorMetaData <- WDIsearch("Fertility rate", field="name", short=FALSE)

# Define a list of countries for which to pull data
countries <- c("United States", "Britain", "Sweden", "Germany")

# Convert the country names to iso2c format used in the World Bank data
iso2cNames <- countrycode(countries, "country.name", "iso2c")

# Pull data for each countries for the first two fertility rate indicators, for the years 2001 to 2011
wdiData <- WDI(iso2cNames, indicatorMetaData[1:2,1], start=2001, end=2011)

# Pull out indicator names
indicatorNames <- indicatorMetaData[1:2, 1]

# Create trend charts for the first two indicators
for (indicatorName in indicatorNames) { 
  pl <- ggplot(wdiData, aes(x=year, y=wdiData[,indicatorName], group=country, color=country))+
    geom_line(size=1)+
    scale_x_continuous(name="Year", breaks=c(unique(wdiData[,"year"])))+
    scale_y_continuous(name=indicatorName)+
    scale_linetype_discrete(name="Country")+
    theme(legend.title=element_blank())+
    ggtitle(paste(indicatorMetaData[indicatorMetaData[,1]==indicatorName, "name"], "\n"))
  ggsave(paste(indicatorName, ".jpg", sep=""), pl)
}

WDI package visualization 1WDI package visualization 2

This code can be adapted to quickly pull and visualize many pieces of data. Even if you don’t have an analytic need for the WDI data, the ease of access and depth of information available via the WDI package make them perfect for creating toy examples for classes, presentations or blogs, or conveying the power and depth of available R packages.



ProgrammingR offers two ways for you to stay up to date. To be notified when new articles and book reviews are posted, subscribe to the ProgrammingR articles feed. To be notified when new R-based job listings are posted, subscribe to the ProgrammingR jobs feed. You may also want to post an R consultant listing, hire an R consultant, or ask a question in the forums.

SPARQL with R in less than 5 minutes

In this article we’ll get up and running on the Semantic Web in less than 5 minutes using SPARQL with R. We’ll begin with a brief introduction to the Semantic Web then cover some simple steps for downloading and analyzing government data via a SPARQL query with the SPARQL R package.

What is the Semantic Web?

To newcomers, the Semantic Web can sound mysterious and ominous. By most accounts, it’s the wave of the future, but it’s hard to pin down exactly what it is. This is in part because the Semantic Web has been evolving for some time but is just now beginning to take a  recognizable shape (DuCharme 2011).  Detailed definitions of the Semantic Web abound, but simply put, it is an attempt to structure the unstructured data on the Web and to formalize the standards that make that structure possible. In other words, it’s an attempt to create a data definition for the Web.

The primary method for accessing data made available on the Semantic Web is via SPARQL queries to a data provider’s endpoint. Endpoints are portals to data that a provider has made available for querying. This data is often published in RDF format. If you’re familiar with using SQL to query a database, then the idea of using a query language (SPARQL) to access a structured database or data repository (the endpoint) should be common ground. The practice is also similar to webscraping an XML document with a known structure.

The Semantic Web and R

A primary goal of the Semantic Web movement is to enable machines and applications to interface using standardized data definitions. As the Semantic Web grows, it will be ripe for mining with statistical programs that already lend themselves to automation – such as R. To that end, let’s dive into a simple query that will illustrate 75% of what you’ll be doing on the Semantic Web.

Accessing Data.gov datasets with SPARQL

We’ll use data at the Data.gov endpoint for this example. Data.gov has a wide array of public data available, making this example generalizable to many other datasets. One of the key challenges of querying a Semantic Web resource is knowing what data is accessible. Sometimes the best way to find this out is to run a simple query with no filters that returns only a few results or to directly view the RDF. Fortunately, information on the data available via Data.gov has been cataloged on a wiki hosted by Rensselaer. We’ll use Dataset 1187 for this example. It’s simple and has interesting data – the total number of wildfires and acres burned per year, 1960-2008.

R code

Data.gov provides a Virtuoso endpoint here, which we could use to submit manual queries. But we want to automate this process, so we’ll use Willem Robert van Hage and Tomi Kauppinen’s SPARQL package to access the endpoint.

library(SPARQL) # SPARQL querying package
library(ggplot2)

# Step 1 - Set up preliminaries and define query
# Define the data.gov endpoint
endpoint <- "http://services.data.gov/sparql"

# create query statement
query <-
"PREFIX  dgp1187: <http://data-gov.tw.rpi.edu/vocab/p/1187/>
SELECT ?ye ?fi ?ac
WHERE {
?s dgp1187:year ?ye .
?s dgp1187:fires ?fi .
?s dgp1187:acres ?ac .
}"

# Step 2 - Use SPARQL package to submit query and save results to a data frame
qd <- SPARQL(endpoint,query)
df <- qd$results

# Step 3 - Prep for graphing

# Numbers are usually returned as characters, so convert to numeric and create a
# variable for "average acres burned per fire"
str(df)
df <- as.data.frame(apply(df, 2, as.numeric))
str(df)

df$avgperfire <- df$ac/df$fi

# Step 4 - Plot some data
ggplot(df, aes(x=ye, y=avgperfire, group=1)) +
geom_point() +
stat_smooth() +
scale_x_continuous(breaks=seq(1960, 2008, 5)) +
xlab("Year") +
ylab("Average acres burned per fire")

ggplot(df, aes(x=ye, y=fi, group=1)) +
geom_point() +
stat_smooth() +
scale_x_continuous(breaks=seq(1960, 2008, 5)) +
xlab("Year") +
ylab("Number of fires")

ggplot(df, aes(x=ye, y=ac, group=1)) +
geom_point() +
stat_smooth() +
scale_x_continuous(breaks=seq(1960, 2008, 5)) +
xlab("Year") +
ylab("Acres burned")

# In less than 5 mins we have written code to download just
# the data we need and have an interesting result to explore!

If you’re familiar with R, the only foreign code is that in Steps 1 and 2. There we define the location of the endpoint and write the SPARQL query. If you have experience with SQL, you’ll notice that SPARQL syntax is very similar. The most notable exception is that Semantic Web data is structured around chunks of data that contain three pieces of information. In Semantic Web terminology these three pieces of information as referred to as subjects, predicates and objects and together they are called “triples”. A SPARQL query returns pieces of these chunks of data; exactly which pieces are returned depends on the query. We have defined a prefix which establishes that everywhere we say “dgp1187:” we want it to know that we mean “<http://data-gov.tw.rpi.edu/vocab/p/1187/>”. This saves us from having to retype long URIs every time we need to reference them. URIs are used extensively on the Semantic Web, so this is a very helpful feature.

In English, our query says, “Give me the values for the attributes “fires”, “acres” and “year” wherever they are defined, and assign them to variables named “fi”, “ac” and “ye” respectively. Also fetch the location of the value in the data as a variable called ?s and merge the other data values by that variable.”

The last bit of the query is the closest we get to SPARQL magic. Because the variable ?s is present in each clause of the query, our fire, acres burned, and year data will be merged by that variable. This is somewhat analogous to saying, “Once you’ve found a row of data that contains information on the number of fires, also return the acres burned and year data from that row and list them on the same row in the new dataset.” (If the logic behind the query isn’t yet clear, or you’re ready to see how you can make this query more advanced or parsimonious, the text and web pages in the “References” and “Additional Information” sections at the end of this article are good resources for learning SPARQL.)

Once defined, we pass this query and endpoint information to the SPARQL function which returns an object with two values – a data frame of the new data and a list of namespaces. At this point, we’re concerned with the data frame, so we pull it out in the last line of Step 2.Data.gov SPARQL example

From that point on we treat the data just like any other dataset. Running some quick graphs, we see that although the number of fires per year have decreased, the number of acres burned per fire and the number of acres burned per year increased considerably between 1975 and 2008. (Despite being born in 1975, I disavow any responsibility for this trend.)

Summary

In a few short minutes, using R and SPARQL, we wrote code to pull and do initial analyses on an interesting government dataset. Hopefully, the power of R for mining the Semantic Web is evident from this simple example. As more data becomes available in RDF format, automated solutions for mining and analyzing the Semantic Web will become more and more useful.

Reference:

DuCharme, Bob (2011-07-14). Learning SPARQL . OReilly Media.

Additional Information:

Basic tutorial on querying Data.gov

More detailed tutorial on querying Data.gov



ProgrammingR offers two ways for you to stay up to date. To be notified when new articles and book reviews are posted, subscribe to the ProgrammingR articles feed. To be notified when new R-based job listings are posted, subscribe to the ProgrammingR jobs feed. You may also want to post an R consultant listing, hire an R consultant, or ask a question in the forums.

Helper Function Contest

Just in time for the holidays, ProgrammingR is giving you the opportunity to win a gift for yourself or someone who loves R. And in the spirit of the holidays, you’ll also be giving back to the R community.

In the last article I presented two helper functions I use often. I also teased an upcoming, related event. This is that event.

The premise is simple. Post your favorite (original) R helper function(s). We’ll select 3-5 of the top candidates and put those to a popular vote. The author of the function with the most votes at the end of the contest will receive their choice of one of the three books below.

The contest begins now. Post your submissions to this forum or use the comments below to point us to a post on your site written specifically for this contest. You can enter as many of your helper functions as you like between now and midnight 12/14/12 Eastern Standard Time. We’ll choose the finalists and allow approximately a week of voting for them starting soon after 12/14. The author of the function with the most votes at the end of the popular voting will be the winner.

I look forward to seeing what the ProgrammingR community has to offer!

Some Important Guidelines:
By participating, you agree to be bound by the decisions of the ProgrammingR staff in all contest-related aspects, but especially in regard to who the finalists are and the winner is. We will do our best to run a fair and impartial contest, but we can’t guarantee the originality of every submission, for example. You also certify that you have the right to publicly post any code you submit and that we can repost it on the site. If you point us to a submission on your site and your code is selected as a finalist, we’ll need to verify that it actually is your site.



ProgrammingR offers two ways for you to stay up to date. To be notified when new articles and book reviews are posted, subscribe to the ProgrammingR articles feed. To be notified when new R-based job listings are posted, subscribe to the ProgrammingR jobs feed. You may also want to post an R consultant listing, hire an R consultant, or ask a question in the forums.

R Helper Functions

If you do a lot of R programming, you probably have a list of R helper functions set aside in a script that you include on R startup or at the top of your code. In some cases helper functions add capabilities that aren’t otherwise available. In other cases, they replicate functionality that is available elsewhere without loading unnecessary components. Below I present two of my most frequently used data manipulation helper functions as examples.

Descriptives R Helpler Function

# Display some basic descriptives
descs <- function (x) {
  if(!hidetables) {
    if(length(unique(x))>30) {
      print("Summary results:")
      print(summary(x))
      print("")
      print("Number of categories is greater than 30, table not produced")
    } else {
      print("Summary results:")
      print(summary(x))
      print("")
      print("Table results:")
      print(table(x, useNA="always"))
    }
   
  } else {
    print("Tables are hidden")
  }
}


# Set hide tables to true to hide tables
hidetables <- FALSE

Dummy Variable R Helper Function

# Create dummy variables for each level of a categorical variable
createDummies <- function(x, df, keepNAs = TRUE) {
  for (i in seq(1, length(unique(df[, x])))) {
    if(keepNAs) {
      df[, paste(x,".", i, sep = "")] <- ifelse(df[, x] != i, 0, 1)
    } else {
      df[, paste(x,".", i, sep = "")] <- ifelse(df[, x] != i | is.na(df[, x]) , 0, 1)     
    }
  }
  df
}

The Descriptives R Helper Function produces a summary or table of the passed variable/object; it uses the number of unique values to determine whether to call just the summary() or summary() and table() functions. It also includes NAs by default in the tables (one of table()‘s biggest annoyances). Once the exploratory and data manipulation work is done, all output from this function can be suppressed by setting the hidetables object to TRUE.

The Dummy Variable R Helper Function creates indicator variables from all values of a variable. Based on experience, I avoid the factor object as much as possible and this approach allows me to quickly create indicators that can be used in any way I want.

If you don’t have a programming background or are just beginning with R, you might not have had time to realize the benefit of helper functions or identify the tasks you do repetitively, but it’s worthwhile to give the issue some consideration. Helper functions can be exceptionally useful for saving time on repetitive tasks or facilitating your work. They’re so useful in fact, that there is a special ProgrammingR event planned around helper functions coming soon. For the more experienced R programmers out there, make a mental note of the most useful helper functions you’ve written in the past. That list will come in handy in the near future!



ProgrammingR offers two ways for you to stay up to date. To be notified when new articles and book reviews are posted, subscribe to the ProgrammingR articles feed. To be notified when new R-based job listings are posted, subscribe to the ProgrammingR jobs feed. You may also want to post an R consultant listing, hire an R consultant, or ask a question in the forums.

Progress bars in R using winProgressBar

Using progress bars in R scripts can provide valuable timing feedback during development and additional polish to final products. winProgressBar and setWinProgressBar are the primary functions for creating progress bars in R.

Progress bars, and progress indicators in general, are relatively uncommon in R programming. This makes sense, as they can add bloat and, being design elements, they generally fall into the classification of “nice but not necessary”. However, during development, especially when using loops, progress bars can a cleaner way of tracking loop progress than, for example, printing iteration numbers. And for programmers who prepare scripts or packages for non-programmers, they add feedback that users have come to expect from other software.

To add progress bars in R scripts use the winProgressBar and setWinProgressBar functions. For non-Windows users there’s also a very similar Tcl/Tk version (tkProgressBar and settkProgressbar); depending on your current set up, you may need to install the tcltk library to use it.

Setting up a progress indicator to track the progress of a loop is very straightforward. First, initialize the display:

pb <- winProgressBar(title="Example progress bar", label="0% done", min=0, max=100, initial=0)

Use the title and label options to set the style of the display. The min and max options should use whatever values are most applicable to your task, but in most cases this will be displaying a percentage so a range of 0 to 100, starting at 0, makes sense.

After initializing the progress indicator, add your loop code and update the progress bar by calling the setWinProgressBar function at the end of each loop:

for(i in 1:100) {
Sys.sleep(0.1) # slow down the code for illustration purposes
info <- sprintf("%d%% done", round((i/100)*100))
setWinProgressBar(pb, i/(100)*100, label=info)
}

Once the loop is exited, close the progress bar window:

close(pb)

Here is a snapshot of the progress indicator in action:

Windows Progress Bar Example

To track the progress of an entire script or program, you just need to place the initializing function (winProgressBar) at the beginning of the code and do updates via setWinProgressBar at key points within the flow of your program. For example, if the first task your code performs usually takes about 10% of the total time required, set the value to 10% after that task completes.

If a popup progress bar doesn't work for your task, there is a minimalist option, txtProgressBar, that by default draws a line in the console. It also allows for some fun customizations:

Text Progress Bar Example



ProgrammingR offers two ways for you to stay up to date. To be notified when new articles and book reviews are posted, subscribe to the ProgrammingR articles feed. To be notified when new R-based job listings are posted, subscribe to the ProgrammingR jobs feed. You may also want to post an R consultant listing, hire an R consultant, or ask a question in the forums.

Animations in R

Animated charts can be very helpful in illustrating concepts or discovering relationships, which makes them very helpful in teaching and exploratory research. Fortunately, creating animated graphs in R is fairly straightforward, once you have the right tools and understand a few basic principles about how the animations are created.

In this article I’ll provide an example of how to use the animation package to create an animated chart with a couple of bells and whistles.

The package installs out-of-the-box with several animations that are tailored for instruction. The examples are of varying complexity ranging from a simple coin flip simulation to illustrations of mathematical problems such as Buffon’s needle problem. In most scenarios, however, you’ll want to create your own animations, so let’s look at how to do that.

First, there are several different formats in which you can create your animations – GIF, HTML, LaTeX, SWF and mp4. The saveGIF() function call below illustrates the generic format for each of the calls:

saveGIF({
    for (i in 1:10) plot(runif(10), ylim = 0:1)
})

Understanding that the package creates animations by generating and then compiling many graphs is central to creating polished custom animations. As you can see, the syntax looks a little unfamiliar at first because the inside of the function call is a custom loop that creates the individual graphs. (Note: If you’re familiar with with the way the boot() function works, this is somewhat similar.) Once those individual graphs are created, the function compiles the images in the format specified by the function call. As you might have guessed, most of the animation types require that you install 3rd party libraries for R to be able to do the compilations. The installation of these libraries is covered in the package help.

Basic use of the animation functions is covered in the package help, but the application of the functions to novel tasks can still be a little difficult. As a result, I’ve created an example that illustrates how to use the functions to create animations with a couple of bells and whistles.

This animation plots the density functions of 150 draws of 100 values from a normally distributed random variable. To make things a little more interesting (i.e., make the distribution move), a constant that varies based on the iteration count is added to the 100 values. The chart also includes a slightly stylized frame tracker (or draw counter) along the top of the chart and a horizontal bar that notes the current position and previous two positions of the sample mean. Finally, the foreground color of the chart changes based on the mean of the distribution.

 

###################################
library(animation)

#Set delay between frames when replaying
ani.options(interval=.05)

# Set up a vector of colors for use below
col.range <- heat.colors(15)

# Begin animation loop
# Note the brackets within the parentheses
saveGIF({
 
  # For the most part, it’s safest to start with graphical settings in
  # the animation loop, as the loop adds a layer of complexity to
  # manipulating the graphs. For example, the layout specification needs to
  # be within animation loop to work properly.
  layout(matrix(c(1, rep(2, 5)), 6, 1))
 
  # Adjust the margins a little
  par(mar=c(4,4,2,1) + 0.1)
 
    # Begin the loop that creates the 150 individual graphs
    for (i in 1:150) {
     
      # Pull 100 observations from a normal distribution
      # and add a constant based on the iteration to move the distribution
      chunk <- rnorm(100)+sqrt(abs((i)-51))
     
      # Reset the color of the top chart every time (so that it doesn’t change as the
      # bottom chart changes)
      par(fg=1)
     
      # Set up the top chart that keeps track of the current frame/iteration
      # Dress it up a little just for fun
      plot(-5, xlim = c(1,150), ylim = c(0, .3), axes = F, xlab = “”, ylab = “”, main = “Iteration”)
      abline(v=i, lwd=5, col = rgb(0, 0, 255, 255, maxColorValue=255))
      abline(v=i-1, lwd=5, col = rgb(0, 0, 255, 50, maxColorValue=255))
      abline(v=i-2, lwd=5, col = rgb(0, 0, 255, 25, maxColorValue=255))
     
      # Bring back the X axis
      axis(1)
     
      # Set the color of the bottom chart based on the distance of the distribution’s mean from 0
      par(fg = col.range[mean(chunk)+3])
     
      # Set up the bottom chart
   
   plot(density(chunk), main = “”, xlab = “X Value”, xlim = c(-5, 15), ylim = c(0, .6))

      # Add a line that indicates the mean of the distribution. Add additional lines to track
      # previous means
      abline(v=mean(chunk), col = rgb(255, 0, 0, 255, maxColorValue=255))
      if (exists(“lastmean”)) {abline(v=lastmean, col = rgb(255, 0, 0, 50, maxColorValue=255)); prevlastmean <- lastmean;}
      if (exists(“prevlastmean”)) {abline(v=prevlastmean, col = rgb(255, 0, 0, 25, maxColorValue=255))}
      #Fix last mean calculation
      lastmean <- mean(chunk)
    }
})

########################

And the final product:

A couple of closing notes:

  • Because there are external programs involved (e.g., SWF Tools, ImageMagick, FFmpeg), the setup for this package is slightly more difficult than the average package and things will likely seem less polished than normal. Things may also not work as well; you’ll need to be prepared to be flexible with your animation formats and graph layouts.
  • Animation works exceptionally well when smaller numbers of individual graphs are being compiled, but as the number of individual graphs grows, so does your likelihood of hitting a problem. E.g., although GIF is a very exportable and transportable format, and therefore ideal for many situations, I found that animations with more than ~500 source graphs just didn’t compile. The limit for HTML was similar. Your mileage may vary, but again, be prepared to be flexible.
  • If you do not need to transport your animation and it will have less than a few hundred individual images, you can avoid installing 3rd party software by using the saveHTML function. This output also includes an interface that allows you to pause and move within the animation easily.
  • As mentioned in the code above, if you’re having trouble getting a particular graphical parameter to work, make sure that it is in the internal loop. For efficiency, you want to keep the loop as clean as possible of course, but some things need to be specified each time a new chart is plotted, and therefore need to be inside the loop.
  • Animations aren’t very common in research presentations, but can provide extensive insight beyond static images. Given R’s advanced graphing capabilities, it’s possible to create very nice animations without needing to learn a completely different software package.

If you’ve created an animation you’d like to share or have additional tips, feel free add them to the comments.



ProgrammingR offers two ways for you to stay up to date. To be notified when new articles and book reviews are posted, subscribe to the ProgrammingR articles feed. To be notified when new R-based job listings are posted, subscribe to the ProgrammingR jobs feed. You may also want to post an R consultant listing, hire an R consultant, or ask a question in the forums.

RStudio Development Environment

RStudio LayoutCompared to many other languages of equal popularity, there are realtively few development environments for R. In fact, the total number of production ready R IDEs could probably be counted on one hand. That deficiency is a small price to pay to use R and if you’re not already accustomed to using IDEs for other languages, you probably haven’t missed it too much. But RStudio goes a long way toward providing a full-featured R development platform, that, once you’ve used it, quickly becomes hard to give up again.

RStudio has some nice graphical features and the layout is clean and logical for the most part. Functionally, some of the best features are:

  • Plot caching (allows you to flip back through previous graphs without rerunning them, making it much easier to review your graphical output)
  • Function, object and parameter listing and completion that works even with user-defined functions
  • Shortcuts for quickly drilling down into functions

RStudio paramater completion

RStudio also provides version control integration (Git, SVN) which could prove to be very helpful, but I haven’t yet tested it. I can’t speak to how well it works, just that it is available.

In addition to these positives, RStudio has an active support system with developer participation via the RStudio support site.

Overall, I’ve been very impressed with RStudio over the past few weeks. If you haven’t yet tested it, I suggest you give it a try. Given the growth of R over recent years, I think it’s time we expected development tools to mature to the level that they have for other programming languages with similar levels of adoption. The only way that will produce sustainable, mature products is if there is a constant demand in the market.

Already using something else? Feel free to mention your favorite R IDE in the comments.



ProgrammingR offers two ways for you to stay up to date. To be notified when new articles and book reviews are posted, subscribe to the ProgrammingR articles feed. To be notified when new R-based job listings are posted, subscribe to the ProgrammingR jobs feed. You may also want to post an R consultant listing, hire an R consultant, or ask a question in the forums.

Installing quantstrat from R-forge and source

R is used extensively in the financial industry; many of my recent clients have been working in or developing products for the financial sector. Some common applications are to use R to analyze market data and evaluate quantitative trading strategies. Custom solutions are almost always the best way to do this, but the quantstrat package can make it easy to quickly get a high-level understanding of a strategy’s potential. However, quantstrat is still under development, and this, combined with a lack of documentation and the complex nature of the tasks involved, make it difficult to work with. This article addresses one of the most basic issues with quantstrat – getting it installed. quantstrat and it’s required packages currently aren’t available on CRAN – you have to get them from R-forge. As a result, the installation is slightly less straightforward than other packages and provides an opportunity to discuss how to install packages from R-forge and locally from source. Although this article focuses on installing quantstrat, these instructions will help with any R-package that you need to build from source.


If you’re installing from R-forge, the process is only moderately different than installing from CRAN; simply change the install.packages command to point to the R-forge repository:


install.packages("FinancialInstrument", repos="http://R-Forge.R-project.org")

install.packages("blotter", repos="http://R-Forge.R-project.org")

install.packages("quantstrat", repos="http://R-Forge.R-project.org")


Since the FinancialInstrument and blotter packages are dependencies for quantstrat, you can download and install all three at once with just the last line.


In some cases, you may need to build the packages yourself. You’ll need to set your system up to compile R source code if it isn’t already. To do so, follow steps 1-3 below. If your system is already set up to compile R source code, you can skip to step 4.


# 1) Install Rtools package (must be done manually from http://www.murdoch-sutherland.com/Rtools/

# 2) Install LaTex from www.miktex.org/

# 3) Install InnoSetup http://www.jrsoftware.org/

# 4) Download the three package source files available from R-forge http://r-forge.r-project.org/R/?group_id=316

# 5) Install the packages using the commands below (substituting the appropriate version numbers):


install.packages("C:/yourpath/FinancialInstrument_0.9.18.tar.gz", repos = NULL, type="source")

install.packages("C:/yourpath/blotter_0.8.4.tar.gz", repos = NULL, type="source")

install.packages("C:/yourpath/quantstrat_0.6.1.tar.gz", repos = NULL, type="source")


Note that these directions are relevant until the packages are available on CRAN, after which, you’ll be able to download and install them like any other package (I’ll make a note on this post once that happens). Also note that since these packages are under heavy development, you’ll want to update them often.



ProgrammingR offers two ways for you to stay up to date. To be notified when new articles and book reviews are posted, subscribe to the ProgrammingR articles feed. To be notified when new R-based job listings are posted, subscribe to the ProgrammingR jobs feed. You may also want to post an R consultant listing, hire an R consultant, or ask a question in the forums.