Pipe operator – %>% in R

A common need in programming is to perform several operations in sequence on the same data. In simple cases this is not much of a problem, but it can escalate quickly. When it does, the code becomes difficult to read and nearly impossible to follow.

Sequential operations

The first solution you might think of, and in many programming languages the most common one, is to run the operations one at a time, storing each intermediate result in a new variable so that the contents of the original variable are preserved. In R, this solution looks something like this.

# %>% in R example (alternative - intermediate variables)
> a = 3.14159
> b = seq(from = a,10,3)
> round(b,3)
 [1] 3.142 6.142 9.142

Even at this level, following the logic of this code is a little tricky: the operations run in a specific order, but you have to keep track of which variable holds which intermediate result. This gets harder as more operations are added.

Nested operations

One solution that R allows is nesting the operations inside one another. It has the advantage of reducing the number of lines of code and variables needed. It does, however, have the disadvantage of quickly becoming unreadable. The more nested operations you have, the more complicated and harder to follow the code becomes.

# %>% in R example (alternative - nested syntax)
> a = 3.14159
> round(seq(a, 10, 3), 3)
 [1] 3.142 6.142 9.142

This is just an example of two nested operations, and it is already harder to read, because it has to be read from the inside out.

The need for simplification

What is clearly needed is a better solution to this problem, one that makes the code readable and easy to follow. This means putting the code in an easy-to-read order. Part of the problem is that nested code has to be read from the inside out, effectively from right to left, while we naturally read from left to right. Simplifying the code therefore means rewriting it so that the operations appear from left to right in the order they are performed.

The pipe operator

Thanks to the magrittr package, R has an excellent solution in the pipe operator. The pipe operator is an R operator used in the form data %>% function1 %>% function2. It produces the same result as nesting the calls, but it reads in a straightforward left-to-right order. This makes the entire set of code easier to read and follow, and it avoids both the extra variables of sequential code and the inside-out reading of nested code.

How to install the magrittr package

Installing the magrittr package is simple with RStudio. In the lower right-hand pane, click the Packages tab to bring up the list of installed packages. If magrittr is not already listed, click Install, type “magrittr” into the Packages text box, and click Install; RStudio will do the rest. Once magrittr has been installed, it will appear in your list of packages. Then simply tick the checkbox next to it to load it into your current R session.
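If you prefer typing in the console to clicking through the Packages pane, the same two steps can be done with two standard base R functions. In the sketch below, install.packages() downloads the package from CRAN (this needs an internet connection and only has to be done once), and library() loads it into the current session, which is what ticking the checkbox does. The requireNamespace() check is just a convenience so the download is skipped when magrittr is already installed.

```r
# One-time download and install from CRAN (skipped if magrittr is already installed)
if (!requireNamespace("magrittr", quietly = TRUE)) {
  install.packages("magrittr")
}

# Load magrittr into the current session so that %>% is available
library(magrittr)
```

You only need to run install.packages() once per R installation, but library(magrittr) has to be run in every new session (or at the top of every script) that uses %>%.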

How to use the pipe operator

Using the pipe operator is simple. It has the format data %>% function1 %>% function2, where the value on the left-hand side of “%>%” (the data, or the result of the previous step) is passed as the first argument to the function that follows it.
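You can check this rule directly in the console. The short sketch below, assuming magrittr is installed, compares a piped call with its ordinary equivalent, shows that a bare function name also works on the right-hand side, and shows magrittr's dot placeholder: when . appears as an argument on the right-hand side, the left-hand value goes into that position instead of becoming the first argument. The variable names piped and nested are only for illustration.

```r
library(magrittr)

a <- 3.14159

# x %>% f(y) is evaluated as f(x, y): a becomes the first argument of seq()
piped  <- a %>% seq(10, 3)
nested <- seq(a, 10, 3)
identical(piped, nested)   # TRUE

# A bare function name works too: x %>% f is the same as f(x)
a %>% round                # 3, because round() defaults to 0 digits

# Use . to send the left-hand value somewhere other than the first argument
10 %>% seq(1, ., 3)        # same as seq(1, 10, 3): 1 4 7 10
```

The dot placeholder is handy for functions whose data argument is not the first one, although most functions designed for piping put the data first.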

# %>% in R - example
> library(magrittr)
> a = 3.14159
> a %>% seq(10, 3) %>% round(3)
 [1] 3.142 6.142 9.142

This is how the previous example looks using the pipe operator. The operator is well named: the data flows down the pipe from one function to the next, in the same order you read it.

Application of the pipe operator

Here is a real-life example using the ChickWeight data, which records the weight of chicks over time on different diets. This dataset ships with R itself (in the built-in datasets package), so you can try it yourself. It provides an excellent example of just how much the pipe operator simplifies the code being used.

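# %>% in R - example
> library(magrittr)
> ChickWeight
> # keep column 1 (weight) for the rows where Time is 20 days
> testdata = ChickWeight[ChickWeight$Time == 20, 1]
> testdata
> # log() is vectorised, so it can be applied to the whole vector directly
> testdata %>% log()
> testdata %>% log() %>% summary()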

At this point you are probably thinking that this produces quite a lot of output, but only six lines of it are actual code; the rest of what appears in the console is R printing the data.
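To see what the pipe buys you here, the sketch below writes the last step both ways and confirms they give the same result. It then adds a slightly longer chain, computing the mean day-20 weight for each diet with the base R functions subset(), with() and tapply(); that extra chain is our own illustration rather than part of the original example. The names nested, piped and diet_means are hypothetical.

```r
library(magrittr)

# Day-20 weights; ChickWeight is part of R's built-in datasets package
testdata <- ChickWeight$weight[ChickWeight$Time == 20]

# Nested version: read from the inside out
nested <- summary(log(testdata))

# Piped version: read left to right, one step after another
piped <- testdata %>% log() %>% summary()

identical(nested, piped)   # TRUE

# A longer chain: keep day 20, then average weight within each diet
diet_means <- ChickWeight %>%
  subset(Time == 20) %>%
  with(tapply(weight, Diet, mean))
diet_means
```

Writing one step per line, with %>% at the end of each line, keeps longer chains easy to scan and makes it simple to comment out or insert a step.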

The pipe operator in R is an extremely useful tool for simplifying code. While %>% is not part of the R base packages, magrittr is a small, widely used add-on package that is well worth installing for this operator alone.