How To Use pivot_wider in R To Pivot Your Data From Long to Wide

The R programming language is a strong fit for almost anything related to data science. The language is largely centered around statistical analysis and data manipulation. And anyone new to R code will quickly discover that it can handle almost any statistical task. However, that doesn’t necessarily mean that the process will be as clean or straightforward as one might desire. There are a few areas, such as pivoting data, where the R language can be a little longwinded. You can pivot information from long to wide with base R alone. But it’s a relatively involved process that typically involves juggling your dataframes between multiple functions. However, an add-on package called tidyr vastly simplifies the process of pivoting in R. And you’ll soon learn how easy it is to pivot from long to wide in R using tidyr’s pivot_wider function.

A Brief Overview of Pivoting and Tidyr

Before we start working with actual code it’s important to stop for a minute to define a few elements. First, we need to think about what we want to accomplish with data pivoting. One of the reasons people use languages and libraries focused on data science is that they tend to make it easier to work with multi-dimensional processing. This somewhat intimidatingly titled concept really just comes down to thinking about metrics as multi-layered.

Multi-dimensional collections provide nested information instead of a single list of variables. This essentially means that we can nest one collection of information inside of another collection. And we can continue even further down that line. We’re able to essentially work with data along the same kinds of axes that we work with in physical space. For example, spatial positioning typically uses x, y, and z axes. Information structures and containers in languages like R make it easy to chart that type of information. And we often go far beyond it. But the typical data pivoting operation can be seen as just moving things along an axis. As the name suggests, we pivot a structure in order to get a different perspective on the elements which constitute it.

Things would be fairly easy if our informational container were a physical object. But pivoting information within a data frame is a bit more involved than just moving papers around with our hands. All the information we’re working with is essentially a digital abstraction. A computer needs to do a fair amount of work to move complex structures around. And this is reflected by the fact that pivoting in R is, by default, rather messy.
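
As a rough illustration of the base R route, the sketch below pivots a tiny data frame with reshape(). The long_df data and its id, key, and value columns are made up for this example and aren’t part of the survey scenario used later. It’s only one of several possible base R approaches.

```r
# Hypothetical long-format data, used only for this illustration
long_df <- data.frame(
  id    = rep(c("a", "b"), each = 2),
  key   = rep(c("x", "y"), times = 2),
  value = c(1, 2, 3, 4)
)

# Base R: reshape() needs the id column, key column, and direction spelled out
wide_df <- reshape(long_df, idvar = "id", timevar = "key", direction = "wide")

# The new columns carry a "value." prefix, and reshape() leaves behind the
# old row names plus a reshapeWide attribute, so some cleanup usually follows
names(wide_df) <- sub("^value\\.", "", names(wide_df))
rownames(wide_df) <- NULL
attr(wide_df, "reshapeWide") <- NULL

print(wide_df)
```

The pivot itself is a single call, but the renaming and cleanup steps around it are where the extra juggling comes in.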

We’d typically need to use a chain of several functions to move and manipulate the containers storing our data. But this is where the tidyr library comes in. It takes a lot of the common, but messy, pivoting functionality in R and essentially tidies it up. The library creates tidy data from normally convoluted processes. In fact, tidyr has a pivot_wider function that makes it easy to take a dataframe and pivot it into a wide format. And you’re about to see how easy pivot_wider makes the pivoting process.

Using pivot_wider

We can begin working with pivot_wider by setting up a thought experiment. Imagine that you’re organizing data from two survey missions around Jupiter and Neptune. The probes are looking for evidence of self-generated magnetospheres on both the planets and orbiting moons. But the probes are only able to survey a portion of those bodies at one time. As such we have separate datasets to work with along with information from return trips to replicate data. We’ll set this up in code with the following declaration.

# Build the long-format survey data: one row per planet, mission, and info type
df <- data.frame(planet = rep(c('Jupiter', 'Neptune'), each = 4),
                 mission = rep(c(1, 1, 2, 2), times = 2),
                 info = rep(c('moons', 'magnetosphere'), times = 4),
                 returned = c(20, 1, 80, 2, 10, 1, 14, 1))
print(df)

If you run this code you’ll see the dataset laid out and printed to screen. You’ll probably also notice that the information isn’t particularly well organized for legibility. We can fix this by pivoting the dataframe using pivot_wider. Take a look at the following code.

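library(tidyr)

# Print the original long-format data frame for comparison
print(df)
# Spread the info values into columns, filled from the returned column
df2 <- df %>% pivot_wider(names_from = info, values_from = returned)
print(df2)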

We begin by importing the tidyr library. This is where we get pivot_wider from. Next, we reuse the df data frame declared in the prior example and print it to screen for later comparison. In the next step, we actually use the pivot_wider function. This might seem like a lot of information to work with at first. But when we return to look at it later you’ll see that it’s actually far more concise than it might appear. Also, note that we’re using the %>% infix operator to pipe df into pivot_wider as its first argument and then assigning the result to a new df2 dataframe.
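
To make the role of the pipe concrete, here’s a small sketch that runs the same pivot with and without %>%. The piped and direct variable names are just placeholders for this comparison.

```r
library(tidyr)

# Same survey data frame as in the walkthrough
df <- data.frame(planet = rep(c("Jupiter", "Neptune"), each = 4),
                 mission = rep(c(1, 1, 2, 2), times = 2),
                 info = rep(c("moons", "magnetosphere"), times = 4),
                 returned = c(20, 1, 80, 2, 10, 1, 14, 1))

# Piped form: df is passed in as the first argument of pivot_wider
piped <- df %>% pivot_wider(names_from = info, values_from = returned)

# Direct form: the data frame is written out as the first argument
direct <- pivot_wider(df, names_from = info, values_from = returned)

# Compare the two results
print(all.equal(piped, direct))
```

Both forms do the same work, so the choice is mostly about readability. R 4.1 and later also provide a native |> pipe that can be used in the same position.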

With the conversion finished, we print the new dataframe to the screen. Take note of just how much more legible the information is now that we’ve pivoted it to decrease the rows. This is one of the reasons why pivoting is such an important topic. It’s not just a matter of discovering new information. We also need to make sure that we can properly understand it on a more intuitive level. Presentation always matters. But with that in mind, just how does pivot_wider manage to reformat everything to that extent with just a single line of code?

pivot_wider’s Use and Arguments

Take a look back at how we set up the pivot_wider call. We began by supplying an argument called names_from and followed it with values_from. These two arguments relate to the fact that we need to create multiple columns and have a column name for each. Together they determine which output column each cell value lands in.
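
One way to see what the two arguments do is to compare the column names and row counts before and after the pivot. This is just a quick sanity check on the same survey data, not a required step.

```r
library(tidyr)

# Same survey data frame as in the walkthrough
df <- data.frame(planet = rep(c("Jupiter", "Neptune"), each = 4),
                 mission = rep(c(1, 1, 2, 2), times = 2),
                 info = rep(c("moons", "magnetosphere"), times = 4),
                 returned = c(20, 1, 80, 2, 10, 1, 14, 1))

df2 <- pivot_wider(df, names_from = info, values_from = returned)

# The columns added by the pivot are the distinct values of the info column
print(unique(df$info))
print(setdiff(names(df2), names(df)))

# One row remains per unique planet and mission combination
print(nrow(df))
print(nrow(df2))
```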

The names_from argument points to the column whose values will be used as the basis for the new column names. In our case that’s info, a character vector holding the respective names. The values_from argument is how we tell pivot_wider which column to get the cell values from. Note that we could also specify a values_fill argument to tell the function which value to use for any combination that has no data.
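
The following sketch shows values_fill in action. It assumes, purely for illustration, that one observation never came back, so a row is dropped from the survey data before pivoting. The incomplete, with_na, and with_fill names are placeholders.

```r
library(tidyr)

# Same survey data frame as in the walkthrough
df <- data.frame(planet = rep(c("Jupiter", "Neptune"), each = 4),
                 mission = rep(c(1, 1, 2, 2), times = 2),
                 info = rep(c("moons", "magnetosphere"), times = 4),
                 returned = c(20, 1, 80, 2, 10, 1, 14, 1))

# Drop the last row so Neptune's second mission has no magnetosphere entry
incomplete <- df[-8, ]

# Without values_fill, the missing combination shows up as NA
with_na <- pivot_wider(incomplete, names_from = info, values_from = returned)
print(with_na$magnetosphere)

# With values_fill = 0, the gap is filled with 0 instead
with_fill <- pivot_wider(incomplete, names_from = info, values_from = returned,
                         values_fill = 0)
print(with_fill$magnetosphere)
```

Whether 0 is a sensible fill depends on the data. Here it would imply that no magnetosphere was found, which isn’t the same thing as a measurement that was never taken.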

The names_from and values_from arguments are the two most important ones. Between them, we have everything we need for a pivot. But keep in mind that the function’s using a large number of default values behind the scenes. For example, pivot_wider has a names_repair argument that defaults to "check_unique". This makes pivot_wider raise an error if the output would contain duplicated column names. But if we added an argument of names_repair = "unique" the function would automatically rename the output columns, adding numeric suffixes wherever duplicates were detected. Any of pivot_wider’s many default behaviors can be overridden in a similar manner.
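
Here’s a minimal sketch of the names_repair behavior. The clash data frame is hypothetical and deliberately contrived: one of its info values is "planet", which collides with the existing planet column once it becomes a column name.

```r
library(tidyr)

# Hypothetical data: the info value "planet" clashes with the planet column
clash <- data.frame(planet = c("Jupiter", "Jupiter"),
                    info = c("planet", "moons"),
                    returned = c(1, 20))

# Default names_repair = "check_unique": duplicated output names raise an error
result <- tryCatch(
  pivot_wider(clash, names_from = info, values_from = returned),
  error = function(e) "error"
)
print(identical(result, "error"))

# names_repair = "unique": duplicated names are renamed so each one is distinct
repaired <- pivot_wider(clash, names_from = info, values_from = returned,
                        names_repair = "unique")
print(names(repaired))
```

In practice it’s usually cleaner to rename the offending values before pivoting, but the repair option is handy for a quick look at awkward data.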