A common data manipulation task in R involves merging two data frames together. One of the simplest ways to do this is with the cbind function. The cbind function (short for column bind) combines two data frames that have the same number of rows into a single data frame by placing their columns side by side.
While simple, cbind addresses a fairly common issue with small datasets: missing or confusing variable names. Suppose, for example, that we want to look at our manufacturing productivity. Our basic data is very simple: we have a record of the number of widgets made per hour by different machine operators. Unfortunately, this dataframe doesn’t track operators by their actual name but by a code such as “Op01”. That will make our analysis a little challenging. Cbind to the rescue!
This example is going to cover a couple of topics. First, we’re going to use cbind to join two sets of columns together into a single dataframe. This will address the variable names problem described above: getting information from a legacy system with unreadable codes in place of names.
Next we’re going to show how you can use cbind to quickly append information to an existing data frame or matrix on the fly. In this case, we’re going to add notes on where the operators were hired. This will allow an analyst to examine performance by source of hire. The cbind method is good for these sorts of R code exercises, where you want to quickly derive an attribute or numeric vector from notes, history, or an existing matrix and append it to your data.
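Since cbind works on matrices as well as data frames, here is a minimal sketch of deriving a numeric vector from an existing matrix and appending it as a new column. The counts matrix and its values are hypothetical and are not part of the example data used in this article.

```r
# Hypothetical matrix of hourly counts: 3 rows (operators) x 2 columns (hours).
# matrix() fills column by column, so hour1 = 5, 7, 6 and hour2 = 8, 4, 9.
counts <- matrix(c(5, 7, 6, 8, 4, 9), nrow = 3,
                 dimnames = list(NULL, c("hour1", "hour2")))

# Derive a numeric vector from the matrix and append it as a named column
with_total <- cbind(counts, total = rowSums(counts))
print(with_total)
```

Naming the argument (total = ...) is what gives the new column its name; the result is still a matrix, not a data frame.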
We will start by setting up the data.
# cbind in r - data for example
activity <- data.frame(opid=c("Op01","Op02","Op03", "Op04","Op05","Op06","Op07"), units=c(23,43,21,32,13,12,32))
names <- data.frame(operator=c("Larry","Curly","Moe", "Jack","Jill","Kim","Perry"))

> activity
  opid units
1 Op01    23
2 Op02    43
3 Op03    21
4 Op04    32
5 Op05    13
6 Op06    12
7 Op07    32

> names
  operator
1    Larry
2    Curly
3      Moe
4     Jack
5     Jill
6      Kim
7    Perry
And now to combine it:
# how to use cbind in r
blended <- cbind(activity, names)

> blended
  opid units operator
1 Op01    23    Larry
2 Op02    43    Curly
3 Op03    21      Moe
4 Op04    32     Jack
5 Op05    13     Jill
6 Op06    12      Kim
7 Op07    32    Perry
There we go… much easier to read. We can see how everyone is doing.
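The bind above works because both data frames have seven rows. As a rough sketch of what happens otherwise, the snippet below tries to bind a three-row data frame and captures the error rather than stopping. The short_names data frame is a hypothetical stand-in for an incomplete lookup.

```r
activity <- data.frame(opid = c("Op01", "Op02", "Op03", "Op04", "Op05", "Op06", "Op07"),
                       units = c(23, 43, 21, 32, 13, 12, 32))

# Hypothetical lookup with only three rows, so it cannot line up with activity
short_names <- data.frame(operator = c("Larry", "Curly", "Moe"))

# cbind() raises an error here; tryCatch() keeps the message instead of halting
result <- tryCatch(
  cbind(activity, short_names),
  error = function(e) conditionMessage(e)
)
print(result)

# A simple defensive check to run before binding
if (nrow(activity) != nrow(short_names)) {
  message("Row counts differ: ", nrow(activity), " vs ", nrow(short_names))
}
```

Checking nrow() up front is cheap and makes the failure easier to diagnose than an error raised from inside cbind.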
Cbind Examples – append data attributes
Continuing our example a little further, we likely collected this data because we want to analyze it a bit. Perhaps we want to look at productivity based on where the worker was recruited?
We will use cbind to append another column below. Since we hired our employees due to their roles in classic movies (the Three Stooges), nursery books (Jack and Jill), and cartoons (Kim Possible and Phineas and Ferb), we will note the source of the hire.
# cbind in r column names
sourceofhire <- data.frame(found=c("Movie","Movie","Movie", "Book","Book","TV","TV"))
blended <- cbind(blended, sourceofhire)

> blended
  opid units operator found
1 Op01    23    Larry Movie
2 Op02    43    Curly Movie
3 Op03    21      Moe Movie
4 Op04    32     Jack  Book
5 Op05    13     Jill  Book
6 Op06    12      Kim    TV
7 Op07    32    Perry    TV
As you can see, we can use cbind to slap an additional set of character vector attributes onto the dataset in a couple of seconds.
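The appended column does not have to come from a separate data frame. As a small sketch, the snippet below derives a logical vector from the units column and binds it on directly. The above_average name is an illustrative assumption, not something from the original data.

```r
activity <- data.frame(opid = c("Op01", "Op02", "Op03", "Op04", "Op05", "Op06", "Op07"),
                       units = c(23, 43, 21, 32, 13, 12, 32))

# Derive a logical vector: TRUE where an operator beats the mean output
above_average <- activity$units > mean(activity$units)

# Naming the argument sets the name of the new column explicitly
flagged <- cbind(activity, above_average = above_average)
print(flagged)
```

The same pattern works for any vector whose length matches the number of rows in the data frame.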
In fact, since the cbind R function can join multiple sets of columns at once, we could have done this in one shot, binding all three data frames in a single call.
# r merge multiple data frames
blended <- cbind(activity, names, sourceofhire)

> blended
  opid units operator found
1 Op01    23    Larry Movie
2 Op02    43    Curly Movie
3 Op03    21      Moe Movie
4 Op04    32     Jack  Book
5 Op05    13     Jill  Book
6 Op06    12      Kim    TV
7 Op07    32    Perry    TV
Up till now we have been looking at simple column merges that rely on the rows of each data frame being in the same order. For more complicated joins, where rows must be matched on a key column or the data frames differ in row count, take a look at our article about merging dataframes.
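To make the ordering caveat concrete, here is a short sketch in which the names arrive in reverse order. cbind pairs rows purely by position, so it raises no error and quietly attaches the wrong name to each code. The names_reversed data frame is constructed only for this illustration.

```r
activity <- data.frame(opid = c("Op01", "Op02", "Op03", "Op04", "Op05", "Op06", "Op07"),
                       units = c(23, 43, 21, 32, 13, 12, 32))

# Same operators as before, but deliberately listed in reverse order
names_reversed <- data.frame(
  operator = rev(c("Larry", "Curly", "Moe", "Jack", "Jill", "Kim", "Perry"))
)

# No warning is given: Op01 is now paired with the last operator in the list
mismatched <- cbind(activity, names_reversed)
print(mismatched)
```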
Related Topics & Alternative Solutions:
Like many R programming challenges, there is often more than one way to get things done. The advantage of the cbind function is that it is quick to write and needs no key column. If you’re iterating across a lot of data, it is usually better to collect the pieces first and bind them once, since every call builds a new copy of the result. You can also perform similar operations on rows with rbind (for mental consistency, at least). Both functions accept plain vectors as well, such as an integer vector or logical vector.
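As a brief sketch of the row-wise counterpart, the snippet below uses rbind to append one more operator to the activity data. The Op08 row and its value are hypothetical. Note that rbind on data frames expects the column names to match.

```r
activity <- data.frame(opid = c("Op01", "Op02", "Op03", "Op04", "Op05", "Op06", "Op07"),
                       units = c(23, 43, 21, 32, 13, 12, 32))

# Hypothetical extra record with the same column names as activity
new_row <- data.frame(opid = "Op08", units = 27)

# rbind() stacks rows, just as cbind() places columns side by side
extended <- rbind(activity, new_row)
print(extended)
```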
The merge function in base R provides another effective way to handle combining data. It joins two data frames on one or more shared key columns rather than on row position, so it still works when the rows are in a different order or the row counts differ.
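Here is a minimal sketch of merge handling a case that cbind cannot. It assumes a lookup table that carries the opid key alongside each name; that table is hypothetical, since the names data frame in the example above has no key column.

```r
activity <- data.frame(opid = c("Op01", "Op02", "Op03", "Op04", "Op05", "Op06", "Op07"),
                       units = c(23, 43, 21, 32, 13, 12, 32))

# Hypothetical lookup keyed by opid, deliberately out of order
lookup <- data.frame(
  opid     = c("Op07", "Op03", "Op01", "Op05", "Op02", "Op06", "Op04"),
  operator = c("Perry", "Moe", "Larry", "Jill", "Curly", "Kim", "Jack")
)

# merge() matches rows on the shared key, so the row order no longer matters
joined <- merge(activity, lookup, by = "opid")
print(joined)
```

By default merge sorts the result on the key column, so the output comes back in Op01 to Op07 order with each name attached to the right code.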
The data.table package also provides good options, including its own merge method and join syntax aimed at large tables. The base data frame method remains good for basic clean-up and data hygiene work.
Finally there is the dplyr package, which has emerged as the Swiss Army knife for manipulating data in R; its bind_cols function and join functions such as left_join cover the same ground.
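For comparison, a dplyr version of the first example might look like the sketch below. It assumes the dplyr package is installed, and the requireNamespace guard is there only so the snippet does not fail on a machine without it.

```r
activity <- data.frame(opid = c("Op01", "Op02", "Op03", "Op04", "Op05", "Op06", "Op07"),
                       units = c(23, 43, 21, 32, 13, 12, 32))
names <- data.frame(operator = c("Larry", "Curly", "Moe", "Jack", "Jill", "Kim", "Perry"))

if (requireNamespace("dplyr", quietly = TRUE)) {
  # bind_cols() pairs rows by position, like cbind()
  blended_dplyr <- dplyr::bind_cols(activity, names)
  print(blended_dplyr)
} else {
  message("dplyr is not installed; run install.packages(\"dplyr\") to try this")
}
```

Like cbind, bind_cols matches rows by position; when the data has a key column, a join such as left_join is the safer choice.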
In any event, whichever applicable method you select, there are many ways to get this done!