Concatenating Strings in R – Paste and Collapse

String concatenation and text assembly in R are handled by paste() and related functions. A few options and variants help you get the output you want.

Simple Concatenation in R – paste()

Let’s start with the base case: using paste() to combine values into a string. The paste() function accepts three sets of arguments:

  • the values you wish to convert into a string (one or more vectors)
  • the sep parameter – a character string placed between the terms of each result
  • the collapse parameter – an optional character string used to join the results into a single string

The sep and collapse options for the paste function allow you to layer the text or symbols you use when concatenating strings. This can be useful when constructing file formats or system output.
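As a quick preview of how the two options layer, here is a minimal sketch. The vectors and the "-" and "+" separators are made up for illustration; only paste(), sep and collapse come from base R.

```r
# Two vectors are pasted element by element; sep goes between the paired terms.
pairs <- paste(c("a", "b", "c"), c("1", "2", "3"), sep = "-")
print(pairs)   # still a vector of three strings: "a-1" "b-2" "c-3"

# Adding collapse joins those three results into one string.
joined <- paste(c("a", "b", "c"), c("1", "2", "3"), sep = "-", collapse = "+")
print(joined)  # a single string: "a-1+b-2+c-3"
```

Without collapse the result has one element per pair; with collapse it is always a single string.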

Let’s start with a trivial case.

 

> result <- paste("the", "quick", "brown", "fox", "jumps", sep=" ")
> result
[1] "the quick brown fox jumps"

In this case, we are using R’s paste function to concatenate several values into a single string, separated by spaces.

Introducing The Collapse parameter

A key limitation in the example above: we assume we’re concatenating a fixed number of arguments. This doesn’t always hold in the real world. Suppose we want to write an R function which generates comma-delimited text for an unspecified number of values. Perhaps we’re working on the (always popular) custom XML format for passing data into a web application’s API, or unpacking an unusual format from an old database. In any event, we may wish to write our process to accept information as a vector of unspecified length. Repeating the example above with a vector:

 

> result <- paste(c("the", "quick", "brown", "fox", "jumps"), sep=" ")
> result
[1] "the"   "quick" "brown" "fox"   "jumps"

The vector has been converted to text, but the result is still five separate elements. The sep string only goes between separate arguments, and here paste() received a single vector, so there was nothing to join. A small change strings these elements together: we specify a “collapse” parameter to concatenate the results into a single string, as shown below:

 

> result <- paste(c("the", "quick", "brown", "fox", "jumps"), sep=" ", collapse=" ")
> result
[1] "the quick brown fox jumps"
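The idea of a function that accepts a vector of unspecified length can be sketched as follows. The helper name to_delimited and its delimiter argument are hypothetical, chosen for illustration; the only base R call involved is paste() with collapse.

```r
# Hypothetical helper: join a vector of any length into one delimited string.
to_delimited <- function(values, delimiter = ",") {
  # Only one vector is passed, so sep plays no role; collapse does the joining.
  paste(values, collapse = delimiter)
}

words   <- to_delimited(c("the", "quick", "brown"))
spaced  <- to_delimited(c("fox", "jumps"), delimiter = " ")
numbers <- to_delimited(1:4)   # non-character values are converted to text

print(words)
print(spaced)
print(numbers)
```

Because collapse works on whatever length it is given, the caller does not need to know how many values are in the vector.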

Concatenating multiple sets of strings

These functions can be combined to create a more complicated format. For example, suppose we have a vector of information from a database that we want to convert into a dictionary format.

 
c("12 NE 1st Street", "New York", "NY", "Donor", "Gold")

We know what each of these values is:

 
c("address", "city", "state", "status", "tier")

And we would like our final format to be something along the lines of:

key:value,key:value

So we present the paste function with two vectors: the first containing field names, the second containing field values. We join each name to its value with a colon (sep) and join the resulting pairs with commas (collapse). Code is below.

 
paste(c("address", "city", "state", "status", "tier"), c("12 NE 1st Street", "New York", "NY", "Donor", "Gold"),sep=":",collapse=',')

Yielding a result of:

"address:12 NE 1st Street,city:New York,state:NY,status:Donor,tier:Gold"

This is an example of how you can use paste to concatenate strings in R and build more complex formats.
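If the field names and values travel together as a named vector, the same pattern can be wrapped in a small function. This is only a sketch: to_key_value, pair_sep and item_sep are hypothetical names introduced here, and the record reuses the sample data from above.

```r
# Hypothetical helper: turn a named vector into "key:value,key:value" text.
to_key_value <- function(record, pair_sep = ":", item_sep = ",") {
  # names(record) supplies the keys; the vector itself supplies the values.
  paste(names(record), record, sep = pair_sep, collapse = item_sep)
}

donor <- c(address = "12 NE 1st Street", city = "New York", state = "NY",
           status = "Donor", tier = "Gold")

result <- to_key_value(donor)
print(result)
```

Keeping the two separators as arguments makes it easy to switch to another layout, such as "key=value;key=value", without touching the function body.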

Shortcuts – paste vs. paste0 functions

The default value of sep is a single space. Since joining values without any separator is a common use case, a shorthand version of the function is available. The paste0 function uses an empty string as its default value for sep. Like paste, it still needs the collapse parameter to reduce a vector to a single string. In effect, using paste0 is the same as declaring:

 paste(values, sep="") 
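A short check of that equivalence, as a sketch with made-up sample values (the file names and date parts are illustrative only):

```r
# paste0() gives the same result as paste() with sep = "".
with_paste0 <- paste0("file_", 1:3, ".csv")
with_paste  <- paste("file_", 1:3, ".csv", sep = "")
print(identical(with_paste0, with_paste))

# A single vector is not reduced unless collapse is supplied.
parts     <- c("2024", "01", "15")
unchanged <- paste0(parts)                 # still three elements
combined  <- paste0(parts, collapse = "")  # one string
print(unchanged)
print(combined)
```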

For more information about handy functions for cleaning up data, check out our functions reference.