R is best known for data analysis and statistics, but that doesn’t mean it can’t handle tasks more commonly associated with languages like Python or Perl. You might assume that string manipulation is best left to those languages. As you’ll soon discover, though, you can easily use R to perform a variety of complex transformations on strings. This is largely thanks to R’s stringr package, whose str_replace function simplifies string manipulation considerably. Read on to discover how to use stringr and str_replace to replace matched patterns with new text.
Text, Strings, and Stringr
Even if you’ve never used stringr, there’s a good chance that you already have it installed. Stringr isn’t part of base R, but it is one of the core packages in the larger tidyverse collection. And like most of the tidyverse packages, it can help you tidy up your data. In fact, you have a vast range of functionality to choose from. The str_replace function is the easiest way to go about replacing text: you simply pass it a pattern and a piece of replacement text. But you can also use the regular expression syntax you might be familiar with from languages like Perl or Python.
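If you’re not sure whether stringr is available on your system, a quick check like the one below can confirm it. This is only a minimal sketch, and it assumes you have access to a CRAN mirror. Installing the whole tidyverse would work just as well.

```r
# Install stringr only if it is not already available (assumes CRAN access)
if (!requireNamespace("stringr", quietly = TRUE)) {
  install.packages("stringr")
}

# Load the package so that str_replace and related functions can be called directly
library(stringr)
```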
At the same time, it’s important to keep in mind that R strings behave differently from strings in many other languages. For example, a Python string can be indexed directly as a sequence of characters. In R, a whole string is a single element of a character vector, so its individual characters can’t be reached by position in the same way. To get per-character access in R we need to manually split the string into a vector of single characters with the strsplit function, as seen below.
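# Split the string into a vector of single characters
ourString <- "thunderousaaa"
# strsplit returns a list, so [[1]] extracts the character vector
ourArray <- strsplit(ourString, "")[[1]]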
In this example, we see that strings are concrete entities in and of themselves. While we can manipulate a string as an array, doing so requires us to reconstruct it in a new format first. We could build on this method to replace elements of the split string from within a for loop. But the stringr library gives us a far more elegant solution.
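To make the for loop idea concrete, here is a rough sketch in base R. The choice of swapping each a for an asterisk, and the ourMaskedString name, are purely illustrative assumptions.

```r
# Base R only: replace characters one at a time in a split string
ourString <- "thunderousaaa"
ourArray <- strsplit(ourString, "")[[1]]

# Walk the vector and swap every "a" for "*" (an arbitrary choice for illustration)
for (i in seq_along(ourArray)) {
  if (ourArray[i] == "a") {
    ourArray[i] <- "*"
  }
}

# Glue the single characters back into one string
ourMaskedString <- paste(ourArray, collapse = "")
print(ourMaskedString)  # [1] "thunderous***"
```

Note that this loop can only swap one character for another. Replacing a three-character run such as aaa with a two-character one such as ly would need extra bookkeeping, which is the kind of work stringr takes off our hands.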
Basic Text Manipulation With Stringr
We’ve seen that we can split strings into constituent parts. But R can essentially do that work for us behind the scenes. Take a look at the following code.
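library(stringr)
ourString <- "thunderousaaa thunderousaaa"
ourFixedString <- str_replace(ourString, "aaa", "ly")
ourFixedStringAll <- str_replace_all(ourString, "aaa", "ly")
print(ourFixedString)     # [1] "thunderously thunderousaaa"
print(ourFixedStringAll)  # [1] "thunderously thunderously"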
We begin by loading stringr in order to gain access to str_replace. Next, we create a string called ourString that holds two typo-ridden attempts to write thunderously. To fix the typo we can use str_replace, and the result is assigned to ourFixedString. We then repeat the process with a function called str_replace_all. Both functions have the same syntax. We simply pass three arguments: our original character string, a pattern to replace, and the new text which will be substituted.
Note the difference between the results of str_replace and str_replace_all when we print them out. Only the first occurrence of thunderousaaa in ourString was corrected to thunderously when we ran str_replace. This is because str_replace is essentially a one-off solution: it stops after the first match in a string has been found and replaced. But str_replace_all works on all of the text within a string. It looks for every occurrence of the matched pattern rather than just the first. If we had 20 instances of thunderousaaa in our string then str_replace_all would fix all 20.
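It’s also worth seeing how the two functions behave when given more than one string at a time. Both are vectorized, so "first match" means the first match within each element of a character vector. The sample vector below is made up for illustration.

```r
library(stringr)

# A character vector with three separate strings (hypothetical sample data)
ourStrings <- c("thunderousaaa", "gloriousaaa wondrousaaa", "calm")

# Only the first match inside each element is replaced
firstOnly <- str_replace(ourStrings, "aaa", "ly")
# values: "thunderously", "gloriously wondrousaaa", "calm"

# Every match inside each element is replaced
everyMatch <- str_replace_all(ourStrings, "aaa", "ly")
# values: "thunderously", "gloriously wondrously", "calm"

print(firstOnly)
print(everyMatch)
```

Elements with no match, like "calm", are returned unchanged.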
Regular Expressions and More Complex Replacement Situations
Earlier on we made comparisons to Perl. And this might have made you wonder about how R might be able to leverage regular expressions for pattern matching. If that’s the case then you probably also imagined that it’d require integrating another library in addition to stringr. But, in fact, if you’ve tried the previous code then you’ve already enabled regular expressions in your code. The stringr library uses regular expressions for all pattern matching by default. For example, try running the following code.
library(stringr)
ourString <- "thunderoussaaa thunderoussaaa"
print(str_extract(ourString, ".aa."))  # [1] "saaa"
ourFixedString <- str_replace_all(ourString, ".aa.", "ly")
In this script we once again begin by loading stringr. We make a minor change to ourString to garble the text even further. Next, we use another stringr function called str_extract. As the name suggests, this function extracts a portion of a larger string, and it takes the same string and pattern arguments as str_replace. Like str_replace, it also has a counterpart, str_extract_all, if needed. However, in this instance, we’re just using str_extract to show how stringr sees the string we want to manipulate. We pass .aa. to str_extract as our pattern. In regular expression syntax, each period is a wildcard that matches any single character, so the pattern matches aa together with the one character directly before it and the one character directly after it. In ourString, the first such match is saaa.
Next, we create a variable called ourFixedString. We assign a string to it with the str_replace_all function. And we supply this function with the same regular expression that we used in str_extract. The main difference is that we’re now supplying a third argument in the form of a “ly” replacement string. This is the text that replaces any part of the string matched by the pattern. And because we’re using str_replace_all, it replaces every match in the string rather than just the first.
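The walkthrough above can be pulled together into one short script. The sketch below also uses stringr’s fixed() helper, which isn’t covered in the text, to show what happens when the same pattern is treated as literal text instead of a regular expression. The variable names are illustrative.

```r
library(stringr)

ourString <- "thunderoussaaa thunderoussaaa"

# str_extract returns only the first match of the pattern
firstMatch <- str_extract(ourString, ".aa.")                # "saaa"

# str_extract_all returns a list with one element per input string
allMatches <- str_extract_all(ourString, ".aa.")[[1]]       # "saaa" "saaa"

# The same pattern drives the replacement
ourFixedString <- str_replace_all(ourString, ".aa.", "ly")  # "thunderously thunderously"

# fixed() asks stringr to treat the pattern as literal text, so "." is
# just a period here and nothing in ourString matches
unchanged <- str_replace_all(ourString, fixed(".aa."), "ly")

print(ourFixedString)
print(unchanged)
```

Wrapping a pattern in fixed() is a handy safeguard when the text you want to replace contains characters, such as periods, that have a special meaning in regular expressions.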
Finally, it’s worth noting that you can actually dive a little deeper into stringr’s underlying functionality. The str_replace function is provided by stringr. Stringr, in turn, is essentially a wrapper around another R library called stringi. Stringi and stringr have roughly similar functionality, and in some instances they even provide nearly identical syntax. But stringi also has some powerful functionality not found in stringr. This includes tools for character encoding detection and conversion, transliteration, and some more advanced regular expression options.
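As a brief taste of stringi, the sketch below repeats the earlier replacement with stringi’s own function and then tries a simple transliteration. It assumes the stringi package is installed, which is normally the case whenever stringr is, and the sample text is made up for illustration.

```r
library(stringi)

ourString <- "thunderoussaaa thunderoussaaa"

# stringi counterpart of str_replace_all for regular expression patterns
fixedWithStringi <- stri_replace_all_regex(ourString, ".aa.", "ly")
print(fixedWithStringi)  # [1] "thunderously thunderously"

# Transliteration: strip the accent using the ICU "Latin-ASCII" transform
plainText <- stri_trans_general("caf\u00e9", "Latin-ASCII")
print(plainText)         # [1] "cafe"
```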