How to Use Sub in R – Examples

Need to selectively replace the text in an R string? The R sub function can handle this, scanning the string for the text you want to replace and returning a revised version of the string.

sub() differs from gsub() in that it replaces only the first match of the search term, whereas gsub() replaces every match in the text you are searching.
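
To see the difference side by side, here is a minimal sketch that runs both functions on the same input. The sample sentence and the variable names are our own, chosen purely for illustration.

```r
# Illustrative input (not from a real data set)
text <- "the cat sat on the mat"

# sub() stops after the first match
first_only <- sub("the", "a", text)

# gsub() keeps going and replaces every match
every_match <- gsub("the", "a", text)

print(first_only)   # [1] "a cat sat on the mat"
print(every_match)  # [1] "a cat sat on a mat"
```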

How To Use sub() in R

The basic syntax of sub in R:

sub(search_term, replacement_term, string_searched, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

Breaking down the components:

  • The search term – can be a text fragment or a regular expression.
  • Replacement term – usually a text fragment.
  • String searched – a character vector (a single string or several); other inputs are coerced to character where possible.
  • Ignore case – set to TRUE to match regardless of upper or lower case.
  • Perl – set to TRUE to use Perl-compatible regular expressions.
  • Fixed – set to TRUE to force the sub function to treat the search term as a literal string, overriding any conflicting options (useful when a search string could otherwise be interpreted as a regular expression).
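
The optional arguments are easier to judge once you see them in action. The sketch below uses made-up strings and variable names of our own. In R's own documentation the first three arguments are named pattern, replacement and x, which is what the descriptive names above stand for.

```r
# Made-up inputs for illustration
philosopher <- "Diogenes the cynic"
version <- "v1.2"

# Default matching is case sensitive, so a lowercase pattern finds nothing
case_sensitive <- sub("diogenes", "Plato", philosopher)
# ignore.case = TRUE lets the same pattern match
case_ignored <- sub("diogenes", "Plato", philosopher, ignore.case = TRUE)

# As a regular expression, "." matches any character (here the leading "v")
dot_as_regex <- sub(".", "!", version)
# fixed = TRUE treats "." as a literal full stop
dot_as_text <- sub(".", "!", version, fixed = TRUE)

# The searched argument can hold several strings; each is handled separately
fruit <- sub("a", "A", c("banana", "apple"))

print(case_sensitive)  # [1] "Diogenes the cynic"
print(case_ignored)    # [1] "Plato the cynic"
print(dot_as_regex)    # [1] "!1.2"
print(dot_as_text)     # [1] "v1!2"
print(fruit)           # [1] "bAnana" "Apple"
```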

A working code example – sub in R with basic text:

# sub in R
> base <- "Diogenes the cynic searched Athens for an honest man."
> sub("an honest man", "himself", base)
[1] "Diogenes the cynic searched Athens for himself."

Sub in R – Regular Expressions

R’s sub() function can work with regular expressions, which gives it a fair amount of power. We’re going to show a very basic version of this below, where we protect the privacy of some address data with a generic string substitution.

sub() works extremely well in this case. We know the typical US address has a street number in front. This number is of unknown length (number of digits). Furthermore, other numbers that we want to preserve may also exist within an address (e.g. 4th Street, 57th Street SE), so we really do only want to perform a single replacement on the first number.

# sub in R
> base <- "1155 East Main Street, Anytown, AL"
> sub("[[:digit:]]+","_", base)
[1] "_ East Main Street, Anytown, AL"
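
The point about preserving later numbers is easier to see with an address that actually contains one. The sketch below uses a made-up address of our own. The anchored variant at the end is an optional extra under our own assumption that you only want to mask a number when it leads the string.

```r
# Made-up address containing a second number we want to keep
address <- "1155 East 57th Street, Anytown, AL"

# sub() masks only the leading street number
masked_first <- sub("[[:digit:]]+", "_", address)

# gsub() would also clobber the "57" in the street name
masked_all <- gsub("[[:digit:]]+", "_", address)

# If an address has no leading number, the unanchored pattern hits "57" instead;
# anchoring with ^ restricts the match to the start of the string
no_number <- "East 57th Street, Anytown, AL"
unanchored <- sub("[[:digit:]]+", "_", no_number)
anchored <- sub("^[[:digit:]]+", "_", no_number)

print(masked_first)  # [1] "_ East 57th Street, Anytown, AL"
print(masked_all)    # [1] "_ East _th Street, Anytown, AL"
print(unanchored)    # [1] "East _th Street, Anytown, AL"
print(anchored)      # [1] "East 57th Street, Anytown, AL"
```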

Sub in R – Searching for patterns

You can use regular expressions to look for more advanced patterns. In the example below, we’re going to grab the first run of one to three n’s and replace it with a star (not harming any additional n’s in excess of that amount).

# sub in R - regular expression pattern matching
> base <- "bnnnnnannannasplit"
> sub("n{1,3}","*",base)
[1] "b*nnannannasplit"

As you can see, the pattern finds the initial run of 5 n’s, replaces the first three, and preserves the remaining two. A second example, which looks for a word (or anything else that fits the pattern), is shown below.
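
To get a feel for how the repeat count changes the result, here is a small sketch that tries a few quantifiers on the same string. The variable names are our own. The {2} and + patterns are extra variations for comparison, not part of the example above.

```r
base <- "bnnnnnannannasplit"

# {1,3}: between one and three n's, taking as many as it can (three)
up_to_three <- sub("n{1,3}", "*", base)

# {2}: exactly two n's
exactly_two <- sub("n{2}", "*", base)

# +: one or more n's, so the whole first run of five is replaced
whole_run <- sub("n+", "*", base)

print(up_to_three)  # [1] "b*nnannannasplit"
print(exactly_two)  # [1] "b*nnnannannasplit"
print(whole_run)    # [1] "b*annannasplit"
```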

# sub in R
> base <- "I love my dog"
> sub("l[[:alpha:]]*e", "like", base)
[1] "I like my dog"

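In this example, we’re looking for a run of letters of uncertain length which starts with l and ends with e. Since we’re not into that whole love thing, we’re going to demote it to like and call it a day. Note that the pattern is not tied to word boundaries, so it would also match inside a longer word. Substitutions like this are a handy way to clean up text.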
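
If you want the match to cover a whole word only, one option is to wrap the pattern in \\b word-boundary markers. The sketch below is a variation of our own on a made-up sentence. It uses [[:alpha:]] for the letters and sets perl = TRUE so that \\b follows Perl-style rules; both are choices made for this illustration.

```r
# Made-up sentence where "l...e" also occurs inside other words
sentence <- "apple lovers love pie"

# Without boundaries, the first hit is the "le" at the end of "apple"
loose <- sub("l[[:alpha:]]*e", "like", sentence)

# With \\b on both sides, only a standalone word starting with l
# and ending with e qualifies, so "love" is the one replaced
whole_word <- sub("\\bl[[:alpha:]]*e\\b", "like", sentence, perl = TRUE)

print(loose)       # [1] "applike lovers love pie"
print(whole_word)  # [1] "apple lovers like pie"
```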

Sub in R – Finding Alternative Matches

Sometimes what you’re looking for may involve more than one thing. In the example below, we want to adjust pet-specific text (dog, cat, etc.) to refer to the companion animal as a more generic “pet”. We use the | operator within a regular expression to set this up.

# sub in R - regular expression for alternatives
> base <- "I love my dog"
> sub("dog|cat|hamster|goat|pig","pet", base)
[1] "I love my pet"
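
The same pattern can be applied to several strings at once, and it can be assembled from a list of words rather than typed by hand. The sketch below uses made-up sentences and variable names of our own. Building the pattern with paste() is just one convenient approach.

```r
# Build "dog|cat|hamster|goat|pig" from a vector of animal names
animals <- c("dog", "cat", "hamster", "goat", "pig")
pattern <- paste(animals, collapse = "|")

# Made-up sentences: one per case we want to look at
sentences <- c("I love my dog",
               "My dog chased the cat",
               "No animals here")

# sub() handles each string separately and replaces only its first match;
# strings with no match are returned unchanged
generic <- sub(pattern, "pet", sentences)

print(pattern)  # [1] "dog|cat|hamster|goat|pig"
print(generic)
# [1] "I love my pet"         "My pet chased the cat" "No animals here"
```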