How to Use gsub() in R – With Examples

Need to replace multiple occurrences of a piece of text within an R string? Never fear, the R gsub() function is here! This souped-up version of the sub() function doesn’t stop at the first instance of the text you want to replace. It gets them all.

So when you want to utterly sanitize an entire string full of data, clearing out every instance of heretical thought, gsub() in R is your go-to solution.

How To Use gsub() in R

The basic syntax of gsub() in R:

gsub(search_term, replacement_term, string_searched, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

Breaking down the components:

  • The search term – a text fragment or a regular expression.
  • Replacement term – usually a text fragment.
  • String searched – a character vector (one string or several); other inputs are coerced to character where possible.
  • Ignore case – when TRUE, matching ignores the difference between upper and lower case.
  • Perl – when TRUE, uses Perl-compatible regular expressions.
  • Fixed – when TRUE, forces the gsub function to treat the search term as a literal string, overriding any other instructions (useful when a search string can also be interpreted as a regular expression).
  • Use bytes – when TRUE, matching is done byte-by-byte rather than character-by-character.
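
To make the fixed and ignore.case options concrete, here is a minimal sketch. The strings and variable names (price, shout) are made up for illustration; only the gsub() arguments come from the syntax above.

```r
# Hypothetical example strings, chosen only to show the two options.
price <- "Total: $5.00 (approx.)"

# Default behaviour: "." is a regex wildcard, so every character is replaced.
wild <- gsub(".", "-", price)

# fixed = TRUE: "." is treated as a literal period.
literal <- gsub(".", "-", price, fixed = TRUE)
literal
# [1] "Total: $5-00 (approx-)"

# ignore.case = TRUE: "dog" matches regardless of capitalisation.
shout <- "Dog, DOG, dog"
calm <- gsub("dog", "cat", shout, ignore.case = TRUE)
calm
# [1] "cat, cat, cat"
```

The first call is the classic trap: without fixed = TRUE, a search term containing regex characters such as "." or "(" may match far more than intended.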

A working code example – gsub() in R with basic text:

# gsub in R
> base <- "Diogenes the cynic searched Athens for an honest man."
> gsub("an honest man", "himself", base)
[1] "Diogenes the cynic searched Athens for himself."
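
gsub() is not limited to a single string. It is vectorised over the text being searched, so the same replacement is applied to every element of a character vector. A small sketch, using a made-up vector named philosophers:

```r
# Hypothetical character vector for illustration.
philosophers <- c("Diogenes the cynic", "Crates the cynic", "Plato")

# Each element is processed independently; elements without a match
# come back unchanged.
renamed <- gsub("cynic", "Cynic", philosophers)
renamed
# [1] "Diogenes the Cynic" "Crates the Cynic"   "Plato"
```

This is what makes gsub() handy for cleaning a whole column of a data frame in one call.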

gsub in R – Regular Expressions

R’s gsub() function can work with regular expressions. In the example below, we are going to remove all of the punctuation and spaces from a phone number.

# gsub in R - regular expressions
> phone <- "(206) 555 - 1212"
> gsub("[[:punct:][:blank:]]", "", phone)
[1] "2065551212"

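As you can see, that phone number got a lot skinnier in a hurry! The result is still a character string, but it now contains only digits, so it will fit neatly in a numeric field within a database (after conversion with as.numeric(), if the field requires a true number), which is a much easier way to store and manage this type of information.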
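
An alternative way to get the same result is to say what you want to keep rather than what you want to drop. The sketch below uses the negated character class [^0-9] (anything that is not a digit) and then shows the optional conversion to a number. The variable name digits is an assumption for this example.

```r
phone <- "(206) 555 - 1212"

# Remove every character that is not a digit.
digits <- gsub("[^0-9]", "", phone)
digits
# [1] "2065551212"

# gsub() always returns character data.
is.character(digits)
# [1] TRUE

# Convert only if a true numeric value is needed.
phone_num <- as.numeric(digits)
```

One caution on the conversion step: a numeric value cannot hold leading zeros, so numbers that start with 0 are safer left as character strings.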

gsub in R – Searching for Patterns

You can use regular expressions to look for more advanced patterns. In the example below, we’re going to find every sequence of 1 to 3 n’s and replace each one with a star. A longer run is consumed in chunks of up to three, so the run of five n’s becomes two stars.

# gsub in R - regular expression pattern matching
> base <- "bnnnnnannannasplit"
> gsub("n{1,3}","*",base)
[1] "b**a*a*asplit"

As you can see, it tagged multiple subsets of n’s – far more than the original version of this example in our tutorial on sub.
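
For a side-by-side comparison, this short sketch runs sub() and gsub() with the same pattern on the same string. The result variable names (first_only, every_run) are made up for the example.

```r
base <- "bnnnnnannannasplit"

# sub() stops after the first match: only the first three n's are replaced.
first_only <- sub("n{1,3}", "*", base)
first_only
# [1] "b*nnannannasplit"

# gsub() keeps going until the whole string has been scanned.
every_run <- gsub("n{1,3}", "*", base)
every_run
# [1] "b**a*a*asplit"
```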

gsub in R – Finding Alternative Matches

Sometimes what you’re looking for may involve more than one thing. In the example below, we want to adjust pet-specific text (dog, cat, etc.) to refer to the companion animal as a more generic “pet”. We use the | operator within a regular expression to set this up.

# gsub in R - regular expression for alternatives
> base <- "I love my dog even though it may annoy with my cat"
> gsub("dog|cat|hamster|goat|pig","pet", base)
[1] "I love my pet even though it may annoy with my pet"

Mission accomplished, although the final results may look a little bit weird. The original version (sub tutorial) reads a bit better. In any event, this regex syntax allows you to sweep through a line of text and replace multiple words.
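
One thing to watch with alternatives is that they match anywhere, including inside longer words. A possible refinement, sketched below with a made-up sentence, is to wrap the alternatives in \\b word boundaries. The example sets perl = TRUE so that \\b is interpreted by the Perl-compatible regex engine.

```r
# Hypothetical sentence where "cat" also appears inside "catalog".
sentence <- "My cat reads the catalog"

# Plain alternatives also hit the "cat" in "catalog".
loose <- gsub("dog|cat", "pet", sentence)
loose
# [1] "My pet reads the petalog"

# Word boundaries restrict the match to whole words.
strict <- gsub("\\b(dog|cat)\\b", "pet", sentence, perl = TRUE)
strict
# [1] "My pet reads the catalog"
```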