The R programming language is almost synonymous with statistical computing. It provides a syntax that feels right at home next to standard mathematical notation, and its built-in functions and data types allow for complex data structures and easy object manipulation. However, there’s another side to R that’s much less commonly discussed. R isn’t just about numbers; it’s also a fantastic tool for text manipulation. Many of the techniques used to work with collections of numbers in R also translate to collections of letters, aka words. For example, you might be surprised at how easily you can pick out, and alter, a single character from a larger string in R. You’ll soon discover exactly how to remove the last character from a string using R.
The Nature of Strings in R
Strings are one of those things that you can never take for granted when learning a new programming language. Most languages have some concept of a string. After all, language is an inherent part of the human experience, and it’s natural for us to want to work with written text. That is, essentially, what a string is: a way to represent written language within a system that’s ultimately built on 1s and 0s. Programming languages all try to model this same concept when they implement strings, but most have slightly different takes on the idea. For example, in Python, a string behaves much like a list of characters. You can index into it and slice it in the same way you would a list.
A string in R is still a collection of characters. But an R string can’t be indexed character by character in quite the same way. We generally need to use specific functions to access part of a string in R. For example, in Python we could simply write print(ourString[0]) to access the first letter of a string assigned to ourString. Note that Python counts positions from 0, while R counts from 1. In R we’d need to use a specific function, such as substr in the following example.
ourString <- "The quick brown fox jumps over the lazy dog"
print(substr(ourString, 2, 2))
In this example, we use a function called substr to access the second letter in “The quick brown fox jumps over the lazy dog”. It’s true that using a function adds a little more complexity to the process of string access. But as you’ll soon see, this also gives us some extra power to manipulate strings through the standard method of access. In fact, you’ve already gotten a glimpse at how we can go about removing the last character from a string in R.
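As a quick illustration of how the start and stop arguments behave, the short sketch below pulls a few different pieces out of the same sentence. The variable names firstLetter, firstWord, and lastLetter are made up for this example.

```r
ourString <- "The quick brown fox jumps over the lazy dog"

# start and stop are 1-based and inclusive
firstLetter <- substr(ourString, 1, 1)
firstWord   <- substr(ourString, 1, 3)

# nchar() supplies the position of the final character
lastLetter  <- substr(ourString, nchar(ourString), nchar(ourString))

print(firstLetter)  # "T"
print(firstWord)    # "The"
print(lastLetter)   # "g"
```

Passing the same number for start and stop selects a single character, which is exactly what the earlier example did with position 2.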
The Easiest Way To Remove the Last Character
In R, the same function we use to read part of a string can also be used to build a modified copy of it. We select the part of the string we want to keep and assign the result back to the original variable. As such, we can make a slight change to our previous example to remove a trailing character from the string. Take a look at the modified code below.
ourString <- "The quick brown fox jumps over the lazy dog._"
ourString <- substr(ourString, 1, nchar(ourString)-1)
print(ourString)
We begin by once again defining ourString with an example sentence. Except this time around we’ve intentionally added a superfluous character at the end. But we can easily get rid of it in the following line. We call substr and pass ourString to it as the initial argument. Next, we pass a 1. The number in this position tells substr where to begin copying text. The next argument tells it where to end its text selection. Unfortunately, there’s no special character or code to tell substr to select the last character. We need to know an exact numerical position. That’s not a huge problem in this example since we can just count out the characters in the string by hand. But imagine that we’re using an automated process that’s going to sort through hundreds of strings.
We need the code to be able to determine the size of the string on its own. And we can easily implement that through a call to nchar. This function returns the number of characters in a string. Because R counts positions from 1, that number is also the position of the last character. We can just pass ourString to nchar to get the numerical position of the last character in the string. But note that we’re not selecting a character to delete. We’re actually selecting everything we want to save from deletion. As such, we want to pass the position of the final character minus 1. That means we’ll select text in ourString that starts with the first character and ends with the penultimate one. The substr function has now copied all the text in ourString except the last character. With that finished, the modified string is assigned back to ourString. We proceed to print that string out. And the _ is now absent from ourString. But there is another, more advanced, variation on this concept that we can use.
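Before moving on, here is a minimal sketch of the "hundreds of strings" scenario mentioned earlier. It assumes the strings are held in a character vector and relies on the fact that both substr and nchar work on whole vectors at once. The helper name removeLastChar and the sample data are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: the name removeLastChar is ours, not a built-in
removeLastChar <- function(x) {
  # nchar(x) gives one length per string, so each element
  # gets its own stopping position
  substr(x, 1, nchar(x) - 1)
}

# Made-up sample data, including a one-character and an empty string
ourStrings <- c("lazy dog._", "quick fox!", "a", "")

cleaned <- removeLastChar(ourStrings)
print(cleaned)
```

Strings that are empty, or that hold a single character, simply come back as empty strings, so no special handling is needed for them in this sketch.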
Regex Allows for More Precise Selections
The substr function is certainly easy to use. But in some cases, it can be described as simple to a fault. The prior example highlighted the fact that we needed to call a second function just to provide substr with an endpoint for its string selection. Wouldn’t it be nice if we had a way to directly say "the end of the string"? We do if we use sub instead of substr. Take a look at the following example.
ourString <- "The quick brown fox jumps over the lazy dog._"
ourString <- sub(".$", "", ourString)
print(ourString)
This is fairly similar to the previous code block. But note that the arguments passed to sub differ considerably from our prior work. The initial argument for sub is where the magic really happens. It uses a regular expression, aka regex, for text selection. Take a look at that argument, which reads as ".$". We write the regex as a string in order to pass it as a single argument. The . is regex’s wild card. It matches any single character in that position. The $ is an anchor that stands for the end of the string. It doesn’t match a character itself; it only requires that the match finish exactly where the string ends. So we can take all of that together and read it as one simple instruction: match whatever single character sits immediately before the end of the string. In this case, it’s the _ character.
With the regex match found, the next argument comes into play. The matched text will be replaced by the string "". Since there’s nothing in that replacement string, the matched character is effectively deleted. So the sub function removes the _ character. The final result after performing that deletion is then assigned back to ourString. And, finally, the newly modified string is printed to screen. We now see ourString correctly formatted and with the last character removed.
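To tie the two approaches together, the sketch below runs both on the same made-up vector of strings and checks that they agree. It also shows one way the regex version can be extended: ".{2}$" uses a repetition count to match the last two characters instead of one. The variable names here are hypothetical and exist only for this illustration.

```r
# Made-up sample data for comparing the two approaches
ourStrings <- c("lazy dog._", "quick fox!", "a")

viaSub    <- sub(".$", "", ourStrings)
viaSubstr <- substr(ourStrings, 1, nchar(ourStrings) - 1)

print(viaSub)
print(identical(viaSub, viaSubstr))  # TRUE when both methods agree

# A small variation: {2} repeats the wild card, so the
# last two characters are removed instead of one
lastTwoGone <- sub(".{2}$", "", "lazy dog._")
print(lastTwoGone)
```

Which version to prefer is largely a matter of taste. The substr form spells out the positions explicitly, while the sub form keeps the whole selection inside a single pattern.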