It’s quite common for people to come into the R programming language with some prior experience in other coding styles. After all, R’s prominence in the world of data analysis, machine learning, and general data science is well known. It’s often more efficient to jump into R code than to try to push another programming language into the areas where R excels. However, at the same time, this can lead to some common errors where people try to use R as if it were a more generalized programming language. And there are few areas where this is more common than with R’s FUN argument.
The FUN Fundamentals
R can seem relatively analogous to other programming languages at first glance. You’ll find most of the same basic functionality in R’s native objects as you will in other programming languages. And it’s easy to use R just as you would, for example, Python. But that’s also one of the most significant stumbling blocks to mastering R. Because R is capable of so much more than just aping other programming languages. The most significant point to keep in mind is that data is the cornerstone of the R language.
Everything in R comes back to data. Even the data frame, a fundamental element in R, is fairly similar to a table in a standard database. But you’re able to manipulate data frames as easily as you would integers or strings. And R even has special functions which let you apply other functions to part, or the entirety, of a dataset. One of the most useful features of these functions is the previously mentioned FUN argument.
FUN refers to, as you might suspect, a function. And it’s used as an argument within the apply family of functions. These include apply, sapply, tapply, mapply, and lapply. In turn, each of these functions has special capabilities to work with collections of data. Together with FUN they can efficiently run loops on data without ever actually needing to create a formal loop structure. And we don’t even have to use any additional package libraries. It’s all just part of R’s standard lexicon.
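To make the “loop without a loop” idea concrete, here is a minimal sketch that compares an explicit for loop with the same work done through lapply and FUN. The values vector and the loopResult and applyResult names are hypothetical and chosen only for illustration.

```r
# Hypothetical sample data, used only for this comparison
values <- c(1, 4, 9, 16)

# Explicit loop: pre-allocate a list, then fill it one element at a time
loopResult <- vector("list", length(values))
for (i in seq_along(values)) {
  loopResult[[i]] <- sqrt(values[i])
}

# Same work with lapply: sqrt is passed to FUN and run on every element
applyResult <- lapply(values, FUN = sqrt)

# Check whether both approaches produced the same list
print(identical(loopResult, applyResult))
```

Both versions build a list with one square root per input value. The lapply version simply leaves the bookkeeping of indices and pre-allocation to R.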
Diving Into a Basic FUN Implementation
The nature of the apply family and FUN can be a little hard to understand simply through a description. It’s often easiest to learn about them through first-hand experience. So let’s take a look at a simple example of FUN within the context of lapply.
ourVector <- c('T-Rex', 'Utahraptor', 'Crow', 'Meilong', 'Triceratops', 'Troodon')
ourResult <- lapply(ourVector, FUN=nchar)
print(ourResult)
Imagine a hypothetical situation where we’ve been hired to do programming work for a museum’s dinosaur-themed event. Participants can request their favorite dinosaur printed out on a museum lanyard as swag when they go through registration. But we need to order specific sizes to match dinosaur names. This means looping through a structure containing the participants’ requested dinosaur names to create a grouping based on string size.
We begin the process by creating a character vector of dinosaur names and storing it in ourVector. Next, we assign the results of lapply to ourResult. The lapply function is a discussion unto itself. But you can think of it as a way to apply a specific function to every item within a list or vector, with the results always returned as a list. It’s essentially a “list apply”. We pass our vector of names to lapply as its first argument. Next, we assign a function to the FUN argument. In this case, we go with nchar. The lapply function will run the function assigned to FUN on every element passed as the first argument. So in this case it’s equivalent to running the following statements and collecting their results into a list.
nchar('T-Rex')
nchar('Utahraptor')
nchar('Crow')
nchar('Meilong')
nchar('Triceratops')
nchar('Troodon')
You can see how FUN leads to far more succinct code! And you can also see the result of taking this approach to using nchar as part of FUN when we print out the results in the final line. We see that nchar has indeed been run on every string in ourVector to give us a list of string sizes. But this is only the beginning. We can build a wide variety of functionality on top of FUN.
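As one possible next step for the lanyard scenario, the sketch below groups the names by their length. It assumes that base R’s split function is an acceptable way to do the grouping, and the nameLengths and namesBySize variables are hypothetical names introduced here rather than part of the original example.

```r
ourVector <- c('T-Rex', 'Utahraptor', 'Crow', 'Meilong', 'Triceratops', 'Troodon')

# lapply returns a list, so unlist flattens it into a plain integer vector
nameLengths <- unlist(lapply(ourVector, FUN = nchar))

# split groups the names by length; each list element is named after a length
namesBySize <- split(ourVector, nameLengths)
print(namesBySize)
```

Each element of namesBySize holds the names that share a character count, so 'Meilong' and 'Troodon' land in the same group because both are seven characters long.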
More Advanced Scenarios With FUN
You might have noticed that a string of ‘Crow’ is in the list of dinosaur names. In our hypothetical, a group of people who really love crows decided that birds were close enough relatives to a dinosaur to count. We’re ok with that, but only if the crow labels don’t outnumber any of the other options. The following code shows one way we could combine lapply, FUN, and grepl to count occurrences of ‘Crow’ in ourVector.
ourVector <- c('T-Rex', 'Utahraptor', 'Crow', 'Meilong', 'Triceratops', 'Troodon')
ourResult <- lapply(ourVector, FUN=function(x) {
  grepl("Crow", x, fixed=TRUE)
})
ourCrowNumber <- sum(unlist(ourResult))
print(ourCrowNumber)
We begin by defining ourVector again. Next, we create ourResult. As you might expect, this holds the result of our lapply call. We do so by once again passing ourVector as the primary argument. However, this is where we start to see the kind of advanced functionality that FUN makes possible. We assign our own anonymous function to FUN within the call to lapply. Our function takes the element of ourVector we’re working through and passes it to grepl to see if it contains “Crow”. The ourResult variable now holds one result for each element of ourVector rather than a total count, so it’s not in the most human-readable state. We have a new list of values that are either TRUE or FALSE, with TRUE marking a match for “Crow” and FALSE showing a string value that isn’t a match.
But we can easily clean our results up. And we do so through the call to sum and unlist on the next line. We use unlist to flatten our results out of a list format and into a logical vector. We then pass the unlisted values to sum, which counts each TRUE as 1 and each FALSE as 0, and store the total in ourCrowNumber. Finally, we print ourCrowNumber out to see that we only have one occurrence of “Crow” within ourVector.
This might not be the most efficient way to go about getting the number of occurrences of a string. But it does highlight how easily you can build functionality on top of FUN. And you can also use it with any of the other apply family of functions.
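For comparison, here is a hedged sketch of two shorter alternatives. The first relies on the fact that grepl is already vectorized, so no apply function is needed at all. The second swaps lapply for sapply, which simplifies the result to a vector and removes the need for unlist. The crowCountDirect, crowFlags, and crowCountSapply names are hypothetical and used only for illustration.

```r
ourVector <- c('T-Rex', 'Utahraptor', 'Crow', 'Meilong', 'Triceratops', 'Troodon')

# grepl accepts a whole character vector, so it can be summed directly
crowCountDirect <- sum(grepl("Crow", ourVector, fixed = TRUE))

# sapply takes the same FUN argument but returns a logical vector here
crowFlags <- sapply(ourVector, FUN = function(x) {
  grepl("Crow", x, fixed = TRUE)
})
crowCountSapply <- sum(crowFlags)

print(crowCountDirect)
print(crowCountSapply)
```

Both counts should agree with ourCrowNumber from the lapply version. Which form reads best depends on how much custom logic ends up inside the function passed to FUN.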