Wildcard in R – Two ways to wildcard match text strings in R

Sometimes when processing data, you will want to pull specific items out of a vector or data frame based on a wildcard. A wildcard is a pattern that describes the common elements of the text you are looking for, such as names that begin with “Ja”. R's grep function performs this kind of search and has two output options that suit different uses.

Description

When doing a wildcard search in R, you use the grep function in the form grep(pattern, x, value = FALSE). The pattern is a regular expression rather than a shell-style wildcard: ^ anchors the match to the start of the string, . matches any single character, and * means “zero or more of the preceding character”. Depending on how the value argument is set, grep returns either the positions of the elements of x that match the pattern or the matching elements themselves. If value = TRUE, it returns a vector containing the matching data. If value = FALSE, it returns an integer vector with the position of each matching element in x. If value is omitted, it defaults to FALSE.
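The difference between a shell-style wildcard and a regular expression matters in practice. The sketch below uses a small hypothetical vector to compare the pattern "Ja*" with the anchored "^Ja". The name “Jojo” is included only for illustration. It also uses utils::glob2rx(), which converts a wildcard into the equivalent regular expression.

```r
# Hypothetical sample data; "Jojo" is included only to show the difference.
people <- c("Janet", "Charles", "James", "Jojo", "Bob")

# As a regex, "a*" means "zero or more a", so "Ja*" matches any
# string that contains "J" anywhere, including "Jojo".
print(grep("Ja*", people))   # [1] 1 3 4

# "^" anchors the match to the start: "^Ja" means "starts with Ja".
print(grep("^Ja", people))   # [1] 1 3

# glob2rx() translates the wildcard "Ja*" into the regex "^Ja".
rx <- utils::glob2rx("Ja*")
print(rx)                    # [1] "^Ja"
print(grep(rx, people))      # [1] 1 3
```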

Explanation

When using the grep function, you put your pattern in the first argument and the vector to search in the second. If you want a vector of the matching values, set value = TRUE; otherwise, you can leave the argument out. The argument must be passed by name, because the third positional parameter of grep is ignore.case, not value. The grep function then searches the vector for elements that match the pattern. If value = TRUE, it returns a vector of the matching values. If value is FALSE or omitted, it returns a vector of the positions of the matching values in the vector being searched. Which form you use depends on the situation. Setting value = TRUE works best when you are analyzing a vector. The default works best when you are analyzing a data frame, because the positions can be used to select whole rows.
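Here is a minimal sketch of both return modes, using a hypothetical vector. It also shows why value has to be named: a bare TRUE in the third position is taken as ignore.case.

```r
people <- c("Janet", "Charles", "James", "Bob")

# Default (value = FALSE): integer positions of the matching elements.
idx <- grep("^Ja", people)
print(idx)                         # [1] 1 3

# value = TRUE: the matching elements themselves.
vals <- grep("^Ja", people, value = TRUE)
print(vals)                        # [1] "Janet" "James"

# An unnamed third argument is ignore.case, so this call becomes a
# case-insensitive search that still returns positions, not values.
print(grep("^ja", people, TRUE))   # [1] 1 3
```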

Examples

Here are two examples of how to wildcard match text strings. Both use the same basic call, but they produce different results. The first uses only two arguments and returns a vector of the row numbers that match. The second adds value = TRUE and returns a vector of the matching names.

> df = data.frame(
+   Name = c("Janet", "Charles", "Alcaraz", "James", "Samantha", "Bob"),
+   Age = c(22, 56, 15, 45, 19, 88),
+   Children = c(2, 0, 4, 3, 1, 5))
> df
      Name Age Children
1    Janet  22        2
2  Charles  56        0
3  Alcaraz  15        4
4    James  45        3
5 Samantha  19        1
6      Bob  88        5
> wc = "^Ja"   # regular expression: strings that start with "Ja"
> n = df$Name
> a = grep(wc, n)
> a
[1] 1 4
> jf = df[a, ]
> jf
   Name Age Children
1 Janet  22        2
4 James  45        3

In this version of the grep function, we use only two arguments because value defaults to FALSE. The result is a vector of the row numbers whose names match the pattern. Here we get two row numbers, and we use them to index the data frame. This produces a new data frame that contains the rows for the two names that start with “Ja”. Keep in mind that grep treats the pattern as a regular expression and matches anywhere in the string unless the pattern begins with the ^ anchor. This approach lets you extract entire matching rows of a data frame.
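You can wrap the same row-extraction approach in a small function. rows_matching() below is a hypothetical helper, not part of base R. It looks up the column by name with [[ ]]. It also shows that a pattern with no matches produces a zero-row data frame rather than an error.

```r
# Hypothetical helper: return the rows of `data` whose `column`
# matches the regular expression `pattern`.
rows_matching <- function(data, column, pattern) {
  idx <- grep(pattern, data[[column]])  # integer row positions
  data[idx, , drop = FALSE]             # keep all columns of those rows
}

df <- data.frame(
  Name = c("Janet", "Charles", "Alcaraz", "James", "Samantha", "Bob"),
  Age = c(22, 56, 15, 45, 19, 88),
  Children = c(2, 0, 4, 3, 1, 5)
)

print(rows_matching(df, "Name", "^Ja"))        # rows 1 and 4

# No match: grep() returns integer(0), so the result has zero rows.
print(nrow(rows_matching(df, "Name", "^Zz")))  # [1] 0
```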

> df = data.frame(
+   Name = c("Janet", "Charles", "Alcaraz", "James", "Samantha", "Bob"),
+   Age = c(22, 56, 15, 45, 19, 88),
+   Children = c(2, 0, 4, 3, 1, 5))
> df
      Name Age Children
1    Janet  22        2
2  Charles  56        0
3  Alcaraz  15        4
4    James  45        3
5 Samantha  19        1
6      Bob  88        5
> wc = "^Ja"   # regular expression: strings that start with "Ja"
> n = df$Name
> a = grep(wc, n, value = TRUE)
> a
[1] "Janet" "James"

In this version of the grep function, we add a third argument, value = TRUE, passed by name. Setting value = FALSE gives the same result as using only two arguments. With value = TRUE, grep returns a vector of the actual matching values. In this case, we get the names Janet and James because both start with “Ja” and therefore match our pattern.
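The two modes are directly related. If you index the vector with the positions from the default call, you get exactly what value = TRUE returns. Here is a minimal check that reuses the sample names:

```r
n <- c("Janet", "Charles", "Alcaraz", "James", "Samantha", "Bob")

pos  <- grep("^Ja", n)                 # positions of the matches
vals <- grep("^Ja", n, value = TRUE)   # the matching values

# value = TRUE returns the same result as indexing with the positions.
print(identical(vals, n[pos]))         # [1] TRUE
```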

Application

The main application for the grep function is extracting selected data from a vector or other data set. If you are working with a vector, you probably want to set value = TRUE. If you are working with a data frame, you can leave out the value argument and use the resulting positions to pull out the matching rows.

The grep function is a handy way of pulling selected data out of a vector or other data set. To search a data frame, you pass the column you want to search as a vector, for example df$Name, which is an easy step. This gives you yet another useful tool for your programming toolbox.
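For reference, here is how column extraction behaves, sketched on a trimmed copy of the example data frame. Both $ and [[ ]] return the column as a vector that grep() can search. Single brackets instead return a one-column data frame.

```r
df <- data.frame(
  Name = c("Janet", "Charles", "Alcaraz", "James", "Samantha", "Bob"),
  Age = c(22, 56, 15, 45, 19, 88),
  stringsAsFactors = FALSE
)

# $ and [[ ]] both extract the column as a character vector.
v1 <- df$Name
v2 <- df[["Name"]]
print(identical(v1, v2))          # [1] TRUE

# Single brackets keep the data frame structure instead.
print(is.data.frame(df["Name"]))  # [1] TRUE

print(grep("^Ja", v2))            # [1] 1 4
```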
