When you need to do a countif in R, you will quickly find that there is no dedicated function for the task. You can still do a conditional count, but it takes a little more than plugging values into a single existing function: you combine a logical test with a function that sums up the results.
R’s answer to the Microsoft Excel countif function
If you are looking to do a conditional count in R, you will quickly discover that there is no countif or countifs function. Instead, R programming uses the sum function in the form sum(x == value), where "value" is the value being counted and "x" is the data being evaluated. In essence, you count the number of matching items in a vector by running a logical test and summing the results. If the data contains missing values, the result will be NA unless you add the optional argument na.rm = TRUE. The process is straightforward, and it is unlikely to produce error messages, but it has to be set up properly to give the correct result.
Conditional Counts: How this countif approach works for R
A conditional count in R does not return the total number of rows in a data frame; it counts how many values in a vector or column match a condition. The comparison produces TRUE or FALSE for each element, and the sum function treats each TRUE as 1 and each FALSE as 0, so the total is the number of matches. One potential problem is that if a missing value is encountered, sum returns NA rather than the count you are looking for. You can prevent this by adding the argument na.rm = TRUE, which is unnecessary if you know there are no missing values in your data set. Except for this one snag, the process is straightforward.
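The mechanism described above can be wrapped in a small helper. This is only a sketch: count_if is a hypothetical name chosen for illustration (it is not a base R function), and the scores vector is made-up data.

```r
# count_if is a hypothetical helper name, not part of base R.
count_if <- function(x, value, na.rm = TRUE) {
  # x == value yields TRUE / FALSE / NA for each element;
  # sum() adds TRUE as 1 and FALSE as 0.
  sum(x == value, na.rm = na.rm)
}

scores <- c(3, 5, NA, 5, 2, 5)            # made-up data with one missing value

as_numbers <- as.integer(scores == 5)     # 0 1 NA 1 0 1
matches    <- count_if(scores, 5)         # 3
unhandled  <- count_if(scores, 5, na.rm = FALSE)  # NA
```

Defaulting na.rm to TRUE is a design choice for this sketch; base sum defaults to na.rm = FALSE, so calling sum directly behaves like the last line.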
Examples of this countif approach in R
Here are several code examples illustrating a conditional count. Each of them illustrates different circumstances and ways of using this approach.
> x = c("a", "c", "b", "c", "d", "c", "e", "c", "f")
> x == "c"
[1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
> sum(x == "c")
[1] 4
This example counts the occurrences of a single value in a character vector. It is the simplest use of the sum function with a conditional statement.
> x = c(1, 2, 3, 2, 4, 4, 2, 3, 1)
> x == 2
[1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
> sum(x == 2)
[1] 3
This example does the same thing with a numeric vector. The approach is unchanged; only the type of the value being compared differs.
> x = c("a", "c", "b", "c", NA, "c", "e", "c", "f")
> x == "c"
[1] FALSE TRUE FALSE TRUE NA TRUE FALSE TRUE FALSE
> sum(x == "c")
[1] NA
> sum(x == "c", na.rm=TRUE)
[1] 4
> sum(x == "c", na.rm=FALSE)
[1] NA
This example applies the conditional sum to a vector that contains a missing value. Without na.rm set to TRUE, the result is NA rather than the count of the value being looked for; with na.rm=TRUE, the missing value is skipped and the count is returned. Setting na.rm=FALSE is the same as leaving the argument out.
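As a hedged aside, na.rm is not the only way to keep a missing value from turning the count into NA. The sketch below reuses the same vector and shows two base R alternatives, %in% and which; whether they suit your data is a judgment call, since they silently ignore missing values.

```r
x <- c("a", "c", "b", "c", NA, "c", "e", "c", "f")

n_missing <- sum(is.na(x))            # how many values are missing: 1

# %in% returns FALSE (not NA) for the missing element, so no na.rm is needed.
n_in <- sum(x %in% "c")               # 4

# which() keeps only the positions that are TRUE, dropping the NA.
n_which <- length(which(x == "c"))    # 4
```

Checking sum(is.na(x)) first is a quick way to find out whether missing values are present at all.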
> x = data.frame(X = c(1, 2, 1, 4, 1, 6, 1),
+ Y = c(6, 7, 8, 7, 10, 7, 12))
> x
  X  Y
1 1  6
2 2  7
3 1  8
4 4  7
5 1 10
6 6  7
7 1 12
> sum(x$X == 1)
[1] 4
> sum(x$Y == 7)
[1] 3
> sum(x$X == 1 | x$Y == 7)
[1] 7
This example shows the sum function used with a data frame. It counts matches in a single column and then combines conditions on two columns with the | (or) operator. Note that with multiple criteria joined by |, the result is the number of rows that satisfy at least one of the conditions, not a separate count for each criterion. Here no row meets both conditions, so the total of 7 equals 4 + 3; a row that met both would still be counted only once.
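To make the behavior of multiple criteria concrete, here is a small sketch on the same data frame. It contrasts | (either condition) with & (both conditions, the closest analogue of countifs) and uses Y > 7 as an extra, illustrative condition that overlaps with X == 1.

```r
x <- data.frame(X = c(1, 2, 1, 4, 1, 6, 1),
                Y = c(6, 7, 8, 7, 10, 7, 12))

# Rows where at least one condition holds.
either <- sum(x$X == 1 | x$Y == 7)    # 7

# Rows where both conditions hold at once.
both <- sum(x$X == 1 & x$Y > 7)       # 3

# Overlapping conditions: rows meeting both are counted once, so this is
# 4, not 4 + 3.
overlap <- sum(x$X == 1 | x$Y > 7)    # 4

# A separate count for each criterion.
each <- c(X_is_1 = sum(x$X == 1), Y_is_7 = sum(x$Y == 7))   # 4 and 3
```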
> x = data.frame(X = c(1, 2, 1, 4, 1, 6, 1),
+ Y = c(6, 1, 8, 1, 10, 1, 12))
> x
  X  Y
1 1  6
2 2  1
3 1  8
4 4  1
5 1 10
6 6  1
7 1 12
> sum(x == 1)
[1] 7
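In this last example the comparison is applied to every cell of the data frame, so the total of 7 combines the four 1s in column X with the three 1s in column Y. If you want the count broken down instead, one option is base R's colSums and rowSums, sketched here on the same data:

```r
x <- data.frame(X = c(1, 2, 1, 4, 1, 6, 1),
                Y = c(6, 1, 8, 1, 10, 1, 12))

# x == 1 is a logical matrix with one cell per data frame cell.
total <- sum(x == 1)             # 7, across the whole data frame

per_column <- colSums(x == 1)    # X = 4, Y = 3
per_row    <- rowSums(x == 1)    # one match in each of the 7 rows
```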
How You Can Use Conditional Counts in R
There are many applications of a conditional count. A common one is a data frame with a factor column that categorizes the data in the other columns; you can use a conditional sum to count the number of items in each category. Another is counting the items in a data set that share the same value so that you can plot those counts and look for visual clues to any pattern that may exist. These are just a couple of the many applications of a conditional count.
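As a rough sketch of the per-category case, the code below counts the items in each level of a factor column, first with a conditional sum per level and then with base R's table function, which does the same tally in one call. The data frame and its column names (group, value) are made up for illustration.

```r
# Made-up data: 'group' is a hypothetical factor column.
df <- data.frame(group = factor(c("a", "b", "a", "c", "b", "a")),
                 value = c(10, 20, 30, 40, 50, 60))

# One conditional sum per category.
by_sum <- sapply(levels(df$group), function(g) sum(df$group == g))

# table() tallies every category at once.
counts <- table(df$group)        # a: 3, b: 2, c: 1
```

Passing counts to barplot would draw the tallies as a bar chart, which is one way to produce the kind of plot mentioned above.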
A conditional count is a useful tool that gives you the number of occurrences of a given value within a data set. In R, you make such a count by using the sum function with a conditional statement. It is straightforward to use, as long as you keep its quirks in mind, chiefly the handling of missing values and the way multiple criteria are combined.