How To Count The Number Of Occurrences In A Column In R

Counting sheep is easy, but counting occurrences in a column of data can be a real headache. Fortunately, R has some nifty functions that can make the task a breeze. In this article, we’ll explore how to use R to count the number of times a certain value appears in a column of data. Whether you’re counting the number of times your boss says ‘um’ in a meeting or keeping track of how many slices of pizza you’ve eaten, these R functions will have you counting like a pro in no time.

We’re going to explore a couple of different options for accomplishing this. The first of these is the table() function in base R. This provides a handy way to aggregate and count unique values in a R data frame. We’ll explore a couple of edge cases, including counting missing values and checking multiple columns. We’ll also explore range checking, which uses the table() function to determine can tell you how many places in the dataset have a unique value above, below, or equal to a certain value. Finally, we’ll explore how to accomplish the same task using the aggregate () function in R.

Why Count The Number Of Occurrences In a Column?

Often, the raw content of a data set does not show clear relationships. In some cases, counting occurrences can show otherwise hidden relationships. These cases mainly occur when the range of values being compared is limited. When you in R count the number of occurrences in a column, it can help reveal those relationships. Learning to count in R, whether it be a categorical variable, for example animal species or new column names, can help improve the return value of your data analysis, and the summary statistic output that this type of function provides can help you create a graph, identify a specific value, calculate the correlation coefficient, or even find missing data in any single column or object.

When counting the occurrence of distinct values, it gives you new information about the data set. Furthermore, when you count occurances among multiple columns it can show relationships between columns that you would not see simply by looking at the raw numbers. Finding these relationships can have a big impact on how you view information.

How To Count The Number Of Occurrences In A Column

The process of counting the number of occurrences is similar to the count function in Excel. You give it a range to check and it gives the number of occurrences. In this case, it is a data frame for that range.

# how to count number of occurrences in a column
> df = ToothGrowth
> table(df$supp)

OJ VC
30 30

Is this example, the table() function shows the number of occurrences for the two values in the column “supp” both of which have thirty occurrences. This is the simplest form of this function, the others yield more information.

Comparing Multiple Columns

To count occurrences between columns, simply use both names, and it provides the frequency between the values of each column. This process produces a dataset of all those comparisons that can be used for further processing. It expands the variety a comparison you can make.

# comparing multiple columns
> df = ToothGrowth
> table(df$supp, df$dose)

0.5 1 2
OJ 10 10 10
VC 10 10 10

In this example, the two columns of the data frame have a frequency of ten across each of their values. While it is unusual to have such an even distribution, it makes for an easy test case for future examples.

Checking For NA Values

The table() function usually ignores NA or true false values and only count occurrences of a text string and numeric value. This fact means that in general, you can ignore them.

# occurrences in a column of NA values

> df = ToothGrowth
> df$dose[5] = NA
> df$dose[10] = NA
> table(df$supp, df$dose)

0.5 1 2
OJ 10 10 10
VC 8 10 10
> table(df$supp, is.na(df$dose))

FALSE TRUE
OJ 30 0
VC 28 2

In this example, we substitute the original distinct values for NA values. These were numeric values but we did not touch the string values. The first table array shows the effect of NA values and in the second table, they are counted.

Including NA Values

In this situation instead of having a unique value of a number or a string, but rather an NA value, you may want to include a count of those values as well.

# checking occurrences in a column counting NA values
> df = ToothGrowth
> df$dose[5] = NA
> df$dose[10] = NA
> table(df$supp, df$dose, useNA = "always")

0.5 1 2 NA
OJ 10 10 10 0
VC 8 10 10 2
NA 0 0 0 0

In this example, we included an argument that tells the table() function to include NA values. The result is the addition of a column and row for that addition.

Range Checking

Range checking is one practical use of the table() function. It can tell you how many places in the dataset have a unique value above, below, or equal to a certain value.

# counting occurrences in a column range checking
> df = ToothGrowth
> table(df$supp, df$dose>2)

FALSE TRUE
OJ 10 20
VC 10 20

In this example, we have the sum of how many values are less than two and not less than two for each supplement.

This method can be used with dataframes, which make handling your data a lot more user-friendly. Check out our handy guide about converting lists to dataframes here.

The table() function also works with arrays. So, you can put a group of vectors through the array formula and then the table() formula to get the same type of results. Being able to count the number of occurrences is a convenient tool, and it is a simple and versatile tool that adds flexibility to R programming.

Alternative Approach: Using aggregate() to Group Data by Columns and Count Occurrences of A Value

The aggregate() function in R is used to group data by one or more columns and perform calculations on the grouped data. Here’s an example of how to use the aggregate() function in R to group data by one or more columns and perform calculations, using cartoon characters from different TV shows:

# create a data frame of cartoon characters
df <- data.frame(name = c("Homer Simpson", "Marge Simpson", "Bart Simpson", "Lisa Simpson", "Maggie Simpson",
                          "Peter Griffin", "Lois Griffin", "Chris Griffin", "Meg Griffin", "Stewie Griffin",
                          "SpongeBob SquarePants", "Patrick Star", "Squidward Tentacles", "Sandy Cheeks", "Mr. Krabs"),
                 show = c("The Simpsons", "The Simpsons", "The Simpsons", "The Simpsons", "The Simpsons",
                          "Family Guy", "Family Guy", "Family Guy", "Family Guy", "Family Guy",
                          "SpongeBob SquarePants", "SpongeBob SquarePants", "SpongeBob SquarePants", "SpongeBob SquarePants", "SpongeBob SquarePants"),
                 gender = c("male", "female", "male", "female", "female",
                            "male", "female", "male", "female", "male",
                            "male", "male", "male", "female", "male"))

# group the data by the "show" and "gender" columns and count the number of characters in each group
aggregate(name ~ show + gender, data = df, function(x) length(x))

In this example, the aggregate() function groups the data in the df data frame by the “show” and “gender” columns and counts the number of characters in each group.

The formula name ~ show + gender specifies that we want to group the data by the “show” and “gender” columns and aggregate the “name” column. The function function(x) length(x) is applied to the “name” column and counts the number of characters in each group.

The output of the aggregate() function is a new data frame that contains the results of the aggregation. In this example, the output is:

                   show gender name
1            Family Guy female    2
2 SpongeBob SquarePants female    1
3          The Simpsons female    3
4            Family Guy   male    3
5 SpongeBob SquarePants   male    4
6          The Simpsons   male    2