There are occasions in data science when you need to know how many times a given value occurs. This situation arises most often when you have a limited set of possible values to compare. A good example is finding how many people in a large group share the same birthday.
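As a minimal sketch of the birthday case, the R code below uses a small, made-up vector of birthdays (the `birthdays` values are hypothetical, stored as "MM-DD" strings) and counts how many people share each one with the table() function.

```r
# Hypothetical data: birthdays for a small group, stored as "MM-DD" strings
birthdays <- c("03-14", "07-04", "03-14", "12-25", "07-04", "03-14", "01-01")

# Count how many people have each birthday
birthday_counts <- table(birthdays)
print(birthday_counts)

# Keep only the birthdays shared by more than one person
shared <- birthday_counts[birthday_counts > 1]
print(shared)
```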
Why Count The Number Of Occurrences In a Column?
Often, the raw content of a data set does not show clear relationships. Counting occurrences can reveal relationships that would otherwise stay hidden, mainly when the range of values being compared is limited. Counting the number of occurrences in a column in R is a quick way to bring those relationships to the surface.
Counting the occurrences of distinct values gives you new information about the data set. Furthermore, counting occurrences across multiple columns can show relationships between those columns that you would not see simply by looking at the raw numbers. Finding these relationships can have a big impact on how you view the data.
How To Count The Number Of Occurrences In A Column
Counting occurrences in R is similar to the COUNTIF function in Excel: you give it a range to check, and it returns the number of occurrences. Here the range is a column of a data frame, and the table() function returns the count of each distinct value in that column.
# how to count number of occurrences in a column
> df = ToothGrowth
> table(df$supp)
OJ VC
30 30
In this example, the table() function shows the number of occurrences of the two values in the column “supp”, each of which occurs thirty times. This is the simplest form of the function; the variations below yield more information.
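To push the Excel comparison a little further, here is a sketch of two ways to get the count for one specific value, much as COUNTIF would. It uses the built-in ToothGrowth data set; the variable names `oj_count`, `supp_counts`, and `vc_count` are only illustrative.

```r
# ToothGrowth ships with base R (datasets package)
df <- ToothGrowth

# COUNTIF-style: count the rows where one specific value appears
oj_count <- sum(df$supp == "OJ")
print(oj_count)  # 30

# Or keep the full frequency table and look up a single count by name
supp_counts <- table(df$supp)
vc_count <- as.integer(supp_counts["VC"])
print(vc_count)  # 30
```

The sum() approach is handy when you only care about one value; table() is better when you want every value at once.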
Comparing Multiple Columns
To count occurrences across columns, pass both columns to table(). It returns a cross-tabulation giving the frequency of every combination of values from the two columns. The result is a table of all those combinations that can be used for further processing, which expands the variety of comparisons you can make.
# comparing multiple columns
> df = ToothGrowth
> table(df$supp, df$dose)
     0.5  1  2
  OJ  10 10 10
  VC  10 10 10
In this example, each combination of supplement and dose occurs ten times. Such an even distribution is unusual in real data, but it makes an easy test case for the examples that follow.
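One way to turn the cross-tabulation into a data set for further processing is to convert it with as.data.frame(), which produces one row per combination and a `Freq` column of counts. The sketch below names the table arguments (`supp`, `dose`) so the resulting columns get readable names; those names are a choice made here, not something table() requires.

```r
df <- ToothGrowth

# Named arguments become the dimension names of the table
supp_dose <- table(supp = df$supp, dose = df$dose)

# Long format: one row per supp/dose combination, counts in Freq
supp_dose_df <- as.data.frame(supp_dose)
print(supp_dose_df)
```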
Checking For NA Values
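By default, the table() function ignores NA values and counts only the values that are present; logical TRUE and FALSE values, by contrast, are counted like any other value. This means missing values do not distort the counts, but it also means they silently disappear from the results.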
# occurrences in a column of NA values
> df = ToothGrowth
> df$dose[1] = NA   # e.g. rows 1 and 2, two of the VC rows at dose 0.5
> df$dose[2] = NA
> table(df$supp, df$dose)
     0.5  1  2
  OJ  10 10 10
  VC   8 10 10
> table(df$supp, is.na(df$dose))
     FALSE TRUE
  OJ    30    0
  VC    28    2
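In this example, we replace two of the VC dose values with NA. Only the numeric dose column changes; the supp column is left untouched. The first table silently drops the two NA rows, so the VC count at dose 0.5 falls from 10 to 8. The second table uses is.na() to count missing and non-missing doses for each supplement, so the two NA values show up in the TRUE column.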
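If you only need to know how many values are missing, is.na() can be combined with sum() or colSums() without building a table. This sketch recreates the modified data by blanking out rows 1 and 2 (an illustrative choice of two VC rows at dose 0.5).

```r
# Recreate the modified data: two VC rows at dose 0.5 set to NA
# (rows 1 and 2 are an illustrative choice)
df <- ToothGrowth
df$dose[c(1, 2)] <- NA

# Total number of missing doses
missing_doses <- sum(is.na(df$dose))
print(missing_doses)  # 2

# Missing values in every column at once
missing_per_column <- colSums(is.na(df))
print(missing_per_column)  # len 0, supp 0, dose 2
```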
Including NA Values
Sometimes a column contains NA values instead of numbers or strings, and you may want to include a count of those values as well.
# checking occurrences in a column counting NA values
> df = ToothGrowth
> df$dose[1] = NA   # e.g. rows 1 and 2, two of the VC rows at dose 0.5
> df$dose[2] = NA
> table(df$supp, df$dose, useNA = "always")
       0.5  1  2 <NA>
  OJ    10 10 10    0
  VC     8 10 10    2
  <NA>   0  0  0    0
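In this example, the argument useNA = "always" tells the table() function to include NA values. The result gains an extra column and an extra row labeled <NA>. The NA row is all zeros because the supp column has no missing values; with "always", the NA entries appear even when their count is zero.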
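table() also accepts useNA = "ifany", which adds the NA entry only when missing values are actually present. The sketch below contrasts the two settings on single columns, again assuming rows 1 and 2 were set to NA as an illustrative choice.

```r
# Recreate the modified data: two VC rows at dose 0.5 set to NA
# (rows 1 and 2 are an illustrative choice)
df <- ToothGrowth
df$dose[c(1, 2)] <- NA

# "ifany" adds an NA entry only when missing values are present
dose_ifany <- table(df$dose, useNA = "ifany")
supp_ifany <- table(df$supp, useNA = "ifany")

# "always" adds the NA entry even when its count is zero
supp_always <- table(df$supp, useNA = "always")

print(dose_ifany)   # 0.5, 1, 2 and <NA>
print(supp_ifany)   # OJ and VC only: supp has no missing values
print(supp_always)  # OJ, VC and <NA> with a count of 0
```

"ifany" keeps tables compact when most columns are complete, while "always" gives every table the same shape, which can be convenient when comparing several of them.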
Range Checking
Range checking is one practical use of the table() function. By passing a logical condition instead of a raw column, you can count how many rows in the data set have a value above, below, or equal to a certain value.
# counting occurrences in a column range checking
> df = ToothGrowth
> table(df$supp, df$dose < 2)
     FALSE TRUE
  OJ    10   20
  VC    10   20
In this example, for each supplement, the TRUE column counts the rows with a dose below two (doses 0.5 and 1), and the FALSE column counts the rows that are not below two (dose 2).
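A single logical condition only splits the data in two. To count values above, below, and equal to a threshold in one table, one option is to label each row first and then tabulate the labels. In this sketch the threshold of 1 and the labels "below", "equal", and "above" are arbitrary choices.

```r
df <- ToothGrowth
threshold <- 1  # illustrative cut-off

# Label each dose relative to the threshold
position <- ifelse(df$dose < threshold, "below",
                   ifelse(df$dose == threshold, "equal", "above"))

# Fix the column order instead of relying on alphabetical sorting
position <- factor(position, levels = c("below", "equal", "above"))

# Count the labels separately for each supplement
position_counts <- table(df$supp, position)
print(position_counts)
```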
The table() function is not limited to data frame columns. It accepts any vectors or factors of equal length, or a list such as a data frame, so you can run standalone vectors through table() and get the same type of results. Counting the number of occurrences is a simple and versatile tool that adds flexibility to data analysis in R.
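Here is a short sketch of table() applied outside a single data frame column. The `color` and `size` vectors are hypothetical data made up for the illustration; the last call passes a two-column data frame directly, which table() treats as a list of columns.

```r
# Hypothetical standalone vectors of equal length
color <- c("red", "blue", "red", "green", "blue", "red")
size  <- c("S",   "M",    "S",   "S",     "L",    "M")

# Count occurrences in one vector
color_counts <- table(color)
print(color_counts)

# Cross-tabulate two vectors
color_size <- table(color, size)
print(color_size)

# A data frame (a list of columns) can also be passed directly
df <- ToothGrowth
supp_dose <- table(df[, c("supp", "dose")])
print(supp_dose)
```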