How to count unique values in R

There are times in data science when you will want to know how many unique elements your data set contains. Sometimes you need the distinct values themselves; other times you only need to know how many there are. Base R has no single dedicated function for counting unique values, but you can get the count by combining two functions.

Description

The unique function, called as unique(x), returns a vector holding the distinct values of x. The length function, called as length(x), returns the number of elements in a vector. Applying length to the result of unique gives length(unique(x)), a single number equal to the count of distinct values in x. This is simpler than using the table function, which tallies how often each value occurs and so does more work than you need when you only want the number of distinct values. Neither function gives this count on its own, but together they answer the question in one short expression.
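As a minimal sketch, the combination can be wrapped in a small helper. The name `count_unique` is hypothetical and not part of base R. The comparison with `table()` assumes a vector without missing values, because `table()` drops `NA` by default.

```r
# Hypothetical helper (not a base R function): count the distinct values in a vector
count_unique <- function(x) {
  length(unique(x))
}

x <- c(2, 4, 4, 6, 8, 7, 7, 11, 11, 16, 20, 8, 6)

count_unique(x)    # 8
# table() tallies how often each value occurs; its length is also the
# number of distinct values, but it builds a full frequency table first
length(table(x))   # 8
```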

Explanation

This works because unique drops every repeated value, keeping only the first occurrence of each one, so its output contains each distinct value exactly once. The length function then counts the elements of that deduplicated vector, which is the number of unique values in the original vector. It is a concise and elegant way to get a unique value count.
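One way to see why this works is through base R's duplicated() function, which flags each element that repeats an earlier one. The sketch below, using the vector from the first example, checks that unique(x) keeps exactly the unflagged elements, so counting them gives the same result as length(unique(x)).

```r
x <- c(2, 4, 4, 6, 8, 7, 7, 11, 11, 16, 20, 8, 6)

# duplicated() is TRUE for every element that repeats an earlier one
duplicated(x)
# unique(x) keeps exactly the elements that are not flagged,
# in order of first appearance
identical(unique(x), x[!duplicated(x)])   # TRUE
# so the unique count is the number of unflagged elements
sum(!duplicated(x))                       # 8
```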

Examples

Here we have four examples of using these functions under different circumstances. They illustrate the process for both vectors and data frames.

> x=c(2,4,4,6,8,7,7,11,11,16,20,8,6)
> x
[1] 2 4 4 6 8 7 7 11 11 16 20 8 6
> unique(x)
[1] 2 4 6 8 7 11 16 20
> length(unique(x))
[1] 8

This is the simplest case: the functions are applied directly to a numeric vector. All you need to do is pass the vector name to length(unique()).

> x=c(2,4,4,6,8,7,7,NA,NA,NA,20,8,6)
> x
[1] 2 4 4 6 8 7 7 NA NA NA 20 8 6
> unique(x)
[1] 2 4 6 8 7 NA 20
> length(unique(x))
[1] 7

In this example, the vector contains missing values. The unique function keeps NA as one of the distinct values, so the missing value is included in the count: the result is 7 even though only 6 distinct numbers are present.
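If missing values should not count, remove them before (or after) calling unique. A minimal sketch using the same vector, with two equivalent base R approaches:

```r
x <- c(2, 4, 4, 6, 8, 7, 7, NA, NA, NA, 20, 8, 6)

length(unique(x))              # 7: NA is counted as one value
# Drop missing values first, then count the observed distinct values
length(unique(x[!is.na(x)]))   # 6
# Equivalent: drop NA from the already-deduplicated result
sum(!is.na(unique(x)))         # 6
```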

> df = data.frame(y=c(2, 3, 3, 4, 5, 5, 6, 8, 8, 9),
+ x=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
> df
   y  x
1  2  1
2  3  2
3  3  3
4  4  4
5  5  5
6  5  6
7  6  7
8  8  8
9  8  9
10 9 10
> unique(df$y)
[1] 2 3 4 5 6 8 9
> unique(df$x)
[1] 1 2 3 4 5 6 7 8 9 10
> unique(c(df$y,df$x))
[1] 2 3 4 5 6 8 9 1 7 10
> length(unique(c(df$y,df$x)))
[1] 10

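This example counts the unique values across multiple columns of a data frame. Combining the columns with c() produces a single vector, so length(unique()) counts each value once no matter which column it appears in.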
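Listing every column inside c() gets tedious for wider data frames. A sketch of one alternative uses base R's unlist(), which flattens all columns into one vector. This assumes the columns share a compatible type; mixing numbers and text would coerce everything to character.

```r
df <- data.frame(y = c(2, 3, 3, 4, 5, 5, 6, 8, 8, 9),
                 x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

# unlist() flattens all columns into a single vector, so this works for
# any number of columns without naming each one
length(unique(unlist(df)))   # 10
```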

> df = data.frame(y=c(2, 3, 3, 4, 5, 5, 6, 8, 8, 9),
+ x=c(1, 2, 1, 4, 4, 6, 7, 7, 9, 9))
> df
   y x
1  2 1
2  3 2
3  3 1
4  4 4
5  5 4
6  5 6
7  6 7
8  8 7
9  8 9
10 9 9
> unique(df$y)
[1] 2 3 4 5 6 8 9
> length(unique(df$y))
[1] 7
> unique(df$x)
[1] 1 2 4 6 7 9
> length(unique(df$x))
[1] 6

This example counts the unique values in each column of a data frame separately, by applying length(unique()) to one column at a time.
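For a data frame with many columns, writing one call per column does not scale. A minimal sketch using base R's sapply() applies the same length(unique()) combination to every column and returns the counts named by column:

```r
df <- data.frame(y = c(2, 3, 3, 4, 5, 5, 6, 8, 8, 9),
                 x = c(1, 2, 1, 4, 4, 6, 7, 7, 9, 9))

# Apply length(unique()) to each column; the result is named by column
counts <- sapply(df, function(col) length(unique(col)))
counts   # y = 7, x = 6
```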

Application

Counting unique values with length and unique has several practical uses. It tells you how many distinct categories a graph will need to show, how many iterations a loop over each distinct value will take, and how many distinct values a factor vector actually contains. Which of these is useful depends on your analysis.
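For factors, this count can differ from the number of levels. The sketch below uses made-up data to show that length(unique(f)) counts the values actually present, while base R's nlevels() counts every declared level, including unused ones.

```r
# A factor can declare levels that never occur in the data
f <- factor(c("low", "high", "low", "high"),
            levels = c("low", "medium", "high"))

length(unique(f))   # 2: distinct values actually present
nlevels(f)          # 3: all declared levels, including unused "medium"
```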

The combination of unique and length functions will provide you with the unique value count of the vector that you are applying them to. Once you understand how to use this combination of functions to count unique values, you will find it an easy process.