Rank in R – How to Rank Data in R

Determining the rank of data points is an important part of statistics. Ranks show where the high and low points are in the data, as well as patterns and fluctuations. Ranking a data set can also reveal additional relationships among the data.

Rank function

The basic form of the rank() function is rank(vector). It returns a vector containing the rank of each value in the input, such that the lowest value has a rank of 1, the second-lowest value has a rank of 2, and so on.

# rank in R - basic example
> x = c(5,1,4,7,10,35,25)
> rank(x)
[1] 3 1 2 4 5 7 6

These are the basics of how to rank data in R. If you look closely at this example, you will see that the first value, 5, has a rank of three because it is the third-lowest value.
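One way to sanity-check a set of ranks, assuming the vector has no ties, is to index the sorted vector by the ranks, which should give back the original vector. The following is a minimal sketch of that check; the names `ranks` and `rebuilt` are only illustrative.

```r
# Same vector as in the basic example
x <- c(5, 1, 4, 7, 10, 35, 25)
ranks <- rank(x)

# With no ties, the value with rank k is the k-th element of sort(x),
# so indexing the sorted vector by the ranks rebuilds x.
rebuilt <- sort(x)[ranks]

print(ranks)
print(identical(rebuilt, x))
```

This check relies on there being no duplicates; with ties, the default method can return fractional ranks that cannot be used as positions.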

NA last option

Rank in R has an optional argument called na.last, which controls how NA values are ranked. It can take four values: TRUE, FALSE, NA, and “keep”.

# rank in R - putting NA last
> x = c(5,1,4,7,NA,35,25)
> rank(x,na.last = TRUE)
[1] 3 1 2 4 7 6 5

TRUE ranks an NA value last, giving it the highest rank. Note that in current versions of R, TRUE is not what you get when this option is omitted: the documented default is “keep”.

# rank in R - putting NA first
> rank(x,na.last = FALSE)
[1] 4 2 3 5 1 7 6

FALSE ranks an NA value first, giving it a rank of 1.

# rank in R - removing NA
> rank(x,na.last = NA)
[1] 3 1 2 4 6 5

NA removes the NA value before ranking, so the result has six ranks for the seven input values.

# rank in R - keeping NA
> rank(x,na.last = "keep")
[1]  3  1  2  4 NA  6  5

“keep” leaves the NA value in its original position and gives it a rank of NA.
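To see the four settings side by side, the calls can be run in a loop. This is a minimal sketch; the names `settings` and `results` are illustrative, and a list is used because the NA setting returns a shorter vector than the others.

```r
x <- c(5, 1, 4, 7, NA, 35, 25)

# Run rank() once per na.last setting and collect the results in a list.
settings <- list(TRUE, FALSE, NA, "keep")
results <- lapply(settings, function(s) rank(x, na.last = s))
names(results) <- c("TRUE", "FALSE", "NA", "keep")

print(results)

# Number of ranks returned by each setting
print(lengths(results))
```

Only the NA setting changes the length of the result, which matters if the ranks are later stored next to the original values.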

Ties option

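When ranking in R, the ties.method option controls how duplicate values are handled. Current versions of R accept six values: “average”, “first”, “last”, “random”, “max”, and “min”. The examples below cover five of them.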

# rank in R - average method for dupes
> x = c(5,1,4,7,4,35,25)
> rank(x,ties.method = "average")
[1] 4.0 1.0 2.5 5.0 2.5 7.0 6.0

“average” assigns each duplicate the average of the ranks the tied values would otherwise occupy. Here the two 4s would take ranks 2 and 3, so each gets 2.5. It is also the default when this option is omitted.

# rank in R - first
> rank(x,ties.method = "first")
[1] 4 1 2 5 3 7 6

“first” breaks ties by position: the duplicate that appears first in the vector gets the lower rank.

# rank in R - random
> rank(x,ties.method = "random")
[1] 4 1 3 5 2 7 6

“random” breaks ties in random order, so the ranks of the two 4s can swap from one run to the next.

# rank in R - max and min
> rank(x,ties.method = "max")
[1] 4 1 3 5 3 7 6
> rank(x,ties.method = "min")
[1] 4 1 2 5 2 7 6

“max” and “min” assign every duplicate the maximum or minimum of the ranks the tied values would occupy, respectively. Here the two 4s both get 3 under “max” and 2 under “min”.
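The tie-handling methods are easier to compare in a single table. This is a minimal sketch using sapply(); the names `methods` and `comparison` are illustrative, and “random” is left out because its result can change between runs.

```r
x <- c(5, 1, 4, 7, 4, 35, 25)

# Deterministic methods only
methods <- c("average", "first", "min", "max")

# sapply() returns a matrix with one column per method
# and one row per element of x.
comparison <- sapply(methods, function(m) rank(x, ties.method = m))

# Label each row with the value it belongs to
rownames(comparison) <- x
print(comparison)
```

Only the two rows for the duplicated value 4 differ across the columns; every other value gets the same rank under all four methods.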

Ranking character vectors

The rank function works on character vectors as well as numbers. When a character vector is passed to rank in R, the strings are ranked in alphabetical order.

# rank in R - alphabetical ordering
> x = c("c","h","a","r","l","e","s")
> rank(x)
[1] 2 4 1 6 5 3 7

Applications

Here is an application of how to rank data in R using the built-in data set mtcars. We use rank to find the numerical order of the miles per gallon of the first ten cars in the list.

# rank in r - ranking cars by gas mileage
> car=head(mtcars$mpg,10)
> car
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2
> rank(car,ties.method = "first")
[1]  5  6  8  7  3  2  1 10  9  4

Note that the first two values are identical, yet one has a rank of five and the other six, because ties.method is “first.” The same thing occurs with the 3rd and 9th values, which receive ranks 8 and 9. An application of these results would be to rank the cars by their mileage.
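The sketch below shows one way such a ranking could look, assuming that higher mileage should rank first and that tied cars should share a rank. The names `cars10` and `mpg_rank` are hypothetical and chosen only for this illustration.

```r
# mtcars ships with R in the datasets package
cars10 <- head(mtcars, 10)

# Negating mpg makes the highest mileage receive rank 1.
# ties.method = "min" gives tied cars the same (best) rank.
cars10$mpg_rank <- rank(-cars10$mpg, ties.method = "min")

# Show each car's mpg and rank, best mileage first
print(cars10[order(cars10$mpg_rank), c("mpg", "mpg_rank")])
```

Using “min” here means the two cars at 22.8 mpg share rank 2 and no car gets rank 3. If every car needs a distinct rank, “first” can be used instead, as in the example above.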

The rank function in R is a useful tool for data science. It makes it easy to see which values are greater than others and to rank objects in a data set by a specific property.