We’re going to walk through how to sort data in R. This tutorial is specific to data frames.
This article continues the examples started in our data frame tutorial. We’re using the ChickWeight data frame example which is included in the standard R distribution. You can easily get to this by typing: data(ChickWeight) in the R console. This data frame captures the weight of chickens that were fed different diets over a period of 21 days. If you can imagine someone walking around a research farm with a clipboard for an agricultural experiment, you’ve got the right idea….
This series has a couple of parts – feel free to skip ahead to the most relevant parts.
- Inspecting your data
- Ways to Select a Subset of Data From an R Data Frame
- How To Create an R Data Frame
- How To Sort an R Data Frame (this article)
- How to Add and Remove Columns
- Renaming Columns
- How To Add and Remove Rows
- How to Merge Two Data Frames
Sorting an R Data Frame
Continuing the example in our R data frame tutorial, let us look at how we might be able to sort the data frame into an appropriate order. We will be using the order() function to accomplish this.
The order function’s default sort is in ascending order (from lowest to highest value). A quick hack to reverse this is to add a minus sign to the sorting variable to indicate you want the results sorted in descending order. Note that the minus sign only works on numeric variables; for anything else, order() also accepts a decreasing = TRUE argument. Here are a couple of examples.
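Before turning to the chickens, here is a minimal sketch of what order() actually returns. The vector x is a made-up toy example (it is not part of the ChickWeight data); the point is only to show that order() hands back positions rather than values, and that the minus sign and decreasing = TRUE give the same descending result for numeric data.

```r
# Hypothetical toy vector, not part of the ChickWeight data
x <- c(30, 10, 20)

# order() returns positions, not values: the smallest value sits at position 2
asc_idx   <- order(x)                     # 2 3 1
desc_idx  <- order(-x)                    # 1 3 2 (minus-sign trick, numeric only)
desc_idx2 <- order(x, decreasing = TRUE)  # same positions via the documented argument

# Using the positions as an index rearranges the original vector
x[asc_idx]   # 10 20 30
x[desc_idx]  # 30 20 10
```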
Returning to our feathered subjects (the chickens) for a moment, let’s start by selecting the chickens that were measured on the final day of the study (day 21). We’re going to use conditional indexing to do this quickly.
# r sort dataframe by column
# filter larger data frame to specific set of records
birds <- ChickWeight[ChickWeight$Time == 21, ]
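If conditional indexing is new to you, the one-liner can be unpacked into two steps. This is only an illustrative sketch; the intermediate name is_final_day is our own and is not used elsewhere in this series.

```r
data(ChickWeight)

# Step 1: the comparison yields one TRUE/FALSE value per row of the data frame
is_final_day <- ChickWeight$Time == 21

# Step 2: indexing rows with that logical vector keeps only the TRUE rows;
# the empty slot after the comma means "keep every column"
birds <- ChickWeight[is_final_day, ]

sum(is_final_day)  # number of rows that matched the condition
nrow(birds)        # the same number, now as rows of the filtered data frame
```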
We’ve got a total of 45 birds in the set, by the way. Let’s start by sorting them into order (e.g. order dataframe by column):
# r sort dataframe by column
birds[order(birds$weight), ]
And as you can see, it does a lovely job of sorting the results from smallest to largest.
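To make the mechanics explicit, the sort can be split into "compute the row positions" and "apply them". The sketch below assumes nothing beyond base R; the names idx and birds_sorted are ours, chosen for illustration.

```r
data(ChickWeight)
birds <- ChickWeight[ChickWeight$Time == 21, ]

# order() gives the row positions of birds from lightest to heaviest
idx <- order(birds$weight)
head(idx)

# Using those positions as the row index rearranges the whole data frame
birds_sorted <- birds[idx, ]

# Sanity check: is.unsorted() returns FALSE when a vector is already in order
is.unsorted(birds_sorted$weight)
```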
I’d like to be a bit more picky, however. Perhaps I only want the largest birds, and only the top 5 of them. Easy enough…
# sort dataframe by column in r
# select top N results
birds[order(-birds$weight), ][1:5, ]
We use two techniques to zero in on the results we’re interested in. First, we use a negative sign in front of the variable to sort the results in descending order. Next, we select the first five rows of the sorted data frame for inspection. This yields exactly what we are looking for.
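As an aside, the same top-N selection can be written with head(), which some readers find easier to scan. This is an alternative sketch, not a replacement for the approach above; the variable names are ours.

```r
data(ChickWeight)
birds <- ChickWeight[ChickWeight$Time == 21, ]

# Sort once, heaviest first
heaviest_first <- birds[order(-birds$weight), ]

# Row indexing: returns NA-filled rows if the data frame has fewer than 5 rows
top5_index <- heaviest_first[1:5, ]

# head(): returns at most 5 rows, and simply fewer if fewer exist
top5_head <- head(heaviest_first, 5)

top5_head$weight
```

With 45 birds both forms select the same five rows; the difference only matters when N might exceed the number of rows available.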
Sorting by Multiple Factors
Moving along, what if we wanted to sort the entire list by diet, with the largest birds first within each diet? Easy enough: the order function supports sorting by multiple variables, using each additional variable to break ties in the one before it.
# sort dataframe by column in r
# sorting by multiple variables
birds[order(birds$Diet, -birds$weight), ]
This yields this utterly lovely result, which satisfies our goal.
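The minus sign works here because weight is numeric. For mixed directions on keys that are not numeric (Diet is a factor, for instance), base R offers two documented options, sketched below: a per-key decreasing vector, which requires method = "radix", or wrapping the key in xtfrm() so that it can be negated. The variable names are ours.

```r
data(ChickWeight)
birds <- ChickWeight[ChickWeight$Time == 21, ]

# Diet ascending, weight descending, with an explicit direction per key;
# a vector for `decreasing` is only accepted with method = "radix"
by_diet <- birds[order(birds$Diet, birds$weight,
                       decreasing = c(FALSE, TRUE), method = "radix"), ]

# xtfrm() converts a non-numeric key to sortable numbers, so the minus
# sign can be applied: Diet descending, weight ascending
by_diet_desc <- birds[order(-xtfrm(birds$Diet), birds$weight), ]

head(by_diet)
head(by_diet_desc)
```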
The Difference Between sort(), rank(), and order() in R
order() is not the only sorting-related function in R – you will also run into two close relatives, sort() and rank(). Let’s take a quick pause to explore the difference between sort, rank, and order in R. (R is case sensitive, so the function names are all lowercase.)
- sort() – returns the values of a vector sorted in ascending order (pass decreasing = TRUE to get results in descending order).
- rank() – returns a vector giving the rank of each element within the vector. It does not sort the underlying data.
- order() – returns a vector of positions: the index (within the original vector) of the smallest element, then of the next smallest, and so on. Indexing the original vector with this result returns it in sorted order.
These distinctions become important if you’re writing higher level functions to manipulate data, particularly if you expect to sort the underlying data multiple times. It may be more efficient to use indexes within your calculation process.
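A small side-by-side comparison may help. The vectors below are made-up toy values, not ChickWeight data, chosen so that each function returns a visibly different result.

```r
# Hypothetical toy vector, not part of the ChickWeight data
x <- c(30, 10, 20)

sort(x)                     # 10 20 30  (the values themselves, rearranged)
sort(x, decreasing = TRUE)  # 30 20 10
rank(x)                     # 3 1 2     (30 is the 3rd smallest, 10 the 1st, 20 the 2nd)
order(x)                    # 2 3 1     (the smallest value is at position 2, and so on)

# An index from order() can be computed once and reused to rearrange
# any other vector of the same length in the same way
labels <- c("c", "a", "b")
idx <- order(x)
labels[idx]                 # "a" "b" "c"
```

Reusing idx like this is the "use indexes" idea in practice: the sort is computed once, and the positions are applied wherever they are needed.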
Summary: Sorting a Data Frame in R
As you can see from the examples above, the order function provides you with the essential tool you need to sort a data frame in R. By flipping the sign of a numeric variable, you can control the direction of the sort.
Up next: adding and removing columns from a data frame.