We’re going to walk through how to sort data in R. This tutorial is specific to data frames. Sorting a data frame by one or more columns reorders its rows, which makes it easier to rank observations, spot the largest and smallest values, and notice any missing values before you move on to further data manipulation on a multi-column data frame.
This article continues the examples started in our data frame tutorial. We’re using the ChickWeight data frame, which is included in the standard R distribution. You can easily get to this by typing data(ChickWeight) in the R console. This data frame captures the weight of chickens that were fed different diets over a period of 21 days. If you can imagine someone walking around a research farm with a clipboard for an agricultural experiment, you’ve got the right idea.
This series has a couple of parts – feel free to skip ahead to the most relevant parts.
- Inspecting your data
- Ways to Select a Subset of Data From an R Data Frame
- How To Create an R Data Frame
- How To Sort an R Data Frame (this article)
- How to Add and Remove Columns
- Renaming Columns
- How To Add and Remove Rows
- How to Merge Two Data Frames
Sorting an R Data Frame
Let’s take a look at the different ways to sort in R, as well as the difference between sort and order in R. Continuing the example from our R data frame tutorial, let us look at how we might sort the data frame into an appropriate order. We will be using the order() function to accomplish this. Rather than returning sorted values, order() returns a vector of row positions; using that vector as the row index of the original data frame returns its rows in the selected sort order, giving you a better organized dataset.
Sorting in R programming is easy. The order function’s default sort is in ascending order (values sorted from lowest to highest). A quick hack to reverse this is to add a minus sign to a numeric sorting variable to indicate you want the results sorted in descending order. Missing values (NA) are placed last by default, which makes them easy to find in the sorted result. Here are a couple of examples.
Returning to our feathered subjects (the chickens) for a moment, let’s start by selecting the chickens that were measured on the final day of the study (day 21). We’re going to use conditional indexing to do this quickly.
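As a minimal sketch of the ascending default and the minus-sign trick, here is order() applied to a small made-up vector (w is a hypothetical example, not ChickWeight data). It also shows the decreasing = TRUE argument, an alternative to the minus sign that works for non-numeric data as well.

```r
# Hypothetical toy vector of weights (not taken from ChickWeight)
w <- c(205, 98, 331, 157)

asc_idx  <- order(w)                     # 2 4 1 3: positions, smallest value first
desc_idx <- order(-w)                    # 3 1 4 2: minus sign flips the direction
desc_arg <- order(w, decreasing = TRUE)  # 3 1 4 2: same result without the sign trick

w[asc_idx]   # 98 157 205 331
w[desc_idx]  # 331 205 157 98
```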
# r sort dataframe by column
# filter larger data frame to specific set of records
birds <- ChickWeight[ChickWeight$Time == 21, ]
We’ve got a total of 45 birds in the set, by the way. Let’s start by sorting them by weight (i.e. ordering the data frame by a column):
# r sort dataframe by column
birds[order(birds$weight), ]
And as you can see, it does a lovely job of sorting the results from smallest to largest.
I’d like to be a bit more picky, however: perhaps only the five largest birds. We can use the column name to indicate which field we’re interested in. Easy enough…
# sort dataframe by column in r
# select top N results
birds[order(-birds$weight), ][1:5, ]
We use two techniques to zero in on the results we’re interested in. First, we put a negative sign in front of the variable to sort the results in descending order (the default is increasing order). Next, we select the first five rows of the sorted data frame for inspection. This yields exactly the result we are looking for.
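The same top-N selection can also be written with head(), which some readers find easier to scan than chained brackets. This is only a sketch of an equivalent spelling; the name top5 is introduced here for illustration.

```r
# Rebuild the day-21 subset so this snippet runs on its own
data(ChickWeight)
birds <- ChickWeight[ChickWeight$Time == 21, ]

# Sort by weight, heaviest first, then keep the first five rows
top5 <- head(birds[order(-birds$weight), ], 5)
top5
```

Unlike [1:5, ], head() simply returns fewer rows when the data frame has fewer than five, rather than padding the result with NA rows.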
Sorting by Multiple Factors (Multiple Columns)
Moving along, what if we wanted to sort the entire list by the largest birds for each diet? Easy enough: the order function supports sorting by multiple variables (values in multiple columns). Ties in the first variable are broken by the second, and so on.
# sort dataframe by column in r
# sorting by multiple variables
birds[order(birds$Diet, -birds$weight), ]
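One caveat worth sketching: the minus sign only works on numeric columns. Diet is a factor, so to sort the diets themselves in descending order you need another approach. Two options are shown below, assuming you want diets from 4 down to 1 with weights ascending within each diet; the names by_xtfrm and by_radix are illustrative only.

```r
data(ChickWeight)
birds <- ChickWeight[ChickWeight$Time == 21, ]

# -birds$Diet would not work: unary minus is not meaningful for factors.

# Option 1: xtfrm() converts the factor to numeric sort keys, which can be negated
by_xtfrm <- birds[order(-xtfrm(birds$Diet), birds$weight), ]

# Option 2: with method = "radix", decreasing accepts one value per sort key
by_radix <- birds[order(birds$Diet, birds$weight,
                        decreasing = c(TRUE, FALSE), method = "radix"), ]
```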
This yields an utterly lovely result, which satisfies our goal: the rows are grouped by diet, with the heaviest birds first within each diet.
The Difference Between sort(), rank(), and order() in R
order() is not the only sorting-related function in R – you may also want to use two other functions, sort() and rank(), for related jobs. Let’s take a quick pause to explore the difference between sort, rank, and order in R. Note that R is case sensitive, so the function names are all lower case.
- sort() – returns the values of a vector sorted in ascending order (pass decreasing = TRUE to get results in descending order).
- rank() – returns a vector providing the rank of each element within a vector. It does not sort the underlying data.
- order() – returns a vector of positions in the original vector, listed in the order that would sort it: the first value is the position of the smallest element, the second is the position of the next smallest, and so on.
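A small sketch makes the three functions easier to tell apart. The vector x below is a made-up example, chosen so that rank() and order() give different answers.

```r
# Hypothetical three-element vector
x <- c(30, 10, 20)

sort(x)                     # 10 20 30: the values themselves, sorted
sort(x, decreasing = TRUE)  # 30 20 10
rank(x)                     # 3 1 2: 30 is the 3rd smallest, 10 the 1st, 20 the 2nd
order(x)                    # 2 3 1: the smallest value sits at position 2, then 3, then 1
x[order(x)]                 # 10 20 30: indexing by order() reproduces sort()
```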
These distinctions become important if you’re writing higher level functions to manipulate data, particularly if you expect to sort the underlying data multiple times. It may be more efficient to use indexes within your calculation process.
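To illustrate the point about indexes, here is a sketch in which the ordering is computed once and then reused to rearrange several columns, instead of sorting each time. The variable names (idx, sorted_weights, and so on) are illustrative assumptions.

```r
data(ChickWeight)
birds <- ChickWeight[ChickWeight$Time == 21, ]

# Compute the sort index a single time
idx <- order(birds$weight)

# Reuse it wherever the same ordering is needed
sorted_weights <- birds$weight[idx]
sorted_chicks  <- birds$Chick[idx]
sorted_diets   <- birds$Diet[idx]
```

Because all three vectors are indexed by the same idx, their elements stay aligned row for row.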
Summary: Sorting a Data Frame in R
As you can see from the examples above, the order function provides you with the essential tool you need to sort a data frame in R. By changing the sign of numeric variables, you can control the direction of the sort.
Up next: adding and removing columns from a data frame. Or, if you want to skip ahead, see the list of parts in this series above.