Cut Function in R [How to Split Your Data Into Bins]

Sometimes when dealing with vectors in R you need to separate values into groups. This is where the cut function in R comes into play. It takes a numeric vector and returns a factor recording the category that each value falls under. The result is a new vector that describes how the original data is spread across those categories.

Description of the Cut Function in R

The cut function has the form cut(x, breaks, labels), where x is a numeric vector. It returns a factor giving the category that each value in x falls under, based on the intervals set by the “breaks” argument. The “labels” argument is optional; by default, cut builds the labels from the interval boundaries in “breaks”. It is simple to use once you understand how it works and what it does.

Explanation of the Cut Function

The cut function works on either discrete or continuous variables, dividing the values up according to the breaks. The only requirement is that the input is a numeric vector. It will not accept a data frame, but it will accept an individual numeric column of one. Missing values are allowed and are simply returned as NA. The result is a factor giving the category of each value in the original vector, in the same order as the input, so the values do not need to be sorted first.
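
As a minimal sketch of these points, the snippet below bins one column of a data frame that contains a missing value. The data frame df, its age column, and the break points are made up for illustration.

```r
# Hypothetical data frame; the column name and values are illustrative only
df <- data.frame(age = c(23, 35, NA, 61, 47))

# cut() takes the numeric column, not the whole data frame
df$age_group <- cut(df$age, breaks = c(20, 40, 60, 80))

print(df)
```

The missing age stays NA in the new column, and the new column is a factor. A value that falls outside every interval would also come back as NA.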

Examples of the Cut Function in Action

There are several different ways this function can be applied. Here we will look at three of them.

> x = 1:12
> cut(x, breaks = 3)
[1] (0.989,4.67] (0.989,4.67] (0.989,4.67] (0.989,4.67] (4.67,8.33] (4.67,8.33]
[7] (4.67,8.33] (4.67,8.33] (8.33,12] (8.33,12] (8.33,12] (8.33,12]
Levels: (0.989,4.67] (4.67,8.33] (8.33,12]

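In this example, the breaks argument is a single number, so cut divides the range of the vector into that many equal-width categories and generates the labels itself. In this case, the number is three. The lowest boundary is 0.989 rather than 1 because cut extends the range slightly so that the smallest and largest values fall inside an interval.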

> x = 1:12
> cut(x, breaks = c(0,4, 8, 12))
[1] (0,4] (0,4] (0,4] (0,4] (4,8] (4,8] (4,8] (4,8] (8,12] (8,12] (8,12]
[12] (8,12]
Levels: (0,4] (4,8] (8,12]

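In this example, once again no labels are given, but the breaks are defined explicitly as a vector of cut points, producing a cleaner set of generated categories. Each interval is open on the left and closed on the right by default, which is why the first break is 0 rather than 1.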

> x = 1:12
> cut(x, breaks = c(0, 4, 8, 12), labels = c("A", "B", "C"))
[1] A A A A B B B B C C C C
Levels: A B C

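In this final example, both the breaks and the labels are defined, resulting in more readable output. Note that there is one label per interval, so the labels vector is one element shorter than the breaks vector.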
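
If the category names do not matter, cut can also return plain integer codes instead of a factor. The sketch below reuses the vector and breaks from the examples above and passes labels = FALSE, an option of cut that the examples above do not cover. The name codes is illustrative.

```r
x <- 1:12

# labels = FALSE returns the interval number for each value instead of a factor
codes <- cut(x, breaks = c(0, 4, 8, 12), labels = FALSE)

print(codes)
```

Integer codes like these can be convenient when the result is only used for indexing or grouping.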

Application of the Cut Function in R

There are many applications of this function, but most involve combining it with other functions. You could use a for loop with if statements to count the number of values in each category, although the table function does this directly. You could create a data frame that puts the numeric values in one column and the category labels in another. You could then apply the table or xtabs function to build a cross-tabulation. All of these provide useful ways of looking at data.
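
As a sketch of these ideas, the example below pairs each value with its category in a data frame and then counts the members of each category. It uses base R's table function for the counts rather than a hand-written loop; the names binned and counts are illustrative.

```r
x <- 1:12
category <- cut(x, breaks = c(0, 4, 8, 12), labels = c("A", "B", "C"))

# Numeric values in one column, category labels in another
binned <- data.frame(value = x, category = category)

# Count how many values fall into each category
counts <- table(binned$category)

print(counts)
```

Because cut returns a factor, table reports a count for every category, including any that happen to be empty.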

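Like many functions, cut has a number of optional arguments, such as right, include.lowest, dig.lab, and ordered_result, that change how the intervals are built and labeled. These options improve the functionality and usefulness of the function. Like any tool, the trick is learning how and when to use it.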
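
Two such variations are the right and include.lowest arguments, which control which end of each interval is closed. A minimal sketch, using the same vector as the earlier examples; the variable names and the second set of break points are illustrative.

```r
x <- 1:12

# Default: intervals are (a, b], so a first break at 1 leaves the value 1 out
default_bins <- cut(x, breaks = c(1, 4, 8, 12))

# include.lowest = TRUE closes the first interval on the left: [1,4]
closed_bins <- cut(x, breaks = c(1, 4, 8, 12), include.lowest = TRUE)

# right = FALSE flips every interval to [a, b)
left_closed <- cut(x, breaks = c(1, 5, 9, 13), right = FALSE)

print(default_bins[1])       # NA, because 1 is not inside (1,4]
print(closed_bins[1])
print(levels(left_closed))
```

Checking which end of the interval is closed is worthwhile whenever a value can sit exactly on a break point, since that decides which category it lands in.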
