How to Delete Cases by Values in R From a Dataframe

When working with data frames, you often need to remove rows that meet specific conditions. The process is simple, though it can be written in several ways. In short, you test a column of the data frame against a condition and keep only the rows that do not match the values you want to exclude.

What We’re Solving For

The R code to remove a row is a base R operation: you supply the column to check and the condition to check for. It is applied to the original data frame and tests the specified column against a condition, usually a specific value or set of values that you do not want to include. The condition can also test for missing values, which makes this a handy way of filtering out missing data. Although it is not written as a function call, it reads like one: the data frame’s name is followed by the condition in square brackets.

Explanation

Data frames often contain more data than you are interested in. Sometimes a row will contain data, such as dates, that fall outside the range you are working with. In that case, a row-removal procedure is what you need. The procedure applies a logical filter to the selected column: any row where the condition you want to exclude is met is removed. This is a quick and easy way to clean a data frame of data that you cannot process.
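To make the filtering step concrete, here is a minimal sketch of what the condition produces before any rows are removed. The data frame and the name keep are made up for illustration only.

```r
# Small illustrative data frame (hypothetical values).
df <- data.frame(A = c(1, 2, 3), B = c("x", "y", "z"))

# The comparison returns one TRUE/FALSE value per row.
keep <- df$A != 2
keep

# Rows where keep is TRUE are retained; the trailing comma selects all columns.
df2 <- df[keep, ]
df2
```

Storing the logical vector in a variable first is optional, but it makes it easy to inspect which rows will be kept before the data frame is changed.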

Examples

Here are several examples of this operation in action. The first block shows the data frame being used, followed by three examples that filter it with bracket indexing. The final example does the same job with a function.

> df = data.frame(A = c(1, 2, 3, 4, 5, 6, 7),
+                 B = c(1, 2, 3, 4, 1, 2, 3),
+                 C = c("A", "B", "C", "D", "E", "F", "G"),
+                 D = c("A", "B", "C", "D", "A", "B", "C"))
> df
  A B C D
1 1 1 A A
2 2 2 B B
3 3 3 C C
4 4 4 D D
5 5 1 E A
6 6 2 F B
7 7 3 G C

This section of R code simply displays the data frame being used, without any filtering.

> df = data.frame(A = c(1, 2, 3, 4, 5, 6, 7),
+                 B = c(1, 2, 3, 4, 1, 2, 3),
+                 C = c("A", "B", "C", "D", "E", "F", "G"),
+                 D = c("A", "B", "C", "D", "A", "B", "C"))
> df2 = df[df$A != 2, ]
> df2
  A B C D
1 1 1 A A
3 3 3 C C
4 4 4 D D
5 5 1 E A
6 6 2 F B
7 7 3 G C

This example filters our data frame with a single comparison operator that excludes rows where column A equals 2. It also stores the result in a new data frame so as not to overwrite the original.

> df = data.frame(A = c(1, 2, 3, 4, 5, 6, 7),
+                 B = c(1, 2, 3, 4, 1, 2, 3),
+                 C = c("A", "B", "C", "D", "E", "F", "G"),
+                 D = c("A", "B", "C", "D", "A", "B", "C"))
> df2 = df[df$B != 2, ]
> df2
  A B C D
1 1 1 A A
3 3 3 C C
4 4 4 D D
5 5 1 E A
7 7 3 G C

In this example, the value being looked for appears in two rows of column B, and both rows are removed.
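When more than one value should be excluded from the same column, one possible approach is the %in% operator combined with negation. This is a sketch rather than the only way to do it; the vector name drop_values and the values in it are chosen purely for illustration.

```r
df <- data.frame(A = c(1, 2, 3, 4, 5, 6, 7),
                 B = c(1, 2, 3, 4, 1, 2, 3))

# Values to exclude (chosen only for illustration).
drop_values <- c(2, 4)

# %in% tests each element of B for membership in drop_values; ! negates the result.
df2 <- df[!(df$B %in% drop_values), ]
df2
```

This avoids chaining several != comparisons together when the list of unwanted values grows.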

> df = data.frame(A = c(1, 2, 3, 4, 5, 6, 7),
+                 B = c(1, 2, 3, 4, 1, 2, 3),
+                 C = c("A", "B", "C", "D", "E", "F", "G"),
+                 D = c("A", "B", "C", "D", "A", "B", "C"))
> df2 = df[df$B != 2 & df$C != "D", ]
> df2
  A B C D
1 1 1 A A
3 3 3 C C
5 5 1 E A
7 7 3 G C

This example combines two conditions with &. A row is kept only if both conditions are true, so every row where B equals 2 or C equals "D" is removed.
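The same filter can also be written by describing the rows to remove and then negating that description, which some readers find easier to follow. The sketch below assumes the same columns as the example above; the variable name remove is hypothetical.

```r
df <- data.frame(A = c(1, 2, 3, 4, 5, 6, 7),
                 B = c(1, 2, 3, 4, 1, 2, 3),
                 C = c("A", "B", "C", "D", "E", "F", "G"))

# Describe the rows to remove, then negate the whole condition.
remove <- df$B == 2 | df$C == "D"
df2 <- df[!remove, ]

# The "keep" form used in the example above, for comparison.
df3 <- df[df$B != 2 & df$C != "D", ]
identical(df2, df3)
```

Both forms select the same rows as long as the columns contain no missing values; which one to use is a matter of readability.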

> df = data.frame(A = c(1, 2, 3, 4, 5, 6, 7),
+                 B = c(1, 2, 3, 4, 1, 2, 3),
+                 C = c("A", "B", "C", "D", "E", "F", "G"),
+                 D = c("A", "B", "C", "D", "A", "B", "C"))
> df2 = subset(df, B != 2 & C != "D")
> df2
  A B C D
1 1 1 A A
3 3 3 C C
5 5 1 E A
7 7 3 G C

This final example repeats the previous one using the subset function.
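The two approaches give the same result on this data, but they can differ when the tested column contains missing values. The following sketch uses a made-up data frame with one NA to show the difference; the names by_bracket and by_subset are illustrative.

```r
# Hypothetical data with one missing value in B.
df <- data.frame(A = 1:4, B = c(1, 2, NA, 4))

# Bracket indexing: comparing NA yields NA, which produces a row filled with NA.
by_bracket <- df[df$B != 2, ]

# subset(): rows where the condition is NA are dropped, and columns
# can be referred to by name without the df$ prefix.
by_subset <- subset(df, B != 2)

nrow(by_bracket)
nrow(by_subset)
```

If the column may contain missing values, it is worth checking which behavior you want before choosing between the two forms.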

Application

A primary application of this procedure is eliminating cases with missing values from a data frame. Sometimes the vector that makes up a data frame column has missing data, and these incomplete cases need to be dealt with before much processing can be done. Because a comparison with NA returns NA, missing values are tested with is.na() rather than with == or !=. Another common application is the elimination of duplicate rows where duplication is redundant. The procedure is also helpful for eliminating data that is out of the range of the study, for example, dates that fall outside the time range being studied.
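These three applications might look roughly like the sketch below. The data frame, its column names (id, value, date), and the study window are all invented for illustration.

```r
# Hypothetical data: a missing value, a duplicated row, and an out-of-range date.
df <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  value = c(10, 20, 20, NA, 50),
  date  = as.Date(c("2020-01-15", "2020-06-01", "2020-06-01",
                    "2020-09-30", "2021-03-10"))
)

# Drop rows where value is missing.
no_missing <- df[!is.na(df$value), ]

# Drop rows that have a missing value in any column.
complete <- df[complete.cases(df), ]

# Drop repeated rows, keeping the first occurrence of each.
no_dupes <- df[!duplicated(df), ]

# Keep only dates inside the study window (window chosen for illustration).
start <- as.Date("2020-01-01")
end   <- as.Date("2020-12-31")
in_range <- df[df$date >= start & df$date <= end, ]
```

Each result is stored under a new name so the original data frame stays available, in keeping with the earlier examples.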

The routine for conditionally removing rows from a data frame is a helpful, easy-to-use tool that can clean problematic data out of a data frame, and it has many applications.
