How to Remove Variables in R

When you work with previously created data, the dataset may contain more variables than you need. Unused variables can get in the way of an analysis, so it is worth removing them. In this R tutorial, we’ll show you how to delete variables (data frame columns), rows, or both when you don’t need them for your calculations or summary statistics.

The subset() Function

This is the main function for removing variables from datasets. It takes the form subset(x, row-subset, column-select), where row-subset is a Boolean expression (true or false) evaluated for each row and column-select is a list of the columns to be removed or retained. It is fairly simple to use once you get the hang of it. You can omit either row-subset or column-select, but you need to include at least one of them for the call to change anything. When applying it to a vector, such as a character vector, you can only use row-subset, because a vector has no columns to select. The function is at its handiest with data frames.
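The call shape described above can be sketched as follows. The data frame df and the vector v are made up for illustration and are not part of this tutorial's data. In base R the two arguments are named subset and select, so they can also be passed by name.

```r
# Hypothetical data frame used only for illustration
df <- data.frame(id = 1:4,
                 score = c(10, 25, 7, 30),
                 grp = c("a", "b", "a", "b"))

# Row-subset only: keep rows where score is above 9
rows_only <- subset(df, subset = score > 9)

# Column-select only: keep the id and score columns
cols_only <- subset(df, select = c(id, score))

# On a plain vector there are no columns, so only the row-subset applies
v <- c(3, 8, 12)
big <- subset(v, v > 5)

print(rows_only)
print(cols_only)
print(big)
```

Passing the arguments by name is optional, but it makes it obvious which one is the row condition and which one is the column list.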

Basic Variable Removal

Basic variable removal is an important part of data manipulation. If data contained in an original data set is not needed for analysis, it is best to remove it so that it does not get in the way. Here is a sample section of code that shows this process in action.

# how to remove variables in R
> a = data.frame("x" = c(5, 2, 10), "y" = c(2, 10, 8), "z" = c(8, 15, 20), "name" = c("Bob", "Tom", "Sue"))
> a
   x  y  z name
1  5  2  8  Bob
2  2 10 15  Tom
3 10  8 20  Sue

> a = subset(a, select = c(x, y, z))
> a
   x  y  z
1  5  2  8
2  2 10 15
3 10  8 20

As you can see, the data frame starts with four columns, and the function keeps only the columns named in the list, which removes the “name” column. If you instead write this argument as “select = -c()”, the function removes the columns whose names are on the list. You can select any of the columns for removal, and you can remove multiple columns at the same time. The column list can also be written in other forms, such as a range of adjacent columns like x:z. Furthermore, this flexibility extends beyond removing columns and includes the removal of rows as well.
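As a small sketch of the negative form described above, the following reuses the tutorial's data frame and drops columns by name instead of listing the ones to keep. The result names no_name, only_x, and x_to_z are illustrative.

```r
a <- data.frame(x = c(5, 2, 10), y = c(2, 10, 8), z = c(8, 15, 20),
                name = c("Bob", "Tom", "Sue"))

# Drop a single column by negating its name
no_name <- subset(a, select = -name)

# Drop several columns at once
only_x <- subset(a, select = -c(y, z, name))

# Keep a contiguous range of columns, x through z
x_to_z <- subset(a, select = x:z)

names(no_name)   # "x" "y" "z"
names(only_x)    # "x"
names(x_to_z)    # "x" "y" "z"
```

Note that only_x is still a data frame with one column, not a bare vector, because subset() does not drop dimensions by default.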

Removal of Rows

One of this subset function’s arguments (row-subset) removes or retains rows based on the Boolean expression used. In our example below this argument is z > 10, so the function checks each value in “z” to see whether it is greater than 10. It then keeps the rows that meet this condition and eliminates the ones that do not.

# remove rows in R
> a = data.frame("x" = c(5, 2, 10), "y" = c(2, 10, 8), "z" = c(8, 15, 20), "name" = c("Bob", "Tom", "Sue"))
> a
   x  y  z name
1  5  2  8  Bob
2  2 10 15  Tom
3 10  8 20  Sue

> a = subset(a, z > 10)
> a
   x  y  z name
2  2 10 15  Tom
3 10  8 20  Sue

In the example above we are simply checking for values above 10 and removing the rows where “z” is 10 or lower. This is a simple situation but a common one. Another practical use of this function is the removal of rows with a missing value (NA) in one of the columns, which is one of the main reasons you would want to remove rows in R. A row whose condition evaluates to NA is treated as not meeting the condition, so it is removed as well.
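The missing-value case can be sketched like this. The data frame b is hypothetical and only exists to show rows with NA; is.na() and complete.cases() are base R functions.

```r
# Hypothetical data with missing values in z
b <- data.frame(x = c(5, 2, 10, 7),
                z = c(8, NA, 20, NA),
                name = c("Bob", "Tom", "Sue", "Ann"))

# Keep only the rows where z is present
has_z <- subset(b, !is.na(z))

# A comparison on a column containing NA also drops the NA rows,
# because subset() treats an NA condition as FALSE
z_over_10 <- subset(b, z > 10)

# complete.cases() flags rows that have no NA in any column
complete <- subset(b, complete.cases(b))

print(has_z)
print(z_over_10)
print(complete)
```

Testing one column with is.na() is the narrower choice, while complete.cases() is stricter because a missing value in any column removes the row.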

Removal of Columns and Rows

Another part of this subset function’s flexibility is the ability to delete both columns and rows at the same time. This is done by using both of the arguments: row-subset decides which rows are kept and column-select decides which columns are kept. While you could do this in two separate steps, doing it in one call demonstrates this subsetting function’s power. With a suitable row condition, it can even be used to delete duplicate rows or rows with missing values in several variables.

# remove data frame columns and rows in R
> a = data.frame("x" = c(5, 2, 10), "y" = c(2, 10, 8), "z" = c(8, 15, 20), "name" = c("Bob", "Tom", "Sue"))
> a
   x  y  z name
1  5  2  8  Bob
2  2 10 15  Tom
3 10  8 20  Sue

> a = subset(a, z > 10, select = -c(z, name))
> a
   x  y
2  2 10
3 10  8

In the case above we use both arguments: z > 10 for the row-subset and -c(z, name) for the column-select. You can see that the minus sign causes the named columns to be deleted rather than retained, as they were in the first example.
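The duplicate-row case mentioned earlier can be handled the same way. This is only a sketch, assuming a hypothetical data frame d with one repeated row; it uses the base R function duplicated() as the row condition.

```r
# Hypothetical data frame with one repeated row
d <- data.frame(x = c(5, 2, 2, 10),
                z = c(8, 15, 15, 20),
                name = c("Bob", "Tom", "Tom", "Sue"))

# duplicated() marks the second and later copies of a row,
# so negating it keeps the first copy of each
deduped <- subset(d, !duplicated(d))

# Row conditions and column selection combined in a single call
result <- subset(d, !duplicated(d) & z > 10, select = -name)

print(deduped)
print(result)
```

Here the two row conditions are joined with &, so a row has to be both a first occurrence and above the threshold to be kept.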

Now you know how to remove variables in R. The real trick is figuring out how best to set up the arguments to get the results that you are looking for. Even if a data source gives you a monster-sized data frame, you can trim it down to a more convenient and workable size.