Data frames are central to any programming language that emphasizes statistics and data science. It should come as no surprise, then, that the R programming language provides a wide range of functions for managing data frames. But mastering the basics of data frame manipulation is just as important as mastering the more advanced techniques. For example, what’s the best way to delete a data frame?
Data Frames and Definitions
Before considering how to delete a data frame in R, we need to properly define one. A data frame in R is essentially a container designed to hold two-dimensional data. Specifically, it stores data in rows and columns, roughly analogous to a standard spreadsheet. Data frames offer more freedom than comparable collection types in R such as matrices, because each column can hold a different type of data: numbers, characters, or factors. But they also come with an important restriction: we typically need specific functions to manipulate, and to some extent even analyze, data frames. For deletion, two of our most important tools are the ls function and the rm function. These two functions provide everything we need to learn how to delete a data frame in R.
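To make the contrast with matrices concrete, here is a minimal sketch. The people frame and its column names are made up for illustration and do not appear elsewhere in this article.

```r
# Hypothetical example data: one character, one numeric, one factor column
people <- data.frame(
  name  = c("Ana", "Ben", "Cho"),
  age   = c(31, 45, 28),
  group = factor(c("a", "b", "a")),
  stringsAsFactors = FALSE  # keep "name" as character on older R versions too
)

# Each column of a data frame keeps its own type
colTypes <- sapply(people, class)
print(colTypes)

# A matrix holds a single type, so converting this mixed frame
# coerces every value, including the ages, to character
asMatrix <- as.matrix(people)
print(class(asMatrix[, "age"]))
```

The conversion at the end shows why a data frame is usually the better fit for mixed data: the numeric ages survive in the frame but become strings in the matrix.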
Basic Data Frame Manipulation
The best way to understand data frame manipulation is to jump right in. Take a look at the following example to see how we can work with data frames and the ls function.
ls()
ourFrame <- data.frame(part1 = c(51, 52, 53, 54, 55),
                       part2 = c(56, 57, 58, 59, 60))
ourFrame1 <- data.frame(1, 2, 3, 4, 5)
ourFrame2 <- data.frame(6, 7, 8, 9, 10)
ourFrame3 <- data.frame(11, 12, 13, 14, 15)
ourFrame4 <- data.frame(16, 17, 18, 19, 20)
ourFrame44 <- data.frame(44, 44, 44, 44, 44)
ls()
We start out by calling ls. This lists the names of all the variables and functions currently in our working environment. Next, we define a series of data frames before calling ls once again. The important point to note here is that the second call to ls shows our new data frames. However, one of the frames is notably different from the others. The ourFrame44 frame doesn’t fit the naming convention or the numerical progression. So how would we go about deleting it? This process is relatively simple. Just add the following code to the script.
rm(ourFrame44)
ls()
This additional code makes use of the rm function. The name rm is short for remove, and the function removes the named objects from the environment. That is exactly what it does to the ourFrame44 frame that we passed as an argument. When ls is called again on the next line, you’ll see that ourFrame44 is now absent from the list of frames and other variables. The rm and ls functions provide us with the basic tools to delete data frames. But we can also build on them for some more advanced usage scenarios.
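Rather than reading through the output of ls by eye, we can check for a name programmatically. The following is a minimal sketch, assuming a fresh session; the before and after variables are introduced here purely for illustration.

```r
# Recreate the odd one out from the example above
ourFrame44 <- data.frame(44, 44, 44, 44, 44)

# ls() returns a character vector of names, so %in% can test membership
before <- "ourFrame44" %in% ls()

rm(ourFrame44)

after <- "ourFrame44" %in% ls()

print(before)  # TRUE
print(after)   # FALSE
```

Note that calling rm on a name that no longer exists produces a warning rather than an error, so a check like this can be handy in longer scripts.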
More Advanced Deletion Scenarios
We can use a similar technique to remove multiple data frames at the same time. Try replacing the rm call in our previous example with the following.
rm(ourFrame1, ourFrame2)
In this example we pass two arguments to rm. If you run this code, you’ll see that rm behaves the same way whether we pass one, two, or more arguments: it removes every data frame we name.
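When there are many frames to remove, typing every name becomes tedious. One possible approach, sketched below, combines the pattern argument of ls with the list argument of rm. The regular expression and the numbered and remaining variables are illustrative choices, and the sketch assumes no other objects in the session have names starting with ourFrame.

```r
ourFrame  <- data.frame(part1 = c(51, 52, 53, 54, 55),
                        part2 = c(56, 57, 58, 59, 60))
ourFrame1 <- data.frame(1, 2, 3, 4, 5)
ourFrame2 <- data.frame(6, 7, 8, 9, 10)
ourFrame3 <- data.frame(11, 12, 13, 14, 15)

# Collect names made of "ourFrame" followed by one or more digits
numbered <- ls(pattern = "^ourFrame[0-9]+$")
print(numbered)

# The list argument accepts a character vector of object names
rm(list = numbered)

# Only the frame without a numeric suffix should be left
remaining <- ls(pattern = "^ourFrame")
print(remaining)
```

Because a pattern can match more than intended, it is worth printing the matched names, as done here, before passing them to rm.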
But take a look at ourFrame. What if we wanted to drop part of a data frame’s content without removing the frame entirely? Consider how we tackle this in the following code.
ourFrame <- data.frame(part1 = c(51, 52, 53, 54, 55),
                       part2 = c(56, 57, 58, 59, 60))
print(ourFrame)
dfMod <- subset(ourFrame, select = -c(part1))
print(dfMod)
We begin by assigning a two-column frame to ourFrame. Next, we print its contents to highlight that we are indeed dealing with two separate columns. We then use subset, a function from base R that extracts a subset of a larger data set. In this case, the minus sign in the select argument tells subset to keep everything in ourFrame except the part1 column. This leaves only the part2 column. We assign that result to a new variable, dfMod. Finally, we print it to the screen to show that the part1 column is gone. Note that ourFrame itself is unchanged; the column has been dropped only in the dfMod copy.
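The subset function is not the only way to drop a column. As a hedged sketch of two common base R alternatives, the code below uses logical indexing and NULL assignment; the dfCopy name is introduced here for illustration.

```r
ourFrame <- data.frame(part1 = c(51, 52, 53, 54, 55),
                       part2 = c(56, 57, 58, 59, 60))

# Option 1: keep every column whose name is not "part1".
# drop = FALSE keeps the result a data frame even when one column remains.
dfMod <- ourFrame[, names(ourFrame) != "part1", drop = FALSE]

# Option 2: assigning NULL to a column removes it from that frame.
# Working on a copy leaves ourFrame untouched.
dfCopy <- ourFrame
dfCopy$part1 <- NULL

print(names(dfMod))
print(names(dfCopy))
print(names(ourFrame))
```

The first option returns a new frame, much like subset, while the second modifies the frame it is applied to, which is why the sketch works on a copy.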