How To Get a List of Data Frames Loaded in the R Environment

The R programming language is a fantastic asset to any task related to data science. It’s no secret that the language’s lexicon, style, and data types are expertly tailored to advanced math and science. But there’s also another element to the base R language that isn’t talked about nearly as much. And that’s the fact that the R environment as a whole is equally friendly to continual analysis and modification. For example, you can easily get a list of data frames that are currently loaded within your R environment. Read on to discover just how easily you can incorporate that technique into your own code.

Memory Management, Data Frames, and R

Before we move on to practical examples, it’s important to take a moment to look at how R handles memory management. Every programming language needs to work with variables that are stored in the system’s RAM. The larger the object a variable holds, the more space it takes up in memory. And that footprint can grow quickly when you’re dealing with the huge data sets typically encountered in data science.

Languages typically use some form of memory management to reduce the overall load caused by all of these variables. R uses a garbage collector for this. When an object can no longer be reached from any variable, R automatically frees the memory it occupied. You can also trigger a collection manually by calling gc() in your code, though that’s rarely necessary. However, there’s an important limit to what garbage collection can do.

The garbage collector only reclaims objects that nothing refers to anymore. A data frame that is still bound to a name in your workspace stays in memory until you remove it with rm() or the session ends, even if your code never touches it again. This can lead to data frames sticking around longer than they’re needed. And of course, there are other times when you simply want to find out which frames are in memory. For these cases and more you’ll find yourself needing to peek into R’s active memory to see what’s currently in use.
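
As a minimal sketch of managing this by hand, the snippet below creates a throwaway data frame, checks its size, and removes it with rm() before calling gc(). The name bigFrame and its contents are hypothetical and exist only for illustration.

```r
# Hypothetical data frame, created only to have something to remove.
bigFrame <- data.frame(id = seq_len(100000), value = rnorm(100000))

# Report roughly how much memory the object occupies.
print(object.size(bigFrame), units = "Kb")

# The name is still bound, so the object stays in memory.
existedBefore <- exists("bigFrame")

# Remove the binding; the object is now unreachable.
rm(bigFrame)
existsAfter <- exists("bigFrame")

# Optionally ask R to collect unreachable objects right away.
invisible(gc())
```

Calling gc() here is optional. R would reclaim the memory on its own at the next collection once the name has been removed.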

Taking a Peek Into R’s Active Memory

Tapping into working memory in many programming languages can be an exercise in frustration. But R is a pleasant exception to that rule. It’s extremely easy to just take a glance into the R interpreter’s working memory to see what’s currently in use. In fact, it only takes a single line of code to do so. Take a look at the following example.

jupitersMoons <- data.frame(
  moonName = c("Io", "Europa", "Ganymede", "Callisto"),
  moonDistance = c(421000, 671000, 1070000, 1880000)
)

marsMoons <- data.frame(
  moonName = c("Phobos", "Deimos"),
  moonDistance = c(9378, 23463)
)

varOne <- 1
ls()

We begin by defining two data frames. The first data frame is called jupitersMoons. This data frame contains the names of some of Jupiter’s moons and their distance from the parent planet. We do the same with another data frame called marsMoons. We proceed to create a variable called varOne which contains the number 1.

Finally, we get to the actual code, a single line reading ls(). While this was a fair amount of code, keep in mind that everything except that final line was just variable declarations. When we run ls() it returns the names of all objects defined in the current environment, which at the top level is the global workspace. In this case that means jupitersMoons, marsMoons, and varOne.

So far we have almost everything we were aiming for. We declare data frames and are then able to see them in memory after they’ve been loaded. However, there’s one issue. We also see that varOne variable. That’s the catch with the ls function. It gives us all of the variables in use rather than a narrow selection that’s been pruned by our own criteria.
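
If the objects you care about happen to share a naming convention, ls() can narrow its results through its pattern argument, which takes a regular expression. The sketch below assumes both data frames end in "Moons", as they do in this example. Keep in mind that this filters by name only and knows nothing about an object’s type.

```r
jupitersMoons <- data.frame(
  moonName = c("Io", "Europa", "Ganymede", "Callisto"),
  moonDistance = c(421000, 671000, 1070000, 1880000)
)

marsMoons <- data.frame(
  moonName = c("Phobos", "Deimos"),
  moonDistance = c(9378, 23463)
)

varOne <- 1

# Keep only the object names that end in "Moons".
moonObjects <- ls(pattern = "Moons$")
print(moonObjects)
```

A numeric variable named, say, countOfMoons would slip through this filter as well, which is why a check on the object itself is the more reliable approach.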

Putting the Pieces Together

Thankfully, R’s exceptional fluidity with data manipulation makes it fairly easy to get around the minor limitations of ls. We simply need to loop over the information provided by ls with our selective criteria in mind. Take a look at the following code to see that concept in action.

# Numeric variables that we want to filter out.
varOne <- 1
varTwo <- 2
varThree <- 3
varFour <- 4
varFive <- 5

jupitersMoons <- data.frame(
  moonName = c("Io", "Europa", "Ganymede", "Callisto"),
  moonDistance = c(421000, 671000, 1070000, 1880000)
)

marsMoons <- data.frame(
  moonName = c("Phobos", "Deimos"),
  moonDistance = c(9378, 23463)
)

# Names of every object in the current environment.
ourObjects <- ls()
dataFrameNames <- list()
for (obj in ourObjects) {
  # Look up the object bound to this name.
  objectInstance <- get(obj)
  # Keep the name only if the object is a data frame.
  if (any(class(objectInstance) == "data.frame")) {
    dataFrameNames <- c(dataFrameNames, obj)
  }
}
print(dataFrameNames)

This is obviously a lot more complicated than simply calling ls with a single line of code. But it’s simpler than it might appear at first glance. The initial code proceeds just as it did in the previous example. We create a number of variables, including the planet-based data frames and our numeric items.

Next, we jump back into the same ls command that we’d previously used. However, this time around we’re assigning the results of ls to a variable, ourObjects. As the name suggests, it holds the names of all of the objects currently in our R workspace. Unfortunately, that also includes the numeric variables that we don’t care about. So we need to narrow the list down to only the names of our data frames. And that’s exactly what we do on the following lines.

We begin by looping over ourObjects, with obj holding one object name per iteration. Because obj is only a name stored as a string, we start each iteration with a call to get() that uses obj as its argument. This retrieves the actual object bound to that name and stores it in objectInstance. The if condition that follows nests a few calls inside one another. But it’s relatively simple when we look at the individual components.

Working outward in our conditional logic we can see that the code is checking the class of objectInstance to see if it matches data.frame. In short, we’re checking if it’s a data frame. However, class() can return more than one class name for a single object, so the comparison can produce several TRUE or FALSE values, while if expects exactly one. This is where the any function comes in. It returns TRUE if at least one of those comparisons is TRUE, and that single value serves as our if conditional. If the conditional is true then our code will append the object’s name to dataFrameNames.
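
To see what any() contributes, it helps to look at an object with more than one class. The sketch below builds a hypothetical object, layeredFrame, that carries an extra made-up class on top of data.frame. It then compares the comparison-based check with base R’s inherits() and is.data.frame(), which answer the same question directly.

```r
# Hypothetical object with two classes, for illustration only.
layeredFrame <- structure(
  data.frame(x = 1:3),
  class = c("myFrame", "data.frame")
)

# The comparison yields one result per class name.
classMatches <- class(layeredFrame) == "data.frame"
print(classMatches)  # FALSE TRUE

# any() collapses those results into a single TRUE or FALSE.
viaAny <- any(classMatches)

# Base R helpers that perform the same check in one call.
viaInherits <- inherits(layeredFrame, "data.frame")
viaIsDataFrame <- is.data.frame(layeredFrame)
```

All three approaches agree here, so is.data.frame() could stand in for the any() expression if you prefer a shorter condition.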

Note that we could also grab the row names with row.names() at this point if we were interested in getting information about the frame’s rows. Another point to consider is that we don’t need to do anything special to iterate over any specific number of frames. Because we’re grabbing all active objects we’re automatically setting things up to work with multiple data frames.
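
As a rough sketch of that idea, the snippet below looks up each data frame by name and collects its row count, then pulls the row names of a single frame. The frameNames vector is written out by hand here so the example stands on its own.

```r
jupitersMoons <- data.frame(
  moonName = c("Io", "Europa", "Ganymede", "Callisto"),
  moonDistance = c(421000, 671000, 1070000, 1880000)
)

marsMoons <- data.frame(
  moonName = c("Phobos", "Deimos"),
  moonDistance = c(9378, 23463)
)

# Written out by hand for this sketch.
frameNames <- c("jupitersMoons", "marsMoons")

# Look up each frame by name and count its rows.
rowCounts <- sapply(frameNames, function(frameName) nrow(get(frameName)))
print(rowCounts)

# row.names() returns the row labels of a single frame.
marsRowNames <- row.names(marsMoons)
print(marsRowNames)
```

Since neither frame sets explicit row names, row.names() falls back to the default labels "1", "2", and so on.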

And in that final line, we’re able to see the fruits of our labors. The print statement outputs the contents of dataFrameNames. Because dataFrameNames is a list, the output is a two-element list containing the names jupitersMoons and marsMoons. Note that the variables which aren’t data frames are now absent from our printed data.
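
For comparison, the same filtering can be written without an explicit loop. This is a sketch of one alternative rather than a replacement for the code above. It uses base R’s Filter() and is.data.frame(), introduces a hypothetical helper called isFrameName, and returns a character vector instead of a list.

```r
varOne <- 1

jupitersMoons <- data.frame(
  moonName = c("Io", "Europa", "Ganymede", "Callisto"),
  moonDistance = c(421000, 671000, 1070000, 1880000)
)

marsMoons <- data.frame(
  moonName = c("Phobos", "Deimos"),
  moonDistance = c(9378, 23463)
)

# Capture the object names before defining the helper below.
objectNames <- ls()

# Hypothetical helper: TRUE when the named object is a data frame.
isFrameName <- function(objectName) is.data.frame(get(objectName))

# Keep only the names for which the helper returns TRUE.
dataFrameNames <- Filter(isFrameName, objectNames)
print(dataFrameNames)
```

Whether this reads better than the loop is a matter of taste. The loop makes each step explicit, while Filter() keeps the intent on a single line.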
