The ‘ggplot2 doesn’t know how to deal with data of class integer’ error typically arises from a mismatch between data types: the ggplot2 package expects a data frame but finds an integer instead. In theory, this is a fairly easy error to fix, since you only need to supply the right type of variable. However, things can get more complicated when it comes to tracking down why an integer is being passed in the first place. The sections below cover the most efficient ways to solve this problem.
Understanding the Plotting Functions
The ggplot2 R package is one of the big reasons why R is so popular in data science. The library simplifies advanced plot concepts into a system that you can use with just a few lines of R code and a function call. Along with R Markdown, ggplot2 allows you to create reports that can be understood at a glance, often to the point where even people outside the field can follow what’s going on. The library also makes it easy to switch between plot types. For example, you can swap one geom for another, using geom_point for a scatter plot or geom_histogram to show a distribution. People might not understand the specifics of a large study, but placing the data in a bar chart can convey the meaning better than many in-depth explanations. However, this power does come with a price.
As with many automated processes, it’s important to ensure the library has everything properly formatted. The ggplot2 package needs to know exactly what it’s dealing with before it’s able to work its magic. And that’s the crux of the ‘ggplot2 doesn’t know how to deal with data of class integer’ error. The package works with data formatted into a data frame. If you’re seeing the error, then ggplot2 has instead been handed an integer.
A Closer Look at ggplot2
It’s easier to see how this works by recreating the problem. Take a look at the following code.
library(ggplot2)
ourData <- 10L
ggplot(ourData)
We begin by importing ggplot2, then move on to create the ourData variable. Note that we use L to specify an integer. R normally creates a double by default when creating numerical values. Using the L ensures that we’re emulating the conditions which give rise to the integer error.
Next, we pass ourData to ggplot. This will produce the integer error. However, note that the exact wording will vary depending on the version of ggplot2 currently in use. Different versions of the library are slightly more or less verbose with how they report on data type incompatibility. But no matter the exact wording, it’ll always point out that an integer is being passed and that it needs different data types.
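Before digging further, it can help to confirm what ggplot2 is actually receiving. The following is a minimal sketch, assuming ggplot2 is installed. It checks the class of ourData and captures whatever error message the installed version produces, rather than assuming a specific wording.

```r
library(ggplot2)

ourData <- 10L

# Inspect the object before handing it to ggplot()
class(ourData)          # "integer"
is.data.frame(ourData)  # FALSE

# Capture the error instead of letting it stop the script.
# The message text depends on the installed ggplot2 version.
result <- tryCatch(
  ggplot(ourData),
  error = function(e) conditionMessage(e)
)
print(result)
```

Checking class() and is.data.frame() right before the plotting call is usually the quickest way to confirm that the data, not the plot code, is at fault.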
Fixing the Error
The error is extremely easy to fix if you have full control over your information. For example, here’s how you could easily fix the previous code.
library(ggplot2)
ourData <- 10L
ourDf <- data.frame(value = ourData)
ggplot(ourDf, aes(x = "Value", y = value)) + geom_bar(stat = "identity")
The first two lines remain unchanged. We once again define ourData as an integer. But this time around we take that integer and insert it into a data frame called ourDf. Next, we use ggplot to plot ourDf as a bar graph. Note that we specify stat = "identity" so that the height of the bar is taken directly from value. Without it, geom_bar would try to count rows instead of using the value of ourData.
As you can see, the error’s easy to fix when you can pin down the source of the problem. But that’s also where things can get a little more complicated. When you encounter the error, it’s generally not going to stem from such a straightforward source. It’s far more likely that a function somewhere upstream is misbehaving. For example, missing values from an imported source might leave a column unpopulated, and the function might then keep returning a bare integer instead of a data frame filled with those numbers.
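One common way a data frame can quietly turn into an integer is shown here as an illustration, not as something taken from the earlier example. Selecting a single column with [ drops the data frame structure by default. The names imported and readings are hypothetical.

```r
# Hypothetical imported data with a single integer column of interest
imported <- data.frame(id = 1:3, readings = c(4L, 7L, 2L))

# Selecting one column with [ , ] drops the data frame structure
dropped <- imported[, "readings"]
class(dropped)  # "integer"

# drop = FALSE keeps the result as a data frame
kept <- imported[, "readings", drop = FALSE]
class(kept)     # "data.frame"
```

Passing dropped to ggplot would trigger the integer error, while kept can be plotted as usual.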
In general, the most important part of debugging the error isn’t really fixing code – it’s tracking down the source of the problem. You’ll typically need to leapfrog from the ggplot2 call creating the error to prior sources. So, with the first example, you’d see ggplot(ourData) and want to look at where ourData originates. This will typically lead to another function. Again, this often stems from automated imports where your code is pulling information from an external source. It’s quite common to just assume that data’s going to come in a particular format. And that might be the case for quite some time.
But if the source of your information changes, things might break unexpectedly. That’s why it’s generally a good idea to add some level of preprocessing to your import functions. For example, you might force a column to be a categorical variable rather than a continuous one. Even simply setting up some form of verification and logging during the import stage can help prevent this type of error. One approach is to keep a pristine sample of the format you expect and compare it to the result of an import before passing that result to a ggplot2 call.
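The verification idea can be sketched roughly as follows. This is only one possible approach, written in base R. The helper check_import, the template data frame, and the column names group and value are all hypothetical names chosen for illustration.

```r
# Hypothetical helper: compare an imported object against a template
# data frame that has the column names and classes we expect.
check_import <- function(imported, template) {
  if (!is.data.frame(imported)) {
    stop("Expected a data frame but got class: ",
         paste(class(imported), collapse = ", "))
  }
  missing_cols <- setdiff(names(template), names(imported))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Compare the first class of each expected column
  expected <- vapply(template, function(col) class(col)[1], character(1))
  actual <- vapply(imported[names(template)],
                   function(col) class(col)[1], character(1))
  mismatched <- names(expected)[expected != actual]
  if (length(mismatched) > 0) {
    warning("Unexpected column classes: ",
            paste(mismatched, collapse = ", "))
  }
  invisible(imported)
}

# A zero-row template describing the expected format (assumed columns)
template <- data.frame(group = character(), value = integer(),
                       stringsAsFactors = FALSE)

good <- data.frame(group = c("a", "b"), value = c(3L, 5L),
                   stringsAsFactors = FALSE)
check_import(good, template)  # passes silently

# A bare integer is rejected before it ever reaches ggplot()
bad <- 10L
outcome <- tryCatch(check_import(bad, template),
                    error = function(e) conditionMessage(e))
print(outcome)
```

Running a check like this at the end of an import function means a format change surfaces at the import step, with a message naming the unexpected class, instead of later inside a plotting call.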
This is similar in a sense to best practice with ggplot2 output. You’ll often want to tidy up a plot, for instance by adding a little random noise with geom_jitter so that overlapping points spread out, or by overlaying a trend line with geom_smooth. Both techniques make the points on a graph easier to read. Likewise, it’s a good idea to ensure that the data is properly placed into a data frame before working with it in any other way, whether that’s through ggplot2 or another library that requires data frames.