Bar Charts In R - geom_col vs geom_bar

The ease with which R can create visualizations has made it an immensely popular language in data science. You can take a complex data set and turn it into a bar chart that anyone can understand at a glance. However, there is often more than one way to do it. In the ggplot2 package, bar charts are usually drawn with one of two functions, geom_col and geom_bar, and each has its own strengths. geom_bar is generally used to plot counts of observations, while geom_col is used for pre-aggregated data where you already have the value for your y-axis. But there's a lot more to both approaches.

A General Overview of the Functions

Both geom_bar and geom_col are part of the ggplot2 package, and both are used to create bar plots. There's a lot of overlap between the two, but their individual strengths make each better suited to specific tasks. geom_bar is the more complex of the two: by default it applies a statistical transformation (stat_count) that counts the rows for each x value and uses those counts as the bar heights. That makes it a good fit when you're summarizing raw data and plotting it in the same step.
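The counting step can be seen directly with a minimal sketch, assuming a small hypothetical data frame (rawSamples) where each row stands for one sample. ggplot2's layer_data() returns the data computed for a layer, so it exposes the count that geom_bar's default stat produced before anything is drawn.

```r
library(ggplot2)

# Hypothetical raw data: one row per sample, tagged with the drone that took it
rawSamples <- data.frame(drone = c("A", "A", "B", "C", "C", "C"))

# No y mapping: geom_bar() counts the rows for each x value itself
p <- ggplot(rawSamples, aes(x = drone)) + geom_bar()

# Inspect the computed layer data; the count column holds the bar heights
computed <- layer_data(p)
print(computed[, c("x", "count", "y")])
```

Nothing in rawSamples says "2" or "3"; those numbers come entirely from stat_count.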

geom_col is the other, closely related, bar function in ggplot2. You'll generally use it when you've already computed your summaries and just need a graphical representation, so no additional counting or transformation is needed (its default stat is "identity", which leaves the values unchanged). You supply both an x and a y value, and the height of each bar equals the y value you supplied: the higher the value, the taller the bar.
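As a rough sketch of that workflow, assuming hypothetical per-sample readings with a made-up mass column, you might summarize first with base R's aggregate() and then hand the totals to geom_col. The bar heights come straight from the summarized column.

```r
library(ggplot2)

# Hypothetical per-sample readings; several rows per drone
readings <- data.frame(
  drone = c("A", "A", "B", "C", "C"),
  mass  = c(2.0, 3.0, 4.5, 1.0, 1.5)
)

# Summarise first: total sample mass per drone (one row per drone)
totals <- aggregate(mass ~ drone, data = readings, FUN = sum)
print(totals)

# geom_col() draws each bar at exactly the value in the y column
p <- ggplot(totals, aes(x = drone, y = mass)) + geom_col()
heights <- layer_data(p)$y
```

Any summary function would work in place of sum; geom_col only cares that the y column already holds the final values.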

The difference between the two is easiest to see in practice. With that in mind, try running the following code.

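library(ggplot2)

# Pre-aggregated data: one row per drone with its total sample count
ourData <- data.frame(drone = c("A", "B", "C", "D"), samples = c(100, 110, 150, 105))
# geom_col() draws each bar at the height given in the y column
ggplot(ourData, aes(x = drone, y = samples)) + geom_col() + labs(title = "Samples by Drone", x = "Drone", y = "Samples")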

In this example, we create a data frame called ourData and fill it with information about a hypothetical drone mission to collect geological samples. It contains two variables: drone and samples. We then create a ggplot object with the ggplot() function from ggplot2, passing in ourData along with the aesthetic mapping (aes) that maps drone to the x-axis and samples to the y-axis. The geom_col() layer is then added with +. Note that geom_col() needs no arguments of its own here, because it inherits the x and y mapping from the ggplot() call.
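Because geom_col uses the y values as given, it has no way to invent them. The sketch below deliberately leaves y out of the mapping to show that; the exact wording of the resulting error varies between ggplot2 versions, so the code only captures the message rather than assuming its text.

```r
library(ggplot2)

ourData <- data.frame(drone = c("A", "B", "C", "D"), samples = c(100, 110, 150, 105))

# geom_col() needs both x and y; here y is deliberately left out
p <- ggplot(ourData, aes(x = drone)) + geom_col()

# The error only surfaces when the plot is built (printing it or calling layer_data())
result <- tryCatch(layer_data(p), error = function(e) conditionMessage(e))
print(result)
```

Creating p succeeds; it's only the build step that complains about the missing y aesthetic.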

Finally, we add labs(), which sets the plot title and axis labels. labs() isn't an aesthetic and isn't required; if you leave it out, ggplot2 simply uses the variable names (here, drone and samples) as the axis titles. In practice, though, it's best to make sure your charts are properly labeled. It's similar to explaining your metrics in a brief report about an ongoing study: you don't strictly have to, but leaving them out usually leads to mistaken assumptions somewhere down the line. So while nothing forces you to use labs(), it's a good habit to develop.

So how would geom_bar change the process? Take a look at the following code.

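library(ggplot2)

# Raw data: one row per sample, recording which drone collected it
ourData <- data.frame(drone = c("A", "A", "B", "C", "C", "C", "D", "D", "D", "D"))
# Only x is mapped; geom_bar() counts the rows for each drone to get the bar heights
ggplot(ourData, aes(x = drone)) + geom_bar() + labs(title = "Number of Samples by Drone", x = "Drone", y = "Result")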

We use a fairly similar procedure here. The code begins by once again loading ggplot2. We then create ourData again, but this time it holds only a drone column, with one row per sample recording which drone collected it. We then create a new ggplot object and feed ourData into it.
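Since each row is one sample, counting rows per drone gives the number of samples per drone. A small sketch comparing the two routes: letting geom_bar count, versus counting first with base R's table() and plotting the result with geom_col. Both should produce the same bar heights.

```r
library(ggplot2)

ourData <- data.frame(drone = c("A", "A", "B", "C", "C", "C", "D", "D", "D", "D"))

# Option 1: let geom_bar() count the rows
barHeights <- layer_data(ggplot(ourData, aes(x = drone)) + geom_bar())$y

# Option 2: count first with table(), then plot the totals with geom_col()
counts <- as.data.frame(table(drone = ourData$drone))  # columns: drone, Freq
print(counts)
colHeights <- layer_data(ggplot(counts, aes(x = drone, y = Freq)) + geom_col())$y
```

Which route to take mostly depends on whether you need the counts elsewhere (in a table or report) or only on the chart.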

This time aes maps drone to the x-axis only; there's no y mapping, because geom_bar works out the y values itself. Next, geom_bar() is added to the ggplot object along with the labs information. Note that while the y-axis is labeled in labs, it's never mapped to a variable, and yet we get no warnings about missing values. This is handled by the geom_bar call and is one of the major differences between geom_bar and geom_col: geom_bar's default stat, stat_count, automatically calculates the number of cases at each x level and uses that count as the bar height, so you don't need to supply a count yourself. More broadly, ggplot2 fills in sensible defaults for the stat, position adjustment, bar width, orientation, and other elements, so you rarely have to define these properties explicitly.
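The implicit y mapping can also be written out. This sketch assumes ggplot2 3.3.0 or later, where after_stat() refers to variables computed by the stat. Spelling out y = after_stat(count) reproduces the default, and swapping in after_stat(prop) (with group = 1 so all bars share one group) shows each drone's share of the samples instead.

```r
library(ggplot2)

ourData <- data.frame(drone = c("A", "A", "B", "C", "C", "C", "D", "D", "D", "D"))

# Spelling out the default: y is the count computed by stat_count()
pCount <- ggplot(ourData, aes(x = drone, y = after_stat(count))) + geom_bar()

# Same stat, different computed variable: share of all samples per drone.
# group = 1 puts every bar in one group so the proportions sum to 1.
pProp <- ggplot(ourData, aes(x = drone, y = after_stat(prop), group = 1)) + geom_bar()

countHeights <- layer_data(pCount)$y
propHeights <- layer_data(pProp)$y
```

Without group = 1, each drone would be its own group and every proportion would come out as 1.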

Looking a Little Deeper Into Their Functionality

The functionality can also be taken a little further. Take a look at the following code.

library(ggplot2)

ourData <- data.frame(planet = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"), moons = c(0, 0, 1, 2, 79, 53, 27, 14))

# stat = "identity" makes geom_bar() use the moons values as given instead of counting rows
ggplot(ourData, aes(x = planet, y = moons, fill = planet)) + geom_bar(stat = "identity") + labs(title = "Moons per Planet", x = "Planet", y = "Number of Moons") + theme_minimal()

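In this example, mapping planet to the fill aesthetic gives each planet's bar its own colour and adds an easy-to-read legend, while theme_minimal() applies a clean theme without the default grey panel background. Note the stat = "identity" argument: it tells geom_bar to use the moons values as the bar heights instead of counting rows.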
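One side effect worth knowing: ggplot2 orders a character x variable alphabetically, so the chart above runs Earth, Jupiter, Mars and so on. A minimal sketch of one common fix, assuming you want orbital order, is to turn planet into a factor whose levels are listed in the order you want the bars to appear.

```r
library(ggplot2)

planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
ourData <- data.frame(
  # Fixing the factor levels keeps the bars in orbital order instead of alphabetical
  planet = factor(planets, levels = planets),
  moons = c(0, 0, 1, 2, 79, 53, 27, 14)
)

p <- ggplot(ourData, aes(x = planet, y = moons, fill = planet)) +
  geom_bar(stat = "identity") +
  labs(title = "Moons per Planet", x = "Planet", y = "Number of Moons") +
  theme_minimal()

ld <- layer_data(p)
# Bar heights read left to right along the x-axis
heightsInAxisOrder <- ld$y[order(unclass(ld$x))]
```

The legend follows the same level order, so it stays consistent with the axis.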

We could do the same thing with geom_col, which needs no stat argument because it uses the values as given by default:

library(ggplot2)

ourData <- data.frame(planet = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"), moons = c(0, 0, 1, 2, 79, 53, 27, 14))

# geom_col() uses stat = "identity" by default, so no stat argument is needed
ggplot(ourData, aes(x = planet, y = moons, fill = planet)) + geom_col() + labs(title = "Moons per Planet", x = "Planet", y = "Number of Moons") + theme_minimal()
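To check that the two versions really draw the same bars, one option is to build both from a shared base object and compare what layer_data() reports. The sketch below compares bar heights and widths only (the names base, viaBar, and viaCol are just illustrative), since those determine what ends up on screen.

```r
library(ggplot2)

ourData <- data.frame(
  planet = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"),
  moons = c(0, 0, 1, 2, 79, 53, 27, 14)
)

# Shared base object; only the geom layer differs
base <- ggplot(ourData, aes(x = planet, y = moons, fill = planet))
viaBar <- layer_data(base + geom_bar(stat = "identity"))
viaCol <- layer_data(base + geom_col())

# Strip ggplot2's internal vector classes so plain numeric comparisons work
num <- function(v) as.numeric(unclass(v))

sameHeights <- isTRUE(all.equal(num(viaBar$ymax), num(viaCol$ymax)))
sameWidths <- isTRUE(all.equal(num(viaBar$xmax) - num(viaBar$xmin),
                               num(viaCol$xmax) - num(viaCol$xmin)))
print(c(sameHeights = sameHeights, sameWidths = sameWidths))
```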

The important point to take note of in this example is that geom_col and geom_bar are just drop-in layers within the larger ggplot object. You should always think of the ggplot object as the main focal point when building a bar plot with either one.
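Because the bar geom is just one layer, other layers can be stacked on top of it. As an illustration, this sketch adds a geom_text layer that reuses the same count statistic (stat = "count") to print each bar's height above it; the vjust value is simply a reasonable choice for nudging the labels upward.

```r
library(ggplot2)

ourData <- data.frame(drone = c("A", "A", "B", "C", "C", "C", "D", "D", "D", "D"))

p <- ggplot(ourData, aes(x = drone)) +
  geom_bar() +
  # A second layer reuses the same count statistic to print each bar's height
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = -0.5) +
  labs(title = "Number of Samples by Drone", x = "Drone", y = "Result")

# Layer 2 is the text layer; its label column holds the computed counts
labelLayer <- layer_data(p, i = 2)
print(labelLayer[, c("x", "label")])
```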

Because of these similarities, you can generally apply the same strategies to each. For example, you can add facet_wrap() to the larger ggplot object with either function in order to split the plot into multiple panels. The same goes for changing the bars' position adjustment (for example, position = "dodge" to place grouped bars side by side) or adding further layers. The two functions provide different takes on a similar concept, but on a fundamental level they're both elements of ggplot2 and work within that greater context.
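A short sketch of both ideas with geom_col, assuming a hypothetical missions data frame that records samples per drone on two missions. The first plot splits the missions into panels with facet_wrap(); the second keeps one panel and uses position = "dodge" so each drone's two bars sit side by side. Swapping geom_col() for geom_bar(stat = "identity") should behave the same way.

```r
library(ggplot2)

# Hypothetical data: samples per drone on two separate missions
missions <- data.frame(
  drone   = rep(c("A", "B", "C"), times = 2),
  mission = rep(c("Mission 1", "Mission 2"), each = 3),
  samples = c(100, 110, 150, 90, 120, 130)
)

# Same geom_col() layer, split into one panel per mission
faceted <- ggplot(missions, aes(x = drone, y = samples)) +
  geom_col() +
  facet_wrap(~ mission)

# One panel, with each drone's two mission bars placed side by side
dodged <- ggplot(missions, aes(x = drone, y = samples, fill = mission)) +
  geom_col(position = "dodge")

panels <- unique(as.character(layer_data(faceted)$PANEL))
dodgedData <- layer_data(dodged)
# Dodging splits the default 0.9 bar width between the two missions
barWidths <- as.numeric(unclass(dodgedData$xmax)) - as.numeric(unclass(dodgedData$xmin))
```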