Let’s talk themes. Often, in data science, we are heavily focused on the underlying numerical and categorical values of the data we work with. We strive diligently to collect it, process it, explore it, analyze it, and so on, until eventually we arrive at the final step: communicating it. By the time the data has been wrangled and lugged through the “data science pipeline”, we ourselves feel no less wrangled. Who, after all this, is going to put in the effort to communicate the data effectively? This thought pops into our minds on no rare occasion, especially for those coming into data science from a mathematical or statistical background, or even better, engineering. We’ve done the work, we’ve put the effort in. “Who cares how effectively it’s conveyed; the data is all there!” we might say to ourselves.
Now, this is hardly the place for a treatise on the importance of effective visual communication in data science. Those interested may check out the books and lectures of former Google employee Cole Nussbaumer Knaflic, who has of late dedicated her career to this very purpose (see: Storytelling with Data: A Data Visualization Guide for Business Professionals). Instead, we turn to those already convinced or, at the very least, aware to some extent of the importance we should be giving to this last step of the pipeline. We will present a powerful tool for customizing our plots in R, so that we can give our data the visualizations they deserve.
Themes allow us to customize all the non-data components of our plots. We are able to tweak and adjust the colors and fonts, edit the appearance of titles and labels, and change the background and gridlines. Every one of these components, plus many others (e.g. legends), is accessible and ready to be molded and sculpted like dough in our hands. You may well have been working with ggplot2 for some time without ever realizing that themes or customizable components exist. This is because ggplot2 provides a default theme that is applied to all our plots. This default theme may be accessed and added to, or removed completely and replaced by a new one.
Themes: Default themes
ggplot2 supplies and applies a default theme for its users, sparing them from premature, trivial customization of non-data components. Let us see what this looks like:
library(ggplot2)
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
Note that assigning our plot to a variable, in this case p, will help us grasp ggplot2’s concept of adding customization onto existing plots. Since we now have a reference to our plot in p, any subsequent customization may be added with ease using this reference. Effectively, this means that when first creating the plot, we do not have to ensure that it is rendered the way we will eventually want it rendered. Upon displaying the plot (by typing p at the console), we immediately recognize the signature ggplot2 theme with its grey background and white gridlines. In order to view the definitions and attributes of this theme, you may run the following:
theme_get()
If you ran the above code, you will likely see a large dump of information in front of you. For those inclined to self-study, this should be enough to get you pretty far. Here, you may observe all the attributes available for customization within ggplot2 themes. The list is too long to publish in this article; however, a simple perusal is enough to notice that the names are descriptive and easily understandable.
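Since the full printout is long, it can help to look at the theme piece by piece. The following is a minimal sketch, assuming the default theme is active and that the theme object can be indexed by component name like a list; the variable name current is only for illustration.

```r
library(ggplot2)

# Capture the currently active theme as an object instead of printing it
current <- theme_get()

# Show only the first few component names rather than the whole dump
head(names(current))

# Inspect a single component, here the settings for the plot title
current$plot.title
```

Looking up one component at a time this way makes it easier to see which settings exist for it before changing any of them.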
Along with the default theme provided by ggplot2, there is a short list of other complete themes available to ggplot2 users. These are: theme_gray(), theme_bw(), theme_linedraw(), theme_light(), theme_dark(), theme_minimal(), theme_classic(), theme_void(), theme_test(). The names are self-explanatory, or at the very least allow one to guess at what the result may resemble (Note: theme_gray() is the signature theme that is applied to all plots in ggplot2 by default, the one we saw above). Do not forget, however, that further customization may subsequently be added onto any one of these themes. To apply one of these themes to your plot, you may either create the plot anew:
p_bw <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point() + theme_bw()
Or add it onto our previous plot:
p + theme_bw()
Or set the theme for the whole session, overriding the default that we see when we run theme_get():
theme_set(theme_bw())
The first two examples showcase the two primary methods we will use when customizing our plots. Any customization we want to apply may be added to the plot while we are creating it. Or we can add it afterwards, as long as we have saved a reference to our plot in an R variable. We could also do both, adding some customization during creation and some after. By letting us customize our plot whenever we want, ggplot2 gives us flexibility and room to breathe when we are exploring data and not yet sure how we want the visuals to look. For the rest of the tutorial, we will only be using the second method, if only for the sake of not repeating the same line every time.
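Setting a session theme affects every plot drawn afterwards, so it is often useful to be able to undo it. The sketch below relies on theme_set() invisibly returning the theme that was active before the call; the name old_theme is hypothetical.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

# theme_set() returns the previously active theme, so keep a reference to it
old_theme <- theme_set(theme_bw())

# p has no theme of its own, so it is now drawn with theme_bw()
p

# Restore the theme that was active before the change
theme_set(old_theme)
```

After the last line, plots without an explicit theme are drawn with the original default again.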
We have now seen how theme_bw() looks and can ascertain that “bw” stands for black and white. Below are some examples of the other themes listed above:
p + theme_linedraw()
p + theme_light()
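The complete themes are ordinary functions and accept a few arguments of their own, such as base_size for the overall text size. The following is a small sketch; the value 14 and the name p_large are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

# base_size sets the base font size from which the theme's text sizes derive
p_large <- p + theme_minimal(base_size = 14)
p_large
```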
Before moving on to how we can customize the individual components available to us in the themes, let us create some components we would like to customize. We reassign the result to p so that the titles are kept for the steps that follow.
p <- p + labs(title = "Iris Dataset, Sepal Length vs Sepal Width", x = "Sepal Length", y = "Sepal Width", caption = "Rendered in R using ggplot2")
With plot and axis titles added, we may now start using theme() to customize these components. theme() gives us access to all the attributes we previously saw dumped in the aftermath of theme_get(). These are the attributes that control all the details of the components, from font size and type, to legend position and color. Most of these arguments expect one of four element functions: element_text(), element_line(), element_rect(), element_blank(). A few, such as legend.position, take plain values instead.
If the component we are trying to customize with theme() is textual in nature (e.g. titles and captions), we must provide it an element_text(). If the component is line based (e.g. axis and grid lines), we pass an element_line() to the argument that modifies it. If it is rectangular in nature (e.g. the plot background), we pass an element_rect(). And finally, if we would like to turn off displaying a component altogether, we pass element_blank().
Since there are far too many arguments available for modification in theme(), and likewise too many options we can set in each element function, this tutorial would be tedious and tiresome if we were to include all of them. Therefore, we will showcase a complete example of how to modify a single component by passing an element function to theme(). From there, you should be able to modify whatever you want by first looking up which components you have access to (see: ?theme) and which options are available for modification (see: ?element_text, which documents all four element functions).
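To make the four element types concrete, here is a minimal sketch that uses one of each in a single theme() call. The particular components and values are arbitrary choices for illustration, and p_elements is a hypothetical name.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

p_elements <- p + theme(
  axis.title       = element_text(face = "italic"),    # text component
  panel.grid.major = element_line(colour = "grey80"),  # line component
  panel.background = element_rect(fill = "white"),     # rectangular component
  panel.grid.minor = element_blank()                   # component switched off
)
p_elements
```

Each argument names a component, and the element function on the right-hand side matches the kind of component it is.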
The example will aim to customize the plot title. Since a title is a textual component, we will need an element_text() to modify it. To reduce clutter, let us first define this and assign a reference to it for use in theme().
t_c <- element_text(size = 20, face = "bold", color = "navy", hjust = 0.5)
Here, you can see some of the options available to us for modification. Each element function has its own list of options. This example only showcases four options of the text element: size, face, color, and hjust. hjust is the horizontal justification of the text, a value between 0 (left-aligned) and 1 (right-aligned). Setting it to 0.5 centers the title. Let us now pass this to theme():
p + theme(plot.title = t_c)
In theme(), we find the plot.title argument, which is used to modify the title component of “plot”. Likewise, you may modify other components of “plot” (e.g. plot.background, plot.caption) or the title component of other parts of the plot (e.g. legend.title). By passing t_c to plot.title in theme(), and subsequently adding this to our plot (p), we have modified the title of the plot as per our wishes. This should give you the foundation you need to further customize your graphs and use ggplot2’s capabilities to their fullest.
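The same pattern extends to the other components mentioned above. The sketch below is one possible illustration: it maps colour to Species, which is not part of the earlier plot, so that a legend exists to customize, and the names p_species and p_styled are hypothetical.

```r
library(ggplot2)

# A variant of the earlier plot with a colour mapping, so a legend is drawn
p_species <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  labs(title = "Iris Dataset, Sepal Length vs Sepal Width",
       caption = "Rendered in R using ggplot2")

p_styled <- p_species + theme(
  plot.title      = element_text(size = 20, face = "bold", colour = "navy", hjust = 0.5),
  plot.caption    = element_text(face = "italic", hjust = 0),  # left-aligned caption
  legend.title    = element_text(face = "bold"),
  legend.position = "bottom"  # takes a plain value, not an element function
)
p_styled
```

Note that legend.position is one of the arguments that accepts a plain value rather than an element function.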