Often, as data scientists, we get carried away plowing through data on our data exploration expedition. We try all sorts of analyses and plots and immerse ourselves completely in the data. For that short period of time in which we are buried under thousands, sometimes millions, of rows of data, we become one with the data. We no longer need to keep referring back to variable names. No longer are we in need of column names. We spend so much time devouring our datasets that they become more familiar to us than the backs of our hands, the very hands that are hurriedly typing away just out of sight, below, on our keyboards.
Legends are crucial. The first thing any reasonable person does when looking at a graph or plot is search for the axis titles and legends. Every variable in a plot that is not explained by an axis title should be properly labeled in a legend.
And thus, when it comes to presenting our data, we often forget one of the most crucial components. We have gained such an intuitive grasp of the variables we are plotting that we forget the only reason we can decipher the plot is the hours or days of complete immersion we spent in the data. But how can someone who is looking at this plot for the first time, with no prior knowledge of the dataset being presented, see the plot as you do? How are they to discern what the different lines, dots, or colors represent? This is where legends come in. Rest assured, ggplot2 gives you comprehensive control over legend creation and modification.
But where is my legend?
At this point in the series, you should all be familiar with ggplot2. Though we will continue to learn and explore this powerful package, we are all able to import data and plot it using ggplot2. When we assign the axes to variables in our dataset, we use the aes() function inside the ggplot() call. Often, we only need to plot two variables, x and y. When a plot contains only x and y, and both are plotted as distances on the graph, there is no need for a legend, because all the information needed to decode their meaning is already in the axis titles. ggplot2 therefore does not draw a legend in this case, so it is quite natural that you may have used ggplot2 many times without ever seeing one.
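A minimal sketch of that default, assuming ggplot2 is installed and using R's built-in iris data: only the positional aesthetics x and y are mapped, so ggplot2 has nothing to put in a legend and draws none.

```r
library(ggplot2)

# Only x and y are mapped, and both are positional aesthetics,
# so the axis titles already explain everything and no legend is drawn.
p_xy <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# The mapping contains only the two positional aesthetics.
names(p_xy$mapping)

p_xy
```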
A legend becomes important when a third variable is introduced. Since ggplot2 doesn't support three-dimensional plots, the third variable cannot be mapped to a distance in the plot. Instead, we map it to another aesthetic. Common choices in ggplot2 include color, shape, size, and alpha. Color and shape are self-explanatory, adjusting the color and shape of the plotted points; size scales the points, and alpha adjusts their transparency. Since these aesthetics do not belong to an axis, they require some other method of labeling. This is best done with a legend, which is why ggplot2 automatically adds one whenever a variable is mapped to a non-positional aesthetic such as color.
library(ggplot2)
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) + geom_point()
Here, you can see that the plot comes with a legend on the right-hand side. This legend labels the third aesthetic, color. We assigned this aesthetic to the Species variable in the iris dataset, and now, using the legend, we can easily find the data points that belong to each species.
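The same logic extends to more than one extra variable. In this sketch (again assuming ggplot2 and the built-in iris data), Species is mapped to color and Petal.Width to alpha; each non-positional mapping gets its own legend, a discrete one for the species colors and a separate one for the transparency levels.

```r
library(ggplot2)

# Map Species to colour and Petal.Width to alpha.
# ggplot2 draws one legend per non-positional mapping:
# a colour legend for Species and an alpha legend for Petal.Width.
p_multi <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                            color = Species, alpha = Petal.Width)) +
  geom_point()

p_multi
```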
ggplot2 Legends: Title
Now, since we have a plot and it comes with a legend by default, any modification we would like to make to the legend can simply be added on to our initial plot (see: Article 1 on how ggplot2 works). Our third variable happens to be properly named, and thus the legend ends up with a suitable title. However, not all datasets come with properly labeled variables. Therefore, one of the first things you might want to do when you plot a dataset with a legend is fix the legend's title. We can do this with the labs() function:
p + labs(color = "Legend: Species")
This function lets you change the title of each legend by naming the aesthetic it corresponds to. In our case, we only used the additional color aesthetic. However, if we had also used the shape aesthetic, for example, we could change the title of the corresponding legend by adding a shape argument to labs().
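A short sketch of that case, assuming ggplot2 and the iris data: Species is mapped to both color and shape, and labs() sets a title for each. When two legends end up with the same title and the same labels, ggplot2 combines them into a single legend showing colored shapes; giving them different titles would produce two separate legends.

```r
library(ggplot2)

# Species drives both colour and shape. Using the same title for both
# aesthetics lets ggplot2 merge them into one combined legend.
p_shape <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                            color = Species, shape = Species)) +
  geom_point() +
  labs(color = "Legend: Species", shape = "Legend: Species")

p_shape
```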
You should all be familiar with the theme() function by now. For those who are not, go read our previous article on themes in ggplot2. Legends in ggplot2 are modified with the help of this function. Let us recall what theme() is and how it is used. Remember that theme() gives us access to a large number of attributes that we can modify. In our previous tutorial, we used theme() with the attribute plot.title to modify the title of our plot. This is a good example of the naming convention used for the attributes: the name of the component we would like to modify is followed by a dot and the name of the attribute that belongs to that component. You will still need to check the reference documentation to make sure a particular component-attribute combination exists.
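To see the naming convention side by side, here is a sketch (assuming ggplot2 and the iris data; the title text "Iris sepals" is just an illustration) that sets plot.title and legend.title in the same theme() call. Both are text elements, so both take element_text(); only the component before the dot changes.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  ggtitle("Iris sepals")

# plot.title:   component "plot",   attribute "title"
# legend.title: component "legend", attribute "title"
p_named <- p + theme(
  plot.title   = element_text(face = "bold", size = 16),
  legend.title = element_text(face = "italic")
)

p_named
```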
In our case, we are only concerned with the legend. Therefore, all the arguments we will be using in theme() will be of the form legend.* or legend.*.*. Let us see this in a simple example:
p + theme(legend.position = "none")
You will see that this removes the legend from our plot. We set the position attribute of the legend component of our “theme”, or plot, to “none”. In this way, we were able to remove the legend completely from our plot.
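Setting legend.position to "none" hides every legend at once. When a plot has several legends and you only want to drop one, ggplot2's guides() function can do it per aesthetic. The sketch below assumes the two-legend plot from earlier (Species on color, Petal.Width on alpha) and contrasts both approaches.

```r
library(ggplot2)

p_multi <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                            color = Species, alpha = Petal.Width)) +
  geom_point()

# Hides all legends: the theme applies to the whole plot.
p_none <- p_multi + theme(legend.position = "none")

# Hides only the alpha legend; the colour legend for Species stays.
p_colour_only <- p_multi + guides(alpha = "none")

p_colour_only
```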
ggplot2 Legends: Position
With titles out of the way, the second most sought-after modification for legends is positioning. The default position of the legend, set by ggplot2, might not always be the most satisfactory. This can easily be changed with “legend.position”. Let us look at a couple of examples:
p + theme(legend.position = "top")
p + theme(legend.position = "bottom")
This moves the legend to the top or the bottom of the plot. You may also use “left” and “right” to move the legend to the left or right side of the plot. If you want more precise control over legend placement, you may pass coordinates to legend.position in the form c(x, y). These coordinates are given as fractions between 0 and 1: c(0, 0) corresponds to the bottom left of the plot, while c(1, 1) corresponds to the top right.
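If you want to compare the keyword positions before settling on one, a small loop helps. This sketch (assuming ggplot2 and the iris data) builds one variant of the plot per position and stores them in a named list, so you can print whichever one you want to inspect.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

# One variant of the plot for each keyword position.
positions <- c("top", "bottom", "left", "right")
plots <- lapply(positions, function(pos) p + theme(legend.position = pos))
names(plots) <- positions

# Print any variant to compare, for example:
plots$left
```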
p + theme( legend.position = c(0.1,0.8) )
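Note that newer ggplot2 releases handle this differently: starting with ggplot2 3.5.0, passing a numeric vector to legend.position is deprecated, and the legend is instead declared "inside" with the coordinates supplied through legend.position.inside. The sketch below assumes ggplot2 3.5.0 or later; legend.justification additionally picks which point of the legend box is anchored at those coordinates (c(0, 1) means its top-left corner).

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

# Assumes ggplot2 >= 3.5.0: place the legend inside the plot area,
# with the legend's top-left corner at (0.1, 0.8).
p_inside <- p + theme(
  legend.position        = "inside",
  legend.position.inside = c(0.1, 0.8),
  legend.justification   = c(0, 1)
)

p_inside
```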
ggplot2 Legends: Other Customization
Some other things you might want to customize are the colors of the legend itself. If you would like to customize the colors and borders of the keys within the legend, you can use legend.key. Since the keys are rectangles, you will need to pass an element_rect() to legend.key (see: article on ggplot2 themes).
p + theme(legend.key = element_rect(fill = "grey", color = "purple"))
Fill adjusts the fill of the rectangles making up the keys in the legends whereas color adjusts the color of the border.
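Besides their colors, the keys can also be resized. This sketch (assuming ggplot2 and the iris data) combines the legend.key styling above with legend.key.size, which takes a grid unit; ggplot2 re-exports unit(), so no extra package is loaded. The value of 1 cm is only an example.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

# legend.key styles the key rectangles; legend.key.size sets their size.
p_keys <- p + theme(
  legend.key      = element_rect(fill = "grey", color = "purple"),
  legend.key.size = unit(1, "cm")
)

p_keys
```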
If you would instead like to play with the rectangle making up the whole legend, you can adjust the color and thickness of the border with legend.box.background as follows:
p + theme(legend.box.background = element_rect(color = "green", linewidth = 5))
linewidth sets the thickness of the legend's border, while color sets its color (in ggplot2 versions before 3.4.0, the thickness argument was called size). You can also change the margins around the legend with legend.box.margin:
p + theme(legend.box.background = element_rect(color = "green", linewidth = 5),
          legend.box.margin = margin(100, 50, 25, 10))
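The four numbers passed to margin() are read in the order top, right, bottom, left, and are measured in points unless you give another unit. A quick sketch, assuming ggplot2, that builds the margin used above and an alternative with equal 5 mm spacing on every side:

```r
library(ggplot2)

# Order is top, right, bottom, left; default unit is points ("pt").
m_pt <- margin(100, 50, 25, 10)

# Same spacing on all four sides, in millimetres instead of points.
m_mm <- margin(5, 5, 5, 5, unit = "mm")

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

p_margin <- p + theme(legend.box.margin = m_mm)

p_margin
```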
Finally, you can customize the text of the key labels and the legend title with legend.text and legend.title, respectively. Since these are text-based attributes, you need to pass an element_text() to these arguments.
p + theme(legend.text = element_text(size = 20, color = "blue"),
          legend.title = element_text(face = "bold"))
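Because theme() returns an object that can be added to any plot, legend settings you like can be bundled into a small function and reused. The helper below, legend_style(), is hypothetical (it is not part of ggplot2) and simply wraps the legend.text and legend.title settings from this section with adjustable text size and color.

```r
library(ggplot2)

# Hypothetical helper, not part of ggplot2: packages the legend text
# settings so the same look can be applied to several plots.
legend_style <- function(text_size = 20, text_color = "blue") {
  theme(
    legend.text  = element_text(size = text_size, color = text_color),
    legend.title = element_text(face = "bold")
  )
}

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

p_styled <- p + legend_style(text_size = 14)

p_styled
```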