It is often much easier to see patterns in data when the data is presented as a graph rather than as a string of numbers. There are numerous types of graphs, each suited to showing different relationships and patterns. The boxplot is a graph that shows more than just where the values are: it also summarizes how they are spread out.
Side by side boxplot
A side by side boxplot gives the viewer an easy-to-see comparison between data set features. These features include the maximum, minimum, range, center, quartiles, interquartile range, overall spread, and skewness. It can show the relationships among the data points of a single data set or between two or more related data sets. The form of this type of graph is a box showing the quartiles, with lines (the whiskers) showing the rest of the range of the data set. When used to compare related data sets, the visual comparison can speak volumes.
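The numbers behind those features can be computed directly. The following is a minimal sketch using base R's fivenum() function; the scores vector is made up purely for illustration. It returns the minimum, lower hinge, median, upper hinge, and maximum. The hinges are the values the box is drawn from, and for most samples they are close to the quartiles.

```r
# Example values (made up for illustration only)
scores <- c(2, 4, 4, 5, 7, 8, 9, 12, 15)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(scores)
names(five) <- c("min", "lower", "median", "upper", "max")
print(five)

# Range of the data and width of the box (distance between the hinges)
data_range <- max(scores) - min(scores)
box_width <- unname(five["upper"] - five["lower"])
print(c(range = data_range, box_width = box_width))
```

A median that sits off-center inside the box, or whiskers of unequal length, is how skewness shows up in the graph.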
Side by side boxplot in R
Making a side by side boxplot in R involves the boxplot() function, which has the form boxplot(data sets) and produces a side by side boxplot of the data sets passed to it. You can enter one or more data sets. This function also has several optional parameters, including:
- main – the main title of the graph.
- names – labels for each of the data sets.
- xlab – label for the x-axis.
- ylab – label for the y-axis.
- col – fill color of the boxes.
- border – color of the box outlines.
- horizontal – determines the orientation of the graph; if TRUE, the boxes are drawn horizontally.
- notch – if TRUE, a notch is drawn in each side of the boxes around the median.
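As a rough sketch of how these options combine, the call below sets most of them at once. The two groups are made-up numbers used only to have something to draw, and the colors are arbitrary choices from R's built-in color names.

```r
# Two small made-up samples, used only to show the options
group_a <- c(12, 15, 17, 18, 21, 24, 30)
group_b <- c(8, 11, 13, 14, 16, 19, 22)

boxplot(group_a, group_b,
        main = "Two sample groups",             # title of the graph
        names = c("Group A", "Group B"),        # one label per data set
        xlab = "Value",                         # x-axis holds the values when horizontal
        ylab = "Group",                         # y-axis holds the groups when horizontal
        col = c("lightblue", "lightgreen"),     # fill color of each box
        border = "darkblue",                    # outline color
        horizontal = TRUE,                      # draw the boxes sideways
        notch = FALSE)                          # TRUE would add notches around the medians
```

With samples this small, notch = TRUE tends to produce notches wider than the box and a warning, so it is left off here.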
# basic boxplot
> x = 1:10
> boxplot(x)
Here is a simple illustration of the boxplot() function. The values of x are evenly distributed, so if you run this code you will see a balanced boxplot.
# basic boxplot 2
> y = c(1,4,5,6,9)
> boxplot(y)
Here is a simple illustration of the boxplot() function with the values of y concentrated towards the center. If you run this code, you will see a boxplot with the box a little squished compared to the one above.
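To see why the second box is narrower, it can help to look at the numbers boxplot() draws from. This sketch relies on the function's plot = FALSE argument, which returns the statistics without drawing anything; the row labels are added here only for readability.

```r
x <- 1:10
y <- c(1, 4, 5, 6, 9)

# plot = FALSE returns the computed statistics instead of drawing the graph
stats <- boxplot(x, y, plot = FALSE)$stats
rownames(stats) <- c("lower whisker", "lower hinge", "median",
                     "upper hinge", "upper whisker")
colnames(stats) <- c("x", "y")
print(stats)

# Draw both boxes on one scale for a direct visual comparison
boxplot(x, y, names = c("x", "y"))
```

The box for x runs from 3 to 8, while the box for y runs only from 4 to 6, even though the whiskers of both reach nearly the same extremes.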
The applications of this type of graph are numerous. Here is code for comparing the gas mileage of 4 cylinder cars to 8 cylinder cars.
# how to make a side by side boxplot in r
> cyl4 = mtcars$mpg[which(mtcars$cyl==4)]
> cyl8 = mtcars$mpg[which(mtcars$cyl==8)]
> par(mfrow=c(1,2))
> boxplot(cyl4)
> boxplot(cyl8)
> par(mfrow=c(1,1))
> boxplot(cyl4,cyl8,
+ main = "4 cylinders versus 8 cylinders",
+ ylab = "Miles per gallon",
+ names = c("4 cylinders", "8 cylinders"))
The first two boxplot() calls draw the two graphs side by side in separate panels, each with its own scale. The last boxplot() call puts both boxplots in the same graph on a shared scale. It also illustrates some of the optional parameters of this function.
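As an alternative sketch, boxplot() also accepts a formula, which avoids building the cyl4 and cyl8 vectors by hand. This is not the approach used above, just another way to get a similar graph. The name cars48 is introduced here for illustration, and the x-axis label is an added choice.

```r
# mtcars ships with R; keep only the 4 and 8 cylinder cars
cars48 <- subset(mtcars, cyl %in% c(4, 8))

# mpg ~ cyl means "one box of mpg for each value of cyl"
boxplot(mpg ~ cyl, data = cars48,
        main = "4 cylinders versus 8 cylinders",
        xlab = "Number of cylinders",
        ylab = "Miles per gallon")
```

With the formula form, the group labels come from the values of cyl, so the names parameter is not needed.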
The boxplot() function is an extremely useful graphing tool that many general-purpose programming languages do not provide out of the box. It serves as an example of why R is a useful tool in data science.