Sometimes you need a quick summary of the contents of a data set. As a result, knowing how to summarize data in R is highly important when working with descriptive statistics. Being able to summarize data in a simple manner is a big help to productivity in R programming.
Description of summary()
The summary() function is an R function with the form summary(variable), where the variable can be a vector, a factor, a data frame, or another R object. For a categorical (factor) vector, it returns the number of occurrences of each value. For a numeric vector, it returns the minimum, first quartile, median, mean, third quartile, and maximum. If the dataset is a data frame, it presents this information in a table with one block of results per column. It provides information in a quick and simple manner using a single function.
Key features of summary()
An important feature of the summary() function is that it can be used with a data set containing multiple variables. While you can get some of the same information from a boxplot or bar chart, the summary table produced by this R function is numerical rather than graphical. For categorical data, the frequency counts let you see the number of occurrences at a glance. The feature you get the most from is its interaction with different types of variables. For example, a numeric vector produces a single row of statistics, while a data frame produces a table with one column of results per variable.
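To make the point about variable types concrete, here is a minimal sketch. The vectors `sizes`, `passed`, and `scores` are made-up illustration data, not part of this article's examples.

```r
# Hypothetical example vectors (illustration only)
sizes  <- factor(c("small", "large", "small", "medium", "small"))
passed <- c(TRUE, FALSE, TRUE, TRUE)
scores <- c(2, 4, 4, 10)

summary(sizes)   # factor: one count per level
summary(passed)  # logical: the mode plus counts of FALSE and TRUE
summary(scores)  # numeric: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
```

The same call adapts its output to the type of its argument, which is why it works column by column on a data frame.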
Examples of summary() in action
Here are two examples of the use of the summary() function. The first one involves a vector and the second one uses a data frame. They make for a good illustration of how the results differ between types of data.
> demo = 0:100
> summary(demo)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0      25      50      50      75     100
In this case, we have a simple vector of the integers from 0 to 100. The result is a very simple table showing the value of each of the summary quantities.
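If you only need one of these quantities, the result of summary() on a numeric vector can be indexed by name. The sketch below reuses the `demo` vector from the example, and the variable name `s` is just a placeholder.

```r
demo <- 0:100
s <- summary(demo)

# The result is a named numeric vector, so single values can be pulled out by name
s[["Median"]]
s[["3rd Qu."]]
names(s)
```

This is handy when a script needs, say, the median for a later calculation rather than the printed table.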
> group = data.frame(
+ name = c("Bob", "Tim", "Sue", "Tim", "Betty", "Charles"),
+ age = c( 25, 30, 33, 35, 86, 56),
+ pay = c(200, 300, 333, 250, 400, 350),
+ stringsAsFactors = TRUE  # since R 4.0.0, strings stay character unless this is set
+ )
> summary(group)
      name        age             pay
 Betty  :1   Min.   :25.00   Min.   :200.0
 Bob    :1   1st Qu.:30.75   1st Qu.:262.5
 Charles:1   Median :34.00   Median :316.5
 Sue    :1   Mean   :44.17   Mean   :305.5
 Tim    :2   3rd Qu.:50.75   3rd Qu.:345.8
             Max.   :86.00   Max.   :400.0
This time we put a data frame into the summary() function, and it produces results for each column. For name, it produces the number of occurrences of each value, and for age and pay, it produces a column of summary values for each.
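The per-name counts appear only when the name column is stored as a factor. Since R 4.0.0, data.frame() keeps strings as character vectors by default, and those are summarized differently. A small sketch using the same names (the variable `name_chr` is introduced here for illustration):

```r
# Same names as in the data frame example
name_chr <- c("Bob", "Tim", "Sue", "Tim", "Betty", "Charles")

summary(name_chr)          # character vector: typically just length, class and mode
summary(factor(name_chr))  # factor: one count per distinct name
```

If your own summary of a data frame shows no counts for a text column, converting that column with factor() is one way to get them.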
Alternatives to summary()
It is possible to produce all of these results with separate functions. The following six functions together produce the same results as the summary() function.
- table() – Produces the number of occurrences of each element
- max() – Produces the maximum value
- min() – Produces the minimum value
- quantile() – Produces the quartiles of the data (by default the minimum, the three quartiles, and the maximum)
- mean() – Produces the average value
- median() – Produces the middle value
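As a rough sketch of what doing this by hand looks like, the functions listed above can be combined to rebuild the numeric summary of the `demo` vector from the earlier example. The name `manual` is a placeholder chosen for this illustration.

```r
demo <- 0:100

# Assemble the six summary values one function at a time
manual <- c(
  "Min."    = min(demo),
  "1st Qu." = unname(quantile(demo, 0.25)),
  "Median"  = median(demo),
  "Mean"    = mean(demo),
  "3rd Qu." = unname(quantile(demo, 0.75)),
  "Max."    = max(demo)
)
manual

# table() covers the counting case for categorical data
table(c("Bob", "Tim", "Sue", "Tim", "Betty", "Charles"))
```

Six calls and some manual labelling give the same numbers that summary(demo) returns in one step.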
The big advantage of the summary() function is that you get everything with one function call. Using the other R functions requires more effort.
The summary() function is a very useful tool in R. If you need to produce a summary of your data set, this is a quick and easy way to do it. Furthermore, it will help you understand the data by providing important statistics about it.