A five-number summary is a staple of descriptive statistics in data science. It condenses a data set into five values: the minimum, the first quartile, the median, the third quartile, and the maximum. Together, these values often reveal more about the center and spread of the data than you would see by scanning the raw numbers.
Description – the fivenum Function in R
Calculating a five-number summary in R involves using the fivenum function. It has the form fivenum(data), where data is the numeric vector being evaluated. It returns the minimum, the first quartile, the median, the third quartile, and the maximum, in that order. It does not include the mean. The first quartile sits at about the 25th percentile, the median at the 50th percentile, and the third quartile at about the 75th percentile. Strictly speaking, fivenum returns Tukey's hinges for the two quartiles, which can differ slightly from the quartiles that the quantile function reports, particularly for small samples. All five values are derived from the sorted data.
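As a minimal sketch of the call described above, the snippet below labels the five returned values and contrasts them with summary, which also reports the mean. The variable names and labels are chosen here for illustration only; fivenum itself returns an unnamed numeric vector.

```r
# Illustrative data (the same ages used in the examples later in this article)
age <- c(46, 48, 51, 57, 60, 63, 67, 87, 95)

# fivenum returns an unnamed numeric vector of length five
five <- fivenum(age)

# Labels added for readability; these names are an assumption, not part of fivenum
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# summary() reports similar values plus the mean, which fivenum omits
print(summary(age))
```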
Explanation of the fivenum Function in R
The fivenum function is related to the boxplot function, which presents the same kind of information in graphical form (by default, boxplot draws points that lie far from the box as separate outliers instead of extending the whiskers to them). Calculating these five values allows you to find the interquartile range, which is the distance between the first and third quartiles. This range is similar to the standard deviation in that both describe the spread of the data. However, the interquartile range is largely unaffected by outliers, while the standard deviation is the measure typically used when calculating a confidence interval. The lowest quarter of the data lies below the first quartile, and the highest quarter lies above the third quartile. By default, the fivenum function ignores missing values and bases its calculations only on the actual numeric values.
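The points above can be sketched in a few lines. This is an illustrative example under simple assumptions, not a prescribed workflow: it derives the spread between the hinges from the fivenum output, compares it with base R's IQR and sd, checks the default handling of missing values, and uses boxplot.stats to show the numbers that boxplot draws. The variable names are chosen here for illustration.

```r
age <- c(46, 48, 51, 57, 60, 63, 67, 87, 95)
five <- fivenum(age)

# Spread between the upper and lower hinges (fourth and second values)
hinge_spread <- five[4] - five[2]
print(hinge_spread)   # 67 - 51 = 16

# Base R's IQR() is built on quantile(); it agrees with the hinges for this vector
print(IQR(age))

# Standard deviation, for comparison; unlike the hinge spread,
# it is sensitive to extreme values
print(sd(age))

# Missing values are dropped by default (na.rm = TRUE)
age_with_na <- c(age, NA)
print(fivenum(age_with_na))

# boxplot.stats() returns what boxplot() draws: whisker ends, hinges, median,
# plus any points flagged as outliers
bp <- boxplot.stats(age)
print(bp$stats)
print(bp$out)
```

For this vector, 95 lies more than 1.5 times the hinge spread above the upper hinge, so boxplot draws it as a separate point and the upper whisker stops at 87, while fivenum still reports 95 as the maximum.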
Examples of the fivenum Function in R
Here are three examples of the fivenum function in action.
> age = c(46,48,51,57,60,63,67,87,95)
> fivenum(age)
[1] 46 51 60 67 95
> boxplot(age)
This is a basic example of the fivenum and boxplot functions being applied to a simple numeric vector.
> df = data.frame(Roket=c('Saturn V', 'N1', 'STS', 'Falcon 9', 'Falcon H', 'SLS', 'SLS B2', 'Star Ship'),
+ Thrust=c(3440, 4620, 2000, 776, 2327, 3992, 4309, 6400),
+ Orbit=c(140, 95, 27.5, 22.8, 63.8, 95, 130, 150),
+ Hight=c(110.6, 105, 56.1, 70, 70, 98.1, 111.3, 120))
> fivenum(df$Thrust)
[1] 776.0 2163.5 3716.0 4464.5 6400.0
> boxplot(df$Thrust)
This example applies the fivenum and boxplot functions to a single column of a data frame.
> df = data.frame(Roket=c('Saturn V', 'N1', 'STS', 'Falcon 9', 'Falcon H', 'SLS', 'SLS B2', 'Star Ship'),
+ Thrust=c(3440, 4620, 2000, 776, 2327, 3992, 4309, 6400),
+ Orbit=c(140, 95, 27.5, 22.8, 63.8, 95, 130, 150),
+ Hight=c(110.6, 105, 56.1, 70, 70, 98.1, 111.3, 120))
> sapply(df[c('Thrust', 'Orbit', 'Hight')], fivenum)
     Thrust  Orbit  Hight
[1,]  776.0  22.80  56.10
[2,] 2163.5  45.65  70.00
[3,] 3716.0  95.00 101.55
[4,] 4464.5 135.00 110.95
[5,] 6400.0 150.00 120.00
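This example uses sapply to apply the fivenum function to several columns of a data frame at once. The result is a matrix with one column per variable and one row for each of the five values.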
Application (Five Number Summary in R)
The main application of these functions comes after the observations have been recorded. They report the key percentiles of the data: the 25th, the 50th, and the 75th, along with the minimum and maximum values. Together, these supply useful information about the data being analyzed that is hard to see in the raw numbers.
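Because fivenum computes its quartiles as Tukey's hinges, its results are not always identical to the percentiles returned by quantile. The sketch below, which reuses the thrust values from the examples above, places the two side by side. The column labels are chosen here for illustration.

```r
# Thrust values from the data frame example above
thrust <- c(3440, 4620, 2000, 776, 2327, 3992, 4309, 6400)

five <- fivenum(thrust)   # Tukey's hinges for the two quartiles
q <- quantile(thrust)     # default method at 0, 25, 50, 75, and 100 percent

# One row per method, one column per summary value
comparison <- rbind(fivenum = five, quantile = unname(q))
colnames(comparison) <- c("min", "q1", "median", "q3", "max")
print(comparison)
```

With eight observations, the two methods agree on the minimum, median, and maximum but interpolate the first and third quartiles differently, so the q1 and q3 columns do not match exactly. Either convention is reasonable; it mainly matters that you use the same one consistently when comparing data sets.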
Calculating the five-number summary in R is a common first step in a statistical analysis. The fivenum and boxplot functions give you a quick view of the center, spread, and extremes of a data set, and you will find them useful tools in any analysis of numerical data.