In statistical analysis and data science, scanning the raw data is rarely informative on its own; summary statistics are usually needed to make sense of it. One way to get them is to run descriptive statistics on your numeric columns. You could calculate each statistic yourself, but the describe function produces them all in a single call.

### Description – describe() Function in R

The describe function comes from the Hmisc package and has the form describe(dataset), where “dataset” is the data set being described. It accepts vectors, matrices, and data frames, and it tolerates missing values, counting them in the “missing” column of its output. It produces a summary report, not a contingency table, and the exact content of that report depends on the kind of data being analyzed.
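As a minimal sketch of how missing values are counted, the snippet below reproduces the n, missing, and distinct figures in base R and then calls describe on the same vector. The `scores` vector is made-up illustrative data, and the call to describe is guarded so the snippet still runs when Hmisc is not installed.

```r
# Illustrative data only: a numeric vector with one missing value.
scores <- c(12, 15, NA, 9, 15, 20)

# Base R equivalents of the first three columns that describe reports.
n_obs      <- sum(!is.na(scores))                      # "n": non-missing observations
n_missing  <- sum(is.na(scores))                       # "missing": count of NA values
n_distinct <- length(unique(scores[!is.na(scores)]))   # "distinct": unique non-missing values

counts <- c(n = n_obs, missing = n_missing, distinct = n_distinct)
print(counts)

# Run describe itself only when the Hmisc package is available.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::describe(scores))
}
```

The base R counts are useful as a cross-check when you want to confirm how many values describe actually used.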

### Explanation – describe() in R

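The describe function supplies a lot of statistical information, and what it reports depends on what it is examining. Every variable gets the number of observations (n), the number of missing values, and the number of distinct values. Numeric variables also get the mean and Gmd, and those with enough distinct values get the median along with the other quantiles from .05 to .95; in the second example below each numeric column has only five distinct values, so the quantiles are omitted. The function does not report the standard deviation or sample variance, but the lowest and highest values give you the range. Gmd is Gini's mean difference, the average absolute difference between all pairs of observations; it is a robust measure of spread, not a source of the standard deviation or variance. By default the function does not single out a response variable; it summarizes it like any other variable.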
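To make the Gmd figure concrete, here is a small base R sketch that computes Gini's mean difference by hand, alongside the standard deviation and variance that describe leaves out. The helper name `gmd_by_hand` is hypothetical and not part of Hmisc or any other package.

```r
# Gini's mean difference: the average absolute difference over all pairs
# of distinct observations. gmd_by_hand is a hypothetical helper name.
gmd_by_hand <- function(v) {
  v <- v[!is.na(v)]                    # drop missing values first
  n <- length(v)
  diffs <- abs(outer(v, v, "-"))       # n x n matrix of |v[i] - v[j]|
  sum(diffs) / (n * (n - 1))           # average over ordered pairs with i != j
}

x <- 1:10
gmd_by_hand(x)   # about 3.667, the Gmd value describe reports for 1:10

# The spread measures describe does not report are one call away in base R.
sd(x)
var(x)
```

For the vector 1:10 the hand computation agrees with the Gmd column in the first example in this article.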

### Examples

Here are two R code examples showing the describe function in action.

    > library(Hmisc)
    > x = c(1:10)
    > describe(x)
    x
            n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
           10        0       10        1      5.5    3.667     1.45     1.90     3.25     5.50     7.75     9.10     9.55

    lowest :  1  2  3  4  5, highest:  6  7  8  9 10

    Value        1   2   3   4   5   6   7   8   9  10
    Frequency    1   1   1   1   1   1   1   1   1   1
    Proportion 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

This example applies the describe function to a simple vector.

    > library(Hmisc)
    > x = c(1:5,3)
    > y = c(1,3,5,7,3,9)
    > z = c("A", "B", "C", "D", "E", "C")
    > xyz = data.frame(z, x, y)
    > describe(xyz)

    xyz

     3  Variables      6  Observations
    ------------------------------------------------------------------------
    z
            n  missing distinct
            6        0        5

    lowest : A B C D E, highest: A B C D E

    Value          A     B     C     D     E
    Frequency      1     1     2     1     1
    Proportion 0.167 0.167 0.333 0.167 0.167
    ------------------------------------------------------------------------
    x
            n  missing distinct     Info     Mean      Gmd
            6        0        5    0.971        3    1.733

    lowest : 1 2 3 4 5, highest: 1 2 3 4 5

    Value          1     2     3     4     5
    Frequency      1     1     2     1     1
    Proportion 0.167 0.167 0.333 0.167 0.167
    ------------------------------------------------------------------------
    y
            n  missing distinct     Info     Mean      Gmd
            6        0        5    0.971    4.667      3.6

    lowest : 1 3 5 7 9, highest: 1 3 5 7 9

    Value          1     3     5     7     9
    Frequency      1     2     1     1     1
    Proportion 0.167 0.333 0.167 0.167 0.167
    ------------------------------------------------------------------------

This example applies the describe function to a data frame, so it produces a separate summary for each column, including the character column z.

### Applications of describe in R

The main application of the describe function is supplying statistical information about the contents of a vector. When it is applied to a data frame, it treats each column as a separate vector. A practical application would be a company using this function on a data frame about its employees to get statistical information on things like pay, age, and hours worked.
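The employee scenario can be sketched as follows. The `employees` data frame and all of its values are invented for illustration, and the describe call is guarded so the snippet runs even when Hmisc is not installed.

```r
# Hypothetical employee records; every value here is made up.
employees <- data.frame(
  pay   = c(52000, 61000, 48000, 75000, 58000, NA),  # one salary is missing
  age   = c(34, 45, 29, 52, 41, 38),
  hours = c(40, 38, 40, 45, 36, 40)
)

if (requireNamespace("Hmisc", quietly = TRUE)) {
  # One summary per column: pay, age, and hours.
  print(Hmisc::describe(employees))

  # A single column can be described on its own as well.
  print(Hmisc::describe(employees$pay))
}
```

Describing the whole data frame is the quicker route for a first look; describing one column is handy when only that variable matters.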

The describe function is a powerful function that supplies important statistical information for use in statistical analysis. It is easy to use because the data set is the only argument you have to supply, so it gives you a lot of information for little effort.

Practical alternatives include the base R summary function and descriptive-statistics functions in other R packages. The lack of standard deviation and sample variance is the main gap if you’re preparing for regression analysis or modeling with categorical variables. In those situations you may need to supplement the output of describe with other measures, such as those from the base R sd and var functions.
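One way to fill that gap, sketched here with base R only, is to run summary on the data frame and then compute the standard deviation and variance for each numeric column. The data frame is the xyz example from earlier in this article; the names `numeric_cols` and `spread` are illustrative.

```r
# Rebuild the data frame used in the second example.
x <- c(1:5, 3)
y <- c(1, 3, 5, 7, 3, 9)
z <- c("A", "B", "C", "D", "E", "C")
xyz <- data.frame(z, x, y)

# Base R alternative to describe: minimum, quartiles, mean, and maximum
# for numeric columns, plus a brief summary of the character column.
summary(xyz)

# Keep only the numeric columns, then compute sd and var for each one.
numeric_cols <- xyz[sapply(xyz, is.numeric)]
spread <- sapply(numeric_cols, function(col) c(sd = sd(col), var = var(col)))
print(spread)   # a 2 x 2 matrix: rows sd and var, columns x and y
```

Filtering with is.numeric first keeps sd and var away from the character column z, which they cannot summarize.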