Sometimes when doing data science, you may be dealing with an exceptionally long table of data. Printing out the entire data set then produces a far longer table than you need to get a feel for it. In such cases, you may only need the beginning of the data set.

### Description – head() Function in R

The function in R that prints out just the beginning of a data set is the head function, which has the format head(x, n), where "x" is the data set and "n" is the number of elements to be listed. For a data frame or matrix, n counts rows. The number of elements to be listed has a default value of six. The function can be used on vectors, data frames, matrices, and lists, making it an extremely useful function. It is intended for reducing large data sets to a reasonable size when printing them out, among other uses. When you are dealing with small datasets, it is simpler to just use the dataset name to get your printout.
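As a minimal sketch of the head(x, n) format described above, the following applies head to a vector and a list. The objects `v` and `l` are hypothetical sample data chosen only for illustration.

```r
# Hypothetical sample objects, not taken from the examples in this article
v <- 1:20                      # integer vector of length 20
l <- as.list(letters[1:10])    # list of ten single-letter strings

head(v)        # default n = 6: returns 1 2 3 4 5 6
head(v, 3)     # returns 1 2 3
head(l, 2)     # returns a list holding "a" and "b"
```

The result has the same type as the input: head of a vector is a vector, and head of a list is a list.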

### What The Head Function Does

The head function returns the first element through whatever element number you set. By default, it returns the first six elements. This is a straightforward process of going into the data set and pulling out everything from the first element it contains up to the number that you choose. If you give the function a number larger than the length of the data set, it will return the entire data set. It is an easy function to use because it needs only two simple arguments.
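The behavior with an n larger than the data set can be sketched as follows. The vector `v` and data frame `df_small` are hypothetical examples, not part of the original article.

```r
# Hypothetical three-element vector
v <- c(10, 20, 30)
head(v, 10)            # asking for 10 elements returns all three: 10 20 30
length(head(v, 10))    # 3

# Hypothetical four-row data frame
df_small <- data.frame(a = 1:4, b = letters[1:4])
nrow(head(df_small, 100))   # 4, since only four rows exist
```

No error or padding occurs; head simply stops at the end of the data.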

### Examples – R’s head function in action

Here we have five examples of code using the head function. Each one of them illustrates different features or data types.

> t = as.numeric(Sys.time())

> set.seed(t)

> x = rnorm(15)

> y = rnorm(15)

> z = rnorm(15)

> df = data.frame(x,y,z)

> head(df)

x y z

1 0.8263685 1.08257381 -1.3096090

2 -0.8312313 0.36612834 0.6999038

3 0.1802131 0.75739806 0.6955328

4 0.7666959 0.70281472 0.4916422

5 -2.2403236 0.65351072 -1.0081274

6 -0.1295188 -0.01911863 0.3133820

This first example uses a data frame with the default length, so it produces the first six lines of data. Because the seed is taken from the system clock, the numbers you get when running this code will differ from those shown here.

> t = as.numeric(Sys.time())

> set.seed(t)

> x = rnorm(15)

> y = rnorm(15)

> z = rnorm(15)

> df = data.frame(x,y,z)

> head(df,3)

x y z

1 0.8263685 1.0825738 -1.3096090

2 -0.8312313 0.3661283 0.6999038

3 0.1802131 0.7573981 0.6955328

This example also uses a data frame, but the length is set to three and so it only produces three lines of data.

> t = as.numeric(Sys.time())

> set.seed(t)

> x = rnorm(15)

> y = rnorm(15)

> z = rnorm(15)

> df = data.frame(x,y,z)

> head(df,10)

x y z

1 0.8263685 1.08257381 -1.3096090

2 -0.8312313 0.36612834 0.6999038

3 0.1802131 0.75739806 0.6955328

4 0.7666959 0.70281472 0.4916422

5 -2.2403236 0.65351072 -1.0081274

6 -0.1295188 -0.01911863 0.3133820

7 -0.2014634 -0.14081729 0.5485846

8 0.9239509 0.36813212 0.9711359

9 0.3142296 -0.95919401 0.7099152

10 -0.8895970 -1.53274661 0.2723141

In this example, we once again use a data frame, but we give it a length of ten. This results in a table consisting of ten lines of data.

> t = as.numeric(Sys.time())

> set.seed(t)

> x = rnorm(15)

> y = rnorm(15)

> z = rnorm(15)

> df = data.frame(x,y,z)

> head(df,20)

x y z

1 0.8263685 1.08257381 -1.30960905

2 -0.8312313 0.36612834 0.69990378

3 0.1802131 0.75739806 0.69553277

4 0.7666959 0.70281472 0.49164221

5 -2.2403236 0.65351072 -1.00812737

6 -0.1295188 -0.01911863 0.31338201

7 -0.2014634 -0.14081729 0.54858459

8 0.9239509 0.36813212 0.97113594

9 0.3142296 -0.95919401 0.70991520

10 -0.8895970 -1.53274661 0.27231411

11 0.3417856 0.46241101 1.04322014

12 2.1996556 0.85134456 -0.35725121

13 0.4924358 0.23457563 -0.09494597

14 1.1620174 0.35782759 -0.37705048

15 -0.8958105 0.47707668 0.33351358

In this example, we once again use a data frame, but we give it a length of twenty, which is longer than the fifteen rows in the data set. This results in a table that goes all the way to the last row and prints out the entire data set.

> t = as.numeric(Sys.time())

> set.seed(t)

> x = rnorm(15)

> head(x)

[1] 0.8263685 -0.8312313 0.1802131 0.7666959 -2.2403236 -0.1295188

In this example, we illustrate using the head function with a vector, which returns only the first six elements. This illustrates its use on more than just a data frame. It would have worked just as well for a matrix.
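To illustrate the matrix case mentioned above, here is a small sketch. The matrix `m` is a hypothetical example; for a matrix, head returns the first n rows with all columns kept.

```r
# Hypothetical 10 x 4 matrix, filled column by column with 1..40
m <- matrix(1:40, nrow = 10, ncol = 4)

head(m, 3)         # first three rows, all four columns
dim(head(m, 3))    # 3 4
```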

### Applications of the head() function

The main application of the head function is reducing the size of the printout of a large data set to a more manageable size. It can also be used to produce a reduced-size data set beginning with the first element and going to whatever length you decide. This means that another application of this function is trimming down large datasets to the first part of the original with the ability to determine how much of the original data set is retained.
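Trimming a data set rather than just printing it can be sketched by assigning the result of head to a new variable. The names `big` and `small`, the column names, and the fixed seed are assumptions made for illustration.

```r
set.seed(42)  # fixed seed (an assumption) so the sketch is reproducible

# Hypothetical large data frame with 1000 rows
big <- data.frame(id = 1:1000, value = rnorm(1000))

# Keep only the first 50 rows as a new, smaller data frame
small <- head(big, 50)

nrow(small)   # 50
ncol(small)   # 2, all columns are retained
```

The original `big` is left unchanged, so you can choose how much of it to retain simply by changing n.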

The head function is an easy function to learn and make use of. You can use it to reduce printouts of large datasets or reduce their size to a smaller version. In either case, it comes in handy when you have an exceptionally large data set.