When doing data science, you will sometimes deal with an exceptionally long table of data. Printing the entire data set then produces far more output than you need to get a feel for it. Often the beginning of the data set is enough.
Description – head() Function in R
The function in R that prints out just the beginning of a data set is the head function, which has the format head(x, n), where “x” is the data set and “n” is the number of elements to be listed. The number of elements to be listed has a default value of six. It can be used on vectors, data frames, matrices, and lists, which makes it an extremely useful function. For data frames and matrices, the elements counted are rows. This function is intended for reducing large data sets to a reasonable size when printing them out, among other uses. When you are dealing with small data sets, it is simpler to just use the data set name to get your printout.
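As a minimal sketch of the default length and of head on something other than a data frame, the snippet below uses R's built-in letters vector and a small list. The list cfg and its component names are made up for illustration and are not part of the original examples.

```r
# A character vector with more than six elements (letters is built into R)
v <- letters
head(v)       # default n = 6: "a" "b" "c" "d" "e" "f"
head(v, 3)    # "a" "b" "c"

# A list: head() keeps the first n components, names included
# (cfg is a hypothetical list used only for this illustration)
cfg <- list(host = "localhost", port = 8080, debug = TRUE, retries = 3)
first_two <- head(cfg, 2)
names(first_two)   # "host" "port"
```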
What The Head Function Does
The head function returns the first element through whatever element number you set, and the console prints the result. By default, it returns the first six elements. This is a straightforward process of going into the data set and pulling out everything from the first element it contains through to the number that you choose. If you give the function a number larger than the length of the data set, it produces the entire data set. It is an easy function to use because it has only two easy-to-use arguments.
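One way to picture this behavior is as plain indexing. The sketch below compares head with an equivalent index expression for a non-negative length. The vector v and data frame df are hypothetical values chosen for illustration, and the comparison is meant as a mental model rather than a description of how R implements the function.

```r
v <- c(10, 20, 30, 40, 50, 60, 70, 80)

# For a vector, head(v, n) matches taking the first min(n, length(v)) elements
n <- 3
by_head  <- head(v, n)
by_index <- v[seq_len(min(n, length(v)))]
identical(by_head, by_index)     # TRUE

# Asking for more elements than exist returns the whole vector
identical(head(v, 100), v)       # TRUE

# For a data frame, the same idea applies to rows
df <- data.frame(id = 1:4, score = c(2.5, 3.1, 4.8, 1.9))
identical(head(df, 2), df[seq_len(2), , drop = FALSE])   # TRUE
```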
Examples – R’s head function in action
Here we have five examples of code using the head function. Each one of them illustrates different features or data types.
> t = as.numeric(Sys.time())
> set.seed(t)
> x = rnorm(15)
> y = rnorm(15)
> z = rnorm(15)
> df = data.frame(x,y,z)
> head(df)
x y z
1 0.8263685 1.08257381 -1.3096090
2 -0.8312313 0.36612834 0.6999038
3 0.1802131 0.75739806 0.6955328
4 0.7666959 0.70281472 0.4916422
5 -2.2403236 0.65351072 -1.0081274
6 -0.1295188 -0.01911863 0.3133820
This first example calls head on a 15-row data frame without giving a length, so it prints the default six lines of data.
> t = as.numeric(Sys.time())
> set.seed(t)
> x = rnorm(15)
> y = rnorm(15)
> z = rnorm(15)
> df = data.frame(x,y,z)
> head(df,3)
x y z
1 0.8263685 1.0825738 -1.3096090
2 -0.8312313 0.3661283 0.6999038
3 0.1802131 0.7573981 0.6955328
This example also uses a data frame, but the length is set to three and so it only produces three lines of data.
> t = as.numeric(Sys.time())
> set.seed(t)
> x = rnorm(15)
> y = rnorm(15)
> z = rnorm(15)
> df = data.frame(x,y,z)
> head(df,10)
x y z
1 0.8263685 1.08257381 -1.3096090
2 -0.8312313 0.36612834 0.6999038
3 0.1802131 0.75739806 0.6955328
4 0.7666959 0.70281472 0.4916422
5 -2.2403236 0.65351072 -1.0081274
6 -0.1295188 -0.01911863 0.3133820
7 -0.2014634 -0.14081729 0.5485846
8 0.9239509 0.36813212 0.9711359
9 0.3142296 -0.95919401 0.7099152
10 -0.8895970 -1.53274661 0.2723141
In this example, we once again use a data frame, but we give it a length of ten. This results in a table consisting of ten lines of data.
> t = as.numeric(Sys.time())
> set.seed(t)
> x = rnorm(15)
> y = rnorm(15)
> z = rnorm(15)
> df = data.frame(x,y,z)
> head(df,20)
x y z
1 0.8263685 1.08257381 -1.30960905
2 -0.8312313 0.36612834 0.69990378
3 0.1802131 0.75739806 0.69553277
4 0.7666959 0.70281472 0.49164221
5 -2.2403236 0.65351072 -1.00812737
6 -0.1295188 -0.01911863 0.31338201
7 -0.2014634 -0.14081729 0.54858459
8 0.9239509 0.36813212 0.97113594
9 0.3142296 -0.95919401 0.70991520
10 -0.8895970 -1.53274661 0.27231411
11 0.3417856 0.46241101 1.04322014
12 2.1996556 0.85134456 -0.35725121
13 0.4924358 0.23457563 -0.09494597
14 1.1620174 0.35782759 -0.37705048
15 -0.8958105 0.47707668 0.33351358
In this example, we once again use a data frame, but we give it a length of twenty, which is longer than the data set itself. This results in a table that goes all the way to the last element and prints out the entire data set.
> t = as.numeric(Sys.time())
> set.seed(t)
> x = rnorm(15)
> head(x)
[1] 0.8263685 -0.8312313 0.1802131 0.7666959 -2.2403236 -0.1295188
In this example, we use the head function on a vector, which returns only the first six elements. This illustrates its use on more than just a data frame. It would have worked just as well for a matrix.
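For the matrix case, a small sketch might look like the following. The matrix m is a hypothetical 10-by-3 example filled with the integers 1 to 30. It is not one of the original data sets.

```r
# A 10 x 3 integer matrix, filled column by column
m <- matrix(1:30, nrow = 10, ncol = 3)

# head() on a matrix keeps the first n rows and all columns
head(m, 2)
#      [,1] [,2] [,3]
# [1,]    1   11   21
# [2,]    2   12   22

# With the default n = 6, the result has six rows
dim(head(m))   # 6 3
```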
Applications of the head() function
The main application of the head function is reducing the printout of a large data set to a more manageable size. Because it returns its result rather than only printing it, it can also produce a reduced-size data set that begins with the first element and runs to whatever length you decide. Another application of this function is therefore trimming a large data set down to its first part, with you choosing how much of the original is retained.
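A short sketch of the trimming use: assigning the result of head to a new name gives a smaller data set to work with. The example assumes the mtcars data frame that ships with R, chosen here purely for illustration, and the name small is arbitrary.

```r
# mtcars is a built-in data frame with 32 rows and 11 columns
nrow(mtcars)             # 32

# Keep only the first ten rows as a separate, smaller data frame
small <- head(mtcars, 10)
nrow(small)              # 10
ncol(small)              # 11, all columns are retained

# The original data frame is left unchanged
nrow(mtcars)             # 32
```

The smaller copy can then be passed to other functions, such as summary or plot, while you experiment.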
The head function is an easy function to learn and make use of. You can use it to reduce printouts of large datasets or reduce their size to a smaller version. In either case, it comes in handy when you have an exceptionally large data set.