How to Use the split() Function in R to Break Up Data Frames and Vectors

Sometimes when doing data science, it is necessary to split the contents of a data structure into groups. When programming in R, there are a couple of functions that will do this, depending on the data structure you are working with. Splitting a data structure this way is usually done to reveal patterns within each group that are hard to see while all the data is mixed together.

Description: The split() Function in R

The split() function in R divides a vector or a data frame into groups and returns those groups as a list. Its basic form is split(v, g), where v is the vector or data frame and g is the grouping that decides which group each element (or, for a data frame, each row) belongs to. The grouping is usually a factor, or a vector such as a character vector that split() converts to a factor, and it normally has one entry per element or row of v. Each distinct value of g becomes one element of the returned list, named after that value. The split() function has an inverse, unsplit(), which recreates the original vector or data frame, in its original order, when it is given the same grouping.
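In base R the full signature is split(x, f, drop = FALSE, ...), so v and g above correspond to x and f. The sketch below assumes nothing beyond base R; it uses a factor with an unused level to show what the drop argument does, and confirms that unsplit() gives back the original vector. The values of v and g are made up for illustration.

```r
# A grouping factor with a level ("z") that never occurs
v <- 1:6
g <- factor(c("x", "y", "x", "y", "x", "y"), levels = c("x", "y", "z"))

# By default every level of g gets a list element, even an empty one
sp <- split(v, g)
print(names(sp))   # "x" "y" "z"
print(sp$z)        # integer(0)

# drop = TRUE leaves out levels that do not occur in g
sp_dropped <- split(v, g, drop = TRUE)
print(names(sp_dropped))   # "x" "y"

# unsplit() puts every piece back in its original position
restored <- unsplit(sp, g)
print(identical(restored, v))   # TRUE
```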

Explanation of the split function in R

Despite the similar names, split() should not be confused with strsplit() from base R or str_split() from the stringr package. Those two functions work on character strings: they cut each string into pieces wherever a delimiter (a pattern) occurs, and the delimiter itself is left out of the result. The split() function instead groups the elements of a vector, or the rows of a data frame, according to the values of the grouping. Because it returns a list with one element per group, its result is often passed to lapply() or sapply() to run the same calculation on every group. In this way split() helps categorize the data within a data set, laying each category out separately so that you can inspect it.
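A minimal sketch of the difference, using only base R (str_split() from stringr is left out so the snippet needs no packages). strsplit() cuts a string at a delimiter and returns a list with one element per input string; split() then groups some made-up scores by the resulting team labels, and sapply() computes one mean per group.

```r
# strsplit(): cut a character string at a delimiter; the commas are dropped
parts <- strsplit("red,blue,red,blue", ",")[[1]]
print(parts)   # "red" "blue" "red" "blue"

# split(): group the elements of one vector by the values of another
scores  <- c(10, 20, 30, 40)
by_team <- split(scores, parts)
print(by_team)   # $blue: 20 40, $red: 10 30

# The list returned by split() is a natural input for sapply()/lapply()
team_means <- sapply(by_team, mean)
print(team_means)   # blue 30, red 20
```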

Examples of the split function in action

Here are three examples of the split function in action.

> v = 1:9
> g = c("A","B","C","A","B","C","A","B","C")
> sp = split(v,g)
> sp
$A
[1] 1 4 7

$B
[1] 2 5 8

$C
[1] 3 6 9

> u = unsplit(sp,g)
> u
[1] 1 2 3 4 5 6 7 8 9

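This example uses split() and unsplit() on a vector. Each number goes into the group named by the matching entry of g, and unsplit() puts the numbers back in their original order.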
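Typing the grouping by hand gets tedious for longer vectors. As a sketch, assuming base R only, the same A/B/C pattern can be generated with rep(), and a numeric expression such as v %% 3 works as a grouping as well, because split() converts it to a factor whose levels become the list names.

```r
v <- 1:9

# The same grouping as c("A","B","C","A","B","C","A","B","C"), built with rep()
g  <- rep(c("A", "B", "C"), times = 3)
sp <- split(v, g)
print(sp$A)   # 1 4 7

# The round trip through unsplit() gives back the original vector
print(identical(unsplit(sp, g), v))   # TRUE

# A numeric grouping is converted to a factor internally,
# so the list names are the remainders written as strings
by_rem <- split(v, v %% 3)
print(names(by_rem))   # "0" "1" "2"
print(by_rem[["0"]])   # 3 6 9
```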

> A = c("A","B","C","A","B","C","A","B","C")
> B = 1:9
> df = data.frame(A,B)
> sp = split(df,df$A)
> sp
$A
  A B
1 A 1
4 A 4
7 A 7

$B
  A B
2 B 2
5 B 5
8 B 8

$C
  A B
3 C 3
6 C 6
9 C 9

> u = unsplit(sp,df$A)
> u
  A B
1 A 1
2 B 2
3 C 3
4 A 4
5 B 5
6 C 6
7 A 7
8 B 8
9 C 9

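This example uses split() and unsplit() on a data frame, with one of its own columns, df$A, as the grouping. Each row goes into the group named by its value in column A, and unsplit() returns the rows to their original order.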
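split() also accepts a list of groupings, in which case each group is one combination of their values; base R combines them with interaction() and joins the names with a dot. In this sketch the column half is a hypothetical label added only for illustration, and drop = TRUE discards any combination that never occurs.

```r
df <- data.frame(
  A    = rep(c("A", "B", "C"), times = 3),
  B    = 1:9,
  half = rep(c("low", "high"), times = c(4, 5))  # hypothetical extra label
)

# Group the rows by every combination of A and half
groups <- split(df, list(df$A, df$half), drop = TRUE)

# Each name combines one value from each grouping, e.g. "A.low"
print(sort(names(groups)))
print(groups[["A.low"]])    # rows 1 and 4
print(groups[["A.high"]])   # row 7

# Every row lands in exactly one group
print(sum(sapply(groups, nrow)))   # 9
```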

> A = c("A","B","C","A","B","C","A","B","C")
> B = 1:9
> df = data.frame(A,B)
> sp = split(df,A)
> sp
$A
  A B
1 A 1
4 A 4
7 A 7

$B
  A B
2 B 2
5 B 5
8 B 8

$C
  A B
3 C 3
6 C 6
9 C 9

> u = unsplit(sp,A)
> u
  A B
1 A 1
2 B 2
3 C 3
4 A 4
5 B 5
6 C 6
7 A 7
8 B 8
9 C 9
> z = sp$B
> z
  A B
2 B 2
5 B 5
8 B 8

This example uses split() and unsplit() on a data frame, grouping by the standalone vector A, which holds the same values as the first column of df. It also uses the $ operator to pull one group, B, out of the list as a smaller data frame of its own.
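The $ operator needs the group name written out literally. As a sketch, assuming the same data frame as above, [[ ]] does the same job when the name is stored in a variable (key is an illustrative name), single brackets pull out several groups at once, and sapply() summarises every group in a single call.

```r
A  <- rep(c("A", "B", "C"), times = 3)
B  <- 1:9
df <- data.frame(A, B)
sp <- split(df, A)

# [[ ]] accepts the group name as a string held in a variable
key <- "B"
z <- sp[[key]]
print(z)   # the same three rows as sp$B

# Single brackets keep a list, so several groups can be extracted together
ac <- sp[c("A", "C")]
print(names(ac))   # "A" "C"

# Summarise every group at once: the sum of column B within each group
totals <- sapply(sp, function(d) sum(d$B))
print(totals)   # A 12, B 15, C 18
```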

Applications of the split function in R

The main application of split() is dividing data into categories. It works on both vectors and data frames and accepts any grouping that can be turned into a factor, which makes it quite flexible. When a data frame has a column of repeated labels that serve as categories, split() can gather the rows for each label into a separate data frame. For a vector, it gathers the elements into the categories defined by the grouping you pass in. Either way, the result presents the data in a more meaningful form.
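When the categories are not already in the data, they can be computed first and then passed to split(). This sketch uses base R's cut() to bin numbers into labelled ranges; the ages, break points, and labels are made-up values. It then uses a computed index to chop a vector into chunks of at most four elements, another common use of the same idea.

```r
ages <- c(5, 17, 23, 41, 67, 12, 35)

# cut() assigns each number to an interval: (0,18], (18,65] or (65,Inf]
band <- cut(ages, breaks = c(0, 18, 65, Inf),
            labels = c("child", "adult", "senior"))

by_band <- split(ages, band)
print(by_band)   # child: 5 17 12; adult: 23 41 35; senior: 67

# A computed grouping can also cut a vector into fixed-size chunks
x <- 1:10
chunks <- split(x, ceiling(seq_along(x) / 4))
print(chunks)    # "1": 1 2 3 4; "2": 5 6 7 8; "3": 9 10
```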

Besides displaying data by category, split() gives you a simple way to pull part of the categorized data out of the original data set and store it as a data set of its own. Once the data is rearranged by category, you can display it, analyze each group separately, or assign groups to other variables, and unsplit() will restore the original arrangement whenever you need it.