Sometimes when doing data science, you need to load data for processing from existing files. This is where the haven package comes in handy: it provides functions for reading and writing data files from SPSS, Stata, and SAS.
What are SAV files?
A SAV file is the native data file format of IBM SPSS Statistics. Besides the data values, it can store metadata such as variable labels and value labels. To load a SAV file, use the read_sav function from the haven package in the form read_sav(file). To save a data frame as a SAV file, use the write_sav function, also from haven, in the form write_sav(data, path). The process is simple as long as you are saving to or loading from your working directory.
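Since a bare file name is resolved against the working directory, it can help to check that directory before reading. The following is a minimal sketch using base R's getwd(), list.files(), and file.exists(); the file name "survey.sav" is hypothetical and stands in for whatever file you want to load.

```r
library(haven)

# The folder that bare file names such as "demo.sav" refer to
getwd()

# List any SAV files sitting in the working directory
list.files(pattern = "\\.sav$")

# Guard against a missing file before calling read_sav()
# ("survey.sav" is a hypothetical file name)
if (file.exists("survey.sav")) {
  survey <- read_sav("survey.sav")
} else {
  message("survey.sav is not in ", getwd())
}
```

If the message appears, either change the working directory with setwd() or pass read_sav a full path instead.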
How the haven package works with .sav files
When you save a SAV file with write_sav and give only a file name, the file is written to your working directory. Likewise, read_sav looks for a bare file name in the working directory and loads that file into a tibble (a kind of data frame). If the file is somewhere else, you need to supply the full path to it. Because you can write to and read from any location you can reach by path, such as a shared network folder, this makes it possible to share data among a group.
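Here is a minimal sketch of saving to and loading from a full path rather than the working directory. It assumes tempdir() as a stand-in for a real shared folder so that it runs on any machine; the file name "team_data.sav" is hypothetical.

```r
library(haven)

# tempdir() stands in for a real shared folder (e.g. a network drive);
# this is an assumption so the sketch runs on any machine
shared_dir <- tempdir()

# file.path() joins folder and file name with the right separator;
# "team_data.sav" is a hypothetical file name
full_path <- file.path(shared_dir, "team_data.sav")

# Save to the full path instead of the working directory
write_sav(data.frame(x = 1:5, y = c(2.5, 3, 4.5, 1, 0)), full_path)

# Anyone who can reach that folder loads the file by the same full path
team_data <- read_sav(full_path)
print(team_data)
```

Using file.path() rather than pasting strings together keeps the path separator correct across operating systems.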
Examples of importing and saving .sav files
Here are two examples: the first shows how to save a SAV file, and the second shows how to load it.
> library("haven")
> t = as.numeric(Sys.time())
> set.seed(t)
> df0 = data.frame(
+ A = as.integer(abs(rnorm(5)*10)),
+ B = as.integer(abs(rnorm(5)*10)),
+ C = 11:15,
+ D = as.integer(abs(rnorm(5)*10)),
+ E = 3
+ )
> df0
A B C D E
1 6 17 11 3 3
2 16 9 12 4 3
3 8 15 13 6 3
4 1 9 14 9 3
5 0 0 15 21 3
> write_sav(df0, "demo.sav")
This example creates a data frame and then saves it as a SAV file. Unless you include a path, the file is saved to the current working directory.
> library("haven")
> df1 = read_sav("demo.sav")
> df1
# A tibble: 5 x 5
A B C D E
1 6 17 11 3 3
2 16 9 12 4 3
3 8 15 13 6 3
4 1 9 14 9 3
5 0 0 15 21 3
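This code loads the SAV file into df1. Note that read_sav returns a tibble, the tidyverse's variant of a data frame, which is why the output begins with "# A tibble: 5 x 5". The process is simple, but the haven package must be loaded first (or the function called as haven::read_sav).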
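If some of your code expects a plain base-R data frame rather than a tibble, you can convert the result with as.data.frame(). This is a minimal sketch using a small stand-in for df0 written to a temporary file; the variable names are illustrative only.

```r
library(haven)

# Small stand-in for df0 from the example above
original <- data.frame(A = c(6L, 16L, 8L), C = 11:13)
path <- tempfile(fileext = ".sav")
write_sav(original, path)

loaded <- read_sav(path)
class(loaded)      # includes "tbl_df" as well as "data.frame"

# Convert to a plain base-R data frame for code that expects one
loaded_df <- as.data.frame(loaded)

# SPSS stores numbers as doubles, so integer columns come back as doubles
# (and may carry SPSS format attributes); compare the values, not the types
all(as.numeric(loaded_df$A) == original$A)
```

Comparing values with as.numeric() sidesteps both the integer-to-double change and any attributes that read_sav attaches to the columns.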
Where to use .sav files
The two most common uses of these functions are loading data from an external source and sharing data within a group. Sometimes you will receive data as SAV files from outside sources, such as government agencies. Once you have processed that data, you may need to pass your results on to other members of your team.
Being able to save and load data gives a program flexibility, and having functions for both is a useful addition to your programming toolkit.