How to Read sas7bdat in R to Convert SAS Data to an R Data Frame

When doing data science in R, you will sometimes need to load a data file that was created in SAS. A SAS data file (.sas7bdat) is easily loaded as a data frame. The function for loading such files into an R program is easy to use: the data file name is the only argument that is required. If the file is not in R's working directory (often the same folder as your R script file), you will need to include the path to the data file, but it is still a simple function to use.

Using haven to read SAS data files in R

To load a sas7bdat file, you first need to install the haven package. To import a SAS file as a data frame, you use its read_sas() function, which has the format read_sas("filename.sas7bdat"). Because the file name is the only required argument, the function is easy to use, and the most likely error is one saying that the file does not exist. This usually happens when the data file is not in R's working directory (often the same folder as your R script file) and you have not included the proper path. If the data file is in a different directory, include the path along with the filename, as in read_sas("C:/temp/mysasdataset.sas7bdat"). Note the forward slashes: inside an R string, a Windows path is written with forward slashes or doubled backslashes. All you need to know is where the file is found.
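The path handling described above can be sketched as follows. The helper name read_sas_checked is hypothetical (it is not part of haven); it only wraps read_sas() with an explicit check so that a missing file produces a message that also reports the working directory.

```r
# Hypothetical helper, not part of haven: check the path before reading.
read_sas_checked <- function(path) {
  if (!file.exists(path)) {
    stop("SAS file not found: ", normalizePath(path, mustWork = FALSE),
         " (working directory: ", getwd(), ")")
  }
  # Only reached when the file exists; requires the haven package.
  haven::read_sas(path)
}

# Build the full path from the folder and file name used in the text above.
sas_path <- file.path("C:/temp", "mysasdataset.sas7bdat")

# Read the file only if it is really there on this machine.
if (file.exists(sas_path)) {
  df <- read_sas_checked(sas_path)
}
```

Building the path with file.path() keeps the folder and the file name separate, which makes it easier to change one without retyping the other.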

How haven works (to read sas7bdat files in R)

Loading a SAS dataset requires the haven package because read_sas() is not a base R function but part of this package. The function reads the sas7bdat file and creates a data frame from it. Because it is not a base R function, you will get an error if you call it before installing and loading haven. You only have to install the package once. After that, you load it with library(haven) at the start of each R session, and read_sas() stays available until that session ends. Once you have the file loaded into a variable, you can use it like any other data frame.
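The difference between the one-time install and the per-session load can be sketched like this. The CRAN mirror URL is an assumption made so the snippet also runs non-interactively; any mirror works.

```r
# One-time step: install haven only if it is missing from the R library.
# Assumption: the cloud.r-project.org mirror is reachable from this machine.
if (!requireNamespace("haven", quietly = TRUE)) {
  install.packages("haven", repos = "https://cloud.r-project.org")
}

# Every-session step: attach haven so read_sas() can be called by name.
library(haven)

# Confirm that the function is now visible in this session.
exists("read_sas")
```

Without the library(haven) line, the function can still be reached as haven::read_sas() once the package is installed.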

Examples

The following examples show different aspects of loading SAS data with read_sas(). The sas7bdat file used in these examples can be found by clicking here. Be aware that it produces a rather large data frame.

> library(haven)
> df <- read_sas("psu97ai.sas7bdat")
> View(df)

This example simply loads the sas7bdat file and then opens a viewer window showing the entire data set. It is the simplest example: it only illustrates loading the file and then displaying it.

> library(haven)
> df <- read_sas("psu97ai.sas7bdat")
> class(df)
[1] "tbl_df"     "tbl"        "data.frame"

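In this example, we check the class of the result and confirm that it is indeed a data frame. More precisely, it is a tibble (class tbl_df), which is a data frame with a more compact way of printing. At first glance, this may not be obvious because the term “data frame” does not appear in the name of the reading function.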
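If some later code insists on a plain base R data frame, the tibble classes can be dropped. The sketch below assumes the small iris.sas7bdat example file that ships with haven, so it runs without the article's data set; the variable names are only illustrative.

```r
library(haven)

# Assumption: use the example file bundled with haven instead of psu97ai.sas7bdat.
path <- system.file("examples", "iris.sas7bdat", package = "haven")
df <- read_sas(path)

# A tibble is a data frame, so code that expects a data frame accepts it.
is.data.frame(df)

# Drop the tibble classes when a plain base R data frame is needed.
plain <- as.data.frame(df)
class(plain)
```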

> library(haven)
> df <- read_sas("psu97ai.sas7bdat")
> df
# A tibble: 29,217 x 70
NCESSCH FIPS LEAID SCHNO STID97 LEANM97 SEASCH97 SCHNAM97

1 010000200277 01 0100002 00277 210 ALABAMA~ 0020 SEQUOYA~
2 010000201705 01 0100002 01705 210 ALABAMA~ 0030 WALLACE~
3 010000201706 01 0100002 01706 210 ALABAMA~ 0040 MCNEEL ~
4 010000500870 01 0100005 00870 101 ALBERTV~ 0010 ALABAMA~
5 010000500871 01 0100005 00871 101 ALBERTV~ 0020 ALBERTV~
6 010000500879 01 0100005 00879 101 ALBERTV~ 0110 EVANS E~
7 010000500886 01 0100005 00886 101 ALBERTV~ 0150 MCCORD ~
8 010000500889 01 0100005 00889 101 ALBERTV~ 0200 WEST EN~
9 010000501616 01 0100005 01616 101 ALBERTV~ 0035 BIG SPR~
10 010000600123 01 0100006 00123 048 MARSHAL~ 0065 BOAZ MI~
# … with 29,207 more rows, and 62 more variables: STREET97,
# CITY97, ST97, ZIP97, ZIP497,
# PHONE97, TYPE97, STATUS97, LOCALE97,
# FTE97, GRSPAN97, GSLO97, GSHI97,
# UG97, PK97, KG97, G0197, G0297,
# G0397, G0497, G0597, G0697, G0797,
# G0897, G0997, G1097, G1197, …

In this example, after the file is loaded, typing the variable name prints it as it would for any data frame. Because the result is a tibble, only the first ten rows and the columns that fit on the screen are shown, with the remaining rows and variables summarized at the bottom. Even truncated, the output confirms that we are indeed getting a data frame.
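With a large file, it can help to read only part of it or to look at its shape rather than print it. The sketch below uses the n_max and col_select arguments of read_sas(), and again assumes the small example file bundled with haven rather than the article's data set.

```r
library(haven)

# Assumption: the example file bundled with haven stands in for a large SAS file.
path <- system.file("examples", "iris.sas7bdat", package = "haven")

# Read only the first 10 rows and the first 2 columns.
preview <- read_sas(path, n_max = 10, col_select = 1:2)
dim(preview)

# After a full read, base R tools describe the data without printing all of it.
df <- read_sas(path)
dim(df)      # number of rows and columns
names(df)    # column names
head(df, 3)  # first three rows
```

Limiting rows and columns at read time is usually cheaper than reading everything and subsetting afterwards.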

How To Apply This To Analytics Projects

The main application of this operation is loading a SAS dataset from a data file so that it is in a format you can print, graph, and otherwise process in R. When someone hands you a data file, you usually have no control over its format; in this case, reading it requires only a single function call. Once the information is loaded, the applications are endless. The sas7bdat format stores information as a table and can hold large amounts of it. Once it is loaded, you can process it as you would any other data frame.
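As a rough illustration of processing the loaded data like any other data frame, the sketch below summarizes and plots whatever numeric columns it finds. It assumes the example file bundled with haven, and it picks columns by type rather than by name so that it does not depend on a particular data set.

```r
library(haven)

# Assumption: the example file bundled with haven stands in for your own data.
path <- system.file("examples", "iris.sas7bdat", package = "haven")
df <- read_sas(path)

# Find the numeric columns and compute their means.
num_cols <- vapply(df, is.numeric, logical(1))
means <- colMeans(df[num_cols], na.rm = TRUE)
means

# Per-column summary of the whole data set.
summary(df)

# Histogram of the first numeric column.
first_num <- names(df)[num_cols][1]
hist(df[[first_num]], main = first_num, xlab = "value")
```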

Reading a sas7bdat file is a simple process: pass the data file name to read_sas(). It will come in handy whenever you work with a SAS dataset, which makes it another useful tool in your R programming toolbox.
