How to Read a .dta (Stata) File in R [importing external data files]

When doing data science in R, you sometimes need to import data from a file into a data frame. That file is not always plain text. Datasets saved by the statistics package Stata, for example, use a binary format with the .dta extension. Binary files can be more compact than text, but they are not a security measure, because anyone with suitable software can open them. Base R has no function for reading current Stata files, but the haven package provides one. All you need to do is install it once and load it before running your code.
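A minimal sketch of that setup step, assuming you install from CRAN and have an internet connection the first time it runs. It only calls install.packages() when haven is missing, so it is safe to keep at the top of a script.

```r
# Install haven only if it is not already available (needs internet the first time).
if (!requireNamespace("haven", quietly = TRUE)) {
  install.packages("haven")
}

# Load haven so read_dta() can be called directly.
library(haven)
```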

What Are .dta Files and How Do You Read Them?

To read a .dta file, use the read_dta() function from the haven package. In its simplest form the call is read_dta("file name"), which imports the named file into your program as a data frame (specifically a tibble). The .dta extension belongs to data files saved by Stata. It is easily confused with .dat, a generic extension that many programs use for unrelated formats that read_dta() cannot open. Because the file name is the only required argument, the function is very easy to use, and the error you are most likely to see is one saying that the file does not exist. To avoid it, make sure the file is in R's working directory. You can check which folder that is with getwd(); it is often, but not always, the folder that contains your R script. Otherwise, include the full path of the file, as in read_dta("C:/folder1/folder2/file name.dta"), which lets you load the file wherever it is on your computer.
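The path checks described above can be sketched as follows. The folder C:/folder1/folder2, the file name demo.dta and the helper read_dta_checked() are hypothetical placeholders; file.exists(), file.path(), getwd() and normalizePath() are standard base R functions. The helper fails early with a message that includes the working directory, which makes a wrong path easier to spot.

```r
library(haven)

# Hypothetical helper: stop with a clear message if the file is missing,
# otherwise read it with read_dta().
read_dta_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", normalizePath(path, mustWork = FALSE),
         " (working directory is ", getwd(), ")")
  }
  read_dta(path)
}

# Hypothetical full path built with file.path(); replace it with your own.
dta_path <- file.path("C:/folder1/folder2", "demo.dta")

# Report the problem instead of halting the script if the path is wrong.
df <- tryCatch(read_dta_checked(dta_path), error = function(e) {
  message(conditionMessage(e))
  NULL
})
```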

What the haven package is doing

When you import a .dta file, haven decodes Stata's binary format into a data frame. A CSV file is plain text that any text editor can display, but a .dta file is not meant to be read that way; you need a program that understands the format, such as Stata or R with haven. Once loaded, the data has the same row-and-column structure as a data frame imported from a CSV file. The binary format makes the raw file harder to read by eye, but that is not a form of protection: anyone with R, Stata or another compatible tool can open it, so it is no substitute for a password or encryption. Its practical advantage is that it can take up less space than an equivalent text file.
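One way to see this conversion, assuming haven is installed, is to write a small made-up data frame to a temporary .dta file with haven's write_dta() and read it back with read_dta(). The values below are illustrative only. The result comes back as a tibble, which is also a data.frame.

```r
library(haven)

# Made-up example data; any small data frame with valid Stata column names works.
original <- data.frame(year = c(1952, 1953), gdp = c(87.9, 94.6))

# Write it to a temporary Stata file, then read it back into R.
path <- tempfile(fileext = ".dta")
write_dta(original, path)
df <- read_dta(path)

print(class(df))  # tibble classes plus "data.frame"
print(dim(df))    # number of rows and columns
```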

Examples of imports from .dta files

Here are three examples of R code that import data from .dta files. Each of these data files, along with others, can be found at Stata dataset files. You can substitute any of them for the ones used here.

> library(haven)
> df <- read_dta("demo.dta")
> df
# A tibble: 180 x 6
   year qtr  gdp    pr   m1    rs
 1 1952   1 87.9 0.198 127.  1.64
 2 1952   2 88.1 0.198 128.  1.68
 3 1952   3 89.6 0.200 129.  1.83
 4 1952   4 92.9 0.201 129.  1.92
 5 1953   1 94.6 0.201 131.  2.05
 6 1953   2 95.6 0.201 130.  2.20
 7 1953   3 95.4 0.202 131.  2.02
 8 1953   4 94.2 0.203 130.  1.49
 9 1954   1 94.1 0.203 130.  1.08
10 1954   2 94.2 0.204 131. 0.814
# … with 170 more rows

This example reads the demo file, which contains 180 rows and 6 columns; the printout shows only the first ten rows. The code itself is simple, and as long as the path and file name are correct it will work.
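If you only want to confirm the size and layout of a file like this without printing it, a small helper can report the row count, column count and column names. describe_dta() is a hypothetical name, and the sketch assumes demo.dta is in the working directory; it does nothing if the file is not found.

```r
library(haven)

# Hypothetical helper: summarise a .dta file without printing the whole tibble.
describe_dta <- function(path) {
  df <- read_dta(path)
  list(rows = nrow(df), cols = ncol(df), names = names(df))
}

# Assumes demo.dta is in the working directory, as in the example above.
if (file.exists("demo.dta")) {
  str(describe_dta("demo.dta"))
}
```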

> library(haven)
> df <- read_dta("cars.dta")
> df
# A tibble: 392 x 4
   mpg cyl eng  wgt
 1  18   8 307 3504
 2  15   8 350 3693
 3  18   8 318 3436
 4  16   8 304 3433
 5  17   8 302 3449
 6  15   8 429 4341
 7  14   8 454 4354
 8  14   8 440 4312
 9  14   8 455 4425
10  15   8 390 3850
# … with 382 more rows

This example reads the cars file, which contains 392 rows and 4 columns. Only the file name changes; the rest of the code is identical to the first example.
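Although the file name is the only required argument, read_dta() also accepts optional ones: col_select limits which columns are read and n_max limits the number of rows, which can help with large files. A minimal sketch, where read_dta_subset() is a hypothetical helper and the column names mpg and wgt are taken from the cars printout above. It uses tidyselect::all_of(), from a package haven relies on for column selection, to pass the names as a character vector.

```r
library(haven)

# Hypothetical helper: read only the listed columns and at most n rows.
read_dta_subset <- function(path, cols, n = Inf) {
  read_dta(path, col_select = tidyselect::all_of(cols), n_max = n)
}

# Assumes cars.dta is in the working directory, as in the example above.
if (file.exists("cars.dta")) {
  cars_small <- read_dta_subset("cars.dta", cols = c("mpg", "wgt"), n = 100)
  print(dim(cars_small))
}
```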

> library(haven)
> df <- read_dta("airline.dta")
> df
# A tibble: 32 x 6
   year    y     w     r    l     k
 1 1948 1.21 0.243 0.145 1.41 0.612
 2 1949 1.35 0.260 0.218 1.38 0.559
 3 1950 1.57 0.278 0.316 1.39 0.573
 4 1951 1.95 0.297 0.394 1.55 0.564
 5 1952 2.27 0.310 0.356 1.80 0.574
 6 1953 2.73 0.322 0.359 1.93 0.711
 7 1954 3.03 0.335 0.403 1.96 0.776
 8 1955 3.56 0.350 0.396 2.12 0.827
 9 1956 3.98 0.361 0.382 2.43 0.800
10 1957 4.42 0.379 0.305 2.71 0.921
# … with 22 more rows

This example reads the airline file, which contains 32 rows and 6 columns. Again, the only thing that needs to be right is the path and file name.
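To load several files at once, such as the three used above or replacements from the same collection, you can loop over the file names. This sketch assumes the files are in the working directory and skips any that are missing; read_dta_files() is a hypothetical helper that returns a named list of tibbles.

```r
library(haven)

# Hypothetical helper: read every existing file into a list named by file name.
read_dta_files <- function(paths) {
  found <- paths[file.exists(paths)]
  datasets <- lapply(found, read_dta)
  names(datasets) <- basename(found)
  datasets
}

# File names from the examples above, assumed to be in the working directory.
files <- c("demo.dta", "cars.dta", "airline.dta")
datasets <- read_dta_files(files)

# Row count for each file that was found.
print(sapply(datasets, nrow))
```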

Applications of reading .dta files

The main application of this function is importing datasets that were saved in Stata's binary format. Because such a file is hard to read in an ordinary text editor, its contents are less exposed to casual inspection. This is not a substitute for encryption, however: anyone with R, Stata or another program that understands the format can read it. Binary files can also take up less space than text files, because numbers are stored in a fixed number of bytes instead of being written out digit by digit. For large numeric datasets this can add up to a useful saving, although the size of the difference depends on the data.
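Whether the binary file is actually smaller depends on the data, so it is worth measuring. The sketch below writes the same made-up data frame as CSV (with base R's write.csv()) and as .dta (with haven's write_dta()) to temporary files and compares their sizes in bytes. With many random decimal numbers the .dta file is likely to be smaller, while columns of short integers may favor CSV.

```r
library(haven)

# Made-up numeric data; results will differ for other kinds of data.
set.seed(1)
df <- data.frame(x = rnorm(10000), y = runif(10000))

# Write the same data in both formats to temporary files.
csv_path <- tempfile(fileext = ".csv")
dta_path <- tempfile(fileext = ".dta")
write.csv(df, csv_path, row.names = FALSE)
write_dta(df, dta_path)

# Compare the on-disk sizes in bytes.
sizes <- c(csv = file.size(csv_path), dta = file.size(dta_path))
print(sizes)
```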

Storing data as a binary file can save space by storing numbers compactly, and it keeps the contents from being read casually in a text editor. A .dta file is not encrypted, though, so it should not be treated as protection for sensitive data. Either way, being able to load binary files like these expands the range of datasets you can work with in R, and in data science that is helpful.