How to read a zip file in R without unzipping it (with examples)

In data science work, datasets often arrive as CSV files, and sometimes several of them arrive bundled together in one zip file. Fortunately, R provides a way to load the contents of a zip file from within your program.

How Zip Files Operate

To work with a zip file from within your program, you use the unzip function. Its basic form is unzip(zipfile, files, list), where the zipfile argument is the path to the zip file being worked on. If the list argument is set to TRUE, nothing is extracted and the result is a listing of the contents of the zip file. If the files argument is given one or more file names, only the named files are unzipped; if it is left out, every file in the archive is unzipped.

Meet the unzip function

The unzip function therefore does one of two things: it returns a listing of the zip file, or it unzips the files you name. By default, a named file is unzipped into the current working directory, and unzip returns the path of the extracted file. This means that the result can be passed straight to a function that reads the file.
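As a rough illustration of the two modes, here is a minimal sketch. It builds its own throwaway archive so that it is self-contained; the file names and the temporary directory are made up for the example, and it assumes a zip command-line program is available, because utils::zip() calls out to one. The exdir argument is used to send the extracted copy somewhere other than the working directory.

```r
# Build a small throwaway archive (hypothetical file names) for the sketch.
work_dir <- tempfile("zipdemo_")
dir.create(work_dir)
csv_path <- file.path(work_dir, "Demo.csv")
write.csv(data.frame(A = c("A", "B", "C"), B = 1:3), csv_path, row.names = FALSE)
zip_path <- file.path(work_dir, "demo.zip")
# Assumption: an Info-ZIP style `zip` program is installed.
# -j stores bare file names, -q keeps it quiet.
utils::zip(zip_path, files = csv_path, flags = "-j -q")

# list = TRUE returns a data frame describing the archive; nothing is extracted.
contents <- unzip(zip_path, list = TRUE)
print(contents$Name)

# files = picks one member; exdir = controls where the extracted copy lands.
out_dir <- file.path(work_dir, "extracted")
extracted <- unzip(zip_path, files = "Demo.csv", exdir = out_dir)

# unzip() returns the path of the extracted file, which can be read directly.
df <- read.csv(extracted)
print(df)
```

Extracting into a temporary directory like this keeps the working directory clean, which may matter when a script is run repeatedly.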

Examples of reading a zip file without unzipping it

Here are three examples of accessing a zip file without unzipping it by hand first.

> contents <- unzip("demo.zip", list = TRUE)
> contents
        Name Length                Date
1  c2020.csv     79 2020-12-22 14:24:00
2 mtcars.csv   1303 2021-12-19 17:16:00
3   Demo.csv     97 2022-01-06 12:45:00

This example simply produces a listing of the contents of the zip file: one row per file, giving its name, its uncompressed size in bytes, and its date.
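Because the listing is an ordinary data frame, it can be queried before anything is extracted. The sketch below is one possible way to do that; the archive it builds and the file names in it are invented for illustration, and it assumes a zip command-line program is available for utils::zip().

```r
# Build a small throwaway archive (hypothetical file names) to inspect.
work_dir <- tempfile("zipdemo_")
dir.create(work_dir)
csv_path <- file.path(work_dir, "Demo.csv")
txt_path <- file.path(work_dir, "notes.txt")
write.csv(data.frame(A = c("A", "B"), B = 1:2), csv_path, row.names = FALSE)
writeLines("not a data file", txt_path)
zip_path <- file.path(work_dir, "demo.zip")
# Assumption: an Info-ZIP style `zip` program is installed.
utils::zip(zip_path, files = c(csv_path, txt_path), flags = "-j -q")

contents <- unzip(zip_path, list = TRUE)

# Check that a member exists before trying to read it.
has_demo <- "Demo.csv" %in% contents$Name

# Length is the uncompressed size in bytes, so the sum is the
# space needed to extract everything.
total_bytes <- sum(contents$Length)

# Name of the largest member of the archive.
largest <- contents$Name[which.max(contents$Length)]
```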

> library(readr)
> df <- read_csv(unzip("demo.zip", "Demo.csv"))

── Column specification ──────────────────────
cols(
  A = col_character(),
  B = col_double(),
  C = col_double(),
  D = col_double()
)
> df
# A tibble: 7 x 4
  A         B     C     D
1 A         1     2     2
2 B         2     3     4
3 C         3     4     6
4 D         4     5     8
5 E         5     6    10
6 F         6     7    14
7 G         7     8    16

This example extracts Demo.csv from the zip file and loads it into a data frame with read_csv from the readr package. It then prints out the contents.

> library(readr)
> df <- read_csv(unzip("demo.zip", "c2020.csv"))

── Column specification ──────────────────────
cols(
  Bob_T = col_double(),
  Tom_B = col_double(),
  Sue_C = col_double(),
  Tim_M = col_double()
)
> df
# A tibble: 6 x 4
  Bob_T Tom_B Sue_C Tim_M
1     5     6     4     0
2     2     8     6     8
3     4     4     4     1
4     0     3     7     3
5     3     7     5     5
6     1     9     2     6

This example reads another CSV file from the same zip file into a data frame. It then prints out the data frame.
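The examples above go through unzip, which writes an extracted copy of the file to disk before it is read. If no extracted copy should be left behind, one alternative is base R's unz function, which opens a connection to a single file inside the archive. The sketch below is a minimal illustration, not the method used above: it swaps in base R's read.csv for read_csv so that it needs no packages, builds its own throwaway archive with made-up file names, and assumes a zip command-line program is available for utils::zip().

```r
# Build a small throwaway archive (hypothetical file names) for the sketch.
work_dir <- tempfile("zipdemo_")
dir.create(work_dir)
csv_path <- file.path(work_dir, "Demo.csv")
write.csv(data.frame(A = c("A", "B", "C"), B = 1:3), csv_path, row.names = FALSE)
zip_path <- file.path(work_dir, "demo.zip")
# Assumption: an Info-ZIP style `zip` program is installed.
utils::zip(zip_path, files = csv_path, flags = "-j -q")
unlink(csv_path)  # remove the loose copy so only the archive holds the data

# unz() opens a connection to one member of the archive; read.csv() reads
# from it, opening and closing the connection itself.
df <- read.csv(unz(zip_path, "Demo.csv"))
print(df)

# No extracted copy was written next to the archive.
leftover <- file.exists(csv_path)
```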

Application

The main application of the unzip function is being able to access datasets that are stored in zip files. Zip files take up less space and can therefore be sent more quickly, and reading them from within R is more convenient than unzipping the file separately and moving its contents to the proper directory.
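When a zip file bundles several datasets, the listing and the reading step can be combined to load all of them in one pass. The following is only a sketch under stated assumptions: the archive and its file names are invented, base R's read.csv and unz are used so that no packages are needed, and a zip command-line program is assumed to be available for utils::zip().

```r
# Build a throwaway archive holding two CSV files and one non-CSV file
# (all file names are hypothetical).
work_dir <- tempfile("zipdemo_")
dir.create(work_dir)
paths <- file.path(work_dir, c("Demo.csv", "c2020.csv", "notes.txt"))
write.csv(data.frame(A = c("A", "B"), B = 1:2), paths[1], row.names = FALSE)
write.csv(data.frame(Bob_T = c(5, 2), Tom_B = c(6, 8)), paths[2], row.names = FALSE)
writeLines("not a data file", paths[3])
zip_path <- file.path(work_dir, "demo.zip")
# Assumption: an Info-ZIP style `zip` program is installed.
utils::zip(zip_path, files = paths, flags = "-j -q")

# Find the CSV members from the listing, without extracting anything.
contents <- unzip(zip_path, list = TRUE)
csv_names <- grep("\\.csv$", contents$Name, value = TRUE, ignore.case = TRUE)

# Read each CSV member into its own data frame, keyed by file name.
frames <- lapply(csv_names, function(name) read.csv(unz(zip_path, name)))
names(frames) <- csv_names

str(frames)
```

Keeping the data frames in a named list means each one can be retrieved by its original file name, for example frames[["Demo.csv"]].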

The unzip function is a convenient way to extract the contents of a zip file and to find out what the zip file contains. This makes it a useful addition to your programming toolbox.
