Create Empty Data Frame in R – With or Without Column Names

As you move into advanced R programming, you will want to be able to initialize an empty data frame for use in more complicated procedures. Fortunately, R offers several ways to create an empty data frame depending on your situation and needs.

Create Empty Data Frame From Existing Data Frame

Suppose you have an existing data frame with a lovely naming convention that you have grown very attached to. (We leave it as an exercise for the reader to determine why you are so attached to your data frames. Perhaps it was initialized for you by a friend? Or maybe you dislike creating new data frames.) In any event, the simplest solution is to remove all the rows, as shown below:

mere_husk_of_my_data_frame <- originaldataframe[FALSE,]

In the blink of an eye, the rows of your data frame will disappear, leaving the neatly structured column headings and their data types ready for the next adventure. Indexing the rows with FALSE selects none of them, while the empty column position after the comma keeps every column. Flip commentary aside, this is actually very useful when dealing with large and complex datasets. Cloning a properly formatted (and vetted) data frame and emptying the clone is a great way to reduce the frustration associated with processing data updates and similar files.
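As a quick check of the idea above, here is a minimal sketch. The sample rows and the column names are made up for illustration; only the `[FALSE, ]` indexing comes from the example above.

```r
# Hypothetical source data frame, standing in for your vetted original
originaldataframe <- data.frame(
  Date = as.Date(c("2024-01-01", "2024-01-02")),
  customer = c("alice", "bob"),
  sale = c(19.99, 5.00),
  stringsAsFactors = FALSE
)

# Keep every column, select no rows
mere_husk_of_my_data_frame <- originaldataframe[FALSE, ]

# Inspect the result: zero rows, same column names and classes
str(mere_husk_of_my_data_frame)
```

The original data frame is untouched, so the empty clone can be filled independently.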

Initializing an Empty Data Frame From Scratch

Next up: initializing an empty data frame from scratch, while naming columns and defining data types. We’re going to create a hypothetical list of transactions, including the date, who bought it, what was sold, and the sale price. Each of these has a different data type, of course.

df <- read.csv(text = "Date,customer,prodid,sale", colClasses = c("Date", "character", "integer", "numeric"))

This approach uses a couple of clever shortcuts. First, you can initialize the columns of a data frame through the read.csv function. By default the function treats the first row of its input as the header; in this case, the text argument replaces the actual file with a comma-delimited string that contains only that header row. Second, we provide the column classes via the colClasses argument, using a vector that we initialize inline. You can still rename columns at a later date, or drop a particular field if you decide you do not need it.
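For comparison, the same structure can also be built from zero-length typed vectors with data.frame(). The sketch below is one alternative rather than the only way to do it; it reuses the column names from the example above, and the new name `product_id` is made up for illustration. It also shows renaming and dropping a column afterwards, plus the bare case with no columns at all.

```r
# Empty data frame with named, typed columns
df <- data.frame(
  Date = as.Date(character()),
  customer = character(),
  prodid = integer(),
  sale = numeric(),
  stringsAsFactors = FALSE
)

# Rename a column after the fact
names(df)[names(df) == "prodid"] <- "product_id"

# Drop a field that turned out to be unnecessary
df$customer <- NULL

# Empty data frame without any column names: zero rows and zero columns
no_columns <- data.frame()

str(df)
dim(no_columns)
```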

Initializing Empty Data Frames – Practical Applications

So having created our empty data frame, we can potentially fill it by querying an SQL database. This is a common practice in industry, particularly commercial analytics, where scripting your extracts from the corporate transaction databases is a great way to speed up your process.
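A rough sketch of that fill pattern follows. To keep it runnable without a database, the function `fetch_batch()` is a hypothetical stand-in for whatever query call your database client provides, and the rows it returns are invented.

```r
# Start from the empty, typed data frame
transactions <- data.frame(
  Date = as.Date(character()),
  customer = character(),
  prodid = integer(),
  sale = numeric(),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for a database query that returns one day's rows
fetch_batch <- function(day) {
  data.frame(
    Date = as.Date(day),
    customer = c("alice", "bob"),
    prodid = c(101L, 102L),
    sale = c(19.99, 5.00),
    stringsAsFactors = FALSE
  )
}

# Append each batch to the running result
for (day in c("2024-01-01", "2024-01-02")) {
  transactions <- rbind(transactions, fetch_batch(day))
}

str(transactions)
```

One design point to keep in mind: rbind drops zero-row data frames before combining, so the column types in the result come from the batches themselves. The empty data frame documents the intended layout, but each batch still needs to arrive with the right names and types.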

Regarding database access, a really clever type could have a little fun with the header record that most databases will provide you. The header describes the field names and the data types of the query results. You can use that to automatically configure column names and data types. I had an Oracle => Python function which automatically performed this for any query results, scanning the results of whatever came back from Oracle and automatically converting the fields and their content into a relevant data type. It was a tremendous time saver for a system that I hit several times per day as a pricing analyst.
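The same idea can be sketched in R. The example assumes the header information has already been reduced to a named character vector of column name to type name; that `schema` format, the helper `empty_from_schema()`, and the short list of supported types are all assumptions made for illustration, not something a particular database driver hands you in this form.

```r
# Build an empty data frame from a named vector of type names
empty_from_schema <- function(schema) {
  cols <- lapply(schema, function(type) {
    # Map each type name to a zero-length vector of that type
    switch(type,
      Date = as.Date(character()),
      character = character(),
      integer = integer(),
      numeric = numeric(),
      logical = logical(),
      stop("unsupported type: ", type)
    )
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Hypothetical schema, as it might be derived from a query header
schema <- c(Date = "Date", customer = "character", prodid = "integer", sale = "numeric")
df_from_schema <- empty_from_schema(schema)

str(df_from_schema)
```

Stopping on an unknown type name, rather than guessing, makes a mismatch between the database and the mapping visible right away.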

A similar approach can be used when working with web scraping results. This permits you to set up the base data frame and invest your time in developing code to unpack and QA the contents of what your web scraping queries return to you.