We’re going to walk through how to create an R data frame from scratch.
This article continues the examples started in our data frame tutorial. We’re using the ChickWeight data frame, which is included in the standard R distribution. You can load it by typing data(ChickWeight) in the R console. This data frame captures the weight of chickens that were fed different diets over a period of 21 days. If you can imagine someone walking around a research farm with a clipboard for an agricultural experiment, you’ve got the right idea.
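As a quick check that the data set is available, something like the following can be run in the console. It uses only base R functions (data(), dim(), names(), head()); nothing here is specific to this tutorial.

```r
# Load the built-in ChickWeight data set into the workspace
data(ChickWeight)

# Number of rows and columns
dim(ChickWeight)

# Column names: weight, Time, Chick, Diet
names(ChickWeight)

# First few measurements
head(ChickWeight)
```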
This series has a couple of parts – feel free to skip ahead to the most relevant parts.
- Inspecting your data
- Ways to Select a Subset of Data From an R Data Frame
- How To Create an R Data Frame
- How To Sort an R Data Frame
- How to Add and Remove Columns
- Renaming Columns
- How To Add and Remove Rows
- How to Merge Two Data Frames
Creating Your Own Data Frames
Continuing the example from our R data frame tutorial, the original data set has three attributes that we might be able to enrich with a little additional data. In addition to their weight, we know three things about the chicken measurements:
- Their Diet – a factor with 4 possible levels; perhaps we can group the diets further
- The Chick # – there are 50 chickens; perhaps they have something in common, such as breed or parents
- Time – the number of days since birth when the measurement was taken; perhaps there were things going on that we can use to segment the data
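These three attributes can be checked directly in the console. The snippet below is a minimal sketch; the variable names (n_diets, n_chicks, time_range) are made up for this illustration.

```r
data(ChickWeight)

# Diet is stored as a factor; count its levels
n_diets <- nlevels(ChickWeight$Diet)

# Chick is also a factor, with one level per bird
n_chicks <- nlevels(ChickWeight$Chick)

# Time is numeric; find the first and last measurement day
time_range <- range(ChickWeight$Time)

n_diets
n_chicks
time_range
```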
Let’s start by creating a data frame to expand on what we know about the diet.
Creating Our First R Data Frame
An R data frame is composed of “vectors”, an R data type that represents an ordered sequence of values of the same type; each vector becomes one column. In the case of the diet, we know that two nutrients vary across the 4 diet variations the chickens were fed.
- Protein – High or Low
- Vitamin – High or Low
The chickens were fed each combination of the two so that we could understand the effect of each element. We capture this by setting up a series of vectors, one per column, coding Low as 0 and High as 1:
diets <- data.frame(diet = 1:4, protein = c(0, 0, 1, 1), vitamin = c(0, 1, 0, 1))
The result looks like this:
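For reference, here is a sketch that builds the same table from separately named vectors and prints it. The vector names (diet_id, protein_level, vitamin_level) are made up for this illustration; the 0/1 values are the ones used above.

```r
# One vector per column; all three must have the same length
diet_id       <- 1:4
protein_level <- c(0, 0, 1, 1)
vitamin_level <- c(0, 1, 0, 1)

# Combine the vectors into a data frame, naming each column
diets <- data.frame(diet = diet_id,
                    protein = protein_level,
                    vitamin = vitamin_level)

print(diets)
#   diet protein vitamin
# 1    1       0       0
# 2    2       0       1
# 3    3       1       0
# 4    4       1       1

# str() shows the type of each column
str(diets)
```

Building the vectors first makes it easier to see that each column is just a vector, and that data.frame() lines them up row by row.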
This now exists in a data frame named “diets”, which we can join (at some future point) with our original data frame to enrich our data with additional attributes about each diet.
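As a preview of what that join might look like, here is a minimal sketch using base R's merge(). The names chicks and enriched are made up for this illustration, and converting the key to a factor is one choice among several; merging is covered properly later in the series.

```r
data(ChickWeight)

# Drop the extra classes ChickWeight carries so it is a plain data frame
chicks <- as.data.frame(ChickWeight)

diets <- data.frame(diet = 1:4,
                    protein = c(0, 0, 1, 1),
                    vitamin = c(0, 1, 0, 1))

# ChickWeight stores Diet as a factor, so convert our key to a factor too
diets$diet <- factor(diets$diet)

# Match the "Diet" column of chicks against the "diet" column of diets
enriched <- merge(chicks, diets, by.x = "Diet", by.y = "diet")

head(enriched)
```

Each measurement row now carries the protein and vitamin codes for its diet.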
The ability to create data frames from within your code is particularly useful in business analytics.
First, while in many cases you will be importing data from Excel, a text file, or a SQL database, you may decide to insert additional attributes you identify over the course of your research. For example, by digging around a bit in the chicken farming example, we identified that there was a mix of factors behind each diet. Further statistical analysis could identify that diets with certain components perform better than others, which could be important to future testing. Along the same lines, you might identify that certain data points are questionable (the night inspector was frequently sleeping, for example) and should be excluded from the analysis.
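Excluding questionable data points can follow the same pattern: describe them in a small data frame, then filter against it. The sketch below is purely illustrative. The questionable days (10 and 12) and the reason text are invented for the example and are not part of the ChickWeight data.

```r
data(ChickWeight)
chicks <- as.data.frame(ChickWeight)

# Hypothetical: days on which the recorded weights are not trusted
questionable <- data.frame(Time = c(10, 12),
                           reason = "inspector asleep")

# Flag the affected measurements, then keep only the trusted ones
chicks$questionable <- chicks$Time %in% questionable$Time
trusted <- chicks[!chicks$questionable, ]

nrow(chicks)   # all measurements
nrow(trusted)  # measurements kept for the analysis
```

Keeping the flag as a column, rather than deleting rows outright, leaves a record of what was excluded and why.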
The other opportunity here is the ability to run “what-if” scenarios within your analysis. For example, you may want to be able to adjust certain factors in an optimization script (more/less cost, more/less sensitivity, tighter constraints) as part of your analysis. Creating an R data frame from your script and merging it with other data is an excellent way to make these sorts of dynamic adjustments.
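A what-if table of this kind might be sketched as follows. All of the numbers and column names here (cost, cost_multiplier, adjusted_cost) are hypothetical and chosen only to show the mechanics; expand.grid() and merge() are base R functions.

```r
# Hypothetical baseline cost for each diet
base_cost <- data.frame(diet = 1:4,
                        cost = c(1.00, 1.20, 1.50, 1.80))

# Every combination of diet and a hypothetical cost adjustment
scenarios <- expand.grid(diet = 1:4,
                         cost_multiplier = c(0.9, 1.0, 1.1))

# Attach the baseline cost to each scenario and compute the adjusted cost
what_if <- merge(scenarios, base_cost, by = "diet")
what_if$adjusted_cost <- what_if$cost * what_if$cost_multiplier

what_if
```

Changing the multipliers in one place regenerates the whole scenario table, which is what makes this approach convenient for dynamic adjustments.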
Up next…sorting R data frames. Or if you want to skip ahead, see below….