External data sets are rarely designed with your project in mind. Various reason codes are created, for ancient and unknown purposes, and we stumble upon them when we’re trying to solve a practical problem. If you have worked with manufacturing or transaction systems, this will sound very familiar.
Your project will be a lot more comprehensible if you rename the factors into terms that the average person can understand. So you rename the transaction override code “P” to “Pricing Override” or item code 867-5309 to the more familiar: “Cheetos bulk pack”.
How To Change Factor Levels in R
For this exercise, we’re going to use the warpbreaks data set that ships with the standard R installation. This is manufacturing data that records how often the warp yarn breaks on a weaving loom. The experiment looks for differences across the materials (wool type) and machine settings (tension). This sort of question is very common in manufacturing: optimizing the machine and raw materials to limit scrap.
# sample data - rename factor levels r
> head(warpbreaks)
  breaks wool tension
1     26    A       L
2     30    A       L
3     54    A       L
4     25    A       L
5     70    A       L
6     52    A       L
We have two factors (wool, tension). We want to rename the factor levels in R so they are easier to understand. Let’s take a look at their values:
# look at factor levels in r for wool
> levels(warpbreaks$wool)
[1] "A" "B"

# look at factor levels in r for tension
> levels(warpbreaks$tension)
[1] "L" "M" "H"
So in terms of factor levels, we have two types of wool and three tension settings for the machine. Perhaps the machine factor levels would be far easier to understand if we called them Low, Medium, and High.
We can accomplish this with a simple vector operation.
# Change the Levels of a Factor in R
> levels(warpbreaks$tension) <- c("Low","Medium","High")

# validate that we renamed the factor levels in R
> levels(warpbreaks$tension)
[1] "Low"    "Medium" "High"

# a view of the final data set after we change factor levels in R
> head(warpbreaks)
  breaks wool tension
1     26    A     Low
2     30    A     Low
3     54    A     Low
4     25    A     Low
5     70    A     Low
6     52    A     Low
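The vector assignment above is positional: the new names are matched to the existing levels in order, so a vector written in the wrong order would silently mislabel the data. As a hedged alternative, here is a minimal sketch that maps each old level to its new name explicitly, using the named-list form of `levels<-` from base R. The copy name `wb` is only an illustration, and `datasets::warpbreaks` is used so the sketch starts from the original, unrenamed data.

```r
# Work on a copy of the original data set (wb is a hypothetical name)
wb <- datasets::warpbreaks

# Named list: new label on the left, existing level on the right
levels(wb$tension) <- list(Low = "L", Medium = "M", High = "H")

# Confirm the new labels
print(levels(wb$tension))
```

With this form, any existing level left out of the list becomes NA, so it is worth checking the result with `levels()` or `table()` afterwards.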
While this makes your data set a bit more verbose, renaming factor levels is a great way to make your project much more readable. This is a particularly useful thing to do if you work in industry or are an R consultant.
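To illustrate the readability gain, here is a small sketch of a summary table built on the renamed factor. It assumes a fresh copy of the data (the names `wb` and `summary_by_tension` are made up for this example) and uses `aggregate()` from base R to average the break counts for each tension setting.

```r
# Start from the original data and rename the tension levels on a copy
wb <- datasets::warpbreaks
levels(wb$tension) <- c("Low", "Medium", "High")

# Mean number of breaks for each tension setting
summary_by_tension <- aggregate(breaks ~ tension, data = wb, FUN = mean)
print(summary_by_tension)
```

The rows of the resulting table are labelled Low, Medium, and High rather than L, M, and H, which is easier to hand to someone who has never seen the raw codes.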