Imagine a column of data with a small set of repeating values, for instance, customer information with a single column filled as “Male” or “Female” throughout. Such data is said to have factors; in our case, there are two factors (R calls them levels), “Male” and “Female”. When you print a column that is stored as a factor, R lists its levels along with the values. Most other data types do not list the levels; they simply print the entire column.
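To make the idea concrete, here is a minimal sketch of such a column. The `gender` vector is made up for illustration and is not part of any real data set.

```r
# Hypothetical customer column, invented for this example
gender <- factor(c("Male", "Female", "Female", "Male"))

levels(gender)     # the distinct categories: "Female" "Male"
is.factor(gender)  # TRUE
print(gender)      # the values, followed by a "Levels:" line
```

Note that R sorts the levels alphabetically by default, which is why “Female” comes first.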
It is, therefore, often important to convert a data column stored as factors into numeric data. In this tutorial, I will cover several ways to do this. We’ll start with the most basic method and then progress to a few alternatives that do the same job. In the end, it is for you to decide which method works best for the data set you are using.
How to Convert Factors to Numeric Data in R?
Down below, I have created a data set which repeats the same numbers randomly and is hence factorizable.
> myData <- sample(c(2, 5, 7, 10, 12), 1000, replace = TRUE)
This data has not been stored as factors, which can be verified using the is.factor() command.
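The check can be sketched as follows. The `set.seed()` call is my own addition so that the random sample is reproducible; it is not required for the check itself.

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility
myData <- sample(c(2, 5, 7, 10, 12), 1000, replace = TRUE)

is.factor(myData)  # FALSE: this is a plain numeric vector
class(myData)      # "numeric"
```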
So our data set is not yet stored as factors. I’ll now store this data in another variable as factors. I can do that very quickly using the as.factor() command, which converts any vector into a factor. This is important for now because our goal here is to work with factors.
> FactoredData <- as.factor(myData)
Printing FactoredData shows the data values along with the factors listed as levels of the data. Now if I use the is.factor() check, it returns TRUE because the as.factor() command does the conversion for us.
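A short sketch of this step, again with a fixed seed that I have assumed for reproducibility:

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility
myData <- sample(c(2, 5, 7, 10, 12), 1000, replace = TRUE)
FactoredData <- as.factor(myData)

tail(FactoredData)       # last six values, followed by the Levels line
levels(FactoredData)     # "2" "5" "7" "10" "12"
is.factor(FactoredData)  # TRUE
```

Notice that the levels are stored as text, sorted by their numeric value.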
We now have a data set that has been factored, and we are ready to convert it into numeric data. R gives us many commands for convenient data conversions, and the as.numeric() command comes in handy for this one. However, there is one catch. You can spot it if you simply use the as.numeric() command on the data here.
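To see the catch on something small enough to read at a glance, here is a sketch using a short hand-written vector (`small` is hypothetical, drawn from the same set of values as myData):

```r
small <- c(10, 2, 12, 5, 7, 2)  # hypothetical values for illustration
f <- as.factor(small)

levels(f)      # "2" "5" "7" "10" "12"
as.numeric(f)  # 4 1 5 2 3 1: positions within levels(f), not the data
```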
Now you may be wondering where the 1’s and 2’s came from, since we never had any of these values in our original data set. The answer is simple: a factor stores each value as an integer code pointing at one of its levels, and as.numeric() returns those codes (1, 2, 3 and so on) rather than the values the levels stand for. This is usually helpful if you have non-numeric data such as True and False, or Male and Female. However, in our case, you can use a quick fix to work around this. You can first convert your data into characters and then into numeric, and this fixes the problem for us. Converting character vectors into numeric vectors is simple, and here it is exactly what we need.
The result now shows the correct values, matching your original data.
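The two-step conversion can be sketched like this, using the same kind of short hypothetical vector so that the result is easy to compare with the input:

```r
small <- c(10, 2, 12, 5, 7, 2)  # hypothetical values for illustration
f <- as.factor(small)

as.character(f)              # "10" "2" "12" "5" "7" "2"
as.numeric(as.character(f))  # 10 2 12 5 7 2, same as the original
```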
Working with Non-Numeric Factors
Now that there is some basic understanding of how factors work and how you convert them into numeric data, I would like to extend our discussion to non-numeric data and how you can work with that.
I’ll be using a built-in R data set called “warpbreaks”. It records how many times yarn breaks during weaving and categorizes each observation by wool type and thread tension. Since the tension is categorized into three types, L, M and H, we can see the factors right there, distributed into three levels.
We can now print the factors that the data is divided into.
> myData2 <- as.factor(warpbreaks$tension)
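One way to inspect the factor, sketched here with base R only:

```r
myData2 <- as.factor(warpbreaks$tension)

levels(myData2)  # "L" "M" "H"
table(myData2)   # number of observations at each tension level
```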
With our data neatly stored as a factor in the variable myData2, I can proceed to convert this variable into numeric data.
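A minimal sketch of the conversion follows. The name `tensionCodes` is my own choice, and the cross-tabulation is just one convenient way to check which code belongs to which level.

```r
myData2 <- as.factor(warpbreaks$tension)
tensionCodes <- as.numeric(myData2)  # hypothetical variable name

# Cross-tabulate the labels against the codes to see the mapping
table(myData2, tensionCodes)
```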
The result lists the L type tension as 1’s, the M type as 2’s and the H type as 3’s. In this case, we want this sort of result, whereas in the previous case we did not. Now that you know both ways and how they work, you should be able to build on this when working with other data sets.
Other Methods of Converting Factors into Numeric
As discussed earlier, R gives you many ways to perform a simple task and it is up to you to decide how you want to go about the job. When converting factors to numeric, there are numerous commands and packages that can make your life easier. I have listed some of the easier methods down below.
Convert the Levels to Numeric
A factor stores its distinct values as levels. You can see this when you print a column of your data as factors. Therefore, converting the levels into numeric and then indexing them by the factor gets the job done as well.
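A sketch of the levels approach, on a short hypothetical vector. It relies on the fact that indexing with a factor uses its integer codes.

```r
small <- c(10, 2, 12, 5, 7, 2)  # hypothetical values for illustration
f <- as.factor(small)

as.numeric(levels(f))     # 2 5 7 10 12: each level converted once
as.numeric(levels(f))[f]  # 10 2 12 5 7 2: looked up per element
```

Only the handful of levels are converted, rather than every element, so this can be a little faster on long vectors.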
The Paste Method
The paste command also comes in handy here.
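Here is a sketch of how paste() might be used for this, on a short hypothetical vector. Calling paste() on a factor returns its labels as text, so it plays the same role as as.character() in the earlier fix.

```r
small <- c(10, 2, 12, 5, 7, 2)  # hypothetical values for illustration
f <- as.factor(small)

paste(f)              # "10" "2" "12" "5" "7" "2"
as.numeric(paste(f))  # 10 2 12 5 7 2
```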
The “varhandle” Package
The package allows some very efficient and convenient conversions.
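As a rough sketch, and assuming the package is installed from CRAN, its unfactor() function can be called as shown below. The guard around the call is my own addition so that the snippet does not fail on a machine without varhandle.

```r
small <- c(10, 2, 12, 5, 7, 2)  # hypothetical values for illustration
f <- as.factor(small)

# Assumption: varhandle was installed with install.packages("varhandle")
if (requireNamespace("varhandle", quietly = TRUE)) {
  restored <- varhandle::unfactor(f)
  print(restored)
} else {
  message("varhandle is not installed; skipping this example")
}
```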
This is possibly the easiest method from this tutorial with the downside that you are required to install an additional package.
There are many more ways to convert factors into numeric codes than I can cover here, but this tutorial should equip you with the most basic and widely used methods. If you plan to dive into data science and pursue it as a career, you’ll be doing this a lot, so I encourage you to read more on this and get better at it.