When you run a Principal Components Analysis with prcomp(), you will get the “Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric” error message if one of your columns contains characters or other non-numeric values. Fortunately, there is a simple solution: convert the offending column into a factor, and then convert that factor into a numeric variable.
Description of the error
This error occurs because every column of the data frame you pass to a Principal Components Analysis must be numeric. The prcomp() function first converts the data frame into a matrix, and if even one column holds characters, the whole matrix becomes a character matrix. When prcomp() then centers the data, it calls colMeans(), which rejects non-numeric input and stops with our error message. Missing values (NA) in an otherwise numeric column do not trigger this particular message; they cause a different error later in the computation. So before running this kind of analysis, make sure you are giving it only numeric values.
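The mechanism can be reproduced in a few lines. This is a minimal sketch with a small made-up data frame (the name `mixed` is illustrative, not from the example below): it shows that as.matrix() turns a data frame with one character column into a character matrix, and that prcomp() then fails with the colMeans() error.

```r
# Hypothetical example data: one character column, one numeric column
mixed <- data.frame(z = c("A", "B", "C"), x = c(1.5, 2.5, 3.5))

# prcomp() converts its input to a matrix first; a single character
# column makes the entire matrix character
m <- as.matrix(mixed)
print(typeof(m))

# Centering the data relies on colMeans(), which refuses character input,
# so prcomp() stops with the "'x' must be numeric" error
msg <- tryCatch(prcomp(mixed), error = function(e) conditionMessage(e))
print(msg)
```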
Explanation of the error
The following example produces our error message. Note column z of the data frame.
> t = as.numeric(Sys.time())
> set.seed(t)
> z = c("A", "B", "C", "D", "E")
> x = rnorm(5)
> y = rnorm(5)
> df = data.frame(z, x, y)
> df
z x y
1 A 0.02307778 0.41365815
2 B 0.63213959 0.77502100
3 C -0.91366753 1.83374930
4 D 0.90422176 -0.09915274
5 E 0.75987927 -0.77146351
> pr = prcomp(df)
Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric
If you look at data frame df, you will notice that column z contains letters instead of numbers. That is what triggers our error message, because prcomp() accepts only numeric values.
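With wider data frames the offending column is not always obvious at a glance. A quick check like the following sketch lists every column that is not numeric; the values here are fixed made-up numbers standing in for the random ones above.

```r
# Same layout as the example data frame, with fixed illustrative values
df <- data.frame(z = c("A", "B", "C", "D", "E"),
                 x = c(0.1, 0.6, -0.9, 0.9, 0.8),
                 y = c(0.4, 0.8, 1.8, -0.1, -0.8))

# TRUE for numeric columns, FALSE for the ones prcomp() will reject
is_num <- sapply(df, is.numeric)
print(is_num)

# Names of the non-numeric columns
bad_cols <- names(df)[!is_num]
print(bad_cols)
```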
How to fix the error
Here is an example of how to fix this problem. Any character column can be converted into a factor, and a factor can easily be converted into numeric values because R stores each factor level as an integer code. That is what we do in this example, and it fixes the problem.
> t = as.numeric(Sys.time())
> set.seed(t)
> z = c("A", "B", "C", "D", "E")
> x = rnorm(5)
> y = rnorm(5)
> df = data.frame(z, x, y)
> df
z x y
1 A 1.0158299 1.3621230
2 B -1.0393691 -0.4218296
3 C 0.1113177 0.5536360
4 D 1.8122020 1.1435097
5 E -1.3957393 0.9001602
> df2 = df
> df2$z = as.numeric(as.factor(df2$z))
> df2
z x y
1 1 1.0158299 1.3621230
2 2 -1.0393691 -0.4218296
3 3 0.1113177 0.5536360
4 4 1.8122020 1.1435097
5 5 -1.3957393 0.9001602
> pr = prcomp(df2)
> pr
Standard deviations (1, .., p=3):
[1] 1.664002 1.351141 0.469779
Rotation (n x k) = (3 x 3):
PC1 PC2 PC3
z 0.86675116 -0.4768597 -0.1461071
x -0.49435044 -0.7826437 -0.3782676
y -0.06603079 -0.4000920 0.9140932
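If you compare data frames df and df2, you will see that in df2 column z is a series of numbers rather than letters. This was accomplished by converting the column into a factor and then converting the factor into a numeric vector. Keep in mind that these numbers are only the factor's level codes, assigned in alphabetical order, so the analysis treats the letters as if they sat on a numeric scale. That is only meaningful if the order and spacing of the codes make sense for your data.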
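To see where those numbers come from, and why the factor step is needed at all, here is a small sketch using a made-up vector. Calling as.numeric() directly on characters produces only NA values, while going through a factor yields the level codes.

```r
z <- c("C", "A", "B")

# Converting the character vector directly fails: every element becomes NA
# (R also emits a coercion warning, suppressed here)
direct <- suppressWarnings(as.numeric(z))
print(direct)

# A factor stores each level as an integer code; by default the levels
# are sorted alphabetically, so "A" gets 1, "B" gets 2, and "C" gets 3
f <- as.factor(z)
print(levels(f))
codes <- as.numeric(f)
print(codes)
```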
This error results from a simple mistake that is also easy to fix: make sure that everything you put through a Principal Components Analysis is numeric. With that one correction, the analysis runs without errors and gives you the results you are looking for.
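If the non-numeric column is not something you actually want in the analysis, another option is simply to leave it out. The sketch below uses a hypothetical helper, `numeric_only()`, that keeps only numeric columns and reports what it dropped; it is an illustration under that assumption, not a standard R function.

```r
# Hypothetical helper: keep only the numeric columns of a data frame,
# printing a message that names any columns it drops
numeric_only <- function(data) {
  is_num <- vapply(data, is.numeric, logical(1))
  if (any(!is_num)) {
    message("Dropping non-numeric columns: ",
            paste(names(data)[!is_num], collapse = ", "))
  }
  data[, is_num, drop = FALSE]
}

# Illustrative data with the same layout as the examples above
df <- data.frame(z = c("A", "B", "C", "D", "E"),
                 x = c(0.1, 0.6, -0.9, 0.9, 0.8),
                 y = c(0.4, 0.8, 1.8, -0.1, -0.8))

# prcomp() now receives only x and y
pr <- prcomp(numeric_only(df))
print(pr)
```

Dropping the column avoids assigning arbitrary numeric codes to categories, at the cost of leaving that information out of the analysis.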