When you run a Principal Components Analysis in R, you will get the “Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric” error message if one of your columns contains characters or other non-numeric values. Fortunately, the fix is simple: convert the offending column into a numeric variable by way of a factor.

### Description of the error

This error occurs because the prcomp function works only with numeric data, so every column of the data frame you pass to it must be numeric. If even one column holds character strings or another non-numeric type, prcomp stops with this error. Missing values (NA) in an otherwise numeric column do not trigger this particular message. If you need to run this kind of analysis, make sure that you give it only numeric columns.
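A quick way to see whether a data frame will trip this error is to check the type of each column before calling prcomp. The sketch below uses a small hypothetical data frame named `dat`, which is an assumption for illustration and not the data frame used later in this article.

```r
# Hypothetical data frame: one character column, two numeric columns
dat <- data.frame(
  label = c("A", "B", "C"),
  a = c(1.5, 2.0, 3.5),
  b = c(10, 20, 30)
)

# TRUE for columns prcomp can use, FALSE for the ones that cause the error
numeric_cols <- sapply(dat, is.numeric)
print(numeric_cols)

# Names of the offending columns
non_numeric <- names(dat)[!numeric_cols]
print(non_numeric)
```

Any column name that shows up in `non_numeric` needs to be converted or removed before the analysis.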

### Explanation of the error

The following example contains code that produces the error message. Note column z of the data frame.

> t = as.numeric(Sys.time())

> set.seed(t)

> z = c("A", "B", "C", "D", "E")

> x = rnorm(5)

> y = rnorm(5)

> df = data.frame(z, x, y)

> df

z x y

1 A 0.02307778 0.41365815

2 B 0.63213959 0.77502100

3 C -0.91366753 1.83374930

4 D 0.90422176 -0.09915274

5 E 0.75987927 -0.77146351

> pr = prcomp(df)

Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric

If you look at data frame df, you will notice that column z contains characters instead of numbers. This is what triggers the error message, because prcomp accepts only numeric values.
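To see why a single character column is enough to cause the failure, it helps to look at what happens when the data frame is turned into a matrix, which, as far as I understand it, prcomp does internally before centering the columns. A matrix can hold only one type, so the numbers are coerced to character strings along with the letters. The data frame `dat` below is hypothetical and kept small on purpose.

```r
# Hypothetical data frame with one character and one numeric column
dat <- data.frame(
  z = c("A", "B", "C"),
  x = c(0.1, 0.2, 0.3)
)

m <- as.matrix(dat)

# The whole matrix is character, including the column that held numbers
print(typeof(m))
print(is.numeric(m))

# colMeans refuses a character matrix; capture the message instead of stopping
res <- tryCatch(
  colMeans(m, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)
```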

### How to fix the error

Here is an example of how to fix the problem. Converting the character column into a factor and then into numbers gives prcomp a data frame in which every column is numeric, which removes the error.

> t = as.numeric(Sys.time())

> set.seed(t)

> z = c("A", "B", "C", "D", "E")

> x = rnorm(5)

> y = rnorm(5)

> df = data.frame(z, x, y)

> df

z x y

1 A 1.0158299 1.3621230

2 B -1.0393691 -0.4218296

3 C 0.1113177 0.5536360

4 D 1.8122020 1.1435097

5 E -1.3957393 0.9001602

> df2 = df

> df2$z = as.numeric(as.factor(df2$z))

> df2

z x y

1 1 1.0158299 1.3621230

2 2 -1.0393691 -0.4218296

3 3 0.1113177 0.5536360

4 4 1.8122020 1.1435097

5 5 -1.3957393 0.9001602

> pr = prcomp(df2)

> pr

Standard deviations (1, .., p=3):

[1] 1.664002 1.351141 0.469779

Rotation (n x k) = (3 x 3):

PC1 PC2 PC3

z 0.86675116 -0.4768597 -0.1461071

x -0.49435044 -0.7826437 -0.3782676

y -0.06603079 -0.4000920 0.9140932

If you compare data frames df and df2, you will see that in df2 column z is a series of numbers rather than letters. The conversion works in two steps: as.factor turns the column into a factor, and as.numeric then returns the factor's integer level codes as a numeric vector.
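The numbers produced by this conversion are level codes, not measurements. A minimal sketch with a hypothetical vector named `labels`, deliberately out of alphabetical order, shows how the codes are assigned:

```r
# Hypothetical labels, deliberately out of alphabetical order
labels <- c("B", "A", "C", "A")

f <- as.factor(labels)
print(levels(f))   # levels are the sorted unique values

codes <- as.numeric(f)
print(codes)       # each value is replaced by the position of its level
```

Because the codes follow the sorted order of the labels, the analysis will treat them as if the distances between them were meaningful. Whether that is appropriate depends on what the column represents.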

This error results from a mistake that is easy to make but also easy to fix. Make sure that every column you pass to a Principal Components Analysis is numeric, and the analysis will run without this error.
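If the non-numeric column is only an identifier, another way to pass only numeric data is to leave that column out instead of converting it. The sketch below is one possible approach under that assumption, using a hypothetical data frame `dat` with made-up values.

```r
# Hypothetical data frame: z is an identifier, x and y are measurements
dat <- data.frame(
  z = c("A", "B", "C", "D", "E"),
  x = c(1.0, -1.0, 0.1, 1.8, -1.4),
  y = c(1.4, -0.4, 0.6, 1.1, 0.9)
)

# Keep only the numeric columns, then run the analysis on those
num_only <- dat[sapply(dat, is.numeric)]
pr <- prcomp(num_only)

print(names(num_only))
print(dim(pr$rotation))
```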