How to Fix the R Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

The “Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)” error message occurs when you run the kmeans function on a data frame that contains missing values. It is an easy error to run into if you do not control the contents of the data frame you are working with. Fortunately, it is an easy problem to fix.

Description of the error

This error message occurs when you call the kmeans function in the form kmeans(x, centers), where “x” is the data frame you are working with and “centers” is the number of clusters to create (or a matrix of initial cluster centers). When it succeeds, kmeans returns an object describing the clusters it found: their sizes, their centers, which cluster each row belongs to, and the within-cluster sums of squares. Internally, kmeans converts the data frame to a numeric matrix and passes it to compiled code, and that call refuses any NA, NaN or Inf value. The “arg 1” in the message is this data matrix, so a single missing value anywhere in the data frame is enough to stop the function with our error message. The solution is to remove the rows containing missing values from the data frame before calling kmeans.
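Because the message names NA, NaN and Inf together, it is worth seeing that an infinite value fails in exactly the same way as a missing one. The following is a minimal sketch using two small hypothetical matrices (`with_na`, `with_inf`) and a hypothetical helper `try_kmeans()` that captures the error message instead of stopping. It assumes an English locale, since R translates error messages.

```r
# Two small hypothetical data sets: one with an NA, one with an Inf value
with_na  <- cbind(x = c(1, 2, 3, 10, 11, 12), y = c(1, 2, NA, 10, 11, 12))
with_inf <- cbind(x = c(1, 2, 3, 10, 11, 12), y = c(1, 2, Inf, 10, 11, 12))

# Hypothetical helper: run kmeans() and return its error message
# (or "no error") instead of stopping the script
try_kmeans <- function(data) {
  tryCatch({
    kmeans(data, centers = 2)
    "no error"
  }, error = function(e) conditionMessage(e))
}

msgs <- sapply(list(na = with_na, inf = with_inf), try_kmeans)
print(msgs)
# Both entries contain "NA/NaN/Inf in foreign function call (arg 1)"
```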

Explanation of the error

Here we have an example of code that produces this error message. The data frame contains a single missing value.

> df = data.frame(A = c(1, 2, 5, 3, 19, 9, 5),
+ B = c(21, 15, 34, NA, 22, 16, 55),
+ C = c(20, 18, 16, 7, 9, 0, 6))
> df
   A  B  C
1  1 21 20
2  2 15 18
3  5 34 16
4  3 NA  7
5 19 22  9
6  9 16  0
7  5 55  6
> km = kmeans(df, centers = 3)
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

As you can see, this data frame has an NA value in the fourth row of column “B”, and it is this missing value that triggers our error message, because the kmeans function cannot handle missing values.
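In a larger data frame the missing value is rarely this easy to spot by eye. A short sketch, using base R's is.na() and complete.cases() on the same data as the example, shows how to find which columns and rows are affected before calling kmeans.

```r
# The example data frame from this article
df <- data.frame(A = c(1, 2, 5, 3, 19, 9, 5),
                 B = c(21, 15, 34, NA, 22, 16, 55),
                 C = c(20, 18, 16, 7, 9, 0, 6))

# Number of missing values in each column: A = 0, B = 1, C = 0
na_per_column <- colSums(is.na(df))
print(na_per_column)

# Row numbers of rows that contain at least one missing value: 4
bad_rows <- which(!complete.cases(df))
print(bad_rows)
```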

How to fix the R error

Here is a code example that fixes this problem. It is actually an amazingly easy fix.

> df = data.frame(A = c(1, 2, 5, 3, 19, 9, 5),
+ B = c(21, 15, 34, NA, 22, 16, 55),
+ C = c(20, 18, 16, 7, 9, 0, 6))
> df
   A  B  C
1  1 21 20
2  2 15 18
3  5 34 16
4  3 NA  7
5 19 22  9
6  9 16  0
7  5 55  6
> df = na.omit(df)
> df
   A  B  C
1  1 21 20
2  2 15 18
3  5 34 16
5 19 22  9
6  9 16  0
7  5 55  6
> km = kmeans(df, centers = 3)
> km
K-means clustering with 3 clusters of sizes 1, 3, 2

Cluster means:
          A        B    C
1  5.000000 55.00000  6.0
2  2.666667 23.33333 18.0
3 14.000000 19.00000  4.5

Clustering vector:
1 2 3 5 6 7
2 2 2 3 3 1

Within cluster sum of squares by cluster:
[1]   0.0000 205.3333 108.5000
 (between_SS / total_SS =  81.3 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"

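As you can see, applying the na.omit function to the data frame removes every row that contains an NA value. The first printout of the data frame includes the NA value, while the second printout, after na.omit, no longer contains row four. na.omit keeps the original row names, which is why the clustering vector jumps from 3 to 5. With the missing value gone, the error no longer occurs and kmeans runs without any difficulty. Because kmeans picks its starting centers at random, the cluster labels you get may differ from the output shown here.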
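One caveat: na.omit drops rows containing NA or NaN, but it leaves infinite values in place, and kmeans rejects those with the same message. The sketch below uses a hypothetical variant of the example data in which the last value of B is Inf, and keeps only rows whose row sum is finite. It assumes every column is numeric, which rowSums() requires.

```r
# Hypothetical variant of the example data: row 4 has an NA in B,
# and row 7 has an infinite value in B
df <- data.frame(A = c(1, 2, 5, 3, 19, 9, 5),
                 B = c(21, 15, 34, NA, 22, 16, Inf),
                 C = c(20, 18, 16, 7, 9, 0, 6))

# na.omit() drops the NA row only; the Inf row is kept (6 rows remain)
print(nrow(na.omit(df)))

# rowSums() is NA, NaN or Inf for any row holding such a value,
# so is.finite() keeps only rows where every column is finite (5 rows)
clean <- df[is.finite(rowSums(df)), ]
print(nrow(clean))

# kmeans() now runs on the cleaned data
km <- kmeans(clean, centers = 3)
print(km$cluster)
```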

This is an easy error message to get if you do not have any control over the original data frame you are using. However, it is also easy to fix by simply removing the rows containing missing values. As long as only a small share of your rows contain missing values, removing them will not cause a problem; if many rows are affected, dropping them discards a large part of your data. It is a simple problem with an even simpler fix.
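Before dropping rows, it can help to check how much of the data you would lose. This minimal sketch uses complete.cases() on the article's example data; what share of lost rows is acceptable depends on your analysis, so no threshold is assumed here.

```r
# The example data frame from this article
df <- data.frame(A = c(1, 2, 5, 3, 19, 9, 5),
                 B = c(21, 15, 34, NA, 22, 16, 55),
                 C = c(20, 18, 16, 7, 9, 0, 6))

# Share of rows that na.omit() would remove (1 of 7 rows here)
lost <- mean(!complete.cases(df))
msg <- sprintf("%.1f%% of rows contain missing values", 100 * lost)
print(msg)
# "14.3% of rows contain missing values"
```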
