The “x must be an array of at least two dimensions” error can be frustrating in its unpredictability. Most type errors in R arise from a fairly obvious compatibility issue. For example, if an error complains that an argument must be a string, you can assume that you simply passed a variable of the wrong type. But this particular error is trickier because it deals with data formats that are inherently flexible. You may well have started out with a multidimensional array only for it to be flattened at some later point. As you’ll soon see, fixing the error is really just a matter of tracing the variable backward and then ensuring it’s being properly assigned, formatted, or converted.
What the Error Really Means
At the most basic level, this error is pointing out that a function in your code expects a multidimensional structure and is instead receiving something else. The most common reason is that you’ve simply passed the wrong variable. For example, you might have passed a vector rather than a two-dimensional array. However, the nature of data in R can make this more complex than that simple description suggests. In particular, what exists as a 2D array one moment might not have the same shape the next.
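As a minimal sketch of that situation, the snippet below passes first a matrix and then a plain vector to rowSums. The names m, v, and msg are illustrative only and don’t come from any library; tryCatch is used just to capture the error text instead of halting the script.

```r
# A 2 x 3 matrix has a dim attribute, so rowSums accepts it.
m <- matrix(1:6, nrow = 2)
print(dim(m))        # 2 3
print(rowSums(m))    # 9 12

# A plain vector has no dim attribute at all.
v <- 1:6
print(is.null(dim(v)))   # TRUE

# Capture the error message rather than stopping the script.
msg <- tryCatch(
  {
    rowSums(v)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The captured message should be the same complaint about needing at least two dimensions that this article is about.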
Finding the Error’s Underlying Cause
R is an incredibly popular language for people working in data science, and one of the big reasons for that popularity is how R handles data. The base language already provides a level of flexibility similar to, for example, NumPy arrays. And an R data frame is powerful in comparison to what most other languages offer by default. What’s more, R doesn’t lock you into specific structures for your variables.
The language makes it easy for third-party libraries to create custom data types. R’s popularity also means that you’re likely to find a data structure that matches your needs, and if not, it’s easy to construct your own. However, this ease of use does have one surprising downside. R can make it a little too easy to seamlessly manipulate data. You can think of it as somewhat analogous to pouring water from one container into another with a vastly different shape, and then freezing that water into a single shape. The problem is that there are times when you need your data to be in one shape, but R’s interpreter might have “poured” it into a different shape without your explicit instruction to do so.
R’s ability to seamlessly reshape or coerce data from one structure into another will usually ensure your code runs smoothly. But sometimes you’ll see errors like the current one pop up as a result of that flexibility. This can be a difficult concept for people more familiar with statically typed languages like C. R is strongly, but dynamically, typed. This means that there are instances where specific types are needed but where the environment can also shift those types around. The process is more easily understood with an example. Take a look at the following code.
ourSet <- data.frame(a = c(1, 2, 3, 4, 5),
                     b = c(6, 7, 8, 9, 10),
                     c = c(11, 12, 13, 14, 15),
                     d = c(16, 17, 18, 19, 20)
)
print(ourSet)
We begin by creating a two-dimensional data frame and assigning it to a variable named ourSet. It contains five rows and four columns. Next, we print out the contents of ourSet. That’s fairly straightforward, but what happens if we try to work with just a portion of that structure? Try swapping out the previous print statement with the following.
# This call fails: the single-column subset is a vector, not a 2D structure.
print(rowSums(ourSet[ , 1]))
You might imagine that we’d see the sum of the ourSet rows. Instead, we get the error message. And the reason is the crux of everything discussed up to this point. When we selected a single column of ourSet, R automatically simplified the result to the lowest possible dimension, a plain vector. However, rowSums needs a two-dimensional structure. The original data frame meets that requirement, but the automatic conversion triggered by subsetting it left us with an incompatible type.
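One way to confirm this diagnosis, sketched here with the same ourSet data, is to compare the class and dimensions of the full data frame against the single-column subset. The variable name firstCol is only illustrative.

```r
ourSet <- data.frame(a = c(1, 2, 3, 4, 5),
                     b = c(6, 7, 8, 9, 10),
                     c = c(11, 12, 13, 14, 15),
                     d = c(16, 17, 18, 19, 20))

# The full data frame is two dimensional.
print(class(ourSet))   # "data.frame"
print(dim(ourSet))     # 5 4

# Selecting a single column this way drops down to a plain vector.
firstCol <- ourSet[ , 1]
print(class(firstCol))         # "numeric"
print(is.null(dim(firstCol)))  # TRUE
```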
Putting It All Together To Fix the Error
Thankfully, R makes it just as easy to control data conversion as it does to create the data in the first place. The error can be fixed by simply telling the R interpreter not to “drop” the extra dimension when accessing the structure’s subset. Edit the previous example again, but this time replace the print statement with the following.
print(rowSums(ourSet[ , 1, drop = FALSE]))
R assumes that drop is TRUE by default. By manually overriding that behavior you keep the subset’s two-dimensional structure, so the data stays in the same kind of container as the original. Note that this holds true for most R containers. Base matrices, higher-dimensional arrays, and many matrix-like types from other libraries typically handle drop in the same way. When rowSums receives a data frame, it converts it to a rectangular matrix internally, so a one-column data frame still fits the structure the function expects.
Likewise, this concept holds true for other operations that access elements of the structure. For example, if you subset a matrix before using matrix multiplication, you can use drop = FALSE to make sure each operand keeps the row and column layout you intended. Notably, R will in many operations also recycle shorter vectors so that values line up. This is the flip side of the data fluidity which prompted the error in the first place. You can also manually check the data you’re working with to determine whether it needs additional formatting before moving it into your workflow. For example, you can use the dim function to check the dimensions of an array or similar container. Take a look at this minor modification to the original example.
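The same behavior can be sketched with a base matrix, assuming nothing beyond base R. The names A, rowVec, rowMat, inner, and outer2 are hypothetical. The last two steps also hint at why this matters for matrix multiplication: with the dimensions dropped, %*% treats two equal-length vectors as an inner product, while the drop = FALSE versions multiply as a column times a row.

```r
A <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)

# Default subsetting drops the row down to a plain vector.
rowVec <- A[1, ]
print(dim(rowVec))           # NULL

# drop = FALSE keeps a 1 x 2 matrix, so colSums still works.
rowMat <- A[1, , drop = FALSE]
print(dim(rowMat))           # 1 2
print(colSums(rowMat))       # 1 3

# Dropped operands: two length-2 vectors give a 1 x 1 inner product.
inner <- A[ , 1] %*% A[1, ]
print(dim(inner))            # 1 1

# Kept dimensions: (2 x 1) %*% (1 x 2) gives a 2 x 2 matrix.
outer2 <- A[ , 1, drop = FALSE] %*% A[1, , drop = FALSE]
print(dim(outer2))           # 2 2
```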
ourSet <- data.frame(a = c(1, 2, 3, 4, 5),
                     b = c(6, 7, 8, 9, 10),
                     c = c(11, 12, 13, 14, 15),
                     d = c(16, 17, 18, 19, 20)
)
# Keep the subset two dimensional.
ourNewVal <- ourSet[ , 1, drop = FALSE]
# Leave drop at its default, which returns a plain vector.
ourNewValDropped <- ourSet[ , 1]
print(dim(ourNewVal))
print(dim(ourNewValDropped))
We create ourSet again, but this time we assign the subset to ourNewVal and ourNewValDropped. One disables drop and the other leaves it enabled. Next, we print the result of running dim on each of these variables. When we disable drop, the result is a structure with five rows and one column. But when we leave drop at its default, dim returns NULL because the result is a plain vector with no dimensions.
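Building on that check, a small guard can be placed in front of functions such as rowSums. This is only a sketch: ensureTwoDim is a hypothetical helper, not part of base R or any package, and it assumes that a bare vector should be treated as a single column.

```r
# Hypothetical helper: return x unchanged if it already has two or more
# dimensions, otherwise wrap the vector in a one-column matrix.
ensureTwoDim <- function(x) {
  if (length(dim(x)) >= 2) {
    return(x)
  }
  matrix(x, ncol = 1)
}

ourSet <- data.frame(a = c(1, 2, 3, 4, 5),
                     b = c(6, 7, 8, 9, 10))

dropped <- ourSet[ , 1]          # plain vector, dim is NULL
fixed <- ensureTwoDim(dropped)   # 5 x 1 matrix

print(dim(fixed))        # 5 1
print(rowSums(fixed))    # 1 2 3 4 5
```

A guard like this treats the symptom rather than the cause, so it’s usually still worth tracing back to the subsetting step and adding drop = FALSE there.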