Nrow & Ncol in R – Getting Dimensions of a Data Frame

In data science, it is often helpful to know the number of rows and columns in your data set. When you or your program creates the data set, you already know its dimensions. However, when you load data from another source, you need to calculate them.

Count number of rows in R

Counting the columns in a data set is usually fairly easy because the number of variables tends to be limited. Counting the rows can be more challenging because data sets can get extremely long. In a programming language that lacks a dedicated function for this task, you have to read through the data set until an end-of-file flag tells the routine to stop counting. When you count the number of rows in R, the process is greatly simplified because you can use a single function.

Nrow in R – Counting Rows

The function nrow in R has the form nrow(dataset). This will return the number of rows in the data set. This replaces the need for complicated solutions such as iterating across the elements to count them.

# Nrow in R
> x=data.frame(mtcars)
> nrow(x)
 [1] 32

Using this example of the mtcars data set, nrow in R shows that this data set has thirty-two rows. This tells us that the data set contains thirty-two cars, and we learned that without opening it up and counting them. The simplicity of this example is demonstrated by the fact that it takes only two lines of code, including retrieving the data set.
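To see what nrow saves you, here is a rough sketch that counts rows the manual way (reading a file line by line until nothing is left) and then with nrow. The temporary CSV file and the variable names are assumptions made for illustration only.

```r
# Write mtcars to a temporary CSV file so there is something to read.
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path, row.names = FALSE)

# Manual approach: read one line at a time until the end of the file.
con <- file(path, open = "r")
invisible(readLines(con, n = 1))   # skip the header line
manual_count <- 0
while (length(readLines(con, n = 1)) > 0) {
  manual_count <- manual_count + 1
}
close(con)

# Single-function approach: load the file and ask for the row count.
loaded <- read.csv(path)
function_count <- nrow(loaded)

manual_count == function_count     # both approaches agree
```

Both counts come out the same, but the nrow version needs no loop and no bookkeeping.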

Ncol in R – Counting Columns

The function ncol in R has the form ncol(dataset). This returns the number of columns in the data set and is an efficient way to find the dimensions of a data frame whose structure you do not know in advance. You will use this frequently when writing utility functions to iterate across the columns of a data frame.

Counting the columns in this data set is reduced to a single function call.

# ncol in R
> x=data.frame(mtcars)
> ncol(x)
 [1] 11

Using this example of the mtcars data set, ncol in R shows that this data set has eleven columns. What this tells us is that each car has eleven data points or specifications about it in the data set.
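As a small illustration of iterating across columns, the sketch below uses ncol to size a loop that computes the mean of every column in mtcars. The variable names are chosen for this example, and it assumes every column is numeric (which is true for mtcars).

```r
x <- mtcars

# Pre-allocate one slot per column, however many columns there are.
col_means <- numeric(ncol(x))

# seq_len(ncol(x)) yields 1, 2, ..., ncol(x), and nothing at all if
# the data frame has zero columns, so the loop is safe either way.
for (j in seq_len(ncol(x))) {
  col_means[j] <- mean(x[[j]])
}

names(col_means) <- names(x)
col_means
```

In everyday work the built-in colMeans function does the same job; the loop is shown only to make the role of ncol visible.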

Applications of Nrow & Ncol in R


As mentioned before, the two R functions can play a critical role in helping abstract your work into more “generic” functions. The number of rows (nrow in R) and number of columns (ncol in R) are required to control iteration across the rows and columns of a data frame.

# ncol in r, nrow in r - examples
> x=data.frame(mtcars)
> nrow(x)
 [1] 32
> ncol(x)
 [1] 11

As our example from the mtcars data set shows, there are eleven columns of data and thirty-two rows. This translates into thirty-two cars and eleven specifications for each car. This layout (one row per observation, one column per variable) is the typical pattern, but the author of a data set can arrange it however they want.
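The following sketch shows one way nrow and ncol can control iteration in a generic function. The function name count_missing is hypothetical and made up for this example; it visits every cell of whatever data frame it receives and counts the missing values.

```r
# Hypothetical helper: counts NA cells in any data frame.
count_missing <- function(df) {
  missing <- 0
  for (i in seq_len(nrow(df))) {      # one pass per row
    for (j in seq_len(ncol(df))) {    # one pass per column
      if (is.na(df[i, j])) {
        missing <- missing + 1
      }
    }
  }
  missing
}

count_missing(mtcars)       # mtcars has no missing values
count_missing(airquality)   # airquality has some NA entries
```

Because the loop bounds come from the data frame itself, the same function works on both data sets even though their dimensions differ. For real work, sum(is.na(df)) is a shorter way to get the same number.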

For example, I’ve written several “helper” utilities for ODBC SQL connections. These simplify the process of accepting data from a SQL query by automating the conversion of the query results into an R data frame. This is useful when the design of the query is changing rapidly (as the business users ask us to include additional fields in the analysis). With the helper functions, we only need to make changes in the query. The helper function analyzes the data the query returns and performs any required conversion or clean-up to fit it into an R data frame.
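The helpers themselves are not shown here, so the following is only a hypothetical sketch of the idea. The function clean_query_result and the sample data are invented for illustration, and a hand-built data frame stands in for whatever a database call would return.

```r
# Hypothetical helper, not part of any package: tidies a query result
# without knowing in advance how many columns it has.
clean_query_result <- function(result) {
  result <- as.data.frame(result, stringsAsFactors = FALSE)

  # ncol() sizes the loop, so added or removed fields need no code change.
  for (j in seq_len(ncol(result))) {
    if (is.character(result[[j]])) {
      result[[j]] <- trimws(result[[j]])   # strip stray whitespace
    }
  }

  message(nrow(result), " rows and ", ncol(result), " columns returned")
  result
}

# Stand-in for a query result (assumed data, for illustration only).
raw <- data.frame(
  id = 1:3,
  region = c(" east", "west ", " north "),
  stringsAsFactors = FALSE
)
clean <- clean_query_result(raw)
```

If a field is later added to the query, the loop simply runs one more time; nothing in the helper refers to a fixed column count.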

Coding Style – When to Use Dynamic Sizing

Matrix multiplication can be tricky, especially if you’re dealing with a multiple-step process. As a matter of form, you should assume that someone is going to change your data (add a column, change a column name, add rows, update a row name, introduce a missing value) and rearrange the steps of your process. This renders many matrix multiplication operations very brittle, unless you write the program to dynamically size any loops based on the number of rows (nrow function) and columns (ncol function). This gives you a way to protect your matrix function from sketchy data.
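A minimal sketch of dynamic sizing, assuming a hand-written multiplication routine (the function name multiply and the sample matrices are made up for this example). The loop bounds and the shape check all come from nrow and ncol, so nothing is hard-coded.

```r
# Hypothetical routine: multiplies two matrices with explicit loops.
multiply <- function(a, b) {
  # The inner dimensions must match, whatever the sizes happen to be.
  stopifnot(ncol(a) == nrow(b))

  out <- matrix(0, nrow = nrow(a), ncol = ncol(b))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(ncol(b))) {
      out[i, j] <- sum(a[i, ] * b[, j])
    }
  }
  out
}

a <- matrix(1:6, nrow = 2)    # 2 rows, 3 columns
b <- matrix(1:12, nrow = 3)   # 3 rows, 4 columns
result <- multiply(a, b)

# Compare against R's built-in matrix product.
all.equal(result, a %*% b)
```

If someone adds a row to a or a column to b, the function adapts without edits, and a mismatch in the inner dimensions stops it with an error instead of producing a wrong answer. In practice you would use the built-in %*% operator; the loops are written out only to show where nrow and ncol come in.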

The same applies to higher-level program design. You’ll drop and add steps in any analysis, which will throw off hard-coded column number or row number references. You may even change the dimensions yourself (look at rbind and cbind, which bolt additional rows and columns onto your data frame). The more you future-proof your R code using self-contained functions, the lower the risk of error.
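To illustrate, the sketch below grows a copy of mtcars with cbind and rbind and uses ncol to find the last column instead of a fixed index. The helper last_col, the kpl column, and its approximate conversion factor are assumptions made for this example.

```r
cars <- mtcars

# Hypothetical helper: name of the last column, found via ncol()
# rather than a hard-coded position such as 11.
last_col <- function(df) names(df)[ncol(df)]

last_col(cars)                                 # "carb"

# Bolt on a new column (approximate kilometres per litre).
cars <- cbind(cars, kpl = cars$mpg * 0.4251)

# Bolt on two more rows by repeating the first two.
cars <- rbind(cars, cars[1:2, ])

nrow(cars)                                     # 34
ncol(cars)                                     # 12
last_col(cars)                                 # "kpl"
```

A reference such as cars[, 11] would still point at carb after the change, while last_col follows the data frame as it grows.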
