In data science work, you sometimes need to set up blank datasets to hold data while it is being processed. For processing algorithms, a matrix has an advantage over a data frame: every location is addressed by a row number and a column number rather than by a column name, which makes it easier to step through each location procedurally.
Description
To set up a blank matrix in R, use the matrix function in the form matrix(value, nrow = r, ncol = c), where “value” is the initial value that fills the matrix, “nrow” is the number of rows, and “ncol” is the number of columns. If you leave the argument names out, R reads the second argument as the number of rows and the third as the number of columns, so naming them avoids mix-ups. The initial value can be anything you want, but for a blank matrix it is usually zero or NA. Zeros are most common when the values will be numeric, and NA values tend to be used for blank matrices of other data types. Ultimately, use whichever initial value is most appropriate for your situation.
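As a minimal sketch of the two common fill values described above (the variable names zeros, blanks, and chars are illustrative and not part of the original text), note that the fill value also decides the storage type of the resulting matrix:

```r
# Illustrative names; matrix() and typeof() are base R.
zeros  <- matrix(0, nrow = 2, ncol = 3)              # numeric matrix of zeros
blanks <- matrix(NA, nrow = 2, ncol = 3)             # plain NA gives a logical matrix
chars  <- matrix(NA_character_, nrow = 2, ncol = 3)  # NA typed as character

typeof(zeros)   # "double"
typeof(blanks)  # "logical"
typeof(chars)   # "character"
dim(zeros)      # 2 3
```

A plain NA matrix is converted to another type as soon as a value of that type is assigned into it, so the typed NA is only needed when the type must be fixed up front.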
Explanation
To set up a matrix of zeros, use matrix(0, nrow = r, ncol = c). Called this way, the function produces a matrix with r rows and c columns filled entirely with zeros. The named arguments can appear in either order, so matrix(0, ncol = c, nrow = r), the form used in the examples below, gives the same result. Whatever value you supply as the first argument is recycled to fill every cell, so the only things you need to provide are the fill value, here zero, and the number of rows and columns you want.
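Because the base R signature is matrix(data, nrow, ncol, ...), the order only matters when the arguments are unnamed. The following sketch (with the illustrative names by_name and by_position) compares the two calling styles:

```r
# Named arguments: the order in which they are written does not matter.
by_name <- matrix(0, ncol = 3, nrow = 5)

# Positional arguments: the second is nrow and the third is ncol.
by_position <- matrix(0, 5, 3)

dim(by_name)                     # 5 3
dim(by_position)                 # 5 3
identical(by_name, by_position)  # TRUE
```

Naming nrow and ncol is the safer habit, since a swapped pair of unnamed numbers silently produces a transposed shape rather than an error.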
Examples
Here we have three examples of code using the matrix function to create a matrix of zeros. Each example illustrates a different situation where this function is being used.
> x = matrix(0, ncol = 7, nrow = 7)
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    0    0    0    0
[2,]    0    0    0    0    0    0    0
[3,]    0    0    0    0    0    0    0
[4,]    0    0    0    0    0    0    0
[5,]    0    0    0    0    0    0    0
[6,]    0    0    0    0    0    0    0
[7,]    0    0    0    0    0    0    0
This is the most basic example: a fill value of zero with fixed numbers of columns and rows.
> t = as.numeric(Sys.time())
> set.seed(t)
> A = as.integer(abs(rnorm(2))*5+2)
> A
[1] 3 5
> x = matrix(0, ncol = A[1], nrow = A[2])
> x
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    0    0    0
[5,]    0    0    0
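In this example, the numbers of columns and rows are random values computed at run time, which shows that the dimensions can come from variables rather than constants. Here the matrix has A[2] = 5 rows and A[1] = 3 columns.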
> t = as.numeric(Sys.time())
> set.seed(t)
> df = data.frame(A = as.integer(abs(rnorm(7)*10)),
+ B = as.integer(abs(rnorm(7)*10)),
+ C = as.integer(abs(rnorm(7)*10)),
+ D = as.integer(abs(rnorm(7)*10)))
> df
   A  B  C  D
1  3  1 13 18
2  6  4  5  2
3 13 15 19  4
4  8 19  7 13
5  3  3 10 24
6  2  5  4 12
7 12 23  2  7
> a = nrow(df)
> a
[1] 7
> b = ncol(df)
> b
[1] 4
> x = matrix(0, ncol = b, nrow = a)
> x
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    0    0    0    0
[5,]    0    0    0    0
[6,]    0    0    0    0
[7,]    0    0    0    0
Here we have a practical example: a matrix of zeros with the same dimensions as a data frame, with nrow(df) and ncol(df) supplying the row and column counts.
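A possible variation on this example, shown only as a sketch: the zero matrix can also carry the data frame's column names through the dimnames argument of matrix. The data frame below is a small stand-in with made-up values, not the random one from the example above.

```r
# Stand-in data frame with fixed, made-up values.
df <- data.frame(A = 1:7, B = 8:14, C = 15:21, D = 22:28)

# Zero matrix with the same shape as df; rows unnamed, columns named like df.
x <- matrix(0, nrow = nrow(df), ncol = ncol(df),
            dimnames = list(NULL, names(df)))

dim(x)       # 7 4
colnames(x)  # "A" "B" "C" "D"
```

The cells can still be reached by number, for instance x[2, 3], so the column names are an extra convenience rather than a requirement.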
Application
A primary application of a matrix of zeros is temporary storage for numbers that are being processed. The main advantage of a matrix here is that columns are accessed by column number rather than by column name, as they typically are in a data frame, which allows procedural access to individual pieces of stored data. This is most useful when a multiple-step algorithm has to go through each piece of data separately and needs a place to hold intermediate results.
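The temporary-storage pattern described above might look like the following sketch. The input values and the squaring step are placeholders chosen for illustration; any per-cell calculation could take their place.

```r
# Stand-in input data (made-up values); filled column by column.
input <- matrix(1:6, nrow = 2, ncol = 3)

# Preallocate the result as a matrix of zeros with the same shape.
result <- matrix(0, nrow = nrow(input), ncol = ncol(input))

# Visit every cell by row and column number and store a processed value.
for (i in seq_len(nrow(input))) {
  for (j in seq_len(ncol(input))) {
    result[i, j] <- input[i, j]^2  # placeholder processing step
  }
}

result
```

Creating the matrix at its full size before the loop means each step only overwrites an existing cell, instead of growing the object one value at a time.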
The matrix function is the function to use to create a matrix of zeros. Such a matrix is handy when you need storage ready for use with an existing data set. The function is straightforward and its arguments are easy to understand, so you should have little trouble learning to use it.