Raw data is not always as neat as you would like. Sometimes it arrives in a format that is hard to make sense of, such as a long listing with one row per observation. Cross tabulation (crosstab) tables are a handy way of fixing this. Creating a crosstab in R is a simple process, but the crosstab function used here is not part of base R, so it has to be downloaded before you can use it.
Description of the function.
The crosstab function has the format crosstab(data.frame, row.vars, col.vars), along with several optional arguments. It is not one of R's built-in functions, but it is available online for inclusion in projects. You can copy and paste the function's source into your code, or you can load it with the following command:

    source("http://pcwww.liv.ac.uk/~william/R/crosstab.r")
Once loaded, the function is available for the rest of the session, but it has to be reloaded every time you reopen the project, so it is best to keep the loading command as part of your code. It is a tool worth having in your data processing tool kit.
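Because the function has to be reloaded in each session, one option is to guard the load so that it only runs when the function is not already defined. The following is a minimal sketch of that idea. The helper name load_if_missing is hypothetical and is not part of the crosstab script.

```r
# Hypothetical helper: source a script only if the named function
# is not already defined in the current session.
load_if_missing <- function(fn_name, script) {
  if (!exists(fn_name, mode = "function")) {
    source(script)   # 'script' may be a local file path or a URL
    return(TRUE)     # the script was sourced
  }
  FALSE              # the function was already available
}

# Usage (assumes the URL given in this article is reachable):
# load_if_missing("crosstab", "http://pcwww.liv.ac.uk/~william/R/crosstab.r")
```

Keeping a call like this at the top of a script means the function is loaded once per session and skipped afterwards.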
Examples of the function in action.
In these examples we use randomly generated data stored in a data frame. The first example simply prints the raw data in the data frame. The second example uses crosstab to produce a cross tabulation of the same two variables.
    # how to make a crosstab in r
    > Grade = sample(c("A","B","C","D","F"), 20, replace = TRUE)
    > School = sample(c("Public", "Private"), 20, replace = TRUE)
    > x = data.frame(Grade, School)
    > x
       Grade  School
    1      F Private
    2      D  Public
    3      F Private
    4      B Private
    5      D Private
    6      F Private
    7      B Private
    8      D Private
    9      D Private
    10     C Private
    11     F  Public
    12     F  Public
    13     B  Public
    14     D Private
    15     B  Public
    16     B  Public
    17     D  Public
    18     D Private
    19     A  Public
    20     D  Public
As you can see in this example, printing the data frame gives a lengthy row-by-row listing, and its categorical content is difficult to make sense of. What is needed is a contingency table that summarizes the categorical data in a form that is easier to understand.
    # crosstab in r example
    > Grade = sample(c("A","B","C","D","F"), 20, replace = TRUE)
    > School = sample(c("Public", "Private"), 20, replace = TRUE)
    > x = data.frame(Grade, School)
    > source("http://pcwww.liv.ac.uk/~william/R/crosstab.r")
    > crosstab(x, row.vars = "Grade", col.vars = "School")
          School Private Public Sum
    Grade
    A                  2      1   3
    B                  1      1   2
    C                  3      2   5
    D                  3      5   8
    F                  2      0   2
    Sum               11      9  20
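In this example, the cross table is built by counting each combination of the two variables. The column named in row.vars (Grade) provides the row names, and the column named in col.vars (School) provides the column names of the new frequency table. Each cell shows how often that row value occurs together with that column value, and the Sum row and column hold the totals. Note that the data is randomly sampled again in this example, so the counts do not correspond to the listing printed in the first example. Many more examples of crosstab in action can be found online.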
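If the download is not available, a similar table of counts can be sketched with base R's table() and addmargins(). This is a swapped-in base R technique rather than the crosstab function itself, and the seed value is an arbitrary assumption added so that repeated runs give the same data.

```r
set.seed(42)  # arbitrary seed (an assumption) so the result is reproducible

# Same kind of randomly generated data as in the examples above
Grade  <- sample(c("A", "B", "C", "D", "F"), 20, replace = TRUE)
School <- sample(c("Public", "Private"), 20, replace = TRUE)
x <- data.frame(Grade, School)

# Count every Grade/School combination:
# Grade supplies the rows, School supplies the columns
counts <- table(Grade = x$Grade, School = x$School)

# Append row and column totals, labelled "Sum" by default
with_totals <- addmargins(counts)
print(with_totals)
```

Since the seed is fixed, printing x and with_totals in the same session describes the same twenty observations, which makes it easier to check the table against the raw listing.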
Application of this function.
The main application of the crosstab function is reformatting information into a format that is easier to read and understand. As you can see from our two examples, the second one is a lot easier to read. It is not uncommon to find survey data in the format of our first example. Furthermore, that is a relatively simple case, since it has only two columns and twenty rows. Most real-world data sets are much larger, with more variables and more factor levels. Missing values and unused factor levels can also make it difficult to test a null hypothesis, for instance with a chi-square test. The crosstab function reduces your observations to a compact table of counts for just the variables you name, with row and column totals as shown in the example, which is a convenient starting point for categorical variable analysis.
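The effect of missing values and unused levels on a table of counts can be sketched with base R alone. The data below is hand-made and hypothetical, and the example uses table(), droplevels() and chisq.test() from base R rather than the crosstab function.

```r
# Small hand-made data set (hypothetical values, for illustration only)
survey <- data.frame(
  Grade  = factor(c("A", "B", "B", "C", NA, "A", "C", "B"),
                  levels = c("A", "B", "C", "D")),  # "D" is never observed
  School = c("Public", "Private", "Public", "Private",
             "Public", "Private", "Public", "Public")
)

# By default table() leaves out NA values but keeps the unused level "D",
# which shows up as a row of zeros
raw_counts <- table(survey$Grade, survey$School)

# useNA = "ifany" adds a row for the missing grade
na_counts <- table(survey$Grade, survey$School, useNA = "ifany")

# droplevels() removes the unused level before tabulating
clean_counts <- table(droplevels(survey$Grade), survey$School)

# Chi-square test of independence on the cleaned table; with counts this
# small R warns that the approximation may be inaccurate
result <- suppressWarnings(chisq.test(clean_counts))
print(result$p.value)
```

A row of zeros gives zero expected counts for that row, which is why the unused level is dropped before the test here.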