You can use the Chi Square test in R to evaluate the association between two categorical variables. For the purposes of this exercise, we’re going to use a common marketing application: did a different share of prospects accept a new offer that we are testing than accepted our current offer?
Assume we have a larger data set (every prospect and their information, including whether they responded). We can boil this down to a high-level table that counts the records in each combination of offer and outcome. We will feed this table into the chi-square test in R to assess whether there is a statistically significant relationship between the two categorical variables.
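As a rough sketch of that boiling-down step, the snippet below builds a per-prospect data frame and tabulates it with table(). The data frame name prospects, its column names, and the way the rows are generated are assumptions made for illustration; only the four counts come from this example.

```r
# Hypothetical raw data: one row per prospect.
# The names "prospects", "offer" and "outcome" are assumptions for this sketch.
prospects <- data.frame(
  offer = factor(rep(c("old", "new"), times = c(5040, 5065)),
                 levels = c("old", "new")),
  outcome = factor(c(rep(c("accept", "reject"), times = c(40, 5000)),
                     rep(c("accept", "reject"), times = c(65, 5000))),
                   levels = c("accept", "reject"))
)

# Cross-tabulate offer against outcome to get the 2x2 table of record counts
summarycounts <- table(offer = prospects$offer, outcome = prospects$outcome)
print(summarycounts)
```

A table built this way can be passed straight to chisq.test(), just like one typed in by hand.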
# Chi Square test in R example; data setup
> recordcounts <- as.table(rbind(c(40, 5000), c(65, 5000)))
> dimnames(recordcounts) <- list(offer = c("old","new"), outcome=c('accept','reject'))

# Chi Square test in R example; inspect data
> recordcounts
     outcome
offer accept reject
  old     40   5000
  new     65   5000

# Chi Square test in R example; run test
> chisq.test(recordcounts)

        Pearson's Chi-squared test with Yates' continuity correction

data:  recordcounts
X-squared = 5.424, df = 1, p-value = 0.01986
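To recap, we selected two groups of roughly 5000 prospects (one for each offer). The offer was presented, resulting in a binary outcome (accept, reject). We tallied the number of each outcome into the table above. Note that, as entered, the table records 5000 rejections per offer, so the groups total 5040 prospects for the old offer and 5065 for the new one.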
Visual inspection suggests the new offer might have done well (yielded 65 acceptances against 40 acceptances with our current champion). But is this real or an artifact of chance? We run a chi-square test to gain perspective.
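To put numbers on that visual inspection, one option is to convert the counts into acceptance rates per offer. The sketch below rebuilds the example table so it runs on its own; the variable names rates and acceptpct are illustrative.

```r
# Rebuild the summary table from the example so this block runs on its own
recordcounts <- as.table(rbind(c(40, 5000), c(65, 5000)))
dimnames(recordcounts) <- list(offer = c("old", "new"),
                               outcome = c("accept", "reject"))

# Row-wise proportions: within each offer, accept + reject sums to 1
rates <- prop.table(recordcounts, margin = 1)

# Acceptance rate per offer, as a percentage rounded to two decimals
acceptpct <- round(100 * rates[, "accept"], 2)
print(acceptpct)  # roughly 0.79 for old and 1.28 for new
```

The gap of about half a percentage point is what the chi-square test is asked to judge against chance.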
With a p-value below .02, we can reject the null hypothesis that offer and outcome are unrelated, and will most likely accept that the new offer worked. (Typical alpha is .05 or .025, depending on the standards of your employer.)
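If you would rather make that decision in code than read it off the printout, the object returned by chisq.test() exposes the pieces directly. This is a minimal sketch: the alpha value is an assumed threshold, and the variable names result and significant are illustrative.

```r
# Rebuild the summary table from the example so this block runs on its own
recordcounts <- as.table(rbind(c(40, 5000), c(65, 5000)))
dimnames(recordcounts) <- list(offer = c("old", "new"),
                               outcome = c("accept", "reject"))

alpha <- 0.05  # assumed significance threshold; use your own standard

# Run the test once and keep the result object
result <- chisq.test(recordcounts)

# Compare the p-value against the chosen threshold
significant <- result$p.value < alpha
print(result$p.value)
print(significant)

# Expected counts under independence; a common rule of thumb
# is that each cell should be at least 5
print(result$expected)
```

Checking result$expected is a quick sanity test that the chi-square approximation is reasonable for your table.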