Chi Square Test using SAS

A chi-square test is a statistical method for testing the association between two categorical variables (especially nominal variables). Which test to use depends on the types of the variables involved.

Correlation Analysis: Used when both variables are continuous. It can be done using the Pearson correlation coefficient.

ANOVA: Used when one variable is categorical and the other is continuous. It tests how the levels of the categorical variable affect the mean of the continuous variable (see One Way ANOVA Tutorial).

In this blog, the focus is on the chi-square test and specifically on how to run it in SAS. In previous blogs, we explained the chi-square calculations and gave a detailed explanation of chi-square as a test of association. You may want to read those blogs first.

Some scenarios where the chi-square test applies are:

Scenario 1: Whether certain customer segments prefer a particular interaction channel. A bank has 5 customer segments and 3 interaction channels (Branch, Call Center, and Online Chat). The bank's management wants to check whether there is an association between customer segment and channel used.

Scenario 2: Association between race and marital status in the US. People in the US belong to different races (such as White, Black, Hispanic, and Asian) and have different marital statuses (such as Bachelor, Married, Widowed, and Divorced). A chi-square test can be used to check whether race and marital status are related.

Chi-Square Test using SAS

Now we want to use SAS to perform a chi-square test. SAS has a procedure called PROC FREQ, which produces frequency tables for a single variable or for a combination of variables.
In the TABLE statement of this procedure, the CHISQ option requests the chi-square statistic along with a few other statistics.

Scenario: We have customer segment and savings product type information for a bank's customers. We want to find out whether there is an association between customer segment and the type of savings product customers take.

We first need to read the data. The dataset has 3 variables.

%* Chi Square;
data chi_square;
length segment product $32;
infile cards dlm="," ;
input segment $ product $ custCount;
cards;
Premium,Money Market,198
Advance,Money Market,79
Core,Money Market,914
Mass-Market,Money Market,155
Premium,Online Saving,61
Advance,Online Saving,8
Core,Online Saving,122
Mass-Market,Online Saving,33
Premium,Fixed Deposit,200
Advance,Fixed Deposit,132
Core,Fixed Deposit,1332
Mass-Market,Fixed Deposit,212
;
run;

Each row holds the count of customers for one combination of customer segment and product type, so the dataset is already aggregated. The rows do not carry equal importance, so we use the WEIGHT statement to weight each observation by the customer count variable (custCount).

We are not interested in the row, column, and overall percentages, so we suppress them with the NOROW, NOCOL, and NOPERCENT options in the TABLE statement.

proc freq data=chi_square;
 table segment*product/nocol norow nopercent;
  weight custCount;
run;

Chi-square cross tab
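To see what the WEIGHT statement is doing, here is a minimal sketch that expands the aggregated data to one row per customer and builds the same cross tab without any weights. The names chi_square_long and i are only illustrative and are not part of the original example.

```sas
/* Sketch: expand the aggregated data to one row per customer.         */
/* chi_square_long and i are illustrative names, not from the article. */
data chi_square_long;
  set chi_square;
  do i = 1 to custCount;
    output;              /* write one row for each customer in the cell */
  end;
  drop i custCount;
run;

/* No WEIGHT statement here: every row already represents one customer. */
proc freq data=chi_square_long;
  table segment*product / nocol norow nopercent;
run;
```

The cell counts should match the weighted cross tab, which is why the aggregated dataset with WEIGHT is usually the more convenient form.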

Now we want the chi-square statistics, which we get by using the CHISQ option in the TABLE statement.

proc freq data=chi_square;
 table segment*product/nocol norow nopercent chisq;
  weight custCount;
run;

Chi Square Statistics using SAS

DF is the degrees of freedom, calculated as (Number of Rows - 1) * (Number of Columns - 1). In the above example there are 4 segments and 3 products, so DF = (4 - 1) * (3 - 1) = 6.
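The DF also determines the cutoff that the chi-square value is compared against. As a small sketch, assuming a 5% significance level (the level is our assumption, not something stated above), the critical value can be obtained with the CINV function:

```sas
/* Sketch: critical value of the chi-square distribution for this table. */
/* The 5% significance level is an assumption made for illustration.     */
data _null_;
  df   = (4 - 1) * (3 - 1);   /* (rows - 1) * (columns - 1) = 6      */
  crit = cinv(0.95, df);      /* 95th percentile of chi-square(df)   */
  put df= crit=;              /* written to the SAS log              */
run;
```

For 6 degrees of freedom the critical value is about 12.59, so a chi-square value above that would point to an association at the 5% level.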

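Value is the chi-square statistic, calculated from the observed and expected counts. The formula is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed count and E is the expected count in a cell, and the sum runs over all cells of the table.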
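As a rough worked illustration for one cell of the table above (Premium, Money Market), assuming the expected count E is computed in the usual way as row total times column total divided by the grand total, and O is the observed count:

$$E = \frac{459 \times 1346}{3446} \approx 179.28, \qquad \frac{(O - E)^2}{E} = \frac{(198 - 179.28)^2}{179.28} \approx 1.95$$

Here 459 is the total for the Premium segment, 1346 is the total for Money Market, and 3446 is the total number of customers in the data. Adding up this quantity over all 12 cells gives the chi-square value.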
If you want to see the detailed calculation, we have worked it out in Excel: Chi-Square Test - Worked Out Example.
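The same pieces can also be displayed in SAS itself. As a sketch, the EXPECTED and CELLCHI2 options of the TABLE statement add the expected count and each cell's contribution to the chi-square value to the cross tab:

```sas
/* Sketch: show expected counts and per-cell chi-square contributions. */
proc freq data=chi_square;
  table segment*product / expected cellchi2 nocol norow nopercent chisq;
  weight custCount;
run;
```

The cell contributions in this output should add up to the chi-square value reported by the CHISQ option.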
