Analysing Counts and Proportions using PROC FREQ

From an analysis perspective, variables are either categorical or continuous. Categorical variables are summarised using counts and proportions, and SAS provides the FREQ procedure (PROC FREQ) for this purpose. FREQ is short for frequency, that is, how often each value of a variable occurs.

In this post, we will explore some of the commonly used options and statements of PROC FREQ.

PROC FREQ can be used for analysis and validation when the analysis variables are categorical.

It helps in:

  • Displaying the count of each variable value, i.e. the distribution of the variable
  • Finding missing values or the % of missing values of a categorical variable
  • Creating a cross table or contingency table for two variables
  • Analysing multi-dimensional tables (three or more variables)
  • Testing for association between variables using the Chi-Square test
  • Calculating overall %, row %, column % and cumulative % along with counts
  • Computing exact test statistics
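
As a minimal sketch of the missing-value check listed above, assuming the London dataset and Sex variable used in the examples in this post: by default, PROC FREQ leaves missing values out of the table and reports them only as a "Frequency Missing" count below it. The MISSING option on the TABLE statement treats missing as a level of its own, so it appears in the table and is included in the percentages.

proc freq data=london;
  /* Default: missing values are excluded from the table and the percentages */
  table sex;
  /* MISSING: missing values are shown as a separate level and counted in the percentages */
  table sex / missing;
run;

Comparing the two tables side by side is a quick way to see how much of a categorical variable is unpopulated.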

PROC FREQ produces a one-way frequency table for a single categorical variable. Let us consider a scenario to see where PROC FREQ is needed.

Context: We have information about the athletes who participated in the London Olympics. The available variables include Name, Gender, State, Event Participated and the medal won.

Scenario 1: You may want to know the count of athletes from each country. This will help you understand the most & least represented countries.

Scenario 2: You may ask how the gender distribution differs across countries. Does country A have a higher % of “Female” athletes than other countries?

Scenario 3: Next, is there an association between Country and Gender? In other words, can we say that the share of female athletes depends on the country?
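
The three scenarios could be sketched roughly as follows. This is only an illustration under assumptions: the variable name Country is hypothetical (the post does not give the actual column name), and Sex is the gender variable used in the later examples.

/* Scenario 1: count of athletes per country, most represented first */
proc freq data=london order=freq;
  table country;          /* "country" is an assumed variable name */
run;

/* Scenarios 2 and 3: gender split within each country, plus a test of association */
proc freq data=london;
  /* Keep only counts and row percentages (% of each sex within a country) */
  table country*sex / nopercent nocol;
  /* CHISQ requests the Chi-Square test of association */
  table country*sex / chisq;
run;

When many countries have only a handful of athletes, the expected cell counts become small and SAS warns that the Chi-Square test may not be valid, so the result should be read with that in mind.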

Find Count and % of Variable Values

In PROC FREQ, the TABLE statement (written TABLES in the SAS documentation) names the variable(s) whose level frequencies are to be calculated. Here, we want the count of male and female athletes, based on the Sex variable in the dataset London.

proc freq data=London;
  table sex;
run;

[Figure: default PROC FREQ output]

By default, PROC FREQ produces the Frequency (count), Percent, Cumulative Frequency and Cumulative Percent statistics. The NOPERCENT option suppresses the percentages, and the NOCUM option suppresses the cumulative columns.

[Figure: PROC FREQ percent output]
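
A short sketch of these two options, assuming the same London dataset and Sex variable as above. Both are written after a slash on the TABLE statement.

proc freq data=london;
  /* Drop the cumulative frequency and cumulative percent columns */
  table sex / nocum;
  /* Drop the percentages as well, leaving only the counts */
  table sex / nopercent nocum;
run;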

Control Order of the Table

If we want to control the order in which the variable values appear, we can use the ORDER= option. By default (without the ORDER= option), the values are listed in order of the variable's internal (unformatted) values.

[Figure: PROC FREQ order output]
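
As a sketch of the default behaviour, assuming the London dataset and Sex variable from the earlier example: ORDER=INTERNAL is what PROC FREQ uses when ORDER= is omitted, so the first step below should match a plain PROC FREQ call. ORDER=DATA is shown for contrast; it lists values in the order they first appear in the dataset.

/* Same as omitting ORDER=: sort by the unformatted values */
proc freq data=london order=internal;
  table sex;
run;

/* List values in the order they are first encountered in the data */
proc freq data=london order=data;
  table sex;
run;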

Sometimes we want the values with the highest counts on top; this can be achieved using the ORDER=FREQ option.

proc freq data=london order=freq;
  table Age;
run;

[Figure: PROC FREQ output with ORDER=FREQ]

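We can group the age values by defining a custom format (using PROC FORMAT) and, with ORDER=FORMATTED, list the groups in order of their formatted values. The numeric prefixes in the labels (1:, 2:, ...) keep the groups in age order when the labels are sorted as text.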

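proc format;
  /* The < excludes the upper endpoint, so each age falls in exactly one group */
  value agef
    low-<15  = "1:<15"
    15-<20   = "2:15-20"
    20-<30   = "3:20-30"
    30-<40   = "4:30-40"
    40-<50   = "5:40-50"
    50-high  = "6:50+"
    other    = "Others"   /* e.g. missing ages */
  ;
run;

/* ORDER=FORMATTED sorts the rows by the formatted labels */
proc freq data=london order=formatted;
  table Age;
  format age agef.;
run;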

[Figure: output with formatted values]

Creating Contingency or Cross Tab using Two Variables

The TABLE statement can also generate a cross tab (contingency table) for two variables. The two categorical variables have to be separated by "*". Here, we build a cross tab between the age groups and the sex of the athletes.

proc freq data=london order=formatted;
  /* Age groups as rows, Sex as columns */
  table Age*Sex;
  format age agef.;
run;

[Figure: cross tab output]

By default, this produces the Frequency (count), Percent, Row Percent and Column Percent in each cell. To suppress the percentages, we can use the NOPERCENT, NOROW and NOCOL options.

[Figure: cross tab with counts only]
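
A counts-only cross tab could be requested as sketched below, assuming the London dataset and the agef format defined earlier. Dropping any one of the three options brings the corresponding percentage back into the cells.

proc freq data=london order=formatted;
  /* Suppress overall %, row % and column %, leaving only the cell counts */
  table Age*Sex / nopercent norow nocol;
  format age agef.;
run;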

In some cases, we may want a list view instead of a cross tab. We can get this using the LIST option in the TABLE statement.

proc freq data=london order=formatted;
  table Age*Sex/nopercent nocol norow list;
  format age agef. ;
run;

[Figure: PROC FREQ list output]

In the next set of posts, we will focus on some additional options and statistical applications of PROC FREQ.

 
