Proc Sort - Options and Scenarios

In a number of SAS PROCs or Procedures, we may want to group the observations together. For example if we are using BY statement in PROC PRINT (All About PROC PRINT), SAS prints observations grouped by BY variable(s) values.

When we are using Base SAS and have specified BY statement, the input dataset have to be sorted by BY variables.

In this blog, we will discuss different options of PROC SORT.

Sort a SAS dataset

%* Sample Dataset ;
data cust;
  input custid name $ age;
 cards;
 12 Rohan 23
 11 Ronit 37
 15 Soni  27
 13 John  65
 16 Samuel 39
 12 Rohan 27
 13 John  62
 12 Rohan 37
 ;
run;
%* Sort a SAS dataset;
proc sort data=cust ;
  by custid;
run;

Sort a SAS dataset using a particular variable and it can be defined using BY statement.

Sort a Dataset and Output to a new Dataset

Using OUT=  option we can out the sorted data values  to a new dataset instead of overwriting an existing input dataset.

%* Create sorted output dataset;
proc sort data=cust out=cust1;
  by custid;
run;
Title "Input SAS dataset";
proc print data=cust noobs;
run;
Title "Sorted SAS dataset";
proc print data=cust1 noobs;
run;

Sorted SAS datasets

Sort using Multiple Variables

A SAS dataset can be sorted using multiple variables. These variables can be listed in BY statement. Based on sequence of variables in BY statement the dataset will be sorted.

%* Sorted SAS dataset with multiple variables;
proc sort data=cust out=cust1;
  by custid age;
run;

In this case first observations are arranged using "custid" and then "age" variable values. By default SAS arrange values in Ascending order.

We can arrange data in ascending order for one and descending order for other variable values.

Sort in Ascending and Descending Orders

In this example, we want output dataset to be sorted in ascending order (default option) for custid and descending order for age variable using "descending option" before variable name in BY statement.

%* Sorted SAS dataset with multiple variables;
proc sort data=cust out=cust1;
  by custid descending age;
run;
Title "Sorted Ascending for custid and descending for age";
proc print data=cust1 noobs;
run;

Descending Option

Duplicate Values and Records

A SAS dataset can be duplicate based on one or more variables, it is called duplicate keys , or it can be duplicates across all variables and this called duplicate records.

SAS has an option nodup or noduplicates which removes duplicate values if they are adjacent to each other or next to each other.

%* Remove Duplicates;
Title "Input SAS dataset";
proc print data=cust noobs;
run;
proc sort data=cust  nodup out=cust2;
  by custid;
run;
Title "Remove duplicate records";
proc print data=cust2 noobs;
run;

dumplicates - sorted on two variables v1

In this example, notice that duplicate values for "Ronit" were deleted but not for "Rohan". Since Age values were different. SAS was comparing 37 vs 27 which are difference and then 27 vs 37 which were again different, so not deleted.

Now if sort both on CustID and Age, then one of the duplicate records of "Rohan" will be deleted.

proc sort data=cust  nodup out=cust2;
  by custid;
run;
Title "Remove duplicate records";
proc print data=cust2 noobs;
run;

dumplicates - sorted on two variables v2

Now if you wanted to keep only nonduplicate for  By variable(s), we can get rid of duplicates using dupkey option.

proc sort data=cust  nodup out=cust2;
  by custid age;
run;
Title "Remove dumplicate records";
proc print data=cust2 noobs;
run;

duplicate keys

SAS has introduced a few new options in PROC SAS. Next blog will be on a new options in SAS. Wait for the update..

1 thought on “Proc Sort - Options and Scenarios”

Leave a Comment