Retain Statement - Explained with Examples


Author: Rameshwari, who completed the SAS for Data Analytics training course at DnI Institute


SAS programs are made up of statements, and each statement ends with a semicolon (;). One of the important statements in SAS is the RETAIN statement.

Why do we need the RETAIN statement?

An important point about SAS programs is that, in most cases, the DATA step processes data sequentially: SAS reads observations one by one from an input dataset into the Program Data Vector (PDV), executes the remaining statements in the step, and writes the contents of the PDV to the output dataset. At any point in this processing, the PDV holds only one observation from the input dataset. If we want to keep values from a previous observation, we have to instruct SAS to do so, and this is where the RETAIN statement becomes important. The LAG function is another way to get the previous value of a variable.

Let's look at how the DATA step works with the Program Data Vector.

Figure: Retain statement steps

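An important point to note is that at the start of each iteration, before the next observation is read, SAS by default resets variables created in the DATA step (for example, by assignment statements) to missing. Variables read with the SET statement are not reset, but they are overwritten as soon as the next observation is read. So, if we do not want SAS to reset the value of a variable, we have to use the RETAIN statement.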
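To see this reset behaviour for yourself, here is a minimal sketch using a hypothetical three-row dataset (the names months, pdv_demo, kept_count and reset_count are made up for illustration). kept_count is listed in a RETAIN statement, while reset_count is only assigned, so SAS sets it back to missing at the start of every iteration. The PUTLOG _ALL_ statement writes the PDV contents to the log on each iteration.

```sas
/* Hypothetical input: three months, one observation each */
data months;
  input month $ @@;
  cards;
Jan Feb Mar
;
run;

data pdv_demo;
  set months;
  retain kept_count 0;                 /* RETAIN: value carried into the next iteration */
  kept_count = kept_count + 1;         /* becomes 1, 2, 3 */
  reset_count = sum(reset_count, 1);   /* reset to missing each iteration, so always 1 */
  putlog _all_;                        /* show the PDV in the log */
run;
```

In the output, kept_count runs 1, 2, 3 while reset_count is 1 on every row, because it starts each iteration as missing and the SUM function ignores that missing value.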

When do we need to retain values?

Here are a few scenarios where we may want to retain values.

Scenario 1: The input dataset has a retailer's monthly sales, with one observation per month, so it has two variables: month and sales value. We want to find the increase in sales over the previous month, so we compare the current month's sales with the previous month's sales (the previous observation).
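A minimal sketch for Scenario 1, assuming a hypothetical dataset monthly_sales with variables month and sales (the values are invented for illustration). It computes the increase over the previous month twice: once with RETAIN, by saving each month's sales in a retained variable prev_sales, and once with the LAG function mentioned earlier. Both give a missing increase for the first month, since there is no previous observation.

```sas
/* Hypothetical monthly sales; values are for illustration only */
data monthly_sales;
  input month $ sales;
  cards;
Jan 100
Feb 120
Mar 110
Apr 150
;
run;

data sales_change;
  set monthly_sales;
  retain prev_sales;                   /* no initial value: missing before the first month */
  increment = sales - prev_sales;      /* missing for the first month */
  increment_lag = sales - lag(sales);  /* same result using the LAG function */
  prev_sales = sales;                  /* remember this month's sales for the next iteration */
run;
```

With this data the increments are missing, 20, -10 and 40. LAG is called on every iteration here; calling it conditionally would change which value it returns.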

Scenario 2: We have monthly spend for a customer and want to find the cumulative spend. For each month, we take the cumulative spend up to the previous month and add the current month's spend.

Scenario 3: We have a transaction-level dataset and want to find the number of transactions a customer makes before their spend reaches $500.
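A sketch for Scenario 3, assuming a hypothetical transactions dataset with variables customer_id and amount, already sorted by customer and in transaction order. Here we count transactions up to and including the one at which the customer's cumulative spend first reaches $500; if "before" should exclude that transaction, subtract one. BY-group processing with FIRST.customer_id resets the retained counters for each new customer.

```sas
/* Hypothetical transactions, sorted by customer in transaction order */
data transactions;
  input customer_id $ amount;
  cards;
C1 200
C1 150
C1 250
C1 100
C2 100
C2 50
;
run;

data txn_to_500;
  set transactions;
  by customer_id;
  retain cum_amount txn_count reached;  /* keep running values across observations */
  if first.customer_id then do;         /* new customer: start the counters again */
    cum_amount = 0;
    txn_count = 0;
    reached = 0;
  end;
  if not reached then do;               /* stop counting once $500 is reached */
    cum_amount = cum_amount + amount;
    txn_count = txn_count + 1;
    if cum_amount >= 500 then do;
      reached = 1;
      output;                           /* one row per customer who reaches $500 */
    end;
  end;
  keep customer_id txn_count cum_amount;
run;
```

With this sample data, customer C1 reaches $500 on the third transaction (cumulative 600), while C2 never reaches $500 and so gets no row.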

How does the SAS RETAIN statement work?

Let us create a dataset named spend with two variables, month and spend. Because we are entering the raw data directly in the program, we use the CARDS statement.

Once you run the program, you will see that a dataset named spend has been created in the WORK library with 6 observations and 2 variables: month (we used the $ sign because month is a character variable; we could have used an informat instead) and spend.
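The article's original code is not reproduced here, so this is a minimal sketch of such a DATA step. The six monthly values are hypothetical and used only for illustration.

```sas
/* Hypothetical monthly spend values, entered as raw data */
data spend;
  input month $ spend;   /* $ marks month as a character variable */
  cards;
Jan 100
Feb 200
Mar 150
Apr 300
May 250
Jun 400
;
run;
```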

Now we want to find the cumulative total spend across months.

We create a dataset named cum_spend. The SET statement reads observations from the input dataset spend one at a time, and a new variable, cum_spend, is created to hold the total spend of all months so far.

In the PDV, the value of cum_spend is missing at the start of each iteration, so when SAS executes the statement that adds spend to cum_spend, the spend value is added to a missing value and the result is missing.

So, the output dataset has a variable cum_spend, but all of its values are missing.
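A sketch of this first attempt, using hypothetical monthly spend values (re-created at the top so the block runs on its own). Because cum_spend is created by an assignment statement, it is reset to missing at the start of every iteration, so the sum is always missing.

```sas
/* Hypothetical spend data, re-created so this block is self-contained */
data spend;
  input month $ spend @@;
  cards;
Jan 100 Feb 200 Mar 150 Apr 300 May 250 Jun 400
;
run;

data cum_spend;
  set spend;
  cum_spend = cum_spend + spend;   /* cum_spend is missing here on every iteration */
run;
```

SAS typically also writes a note to the log saying that missing values were generated as a result of performing an operation on missing values.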

We could set cum_spend to zero for the first observation of the dataset.

With that change, cum_spend is 0 for the first observation, so when the addition statement is executed, cum_spend takes the value of spend.

But for the second observation, SAS again resets cum_spend to missing at the start of the iteration.

So, the missing value of cum_spend is added to spend, the result is missing, and that missing value is assigned to cum_spend.
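A sketch of this second attempt, assuming the first observation is detected with the automatic variable _N_ (the exact code of the original is not shown). The spend data is hypothetical and re-created so the block runs on its own.

```sas
/* Hypothetical spend data, re-created so this block is self-contained */
data spend;
  input month $ spend @@;
  cards;
Jan 100 Feb 200 Mar 150 Apr 300 May 250 Jun 400
;
run;

data cum_spend;
  set spend;
  if _n_ = 1 then cum_spend = 0;   /* only affects the first iteration */
  cum_spend = cum_spend + spend;   /* missing again from the second iteration on */
run;
```

Only the first row gets a value (100 with this data); every later row is missing.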

If we want to carry the cumulative spend over from the previous observation, we have to use the RETAIN statement and initialise cum_spend to zero.
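A minimal sketch of the RETAIN version, again with hypothetical spend values. RETAIN cum_spend 0 sets the initial value once, before the first iteration, and stops SAS from resetting cum_spend to missing between iterations.

```sas
/* Hypothetical spend data, re-created so this block is self-contained */
data spend;
  input month $ spend @@;
  cards;
Jan 100 Feb 200 Mar 150 Apr 300 May 250 Jun 400
;
run;

data cum_spend;
  set spend;
  retain cum_spend 0;              /* initialise once; keep the value across iterations */
  cum_spend = cum_spend + spend;   /* 100, 300, 450, 750, 1000, 1400 with this data */
run;
```

If spend could itself be missing, writing cum_spend = sum(cum_spend, spend); keeps the running total from turning missing, because the SUM function ignores missing arguments.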

Figure: Output dataset with the RETAIN statement working

 
