10 Data Manipulation Scenarios and SAS Codes

Scenarios 1-Creating multiple rows from a single row in the input SAS dataset: You have a pharma shop and each of the medicines has manufacturing date and expiry date for each batch of a medicine. From the input SAS dataset with one observation for a medicine to multiple observations for each of the valid date (when medicine to good to be consumed/sold).

Increase Rows

data medi;
    infile cards dlm='09'x;
    informat Manufacturing_Date	Expiry_Date mmddyy10.;
    format Manufacturing_Date	Expiry_Date mmddyy10.;
	input Drug_bacth $ 	Manufacturing_Date	Expiry_Date;
cards;
Z6505	8/6/2014	8/21/2014
Z2061	5/12/2011	5/23/2011
Z4058	7/3/2015	7/15/2015
Z5645	1/29/2015	2/5/2015
Z4669	2/13/2011	2/23/2011
Z3011	9/19/2014	9/26/2014
Z2688	12/2/2011	12/7/2011
Z4864	11/30/2010	12/19/2010
Z7017	12/28/2010	1/15/2011
;
run;
data medi_valid;
	set medi;
	do i=Manufacturing_Date to  Expiry_Date by 1;
	  valid_date=i;
	  output;
	end;
	format valid_date date9.;
	drop i Manufacturing_Date	Expiry_Date;
run;

Scenario 2- Converting Row into Columns: Input data-set has row for profit for each quarters for an organisation. Wanted to compare year on year change in profit, so wanted to convert to horizontal data where each quarter  profit is a column.

Transpose Data

data profit;
 infile datalines dlm='09'x ; 
 input Company $ 	Period $ 	Profit :comma5.;
datalines;
ABC1	2014Q1	 1,765 
ABC1	2014Q2	 1,893 
ABC1	2014Q3	 1,915 
ABC1	2014Q4	 1,983 
ABC1	2015Q1	 2,001 
ABC1	2015Q2	 2,092 
ABC1	2015Q3	 2,103 
ABC2	2014Q4	 501 
ABC2	2015Q1	 567 
ABC2	2015Q2	 623 
ABC2	2015Q3	 709 
;
run;
proc sort data=profit;
  by Company period;
run;
proc transpose data=profit prefix=profit_ out=profit_tran (drop=_name_);
 by Company ;
 id period;
 var Profit ;
run;

Scenario 3- Summarising Values by Group Variable(s): Input data is at a customer and month level, we want to summarise customer behaviour data at a customer level(one transaction per customer) in the output data.

Aggregate

SAS code

Data Spend;
  infile cards dlm="09"x;
  informat month monyy6.;
  format month monyy6.;
  input Month	Spend_Amt	Spend_Cnt	CustID;
cards;
Jun-15	5691.05	24	1005
Jul-15	12240.87	24	1005
Aug-15	5124.45	11	1005
Sep-15	11438.02	20	1005
Oct-15	5646.36	25	1005
Nov-15	10811.63	37	1005
Dec-15	10301.11	21	1005
Jun-15	3234.3	11	1007
Jul-15	800.85	10	1007
Aug-15	2013.27	14	1007
Sep-15	2392.54	14	1007
Oct-15	2672.91	22	1007
Nov-15	3226.29	19	1007
Dec-15	1849.39	13	1007
;
run;

proc sql;
  create table spend_cust as
  select custid,
         max(Spend_Amt) as max_Spend_Amt,
         min(Spend_Amt) as min_Spend_Amt,
         avg(Spend_Amt) as avg_Spend_Amt,
         sum(Spend_Amt) as sm_Spend_Amt,
         avg(Spend_Cnt) as avg_Spend_Cnt,
         sum(Spend_Cnt) as sm_Spend_Cnt
  from spend
  group by custid
  ;
quit;        








 

More scenarios: Next Set

Leave a Comment