Machine Learning - Steps to Build Regression Model

A number of real-life business decisions are regression problems in nature. We have prepared a detailed list of regression modelling scenarios. Business Scenario: House Price Prediction. We all want to own a house, and the price of a house is an important link between that wish and actually owning one. We want to build ... Read more

Scenarios: Binary Predictive Models

Many business decisions are binary in nature, and we will list a few such scenarios. When a decision variable (also referred to as the target variable, response variable or dependent variable) is binary (takes only two values), a long list of supervised statistical and machine learning algorithms can be used. Some of ... Read more

Logistic Regression using R: German Credit Example

Logistic Regression is one of the oldest and most widely used statistical/machine learning techniques for binary decision variable scenarios. In the previous blog, we explained the overall steps to build a predictive model using Logistic Regression. Also, if you are interested in understanding Binary Model Performance Statistics, you can read a detailed blog on Model ... Read more

10 Most Commonly Used Character Functions in SAS

In this blog, we will discuss the 10 most common text manipulation functions in SAS. These functions can be used for preparing data for text analytics or predictive model development. Length of String: LENGTH, LENGTHC and LENGTHN; Change Case of String: LOWCASE, UPCASE and PROPCASE; String Alignment: LEFT and RIGHT; Remove or Trim Leading and Trailing Blanks: TRIM ... Read more

K Means Clustering Examples and Practical Applications

Pricing Segmentation: E-retailers or e-commerce companies have taken the retail industry by storm. They are offering alluring offers and discounts. They aim to move from discount-led to convenience-led or differentiation-led offerings over a period of time. Some of them have been forced to start the journey. One of the large retailers wanted to ... Read more

Data Science for Schools and Educational Institutes

Data is pervasive, but analytics and insights are not. Schools and educational institutions are no exception. In this blog, we will discuss various ideas to leverage data and analytics for schools and institutions. Student Analytics: Schools and institutions capture various details about students and their learning. Some of the data captured are ... Read more

Analysing Count and Proportions - using PROC FREQ

From an analysis perspective, variables are either categorical or continuous (details on Variable Types). For summarising categorical variables, counts and proportions are used. SAS has the PROC FREQ procedure to summarise categorical variables. FREQ is read as the frequency of variable values. In this blog, we will explore some of the commonly used options and statements of PROC FREQ. ... Read more

Decision Tree - Credit Risk Data and Model

Decision Tree is one of the commonly used exploratory data analysis and objective segmentation techniques. A great advantage of a Decision Tree is that its output is relatively easy to understand or interpret. Introduction to Decision Tree and interpreting Decision Tree results: A simple way to understand a decision tree is that it is a hierarchical approach to partition ... Read more

Proc Sort - Options and Scenarios

In a number of SAS PROCs or Procedures, we may want to group the observations together. For example, if we are using a BY statement in PROC PRINT (All About PROC PRINT), SAS prints observations grouped by the values of the BY variable(s). When we are using Base SAS and have specified a BY statement, the input dataset has to ... Read more


In this blog, we will discuss some of the commonly used options and statements of PROC PRINT in SAS. Below are some of the common tasks that need to be done and how these can be achieved using PROC PRINT. Print a SAS dataset; Print only a few variables of a SAS dataset (VAR statement); Print ... Read more

10 Data Manipulation Scenarios and SAS Codes

Scenario 1 - Creating multiple rows from a single row in the input SAS dataset: You have a pharma shop, and each batch of a medicine has a manufacturing date and an expiry date. From the input SAS dataset with one observation for a medicine to multiple observations for each of the valid date ... Read more

Estimating Regression Beta Coefficients using Matrix Calculation

In our series of blogs on Multiple Regression, we have already shared details on the following topics, and the focus now is to show Beta Coefficient Estimation using Matrix Calculations in R: Multiple Regression Application Scenarios; Multiple Regression Tutorial; Multi-collinearity Assumption; Normality Assumption. Step by Step Explanation of Beta Coefficient Estimation in Multiple Regression. Reading Data: Independent ... Read more
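
For readers who want a feel for the matrix calculation this teaser refers to, the standard textbook form is the normal-equations estimate, beta = (X'X)^-1 X'y, where X is the design matrix with a leading column of ones for the intercept. The post itself works in R; what follows is only a minimal Python sketch under stated assumptions. The function names (`transpose`, `matmul`, `solve`, `ols_beta`) and the data are made up for illustration and are not taken from the post, and the sketch solves the linear system directly rather than forming an explicit inverse.

```python
# Illustrative sketch only: helper names and data are hypothetical,
# not taken from the original post (which uses R).

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is square and non-singular (no perfectly collinear columns).
    """
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest absolute entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def ols_beta(x_rows, y):
    """Estimate beta from the normal equations (X'X) beta = X'y."""
    X = [[1.0] + list(r) for r in x_rows]      # add intercept column
    Xt = transpose(X)
    XtX = matmul(Xt, X)                        # X'X
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]  # X'y
    return solve(XtX, Xty)

# Made-up, noise-free data generated as y = 1 + 2*x1 + 3*x2.
x_rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
y = [1 + 2 * a + 3 * b for a, b in x_rows]
beta = ols_beta(x_rows, y)
print(beta)  # intercept first, then the two slopes
```

Because the example data contain no noise, the estimated coefficients should match the generating values up to floating-point error. With real data, a statistics package (such as `lm` in R) is the safer choice, since it handles near-collinear predictors more robustly than a hand-rolled solver.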

Multiple Regression Model Tutorial

When do you use Multiple Regression? Based on scale of measurement, variables can be defined as Binary, Ordinal, Nominal and Continuous (Ratio and Interval Scale) types. When a decision (or target/dependent) variable is continuous, one of the statistical methods available for building the model is multiple regression. These types of scenarios or problems are classified ... Read more

Multiple Regression: Normality Assumption

There are a few assumptions involved in Multiple Regression, and one of these assumptions is the Normality assumption. Some of the other assumptions, discussed in other blogs, are Linearity, Multi-collinearity and Homoscedasticity. This assumption is probably the least important. The parameter estimation method, Ordinary Least Squares (OLS), is not dependent on this assumption. In my view, it ... Read more

T Test - Single Sided

We came across an interesting scenario for applying a single-sided T Test. In the previous blog, we explained the T Test and the T Test using R. As discussed, the two-sample T Test is used to compare the means of two samples. Scenario: In India, many higher educational institutions and even organisations apply criteria on class X ... Read more

Scenarios: Multiple Regression Applications

The key criterion for checking whether the multiple regression technique can be used is a continuous target (dependent) variable. Some of the scenarios and ideas are listed below. These examples are across functional areas and business verticals. Industry Vertical | Scenario | Scenario Description. Human Resource | Salary Estimate | Predicting or estimating salary of a person based ... Read more

Multiple Regression Assumption - Multi-collinearity and Auto-correlation

In the previous blog, we discussed the "Linearity" assumption in multiple regression; now we discuss Multicollinearity and Auto-correlation. What is multicollinearity? Collinearity is a relationship between two variables, and it can be between a dependent variable and an independent variable. One of the ways to measure it is the Pearson Correlation Coefficient (Correlation Analysis Overview). Multi-collinearity ... Read more

Assumptions of Linear Regression or Multiple Regression – Explained

Each of the Statistical Techniques has some underlying assumptions. Multiple Regression has a few, and we will explain and illustrate those using examples. Multiple Regression Approach: A tutorial on Multiple Regression; Examples of Multiple Regression Applications; Multiple Regression using an analytics tool (SAS/R). Some fundamental assumptions: Linear relationship; Multivariate normality; No or little ... Read more

Wealth Management and Analytics

Due to increased competition, customer expectations and regulatory requirements, there is an increased focus on data-driven analytics at Wealth Management firms. Some of the key topics and uses are listed below. Customer & Marketing Analytics: Cross-sell and up-sell analytics; Product Sequence Analysis to know how customers take up products and the linkage to their life stage; Fund Outflow ... Read more

Confidence Interval and Random Forest