Python - Data Manipulation Scenarios and Questions

In this blog, we list a few data manipulation scenarios and examples from data science projects. These examples can propel your Python learning for Data Science. Data manipulation is one of the most significant activities in any Data Science or Predictive Modeling project. If you have any scenarios or examples, do share them with us ...

Visualisation using R - Commonly used functions

We will discuss some of the commonly used Base R graphics functions: plot for line charts and scatter plots, boxplot for box-and-whisker plots of a continuous variable or of its distribution across groups, and hist for histograms. Scatter Plot: we will create sample data points and then use them for the scatter ...

Powerful Proc Tabulate Explained

In this blog, we will use PROC TABULATE, one of the most powerful SAS procedures for data summarization. To create a summary table (similar to a Pivot Table in Excel), we need to define classification variables, also called dimensions; measurement variables, also called facts; and the structure of the output summary table. PROC FREQ, PROC ...
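
Here is a minimal Python sketch of a pivot-style summary that makes the dimensions-and-facts idea concrete. It is not PROC TABULATE syntax. The records, field names and the `summary_table` helper are hypothetical, and the statistic is assumed to be a sum.

```python
from collections import defaultdict

# Hypothetical records: "region" and "product" act as classification variables
# (dimensions); "sales" acts as the measurement variable (fact).
records = [
    {"region": "North", "product": "A", "sales": 100},
    {"region": "North", "product": "B", "sales": 50},
    {"region": "South", "product": "A", "sales": 70},
    {"region": "North", "product": "A", "sales": 30},
]

def summary_table(rows, row_var, col_var, measure):
    """Sum `measure` for every (row_var, col_var) combination, like pivot table cells."""
    table = defaultdict(float)
    for r in rows:
        table[(r[row_var], r[col_var])] += r[measure]
    return dict(table)

print(summary_table(records, "region", "product", "sales"))
```

Each key of the result is one cell of the table: one level of each dimension, holding the summed fact.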

Concatenating Datasets in SAS

Author: Mrinmoy Saikia. Data preparation is one of the most significant steps in Data Science, Analytics or Reporting projects. In this blog, we focus on combining two or more SAS datasets vertically, which is also referred to as concatenating datasets: placing two or more datasets one below another. From ...
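
The sketch below is a rough Python analogue of vertical concatenation, stacking two lists of records. In SAS this is typically done by listing the datasets in a single SET statement. The datasets and the `concatenate` helper are hypothetical. Filling absent columns with None is meant to mirror how SAS leaves a variable missing for observations from a dataset that does not contain it.

```python
def concatenate(*datasets):
    """Stack datasets (lists of dicts) one below another.

    Columns absent from a dataset are filled with None, mirroring how SAS
    sets such variables to missing in the combined dataset.
    """
    # Union of all column names, in order of first appearance
    columns = []
    for ds in datasets:
        for row in ds:
            for col in row:
                if col not in columns:
                    columns.append(col)
    # Rows keep their original order: all of the first dataset, then the next, ...
    return [{col: row.get(col) for col in columns} for ds in datasets for row in ds]

jan = [{"id": 1, "sales": 100}, {"id": 2, "sales": 80}]
feb = [{"id": 3, "sales": 120, "region": "North"}]
result = concatenate(jan, feb)
for row in result:
    print(row)
```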

Data Science: Profile Screening Model for Mid-Management Roles

Business Context: The client was an executive search firm that had built a database of over a million candidate profiles. The client wanted to leverage this database for smarter candidate selection and recruitment. The aim of this project was to build a predictive model that would help identify a list of ...

SAS Control Statements

In this blog, we will discuss some of the SAS control statements with examples. IF/WHERE Statement: an IF or WHERE statement is used to select observations. A WHERE condition is applied while observations are read from the input dataset, whereas an IF condition is applied in the Program Data Vector (PDV). Using the WHERE statement may improve the efficiency of ...

Chi Square Test using SAS

A chi-square test is a statistical method for testing the association between two categorical variables (especially nominal variables). Correlation Analysis: when both variables are continuous, the association can be measured using the Pearson Correlation Coefficient. ANOVA: one variable is categorical and the other is continuous. Finding how levels of the categorical variable ...
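
The chi-square statistic is easy to compute by hand. Take an r x c table of observed counts O. The expected count under independence is E = (row total x column total) / n. The Pearson statistic, the one PROC FREQ reports with the CHISQ option, is the sum of (O - E)^2 / E over all cells, with (r - 1)(c - 1) degrees of freedom. Below is a minimal Python sketch. The helper name and the table are hypothetical, and converting the statistic to a p-value is left to a statistics package.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

stat, dof = chi_square([[10, 20], [30, 40]])
print(round(stat, 4), dof)  # 0.7937 1
```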

ANOVA using SAS and Example

Analysis of Variance (ANOVA) is used to compare means across multiple samples. The focus here is only on one-way ANOVA, although similar concepts can be applied to other scenarios in a few different ways. If the number of samples or groups is one or two, we can use a T Test (T Test using SAS). Using one categorical variable, ...
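
One-way ANOVA compares the variation between group means with the variation within groups. For k groups and n observations, F = (SS_between / (k - 1)) / (SS_within / (n - k)). Below is a minimal Python sketch with hypothetical data. The function name is illustrative, and the p-value lookup from the F distribution is omitted.

```python
def one_way_anova(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```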

Retain Statement - Explained with Examples

Author: Rameshwari, who completed the SAS for Data Analytics Training course at DnI Institute. SAS programs are made up of statements, and each statement ends with a semicolon (;). One of the important statements in SAS is the RETAIN statement. Why do we need the RETAIN statement? An important point about SAS programs is that in the majority of cases SAS ...
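
RETAIN matters because, by default, variables created in the DATA step (for example, by assignment statements) are reset to missing at the start of each iteration. As a result, a running total never accumulates. The Python sketch below only imitates that behaviour. It is an analogy, not SAS code, and `sas_add`, `total_without_retain` and `total_with_retain` are hypothetical helpers.

```python
def sas_add(a, b):
    """Mimic the SAS '+' operator: the result is missing (None) if either operand is missing."""
    return None if a is None or b is None else a + b

def total_without_retain(values):
    out = []
    for v in values:
        total = None               # non-retained variable: reset to missing every iteration
        total = sas_add(total, v)  # missing + value stays missing
        out.append(total)
    return out

def total_with_retain(values, initial=0):
    out = []
    total = initial                # like RETAIN total 0; the initial value is set once
    for v in values:
        total = sas_add(total, v)  # value carried over from the previous observation
        out.append(total)
    return out

sales = [100, 150, 120]
print(total_without_retain(sales))  # [None, None, None]
print(total_with_retain(sales))     # [100, 250, 370]
```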

Machine Learning for Retailers

The retail industry has been at the forefront of adopting Data Science and Machine Learning. Some of the common applications of Data Science and Machine Learning are listed below. These are certainly not exhaustive, and smart professionals are bringing new applications every day. Some of the commonly used Machine Learning and Data Science techniques are Logistic Regression ...

Machine Learning - Steps to Build Regression Model

A number of real-life business decisions are regression problems in nature. We have prepared a detailed list of regression modelling scenarios. Business Scenario: House Price Prediction. We all want to own a house, and the price of a house is an important link between that wish and actually owning one. We want to build ...
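
For a single predictor, the least-squares line has a closed form: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. The sketch below applies this formula in plain Python to hypothetical area and price numbers. A real house price model would normally use several predictors, and the helper names are illustrative only.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x_new):
    """Predicted value on the fitted line."""
    return intercept + slope * x_new

area = [10, 15, 20, 25]   # hypothetical house areas (hundreds of sq ft)
price = [25, 35, 45, 55]  # hypothetical prices (illustrative units)
intercept, slope = fit_simple_regression(area, price)
print(intercept, slope)                # 5.0 2.0
print(predict(intercept, slope, 30))   # 65.0
```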

Scenarios: Binary Predictive Models

Many business decisions are binary in nature, and we list a few such scenarios. When the decision variable (also referred to as the target, response or dependent variable) is binary (takes only two values), a long list of supervised statistical and machine learning algorithms can be used. Some of ...

Logistic Regression using R: German Credit Example

Logistic Regression is one of the oldest and most widely used Statistical/Machine Learning techniques for scenarios with a binary decision variable. In the previous blog, we explained the overall steps for building a predictive model using Logistic Regression. Also, if you are interested in understanding Binary Model Performance Statistics, you can read a detailed blog on Model ...

10 Most Commonly Used Character Functions in SAS

In this blog, we will discuss the 10 most commonly used text manipulation functions in SAS. These functions can be used to prepare data for text analytics or predictive model development. Length of string: LENGTH, LENGTHC and LENGTHN. Change case of string: LOWCASE, UPCASE and PROPCASE. String alignment: LEFT and RIGHT. Remove or trim leading and trailing blanks: TRIM ...
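
The main difference between the length and alignment functions is how each one treats blanks. Below are Python sketches of a few of them, written from the documented SAS behaviour for trailing blanks. Names such as `sas_length` are hypothetical helpers, not part of any library.

```python
def sas_length(s):
    """Like SAS LENGTH: length excluding trailing blanks; a blank string returns 1."""
    trimmed = s.rstrip(" ")
    return len(trimmed) if trimmed else 1

def sas_lengthc(s):
    """Like SAS LENGTHC: length including trailing blanks."""
    return len(s)

def sas_lengthn(s):
    """Like SAS LENGTHN: length excluding trailing blanks; a blank string returns 0."""
    return len(s.rstrip(" "))

def sas_left(s):
    """Like SAS LEFT: move leading blanks to the end, keeping the total length."""
    return s.lstrip(" ").ljust(len(s))

def sas_right(s):
    """Like SAS RIGHT: move trailing blanks to the start, keeping the total length."""
    return s.rstrip(" ").rjust(len(s))

def sas_trim(s):
    """Like SAS TRIM: drop trailing blanks; a blank string returns a single blank."""
    trimmed = s.rstrip(" ")
    return trimmed if trimmed else " "

# LOWCASE and UPCASE map onto str.lower() and str.upper(); PROPCASE is only
# roughly str.title(), since the two treat word delimiters differently.
print(sas_length("abc  "), sas_lengthc("abc  "), sas_lengthn("   "))  # 3 5 0
print(repr(sas_left("  ab")), repr(sas_right("ab  ")))              # 'ab  ' '  ab'
```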

K Means Clustering Examples and Practical Applications

Pricing Segmentation: E-retailers and e-commerce companies have taken the retail industry by storm with alluring offers and discounts. Over time, they aim to move from discount-led offerings to convenience- or differentiation-led offerings, and some of them have already been forced to start that journey. One of the large retailers wanted to ...
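
Below is a minimal sketch of k-means (Lloyd's algorithm) in plain Python, applied to one-dimensional price points. The prices are hypothetical and the helper name is illustrative. Initialising with the first k points is a simplification; practical work would use a better initialisation (such as k-means++) and usually several variables.

```python
def kmeans_1d(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on one-dimensional values."""
    centroids = [float(p) for p in points[:k]]  # simplistic initialisation: first k points
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters

prices = [199, 249, 299, 1999, 2199, 2499]  # hypothetical product price points
centroids, clusters = kmeans_1d(prices, k=2)
print(clusters)  # [[199, 249, 299], [1999, 2199, 2499]]
```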

Data Science for Schools and Educational Institutes

Data is pervasive, but analytics and insights are not, and schools and educational institutions are no exception. In this blog, we will discuss various ideas for leveraging data and analytics in schools and institutions. Student Analytics: schools and institutions capture various details about students and their learning. Some of the data captured are ...

Analysing Count and Proportions - using PROC FREQ

From an analysis perspective, variables are either categorical or continuous (details on Variable Types). Categorical variables are summarised using counts and proportions, and SAS provides the PROC FREQ procedure for this. FREQ stands for the frequency of variable values. In this blog, we will explore some of the commonly used options and statements of PROC FREQ. ...
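
A default one-way PROC FREQ table has four columns: Frequency, Percent, Cumulative Frequency and Cumulative Percent. These can be mimicked in a few lines of Python. This is a sketch with hypothetical data, not PROC FREQ itself, and levels are simply sorted by value.

```python
from collections import Counter

def one_way_freq(values):
    """Frequency, percent, cumulative frequency and cumulative percent per level."""
    counts = Counter(values)
    n = len(values)
    table, cum = [], 0
    for level in sorted(counts):
        freq = counts[level]
        cum += freq
        table.append((level, freq, 100 * freq / n, cum, 100 * cum / n))
    return table

for row in one_way_freq(["A", "B", "A", "C", "A", "B"]):
    print("%-3s %4d %7.2f %4d %7.2f" % row)
```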

Decision Tree - Credit Risk Data and Model

A Decision Tree is one of the commonly used techniques for exploratory data analysis and objective segmentation. A great advantage of a Decision Tree is that its output is relatively easy to understand and interpret. Introduction to Decision Tree and interpreting Decision Tree results: a simple way to understand a decision tree is that it is a hierarchical approach to partition ...
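
One common way for a tree to choose a partition is to try every candidate threshold and keep the one that leaves the child nodes purest. The sketch below measures purity with Gini impurity. That choice is our assumption; other trees use entropy or chi-square based criteria such as CHAID. The income/default data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(x, y):
    """Find the threshold on numeric x that minimises the weighted Gini impurity of y."""
    values = sorted(set(x))
    best = (None, gini(y))  # (threshold, impurity); start from the unsplit node
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2   # candidate threshold halfway between adjacent values
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if impurity < best[1]:
            best = (t, impurity)
    return best

income = [20, 25, 30, 60, 70, 80]  # hypothetical incomes (thousands)
default = [1, 1, 1, 0, 0, 0]       # hypothetical default flags
print(best_split(income, default))  # (45.0, 0.0)
```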

Proc Sort - Options and Scenarios

In a number of SAS procedures (PROCs), we may want to group observations together. For example, if we use a BY statement in PROC PRINT (All About PROC PRINT), SAS prints observations grouped by the values of the BY variable(s). When we use Base SAS and specify a BY statement, the input dataset has to ...
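
Python's `itertools.groupby` behaves much like BY-group processing: it only groups consecutive rows that share a key, so the data must be sorted first. The records and the `by_groups` helper below are hypothetical. There is one difference to note. SAS typically stops with an error when the data are not sorted by the BY variables, whereas `groupby` silently returns fragmented groups.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"region": "North", "sales": 100},
    {"region": "South", "sales": 80},
    {"region": "North", "sales": 50},
]

def by_groups(data, key):
    """Return (key value, rows) pairs for consecutive runs of the same key value."""
    return [(k, list(g)) for k, g in groupby(data, key=itemgetter(key))]

unsorted_groups = by_groups(rows, "region")
sorted_groups = by_groups(sorted(rows, key=itemgetter("region")), "region")
print([k for k, _ in unsorted_groups])  # ['North', 'South', 'North']
print([k for k, _ in sorted_groups])    # ['North', 'South']
```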