
R Fuzzy String Match

Here is a very cool solution for detecting fraud cases in which the same name or the same address is used to create different entries. While doing risk consulting and due diligence work, I've come across problems where we have to check: genuine data vs dummy data; whether an employee is also involved as a vendor; whether some relative of an employee ...
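The post itself works in R, and the matching function it uses is cut off in this excerpt. As a hedged illustration of the idea only, here is a Python sketch that scores every employee name against every vendor name with the standard library's difflib.SequenceMatcher and flags pairs above a threshold. The names, the normalization step and the 0.85 threshold are all hypothetical choices, not taken from the post.

```python
from difflib import SequenceMatcher
from itertools import product


def normalize(text: str) -> str:
    # Lower-case and collapse whitespace so formatting differences don't hide a match
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_matches(employees, vendors, threshold=0.85):
    """Return (employee, vendor, score) pairs whose similarity meets the (assumed) threshold."""
    hits = []
    for emp, ven in product(employees, vendors):
        score = similarity(emp, ven)
        if score >= threshold:
            hits.append((emp, ven, round(score, 3)))
    # Strongest matches first, so the most suspicious pairs are reviewed first
    return sorted(hits, key=lambda h: h[2], reverse=True)


# Hypothetical employee and vendor master lists
employees = ["Rajesh Kumar", "Anita Sharma"]
vendors = ["RAJESH  KUMAR", "Sharma Traders", "Rajesh Kumaar"]
for hit in fuzzy_matches(employees, vendors):
    print(hit)
```

The same comparison works for addresses; in practice the threshold would need tuning against known genuine and duplicate records.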


R - Generate Random Names - Random Number - Random Date

This tutorial focuses on generating random values and creating a dummy dataset. If you understand the business and its attributes well, this tutorial will help you create a dummy dataset. Basically, there are three types of attributes: numeric/integer, character/factor, and dates. Below is the code which helps you to generate any type ...
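The tutorial's own code is in R and is not shown in this excerpt. As a rough Python analogue, the sketch below builds a small dummy dataset covering the three attribute types (numeric/integer, character/factor, dates) with the standard random and datetime modules. The name pools, column names, value ranges and seed are all hypothetical.

```python
import random
from datetime import date, timedelta

# Hypothetical value pools for the character/factor attributes
FIRST_NAMES = ["Amit", "Priya", "Rahul", "Sneha", "Vikram"]
LAST_NAMES = ["Gupta", "Iyer", "Khan", "Mehta", "Singh"]
REGIONS = ["North", "South", "East", "West"]


def random_date(start: date, end: date, rng: random.Random) -> date:
    # Pick a uniformly random day between start and end (both inclusive)
    return start + timedelta(days=rng.randint(0, (end - start).days))


def make_dummy_rows(n: int, seed: int = 42):
    rng = random.Random(seed)  # seeded so the dummy data is reproducible
    rows = []
    for i in range(1, n + 1):
        rows.append({
            "id": i,
            "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",  # character
            "region": rng.choice(REGIONS),                                  # factor
            "sales": round(rng.uniform(1000, 50000), 2),                    # numeric
            "units": rng.randint(1, 100),                                   # integer
            "order_date": random_date(date(2020, 1, 1), date(2020, 12, 31), rng),  # date
        })
    return rows


for row in make_dummy_rows(3):
    print(row)
```

Passing a seeded random.Random instance around, rather than using the module-level functions, keeps the generated dataset repeatable between runs.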

R - Load and install multiple packages at once, and auto-install if a package is not found

Install multiple packages if they are not available in the R library, and load them into the session. Copy and paste the code below to load the libraries. I've taken the "ggplot2", "dplyr", and "curl" packages in my package checklist:

It will load all the required packages from the list; if a package is unavailable, it will install it and then load it.

Data Wrangling using Python - Part 1

In this blog, we will show some of the commonly used data wrangling steps in Python. We will use a pandas data frame as our data object to show all the steps. Importing Python Packages: In this part of the blog, we will use the pandas and numpy packages available in Python. We need to import these ...

Reading Text File in Python

Reading data into an analytical tool is one of the first steps before proceeding to analytics. Some of the common challenges while reading a text file are: knowing the delimiter, the presence of missing or null values, and columns holding date values. In this blog, we will read a text file in which values are separated by a ...
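The excerpt stops before the post's own code, so the following is only a standard-library sketch of the three challenges it lists. It lets csv.Sniffer guess the delimiter, maps assumed missing-value markers ("", "NA", "NULL") to None, and parses a date column. The pipe-delimited sample, the column names and the date format are hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical file contents: pipe-delimited, one blank amount and one "NA" amount
SAMPLE = """id|name|amount|joined
1|Asha|250.5|2021-03-15
2|Ben||2021-04-02
3|Chen|NA|2021-05-20
"""

MISSING = {"", "NA", "NULL"}  # assumed markers for missing/null values


def read_text(handle):
    # Let csv.Sniffer guess the delimiter from the start of the file
    sample = handle.read(1024)
    handle.seek(0)
    dialect = csv.Sniffer().sniff(sample, delimiters=",|;\t")
    rows = []
    for rec in csv.DictReader(handle, dialect=dialect):
        amount = rec["amount"]
        rows.append({
            "id": int(rec["id"]),
            "name": rec["name"],
            # Missing markers become None; everything else is parsed as a number
            "amount": None if amount in MISSING else float(amount),
            # Date column parsed with an assumed ISO format
            "joined": datetime.strptime(rec["joined"], "%Y-%m-%d").date(),
        })
    return dialect.delimiter, rows


delimiter, rows = read_text(io.StringIO(SAMPLE))
print(repr(delimiter))
for row in rows:
    print(row)
```

With a real file, open(path, newline="") can replace io.StringIO. csv.Sniffer raises csv.Error when it cannot decide, so an explicit delimiter is the safer choice once the format is known.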

Python Learning - Finding Answers

In this blog, we are sharing some of the scenarios that arose while working on a project, along with the steps to get the work done. I am sure there are multiple ways to achieve the outcome, but I am sharing my solutions. Q1: How do we extract the id value from html ...
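The answer to Q1 is truncated here, so the author's approach is unknown. One possible standard-library approach, shown purely as a sketch, is to subclass html.parser.HTMLParser and record the id attribute of every start tag. The HTML snippet is hypothetical.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect (tag, id) pairs for every start tag that carries an id attribute."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append((tag, value))

    # Self-closing tags such as <input id="x"/> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so no override is needed.


page = '<div id="main"><p class="intro">Hi</p><input id="email" type="text"/></div>'
parser = IdCollector()
parser.feed(page)
print(parser.ids)
```

A streaming parser like this copes with imperfect HTML without third-party dependencies; for heavier scraping work a dedicated HTML library may be more convenient.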

Visualisation using R - Commonly used functions

We will discuss some of the commonly used base R graphics functions. Some of the commonly used functions are: plot, for line charts and scatter plots; boxplot, for a box-and-whisker plot of a continuous variable or of its distribution across different groups; and hist, for histograms. Scatter Plot: We will create sample data points and then use them for the scatter ...

Powerful Proc Tabulate Explained

In this blog, we will use PROC TABULATE, one of the most powerful PROCs for data summarization in SAS. To create a summary table of the information (similar to a Pivot Table in Excel), we need to define: classification variables, also called dimensions; measurement variables, also called facts; and the structure of the output summary tables. PROC FREQ, PROC ...

Concatenating Datasets in SAS

Author: Mrinmoy Saikia. Data preparation is one of the most significant steps in data science, analytics, or reporting projects. In this blog, we focus on learning how to combine two or more SAS datasets vertically. Combining SAS datasets vertically is also referred to as concatenating datasets: placing two or more datasets one below another. From ...
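The post's SAS examples are cut off here. As a language-neutral illustration of vertical concatenation, this Python sketch stacks lists of row dictionaries one below another. Like a SAS SET statement naming several datasets, it keeps the union of all columns and gives a row a missing value (None here) for any column its source dataset lacks. The monthly datasets are hypothetical.

```python
def concatenate(*datasets):
    """Stack lists of row dicts one below another.

    Columns are the union of all input columns, in first-seen order; a row that
    lacks a column gets None, standing in for a SAS missing value.
    """
    columns = []
    for ds in datasets:
        for row in ds:
            for col in row:
                if col not in columns:
                    columns.append(col)
    # Rows keep their original order: all of the first dataset, then the second, ...
    return [{col: row.get(col) for col in columns} for ds in datasets for row in ds]


# Hypothetical monthly datasets; only February records a sales channel
jan = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
feb = [{"id": 3, "amount": 80, "channel": "web"}]
for row in concatenate(jan, feb):
    print(row)
```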

Data Science: Profile Screening Model for Mid-Management Roles

Business Context: The client was an executive search firm that had built a candidate database with over a million candidate profiles. The client wanted to leverage this database for smarter candidate selection and recruitment. For this project, the aim was to build a predictive model that would help identify a list of ...

SAS Control Statements

In this blog, we will discuss some of the SAS control statements with examples. IF/WHERE Statement: an IF or WHERE statement is applied to select observations. A WHERE condition is applied while reading observations from the input dataset, whereas an IF condition is applied in the Program Data Vector (PDV). Using the WHERE statement may improve the efficiency of ...

Chi Square Test using SAS

A chi-square test is a statistical method for testing the association between two categorical variables (especially nominal variables). Type of Variables: Correlation analysis is used when both variables are continuous, and it can be done using the Pearson correlation coefficient. ANOVA is used when one variable is categorical and the other is continuous, finding how levels of the categorical variable ...
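The post runs the test in SAS. To show what the statistic actually measures, here is a from-scratch Python sketch of the Pearson chi-square statistic for a contingency table. Each expected count is (row total × column total) / grand total under the assumption of no association, and the statistic sums (observed - expected)² / expected over all cells. The 2x2 table is hypothetical, and no continuity correction is applied.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2x2 table: customer segment (rows) vs. product preference (columns)
observed = [[30, 10],
            [20, 40]]
stat, dof = chi_square_statistic(observed)
print(round(stat, 3), dof)
```

For this table the expected counts are 20, 20, 30 and 30, giving a statistic of about 16.67 on 1 degree of freedom, well above the 5% critical value of 3.841, so the sketch would reject independence. In SAS, the CHISQ option of PROC FREQ reports this statistic along with its p-value.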

ANOVA using SAS and Example

Analysis of Variance (ANOVA) is used for comparing means across multiple samples. The focus here is only on one-way ANOVA, although there are a few different ways of applying similar concepts to different scenarios. If the number of samples or groups is one or two, we can use a t-test (T Test using SAS). Using one categorical variable, ...
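The post's worked example is in SAS. The arithmetic behind a one-way ANOVA can be sketched in plain Python. The F statistic is the between-group mean square (spread of group means around the grand mean) divided by the within-group mean square (spread of observations around their own group mean). The three groups of scores are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group size times squared distance of group mean from grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared distance of each observation from its own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups
a = [4, 5, 6]
b = [6, 7, 8]
c = [8, 9, 10]
print(one_way_anova([a, b, c]))
```

Here the group means are 5, 7 and 9, the between and within sums of squares are 24 and 6, and F = (24 / 2) / (6 / 6) = 12 on (2, 6) degrees of freedom. Turning F into a p-value needs the F distribution, which is where a statistics package such as SAS PROC ANOVA or scipy.stats.f_oneway comes in.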

Retain Statement - Explained with Examples

Author: Rameshwari, who completed the SAS for Data Analytics training course at DnI Institute. SAS programs are made of statements, and each statement ends with a semicolon (";"). One of the important statements in SAS is the RETAIN statement. Why do we need the RETAIN statement? An important point about SAS programs is that in the majority of cases SAS ...
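The excerpt stops mid-explanation. The key behaviour is that, in a SAS DATA step, a variable created by assignment is reset to missing at the start of each iteration unless it is retained. The Python sketch below mimics two common RETAIN uses as an analogy only: a running total, and carrying the last non-missing value forward. The loop stands in for DATA step iterations, and the function names and data are hypothetical.

```python
def running_totals(amounts):
    """Mimic RETAIN for a cumulative sum: the total survives from one observation to the next."""
    total = 0  # like RETAIN total 0; set once, before the first observation
    out = []
    for amount in amounts:  # each pass plays the role of one DATA step iteration
        # Without RETAIN, SAS would have reset total to missing at the start of this iteration
        total += amount
        out.append((amount, total))
    return out


def carry_forward(values):
    """Fill missing (None) values with the last non-missing value, another common RETAIN use."""
    last = None
    filled = []
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


print(running_totals([10, 20, 5]))
print(carry_forward(["A", None, None, "B", None]))
```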

Machine Learning for Retailers

The retail industry has been at the forefront of adopting Data Science and Machine Learning. Some of the common applications of Data Science and Machine Learning are listed below. These are by no means exhaustive, and smart professionals are bringing new applications every day. Some of the commonly used Machine Learning and Data Science techniques are Logistic Regression ...

Machine Learning - Steps to Build Regression Model

A number of real-life business decisions are regression problems in nature. We have prepared a detailed list of regression modelling scenarios. Business Scenario: House Price Prediction. We all want to have our own house, and the price of a house is an important link between our wish and owning one. We want to build ...
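The full list of steps is cut off here. As a minimal sketch of the core fitting step only, this Python example fits a one-predictor linear regression of house price on floor area by ordinary least squares and reports R². The data are hypothetical, and a real model would use more predictors, a train/test split and residual checks.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel, so plain sums suffice
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: floor area (sq ft) and sale price (thousands)
area = [1000, 1500, 2000, 2500, 3000]
price = [210, 290, 420, 480, 610]
b0, b1 = fit_simple_linear_regression(area, price)
print(f"price = {b0:.1f} + {b1:.3f} * area, R^2 = {r_squared(area, price, b0, b1):.3f}")
print("predicted price for 1800 sq ft:", round(b0 + b1 * 1800, 1))
```

For this data the fitted line is price = 6 + 0.198 × area, so each extra square foot adds about 0.198 (thousand) to the predicted price.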

Scenarios: Binary Predictive Models

A long list of business decisions are binary in nature, and we will list a few such scenarios. When a decision variable (also referred to as the target variable, response variable, or dependent variable) is binary (takes only two values), a long list of supervised statistical and machine learning algorithms can be used. Some of ...