## Model Performance Assessment Statistics – Concordance: Steps to Calculate

A binary decision outcome is one of the most common analytics problems solved in industry. Some examples where binary predictive models are used.. Some of the common model performance statistics used to assess the performance of a binary predictive model are: Confusion Matrix, ROC Chart, Concordance %, Gini, KS, Lift Chart, and Gain Table and Chart ... Read more
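As a rough illustration of the concordance statistic named in the heading, here is a minimal Python sketch. It assumes the usual definition: every event is paired with every non-event, and a pair is concordant when the event received the higher predicted score. The function name `concordance` and the sample data are hypothetical, not taken from the original post.

```python
from itertools import product


def concordance(actual, scores):
    """Return (concordant, discordant, tied) shares over all event/non-event pairs."""
    events = [s for a, s in zip(actual, scores) if a == 1]
    non_events = [s for a, s in zip(actual, scores) if a == 0]
    total = len(events) * len(non_events)
    if total == 0:
        raise ValueError("need at least one event and one non-event")

    concordant = discordant = tied = 0
    for event_score, non_event_score in product(events, non_events):
        if event_score > non_event_score:
            concordant += 1      # model ranked the event higher
        elif event_score < non_event_score:
            discordant += 1      # model ranked the non-event higher
        else:
            tied += 1            # identical scores
    return concordant / total, discordant / total, tied / total


# Hypothetical outcomes (1 = event) and predicted probabilities.
actual = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.6, 0.8, 0.1]

conc, disc, tie = concordance(actual, scores)
print(f"Concordance: {conc:.1%}  Discordance: {disc:.1%}  Ties: {tie:.1%}")
```

The pairwise loop is quadratic in the number of rows, which is fine for a small illustration but would need a sorted or rank-based approach on large datasets.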

To analyze a given dataset, build a machine learning model for a business problem, or build a great visualization in R, we need to bring data into R or RStudio. The data may be available in different source formats (e.g. Excel, CSV, text, SQL, etc.) and across systems (e.g. in the ... Read more

## Tutorial on Random Forest using Python

In the previous blog, we explained the Random Forest algorithm and the steps for building a Random Forest model using R. In this blog, we show the high-level steps required to build a machine learning model in Python. The Random Forest algorithm is based on the Classification and Regression Tree (CART) decision tree algorithm. But it builds ... Read more

## Step by Step Tutorial on Decision Tree using Python

In this blog, the aim is to show you the steps for building a Decision Tree using Python in a Jupyter Notebook. If you are interested in learning the Decision Tree algorithm, we have an excellent tutorial on "Decision Tree Algorithm - CART". We use the same data to explain the steps involved in building a decision tree. ... Read more

## Random Forest using R - Step by Step on a Sample Data

Some interested candidates have asked us to show the steps for building a Random Forest on sample data and scoring another sample with the resulting model. Here are the steps..

## Decision Tree CART Algorithm Part 3

In the previous blogs, we explained how to select the best split for each independent variable. Now we need to select the best variable, and the criterion is again the Gini Index value. For each independent variable, we have its best split and the corresponding Gini Index value. Here is the table. Variable Spend in the last ... Read more

## CART Algorithm: Best Split for a Categorical Variable

As with continuous variables, the CART decision tree algorithm has to find the best split for a categorical variable as well. The only difference is in how the candidate cut-offs are found. For example, suppose a variable, education, has 4 levels: "University", "Graduate", "High School" and "Others". We consider all possible two-way splits for the ... Read more
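To make the "all possible two-way splits" idea concrete, the following Python sketch enumerates them for the four education levels mentioned above. The helper `two_way_splits` is a hypothetical name. The sketch assumes that the order of the two sides does not matter, so a variable with k levels yields 2^(k-1) - 1 distinct splits.

```python
from itertools import combinations


def two_way_splits(levels):
    """List every way to divide the levels into two non-empty groups."""
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    splits = []
    # Pin the first level to the left side so each partition appears only once.
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            left = (first,) + extra
            right = tuple(level for level in levels if level not in left)
            if right:  # skip the case where every level lands on the left
                splits.append((left, right))
    return splits


education = ["University", "Graduate", "High School", "Others"]
splits = two_way_splits(education)

for left, right in splits:
    print(left, "|", right)
print("number of candidate splits:", len(splits))
```

Each candidate split would then be scored (for example with the Gini Index) in the same way as a cut-off on a continuous variable.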

## CART Algorithm for Decision Tree

Classification and Regression Tree (CART) is one of the commonly used Decision Tree algorithms. In this post, we explain the steps of the CART algorithm using example data. A Decision Tree is a recursive partitioning approach, and CART splits each input node into two child nodes, so a CART decision tree is a binary decision tree. ... Read more
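A single CART-style binary split on one continuous variable can be sketched in Python as follows. This is only an illustration under assumptions: the target is binary (0/1), candidate cut-offs are midpoints between adjacent distinct values, and the split with the lowest weighted Gini impurity wins. The names `gini_impurity`, `best_split`, `spend`, and `target` are hypothetical, as is the data.

```python
def gini_impurity(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_split(values, labels):
    """Return (cut_off, weighted_gini) for the best binary split of one variable."""
    n = len(values)
    distinct = sorted(set(values))
    best_cut, best_gini = None, float("inf")
    # Candidate cut-offs: midpoints between neighbouring distinct values.
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= cut]
        right = [y for x, y in zip(values, labels) if x > cut]
        weighted = (len(left) / n) * gini_impurity(left) \
            + (len(right) / n) * gini_impurity(right)
        if weighted < best_gini:
            best_cut, best_gini = cut, weighted
    return best_cut, best_gini


# Hypothetical data: a spend variable and a binary response.
spend = [10, 20, 30, 40, 50, 60]
target = [0, 0, 0, 1, 1, 1]

cut, gini = best_split(spend, target)
print("best cut-off:", cut, "weighted Gini:", gini)
```

Repeating this search on each child node, across all variables, is what makes the procedure recursive partitioning.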

## Reading CSV file and Text File in Python

Reading a comma-separated values (CSV) file in Python is one of the most common activities before proceeding to Data Science or Data Analysis steps. We can read CSV data in at least two different ways.

Reading CSV using a function from the pandas library.
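The excerpt names pandas as one route (its `read_csv` function returns a DataFrame). As a hedged illustration of another route, the sketch below uses Python's standard-library `csv` module; whether this is the second method the original post had in mind is an assumption. The CSV content and column names are made up, and an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical CSV content; with a real file, use open("data.csv", newline="").
raw = "id,name,score\n1,Asha,82.5\n2,Ravi,74.0\n"

with io.StringIO(raw) as handle:
    # DictReader uses the header row as keys; every value is read as a string.
    rows = list(csv.DictReader(handle))

# Convert the numeric column explicitly, since csv does no type inference.
scores = [float(row["score"]) for row in rows]

print(rows[0]["name"], "average score:", sum(scores) / len(scores))
```

The main practical difference is that the `csv` module leaves type conversion to the caller, while pandas infers column types on load.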

## Twitter - R integration and extract data between from and till date

Here is a complete end-to-end integration walkthrough for Twitter and R. It helps retrieve data between two dates ("since" and "until").

Prerequisites:

- Twitter account
- R

Steps:

1. Go to https://apps.twitter.com/app/
2. Sign in to your Twitter account
3. Click on "create new app"
4. Fill in the basic requirements to create the app ... Read more

## Count consecutive number of days with condition in R

Although R is mostly known for its capabilities in statistical analytics and data modeling, it is also very capable at data preparation and querying. Here I present a sample case of a kind that comes up often in day-to-day analysis. I came across a scenario where I had to figure out the exception ... Read more
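The original post works in R and its exact scenario is cut off here, so the following is only a Python sketch of the general idea in the title: finding the longest run of consecutive calendar days on which some condition holds. The function name `longest_streak`, the dates, and the flags are all hypothetical.

```python
from datetime import date, timedelta


def longest_streak(days, flags):
    """Longest run of consecutive calendar days for which the flag is True."""
    best = current = 0
    previous = None  # last day that satisfied the condition, if the run is alive
    for day, flag in sorted(zip(days, flags)):
        if flag and previous is not None and day - previous == timedelta(days=1):
            current += 1          # run continues
        elif flag:
            current = 1           # new run starts (first hit, or a gap in dates)
        else:
            current = 0           # condition failed, run is broken
        previous = day if flag else None
        best = max(best, current)
    return best


# Hypothetical daily observations for one week.
days = [date(2020, 1, 1) + timedelta(days=i) for i in range(7)]
flags = [True, True, False, True, True, True, False]

print("longest streak:", longest_streak(days, flags))
```

A missing date counts as a break in the run, which matters when the data has gaps such as weekends or holidays.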

## R Fuzzy String Match

Here is a very cool solution for detecting fraud cases in which the same name or the same address is used for different entries. While doing risk consulting and due diligence, I've come across problems where we have to check: genuine data vs dummy data; an employee who is also involved as a vendor; some relative of an employee ... Read more
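The original post solves this in R and its method is not shown in this excerpt. As a stand-in, here is a Python sketch of the employee-versus-vendor check using `difflib.SequenceMatcher` from the standard library. The names, the 0.85 threshold, and the helpers `similarity` and `likely_matches` are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] after lowercasing and collapsing whitespace."""
    def normalise(text):
        return " ".join(text.lower().split())
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_matches(employees, vendors, threshold=0.85):
    """Return (employee, vendor, score) for pairs at or above the threshold."""
    matches = []
    for employee in employees:
        for vendor in vendors:
            score = similarity(employee, vendor)
            if score >= threshold:
                matches.append((employee, vendor, round(score, 2)))
    return matches


# Hypothetical employee and vendor names.
employees = ["John Smith", "Priya Nair"]
vendors = ["Jon Smith", "Acme Traders"]

for employee, vendor, score in likely_matches(employees, vendors):
    print(f"{employee!r} looks like vendor {vendor!r} (score {score})")
```

The threshold trades false positives against missed matches, so flagged pairs are best treated as candidates for manual review rather than confirmed fraud.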

## R - Generate Random Names - Random Number - Random Date

This tutorial focuses on generating random values and creating a dummy dataset. If you understand the business and its attributes well, this tutorial will help you create a dummy dataset. Basically there are three types of attributes: Numeric/Integer, Character/Factor, and Dates. Below is the code which helps you generate any type ... Read more
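The original tutorial's code is in R and is not reproduced in this excerpt. The Python sketch below illustrates the same three attribute types (numeric, character, and date) using only the standard library. The name lists, value ranges, and the function `make_dummy_rows` are invented for the example.

```python
import random
from datetime import date, timedelta


def make_dummy_rows(n, seed=42):
    """Build n dummy records with a character, a numeric and a date attribute."""
    rng = random.Random(seed)  # fixed seed so the dummy data is reproducible
    first_names = ["Asha", "Ravi", "Meera", "Karan"]
    last_names = ["Sharma", "Iyer", "Khan", "Das"]
    start = date(2020, 1, 1)

    rows = []
    for i in range(n):
        rows.append({
            "id": i + 1,
            "name": f"{rng.choice(first_names)} {rng.choice(last_names)}",  # character
            "amount": round(rng.uniform(100, 1000), 2),                      # numeric
            "joined": start + timedelta(days=rng.randrange(365)),            # date
        })
    return rows


for row in make_dummy_rows(3):
    print(row)
```

Using a dedicated `random.Random` instance keeps the generator independent of any other code that touches the global random state.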

## R - Load and Install multiple packages at once and auto install if a package is not found

Install multiple packages if they are not available in the R library and load them into the session. Copy and paste the code below to load libraries. I've taken the "ggplot2", "dplyr", and "curl" packages in my package checklist:

It will load all the required packages from the list; if a package is unavailable, it will install it and then load it.

## Data Wrangling using Python - Part 1

In this blog, we will show some of the commonly used data wrangling steps using Python. We will use a pandas data frame as our data object to show all the steps. Importing Python Packages: In this part of the blog, we will use the pandas and numpy packages available in Python. We need to import these ... Read more

## Reading Text File in Python

Reading data into an analytical tool is one of the first steps before proceeding to analytics. Some of the common challenges while reading a text file are: knowing the delimiter, the presence of missing or null values, and columns holding date values. In this blog, we will read a text file in which values are separated by a ... Read more
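The three challenges listed above (delimiter, missing values, date columns) can be illustrated with a small standard-library sketch. The delimiter used in the original post is cut off in this excerpt, so the pipe character here is purely an assumption, as are the column names, the missing-value markers, and the date format.

```python
import csv
import io
from datetime import datetime

# Hypothetical pipe-delimited content with an empty field and an "NA" marker.
raw = "id|joined|amount\n1|2020-01-15|250.5\n2||NA\n"

MISSING = {"", "NA", "null"}  # assumed markers for missing values


def parse_row(row):
    """Map missing markers to None, then convert the date and numeric columns."""
    clean = {key: (None if value in MISSING else value) for key, value in row.items()}
    if clean["joined"] is not None:
        clean["joined"] = datetime.strptime(clean["joined"], "%Y-%m-%d").date()
    if clean["amount"] is not None:
        clean["amount"] = float(clean["amount"])
    return clean


# The delimiter must be stated explicitly; csv assumes a comma otherwise.
reader = csv.DictReader(io.StringIO(raw), delimiter="|")
rows = [parse_row(row) for row in reader]

for row in rows:
    print(row)
```

With pandas, the equivalent knobs are the `sep`, `na_values`, and `parse_dates` parameters of `read_csv`.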

## Python Learning - Finding Answers

In this blog, we share some scenarios that arose while working on a project, along with the steps to get the work done. I am sure there are multiple ways to achieve each outcome, but I am sharing my solutions. Q1: How do we extract id value from html ... Read more
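The question is cut off in this excerpt, so the exact HTML and the author's solution are unknown. As one possible reading, the sketch below collects the `id` attribute of every tag in a snippet using the standard-library `html.parser` module. The class name `IdCollector` and the HTML snippet are hypothetical.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect (tag, id) pairs for every start tag that carries an id attribute."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the current tag.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append((tag, value))


# Hypothetical HTML snippet.
snippet = '<div id="main"><p id="intro">Hello</p><span>no id</span></div>'

collector = IdCollector()
collector.feed(snippet)
print(collector.ids)
```

A parser-based approach tends to be more robust than a regular expression when attribute order or quoting varies.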

## Python - Data Manipulation Scenarios and Questions

In this blog, we have listed a few data manipulation scenarios and examples from data science projects. These examples can propel your Python learning for Data Science. Data manipulation is one of the significant activities in any Data Science or Predictive Modeling project. If you have any scenarios or examples, do share with us ... Read more

## Visualisation using R - Commonly used functions

We will discuss some of the commonly used base R graphics functions. Some of the commonly used functions are: plot (line chart and scatter plot), boxplot (box-and-whisker plot for a continuous variable, or distributions by different groups), and hist (histogram). Scatter Plot: We will create sample data points and then use them for the scatter ... Read more