R Fuzzy String Match

Here is a neat way to detect fraud cases in which the same name or the same address is reused under different entries. While doing risk consulting and due diligence, I have come across situations where we have to check: genuine data vs dummy data; an employee who is also involved as a vendor; some relative of an employee ... Read more
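As a rough illustration of the idea (not the code from the post itself), base R's `adist()` computes the edit distance between strings, which is one common way to flag near-duplicate names. The vendor and employee names below are made up for the example.

```r
# Hypothetical names, for illustration only
vendors   <- c("Jon Smith", "Acme Supplies", "Priya Sharma")
employees <- c("John Smith", "Priya Sharma")

# Levenshtein (edit) distance between every employee and every vendor name,
# compared in lower case so that capitalisation does not matter
d <- adist(tolower(employees), tolower(vendors))
dimnames(d) <- list(employees, vendors)

# For each employee, the vendor name with the smallest edit distance
closest <- vendors[apply(d, 1, which.min)]
print(d)
print(closest)
```

A small distance (here, one missing letter) is only a hint that two entries may refer to the same person; any threshold for flagging a match is a judgment call that depends on the data.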

R - Generate Random Names - Random Number - Random Date

This tutorial focuses on generating random values and creating a dummy dataset. If you understand the business and its attributes well, this tutorial will help you create a dummy dataset. Basically, there are three types of attributes: numeric/integer, character/factor, and dates. Below is the code which helps you to generate any type ... Read more
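A minimal sketch of the three attribute types, using only base R. The column names, value ranges and category labels are assumptions chosen for the example, not taken from the tutorial.

```r
set.seed(42)  # make the random draws reproducible
n <- 5        # number of dummy records

dummy <- data.frame(
  # Character: random 6-letter strings standing in for names
  name    = replicate(n, paste(sample(letters, 6, replace = TRUE), collapse = "")),
  # Numeric: uniform random amounts between 100 and 1000
  amount  = round(runif(n, min = 100, max = 1000), 2),
  # Factor: random category labels (hypothetical segments)
  segment = factor(sample(c("Retail", "Corporate", "SME"), n, replace = TRUE)),
  # Date: random days within one year of a start date
  joined  = as.Date("2020-01-01") + sample(0:364, n, replace = TRUE),
  stringsAsFactors = FALSE
)

str(dummy)
```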

R - Load and Install Multiple Packages at Once and Auto-Install if a Package Is Not Found

Install multiple packages if they are not available in the R library and load them into the session. Copy and paste the code below to load the libraries. I've taken the "ggplot2", "dplyr" and "curl" packages in my package checklist:

It will load all the required packages from the list; if a package is unavailable, it will install it and then load it.
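The original code block is not reproduced on this page, so the following is one possible sketch of the described behaviour. The helper name `load_or_install` is hypothetical; `requireNamespace()`, `install.packages()` and `require()` are standard R functions.

```r
# Package checklist named in the post
packages <- c("ggplot2", "dplyr", "curl")

# Hypothetical helper: install each missing package, then attach it.
# Returns a named logical vector showing which packages were loaded.
load_or_install <- function(pkgs) {
  vapply(pkgs, function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)  # needs network access and a CRAN mirror
    }
    suppressPackageStartupMessages(
      require(pkg, character.only = TRUE, quietly = TRUE)
    )
  }, logical(1))
}
```

Calling `load_or_install(packages)` attaches the three packages, installing any that are missing first. A `FALSE` entry in the result means that package could not be loaded even after the install attempt.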

Data Science: Profile Screening Model for Mid-Management Roles

Business Context: The client was an executive search firm. It had built a candidate database with over a million candidate profiles. The client wanted to leverage the candidate database for a smarter candidate selection and recruitment process. For this project, the aim was to build a predictive model which will help in identifying a list of ... Read more

Machine Learning - Steps to Build Regression Model

A number of real-life business decisions are regression problems by nature. We have prepared a detailed list of regression modelling scenarios. Business Scenario: House Price Prediction. We all want to have our own house, and the price of a house is an important link between our wish and owning a house. We want to build ... Read more

Logistic Regression using R: German Credit Example

Logistic Regression is one of the oldest and most widely used statistical/machine learning techniques for binary decision variable scenarios. In the previous blog, we explained the overall steps to build a predictive model using Logistic Regression. Also, if you are interested in understanding Binary Model Performance Statistics, you can read a detailed blog on Model ... Read more

Bagging Algorithm: Concepts with Example

Bagging stands for Bootstrap Aggregation. Bootstrapping is a process of selecting samples from an original sample (or population) and using these samples to estimate various statistics or model accuracy. Bagging (bootstrap aggregating) was proposed by Leo Breiman in 1994 for improving classification accuracy. In bootstrapping, the random samples are created with replacement for estimating sample statistics. ... Read more
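To make the bootstrapping step concrete, here is a small sketch that resamples a vector with replacement and uses the resamples to estimate the standard error of the mean. The data values and the number of resamples are arbitrary choices for the example; a full bagging procedure would fit a model on each resample and aggregate the predictions.

```r
set.seed(1)  # reproducible resampling

# Hypothetical original sample
x <- c(12, 15, 9, 20, 17, 11, 14, 18)

# Draw B bootstrap samples (same size as x, with replacement)
# and record the mean of each one
B <- 1000
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# The spread of the bootstrap means estimates the standard error of the mean
se_hat <- sd(boot_means)
print(se_hat)
```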

Variable Standardization and K Means Clustering

Analytics is an approach for identifying patterns in data, and the data is related to a particular context. Without context, data values do not have much significance. Characteristics of the objects (e.g. transactions, customers or clicks) are captured in different variables, and the values of these variables for an object are the data points. For example, ... Read more

Sorting Data in R

Sorting data is one of the common activities in preparing data for analytics and data science projects. This is probably the reason "How to sort a data frame by column(s) using R" is one of the common questions asked across R forums and blogs. Some of the other questions are - How to ... Read more
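For the data frame question quoted above, one common base R answer is `order()`. The sketch below sorts a small made-up data frame by one column ascending and another descending; the column names and values are assumptions for illustration.

```r
# Hypothetical data frame
df <- data.frame(
  name   = c("A", "B", "C", "D"),
  dept   = c("x", "y", "x", "y"),
  salary = c(50, 70, 60, 70),
  stringsAsFactors = FALSE
)

# Sort by dept (ascending), then by salary (descending within each dept).
# Negating a numeric column reverses its sort direction.
sorted <- df[order(df$dept, -df$salary), ]
print(sorted)
```

Rows that tie on every sort key keep their original relative order.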

Random Forest Using R: Step by Step Tutorial

Random Forest: Overview. Random Forest is an ensemble learning based classification and regression technique. It is one of the commonly used predictive modelling and machine learning techniques. Before studying the random forest algorithm, it is recommended to understand the decision tree algorithm and its applications. A non-technical description of decision tree. A simple explanation of why is it ... Read more

Interview Questions for R & Data Science

R is an open source statistical computing environment, and RStudio is an IDE which uses R for data science and analytics. An increasing number of organizations are migrating statistical and data analytics work to R and are looking for analytics professionals and data scientists who have R experience. In a job interview, the organizations will be testing candidates ... Read more

SVM for Regression using R

Support Vector Machine for Regression using R. Predictive modelling problems are classified as either classification or regression problems. The Support Vector Machine (SVM) algorithm can be used for both classification and regression scenarios. In the earlier blog, we explained the SVM technique and its way of working using an example. In regression problems, the target variable ... Read more

Building Predictive Model using SVM and R

Predictive modelling problems are classified as either classification or regression problems. Within classification, different algorithms can be used depending on the level and type of the decision variable (target variable). A number of statistical and machine learning techniques are available for both classification and regression problems. Some of the commonly used techniques for ... Read more

Decision Tree: Entropy and Information Gain

A decision tree is a hierarchical or tree-like representation of decisions. Decision Tree is a technique to iteratively break input data (or node) into two or more data samples (or nodes). This recursive partitioning of input data (or node) continues until it meets the specified condition(s). How is a decision tree built? There are different ... Read more
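The two quantities named in the title can be sketched in a few lines of base R, using the standard definitions: entropy is the negative sum of p * log2(p) over the class proportions, and information gain is the parent entropy minus the size-weighted entropy of the child nodes. The function names `entropy` and `info_gain` are chosen for this example.

```r
# Shannon entropy (in bits) of a vector of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]  # drop empty classes so log2(0) never occurs
  -sum(p * log2(p))
}

# Information gain from partitioning `labels` by the values of `split`
info_gain <- function(labels, split) {
  weights <- table(split) / length(split)         # share of rows in each child
  child   <- tapply(labels, split, entropy)       # entropy of each child node
  entropy(labels) -
    sum(as.numeric(weights) * as.numeric(child[names(weights)]))
}

# A split that separates the classes perfectly vs one that tells us nothing
y <- c("yes", "yes", "no", "no")
print(info_gain(y, c("left", "left", "right", "right")))
print(info_gain(y, c("left", "right", "left", "right")))
```

A tree-building algorithm based on this criterion would evaluate the gain for each candidate split and pick the largest.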

String Manipulations and Regular Expressions: Part 4

Text Mining and Text Analytics are getting increased attention, thanks to open source technologies and digitization. Huge volumes of textual data are generated, and organizations are looking for ways to learn from it and create competitive advantage. String manipulation is one of the key steps in processing text data for insights. We have covered some of the ... Read more

String Manipulations and Regular Expressions: Part 2

In the previous blog, we focused on using R for removing whitespace from an input string. The whitespace could be leading, trailing or embedded in the input string. R offers a number of string manipulation functions. In this blog, mainly we want to cover different options to ... Read more
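As a quick reminder of the three whitespace cases mentioned above, here is a sketch using base R's `trimws()` and `gsub()`. The input string is made up, and these are not necessarily the functions used in the earlier post.

```r
x <- "  fuzzy   string  match  "

# Leading and trailing whitespace
trimmed <- trimws(x)

# Embedded runs of whitespace collapsed to a single space
collapsed <- gsub("\\s+", " ", trimmed)

# All whitespace removed
stripped <- gsub("\\s+", "", x)

print(c(trimmed, collapsed, stripped))
```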

String Manipulations and Regular Expressions: Part 1

Textual data was less explored in earlier years. Huge volumes of social media and digital data are available to all of us and to organizations to explore, understand, act on and generate value from. Textual data and text mining require a high level of processing power and text analytics capabilities to make sense of the data and ... Read more

Market Basket Analysis – Key Performance Statistics

Market Basket Analysis (MBA) is an analysis to understand which products are bought together in a retail transaction or a customer visit to a store. A number of blogs cover a brief overview of Market Basket Analysis for retail, a few published case studies of market basket analysis, and a step-by-step approach to Market Basket Analysis using R. In this ... Read more