Random Forest using R - Step by Step on Sample Data

Some interested candidates have asked us to show the steps for building a Random Forest on sample data and then scoring another sample with the fitted Random Forest model.

Here are the steps.

# ---------------------- Random Forest using R -------------------------#
#                        Author : Ram                                   #
#-----------------------------------------------------------------------#


# Read a dataset which has a binary (two-level) target variable and
# a list of independent variables

df_rf <- read.csv(file = "http://dni-institute.in/blogs/wp-content/uploads/2017/07/dt_data.csv",
                  stringsAsFactors = FALSE)

# Check variable types 
str(df_rf)

# Summary statistics
summary(df_rf)
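
# Optional check, a minimal sketch: randomForest() stops with an error by
# default when the modelling columns contain missing values
# (na.action = na.fail), so it may help to count NAs per column first.
# na_counts is a hypothetical helper name, not part of any package.
na_counts <- function(d) colSums(is.na(d))

# Guarded so that the snippet also runs on its own
if (exists("df_rf")) print(na_counts(df_rf))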

# Check Target Variable level - counts and %
table(df_rf$Spend_Drop_over50pct)
table(df_rf$Spend_Drop_over50pct)*100/nrow(df_rf)

# Names of variables
names(df_rf)

# Convert the target variable to a factor so that randomForest builds a
# classification model (a numeric target would give a regression forest)

df_rf$Spend_Drop_over50pct <- as.factor(df_rf$Spend_Drop_over50pct)

# -----------------  Build Random Forest -----------------------------
library(randomForest)
rf_mod <- randomForest(Spend_Drop_over50pct ~ Last_Month_spend + Last_3m_avg_spend,
                       data = df_rf)
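
# Optional, a minimal sketch for inspecting the fitted forest. For a
# classification forest, rf_mod$confusion holds the out-of-bag (OOB)
# confusion matrix plus a class.error column. oob_accuracy is a
# hypothetical helper name, not part of the randomForest package.
oob_accuracy <- function(conf) {
  # Drop the class.error column so that only the counts remain
  counts <- conf[, colnames(conf) != "class.error", drop = FALSE]
  sum(diag(counts)) / sum(counts)
}

# Guarded so that the snippet also runs on its own
if (exists("rf_mod")) {
  print(rf_mod)                          # OOB error estimate and confusion matrix
  print(oob_accuracy(rf_mod$confusion))  # share of correct OOB predictions
  print(importance(rf_mod))              # mean decrease in Gini per predictor
}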

# Save the Random Forest Model
saveRDS(rf_mod,file="rf_model")

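# Remove the Random Forest model object from the environment
rm(rf_mod)
# Confirm it is gone; printing rf_mod here would stop the script with an error
exists("rf_mod")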

# Load the Random Forest model
rf_mod <- readRDS(file = "rf_model")
print(rf_mod)
# The reloaded model can score a new data frame that holds only the
# predictor columns (no target column) and return the predicted binary
# class for each row.

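# Draw a sample of 200 rows, keeping only the two predictor columns
index1 <- sample(1:nrow(df_rf), 200, replace = FALSE)
names(df_rf)
# Select predictors by name rather than by column position
smp_rf <- df_rf[index1, c("Last_Month_spend", "Last_3m_avg_spend")]
# Score the new data
NewPredictions <- predict(rf_mod, smp_rf)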

head(NewPredictions)

table(NewPredictions)
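
# Optional, a minimal sketch: compare the predictions with the known target
# values of the sampled rows. hit_rate is a hypothetical helper name. Because
# these 200 rows were also used to train the model, the agreement shown here
# is optimistic and is not an estimate of accuracy on unseen data.
hit_rate <- function(actual, predicted) {
  mean(as.character(actual) == as.character(predicted))
}

# Guarded so that the snippet also runs on its own
if (exists("NewPredictions") && exists("index1") && exists("df_rf")) {
  actual <- df_rf$Spend_Drop_over50pct[index1]
  print(table(Actual = actual, Predicted = NewPredictions))
  print(hit_rate(actual, NewPredictions))
  # Class probabilities (share of tree votes) instead of hard class labels
  print(head(predict(rf_mod, smp_rf, type = "prob")))
}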

 

 

 
