Multiple Histograms in R

A histogram is one of the most important visualizations for univariate analysis. The data vector, or the data frame column, must be numeric to plot a histogram in R or, for that matter, in any other tool.

A tutorial on plotting a histogram in R. In this blog, the focus is on using base R graphics functions to plot a beautiful histogram in R.

A step-by-step tutorial on histograms in R using ggplot2. In this blog, the ggplot2 package and its functions are used for plotting a histogram. Options for plotting a density curve and for overlaying a histogram with a density curve are also illustrated.

Basics of Histogram in R

The basic syntax of a histogram is simple:

*   hist : Function to plot a histogram
*   x : Input vector, must be numeric
*   breaks : Input for deciding the number of bars or breaks in a histogram
*   xlab : Label for the X axis
*   ylab : Label for the Y axis
*   main : Chart title
*   xlim : Range of X values
*   ylim : Range of Y values

The hist function also accepts many other parameters for customizing a histogram plot in R.

# Create a numeric vector of data points
df.hist <- rnorm(1000, mean=70, sd=20)
hist(x=df.hist,
     breaks=10,
     xlab="Input Values/Groups",
     ylab="Count",
     main="Histogram in R",
     col="blue",
     border="white")
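The xlim, ylim and breaks parameters listed above are not used in the example, so here is a minimal sketch of how they might be combined. The variable names (scores, edges, h) and the seed are assumptions made for illustration. Note that a single number passed to breaks is only a suggestion to hist, whereas a vector of break points is used exactly as given.

```r
# Reproducible sample data (the seed is an arbitrary choice for this sketch)
set.seed(42)
scores <- rnorm(1000, mean = 70, sd = 20)

# Explicit break points every 10 units, wide enough to cover all values
edges <- seq(floor(min(scores) / 10) * 10,
             ceiling(max(scores) / 10) * 10,
             by = 10)

# plot = FALSE returns the histogram object without drawing it
h <- hist(scores, breaks = edges, plot = FALSE)
h$breaks   # bin boundaries actually used
h$counts   # number of observations in each bin

# Draw the histogram with fixed axis ranges
hist(scores,
     breaks = edges,
     xlim = range(edges),
     ylim = c(0, max(h$counts) * 1.1),
     xlab = "Input Values/Groups",
     ylab = "Count",
     main = "Histogram with explicit breaks",
     col = "blue",
     border = "white")
```

Inspecting h$counts before plotting is a convenient way to choose a sensible ylim.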

Histogram using R

Read csv file and create a data frame

Next, we take an example and create histograms in R for all the numeric variables in an input CSV file.

Read the CSV file using the read.csv function.

In a real-life scenario, an input data frame or CSV file may have hundreds of numeric variables, and we want to create a histogram for each variable and store it in a particular folder. The steps are:

*   Take inputs: data frame, path where the histograms are to be stored, maximum number of breaks
*   Find the list and count of numeric columns
*   Loop over each numeric variable
*   Start the plotting device
*   Plot the histogram in R
*   Close the plotting device

Read csv file

invoiceDF <- read.csv(file="C:\\DnI\\DnI Institute\\Blog\\R\\histogram\\invoiceData1.csv",header=T)

# View of initial 6 rows
head(invoiceDF)
##   custNo InvoiceAmount2  AnnualSale InvoiceAmountTotal InvoiceAmount1
## 1      1      -11783.55 14210860132         -1583608.8    1673175.330
## 2      2       42248.11  3446817261          -584130.6    -909903.952
## 3      3      -20737.96  4360346859           470542.5    -159253.953
## 4      4       62529.48  1609197951         -1527911.7     955919.797
## 5      5      -33672.69 -4673802032           963853.6       8520.159
## 6      6      -21573.67  -372227640           260563.4     319893.629
# Type of variables
sapply(invoiceDF, function (x) class(x))
##             custNo     InvoiceAmount2         AnnualSale 
##          "integer"          "numeric"          "numeric" 
## InvoiceAmountTotal     InvoiceAmount1 
##          "numeric"          "numeric"

In this example, the input data is a CSV file with 5 columns (variables): custNo, AnnualSale, InvoiceAmountTotal, InvoiceAmount1 and InvoiceAmount2.

Four variables are numeric, and custNo is an integer. We want to plot a histogram for each numeric variable (excluding custNo).
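Picking out the columns to plot can be sketched on its own. The data frame below (demoDF) and its values are made up for illustration and are not the invoice data. The sketch shows two options: matching the class "numeric" exactly, which skips integer columns such as an ID, and using is.numeric, which also accepts integers, so the ID has to be dropped by name.

```r
# Small stand-in data frame; the values are made up for illustration
demoDF <- data.frame(custNo = 1:5,
                     AnnualSale = c(10.5, 20.1, 15.3, 8.7, 12.2),
                     Region = c("N", "S", "E", "W", "N"))

# Option 1: columns whose class is exactly "numeric"
# (integer columns such as custNo are skipped)
isDouble <- vapply(demoDF, function(x) identical(class(x), "numeric"), logical(1))
numCols <- names(demoDF)[isDouble]
numCols

# Option 2: is.numeric() is TRUE for integer columns too,
# so the ID column is removed by name
isNum <- vapply(demoDF, is.numeric, logical(1))
numColsNoId <- setdiff(names(demoDF)[isNum], "custNo")
numColsNoId
```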

First, we show the manual steps to get the histograms using R.

 

#Get type of input data frame columns
col.type <- data.frame("varType"=sapply(invoiceDF, function (x) class(x)))
# Convert rownames/variable names to a data frame column
col.type$varName <- rownames(col.type)
# Set row names of data frame to null
rownames(col.type) <- NULL


noBreaks <- 20

nCol <- nrow(col.type)
for(i in 1:nCol){
  if(col.type[i,1]=="numeric"){
    
    hist(invoiceDF[,col.type[i,2]],
         breaks=noBreaks,
         main=paste(" Histogram for ",col.type[i,2],sep=""),
         xlab=paste(" Breaks : ",col.type[i,2],sep=""),
         ylab="Count",
         col="green",
         border="red"
    )
  }
}

Histogram using R - all variables

The histograms are drawn in the plots window in R. If the plots have to be saved to a folder, we need to save them programmatically. Graphs can be saved as PDF as well.

R graphics devices can be used to store a plot in any of the BMP, JPEG, PNG and TIFF formats. The R functions for these are bmp, jpeg, png and tiff.

#path and name of image to be created
png(file="C:\\DnI\\DnI Institute\\Blog\\R\\histogram\\hist.png")
# Histogram Image
hist(x=df.hist,
     breaks=10,
     xlab="Input Values/Groups",
     ylab="Count",
     main="Histogram in R",
     col="blue",
     border="white")
#close the graphic device
dev.off()
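Saving to PDF follows the same open, plot, close pattern. The sketch below is an illustration under assumptions: the data frame (demoDF), the file name and the use of tempdir() as the output folder are all made up for this example. Unlike png with a single file name, a PDF device keeps every plot drawn before dev.off() as a separate page in one file.

```r
# Made-up data for illustration
set.seed(1)
demoDF <- data.frame(a = rnorm(200), b = runif(200))

# Output path in the session's temporary folder (an assumption for this sketch)
pdfPath <- file.path(tempdir(), "histograms.pdf")

# Open the PDF device; each hist() call below becomes one page
pdf(file = pdfPath)
for (v in names(demoDF)) {
  hist(demoDF[[v]],
       main = paste("Histogram for", v),
       xlab = v,
       ylab = "Count")
}
# Close the graphic device so the file is written
dev.off()

file.exists(pdfPath)
```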

Using the list.files function, one can find the list of files in a folder/directory.

list.files(path ="C:\\DnI\\DnI Institute\\Blog\\R\\histogram")
## [1] "AnnualSale.png"         "hist.png"              
## [3] "InvoiceAmount1.png"     "InvoiceAmount2.png"    
## [5] "InvoiceAmountTotal.png" "invoiceData.csv"       
## [7] "invoiceData.xlsx"       "invoiceData1.csv"      
## [9] "ram.png"

 

You will see that the hist.png image has been created in the folder.

Most of the important components are illustrated above. Below is a function that creates a histogram for each numeric variable in the input data frame.

NumVarHist <- function(dataF,noBins,fpath){
  
  # Get type of input data frame columns
  col.type <- data.frame("varType"=sapply(dataF, function (x) class(x)))
  # Convert rownames/variable names to a data frame column
  col.type$varName <- rownames(col.type)
  # Set row names of data frame to null
  rownames(col.type) <- NULL

  nCol <- nrow(col.type)
  for(i in 1:nCol){
    if(col.type[i,1]=="numeric"){
      # path and name of image
      fileN=paste(col.type[i,2],"png",sep=".")
      png(file=paste(fpath,fileN,sep="\\"))
      hist(dataF[,col.type[i,2]],
         breaks=noBins,
         main=paste(" Histogram for ",col.type[i,2],sep=""),
         xlab=paste(" Breaks : ",col.type[i,2],sep=""),
         ylab="Count",
         col="green",
         border="red"
      )
      #close the graphic device
      dev.off()
    }
  }
}

# Call function
NumVarHist(invoiceDF,15,"C:\\DnI\\DnI Institute\\Blog\\R\\histogram")

# List of histogram files created
# (pattern is a regular expression, so the dot is escaped)
list.files(path ="C:\\DnI\\DnI Institute\\Blog\\R\\histogram",
           pattern="\\.png$")
## [1] "AnnualSale.png"         "hist.png"              
## [3] "InvoiceAmount1.png"     "InvoiceAmount2.png"    
## [5] "InvoiceAmountTotal.png" "ram.png"

We could add one more argument to exclude a list of columns, even numeric ones. We could also add checks to handle incorrect inputs gracefully, e.g. if the data frame does not exist or the output path is not available.
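One possible shape for such an extension is sketched below. The function name NumVarHistSafe, the exclude argument and the demo data are hypothetical and not part of the original function. The sketch assumes a working png device, uses file.path instead of pasting a Windows separator, and returns the names of the plotted variables invisibly.

```r
# Hypothetical variant of NumVarHist with an exclude list and input checks
NumVarHistSafe <- function(dataF, noBins, fpath, exclude = character(0)) {
  # Stop early on incorrect inputs
  if (!is.data.frame(dataF)) stop("dataF must be a data frame")
  if (!dir.exists(fpath)) stop("Output folder does not exist: ", fpath)

  # Columns of class "numeric", minus the excluded ones
  isNum <- vapply(dataF, function(x) identical(class(x), "numeric"), logical(1))
  vars <- setdiff(names(dataF)[isNum], exclude)

  for (v in vars) {
    # One PNG file per variable
    png(filename = file.path(fpath, paste0(v, ".png")))
    hist(dataF[[v]],
         breaks = noBins,
         main = paste("Histogram for", v),
         xlab = v,
         ylab = "Count",
         col = "green",
         border = "red")
    # Close the graphic device
    dev.off()
  }
  invisible(vars)
}

# Usage with made-up data and a temporary output folder
set.seed(7)
demoDF <- data.frame(id = 1:50,
                     sales = rnorm(50, mean = 100, sd = 15),
                     cost = rnorm(50, mean = 60, sd = 10))
outDir <- file.path(tempdir(), "hist_demo")
dir.create(outDir, showWarnings = FALSE)
made <- NumVarHistSafe(demoDF, 15, outDir, exclude = "cost")
list.files(outDir, pattern = "\\.png$")
```

Here id is skipped because it is an integer column and cost is skipped because it is excluded by name, so only the sales histogram is written.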

 



 
