Heatmap using R

 Author: Niloy Ghosh


In many scenarios, visualising data on a world map can give interesting insights and perspective.

In the current example, we are using London Olympic Athlete Data for plotting. The results of clustering have been visualised on a world map to see how the clusters are distributed across countries.

London Olympic Athlete Data has been used for Clustering.

Tutorial on Clustering using R

The clusters were created from athlete height and weight. However, the clusters also differed significantly on the profiled variables, age and winning rate.

Profiling and Summary Statistics of the Clusters

Reading data into R

One of the first steps in plotting and visualisation is reading the data into R.

## Set up the working directory
setwd("~Learn R/training")
# Read Data
london <- read.csv("londonT.csv")
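
Before going further, it can help to confirm that the file was read as expected. The sketch below is only an illustration: it reads a tiny inline stand-in for londonT.csv (the rows and the Name column are invented) and checks for the two columns, cluster and Country, that the rest of this post relies on.

```r
# A tiny inline stand-in for londonT.csv; the rows are invented for illustration.
csv_text <- "Name,Country,cluster
A,India,1
B,India,1
C,France,2
D,Japan,1"

london_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Columns the rest of the walkthrough relies on
required_cols <- c("cluster", "Country")
missing_cols <- setdiff(required_cols, names(london_demo))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Quick look at the structure and the number of athletes per cluster
str(london_demo)
cluster_sizes <- table(london_demo$cluster)
print(cluster_sizes)
```

The same checks could be run on the real london data frame by replacing london_demo with london.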

Manipulate Data using R

The objective of this post is to count the athletes from each country and visualise the counts on the world map. The resulting heatmap shows at a glance which countries sent the most athletes to the London Olympics.

Because the clusters were created using K-means, the heatmap also shows how each cluster is spread across countries.

sqldf is used to count athletes based on cluster and country.

## load library
library(sqldf)
## get the data in a different form
mod_player_data <- sqldf("select cluster, Country, count(*) as num_of_players 
                         from london
                         group by 1, 2")
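
For readers who prefer not to use SQL, roughly the same grouping can be sketched in base R with aggregate. This is an alternative illustration, not the approach used in this post, and the toy rows below are invented.

```r
# Toy data with the two columns used by the query; rows are invented.
london_demo <- data.frame(
  cluster = c(1, 1, 1, 2, 2),
  Country = c("India", "India", "Japan", "India", "France"),
  stringsAsFactors = FALSE
)

# Count rows per (cluster, Country) pair, similar to
# "select cluster, Country, count(*) ... group by 1, 2"
london_demo$num_of_players <- 1L
counts_demo <- aggregate(num_of_players ~ cluster + Country,
                         data = london_demo, FUN = sum)

# Order by cluster, then country, for readability
counts_demo <- counts_demo[order(counts_demo$cluster, counts_demo$Country), ]
print(counts_demo)
```

The result has the same three columns (cluster, Country, num_of_players) as the sqldf output.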

Load Map Plotting Packages

Map plotting requires a few R packages; they should be installed first and then loaded as shown below.

library(maps)
library(mapdata)
library(RColorBrewer)
library(plyr)
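
If any of these packages is not installed yet, library() will stop with an error. A minimal sketch for checking this up front is given below; missing_packages is a hypothetical helper name introduced only for this illustration, and the install line is left commented out because it needs internet access.

```r
# Hypothetical helper: returns the names of packages that are not installed.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Packages used in this post
needed <- c("sqldf", "maps", "mapdata", "RColorBrewer", "plyr")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # Uncomment to install from CRAN (needs internet access):
  # install.packages(to_install)
}
```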

Plotting Clusters on the Worldmap

Get data by cluster and plot on the worldmap as a heatmap.

Steps used:

  • Subset data for each cluster
  • Create player count groups using cut function
  • Create color palette
  • Assign a color to each country
  • Draw on worldmap using map function

## get the cluster-wise data
cluster1 <- subset(mod_player_data, cluster == 1)


## sort the individual cluster data
cluster1 <- arrange(cluster1, num_of_players)

## assign intervals using cut function
## (Inf as the last break keeps the breaks unique and increasing even when the largest count is 50 or less)
plot_data1 <- cbind(cluster1, partition = cut(as.numeric(cluster1$num_of_players), c(0, 1, 5, 10, 15, 20, 50, Inf), labels = FALSE, right = TRUE))

## create a color palette
gama1 <- brewer.pal(7,"Reds")

## assign color for each country (vectorised lookup by partition index)
col1 <- gama1[plot_data1$partition]


## draw the map
map('worldHires',as.character(plot_data1[,2]),fill=TRUE,col=col1,plot=TRUE, cex = 15, exact=FALSE)
title("Athlete Counts by Country from London Olympics")
legend("bottomleft", c("1", "2 - 5", "6 - 10", "11 - 15", "16 - 20", "21 - 50", "more than 50"),border=gama1, fill = gama1, cex = 0.5, box.col = "white")
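
To see what the binning and colour lookup steps do, here is a small self-contained sketch with invented counts. It uses colorRampPalette from base R as a stand-in for brewer.pal(7, "Reds") so that it runs without extra packages, and it assumes Inf as the last break so the bins stay fixed whatever the largest count is.

```r
# Invented player counts for five countries
demo_counts <- c(1, 3, 10, 18, 120)

# Bin edges matching the legend: 1, 2-5, 6-10, 11-15, 16-20, 21-50, more than 50
breaks <- c(0, 1, 5, 10, 15, 20, 50, Inf)

# labels = FALSE returns the bin index (1 to 7); right = TRUE makes intervals (a, b]
bin_index <- cut(demo_counts, breaks = breaks, labels = FALSE, right = TRUE)

# Seven shades from light to dark red (stand-in for brewer.pal(7, "Reds"))
demo_palette <- colorRampPalette(c("mistyrose", "darkred"))(7)

# Vectorised lookup: one colour per country
demo_cols <- demo_palette[bin_index]
print(data.frame(count = demo_counts, bin = bin_index, colour = demo_cols))
```

A count of 10 falls in the third bin because the intervals are closed on the right, which is why the legend reads "6 - 10" rather than "5 - 10".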

The other clusters can be plotted on the world map in the same way.
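
One way to avoid repeating the steps by hand for every cluster is to wrap the data preparation in a function. The sketch below is an assumption about how that could look, not code from the original analysis: prepare_cluster is a hypothetical helper, the counts are invented, and a base R palette stands in for brewer.pal.

```r
# Hypothetical helper: bins and colours for one cluster.
# counts_df is expected to have columns cluster, Country, num_of_players.
prepare_cluster <- function(counts_df, cluster_id, palette) {
  one <- counts_df[counts_df$cluster == cluster_id, ]
  one <- one[order(one$num_of_players), ]
  one$partition <- cut(one$num_of_players,
                       breaks = c(0, 1, 5, 10, 15, 20, 50, Inf),
                       labels = FALSE, right = TRUE)
  one$colour <- palette[one$partition]
  one
}

# Invented counts standing in for mod_player_data
counts_demo <- data.frame(
  cluster = c(1, 1, 2, 2),
  Country = c("India", "Japan", "France", "India"),
  num_of_players = c(12, 1, 60, 4),
  stringsAsFactors = FALSE
)
demo_palette <- colorRampPalette(c("mistyrose", "darkred"))(7)

# One prepared data frame per cluster
cluster_ids <- sort(unique(counts_demo$cluster))
prepared <- lapply(cluster_ids,
                   function(id) prepare_cluster(counts_demo, id, demo_palette))
names(prepared) <- paste0("cluster", cluster_ids)
print(prepared)
```

Each element of prepared could then be passed to the map, title and legend calls used for cluster 1, with its Country column as the regions and its colour column as col.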


 
