Although R is mostly known for its capability in statistical analytics and data modeling but it has great capability to data preparation and query.

Here I present a sample case which is very much required in our day to day analysis. So I came through a scenario when I have to figure out the exception when any employee has worked consecutively for 7 or 8 days without taking any off.

To count the consecutive days I have taken below dataset with Employee ID and Attendance Date.

set.seed(1) df <- data.frame(Seq=seq(1:20),EmpID=sample(1001:1002,20,replace=T),Date=sample(seq(as.Date("2016-12-01"),as.Date("2016-12-10"),by="day"),20,replace=T))

Few data cleaning required before proceeding:

- Remove duplicates if any at the employee id and date level
- Create required dummy variable to help in counting the consecutive dates

df <- df[order(df$EmpID,df$Date),] df <- df[!duplicated(df[,c("EmpID","Date")]),] df$last_Date <- c(as.Date("1970-01-01",format="%Y-%m-%d"),df[1:nrow(df)-1,]$Date) df$diff <- df$Date - df$last_Date df$last_empID <- c(0,df[1:nrow(df)-1,]$EmpID) df$diff_EmpID <- ifelse(df$EmpID == df$last_empID,0,1) df$flag <- ifelse(df$diff==1 & df$diff_EmpID==0,0,1)

With the use of rle function of base package we can count the consecutive dates

consecutive_count <- function(x) { x <- !x rl <- rle(x) len <- rl$lengths v <- rl$values cumLen <- cumsum(len) z <- x # replace the 0 at the end of each zero-block in z by the # negative of the length of the preceding 1-block.... iDrops <- c(0, diff(v)) < 0 z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ] # ... to ensure that the cumsum below does the right thing. # We zap the cumsum with x so only the cumsums for the 1-blocks survive: x*cumsum(z) } df$Consecutive <- consecutive_count(df$flag) Output <- df[,c("Seq","EmpID","Date","Consecutive")]

Output:

Here employee 1001 has consecutively marked his attendance for 7 days from 2016-12-02 to 2016-12-08. Whereas the pattern for employee 1002 is different, 1002 never marked his attendance consecutively for more than 2 days

Post your related query for customized solution.

hello author,

The code written above solved my problem. A very efficient use of logic and gave the required result in very short time. Thanks for the great help.....