Count consecutive number of days with condition in R

Count consecutive number of days with condition in R

Although R is mostly known for its capability in statistical analytics and data modeling but it has great capability to data preparation and query.

Here I present a sample case which is very much required in our day to day analysis. So I came through a scenario when I have to figure out the exception when any employee has worked consecutively for 7 or 8 days without taking any off.

To count the consecutive days I have taken below dataset with Employee ID and Attendance Date.

set.seed(1)
df <- data.frame(Seq=seq(1:20),EmpID=sample(1001:1002,20,replace=T),Date=sample(seq(as.Date("2016-12-01"),as.Date("2016-12-10"),by="day"),20,replace=T))

Few data cleaning required before proceeding:

  1. Remove duplicates if any at the employee id and date level
  2. Create required dummy variable to help in counting the consecutive dates
df <- df[order(df$EmpID,df$Date),] df <- df[!duplicated(df[,c("EmpID","Date")]),]
df$last_Date <- c(as.Date("1970-01-01",format="%Y-%m-%d"),df[1:nrow(df)-1,]$Date)
df$diff <- df$Date - df$last_Date df$last_empID <- c(0,df[1:nrow(df)-1,]$EmpID)
df$diff_EmpID <- ifelse(df$EmpID == df$last_empID,0,1)
df$flag <- ifelse(df$diff==1 & df$diff_EmpID==0,0,1)

With the use of rle function of base package we can count the consecutive dates

consecutive_count <- function(x)  {
  x <- !x
  rl <- rle(x)
  len <- rl$lengths
  v <- rl$values
  cumLen <- cumsum(len)
  z <- x
  # replace the 0 at the end of each zero-block in z by the 
  # negative of the length of the preceding 1-block....
  iDrops <- c(0, diff(v)) < 0
  z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
  # ... to ensure that the cumsum below does the right thing.
  # We zap the cumsum with x so only the cumsums for the 1-blocks survive:
  x*cumsum(z)
}

df$Consecutive <- consecutive_count(df$flag)

Output <- df[,c("Seq","EmpID","Date","Consecutive")]

Output:

Here employee 1001 has consecutively marked his attendance for 7 days from 2016-12-02 to 2016-12-08. Whereas the pattern for employee 1002 is different, 1002 never marked his attendance consecutively for more than 2 days

Post your related query for customized solution.

1 thought on “Count consecutive number of days with condition in R”

  1. hello author,
    The code written above solved my problem. A very efficient use of logic and gave the required result in very short time. Thanks for the great help.....

Leave a Comment