In the last blog, overview to cox proportional model and building Cox Regression Model using R are discussed.
In this blog, the focus is on cox proportional hazards model interpretation or how to interpret Cox Regression Model output in R.
Cox Regression in R
In survival package in R, coxph function can be used for building Cox Proportional Hazard Model.
cox.ph.m <- coxph(Surv(Time2Event, attrition)~CUSTOMER_AGE+MoB, data=wsurv.data)
summary(cox.ph.m ) ## Call: ## coxph(formula = Surv(Time2Event, attrition) ~ CUSTOMER_AGE + ## MoB, data = wsurv.data) ## ## n= 1957, number of events= 607 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## CUSTOMER_AGE -0.00310 0.99690 0.00443 -0.7 0.48 ## MoB -0.27636 0.75854 0.01824 -15.2
The summary output gives information on formula used for the Cox Regression Model. Also, it gives sample information, in this sample size is 1957 observations and number of events(attrition) as 607.
Similar to other regression output, Cox Regression output has beta coefficient, standard error, statistics and P value.
Cox Regression has additional column for eb (exponentiation of beta coefficients).
Goodness of Fit Test for each variables is assess whether a variable is significant or not. It is done using Z statistics and its P Value.
For assessing overall Cox Regression Model, a few statistics are calculated and shown in the summary output.
Concordance : Similar to concordance level in logistic regression, concordance shows fraction of pairs in the sample, where the observations with the higher survival time has the higher probability of survival predicted by the model.
Rsquare : R2 is measure variance explained by the model under consideration.
Likelihood ratio test, Wald test and Score (logrank) test tests are using for testing Global Null Hypothesis Beta =0. In this example, P Value for each of these tests are 0, so the null hypothesis could not be accepted.
One of the great way to measure model predictive power is by comparing predicted vs actual results.
Prediction using Cox Regression
predict function can be used for finding out the predicted values using the Cox Model developed. In this example, same sample have been used for calculating predicted values.
If we want to find Hazard Probability, we can using the model we have built. Below function gives Hazard Probability
Now, we will discuss steps to get values using R.
type=“lp” calculate linear predicted (xb). We can then use the baseline probability for a time and do the exponential transformation of linear prediction.
lp.pred <- predict(cox.ph.m, type="lp", data=wsurv.data) # Baseline Function base <- basehaz(cox.ph.m) # Base value H0 = 0.025880965 at 59 Days # Predicted Value at time 59 days Pred.val <- 0.025880965*exp(lp.pred)
Using Predicted Probability, we can create predicted target level. Now, we have predicted probabilities at a time : 59 days. We can define target value (actual at a time 59 days). Similarly for different time periods, we can get base line probabilities using basehaz function output.
Pred.target <- ifelse(Pred.val>0.09,1,0)
Considering that the input data has attrition/censoring values at different time, we have to define observed target values at 59 Day for appropriate comparison.
target <- ifelse(wsurv.data$Time2Event <=59, 0, wsurv.data$attrition)
Model Accuracy: Confusion Matrix
Using Predicted and Observed target level, confusion matrix is created. For this example, accuracy level & confusion matrix is created at 59 days.
Overall accuracy of the model prediction is 79%. Since action will be on predicted group 1, the 61% of the 1s are actually ones.
Gains Table and Lift Chart
Gains Table can be created for each of the important time interval and validate accuracy of the model.
Gain Table for one of the time point is as follow.
Lift Chart shows the lift of the model at each of the deciles. For the selected time, the lift chart is as follow.
If we want to calculate Survival Probability, we can use below code.
install.packages("pec") library(pec) Pred_Prob <- predictSurvProb(cox.ph.m, newdata=wsurv.data, times=c(59)) # predicts the survival probability based on the model for the dataset -wsurv.data and for the time 59 days