This tutorial will show you how to set up and interpret a Cox proportional hazards model in Excel using the XLSTAT software.
Dataset to run a Cox proportional hazards model
An Excel sheet with both the data and results can be downloaded by clicking here.
The data are taken from Kalbfleisch and Prentice (The Statistical Analysis of Failure Time Data, Wiley, 2002, p. 119) and come from a clinical trial investigating the effect of covariates on the time to death of patients with lung cancer. Our goal is to determine which covariates influence survival time.
Cox proportional hazards model
The proportional hazards model was introduced by Cox (1972) and is built on a classical regression scheme: the hazard is modeled as a baseline hazard multiplied by the exponential of a linear combination of the covariates. The coefficients are estimated by maximizing a partial likelihood, a specific type of likelihood that does not require the baseline hazard to be specified. The estimates are obtained under the assumption that the proportional hazards hypothesis holds.
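For reference, the model and its partial likelihood can be written as follows when there are no tied event times. The notation is an assumption introduced here, not taken from XLSTAT: h_0 is the baseline hazard, x_i the covariate vector of patient i, \beta the coefficient vector, \delta_i the status indicator (1 for an observed death), and R(t_i) the set of patients still at risk just before time t_i.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta^\top x),
\qquad
L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp(\beta^\top x_i)}{\sum_{j \in R(t_i)} \exp(\beta^\top x_j)}
```

The baseline hazard cancels out of each ratio, which is why the coefficients can be estimated without specifying it.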
In the dataset, the daysurv variable holds the time data, and the status variable is the censoring indicator (1 for death, 0 for censored). The covariates are the performance status of the patient at the beginning of the study (perfstatus), the age of the patient at the beginning of the study (age), the number of months between the lung cancer diagnosis and the beginning of the study (month), and the presence of an earlier treatment.
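To make the layout of the data concrete, here is a minimal Python sketch of one row per patient. The field names daysurv, status, perfstatus, age and month follow the description above; prior_treatment is a hypothetical name, since the tutorial does not name that column, and the three rows are made-up values, not records from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    daysurv: float        # follow-up time in days
    status: int           # 1 = death observed, 0 = censored
    perfstatus: float     # performance status at the start of the study
    age: float            # age at the start of the study
    month: float          # months since diagnosis at the start of the study
    prior_treatment: int  # hypothetical column name: 1 = earlier treatment, 0 = none


# Made-up rows for illustration only; these are not values from the dataset.
patients = [
    Patient(100, 1, 60, 65, 5, 0),
    Patient(250, 0, 80, 52, 12, 1),
    Patient(30, 1, 40, 70, 3, 0),
]

# A censored patient contributes follow-up time but no observed death.
n_events = sum(p.status == 1 for p in patients)
n_censored = sum(p.status == 0 for p in patients)
print(f"{n_events} deaths, {n_censored} censored")
```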
Setting up a Cox proportional hazards model
After opening XLSTAT, select the XLSTAT / Survival analysis / Cox Proportional hazards model command.
Once you've clicked on the button, the Cox proportional hazards model dialog box will appear. Select the data on the Excel sheet. The "Time data" field corresponds to the time at which each patient either died or was censored. The "Status indicator" describes whether a patient died (event code = 1) or was censored (censored code = 0) at that time.
The covariates are all quantitative and can be selected in the quantitative box.
Other options are available on the other tabs of the dialog box, such as stratification of the model, computation of individual residuals, and selection of the tie handling method.
The computations begin once you have clicked on OK. The results will then be displayed on a new Excel sheet.
Interpreting the results of a Cox proportional hazards model
The first table displays a summary of the data. We can see that the number of observed times (time steps) differs from the number of observations, which means that some observations share the same time (ties). A tie handling method is therefore required (Breslow's method is the default, but Efron's method is also available).
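As a rough illustration of how the two tie handling methods differ, the sketch below computes the partial log-likelihood under both conventions in plain Python. The function name partial_loglik, its arguments, and the toy data are assumptions made for illustration; this is not XLSTAT's implementation.

```python
import math


def partial_loglik(times, events, X, beta, ties="breslow"):
    """Cox partial log-likelihood with Breslow or Efron handling of tied event times."""
    if ties not in ("breslow", "efron"):
        raise ValueError("ties must be 'breslow' or 'efron'")
    n = len(times)
    # Linear predictor beta'x and relative risk exp(beta'x) for each subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    risk = [math.exp(e) for e in eta]
    loglik = 0.0
    # Loop over the distinct times at which at least one event occurred.
    for t in sorted({times[i] for i in range(n) if events[i] == 1}):
        dead = [i for i in range(n) if times[i] == t and events[i] == 1]
        at_risk_sum = sum(risk[j] for j in range(n) if times[j] >= t)
        dead_sum = sum(risk[i] for i in dead)
        d = len(dead)
        loglik += sum(eta[i] for i in dead)
        for l in range(d):
            if ties == "breslow":
                # Breslow: every tied death uses the full risk set.
                denom = at_risk_sum
            else:
                # Efron: progressively remove a fraction of the tied deaths.
                denom = at_risk_sum - (l / d) * dead_sum
            loglik -= math.log(denom)
    return loglik


# Toy data (made up): two deaths tied at time 1, one censored at 2, one death at 3.
times = [1, 1, 2, 3]
events = [1, 1, 0, 1]
X = [[0.0], [1.0], [0.0], [1.0]]

breslow = partial_loglik(times, events, X, [0.0], ties="breslow")
efron = partial_loglik(times, events, X, [0.0], ties="efron")
print(f"Breslow: {breslow:.4f}, Efron: {efron:.4f}")
```

When no event times are tied, the two methods give the same value, so the choice only matters for data like those of this tutorial.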
Descriptive statistics are then displayed for each variable. In our example all variables are quantitative; if qualitative variables are selected, a different table is displayed. For details on the treatment of qualitative variables, please refer to the XLSTAT help.
The next table gives several indicators of the quality of the model (goodness of fit). These results play the same role as the R² and the analysis of variance table in linear regression and ANOVA. The most important value to look at is the probability of the Chi-square test on the log-likelihood ratio. This test is analogous to Fisher's F test: it evaluates whether the variables bring significant information by comparing the model as defined with a simpler model in which the covariates have no effect. In this case, as the probability is lower than 0.0001, we can conclude that the variables bring significant information.
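The logic of this test can be sketched in a few lines of Python. The statistic is twice the difference between the log-likelihood of the fitted model and that of the model without covariates, compared with a Chi-square distribution whose degrees of freedom equal the number of covariates (4 here). The function names and the two log-likelihood values are hypothetical; they are not taken from the XLSTAT output. To stay within the standard library, the tail probability uses a closed form that is only valid for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_null, loglik_model, df):
    """Return the likelihood ratio statistic and its chi-square p-value."""
    stat = 2.0 * (loglik_model - loglik_null)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods, for illustration only.
stat, p_value = likelihood_ratio_test(loglik_null=-500.0, loglik_model=-480.0, df=4)
print(f"LR statistic = {stat:.1f}, p-value = {p_value:.2e}")
```

A p-value below the chosen threshold leads to the same conclusion as in the text: the covariates, taken together, improve on the model with no covariates.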
The following table gives details on the model. This table is helpful in understanding the effect of the various variables.
In this table, the probabilities associated with the Chi-square tests show that the variable with the greatest influence on survival time is perfstatus. The performance status of the patient at the beginning of the study therefore has a significant effect on survival time. The hazard ratio is obtained as the exponential of the parameter estimate.
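The relationship between a parameter estimate and its hazard ratio can be checked with a short Python sketch. The coefficient used below is a hypothetical value chosen for illustration, not the estimate reported by XLSTAT, and hazard_ratio is a helper name introduced here.

```python
import math


def hazard_ratio(coef, units=1.0):
    """Multiplicative change in the hazard for a `units` increase in a covariate."""
    return math.exp(coef * units)


# Hypothetical coefficient, chosen only for illustration (not from the XLSTAT output).
example_coef = -0.03

hr_one_unit = hazard_ratio(example_coef)
hr_ten_units = hazard_ratio(example_coef, units=10)
print(f"HR per unit: {hr_one_unit:.4f}, HR per 10 units: {hr_ten_units:.4f}")
```

A hazard ratio below 1 corresponds to a negative coefficient and means the hazard decreases as the covariate increases; a ratio above 1 means the opposite.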
Finally, the cumulative hazard function is displayed:
This study has shown that the only covariate with a significant impact is the performance status. Because its coefficient is negative, the hazard decreases as the performance status increases: patients with a higher performance status tend to survive longer. The other covariates do not have a significant effect on survival time.