Survival Analysis: Part II

Published in CodeX

May 10, 2021

In this post, we’ll put the survival plots and models we discussed in Part I into practice; the most interesting part of learning is its practical implementation. First, let’s look at our dataset and understand it.

This dataset is of prostate cancer:

Details of the columns:

Patient: The unique id of patient
Treatment: There are two types of treatment Type 1 and Type 2.
Status: The event status of the patient (1 for an event, 0 for censoring).
Time: The follow-up time in months that goes with the status column.
Age: The age of the patient.
Size: Size of cancer tumor.

I am performing the analysis in R. For the survival analysis, we need to install certain libraries:

Let’s look at the time and status:

From the above, we see that some of the observations have a plus sign at the end. That plus sign indicates censoring. It is standard practice in survival analysis to have a status column coded as either 0 or 1: 1 encodes an event and 0 indicates censoring. The data sticks to that convention and is marked accordingly.
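To make the plus-sign convention concrete, here is a minimal sketch of how a time column and a 0/1 status column combine into that notation. It is written in plain Python rather than R so it runs without extra libraries. The function name and the toy values are my own assumptions, not the prostate cancer data.

```python
def format_survival_times(times, statuses):
    """Render (time, status) pairs, adding '+' to censored observations."""
    labels = []
    for time, status in zip(times, statuses):
        # status 1 = event observed, status 0 = censored (shown with '+')
        labels.append(f"{time}" if status == 1 else f"{time}+")
    return labels


# Hypothetical toy values, not taken from the prostate cancer dataset.
toy_times = [65, 61, 60, 58, 51]
toy_status = [0, 0, 1, 0, 1]
print(format_survival_times(toy_times, toy_status))
```

A value with a plus sign only tells us the patient was still event-free at that time; the true event time is unknown and lies somewhere after it.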

1. Kaplan Meier Plot

Now we can pass this survival fit model directly into a plot to get the Kaplan Meier plot.

Time, in months, is indicated on the x-axis. The survival probability is shown on the y-axis. Censoring events are indicated with marks on the curve.
These marks appear on the plot because we set mark.time to TRUE. The death events are indicated by the steps; each step shows a decline in survival probability.
The plot also shows 95 percent confidence intervals as the upper and lower bands.
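The steps in the plot come from the Kaplan Meier estimator: at every event time, the current survival probability is multiplied by (1 - deaths / number at risk). Censored patients never cause a step; they only leave the risk set. The sketch below shows that calculation in plain Python on made-up data. The function name and the toy values are assumptions for illustration, and this is not the R code behind the plot.

```python
def kaplan_meier(times, statuses):
    """Return [(event_time, survival_probability), ...] for the KM estimator."""
    # The curve only changes at times where at least one event was observed.
    event_times = sorted({t for t, s in zip(times, statuses) if s == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone whose time is >= t is still at risk just before t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, s in zip(times, statuses) if u == t and s == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: time in months, status 1 = event, 0 = censored.
toy_times = [2, 3, 3, 5, 8, 9]
toy_status = [1, 1, 0, 1, 0, 1]
for month, probability in kaplan_meier(toy_times, toy_status):
    print(month, round(probability, 3))
```

Each printed row corresponds to one downward step of the curve.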

2. Stratified Kaplan Meier Plot

Stratified Kaplan Meier Plot for prostate cancer

We can easily identify which line belongs to which treatment group. In this case, it looks pretty obvious that treatment 2 has better survival chances.
For example, at 40 months treatment group 1 is already as low as around 70 percent survival, whereas the alternative is still at a rather high 95 percent.
After about 65 months the curves approach each other and the survival probabilities become roughly equal. Also, note the rather wide confidence intervals for treatment 1.
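A stratified plot is simply one Kaplan Meier curve per group. The sketch below splits made-up data by a treatment label and adds plain Greenwood confidence bands, which shows why a group with few patients at risk gets wide intervals. All names and values here are assumptions for illustration. R's survfit can use a different interval type, so the numbers need not match what the plot shows.

```python
import math


def km_with_ci(times, statuses, z=1.96):
    """KM curve with plain (linear) Greenwood bands: (time, S, lower, upper)."""
    survival, var_sum, rows = 1.0, 0.0, []
    for t in sorted({t for t, s in zip(times, statuses) if s == 1}):
        n = sum(1 for u in times if u >= t)  # number at risk
        d = sum(1 for u, s in zip(times, statuses) if u == t and s == 1)
        survival *= 1.0 - d / n
        if n > d:
            var_sum += d / (n * (n - d))  # Greenwood's running sum
        se = survival * math.sqrt(var_sum)
        lower = max(0.0, survival - z * se)
        upper = min(1.0, survival + z * se)
        rows.append((t, survival, lower, upper))
    return rows


def stratified_km(times, statuses, groups):
    """Fit one KM curve per distinct group label."""
    curves = {}
    for label in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == label]
        curves[label] = km_with_ci([times[i] for i in idx],
                                   [statuses[i] for i in idx])
    return curves


# Hypothetical toy data: a small group 1 and a larger group 2.
toy_times = [5, 10, 15, 8, 12, 20, 25, 30, 35]
toy_status = [1, 1, 0, 0, 0, 1, 0, 0, 0]
toy_treatment = [1, 1, 1, 2, 2, 2, 2, 2, 2]
curves = stratified_km(toy_times, toy_status, toy_treatment)
for label, rows in curves.items():
    for t, s, lower, upper in rows:
        print(label, t, round(s, 3), round(lower, 3), round(upper, 3))
```

With only three patients in group 1, a single event moves the estimate a lot, and the band around it is correspondingly wide.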

3. The Log Rank Test

The log rank test tells us whether there is a significant survival difference between the two treatment groups. These groups are identified in the treatment column.

The p-value, in this case, is non-significant. That means we have to stick with the null hypothesis, which says that there is no difference in the survival probabilities of the two treatment groups.
For each group, we get the total number of subjects. We also get the observed number of events per group as well as the number of events expected under the null hypothesis.
For the groups in prost$treatment, the test then calculates the chi-square statistic from these observed and expected counts, along with the variance per group.
So we can see it is actually fairly simple to compare the survival probabilities of those two groups.
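To show where the observed counts, expected counts and chi-square statistic come from, here is a minimal two-group log rank test in plain Python. At each event time, the expected events for a group are the total events multiplied by that group's share of the risk set. The function name and toy data are assumptions for illustration, and the sketch handles exactly two groups.

```python
import math


def logrank_test(times, statuses, groups):
    """Two-group log rank test: (observed, expected, chi_square, p_value)."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    first, second = labels
    observed = {first: 0.0, second: 0.0}
    expected = {first: 0.0, second: 0.0}
    variance = 0.0
    for t in sorted({t for t, s in zip(times, statuses) if s == 1}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == first)
        d = sum(1 for u, s in zip(times, statuses) if u == t and s == 1)
        d1 = sum(1 for u, s, g in zip(times, statuses, groups)
                 if u == t and s == 1 and g == first)
        observed[first] += d1
        observed[second] += d - d1
        expected[first] += d * n1 / n
        expected[second] += d * (n - n1) / n
        if n > 1:
            # Hypergeometric variance of the events in the first group.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    diff = observed[first] - expected[first]
    chi_square = diff ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return observed, expected, chi_square, p_value


# Hypothetical toy data, not the prostate cancer dataset.
toy_times = [6, 7, 10, 15, 19, 25, 9, 13, 18, 23, 28, 31]
toy_status = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
toy_groups = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
print(logrank_test(toy_times, toy_status, toy_groups))
```

A large p-value means the observed counts are close to what we would expect if both groups shared the same survival curve.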

4. Cox Proportional Hazards Model and Parametric Models

The ultimate purpose of the Cox proportional hazards method is to assess how different factors (covariates) in our dataset impact the event of interest.

Hazard Rate: The instantaneous rate at which the event occurs at a given time, for subjects who have survived up to that time.
Covariates: External factors that influence the probability of an event. These external factors are called covariates in a proportional hazards model.
Concordance: The probability that, of two randomly chosen observations, the model correctly identifies the one with the higher risk.
In brief, we want the concordance to be as close to 1 as possible. A value of 0.5 is no better than random guessing, and anything lower is a very bad model.
The values exp(bi) are called the hazard ratios (HR). An HR greater than 1 indicates that as the value of the ith covariate increases, the event hazard increases, and thus the duration of survival decreases.
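Concordance is easier to grasp with a small calculation. The sketch below is a simplified version of Harrell's concordance index in plain Python: it looks at every pair where one patient had an observed event before the other patient's time, and checks whether the model gave that patient the higher risk score. The function name, the risk scores and the toy data are assumptions for illustration, and tied times are simply skipped here.

```python
def concordance_index(times, statuses, risk_scores):
    """Share of comparable pairs in which the earlier event has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event strictly
            # before j's (event or censoring) time.
            if statuses[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: higher score should mean shorter survival.
toy_times = [5, 12, 20, 33, 41]
toy_status = [1, 1, 0, 1, 0]
toy_risk = [2.1, 1.4, 1.6, 0.3, 0.2]
print(concordance_index(toy_times, toy_status, toy_risk))
```

A model that ranks every comparable pair correctly scores 1, a model that assigns everyone the same risk scores 0.5, and a model that ranks every pair the wrong way round scores 0.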

Results from Cox Proportional Hazard Model

If we check the summary output of our Cox model, we first see the model formula and the number of events versus the total number of observations.
We have all the covariates here, including their coefficients for the model, the corresponding intervals, and the significance.
Stars next to the p-values indicate that these covariates are significant for the model.
In our case, we have two variables, H size and index, that are significant; the index variable in particular is important for the model.
This information is actually pretty interesting, since with it we can simplify the model and potentially eliminate covariates that are not significant.
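To read one row of such a summary, it helps to see how the pieces relate: the hazard ratio is exp(coefficient), the 95 percent interval is exp(coefficient ± 1.96 × standard error), and the p-value comes from the z statistic coefficient / standard error. The sketch below reproduces that arithmetic in plain Python. The function name and the coefficient and standard error values are made up for illustration and are not results from the prostate cancer model.

```python
import math


def summarize_coefficient(coef, std_err, z_crit=1.96):
    """Turn a Cox coefficient and its standard error into HR, CI, z and p."""
    z = coef / std_err
    return {
        "hazard_ratio": math.exp(coef),
        "lower_95": math.exp(coef - z_crit * std_err),
        "upper_95": math.exp(coef + z_crit * std_err),
        "z": z,
        # Two-sided p-value from the standard normal distribution.
        "p_value": math.erfc(abs(z) / math.sqrt(2.0)),
    }


# Hypothetical coefficient and standard error, for illustration only.
row = summarize_coefficient(coef=0.5, std_err=0.2)
for name, value in row.items():
    print(name, round(value, 4))
```

If the interval for the hazard ratio contains 1, the covariate is not significant at the 5 percent level, which is the kind of covariate we might drop when simplifying the model.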

5. Survival Trees

In survival analysis, a survival tree is a decision tree fitted on survival data. It allows covariates to be used, much like in a Cox proportional hazards regression model.
The result we are modeling is the survival probability. Many of you already know decision trees or have at least seen them.
Often you will see a visual representation of such a tree: it is essentially a flowchart-like structure in which each internal node represents a test on one selected variable.
Note that time is on the x-axis of the plot and the average survival probability is on the y-axis.
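To give a feel for what "a test on one selected variable" means for survival data, here is a sketch of a single tree split in plain Python. It tries every threshold on one covariate and keeps the one whose two child groups differ most according to the log rank statistic. This is one common splitting idea, used here as an assumption; the R package behind the plots may choose its splits differently. The function names and toy data are also made up for illustration.

```python
def logrank_chi_square(times, statuses, in_left):
    """Log rank chi-square between the 'left' and 'right' child nodes."""
    observed_left = expected_left = variance = 0.0
    for t in sorted({t for t, s in zip(times, statuses) if s == 1}):
        n = sum(1 for u in times if u >= t)
        n_left = sum(1 for u, left in zip(times, in_left) if u >= t and left)
        d = sum(1 for u, s in zip(times, statuses) if u == t and s == 1)
        d_left = sum(1 for u, s, left in zip(times, statuses, in_left)
                     if u == t and s == 1 and left)
        observed_left += d_left
        expected_left += d * n_left / n
        if n > 1:
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    if variance <= 0:
        return 0.0
    return (observed_left - expected_left) ** 2 / variance


def best_split(times, statuses, covariate):
    """Return (threshold, score) of the best single split on one covariate."""
    best_threshold, best_score = None, 0.0
    values = sorted(set(covariate))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        in_left = [x <= threshold for x in covariate]
        score = logrank_chi_square(times, statuses, in_left)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data: small tumors survive long, large tumors fail early.
toy_size = [2, 3, 4, 10, 12, 15]
toy_times = [50, 60, 55, 10, 12, 8]
toy_status = [0, 1, 0, 1, 1, 1]
print(best_split(toy_times, toy_status, toy_size))
```

A full survival tree repeats this search inside each child node and then draws one survival curve per leaf.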

We have the survival probabilities on the y-axis and the time in months is on the x-axis.

The survival study on the prostate cancer dataset will come to a close here. I hope it has clarified the principle of survival analysis for you.

Hope this helps :) Follow me if you like my posts. Please feel free to leave comments for any clarifications or questions. Happy learning 😃

Feel free to connect: LinkedIn

Written by Afaf Athar