A machine learning framework for predicting population-wise rwTTD
Termination of a specific treatment can be treated as survival data, in which an observed termination of treatment is an event and a patient without an observed termination is censored (Fig. 1a) 1. However, existing survival models predict only an individual patient's likelihood of survival. As shown below, aggregating individual predictions does not reproduce the profile of a population. We therefore designed an approach that predicts the termination curve of a population.
We started by producing the gold standard (expected future time) for each individual in the training population. The expected future time is defined as the time expected to elapse until the treatment is terminated, measured from the point at which we make the prediction. All clinical data observed before this point are available for making predictions. Two cases are considered. In the first case, the termination time of the treatment is known (an 'event' data point), and the patient's future time is defined as the time between the end of the observation window, from which we collect the feature data used for prediction, and the drug termination time. In the second case, the termination time of the treatment is unknown for the patient (a 'censored' data point), and we infer the expected future time from the survival curve derived from the training population. Here we use a popular method, the Kaplan–Meier curve, to represent the termination ratio of the training set 13. The expected future time is then composed of two parts. The first part is the time already elapsed, i.e., from the end of the observation time window to the last contact time point, because we know with certainty that the patient continued drug treatment until the last contact time point. The second part is the expected time after the last contact time point, which is calculated as the integral of the curve beyond the last contact time point divided by the ratio given by the curve at the last contact time point (Fig. 1a). Adding the first and second parts gives the expected future time for a censored individual. This approach generates a gold standard of expected future time for each individual, on which any kind of base learner can be trained. Later, we explain how a nested training scheme can extrapolate and aggregate the individual predictions to infer the terminated ratio curve for a population.
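The two cases above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study. It assumes that the curve is the Kaplan–Meier estimate of the fraction of the training population still on treatment, that all times share one time axis, and that the integral is truncated at a horizon `t_max`. The function names, the argument `t_obs_end` and the horizon `t_max` are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: distinct event times and the curve value after each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                 # still under observation at t
        d = np.sum((times == t) & (events == 1))     # terminations observed at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def km_value(t, event_times, surv):
    """Value of the right-continuous step curve at time t."""
    idx = np.searchsorted(event_times, t, side="right") - 1
    return 1.0 if idx < 0 else float(surv[idx])

def expected_future_time(t_last, event, t_obs_end, event_times, surv, t_max):
    """Gold-standard future time for one individual.

    event == 1: termination observed at t_last.
    event == 0: censored at t_last; add the conditional expected remaining time.
    t_max is an assumed integration horizon (not specified in the text).
    """
    elapsed = t_last - t_obs_end                     # first part: time known with certainty
    if event == 1:
        return elapsed
    s_last = km_value(t_last, event_times, surv)
    if s_last <= 0.0 or t_last >= t_max:
        return elapsed
    # Second part: area under the step curve on [t_last, t_max], divided by its value at t_last.
    inner = event_times[(event_times > t_last) & (event_times < t_max)]
    knots = np.concatenate(([t_last], inner, [t_max]))
    heights = np.array([km_value(k, event_times, surv) for k in knots[:-1]])
    area = np.sum(heights * np.diff(knots))
    return elapsed + area / s_last
```

With the resulting values as regression targets, any base learner can be fitted to the features collected in the observation window.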
We simulated drug termination data for a population, following a survival study 14 (Fig. 1b). We generated a population of n individuals, where the termination rate of each individual is drawn from \(p \sim N(p_{\text{mean}}, \sigma)\), and we force the minimal termination rate to be zero. We hypothesize that the probability p that a patient terminates the treatment on a single day is driven by a series of m predictive features f. In reality, these features can be demographic information, clinical measurements or any claims data, as will be shown with the real-world drug treatment experiment below. In this simulation experiment, we let individual feature values correlate with p by:
\(v_{kj}=p_{k}\times f_{j}(1+\theta\times\epsilon_{j})\)
where \(v_{kj}\) is the value of feature j for patient k, \(p_{k}\) is the termination rate of patient k, and \(f_{j}\) is the scaling factor of feature j, uniformly drawn from \([0, \alpha]\). Each feature j is parameterized by a noise factor \(\epsilon_{j}\), uniformly drawn from \([0, \beta]\). As \(\beta\) increases, the larger sampling range results in less correlation between the feature and the expected future time. The value of the jth feature of the kth sample, \(v_{kj}\), is further parameterized by \(\theta\), which is uniformly sampled from \([-0.5, 0.5]\).
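The feature model can be illustrated with a short NumPy sketch. It is an assumption-laden reading of the definitions above, not the code used for the simulation: the function name `simulate_features` and the `seed` argument are hypothetical, and \(\theta\) is assumed to be drawn independently for every patient-feature pair.

```python
import numpy as np

def simulate_features(n, m, p_mean, sigma, alpha, beta, seed=0):
    """Draw termination rates p_k and features v_kj = p_k * f_j * (1 + theta * eps_j)."""
    rng = np.random.default_rng(seed)
    # Termination rate per individual, with the minimum forced to zero.
    p = np.clip(rng.normal(p_mean, sigma, size=n), 0.0, None)
    f = rng.uniform(0.0, alpha, size=m)            # scaling factor per feature
    eps = rng.uniform(0.0, beta, size=m)           # noise factor per feature
    # Assumption: one theta per (patient, feature) pair.
    theta = rng.uniform(-0.5, 0.5, size=(n, m))
    v = p[:, None] * f[None, :] * (1.0 + theta * eps[None, :])
    return p, f, eps, v
```

Setting `beta` to zero removes the noise term so that every feature is an exact multiple of \(p_{k}\); larger values of `beta` weaken that relationship.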
We set the maximal allowed observation date of all individuals to \(\delta_{\max}\). Between \([0, \delta_{\max}]\), we create a binomially distributed vector of length \(\delta_{\max}\), \(\delta_{k}\sim B(\delta_{\max}, p_{k})\), for each individual k. Thus, the higher the \(p_{k}\), the more likely the individual is to terminate treatment, with the uncertainty defined by the binomial distribution. In this binomially sampled sequence, the first appearance of 1 determines the termination date \(t_{\text{term}}\). Next, for each individual, we uniformly sample a value from \([0, \delta_{\max}]\) and define it as the censoring date \(t_{\text{censor}}\). If \(t_{\text{term}} > t_{\text{censor}}\), the last observation time is \(t_{\text{last}} = t_{\text{censor}}\) and the status is 0 (a censored point; no termination date is observed); otherwise, \(t_{\text{last}} = t_{\text{term}}\) with status = 1 (termination observed and the date defined).
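The termination and censoring steps can be sketched as follows. This is one possible reading of the procedure, not the simulation code itself. It assumes that the binomial vector is a sequence of independent daily 0/1 draws with success probability \(p_{k}\), that dates are integer days, that rates above 1 are clipped to 1 so that they remain valid probabilities, and that an individual whose sequence contains no 1 is treated as censored. The function name `simulate_termination` is hypothetical.

```python
import numpy as np

def simulate_termination(p, delta_max, seed=0):
    """Return (t_last, status) for each individual given daily termination rates p."""
    rng = np.random.default_rng(seed)
    # Assumption: clip to [0, 1] so each rate is a valid daily probability.
    p = np.clip(np.asarray(p, dtype=float), 0.0, 1.0)
    n = p.shape[0]
    # One row per individual, one column per day; True marks a termination draw.
    daily = rng.random((n, delta_max)) < p[:, None]
    has_event = daily.any(axis=1)
    # Day (1-based) of the first 1; beyond the window if no 1 was drawn (assumption).
    t_term = np.where(has_event, daily.argmax(axis=1) + 1, delta_max + 1)
    # Censoring date drawn uniformly from the integer days 0..delta_max.
    t_censor = rng.integers(0, delta_max + 1, size=n)
    # t_term > t_censor: censored (status 0); otherwise termination observed (status 1).
    status = (t_term <= t_censor).astype(int)
    t_last = np.where(status == 1, t_term, t_censor)
    return t_last, status
```

The returned pairs have the same form as the real-world data described earlier: a last observation time and a 0/1 status for each individual.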