Abstract

Recurrent events are common in clinical, healthcare, social and behavioral studies. A recent analysis framework for potentially censored recurrent event data is to construct a censored longitudinal data set consisting of times to the ﬁrst recurrent event in multiple prespeciﬁed follow-up windows of length τ. With the staggering number of potential predictors being generated from genetic, -omic, and electronic health records sources, machine learning approaches such as the random forest are growing in popularity, as they can incorporate information from highly correlated predictors with non-standard relationships. In this paper, we bridge this gap by developing a random forest approach for dynamically predicting probabilities of remaining event-free during a subsequent τ-duration follow-up period from a reconstructed censored longitudinal data set. We demonstrate the increased ability of our random forest algorithm for predicting the probability of remaining event-free over a τ duration follow-up period when compared to the recurrent event modeling framework of Xia et al. (2020) in settings where association between predictors and recurrent event outcomes is complex in nature. The proposed random forest algorithm is demonstrated using recurrent exacerbation data from the Azithromycin for the Prevention of Exacerbations of Chronic Obstructive Pulmonary Disease (Albert et al., 2011).

Keywords Censored data; Longitudinal data; Pseudo-observations; Random forest; Recurrent events