The use of repeated measurement data from patients to improve the classification ability of prediction models is a key methodological issue in the current development of clinical prediction models. This study aims to investigate the statistical modeling approach of the two-stage model in developing prediction models for non-time-varying outcomes using repeated measurement data. Using the prediction of the risk of severe postpartum hemorrhage as a case study, this study presents the implementation process of the two-stage model from various perspectives, including data structure, basic principles, software utilization, and model evaluation, to provide methodological support for clinical investigators.
ObjectiveTo explore the utilization of longitudinal data in constructing non-time-varying outcome prediction models and to compare the impact of different modeling approaches on prediction performance. MethodsClinical predictors were selected using univariate analysis and Lasso regression. Non-time-varying outcome prediction models were developed based on latent class trajectory analysis, the two-stage model, and logistic regression. Internal validation was performed using Bootstrapping resampling, and model performance was evaluated using ROC curves, PR curves, sensitivity, specificity and other relevant metrics. ResultsA total of 49 629 pregnant women were included in the study, with mean age of 31.42±4.13 years and pre-pregnancy BMI of 20.91±2.62kg/m². Fourteen predictors were incorporated into the final model. Prediction models utilizing longitudinal data demonstrated high accuracy, with AUROC values exceeding 0.90 and PR-AUC values greater than 0.47. The two-stage model based on late-pregnancy hemoglobin data showed the best performance, achieving AUROC of 0.93 (95%CI 0.92 to 0.94) and PR-AUC of 0.60 (95%CI 0.56 to 0.64). Internal validation confirmed robust model performance, and calibration curves indicated a good agreement between predicted and observed outcomes. ConclusionFor the longitudinal data, the two-stage model can well capture the dynamic change trajectory of the longitudinal data. For different clinical outcomes, the predictive value of repeated measurement data is different.