• 1. Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, P. R. China;
• 2. Shenzhen Health Development Research and Data Management Center, Shenzhen 518028, P. R. China;
• 3. National Population Health Data Center, Chinese Academy of Medical Sciences, Beijing 100730, P. R. China.
HE Xiaofeng, Email: 0912hexf@163.com; LI Jiao, Email: li.jiao@imicams.ac.cn

Objective To review the application of machine learning models to survival data with non-proportional hazards (NPH) and to provide a methodological reference for analyzing large-scale, high-dimensional survival data.

Methods First, the concept of NPH and the methods for testing it were outlined. Next, the advantages and disadvantages of machine learning-based survival analysis methods for NPH data were summarized from the relevant literature. Finally, a case study using real-world clinical data applied two ensemble machine learning models and two deep learning models to NPH survival data, estimating the risk of death within 30 days among stroke patients in the intensive care unit (ICU).

Results Eight commonly used machine learning-based survival analysis methods for NPH data were identified: five traditional machine learning models (e.g., random survival forest) and three deep learning models based on artificial neural networks (e.g., DeepHit). In the case study, the random survival forest model performed best (C-index = 0.773, integrated Brier score [IBS] = 0.151), and a permutation importance algorithm identified age as the most important feature affecting the risk of death in stroke patients.

Conclusion NPH is common in survival big data in the era of precision medicine, and machine learning-based survival analysis can be used when the survival data are more complex and the analytical demands are higher.
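For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' implementation. The function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied observed times are skipped for simplicity, which is an assumption rather than a standard convention.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Sketch of Harrell's C-index for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier subject had an event.
        # Tied times are skipped here (simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death was ranked as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third is censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.7, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.773 means the model ordered roughly 77% of comparable patient pairs correctly.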