A study on multi-class classification of medical questionnaire item texts based on prompt learning

Chinese Journal of Evidence-Based Medicine

Authors:

HAO Jie ¹, PENG Qinglong ², CONG Shan ², LI Jiao ¹, SUN Haixia ¹

1. Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, P. R. China;
2. Qingdao Innovation Development Base, Harbin Engineering University, Qingdao 266000, P. R. China.

Corresponding author:

SUN Haixia, Email: sun.haixia@imicams.ac.cn

Keywords:

Medical questionnaire; Question classification; Multi-class classification; Prompt learning; Pre-trained language model

DOI:

10.7507/1672-2531.202307139

Objective Medical questionnaire resources are currently processed and organized mainly at the document level, which hampers user access and reuse at the level of individual questionnaire items. This study proposes a method for multi-class classification of medical questionnaire items in low-resource scenarios, to support fine-grained organization and provision of medical questionnaire resources.

Methods We introduced a novel BERT-based prompt learning approach for multi-class classification of medical questionnaire items. First, we curated a small corpus of lung cancer clinical assessment items by collecting relevant clinical assessment questionnaires, extracting function and domain classifications, and manually annotating the items with combined "function-domain" labels. We then applied prompt learning by feeding the customized template into BERT; the masked positions were predicted and filled, and the filled-in text was then mapped to labels. This process yields a multi-class classification of item texts in medical questionnaires.

Results The constructed corpus comprised 347 lung cancer clinical assessment items across nine "function-domain" labels. The proposed method achieved an average accuracy of 93% on this self-constructed dataset, outperforming the runner-up, GAN-BERT, by approximately 6%.

Conclusion The proposed method maintains robust performance while minimizing the cost of building medical questionnaire item corpora, which suggests it is worth promoting in research and practice on medical questionnaire classification.
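The prompt learning step described in the Methods (template, masked-position prediction, mapping to labels) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the template wording, the three label names, the label words, and `toy_mask_scorer`, which is a keyword-based stand-in for BERT's masked-language-model head rather than a real model. The abstract does not give the authors' actual template, label set, or mapping.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"  # BERT's mask token

# Hypothetical label words per label (the "verbalizer"); not taken from the paper.
VERBALIZER: Dict[str, List[str]] = {
    "assessment-symptom": ["symptom", "pain"],
    "assessment-function": ["function", "activity"],
    "screening-risk": ["risk", "exposure"],
}


def build_prompt(item_text: str) -> str:
    """Wrap a questionnaire item in a hypothetical cloze template."""
    return f"{item_text} This item is about {MASK}."


def classify(item_text: str, mask_scorer: Callable[[str], Dict[str, float]]) -> str:
    """Score label words at the masked position and return the best label."""
    word_scores = mask_scorer(build_prompt(item_text))
    # Average the scores of each label's words, then pick the top label.
    label_scores = {
        label: sum(word_scores.get(w, 0.0) for w in words) / len(words)
        for label, words in VERBALIZER.items()
    }
    return max(label_scores, key=label_scores.get)


def toy_mask_scorer(prompt: str) -> Dict[str, float]:
    """Toy stand-in for a masked language model (keyword cues, not BERT)."""
    cues = {
        "symptom": ("cough", "breath"),
        "function": ("walk", "climb"),
        "risk": ("smok", "asbestos"),
    }
    text = prompt.lower()
    # Start every label word at a small baseline score.
    scores = {w: 0.05 for words in VERBALIZER.values() for w in words}
    for word, keywords in cues.items():
        if any(k in text for k in keywords):
            scores[word] = 0.6
    return scores


print(classify("How severe was your cough in the last 24 hours?", toy_mask_scorer))
print(classify("Have you ever smoked cigarettes?", toy_mask_scorer))
```

In a real setup, `mask_scorer` would return the probabilities a pre-trained masked language model assigns to each label word at the `[MASK]` position; keeping it as a parameter lets the template and label mapping be tested without loading a model.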

Citation: HAO Jie, PENG Qinglong, CONG Shan, LI Jiao, SUN Haixia. A study on multi-class classification of medical questionnaire item texts based on prompt learning. Chinese Journal of Evidence-Based Medicine, 2024, 24(1): 76-82. doi: 10.7507/1672-2531.202307139
