Research on automatic generation of multimodal medical image reports based on memory driven_Journal of Biomedical Engineering

Authors：

XING Suxia, FANG Junze, JU Zihan, Guo Zheng, WANG Yu

1. School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China

Corresponding author：

XING Suxia, Email: xingsuxia@163.com

Keywords：

Multimodal feature fusion; Memory driven; Automatic report generation; Medical imaging

DOI：

10.7507/1001-5515.202304001

Automatic generation of medical image reports faces several challenges, including the diversity of disease types and generated descriptions that lack professionalism and fluency. To address these issues, this paper proposes a memory-driven multimodal medical imaging report generation method (mMIRmd). First, a hierarchical vision transformer using shifted windows (Swin-Transformer) extracts multi-perspective visual features from the patient's medical images, and bidirectional encoder representations from transformers (BERT) extracts semantic features from the textual medical history. The visual and semantic features are then fused to improve the model's ability to recognize different disease types. Next, a word-vector dictionary pre-trained on medical text is used to encode the labels of the visual features, which makes the generated reports more professional. Finally, a memory-driven module is introduced into the decoder to handle long-distance dependencies in medical image data. The method is validated on the chest X-ray dataset collected at Indiana University (IU X-Ray) and on the Medical Information Mart for Intensive Care chest X-ray dataset (MIMIC-CXR) released by the Massachusetts Institute of Technology and Massachusetts General Hospital. Experimental results indicate that the proposed method focuses better on the affected areas, improves the accuracy and fluency of the generated reports, and can assist radiologists in completing medical image reports quickly.
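The abstract does not say how the visual and semantic features are combined, so the following is only a minimal sketch of one common option: project both modalities to a shared width and concatenate them into a single token sequence for the decoder. The random matrices stand in for Swin-Transformer patch features and BERT token features; every name, shape, and the concatenation strategy itself are assumptions made for illustration, not the authors' implementation.

```python
import random


def project(features, weight):
    """Map each feature vector (row) to the shared width.

    `weight` has shape in_dim x out_dim, given as a list of rows.
    """
    return [
        [sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
        for row in features
    ]


def fuse(visual_views, text_tokens, w_visual, w_text):
    """Concatenate projected visual tokens from all views with projected text tokens."""
    fused = []
    for view in visual_views:
        fused.extend(project(view, w_visual))
    fused.extend(project(text_tokens, w_text))
    return fused


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Hypothetical helper: a rows x cols matrix of uniform random values."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy stand-ins (assumed shapes): two image views with 4 patch vectors each
# (width 8) and 5 medical-history token vectors (width 6).
frontal = rand_matrix(4, 8)
lateral = rand_matrix(4, 8)
history = rand_matrix(5, 6)

d_model = 16  # assumed shared width
w_visual = rand_matrix(8, d_model)
w_text = rand_matrix(6, d_model)

fused = fuse([frontal, lateral], history, w_visual, w_text)
print(len(fused), len(fused[0]))  # 13 16
```

The fused sequence holds 4 + 4 + 5 = 13 tokens of width 16, so a decoder that attends over it can draw on both the images and the medical history.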

Citation： XING Suxia, FANG Junze, JU Zihan, Guo Zheng, WANG Yu. Research on automatic generation of multimodal medical image reports based on memory driven. Journal of Biomedical Engineering, 2024, 41(1): 60-69. doi: 10.7507/1001-5515.202304001
