1. |
Teney D, Anderson P, He X, et al. Tips and tricks for visual question answering: Learnings from the 2017 Challenge// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City: IEEE, 2018: 4223-4232..
|
2. |
Ambati R, Dudyala C R. A sequence-to-sequence model approach for imageCLEF 2018 medical domain visual question answering// 2018 15th IEEE India Council International Conference (INDICON). Coimbatore: IEEE, 2018: 1-6..
|
3. |
Kovaleva O, Shivade C, Kashyap S, et al. Towards visual dialog for radiology// Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing (BioNLP). Online: ACL, 2020: 60-69..
|
4. |
Liu B, Zhan L M, Xu L, et al. Medical visual question answering via conditional reasoning and contrastive learning. IEEE Trans Med Imaging, 2023, 42(5): 1532-1545..
|
5. |
Pan H, He S, Zhang K, et al. AMAM: an attention-based multimodal alignment model for medical visual question answering. Knowl-Based Syst, 2022, 255: 109763..
|
6. |
Lan M, Zhang Y, Zhang L, et al. Defect detection from uav images based on region-based cnns// 2018 IEEE International Conference on Data Mining Workshops (ICDMW). Singapore: IEEE, 2018: 385-390..
|
7. |
张北辰, 李亮, 查正军, 等. 基于跨模态对比学习的视觉问答主动学习方法. 计算机学报, 2022, 45(8): 1730-1745..
|
8. |
Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database// 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Miami: IEEE, 2009: 248-255..
|
9. |
Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks// Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML). Sydney: JMLR, 2017: 1126-1135..
|
10. |
Zhang Y, Jiang H, Miura Y, et al. Contrastive learning of medical visual representations from paired images and text// Machine Learning for Healthcare Conference (MLHC). Durham: PMLR, 2022: 2-25..
|
11. |
Gong H, Chen G, Liu S, et al. Cross-modal self-attention with multi-task pre-training for medical visual question answering// Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR). New York: ACM, 2021: 456-460..
|
12. |
Agrawal V, Udupa J, Tong Y, et al. BRR-Net: a tandem architectural CNN-RNN for automatic body region localization in CT images. Med Phys, 2020, 47(10): 5020-5031..
|
13. |
Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding// North American Chapter of the Association for Computational Linguistics (NAACL). Minneapolis: ACL, 2019: 4171-4186..
|
14. |
Gu Y, Tinn R, Cheng H, et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthcare, 2021, 3(1): 1-23..
|
15. |
Yan B, Pei M. Clinical-BERT: vision-language pre-training for radiograph diagnosis and reports generation// Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). Vancouver: AAAI Press, 2022: 2982-2990..
|
16. |
Guo Z, Han D. Multi-modal explicit sparse attention networks for visual question answering. Sensors, 2020, 20(23): 6758-6771..
|
17. |
Cornia M, Stefanini M, Baraldi L, et al. Meshed-memory transformer for image captioning// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 10575-10584..
|
18. |
Ma L, Lu Z, Li H. Learning to answer questions from image using convolutional neural network// Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI). Arizona: AAAI Press, 2016: 3567-3573..
|
19. |
Ma L, Jiang W, Jie Z, et al. Bidirectional image-sentence retrieval by local and global deep matching. Neurocomput, 2019, 345(9): 36-44..
|
20. |
Fukui A, Park D H, Yang D, et al. Multimodal compact bilinear pooling for visual question answering and visual grounding// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP). Austin: ACL, 2016: 457-468..
|
21. |
Yang Z, He X, Gao J, et al. Stacked attention networks for image question answering// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016: 21-29..
|
22. |
Kafle K, Kanan C. An analysis of visual question answering algorithms// 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017: 1983-1991..
|
23. |
Singh J, Mahapatra D, Deepti R B. Medical VQA: MixUp helps keeping it simple// Image and Vision Computing: 37th International Conference (IVCNZ). Auckland: Springer-Verlag, 2023: 402-414..
|
24. |
Kim J H, Jun J, Zhang B. Bilinear attention networks// Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS). Montréal: NeurIPS Foundation, 2018: 1571-1581..
|
25. |
Rahman T, Chou S, Sigal L, et al. An improved attention for visual question answering// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville: IEEE, 2021: 1653-1662..
|
26. |
邹品荣, 肖锋, 张文娟, 等. 面向视觉问答的多模块共同注意模型. 计算机工程, 2022, 48(2): 250-260..
|
27. |
Radford A, Kim J W, Hallacy C, et al. Learning transferable visual models from natural language supervision// International Conference on Machine Learning (ICML). California: PMLR, 2021: 8748-8763..
|
28. |
Yu Z, Yu J, Cui Y, et al. Deep modular co-attention networks for visual question answering// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). California: IEEE, 2019: 6274-6283..
|
29. |
Pennington J, Socher R, Manning C. GloVe: global vectors for word representation// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha: ACL, 2014: 1532-1543..
|
30. |
Hu J, Shen L, Sun G. Squeeze-and-excitation networks// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Hawaii: IEEE, 2018: 7132-7141..
|
31. |
Gasmi K, Ltaifa I B, Lejeune G, et al. Optimal deep neural network-based model for answering visual medical question. Cybernet Syst, 2022, 53(5): 403-424..
|
32. |
Do T, Nguyen B X, Tjiputra E, et al. Multiple meta-model quantifying for medical visual question answering// Medical Image Computing and Computer Assisted Intervention (MICCAI). Strasbourg: Springer, 2021: 64-74..
|
33. |
Dey R, Salem F M. Gate-variants of gated recurrent unit (GRU) neural networks// 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS). Boston: IEEE, 2017: 1597-1600..
|
34. |
Corbeil J, Ghadivel H A. BET: a backtranslation approach for easy data augmentation in transformer-based paraphrase identification context. arXiv, 2020: 2009.12452..
|
35. |
Lu J, Yang J, Batra D, et al. Hierarchical question-image co-attention for visual question answering// Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS'16). New York: MIT Press, 2016: 289-297..
|
36. |
Lau J, Gayen S, Abacha A B, et al. A dataset of clinically generated visual questions and answers about radiology images. Sci Data, 2018, 5(1): 180251..
|
37. |
He X, Zhang Y, Mou L, et al. PathVQA: 30000+ questions for medical visual question answering. arXiv, 2020: 2003.10286..
|
38. |
Nguyen B D, Do T, Tran Q D, et al. Overcoming data limitation in medical visual question answering// International Conference on Medical Image Computing and Computer-Assisted intervention (MICCAI). Shenzhen: Springer, 2019: 522-530..
|