Ultrasonography is a common method of thyroid examination, and its results consist mainly of thyroid ultrasound images and text reports. A cross-modal retrieval method linking images and text reports would offer great convenience to doctors and patients, but no existing retrieval method correlates thyroid ultrasound images with their text reports. This paper proposes a cross-modal retrieval method based on deep learning and an improved cross-modal generative adversarial network: ① the weight-sharing constraints between the fully connected layers that construct the common representation space in the original network are replaced with cosine similarity constraints, so that the network can better learn a common representation of data from different modalities; ② a fully connected layer is added before the cross-modal discriminator to merge the weight-shared image and text fully connected layers of the original network, realizing semantic regularization while inheriting the advantages of the original network's weight sharing. Experimental results show that the mean average precision (mAP) of the proposed cross-modal retrieval method for thyroid ultrasound images and text reports reaches 0.508, significantly higher than that of traditional cross-modal methods, providing a new approach for cross-modal retrieval of thyroid ultrasound images and text reports.
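To make the two modifications concrete, the following is a minimal PyTorch sketch of the described ideas: two unshared fully connected projection branches tied by a cosine similarity constraint (modification ①), and an added fully connected layer placed before the modality discriminator (modification ②). The layer names, feature dimensions, and two-branch structure are illustrative assumptions; the abstract does not specify the exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CommonRepresentation(nn.Module):
    """Projects image and text features into a common space.

    Modification ①: instead of sharing weights between the two
    fully connected layers, each modality keeps its own layer and
    their outputs are tied with a cosine similarity constraint.
    Dimensions below are assumed for illustration.
    """

    def __init__(self, img_dim=4096, txt_dim=300, common_dim=1024):
        super().__init__()
        self.img_fc = nn.Linear(img_dim, common_dim)  # image branch
        self.txt_fc = nn.Linear(txt_dim, common_dim)  # text branch

    def forward(self, img_feat, txt_feat):
        img_repr = torch.relu(self.img_fc(img_feat))
        txt_repr = torch.relu(self.txt_fc(txt_feat))
        return img_repr, txt_repr


def cosine_constraint_loss(img_repr, txt_repr):
    # Pull paired image/text representations together:
    # maximizing cosine similarity == minimizing (1 - cos).
    return (1.0 - F.cosine_similarity(img_repr, txt_repr, dim=1)).mean()


class CrossModalDiscriminator(nn.Module):
    """Modality discriminator with an extra fully connected layer
    in front of it (modification ②), standing in for the merged
    weight-shared image/text layers of the original network."""

    def __init__(self, common_dim=1024, hidden_dim=512):
        super().__init__()
        self.merge_fc = nn.Linear(common_dim, hidden_dim)  # added FC layer
        self.classify = nn.Linear(hidden_dim, 1)           # image-vs-text logit

    def forward(self, common_repr):
        h = torch.relu(self.merge_fc(common_repr))
        return self.classify(h)


# Usage sketch with random stand-in features: adversarial training
# would alternate between the discriminator (telling modalities apart)
# and the projection branches (fooling it while satisfying the
# cosine constraint on paired samples).
img_feat = torch.randn(8, 4096)  # e.g., CNN features of ultrasound images
txt_feat = torch.randn(8, 300)   # e.g., embedded report text
net = CommonRepresentation()
disc = CrossModalDiscriminator()
img_repr, txt_repr = net(img_feat, txt_feat)
loss_cos = cosine_constraint_loss(img_repr, txt_repr)
logits = disc(torch.cat([img_repr, txt_repr], dim=0))
```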