Online Mendelian Inheritance in Man (OMIM) is a knowledge source and database for human genetic diseases and related genes. Each OMIM entry includes a clinical synopsis, linkage analysis for candidate genes, chromosomal localization and animal models, and OMIM has become an authoritative source of information for studying the relationship between genes and diseases. Because overlapping disease symptoms may reflect interactions at the molecular level, comparing phenotypic similarity may point to candidate genes and help to uncover functional connections between genes and proteins. However, OMIM describes disease phenotypes in free text, which is poorly suited to computational analysis. Standardization of OMIM data therefore has important implications for large-scale comparison of disease phenotypes and prediction of phenotype-genotype correlations. Recently, standard medical language systems, term frequency-inverse document frequency (TF-IDF) weighting and cosine similarity, as used in document classification, have been introduced for mining OMIM data. Combined with Gene Ontology and various comparison methods, these approaches have achieved substantial success. In this article, we review the methods used for standardization and similarity comparison of OMIM data, and we discuss likely trends for future research in this direction.
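To make the TF-IDF and cosine comparison mentioned above concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the pipeline used in any of the reviewed studies: the tokenizer is a naive whitespace splitter, the function names (`tokenize`, `tfidf_vectors`, `cosine`) are hypothetical, and the three phenotype strings are toy descriptions rather than actual OMIM entries.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer (assumption): lowercase, strip commas/periods, split on whitespace."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        # Weight = relative term frequency * log inverse document frequency.
        vectors.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy phenotype descriptions (hypothetical, not taken from OMIM).
docs = [
    "seizures, ataxia, intellectual disability",
    "ataxia, seizures, hearing loss",
    "short stature, joint laxity",
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # share two terms
sim_02 = cosine(vecs[0], vecs[2])  # share no terms
```

In this sketch, descriptions that share rarer terms score higher, while descriptions with no terms in common score zero. Free-text synonyms (for example "fits" versus "seizures") would still be treated as unrelated, which is the gap that mapping terms to a standard vocabulary is meant to close.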
To understand how the diagnosis and treatment plans for coronavirus disease 2019 (COVID-19) have evolved, and to support medical staff in clinical practice, this paper takes the nine COVID-19 diagnosis and treatment plans issued by the National Health Commission between January 26, 2020 and August 19, 2020 as research data for comparative and visual analysis. Using text mining, the paper represents and measures the textual similarity both between the plans as a whole and between corresponding modules of different plans, and summarizes the pattern of their evolution. The results provide a reference for clinical diagnosis and treatment practice and for the formulation of other diagnosis and treatment plans.
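The module-by-module comparison across successive plan versions can be sketched as follows. The abstract does not name the similarity measure that was used, so this example substitutes Jaccard similarity over word sets purely as a stand-in. The function names, the module keys and the placeholder tokens are all hypothetical; the real plans are Chinese-language documents and would first need word segmentation.

```python
def jaccard(a, b):
    """Jaccard similarity of the word sets of two texts (stand-in measure, an assumption)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def module_evolution(versions, module):
    """Similarity of one module between each pair of consecutive versions."""
    texts = [v[module] for v in versions if module in v]
    return [jaccard(x, y) for x, y in zip(texts, texts[1:])]


# Placeholder content only: these tokens are NOT text from the actual plans.
versions = [
    {"diagnosis": "criterion_a criterion_b criterion_c",
     "treatment": "measure_a measure_b"},
    {"diagnosis": "criterion_a criterion_b criterion_c criterion_d",
     "treatment": "measure_a measure_b"},
    {"diagnosis": "criterion_a criterion_b criterion_d criterion_e",
     "treatment": "measure_a measure_b measure_c"},
]

diagnosis_trend = module_evolution(versions, "diagnosis")
treatment_trend = module_evolution(versions, "treatment")
```

Each returned list holds one value per consecutive pair of versions, so a drop in the series marks a revision in which that module changed substantially, while values near 1 indicate a module carried over almost unchanged.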