• 1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, P. R. China;
• 2. State Key Laboratory of Analytical Chemistry for Life Sciences, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China;
• 3. Chemistry and Biomedicine Innovation Center, Nanjing University, Nanjing 210023, P. R. China;
HUANG Shuo, Email: shuo.huang@nju.edu.cn; ZHANG Daoqiang, Email: dqzhang@nuaa.edu.cn

O6-carboxymethyl guanine (O6-CMG) is a highly mutagenic alkylation product of DNA that can cause gastrointestinal cancer. Existing studies have used a mutant Mycobacterium smegmatis porin A (MspA) nanopore, assisted by Phi29 DNA polymerase, to localize it. Machine learning has recently been widely applied to the analysis of nanopore sequencing data, but it usually requires a large number of data labels, which places an extra workload on researchers and greatly limits its practicality. Accordingly, this paper proposes a nano-Unsupervised-Deep-Learning method (nano-UDL), based on an unsupervised clustering algorithm, to identify methylation events in nanopore data automatically. Specifically, nano-UDL first uses a deep AutoEncoder to extract features from the nanopore dataset and then applies the MeanShift clustering algorithm to classify the data. In addition, nano-UDL extracts the features best suited to clustering by jointly optimizing the clustering loss and the reconstruction loss. Experimental results demonstrate that nano-UDL achieves relatively high recognition accuracy on the O6-CMG dataset and correctly identifies all sequence segments containing O6-CMG. To further verify the robustness of nano-UDL, hyperparameter sensitivity analysis and ablation experiments were carried out. Using machine learning to analyze nanopore data can effectively reduce the cost of manual data analysis, which is significant for many biological studies, including genome sequencing.
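
The two-stage idea described in the abstract (AutoEncoder features followed by MeanShift clustering, trained with a combined reconstruction and clustering objective) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the AutoEncoder has already mapped each nanopore segment to a low-dimensional feature vector (the encoder and decoder are omitted), uses a simple flat-kernel mean shift, and assumes a hypothetical clustering loss defined as the mean squared distance from each embedding to its assigned mode, weighted by a hypothetical `gamma`. The names `mean_shift`, `joint_loss`, `sq_dist` and `gamma` are illustrative only; in practice a library implementation such as scikit-learn's `sklearn.cluster.MeanShift` could replace the hand-written loop.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_shift(points: List[Vector], bandwidth: float,
               max_iter: int = 100, tol: float = 1e-6) -> Tuple[List[Vector], List[int]]:
    """Flat-kernel mean shift over (assumed) encoder features.

    Each point is repeatedly moved to the mean of all points within
    `bandwidth` of its current position until the shift drops below `tol`.
    Converged positions closer than bandwidth / 2 to an existing mode are
    merged into that mode's cluster; otherwise they start a new cluster.
    """
    modes: List[Vector] = []
    labels: List[int] = []
    for p in points:
        x = p
        for _ in range(max_iter):
            neigh = [q for q in points if sq_dist(q, x) <= bandwidth ** 2]
            if not neigh:  # defensive guard; the window mean normally keeps neighbours
                break
            new_x = tuple(sum(c) / len(neigh) for c in zip(*neigh))
            shift = math.sqrt(sq_dist(new_x, x))
            x = new_x
            if shift < tol:
                break
        for k, m in enumerate(modes):
            if math.sqrt(sq_dist(m, x)) < bandwidth / 2:
                labels.append(k)
                break
        else:
            modes.append(x)
            labels.append(len(modes) - 1)
    return modes, labels


def joint_loss(inputs, reconstructions, embeddings, modes, labels, gamma=0.1):
    """Hypothetical joint objective: L = L_rec + gamma * L_clu.

    L_rec: mean squared reconstruction error of the AutoEncoder.
    L_clu: mean squared distance of each embedding to its assigned mode
    (an assumed form; the abstract does not define the clustering loss).
    """
    n = len(inputs)
    rec = sum(sq_dist(a, b) for a, b in zip(inputs, reconstructions)) / n
    clu = sum(sq_dist(z, modes[k]) for z, k in zip(embeddings, labels)) / n
    return rec + gamma * clu


if __name__ == "__main__":
    # Toy 2-D "encoder features" forming two well-separated groups.
    feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
             (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    modes, labels = mean_shift(feats, bandwidth=1.0)
    print(len(modes), labels)  # prints: 2 [0, 0, 0, 1, 1, 1]
    print(joint_loss(feats, feats, feats, modes, labels, gamma=0.5))
```

MeanShift is a natural fit for this setting because it does not require the number of clusters in advance; the bandwidth, rather than a cluster count, becomes the main hyperparameter, which is one reason hyperparameter sensitivity checks matter for such a pipeline.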

Citation: GUAN Xiaoyu, WANG Yu, ZHANG Jinyue, SHAO Wei, HUANG Shuo, ZHANG Daoqiang. Unsupervised deep learning for identifying the O6-carboxymethyl guanine by nanopore sequencing. Journal of Biomedical Engineering, 2022, 39(1): 139-148. doi: 10.7507/1001-5515.202104068
