Identification of Metagenomic Antibiotic Resistance Genes Using a CNN-Attention Hybrid Architecture
DOI: https://doi.org/10.62517/jike.202604216
Author(s)
Chenhao Guo
Affiliation(s)
Guangdong Pharmaceutical University, Zhongshan, Guangdong, China
Abstract
Metagenomic sequencing plays a key role in the rapid screening of clinical pathogens and the surveillance of environmental resistomes. In practice, however, short sequencing fragments and remote homologs (sequence identity < 40%) are common, and both conventional BLAST alignment and early deep learning tools suffer from reduced recall and opaque, "black box" decisions. To address this, this study constructs a hybrid architecture, DeepARG-Attention. A multi-scale 1D-ResNet captures local biochemical motifs, a multi-head self-attention network bridges the long-range feature breaks caused by sequencing truncation, and Focal Loss corrects for the underlying class imbalance. Experimental results show that even with fragments as short as 100 bp the model reaches a peak AUPR of 0.942, and in the low-homology range its detection performance clearly exceeds that of the current benchmark tools. In addition, mapping the attention weights back onto the input sequences indicates that the network spontaneously focuses on the catalytic core sites of the proteins. The work provides a practical, perturbation-robust framework for rapid screening and an interpretable approach to discovering unknown high-risk resistance targets.
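The role of Focal Loss in correcting class imbalance can be illustrated with a minimal sketch of the binary form. This is not the study's implementation: the function names are hypothetical, and the values `gamma=2.0` and `alpha=0.25` are commonly used defaults, not hyperparameters reported in the abstract.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction.

    p: predicted probability of the positive class; y: label (0 or 1).
    """
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, eps), 1.0))


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one prediction (illustrative sketch).

    gamma and alpha are assumed defaults, not values from the study.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights the positive class by alpha, the negative by 1 - alpha.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = focal_loss(0.9, 1)   # confident, correct prediction
    hard = focal_loss(0.1, 1)   # confident, wrong prediction
    print(f"easy example loss: {easy:.6f}")
    print(f"hard example loss: {hard:.6f}")
```

Compared with plain cross-entropy, the modulating factor reduces the contribution of easy examples far more than that of hard ones, so abundant, easily classified sequences dominate training less. With `gamma=0` the expression reduces to class-weighted cross-entropy.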
Keywords
Metagenome; Antibiotic Resistance Genes; Convolutional Neural Network; Self-Attention Mechanism; Explainable Artificial Intelligence; Distant Homology
References
[1] World Health Organization. Global antimicrobial resistance and use surveillance system (GLASS) report: 2022[R]. Geneva: World Health Organization, 2022.
[2] Gullberg E, Albrecht S, Karlsson C, et al. Selection of a multidrug resistance plasmid by subinhibitory levels of antibiotics[J]. mBio, 2014, 5(5): e01918-14.
[3] Zankari E, Hasman H, Cosentino S, et al. Identification of acquired antimicrobial resistance genes[J]. Journal of Antimicrobial Chemotherapy, 2012, 67(11): 2640-2644.
[4] Alcock B P, Raphenya A R, Lau T T Y, et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database[J]. Nucleic Acids Research, 2020, 48(D1): D517-D525.
[5] LeCun Y, Bengio Y, Hinton G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[6] Arango-Argoty G, Garner E, Pruden A, et al. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data[J]. Microbiome, 2018, 6(1): 1-15.
[7] Liu Y, Wang J, Yi J, et al. Identifying antibiotic resistance genes via bi-pathway multi-attention mechanism[J]. Briefings in Bioinformatics, 2023, 24(5): bbad258.
[8] Pei Y, Shum M H H, Liao Y, et al. ARGNet: using deep neural networks for robust identification and classification of antibiotic resistance genes from sequences[J]. Microbiome, 2024, 12(1): 91.
[9] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[10] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. 2017: 5998-6008.
[11] Rives A, Meier J, Sbihi T, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences[J]. Proceedings of the National Academy of Sciences, 2021, 118(15): e2016239118.
[12] He L, Li H, Qi R, et al. MCT-ARG: Identification and classification of antibiotic resistance genes based on a multi-channel Transformer model[J]. Science of The Total Environment, 2024, 912: 169434.
[13] Yagimoto K, Hosoda S, Sato M, et al. Prediction of antibiotic resistance mechanisms using a protein language model[J]. Bioinformatics, 2024, 40(10): btae554.
[14] Wang B, Meng R, Li Z, et al. Predicting antibiotic resistance genes and bacterial phenotypes based on protein language models[J]. Frontiers in Microbiology, 2024, 15: 1475685.
[15] Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021, 596(7873): 583-589.