Research on Transformer-Based Multilingual Machine Translation Methods
DOI: https://doi.org/10.62517/jike.202504108
Author(s)
Xiaodong Zhao1, Rouyi Fan1, Wanyue Liu2,*
Affiliation(s)
1 School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, China
2 Henan University of Technology, Zhengzhou, China
* Corresponding Author.
Abstract
Large differences in word order between languages cause mistranslations in machine translation. Translation models that share the same target language but use different source languages learn different word-order information, which leads to differences in translation quality. This paper therefore proposes a multilingual neural machine translation model with multiple source languages and a single target language, in which languages with different word orders participate in training simultaneously so that the model can learn multiple word-order realizations of sentences with the same meaning. First, a 170,000-sentence Russian-Uzbek-Uyghur-English-Chinese parallel corpus is constructed. On this basis, each source language is prefixed with a designated language tag, and the tagged corpora are mixed into a new data set for training a multilingual translation model. In addition, four multilingual neural machine translation models, namely stacked, parallel, fusion, and sublayer fusion, are implemented by modifying the Transformer architecture. The experimental results show that adding language tags can partially improve the performance of bilingual translation, and that transliterating the source languages into Latin script further improves translation quality; the four modified multilingual models also improve the quality of the translation models.
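The language-tag method described above can be illustrated with a minimal sketch (not the authors' released code): each source sentence is prefixed with a token identifying its language, and the tagged corpora are then mixed into a single training set for the many-to-one model. The file-free toy data, the tag format ("<ru>", "<en>", etc.), and the helper names are assumptions for illustration only.

```python
def tag_corpus(src_lines, lang_tag):
    """Prepend a language-tag token to every source sentence."""
    return [f"{lang_tag} {line.strip()}" for line in src_lines]


def build_mixed_dataset(corpora):
    """corpora: list of (lang_tag, src_lines, tgt_lines) triples.

    Returns one mixed list of (tagged_source, target) pairs that can be
    shuffled and fed to a standard Transformer training pipeline.
    """
    mixed = []
    for lang_tag, src_lines, tgt_lines in corpora:
        tagged = tag_corpus(src_lines, lang_tag)
        mixed.extend(zip(tagged, (t.strip() for t in tgt_lines)))
    return mixed


# Toy usage: two source languages sharing the same Chinese target sentence.
corpora = [
    ("<ru>", ["Я читаю книгу ."], ["我 在 读 书 。"]),
    ("<en>", ["I am reading a book ."], ["我 在 读 书 。"]),
]
for src, tgt in build_mixed_dataset(corpora):
    print(src, "|||", tgt)
# -> "<ru> Я читаю книгу . ||| 我 在 读 书 。"  etc.
```

A transliteration step (rewriting Cyrillic or Arabic-script source text in Latin letters, as mentioned in the abstract) would simply be applied to `src_lines` before tagging; it is omitted here because the specific transliteration scheme is not given in this section.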
Keywords
Multilingual Neural Machine Translation Model; Multilingual Parallel Corpus; Language Tagging; Improved Model