STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Entity Recognition in Traditional Chinese Medicine Package Inserts Based on Large Language Models
DOI: https://doi.org/10.62517/jike.202504401
Author(s)
Yali Wan1,*, Deyi Xiong2
Affiliation(s)
1School of Information Technology Engineering, Guangzhou College of Commerce, Guangzhou, Guangdong, China
2Chongqing Academy of Metrology and Quality Inspection, Chongqing, China
*Corresponding Author
Abstract
To address conceptual ambiguity, dense professional terminology, and the lack of large-scale annotated data in Traditional Chinese Medicine (TCM) package insert texts, this paper proposes an entity recognition method based on Large Language Models (LLMs). The method uses DeepSeek-V3.2-exp as the base model and applies an iterative prompt optimization strategy. A basic prompt is first constructed by defining the task and injecting domain knowledge. Chain-of-Thought (CoT) reasoning rules are then added to form a structured decision-making process that guides the model through entity recognition on TCM package inserts and strengthens its ability to distinguish semantically ambiguous entities. Experimental results show that, compared with the basic prompt, the optimized prompt incorporating CoT improves the overall F1 score.
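As a rough illustration of the two prompt stages and the evaluation metric named in the abstract, the following Python sketch builds a basic prompt (task definition plus domain knowledge), extends it with CoT reasoning rules, and computes an exact-match entity-level F1. The entity types, domain hints, rule wording, and function names are all hypothetical, since the abstract does not give the actual label set or prompt text. No model call is made here.

```python
from typing import Iterable, Tuple

# Hypothetical entity types; the abstract does not list the paper's label set.
ENTITY_TYPES = ["DRUG_NAME", "INGREDIENT", "INDICATION", "DOSAGE"]

# Hypothetical domain hints and reasoning rules, for illustration only.
DOMAIN_KNOWLEDGE = [
    "INGREDIENT covers the herbs or materials listed as components.",
    "INDICATION covers symptoms or syndromes the product is stated to treat.",
]
COT_RULES = [
    "Step 1: Locate candidate spans that may be entities.",
    "Step 2: Decide which section of the insert each span belongs to.",
    "Step 3: If a span fits two types, choose the type matching its section.",
    "Step 4: Output the final list as (span, type) pairs.",
]


def build_basic_prompt(text: str) -> str:
    """Task definition plus injected domain knowledge."""
    lines = [
        "Task: extract named entities from the TCM package insert below.",
        "Entity types: " + ", ".join(ENTITY_TYPES),
        "Domain knowledge:",
    ]
    lines += [f"- {hint}" for hint in DOMAIN_KNOWLEDGE]
    lines += ["Text:", text]
    return "\n".join(lines)


def build_cot_prompt(text: str) -> str:
    """Basic prompt extended with step-by-step reasoning rules."""
    rules = "\n".join(COT_RULES)
    return build_basic_prompt(text) + "\nReasoning rules:\n" + rules


Entity = Tuple[str, str]  # (span text, entity type)


def entity_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> float:
    """Exact-match F1 over sets of (span, type) pairs."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # entities with matching span and type
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = "<package insert text goes here>"
    print(build_cot_prompt(sample))
```

Under these assumptions, comparing the two prompts amounts to sending each to the same model, parsing the returned (span, type) pairs, and computing `entity_f1` against the gold annotations for each prompt variant.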
Keywords
Named Entity Recognition; LLM; Instruction Manual of Traditional Chinese Medicine