Optimization of Decision Making Capabilities of Intelligent Characters in Strategy Games Based on Large Language Model Agent
DOI: https://doi.org/10.62517/jbdc.202401320
Author(s)
Kehua Shi
Affiliation(s)
School of Fuzhou University, Fujian, China
Abstract
To explore how the decision-making ability of intelligent characters in strategy games can be optimized with a Large Language Model (LLM) agent, this paper takes PokéLLMon [1], developed at the Georgia Institute of Technology, as a case study and optimizes its decision-making by imitating human players: 1. Collecting the experience of human players and letting the LLM absorb that experience through Prompt Engineering and RAG (Retrieval-Augmented Generation); 2. Collecting high-rated human battle records from the replays section of the official Pokémon Showdown website, converting the rounds of each game into a multi-round dialogue dataset, and fine-tuning the LLM with the QLoRA method (Dettmers et al., 2023) after processing the data, so that the LLM learns the decision-making of human masters in random battles. The open-source LLM Llama 3.1-8B was chosen for this study. Battles against heuristic bots show that the modified PokéLLMon achieves a measurable degree of strategy optimization: its win rate against the heuristic bots rises from 18% at the outset to 34%.
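As a concrete illustration of step 2, the sketch below shows one way the multi-round dialogue records could be fed into QLoRA fine-tuning with the Hugging Face transformers/peft stack. The file name, dialogue schema, and LoRA hyperparameters are illustrative assumptions, not the paper's exact configuration; only the 4-bit NF4 quantization plus low-rank adapter setup follows the QLoRA method cited in the abstract.

```python
# Minimal QLoRA fine-tuning sketch (hypothetical file name and hyperparameters).
# Assumes Pokémon Showdown replays have already been converted to chat records:
# {"messages": [{"role": "user", "content": <battle state>},
#               {"role": "assistant", "content": <master player's move>}, ...]}
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # open-source base model named in the abstract

# 4-bit NF4 quantization with double quantization, as in QLoRA (Dettmers et al., 2023)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; rank and alpha are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Each replay is one multi-round dialogue; render it with the model's chat template.
dataset = load_dataset("json", data_files="showdown_randbats_dialogues.jsonl", split="train")
dataset = dataset.map(
    lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)}
)
# From here, a standard supervised fine-tuning loop (e.g. trl's SFTTrainer) trains
# only the LoRA adapters while the quantized base weights stay frozen.
```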
Keywords
LLM Agent; Prompt Engineering; RAG; QLoRA; Game Strategy
References
[1] Hu, Sihao, Tiansheng Huang, and Ling Liu. "PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models." arXiv preprint arXiv:2402.01118 (2024).
[2] Dettmers, Tim, et al. "QLoRA: Efficient Finetuning of Quantized LLMs." Advances in Neural Information Processing Systems 36 (2024).
[3] Ma, Weiyu, et al. "Large Language Models Play StarCraft II: Benchmarks and a Chain of Summarization Approach." arXiv preprint arXiv:2312.11865 (2023).
[4] CSDN. (2024) BaiChuan13B: Examples of Fine-Tuning Multi-Round Dialogues. https://blog.csdn.net/Python_Ai_Road/article/details/132400115?rId=132400115&source=Freyr_s&type=blog&refer=APP
[5] cnblogs. (2023) Game AI Behavioral Decision Making. https://www.cnblogs.com/OwlCat/p/17871494.html