A Comparative Study of Large Language Models and Traditional Sentiment Analysis Methods under Data Imbalance Scenarios
DOI: https://doi.org/10.62517/jbdc.202601122
Author(s)
Yi Li
Affiliation(s)
Department of Computer Science, Tianjin University of Technology, Tianjin, China
Abstract
Sentiment analysis is crucial for extracting insights from user-generated text, but its real-world application is often hindered by the pervasive challenge of data imbalance, in which majority sentiment classes dominate. This study presents a controlled empirical comparison between traditional sentiment analysis methods (a dictionary-based approach; classical machine learning models, namely Decision Tree, Random Forest, and SVM; and an LSTM deep learning model) and a Large Language Model (the DeepSeek API with prompt engineering) for sentiment analysis on highly imbalanced datasets. Experiments were conducted on the Twitter US Airline Sentiment dataset with three constructed imbalance ratios (65%:25%:10%, 80%:15%:5%, and 90%:8%:2%) and on an additional general sentiment analysis dataset with a fixed 65%:25%:10% ratio, using class-specific precision, recall, and F1-score together with the Macro-average F1-score as primary metrics. The results reveal a dramatic divergence in model robustness. Traditional methods, despite achieving reasonable overall accuracy in milder scenarios, showed significant limitations in recognizing minority sentiment classes under severe imbalance. In contrast, the LLM consistently achieved the highest Macro-average F1-scores across all Twitter imbalance scenarios (0.800, 0.743, and 0.720), maintained strong minority-class performance (e.g., a Positive-class F1-score of 0.62 and recall of 88% under the extreme 90:8:2 imbalance, versus the best traditional model's F1 of 0.52 and the LSTM's recall of 20%), and demonstrated superior cross-dataset generalizability (Macro F1 of 0.800 on Twitter and 0.660 on the general dataset). We conclude that the prior knowledge and contextual understanding inherent in LLMs, activated through simple prompt engineering, confer a significant advantage over traditional models that learn solely from the imbalanced training data. Our findings suggest that LLMs offer a more robust and effective solution for sentiment analysis in realistic, imbalanced scenarios.
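To make the evaluation setup concrete, the sketch below (a minimal illustration, not the paper's exact pipeline) shows how one of the traditional baselines could be scored under the most severe setting: the Twitter US Airline Sentiment data is downsampled to a 90%:8%:2% negative:neutral:positive ratio, a TF-IDF plus linear SVM model is trained on it, and per-class precision, recall, and F1 are reported alongside the Macro-average F1. The file name Tweets.csv, the column names text and airline_sentiment, and the resampling helper are assumptions made for illustration.

# Minimal sketch (assumed setup, not the paper's exact pipeline): build a
# 90:8:2 imbalanced subset and score a TF-IDF + linear SVM baseline with
# per-class and macro-averaged metrics.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("Tweets.csv")  # hypothetical local copy of the dataset

def resample_to_ratio(df, label_col, ratios, seed=42):
    # Downsample each class so the label distribution matches `ratios`.
    counts = df[label_col].value_counts()
    # Size the whole subset by the most constrained class.
    total = int(min(counts[c] / r for c, r in ratios.items()))
    parts = [df[df[label_col] == c].sample(int(total * r), random_state=seed)
             for c, r in ratios.items()]
    return pd.concat(parts).sample(frac=1, random_state=seed)

ratios = {"negative": 0.90, "neutral": 0.08, "positive": 0.02}
data = resample_to_ratio(df, "airline_sentiment", ratios)

X_train, X_test, y_train, y_test = train_test_split(
    data["text"], data["airline_sentiment"],
    test_size=0.2, stratify=data["airline_sentiment"], random_state=42)

vec = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
clf = LinearSVC()  # no re-weighting: the model learns from the raw imbalance
clf.fit(vec.fit_transform(X_train), y_train)

# Per-class precision/recall/F1 plus the macro-average F1 used as the headline metric.
print(classification_report(y_test, clf.predict(vec.transform(X_test)), digits=3))

The LLM side of the comparison would replace the trained classifier with a prompted call to the DeepSeek chat API and score its predicted labels with the same classification_report, so both model families are measured on identical test splits and metrics.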
Keywords
Sentiment Analysis; Data Imbalance; Large Language Models; Traditional Sentiment Analysis Methods; Comparative Study