Comparative analysis of class imbalance reduction methods in building machine learning models in the financial sector
A.F. Konstantinov, L.P. Dyakonova
Abstract. The article discusses methods for improving the quality metrics of machine learning models used in the financial sector. Because the data sets on which such models are trained exhibit class imbalance, methods aimed at reducing this imbalance are proposed. The study compares nine class imbalance handling methods on three retail lending data sets, using the CatBoostClassifier gradient boosting model, which does not account for class imbalance, as the baseline. The experiments show that the RandomOverSampler method yields a significant improvement in classification quality metrics over the baseline model. The results indicate that further research into class imbalance handling methods for financial data is promising and that the considered methods are practical to apply.
Keywords: financial risks, machine learning, classification, class imbalance
For citation. Konstantinov A.F., Dyakonova L.P. Comparative analysis of class imbalance reduction methods in building machine learning models in the financial sector. News of the Kabardino-Balkarian Scientific Center of RAS. 2025. Vol. 27. No. 1. Pp. 143–151. DOI: 10.35330/1991-6639-2025-27-1-143-151
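As a companion to the abstract, the Python sketch below illustrates the kind of pipeline the study describes: a CatBoostClassifier baseline compared against the same model trained on data rebalanced with RandomOverSampler from the imbalanced-learn library. The synthetic data set, hyperparameters, and printed metrics are illustrative assumptions only and do not reproduce the authors' retail lending experiments or their reported results.

```python
# Illustrative sketch: oversampling the minority class with RandomOverSampler
# before training CatBoostClassifier, evaluated on a held-out split.
# The synthetic data below is a stand-in for the article's retail lending
# data sets, which are not reproduced here.
from catboost import CatBoostClassifier
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary classification data (about 5% positive class).
X, y = make_classification(
    n_samples=20_000, n_features=20, weights=[0.95, 0.05], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Baseline: gradient boosting with no class imbalance handling.
baseline = CatBoostClassifier(iterations=300, verbose=0, random_seed=42)
baseline.fit(X_train, y_train)

# RandomOverSampler duplicates minority-class rows until the classes are
# balanced; resampling is applied to the training split only.
X_res, y_res = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)
resampled = CatBoostClassifier(iterations=300, verbose=0, random_seed=42)
resampled.fit(X_res, y_res)

# Compare both models on the untouched test split.
for name, model in [("baseline", baseline), ("oversampled", resampled)]:
    proba = model.predict_proba(X_test)[:, 1]
    pred = model.predict(X_test)
    print(f"{name}: ROC-AUC={roc_auc_score(y_test, proba):.3f}, "
          f"F1={f1_score(y_test, pred):.3f}")
```

Note that in this sketch the resampling is fitted on the training split only, so the evaluation metrics are computed on data with the original class distribution.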
Information about the authors
Alexey F. Konstantinov, Post-graduate Student, Department of Informatics, Plekhanov Russian University of Economics;
115054, Russia, Moscow, 36 Stremyannyy lane;
konstantinovaf@gmail.com, ORCID: https://orcid.org/0009-0000-9591-3301, SPIN-code: 3088-3121
Lyudmila P. Dyakonova, Candidate of Physical and Mathematical Sciences, Associate Professor, Department of Informatics, Plekhanov Russian University of Economics;
115054, Russia, Moscow, 36 Stremyannyy lane;
Dyakonova.LP@rea.ru, ORCID: https://orcid.org/0000-0001-5229-8070, SPIN-code: 2513-8831