News of the Kabardino-Balkarian Scientific Center of the Russian Academy of Sciences

Известия Кабардино-Балкарского научного центра РАН

1991-6639

2949-1940

294372

10.35330/1991-6639-2025-27-2-11-22

EWHPZV

System analysis, management and information processing

Системный анализ, управление и обработка информации

Research Article

Building a machine learning model for predicting fraudulent transactions

Построение модели машинного обучения для прогнозирования мошеннических транзакций

https://orcid.org/0009-0000-9591-3301

3088-3121

Konstantinov

Alexey F.

Константинов

Алексей Федорович

Russian Federation

Postgraduate Student at the Department of Informatics

аспирант кафедры информатики

konstantinovaf@gmail.com

https://orcid.org/0000-0001-5229-8070

2513-8831

Дьяконова

Людмила Павловна

Dyakonova

Lyudmila P.

Russian Federation

Candidate of Physical and Mathematical Sciences, Associate Professor at the Department of Informatics

канд. физ.-мат. наук, доцент кафедры информатики

Dyakonova.LP@rea.ru

Plekhanov Russian University of Economics

Российский экономический университет имени Г. В. Плеханова

11.06.2025

2025

27

2

11–22

30.05.2025

30.05.2025

2025

Konstantinov A.F., Dyakonova L.P.

Константинов А.Ф., Дьяконова Л.П.

https://creativecommons.org/licenses/by/4.0

https://journals.rcsi.science/1991-6639/article/view/294372

This article presents the development of a machine learning model for predicting fraudulent transactions, using a bank's transactional data as an example. It discusses how categorical variables should be encoded when the transactional data have a time dimension, so that information leakage is avoided. Experiments are also reported on bagging (bootstrap aggregating) and on creating additional variables based on feature contributions to the final prediction, measured with Shapley values. The quality metrics of the machine learning model are examined and analyzed.

В статье представлена разработка модели машинного обучения для прогнозирования мошеннических транзакций на примере транзакционных данных банка. Рассмотрены особенности кодирования категориальных переменных, связанные с наличием времени в транзакционных данных, чтобы избежать утечек информации. Проведены эксперименты по применению баггинга (bootstrap aggregating) и созданию дополнительных переменных на основе их вклада в итоговый прогноз с применением Shapley values. Рассмотрены показатели качества модели машинного обучения и проведен их анализ.
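The abstract's point about encoding categorical variables in time-stamped data can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `time_ordered_target_encode`, the `prior` and `smoothing` parameters, and the toy rows are assumptions made for illustration. The idea shown is that each transaction's category is encoded only from the labels of transactions that occurred earlier, so the encoded value never contains the row's own label or any future label.

```python
from collections import defaultdict


def time_ordered_target_encode(rows, prior, smoothing=1.0):
    """Encode a categorical value using labels of earlier transactions only.

    rows      -- list of (timestamp, category, label) tuples; label is 0 or 1
    prior     -- fallback fraud rate used when a category has little history
    smoothing -- weight of the prior (hypothetical parameter for this sketch)

    Returns the encoded values in the same order as the input rows.
    """
    # Process rows in chronological order, but remember original positions.
    order = sorted(range(len(rows)), key=lambda i: rows[i][0])

    label_sums = defaultdict(float)  # sum of past labels per category
    counts = defaultdict(int)        # number of past rows per category
    encoded = [0.0] * len(rows)

    for i in order:
        _, category, label = rows[i]
        # Encode BEFORE updating the statistics, so the current label
        # (and every later label) is excluded from the encoded value.
        encoded[i] = (label_sums[category] + prior * smoothing) / (
            counts[category] + smoothing
        )
        label_sums[category] += label
        counts[category] += 1

    # Simplification: rows sharing a timestamp are processed in input order,
    # so the earlier one still contributes to the later one.
    return encoded


# Toy data (made up): (timestamp, merchant category, is_fraud)
toy_rows = [(1, "A", 0), (2, "A", 1), (3, "B", 0), (4, "A", 1)]
toy_encoded = time_ordered_target_encode(toy_rows, prior=0.5)
print(toy_encoded)
```

A plain target encoder fitted on the whole table would instead average all labels of a category, including those of later transactions, which is the kind of information leakage the abstract refers to.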

мошеннические транзакции; catboost; кодирование категориальных переменных; catboost_encoder; target_encoder; bagging; создание переменных; Shapley values

fraudulent transactions; catboost; encoding categorical variables; catboost_encoder; target_encoder; bagging; variables creation; Shapley values

Mashrur A., Luo W., Zaidi N.A., Robles-Kelly A. Machine Learning for Financial Risk Management: A Survey. IEEE Access. 2020. Vol. 8. Pp. 203203–203223. DOI: 10.1109/ACCESS.2020.3036322

Awosika T., Shukla R.M., Pranggono B. Transparency and Privacy: The Role of Explainable AI and Federated Learning in Financial Fraud Detection. IEEE Access. 2024. Vol. 12. Pp. 64551–64560. DOI: 10.1109/ACCESS.2024.3394528

McMahan B., Moore E., Ramage D. et al. Communication-efficient learning of deep networks from decentralized data. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. 2017. Vol. 54. Pp. 1273–1282. DOI: 10.48550/arXiv.1602.05629

Ali A.A., Khedr A.M., El-Bannany M., Kanakkayil S. A Powerful Predicting Model for Financial Statement Fraud Based on Optimized XGBoost Ensemble Learning Technique. Applied Sciences. 2023. Vol. 13. No. 4. P. 2272. DOI: 10.3390/app13042272

He K., Yang Q., Ji L. et al. Financial Time Series Forecasting with the Deep Learning Ensemble Model. Mathematics. 2023. Vol. 11. No. 4. P. 1054. DOI: 10.3390/math11041054

Prokhorenkova L., Gusev G., Vorobev A. et al. CatBoost: unbiased boosting with categorical features. NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018. Pp. 6639–6649. DOI: 10.48550/arXiv.1706.09516

Micci-Barreca D. A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems. ACM SIGKDD Explorations Newsletter. 2001. Vol. 3. No. 1. Pp. 27–32. DOI: 10.1145/507533.507538

Dorogush A.V., Ershov V., Gulin A. CatBoost: gradient boosting with categorical features support. Workshop on ML Systems at NIPS. 2017. DOI: 10.48550/arXiv.1810.11363

Breiman L. Bagging predictors. Machine Learning. 1996. Vol. 24. No. 2. Pp. 123–140. DOI: 10.1007/BF00058655

Official website CatBoost. Common parameters. Available at: https://catboost.ai/en/docs/references/training-parameters/common#bagging_temperature (accessed: January 10, 2025)

Shapley L. Notes on the n-person game, II: The value of an n-person game. 1951.

Official website SHAP library. Available at: https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Catboost%20tutorial.html (accessed: January 10, 2025)

Brier G.W. Verification of forecasts expressed in terms of probability. Monthly Weather Review. 1950. Vol. 78. No. 1. Pp. 1–3. Bibcode: 1950MWRv...78....1B. DOI: 10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO

Akiba T., Sano S., Yanase T. et al. Optuna: A Next-generation Hyperparameter Optimization Framework. KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019. Pp. 2623–2631. DOI: 10.1145/3292500.3330701