
1991-6639

2949-1940

Известия Кабардино-Балкарского научного центра РАН

NEWS OF THE KABARDINO-BALKARIAN SCIENTIFIC CENTER OF RAS

КБНЦ РАН

10.35330/1991-6639-2026-28-2-34-50

ISSQXZ

https://izvestiyakbncran.ru/index.php/28-2-3/

ИНФОРМАТИКА И ИНФОРМАЦИОННЫЕ ПРОЦЕССЫ

INFORMATICS AND INFORMATION PROCESSES

Архитектура распределенной системы хранения и обработки больших данных на основе Apache Ozone и Argo Workflows

Architecture of a distributed storage and big data processing system based on Apache Ozone and Argo Workflows

Полянцева

Ксения Андреевна

Polyantseva

Ksenia A.

k.a.poliantseva@mtuci.ru

0000-0002-7102-4208

Комлев

Артем Владимирович

Komlev

Artem V.

komlev1257@gmail.com

Городничев

Михаил Геннадьевич

Gorodnichev

Mikhail G.

m.g.gorodnichev@mtuci.ru

0000-0003-1739-9831

Московский технический университет связи и информатики (Москва, Россия)

Moscow Technical University of Communications and Informatics (Moscow, Russia)

30.04.2026

2026

Vol. 28, No. 2, pp. 34–50

25.02.2026, 25.03.2026, 11.03.2026

Полянцева К. А., Комлев А. В., Городничев М. Г.

2026

Polyantseva K.A., Komlev A.V., Gorodnichev M.G.

CC BY 4.0

The article presents the architecture of a distributed system for storing and processing big data, built by integrating the Apache Ozone object store with the Argo Workflows orchestration system.

Aim. To develop and study an architecture for a distributed big data storage and processing system that integrates Apache Ozone and Argo Workflows and separates storage from computation, and to evaluate the proposed solution against the traditional Apache Hadoop architecture.

Methods. The study uses system analysis of big data architectures, comparative experimental testing of distributed storage and processing systems, and mathematical modeling to formalize resource scaling, computation time, and storage efficiency. The experimental evaluation was carried out on Apache Ozone and Apache Hadoop clusters, with Apache Spark running the computational tasks.

Results. The developed architecture scales the storage and computing subsystems independently by combining Apache Ozone object storage with Argo Workflows orchestration in a Kubernetes container environment. A method is proposed for integrating the components without an intermediate S3 gateway, which reduces interaction overhead. In the experiments, the proposed solution performed comparably to a Hadoop cluster on data reading, writing, and processing, and showed advantages in scaling flexibility and, with erasure coding, in disk space efficiency.

Conclusions. The results support using an architecture based on Apache Ozone and Argo Workflows as an alternative to traditional big data platforms. Separating storage from computation increases infrastructure flexibility, optimizes resource usage, and lowers storage costs while keeping performance at a comparable level. The approach can be applied to corporate analytical platforms, big data processing systems, and machine learning infrastructure.
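
The disk space advantage of erasure coding mentioned in the abstract can be illustrated with simple arithmetic. The sketch below compares 3x replication with a Reed-Solomon RS(6,3) layout. The abstract does not say which scheme the authors used, so RS(6,3), the 100 TB figure, and all names in the code are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy scheme described by its unit counts (illustrative model)."""
    name: str
    data_units: int   # units that hold original data
    total_units: int  # data units plus redundancy units

    @property
    def overhead(self) -> float:
        """Raw bytes stored per byte of user data."""
        return self.total_units / self.data_units

    @property
    def tolerated_failures(self) -> int:
        """How many units can be lost without losing data."""
        return self.total_units - self.data_units


def raw_capacity_tb(user_data_tb: float, scheme: Scheme) -> float:
    """Raw disk capacity needed to store the given amount of user data."""
    return user_data_tb * scheme.overhead


# Assumed schemes: classic 3x replication and Reed-Solomon with 6 data + 3 parity units.
REPLICATION_3X = Scheme("3x replication", data_units=1, total_units=3)
RS_6_3 = Scheme("RS(6,3)", data_units=6, total_units=9)

if __name__ == "__main__":
    for scheme in (REPLICATION_3X, RS_6_3):
        print(
            f"{scheme.name}: overhead {scheme.overhead:.2f}x, "
            f"tolerates {scheme.tolerated_failures} lost units, "
            f"100 TB of data needs {raw_capacity_tb(100, scheme):.0f} TB raw"
        )
```

Under these assumptions, replication needs 300 TB of raw capacity for 100 TB of data, while RS(6,3) needs 150 TB and still tolerates the loss of three units. This is a capacity comparison only and says nothing about read, write, or recovery performance.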

Keywords: distributed storage systems, big data, Apache Ozone, Argo Workflows, Kubernetes, Apache Spark, object storage, separation of storage and computing, scalability, data processing, container computing, fault tolerance

The study was performed without external funding.
