Development of an unmanned vehicle course control system based on reinforcement learning
A.E. Ushakov, M.M. Stebulyanin, M.A. Shereuzhev, F.V. Devyatkin
Abstract: Autonomous transportation is developing rapidly, driven by the need to improve road safety, reduce collisions, and make logistics operations more efficient. The trend is also shaped by increasingly complex road conditions and by navigation and control challenges for which traditional control algorithms no longer provide sufficient quality or effectiveness. Aim. The objective of this research is to develop an intelligent system that enables an autonomous vehicle to control its course independently. The autonomous agent (a vehicle model) learns to navigate and follow a predefined trajectory through reinforcement learning based on the Actor-Critic method, interacting with a simulation environment. Materials and Methods. The Stable-Baselines 3 (SB3) library, built on the PyTorch framework, was used to implement and train the reinforcement learning model. The DonkeyCar simulator served as the training environment. To improve the speed and efficiency of training, a denoising autoencoder was applied to extract the region of interest (ROI). Results. A series of comparative experiments was conducted to evaluate how various parameters affect training efficiency: speed limits, steering angle constraints, the allowable deviation width from the lane center, movement continuity, the discount factor, and the frame rendering rate. Conclusion. The results of the study demonstrate the potential of reinforcement learning in the field of autonomous transport. They also highlight the need for further training on real-world data, the prospects for scaling the approach to different classes of vehicles, and limitations related to computational resources and the need to verify safe behavior.
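Two of the parameters named in the abstract, the allowable deviation from the lane center and the discount factor, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the reward shape, and the arguments `cte` (cross-track error), `max_cte`, `speed`, and `max_speed` are assumptions introduced here for illustration only.

```python
from typing import List


def lane_keeping_reward(cte: float, max_cte: float,
                        speed: float, max_speed: float) -> float:
    """Hypothetical per-step reward (an assumption, not taken from the paper).

    cte       -- signed distance from the lane center
    max_cte   -- allowable deviation; exceeding it counts as leaving the lane
    speed     -- current speed, rewarded in proportion to max_speed
    """
    if abs(cte) > max_cte:
        return -1.0  # penalty for leaving the allowed corridor
    centering = 1.0 - abs(cte) / max_cte  # 1.0 at the center, 0.0 at the edge
    return centering * (speed / max_speed)


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

With a discount factor close to 1 the agent weights distant rewards almost as heavily as immediate ones, while a smaller value makes it short-sighted. For example, `discounted_returns([1.0, 1.0, 1.0], 0.5)` weights the third reward by only 0.25 in the return of the first step.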
Keywords: reinforcement learning, unmanned vehicle, Q-learning, DQN (Deep Q-Network), actor-critic, simulation modeling, intelligent system, simulation environment, training stability
For citation. Ushakov A.E., Stebulyanin M.M., Shereuzhev M.A., Devyatkin F.V. Development of an unmanned vehicle course control system based on reinforcement learning. News of the Kabardino-Balkarian Scientific Center of RAS. 2025. Vol. 27. No. 3. Pp. 39–54. DOI: 10.35330/1991-6639-2025-27-3-39-54
Information about the authors
Alexander E. Ushakov, Postgraduate student, Research Engineer, Department of Robotics and Mechatronics, Moscow State University of Technology “STANKIN”;
127055, Russia, Moscow, 1 Vadkovsky lane;
ushakov_ae@internet.ru, ORCID: https://orcid.org/0009-0006-1467-5043, SPIN-code: 5174-7378
Mikhail M. Stebulyanin, Doctor of Technical Sciences, Professor, Head of the Department of Robotics and Mechatronics, Moscow State University of Technology “STANKIN”;
127055, Russia, Moscow, 1 Vadkovsky lane;
mmsteb@rambler.ru, ORCID: https://orcid.org/0009-0007-3443-0593, SPIN-code: 4389-1120
Madin A. Shereuzhev, Candidate of Engineering Sciences, Associate Professor at the Department of Robotics and Mechatronics, Moscow State University of Technology “STANKIN”;
127055, Russia, Moscow, 1 Vadkovsky lane;
shereuzhev@gmail.com, ORCID: https://orcid.org/0000-0003-2352-992X, SPIN-code: 1734-9056
Fedor V. Devyatkin, Postgraduate student at the Department of ME7 “Robotic Systems and Mechatronics”, The Bauman Moscow State Technical University;
105005, Russia, Moscow, 5, 2nd Baumanskaya street;
Engineer, Moscow State University of Technology “STANKIN”;
127055, Russia, Moscow, 1 Vadkovsky lane;
feodor-dev@ya.ru, ORCID: https://orcid.org/0009-0000-2639-9521, SPIN-code: 7738-5724