On the application of reinforcement learning in the task of choosing the optimal trajectory

M.G. Gorodnichev

Abstract: This paper reviews state-of-the-art reinforcement learning methods, with a focus on their application in dynamic and complex environments. The study begins by analysing the main approaches to reinforcement learning: dynamic programming, Monte Carlo methods, temporal-difference methods and policy gradients. Special attention is given to the Generative Adversarial Imitation Learning (GAIL) methodology and its impact on the optimisation of agents’ strategies. Model-free learning is examined, and criteria are identified for selecting agents capable of operating in continuous action and state spaces. The experimental part analyses the training of agents that use different types of sensors, including visual sensors, and demonstrates their ability to adapt to the environment despite resolution constraints. A comparison of results based on cumulative reward and episode length reveals improved agent performance in the later stages of training. The study confirms that imitation learning significantly improves agent performance by reducing time costs and improving decision-making strategies. Promising directions for further work include mechanisms for improving sensor resolution and fine-tuning hyperparameters.

Keywords: reinforcement learning, intelligent agents, optimal trajectory, highly automated vehicles, policy-based learning, actor-critic architectures, imitation learning, sensors, continuous states, discrete states, PPO, SAC

For citation. Gorodnichev M.G. On the application of reinforcement learning in the task of choosing the optimal trajectory. News of the Kabardino-Balkarian Scientific Center of RAS. 2025. Vol. 27. No. 2. Pp. 86–102. DOI: 10.35330/1991-6639-2025-27-2-86-102

Information about the author

Mikhail G. Gorodnichev, Candidate of Engineering Sciences, Associate Professor, Dean of the Faculty of Information Technology, Moscow Technical University of Communications and Informatics;

111024, Russia, Moscow, 8A Aviamotornaya street;

m.g.gorodnichev@mtuci.ru, ORCID: https://orcid.org/0000-0003-1739-9831, SPIN-code: 4576-9642
