Modification of a deep learning algorithm for distributing functions and tasks between a robotic complex and a person in conditions of uncertainty and variability of the environment
M.A. Shereuzhev, Wu Guo, V.V. Serebrenny
Abstract. In the real world, conditions are rarely stable, which requires robotic systems to be able to adapt to uncertainty. Human-robot collaboration increases productivity, but it requires effective task allocation methods that account for the capabilities of both parties. The aim of the work is to determine optimal strategies for distributing tasks between humans and collaborative robots, and for adaptive control of a collaborative robot under uncertainty in a changing environment. Research methods. The paper develops a graph-based approach to task allocation based on the capabilities of the human and the robot. An LSTM memory mechanism is built into the reinforcement learning algorithm to address the partial observability caused by inaccurate sensor measurements and environmental noise. The Hindsight Experience Replay (HER) method is used to overcome the problem of sparse rewards. Results. The trained model demonstrated stable convergence, achieving a high success rate in object manipulation. Integrating LSTM and HER into reinforcement learning makes it possible to distribute tasks between a human and a robot under uncertainty in a changing environment. The proposed method can be applied in various scenarios for collaborative robots operating in complex and changing conditions.
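The HER idea mentioned in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is a generic "future"-strategy goal relabeling in plain Python, with hypothetical names (`Transition`, `her_relabel`) chosen for illustration: failed transitions are stored a second time with the goal replaced by a state actually achieved later in the episode, turning a sparse reward signal into informative feedback.

```python
import random
from collections import namedtuple

# Hypothetical goal-conditioned transition record
Transition = namedtuple("Transition", "state action goal achieved reward done")

def her_relabel(episode, k=4, reward_fn=None):
    """Relabel transitions with goals achieved later in the same episode
    (the 'future' strategy from Andrychowicz et al., 2017)."""
    if reward_fn is None:
        # Sparse reward: 0 when the achieved state matches the goal, else -1
        reward_fn = lambda achieved, goal: 0.0 if achieved == goal else -1.0
    relabeled = []
    for t, tr in enumerate(episode):
        future = episode[t:]  # this transition and everything after it
        for _ in range(min(k, len(future))):
            new_goal = random.choice(future).achieved
            relabeled.append(tr._replace(
                goal=new_goal,
                reward=reward_fn(tr.achieved, new_goal),
                done=tr.achieved == new_goal,
            ))
    return relabeled

# A failed episode: the original goal (99) is never reached,
# so every original reward is -1
episode = [Transition(s, 0, goal=99, achieved=s, reward=-1.0, done=False)
           for s in range(5)]
extra = her_relabel(episode, k=2)
```

After relabeling, some of the stored copies receive a reward of 0 (the agent "succeeded" at the goal it actually reached), which is what lets a replay-based learner make progress despite never hitting the original goal.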
Keywords: human-robot interaction, adaptive control algorithm, task distribution, reinforcement learning
For citation. Shereuzhev M.A., Guo Wu, Serebrenny V.V. Modification of a deep learning algorithm for distributing functions and tasks between a robotic complex and a person in conditions of uncertainty and variability of the environment. News of the Kabardino-Balkarian Scientific Center of RAS. 2024. Vol. 26. No. 6. Pp. 208–218. DOI: 10.35330/1991-6639-2024-26-6-208-218
References
- Fiore M., Clodic A., Alami R. On planning and task achievement modalities for human-robot collaboration. In Experimental Robotics: The 14th International Symposium on Experimental Robotics. Marrakech, Morocco: Springer. 2016. Pp. 293–306.
- Ghadirzadeh A., Chen X., Yin W. et al. Human-centered collaborative robots with deep reinforcement learning. IEEE Robotics and Automation Letters. 2020. Vol. 6(2). Pp. 566–571. DOI: 10.48550/arXiv.2007.01009
- Qureshi A.H., Nakamura Y., Yoshikawa Y., Ishiguro H. Robot gains social intelligence through multimodal deep reinforcement learning. In IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids). 2016. Pp. 745–751. DOI: 10.48550/arXiv.1702.07492
- Kwok Y.K., Ahmad I. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys. 1999. Vol. 31(4). Pp. 406–471. DOI: 10.1145/344588.344618
- Malik A.A., Bilberg A. Complexity-based task allocation in human-robot collaborative assembly. Industrial Robot: International Journal of Robotics Research and Application. 2019. Vol. 46(4). Pp. 471–480. DOI: 10.1108/IR-11-2018-0231
- Lucignano L., Cutugno F., Rossi S., Finzi A. A dialogue system for multimodal human-robot interaction. Proceedings of the 15th ACM on International Conference on Multimodal Interaction. 2013. Pp. 197–204. DOI: 10.1145/2522848.2522873
- Qiu C., Hu Y., Chen Y., Zeng B. Deep deterministic policy gradient (DDPG)-based energy harvesting wireless communications. IEEE Internet of Things Journal. 2019. Vol. 6(5). Pp. 8577–8588. DOI: 10.1109/JIOT.2019.2921159
- Hochreiter S., Schmidhuber J. Long short-term memory. Neural Computation. 1997. Vol. 9(8). Pp. 1735–1780.
- Andrychowicz M., Wolski F., Ray A. et al. Hindsight experience replay. Advances in Neural Information Processing Systems. 2017. Vol. 30.
- Towers M., Kwiatkowski A., Terry J. et al. Gymnasium: A standard interface for reinforcement learning environments. arXiv:2407.17032. 2024. DOI: 10.48550/arXiv.2407.17032
Information about the author
Madin A. Shereuzhev, Candidate of Engineering Sciences, Junior Researcher, Center for Cognitive
Technologies and Machine Vision Systems, Moscow State University of Technology STANKIN;
127055, Russia, Moscow, 1 Vadkovsky lane;
Senior Lecturer, Department of Robotic Systems and Mechatronics, Moscow State Technical
University named after N.E. Bauman;
105005, Russia, Moscow, 2nd Baumanskaya street 5, bld 1;
m.shereuzhev@stankin.ru, ORCID: http://orcid.org/0000-0003-2352-992X; SPIN-code: 1734-9056
Wu Guo, Post-graduate Student at the Department of Robotic Systems and Mechatronics, Moscow
State Technical University named after N.E. Bauman;
105005, Russia, Moscow, 2nd Baumanskaya street 5, bld 1;
ug@student.bmstu.ru, ORCID: http://orcid.org/0000-0001-8424-4421; SPIN-code: 9189-9658
Vladimir V. Serebrenny, Candidate of Engineering Sciences, Associate Professor, Head of the Department
of Robotic Systems and Mechatronics, Moscow State Technical University named after N.E. Bauman;
105005, Russia, Moscow, 2nd Baumanskaya street 5, bld 1;
vsereb@bmstu.ru, ORCID: http://orcid.org/0000-0003-1182-2117, SPIN-code: 5410-8433