Research on multi-objective path planning and dynamic obstacle avoidance algorithm of manipulator based on reinforcement learning

Authors

  • Zhengis Arkalyk

DOI:

https://doi.org/10.56028/aetr.14.1.1405.2025

Keywords:

Multi-objective path planning; Dynamic obstacle avoidance; Manipulator; Reinforcement learning; Hierarchical reinforcement learning; Proximal policy optimization; Soft actor-critic; LSTM.

Abstract

To address multi-objective path planning (MOPP) and dynamic obstacle avoidance for manipulators operating in dynamic environments, this paper proposes a solution based on a hierarchical reinforcement learning (HRL) framework. Traditional path planning methods struggle in dynamic scenes: their real-time performance is poor, they have difficulty balancing conflicting objectives, and they adapt poorly to environmental changes. This paper therefore designs a two-tier architecture consisting of a global path planning layer and a local obstacle avoidance layer, trained with Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), respectively. The two layers are optimized collaboratively through an efficient inter-layer information exchange mechanism. In addition, a multi-objective reward function with a dynamic weight adjustment strategy is introduced, in which fuzzy logic adaptively balances path length, obstacle avoidance safety, and energy consumption according to environmental complexity. A Long Short-Term Memory (LSTM) network predicts obstacle trajectories, and the potential field method is used to modify the obstacle avoidance reward, improving the algorithm's real-time responsiveness and robustness in dynamic environments. Experimental results show that the proposed HRL-SAC-PPO method performs well in both static and dynamic scenarios. In the static scene, it achieves a 100% success rate, an average path length of 2.13 m, zero collisions, and an energy consumption of 1.12 kJ, demonstrating effective multi-objective optimization. In the dynamic scene, the LSTM obstacle trajectory prediction error is only 4.2%, and the safe distance between the manipulator and obstacles increases by 35%, which significantly improves obstacle avoidance reliability.
In addition, the method's average decision latency is only 11.3 ms, with a peak of 23 ms, far lower than that of the comparison algorithms, indicating stronger real-time responsiveness. Ablation experiments further confirm that LSTM trajectory prediction, dynamic weight adjustment, and the hierarchical structure each play a key role in overall performance. In a welding task on a real UR5 manipulator in a dynamic environment, the system achieves a 95.3% success rate, an average path length of 3.41 m, and a maximum joint acceleration of only 0.87 rad/s², well below the safety threshold, indicating good stability and obstacle avoidance capability in practical applications. A comprehensive performance comparison shows that the method performs well across different industrial scenarios, with stronger environmental adaptability and overall path planning capability.
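The abstract does not give the reward formula, the fuzzy rule base, or the potential-field parameters. The Python sketch below shows one plausible way a dynamically weighted multi-objective reward of this kind could be assembled, under stated assumptions. A zero-order Sugeno fuzzy system maps a normalized environmental complexity to weights for path length, safety, and energy, and a classical repulsive potential penalizes proximity to the nearest current or predicted obstacle. The membership functions, rule weights (`RULES`), repulsive gain `eta`, influence radius `d0`, the `complexity` input in [0, 1], and all function names are hypothetical choices, not the paper's. Predicted obstacle positions (which the paper obtains from an LSTM) are simply passed in as a list.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical rule consequents: (w_length, w_safety, w_energy) per complexity level.
# Each triple sums to 1, so the defuzzified weights also sum to 1.
RULES = {
    "low": (0.6, 0.2, 0.2),     # open workspace: favour short paths
    "medium": (0.4, 0.4, 0.2),  # balanced trade-off
    "high": (0.2, 0.7, 0.1),    # cluttered workspace: favour clearance
}


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_weights(complexity: float) -> Tuple[float, ...]:
    """Zero-order Sugeno inference: map complexity in [0, 1] to reward weights."""
    c = min(max(complexity, 0.0), 1.0)
    memberships = {
        "low": tri(c, -0.5, 0.0, 0.5),
        "medium": tri(c, 0.0, 0.5, 1.0),
        "high": tri(c, 0.5, 1.0, 1.5),
    }
    total = sum(memberships.values())  # equals 1 on [0, 1] for these sets
    # Defuzzify as the membership-weighted average of the rule outputs.
    return tuple(
        sum(memberships[k] * RULES[k][i] for k in RULES) / total for i in range(3)
    )


def repulsive_potential(d: float, d0: float = 0.3, eta: float = 1.0) -> float:
    """Artificial potential field repulsion: zero beyond the influence radius d0."""
    if d >= d0:
        return 0.0
    d = max(d, 1e-6)  # avoid division by zero on contact
    return 0.5 * eta * (1.0 / d - 1.0 / d0) ** 2


def step_reward(
    ee_prev: Vec3,
    ee_curr: Vec3,
    obstacles_now: Sequence[Vec3],
    obstacles_pred: Sequence[Vec3],
    joint_torques: Sequence[float],
    joint_vels: Sequence[float],
    dt: float,
    complexity: float,
) -> float:
    """Per-step reward: negative weighted sum of path, safety and energy costs."""
    w_len, w_safe, w_energy = fuzzy_weights(complexity)
    # Path cost: end-effector displacement during this step.
    path_inc = math.dist(ee_prev, ee_curr)
    # Safety cost: repulsion from the nearest current OR predicted obstacle position.
    d_min = min(
        (math.dist(ee_curr, o) for o in [*obstacles_now, *obstacles_pred]),
        default=math.inf,
    )
    safety_cost = repulsive_potential(d_min)
    # Energy cost: absolute mechanical power |tau * qdot| integrated over dt.
    energy = sum(abs(t * v) for t, v in zip(joint_torques, joint_vels)) * dt
    return -(w_len * path_inc + w_safe * safety_cost + w_energy * energy)


if __name__ == "__main__":
    r = step_reward(
        (0.0, 0.0, 0.5), (0.01, 0.0, 0.5),
        obstacles_now=[(0.1, 0.0, 0.5)], obstacles_pred=[(0.08, 0.0, 0.5)],
        joint_torques=[2.0] * 6, joint_vels=[0.1] * 6, dt=0.01, complexity=0.8,
    )
    print(f"reward = {r:.3f}")
```

Because the fuzzy sets form a partition of unity and every rule output sums to 1, the weights shift smoothly from path length toward safety as complexity rises, rather than switching abruptly between fixed presets. Using the nearer of the current and predicted obstacle positions lets the safety term react before a moving obstacle actually arrives.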

Published

2025-07-26