XIONG Chunyuan, XIONG Juntao, YANG Zhengang, et al. Path planning method for citrus picking manipulator based on deep reinforcement learning[J]. Journal of South China Agricultural University, 2023, 44(3): 473-483. doi: 10.7671/j.issn.1001-411X.202206024

    Path planning method for citrus picking manipulator based on deep reinforcement learning


      Abstract:
      Objective  To solve the problems of poor training efficiency and low success rate in picking path planning for manipulators using deep reinforcement learning (DRL), this study proposed a path planning method combining DRL and an artificial potential field for a citrus picking manipulator in unstructured environments.
      Method  Firstly, the picking path planning problem was solved by a DRL method combined with an artificial potential field. Secondly, a long short-term memory (LSTM) structure was introduced to improve the Actor and Critic networks of two DRL algorithms. Finally, the DRL algorithms were trained in three different unstructured citrus growing environments to perform path planning for the picking manipulator.
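The artificial potential field used to shape the planning problem can be sketched as the classical attractive/repulsive formulation: the goal pulls the end-effector while obstacles inside an influence radius push it away. The gains `k_att`, `k_rep` and radius `d0` below are illustrative assumptions, not the authors' tuned parameters.

```python
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=0.5):
    """Resultant artificial-potential-field force at point `pos` (3-D).

    Attractive term: negative gradient of 0.5 * k_att * |goal - pos|^2.
    Repulsive term: added for each obstacle closer than the influence
    radius d0, pushing away from that obstacle.
    """
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    force = k_att * (goal - pos)  # pulls straight toward the goal
    for obs in obstacles:
        diff = pos - np.asarray(obs, float)
        d = np.linalg.norm(diff)
        if 0.0 < d < d0:  # obstacle inside the influence radius
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return force
```

With no obstacles the force points directly at the goal; an obstacle between the current position and the goal adds a strong term pushing away from it, which is the behavior a DRL reward or action-shaping term can exploit.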
      Result  Simulation comparison experiments showed that combining DRL with the artificial potential field method effectively improved the success rate of path planning. The method with the LSTM structure improved the convergence speed of the deep deterministic policy gradient (DDPG) algorithm by 57.25% and its path planning success rate by 23.00%; meanwhile, it improved the convergence speed of the soft actor critic (SAC) algorithm by 53.73% and its path planning success rate by 9.00%. Compared with the traditional RRT-connect (Rapidly-exploring random trees connect) algorithm, the SAC algorithm with the LSTM structure shortened the planned path length by 16.20% and improved the path planning success rate by 9.67%.
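The LSTM-modified Actor described in the Method section can be sketched as below: an LSTM layer consumes a short history of observations before the usual fully connected head emits bounded joint commands. This is a minimal PyTorch sketch; the layer sizes and the single-LSTM-layer layout are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Actor network with an LSTM layer in front of the policy head.

    Illustrative sketch of inserting long short-term memory into the
    Actor of DDPG/SAC; hidden size 128 is an assumed value.
    """
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded joint commands
        )

    def forward(self, states, hc=None):
        # states: (batch, seq_len, state_dim) — a short observation history
        out, hc = self.lstm(states, hc)
        return self.head(out[:, -1]), hc  # act on the last time step
```

The Critic can be modified the same way, replacing its first fully connected layer with an LSTM so value estimates also condition on the recent observation history.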
      Conclusion  The proposed path planning method has certain advantages in planned path length and path planning success rate, and can provide a reference for solving path planning problems of picking robots in unstructured environments.

       
