Intelligent dynamic prevention and control strategy for citrus Huanglongbing based on deep reinforcement learning

    • Abstract:
      Objective  The transmission of citrus Huanglongbing (HLB) is shaped by multiple coupled dynamic factors. Traditional optimal control methods are of limited use in practice because of their high computational complexity and reliance on precise models. To address this problem, this paper proposes an intelligent dynamic prevention and control method for HLB based on the twin delayed deep deterministic policy gradient (TD3) algorithm.
      Method  First, an HLB transmission control dynamics model incorporating the host-vector interaction mechanism was established and then discretized to construct a Markov decision process environment suitable for deep reinforcement learning. Subsequently, the TD3 algorithm was introduced, and a multi-objective reward function compatible with biological constraints was designed. Finally, an HLB prevention and control strategy was proposed.
      Result  Simulation results showed that, compared with traditional algorithms such as DDPG and PPO, the proposed TD3-based dynamic prevention and control strategy had clear advantages on several key performance indicators. Relative to DDPG and PPO, the system state converged to the disease-free equilibrium point about 26.59% and 20.99% faster, respectively; the cumulative control cost was 23.79% and 19.90% lower, respectively; and peak insecticide usage fell by about 35.57%. Numerical analysis further showed that timely insecticide spraying in the early stage of an HLB outbreak played a critical role in interrupting the transmission chain and preventing large-scale epidemics. Compared with constant control strategies, dynamic control strategies were more effective at suppressing disease spread and at reducing the cost of implementing control.
      Conclusion  The TD3-based HLB prevention and control method proposed in this study provides a new perspective for the efficient control of HLB transmission and demonstrates the potential of deep reinforcement learning in agricultural disease prevention and control.
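
      The pipeline described under Method (discretize a host-vector model into a Markov decision process, score each step with a multi-objective reward, and train with TD3) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not give the model equations, parameter values, or reward weights, so the two-state toy model, the constants, and the function names `step`, `smoothed_action`, and `td3_target` are hypothetical. Only the two TD3 ingredients shown (target policy smoothing and the clipped double-Q target) are standard parts of the algorithm.

```python
import numpy as np

# Hypothetical parameters for a toy host-vector model (NOT taken from the paper).
BETA_H = 0.3   # vector-to-host transmission rate
BETA_V = 0.4   # host-to-vector transmission rate
MU_V = 0.1     # natural mortality of the insect vector
R_TREE = 0.02  # removal rate of infected trees
DT = 0.1       # Euler step used to discretize the dynamics
COST_W = 0.5   # assumed weight of control cost in the reward


def step(state, u):
    """One discretized transition of the toy model.

    state = (infected host fraction, infected vector fraction);
    u in [0, 1] is the insecticide intensity chosen by the agent.
    """
    i_h, i_v = state
    u = float(np.clip(u, 0.0, 1.0))
    d_ih = BETA_H * i_v * (1.0 - i_h) - R_TREE * i_h
    d_iv = BETA_V * i_h * (1.0 - i_v) - (MU_V + u) * i_v
    nxt = np.clip(np.array([i_h + DT * d_ih, i_v + DT * d_iv]), 0.0, 1.0)
    # Assumed multi-objective reward: penalize infection level and control cost.
    reward = -(nxt[0] + nxt[1]) - COST_W * u ** 2
    return nxt, float(reward)


def smoothed_action(mu, rng, sigma=0.2, noise_clip=0.5, low=0.0, high=1.0):
    """TD3 target policy smoothing: add clipped Gaussian noise to the target action."""
    noise = np.clip(rng.normal(0.0, sigma), -noise_clip, noise_clip)
    return float(np.clip(mu + noise, low, high))


def td3_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """TD3 clipped double-Q target: bootstrap from the smaller of the two critics."""
    bootstrap = 0.0 if done else min(q1_next, q2_next)
    return reward + gamma * bootstrap


if __name__ == "__main__":
    s0 = np.array([0.2, 0.3])
    no_spray, _ = step(s0, 0.0)
    full_spray, _ = step(s0, 1.0)
    print("infected vectors without / with spraying:", no_spray[1], full_spray[1])
```

      In this toy setting a larger `u` lowers the infected vector fraction at the next step but also lowers the reward through the cost term, which is the kind of trade-off a learned dynamic policy has to balance; the actual model and reward in the paper may differ.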

       
