An adaptive spectral data augmentation framework based on the PPO algorithm: A novel method for predicting longan TSS content

      Abstract:
      Objective Rapid and accurate assessment of longan maturity is crucial for determining the harvest time. Total soluble solids (TSS) content is the core index for evaluating maturity, but traditional detection methods are time-consuming and cannot meet real-time requirements. Near-infrared spectroscopy (NIRS) is a non-destructive technique with considerable potential for this task; however, the small-sample problem caused by data scarcity in agricultural settings severely limits its modeling accuracy and practical applicability. This study aims to overcome this bottleneck and achieve rapid, non-destructive detection of TSS content in longan.
      Method This study proposed an adaptive spectral data augmentation framework based on the proximal policy optimization (PPO) algorithm. The framework formulated the selection of augmentation strategies as a reinforcement learning task, in which an agent dynamically combined transformations such as noise addition, translation, and scaling to generate augmented samples that improved the prediction accuracy of a downstream CatBoost model. A Z-score outlier filtering mechanism was also introduced to ensure that the augmented data remained physically plausible.
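The abstract does not specify the operator definitions, the action space, or the PPO settings, so the following Python sketch shows only one plausible reading, under stated assumptions. `augment` applies a multiplicative scale, an additive baseline offset (here taken as the meaning of "translation"; a shift along the wavelength axis is another possible reading) and zero-mean Gaussian noise, with the three parameters supplied as a hypothetical continuous action `(sigma, offset, scale)`. `zscore_filter` drops augmented spectra whose per-wavelength z-score against the original spectra exceeds an assumed threshold of 3. `ppo_clip_objective` is the standard PPO clipped surrogate objective that a policy emitting such actions would maximize. All function names, the threshold, and the clipping parameter `eps=0.2` are illustrative assumptions, and the CatBoost training loop that would supply rewards is omitted.

```python
import numpy as np

# Hypothetical sketch: names, action encoding and thresholds are assumptions,
# not the authors' implementation.


def augment(spectra, targets, action, rng):
    """Apply one hypothetical augmentation action to a batch of spectra.

    action = (sigma, offset, scale), assumed to come from the policy:
      scale  -> multiplicative scaling of each spectrum
      offset -> additive baseline offset (one reading of "translation")
      sigma  -> standard deviation of added zero-mean Gaussian noise
    The transforms are assumed label-preserving, so TSS targets are copied.
    """
    sigma, offset, scale = action
    shifted = scale * spectra + offset
    noisy = shifted + rng.normal(0.0, sigma, size=spectra.shape)
    return noisy, targets.copy()


def zscore_filter(augmented, reference, threshold=3.0):
    """Keep augmented spectra whose per-wavelength |z| stays within threshold.

    z-scores use the mean and standard deviation of the original
    (reference) spectra at each wavelength.
    """
    mu = reference.mean(axis=0)
    sd = reference.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)  # guard against constant channels
    z = np.abs((augmented - mu) / sd)
    keep = np.all(z <= threshold, axis=1)
    return augmented[keep], keep


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(1.0, 0.1, size=(20, 50))  # toy spectra, not real data
    y = rng.uniform(15.0, 22.0, size=20)      # toy TSS values
    X_aug, y_aug = augment(X, y, action=(0.01, 0.02, 1.03), rng=rng)
    X_kept, mask = zscore_filter(X_aug, X)
    y_kept = y_aug[mask]  # keep labels aligned with the surviving spectra
    print(f"kept {mask.sum()} of {len(mask)} augmented spectra")
```

One natural reward for this setup, also an assumption rather than a detail given in the abstract, is the change in the downstream model's validation score after training on the original plus filtered augmented samples; the advantages passed to `ppo_clip_objective` would then be estimated from those rewards.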
      Result On a longan TSS dataset of 400 samples, the proposed framework achieved a coefficient of determination (R²) of 0.7556, a mean absolute error (MAE) of 1.6961, and a relative predictive deviation (RPD) of 1.6044 on the test set. It outperformed baseline models, including principal component analysis combined with support vector regression (PCA+SVR, R² = 0.1227) and with random forest regression (PCA+RFR, R² = 0.6218), as well as single fixed augmentation strategies. Its overfitting gap (the difference in R² between the training and validation sets) was only 0.1443, indicating that the framework improved model robustness in addition to prediction accuracy.
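The metrics reported in the result can be computed as in the sketch below; the function names are hypothetical, and the abstract does not state its exact conventions. RPD is taken here as the standard deviation of the reference values divided by the root-mean-square error of prediction, using the population standard deviation (`ddof=0`); under that convention, and with R² computed on the same set, RPD equals 1/sqrt(1 - R²). Variants that use the sample standard deviation or the bias-corrected standard error of prediction (SEP) give different values. The overfitting gap follows the abstract's definition: training R² minus validation R².

```python
import numpy as np

# Hypothetical helpers; the conventions (e.g. ddof=0 for RPD) are assumptions.


def _as_arrays(y_true, y_pred):
    return np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    t, p = _as_arrays(y_true, y_pred)
    ss_res = np.sum((t - p) ** 2)
    ss_tot = np.sum((t - t.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mae(y_true, y_pred):
    """Mean absolute error."""
    t, p = _as_arrays(y_true, y_pred)
    return float(np.mean(np.abs(t - p)))


def rpd(y_true, y_pred):
    """Relative predictive deviation: population SD of y_true / RMSE."""
    t, p = _as_arrays(y_true, y_pred)
    rmse = np.sqrt(np.mean((t - p) ** 2))
    return float(t.std() / rmse)


def overfitting_gap(r2_train, r2_val):
    """Training R^2 minus validation R^2, as defined in the abstract."""
    return r2_train - r2_val


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 2.0, 3.0, 5.0]
    print(r2_score(y_true, y_pred))  # approximately 0.8
    print(mae(y_true, y_pred))       # 0.25
    print(rpd(y_true, y_pred))       # approximately 2.236, i.e. sqrt(5)
```

The toy vectors in the demo are illustrative only and are unrelated to the longan dataset.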
      Conclusion This study provides an intelligent, adaptive, stable, and reliable approach to the small-sample problem in agricultural spectral analysis, with theoretical significance and broad application potential.