Abstract:
Objective Rapid and accurate assessment of longan maturity is crucial for scheduling harvest time. Total soluble solids (TSS) content serves as the core evaluation index, yet traditional detection methods are time-consuming and cannot meet real-time requirements. Near-infrared spectroscopy (NIRS), a non-destructive monitoring technology, holds significant application potential. However, the small-sample problem caused by data scarcity in agricultural scenarios severely limits its modeling accuracy and practical applicability. This study aims to address this bottleneck and achieve rapid, non-destructive detection of TSS content in longan.
Method This study proposed an adaptive spectral data augmentation framework based on the proximal policy optimization (PPO) algorithm. The framework formulated the selection of augmentation strategies as a reinforcement learning problem, in which an agent dynamically combined actions such as noise addition, translation, and scaling to generate high-quality samples that improved the prediction performance of the downstream CatBoost model. A Z-score outlier filtering mechanism was also introduced to keep the augmented data physically plausible.
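The augmentation actions and the Z-score filter named above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, parameter values, the reading of "translation" as a per-spectrum baseline offset, and the per-band threshold of 3 are all assumptions made for illustration. The fixed action list stands in for the choices a PPO agent would make, and the spectra are synthetic.

```python
import numpy as np

# Hypothetical action set; all parameter values are illustrative assumptions.
def add_noise(x, rng, sigma=0.01):
    """Add zero-mean Gaussian noise to every band."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def translate(x, rng, max_offset=0.02):
    """Add one random baseline offset per spectrum (assumed meaning of 'translation')."""
    offset = rng.uniform(-max_offset, max_offset, size=(x.shape[0], 1))
    return x + offset

def scale(x, rng, max_change=0.05):
    """Multiply each spectrum by one random factor close to 1."""
    factor = rng.uniform(1.0 - max_change, 1.0 + max_change, size=(x.shape[0], 1))
    return x * factor

ACTIONS = {"noise": add_noise, "translate": translate, "scale": scale}

def augment(spectra, action_names, rng):
    """Apply the chosen actions in order to a (n_samples, n_bands) array."""
    out = spectra.copy()
    for name in action_names:
        out = ACTIONS[name](out, rng)
    return out

def zscore_filter(candidates, reference, threshold=3.0):
    """Keep candidates whose largest per-band |z| against the reference set is within the threshold."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0) + 1e-12  # guard against zero variance
    z = np.abs((candidates - mean) / std)
    return candidates[z.max(axis=1) <= threshold]

rng = np.random.default_rng(0)
spectra = rng.normal(loc=0.5, scale=0.1, size=(40, 100))  # synthetic stand-in data
candidates = augment(spectra, ["noise", "scale"], rng)    # fixed list in place of a learned policy
kept = zscore_filter(candidates, spectra)
```

In a reinforcement learning setting, the list passed to `augment` would be chosen by the agent, and the reward would come from the downstream model's validation performance. Neither part is shown here.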
Result On a longan TSS dataset of 400 samples, the proposed framework achieved a coefficient of determination (R²) of 0.7556, a mean absolute error (MAE) of 1.6961, and a relative predictive deviation (RPD) of 1.6044 on the test set. It markedly outperformed baseline models such as PCA+SVR (R²=0.1227) and PCA+RFR (R²=0.6218), as well as single fixed augmentation strategies. Its overfitting gap (the difference in R² between the training set and the validation set) was only 0.1443, indicating that the framework improved prediction accuracy while keeping the model robust.
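For readers who want to reproduce this kind of evaluation, the reported metrics can be computed as in the sketch below. The abstract does not define RPD, so the sketch assumes a common chemometrics convention: the sample standard deviation of the reference values divided by the root mean squared error of prediction. The function names and toy values are illustrative and are not the study's data.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def rpd(y_true, y_pred):
    """Assumed definition: sample SD of reference values divided by RMSE."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return np.std(y_true, ddof=1) / rmse

def overfitting_gap(r2_train, r2_val):
    """Difference in R² between the training set and the validation set."""
    return r2_train - r2_val

# Toy values for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])
scores = {"r2": r2(y_true, y_pred), "mae": mae(y_true, y_pred), "rpd": rpd(y_true, y_pred)}
```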
Conclusion This study provides an adaptive, stable, and reliable solution to the small-sample challenge in agricultural spectral analysis, with significant theoretical implications and broad application potential.