Abstract:
Objective Post-harvest handling of vegetable baskets in protected greenhouses is still dominated by manual labor, which suffers from low efficiency and high labor intensity and seriously restricts the large-scale, intelligent development of agricultural production. Developing an agricultural robot with autonomous basket-grabbing functionality is a key technical path to breaking this bottleneck and improving production efficiency. Accurate basket pose estimation based on computer vision is the core premise and technical foundation for ensuring the stability and reliability of the robot's grabbing actions. However, the accuracy and real-time performance of existing pose estimation methods struggle to meet actual operational requirements in complex greenhouse environments, so further research and optimization are urgently needed.
Method Based on the YOLOv8-pose baseline model, this approach estimated the basket's pose by detecting its feature points and integrating the PnP algorithm. First, RGB images of baskets against diverse complex backgrounds were captured with a monocular camera to construct a dedicated dataset. Second, the BiFormer module, the GAM attention mechanism, and the Focaler_GIoU loss function were incorporated into the YOLOv8-pose framework to enhance keypoint detection robustness in challenging scenarios involving cluttered backgrounds and occlusions. Finally, leveraging the basket's predefined dimensional parameters and the detected 2D keypoint coordinates, the PnP algorithm was employed to solve for the basket's 3D pose in physical space.
Result The mean average precision (mAP) and precision of keypoint detection increased by 3.73 and 4.31 percentage points, respectively. The average positioning error decreased by 5.20 pixels, and the root mean square error (RMSE) between the detected keypoints and manually annotated keypoints decreased by 4.45 pixels on average. The pose estimation algorithm achieved higher accuracy when the camera was 1.7 to 1.9 m from the basket, highlighting the critical influence of the camera-to-basket distance on localization precision.
Conclusion The method proposed in this study provides a low-cost, high-precision solution for basket pose estimation in protected greenhouse scenarios and offers technical support for agricultural robots grasping baskets.