基于混合分组扩张卷积的玉米植株图像深度估计

周云成; 刘忠颖; 邓寒冰; 苗腾; 王昌远

doi:10.7671/j.issn.1001-411X.202304019

摘要:

目的研究面向玉米田间场景的图像深度估计方法，解决深度估计模型因缺少有效光度损失度量而易产生的精度不足问题，为田间智能农业机械视觉系统设计及导航避障等提供技术支持。

方法应用双目相机作为视觉传感器，提出一种基于混合分组扩张卷积的无监督场景深度估计模型。设计一种混合分组扩张卷积结构及对应的自注意力机制，由此构建反向残差模块和深度估计骨干网络；并将光照不敏感的图像梯度和Gabor纹理特征引入视图表观差异度量，构建模型优化目标。以田间玉米植株图像深度估计为例，开展模型的训练和测试试验。

结果与固定扩张因子相比，采用混合分组扩张卷积使田间玉米植株深度估计平均相对误差降低了63.9%，平均绝对误差和均方根误差则分别降低32.3%和10.2%，模型精度显著提高；图像梯度、Gabor纹理特征和自注意力机制的引入，使田间玉米植株深度估计平均绝对误差和均方根误差进一步降低3.2%和4.6%。增加浅层编码器的网络宽度和深度可显著提高模型深度估计精度，但该处理对深层编码器的作用不明显。该研究设计的自注意力机制对编码器浅层反向残差模块中不同扩张因子的卷积分组体现出选择性，说明该机制具有自主调节感受野的能力。与Monodepth2相比，该研究模型田间玉米植株深度估计的平均相对误差降低48.2%，平均绝对误差降低17.1%；在20 m采样范围内，估计深度的平均绝对误差小于16 cm，计算速度为14.3帧/s。

结论基于混合分组扩张卷积的图像深度估计模型优于现有方法，有效提升了深度估计的精度，能够满足田间玉米植株图像的深度估计要求。

Abstract:

Objective: To develop an image depth estimation method for corn field scenes that addresses the limited accuracy of depth estimation models caused by the lack of an effective photometric loss measure, and to provide technical support for vision system design, navigation and obstacle avoidance in intelligent field agricultural machinery.

Method: A binocular camera was used as the visual sensor, and an unsupervised scene depth estimation model based on hybrid group dilated convolution was proposed. A hybrid group dilated convolution structure and a corresponding self-attention mechanism were designed, and an inverted residual module and the depth estimation backbone network were built from them. Illumination-insensitive image gradient and Gabor texture features were introduced into the measure of apparent difference between views to construct the model optimization objective. Model training and testing experiments were carried out, taking depth estimation of corn plant images in the field as the example.
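The core idea of applying different dilation factors to different channel groups can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `dilated_conv2d` and `hybrid_group_dilated_conv`, the depthwise 3x3 kernel, the zero padding and the equal-sized groups are all assumptions made for illustration, and the self-attention mechanism is omitted.

```python
def dilated_conv2d(channel, kernel, dilation):
    """Same-size 2D correlation of one channel with a square kernel.

    Kernel taps are spaced `dilation` pixels apart; samples that fall
    outside the image are treated as zero (zero padding).
    """
    h, w = len(channel), len(channel[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    yy, xx = y + i * dilation, x + j * dilation
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i + k][j + k] * channel[yy][xx]
            out[y][x] = acc
    return out


def hybrid_group_dilated_conv(channels, kernel, dilations):
    """Split channels into len(dilations) equal groups (an assumption);
    every channel in group g is filtered with dilation factor dilations[g]."""
    if len(channels) % len(dilations) != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    group_size = len(channels) // len(dilations)
    return [
        dilated_conv2d(ch, kernel, dilations[idx // group_size])
        for idx, ch in enumerate(channels)
    ]


# Two identical impulse channels, one per group, filtered with dilation 1 and 2.
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
box = [[1.0] * 3 for _ in range(3)]
out = hybrid_group_dilated_conv([impulse, impulse], box, dilations=[1, 2])
```

In this toy example the first group responds over a 3x3 neighbourhood and the second over a 5x5 neighbourhood sampled every other pixel, so a single layer mixes several receptive field sizes. A gating or attention step over the groups could then weight them, which is the role the abstract attributes to the self-attention mechanism.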

Result: Compared with a fixed dilation factor, hybrid group dilated convolution reduced the mean relative error of field corn plant depth estimation by 63.9%, and the mean absolute error and root mean square error by 32.3% and 10.2%, respectively, a significant improvement in model accuracy. Introducing the image gradient, Gabor texture features and the self-attention mechanism further reduced the mean absolute error and root mean square error of field corn plant depth estimation by 3.2% and 4.6%, respectively. Increasing the network width and depth of the shallow encoder significantly improved depth estimation accuracy, but the same treatment had no obvious effect on the deep encoder. The self-attention mechanism designed in this study was selective among convolution groups with different dilation factors in the shallow inverted residual modules of the encoder, indicating that the mechanism can autonomously adjust the receptive field. Compared with Monodepth2, the proposed model reduced the mean relative error of field corn plant depth estimation by 48.2% and the mean absolute error by 17.1%. Within a sampling range of 20 m, the mean absolute error of the estimated depth was less than 16 cm, and the computation speed was 14.3 frames per second.
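For readers unfamiliar with the three error measures reported above, the following sketch shows how they are commonly computed from paired estimated and reference depths. The abstract does not give the exact formulas used in the study, so the standard definitions are assumed here, and the function name `depth_errors` is hypothetical.

```python
import math


def depth_errors(pred, truth):
    """Return (mean relative error, mean absolute error, RMSE).

    `pred` and `truth` are equal-length sequences of depths in metres;
    reference depths are assumed to be strictly positive.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    n = len(pred)
    abs_err = [abs(p - t) for p, t in zip(pred, truth)]
    mean_rel = sum(e / t for e, t in zip(abs_err, truth)) / n
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    return mean_rel, mae, rmse


# Two sample points: estimates of 2.0 m and 4.0 m against references of 1.0 m and 5.0 m.
mean_rel, mae, rmse = depth_errors([2.0, 4.0], [1.0, 5.0])
```

Because the relative error divides by the reference depth, it weights errors on nearby plants more heavily than the absolute measures do, which is one reason the three measures can change by different percentages.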

Conclusion: The image depth estimation model based on hybrid group dilated convolution outperforms existing methods, effectively improves depth estimation accuracy, and meets the requirements for depth estimation of field corn plant images.

Depth estimation for corn plant images based on hybrid group dilated convolution