Objective To study image depth estimation methods for maize field scenes, address the insufficient accuracy of depth estimation models caused by the lack of an effective photometric loss measure, and provide technical support for the vision system design and obstacle-avoidance navigation of intelligent agricultural machinery in the field.
Method Binocular cameras were used as visual sensors, and an unsupervised depth estimation model based on hybrid group dilated convolution was proposed. A hybrid group dilated convolution structure and a corresponding self-attention adjustment mechanism were designed. Inverted residual modules and a deep neural network were constructed to serve as the model backbone. Illumination-insensitive image gradient and Gabor texture features were introduced into the measure of appearance difference between views, and the model optimization objective was built on this measure. Model training and validation tests were carried out using images of maize plants as an example.
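The role of the image gradient term in the appearance difference measure can be illustrated with a minimal sketch. It is not the model's actual loss: the function names, the forward-difference gradient, and the tiny 3x3 example are assumptions chosen for illustration, and the Gabor texture term is omitted. The sketch shows only that a uniform brightness change between two views alters the intensity difference but leaves the gradient difference unchanged.

```python
def gradients(img):
    """Forward-difference gradients of a 2D image given as a list of rows."""
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] for x in range(w - 1)] for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] for x in range(w)] for y in range(h - 1)]
    return gx, gy


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2D lists."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count


def intensity_difference(a, b):
    """Plain photometric difference on raw intensities."""
    return mean_abs_diff(a, b)


def gradient_difference(a, b):
    """Photometric difference on image gradients (hypothetical helper)."""
    gxa, gya = gradients(a)
    gxb, gyb = gradients(b)
    return 0.5 * (mean_abs_diff(gxa, gxb) + mean_abs_diff(gya, gyb))


# Hypothetical 3x3 view and the same view under a uniform brightness offset.
view = [[10.0, 12.0, 15.0], [11.0, 14.0, 18.0], [13.0, 17.0, 22.0]]
brighter = [[v + 20.0 for v in row] for row in view]

print(intensity_difference(view, brighter))  # affected by the offset
print(gradient_difference(view, brighter))   # unaffected by the offset
```

Only additive illumination offsets cancel exactly in this way; multiplicative changes would scale the gradients, so the sketch should be read as motivation rather than as a complete invariance argument.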
Result Compared with a fixed dilation rate, hybrid group dilated convolution reduced the mean relative error of depth estimation for maize plants in the field by 63.9%, and reduced the mean absolute error and root mean square error by 32.3% and 10.2% respectively, so model accuracy improved significantly. Introducing the image gradient, Gabor texture features and the self-attention mechanism further reduced the mean absolute error and root mean square error of field scene depth estimation by 3.2% and 4.6% respectively. Increasing the width and depth of the shallow encoder layers markedly improved depth estimation accuracy, whereas the same change to the deep encoder layers had no obvious effect. The self-attention mechanism designed in this study was selective toward the convolution groups with different dilation rates in the shallow inverted residual modules of the encoder, indicating that the mechanism can adjust the receptive field. Compared with Monodepth2, the proposed model reduced the mean relative error and mean absolute error of the estimated depth of maize plants in the field by 48.2% and 17.1% respectively. Within a sampling range of 20 m, the mean absolute error of the estimated depth was no more than 16 cm, and the computation speed was 14.3 frames per second.
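For reference, the three error measures reported above can be computed as in the sketch below. The abstract does not state the exact formulas, so the commonly used definitions are assumed here; the function names and sample numbers are hypothetical and do not reproduce any result of the study.

```python
import math


def depth_errors(pred, truth):
    """Return (mean relative error, mean absolute error, RMSE).

    Assumes the commonly used definitions; depths share one unit
    (e.g. metres) and every ground-truth depth is positive.
    """
    n = len(truth)
    abs_err = [abs(p - t) for p, t in zip(pred, truth)]
    mean_rel = sum(e / t for e, t in zip(abs_err, truth)) / n
    mean_abs = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    return mean_rel, mean_abs, rmse


def percent_reduction(baseline, improved):
    """Relative reduction of an error value, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical predicted and ground-truth depths in metres.
pred = [2.0, 4.0]
truth = [1.0, 5.0]
mean_rel, mean_abs, rmse = depth_errors(pred, truth)
print(mean_rel, mean_abs, rmse)
print(percent_reduction(0.5, 0.25))
```

A reduction such as the 48.2% reported against Monodepth2 corresponds to `percent_reduction(baseline_error, model_error)` under these assumed definitions.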
Conclusion The image depth estimation model based on hybrid group dilated convolution is superior to existing methods, effectively improves depth estimation accuracy, and can meet the depth estimation requirements for images of maize plants in the field.