Citation: ZHAO Fachuan, XU Xiaohui, SONG Tao, et al. A lightweight crop pest identification method based on multi-head attention[J]. Journal of South China Agricultural University, 2023, 44(6): 986-994. DOI: 10.7671/j.issn.1001-411X.202208051
Current crop pest identification methods suffer from large parameter counts and heavy computation, which make them difficult to deploy on embedded devices at the edge. This study aimed to address these problems so as to identify crop pests and diseases accurately and thereby improve crop yield and quality.
A lightweight convolutional neural network, named multi-head attention to convolutional neural network (M2CNet), was proposed. M2CNet adopted a hierarchical pyramid structure. First, a local capture block was constructed by combining a depthwise separable convolution residual with a cycle fully-connected residual to capture short-range information. Second, a lightweight global capture block was constructed by combining global subsampled attention with a lightweight feed-forward network to capture long-range information. Three variants, M2CNet-S, M2CNet-B and M2CNet-L, were provided to meet different edge deployment requirements.
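The two building blocks can be made concrete with a short sketch. The PyTorch code below is an illustrative approximation, not the authors' implementation: the channel size, expansion ratios and subsampling ratio are assumptions, and the cycle fully-connected residual is simplified to a plain pointwise MLP.

```python
# Minimal sketch of the two M2CNet-style blocks described in the abstract.
# All hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class LocalCaptureBlock(nn.Module):
    """Short-range features: depthwise separable conv residual + MLP residual."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        # Depthwise separable convolution with a residual connection
        self.dw = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim, bias=False),  # depthwise
            nn.BatchNorm2d(dim),
            nn.Conv2d(dim, dim, 1, bias=False),                         # pointwise
            nn.BatchNorm2d(dim),
            nn.GELU(),
        )
        # Stand-in for the cycle fully-connected residual (CycleMLP-style);
        # a plain channel MLP is used here for brevity.
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * expansion, 1),
            nn.GELU(),
            nn.Conv2d(dim * expansion, dim, 1),
        )

    def forward(self, x):  # x: (B, C, H, W)
        x = x + self.dw(x)
        return x + self.mlp(x)


class GlobalCaptureBlock(nn.Module):
    """Long-range features: global subsampled attention + lightweight feed-forward."""

    def __init__(self, dim: int, heads: int = 4, sr_ratio: int = 4, expansion: int = 2):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Keys/values come from a spatially subsampled map, so the attention
        # map shrinks by a factor of sr_ratio**2 relative to full attention.
        self.sr = nn.Conv2d(dim, dim, sr_ratio, stride=sr_ratio)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * expansion),
            nn.GELU(),
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)            # (B, H*W, C) queries at full resolution
        kv = self.sr(x).flatten(2).transpose(1, 2)  # (B, H*W/sr_ratio**2, C)
        out, _ = self.attn(self.norm1(q), self.norm1(kv), self.norm1(kv))
        q = q + out
        q = q + self.ffn(self.norm2(q))
        return q.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    x = torch.randn(1, 64, 28, 28)
    y = GlobalCaptureBlock(64)(LocalCaptureBlock(64)(x))
    print(y.shape)  # torch.Size([1, 64, 28, 28])
```

Subsampling the keys and values is what keeps the attention lightweight: queries stay at full resolution, but the quadratic term in sequence length is divided by the square of the reduction ratio.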
M2CNet-S, M2CNet-B and M2CNet-L had parameter counts of 1.8M, 3.5M and 5.8M, and floating-point operations of 0.23G, 0.39G and 0.60G, respectively. They achieved top-5 accuracy greater than 99.7% and top-1 accuracy greater than 95.9% on the PlantVillage disease dataset, and top-5 accuracy greater than 88.4% and top-1 accuracy greater than 67.0% on the IP102 pest dataset, outperforming comparison models of similar size.
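Parameter and FLOP figures of this kind can be checked in a few lines. The sketch below uses torchvision's stock MobileNetV2 (roughly 3.5M parameters) as a stand-in, since M2CNet's code is not assumed to be available here:

```python
# Reproducing parameter/FLOP figures for any nn.Module.
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2()  # stand-in model, ~3.5M parameters
params = sum(p.numel() for p in model.parameters())
print(f"parameters: {params / 1e6:.1f}M")

# Optional FLOPs estimate with the thop package (pip install thop);
# thop reports multiply-accumulates (MACs), often doubled to quote FLOPs.
# from thop import profile
# macs, _ = profile(model, inputs=(torch.randn(1, 3, 224, 224),))
# print(f"FLOPs ~ {2 * macs / 1e9:.2f}G")
```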
This method achieves effective identification of crop diseases and pests, and provides a valuable reference for edge engineering deployment.