
融合多头注意力的轻量级作物病虫害识别

赵法川, 徐晓辉, 宋涛, 郝淼淼, 汪曙, 朱伟龙

引用本文: 赵法川, 徐晓辉, 宋涛, 等. 融合多头注意力的轻量级作物病虫害识别[J]. 华南农业大学学报, 2023, 44(6): 986-994. DOI: 10.7671/j.issn.1001-411X.202208051
Citation: ZHAO Fachuan, XU Xiaohui, SONG Tao, et al. A lightweight crop pest identification method based on multi-head attention[J]. Journal of South China Agricultural University, 2023, 44(6): 986-994. DOI: 10.7671/j.issn.1001-411X.202208051

基金项目:河北省重点研发计划(20327201D)
Fund project: Key Research and Development Program of Hebei Province (20327201D)

作者简介:赵法川,硕士研究生,主要从事农业物联网研究,E-mail: 202031903037@stu.hebut.edu.cn
Biography: ZHAO Fachuan, master degree candidate, mainly engaged in agricultural Internet of Things research, E-mail: 202031903037@stu.hebut.edu.cn

通讯作者:徐晓辉,研究员,主要从事传感器及智能系统研究,E-mail: xxh@hebut.edu.cn
Corresponding author: XU Xiaohui, researcher, mainly engaged in sensor and intelligent system research, E-mail: xxh@hebut.edu.cn

  • 中图分类号: S435 (CLC number: S435)

A lightweight crop pest identification method based on multi-head attention

  • 摘要:
    目的 

    解决当前病虫害识别方法参数多、计算量大、难以在边缘嵌入式设备部署的问题,实现农作物病虫害精准识别,提高农作物产量和品质。

    方法 

    提出一种融合多头注意力的轻量级卷积网络(Multi-head attention to convolutional neural network,M2CNet)。M2CNet采用层级金字塔结构,首先,结合深度可分离残差和循环全连接残差构建局部捕获块,用来捕捉短距离信息;其次,结合全局子采样注意力和轻量级前馈网络构建轻量级全局捕获块,用来捕捉长距离信息。提出M2CNet-S/B/L 3个变体以满足不同的边缘部署需求。

    结果 

    M2CNet-S/B/L参数量分别为1.8M、3.5M和5.8M,计算量(Floating point operations,FLOPs)分别为0.23G、0.39G和0.60G。M2CNet-S/B/L对PlantVillage病害数据集取得了大于99.7%的Top5准确率和大于95.9%的Top1准确率,对IP102虫害数据集取得了大于88.4%的Top5准确率和大于67.0%的Top1准确率,且比同级别的模型表现优异。

    结论 

    该方法能够对作物病虫害进行有效识别,且可为边缘侧工程部署提供有益参考。

    Abstract:
    Objective 

    To solve the problems that current crop pest and disease identification methods have too many parameters and too much computation to be deployed on edge embedded devices, so as to achieve accurate identification of crop pests and diseases and improve crop yield and quality.

    Method 

    A lightweight convolutional neural network fusing multi-head attention, called M2CNet (multi-head attention to convolutional neural network), was proposed. M2CNet adopted a hierarchical pyramid structure. Firstly, a local capture block was constructed by combining a depthwise separable residual and a cyclic fully-connected residual to capture short-range information. Secondly, a lightweight global capture block was constructed by combining global subsampling attention and a lightweight feedforward network to capture long-range information. Three variants, namely M2CNet-S, M2CNet-B and M2CNet-L, were proposed to meet different edge deployment requirements.
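    To make the hierarchy concrete, the following is a minimal PyTorch sketch of the four-stage pyramid described above. It is an illustrative reconstruction, not the authors' released code: the default widths and depths follow the M2CNet-B column of Table 2, and `block_fn` is only a placeholder for the LCB/LGCB pair detailed in Figures 2 and 4.

```python
# Illustrative sketch of the M2CNet pyramid (an assumption, not the paper's code).
# Defaults follow the M2CNet-B column of Table 2; block_fn stands in for the
# local capture block + lightweight global capture block pair of each stage.
import torch
import torch.nn as nn

class Stage(nn.Module):
    def __init__(self, in_ch, out_ch, depth, patch, block_fn):
        super().__init__()
        # "Conv. downsampling": a patch x patch convolution with stride = patch
        self.down = nn.Conv2d(in_ch, out_ch, patch, stride=patch)
        self.blocks = nn.Sequential(*[block_fn(out_ch) for _ in range(depth)])

    def forward(self, x):
        return self.blocks(self.down(x))

class M2CNet(nn.Module):
    def __init__(self, channels=(48, 96, 192, 384), depths=(1, 1, 4, 2),
                 num_classes=100, block_fn=lambda c: nn.Identity()):
        super().__init__()
        stages, in_ch = [], 3
        for i, (c, d) in enumerate(zip(channels, depths)):
            # stage 1 uses a 4x4 stride-4 stem, later stages 2x2 stride-2
            stages.append(Stage(in_ch, c, d, patch=4 if i == 0 else 2,
                                block_fn=block_fn))
            in_ch = c
        self.stages = nn.Sequential(*stages)
        self.head = nn.Linear(channels[-1], num_classes)

    def forward(self, x):                      # x: (B, 3, 224, 224)
        x = self.stages(x)                     # (B, C4, 7, 7) after 4 stages
        return self.head(x.mean(dim=(2, 3)))   # global average pool + FC
```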

    Result 

    M2CNet-S/B/L had 1.8M, 3.5M and 5.8M parameters, and 0.23G, 0.39G and 0.60G floating point operations, respectively. M2CNet-S/B/L achieved Top5 accuracy greater than 99.7% and Top1 accuracy greater than 95.9% on the PlantVillage disease dataset, and Top5 accuracy greater than 88.4% and Top1 accuracy greater than 67.0% on the IP102 pest dataset, outperforming comparable models of the same scale.

    Conclusion 

    This method can effectively identify crop diseases and pests, and provides a useful reference for edge-side engineering deployment.

  • 图  1   M2CNet网络总体组成

    LCB:局部捕获块;LGCB:轻量级全局捕获块;H、W分别代表输入图片的高度和宽度;Ci:用于阶段i的通道数;Li:阶段i的局部捕获块和轻量级全局捕获块的数量

    Figure  1.   Overall structure of the M2CNet network

    LCB: Local capture block; LGCB: Lightweight global capture block; H and W represent the height and width of the input image, respectively; Ci: Number of channels used in stage i; Li: Number of local capture blocks and lightweight global capture blocks in stage i

    图  2   局部捕获块结构图

    a:深度可分离卷积;b:多层循环全连接

    Figure  2.   Structure of the local capture block

    a: Depthwise separable convolution; b: Multi-layer cyclic fully-connected module
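    A hedged sketch of how such a local capture block might look in PyTorch. The depthwise separable residual is the standard 3×3 depthwise + 1×1 pointwise pair with a skip connection; the multi-layer cyclic fully-connected residual is approximated here with the paired 3×1/1×3 projections suggested by the kernel shapes in Table 2 (in the spirit of CycleMLP), so the exact layer ordering and normalization placement are assumptions.

```python
import torch.nn as nn

class LocalCaptureBlock(nn.Module):
    """Hypothetical LCB: depthwise separable residual (a) followed by a
    cyclic fully-connected residual (b), approximated with 3x1 / 1x3
    depthwise projections as suggested by Table 2."""
    def __init__(self, dim):
        super().__init__()
        self.dwsep = nn.Sequential(                         # (a) short-range mixing
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),  # 3x3 depthwise
            nn.BatchNorm2d(dim),
            nn.GELU(),
            nn.Conv2d(dim, dim, 1),                         # 1x1 pointwise
        )
        self.cyclefc = nn.Sequential(                       # (b) cyclic FC, approximated
            nn.Conv2d(dim, dim, (3, 1), padding=(1, 0), groups=dim),
            nn.Conv2d(dim, dim, (1, 3), padding=(0, 1), groups=dim),
            nn.BatchNorm2d(dim),
            nn.GELU(),
            nn.Conv2d(dim, dim, 1),
        )

    def forward(self, x):
        x = x + self.dwsep(x)     # residual 1: depthwise separable convolution
        x = x + self.cyclefc(x)   # residual 2: multi-layer cyclic fully-connected
        return x
```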

    图  3   标准多头注意力(a)与全局子采样注意力(b)的对比

    Q、K、V分别表示查询、键和值;H、W分别表示输入图片的高度和宽度;s表示子窗口大小;C表示通道数

    Figure  3.   Comparison of standard multi-head attention (a) and global subsampling attention (b)

    Q, K and V represent query, key and value, respectively; H and W represent the height and width of the input image, respectively; s represents the sub-window size; C represents the number of channels
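    The figure's key idea, reduced to code: queries keep the full H×W resolution, while keys and values are computed from an s×s reduction of the feature map, shrinking attention cost from O((HW)²) to O(HW·HW/s²). A minimal sketch, assuming the subsampling is realized as an s×s strided convolution (one common construction; the paper's exact choice may differ):

```python
import torch
import torch.nn as nn

class GlobalSubsampledAttention(nn.Module):
    """Sketch of global subsampling attention: full-resolution queries attend
    to keys/values pooled over s x s sub-windows."""
    def __init__(self, dim, num_heads, s):
        super().__init__()
        # s x s strided conv reduces the K/V token grid from HxW to (H/s)x(W/s)
        self.sub = nn.Conv2d(dim, dim, s, stride=s) if s > 1 else nn.Identity()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, H, W):                        # x: (B, H*W, C) tokens
        B, N, C = x.shape
        kv = x.transpose(1, 2).reshape(B, C, H, W)     # back to a feature map
        kv = self.sub(kv).flatten(2).transpose(1, 2)   # (B, H*W/s^2, C)
        kv = self.norm(kv)
        out, _ = self.attn(x, kv, kv, need_weights=False)
        return out                                     # (B, H*W, C)
```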

    图  4   轻量级全局捕获块

    a:条件位置编码;b:全局子采样注意力;c:轻量级前馈网络;di:通道维数;s:子窗口大小;h:注意力头的数量;H、W分别代表输入特征的高度和宽度

    Figure  4.   Lightweight global capture block

    a: Conditional position encoding; b: Global subsampling attention; c: Lightweight feedforward network; di: Channel dimension; s: Size of the sub-window; h: Number of attention heads; H and W represent the height and width of the input features, respectively
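    Putting the three sub-modules of Figure 4 together, again as a hedged sketch: the conditional position encoding is realized as a 3×3 depthwise convolution with a skip (the usual CPE construction), followed by the GlobalSubsampledAttention sketched after Figure 3 and a two-layer feedforward network. Treating R as the usual FFN expansion ratio is our assumption.

```python
import torch.nn as nn

class LightweightGlobalCaptureBlock(nn.Module):
    """Hypothetical LGCB: (a) conditional position encoding, (b) global
    subsampling attention, (c) lightweight feedforward network."""
    def __init__(self, dim, num_heads, s, R=4):
        super().__init__()
        self.cpe = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # (a)
        self.norm1 = nn.LayerNorm(dim)
        self.gsa = GlobalSubsampledAttention(dim, num_heads, s)   # (b), Fig. 3 sketch
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(                                 # (c), ratio R assumed
            nn.Linear(dim, R * dim), nn.GELU(), nn.Linear(R * dim, dim))

    def forward(self, x):                      # x: (B, C, H, W)
        B, C, H, W = x.shape
        x = x + self.cpe(x)                    # position encoding as a residual
        t = x.flatten(2).transpose(1, 2)       # to a token sequence (B, H*W, C)
        t = t + self.gsa(self.norm1(t), H, W)  # long-range information
        t = t + self.ffn(self.norm2(t))        # channel mixing
        return t.transpose(1, 2).reshape(B, C, H, W)
```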

    图  5   M2CNet-S/B/L在CIFAR100数据集的训练过程

    Figure  5.   Training process of M2CNet-S/B/L on the CIFAR100 dataset
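    This page does not spell out the training configuration, so the recipe below is purely an illustrative assumption assembled from standard practice for models of this kind (AdamW, cosine annealing, label smoothing). Every hyperparameter is a placeholder, not the paper's reported setting; `M2CNet` is the skeleton sketched earlier.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# All hyperparameters here are illustrative assumptions, not the paper's settings.
train_loader = DataLoader(
    datasets.CIFAR100("data", train=True, download=True,
                      transform=transforms.Compose([
                          transforms.Resize(224),  # match the 224x224 input of Table 2
                          transforms.ToTensor()])),
    batch_size=128, shuffle=True)

model = M2CNet(num_classes=100)
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)
scheduler = CosineAnnealingLR(optimizer, T_max=300)  # 300 epochs, assumed

for epoch in range(300):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```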

    图  6   病虫害数据集识别结果

    柱状图的宽度与模型参数量呈线性关系,参数量越大柱状图越宽;同一色系代表同一组对照,同一色系中颜色最深的柱子对应M2CNet变体

    Figure  6.   Identification results on the pest and disease datasets

    The width of each bar is linearly related to the number of model parameters: the more parameters, the wider the bar. Bars in the same color family belong to the same comparison group, and the darkest bar in each color family corresponds to the M2CNet variant

    图  7   网络关注区域热力图

    红色高亮部分代表网络关注度高的区域,冷色发暗部分代表网络关注度低的区域

    Figure  7.   Heat maps of network attention regions

    The red highlighted areas represent regions with high network attention, while the dark cool-colored areas represent regions with low network attention
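    Heat maps of this kind are typically produced with Grad-CAM: the activations of a late convolutional layer are weighted by the spatial mean of their gradients with respect to the predicted class, rectified, and upsampled to the input size. A minimal sketch (the choice of `target_layer` and the hook details are our assumptions, not the paper's exact procedure):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    """Minimal Grad-CAM sketch: returns a (1, 1, H, W) map in [0, 1] that is
    high where the network attends most for its predicted class."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    try:
        logits = model(image)                        # image: (1, 3, H, W)
        logits[0, int(logits.argmax())].backward()   # gradient of the top class
    finally:
        h1.remove()
        h2.remove()
    a, g = acts[0], grads[0]                         # both (1, C, h, w)
    weights = g.mean(dim=(2, 3), keepdim=True)       # per-channel importance
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:],
                        mode="bilinear", align_corners=False)
    return cam / (cam.max() + 1e-8)                  # normalize to [0, 1]
```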

    表  1   IP102 数据集害虫分级分类体系

    Table  1   Taxonomy of the IP102 dataset on different class levels

    作物 Crop | 害虫类别 Pest class | 训练集 Training set | 测试集 Test set
    水稻 Rice | 14 | 6734 | 1683
    玉米 Corn | 13 | 11212 | 2803
    小麦 Wheat | 9 | 2734 | 684
    甜菜 Sugarbeet | 8 | 3536 | 884
    苜蓿 Alfalfa | 13 | 8312 | 2078
    葡萄 Grape | 16 | 14041 | 3510
    柑橘 Orange | 19 | 5818 | 1455
    芒果 Mango | 10 | 7790 | 1948
    总计 Total | 102 | 60177 | 15045

    表  2   M2CNet-S/B/L的网络架构 1)

    Table  2   M2CNet-S/B/L network architecture

    阶段 Stage | 输出尺寸 Output size | 层名称 Name of layer | M2CNet-S | M2CNet-B | M2CNet-L
    1 | 56×56 | Conv.下采样 | 4×4, 36, stride 4 | 4×4, 48, stride 4 | 4×4, 48, stride 4
    1 | 56×56 | 深度可分离卷积、多层循环全连接、全局子采样注意力、轻量级前馈网络 | [3×3, 1×1, 36; 3×1, 1×3, 36; H1=1, s1=4; R1=4]×1 | [3×3, 1×1, 48; 3×1, 1×3, 48; H1=1, s1=4; R1=4]×1 | [3×3, 1×1, 48; 3×1, 1×3, 48; H1=1, s1=4; R1=4]×1
    2 | 28×28 | Conv.下采样 | 2×2, 72, stride 2 | 2×2, 96, stride 2 | 2×2, 96, stride 2
    2 | 28×28 | 深度可分离卷积、多层循环全连接、全局子采样注意力、轻量级前馈网络 | [3×3, 1×1, 72; 3×1, 1×3, 72; H2=2, s2=2; R2=4]×2 | [3×3, 1×1, 96; 3×1, 1×3, 96; H2=2, s2=2; R2=4]×1 | [3×3, 1×1, 96; 3×1, 1×3, 96; H2=2, s2=2; R2=4]×2
    3 | 14×14 | Conv.下采样 | 2×2, 144, stride 2 | 2×2, 192, stride 2 | 2×2, 192, stride 2
    3 | 14×14 | 深度可分离卷积、多层循环全连接、全局子采样注意力、轻量级前馈网络 | [3×3, 1×1, 144; 3×1, 1×3, 144; H3=4, s3=2; R3=4]×3 | [3×3, 1×1, 192; 3×1, 1×3, 192; H3=4, s3=2; R3=4]×4 | [3×3, 1×1, 192; 3×1, 1×3, 192; H3=4, s3=2; R3=4]×6
    4 | 7×7 | Conv.下采样 | 2×2, 288, stride 2 | 2×2, 384, stride 2 | 2×2, 384, stride 2
    4 | 7×7 | 深度可分离卷积、多层循环全连接、全局子采样注意力、轻量级前馈网络 | [3×3, 1×1, 288; 3×1, 1×3, 288; H4=8, s4=1; R4=4]×2 | [3×3, 1×1, 384; 3×1, 1×3, 384; H4=8, s4=1; R4=4]×2 | [3×3, 1×1, 384; 3×1, 1×3, 384; H4=8, s4=1; R4=4]×4
    输出 Output | 1×1 | 全连接 Fully connected | 100 | 100 | 100
    参数量(M) No. of parameters | | | 1.83 | 3.52 | 5.76
    计算量(G) Floating point operations | | | 0.23 | 0.39 | 0.60
     1)输入图像大小默认为224像素×224像素,Conv.代表卷积操作,stride表示卷积的步幅,Hi和si是第i个全局子采样注意力的头数和子采样大小,Ri是第i个轻量级前馈网络的特征尺寸缩放比
     1) The input image size is 224 × 224 pixels by default. Conv. stands for convolution operation, stride is the convolution stride, Hi and si are the number of heads and the subsampling size of the ith global subsampling attention, and Ri is the feature-size scaling ratio of the ith lightweight feedforward network
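    For convenience, the stage settings of Table 2 can be transcribed into a small configuration mapping and fed to the skeleton sketched after the Method section (the mapping itself is ours, not from the paper):

```python
# Channels and LCB+LGCB pair counts per stage, transcribed from Table 2;
# heads H_i and subsample sizes s_i are shared by all three variants.
M2CNET_CONFIGS = {
    "S": dict(channels=(36, 72, 144, 288), depths=(1, 2, 3, 2)),
    "B": dict(channels=(48, 96, 192, 384), depths=(1, 1, 4, 2)),
    "L": dict(channels=(48, 96, 192, 384), depths=(1, 2, 6, 4)),
}
HEADS = (1, 2, 4, 8)        # H_1 .. H_4
SUBSAMPLE = (4, 2, 2, 1)    # s_1 .. s_4

model = M2CNet(**M2CNET_CONFIGS["L"], num_classes=100)
```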

    表  3   CIFAR100数据集模型对比结果

    Table  3   Comparison results of models on the CIFAR100 dataset

    模型 Model | 参数量(M) No. of parameters | 计算量(G) Floating point operations | Top5准确率/% Top5 accuracy | Top1准确率/% Top1 accuracy
    ShuffleNet-V2 0.5 | 0.4 | 0.04 | 72.74 | 41.83
    ShuffleNet-V2 1.0 | 1.4 | 0.15 | 86.21 | 59.65
    ShuffleNet-V2 1.5 | 2.6 | 0.30 | 90.08 | 66.56
    ShuffleNet-V2 2.0 | 5.6 | 0.56 | 93.06 | 72.79
    SqueezeNet 1.0 | 0.8 | 0.75 | 78.48 | 49.68
    SqueezeNet 1.1 | 0.8 | 0.30 | 78.12 | 50.14
    MobileNet-V3-Small | 1.6 | 0.06 | 87.90 | 61.74
    MobileNet-V2 | 2.4 | 0.31 | 91.69 | 69.16
    MobileNet-V3-Large | 4.3 | 0.23 | 93.57 | 73.27
    MnasNet 0.5 | 1.1 | 0.11 | 88.13 | 62.60
    MnasNet 0.75 | 2.0 | 0.22 | 91.44 | 69.20
    MnasNet 1.0 | 3.2 | 0.32 | 92.81 | 72.70
    MnasNet 1.3 | 5.1 | 0.54 | 94.41 | 76.64
    EfficientNet B0 | 4.1 | 0.40 | 94.63 | 76.00
    EfficientNet B1 | 6.6 | 0.60 | 94.95 | 77.96
    ResNet 18 | 11.2 | 1.80 | 94.66 | 76.85
    VGG 11 | 129.2 | 7.60 | 94.25 | 75.82
    VGG 13 | 129.4 | 11.30 | 94.38 | 76.46
    VGG 16 | 134.7 | 15.50 | 94.63 | 78.19
    VGG 19 | 140.0 | 19.60 | 95.25 | 78.19
    MobileViT-XXS | 1.0 | 0.33 | 84.98 | 55.96
    MobileViT-XS | 2.0 | 0.90 | 89.55 | 64.34
    MobileViT-S | 5.1 | 1.75 | 93.64 | 72.93
    M2CNet-S | 1.8 | 0.23 | 92.46 | 71.09
    M2CNet-B | 3.5 | 0.39 | 94.16 | 75.32
    M2CNet-L | 5.8 | 0.60 | 95.31 | 78.39
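    The parameter column of Table 3 can be reproduced for any of these models with a one-line count; FLOPs need a profiler, and fvcore is one assumed choice (the paper does not say which tool was used). `M2CNet` and `M2CNET_CONFIGS` refer to the sketches shown earlier.

```python
import torch

def params_m(model):
    # trainable parameters in millions, as in the "No. of parameters" column
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# FLOPs via fvcore (an assumed dependency; pip install fvcore)
from fvcore.nn import FlopCountAnalysis
model = M2CNet(**M2CNET_CONFIGS["S"])
flops_g = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224)).total() / 1e9
print(f"params: {params_m(model):.2f} M, FLOPs: {flops_g:.2f} G")
```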

出版历程 Publication history
  • 收稿日期 Received: 2022-08-30
  • 网络出版日期 Available online: 2023-11-12
  • 发布日期 Published: 2023-05-29
  • 刊出日期 Issue date: 2023-11-09
