Swin-Unet-based method for monitoring the feed consumption status of dairy cows

    • Abstract:
      Objective Feed regions in monitoring images are elongated, have blurred boundaries, and vary widely in shape and size. This study aimed to segment the residual feed region and the consumed region more accurately, so that feed consumption status can be monitored accurately.
      Method A semantic segmentation model based on Swin-Unet was proposed. A ConvNeXt block was applied at the beginning of the Swin Transformer block to strengthen the encoding of feature information and provide a better feature representation, and depth-wise convolution replaced the linear attention projection to supply local spatial context information. A novel wide receptive field module was proposed to replace the multi-layer perceptron and enrich multi-scale spatial context information. In addition, at the beginning of the encoder, the linear embedding layer was replaced with a convolutional embedding layer that compresses features in stages, introducing more spatial context information between and within patches. Finally, a multi-scale input strategy and a deep supervision strategy were introduced, and a feature fusion module was proposed, to strengthen feature fusion.
      Result The proposed method achieved a mean intersection over union of 86.46%, an accuracy of 98.60%, an F1-score of 92.29% and a running speed of 23 frames/s. Compared with Swin-Unet, the first three metrics were higher by 4.36, 2.90 and 0.65 percentage points respectively, and the running speed was 15% higher.
      Conclusion Applying image semantic segmentation to the automatic monitoring of feed consumption status is feasible. By introducing convolution into Swin-Unet, the method effectively improves both segmentation accuracy and computational efficiency, which is of great significance for improving production management efficiency.
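The three accuracy metrics reported in the Result paragraph can all be derived from a per-pixel confusion matrix. The sketch below is a minimal pure-Python illustration, not the authors' evaluation code. It assumes macro-averaging over classes for the mean intersection over union and the F1-score, and the class labels used in the usage note (0 = background, 1 = residual feed, 2 = consumed region) are hypothetical; the abstract does not state either detail.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: cm[t][p] = number of pixels with true class t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return macro-averaged mIoU and F1 plus overall pixel accuracy.

    Macro-averaging is an assumption of this sketch.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    ious, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp     # predicted c, true class differs
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)     # IoU = TP / (TP + FP + FN)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    correct = sum(cm[c][c] for c in range(n))
    return {
        "miou": sum(ious) / n,
        "accuracy": correct / total if total else 0.0,
        "f1": sum(f1s) / n,
    }
```

In use, the flattened ground-truth mask and predicted mask would be passed to `confusion_matrix` with `num_classes=3` under the hypothetical labelling above, and the result handed to `segmentation_metrics`.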
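The Method paragraph replaces the linear attention projection with depth-wise convolution to bring in local spatial context. A linear projection transforms each token from its own features only, whereas a depth-wise convolution filters every channel with its own small kernel, so each output position also depends on its spatial neighbours. The sketch below shows that operation in plain Python under stated assumptions: stride 1, zero padding, odd kernel size, and the cross-correlation convention used by deep learning frameworks. The function name and data layout are illustrative and not taken from the paper.

```python
def depthwise_conv2d(x, kernels):
    """Depth-wise 2D convolution (cross-correlation), stride 1, zero padding.

    x:       list of C channels, each an H x W list of lists
    kernels: list of C kernels, each k x k with odd k (one kernel per channel)
    Returns a list of C output channels, each H x W.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        h, w = len(channel), len(channel[0])
        k = len(kernel)
        pad = k // 2
        result = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        # Positions outside the image contribute zero (zero padding).
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += channel[ii][jj] * kernel[di][dj]
                result[i][j] = acc
        out.append(result)  # channels are never mixed with each other
    return out
```

Because channels are filtered independently, the parameter count is C·k·k rather than the C·C·k·k of a standard convolution. A real model would normally use a framework's grouped convolution instead of explicit loops.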

       
