
    Pig state audio recognition based on underdetermined blind source separation and deep learning

      Abstract:
      Objective To address the difficulty of separating and recognizing pig audio in a group-rearing environment, we propose a pig state audio recognition method based on underdetermined blind source separation and ECA-EfficientNetV2.
      Method Four types of pig audio signals recorded under a simulated group-rearing environment were used as the observation signals. The signals were first sparsely represented, the mixing matrix was then estimated by hierarchical clustering, and an lp-norm reconstruction algorithm was used to minimize the lp norm and recover the pig audio sources. The reconstructed signals were converted into spectrograms and divided into four classes: eating, roaring, humming and estrus sounds. The spectrograms were classified with the ECA-EfficientNetV2 network to identify the pigs' states.
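
The separation step can be illustrated with a minimal sketch, assuming two microphone mixtures of four sources: the mixtures are sparsely represented by an STFT, the mixing-matrix columns are estimated by hierarchical clustering of normalized time-frequency points, and the sources are recovered by an lp-norm-minimizing subset search. The STFT settings, clustering parameters and function names below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of underdetermined BSS: sparse (STFT) representation,
# hierarchical clustering to estimate the mixing matrix, then lp-norm
# minimization to recover the sources. Names and parameters are assumptions.
import numpy as np
from itertools import combinations
from scipy.signal import stft
from scipy.cluster.hierarchy import linkage, fcluster

def estimate_mixing_matrix(X, n_sources=4, keep_quantile=0.9):
    """Estimate a 2 x n_sources mixing matrix from 2 mixture signals X (2 x T)."""
    _, _, Z1 = stft(X[0], nperseg=1024)
    _, _, Z2 = stft(X[1], nperseg=1024)
    pts = np.real(np.vstack([Z1.ravel(), Z2.ravel()]))         # time-frequency points
    energy = np.abs(pts).sum(axis=0)
    pts = pts[:, energy > np.quantile(energy, keep_quantile)]  # keep high-energy points
    pts /= np.linalg.norm(pts, axis=0, keepdims=True) + 1e-12  # normalize directions
    pts *= np.sign(pts[0] + 1e-12)                             # fold symmetric directions
    labels = fcluster(linkage(pts.T, method="ward"),
                      t=n_sources, criterion="maxclust")
    cols = [pts[:, labels == k].mean(axis=1) for k in range(1, n_sources + 1)]
    A = np.column_stack(cols)
    return A / np.linalg.norm(A, axis=0, keepdims=True)

def lp_reconstruct(X, A, p=0.5):
    """Recover sources column by column: for each observation, try every pair of
    mixing-matrix columns and keep the exact solution with the smallest lp norm."""
    m, n = A.shape
    S = np.zeros((n, X.shape[1]))
    for t in range(X.shape[1]):
        best_idx, best_s, best_cost = None, None, np.inf
        for idx in combinations(range(n), m):                  # candidate active sources
            s, *_ = np.linalg.lstsq(A[:, idx], X[:, t], rcond=None)
            cost = np.sum(np.abs(s) ** p)                      # lp "norm", 0 < p < 1
            if cost < best_cost:
                best_idx, best_s, best_cost = idx, s, cost
        S[list(best_idx), t] = best_s
    return S
```
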
      Result The normalized mean square error of the mixing matrix estimation was as low as 3.266×10⁻⁴, and the signal-to-noise ratios of the separated and reconstructed audio ranged from 3.254 to 4.267 dB. The spectrograms were classified by ECA-EfficientNetV2 with an accuracy of 98.35%, an improvement of 2.88 and 1.81 percentage points over the classical convolutional neural networks ResNet50 and VGG16, respectively. Compared with the original EfficientNetV2, the accuracy decreased by 0.52 percentage points, but the number of model parameters was reduced by 33.56%, the floating-point operations (FLOPs) were reduced by 1.86 G, and the inference time was reduced by 9.40 ms.
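
The two figures of merit quoted above can be computed as in the hedged sketch below; the exact normalization and column-matching conventions used in the paper are not reproduced here, so these definitions are assumptions.

```python
import numpy as np

def nmse(A_true, A_est):
    """Normalized mean square error between the true and the estimated mixing
    matrices (assumes columns are already matched in order, sign and scale)."""
    return np.sum((A_true - A_est) ** 2) / np.sum(A_true ** 2)

def snr_db(s_true, s_rec):
    """Signal-to-noise ratio (dB) of one reconstructed source against its reference."""
    noise = s_true - s_rec
    return 10.0 * np.log10(np.sum(s_true ** 2) / np.sum(noise ** 2))
```
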
      Conclusion The method based on blind source separation and an improved EfficientNetV2 separates and recognizes the audio signals of group-raised pigs in a lightweight and efficient way.
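
For concreteness, the sketch below shows an Efficient Channel Attention (ECA) block of the kind that ECA-EfficientNetV2 substitutes for the squeeze-and-excitation attention in EfficientNetV2's MBConv blocks. The adaptive kernel-size rule follows the ECA-Net paper (gamma = 2, b = 1); how the block is wired into EfficientNetV2 here is an assumption rather than the authors' code.

```python
# Minimal PyTorch sketch of an ECA channel-attention block; integration
# details into EfficientNetV2 are assumptions.
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive 1D kernel size: odd value near log2(C)/gamma + b/gamma.
        k = int(abs(math.log2(channels) / gamma + b / gamma))
        k = k if k % 2 else k + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                      # x: (N, C, H, W) feature map
        y = self.pool(x)                       # (N, C, 1, 1) channel descriptor
        y = y.squeeze(-1).transpose(1, 2)      # (N, 1, C) for the 1D conv
        y = self.conv(y)                       # local cross-channel interaction
        y = self.sigmoid(y).transpose(1, 2).unsqueeze(-1)  # (N, C, 1, 1) weights
        return x * y                           # re-weight channels, no FC layers

# Example: apply ECA to a 24-channel feature map from a batch of spectrograms.
feat = torch.randn(8, 24, 56, 56)
print(ECA(24)(feat).shape)                     # torch.Size([8, 24, 56, 56])
```
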

       
