Agricultural named entity recognition technology based on chain-of-thought distillation and counterfactual reasoning

      Abstract:
      Objective To address hallucination, contextually inconsistent logic, and the inability to run on low-resource devices when large language models (LLMs) are applied to named entity recognition (NER) in the agricultural domain.
      Method DeepSeek-671B was used as the teacher model, and its domain knowledge was transferred to student models with far fewer parameters. The student models were the low-parameter versions (1.5 billion, 7 billion, and 14 billion parameters) of DeepSeek, Qwen, and Llama, which underwent distillation and counterfactual reasoning training. Model performance was validated experimentally on CropDiseaseNer, a specialized agricultural disease dataset.
      Result A comparison of the distilled student models showed that DeepSeek-14B achieved an entity recognition F1 score of 89.60% with only 2.08% of the teacher model's parameters. It substantially outperformed both the general-purpose large model GPT-mini-14B (F1 score: 57.64%) and GLiNER, a domain-adapted model built on a general LLM (F1 score: 82.96%). Further analysis showed that the DeepSeek student models, which share the teacher's architecture, significantly outperformed student models with different architectures on long-tail categories such as disease entities and pathogen genus names, an advantage attributed to better parameter alignment.
      Conclusion This study validates the effectiveness of knowledge distillation for NER in the agricultural domain and offers a new solution for entity recognition in resource-constrained scenarios.
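To make the reported metrics concrete, the following is a minimal sketch of how an entity-level F1 score and the student-to-teacher parameter ratio could be computed. It is not the authors' evaluation code: the abstract does not say how F1 was calculated, so micro-averaged exact-match scoring is assumed here, and the `(start, end, label)` span format, the label names, and the helper names `entity_f1` and `param_ratio` are all hypothetical.

```python
from typing import List, Set, Tuple

# Assumed entity format: (start_token, end_token, label).
Entity = Tuple[int, int, str]


def entity_f1(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> float:
    """Micro-averaged F1 where a prediction counts only if span and label both match."""
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def param_ratio(student_params: float, teacher_params: float) -> float:
    """Fraction of the teacher's parameter count used by the student."""
    return student_params / teacher_params


# Hypothetical two-sentence example; the labels are illustrative only.
gold = [{(0, 1, "DISEASE"), (4, 5, "PATHOGEN")}, {(2, 3, "CROP")}]
pred = [{(0, 1, "DISEASE"), (4, 5, "CROP")}, {(2, 3, "CROP")}]
score = entity_f1(gold, pred)

# Nominal sizes taken from the model names (14B student, 671B teacher).
ratio = param_ratio(14e9, 671e9)
print(f"F1 = {score:.4f}, parameter ratio = {ratio:.2%}")
```

With nominal sizes the ratio comes out at roughly 2.1%, in line with the 2.08% reported above; the small difference presumably reflects exact parameter counts or rounding.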

       
