Evaluation of the accuracy of machine learning models in predicting gene expression in pigs

    • Abstract:
      Objective By comparing how well different machine learning models predict gene expression in pigs from cis single-nucleotide polymorphisms (cis-SNPs), we investigated how prediction accuracy relates to gene cis-heritability (cis-h²) and to the number of cis-SNPs.
      Method Using protein-coding genes from pig muscle tissue samples of the PigGTEx project, we trained 18 machine learning models on the cis-SNPs located within ±1 Mb of each gene's transcription start site and evaluated the prediction accuracy of each model.
      Result Prediction accuracy was positively correlated with gene cis-h². The elastic net regression model and the Lasso regression model had the highest overall prediction accuracy, with mean R² values of 0.0362 and 0.0358, respectively. Within a certain range, prediction accuracy was also positively correlated with the number of cis-SNPs of a gene.
      Conclusion The accuracy of machine learning models in predicting pig gene expression depends strongly on gene cis-h² and on the number of cis-SNPs. Selecting a model suited to each gene's cis-h² and cis-SNP count can therefore improve prediction accuracy.
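The two mechanical steps named in the Method, restricting each gene to SNPs within ±1 Mb of its transcription start site and scoring predictions with R², can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Snp` record, the function names, and the choice of squared Pearson correlation as the R² definition are all assumptions, since the abstract does not specify them. The model-fitting step itself (for example an elastic net or Lasso regression) is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence

CIS_WINDOW_BP = 1_000_000  # ±1 Mb around the TSS, as stated in the abstract


@dataclass(frozen=True)
class Snp:
    """Hypothetical SNP record; field names are illustrative only."""
    snp_id: str
    chrom: str
    pos: int  # base-pair position on the chromosome


def select_cis_snps(snps: Sequence[Snp], gene_chrom: str, tss: int,
                    window: int = CIS_WINDOW_BP) -> List[Snp]:
    """Keep SNPs on the gene's chromosome within ±window bp of its TSS."""
    return [s for s in snps
            if s.chrom == gene_chrom and abs(s.pos - tss) <= window]


def squared_pearson_r(observed: Sequence[float],
                      predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted expression.

    Assumed here as the R2 measure; the abstract does not define its R2.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov * cov / (var_o * var_p)
```

In a per-gene workflow, the output of `select_cis_snps` would define the genotype columns passed to the model, and `squared_pearson_r` would be applied to held-out predictions so that the score reflects prediction rather than fit.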

       
