Evaluation of predictive accuracy of gene expression in pigs using machine learning models

周天乐; 滕金言; 徐志婷; 张哲

doi:10.7671/j.issn.1001-411X.202409024

Abstract:
Objective: To compare how well different machine learning models predict gene expression in pigs from cis single-nucleotide polymorphisms (cis-SNPs), and to examine how a gene's cis-heritability (cis-h²) and number of cis-SNPs relate to the prediction accuracy of each model.
Method: Using protein-coding genes from pig muscle tissue samples of the PigGTEx project, we trained 18 machine learning models on the cis-SNPs located within ±1 Mb of each gene's transcription start site and evaluated the prediction accuracy of each model.
Result: The prediction accuracy of the machine learning models was positively correlated with gene cis-h². The elastic net regression model and the Lasso regression model had the highest overall prediction accuracy, with mean R² values of 0.0362 and 0.0358, respectively. Within a certain range, prediction accuracy was also positively correlated with the number of cis-SNPs of a gene.
Conclusion: The accuracy of machine learning models in predicting pig gene expression depends strongly on the cis-h² and the number of cis-SNPs of a gene. Choosing a model suited to each gene's cis-h² and cis-SNP count can therefore improve the accuracy of predicted expression levels.
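
The workflow summarized in the Method (select the cis-SNPs within ±1 Mb of a gene's transcription start site, train a penalized regression model, score it by R²) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the simulated genotypes and expression values, the penalty settings (`alpha`, `l1_ratio`), the single train/test split, and the definition of R² as 1 - SS_res/SS_tot on held-out samples are all assumptions made for illustration.

```python
import random


def cis_snp_indices(positions, tss, window=1_000_000):
    """Return indices of SNPs located within +/- `window` bp of the TSS."""
    return [j for j, pos in enumerate(positions) if abs(pos - tss) <= window]


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.05, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for the elastic net objective
    (1/2n)*||y - b0 - Xb||^2 + alpha*(l1_ratio*||b||_1 + 0.5*(1-l1_ratio)*||b||^2).
    Setting l1_ratio=1.0 gives the Lasso. Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered genotypes
    resid = [yi - y_mean for yi in y]  # residuals while all coefficients are zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:  # monomorphic SNP carries no information
                continue
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def predict(X, intercept, beta):
    """Predicted expression for each sample (row of genotype dosages)."""
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = random.Random(0)
    tss = 5_000_000  # hypothetical TSS coordinate
    positions = [rng.randrange(3_000_000, 7_000_000) for _ in range(40)]
    cis = cis_snp_indices(positions, tss)

    # Simulated genotype dosages (0/1/2) and expression driven by two cis-SNPs.
    genotypes = [[rng.choice([0, 1, 2]) for _ in positions] for _ in range(200)]
    expression = [0.8 * g[cis[0]] - 0.5 * g[cis[1]] + rng.gauss(0, 1) for g in genotypes]
    X = [[g[j] for j in cis] for g in genotypes]

    # Train on the first 150 samples, evaluate on the remaining 50.
    for name, l1_ratio in [("elastic net", 0.5), ("lasso", 1.0)]:
        b0, beta = fit_elastic_net(X[:150], expression[:150], l1_ratio=l1_ratio)
        score = r_squared(expression[150:], predict(X[150:], b0, beta))
        print(f"{name}: {len(cis)} cis-SNPs, held-out R^2 = {score:.3f}")
```

In practice, a study of this kind would typically tune the penalty by cross-validation and repeat the evaluation per gene, so that accuracy can be compared against each gene's cis-h² and cis-SNP count.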
