Abstract:
Objective We compared the performance of various machine learning models in predicting gene expression in pigs from single nucleotide polymorphisms (SNPs) and investigated the relationships among cis-heritability (cis-h2), the number of cis-SNPs, and prediction accuracy.
Method Using protein-coding genes from the muscle tissue data of the PigGTEx project, we trained 18 distinct machine learning models on cis-SNPs located within ±1 Mb of each gene's transcription start site. We then evaluated the prediction accuracy of each model.
Result There was a positive correlation between the prediction accuracy of the machine learning models and the cis-h2 values of genes. The Elastic Net and Lasso regression models exhibited the highest overall prediction accuracy, with mean R2 values of 0.0362 and 0.0358, respectively. Within a certain range, prediction accuracy was also positively correlated with the number of cis-SNPs around the genes.
Conclusion The accuracy of machine learning models in predicting gene expression in pigs is largely influenced by both cis-h2 and the number of cis-SNPs. Selecting a machine learning model suited to the cis-h2 value and the number of cis-SNPs of each gene may therefore help improve prediction accuracy.
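
The cis-SNP selection and accuracy measure described in the Method can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the (id, chromosome, position) tuple layout, the toy SNP identifiers, and the inclusive treatment of the ±1 Mb boundary are all assumptions made for illustration. The abstract also does not state how R2 was computed; the sketch assumes the squared Pearson correlation between observed and predicted expression.

```python
CIS_WINDOW_BP = 1_000_000  # +/-1 Mb around the TSS, as stated in the Method


def select_cis_snps(snps, gene_chrom, tss, window=CIS_WINDOW_BP):
    """Return IDs of SNPs on the gene's chromosome within `window` bp of the TSS.

    `snps` is an iterable of (snp_id, chromosome, position) tuples
    (a hypothetical layout). The window boundary is treated as inclusive,
    which is an assumption.
    """
    return [
        snp_id
        for snp_id, chrom, pos in snps
        if chrom == gene_chrom and abs(pos - tss) <= window
    ]


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted expression.

    Assumed definition of R2; returns 0.0 when either vector has no variance.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        return 0.0
    return (cov * cov) / (var_obs * var_pred)


# Toy data with made-up SNP IDs: a gene on chromosome "1" with its TSS at 2,000,000 bp.
toy_snps = [
    ("snp_a", "1", 500_000),    # 1.5 Mb upstream: outside the window
    ("snp_e", "1", 1_000_000),  # exactly 1 Mb upstream: kept (inclusive boundary)
    ("snp_b", "1", 2_400_000),  # 0.4 Mb downstream: kept
    ("snp_c", "1", 3_900_000),  # 1.9 Mb downstream: outside the window
    ("snp_d", "2", 2_000_000),  # different chromosome: excluded
]
cis_ids = select_cis_snps(toy_snps, gene_chrom="1", tss=2_000_000)
```

In a full analysis, the genotypes of the selected SNPs would form the feature matrix for each gene's model, and `r_squared` would be applied to held-out predictions. Counting the selected IDs per gene gives the number of cis-SNPs that the Result relates to prediction accuracy.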