• Chinese Core Journal
  • Chinese Science Citation Database (CSCD) Source journal
  • Journal of Citation Report of Chinese S&T Journals (Core Edition)
ZHOU Tianle, TENG Jinyan, XU Zhiting, et al. Evaluation of predictive accuracy of gene expression in pigs using machine learning models[J]. Journal of South China Agricultural University, 2025, 46(0): 1-9.

Evaluation of predictive accuracy of gene expression in pigs using machine learning models

  • Objective 

    By comparing how well various machine learning models predict gene expression in pigs from single nucleotide polymorphisms (SNPs), we investigated the relationships among cis-heritability (cis-h²), the number of cis-SNPs, and prediction accuracy.
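The abstract does not say which statistic links cis-h² (or cis-SNP count) to prediction accuracy across genes. As an assumption, a Pearson correlation over per-gene values is one simple way to quantify such a relationship. The minimal sketch below implements it in plain Python. The function name `pearson_r` and the example values are hypothetical and are not data from the study.

```python
from __future__ import annotations

import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length sequences of per-gene values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-gene cis-h2 and held-out R2 values (illustrative only).
    cis_h2 = [0.02, 0.05, 0.10, 0.20, 0.35]
    test_r2 = [0.00, 0.01, 0.03, 0.08, 0.15]
    print(round(pearson_r(cis_h2, test_r2), 3))
```

A rank-based (Spearman) correlation would be a reasonable alternative if per-gene cis-h² or R² values are strongly skewed.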

    Method 

    Using muscle-tissue expression data for pig protein-coding genes from the PigGTEx project, we trained 18 different machine learning models with cis-SNPs located within ±1 Mb of each gene's transcription start site (TSS) as predictors. We then evaluated the prediction accuracy of each model.
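Selecting the predictors for one gene means keeping the SNPs on the same chromosome whose positions fall within ±1 Mb of its TSS. The minimal sketch below makes that rule concrete. The record types `Snp` and `Gene`, the function `cis_snps`, and the choice of inclusive window boundaries are hypothetical illustrations, not the study's pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: a +/-1 Mb cis window around the TSS, boundaries inclusive.
WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Snp:
    """A genotyped variant (hypothetical record type)."""
    snp_id: str
    chrom: str
    pos: int  # base-pair position on the chromosome


@dataclass(frozen=True)
class Gene:
    """A gene and its transcription start site (hypothetical record type)."""
    gene_id: str
    chrom: str
    tss: int  # TSS base-pair position


def cis_snps(gene: Gene, snps: list[Snp], window: int = WINDOW_BP) -> list[Snp]:
    """Return SNPs on the gene's chromosome within [TSS - window, TSS + window]."""
    lo = max(0, gene.tss - window)
    hi = gene.tss + window
    return [s for s in snps if s.chrom == gene.chrom and lo <= s.pos <= hi]


if __name__ == "__main__":
    gene = Gene("GENE_X", "1", 2_500_000)  # hypothetical gene
    snps = [
        Snp("rs_a", "1", 1_400_000),  # 1.1 Mb from the TSS: outside
        Snp("rs_b", "1", 1_600_000),  # 0.9 Mb from the TSS: inside
        Snp("rs_c", "1", 3_500_000),  # exactly 1 Mb from the TSS: inside
        Snp("rs_d", "2", 2_500_000),  # different chromosome: outside
    ]
    print([s.snp_id for s in cis_snps(gene, snps)])  # prints ['rs_b', 'rs_c']
```

This linear scan is fine for illustration. Genome-wide, sorting SNPs by position and using binary search (for example with `bisect`) would avoid rescanning every SNP for every gene.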

    Result 

    Prediction accuracy was positively correlated with the cis-h² of genes. The Elastic Net and Lasso regression models achieved the highest overall prediction accuracy, with mean R² values of 0.0362 and 0.0358, respectively. Prediction accuracy was also positively correlated with the number of cis-SNPs around a gene, within a certain range.
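To show what the two best-performing models fit, the minimal sketch below implements Elastic Net by cyclic coordinate descent in plain Python. Lasso is the special case `l1_ratio = 1`. It assumes the glmnet/scikit-learn-style objective (1/2n)·‖y − b − Xβ‖² + λ·[α‖β‖₁ + (1 − α)‖β‖²/2], with genotype columns centered but not scaled. The function names, the parameters `lam` and `l1_ratio`, and the use of R² = 1 − SS_res/SS_tot on held-out samples are assumptions. This is not the authors' code or software, which the abstract does not name.

```python
from __future__ import annotations


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator S(z, gamma) used by L1-penalized updates."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(
    X: list[list[float]],
    y: list[float],
    lam: float,
    l1_ratio: float = 0.5,
    max_iter: int = 1000,
    tol: float = 1e-8,
) -> tuple[float, list[float]]:
    """Elastic Net via cyclic coordinate descent; l1_ratio=1.0 gives Lasso.

    Columns are centered so the intercept is left unpenalized.
    """
    n, p = len(y), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # centered y minus Xc @ beta, beta = 0 at start
    z = [sum(xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # monomorphic SNP: no information, coefficient stays 0
            old = beta[j]
            # Partial-residual correlation for coordinate j.
            rho = sum(xc[i][j] * resid[i] for i in range(n)) / n + z[j] * old
            new = soft_threshold(rho, lam * l1_ratio) / (z[j] + lam * (1.0 - l1_ratio))
            delta = new - old
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xc[i][j] * delta  # keep residuals in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def predict(intercept: float, beta: list[float], X: list[list[float]]) -> list[float]:
    """Linear predictions intercept + X @ beta."""
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((v - mean) ** 2 for v in y_true)
    if ss_tot == 0.0:
        raise ValueError("y_true has zero variance")
    ss_res = sum((t - q) ** 2 for t, q in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot
```

For a Lasso fit, call `fit_elastic_net(X, y, lam, l1_ratio=1.0)`. In practice `lam` (and `l1_ratio`) would be tuned by cross-validation within the training samples, with R² reported only on held-out samples. Some studies instead report the squared Pearson correlation between predicted and observed expression, and the abstract does not state which definition was used.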

    Conclusion 

    The accuracy with which machine learning models predict gene expression in pigs depends largely on both cis-h² and the number of cis-SNPs. Choosing a model suited to the cis-h² and cis-SNP count of each gene may therefore improve prediction accuracy.
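One practical reading of this recommendation is to fit several candidate models per gene and keep the one with the best cross-validated R². This is stated as an assumption, not as the authors' procedure. The minimal sketch below performs that per-gene choice and treats genes where no model beats a threshold as having no usable predictor. The function `choose_models`, the `min_r2` cutoff, and the gene and model labels in the example are hypothetical.

```python
from __future__ import annotations


def choose_models(
    cv_r2: dict[str, dict[str, float]], min_r2: float = 0.0
) -> dict[str, str | None]:
    """Pick, for each gene, the model with the highest cross-validated R2.

    Genes whose best R2 does not exceed min_r2 (or that have no scores)
    map to None, meaning no model is kept for that gene.
    """
    choice: dict[str, str | None] = {}
    for gene, scores in cv_r2.items():
        if not scores:
            choice[gene] = None
            continue
        best = max(scores, key=scores.get)  # first model wins on ties
        choice[gene] = best if scores[best] > min_r2 else None
    return choice


if __name__ == "__main__":
    # Hypothetical cross-validated R2 values per gene and model.
    cv = {
        "GENE_A": {"ElasticNet": 0.05, "Lasso": 0.04, "Ridge": 0.03},
        "GENE_B": {"ElasticNet": -0.01, "Lasso": -0.02},
    }
    print(choose_models(cv))
```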

