
Development of an automatic system for SNP detection in diploid fragment sequencing

DENG Jizhong, LIN Weisen, GAN Siming, HUANG Huasheng, LI Mei, JIN Ji, HE Minghao

Citation: DENG Jizhong, LIN Weisen, GAN Siming, HUANG Huasheng, LI Mei, JIN Ji, HE Minghao. Development of an automatic system for SNP detection in diploid fragment sequencing[J]. Journal of South China Agricultural University, 2016, 37(3): 115-120. DOI: 10.7671/j.issn.1001-411X.2016.03.018

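Funding:

National High Technology Research and Development Program of China (863 Program), 2013AA102705

National Natural Science Foundation of China, 31270702

National Natural Science Foundation of China, 31070592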

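Details
    About the author:

    DENG Jizhong (born 1963), male, associate professor, PhD, E-mail: jz-deng@scau.edu.cn

    Corresponding author:

    GAN Siming (born 1970), male, research professor, PhD, E-mail: Siming_Gan@126.com

  • Chinese Library Classification (CLC) number: TP391.4; Q523.8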


  • Abstract:
    Objective

    To develop a pattern-recognition-based system for automatic single nucleotide polymorphism (SNP) detection in diploid fragment sequencing and to improve detection accuracy.

    Methods

    The system was built on the LabWindows/CVI 9.0 platform together with the Matlab function library, and takes .ab1 or .scf files from diploid PCR fragment sequencing as source data. The traces of the four bases G, A, T and C are first separated and denoised by one-dimensional discrete wavelet filtering. Typical features are then extracted from the fluorescence curve at each base position (peak). Finally, a classifier based on a back-propagation neural network identifies and calls SNPs.

    Results

    The system has a user-friendly interface and runs stably. It grades SNP reliability into six levels and lets users manually correct suspicious SNPs. In a test on 143 SNPs from 26 sequencing fragments of Eucalyptus urophylla, the system outperformed three previously reported software packages in detection accuracy, false positive rate and false negative rate.

    Conclusion

    The system is highly accurate, needs no reference sequence, and can be used for efficient SNP detection in diploid PCR fragment sequencing.
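
The first two stages named in the abstract (wavelet denoising of each base channel, then feature extraction at each base position) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a hand-written Haar wavelet with a universal soft threshold, synthetic Gaussian traces instead of .ab1 or .scf data, and a simple secondary-to-primary peak-height ratio in place of the back-propagation neural network classifier. All function names, the 0.3 ratio cutoff and the window size are hypothetical.

```python
import math
import random
import statistics

BASES = "GATC"

def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    r = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert haar_dwt."""
    r = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / r, (a - d) / r))
    return out

def soft_threshold(value, t):
    """Shrink a coefficient toward zero by t."""
    return math.copysign(max(abs(value) - t, 0.0), value)

def denoise(trace, levels=2):
    """Decompose, soft-threshold the detail coefficients, reconstruct."""
    if len(trace) % (2 ** levels):
        raise ValueError("trace length must be divisible by 2**levels")
    approx, details = list(trace), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Assumed noise estimate: median absolute finest-scale coefficient,
    # followed by the universal threshold sigma * sqrt(2 ln n).
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(trace)))
    for d in reversed(details):
        approx = haar_idwt(approx, [soft_threshold(c, t) for c in d])
    return approx

def call_site(traces, pos, half_window=3, ratio_cutoff=0.3):
    """Return (primary, secondary-or-None) for the base position at pos.

    Stand-in for the classifier: a second base is reported when its peak
    height is at least ratio_cutoff times the highest peak (hypothetical).
    """
    heights = {b: max(tr[pos - half_window:pos + half_window + 1])
               for b, tr in traces.items()}
    first, second = sorted(heights, key=heights.get, reverse=True)[:2]
    if heights[first] <= 0:
        return ("N", None)
    ratio = heights[second] / heights[first]
    return (first, second if ratio >= ratio_cutoff else None)

def synthetic_traces(sites, length=128, spacing=16, width=3.0,
                     noise=10.0, seed=0):
    """Build four noisy channels; each site maps base -> peak height."""
    rng = random.Random(seed)
    traces = {b: [rng.gauss(0.0, noise) for _ in range(length)]
              for b in BASES}
    positions = [spacing * (i + 1) for i in range(len(sites))]
    for pos, site in zip(positions, sites):
        for base, height in site.items():
            for i in range(length):
                traces[base][i] += height * math.exp(
                    -((i - pos) ** 2) / (2 * width ** 2))
    return traces, positions

# Seven base positions; the fourth is a heterozygous T/A site (double peak).
sites = [{"G": 1000}, {"A": 1000}, {"T": 1000}, {"T": 600, "A": 400},
         {"C": 1000}, {"A": 1000}, {"G": 1000}]
raw, positions = synthetic_traces(sites)
clean = {base: denoise(trace) for base, trace in raw.items()}
calls = [call_site(clean, pos) for pos in positions]
print(calls)
```

On real data, the per-channel traces and peak positions would come from the sequencer file rather than from `synthetic_traces`, and the fixed cutoff would give way to a trained classifier with graded reliability, as the abstract describes.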

  • Figure 1. A flow chart of the SNP detection process

    Figure 2. A typical interface of the SNP detection system

    Figure 3. Base sequence obtained in our system compared with that of the sequencer

    b: sequence read by our system; a double peak at 155 bp gives the SNP T/A

    Figure 4. Effective manual correction of a misidentified base

    Figure 5. SNP detection result saved as a .txt file

    Table 1. Comparison of software performance in SNP detection

Publication history
  • Received:  2015-08-25
  • Published online:  2023-05-17
  • Issue date:  2016-05-09
