CN110438235A - Method for inferring population origin based on hair shaft proteome nsSNPs

Publication number
CN110438235A
Authority
CN
China
Prior art keywords
nssnp
site
africa
sites
east asia
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810414456.0A
Other languages
English (en)
Other versions
CN110438235B (zh)
Inventor
李彩霞
丰蕾
江丽
季安全
王桂强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Forensic Science Ministry of Public Security PRC
Original Assignee
Institute of Forensic Science Ministry of Public Security PRC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Forensic Science Ministry of Public Security PRC
Priority to CN201810414456.0A
Publication of CN110438235A
Application granted
Publication of CN110438235B
Legal status: Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/166 Oligonucleotides used as internal standards, controls or normalisation probes

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Peptides Or Proteins (AREA)

Abstract

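The present invention discloses a method for inferring population origin based on nsSNPs of the hair shaft proteome. Hair shaft samples from 104 Han Chinese and 105 Chinese Uyghur individuals were selected, the hair shaft proteome was extracted and analysed by mass spectrometry, and 772 specific peptide sequences containing SAPs were identified, corresponding to 703 SAP sites. These SAP sites were then associated with SNP sites in the 1000 Genomes database to back-infer a combination of 527 nsSNP sites. Experiments show that the nsSNP site combination provided by the invention can be used to infer membership of the three major populations of Africa, East Asia and Europe.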

Description

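Method for inferring population origin based on hair shaft proteome nsSNPs
Technical Field
The present invention relates to the field of biotechnology, and in particular to a method for inferring population origin based on nsSNPs of the hair shaft proteome.
Background Art
With the development of forensic DNA testing, STR profiles can now be obtained from common blood or bloodstains, saliva or saliva stains, semen or semen stains, shed cells, hairs with follicles, and even bones. The hair shaft, however, consists of keratinized cells whose nuclear DNA content is very low and severely degraded. Partial STR profiles have reportedly been obtained by using low-volume amplification systems, increasing the number of cycles, and running multiple parallel amplifications, but the approach is not used in casework because of its poor accuracy and stability. At present, hair shafts are examined by sequencing the base differences in the hypervariable regions of mitochondrial DNA. This approach suffers from low discriminating power and heteroplasmy, and can only exclude rather than identify, which limits its use in forensic examination.
Compared with nuclear DNA in the hair shaft, proteins are more stable and remain so over long periods. As with genomic DNA, protein sequences differ between individuals. These differences arise when non-synonymous single nucleotide polymorphisms (nsSNPs) in the coding genes are transcribed and translated, and they are called single amino acid polymorphisms (SAPs). Liquid chromatography-tandem mass spectrometry is currently the platform of choice for protein identification in proteomics. Peptides produced by trypsin digestion of the proteins are first separated by liquid chromatography and then detected by mass spectrometry, which identifies specific peptide sequences. Studies have shown that peptides specific to SAPs can be detected by mass spectrometry; such peptides are called genetically variant peptides (GVPs).
Genomic SNPs, a newer class of genetic marker in forensic genetics, are already used for forensic population inference, and many inference panels have been reported. At the intercontinental scale they can not only distinguish the three major populations of Africa, East Asia and Europe; the 55-SNP panel of Kidd et al. can also separate seven continental populations (Africa, Europe, Southwest Asia, South Asia, East Asia, Oceania and the Americas). By contrast, very few studies have used nsSNPs in exons for population inference. A United States exome sequencing project (Exome Sequencing Project, ESP) covering about 2,203 African Americans and 4,300 European Americans showed that nsSNPs have good heterozygosity in these populations, with 35,000 nsSNP sites having a minor allele frequency greater than 0.8%.
Summary of the Invention
The first object of the present invention is to provide an nsSNP site combination for distinguishing the three major populations of Africa, East Asia and Europe.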
The nsSNP site combination provided by the present invention for distinguishing the three major populations of Africa, East Asia and Europe consists of the following 527 nsSNP sites: rs111433922, rs35340855, rs74058627, rs16829071, rs77912442, rs75073861, rs33931638, rs2274540, rs181507001, rs1340472, rs10776792, rs138286826, rs3790549, rs6587649, rs142660239, rs141677205, rs150525217, rs78489268, rs35492900, rs73004856, rs9793541, rs11205064, rs143680696, rs111350576, rs4329520, rs75424193, rs150172690, rs7527180, rs137886860, rs116208483, rs11544443, rs35358752, rs140222211, rs146608925, rs79957178, rs61743921, rs76446715, rs291102, rs3738046, rs2234697, rs35273824, rs55873785, rs61739198, rs147571909, rs61741026, rs3127679, rs61850830, rs61737718, rs41277978, rs11200927, rs144135625, rs2281878, rs150218827, rs149172507, rs3781409, rs142332607, rs146366062, rs77752215, rs114405390, rs117868609, rs78838117, rs73428416, rs147366020, rs61744476, rs115660558, rs112245148, rs74706151, rs1695, rs188029416, rs199773487, rs1945783, rs78786722, rs11604169, rs75068802, rs141425229, rs111738856, rs61750769, rs34495134, rs13312793, rs112319661, rs25680, rs35819349, rs1063193, rs114865992, rs76226247, rs74095220, rs4761786, rs2071588, rs202205489, rs79897879, rs183358379, rs140635030, rs2852464, rs61740813, rs112554450, rs61630004, rs1732301, rs36004911, rs61730587, rs2658658, rs1732263, rs61730589, rs1791634, rs61730590, rs148287450, rs2232387, rs142860834, rs138021918, rs11540301, rs17845411, rs201201647, rs148276250, rs11170177, rs35043606, rs76412202, rs74660757, rs2634041, rs636127, rs200729891, rs78374723, rs36143766, rs139252457, rs2638497, rs116117459, rs143673140, rs61743822, rs138008625, rs201904127, rs35645287, rs114939776, rs145486599, rs4964460, rs2723880, rs117037408, rs113902407, rs139495129, rs35201084, rs78872760, rs143710874, rs139160172, rs17111188, rs35926651, rs2229462, rs141486741, rs762063, rs10148371, rs11125, rs45560241, rs941920, rs61745465, rs45542736, rs59773088, rs116761065, rs7149578, rs151256890, rs55863440, rs77734634, rs11549015, rs142368943, rs76831919, rs149516006, rs147209733, rs6083, rs141566933, rs182752537, rs143921047, rs8040674, rs114486517, rs138510119, rs13226, rs61733465, rs2745101, rs7202502, rs26856, rs149302444, rs8063727, rs61734749, rs74444511, rs4850, rs143599196, rs61764619, rs149180816, rs139027672, rs115575792, rs35959859, rs11646443, rs115334480, rs142294143, rs111653425, rs8068049, rs140044904, rs33923045, rs139361222, rs2269859, rs7213256, rs112557906, rs112120285, rs17843023, rs17843021, rs142154718, rs721957, rs2010027, rs151267951, rs140634473, rs3829598, rs138200823, rs150218495, rs9635728, rs6503578, rs36006291, rs113142104, rs201968324, rs139615301, rs1497383, rs366700, rs34361798, rs61746658, rs444509, rs385055, rs111435962, rs428371, rs149778906, rs62067292, rs144662088, rs144085234, rs35424651, rs9894258, rs140430944, rs150620728, rs71373411, rs114488848, rs61741663, rs143499346, rs12450621, rs77779192, rs112544857, rs187425812, rs17737019, rs35371972, rs16966811, rs9916475, rs9916484, rs9916724, rs139509509, rs9893787, rs117083040, rs116901031, rs2604955, rs2071563, rs16966929, rs57682233, rs73983451, rs146792525, rs2071560, rs2071601, rs139209783, rs189378138, rs138303882, rs139838007, rs200825300, rs2301354, rs9675246, rs8082683, rs73294423, rs11551760, rs117484558, rs148173278, rs111383277, rs2229512, rs143043662, rs41283425, rs34891485, rs143967758, rs62636624, rs59657238, rs112984118, rs116700192, rs116640209, rs35074489, rs75138404, rs62621822, rs142608913, rs11871357, rs2228306, rs140743740, rs2853533, rs3737374, rs78014467, rs1455555, rs151208927, rs3746173, rs7250822, rs55862054, rs890850, rs10410943, rs80251258, rs2287813, rs62638750, rs117612375, rs150023166, rs10846, rs7249305, rs8111625, rs116923487, rs75291244, rs61731193, rs114254919, rs146740964, rs773902, rs12983721, rs61995739, rs61742630, rs151268424, rs112433506, rs2229259, rs148300955, rs185356090, rs116440799, rs143467587, rs191886465, rs189187210, rs114308190, rs4802741, rs144495841, rs73938668, rs116363585, rs115704323, rs57920974, rs533617, rs62130126, rs143205707, rs192390933, rs1109758, rs13413205, rs72937663, rs142729495, rs75630766, rs202041757, rs6761276, rs6743376, rs77686710, rs34355135, rs112797950, rs35852101, rs35830636, rs76148000, rs113701414, rs3815849, rs181520135, rs73996408, rs2233384, rs2233390, rs2233393, rs6431437, rs73102303, rs61732303, rs214814, rs114998364, rs34205880, rs111730906, rs145658539, rs6061066, rs3746609, rs17301126, rs41293138, rs200948404, rs17856024, rs2231619, rs61750835, rs36068952, rs78386672, rs45486695, rs61750208, rs2830585, rs141102396, rs113360916, rs3804010, rs61748317, rs16986753, rs73901140, rs113504861, rs151147550, rs117415039, rs115002444, rs74429119, rs79258920, rs16987932, rs78121368, rs61753641, rs76994627, rs181516402, rs233252, rs465279, rs111668637, rs411254, rs140821764, rs73909208, rs79740360, rs462007, rs78191358, rs78821735, rs73909210, rs34302939, rs61745911, rs7277175, rs201439546, rs115031369, rs61742280, rs112405400, rs133072, rs147348682, rs191014345, rs61730105, rs76321736, rs3796375, rs2228561, rs17080284, rs138055453, rs57006145, rs140995238, rs116174869, rs77141175, rs77299600, rs5955, rs144811342, rs61995956, rs186892593, rs115253144, rs17029215, rs3811813, rs10513155, rs76155491, rs73757391, rs147178651, rs148509798, rs77499935, rs181914313, rs149861653, rs35610885, rs150956127, rs146522449, rs2278371, rs61743236, rs6872614, rs145827614, rs77767937, rs112465391, rs77758574, rs2076299, rs28763966, rs28763967, rs6929069, rs1225746, rs34286843, rs138815183, rs73736234, rs9261293, rs199834022, rs41293883, rs45624537, rs145921744, rs61746206, rs2621330, rs2070121, rs60336135, rs115292676, rs11969595, rs111265263, rs138694074, rs1676015, rs2227885, rs2295005, rs9478144, rs141119961, rs4716346, rs16901311, rs185762794, rs11548791, rs5743342, rs150151168, rs73692834, rs145942606, rs10256, rs114926839, rs2437100, rs10953934, rs1062154, rs114560708, rs73463436, rs61745481, rs149880251, rs72475803, rs35781576, rs114896954, rs148249848, rs145786248, rs76489557, rs150147780, rs116816681, rs7013127, rs117589117, rs11539895, rs540473, rs34250374, rs35791393, rs16929374, rs146467307, rs3750501, rs7025814, rs114612810, rs144749820, rs76003300, rs145771944, rs1538660, rs142111180, rs76057724, rs144181457, rs3812561, rs7850438, rs139415880, rs16997659, rs17147624, rs17847095, rs41306133, rs144825978, rs138895359 and rs142447204.
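The second object of the present invention is to provide a product for distinguishing the three major populations of Africa, East Asia and Europe.
The product provided by the present invention for distinguishing the three major populations of Africa, East Asia and Europe comprises substances for detecting the genotypes of the above 527 nsSNP sites.
In the above product, the substances for detecting the genotypes of the above 527 nsSNP sites are reagents and/or instruments for detecting those genotypes. They may be the reagents and/or instruments required to detect the genotypes of the 527 nsSNP sites by methods known in the prior art.
The third object of the present invention is to provide new uses of the above nsSNP site combination or the above product.
The present invention provides use of the above nsSNP site combination or the above product in distinguishing the three major populations of Africa, East Asia and Europe.
The present invention also provides use of the above nsSNP site combination or the above product in constructing a genotyping database of the three major populations of Africa, East Asia and Europe.
The fourth object of the present invention is to provide a method for constructing a genotyping database of the three major populations of Africa, East Asia and Europe.
The method provided by the present invention for constructing a genotyping database of the three major populations of Africa, East Asia and Europe comprises the following steps:
(a1) selecting the African, East Asian and European populations from the 1000 Genomes database and forming an original genotype library from their genotypes at the above 527 nsSNP sites;
(a2) performing STRUCTURE cluster analysis on all samples in the original genotype library and selecting those whose principal ancestry component is greater than 90%; these samples constitute the genotyping database of the three major populations of Africa, East Asia and Europe.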
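Step (a2) can be sketched as a simple filter over per-sample ancestry proportions. The following is a minimal illustration, not the patent's implementation: the input format (a mapping from sample ID to three proportions), the function name `build_reference_db`, and the assignment of cluster columns to the labels Africa, East Asia and Europe are all assumptions, since STRUCTURE itself reports unlabelled clusters.

```python
from typing import Dict, List, Tuple


def build_reference_db(
    q_matrix: Dict[str, Tuple[float, float, float]],
    labels: Tuple[str, str, str] = ("Africa", "East Asia", "Europe"),
    threshold: float = 0.90,
) -> Dict[str, List[str]]:
    """Keep samples whose largest ancestry component exceeds the threshold."""
    db: Dict[str, List[str]] = {label: [] for label in labels}
    for sample_id, proportions in q_matrix.items():
        # Index of the dominant ancestry component for this sample.
        best = max(range(len(labels)), key=lambda i: proportions[i])
        if proportions[best] > threshold:
            db[labels[best]].append(sample_id)
    return db


# Hypothetical ancestry proportions for three samples.
example_q = {
    "S1": (0.97, 0.02, 0.01),
    "S2": (0.10, 0.85, 0.05),  # dominant component is only 85%: excluded
    "S3": (0.01, 0.03, 0.96),
}
reference_db = build_reference_db(example_q)
print(reference_db)
```

Samples with a dominant component at or below 90% are left out of the reference database, so admixed individuals do not blur the three reference groups.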
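The fifth object of the present invention is to provide a method for distinguishing the three major continental populations of Africa, East Asia and Europe.
The method provided by the present invention for distinguishing the three major continental populations of Africa, East Asia and Europe comprises the following steps:
(b1) constructing a genotyping database of the three major populations of Africa, East Asia and Europe according to the above method;
(b2) extracting genomic DNA from the subject and genotyping the 527 nsSNP sites to obtain the subject's genotypes at the 527 nsSNP sites;
(b3) comparing the subject's genotypes at the 527 nsSNP sites with the genotyping database of the three major populations of Africa, East Asia and Europe, thereby determining which of the African, East Asian and European populations the subject belongs to.
The final object of the present invention is to provide a method for screening, on the basis of the hair shaft proteome, an nsSNP site combination for population inference.
The method provided by the present invention for screening an nsSNP site combination for population inference on the basis of the hair shaft proteome comprises the following steps:
(c1) extracting the hair shaft proteome of each individual to be tested, analysing the hair shaft proteomes by mass spectrometry, and screening to obtain specific peptides;
(c2) aligning the specific peptides with the reference protein sequences in an SAP reference protein database and screening to obtain specific peptides containing SAP sites; and locating the SAP sites to obtain the name of the protein in which each SAP site lies and its position;
(c3) associating the protein name and position of each SAP site with the protein names and positions of SNP sites in the 1000 Genomes database; if the protein name and position of an SAP site are the same as those of an SNP site, and the base change at that SNP site causes the amino acid change at the SAP site, then that SNP site is an nsSNP site.
In the above method, in (c1), peptides with an FDR less than or equal to 1% are selected as the filter parameter for high-confidence qualitative protein identification, and the specific peptides are obtained by this screening.
In the above method, (c3) further comprises a step of deleting linked sites from the nsSNP site combination.
In the above method, the population inference is inference among the three major populations of Africa, East Asia and Europe.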
In the above method, the nsSNP site combination consists of the 527 nsSNP sites listed above (rs111433922 through rs142447204).
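In the present invention, hair shaft samples from 104 Han Chinese and 105 Chinese Uyghur individuals were selected, the hair shaft proteome was extracted and analysed by mass spectrometry, and 772 specific peptide sequences containing SAPs were identified, corresponding to 703 SAP sites. These SAP sites were then associated with SNP sites in the 1000 Genomes database to back-infer a combination of 527 nsSNP sites. Experiments show that the nsSNP site combination provided by the invention can be used to infer membership of the three major populations of Africa, East Asia and Europe.
Brief Description of the Drawings
Figure 1 shows the GO analysis results.
Figure 2 shows the first-generation (Sanger) sequencing validation results. 88 SNP sites were tested in buccal swab samples from 10 Han individuals. True positives (TP), where the mass spectrometry result agrees with the Sanger sequencing result, are shown in blue. False positives (FP), where the mass spectrometry result disagrees with the Sanger sequencing result, are shown in red. False negatives (FN), where mass spectrometry detected no genotype but Sanger sequencing did, are shown in green. True negatives (TN), where neither mass spectrometry nor Sanger sequencing detected a genotype, are shown in white. Orange indicates that Sanger sequencing yielded no genotype.
Figure 3 shows the STRUCTURE cluster analysis results (K=5).
Figure 4 shows the PCA (principal component analysis) plot.
Figure 5 shows the STRUCTURE results (K=3).
Detailed Description of the Embodiments
Unless otherwise specified, the experimental methods used in the following examples are conventional methods.
Unless otherwise specified, the materials, reagents and the like used in the following examples are commercially available.
All quantitative experiments in the following examples were performed in triplicate and the results averaged.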
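Example 1. Obtaining nsSNPs from the hair shaft proteome
I. Extraction of the hair shaft proteome and mass spectrometry of the proteome
1. Sample collection
Hair shaft samples and matching buccal swabs, 209 of each, were collected from 104 Han Chinese and 105 Uyghur unrelated individuals. Both ends of each hair were cut off to ensure that the sample contained neither follicle nor hair tip. Each hair shaft sample was 2 cm long (when a single hair was shorter than 2 cm, two hair shafts from the same individual were used).
2. Hair shaft proteome extraction
The hair shaft proteome was extracted from the hair shaft samples as follows. The hair shafts were washed twice each with 10% (v/v) methanol and with water, 1-2 h per wash, and the washed hair shafts were then cut into pieces of about 1-2 mm. To each cut sample, 100 μL of protein processing solution (1 M urea, 50 mM NH4HCO3, 0.1 M DTT, 7 μg/mL trypsin) was added, and the mixture was shaken in a metal bath at 37 °C for 16 h. The digest was transferred to a new EP tube to give the hair shaft proteome sample. The hair shaft proteome samples were quantified; the total mass of the proteome was about 10 μg.
3. Mass spectrometry of the proteome
The hair shaft proteome samples were desalted with ZipTip, dried under vacuum, redissolved in loading buffer and injected for mass spectrometry. Mass spectrometry was performed on a liquid chromatography-mass spectrometry system (NCS3500 high-performance liquid chromatography system) with a Q Exactive mass spectrometer (Thermo Scientific). Trypsin-specific cleavage was selected, with at most 2 missed cleavage sites allowed, a precursor ion mass tolerance of 20 ppm and a fragment ion mass tolerance of 0.02 Da.
II. Locating SAP sites
1. Screening of specific peptides
Proteome Discoverer 1.4 software was used for qualitative protein identification from the LC-MS .raw files. Peptides with FDR <= 1% were selected as the filter parameter for high-confidence qualitative protein identification, and the specific peptides were obtained by this screening.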
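The 1% FDR cut-off can be illustrated with a generic target-decoy calculation. This is a sketch of the general idea only, under the assumption that each peptide-spectrum match carries a score and a decoy flag; it does not reproduce the internal validation procedure of Proteome Discoverer, and the type `PSM` and function `filter_by_fdr` are hypothetical names.

```python
from typing import List, NamedTuple


class PSM(NamedTuple):
    peptide: str
    score: float
    is_decoy: bool


def filter_by_fdr(psms: List[PSM], max_fdr: float = 0.01) -> List[str]:
    """Return target peptides whose q-value is at most max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all matches scoring at least this well.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at this rank or at any lower-scoring rank.
    q_values = fdrs[:]
    for i in range(len(q_values) - 2, -1, -1):
        q_values[i] = min(q_values[i], q_values[i + 1])
    return [p.peptide for p, q in zip(ranked, q_values)
            if not p.is_decoy and q <= max_fdr]


# Made-up matches: one decoy hit ranks above the weakest target.
example_psms = [
    PSM("AAK", 90.0, False),
    PSM("CCR", 80.0, False),
    PSM("XXK", 70.0, True),
    PSM("DDK", 60.0, False),
]
accepted = filter_by_fdr(example_psms)
print(accepted)
```

In this toy input the two highest-scoring target peptides pass, while the target ranked below the decoy does not.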
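2. Locating polymorphic sites
The single amino acid polymorphism (SAP) sites in the specific peptides obtained in step 1 were located as follows. The specific peptides were aligned with the reference protein sequences in the SAP reference protein database, peptides containing an amino acid polymorphism were selected, and the position of each polymorphic site in the reference protein sequence was determined. The SAP reference protein database is the database (RefSeq Protein Variant Database) established in "Parker GJ, Leppert T, Anex DS, Hilmer JK, Matsunami N, Baird L, Stevens J, Parsawar K, Durbin-Johnson BP, Rocke DM, Nelson C, Fairbanks DJ, Wilson AS, Rice RH, Woodward SR, Bothner B, Hart BR, Leppert M. Demonstration of Protein-Based Human Identification Using the Hair Shaft Proteome. PLoS One, 2016, 11(9): e0160653." This database contains both the protein sequences before mutation and the protein sequences after mutation.
Results: in the Han samples, 304 to 1,509 peptides were detected per sample (mean 936), of which 44 to 137 (mean 96) were specific peptides containing SAPs. In the Uyghur samples, 316 to 1,331 peptides were detected per sample (mean 821), of which 39 to 120 (mean 84) were specific peptides containing SAPs. Across all samples, 772 specific peptides containing SAPs were found, corresponding to 703 SAP sites located in 460 proteins.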
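The positioning step can be pictured as sliding a variant peptide along a reference protein and reporting the single residue that differs. The sketch below is a simplification made for illustration: the source matches peptides against a database holding both pre- and post-mutation sequences, whereas this hypothetical `locate_sap` function compares against one reference sequence only, and the sequences are made up.

```python
from typing import Optional, Tuple


def locate_sap(reference: str, peptide: str) -> Optional[Tuple[int, str, str]]:
    """Find the first window of the reference that differs from the peptide
    at exactly one residue.

    Returns (1-based protein position, reference residue, variant residue),
    or None when no such window exists.
    """
    n = len(peptide)
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        mismatches = [i for i in range(n) if window[i] != peptide[i]]
        if len(mismatches) == 1:
            i = mismatches[0]
            return start + i + 1, window[i], peptide[i]
    return None


# Made-up reference protein and a variant peptide carrying E -> D.
sap = locate_sap("MSTRAVLEGKPLR", "AVLDGK")
print(sap)
```

The returned position, together with the protein name, is the key used later to look up the corresponding SNP.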
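3. GO analysis
Gene Ontology (GO) analysis was performed with KOBAS (KEGG Orthology Based Annotation System). GO is an international standard classification system for gene function. It classifies genes under Cellular component (the cell or its external environment), Molecular function (the active elements of gene products at the molecular level) and Biological process (the course of a molecular activity from start to finish, including the functional integration of cells, tissues, organs and species), and each category is refined through successive levels.
GO analysis showed that most of the proteins (those containing SAP sites obtained in step 2) are keratins or keratin-associated proteins. The remaining proteins have a broad range of functions, including cellular function, metabolism, stress response and signal transduction (Figure 1).
III. Back-inference of nsSNPs and statistical analysis
1. Screening of the nsSNP site combination
Using the name of the protein containing each SAP and the SAP's position, the SAPs were associated with the 1000 Genomes database (27 populations, 2,504 individuals in total; population sample information is given in Table 1) to find the correspondence between SAPs and SNPs. The protein name of each SAP was matched against the protein names of the SNPs in the 1000 Genomes database. If the protein name and position of an SAP site were the same as those of an SNP site, and the base change at that SNP site causes the amino acid change at the SAP site, the SNP site was retained as an nsSNP site. In total, 552 nsSNP sites were obtained.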
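The association rule described above (same protein, same position, and a base change that produces the observed amino acid change) amounts to a keyed lookup. The sketch below assumes the SNP annotations have already been reduced to protein-level records; the record types `SAP` and `SNP`, the function `map_saps_to_nssnps`, and the placeholder identifiers are hypothetical and do not come from the 1000 Genomes data.

```python
from typing import Dict, List, NamedTuple, Tuple


class SAP(NamedTuple):
    protein: str
    position: int
    ref_aa: str
    alt_aa: str


class SNP(NamedTuple):
    rsid: str
    protein: str
    position: int
    ref_aa: str
    alt_aa: str


def map_saps_to_nssnps(saps: List[SAP], snps: List[SNP]) -> Dict[SAP, List[str]]:
    """Link each SAP to the SNPs that explain it at the protein level."""
    index: Dict[Tuple[str, int, str, str], List[str]] = {}
    for snp in snps:
        if snp.ref_aa != snp.alt_aa:  # keep non-synonymous changes only
            key = (snp.protein, snp.position, snp.ref_aa, snp.alt_aa)
            index.setdefault(key, []).append(snp.rsid)
    return {sap: index[tuple(sap)] for sap in saps if tuple(sap) in index}


# Placeholder records; the identifiers are not real rsIDs.
example_saps = [SAP("PROT_A", 8, "E", "D"), SAP("PROT_B", 3, "K", "R")]
example_snps = [
    SNP("rsEXAMPLE1", "PROT_A", 8, "E", "D"),
    SNP("rsEXAMPLE2", "PROT_A", 8, "E", "E"),  # synonymous: ignored
    SNP("rsEXAMPLE3", "PROT_C", 5, "A", "T"),
]
mapping = map_saps_to_nssnps(example_saps, example_snps)
print(mapping)
```

SAPs with no matching SNP record are simply absent from the result, which mirrors the drop from 703 SAP sites to 552 nsSNP sites reported in the text.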
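Table 1. Population sample information
By comparison with the genomic data of the 2,504 individuals in the 1000 Genomes database, 552 nsSNP sites located in 320 proteins were inferred from the hair shaft proteome. Five of these sites (rs146291703, rs10274334, rs57670668, rs143643076 and rs6580873) are tri-allelic; removing them left 547 nsSNP sites. After linkage disequilibrium testing, the following 20 linked sites (r² > 0.2) were deleted: rs75130475, rs74743312, rs34212827, rs150149800, rs34861030, rs6503627, rs34180629, rs2480345, rs114703967, rs139815542, rs1138272, rs2239710, rs743686, rs14024, rs26857, rs12451652, rs9897046, rs9908304, rs8071814 and rs77018583. This gave the final 527 nsSNP sites. The 527 nsSNP sites and related information are shown in Table 2.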
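The two filters described above, dropping tri-allelic sites and then dropping sites in linkage disequilibrium at r² > 0.2, can be sketched as follows. Several points are assumptions made for the illustration: r² is approximated by the squared Pearson correlation of genotype dosages (0, 1 or 2 copies of the alternate allele), every pair of sites is compared regardless of chromosome, and the first site of a linked pair is the one kept. The source does not state how r² was computed or which member of a pair was removed.

```python
from typing import Dict, List


def r_squared(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a monomorphic site carries no LD information
    return cov * cov / (vx * vy)


def prune_sites(
    dosages: Dict[str, List[int]],
    n_alleles: Dict[str, int],
    max_r2: float = 0.2,
) -> List[str]:
    """Drop non-biallelic sites, then greedily drop sites linked to a kept site."""
    kept: List[str] = []
    for site, genotypes in dosages.items():
        if n_alleles[site] != 2:
            continue
        if all(r_squared(genotypes, dosages[k]) <= max_r2 for k in kept):
            kept.append(site)
    return kept


# Toy data for six individuals; site names are placeholders.
example_dosages = {
    "siteA": [0, 1, 2, 0, 1, 2],
    "siteB": [0, 1, 2, 0, 1, 2],  # identical to siteA, so r² = 1
    "siteC": [1, 2, 1, 1, 0, 1],  # uncorrelated with siteA
    "siteD": [0, 0, 1, 1, 2, 2],
}
example_alleles = {"siteA": 2, "siteB": 2, "siteC": 2, "siteD": 3}
kept_sites = prune_sites(example_dosages, example_alleles)
print(kept_sites)

# The site counts reported in the text are consistent: 552 - 5 - 20 = 527.
assert 552 - 5 - 20 == 527
```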
Table 2. Information on the 527 nsSNP sites
The SAP sites with a detection rate above 15% in both the Han and Uyghur samples correspond to the following 88 nsSNP sites: rs2227885, rs148276250, rs77499935, rs1695, rs1138272, rs147178651, rs73757391, rs76155491, rs11871357, rs141102396, rs5955, rs143043662, rs41283425, rs112544857, rs187425812, rs2071560, rs146792525, rs73983451, rs16966929, rs2071563, rs114488848, rs71373411, rs150620728, rs139209783, rs138303882, rs189378138, rs139838007, rs743686, rs12451652, rs2071601, rs200825300, rs2071588, rs2852464, rs61740813, rs61630004, rs10148371, rs11125, rs61734749, rs149302444, rs214814, rs17080284, rs9675246, rs140430944, rs28763966, rs28763967, rs6929069, rs2233393, rs77752215, rs2239710, rs9894258, rs139615301, rs201968324, rs3829598, rs9897046, rs144085234, rs366700, rs444509, rs61746658, rs34361798, rs61730590, rs1791634, rs61730589, rs1732263, rs2658658, rs148287450, rs62067292, rs74429119, rs79258920, rs151147550, rs113504861, rs117415039, rs16986753, rs61748317, rs140634473, rs151267951, rs9908304, rs465279, rs73909208, rs76994627, rs34302939, rs61745911, rs17843021, rs112120285, rs112557906, rs143643076, rs7213256, rs142154718 and rs11170177. After removing the following linked sites (r² > 0.2) and tri-allelic site: rs1138272, rs743686, rs12451652, rs2239710, rs9897046, rs9908304 and rs143643076, 81 nsSNP sites were finally obtained.
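2. Validation of nsSNPs by first-generation (Sanger) sequencing
Buccal swabs corresponding to the 10 Han samples with the highest SAP detection rates were selected, and the 88 nsSNP sites from step 1 above were validated by Sanger sequencing. The procedure was as follows: genomic DNA was extracted with the MagAttract DNA Mini M48 kit (Qiagen), primers were designed with Primer Premier 5.0 software, the genotypes of the corresponding nsSNPs were determined by Sanger sequencing, and the accuracy and detection rate were calculated for each sample. Accuracy is calculated as TP/(TP+FP) and detection rate as TP/(TP+FN), where a true positive (TP) is a case in which the mass spectrometry and Sanger sequencing results agree, a false positive (FP) is a case in which mass spectrometry detected a genotype that disagrees with Sanger sequencing, and a false negative (FN) is a case in which mass spectrometry detected nothing but Sanger sequencing detected a genotype.
The results are shown in Figure 2. The mean accuracy across the 10 samples was 95.88%, and the mean detection rate across the 10 samples was 77.19%.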
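The two per-sample metrics can be computed directly from the counts of true positives, false positives and false negatives. In the sketch below, accuracy is TP/(TP+FP) as stated in the text, while the detection rate is taken to be TP/(TP+FN), the usual sensitivity definition; that reading is an assumption, and the counts are invented for illustration and are not the patent's data.

```python
def accuracy(tp: int, fp: int) -> float:
    """Fraction of mass spectrometry calls confirmed by sequencing."""
    return tp / (tp + fp)


def detection_rate(tp: int, fn: int) -> float:
    """Fraction of sequencing-confirmed genotypes that mass spectrometry detected."""
    return tp / (tp + fn)


# Invented counts for one sample.
tp, fp, fn = 60, 3, 15
sample_accuracy = accuracy(tp, fp)
sample_detection = detection_rate(tp, fn)
print(f"accuracy = {sample_accuracy:.4f}, detection rate = {sample_detection:.4f}")
```

Averaging these per-sample values over the 10 validated samples gives summary figures of the kind reported above.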
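Example 2. Use of the nsSNP site combination (527 sites) in population inference
I. Evaluation of the nsSNP site combination (527 sites) using 1000 Genomes data
1. Principal component analysis (PCA)
Principal component analysis (PCA) was performed with R v3.2.3 using the 527-nsSNP site combination on the 27 populations and 2,504 samples of the 1000 Genomes database listed in Table 1, covering the three major regions of Africa, East Asia and Europe. PCA reduces the dimensionality of the data by combining many strongly correlated measured variables into a small number of composite variables, and the results are visualised by plotting the data according to their distribution on the reduced factors.
The results are shown in Figure 4. Principal component 1 (PC1) and principal component 2 (PC2) together explain 60.5% of the variance. The 527-nsSNP site combination effectively distinguishes the three major populations of Africa, East Asia and Europe.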
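The dimensionality reduction described above can be illustrated with a small pure-Python PCA that extracts only the first principal component by power iteration. This is a sketch under stated assumptions, not the R analysis used in the source: genotypes are coded as dosages (0, 1 or 2), only PC1 is computed, and the toy matrix is invented. The function name `first_pc` is hypothetical.

```python
from typing import List, Tuple


def first_pc(genotypes: List[List[float]], iterations: int = 200) -> Tuple[List[float], float]:
    """Return (PC1 score per sample, fraction of variance explained by PC1)."""
    n, m = len(genotypes), len(genotypes[0])
    # Centre each site (column) on its mean dosage.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Site-by-site scatter matrix (m x m).
    scatter = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(m)]
               for a in range(m)]
    total = sum(scatter[a][a] for a in range(m))
    # Power iteration converges to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        w = [sum(scatter[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    eigenvalue = sum(v[a] * sum(scatter[a][b] * v[b] for b in range(m))
                     for a in range(m))
    scores = [sum(x[i][j] * v[j] for j in range(m)) for i in range(n)]
    return scores, eigenvalue / total


# Invented dosages: three samples from one group, then three from another.
toy = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [0, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 1],
    [2, 1, 0, 0],
]
scores, explained = first_pc(toy)
print([round(s, 2) for s in scores], round(explained, 3))
```

In the toy data the two groups fall on opposite sides of zero along PC1, which is the kind of separation a PCA plot of the real panel is meant to show.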
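2. Cluster analysis
Cluster analysis was performed with STRUCTURE v2.3.4 using the 527-nsSNP site combination on the 27 populations and 2,504 samples of the 1000 Genomes database listed in Table 1, covering the three major regions of Africa, East Asia and Europe, to analyse the genetic structure of each population; Distruct 1.1 was used to draw the population clustering plots. STRUCTURE v2.3.4 clusters population samples on the basis of their genotype data at a set of SNP sites. Assuming a model with K populations (the user specifies a plausible range for K, and the optimal value is chosen from the results), the method uses a Bayesian algorithm and "resampling with replacement" at a given K to infer population structure and individual ancestry components. Each individual is assigned (probabilistically) to one population, or jointly to two or more populations if its genotype indicates admixture.
The results are shown in Figure 3 (K = 5). The 527-nsSNP site combination of the present invention effectively distinguishes the three major populations of Africa, East Asia and Europe.
II. Evaluation of the nsSNP site combination (81 sites)
1. Cluster analysis
Cluster analysis was performed with STRUCTURE v2.3.4 using the 81-nsSNP site combination on the 19 populations and 1,668 samples of the 1000 Genomes database in Table 1 that belong to the three major regions of Africa, East Asia and Europe, to analyse the genetic structure of each population; Distruct 1.1 was used to draw the population clustering plots.
The results are shown in Figure 5 (K = 3). The 81 nsSNP sites distinguish the three major populations of Africa, East Asia and Europe.
2. Population inference analysis
For 19 Han samples (CHH) of known ancestry, the random match probability (Matching Probability, MP) was calculated against the 1000 Genomes database with forensic intelligence software, and the possible continental origin was assessed on the basis of the likelihood ratio, with a likelihood ratio greater than 100 taken to indicate the most likely ancestral origin. The likelihood ratio (LR) is calculated as follows: the matching probability of the population in which the unknown individual's probability is highest is the denominator and the matching probability of each other population is the numerator, giving the likelihood ratio for each population in turn. The population matching probability is the estimated probability that a specific genotype of a site combination occurs in a population; it can also be understood as the theoretical probability that a sample drawn at random from the population shows that specific DNA profile. The likelihood ratio quantifies the evidential value of the population matching probabilities and is used to infer ancestral origin.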
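The matching probability and likelihood ratio can be sketched as follows. Several assumptions are made for the illustration: genotype probabilities follow Hardy-Weinberg proportions, sites are treated as independent, allele frequencies are clamped away from 0 and 1 to avoid zero probabilities, and the ratio is oriented with the most probable population in the numerator so that a threshold of "greater than 100" is meaningful. The frequencies and the profile are invented, and the function names are hypothetical; the forensic software used in the source is not reproduced here.

```python
from math import prod
from typing import Dict, List, Tuple


def genotype_probability(dosage: int, alt_freq: float) -> float:
    """Hardy-Weinberg probability of carrying `dosage` alternate alleles."""
    p = min(max(alt_freq, 1e-3), 1.0 - 1e-3)  # clamp to avoid zero probabilities
    q = 1.0 - p
    return (q * q, 2 * p * q, p * p)[dosage]


def matching_probabilities(
    profile: List[int], freqs: Dict[str, List[float]]
) -> Dict[str, float]:
    """Probability of the whole profile in each population (independent sites)."""
    return {
        pop: prod(genotype_probability(g, f) for g, f in zip(profile, site_freqs))
        for pop, site_freqs in freqs.items()
    }


def likelihood_ratios(mp: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Return the most probable population and its LR against each other one."""
    best = max(mp, key=mp.get)
    return best, {pop: mp[best] / value for pop, value in mp.items() if pop != best}


# Invented alternate-allele frequencies at three sites and one test profile.
example_freqs = {
    "Africa": [0.1, 0.8, 0.2],
    "East Asia": [0.9, 0.1, 0.7],
    "Europe": [0.5, 0.5, 0.5],
}
example_profile = [2, 0, 2]
mp = matching_probabilities(example_profile, example_freqs)
best_population, lrs = likelihood_ratios(mp)
print(best_population, lrs)
```

With only three invented sites the ratio against one population already exceeds 100 while the other does not, which shows why a larger panel is needed before every pairwise comparison becomes decisive.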
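The matching probability results for the 19 Han (CHH) test samples are shown in Table 3, and the calculated ancestry components of the samples are shown in Table 4. For all 19 test samples the largest ancestry component was East Asian, and the inferred ancestral origin agreed with the sample information in every case. The 81 nsSNP sites screened by the present invention therefore inferred the ancestral origin of the test samples with 100% accuracy.
Table 3. Matching probability results for the 19 Han (CHH) test samples
Table 4. Ancestral origin analysis results for the 19 test samples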

Claims (10)

1. An nsSNP site combination for distinguishing the three major populations of Africa, East Asia and Europe, wherein the nsSNP site combination consists of the 527 nsSNP sites enumerated in the Description (rs111433922 through rs142447204, in the order listed there).
2. A product for distinguishing the three major populations of Africa, East Asia and Europe, comprising substances for detecting the genotypes of the 527 nsSNP sites of claim 1.
3. The product according to claim 2, characterized in that the substances for detecting the genotypes of the 527 nsSNP sites of claim 1 are reagents and/or instruments for detecting the genotypes of the 527 nsSNP sites of claim 1.
4. Use of the nsSNP site combination of claim 1 or the product of claim 2 or 3 in distinguishing the three major populations of Africa, East Asia and Europe.
5. Use of the nsSNP site combination of claim 1 or the product of claim 2 or 3 in constructing a genotyping database of the three major populations of Africa, East Asia and Europe.
6. A method for constructing a genotyping database of the three major populations of Africa, East Asia and Europe, comprising the following steps:
(a1) selecting the African, East Asian and European populations from the 1000 Genomes database and forming an original genotype library from their genotypes at the 527 nsSNP sites of claim 1;
(a2) performing STRUCTURE cluster analysis on all samples in the original genotype library and selecting those whose principal ancestry component is greater than 90%; these samples constitute the genotyping database of the three major populations of Africa, East Asia and Europe.
7. A method for distinguishing the three major continental populations of Africa, East Asia and Europe, comprising the following steps:
(b1) constructing a genotyping database of the three major populations of Africa, East Asia and Europe according to the method of claim 6;
(b2) extracting genomic DNA from the subject and genotyping the 527 nsSNP sites to obtain the subject's genotypes at the 527 nsSNP sites;
(b3) comparing the subject's genotypes at the 527 nsSNP sites with the genotyping database of the three major populations of Africa, East Asia and Europe, thereby determining which of the African, East Asian and European populations the subject belongs to.
8. A method for screening, on the basis of the hair shaft proteome, an nsSNP site combination for population inference, comprising the following steps:
(c1) extracting the hair shaft proteome of each individual to be tested, analysing the hair shaft proteomes by mass spectrometry, and screening to obtain specific peptides;
(c2) aligning the specific peptides with the reference protein sequences in an SAP reference protein database and screening to obtain specific peptides containing SAP sites; and locating the SAP sites to obtain the name of the protein in which each SAP site lies and its position;
(c3) associating the protein name and position of each SAP site with the protein names and positions of SNP sites in the 1000 Genomes database; if the protein name and position of an SAP site are the same as those of an SNP site, and the base change at that SNP site causes the amino acid change at the SAP site, then that SNP site is an nsSNP site.
9. The method according to claim 8, characterized in that: in (c1), peptides with an FDR less than or equal to 1% are selected as the filter parameter for high-confidence qualitative protein identification, and the specific peptides are obtained by this screening;
and/or, (c3) further comprises a step of deleting linked sites from the nsSNP site combination.
10. The method according to claim 8 or 9, characterized in that: the population inference is inference among the three major populations of Africa, East Asia and Europe;
and/or, the nsSNP site combination consists of the 527 nsSNP sites listed in claim 1.
6、rs145658539、rs6061066、rs3746609、rs17301126、rs41293138、rs200948404、rs17856024、rs2231619、rs61750835、rs36068952、rs78386672、rs45486695、rs61750208、rs2830585、rs141102396、rs113360916、rs3804010、rs61748317、rs16986753、rs73901140、rs113504861、rs151147550、rs117415039、rs115002444、rs74429119、rs79258920、rs16987932、rs78121368、rs61753641、rs76994627、rs181516402、rs233252、rs465279、rs111668637、rs411254、rs140821764、rs73909208、rs79740360、rs462007、rs78191358、rs78821735、rs73909210、rs34302939、rs61745911、rs7277175、rs201439546、rs115031369、rs61742280、rs112405400、rs133072、rs147348682、rs191014345、rs61730105、rs76321736、rs3796375、rs2228561、rs17080284、rs138055453、rs57006145、rs140995238、rs116174869、rs77141175、rs77299600、rs5955、rs144811342、rs61995956、rs186892593、rs115253144、rs17029215、rs3811813、rs10513155、rs76155491、rs73757391、rs147178651、rs148509798、rs77499935、rs181914313、rs149861653、rs35610885、rs150956127、rs146522449、rs2278371、rs61743236、rs6872614、rs145827614、rs77767937、rs112465391、rs77758574、rs2076299、rs28763966、rs28763967、rs6929069、rs1225746、rs34286843、rs138815183、rs73736234、rs9261293、rs199834022、rs41293883、rs45624537、rs145921744、rs61746206、rs2621330、rs2070121、rs60336135、rs115292676、rs11969595、rs111265263、rs138694074、rs1676015、rs2227885、rs2295005、rs9478144、rs141119961、rs4716346、rs16901311、rs185762794、rs11548791、rs5743342、rs150151168、rs73692834、rs145942606、rs10256、rs114926839、rs2437100、rs10953934、rs1062154、rs114560708、rs73463436、rs61745481、rs149880251、rs72475803、rs35781576、rs114896954、rs148249848、rs145786248、rs76489557、rs150147780、rs116816681、rs7013127、rs117589117、rs11539895、rs540473、rs34250374、rs35791393、rs16929374、rs146467307、rs3750501、rs7025814、rs114612810、rs144749820、rs76003300、rs145771944、rs1538660、rs142111180、rs76057724、rs144181457、rs3812561、rs7850438、rs139415880、rs16997659、rs17147624、rs17847095、rs41306133、rs144825978、rs138895359和rs142447204。
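As an illustrative sketch only (not part of the claimed method), the enumerated site combination could be loaded into a lookup structure and compared against the rs identifiers detected in a sample. The function names `parse_panel` and `panel_hits` and the short excerpt used as input are assumptions made for illustration; the sketch also assumes that no identifier is broken across lines in the input text.

```python
import re

# An rs identifier is the literal "rs" followed by ASCII digits.
RS_ID = re.compile(r"rs[0-9]+")


def parse_panel(claim_text: str) -> list[str]:
    """Extract rs identifiers from the claim text in order, dropping repeats."""
    seen: set[str] = set()
    panel: list[str] = []
    for rs_id in RS_ID.findall(claim_text):
        if rs_id not in seen:
            seen.add(rs_id)
            panel.append(rs_id)
    return panel


def panel_hits(panel: list[str], observed: set[str]) -> list[str]:
    """Return the panel sites that were also observed, in panel order."""
    return [rs_id for rs_id in panel if rs_id in observed]


# Hypothetical usage with a four-site excerpt of the enumerated list.
excerpt = "rs111433922, rs35340855, rs74058627 and rs142447204"
panel = parse_panel(excerpt)
hits = panel_hits(panel, {"rs35340855", "rs999"})
```

Keeping the panel as an ordered list while testing membership against a set preserves the order of the enumeration and makes the overlap check cheap even for the full site combination.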
CN201810414456.0A 2018-05-03 2018-05-03 Method for inferring population origin based on hair shaft proteome nsSNPs Active CN110438235B (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810414456.0A CN110438235B (zh) 2018-05-03 2018-05-03 Method for inferring population origin based on hair shaft proteome nsSNPs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810414456.0A CN110438235B (zh) 2018-05-03 2018-05-03 Method for inferring population origin based on hair shaft proteome nsSNPs

Publications (2)

Publication Number Publication Date
CN110438235A (zh) 2019-11-12
CN110438235B (zh) 2022-06-28

Family

ID=68427732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810414456.0A Active CN110438235B (zh) 2018-05-03 2018-05-03 Method for inferring population origin based on hair shaft proteome nsSNPs

Country Status (1)

Country Link
CN (1) CN110438235B (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112233724A (zh) * 2020-10-16 2021-01-15 深圳市盛景基因生物科技有限公司 Ancestry polymorphism prediction method based on a big-data artificial intelligence algorithm

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110236918A1 (en) * 2010-03-24 2011-09-29 Glendon John Parker Methods for conducting genetic analysis using protein polymorphisms

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
GLENDON J. PARKER et al.: "Demonstration of Protein-Based Human Identification Using the Hair Shaft Proteome", PLOS ONE *
KATELYN ELIZABETH MASON et al.: "Protein-based forensic identification using genetically variant peptides in human bone", FORENSIC SCIENCE INTERNATIONAL *
LAWRENCE LIVERMORE NATIONAL LABORATORY: "A new role for hair in human identification", PROTEINS FOR IDENTIFICATION *
SEVTAP SAVAS et al.: "A comprehensive catalogue of functional genetic variations in the EGFR pathway: Protein–protein interaction analysis reveals novel genes and polymorphisms important for cancer research", INT. J. CANCER *
SU Zhiduan (苏智端) et al.: "Molecular basis of population variation: from single nucleotide polymorphisms to single amino acid polymorphisms", Scientia Sinica Vitae (《中国科学:生命科学》) *
GAO Lixia (高丽霞) et al.: "Application of proteomics in research on skin hair follicle development", Animal Husbandry and Feed Science (《畜牧与饲料科学》) *

Also Published As

Publication number Publication date
CN110438235B (zh) 2022-06-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant