CN109852701B - Composite system for family source inference and inference method and application thereof

Info

Publication number
CN109852701B
CN109852701B CN201811617609.8A CN201811617609A CN109852701B CN 109852701 B CN109852701 B CN 109852701B CN 201811617609 A CN201811617609 A CN 201811617609A CN 109852701 B CN109852701 B CN 109852701B
Authority
CN
China
Prior art keywords
primer
final concentration
site
base extension
artificial sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811617609.8A
Other languages
Chinese (zh)
Other versions
CN109852701A (en)
Inventor
梁伟波
张�林
屈胜秋
朱镜
王胤吉
尹璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201811617609.8A
Publication of CN109852701A
Application granted
Publication of CN109852701B

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a composite system for family source inference, together with an inference method and an application thereof. The composite system comprises composite primers for amplifying 20 AIMs sites and single-base extension primers in one-to-one correspondence with the 20 AIMs sites; the sequences of the composite primers are shown in SEQ ID NO. 1-40, and the sequences of the single-base extension primers are shown in SEQ ID NO. 41-60. By combining 20 specific AIMs loci, the invention forms a detection system that can effectively perform multiplex amplification on unknown individuals.

Description

Composite system for family source inference and inference method and application thereof
Technical Field
The invention belongs to the technical field of forensic detection, and particularly relates to a composite system for family source inference, and an inference method and application thereof.
Background
Family source inference (ancestry inference) estimates the geographic origin of a sample or individual of unknown origin, or the proportions of components of different geographic origin in its genetic information, by evaluating a set of ancestry informative markers (AIMs). In forensic practice, such inference can provide additional leads when the database contains no data matching the suspect. In recent years, many site combinations (panels) that use single nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (InDels) for ancestry inference have been reported. Most of these panels can distinguish three to five populations, generally ancestral origins in different continental regions such as Africa, Europe, Asia and America. However, few panels differentiate subgroups within a region, and the panels reported so far for subgroup ancestry inference were obtained by further screening within existing panels. To differentiate subgroups more finely, new combinations of secondary ancestry informative markers are therefore needed.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention provides a composite system for family source inference, together with an inference method and an application thereof, which enable family source inference for more finely differentiated subgroups.
To achieve this purpose, the invention adopts the following technical scheme:
A composite system for family source inference comprises composite primers for amplifying 20 AIMs sites and single-base extension primers in one-to-one correspondence with the 20 AIMs sites; the loci are rs200189385, rs76102765, rs201197181, rs3827760, rs17034666, rs922452, rs1426654, rs57108441, rs12441154, rs12440301, rs1834640, rs72643557, rs72643559, rs434124, rs10444771, rs174592, rs1071803, rs9805939, rs34505674 and rs59950930;
The sequences of the composite primers used for amplifying the 20 AIMs sites are as follows:
rs1071803-F:AGACCTACATCTGCAACGTGA;(SEQ ID NO.1)
rs1071803-R:CTGGACTGGGGCTGCATAG;(SEQ ID NO.2)
rs34505674-F:ACAGAACAGAAGTGCCAGGA;(SEQ ID NO.3)
rs34505674-R:GGACAAGTGACCTCAGGCAT;(SEQ ID NO.4)
rs12440301-F:TAGTGCACACCTACCCCAAG;(SEQ ID NO.5)
rs12440301-R:AGCCAATTCTAGACTCCACCT;(SEQ ID NO.6)
rs1426654-F:TCAGCCCTTGGATTGTCTCA;(SEQ ID NO.7)
rs1426654-R:TGCCAATATCTCCCTTTGTGAT;(SEQ ID NO.8)
rs174592-F:CGCCGAGAAAATGAAGCCTT;(SEQ ID NO.9)
rs174592-R:TCTGAGCAGGAACGTGGATT;(SEQ ID NO.10)
rs72643559-F:TTTTGTCCCGCTAGTGTCCA;(SEQ ID NO.11)
rs72643559-R:GGCCGAGAGACTACGTTTTC;(SEQ ID NO.12)
rs200189385-F:GAACAATGTGACCTCTAGTACCA;(SEQ ID NO.13)
rs200189385-R:ATAACCCCAAAATGCCCAGC;(SEQ ID NO.14)
rs59950930-F:TGTGGGGTGTGCATGACTAT;(SEQ ID NO.15)
rs59950930-R:CTAGAGGAGGGGTCTGTGC;(SEQ ID NO.16)
rs10444771-F:GTCTCAAGCACACATTCCCC;(SEQ ID NO.17)
rs10444771-R:CAGCTCCTAGATTTTGGCAGG;(SEQ ID NO.18)
rs12441154-F:CACAATTGCCTCACGGATGA;(SEQ ID NO.19)
rs12441154-R:CAGCACAGACCGACAAAGAC;(SEQ ID NO.20)
rs3827760-F:TGCCATTTGATTGCCTCGAG;(SEQ ID NO.21)
rs3827760-R:AAAGAGTTGCATGCCGTCTG;(SEQ ID NO.22)
rs922452-F:GGTGTCCATCAGTACCCGAA;(SEQ ID NO.23)
rs922452-R:GGAGGCATCAAGATTCCAGC;(SEQ ID NO.24)
rs201197181-F:GACATTCAGATTGGTGGCCC;(SEQ ID NO.25)
rs201197181-R:GAAGGAGAAAGAAGCCCACG;(SEQ ID NO.26)
rs72643557-F:AGGCTGGTCTTGAACTGAGA;(SEQ ID NO.27)
rs72643557-R:CACCACTGACATTGCCCAAT;(SEQ ID NO.28)
rs9805939-F:GTTTGGATGTGCGTCTCTCC;(SEQ ID NO.29)
rs9805939-R:GAGAAGAAAGTGGCACGTCC;(SEQ ID NO.30)
rs76102765-F:AAGACACGTTTAGACAATCAGTG;(SEQ ID NO.31)
rs76102765-R:TGACTTGTGACAGTTCCAACAG;(SEQ ID NO.32)
rs17034666-F:CCTGGCCGGGAAGATAATCT;(SEQ ID NO.33)
rs17034666-R:TGAGACAACCAAGGCACAGA;(SEQ ID NO.34)
rs1834640-F:TTCGCGTTGTGTCATCCTTG;(SEQ ID NO.35)
rs1834640-R:GTCAGATGCAGGTTGAAGCT;(SEQ ID NO.36)
rs57108441-F:TCTGGCCATTCTCTCAGGAC;(SEQ ID NO.37)
rs57108441-R:GCAGGGGCCTCAGATTCTTA;(SEQ ID NO.38)
rs434124-F:GCATCCTGACCCCTGAGATT;(SEQ ID NO.39)
rs434124-R:GAACCCAGCCTTTGAGTGTG;(SEQ ID NO.40)
The sequences of the single-base extension primers are as follows:
rs12440301:gactgactTTGGACAACAGTTTTTTAGA;(SEQ ID NO.41)
rs1426654:actgactgactgactgactGTCTCAGGATGTTGCAGGC;(SEQ ID NO.42)
rs1071803:gactgactgactgactgactgactCAACACCAAGGTGGACAAGA;(SEQ ID NO.43)
rs34505674:actgactgactgactgactgactgactgactgactgactgactgactgactTCAGCCATGGGTTAAGAGCT;(SEQ ID NO.44)
rs12441154:gactgactgactgactgactgactGCAGAATGAGGCAGTGAAAT;(SEQ ID NO.45)
rs72643559:gactgactgactgactgactgactTCTTCGTAATTCTAGTTGTG;(SEQ ID NO.46)
rs174592:gactgactgactgactgactgactgactGGCCTCAAGTTCATCAGTTC;(SEQ ID NO.47)
rs10444771:ctgactgactgactgactgactgactgactgactCCTTCTCCATCCCAGCTC;(SEQ ID NO.48)
rs59950930:actgactgactgactgactgactgactgactgactgactgactGTGCAGCCCGGGTGACA;(SEQ ID NO.49)
rs200189385:gactgactgactgactgactgactgactgactgactgactgactgactgactTTGCTTTTACCTAGGGTTAC;(SEQ ID NO.50)
rs3827760:ctgactgactGTACAACTCTGAGAAGGCTG;(SEQ ID NO.51)
rs9805939:tgactgactgactAGACTTTTAAACAACCAGCT;(SEQ ID NO.52)
rs17034666:ccccTTTAATGCTATTATTAAAATTAGTTGTGAG;(SEQ ID NO.53)
rs72643557:gactgactgactgactgactgactgactGCCAAGGTGGCC;(SEQ ID NO.54)
rs1834640:ctgactgactCTATGGCATTGATTATTCCTTGGTTTATGT;(SEQ ID NO.55)
rs201197181:tgactgactgactgactgactAAGAATTCATTCTTCTTCTTCTT;(SEQ ID NO.56)
rs922452:ctgactgactgactgactgactgactgactGGAGCTCACCACCCTT;(SEQ ID NO.57)
rs57108441:tgactgactgactgactgactgactgactgactCAACTTGGACTCCCACCTG;(SEQ ID NO.58)
rs76102765:gactgactgactgactgactgactgactgactgactAACAGAATAGGGAATGATCT;(SEQ ID NO.59)
rs434124:gactgactgactgactgactgactgactgactgactgactCTTGAAGGGACTTAACTCCA.(SEQ ID NO.60)
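The extension primers above are written with a lowercase 5' prefix followed by an uppercase 3' part. The following minimal Python sketch splits three of them (SEQ ID NO. 41, 42 and 54) into those two parts and reports the lengths. It assumes that the lowercase prefix is a spacer tail used to stagger primer lengths and that the uppercase part is the locus-specific sequence; the text does not state this explicitly, and the function names are illustrative only.

```python
import re

# Three single-base extension primers copied from the list above
# (SEQ ID NO. 41, 42 and 54).
SBE_PRIMERS = {
    "rs12440301": "gactgactTTGGACAACAGTTTTTTAGA",
    "rs1426654": "actgactgactgactgactGTCTCAGGATGTTGCAGGC",
    "rs72643557": "gactgactgactgactgactgactgactGCCAAGGTGGCC",
}


def split_primer(seq):
    """Split a primer into (lowercase 5' prefix, uppercase 3' part)."""
    match = re.fullmatch(r"([a-z]*)([A-Z]+)", seq)
    if match is None:
        raise ValueError(f"unexpected primer layout: {seq!r}")
    return match.group(1), match.group(2)


def primer_summary(primers):
    """Return {locus: (prefix length, uppercase length, total length)}."""
    summary = {}
    for locus, seq in primers.items():
        tail, specific = split_primer(seq)
        summary[locus] = (len(tail), len(specific), len(seq))
    return summary


summary = primer_summary(SBE_PRIMERS)
for locus, (tail_len, spec_len, total) in summary.items():
    print(f"{locus}: tail={tail_len} nt, specific={spec_len} nt, total={total} nt")
```

The total lengths computed this way can be compared with the <211> entries of the sequence listing as a quick transcription check.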
Further, within the composite primers, the forward primer and the reverse primer of each site have the same final concentration; the specific concentrations are shown in Table 2.
Further, the final concentrations of the single-base extension primers are shown in Table 3.
A method for family source inference using the above composite system comprises the following steps:
(1) extracting the DNA to be tested;
(2) using the DNA to be tested as a template, performing PCR amplification with the composite primers described above, and purifying the amplification product;
(3) performing multiplex extension on the purified amplification product with the single-base extension primers described above to obtain a typing result, and comparing the typing result with data in a standard database.
Further, in step (2), the amplification system comprises 5 μL of 2× Multiplex PCR Master Mix, 1 μL of the composite primers and 1 μL of DNA template at 1 ng/μL, made up to 10 μL with double-distilled water.
Further, in step (2), the amplification conditions are: pre-denaturation at 95 °C for 15 min; 30 cycles of denaturation at 94 °C for 30 s, annealing at 53-60 °C for 90 s and extension at 72 °C for 20 s; and a final extension at 72 °C for 10 min.
Further, the purification reaction system in step (2) comprises 5 μL of the amplification product, 2.5 μL of rSAP at 1 U/μL and 1.5 μL of ExoI at 2 U/μL.
Further, in step (2), the purification conditions are incubation at 37 °C for 60 min, followed by treatment at 80 °C for 15 min to inactivate the enzymes.
Further, the reaction system in step (3) comprises 1.5 μL of the amplification product, 0.5 μL of the single-base extension primers and 1.5 μL of SNaPshot™ Multiplex Ready Reaction Mix.
Further, in step (3), the reaction conditions are 25 cycles of denaturation at 96 °C for 10 s, annealing at 59 °C for 5 s and extension at 60 °C for 10 s.
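The volume of double-distilled water in the 10 μL amplification system is not stated directly, but it follows from the listed components. The sketch below works it out and scales a template-free premix for a batch of reactions. The component volumes come from the text; the batch size, the 10% pipetting overage and all names are assumptions made for illustration.

```python
# Component volumes (microlitres) for one multiplex PCR reaction, as stated in the text.
TEMPLATE_KEY = "DNA template (1 ng/uL)"
PCR_COMPONENTS_UL = {
    "2x Multiplex PCR Master Mix": 5.0,
    "composite primers": 1.0,
    TEMPLATE_KEY: 1.0,
}
PCR_FINAL_VOLUME_UL = 10.0


def water_volume(components, final_volume):
    """Volume of double-distilled water needed to reach the final volume."""
    water = final_volume - sum(components.values())
    if water < 0:
        raise ValueError("components exceed the final reaction volume")
    return water


def premix_for_batch(components, final_volume, n_reactions, overage=0.10):
    """Premix volumes (everything except template) for a batch of reactions.

    The overage fraction is a hypothetical allowance for pipetting loss.
    """
    per_reaction = {k: v for k, v in components.items() if k != TEMPLATE_KEY}
    per_reaction["double-distilled water"] = water_volume(components, final_volume)
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


water = water_volume(PCR_COMPONENTS_UL, PCR_FINAL_VOLUME_UL)
premix = premix_for_batch(PCR_COMPONENTS_UL, PCR_FINAL_VOLUME_UL, n_reactions=8)
print(f"water per reaction: {water} uL")
print(premix)
```

With this layout, 9 μL of premix plus 1 μL of template gives the 10 μL reaction described above.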
A kit for performing family source inference, comprising the above-described composite system.
More specifically, the kit of the invention specifically comprises the following components:
a) multiplex amplification reaction mix: contains common components such as PCR buffer, MgCl2, dNTPs and DNA polymerase;
b) composite primers: the sequences are shown in SEQ ID NO. 1-40;
c) amplification product purification reagents: contain common components such as exonuclease I (ExoI) and its buffer (ExoI Buffer), and shrimp alkaline phosphatase (SAP) and its buffer (SAP Buffer); these purify the product of the multiplex amplification for the next step;
d) single-base extension primers: the sequences are shown in SEQ ID NO. 41-60;
e) single-base extension reaction mix: contains common components such as DNA polymerase, buffer, MgCl2 and fluorescently labeled dideoxynucleotides;
The multiplex amplification reaction mix and the amplification product purification reagents can be prepared according to formulations common in the field or according to a molecular biology manual, or commercial products can be used directly; the DNA template can be extracted from the sample to be tested with any of the conventional reagents currently used in the field, following conventional methods.
The invention has the beneficial effects that:
1. The combination of 20 specific AIMs loci of the present invention was screened by the applicants from the 1000 Genomes Project database by comparing, between every two populations within Asia, the allele frequencies of genome-wide autosomal biallelic SNPs and InDels, rather than being derived from pre-existing locus combinations.
2. Using the combination of 20 specific AIMs sites, the present invention can not only divide 30 populations (26 populations of the 1000 Genomes Project and 4 test populations) into 9 population sources, namely Africa, sub-Saharan Africa, South Asia, admixed Africa, Europe, America, eastern East Asia, central East Asia and Southeast Asia, but can also divide the 14 populations considered for Asia alone (10 populations of the 1000 Genomes Project and 4 test populations) into 5 population sources, namely Europe, South Asia, eastern East Asia, central East Asia and Southeast Asia. In particular, it achieves further differentiation within the East Asian populations, and can therefore provide important leads in actual forensic cases involving East Asian populations that are geographically close to one another.
3. Using the combination of 20 specific AIMs sites, the method and system of the present invention constitute a detection system capable of effective multiplex amplification on unknown individuals.
Drawings
FIG. 1 shows the results of population composition analysis of 30 populations, obtained with the STRUCTURE software for the 20 specific AIMs sites of the present application;
FIG. 2 shows the results of population composition analysis of 14 Asian populations, obtained with the STRUCTURE software for the 20 specific AIMs sites of the present application;
FIG. 3 shows the results of SNaPshot detection for the combination of 20 AIMs sites;
FIG. 4 shows the results of principal component analysis of the combination of 20 specific AIMs sites for 14 Asian populations; FIG. 4a plots PC1 against PC2, and FIG. 4b plots PC1 against PC3.
Detailed Description
The following description of embodiments is provided to help those skilled in the art understand the present invention. It should be understood, however, that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined in the appended claims, and everything created using the inventive concept is protected.
Example 1 screening of loci
First, allele frequencies of genome-wide autosomal biallelic SNPs and InDels were compared between every two of the 10 Asian populations of the 1000 Genomes Project, and 20 specific AIMs loci were screened out, namely rs200189385, rs76102765, rs201197181, rs3827760, rs17034666, rs922452, rs1426654, rs57108441, rs12441154, rs12440301, rs1834640, rs72643557, rs72643559, rs434124, rs10444771, rs174592, rs1071803, rs9805939, rs34505674 and rs59950930; the site details are shown in Table 1.
Table 1. Information on the 20 AIMs sites
dbSNP(rs#) Chromosome (Chr) Physical location
rs200189385 11 100828842
rs76102765 1 73016941
rs201197181 9 132391683
rs3827760 2 109513601
rs17034666 2 109571508
rs922452 2 109543883
rs1426654 15 48426484
rs57108441 15 48394586
rs12441154 15 48390956
rs12440301 15 48389924
rs1834640 15 48392165
rs72643557 11 61579427
rs72643559 11 61620274
rs434124 19 54809336
rs10444771 14 106112575
rs174592 11 61618608
rs1071803 14 106209119
rs9805939 14 106219891
rs34505674 14 106270813
rs59950930 14 106031934
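The pairwise frequency comparison described in this example can be sketched as follows. The text does not name the statistic or the cutoff used for screening, so the absolute allele frequency difference and the 0.5 threshold below are assumptions, and the population names and frequencies are invented for illustration.

```python
from itertools import combinations

# Hypothetical frequencies of one allele at two loci in three populations.
# These numbers are invented for illustration and are NOT taken from the patent.
FREQS = {
    "locus_A": {"POP1": 0.90, "POP2": 0.15, "POP3": 0.50},
    "locus_B": {"POP1": 0.40, "POP2": 0.45, "POP3": 0.42},
}


def pairwise_delta(freq_by_pop):
    """Absolute allele frequency difference for every pair of populations."""
    return {
        (a, b): abs(freq_by_pop[a] - freq_by_pop[b])
        for a, b in combinations(sorted(freq_by_pop), 2)
    }


def screen_loci(freqs, threshold):
    """Keep loci whose largest pairwise difference reaches the threshold."""
    kept = {}
    for locus, freq_by_pop in freqs.items():
        deltas = pairwise_delta(freq_by_pop)
        best_pair, best_delta = max(deltas.items(), key=lambda kv: kv[1])
        if best_delta >= threshold:
            kept[locus] = (best_pair, round(best_delta, 3))
    return kept


selected = screen_loci(FREQS, threshold=0.5)
print(selected)
```

A locus such as the hypothetical locus_B, whose frequencies are nearly equal across populations, carries little ancestry information and is dropped by this kind of filter.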
Example 2 primer design
Based on the site information obtained by screening, the composite primers for amplifying the 20 AIMs sites and the single-base extension primers in one-to-one correspondence with the 20 AIMs sites were designed with the online tool Primer3 (http://primer3.ut.ee/); details of the composite primers are shown in Table 2, and details of the single-base extension primers are shown in Table 3.
Table 2. Details of the composite primers
[Table 2 is provided as an image in the original document: Figure BDA0001926124280000081]
Table 3. Details of the single-base extension primers
[Table 3 is provided as images in the original document: Figures BDA0001926124280000082 and BDA0001926124280000091]
Example 3 partitioning principle of sample sources and family sources to be tested
The reference populations used in the present invention are the 26 populations (N = 2504) of the 1000 Genomes Project (http://www.1000genomes.org/). The test samples are 4 populations (N = 189) from the forensic physical evidence laboratory research group of the School of Basic Medicine and Forensic Medicine, Sichuan University: 45 Thai samples (THD) from Bangkok, Thailand; 49 Han samples (SCH) from Sichuan, China; 49 Tibetan samples (TIB) from Xining, Qinghai; and 46 Yi samples (YI) from Liangshan, Sichuan. The DNA of the 1000 Genomes samples was obtained from cell lines, and the DNA of the research group's samples was obtained from blood. The collection of all samples was supervised under the applicable regulations, and the donors signed informed consent and reported their ancestry information.
FIG. 1 shows the results of population composition analysis of 30 populations, obtained with the STRUCTURE software for the 20 specific AIMs sites of the present application. At K = 7, this locus combination distinguishes 9 population sources: Africa, sub-Saharan Africa, South Asia, admixed Africa, Europe, America, eastern East Asia, central East Asia and Southeast Asia. FIG. 2 shows the results of population composition analysis of 14 Asian populations, obtained with the STRUCTURE software for the same 20 sites. At K = 4, the 14 populations considered for Asia alone can be divided into 5 population sources: Europe, South Asia, eastern East Asia, central East Asia and Southeast Asia.
Example 4 family source inference detection procedure
1. Venous blood was collected from 189 unknown individuals as experimental samples, and DNA was extracted using a BioTeke DNA kit (BioTeke, China). The 10 Asian populations of the 1000 Genomes Project served as reference data, and the 4 test populations from our laboratory served as unknown samples; sample details are shown in Table 4.
Table 4. Sample information
No. Region Population abbreviation Sample size Source
1 SAS BEB 86 1000Genomes
2 SAS ITU 102 1000Genomes
3 SAS STU 102 1000Genomes
4 SAS PJL 96 1000Genomes
5 SAS GIH 103 1000Genomes
6 EAS JPT 104 1000Genomes
7 EAS CHB 103 1000Genomes
8 EAS CHS 105 1000Genomes
9 EAS SCH 50 our lab
10 EAS TIB 50 our lab
11 EAS YI 50 our lab
12 SEA KHV 99 1000Genomes
13 SEA CDX 93 1000Genomes
14 SEA THD 45 our lab
2. Family source inference was performed on the samples using the composite system of the invention, as follows:
(1) With the extracted DNA as template and the primers of SEQ ID NO. 1-40, multiplex PCR was performed on a GeneAmp 9700 thermal cycler (PCR amplification instrument). The amplification system comprised 5 μL of 2× Multiplex PCR Master Mix, 1 μL of the composite primers and 1 μL of DNA template at 1 ng/μL, made up to 10 μL with double-distilled water. The amplification conditions were: pre-denaturation at 95 °C for 15 min; 30 cycles of denaturation at 94 °C for 30 s, annealing at 53-60 °C for 90 s and extension at 72 °C for 20 s; and a final extension at 72 °C for 10 min;
(2) The amplification product was purified. The purification reaction system comprised 5 μL of the amplification product, 2.5 μL of rSAP at 1 U/μL and 1.5 μL of ExoI at 2 U/μL; it was incubated at 37 °C for 60 min and then treated at 80 °C for 15 min to inactivate the enzymes;
(3) An extension reaction system was prepared with the SNaPshot Multiplex kit (ABI, USA), and the purified amplification product was subjected to multiplex extension with the single-base extension primers. The reaction system comprised 1.5 μL of the purified amplification product, 0.5 μL of the single-base extension primers and 1.5 μL of SNaPshot™ Multiplex Ready Reaction Mix; the reaction conditions were 25 cycles of denaturation at 96 °C for 10 s, annealing at 59 °C for 5 s and extension at 60 °C for 10 s. Then 1 μL of rSAP (1 U/μL) was added to the reaction product and mixed, and the mixture was incubated at 37 °C for 60 min and then at 80 °C for 15 min to remove excess primers and dNTPs.
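As a rough planning aid, the programmed hold times of the two cycling steps above can be added up. This sketch assumes that the 30 cycles of the multiplex PCR apply only to the denaturation, annealing and extension steps, with the pre-denaturation and final extension performed once; instrument ramp times are ignored, and the dictionary layout is illustrative.

```python
# Times are in seconds. Temperatures are noted in comments only,
# because they do not affect the summed hold time.
MULTIPLEX_PCR = {
    "initial_hold_s": 15 * 60,      # 95 C pre-denaturation
    "cycle_steps_s": [30, 90, 20],  # 94 C, 53-60 C, 72 C
    "cycles": 30,
    "final_hold_s": 10 * 60,        # 72 C final extension
}
SBE_REACTION = {
    "initial_hold_s": 0,
    "cycle_steps_s": [10, 5, 10],   # 96 C, 59 C, 60 C
    "cycles": 25,
    "final_hold_s": 0,
}


def hold_time_seconds(program):
    """Sum of programmed hold times, excluding temperature ramping."""
    cycled = program["cycles"] * sum(program["cycle_steps_s"])
    return program["initial_hold_s"] + cycled + program["final_hold_s"]


pcr_seconds = hold_time_seconds(MULTIPLEX_PCR)
sbe_seconds = hold_time_seconds(SBE_REACTION)
print(f"multiplex PCR: {pcr_seconds / 60:.1f} min of hold time")
print(f"single-base extension: {sbe_seconds / 60:.1f} min of hold time")
```

Actual run times are longer than these sums because the thermal cycler needs time to change temperature between steps.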
3. Typing of the purified single-base extension product
The purified extension product was analyzed by electrophoresis on a 3130 genetic analyzer (ABI, USA). The 10 μL detection system comprised 1 μL of the purified single-base extension product and 9 μL of a mixture of formamide and GeneScan LIZ-120 internal standard (38:1 by volume) (ABI, USA). The electrophoresis parameters were: injection time 12 s, injection voltage 1.5 kV, electrophoresis voltage 15 kV and electrophoresis time 18 min. The genotypes of the 20 AIMs sites were obtained with GeneMapper ID-X v 1.2.2 software.
Example 5 software and analytical methods
1. Principal Component Analysis (PCA)
Principal component analysis was performed with an R software package. The 30 populations, comprising the 26 populations of the 1000 Genomes Project database and the 4 test populations, were grouped by region, and population-level principal component analysis based on allele frequencies was carried out.
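The idea of allele-frequency-based principal component analysis can be illustrated with a small standard-library sketch. The analysis in the text was done with an R package that is not named; the code below is a Python stand-in that finds only the first principal component by power iteration, and the frequency matrix is invented for illustration.

```python
# Hypothetical allele-frequency matrix: rows are populations, columns are loci.
# The values are invented for illustration and are NOT data from the patent.
POPULATIONS = ["POP1", "POP2", "POP3", "POP4"]
FREQ_MATRIX = [
    [0.90, 0.80, 0.10],
    [0.85, 0.75, 0.20],
    [0.20, 0.30, 0.80],
    [0.15, 0.25, 0.90],
]


def center_columns(matrix):
    """Subtract each column mean so every locus has mean zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def covariance(centered):
    """Sample covariance matrix between loci (columns)."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols] for ci in cols]


def first_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of a covariance matrix by power iteration."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    eigenvalue = sum(v * sum(c * w for c, w in zip(row, vec)) for row, v in zip(cov, vec))
    return vec, eigenvalue


centered = center_columns(FREQ_MATRIX)
cov = covariance(centered)
pc1, eigenvalue = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
for name, score in zip(POPULATIONS, scores):
    print(f"{name}: PC1 score {score:+.3f}")
print(f"PC1 explains {explained:.1%} of the variance")
```

Populations with similar allele frequencies receive similar PC1 scores, which is what lets a scatter plot of the leading components separate population groups.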
2. Cluster analysis
For the 14 populations in Table 4, cluster analysis was performed with STRUCTURE v2.3.4 (K = 3 to 10) to analyze the genetic structure of individuals and populations; STRUCTURE HARVESTER (http://taylor0.biology.ucla.edu/structureHarvester) was used to find the optimal K value, and the population clustering results were then plotted with CLUMPP v1.1.2 and distruct v1.1. STRUCTURE is free, open-source, classical bioinformatics software that applies a model-based clustering method to multi-locus genotype data composed of unlinked markers in order to infer population structure, and it is widely used in human genetics, population genetics, forensic genetics and related fields. The parameters were 100,000 burn-in steps and 100,000 MCMC steps, with 5 replicates.
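STRUCTURE HARVESTER selects K with the Evanno ΔK statistic, which divides the absolute second-order change in the mean log probability, |L(K+1) − 2L(K) + L(K−1)|, by the standard deviation of L(K) across replicate runs. The sketch below computes ΔK from replicate means in that way. All log-probability values are invented for illustration and are not output from the analyses described here.

```python
from statistics import mean, stdev

# Hypothetical ln P(D) values: one list of five replicate runs per K.
# Invented for illustration; NOT results from the patent.
LN_PROB = {
    3: [-5200.0, -5210.0, -5190.0, -5205.0, -5195.0],
    4: [-4800.0, -4802.0, -4798.0, -4801.0, -4799.0],
    5: [-4790.0, -4760.0, -4820.0, -4775.0, -4805.0],
    6: [-4785.0, -4740.0, -4830.0, -4770.0, -4800.0],
}


def delta_k(ln_prob):
    """Delta K for every K that has both neighbours K-1 and K+1 available."""
    means = {k: mean(values) for k, values in ln_prob.items()}
    result = {}
    for k in sorted(ln_prob):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(ln_prob[k])
    return result


dk = delta_k(LN_PROB)
best_k = max(dk, key=dk.get)
for k, value in dk.items():
    print(f"K={k}: delta K = {value:.2f}")
print(f"best K by delta K: {best_k}")
```

ΔK cannot be computed for the smallest and largest K tested, which is one reason to run a range of K values wider than the range of interest.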
Example 6 analysis of results
1. Composite test system results
The results are shown in FIG. 3, the SNaPshot detection results for the combination of 20 AIMs sites. The 20 AIMs sites are divided into 3 multiplex detection systems. System S1 comprises 4 sites: rs12440301, rs1426654, rs1071803 and rs34505674; the typing results of an example sample are shown in FIG. 3a. System S2 comprises 6 sites: rs12441154, rs72643559, rs174592, rs10444771, rs59950930 and rs200189385; the typing results of an example sample are shown in FIG. 3b. System S3 comprises 10 sites: rs3827760, rs9805939, rs17034666, rs72643557, rs1834640, rs201197181, rs922452, rs57108441, rs76102765 and rs434124; the typing results of an example sample are shown in FIG. 3c. The alleles at all 20 AIMs sites could be typed clearly.
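The grouping into three systems can be checked mechanically. The sketch below records each system's sites together with the extension primer lengths given in the <211> entries of SEQ ID NO. 41-60, confirms that the systems cover 20 distinct sites, and lists primers of equal length within a system. How equal-length primers are told apart (for example by the dye of the incorporated nucleotide) is not discussed in the text, so the code only reports them.

```python
# Sites per multiplex system, with each extension primer's total length (nt)
# taken from the <211> entries of SEQ ID NO. 41-60 in the sequence listing.
SYSTEMS = {
    "S1": [("rs12440301", 28), ("rs1426654", 38), ("rs1071803", 44), ("rs34505674", 71)],
    "S2": [("rs12441154", 44), ("rs72643559", 44), ("rs174592", 48),
           ("rs10444771", 52), ("rs59950930", 60), ("rs200189385", 72)],
    "S3": [("rs3827760", 30), ("rs9805939", 33), ("rs17034666", 34), ("rs72643557", 40),
           ("rs1834640", 40), ("rs201197181", 44), ("rs922452", 46), ("rs57108441", 52),
           ("rs76102765", 56), ("rs434124", 60)],
}


def loci_in(systems):
    """Flat list of all site names across the systems."""
    return [locus for members in systems.values() for locus, _ in members]


def same_length_pairs(members):
    """Pairs of sites in one system whose extension primers have equal length."""
    pairs = []
    for i, (locus_a, len_a) in enumerate(members):
        for locus_b, len_b in members[i + 1:]:
            if len_a == len_b:
                pairs.append((locus_a, locus_b))
    return pairs


all_loci = loci_in(SYSTEMS)
sizes = {name: len(members) for name, members in SYSTEMS.items()}
ties = {name: same_length_pairs(members) for name, members in SYSTEMS.items()}
print(sizes, len(set(all_loci)))
print(ties)
```

Within each system the primers are listed in non-decreasing length order, which matches the order in which the sites are named in the text.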
2. Discriminatory potency of 20 specific AIMs site combinations
(1) Cluster analysis
FIG. 2 shows the results of population composition analysis of 14 Asian populations, obtained with the STRUCTURE software for the 20 specific AIMs sites of the present application. At K = 4, the 14 populations considered for Asia can be divided into 5 population sources: Europe, South Asia, eastern East Asia, central East Asia and Southeast Asia.
Table 5 shows the population composition results obtained with the STRUCTURE software for 16 test samples selected by the applicant, giving the population composition of each sample. Taking NO.1 as an example, sample NO.1 of the test population THD has a Southeast Asian component of 0.988, which supports a Southeast Asian origin; the data for NO.2-NO.16 are interpreted in the same way.
Table 5. Analysis of population composition
[Table 5 is provided as an image in the original document: Figure BDA0001926124280000131]
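Reading a row of Table 5 amounts to picking the component with the largest membership coefficient. A minimal sketch of that step follows. Only the 0.988 value for sample NO.1 comes from the text; the other cluster labels and coefficients, the second sample and the 0.8 support threshold are hypothetical.

```python
# Membership coefficients for one sample across four clusters (K = 4).
# Only the 0.988 Southeast Asia value for sample NO.1 comes from the text;
# the remaining labels and numbers are hypothetical placeholders.
SAMPLE_NO1 = {"Southeast Asia": 0.988, "cluster_B": 0.006, "cluster_C": 0.004, "cluster_D": 0.002}
MIXED_SAMPLE = {"Southeast Asia": 0.45, "cluster_B": 0.40, "cluster_C": 0.10, "cluster_D": 0.05}


def assign_cluster(memberships, min_support=0.8):
    """Return (label, coefficient) for the best-supported cluster.

    The label is 'unassigned' when the top coefficient is below min_support,
    which is a hypothetical cutoff, not one given in the patent.
    """
    total = sum(memberships.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"coefficients sum to {total}, expected 1")
    label, value = max(memberships.items(), key=lambda kv: kv[1])
    return (label if value >= min_support else "unassigned"), value


print(assign_cluster(SAMPLE_NO1))
print(assign_cluster(MIXED_SAMPLE))
```

A sample with no dominant component, like the hypothetical mixed sample, is better reported with its full composition than forced into a single population source.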
(2) Principal component analysis
FIG. 4 shows the results of principal component analysis of the combination of 20 specific AIMs sites for 14 Asian populations. Principal component 1 (PC1) and principal component 2 (PC2) together account for 68.26% of the variance, and principal component 3 (PC3) accounts for 14.28%. Taking the first three principal components together, the 20 specific AIMs sites clearly separate the 14 populations into 4 groups, namely South Asia, East Asia, Southeast Asia and GIH, from which the ancestry information of the individual to be tested can be obtained.
Sequence listing
<110> Sichuan university
<120> a complex system for group source inference, inference method and application thereof
<160> 60
<170> SIPOSequenceListing 1.0
<210> 1
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
agacctacat ctgcaacgtg a 21
<210> 2
<211> 19
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
ctggactggg gctgcatag 19
<210> 3
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
acagaacaga agtgccagga 20
<210> 4
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
ggacaagtga cctcaggcat 20
<210> 5
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
tagtgcacac ctaccccaag 20
<210> 6
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 6
agccaattct agactccacc t 21
<210> 7
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 7
tcagcccttg gattgtctca 20
<210> 8
<211> 22
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 8
tgccaatatc tccctttgtg at 22
<210> 9
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 9
cgccgagaaa atgaagcctt 20
<210> 10
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 10
tctgagcagg aacgtggatt 20
<210> 11
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 11
ttttgtcccg ctagtgtcca 20
<210> 12
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 12
ggccgagaga ctacgttttc 20
<210> 13
<211> 23
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 13
gaacaatgtg acctctagta cca 23
<210> 14
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 14
ataaccccaa aatgcccagc 20
<210> 15
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 15
tgtggggtgt gcatgactat 20
<210> 16
<211> 19
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 16
ctagaggagg ggtctgtgc 19
<210> 17
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 17
gtctcaagca cacattcccc 20
<210> 18
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 18
cagctcctag attttggcag g 21
<210> 19
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 19
cacaattgcc tcacggatga 20
<210> 20
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 20
cagcacagac cgacaaagac 20
<210> 21
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 21
tgccatttga ttgcctcgag 20
<210> 22
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 22
aaagagttgc atgccgtctg 20
<210> 23
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 23
ggtgtccatc agtacccgaa 20
<210> 24
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 24
ggaggcatca agattccagc 20
<210> 25
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 25
gacattcaga ttggtggccc 20
<210> 26
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 26
gaaggagaaa gaagcccacg 20
<210> 27
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 27
aggctggtct tgaactgaga 20
<210> 28
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 28
caccactgac attgcccaat 20
<210> 29
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 29
gtttggatgt gcgtctctcc 20
<210> 30
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 30
gagaagaaag tggcacgtcc 20
<210> 31
<211> 23
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 31
aagacacgtt tagacaatca gtg 23
<210> 32
<211> 22
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 32
tgacttgtga cagttccaac ag 22
<210> 33
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 33
cctggccggg aagataatct 20
<210> 34
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 34
tgagacaacc aaggcacaga 20
<210> 35
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 35
ttcgcgttgt gtcatccttg 20
<210> 36
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 36
gtcagatgca ggttgaagct 20
<210> 37
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 37
tctggccatt ctctcaggac 20
<210> 38
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 38
gcaggggcct cagattctta 20
<210> 39
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 39
gcatcctgac ccctgagatt 20
<210> 40
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 40
gaacccagcc tttgagtgtg 20
<210> 41
<211> 28
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 41
gactgacttt ggacaacagt tttttaga 28
<210> 42
<211> 38
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 42
actgactgac tgactgactg tctcaggatg ttgcaggc 38
<210> 43
<211> 44
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 43
gactgactga ctgactgact gactcaacac caaggtggac aaga 44
<210> 44
<211> 71
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 44
actgactgac tgactgactg actgactgac tgactgactg actgactgac ttcagccatg 60
ggttaagagc t 71
<210> 45
<211> 44
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 45
gactgactga ctgactgact gactgcagaa tgaggcagtg aaat 44
<210> 46
<211> 44
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 46
gactgactga ctgactgact gacttcttcg taattctagt tgtg 44
<210> 47
<211> 48
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 47
gactgactga ctgactgact gactgactgg cctcaagttc atcagttc 48
<210> 48
<211> 52
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 48
ctgactgact gactgactga ctgactgact gactccttct ccatcccagc tc 52
<210> 49
<211> 60
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 49
actgactgac tgactgactg actgactgac tgactgactg actgtgcagc ccgggtgaca 60
<210> 50
<211> 72
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 50
gactgactga ctgactgact gactgactga ctgactgact gactgactga ctttgctttt 60
acctagggtt ac 72
<210> 51
<211> 30
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 51
ctgactgact gtacaactct gagaaggctg 30
<210> 52
<211> 33
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 52
tgactgactg actagacttt taaacaacca gct 33
<210> 53
<211> 34
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 53
cccctttaat gctattatta aaattagttg tgag 34
<210> 54
<211> 40
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 54
gactgactga ctgactgact gactgactgc caaggtggcc 40
<210> 55
<211> 40
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 55
ctgactgact ctatggcatt gattattcct tggtttatgt 40
<210> 56
<211> 44
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 56
tgactgactg actgactgac taagaattca ttcttcttct tctt 44
<210> 57
<211> 46
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 57
ctgactgact gactgactga ctgactgact ggagctcacc accctt 46
<210> 58
<211> 52
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 58
tgactgactg actgactgac tgactgactg actcaacttg gactcccacc tg 52
<210> 59
<211> 56
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 59
gactgactga ctgactgact gactgactga ctgactaaca gaatagggaa tgatct 56
<210> 60
<211> 60
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 60
gactgactga ctgactgact gactgactga ctgactgact cttgaaggga cttaactcca 60
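
Most of the single-base extension primers listed above begin with a repeated gact motif of varying length, while SEQ ID NO. 53 has none. The following is a minimal Python sketch, assuming (the listing itself does not state this) that the repeat is a 5' length tag rather than locus-specific sequence. The function name `split_tail` is hypothetical. Because the locus-specific part may by chance continue the repeat for a base or two, the prefix it reports is an upper bound on the tail length.

```python
def split_tail(primer: str, motif: str = "gact") -> tuple[str, str]:
    """Split a primer into (periodic 5' prefix, remaining 3' part).

    The prefix is the longest leading run that repeats with the period of
    `motif`, starting from any rotation of it. This is an illustration only;
    the true tail/core boundary is not given in the sequence listing.
    """
    s = primer.replace(" ", "").lower()
    n = len(motif)
    rotations = {motif[i:] + motif[:i] for i in range(n)}
    if s[:n] not in rotations:
        return "", s  # no recognizable tail
    i = n
    while i < len(s) and s[i] == s[i - n]:
        i += 1
    return s[:i], s[i:]


# Three entries copied from the listing above (SEQ ID NO. 45, 51 and 53).
examples = {
    45: "gactgactga ctgactgact gactgcagaa tgaggcagtg aaat",
    51: "ctgactgact gtacaactct gagaaggctg",
    53: "cccctttaat gctattatta aaattagttg tgag",
}

for seq_id, seq in examples.items():
    tail, core = split_tail(seq)
    print(f"SEQ ID NO. {seq_id}: prefix {len(tail)} nt, remainder {len(core)} nt")
```

Comparing total primer lengths across the listing in this way shows how the extension products could be spaced by size, under the stated assumption about the role of the repeat.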

Claims (8)

1. A primer composition for detecting loci used for ethnic group inference, characterized by comprising composite primers for amplifying 20 AIMs loci and single-base extension primers in one-to-one correspondence with the 20 AIMs loci; the loci comprise rs200189385, rs76102765, rs201197181, rs3827760, rs17034666, rs922452, rs1426654, rs57108441, rs12441154, rs12440301, rs1834640, rs72643557, rs72643559, rs434124, rs10444771, rs174592, rs1071803, rs9805939, rs34505674 and rs59950930;
the specific sequences of the composite primers for amplifying the 20 AIMs loci are shown as SEQ ID NO. 1-40; the specific sequences of the single-base extension primers are shown as SEQ ID NO. 41-60.
2. The primer composition for detecting loci used for ethnic group inference according to claim 1, wherein, in the composite primers, the forward primer and the reverse primer of each locus have the same final concentration, as follows: rs1071803, 0.595 μM; rs34505674, 0.595 μM; rs12440301, 0.595 μM; rs1426654, 0.714 μM; rs174592, 0.903 μM; rs72643559, 0.516 μM; rs200189385, 0.339 μM; rs59950930, 0.565 μM; rs10444771, 1.032 μM; rs12441154, 0.516 μM; rs3827760, 0.518 μM; rs922452, 0.691 μM; rs201197181, 1.036 μM; rs72643557, 1.036 μM; rs9805939, 0.518 μM; rs76102765, 0.833 μM; rs17034666, 0.667 μM; rs1834640, 0.833 μM; rs57108441, 0.667 μM; rs434124, 1 μM.
3. The primer composition for detecting loci used for ethnic group inference according to claim 1, wherein the final concentrations of the single-base extension primers for the respective sites are as follows: rs12440301, 0.5 μM; rs1426654, 0.333 μM; rs1071803, 0.25 μM; rs34505674, 0.125 μM; rs12441154, 0.033 μM; rs72643559, 0.833 μM; rs174592, 0.333 μM; rs10444771, 0.833 μM; rs59950930, 0.833 μM; rs200189385, 0.167 μM; rs3827760, 0.1 μM; rs9805939, 0.05 μM; rs17034666, 0.05 μM; rs72643557, 1 μM; rs1834640, 0.1 μM; rs201197181, 1 μM; rs922452, 0.1 μM; rs57108441, 0.5 μM; rs76102765, 0.8 μM; rs434124, 0.2 μM.
4. A method for family source inference using the primer composition of claim 1, comprising the steps of:
(1) extracting DNA to be detected;
(2) using DNA to be detected as a template, performing PCR amplification by using the primer composition of claim 1, and purifying an amplification product;
(3) performing composite extension on the purified amplification product using the single-base extension primers of claim 1 to obtain a typing result, from which the family source information is obtained by analysis.
5. The method of claim 4, wherein the amplification system in step (2) comprises 5 μL of 2× multiplex PCR Master Mix, 1 μL of composite primer, and 1 μL of 1 ng/μL DNA template, made up to 10 μL with double-distilled water.
6. The method according to claim 4, wherein the amplification conditions in step (2) are: pre-denaturation at 95 ℃ for 15 min; 30 cycles of denaturation at 94 ℃ for 30 s, annealing at 53-60 ℃ for 90 s, and extension at 72 ℃ for 20 s; and a final extension at 72 ℃ for 10 min.
7. The method of claim 4, wherein the purification reaction system in step (2) comprises 5 μL of amplification product, 2.5 μL of 1 U/μL rSAP, and 1.5 μL of 2 U/μL Exo I; the purification conditions are incubation at 37 ℃ for 60 min, followed by treatment at 80 ℃ for 15 min to inactivate the enzymes.
8. A kit for family source inference, comprising the primer composition according to claim 1.
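
The quantities recited in claims 5 to 7 can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: all variable names are hypothetical, it assumes that the 30 cycles apply to the denaturation, annealing and extension steps, and it counts programmed hold times only (instrument ramp time is ignored).

```python
# Claim 5: multiplex PCR reaction, volumes in microliters.
PCR_TOTAL_UL = 10.0
pcr_components_ul = {
    "2x PCR master mix": 5.0,
    "composite primer mix": 1.0,
    "DNA template (1 ng/uL)": 1.0,
}
# Double-distilled water makes up the remainder of the 10 uL reaction.
water_ul = PCR_TOTAL_UL - sum(pcr_components_ul.values())
template_ng = 1.0 * pcr_components_ul["DNA template (1 ng/uL)"]

# Claim 6: programmed hold time, assuming 30 cycles of the three-step loop.
CYCLES = 30
cycle_seconds = 30 + 90 + 20           # denaturation + annealing + extension
total_minutes = 15 + CYCLES * cycle_seconds / 60 + 10

# Claim 7: enzymatic clean-up reaction.
cleanup_ul = {"amplification product": 5.0, "rSAP": 2.5, "Exo I": 1.5}
rsap_units = cleanup_ul["rSAP"] * 1.0   # 1 U/uL stock
exo_units = cleanup_ul["Exo I"] * 2.0   # 2 U/uL stock
cleanup_total_ul = sum(cleanup_ul.values())

print(f"water to add: {water_ul} uL, template input: {template_ng} ng")
print(f"programmed PCR time: {total_minutes} min")
print(f"clean-up: {cleanup_total_ul} uL total, "
      f"{rsap_units} U rSAP, {exo_units} U Exo I")
```

Keeping the volumes in a dictionary makes it straightforward to scale the recipe to several reactions by multiplying each entry by the number of samples.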
CN201811617609.8A 2018-12-28 2018-12-28 Composite system for family source inference and inference method and application thereof Active CN109852701B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811617609.8A CN109852701B (en) 2018-12-28 2018-12-28 Composite system for family source inference and inference method and application thereof

Publications (2)

Publication Number Publication Date
CN109852701A CN109852701A (en) 2019-06-07
CN109852701B true CN109852701B (en) 2021-01-26

Family

ID=66892846

Country Status (1)

Country Link
CN (1) CN109852701B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105861654A (en) * 2016-04-05 2016-08-17 Institute of Forensic Science, Ministry of Public Security Method and system for analyzing ten population sources of an individual of unknown origin
CN107419017A (en) * 2017-07-25 2017-12-01 Institute of Forensic Science, Ministry of Public Security Method and system for inferring the five-continent population source of an individual of unknown origin
CN108411008A (en) * 2018-06-01 2018-08-17 Institute of Forensic Science, Ministry of Public Security Application of 72 SNP sites and related primers in identifying or assisting in identifying human populations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Genetic differences among ethnic groups; Tao Huang et al.; BMC Genomics; 2015-12-21; vol. 16; article no. 1093, pp. 1-10, full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant