CN102943111B - Use of a high-throughput DNA sequencing method for determining short tandem repeat loci in the human genome, and method

Publication number
CN102943111B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210466090.4A
Other languages
Chinese (zh)
Other versions
CN102943111A (en)
Inventor
周骋
姚志建
潘雅姣
陈琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING IPE BIOTECHNOLOGY Co Ltd
Original Assignee
BEIJING IPE BIOTECHNOLOGY Co Ltd

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses the use of a high-throughput DNA (deoxyribonucleic acid) sequencing method for determining short tandem repeat loci in the human genome, and a corresponding method, and belongs to the field of biotechnology. The method comprises the following steps: preparing a multi-sample locus amplicon library; performing high-throughput DNA sequencing; and analyzing the data and reporting the result. The invention applies high-throughput DNA sequencing to the determination of human short tandem repeat (STR) loci for the first time, markedly improving the resolution and sensitivity of STR loci for human individual identification while greatly increasing the throughput of STR detection.

Description

Use of a high-throughput DNA sequencing method for determining short tandem repeats in the human genome, and method
Technical field
The present invention relates to the field of biotechnology, and in particular to the use of a high-throughput DNA sequencing method for determining short tandem repeats (STRs) in the human genome, and to a corresponding method.
Background art
In forensic medicine, criminal investigation and the examination of physical evidence, how to trace and identify a suspect; in crimes and disasters, how to identify the persons involved or the victims; and in kinship determination, how to confirm a family relationship: these are goals that have been pursued throughout the history of human civilization. Since genetics became a discipline more than 100 years ago through the foundational work of G. J. Mendel, each advance in genetics has given this work a more solid scientific basis.
In 1985 the British geneticist A. Jeffreys first proposed the so-called "DNA fingerprint", pointing out that certain regions of the human genome contain repeat sequences, and that these repeat sequences differ between individuals (polymorphism) and are heritable. Using DNA for individual identification has many advantages: (1) DNA, as the carrier of genetic information, differs between individuals, and these differences appear as differences in traits; (2) DNA, as the carrier of heredity, is also stable and forms the basis of kinship; (3) any nucleated cell of the human body can serve as material for DNA analysis; (4) DNA samples are relatively stable.
The earliest method for detecting genetic polymorphism used restriction enzymes to perform restriction fragment length polymorphism (RFLP) analysis of variable number tandem repeats (VNTRs) in the human genome. The drawback of this method is that it requires relatively intact (little degraded) samples in relatively large amounts, which sometimes cannot be obtained at a forensic scene; the resolution of the method is also low. As technology developed, the polymerase chain reaction (PCR) greatly reduced the amount of sample required, and shifting the target of analysis to the shorter repeat sequences among VNTRs (short tandem repeats, STRs) allowed shorter nucleic acid fragments to be analyzed as well. Together with multiplex PCR, this led to the rapid spread of STR locus typing in forensic medicine and criminal investigation, and the databases that collect STR results for the relevant populations have grown steadily.
Polymorphism at a given site in the genome means that two or more different primary nucleic acid structures (differing in the kind of nucleotide, the number of repeats, and so on) occur at that site; these are called alleles, and analyzing the allele structure is called genotyping. The more alleles there are, the more genotypes there are: if the number of alleles is n, there are n homozygous and n(n-1)/2 heterozygous genotypes. For example, a site with 10 alleles has 10 homozygous and 45 heterozygous genotypes, that is, 55 genotypes in total. Individual identification requires detecting the polymorphism at multiple sites (loci) simultaneously; if the loci are all unlinked, their genotype frequencies can be multiplied, so increasing the number of loci can greatly increase the confidence of individual identification.
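The genotype arithmetic in the preceding paragraph can be checked with a short sketch. The function names are illustrative and are not part of the patent; the multiplication of frequencies assumes, as the text does, that the loci are unlinked.

```python
def genotype_counts(n_alleles: int) -> tuple[int, int, int]:
    """Return (homozygous, heterozygous, total) genotype counts for one locus."""
    if n_alleles < 1:
        raise ValueError("a locus needs at least one allele")
    homozygous = n_alleles
    heterozygous = n_alleles * (n_alleles - 1) // 2
    return homozygous, heterozygous, homozygous + heterozygous


def combined_frequency(locus_frequencies: list[float]) -> float:
    """Multiply per-locus genotype frequencies (valid only for unlinked loci)."""
    product = 1.0
    for frequency in locus_frequencies:
        product *= frequency
    return product


print(genotype_counts(10))  # (10, 45, 55)
```

With 10 alleles the sketch reproduces the 10 homozygous, 45 heterozygous and 55 total genotypes of the worked example.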
Since the 1990s, the general method for STR detection has been to genotype about 20 loci by multiplex PCR. Fluorescently labeled primers are used and the amplicon lengths are designed so that the fluorescently labeled amplicons of different lengths produced for each locus can be separated by capillary electrophoresis and compared with standards, thereby typing the alleles at each locus. However, this method has defects that arise from its technical limits, mainly: (1) because of mutual interference between fluorescent labels and limits on capillary length and imaging technology, the number of loci analyzed is difficult to increase substantially; (2) because the object of analysis is the length of each fragment, small differences in the primary nucleic acid structure of the fragments cannot be detected, which limits the resolution; (3) null alleles exist, so different kits may give different results at some loci; (4) stutter peaks (small peaks that appear before the main peak in fragment analysis) interfere, especially with mixed samples.
DNA sequencing can resolve these difficulties in STR analysis and improve the resolution of genotyping. However, STR typing by fluorescently labeled dideoxy chain-termination capillary electrophoresis (the Sanger method) is difficult to turn into a practical method for routine determination, particularly for database construction, because of its throughput and cost.
Summary of the invention
The object of the embodiments of the present invention is to address the defects of the prior art described above by providing a scheme that can cost-effectively determine multiple loci in multiple samples simultaneously, which not only raises the throughput of STR detection but also improves the resolution and sensitivity of STR loci for human individual identification.
To achieve this object, the present invention adopts the following technical solutions:
Use of a high-throughput DNA sequencing method for determining short tandem repeats in the human genome, including use in forensic medicine and in other fields that involve the detection of human STR loci, such as gene diagnosis, construction of human genetic maps and population genetics research.
Another technical solution provided by the invention is a method for determining short tandem repeats in the human genome by high-throughput DNA sequencing, comprising the following steps:
(1) Preparation of a multi-sample locus amplicon library
The amplicon library consists of DNA fragments that carry a different adapter at each end. One end carries the sequencing adapter, which may contain a sample tag to distinguish the sequencing results of different samples; the other end carries the fixed adapter, which attaches the fragment to a capture particle. The amplicon library is prepared by the amplification method or the ligation method:
A. Amplification method: multiplex PCR of multiple samples is carried out with specific primer sets that carry the adapter sequences, yielding target fragments with the corresponding adapters, which make up the sample library;
B. Ligation method: multiplex PCR of multiple samples is carried out with ordinary primer sets, and the corresponding adapters are then attached to the amplicon fragments by blunt-end ligation to make up the sample library;
(2) High-throughput DNA sequencing
A. Sequencing template preparation: library fragments are fixed to capture particles through the adapter at one end, so that each particle carries a single DNA fragment; the template-carrying particles and PCR reagents are emulsified with an oil phase to form water-in-oil microreactors; the whole fragment library is amplified in parallel to form the sequencing templates;
B. DNA sequencing: DNA sequencing is performed on a high-throughput sequencer;
(3) Data analysis and result reporting
A. Data quality control
Raw data are filtered by read length and quality;
B. Sorting of sequencing data
Based on the sample tag sequences in the sequencing results, the reads are sorted into folders by sample and by STR locus;
C. Data format conversion
The high-throughput DNA sequencing results are converted into the standard format currently used for STR typing results, expressed as the number of repeats of the core repeat sequence of the STR locus. This can be done in several ways, such as building a standard "ladder alignment reference sequence" for a locus or counting the core repeats directly. In addition, micro-variants found by aligning the sample sequence with the standard reference sequence can also be reported, to better reflect the polymorphism of the STR locus.
To achieve the above object, the specific workflow adopted by the present invention is as follows:
(1) Design and verification of target-fragment-specific primers
The human genome contains a large number of repeat sequences, and differences in the number of repeats determine the genetic characteristics of different individuals.
Primers are designed from published STR locus sequence information in order to determine the number of short tandem repeats at STR loci in unknown samples. Table 1 below lists primer designs for 40 commonly used STR loci, whose usability has been demonstrated experimentally. Usually agarose electrophoresis is used to check the specificity and yield of the amplification products, and sequencing is used to check the accuracy of the amplified sequences, thereby demonstrating usability.
Table 1. Primers commonly used to amplify 40 STR loci
(2) Preparation of the locus amplicon library
A. Fusion primer amplification method: fusion primers composed of three parts (the target-fragment-specific primer described above, an adapter and a sample tag) are used in multiplex PCR of multiple samples to obtain target fragments that carry a different adapter sequence at each end and a sample tag; these make up the DNA library.
A fusion primer is a long primer that contains, in addition to the target-fragment-specific primer, other sequences (sequencing adapter, fixed adapter, sample tag); its structure is shown in Fig. 2 (taking the Ion Torrent sequencing platform as an example). The P adapter is the fixed adapter, used for attachment to the DNA capture bead and for the universal primer; the sample tag consists of about 10 nucleotides and distinguishes samples; the A adapter is the sequencing adapter, used with the universal sequencing primer.
B. Ligation method: multiplex PCR with the above primer sets yields the target fragments, and blunt-end ligation then yields target fragments that carry the Coded A adapter at one end and the P adapter at the other, which make up the DNA library. The workflow is: STR locus amplicons → end repair and phosphorylation of the amplicons → ligation of the Coded A adapter and the P adapter to the amplicons → sample library.
The structure of the sample library is shown in Fig. 3. The P adapter is used for attachment to the DNA capture bead and for the universal primer; the A adapter together with a 10-nucleotide sample tag forms the Coded A adapter, in which the A adapter binds the sequencing primer and the sample tag distinguishes different samples.
(3) High-throughput DNA sequencing
A. The sequencing library is amplified by a method such as water-in-oil PCR to prepare the sequencing templates. Taking water-in-oil PCR (emulsion PCR, ePCR) as an example, the workflow is: library fragments are fixed to capture beads through the P adapter so that each bead carries a single DNA fragment → the aqueous PCR reagents are emulsified with the oil-phase reagent to form an emulsion, and the template-carrying beads, once mixed with the emulsion, enter the droplets, each droplet being one water-in-oil microreactor → the whole fragment library is amplified in parallel in the individual water-in-oil microreactors to form the sequencing templates.
B. DNA sequencing is performed on a high-throughput sequencer, for example the Ion Torrent Personal Genome Machine (PGM).
(4) Data analysis and result reporting
A. Data quality control
Raw data are filtered by read length and quality.
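A minimal sketch of such a filter is given below. It uses the thresholds stated later in the embodiment (mean quality score of at least 16, read length of at least 55 bases). The function names and the Phred+33 encoding of the FASTQ quality string are assumptions made for illustration; the embodiment itself performs this step in commercial analysis software.

```python
from statistics import mean

MIN_MEAN_SCORE = 16  # threshold taken from the embodiment
MIN_LENGTH = 55      # threshold taken from the embodiment


def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding is assumed here)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_qc(sequence: str, quality_string: str) -> bool:
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if len(sequence) < MIN_LENGTH:
        return False
    return mean(phred_scores(quality_string)) >= MIN_MEAN_SCORE
```

A read is kept only when both conditions hold, so a long read of poor quality and a short read of good quality are both discarded.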
B. Sorting of sequencing data
Reads are sorted by sample and by STR locus according to the sample tag information and the specific primer sequences of each STR locus;
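The sorting step can be sketched as follows, assuming that each read begins with the 10-base sample tag followed directly by the locus-specific primer. The two sample tags are taken from the sample tag table of the embodiment; the locus names and primer sequences are hypothetical placeholders, because the primer table is not reproduced in this text. The matching rules (exact tag match, at most 3 primer mismatches) follow the sorting parameters given in the embodiment.

```python
from typing import Optional

# Sample tags 1 and 2 from the embodiment's sample tag table.
SAMPLE_TAGS = {1: "CTAAGGTAAC", 2: "TAAGGAGAAC"}

# Hypothetical placeholder primers; the real primer table is not available here.
LOCUS_PRIMERS = {
    "LOCUS_A": "ACGTACGTACGTACGTACGT",
    "LOCUS_B": "TTGACCATGGCAATCGGATC",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def sort_read(read: str, max_primer_mismatches: int = 3) -> Optional[tuple[int, str]]:
    """Return (sample, locus) for a read, or None if it cannot be assigned."""
    tag = read[:10]
    sample = next((s for s, t in SAMPLE_TAGS.items() if t == tag), None)
    if sample is None:  # the tag must match on all 10 bases
        return None
    rest = read[10:]
    for locus, primer in LOCUS_PRIMERS.items():
        window = rest[:len(primer)]
        if len(window) == len(primer) and hamming(window, primer) <= max_primer_mismatches:
            return sample, locus
    return None
```

Reads that return None would simply be left unassigned in this sketch.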
C. Data format conversion
STR typing results are expressed as the number of repeats of the core repeat sequence of the STR locus; the specific steps are as follows:
A standard "ladder alignment reference sequence" is built for an STR locus from its known typing results. For example, the polymorphism of the CSF locus appears as 5 to 9 short tandem repeats, so the "ladder alignment reference sequence" of this locus is a set of sequences in which the core sequence AGAT is repeated 5 to 9 times. Sequence alignment then converts the sequence information into STR typing results (see the sequences listed below);
Example of ladder alignment reference sequences used in data conversion:
>CSF reference 5 sequence
GATATTAACAGTAACTGCCTTCATAGATAGAAGATAGATAGATTAGATAGATAGATAGATAGATAGGAAGTACTTAGAACAGGGTCTGACACAGGAAATGCT
>CSF reference 6 sequence
GATATTAACAGTAACTGCCTTCATAGATAGAAGATAGATAGATTAGATAGATAGATAGATAGATAGATAGGAAGTACTTAGAACAGGGTCTGACACAGGAAATGCT
>CSF reference 7 sequence
GATATTAACAGTAACTGCCTTCATAGATAGAAGATAGATAGATTAGATAGATAGATAGATAGATAGATAGATAGGAAGTACTTAGAACAGGGTCTGACACAGGAAATGCT
>CSF reference 8 sequence
GATATTAACAGTAACTGCCTTCATAGATAGAAGATAGATAGATTAGATAGATAGATAGATAGATAGATAGATAGATAGGAAGTACTTAGAACAGGGTCTGACACAGGAAATGCT
>CSF reference 9 sequence
GATATTAACAGTAACTGCCTTCATAGATAGAAGATAGATAGATTAGATAGATAGATAGATAGATAGATAGATAGATAGAT
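The construction of such a ladder can be sketched as below. The flanking sequences and the core motif are copied from the CSF reference 5 sequence above; the function names are illustrative, and exact string comparison stands in for the sequence alignment that the method actually describes, so this is a simplification rather than the patented procedure.

```python
from typing import Optional

# Flanks copied from the CSF reference 5 sequence listed above.
LEFT_FLANK = "GATATTAACAGTAACTGCCTTCATAGATAGAAGATAGATAGATT"
RIGHT_FLANK = "AGGAAGTACTTAGAACAGGGTCTGACACAGGAAATGCT"
CORE = "AGAT"


def build_ladder(min_repeats: int, max_repeats: int) -> dict[int, str]:
    """Build one reference sequence per repeat number in the given range."""
    return {
        n: LEFT_FLANK + CORE * n + RIGHT_FLANK
        for n in range(min_repeats, max_repeats + 1)
    }


def type_read(read: str, ladder: dict[int, str]) -> Optional[int]:
    """Return the repeat number whose reference equals the read, else None."""
    for repeats, reference in ladder.items():
        if read == reference:
            return repeats
    return None
```

For example, `build_ladder(5, 9)` yields five references whose lengths differ by one core unit, and a read identical to the 7-repeat reference is typed as 7.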
D. Handling of results that do not fall on an integer point of the reference sequences
If an STR locus of a sample does not fall exactly on a rung of the ladder, for example if the sequencing result shows an insertion of 2 nucleotides between the 5th and 6th short repeat units, then according to the alignment result this micro-variant can also be converted into a fixed non-CODIS format, written as CSF 8.2 (AG5,6).
E. Besides building a standard "ladder alignment reference sequence" for the locus, STR data format conversion can also be achieved by other methods, such as counting the repeat sequence directly.
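Direct counting can be sketched as finding the longest uninterrupted run of the core motif in a read. The function name is illustrative, and the longest-run rule is an assumption: it ignores shorter runs of the motif in the flanking sequence and does not handle micro-variants such as the 8.2 example above.

```python
import re


def count_core_repeats(sequence: str, core: str) -> int:
    """Return the repeat number of the longest tandem run of `core` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(core)})+")
    longest = max(
        (match.group(0) for match in pattern.finditer(sequence)),
        key=len,
        default="",
    )
    return len(longest) // len(core)
```

A read with an insertion inside the repeat region would be split into two shorter runs by this rule, which is why such reads still need the alignment-based handling described in D.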
The beneficial effects of the technical solutions provided by the embodiments of the present invention are:
(1) The concept of high-throughput sequencing is introduced into human STR typing, so that STR determination moves from imprecise fragment size to exact DNA sequence;
(2) By introducing sample tags into DNA sequencing, DNA molecules from different sources are labeled and distinguished, so that multiple samples can be sequenced simultaneously;
(3) The combination of high-throughput DNA sequencing and multiplex PCR makes it possible to determine multiple loci in multiple samples simultaneously;
(4) Long fusion primers can optionally be used in the multiplex PCR to directly amplify a sequencing library that contains the adapter sequences and sample tags, avoiding tedious ligation steps;
(5) For a given sequencing throughput, adjusting the balance among three parameters (number of samples, number of loci and sequencing depth) can meet different requirements for sample throughput, detection accuracy and sensitivity.
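The trade-off in point (5) can be illustrated with simple arithmetic. The sketch assumes that reads are spread evenly over samples and loci, which real multiplex libraries only approximate; the function names are illustrative. The example figures (296,093 usable reads, 10 samples, 16 loci) are those reported in the embodiment.

```python
def mean_depth(total_reads: int, n_samples: int, n_loci: int) -> float:
    """Average number of reads per locus per sample, assuming an even spread."""
    return total_reads / (n_samples * n_loci)


def max_samples(total_reads: int, n_loci: int, min_depth: int) -> int:
    """Largest number of samples that still reaches `min_depth` reads per locus."""
    return total_reads // (n_loci * min_depth)


# Figures from the embodiment: 296,093 usable reads, 10 samples, 16 loci.
print(mean_depth(296093, 10, 16))
```

Under these assumptions, halving the number of loci or samples doubles the average depth available for each remaining locus.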
Brief description of the drawings
Fig. 1 is the flow chart, provided in the embodiments of the present invention, for determining STRs by high-throughput DNA sequencing;
The workflow comprises the following steps: (1) design and verification of target-fragment-specific primers; (2) preparation of the locus amplicon library: the DNA library is obtained either by direct multiplex PCR amplification with specially designed fusion primers, or by subsequent ligation of ordinary multiplex PCR products; this step requires optimization of the multiplex PCR system to ensure balanced yields of the multiplex PCR products; (3) emulsion PCR (ePCR) to obtain the sequencing templates: particles that each carry a single DNA fragment are enclosed in emulsion droplets that form independent PCR microreactors, so that the whole fragment library is amplified independently and in parallel; (4) high-throughput DNA sequencing; (5) data analysis and result reporting.
Fig. 2 is a schematic diagram of the fusion primer structure provided in the embodiments of the present invention;
The A adapter is the sequencing primer region, the P adapter is the capture particle binding region, and the sample tag distinguishes different samples.
Fig. 3 is a schematic diagram of the sample library structure.
The P adapter is used for attachment to the DNA capture bead and for the universal primer; the A adapter together with a 10-nucleotide sample tag forms the Coded A adapter.
Detailed description of embodiments
The present invention introduces high-throughput DNA sequencing into STR determination. It can be used in forensic medicine and in other fields that involve the detection of human STR loci, including gene diagnosis, human population genetics research and gene mapping. To make the objects, technical solutions and advantages of the invention clearer, specific embodiments are described below with reference to the drawings, taking forensic detection of human STR loci as the example, together with the advantage that the test report is more detailed than that of existing products.
1. Materials
The test samples were 10 human genomic DNA samples of unknown type (10 samples are used as the example for reasons of space); the Amelogenin locus and the 15 STR loci covered by the Identifiler kit were typed by high-throughput sequencing. Two of these samples (samples 1 and 2) were also typed by the existing method for comparison.
To demonstrate the advantages of the product of the invention, the existing multicolor fluorescence STR detection kit AmpFlSTR Identifiler (Life Technologies) was used for typing in parallel, and the typing results of the two methods were compared.
The reagents for the high-throughput sequencing method consist of 3 main kits: a library construction kit (assembled in-house, in two versions for the amplification method and the ligation method); a sequencing template preparation kit (purchased from Ion Torrent); and a sequencing kit (purchased from Ion Torrent).
(1) Library construction kit
1) Amplification-method library construction kit: comprises 10 fusion primer pools containing different sample tags, each pool (synthesized by Invitrogen) containing 16 pairs of fusion primers with the same sample tag, together with conventional PCR components (Roche) including PCR buffer, DNA polymerase and dNTPs (deoxynucleoside triphosphates), and nucleic acid purification reagent (AMPure reagent, Beckman).
2) Ligation-method library construction kit: comprises the following 3 components
A) Target fragment amplification reagents: a primer pool composed of primers specific to the 16 STR loci (synthesized by Invitrogen) and other conventional PCR components such as DNA polymerase (Roche).
B) Ligation reagents: 10 sets of A adapters containing different sample tags and the P adapter (synthesized by Invitrogen), ligation reagents (purchased from NEB) including end-modification enzymes (T4 polynucleotide kinase and T4 DNA polymerase) and T4 ligase, and nucleic acid purification reagent (AMPure reagent, Beckman).
C) Ligation product amplification reagents: library amplification primers (Invitrogen) and Platinum high-fidelity polymerase (Invitrogen).
The upstream and downstream primer sequences for the 16 STR loci are given in the following table:
The sample tag sequences are given in the following table:
Sample tag number    Sequence
1     CTAAGGTAAC (SEQ ID NO. 33)
2     TAAGGAGAAC (SEQ ID NO. 34)
3     AAGAGGATTC (SEQ ID NO. 35)
4     TACCAAGATC (SEQ ID NO. 36)
5     CAGAAGGAAC (SEQ ID NO. 37)
6     CTGCAAGTTC (SEQ ID NO. 38)
7     TTCGTGATTC (SEQ ID NO. 39)
8     TTCCGATAAC (SEQ ID NO. 40)
9     TGAGCGGAAC (SEQ ID NO. 41)
10    CTGACCGAAC (SEQ ID NO. 42)
The adapter sequences are given in the following table:
Adapter      Sequence
A adapter    CCATCTCATCCCTGCGTGTCTCCGACTCAG (SEQ ID NO. 43)
P adapter    CCTCTCTATGGGCAGTCGGTGAT (SEQ ID NO. 44)
The library amplification primer sequences are given in the following table:
Primer               Sequence
Upstream primer      CCATCTCATCCCTGCGTGTCTCCGACTCAG (SEQ ID NO. 45)
Downstream primer    CCTCTCTATGGGCAGTCGGTGAT (SEQ ID NO. 46)
(2) Sequencing template preparation kit: consists of an oil phase and an aqueous phase. The main component of the oil phase is mineral oil; the aqueous phase contains conventional PCR components and the capture particles for the target fragments.
(3) Sequencing kit: consists of sequencing primer, sequencing enzyme and the 4 dNTPs.
2. Method
The multicolor fluorescence labeling method was carried out according to the standard operating procedure of the AmpFlSTR Identifiler kit.
The specific steps of the high-throughput sequencing method are shown in Fig. 1.
(1) Construction of the library of loci to be tested for each sample
1) Amplification method: performed with the amplification-method library construction kit; a schematic of the design of its main component, the fusion primer, is shown in Fig. 2. The multiplex PCR reaction mixture and program are as follows:
A) Preparation of the PCR reaction mixture
Place a 0.2 mL Eppendorf (EP) tube on ice, add the following reagents and sample, and mix
B) Run the PCR in a thermal cycler with the following program
C) Purify the library according to the AMPure reagent instructions.
2) Ligation method: performed with the ligation-method library construction kit; the target fragment amplification and adapter ligation procedures are as follows:
A) Target fragment amplification
I. Preparation of the PCR reaction mixture: place a 0.2 mL EP tube on ice, add the following reagents and sample, and mix
II. Run the PCR in a thermal cycler with the following program:
III. Amplicon purification: according to the AMPure reagent instructions
B) End modification of the amplicons
I. Preparation of the end-modification reaction mixture: place a 0.2 mL EP tube on ice, add the following reagents and sample, and mix
II. React with the following program:
III. Purification of the end-modified amplicons: according to the AMPure reagent instructions
C) Adapter ligation
I. Preparation of the ligation reaction mixture: place a 0.2 mL EP tube on ice, add the following reagents and sample, and mix
BSA】 ?
P adapter (1 μM)    2
A adapter 1-10 containing sample tag (1 μM)    2
DNA ligase (2000 U/μL)    2
Nuclease-free water    5
Total volume    40
II. Ligation program: room temperature, 10 minutes
III. Purification of the ligation product: according to the AMPure reagent instructions
D) Nick translation, library amplification and purification
I. Preparation of the reaction mixture: place a 0.2 mL EP tube on ice, add the following reagents and sample, and mix
Component    Volume (μL)
Ligation product from above, 10-100 ng    25
Platinum high-fidelity enzyme mix    100
Library amplification primers* (10 μM each)    5
Total volume    130
* The upstream and downstream library amplification primers are the plus strand of the A adapter and the complementary strand of the P adapter, respectively, to ensure that only correctly ligated products are amplified.
II. React with the following program:
* The number of cycles depends on the amount of template: 8 cycles for 10 ng of template, 5 cycles for 100 ng.
III. Purification of the amplified library: according to the AMPure reagent (Beckman) instructions
(2) Sequencing template preparation
This step is performed with the sequencing template preparation kit; the water-in-oil PCR reaction mixture and program are as follows:
1) Preparation of the PCR reaction mixture:
After the aqueous phase is thoroughly vortexed, the capture particles, whose surface carries a sequence complementary to the library P adapter, are bound to the library P adapter; the oil phase is then added at a ratio of 10:1 and mixed thoroughly to form water-in-oil PCR microreactors.
2) PCR amplification conditions
(3) High-throughput DNA sequencing
Any high-throughput sequencer with its matching sequencing kit may be used, for example the Ion Torrent PGM (purchased from Life Technologies).
(4) Data processing
Performed with NextGENe analysis software (SoftGenetics), in three steps: data quality control, sorting of sequencing data and conversion to typing results.
Quality control parameters: Mean Score (mean quality score) >= 16, fragment length >= 55 bases
Sorting parameters: Barcode Sorting (sorting by sample tag) requires all 10 bases to match; Primer Sorting (sorting by locus primer) allows up to 3 mismatched bases.
Typing conversion parameter: if more than 35% of the reads for a given sample and locus are identical to a given reference sequence, the result is converted to the STR type corresponding to that reference sequence.
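The 35% rule can be sketched as a simple threshold on the share of reads that match each reference. The function name and the dictionary layout are illustrative; the percentages in the example are those reported for sample 10 in the results below (CSF1PO: 47% and 51%; D7S820: 96%).

```python
def call_alleles(read_percentages: dict[int, float], threshold: float = 35.0) -> list[int]:
    """Return the alleles whose share of reads exceeds the threshold (in percent)."""
    return sorted(
        allele
        for allele, percentage in read_percentages.items()
        if percentage > threshold
    )


print(call_alleles({12: 47, 13: 51}))  # [12, 13]
print(call_alleles({8: 96}))           # [8]
```

Two alleles above the threshold give a heterozygous type, and a single allele above it gives a homozygous type.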
(5) Analysis of results
To make the results of the two methods comparable, both were expressed as the percentage of each allele (allele %): for high-throughput sequencing as the percentage of reads for each allele, and for the multicolor fluorescence labeling method as the percentage of peak height for each allele. The chi-square test was used to analyze the agreement between the two methods.
The results show that, taking the allele read percentages from high-throughput sequencing as observed values and the allele peak-height percentages from the multicolor fluorescence method as expected values, the P values for all 16 loci in both samples were greater than 0.05. This indicates that the results of the two methods are essentially consistent and that the typing criteria of the existing multicolor fluorescence method (including the criteria for background noise, homozygotes and heterozygotes) are also largely applicable to high-throughput sequencing.
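A sketch of this comparison for a single locus with two allele categories is given below. The function names are illustrative, and the p-value helper covers only the case of 1 degree of freedom; the text does not state how many categories or degrees of freedom were used per locus. In the example, the observed values are the CSF1PO read percentages of sample 10, while the expected values are hypothetical peak-height percentages chosen purely for illustration.

```python
import math


def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


observed = [47, 51]  # read percentages reported for sample 10, CSF1PO
expected = [50, 50]  # hypothetical peak-height percentages (illustration only)
statistic = chi_square_statistic(observed, expected)
print(statistic, p_value_df1(statistic) > 0.05)
```

A p-value above 0.05 means that the read percentages do not differ significantly from the expected percentages at that locus.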
Using the typing criteria of the existing multicolor fluorescence method and typing according to the allele percentages from high-throughput sequencing, the results for the 10 samples are shown in Table 2; the results for samples 1 and 2 are fully consistent with those of the multicolor fluorescence method (Table 1).
Compared with the multicolor fluorescence method, high-throughput sequencing can detect micro-variation in the DNA sequence. For example, types 15 and 16 of locus D3S1358 can be refined into 15A ([AGAT]11[AGAC]3AGAT), 15B ([AGAT]12[AGAC]2AGAT), 16A ([AGAT]12[AGAC]3AGAT) and 16B ([AGAT]13[AGAC]2AGAT). High-throughput sequencing can resolve the result precisely as 15A and 16A (Table 1), whereas the multicolor fluorescence method can only report 15 and 16.
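The bracket notation used for these subtypes can be derived from a repeat region by run-length encoding its 4-base units, as sketched below. The function names are illustrative, and the sketch assumes that the repeat region has already been extracted from the read and starts on a unit boundary. A final single unit is written as [AGAT]1 rather than the bare AGAT used in the text.

```python
from itertools import groupby


def repeat_structure(region: str, unit_length: int = 4) -> list[tuple[str, int]]:
    """Run-length encode a repeat region into (unit, count) pairs."""
    if len(region) % unit_length != 0:
        raise ValueError("region length must be a multiple of the unit length")
    units = [region[i:i + unit_length] for i in range(0, len(region), unit_length)]
    return [(unit, len(list(group))) for unit, group in groupby(units)]


def bracket_notation(structure: list[tuple[str, int]]) -> str:
    """Format (unit, count) pairs as, for example, [AGAT]11[AGAC]3[AGAT]1."""
    return "".join(f"[{unit}]{count}" for unit, count in structure)


region_15a = "AGAT" * 11 + "AGAC" * 3 + "AGAT"
structure = repeat_structure(region_15a)
print(bracket_notation(structure))           # [AGAT]11[AGAC]3[AGAT]1
print(sum(count for _, count in structure))  # 15
```

Types 15A and 15B both contain 15 units in total and therefore have the same fragment length, but their structures differ, which is the distinction that length-based typing cannot make.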
High-flux sequence method not only can realize the somatotype function of existing multicolor fluorescence method, the detected result of two kinds of methods has certain comparability, and because high-flux sequence can show the fine difference of STR site and flanking DNA sequence thereof, be a kind of means that more accurately, effectively detect mankind's str locus seat.
Two kinds of methods of table 1. (multicolor fluorescence method and high-flux sequence method) locus somatotype the results list
St:Stutter allele writes a Chinese character in simplified form, due to the background noise of archaeal dna polymerase slippage generation.
Table 2. adopts high-flux sequence method 10 duplicate samples to be amounted to the somatotype result in 16 sites (comprising 1 sex site Ame)
2) high-throughput DNA sequencing result example (sample 10):
A. data Quality Control result
Raw data FASTQ file (334, article 966, order-checking), filter 7180 order-checkings through quality screening (retaining the sequencing result of Mean Score >=16), length screening (retaining the sequencing result of length >=55 base) filters 3169 order-checkings, obtains altogether 296093 sequencing results that meet Quality Control requirement for follow-up data processing.
B. information categorization checks order
Above-mentioned 296093 order-checkings are carried out according to sample label sequence information, are effectively sorted out extremely different sample files and press from both sides, and result is as follows:
Table 3. sample label is sorted out result
Taking sample 10 as example, according to primer sequence, 17586 of this sample sequencing results to be sorted out, result is as follows:
Result (sample 10) is sorted out in table 4. site
C. Data format conversion
Genotyping was performed by aligning the reads against the standard "ladder comparison reference sequences", with the following results:
1)CSF1PO(12,13)
47%reads
>GATATTAACAGTAACTGCCTTCATAGATAGAAGATAGATAGATTAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGGAAGTACTTAGAACAGGGTCTGACACAGGAAATGCT
51%reads
>GATATTAACAGTAACTGCCTTCATAGATAGAAGATAGATAGATTAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGGAAGTACTTAGAACAGGGTCTGACACAGGAAATGCT
2)FGA(21,25)
54%reads
>AAATAAAATTAGGCATATTTACAAGCTAGTTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCCTTTCTTCCTTTCTTTTTTGCTGGCAATTACAGACAAATCACTCAGC
43%reads
>AAATAAAATTAGGCATATTTACAAGCTAGTTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCCTTTCTTCCTTTCTTTTTTGCTGGCAATTACAGACAAATCACTCAGC
3)TH01(7,9)
52%reads
>CCTGTTCCTCCCTTATTTCCCTCATTCATTCATTCATTCATTCATTCATTCATTCATTCACCATGGAGTCTGTGTTCCC
47%reads
>CCTGTTCCTCCCTTATTTCCCTCATTCATTCATTCATTCATTCATTCATTCACCATGGAGTCTGTGTTCCC
4)VWA(16,20)
48%reads
>TCAGTATGTGACTTGGATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTA
45%reads
>TCAGTATGTGACTTGGATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTA
5)D3S1358(16,17)
54%reads
>CACTGCAGTCCAATCTGGGTGACAGAGCAAGACCCTGTCTCATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGACAGACAGACAGATACATGCAAGCCTCTGTTGATTTCA
40%reads
>CACTGCAGTCCAATCTGGGTGACAGAGCAAGACCCTGTCTCATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGACAGACAGACAGATACATGCAAGCCTCTGTTGATTTCA
6)D5S818(10,12)
53%reads
>AGGGTGATTTTCCTCTTTGGTATCCTTATGTAATATTTTGAAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGAGGTATAAATAAGGATACAGATAAAGATACAAATGTTGTAAACTGTGGCTATGATTGG
45%reads
>AGGGTGATTTTCCTCTTTGGTATCCTTATGTAATATTTTGAAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGAGGTATAAATAAGGATACAGATAAAGATACAAATGTTGTAAACTGTGGCTATGATTGG
7)D7S820(8)
96%reads
>AGGGTATGATAGAACACTTGTCATAGTTTAGAACGAACTAACGATAGATAGATAGATAGATAGATAGATAGATAGACAGATTGATAGTTTTTTTTAATCTCACTAAATAGTCTATAGTAAACATTTAATTACCAATATTTGGTGCAATTCTGTCAATGA
8)D8S1179(16,17)
50%reads
>ACACGGCCGGGCAACTTATATGTATTTTTGTATTTCATGTGTACATTCGTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATTCCCCACAGTGAAAATAATCTACAGGATAGGTAAATAAATTAAGGCATATTCACGCAATGGGATACGATA
46%reads
>ACACGGCCGGGCAACTTATATGTATTTTTGTATTTCATGTGTACATTCGTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATTCCCCACAGTGAAAATAATCTACAGGATAGGTAAATAAATTAAGGCATATTCACGCAATGGGATACGATA
9)D13S317(8,9)
43%reads
>GGACTCTGACCCATCTAACGCCTATCTGTATTTACAAATACATTATCTATCTATCTATCTATCTATCTATCTATCAATCAATCATCTATCTATCTTTCTGTCTGTCTTTTTGGGCTGCCTATGGCTC
54%reads
>GGACTCTGACCCATCTAACGCCTATCTGTATTTACAAATACATTATCTATCTATCTATCTATCTATCTATCAATCAATCATCTATCTATCTTTCTGTCTGTCTTTTTGGGCTGCCTATGGCTC
10)D16S539(9,13)
49%reads
>CCTCTTCCCTAGATCAATACAGACAGACAGACAGGTGGATAGATAGATAGATAGATAGATAGATAGATAGATATCATTGAAAGACAAAACAGAGATGGATGATAGATACATGCTTACAGATGCA
47%reads
>CCTCTTCCCTAGATCAATACAGACAGACAGACAGGTGGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATCATTGAAAGACAAAACAGAGATGGATGATAGATACATGCTTACAGATGCA
11)D18S51(16,17)
54%reads
>TGAGTGACAAATTGAGACCTTGTCTCAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAGAGAGAGGAAAGAAAGAGAAAAAGAAAAGAAATAGTAGCAACTGTTATTGTAAGAC
40%reads
>TGAGTGACAAATTGAGACCTTGTCTCAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAGAGAGAGGAAAGAAAGAGAAAAAGAAAAGAAATAGTAGCAACTGTTATTGTAAGAC
12)D21S11(29,32.2)
50%reads
>AATTCCCCAAGTGAATTGCCTTCTATCTATCTATCTATCTGTCTGTCTGTCTGTCTGTCTGTCTATCTATCTATATCTATCTATCTATCATCTATCTATCCATATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCGTCTATCTATCCAGTCTATCTACC
46%reads
>AATTCCCCAAGTGAATTGCCTTCTATCTATCTATCTATCTGTCTGTCTGTCTGTCTGTCTGTCTATCTATCTATATCTATCTATCTATCATCTATCTATCCATATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATATCTATCGTCTATCTATCCAGTCTATCTACC
13)TPOX(8)
94%reads
>CTTAGGGAACCCTCACTGAATGAATGAATGAATGAATGAATGAATGAATGTTTGGGCAAATAAACGCTGACAAGGAC
14)D2S1338(17)
96%reads
>TGGAAACAGAAATGGCTTGGCCTTGCCTGCCTGCCTGCCTGCCTGCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCCTCCTGCAATCCTTTAACTTACTG
15)D19S433(14)
>CCTGGGCAACAGAATAAGATTCTGTTGAAGGAAAGAAGGTAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAGAGAGGAAGAAAGAGAGAAGATTTTTATTCGGGTAATGGGTGCACC
16)Aml(X,Y)
54%reads
>CCCTGGGCTCTGTAAAGAATAGTGTGTTGATTCTTTATCCCAGATGTTTCTCAAGTGGTCCTGATTTTACAGTTCCTACCACCAGCTTCCCAGTTTAAGCTCTGAT
46%reads
>CCCTGGGCTCTGTAAAGAATAGTGTGTTGATTCTTTATCCCAGATAAAGTGGTTTCTCAAGTGGTCCTGATTTTACAGTTCCTACCACCAGCTTCCCAGTTTAAGCTCTGAT
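The per-locus listing above pairs each distinct read sequence with its share of reads and an allele designation. A simplified sketch of that summary is given below. It takes the alternative route of directly reading the core repeat number rather than aligning to ladder reference sequences, and it is only an approximation: the function names are hypothetical, the 10% cutoff for discarding low-abundance sequences (such as stutter) is an assumption, and the longest-run rule does not handle compound repeats or partial alleles such as 32.2.

```python
import re
from collections import Counter


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Largest number of consecutive copies of motif found in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def summarize_locus(reads, motif: str, min_fraction: float = 0.10):
    """Return [(repeat_count, percent_of_reads), ...] for abundant sequences.

    Identical reads are collapsed; sequences below min_fraction of all reads
    are dropped (assumed threshold, not taken from the patent).
    """
    counts = Counter(reads)
    total = sum(counts.values())
    summary = []
    for seq, n in counts.most_common():
        fraction = n / total
        if fraction < min_fraction:
            break  # most_common() is sorted, so the rest are rarer still
        summary.append((longest_repeat_run(seq, motif), round(100 * fraction)))
    return summary
```

With synthetic reads containing 9 and 7 copies of a 4-base motif in roughly equal proportions, the summary has two entries, which corresponds to a heterozygous call; a single dominant entry corresponds to a homozygous call.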
The present invention has the following advantages:
1. Improved resolving power of STR loci for individual identification
(1) Form of the result: existing detection techniques infer the number of short-fragment repeats from the length of the PCR amplicon containing the STR region. Measuring PCR product length not only requires a series of allelic ladders as standards, which reduces detection throughput, but also carries a certain error [taking the 310 capillary electrophoresis instrument as an example, the standard deviation (Std Deviation) of all allelic ladders for the 16 loci of the AmpFlSTR kit lies between 0.02 and 0.15 (see the kit manual)]. By contrast, the DNA sequence information obtained by DNA sequencing not only reflects the number of short-fragment repeats in the STR region more truly and directly, but also allows micro-variations of the sequence in this region to be detected.
(2) Number of loci detected: limited by fluorescent labels and lane length, existing detection techniques typically detect about 20 STR loci in a single run. For high-throughput DNA sequencing, the number of loci detected in a single run is limited only by the number of reactions that multiplex PCR can accommodate (multiplex PCR can currently amplify at most 2000 loci simultaneously), so the number of loci detected is markedly higher than with existing capillary electrophoresis.
Based on the above two points, the method provided by the invention for detecting human STRs by high-throughput DNA sequencing raises the measurement resolution from fragment size to DNA sequencing, in which every nucleotide is detected individually, and increases the number of STR loci detected in a single run. This greatly improves the resolving power of STR loci for individual identification.
2. Improved detection sensitivity
Existing STR detection relies on fluorescently labeled PCR amplification and fragment analysis by capillary electrophoresis, which places certain requirements on the amount of template DNA. High-throughput sequencing can detect trace amounts of DNA, even single DNA molecules, so sensitivity is greatly improved, which is of great significance for the analysis of special trace samples.
3. Improved sample throughput
Limited by the detection channels of the capillary electrophoresis instrument, existing STR detection techniques can measure only a limited number of samples per run. The sample capacity of high-throughput DNA sequencing depends only on sequencing throughput, which in turn depends on factors such as the integration level of the electronic components, and DNA molecules from different sources can be tagged and distinguished as needed, which greatly increases detection throughput.
The present invention introduces high-throughput DNA sequencing into STR determination, forming a new generation of STR analysis technology. The new-generation STR determination is built on recently developed high-throughput DNA sequencing platforms and raises STR measurement from the fragment level (polymers of tens to hundreds of nucleotides) to the DNA sequencing level, at which every nucleotide is detected individually. Because it is no longer limited by fluorescent labeling, more loci can be measured simultaneously, which greatly improves the resolving power of STRs as polymorphic DNA markers for human individual identification. High-throughput DNA sequencing distinguishes DNA molecules from different samples by sample tags. Taking sample tags 10-12 bases long as an example, the number of tags that can theoretically be formed is 4^10 for a 10-base tag, so as long as sequencing throughput is sufficient, the number of samples detectable in a single run is far higher than with existing detection means. These improvements further increase the accuracy and throughput of STR genotyping in applications such as individual identification.
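The theoretical tag count follows from four possible bases at each tag position. A short sketch of the arithmetic for the 10-12 base tag lengths mentioned above (the function name `tag_capacity` is hypothetical):

```python
def tag_capacity(tag_length: int) -> int:
    """Number of distinct tags of the given length over the alphabet A, C, G, T."""
    return 4 ** tag_length


for length in (10, 11, 12):
    print(length, tag_capacity(length))
```

This is an upper bound; in practice fewer tags would be usable if, for example, tags are required to differ at several positions to tolerate sequencing errors.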
The present invention also covers applications of determining short tandem repeats in the human genome by high-throughput DNA sequencing (including applications in forensic science and other fields involving the detection of human STR loci) and related products.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (1)

1. A method, not for the purpose of disease diagnosis or treatment, for determining short tandem repeat sequences in the human genome by high-throughput DNA sequencing, characterized in that the method comprises the following steps:
(1) Preparation of multi-sample locus amplicon libraries
An amplicon library consists of DNA fragments with different adapters attached at the two ends: one end carries the sequencing adapter, which may contain a sample tag to distinguish the sequencing results of different samples; the other end carries the fixing adapter, which is used for attachment to capture particles. The amplicon library is prepared by the amplification method or the ligation method:
A. Amplification method: multiplex PCR of multiple samples is performed with specific primer combinations carrying adapter sequences, yielding target fragments with the corresponding adapters, which constitute the sample library;
B. Ligation method: after multiplex PCR of multiple samples with universal primer combinations, the corresponding adapters are attached to the amplicon fragments by blunt-end ligation, constituting the sample library;
(2) High-throughput DNA sequencing
A. Sequencing template preparation: library fragments are fixed to capture particles through the adapter at one end, each particle carrying a single DNA fragment; the template-carrying particles are emulsified with PCR reagents and an oil phase to form water-in-oil microreactors; the whole fragment library is amplified in parallel to form the sequencing templates;
B. DNA sequencing: DNA sequencing is performed on a high-throughput sequencer;
(3) Data analysis and result reporting
A. Data quality control
Raw data are filtered according to read length and quality;
B. Sorting of sequencing information
According to the sample tag sequence information in the sequencing results, the sequencing results are sorted into folders for different samples and different STR loci;
C. Data format conversion
The high-throughput DNA sequencing results are converted into the standard format of current STR genotyping results, expressed as the number of repeats of the core repeat sequence of the STR locus; this step is carried out by constructing a standard "ladder comparison reference sequence" for the locus or by directly reading the number of core repeats; micro-variations in the sample sequence found by alignment with the standard reference sequence may also be displayed, to reflect the polymorphism of the STR locus.
CN201210466090.4A 2012-11-16 2012-11-16 Application of high-pass DNA (Deoxyribonucleic Acid) sequencing method on determination of short tandem repeat gene locus in human genome and method Active CN102943111B (en)

Publications (2)

Publication Number Publication Date
CN102943111A CN102943111A (en) 2013-02-27
CN102943111B true CN102943111B (en) 2014-07-16

Family

ID=47726093

Country Status (1)

Country Link
CN (1) CN102943111B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant