Coded-PCR second-generation sequencing library construction method, kit, and detection method
Technical field
The present invention relates to the field of DNA deep-sequencing detection, and more particularly to high-fidelity deep-sequencing detection of DNA fragments in tissue, plasma, or serum whose target sequences carry mutations at undetermined sites and at low ratios of 0.03%~1%.
Background art
Genetics is a key area of the life sciences. The genetic code of an organism and its variations influence many biological functions and can cause functional changes. Detecting an organism's genetic mutations makes it possible to understand and predict such functional changes, and quantitative detection of those mutations also allows disease progression and prognosis to be assessed accurately, for example the presence, progression, recurrence, or stage (early, middle, or late) of a tumor and the effect of therapy. Detecting genetic mutations has become an important means of analyzing and diagnosing hereditary diseases, tumors, and individual drug-response characteristics, and is used to predict and prevent birth defects, to monitor tumor recurrence regularly, and to monitor the therapeutic effect of personalized tumor medication and chemotherapy.
At present, genetic mutations are detected mainly by first-generation, second-generation, and third-generation sequencing. First-generation sequencing reliably detects variants present in 20% or more of a sample; below 20% its reliability drops substantially, so it cannot detect low-ratio (0.05%~1%) mutation types. Current second-generation sequencing (which comprises two main stages, library construction and on-machine sequencing) at high depth (more than 5000×) can detect mutation types in samples where the mutation ratio is below 10%. Current sequencing relies mainly on two library construction approaches. One is whole-genome library construction, in which a fragmented DNA sample is subjected to non-specific adapter ligation. Ligation efficiency is below 30%, so about 70% of the original template sequence information is lost; in a DNA sample of 10000 genome copies carrying 5 mutant copies, the positives may be lost, giving a false negative. A mutation ratio of 5/10000 therefore cannot be detected accurately. A typical blood draw of 8~10 ml yields 4~5 ml of plasma, from which no more than 30~40 ng of DNA is extracted. At 3.3 pg per genome copy, 30~40 ng of DNA corresponds to about 10000~12000 copies. After the losses of library construction only 2000~3000 copies remain, so an effective 10000× high-depth sequencing is very hard to achieve. Moreover, even at 5000× depth the cost of whole-genome sequencing is prohibitive, which prevents widespread use.
High-depth sequencing is suited to sequencing the target regions of target genes. Although the depth is very high, the amount of target sequence is limited and the cost is moderate, so it can be widely applied to detecting tumor-derived mutant genes in peripheral blood (also called ctDNA detection). Target regions are currently sequenced by either the specific capture method or the PCR-specific amplification method. The specific capture method uses sequence-specific probes to hybridize with and capture target-region fragments: adapters are ligated first (efficiency about 30%), the ligation product is purified, adapter PCR is performed, the amplified product is captured by probe hybridization, and the captured product is amplified again. After ligation and capture, the effective recovery of original templates is about 20%, so 10000 original copies yield only about 2000 DNA molecules derived from original templates. Although PCR amplification produces enough sequences for 5000× or 10000× sequencing, the number of sequences attributable to original templates does not increase; the depth is high, but it does not represent a depth of 10000 original templates. PCR of more than 30 cycles also produces non-negligible mismatch errors, which seriously degrade the fidelity and sensitivity of sequencing.
Current methods also obtain target sequences by site-specific amplification, typically 10~20 cycles of specific amplification followed by 10~15 cycles of adapter amplification, for a total of 20~35 cycles. This likewise introduces serious PCR mismatch errors, which raise the detection noise and lower the signal-to-noise ratio. Base-mismatch extension is unavoidable in gene amplification and produces false-positive mutations; high-fidelity polymerases produce fewer mismatch extensions than ordinary polymerases, but cannot avoid them. Amplification of this kind introduces erroneous mutations at a rate of 0.5%~2%, which seriously compromises the fidelity of deep sequencing and thus limits its use in ctDNA detection (ctDNA sequencing is of real value only if it has a net capability of detecting 1/1000 mutations).
Berry Genomics of China invented the C-smart method for deep sequencing of ctDNA. In this method, molecular-barcode adapters are first ligated to the original templates, 20~30 cycles of amplification are performed, the amplified products are circularized, and the circular DNA is then specifically amplified to achieve targeted library construction and sequencing. False-positive mismatch errors introduced by the many amplification cycles can be reduced by computational deduction based on the molecular barcodes, improving the signal-to-noise ratio. Its disadvantages are that ligation efficiency is low (generally no more than 30%), which causes loss of original template information, and that high-cycle exponential PCR makes low-ratio mutations even lower in proportion or loses them altogether.
Low-ratio mutant genes mostly come from cell-free DNA in the peripheral circulating plasma of tumor patients, or from DNA released by precancerous lesions and early-stage tumor tissue. The extracted DNA consists mostly of short fragments (160~200 bp). To obtain more amplification products containing the mutation site, the amplified region is generally designed to be 70~100 bp long, with the mutation site usually placed in the middle of the PCR product. Mutant genes below 1% are currently detected mainly by digital PCR, in the form of droplet digital PCR or microwell digital PCR, whose performance differs considerably. The method is widely reported in the domestic and international literature. The greatest limitation of digital PCR is that it can quantify only samples with known, well-defined mutation sites; mutations that occur in a commonly mutated fragment but at an unfixed site cannot be detected clearly.
Digital PCR (including droplet PCR and microwell PCR) is a reliable method for detecting low-ratio mutant genes, from 1/100 down to 1/10000. Its obvious limitations are as follows. 1) One digital PCR reaction can detect only 1~2 known, defined site mutations; it cannot reliably detect mutations at multiple undetermined sites in the same sample simultaneously. Tumor variation is complex, and multi-gene, multi-site mutation is an important feature of tumors, so multi-gene, multi-site detection is essential, and current digital PCR cannot adequately meet the need for multi-site mutation detection in plasma. 2) Digital PCR requires a large amount of DNA. A routine single blood draw of 10 ml is enough for only one reaction covering 1~2 sites; there is not enough sample to test more sites. 3) Detecting only 1~2 sites gives too low a coverage for tumors, and the range of application is too narrow. For example, the Kras gene must be tested for mutations in many tumor types; the most common mutations are in codons 12 and 13, which involve 6 base positions, each of which can have 3 variants, and which position is mutated is neither fixed nor known in advance. Digital PCR is unsuitable in such a case. In tumor tissue and in the blood of cancer patients, mutations in multi-site hot-spot regions of this kind are the most common, and the mutation rate of any single site is relatively low, so digital PCR has narrow applicability and has not been widely adopted. 4) Digital PCR is too expensive: the material cost alone of testing one site is 200 to 400 yuan. Digital PCR has therefore been used only in scientific research and cannot yet be used for practical medical testing. Four companies currently supply digital PCR on the market: Raindance (USA), Bio-Rad (USA), Life Technologies (USA), and clari (Singapore). The first two offer water-in-oil droplet PCR, and the latter two offer microwell PCR.
Summary of the invention
The first technical problem to be solved by the present invention is to provide a coded-PCR second-generation sequencing library construction method. A DNA library built by this method can be used to sequence and detect mutant fragments at multiple sites, including sites not known in advance, in multiple gene target regions, and allows the number of original sample DNA molecules detected and the number of original mutant DNA molecules detected to be traced back and calculated, achieving high sensitivity and fidelity.
To solve the above technical problem, the first coded-PCR second-generation sequencing library construction method of the present invention comprises the following steps:
1) extracting sample DNA;
2) using the DNA extracted in step 1) as the template, carrying out a single-strand specific linear amplification reaction with a forward or reverse primer that carries a specific sequence complementary to the template at its 3' end, a common adapter sequence at its 5' end, and a random molecular coding sequence in the middle, and purifying the amplification product;
3) using the purified product of step 2) as the template, carrying out a single-ended specific exponential amplification reaction with a primer pair consisting of a common adapter primer and a reverse or forward primer that carries a specific sequence complementary to the template at its 3' end and a common adapter sequence at its 5' end, and purifying the amplification product; the sequence of the common adapter primer is identical or complementary to the common adapter sequence at the 5' end of the template of this step;
4) using the purified product of step 3) as the template, carrying out a double-ended non-specific exponential amplification reaction with a double-ended common adapter primer pair for sequencing, and purifying the amplification product; the sequences of the double-ended common adapter primer pair are identical or complementary, respectively, to the common adapter sequence in the forward primer of step 2) and the common adapter sequence in the reverse primer of step 3);
5) quantifying and quality-checking the product to obtain the DNA sequencing library.
The sample DNA of step 1) usually consists of short fragments of 150~500 bp.
The random molecular coding sequence of step 2) consists of two random segments with one non-random segment inserted between them, where each random segment has 4 or more bases (preferably 4~5) and the non-random segment has 4~6 bases. Examples are 4N-A4TA-4N, 4N-T4AT-4N, 4N-AGCT-4N, 4N-CTAG-4N, 4N-TACTGT-4N, 5N-A4TA-4N, 5N-A4TA-5N, 4N-T4AT-5N, 5N-T4AT-5N, 4N-AGCT-5N, and 5N-AGCT-5N, where N represents a random base (i.e. any one of dATP, dCTP, dGTP, dTTP). When the primer is synthesized, the base (A, C, G, T) at each position of a random segment is incorporated at random, whereas the base at each position of the non-random segment is fixed according to the base type set in the primer design. Inserting a non-random segment within the random sequence separates the random bases and forms an isolation region, which improves the specificity of the single-strand linear amplification reaction with the randomly coded primer and reduces the formation of non-specific products.
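The layout of such a code can be illustrated with a minimal Python sketch. It assumes the 4N-ATTTTA-4N arrangement that appears in SEQ ID NO:1 (NNNN-ATTTTA-NNNN) and further assumes that a sequencing read begins at the first base of the code; the function names and the read layout are hypothetical and are not part of the disclosed method.

```python
import re

# Assumed layout: 4 random bases, fixed spacer ATTTTA, 4 random bases
# (as in the code region of SEQ ID NO:1). Names here are illustrative only.
SPACER = "ATTTTA"
N_LEFT, N_RIGHT = 4, 4

# Pattern anchored at the read start: assumes the adapter has been trimmed.
CODE_RE = re.compile("^([ACGT]{%d})%s([ACGT]{%d})" % (N_LEFT, SPACER, N_RIGHT))


def code_diversity(n_left=N_LEFT, n_right=N_RIGHT):
    """Number of distinct codes available from the random positions."""
    return 4 ** (n_left + n_right)


def extract_code(read):
    """Return the concatenated random bases, or None if the spacer is absent."""
    match = CODE_RE.match(read)
    if match is None:
        return None
    return match.group(1) + match.group(2)


if __name__ == "__main__":
    print(code_diversity())
    print(extract_code("ACGTATTTTATGCATAGGAAGCCTACGTGATGGC"))
```

Requiring the fixed spacer at its expected position also gives a simple filter for reads whose code region is damaged, which is one reason a sketch like this checks the spacer rather than slicing fixed offsets.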
The primer of step 2) is 50~65 bp long. The reaction system of the forward or reverse single-strand specific linear amplification reaction is: 5× PCR buffer 2 μL, 10 mM dNTP 0.5 μL, DNA template 1 μL, 5 μM primer 1 μL, 5 U/μL KOD DNA polymerase 0.2 μL, H2O 5.8 μL, total volume 10 μL. The reaction conditions (slow-cooling three-step cycle) are: denaturation at 95 °C for 5 minutes; then 2~10 thermal cycles of 95 °C for 30 seconds, annealing for 1 minute at each of 64 °C, 63 °C, 62 °C, 61 °C, 60 °C, 59 °C, and 58 °C, and extension at 68 °C~72 °C for 30 seconds; hold at 16 °C. Besides KOD, the high-fidelity polymerase used in the reaction may also be Pfu, Vent, KAPA high-fidelity enzyme, and the like. This step uses multi-cycle linear amplification, which raises the sampling rate and ensures that the non-duplicate detection rate of initial DNA molecules exceeds 30%. (ctDNA is a low-frequency species within cfDNA. If too few initial cfDNA molecules are detected, for example a detection rate below 10% in a sample containing 10000 DNA copies, the probability of detecting ctDNA at 0.1% is insufficient and false negatives are likely; if the detection rate exceeds 30%, the power of a positive call for 0.1% ctDNA is more than 3 times that at a 10% detection rate, and false negatives become unlikely.) In addition, this step is a single-strand linear amplification: the mutation detection of the present invention examines only one strand of the DNA duplex, which is equivalent to detecting the mutation on the complementary duplex, so that for the same detection performance half of the sequencing volume is saved.
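The effect of the detection rate on a 0.1% ctDNA call can be illustrated with a small Python sketch. It uses a simple independent-sampling (binomial) model in which every original molecule is detected with the same probability; this model and the function names are assumptions made for illustration and are not stated in the text.

```python
def expected_mutant_molecules(total_copies, mutant_fraction, detection_rate):
    """Expected number of mutant original molecules that are detected."""
    return total_copies * mutant_fraction * detection_rate


def prob_at_least_one(total_copies, mutant_fraction, detection_rate):
    """P(at least one mutant molecule detected), assuming each original
    molecule is detected independently with probability detection_rate."""
    n_mutant = round(total_copies * mutant_fraction)
    return 1.0 - (1.0 - detection_rate) ** n_mutant


if __name__ == "__main__":
    # 10000 copies with 0.1% ctDNA, i.e. 10 mutant molecules in the sample.
    for rate in (0.10, 0.30):
        expected = expected_mutant_molecules(10000, 0.001, rate)
        p_hit = prob_at_least_one(10000, 0.001, rate)
        print(f"rate={rate:.2f} expected={expected:.1f} P(>=1)={p_hit:.3f}")
```

Under this model the expected number of detected mutant molecules rises from 1 to 3 when the detection rate goes from 10% to 30%, and the chance of detecting at least one rises from roughly 0.65 to roughly 0.97, which is in line with the threefold figure given above.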
The single-ended specific exponential amplification reaction of step 3) first uses the slow-cooling three-step cycle under the same conditions as step 2), for 4~6 thermal cycles, and then a two-step cycle of denaturation at 95 °C and combined annealing and extension at 68~72 °C, for 15~20 thermal cycles. These reaction conditions help reduce the proportion of non-specific products.
The double-ended non-specific exponential amplification reaction of step 4) uses a conventional three-step cycle (denaturation at 95 °C, annealing at 58 °C, extension at 68 °C) for 15~20 thermal cycles, with a primer concentration of 5 μM. The amplification product is purified with magnetic beads (bead-to-product volume ratio 1.5:1) to retain long fragments of 220 bp or more, which helps remove non-target products of 180 bp or less. The double-ended common adapter primer pair for sequencing of step 4) consists of the P5 adapter primer (SEQ ID NO:3) and the P7 adapter primer (SEQ ID NO:4) when an Illumina second-generation sequencer is used, and of the PGM/A adapter primer and the PGM/B adapter primer when the semiconductor-chip sequencer of Life Technologies is used; the sequences of the latter are as follows:
PGM/A adapter-primers:CCATCTCATCCCTGCGTGTCTCCGACTCAG(SEQ ID NO:5)
PGM/B adapter-primers:CCTCTCTATGGGCAGTCGGTGAT(SEQ ID NO:6)
The second technical problem to be solved by the present invention is to provide a coded-PCR second-generation sequencing detection method for gene mutation detection. In this method a DNA sequencing library is prepared by the above library construction method and subjected to high-throughput sequencing, and the sequencing data are subjected to traceability analysis. The traceability analysis includes error identification, authenticity discrimination, mutation discrimination, and calculation of the mutation ratio, and specifically comprises the following steps:
1) sorting the data by target gene to form multiple sub-datasets;
2) sorting each sub-dataset by molecular code, defining each distinct molecular code as one molecular family, counting the molecular families, and counting the reads within each family;
3) determining the validity of each molecular family: a family whose read count is below a preset number is judged invalid, and a family whose read count is greater than or equal to the preset number is judged valid; the preset number is a natural number greater than 5;
4) aligning the reads in each molecular family to the reference sequence, and recording and counting mismatched and non-mismatched reads; if the number of non-mismatched reads is greater than 0, the family is judged to be a wild-type molecular family; if it equals 0, proceeding to the mutant family discrimination of step 5);
5) discriminating mutant molecular families: if the consistency of the mutation site and genotype exceeds 90%, the family is judged to be a mutant molecular family; otherwise it is judged to be a wild-type molecular family;
6) compiling traceability statistics and calculating the mutation ratio as follows: traced molecular mutation ratio = number of mutant molecular families / (number of mutant molecular families + number of wild-type molecular families).
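Steps 2) to 6) can be sketched in Python for a single target region as follows. This is an illustrative reading, not the disclosed implementation: it assumes reads have already been sorted by target, trimmed to the same length as the reference, and paired with their molecular code, and it interprets "consistency of mutation site and genotype" as the share of reads in a family that carry exactly the same set of base changes. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def classify_families(reads, reference, min_reads=6, consistency=0.90):
    """reads: iterable of (molecular_code, sequence) for one target region.
    Returns {code: 'invalid' | 'wild' | 'mutant'}."""
    families = defaultdict(list)
    for code, seq in reads:                      # step 2: group by code
        families[code].append(seq)

    calls = {}
    for code, seqs in families.items():
        if len(seqs) < min_reads:                # step 3: validity
            calls[code] = "invalid"
            continue
        if any(seq == reference for seq in seqs):  # step 4: any exact match
            calls[code] = "wild"
            continue
        # Step 5: every read mismatches; count reads per mismatch signature,
        # where a signature is the tuple of (position, observed base).
        signatures = Counter(
            tuple((i, b) for i, (a, b) in enumerate(zip(reference, seq)) if a != b)
            for seq in seqs
        )
        top_count = signatures.most_common(1)[0][1]
        calls[code] = "mutant" if top_count / len(seqs) > consistency else "wild"
    return calls


def traced_mutation_ratio(calls):
    """Step 6: mutant families / (mutant families + wild-type families)."""
    mutant = sum(1 for c in calls.values() if c == "mutant")
    wild = sum(1 for c in calls.values() if c == "wild")
    return mutant / (mutant + wild) if (mutant + wild) else 0.0
```

Because the ratio is computed over families rather than over reads, a family amplified to many reads counts once, which is what allows the result to be read as a count of original molecules.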
The third technical problem to be solved by the present invention is to provide a coded-PCR second-generation sequencing library construction kit. The kit includes a DNA sequencing library prepared by the above library construction method and can be used for high-throughput sequencing of low-frequency mutant genes at 0.03%~1%.
The fourth technical problem to be solved by the present invention is to provide primer pairs for the above library construction method. The forward or reverse primer has the sequence shown in SEQ ID NO:1, and the reverse or forward primer has the sequence shown in SEQ ID NO:2; alternatively, the forward or reverse primer has one or more of the sequences shown in SEQ ID NO:7~9, and the reverse or forward primer has one or more of the sequences shown in SEQ ID NO:2 and SEQ ID NO:10~11. N in the sequences is any one of A, C, G, T.
The present invention takes as template gene fragments whose mutation sites within the target region are unknown. It first carries out a linear single-strand synthesis reaction guided by a single-ended specific primer carrying a random molecular code, converting the bases at the series of sites on the original template into a complementary sequence tagged with a molecular code. After purification, it carries out a single-ended specific exponential amplification guided by the single-ended specific primer of the opposite side, which reduces loss of the codes of the target gene sequences and increases product specificity. It then carries out multi-cycle non-specific exponential amplification with common adapter primers to obtain enough complete sequences for high-throughput sequencing. Compared with existing library construction and sequencing analysis methods, the method of the present invention has the following advantages and beneficial effects:
1. High detection specificity. The false-positive rate can be 0.01% or lower, a 200-fold improvement in specificity over the 2% of conventional double-strand PCR amplification. This is because single-molecule coded traceability analysis (each code defines one family containing multiple sequencing results, and a mutation is called only if the consistency at the mutation site reaches 90% or more) eliminates the random mismatch errors generated by PCR amplification.
2. Because the present invention samples one strand of the DNA duplex rather than both, one molecular family represents one DNA duplex, and two molecular families detected with the same mutation at the same site represent the same mutation in two DNA molecules. The detection method is therefore highly sensitive, with a detection limit of 0.03% mutation, about 3 times better than double-strand PCR amplification and adapter-ligation library construction (both of which have a detection limit of 0.1%). The detection rate of initial DNA molecules can reach 30%~70%, and the detection limit for mutant molecules can reach 0.03%~0.1%. In addition, with linear amplification using single-molecule coded specific primers, even when the number of sample DNA molecules is fixed, the sampling rate of sample DNA molecules, and hence the sensitivity of mutation detection, can be raised by increasing the number of cycles; the sensitivity of the method is therefore adjustable and controllable.
3. Molecular-code traceability analysis allows digital quantitative sequencing of initial DNA molecules and calculation of the number of initial DNA molecules detected, so that molecule counts and mutation ratios can be quantified accurately, solving the problem that rare, low-ratio mutant genes could not be quantified.
4. Commonly used second-generation sequencing library construction methods all have difficulty detecting samples in which the mutant gene ratio is below 1%, whereas the present invention, with single-ended molecular-coded PCR library construction followed by second-generation sequencing, can detect low-ratio mutant genes at 0.03%~1% and can be applied to the design and preparation of library construction kits for second-generation sequencing of low-frequency mutant DNA.
5. Multiplex mutation detection is possible. A blood sample of routine volume can be tested simultaneously for multiple genes, multiple target regions, and multiple sites, and unknown mutation sites can be detected and discovered.
6. It avoids the low ligation efficiency and high loss of original-template ligation methods such as C-smart, and the amplification bias of PCR methods that makes low-ratio mutations lower still or loses them.
7. The cost of detecting low-ratio mutant genes is greatly reduced.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of the forward single-strand specific linear amplification reaction of Embodiment 1 of the present invention.
Fig. 2 is a schematic diagram of the principle of the reverse single-ended specific exponential amplification reaction of Embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of the exponential amplification reaction with double-ended common adapter primers of Embodiment 1 of the present invention.
Detailed description of the embodiments
The following embodiments only illustrate the present invention and do not limit its scope. Experimental methods for which no specific conditions are given in the embodiments generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer.
Embodiment 1: Detection of EGFR EXON20 mutations in peripheral blood plasma
1. Sample DNA extraction
An appropriate volume (3~5 ml) of the subject's peripheral blood plasma was taken, and its cell-free genomic DNA was extracted with a cell-free DNA extraction kit (DK607-01, supplied by Shanghai Lifefeng Biotech Co., Ltd.).
2. Forward single-strand specific linear amplification reaction
Forward primer EGFR EXON20-55bp-F:
TACACGACGCTCTTCCGATCTNNNNATTTTANNNNTAGGAAGCCTACGTGATGGC(SEQ ID NO:1)
The forward single-strand specific linear amplification reaction system is: 5× PCR buffer 2 μL, 10 mM dNTP 0.5 μL, DNA template 1 μL (containing 24.75 ng of DNA), 5 μM primer 1 μL, 5 U/μL KOD DNA polymerase 0.2 μL, H2O 5.8 μL, total volume 10 μL.
The reaction conditions are shown in Table 1; a slow-cooling three-step thermal cycle is used.
Table 1. Forward single-strand specific linear amplification reaction conditions
3. Purification of the forward single-strand specific linear amplification product
Commonly used library purification magnetic beads were added to the reaction tube containing the forward single-strand specific linear amplification product, and the product was purified following the standard instructions for second-generation sequencing library purification.
4. Reverse single-ended specific exponential amplification reaction
The magnetic-bead-purified product above was subjected to the reverse single-ended specific exponential amplification reaction with the following primer pair:
Reverse primer EGFR EXON20-49bp-R:
GACTGGAGTTCCTTGGCACCCGAGAATTCCA-GCAGCCGAAGGGCATGAG(SEQ ID NO:2)
P5 common adapter primer:
AATGATACGGCGACCACCGAGATCTACAC-TCTTTCCC-TACACGACGCTCTTCCGATCT (SEQ ID NO:3)
The amplification reaction system is the same as in step 2. The reaction conditions are: first the slow-cooling three-step thermal cycle (see Table 1) for 5~6 cycles, then a two-step thermal cycle (denaturation at 95 °C, combined annealing and extension at 68 °C) for 20 cycles.
5. Purification of the reverse single-ended specific exponential amplification product
Using commonly used library purification magnetic beads and following the standard instructions for second-generation sequencing library purification, the single-ended specific exponential amplification product obtained in step 4 was purified.
6. Double-ended non-specific exponential amplification reaction
Using the purified product of step 5 as the template, an exponential amplification reaction was carried out with the double-ended common adapter primer pair of the Illumina second-generation sequencing system (5 μM, 1 μL each), where
the sequence of the P5 common adapter primer is:
AATGATACGGCGACCACCGAGATCTACAC-TCTTTCCC-TACACGACGCTCTTCCGATCT (SEQ ID NO:3)
and the sequence of the P7 common adapter primer is:
CAAGCAGAAGACGGCATACGAGATTACAGACGGTGACTG-GAGTTCCTTGGCACCCGAGA (SEQ ID NO:4)
The amplification reaction system is the same as in step 2. The reaction conditions are a conventional three-step cycle (denaturation at 95 °C, annealing at 58 °C, extension at 68 °C) for 20 cycles.
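The relationship between the adapter primers and the gene-specific primers of this embodiment can be checked with a short Python sketch: the 3' end of each adapter primer should coincide with the common adapter sequence at the 5' end of the corresponding specific primer. The sequences are copied from SEQ ID NO:1 to 4 with the hyphens removed; the helper name is hypothetical, and the check is offered only as an illustration of the design.

```python
def adapter_overlap(adapter_primer, specific_primer):
    """Length of the longest suffix of adapter_primer that equals a prefix
    of specific_primer (0 if the two do not overlap)."""
    longest = min(len(adapter_primer), len(specific_primer))
    for k in range(longest, 0, -1):
        if adapter_primer.endswith(specific_primer[:k]):
            return k
    return 0


# Sequences as listed in the embodiment, hyphens removed.
P5 = "AATGATACGGCGACCACCGAGATCTACAC-TCTTTCCC-TACACGACGCTCTTCCGATCT".replace("-", "")
P7 = "CAAGCAGAAGACGGCATACGAGATTACAGACGGTGACTG-GAGTTCCTTGGCACCCGAGA".replace("-", "")
FWD = "TACACGACGCTCTTCCGATCTNNNNATTTTANNNNTAGGAAGCCTACGTGATGGC"        # SEQ ID NO:1
REV = "GACTGGAGTTCCTTGGCACCCGAGAATTCCA-GCAGCCGAAGGGCATGAG".replace("-", "")  # SEQ ID NO:2

if __name__ == "__main__":
    print("P5 / forward primer overlap:", adapter_overlap(P5, FWD))
    print("P7 / reverse primer overlap:", adapter_overlap(P7, REV))
```

A non-zero overlap on both sides indicates that the adapter primers can anneal to products carrying the common adapter sequences introduced by the specific primers in the earlier steps.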
7. Purification of the double-ended non-specific exponential amplification product, quantification and quality check, and sequencing
The double-ended non-specific exponential amplification product of step 6 was purified with magnetic beads by the conventional method, at a bead-to-product volume ratio of 1.5:1. It was then quantified and quality-checked (using a conventional Qubit instrument, operated according to the instructions of the instrument and the quantification reagent); a quantity above 10 ng was considered acceptable, and the library was sent for second-generation sequencing on an Illumina NextSeq 500 sequencer.
8. Analysis of results
Second-generation sequencing yielded 200000 reads. Traceability analysis was performed on the data to determine the presence, type, and number of mutant genes.
The specific analysis steps are: 1) The data are sorted by target gene using conventional methods, forming multiple sub-datasets. 2) Each sub-dataset is sorted by the random 8-base molecular code; each distinct molecular code defines one molecular family, the molecular families are counted, and the reads in each family are counted. 3) Validity of molecular families: a family whose read count is below the preset number (preset number > 5; set to 6 in this embodiment) is judged invalid; one whose read count is greater than or equal to the preset number is judged valid. 4) The reads in each family are aligned to the reference sequence, mismatched and non-mismatched reads are recorded and counted, and the family is assessed as wild type or not: if the number of non-mismatched reads is > 0, the family is judged wild type; if it is 0, the family proceeds to mutant discrimination. 5) Discrimination of mutant families: the reads in the family are compared with the reference sequence, and if the consistency of mutation site and genotype is > 90% the family is judged a true mutant; otherwise it is judged wild type. 6) Traceability statistics: the mutation ratio is calculated as traced molecular mutation ratio = number of mutant molecular families / (number of mutant molecular families + number of wild-type molecular families).
Table 2 compares the second-generation sequencing results for the EGFR EXON20 gene obtained with random-molecular-coded PCR library construction (Embodiment 1 of the present invention) and without molecular-coded PCR library construction (i.e. conventional amplicon library construction).
Table 2. Comparison of second-generation sequencing results for the EGFR EXON20 gene with the two library construction methods
As Table 2 shows, for an entirely normal DNA sample, library construction without coded PCR, i.e. conventional amplicon library construction, gives second-generation sequencing results in which most mutation ratios exceed the 1% level. With the coded-PCR library construction of this embodiment, the mutation ratio is rarely at the 0.5% level and the great majority are below the 0.02% level. Of the 60 sites, all except sites 20, 26, 35, and 41 have a mutation ratio of 0; that is, more than 90% of sites in the normal gene show zero mutation. This shows that the random-molecular-coded PCR library construction and second-generation sequencing method of the present invention eliminates errors more effectively than the ordinary amplicon library construction and second-generation sequencing method. In addition, it can be estimated from Table 2 that the proportion of initial DNA molecules detected by the random-molecular-coded PCR library construction and second-generation sequencing of this embodiment is about 48%.
The above calculation of the proportion of initial DNA molecules detected takes into account repeated detection (hereinafter, re-detection) during multi-cycle linear amplification. The estimation method is as follows:
1. Initial number of DNA molecules in the sample = DNA mass (ng) × 1000 / 3.3 (one human genome copy = 3.3 pg).
2. Number of molecules synthesized per linear amplification cycle = total number of molecular families synthesized over all linear amplification cycles / number of linear cycles.
3. Cumulative non-re-detected molecules after each cycle = cumulative non-re-detected molecules after the previous cycle + molecules synthesized by linear amplification in this cycle × (1 − re-detection proportion of this cycle).
4. Cumulative non-re-detected proportion after each cycle = cumulative non-re-detected molecules after this cycle / initial number of DNA molecules in the sample.
5. Re-detection proportion of each cycle = cumulative non-re-detected proportion after the previous cycle.
By taking the present embodiment as an example,
Initial DNA molecular number=24.75 of sample × 1000/3.3=7500 mankind's DNA copy;
Total family of molecule number of the accumulative cycle synthesis of 8 linear amplifications is 4784 (molecular numbers of tracing to the source for representing detection), then line
Property amplification single loop synthesis molecular number=4784/8=598;
Often the accumulative non-heavy inspection molecular number of wheel cycle, accumulative non-heavy inspection molecule accounting, repeat the calculating of detection accounting referring to
Shown in table 3, by 8 cycle linear amplifications after, add up it is non-it is heavy inspection molecule accounting be 48.56% (the i.e. the 8th wheel cycle add up
Non- heavy inspection molecule accounting, that is, the initial DNA molecular number estimated detects accounting)
Table 3
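The five estimation steps can be reproduced with a short script. The following is a minimal sketch, not part of the described method: the function name and the row layout are illustrative assumptions, and only the input values (24.75 ng, 4784 families, 8 cycles, 3.3 pg per copy) are taken from the text.

```python
def estimate_detected_fraction(mass_ng, total_families, cycles, pg_per_copy=3.3):
    """Apply estimation steps 1 to 5 cycle by cycle (illustrative helper, name assumed)."""
    initial = mass_ng * 1000 / pg_per_copy   # step 1: initial DNA molecule number
    per_cycle = total_families / cycles      # step 2: molecules synthesized per cycle
    cumulative = 0.0                         # cumulative non-re-detected molecule number
    fraction = 0.0                           # cumulative non-re-detected molecule fraction
    rows = []
    for cycle in range(1, cycles + 1):
        redetect = fraction                        # step 5: previous cycle's fraction
        cumulative += per_cycle * (1 - redetect)   # step 3
        fraction = cumulative / initial            # step 4
        rows.append((cycle, cumulative, fraction, redetect))
    return initial, per_cycle, rows


# Values of this embodiment: 24.75 ng DNA, 4784 families, 8 linear cycles.
initial, per_cycle, rows = estimate_detected_fraction(24.75, 4784, 8)
for cycle, cumulative, fraction, redetect in rows:
    print(f"cycle {cycle}: cumulative={cumulative:.1f} "
          f"fraction={fraction:.2%} re-detection={redetect:.2%}")
```

The recursion is equivalent to the closed form 1 - (1 - 598/7500)^8, which gives the 48.56% stated for the 8th cycle.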
Embodiment 2: Multiplex mutation detection of the EGFR EXON18, EGFR EXON19 and EGFR EXON20 genes in the
peripheral blood plasma of a tumor patient sample
1. Sample DNA extraction
Take 3~5 mL of the subject's peripheral blood plasma and extract its cell-free genomic DNA with a
cell-free DNA extraction kit (DK607-01, supplied by Shanghai Lay Maple Bio Tech Ltd.).
2. Forward single-strand specific linear amplification reaction
The forward primer sequences are as follows:
EGFR EXON20-55bp-F:
TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNTAGGAAGCCTACGTGATGGC (SEQ ID NO: 7)
EGFR EXON18-59bp-F:
TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNGAGATCTTGAAGGAAACTGAATTC (SEQ ID NO: 8)
EGFR EXON19-58bp-F:
TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNGAAAGTTAAAATTCCCGTCGCTA (SEQ ID NO: 9)
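The three forward primers share one layout: a common 21-nt 5' segment, an NNNN-TAAAAT-NNNN block, and a gene-specific 3' segment. The sketch below splits each primer into these parts and checks that the length embedded in the primer name matches the sequence. Reading the NNNN-TAAAAT-NNNN block as the random molecular code with a fixed spacer is an assumption made here, and the helper names are hypothetical.

```python
import re

# Common 5' segment shared by the three forward primers listed in the text.
COMMON_5P = "TACACGACGCTCTTCCGATCT"
# Assumed reading: two 4-nt random blocks (N) flanking a fixed TAAAAT spacer.
CODE_REGION = re.compile(r"N{4}TAAAATN{4}")

FORWARD_PRIMERS = {
    "EGFR EXON20-55bp-F": "TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNTAGGAAGCCTACGTGATGGC",
    "EGFR EXON18-59bp-F": "TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNGAGATCTTGAAGGAAACTGAATTC",
    "EGFR EXON19-58bp-F": "TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNGAAAGTTAAAATTCCCGTCGCTA",
}


def split_forward_primer(primer):
    """Return (common 5' segment, code region, gene-specific segment)."""
    if not primer.startswith(COMMON_5P):
        raise ValueError("primer does not start with the common 5' segment")
    rest = primer[len(COMMON_5P):]
    match = CODE_REGION.match(rest)
    if match is None:
        raise ValueError("code region NNNN-TAAAAT-NNNN not found")
    return COMMON_5P, match.group(0), rest[match.end():]


def named_length(name):
    """Read the length embedded in a primer name such as 'EGFR EXON20-55bp-F'."""
    return int(re.search(r"-(\d+)bp-", name).group(1))


for name, seq in FORWARD_PRIMERS.items():
    common, code, specific = split_forward_primer(seq)
    # The length in the name should equal the full primer length.
    assert len(seq) == named_length(name), name
    print(name, len(common), len(code), len(specific), specific)
```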
The forward single-strand specific linear amplification reaction system is: 5× PCR buffer, 2 μL; 10 mM
dNTP, 0.5 μL; DNA template, 1 μL (containing 33 ng DNA); 5 μM primer, 1 μL; 5 U/μL KOD DNA polymerase,
0.2 μL; H2O, 5.8 μL; total volume 10 μL.
The reaction conditions are the same as in Embodiment 1.
3. Purification of the forward single-strand specific linear amplification product
Common library construction purification magnetic beads are added to the reaction tube containing the
forward single-strand specific linear amplification product, and the product is purified following the
instructions for conventional second-generation sequencing library purification.
4. Reverse single-end specific exponential amplification reaction
A reverse single-end specific exponential amplification reaction is carried out on the above product after
magnetic bead purification. The primer set used in the reaction is as follows:
Reverse primers:
EGFR EXON20-49bp-R:
GACTGGAGTTCCTTGGCACCCGAGAATTCCA-GCAGCCGAAGGGCATGAG (SEQ ID NO: 2)
EGFR EXON18-55bp-R:
GACTGGAGTTCCTTGGCACCCGAGAATTCCA-CAGGGACCTTACCTTATACACCGT (SEQ ID NO: 10)
EGFR EXON19-54bp-R:
GACTGGAGTTCCTTGGCACCCGAGAATTCCA-CAGCAAAGCAGAAACTCACATCG (SEQ ID NO: 11)
P5 universal adapter primer:
AATGATACGGCGACCACCGAGATCTACAC-TCTTTCCC-TACACGACGCTCTTCCGATCT (SEQ ID NO: 3)
The amplification reaction system and reaction conditions are the same as in Embodiment 1.
5. Purification of the single-end specific exponential amplification product
The single-end specific exponential amplification product obtained in step 4 is purified with common
library construction purification magnetic beads, following the instructions for conventional
second-generation sequencing library purification.
6. Double-end non-specific exponential amplification reaction
Using the purified product obtained in step 5 as the template, an exponential amplification reaction is
carried out with the double-end universal adapter primer pair of the Illumina second-generation sequencing
system (5 μM, 1 μL each), wherein:
The sequence of the P5 universal adapter primer is:
AATGATACGGCGACCACCGAGATCTACAC-TCTTTCCC-TACACGACGCTCTTCCGATCT (SEQ ID NO: 3)
The sequence of the P7 universal adapter primer is:
CAAGCAGAAGACGGCATACGAGATTACAGACGGTGACTG-GAGTTCCTTGGCACCCGAGA (SEQ ID NO: 4)
The amplification reaction system and reaction conditions are the same as in Embodiment 1; the number of
cycles is 15.
7. Purification of the double-end non-specific exponential amplification product, quantitative quality
control, and sequencing
The double-end non-specific exponential amplification product of step 6 is purified with magnetic beads by
the conventional method and subjected to quantitative quality control (quantified with a conventional
Qubit instrument, operated according to the instructions of the instrument and of the quantitative
detection reagent). A quantity of more than 10 ng is qualified, and the library is sent to an Illumina
NextSeq 500 sequencer for second-generation sequencing.
8. Analysis of results
Second-generation sequencing yielded 1,695,790 reads. Traceability analysis is performed on the data
obtained to determine the presence or absence, type and number of mutant genes. The specific analysis
steps are the same as in Embodiment 1.
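The specific traceability analysis steps are those of Embodiment 1 and are not reproduced in this passage. As a rough illustration only, the sketch below groups reads into molecule families by their random code and builds a per-family consensus. It assumes that each read starts with the 4-nt random block, the TAAAAT spacer and the second 4-nt random block carried by the forward primers; the function names, the toy reads and the `min_agreement` threshold are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed read layout: NNNN + TAAAAT + NNNN, followed by the target sequence.
CODE_RE = re.compile(r"^([ACGT]{4})TAAAAT([ACGT]{4})")


def group_by_code(reads):
    """Group reads into families keyed by the 8-nt random code (spacer removed)."""
    families = defaultdict(list)
    for read in reads:
        match = CODE_RE.match(read)
        if match is None:
            continue  # reads without a recognizable code are skipped
        families[match.group(1) + match.group(2)].append(read[match.end():])
    return dict(families)


def family_consensus(inserts, min_agreement=0.6):
    """Majority base per position; 'N' where agreement is below the threshold."""
    length = min(len(s) for s in inserts)
    consensus = []
    for i in range(length):
        base, count = Counter(s[i] for s in inserts).most_common(1)[0]
        consensus.append(base if count / len(inserts) >= min_agreement else "N")
    return "".join(consensus)


# Toy reads (invented for illustration, not sequencing data from the embodiment).
toy_reads = [
    "ACGTTAAAATGGCA" + "TAGGAAGCCTACGTGATGGC",
    "ACGTTAAAATGGCA" + "TAGGAAGCCTACGTGATGGC",
    "ACGTTAAAATGGCA" + "TAGGAAGCCTACGTGATGGA",  # one copy carries an error
    "TTTTTAAAATCCCC" + "TAGGAAGCCTACGTGATGGC",
    "TAGGAAGCCTACGTGATGGC",                      # no code, ignored
]

families = group_by_code(toy_reads)
for code, inserts in families.items():
    print(code, len(inserts), family_consensus(inserts))
```

Under this sketch, the number of distinct codes plays the role of the family (traceable molecule) count, and an error present in only one copy of a family is outvoted by the other copies.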
Table 4 shows the second-generation sequencing results for the EGFR EXON18, EGFR EXON19 and EGFR EXON20
genes obtained in Embodiment 2 of the present invention using molecular coding PCR library construction.
Table 4 Measured results for plasma ctDNA
As can be seen from Table 4, for the DNA sample of a tumor patient, the coding PCR second-generation
sequencing library construction method of Embodiment 2 of the present invention can detect mutants at the
0.1% level and above. The method for estimating the non-repeated detection fraction of DNA molecules is the
same as in Embodiment 1.
Sequence listing
<110>Shanghai Jizhi Biomedical Science and Technology Co., Ltd.
<120>Coding PCR second-generation sequencing library construction method, kit and detection method
<130> CPC-NP-16-100362
<160> 11
<170> PatentIn version 3.3
<210> 1
<211> 55
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 1
tacacgacgc tcttccgatc tnnnnatttt annnntagga agcctacgtg atggc 55
<210> 2
<211> 49
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 2
gactggagtt ccttggcacc cgagaattcc agcagccgaa gggcatgag 49
<210> 3
<211> 58
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 3
aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct tccgatct 58
<210> 4
<211> 59
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 4
caagcagaag acggcatacg agattacaga cggtgactgg agttccttgg cacccgaga 59
<210> 5
<211> 30
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 5
ccatctcatc cctgcgtgtc tccgactcag 30
<210> 6
<211> 23
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 6
cctctctatg ggcagtcggt gat 23
<210> 7
<211> 55
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 7
tacacgacgc tcttccgatc tnnnntaaaa tnnnntagga agcctacgtg atggc 55
<210> 8
<211> 59
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 8
tacacgacgc tcttccgatc tnnnntaaaa tnnnngagat cttgaaggaa actgaattc 59
<210> 9
<211> 58
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 9
tacacgacgc tcttccgatc tnnnntaaaa tnnnngaaag ttaaaattcc cgtcgcta 58
<210> 10
<211> 55
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 10
gactggagtt ccttggcacc cgagaattcc acagggacct taccttatac accgt 55
<210> 11
<211> 54
<212> DNA
<213>Artificial sequence
<220>
<221> misc_feature
<223>Primer
<400> 11
gactggagtt ccttggcacc cgagaattcc acagcaaagc agaaactcac atcg 54