CN108504649A

CN108504649A - Coded PCR second-generation sequencing library construction method, kit, and detection method

Info

Publication number: CN108504649A
Application number: CN201710102701.XA
Authority: CN
Inventors: 黄新华; 戴慧清
Original assignee: Shanghai Based Biopharmaceutical Technology Co Ltd
Current assignee: Jiaxing Jinfukang Medical Laboratory Co Ltd
Priority date: 2017-02-24
Filing date: 2017-02-24
Publication date: 2018-09-07
Anticipated expiration: 2037-02-24
Also published as: CN108504649B

Abstract

The invention discloses a coded PCR second-generation sequencing library construction method comprising the steps of: 1) extracting DNA; 2) performing a forward single-strand linear amplification reaction with a single-end specific primer carrying a random molecular code sequence; 3) performing a single-end specific exponential amplification reaction with a primer pair consisting of the opposite-side single-end specific primer and a common adapter primer; 4) performing an exponential amplification reaction with the paired-end common adapter primer pair used for sequencing; and 5) quantification and quality inspection to obtain a DNA library. The invention also discloses a method for sequencing detection using this DNA library, and a coded PCR second-generation sequencing library construction kit comprising the DNA library. By encoding the original DNA template sequences of the sample during the initial complementary synthesis, the invention allows sequencing results to be traced back, via the codes, to the true sequence of the initial DNA template and to the true number of molecules detected, thereby significantly reducing false-positive and false-negative mutation calls.

Description

Coded PCR second-generation sequencing library construction method, kit, and detection method

Technical field

The present invention relates to the technical field of DNA deep-sequencing detection, and more particularly to high-fidelity deep-sequencing detection of DNA fragments in tissue, plasma, or serum that carry mutant target sequences at a low proportion of 0.03% to 1% and whose mutation sites are not known in advance.

Background art

Genetics is a key area of the life sciences. The genetic code sequence of an organism, and variation in it, can affect many biological functions and cause functional variation. Detecting an organism's genetic variation makes it possible to understand and predict such functional variation, and quantitative detection of genetic variation can also accurately track disease progression and prognosis in an affected individual, for example the progression or recurrence of a tumor, whether it is at an early, middle, or late stage, and the effect of treatment. Detection of genetic variation has become an important means of analyzing and diagnosing hereditary diseases, tumors, and individual drug-response characteristics. It is used to predict and prevent birth defects, to monitor tumor recurrence at regular intervals, and to monitor the therapeutic effect of personalized tumor medication and chemotherapy.

At present, the main methods for detecting genetic variation are first-generation, second-generation, and third-generation sequencing. First-generation sequencing can reliably detect a variant present in 20% or more of a sample; below 20% its reliability drops sharply, so it cannot detect low-proportion (0.05% to 1%) variants. Current second-generation sequencing (which comprises two major stages, library construction and on-machine sequencing) at high depth (above 5000×) can detect variants present at less than 10% of a sample. There are currently two main kinds of sequencing library construction. One is whole-genome library construction, in which fragmented sample DNA is ligated to non-specific adapters. The ligation efficiency is below 30%, so about 70% of the original template sequence information is lost; for a sample with 5 mutant copies among 10000 genomic DNA copies, losing the positive templates can produce a false negative. A mutation present at 5/10000 therefore cannot be detected accurately. A typical blood draw of 8 to 10 ml yields no more than 4 to 5 ml of plasma, from which 30 to 40 ng of DNA is extracted; at 3.3 pg per genome copy, 30 to 40 ng of DNA corresponds to roughly 10000 to 12000 copies. After the losses of library construction only 2000 to 3000 copies remain, which makes an effective 10000× high-depth sequencing very difficult to achieve. Moreover, even at 5000× depth the cost of whole-genome sequencing is prohibitive, so it cannot be widely adopted.
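The copy-number arithmetic in this paragraph can be sketched in a few lines of Python. The function names are hypothetical, and the 30% figure is used here as an upper bound on ligation efficiency, as stated in the text; this is an illustration of the arithmetic, not part of the patented method.

```python
PG_PER_GENOME_COPY = 3.3  # mass of one human genome copy used in the text, in picograms


def genome_copies(dna_ng, pg_per_copy=PG_PER_GENOME_COPY):
    """Convert a DNA mass in nanograms into an approximate number of genome copies."""
    return dna_ng * 1000.0 / pg_per_copy


def copies_after_ligation(copies, ligation_efficiency):
    """Expected number of copies that survive adapter ligation (hypothetical helper)."""
    return copies * ligation_efficiency


for ng in (30, 40):
    total = genome_copies(ng)
    kept = copies_after_ligation(total, 0.30)  # 0.30 is the upper bound quoted in the text
    print(f"{ng} ng: about {total:.0f} copies, at most about {kept:.0f} after ligation")
```

Because the efficiency is quoted as "below 30%", the retained figures printed here are upper bounds, which is consistent with the 2000 to 3000 copies cited above.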

High-depth sequencing is suited to sequencing the target regions of target genes. Although the depth is very high, the target sequence is limited, so the cost is moderate and the approach can be widely applied; it can address the detection of tumor mutations in peripheral blood (also called ctDNA detection). Current methods for target-region sequencing are specific capture and PCR-specific amplification. Specific capture uses sequence-specific probes to hybridize to and capture target-region fragments: adapters are ligated first (efficiency about 30%), the ligation product is purified, adapter PCR is performed, the amplified product is captured by probe hybridization, and the captured product is amplified again. After ligation and capture, the actual yield attributable to the original templates is about 20%, so 10000 original copies yield about 2000 DNA molecules derived from original templates. Although PCR amplification can produce enough sequence for 5000× or 10000× sequencing, the number of sequences attributable to original templates does not increase; the depth is high, but it does not represent a depth of 10000 original templates. PCR of more than 30 cycles also introduces non-negligible misincorporation errors, which seriously affect the fidelity and sensitivity of sequencing.

Current methods also use site-specific amplification to obtain the target sequence: typically 10 to 20 cycles of specific amplification followed by 10 to 15 cycles of adapter amplification, 20 to 35 cycles in total. This likewise introduces serious PCR misincorporation errors, which increase detection noise and lower the signal-to-noise ratio. Base misincorporation during extension is unavoidable in gene amplification and causes false-positive mutations; high-fidelity polymerases make fewer such errors than ordinary polymerases, but still cannot avoid them. Amplification of this kind introduces erroneous mutations at a rate of up to 0.5% to 2%, which seriously compromises the fidelity of deep sequencing and therefore limits its application to ctDNA detection (ctDNA sequencing is of real value only if it can reliably detect mutations present at 1/1000).

The Chinese company Berry Genomics (贝瑞和康) invented the C-smart method for deep sequencing of ctDNA. In this method, the first step ligates molecular barcode adapters to the original templates; the product is then amplified for 20 to 30 cycles, the amplified product is circularized, and circular-DNA-specific amplification is performed to achieve targeted library construction and sequencing. The false-positive misincorporation errors introduced by the many amplification cycles can be reduced by subtracting them through molecular barcode calculation, which improves the signal-to-noise ratio. The disadvantages are that the ligation efficiency is low (generally no more than 30%), which causes loss of original template information, and that high-cycle exponential PCR can make a low mutant proportion even lower or cause the mutation to be lost entirely.

Low-proportion mutant genes are mostly found in cell-free DNA from the peripheral circulating plasma of tumor patients, or in DNA released by precancerous and early-stage lesion tissue. The extracted DNA consists mostly of short fragments (160 to 200 bp). To obtain more amplification products containing the mutation site, the amplified region is generally designed to be 70 to 100 bp long, with the mutation site usually placed in the middle of the PCR product. The main method currently used to detect mutants present below 1% is digital PCR, in the form of droplet digital PCR or microwell digital PCR, whose performance differs considerably. The method is widely reported in the domestic and international literature. The greatest limitation of digital PCR is that it can only quantify samples whose mutation site is known and well defined; it cannot reliably detect mutations that lie in a commonly mutated fragment but at a site that is not fixed.

Digital PCR (including droplet PCR and microwell PCR) is a reliable method for detecting low-proportion mutant genes and can detect mutants present at 1/100 to 1/10000. Its evident limitations are: 1) One digital PCR reaction can detect only 1 or 2 known, defined mutation sites; it cannot reliably detect mutations at several undetermined sites in the same sample at once. Tumor variation is complex, and multi-gene, multi-site mutation is an important feature of tumors, so multi-gene multi-site detection is essential, and current digital PCR cannot adequately meet the need for multi-site mutation detection in plasma. 2) Digital PCR needs a relatively large amount of DNA: a routine single blood draw of 10 ml provides only enough for one reaction covering 1 or 2 sites, so the sample is insufficient for testing more sites. 3) Detecting only 1 or 2 sites gives too little coverage for tumors and too narrow a range of application. For example, KRAS must be tested in many tumor types; its most common mutations are in codons 12 and 13, which involve 6 base positions, each with 3 possible variants, and which site is mutated is neither fixed nor known in advance. Digital PCR is not suitable in that case. In tumor tissue and in the blood of tumor patients, such multi-site hot-spot mutations are the majority, while the proportion of any single-site mutation is relatively low, so digital PCR has a narrow range of applicability and has not been widely adopted. 4) Digital PCR is too expensive: the material cost alone is 200 to 400 yuan per site tested. Digital PCR has therefore so far been used only in scientific research and cannot yet be used for routine clinical testing. Digital PCR systems are currently supplied by four companies: RainDance (USA), Bio-Rad (USA), Life Technologies (USA), and clari (Singapore). The first two are water-in-oil droplet PCR; the latter two are microwell PCR.

Summary of the invention

The first technical problem to be solved by the present invention is to provide a coded PCR second-generation sequencing library construction method. A DNA library built with this method can be sequenced to detect mutant fragments at multiple sites in multiple gene target regions, including sites that are not known in advance, and the number of original sample DNA molecules detected and the number of original mutant DNA molecules detected can be traced back and counted, achieving high sensitivity and fidelity.

To solve the above technical problem, the coded PCR second-generation sequencing library construction method of the present invention comprises the following steps:

1) extracting sample DNA;

2) using the DNA extracted in step 1) as template, performing a single-strand specific linear amplification reaction with a forward or reverse primer that has a template-complementary specific sequence at its 3' end, a common adapter sequence at its 5' end, and a random molecular code sequence in the middle, and purifying the amplification product;

3) using the purified product of step 2) as template, performing a single-end specific exponential amplification reaction with a primer pair consisting of a common adapter primer and a reverse or forward primer that has a template-complementary specific sequence at its 3' end and a common adapter sequence at its 5' end, and purifying the amplification product; the sequence of the common adapter primer is identical or complementary to the common adapter sequence at the 5' end of the template used in this step;

4) using the purified product of step 3) as template, performing a paired-end non-specific exponential amplification reaction with the paired-end common adapter primer pair used for sequencing, and purifying the amplification product; the sequences of the paired-end common adapter primer pair are identical or complementary, respectively, to the common adapter sequence in the forward primer of step 2) and to the common adapter sequence in the reverse primer of step 3);

5) quantification and quality inspection to obtain the DNA sequencing library.

The sample DNA of step 1) usually consists of short fragments of 150 to 500 bp.

The random molecular code sequence of step 2) consists of a non-random sequence inserted between two random sequences, where each random sequence is 4 or more bases long (preferably 4 to 5 bases) and the non-random sequence is 4 to 6 bases long. Examples are 4N-A4TA-4N, 4N-T4AT-4N, 4N-AGCT-4N, 4N-CTAG-4N, 4N-TACTGT-4N, 5N-A4TA-4N, 5N-A4TA-5N, 4N-T4AT-5N, 5N-T4AT-5N, 4N-AGCT-5N, and 5N-AGCT-5N, where N represents a random base (i.e. any one of dATP, dCTP, dGTP, and dTTP). When the primer is synthesized, the base (A, C, G, or T) at each position of a random sequence is incorporated at random, whereas the base at each position of the non-random sequence is fixed according to the base type specified in the primer design. Inserting a non-random sequence between the random sequences separates the random bases and forms an isolation region, which improves the specificity of the single-strand linear amplification with the randomly coded primer and reduces the formation of non-specific products.
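As an illustration of how such a code layout might be read back out of a sequencing read, the following minimal Python sketch compiles a layout string into a regular expression and returns the random bases. Everything here is hypothetical: the function names, the layout syntax with the spacer spelled out, and the reading of "A4TA" as ATTTTA (an assumption based on the NNNNATTTTANNNN segment of SEQ ID NO:1).

```python
import re


def code_pattern(layout):
    """Compile a layout such as '4N-ATTTTA-4N' into a regex with two capture groups."""
    left, spacer, right = layout.split("-")
    n_left, n_right = int(left[:-1]), int(right[:-1])  # strip the trailing 'N'
    return re.compile(f"([ACGT]{{{n_left}}}){spacer}([ACGT]{{{n_right}}})")


def extract_code(read, layout="4N-ATTTTA-4N"):
    """Return the concatenated random bases, or None if the fixed spacer is not found."""
    match = code_pattern(layout).search(read)
    return match.group(1) + match.group(2) if match else None


# Example read: adapter tail, coded segment, then gene-specific sequence (illustrative only).
read = "TACACGACGCTCTTCCGATCT" + "ACGT" + "ATTTTA" + "TGCA" + "TAGGAAGCCTACGTGATGGC"
print(extract_code(read))
print(4 ** 8, "possible codes for a 4N + 4N layout")
```

A production pipeline would anchor the search to the expected position after the adapter rather than scanning the whole read, since the spacer motif could occur elsewhere by chance.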

The primer of step 2) is 50 to 65 bp long. The reaction system for the forward or reverse single-strand specific linear amplification is: 5× PCR buffer 2 μL, 10 mM dNTP 0.5 μL, DNA template 1 μL, 5 μM primer 1 μL, 5 U/μL KOD DNA polymerase 0.2 μL, H₂O 5.8 μL, total volume 10 μL. The reaction conditions (slow-cooling three-step cycle) are: denaturation at 95 °C for 5 minutes; then 2 to 10 thermal cycles of 95 °C for 30 seconds, annealing for 1 minute each at 64 °C, 63 °C, 62 °C, 61 °C, 60 °C, 59 °C, and 58 °C, and extension at 68 °C to 72 °C for 30 seconds; hold at 16 °C. Besides KOD, the high-fidelity polymerase used may also be Pfu, Vent, a KAPA high-fidelity enzyme, or a similar enzyme. Using multi-cycle linear amplification in this step raises the sampling rate and ensures that the non-duplicate detection rate of the initial DNA molecules exceeds 30%. (ctDNA is a low-frequency component of cfDNA. If too few of the initial cfDNA molecules are detected, for example a detection rate below 10% for a sample containing 10000 DNA copies, the probability of detecting ctDNA present at 0.1% is inadequate and false negatives are likely; if the detection rate exceeds 30%, the power to call a 0.1% ctDNA positive is more than 3 times that at a 10% detection rate, and false negatives become unlikely.) In addition, this step is a single-strand linear amplification, meaning that the mutation detection of the invention reads only one strand of each DNA duplex; detecting one strand is equivalent to detecting the mutation on the complementary duplex, so for the same detection performance half of the sequencing volume is saved.
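The effect of the detection rate on false negatives can be illustrated with a simple model. The sketch below assumes, purely for illustration, that each initial molecule is recovered independently with the same probability; the patent does not state this model, and the function name is hypothetical.

```python
def p_detect_at_least_one(mutant_molecules, detection_rate):
    """Probability that at least one mutant molecule is recovered,
    assuming independent recovery of each molecule (illustrative assumption)."""
    return 1.0 - (1.0 - detection_rate) ** mutant_molecules


copies = 10_000
mutant = round(copies * 0.001)  # ctDNA at 0.1% of 10000 copies: 10 molecules

for rate in (0.10, 0.30):
    p = p_detect_at_least_one(mutant, rate)
    print(f"detection rate {rate:.0%}: P(at least one mutant molecule) = {p:.3f}")
```

Under this model the chance of recovering at least one of the 10 mutant molecules is roughly 65% at a 10% detection rate and roughly 97% at a 30% detection rate. Requiring two or more mutant families for a positive call, as a real pipeline might, would widen the gap further.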

The single-end specific exponential amplification of step 3) first uses the slow-cooling three-step cycle under the same conditions as step 2), for 4 to 6 thermal cycles, and then a two-step cycle (denaturation at 95 °C, combined annealing and extension at 68 to 72 °C) for 15 to 20 thermal cycles. These conditions help reduce the proportion of non-specific products.

The reaction system and conditions of the paired-end non-specific exponential amplification of step 4) follow a conventional three-step cycle (denaturation at 95 °C, annealing at 58 °C, extension at 68 °C) for 15 to 20 thermal cycles, with a primer concentration of 5 μM. The amplification product is purified with magnetic beads (bead-to-product volume ratio 1.5:1) to recover fragments of 220 bp and longer, which helps remove non-target products of 180 bp and shorter. When an Illumina second-generation sequencer is used, the paired-end common adapter primer pair of step 4) consists of the P5 adapter primer (SEQ ID NO:3) and the P7 adapter primer (SEQ ID NO:4); when a Life Technologies semiconductor chip sequencer is used, it consists of the PGM/A adapter primer and the PGM/B adapter primer, whose sequences are as follows:

PGM/A adapter primer: CCATCTCATCCCTGCGTGTCTCCGACTCAG (SEQ ID NO:5)

PGM/B adapter primer: CCTCTCTATGGGCAGTCGGTGAT (SEQ ID NO:6)

The second technical problem to be solved by the present invention is to provide a coded PCR second-generation sequencing detection method for gene mutation detection. In this method a DNA sequencing library is prepared by the above library construction method, high-throughput sequencing is performed, and the sequencing data are subjected to traceback analysis. The traceback analysis comprises error identification, authenticity determination, mutation determination, and mutation-proportion calculation, and specifically comprises the following steps:

1) sorting the data by target gene to form multiple sub-datasets;

2) sorting each sub-dataset by molecular code, defining each distinct molecular code as one molecular family, counting the molecular families, and counting the reads in each family;

3) determining family validity: a molecular family with fewer reads than a preset count is judged invalid; a family with a read count greater than or equal to the preset count is judged valid; the preset count is a natural number greater than 5;

4) aligning the reads in each molecular family to the reference sequence, and recording and counting the mismatched and non-mismatched reads; if the number of non-mismatched reads is greater than 0, the family is judged to be a wild-type molecular family; if the number of non-mismatched reads is 0, proceeding to the mutant-family determination of step 5);

5) determining mutant molecular families: if the consistency of the mutation site and genotype exceeds 90%, the family is judged to be a mutant molecular family; otherwise it is judged to be a wild-type molecular family;

6) traceback statistics, with the mutation proportion calculated as: traceback molecular mutation proportion = number of mutant molecular families / (number of mutant molecular families + number of wild-type molecular families).
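Steps 3) to 6) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' software: reads are assumed to be already grouped by molecular code and aligned to the reference as equal-length strings with substitutions only, "consistency" is taken to mean the share of reads in a family carrying the most frequent variant, and all names are hypothetical.

```python
from collections import Counter

MIN_FAMILY_READS = 6     # preset count; the text requires a natural number greater than 5
CONSENSUS_CUTOFF = 0.90  # mutation-site/genotype consistency must exceed 90%


def mismatches(read, reference):
    """Set of (position, base) differences between an aligned read and the reference."""
    return {(i, b) for i, (b, r) in enumerate(zip(read, reference)) if b != r}


def classify_family(reads, reference):
    """Label one molecular family as 'invalid', 'wild_type', or 'mutant'."""
    if len(reads) < MIN_FAMILY_READS:           # step 3: validity
        return "invalid"
    per_read = [mismatches(r, reference) for r in reads]
    if any(not m for m in per_read):            # step 4: a non-mismatched read exists
        return "wild_type"
    # Step 5: every read mismatches; check whether one variant dominates the family.
    counts = Counter(v for m in per_read for v in m)
    _, top = counts.most_common(1)[0]
    return "mutant" if top / len(reads) > CONSENSUS_CUTOFF else "wild_type"


def traced_mutation_ratio(families, reference):
    """Step 6: mutant families / (mutant + wild-type families).
    `families` maps a molecular code to its list of aligned reads."""
    labels = Counter(classify_family(reads, reference) for reads in families.values())
    valid = labels["mutant"] + labels["wild_type"]
    return labels["mutant"] / valid if valid else 0.0


reference = "ACGTACGT"
families = {
    "AAAACCCC": ["ACGAACGT"] * 6,                     # same variant in every read
    "GGGGTTTT": ["ACGTACGT"] * 5 + ["ACGAACGT"],      # one PCR error among matching reads
    "ACACACAC": ["ACGAACGT"] * 3,                     # too few reads
    "TGTGTGTG": ["TCGTACGT", "AGGTACGT", "ACTTACGT",
                 "ACGAACGT", "ACGTTCGT", "ACGTAGGT"],  # scattered, inconsistent errors
}
print(traced_mutation_ratio(families, reference))
```

Invalid families are excluded from both the numerator and the denominator, which follows the formula in step 6) where only mutant and wild-type families are counted.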

The third technical problem to be solved by the present invention is to provide a coded PCR second-generation sequencing library construction kit. The kit comprises a DNA sequencing library prepared by the above library construction method and can be used for high-throughput sequencing of mutant genes present at a low frequency of 0.03% to 1%.

The fourth technical problem to be solved by the present invention is to provide primer pairs for the above library construction method. The forward or reverse primer has the sequence shown in SEQ ID NO:1 and the reverse or forward primer has the sequence shown in SEQ ID NO:2; alternatively, the forward or reverse primer has one or more of the sequences shown in SEQ ID NO:7 to 9 and the reverse or forward primer has one or more of the sequences shown in SEQ ID NO:2 and SEQ ID NO:10 to 11. In these sequences, N is any one of A, C, G, and T.

Using gene fragments whose mutation sites within the target region are unknown as template, the present invention first performs a linear single-strand synthesis reaction primed by a single-end specific primer carrying a random molecular code, converting the bases at successive positions of the original template into a complementary sequence tagged with a molecular code. After purification, a single-end specific exponential amplification primed by the opposite-side single-end specific primer is performed, which reduces the loss of coded target sequences and increases product specificity. A multi-cycle non-specific exponential amplification with the common adapter primers then yields enough complete sequence for high-throughput sequencing. Compared with existing library construction and sequencing analysis methods, the method of the invention has the following advantages and beneficial effects:

1. High detection specificity: the false-positive rate can be 0.01% or lower, a 200-fold improvement over the 2% of conventional double-strand PCR amplification. This is because single-molecule code traceback analysis (each code defines a family containing multiple sequencing results, and a mutation is called only if the mutation site is at least 90% consistent within the family) eliminates the random misincorporation errors generated by PCR amplification.

2. Because the invention samples one strand of each DNA duplex rather than both, one molecular family represents one DNA duplex, and two molecular families with the same mutation at the same site represent the same mutation in two DNA molecules. The detection method is therefore highly sensitive: the detection limit can reach a mutation level of 0.03%, about 3 times better than double-strand PCR amplification and adapter-ligation library construction (both of which have a detection limit of 0.1%). The detection rate of initial DNA molecules can reach 30% to 70%, and the detection limit for mutant molecules can reach 0.03% to 0.1%. In addition, with linear amplification using the single-molecule-coded specific primer, even when the number of sample DNA molecules is fixed, the sampling rate of sample DNA molecules, and hence the sensitivity of mutation detection, can be raised by increasing the number of cycles, so the sensitivity of the method is adjustable and controllable.

3. Based on molecular code traceback analysis, the initial DNA molecules can be sequenced digitally and quantitatively, and the number of initial DNA molecules detected can be calculated. Both the number of molecules and the mutation proportion can therefore be quantified accurately, solving the problem that rare, low-proportion mutant genes could not be quantified.

4. Commonly used second-generation sequencing library construction methods have difficulty detecting samples in which the mutant proportion is below 1%, whereas the present invention, with single-end molecular-coded PCR library construction followed by second-generation sequencing, can detect mutant genes at a low proportion of 0.03% to 1% and can be used to design and prepare second-generation sequencing library construction kits for low-frequency mutant DNA.

5. Multiplex mutation detection is possible. A single blood sample of routine volume can be tested simultaneously for multiple genes, multiple target regions, and multiple sites, and previously unknown mutation sites can be detected and discovered.

6. It avoids the low ligation efficiency and high template loss of methods that ligate adapters to the original template, such as C-smart, as well as the amplification bias of PCR methods that lowers or even loses low-proportion mutations.

7. It substantially reduces the cost of detecting low-proportion mutant genes.

Description of the drawings

Fig. 1 is a schematic diagram of the principle of the forward single-strand specific linear amplification reaction of Example 1 of the present invention.

Fig. 2 is a schematic diagram of the principle of the reverse single-end specific exponential amplification reaction of Example 1 of the present invention.

Fig. 3 is a schematic diagram of the paired-end common adapter primer exponential amplification reaction of Example 1 of the present invention.

Detailed description of the embodiments

The following examples merely illustrate the present invention and do not limit its scope. Experimental methods for which no specific conditions are given in the examples generally follow conventional conditions, for example those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer.

Example 1: Detection of EGFR exon 20 mutations in peripheral blood plasma

1. Sample DNA extraction

Take 3 to 5 ml of peripheral blood plasma from the subject and extract its cell-free genomic DNA with a cell-free DNA extraction kit (DK607-01, supplied by Shanghai Lay Maple Bio Tech Ltd).

2. Forward single-strand specific linear amplification reaction

Forward primer EGFR EXON20-55bp-F：

TACACGACGCTCTTCCGATCTNNNNATTTTANNNNTAGGAAGCCTACGTGATGGC(SEQ ID NO:1)

The forward single-strand specific linear amplification reaction system is: 5× PCR buffer 2 μL, 10 mM dNTP 0.5 μL, DNA template 1 μL (containing 24.75 ng DNA), 5 μM primer 1 μL, 5 U/μL KOD DNA polymerase 0.2 μL, H₂O 5.8 μL, total volume 10 μL.

The reaction conditions are shown in Table 1; a slow-cooling three-step thermal cycle is used.

Table 1. Reaction conditions for the forward single-strand specific linear amplification

3. Purification of the forward single-strand specific linear amplification product

Add ordinary library-construction purification magnetic beads to the reaction tube containing the forward single-strand specific linear amplification product, and purify the product following the standard purification protocol for second-generation sequencing library construction.

4. Reverse single-end specific exponential amplification reaction

The bead-purified product above is subjected to a reverse single-end specific exponential amplification reaction with the following primer pair:

Reverse primer EGFR EXON20-49bp-R：

GACTGGAGTTCCTTGGCACCCGAGAATTCCA-GCAGCCGAAGGGCATGAG(SEQ ID NO:2)

P5 common contact primers：

AATGATACGGCGACCACCGAGATCTACAC-TCTTTCCC-TACACGACGCTCTTCCGATCT(SEQ ID NO:3)

The amplification reaction system is the same as in step 2. The reaction conditions are: first the slow-cooling three-step thermal cycle (see Table 1) for 5 to 6 cycles, then a two-step thermal cycle (denaturation at 95 °C, combined annealing and extension at 68 °C) for 20 cycles.

5. Purification of the reverse single-end specific exponential amplification product

Using ordinary library-construction purification magnetic beads and following the standard purification protocol for second-generation sequencing library construction, purify the single-end specific exponential amplification product obtained in step 4.

6. Paired-end non-specific exponential amplification reaction

Using the purified product of step 5 as template, an exponential amplification reaction is performed with the paired-end common adapter primer pair of the Illumina second-generation sequencing system (5 μM, 1 μL each), where:

The sequence of the P5 common adapter primer is:

AATGATACGGCGACCACCGAGATCTACAC-TCTTTCCC-TACACGACGCTCTTCCGATCT(SEQ ID NO:3)

The sequence of the P7 common adapter primer is:

CAAGCAGAAGACGGCATACGAGATTACAGACGGTGACTG-GAGTTCCTTGGCACCCGAGA(SEQ ID NO:4)

The amplification reaction system is the same as in step 2. The reaction conditions are a conventional three-step cycle (denaturation at 95 °C, annealing at 58 °C, extension at 68 °C) for 20 cycles.
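The requirement that each common adapter primer match the 5' adapter tail of the corresponding gene-specific primer can be checked directly on the sequences listed in this example. The sketch below is only a sanity check on SEQ ID NO:1 to 4 as printed above (hyphens removed); the helper name is hypothetical.

```python
# Sequences copied from this example; hyphens are visual separators only.
FWD = "TACACGACGCTCTTCCGATCTNNNNATTTTANNNNTAGGAAGCCTACGTGATGGC"                       # SEQ ID NO:1
REV = "GACTGGAGTTCCTTGGCACCCGAGAATTCCA-GCAGCCGAAGGGCATGAG".replace("-", "")            # SEQ ID NO:2
P5 = "AATGATACGGCGACCACCGAGATCTACAC-TCTTTCCC-TACACGACGCTCTTCCGATCT".replace("-", "")   # SEQ ID NO:3
P7 = "CAAGCAGAAGACGGCATACGAGATTACAGACGGTGACTG-GAGTTCCTTGGCACCCGAGA".replace("-", "")   # SEQ ID NO:4


def shared_overlap(adapter_primer, tailed_primer):
    """Longest suffix of adapter_primer that is also a prefix of tailed_primer."""
    for n in range(min(len(adapter_primer), len(tailed_primer)), 0, -1):
        if adapter_primer.endswith(tailed_primer[:n]):
            return tailed_primer[:n]
    return ""


print(len(FWD), len(REV))                 # primer lengths, matching the 55bp-F and 49bp-R names
print(len(shared_overlap(P5, FWD)), "bases shared between P5 and the forward primer tail")
print(len(shared_overlap(P7, REV)), "bases shared between P7 and the reverse primer tail")
```

The 3' end of each adapter primer is identical to the 5' tail of the corresponding gene-specific primer, which is what allows P5 and P7 to prime on the products of the earlier steps.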

7. Purification of the paired-end non-specific exponential amplification product, quantification and quality inspection, and sequencing

The paired-end non-specific exponential amplification product of step 6 is purified with magnetic beads by the conventional method, at a bead-to-product volume ratio of 1.5:1. It is then quantified and quality-checked (using a conventional Qubit instrument, following the instructions for the instrument and the quantification reagent); a yield above 10 ng is considered acceptable, and the library is sent for second-generation sequencing on an Illumina NextSeq 500 sequencer.

8. Result analysis

Second-generation sequencing yielded 200000 reads. Traceback analysis was performed on the data to determine the presence, type, and number of mutant genes.

The specific analysis steps are: 1) Sort the data by target gene using conventional methods to form multiple sub-datasets. 2) Sort each sub-dataset by its random 8-base molecular code, treat each distinct molecular code as one molecular family, count the molecular families, and count the reads in each family. 3) Determine family validity: a family with fewer reads than the preset count (preset count > 5; set to 6 in this example) is judged invalid; a family with a read count greater than or equal to the preset count is judged valid. 4) Align the reads in each family to the reference sequence, record and count the mismatched and non-mismatched reads, and decide whether the family is wild type: if the number of non-mismatched reads is > 0, the family is judged wild type; if it is 0, proceed to the mutant-family determination. 5) Determine mutant families: if, when the reads in the family are compared with the reference sequence, the consistency of the mutation site and genotype is > 90%, the family is judged to be a true mutant; otherwise it is judged wild type. 6) Traceback statistics, with the mutation proportion calculated as: traceback molecular mutation proportion = number of mutant molecular families / (number of mutant molecular families + number of wild-type molecular families).

Table 2 compares the second-generation sequencing results for EGFR exon 20 obtained with random molecular coded PCR library construction (Example 1 of the present invention) and without molecular coded PCR library construction (i.e. conventional amplicon library construction).

Table 2. Comparison of second-generation sequencing results for EGFR exon 20 with the two library construction methods

As Table 2 shows, for a completely normal DNA sample, library construction without coded PCR (conventional amplicon library construction) gives second-generation sequencing results in which the mutation proportion at most sites exceeds 1%. With the coded PCR library construction of this example, few sites show a mutation proportion around 0.5%, and the great majority are below 0.02%. Of the 60 sites, all except sites 20, 26, 35, and 41 have a mutation proportion of 0; that is, more than 90% of the sites show zero mutation of the normal gene. This shows that the random molecular coded PCR library construction and second-generation sequencing method of the invention eliminates errors more effectively than the ordinary amplicon library construction and sequencing method. In addition, from Table 2 it can be estimated that the proportion of initial DNA molecules detected by the random molecular coded PCR library construction and sequencing of this example is about 48%.

The above calculation of the detected fraction of initial DNA molecules takes into account repeated detection (hereinafter "re-detection") of the same molecule during multi-cycle linear amplification. The specific estimation method is as follows:

1. Initial DNA molecule number of the sample = DNA mass (ng) × 1000 / 3.3 (1 human DNA copy = 3.3 pg).

2. Molecules synthesized per linear amplification cycle = total number of molecular families synthesized over all linear amplification cycles / number of linear amplification cycles.

3. Cumulative non-re-detected molecule number of a cycle = cumulative non-re-detected molecule number of the previous cycle + molecules synthesized by linear amplification in this cycle × (1 − re-detection fraction of this cycle).

4. Cumulative non-re-detected molecule fraction of a cycle = cumulative non-re-detected molecule number of this cycle / initial DNA molecule number of the sample.

5. Re-detection fraction of a cycle = cumulative non-re-detected molecule fraction of the previous cycle.

Taking this example:

Initial DNA molecule number of the sample = 24.75 × 1000 / 3.3 = 7500 human DNA copies;

The total number of molecular families synthesized over the 8 linear amplification cycles is 4784 (representing the number of traceback molecules detected), so molecules synthesized per linear amplification cycle = 4784 / 8 = 598;

The cumulative non-re-detected molecule number, the cumulative non-re-detected molecule fraction and the re-detection fraction of each cycle are calculated as shown in Table 3. After 8 cycles of linear amplification, the cumulative non-re-detected molecule fraction is 48.56% (i.e., the cumulative non-re-detected molecule fraction of the 8th cycle, which is the estimated detected fraction of initial DNA molecules).
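
The five estimation rules can be iterated directly, as in the short sketch below. It is an illustration only; the function names `initial_copies` and `non_repeat_detection` are hypothetical, and the only inputs are the figures stated for this example (24.75 ng DNA, 4784 molecular families, 8 linear cycles).

```python
def initial_copies(dna_ng, pg_per_copy=3.3):
    """Rule 1: initial DNA molecule number from DNA mass in ng."""
    return dna_ng * 1000 / pg_per_copy


def non_repeat_detection(total_families, cycles, copies):
    """Apply rules 2-5 cycle by cycle.

    Returns one tuple per cycle:
    (cycle, cumulative_count, cumulative_fraction, repeat_fraction)
    """
    per_cycle = total_families / cycles          # rule 2
    cumulative = 0.0
    fraction = 0.0
    rows = []
    for cycle in range(1, cycles + 1):
        repeat_fraction = fraction               # rule 5: previous cycle's fraction
        cumulative += per_cycle * (1 - repeat_fraction)   # rule 3
        fraction = cumulative / copies           # rule 4
        rows.append((cycle, cumulative, fraction, repeat_fraction))
    return rows


copies = initial_copies(24.75)
for cycle, count, frac, repeat in non_repeat_detection(4784, 8, copies):
    print(f"cycle {cycle}: cumulative={count:.1f} "
          f"fraction={frac:.2%} repeat={repeat:.2%}")
```

Because each cycle's re-detection fraction equals the previous cumulative fraction, the recursion is equivalent to the closed form 1 − (1 − 598/7500)^8, which gives the 48.56% stated above for the 8th cycle.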

Table 3

Example 2 Multiplex mutation detection of the EGFR EXON18, EGFR EXON19 and EGFR EXON20 genes in peripheral blood plasma of a tumor patient sample

1. Sample DNA extraction

2. Forward single-strand specific linear amplification reaction

The forward primer sequences are as follows:

EGFR EXON20-55bp-F：

TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNTAGGAAGCCTACGTGATGGC(SEQ ID NO:7)

EGFR EXON18-59bp-F：

TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNGAGATCTTGAAGGAAACTGAATTC(SEQ ID NO:8)

EGFR EXON19-58bp-F：

TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNGAAAGTTAAAATTCCCGTCGCTA(SEQ ID NO: 9)
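
The three forward primers share one layout: a 21-base universal adapter sequence, a 14-base coding region (4 random bases, the fixed spacer TAAAAT, 4 random bases) and a target-specific 3' sequence. The sketch below splits the primers accordingly and recovers the 8-base molecular code from a read. It is illustrative only: the helper names are hypothetical, and `extract_code` assumes that the read begins at the first random base, which depends on the sequencing primer actually used.

```python
ADAPTER = "TACACGACGCTCTTCCGATCT"   # shared 5' universal adapter sequence
SPACER = "TAAAAT"                   # fixed bases between the two random blocks
CODE_REGION_LEN = 4 + len(SPACER) + 4

FORWARD_PRIMERS = {
    "EGFR EXON20-55bp-F": "TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNTAGGAAGCCTACGTGATGGC",
    "EGFR EXON18-59bp-F": "TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNGAGATCTTGAAGGAAACTGAATTC",
    "EGFR EXON19-58bp-F": "TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNGAAAGTTAAAATTCCCGTCGCTA",
}


def split_primer(primer):
    """Return (adapter, coding_region, specific_sequence) for a forward primer."""
    if not primer.startswith(ADAPTER):
        raise ValueError("primer does not start with the universal adapter")
    start = len(ADAPTER)
    end = start + CODE_REGION_LEN
    return primer[:start], primer[start:end], primer[end:]


def extract_code(read, spacer=SPACER):
    """Return the 8-base molecular code, or None if the layout does not match.

    Assumption: the read starts at the first random base of the coding region.
    """
    if len(read) < CODE_REGION_LEN or read[4:4 + len(spacer)] != spacer:
        return None
    code = read[:4] + read[4 + len(spacer):CODE_REGION_LEN]
    return code if set(code) <= set("ACGT") else None


for name, seq in FORWARD_PRIMERS.items():
    _, region, specific = split_primer(seq)
    print(name, len(seq), region, specific)
```

Checking the fixed spacer before accepting a code is a cheap way to discard reads whose coding region is shifted by an insertion or deletion.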

The forward single-strand specific linear amplification reaction system is: 5 × PCR buffer, 2 μL; 10 mM dNTP, 0.5 μL; DNA template, 1 μL (containing 33 ng DNA); 5 μM primer, 1 μL; 5 U/μL KOD DNA polymerase, 0.2 μL; H₂O, 5.8 μL; total volume, 10 μL.

The reaction conditions are the same as in Example 1.

3. Purification of the forward single-strand specific linear amplification product

4. Reverse single-end specific exponential amplification reaction

The magnetic-bead-purified product above is subjected to a reverse single-end specific exponential amplification reaction with the following primer set:

Reverse primers:

EGFR EXON20-49bp-R：

GACTGGAGTTCCTTGGCACCCGAGAATTCCA-GCAGCCGAAGGGCATGAG(SEQ ID NO:2)

EGFR EXON18-55bp-R：

GACTGGAGTTCCTTGGCACCCGAGAATTCCA-CAGGGACCTTACCTTATACACCGT(SEQ ID NO: 10)

EGFR EXON19-54bp-R：

GACTGGAGTTCCTTGGCACCCGAGAATTCCA-CAGCAAAGCAGAAACTCACATCG(SEQ ID NO:11)

P5 universal adapter primer:

AATGATACGGCGACCACCGAGATCTACAC-TCTTTCCC-TACACGACGCTCTTCCGATCT(SEQ ID NO:3)

The amplification reaction system and reaction conditions are the same as in Example 1.

5. Purification of the single-end specific exponential amplification product

6. Double-end non-specific exponential amplification reaction

The sequence of the P5 universal adapter primer is:

AATGATACGGCGACCACCGAGATCTACAC-TCTTTCCC-TACACGACGCTCTTCCGATCT(SEQ ID NO:3)

The sequence of the P7 universal adapter primer is:

CAAGCAGAAGACGGCATACGAGATTACAGACGGTGACTG-GAGTTCCTTGGCACCCGAGA(SEQ ID NO:4)
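
The universal adapter primers can only prime on the earlier products if their 3' ends match the adapter sequences carried at the 5' ends of the target-specific primers. The sketch below checks that relationship for the sequences listed in this example (with the hyphens removed). The function name `shared_overlap` is hypothetical, and the check covers only the "identical" case, not the complementary one.

```python
def shared_overlap(adapter_primer, target_primer):
    """Length of the longest suffix of adapter_primer that is also
    a prefix of target_primer (0 if there is none)."""
    max_len = min(len(adapter_primer), len(target_primer))
    for n in range(max_len, 0, -1):
        if adapter_primer[-n:] == target_primer[:n]:
            return n
    return 0


P5 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"   # SEQ ID NO:3
P7 = "CAAGCAGAAGACGGCATACGAGATTACAGACGGTGACTGGAGTTCCTTGGCACCCGAGA"  # SEQ ID NO:4

FORWARD = {  # SEQ ID NO:7-9
    "EXON20-F": "TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNTAGGAAGCCTACGTGATGGC",
    "EXON18-F": "TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNGAGATCTTGAAGGAAACTGAATTC",
    "EXON19-F": "TACACGACGCTCTTCCGATCTNNNNTAAAATNNNNGAAAGTTAAAATTCCCGTCGCTA",
}
REVERSE = {  # SEQ ID NO:2, 10, 11
    "EXON20-R": "GACTGGAGTTCCTTGGCACCCGAGAATTCCAGCAGCCGAAGGGCATGAG",
    "EXON18-R": "GACTGGAGTTCCTTGGCACCCGAGAATTCCACAGGGACCTTACCTTATACACCGT",
    "EXON19-R": "GACTGGAGTTCCTTGGCACCCGAGAATTCCACAGCAAAGCAGAAACTCACATCG",
}

for name, seq in FORWARD.items():
    print("P5 vs", name, "overlap:", shared_overlap(P5, seq))
for name, seq in REVERSE.items():
    print("P7 vs", name, "overlap:", shared_overlap(P7, seq))
```

A non-zero overlap for every primer means one P5/P7 pair can amplify all three targets in the same reaction, which is what makes the final amplification step non-specific at both ends.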

The amplification reaction system and reaction conditions are the same as in Example 1; the number of cycles is 15.

The double-end non-specific exponential amplification product of step 6 above is purified with magnetic beads by a conventional method and then quantified for quality control (using a conventional Qubit instrument, operated according to the instructions of the instrument and of the quantification reagent). A quantified amount of more than 10 ng is qualified, and the library is sent for second-generation sequencing on an Illumina NextSeq 500 sequencer.

8. Result analysis

Second-generation sequencing yielded 1,695,790 reads. Traceback analysis was carried out on the data obtained to determine the presence or absence, type and number of mutant genes. The specific analysis steps are the same as in Example 1.

Table 4 shows the second-generation sequencing results for the EGFR EXON18, EGFR EXON19 and EGFR EXON20 genes obtained in Example 2 of the present invention with molecular coding PCR library construction.

Table 4 Measured results for plasma ctDNA

As can be seen from Table 4, for a DNA sample from a tumor patient, the coded PCR second-generation sequencing library construction method of Example 2 of the present invention can detect mutants at the 0.1% level or above. The method for estimating the non-repeated detected fraction of DNA molecules is the same as in Example 1.

Sequence listing

<110> Shanghai Based Biopharmaceutical Technology Co Ltd

<120> Coded PCR second-generation sequencing library construction method, kit and detection method

<130> CPC-NP-16-100362

<160> 11

<170> PatentIn version 3.3

<210> 1

<211> 55

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 1

tacacgacgc tcttccgatc tnnnnatttt annnntagga agcctacgtg atggc 55

<210> 2

<211> 49

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 2

gactggagtt ccttggcacc cgagaattcc agcagccgaa gggcatgag 49

<210> 3

<211> 58

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 3

aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct tccgatct 58

<210> 4

<211> 59

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 4

caagcagaag acggcatacg agattacaga cggtgactgg agttccttgg cacccgaga 59

<210> 5

<211> 30

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 5

ccatctcatc cctgcgtgtc tccgactcag 30

<210> 6

<211> 23

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 6

cctctctatg ggcagtcggt gat 23

<210> 7

<211> 55

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 7

tacacgacgc tcttccgatc tnnnntaaaa tnnnntagga agcctacgtg atggc 55

<210> 8

<211> 59

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 8

tacacgacgc tcttccgatc tnnnntaaaa tnnnngagat cttgaaggaa actgaattc 59

<210> 9

<211> 58

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 9

tacacgacgc tcttccgatc tnnnntaaaa tnnnngaaag ttaaaattcc cgtcgcta 58

<210> 10

<211> 55

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 10

gactggagtt ccttggcacc cgagaattcc acagggacct taccttatac accgt 55

<210> 11

<211> 54

<212> DNA

<213>Artificial sequence

<220>

<221> misc_feature

<223>Primer

<400> 11

gactggagtt ccttggcacc cgagaattcc acagcaaagc agaaactcac atcg 54

Claims

1. A coded PCR second-generation sequencing library construction method, characterized in that the steps comprise:

1) extracting sample DNA;

2) using the DNA extracted in step 1) as a template, carrying out a single-strand specific linear amplification reaction with a forward or reverse primer that carries a specific sequence complementary to the template at its 3' end, a universal adapter sequence at its 5' end and a random molecular coding sequence in the middle, and purifying the amplification product;

3) using the purified product obtained in step 2) as a template, carrying out a single-end specific exponential amplification reaction with a primer pair consisting of a universal adapter primer and a reverse or forward primer that carries a specific sequence complementary to the template at its 3' end and a universal adapter sequence at its 5' end, and purifying the amplification product; the sequence of the universal adapter primer is identical or complementary to the universal adapter sequence at the 5' end of the template of this step;

4) using the purified product obtained in step 3) as a template, carrying out a double-end non-specific exponential amplification reaction with a sequencing double-end universal adapter primer pair, and purifying the amplification product; the sequences of the sequencing double-end universal adapter primer pair are respectively identical or complementary to the universal adapter sequence in the forward primer of step 2) and the universal adapter sequence in the reverse primer of step 3);

5) quantifying and performing quality control to obtain a DNA sequencing library.

2. The method according to claim 1, characterized in that the random molecular coding sequence of step 2) comprises two random sequences and one non-random sequence; the non-random sequence lies between the two random sequences and separates them; each random sequence is 4 bp or longer, and the non-random sequence is 4 to 6 bp; during primer synthesis, the base at each position of the random sequences is incorporated at random, while the base at each position of the non-random sequence is fixed according to the base type set in the primer design.

3. The method according to claim 1, characterized in that the forward or reverse single-strand specific linear amplification reaction of step 2) uses a three-step method of denaturation, annealing and extension, wherein the annealing stage uses slow-cooling annealing and the number of thermal cycles is 2 to 10.

4. The method according to claim 1, characterized in that the single-end specific exponential amplification reaction of step 3) first uses a three-step method of denaturation, annealing and extension, wherein the annealing stage uses slow-cooling annealing and the number of thermal cycles is 4 to 6, and then uses a two-step method of denaturation and annealing-extension, wherein the number of thermal cycles is 15 to 25.

5. The method according to claim 1, characterized in that the number of thermal cycles of the double-end non-specific exponential amplification reaction of step 4) is 15 to 20.

6. A coded PCR second-generation sequencing detection method for gene mutation detection, characterized in that a DNA sequencing library is prepared by the library construction method of any one of claims 1 to 5, high-throughput sequencing detection is performed, and traceback analysis is carried out on the sequencing result data; the traceback analysis comprises error identification, authenticity determination, mutation determination and mutation ratio calculation.

7. The method according to claim 6, characterized in that the traceback analysis comprises the following steps:

1) sorting the data of the different target genes to form multiple sub-datasets;

2) sorting each sub-dataset by molecular code, defining each independent molecular code as one molecular family, counting the number of molecular families and counting the reads in each molecular family;

3) determining molecular family validity: if the number of reads in a molecular family is less than a preset number, the molecular family is judged invalid; if it is greater than or equal to the preset number, the molecular family is judged valid; the preset number is a natural number greater than 5;

4) aligning the reads in each molecular family to the reference sequence, recording and counting the mismatched reads and non-mismatched reads, and judging the molecular family to be a wild-type molecular family if the number of non-mismatched reads is greater than 0; if the number of non-mismatched reads is equal to 0, proceeding to the mutant molecular family determination of step 5);

5) mutant molecular family determination: if the consistency of the mutation site and genotype is greater than 90%, the molecular family is judged a mutant molecular family; otherwise the molecular family is judged a wild-type molecular family;

6) traceback statistics: calculating the mutation ratio as follows: traceback molecular mutation ratio = number of mutant molecular families / (number of mutant molecular families + number of wild-type molecular families).

8. A coded PCR second-generation sequencing library construction kit, characterized in that the kit comprises a DNA sequencing library prepared by the library construction method of any one of claims 1 to 5.

9. A primer pair for the library construction method of any one of claims 1 to 5, characterized in that the forward or reverse primer has the sequence shown in SEQ ID NO:1 and the reverse or forward primer has the sequence shown in SEQ ID NO:2, wherein each N in the sequence is any one of A, C, G and T.

10. A primer pair for the library construction method of any one of claims 1 to 5, characterized in that the forward or reverse primer has one or more of the sequences shown in SEQ ID NO:7 to 9 and the reverse or forward primer has one or more of the sequences shown in SEQ ID NO:2 and SEQ ID NO:10 to 11, wherein each N in the sequence is any one of A, C, G and T.