CN104946737B

CN104946737B - Compositions and methods for detecting rare sequence variants

Info

Publication number: CN104946737B
Application number: CN201410765164.3A
Authority: CN
Inventors: 林盛榕; 孙朝辉; 赵奇志; 邓凌锋
Original assignee: Encore Economic Holdings Ltd
Current assignee: Encore economic Holdings Limited
Priority date: 2013-12-11
Filing date: 2014-12-11
Publication date: 2019-02-22
Anticipated expiration: 2034-12-11
Also published as: AU2021206868A1; AU2014362227A1; EP3495506B1; EP3080298B1; WO2015089333A1; IL300974A; EP3495506A1; JP2017510244A; JP6435334B2; IL274464B2; US11597973B2; US20210054449A1; AU2021206868B2; MX2016007605A; KR20160106596A; HK1214843A1; IL246021B; EP3080298A1; IL274464B1; AU2024201195A1

Abstract

In some aspects, the present invention provides methods for identifying sequence variants in a nucleic acid sample. In some embodiments, a method comprises identifying sequence differences between sequencing reads and a reference sequence, and calling a sequence difference that occurs in at least two different circular polynucleotides, such as two circular polynucleotides having different junctions, as a sequence variant. In some aspects, the present invention provides compositions and systems useful in the described methods.

Description

Compositions and methods for detecting rare sequence variants

Cross-reference

This application claims the benefit of U.S. Provisional Application No. 61/914,907, filed December 11, 2013; U.S. Provisional Application No. 61/987,414, filed May 1, 2014; and U.S. Provisional Application No. 62/010,975, filed June 11, 2014. All of the above U.S. provisional applications are incorporated herein by reference.

Background of the invention

Identifying sequence variation within complex populations is an actively developing field, particularly with the advent of massively parallel nucleic acid sequencing. However, massively parallel sequencing has significant limitations, because the inherent error frequency of commonly used techniques is greater than the frequency of many of the actual sequence variations in a population. For example, error rates of 0.1-1% have been reported for standard high-throughput sequencing. When the frequency of a variant is low, such as at or below the error rate, detection of rare sequence variants has a high false-positive rate.
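The arithmetic behind this limitation can be sketched as follows. This is a minimal illustration rather than part of the disclosure: the sequencing depth of 10,000 reads and the function name `expected_counts` are assumptions chosen for the example, and every erroneous read at the position is counted regardless of which base it reports.

```python
def expected_counts(depth, variant_freq, error_rate):
    """Return (expected true-variant reads, expected erroneous reads) at one position.

    depth        -- number of reads covering the position (hypothetical value)
    variant_freq -- fraction of molecules that truly carry the variant
    error_rate   -- per-base sequencing error rate
    """
    return depth * variant_freq, depth * error_rate


depth = 10_000  # assumed depth, for illustration only
for error_rate in (0.001, 0.01):  # the 0.1-1% range cited in the text
    true_reads, error_reads = expected_counts(depth, 0.001, error_rate)
    print(
        f"error rate {error_rate:.1%}: "
        f"~{true_reads:.0f} true variant reads vs ~{error_reads:.0f} erroneous reads"
    )
```

Under these assumptions, a variant present at 0.1% yields about as many supporting reads as sequencing error alone at the low end of the error range, and far fewer at the high end, which is why read counts by themselves cannot separate the two.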

There are many reasons to detect rare sequence variants. For example, detecting rare characteristic sequences can be used to identify and distinguish the presence of harmful environmental contaminants, such as bacterial taxa. A common way of characterizing bacterial taxa is to identify differences in a highly conserved sequence, such as rRNA sequences. However, typical sequencing-based methods for this purpose face challenges related to the sheer number of different genomes in a given sample and the degree of homology between members, which presents a complex problem for already laborious procedures. Improved procedures would have the potential to enhance contamination detection in a variety of settings. For example, clean rooms used to assemble components of satellites and other spacecraft can be surveyed with the present systems and methods to understand which microbial communities are present, and to develop better decontamination and cleaning techniques that prevent the introduction of terrestrial microorganisms to other celestial bodies or samples thereof, or to develop methods for distinguishing data generated by putative extraterrestrial microorganisms from data generated by contaminating terrestrial microorganisms. Food monitoring applications include the periodic testing of production lines at food processing plants, surveying slaughterhouses, and inspecting the kitchens and food storage areas of restaurants, hospitals, schools, prisons, and other institutions for food-borne pathogens. Water reserves and processing plants can be similarly monitored.

Summary of the invention

In view of the foregoing, there is a need for improved methods of detecting rare sequence variants. The compositions and methods of the present invention address this need and provide additional benefits as well. In particular, various aspects of the invention provide highly sensitive detection of rare or low-frequency nucleic acid sequence variants (sometimes referred to as mutations). This includes the identification and elucidation of low-frequency variations (including substitutions, insertions, and deletions) in samples that may contain small amounts of variant sequences in a background of normal sequences, as well as the identification of low-frequency variations in a background of sequencing errors.

In one aspect, the present invention provides a method of identifying a sequence variant, such as a sequence variant in a nucleic acid sample. In some embodiments, each polynucleotide of a plurality of polynucleotides has a 5' end and a 3' end, and the method comprises: (a) circularizing individual polynucleotides of the plurality to form a plurality of circular polynucleotides, each of which has a junction between the 5' end and the 3' end; (b) amplifying the circular polynucleotides of (a); (c) sequencing the amplified polynucleotides to produce a plurality of sequencing reads; (d) identifying sequence differences between the sequencing reads and a reference sequence; and (e) calling a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant. In some embodiments, the method comprises identifying sequence differences between sequencing reads and a reference sequence, and calling a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant, wherein: (a) the sequencing reads correspond to amplification products of the at least two circular polynucleotides; and (b) each of the at least two circular polynucleotides comprises a different junction formed by ligating the 5' end and the 3' end of the respective polynucleotide.
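The calling rule of step (e) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the invention: it assumes each observed sequence difference has already been assigned a junction identifier by upstream processing, and the tuple layout and the name `call_variants` are hypothetical.

```python
from collections import defaultdict


def call_variants(observations, min_junctions=2):
    """Call sequence differences supported by at least `min_junctions` distinct junctions.

    observations -- iterable of (junction_id, position, ref_base, alt_base) tuples,
                    one per sequence difference observed in a sequencing read
                    (hypothetical input format).
    Returns a sorted list of (position, ref_base, alt_base) tuples.
    """
    junctions_by_difference = defaultdict(set)
    for junction_id, position, ref_base, alt_base in observations:
        junctions_by_difference[(position, ref_base, alt_base)].add(junction_id)
    return sorted(
        difference
        for difference, junctions in junctions_by_difference.items()
        if len(junctions) >= min_junctions
    )


observations = [
    ("junction_1", 101, "A", "G"),
    ("junction_1", 101, "A", "G"),  # amplified copy of the same circle
    ("junction_2", 101, "A", "G"),  # independent circle, same difference
    ("junction_3", 250, "C", "T"),
    ("junction_3", 250, "C", "T"),  # seen twice, but only in one circle
]
print(call_variants(observations))  # [(101, 'A', 'G')]
```

The difference at position 250 is seen in two reads but only one junction, so it is treated as a possible amplification or sequencing artifact and is not called.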

The plurality of polynucleotides can be single-stranded or double-stranded. In some embodiments, the polynucleotides are single-stranded. In some embodiments, circularizing is effected by subjecting the plurality of polynucleotides to a ligation reaction. In some embodiments, an individual circular polynucleotide has a junction that is unique among the circularized polynucleotides. In some embodiments, the sequence variant is a single nucleotide polymorphism (SNP). In some embodiments, the reference sequence is a consensus sequence formed by aligning the sequencing reads with one another. In some embodiments, the reference sequence is a known reference sequence, such as a reference genome or a portion thereof. In some embodiments, circularizing comprises the step of joining an adapter polynucleotide to the 5' end, the 3' end, or both the 5' end and the 3' end of a polynucleotide in the plurality of polynucleotides. In some embodiments, amplifying is effected by using a polymerase having strand-displacement activity, such as in rolling circle amplification (RCA). In some embodiments, amplifying comprises placing the circular polynucleotides in an amplification reaction mixture containing random primers. In some embodiments, amplifying comprises placing the circular polynucleotides in an amplification reaction mixture containing one or more primers, each of which specifically hybridizes to a different target sequence via sequence complementarity. In some embodiments, a microbial contaminant is identified based on the calling step.
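Where the reference sequence is a consensus formed from the reads themselves, the idea can be sketched as a per-position majority vote. This is a simplified illustration only: it assumes the reads are already aligned to equal length with no insertions or deletions, and the function names are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return the majority base at each column of equal-length, pre-aligned reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")
    # zip(*reads) yields one tuple of bases per alignment column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


def differences(read, reference):
    """List (position, reference_base, read_base) wherever the read differs."""
    return [
        (i, ref_base, base)
        for i, (ref_base, base) in enumerate(zip(reference, read))
        if ref_base != base
    ]


reads = ["ACGTAC", "ACGTAC", "ACTTAC"]
reference = consensus(reads)
print(reference)                         # ACGTAC
print(differences(reads[2], reference))  # [(2, 'G', 'T')]
```

A production pipeline would use a real aligner and handle ties and indels; the sketch only shows how a consensus can serve as the comparison baseline.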

The amplified polynucleotides can be sequenced with or without enrichment, such as by performing an enrichment step prior to sequencing to enrich one or more target polynucleotides among the amplified polynucleotides. In some embodiments, the enrichment step comprises hybridizing the amplified polynucleotides to a plurality of probes attached to a substrate. In some embodiments, the enrichment step comprises amplifying, in an amplification reaction mixture, a target sequence comprising sequence A and sequence B oriented in a 5' to 3' direction, the amplification reaction mixture comprising: (a) the amplified polynucleotides; (b) a first primer comprising sequence A', wherein the first primer specifically hybridizes to sequence A of the target sequence via sequence complementarity between sequence A and sequence A'; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B', present in a complementary polynucleotide comprising a complement of the target sequence, via sequence complementarity between sequence B and sequence B'; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein the distance between the 5' end of sequence A and the 3' end of sequence B of the target sequence is 75 nt or less.
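The 75 nt distance condition can be checked directly on a candidate design. The following is a minimal sketch, assuming sequence A lies at or upstream of sequence B on the target strand and that each occurs once; the example sequences and the function names are hypothetical and not taken from the disclosure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def b2b_footprint(target, seq_a, seq_b):
    """Distance in nt from the 5' end of sequence A to the 3' end of sequence B."""
    a_start = target.find(seq_a)
    b_start = target.find(seq_b)
    if a_start < 0 or b_start < 0:
        raise ValueError("sequence A and sequence B must both occur in the target")
    b_end = b_start + len(seq_b)
    if b_end <= a_start:
        raise ValueError("sequence B must not end upstream of sequence A")
    return b_end - a_start


# Hypothetical sequences, for illustration only.
seq_a = "GACCTGAGGTCAACG"
seq_b = "GATCCTTAGCAGTCC"
target = "TT" + seq_a + seq_b + "A"

first_primer = reverse_complement(seq_a)  # comprises A', hybridizes to A
second_primer = seq_b                     # comprises B, hybridizes to B'
footprint = b2b_footprint(target, seq_a, seq_b)
print(first_primer, second_primer, footprint, footprint <= 75)
```

Because the first primer extends upstream of A and the second extends downstream of B, the two primers face away from each other, and the footprint measured here is the only stretch of template that must remain intact for both to bind.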

In one aspect, the present invention provides a method of identifying a sequence variant in a nucleic acid sample comprising less than 50 ng of polynucleotides, each polynucleotide having a 5' end and a 3' end. In some embodiments, the method comprises: (a) circularizing individual polynucleotides in the sample with a ligase to form a plurality of circular polynucleotides; (b) upon separating the circular polynucleotides from the ligase, amplifying the circular polynucleotides to form concatemers; (c) sequencing the concatemers to produce a plurality of sequencing reads; (d) identifying sequence differences between the plurality of sequencing reads and a reference sequence; and (e) calling a sequence difference that occurs with a frequency of 0.05% or higher in the plurality of reads obtained from the nucleic acid sample of less than 50 ng of polynucleotides as the sequence variant. The polynucleotides can be single-stranded or double-stranded. In some embodiments, the polynucleotides are single-stranded. In some embodiments, an individual circular polynucleotide has a junction that is unique among the circularized polynucleotides. In some embodiments, the sequence variant is a single nucleotide polymorphism. In some embodiments, the reference sequence is a consensus sequence formed by aligning the sequencing reads with one another. In some embodiments, the reference sequence is a known reference sequence, such as a reference genome. In some embodiments, amplifying is effected by using a polymerase having strand-displacement activity. In some embodiments, amplifying comprises placing the circular polynucleotides in an amplification reaction mixture containing random primers. In some embodiments, amplifying comprises placing the circular polynucleotides in an amplification reaction mixture containing one or more primers, each of which specifically hybridizes to a different target sequence via sequence complementarity.
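The frequency criterion of step (e) amounts to a simple ratio test. The sketch below is illustrative only; the read counts are made-up numbers and `call_by_frequency` is a hypothetical helper name.

```python
def call_by_frequency(variant_reads, total_reads, min_frequency=0.0005):
    """Return (frequency, called) using a 0.05% threshold by default.

    variant_reads -- reads carrying the sequence difference (hypothetical count)
    total_reads   -- all reads covering the position (hypothetical count)
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    frequency = variant_reads / total_reads
    return frequency, frequency >= min_frequency


for variant_reads in (12, 5):
    frequency, called = call_by_frequency(variant_reads, 20_000)
    print(f"{variant_reads} of 20000 reads: {frequency:.3%} -> called={called}")
```

With these assumed counts, 12 supporting reads out of 20,000 (0.06%) meets the 0.05% threshold, while 5 out of 20,000 (0.025%) does not.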

In one aspect, the present invention provides a method of amplifying, in a reaction mixture, a plurality of different concatemers comprising two or more copies of a target sequence, wherein the target sequence comprises sequence A and sequence B oriented in a 5' to 3' direction. In some embodiments, the method comprises subjecting the reaction mixture to a nucleic acid amplification reaction, wherein the reaction mixture comprises: (a) the plurality of concatemers, wherein individual concatemers in the plurality comprise different junctions formed by circularizing individual polynucleotides having a 5' end and a 3' end; (b) a first primer comprising sequence A', wherein the first primer specifically hybridizes to sequence A of the target sequence via sequence complementarity between sequence A and sequence A'; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B', present in a complementary polynucleotide comprising a complement of the target sequence, via sequence complementarity between sequence B and sequence B'; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein the distance between the 5' end of sequence A and the 3' end of sequence B of the target sequence is 75 nt or less. In some embodiments, the first primer comprises sequence C located 5' with respect to sequence A', the second primer comprises sequence D located 5' with respect to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of concatemers during a first amplification phase at a first hybridization temperature. In some embodiments, amplification comprises a first phase and a second phase; the first phase comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the concatemers prior to primer extension; and the second phase comprises a hybridization step at a second temperature that is higher than the first temperature, during which the first and second primers hybridize to amplification products comprising the extended first or second primer, or complements thereof. In some embodiments, after 5 cycles of hybridization at the second temperature and primer extension, at least 5% of the amplified polynucleotides in the reaction mixture comprise two or more copies of the target sequence.

In a related aspect, the present invention provides a method of amplifying, in a reaction mixture, a plurality of different circular polynucleotides comprising a target sequence, wherein the target sequence comprises sequence A and sequence B oriented in a 5' to 3' direction. In some embodiments, the method comprises subjecting the reaction mixture to a nucleic acid amplification reaction, wherein the reaction mixture comprises: (a) the plurality of circular polynucleotides, wherein individual circular polynucleotides in the plurality comprise different junctions formed by circularizing individual polynucleotides having a 5' end and a 3' end; (b) a first primer comprising sequence A', wherein the first primer specifically hybridizes to sequence A of the target sequence via sequence complementarity between sequence A and sequence A'; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B', present in a complementary polynucleotide comprising a complement of the target sequence, via sequence complementarity between sequence B and sequence B'; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein sequence A and sequence B are endogenous sequences, and the distance between the 5' end of sequence A and the 3' end of sequence B of the target sequence is 75 nt or less. In some embodiments, the first primer comprises sequence C located 5' with respect to sequence A', the second primer comprises sequence D located 5' with respect to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of circular polynucleotides during a first amplification phase at a first hybridization temperature. In some embodiments, amplification comprises a first phase and a second phase; the first phase comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the circular polynucleotides or amplification products thereof prior to primer extension; and the second phase comprises a hybridization step at a second temperature that is higher than the first temperature, during which the first and second primers hybridize to amplification products comprising the extended first or second primer, or complements thereof.

In one aspect, the present invention provides a reaction mixture for performing a method of the invention. The reaction mixture can comprise one or more of the various components described herein with respect to any of the various methods. In some embodiments, the reaction mixture is a mixture for amplifying a plurality of different concatemers comprising two or more copies of a target sequence, wherein the target sequence comprises sequence A and sequence B oriented in a 5' to 3' direction, the reaction mixture comprising: (a) the plurality of concatemers, wherein individual concatemers in the plurality comprise different junctions formed by circularizing individual polynucleotides having a 5' end and a 3' end; (b) a first primer comprising sequence A', wherein the first primer specifically hybridizes to sequence A of the target sequence via sequence complementarity between sequence A and sequence A'; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B', present in a complementary polynucleotide comprising a complement of the target sequence, via sequence complementarity between sequence B and sequence B'; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein the distance between the 5' end of sequence A and the 3' end of sequence B of the target sequence is 75 nt or less. In some embodiments, the first primer comprises sequence C located 5' with respect to sequence A', the second primer comprises sequence D located 5' with respect to sequence B, and neither sequence C nor sequence D hybridizes to the two or more concatemers during a first amplification step of the amplification reaction.

In one aspect, the present invention provides compositions useful in, or produced by, the methods described herein, such as the methods described in any of the various other aspects of the invention. In some embodiments, the composition comprises a plurality of circularized polynucleotides that are single-stranded, and is substantially free of ligase. In some embodiments, the composition comprises a plurality of concatemers, wherein the plurality of concatemers corresponds to a group of 10000 or fewer target polynucleotides, and further wherein individual concatemers in the plurality are characterized in that: (a) they comprise two or more copies of a sequence repeat, wherein all of the copies correspond to the same target polynucleotide; and (b) the junction between the two or more copies of the sequence repeat in one individual concatemer is different from that in another individual concatemer in the composition.

In one aspect, the present invention provides a system for detecting a sequence variant. In some embodiments, the system comprises: (a) a computer configured to receive a user request to perform a detection reaction on a sample; (b) an amplification system that performs a nucleic acid amplification reaction on the sample or a portion thereof in response to the user request, wherein the amplification reaction comprises the steps of (i) circularizing individual polynucleotides to form a plurality of circular polynucleotides, each of which has a junction between the 5' end and the 3' end, and (ii) amplifying the circular polynucleotides; (c) a sequencing system that generates sequencing reads for the polynucleotides amplified by the amplification system, identifies sequence differences between the sequencing reads and a reference sequence, and calls a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant; and (d) a report generator that sends a report to a recipient, wherein the report contains results of the sequence variant detection. In some embodiments, the recipient is the user.

In one aspect, the present invention provides a computer-readable medium comprising code that, upon execution by one or more processors, implements a method of detecting a sequence variant. In some embodiments, the implemented method comprises: (a) receiving a customer request to perform a detection reaction on a sample; (b) performing a nucleic acid amplification reaction on the sample or a portion thereof in response to the customer request, wherein the amplification reaction comprises the steps of (i) circularizing individual polynucleotides to form a plurality of circular polynucleotides, each of which has a junction between the 5' end and the 3' end, and (ii) amplifying the circular polynucleotides; (c) performing a sequencing analysis comprising the steps of (i) generating sequencing reads for the polynucleotides amplified in the amplification reaction, (ii) identifying sequence differences between the sequencing reads and a reference sequence, and (iii) calling a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant; and (d) generating a report that contains results of the sequence variant detection.

Incorporation by reference

All publications, patents, and patent applications mentioned in this specification are incorporated herein by reference to the same extent as if each individual publication, patent, or patent application were specifically and individually indicated to be incorporated by reference.

Brief description of the drawings

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description, which sets forth illustrative embodiments in which the principles of the invention are utilized, and the accompanying drawings, in which:

Fig. 1 depicts a schematic of one embodiment of a method according to the present invention. DNA strands are circularized, and target-specific primers corresponding to the genes under investigation are added along with polymerase, dNTPs, buffers, and so on, such that rolling circle amplification (RCA) occurs to form concatemers (e.g., "multimers") of the template DNA (e.g., "monomers"). The concatemers are treated to synthesize the corresponding complementary strand, and adapters are then added to prepare a sequencing library. The resulting library (which is then sequenced using standard techniques) will generally contain three species: nDNA that does not contain a rare sequence variant (e.g., a mutation) ("normal" DNA); nDNA that contains enzymatic sequencing errors; and DNA that contains multimers of "real" or actual sequence variants that were already present in the sample polynucleotide before amplification. The presence of multiple copies of the actual rare mutation allows the sequence variant to be detected and identified.
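The reasoning in Fig. 1, that a pre-existing variant is copied into every repeat of a concatemer whereas an enzymatic or sequencing error typically appears in only one repeat, can be sketched as follows. This is a simplified illustration: it assumes the read starts at a monomer boundary, that the monomer length is known, and that there are no insertions or deletions; the function names and labels are hypothetical.

```python
def split_copies(concatemer, monomer_length):
    """Split a concatemer read into consecutive full-length monomer copies."""
    last_start = len(concatemer) - monomer_length
    return [
        concatemer[i:i + monomer_length]
        for i in range(0, last_start + 1, monomer_length)
    ]


def classify_differences(concatemer, reference_monomer):
    """Label each differing position by whether all copies agree on the change."""
    copies = split_copies(concatemer, len(reference_monomer))
    if len(copies) < 2:
        raise ValueError("need at least two copies to compare repeats")
    labels = {}
    for position, ref_base in enumerate(reference_monomer):
        alt_bases = [copy[position] for copy in copies if copy[position] != ref_base]
        if not alt_bases:
            continue
        if len(alt_bases) == len(copies) and len(set(alt_bases)) == 1:
            labels[position] = "variant_candidate"  # same change in every copy
        else:
            labels[position] = "likely_error"       # change missing from some copies
    return labels


reference = "ACGTACGA"
# Three copies carrying A->T at position 4; the second copy also has an
# isolated C->G change at position 1.
read = "ACGTTCGA" + "AGGTTCGA" + "ACGTTCGA"
print(classify_differences(read, reference))
# {1: 'likely_error', 4: 'variant_candidate'}
```

A candidate identified this way would still need to be confirmed on a second circular polynucleotide with a different junction before being called, as described in the summary.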

Fig. 2 depicts a strategy similar to that of Fig. 1, but with the addition of adapters to facilitate circularization of the polynucleotides. Fig. 2 also shows the use of target-specific primers.

Fig. 3 is similar to Fig. 2, except that adapter primers are used in the amplification.

Fig. 4 depicts three embodiments relating to the circularization that forms ccDNA. At the top, single-stranded DNA (ssDNA) is circularized in the absence of adapters; the middle scheme depicts the use of an adapter; and the bottom scheme uses two adapter oligos (producing a different sequence at each end) and may further include a splint oligo that hybridizes to both adapters to bring the two ends into proximity.

Fig. 5 depicts an embodiment in which specific targets are circularized by using a "molecular clamp" to bring the two ends of the single-stranded DNA into spatial proximity for ligation.

Figs. 6A and 6B depict two schemes for adding adapters using blocked ends of the nucleic acid.

Figs. 7A, 7B, and 7C depict three different ways of priming a rolling circle amplification (RCA) reaction. Fig. 7A shows the use of target-specific primers, e.g., primers for a particular target gene or target sequence of interest. This generally results in only the target sequence being amplified. Fig. 7B depicts the use of random primers, which generally amplify all of the sample sequences; these are then sorted bioinformatically during processing. Fig. 7C depicts the use of adapter primers when adapters are used, which generally also results in non-target-specific amplification.

Fig. 8 depicts an example, according to an embodiment, of double-stranded DNA circularization and amplification such that both strands are amplified.

Figs. 9A, 9B, 9C, and 9D depict a variety of schemes for achieving complementary-strand synthesis for subsequent sequencing. Fig. 9A depicts random priming of the target strand, followed by ligation. Fig. 9B depicts adapter priming of the target strand, similarly followed by ligation. Fig. 9C depicts the use of a "loop" adapter, in which the adapter has two complementary stretches of sequence that hybridize to each other to create a loop (e.g., a stem-loop structure). Once ligated to the end of the concatemer, the free end of the loop serves as the primer for the complementary strand. Fig. 9D shows the use of hyperbranching random primers to achieve second-strand synthesis.

Figure 10 shows a PCR method, according to an embodiment, that facilitates sequencing of circular polynucleotides or strands containing at least two copies of a target nucleic acid sequence. The method uses a pair of primers that are oriented away from one another when aligned within a monomer of the target sequence (also referred to as "back-to-back", e.g., oriented in two directions but not located at the ends of the region to be amplified). In some embodiments, these primer sets are used after concatemers are formed, to favor amplicons that are higher multimers of the target sequence, e.g., dimers, trimers, and so on. Optionally, the method can further comprise size selection to remove amplicons that are smaller than dimers.

Figure 11 depicts an embodiment in which back-to-back (B2B) primers are used with a "touch up" PCR step, such that amplification of short products (such as monomers) is less favored. In this case, the primers have two domains: a first domain that hybridizes to the target sequence (grey or black arrow), and a second domain that does not hybridize to the original target sequence and serves as a "universal primer" binding domain (bent rectangle; sometimes also referred to as an adapter). In some embodiments, the first rounds of PCR are performed with a low-temperature annealing step, such that the gene-specific sequences bind. The low-temperature run produces PCR products of various lengths, including short products. After several rounds, the annealing temperature is raised, such that hybridization of the entire primer, with both domains, is favored; as shown, such sites are found at the ends of the templates, while internal binding is less stable. Shorter products are therefore less favored at the higher temperature with both domains than at the lower temperature or with only a single domain.
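The temperature logic of the "touch up" step can be illustrated with a rough melting-temperature estimate. The sketch below uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is only a crude approximation intended for short oligonucleotides and is not stated in the disclosure; the primer sequences are hypothetical. It is meant only to show that the full two-domain primer anneals stably at a higher temperature than the target-specific domain alone.

```python
def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G, T")
    return 2 * at + 4 * gc


# Hypothetical tailed primer: universal tail (5') + target-specific domain (3').
target_specific = "GATCCTTAGCAGTCC"
universal_tail = "CTACGAGTTCGACCA"
full_primer = universal_tail + target_specific

tm_specific = wallace_tm(target_specific)  # duplex formed in the first phase
tm_full = wallace_tm(full_primer)          # duplex formed on tailed products
print(f"target-specific domain only: ~{tm_specific} C")
print(f"entire primer (both domains): ~{tm_full} C")
print("raising annealing temperature favors full-length binding:", tm_full > tm_specific)
```

An annealing temperature chosen between the two estimates would, in this simplified picture, permit priming at product ends that carry the tail complement while disfavoring internal binding by the target-specific domain alone.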

Figures 12A and 12B depict two different methods of sequencing library construction. Figure 12A shows an example of the Nextera sample preparation system, by which DNA can be fragmented and tagged with sequencing adapters simultaneously in a single step. In Figure 12B, concatemers are fragmented by sonication, adapters are then added to both ends (e.g., by using a kit from KAPA Biosystems), and PCR amplification is performed. Other methods are also available.

Figures 13A-C provide an illustration of exemplary advantages of back-to-back (B2B) primer design compared with traditional PCR primer design. Traditional PCR primer design (left) places the primers (arrows A and B) in regions flanking the target sequence, which may be a hotspot for mutations (black star), and the primers are typically at least 60 base pairs (bp) apart, resulting in a typical footprint of about 100 bp. In this illustration, the B2B primer design (right) places the primers on one side of the target sequence. The two B2B primers face in opposite directions and may overlap (e.g., by about or less than about 12 bp, 10 bp, 5 bp, or fewer). Depending on the lengths of the B2B primers, the total footprint in this illustration can be 28-50 bp. Because of its larger footprint, the traditional design is more likely to have primer binding disrupted by a fragmentation event, leading to loss of sequence information, whether for linear fragments (13A), circularized DNA (13B), or amplification products (13C). In addition, as shown in Figure 13C, the B2B primer design captures junction sequences (also referred to as "natural barcodes") that can be used to distinguish different polynucleotides.
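The effect of footprint size on template loss can be estimated with a simple geometric model. This is an assumption-laden sketch, not an analysis from the disclosure: it supposes fragments of one fixed length whose start positions are uniformly distributed, and asks what fraction of fragments overlapping the primer footprint contain it entirely. For fragment length L and footprint F (with L at least F), that fraction is (L - F + 1) / (L + F - 1). The 150 bp fragment length is a hypothetical value.

```python
def intact_fraction(fragment_length, footprint):
    """Fraction of fragments overlapping the footprint that contain all of it.

    Assumes fixed-length fragments with uniformly distributed start positions.
    """
    if fragment_length <= 0 or footprint <= 0:
        raise ValueError("lengths must be positive")
    if fragment_length < footprint:
        return 0.0
    containing = fragment_length - footprint + 1   # starts that span the footprint
    overlapping = fragment_length + footprint - 1  # starts that touch it at all
    return containing / overlapping


fragment_length = 150  # assumed fragment length in bp
for label, footprint in (("traditional (~100 bp)", 100), ("B2B (~40 bp)", 40)):
    fraction = intact_fraction(fragment_length, footprint)
    print(f"{label}: {fraction:.1%} of overlapping fragments keep both primer sites")
```

Under this model, shrinking the footprint from about 100 bp to about 40 bp roughly triples the share of overlapping fragments that remain amplifiable, which is the advantage Figures 13A-C illustrate.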

Figure 14 shows the method for generating the template for detection sequence variant according to an embodiment (for example, using It is cyclized the embodiment example of the process of polynucleotides, referred to herein as " Nebula ").DNA input denaturation is become SsDNA is cyclized by connection, and the uncyclized DNA that degraded by exonuclease digestion.Pass through quantitative PCR (qPCR) joint efficiency is quantified, compares the amount of input DNA and cyclized DNA, generally produces at least about 80% joint efficiency.It will The DNA of cyclisation is purified in exchange buffering liquid, then carries out whole genome amplification using random primer and Phi29 polymerase (WGA).By WGA product purification, and by product fragmentation (such as passing through ultrasonic treatment) be about 400bp or be less than about 400bp Short-movie section.By the target hit rate (on-target rate) of the DNA of qPCR quantitative amplification, wherein more same amount of ginseng The DNA of genomic DNA and amplification is examined, usually shows about 95% or greater than about 95% average target hit rate.

Figure 15 shows " the rising progressively " second for carrying out amplification using tailing B2B primer and implementing PCR at relatively high temperatures The other embodiments in stage.B2B primer includes sequence-specific regions (heavy black) and linking subsequence (hollow frame).Compared with At a temperature of low first stage annealing, target specific sequence and template annealing, to generate initial monomer, and PCR product packet Containing tandem sequence repeats (15A).It is the second amplification stage at relatively high temperatures, more advantageous compared with the hybridization of individual target-specific sequences In the hybridization of target specific sequence and linking subsequence, it reduce the degree (15B) for preferentially generating short product.When not advantageous When complete primer, the ratio (15C, left) for the monomer that increases sharply with the annealing of the inside of target specific sequence.

Figure 16 shows the comparison between the ambient noise (frequency of variant) detected by target sequencing approach, the target Sequencing approach uses Q30 filter, it is desirable that (bottom line) and does not require (top line) more in two differences of variant to be counted as There are sequence differences on nucleotide (for example, identified by different contacts).This verifying filter is applied herein In also referred to as " Firefly ".Human genome DNA (12878, Coriell Institute) turns to 100- by segment 200bp, and 2% incorporation including the genomic DNA (19240, Coriell Institute) containing known SNP (CYP2C19) (spike-in).Real variant signal (peak of label) does not significantly exceed background (top, light gray chromatic graph).It is tested by application Demonstrate,prove filter, background noise reduction to about 0.1 (lower, black figure).
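For context on the Q30 filter, the sketch below converts a quality score to a per-base error probability. It assumes that "Q30" refers to the standard Phred scale, where the error probability is 10^(-Q/10); the source does not spell this out, and the function name is hypothetical.

```python
def phred_to_error_probability(quality):
    """Convert a Phred quality score to a per-base error probability."""
    if quality < 0:
        raise ValueError("quality score must be non-negative")
    return 10 ** (-quality / 10)


for quality in (20, 30, 40):
    probability = phred_to_error_probability(quality)
    print(f"Q{quality}: error probability {probability:.4%}")
```

On this scale a Q30 base call has an estimated error probability of 0.1%, the same order as the rare variant frequencies of interest, which is consistent with the figure's point that quality filtering alone leaves a background that the two-polynucleotide requirement further reduces.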

Figure 17 shows the detection of sequence variants spiked into a population of polynucleotides at various low frequencies (2%, 0.2% and 0.02%), which are nevertheless significantly above background when the method of the present invention is applied.

Figure 18 shows the results of an analysis of the ligation efficiency and on-target rate of an embodiment of the present invention.

Figure 19 shows that allele frequencies are preserved, with substantially no bias, in a method according to an embodiment of the present invention.

Figure 20 shows the results of detecting sequence variants in a small input sample according to an embodiment.

Figure 21 shows an example of the high background in sequence variant detection results obtained by a standard sequencing method, in which a sequence difference is not required to be present on two different polynucleotides.

Figure 22 provides graphs comparing the GC content distribution of the genome with that of sequencing results produced by a method according to an embodiment of the present invention ("Nebula-Firefly"; left), sequencing results obtained using an alternative sequencing library construction kit (Rubicon, Rubicon Genomics; middle), and cell-free DNA (cfDNA) as typically reported in the literature for 32 ng (right).

Figure 23 provides a graph showing the size distribution of input DNA, as obtained from sequencing reads, in a method according to an embodiment.

Figure 24 provides a graph showing uniform amplification across multiple targets by random priming according to an embodiment.

Figure 25 shows an embodiment for forming polynucleotide multimers with identifiable junctions without circularization. Polynucleotides (e.g., polynucleotide fragments or cell-free DNA) are joined to form multimers with non-natural junctions, which can be used to distinguish independent polynucleotides according to embodiments of the present invention (also referred to herein as "auto-tags"). In Figure 25A, the polynucleotides are joined directly to one another by blunt-end ligation. In Figure 25B, the polynucleotides are joined via one or more intervening adapter oligonucleotides, which may further comprise a barcode sequence. The multimers are then amplified by any of a variety of methods, such as by using random primers (whole genome amplification), adapter primers, or one or more target-specific primers or primer pairs. The process of forming multimers with identifiable junctions from multiple individual polynucleotides is also referred to herein as "Eclipse".

Figure 26 shows a variation of the process of Figure 25. Polynucleotides (e.g., cfDNA or other polynucleotide fragments) are end-repaired, A-tailed and ligated to adapters (e.g., using a standard kit, such as a KAPA Biosystems kit). Carrier DNA internally labeled with uracil (U) can be added to raise the total DNA input to a desired level (e.g., to about 20 ng or more). The sequence variant to be detected is indicated by an asterisk. When ligation is complete, the carrier DNA can be degraded by adding Uracil-Specific Excision Reagent (USER) enzyme, which is a mixture of uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII. The product is purified to remove fragments of carrier DNA. The purified product is amplified (e.g., by PCR, using primers directed to the adapter sequences). Any remaining carrier DNA is unlikely to be amplified, because it has been degraded and separated from the adapter at at least one end. The amplified product can be purified to remove short DNA fragments.

Figure 27 shows a variation of the process of Figure 25. The target-specific amplification primers include a common 5' "tail" that serves as an adapter (gray arrow). An initial amplification is carried out (e.g., by PCR) for several cycles (e.g., at least about 5, 10 or more cycles). The PCR products can also act as primers, annealing to other PCR products (e.g., when the annealing temperature is lowered in a second stage) to generate concatemers with identifiable junctions. The second stage may comprise multiple cycles (e.g., 5, 10, 15, 20 or more cycles) and may include selecting or varying conditions to favor concatemer formation and amplification. Methods according to this schematic are also referred to as "Relay Amp Seq" and are particularly useful in compartmentalized settings (e.g., in droplets).

Figures 28A-E show non-limiting examples of methods for circularizing polynucleotides. In Figure 28A, a double-stranded polynucleotide (e.g., dsDNA) is denatured into single strands and then circularized directly (e.g., by self-ligation using CircLigase). In Figure 28B, polynucleotides (e.g., DNA fragments) are end-repaired and A-tailed (a single-base adenosine extension is added to the 3' end) to improve ligation efficiency, then denatured into single strands and circularized. In Figure 28C, the polynucleotides are end-repaired and A-tailed (if double-stranded), ligated to adapters having a thymidine (T) extension, denatured into single strands and circularized. In Figure 28D, the polynucleotides are end-repaired and A-tailed (if double-stranded), both ends are ligated to an adapter having three elements (a T extension for ligation, a region of complementarity between adapters, and a 3' tail), the strands are denatured, and the single-stranded polynucleotides are circularized (facilitated by the complementary regions between the adapter sequences). In Figure 28E, a double-stranded polynucleotide is denatured into single-stranded form and circularized in the presence of a molecular clamp that brings the ends of the polynucleotide close together to facilitate ligation.

Figure 29 shows an example workflow design of an amplification system for identifying sequence variants according to the methods of the present invention, particularly for circularized polynucleotides.

Figure 30 shows an example workflow design of an amplification system for identifying sequence variants according to the methods of the present invention, particularly for linear polynucleotide inputs without a circularization step.

Figure 31 provides a summary diagram of example workflows for identifying sequence variants according to the methods of the present invention. In the "Eclipse" branch (analysis of linear polynucleotides), the analysis may include digital PCR (e.g., droplet digital PCR, ddPCR), real-time PCR, enrichment by probe capture (capture sequencing) with analysis of junction sequences (auto-tags), sequencing based on inserted adapter sequences (barcode insertion), or Relay Amp sequencing. In the "Nebula" branch (analysis of circularized polynucleotides), the analysis may include digital PCR (e.g., droplet digital PCR, ddPCR), real-time PCR, enrichment by probe capture (capture sequencing) with analysis of junction sequences (natural barcodes), enrichment by probe capture or targeted amplification (e.g., B2B amplification), and sequence analysis with a validation step that identifies a sequence variant as a difference present in two different polynucleotides (e.g., polynucleotides with different junctions).

Figure 32 is a diagram of a system according to an embodiment.

Figure 33 shows the capture efficiency and the coverage along the target region according to an example. More than 90% of targeted bases are covered at more than 20x, and more than 50% of targeted bases have coverage of more than 50x.

Detailed description of the invention

Unless otherwise indicated, the practice of some embodiments disclosed herein employs conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA, which are within the skill of the art. See, e.g., Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th edition (2012); the series Current Protocols in Molecular Biology (F. M. Ausubel et al., eds.); the series Methods In Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach (M. J. MacPherson, B. D. Hames and G. R. Taylor, eds. (1995)); Harlow and Lane, eds. (1988), Antibodies, A Laboratory Manual; and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th edition (R. I. Freshney, ed. (2010)).

The term "about" or "approximately" means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which depends in part on how the value is measured or determined, i.e., on the limitations of the measurement system. For example, "about" can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, "about" can mean a range of up to 20%, up to 10%, up to 5% or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude of a value, preferably within 5-fold, and more preferably within 2-fold. Where particular values are described in the application and claims, unless otherwise stated, the term "about" should be taken to mean within an acceptable error range for the particular value.

The terms "polynucleotide", "nucleotide", "nucleotide sequence", "nucleic acid" and "oligonucleotide" are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined by linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short hairpin RNA (shRNA), microRNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes and primers. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.

In general, the term "target polynucleotide" refers to a nucleic acid molecule or polynucleotide, in a starting population of nucleic acid molecules, having a target sequence whose presence, amount and/or nucleotide sequence, or changes in one or more of these, are to be determined. In general, the term "target sequence" refers to a nucleic acid sequence on a single strand of nucleic acid. The target sequence may be a portion of a gene, a regulatory sequence, genomic DNA, cDNA, RNA (including mRNA, miRNA and rRNA), or others. The target sequence may be a target sequence from a sample or a secondary target, such as a product of an amplification reaction.

In general, a "nucleotide probe", "probe" or "tag oligonucleotide" refers to a polynucleotide used for detecting or identifying its corresponding target polynucleotide in a hybridization reaction by hybridizing with a corresponding target sequence. A nucleotide probe can therefore hybridize to one or more target polynucleotides. Tag oligonucleotides can be perfectly complementary to one or more target polynucleotides in a sample, or can contain one or more nucleotides that are not complementary to the corresponding nucleotides in the one or more target polynucleotides in the sample.

"Hybridization" refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized by hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner according to base complementarity. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a more extensive process, such as the initiation of PCR or the digestion of a polynucleotide by an endonuclease. A second sequence that is complementary to a first sequence is referred to as the "complement" of the first sequence. The term "hybridizable", as applied to a polynucleotide, refers to the ability of the polynucleotide to form a complex that is stabilized by hydrogen bonding between the bases of the nucleotide residues in a hybridization reaction.

"Complementarity" refers to the ability of a nucleic acid to form hydrogen bonds with another nucleic acid sequence by either traditional Watson-Crick pairing or other non-traditional types of pairing. Percent complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9 or 10 out of 10 being 50%, 60%, 70%, 80%, 90% and 100% complementary, respectively). "Perfectly complementary" means that all the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. "Substantially complementary" as used herein refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions. Sequence identity, such as for the purpose of assessing percent complementarity, may be measured by any suitable alignment algorithm, including but not limited to the Needleman-Wunsch algorithm (see, e.g., the EMBOSS Needle aligner available at www.ebi.ac.uk/Tools/psa/emboss_needle/nucleotide.html, optionally with default settings), the BLAST algorithm (see, e.g., the BLAST alignment tool available at blast.ncbi.nlm.nih.gov/Blast.cgi, optionally with default settings), or the Smith-Waterman algorithm (see, e.g., the EMBOSS Water aligner available at www.ebi.ac.uk/Tools/psa/emboss_water/nucleotide.html, optionally with default settings). Optimal alignment may be assessed using any suitable parameters of the chosen algorithm, including default parameters.

In general, "stringent conditions" for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes with the target sequence and substantially does not hybridize to non-target sequences. Stringent conditions are generally sequence-dependent and vary depending on a number of factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in detail in Tijssen (1993), Laboratory Techniques In Biochemistry And Molecular Biology-Hybridization With Nucleic Acid Probes, Part I, Chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assay", Elsevier, N.Y.
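
The percent complementarity defined above (for example, 8 of 10 residues pairing being 80% complementary) can be illustrated with a minimal sketch. It assumes two DNA sequences of equal length written 5' to 3', counts only Watson-Crick pairs, and compares the strands in antiparallel orientation; the function names are hypothetical and are not part of the described methods, which may use any suitable alignment algorithm.

```python
# Watson-Crick pairing partners for DNA bases. This is an illustrative
# assumption: RNA bases and non-traditional pairings are left out.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(first: str, second: str) -> float:
    """Percentage of residues in `first` that can pair with `second`.

    Both sequences are written 5'->3' and must have the same length;
    `second` is read in reverse so the strands are compared antiparallel.
    """
    if len(first) != len(second):
        raise ValueError("this sketch compares equal-length sequences only")
    if not first:
        raise ValueError("sequences must not be empty")
    paired = sum(
        1
        for a, b in zip(first.upper(), reversed(second.upper()))
        if PAIRS.get(a) == b
    )
    return 100.0 * paired / len(first)


def is_perfectly_complementary(first: str, second: str) -> bool:
    """True when every residue of `first` pairs with `second`."""
    return percent_complementarity(first, second) == 100.0
```

Sequences of unequal length, gaps, or non-Watson-Crick pairing would require an alignment step first, such as one of the algorithms named above.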

In one aspect, the present invention provides a method of identifying a sequence variant, such as a sequence variant in a nucleic acid sample. In some embodiments, each polynucleotide of a plurality of polynucleotides has a 5' end and a 3' end, and the method comprises: (a) circularizing individual polynucleotides of the plurality to form a plurality of circular polynucleotides, each of which has a junction between the 5' end and the 3' end; (b) amplifying the circular polynucleotides of (a); (c) sequencing the amplified polynucleotides to produce a plurality of sequencing reads; (d) identifying sequence differences between the sequencing reads and a reference sequence; and (e) calling a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant. In some embodiments, the method comprises identifying sequence differences between sequencing reads and a reference sequence, and calling a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant, wherein: (a) the sequencing reads correspond to amplification products of the at least two circular polynucleotides; and (b) each of the at least two circular polynucleotides comprises a different junction formed by ligating the 5' end and the 3' end of the respective polynucleotide.
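
Steps (d) and (e) above can be illustrated with a minimal sketch. It assumes that each sequencing read has already been aligned and reduced to two things: the junction of the circular polynucleotide it came from, represented here as a hypothetical pair of reference coordinates for the joined 5' and 3' ends, and the differences it shows relative to the reference. The names `Read` and `call_variants` are illustrative only and do not come from the described methods.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple

# Hypothetical representations, chosen for illustration only.
Junction = Tuple[int, int]          # (5' end coordinate, 3' end coordinate)
Difference = Tuple[int, str, str]   # (reference position, reference base, read base)


class Read(NamedTuple):
    junction: Junction
    differences: Tuple[Difference, ...]


def call_variants(reads: Iterable[Read], min_junctions: int = 2) -> List[Difference]:
    """Return differences found in polynucleotides with distinct junctions.

    A difference is kept only if it is supported by reads carrying at
    least `min_junctions` different junctions, i.e. it was observed in at
    least that many independent circular polynucleotides.
    """
    support: Dict[Difference, Set[Junction]] = defaultdict(set)
    for read in reads:
        for diff in read.differences:
            support[diff].add(read.junction)
    return sorted(
        diff for diff, junctions in support.items()
        if len(junctions) >= min_junctions
    )
```

Under these assumptions, a difference seen many times in reads that share a single junction is not called, since those reads may all be copies of one circular polynucleotide carrying an amplification or sequencing error.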

In general, the term "sequence variant" refers to any variation in sequence relative to one or more reference sequences. Typically, the sequence variant occurs with a lower frequency than the reference sequence for a given population of individuals for whom the reference sequence is known. For example, a particular bacterial genus may have a consensus reference sequence for the 16S rRNA gene, but individual species within that genus may have one or more sequence variants within the gene (or a portion thereof) that are useful in identifying that species in a population of bacteria. As a further example, the sequences of multiple individuals of the same species (or multiple sequencing reads for the same individual) may produce a consensus sequence when optimally aligned, and sequence variants with respect to that consensus may be used to identify mutants in the population that are indicative of dangerous contamination. In general, a "consensus sequence" refers to a nucleotide sequence that reflects the most common choice of base at each position in the sequence when a series of related nucleic acids has been subjected to intensive mathematical and/or sequence analysis, such as optimal sequence alignment according to any of a variety of sequence alignment algorithms. A variety of alignment algorithms are available, some of which are described herein. In some embodiments, the reference sequence is a single known reference sequence, such as the genomic sequence of a single individual. In some embodiments, the reference sequence is a consensus sequence formed by aligning multiple known sequences, such as the genomic sequences of multiple individuals serving as a reference population, or multiple sequencing reads of polynucleotides from the same individual. In some embodiments, the reference sequence is a consensus sequence formed by optimally aligning the sequences from the sample under analysis, such that a sequence variant represents a variation relative to corresponding sequences in the same sample. In some embodiments, the sequence variant occurs with a low frequency in the population (also referred to as a "rare" sequence variant). For example, the sequence variant may occur with a frequency of about or less than about 5%, 4%, 3%, 2%, 1.5%, 1%, 0.75%, 0.5%, 0.25%, 0.1%, 0.075%, 0.05%, 0.04%, 0.03%, 0.02%, 0.01%, 0.005%, 0.001% or lower. In some embodiments, the sequence variant occurs with a frequency of about or less than about 0.1%.
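
The notion of a consensus sequence as the most common base at each position, and of a variant frequency relative to it, can be sketched as follows. The sketch assumes the sequences have already been aligned to equal length by some alignment algorithm; the function names are hypothetical, and ties between equally common bases are resolved arbitrarily by first occurrence.

```python
from collections import Counter
from typing import Sequence


def consensus_sequence(aligned: Sequence[str]) -> str:
    """Most common base at each column of equal-length, pre-aligned sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) iterates over alignment columns; ties go to the base
    # that appears first in the column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def variant_frequency(aligned: Sequence[str], position: int, base: str) -> float:
    """Fraction of sequences that carry `base` at the 0-based `position`."""
    if not aligned:
        raise ValueError("need at least one sequence")
    return sum(1 for seq in aligned if seq[position] == base) / len(aligned)
```

With four aligned reads of which one differs at a single position, the consensus is the majority sequence and the differing base has a frequency of 0.25; rare variants as described above would correspond to much smaller fractions over many more reads.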

A sequence variant can be any variation with respect to a reference sequence. A sequence variation may consist of a change in, insertion of, or deletion of a single nucleotide or of a plurality of nucleotides (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides). Where a sequence variant comprises two or more nucleotide differences, the nucleotides that are different may be contiguous with one another or discontinuous. Non-limiting examples of types of sequence variants include single nucleotide polymorphisms (SNP), deletion/insertion polymorphisms (DIP), copy number variants (CNV), short tandem repeats (STR), simple sequence repeats (SSR), variable number of tandem repeats (VNTR), amplified fragment length polymorphisms (AFLP), retrotransposon-based insertion polymorphisms, and sequence-specific amplified polymorphisms.
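
As a simple illustration of the distinction between changes, insertions and deletions, and between contiguous and discontinuous differences, the following sketch classifies a variant from a reference allele and an observed allele. The allele representation and the function names are assumptions made for this example only.

```python
from typing import List


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant by comparing reference and observed allele lengths."""
    if ref == alt:
        raise ValueError("alleles are identical; there is no variant")
    if len(ref) == len(alt):
        return "substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


def differing_positions(ref: str, alt: str) -> List[int]:
    """0-based positions at which two equal-length alleles differ."""
    if len(ref) != len(alt):
        raise ValueError("positions are only compared for equal-length alleles")
    return [i for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def are_contiguous(positions: List[int]) -> bool:
    """True when the differing positions form one uninterrupted run."""
    if not positions:
        return False
    return positions == list(range(positions[0], positions[0] + len(positions)))
```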

Nucleic acid samples that may be subjected to the methods described herein can be derived from any suitable source. In some embodiments, the sample used is an environmental sample. An environmental sample may come from any environmental source, for example, a naturally occurring or artificial atmosphere, a water system, soil, or any other sample of interest. In some embodiments, environmental samples may be obtained from, for example, atmospheric pathogen collection systems, sub-surface sediments, groundwater, ancient water deep within the ground, plant root-soil interfaces of grassland, coastal water and sewage treatment plants.

Polynucleotides from a sample may be any of a variety of polynucleotides, including but not limited to DNA, RNA, ribosomal RNA (rRNA), transfer RNA (tRNA), microRNA (miRNA), messenger RNA (mRNA), fragments of any of these, or combinations of any two or more of these. In some embodiments, the sample comprises DNA. In some embodiments, the sample comprises genomic DNA. In some embodiments, the sample comprises mitochondrial DNA, chloroplast DNA, plasmid DNA, bacterial artificial chromosomes, yeast artificial chromosomes, oligonucleotide tags, or combinations thereof. In some embodiments, the sample comprises DNA generated by amplification, such as by primer extension reactions using any suitable combination of primers and a DNA polymerase, including but not limited to polymerase chain reaction (PCR), reverse transcription, and combinations thereof. Where the template for the primer extension reaction is RNA, the product of reverse transcription is referred to as complementary DNA (cDNA). Primers useful in primer extension reactions can comprise sequences specific to one or more targets, random sequences, partially random sequences, and combinations thereof. In general, sample polynucleotides comprise any polynucleotide present in a sample, which may or may not include target polynucleotides. The polynucleotides may be single-stranded, double-stranded, or a combination of these. In some embodiments, polynucleotides subjected to a method of the invention are single-stranded polynucleotides, which may or may not be in the presence of double-stranded polynucleotides. In some embodiments, the polynucleotides are single-stranded DNA. Single-stranded DNA (ssDNA) may be ssDNA that is isolated in single-stranded form, or DNA that is isolated in double-stranded form and subsequently made single-stranded for the purpose of one or more steps in a method of the invention.

In some embodiments, polynucleotides are subjected to subsequent steps (e.g., circularization and amplification) without an extraction step and/or without a further purification step. For example, a fluid sample may be treated to remove cells without an extraction step, producing a purified fluid sample and a cell sample, followed by isolation of DNA from the purified fluid sample. A variety of procedures for isolating polynucleotides are available, such as precipitation, or non-specific binding to a substrate followed by washing the substrate to release the bound polynucleotides. Where polynucleotides are isolated from a sample without a cellular extraction step, the polynucleotides will largely be extracellular or "cell-free" polynucleotides, which may correspond to dead or damaged cells. The identity of such cells may be used to characterize the cells or population of cells from which they are derived, such as in a microbial community.

If a sample is treated to extract polynucleotides, such as from cells in the sample, a variety of extraction methods are available. For example, nucleic acids can be purified by organic extraction with phenol, phenol/chloroform/isoamyl alcohol, or similar formulations, including TRIzol and TriReagent. Other non-limiting examples of purification techniques include: (1) organic extraction followed by ethanol precipitation, e.g., using a phenol/chloroform organic reagent (Ausubel et al., 1993), with or without the use of an automated nucleic acid extractor, e.g., the Model 341 DNA Extractor available from Applied Biosystems (Foster City, Calif.); (2) stationary phase adsorption methods (U.S. Patent No. 5,234,809; Walsh et al., 1991); and (3) salt-induced nucleic acid precipitation methods (Miller et al., 1988), which are typically referred to as "salting-out" methods. Another example of nucleic acid isolation and/or purification includes the use of magnetic particles to which nucleic acids can bind specifically or non-specifically, followed by isolation of the beads using a magnet, and washing and elution of the nucleic acids from the beads (see, e.g., U.S. Patent No. 5,705,628). In some embodiments, the above isolation methods may be preceded by an enzymatic digestion step to help remove unwanted protein from the sample, e.g., digestion with proteinase K or other similar proteases. See, e.g., U.S. Patent No. 7,001,724. If desired, RNase inhibitors may be added to the lysis buffer. For certain cell or sample types, it may be desirable to add a protein denaturation/digestion step to the protocol. Purification methods may be directed to isolating DNA, RNA, or both. When DNA and RNA are isolated together during or after an extraction procedure, further steps may be employed to purify one or both separately from the other. Sub-fractions of extracted nucleic acids can also be generated, for example, by purification according to size, sequence, or another physical or chemical characteristic. In addition to an initial nucleic acid isolation step, purification of nucleic acids can be performed after any step in the disclosed methods, such as to remove excess or unwanted reagents, reactants, or products. A variety of methods for determining the amount and/or purity of nucleic acid in a sample are available, such as by absorbance (e.g., absorbance of light at 260 nm and 280 nm, and the ratio thereof) and by detection of a label (e.g., fluorescent dyes and intercalating agents, such as SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst stain, SYBR gold and ethidium bromide).
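
The absorbance-based assessment mentioned above can be illustrated with a short sketch. The conversion factor of about 50 ng/uL of double-stranded DNA per absorbance unit at 260 nm (1 cm path length), and the common rule of thumb that an A260/A280 ratio near 1.8 indicates DNA relatively free of protein, are general laboratory conventions assumed here; they are not values taken from this document, and the function names are hypothetical.

```python
# Conventional conversion factor, assumed for illustration:
# A260 of 1.0 at a 1 cm path length corresponds to about 50 ng/uL dsDNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Ratio of absorbance at 260 nm to absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def dsdna_concentration_ng_per_ul(
    a260: float, dilution_factor: float = 1.0, path_length_cm: float = 1.0
) -> float:
    """Estimate dsDNA concentration from A260 using the conventional factor."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor / path_length_cm
```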

According to some embodiments, polynucleotides among the plurality of polynucleotides from a sample are circularized. Circularization can include joining the 5' end of a polynucleotide to the 3' end of the same polynucleotide, to the 3' end of another polynucleotide in the sample, or to the 3' end of a polynucleotide from a different source (e.g., an artificial polynucleotide, such as an oligonucleotide adapter). In some embodiments, the 5' end of a polynucleotide is joined to the 3' end of the same polynucleotide (also referred to as "self-joining"). In some embodiments, the conditions of the circularization reaction are selected to favor self-joining of polynucleotides within a particular range of lengths, so as to produce a population of circularized polynucleotides of a particular average length. For example, circularization reaction conditions may be selected to favor self-joining of polynucleotides shorter than about 5000, 2500, 1000, 750, 500, 400, 300, 200, 150, 100, 50 or fewer nucleotides in length. In some embodiments, fragments having lengths of 50-5000 nucleotides, 100-2500 nucleotides or 150-500 nucleotides are favored, such that the average length of the circularized polynucleotides falls within the respective range. In some embodiments, 80% or more of the circularized fragments are 50-500 nucleotides in length, such as 50-200 nucleotides in length. Reaction conditions that may be optimized include the length of time allotted for the joining reaction, the concentration of various reagents, and the concentration of the polynucleotides to be joined. In some embodiments, the circularization reaction preserves the distribution of fragment lengths present in the sample prior to circularization. For example, one or more of the mean, median, mode and standard deviation of the fragment lengths in the sample before circularization and of the circularized polynucleotides are within 75%, 80%, 85%, 90%, 95% or more of one another.
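
Whether a circularization reaction preserved the fragment length distribution can be checked by comparing summary statistics before and after, as in the sketch below. The interpretation of "within a percentage of one another" as the ratio of the smaller to the larger value is an assumption made for this example, as are the function names and the default threshold and statistics.

```python
import statistics
from typing import Dict, Sequence


def length_summary(lengths: Sequence[int]) -> Dict[str, float]:
    """Mean, median, mode and sample standard deviation of fragment lengths."""
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "mode": statistics.mode(lengths),
        "stdev": statistics.stdev(lengths),  # requires at least two lengths
    }


def agreement(before: Sequence[int], after: Sequence[int]) -> Dict[str, float]:
    """Ratio of the smaller to the larger value of each statistic (1.0 = identical)."""
    b, a = length_summary(before), length_summary(after)
    ratios = {}
    for name in b:
        low, high = sorted((b[name], a[name]))
        ratios[name] = 1.0 if high == 0 else low / high
    return ratios


def distribution_preserved(
    before: Sequence[int],
    after: Sequence[int],
    threshold: float = 0.90,
    stats: Sequence[str] = ("mean", "median"),
) -> bool:
    """True when every selected statistic agrees to at least `threshold`."""
    ratios = agreement(before, after)
    return all(ratios[name] >= threshold for name in stats)
```

Because the text requires only one or more of the statistics to agree, the `stats` argument lets the caller choose which ones to test.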

Rather than preferentially forming self-joined circularization products, one or more adapter oligonucleotides can be used, such that the 5' end and the 3' end of a polynucleotide in the sample are joined by way of one or more intervening adapter oligonucleotides to form a circular polynucleotide. For example, the 5' end of a polynucleotide can be joined to the 3' end of an adapter, and the 5' end of the same adapter can be joined to the 3' end of the same polynucleotide. An adapter oligonucleotide includes any oligonucleotide having a sequence, at least a portion of which is known, that can be joined to a sample polynucleotide. Adapter oligonucleotides can comprise DNA, RNA, nucleotide analogues, non-canonical nucleotides, labeled nucleotides, modified nucleotides, or combinations thereof. Adapter oligonucleotides can be single-stranded, double-stranded, or partial duplex. In general, a partial-duplex adapter comprises one or more single-stranded regions and one or more double-stranded regions. Double-stranded adapters can comprise two separate oligonucleotides hybridized to one another (also referred to as an "oligonucleotide duplex"), and hybridization may leave one or more blunt ends, one or more 3' overhangs, one or more 5' overhangs, one or more bulges resulting from mismatched and/or unpaired nucleotides, or any combination of these. When two hybridized regions of an adapter are separated from one another by a non-hybridized region, a "bubble" structure results. Adapters of different kinds can be used in combination, such as adapters of different sequences. Different adapters can be joined to sample polynucleotides in sequential reactions or simultaneously. In some embodiments, identical adapters are added to both ends of a target polynucleotide. For example, first and second adapters can be added in the same reaction. Adapters can be manipulated prior to combining with sample polynucleotides. For example, terminal phosphates can be added or removed.

Where adapter oligonucleotides are used, the adapter oligonucleotides can contain one or more of a variety of sequence elements, including but not limited to one or more amplification primer annealing sequences or complements thereof, one or more sequencing primer annealing sequences or complements thereof, one or more barcode sequences, one or more common sequences shared among multiple different adapters or subsets of different adapters, one or more restriction enzyme recognition sites, one or more overhangs complementary to one or more target polynucleotide overhangs, one or more probe binding sites (e.g., for attachment to a sequencing platform, such as a flow cell for massively parallel sequencing, such as the flow cells developed by Illumina, Inc.), one or more random or near-random sequences (e.g., one or more nucleotides selected at random from a set of two or more different nucleotides at one or more positions, with each of the different nucleotides selected at the one or more positions represented in a pool of adapters comprising the random sequence), and combinations thereof. In some cases, the adapters may be used to purify those circles that contain the adapters, for example by using beads (particularly magnetic beads, for ease of handling) coated with oligonucleotides comprising a sequence complementary to the adapter. The beads can "capture" closed circles with the correct adapters by hybridization thereto, circles that do not contain the adapters and any unligated components are washed away, and the captured circles are then released from the beads. In addition, in some cases, the complex of the hybridized capture probe and the target circle can be used directly to generate concatemers, such as by direct rolling circle amplification (RCA). In some embodiments, the adapters in the circles can also be used as a sequencing primer. Two or more sequence elements can be non-adjacent to one another (e.g., separated by one or more nucleotides), adjacent to one another, partially overlapping, or completely overlapping. For example, an amplification primer annealing sequence can also serve as a sequencing primer annealing sequence. Sequence elements can be located at or near the 3' end, at or near the 5' end, or in the interior of the adapter oligonucleotide. A sequence element may be of any suitable length, such as about or less than about 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. Adapter oligonucleotides can have any suitable length, at least sufficient to accommodate the one or more sequence elements they comprise. In some embodiments, adapters are about or less than about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200 or more nucleotides in length. In some embodiments, an adapter oligonucleotide is in the range of about 12 to 40 nucleotides in length, such as about 15 to 35 nucleotides in length.

In some embodiments, the adapter oligonucleotides joined to fragmented polynucleotides from one sample comprise one or more sequences common to all adapter oligonucleotides and a barcode that is unique to the adapters joined to polynucleotides of that particular sample, such that the barcode sequence can be used to distinguish polynucleotides originating from one sample or adapter joining reaction from polynucleotides originating from another sample or adapter joining reaction. In some embodiments, an adapter oligonucleotide comprises a 5' overhang, a 3' overhang, or both, complementary to one or more target polynucleotide overhangs. Complementary overhangs can be one or more nucleotides in length, including but not limited to 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more nucleotides in length. Complementary overhangs may comprise a fixed sequence. Complementary overhangs of an adapter oligonucleotide may comprise a random sequence of one or more nucleotides, such that one or more nucleotides are selected at random from a set of two or more different nucleotides at one or more positions, with each of the different nucleotides selected at one or more positions represented in a pool of adapters having complementary overhangs comprising the random sequence. In some embodiments, an adapter overhang is complementary to a target polynucleotide overhang produced by restriction endonuclease digestion. In some embodiments, an adapter overhang consists of an adenine or a thymine.
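The sample-specific barcode described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each read begins with a fixed-length barcode immediately followed by the common adapter sequence; the sequences, sample names, and the function name `demultiplex` are hypothetical and are not taken from this disclosure.

```python
from collections import defaultdict

# Hypothetical example values: a shared adapter sequence and per-sample barcodes.
COMMON = "ACGTAC"
BARCODES = {"AAGG": "sample_1", "CCTT": "sample_2"}


def demultiplex(reads, barcodes=BARCODES, common=COMMON):
    """Group reads by sample, using a barcode that directly precedes the common sequence."""
    by_sample = defaultdict(list)
    barcode_len = len(next(iter(barcodes)))  # assumes all barcodes share one length
    for read in reads:
        barcode, rest = read[:barcode_len], read[barcode_len:]
        if barcode in barcodes and rest.startswith(common):
            # Keep only the insert that follows barcode + common sequence.
            by_sample[barcodes[barcode]].append(rest[len(common):])
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)


reads = ["AAGGACGTACTTTGCA", "CCTTACGTACGGGCAT", "GGGGACGTACAAAA"]
result = demultiplex(reads)
print(result)
```

Exact matching is used here only to keep the sketch short; a practical implementation would usually tolerate sequencing errors in the barcode.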

A variety of methods for circularizing polynucleotides are available. In some embodiments, circularization comprises an enzymatic reaction, such as use of a ligase (e.g., an RNA or DNA ligase). A variety of ligases are available, including, but not limited to, CircLigase™ (Epicentre; Madison, WI), RNA ligase, and T4 RNA ligase 1 (ssRNA ligase, which acts on both DNA and RNA). In addition, T4 DNA ligase can also ligate ssDNA if no dsDNA template is present, although this is generally a slow reaction. Other non-limiting examples of ligases include: NAD-dependent ligases, including Taq DNA ligase, Thermus filiformis DNA ligase, Escherichia coli DNA ligase, Tth DNA ligase, Thermus scotoductus DNA ligase (I and II), thermostable ligase, Ampligase thermostable DNA ligase, VanC-type ligase, 9°N DNA ligase, Tsp DNA ligase, and novel ligases discovered by bioprospecting; ATP-dependent ligases, including T4 RNA ligase, T4 DNA ligase, T3 DNA ligase, T7 DNA ligase, Pfu DNA ligase, DNA ligase 1, DNA ligase III, DNA ligase IV, and novel ligases discovered by bioprospecting; and wild-type, mutant isoforms, and genetically engineered variants thereof. Where self-joining is desired, the concentrations of polynucleotides and enzyme can be adjusted to favor the formation of intramolecular circles rather than intermolecular structures. Reaction temperatures and times can be adjusted as well. In some embodiments, 60°C is used to promote the formation of intramolecular circles. In some embodiments, the reaction time is 12-16 hours. Reaction conditions may be those specified by the manufacturer of the selected enzyme. In some embodiments, an exonuclease step can be included to digest any unligated nucleic acids after the circularization reaction. That is, closed circles do not contain a free 5' or 3' end, so the introduction of a 5' or 3' exonuclease will not digest the closed circles but will digest the unligated components. This may find particular use in multiplex systems.

In general, joining the ends of a polynucleotide to one another to form a circular polynucleotide (either directly, or by way of one or more intermediate adapter oligonucleotides) produces a junction having a junction sequence. Where the 5' end and 3' end of a polynucleotide are joined via an adapter polynucleotide, the term "junction" can refer to a junction between the polynucleotide and the adapter (e.g., one of the 5' end junction or the 3' end junction), or to the junction between the 5' end and the 3' end of the polynucleotide as formed by, and including, the adapter polynucleotide. Where the 5' end and the 3' end of a polynucleotide are joined without an intervening adapter (e.g., the 5' end and 3' end of a single-stranded DNA), the term "junction" refers to the point at which these two ends are joined. A junction may be identified by the sequence of nucleotides comprising the junction (also referred to as the "junction sequence"). In some embodiments, samples comprise polynucleotides having a mixture of ends formed by natural degradation processes (such as cell lysis, cell death, and other processes by which DNA is released from a cell into its surrounding environment, where it may be further degraded, as in cell-free polynucleotides), fragmentation that is a byproduct of sample processing (such as fixing, staining, and/or storage procedures), and fragmentation by methods that cleave DNA without restriction to specific target sequences (e.g., mechanical fragmentation, such as sonication; or non-sequence-specific nuclease treatment, such as DNase I or fragmentase). Where samples comprise polynucleotides having a mixture of ends, the likelihood that two polynucleotides will have the same 5' end or the same 3' end is low, and the likelihood that two polynucleotides will independently have both the same 5' end and the same 3' end is extremely low. Accordingly, in some embodiments, junctions can be used to distinguish different polynucleotides, even where the two polynucleotides comprise a portion having the same target sequence. Where polynucleotide ends are joined without an intervening adapter, a junction sequence can be identified by alignment to a reference sequence. For example, where the order of two component sequences appears to be reversed with respect to the reference sequence, the point at which the reversal appears to occur may indicate a junction at that point. Where polynucleotide ends are joined via one or more adapter sequences, a junction can be identified by proximity to the known adapter sequence, or by alignment as above if a sequencing read is long enough to obtain sequence from both the 5' and 3' ends of the circularized polynucleotide. In some embodiments, the formation of a particular junction is a sufficiently rare event that it is unique among the circularized polynucleotides of a sample.
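The alignment-based identification described above, in which two parts of a read appear in reversed order relative to the reference, can be sketched as follows. This is a simplified illustration that assumes error-free reads, a single strand, and exact substring matching in place of a real aligner; the function name `find_junction` and the example sequences are hypothetical.

```python
def find_junction(read, reference, min_part=4):
    """Return (fragment_start, fragment_end) on the reference if the read looks like
    two reference segments joined in reversed order, otherwise None."""
    for split in range(min_part, len(read) - min_part + 1):
        left, right = read[:split], read[split:]
        left_pos = reference.find(left)
        right_pos = reference.find(right)
        if left_pos == -1 or right_pos == -1:
            continue
        # In a read across the point where the ends were joined, the left part ends
        # at the fragment's 3' end and the right part starts at its 5' end, so the
        # right part maps upstream of the left part on the reference.
        if right_pos < left_pos:
            return right_pos, left_pos + split
    return None


reference = "TTGACCATGCAAGGCTTAGCCATGA"
fragment = reference[5:17]            # a fragment with arbitrary ends
read = fragment[7:] + fragment[:7]    # a read that starts inside the circle

print(find_junction(read, reference))      # (5, 17): the fragment's two ends
print(find_junction(fragment, reference))  # None: no reversal in a linear read
```

Because the returned pair records both ends of the original fragment, two reads that cover the same target position but yield different pairs would, under these assumptions, be attributed to different circularized polynucleotides.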

Fig. 4 shows three non-limiting examples of methods of circularizing polynucleotides. In the top scheme, polynucleotides are circularized in the absence of adapters; the middle scheme depicts the use of an adapter; and the bottom scheme uses two adapters. Where two adapters are used, one can be joined to the 5' end of the polynucleotide while the second adapter can be joined to the 3' end of the same polynucleotide. In some embodiments, adapter ligation may comprise the use of two different adapters along with a "splint" nucleic acid that is complementary to both adapters to facilitate ligation. Forked or "Y"-shaped adapters may also be used. Where two adapters are used, polynucleotides having the same adapter at both ends can be removed in subsequent steps owing to self-annealing.

Fig. 6 shows further non-limiting examples of methods of circularizing polynucleotides, such as single-stranded DNA. The adapter can be added asymmetrically to either the 5' end or the 3' end of a polynucleotide. As shown in Fig. 6A, the single-stranded DNA (ssDNA) has a free hydroxyl group at the 3' end, and the adapter has a blocked 3' end, such that in the presence of a ligase the preferred reaction joins the 3' end of the ssDNA to the 5' end of the adapter. In this embodiment, it can be useful to use agents such as polyethylene glycol (PEG) to drive the intermolecular ligation of a single ssDNA fragment and a single adapter, prior to the intramolecular ligation that forms the circle. The reverse order of ends can also be used (blocked 3', free 5', etc.). Once the linear ligation is accomplished, the ligated fragments can be treated with an enzyme to remove the blocking moiety, such as through the use of a kinase or other suitable enzymes or chemistries. Once the blocking moiety is removed, the addition of a circularization enzyme, such as CircLigase, allows an intramolecular reaction to form the circularized polynucleotide. As shown in Fig. 6B, by using a double-stranded adapter in which one strand has a blocked 5' or 3' end, a double-stranded structure can be formed, which upon ligation produces a double-stranded fragment with nicks. The two strands can then be separated, the blocking moiety removed, and the single-stranded fragment circularized to form a circularized polynucleotide.

In some embodiments, molecular clamps are used to bring the two ends of a polynucleotide (e.g., a single-stranded DNA) together in order to enhance the rate of intramolecular circularization. Fig. 5 shows an example illustration of one such process. This can be done with or without adapters. The use of molecular clamps may be particularly useful where the average polynucleotide fragment is greater than about 100 nucleotides in length. In some embodiments, the molecular clamp probe comprises three domains: a first domain, an intervening domain, and a second domain. The first and second domains hybridize to corresponding sequences in a target polynucleotide via sequence complementarity. The intervening domain of the molecular clamp probe does not significantly hybridize with the target sequence. Hybridization of the clamp to the target polynucleotide thus brings the two ends of the target sequence into closer proximity, which facilitates intramolecular circularization of the target sequence in the presence of a circularization enzyme. In some embodiments, this is additionally useful because the molecular clamp can also serve as an amplification primer.

After circularization, reaction products may be purified prior to amplification or sequencing to increase the relative concentration or purity of circularized polynucleotides available to participate in subsequent steps (e.g., by isolation of circular polynucleotides or removal of one or more other molecules in the reaction). For example, a circularization reaction or components thereof may be treated to remove single-stranded (non-circularized) polynucleotides, such as by treatment with an exonuclease. As a further example, a circularization reaction or a portion thereof may be subjected to size exclusion chromatography, whereby small reagents (e.g., unreacted adapters) are retained and discarded, or circularization products are retained and released in a separate volume. A variety of kits for cleaning up ligation reactions are available, such as the Zymo oligo purification kits made by Zymo Research. In some embodiments, purification comprises treatment to remove or degrade the ligase used in the circularization reaction, and/or to purify circularized polynucleotides away from such ligase. In some embodiments, treatment to degrade the ligase comprises treatment with a protease, such as proteinase K. Proteinase K treatment may follow the manufacturer's protocol or standard protocols (e.g., as provided in Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012)). Protease treatment may also be followed by extraction and precipitation. In one example, circularized polynucleotides are purified by proteinase K (Qiagen) treatment in the presence of 0.1% SDS and 20 mM EDTA, extraction with 1:1 phenol/chloroform and with chloroform, and precipitation with ethanol or isopropanol. In some embodiments, precipitation is in ethanol.

Circularized polynucleotides can be sequenced directly after circularization. Alternatively, sequencing may be preceded by one or more amplification reactions. In general, "amplification" refers to a process by which one or more copies are made of a target polynucleotide or a portion thereof. A variety of methods of amplifying polynucleotides (e.g., DNA and/or RNA) are available. Amplification may be linear, exponential, or involve both linear and exponential phases in a multi-phase amplification process. Amplification methods may involve changes in temperature, such as a heat denaturation step, or may be isothermal processes that do not require heat denaturation. The polymerase chain reaction (PCR) uses multiple cycles of denaturation, annealing of primer pairs to opposite strands, and primer extension to exponentially increase the copy number of the target sequence. Denaturation of annealed nucleic acid strands may be achieved by the application of heat, by increasing local metal ion concentrations (e.g., U.S. Patent No. 6,277,605), by ultrasound radiation (e.g., WO/2000/049176), by application of voltage (e.g., U.S. Patent No. 5,527,670, U.S. Patent No. 6,033,850, U.S. Patent No. 5,939,291, and U.S. Patent No. 6,333,157), or by application of an electromagnetic field in combination with primers bound to a magnetically responsive material (e.g., U.S. Patent No. 5,545,540). In a variation called RT-PCR, reverse transcriptase (RT) is used to make complementary DNA (cDNA) from RNA, and the cDNA is then amplified by PCR to produce multiple copies of DNA (e.g., U.S. Patent No. 5,322,770 and U.S. Patent No. 5,310,652). One example of an isothermal amplification method is strand displacement amplification, commonly referred to as SDA, which uses cycles of annealing pairs of primer sequences to opposite strands of a target sequence, primer extension in the presence of a dNTP to produce a duplex hemiphosphorothioated primer extension product, endonuclease-mediated nicking of a hemimodified restriction endonuclease recognition site, and polymerase-mediated primer extension from the 3' end of the nick to displace an existing strand and produce a strand for the next round of primer annealing, nicking, and strand displacement, resulting in geometric amplification of product (e.g., U.S. Patent No. 5,270,184 and U.S. Patent No. 5,455,166). Thermophilic SDA (tSDA) uses thermophilic endonucleases and polymerases at higher temperatures in essentially the same method (European Patent No. 0684315). Other amplification methods include rolling circle amplification (RCA) (e.g., Lizardi, "Rolling Circle Replication Reporter Systems," U.S. Patent No. 5,854,033); helicase-dependent amplification (HDA) (e.g., Kong et al., "Helicase Dependent Amplification Nucleic Acids," U.S. Patent Application Publication No. US 2004-0058378 A1); and loop-mediated isothermal amplification (LAMP) (e.g., Notomi et al., "Process for Synthesizing Nucleic Acid," U.S. Patent No. 6,410,278). In some cases, isothermal amplification utilizes transcription by an RNA polymerase from a promoter sequence, such as may be incorporated into an oligonucleotide primer. Transcription-based amplification methods include nucleic acid sequence-based amplification, also referred to as NASBA (e.g., U.S. Patent No. 5,130,238); methods that rely on the use of an RNA replicase, commonly referred to as Qβ replicase, to amplify the probe molecule itself (e.g., Lizardi, P. et al. (1988) BioTechnol. 6, 1197-1202); self-sustained sequence replication (e.g., Guatelli, J. et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874-1878; Landgren (1993) Trends in Genetics 9, 199-202; and Helen H. Lee et al., Nucleic Acid Amplification Technologies (1997)); and methods for generating additional transcription templates (e.g., U.S. Patent No. 5,480,784 and U.S. Patent No. 5,399,491). Further methods of isothermal nucleic acid amplification include the use of primers containing non-canonical nucleotides (e.g., uracil or RNA nucleotides) in combination with an enzyme that cleaves nucleic acids at the non-canonical nucleotides (e.g., DNA glycosylase or RNaseH) to expose binding sites for additional primers (e.g., U.S. Patent No. 6,251,639, U.S. Patent No. 6,946,251, and U.S. Patent No. 7,824,890). Isothermal amplification processes can be linear or exponential.

In some embodiments, amplification comprises rolling circle amplification (RCA). A typical RCA reaction mixture comprises one or more primers, a polymerase, and dNTPs, and produces concatemers. Typically, the polymerase in an RCA reaction is a polymerase having strand-displacement activity. A variety of such polymerases are available, non-limiting examples of which include exonuclease-minus DNA polymerase I large (Klenow) fragment, Phi29 DNA polymerase, Taq DNA polymerase, and the like. In general, a concatemer is a polynucleotide amplification product comprising two or more copies of a target sequence from a template polynucleotide (e.g., about or more than about 2, 3, 4, 5, 6, 7, 8, 9, 10, or more copies of the target sequence; in some embodiments, about or more than about 2 copies). Amplification primers may be of any suitable length, such as about or at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, or more nucleotides, any portion or all of which may be complementary to the corresponding target sequence to which the primer hybridizes (e.g., about or at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or more nucleotides). Fig. 7 depicts three non-limiting examples of suitable primers. Fig. 7A shows the use of a target-specific primer without adapters, which can be used to detect the presence or absence of a sequence variant within specific target sequences. In some embodiments, multiple target-specific primers for a plurality of targets are used in the same reaction. For example, target-specific primers for about or at least about 10, 50, 100, 150, 200, 250, 300, 400, 500, 1000, 2500, 5000, 10000, 15000, or more different target sequences may be used in one amplification reaction in order to amplify a corresponding number of target sequences (if present) in parallel. Multiple target sequences may correspond to different portions of the same gene, different genes, or non-gene sequences. Where multiple primers target multiple target sequences in a single gene, the primers may be spaced along the gene sequence (e.g., spaced apart by about or at least about 50 nucleotides, every 50-150 nucleotides, or every 50-100 nucleotides) in order to cover all or a specified portion of the target gene. Fig. 7C shows the use of a primer that hybridizes to an adapter sequence (which in some cases may be an adapter oligonucleotide itself).
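Because a concatemer contains two or more copies of the same template sequence, a read from a concatemer is periodic. The following minimal sketch finds the length of the repeated unit in such a read; it assumes an error-free read and uses exact comparison, and the function name `repeat_period` and the example sequence are hypothetical.

```python
def repeat_period(read, min_copies=2):
    """Return the smallest period p such that read[i] == read[i + p] for every i,
    considering only periods that allow at least `min_copies` copies; else None."""
    n = len(read)
    for period in range(1, n // min_copies + 1):
        if all(read[i] == read[i + period] for i in range(n - period)):
            return period
    return None


unit = "ACGTTGCA"                       # hypothetical circular template, 8 nt
concatemer_read = unit * 3              # three full copies
partial_read = unit * 2 + "ACG"         # two copies plus a partial third

print(repeat_period(concatemer_read))   # 8
print(repeat_period(partial_read))      # 8
print(repeat_period(unit))              # None: a single copy is not periodic
```

With real sequencing data, copies differ at error positions, so a practical approach would align the copies to one another rather than require exact identity.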

Fig. 7B shows an example of amplification by random primers. In general, a random primer comprises one or more random or near-random sequences (e.g., one or more nucleotides selected at random from a set of two or more different nucleotides at one or more positions, with each of the different nucleotides selected at one or more positions represented in a pool of adapters comprising the random sequence). In this way, polynucleotides (e.g., all or substantially all circularized polynucleotides) can be amplified in a sequence non-specific fashion. Such procedures may be referred to as "whole genome amplification" (WGA); however, typical WGA protocols (which do not involve a circularization step) do not efficiently amplify short polynucleotides, such as the polynucleotide fragments contemplated by the present invention. For further illustrative discussion of WGA procedures, see for example Li et al. (2006) J Mol. Diagn. 8(1):22-30.
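The notion of a pool in which every allowed nucleotide is represented at each random position can be made concrete by enumerating the pool. In this sketch the degenerate-base codes (N, R, Y) follow the usual IUPAC convention, while the patterns and the primer length of six are assumptions chosen only for illustration.

```python
from itertools import product

# Subset of IUPAC nucleotide codes: each symbol maps to the bases it may stand for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "R": "AG", "Y": "CT"}


def expand(pattern):
    """List every sequence represented by a pattern with fixed and random positions."""
    return ["".join(bases) for bases in product(*(IUPAC[symbol] for symbol in pattern))]


print(expand("ACNR"))          # 4 x 2 = 8 sequences sharing the fixed prefix "AC"
print(len(expand("N" * 6)))    # 4096 fully random hexamers
```

The pool size grows as the product of the number of choices at each position, which is why fully random sequences of even modest length represent thousands of distinct oligonucleotides.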

Where circularized polynucleotides are amplified prior to sequencing, the amplified products may be sequenced directly without enrichment, or after one or more enrichment steps. Enrichment may comprise purifying one or more reaction components, such as by retention of amplification products or removal of one or more reagents. For example, amplification products may be purified by hybridization to a plurality of probes attached to a substrate, followed by release of the captured polynucleotides, such as by a washing step. Alternatively, amplification products can be labeled with one member of a binding pair, then bound to the other member of the binding pair attached to a substrate, and washed to release the amplification product. Possible substrates include, but are not limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon™, etc.), polysaccharides, nylon or nitrocellulose, ceramics, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses, plastics, optical fiber bundles, and a variety of other polymers. In some embodiments, the substrate is in the form of a bead or other small, discrete particle, which may be a magnetic or paramagnetic bead to facilitate isolation through application of a magnetic field. In general, "binding pair" refers to one of a first and a second moiety, wherein the first and second moieties have a specific binding affinity for each other. Suitable binding pairs include, but are not limited to, antigens/antibodies (for example, digoxigenin/anti-digoxigenin, dinitrophenyl (DNP)/anti-DNP, dansyl-X/anti-dansyl, fluorescein/anti-fluorescein, lucifer yellow/anti-lucifer yellow, and rhodamine/anti-rhodamine); biotin/avidin (or biotin/streptavidin); calmodulin binding protein (CBP)/calmodulin; hormone/hormone receptor; lectin/carbohydrate; peptide/cell membrane receptor; protein A/antibody; hapten/antihapten; enzyme/cofactor; and enzyme/substrate.

In some embodiments, enrichment following amplification of circularized polynucleotides comprises one or more additional amplification reactions. In some embodiments, enrichment comprises amplifying a target sequence comprising sequence A and sequence B (oriented in a 5' to 3' direction) in an amplification reaction mixture comprising (a) the amplified polynucleotides; (b) a first primer comprising sequence A', wherein the first primer specifically hybridizes to sequence A of the target sequence via sequence complementarity between sequence A and sequence A'; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B', present in a complementary polynucleotide comprising a complement of the target sequence, via sequence complementarity between B and B'; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein the distance between the 5' end of sequence A and the 3' end of sequence B of the target sequence is 75 nt or less. Fig. 10 shows an example arrangement of the first and second primers with respect to a target sequence in the context of a single repeat (which will typically not be amplified unless it is circular) and of concatemers comprising multiple copies of the target sequence. Given the orientation of the primers with respect to a monomer of the target sequence, this arrangement may be referred to as "back-to-back" (B2B) or "inverted" primers. Amplification with B2B primers facilitates enrichment of circular and/or concatemeric amplification products. Moreover, this orientation, combined with a relatively small footprint (the total distance spanned by a pair of primers), permits amplification of a wider variety of fragmentation events around a target sequence, because a junction is less likely to occur between the primers than in the primer arrangement found in a typical amplification reaction (facing one another, spanning a target sequence). In some embodiments, the distance between the 5' end of sequence A and the 3' end of sequence B is about or less than about 200, 150, 100, 75, 50, 40, 30, 25, 20, 15, or fewer nucleotides. In some embodiments, sequence A is the complement of sequence B. In some embodiments, multiple pairs of B2B primers directed to a plurality of different target sequences are used in the same reaction to amplify a plurality of different target sequences in parallel (e.g., about or at least about 10, 50, 100, 150, 200, 250, 300, 400, 500, 1000, 2500, 5000, 10000, 15000, or more different target sequences). Primers can be of any suitable length, such as described elsewhere herein. Amplification may comprise any suitable amplification reaction under appropriate conditions, such as an amplification reaction described herein. In some embodiments, amplification is a polymerase chain reaction.
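The behavior of back-to-back primers on a linear monomer versus a concatemer can be illustrated with a small in-silico sketch. It assumes exact matching on the strand shown, ignores hybridization conditions, and uses hypothetical sequences and function names (`b2b_products`, `footprint`); the first primer carries the reverse complement of sequence A and the second primer carries sequence B, as described above.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(text, pattern):
    """Start positions of every (possibly overlapping) occurrence of pattern."""
    starts, i = [], text.find(pattern)
    while i != -1:
        starts.append(i)
        i = text.find(pattern, i + 1)
    return starts


def b2b_products(template, seq_a, seq_b):
    """Products expected when the second primer (sequence B) extends downstream and
    the first primer (reverse complement of A) extends upstream on the shown strand.
    A product requires an occurrence of A located downstream of an occurrence of B."""
    return [template[b:a + len(seq_a)]
            for b in find_all(template, seq_b)
            for a in find_all(template, seq_a)
            if a >= b + len(seq_b)]


def footprint(monomer, seq_a, seq_b):
    """Distance from the 5' end of sequence A to the 3' end of sequence B."""
    return monomer.find(seq_b) + len(seq_b) - monomer.find(seq_a)


SEQ_A, SEQ_B = "ACGTAC", "TGCATG"                  # hypothetical sequence elements
monomer = "GG" + SEQ_A + "TT" + SEQ_B + "CC"       # A lies 5' of B in one copy
concatemer = monomer * 3

print(revcomp(SEQ_A))                              # sequence A' of the first primer
print(footprint(monomer, SEQ_A, SEQ_B))            # 14
print(b2b_products(monomer, SEQ_A, SEQ_B))         # []: primers point away from each other
print([len(p) for p in b2b_products(concatemer, SEQ_A, SEQ_B)])  # [16, 34, 16]
```

In this toy model the linear monomer yields no product, whereas the concatemer yields products that run from sequence B in one copy into sequence A in a later copy, which is the enrichment effect the B2B arrangement is intended to provide.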

In some embodiments, B2B primers comprise at least two sequence elements: a first element that hybridizes to a target sequence via sequence complementarity, and a 5' "tail" that does not hybridize to the target sequence during a first amplification phase at a first hybridization temperature, during which the first element hybridizes (e.g., owing to a lack of sequence complementarity between the tail and the portion of the target sequence immediately 3' of where the first element binds). For example, the first primer comprises sequence C located 5' with respect to sequence A', the second primer comprises sequence D located 5' with respect to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of concatemers during a first amplification phase at a first hybridization temperature. In some embodiments in which such tailed primers are used, amplification can comprise a first phase and a second phase. The first phase comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the concatemers (or circularized polynucleotides), and primer extension. The second phase comprises a hybridization step at a second temperature that is higher than the first temperature, during which the first and second primers hybridize to amplification products comprising extended first or second primers, or complements thereof, and primer extension. The higher temperature favors hybridization along both the first element and the tail element of the primer in primer extension products, over the shorter fragments formed by hybridization between only the first element of a primer and an internal target sequence within a concatemer. Accordingly, the two-phase amplification can be used to reduce the extent to which short amplification products might otherwise be favored, thereby maintaining a relatively higher proportion of amplification products having two or more copies of the target sequence. For example, after 5 cycles (e.g., at least 5, 6, 7, 8, 9, 10, 15, 20, or more cycles) of hybridization at the second temperature and primer extension, at least 5% (e.g., at least 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, or more) of the amplified polynucleotides in the reaction mixture comprise two or more copies of the target sequence. An illustration of an embodiment of this two-phase, tailed B2B primer amplification process is shown in Fig. 11.

In some embodiments, enrichment comprises amplification under conditions that are skewed to increase the length of amplicons from concatemers. For example, the primer concentration can be lowered so that not every priming site will hybridize to a primer, making the PCR products longer. Similarly, decreasing the primer hybridization time during the cycles allows fewer primers to hybridize, which also increases the average PCR amplicon size. Furthermore, increasing the temperature and/or extension time of the cycles likewise increases the average length of the PCR amplicons. Any combination of these techniques can be used.

In some embodiments, particularly where amplification with B2B primers has been performed, the amplification products are treated to filter the resulting amplicons on the basis of size, in order to reduce and/or eliminate the monomers in a mixture comprising concatemers. This can be done using a variety of available techniques, including, but not limited to, fragment excision from gels and gel filtration (e.g., to enrich for fragments greater than about 300, 400, 500, or more nucleotides in length), as well as SPRI beads (Agencourt AMPure XP) for size selection by fine-tuning the binding buffer concentration. For example, 0.6x binding buffer can be used during mixing with DNA fragments to preferentially bind DNA fragments larger than about 500 base pairs (bp).
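The effect of a size cutoff on a mixture of monomers and concatemers can be shown with a short sketch. The 500-nucleotide threshold echoes the example above, while the 160-nucleotide monomer, the pool composition, and the function name `size_select` are assumptions made only for illustration; the sketch models an idealized sharp cutoff, which physical size selection does not provide.

```python
def size_select(amplicons, min_length=500):
    """Keep amplicons longer than min_length nucleotides (idealized sharp cutoff)."""
    return [amplicon for amplicon in amplicons if len(amplicon) > min_length]


unit = "ACGT" * 40                                   # hypothetical 160-nt monomer
pool = [unit, unit * 2, unit * 4, unit * 5]          # 160, 320, 640, 800 nt
kept = size_select(pool)

print([len(a) for a in kept])                        # [640, 800]
print([len(a) // len(unit) for a in kept])           # copies per retained amplicon: [4, 5]
```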

In some embodiments, where amplification produces single-stranded concatemers, the single strands are converted to double-stranded constructs either prior to, or as part of, the formation of the sequencing libraries generated for sequencing reactions. A variety of suitable methods for generating a double-stranded construct from a single-stranded nucleic acid are available. Fig. 9 depicts a number of possible methods, although many other methods can be used as well. As shown in Fig. 9A, for example, the use of random primers, polymerase, dNTPs, and a ligase results in double strands. Fig. 9B depicts second-strand synthesis when the concatemer contains adapter sequences, which can be used as the primers in the reaction. Fig. 9C depicts the use of a "loop", in which one terminus of a loop adapter is added to the terminus of the concatemers, the loop adapter having a small section of self-hybridizing nucleic acid. In this case, ligation of the loop adapter produces a self-hybridized loop that serves as the primed template for the polymerase. Fig. 9D shows the use of hyperbranching primers, which is generally most useful where the target sequence is known and in which multiple strands are formed, particularly when a polymerase with strong strand-displacement activity is used.

According to some embodiments, the circularized polynucleotides (or amplification products thereof, optionally enriched) are subjected to a sequencing reaction to generate sequencing reads. Sequencing reads produced by such methods can be used in accordance with other methods disclosed herein. A variety of sequencing methods are available, particularly high-throughput sequencing methods. Examples include, without limitation, sequencing systems manufactured by Illumina, sequencing systems manufactured by Life Technologies (Ion systems and others), the 454 Life Sciences systems of Roche, and Pacific Biosciences systems. In some embodiments, sequencing includes producing reads of about or more than about 50, 75, 100, 125, 150, 175, 200, 250, 300, or more nucleotides in length. In some embodiments, sequencing includes a sequencing-by-synthesis process, in which individual nucleotides are identified iteratively as they are added to the growing primer extension product. Pyrosequencing is one example of a sequencing-by-synthesis process; it identifies the incorporation of a nucleotide by assaying the resulting synthesis mixture for the presence of a by-product of the sequencing reaction, namely pyrophosphate. In particular, a primer/template/polymerase complex is contacted with a single type of nucleotide. If that nucleotide is incorporated, the polymerization reaction cleaves the nucleoside triphosphate between the α and β phosphates of the triphosphate chain, releasing pyrophosphate. The presence of the released pyrophosphate is then identified using a chemiluminescent enzyme reporter system that converts the pyrophosphate, together with AMP, into ATP and then measures the ATP with luciferase to produce a measurable light signal. When light is detected, the base was incorporated; when no light is detected, the base was not incorporated. After appropriate washing steps, the various bases are cyclically contacted with the complex to sequentially identify subsequent bases in the template sequence. See, e.g., U.S. Patent No. 6,210,891.
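The flow-and-detect cycle of pyrosequencing can be illustrated with a toy simulation. This is a minimal sketch, not part of the disclosed method: the function names, the flow order "TACG", and the assumption that the light signal counts every base incorporated during one flow (so that a run of identical bases is read in a single flow) are illustrative simplifications.

```python
def pyrosequence(synthesized_strand, flow_order="TACG", cycles=4):
    """Simulate nucleotide flows over a growing strand.

    synthesized_strand is the sequence the polymerase will build
    (the complement of the template). Each flow offers one nucleotide
    type; the recorded count stands in for the light signal.
    """
    position = 0
    flowgram = []
    for _ in range(cycles):
        for base in flow_order:
            incorporated = 0
            # Incorporate as long as the next base to be added matches the flow.
            while (position < len(synthesized_strand)
                   and synthesized_strand[position] == base):
                incorporated += 1
                position += 1
            flowgram.append((base, incorporated))  # 0 means no light
    return flowgram


def decode(flowgram):
    """Rebuild the synthesized sequence from the per-flow signals."""
    return "".join(base * count for base, count in flowgram)


flowgram = pyrosequence("GATTC")
called = decode(flowgram)
```

Flows with a zero count correspond to the "no light, base not incorporated" case described above.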

In a related sequencing process, the primer/template/polymerase complex is immobilized on a substrate, and the complex is contacted with labeled nucleotides. Immobilization of the complex can be through the primer sequence, the template sequence, and/or the polymerase, and can be covalent or non-covalent. For example, the complex can be immobilized through a linkage between the polymerase or the primer and the substrate surface. In alternative configurations, the nucleotides are provided with or without removable terminator groups. Upon incorporation, the label is coupled to the complex and is therefore detectable. In the case of terminator-bearing nucleotides, all four different nucleotides, each carrying an individually identifiable label, are contacted with the complex. Incorporation of a labeled nucleotide arrests extension because of the terminator and adds the label to the complex, allowing the incorporated nucleotide to be identified. The label and the terminator are then removed from the incorporated nucleotide, and the process is repeated after appropriate washing steps. In the case of non-terminated nucleotides, a single type of labeled nucleotide is added to the complex to determine whether it will be incorporated, as in pyrosequencing. After removal of the labeling group on the nucleotide and appropriate washing steps, the various different nucleotides are cycled through the reaction mixture in the same process. See, e.g., U.S. Patent No. 6,833,246, which is incorporated herein by reference in its entirety for all purposes. For example, the Illumina Genome Analyzer System is based on technology described in WO 98/44151, in which DNA molecules are bound to a sequencing platform (flow cell) via an anchor probe binding site (also referred to as a flow cell binding site) and amplified in situ on a glass slide. The solid surface on which the DNA molecules are amplified typically comprises a plurality of first and second bound oligonucleotides, the first complementary to a sequence at or near one end of a target polynucleotide and the second complementary to a sequence at or near the other end of the target polynucleotide. This arrangement permits bridge amplification, as described for example in US20140121116. The DNA molecules are then annealed to a sequencing primer and sequenced in parallel, base by base, using a reversible terminator approach. Before hybridization of the sequencing primer, one strand of the double-stranded bridge polynucleotide can be cleaved at a cleavage site in one of the bound oligonucleotides anchoring the bridge, leaving one single strand that is not bound to the solid substrate and can be removed by denaturation, while the other strand remains bound and is available for hybridization to the sequencing primer. Typically, the Illumina Genome Analyzer System uses a flow cell with 8 channels and generates sequencing reads 18 to 36 bases in length, producing more than 1.3 Gbp of high-quality data per run (see www.illumina.com).

In another sequencing-by-synthesis process, the incorporation of differently labeled nucleotides is observed in real time as template-dependent synthesis proceeds. Specifically, an individual immobilized primer/template/polymerase complex is observed as fluorescently labeled nucleotides are incorporated, allowing each added base to be identified in real time as it is added. In this process, the labeling group is attached to a portion of the nucleotide that is cleaved off during incorporation. For example, by attaching the labeling group to a portion of the phosphate chain that is removed during incorporation, that is, to a β, γ, or other terminal phosphate group of a nucleoside polyphosphate, the label is not incorporated into the nascent strand, and natural DNA is produced instead. Observation of individual molecules typically involves optically confining the complex within a very small illumination volume. Optically confining the complex creates a monitored region in which randomly diffusing nucleotides are present for only a very short time, whereas incorporated nucleotides are retained within the observation volume for longer as they are incorporated. This produces a characteristic signal associated with the incorporation event, which is further characterized by a signal profile characteristic of the base being added. In a related aspect, interacting label components, such as a fluorescence resonance energy transfer (FRET) dye pair, are provided on the polymerase or another portion of the complex and on the incorporating nucleotide, so that the incorporation event brings the label components within interactive proximity and a characteristic signal results that is again characteristic of the base being incorporated (see, e.g., U.S. Patent Nos. 6,917,726, 7,033,764, 7,052,847, 7,056,676, 7,170,050, 7,361,466, and 7,416,844; and US 20070134128).

In some embodiments, the nucleic acids in the sample can be sequenced by ligation. This method generally uses a DNA ligase to identify the target sequence, as used for example in the polony method and in SOLiD technology (Applied Biosystems, now Invitrogen). In general, a pool of all possible oligonucleotides of a fixed length is provided, labeled according to the sequenced position. The oligonucleotides are annealed and ligated; preferential ligation of matching sequences by the DNA ligase produces a signal corresponding to the complementary sequence at that position.

According to some embodiments, a sequence difference between a sequencing read and a reference sequence is called as a genuine sequence variant (for example, one that was present in the sample before amplification or sequencing and is not a result of either process) if it occurs in at least two different polynucleotides (for example, two different circular polynucleotides, which can be distinguished because they have different junctions). Because a sequence variant that arises from an amplification or sequencing error is unlikely to be duplicated exactly (for example, in position and type) on two different polynucleotides comprising the same target sequence, adding this validation parameter greatly reduces the background of erroneous sequence variants, with an accompanying increase in the sensitivity and accuracy of detecting actual sequence variation in the sample. In some embodiments, a sequence variant having a frequency of about or less than about 5%, 4%, 3%, 2%, 1.5%, 1%, 0.75%, 0.5%, 0.25%, 0.1%, 0.075%, 0.05%, 0.04%, 0.03%, 0.02%, 0.01%, 0.005%, 0.001%, or lower is sufficiently above background to permit an accurate call. In some embodiments, the sequence variant occurs at a frequency of about or less than about 0.1%. In some embodiments, the frequency of a sequence variant is sufficiently above background when it is statistically significantly higher than the background error rate (for example, with a p-value of about or less than about 0.05, 0.01, 0.001, 0.0001, or lower). In some embodiments, the frequency of a sequence variant is sufficiently above background when it is about or at least about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 25-fold, 50-fold, 100-fold, or more above the background error rate (for example, at least 5-fold higher). In some embodiments, the background error rate in accurately determining the sequence at a given position is about or less than about 1%, 0.5%, 0.1%, 0.05%, 0.01%, 0.005%, 0.001%, 0.0005%, or lower. In some embodiments, the error rate is lower than 0.001%.
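The validation rule above, that a sequence difference counts only when it is seen on at least two polynucleotides with different junctions, can be sketched as follows. The data layout is an assumption made for illustration: each observation pairs a variant, written as (position, reference base, alternate base), with a junction written as the (start, end) reference coordinates of the fragment that was circularized. The function name and the example coordinates are hypothetical.

```python
from collections import defaultdict


def confirm_variants(observations, min_junctions=2):
    """Return variants observed on at least `min_junctions` distinct junctions.

    observations: iterable of (variant, junction) pairs, where
      variant  = (position, ref_base, alt_base)
      junction = (fragment_start, fragment_end) on the reference
    """
    junctions_by_variant = defaultdict(set)
    for variant, junction in observations:
        junctions_by_variant[variant].add(junction)
    return {
        variant
        for variant, junctions in junctions_by_variant.items()
        if len(junctions) >= min_junctions
    }


observations = [
    ((1042, "C", "T"), (980, 1130)),   # molecule A
    ((1042, "C", "T"), (1001, 1175)),  # molecule B, different junction
    ((1042, "C", "T"), (980, 1130)),   # molecule A seen again
    ((1077, "G", "A"), (1001, 1175)),  # only ever seen on molecule B
    ((1077, "G", "A"), (1001, 1175)),
]
confirmed = confirm_variants(observations)
```

Here the difference at position 1042 is confirmed because it appears with two different junctions, while the difference at 1077 is not, however many reads of the same molecule carry it.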

In some embodiments, identifying a genuine sequence variant (also referred to as "calling" or "making a call") includes optimally aligning one or more sequencing reads with a reference sequence to identify differences between the two, as well as identifying junctions. In general, alignment involves placing one sequence along another, iteratively introducing gaps along each sequence, scoring how well the two sequences match, and preferably repeating this for each position along the reference sequence. The best-scoring match is deemed the alignment and represents an inference about the degree of relationship between the sequences. In some embodiments, the reference sequence to which sequencing reads are compared is a reference genome, such as the genome of a member of the same species as the subject. A reference genome can be complete or incomplete. In some embodiments, a reference genome consists only of regions containing target polynucleotides, such as regions derived from a reference genome or from a consensus generated from the sequencing reads under analysis. In some embodiments, a reference sequence comprises or consists of polynucleotide sequences of one or more organisms, such as sequences from one or more bacteria, archaea, viruses, protists, fungi, or other organisms. In some embodiments, the reference sequence consists of only a portion of a reference genome, such as regions corresponding to one or more target sequences under analysis (for example, one or more genes, or portions thereof). For example, for detection of a pathogen (such as in the case of detecting contamination), the reference genome is the entire genome of the pathogen (for example, HIV, HPV, or a harmful bacterial strain such as E. coli), or a portion of it that is useful for identification, such as of a particular strain or serotype. In some embodiments, sequencing reads are aligned to multiple different reference sequences, for example to screen for multiple different organisms or strains.

In a typical alignment, a base in a sequencing read positioned alongside a non-matching base in the reference indicates that a substitution mutation has occurred at that point. Similarly, where one sequence includes a gap alongside a base in the other sequence, an insertion or deletion mutation (an "indel") is inferred to have occurred. When one sequence is to be aligned with one other sequence, the alignment is sometimes called a pairwise alignment. Multiple sequence alignment generally refers to the alignment of two or more sequences, including, for example, by a series of pairwise alignments. In some embodiments, scoring an alignment involves setting values for the probabilities of substitutions and indels. When individual bases are aligned, a match or mismatch contributes to the alignment score according to the substitution probability, which could be, for example, 1 for a match and 0.33 for a mismatch. An indel deducts a gap penalty from the alignment score, which could be, for example, -1. Gap penalties and substitution probabilities can be based on empirical knowledge or on a priori assumptions about how sequences mutate. Their values affect the resulting alignment. Examples of algorithms for performing alignments include, without limitation, the Smith-Waterman (SW) algorithm, the Needleman-Wunsch (NW) algorithm, algorithms based on the Burrows-Wheeler transform (BWT), and hash-function aligners such as Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). One exemplary alignment program, which implements a BWT approach, is the Burrows-Wheeler Aligner (BWA), available from the SourceForge website maintained by Geeknet (Fairfax, Va.). BWT typically occupies 2 bits of memory per nucleotide, making it possible to index nucleotide sequences as long as 4G base pairs with a typical desktop or laptop computer. The pre-processing includes construction of the BWT (for example, indexing of the reference sequence) and of the supporting auxiliary data structures. BWA includes two different algorithms, both based on BWT. Alignment by BWA can proceed using the bwa-short algorithm, which is designed for short queries of up to about 200 bp with a low error rate (<3%) (Li H. and Durbin R., Bioinformatics, 25:1754-60 (2009)). The second algorithm, BWA-SW, is designed for long reads with more errors (Li H. and Durbin R. (2010), Fast and accurate long-read alignment with Burrows-Wheeler Transform, Bioinformatics, Epub.). The bwa-sw aligner is sometimes referred to as "bwa-long," "bwa long algorithm," or a similar name. One alignment program that implements a version of the Smith-Waterman algorithm is MUMmer, available from the SourceForge website maintained by Geeknet (Fairfax, Va.). MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form (Kurtz, S. et al., Genome Biology, 5:R12 (2004); Delcher, A.L. et al., Nucl. Acids Res., 27:11 (1999)). For example, MUMmer 3.0 can find all 20-base-pair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the hundreds or thousands of contigs from a shotgun sequencing project and align them to another set of contigs or to a genome using the NUCmer program included with the system. Other non-limiting examples of alignment programs include: BLAT from Kent Informatics (Santa Cruz, Calif.) (Kent, W.J., Genome Research 4:656-664 (2002)); SOAP2, from Beijing Genomics Institute (Beijing, China) or BGI Americas Corporation (Cambridge, Mass.); Bowtie (Langmead et al., Genome Biology, 10:R25 (2009)); the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) or the ELANDv2 component of the Consensus Assessment of Sequence and Variation (CASAVA) software (Illumina, San Diego, Calif.); RTG Investigator from Real Time Genomics, Inc. (San Francisco, Calif.); Novoalign from Novocraft (Selangor, Malaysia); Exonerate, European Bioinformatics Institute (Hinxton, UK) (Slater, G. and Birney, E., BMC Bioinformatics 6:31 (2005)); Clustal Omega, from University College Dublin (Dublin, Ireland) (Sievers F. et al., Mol Syst Biol 7, article 539 (2011)); ClustalW or ClustalX from University College Dublin (Dublin, Ireland) (Larkin M.A. et al., Bioinformatics, 23, 2947-2948 (2007)); and FASTA, European Bioinformatics Institute (Hinxton, UK) (Pearson W.R. et al., PNAS 85(8):2444-8 (1988); Lipman, D.J., Science 227(4693):1435-41 (1985)).
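The scoring scheme mentioned above (1 for a match, 0.33 for a mismatch, and a gap penalty of -1) can be plugged into a textbook Needleman-Wunsch global alignment. The following is a minimal sketch for illustration only; it is not the implementation used by any of the aligners named above, and the function name and tie-breaking order (diagonal, then gap in the second sequence, then gap in the first) are assumptions.

```python
def needleman_wunsch(a, b, match=1.0, mismatch=0.33, gap=-1.0):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]  # traceback pointers
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + substitution, "diag"),
                (score[i - 1][j] + gap, "up"),    # gap in b
                (score[i][j - 1] + gap, "left"),  # gap in a
            ]
            # max() keeps the first of equal scores, so ties prefer "diag".
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])
    # Walk the pointers back from the bottom-right corner.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            aligned_a.append(a[i - 1]); aligned_b.append(b[j - 1]); i -= 1; j -= 1
        elif step == "up":
            aligned_a.append(a[i - 1]); aligned_b.append("-"); i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


total, ref_row, read_row = needleman_wunsch("ACGT", "AGT")
```

In this example the read "AGT" aligns to the reference "ACGT" as "A-GT", which is the kind of gap from which a deletion would be inferred.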

In general, the sequencing data are obtained from a massively parallel sequencing reaction. Many next-generation high-throughput sequencing systems export data as FASTQ files, although other formats can be used. In some embodiments, sequences are analyzed, typically by sequence alignment, to identify the repeat unit length (for example, the monomer length), the junction formed by circularization, and any genuine variation relative to a reference sequence. Identifying the repeat unit length can include computing the regions of the repeat units, finding the reference locus of the sequences (for example, when one or more sequences are specifically targeted for amplification, enrichment, and/or sequencing), the boundaries of each repeat region, and/or the number of repeats within each sequencing run. Sequence analysis can include analyzing sequence data for both strands of a duplex. As noted above, in some embodiments, an identical variant that appears in the sequences of reads from different polynucleotides of the sample (for example, circularized polynucleotides having different junctions) is considered a confirmed variant. In some embodiments, a sequence variant is also considered a confirmed, or true-positive, variant if it occurs in more than one repeat unit of the same polynucleotide, because the same sequence variation is likewise unlikely to arise by error at the same position of a repeated target sequence within the same concatemer. Sequence quality scores can be taken into account when identifying variants and confirmed variants; for example, sequences and bases with quality scores below a threshold can be filtered out. Other bioinformatics methods can be used to further increase the sensitivity and specificity of variant calls.
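One simple way to estimate the repeat unit length of a concatemer read, and to collapse its repeats into a consensus, is sketched below. This is an illustrative assumption rather than the analysis actually used: it compares the read with shifted copies of itself instead of aligning it to a reference, ignores indels, and uses hypothetical function names and thresholds.

```python
from collections import Counter


def repeat_unit_length(read, min_len=5, min_identity=0.8):
    """Smallest shift p at which the read largely matches itself."""
    for p in range(min_len, len(read) // 2 + 1):
        pairs = len(read) - p
        same = sum(read[i] == read[i + p] for i in range(pairs))
        if same / pairs >= min_identity:
            return p
    return None


def consensus_unit(read, p):
    """Majority base at each phase of the repeat."""
    return "".join(
        Counter(read[phase::p]).most_common(1)[0][0] for phase in range(p)
    )


unit = "ACGTTGCAAT"
read = unit * 3
read = read[:13] + "A" + read[14:]  # one simulated sequencing error in copy 2

period = repeat_unit_length(read)
consensus = consensus_unit(read, period)
```

Because the simulated error appears in only one of the three repeat units, the majority vote removes it; a difference present in every repeat unit would survive into the consensus, consistent with the confirmation criterion described above.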

In some embodiments, statistical analyses can be used to determine variants (mutations) and to quantify the proportion of a variant in the total DNA sample. The sequencing data can be used to compute an overall measurement for a particular base. For example, from the alignment results computed in the preceding steps, the number of "effective reads," that is, the number of confirmed reads for each locus, can be calculated. The allele frequency of a variant can be normalized by the effective read count for the locus. The overall noise level, which is the average rate of observed variants across all loci, can be computed. The frequency of a variant and the overall noise level, combined with other factors, can be used to determine the confidence interval of a variant call. Statistical models such as the Poisson distribution can be used to assess the confidence interval of a variant call. The allele frequency of a variant can also be used as an indicator of the relative amount of the variant in the total sample.
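As a rough illustration of how a Poisson model might be applied, the sketch below normalizes the variant count by the effective read count and computes a one-sided tail probability of seeing at least that many variant reads from background noise alone. The use of a tail probability as a stand-in for the confidence assessment, the threshold `alpha`, and all names and numbers are assumptions made for this example.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def assess_variant(variant_reads, effective_reads, noise_rate, alpha=0.01):
    """Return (allele_frequency, p_value, called)."""
    frequency = variant_reads / effective_reads
    expected_noise = effective_reads * noise_rate
    p_value = poisson_tail(variant_reads, expected_noise)
    return frequency, p_value, p_value < alpha


# 12 variant reads among 10,000 effective reads, with a noise level of 0.01%.
frequency, p_value, called = assess_variant(12, 10_000, 1e-4)
```

With these illustrative numbers the variant sits at 0.12%, twelve times the assumed noise level, and the tail probability is far below the threshold.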

In some embodiments, a microbial contaminant is identified based on the calling step. For example, a particular sequence variant may indicate contamination by a potentially infectious microorganism. To identify a microorganism, sequence variants can be identified within highly conserved polynucleotides. Exemplary highly conserved polynucleotides useful for the phylogenetic characterization and identification of microorganisms include nucleotide sequences found in the 16S rRNA gene, 23S rRNA gene, 5S rRNA gene, 5.8S rRNA gene, 12S rRNA gene, 18S rRNA gene, 28S rRNA gene, gyrB gene, rpoB gene, fusA gene, recA gene, cox1 gene, and nifD gene. For eukaryotes, the rRNA gene can be nuclear, mitochondrial, or both. In some embodiments, sequence variants in the 16S-23S rRNA gene internal transcribed spacer (ITS) can be used to differentiate and identify closely related taxa, with or without the use of other rRNA genes. Because of structural constraints of the 16S rRNA, specific regions throughout the gene have a highly conserved polynucleotide sequence, although non-structural segments may have a high degree of variability. Identified sequence variants can be used to identify operational taxonomic units (OTUs) that represent a subgenus, genus, subfamily, family, suborder, order, subclass, class, subphylum, phylum, sub-kingdom, or kingdom, and optionally to determine their frequency in a population. Detection of particular sequence variants can be used to detect the presence, and optionally the amount (relative or absolute), of a microorganism indicative of contamination. Exemplary applications include water quality testing for fecal or other contamination, testing for animal or human pathogens, pinpointing sources of water contamination, testing reclaimed or recycled water, testing sewage discharge streams including ocean discharge plumes, monitoring aquaculture facilities for pathogens, monitoring beaches, swimming areas, or other water-related recreational facilities, and predicting toxic algal blooms. Food monitoring applications include periodic testing of production lines at food processing plants, surveying slaughterhouses, and inspecting the kitchens and food storage areas of restaurants, hospitals, schools, prisons, and other institutions for food-borne pathogens such as E. coli strains O157:H7 or O111:B4, Listeria monocytogenes, or Salmonella enterica subsp. enterica serovar Enteritidis. Shellfish and the waters in which they live can be tested for algae that cause paralytic shellfish poisoning, neurotoxic shellfish poisoning, diarrhetic shellfish poisoning, and amnesic shellfish poisoning. Additionally, imported foods can be screened while in customs before release to ensure food safety. Plant pathogen monitoring applications include horticulture and nursery monitoring, for instance monitoring for Phytophthora ramorum, the microorganism that causes sudden oak death; crop pathogen surveillance and disease management; and forestry pathogen surveillance and disease management. Manufacturing environments for pharmaceuticals, medical devices, and other consumables or critical components, for which microbial contamination is a major safety concern, can also be surveyed for the presence of specific pathogens such as Pseudomonas aeruginosa or Staphylococcus aureus, for the presence of more common microorganisms associated with humans, for microorganisms associated with the presence of water, or for others that represent the bioburden previously identified in that particular environment or in similar environments. Similarly, the construction and assembly areas for sensitive equipment, including spacecraft, can be monitored for microorganisms previously determined or known to inhabit, or to be most commonly introduced into, such environments.

In one aspect, the invention provides a method of identifying a sequence variant in a nucleic acid sample comprising less than 50 ng of polynucleotides, each polynucleotide having a 5' end and a 3' end. In some embodiments, the method comprises: (a) circularizing individual polynucleotides in the sample with a ligase to form a plurality of circular polynucleotides; (b) upon separating the ligase from the circular polynucleotides, amplifying the circular polynucleotides to form concatemers; (c) sequencing the concatemers to produce a plurality of sequencing reads; (d) identifying sequence differences between the plurality of sequencing reads and a reference sequence; and (e) calling as a sequence variant a sequence difference that occurs at a frequency of 0.05% or higher in the plurality of reads obtained from the nucleic acid sample of less than 50 ng of polynucleotides.

The starting amount of polynucleotides in the sample can be small. In some embodiments, the amount of starting polynucleotides is less than 50 ng, such as less than 45 ng, 40 ng, 35 ng, 30 ng, 25 ng, 20 ng, 15 ng, 10 ng, 5 ng, 4 ng, 3 ng, 2 ng, 1 ng, 0.5 ng, 0.1 ng, or less. In some embodiments, the amount of starting polynucleotides is in the range of 0.1-100 ng, such as 1-75 ng, 5-50 ng, or 10-20 ng. In general, a lower amount of starting material increases the importance of increasing recovery at each processing step. Processes that reduce the amount of polynucleotides in the sample available to participate in subsequent reactions reduce the sensitivity with which rare mutations can be detected. For example, the method described by Lou et al. (PNAS, 2013, 110(49)) is expected to recover only 10-20% of the starting material. For large amounts of starting material (for example, as purified from laboratory-cultured bacteria), this may not be a substantial obstacle. However, for samples in which the starting material is substantially lower, recovery in this low range can be a substantial obstacle to detecting sufficiently rare variants. Accordingly, in some embodiments, sample recovery from one step to another in a method of the invention (for example, the mass fraction of the input into the circularization step that is available for input into a subsequent amplification or sequencing step) is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, or more. Recovery from a particular step may be close to 100%. Recovery may be with respect to a particular form, such as recovery of circular polynucleotides from an input of non-circular polynucleotides.
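A back-of-the-envelope calculation shows why recovery matters at low input. The sketch below assumes human DNA at roughly 3.3 pg per haploid genome copy, a figure not given in this document, and uses illustrative values for the input mass, the variant frequency, and the two recovery rates being compared.

```python
HAPLOID_GENOME_PG = 3.3  # assumed mass of one haploid human genome copy, in pg


def expected_variant_copies(input_ng, variant_frequency, recovery):
    """Expected number of variant-bearing genome copies that survive processing."""
    genome_copies = input_ng * 1000.0 / HAPLOID_GENOME_PG  # ng -> pg -> copies
    return genome_copies * recovery * variant_frequency


# 10 ng of input and a variant present at 0.1%.
low_recovery = expected_variant_copies(10, 0.001, 0.15)   # within the 10-20% range
high_recovery = expected_variant_copies(10, 0.001, 0.90)
```

Under these assumptions, 10 ng corresponds to about 3,000 genome copies and therefore about 3 variant copies; at 15% recovery fewer than one variant copy is expected to remain, whereas at 90% recovery between two and three remain.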

The polynucleotides can come from any suitable sample, such as the samples described herein with respect to the various aspects of the invention. Polynucleotides from a sample can be any of a variety of polynucleotides, including but not limited to DNA, RNA, ribosomal RNA (rRNA), transfer RNA (tRNA), microRNA (miRNA), messenger RNA (mRNA), fragments of any of these, or combinations of any two or more of these. In some embodiments, the sample comprises DNA. In some embodiments, the polynucleotides are single-stranded, either as obtained or by way of treatment (for example, denaturation). Further examples of suitable polynucleotides are described herein, such as with respect to any of the various aspects of the invention. In some embodiments, the polynucleotides are subjected to subsequent steps (such as circularization and amplification) without an extraction step and/or without a purification step. For example, a fluid sample can be treated to remove cells without an extraction step, producing a purified fluid sample and a cell sample, followed by isolation of DNA from the purified fluid sample. A variety of procedures for isolating polynucleotides are available, such as precipitation, or non-specific binding to a substrate followed by washing the substrate to release the bound polynucleotides. Where polynucleotides are isolated from a sample without a cellular extraction step, the polynucleotides are largely extracellular or "cell-free" polynucleotides, which may correspond to dead or damaged cells. The identity of such cells can be used to characterize the cells or cell population from which they are derived, such as in a microbial community. If a sample is treated to extract polynucleotides, such as from cells in the sample, a variety of extraction methods are available, examples of which are provided herein (for example, with respect to any of the various aspects of the invention).

The sequence variant in the nucleic acid sample can be any of a variety of sequence variants. Multiple non-limiting examples of sequence variants are described herein, such as with respect to any of the various aspects of the invention. In some embodiments, the sequence variant is a single nucleotide polymorphism (SNP). In some embodiments, the sequence variant occurs at a low frequency in the population (also referred to as a "rare" sequence variant). For example, the sequence variant can occur at a frequency of about or less than about 5%, 4%, 3%, 2%, 1.5%, 1%, 0.75%, 0.5%, 0.25%, 0.1%, 0.075%, 0.05%, 0.04%, 0.03%, 0.02%, 0.01%, 0.005%, 0.001%, or lower. In some embodiments, the sequence variant occurs at a frequency of about or less than about 0.1%.

According to some embodiments, the polynucleotides of the sample are circularized, such as by use of a ligase. Circularization can include joining the 5' end of a polynucleotide to the 3' end of the same polynucleotide, to the 3' end of another polynucleotide in the sample, or to the 3' end of a polynucleotide from a different source (for example, an artificial polynucleotide, such as an oligonucleotide adapter). In some embodiments, the 5' end of a polynucleotide is joined to the 3' end of the same polynucleotide (also referred to as "self-joining"). Non-limiting examples of circularization processes (with or without the use of adapter oligonucleotides), reagents (for example, types of adapters and use of ligases), reaction conditions (for example, those favoring self-joining), and optional additional processing (for example, post-reaction purification) are provided herein, such as with respect to any of the various aspects of the invention.

In general, joining the ends of a polynucleotide to one another to form a circular polynucleotide (either directly, or with one or more intervening adapter oligonucleotides) produces a junction having a junction sequence. Where the 5' end and 3' end of a polynucleotide are joined via an adapter polynucleotide, the term "junction" can refer to a junction between the polynucleotide and the adapter (for example, one of the 5' end junction or the 3' end junction), or to the junction between the 5' end and the 3' end of the polynucleotide as formed by and including the adapter polynucleotide. Where the 5' end and the 3' end of a polynucleotide are joined without an intervening adapter (for example, the 5' end and 3' end of a single-stranded DNA), the term "junction" refers to the point at which the two ends are joined. A junction can be identified by the sequence of nucleotides comprising the junction (also referred to as the "junction sequence"). In some embodiments, samples comprise polynucleotides having a mixture of ends formed by natural degradation processes (such as cell lysis, cell death, and other processes by which DNA is released from a cell into its surrounding environment, where it may be further degraded, as with cell-free polynucleotides), by fragmentation that is a by-product of sample processing (such as fixing, staining, and/or storage procedures), and by fragmentation by methods that cleave DNA without restriction to specific target sequences (for example, mechanical fragmentation, such as sonication; or non-sequence-specific nuclease treatment, such as DNase I or fragmentase). Where samples comprise polynucleotides having a mixture of ends, the likelihood that two polynucleotides will have the same 5' end or the same 3' end is low, and the likelihood that two polynucleotides will independently have both the same 5' end and the same 3' end is extremely low. Accordingly, in some embodiments, junctions can be used to distinguish different polynucleotides, even where the two polynucleotides comprise a portion having the same target sequence. Where polynucleotide ends are joined without an intervening adapter, a junction sequence can be identified by alignment to a reference sequence. For example, where the order of two component sequences appears to be reversed with respect to the reference sequence, the point at which the reversal appears to occur can indicate a junction at that point. Where polynucleotide ends are joined via one or more adapter sequences, a junction can be identified by proximity to the known adapter sequence, or by alignment as above if a sequencing read is long enough to obtain sequence from both the 5' and 3' ends of the circularized polynucleotide. In some embodiments, the formation of a particular junction is a sufficiently rare event that it is unique among the circularized polynucleotides of a sample.
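For the adapter-free case, locating a junction by comparison with the reference can be sketched in a few lines. The example assumes an error-free repeat unit read from the forward strand starting at an arbitrary point on the circle, and uses exact string matching in place of a real aligner; the function name and the sequences are hypothetical.

```python
def find_junction(unit, reference):
    """Locate the junction of a circularized fragment.

    unit: one full turn of the circle, starting at an arbitrary offset.
    Returns (ref_start, ref_end, offset), where reference[ref_start:ref_end]
    is the original linear fragment and `offset` is the index in `unit`
    at which the fragment's 5' end begins (the junction lies just before it).
    Returns None if no rotation of the unit matches the reference.
    """
    n = len(unit)
    for offset in range(n):
        rotated = unit[offset:] + unit[:offset]
        start = reference.find(rotated)
        if start != -1:
            return start, start + n, offset
    return None


reference = "GATTACAGGCTCATGCAATCG"
fragment = reference[6:15]                # "AGGCTCATG", the original linear fragment
unit = fragment[5:] + fragment[:5]        # the circle as read from an arbitrary point
junction = find_junction(unit, reference)
```

In the unrotated unit, the two pieces appear in reversed order relative to the reference, which is the signature described above. Two fragments covering the same target but yielding different (ref_start, ref_end) pairs would be treated as different polynucleotides. When the base next to one fragment end in the reference happens to equal the base at the other end, more than one rotation can match and the junction is ambiguous by a base or so, which a practical implementation would need to handle.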

After circularization, reaction products can be purified before amplification or sequencing to increase the relative concentration or purity of circularized polynucleotides available to participate in subsequent steps (for example, by isolation of the circular polynucleotides or removal of one or more other molecules in the reaction). For example, a circularization reaction or components thereof can be treated to remove single-stranded (non-circularized) polynucleotides, such as by treatment with an exonuclease. As a further example, a circularization reaction or a portion thereof can be subjected to size exclusion chromatography, whereby small reagents (such as unreacted adapters) are retained and discarded, or circularization products are retained and released in a separate volume. A variety of kits for cleaning up ligation reactions are available, such as the Zymo oligo purification kits made by Zymo Research. In some embodiments, purification includes treatment to remove or degrade the ligase used in the circularization reaction, and/or to purify the circularized polynucleotides away from the ligase. In some embodiments, treatment to degrade the ligase includes treatment with a protease. Suitable proteases are available from prokaryotes, viruses, and eukaryotes. Examples of proteases include proteinase K (from Tritirachium album), pronase E (from Streptomyces griseus), Bacillus polymyxa protease, thermolysin (from thermophilic bacteria), trypsin, subtilisin, furin, and the like. In some embodiments, the protease is proteinase K. Proteinase K treatment can follow the manufacturer's protocol or use standard conditions (for example, as provided in Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th edition (2012)). Protease treatment can also be followed by extraction and precipitation. In one example, circularized polynucleotides are purified as follows: proteinase K (Qiagen) treatment in the presence of 0.1% SDS and 20 mM EDTA, extraction with 1:1 phenol/chloroform and then with chloroform, and precipitation with ethanol or isopropanol. In some embodiments, precipitation is in ethanol.

As described with respect to other aspects of the invention, cyclized polynucleotides may be sequenced directly after cyclization. Alternatively, one or more amplification reactions may be carried out before sequencing. A variety of methods for amplifying polynucleotides (for example, DNA and/or RNA) are available. Amplification may be linear, exponential, or involve both linear and exponential phases in a multi-stage amplification process. Amplification methods may involve changes in temperature, such as a denaturation step, or may be isothermal processes that do not require thermal denaturation. Non-limiting examples of suitable amplification processes are described herein, such as with respect to any of the various aspects of the invention. In some embodiments, amplification comprises rolling circle amplification (RCA). As described elsewhere herein, a typical RCA reaction mixture comprises one or more primers, a polymerase, and dNTPs, and produces concatemers. Typically, the polymerase in an RCA reaction is a polymerase having strand-displacement activity. A variety of such polymerases are available, non-limiting examples of which include exonuclease-minus (exo-) DNA polymerase I large (Klenow) fragment, Phi29 DNA polymerase, Taq DNA polymerase, and the like. In general, a concatemer is a polynucleotide amplification product comprising two or more copies of a target sequence from a template polynucleotide (for example, about or more than about 2, 3, 4, 5, 6, 7, 8, 9, 10 or more copies of the target sequence; in some embodiments, about or more than about 2 copies). Amplification primers may be of any suitable length, such as about or at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100 or more nucleotides, any portion or all of which may be complementary to the corresponding target sequence to which the primer hybridizes (for example, about or at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides). Examples of various RCA processes are described herein, such as the use of random primers, target-specific primers, and adapter-targeted primers, some of which are shown in Fig. 7.
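The relationship between a circular template, its contact, and the concatemer produced by RCA can be illustrated with a small sketch. This is a toy model under stated assumptions, not the method of the invention: the function name `rolling_circle`, the example sequence, and the four-base window used to represent the contact are all hypothetical, and for simplicity the sketch copies the template strand itself rather than synthesizing its complement.

```python
def rolling_circle(circle: str, start: int, n_bases: int) -> str:
    """Read n_bases around a circular template, beginning at index `start`.

    Simplification (assumption): the template strand is copied as-is; a real
    polymerase would synthesize the complementary strand.
    """
    size = len(circle)
    return "".join(circle[(start + i) % size] for i in range(n_bases))


# A linear fragment that is cyclized by joining its 3' end to its 5' end.
fragment = "ACGTTGCA"

# Three trips around the circle, starting from an arbitrary priming position.
concatemer = rolling_circle(fragment, start=3, n_bases=3 * len(fragment))

# The contact is represented here by the last two bases of the fragment
# followed by its first two bases; this window is absent from the linear form.
junction = fragment[-2:] + fragment[:2]

copies_of_junction = concatemer.count(junction)
```

In this toy model every full repeat in the concatemer carries the same contact, which is why reads from one concatemer can be traced back to a single cyclized molecule.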

When cyclized polynucleotides are amplified before sequencing (for example, to produce concatemers), the amplification products may be sequenced directly without enrichment, or sequenced after one or more enrichment steps. Non-limiting examples of suitable enrichment processes are described herein, such as with respect to any of the various aspects of the invention (for example, the use of B2B primers in a second amplification step). According to some embodiments, the cyclized polynucleotides (or amplification products thereof, which may optionally have been enriched) are subjected to a sequencing reaction to generate sequencing reads. Sequencing reads produced by such methods may be used in accordance with other methods disclosed herein. A variety of sequencing methods are available, particularly high-throughput sequencing methods. Examples include, without limitation, sequencing systems manufactured by Illumina, sequencing systems manufactured by Life Technologies (Ion and other systems), the 454 Life Sciences systems from Roche, Pacific Biosciences systems, and the like. In some embodiments, sequencing comprises using such systems to generate reads of about or more than about 50, 75, 100, 125, 150, 175, 200, 250, 300 or more nucleotides in length. Other non-limiting examples of amplification platforms and methods are described herein, such as with respect to any of the various aspects of the invention.

According to some embodiments, a sequence difference between a sequencing read and a reference sequence is called as a genuine sequence variant (for example, one that was present in the sample before amplification or sequencing, and is not a result of either of these processes) if it occurs in at least two different polynucleotides (for example, two different circular polynucleotides, which can be distinguished by their different contacts). Because sequence variants that result from amplification or sequencing errors are unlikely to be duplicated exactly (for example, in position and type) on two different polynucleotides comprising the same target sequence, adding this validation parameter greatly reduces the background of erroneous sequence variants, with a concurrent increase in the sensitivity and accuracy of detecting actual sequence variation in a sample. In some embodiments, a sequence variant having a frequency of about or less than about 5%, 4%, 3%, 2%, 1.5%, 1%, 0.75%, 0.5%, 0.25%, 0.1%, 0.075%, 0.05%, 0.04%, 0.03%, 0.02%, 0.01%, 0.005%, 0.001%, or lower is sufficiently above background to permit an accurate call. In some embodiments, the sequence variant occurs at a frequency of about or less than about 0.1%. In some embodiments, the method comprises calling as genuine sequence variants those sequence differences having a frequency in the range of about 0.0005% to about 3%, such as 0.001%-2% or 0.01%-1%. In some embodiments, the frequency of a sequence variant is sufficiently above background when it is statistically significantly higher than the background error rate (for example, with a p-value of about or less than about 0.05, 0.01, 0.001, 0.0001, or lower). In some embodiments, the frequency of a sequence variant is sufficiently above background when it is about or at least about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 25-fold, 50-fold, 100-fold, or more above the background error rate (for example, at least 5-fold higher). In some embodiments, the background error rate in accurately determining the sequence at a given position is about or less than about 1%, 0.5%, 0.1%, 0.05%, 0.01%, 0.005%, 0.001%, 0.0005%, or lower. In some embodiments, the error rate is lower than 0.001%. Methods for determining frequency and error rate are described herein, such as with respect to any of the various aspects of the invention.
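The two checks described above (a difference must be seen in at least two polynucleotides with different contacts, and its frequency must sit sufficiently above background) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Observation` record, the function names, and the default thresholds are hypothetical, and the fold-change test stands in for whichever statistical criterion an implementation actually uses.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set, Tuple


class Observation(NamedTuple):
    junction_id: str  # identifies the circular polynucleotide a read came from
    position: int     # reference coordinate of the sequence difference
    alt: str          # observed base(s) that differ from the reference


def confirmed_variants(
    observations: Iterable[Observation], min_junctions: int = 2
) -> Set[Tuple[int, str]]:
    """Keep differences seen in at least `min_junctions` distinct contacts."""
    junctions_by_variant = defaultdict(set)
    for obs in observations:
        junctions_by_variant[(obs.position, obs.alt)].add(obs.junction_id)
    return {
        variant
        for variant, junctions in junctions_by_variant.items()
        if len(junctions) >= min_junctions
    }


def above_background(
    variant_count: int, depth: int, background_rate: float, min_fold: float = 5.0
) -> bool:
    """True if the variant frequency is at least `min_fold` times background."""
    if depth <= 0:
        return False
    return (variant_count / depth) >= min_fold * background_rate


observations = [
    Observation("j1", 101, "T"),
    Observation("j2", 101, "T"),  # same difference, different contact
    Observation("j1", 250, "G"),
    Observation("j1", 250, "G"),  # repeated, but only within one contact
]
calls = confirmed_variants(observations)
```

Repeated observations from a single contact count once, so an error copied many times within one concatemer does not by itself produce a call.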

In some embodiments, identifying a genuine sequence variant (also referred to as "calling" or "making a call") comprises optimally aligning one or more sequencing reads with a reference sequence to identify differences between the two, as well as to identify contacts. In general, alignment involves placing one sequence along another sequence, iteratively introducing gaps along each sequence, scoring how well the two sequences match, and preferably repeating the process for each position along the reference sequence. The best-scoring match is deemed to be the alignment and represents an inference about the degree of relationship between the sequences. A variety of alignment algorithms, and aligners implementing them, are available, non-limiting examples of which are described herein, such as with respect to any of the various aspects of the invention. In some embodiments, the reference sequence to which sequencing reads are compared is a known reference sequence, such as a reference genome (for example, a genome of a member of the same species as the subject). A reference genome may be complete or incomplete. In some embodiments, a reference genome consists only of regions containing target polynucleotides, such as regions derived from a reference genome or from a consensus sequence generated from the sequencing reads under analysis. In some embodiments, a reference sequence comprises or consists of polynucleotide sequences of one or more organisms, such as sequences from one or more bacteria, archaea, viruses, protists, fungi, or other organisms. In some embodiments, the reference sequence consists of only a portion of a reference genome, such as regions corresponding to one or more target sequences under analysis (for example, one or more genes, or portions thereof). For example, for detection of a pathogen (such as in the case of contamination detection), the reference genome is the entire genome of the pathogen (for example, HIV, HPV, or a harmful bacterial strain, such as Escherichia coli), or a portion thereof useful in identification, such as identification of a particular strain or serotype. In some embodiments, sequencing reads are aligned to multiple different reference sequences, such as to screen for multiple different organisms or strains. Other non-limiting examples of reference sequences against which sequence differences may be identified are described herein, such as with respect to any of the various aspects of the invention.
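The general alignment procedure described above (placing one sequence along another, introducing gaps, and scoring the match) corresponds to classical dynamic-programming alignment. The sketch below computes a Needleman-Wunsch-style global alignment score; it is offered only as an illustration of the scoring idea, not as the aligner used by the invention, and the function name and the match, mismatch, and gap values are assumptions chosen for the example.

```python
def global_alignment_score(
    read: str, ref: str, match: int = 1, mismatch: int = -1, gap: int = -2
) -> int:
    """Best global alignment score of `read` against `ref`.

    Uses two rolling rows of the dynamic-programming table, so memory is
    proportional to len(ref) rather than len(read) * len(ref).
    """
    # Row 0: aligning an empty read prefix against ref prefixes costs gaps.
    prev = [j * gap for j in range(len(ref) + 1)]
    for i in range(1, len(read) + 1):
        curr = [i * gap]  # column 0: read prefix against an empty ref prefix
        for j in range(1, len(ref) + 1):
            pair = match if read[i - 1] == ref[j - 1] else mismatch
            curr.append(
                max(
                    prev[j - 1] + pair,  # align read[i-1] with ref[j-1]
                    prev[j] + gap,       # gap in the reference
                    curr[j - 1] + gap,   # gap in the read
                )
            )
        prev = curr
    return prev[-1]
```

For example, under these assumed scores a read identical to a four-base reference scores 4, while a single substitution lowers the score to 2. Production aligners typically add traceback, affine gap penalties, and indexing of the reference, none of which are shown here.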

In one aspect, the present invention provides a method of amplifying a plurality of different concatemers in a reaction mixture, the concatemers comprising two or more copies of a target sequence, wherein the target sequence comprises sequence A and sequence B oriented in a 5' to 3' direction. In some embodiments, the method comprises subjecting the reaction mixture to a nucleic acid amplification reaction, wherein the reaction mixture comprises: (a) a plurality of concatemers, wherein individual concatemers in the plurality comprise different contacts formed by cyclizing individual polynucleotides having a 5' end and a 3' end; (b) a first primer comprising sequence A', wherein the first primer specifically hybridizes to sequence A of the target sequence through sequence complementarity between sequence A and sequence A'; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B', present in a complementary polynucleotide comprising the complement of the target sequence, through sequence complementarity between sequence B and sequence B'; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein the distance between the 5' end of sequence A and the 3' end of sequence B of the target sequence is 75 nt or shorter.

In a related aspect, the present invention provides a method of amplifying, in a reaction mixture, a plurality of different circular polynucleotides comprising a target sequence, wherein the target sequence comprises sequence A and sequence B oriented in a 5' to 3' direction. In some embodiments, the method comprises subjecting the reaction mixture to a nucleic acid amplification reaction, wherein the reaction mixture comprises: (a) a plurality of circular polynucleotides, wherein individual circular polynucleotides in the plurality comprise different contacts formed by cyclizing individual polynucleotides having a 5' end and a 3' end; (b) a first primer comprising sequence A', wherein the first primer specifically hybridizes to sequence A of the target sequence through sequence complementarity between sequence A and sequence A'; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B', present in a complementary polynucleotide comprising the complement of the target sequence, through sequence complementarity between sequence B and sequence B'; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein sequence A and sequence B are endogenous sequences, and the distance between the 5' end of sequence A and the 3' end of sequence B of the target sequence is 75 nt or shorter.

Either amplification Circular polynucleotide still expands concatermer, and such polynucleotides all may be from any suitable sample Product source (or directly, or indirectly, such as pass through amplification).This document describes a variety of suitable sample sources, The type of optional extracting method, polynucleotides and the type of sequence variants, such as appointing in various aspects of the present invention Where face is described.Circular polynucleotide can be generated by the cyclisation of non-annularity polynucleotides.There is provided herein cyclization processes (for example, using and without using adapter oligonucleotides), reagent (for example, the type of adapter, ligase use), reaction item The non-limit of part (for example, being conducive to connect certainly), optional additional treatments (such as being purified after reaction) and the contact being consequently formed Property example processed, such as be described for any aspect in various aspects of the present invention.Concatermer can be by Circular polynucleotide Amplification generate.The method of a variety of amplifying polynucleotides (for example, DNA and/or RNA) is available, and non-limiting example also exists It is described herein.In some embodiments, concatermer is generated by the rolling circle amplification of Circular polynucleotide.

Figure 10 shows an example arrangement of the first and second primers relative to a single repeat of a target sequence (which will generally not be amplified unless it is circular) and relative to a concatemer comprising multiple copies of the target sequence. As noted with respect to other aspects described herein, this primer arrangement may be referred to as "back-to-back" (B2B) or "inverted" primers. Amplification with B2B primers facilitates enrichment of circular and/or concatemeric templates. Moreover, this orientation, combined with a relatively small footprint (the total distance spanned by the primer pair), permits amplification of a wider variety of fragmentation events around a target sequence, because a contact (the junction formed by cyclization) is less likely to fall between the primers than in the primer arrangement found in a typical amplification reaction (primers facing one another and spanning the target sequence). In some embodiments, the distance between the 5' end of sequence A and the 3' end of sequence B is about or less than about 200, 150, 100, 75, 50, 40, 30, 25, 20, 15, or fewer nucleotides. In some embodiments, sequence A is the complement of sequence B. In some embodiments, multiple B2B primer pairs directed to a plurality of different target sequences are used in the same reaction to amplify the different target sequences in parallel (for example, about or at least about 10, 50, 100, 150, 200, 250, 300, 400, 500, 1000, 2500, 5000, 10000, 15000, or more different target sequences). Primers may be of any suitable length, such as described elsewhere herein. Amplification may comprise any suitable amplification reaction under appropriate conditions, such as an amplification reaction described herein. In some embodiments, amplification is a polymerase chain reaction.
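The footprint argument above can be made concrete with a short sketch. It assumes, for illustration only, that sequence A lies 5' of sequence B on the target strand, that coordinates are 0-based with exclusive ends, and that a fragmentation break interferes with a primer pair whenever it falls strictly inside the span the pair must cover. The `Segment` type, the function names, and the coordinates are hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: int  # 0-based position of the 5' end on the target strand
    end: int    # exclusive position just past the 3' end


def b2b_footprint(seq_a: Segment, seq_b: Segment) -> int:
    """Distance from the 5' end of sequence A to the 3' end of sequence B."""
    if seq_a.start > seq_b.start:
        raise ValueError("sequence A is assumed to lie 5' of sequence B")
    return seq_b.end - seq_a.start


def break_inside_span(break_pos: int, span_start: int, span_end: int) -> bool:
    """True if a break between bases break_pos-1 and break_pos splits the span."""
    return span_start < break_pos < span_end


# Hypothetical B2B pair: A and B are adjacent 20-nt sites (40-nt footprint).
seq_a, seq_b = Segment(100, 120), Segment(120, 140)
footprint = b2b_footprint(seq_a, seq_b)

# Hypothetical conventional pair: facing primers spanning positions 50..250.
conventional_span = (50, 250)

region = range(0, 301)
b2b_disrupted = sum(break_inside_span(p, seq_a.start, seq_b.end) for p in region)
conventional_disrupted = sum(break_inside_span(p, *conventional_span) for p in region)
```

Under these assumptions far fewer break positions interfere with the small B2B footprint than with the wider conventional span, which is the sense in which B2B primers tolerate more diverse fragmentation events.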

In some embodiments, a B2B primer comprises at least two sequence elements: a first element that hybridizes to the target sequence through sequence complementarity, and a 5' "tail" that does not hybridize to the target sequence during a first amplification stage at a first hybridization temperature, during which the first element does hybridize (for example, owing to a lack of sequence complementarity between the tail and the portion of the target sequence immediately 3' of where the first element binds). For example, the first primer comprises sequence C located 5' relative to sequence A', the second primer comprises sequence D located 5' relative to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of concatemers (or circular polynucleotides) during a first amplification stage at a first hybridization temperature. In some embodiments in which such tailed primers are used, amplification may comprise a first stage and a second stage; the first stage comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the concatemers (or circular polynucleotides) and are extended; and the second stage comprises a hybridization step at a second temperature higher than the first temperature, during which the first and second primers hybridize to amplification products comprising the extended first or second primers, or complements thereof, and are extended. The number of amplification cycles at each of the two temperatures may be adjusted based on the desired products. Typically, the first temperature will be used for a relatively small number of cycles, such as about or less than about 15, 10, 9, 8, 7, 6, 5, or fewer cycles. The number of cycles at the higher temperature may be selected independently of the number of cycles at the first temperature, but will typically be as many or more cycles, such as about or at least about 5, 6, 7, 8, 9, 10, 15, 20, 25, or more cycles. The higher temperature favors hybridization between a primer extension product and both the first element and the tail element of a primer, over the shorter duplex formed by hybridization between only the first element of a primer and an internal target sequence within a concatemer. Accordingly, the two-stage amplification may be used to reduce the extent to which short amplification products would otherwise be favored, thereby maintaining a relatively high proportion of amplification products having two or more copies of the target sequence. For example, after 5 cycles (for example, at least 5, 6, 7, 8, 9, 10, 15, 20, or more cycles) of hybridization at the second temperature and primer extension, at least 5% (for example, at least 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, or more) of the amplified polynucleotides in the reaction mixture comprise two or more copies of the target sequence. Figure 11 shows an illustration of an embodiment of this two-stage, tailed B2B primer amplification process.

In some embodiments, amplification conditions are set so as to favor longer amplicons from the concatemers. For example, the primer concentration may be lowered so that not every priming site will hybridize a primer, thereby making the PCR products longer. Similarly, reducing the primer hybridization time during the cycles allows fewer primers to hybridize, which likewise increases the average PCR amplicon size. In addition, increasing the temperature and/or the extension time of the cycles will also increase the average length of the PCR amplicons. Any combination of these techniques may be used.

In some embodiments, particularly when amplification has been carried out with B2B primers, the amplification products are treated to filter the resulting amplicons on the basis of size, so as to reduce and/or eliminate the number of monomers in a mixture comprising concatemers. This can be accomplished using a variety of available techniques, including but not limited to excision of fragments from a gel and gel filtration (for example, to enrich for fragments greater than about 300, 400, 500, or more nucleotides in length), as well as size selection using SPRI beads (Agencourt AMPure XP) by fine-tuning the binding buffer concentration. For example, 0.6x binding buffer may be used when mixing with DNA fragments in order to preferentially bind DNA fragments larger than about 500 base pairs (bp).

In some embodiments, the first primer comprises sequence C located 5' relative to sequence A', the second primer comprises sequence D located 5' relative to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of circular polynucleotides during a first amplification stage at a first hybridization temperature. Amplification may comprise a first stage and a second stage, wherein the first stage comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the circular polynucleotides or amplification products thereof before primer extension; and the second stage comprises a hybridization step at a second temperature higher than the first temperature, during which the first and second primers hybridize to amplification products comprising the extended first or second primers, or complements thereof. For example, the first temperature may be selected to be about or more than about the Tm of sequence A', the Tm of sequence B, or the average of these, or a temperature that is 1 °C, 2 °C, 3 °C, 4 °C, 5 °C, 6 °C, 7 °C, 8 °C, 9 °C, 10 °C, or more above one of these Tm values. In such an example, the second temperature may be selected to be about or more than about the Tm of the combined sequence (A'+C), the Tm of the combined sequence (B+D), or the average of these, or a temperature that is 1 °C, 2 °C, 3 °C, 4 °C, 5 °C, 6 °C, 7 °C, 8 °C, 9 °C, 10 °C, or more above one of these Tm values. The term "Tm", also referred to as the "melting temperature", generally denotes the temperature at which 50% of an oligonucleotide consisting of a reference sequence (which may in practice be a subsequence within a larger polynucleotide) is hybridized to (or separated from) its complementary sequence. In general, Tm increases with increasing length, and so the Tm of sequence A' is expected to be lower than the Tm of the combined sequence (A'+C).
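The expectation that Tm rises with length, so that Tm(A') is lower than Tm(A'+C), can be illustrated with a rough estimate. The sketch below uses the Wallace rule (2 °C per A/T and 4 °C per G/C), which is a crude approximation intended for short oligonucleotides and is not a method described in this document; the function name and the example sequences are hypothetical, and a nearest-neighbor model would be the usual choice for real primer design.

```python
def wallace_tm(seq: str) -> float:
    """Rough melting temperature in degrees C by the Wallace rule.

    Assumption: 2 degrees per A/T base and 4 degrees per G/C base; this is
    only a coarse estimate and is most often applied to short oligos.
    """
    seq = seq.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2.0 * at_count + 4.0 * gc_count


# Hypothetical target-binding element (A') and 5' tail (C) of the first primer.
a_prime = "ACGTACGTAC"
tail_c = "GGCCAATT"
tailed_primer = tail_c + a_prime  # the tail sits 5' of A'

tm_first_stage = wallace_tm(a_prime)         # only A' pairs with the template
tm_second_stage = wallace_tm(tailed_primer)  # A'+C pairs with extension products
```

Because the tail contributes additional base pairs once it has been copied into the amplification products, the estimated Tm of the full tailed primer is higher, which is what allows the second stage to run at a higher hybridization temperature.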

In one aspect, the present invention provides a reaction mixture for carrying out a method according to the invention. The reaction mixture may comprise one or more of the various components described herein with respect to any of the various methods, including the reaction mixtures described in the methods described herein. In some embodiments, the reaction mixture is a mixture for amplifying a plurality of different concatemers comprising two or more copies of a target sequence, or circular polynucleotides comprising one or more copies of the target sequence (for example, circular monomers), wherein the target sequence comprises sequence A and sequence B oriented in a 5' to 3' direction, the reaction mixture comprising: (a) a plurality of concatemers (or circular polynucleotides), wherein individual concatemers (or circular polynucleotides) in the plurality comprise different contacts formed by cyclizing individual polynucleotides having a 5' end and a 3' end; (b) a first primer comprising sequence A', wherein the first primer specifically hybridizes to sequence A of the target sequence through sequence complementarity between sequence A and sequence A'; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B', present in a complementary polynucleotide comprising the complement of the target sequence, through sequence complementarity between B and B'; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein the distance between the 5' end of sequence A and the 3' end of sequence B of the target sequence is 75 nt or shorter. Samples, polynucleotides, primers, polymerases, other reagents, and reaction conditions may be any of those described herein, such as those described with respect to any of the various aspects, and may be included in the reaction mixture in any suitable combination. In some embodiments, the first primer comprises sequence C located 5' relative to sequence A', the second primer comprises sequence D located 5' relative to sequence B, and neither sequence C nor sequence D hybridizes to the two or more concatemers during a first amplification step of the amplification reaction.

In one aspect, the present invention provides compositions that are useful in, or produced by, the methods described herein, such as in any of the various other aspects of the invention. In some embodiments, the composition comprises a plurality of cyclized polynucleotides that are single-stranded and substantially free of ligase. In some embodiments, the composition comprises a plurality of concatemers, wherein the plurality of concatemers corresponds to a group of 10000 or fewer target polynucleotides, and further wherein individual concatemers in the plurality are characterized in that: (a) they comprise two or more copies of a sequence repeat, wherein all of said copies correspond to the same target polynucleotide; and (b) the contact between the two or more copies of the sequence repeat in one individual concatemer is different from that in another individual concatemer in the composition. Samples, polynucleotides, primers, polymerases, and other reagents may be any of those described herein, such as those described with respect to any of the various aspects, and may be included in the composition in any suitable combination. The composition may comprise one or more pairs of primers, such as the B2B primers described herein, designed to amplify one or more target sequences. Compositions may be provided in the form of kits. Reagents and other materials in a kit may be contained in any suitable container, and may be in an immediately usable form or may require combination with other reagents in the kit or with reagents supplied by a user (for example, dilution of a concentrated composition or reconstitution of a lyophilized composition). A kit may provide buffers, non-limiting examples of which include sodium carbonate buffer, sodium bicarbonate buffer, borate buffer, Tris buffer, MOPS buffer, HEPES buffer, and combinations thereof. A kit may further comprise instructions for carrying out one or more of the methods described herein with respect to any of the various aspects. Instructions may be provided in one or more languages (for example, 2, 3, 4, 5, or more languages).

In one aspect, the present disclosure provides a system for detecting a sequence variant. In some embodiments, the system comprises: (a) a computer configured to receive a user request to perform a detection reaction on a sample; (b) an amplification system that performs a nucleic acid amplification reaction on the sample or a portion thereof in response to the user request, wherein the amplification reaction comprises the steps of (i) cyclizing individual polynucleotides to form a plurality of circular polynucleotides, each of which has a contact between the 5' end and the 3' end, and (ii) amplifying the circular polynucleotides; (c) a sequencing system that generates sequencing reads for the polynucleotides amplified by the amplification system, identifies sequence differences between the sequencing reads and a reference sequence, and calls as a sequence variant a sequence difference that occurs in at least two circular polynucleotides having different contacts; and (d) a report generator that sends a report to a recipient, wherein the report contains results for detection of the sequence variant. In some embodiments, the recipient is the user. Figure 32 shows a non-limiting example of a system useful in the methods of the invention.

A computer for use in the system may comprise one or more processors. Processors may be associated with one or more controllers, calculation units, and/or other units of a computer system, or may be implemented in firmware as desired. If implemented in software, the routines may be stored in any computer-readable memory, such as RAM, ROM, flash memory, a magnetic disk, a laser disk, or another suitable storage medium. Likewise, the software may be delivered to a computing device via any known delivery method, including, for example, over a communication channel such as a telephone line, the internet, or a wireless connection, or via a transportable medium such as a computer-readable disk or a flash drive. The various steps may be implemented as various blocks, operations, tools, modules, and techniques which, in turn, may be implemented in hardware, firmware, software, or any combination of hardware, firmware, and/or software. When implemented in hardware, some or all of the blocks, operations, techniques, and so on may be implemented in, for example, a custom integrated circuit (IC), an application-specific integrated circuit (ASIC), a field-programmable logic array (FPGA), a programmable logic array (PLA), and the like. A client-server, relational database architecture may be used in embodiments of the system. A client-server architecture is a network architecture in which each computer or process on the network is either a client or a server. Server computers are typically powerful computers dedicated to managing disk drives (file servers), printers (print servers), or network traffic (network servers). Client computers include PCs (personal computers) or workstations on which users run applications, as well as example output devices as disclosed herein. Client computers rely on server computers for resources, such as files, devices, and even processing power. In some embodiments, the server computer handles all of the database functionality. The client computer may have software that handles all of the front-end data management, and may also receive data input from users.

The system can be configured for receiving the user's request for carrying out sample detection reaction.User request can be straight It is connecing or indirect.The example of direct request includes the request transmitted by input equipments such as keyboard, mouse or touch screens. The example of indirect request includes such as passing through the transmission of internet (wired or wireless) via communication media.

The system may further comprise an amplification system that performs a nucleic acid amplification reaction on a sample or a portion thereof in response to the user request. A variety of methods for amplifying polynucleotides (for example, DNA and/or RNA) are available. Amplification may be linear, exponential, or involve both linear and exponential phases in a multi-stage amplification process. Amplification methods may involve changes in temperature, such as a thermal denaturation step, or may be isothermal processes that do not require thermal denaturation. Non-limiting examples of suitable amplification processes are described herein, such as with respect to any of the various aspects of the invention. In some embodiments, amplification comprises rolling circle amplification (RCA). A variety of systems for amplifying polynucleotides are available, and may vary based on the type of amplification reaction to be performed. For example, for amplification methods that include cycles of temperature change, the amplification system may comprise a thermal cycler. An amplification system may comprise a real-time amplification and detection instrument, such as systems manufactured by Applied Biosystems, Roche, and Stratagene. In some embodiments, the amplification reaction comprises the steps of: (i) cyclizing individual polynucleotides to form a plurality of circular polynucleotides, each of which has a contact between the 5' end and the 3' end; and (ii) amplifying the circular polynucleotides. Samples, polynucleotides, primers, polymerases, and other reagents may be any of those described herein, such as those described with respect to any of the various aspects. Non-limiting examples of cyclization processes (for example, with and without adapter oligonucleotides), reagents (for example, types of adapters, use of ligases), reaction conditions (for example, conditions favoring self-joining), optional additional processing (for example, post-reaction purification), and the contacts formed thereby are provided herein, such as with respect to any of the various aspects of the invention. Systems may be selected and/or designed to execute any such methods.

The system can further include a sequencing system that generates sequencing reads for the polynucleotides amplified by the amplification system, identifies sequence differences between the sequencing reads and a reference sequence, and calls as a sequence variant a sequence difference that occurs in at least two circular polynucleotides having different junctions. The sequencing system and the amplification system can be the same, or can comprise overlapping equipment. For example, both the amplification system and the sequencing system may use the same thermal cycler. A variety of sequencing platforms are available for use in the system, and can be selected based on the chosen sequencing method. Examples of sequencing methods are described herein. Amplification and sequencing may involve the use of liquid handlers. Several commercially available liquid handling systems can be used to automate these processes (see, for example, liquid handlers from Perkin-Elmer, Beckman Coulter, Caliper Life Sciences, Tecan, Eppendorf, Apricot Design, and Velocity 11). A variety of automated sequencers are commercially available, including sequencers manufactured by Life Technologies (SOLiD platform, and pH-based detection), Roche (454 platform), and Illumina (for example, flow-cell-based systems such as Genome Analyzer devices). Transfer between 2, 3, 4, 5, or more automated devices (for example, between one or more liquid handlers and a sequencing device) can be manual or automated.

Methods for identifying sequence differences relative to a reference sequence and for calling sequence variants are described herein, such as with regard to any of the various aspects of the invention. The sequencing system typically includes software that performs these steps in response to an input of sequencing data and an input of desired parameters (for example, selection of a reference genome). Examples of alignment algorithms, and of aligners that implement these algorithms, are described herein and may form part of the sequencing system.

The system can further include a report generator that sends a report to a recipient, wherein the report contains the results of the sequence variant detection. A report can be generated in real time, such as during a sequencing read or while sequencing data are being analyzed, and updated periodically as the process progresses. In addition, or alternatively, a report can be generated at the conclusion of the analysis. The report can be generated automatically, such as when the sequencing system completes the step of calling all sequence variants. In some embodiments, the report is generated in response to instructions from a user. In addition to the results of sequence variant detection, a report may also contain an analysis based on the one or more sequence variants. For example, where one or more sequence variants are associated with a particular contaminant or phenotype, the report may include information about that association, such as the likelihood that the contaminant or phenotype is present and at what level, and optionally a suggestion based on this information (for example, additional tests, monitoring, or remedial measures). The report can take any of a variety of forms. It is envisioned that data relating to the present disclosure can be transmitted over a network or connection (or by any other suitable means of transmitting information, including but not limited to mailing a physical report, such as a print-out) for receipt and/or review by a recipient. The recipient can be, but is not limited to, an individual or an electronic system (for example, one or more computers and/or one or more servers).

In one aspect, the present invention provides a computer-readable medium comprising code that, upon execution by one or more processors, implements a method of detecting a sequence variant. In some embodiments, the implemented method comprises: (a) receiving a customer request to perform a detection reaction on a sample; (b) performing a nucleic acid amplification reaction on the sample or a portion thereof in response to the customer request, wherein the amplification reaction comprises the steps of (i) circularizing individual polynucleotides to form a plurality of circular polynucleotides, each of which has a junction between the 5' end and the 3' end, and (ii) amplifying the circular polynucleotides; (c) performing a sequencing analysis comprising the steps of (i) generating sequencing reads for the polynucleotides amplified in the amplification reaction, (ii) identifying sequence differences between the sequencing reads and a reference sequence, and (iii) calling as a sequence variant a sequence difference that occurs in at least two circular polynucleotides having different junctions; and (d) generating a report that contains the results of the sequence variant detection.
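
As an illustration of step (c)(iii), the junction-based calling rule can be sketched in Python as follows. This is a minimal sketch and not the claimed implementation: the `Difference` record, the representation of a junction as a pair of reference coordinates, and the `min_junctions` parameter are assumptions introduced here for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

# Assumption: a junction is summarized by the two reference coordinates
# that were joined when the polynucleotide was circularized.
Junction = Tuple[int, int]


class Difference(NamedTuple):
    """One difference between a sequencing read and the reference (hypothetical record)."""
    junction: Junction  # junction of the circular polynucleotide the read came from
    position: int       # reference coordinate of the difference
    alt: str            # base observed in the read


def call_variants(differences: Iterable[Difference],
                  min_junctions: int = 2) -> Set[Tuple[int, str]]:
    """Return (position, alt) pairs observed with at least `min_junctions` distinct junctions."""
    junctions_seen: Dict[Tuple[int, str], Set[Junction]] = defaultdict(set)
    for diff in differences:
        junctions_seen[(diff.position, diff.alt)].add(diff.junction)
    return {key for key, junctions in junctions_seen.items()
            if len(junctions) >= min_junctions}
```

Under this rule, a difference seen many times but always with the same junction is not called, because all of its reads may derive from a single original molecule.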

A machine-readable medium comprising computer-executable code can take many forms, including but not limited to a tangible storage medium, a carrier-wave medium, or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks (such as any of the storage devices in any computer), which may be used to implement databases and the like. Volatile storage media include dynamic memory, such as the main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire, and optical fiber, including the wires that make up a bus within a computer system. Carrier-wave transmission media can take the form of electric or electromagnetic signals, or of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, a DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The computer-executable code of the invention can be executed on any suitable device comprising a processor, including a server, a PC, or a mobile device such as a smartphone or tablet. Any controller or computer optionally includes a monitor, which can be a cathode ray tube ("CRT") display, a flat-panel display (for example, an active matrix liquid crystal display or a liquid crystal display), or another display. Computer circuitry is typically housed in a box that contains numerous integrated circuit chips, such as a microprocessor, memory, interface circuits, and others. The box optionally also contains a hard disk drive, a floppy disk drive, a high-capacity removable drive such as a writable CD-ROM, and other common peripheral components. Input devices such as a keyboard, mouse, or touch screen optionally provide for input from a user. The computer can include appropriate software for receiving user instructions, either in the form of user input into a set of parameter fields (for example, in a GUI) or in the form of preprogrammed instructions (for example, preprogrammed for a variety of different specific operations).

Embodiment

The following embodiments are provided to illustrate various implementations of the present invention and are not intended to limit it in any way. These embodiments, and the methods described in them, represent currently preferred embodiments, are exemplary, and should not be taken as limiting the scope of the invention. Variations and other uses encompassed within the scope of the invention as defined by the claims will occur to those skilled in the art.

Embodiment 1: preparation of a tandem repeat sequencing library for mutation detection

Starting with >10 ng of approximately 150 bp DNA fragments in 12 µL of water or 10 mM Tris-HCl pH 8.0, add 2 µL of 10X CircLigase buffer mix, heat to 95 °C for 2 minutes, and cool on ice for 5 minutes. Add 4 µL of 5 M betaine, 1 µL of 50 mM MnCl₂, and 1 µL of CircLigase II. Incubate at 60 °C for at least 12 hours. Add 2 µL of RCA primer mix (50 nM each, to a final concentration of 5 nM) and mix. Heat to 95 °C for 2 minutes, then cool to 42 °C for 2 hours. Purify the CircLigation product with the ZYMO oligonucleotide purification kit. Following the manufacturer's instructions, add 28 µL of water to the 22 µL of CircLigation product for a total volume of 50 µL. Mix with 100 µL of oligonucleotide binding buffer and 400 µL of ethanol. Centrifuge at >10,000 × g for 30 seconds, then discard the flow-through. Add 750 µL of DNA wash buffer, centrifuge at >10,000 × g for 30 seconds, discard the flow-through, and centrifuge again at top speed for 1 minute. Transfer the column to a new Eppendorf tube and elute with 17 µL of water (the final eluted volume is approximately 15 µL).
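
The final concentrations implied by the volumes above can be checked with a short calculation. The following Python sketch is offered only as a worked example; the helper name `final_concentration` and the dictionary layout are assumptions made for illustration, and the stock concentrations are those stated in the text.

```python
def final_concentration(stock: float, stock_volume_ul: float, total_volume_ul: float) -> float:
    """Dilution formula C_final = C_stock * V_stock / V_total (units follow the stock)."""
    return stock * stock_volume_ul / total_volume_ul


# Circularization reaction volumes from the text, in microliters.
ligation_ul = {
    "DNA": 12,
    "10X CircLigase buffer": 2,
    "5 M betaine": 4,
    "50 mM MnCl2": 1,
    "CircLigase II": 1,
}
ligation_total_ul = sum(ligation_ul.values())  # 20 uL reaction

betaine_molar = final_concentration(5.0, 4, ligation_total_ul)  # 1 M betaine
mncl2_mm = final_concentration(50.0, 1, ligation_total_ul)      # 2.5 mM MnCl2
buffer_x = final_concentration(10.0, 2, ligation_total_ul)      # 1X buffer

# Adding 2 uL of 50 nM primer mix brings the volume to 22 uL,
# which is where the "about 5 nM final" figure in the text comes from.
primer_nm = final_concentration(50.0, 2, ligation_total_ul + 2)
```

The 22 µL total obtained after primer addition also agrees with the 22 µL of CircLigation product that is diluted to 50 µL before column purification.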

Rolling circle amplification is carried out in a volume of about 50 µL. To the 15 µL of eluted sample, add 5 µL of 10X RepliPHI buffer (Epicentre), 1 µL of 25 mM dNTPs, 2 µL of 100 mM DTT, 1 µL of 100 U/µL RepliPHI Phi29, and 26 µL of water. Incubate the reaction mixture at 30 °C for 1 hour. Purify the RCA product by adding 80 µL of Ampure beads, following the manufacturer's instructions for the remaining wash steps. For elution, add 22.5 µL of elution buffer and incubate the beads at 65 °C for 5 minutes. Centrifuge the tube briefly before returning it to the magnet.
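
When several samples are processed in parallel, the reagents other than the eluted sample are commonly combined into a master mix. The Python sketch below scales the per-reaction RCA volumes given above; the function name `master_mix` and the 10% pipetting overage are assumptions for illustration and are not taken from the protocol.

```python
from typing import Dict

# Per-reaction RCA reagent volumes from the text (uL), excluding the 15 uL eluted sample.
RCA_PER_REACTION_UL: Dict[str, float] = {
    "10X RepliPHI buffer": 5,
    "25 mM dNTPs": 1,
    "100 mM DTT": 2,
    "100 U/uL RepliPHI Phi29": 1,
    "water": 26,
}


def master_mix(per_reaction_ul: Dict[str, float], n_reactions: int,
               overage: float = 0.10) -> Dict[str, float]:
    """Scale per-reaction volumes to n reactions plus a fractional overage (assumed 10%)."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1 + overage)
    return {name: round(volume * factor, 2) for name, volume in per_reaction_ul.items()}


# Each reaction receives 35 uL of master mix on top of 15 uL of sample (50 uL total).
per_reaction_mix_ul = sum(RCA_PER_REACTION_UL.values())
```

For example, `master_mix(RCA_PER_REACTION_UL, 8)` returns the volumes to combine for eight reactions with the assumed overage.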

Mix about 20 µL of the eluted product from the RCA reaction with 25 µL of 2X Phusion master mix, 2.5 µL of DMSO, and 0.5 µL of each B2B primer mix at 10 µM. Run the following PCR program: 95 °C for 1 minute; 5 cycles of extension (95 °C for 15 seconds, 55 °C for 15 seconds, 72 °C for 1 minute); 13-18 cycles of replication (95 °C for 15 seconds, 68 °C for 15 seconds, 72 °C for 1 minute); and a final extension at 72 °C for 7 minutes. Run an E-gel to check the size of the PCR product. If the size range is 100-500 bp, perform a 0.6X Ampure bead purification to enrich for 300-500 bp, and take 1-2 ng for another round of PCR using small RNA library adapter primers. If the product size range is >1000 bp, purify with 1.6X Ampure beads and take 2-3 ng for Nextera XT amplicon library preparation, then purify with 0.6X Ampure beads to enrich for sizes in the 400-1000 bp range.
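
The PCR program above can be written down as a small data structure, which makes the cycling parameters easy to review and lets the programmed hold time be totaled. The following Python sketch is illustrative only; the class and function names are assumptions, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Stage:
    steps: Tuple[Step, ...]
    cycles: int = 1


def b2b_pcr_program(replication_cycles: int) -> List[Stage]:
    """Build the program described in the text; 13 to 18 replication cycles are allowed."""
    if not 13 <= replication_cycles <= 18:
        raise ValueError("replication_cycles must be between 13 and 18")
    return [
        Stage((Step(95, 60),)),                                        # initial denaturation
        Stage((Step(95, 15), Step(55, 15), Step(72, 60)), cycles=5),   # extension cycles
        Stage((Step(95, 15), Step(68, 15), Step(72, 60)),
              cycles=replication_cycles),                              # replication cycles
        Stage((Step(72, 420),)),                                       # final extension
    ]


def hold_seconds(program: List[Stage]) -> int:
    """Total programmed hold time in seconds, excluding temperature ramps."""
    return sum(stage.cycles * sum(step.seconds for step in stage.steps)
               for stage in program)
```

With 13 replication cycles the programmed hold time is 35 minutes, and with 18 cycles it is 42.5 minutes; actual run time is longer because of ramping.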

For bioinformatic analysis of the sequencing data, FASTQ files are obtained from the MiSeq run. The sequences in the FASTQ files are aligned using BWA to a reference genome sequence containing the targeted sequences (for example, KRAS and EGFR). The alignment results are used to find the region and length of the repeat unit, and its reference position, for each sequence (both reads). Using the alignment results and the repeat unit information for each sequence, the variants at all loci are found. The results from the two reads are merged. The normalized frequency and noise level of the variants are calculated. Several additional criteria are applied in calling variants from the confirmed variants, including qscore > 30 and p-value < 0.0001. Confirmed variants that pass these criteria are reported as true variants (mutations). The process can be automated in a computer language (for example, Python).
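
The final filtering step of this analysis can be sketched in Python as shown below. This is a minimal sketch under stated assumptions: the `Candidate` record and its field names are hypothetical, and the way the quality score and p-value are computed upstream is not specified in the text and is not modeled here. Only the two thresholds (qscore > 30 and p-value < 0.0001) come from the description.

```python
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    """A confirmed candidate variant with its summary statistics (hypothetical record)."""
    locus: str
    qscore: float
    p_value: float


def true_variants(candidates: Iterable[Candidate],
                  min_qscore: float = 30.0,
                  max_p_value: float = 1e-4) -> List[Candidate]:
    """Keep candidates with qscore above the minimum and p-value below the maximum."""
    return [c for c in candidates
            if c.qscore > min_qscore and c.p_value < max_p_value]
```

Both comparisons are strict, matching the ">" and "<" used in the text, so a candidate with a quality score of exactly 30 is not reported.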

Embodiment 2: preparation of a tandem repeat sequencing library for sequence variant detection

10 ng of DNA fragments with an average length of 150 bp, in a volume of 12 µL, was used for construction of the tandem repeat sequencing library. The DNA was treated beforehand with T4 polynucleotide kinase (New England Biolabs) to add a phosphate group at the 5' end and leave a hydroxyl group at the 3' end. For DNA fragments generated by DNase I or enzymatic fragmentation, or extracted from serum or plasma, the end-treatment step was skipped. The DNA was mixed with 2 µL of 10X CircLigase buffer (Epicentre CL9021K). The mixture was heated to 95 °C for 2 minutes and cooled on ice for 5 minutes, after which 4 µL of betaine, 1 µL of 50 mM MnCl₂, and 1 µL of CircLigase II (Epicentre CL9021K) were added. The ligation reaction was carried out at 60 °C for at least 12 hours. 1 µL of each RCA primer mix at 200 nM (to a final concentration of 10 nM) was added to the ligation product and mixed, heated to 96 °C for 1 minute, cooled to 42 °C, and incubated at 42 °C for 2 hours.

The CircLigation product with hybridized RCA primers was purified with the ZYMO oligonucleotide purification kit (Zymo Research, D4060). To do so, 21 µL of product was diluted to 50 µL with 28 µL of water and 1 µL of carrier RNA (Sigma-Aldrich, R5636, diluted to 200 ng/µL with 1X TE buffer). The diluted sample was mixed with 100 µL of oligonucleotide binding buffer and 400 µL of 100% ethanol. The mixture was loaded onto the column and centrifuged at >10,000 × g for 30 seconds. The flow-through was discarded. The column was washed with 750 µL of DNA wash buffer by centrifuging at >10,000 × g for 30 seconds, discarding the flow-through, and centrifuging again at top speed for 1 minute. The column was transferred to a new 1.5 mL Eppendorf tube and the DNA was eluted with 17 µL of elution buffer (10 mM Tris-Cl pH 8.0; the final eluted volume is approximately 15 µL).

5 µL of 10X RepliPHI buffer, 2 µL of 25 mM dNTPs, 2 µL of 100 mM DTT, 1 µL of 100 U/µL RepliPHI Phi29, and 25 µL of water (Epicentre, RH040210) were added to the 15 µL of sample eluted from the column, for a total reaction volume of 50 µL. The reaction mixture was incubated at 30 °C for 2 hours. The RCA product was purified by adding 80 µL of Ampure XP beads (Beckman Coulter, A63881). The wash steps followed the manufacturer's instructions. The RCA product was eluted after incubation in 22.5 µL of elution buffer at 65 °C for 5 minutes. The tube was centrifuged briefly before being returned to the magnet.

About 20 µL of the eluted product from the RCA reaction was mixed with 25 µL of 2X Phusion master mix (New England Biolabs M0531S), 2.5 µL of water, 2.5 µL of DMSO, and 0.5 µL of B2B primer mix (10 µM each). Amplification was carried out with the following thermocycling program: 95 °C for 2 minutes; 5 cycles of extension (95 °C for 30 seconds, 55 °C for 15 seconds, 72 °C for 1 minute); 18 cycles of replication (95 °C for 15 seconds, 68 °C for 15 seconds, 72 °C for 1 minute); and a final extension at 72 °C for 7 minutes. The size of the PCR product was checked by electrophoresis. Once long PCR products were confirmed by electrophoresis, the PCR product was mixed with 30 µL of Ampure beads (0.6X volume) for purification, to enrich for PCR products >500 bp. The purified product was quantified using the Qubit 2.0 quantification platform (Invitrogen). About 1 ng of purified DNA was used for Nextera XT amplicon library preparation (Illumina FC-131-1024). Library elements with an insert size >500 bp were enriched by purification with 0.6X Ampure beads.

The concentration and size distribution of the amplified library were analyzed using the Agilent High Sensitivity DNA kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA). Sequencing was performed using an Illumina MiSeq with a 2x250 bp MiSeq sequencing kit. Following the MiSeq manual, 12 pM of denatured library was loaded onto the sequencing run.

In a variation of this procedure, Illumina adapters were used for library preparation instead of the Nextera preparation. To do so, about 1 ng of similarly purified DNA was used for PCR amplification with a pair of primers containing the common segment of the B2B primers and the Illumina adapter sequences (P5 and P7; 5'CAAGCAGAAGACGGCATACGA3' and 5'ACACTCTTTCCCTACACGACGCTCTTCCGATCT3'). Using Phusion master mix, 12 cycles of replication (95 °C for 30 seconds, 55 °C for 15 seconds, 72 °C for 60 seconds) were performed. The purpose of this amplification step is to add the Illumina adapters for amplicon sequencing. Amplicons >500 bp in length were enriched with 0.6X Ampure beads. The concentration and size distribution of the amplicon library were analyzed using the Agilent High Sensitivity DNA kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA). Sequencing was performed using an Illumina MiSeq with a 2x250 bp MiSeq sequencing kit. The common segment of the B2B primers also serves as the sequencing primer sequence, and a custom sequencing primer is added if the primer is not included in the Illumina kit. 12 pM of denatured library was loaded onto the sequencing run.

The target region coverage in one example analysis is shown in FIG. 33. Table 3 below gives the analysis results for the target region.

Table 1 provides examples of RCA primers useful in the methods of the invention. Table 2 provides examples of B2B primers useful in the methods of the invention.

Table 1

Table 2

Table 3

	Result
Reads	1.5M
% target bases, 1x	97.8%
% on target	63.4%
% duplication	18.2%
Mean coverage	74.5x
Standard deviation of coverage	0.21

Embodiment 3: fragmentation of genomic DNA for sequencing library construction

1 µL of genomic DNA was processed using the NEBNext dsDNA Fragmentase kit (New England Biolabs) according to the manufacturer's protocol. The incubation time was extended to 45 minutes at 37 °C. The fragmentation reaction was stopped by adding 5 µL of 0.5 M EDTA pH 8.0, and the product was purified by adding 2X volume of Ampure XP beads (Beckman Coulter, A63881) according to the manufacturer's protocol. The fragmented DNA was analyzed on a Bioanalyzer (Agilent) using the High Sensitivity DNA kit. The size range of the fragmented DNA was typically about 100 bp to about 200 bp, with a peak at about 150 bp.

Embodiment 4: library preparation procedure

In this embodiment, the KAPA library preparation kit (KK8230) is used for illustrative purposes.

For steps involving bead purification, AMPure XP beads (catalog no. A63881) were equilibrated to room temperature and thoroughly resuspended before being mixed with the sample. After thorough mixing with the sample on a vortex mixer, the mixture was incubated at room temperature for 15 minutes to allow the DNA to bind to the beads. The beads were then placed on a magnetic stand until the liquid cleared. The beads were then washed twice with 200 µL of 80% ethanol and dried at room temperature for 15 minutes.

To carry out the end-repair reaction, up to 50 µL (2-10 ng) of cell-free DNA was mixed with 20 µL of end-repair master mix (8 µL of water, 7 µL of 10X KAPA end-repair buffer, and 5 µL of KAPA end-repair enzyme mix) and incubated at 20 °C for 30 minutes. 120 µL of AMPure XP beads was then added to the 70 µL end-repair reaction. The sample was then purified as described above.

To carry out the A-tailing reaction, the dried beads containing the end-repaired DNA fragments were mixed with A-tailing master mix (42 µL of water, 5 µL of 10X KAPA A-tailing buffer, and KAPA A-tailing enzyme). The reaction was incubated at 30 °C for 30 minutes. After 90 µL of PEG solution (20% PEG 8000, 2.5 M NaCl) was added, the mixture was washed following the bead purification protocol above. This A-tailing step was skipped for blunt-end ligation reactions.

For linker ligation, two oligonucleotides having the following sequences (5' to 3') were used to form the adapter polynucleotide duplex: /5Phos/CCATTTCATTACCTCTTTCTCCGCACCCGACATAGAT*T and /5Phos/ATCTATGTCGGGTGCGGAGAAAGAGGTAATGAAATGG*T. The dried beads containing the end-repaired DNA fragments (for blunt-end ligation) or the A-tailed DNA fragments (for linker-based ligation) were mixed with 45 µL of ligation master mix (30 µL of water, 10 µL of 5X KAPA ligation buffer, and 5 µL of KAPA T4 DNA ligase) and either 5 µL of water (for blunt-end ligation) or 5 µL of an equimolar mixture of the linker oligonucleotides (for linker-based ligation). The beads were thoroughly resuspended and incubated at 20 °C for 15 minutes. After 50 µL of PEG solution (see above) was added, the mixture was washed following the bead purification protocol above.
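
That the two linker oligonucleotides can anneal into a duplex can be verified by checking that their sequences are reverse complements of one another. The Python sketch below does this; the helper names are assumptions for illustration, and the asterisk is taken here to mark a modified linkage preceding the final T, which is an assumption based on common oligonucleotide ordering notation rather than a statement in the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

LINKER_1 = "/5Phos/CCATTTCATTACCTCTTTCTCCGCACCCGACATAGAT*T"
LINKER_2 = "/5Phos/ATCTATGTCGGGTGCGGAGAAAGAGGTAATGAAATGG*T"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_linker(oligo: str):
    """Strip the 5' phosphate tag and split into (duplex-forming core, 3' tail after '*')."""
    core, _, tail = oligo.replace("/5Phos/", "").partition("*")
    return core, tail


core_1, tail_1 = split_linker(LINKER_1)
core_2, tail_2 = split_linker(LINKER_2)

# The 37-nt cores pair along their full length; each strand keeps a single 3' T.
cores_are_complementary = reverse_complement(core_1) == core_2
```

The cores are fully complementary, so the annealed duplex carries a single unpaired T at each 3' end, which is consistent with ligation to A-tailed fragments as described above.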

Multiple displacement amplification (MDA) was performed using the Illustra GenomiPhi V2 DNA amplification kit. The dried beads containing the ligated fragment chains were resuspended in 9 µL of buffer containing random hexamers, heated at 95 °C for 3 minutes, and then cooled rapidly on ice. After 1 µL of enzyme mix was added, the cooled sample was incubated at 30 °C for 90 minutes. The reaction was then stopped by heating at 65 °C for 10 minutes. After 30 µL of PEG solution (see above) was added, the mixture was washed following the purification protocol above and resuspended in 200 µL of TE (with incubation at 65 °C for 5 minutes). If desired, the purified product can be quantified by quantitative PCR, digital droplet PCR (ddPCR), or next-generation sequencing (NGS).

After MDA, the long ligated fragment chains (for example, >2 kb) were sonicated to about 300 bp using a Covaris S220 in a total volume of 130 µL. The manufacturer's protocol specifies 140 W peak power, 10% duty factor, 200 cycles per burst, and a treatment time of 80 seconds. A fragment length of about 300 bp was selected to increase the probability of keeping the original cell-free DNA fragments intact. When needed, adapters can be added to the sonicated DNA fragments for sequencing using standard library preparation methods. Paired-end sequencing runs on an Illumina sequencer (HiSeq or MiSeq) return a variety of read combinations. Reads in which the junction (either a self-junction or, where adapters were included in the ligation step, an adapter junction) lies inside the read (flanked on its 5' and 3' sides by non-adapter sequence) are used to barcode the sequences of interest.

Embodiment 5: cyclisation and amplification

This embodiment provides an exemplary description of a circularization and amplification procedure (also referred to as the "Nebula" procedure). The procedure uses the following equipment and reagents: a PCR machine (for example, MJ Research PTC-200 Peltier thermal cycler); CircLigase II ssDNA ligase (Epicentre catalog no. CL9025K); exonucleases (for example, ExoI, New England Biolabs catalog no. M0293S; ExoIII, New England Biolabs catalog no. M0206S); T4 polynucleotide kinase (New England Biolabs catalog no. M0201S); a whole genome amplification kit (for example, GE Healthcare Illustra Ready-To-Go GenomiPhi V3 DNA amplification kit); GlycoBlue (for example, Ambion catalog no. AM9515); a microcentrifuge (for example, Eppendorf 5415D); DNA purification beads (for example, Agencourt AMPure XP, Beckman Coulter catalog no. A63881); a magnetic stand (for example, MagnaRack™, Invitrogen catalog no. CS15000); a Qubit 2.0 fluorometer (Invitrogen, catalog no. Q32866); the Molecular Probes dsDNA HS assay kit (Life Technologies catalog no. 032854); and a Bioanalyzer (Agilent 2100) with High Sensitivity DNA reagents (catalog no. 5067-4626).

To amplify DNA fragments that lack a 5' terminal phosphate (for example, cell-free DNA), the first step is end repair and formation of single strands. The DNA was denatured at 96 °C for 30 seconds (for example, on a PCR machine). A polynucleotide kinase (PNK) reaction was prepared by mixing 40 µL of DNA with 5 µL of 10X PNK reaction buffer, followed by incubation at 37 °C for 30 minutes. 1 mM ATP and PNK enzyme were added to the reaction, which was then incubated at 37 °C for 45 minutes. Buffer exchange was performed by precipitating and resuspending the DNA. 50 µL of DNA from the PNK reaction was mixed with 5 µL of 0.5 M sodium acetate pH 5.2, 1 µL of GlycoBlue, 1 µL of oligonucleotide (100 ng/µL), and 150 µL of 100% ethanol. The mixture was incubated at -80 °C for 30 minutes and centrifuged at 16K rpm for 5 minutes to pellet the DNA. The DNA pellet was washed with 500 µL of 70% ethanol and air-dried at room temperature for 5 minutes, and the DNA was resuspended in 12 µL of 10 mM Tris-Cl pH 8.0.

The resuspended DNA was then circularized by ligation. The DNA was denatured at 96 °C for 30 seconds, the sample was cooled on ice for 2 minutes, and ligase mix (2 µL of 10X CircLigase buffer, 4 µL of 5 M betaine, 1 µL of 50 mM MnCl₂, 1 µL of CircLigase II) was added. The ligation reaction was incubated at 60 °C for 16 hours on a PCR machine. Unligated polynucleotides were degraded by exonuclease digestion. To do so, the DNA was denatured at 80 °C for 45 seconds, and 1 µL of exonuclease mix (ExoI 20 U/µL : ExoIII 100 U/µL = 1:2) was added to each tube. The contents were mixed by pipetting up and down 5 times and centrifuged briefly. The digestion mixture was incubated at 37 °C for 45 minutes. 30 µL of water was added to bring the volume to 50 µL, and a further buffer exchange was performed by precipitation and resuspension as described above.

To carry out whole genome amplification (WGA), the purified DNA was first denatured at 65 °C for 5 minutes. 10 µL of denaturation buffer from the GE WGA kit was added to 10 µL of purified DNA. The DNA was cooled on a cooling block or on ice for 2 minutes. The 20 µL of DNA was added to a Ready-To-Go GenomiPhi V3 cake (WGA). The WGA reaction was incubated at 30 °C for 1.5 hours, followed by heat inactivation at 65 °C for 10 minutes.

The sample was purified using AMPure XP magnetic beads (1.6X). The beads were vortexed, and 80 µL was dispensed into a 1.5 mL tube. 30 µL of water, 20 µL of amplified DNA, and the 80 µL of beads were mixed and incubated at room temperature for 3 minutes. The tube was placed on a magnetic stand for 2 minutes, and the clear solution was pipetted off. The beads were washed twice with 80% ethanol. The DNA was eluted by adding 200 µL of 10 mM Tris-Cl pH 8.0. The DNA/bead mixture was incubated at 65 °C for 5 minutes. The tube was returned to the magnetic stand for 2 minutes. 195 µL of DNA was transferred to a new tube. 1 µL was taken for quantification by Qubit. Finally, 130 µL of WGA product was sonicated using a Covaris S220 to reach a size of about 400 bp.
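
The "1.6X" designation refers to the ratio of bead volume to sample volume. The short Python sketch below shows the arithmetic for the volumes used in this purification and in the end-repair cleanup of Embodiment 4; the function names are assumptions chosen for illustration.

```python
def bead_ratio(bead_volume_ul: float, sample_volume_ul: float) -> float:
    """Bead-to-sample volume ratio (the 'X' value) for a bead cleanup."""
    if sample_volume_ul <= 0:
        raise ValueError("sample volume must be positive")
    return bead_volume_ul / sample_volume_ul


def bead_volume(sample_volume_ul: float, ratio: float) -> float:
    """Bead volume needed to reach a given ratio for a given sample volume."""
    return sample_volume_ul * ratio


# WGA cleanup: 30 uL water + 20 uL amplified DNA = 50 uL sample, 80 uL beads.
wga_ratio = bead_ratio(80, 30 + 20)

# End-repair cleanup in Embodiment 4: 120 uL beads added to a 70 uL reaction.
end_repair_ratio = bead_ratio(120, 70)
```

The WGA cleanup works out to 1.6X, as stated, and the end-repair cleanup to roughly 1.7X.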

Embodiment 6: cyclisation and amplification with additional purifying

This embodiment provides an exemplary description of a circularization and amplification procedure (also referred to as the "Nebula" procedure) that includes a phenol-chloroform extraction step.

Step 1 is removal of competing RNA (carrier RNA from the extraction) and native RNA (co-purified) ahead of the CircLigase reaction. RNA was removed by adding 1 µL of RNase A (10 mg/mL) (Qiagen 1007885) to 50 µL of cfDNA (2-10 ng) and incubating at 37 °C for 30 minutes on a PCR machine (MJ Research PTC-200 Peltier thermal cycler).

Step 2 is buffer exchange by salt and ethanol precipitation. This step is very useful for cleaning up and concentrating the input for ligation, with a recovery close to 100% (whereas columns typically recover only 30%). The ethanol precipitation mixture (50 µL of DNA from the RNase treatment, 5 µL of 0.5 M sodium acetate pH 5.2, 1 µL of GlycoBlue (Ambion AM9515), 1 µL of carrier oligonucleotide (100 ng/µL), 150 µL of 100% ethanol) was incubated at -80 °C for 30 minutes and centrifuged at 16K rpm (Eppendorf 5415D) for 5 minutes to pellet the DNA. Use of a 20-mer non-specific carrier oligonucleotide (we used a PCR primer) slightly increased the yield and stability of pellet recovery. The DNA pellet was washed with 500 µL of 70% ethanol, air-dried at room temperature for 5 minutes, and resuspended in 13 µL of 10 mM Tris-Cl pH 8.0.

Step 3 is circularization. 12 µL of cfDNA was denatured at 96 °C for 30 seconds and cooled on an ice block for 2 minutes. The ligation mixture (12 µL of cfDNA, 2 µL of 10X CircLigase buffer, 4 µL of 5 M betaine, 1 µL of 50 mM MnCl₂, 1 µL of CircLigase II (Epicentre #CL9025K)) was assembled on a cooling block, and ligation was carried out at 60 °C for 16 hours.

Step 4 is exonuclease digestion. The ligated DNA mixture was incubated at 80 °C for 45 seconds on a PCR machine, followed by exonuclease treatment. 1 µL of exonuclease mix (ExoI 20 U/µL : ExoIII 100 U/µL = 1:2) was added to each tube, and the reaction was incubated at 37 °C for 30 minutes. For quality control purposes, it is not necessary to remove the linear template.

Step 5 is phenol-chloroform extraction followed by buffer exchange by salt and ethanol precipitation. The phenol/ethanol treatment helps achieve a ligation efficiency of more than 80% (the amount of circularized product is approximately equal to the amount of input polynucleotide). 180 µL of 10 mM Tris was added to the 20 µL of DNA from the exonuclease treatment to reach a volume of 200 µL, and the DNA was extracted with 200 µL of phenol. The aqueous layer was collected, and the DNA was recovered by ethanol precipitation. The ethanol precipitation mixture (200 µL of DNA solution after phenol extraction, 20 µL of 0.5 M sodium acetate pH 5.2, 1 µL of GlycoBlue, 1 µL of carrier oligonucleotide (100 ng/µL), 600 µL of 100% ethanol) was incubated at -80 °C for 30 minutes and centrifuged at 16K rpm for 5 minutes to pellet the DNA. The DNA pellet was washed with 500 µL of 70% ethanol, air-dried at room temperature for 5 minutes, and resuspended in 11 µL of 10 mM Tris-Cl pH 8.0.

Step 6 is whole genome amplification. 10 µL of purified DNA was incubated at 65 °C for 5 minutes on a heat block, and 10 µL of denaturation buffer (from the GE Healthcare Illustra Ready-To-Go GenomiPhi V3 DNA amplification kit) was added. After the DNA had cooled at room temperature for 5 minutes, the 20 µL of DNA was added to a Ready-To-Go GenomiPhi V3 cake (WGA). The amplification reaction was incubated at 30 °C for 1.5 hours and terminated by heat inactivation at 65 °C for 10 minutes.

Step 7 is bead purification using AMPure XP magnetic beads (1.6X). This was carried out as in the preceding embodiment.

Step 8 is sonication, as in the preceding embodiment. The DNA is then ready for quantitative PCR, ddPCR, or sequencing library construction.

Embodiment 7: analysis of ligation efficiency and on-target rate

cfDNA that had been circularized and subjected to whole genome amplification as in the embodiments above was analyzed by quantitative PCR (qPCR). The qPCR amplification curves for a sample target (using KRAS primers) are shown in FIG. 18. As shown in FIG. 18, qPCR amplification of 1/10 of the input cfDNA gave an average Ct (cycle threshold) value of 31.75, and 1/10 of the ligation product from the same sample gave an average Ct value of 31.927, indicating a high ligation efficiency of about 88%. The ligation efficiency can be about or more than about 70%, 80%, 90%, 95%, or higher, for example about 100%. In some instances, uncircularized linear DNA is removed so that substantially all of the DNA is amplified from circular form. Each sample was run three times, with two replicates each time. As shown in FIG. 18B, the amplification curves of 10 ng of WGA product and of reference genomic DNA (gDNA) (12878, 10 ng) almost overlap. The average Ct value of the WGA sample was 26.655, and the average Ct value of the gDNA sample was 26.605, indicating a high on-target rate of more than 96%. The number of KRAS copies in a given amount of amplified DNA was comparable to that in unamplified gDNA, indicating an unbiased amplification process. Each sample was tested three times, with two replicates each time. As a point of comparison, the circularization protocol provided by Lou et al. (PNAS, 2013, 110 (49)) was also tested. With the method of Lou (which lacks the precipitation and purification steps of the embodiments above), only 10-30% of the linear input DNA was converted into circular DNA. Such a low recovery presents a challenge for downstream sequencing and variant detection.
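
The percentages quoted above follow from the Ct differences if one assumes that the amount of template doubles with each PCR cycle. The Python sketch below reproduces that arithmetic. The perfect-doubling assumption (amplification base of 2) is not stated in the text and is introduced here only to show how figures of this size arise; the function name is likewise an assumption.

```python
def relative_quantity(ct_sample: float, ct_reference: float, base: float = 2.0) -> float:
    """Template amount in the sample relative to the reference, from a Ct difference.

    Assumes the template is multiplied by `base` each cycle (2.0 means perfect doubling).
    """
    return base ** (ct_reference - ct_sample)


# Ligation product (Ct 31.927) relative to input cfDNA (Ct 31.75).
ligation_efficiency = relative_quantity(31.927, 31.75)

# WGA product (Ct 26.655) relative to unamplified reference gDNA (Ct 26.605).
on_target_rate = relative_quantity(26.655, 26.605)
```

Under this assumption the ligation efficiency evaluates to about 0.88 and the on-target rate to about 0.97, in agreement with the "about 88%" and "more than 96%" reported above.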

Embodiment 8: Analysis of amplified circularized DNA by ddPCR

Droplet digital PCR (ddPCR) was used to assess allele frequency preservation and bias in whole genome amplification products generated from circularized polynucleotides. In general, ddPCR refers to a digital PCR assay that measures absolute quantities by counting nucleic acid molecules encapsulated in discrete, volumetrically defined, water-in-oil droplet partitions that support PCR amplification (Hindson et al., 2011, Anal. Chem. 83:8604-8610; Pinheiro et al., 2012, Anal. Chem. 84:1003-1011). A single ddPCR reaction may be composed of at least 20,000 partitioned droplets per well. Droplet digital PCR may be performed using any platform capable of carrying out such a digital PCR assay. An example strategy for droplet digital PCR can be summarized as follows: a sample is diluted and partitioned into thousands to millions of separate reaction chambers (water-in-oil droplets), so that each reaction chamber contains one copy of the target nucleic acid molecule or none. The number of "positive" droplets detected (which contain the target amplicon, i.e., the target nucleic acid molecule), relative to the number of "negative" droplets (which do not contain the target amplicon), can be used to determine the copy number of the target nucleic acid molecule in the original sample. Examples of droplet digital PCR systems include the QX100™ Droplet Digital PCR System from Bio-Rad, which partitions a sample containing nucleic acid template into 20,000 nanoliter-sized droplets; and the RainDrop™ digital PCR system from RainDance, which partitions a sample containing nucleic acid template into 1,000,000 to 10,000,000 picoliter-sized droplets. Other examples of methods for ddPCR are provided in WO2013181276A1.
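The relationship between positive and negative droplet counts and the copy number can be sketched as follows. This is an illustrative sketch only: it assumes the Poisson correction commonly applied in digital PCR (which allows for a droplet occasionally holding more than one copy), a correction that the text above does not itself specify. The function names are hypothetical.

```python
import math


def copies_per_droplet(positive: int, negative: int) -> float:
    """Mean target copies per droplet, assuming copies are Poisson-distributed.

    Under that assumption the fraction of negative droplets equals exp(-lambda),
    so lambda = -ln(negative / total).
    """
    total = positive + negative
    if total == 0:
        raise ValueError("no droplets were counted")
    if negative == 0:
        raise ValueError("all droplets positive: sample too concentrated to quantify")
    return -math.log(negative / total)


def estimated_copies(positive: int, negative: int) -> float:
    """Estimated number of target copies across all counted droplets."""
    return copies_per_droplet(positive, negative) * (positive + negative)


# Hypothetical counts for a 20,000-droplet reaction.
print(estimated_copies(positive=2_000, negative=18_000))
```

When very few droplets are positive, the estimate is close to the raw positive count; the correction matters more as the positive fraction grows.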

In this embodiment, BRAF V600E genomic DNA (gDNA) from K-1735 was mixed with reference genomic DNA 12878 at specific ratios (0%, 0.67%, 2.0%, 6.67%, 20%, or 100%) and fragmented to produce fragments similar in size to those seen in cfDNA (in this case, about 150 bp). The mixed DNA samples (10 ng) were circularized and amplified according to Embodiment 2. 40 ng of the amplified DNA was subjected to ddPCR for BRAF V600E and wild type. The observed mutant allele frequencies are plotted and tabulated in Figure 19. As shown, the mutant allele frequency observed with amplification (middle row of the Figure 19 table) reflects both the input mutant allele frequency (top row) and the ddPCR result for 100 ng of genomic DNA without amplification (bottom row). Allele frequency was calculated from the ddPCR output as the number of droplets containing the BRAF mutation divided by the sum of the droplets containing mutant and the droplets containing wild type. Amplified DNA is represented by open circles, and unamplified DNA by smaller filled circles. Apart from a small deviation at 0.67%, the two data sets overlap completely. This demonstrates faithful preservation of the mutant allele frequency, essentially without bias.
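The allele frequency calculation described above (mutant droplets divided by the sum of mutant and wild-type droplets) can be written out as a short sketch. The droplet counts below are hypothetical and are not data from Figure 19.

```python
def mutant_allele_frequency(mutant_droplets: int, wild_type_droplets: int) -> float:
    """Mutant droplets divided by the sum of mutant and wild-type droplets."""
    total = mutant_droplets + wild_type_droplets
    if total == 0:
        raise ValueError("no mutant or wild-type droplets were counted")
    return mutant_droplets / total


# Hypothetical counts chosen to illustrate a 2% mixture.
observed = mutant_allele_frequency(mutant_droplets=20, wild_type_droplets=980)
print(f"observed mutant allele frequency: {observed:.2%}")
```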

Embodiment 9: Detection of sequence variants above background

10 ng of sonicated gDNA (150 bp, multi-gene multiplex reference DNA, Horizon) was circularized and amplified as described in Embodiment 2, and then sonicated. The fragmented DNA was then subjected to Rubicon sequencing library construction. After capture sequencing, variants within 50 bp of the reference hotspots were plotted. The variant detection results are shown in Figure 20, in which calling a variant required that it be detected in two different polynucleotides distinguished by their different junctions. The seven expected reference hotspots (KIT D816V, EGFR G719S, EGFR T790M, EGFR L858R, KRAS G13D, KRAS G12D, NRAS Q61K) are plotted at position 0. Two other variants were also confirmed, shown as an open triangle and a diamond in Figure 20. Further results for detecting variant sequences at different concentrations by a similar method are shown in Figure 17.
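The filter described above, under which a sequence difference is accepted only when it is seen in at least two polynucleotides that differ in the point where their original 5' and 3' ends were joined, can be illustrated with a small sketch. The data layout is an assumption made for illustration: each observation pairs a sequence difference with an identifier for the join of the circular molecule it came from. Neither the representation nor the function name comes from the text.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Set, Tuple


def call_variants(observations: Iterable[Tuple[Hashable, Hashable]],
                  min_distinct_junctions: int = 2) -> Set[Hashable]:
    """Return differences supported by enough distinct circular molecules.

    Each observation is (difference, junction_id). Reads that share a
    junction_id are treated as copies of one original molecule, so they
    count only once toward the support of a difference.
    """
    junctions_by_difference = defaultdict(set)
    for difference, junction_id in observations:
        junctions_by_difference[difference].add(junction_id)
    return {
        difference
        for difference, junctions in junctions_by_difference.items()
        if len(junctions) >= min_distinct_junctions
    }


# Hypothetical observations: (gene, change) paired with (end, start) join coordinates.
observations = [
    (("KRAS", "G12D"), (1040, 1190)),
    (("KRAS", "G12D"), (1071, 1224)),   # same change, different molecule
    (("EGFR", "A>T@55"), (2210, 2365)),
    (("EGFR", "A>T@55"), (2210, 2365)),  # repeat copy of the same molecule
]
print(call_variants(observations))
```

In this sketch the KRAS change is called because two molecules with different joins carry it, while the EGFR difference is rejected because both supporting reads trace back to a single molecule and may reflect an amplification or sequencing error.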

For comparison, gDNA was sonicated as described above, but 10 ng of the sonicated gDNA was subjected directly to Rubicon sequencing library construction according to standard practice, without circularization and without requiring confirmation of sequence variants on two different polynucleotides. After capture sequencing, variants within 50 bp of the reference hotspots were again plotted, and the results are shown in Figure 21. The seven expected reference hotspots (KIT D816V, EGFR G719S, EGFR T790M, EGFR L858R, KRAS G13D, KRAS G12D, NRAS Q61K) are plotted at position 0. Variants at other positions are not expected and are most likely due to sequencing errors. The results in Figure 21 indicate that, compared with the results of the method used to generate Figure 20, standard sequencing methods have a much higher random error rate, which can mask true mutation signals when the allele frequency is low (e.g., below 5%). A further illustration of this point is provided by the similar results plotted in Figure 16.

Embodiment 10: Analysis of GC composition and size distribution

10 ng of sonicated gDNA (150 bp, multi-gene multiplex reference DNA, Horizon) was circularized and amplified as described in Embodiment 2, sequenced, and analyzed using variant calling with the two-polynucleotide verification filter (left). The numbers of sequences having a range of GC percentages were tabulated and plotted, as shown in Figure 22. As shown in the leftmost panel, the sequences of the sample prepared according to Embodiment 2 closely resemble the theoretical distribution apart from the central peak (corresponding to the overall GC content of the underlying genome). In contrast, when the same amount of gDNA was used to construct a sequencing library directly with the Rubicon sequencing library construction kit, without amplification, the difference between the sequencing results and the theoretical distribution was clear (see the middle panel). The central peak of this direct Rubicon sequencing is higher than that of the theoretical distribution. Newman et al. (2014; Nature Medicine, (20): 548-54) reported that the GC content distribution of cfDNA sequencing resembles the theoretical distribution when 32 ng of cfDNA is used. This is shown in the rightmost panel.
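Tabulating the number of sequences at each GC percentage, as described above, can be sketched as follows. The 5-point bin width and the function names are assumptions chosen for illustration; the text does not state how the GC percentages in Figure 22 were binned.

```python
from collections import Counter
from typing import Dict, Iterable


def gc_percent(sequence: str) -> float:
    """Percentage of G and C among the A/C/G/T bases of a sequence."""
    sequence = sequence.upper()
    gc = sum(base in "GC" for base in sequence)
    acgt = sum(base in "ACGT" for base in sequence)
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


def gc_histogram(reads: Iterable[str], bin_width: int = 5) -> Dict[int, int]:
    """Count reads per GC-percentage bin, keyed by the lower edge of each bin.

    A read with exactly 100% GC is placed in the last bin.
    """
    histogram = Counter()
    for read in reads:
        lower_edge = int(gc_percent(read) // bin_width) * bin_width
        histogram[min(lower_edge, 100 - bin_width)] += 1
    return dict(sorted(histogram.items()))


# Hypothetical reads, for illustration only.
print(gc_histogram(["ATGC", "AATT", "GGCC", "ATGCATGC"]))
```

A histogram of this kind, computed for each library, could then be plotted against the distribution expected for the reference genome to reveal GC bias.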

DNA size distribution was assessed for cfDNA that was circularized, amplified, and sequenced as described in Embodiment 2. As shown in Figure 23, the peak of the fragment length distribution indicated by the sequencing results is at about 150-180 bp, resembling the typical distribution pattern of cfDNA.
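Locating the peak of a fragment length distribution, as reported above, amounts to binning the inferred fragment lengths and finding the most populated bin. The sketch below assumes a 10 bp bin width and uses hypothetical lengths; it does not reproduce the analysis behind Figure 23.

```python
from collections import Counter
from typing import Iterable, Tuple


def modal_length_bin(lengths: Iterable[int], bin_width: int = 10) -> Tuple[int, int]:
    """Return (start, end) in bp of the most populated fragment-length bin.

    Ties are resolved in favour of the shorter bin.
    """
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    if not counts:
        raise ValueError("no fragment lengths supplied")
    start, _ = max(counts.items(), key=lambda item: (item[1], -item[0]))
    return start, start + bin_width - 1


# Hypothetical fragment lengths in bp.
fragment_lengths = [152, 155, 158, 166, 171, 90, 320]
print(modal_length_bin(fragment_lengths))
```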

Embodiment 11: Assessment of amplification uniformity

qPCR results for 10 PCR targets in products circularized and amplified according to Embodiment 2 were compared with those for unamplified reference DNA (gDNA from the 12878 cell line, Coriell Institute). 10 ng of genomic reference DNA or amplification product was used in each real-time qPCR reaction, and ratios were generated by relative quantification of the amplification product against the genomic reference. As shown in Figure 24, the ratio for each PCR was within a 2-fold variation, indicating that the copy numbers of these targets in the amplified DNA library closely resemble those in the unamplified reference DNA. Ten pairs of PCR primers from 6 genes (BRAF, cKIT, EGFR, KRAS, NRAS, PIK3CA) were designed and pre-validated.
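The uniformity check described above can be sketched as a per-target ratio of amplified product to reference, followed by a test that each ratio lies within a 2-fold window. The conversion from Ct values assumes ideal doubling per cycle, and the Ct values shown are hypothetical; neither is taken from the text or from Figure 24.

```python
from typing import Dict, Tuple


def uniformity_ratios(ct_amplified: Dict[str, float],
                      ct_reference: Dict[str, float],
                      max_fold: float = 2.0) -> Dict[str, Tuple[float, bool]]:
    """Per-target (ratio, within_window) of amplified product versus reference.

    ratio = 2 ** (Ct_reference - Ct_amplified), assuming doubling per cycle.
    within_window is True when 1/max_fold <= ratio <= max_fold.
    """
    results = {}
    for target, ct_amp in ct_amplified.items():
        ratio = 2.0 ** (ct_reference[target] - ct_amp)
        results[target] = (ratio, 1.0 / max_fold <= ratio <= max_fold)
    return results


# Hypothetical Ct values for two primer pairs.
ratios = uniformity_ratios(
    ct_amplified={"KRAS": 26.0, "BRAF": 27.5},
    ct_reference={"KRAS": 26.5, "BRAF": 25.5},
)
for target, (ratio, ok) in ratios.items():
    print(f"{target}: ratio {ratio:.2f}, within 2-fold: {ok}")
```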

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

1. A method of identifying a sequence variant in a nucleic acid sample comprising a plurality of polynucleotides, each polynucleotide of the plurality having a 5' end and a 3' end, the method comprising:

(a) circularizing individual polynucleotides of the plurality to form a plurality of circular polynucleotides, each of which has a junction between the 5' end and the 3' end;

(b) amplifying the circular polynucleotides of (a);

(c) sequencing the amplified polynucleotides to produce a plurality of sequencing reads;

(d) comparing the sequencing reads to a reference sequence to identify sequence differences between them; and

(e) calling a sequence difference a sequence variant only when the sequence difference, relative to the same reference sequence, is present in at least two circular polynucleotides having different junctions.

2. A method of identifying a sequence variant, the method comprising: comparing sequencing reads to a reference sequence to identify sequence differences between them, and calling a sequence difference a sequence variant only when the sequence difference, relative to the same reference sequence, is present in at least two circular polynucleotides that comprise the same target sequence and have different junctions, wherein:

(a) the sequencing reads correspond to amplification products of the at least two circular polynucleotides; and

(b) each of the at least two circular polynucleotides comprises a different junction formed by ligating the 5' end and the 3' end of the respective polynucleotide.

3. The method of claim 1 or 2, wherein the plurality of polynucleotides are single-stranded.

4. The method of claim 1 or 2, wherein circularization is effected by subjecting the plurality of polynucleotides to a ligation reaction.

5. The method of claim 1 or 2, wherein an individual circular polynucleotide has a junction that is unique among the circularized polynucleotides.

6. The method of claim 1 or 2, wherein the sequence variant is a single nucleotide polymorphism.

7. The method of claim 1 or 2, wherein the reference sequence is a consensus sequence formed by aligning the sequencing reads with one another.

8. The method of claim 1 or 2, wherein the reference sequence is a known reference sequence.

9. The method of claim 1, wherein circularization comprises a step of ligating an adapter polynucleotide to the 5' end, the 3' end, or both the 5' end and the 3' end of a polynucleotide of the plurality of polynucleotides.

10. The method of claim 1, wherein amplification is effected using a polymerase having strand-displacement activity.

11. The method of claim 1, wherein amplification comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising random primers.

12. The method of claim 1, wherein amplification comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising one or more primers, each of which specifically hybridizes to a different target sequence via sequence complementarity.

13. The method of claim 11 or 12, wherein the amplified polynucleotides are subjected to the sequencing step without enrichment.

14. The method of claim 11 or 12, further comprising enriching one or more target polynucleotides among the amplified polynucleotides by performing an enrichment step prior to sequencing.

15. The method of claim 14, wherein the enrichment step comprises hybridizing the amplified polynucleotides to a plurality of probes attached to a substrate.

16. The method of claim 14, wherein the enrichment step comprises amplifying, in an amplification reaction mixture, a target sequence comprising sequence A and sequence B oriented in a 5' to 3' direction, the amplification reaction mixture comprising:

(a) a plurality of concatemers, wherein individual concatemers of the plurality comprise different junctions formed by circularizing individual polynucleotides having a 5' end and a 3' end;

(b) a first primer comprising sequence A', wherein the first primer specifically hybridizes to sequence A of the target sequence via sequence complementarity between sequence A and sequence A';

(c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B' present in a complementary polynucleotide comprising a complement of the target sequence via sequence complementarity between B and B'; and

(d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides;

wherein the distance between the 5' end of sequence A and the 3' end of sequence B of the target sequence is 75 nt or less.

17. The method of claim 1 or 2, wherein a microbial contaminant is identified based on the calling step.

18. The method of claim 1 or 2, wherein the nucleic acid sample comprises less than 50 ng of polynucleotides.

19. The method of claim 18, wherein the sequence difference called a sequence variant occurs at a frequency of 0.05% or higher among the plurality of reads obtained from the nucleic acid sample of less than 50 ng of polynucleotides.

20. The method of claim 1, wherein step (b) comprises amplifying, in a reaction mixture, a plurality of different circular polynucleotides comprising a target sequence, wherein the target sequence comprises sequence A and sequence B oriented in a 5' to 3' direction, the method comprising subjecting the reaction mixture to a nucleic acid amplification reaction, wherein the reaction mixture comprises:

(a) a plurality of circular polynucleotides, wherein individual circular polynucleotides of the plurality comprise different junctions formed by circularizing individual polynucleotides having a 5' end and a 3' end;

(c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B' present in a complementary polynucleotide comprising a complement of the target sequence via sequence complementarity between sequence B and B'; and

wherein sequence A and sequence B are endogenous sequences, and the distance between the 5' end of sequence A and the 3' end of sequence B of the target sequence is 75 nt or less.

21. The method of claim 16, wherein the first primer comprises sequence C located 5' relative to sequence A', the second primer comprises sequence D located 5' relative to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of concatemers during a first amplification phase at a first hybridization temperature.

22. The method of claim 21, wherein amplification comprises a first phase and a second phase; the first phase comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the concatemers prior to primer extension; and the second phase comprises a hybridization step at a second temperature higher than the first temperature, during which the first and second primers hybridize to amplification products comprising the extended first or second primer, or complements thereof.

23. The method of claim 22, wherein, after 5 cycles of hybridization at the second temperature and primer extension, at least 5% of the amplified polynucleotides in the reaction mixture comprise two or more copies of the target sequence.

24. The method of claim 20, wherein the first primer comprises sequence C located 5' relative to sequence A', the second primer comprises sequence D located 5' relative to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of circular polynucleotides during a first amplification phase at a first hybridization temperature.

25. The method of claim 24, wherein amplification comprises a first phase and a second phase; the first phase comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the circular polynucleotides or amplification products thereof prior to primer extension; and the second phase comprises a hybridization step at a second temperature higher than the first temperature, during which the first and second primers hybridize to amplification products comprising the extended first or second primer, or complements thereof.

26. The method of claim 1 or 2, wherein a ligase used in the circularization reaction is removed or degraded after circularization.

27. The method of claim 16, wherein the plurality of concatemers corresponds to a group of 10,000 or fewer target polynucleotides, and further wherein individual concatemers of the plurality are characterized in that:

(a) they comprise two or more copies of a sequence repeat, wherein all of the copies correspond to the same target polynucleotide; and

(b) the junction between the two or more copies of the sequence repeat in one individual concatemer is different from that in another individual concatemer.

28. A system for detecting a sequence variant, the system comprising:

(a) a computer configured to receive a user request to perform a detection reaction on a sample;

(b) an amplification system that performs a nucleic acid amplification reaction on the sample or a portion thereof in response to the user request, wherein the amplification reaction comprises the steps of: (i) circularizing individual polynucleotides to form a plurality of circular polynucleotides, each of which has a junction between the 5' end and the 3' end; and (ii) amplifying the circular polynucleotides;

(c) a sequencing system that generates sequencing reads for the polynucleotides amplified by the amplification system, compares the sequencing reads to a reference sequence to identify sequence differences between them, and calls a sequence difference a sequence variant only when the sequence difference, relative to the same reference sequence, is present in at least two circular polynucleotides having different junctions; and

(d) a report generator that sends a report to a recipient, wherein the report contains results for the detection of the sequence variant.

29. The system of claim 28, wherein the recipient is the user.

30. A computer-readable medium comprising code that, upon execution by one or more processors, implements a method of detecting a sequence variant, the method comprising:

(a) receiving a customer request to perform a detection reaction on a sample;

(b) performing a nucleic acid amplification reaction on the sample or a portion thereof in response to the customer request, wherein the amplification reaction comprises the steps of: (i) circularizing individual polynucleotides to form a plurality of circular polynucleotides, each of which has a junction between the 5' end and the 3' end; and (ii) amplifying the circular polynucleotides;

(c) performing a sequencing analysis comprising the steps of: (i) generating sequencing reads for the polynucleotides amplified in the amplification reaction; (ii) comparing the sequencing reads to a reference sequence to identify sequence differences between them; and (iii) calling a sequence difference a sequence variant only when the sequence difference, relative to the same reference sequence, is present in at least two circular polynucleotides having different junctions; and

(d) generating a report containing the results of the detection of the sequence variant.