CN1771336A - Methods and means for nucleic acid sequencing - Google Patents


Info

Publication number
CN1771336A
Authority
CN
China
Prior art keywords
nucleotide
nucleic acid
complementary
chain
templated
Legal status
Pending
Application number
CNA2004800097143A
Other languages
Chinese (zh)
Inventor
S. Linnarsson
Current Assignee
GENIZON SVENSKA AB
Original Assignee
GENIZON SVENSKA AB
Application filed by GENIZON SVENSKA AB

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Nucleic acid sequencing-by-synthesis. Primed synthesis of a second strand complementary to a template strand in repeated sets of steps, each step comprising providing one or more of the possible nucleotide complementarity classes for incorporation into the synthesized strand, and each set of steps comprising providing all four possible nucleotide complementarity classes. Three of the four possible nucleotide complementarity classes may first be provided for incorporation into the synthesized strand, then separately the fourth nucleotide complementarity class alone. Also, a DNA molecule consisting of a stem portion and first and second loop portions, wherein the stem portion consists of a first strand and a second strand, wherein the first strand and second strand are equal in length, complementary and annealed together, wherein the first loop portion joins the 3' end of the first strand to the 5' end of the second strand and the second loop portion joins the 3' end of the second strand to the 5' end of the first strand so the DNA molecule has no free 5' or 3' ends, and uses thereof, especially in sequencing.

Description

Methods and kits for nucleic acid sequencing
The present invention relates to nucleic acid sequencing. In particular, it relates to "sequencing by synthesis" (SBS), in which a nucleic acid strand with a free 3' end is annealed to a nucleic acid containing the template whose sequence information is required and primes synthesis of a second strand; nucleotide incorporation is then determined to provide sequence information. The invention is based in part on an elegant concept that allows non-blocked nucleotides to be used in so-called "chroma sequencing". This overcomes various problems of existing sequencing technologies and allows very large amounts of sequence to be obtained in a single working day using standard reagents and equipment. Preferred embodiments provide further benefits. The invention also relates to algorithms and techniques for sequence analysis, and to apparatus and systems for sequencing. It allows large-scale sequencing to be automated using only standard benchtop equipment readily available in the art.
The invention relates to primed synthesis of a second strand complementary to a template strand in repeated sets of steps. Each step comprises providing one or more, but optionally fewer than all, of the possible nucleotide complementarity classes for incorporation into the synthesized strand, and each set of steps comprises providing all four possible nucleotide complementarity classes, optionally in two or more steps, wherein at least one step comprises adding more than one nucleotide complementarity class. Preferably, three of the four possible complementarity classes are provided first for incorporation into the synthesized strand, and the fourth class is then provided alone. Chain extension stops once the nucleotides of the last step have been incorporated (for example, when the fourth nucleotide is provided), because no other nucleotides are present. Determining the number, and optionally the identity, of the nucleotides incorporated between terminations allows information about the base composition and/or sequence of the template to be determined rapidly. When a single "terminating nucleotide" is used each time, four rounds of extension, each using a different one of the four nucleotides as the terminator, provide information from which the whole template sequence can be determined quickly and easily.
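As a rough illustration of the "3-1" scheme just described, the sketch below simulates one run in which a single nucleotide serves as the terminator. It is a minimal sketch under idealized assumptions that are not part of the source: every template copy extends with perfect efficiency and fidelity, the input string is the strand being synthesized (written 5'→3') rather than the template, and the names `chroma_read` and `render_read` are hypothetical. The rendered string uses dashes for inserted bases and letters for terminators, in the style described for Fig. 1.

```python
from typing import List, Tuple

BASES = "ACGT"


def chroma_read(strand: str, terminator: str) -> List[Tuple[int, int]]:
    """Idealized '3-1' run: one (inserted, terminators) pair per cycle.

    strand: sequence of the strand being synthesized, 5'->3' (assumption).
    Step 1 supplies the three non-terminator classes, so the chain extends
    until the template next calls for the terminator. Step 2 supplies the
    terminator alone, so the chain extends through the run of consecutive
    terminators and then stops, because no other nucleotide is present.
    """
    if terminator not in BASES:
        raise ValueError("terminator must be one of A, C, G, T")
    read: List[Tuple[int, int]] = []
    pos, n = 0, len(strand)
    while pos < n:
        start = pos
        while pos < n and strand[pos] != terminator:  # step 1: three classes
            pos += 1
        inserted = pos - start
        start = pos
        while pos < n and strand[pos] == terminator:  # step 2: terminator only
            pos += 1
        read.append((inserted, pos - start))
    return read


def render_read(read: List[Tuple[int, int]], terminator: str) -> str:
    """Dashes for inserted bases, letters for terminators."""
    return "".join("-" * ins + terminator * term for ins, term in read)


if __name__ == "__main__":
    for t in BASES:
        r = chroma_read("GATTACA", t)
        print(t, r, render_read(r, t))
```

Each cycle costs two steps regardless of how many bases it covers, and a run of identical terminators is reported as a single count, just as a run of inserted bases is.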
Although many different methods are used in genome research, direct sequencing is by far the most valuable. Indeed, if sequencing could be made efficient enough, all three major scientific problems in genomics (sequencing, genotyping and gene expression analysis) could be solved. Model species could be sequenced, individuals could be genotyped by whole-genome sequencing, and RNA populations could be analysed exhaustively by conversion to cDNA and sequencing (directly counting the copy number of each mRNA).
Other examples of scientific and medical problems that sequencing could solve include epigenomics (the study of methylated cytosines in the genome, by bisulfite conversion of unmethylated cytosine to uracil followed by comparison of the resulting sequence with the unconverted template sequence), protein-protein interactions (by sequencing the hits obtained in yeast two-hybrid experiments), protein-DNA interactions (by sequencing the DNA fragments obtained after chromatin immunoprecipitation), and so on. There is therefore a need for highly efficient methods of DNA sequencing.
However, high sequencing throughput is needed to replace auxiliary methods such as microarrays and PCR fragment analysis. For example, a living cell contains about 300,000 copies of messenger RNA (mRNA), each on average about 2,000 bases long. Even fully sequencing the RNA of a single cell therefore requires reading 600,000,000 nucleotides. In complex tissues composed of many different cell types the task becomes harder still, because cell-type-specific transcripts are further diluted. Meeting these demands will require a throughput of gigabases per day. The following table shows some estimates of the throughput required for each type of experiment (human unless otherwise stated):
Experiment                                     Throughput needed
Genome sequence (10× de novo)                  30 Gbp
Genome-wide polymorphisms                      3 Gbp
Complete haplotype map (200 individuals)       600 Gbp
Gene expression                                600 Mbp
Epigenomics                                    3 Gbp
10 million protein interactions                400 Mbp
Whole biosphere (one species per genus)        ~300 Tbp
The invention makes all of the above achievable at reasonable cost.
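The single-cell estimate given before the table is simple arithmetic. The sketch below reproduces it and, purely as an illustration, divides it by the ~2 million nucleotides per day quoted in the next section for the best capillary instruments. The variable names are illustrative.

```python
# Figures quoted in the text: ~300,000 mRNA copies per cell, ~2,000 bases each.
MRNA_COPIES_PER_CELL = 300_000
MEAN_MRNA_LENGTH_NT = 2_000

# Quoted daily output of the best current capillary (Sanger) instruments.
SANGER_NT_PER_DAY = 2_000_000

nucleotides_per_cell = MRNA_COPIES_PER_CELL * MEAN_MRNA_LENGTH_NT
sanger_instrument_days = nucleotides_per_cell / SANGER_NT_PER_DAY

# The per-cell total equals the 600 Mbp gene-expression row of the table.
print(f"{nucleotides_per_cell:,} nt per cell")
print(f"{sanger_instrument_days:.0f} instrument-days at Sanger throughput")
```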
Methods for DNA sequencing
Sanger sequencing with fluorescent dideoxynucleotides (Sanger et al., PNAS 74 no. 12:5463-5467, 1977) is the most widely used method and has already been successfully automated in 96- and even 384-capillary sequencers. However, the method depends on the physical separation of large numbers of fragments corresponding to each base position of the template, so it does not scale easily to ultra-high-throughput sequencing (the best current instruments produce ~2 million nucleotides of sequence per day).
Sequence can also be obtained indirectly by probing the target polynucleotide with probes selected from a set of probes.
Sequencing by hybridization uses a set of probes representing all possible sequences of a given length (i.e. the set of all k-mers, where k is limited by the number of probes that fit on a microarray surface; with 1,000,000 probes, k = 10 can be used), which is hybridized to the template. Reconstructing the template sequence from the set of hybridizing probes is very complex, and it is made harder by the inherently unpredictable nature of hybridization kinetics and by the combinatorial explosion in the number of probes needed to sequence larger templates. Even if these problems could be overcome, throughput would inevitably be low, because each template requires a microarray carrying millions of probes, and arrays usually cannot be reused.
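The probe numbers in the preceding paragraph follow from there being 4^k distinct k-mers. The short sketch below (function names are illustrative) shows that 4^10 = 1,048,576, so "1,000,000 probes" for k = 10 is a round figure, and that each extra base of probe length quadruples the array size.

```python
from itertools import product
from typing import List


def kmer_count(k: int) -> int:
    """Number of distinct DNA k-mers over the alphabet A, C, G, T."""
    return 4 ** k


def all_kmers(k: int) -> List[str]:
    """Enumerate every k-mer explicitly (only sensible for small k)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


if __name__ == "__main__":
    for k in range(8, 13):
        print(f"k={k:2d}: {kmer_count(k):>12,} probes")
```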
Nanopore sequencing (US Genomics, US Patent 6,355,420) exploits the fact that when a long DNA molecule is forced through a nanopore separating two reaction chambers, bound probes can be detected as changes in conductance between the chambers. By modifying the DNA with a subset of all possible k-mers, the sequence can be partly inferred. No workable strategy for obtaining full-length sequence has yet been proposed, although, if one were possible, the nanopore approach could in principle achieve remarkable throughput (on the order of a human genome in 30 minutes).
Many approaches to sequencing by synthesis (SBS) have already been devised.
To increase sequencing throughput, it is desirable to visualize the incorporation of each base on a large number of templates in parallel, for example templates located on a glass surface or in a similar reaction chamber. This is achieved by SBS (see e.g. Malamede et al. US4863849; Kumar US5908755). There are two routes to SBS: detecting a by-product released by each incorporated nucleotide, or detecting a permanently attached label.
Pyrosequencing (e.g. WO9323564) determines the template sequence by detecting the by-product of each monomer incorporation, in the form of inorganic pyrophosphate (PPi). To keep the reactions on all template molecules synchronized, one monomer is added at a time, and unincorporated monomer is degraded before the next addition. However, homopolymer sequences (runs of the same monomer) cause problems, because multiple incorporation cannot be prevented. Synchronization is eventually lost (because the cumulative non-incorporation or misincorporation on a small fraction of the templates finally overwhelms the true signal), and the best current systems can read only about 20-30 bases, with a combined throughput of about 200,000 bases per day.
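The homopolymer problem can be seen in an idealized flow simulation: one nucleotide is supplied per flow, in a fixed cyclic order, and the chain extends through the whole matching run. The flow order "ACGT", the assumption of perfect efficiency and the function name `flowgram` are illustrative choices, not details of the cited system.

```python
from typing import List, Tuple


def flowgram(strand: str, flow_order: str = "ACGT", max_flows: int = 1000) -> List[Tuple[str, int]]:
    """Idealized one-nucleotide-per-flow extension of the strand being synthesized.

    In each flow a single nucleotide is supplied and the chain extends through
    the entire run of that nucleotide, so a homopolymer gives one signal whose
    size must encode the run length.
    """
    pos = 0
    out: List[Tuple[str, int]] = []
    for i in range(max_flows):
        if pos >= len(strand):
            break
        base = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(strand) and strand[pos] == base:
            pos += 1
            run += 1
        out.append((base, run))
    return out


if __name__ == "__main__":
    print(flowgram("GATTACA"))
```

For GATTACA the two T bases are reported together in a single flow, and 7 of the 13 flows needed for 7 bases incorporate nothing; every such flow is still a step in which synchrony can be lost.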
Whereas Sanger sequencing requires elaborate hardware (i.e. a capillary) for each template, pyrosequencing is easily parallelized in a single reaction chamber. US6274320 describes the use of rolling circle amplification to produce linear single-stranded DNA molecules containing tandem repeats; the DNA molecules are attached to optical fibres and analysed in pyrosequencing reactions, which can then be run in parallel. In principle, the throughput of such a system is limited only by surface area (number of template molecules), reaction speed and imaging equipment (resolution). However, the need to prevent PPi from diffusing away from the detector before it is converted into a detectable signal means that the number of reaction sites must in practice be limited. In US6274320, each reaction is confined to a microscale reaction vessel at the tip of an optical fibre, which limits the number of sequences to one per fibre.
Even more limiting is the short read length achieved by pyrosequencing (<30 bp). Such short sequences are of no direct use in genome sequencing, and the complexity of balancing the reactions makes it difficult to extend the read length further. A longest read length of 100 bp has been reported only occasionally, and only for particular templates.
A similar scheme that detects released labels is described in US6255083. WO01/23610 describes a scheme in which nucleotides are added sequentially and the label cleaved off by an exonuclease is detected immediately.
The advantage in principle of detecting a released label or by-product is that the template remains free of label in subsequent steps. However, because the signal diffuses away from the template, such sequencing schemes may be difficult to parallelize on a solid surface such as a microarray.
Instead of detecting a released by-product, one can detect each incorporated nucleotide as it is added to the growing polymer. In principle such a scheme would proceed like pyrosequencing (adding one base at a time, cycling through the four natural nucleotides), but using labelled (i.e. fluorescent) nucleotide analogues instead. For example, polony sequencing (Mitra RD, Church GM, "In situ localized amplification and contact replication of many individual DNA molecules", Nucleic Acids Res 1999 Dec 15; 27(24): e34) is based on the sequential addition of fluorescently labelled nucleotides.
Detecting the label attached to each incorporated nucleotide poses an additional difficulty: the signal produced in each step must be removed, computationally subtracted or physically quenched in preparation for the next step. Such removal can be achieved, for example, by photobleaching or by using a cleavable linker between the nucleotide and the label. For example, polony sequencing uses specially designed fluorescent nucleotides carrying a disulfide linker between the nucleotide and the fluorescent dye. According to unpublished observations, this linker can be cleaved efficiently with a reducing agent such as dithiothreitol, yielding nucleotides that are at least 99.8% pure.
Because read length in SBS methods is limited mainly by the loss of synchrony that occurs at each step, it is desirable to add all four nucleotides to the sequencing reaction while retaining the ability to stop the reaction after each base incorporation. In that way all four nucleotides are always available (limiting misincorporation), and each incorporated base can be monitored.
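To make the link between steps and read length concrete, here is a deliberately simple model that is an assumption, not something the source specifies: each step independently leaves a fixed fraction of template copies in phase, so the in-phase fraction decays geometrically with the number of steps. Under that model the usable read length is set by the number of steps rather than the number of bases. The function names are hypothetical.

```python
import math


def in_phase_fraction(step_efficiency: float, n_steps: int) -> float:
    """Fraction of copies still synchronized after n_steps (geometric model)."""
    return step_efficiency ** n_steps


def max_steps(step_efficiency: float, min_fraction: float) -> int:
    """Largest number of steps that keeps the in-phase fraction >= min_fraction."""
    if not (0 < step_efficiency < 1 and 0 < min_fraction <= 1):
        raise ValueError("efficiency must be in (0, 1) and min_fraction in (0, 1]")
    return math.floor(math.log(min_fraction) / math.log(step_efficiency))


if __name__ == "__main__":
    for eff in (0.98, 0.99, 0.995):
        print(eff, max_steps(eff, 0.5))
```

In this model, a scheme that spends fewer steps per base reads proportionally further before the same fraction of copies has dropped out of phase.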
Many investigators have independently conceived a solution sometimes called the base addition sequencing strategy (BASS). Using 3'-blocked monomers prevents the reaction from proceeding more than one step at a time, but the blocking moiety is labile (e.g. photocleavable or chemically degradable), so the 3'-OH group can be exposed in preparation for the next synthesis step.
BASS comprises:
1. providing a single-stranded template and an annealed primer;
2. adding 3'-OH-blocked fluorescent nucleotides;
3. adding polymerase to incorporate a single nucleotide;
4. reading the fluorescence;
5. removing the blocking group, for example by photocleavage;
6. repeating steps 2-5.
Variants of this scheme use permanently 3'-OH-blocked nucleotides that are removed by an exonuclease (WO01/23610, WO93/21340), or labile 3'-OH-blocked nucleotides that can revert to a functional 3'-OH group (US5302509, WO00/50642, WO91/06678, WO93/05183).
All BASS schemes share the following general features:
Blocked or terminating nucleotides are used to prevent synthesis from proceeding more than one step at a time.
The nucleotide incorporated in each step is also labelled, usually with a fluorescent dye.
At the end of each cycle, the blocking moiety (or the whole terminal nucleotide) is removed in preparation for the next cycle.
Together, these requirements place demands on the enzymes used in BASS that are difficult to meet:
They must accept nucleotides that are both 3'-blocked (a modification that enzymes usually do not tolerate) and fluorescently labelled.
They must incorporate such nucleotides efficiently enough that only a negligible fraction of the templates falls out of synchrony in each cycle.
They must discriminate strictly on the base pairing of such nucleotides.
They must not remove the blocking group or the terminating nucleotide prematurely.
The fact that nobody has yet made BASS work suggests that these difficulties are insurmountable. For example, Metzker et al. ("Termination of DNA synthesis by novel 3'-modified-deoxyribonucleoside 5'-triphosphates", Nucleic Acids Res 1994; 22(20): 4259-67) found that none of the 8 enzymes studied tolerated 3'-blocked dUTP and 3'-blocked dCTP, even without the added complication of fluorescent labels. Finding an enzyme that accepts all four nucleotides in 3'-blocked, fluorescently labelled form therefore appears almost hopeless.
In short, if a sequencing method based on detecting incorporation could be made to work, millions of templates attached to a surface could be sequenced in parallel. The main attraction of detecting incorporation rather than released labels is that the reactions can run in parallel on a surface. For example, on a 10 × 10 cm surface such a system could sequence, say, 37,000,000 templates at about 600,000 bp/s with 60 s per cycle (assuming a Poisson distribution at 1 template/10 μm), achieving 50 Gb per 24 hours. In principle, such a system could sequence ten human genomes per day. The cost of the system would be comparable to that of a fluorescence scanner, and its running costs comparable to those of current Sanger sequencers.
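The throughput figures in the paragraph above can be reproduced with a few lines of arithmetic, provided "1 template/10 μm" is read as one template on average per 10 μm × 10 μm area. That reading, counting only sites that hold exactly one template (Poisson probability e^-1 at a mean of one), and reading one base per template per 60 s cycle are assumptions of this sketch rather than statements in the source.

```python
import math

# Surface: 10 cm x 10 cm, expressed in micrometres.
SIDE_UM = 10 * 10_000                 # 10 cm = 100,000 um
AREA_UM2 = SIDE_UM ** 2               # 1e10 um^2

# Assumption: "1 template/10 um" = on average one template per 10 um x 10 um site.
SITE_UM2 = 10 * 10
MEAN_PER_SITE = 1.0

sites = AREA_UM2 / SITE_UM2
# Poisson P(X = 1): only sites carrying exactly one template give a clean read.
p_single = MEAN_PER_SITE * math.exp(-MEAN_PER_SITE)
usable_templates = sites * p_single

# Assumption: one base read per template per 60 s cycle.
CYCLE_S = 60
bp_per_s = usable_templates / CYCLE_S
bp_per_day = bp_per_s * 24 * 3600

print(f"usable templates: {usable_templates:,.0f}")
print(f"throughput: {bp_per_s:,.0f} bp/s, {bp_per_day / 1e9:.0f} Gb per 24 h")
```

Under these assumptions the results land close to the quoted 37 million templates, ~600,000 bp/s and 50 Gb per 24 hours.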
The main remaining obstacles to reaching this goal are, first, that read lengths in SBS are too short to be useful for sequencing large genomes and, second, that no reliable way has yet been developed to place templates on a surface at sufficiently high density.
The present invention elegantly solves these problems of the prior art in a number of respects.
Brief description of the drawings
Fig. 1 shows a template (the top line shows the sequenced strand) sequenced by chroma sequencing using each natural nucleotide (shown on the left) as the terminating nucleotide. Each chroma sequence is represented as a series of dashes (the measured number of inserted bases) and letters (the measured number of consecutive terminating nucleotides). As the figure shows, the original sequence can be obtained by aligning the reads in register.
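The alignment described for Fig. 1 can be sketched as overlaying the four dash-and-letter reads position by position: in each read a letter marks a terminator at that position, so exactly one of the four reads should call each base. This is a minimal sketch assuming error-free, full-length reads of equal length. The function `merge_chroma_reads` is hypothetical, and the example reads were constructed by hand for the strand GATTACA rather than taken from the figure.

```python
from typing import Mapping


def merge_chroma_reads(reads: Mapping[str, str]) -> str:
    """Overlay chroma reads, one per terminating nucleotide.

    reads maps terminator -> read string, where '-' marks an inserted
    (non-terminator) base and the terminator letter marks its own position.
    """
    lengths = {len(r) for r in reads.values()}
    if len(lengths) != 1:
        raise ValueError("reads must have the same length to be aligned")
    (n,) = lengths
    seq = []
    for i in range(n):
        calls = {t for t, r in reads.items() if r[i] == t}
        if len(calls) != 1:
            raise ValueError(f"position {i}: expected one call, got {sorted(calls)}")
        seq.append(calls.pop())
    return "".join(seq)


if __name__ == "__main__":
    example = {"A": "-A--A-A", "C": "-----C-", "G": "G------", "T": "--TT---"}
    print(merge_chroma_reads(example))
```

Conflicting or missing calls raise an error instead of being resolved silently, since either would indicate a misread in at least one of the four runs.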
Fig. 2 shows, for the nucleotide incorporation assay of Example II, the fluorescence (arbitrary units) after attempted incorporation of dTTP (Cy3-labelled), dATP and dGTP with and without DNA polymerase (Klenow). The expected result is the incorporation of two dTTPs, and the figure clearly demonstrates that such an incorporation event produces enough signal to be detected reliably above background noise.
Fig. 3 shows an embodiment of a reaction chamber for solid-phase chroma sequencing, suitable for use in a standard microarray scanner. The drawing shows a reaction chamber assembly using a standard 25 × 75 mm glass slide (1), onto which templates can be spotted or attached at random. During the reaction, a rubber gasket (2) seals the glass to the reaction chamber. The inlet (3) and outlet (4) are connected via connectors (5) to a reagent distribution system such as that illustrated in Fig. 4.
Fig. 4 shows an embodiment of a reagent distribution system for performing chroma sequencing, suitable for use with the reaction chamber of Fig. 3. A 10-port valve (1) allows reagents to be distributed into and out of the chamber (2) and to a waste tube (6), and up to eight reagent containers (3) can hold the different reagents and wash buffers required by any given chroma sequencing protocol. The syringe pump (4) and valve (1) can easily be motorized and computer-controlled together with the scanner (5; a partial view of the slide stage is shown) to give a fully automated system.
The invention is based on the development of a novel sequencing strategy that improves on previously described sequencing-by-synthesis methods while avoiding most of their difficulties. The strategy is easy to parallelize, directly visualizes the incorporation of each monomer (i.e. no size fractionation is needed), and offers the possibility of long read lengths.
The invention is based on the realization that, contrary to what has been assumed, it is not necessary in SBS methods to stop at every position (either by adding one base at a time, as in pyrosequencing or WO01/23610, or by using blocked nucleotides, as in BASS).
Instead, sequencing can proceed in jumps, from each occurrence of a particular "terminating" nucleotide to the next. The inserted nucleotides can be labelled, and so can the terminating nucleotides. This offers an improvement that may be an ideal compromise between two kinds of scheme: those using blocking groups (in which every step is productive, but removing the block is a problem) and those achieving synchrony by adding one base at a time (which avoid deblocking at the cost of more non-productive steps, aggravating the loss of synchrony). Likewise, compared with BASS, the invention eliminates the need to place the label on the same nucleotide as the blocking group.
One aspect of the invention provides a method of sequencing by synthesis, characterized in that nucleotides are incorporated stepwise, wherein one step potentially allows incorporation of more than one kind of nucleotide.
In preferred embodiments, a step potentially allows incorporation of three of the four possible nucleotides, depending on the underlying template sequence. Preferably, a different step allows incorporation of the fourth possible nucleotide, i.e. the remaining one that differs from the three that could potentially be incorporated in the first step.
In other embodiments, different steps are performed so as to allow incorporation of all four nucleotides within a set of steps, but at least one step allows incorporation of more than one, but fewer than all, of the possible nucleotides. As discussed further below, prior-art methods can be summarized as either having a set of four different repeated steps that can be cycled, each step in principle allowing incorporation of only one of the four nucleotides (the actual number incorporated depending on the underlying template sequence), or having a single repeated step comprising all four blocked nucleotides, which again allows incorporation of only one of the four nucleotides in each step; both can be summarized as "1-1-1-1" methods. A single step that in principle allows incorporation of all four nucleotides, which could be summarized as a "4" method, cannot be used for sequencing, because the strand being sequenced would immediately be polymerized to the end of the template. In different embodiments, the invention allows sequencing-by-synthesis methods to be carried out that are characterized by incorporating nucleotides in steps following a pattern other than "4" or "1-1-1-1". Thus, in preferred embodiments, nucleotides are incorporated in a set of steps following "3-1", as already mentioned. In other embodiments, a set of steps follows "2-2" or "1-2-1", or an irregular pattern in which nucleotides may be repeated within a set of steps (e.g. "2-2-3"). The sets of steps are cycled as required. Sets of steps with different patterns can also be combined.
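The step patterns named above ("3-1", "2-2", "1-2-1", "1-1-1-1" and "4") can be compared with an idealized simulation in which each step supplies a set of nucleotide classes and the chain extends for as long as the next required base is in that set. This is a sketch rather than the patent's own algorithm: efficiency is assumed to be perfect, the input is the strand being synthesized, and `run_step_groups` is a hypothetical name.

```python
from typing import AbstractSet, List, Sequence


def run_step_groups(strand: str, group: Sequence[AbstractSet[str]], max_cycles: int = 1000) -> List[int]:
    """Cycle a set of steps over the strand being synthesized.

    Each step supplies a set of nucleotide classes; the chain extends while
    the next required base is in that set. Returns the number of bases
    incorporated in each step, in order, until the strand is complete.
    """
    pos, counts = 0, []
    for _ in range(max_cycles):
        for step in group:
            start = pos
            while pos < len(strand) and strand[pos] in step:
                pos += 1
            counts.append(pos - start)
            if pos == len(strand):
                return counts
    raise RuntimeError("strand not completed within max_cycles")


PATTERNS = {
    "3-1 (A terminates)": [{"C", "G", "T"}, {"A"}],
    "2-2": [{"A", "C"}, {"G", "T"}],
    "1-2-1": [{"A"}, {"C", "G"}, {"T"}],
    "1-1-1-1": [{"A"}, {"C"}, {"G"}, {"T"}],
    "4": [{"A", "C", "G", "T"}],
}

if __name__ == "__main__":
    for name, group in PATTERNS.items():
        counts = run_step_groups("GATTACA", group)
        print(f"{name:20s} {len(counts):2d} steps {counts}")
```

For GATTACA, the "3-1" group reports [1, 1, 2, 1, 1, 1] in six steps, "1-1-1-1" needs thirteen steps, and "4" finishes in a single step that reports only the total length, which is why it cannot be used for sequencing.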
According to one aspect of the invention, there is provided a method of determining nucleic acid sequence and/or base composition information, the method comprising:
(i) providing a nucleic acid comprising a first strand, the first strand comprising a nucleic acid template, wherein a free 3' end of a nucleic acid strand annealed to the first strand of the nucleic acid template allows extension of a nucleic acid strand complementary to the nucleic acid template, by template-sequence-dependent incorporation of nucleotides into the strand complementary to the nucleic acid template by a template-dependent nucleic acid polymerase;
(ii) performing a set of one or more steps, the set being cycled for a desired number of cycles or performed in combination with one or more other sets of one or more steps, to extend the nucleic acid strand complementary to the nucleic acid template, thereby allowing information representing the base composition or sequence of the nucleic acid to be obtained,
wherein a step comprises:
(a) in the presence of:
the nucleic acid comprising a first strand, the first strand comprising a nucleic acid template,
the free 3' end of the nucleic acid strand annealed to the first strand of the nucleic acid template, and
a template-dependent nucleic acid polymerase;
providing nucleotides selected from one, two, three or four nucleotide complementarity classes for template-dependent incorporation by the nucleic acid polymerase into the nucleic acid strand complementary to the nucleic acid template, wherein each nucleotide is a natural nucleotide or a nucleotide analogue that can be incorporated template-dependently by the nucleic acid polymerase into a DNA strand at the free 3' end of the nucleic acid strand, and wherein, within each nucleotide complementarity class, the nucleotides and nucleotide analogues are complementary to one of adenine (A), cytosine (C), thymine (T) and guanine (G);
and
(b) removing or inactivating unincorporated nucleotides;
and
wherein, in a set of steps:
nucleotides selected from all four nucleotide complementarity classes are provided and are available for template-dependent incorporation,
in at least one step, nucleotides selected from more than one, optionally two, three or four, nucleotide complementarity classes are provided and are available for template-dependent incorporation, and the nucleotides of at least one nucleotide complementarity class, if incorporated into the nucleic acid strand complementary to the nucleic acid template, allow further extension of that strand, and
optionally, no nucleotide complementarity class is provided in more than one step of the set; and
wherein, if nucleotides selected from all four complementarity classes are provided in one step, the nucleotides of one, two or three of the nucleotide complementarity classes, if incorporated into the nucleic acid strand complementary to the nucleic acid template, prevent further extension of that strand, in all copies present where multiple copies are present;
(iii) performing multiple sets of said steps, by cycling said set of steps and/or by performing said set of steps in combination with a different set of steps;
(iv) determining the identity and/or amount of the nucleotides incorporated into the nucleic acid strand complementary to the nucleic acid template in at least one set of steps, by determining the identity and/or amount of the nucleotides incorporated in at least one step of each set, so that the identity and/or amount of the nucleotides incorporated in that set is determined.
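The conditions on a set of steps in (ii) can be paraphrased as a check on a proposed step pattern. The sketch below is an illustrative paraphrase only, not a construction of the claim: `is_claimed_step_group` is a hypothetical name, and the `terminating` argument (the classes supplied as chain-terminating nucleotides, treated as fixed for the whole set) is a simplification introduced here.

```python
from typing import AbstractSet, Sequence

CLASSES = frozenset("ACGT")


def is_claimed_step_group(
    group: Sequence[AbstractSet[str]],
    terminating: AbstractSet[str] = frozenset(),
    each_class_once: bool = False,
) -> bool:
    """Paraphrased check of the step-set conditions (illustration only)."""
    # Every step supplies at least one class, drawn only from A, C, G, T.
    if not group or any(not step or not set(step) <= CLASSES for step in group):
        return False
    # Across the set of steps, all four classes are supplied.
    if set().union(*group) != CLASSES:
        return False
    # At least one step supplies more than one class, and in at least one such
    # step some supplied class is not chain-terminating (extension can continue).
    multi = [set(step) for step in group if len(step) > 1]
    if not any(step - set(terminating) for step in multi):
        return False
    # A step supplying all four classes needs one to three terminating classes.
    n_term = len(set(terminating) & CLASSES)
    if any(set(step) == CLASSES for step in group) and not 1 <= n_term <= 3:
        return False
    # Optional condition: no class is supplied in more than one step.
    if each_class_once and sum(len(step) for step in group) != len(CLASSES):
        return False
    return True
```

With natural nucleotides only, "3-1", "2-2" and "1-2-1" pass, while "1-1-1-1" (no multi-class step) and "4" (all four classes with nothing terminating) fail, matching the distinction drawn earlier in the description.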
As indicated, the invention allows sequencing without size fractionation.
The free 3' end of the nucleic acid annealed to the first strand, positioned 5' of the nucleic acid (e.g. DNA) template about which sequence and/or base composition information is desired, may be provided by a primer (e.g. an oligonucleotide primer) annealed to the first strand; by a nick in a second strand annealed to the first strand (in which case the part of the second strand initially annealed to the nucleic acid template is displaced or degraded during extension); or by a self-priming loop, i.e. the first strand loops back on itself to allow self-primed continuation.
Nucleotides or nucleotide analogues can be defined by their base-pairing properties. Thus, all nucleotides or nucleotide analogues incorporated opposite natural adenine belong to the thymine nucleotide complementarity class; those incorporated opposite natural guanine belong to the cytosine class; those incorporated opposite natural thymine belong to the adenine class; and those incorporated opposite natural cytosine belong to the guanine class. The nucleotide complementarity class thus specifies and defines the logical identity of a nucleotide or nucleotide analogue with respect to template-directed polymerization.
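The class definitions above amount to Watson-Crick pairing. A minimal sketch (names are illustrative) maps each natural template base to the class of the nucleotide incorporated opposite it. Because a polymerase reads the template 3'→5' while synthesizing 5'→3', the template string is given here in 3'→5' order.

```python
# Watson-Crick pairing for the four natural DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementarity_class(template_base: str) -> str:
    """Class of the nucleotide incorporated opposite a natural template base.

    Opposite A -> thymine class, opposite G -> cytosine class,
    opposite T -> adenine class, opposite C -> guanine class.
    """
    return PAIR[template_base.upper()]


def synthesized_strand(template_3to5: str) -> str:
    """Classes incorporated, in order (5'->3'), when copying a template read 3'->5'."""
    return "".join(complementarity_class(b) for b in template_3to5)


if __name__ == "__main__":
    print(synthesized_strand("CTAATGT"))
```

Any analogue, labelled or not, is assigned to a class by what it pairs with, not by its chemical structure.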
Nucleotides are made available for potential incorporation by the template-dependent polymerase by providing them in the reaction medium.
The nucleic acid template may be deoxyribonucleic acid (DNA), the nucleic acid polymerase a DNA-dependent DNA polymerase, and the nucleotides deoxyribonucleotides or deoxyribonucleotide analogues.
The nucleic acid template may be DNA, the nucleic acid polymerase a DNA-dependent ribonucleic acid (RNA) polymerase, and the nucleotides ribonucleotides or ribonucleotide analogues.
The nucleic acid template may be RNA, the nucleic acid polymerase a reverse transcriptase, and the nucleotides deoxyribonucleotides or deoxyribonucleotide analogues.
In preferred embodiments of all aspects of the invention, the nucleotides used in a step in which more than one different nucleotide may potentially be incorporated are selected from the standard nucleotides.
In certain preferred embodiments of all aspects of the invention, the nucleotide used in a step in which only one of the different nucleotides may potentially be incorporated is selected from the standard nucleotides.
In other embodiments, modified nucleotides or analogues may be employed, as discussed further elsewhere herein.
Nucleotides used in the invention may be labelled, for example with fluorescent labels. Different nucleotides (e.g. of the A, C, G and T complementarity classes) may carry different labels, for example fluorescent labels of different colours.
As indicated, the invention provides a method of sequencing by synthesis characterized in that nucleotides are incorporated according to a scheme other than "4" (all four nucleotides added together) or "1-1-1-1" (one nucleotide added at a time).
Thus, in a preferred incorporation scheme, a first step allows 2 or 3 nucleotides to be potentially incorporated; then, generally after a washing step to remove unincorporated nucleotides, a further step allows 2 nucleotides or 1 nucleotide to be potentially incorporated. Groups of such steps can be combined to give an overall reaction scheme.
Suitable conditions for template-dependent incorporation of nucleotides at the 3' end of the DNA strand are, of course, provided in the reaction medium according to the knowledge and techniques available in the art.
In one embodiment, the invention provides a method comprising a cycle of steps, or group of steps, as follows. A DNA template is provided in which the free 3' end of a nucleic acid strand annealed to the first strand of the template (e.g. an annealed primer) allows synthesis of a DNA strand complementary to the template. In a first step, a set of labelled nucleotides (termed "insertion" nucleotides) is added in the presence of a polymerase, under conditions in which nucleotides are incorporated into the extending strand complementary to the template, followed by washing to remove unincorporated nucleotides. In a second step, a second set of labelled nucleotides ("terminating" nucleotides) is added in the presence of a polymerase, under conditions in which nucleotides are incorporated into the primer-based extending strand, again followed by washing to remove unincorporated nucleotides, and the labels of the incorporated nucleotides are determined. This group of steps can be repeated for as many cycles as required.
Thus, in each step, the number (but not the order) of incorporated nucleotides is determined. If the labels of the different nucleotides are distinguishable, the number (but not the order) of each nucleotide species incorporated is determined.
The information about incorporated nucleotides obtained in this way, i.e. by determining the labels, is referred to as a chromatogram. A chromatogram is not a standard DNA sequence, but:
it can be used as a signature sequence and compared with known DNA sequences;
a set of (normally) four such sequences can be reassembled into a standard DNA sequence, as explained further herein.
Embodiments of the invention, and the concept of the chromatogram, can be illustrated by a typical sequence obtained using dA, dC and dG as insertion nucleotides and dT as the terminating nucleotide, written for example as follows:
dT[1A,2C,1G,1T]-[2A,2C,1G,3T]-[2A,2C,1G,1T]-[0A,1C,0G,1T]
where the numbers in brackets give the abundance of each insertion nucleotide between successive occurrences of dT, as measured by label intensity, together with the number of consecutive dTs.
Several dna sequence dnas can produce these data, for example:
ACCGTGCACATTTACAGCTCT
CAGCTCCAAGTTTCACGATCT
Deng
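As an illustration, the bracket notation can be generated from a synthesized strand with a short Python sketch. It assumes ideal, noise-free incorporation and takes the newly synthesized strand (not the template) as input; the function name `chromatogram_row` is hypothetical. It reproduces the dT-terminated data above and shows that the two example sequences cannot be distinguished using this reaction alone.

```python
from collections import Counter

BASES = "ACGT"

def chromatogram_row(strand, terminator):
    """Noise-free read of one reaction: one (A, C, G, T) count tuple per cycle.

    Each cycle counts the insertion nucleotides added before the polymerase
    stalls, plus the length of the following terminator homopolymer
    (0 if the strand ends before another terminator is needed).
    """
    row, i = [], 0
    while i < len(strand):
        counts = Counter()
        while i < len(strand) and strand[i] != terminator:  # insertion step
            counts[strand[i]] += 1
            i += 1
        while i < len(strand) and strand[i] == terminator:  # termination step
            counts[terminator] += 1
            i += 1
        row.append(tuple(counts[b] for b in BASES))
    return row

print(chromatogram_row("ACCGTGCACATTTACAGCTCT", "T"))
# [(1, 2, 1, 1), (2, 2, 1, 3), (2, 2, 1, 1), (0, 1, 0, 1)]
```

Because the order of bases inside each bracket is lost, a single reaction yields only a signature; the four reactions together are needed to recover the order.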
A base-calling strategy is provided below which uses the information obtained from four such sequence reads, or chromatograms (using each of the four nucleotides in turn as the terminating nucleotide), to determine the original sequence unambiguously.
In one aspect, a preferred embodiment of the invention provides a method (Scheme I) comprising:
1. Providing a single-stranded template with an annealed DNA strand having a 3' end that acts as a primer.
2. Adding a set of one or more labelled nucleotides (termed "insertion nucleotides"), selected so that at least one nucleotide complementary to the template (termed the "terminating nucleotide") is excluded from the set. Typically, three nucleotides carrying distinguishable labels are added (the fourth natural nucleotide being the terminating nucleotide).
3. Optionally, adding one or more blocked nucleotides (different from the labelled nucleotides). These are also "terminating nucleotides". Examples include 3'-O-modified nucleotides, which may carry a photocleavable group that leaves a 3'-OH on irradiation, or nucleotides with other modifications, e.g. acyclic nucleotides and dideoxynucleotides.
4. Optionally, adding one or more non-incorporable inhibitor nucleotides (different from the labelled and blocked nucleotides), which act to prevent misincorporation at template positions whose complement is absent from the labelled or blocked nucleotide set. Examples include nucleoside 5'-di- and monophosphates and nucleoside 5'-(α,β-methylene)triphosphates.
5. Incubating with a suitable polymerase under conditions that cause nucleotides to be added to the growing strand.
6. Washing away unincorporated nucleotides.
7. If any blocked nucleotides were added in step 3, then either:
A. removing the blocking moiety, e.g. by photocleavage, enzymatic conversion or chemical reaction; or
B. alternatively, replacing the entire nucleotide by exonuclease treatment followed by incorporation of a non-blocked nucleotide (see, e.g., WO1/23610, WO93/21340).
8. Adding the remaining nucleotides ("terminating nucleotides"), which are needed to ensure that a complement has been added for every nucleotide present in the template, and incubating with a polymerase (not necessarily the same as in step 5) under conditions that cause nucleotides to be added to the growing strand. The terminating nucleotides may optionally be labelled and/or 3'-blocked (as in BASS).
9. Washing away unincorporated nucleotides.
10. Detecting the presence and/or amount of each labelled nucleotide.
11. Optionally, removing or inactivating the labels and/or 3'-blocking groups. For example, fluorescent labels can be photobleached.
12. Repeating steps 2-11 until the desired number of cycles has been completed.
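The cycle logic of Scheme I can be sketched as an idealized simulation. It assumes perfect chemistry (no misincorporation, no loss of synchronism), exactly one terminating nucleotide, and a template written base-by-base alongside the strand it directs, as in the examples in this text (template TGCG... is copied into ACGC...). `run_scheme` and its `blocked_terminator` flag are hypothetical names; the flag models step 8 with a 3'-blocked terminating nucleotide that is unblocked in step 11, so at most one terminator is added per cycle.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"

def run_scheme(template, terminator, cycles, blocked_terminator=False):
    """Idealized Scheme I cycles; returns (A, C, G, T) incorporation counts per cycle."""
    strand = [COMPLEMENT[b] for b in template]  # bases the polymerase must add
    pos, readout = 0, []
    for _ in range(cycles):
        counts = dict.fromkeys(BASES, 0)
        # Steps 2-5: insertion nucleotides (all but the terminator) extend the
        # primer until the template calls for the missing nucleotide.
        while pos < len(strand) and strand[pos] != terminator:
            counts[strand[pos]] += 1
            pos += 1
        # Step 8: terminating nucleotide. An unblocked one fills the whole
        # homopolymer; a 3'-blocked one adds a single base per cycle.
        while pos < len(strand) and strand[pos] == terminator:
            counts[terminator] += 1
            pos += 1
            if blocked_terminator:
                break
        # Steps 10-11 (detection, label/blocking-group removal) are implicit here.
        readout.append(tuple(counts[b] for b in BASES))
    return readout

template = "TGCGATGCGTAGTCTGAAG"
print(run_scheme(template, "T", 4))
# [(1, 2, 1, 1), (2, 2, 1, 1), (2, 2, 1, 2), (0, 1, 0, 0)]
print(run_scheme(template, "T", 5, blocked_terminator=True))
# [(1, 2, 1, 1), (2, 2, 1, 1), (2, 2, 1, 1), (0, 0, 0, 1), (0, 1, 0, 0)]
```

With a blocked terminator, the TT homopolymer is read over two cycles, the second of which incorporates no insertion nucleotide.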
This sequencing method is particularly suited to parallelization on a solid phase, both because it is simple and because it provides a robust method of synchronization. The scheme can be repeated many times, restarting from a fresh primer in step 1.
The nucleotides added in steps 3 and 8 are called terminating nucleotides because, in step 5, they prevent polymerization from continuing past their complement (either by being blocked or by being absent). The set of terminating nucleotides can be varied. For example, if the reaction is performed four times starting from step 1, each of the four natural nucleotides can serve as the terminating nucleotide.
The primer anneals to the template by base complementarity, leaving a free 3' end to which nucleotides can be added one by one by a template-dependent DNA polymerase. As indicated, a free 3' end can also be produced by nicking one strand of a double-stranded DNA molecule, or by allowing the free 3' end of a single strand to loop back for self-priming.
Note: a "labelled" molecule should be understood to include both a pure labelled molecule and a mixture of labelled and unlabelled molecules. For example, labelled dTTP may be pure fluorescein-labelled dTTP, or a mixture of fluorescein-labelled dTTP and conventional unlabelled dTTP. The optimal ratio of labelled to unlabelled molecules is determined by several factors:
the need to obtain sufficient signal to overcome instrument noise. For example, on a PerkinElmer ScanArray, 2.5 fluorophores per pixel give a signal three times the noise level;
the need to avoid placing multiple fluorophores in close proximity, to avoid fluorescence resonance energy transfer (FRET, which causes one fluorophore to quench another). FRET falls off with the sixth power of distance, but can still be significant over a range of several nucleotides;
the need to avoid placing multiple fluorophores in close proximity, to avoid inhibiting subsequent incorporation of nucleotides by the polymerase (which may be hindered by the steric bulk of the fluorophores).
Alternatively, one may force the labelled fraction of nucleotides to terminate the growing strand, for example by using labelled acyclic or dideoxy nucleotides, or by placing the label on or near the 3'-OH. As long as the labelled nucleotides account for only a small fraction of the total, the signal lost through termination is unimportant, while loss of synchronism due to the enzyme's lower affinity for modified nucleotides is avoided altogether.
Work in the inventor's laboratory has found that ~2.5% or less of labelled nucleotide works well (see the Examples below). Assuming the template consists of 1000 tandem repeat copies of a 100 bp sequence, this gives at least 25 fluorophores per template for each incorporated nucleotide (i.e. on a PerkinElmer ScanArray, more than 10 times above noise if each template lies within one pixel). Assuming four nucleotides are incorporated in an average cycle, the labels are spaced on average 1000 bases apart, which avoids both quenching and polymerase inhibition.
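The labelling figures in the preceding paragraph can be reproduced with simple arithmetic. The sketch below uses the values stated in the text (1000 tandem copies of a 100 bp unit, 2.5% labelled nucleotide, four nucleotides per cycle) and additionally assumes that the labels added in one cycle are spread evenly along the concatemer; `label_budget` is a hypothetical helper.

```python
def label_budget(copies=1000, unit_bp=100, labelled_fraction=0.025,
                 nt_per_cycle=4, dyes_for_3x_noise=2.5):
    """Back-of-envelope figures for partial labelling of one tandem-repeat template."""
    dyes_per_nt = copies * labelled_fraction         # dyes per incorporated position
    concatemer_bp = copies * unit_bp                 # total length of the tandem repeat
    labels_per_cycle = copies * nt_per_cycle * labelled_fraction
    spacing_bp = concatemer_bp / labels_per_cycle    # mean gap between labels of one cycle
    dye_ratio = dyes_per_nt / dyes_for_3x_noise      # relative to the 3x-noise reference
    return dyes_per_nt, spacing_bp, dye_ratio

print(label_budget())  # (25.0, 1000.0, 10.0)
```

The third value expresses the 25 dyes as a multiple of the 2.5 dyes per pixel that give three times the noise level on the ScanArray.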
In other embodiments of the invention, Scheme I (for example) allows some of the restrictions that BASS places on the polymerase to be relaxed. If the set of insertion nucleotides is labelled but not blocked, while the terminating nucleotide is blocked but not labelled, all four nucleotides can be added as a mixture in a single step, followed by washing and scanning as above. A polymerase that accepts both blocked and labelled nucleotides can be used, or different polymerases can be used, adding the labelled insertion nucleotides in a first step and the blocked terminating nucleotide in a second step. The chromatogram of this modified scheme differs in that homopolymers are detected as adjacent cycles without incorporation of labelled insertion nucleotides: each cycle ends with the incorporation of a single terminating nucleotide, so homopolymers are scanned step by step rather than filled in a single run. In such a scheme, it may be desirable to use photocleavable fluorophores (see below) and photocleavable 3'-blocking groups. Alternatively, blocking groups removable by mild chemical treatment can be used, for example the allyl group described by Kamal et al. (Tetrahedron Letters 1999, vol. 40, pp. 371-372).
In a particularly simple embodiment, one aspect of the invention provides a method (Scheme II) comprising:
1. Providing a free 3' end on a DNA strand annealed to a single-stranded template, to function as a primer.
2. Adding three nucleotides carrying distinguishable labels, such as distinguishable fluorescent labels.
3. Optionally, adding one or more non-incorporable inhibitor nucleotides (different from the labelled nucleotides). Examples include nucleoside 5'-di- and monophosphates and nucleoside 5'-(α,β-methylene)triphosphates.
4. Incubating with a suitable polymerase under conditions that cause nucleotides to be added to the growing strand.
5. Washing away unincorporated nucleotides.
6. Adding the remaining nucleotide (labelled, e.g. fluorescently) and incubating with a polymerase (not necessarily the same as in step 4) under conditions that cause nucleotides to be added to the growing strand.
7. Washing away unincorporated nucleotides.
8. Detecting the presence and amount of each labelled nucleotide.
9. Inactivating the labels, e.g. by photobleaching (not necessarily in every cycle) or by chemical treatment, e.g. with dithiothreitol to cleave a disulfide linkage.
10. Repeating steps 2-9 until the desired number of cycles has been completed.
For example, one can use dA, dG and dC (each labelled in a distinguishable colour) in step 2, and then add dT (e.g. labelled yellow) in step 6. In step 4, any number of dA, dG and dC will be added until the polymerase reaches the first template position requiring dT, where extension stops because no complementary nucleotide is present. The dA/dG/dC fluorescence readings in step 8 will be proportional to the numbers of dA, dG and dC between successive dTs, while the fluorescence of the incorporated dT (e.g. yellow) will be proportional to the number of consecutive dTs; after spectral separation, the individual contributions can be quantified. The resulting sequence can generally be written as a series of four-member entries, each giving the number (but not the order) of dA, dG and dC between successive dTs.
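The spectral separation mentioned above can be done, for example, by linear unmixing: the measured channel intensities are modelled as a crosstalk matrix multiplied by the incorporated amounts, and solved by least squares with `numpy.linalg.lstsq`. The matrix values below are invented for illustration only; a real matrix would have to be calibrated for the dyes and instrument actually used, and the assumption of linear, additive signals ignores quenching.

```python
import numpy as np

# Hypothetical crosstalk matrix: entry [channel, dye] is the signal that one
# labelled nucleotide carrying that dye produces in that detection channel.
# Dye order: dA, dC, dG (insertion, step 2) and dT (terminator, step 6).
CROSSTALK = np.array([
    [1.00, 0.20, 0.05, 0.10],
    [0.15, 1.00, 0.20, 0.05],
    [0.05, 0.25, 1.00, 0.30],
    [0.02, 0.05, 0.35, 1.00],
])

def unmix(channels, crosstalk=CROSSTALK):
    """Least-squares estimate of incorporated amounts (dA, dC, dG, dT)."""
    amounts, _residuals, _rank, _sv = np.linalg.lstsq(crosstalk, channels, rcond=None)
    return amounts

true_amounts = np.array([1.0, 2.0, 1.0, 1.0])  # the entry [1A,2C,1G,1T]
observed = CROSSTALK @ true_amounts             # simulated, noise-free channel readings
print(np.round(unmix(observed), 3))
```

Least squares is used rather than a plain matrix inverse so that the same code also accepts more detection channels than dyes.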
For example, the sequence ACGCTACGCATCAGACTTC (i.e. template TGCGATGCGTAGTCTGAAG) can be written as [1A,2C,1G,1T]-[2A,2C,1G,1T]-[2A,2C,1G,2T]-[0A,1C,0G,0T].
By performing four different reactions according to Scheme II, varying the terminating nucleotide among the four possibilities, one can ensure that each different base is a stopping point in one of the four reactions.
Although fluorophores are convenient, not all of them are easy to bleach. Other types of label can be used in the above operations, as long as they can be removed, inactivated or computationally subtracted in each cycle. In other embodiments, however, to allow a wider choice of labels, label removal (e.g. photobleaching of fluorophores) can optionally be replaced entirely by restarting, for example as follows:
First, one cycle is performed with labelled (e.g. fluorescent) nucleotides. The newly synthesized DNA strand is then removed, e.g. by formamide treatment, and a fresh primer is annealed to restart the process. This time, one cycle is performed with unlabelled nucleotides, followed by one cycle with labelled nucleotides. The process is repeated, each time using one more cycle of unlabelled nucleotides. In this way only the last cycle after each restart is labelled, eliminating the need to remove labels from previous cycles (e.g. by bleaching fluorophores).
The same approach can also be used to skip over regions that are not of interest, much like winding a tape past the read head of a tape recorder.
As an alternative to photobleaching, modified fluorescent nucleotides carrying a cleavable linker between the nucleotide and the fluorophore can be used. For example, such nucleotides carrying a disulfide linkage, which can be cleaved efficiently by a reducing agent such as dithiothreitol, have been described (see the work of Rob Mitra and George Church on polony technology for sequencing and genotyping; details, including chemical structures, can be found on the Internet, e.g. at http://cbcg.lbl.gov/Genome9/Talks/mitra.pdf). Similarly, Li et al. (PNAS 2003, vol. 100, no. 2, pp. 414-419) describe photocleavable fluorescent nucleotides containing a photolabile 2-nitrobenzyl linker.
The method according to Scheme II offers several advantages:
Compared with current SBS methods, in which most cycles are non-productive (because a single base is added at a time and the probability that it is complementary at a given position is < 50%), the total number of cycles required to sequence n bases is n, because one of the four reactions stops at each template position (ignoring homopolymers).
Because synthesis is restarted from the primer for each of the four reactions, factors that depend mainly on the number of cycles are a four-fold smaller problem. In particular, loss of synchronism sets in after many cycles, but since all templates are effectively resynchronized for each of the four reactions, four times as many bases can be read under comparable conditions as with SBI or pyrosequencing (see the Examples below).
Applications that do not require the complete sequence (e.g. signature sequencing for gene expression, methylcytosine sequencing for epigenomics, and SNP analysis of specific SNPs) can use the partial sequence obtained from only one of the four reactions. The sequence so obtained contains the equivalent of 1 base pair of information per cycle. See Scheme III below. See also Fig. 1 for a diagram of the data obtainable from each of dA, dC, dG and dT in the different reactions. Any one of these data sets is sufficient for such purposes, for example to determine which of several possible sequences (e.g. differing in dA nucleotides) is present in a test sample.
Homopolymer stretches are always measured four times, making them easier to base-call correctly than in SBI or pyrosequencing. See Base-calling algorithm II below.
Base-calling algorithm I (elementary strategy)
This part of the disclosure sets out exemplary embodiments of aspects of the invention that relate to identifying a sequence from the information obtained by methods using terminating and insertion nucleotides as disclosed.
As described above, four reactions are performed according to Scheme II, each with a different one of the four possible terminating nucleotides, so that each base is a stopping point in one of the reactions. The table below shows the results, or chromatograms, obtained in the first four cycles with each terminating nucleotide for the sequence ACGCTACGCATCAGACTC (template TGCGATGCGTAGTCTGAG):
Terminator  Chromatogram obtained (first 4 cycles)
dT  [1A,2C,1G,1T]-[2A,2C,1G,1T]-[2A,2C,1G,1T]-[0A,1C,0G,0T]
dA  [0C,0G,0T,1A]-[2C,1G,1T,1A]-[2C,1G,0T,1A]-[1C,0G,1T,1A]
dG  [1A,1C,0T,1G]-[1A,2C,1T,1G]-[2A,2C,1T,1G]-[1A,2C,1T,0G]
dC  [1A,0G,0T,1C]-[0A,1G,0T,1C]-[1A,0G,1T,1C]-[0A,1G,0T,1C]
Reading from left to right, one can easily see that the first nucleotide must be A (the first entry of the dA reaction shows no fluorescence from any other base, so the strand must have terminated without incorporating any insertion nucleotide). Removing the corresponding entry and recording A gives:
Terminator  Chromatogram obtained
dT  [1A,2C,1G,1T]-[2A,2C,1G,1T]-[2A,2C,1G,1T]-[0A,1C,0G,0T]
dA  [2C,1G,1T,1A]-[2C,1G,0T,1A]-[1C,0G,1T,1A]
dG  [1A,1C,0T,1G]-[1A,2C,1T,1G]-[2A,2C,1T,1G]-[1A,2C,1T,0G]
dC  [1A,0G,0T,1C]-[0A,1G,0T,1C]-[1A,0G,1T,1C]-[0A,1G,0T,1C]
Sequence: A
The only leftmost entry that now fits is the one for C, because it shows the presence of exactly 1 A. Removing the corresponding entry and recording C gives:
Terminator  Chromatogram obtained
dT  [1A,2C,1G,1T]-[2A,2C,1G,1T]-[2A,2C,1G,1T]-[0A,1C,0G,0T]
dA  [2C,1G,1T,1A]-[2C,1G,0T,1A]-[1C,0G,1T,1A]
dG  [1A,1C,0T,1G]-[1A,2C,1T,1G]-[2A,2C,1T,1G]-[1A,2C,1T,0G]
dC  [0A,1G,0T,1C]-[1A,0G,1T,1C]-[0A,1G,0T,1C]
Sequence: AC
The only leftmost entry that now fits is the one for G:
Terminator  Chromatogram obtained
dT  [1A,2C,1G,1T]-[2A,2C,1G,1T]-[2A,2C,1G,1T]-[0A,1C,0G,0T]
dA  [2C,1G,1T,1A]-[2C,1G,0T,1A]-[1C,0G,1T,1A]
dG  [1A,2C,1T,1G]-[2A,2C,1T,1G]-[1A,2C,1T,0G]
dC  [0A,1G,0T,1C]-[1A,0G,1T,1C]-[0A,1G,0T,1C]
Sequence: ACG
Now the only leftmost entry that fits is the one for C, because it shows only 1 G between this C and the previous one, which is consistent with the sequence obtained so far.
Continuing in this way finally gives the complete sequence: ACGCTACGCATCAGACTC.
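The elementary strategy can be written as a greedy loop, sketched below under the assumption of noise-free, integer-valued chromatograms. At each step the only candidate accepted is a terminator whose next unused entry exactly matches the bases called since that terminator last appeared; its terminator count then gives the homopolymer length. `noise_free_rows` and `call_bases` are hypothetical helpers, and the simulator is included only so that the block runs on its own.

```python
from collections import Counter

BASES = "ACGT"

def noise_free_rows(strand):
    """Simulated chromatogram: for each terminator, a list of per-cycle
    dicts {base: count}; the terminator's own count is its run length."""
    rows = {}
    for t in BASES:
        row, i = [], 0
        while i < len(strand):
            counts = Counter()
            while i < len(strand) and strand[i] != t:
                counts[strand[i]] += 1
                i += 1
            while i < len(strand) and strand[i] == t:
                counts[t] += 1
                i += 1
            row.append({b: counts[b] for b in BASES})
        rows[t] = row
    return rows

def call_bases(rows):
    """Repeatedly take the terminator whose next unused entry matches the
    bases called since that terminator last appeared."""
    used = dict.fromkeys(BASES, 0)  # entries consumed per reaction
    called = ""
    while True:
        for x in BASES:
            if called.endswith(x) or used[x] >= len(rows[x]):
                continue
            entry = rows[x][used[x]]
            if entry[x] == 0:  # trailing entry: strand ended before another x
                continue
            seen = Counter(called[called.rfind(x) + 1:])
            if all(entry[b] == seen[b] for b in BASES if b != x):
                called += x * entry[x]  # terminator count = homopolymer length
                used[x] += 1
                break
        else:  # no entry fits: the sequence is complete
            return called

rows = noise_free_rows("ACGCTACGCATCAGACTC")
print(call_bases(rows))  # ACGCTACGCATCAGACTC
```

Exact matching breaks down once measurements are noisy; that case calls for a scored search rather than this greedy loop.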
In fact, it is easy to see that the summed fluorescence of the insertion nucleotides in each step measures the total distance between successive terminating nucleotides, and that the fluorescence of the terminating nucleotide measures the number of consecutive terminating nucleotides; a set of four reactions therefore always suffices to determine the sequence. This is illustrated further with reference to Fig. 1.
Simply scanning the four rows of Fig. 1 makes it possible to "read" the sequence. The sequence can be obtained by determining the number of terminating nucleotides incorporated in each cycle (as indicated by the magnitude of the label, e.g. fluorescence) and the number of insertion nucleotides incorporated in each cycle (again as indicated by the label magnitude), and aligning the results of four runs, each using a different one of the four nucleotides as terminating nucleotide. Preferably, however, the identity of the insertion nucleotides is also determined in each run, which provides information redundancy that allows the sequence to be determined very quickly and accurately and tolerates errors in the measured label magnitudes, for example as discussed further herein.
Base-calling algorithm II
More sophisticated base-calling algorithms, implemented for example with dynamic programming, least-squares optimization and/or regular expressions, can be used to find the optimal sequence in the presence of measurement error. Such algorithms can also make better use of the redundancy in the information obtained. In other words, rather than using only the measured lengths between successive occurrences of the same nucleotide, such an algorithm looks for the optimal sequence, namely the one that minimizes the difference between the expected and observed abundances of each of the three insertion nucleotides.
The inventor has provided a working dynamic programming algorithm that performs well even with 20-25% noise. It first uses dynamic programming to perform a multiple alignment of the four measured series, minimizing at each step the difference between the expected and observed abundances of each of the three insertion nucleotides. It then uses least-squares optimization to find the most probable length of each homopolymer stretch based on the four available measurements.
Terms and definitions
A homopolymer is a continuous run of a particular nucleotide. A homopolymer sequence is a DNA sequence in which homopolymers are written as numbers rather than repeated letters; i.e. ACCGGT is written as ACGT with homopolymer lengths 1, 2, 2, 1.
A chromatogram is the set of measurements obtained by repeating a method of the invention, such as Scheme I, four times, using each of the four natural nucleotides in turn as terminating nucleotide. A chromatogram is thus a three-dimensional array of measurements indexed by cycle, terminating nucleotide and measured nucleotide. For example, if 10 cycles are performed for each terminating nucleotide, the chromatogram contains 10 (number of cycles) × 4 (number of terminating nucleotides) × 4 (number of measured nucleotides) members, and the number at position {4, 'A', 'C'} is the fluorescence of cytosine measured in cycle 4 when adenine is used as terminating nucleotide. For convenience, the chromatogram of x denotes the subset of the whole chromatogram comprising the measurements obtained with x as terminating nucleotide. The chromatogram of A is thus one quarter of the whole chromatogram.
Let N be the number of cycles performed in each repetition. The chromatogram thus has 4 × 4 × N members, inferred from the label measurements.
Let a called sequence be a nucleotide sequence $S_0, S_1, \ldots, S_k$, where each $S$ is one of {A, C, G, T}. The purpose of base calling is to find the best called sequence given the chromatogram. For convenience, homopolymer stretches are expressed as quantities rather than as repeated identical bases; that is, each position $i$ in the called sequence is associated with a quantity $q_i$ giving the estimated number of repeats of base $S_i$. For consistency, the sequence is constrained so that $S_{n+1} \neq S_n$ for all $n$.
Base calling, stage I: dynamic programming
The purpose of base calling is to find the best called sequence given the chromatogram. However, there are $4 \cdot 3^{k-1}$ possible called sequences of length $k$, a very large number even for fairly small $k$ (for $k = 20$ there are more than 4 billion). To obtain a practical base-calling algorithm, the complexity of the problem must be reduced.
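The count $4 \cdot 3^{k-1}$ follows because the first base can be any of the four and, under the constraint $S_{n+1} \neq S_n$, each later base any of the other three. A brute-force check for small $k$ (the helper names are hypothetical):

```python
from itertools import product

def brute_force_count(k):
    """Length-k sequences over ACGT with no two identical neighbours."""
    return sum(
        1
        for s in product("ACGT", repeat=k)
        if all(a != b for a, b in zip(s, s[1:]))
    )

def closed_form(k):
    return 4 * 3 ** (k - 1)  # 4 choices for S_0, then 3 for each later base

assert all(brute_force_count(k) == closed_form(k) for k in range(1, 8))
print(closed_form(20))  # 4649045868
```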
Called sequences can be classified by the number of occurrences of each nucleotide. For example, base count {1,2,0,4} corresponds to any sequence containing 1 A, 2 C, no G and 4 T. One example of such a sequence is TCTATCT.
The algorithm provided by the invention exploits the fact that in some simple cases the best called sequence is easy to derive, and that harder cases can be derived recursively from simpler ones.
Some simple cases are easily solved. Base count {0,0,0,0} corresponds to the empty called sequence. Base count {1,0,0,0} can only correspond to the called sequence 'A', and similarly for C, G and T.
However, base count {1,1,1,1} could correspond to 'ACGT', 'TCGA' and so on. In such cases, the chromatogram can be used to find the best called sequence.
Note that any called sequence with base count {i, j, k, l} must correspond exactly to a particular subset of the chromatogram, namely the subset comprising i cycles of the chromatogram of A, j cycles of the chromatogram of C, k cycles of the chromatogram of G and l cycles of the chromatogram of T. The chromatogram predicted by a called sequence can therefore be compared with the chromatogram actually measured. The best called sequence for {i, j, k, l} is the one whose predicted chromatogram is most similar to the corresponding subset of the measured chromatogram. Similarity can be measured in various ways, e.g. as the sum of differences, the sum of squared differences or the Pearson correlation coefficient. It can be reported as a score, either as an error score to be minimized or as a similarity score to be maximized.
The general case {i, j, k, l} cannot be solved directly. However, the best called sequence for {i, j, k, l} can be produced from shorter sequences in at most four different ways: by appending 'A' to the best sequence for {i-1, j, k, l}, by appending 'C' to the best sequence for {i, j-1, k, l}, by appending 'G' to the best sequence for {i, j, k-1, l}, or by appending 'T' to the best sequence for {i, j, k, l-1}.
By calculating a score for each candidate (as described above, by comparing the predicted and actual chromatograms) and selecting the minimum (or maximum, depending on the measure used), one can find which of the (at most) four extensions is best. How this is done is shown below; for now, assume that such a score has been obtained.
The quantity q of the most recently called base is set to the measured amount obtained from the chromatogram. For example, when considering extension with 'A' (i.e. from {i-1, j, k, l} to {i, j, k, l}), q is taken from the chromatogram at position {i, 'A', 'A'}, i.e. the amount of labelled adenine measured in cycle i when adenine is used as terminating nucleotide.
Thus, the best called sequence for {i, j, k, l} can always be obtained by finding the best extension of a sequence containing one fewer called base. This operation can be repeated for each shorter case until a trivially solvable case such as {0,0,0,0} or {1,0,0,0} is reached. By applying the same simple operation recursively, the best called sequence of any length can therefore always be obtained. As a by-product, the homopolymer lengths $q_i$, as measured in the chromatogram, are obtained.
A few restrictions apply:
A sequence cannot contain fewer than zero of any base. Thus the best called sequence for {i, j, k, 0} cannot be obtained by extending {i, j, k, -1} with 'T'. Because of this restriction, all recursions must ultimately end at {0,0,0,0}, the empty sequence.
The constraint on called sequences, $S_{n+1} \neq S_n$ for all $n$, means that if the best called sequence for {i-1, j, k, l} ends in 'A', it cannot be extended with 'A', and likewise for the other bases.
In some cases no extension is possible. For example, {2,0,0,0} cannot be produced by extending {1,0,0,0} with another 'A'. In such cases, no called sequence exists.
The similarity score can be calculated incrementally. Because {i, j, k, l} and {i-1, j, k, l} differ by only one cycle, the score of {i-1, j, k, l} can be reused when calculating the score of {i, j, k, l}, and so on. This is achieved by keeping track of the length and running score of the best called sequence for each {i, j, k, l}. When examining a possible extension, e.g. from {i-1, j, k, l} to {i, j, k, l} (i.e. by 'A'), only the portion of the predicted chromatogram corresponding to the additional cycle of 'A' needs to be calculated. This is done by tracing back through the called sequence to the most recently called 'A'. Since the best called sequence for {i-1, j, k, l} is known, it is also known how it was obtained; in particular, the measured quantity q of each inserted nucleotide is known. These quantities are summed separately for 'C', 'G' and 'T' back to the most recent 'A', giving the predicted values for the missing cycle of the predicted chromatogram. The difference (or squared difference, etc.) between these predicted values and the corresponding cycle of the measured chromatogram is then added to the running score. A normalized score can be obtained by dividing the running score by the length of the called sequence.
Note now, for calculate 3,2,2, the best calling sequence of 2}, still need calculate 2,2,2,2}, 1,2,2, the score of 2} etc.But in order to find best generally sequence, people must systematically study and be up to a certain limit (for example { N, N, N, N}) all possibilities, wherein each all will cause tracing back to { 0,0,0, reruning of the score of 0} still exists so combination increases sharply.Yet dynamic programming is a clever mode of avoiding this type of combination to increase sharply.
The algorithm can therefore store each score, once calculated, in a four-dimensional N × N × N × N matrix for reuse. When the best calling sequence for {3,2,2,2} is calculated, the scores for {2,2,2,2}, {1,2,2,2}, etc. are stored in the matrix. When the score for, e.g., {2,2,2,2} is needed again later, recursion is avoided entirely by retrieving the previously calculated result from the matrix. This gives a very efficient implementation: instead of examining about $3^{4N}$ possible calling sequences, only $N^4$ possibilities need to be examined. For example, in a practical system with N = 20, the problem is reduced from about $10^{38}$ calculations to 160,000, turning an infeasible algorithm into an efficient one.
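The figures quoted above can be checked directly. Taking, as the text does, about $3^{4N}$ candidate calling sequences (each of roughly 4N calls can be any of the three bases other than the one before it) against $N^4$ stored states, with N = 20:

$$3^{4N}\Big|_{N=20} = 3^{80} = 10^{80\log_{10}3} \approx 10^{38.2}, \qquad N^{4}\Big|_{N=20} = 20^{4} = 160\,000.$$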
The longest sequence that can be called reliably by the algorithm disclosed herein is one containing N homopolymers of one base, more than N of a second base, and fewer than N of each of the remaining bases. This follows from the fact that when one terminating base has exceeded N, the sequence can still be called, because the gaps left by the other three bases must be filled by the missing base. When a second base exceeds N, however, the gaps left by the remaining bases can no longer be filled unambiguously. This limit is not absolute; partial sequence can still be obtained from the complete chromatograms.
Depending on the application, one may choose to report the optimal sequence for any {i,j,k,l} up to {N,N,N,N}, the optimal sequence for {N,N,N,N}, or the optimal sequence for which one of the indices is N. The example below uses the last of these. The choice depends on factors such as whether read length is preferred over accuracy, and whether partial sequences are acceptable.
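The dynamic-programming procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's own implementation: the chromatogram layout (`chrom[X]` is a hypothetical 4 × N NumPy array for terminating nucleotide X, rows in ACGT order, columns = cycles), the squared-difference step score (the text allows "difference (or variance, etc.)"), the name `dp_base_call` and the tie-break in favour of longer reads are illustrative choices. As in the text, one best calling sequence is kept per state {i, j, k, l} (a dictionary keyed by state tuples stands in for the N × N × N × N matrix, strictly (N+1)^4 entries since indices run from 0 to N), an extension by base X is rejected when the sequence already ends in X, predicted values are obtained by summing q back to the most recent X, and the report uses the rule of the example (states in which one index equals N), ranked by running score divided by length.

```python
import itertools

import numpy as np

BASES = "ACGT"


def dp_base_call(chrom, n_cycles):
    """Phase I base calling by dynamic programming over states {i, j, k, l}.

    chrom[X] is a (4, n_cycles) array (n_cycles >= 1) for the chromatogram
    whose terminating nucleotide is X: rows are the added nucleotides in
    'ACGT' order, columns are cycles. Returns (calls, q): the calling sequence
    (homopolymers collapsed) and the homopolymer length measured for each call.
    """
    N = n_cycles
    # Memo table standing in for the N x N x N x N matrix:
    # best[state] = (running_score, calls, q), one entry per reachable state.
    best = {(0, 0, 0, 0): (0.0, "", [])}
    # Visit states in order of i + j + k + l so every predecessor is stored first.
    for state in sorted(itertools.product(range(N + 1), repeat=4), key=sum):
        candidates = []
        for t, term in enumerate(BASES):
            if state[t] == 0:
                continue
            prev = state[:t] + (state[t] - 1,) + state[t + 1:]
            if prev not in best:
                continue  # no calling sequence reaches the predecessor
            score, calls, q = best[prev]
            if calls.endswith(term):
                continue  # homopolymers are collapsed: a base cannot follow itself
            # Predict the other three rows of this cycle by summing q back to the
            # most recent occurrence of `term` in the calling sequence.
            predicted = np.zeros(4)
            for base, length in zip(reversed(calls), reversed(q)):
                if base == term:
                    break
                predicted[BASES.index(base)] += length
            observed = chrom[term][:, state[t] - 1]
            others = [r for r in range(4) if r != t]
            step = float(np.sum((predicted[others] - observed[others]) ** 2))
            # The new homopolymer's length is read directly from its own row.
            candidates.append((score + step, calls + term, q + [float(observed[t])]))
        if candidates:
            best[state] = min(candidates, key=lambda c: c[0])
    # Reporting rule of the example: states in which one index has reached N,
    # ranked by normalised score (ties broken in favour of longer reads).
    finals = [(s, e) for s, e in best.items() if max(s) == N]
    _, (_, calls, q) = min(finals, key=lambda se: (se[1][0] / sum(se[0]), -sum(se[0])))
    return calls, q
```

For instance, with one cycle per chromatogram and ideal (noise-free) readings for the strand AACGGT, the function returns the calling sequence ACGT with homopolymer lengths [2.0, 1.0, 2.0, 1.0]. Keeping a single best entry per state follows the text; it is a heuristic, since a different last base could occasionally allow an extension that the stored entry forbids.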
Base calling, Phase II: least squares (optional)
The result of Phase I is a calling sequence $S_0, S_1, \ldots, S_n$ with corresponding homopolymer lengths $q_0, q_1, \ldots, q_n$. By rounding each $q_i$ to the nearest integer and spelling out the resulting DNA sequence, we can write it down in the conventional way. However, the chromatograms contain more information, which we can use to find better estimates of $q_i$. After all, the homopolymer length of each called terminating base is a single measurement, yet every position in the calling sequence has in fact been measured four times (once for each terminating base).
An example makes this clear. Consider the following sequence:
ACGCATCAAAGCCTTACACGGTAAGCATCATC
The 'AAA' triplet at position 8 of this sequence is measured directly in the third step of the A chromatogram, giving an approximate value such as 3.43. If the measurement error is large, it may be difficult to be sure in every case how the measured quantity should be rounded to an integer.
However, the 'AAA' triplet also contributes to the fourth step of the C chromatogram, the second step of the G chromatogram and the second step of the T chromatogram. In two of these cases (the C and T chromatograms) the triplet is effectively measured on its own, while in the third it is measured together with the single A that precedes it. Let us assume that the corresponding observed values for the A, C, G and T chromatograms are 3.43, 3.1, 4.2 and 2.9, respectively. We would like to use these additional measurements to reduce the effect of random measurement error.
Consider again the homopolymer lengths $q_0, q_1, \ldots, q_n$. Instead of accepting the single values obtained in Phase I, we can form a system of simultaneous equations describing the additional information about the $q$ values. The triplet above is $q_8$, because it is the 8th homopolymer; similarly, the A that precedes it is $q_5$. From the previous paragraph we can now write:
$q_8 = 3.43$ (from the A chromatogram)
$q_8 = 3.1$ (from the C chromatogram)
$q_5 + q_8 = 4.2$ (from the G chromatogram)
$q_8 = 2.9$ (from the T chromatogram)
We can proceed in the same way for every position in the calling sequence. The resulting system of simultaneous equations can be solved by, e.g., least-squares optimization, and the solution gives the set of homopolymer lengths $q_0, q_1, \ldots, q_n$ that best fits all the observations in the chromatograms.
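A minimal Python sketch of this least-squares refinement follows, assuming the same hypothetical chromatogram layout as a 4 × N array per terminating nucleotide (rows ACGT, columns = cycles); the function names and the `simulate` noise model (Gaussian, standard deviation = cv × signal, so zero readings stay zero) are illustrative assumptions, not taken from the text. In the 3-1 scheme, a homopolymer falls in the cycle of chromatogram X given by the number of X homopolymers preceding it, and each chromatogram entry yields one equation summing the q values of the homopolymers of that row's base in that cycle; the system is solved with NumPy's `lstsq`. For the example sequence above, the AAA triplet (index 7 counting from zero, $q_8$ in the text's numbering) lands in the third A cycle, the fourth C cycle and the second G and T cycles, and in the G chromatogram it shares the A row with the single A at index 4, which is exactly the equation $q_5 + q_8 = 4.2$.

```python
import numpy as np

BASES = "ACGT"


def collapse(seq):
    """Split a sequence into its calling sequence and homopolymer lengths,
    e.g. 'AAC' -> ('AC', [2, 1])."""
    calls, lengths = [], []
    for base in seq:
        if calls and calls[-1] == base:
            lengths[-1] += 1
        else:
            calls.append(base)
            lengths.append(1)
    return "".join(calls), lengths


def cycle_of(calls, term):
    """0-based cycle of the `term` chromatogram in which each homopolymer is
    incorporated, i.e. the number of `term` homopolymers preceding it."""
    cycles, seen = [], 0
    for base in calls:
        cycles.append(seen)
        if base == term:
            seen += 1
    return cycles


def least_squares_lengths(calls, chrom, n_cycles):
    """Phase II: refine q_0..q_n using every chromatogram entry they enter.

    Each entry chrom[X][row, cycle] gives one linear equation: the sum of q
    over the homopolymers of that row's base incorporated in that cycle.
    """
    rows, rhs = [], []
    for term in BASES:
        cycles = cycle_of(calls, term)
        for c in range(n_cycles):
            for r, base in enumerate(BASES):
                members = [i for i, (b, cy) in enumerate(zip(calls, cycles))
                           if b == base and cy == c]
                if members:
                    row = np.zeros(len(calls))
                    row[members] = 1.0
                    rows.append(row)
                    rhs.append(chrom[term][r, c])
    q, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return q


def simulate(seq, n_cycles, cv=0.0, seed=0):
    """Hypothetical simulator: ideal chromatograms plus Gaussian noise whose
    standard deviation is cv times the signal (zero readings stay zero)."""
    calls, lengths = collapse(seq)
    rng = np.random.default_rng(seed)
    chrom = {}
    for term in BASES:
        ideal = np.zeros((4, n_cycles))
        for base, length, c in zip(calls, lengths, cycle_of(calls, term)):
            if c < n_cycles:
                ideal[BASES.index(base), c] += length
        chrom[term] = ideal + rng.normal(0.0, 1.0, ideal.shape) * cv * ideal
    return chrom


if __name__ == "__main__":
    seq = "ACGCATCAAAGCCTTACACGGTAAGCATCATC"
    calls, _ = collapse(seq)
    q = least_squares_lengths(calls, simulate(seq, 10, cv=0.1, seed=1), 10)
    # Expand homopolymers by rounding, as described in the text
    print("".join(b * max(int(round(x)), 1) for b, x in zip(calls, q)))
```

With noise-free input the solution reproduces the true homopolymer lengths exactly, because every homopolymer appears alone in the terminator row of its own chromatogram and the system therefore has full column rank.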
Example of the error-tolerant base-calling algorithm
The following tables show simulated results of chromatogram sequencing (given as the sequenced strand), with 10 cycles for each terminating nucleotide, of the template:
ATGGAGCAGCGTCATTCCTTAGCGGGCAACTGTGACGATGGTGAGAAGTCAGAAAGAGAGGCTCAGGGATTCGAGCATCGGACCTGTATGGACTCTGGGGA
One table is given for each terminating nucleotide. Each row shows the (simulated) observed values, in units of one base, obtained for the nucleotide shown on the left, and each column is one cycle, consisting of the addition of the first three nucleotides followed by the addition of the terminating nucleotide. For example, the first column of the table for terminating nucleotide A (its first four values) shows the values observed in the first cycle with dATP as the terminating nucleotide. Because the template begins with A, only A gives a signal significantly different from zero.
Terminating nucleotide A
Cycle      1      2      3      4      5      6      7      8      9     10
A       0.78   1.09   1.07   1.00   1.03   2.01   0.86   1.17   1.03   1.99
C      -0.19  -0.14   0.81   2.07   1.95   2.08   1.17   1.21  -0.11   0.01
G       0.20   2.17   1.09   1.86   0.02   3.96   1.91   1.01   3.05   0.96
T       0.07   0.86   0.03   1.31   3.57  -0.14   2.19   0.09   2.10   0.08

Terminating nucleotide C
Cycle      1      2      3      4      5      6      7      8      9     10
A       2.00   1.05   0.20   1.01   0.94  -0.06   1.91   1.08   4.08   5.85
C       0.96   1.00   0.98   1.95   0.92   1.04   1.10   0.99   1.05   1.14
G       2.95   1.01   0.73   0.03   0.90   3.05   0.12   2.03   5.86   4.99
T       1.04   0.15   0.95   2.02   1.99   0.02  -0.03   2.14   3.07   0.12

Terminating nucleotide G
Cycle      1      2      3      4      5      6      7      8      9     10
A       0.95   1.01   1.15   0.01   2.08  -0.01   2.17   0.01   1.14   1.13
C       0.06   0.02   1.01   1.11   3.00   1.08   2.12   0.07   1.16   0.09
G       2.06   0.87   1.00   1.06   0.97   2.98   1.08   0.92   0.99   2.02
T       1.08  -0.13   0.06  -0.08   5.03  -0.03   1.16   0.88   0.04   0.95

Terminating nucleotide T
Cycle      1      2      3      4      5      6      7      8      9     10
A       0.97   2.02   1.06   0.00   3.05  -0.06   1.91   0.02   2.94   6.11
C      -0.07   2.01   0.81   1.91   3.16   0.06   0.90   0.07  -0.10   2.24
G      -0.06   4.84  -0.14  -0.20   3.97   0.96   2.01   2.06   2.94   5.37
T       1.04   0.93   2.25   2.03   1.19   0.84   0.91   0.96   0.93   0.61
Base calling with the dynamic programming algorithm described above identified the following calling sequence (in which homopolymers are not shown): ATGAGCAGCGTCATCTAGCGCACTGTGACGATG, which is correct. Expanding the homopolymers by rounding to the nearest integer gives ATGGAGCAGCGTCATTCCTTAGCGGGCAACTGTGACGATGG, which is again correct and covers 41 bp of the template. Thus, with only 10 cycles of chromatogram sequencing and in the presence of substantial measurement error (10% CV in this example), 41 base pairs of sequence information can be obtained.
To assess the error tolerance of the algorithm, a series of 100 simulations was run on the template with random noise equivalent to 10% CV. All 100 calling sequences and all 100 expanded sequences were correct. Of these, 59 were 41 bp long, and the rest included an additional T from the template. The algorithm is thus effective in the presence of experimental noise, and error-tolerant.
Nucleotide addition schemes
In SBS, it has always been assumed that nucleotides must be added one at a time, or at least, as in BASS, forced to incorporate one at a time. However, as shown above, DNA sequence can be obtained with other nucleotide addition schemes, some of which are better at avoiding the limitations of SBS (such as loss of synchrony). In this section we examine all possible nucleotide addition schemes and show that the conventional scheme is, in some respects, the least favourable.
A nucleotide addition scheme is the rule by which nucleotides are added to an SBS reaction. It consists of consecutive steps, each adding one or more nucleotides. In this section we ignore any nucleotide added purely as an inhibitor, or that cannot be incorporated for some other reason. We also refer to any nucleotide that can base-pair with adenine as "T" (and, similarly, to nucleotides that pair with cytosine, guanine and thymine as G, C and A). In particular applications, analogues or derivatives of the natural nucleotides can be used, but for sequencing purposes it is their base-pairing ability that determines the logic of the addition scheme. Nucleotide analogues or derivatives with multiple base-pairing abilities can be written as "AC", "GCT", etc., to reflect this.
A cyclic scheme is a nucleotide addition scheme that repeats a basic pattern. A cyclic scheme with restarts is a nucleotide addition scheme that repeats a basic pattern and is then restarted with fresh primer using a variant of the basic pattern. A natural scheme is one in which no base is repeated until all four bases have been added.
Among the natural cyclic schemes, "4" denotes adding all four nucleotides in the first step; this scheme is degenerate and cannot be used for sequencing.
Scheme "1-1-1-1" is the conventional scheme, used by all previously disclosed SBS methods. Note that even BASS falls into this category: although all four nucleotides can be added simultaneously, the cleavable blocking groups force them to be incorporated one at a time.
Scheme 1-1-1-1 is the least productive scheme. This follows from the fact that after each productive step, the next nucleotide on the template may be any of three possibilities (i.e., the three bases other than the one just sequenced), but only one of them is added. As a result, it is the scheme most affected by loss of synchrony.
The method according to the invention is scheme 3-1, as disclosed herein. It is a fully productive scheme (every step guarantees nucleotide incorporation, because the nucleotide missing from a given step is added in the next step). There are four variants of 3-1, obtained by choosing A, C, G or T as the single nucleotide. As described above, these four variants can be used to reconstruct the target sequence.
Scheme 2-2 is another possible fully productive scheme. It has only three variants, corresponding to AC-GT, AG-CT and AT-GC; all other combinations are simply reversals of these.
What is the minimum requirement for a scheme to guarantee that the original sequence can always be reconstructed (possibly with restarts)? In essence, every homopolymer in the target sequence must be separable from both of its neighbours. In other words, each homopolymer must be part of at least one incorporation step that excludes its left-hand neighbour, and of one that excludes its right-hand neighbour. In scheme 1-1-1-1, every single step has this property, so the sequence can always be reconstructed.
In scheme 3-1, restarting with all four possible variants ensures that every homopolymer is part of a step that includes no other nucleotide. In principle, only three of the four variants are strictly needed, because then three of the bases are each added alone in some step, which automatically distinguishes them from the fourth. Scheme 3-1 therefore produces redundant information that is absent from scheme 1-1-1-1, and this can be used to improve base calling in the face of experimental noise (e.g., by dynamic programming, as shown above). It is thus not only more productive than 1-1-1-1 but also more error-tolerant.
Scheme 2-2, restarted through all three of its variants, also produces enough information to call the sequence: it is easy to see that each pair of nucleotides is separated in at least one of AC-GT, AG-CT and AT-GC. Scheme 2-2 is therefore probably the most economical fully productive scheme, although it forgoes the potentially valuable extra information generated by 3-1. Some redundancy remains (if the nucleotides are labelled with different labels), so the error tolerance of scheme 2-2 lies between that of 1-1-1-1 and that of 3-1.
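The two properties discussed above can be checked mechanically. The following Python sketch is one possible formalisation, not the patent's: a scheme variant is written as a list of steps, each a string of the nucleotides added together (e.g. ["CGT", "A"] for one 3-1 variant); "fully productive" is read as "each step supplies every base at which the previous step could have stopped" (a steady-state property, since the very first step after priming can still stall, as in the example where the template starts with A); and the reconstruction criterion is read as "every pair of distinct bases is added in different steps in at least one variant". The function names are hypothetical.

```python
from itertools import combinations

BASES = frozenset("ACGT")


def is_fully_productive(steps):
    """True if, in steady state, every step of a cyclic scheme must incorporate.

    Extension in the preceding step stops at a base that step did not supply,
    so the current step is guaranteed to incorporate only if it supplies every
    such base. A step that adds all four nucleotides is treated as degenerate.
    """
    if any(set(step) == BASES for step in steps):
        return False
    # i = 0 compares the first step with the last one (the pattern repeats)
    return all(BASES - set(steps[i - 1]) <= set(steps[i]) for i in range(len(steps)))


def separates_all_pairs(variants):
    """Every pair of distinct bases must be added in different steps in at
    least one variant (restart) of the scheme."""
    def split(x, y, steps):
        return any((x in step) != (y in step) for step in steps)
    return all(any(split(x, y, v) for v in variants)
               for x, y in combinations("ACGT", 2))


scheme_1111 = [["A", "C", "G", "T"]]
scheme_31 = [["CGT", "A"], ["AGT", "C"], ["ACT", "G"], ["ACG", "T"]]
scheme_22 = [["AC", "GT"], ["AG", "CT"], ["AT", "GC"]]
```

Under these readings, 1-1-1-1 separates every pair but is not fully productive, every 3-1 and 2-2 variant is fully productive, any three of the four 3-1 variants already separate all pairs, and a single 2-2 variant does not.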
Non-cyclic schemes may also be useful in special situations. For example, when part of the sequence is already known, a non-cyclic scheme can be used to traverse the uninteresting part faster than would otherwise be possible, or to generate even more redundant data in order to reduce base-calling errors further.
In summary, our analysis of nucleotide addition schemes shows that 3-1 is the most productive and error-tolerant scheme and, somewhat surprisingly, that the conventional scheme 1-1-1-1 is the least productive and the most error-prone.
Signature sequencing
Another embodiment of an aspect of the invention, which can be used for signature sequencing, comprises a method (Scheme III) comprising:
1. Providing a primer annealed to a single-stranded template.
2. Adding three nucleotides, one of which carries a label, such as a fluorescent label.
3. Optionally adding one or more non-incorporable inhibitor nucleotides (different from the labelled nucleotide). Examples include nucleoside 5'-di- and monophosphates and nucleoside 5'-(α,β-methylene)triphosphates.
4. Incubating with a suitable polymerase under conditions that cause nucleotides to be added to the growing chain.
5. Detecting the presence and amount of the labelled nucleotide.
6. Inactivating the label, e.g. by photobleaching (this is not necessary in every cycle).
7. Adding the remaining nucleotide and incubating with a polymerase (not necessarily the same as in step 4) under conditions that cause nucleotides to be added to the growing chain.
8. Repeating steps 2-7 until the desired number of cycles has been completed.
For example, one can use fluorescent dC and unlabelled dA/dG in step 2, and then add dT in step 7. Step 4 will then add any number of dA, dG and dC residues until the first A in the template is reached, and then stop, because no complementary dT is present. The fluorescence reading in step 5 reveals the presence or absence of dC between each pair of dT. The resulting sequence can generally be written as a binary string indicating, for each pair of consecutive T's, whether there are one or more C's between them.
For example, the sequence ACGCTACGCATCAGACTC would be written as 1111, and the sequence ACTCAGCTATATT as 11000. In general, such a sequence contains information equivalent to half a base pair per cycle. Twenty-four cycles would therefore be equivalent to a 12 bp signature sequence, which would, for instance, be unique in the human transcriptome. Existing sequence databases and sequence alignment algorithms can easily be adapted to analyse such binary signatures.
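The binary read-out of Scheme III can be illustrated with a short Python sketch (the function name and its parameters are hypothetical). With labelled dC and unlabelled dA/dG added first and dT alone afterwards, each cycle extends through the stretch of the sequenced strand that precedes the next run of T, and the bit records whether that stretch contains a C. Splitting the sequence at runs of T reproduces both examples above, including the final bit for the stretch after the last T. Since each cycle yields one bit and a fully determined base carries two, 24 cycles give 2^24 = 4^12 ≈ 16.8 million possible signatures, the same number as a 12-base sequence.

```python
import re


def binary_signature(seq, label="C", terminator="T"):
    """One bit per cycle of Scheme III, with `seq` written as the sequenced strand.

    Each cycle extends through the stretch that precedes the next run of
    `terminator` (added alone in step 7); the bit is 1 if that stretch
    contains the labelled base detected in step 5.
    """
    stretches = re.split(re.escape(terminator) + "+", seq)
    return "".join("1" if label in stretch else "0" for stretch in stretches)


print(binary_signature("ACGCTACGCATCAGACTC"))  # 1111
print(binary_signature("ACTCAGCTATATT"))       # 11000
```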
Scheme III is particularly easy to perform, because only qualitative measurements are needed. For example, Scheme III may be particularly suitable for sequencing single molecules by fluorescence correlation spectroscopy.
Chromatogram sequencing with PPi detection
In another embodiment, an aspect of the invention provides a method (Scheme IV) that monitors the release of inorganic pyrophosphate (PPi) instead of using labelled nucleotides (see, e.g., WO93/23564). Such a method may comprise:
1. Providing a primer annealed to a single-stranded template.
2. Adding a set of insertion nucleotides (i.e., more than one but fewer than all four possible nucleotides).
3. Optionally adding one or more non-incorporable inhibitor nucleotides (different from the insertion nucleotides). Examples include nucleoside 5'-di- and monophosphates and nucleoside 5'-(α,β-methylene)triphosphates.
4. Incubating with a suitable polymerase under conditions that cause nucleotides to be added to the growing chain, while monitoring incorporation (e.g., as described in WO93/23564).
5. Adding the set of terminating nucleotides and incubating with a polymerase (not necessarily the same as in step 4) under conditions that cause nucleotides to be added to the growing chain, while monitoring incorporation (e.g., as described in WO93/23564).
6. Repeating steps 2-5 until the desired number of cycles has been completed.
Again, the scheme can be repeated using each of the four natural nucleotides as the terminating nucleotide. Compared with standard pyrosequencing, this scheme gives a four-fold increase in read length, while the standard protocol is otherwise unmodified (apart from the changed order of nucleotide addition and the corresponding change in base calling).
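To see why a scheme such as 3-1 reads more bases per step than the conventional 1-1-1-1 order, the following Python sketch counts the bases incorporated in each step of an idealised, perfectly synchronous extension (no noise and no loss of synchrony). In a PPi-based read-out such as Scheme IV, these counts are what the monitored signal would be proportional to. The function name and the representation of steps as strings of added nucleotides are assumptions made for illustration.

```python
def incorporation_per_step(seq, steps, n_steps):
    """Idealised, perfectly synchronous extension (no noise, no dephasing).

    seq: the sequenced strand; steps: the repeating pattern of nucleotide sets,
    e.g. ["CGT", "A"] for one 3-1 variant or ["A", "C", "G", "T"] for the
    conventional order. Returns the number of bases incorporated in each step.
    """
    pos, counts = 0, []
    for s in range(n_steps):
        added = steps[s % len(steps)]
        start = pos
        # Extend while the next base on the strand is among those supplied
        while pos < len(seq) and seq[pos] in added:
            pos += 1
        counts.append(pos - start)
    return counts
```

For the strand AACGT, the 3-1 steps ["CGT", "A"] give [0, 2, 3, 0] over four steps (the first step stalls because the strand starts with A), while the conventional order ["A", "C", "G", "T"] gives [2, 1, 1, 1]; on longer random sequences the per-step yield of the fully productive scheme is correspondingly higher.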
The following example illustrates the importance of loss of synchrony and the effect of using a chromatogram sequencing scheme. It shows the results of sequencing the same target DNA by both pyrosequencing and chromatogram sequencing. A fixed fraction of all templates is assumed to lose synchrony at each incorporation step. In SBS, each step adds a single base. In jump sequencing, the steps alternately add three bases and one base. In addition, chromatogram sequencing is restarted three times with fresh primer, so that each of the four natural nucleotides is used as the terminating nucleotide.
Target sequence (for each terminating nucleotide, the last nucleotide reached by chromatogram sequencing is shown in capitals):
atggagcagc gtcattcctt agcgggcaac tgtgacgatg gtgagaagtc
agaaagagag gctcaGGGat tcgagcatcg gacctgtAtg gactctgggg
atccTTcctt tgggCaaaat gatcccccta ccattttgcc cattactgct
Pyrosequencing
Stopped after 40 steps owing to loss of synchrony
40 reaction steps
Reaction: a c g t a c g t a c g t a c g t a c g t a c g t a c g t a c g t a c g t a c g t
Result: a - - t 2g - - - a - g - - c - - a - g - - c - - - - g t - c - - a - - 2t - 2c - 2t
Total sequence: 20 bp
Chromatogram sequencing
Stopped after 40 steps owing to loss of synchrony
160 reaction steps (i.e., 40 for each terminating base)
Reaction: cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a cgt a
Result: -a t2g a gc a 2g2ct a 4t2c a ... etc.
... restarted and repeated with [gta c], [tac g] and [acg t] ...
Total sequence: 88 bp + 27 bp of partial sequence
In summary, chromatogram sequencing circumvents the loss-of-synchrony problem and achieves read lengths more than four times longer.
Solid-phase chromatogram sequencing
To automate and parallelize the method, embodiments of the invention provide two main approaches.
The first approach uses arrayed or otherwise ordered templates, and is suitable for sequencing large numbers of templates while retaining their identity.
The second approach uses random attachment to a solid support, and can be used when large amounts of sequence must be obtained at random, e.g. from a library.
A method according to an embodiment of an aspect of the invention for sequencing arrayed templates (Scheme V) comprises:
1. Providing a solid support having a number of active regions, or an active surface, each capable of binding template molecules, where the binding is
A. direct, or
B. indirect, via a bound primer or adapter that hybridises to, or otherwise has affinity for, the template.
2. Adding single-stranded templates to each active region, or to the active surface, while keeping track of which template is placed at each position. Each region will then consist of a large number of identical ssDNA templates, as in a spotted microarray.
3. Optionally adding primers (or using the adapters on the solid support).
4. Sequencing all templates in parallel according to the invention, e.g. according to any of Schemes I-IV.
5. Obtaining the sequence of each identified template.
The adapters (step 1B) need not be the same in all active regions. Different adapters can be used to fish out specific templates from a complex mixture, making it possible to sequence sub-libraries.
The throughput of Scheme V is limited by the resolution of the device used to deposit the templates. With standard microarray instruments, densities of thousands of templates per square centimetre are possible.
When higher throughput is needed and template density is unimportant, another approach can be used.
Another embodiment of an aspect of the invention provides a method (Scheme VI) comprising:
1. Providing a solid support carrying at least partially single-stranded template molecules attached at random positions (preferably at a density suited to the detection instrument), each template optionally being amplified to contain multiple copies of the target sequence, which are either attached to, or located in close proximity to, the original template (at least closer to it than to any other template molecule).
2. Sequencing the templates in parallel according to the invention, e.g. according to any of Schemes I-IV, detecting the labelled nucleotides in parallel.
There are many ways to provide amplified templates at high density. For example, rolling circle amplification (RCA) can be used as follows:
A. Provide a surface (e.g. glass) with attached primers, preferably attached by covalent bonds; alternatively, a very strong non-covalent bond (e.g. biotin/streptavidin) can be used.
B. Add circular templates, preferably at a density suited to the detection instrument.
C. Anneal the templates to the primers.
D. Amplify by rolling circle amplification to produce, at each position, a long single-stranded tandem repeat of the template attached to the surface.
This approach is described by Lizardi et al., "Mutation detection and single-molecule counting using isothermal rolling-circle amplification", Nature Genetics vol. 19, p. 225.
Modifications of this method include providing a reverse primer to create additional replication forks, thereby increasing the amplification yield. Alternatives to RCA include solid-phase PCR (Adessi et al., "Solid phase DNA amplification: characterisation of primer attachment and amplification mechanisms", Nucleic Acids Research 2000; 28(20): e87) and in-gel PCR ('polonies'; US6485944 and Mitra RD, Church GM, "In situ localized amplification and contact replication of many individual DNA molecules", Nucleic Acids Research 1999; 27(24): e34).
A "suitable density" is preferably one that maximises throughput, for example a limiting dilution that ensures that as many detectors (or pixels in a detector) as possible detect a single template molecule. On any regular array, an ideal limiting dilution leaves 37% of all positions holding exactly one template (owing to the form of the Poisson distribution); the remaining positions hold either no template or more than one.
For example, on a Typhoon 9200 with a 25 μm pixel size, the 35 × 43 cm reaction chamber holds 240,800,000 pixels. At limiting dilution (Poisson distribution), 37% of these will hold exactly one template, i.e. 89 million templates. Sequencing 50 bases on each template in 50 cycles yields 1.7 Gb of sequence. The scan time is 45 minutes, and the daily throughput is about 3 Gbp, equivalent to the full sequence of the human genome.
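The limiting-dilution figures can be reproduced with a few lines of Python (function names hypothetical). With templates placed at random, the number per pixel is Poisson-distributed with mean λ, so the fraction of pixels holding exactly one template is λe^(−λ); its derivative (1 − λ)e^(−λ) vanishes at λ = 1, where the fraction reaches 1/e ≈ 36.8%, the 37% quoted above. The pixel count is simply the scan area divided by the pixel area (1 cm = 10^4 μm).

```python
import math


def single_occupancy_fraction(mean_per_pixel):
    """Poisson probability that a pixel holds exactly one template:
    lambda * exp(-lambda), maximal (1/e) at lambda = 1."""
    return mean_per_pixel * math.exp(-mean_per_pixel)


def usable_templates(width_cm, height_cm, pixel_um, mean_per_pixel=1.0):
    """Pixels in the scan area and the expected number holding exactly one template."""
    pixels = (width_cm * 1e4 / pixel_um) * (height_cm * 1e4 / pixel_um)  # 1 cm = 1e4 um
    return pixels, pixels * single_occupancy_fraction(mean_per_pixel)


pixels, templates = usable_templates(35, 43, 25)
```

For the Typhoon 9200 example this gives 240,800,000 pixels (14,000 × 17,200) and about 88.6 million singly occupied pixels, in line with the 89 million templates quoted.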
Templates suitable for solid-phase RCA should maximise yield (in terms of copy number of the template sequence) while providing sequences suitable for downstream applications. In general, small templates are preferred. In particular, a template may consist of a 20-25 bp primer-binding sequence and a 40-150 bp insert. The primer-binding sequence can be used both to prime RCA and to prime the sequencing reaction, or the template may contain a separate sequencing primer binding site. The insert should be as small as possible while remaining long enough to accommodate the desired sequence. For example, if 10 cycles of sequencing are performed with a single terminating nucleotide, on average 40 bases will be read, so the template must be considerably longer than 40 bases to avoid sequencing into the sequencing primer binding sequence.
To increase the signal produced by each template, it may be necessary to concentrate the rolling circle amplification products. Because the RCA product is essentially a single-stranded DNA molecule consisting of up to 1,000 or even 10,000 tandem copies of the original circular template, the molecule will be very long. For example, a 100 bp template amplified 1,000-fold by RCA will be on the order of 30 μm long, so its signal will be spread over several different pixels (assuming a pixel resolution of 5 μm). Using a lower-resolution instrument may not help, because the sparse ssDNA product occupies only a very small fraction of a 30 μm pixel and may therefore be undetectable. It is therefore desirable to concentrate the signal into a smaller area.
In Lizardi et al. (cited above), the RCA product is condensed using epitope-tagged nucleotides and multivalent antibodies as cross-linkers. In another aspect, the invention provides a simple alternative that is particularly convenient when the original material to be sequenced is double-stranded DNA.
Regarding template preparation for the method according to the invention, and as a further aspect of the invention, a dsDNA template (which may be short, e.g. 80 bp) is ligated to adapter oligonucleotides carrying hairpin loops, forming a pseudo-double-stranded circular structure, or dumbbell. In such a structure, the primer binding sites for RCA and for the subsequent sequencing reaction can be placed in the hairpin loops. To avoid sequencing both strands simultaneously, different primers can be used for RCA amplification and for sequencing, which ensures that only templates with different hairpin loops at their two ends are sequenced. Thus, only templates having at least one RCA primer binding site will be amplified, and only those having at least one sequencing primer binding site will be sequenced.
Because the RCA product of such a template is partially double-stranded throughout, it folds back on itself into a zigzag structure and is concentrated into a smaller area. Because the primer binding sites are exposed as single-stranded DNA throughout, primer access is not a problem. The example below shows that templates of this type form products of ~5-10 μm after RCA.
Many different approaches to immobilising oligonucleotides on surfaces have been described (see, e.g., Lindroos et al., "Minisequencing on oligonucleotide arrays: comparison of immobilisation chemistries", Nucleic Acids Research 2001; 29(13): e69). For example, biotinylated oligonucleotides can be attached to streptavidin-coated arrays; NH2-modified oligonucleotides can be covalently attached to epoxysilane-derivatised or isothiocyanate-coated glass slides; succinylated oligonucleotides can be coupled via peptide bonds to aminophenyl- or aminopropyl-derivatised glass; and disulphide-modified oligonucleotides can be immobilised on mercaptosilanised glass by thiol/disulphide exchange. Further approaches are described in the literature.
Apparatus for automated high-throughput sequencing
The methods according to the invention are particularly suited to automation, because they can be performed simply by cycling a number of reagent solutions through a reaction chamber placed on or in a detector, optionally with thermal control.
In one example, the detector is a fluorescence scanner, which may, for example, operate by laser excitation, band-pass filtering and photomultiplier detection. The ScanArray Express (PerkinElmer) is one such instrument; it scans microscope slides at a resolution of 5 μm/pixel, can detect as few as 2 fluorophores per pixel, and has a scan time of ~20 minutes (for four colours). The daily sequencing throughput on such an instrument would be up to 1.7 Gbp.
The reaction chamber provides:
Easy access for the scan head.
A sealed reaction chamber.
An inlet for injecting reagents into, and withdrawing them from, the reaction chamber.
Outlets allowing air and reagents to enter and leave the reaction chamber.
The reaction chamber can be configured in a standard microarray slide format, as shown in Figure 3, suitable for insertion into a standard microarray scanner such as the ScanArray Express. The reaction chamber can be inserted into the scanner and remain there throughout the sequencing reaction. Pumps and reagent bottles (e.g. as shown in Figure 4) supply reagents according to a fixed protocol, and a computer controls the pumps and the scanner, alternating between reaction and scanning. Optionally, the reaction chamber can be temperature-controlled.
A dispenser unit can be connected to motorised outlets to direct the flow of reagents, with the whole system operating under computer control. The integrated system consists of the scanner, the dispenser, the outlets and reservoirs, and a controlling computer.
According to a further aspect of the invention, there is provided an instrument for carrying out the method of the invention, the instrument comprising:
an imaging element capable of detecting incorporated or released labels,
one or more reaction chambers for holding attached templates, such that the templates are accessible to the imaging element at least once per set of steps,
a reagent distribution system for supplying reagents to the reaction chambers.
The reaction chamber may provide, and the imaging element may be able to resolve, attached templates at a density of at least 100/cm², optionally at least 1,000/cm², at least 10,000/cm² or at least 100,000/cm².
The imaging element may employ a system or device selected from the group consisting of photomultipliers, photodiodes, charge-coupled devices, CMOS imaging chips, near-field scanning microscopes, far-field confocal microscopes, wide-field epi-illumination microscopes and total internal reflection microscopes.
The imaging element may detect fluorescent labels.
The imaging element may detect laser-induced fluorescence.
In one embodiment of the instrument according to the invention, the reaction chamber is a closed structure comprising a transparent surface, a cover, and ports for attaching the reaction chamber to the reagent distribution system, wherein the transparent surface carries the template molecules on its inner surface and the imaging element can image them through the transparent surface.
Example I: in situ template amplification
A circular single-stranded template was prepared by annealing 4 μl of 100 pmol/μl of two 5'-phosphorylated oligonucleotides (TGGTCATCAGCCTTCATGCAACCAAAGTATGAAATAACCAGCGTAATACGACTCACTATAGGGCGTGGTTATTTCATACT and TTGGTTGCATGAAGGCTGATGACCATCCTTTTCCTTACTAGCGTAATACGACTCACTATAGGGCGTAGTAAGGAAAAGGA), adding 2 μl T4 ligation buffer, 0.3 μl T4 DNA ligase (1.5 Weiss units; Fermentas) and 7 μl water, and incubating at 37 °C for 1 hour. The ligase was then inactivated by incubation at 65 °C for 10 minutes.
Primer A50T7RC
(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCCCTATAGTGAGTCGTATTACGC), carrying a 5'-terminal amino (NH2) group, was attached to Greiner silylated microarray slides by incubating the primer at 10 μM for 5 minutes in 100 μl MOPS (0.2 M, containing sodium acetate and EDTA, prepared according to Sambrook et al., 'Molecular Cloning', third edition, Cold Spring Harbor Laboratory Press 2001). The slides were then reduced with 2.5 mg NaBH4 in 1 ml PBS/ethanol (3:1) for 5 minutes, washed in 0.2% sodium dodecyl sulphate and rinsed with distilled water.
The dried slides were then incubated for rolling circle amplification with 2 µl dUTP-Cy3 (100 µM final concentration; PerkinElmer), 2 µl each of dTTP, dATP, dCTP and dGTP (1 mM final concentration each; NEB), 4 µl Sequenase buffer, 1 µl Sequenase (13 U; Amersham Biosciences), 4 µl water and 1 µl template. The labeled nucleotide therefore made up about 2.5% of the total nucleotides. After incubation at 37 °C for two hours, the slides were washed in water and scanned on a PerkinElmer ScanArray Express. The result was a large number of bright spots, each representing an amplified template. This result also shows that a labeling frequency of 2.5% is easily detectable in this format (indeed, many spots saturated the detector).
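The approximately 2.5% labeling figure follows from the final concentrations. A short Python sketch of the arithmetic, assuming the listed volumes make up the whole reaction and that the polymerase does not discriminate against the dye-labeled analogue (in practice it may); the variable names are illustrative only:

```python
# Arithmetic behind the labeling fraction of the Example I RCA mix.
# Assumes the listed components are the whole reaction and that
# dUTP-Cy3 is incorporated as readily as dTTP (an idealisation).

VOLUMES_UL = {
    "dUTP-Cy3": 2, "dTTP": 2, "dATP": 2, "dCTP": 2, "dGTP": 2,
    "Sequenase buffer": 4, "Sequenase": 1, "water": 4, "template": 1,
}
FINAL_UM = {"dUTP-Cy3": 100, "dTTP": 1000, "dATP": 1000, "dCTP": 1000, "dGTP": 1000}


def reaction_summary():
    total_ul = sum(VOLUMES_UL.values())
    # labeled nucleotide as a fraction of all nucleotides in the mix
    labeled_fraction = FINAL_UM["dUTP-Cy3"] / sum(FINAL_UM.values())
    # labeled nucleotide as a fraction of the T-complementary pool only
    labeled_t_fraction = FINAL_UM["dUTP-Cy3"] / (FINAL_UM["dUTP-Cy3"] + FINAL_UM["dTTP"])
    # stock concentration implied by diluting each 2 ul addition into the total volume
    implied_stock_um = {k: v * total_ul / VOLUMES_UL[k] for k, v in FINAL_UM.items()}
    return total_ul, labeled_fraction, labeled_t_fraction, implied_stock_um


if __name__ == "__main__":
    total, frac, t_frac, stock = reaction_summary()
    print(f"total volume: {total} ul")
    print(f"labeled / all nucleotides: {frac:.2%}")
    print(f"labeled / T-type nucleotides: {t_frac:.1%}")
    print("implied stocks (uM):", stock)
```

The overall fraction comes to about 2.4%, which the text rounds to 2.5%; at positions calling for T, roughly 9% of the available nucleotides carry the dye under the same idealisation.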
A magnified view of part of the slide shows that, at an image pixel size of 5 µm, most amplified templates occupy one or a few pixels. At this size, a very large proportion of the scanner's pixels can be used for different template molecules, ensuring maximum throughput. The white pixels fully saturate the detector, showing that labeling at less than 2.5% is sufficient for detection. Assuming a 160 bp template, 2.5% labeling represents about 4 labeled nucleotides incorporated per template copy, which is within the range expected for the colour-based sequencing reactions envisaged.
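Both estimates in this paragraph are simple arithmetic, reproduced in the sketch below. The variable names are illustrative, and the labels-per-copy figure assumes labeled and unlabeled nucleotides are incorporated at the same rate.

```python
# Back-of-envelope figures for the Example I scan.
TEMPLATE_NT = 160       # template length assumed in the text
LABEL_FRACTION = 0.025  # ~2.5% of incorporated nucleotides carry Cy3
PIXEL_UM = 5.0          # scanner pixel size
UM_PER_CM = 10_000


def scan_estimates():
    # expected number of dye molecules per copy of the template
    labels_per_copy = TEMPLATE_NT * LABEL_FRACTION
    # number of 5 um x 5 um pixels in one square centimetre
    pixels_per_cm2 = (UM_PER_CM / PIXEL_UM) ** 2
    return labels_per_copy, pixels_per_cm2


if __name__ == "__main__":
    labels, pixels = scan_estimates()
    print(f"labels per template copy: {labels:.1f}")
    print(f"pixels per cm^2 at {PIXEL_UM} um: {pixels:,.0f}")
```

At 4 million pixels per cm², even the highest template density listed in claim 37 (100 000/cm²) would leave about 40 pixels per template, so templates occupying one or a few pixels could in principle be packed considerably more densely.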
Example II: Single-step sequencing reaction
A biotinylated T7 primer (GCGTAATACGACTCACTATAGGGCG) was attached to Greiner streptavidin-coated microarray slides by incubation at 10 pmol/µl in Dynal binding/washing buffer (Dynal, Norway). Wells were created by affixing to the slide a rubber membrane containing an array of 5 mm wide holes. TOPO 2.1 plasmid (Clontech) was boiled, cooled on ice and then added to each well at 20 fmol/µl. After incubation for 15 minutes at room temperature, the slide was washed in binding/washing buffer for 15 minutes.
Two wells received a reaction mixture containing 4 µl EcoPol buffer, 0.4 µl each of dATP, dTTP and dGTP (100 µM final concentration; NEB), 0.4 µl dUTP-Cy3 (10 µM final concentration; PerkinElmer) and 2 µl Klenow fragment (3'→5' exo-) DNA polymerase (NEB), made up to 40 µl with water; two other wells received the same mixture with water in place of the Klenow fragment. After incubation for 10 minutes and two 15-minute washes in binding/washing buffer, the slide was scanned on a Typhoon 9200.
For the given template (Clontech TOPO 2.1), the expected result is the incorporation of 2 dTTPs. Fig. 2 shows this result: labeled dTTP was clearly incorporated, and the signal obtained is significantly above background (as given by the fluorescence in the reactions omitting the Klenow fragment).
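The logic of this single-step reaction can be sketched as follows: because dCTP is omitted, extension from the primer should continue only until the template first calls for a C. The source does not give the TOPO 2.1 sequence downstream of the T7 primer, so the product sequence below is purely hypothetical, chosen only so that the run contains two T positions as in the expected result. The function name is also our own.

```python
from typing import Set


def single_step_extension(product: str, supplied: Set[str]) -> str:
    """Bases added in one step, for an idealised error-free polymerase.

    product: the strand the polymerase would write, 5'->3', starting
             immediately after the primer's 3' end.
    supplied: nucleotide types present in the step.
    Extension stops at the first position whose base is not supplied.
    """
    added = []
    for base in product:
        if base not in supplied:
            break
        added.append(base)
    return "".join(added)


# HYPOTHETICAL downstream product, NOT the real TOPO 2.1 sequence.
HYPOTHETICAL_PRODUCT = "ATGTACGGA"
# dATP, dTTP (with dUTP-Cy3 competing at T positions) and dGTP; no dCTP
SUPPLIED = {"A", "T", "G"}

if __name__ == "__main__":
    added = single_step_extension(HYPOTHETICAL_PRODUCT, SUPPLIED)
    print(f"added: {added}; T positions (possible Cy3 sites): {added.count('T')}")
```

With this made-up sequence the polymerase adds ATGTA and halts at the C, giving two T positions at which dTTP or dUTP-Cy3 can be incorporated; the no-polymerase wells correspond to adding nothing.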

Claims (52)

1. A method of determining sequence and/or base composition information of a nucleic acid, the method comprising:
(i) providing a nucleic acid comprising a first strand, said first strand comprising a nucleic acid template, wherein a free 3' end of a nucleic acid strand annealed to the first strand allows extension of a nucleic acid strand complementary to the nucleic acid template, this being achieved by a template-dependent nucleic acid polymerase incorporating nucleotides, in a template-sequence-dependent manner, into the nucleic acid strand complementary to the nucleic acid template;
(ii) performing a set of one or more steps, the set of one or more steps being performed for a desired number of cycles, or in combination with one or more other sets of steps, to extend the nucleic acid strand complementary to the nucleic acid template, thereby allowing information representing the base composition or sequence of said nucleic acid to be obtained,
wherein one step comprises:
(a) in the presence of:
the nucleic acid comprising a first strand, said first strand comprising the nucleic acid template,
the free 3' end of a nucleic acid strand annealed to the first strand comprising said nucleic acid template, and
a template-dependent nucleic acid polymerase;
providing nucleotides selected from one, two, three or four complementary types of nucleotide for template-dependent incorporation by said nucleic acid polymerase into the nucleic acid strand complementary to the nucleic acid template, wherein each said nucleotide is a natural nucleotide or a nucleotide analogue that can be incorporated in a template-dependent manner by a nucleic acid polymerase into a nucleic acid strand at its free 3' end, and wherein, within each complementary type of nucleotide, said nucleotides and nucleotide analogues are complementary to one of adenine (A), cytosine (C), thymine (T) and guanine (G);
and
(b) removing or inactivating unincorporated nucleotides;
and
wherein, within a set of steps,
nucleotides selected from all four complementary types of nucleotide are provided and made available for template-dependent incorporation,
in at least one step, nucleotides selected from more than one, optionally two, three or four, complementary types of nucleotide are provided and made available for template-dependent incorporation, and nucleotides of at least one complementary type, if incorporated into the nucleic acid strand complementary to the nucleic acid template, allow further extension of that strand, and
optionally, no complementary type of nucleotide is provided in more than one step; and
wherein, if nucleotides selected from all four complementary types are provided in one step, nucleotides of one, two or three complementary types, if incorporated into the nucleic acid strand complementary to the nucleic acid template, prevent further extension of that strand in all copies present, where multiple copies are present;
(iii) performing a plurality of said sets of steps, cycling said set of steps and/or performing said set of steps in combination with a different set of steps;
(iv) determining the identity and/or amount of nucleotides incorporated into the nucleic acid strand complementary to the nucleic acid template in at least one set of steps, by determining, for each set for which the identity and/or amount of incorporated nucleotides is to be determined, the identity and/or amount of nucleotides incorporated into that strand in at least one step of the set.
2. A method according to claim 1, wherein, in a set of steps, nucleotides selected from three or two complementary types of nucleotide are provided in a first step, and nucleotides selected from the remaining one or two complementary types are provided in a second step.
3. A method according to claim 2, comprising determining the amount of the nucleotide or nucleotides incorporated in the first or second step of a set of steps, to determine the identity and/or amount of nucleotides incorporated for said set of steps.
4. A method according to claim 3, comprising determining the amount of nucleotides incorporated in each step of a set, for each set for which the amount of incorporated nucleotides is to be determined.
5. A method according to claim 4, wherein, in a set of steps, three nucleotides are provided in the first step and one nucleotide is provided in the second step.
6. A method according to claim 5, comprising determining the identity and amount of nucleotides incorporated in the first step.
7. A method according to any one of claims 2 to 6, wherein the nucleotides provided in the first step are each differently labeled.
8. A method according to any one of claims 2 to 7, wherein the nucleotide provided in the second step is labeled.
9. A method according to any one of claims 1 to 8, wherein the four nucleotides complementary to A, C, T and G are each differently labeled.
10. A method according to claim 7, claim 8 or claim 9, wherein the nucleotides are fluorescently labeled.
11. A method according to claim 7, claim 8, claim 9 or claim 10, wherein, when a nucleotide is incorporated into the nucleic acid strand complementary to the nucleic acid template, the label of said nucleotide is inactivated.
12. A method according to claim 7, claim 8, claim 9 or claim 10, wherein, when a nucleotide is incorporated into the nucleic acid strand complementary to the nucleic acid template, the label of said nucleotide is cleaved or released from said nucleotide.
13. A method according to claim 12, comprising determining the identity and/or amount of label cleaved or released from one or more nucleotides incorporated into the nucleic acid strand complementary to the nucleic acid template.
14. A method according to any one of claims 5 to 13, comprising performing a cycle of sets of steps, wherein, in every set of steps of the cycle, three nucleotides are provided in the first step and one nucleotide is provided in the second step.
15. A method according to claim 14, comprising performing four cycles of sets of steps on said nucleic acid, wherein, within each cycle, the single nucleotide provided in the second step of every set of steps is the same, and the single nucleotide provided in the second steps of each cycle differs from the single nucleotide provided in the second steps of the other three cycles.
16. A method according to any one of claims 1 to 15, wherein a set of steps additionally comprises providing one or more blocking nucleotides that terminate the incorporation of nucleotides into the nucleic acid strand complementary to the nucleic acid template.
17. A method according to any one of claims 1 to 16, wherein a set of steps additionally comprises providing one or more non-incorporable nucleotide inhibitors that inhibit misincorporation of nucleotides into the nucleic acid strand complementary to the nucleic acid template.
18. A method according to any one of claims 1 to 17, wherein the nucleic acid template is deoxyribonucleic acid (DNA), the nucleic acid polymerase is a DNA-dependent DNA polymerase, and the nucleotides are deoxyribonucleotides or deoxyribonucleotide analogues.
19. A method according to any one of claims 1 to 17, wherein the nucleic acid template is deoxyribonucleic acid (DNA), the nucleic acid polymerase is a DNA-dependent ribonucleic acid (RNA) polymerase, and the nucleotides are ribonucleotides or ribonucleotide analogues.
20. A method according to any one of claims 1 to 17, wherein the nucleic acid template is ribonucleic acid (RNA), the nucleic acid polymerase is a reverse transcriptase, and the nucleotides are deoxyribonucleotides or deoxyribonucleotide analogues.
21. A method according to any one of claims 1 to 20, wherein the nucleic acid template is provided in multiple copies.
22. A method according to claim 21, comprising providing multiple copies of the nucleic acid template by a nucleic acid amplification reaction.
23. A method according to claim 22, wherein the nucleic acid amplification reaction comprises rolling circle amplification.
24. A method according to claim 23, comprising:
providing a DNA molecule consisting of a stem and first and second loops, wherein said stem consists of a first strand and a second strand that are of equal length, complementary and annealed together, and comprises the region for which sequence and/or base composition information is required, wherein said first loop connects the 3' end of said first strand to the 5' end of said second strand, and said second loop connects the 3' end of said second strand to the 5' end of said first strand, such that said DNA molecule has no free 5' or 3' end, and wherein one loop comprises a primer binding site for rolling circle amplification and one loop comprises a primer binding site for sequencing;
performing rolling circle amplification to provide multiple copies of the nucleic acid as said nucleic acid template.
25. A method according to any one of claims 1 to 24, wherein the nucleic acid template is attached to a solid support.
26. A method according to claim 25, wherein a plurality of different nucleic acid templates are attached to the solid support in the form of an array.
27. A method according to claim 25 or claim 26, wherein the nucleic acid template is attached to the solid support by annealing to a primer attached to the solid support.
28. A method according to any one of claims 1 to 27, comprising determining the nucleic acid sequence by analysing the determined identity and/or amount of nucleotides incorporated into the nucleic acid strand complementary to the nucleic acid template.
29. A method of nucleic acid sequencing by synthesis, characterised in that nucleotides are incorporated in a stepwise manner, wherein one step allows template-dependent incorporation of more than one different nucleotide.
30. A method according to claim 29, wherein one step allows template-dependent incorporation of three different nucleotides selected from the nucleotides complementary to adenine (A), cytosine (C), thymine (T) and guanine (G), and a different step allows template-dependent incorporation of the remaining nucleotide of this set.
31. A computer processor programmed to control a method according to any one of claims 1 to 30.
32. A computer-readable medium carrying a program for the computer processor according to claim 31.
33. A computer processor programmed to provide sequence and/or base composition information of a nucleic acid by performing a method according to any one of claims 1 to 30.
34. A computer-readable medium carrying a program for the computer processor according to claim 33.
35. A kit suitable for performing a method according to any one of claims 1 to 30, the kit comprising one or more sets of premixed reagents in one or more reagent containers, wherein each set of premixed reagents comprises
nucleotides selected from all four complementary types,
at least one container containing nucleotides selected from more than one, optionally two, three or four, complementary types, wherein nucleotides of at least one complementary type, if incorporated into the nucleic acid strand complementary to the nucleic acid template, allow further extension of that strand, and
wherein, if nucleotides selected from all four complementary types are provided in a single container, nucleotides of one, two or three complementary types, if incorporated into the nucleic acid strand complementary to the nucleic acid template, prevent further extension of that strand.
36. An instrument for performing a method according to any one of claims 1 to 30, comprising:
an imaging element capable of detecting incorporated or released labels,
a reaction chamber for holding one or more attached templates such that, at least once in every set of steps, they are accessible to the imaging element,
a reagent delivery system for supplying reagents to the reaction chamber.
37. An instrument according to claim 36, wherein the reaction chamber provides attached templates, resolvable by the imaging element, at a density of at least 100/cm², optionally at least 1 000/cm², at least 10 000/cm² or at least 100 000/cm².
38. An instrument according to claim 35 or claim 36, wherein the imaging element employs a system or device selected from the group consisting of: a photomultiplier tube, a photodiode, a charge-coupled device, a CMOS imaging chip, a near-field scanning microscope, a far-field confocal microscope, a wide-field epi-illumination microscope and a total internal reflection microscope.
39. An instrument according to claim 35 or claim 36, wherein the imaging element detects fluorescent labels.
40. An instrument according to claim 39, wherein the imaging element detects laser-induced fluorescence.
41. An instrument according to any one of claims 35 to 40, wherein the reaction chamber is a closed structure comprising a transparent surface, a cover and ports for connecting the reaction chamber to the reagent delivery system, wherein the transparent surface holds template molecules on its inner surface, where they form pixel elements that can be imaged through the transparent surface.
42. A DNA molecule consisting of a stem and first and second loops, wherein said stem consists of a first strand and a second strand that are of equal length, complementary and annealed together, wherein said first loop connects the 3' end of said first strand to the 5' end of said second strand, and said second loop connects the 3' end of said second strand to the 5' end of said first strand, such that said DNA molecule has no free 5' or 3' end.
43. A DNA molecule according to claim 42, wherein one loop comprises a primer binding site for rolling circle amplification.
44. A DNA molecule according to claim 42 or claim 43, wherein one loop comprises a primer binding site for sequencing.
45. An array of a plurality of different DNA molecules according to claim 42, claim 43 or claim 44 attached to a solid support, optionally by annealing to primers attached to the solid support.
46. A method of preparing a DNA molecule according to claim 42, claim 43 or claim 44, the method comprising:
providing a double-stranded DNA molecule consisting of a first strand having a 5' end and a 3' end and a second strand having a 5' end and a 3' end; and
ligating a first adapter to connect the 3' end of the first strand to the 5' end of the second strand, and ligating a second adapter to connect the 3' end of the second strand to the 5' end of the first strand, wherein said adapters are hairpin structures.
47. A method of producing multiple copies of a DNA template, the method comprising performing rolling circle amplification on a DNA molecule according to claim 43 or claim 44 to produce an extended DNA molecule comprising multiple copies of the DNA template.
48. A method of producing multiple copies of a plurality of DNA templates, the method comprising performing rolling circle amplification on a plurality of DNA molecules according to claim 43 or claim 34 to produce a plurality of extended DNA molecules comprising multiple copies of the DNA templates.
49. A method according to claim 47 or claim 48, wherein the rolling circle amplification primer or the DNA molecule is attached to a solid support.
50. A method according to claim 47 or claim 48, further comprising compacting said extended DNA molecules by annealing between complementary strands of the multiple copies of the DNA template within the extended DNA molecules.
51. A method according to claim 50, wherein the extended DNA molecules are compacted on a solid support.
52. A method according to any one of claims 47 to 51, further comprising sequencing the multiple copies of the DNA template or plurality of DNA templates in the extended DNA molecules.
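The stepwise scheme of claims 1 to 5, 14 and 15 can be illustrated with an idealised simulation: in each set of steps, three nucleotide types are supplied in a first step and the fourth alone in a second step, with unincorporated nucleotides removed between steps. The Python sketch below is our own illustration, not the applicant's implementation. It assumes an error-free polymerase, all template copies extending in phase, and a known (hypothetical) product sequence so that the expected readout can be shown; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

BASES = "ACGT"


def run_cycle(product: str, single_base: str) -> List[Tuple[Counter, int]]:
    """Idealised cycle of two-step sets.

    product: the strand the polymerase would write, 5'->3', after the primer.
    single_base: the one nucleotide type supplied alone in step 2; the
                 other three are supplied together in step 1.
    Returns one (step-1 counts per base, step-2 count) pair per set.
    """
    if set(product) - set(BASES):
        raise ValueError("product must contain only A, C, G and T")
    step1_bases = set(BASES) - {single_base}
    pos, sets = 0, []
    while pos < len(product):
        # Step 1: three types supplied; extension halts at the first
        # position that requires `single_base`.
        step1 = Counter()
        while pos < len(product) and product[pos] in step1_bases:
            step1[product[pos]] += 1
            pos += 1
        # Step 2: only `single_base` supplied; fills its homopolymer run.
        step2 = 0
        while pos < len(product) and product[pos] == single_base:
            step2 += 1
            pos += 1
        sets.append((step1, step2))
    return sets


def four_cycles(product: str) -> Dict[str, List[Tuple[Counter, int]]]:
    """One cycle per choice of step-2 nucleotide, as in claim 15."""
    return {base: run_cycle(product, base) for base in BASES}


if __name__ == "__main__":
    demo = "ATGTACCGTA"  # hypothetical product sequence
    for base, sets in four_cycles(demo).items():
        print(base, [(dict(c), n) for c, n in sets])
```

Each step-1 readout (with differently labeled nucleotides, as in claims 6 and 7) gives the base composition of a run but not the order within it. Running four cycles, each with a different nucleotide supplied alone, partitions the same sequence in four different ways, and those readouts can be combined to constrain the sequence.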
CNA2004800097143A 2003-02-12 2004-02-09 Methods and means for nucleic acid sequencing Pending CN1771336A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GB0303191A GB2398383B (en) 2003-02-12 2003-02-12 Method and means for nucleic acid sequencing
US60/446,553 2003-02-12
GB0303191.1 2003-02-12

Publications (1)

Publication Number Publication Date
CN1771336A true CN1771336A (en) 2006-05-10

Family

ID=9952884

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2004800097143A Pending CN1771336A (en) 2003-02-12 2004-02-09 Methods and means for nucleic acid sequencing

Country Status (2)

Country Link
CN (1) CN1771336A (en)
GB (2) GB2398383B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0514935D0 (en) 2005-07-20 2005-08-24 Solexa Ltd Methods for sequencing a polynucleotide template
BRPI0909212A2 (en) 2008-03-28 2015-08-18 Pacific Biosciences California Compositions and method for nucleic acid sequencing
WO2010075188A2 (en) 2008-12-23 2010-07-01 Illumina Inc. Multibase delivery for long reads in sequencing by synthesis protocols
EP2427572B1 (en) * 2009-05-01 2013-08-28 Illumina, Inc. Sequencing methods
EP2456892B1 (en) 2009-07-24 2014-10-01 Illumina, Inc. Method for sequencing a polynucleotide template
CN102858995B (en) 2009-09-10 2016-10-26 森特瑞隆技术控股公司 Targeting sequence measurement
US10174368B2 (en) 2009-09-10 2019-01-08 Centrillion Technology Holdings Corporation Methods and systems for sequencing long nucleic acids
US20120252682A1 (en) 2011-04-01 2012-10-04 Maples Corporate Services Limited Methods and systems for sequencing nucleic acids
JP6093498B2 (en) 2011-12-13 2017-03-08 株式会社日立ハイテクノロジーズ Nucleic acid amplification method
CN116083547A (en) 2015-11-19 2023-05-09 赛纳生物科技(北京)有限公司 Method for correcting advance amount during sequencing

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US5674679A (en) * 1991-09-27 1997-10-07 Amersham Life Science, Inc. DNA cycle sequencing
AU6353194A (en) * 1993-02-18 1994-09-14 United States Biochemical Corporation Dna sequencing with non-radioactive label
US5674683A (en) * 1995-03-21 1997-10-07 Research Corporation Technologies, Inc. Stem-loop and circular oligonucleotides and method of using
US6235502B1 (en) * 1998-09-18 2001-05-22 Molecular Staging Inc. Methods for selectively isolating DNA using rolling circle amplification
US6573051B2 (en) * 2001-03-09 2003-06-03 Molecular Staging, Inc. Open circle probes with intramolecular stem structures
EP1436401A4 (en) * 2001-09-27 2006-06-14 Timothy Albert Holton Stem-loop vector system

Cited By (55)

Publication number Priority date Publication date Assignee Title
CN101652780B (en) * 2007-01-26 2012-10-03 伊鲁米那股份有限公司 Nucleic acid sequencing system and method
CN101168773B (en) * 2007-10-16 2010-06-02 东南大学 Nucleic acid sequencing method based on fluorescence quenching
CN103429754A (en) * 2010-09-23 2013-12-04 桑特里莱恩科技控股公司 Native-extension parallel sequencing
CN103907117A (en) * 2011-09-01 2014-07-02 基因组编译器公司 System for polynucleotide construct design, visualization and transactions to manufacture the same
CN103907117B (en) * 2011-09-01 2019-03-29 基因组编译器公司 System and method for polynucleotide constructs design
US10583415B2 (en) 2013-08-05 2020-03-10 Twist Bioscience Corporation De novo synthesized gene libraries
US10639609B2 (en) 2013-08-05 2020-05-05 Twist Bioscience Corporation De novo synthesized gene libraries
US11559778B2 (en) 2013-08-05 2023-01-24 Twist Bioscience Corporation De novo synthesized gene libraries
US11452980B2 (en) 2013-08-05 2022-09-27 Twist Bioscience Corporation De novo synthesized gene libraries
US10632445B2 (en) 2013-08-05 2020-04-28 Twist Bioscience Corporation De novo synthesized gene libraries
US10272410B2 (en) 2013-08-05 2019-04-30 Twist Bioscience Corporation De novo synthesized gene libraries
US11185837B2 (en) 2013-08-05 2021-11-30 Twist Bioscience Corporation De novo synthesized gene libraries
US10384188B2 (en) 2013-08-05 2019-08-20 Twist Bioscience Corporation De novo synthesized gene libraries
US10618024B2 (en) 2013-08-05 2020-04-14 Twist Bioscience Corporation De novo synthesized gene libraries
US10773232B2 (en) 2013-08-05 2020-09-15 Twist Bioscience Corporation De novo synthesized gene libraries
CN107075579A (en) * 2014-07-15 2017-08-18 亿明达股份有限公司 The electronic installation of biochemistry activation
US10550428B2 (en) 2014-09-17 2020-02-04 Ibis Biosciences, Inc. Sequencing by synthesis using pulse read optics
CN107002293A (en) * 2014-09-17 2017-08-01 艾比斯生物科学公司 Pass through synthesis order-checking using pulse reading optical
US10669304B2 (en) 2015-02-04 2020-06-02 Twist Bioscience Corporation Methods and devices for de novo oligonucleic acid assembly
US11697668B2 (en) 2015-02-04 2023-07-11 Twist Bioscience Corporation Methods and devices for de novo oligonucleic acid assembly
US11691118B2 (en) 2015-04-21 2023-07-04 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
US10744477B2 (en) 2015-04-21 2020-08-18 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
US10844373B2 (en) 2015-09-18 2020-11-24 Twist Bioscience Corporation Oligonucleic acid variant libraries and synthesis thereof
US11807956B2 (en) 2015-09-18 2023-11-07 Twist Bioscience Corporation Oligonucleic acid variant libraries and synthesis thereof
US11512347B2 (en) 2015-09-22 2022-11-29 Twist Bioscience Corporation Flexible substrates for nucleic acid synthesis
US10987648B2 (en) 2015-12-01 2021-04-27 Twist Bioscience Corporation Functionalized surfaces and preparation thereof
CN109328192A (en) * 2016-05-20 2019-02-12 宽腾矽公司 Labeled polynucleotide composition and the method for nucleic acid sequencing
CN109844136A (en) * 2016-08-15 2019-06-04 欧姆尼奥姆股份有限公司 The method and system of sequencing nucleic acid
US10975372B2 (en) 2016-08-22 2021-04-13 Twist Bioscience Corporation De novo synthesized nucleic acid libraries
US10754994B2 (en) 2016-09-21 2020-08-25 Twist Bioscience Corporation Nucleic acid based data storage
US11562103B2 (en) 2016-09-21 2023-01-24 Twist Bioscience Corporation Nucleic acid based data storage
US11263354B2 (en) 2016-09-21 2022-03-01 Twist Bioscience Corporation Nucleic acid based data storage
US10907274B2 (en) 2016-12-16 2021-02-02 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
US11466318B2 (en) 2016-12-27 2022-10-11 Egi Tech (Shen Zhen) Co., Limited Single fluorescent dye-based sequencing method
WO2018121587A1 (en) * 2016-12-27 2018-07-05 深圳华大生命科学研究院 Single fluorescent dye based sequencing method
US11550939B2 (en) 2017-02-22 2023-01-10 Twist Bioscience Corporation Nucleic acid based data storage using enzymatic bioencryption
US10894959B2 (en) 2017-03-15 2021-01-19 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
US11332740B2 (en) 2017-06-12 2022-05-17 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US11377676B2 (en) 2017-06-12 2022-07-05 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US10696965B2 (en) 2017-06-12 2020-06-30 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US11655504B2 (en) 2017-07-24 2023-05-23 Quantum-Si Incorporated High intensity labeled reactant compositions and methods for sequencing
US11407837B2 (en) 2017-09-11 2022-08-09 Twist Bioscience Corporation GPCR binding proteins and synthesis thereof
US11745159B2 (en) 2017-10-20 2023-09-05 Twist Bioscience Corporation Heated nanowells for polynucleotide synthesis
US10894242B2 (en) 2017-10-20 2021-01-19 Twist Bioscience Corporation Heated nanowells for polynucleotide synthesis
CN108165618B (en) * 2017-12-08 2021-06-08 Southeast University DNA sequencing method using nucleotides and 3'-end reversibly blocked nucleotides
CN108165618A (en) * 2017-12-08 2018-06-15 Southeast University DNA sequencing method using nucleotides and 3'-end reversibly blocked nucleotides
US10936953B2 (en) 2018-01-04 2021-03-02 Twist Bioscience Corporation DNA-based digital information storage with sidewall electrodes
US11732294B2 (en) 2018-05-18 2023-08-22 Twist Bioscience Corporation Polynucleotides, reagents, and methods for nucleic acid hybridization
US11492665B2 (en) 2018-05-18 2022-11-08 Twist Bioscience Corporation Polynucleotides, reagents, and methods for nucleic acid hybridization
WO2020010495A1 (en) * 2018-07-09 2020-01-16 Shenzhen MGI Jichuang Technology Co., Ltd. Method for nucleic acid sequencing
US11613772B2 (en) 2019-01-23 2023-03-28 Quantum-Si Incorporated High intensity labeled reactant compositions and methods for sequencing
US11492727B2 (en) 2019-02-26 2022-11-08 Twist Bioscience Corporation Variant nucleic acid libraries for GLP1 receptor
US11492728B2 (en) 2019-02-26 2022-11-08 Twist Bioscience Corporation Variant nucleic acid libraries for antibody optimization
US11332738B2 (en) 2019-06-21 2022-05-17 Twist Bioscience Corporation Barcode-based nucleic acid sequence assembly
WO2021030952A1 (en) * 2019-08-16 2021-02-25 Shenzhen GeneMind Biosciences Co., Ltd. Base recognition method and system, computer program product, and sequencing system

Also Published As

Publication number Publication date
GB0303191D0 (en) 2003-03-19
GB2398383B (en) 2005-03-09
GB0402773D0 (en) 2004-03-10
GB2398301B (en) 2005-06-01
GB2398383A (en) 2004-08-18
GB2398301A (en) 2004-08-18

Similar Documents

Publication Publication Date Title
CN1771336A (en) Methods and means for nucleic acid sequencing
CN1118581C (en) Characterising DNA
CN1553953A (en) Real-time sequence determination
CN1065873C (en) Decorative ignitor
CN1850981A (en) Method for amplifying target nucleic acid sequence by nickase, and kit for amplifying target nucleic acid sequence and its use
CN1608137A (en) Multiplexed analysis of polymorphic loci by concurrent interrogation and enzyme-mediated detection
CN1040220A (en) A method of amplifying nucleotide sequences
CN1896284A (en) Method for identifying allelic gene type
CN1950519A (en) Polony fluorescent in situ sequencing beads
CN1806051A (en) Identification of clonal cells by repeats in (e.g.) T-cell receptor V/D/J genes
CN1325458A (en) Method of nucleic acid amplification and sequencing
CN1633505A (en) Nucleic acid amplification methods
CN1863927A (en) Nucleic acid detection assay
CN1764729A (en) Assay for detecting methylation changes in nucleic acids using an intercalating nucleic acid
CN1592792A (en) Materials and methods for detection of nucleic acids
CN1656233A (en) Exponential nucleic acid amplification using nicking endonucleases
CN101076537A (en) Oligonucleotides labeled with a plurality of fluorophores
CN1668923A (en) DNA micro-array having standard probe and kit including the array
CN1218832A (en) Thermostable DNA polymerase for detecting change of frequency
CN1415020A (en) Method of detecting variation and/or polymorphism
CN1602361A (en) Hybridization portion control oligonucleotide and its uses
CN1977050A (en) Determination of hepatitis c virus genotype
CN1293204C (en) Method for determining alleles
CN1876843A (en) Method of detecting variation or polymorphism
CN1364199A (en) Probe for constructing probe polymer, method of constructing probe polymer and utilization thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication