CN102027130A

CN102027130A - Paired end sequencing

Info

Publication number: CN102027130A
Application number: CN2009801131835A
Authority: CN
Inventors: Z·陈; B·C·戈温; G·C·费雷里; D·R·里奇斯
Original assignee: F Hoffmann La Roche AG
Current assignee: F Hoffmann La Roche AG; Roche Diagnostics GmbH
Priority date: 2008-02-05
Filing date: 2009-02-04
Publication date: 2011-04-20
Also published as: JP2011510669A; EP2242855A1; WO2009098037A1; CA2712426A1

Abstract

An embodiment of a method for obtaining a DNA construct comprising two end regions of a target nucleic acid in an in vitro reaction is described that comprises the steps of: fragmenting a large nucleic acid molecule to produce a target nucleic acid molecule; ligating a recombination adaptor element to each end of the target nucleic acid molecule to produce an adapted target nucleic acid molecule; exposing the adapted target nucleic acid to a site specific recombinase to produce a circular nucleic acid product and a linear nucleic acid product from the adapted target nucleic acid, wherein the circular nucleic acid product comprises the target nucleic acid molecule; and fragmenting the circular nucleic acid product to produce a template nucleic acid molecule comprising a sequence region from each end of the target nucleic acid molecule.

Description

Paired end sequencing method

Field of the invention

The present invention relates to the fields of nucleic acid sequencing, genome sequencing, and the assembly of sequencing results into contiguous sequences.

Background of the invention

One approach to sequencing a large target nucleic acid (for example, a human genome) is shotgun sequencing. In shotgun sequencing, the target nucleic acid is fragmented or subcloned to produce a series of overlapping nucleic acid fragments, and the sequences of these fragments are determined. The complete target nucleic acid sequence can then be assembled from the overlaps between the fragment sequences and knowledge of each fragment's sequence.
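By way of illustration only, the overlap-based assembly described above can be sketched as a suffix-prefix comparison between two reads. The function names, the example reads, and the minimum overlap length are assumptions made for this sketch and do not come from the disclosure; practical assemblers also tolerate sequencing errors and handle both strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if none)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a: str, b: str, min_len: int = 3):
    """Join two reads on their overlap, or return None when they do not overlap."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None


# Hypothetical reads, chosen only to show the mechanics.
contig = merge("ACGTTGCA", "TGCAGGT")
```

When the overlapping segment lies inside a repeat, several reads can satisfy the same overlap test, which is one way to see why repeats make assembly ambiguous.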

One disadvantage of shotgun sequencing is that assembly may be very difficult if the target nucleic acid sequence contains many small repeats (tandem repeats or inverted repeats). The inability to assemble genomic sequence across repeat regions leaves gaps in the assembled sequence. Therefore, after the initial assembly of the nucleotide sequence, gaps in sequence coverage need to be filled and ambiguities in the assembly need to be resolved.

One way to resolve these gaps is to sequence larger clones or fragments, because larger fragments may be long enough to span the repeat regions. However, sequencing large nucleic acid fragments is difficult and time-consuming on existing sequencers.

Another approach to spanning gaps in the sequence is to determine the sequences of the two ends of a large fragment. Compared with the single sequence read obtained from one end of a fragment in shotgun sequencing, a pair of sequence reads from the two ends has a known spacing and orientation. The use of relatively long fragments also helps in assembling sequences that contain interspersed repetitive elements (Smith, M.W. et al., Nature Genetics 7:40-47 (1994)). Methods of this type are known in the art as paired end sequencing. The present invention includes novel methods, systems, and compositions for paired end sequencing and other nucleic acid techniques.
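As a non-limiting sketch, the known spacing and orientation of a read pair can be used as a consistency check on an assembly. The `ReadHit` record, its fields, and the assumption that the two reads face each other ("+" on the left, "-" on the right) are illustrative choices for this example; the actual orientation convention depends on how a given library is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadHit:
    """Placement of one end read on a contig (all fields are illustrative)."""
    contig: str
    start: int   # 0-based leftmost coordinate of the alignment
    end: int     # exclusive rightmost coordinate
    strand: str  # "+" or "-"


def pair_is_consistent(a: ReadHit, b: ReadHit, expected: int, tolerance: int) -> bool:
    """True when the two end reads face each other at roughly the expected spacing."""
    if a.contig != b.contig:
        return False  # ends on different contigs: a candidate scaffold link instead
    left, right = (a, b) if a.start <= b.start else (b, a)
    if (left.strand, right.strand) != ("+", "-"):
        return False  # assumed convention: reads point toward each other
    span = right.end - left.start
    return abs(span - expected) <= tolerance
```

A pair whose ends land on different contigs is rejected here, but in scaffolding such a pair is exactly the evidence that orders and orients the two contigs across a gap.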

Summary of the invention

One embodiment of the invention relates to a method for obtaining, in an in vitro reaction, a DNA construct comprising two end regions of a target nucleic acid, where the target nucleic acid may be a large segment derived from the genome of an organism.

In this embodiment, the method comprises the steps of: fragmenting a large nucleic acid molecule to produce a target nucleic acid molecule; ligating a recombination adaptor element to each end of the target nucleic acid molecule to produce an adapted target nucleic acid molecule; exposing the adapted target nucleic acid to a site specific recombinase to produce a circular nucleic acid product and a linear nucleic acid product from the adapted target nucleic acid, wherein the circular nucleic acid product comprises the target nucleic acid molecule; and fragmenting the circular nucleic acid product to produce a template nucleic acid molecule comprising a sequence region from each end of the target nucleic acid molecule.

In some implementations, the method further comprises the step of removing non-circular molecules using an exonuclease. In addition, in some implementations the method further comprises the steps of: adding a plurality of circular carrier DNA molecules to the circular nucleic acid product; fragmenting the circular nucleic acid product and the carrier DNA molecules to produce template molecules and a plurality of linear carrier molecules; measuring the efficiency of fragmentation from the template molecules and the linear carrier molecules; amplifying the template molecules to produce a population comprising a plurality of substantially identical copies, wherein the linear carrier molecules cannot be amplified; and sequencing the population to generate sequence data comprising the sequence composition of the template nucleic acid.

The method of the invention can be performed on a plurality of target DNA fragments simultaneously to produce a library of DNA constructs that contain the ends of large DNA fragments. One advantage of the invention is that the library is constructed in vitro, without the need for prokaryotic or eukaryotic host cells.

Accordingly, the present invention relates to a method for obtaining, in an in vitro reaction, a DNA construct comprising two end regions of a target nucleic acid, the method comprising the steps of:

- fragmenting a nucleic acid to produce a target nucleic acid molecule;

- ligating a recombination adaptor element to each end of the target nucleic acid molecule to produce an adapted target nucleic acid molecule;

- exposing the adapted target nucleic acid to a site specific recombinase to produce a circular nucleic acid product and a linear nucleic acid product from the adapted target nucleic acid, wherein the circular nucleic acid product comprises the target nucleic acid molecule; and

- fragmenting the circular nucleic acid product to produce a template nucleic acid molecule comprising a sequence region from each end of the target nucleic acid molecule.

The nucleic acid to be fragmented may consist of very large molecules. For example, the nucleic acid may be genomic DNA that is essentially unsheared or has not previously been fragmented. In this case, the new method is particularly useful for target nucleic acid molecules with a length selected from at least 3 kb, at least 8 kb, at least 10 kb, at least 20 kb, at least 50 kb, and at least 100 kb.

A prominent example of a site specific recombinase that can be used in the context of the invention is Cre recombinase.

A preferred way of fragmenting the circular nucleic acid product comprises a nebulization step. Preferably, the step of fragmenting the circular nucleic acid product further comprises a first cleavage of the circular nucleic acid product using a type II restriction enzyme and a second cleavage using nebulization, wherein the type II restriction enzyme cleaves at a restriction site in the hybrid adaptor region of the circular nucleic acid product and produces a short sequence region from the target nucleic acid, and the nebulization then produces a long sequence region from the target nucleic acid. For example, the type II restriction enzyme comprises MmeI, and the short sequence region has a sequence length of 20 bp.
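For illustration only, the short sequence region produced by MmeI can be modelled as the bases immediately 3' of its recognition site. The sketch below assumes the commonly cited MmeI recognition sequence TCCRAC (R = A or G), takes the 20-base tag length from the text above, and scans only the strand it is given; the function name and the test sequence are hypothetical.

```python
import re

MMEI_SITE = re.compile(r"TCC[AG]AC")  # MmeI recognition sequence TCCRAC (assumed here)
TAG_LENGTH = 20                       # short sequence region length stated in the text


def mmei_tags(strand: str) -> list:
    """Return the 20 bases immediately 3' of each MmeI site found on `strand`."""
    seq = strand.upper()
    tags = []
    for match in MMEI_SITE.finditer(seq):
        tag = seq[match.end():match.end() + TAG_LENGTH]
        if len(tag) == TAG_LENGTH:  # skip sites too close to the end of the molecule
            tags.append(tag)
    return tags
```

Placing the recognition site at the edge of the adaptor, so that the 20 bases fall inside the target, is the design idea this models; the model ignores the staggered cut on the opposite strand.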

In a first embodiment, the method further comprises the step of removing non-circular molecules after the step of exposing the adapted target nucleic acid to the site specific recombinase. The non-circular molecules preferably comprise the linear nucleic acid product and an adaptor dimer product, wherein the adaptor dimer product arises from ligation of two recombination adaptor elements to each other. The method likewise preferably comprises the step of removing the non-circular molecules using at least one exonuclease.

The method also preferably comprises the following steps:

- adding a plurality of circular carrier DNA molecules to the circular nucleic acid product,

- fragmenting the circular nucleic acid product and the carrier DNA molecules to produce template molecules and a plurality of linear carrier molecules,

- measuring the efficiency of fragmentation from the template molecules and the linear carrier molecules,

- amplifying the template molecules to produce a population comprising a plurality of substantially identical copies, wherein the linear carrier molecules cannot be amplified; and

- sequencing the population to generate sequence data comprising the sequence composition of the template nucleic acid.

It is particularly preferred that the circular carrier molecules comprise pUC19. It is likewise particularly preferred that the circular carrier molecules comprise damaged DNA, wherein the damaged DNA cannot be amplified. The damaged DNA has a type of damage selected from: UV damage, alkylation/methylation, X-ray damage, hydrolysis, and oxidative damage.

In a second embodiment, which is compatible with and can be combined with the first embodiment disclosed above, the method of the invention further comprises the steps of:

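- amplifying the template nucleic acid to produce a population comprising a plurality of substantially identical copies; and

- sequencing the population to generate sequence data comprising the sequence composition of the template nucleic acid.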

The method preferably further comprises the step of ligating a second set of adaptor elements to the template nucleic acid molecule, wherein the second set of adaptor elements comprises a first primer element and a second primer element, and wherein the amplification step uses the first primer element and the sequencing step uses the second primer element.

The sequence composition of the template nucleic acid also preferably comprises the sequence composition of each sequence region from the ends of the target molecule.

In a third embodiment, which is compatible with and can be combined with the first and second embodiments disclosed above, the recombination adaptor element comprises a first recombination adaptor element and a second recombination adaptor element, both of which comprise a directional element.

Preferably, the circular nucleic acid product and the linear nucleic acid product are produced only when the directional elements of the first and second recombination adaptor elements are in the same directional relationship. The first and second recombination adaptor elements may therefore each comprise a blunt end that can be ligated to the target nucleic acid molecule in an orientation that promotes the same directional relationship of the directional elements.

The first and second recombination adaptor elements also preferably comprise an overhang that prevents the formation of adaptor concatemers. The directional element also preferably comprises a lox sequence element. The first and second recombination adaptor elements also preferably comprise palindromic sequence elements flanking both ends of the directional element.

In a fourth embodiment, which is compatible with and can be combined with the first and second embodiments disclosed above, the circular nucleic acid product comprises a first hybrid recombination adaptor and the linear nucleic acid product comprises a second hybrid recombination adaptor, wherein the first and second hybrid recombination adaptors comprise elements derived from the ligated recombination adaptors.
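As a toy model only, the dependence of the recombination products on the orientation of the directional elements can be written out symbolically. The sketch treats each molecule as a list of named segments rather than real sequence; the segment labels (`lox_hybrid_1`, `lox_hybrid_2`, and so on) are placeholders invented for this example, and the inverted case reflects the generally known behaviour of Cre on oppositely oriented lox sites rather than anything stated above.

```python
from typing import List, NamedTuple, Optional


class Products(NamedTuple):
    circular: Optional[List[str]]  # segments of the circular product, or None
    linear: List[str]              # segments of the linear product


def recombine(left_end: str, lox_a: str, target: str, lox_b: str, right_end: str) -> Products:
    """Toy model of a recombinase acting on end-lox-target-lox-end.

    `lox_a` and `lox_b` are "+" or "-", the orientations of the two
    directional elements in the adapted target molecule.
    """
    if lox_a == lox_b:
        # Same orientation: the target is excised as a circle, and each
        # product carries one hybrid site built from parts of both adaptors.
        return Products(circular=["lox_hybrid_1", target],
                        linear=[left_end, "lox_hybrid_2", right_end])
    # Opposite orientation: the target is inverted and no circle forms.
    return Products(circular=None,
                    linear=[left_end, "lox", target + "(inverted)", "lox", right_end])
```

In this model only the same-orientation case yields a circle, which is why a later exonuclease treatment of non-circular molecules would enrich for correctly adapted targets.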

The template nucleic acid preferably comprises the first hybrid recombination adaptor between the end sequence regions. Highly preferably, the template nucleic acid comprises at least one enrichment tag associated with the first hybrid recombination adaptor. The enrichment tag can be, for example, a biotin tag.

In addition, the present invention relates to a method for obtaining, in an in vitro reaction, a plurality of DNA constructs comprising two end regions of a target nucleic acid, the method comprising the following steps:

- fragmenting a large nucleic acid molecule to produce a plurality of target nucleic acid molecules,

- ligating a recombination adaptor element to each end of the target nucleic acid molecules to produce a plurality of adapted target nucleic acid molecules,

- exposing the adapted target nucleic acid molecules to a site specific recombinase to produce a plurality of circular nucleic acid products and a plurality of linear nucleic acid products from the adapted target nucleic acid molecules, wherein the circular nucleic acid products comprise the target nucleic acid molecules; and

- fragmenting the circular nucleic acid products to produce a plurality of template nucleic acid molecules comprising a sequence region from each end of the target nucleic acid molecules.

In addition, the invention provides a kit for carrying out the methods disclosed above, the kit comprising:

- a plurality of recombination adaptor elements; and

- a site specific recombinase.

The site specific recombinase is preferably Cre recombinase.

Such a kit may in particular comprise:

- a plurality of recombination adaptor elements,

- a site specific recombinase, for example Cre recombinase,

- an exonuclease; and

- circular carrier DNA, for example pUC19 DNA.

Brief description of the drawings

The following detailed description, given by way of example and not intended to limit the invention to the specific embodiments described, may be understood in conjunction with the accompanying drawings, which are incorporated herein by reference, in which:

Fig. 1 shows a schematic of one embodiment of the paired end sequencing strategy. The numerical labels indicate the origin of the nucleic acid. "101" denotes one flanking region of the capture element, for example as shown on the left side of Fig. 3A. "102" denotes the second flanking region of the capture element, for example as shown on the right side of Fig. 3A. "103" denotes the capture element. "104" denotes the fragmented (and optionally size fractionated) starting nucleic acid. "105" denotes the separator element. "106" denotes the polymerase.

Fig. 2 shows a schematic of a second embodiment of the paired end sequencing strategy.

Fig. 3 shows the sequences and design of the capture fragments. The sequence identifiers are as follows:

Paired end capture fragment product: SEQ ID NO: 1

Oligo 1: SEQ ID NO: 2

Oligo 2: SEQ ID NO: 3

Oligo 3: SEQ ID NO: 4

Oligo 4: SEQ ID NO: 5

Paired end capture fragment product (type IIS, MmeI): SEQ ID NO: 6

Short adaptor paired end capture fragment: SEQ ID NO: 7

Short adaptor paired end capture fragment (type IIS, MmeI): SEQ ID NO: 8

Fig. 4 shows an embodiment of the RE fragment.

Fig. 5 shows another embodiment of the RE fragment.

Fig. 6 shows a paired end read approach using hairpin adaptors. The hairpin adaptor has the following sequence:

  A
A/ \AAACCCG---GAATTC---AAACCCTTTCGGT---TCCAAC-3′OH
|   |||||||   ||||||   |||||||||||||   ||||||
T\ /TTTGGGC---CTTAAG---TTTGGGAAAGCCA---AGGTTG-5′PO4
  T

(SEQ ID NO: 27)

The hairpin adaptor is one continuous nucleic acid sequence, shown above divided into four regions. From left to right, the four regions are the hairpin region, a restriction endonuclease recognition site, the biotinylated region, and a type IIS restriction endonuclease recognition site. "601" denotes the hairpin adaptor. "603" denotes genomic DNA. Met denotes methylated DNA. "602" denotes a hairpin adaptor dimer. "604" denotes a hairpin adaptor cleaved by a restriction endonuclease. "605" denotes two hairpin adaptors that have been cleaved by a restriction endonuclease and religated. SA denotes a streptavidin bead. Bio denotes biotin (for example biotinylated DNA).

Fig. 7 shows a modification of the paired end method.

Fig. 8 shows a paired end read approach using overhang adaptors.

Fig. 9 shows a "tag primed" double-ended sequencing method, which is one method for sequencing the products of the invention.

Figure 10 shows circularization by adaptor ligation.

Figure 11 shows ssDNA-based circularization.

Figure 12 shows a schematic of another embodiment of the paired end sequencing strategy: Paired-Reads PET Random Fragmentation. SPRI refers to solid-phase reversible immobilization.

Figure 13 shows Paired-Reads PET Random Fragmentation sequencing data obtained from sequencing of Escherichia coli (E. coli) K12.

Figure 14 shows various methods of cleaving double-stranded DNA with E. coli endonuclease V. The boxed nucleotide "I" denotes deoxyinosine.

Figure 14A shows a method in which the nucleotide sequence of the double-stranded DNA directs double-strand cleavage by E. coli endonuclease V in a manner that produces 3' single-stranded palindromic overhangs. Note that the 3' single-stranded overhangs contain deoxyinosine residues.

Figure 14B shows a method in which the nucleotide sequence of the double-stranded DNA directs double-strand cleavage by E. coli endonuclease V in a manner that produces 3' single-stranded non-palindromic overhangs. Note that the 3' single-stranded overhangs contain deoxyinosine residues.

Figure 14C shows a method in which the nucleotide sequence of the double-stranded DNA directs double-strand cleavage by E. coli endonuclease V in a manner that produces 5' single-stranded palindromic overhangs. Note that the 5' single-stranded overhangs do not contain deoxyinosine residues.

Figure 14D shows a method in which the nucleotide sequence of the double-stranded DNA directs double-strand cleavage by E. coli endonuclease V in a manner that produces 5' single-stranded non-palindromic overhangs. Note that the 5' single-stranded overhangs do not contain deoxyinosine residues.

Figure 14E shows a method in which the nucleotide sequence of the double-stranded DNA directs double-strand cleavage by E. coli endonuclease V in a manner that produces blunt ends.

Figure 15 shows a schematic of another embodiment of the paired end sequencing strategy, in which a hairpin adaptor containing deoxyinosine on opposite strands (deoxyinosine hairpin adaptor) is cleaved on both strands by E. coli endonuclease V.

Figure 16 shows the distribution of paired read distances obtained from sequencing E. coli K12 genomic DNA using the deoxyinosine hairpin adaptor method described in Figure 15.

Figure 17 shows a schematic of another embodiment of the paired end sequencing method of the invention. The nucleotide sequences of the hairpin adaptor, the paired end adaptors ("A" and "B"), and the PCR primers "F-PCR" and "R-PCR" are shown in Figure 18. Each paired end adaptor has double-stranded and single-stranded portions, as shown in Figure 18. "Bio" denotes biotin. "Met" denotes a methylated base. "SA bead" denotes a streptavidin-coated microparticle. "EcoRI" and "MmeI" denote the recognition sites of the restriction endonucleases EcoRI and MmeI, respectively.

Figure 18 shows the nucleotide sequences and modifications of the adaptor and primer oligonucleotides shown in Figure 17. Figure 18A shows the hairpin adaptor sequence. "iBiodT" denotes an internal biotin-labeled deoxythymidine. "Bio" denotes biotin. "EcoRI" and "MmeI" denote the recognition sites of the restriction endonucleases EcoRI and MmeI, respectively.

Figure 18B shows the nucleotide sequences of the paired end adaptors and PCR primers. Each paired end adaptor ("A" and "B") is produced by annealing two single-stranded oligonucleotides: "A top strand" and "A bottom strand", or "B top strand" and "B bottom strand". The 5' ends of the polynucleotide sequences shown in Figure 18B are not phosphorylated.

Figure 19 shows a schematic of one embodiment of a method for ligating polynucleotides in a water-in-oil emulsion.

Figure 20 shows plots of the depth of coverage of E. coli K12 genomic DNA derived from paired end sequencing data obtained with and without carrier DNA containing MmeI sites.

Figure 21 shows a schematic of one embodiment of a method for a recombination-based paired end strategy.

Figure 22 shows one embodiment of the adaptors used in the recombination-based strategy of Figure 21 and the adaptor products generated from them. SEQ ID NOs: 57-64 are described herein in order of appearance.

Figure 23 shows a schematic of the products of the recombination-based strategy of Figure 21 as a function of adaptor directionality.

Figure 24 shows the distribution of paired read distances obtained from E. coli K12 genomic DNA using, at least in part, the recombination-based method described in Figure 21.

Figure 25 shows a schematic of the advantages provided by the sequence information obtained from long paired end fragments produced with the recombination-based method described in Figure 21.

Detailed Description Of The Invention

Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although a variety of methods and materials similar or equivalent to those described herein can be used in the practice of the invention, the preferred materials and methods are described herein.

The invention relates to a rapid, cost-effective method for isolating and sequencing the two ends of a large nucleic acid fragment. The method is fast, is amenable to automation, and allows large DNA fragments to be sequenced and linked.

Compared with conventional clone-by-clone shotgun sequencing, paired end sequencing has several important advantages and in practice complements clone-by-clone shotgun sequencing. The most important of these advantages is the ability to rapidly produce large genomic scaffolds, even when the genome is interspersed with repeat elements. The method of the invention can be used to produce, in an in vitro reaction, a library of DNA fragments that contain the ends of larger DNA fragments. By exploiting a paired distance spacing of 10 kb or more between these ends, the method of the invention can be used to assemble whole genome scaffolds with minimal sequencing effort.

First method

In one embodiment, paired end sequencing can be carried out according to the following steps:

Step 1A

The starting material can be any nucleic acid, including, for example, genomic DNA, cDNA, RNA, PCR products, episomes, and the like. Although the method of the invention is especially effective on long nucleic acid starting material, the invention is also applicable to small nucleic acids such as cosmids, plasmids, small PCR products, mitochondrial DNA, and the like.

The DNA can be from any source. For example, the DNA can be from the genome of an organism whose DNA sequence is unknown or incompletely known. As another example, the DNA can be from the genome of an organism whose DNA sequence is known. Sequencing genomic DNA of known sequence allows researchers to collect data on genomic polymorphisms and to correlate genotypes with disease.

The nucleic acid starting material can be of a known size or a known size range. For example, the starting material can be a cDNA library or genomic library in which the average insert size and distribution are known.

Alternatively, the nucleic acid starting material is fragmented (Fig. 1A) by any of a number of common methods, including nebulization, sonication, hydrodynamic shearing (HydroShear), ultrasonic fragmentation, enzymatic cleavage (for example DNase treatment, including limited DNase treatment; RNase treatment, including limited RNase treatment; and digestion with restriction endonucleases), use of a prefragmented library (for example a cDNA library), chemically induced fragmentation (for example with NaOH), heat-induced fragmentation, and transposon-mediated mutagenesis, which can introduce cleavage sites, for example restriction endonuclease cleavage sites, throughout the DNA sample. See Goryshin I.Y. and Reznikoff W.S., J Biol Chem. 1998 Mar 27; 273(13):7367-74; Reznikoff W.S. et al., Methods Mol Biol. 2004; 260:83-96; Oscar R. et al., Journal of Bacteriology, April 2001, vol. 183, no. 7, pp. 2384-2388; Pelicic, V. et al., Journal of Bacteriology, August 2000, vol. 182, pp. 5391-5398.

Some fragmentation methods (for example nebulization) produce a population of target DNA fragments whose sizes differ by only a factor of 2. Other fragmentation methods (for example restriction enzyme digestion) produce a wider range of sizes. If large nucleic acid fragments are needed, still other methods (for example hydrodynamic shearing) may be advantageous. In hydrodynamic shearing (HydroShear; Genomic Solutions, Ann Arbor, MI, USA), DNA in solution is passed through a tube with an abrupt contraction. As the solution approaches the contraction, the fluid accelerates to maintain the volumetric flow rate through the smaller area of the contraction. During this acceleration, drag forces stretch the DNA until it snaps. The DNA continues to fragment until the pieces are too short for the shearing forces to break chemical bonds. The flow rate of the fluid and the size of the contraction determine the final DNA fragment sizes. Other methods for preparing nucleic acid starting material can be found in international patent application WO04/070007, the entire contents of which are incorporated herein by reference.

Depending on the fragmentation method used, the DNA ends may need polishing. That is, the double-stranded DNA ends may need to be treated to make them blunt and suitable for ligation. This step varies with the fragmentation method in ways known in the art. For example, mechanically sheared DNA can be polished with Bal31 to cleave overhangs, and a polymerase such as Klenow or T4 polymerase, together with dNTPs, can be used to fill in the ends and produce blunt ends.

Step 1B

When the fragment sizes vary more than desired, the nucleic acid fragments can be size fractionated to reduce the variation.

Size fractionation is an optional step that can be carried out by a number of methods known in the art. Methods for size fractionation include gel methods (for example pulsed-field gel electrophoresis), sedimentation through a sucrose or cesium chloride gradient, and size exclusion chromatography (gel permeation chromatography). The choice of size range depends on the length of the region to be spanned by paired end sequencing.

One preferred technique for size fractionation is gel electrophoresis (see Fig. 1B). In a preferred embodiment, the size fractionated DNA fragments have a size distribution within 25% of each other. For example, a 5 kb size fraction may comprise 5 kb +/- 1 kb (that is, fragments of 4 kb to 6 kb), and a 50 kb size fraction may comprise 50 kb +/- 10 kb (that is, fragments of 40 kb to 60 kb).
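Purely as an arithmetic illustration, the two example size fractions can be reproduced by taking a window of plus or minus 20% around the nominal size, which is what 5 kb +/- 1 kb and 50 kb +/- 10 kb amount to. The function names, the default fraction, and the fragment lengths are assumptions for this sketch; in the laboratory the selection is made on a gel, not in software.

```python
def size_window(nominal_bp: int, fraction: float = 0.20) -> tuple:
    """Return (low, high) bounds in bp for a size fraction around `nominal_bp`."""
    half_width = round(nominal_bp * fraction)
    return nominal_bp - half_width, nominal_bp + half_width


def select_fraction(lengths, nominal_bp: int, fraction: float = 0.20) -> list:
    """Keep only the fragment lengths that fall inside the size window."""
    low, high = size_window(nominal_bp, fraction)
    return [n for n in lengths if low <= n <= high]


# Hypothetical fragment lengths in bp.
kept = select_fraction([3500, 4200, 5000, 5900, 6100], nominal_bp=5000)
```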

Step 1C

In this step, the "capture element" is prepared. The capture element is a linear double-stranded nucleic acid that can have single-stranded or double-stranded ends for ligation to the nucleic acid fragments from the previous step. The capture element can be propagated as a circular nucleic acid (for example the plasmid depicted in Fig. 1C) that contains forward and reverse adaptor ends (drawn as the thick-lined regions of the circle in Fig. 1C). The capture element can be used after this circular plasmid has been cleaved. These adaptor ends contain nucleotide sequences that can serve as hybridization sites for PCR primers and sequencing primers in subsequent steps.

Between the two adaptor ends, the capture element can comprise other elements, for example restriction endonuclease recognition and/or cleavage sites, antibiotic resistance markers, prokaryotic or eukaryotic origins of replication, or combinations of these elements. Examples of such antibiotic resistance markers include, without limitation, genes conferring resistance to ampicillin, tetracycline, neomycin, kanamycin, streptomycin, bleomycin, zeocin, chloramphenicol, and the like. Prokaryotic origins of replication can include, among others, OriC and OriV. Eukaryotic origins of replication can include, but are not limited to, autonomously replicating sequences (ARS). In addition, the capture element can contain restriction endonuclease recognition and/or cleavage sites (preferably unique, rare sites) that can be used to digest the subsequent nucleic acid product (step L) into small fragments that can be amplified (by PCR). The capture element can also comprise a label or tag, for example biotin, to facilitate purification or enrichment of the nucleic acids used for paired end sequencing.

Step 1D

The capture element is linearized using known techniques, for example restriction endonuclease digestion (blunt or sticky ends can be used for fragments produced in different ways; see below and Fig. 1D). To prevent concatemer formation (that is, ligation of multiple capture elements to one another), the capture element can be dephosphorylated, or modified with topoisomerase or for TA cloning.

Step 1E

The capture element is ligated to the fragments (or size fractionated fragments) of step A or B to form a circular nucleic acid comprising one capture element and one target DNA fragment (Fig. 1E). The capture element is ligated to the target DNA by known methods, for example using DNA ligase or a topoisomerase cloning strategy.

Step 1F

The preceding step yields a collection of capture elements ligated to DNA fragments that may be of considerable size. This step is used to remove the large internal region of the target DNA fragments, producing inserts of a size more suitable for cloning for automated DNA sequencing (Fig. 1F).

In this step, the captured genomic DNA (i.e., the circular nucleic acid produced in step E) is digested with one or more restriction endonucleases, which can have one or more cleavage sites in the genomic DNA. In general, any restriction endonuclease can be used for "internal cleavage", provided that it does not cut within the capture element. Internal cleavage refers to cleavage within the target DNA that does not cut the capture element. The capture element can be designed so that it lacks cleavage sites for the chosen restriction endonucleases, which makes those enzymes available as internal-cleavage enzymes. Restriction endonucleases and their uses are well known in the art and are readily applied to the method of the invention. In addition, a combination of several restriction enzymes, each restricted to internal cleavage, can be used to further reduce the size of the target DNA fragments.
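The selection rule for internal-cleavage enzymes can be illustrated with a minimal sketch. It assumes exact, unambiguous recognition sites and a made-up capture element sequence; the function names are hypothetical and the enzyme list is only a small illustrative set.

```python
# Illustrative sketch: keep only enzymes that have no recognition site
# in the capture element, so that they cut the target DNA only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def internal_cutters(capture_element: str, enzymes: dict[str, str]) -> list[str]:
    """Names of enzymes with no site on either strand of the capture element."""
    forward = capture_element.upper()
    reverse = reverse_complement(forward)
    return [
        name
        for name, site in enzymes.items()
        if site not in forward and site not in reverse
    ]


# Recognition sites of four common enzymes.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
}

# Hypothetical capture element that happens to contain an EcoRI site.
capture = "ATGCGAATTCTTAGC"
usable = internal_cutters(capture, ENZYMES)
print(usable)
```

In this example EcoRI is excluded because its site occurs in the capture element; the remaining enzymes could be used alone or in combination for internal cleavage.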

In a preferred embodiment, the genomic DNA is cut by one or more of these restriction endonucleases to within 50-150 bases of the capture element.

Step 1G

In this step, a "separator element", which is a double-stranded nucleic acid of known sequence, is ligated between the ends of the digested genomic material from the previous step to form a circular nucleic acid (Fig. 1G). The separator element serves two purposes. First, it can comprise a priming site for rolling circle amplification of the minicircle (see step I below). Second, because its sequence is known, it can serve as an identifier marking each end of the paired genomic ends (enabling trimming and facilitating software analysis of the joined ends). That is, during subsequent sequencing of the genomic fragment, the sequence of the separator element signals that the entire genomic fragment has been sequenced. The separator element can also comprise other elements, for example restriction endonuclease recognition and/or cleavage sites, antibiotic resistance markers, prokaryotic or eukaryotic origins of replication, or combinations of these elements. Although elements such as antibiotic resistance markers and origins of replication may optionally be present, one advantage of the method of the invention is that it does not require host cells (e.g., E. coli) for cloning, amplification or other manipulation of nucleic acids. The separator element can also be biotinylated, or tagged with a label or tag, to facilitate purification or enrichment of nucleic acids for paired end sequencing.

Step 1H

The circular nucleic acid (i.e., the minicircle) produced in the previous step is rendered single-stranded. This can be done using standard DNA denaturation techniques, by changing the salt concentration, temperature or pH of the solution. Other DNA denaturation techniques are known to those skilled in the art. After denaturation, the DNA circles derived from the same minicircle may remain interlinked, but this does not affect the method of the invention (Fig. 1H).

Step 1I

A primer is annealed to the separator element, which comprises a sequence to which the primer can anneal. The separator sequence thus serves as the initiation site for rolling circle amplification (Fig. 1I).

Step 1J

The sample is amplified by rolling circle amplification to produce long single-stranded products (Fig. 1J). One advantage of this rolling circle amplification step is that elements lacking a separator element cannot be amplified, and elements that are not closed circles are difficult to amplify.

Step 1K

One or more capping oligonucleotides (capping oligos) are annealed to the single-stranded restriction sites flanking the forward and reverse adapters, rendering these regions double-stranded (Fig. 1L). The capping oligonucleotides can be at least partially complementary to the capture element, to at least part of the adapter region, or to both.

Step 1L

The capped single-stranded DNA is cut into small fragments at the capped sites (Fig. 1M). These small fragments have ends of known sequence and can readily be amplified using conventional amplification techniques (e.g., PCR).

Second method

In a second embodiment, the paired end sequencing method can be carried out by the following steps:

Step 2A - Fragmentation of sample DNA

Fragmentation and size fractionation of the target nucleic acid are the same as in the preceding embodiment.

Step 2B - Methylation and end polishing

If desired, the fragmented target nucleic acid can be methylated with any methylase. Preferred methylases are those that affect restriction endonuclease digestion. Methylases can be used in at least two different strategies. In one preferred embodiment, methylation enables cleavage by a restriction endonuclease that cuts only at methylated restriction sites. In another preferred embodiment, methylation prevents cleavage by a restriction endonuclease that cuts only unmethylated DNA.

The end polishing step is the same as that described for the first method.

Step 2C - Ligation of tag adapters

In this step, adapters are ligated to the ends of the target nucleic acid fragments (Fig. 2 I), producing fragments with an adapter at both ends. Adapters can be of any size, but are preferably 10-30 bases, more preferably 12-15 bases. To prevent formation of concatemers of adapters and/or target nucleic acid fragments, the adapter can comprise one blunt end and one incompatible cohesive end (i.e., an end with a 5' overhang or a 3' overhang). After the adapters have been ligated to the DNA fragments, the ligase is removed and the cohesive ends are filled in with polymerase and dNTPs.

The adapter of this section can be a capture fragment. Examples of capture fragments are shown in Fig. 4 and Fig. 5.

To prevent concatemer formation, the adapter can be a hairpin adapter (Fig. 6A). Use of hairpin adapters (e.g., Fig. 6) prevents concatemer formation because hairpin adapters cannot form any multimer larger than a dimer. Another way to prevent concatemers is to use adapters in which the 5' end of one or both strands is not phosphorylated.

Other adapters that can be used include non-phosphorylated adapters, which have the advantage of requiring fewer processing steps but still require a phosphorylation step using a kinase.

As discussed elsewhere in this disclosure, the adapter can be methylated, biotinylated, or both.

Step 2D - Exonuclease digestion and gel purification

DNA fragments ligated to two hairpin adapters can be purified using an exonuclease. This exonuclease purification exploits the fact that double-stranded DNA ligated to a hairpin adapter at both ends is a DNA molecule with no exposed 5' or 3' end. Other DNA in the ligation mixture, such as double-stranded DNA fragments ligated to only one hairpin adapter, unligated DNA fragments and unligated adapters, is susceptible to exonuclease (Fig. 6B). Exposing the ligation mixture to exonuclease therefore removes most DNA, except DNA fragments ligated to two hairpin adapters and hairpin adapter dimers. Because hairpin adapter dimers are significantly smaller than the DNA fragments, they can be removed by known techniques, for example a size fractionation column (e.g., a spin column), agarose or acrylamide gel electrophoresis, or any other polynucleotide size discrimination method known in the art and/or discussed elsewhere in this disclosure.
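The logic of this two-stage purification can be sketched as a simple bookkeeping model. The sketch is an assumption-laden illustration, not a description of reaction kinetics: each molecule is reduced to its length and the number of hairpin-capped ends, and the class name, molecule lengths and the 500 bp size cutoff are all hypothetical.

```python
# Toy model of exonuclease digestion followed by size fractionation.
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    length_bp: int
    capped_ends: int  # number of ends closed by a hairpin adapter (0, 1 or 2)


def survives_exonuclease(molecule: Molecule) -> bool:
    """A molecule resists exonuclease only if neither end is exposed."""
    return molecule.capped_ends == 2


def purify(mixture: list[Molecule], min_length_bp: int) -> list[Molecule]:
    """Exonuclease treatment, then removal of short species such as adapter dimers."""
    resistant = [m for m in mixture if survives_exonuclease(m)]
    return [m for m in resistant if m.length_bp >= min_length_bp]


# Hypothetical ligation mixture.
mixture = [
    Molecule("fragment + 2 hairpins", 3000, 2),
    Molecule("fragment + 1 hairpin", 3000, 1),
    Molecule("unligated fragment", 3000, 0),
    Molecule("free hairpin adapter", 40, 1),
    Molecule("hairpin adapter dimer", 80, 2),
]

after_exonuclease = [m.name for m in mixture if survives_exonuclease(m)]
purified = [m.name for m in purify(mixture, min_length_bp=500)]
print(after_exonuclease)
print(purified)
```

The adapter dimer survives the exonuclease because both of its ends are capped, which is why the size fractionation step is still needed.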

In one embodiment, the adapter can be biotinylated to facilitate separation/enrichment of tag-bearing fragments.

In another embodiment, fragments containing the adapter can be purified by annealing to them a capture oligonucleotide complementary to the tag sequence.

Step 2E - Preparation of fragments for circularization

After adapters have been added to both ends of the target nucleic acid fragment, the fragment is circularized.

To prepare the target nucleic acid for self-circularization, it may be necessary to cut the adapter region, for a variety of reasons. For example, if hairpin adapters are used, the DNA fragment cannot self-circularize because it has no free 5' or 3' end. As another example, if the adapter leaves the DNA fragment with blunt ends, cutting can give the adapter a 5' overhang or a 3' overhang, and such overhangs (so-called "sticky ends") greatly increase the efficiency of ligation. In addition, digestion of the adapter region allows selection of DNA fragments bearing two adapters (one ligated at each end). This is because the adapter can be designed so that cleavage with a restriction endonuclease leaves compatible cohesive ends. After cutting within the adapter region, a DNA fragment with only one adapter (the undesired type) will have one cohesive end and one blunt end and will be unlikely to self-circularize. Therefore, only DNA fragments with adapters at both ends can circularize.

Restriction cleavage of the adapter can be accomplished in several ways. In one approach, the adapter is methylated and ligated to unmethylated DNA. The construct is then digested with a restriction endonuclease that cuts only methylated DNA. Because only the adapter is methylated, only the adapter is cut.

In another approach, the DNA fragment can be methylated while the adapter is unmethylated. Cleavage can be restricted to the adapter by cutting with a restriction endonuclease that recognizes and cuts only unmethylated DNA. This can be achieved by using starting DNA that is already methylated or that has been methylated in vitro.

It is understood that in some cases the adapter need not be digested. For example, if the fragments from the above steps comprise only blunt ends, digestion of the adapter is optional.

It is also understood that the DNA fragments can be treated to facilitate ligation/circularization. For example, if the adapter is blocked or lacks a 5' phosphate, the blocking group can be removed, or a phosphate can be added, so that the fragment is readily ligated.

Step 2F - Ligation of ends to form circularized fragments

Several different methods can be used for cyclisation.

In one embodiment, ligase is added to a reaction mixture with a suitable ligase buffer, allowing the DNA fragments to recircularize.

In one embodiment, ligation is performed at dilute DNA concentration to promote self-ligation and to discourage concatemer formation.

In another embodiment, ligation is performed in a water-in-oil emulsion, as described elsewhere in this disclosure, in which each aqueous droplet contains about one fragment to be circularized.

In one embodiment, a signature tag is ligated to the target nucleic acid fragment and the fragment is self-circularized (see Fig. 2). The signature tag is a double-stranded nucleic acid sequence of 24-30 base pairs. This "signature tag" is similar to the "separator element" of the embodiment above in that it can serve as an identifier marking each end of the paired genomic ends (enabling trimming and facilitating software analysis of the joined ends). During subsequent sequencing of the genomic fragment, the signature tag sequence marks the boundary between the two ends of the target nucleic acid sequence.
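The use of a known tag as a boundary marker in software can be sketched as follows. This is a minimal illustration under stated assumptions: the 24-base tag sequence is invented, only exact matches on the forward strand are considered, and the function name is hypothetical. Real reads would also require handling of sequencing errors and reverse-complemented tags.

```python
# Split a read into the two target ends that flank a known signature tag.
from typing import Optional

# Hypothetical 24-base signature tag (not taken from the disclosure).
SIGNATURE_TAG = "ACGTTGCATGCAGTCCGATAGCTA"


def split_at_tag(read: str, tag: str = SIGNATURE_TAG) -> Optional[tuple[str, str]]:
    """Return (left_end, right_end) with the tag trimmed out, or None if the tag is absent."""
    read, tag = read.upper(), tag.upper()
    start = read.find(tag)
    if start == -1:
        return None  # read does not span the junction
    return read[:start], read[start + len(tag):]


read_with_tag = "TTTTCCCC" + SIGNATURE_TAG + "GGGGAAAA"
read_without_tag = "TTTTCCCCGGGGAAAA"
print(split_at_tag(read_with_tag))
print(split_at_tag(read_without_tag))
```

Reads for which the function returns None correspond to fragments that lack the tag and therefore carry no paired end information.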

Step 2G

After the signature tag has been added and the fragment circularized, the target nucleic acid fragment is further digested or fragmented. Fragmentation can be performed by any of the fragmentation methods given in this disclosure; see, for example, step 1A above. Alternatively, the target DNA can be digested with one or more restriction endonucleases to produce fragments.

In a preferred embodiment, the nucleic acid is fragmented with a nebulizer until the average fragment size is about 200-300 bp. As shown in Fig. 2, some of these fragments will contain the signature tag, while others will not.

At this point, the nucleic acid fragments can be sequenced by standard techniques. Methods for sequencing nucleic acid fragments are known. One preferred sequencing method is described in International Patent Application WO 05/003375, filed January 28, 2004.

Step 2H

In an optional step, fragments containing the signature tag are enriched from fragments lacking it. One method of enrichment involves using a biotinylated signature tag in the sample preparation step. After fragmentation, the fragments containing the signature tag are biotinylated and can be purified using a streptavidin column or streptavidin beads in solution.

After enrichment, the nucleic acid fragments can be sequenced by standard techniques, including automated techniques such as those described in International Patent Application WO 05/003375, filed January 28, 2004.

Third method

Paired end sequencing can also be carried out by a third method.

Step 3A-3E

In this method, steps A through E can be carried out as described for the second method (i.e., as in steps 2A-2E). In addition, in the third method each adapter comprises a type IIS restriction endonuclease site, which can direct cleavage of the DNA about 15-25 bp from the restriction endonuclease recognition site. Different type IIS restriction endonucleases are known to cut at different distances from the recognition site, and it is contemplated that different type IIS restriction endonucleases can be used to adjust this distance.

Step 3F - Ligation of ends to form circularized fragments

Step 3F can be carried out as in the second method (step 2F), except that no signature tag is used (see Fig. 6D).

Optional enrichment step

In any method of the invention, exonuclease can be used after ligation to remove uncircularized fragments and to reduce the presence of concatemerized fragments. Because properly recircularized DNA fragments have no exposed 5' or 3' end, they resist exonuclease digestion. In addition, larger concatemers are more likely to have an exposed 5' or 3' end owing to nicks. Exonuclease treatment therefore also removes these nicked concatemers.

Optional rolling circle amplification

Circularized DNA can be amplified by rolling circle amplification. Briefly, an oligonucleotide is hybridized to one strand of the recircularized DNA. This oligonucleotide primer is extended by a polymerase. Because the template is a circle, the polymerase produces a single-stranded concatemer containing multiple tandem repeats of the target DNA. This single-stranded concatemer can be made double-stranded by hybridizing a second primer to it and extending from that second primer. For example, the second primer can be complementary to the adapter sequence of the single-stranded concatemer. The resulting double-stranded concatemer can be used directly in the next step.

Step 3G - Digestion/fragmentation of DNA

In this step, the circularized nucleic acid, or the concatemer nucleic acid obtained by rolling circle amplification, is digested with a type IIS restriction endonuclease (Fig. 6D). As described in step 3A, each adapter contains at least one type IIS restriction endonuclease site. The type IIS restriction endonuclease recognizes this site on the adapter and cuts the nucleic acid about 10-20 base pairs away. Examples of type IIS restriction endonucleases include MmeI (about 20 bp), EcoP15I (25 bp) and BpmI (14 bp).
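The effect of the enzyme choice on the resulting tag can be illustrated with a simplified sketch. It assumes the circular junction is represented as a linear string with the adapter in the middle, treats the cut as blunt, and measures the reach from the adapter boundary rather than from the exact recognition site, so it only approximates the real cleavage geometry. The reach values are those given in the text; the function name and sequences are hypothetical.

```python
# Approximate reach of each enzyme into the flanking target DNA, in bp (from the text).
REACH_BP = {"MmeI": 20, "EcoP15I": 25, "BpmI": 14}


def paired_end_tag(construct: str, adapter: str, enzyme: str) -> str:
    """Return target end + adapter + target end as released by digestion on both sides."""
    reach = REACH_BP[enzyme]
    start = construct.find(adapter)
    if start == -1:
        raise ValueError("adapter not found in construct")
    end = start + len(adapter)
    left = construct[max(0, start - reach):start]   # tag from one end of the target
    right = construct[end:end + reach]              # tag from the other end
    return left + adapter + right


# Hypothetical junction: 30 bases of each target end around a made-up adapter.
adapter = "GGGCCC"
construct = "A" * 30 + adapter + "T" * 30

for enzyme in REACH_BP:
    tag = paired_end_tag(construct, adapter, enzyme)
    print(enzyme, len(tag))
```

A longer reach yields longer end tags, which is the reason the text contemplates choosing among different type IIS enzymes.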

This step produces short DNA fragments (10-100 bp) that comprise the two ends of the larger DNA fragment with the adapter region between them (Fig. 6E). An alternative way to produce the same structure is to randomly fragment the circularized nucleic acid using any of the DNA fragmentation methods described elsewhere in this disclosure (e.g., as described in step 1A). This allows fragments of any size to be prepared (100 bp, 150 bp, 200 bp, 250 bp, 300 bp or more).

With either method, other DNA fragments lacking the adapter region in the middle can also be produced (Fig. 6E). However, because the adapter region is biotinylated, DNA comprising the adapter region can be selectively purified using a solid support with affinity for biotin, such as streptavidin beads, avidin beads, BCCP beads and the like.

Step 3H - Sequencing

Any product of the method of the invention can be sequenced manually or by automated sequencing techniques. Manual sequencing by methods such as Sanger sequencing or Maxam-Gilbert sequencing is well known. Automated sequencing can be performed, for example, using an automated sequencing method such as 454 Sequencing™, developed by 454 Life Sciences Corporation (Branford, CT), which is also described in application WO 05/003375, filed January 28, 2004, and in the co-pending U.S. patent applications USSN 10/767,779, filed January 28, 2004; USSN 60/476,602, filed June 6, 2003; USSN 60/476,504, filed June 6, 2003; USSN 60/443,471, filed January 29, 2003; USSN 60/476,313, filed June 6, 2003; USSN 60/476,592, filed June 6, 2003; USSN 60/465,071, filed April 23, 2003; and USSN 60/497,985, filed August 25, 2003.

Briefly, in an automated sequencing method (such as the sequencing method developed by 454 Life Sciences Corp.), one sequencing adapter (sequencing adapter A) can be ligated to one end of the DNA fragment, and a second sequencing adapter (sequencing adapter B) can be ligated to the second end of the DNA fragment. After ligation, the DNA fragments can be purified away from any unligated sequencing adapters by binding the biotin to a solid support. The isolated nucleic acid fragments can be placed in individual reaction wells and further amplified by PCR using primers specific for sequencing adapter A and sequencing adapter B. Single-stranded DNA consisting preferentially of A-B fragments can be isolated by means of a biotin moiety attached to either the A or the B adapter. The amplified nucleic acid can be sequenced using a sequencing primer specific for sequencing adapter A, for sequencing adapter B, or for the adapter between the two ends (e.g., the hairpin adapter).

Once a large number of these fragments comprising the ends of larger DNA fragments have been produced, they can be sequenced, and the paired end sequence information can be assembled to produce a partial or complete sequence map of the genome.

Fourth method

Paired end sequencing can be carried out by an alternative to the methods above, namely the method referred to as paired-read PET random fragmentation, shown in Fig. 12. Experimental results obtained with this fourth method are shown in Fig. 13.

Step 4A-4E

In this method, steps A through D can be carried out as described for the second or third method (i.e., as in steps 2A-2D or steps 3A-3D). Alternatively, step 4D can be performed using SPRI (solid phase reversible immobilization) to purify the exonuclease-treated fragments. For example, the nucleic acid fragments in Fig. 12 are ligated to a biotinylated primer and can be purified using beads coated with, for example, streptavidin, avidin, low-affinity streptavidin or low-affinity avidin.

Step 4E can be carried out as described for step 2E or step 3E.

Step 4F can be carried out as described for step 3F. Briefly, the linear DNA fragments produced in the previous step can be circularized by any known circularization method, as described for step 2F or step 3F.

In addition, the optional enrichment step described above under step 3F can be performed to enrich for circular nucleic acids. Briefly, uncircularized nucleic acids can be removed with an exonuclease that degrades nucleic acids having free ends. Covalently closed circular nucleic acids have no free ends and resist exonuclease attack. Exonuclease treatment therefore enriches circular nucleic acids while removing linear nucleic acids.

Step 4G

After circularization, fragmentation can be performed by any of the fragmentation methods listed in this disclosure. A preferred method is to fragment the circular nucleic acid by mechanical shearing. For example, mechanical shearing can be performed by vortexing, by forcing the nucleic acid in solution through a small orifice, or by other similar methods described elsewhere in this disclosure. One advantage of mechanical shearing is that it produces nucleic acids of various lengths (see the nucleic acids after step G in Fig. 12).

DNA fragments lacking the adapter region in the middle are also produced; see Fig. 12. However, because the adapter region is biotinylated, DNA comprising the adapter region can be selectively purified using a solid or semi-solid support with affinity for biotin (e.g., streptavidin beads, avidin beads, BCCP beads and the like).

Step 4H

The products of method 4 can be sequenced by any available manual or automated method. Details of such methods are given in step 3H above.

The paired-read PET random fragmentation method shown in Fig. 12 and described above offers several advantages. First, method 4 provides high confidence in assembly, because mechanical shearing can produce longer fragments, which in turn allow longer reads. Longer reads give higher confidence in the assembly of the target sequence. Second, the longer fragments made possible by mechanical shearing yield paired end reads that span longer regions of nucleic acid. By spanning longer regions, method 4 aids gap closure and has a higher probability of spanning regions that are difficult to analyze. Such difficult regions can be, for example, repeat regions or regions of high GC content. Method 4 thus offers improved gap closure performance. Third, because method 4 provides gap closure capability while each end can also be used to build the assembly, this method can be used on its own to sequence a complete genome.

An example of the advantages of method 4 is shown in Fig. 13. Fig. 13 depicts E. coli K12 genomic DNA sequenced using method 4. As can be seen, with this method a distribution of longer read lengths, ranging from less than 50 to about 400, is clearly achievable. In addition, fragments about 3 kb in length can be produced and their ends sequenced. This shows that method 4 provides better gap closure performance than other methods.
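How a paired end spanning two contigs constrains the size of the gap between them can be shown with a short worked calculation. This is a generic scaffolding sketch, not a procedure stated in this disclosure: the function name, the coordinate convention and all numbers other than the approximately 3 kb fragment length are illustrative assumptions.

```python
def estimate_gap_bp(fragment_len_bp: int,
                    left_contig_len_bp: int,
                    left_read_start_bp: int,
                    right_read_end_bp: int) -> int:
    """Estimate the gap between two contigs bridged by one paired end.

    left_read_start_bp: 0-based position on the left contig where the fragment begins.
    right_read_end_bp: number of bases from the start of the right contig
                       to the far end of the fragment.
    """
    covered_on_left = left_contig_len_bp - left_read_start_bp
    return fragment_len_bp - covered_on_left - right_read_end_bp


# Hypothetical example: a 3 kb fragment starts 1000 bp before the end of a
# 10 kb contig and ends 800 bp into the next contig.
gap = estimate_gap_bp(
    fragment_len_bp=3000,
    left_contig_len_bp=10000,
    left_read_start_bp=9000,
    right_read_end_bp=800,
)
print(gap)
```

Because fragment lengths vary, a single pair gives only a rough estimate; in practice many pairs spanning the same gap would be combined.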

Fifth method

Paired end sequencing can be carried out by an alternative to the methods above, as shown in Fig. 15.

In this method, the adapter can be designed as a deoxyinosine hairpin adapter, which incorporates deoxyinosine nucleotides (also referred to herein as inosine) on opposite strands of the double-stranded region of the hairpin. E. coli endonuclease V (EndoV) introduces a single-strand cut (nick) between the 2nd and 3rd nucleotides 3' of the inosine nucleotide (Yao M and Kow YW, J Biol Chem. 1995, 270(48): 28609-16; Yao M and Kow YW, J Biol Chem. 1994, 269(50): 31390-6; Yao M et al., Ann N Y Acad Sci. 1994, 726: 315-6; Yao M et al., J Biol Chem. 1994, 269(23): 16260-8).
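The position of the EndoV nick relative to the inosine can be made concrete with a small sketch. It assumes a single strand written 5' to 3' with the inosine represented by the letter "I"; the function name and the example sequence are hypothetical.

```python
def endov_nick(strand_5_to_3: str) -> tuple[str, str]:
    """Split a strand at the nick between the 2nd and 3rd nucleotides 3' of the first inosine."""
    inosine = strand_5_to_3.find("I")
    if inosine == -1:
        raise ValueError("strand contains no inosine")
    cut = inosine + 3  # index of the 3rd nucleotide 3' of the inosine
    if cut >= len(strand_5_to_3):
        raise ValueError("fewer than three nucleotides 3' of the inosine")
    # Upstream piece keeps the inosine plus the two nucleotides that follow it.
    return strand_5_to_3[:cut], strand_5_to_3[cut:]


upstream, downstream = endov_nick("ACGIATGCC")
print(upstream, downstream)
```

Applying the same rule to each strand of the hairpin duplex gives the two nick positions, and their relative offset determines the kind of overhang that results.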

As shown in Fig. 14, the relative arrangement of the inosines in the hairpin adapter determines whether cleavage of both strands by EndoV produces a 3' single-stranded overhang (Fig. 14A and Fig. 14B), a 5' single-stranded overhang (Fig. 14C and Fig. 14D) or a blunt end (no overhang) (Fig. 14E). The sequence of the hairpin adapter can also be designed so that EndoV cleavage produces a non-palindromic (Fig. 14A and Fig. 14B) or palindromic (Fig. 14A and Fig. 14C) single-stranded overhang. It is well known in the art that deoxyinosine will pair with any of the four bases A, G, C and T, and with itself (Watkins and SantaLucia, 2005, Nucleic Acids Res. 33(19): 6258-67). In addition, the adapter can contain a type IIS restriction endonuclease recognition site (e.g., MmeI) as described elsewhere in this disclosure.

Step 5A (Figure 15 step A)

In this method, step A can be carried out essentially as described for step 1A. The target DNA can be fragmented by any physical or biochemical method known in the art, as described above. The resulting fragments can optionally be size fractionated by any of the size fractionation methods described elsewhere in this disclosure.

Step 5B and 5C (Figure 15 step B+C)

The ends of the target DNA can be polished by any of the polishing methods described herein, and can be ligated to the deoxyinosine hairpin adapters described above to form adapter-tagged target DNA.

Step 5D (Figure 15 step D)

The ligation reaction can be treated with one or more exonucleases (as discussed elsewhere herein) and size fractionated by any of the methods described herein to enrich for the desired reaction product.

Step 5E (Figure 15 step E)

The adapter-tagged target nucleic acid is cut with EndoV. The conditions for the cleavage reaction can be any of those disclosed by Yao et al. (Yao M and Kow YW, J Biol Chem. 1995, 270(48): 28609-16; Yao M and Kow YW, J Biol Chem. 1994, 269(50): 31390-6; Yao M et al., Ann N Y Acad Sci. 1994, 726: 315-6; and Yao M et al., J Biol Chem. 1994, 269(23): 16260-8). The skilled artisan will appreciate that similar conditions can also be used.

Step 5F-H (Figure 15 step F-H)

In the fifth method, steps F-H can be carried out as described for the second, third or fourth method (i.e., as in steps 2F-H, steps 3F-H or steps 4F-H).

The deoxyinosine hairpin adapter of the fifth method is advantageous because EndoV cuts only in the presence of inosine, certain DNA lesions, or base mismatches. The target nucleic acid will therefore not be cut by EndoV treatment. Consequently, when the EndoV site is unique to the adapter, the target DNA does not need to be protected by methylation as in some of the embodiments above. Eliminating the methylation step saves time and removes the problems associated with incomplete methylation of the target DNA. In addition, EndoV digestion is faster than EcoRI digestion, which shortens the time needed to carry out the method.

An example of paired read results obtained by the deoxyinosine hairpin adapter method is shown in Fig. 16. E. coli K12 genomic DNA was prepared and sequenced according to the fifth method (Fig. 15). The mean distance between paired reads was 2070 bp (standard deviation = 594).
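The reported mean and standard deviation can be used to judge whether the mapped distance of an individual read pair is consistent with the library, as in the following sketch. The mean and standard deviation are the values reported above; the cutoff of two standard deviations and the function names are assumptions chosen only for illustration.

```python
MEAN_DISTANCE_BP = 2070   # reported mean distance between paired reads
SD_DISTANCE_BP = 594      # reported standard deviation


def z_score(distance_bp: float) -> float:
    """Number of standard deviations by which a pair distance departs from the mean."""
    return (distance_bp - MEAN_DISTANCE_BP) / SD_DISTANCE_BP


def is_consistent(distance_bp: float, max_abs_z: float = 2.0) -> bool:
    """True if the pair distance lies within the chosen number of standard deviations."""
    return abs(z_score(distance_bp)) <= max_abs_z


for distance in (2070, 3258, 5000):
    print(distance, round(z_score(distance), 2), is_consistent(distance))
```

Pairs that fall far outside the expected range could indicate a misassembly or a chimeric fragment, although the appropriate cutoff depends on the application.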

Sixth method

In other embodiments, paired end sequencing can be carried out by a method comprising some or all of the following steps; see Fig. 17 and Fig. 18.

Step 6A - Fragmentation of target DNA (Figure 17A)

According to the sixth method, the polynucleotide molecules (e.g., genomic DNA) of the target DNA sample are fragmented into molecules of greater than about 500 bases, greater than about 1000 bases, greater than about 2000 bases, greater than about 5000 bases, greater than about 10,000 bases, greater than about 20,000 bases, greater than about 50,000 bases, greater than about 100,000 bases, greater than about 250,000 bases, greater than about 1 million bases, or greater than about 5 million bases. In a preferred embodiment, fragment length ranges from about 1.5 kb to about 5 kb. Fragmentation can be accomplished by any physical and/or biochemical method described elsewhere in this disclosure. In a preferred embodiment, the target DNA is randomly sheared by physical force, for example using a HydroShear instrument (Genomic Solutions). The sheared DNA is then purified according to the desired fragment size. This optional size selection can be accomplished by any size selection method known in the art and disclosed herein, for example electrophoresis and/or liquid chromatography. In a preferred embodiment, the sheared DNA sample is size selected by purification on SPRI size exclusion beads (Agencourt; Hawkins et al., Nucleic Acids Res. 1995 (23): 4742-4743). For example, in a classical bacterial genome sequencing experiment, the ends of fragments of about 2-2.5 kb can be sequenced (in pairs) for contig ordering. Larger fragments are useful for sequencing the genomes of higher organisms (e.g., fungi, plants and animals).

Step 6B - Methylation of certain restriction sites (Figure 17B)

As described below, after the adapters have been ligated to the target DNA fragments, the adapters can be cut with one or more restriction enzymes in preparation for circularization. To prevent the target DNA from being digested by the selected restriction enzymes, the target DNA is protected from digestion by modification with the corresponding methylases. In a preferred embodiment, the adapter is a hairpin adapter carrying an EcoRI restriction site (Fig. 18A). Thus, in a preferred embodiment, EcoRI methylase is used to methylate the EcoRI restriction sites present in the sample DNA fragments, in order to protect the integrity of the DNA fragments when EcoRI cohesive ends are generated from the ligated hairpin adapters before circularization.

Step 6C - Fragment end polishing and phosphorylation (Figure 17C)

Hydrodynamic shearing of DNA produces some fragments with frayed ends (single-stranded overhangs). Blunt ends are preferred for the subsequent adapter ligation. Therefore, any frayed ends are optionally made blunt and ligatable by enzymatic means, by "fill-in" with a DNA polymerase and/or by "chewing back" with an exonuclease (e.g., mung bean nuclease). Advantageously, some DNA polymerases also have exonuclease activity. Optionally, after the blunting reaction, the 5' ends of the fragments are preferably phosphorylated with a polynucleotide kinase. In a preferred embodiment, T4 DNA polymerase and T4 polynucleotide kinase (T4 PNK) are used for blunting and phosphorylation, respectively. T4 DNA polymerase "fills in" the 3' recessed ends (5' overhangs) of the DNA through its 5'→3' polymerase activity, and removes 3' overhangs through its single-strand 3'→5' exonuclease activity. The kinase activity of T4 PNK adds a phosphate group to the 5'-hydroxyl terminus.
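The outcome of end polishing for each kind of frayed end can be summarized in a small state model. This is a bookkeeping sketch only: the class and function names are hypothetical, and the model records which enzymatic activity acts on each end type rather than simulating the reactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentEnd:
    overhang: str            # "5prime", "3prime" or "blunt"
    overhang_len: int        # length of the single-stranded overhang in bases
    has_5p_phosphate: bool   # whether the 5' terminus carries a phosphate


def polish(end: FragmentEnd) -> tuple[FragmentEnd, str]:
    """Return the polished end and the polymerase activity that produced it."""
    if end.overhang == "5prime":
        action = "fill-in"      # 5'->3' polymerase activity extends the recessed 3' end
    elif end.overhang == "3prime":
        action = "chew-back"    # 3'->5' exonuclease activity removes the 3' overhang
    elif end.overhang == "blunt":
        action = "none"
    else:
        raise ValueError(f"unknown overhang type: {end.overhang}")
    # The kinase step leaves a 5' phosphate on every end.
    return FragmentEnd("blunt", 0, True), action


frayed = FragmentEnd("5prime", 4, False)
polished, action = polish(frayed)
print(polished, action)
```

Whatever the starting end, the polished end is blunt and 5'-phosphorylated, which is the state required for the adapter ligation that follows.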

Step 6D - Hairpin adaptor ligation (Figure 17D and Figure 18A)

According to the present invention, double-stranded oligonucleotide adaptors are ligated to the ends of the target DNA fragments. In a preferred embodiment, the adaptor is a hairpin adaptor (Figure 18A). One advantage of hairpin adaptors is that ligation events between adaptors yield only adaptor dimers, i.e., the formation of multimeric adaptor concatemers is prevented. In addition, the hairpin structure protects the sample fragments from the exonuclease digestion that is used to remove unligated fragments (step 6E). The preferred hairpin adaptor design shown in Figure 18A contains EcoRI and MmeI restriction sites. EcoRI can be used to generate cohesive ends at the termini of each fragment (step 6F) for circularization (step 6G). MmeI is a Type IIS restriction enzyme that cuts DNA 20 bp away from its recognition site; it is used to cut into the ends of the circularized sample fragments, generating the paired end tags to be sequenced. The skilled artisan will appreciate that EcoRI can be replaced by any of a variety of other endonucleases, with accompanying changes in the nucleotide sequence of the adaptor oligonucleotide, and with use of the appropriate methylase to protect the target DNA fragments. Likewise, MmeI can be replaced by another Type IIS restriction enzyme, provided that the selected enzyme cuts at a sufficient distance from its restriction site to generate paired ends long enough for downstream sequence assembly. In a preferred embodiment, the hairpin adaptor is biotinylated, for example at the position shown in Figure 18A. Other biotinylation positions are also suitable and can be selected by the skilled artisan. The biotin moiety allows optional selection of adaptor-containing paired end fragments and optional immobilization of the paired end library fragments (after MmeI digestion) during paired end adaptor ligation, fill-in (fragment repair), and paired end library amplification.

Step 6E - Exonuclease selection (Figure 17E)

Ligation of the hairpin adaptors is preferably followed by an exonuclease digestion, to remove any DNA that has not been properly capped with a hairpin adaptor at both ends, and by purification on SPRI size exclusion beads, to remove unwanted small molecular species such as adaptor-adaptor dimers. The exonuclease digestion can be carried out with one or more of the various exonucleases known in the art. The digestion is preferably accomplished with a combination of activities that together digest single-stranded and double-stranded DNA in both the 3'→5' and 5'→3' directions. In a preferred embodiment, the exonuclease mixture contains E. coli exonuclease I (3'→5' single-strand exonuclease), phage lambda exonuclease (5'→3' single- and double-strand exonuclease) and phage T7 exonuclease (5'→3' double-strand exonuclease that can initiate at gaps and nicks).

Step 6F - EcoRI digestion (Figure 17F)

In a preferred embodiment, endonucleolytic cleavage by EcoRI is used to cut the hairpin adaptors, generating cohesive ends at the termini of each fragment (Figure 18A) so that the fragments can be circularized. Digestion with EcoRI removes the hairpin structure from the fragment ends, leaving cohesive ends. Internal EcoRI sites present in the sample DNA are protected by the methylation carried out earlier in step 6B.

Step 6G - Circularization (Figure 17G)

The fragments are then circularized by intramolecular ligation of their EcoRI cohesive ends. The ligation junction therefore contains two partial hairpin adaptors (head to head, with a reconstituted EcoRI site; 44 bp in total), flanked on both sides by the ends of the sample fragment. A further exonuclease digestion is carried out to remove any non-circularized DNA.

Step 6H - MmeI digestion (Figure 17H)

The circularized DNA fragments are then digested with the restriction enzyme MmeI. This Type IIS restriction enzyme cuts approximately 20 bp from its restriction site (at 20/18 nt, leaving a 2 nt 3' overhang; the enzyme also produces some minor products, with cuts ranging from 19 bp to 22 bp from the site). An MmeI site is present at the end of each hairpin adaptor ligated to the sample DNA fragment (Figure 18A). Restriction digestion at these sites produces paired end DNA library fragments, each containing the ligated "double" hairpin adaptor (44 bp) and two 20 bp ends of the sample fragment, for a total length of 84 bp.
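The fragment length stated above follows from simple arithmetic, shown here as a small Python sketch. The function names are hypothetical; the 44 bp junction, the 20 bp tag and the 19-22 bp range of minor cut positions are the figures given in the text, and treating the minor products as symmetric on both sides is a simplifying assumption.

```python
def paired_end_fragment_length(junction_bp: int = 44, tag_bp: int = 20) -> int:
    """Adaptor junction plus one sample tag on each side."""
    return junction_bp + 2 * tag_bp


def fragment_length_range(junction_bp: int = 44,
                          min_tag_bp: int = 19,
                          max_tag_bp: int = 22) -> tuple[int, int]:
    """Shortest and longest fragment if both cuts shift together (assumption)."""
    return (paired_end_fragment_length(junction_bp, min_tag_bp),
            paired_end_fragment_length(junction_bp, max_tag_bp))


print(paired_end_fragment_length())  # major product
print(fragment_length_range())       # span covered by the minor products
```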

Step 6I - Separation with streptavidin beads (Figure 17I)

In this step, MmeI restriction fragments that lack the biotin tag and the ligated "double" hairpin adaptor can optionally be eliminated. Binding the biotin tag present in the hairpin adaptors to streptavidin or avidin beads immobilizes the library of paired end fragments (and separates it from the other MmeI restriction fragments).

Step 6J - Paired end adaptor ligation (Figure 17J)

In this step, the ends of the paired end library fragments generated in step 6H, and optionally purified in step 6I, are ligated to double-stranded adaptors referred to as paired end library adaptors or paired end adaptors (Figure 18B). These paired end adaptors provide priming regions that support both amplification and nucleotide sequencing, and may include a short (for example 4-nucleotide) "sequencing key" sequence used for accurate identification on the 454 Sequencing™ system. The adaptors may have a "degenerate" 2-base single-stranded 3' overhang. Degenerate means that the 2 protruding bases are random, i.e., each can be G, A, T or C. If an enzyme other than MmeI is used, the skilled artisan can readily design paired end adaptors compatible with that enzyme. The exemplary adaptors shown in Figure 18B are designed to strongly favor directional ligation of the paired end library fragments to the adaptors: each adaptor carries a degenerate 2 bp 3' overhang at its 3' end and can ligate only to the ends of paired end library fragments generated with MmeI (provided that the 5' ends of the adaptors are not phosphorylated, see below). In a ligation reaction containing a large molar excess of adaptors (adaptor:fragment ratio of 15:1), the adaptors ligate to the paired end library fragments, which maximizes use of the paired end library fragments and minimizes the possibility of forming concatemers of paired end library fragments. The adaptors themselves may be unphosphorylated to minimize the formation of adaptor dimers, in which case the ligation products subsequently need to be repaired by fill-in (step 6K).
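Because each of the 2 protruding bases can be G, A, T or C, a degenerate adaptor pool covers 4 x 4 = 16 distinct overhangs, so every MmeI-generated 2 nt overhang has a complementary partner in the pool. The following Python sketch enumerates them; the function name is hypothetical and the code is only an illustration of the combinatorics.

```python
from itertools import product


def degenerate_overhangs(length: int = 2, alphabet: str = "GATC") -> list[str]:
    """List every possible overhang of the given length over the alphabet."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]


overhangs = degenerate_overhangs()
print(len(overhangs))  # number of distinct 2-base overhangs in the pool
print(overhangs)
```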

Step 6K - Fill-in (Fig. 6K)

If the paired end adaptors ligated in step 6J are not phosphorylated, there will be nicks at their 3' junctions with the paired end library DNA fragments. These two "gaps" or "nicks" can be repaired using a strand-displacing DNA polymerase, which recognizes the nick, displaces the nicked strand (up to the free 3' end of each adaptor), and extends in a manner that repairs the nick and forms full-length dsDNA. In a preferred embodiment, Bst DNA polymerase (large fragment) is used. Other strand-displacing DNA polymerases known in the art are also suitable for this step, for example phi29 DNA polymerase, DNA polymerase I (Klenow fragment) or Vent DNA polymerase.

Step 6L - Amplification (Fig. 6L)

The "adapted" paired end DNA library can optionally be amplified. Amplification is preferably performed by PCR, but other nucleic acid amplification methods known in the art and/or described herein can also be used. The oligonucleotides F-PCR and R-PCR shown in Figure 18B are preferably used as PCR primers.

Whether or not it has been amplified (as described in the preceding paragraph), the "adapted" paired end DNA library is subsequently sequenced. Preferably, individual molecules of the library are sequenced. If the selected DNA sequencing method requires a large number of identical template molecules in each individual sequencing reaction, each molecule of the library can be clonally amplified. Clonal amplification is preferably performed by bead emulsion PCR according to the methods described in International Patent Application Nos. WO 2005/003375, WO 2004/069849 and WO 2005/073410, each of which is incorporated herein by reference.

Method 7

In another embodiment, paired end sequencing can be performed by a method comprising some or all of the following steps; see Figures 21-25.

The described embodiment provides a particularly advantageous and inventive method that offers an alternative to circularization by ligation and is suitable for practicing some or all of the methods and alternatives described above. In addition, the presently described embodiment is particularly effective for generating paired end distances greater than 10 Kb (for example a paired end distance of about 20 Kb); however, it should also be understood that the described recombination-based strategy can also be used to circularize fragments shorter than 10 Kb (for example paired end distances of about 3 Kb or 8 Kb). The presently described embodiment uses a strategy based on intramolecular recombination to circularize nucleic acid molecules that have the sequence length required for long paired end distances, and it provides a major advantage in the efficiency of circularizing nucleic acid molecules, especially large ones.

Some preferred embodiments include what is referred to as an in vitro excision-by-recombination method, which uses a Cre/Lox type site-specific recombinase (hereinafter "SSR") system to circularize a linear adapted target fragment, producing a circular nucleic acid comprising the target fragment and a second, excised linear fragment comprising a hybrid adaptor sequence; an example of such a method is shown in Figure 21. For example, Figure 21 provides an illustrative overview of an SSR-based strategy for generating a library of paired end template nucleic acid molecules for sequencing with paired distances greater than 10 Kb. As described in detail below, Figure 21 illustrates a method in which genomic DNA or other desired DNA is fragmented and adaptors 2105 and 2107 are ligated to produce adapted fragments 2100, which are then selected according to the desired length. Also illustrated is the SSR recombination step that produces circular product 2150 and linear product 2155 from adapted fragment 2100, wherein circular product 2150 is mechanically sheared to produce linear paired end template 2160, which is subsequently amplified to produce population 2170 comprising many substantially identical copies of template 2160.

Those of ordinary skill in the relevant art will appreciate that although embodiments using the Cre/Lox SSR system are described herein, other members of the integrase family, for example Int/att and FLP/FRT, can also be used, so the disclosure of Cre/Lox should not be regarded as limiting. Furthermore, although the method is generally described in terms of a single molecule, it should be understood that the method is carried out simultaneously on numerous molecules in the same or similar reaction environments, for example the water-in-oil emulsion reactors described elsewhere in this specification, in which the large number of target molecules is distributed at about one molecule, or 10, 100, 1000, 1,000,000 molecules and so on, per reaction environment. For example, the water-in-oil emulsion strategy described elsewhere in this specification can be used to suppress intermolecular events (i.e., formation of concatemers and the like) and to promote the intramolecular recombination needed to produce the circularized product; further details are given below.

Step 7A - Fragmentation

As described in the various embodiments above, the polynucleotide molecules of a target DNA sample of genomic or other origin are fragmented into molecules greater than about 10,000 bases, greater than about 20,000 bases, greater than about 50,000 bases, greater than about 100,000 bases, greater than about 250,000 bases, greater than about 1 million bases, or greater than about 5 million bases. In some preferred embodiments, fragment lengths range from about 10 Kb to about 50 Kb, from about 10 Kb to about 100 Kb, or from about 10 Kb to more than 100 Kb. Fragmentation can be achieved by any physical and/or biochemical method described elsewhere in this disclosure. In a preferred embodiment, the target DNA is randomly sheared by physical force, for example using a HydroShear instrument (Genomic Solutions). It should be understood, however, that any method of generating fragments described herein can be used, provided the selected method can produce the required fragment lengths.

Step 7B - End polishing

In the presently described alternative embodiment, the ends of each fragment can be polished by any method described elsewhere in this disclosure, for example the method described in step 6C above. As noted, blunt ends are preferred for the subsequent adaptor ligation. Therefore, any frayed ends or overhangs are optionally made blunt and ready for ligation by enzymatic means, by "fill-in" with a DNA polymerase and/or by "chewing back" with an exonuclease (for example mung bean nuclease). Advantageously, some DNA polymerases also have exonuclease activity. Optionally, after the polishing reaction, the 5' ends of the fragments are preferably phosphorylated with a polynucleotide kinase. In a preferred embodiment, T4 DNA polymerase and T4 polynucleotide kinase (T4 PNK) are used for polishing and phosphorylation, respectively. T4 DNA polymerase "fills in" the 3' recessed ends (5' overhangs) of the DNA through its 5'→3' polymerase activity, and removes 3' overhangs through its single-strand 3'→5' exonuclease activity. The kinase activity of T4 PNK adds phosphate groups to 5'-hydroxyl termini.

Step 7C - Adaptor ligation

As also described above, double-stranded oligonucleotide adaptors are ligated to the ends of the polished target DNA fragments. In the presently described embodiment, the adaptors may comprise loxP adaptors, an example of which is shown in Figure 22. For example, Figure 22 provides an illustrative example of two double-stranded adaptor species, loxP-6F adaptor 2105 and loxP-6R adaptor 2107, each having a first, blunt end that lacks a 5' phosphate and a second end that has a 3' overhang of 3 sequence positions and a phosphorylated 5' end. Those of ordinary skill will appreciate that the 3' overhang is not limited to 3 sequence positions and may be longer or shorter than 3 positions depending on the desired conditions.

To promote formation of the circularized product, the first, blunt ends of adaptors 2105 and 2107 are ligated to the ends of the polished (i.e., blunt) target DNA fragments such that the loxP regions 2200 in each adaptor are oriented in the same direction; details are given below. In addition, the use of two adaptor species, each with a second end carrying an overhang and a 5' phosphate, provides specific advantages. The first advantage is suppression of the multimeric adaptor concatemers described above. In other words, only the blunt ends of adaptor 2105 and adaptor 2107 can ligate to one another, which limits such adaptor ligation events to the formation of dimers as opposed to long concatemers. Long concatemers are difficult to distinguish from adapted target molecules and in some cases consume a significant proportion of the adaptor molecules, making them unavailable for ligation to target molecules. The second advantage is that the 5' phosphate and the 3' overhang each improve the efficiency of exonuclease degradation, and therefore improve the removal of non-circularized molecules; details are given below.

Step 7D - Size selection

Next, the adaptor-ligated nucleic acid fragments 2100 can be purified according to the desired fragment size. This optional size selection step can be performed using any size selection method known in the art or disclosed herein, such as electrophoresis and/or liquid chromatography. In one embodiment, the sheared DNA sample is size-selected by gel electrophoresis as described above. In this embodiment, the gel-based method yields size-fractionated DNA fragments whose lengths are distributed within a certain range of the desired length (for example within 25% of the desired length). For example, a size fraction targeted at 20 Kb will yield a population of fragments with lengths of 20 Kb +/- 5 Kb (i.e., a fragment length range of 15 Kb-25 Kb). In the same or other embodiments, alternative size fractionation techniques can be used, particularly where longer fragments are needed to increase the paired end distance. One such technique suitable for size fractionation of larger molecules is "pulsed field gel electrophoresis" (hereinafter PFGE; see Schwartz DC, Cantor CR. Separation of yeast chromosome-sized DNAs by pulsed field gradient gel electrophoresis. Cell. 1984 May; 37(1): 67-75, which is incorporated herein by reference in its entirety for all purposes). PFGE can size-fractionate large molecules with much greater resolution than can be achieved with standard gel electrophoresis. For example, those of ordinary skill in the relevant art will appreciate that standard gel electrophoresis is generally ineffective for size separation of large molecules, especially nucleic acid molecules with sequence lengths above about 20 Kb. PFGE, by contrast, provides accurate discrimination of the sizes of such large nucleic acid molecules.
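The size window in the example above (a 20 Kb target with a 25% tolerance giving 15 Kb-25 Kb) can be expressed as a short Python sketch. The function names and the list of fragment lengths are hypothetical and serve only to illustrate the arithmetic.

```python
def size_selection_window(target_bp: int, tolerance: float = 0.25) -> tuple[int, int]:
    """Return the (minimum, maximum) fragment length kept around the target."""
    delta = round(target_bp * tolerance)
    return (target_bp - delta, target_bp + delta)


def select_fragments(lengths_bp: list[int], target_bp: int,
                     tolerance: float = 0.25) -> list[int]:
    """Keep only the fragment lengths that fall inside the window."""
    low, high = size_selection_window(target_bp, tolerance)
    return [length for length in lengths_bp if low <= length <= high]


print(size_selection_window(20_000))
# Hypothetical fragment lengths from a sheared sample.
print(select_fragments([9_000, 15_000, 21_500, 25_000, 31_000], 20_000))
```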

In addition, in embodiments that use standard gel electrophoresis or PFGE, it is sometimes necessary to use the method known to those of ordinary skill as "electroelution" to efficiently extract nucleic acid or protein molecules from polyacrylamide or agarose gels.

In some embodiments, it may be important to fill in the gaps left by the adaptor ligation step described above, using a method described elsewhere in this specification (for example the method described in step 6K).

Step 7E - Circularization by recombination

Next, the linear adapted nucleic acid fragment 2100 is exposed to a site-specific recombinase, for example Cre recombinase, which recognizes the 34 bp loxP regions 2206 of adaptors 2105 and 2107 that are ligated to, and adjoin, the ends of the target nucleic acid sequence. For an adapted fragment whose adaptor loxP regions 2206 are oriented in the same direction (details are given below), Cre recombinase excises a short linear fragment comprising a loxP region hybrid (see Figure 21, linear product 2155) and circularizes the target nucleic acid to produce a circular molecule comprising a second hybrid loxP region and the target nucleic acid (see Figure 21, circular product 2150). For example, Figures 21 and 23 illustrate the two recombination products generated by Cre recombinase, linear product 2155 and circular product 2150. Figure 22 further illustrates the composition of recombined adaptor 2110 and hybrid loxP region 2208 present in circular product 2150. Those of ordinary skill will appreciate that Cre recombinase cuts within the loxP regions 2206 of both adaptors 2105 and 2107 and forms recombination products in which the loxP region is a hybrid of the regions 2206 of the two original adaptors 2105 and 2107. For example, Cre recombinase binds within loxP region 2206 of each of the 6F 2105 and 6R 2107 adaptors and cuts each at the same sequence position. The bound recombinase/nucleic acid complexes, located at each end of the adapted target nucleic acid fragment, react with one another to join the cut ends of the 6F 2105 and 6R 2107 adaptors, thereby circularizing the nucleic acid fragment. In this example, the recombinase joins the segment cut from the 6F 2105 adaptor that lacks the 8 bp directional sequence 2200 to the segment of the 6R 2107 adaptor that comprises the 8 bp directional sequence 2200, thereby producing circular product 2150. In addition, the 8 bp directional sequence 2200 element derived from the 6F 2105 adaptor is joined to the remaining portion of the 6R 2107 adaptor, which lacks the 8 bp directional sequence 2200 element, producing the short hybrid adaptor, namely linear product 2155 described above. The resulting hybrid adaptor in circular product 2150 is shown in Figure 22 as adaptor 2110 comprising loxP region 2208. Region 2208 consists of a sequence substantially identical to loxP region 2206, and the embodiment of adaptor 2110 comprising region 2208 in circular product 2150 also comprises two instances of enrichment tag 2205 (each tag derived from one of adaptors 2105 and 2107). In some embodiments, the presence of two instances of enrichment tag 2205 improves the efficiency of subsequent enrichment steps. As shown in Figure 22, the enrichment tag may comprise biotin; however, it should be understood that any type of enrichment tag (i.e., binding pair) described herein or generally known in the art may be used. Note also that adaptor 2110 comprises the blunt ends of original adaptors 2105 and 2107 that were ligated to the target DNA fragment in circular product 2150.

Figures 22 and 23 provide an example of the importance of loxP site directionality for producing a circularized product with the SSR method. In the example of Figure 22, the wild-type form of loxP region 2206 (indicated by the box around the sequence region) is incorporated into adaptors 2105 and 2107. However, it should be understood that other mutants can be used, provided SSR functionality is retained. In addition, those of ordinary skill in the relevant art will appreciate that in the described SSR system the loxP regions have directional features, and that such features influence the products formed on exposure to Cre recombinase. In the example of Figure 22, region 2206 of both 6F adaptor 2105 and 6R adaptor 2107 comprises features typical of the Cre/Lox system, namely a directional loxP sequence 2200 that is 8 bp in length (directionality is indicated by the arrow associated with sequence 2200). In addition, region 2206 comprises palindromic sequence elements of about 13 bp flanking each side of directional sequence 2200.

Figure 23 provides illustrative examples of the SSR products generated according to the relative orientation of loxP regions 2206. First, Figure 23A provides a representative example of adapted fragment 2100', which has two loxP regions 2206 positioned in opposite orientations, and of the linear inversion product 2305 generated by Cre recombinase (the change is indicated by the position of shaded region 2300). In contrast, Figure 23B provides a representative adapted fragment 2100", which has two loxP regions 2206 positioned in the same orientation, together with the products generated by Cre recombinase: a first, circular product 2150 comprising region 2208 (within recombined adaptor 2110, as described above) and a second, linear product 2155 that is excised from adapted fragment 2100 and comprises a second recombined region 2208. It should be understood that the recombination reaction of Figure 23B is "bidirectional", as indicated by the double-headed arrows, where excision arrow 2334 indicates a greater magnitude for that reaction direction than the integration direction indicated by integration arrow 2336. Those of ordinary skill in the relevant art will also appreciate that arrows 2334 and 2336 are provided for illustration only and are not drawn to the exact scale of the actual magnitude of the directionality, which may depend at least in part on the reaction conditions. Importantly, in a preferred embodiment, the reaction conditions are optimized to favor the excision direction and the formation of the circular product.
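The dependence of the outcome on loxP orientation can be sketched in Python as follows. The sketch assumes the commonly cited wild-type loxP sequence (two 13 bp inverted repeats flanking an 8 bp asymmetric spacer); this sequence is general background knowledge and has not been transcribed from Figure 22. The function names, the flanking bases and the stand-in target sequence are hypothetical, and the model only classifies the outcome and computes product lengths; it does not model reaction kinetics or the reverse (integration) reaction.

```python
# Commonly cited wild-type loxP: 13 bp arm + 8 bp spacer + 13 bp arm (assumption).
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def loxp_sites(seq: str) -> list[tuple[int, str]]:
    """Return (start, strand) for each loxP site; '+' forward, '-' reversed."""
    sites = []
    for strand, motif in (("+", LOXP), ("-", reverse_complement(LOXP))):
        start = seq.find(motif)
        while start != -1:
            sites.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(sites)


def predict_cre_outcome(seq: str) -> str:
    """Classify recombination between exactly two loxP sites on a linear molecule."""
    sites = loxp_sites(seq)
    if len(sites) != 2:
        return "not modeled"
    (_, strand_a), (_, strand_b) = sites
    return "excision" if strand_a == strand_b else "inversion"


def excision_product_lengths(seq: str) -> tuple[int, int]:
    """Return (circular_bp, linear_bp) for two sites in the same orientation."""
    (start_a, _), (start_b, _) = loxp_sites(seq)
    circular_bp = start_b - start_a       # one loxP plus the intervening DNA
    return (circular_bp, len(seq) - circular_bp)


target = "GGGCCC" * 5                      # hypothetical 30 bp stand-in target
same = "AA" + LOXP + target + LOXP + "TT"
opposite = "AA" + LOXP + target + reverse_complement(LOXP) + "TT"

print(predict_cre_outcome(same), excision_product_lengths(same))
print(predict_cre_outcome(opposite))
```

In the same-orientation case the circular product carries the target plus one 34 bp site, and the short linear product carries the outer flanks joined by the other site, which mirrors the circular and linear products described for Figure 23B.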

Step 7F - Removal of non-circular nucleic acid

Subsequently, all linear nucleic acid molecules, including excised product 2155, inversion product 2305, adaptor dimers, unligated target nucleic acid fragments and the like, can be removed by any method described elsewhere in this specification. For example, an exonuclease treatment strategy can be used to efficiently remove all linear nucleic acid products or other residual linear fragments.

In some embodiments, it may be desirable to use more than one type of exonuclease to improve the efficiency of removing any unwanted linear nucleic acid molecules. For example, in some embodiments two or more exonucleases can be used, which may include, but are not limited to, exonuclease 1 (also referred to as EXO 1) and an ATP-dependent DNase that digests linear double-stranded DNA (for example Plasmid-Safe™ ATP-Dependent DNase, available from Epicentre Biotechnologies, Madison, WI).

Step 7G - Linearization

Next, circular nucleic acid product 2150 can be fragmented, by any of the various methods described elsewhere in this specification, to form linear nucleic acid molecules that comprise the end regions of the original target nucleic acid with the adaptor region between them. In the presently described alternative embodiment, it may be particularly advantageous to use a mechanical shearing method, for example nebulization, which allows a preferred fragment length to be selected and promotes the formation of paired tags in which one or more of the tags has a longer sequence length.

In addition, it is important to note that the adaptor elements shown in Figure 22 lack MmeI or the other Type IIS restriction sites described elsewhere in this specification; however, it will be readily appreciated that such sites can also be included. In fact, in some embodiments it is advantageous to incorporate an MmeI site into one of the adaptor species, so that once the nucleic acid fragment has been ligated to the two adaptor species and circularized, the circular molecule can be cut with the MmeI enzyme, leaving a 20 bp tag at one end of the new linear fragment. The linear fragment is then fragmented again by a mechanical method (further details are given below and elsewhere in this specification), where the mechanical fragmentation selects for a particular fragment length that is much greater than the combined length of the 20 bp tag and the 34 bp loxP region. As a result, the second tag of the pair is longer than the first tag, and the likelihood of fragmentation within the intervening region comprising adaptor 2110 is greatly reduced. The preferred length of the second tag of the pair may be based at least in part on the average read length or total read length capability of the sequencing method used to generate sequence data from the resulting paired end fragments.

In some embodiments, carrier DNA may also be added before the linearization step to prevent inadvertent loss, in subsequent purification steps, of valuable target DNA fragments that are present in low quantity and/or low quality. In the described embodiments that use a Type IIS restriction site (for example MmeI), it may be advantageous to use an MmeI carrier DNA, as described elsewhere in this specification.

It may also be advantageous, in the same or alternative embodiments, to use other types of carrier DNA that are better suited to a particular application. One such purpose is to analyze the efficiency of mechanical manipulation steps (for example the linearization step described above). In some embodiments, the efficiency of a mechanical fragmentation method, for example nebulization as described herein, needs to be assessed, but paired end template 2160 is not produced in sufficient quantity for an effective measurement of that efficiency. It is therefore desirable to add some circular carrier DNA before the fragmentation step to increase the amount of fragmentation product. However, such carrier DNA products are difficult to distinguish from paired end template 2160 once they are combined in the sample. In such embodiments, it is more advantageous to limit the amount of carrier DNA that can be sequenced after the sizing step has been performed. In other words, it is useful to employ carrier DNA for analysis of the mechanical manipulation step, but it is generally undesirable to spend the valuable resources of the sequencing step on generating sequence information from carrier DNA that is of no value. One way of limiting the amount of sequenceable carrier DNA is to render it unamplifiable by PCR or other amplification methods. Thus, in embodiments where the pool of linearized products (for example paired end template 2160) is further amplified for sequencing, the representation of the total carrier DNA population in the amplified population of sequencing templates, represented as population 2170, is substantially reduced. For example, as described in greater detail below, circular carrier DNA such as pUC19 can be treated with short-wave ultraviolet light, which generates pyrimidine dimers, effectively cross-linking each strand and rendering it unamplifiable, so that it is essentially absent from the final sample that is sequenced. The treated carrier DNA can be added to the sample containing the circularized target DNA (i.e., circular product 2150) and linearized, so that the sample comprises linearized representatives of both the target (i.e., paired end template 2160) and the carrier DNA populations. In this example, the whole sample can be analyzed to determine linearization efficiency, for example using a LabChip DNA 7500 chip available from Agilent Technologies, Inc., where the carrier DNA enables a more accurate measurement because of the increased amount of nucleic acid. During subsequent amplification of the sample by any method described herein, the copy number of the carrier DNA will not increase, so the amplified sample contains a substantially greater proportion of target DNA molecules.
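The enrichment of target over carrier DNA during amplification can be illustrated with a short calculation. The Python sketch below assumes an idealized model in which the target copy number grows by a fixed factor per cycle and the treated carrier DNA does not amplify at all; the function name, the per-cycle efficiency parameter and the starting copy numbers are hypothetical and are not taken from the specification.

```python
def target_fraction_after_amplification(target_copies: float,
                                        carrier_copies: float,
                                        cycles: int,
                                        efficiency: float = 1.0) -> float:
    """Fraction of molecules that are target after the given number of cycles.

    Assumes target copies multiply by (1 + efficiency) each cycle while the
    carrier copy number stays constant (idealized, for illustration only).
    """
    amplified_target = target_copies * (1.0 + efficiency) ** cycles
    return amplified_target / (amplified_target + carrier_copies)


# Hypothetical sample: 1 part target to 9 parts carrier before amplification.
for cycles in (0, 5, 10):
    fraction = target_fraction_after_amplification(1.0, 9.0, cycles)
    print(cycles, round(fraction, 4))
```

Under these assumptions a sample that starts as a 10% target becomes predominantly target within a few cycles, which is the effect the text relies on.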

Step 7H - Enrichment

In addition, Figure 22 shows an embodiment of enrichment tag 2205 associated with each adaptor species, which may comprise a biotin tag or another type of enrichment tag described elsewhere in this specification or generally known in the art. As described above, an enrichment tag such as a biotin moiety allows optional selection of adaptor-containing paired end fragments and optional immobilization of the paired end library fragments (after linearization of the circular nucleic acid) during paired end adaptor ligation, fill-in (fragment repair), and paired end library amplification. A further advantage of loxP adaptors 2105 and 2107 described herein is that adaptor-adaptor ligation events yield only adaptor dimers, i.e., the formation of multimeric adaptor concatemers is prevented.

One aspect of the alternative embodiment of method 7 is that it is compatible with other methods and alternatives described herein, for example steps J-L of the sixth method (steps 6J-6L) for adaptor ligation and amplification, followed by sequencing of the product as also described in this application.

As noted earlier and as shown in Figure 25, the alternative embodiment of method 7 offers a clear advantage over other methods in its ability to cover genome scaffolds effectively with a minimal number of sequence reads. For example, Figure 25 illustrates the significant advantage that long paired end reads of about 20 kb provide in scaffold assembly of the E. coli K12 genome over short paired end reads of about 3 kb, and the even greater advantage over known shotgun-based methods. The seventh method provides a further advantage over ligation-based methods in that it requires fewer processing steps, and those steps consume fewer valuable resources such as technician time, instrument time and usage, and reagents.

It should be understood that the invention also contemplates and includes any combination of corresponding steps of the seven methods described above.

As can be seen from the foregoing disclosure, methods 1, 2, 3, 4, 5 and 6 share similarities. In particular, the similar steps of methods 2, 3, 4, 5 and 6 are especially alike and can be combined and exchanged between methods to produce equivalent or advantageous results.

Having introduced the general methods of paired end sequencing, alternative embodiments of those methods are described below.

In one alternative, the hairpin adaptor can be replaced with an overhang adaptor (Fig. 8). The overhang adaptor can be biotinylated and can have, for example, the following sequence:

5′OH-AATTC---AAACCCTTTCGGT---TCCAAC-3′OH (SEQ ID NO: 28)
         |   |||||||||||||   ||||||
3′OH-    G---TTTGGGAAAGCCA---AGGTTG-5′PO4 (SEQ ID NO: 29)

The six 3′-terminal nucleotides of the top strand (SEQ ID NO: 28), TCCAAC, pair with the complementary nucleotides of the bottom strand (SEQ ID NO: 29) to form the recognition site of the type IIS restriction enzyme MmeI.
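The base pairing drawn above can be checked mechanically. The following is a minimal sketch, not part of the disclosed method: it treats the dashes in the printed sequences as unspecified gaps and compares only the three printed segments. The degenerate pattern TCCRAC is the commonly cited MmeI recognition sequence and is an assumption added here; the text itself names only TCCAAC.

```python
import re

# Watson-Crick complement table (DNA, upper case only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Return the base-by-base complement (not reversed)."""
    return seq.translate(COMPLEMENT)

# Segments exactly as printed; the '---' gaps are left unfilled.
TOP_5TO3 = ["AATTC", "AAACCCTTTCGGT", "TCCAAC"]   # SEQ ID NO: 28
BOTTOM_3TO5 = ["G", "TTTGGGAAAGCCA", "AGGTTG"]    # SEQ ID NO: 29

def paired_ok(top: str, bottom: str) -> bool:
    """True if `bottom` pairs with the 3' end of `top`.

    The bottom segment may be shorter than the top one, in which
    case the unpaired 5' bases of the top strand form an overhang.
    """
    return complement(top)[-len(bottom):] == bottom

# Assumed degenerate MmeI recognition pattern (R = A or G).
MMEI_SITE = re.compile("TCC[AG]AC")

all_paired = all(paired_ok(t, b) for t, b in zip(TOP_5TO3, BOTTOM_3TO5))
overhang = TOP_5TO3[0][: len(TOP_5TO3[0]) - len(BOTTOM_3TO5[0])]
has_site = MMEI_SITE.fullmatch(TOP_5TO3[-1]) is not None
```

Under these assumptions every printed segment pairs, the unpaired 5′ bases of the top strand are AATT, and the last six bases match the recognition pattern.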

This alternative is carried out in a manner similar to method 3. After the genomic DNA (Fig. 8A) has first been fragmented and polished (Fig. 8B), overhang adaptors are ligated to the fragment ends (Fig. 8C). Dimers of the overhang adaptor can be removed by size-fractionation chromatography (e.g. a spin column) or by charge-based chromatography. Higher-order concatemers of the overhang adaptor cannot form because the 5′ overhang lacks a phosphate. After the overhang adaptor dimers have been removed (Fig. 8D), the fragments can be made capable of self-ligation by kinase treatment (Fig. 8E). Self-ligation (i.e. circularization) is then carried out and can be followed by exonuclease digestion to remove unligated, non-circular DNA. Because DNA fragments that did not receive overhang adaptors have blunt ends as a result of polishing, they ligate less efficiently than fragments carrying an overhang adaptor, and therefore a 5′ overhang (cohesive end), on each side. After circularization, digestion with MmeI cuts at a distance from the overhang adaptor DNA (see Fig. 8F), leaving about 20 bases of the original genomic DNA on each side of the ligated overhang adaptors (Fig. 8G). Fragments carrying the overhang adaptor are purified using streptavidin beads, which bind the biotinylated adaptor (Fig. 8H).
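The outcome of the circularization and MmeI digestion steps can be illustrated in silico. This is a simplified sketch under stated assumptions: the circle is modeled as a single-stranded string, the tag length is fixed at exactly 20 bases (the text says "about 20"), and the function name, the adaptor string and the toy fragment are hypothetical.

```python
def flanking_tags(circle: str, adaptor: str, tag_len: int = 20):
    """Return (left_tag, right_tag) flanking `adaptor` in a circular sequence.

    `circle` is the circular molecule written as a linear string;
    indices wrap around the origin. `tag_len` models the roughly
    20 bases of genomic DNA left on each side after MmeI cleavage.
    """
    n = len(circle)
    if len(adaptor) + 2 * tag_len > n:
        raise ValueError("circle too short for two non-overlapping tags")
    # Search a doubled copy so an adaptor spanning the origin is found.
    start = (circle + circle).find(adaptor)
    if start == -1 or start >= n:
        raise ValueError("adaptor not found")
    end = start + len(adaptor)
    left = "".join(circle[(start - tag_len + i) % n] for i in range(tag_len))
    right = "".join(circle[(end + i) % n] for i in range(tag_len))
    return left, right

# Toy 60-base fragment and a hypothetical adaptor joining its two ends.
fragment = "A" * 25 + "C" * 10 + "G" * 25
toy_adaptor = "TTTCGGT"
left_tag, right_tag = flanking_tags(fragment + toy_adaptor, toy_adaptor)
```

In this toy case the left tag is the last 20 bases of the fragment and the right tag is its first 20 bases, which is the paired end relationship the method relies on.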

The resulting fragments can be sequenced by any effective method, for example the methods provided in this disclosure (e.g. step 3H).

Nucleic acids produced by the methods of the invention can be sequenced using one or more primers complementary to the ends of the sequence. That is, under the sequencing scheme described in step 3H, sequencing adaptor A and sequencing adaptor B are ligated to the fragment ends before sequencing. Because the end sequence of each fragment is therefore known to be either sequencing adaptor A or sequencing adaptor B, a sequencing primer complementary to sequencing adaptor A or B can be used to sequence the fragment. In addition, the sequence in the middle of each fragment, which contains the ligated adaptor, is known (see, for example, 703 in Fig. 7). A primer complementary to this middle region can also be used to begin sequencing from the middle. Furthermore, a sequencing primer for the end region and a sequencing primer for the middle region can be hybridized simultaneously to the fragment to be sequenced (see Fig. 9). One primer is protected and the other is not. In Fig. 9, the primer hybridized to the end is protected by a phosphate group. The first round of sequencing starts from the unprotected primer (Fig. 9, middle primer). After the first round, extension of the first primer can optionally be terminated, for example by incorporating complementary dideoxynucleotides. Alternatively, extension of the first primer can be allowed to proceed to the end of the template strand, making termination unnecessary. The second, protected primer can then be deprotected and extended in a second round of sequencing to determine the sequence at the fragment end. This method makes it possible to obtain two long paired end reads from a single template, which may be single-stranded.

In a second alternative, the fragmented starting DNA (Figure 10A) is ligated to adaptors that have a 3′ CC overhang and an optional internal type IIS restriction endonuclease site. The ligated fragments cannot self-ligate or self-circularize because their ends are incompatible (not complementary). However, these fragments can be joined using a linker that has a 5′ GG overhang on both sides (Figure 10B). After ligation, the circular nucleic acid can be purified away from non-circular DNA by standard gel or column chromatography as discussed above, or by exonuclease digestion, which cleaves uncircularized molecules. After the resulting circular DNA (Figure 10D) has been cut with MmeI, the resulting DNA can be sequenced as in the other methods.

In another alternative, the methods of the invention can be used to produce A/B-adapted ssDNA (Figure 11, step 1). This single-stranded fragment can be circularized (Figure 11, step 2) by hybridizing it to an oligonucleotide comprising sequences complementary to the A and B adaptors and ligating in the presence of ligase. In addition to facilitating ligation, the oligonucleotide can serve as a primer for rolling circle amplification of the circularized ssDNA (Figure 11, step 3). The rolling circle amplified DNA can be cleaved as described in method 1, steps 1K and L (Fig. 1L and Fig. 1M). After amplification, standard library preparation and sequencing techniques can be applied to the product (Figure 11, step 4).

Some embodiments of the invention are based on an unexpected finding made in paired end sequencing experiments on the genome of E. coli strain K12, in which the protocol included cleavage with MmeI according to the methods described herein: the depth of read coverage across the genome varied widely (Figure 20, "carrier free (-)"). Depth here means the number of sequence reads that map to essentially the same region of the genome. This variation in depth correlated with the density of MmeI sites across the genome (Figure 20). Unexpectedly and surprisingly, the inventors found that adding double-stranded DNA known to contain MmeI sites (marked "(+)" in Figure 20), namely E. coli B strain DNA ("EcoliB Strain (+)"), salmon sperm DNA ("SalSprmDNA (+)") or a PCR amplification product known to contain MmeI sites ("AmpPosMmeI (+)"), greatly reduced the variation in coverage depth across the genome and randomized it. In contrast, adding double-stranded DNA that lacks MmeI sites (marked "(-)" in Figure 20), namely poly(dIdC) ("dIdC (-)") or a PCR amplification product known not to contain MmeI sites ("AmpNegMmeI (-)"), did not change the pattern of coverage depth across the genome compared with the "carrier free" control. The use of MmeI-positive carrier DNA therefore gives a more uniform distribution of paired end reads across the genome, which is advantageous. The data in the following table further confirm these unexpected findings:

Table 1. Effect of MmeI carrier DNA on the depth distribution and length of paired end reads

Table 1 shows coverage depth statistics for E. coli K12. The top 3 samples (rows) had MmeI-positive carrier DNA added, and the bottom 3 samples had MmeI-negative carrier DNA added. The column headings mean: "Depth Ave" = mean depth; "Depth STDEV" = standard deviation of depth; "Depth%CV" = standard deviation of depth divided by mean depth (this quotient expresses the variation in depth normalized by the mean depth); "Length Ave" = mean distance between paired reads in the genome; "Length STDEV" = standard deviation of the distance between paired reads in the genome; "Length%CV" = standard deviation of length divided by mean length.
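The column definitions above translate directly into a short calculation. The sketch below is illustrative only: the depth values are invented toy numbers, not data from Table 1, and the use of the sample (n-1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def depth_stats(depths):
    """Return mean, standard deviation and %CV for a list of depths.

    %CV is the standard deviation divided by the mean, expressed as
    a percentage. `stdev` is the sample (n-1) estimator; this choice
    is an assumption and is not specified in the source.
    """
    avg = mean(depths)
    sd = stdev(depths)
    return {"ave": avg, "stdev": sd, "pct_cv": 100.0 * sd / avg}

# Toy per-region read depths (hypothetical values for illustration).
even_coverage = depth_stats([10, 10, 10, 10])
uneven_coverage = depth_stats([2, 18, 5, 15])
```

Both toy samples have the same mean depth, but the uneven one has a much larger %CV, which is the quantity the text identifies as unfavorable.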

Consistent with Figure 20, Table 1 shows that adding MmeI-positive carrier DNA greatly reduces the variation in coverage depth across the E. coli K12 genome (see the Depth STDEV and Depth%CV values; smaller Depth STDEV and Depth%CV values are favorable).

This results in a more uniform distribution of paired end reads across the genome. Such a uniform distribution is advantageous.

Table 2. Effect of paired end sequencing with MmeI-positive carrier DNA on scaffolding of the E. coli K12 genome

Ratio

Table 2 shows the effect of paired end sequencing data obtained with MmeI-positive carrier DNA on the scaffolding of shotgun contigs. When 121 large contigs, obtained by shotgun sequencing of E. coli K12 genomic DNA on a GS20 sequencer (454 Life Sciences, Branford, CT, USA), were assembled with paired end reads, the paired end reads produced with MmeI-positive carrier DNA (columns "Stratagene SS dsDNA (+)", "E. coli B strain (+)" and "amplification positive (+)") gave far fewer, and therefore larger, scaffolds (19-25) than the paired end reads produced with no carrier DNA or with carrier DNA lacking MmeI sites (48-56 scaffolds). The use of MmeI-positive carrier DNA therefore improves the genome assembly performance obtained by paired end sequencing according to the invention.

As noted above, some embodiments of the invention include the use of double-stranded "carrier DNA". In some embodiments, carrier DNA is used in a step that comprises cleavage of DNA by the restriction endonuclease MmeI. In those embodiments the carrier DNA contains one or more MmeI sites. Cleavage by MmeI is most efficient when the number of moles of MmeI enzyme molecules is approximately equal to the number of moles of MmeI sites present in the DNA sample (product catalog, New England Biolabs, Ipswich, MA, USA). In the methods of the invention it may be difficult to estimate the number of MmeI sites, both because reliably measuring low DNA concentrations (typically a few nanograms to a few tens of nanograms) is difficult and time-consuming, and because the number of MmeI sites varies with the target DNA to be sequenced. Calculating the correct amount of MmeI enzyme to add to the reaction (to reach the stoichiometric concentration) therefore becomes a problem. To overcome this difficulty and meet the need to balance the number of MmeI sites against the number of MmeI enzyme molecules, some methods of the invention comprise adding carrier DNA in excess (relative to the sample DNA). The amount of MmeI enzyme to add to the reaction can then be calculated from the known amount of carrier DNA, and the number of MmeI sites in the (circular) sample DNA can be neglected. Measuring the DNA concentration of the sample DNA thus becomes unnecessary, which speeds up the method and reduces its cost and time. The amount of carrier DNA can exceed the amount of sample DNA by several-fold to about 10-fold, several-fold to about 100-fold, several-fold to about 1000-fold, or more. In a preferred embodiment, 2 micrograms of sonicated double-stranded salmon sperm DNA are added to the sample DNA together with 2 units of MmeI and all other required reagents (for example 1X NEBuffer 4 (New England Biolabs) and 50 μM S-adenosylmethionine (SAM)) to a final volume of 100 microliters, and the mixture is incubated at about 37 degrees Celsius for about 15 minutes. The skilled person will appreciate that the reaction temperature and duration can be adjusted within practical limits.
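The idea of sizing the MmeI reaction from the known carrier DNA can be sketched numerically. Everything in this block beyond the 2 μg figure is an assumption made for illustration: the degenerate site pattern TCCRAC, the average mass of about 650 g/mol per base pair, and a site density of one per 1024 bp (the expectation for random sequence with equal base frequencies, counting both orientations). The function names are hypothetical, and the sketch does not convert moles of sites into enzyme units.

```python
import re

# Assumed MmeI recognition pattern and its reverse complement.
FORWARD = re.compile("(?=TCC[AG]AC)")
REVERSE = re.compile("(?=GT[CT]GGA)")

def count_mmei_sites(seq: str) -> int:
    """Count candidate MmeI sites in both orientations of a sequence."""
    seq = seq.upper()
    return len(FORWARD.findall(seq)) + len(REVERSE.findall(seq))

def site_pmol(mass_ug: float, sites_per_bp: float,
              g_per_mol_bp: float = 650.0) -> float:
    """Estimate picomoles of sites in a given mass of double-stranded DNA.

    mass_ug      -- mass of carrier DNA in micrograms
    sites_per_bp -- assumed average site density (sites per base pair)
    g_per_mol_bp -- assumed average molar mass of one base pair
    """
    mol_bp = mass_ug * 1e-6 / g_per_mol_bp
    return mol_bp * sites_per_bp * 1e12

# 2 micrograms of carrier DNA, as in the preferred embodiment above,
# with the random-sequence site density stated in the lead-in.
carrier_sites_pmol = site_pmol(2.0, 1 / 1024)
```

Because the carrier is in large excess, an estimate of this kind for the carrier alone would dominate the total, which is why the text says the sites in the sample DNA can be neglected.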

The use of an excess of carrier DNA containing MmeI sites in an MmeI restriction digest, combined with an approximately stoichiometric amount of MmeI enzyme as described above, can optionally be incorporated into any method of this disclosure that includes an MmeI digestion, for example step 6H of the sixth method (Figure 17H). The skilled person will also appreciate that the strategy of adding "carrier DNA" containing MmeI sites is useful in any MmeI restriction digest, particularly in reactions where the amount of sample DNA is low and/or the number of MmeI sites in the sample DNA is unknown.

In further embodiments, carrier DNA can be used to analyze mechanical manipulation of the sample, where preferably the carrier DNA does not interfere with other steps of the method. One such step is amplification of the DNA sample, in which case the circular carrier DNA can be treated (i.e. by inducing DNA damage) using methods known to those of ordinary skill that render the DNA unamplifiable but otherwise leave it unaffected. For example, pUC19 carrier DNA can be irradiated with short-wavelength ultraviolet light for about 45 minutes (i.e. typically between 30 and 60 minutes) to produce so-called "pyrimidine dimers" in the DNA structure. Polymerases commonly used in amplification methods cannot "read through" the dimers on the template DNA, so the irradiated pUC DNA cannot be amplified. Those skilled in the art will also appreciate that any other method of damaging DNA so that it cannot be amplified may be used. For example, damage can be produced by endogenous or exogenous means. Methods of producing DNA damage include, but are not limited to, UV damage (UV-B, UV-A), alkylation/methylation, X-ray damage, hydrolysis (i.e. depurination caused by thermal disruption) and oxidative damage.

As described above, in some embodiments the treated circular carrier DNA is added to the circularized target DNA sample to improve the usability of the linearization step, particularly linearization by mechanical fragmentation (e.g. by nebulization). For example, between 1 and 4 μg of treated pUC carrier DNA can be added to the circularized target DNA sample, which is then nebulized at 30 psi for 2 minutes to produce linear nucleic acid fragments whose members comprise a pair distance of about 20 kb. The whole nebulized sample is measured using a LabChip 7500 assay chip from Agilent Technologies to determine whether nebulization has produced the desired result.

Table 3: Results obtained using untreated carrier DNA

Table 3 shows the relative percentage of carrier DNA present in the sample after amplification, which is proportional to the amount of untreated carrier DNA added to the sample before amplification. For example, adding 1 μg of untreated carrier DNA results in carrier DNA accounting for 6% of the nucleic acid molecules in the amplified sample, and adding 3 μg results in 20%.

Table 4: Results obtained using treated carrier DNA

Table 4 shows the relative percentage of treated carrier DNA present in the sample after amplification, which is greatly reduced compared with the untreated carrier DNA shown in Table 3. For example, adding 1 μg of treated carrier DNA results in carrier DNA accounting for 0.02% of the nucleic acid molecules in the amplified sample, and adding 3 μg results in 0.06%.
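The size of the reduction can be read off the example values quoted for Tables 3 and 4. The short calculation below uses only those four percentages; the variable names are illustrative and the fold-reduction metric is an addition for clarity, not a quantity reported in the tables.

```python
# Percent of molecules in the amplified sample that are carrier DNA,
# keyed by micrograms of carrier added (values quoted in the text).
untreated_pct = {1: 6.0, 3: 20.0}    # Table 3 examples
treated_pct = {1: 0.02, 3: 0.06}     # Table 4 examples

# Fold reduction in carrier representation achieved by the treatment.
fold_reduction = {
    ug: untreated_pct[ug] / treated_pct[ug] for ug in untreated_pct
}
```

For both quoted amounts the treatment lowers the carrier representation by a factor of roughly 300.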

Ligation in water-in-oil emulsion

Some embodiments of the invention also include methods for circularizing nucleic acid molecules by ligation. Circularization of nucleic acid molecules is generally achieved by ligating at low nucleic acid concentration. Low concentration favors the desired intramolecular ligation (i.e. circularization), which follows first-order reaction kinetics, over intermolecular events, which follow second-order (or higher-order) kinetics (F. M. Ausubel et al. (eds.), 2001, Current Protocols in Molecular Biology, John Wiley & Sons Inc.). However, even high dilution cannot prevent intermolecular events, and excessive dilution of nucleic acid is impractical. Intermolecular ligation (concatemers, double circles, etc.) reduces the yield of the desired intramolecular circularization events. In some cases the intermolecular ligation products may be detrimental to downstream applications. In general, the conventional method has at least two main drawbacks. First, the need to dilute the starting nucleic acid increases the reaction volume and the associated reagent cost. High dilution also makes it difficult to recover the reaction products efficiently. Second, a substantial number of intermolecular ligation events do occur, reducing the yield of the desired intramolecular ligation product.
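The kinetic argument can be written out in a few lines. This is a sketch under idealized rate laws; the symbols $[L]$ (concentration of linear molecules with ligatable ends), $k_1$ and $k_2$ (first- and second-order rate constants) are introduced here for illustration and do not appear in the source.

```latex
\begin{align}
  v_{\text{intra}} &= k_1\,[L]
    && \text{(circularization, first order)} \\
  v_{\text{inter}} &= k_2\,[L]^2
    && \text{(intermolecular ligation, second order)} \\
  \frac{v_{\text{intra}}}{v_{\text{inter}}} &= \frac{k_1}{k_2\,[L]}
\end{align}
```

The ratio grows as $[L]$ falls, which is why dilution favors circularization, but it stays finite at any non-zero concentration, so dilution alone never removes intermolecular products entirely.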

The invention includes methods that largely eliminate the problems associated with the conventional circularization methods described above. For example, according to the invention, ligation need not be carried out at high dilution (i.e. at low nucleic acid concentration). In one embodiment, linear double-stranded DNA molecules with compatible ligatable ends (e.g. blunt ends or staggered ("sticky") ends) are ligated in reaction environments that are physically isolated from one another. Preferably, an aqueous solution containing the DNA to be ligated and all reagents needed for ligation (e.g. DNA ligase, ligase buffer, ATP, etc.) is emulsified in oil in the presence of a surfactant that stabilizes the emulsion. Suitable compositions and methods for preparing emulsions are discussed further below. The droplets (microdroplets) of the resulting water-in-oil emulsion, i.e. the microreactors, each contain zero, one or more DNA molecules. The number of DNA molecules per microreactor can be adjusted by changing the DNA concentration and the droplet size. For the skilled person, calculating suitable conditions from the nucleic acid concentration, the size of the polynucleotide (length in bases) and the average droplet volume is a matter of routine optimization. An ideal droplet contains one ligatable DNA molecule. It will be appreciated, however, that within a population of microreactors the number of DNA molecules per microreactor will vary with the variation in microreactor size and the random distribution of DNA molecules. Thus some microreactors may contain no DNA molecule, some may contain one, and some may contain two or more. Those skilled in the art will recognize that quantity and cost (reagent usage) can be balanced as needed by changing the average number of DNA molecules per microreactor.
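The routine calculation mentioned above can be sketched as follows. The text says only that molecules are randomly distributed; modeling that with a Poisson distribution, and treating all droplets as spheres of one diameter, are assumptions made here. The function names and the example concentration are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def droplet_volume_l(diameter_um: float) -> float:
    """Volume in liters of a spherical droplet of the given diameter."""
    radius_m = diameter_um * 1e-6 / 2
    return (4.0 / 3.0) * math.pi * radius_m ** 3 * 1000.0  # m^3 -> L

def mean_molecules_per_droplet(molar_conc: float, diameter_um: float) -> float:
    """Average number of DNA molecules per droplet (lambda)."""
    return molar_conc * AVOGADRO * droplet_volume_l(diameter_um)

def occupancy(lam: float) -> dict:
    """Poisson fractions of droplets holding 0, 1 or >=2 molecules."""
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}

# Example: 1 pM DNA in 10-micrometer droplets (illustrative values).
lam_example = mean_molecules_per_droplet(1e-12, 10.0)
occ_example = occupancy(lam_example)
occ_at_one = occupancy(1.0)
```

Lowering the average number of molecules per droplet raises the share of occupied droplets that hold a single molecule, at the price of more empty droplets and more reagent per circularized molecule, which is the balance the text describes.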

Preferably, the ligation mixture is kept ice-cold (e.g. at 0-4 degrees Celsius) during assembly and until emulsification is complete. This prevents ligation from proceeding before the required emulsion environment has formed, and so prevents the formation of unwanted intermolecular ligation products. The emulsified ligation reaction is then incubated at a temperature that permits ligation. The incubation time can range from a few minutes to 1 hour, several hours, overnight, or 24 hours (1 day) or more. After this incubation, but before, during or after breaking of the emulsion, the ligation reaction can be stopped to prevent unwanted intermolecular ligation in the pooled reaction. The ligation reaction can be stopped by lowering the temperature to about 0-4 degrees Celsius (ice water), by heat-inactivating the ligase, by adding EDTA, by adding a ligase inhibitor, or by any combination of such methods.

The skilled person can readily apply the above methods of the invention to the circularization of single- or double-stranded RNA or single- or double-stranded DNA. For example, the ends of a linear single-stranded polynucleotide molecule can be brought into direct juxtaposition by annealing to a capping oligonucleotide (also called a bridging oligonucleotide) that has portions complementary to each end of the linear single-stranded polynucleotide molecule, as described in method 1, step 1K (see Fig. 1L and Figure 11).

The emulsified ligation reaction can then be incubated at a suitable temperature. For example, for ligation of "sticky ends" with T4 DNA ligase, a suitable incubation temperature is 16 degrees Celsius, although a wider temperature range is also acceptable. Conditions for ligating DNA and other molecules are generally known in the art. One advantage of carrying out the circularization in an emulsion is that an extended reaction time is neutral or even beneficial to the success of the method. For example, where each microreactor contains one DNA molecule, the incubation can be prolonged until most DNA molecules have circularized. By contrast, with the conventional non-emulsion method described above, prolonged incubation may lead to a higher proportion of intermolecular ligation products. A further advantage of the emulsion-based ligation method of the invention is that the reaction can be run for a relatively long time without increasing the incidence of intermolecular ligation. The longer incubation allows a greater number of circularized products without increasing the risk of intermolecular ligation. In addition, because the molecules are separated physically rather than in a concentration-dependent manner, the reaction volume for the same number of ligation events can be much lower (i.e. the concentration of nucleic acid in the aqueous phase can be much higher), which reduces reagent cost and makes samples easier to handle. The skilled person will understand that, for ligation to occur in a given droplet, the droplet must contain sufficient reagents, including at least one ligase molecule.

Breaking the emulsion and isolating circularized DNA

After ligation, the ligation reaction can be stopped and the emulsion "broken" (also called "demulsification" in the art). There are many methods for breaking emulsions (see, for example, U.S. Patent No. 5,989,892 and the references cited therein), and those skilled in the art can select an appropriate one. Breaking of the emulsion can be followed by a nucleic acid isolation step, which can be carried out by any suitable method of isolating nucleic acids. Once the nucleic acid has been isolated, unligated material can be removed by any method suited to the task, one of which is exonuclease digestion of the sample. The particular exonuclease used may depend in part on the type of molecule under study (single- or double-stranded DNA or RNA) and on other considerations, such as the reaction temperature appropriate to the method. After exonuclease treatment by one of the several methods known in the art, the circularized material can be purified, for example by phenol/chloroform extraction or with any commercially available purification kit suited to the purpose.

With the commonly used dilution-based circularization scheme described above, recovery of the desired circular product is observed to fall as the length of the linear input DNA molecule increases. The emulsion ligation method of the invention is particularly suitable for circularizing long polynucleotide molecules, for example molecules longer than about 500 bases, about 1000 bases, about 2000 bases, about 5000 bases, about 10,000 bases, about 20,000 bases, about 50,000 bases, about 100,000 bases, about 250,000 bases, about 1 million bases or about 5 million bases, or of virtually any size considered necessary in the experimental protocol of interest.

The emulsion ligation method described herein can be used for a variety of ligation reactions, whether or not they result in circularization. It can therefore be used for any ligation step of the various methods described herein, especially ligation reactions in which the input nucleic acid is to be circularized.

Emulsification

An emulsion is a heterogeneous system of two immiscible liquid phases in which one phase is dispersed in the other as droplets of microscopic or colloidal size. The emulsions of the invention must be capable of forming microcapsules (microreactors). Emulsions can be produced from any suitable combination of immiscible liquids. The emulsions of the invention have a hydrophilic phase (containing the biochemical components) as the phase present in the form of fine droplets (the disperse, internal or discontinuous phase) and a hydrophobic, immiscible liquid (an "oil") as the matrix in which these droplets are suspended (the non-disperse, continuous or external phase). Such emulsions are termed "water-in-oil" (W/O). This has the advantage that the entire aqueous phase containing the biochemical components is compartmentalized in discrete droplets (the internal phase). The external phase, being a hydrophobic oil, generally contains none of the biochemical components and is therefore inert.

In some embodiments the microreactors contain the nucleic acid and the reagents necessary for ligation. A large number of the microreactors may each contain exactly one polynucleotide molecule. In certain embodiments a thermostable water-in-oil emulsion may be required, for example where the ligase is heat-inactivated after the reaction, or where ligation is performed at elevated temperature with a thermostable ligase (e.g. Taq DNA ligase). The emulsion can be formed by any suitable method known in the art. One method of producing an emulsion is described below, but any method of preparing emulsions may be used. Such methods are known in the art and include adjuvant methods, counter-flow methods, cross-flow methods, vibration, rotating drum methods and membrane methods. In addition, the size of the microcapsules can be adjusted by varying the flow rate and the speed of the components. For example, in drop-wise addition, the droplet size and the total delivery time can be varied. In some embodiments the droplets can be produced in a microfluidic device, for example as described by Link et al. (Angew. Chem. Int. Ed., 2006, 45, 2556-2560), which is hereby incorporated by reference in its entirety.

At least some of the microreactors should be large enough to hold sufficient nucleic acid and other ligation reagents. At the same time, at least some of the microreactors should be small enough that a fraction of the microreactor population contains a single polynucleotide molecule capable of self-ligation. In some embodiments the emulsion is thermostable. The droplets formed preferably range in diameter from about 100 nanometers to about 500 micrometers, more preferably from about 1 micrometer to about 100 micrometers. Advantageously, cross-flow fluid mixing, optionally combined with an electric field, can be used to control droplet formation and the uniformity of droplet size.

Various emulsions suitable for biological reactions are described in Griffiths and Tawfik, EMBO, 22, pp. 24-35 (2003); Ghadessy et al., Proc. Natl. Acad. Sci. USA 98, pp. 4552-4557 (2001); U.S. Patent No. 6,489,103; and WO 02/22869, all of which are incorporated herein by reference. In a preferred embodiment the oil is silicone oil.

Surfactants

The emulsions of the invention can be stabilized by adding one or more surface-active agents (emulsion stabilizers, surfactants). These surfactants, also called emulsifiers, act at the water/oil interface to prevent (or at least delay) phase separation. Many oils and many emulsifiers can be used to produce water-in-oil emulsions; a recent compilation lists over 16,000 surfactants, many of which are used as emulsifiers (Ash, M. and Ash, I. (1993) Handbook of Industrial Surfactants. Gower, Aldershot). Emulsion stabilizers for use in the methods of the invention include Atlox 4912, sorbitan monooleate (Span 80; ICI), polyoxyethylene 20 sorbitan monooleate (Tween 80; ICI) and other recognized, commercially available suitable stabilizers.

In various embodiments the surfactant is provided at a volume/volume concentration of 0.5-50%, preferably 10-45%, more preferably 30-40%, in the oil phase of the emulsion.

In some embodiments a chemically inert silicone-based surfactant, such as a silicone copolymer, is used. In one embodiment the silicone copolymer used is a polysiloxane-polycetyl-polyethylene glycol copolymer (cetyl dimethicone copolyol), for example Abil EM90 (Goldschmidt).

The chemically inert silicone-based surfactant can be provided as the sole surfactant in the emulsion composition or as one of several surfactants. Thus, a mixture of different surfactants can be used.

In specific embodiments, one surfactant used is Dow Corning 749 Fluid (used at 1-50%, preferably 10-45%, more preferably 25-35% w/w). In other specific embodiments, one surfactant used is Dow Corning 5225C Formulation Aid (used at 1-50%, preferably 10-45%, more preferably 35-45% w/w). In a preferred embodiment the oil/surfactant mixture consists of 40% (w/w) Dow Corning 5225C Formulation Aid, 30% (w/w) Dow Corning 749 Fluid and 30% (w/w) silicone oil.
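The preferred oil/surfactant composition is given in weight percent, so component masses for a batch follow by simple scaling. The sketch below is illustrative: the function name and the 50 g batch size are hypothetical, and only the three percentages come from the text.

```python
# Preferred oil/surfactant mixture as weight fractions (from the text).
PREFERRED_MIX_W_W = {
    "Dow Corning 5225C Formulation Aid": 0.40,
    "Dow Corning 749 Fluid": 0.30,
    "silicone oil": 0.30,
}

def batch_masses(total_g: float, recipe: dict = PREFERRED_MIX_W_W) -> dict:
    """Return the mass in grams of each component for a batch.

    Raises ValueError if the weight fractions do not sum to 1.
    """
    if abs(sum(recipe.values()) - 1.0) > 1e-9:
        raise ValueError("weight fractions must sum to 1")
    return {name: total_g * fraction for name, fraction in recipe.items()}

# Hypothetical 50 g batch of the oil phase.
masses_50g = batch_masses(50.0)
```

For the hypothetical 50 g batch this gives 20 g of 5225C Formulation Aid and 15 g each of 749 Fluid and silicone oil.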

The methods of the invention provide numerous benefits and advantages over existing methods. One advantage over the prior art is that the prepared fragments need not be cloned and propagated in a eukaryotic or prokaryotic host. This is especially useful where the target sequence contains multiple repeated sequences that may rearrange during propagation as an episome in a host cell.

Another advantage of the disclosed methods is that genome assembly can be facilitated by providing not only contig sequences but also the end sequences, and the orientation of those end sequences, of long contigs whose length may exceed 100 bp, 300 bp, 500 bp, 1 kb, 5 kb, 10 kb, 100 kb, 1 Mb, or 10 Mb or more. This sequence and orientation information can be used to facilitate genome assembly and to provide gap closure.

In addition, the pairing end is read long second confidence level that is provided in the genome assembling.For example, if the pairing end sequencing is consistent on the relative dna sequence with conventional contig order-checking, then the confidence level of this sequence improves.Perhaps, if two sequence datas contradiction each other, then degree of confidence reduces, and may need more analysis and/or order-checking to find out inconsistent reason.

The presence or absence of an open reading frame in the paired-end reads also provides guidance on the location of open reading frames. For example, if both sequenced ends of a contig contain an open reading frame, it is quite possible that the entire contig is an open reading frame. This can be confirmed by standard sequencing techniques. Alternatively, with knowledge of the two ends, specific PCR primers can be constructed to amplify between the two ends, and the amplified region can be sequenced to determine whether an open reading frame is present.

The methods of the invention can also improve the understanding of genome organization and structure. Because paired-end sequencing is able to span regions that are difficult to sequence, genome structure can be deduced even if those regions cannot be sequenced. Regions that are difficult to sequence may be, for example, repeat regions and regions of secondary structure. In such cases, the number and location of these difficult regions can be mapped in the genome even if their sequence is not known.

The methods of the invention are also useful for determining haplotypes of a genome over an extended distance. For example, specific primers can be prepared to amplify a region of a genome containing two SNPs linked over a long distance. Using the methods of the invention, the two ends of this amplified region can be sequenced to determine the haplotype without sequencing the nucleic acid between the two SNPs. This method is especially useful when the two SNPs span a region that is inefficient to sequence. Such regions include long regions, regions with repeat sequences, and regions of secondary structure.

The biotinylated adaptor method provides additional advantages (Fig. 7 and Fig. 22). Fig. 7A shows nucleic acids ligated to sequencing primers A and B in a manner ready for sequencing. Some of the nucleic acids are contaminating nucleic acids (701) that do not contain the two ends of a single contig region. Nucleic acid fragments that contain the two ends of a contig are indicated as 702. Because nucleic acid 702 is the only nucleic acid species that comprises biotin, this species can be purified using streptavidin beads (Fig. 7B). After purification, this species is ready for sequencing. By using affinity purification, the fraction of sequences that yield useful information is greatly increased.

This is particularly useful when the contaminating DNA (701) is very long, for example if each contaminating nucleic acid (701) in Fig. 7D is several kb in length. Sequencing these contaminants could consume a substantial portion of the reagents, labor and computing power dedicated to the project. In this case, purifying the appropriate fragments beforehand by affinity chromatography (Fig. 7E) can save a large amount of work and reagents.

One of skill in the art will readily recognize that any double-stranded DNA containing inosines on opposite strands (see Figure 14, with or without a hairpin) can be cut by EndoV nicking to produce a single-stranded overhang (sticky end), where the overhang may be virtually any nucleotide sequence. The invention also encompasses polynucleotide designs and methods substantially similar to Figure 14 but without the hairpin. In addition, it will be readily appreciated that, as described above, the methods and compositions of the invention shown in Figure 14, with or without a hairpin, can be used in a variety of molecular biology and recombinant DNA techniques in which the introduction of a unique endonuclease site is desired. Such techniques include, but are not limited to, construction of DNA and cDNA libraries, various subcloning strategies, or any method that benefits from a unique endonuclease site in a primer, adaptor or linker.

Paired-end nucleic acid constructs produced by any of the methods described herein may be sequenced by any sequencing method known in the art. For example, standard sequencing methods such as Sanger sequencing or Maxam-Gilbert sequencing are generally known in the art. Sequencing may also be performed by, for example, the automated sequencing method called 454 Sequencing™ developed by 454 Life Sciences Corporation (Branford, CT, USA); see, for example, U.S. Patent Nos. 7,323,305 and 7,244,567 and U.S. patent application Serial Nos. 10/767,894, filed January 28, 2004, and 10/767,899, filed January 28, 2004. Other sequencing methods known in the art, for example any sequencing-by-synthesis or sequencing-by-ligation method (for a review see Metzger, Genome Res. 2005 December; 15(12):1767-76, hereby incorporated by reference), are also included and may be used in the paired-end sequencing methods of the invention.

The terms "biotin", "avidin" and "streptavidin" are used throughout this disclosure to describe a binding pair. It is understood that these terms merely illustrate one way of using a binding pair. Thus, the terms biotin, avidin or streptavidin may be replaced by any member of a binding pair. A binding pair may be any two molecules that bind specifically to each other, and includes at least the following binding pairs: FLAG/anti-FLAG antibody, biotin/avidin, biotin/streptavidin, receptor/ligand, antigen/antibody, polyHIS/nickel, protein A/antibody, and derivatives thereof. Other binding pairs are known and published in the literature.

All patents, patent applications and references cited in any part of this disclosure are hereby incorporated by reference in their entirety.

The invention is further described below by the following non-limiting examples.

Examples

Example 1: Oligonucleotide design

The oligonucleotides used in the experiments were designed and synthesized as follows.

The capture element oligonucleotides shown at the top of Fig. 3A were designed to include UA3 adaptors and sequencing keys. A NotI site is located between the adaptors. The complete construct (capture element) can be produced using nested oligonucleotides and PCR. The sequence of the final product was synthesized and cloned.

The Type IIS capture fragment oligonucleotide shown in the lower part of Fig. 3A is similar to the capture fragment described above, except that the capture fragment includes, after the sequencing key sequence, a sequence representing a Type IIS restriction endonuclease site (for example MmeI). These Type IIS restriction endonuclease sites allow any construct built with these capture elements to be cut by a Type IIS restriction endonuclease. As is known in the art, Type IIS restriction endonucleases cut DNA at various distances from the recognition site; in the case of MmeI, the distance is 20/18 bases.
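
The 20/18 cleavage geometry can be sketched in a few lines of Python. The recognition sequence 5'-TCCRAC-3' (R = A or G) is not given in this text and is taken here as an assumption from general knowledge of MmeI; the function name is hypothetical, and only sites on the given (top) strand are considered.

```python
import re

# Assumption (not stated in the text): MmeI recognizes 5'-TCCRAC-3', R = A or G.
MMEI_SITE = re.compile(r"TCC[AG]AC")
TOP_OFFSET = 20     # bases between the site and the top-strand cut
BOTTOM_OFFSET = 18  # bases between the site and the bottom-strand cut


def mmei_cuts(top_strand):
    """Return (top_cut, bottom_cut) for each forward-strand site.

    Each value is a 0-based index into top_strand; a cut at index i falls
    between positions i-1 and i. Sites too close to the end are skipped.
    """
    cuts = []
    for match in MMEI_SITE.finditer(top_strand.upper()):
        top_cut = match.end() + TOP_OFFSET
        bottom_cut = match.end() + BOTTOM_OFFSET
        if top_cut <= len(top_strand):
            cuts.append((top_cut, bottom_cut))
    return cuts


if __name__ == "__main__":
    # Hypothetical sequence: 2 bases, one site, then 25 bases of flank.
    print(mmei_cuts("GGTCCGAC" + "A" * 25))
```

Because the top strand is cut two bases further from the site than the bottom strand, the captured tag in this sketch carries a 2-base 3' overhang.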

The short adaptor capture fragment oligonucleotides were designed to contain SAD1 adaptors and sequencing keys (Fig. 3B). A NotI site is likewise located between the adaptors. This oligonucleotide was also synthesized with an MmeI Type IIS restriction endonuclease cleavage site after the sequencing key sequence (see Fig. 3B, short adaptor capture fragment (Type IIS)).

Example 2: Protocol for hairpin adaptor paired-end sequencing

E. coli K12 DNA (20 μg) in 100 μl was hydrodynamically sheared for 20 cycles at speed 10 using a standard HydroShear assembly (Genomic Solutions, Ann Arbor, MI, USA). The sheared DNA was methylated by adding 50 μl DNA (5 μg), 34.75 μl H₂O, 10 μl methylase buffer, 0.25 μl 32 mM SAM and 5 μl EcoRI methylase (40,000 units/ml; New England Biolabs (NEB), Ipswich, MA, USA). The reaction was incubated at 37 °C for 30 minutes. After methylation, the sheared, methylated DNA was purified on a Qiagen MinElute PCR purification column according to the manufacturer's instructions. The purified DNA was eluted from the column with 10 μl EB buffer.
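
As a quick arithmetic check of the methylation reaction described above, the following Python sketch sums the listed volumes and derives the final SAM concentration using C1·V1 = C2·V2. The dictionary keys and function names are illustrative only.

```python
# Components of the methylation reaction in Example 2 (volumes in microlitres).
METHYLATION_MIX_UL = {
    "sheared DNA": 50.0,
    "H2O": 34.75,
    "methylase buffer": 10.0,
    "32 mM SAM": 0.25,
    "EcoRI methylase": 5.0,
}


def total_volume_ul(mix):
    """Sum of all component volumes in microlitres."""
    return sum(mix.values())


def final_concentration(stock_conc, stock_volume_ul, total_ul):
    """C1*V1 = C2*V2 solved for C2; the result has the units of stock_conc."""
    return stock_conc * stock_volume_ul / total_ul


total = total_volume_ul(METHYLATION_MIX_UL)
# 32 mM stock expressed in micromolar.
sam_um = final_concentration(32_000.0, METHYLATION_MIX_UL["32 mM SAM"], total)

if __name__ == "__main__":
    print(f"total volume: {total:.2f} ul, final SAM: {sam_um:.0f} uM")
```

With the listed volumes the reaction totals 100 μl and the SAM is diluted 400-fold, to 80 μM.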

The sheared, methylated DNA was polished to produce blunt-ended sheared material. 10 μl DNA was added to a reaction mixture containing 13 μl H₂O, 5 μl 10X polishing buffer, 5 μl 1 mg/ml bovine serum albumin, 5 μl 10 mM ATP, 3 μl 10 mM dNTPs, 5 μl 10 U/μl T4 polynucleotide kinase and 5 μl 3 U/μl T4 DNA polymerase. The reaction was incubated at 12 °C for 15 minutes, after which the temperature was raised to 25 °C for a further 15 minutes. The reaction was then purified on a Qiagen MinElute PCR purification column according to the manufacturer's instructions.

Hairpin adaptors were ligated to the sheared, blunt-ended DNA fragments by adding 10 μl (5 μg) sheared DNA, 17.5 μl H₂O, 50 μl 2X Quick ligase buffer, 20 μl 10 μM hairpin adaptor and 2.5 μl Quick ligase (T4 DNA ligase, NEB). The reaction was incubated at 25 °C for 15 minutes, after which ligated fragments were selected by adding to the mixture 2 μl λ exonuclease, 1 μl Rec J (30,000 units/ml, NEB), 1 μl T7 exonuclease (10,000 units/ml, NEB) and 1 μl exonuclease I (20,000 units/ml, NEB). The reaction was incubated at 37 °C for 30 minutes, after which the sample was purified on a Qiagen MinElute PCR purification column. The treated DNA was then passed through an Invitrogen PureLink column according to the manufacturer's instructions and eluted from the column in a volume of 50 μl.

The ligated, exonuclease-treated DNA was digested with EcoRI. A reaction containing 50 μl DNA, 30 μl H₂O, 10 μl EcoRI buffer and 10 μl EcoRI (20,000 units/ml) was incubated overnight at 37 °C. The digestion product was purified on a Qiagen QiaQuick column according to the manufacturer's instructions. The digestion product was then re-ligated to produce circular DNA in a reaction containing 50 μl DNA, 20 μl Buffer 4 (New England Biolabs), 2 μl 100 mM ATP, 123 μl H₂O and 5 μl ligase (as above). The ligation reaction was incubated at 25 °C for 15 minutes, after which it was subjected to another round of exonuclease treatment by adding to the mixture 1 μl λ exonuclease (5,000 units/ml, NEB), 0.5 μl Rec J (as above), 0.5 μl T7 exonuclease (as above) and 0.5 μl exonuclease I (as above). The exonuclease reaction was incubated at 37 °C for 30 minutes, after which the sample was purified on a Qiagen MinElute PCR purification column.

The treated DNA was then digested with MmeI in a reaction mixture containing 10 μl DNA, 78.75 μl H₂O, 10 μl Buffer 4 (New England Biolabs), 0.25 μl SAM and 0.5 μl MmeI (2,000 units/ml, NEB). The reaction was digested with MmeI at 37 °C for 60 minutes and then purified on a Qiagen QiaQuick column buffered with 3 M sodium acetate at a final concentration of 0.1%. According to the manufacturer's instructions, the column was washed with 700 μl 8.0 M guanidine hydrochloride and the sample was applied to the column. The DNA was eluted with 30 μl EB buffer and diluted to a final volume of 100 μl.

Streptavidin magnetic beads (50 μl) (Dynal Dynabeads M270, Invitrogen, Carlsbad, CA, USA) were prepared as follows: after washing with 2X bead binding buffer, the beads were suspended in 100 μl 2X bead binding buffer, after which 100 μl DNA sample was added to the beads and mixed at room temperature for 20 minutes. The beads were washed twice in wash buffer. The SAD7 adaptor set (an A/B set, in which single-stranded oligonucleotides SAD7Ftop and SAD7Fbot are annealed to form the A adaptor, and single-stranded oligonucleotides SAD7Rtop and SAD7Rbot are annealed to form the B adaptor) (SAD7Ftop: 5'-CCGCCCAGCATCGCCTCAGNN-3' (SEQ ID NO: 51); SAD7Fbot: 5'-CTGAGGCGATGCTGG-3' (SEQ ID NO: 52); SAD7Rtop: 5'-CCGCCCGAGCACCGCTCAGNN-3' (SEQ ID NO: 53); SAD7Rbot: 5'-CTGAGCGGTGCTCGG-3' (SEQ ID NO: 54), where N is any of the 4 bases (A, G, T or C)) was ligated to the DNA bound to the streptavidin beads by adding to the bead-DNA mixture a ligation mixture containing 15 μl H₂O, 25 μl Quick ligase buffer, 5 μl SAD7 adaptor set and 5 μl Quick ligase (as above). The ligation reaction was incubated at 25 °C for 15 minutes, and the beads were then washed twice with bead wash buffer.
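
The structure of the annealed SAD7 adaptors can be checked directly from the four listed sequences. The Python sketch below locates where each bottom strand pairs on its top strand and reports the single-stranded ends of the top strand; the helper names are illustrative, and N is treated as pairing with N purely for bookkeeping.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def describe_duplex(top, bottom):
    """Locate where `bottom` anneals on `top` and return the top-strand parts."""
    start = top.find(reverse_complement(bottom))
    if start < 0:
        raise ValueError("bottom strand is not complementary to the top strand")
    end = start + len(bottom)
    return {
        "five_prime_overhang": top[:start],
        "duplex": top[start:end],
        "three_prime_overhang": top[end:],
    }


# (top, bottom) pairs as listed in the text: SEQ ID NO: 51/52 and 53/54.
ADAPTORS = {
    "A": ("CCGCCCAGCATCGCCTCAGNN", "CTGAGGCGATGCTGG"),
    "B": ("CCGCCCGAGCACCGCTCAGNN", "CTGAGCGGTGCTCGG"),
}

if __name__ == "__main__":
    for name, (top, bottom) in ADAPTORS.items():
        print(name, describe_duplex(top, bottom))
```

For both adaptors the bottom strand pairs with 15 bases of the top strand, leaving a 4-base 5' extension (CCGC) and the degenerate NN at the 3' end of the top strand.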

Nucleotide fill-in was performed by adding to the beads a mixture containing 40 μl H₂O, 5 μl 10X fill-in buffer, 2 μl 10 mM dNTPs and 3 μl fill-in polymerase (Bst DNA polymerase, 8,000 units/ml, NEB). After the reaction was incubated at 37 °C for 20 minutes, the beads were washed twice in wash buffer. The beads were then suspended in 25 μl TE buffer.

The bead-bound DNA was then amplified by PCR in a reaction mixture containing 30 μl H₂O, 5 μl 10X Advantage 2 buffer, 2 μl 10 mM dNTPs, 1 μl 100 μM forward primer (SAD7FPCR: 5'-Bio-CCGCCCAGCATCGCC-3' (SEQ ID NO: 55)), 1 μl 100 μM reverse primer (SAD7RPCR: 5'-CCGCCCGAGCACCGC-3' (SEQ ID NO: 56)), 10 μl bead-bound DNA and 1 μl Advantage 2 polymerase mix (Clontech, Mountain View, CA, USA). PCR was performed using the following program: (a) 94 °C for 4 minutes; (b) 94 °C for 15 seconds; (c) 64 °C for 15 seconds, with steps (b) and (c) cycled 19 times; (d) 68 °C for 2 minutes; after which the reaction was held at 14 °C.
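
The relationship between the PCR primers and the SAD7 adaptors can be verified from the sequences alone: each primer matches the first 15 bases of the corresponding adaptor top strand. The Python sketch below performs this check and also reports GC content; the names are illustrative, and the 5' biotin on the forward primer is omitted from the string.

```python
# Adaptor top strands and PCR primers as listed in Example 2 (5'->3').
ADAPTOR_TOP = {
    "A": "CCGCCCAGCATCGCCTCAGNN",  # SAD7Ftop
    "B": "CCGCCCGAGCACCGCTCAGNN",  # SAD7Rtop
}
PCR_PRIMER = {
    "A": "CCGCCCAGCATCGCC",  # SAD7FPCR (5' biotin not represented)
    "B": "CCGCCCGAGCACCGC",  # SAD7RPCR
}


def primer_is_adaptor_prefix(primer, adaptor_top):
    """True if the primer is identical to the 5' end of the adaptor top strand."""
    return adaptor_top.startswith(primer)


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    for name in ("A", "B"):
        primer = PCR_PRIMER[name]
        ok = primer_is_adaptor_prefix(primer, ADAPTOR_TOP[name])
        print(f"{name}: prefix match={ok}, GC={gc_fraction(primer):.2f}")
```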

The PCR product was purified using a Qiagen MinElute PCR purification column, and the purified product was then electrophoresed on a 1.5% agarose gel at 5 V/cm to detect the presence of a 120 bp product. The 120 bp fragment was excised from the gel and recovered using the Qiagen MinElute gel extraction protocol. The 120 bp fragment was eluted with 18 μl EB buffer. The double-stranded product was bound to streptavidin beads, and the beads were washed twice with bead wash buffer. The single-stranded product was eluted with 125 mM NaOH and purified on a Qiagen MinElute PCR purification column. This material was then sequenced using standard 454 Life Sciences Corporation (Branford, CT, USA) sequencing methods on a 454 Life Sciences Corporation automated sequencing system.

Example 3: Protocol for non-hairpin adaptor paired-end sequencing

E. coli K12 DNA (5 μg) in a volume of 100 μl was hydrodynamically sheared for 20 cycles at speed 11 using a standard assembly (HydroShear, as above). The sheared DNA was purified on a Qiagen MinElute PCR purification column according to the manufacturer's instructions and eluted with 23 μl EB buffer. The purified sheared DNA was blunt-end polished in a reaction mixture containing 23 μl DNA, 5 μl 10X polishing buffer, 5 μl 1 mg/ml bovine serum albumin, 5 μl 10 mM ATP, 3 μl 10 mM dNTPs, 5 μl 10 U/μl T4 polynucleotide kinase and 5 μl 3 U/μl T4 DNA polymerase. The reaction was incubated at 12 °C for 15 minutes, after which the temperature was raised to 25 °C for a further 15 minutes. The reaction was then purified on a Qiagen MinElute PCR purification column according to the manufacturer's instructions. Ligation of the non-hairpin adaptor was performed using 2 μg of the purified sheared DNA in a reaction mixture containing 25 μl 2X Quick ligase buffer, 18.5 μl 10 μM non-hairpin adaptor and 2.5 μl Quick ligase (as above). The ligation reaction was incubated at 25 °C for 15 minutes, after which the sample was passed successively through a Sephacryl S-400 spin column and a Qiagen MinElute PCR purification column. The DNA was then eluted from the column with 10 μl EB buffer.

The purified, ligated DNA was then subjected to a kinase reaction in a mixture containing 13 μl H₂O, 25 μl 2X buffer, 10 μl DNA and 2 μl 10 U/μl T4 polynucleotide kinase. The reaction was incubated at 37 °C for 60 minutes, after which the sample was electrophoresed on a 1% agarose gel at 5 V/cm. The band between 1500 bp and 4000 bp was excised from the gel and recovered using the Qiagen MinElute gel extraction protocol.

The purified DNA was subjected to another round of ligation to produce circular DNA in a reaction mixture containing 18 μl DNA, 20 μl Buffer 4 (New England Biolabs), 2 μl ATP, 150 μl H₂O and 10 μl ligase (as above). The reaction was incubated at 25 °C for 15 minutes, after which a mixture containing 2 μl λ exonuclease (as above), 1 μl Rec J (as above), 1 μl T7 exonuclease (as above) and 1 μl exonuclease I (as above) was added and incubated at 37 °C for 30 minutes. After the exonuclease reaction, the DNA was purified on a Qiagen MinElute PCR purification column and eluted with 20 μl EB buffer.

The purified, ligated DNA was then added to a mixture containing 68.6 μl H₂O, 10 μl Buffer 4 (New England Biolabs), 0.2 μl SAM and 1 μl MmeI restriction endonuclease (as above). The DNA was digested at 37 °C for 30 minutes, after which it was purified on a Qiagen QiaQuick column pre-buffered with 3 M sodium acetate at a final concentration of 0.1% and washed with 700 μl 8.0 M guanidine hydrochloride. The purified DNA was then eluted with 30 μl EB buffer and the volume was adjusted to 100 μl.

Streptavidin magnetic beads (50 μl) (as above) were washed with 2X bead binding buffer and suspended in 100 μl bead binding buffer. The beads were then mixed with the 100 μl DNA sample and allowed to bind at room temperature for 20 minutes. The beads were then washed twice in wash buffer and ligated to the SAD7 adaptor set (A/B set) (as above). A mixture containing 15 μl H₂O, 25 μl Quick ligase buffer, 5 μl SAD7 adaptors and 5 μl Quick ligase (as above) was added to the bead-bound DNA and incubated at 25 °C for 15 minutes, after which the beads were washed twice in wash buffer.

The bead-bound DNA was filled in using a mixture containing 40 μl H₂O, 5 μl 10X fill-in buffer, 2 μl 10 mM dNTPs and 3 μl fill-in polymerase (as above). The reaction was carried out at 37 °C for 20 minutes, after which the beads were washed twice in wash buffer and suspended in 25 μl TE buffer. The bead-bound DNA was amplified in a reaction mixture containing 30 μl H₂O, 5 μl 10X Advantage 2 buffer, 2 μl dNTPs, 0.5 μl 100 μM forward primer (as above), 0.5 μl 100 μM reverse primer (as above), 10 μl bead-bound DNA and 1 μl Advantage 2 enzyme (as above). The PCR reaction was carried out under the following conditions: (a) 94 °C for 4 minutes; (b) 94 °C for 15 seconds; (c) 64 °C for 15 seconds, with steps (b) and (c) repeated for 24 cycles; (d) 68 °C for 2 minutes; after which the PCR reaction was held at 14 °C. The PCR product was purified on a Qiagen MinElute PCR purification column and electrophoresed in a 1.5% agarose gel at 5 V/cm. The 120 bp product was excised from the gel and recovered using the Qiagen MinElute gel extraction protocol. The DNA was then eluted with 18 μl EB buffer.

The double-stranded DNA was bound to streptavidin beads, and the beads were washed twice with bead wash buffer. The single-stranded DNA was then eluted with 125 mM NaOH and purified using a Qiagen MinElute PCR purification column. The purified material was subjected to the standard 454 emulsion and sequencing protocol.

Using the methods described above, we obtained the following results:

E. coli contigs generated from normal 454 sequencing of four 60x60 runs (approximately 1.3 x 10⁶ reads): 303 contigs greater than 1000 bp were produced, with an average size of 16,858 bp and a maximum size of 94,060 bp. Table 5 contains additional results obtained using the methods described above.

Table 5: Results of the paired-end sequencing method

(Table 5 is an image in the source and is not reproduced here.)

Analysis was performed by first blasting all of the paired reads against the E. coli K12 genome obtained from Genbank. Reads that matched the reference genome with an expectation value of less than 0.1 were retained. All reads containing two independent blast hits separated by the internal linker sequence were analyzed for the distance between the blast hits in the genome, and were retained if the distance was less than 5,000 bp. These reads were then sorted by their first and second positional hits in the genome, and were examined to see whether overlapping paired sequences appeared adjacent in the sort order. Each of these ordered contigs was then examined for overlap partners among the 454 sequenced contigs in the same manner as described above.
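
The filtering and sorting steps of this analysis can be outlined as follows. This Python sketch assumes that the blast hits for the two halves of each read have already been parsed into simple records; the `PairedRead` type, its field names and the example values are hypothetical, and linear genome coordinates are used without any handling of a circular chromosome.

```python
from dataclasses import dataclass

MAX_EVALUE = 0.1        # reads must match with an expectation value below this
MAX_DISTANCE_BP = 5000  # retained only if the two hits are closer than this


@dataclass(frozen=True)
class PairedRead:
    read_id: str
    pos1: int       # genome position of the hit for the first tag
    pos2: int       # genome position of the hit for the second tag
    evalue1: float
    evalue2: float


def filter_and_sort(reads):
    """Keep pairs passing both thresholds, sorted by first then second hit."""
    kept = [
        r for r in reads
        if r.evalue1 < MAX_EVALUE
        and r.evalue2 < MAX_EVALUE
        and abs(r.pos2 - r.pos1) < MAX_DISTANCE_BP
    ]
    return sorted(kept, key=lambda r: (r.pos1, r.pos2))


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        PairedRead("r1", 120_000, 122_500, 1e-5, 1e-4),
        PairedRead("r2", 50_000, 900_000, 1e-6, 1e-6),  # hits too far apart
        PairedRead("r3", 10_000, 12_000, 0.5, 1e-6),    # weak first hit
        PairedRead("r4", 30_000, 33_000, 1e-3, 1e-3),
    ]
    print([r.read_id for r in filter_and_sort(example)])
```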

Example 4: Protocol for in vitro excision by recombination reaction

1. DNA fragmentation

A 30 μg sample of E. coli K12 DNA was sheared using the HydroShear large assembly to produce 15-30 Kb fragments. The DNA fragments were purified on a MicroSpin S400 column.

2. Fragment end polishing

The ends of the DNA fragments were polished with T4 DNA polymerase and T4 PNK in a microcentrifuge tube as follows. Two reactions were performed for the 30 μg starting DNA sample.

10X PNK buffer 10 μl

BSA (20 mg/ml dilution) 0.5 μl

ATP (100 mM) 1 μl

dNTPs (10 mM each) 4 μl

Sheared DNA (<15 μg) 75 μl

T4 DNA polymerase (3 U/μl) 5 μl

T4 PNK (10 U/μl) 5 μl

The reaction mixture was mixed thoroughly and incubated at 12 °C for 15 minutes, followed immediately by incubation at 25 °C for 15 minutes. Each reaction was purified with the QIAEX II kit and eluted with 37 μl EB.

3. LoxP adaptor ligation

The loxP6 adaptors were added to the polished DNA fragments as follows (duplicate reactions are required).

Roche 2X Rapid ligase buffer (#1) 50 μl

LoxP6 adaptors (20 μM each) 10 μl

Polished DNA 35 μl

Roche Rapid ligase (#3) 5 μl

The reaction mixture was mixed thoroughly and incubated at 25 °C for 15 minutes.

4. Gel purification and size selection

The two loxP-ligated DNA samples were loaded onto a large 0.5% agarose gel using a preparative comb (multiple wells may be used if a sample comb is used), and the gel was run overnight at 35 V.

The next morning, DNA fragments in the desired range (for example 20-25 Kb) were collected and purified using QIAEX II according to the manufacturer's instructions.

5. Fill-in

A fill-in reaction was performed to repair the nicks introduced by ligation of the loxP6 adaptors.

LoxP-ligated DNA 38 μl

10X Bst polymerase buffer 5 μl

dNTPs (10 mM each) 4 μl

Bst DNA polymerase 3 μl

The reaction mixture was mixed thoroughly, incubated at 50 °C for 15 minutes, and then passed through a MicroSpin S400 column. The DNA concentration was then quantified.

6. Excision reaction for circularization

Site-specific recombination was performed with 150-300 ng of the DNA produced by the fill-in reaction above to produce circularized molecules.

Molecular biology grade water 39 μl

10X Cre buffer 10 μl

Filled-in DNA (150 ng) 50 μl

Cre recombinase (12 U/μl) 1 μl

The reaction mixture was mixed thoroughly and incubated at 37 °C for 45 minutes, and the Cre recombinase was then inactivated at 80 °C for 10 minutes. The reaction mixture was cooled to 10 °C and the next step was carried out immediately.
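
The composition of the excision reaction can be checked with simple arithmetic. The Python sketch below uses only the volumes and stock concentrations listed above; the variable names are illustrative.

```python
# Excision (Cre) reaction components from step 6, volumes in microlitres.
CRE_REACTION_UL = {
    "molecular biology grade water": 39.0,
    "10X Cre buffer": 10.0,
    "filled-in DNA": 50.0,
    "Cre recombinase": 1.0,
}

total_ul = sum(CRE_REACTION_UL.values())

# Final buffer strength: 10X stock diluted into the total volume.
buffer_x = 10.0 * CRE_REACTION_UL["10X Cre buffer"] / total_ul

# Total enzyme units from the 12 U/ul stock.
cre_units = 12.0 * CRE_REACTION_UL["Cre recombinase"]

# DNA concentration for the 150 ng input listed in the table.
dna_ng_per_ul = 150.0 / total_ul

if __name__ == "__main__":
    print(f"total: {total_ul:.0f} ul, buffer: {buffer_x:.1f}X, "
          f"Cre: {cre_units:.0f} U, DNA: {dna_ng_per_ul:.1f} ng/ul")
```

With these values the reaction is 100 μl at 1X buffer, with 12 U of Cre recombinase and 1.5 ng/μl DNA.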

7. Removal of linear molecules

Linear molecules were removed from the above reaction mixture by exonuclease treatment.

The exonuclease incubation was carried out immediately by adding the following reagents to the cooled excision reaction mixture above.

ATP (100 mM) 1.1 μl

DTT (100 mM) 1.1 μl

Plasmid-Safe ATP-dependent DNase (10 U/μl) 5 μl

Exonuclease I (20 U/μl) 3 μl

The reaction mixture was mixed thoroughly and incubated at 37 °C for 30-60 minutes. The exonucleases were then immediately inactivated by incubation at 80 °C for 20 minutes.
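
The effect of these additions on the reaction can be estimated as follows. This Python sketch assumes that the full 100 μl excision reaction from step 6 is carried over without loss, which the text implies but does not state; the names are illustrative.

```python
# Assumption: the whole 100 ul excision reaction from step 6 is carried over.
CARRIED_OVER_UL = 100.0

# Reagents added in step 7, volumes in microlitres.
ADDITIONS_UL = {
    "ATP (100 mM)": 1.1,
    "DTT (100 mM)": 1.1,
    "Plasmid-Safe ATP-dependent DNase (10 U/ul)": 5.0,
    "Exonuclease I (20 U/ul)": 3.0,
}

final_volume_ul = CARRIED_OVER_UL + sum(ADDITIONS_UL.values())

# Final concentrations in mM, from C1*V1 = C2*V2.
atp_final_mm = 100.0 * ADDITIONS_UL["ATP (100 mM)"] / final_volume_ul
dtt_final_mm = 100.0 * ADDITIONS_UL["DTT (100 mM)"] / final_volume_ul

if __name__ == "__main__":
    print(f"final volume: {final_volume_ul:.1f} ul")
    print(f"ATP: {atp_final_mm:.2f} mM, DTT: {dtt_final_mm:.2f} mM")
```

Under this assumption the final volume is 110.2 μl, and ATP and DTT are each at approximately 1 mM.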

The remaining steps below are a modification of the 454 library preparation method.

8. Nebulization of circularized molecules

The circularized molecules were fragmented by nebulization into fragments of less than 1 Kb.

1 μl 0.5 M EDTA and 1 μg pUC19 were added to the heat-inactivated reaction mixture above. The DNA was nebulized in nebulization buffer at 44 psi for 2 minutes. The nebulized DNA fragments were purified with the MinElute kit according to the manufacturer's instructions.

9. Fragment end polishing

10X PNK buffer 5 μl

BSA (1 mg/ml dilution) 5 μl

ATP (10 mM) 5 μl

dNTPs (10 mM each) 2 μl

Nebulized DNA 23 μl

T4 DNA polymerase (3 U/μl) 5 μl

PNK (10 U/μl) 5 μl

The reaction mixture was mixed thoroughly and incubated at 12 °C for 15 minutes, followed immediately by incubation at 25 °C for 15 minutes. The reaction was purified with QiaQuick and eluted with 50 μl EB.

10. Library immobilization

The polished DNA fragments were bound to streptavidin-coated beads (for example Dynal M270 beads) according to the manufacturer's recommendations. The beads were washed 3 times with 500 μl TE, leaving only the beads.

11. 454 PE adaptor ligation

The 454 paired-end adaptors were ligated to the polished DNA fragments immobilized on the beads as follows:

Molecular biology grade water 15 μl

Roche Rapid ligase buffer (#1) 25 μl

Non-biotinylated 454 PE adaptors 5 μl

The reaction mixture was mixed thoroughly and added to the beads carrying the captured DNA. The reaction mixture was vortexed to mix, and the following was then added:

Roche Rapid ligase (#3) 5 μl

The reaction mixture was mixed thoroughly and incubated on a rotator at room temperature for 15 minutes. The beads were washed at least 3 times with 500 μl TE, leaving only the beads.

12. Fill-in

A fill-in reaction was performed to repair nicks and to fill in the 5' overhangs introduced by the 454 PE adaptors.

Molecular biology grade water 40 μl

10X Bst DNA polymerase buffer 5 μl

dNTPs (10 mM each) 2 μl

Bst DNA polymerase 3 μl

The reaction mixture was added to the DNA beads above and incubated at 37 °C for 15 minutes. The beads were then suspended in 20 μl EB.

13. Library pre-amplification

The double-stranded paired-end library was pre-amplified as follows:

Molecular biology grade water 28.5 μl

10X HiFi buffer 5 μl

50 mM MgCl₂ 2.5 μl

dNTPs (10 mM each) 2 μl

Forward/reverse primer pair (100 μM each) 1 μl

DNA on beads 10 μl

HiFi Taq DNA polymerase (5 U/μl) 1 μl

The following program was used on the thermal cycler:

94 °C for 3 minutes

20 cycles of: 94 °C for 30 seconds; 60 °C for 20 seconds; 72 °C for 45 seconds

72 °C for 2 minutes

Hold at 10 °C
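
The thermal cycler program above can be represented as data, which makes it easy to compute the programmed hold time. The Python sketch below counts only the listed hold times; ramping between temperatures and the final 10 °C hold are not included, and the type and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


# Pre-amplification program from step 13.
INITIAL = Step(94, 180)
CYCLE = (Step(94, 30), Step(60, 20), Step(72, 45))
N_CYCLES = 20
FINAL = Step(72, 120)


def programmed_seconds(initial, cycle, n_cycles, final):
    """Sum of programmed hold times, excluding ramp times and the final hold."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


if __name__ == "__main__":
    total = programmed_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(f"{total} s ({total / 60:.1f} min)")
```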

14. Library size selection

The desired library fragment size was selected by two rounds of SPRI bead clean-up as follows.

1) Molecular biology grade water was added to bring the above reaction mixture to 100 μl. 72 μl SPRI beads were added to the sample. The beads were incubated according to the manufacturer's instructions and then washed. The DNA was eluted with 80 μl EB.

2) 52 μl SPRI beads were added to the 80 μl eluted sample, followed by incubation at room temperature for 5 minutes. The beads were bound to the MPC and the unbound supernatant was collected.

3) After buffer exchange with the QiaQuick kit, the DNA was eluted with 50 μl EB.
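
In SPRI clean-ups the bead-to-sample volume ratio is commonly used to tune the size cut-off, so it can be helpful to express the two rounds above as ratios. The Python sketch below computes them from the listed volumes; interpreting round 1 as removing fragments that fail to bind and round 2 as removing fragments that do bind follows from which fraction is kept in each step, and no fragment-size cut-offs are implied.

```python
def spri_ratio(bead_ul, sample_ul):
    """Bead volume divided by sample volume."""
    return bead_ul / sample_ul


# Round 1: beads are washed and the bound DNA is eluted (bound fraction kept).
ROUND_1 = {"bead_ul": 72.0, "sample_ul": 100.0, "kept": "bound"}
# Round 2: the unbound supernatant is collected (unbound fraction kept).
ROUND_2 = {"bead_ul": 52.0, "sample_ul": 80.0, "kept": "unbound"}

if __name__ == "__main__":
    for name, rnd in (("round 1", ROUND_1), ("round 2", ROUND_2)):
        ratio = spri_ratio(rnd["bead_ul"], rnd["sample_ul"])
        print(f"{name}: {ratio:.2f}x beads, {rnd['kept']} fraction kept")
```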

15. Single-stranded library isolation

1) The size-selected DNA above was captured with streptavidin beads. After washing, the bead-bound DNA was denatured with melt solution, and the unbound ssDNA was collected.

2) The ssDNA was neutralized with sodium acetate and buffer-exchanged with the MinElute kit. The ssDNA was eluted with 15-20 μl TE.

The single-stranded paired-end library members were then amplified in a standard 454 emulsion amplification reaction, and the population of amplified members was sequenced. Figure 24 includes a graph showing a paired distance distribution consistent with the target insert size of 24 Kb, with a longest detected paired distance of approximately 40 Kb.

Although advantageous embodiments of the invention have been described in detail herein, it should be understood that the invention defined by the foregoing paragraphs is not limited to the particular details set forth in the above description, because many obvious variations thereof are possible without departing from the spirit or scope of the invention. Modifications and variations of the methods described herein will be apparent to those skilled in the art and are intended to be included within the appended claims.

Claims

1. A method for obtaining a DNA construct comprising two end regions of a target nucleic acid in an in vitro reaction, the method comprising the steps of:

- fragmenting a large nucleic acid molecule to produce a target nucleic acid molecule;

- ligating a recombination adaptor element to each end of the target nucleic acid molecule to produce an adapted target nucleic acid molecule;

- exposing the adapted target nucleic acid to a site-specific recombinase to produce a circular nucleic acid product and a linear nucleic acid product from the adapted target nucleic acid, wherein the circular nucleic acid product comprises the target nucleic acid molecule; and

- fragmenting the circular nucleic acid product to produce a template nucleic acid molecule comprising a sequence region from each end of the target nucleic acid molecule.

2. The method of claim 1, wherein after the step of exposing the adapted target nucleic acid to the site-specific recombinase, the method further comprises the step of removing non-circular molecules.

3. The method of claim 1, further comprising the steps of:

- amplifying the template nucleic acid to produce a population comprising a plurality of substantially identical copies; and

- sequencing the population to produce sequence data comprising the sequence composition of the template nucleic acid.

4. The method of claim 1, wherein the recombination adaptor element comprises a first recombination adaptor element and a second recombination adaptor element, wherein the first and second recombination adaptor elements both comprise a directional element.

5. The method of claim 1, wherein the site-specific recombinase comprises Cre recombinase.

6. The method of claim 1, wherein the target nucleic acid molecule comprises a length selected from: at least 3 Kb, at least 8 Kb, at least 10 Kb, at least 20 Kb, at least 50 Kb and at least 100 Kb.

7. The method of claim 1, wherein the large nucleic acid molecule comprises genomic DNA.

8. The method of claim 1, wherein the circular nucleic acid product comprises a first hybrid recombination adaptor and the linear nucleic acid product comprises a second hybrid recombination adaptor, wherein the first and second hybrid recombination adaptors comprise elements from the ligated recombination adaptors.

9. The method of claim 1, wherein the step of fragmenting the circular nucleic acid product comprises nebulization.

10. A method for obtaining a plurality of DNA constructs comprising two end regions of a target nucleic acid in an in vitro reaction, the method comprising the steps of:

- fragmenting a large nucleic acid molecule to produce a plurality of target nucleic acid molecules;

- ligating a recombination adaptor element to each end of the target nucleic acid molecules to produce a plurality of adapted target nucleic acid molecules;

- exposing the adapted target nucleic acid molecules to a site-specific recombinase to produce a plurality of circular nucleic acid products and a plurality of linear nucleic acid products from the adapted target nucleic acid molecules, wherein the circular nucleic acid products comprise the target nucleic acid molecules; and

- fragmenting the circular nucleic acid products to produce a plurality of template nucleic acid molecules comprising a sequence region from each end of the target nucleic acid molecules.

11. A kit for performing the method of claim 1, the kit comprising:

- a plurality of recombination adaptor elements; and

- a site-specific recombinase, which is preferably Cre recombinase.

12. A kit for performing the method of claim 1, the kit comprising:

- a plurality of recombination adaptor elements;

- a site-specific recombinase, which is preferably Cre recombinase;

- an exonuclease; and

- a circular carrier DNA, which is preferably pUC19.