WO1997043449A1 - Methods for sequencing large nucleic acid segments and compositions for large sequencing projects - Google Patents

Methods for sequencing large nucleic acid segments and compositions for large sequencing projects

Publication number
WO1997043449A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
primer
sequencing
dna
complementary
Application number
PCT/US1997/008114
Other languages
French (fr)
Inventor
Koichi Hagiwara
Curtis Harris
Original Assignee
The Government Of The United States Of America, As Represented By The Secretary, Department Of Health And Human Services
Application filed by The Government Of The United States Of America, As Represented By The Secretary, Department Of Health And Human Services
Priority to AU30660/97A (patent AU3066097A)
Publication of WO1997043449A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • This invention relates to the field of nucleic acid amplification and sequencing and kits for amplification and sequencing.
  • DNA sequencing typically involves two steps: i) making suitable templates for all the regions to be sequenced; and ii) running sequencing reactions for electrophoresis. The latter step can be automated by use of workstations and autosequencers. The first step requires careful experimental design and laborious DNA manipulation such as the construction of nested deletion mutants. See,
  • In shotgun sequencing methods, randomly selected subclones, which may or may not have overlapping sequence information, are sequenced. The sequences of the subclones are then compiled to produce an ordered sequence. These procedures eliminate complicated DNA manipulations; however, the method is inherently inefficient because many recombinant clones must be sequenced due to the random nature of the procedure.
  • This invention provides improvements in nucleic acid sequencing technology, particularly for sequencing large DNAs.
  • the invention provides a way of generating overlapping nucleic acids from a large clone to facilitate sequencing, and powerful methods of amplifying and tagging the overlapping nucleic acids into suitable sequencing templates. The methods can be used in conjunction with shotgun sequencing techniques to dramatically improve the efficiency of shotgun methods.
  • the invention provides new methods of sequencing large nucleic acids, kits for practicing the methods and related compositions.
  • a plurality of overlapping double stranded nucleic acids are provided, for example by digesting a large recombinant DNA such as a cosmid, recombinant lambda phage genome, yeast artificial chromosome (YAC), or P1 plasmid (alternatively referred to as a P1 artificial chromosome, or "PAC") with at least one restriction endonuclease.
  • YAC Yeast artificial chromosome
  • PAC P1 plasmid
  • the large DNA is aliquotted into several separate reaction mixtures and digested in parallel with a multiplicity of different enzymes.
  • samples of the large DNA are digested in parallel with Alu I, Acc I, BstY I, Hinc II, Rsa I, Sca I, Msl I, Hpa I, Sma I, fltfiJ 1, Tsp45 I, Stu I, Afl III, Pvu II and EcoR V.
  • This parallel digestion provides overlapping fragments of the large nucleic acid, preferably in overlapping increments of about 200 to 600 nucleotides, and commonly 200 to 400 nucleotides. Additional restriction endonucleases are used as necessary, depending on the particular large DNA.
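  • The parallel digestion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sequence is invented, the DNA is treated as linear, each digest is assumed to go to completion, and only five of the named enzymes are modeled, using their standard recognition sequences with a blunt cut at the midpoint of the site. The function names are hypothetical.

```python
import re

# Recognition sequences of a few blunt-cutting enzymes named in the text.
# Each is modeled as cutting at the midpoint of its site (e.g. Alu I: AG^CT).
BLUNT_CUTTERS = {
    "AluI": "AGCT",
    "RsaI": "GTAC",
    "HpaI": "GTTAAC",
    "SmaI": "CCCGGG",
    "PvuII": "CAGCTG",
}

def cut_positions(seq, site):
    """Return cut coordinates for an enzyme that cleaves at the site midpoint."""
    half = len(site) // 2
    # A lookahead pattern finds overlapping occurrences of the site as well.
    return [m.start() + half for m in re.finditer(f"(?={site})", seq)]

def digest(seq, site):
    """Return the fragments of a complete digest of a linear sequence."""
    bounds = [0] + cut_positions(seq, site) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def parallel_digest(seq, enzymes=BLUNT_CUTTERS):
    """Digest separate aliquots of the same DNA, one enzyme per aliquot."""
    return {name: digest(seq, site) for name, site in enzymes.items()}

# Invented toy sequence, far shorter than any real cosmid or YAC insert.
TOY = "TTGACAGCTAAGGGTACCATCCCGGGATTCAGCTGAAGTTAACTT"

for name, fragments in parallel_digest(TOY).items():
    print(name, [len(f) for f in fragments])
```

  • Because each enzyme cuts at different positions, the fragment ends produced in the separate aliquots fall at staggered coordinates along the same DNA, which is what makes the resulting templates overlap.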
  • a first primer binding site is ligated to a selected first double stranded nucleic acid fragment.
  • the first primer binding site when in double-stranded form typically has terminal regions which are complementary, surrounding a central non-complementary region.
  • a second primer binding site is preferably ligated to a selected second double stranded nucleic acid fragment, and a third primer binding site is typically ligated to a third selected double stranded nucleic acid fragment and so on, until each nucleic acid fragment has a ligated primer binding site.
  • primer binding sites are preferably identical, but are optionally selected independently.
  • a selected double stranded nucleic acid fragment with a ligated primer binding site is a "tagged" double stranded nucleic acid fragment.
  • the first double stranded nucleic acid fragment with a primer binding site is a tagged first double stranded nucleic acid.
  • the first primer binding site when in double-stranded form, typically has terminal regions which are complementary, surrounding a central non-complementary region.
  • An example of a first primer binding site is a vectorette such as those which are used for identification of the terminal ends of YACs.
  • the first double stranded nucleic acid fragment has blunt ends, either due to the restriction digest, or due to enzymatic processing of the ends (e.g., with a polymerase such as T4 DNA polymerase). Blunting the ends of the nucleic acid fragments facilitates attachment of a single primer binding site to all of the overlapping fragments for parallel sequencing analysis.
  • the first and second primer binding sites are identical.
  • the ends are optionally left heterogeneous for the different nucleic acid fragments, with different primer binding sites being ligated onto each fragment.
  • the selected double stranded nucleic acid is phosphorylated (either as a result of the endonuclease digestion, or by treating the nucleic acid with a kinase enzyme), while the primer binding site is not phosphorylated. This causes each primer binding site to be covalently ligated to only one strand of the double-stranded nucleic acid, and prevents the formation of primer binding site concatamers.
  • the ligation is ordinarily performed with a ligase, but is optionally performed chemically.
  • the tagged fragments are then amplified by anchored PCR.
  • a portion of the tagged nucleic acid is amplified using an internal primer complementary to the selected double stranded nucleic acid
  • the sequence of the internal primer is typically derived from examining known sequence present in the large DNA, such as vector sequence, or other available sequence information, and constructing a complementary nucleic acid to the known sequence.
  • the PCR reaction is performed by denaturing the first double-stranded tagged nucleic acid, and hybridizing the internal primer to a complementary amplification primer binding subsequence of the selected double stranded nucleic acid, wherein the internal sequence is 3' to the first primer binding site, and performing a primer extension of the strand complementary to the internal primer binding subsequence by PCR, with the first internal primer to prime DNA synthesis, thereby forming a first PCR amplification product.
  • a first amplification primer is hybridized to the first PCR amplification product in a region of the first amplification product which is complementary to the first primer binding site, and, primer extension of the strand complementary to the first amplification product is performed by PCR, using the first amplification primer to prime DNA synthesis, thereby producing a double stranded amplified first nucleic acid subsequence.
  • the first amplification primer is complementary to the first amplification product, in whole or in part. Ordinarily, at least the 3' end of the amplification primer is complementary to the first primer binding site. Usually, about 8 to about 30 nucleotides at the 3' end of the amplification primer are complementary. An example of such a primer is the 224M13 primer described herein.
  • the PCR amplification strategy of the invention has a number of advantages over standard methods of performing PCR. For instance, because the first primer binding site comprises a non-complementary subsequence, the first amplification primer is designed so that it can only hybridize to the complement of one of the strands of the primer binding site, which is not generated until the first PCR amplification product is formed. This reduces or eliminates the formation of unwanted PCR products, thereby eliminating the need for purification and subcloning of the PCR products for subsequent sequencing.
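  • The specificity argument above can be checked with a small string model. The sketch below is an assumption-laden illustration, not the sequences or procedure of the invention: all sequences are invented, the vectorette is shown attached through both strands for simplicity, and annealing is reduced to an exact reverse-complement match. It shows that a primer identical to one strand of the non-complementary bubble finds no binding site on the tagged template until the internal primer has been extended through the vectorette.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def can_anneal(primer, strand):
    """True if the strand carries a perfect binding site for the primer."""
    return revcomp(primer) in strand

def extend(primer, template):
    """Full-length primer extension product (5'->3'), or None if no site."""
    site = template.find(revcomp(primer))
    if site < 0:
        return None
    # The polymerase copies everything 5' of the binding site on the template.
    return primer + revcomp(template[:site])

# Hypothetical sequences, chosen only for illustration.
FRAGMENT = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
LEFT, RIGHT = "GGATCCTAAG", "CTGACGAATT"        # complementary arms
BUBBLE_TOP, BUBBLE_BOTTOM = "AACCGGTTAC", "GTGTGAGAGA"  # mismatched centre

# Tagged duplex: both strands written 5'->3'.
top = FRAGMENT + LEFT + BUBBLE_TOP + RIGHT
bottom = revcomp(RIGHT) + BUBBLE_BOTTOM + revcomp(LEFT) + revcomp(FRAGMENT)

internal_primer = FRAGMENT[5:20]      # derived from known internal sequence
vectorette_primer = BUBBLE_BOTTOM     # same sequence as the bottom bubble strand

print("before first extension:",
      can_anneal(vectorette_primer, top), can_anneal(vectorette_primer, bottom))

first_product = extend(internal_primer, bottom)
print("after first extension:", can_anneal(vectorette_primer, first_product))
```

  • In this model the only strand carrying a site for the vectorette primer is the first extension product, so fragments that lack the internal primer site never give rise to a template for the second primer.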
  • the PCR products are then sequenced by performing a dideoxy chain termination reaction with the amplified first tagged nucleic acid subsequence as a template, using a first sequencing primer complementary to one strand of the amplified first tagged nucleic acid subsequence to prime DNA synthesis.
  • the first sequencing primer hybridizes to a terminal region of the amplified first tagged nucleic acid subsequence which comprises a nucleotide sequence complementary to one strand of the first primer binding site.
  • the sequencing primer can hybridize to a terminal region of the amplified first tagged nucleic acid subsequence which does not comprise a nucleotide sequence complementary to one strand of the first primer binding site.
  • the first amplification primer preferably includes a 5' tail which does not hybridize to the first amplification product.
  • This 5' tail often is complementary to a widely available sequencing primer such as the universal M13, M13 reverse, T7, T3 and SP6 sequencing primers.
  • the PCR products are digested with an exonuclease to produce a single stranded template for the dideoxy sequencing reaction.
  • although double stranded templates are acceptable for sequencing reactions, single stranded templates can produce more readable sequence information.
  • the above long distance sequencing methods are typically repeated on each of the double stranded nucleic acids generated from the large DNA, thereby providing sequence from the terminal region corresponding to the first amplification primer of each amplified fragment. The sequences are overlapping, and are compiled to produce the sequence of the large DNA, or a subsequence thereof.
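  • Compiling overlapping reads into a single sequence can be sketched as a greedy suffix-prefix merge. This is a toy illustration and not the compilation software referred to in this document: it assumes error-free reads that all come from the same strand, and the function names and the minimum-overlap parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(reads, min_len=3):
    """Greedily merge the pair of reads with the largest overlap until none remain."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs

# Three invented reads taken from one invented source sequence.
SOURCE = "ATGGCGTACGTTAGCCGATAAGCTTGCA"
READS = [SOURCE[0:12], SOURCE[8:20], SOURCE[16:28]]
print(assemble(READS, min_len=4))
```

  • Real reads contain errors and may come from either strand, so practical assemblers score approximate overlaps rather than exact ones; the sketch only shows why overlapping terminal reads are sufficient to reconstruct a contiguous sequence.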
  • kits for sequencing large nucleic acids typically comprise a container and instruction in the use of the kit for performing sequencing of large nucleic acids by the methods described herein.
  • the kits include components such as nucleic acids to make a primer binding site, and a primer which hybridizes to the complement of one strand of the primer binding site, wherein the primer binding site comprises a double-stranded vectorette comprising a central non-complementary nucleic acid subsequence.
  • kits optionally include reagents and materials for performing the amplification and dideoxy chain termination steps of the invention such as Taq polymerase, rTth DNA polymerase XL, Taq plus DNA polymerase, DNA ligase, one or more restriction endonucleases, computer software for compiling overlapping sequence information, software for designing PCR primers, DNA kinase, lambda exonuclease, PCR reagents (nucleotides, labels, salts etc.), DNA polymerase, Sequenase™, subcloning plasmids, and an M13 sequencing primer.
  • the invention also provides a set of PCR reaction mixtures, corresponding to the PCR reaction mixtures made in the process of sequencing a large DNA as described above.
  • the reaction mixtures typically comprise an overlapping series of DNAs, wherein each reaction mixture in the set of reaction mixtures comprise a template DNA tagged with a vectorette, a first primer which hybridizes to an internal subsequence of the template DNA, and a second primer which hybridizes to the vectorette.
  • the PCR reaction mixtures also typically include appropriate PCR reagents such as rTth polymerase XL, nucleotides, and
  • the PCR reaction mixtures optionally comprise any of the compositions used in performing the PCR steps of the methods of the invention, such as template DNA tagged with a vectorette at each end, thereby providing an internal template DNA subsequence, flanked by external vectorette sequences.
  • each of the vectorettes is covalently attached to only one strand of the internal DNA subsequence.
  • Vectorettes for the different PCR reactions in the set of PCR reactions ordinarily have the same sequences.
  • the vectorette comprises a subsequence which is complementary to a nucleic acid complementary to primer 224M13.
  • Figure 1 is a schematic representation of a preferred method of providing overlapping templates for sequencing.
  • Figure 2 is a schematic representation of a preferred method of sequencing a nucleic acid.
  • Figure 3, panels A and B, shows a typical set of amplified fragments made by anchored PCR. One-twentieth of each PCR amplification reaction was electrophoresed on a 1% agarose gel. Reactions are aligned according to fragment size.
  • Figure 4 is a schematic of combination sequencing by shotgun and long distance sequencing methods.
  • Figure 5 is a schematic of a simplified amplification/sequencing protocol.
  • nucleic acids are "overlapping" when the nucleic acids have a region of common sequence. For example, different restriction fragments of a large nucleic acid will have regions in common derived from the large nucleic acid. See, e.g., Figure 1.
  • a “primer binding site” is a region of a nucleic acid which specifically hybridizes to a primer by hybridization of complementary base pairs.
  • nucleic acid refers to a deoxyribonucleotide (DNA) or ribonucleotide (RNA) polymer in either single- or double-stranded form, and unless otherwise limited, encompasses known analogues of natural nucleotides that hybridize to nucleic acids in a manner similar to naturally occurring nucleotides.
  • nucleic acid sequence optionally includes the complementary sequence thereof.
  • Regions of nucleic acids are "complementary" when they can hybridize through standard nucleic acid base pairing (A-T, A-U, or C-G).
  • a region is non-complementary over specified regions when complementary base pairs do not align between the regions.
  • the "vectorette" units described herein have a central non-complementary region, resulting in a bubble structure where the complementary ends of the vectorette hybridize, and the central non-complementary region does not.
  • Two nucleic acids are "ligated" together when one or more covalent bonds are formed between the nucleic acids.
  • nucleic acids are ligated enzymatically, i.e., using a ligase enzyme; however, they are optionally ligated using chemical reagents.
  • a nucleic acid “tag” is a short nucleic acid of known sequence which is ligated to one or more nucleic acids.
  • a nucleic acid with a ligated tag is a “tagged nucleic acid.”
  • An “internal primer” is a primer (a single-stranded nucleic acid which is typically between 8 and 100 nucleotides in length, usually between 12 and 40 nucleotides in length and often between 17 and 30 nucleotides in length) which hybridizes to a subsequence found between the ends of a nucleic acid.
  • a “dideoxy chain termination reaction” is a reaction in which a dideoxy nucleotide is incorporated into a nucleic acid. Ordinarily, the reaction is a primer extension reaction performed using a polymerase (Sequenase™, Taq, rTth or the like). The dideoxy chain termination reaction forms the basis for most known enzymatic sequencing reactions.
  • a “nucleic acid template” or “template” is a nucleic acid which is copied, or, when single stranded, a nucleic acid which is used to make a complementary nucleic acid.
  • a "terminal region" of a nucleic acid refers to a subsequence of the nucleic acid which is located adjacent to either the 5' or 3' end of the nucleic acid.
  • a “vectorette” is a double-stranded nucleic acid, wherein a portion of the double-stranded nucleic acid is non-complementary. The non-complementary portion is ordinarily flanked by regions of complementarity.
  • a “recombinant nucleic acid” comprises or is encoded by one or more nucleic acids which are derived from a nucleic acid which was artificially constructed.
  • the nucleic acid can comprise or be encoded by a cloned nucleic acid formed by joining heterologous nucleic acids as taught, e.g., in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in
  • nucleic acid can be synthesized chemically. Two single-stranded nucleic acids "hybridize" when they form a double-stranded duplex.
  • the region of double-strandedness can include the full-length of one or both of the single-stranded nucleic acids, or all of one single stranded nucleic acid and a subsequence of the other single stranded nucleic acid, or the region of double-strandedness can include a subsequence of each nucleic acid.
  • Hybridization with Nucleic Acid Probes part I chapter 2 "overview of principles of hybridization and the strategy of nucleic acid probe assays", Elsevier, New York.
  • Appropriate solutions and temperatures for hybridization are sequence dependent, with the selection of appropriate hybridization conditions being routine. See, Tijssen et al., id.
  • highly stringent hybridization conditions are selected to be about 5-10 °C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
  • the Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe.
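  • As a rough worked example of the relation between the melting point and stringent conditions, the sketch below estimates the melting temperature of a short oligonucleotide with the Wallace rule (2 °C per A or T, 4 °C per G or C) and then subtracts the 5-10 °C margin. The Wallace rule is a common approximation for short oligonucleotides and is an assumption here; it is not taken from this document, and the function names and the example 17-mer are illustrative.

```python
def wallace_tm(oligo):
    """Approximate melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

def stringent_window(tm, low_offset=10, high_offset=5):
    """Temperature range lying 5-10 degrees C below the melting temperature."""
    return (tm - low_offset, tm - high_offset)

# Illustrative 17-mer; any short oligonucleotide could be used instead.
EXAMPLE = "GTAAAACGACGGCCAGT"
tm = wallace_tm(EXAMPLE)
print(tm, stringent_window(tm))
```

  • The estimate ignores ionic strength, pH and oligonucleotide concentration, all of which the definition above treats as fixed, so it should be read as a starting point for choosing conditions rather than a measured value.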
  • a “primer extension reaction,” is performed by hybridizing a primer to a template nucleic acid, and covalently linking nucleotides to the primer such that the added nucleotides are complementary to the template nucleic acid.
  • Primer extension is ordinarily performed using an enzyme such as a DNA polymerase.
  • a template-dependent polymerase such as DNA polymerase I (or the Klenow fragment thereof), Taq or rTth polymerase XL incorporates a nucleotide complementary to the template strand on the 3' end of a primer which is hybridized to the template.
  • An “amplification primer” is a nucleic acid primer used for primer extension in a PCR reaction.
  • a “region” of a nucleic acid refers to the general area surrounding a structural feature of the nucleic acid, such as the termini of the molecule, an incorporated residue, or a specific subsequence.
  • a “restriction endonuclease cleavage site” denotes the site at which a known endonuclease cleaves DNA under defined environmental conditions.
  • a “restriction endonuclease recognition site” denotes the DNA site which is recognized by the endonuclease which brings about the cleavage reaction. The recognition site is distinct from the cleavage site for some enzymes, such as Hph I.
  • a “set” of restriction enzyme digests refers to a parallel series of digests performed on a single template nucleic acid (generally a large DNA such as a YAC, BAC, PAC or cosmid).
  • a “set” of PCR reactions refers to a series of parallel reactions where similar manipulations are performed on all of the members of the set. For example, a set of restriction enzyme digests of a large DNA is optionally treated by ligation of similar components and PCR amplification for sequencing as described, supra.
  • Electrophoresis of a typical dideoxy sequencing reaction has an upper limit of resolution of about 200 to 600 nucleotides, depending on the precise apparatus which is used.
  • it is not possible to sequence large nucleic acids such as plasmids, cosmid clones, yeast artificial plasmid clones, recombinant lambda phage or other recombinant nucleic acids in a single sequencing reaction.
  • small fragments of the selected nucleic acid are sequenced individually, and the sequences are compiled to produce the overall sequence of the large nucleic acid.
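  • The consequence of the read-length limit can be made concrete with a short calculation of how many individual reads are needed to tile a large nucleic acid. The target length, read length and overlap used below are illustrative assumptions rather than figures from this document, and the function name is hypothetical.

```python
import math

def reads_needed(target_len, read_len, min_overlap):
    """Minimum number of reads of length read_len that tile target_len when
    consecutive reads share at least min_overlap nucleotides."""
    if read_len <= min_overlap:
        raise ValueError("read length must exceed the required overlap")
    if target_len <= read_len:
        return 1
    step = read_len - min_overlap  # new sequence contributed by each extra read
    return 1 + math.ceil((target_len - read_len) / step)

# Assumed example: a 40,000 nt insert, 400 nt reads, 50 nt shared between neighbours.
print(reads_needed(40_000, 400, 50))
```

  • This is a lower bound for one strand with ideally placed reads; randomly placed reads, as in shotgun approaches, require considerably more.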
  • PCR sequencing methods see, Rosenthal and Jones (1990) Nucleic Acids Research 18(10): 3095-3096 and Riley et al. (1990) Nucleic
  • Prior art methods typically require purification steps to isolate PCR products for sequencing, and/or subcloning of the PCR products for sequencing, and are performed using a laborious chromosome walking method in which PCR products are sequenced in a linear fashion. For example, in the Rosenthal and
  • each PCR amplification and sequencing reaction provides the basis for the selection of primers to a second amplification and sequencing event for a contiguous nucleic acid.
  • the present invention provides a parallel series of PCR templates for simultaneous sequencing, and no purification steps are required to isolate intermediate PCR products.
  • the methods proceed by selecting a large nucleic acid to be sequenced, digesting copies of the nucleic acid in parallel with a multiplicity of restriction enzymes in separate reactions, ligating a tag onto the digested fragments, performing anchored PCR on the fragments, and sequencing the resulting amplified fragments in parallel from one of the terminal regions of each of the nucleic acids.
  • the methods of the invention thus provide several significant advantages over the prior art.
  • Providing Large Nucleic Acid Templates
  • The selection of the nucleic acid to be sequenced depends upon the construct in hand by the sequencer. Many methods of making recombinant RNA and DNA nucleic acids, including recombinant plasmids, recombinant lambda phage, cosmids, yeast artificial chromosomes (YACs), P1 artificial chromosomes,
  • BACs Bacterial Artificial Chromosomes
  • YACs, BACs, PACs and MACs as artificial chromosomes
  • Examples of appropriate cloning techniques for making large nucleic acids, and instructions sufficient to direct persons of skill through many cloning exercises are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY (Sambrook); and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current
  • nucleic acids sequenced by this invention are isolated from biological sources or synthesized in vitro.
  • the nucleic acids of the invention are present in transformed or transfected whole cells, in transformed or transfected cell lysates, or in a partially purified or substantially pure form.
  • RNA polymerase mediated techniques e.g., NASBA
  • PCR polymerase chain reaction
  • LCR ligase chain reaction
  • NASBA RNA polymerase mediated techniques
  • RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, Ausubel, Sambrook and Berger, all supra.
  • a large nucleic acid to be sequenced is typically a large DNA molecule derived from a plasmid, cosmid, phage, YAC or the like. Multiple copies of the DNA are grown, typically in cell culture in bacteria (typically E. coli) or eukaryotic cell culture (typically yeast, insect cells, animal cells or the like). Methods of cloning and amplifying DNA are described in Ausubel, Sambrook and Berger, all supra. Illustrative cells for the production of DNAs include bacteria, and eukaryotic cells of fungal, plant, insect or vertebrate (e.g., mammalian) origin.
  • Transducing such cells with DNAs is accomplished by various known means. These include calcium phosphate precipitation, fusion of the recipient cells with bacterial or yeast protoplasts containing the DNA, treatment of the recipient cells with liposomes containing the DNA, DEAE dextran, receptor-mediated endocytosis, electroporation, micro-injection of the DNA directly into the cells, incubating viral vectors containing target nucleic acids which encode polypeptides of interest with cells within the host range of the vector, calcium phosphate transfection, and many other techniques known to those of skill. See, e.g., Methods in Enzymology, vol. 185, Academic Press, Inc., San Diego, CA (D.V. Goeddel, ed.) (1990) or M.
  • the DNA is purified and aliquotted into separate containers with selected restriction endonucleases and appropriate buffers for digestion.
  • restriction endonucleases are known, well characterized, and commercially available.
  • the restriction digest is then stopped by methods known in the art, such as the addition of EDTA, SDS or the application of heat, alcohol precipitation of the DNA, phenol-chloroform extraction to remove the restriction endonuclease, or a combination thereof. For example, simply heating an enzyme digest to 68°C for 5-10 minutes is sufficient to inactivate many restriction enzymes.
  • a sample from each of the aliquots is electrophoresed, or otherwise analyzed to test whether the restriction digestion worked properly.
  • DNA from each aliquot is optionally purified, e.g., by precipitation, column chromatography or the like.
  • the digested DNA is ordinarily made blunt (i.e., any restriction digest overhang is removed). Some restriction digests leave a blunt end, while other ends are made blunt using a DNA polymerase.
  • the DNA is preferably phosphorylated. Some restriction enzymes leave an appropriate phosphoryl group; digested DNA from those which do not are treated with a kinase enzyme such as T4 polynucleotide kinase.
  • primer binding sites with overhangs can facilitate attachment to DNA fragments (i.e., where the DNA has a complementary overhang).
  • Primer binding sites such as vectorettes are attached to the digested aliquots of DNA (or a sample thereof). Vectorettes are described, e.g., in Riley et al. (1990) Nucleic Acids Res. 18: 2887-2890.
  • the primer binding sites are usually attached to the DNA using a DNA ligase in a DNA ligation reaction.
  • a preferred DNA ligase is T4 DNA ligase, but many other ligases are known, appropriate and commercially available.
  • the primer binding site can be chemically coupled to the digested DNA in a condensation reaction.
  • oligonucleotides used in the invention such as sequencing primers, primer binding sites, vectorettes and the like can be made recombinantly, but more typically they are made chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981), Tetrahedron Letts., 22(20): 1859-1862, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res., 12:6159-6168. Oligonucleotides can also be custom made and ordered from a variety of commercial sources known to persons of skill.
  • Purification of oligonucleotides, where necessary, is typically performed by either native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson and Regnier (1983) J. Chrom. 255:137-149.
  • the sequence of the synthetic oligonucleotides can be verified using the chemical degradation method of Maxam and Gilbert (1980) in Grossman and Moldave (eds.) Academic Press,
  • nucleic acids are made as single stranded nucleic acid (typically DNA). Where the primer binding site is double stranded, or partially double stranded, two single stranded nucleic acids are synthesized and hybridized. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and
  • nucleic acids are placed together in solution, heated above the melting temperature of the nucleic acids, and slowly cooled to a temperature below the melting temperature of the nucleic acids.
  • Preferred double stranded primer binding sites comprise non-complementary sequences.
  • PCR primers are made in which the 3' portion of the primer is complementary to a nucleic acid which is complementary to one of the non-complementary regions of the primer binding site.
  • a preferred primer binding site is a vectorette, in which the central portion of the primer binding site is non-complementary. The primer binding sites are ordinarily not phosphorylated for ligation to the digested DNA.
  • the primer binding sites are optionally phosphorylated. Where the primer binding site is not phosphorylated, the digested DNA is usually phosphorylated to permit ligation. Most DNA ligases require a site of phosphorylation to couple nucleic acids.
  • thermostable polymerase enzymes which have superior abilities to amplify large nucleic acids are preferred (see, Cheng et al., supra).
  • Example enzymes include rTth DNA polymerase XL (a mixture of rTth DNA polymerase from Thermus thermophilus and Vent DNA polymerase from Thermococcus litoralis, available from Perkin Elmer, Foster City, CA).
  • each set of PCR reactions can be used to amplify sets of overlapping fragments which extend up to at least about 20 kb from an internal primer.
  • primer extension reaction is carried out according to known techniques, e.g., as specified by the supplier of the selected polymerase.
  • the internal primer to prime the first strand of synthesis is selected based upon hybridization to a known sequence in the large DNA.
  • This known sequence optionally comes from vector sequence surrounding a clone in a cloning vector, such as a plasmid, cosmid, phage or YAC.
  • a portion of a DNA of interest can be sequenced using standard techniques, or more preferably using the techniques described herein, thereby providing sequence information for making the internal primer.
  • ordered sets of nested fragments are made and the sequence information is used to generate a second contiguous ordered set of nested fragments, and so on, until the entire sequence of interest is determined.
  • selecting optimal amplification primers is typically done using computer assisted consideration of available sequence and excluding potential primers which do not have desired hybridization characteristics, and/or including potential primers which meet selected hybridization characteristics. This is done by determining all possible nucleic acid primers, or a subset of all possible primers with selected hybridization properties (e.g., those with a selected length and G:C ratio) based upon the known sequence.
  • the selection of the hybridization properties of the primer is dependent on the desired hybridization and discrimination properties of the primer. In general, the longer the primer, the higher the melting temperature.
  • amplification primers are between 8 and 100 nucleotides in length, and preferably between about 10 and 30 nucleotides in length. Most preferably, the primers are between 15 and 25 nucleotides in length. For example, in one preferred embodiment, the nucleic acid primers are about 17-30 nucleotides in length.
  • nucleotides at the 5' end of a primer can incorporate structural features unrelated to the target nucleic acid; for instance, in one preferred embodiment, a sequencing primer hybridization site (or a complement to such a primer, depending on the application) is incorporated into the amplification primer, where the sequencing primer is derived from a primer used in a standard sequencing kit, such as one using a biotinylated or dye-labeled universal M13 or SP6 primer. These structural features are referred to as constant primer regions.
  • the primers are selected so that there is no complementarity between any known target sequence and any constant primer region.
  • constant regions in primer sequences are optional.
  • all primer sequences are selected to hybridize only to a perfectly complementary DNA, with the nearest mismatch hybridization possibility from known DNA sequence typically having at least about 50 to 70% hybridization mismatches, and preferably 100% mismatches for the terminal 5 nucleotides at the 3' end of the primer.
  • the primers are selected so that no secondary structure forms within the primer.
  • Self-complementary primers have poor hybridization properties, because the complementary portions of the primers self-hybridize (i.e., form hairpin structures).
  • the primers are also selected so that the primers do not hybridize to each other, thereby preventing duplex formation of the primers in solution, and possible concatenation of the primers during PCR. If there is more than one constant region in the primer, the constant regions of the primer are selected so that they do not self-hybridize or form hairpin structures.
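The secondary-structure and primer-dimer criteria above can be screened with a short script. The following is a minimal sketch, not the method used in the source: it assumes that a run of more than four consecutive complementary bases is enough to reject a primer (the threshold `max_run` is an illustrative assumption), and the function names are hypothetical.

```python
# Sketch only: max_run and all function names are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_run(a: str, b: str) -> int:
    """Length of the longest stretch of `a` that can base-pair with `b`.

    A stretch of `a` pairs with `b` (antiparallel) exactly when it also
    occurs in reverse_complement(b), so this is a longest-common-substring
    search between `a` and reverse_complement(b).
    """
    a = a.upper()
    target = reverse_complement(b)
    best = 0
    prev = [0] * (len(target) + 1)
    for ch in a:
        cur = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            if ch == t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def passes_structure_filter(primer: str, others=(), max_run: int = 4) -> bool:
    """Reject primers that can pair with themselves or with another primer."""
    if longest_complementary_run(primer, primer) > max_run:
        return False  # hairpin or self-dimer candidate
    return all(longest_complementary_run(primer, o) <= max_run for o in others)
```

A primer containing a palindrome such as GAATTC is rejected because that stretch can pair with a second copy of the same primer, whereas a homopolymer such as AAAAAAAA passes this particular filter (it would be excluded by the base-composition criteria instead).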
  • sets of amplification primers are of a single length
  • the primers are optionally selected so that they have roughly the same, and, in some embodiments, exactly the same overall base composition (i.e., the same A+T to G+C ratio of nucleic acids).
  • the A+T to G+C ratio is determined by selecting a thermal melting temperature for the primer-DNA hybridization, and selecting an A+T to G+C ratio and probe length for each primer which has approximately the selected thermal melting temperature.
  • selection steps are performed using simple computer programs to perform the selection as outlined above; however, all of the steps are optionally performed manually.
  • One available computer program for primer selection is the Mac Vector program from Kodak. In addition to programs for primer selection, one of skill can easily design simple programs for any of the preferred selection steps.
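As an illustration of the kind of simple program referred to above, the sketch below enumerates every primer of a chosen length from a known sequence and keeps those whose G+C fraction and estimated melting temperature fall in a selected window. It is not the MacVector algorithm. The melting temperature uses the rough Wallace rule (2 °C per A or T, 4 °C per G or C), which is an assumption introduced here, and the default length and windows are illustrative values only.

```python
# Sketch only: the Wallace rule and all default thresholds are assumptions.
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def candidate_primers(known: str, length: int = 20,
                      gc_range=(0.4, 0.6), tm_range=(56, 64)):
    """Return (start, primer) pairs from `known` that meet the criteria."""
    known = known.upper()
    hits = []
    for start in range(len(known) - length + 1):
        primer = known[start:start + length]
        if set(primer) - set("ACGT"):
            continue  # skip windows containing ambiguous bases
        if not gc_range[0] <= gc_fraction(primer) <= gc_range[1]:
            continue
        if not tm_range[0] <= wallace_tm(primer) <= tm_range[1]:
            continue
        hits.append((start, primer))
    return hits
```

Because all candidates share one length, fixing the G+C window also fixes the estimated melting temperature window, which mirrors the selection of an A+T to G+C ratio and primer length for a chosen melting temperature.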
  • the Second Cycle of PCR: Second Strand Synthesis
  • the second cycle of PCR is performed using a primer (the "first amplification primer," or "second strand primer") which hybridizes to a region of the first strand synthesized from the internal primer as discussed above.
  • the amplification primer hybridizes to the first strand in the region which was synthesized in the first strand complementary to the primer binding site.
  • the amplification primer hybridizes to the region of the first strand which was synthesized to be complementary to one strand of the primer binding site.
  • this region of the first strand be complementary to a portion of the primer binding site which was itself not complementary to the opposing strand of the primer binding site, such as the central portion of a vectorette (See, Figure 2 and Figure 5).
  • the sequence of the first strand in this region is created by the first strand synthesis; the double- stranded template for the first round of PCR does not contain an equivalent sequence.
  • the first amplification primer can only hybridize to the first strand, resulting in PCR with low background. This eliminates the need for purification of the PCR product, although purification is optionally performed.
  • if the PCR product is purified, it is typically purified using simple column chromatography to remove any excess primers, or by gel purification.
  • the region of a PCR primer which should be complementary to the template is the 3' end of the primer.
  • the 5' end optionally has sequences engineered to facilitate sequencing or cloning.
  • the primer optionally incorporates a sequence corresponding to widely used universal M13, M13 reverse, T7, T3 or SP6 sequencing primers.
  • a preferred SP6 primer is ATTTAGGTGACACTATAG.
  • a preferred T7 primer is
  • a preferred T3 primer is ATTAACCCTCACTAAAGGGA.
  • a preferred M13 reverse primer is CAGGAAACAGCTATGACC.
  • the first amplification primer is phosphorylated, e.g., using T4 polynucleotide kinase. This permits selective digestion of the second strand using an exonuclease, leaving a single-stranded template for sequencing.
  • the first and second strands are exponentially amplified by PCR to yield the final PCR product for each of the parallel series of parallel reactions (referred to as a "set" of PCR reactions).
  • An aliquot of each of the reactions can be analyzed for purity. If unwanted PCR products occur, the final product can be purified by gel-purification or column chromatography procedures. Although it is not preferred or required, the PCR products can be subcloned for sequencing, or other procedures.
  • the final PCR product is directly sequenced using a wide variety of sequencing protocols. It is expected that one of skill is thoroughly familiar with basic sequencing protocols, including those using the Klenow fragment of DNA polymerase, taq polymerase, Sequenase™, and the like. See, Sambrook, Ausubel, Innis, and Berger, supra.
  • one strand of the final PCR product is selectively digested to produce a single-stranded template for sequencing.
  • single-stranded templates often produce more readable sequence information.
  • sequences are aligned based upon overlapping sequence information. Preferably, this is performed using computer assisted compilation of the sequences, but the sequences are optionally aligned manually.
  • the information is used to select a new internal primer which hybridizes to the region. The entire procedure for generating overlapping or "nested" fragments is then repeated using the new internal primer.
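The compilation of overlapping sequences described above can be sketched as a suffix-prefix merge. This is a simplified illustration rather than the compilation software contemplated in the source: it assumes the reads are already ordered along the template, that overlaps are exact (real reads contain base-calling errors), and that a minimum overlap of 20 bases is required. The function names and the threshold are hypothetical.

```python
# Sketch only: exact-match overlaps and pre-ordered reads are assumptions.
def overlap_length(left: str, right: str, min_overlap: int = 20) -> int:
    """Longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_ordered_reads(reads, min_overlap: int = 20) -> str:
    """Merge reads, given in template order, into one contiguous sequence."""
    contig = reads[0]
    for read in reads[1:]:
        k = overlap_length(contig, read, min_overlap)
        if k == 0:
            raise ValueError("no sufficient overlap; a new internal primer "
                             "would be needed to bridge this gap")
        contig += read[k:]
    return contig
```

For example, `merge_ordered_reads(["ACGTACGGA", "CGGATTTC", "TTTCAAGG"], min_overlap=4)` joins the three toy reads through their 4-base overlaps. The 3' end of the resulting contig is where a new internal primer would be chosen for the next round.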
  • Kits are constructed based upon the methods and sets of PCR reactions described herein.
  • the kits will ordinarily contain instructions in the methods of the invention, reagents for performing the PCR reactions, a container such as a box containing the instructions and vials of reagents.
  • the kits optionally include sequencing primers, computer software for primer selection and sequence compilation, sequencing equipment such as gel readers or gel boxes, PCR enzymes, sequencing enzymes, buffers for performing the PCR or sequencing reactions, primer binding sites such as vectorettes (hybridized or unhybridized), positive control large DNAs such as a cosmid, plasmid, phage or YAC, an exonuclease, a kinase enzyme, or other items for performing the methods described herein.
  • In shotgun sequencing methods, a given template is cleaved into many smaller fragments and subcloned, and the clones are randomly isolated and sequenced. The sequences from the clones are then assembled, and overlapping regions compiled. See also, Sambrook, Innis, Ausubel, Berger, Watson, and Lewin, supra for a discussion of shotgunning DNA. As the size of a template increases, the work required to generate a full length clone increases geometrically, because of the number of random clones that must be sequenced to ensure that all regions of the template are represented. In some shotgun methods, some improvement is achieved by directly sequencing the DNA fragments, e.g., using PCR mediated methods. However, shotgun sequencing methods are typically still labor intensive.
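The cost of random sampling can be made concrete with a deliberately simplified model, introduced here as an assumption: the template consists of a fixed number of equally likely, non-overlapping fragments, and each sequenced clone is an independent random draw. Under that model, the chance that the next clone is new and the expected number of clones needed to see every fragment follow from elementary probability (the second is the classic coupon-collector sum). The function names are hypothetical.

```python
# Sketch only: equally likely, independent fragments are a modelling assumption.
from fractions import Fraction


def chance_next_clone_is_new(total: int, already_seen: int) -> Fraction:
    """Probability that one more random clone covers an unseen fragment."""
    return Fraction(total - already_seen, total)


def expected_clones_to_cover_all(total: int) -> Fraction:
    """Expected number of random clones needed to see every fragment.

    Coupon-collector result: total * (1 + 1/2 + ... + 1/total).
    """
    return sum(Fraction(total, k) for k in range(1, total + 1))
```

With 10 fragments and 9 already sequenced, `chance_next_clone_is_new(10, 9)` is 1/10, and `float(expected_clones_to_cover_all(10))` is roughly 29.3 clones to cover all 10 fragments. Most of that effort is spent finding the last few fragments, which is the step a targeted method avoids.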
  • the present invention is used in conjunction with shotgun sequencing methods to dramatically improve the speed of sequencing large projects.
  • shotgun cloning and sequencing is first performed.
  • a minority of the total template is sequenced by shotgunning methods.
  • Sequence specific primers are generated for each sequence generated by the shotgun methods, using the methods described for primer selection, supra. See, Figure 4.
  • the sequence specific primers are used to perform long distance sequencing as described, supra (see, e.g., Figures 1, 2, and 5), for each sequence specific primer.
  • the large template is cleaved with restriction enzymes to generate a set of overlapping fragments
  • vectorettes comprising a primer binding site for a sequencing primer are ligated as appropriate
  • long distance PCR is performed and the resulting fragments are sequenced with chain termination methods using the sequencing primer.
  • the sequence generated by the long distance sequencing method can be extended by generating new sequence specific primers from the sequence generated by the first set of long distance sequencing reactions, and repeating the long distance sequencing method using these sequence specific primers. All of the sequences are then aligned and compiled to generate a single contiguous sequence for the large template.
  • standard shotgun sequencing is first performed to determine most of the sequence of a given large template (e.g., a majority of the sequence, up to about 90% of the sequence). Generating the complete sequence at this point by shotgun sequencing requires a very large investment of effort, because a large number of random clones or fragments needs to be sequenced to find the missing information. For instance, in a conceptually simple system with just 10 random fragments, with 9 of the 10 already sequenced, the chance that sequencing an additional random clone will provide the missing information is just 10%. Instead of generating the missing sequence by sequencing random fragments, a long distance sequencing protocol as described, supra, is used to fill in the missing information. This can dramatically reduce the number of sequencing reactions that need to be performed to generate complete sequence information for a large template. See, Figure 4.
EXAMPLES
  • nested DNA fragments around region of interest were amplified by anchored PCR using a vectorette unit and the fragments were directly sequenced.
  • the procedure is schematically presented in Figures 1 and 2.
  • Amplification primers were designed in the sequenced regions of the cosmid using the Mac Vector program (Kodak).
  • V-bottom are complementary to each other except for the middle one-third of the annealed double-stranded vectorette, imparting a bubble like structure to the vectorette (see also, Figures 1 and 2).
  • the M13 sequence-tagged 224 primer (224M13: 5'-TGTAAAACGACGGCCAGTCGAATCGTAACCGTTCGTACGAGAATCGCT-3') was phosphorylated in a 50 µl volume containing 1x kinase buffer [70 mM Tris-HCl (pH 7.6), 10 mM MgCl2, 5 mM dithiothreitol, 1 mM ATP], 30 µM 224M13 and 50 U T4 polynucleotide kinase. The reaction was incubated at 37°C for 30 min, heated to 68°C for 10 min, then stored at -20°C.
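The structure of a tailed amplification primer such as 224M13 can be checked with a few lines of code. In the sketch below the 224M13 sequence is taken from the text. The 18-nucleotide tail is assumed to be the widely used -21 M13 universal sequencing primer, and the remaining 30 nucleotides are assumed to be the vectorette-specific portion. Both assignments are inferences from the primer name rather than statements in the source, and the helper function is hypothetical.

```python
# Sketch only: the tail/3'-region assignment is an inference, not from the source.
PRIMER_224M13 = "TGTAAAACGACGGCCAGTCGAATCGTAACCGTTCGTACGAGAATCGCT"
M13_UNIVERSAL = "TGTAAAACGACGGCCAGT"  # assumed 5' tail (-21 M13 universal primer)


def split_tailed_primer(primer: str, tail: str):
    """Split a primer into its 5' tail and its template-specific 3' region."""
    if not primer.startswith(tail):
        raise ValueError("primer does not begin with the expected tail")
    return tail, primer[len(tail):]


tail, specific = split_tailed_primer(PRIMER_224M13, M13_UNIVERSAL)
print(len(PRIMER_224M13), len(tail), len(specific))
print(specific)
```

Only the 3' region needs to anneal to the first-strand product. The tail is copied into the PCR product and later serves as the binding site for the dye-labeled sequencing primer.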
  • Cosmid DNA was extracted from a 1.5 ml overnight culture by an alkaline mini-prep method (See, Sambrook) into 50 µl of distilled water. Two microliters of this cosmid DNA solution were enzymatically digested using Alu I, BsaA I, BstU I, Pal I, Rsa I, Acc I, Afl III, BstY I, Hinc II, Msl I, Tsp45 I, EcoR V, Hpa I, Pvu II, Sca I, Sma I, Ssp I or Stu I (Stratagene and New England Biolabs) in a 20 µl volume including 10 U of enzyme and 1x buffer as recommended by the manufacturers.
  • the vectorette unit was ligated to each restriction fragment in a 40 µl reaction volume containing 1x T4 ligase buffer [50 mM Tris-HCl (pH 7.8), 10 mM MgCl2, 10 mM dithiothreitol, 1 mM ATP, 25 µg/ml bovine serum albumin],
  • PCR reactions were performed in 100 µl volumes containing 1x XL buffer II (Perkin-Elmer), 1.1 mM Mg(OAc)2, 200 nM dNTPs, 2.5 µl from the vectorette unit-ligated restriction fragment solution, 1 µl of 30 µM phosphorylated 224M13, 1 µl of 30 µM amplification primer and 2 U of rTth XL DNA polymerase.
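The final primer concentration in each reaction follows from the stock concentration and volumes given above. The helper below is a minimal sketch of that dilution arithmetic (final concentration = stock concentration × stock volume / total volume). The function name is hypothetical, and the calculation assumes the stated 100 µl is the final reaction volume.

```python
# Sketch only: assumes the stated 100 µl is the final reaction volume.
def final_concentration(stock_conc: float, stock_volume: float,
                        total_volume: float) -> float:
    """Concentration after dilution, in the same units as `stock_conc`."""
    return stock_conc * stock_volume / total_volume


# 1 µl of 30 µM primer in a 100 µl reaction
primer_uM = final_concentration(30.0, 1.0, 100.0)
# 30 µM x 1 µl corresponds to 30 pmol of primer per reaction
primer_pmol = 30.0 * 1.0
print(primer_uM, primer_pmol)
```

Each primer is therefore present at 0.3 µM (30 pmol per reaction).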
  • Reactions were heated to 94 °C for 1 min, then PCRed for a total of 40 cycles at 94°C for 30 sec, 55°C for 30 sec, 68°C for 4 min; the last 24 cycles used a 15 sec extension per cycle using a thermal cycler (Perkin-Elmer).
  • Amplified PCR fragments were purified in 50 µl of distilled water using the Wizard PCR prep kit (Promega). Purified PCR fragments, 25 µl, were subjected to λ exonuclease digestion using the "PCR template prep for ssDNA sequencing" kit (Pharmacia) to yield single stranded DNAs in 25 µl of TE (10 mM Tris-HCl pH 7.6, 1 mM EDTA). Seven microliters of single stranded DNA solutions were subjected to direct fluorescent sequencing using the Taq dye-primer cycle sequencing kit and the 373A DNA sequencer (Perkin-Elmer).
  • sequence of 14 kb out of a 16 kb region of interest was determined from a single small scale cosmid DNA preparation.
  • a region of 2 kb with few restriction sites for the initial set of enzymes required the use of additional restriction enzymes to generate appropriate fragments.
  • this strategy is suitable to determine nucleotide sequence around a region of known sequence. For example, this method was successfully applied to determine the genomic structure of the transforming growth factor β type II receptor gene using YAC and cosmid clones.
  • a simplified protocol was used, omitting phosphorylation of the 224M13 primer, lambda exonuclease treatment, phenol/chloroform extraction, ethanol precipitation, spin column treatment and other optional procedures.
  • This simplified treatment reduced the time needed for preparing PCR samples by up to three hours.
  • DNA from a selected source e.g., plasmid, YAC, PAC, BAC, or cosmid
  • a multiplicity of enzymes as described, supra, in a 10 µl reaction volume for one hour.
  • the reactions were stopped by heat inactivating the restriction enzymes by incubating the reaction mixtures for 10 minutes at 68 °C.
  • the reactions were then cooled to room temperature.
  • restriction enzymes giving cohesive ends were used, the ends were blunted by Klenow treatment (1 µl of 10 mM dNTPs and 0.5 units of Klenow were added at room temperature, and incubated for 5 minutes).
  • the Klenow enzyme was heat inactivated at 68 °C for 10 minutes, followed by cooling to room temperature.
  • 1 µl of 4 nM vectorette, 3 µl of T4 DNA ligase buffer containing 10 mM ATP, 26 µl distilled water and 1 Weiss unit of T4 DNA ligase were added.
  • the mixture was incubated for 1 hour at room temperature. 210 µl of distilled water was then added, and the mixture stored until use.
  • For the PCR reactions, 224M13 (unphosphorylated) was used with the sequence specific primer and 2.5 µl solution from the stored mixture. After confirming the amplified fragments by agarose gel electrophoresis, the PCR fragments were purified using Wizard PCR preparation kits from Promega, or similar standard methods. The purified DNA was eluted in 50 µl of distilled water. Sequencing reactions were performed using ABI's -21M13 dye primer kit, or a comparable kit (e.g., a dye sequencing kit available from Amersham).

Abstract

Methods for rapidly sequencing large nucleic acids by making overlapping fragments and anchored PCR amplification are provided. In the methods, an overlapping set of double stranded nucleic acids is provided, primer binding sites are ligated to the double stranded nucleic acids, and anchored PCR is performed on the resulting primer binding site tagged nucleic acids. PCR amplified fragments are sequenced by chain termination methods. These long distance sequencing methods are used independently, or in conjunction with other sequencing methods, such as shotgun sequencing techniques. Sets of PCR reactions for performing rapid sequencing and kits for practicing the methods are also provided.

Description

METHODS FOR SEQUENCING LARGE NUCLEIC ACID SEGMENTS AND COMPOSITIONS FOR LARGE SEQUENCING PROJECTS
FIELD OF THE INVENTION
This invention relates to the field of nucleic acid amplification and sequencing and kits for amplification and sequencing.
CROSS REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of United States provisional patent application USSN 60/017,569 filed May 15, 1996 by Hagiwara et al., which is incorporated herein by reference in its entirety for all purposes.
BACKGROUND OF THE INVENTION
Efficient DNA sequencing technology is central to the development of the biotechnology industry and basic biological research. Improvements in the efficiency and speed of DNA sequencing are needed to keep pace with the demands for DNA sequence information. The Human Genome Project, for example, has set a goal for dramatically increasing the efficiency, cost-effectiveness and throughput of DNA sequencing techniques. See, Collins, and Galas (1993) Science 262:43-46.
Most DNA sequencing today is carried out by chain termination methods of DNA sequencing. The most popular chain termination methods of DNA sequencing are variants of the dideoxynucleotide mediated chain termination method of Sanger. See, Sanger et al. (1977) Proc. Nat. Acad. Sci., USA 74:5463-5467. For a simple introduction to dideoxy sequencing, see, Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (Supplement 37, current through 1997) (Ausubel), Chapter 7. Thousands of laboratories employ dideoxynucleotide chain termination techniques. Commercial kits containing the reagents most typically used for these methods of DNA sequencing are available and widely used. DNA sequencing typically involves two steps: i) making suitable templates for all the regions to be sequenced; and ii) running sequencing reactions for electrophoresis. The latter step can be automated by use of workstations and autosequencers. The first step requires careful experimental design and laborious DNA manipulation such as the construction of nested deletion mutants. See, Griffin, H.G. and Griffin, A.M. (1993) DNA sequencing protocols, Humana Press, New Jersey. Making templates is often the limiting step in large sequencing projects.
For example, in "shot-gun" sequencing methods, randomly selected subclones, which may or may not have overlapping sequence information, are randomly sequenced. The sequences of the subclones are then compiled to produce an ordered sequence. These procedures eliminate complicated DNA manipulations; however, the method is inherently inefficient because many recombinant clones must be sequenced due to the random nature of the procedure.
This invention provides improvements in nucleic acid sequencing technology, particularly for sequencing large DNAs. In particular, the invention provides a way of generating overlapping nucleic acids from a large clone to facilitate sequencing, and powerful methods of amplifying and tagging the overlapping nucleic acids into suitable sequencing templates. The methods can be used in conjunction with shotgun sequencing techniques to dramatically improve the efficiency of shotgun methods.
SUMMARY OF THE INVENTION
The invention provides new methods of sequencing large nucleic acids, kits for practicing the methods and related compositions.
In the methods of the invention, a plurality of overlapping double stranded nucleic acids are provided, for example by digesting a large recombinant DNA such as a cosmid, recombinant lambda phage genome, yeast artificial chromosome (YAC), or P1 plasmid (alternatively referred to as a P1 artificial chromosome, or "PAC") with at least one restriction endonuclease. Typically, the large DNA is aliquotted into several separate reaction mixtures and digested in parallel with a multiplicity of different enzymes. For instance, in one preferred embodiment, samples of the large DNA are digested in parallel with Alu I, Acc I, BstY I, Hinc II, Rsa I, Sca I, Msl I, Hpa I, Sma I, BstU I, Tsp45 I, Stu I, Afl III, Pvu II and EcoR V. This parallel digestion provides overlapping fragments of the large nucleic acid, preferably in overlapping increments of about 200 to 600 nucleotides, and commonly 200 to 400 nucleotides. Additional restriction endonucleases are used as necessary, depending on the particular large DNA.
One or more of the overlapping fragments are selected for further processing (any subset, or all, of the restriction digests are optionally processed in parallel). A first primer binding site is ligated to a selected first double stranded nucleic acid fragment. The first primer binding site when in double-stranded form typically has terminal regions which are complementary, surrounding a central non-complementary region. Similarly, a second primer binding site is preferably ligated to a selected second double stranded nucleic acid fragment, and a third primer binding site is typically ligated to a third selected double stranded nucleic acid fragment and so on, until each nucleic acid fragment has a ligated primer binding site. The first, second, third... [n] primer binding sites are preferably identical, but are optionally selected independently. A selected double stranded nucleic acid fragment with a ligated primer binding site is a "tagged" double stranded nucleic acid fragment. Thus, the first double stranded nucleic acid fragment with a primer binding site is a tagged first double stranded nucleic acid.
The first primer binding site, when in double-stranded form, typically has terminal regions which are complementary, surrounding a central non-complementary region. An example of a first primer binding site is a vectorette such as those which are used for identification of the terminal ends of YACs. Often, the first double stranded nucleic acid fragment has blunt ends, either due to the restriction digest, or due to enzymatic processing of the ends (e.g., with a polymerase such as T4 DNA polymerase). Blunting the ends of the nucleic acid fragments facilitates attachment of a single primer binding site to all of the overlapping fragments for parallel sequencing analysis. In this embodiment, the first and second primer binding sites are identical. However, the ends are optionally left heterogeneous for the different nucleic acid fragments, with different primer binding sites being ligated onto each fragment. In one preferred class of embodiments, the selected double stranded nucleic acid is phosphorylated (either as a result of the endonuclease digestion, or by treating the nucleic acid with a kinase enzyme), while the primer binding site is not phosphorylated. This causes each primer binding site to be covalently ligated to only one strand of the double-stranded nucleic acid, and prevents the formation of primer binding site concatamers. The ligation is ordinarily performed with a ligase, but is optionally performed chemically.
The tagged fragments are then amplified by anchored PCR. In the anchored PCR procedure, a portion of the tagged nucleic acid is amplified using an internal primer complementary to the selected double stranded nucleic acid (e.g., the first double stranded nucleic acid), thereby providing an amplified first tagged nucleic acid subsequence. The sequence of the internal primer is typically derived from examining known sequence present in the large DNA, such as vector sequence, or other available sequence information, and constructing a complementary nucleic acid to the known sequence.
Typically, the PCR reaction is performed by denaturing the first double-stranded tagged nucleic acid, and hybridizing the internal primer to a complementary amplification primer binding subsequence of the selected double stranded nucleic acid, wherein the internal sequence is 3' to the first primer binding site, and performing a primer extension of the strand complementary to the internal primer binding subsequence by PCR, with the first internal primer to prime DNA synthesis, thereby forming a first PCR amplification product. A first amplification primer is hybridized to the first PCR amplification product in a region of the first amplification product which is complementary to the first primer binding site, and primer extension of the strand complementary to the first amplification product is performed by PCR, using the first amplification primer to prime DNA synthesis, thereby producing a double stranded amplified first nucleic acid subsequence. The first amplification primer is complementary to the first amplification product, in whole or in part. Ordinarily, at least the 3' end of the amplification primer is complementary to the first primer binding site. Usually, about 8 to about 30 nucleotides at the 3' end of the amplification primer are complementary. An example of such a primer is the 224M13 primer described herein.
The PCR amplification strategy of the invention has a number of advantages over standard methods of performing PCR. For instance, because the first primer binding site comprises a non-complementary subsequence, the first amplification primer is designed so that it can only hybridize to the complement of one of the strands of the primer binding site, which is not generated until the first PCR amplification product is formed. This reduces or eliminates the formation of unwanted PCR products, thereby eliminating the need for purification and subcloning of the PCR products for subsequent sequencing.
The PCR products are then sequenced by performing a dideoxy chain termination reaction with the amplified first tagged nucleic acid subsequence as a template, using a first sequencing primer complementary to one strand of the amplified first tagged nucleic acid subsequence to prime DNA synthesis. In one class of embodiments, the first sequencing primer hybridizes to a terminal region of the amplified first tagged nucleic acid subsequence which comprises a nucleotide sequence complementary to one strand of the first primer binding site. Alternatively, the sequencing primer can hybridize to a terminal region of the amplified first tagged nucleic acid subsequence which does not comprise a nucleotide sequence complementary to one strand of the first primer binding site.
For example, the first amplification primer preferably includes a 5' tail which does not hybridize to the first amplification product. This 5' tail often is complementary to a widely available sequencing primer such as the universal M13, M13 reverse, T7, T3 and SP6 sequencing primers. In one preferred embodiment, the PCR products are digested with an exonuclease to produce a single stranded template for the dideoxy sequencing reaction. Although double stranded templates are acceptable for sequencing reactions, single stranded templates can produce more readable sequence information. The above long distance sequencing methods are typically repeated on each of the double stranded nucleic acids generated from the large DNA, thereby providing sequence from the terminal region corresponding to the first amplification primer of each amplified fragment. The sequences are overlapping, and are compiled to produce the sequence of the large DNA, or a subsequence thereof.
The present invention also provides kits for sequencing large nucleic acids. The kits typically comprise a container and instruction in the use of the kit for performing sequencing of large nucleic acids by the methods described herein. Ordinarily, the kits include components such as nucleic acids to make a primer binding site, and a primer which hybridizes to the complement of one strand of the primer binding site, wherein the primer binding site comprises a double-stranded vectorette comprising a central non-complementary nucleic acid subsequence. The kits optionally include reagents and materials for performing the amplification and dideoxy chain termination steps of the invention such as taq polymerase, rTth DNA polymerase XL, Taq plus DNA polymerase, DNA ligase, one or more restriction endonucleases, computer software for compiling overlapping sequence information, software for designing PCR primers, DNA kinase, lambda exonuclease, PCR reagents (nucleotides, labels, salts etc.), DNA polymerase, Sequenase™, subcloning plasmids, and an M13 sequencing primer.
The invention also provides a set of PCR reaction mixtures, corresponding to the PCR reaction mixtures made in the process of sequencing a large DNA as described above. The reaction mixtures typically comprise an overlapping series of DNAs, wherein each reaction mixture in the set of reaction mixtures comprises a template DNA tagged with a vectorette, a first primer which hybridizes to an internal subsequence of the template DNA, and a second primer which hybridizes to the vectorette. The PCR reaction mixtures also typically include appropriate PCR reagents such as rTth polymerase XL, nucleotides, and Mg++. As one of skill will understand, the PCR reaction mixtures optionally comprise any of the compositions used in performing the PCR steps of the methods of the invention, such as template DNA tagged with a vectorette at each end, thereby providing an internal template DNA subsequence, flanked by external vectorette sequences. In one embodiment, where the vectorettes are unphosphorylated, each of the vectorettes is covalently attached to only one strand of the internal DNA subsequence. Vectorettes for the different PCR reactions in the set of PCR reactions ordinarily have the same sequences. For example, in one particular embodiment, the vectorette comprises a subsequence which is complementary to a nucleic acid complementary to primer 224M13.
BRIEF DESCRIPTION OF THE DRAWING
Figure 1 is a schematic representation of a preferred method of providing overlapping templates for sequencing.
Figure 2 is a schematic representation of a preferred method of sequencing a nucleic acid.
Figure 3, panels A and B, shows a typical set of amplified fragments made by anchored PCR. One-twentieth of each PCR amplification reaction was electrophoresed on a 1% agarose gel. Reactions are aligned according to fragment size.
Figure 4 is a schematic of combination sequencing by shotgun and long distance sequencing methods.
Figure 5 is a schematic of a simplified amplification/sequencing protocol.
DEFINITIONS
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton et al. (1994) Dictionary of Microbiology and Molecular Biology, second edition, John Wiley and Sons (New York); Walker (ed) (1988) The Cambridge Dictionary of Science and Technology, The press syndicate of the University of Cambridge, NY; and Hale and Marham (1991) The Harper Collins Dictionary of Biology Harper Perennial, NY provide one of skill with a general dictionary of many of the terms used in this invention. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, certain preferred methods and materials are described. For purposes of the present invention, the following terms are defined below.
Two nucleic acids are "overlapping" when the nucleic acids have a region of common sequence. For example, different restriction fragments of a large nucleic acid will have regions in common derived from the large nucleic acid. See, e.g., Figure 1.
A "primer binding site" is a region of a nucleic acid which specifically hybridizes to a primer by hybridization of complementary base pairs.
The term "nucleic acid" refers to a deoxyribonucleotide (DNA) or ribonucleotide (RNA) polymer in either single- or double-stranded form, and unless otherwise limited, encompasses known analogues of natural nucleotides that hybridize to nucleic acids in a manner similar to naturally occurring nucleotides.
Unless otherwise indicated, a particular nucleic acid sequence optionally includes the complementary sequence thereof.
Regions of nucleic acids are "complementary" when they can hybridize through standard nucleic acid base pairing (A-T, A-U, or C-G). A region is non-complementary over specified regions when complementary base pairs do not align between the regions. For example, the "vectorette" units described herein have a central non-complementary region, resulting in a bubble structure where the complementary ends of the vectorette hybridize, and the central non-complementary region does not.
Two nucleic acids are "ligated" together when one or more covalent bonds are formed between the nucleic acids. Ordinarily, nucleic acids are ligated enzymatically, i.e., using a ligase enzyme; however, they are optionally ligated using chemical reagents.
A nucleic acid "tag" is a short nucleic acid of known sequence which is ligated to one or more nucleic acids. A nucleic acid with a ligated tag is a "tagged nucleic acid."
An "internal primer" is a primer (a single-stranded nucleic acid which is typically between 8 and 100 nucleotides in length, usually between 12 and 40 nucleotides in length and often between 17 and 30 nucleotides in length) which hybridizes to a subsequence found between the ends of a nucleic acid.
A "dideoxy chain termination reaction" is a reaction in which a dideoxy nucleotide is incorporated into a nucleic acid. Ordinarily, the reaction is a primer extension reaction performed using a polymerase (Sequenase™, Taq, rTth or the like). The dideoxy chain termination reaction forms the basis for most known enzymatic sequencing reactions.
A "nucleic acid template" or "template" is a nucleic acid which is copied, or, when single stranded, a nucleic acid which is used to make a complementary nucleic acid.
A "terminal region" of a nucleic acid refers to a subsequence of the nucleic acid which is located adjacent to either the 5' or 3' end of the nucleic acid.
A "vectorette" is a double-stranded nucleic acid, wherein a portion of the double-stranded nucleic acid is non-complementary. The non-complementary portion is ordinarily flanked by regions of complementarity.
A "recombinant nucleic acid" comprises or is encoded by one or more nucleic acids which are derived from a nucleic acid which was artificially constructed. For example, the nucleic acid can comprise or be encoded by a cloned nucleic acid formed by joining heterologous nucleic acids as taught, e.g., in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego, CA (Berger) and in Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3 (Sambrook). Alternatively, the nucleic acid can be synthesized chemically.
Two single-stranded nucleic acids "hybridize" when they form a double-stranded duplex. The region of double-strandedness can include the full length of one or both of the single-stranded nucleic acids, or all of one single-stranded nucleic acid and a subsequence of the other single-stranded nucleic acid, or the region of double-strandedness can include a subsequence of each nucleic acid. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes part I chapter 2 "overview of principles of hybridization and the strategy of nucleic acid probe assays", Elsevier, New York. In the context of the present invention, it is common to hybridize a primer to a template nucleic acid. Appropriate solutions and temperatures for hybridization are sequence dependent, with the selection of appropriate hybridization conditions being routine. See, Tijssen et al., id. Generally, highly stringent hybridization conditions are selected to be about 5-10°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe.
A "primer extension reaction" is performed by hybridizing a primer to a template nucleic acid, and covalently linking nucleotides to the primer such that the added nucleotides are complementary to the template nucleic acid. Primer extension is ordinarily performed using an enzyme such as a DNA polymerase. Using appropriate buffers, pH, salts and nucleotide triphosphates, a template-dependent polymerase such as DNA polymerase I (or the Klenow fragment thereof), Taq or rTth polymerase XL incorporates a nucleotide complementary to the template strand on the 3' end of a primer which is hybridized to the template.
An "amplification primer" is a nucleic acid primer used for primer extension in a PCR reaction.
A "region" of a nucleic acid refers to the general area surrounding a structural feature of the nucleic acid, such as the termini of the molecule, an incorporated residue, or a specific subsequence.
A "restriction endonuclease cleavage site" denotes the site at which a known endonuclease cleaves DNA under defined environmental conditions. A "restriction endonuclease recognition site" denotes the DNA site which is recognized by the endonuclease which brings about the cleavage reaction. The recognition site is distinct from the cleavage site for some enzymes, such as HphI.
A "set" of restriction enzyme digests refers to a parallel series of digests performed on a single template nucleic acid (generally a large DNA such as a YAC, BAC, PAC or cosmid). A "set" of PCR reactions refers to a series of parallel reactions where similar manipulations are performed on all of the members of the set. For example, a set of restriction enzyme digests of a large DNA is optionally treated by ligation of similar components and PCR amplification for sequencing as described, supra.
DETAILED DISCUSSION OF THE INVENTION
Electrophoresis of a typical dideoxy sequencing reaction has an upper limit of resolution of about 200 to 600 nucleotides, depending on the precise apparatus which is used. Thus, it is not possible to sequence large nucleic acids such as plasmids, cosmid clones, yeast artificial plasmid clones, recombinant lambda phage or other recombinant nucleic acids in a single sequencing reaction. Instead, small fragments of the selected nucleic acid are sequenced individually, and the sequences are compiled to produce the overall sequence of the large nucleic acid. For example, PCR sequencing methods (see, Rosenthal and Jones (1990) Nucleic Acids Research 18(10): 3095-3096 and Riley et al. (1990) Nucleic Acids Research 18(10): 2887-2890) have been described.
Prior art methods typically require purification steps to isolate PCR products for sequencing, and/or subcloning of the PCR products for sequencing, and are performed using a laborious chromosome walking method in which PCR products are sequenced in a linear fashion. For example, in the Rosenthal and Jones methods, each PCR amplification and sequencing reaction provides the basis for the selection of primers for a second amplification and sequencing event for a contiguous nucleic acid.
In contrast, the present invention provides a parallel series of PCR templates for simultaneous sequencing, and no purification steps are required to isolate intermediate PCR products. These advantages dramatically increase the speed and efficiency in obtaining the sequence for a large nucleic acid.
The methods proceed by selecting a large nucleic acid to be sequenced, digesting copies of the nucleic acid in parallel with a multiplicity of restriction enzymes in separate reactions, ligating a tag onto the digested fragments, performing anchored PCR on the fragments, and sequencing the resulting amplified fragments in parallel from one of the terminal regions of each of the nucleic acids. The methods of the invention thus provide several significant advantages over the prior art.
Providing Large Nucleic Acid Templates
The selection of the nucleic acid to be sequenced depends upon the construct that the sequencer has in hand. Many methods of making recombinant RNA and DNA nucleic acids, including recombinant plasmids, recombinant lambda phage, cosmids, yeast artificial chromosomes (YACs), P1 artificial chromosomes, Bacterial Artificial Chromosomes (BACs), and the like are known. A general introduction to YACs, BACs, PACs and MACs as artificial chromosomes is described in Monaco and Larin (1994) Trends Biotechnol 12(7): 280-286. Examples of appropriate cloning techniques for making large nucleic acids, and instructions sufficient to direct persons of skill through many cloning exercises are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY (Sambrook); and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (1997, supplement 37) (Ausubel). Basic procedures for sequencing, cloning and other aspects of molecular biology and underlying theoretical considerations are also found in Lewin (1995) Genes V, Oxford University Press Inc., NY (Lewin); and Watson et al. (1992) Recombinant DNA, Second Edition, Scientific American Books, NY.
Product information from manufacturers of biological reagents and experimental equipment also provides information useful in known biological methods. Such manufacturers include the Sigma Chemical Company (Saint Louis, MO); New England Biolabs (Beverly, MA); R&D Systems (Minneapolis, MN); Pharmacia LKB Biotechnology (Piscataway, NJ); CLONTECH Laboratories, Inc. (Palo Alto, CA); ChemGenes Corp. (Waltham, MA); Aldrich Chemical Company (Milwaukee, WI); Glen Research, Inc. (Sterling, VA); GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD); Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland); Invitrogen (San Diego, CA); Perkin Elmer (Foster City, CA); and Stratagene; as well as many other commercial sources known to one of skill.
The construction of YACs and YAC libraries is known. See, Berger; Burke et al. (1987) Science 236:806-812. Gridded libraries of YACs were described in Anand et al. (1989) Nucleic Acids Res. 17:3425-3433, and Anand et al. (1990) Nucleic Acids Res. 18:1951-1956. Riley (1990) Nucleic Acids Res. 18(10): 2887-2890 and the references therein describe cloning of YACs and the use of vectorettes in conjunction with YACs. See also, Ausubel, chapter 13.
Cosmid cloning is well known. See, e.g., Ausubel, chapter 1.10.11 (supplement 13) and the references therein. See also, Ish-Horowicz and Burke (1981) Nucleic Acids Res. 9:2989-2998; Murray (1983) Phage Lambda and Molecular Cloning in Lambda II (Hendrix et al., eds) 395-432, Cold Spring Harbor Laboratory, NY; Frischauf et al. (1983) J. Mol. Biol. 170:827-842; and Dunn and Blattner (1987) Nucleic Acids Res. 15:2677-2698, and the references cited therein.
Construction of BAC and P1 libraries is well known; see, e.g., Ashworth et al. (1995) Anal Biochem 224(2): 564-571; Wang et al. (1994) Genomics 24(3): 527-534; Kim et al. (1994) Genomics 22(2): 336-9; Rouquier et al. (1994) Anal Biochem 217(2): 205-9; Shizuya et al. (1992) Proc Natl Acad Sci U S A 89(18): 8794-7; Woo et al. (1994) Nucleic Acids Res 22(23): 4922-31; Wang et al. (1995) Plant (3): 525-33; Cai (1995) Genomics 29(2): 413-25; Schmitt et al. (1996) Genomics 33(1): 9-20; Kim et al. (1996) Genomics 34(2): 213-8; Kim et al. (1996) Proc Natl Acad Sci U S A (13): 6297-301; Pusch et al. (1996) Gene 183(1-2): 29-33; and Wang et al. (1996) Genome Res 6(7): 612-9.
The nucleic acids sequenced by this invention, whether RNA, cDNA, genomic DNA, or a hybrid of the various combinations, are isolated from biological sources or synthesized in vitro. The nucleic acids of the invention are present in transformed or transfected whole cells, in transformed or transfected cell lysates, or in a partially purified or substantially pure form.
In vitro amplification techniques suitable for amplifying sequences to provide a large nucleic acid or for subsequent analysis, sequencing or subcloning are known. Examples of techniques sufficient to direct persons of skill through such in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA) are found in Berger, Sambrook, and Ausubel, as well as Mullis et al. (1987) U.S. Patent No. 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al. eds) Academic Press Inc., San Diego, CA (1990) (Innis); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomell et al. (1989) J. Clin. Chem 35, 1826; Landegren et al. (1988) Science 241, 1077-1080; Van Brunt (1990) Biotechnology 8, 291-294; Wu and Wallace (1989) Gene 4, 560; Barringer et al. (1990) Gene 89, 117; and Sooknanan and Malek (1995) Biotechnology 13: 563-564. Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods of amplifying large nucleic acids are summarized in Cheng et al. (1994) Nature 369: 684-685 and the references therein. One of skill will appreciate that essentially any RNA can be converted into a double-stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, Ausubel, Sambrook and Berger, all supra.
Digesting the Large Nucleic Acid in Parallel with Restriction Endonucleases
A large nucleic acid to be sequenced is typically a large DNA molecule derived from a plasmid, cosmid, phage, YAC or the like. Multiple copies of the DNA are grown, typically in cell culture in bacteria (typically E. coli) or eukaryotic cell culture (typically yeast, insect cells, animal cells or the like). Methods of cloning and amplifying DNA are described in Ausubel, Sambrook and Berger, all supra. Illustrative cells for the production of DNAs include bacteria, and eukaryotic cells of fungal, plant, insect or vertebrate (e.g., mammalian) origin. Transducing such cells with DNAs is accomplished by various known means. These include calcium phosphate precipitation, fusion of the recipient cells with bacterial or yeast protoplasts containing the DNA, treatment of the recipient cells with liposomes containing the DNA, DEAE dextran, receptor-mediated endocytosis, electroporation, micro-injection of the DNA directly into the cells, incubating viral vectors containing target nucleic acids which encode polypeptides of interest with cells within the host range of the vector, calcium phosphate transfection, and many other techniques known to those of skill. See, e.g., Methods in Enzymology, vol. 185, Academic Press, Inc., San Diego, CA (D.V. Goeddel, ed.) (1990) or M. Krieger, Gene Transfer and Expression: A Laboratory Manual, Stockton Press, New York, NY (1990) and the references cited therein, as well as Sambrook and Ausubel.
The culture of cells used in conjunction with the present invention, including cell lines and cultured cells from tissue or blood samples, is well known in the art. Freshney (Culture of Animal Cells, a Manual of Basic Technique, third edition, Wiley-Liss, New York (1994)) and the references cited therein provide a general guide to the culture of cells. See also, Kuchler et al. (1977) Biochemical Methods in Cell Culture and Virology, Kuchler, R.J., Dowden, Hutchinson and Ross, Inc. Additional information on cell culture is found in Ausubel, Sambrook and Berger, supra. Cell culture media are described in Atlas and Parks (eds) The Handbook of Microbiological Media (1993) CRC Press, Boca Raton, FL.
The DNA is purified and aliquotted into separate containers with selected restriction endonucleases and appropriate buffers for digestion. Hundreds of restriction endonucleases are known, well characterized, and commercially available. The restriction digest is then stopped by methods known in the art, such as the addition of EDTA, SDS or the application of heat, alcohol precipitation of the DNA, phenol-chloroform extraction to remove the restriction endonuclease, or a combination thereof. For example, simply heating an enzyme digest to 68°C for 5-10 minutes is sufficient to inactivate many restriction enzymes. Optionally, a sample from each of the aliquots is electrophoresed, or otherwise analyzed to test whether the restriction digestion worked properly. The DNA from each aliquot is optionally purified, e.g., by precipitation, column chromatography or the like.
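By way of illustration only, the parallel digests described above can be modeled in silico. The following is a minimal sketch, assuming a linear DNA represented by its top strand; the enzymes listed are common commercial enzymes with their standard recognition sites, and the demonstration sequence and the function names are hypothetical and do not come from the specification.

```python
# Illustrative in-silico model of one digest per enzyme (one "aliquot" each).
# Recognition sites and top-strand cut offsets are the standard ones for
# these enzymes; the demo sequence is made up for this sketch.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "EcoRV": ("GATATC", 3),    # GAT^ATC (blunt cutter)
}


def cut_positions(seq, site, offset):
    """Return the top-strand cut position for every occurrence of `site`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, site, offset):
    """Cut a linear sequence and return the top-strand fragments in order."""
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def parallel_digests(seq, enzymes=ENZYMES):
    """Run one independent digest per enzyme, mirroring the separate aliquots."""
    return {name: digest(seq, site, off) for name, (site, off) in enzymes.items()}


demo = "TTAGGATCCAAACGAATTCTTTGGGATATCCCAAGCTTGG"  # hypothetical sequence
for name, fragments in parallel_digests(demo).items():
    print(name, [len(f) for f in fragments])
```

Because each enzyme cuts at different positions, each aliquot yields a different fragment pattern from the same starting DNA, which is what later produces fragments of differing length from a common internal primer.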
The digested DNA is ordinarily made blunt (i.e., any restriction digest overhang is removed). Some restriction digests leave a blunt end, while other ends are made blunt using a DNA polymerase. To permit ligation of the digested DNA to a primer binding site, the DNA is preferably phosphorylated. Some restriction enzymes leave an appropriate phosphoryl group; digested DNA from those which do not is treated with a kinase enzyme such as T4 polynucleotide kinase. The advantage of making the DNA blunt is that the same primer binding site can be ligated to all of the DNA digestion products, simplifying subsequent PCR and sequencing steps. However, a more complex approach in which the DNAs are not blunt ended, and in which different primer binding sites are ligated depending on the restriction enzyme overhang, can also be used. The use of primer binding sites with overhangs can facilitate attachment to DNA fragments (i.e., where the DNA has a complementary overhang).
Attaching Primer Binding Sites to the Digested DNA
Primer binding sites such as vectorettes are attached to the digested aliquots of DNA (or a sample thereof). Vectorettes are described, e.g., in Riley, et al (1990) Nucleic Acids Res. 18: 2887-2890. The primer binding sites are usually attached to the DNA using a DNA ligase in a DNA ligation reaction. A preferred DNA ligase is T4 DNA ligase, but many other ligases are known, appropriate and commercially available. Alternatively, the primer binding site can be chemically coupled to the digested DNA in a condensation reaction.
Small nucleic acids (oligonucleotides) used in the invention such as sequencing primers, primer binding sites, vectorettes and the like can be made recombinantly, but more typically they are made chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981) Tetrahedron Letts. 22(20): 1859-1862, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res. 12:6159-6168. Oligonucleotides can also be custom made and ordered from a variety of commercial sources known to persons of skill. Purification of oligonucleotides, where necessary, is typically performed by either native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson and Regnier (1983) J. Chrom. 255:137-149. The sequence of the synthetic oligonucleotides can be verified using the chemical degradation method of Maxam and Gilbert (1980) in Grossman and Moldave (eds.) Academic Press, New York, Methods in Enzymology 65:499-560.
In chemical synthetic procedures, the nucleic acids are made as single-stranded nucleic acid (typically DNA). Where the primer binding site is double stranded, or partially double stranded, two single-stranded nucleic acids are synthesized and hybridized. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes part I chapter 2 "overview of principles of hybridization and the strategy of nucleic acid probe assays", Elsevier, New York. Typically, the nucleic acids are placed together in solution, heated above the melting temperature of the nucleic acids, and slowly cooled to a temperature below the melting temperature of the nucleic acids.
Preferred double-stranded primer binding sites comprise non-complementary sequences. As described herein, PCR primers are made in which the 3' portion of the primer is complementary to a nucleic acid which is complementary to one of the non-complementary regions of the primer binding site. As described in Figures 1 and 2, the use of primer binding sites with regions of non-complementarity eliminates unwanted PCR products, obviating the need for purification of PCR products for sequencing. A preferred primer binding site is a vectorette, in which the central portion of the primer binding site is non-complementary. The primer binding sites are ordinarily not phosphorylated for ligation to the digested DNA. As shown in Figures 1 and 2, this reduces the possibility of unwanted PCR products during later steps (e.g., concatamers of the primer binding site do not form). However, the primer binding sites are optionally phosphorylated. Where the primer binding site is not phosphorylated, the digested DNA is usually phosphorylated to permit ligation. Most DNA ligases require a site of phosphorylation to couple nucleic acids.
The First Cycle of PCR: First Strand Synthesis
In the first cycle of PCR, the digested tagged double-stranded DNAs are denatured in parallel, and an internal amplification primer is hybridized to an internal region on the digested DNA for primer extension by a thermostable enzyme. Although ordinary Taq can be used for this purpose, thermostable polymerase enzymes which have superior abilities to amplify large nucleic acids are preferred (see, Cheng et al., supra). Example enzymes include rTth DNA polymerase XL (a mixture of rTth DNA polymerase from Thermus thermophilus and Vent DNA polymerase from Thermococcus litoralis, available from Perkin Elmer, Foster City, CA. See also, Cheng et al., supra), Taq plus DNA polymerase (Stratagene Catalogue # 600203 and 600204); the Expand long template PCR system (Boehringer Catalogue # 1-681-834), and the eLONGase Amplification system (GIBCO-BRL Catalogue # 10481). These enzymes can reliably amplify 20 kb or larger sequences (occasionally up to 42 kb; see, Cheng et al., supra). Thus, each set of PCR reactions can be used to amplify sets of overlapping fragments which extend up to at least about 20 kb from an internal primer. It will be appreciated that where a central portion of a DNA to be sequenced is known, overlapping fragments can be generated in either direction relative to the known sequence. The primer extension reaction is carried out according to known techniques, e.g., as specified by the supplier of the selected polymerase.
Primer Design
The internal primer to prime the first strand of synthesis is selected based upon hybridization to a known sequence in the large DNA. This known sequence optionally comes from vector sequence surrounding a clone in a cloning vector, such as a plasmid, cosmid, phage or YAC. Alternatively, one of skill will appreciate that a portion of a DNA of interest can be sequenced using standard techniques, or more preferably using the techniques described herein, thereby providing sequence information for making the internal primer. Using the techniques described herein, ordered sets of nested fragments are made and the sequence information is used to generate a second contiguous ordered set of nested fragments, and so on, until the entire sequence of interest is determined.
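For illustration, the ordered set of nested fragments that shares a single internal primer can be sketched as follows. This is a simplified model under stated assumptions: only the top strand of a linear DNA is considered, the ligated vectorette is ignored, and each enzyme's product is taken to run from the internal primer to that enzyme's nearest downstream cut site. The sequences and the function name are hypothetical.

```python
# Sketch only: top strand of a linear DNA, vectorette tag ignored.
# Recognition sites and cut offsets are the standard ones for these enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "EcoRV": ("GATATC", 3),    # GAT^ATC
}


def nested_fragments(seq, primer, enzymes=ENZYMES):
    """Return (enzyme, product) pairs ordered from shortest to longest product.

    Each product starts at the internal primer and ends at the nearest
    downstream cut site of one enzyme. Enzymes with no downstream site, or
    whose nearest site cuts inside the primer region, give no product.
    """
    start = seq.find(primer)
    if start == -1:
        raise ValueError("internal primer not found in sequence")
    primer_end = start + len(primer)
    products = []
    for name, (site, offset) in enzymes.items():
        hit = seq.find(site, start)
        if hit == -1:
            continue
        cut = hit + offset
        if cut < primer_end:
            continue  # the cut would disrupt the primer-binding region
        products.append((name, seq[start:cut]))
    return sorted(products, key=lambda item: len(item[1]))


# Hypothetical known sequence: internal primer followed by spaced sites.
internal_primer = "ACCTGACTTGCA"
demo = internal_primer + "TTTT" + "GAATTC" + "CCCCCC" + "GGATCC" + "AAAAAAAA" + "AAGCTT" + "TT"

for enzyme, product in nested_fragments(demo, internal_primer):
    print(enzyme, len(product))
```

In this model, the far end of the longest product is the natural place from which to choose the next internal primer when generating the second contiguous set of nested fragments.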
While many sequences can be used to construct primers, selecting optimal amplification primers is typically done using computer assisted consideration of available sequence and excluding potential primers which do not have desired hybridization characteristics, and/or including potential primers which meet selected hybridization characteristics. This is done by determining all possible nucleic acid primers, or a subset of all possible primers with selected hybridization properties (e.g., those with a selected length and G:C ratio) based upon the known sequence. The selection of the hybridization properties of the primer is dependent on the desired hybridization and discrimination properties of the primer. In general, the longer the primer, the higher the melting temperature. However, longer primers are not as specific because a single mismatch has less of a destabilizing effect on hybridization than a single mismatch on a short nucleic acid duplex; thus, long primers can create unwanted PCR products. It is expected that one of skill is thoroughly familiar with the theory and practice of nucleic acid hybridization and primer selection. Gait, ed. Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford (1984); W.H.A. Kuijpers Nucleic Acids Research 18(17), 5197 (1994); K.L. Dueholm J. Org. Chem. 59, 5767-5773 (1994); S. Agrawal (ed.) Methods in Molecular Biology, volume 20; and Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, e.g., part I chapter 2 "overview of principles of hybridization and the strategy of nucleic acid probe assays", Elsevier, New York provide a basic guide to nucleic acid hybridization. Innis, supra, provides an overview of primer selection.
Most typically, amplification primers are between 8 and 100 nucleotides in length, and preferably between about 10 and 30 nucleotides in length. Most preferably, the primers are between 15 and 25 nucleotides in length. For example, in one preferred embodiment, the nucleic acid primers are about 17-30 nucleotides in length.
One of skill will recognize that the 3' end of an amplification primer is more important for PCR than the 5' end. Investigators have reported PCR products where only a few nucleotides at the 3' end of an amplification primer were complementary to a DNA to be amplified. In this regard, nucleotides at the 5' end of a primer can incorporate structural features unrelated to the target nucleic acid; for instance, in one preferred embodiment, a sequencing primer hybridization site (or a complement to such a primer, depending on the application) is incorporated into the amplification primer, where the sequencing primer is derived from a primer used in a standard sequencing kit, such as one using a biotinylated or dye-labeled universal M13 or SP6 primer. These structural features are referred to as constant primer regions.
The primers are selected so that there is no complementarity between any known target sequence and any constant primer region. One of skill will appreciate that constant regions in primer sequences are optional.
Typically, all primer sequences are selected to hybridize only to a perfectly complementary DNA, with the nearest mismatch hybridization possibility from known DNA sequence typically having at least about 50 to 70% hybridization mismatches, and preferably 100% mismatches for the terminal 5 nucleotides at the 3' end of the primer.
The primers are selected so that no secondary structure forms within the primer. Self-complementary primers have poor hybridization properties, because the complementary portions of the primers self hybridize (i.e., form hairpin structures). The primers are also selected so that the primers do not hybridize to each other, thereby preventing duplex formation of the primers in solution, and possible concatenation of the primers during PCR. If there is more than one constant region in the primer, the constant regions of the primer are selected so that they do not self-hybridize or form hairpin structures.
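The secondary-structure and primer-primer checks described above lend themselves to a simple screen. The following is a minimal sketch, assuming exact Watson-Crick pairing only; the stem length, minimum loop size and minimum complementary run are illustrative thresholds chosen for this example and are not values taken from the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_hairpin(primer, stem=4, min_loop=3):
    """True if some `stem`-nt segment can pair with a downstream segment.

    The paired arm must start at least `min_loop` nucleotides after the
    first arm ends, so that a loop can form between the two arms.
    """
    n = len(primer)
    for i in range(n - 2 * stem - min_loop + 1):
        arm = reverse_complement(primer[i:i + stem])
        if primer.find(arm, i + stem + min_loop) != -1:
            return True
    return False


def can_dimerize(p1, p2, min_run=5):
    """True if any `min_run`-nt window of p1 is complementary to part of p2."""
    rc2 = reverse_complement(p2)
    return any(p1[i:i + min_run] in rc2 for i in range(len(p1) - min_run + 1))


print(has_hairpin("GGGCAAAAGCCC"))               # stem GGGC / GCCC with a loop
print(can_dimerize("AAAAACCCCC", "GGGGGTTTTT"))  # fully complementary pair
```

Calling `can_dimerize(p, p)` gives a rough self-dimer check for a single primer; a practical tool would also score partial and mismatched pairing, which this sketch does not attempt.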
Where sets of amplification primers (i.e., the 5' and 3' primers used for exponential amplification) are of a single length, the primers are optionally selected so that they have roughly the same, and, in some embodiments, exactly the same overall base composition (i.e., the same A+T to G+C ratio of nucleic acids). Where the primers are of differing lengths, the A+T to G+C ratio is determined by selecting a thermal melting temperature for the primer-DNA hybridization, and selecting an A+T to G+C ratio and probe length for each primer which has approximately the selected thermal melting temperature.
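As a rough illustration of matching primers of differing length by melting temperature, the sketch below uses the Wallace rule of thumb (2°C per A or T, 4°C per G or C). This rule is substituted here for illustration and is not taken from the specification; it is only an approximation for short oligonucleotides, and the tolerance value and function names are assumptions.

```python
def wallace_tm(primer):
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def tm_matched(p1, p2, tolerance=2):
    """True if the two estimated melting temperatures differ by <= tolerance."""
    return abs(wallace_tm(p1) - wallace_tm(p2)) <= tolerance


short_gc_rich = "GCGCGCGCGCGC"       # hypothetical 12-mer, all G+C
longer_mixed = "ATTTAGGTGACACTATAG"  # 18-mer with one-third G+C

print(wallace_tm(short_gc_rich), wallace_tm(longer_mixed))
print(tm_matched(short_gc_rich, longer_mixed))
```

The example shows the trade-off stated above: a shorter primer needs a higher G+C content to reach approximately the same estimated melting temperature as a longer primer with a lower G+C content.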
One of skill will recognize that there are a variety of possible ways of performing the above selection steps, and that variations on the steps are appropriate. Most typically, selection steps are performed using simple computer programs to perform the selection as outlined above; however, all of the steps are optionally performed manually. One available computer program for primer selection is the MacVector program from Kodak. In addition to programs for primer selection, one of skill can easily design simple programs for any of the preferred selection steps.
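One such simple program might look like the following. It is a minimal sketch, assuming that candidates are enumerated as every window of a chosen length in the known sequence, kept only if the G+C fraction falls in a chosen range and if the terminal 3' nucleotides occur exactly once across both strands of the known sequence. The thresholds, the uniqueness criterion used as a stand-in for the 3' mismatch requirement, and the function names are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C residues in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def occurrences(seq, sub):
    """Count occurrences of `sub` in `seq`, including overlapping ones."""
    count, start = 0, seq.find(sub)
    while start != -1:
        count += 1
        start = seq.find(sub, start + 1)
    return count


def candidate_primers(known, length=20, gc_min=0.4, gc_max=0.6, tail=5):
    """Return (position, primer) pairs that pass the length, G+C and 3' filters."""
    strands = (known, reverse_complement(known))
    picks = []
    for i in range(len(known) - length + 1):
        primer = known[i:i + length]
        if not gc_min <= gc_fraction(primer) <= gc_max:
            continue
        three_prime = primer[-tail:]
        # Require the 3'-terminal nucleotides to be unique in the known sequence.
        if sum(occurrences(s, three_prime) for s in strands) != 1:
            continue
        picks.append((i, primer))
    return picks


known = "ATGGCGTACCTGAAGTCCATTGCAGGACTTCAGCTAAGGCTTGACC"  # hypothetical
for position, primer in candidate_primers(known, length=18):
    print(position, primer)
```

Hairpin, primer-dimer and melting-temperature screens can be applied to the surviving candidates as additional filters.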
The Second Cycle of PCR: Second Strand Synthesis
The second cycle of PCR is performed using a primer (the "first amplification primer," or "second strand primer") which hybridizes to a region of the first strand synthesized from the internal primer as discussed above. Typically, the amplification primer hybridizes to the first strand in the region which was synthesized in the first strand complementary to the primer binding site. In preferred embodiments, the amplification primer hybridizes to the region of the first strand which was synthesized to be complementary to one strand of the primer binding site. It is particularly preferred that this region of the first strand be complementary to a portion of the primer binding site which was itself not complementary to the opposing strand of the primer binding site, such as the central portion of a vectorette (See, Figure 2 and Figure 5). The sequence of the first strand in this region is created by the first strand synthesis; the double-stranded template for the first round of PCR does not contain an equivalent sequence.
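The reason the second strand primer has no binding site until the first strand is made can be illustrated with a toy model. In the sketch below, every sequence (vectorette arms, bubble regions, insert) is hypothetical, and annealing is simplified to an exact match of the primer's reverse complement within a strand. The model assumes that the vectorette strand containing `BUBBLE_BOTTOM` is the one ligated to the template strand copied in the first cycle.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def can_anneal(primer, strand):
    """A primer anneals where the strand contains its reverse complement."""
    return reverse_complement(primer) in strand


# Hypothetical vectorette: complementary arms flank a mismatched central bubble.
ARM_5 = "CAAGGAGAGG"
ARM_3 = "GACGAGACGA"
BUBBLE_BOTTOM = "CGAATCGTAACCGTTCGTAC"
BUBBLE_TOP = "TTGACTTCCAGTTCAACATC"  # deliberately NOT complementary to BUBBLE_BOTTOM

vectorette_bottom = ARM_5 + BUBBLE_BOTTOM + ARM_3
vectorette_top = reverse_complement(ARM_3) + BUBBLE_TOP + reverse_complement(ARM_5)

# The second strand primer has the same sequence as the bottom-strand bubble.
second_strand_primer = BUBBLE_BOTTOM

# Hypothetical digested insert; the internal primer matches its 5' end.
insert_top = "ATGCCTGACTTGCAGGATTCTTAGCAGT"
internal_primer = insert_top[:12]

# Strand copied in cycle 1: vectorette strand ligated 5' of the insert's bottom strand.
template = vectorette_bottom + reverse_complement(insert_top)
first_strand = reverse_complement(template)  # full extension from the internal primer

print(can_anneal(internal_primer, template))            # internal primer site exists
print(can_anneal(second_strand_primer, template))       # no site before cycle 1
print(can_anneal(second_strand_primer, vectorette_top)) # bubble is mismatched
print(can_anneal(second_strand_primer, first_strand))   # site created by cycle 1
```

In this model the second strand primer finds a site only on the newly synthesized first strand, which is consistent with the low background described for the preferred embodiments.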
In preferred embodiments, the first amplification primer can only hybridize to the first strand, resulting in PCR with low background. This eliminates the need for purification of the PCR product, although purification is optionally performed. When the PCR product is purified it is typically purified using simple column chromatography to remove any excess primers, or by gel purification. As discussed supra, the region of a PCR primer which should be complementary to the template is the 3' end of the primer. The 5' end optionally has sequences engineered to facilitate sequencing or cloning. For example, the primer optionally incorporates a sequence corresponding to widely used universal M13, M13 reverse, T7, T3 or SP6 sequencing primers. A preferred SP6 primer is ATTTAGGTGACACTATAG. A preferred T7 primer is TAATACGACTCACTATAGGG. A preferred T3 primer is ATTAACCCTCACTAAAGGGA. A preferred M13 reverse primer is CAGGAAACAGCTATGACC. An advantage to using such a universal primer is that kits and reagents for sequencing with these primers, e.g., biotinylated, dye labeled and luminescent primers, are widely available (e.g., from Applied Biosystems, Inc. or Perkin Elmer, Inc.). However, one of skill will recognize that a wide range of sequencing primers are optionally used.
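The construction of an amplification primer carrying a universal sequencing primer sequence at its 5' end can be sketched as a simple concatenation. The four universal sequences below are those recited above; the function name and the 3' template-complementary portion used in the example are hypothetical.

```python
# Universal sequencing primer sequences as recited in the text (5' to 3').
UNIVERSAL_PRIMERS = {
    "SP6": "ATTTAGGTGACACTATAG",
    "T7": "TAATACGACTCACTATAGGG",
    "T3": "ATTAACCCTCACTAAAGGGA",
    "M13 reverse": "CAGGAAACAGCTATGACC",
}


def tailed_primer(template_specific, universal="SP6"):
    """Join a 5' universal tail to the 3' template-complementary portion."""
    return UNIVERSAL_PRIMERS[universal] + template_specific


# Hypothetical 3' portion that hybridizes to the first strand.
three_prime_portion = "CGAATCGTAACCGTTCGTAC"
print(tailed_primer(three_prime_portion, "T7"))
```

The universal sequence is placed at the 5' end because, as noted above, it is the 3' end of the primer that should be complementary to the template.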
In one embodiment, the first amplification primer is phosphorylated, e.g., using T4 polynucleotide kinase. This permits selective digestion of the second strand using an exonuclease, leaving a single-stranded template for sequencing.
The Final PCR Product
The first and second strands are exponentially amplified by PCR to yield the final PCR product for each reaction in the series of parallel reactions (referred to as a "set" of PCR reactions). An aliquot of each of the reactions can be analyzed for purity. If unwanted PCR products occur, the final product can be purified by gel-purification or column chromatography procedures. Although it is not preferred or required, the PCR products can be subcloned for sequencing, or other procedures.
After exponential amplification with the internal primer and the first primer, the final PCR product is directly sequenced using a wide variety of sequencing protocols. It is expected that one of skill is thoroughly familiar with basic sequencing protocols, including those using the Klenow fragment of DNA polymerase, Taq polymerase, Sequenase™, and the like. See, Sambrook, Ausubel, Innis, and Berger, supra. In one preferred embodiment, one strand of the final PCR product is selectively digested to produce a single-stranded template for sequencing. One of skill will appreciate that single-stranded templates often produce more readable sequence information.
Once all of the PCR products from all of the separate parallel reactions are sequenced, the sequences are aligned based upon overlapping sequence information. Preferably, this is performed using computer assisted compilation of the sequences, but the sequences are optionally aligned manually. Once the sequence furthest from the internal primer binding site is determined, the information is used to select a new internal primer which hybridizes to the region. The entire procedure for generating overlapping or "nested" fragments is then repeated using the new internal primer.
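The computer-assisted compilation step can be illustrated with a deliberately simple Python sketch. It assumes error-free reads that are already ordered along the template and merges each read onto the growing sequence at its longest exact suffix/prefix overlap. The function names, the `min_overlap` parameter and the toy reads are assumptions for illustration; practical assembly software must also tolerate sequencing errors and reads from either strand.

```python
def overlap_length(left: str, right: str, min_overlap: int = 4) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(reads: list[str], min_overlap: int = 4) -> str:
    """Merge reads that are already ordered along the template."""
    contig = reads[0]
    for read in reads[1:]:
        size = overlap_length(contig, read, min_overlap)
        if size == 0:
            raise ValueError("no sufficient overlap between adjacent reads")
        contig += read[size:]  # append only the part that extends the contig
    return contig


# Toy reads, invented for illustration only.
reads = ["ACGTACGTGG", "CGTGGTTAAC", "TTAACGGATC"]
contig = merge_reads(reads)
print(contig)
```

The far end of the merged sequence is the region from which a new internal primer would be chosen for the next round.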
Kits
Kits are constructed based upon the methods and sets of PCR reactions described herein. The kits will ordinarily contain instructions in the methods of the invention, reagents for performing the PCR reactions, a container such as a box containing the instructions and vials of reagents. The kits optionally include sequencing primers, computer software for primer selection and sequence compilation, sequencing equipment such as gel readers or gel boxes, PCR enzymes, sequencing enzymes, buffers for performing the PCR or sequencing reactions, primer binding sites such as vectorettes (hybridized or unhybridized), positive control large DNAs such as a cosmid, plasmid, phage or YAC, an exonuclease, a kinase enzyme, or other items for performing the methods described herein.
Combinatorial Shotgun-Long Distance Sequencing Methods
In shotgun sequencing methods, a given template is cleaved into many smaller fragments, subcloned and the clones randomly isolated and sequenced. The sequences from the clones are then assembled, and overlapping regions compiled. See also, Sambrook, Innis, Ausubel, Berger, Watson, and Lewin, supra for a discussion of shotgunning DNA. As the size of a template increases, the work required to generate a full length clone increases geometrically, because of the number of random clones that must be sequenced to ensure that all regions of the template are represented. In some shotgun methods, some improvement is achieved by directly sequencing the DNA fragments, e.g., using PCR mediated methods. However, shotgun sequencing methods are typically still labor intensive.
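To give a feel for why purely random sampling becomes inefficient, the following Python sketch uses a simplified model that is not part of the specification: the template is treated as a fixed number of distinct, equally likely fragments, and clones are picked at random with replacement (the classical coupon-collector model). The function names are illustrative.

```python
def expected_clones_to_cover(n_fragments: int) -> float:
    """Expected number of random picks needed to see every one of
    n equally likely fragments at least once (coupon-collector model)."""
    return n_fragments * sum(1.0 / k for k in range(1, n_fragments + 1))


def expected_clones_for_last(n_fragments: int) -> float:
    """Expected picks needed to hit the single fragment still missing."""
    return float(n_fragments)


for n in (10, 100, 1000):
    print(n, round(expected_clones_to_cover(n), 1), expected_clones_for_last(n))
```

Under this model the final missing fragment alone costs, on average, as many picks as there are fragments, which is the part of the work that a directed method avoids.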
The present invention is used in conjunction with shotgun sequencing methods to dramatically improve the speed of sequencing large projects. In the methods, shotgun cloning and sequencing is first performed. In one embodiment, a minority of the total template is sequenced by shotgunning methods. For example, in one embodiment, between about 10 and about 100 clones or other DNA fragments, potentially out of several thousand, are sequenced by standard shotgun cloning and sequencing. Sequence specific primers are generated for each sequence generated by the shotgun methods, using the methods described for primer selection, supra. See, Figure 4. The sequence specific primers are used to perform long distance sequencing as described, supra (see, e.g., Figures 1, 2, and 5), for each sequence specific primer. In brief, the large template is cleaved with restriction enzymes to generate a set of overlapping fragments, vectorettes comprising a primer binding site for a sequencing primer are ligated as appropriate, long distance PCR is performed and the resulting fragments are sequenced with chain termination methods using the sequencing primer. As needed, the sequence generated by the long distance sequencing method can be extended by generating new sequence specific primers from the sequence generated by the first set of long distance sequencing reactions, and repeating the long distance sequencing method using these sequence specific primers. All of the sequences are then aligned and compiled to generate a single contiguous sequence for the large template. In a similar embodiment, standard shotgun sequencing is first performed to determine most of the sequence of a given large template (e.g., a majority of the sequence, up to about 90% of the sequence). Generating the complete sequence at this point by shotgun sequencing requires a very large investment of effort, because a large number of random clones or fragments needs to be sequenced to find the missing information. 
For instance, in a conceptually simple system with just 10 random fragments, with 9 of the 10 already sequenced, the chance that sequencing an additional random clone will provide the missing information is just 10%. Instead of generating the missing sequence by sequencing random fragments, a long distance sequencing protocol as described, supra, is used to fill in the missing information. This can dramatically reduce the number of sequencing reactions that need to be performed to generate complete sequence information for a large template. See, Figure 4.
EXAMPLES
The following examples are offered by way of illustration only. One of skill will recognize many parameters and conditions which can be modified to achieve essentially similar results.
Example 1: Anchored PCR Using Vectorettes
In one embodiment, nested DNA fragments around a region of interest were amplified by anchored PCR using a vectorette unit and the fragments were directly sequenced. The procedure is schematically presented in Figures 1 and 2.
Amplification primers were designed in the sequenced regions of the cosmid using the MacVector program (Kodak).
The following oligonucleotides, V-top 5'-GAAGGAGAGGACGCTGTCTGTCGAAGGTAAGGAACGGACGAGAGAAGGGAGAG-3' and V-bottom 5'-CTCTCCCTTCTCGAATCGTAACCGTTCGTACGAGAATCGCTGTCCTCTCCTTC-3', were synthesized, purified and dissolved in distilled water to a final concentration of 4 μM of each. The solution was heated at 68 °C for 10 min and was slowly cooled over 30 min to room temperature to make an annealed vectorette unit (4 μM). See also, Riley, et al. (1990) Nucleic Acids Res. 18: 2887-2890. V-top and V-bottom are complementary to each other except for the middle one-third of the annealed double-stranded vectorette, imparting a bubble-like structure to the vectorette (see also, Figures 1 and 2).
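The bubble structure can be examined directly from the two oligonucleotide sequences. The Python sketch below, offered as an illustration rather than as part of the described procedure, compares V-top with the reverse complement of V-bottom and counts how many bases pair continuously inward from each end. The helper names are assumptions.

```python
V_TOP = "GAAGGAGAGGACGCTGTCTGTCGAAGGTAAGGAACGGACGAGAGAAGGGAGAG"
V_BOTTOM = "CTCTCCCTTCTCGAATCGTAACCGTTCGTACGAGAATCGCTGTCCTCTCCTTC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_arm_lengths(top: str, bottom: str) -> tuple[int, int]:
    """Count how many bases pair continuously inward from each end
    when `top` is annealed to `bottom` (both written 5' -> 3',
    assumed to be of equal length)."""
    partner = reverse_complement(bottom)  # strand that would pair perfectly with bottom
    left = 0
    while left < len(top) and top[left] == partner[left]:
        left += 1
    right = 0
    while right < len(top) - left and top[-1 - right] == partner[-1 - right]:
        right += 1
    return left, right


left_arm, right_arm = paired_arm_lengths(V_TOP, V_BOTTOM)
bubble = len(V_TOP) - left_arm - right_arm
print(f"paired arms: {left_arm} and {right_arm} nt; unpaired centre: {bubble} nt")
```

The count stops at the first mismatch from each end, so any chance matches inside the central region are treated as part of the bubble.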
The M13 sequence-tagged 224 primer (224M13: 5'-TGTAAAACGACGGCCAGTCGAATCGTAACCGTTCGTACGAGAATCGCT-3') was phosphorylated in a 50 μl volume containing 1x kinase buffer [70 mM Tris-HCl (pH 7.6), 10 mM MgCl2, 5 mM dithiothreitol, 1 mM ATP], 30 μM 224M13 and 50 U T4 polynucleotide kinase. The reaction was incubated at 37 °C for 30 min, heated to 68 °C for 10 min, then stored at -20 °C.
Cosmid DNA was extracted from a 1.5 ml overnight culture by an alkaline mini-prep method (See, Sambrook) into 50 μl of distilled water. Two microliters of this cosmid DNA solution were enzymatically digested using Alu I, BsaA I, BstU I, Pal I, Rsa I, Acc I, Afl III, BstY I, Hinc II, Msl I, Tsp45 I, EcoR V, Hpa I, Pvu II, Sca I, Sma I, Ssp I or Stu I (Stratagene and New England Biolabs) in a 20 μl volume including 10 U of enzyme and 1x buffer as recommended by the manufacturers. After a one hour incubation at 37 °C, individual digestions were extracted with phenol/chloroform and precipitated with ethanol. Because Acc I, Afl III, BstY I and Tsp45 I do not give blunt-ended DNA fragments, samples digested with them were blunt-ended by treatment with T4 DNA polymerase in a 50 μl reaction containing 1x buffer [50 mM NaCl, 10 mM Tris-HCl (pH 7.9), 10 mM MgCl2, 1 mM dithiothreitol, 100 nM dNTPs] and 6 U of T4 DNA polymerase at 37 °C for 30 min, followed by extraction with phenol/chloroform and precipitation with ethanol. The other restriction enzymes give blunt-ended DNA, so this step was unnecessary for those digests.
The vectorette unit was ligated to each restriction fragment in a 40 μl reaction volume containing 1x T4 ligase buffer [50 mM Tris-HCl (pH 7.8), 10 mM MgCl2, 10 mM dithiothreitol, 1 mM ATP, 25 μg/ml bovine serum albumin], 1 μl of 4 μM vectorette unit solution and 83 U T4 DNA ligase. After 2 hr at room temperature the reaction was diluted with distilled water to 250 μl and then stored at -20 °C.
The PCR reactions were performed in 100 μl volumes containing 1x XL buffer II (Perkin-Elmer), 1.1 mM Mg(OAc)2, 200 nM dNTPs, 2.5 μl from the vectorette unit-ligated restriction fragment solution, 1 μl of 30 μM phosphorylated 224M13, 1 μl of 30 μM amplification primer and 2 U of rTth XL DNA polymerase. Reactions were heated to 94 °C for 1 min, then amplified for a total of 40 cycles at 94 °C for 30 sec, 55 °C for 30 sec, 68 °C for 4 min; the last 24 cycles used a 15 sec extension per cycle using a thermal cycler (Perkin-Elmer).
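The working concentrations implied by the volumes in this example follow from the usual dilution relation C1·V1 = C2·V2. The short Python sketch below works them out. It assumes the stated reaction volumes are the final volumes, and the function name is illustrative.

```python
def diluted_concentration(stock_conc: float, stock_volume: float, final_volume: float) -> float:
    """C1 * V1 = C2 * V2, solved for C2. Units of the result follow `stock_conc`."""
    return stock_conc * stock_volume / final_volume


# Vectorette unit: 1 ul of 4 uM (4000 nM) stock in the 40 ul ligation,
# then the whole ligation diluted to 250 ul.
vectorette_in_ligation_nM = diluted_concentration(4000.0, 1.0, 40.0)
vectorette_after_dilution_nM = diluted_concentration(vectorette_in_ligation_nM, 40.0, 250.0)

# Each primer: 1 ul of 30 uM (30000 nM) stock in a 100 ul PCR.
primer_in_pcr_nM = diluted_concentration(30000.0, 1.0, 100.0)

print(vectorette_in_ligation_nM, vectorette_after_dilution_nM, primer_in_pcr_nM)
```

On these assumptions the vectorette is present at 100 nM during ligation and 16 nM in the stored solution, and each primer is at 300 nM (0.3 μM) in the PCR.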
Amplified PCR fragments were purified and eluted in 50 μl of distilled water using the Wizard PCR prep kit (Promega). Purified PCR fragments, 25 μl, were subjected to λ exonuclease digestion using the "PCR template prep for ssDNA sequencing" kit (Pharmacia) to yield single stranded DNAs in 25 μl of TE (10 mM Tris-HCl pH 7.6, 1 mM EDTA). Seven microliters of single stranded DNA solutions were subjected to direct fluorescent sequencing using the Taq dye-primer cycle sequencing kit and the 373A DNA sequencer (Perkin-Elmer).
Restriction enzymes with various lengths of recognition sequences (4 bases to 6 bases) were used to obtain various lengths of the restricted fragment around the amplification primer. This provided various lengths of amplified fragments. As expected, the sizes of the amplified fragments were distributed up to 6 kb in length, giving a set of nested fragments suitable for long range sequence determination (Figure 3, panel A). The M13 sequence is at the 5' end of the 224M13 primer, enabling sequencing from that end using a commercially available M13-based dye-primer sequencing kit (Figures 2 and 5). The sequences were quite readable, allowing long sequences to be read in a single run (Figure 3, panel B). The sequence of the M13 primer used in this example was TGTAAAACGACGGCCAGT.
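The relationship between the 224M13 primer and the vectorette oligonucleotides given in this example can be checked with a few lines of Python. The sketch below is illustrative only and its variable names are assumptions. It confirms that the primer begins with the M13 sequence, that the remainder of the primer has the same sequence as a stretch of V-bottom, and that V-top does not contain the complementary binding site.

```python
M13_TAG = "TGTAAAACGACGGCCAGT"
PRIMER_224M13 = "TGTAAAACGACGGCCAGTCGAATCGTAACCGTTCGTACGAGAATCGCT"
V_TOP = "GAAGGAGAGGACGCTGTCTGTCGAAGGTAAGGAACGGACGAGAGAAGGGAGAG"
V_BOTTOM = "CTCTCCCTTCTCGAATCGTAACCGTTCGTACGAGAATCGCTGTCCTCTCCTTC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]


has_m13_tag = PRIMER_224M13.startswith(M13_TAG)
annealing_part = PRIMER_224M13[len(M13_TAG):]  # 3' portion of the primer

# 0-based position at which the 3' portion occurs within V-bottom (-1 if absent).
offset_in_v_bottom = V_BOTTOM.find(annealing_part)

# The primer anneals to the reverse complement of its own sequence.
binding_site = reverse_complement(annealing_part)
site_in_v_top = binding_site in V_TOP
site_in_copy_of_v_bottom = binding_site in reverse_complement(V_BOTTOM)

print(has_m13_tag, offset_in_v_bottom, site_in_v_top, site_in_copy_of_v_bottom)
```

Because the binding site is present only on a strand copied from V-bottom, the primer has nothing to anneal to until first strand synthesis from the internal primer has run through the vectorette, which is consistent with the low background described for this method.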
After the sequences were determined and assembled, new amplification primers were designed based upon the sequenced fragments using the MacVector program and synthesized. The steps beginning from PCR amplification were repeated using the same vectorette-ligated restriction fragment solutions.
Using this method, the sequence of 14 kb out of a 16 kb region of interest was determined from a single small scale cosmid DNA preparation. A region of 2 kb with few restriction sites for the initial set of enzymes required the use of additional restriction enzymes to generate appropriate fragments.
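A rough calculation suggests why enzymes with 4- to 6-base recognition sequences give a spread of fragment sizes, and why a 2 kb stretch can lack sites for a given enzyme. The Python sketch below assumes random sequence with equal base frequencies and fully specified recognition sequences. Real templates, and enzymes with degenerate recognition sequences, will deviate from these figures. The function names are illustrative.

```python
def expected_site_spacing(recognition_length: int) -> int:
    """Mean distance in bp between sites for an unambiguous recognition
    sequence of the given length, in random sequence with equal base frequencies."""
    return 4 ** recognition_length


def probability_no_site(region_length: int, recognition_length: int) -> float:
    """Approximate chance that a region contains no site for one such enzyme,
    treating each possible start position as independent."""
    positions = max(region_length - recognition_length + 1, 0)
    return (1.0 - 4.0 ** -recognition_length) ** positions


print(expected_site_spacing(4), expected_site_spacing(6))
print(round(probability_no_site(2000, 6), 2))
```

Under these assumptions a 4-base cutter is expected to cut every few hundred base pairs and a 6-base cutter every few kilobases, so any single 6-base cutter has a better than even chance of missing a particular 2 kb region; using many enzymes in parallel, or adding more, reduces the chance that a region is left without suitable fragment ends.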
In addition to sequencing a long continuous stretch of DNA, this strategy is suitable to determine nucleotide sequence around a region of known sequence. For example, this method was successfully applied to determine the genomic structure of the transforming growth factor β type II receptor gene using YAC and cosmid clones.
The vectorette unit was originally used to isolate the end fragments from yeast artificial chromosome (YAC) clones because of its high specificity and low background. Using computer-designed amplification primers, false-priming products were rare. The preferred method of this example is characterized by the use of vectorette-mediated anchored PCR by which a series of nested DNA fragments suitable for sequencing are obtained in a single step. The ease of this technique, the need for minimal amounts of DNA and the ability to sequence systematically large nucleotide segments make this method advantageous and preferable to existing methods for many sequencing applications. This method alone or in conjunction with other methods, such as shotgunning techniques, facilitates a wide variety of sequencing projects.
Example 2: Simplified Anchored PCR Using Vectorettes
In one class of embodiments, a simplified protocol was used, omitting phosphorylation of the 224M13 primer, lambda exonuclease treatment, phenol/chloroform extraction, ethanol precipitation, spin column treatment and other optional procedures. This simplified treatment reduced the time needed for preparing PCR samples by up to three hours.
In this class of embodiments, DNA from a selected source (e.g., plasmid, YAC, PAC, BAC, or cosmid) was digested with a multiplicity of enzymes as described, supra, in a 10 μl reaction volume for one hour. The reactions were stopped by heat inactivating the restriction enzymes by incubating the reaction mixtures for 10 minutes at 68 °C. The reactions were then cooled to room temperature. In cases where restriction enzymes giving cohesive ends were used, the ends were blunted by Klenow treatment (1 μl of 10 mM dNTPs and 0.5 units of Klenow were added at room temperature, and incubated for 5 minutes).
The Klenow enzyme was heat inactivated at 68 °C for 10 minutes, followed by cooling to room temperature. To the mixture, 1 μl of 4 nM vectorette, 3 μl of T4 DNA ligase buffer containing 10 mM ATP, 26 μl distilled water and 1 Weiss unit of T4 DNA ligase were added. The mixture was incubated for 1 hour at room temperature. 210 μl of distilled water was then added, and the mixture stored until use.
For the PCR reactions, 224M13 (unphosphorylated) was used with the sequence specific primer and 2.5 μl solution from the stored mixture. After confirming the amplified fragments by agarose gel electrophoresis, the PCR fragments were purified using Wizard PCR preparation kits from Promega, or similar standard methods. The purified DNA was eluted in 50 μl of distilled water. Sequencing reactions were performed using ABI's -21M13 dye primer kit, or a comparable kit (e.g., a dye sequencing kit available from Amersham).
All publications and patent applications cited in this specification are herein incorporated by reference for all purposes as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A method of sequencing a first nucleic acid comprising: (i) providing a plurality of different overlapping double stranded nucleic acids; (ii) ligating a first primer binding site to a first double stranded nucleic acid from the plurality of different overlapping nucleic acids, wherein the ligated primer binding site comprises a non-complementary subsequence, thereby producing a first tagged nucleic acid; (iii) amplifying a portion of the first tagged nucleic acid, wherein the amplification is performed with an internal primer complementary to the first double stranded nucleic acid, thereby providing an amplified first tagged nucleic acid subsequence; and, (iv) performing a dideoxy chain termination reaction with the amplified first tagged nucleic acid subsequence as a template, using a first sequencing primer complementary to one strand of the amplified first tagged nucleic acid subsequence to prime DNA synthesis.
2. The method of claim 1, wherein the plurality of different overlapping nucleic acids is a set of restriction enzyme digests of a large DNA.
3. The method of claim 1, wherein the plurality of different overlapping nucleic acids is a restriction enzyme digest of a large DNA, wherein the method further comprises parallel sequencing of additional nucleic acids from the plurality of overlapping nucleic acids and compilation of the sequences to provide a sequence for the large DNA.
4. The method of claim 1, further comprising sequencing a second nucleic acid in parallel with the first nucleic acid by: (v) attaching a second primer binding site to the second nucleic acid, thereby producing a tagged second nucleic acid; (vi) amplifying the tagged second nucleic acid to produce an amplified second tagged nucleic acid; and, (vii) performing a dideoxy chain termination reaction using the amplified second tagged nucleic acid as a template using a second sequencing primer complementary to the amplified second tagged nucleic acid.
5. The method of claim 4, wherein the method further comprises compiling the sequence of the first and second nucleic acid, thereby generating a third sequence which includes sequence information from both the first and second nucleic acids.
6. The method of claim 4, wherein the second sequencing primer hybridizes to a terminal region of the amplified second tagged nucleic acid subsequence, which terminal region comprises a nucleotide sequence complementary to one strand of the second primer binding site.
7. The method of claim 1, further comprising shotgun subcloning a large DNA, thereby providing random subclones of the large DNA, which random subclones comprise the plurality of different overlapping double stranded nucleic acids of step (i).
8. The method of claim 1, further comprising: shotgun subcloning a large DNA, thereby providing random subclones of the large DNA; sequencing a plurality of said random subclones, thereby providing sequenced random subclones; and, selecting said internal primer of step (iv) to be complementary to a subclone selected from said sequenced random subclones.
9. The method of claim 8, wherein a majority of sequence for said large DNA is determined by sequencing said plurality of random subclones.
10. The method of claim 8, wherein a minority of sequence for said large DNA is determined by sequencing said plurality of random subclones.
11. The method of claim 1, wherein the first sequencing primer hybridizes to a terminal region of the amplified first tagged nucleic acid subsequence, which terminal region comprises a nucleotide sequence complementary to one strand of the first primer binding site.
12. The method of claim 1, wherein the plurality of different overlapping nucleic acids is provided by digesting a large nucleic acid with a multiplicity of different restriction enzymes.
13. The method of claim 1, wherein the first primer binding site comprises a double-stranded vectorette, wherein the vectorette comprises a central non-complementary nucleic acid subsequence.
14. The method of claim 1, wherein the first double stranded nucleic acid is a recombinant nucleic acid.
15. The method of claim 1, wherein the first sequencing primer is selected from the group consisting of the M13 sequencing primer and the SP6 sequencing primer.
16. The method of claim 1, wherein the step of amplifying a portion of the first tagged nucleic acid is performed by the substeps of: (a) denaturing the first double-stranded tagged nucleic acid; (b) hybridizing the internal primer to a complementary amplification primer binding subsequence of the first double stranded nucleic acid, wherein said sequence is 3' to the first primer binding site; (c) performing primer extension of the strand complementary to the internal primer binding subsequence by PCR, with the first amplification primer to prime DNA synthesis, thereby forming a first PCR amplification product; (d) denaturing the first PCR amplification product; (e) hybridizing a first amplification primer to the first PCR amplification product in a region of the first amplification product which is complementary to the first primer binding site; and, (f) performing primer extension of the strand complementary to the first amplification product by PCR, using the first amplification primer to prime DNA synthesis, thereby producing a double stranded amplified first nucleic acid subsequence.
17. The method of claim 16, wherein the first amplification primer is the 224M13 primer.
18. The method of claim 16, wherein the method further comprises the step of digesting the double stranded amplified first nucleic acid with λ exonuclease, thereby producing a single-stranded amplified first nucleic acid subsequence.
19. The method of claim 1, wherein the first double stranded nucleic acid comprises a 5' phosphate group, and wherein the first primer binding site does not comprise a 5' phosphate group.
20. The method of claim 1, wherein the first double stranded nucleic acid comprises a 5' phosphate group, the first primer binding site does not comprise a 5' phosphate group, and wherein the first nucleic acid is ligated to the first primer binding site using a DNA ligase.
21. The method of claim 1, wherein the plurality of different nucleic acids is provided by cloning a large nucleic acid in a vector selected from the group of vectors consisting of a plasmid, a cosmid, a λ phage genome, a P1 plasmid, a bacterial artificial chromosome, and a yeast artificial chromosome, and cleaving the large nucleic acid with a multiplicity of restriction enzymes.
22. The method of claim 1, wherein the method further comprises blunting the ends of the plurality of different double stranded nucleic acids.
23. The method of claim 1, wherein the first nucleic acid is provided by cleaving a cloned nucleic acid with a plurality of different restriction enzymes.
24. A kit for sequencing large nucleic acids, comprising a container, instruction in the use of the kit for performing sequencing of large nucleic acids, a nucleic acid encoding a primer binding site, and a primer which hybridizes to a nucleic acid complementary to the primer binding site, wherein the primer binding site comprises a double-stranded vectorette comprising a central non-complementary nucleic acid subsequence.
25. The kit of claim 24, wherein the kit further comprises a component selected from the group of components consisting of Taq polymerase, rTth DNA polymerase XL, Taq plus DNA polymerase, DNA ligase, a restriction endonuclease, computer software for compiling overlapping sequence information, software for designing PCR primers, DNA kinase, λ exonuclease, PCR reagents, DNA polymerase, Sequenase, subcloning plasmids, and an M13 sequencing primer.
26. A set of PCR reaction mixtures, wherein the reaction mixtures comprise an overlapping series of template nucleic acids, and wherein each reaction mixture in the set of reaction mixtures comprises: a template DNA tagged with a vectorette; a first primer which hybridizes to an internal subsequence of the template DNA; and, a second primer which hybridizes to the vectorette.
27. The set of PCR reaction mixtures of claim 26, wherein each PCR reaction mixture in the set further comprises a composition selected from the group consisting of rTth polymerase XL, nucleotides, and Mg++.
28. The set of PCR reaction mixtures of claim 26, wherein each template DNA found in each of the PCR reaction mixtures in the set is tagged with a vectorette at each end, thereby providing an internal template DNA subsequence, flanked by external vectorette sequences, and wherein each of the vectorettes are covalently attached to only one strand of the internal DNA subsequence.
29. The set of PCR reaction mixtures of claim 28, wherein the vectorettes flanking the internal DNA have the same sequence.
30. The set of PCR reaction mixtures of claim 28, wherein the vectorettes flanking the internal DNA have the same sequence, and wherein each vectorette comprises a subsequence which is complementary to a sequence which is complementary to primer 224M13.
PCT/US1997/008114 1996-05-15 1997-05-14 Methods for sequencing large nucleic acid segments and compositions for large sequencing projects WO1997043449A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU30660/97A AU3066097A (en) 1996-05-15 1997-05-14 Methods for sequencing large nucleic acid segments and compositions for large sequencing projects

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US1756996P 1996-05-15 1996-05-15
US60/017,569 1996-05-15

Publications (1)

Publication Number Publication Date
WO1997043449A1 true WO1997043449A1 (en) 1997-11-20

Family

ID=21783325

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/008114 WO1997043449A1 (en) 1996-05-15 1997-05-14 Methods for sequencing large nucleic acid segments and compositions for large sequencing projects

Country Status (2)

Country Link
AU (1) AU3066097A (en)
WO (1) WO1997043449A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0224126A2 (en) * 1985-11-25 1987-06-03 The University of Calgary Covalently linked complementary oligodeoxynucleotides as universal nucleic acid sequencing primer linkers
WO1993024654A1 (en) * 1992-06-02 1993-12-09 Boehringer Mannheim Gmbh Simultaneous sequencing of nucleic acids
WO1996002673A1 (en) * 1994-07-14 1996-02-01 Amicon, Inc. Method of using mobile priming sites for dna sequencing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BERG E S ET AL: "HYBRID PCR SEQUENCING: SEQUENCING OF PCR PRODUCTS USING A UNIVERSAL PRIMER", BIOTECHNIQUES, vol. 17, no. 5, 1 November 1994 (1994-11-01), pages 896, 898, 900/901, XP000474854 *
HAGIWARA K ET AL: "Long distance sequencer method; a novel strategy for large DNA sequencing", NUCLEIC ACIDS RESEARCH, vol. 24, no. 12, 15 June 1996 (1996-06-15), pages 2460 - 61, XP002041218 *
SIEBERT P ET AL: "An improved method for walking in uncloned genomic DNA", NUCLEIC ACIDS RESEARCH, vol. 23, no. 6, 1995, pages 1087 - 88, XP002041219 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000040755A2 (en) * 1999-01-06 2000-07-13 Cornell Research Foundation, Inc. Method for accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
WO2000040755A3 (en) * 1999-01-06 2001-01-04 Cornell Res Foundation Inc Method for accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
US6534293B1 (en) 1999-01-06 2003-03-18 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
US8367322B2 (en) 1999-01-06 2013-02-05 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
WO2013093530A1 (en) 2011-12-20 2013-06-27 Kps Orvosi Biotechnológiai És Egészségügyi Szolgáltató Kft. Method for determining the sequence of fragmented nucleic acids
WO2017079699A1 (en) * 2015-11-04 2017-05-11 The Broad Institute, Inc. Multiplex high-resolution detection of micro-organism strains, related kits, diagnostics methods and screening assays

Also Published As

Publication number Publication date
AU3066097A (en) 1997-12-05

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN YU AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
CFP Corrected version of a pamphlet front page
CR1 Correction of entry in section i

Free format text: PAT. BUL. 50/97 UNDER INID (60) "PARENT APPLICATION OR GRANT", REPLACE THE EXISTING TEXT BY "(63) RELATED BY CONTINUATION US 60/017569 (CIP) FILED ON 15.05.96"

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 97541078

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA