WO2000028087A1 - A library of modified primers for nucleic acid sequencing, and method of use thereof - Google Patents

A library of modified primers for nucleic acid sequencing, and method of use thereof

Info

Publication number
WO2000028087A1
WO2000028087A1 PCT/US1999/026431 US9926431W WO0028087A1 WO 2000028087 A1 WO2000028087 A1 WO 2000028087A1 US 9926431 W US9926431 W US 9926431W WO 0028087 A1 WO0028087 A1 WO 0028087A1
Authority
WO
WIPO (PCT)
Prior art keywords
primer
nucleotides
sequence
library
primers
Prior art date
Application number
PCT/US1999/026431
Other languages
French (fr)
Inventor
A. Michael Chin
Original Assignee
Sequetech Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sequetech Corporation
Publication of WO2000028087A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6874 Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07H SUGARS; DERIVATIVES THEREOF; NUCLEOSIDES; NUCLEOTIDES; NUCLEIC ACIDS
    • C07H21/00 Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6832 Enhancement of hybridisation reaction
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00 Libraries per se, e.g. arrays, mixtures

Definitions

  • The present invention relates generally to modified oligonucleotide primers, reduced-set libraries of such primers, and methods for their use in polymerase-catalyzed primer-extension reactions.
  • Sanger, or dideoxy, sequencing is currently the sequencing method of choice for DNA sequence determination. It relies upon stable and specific annealing of a single-stranded oligonucleotide (the "sequencing primer") to the template to be sequenced, followed by primer extension using a DNA polymerase.
  • a variation of dideoxy sequencing termed "cycle-sequencing" involves use of each template strand multiple times in each reaction, resulting in signal amplification.
  • One application of cycle sequencing, termed "primer walking", comprises a series of individual cycle sequencing reactions in which the result of each primer extension allows the next, overlapping segment to be sequenced. Approximately 400-800 nucleotides of sequence are reliably determined from each primer extension and then used to design the next walking primer, which is preferably selected to anneal to a sequence within about 50-100 nucleotides of the end of the previously sequenced segment.
  • any given 15 base DNA sequence is expected to exist once every 10^9 bases in a random sequence.
  • Such specific primers are typically used in sequencing reactions and range from 15 to 25 bases in length. There are 4^15 to 4^25 different primers having a length of 15-25 nucleotides, so producing a bank or library of all possible sequencing primers is not practical.
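The counts quoted above follow from having four possible bases at each position, so a primer of length n has 4^n possible sequences. A minimal sketch of that arithmetic follows; the function name `num_sequences` is illustrative and does not come from the source.

```python
def num_sequences(length: int) -> int:
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** length


# 4^15 is roughly 10^9, which is why a given 15-base sequence is expected
# about once per 10^9 bases of random sequence.
for n in (15, 25):
    print(f"{n}-mers: {num_sequences(n):,}")
```

Even the low end of the range is far beyond what could be synthesized and stored as a physical library, which is the motivation for the shorter, stabilized primers described here.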
  • the new primer required for each step of the "walk" is generally synthesized by standard chemical methods, based on the sequence obtained from the preceding step of the "walk".
  • the invention includes, in one aspect, a modified oligonucleotide primer for use in characterizing a selected target sequence via a polymerase-catalyzed primer-extension reaction.
  • a modified oligonucleotide primer is composed of 7-11 nucleotides extending from a 3' end to a 5' end, has a Tm greater than or equal to 35°C, and has a sequence of nucleotide bases complementary to that of the target's selected sequence.
  • At least the three nucleotides closest to the primer's 3' end are natural nucleotides linked by natural phosphodiester linkages, and the remaining primer nucleotides include two adjoining nucleotide base analogs which are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides.
  • such primers have an intercalating agent attached to the 5' end of the primer through a linker effective to allow intercalation of the agent between the two adjacent analog bases.
  • the modified oligonucleotide primer is composed of between 9-11 bases and at least the four nucleotides closest to the 3' end of the oligonucleotide primer are natural nucleotides linked by natural phosphodiester linkages.
  • In a further aspect, the invention is directed to a library of modified oligonucleotide primers from which an oligonucleotide primer can be selected that is effective for use in characterizing a selected target sequence via a polymerase-catalyzed primer-extension reaction.
  • the primer members of the library are composed of from 7-11 nucleotides extending from a 3' end to a 5' end, wherein at least the three nucleotides closest to the 3' end of the primer are natural nucleotides linked by natural phosphodiester linkages and the remaining primer sequence contains one or more stabilizing modifications which result in an overall Tm of primer dissociation from a complementary target sequence of at least about 35°C.
  • the primer members of the oligonucleotide library are chosen such that the probability of at least one primer in the library hybridizing along its entire length with a random sequence contained in a 100mer target sequence is at least 90%.
  • the primer members of the library may be modified in one or more of the following ways:
  • two or more adjoining nucleotides may be base analogs of natural nucleotides which are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides;
  • an intercalating agent may be attached to the 5' end of the primer through a linker that permits intercalation of the agent between adjacent bases in the nucleotide sequence of the primer;
  • the backbone linkages between adjacent nucleotides may be modified in a manner effective to enhance the stability of primer/target duplex formation
  • the 5' end of the primer may be attached to a minor groove binder (MGB) in a manner effective to permit binding of the MGB to the minor groove in the primer/template duplex.
  • the library is a reduced-set library which includes oligonucleotide primers wherein the primers have been selected to remove sequences: that contain dinucleotides of the form TA, AC, and GT, when read in a 5' to 3' direction; that contain repetitive genomic sequences; that promote self-annealing; and which have a Tm that is predicted to be less than about 35°C.
  • the reduced-set library of oligonucleotide primers is further selected to remove at least some sequences that contain dinucleotides of the form AA or TT and/or that contain sequences found in commonly used cloning vectors.
  • the modified oligonucleotide primers of the reduced-set library have between 7-11 bases, wherein at least the three nucleotides closest to the 3' end of each primer are natural nucleotides linked by natural phosphodiester linkages.
  • the oligonucleotide primers of the reduced-set library also have two adjoining nucleotide base analogs effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides and an intercalating agent attached to the 5' end of the primer through a linker effective to allow intercalation of the agent between the two adjacent analog bases.
  • the remaining portion of the primer sequence of the oligonucleotide primers of the reduced-set library have a modified backbone linkage between at least some of the adjacent nucleotides, where the modification is effective to enhance the stability of primer/target duplex formation.
  • the 5' end of the primer is attached to a minor groove binder (MGB) in a manner effective to permit binding of the MGB to the minor groove in the primer/template duplex.
  • Exemplary reduced-set libraries have a primer length of 7, 8, 9, 10 or 11 nucleotides, and the number of primers in the library is about 400-8,000; 1,500-8,000; 6,100-18,000; 25,000-70,000; or 97,000-280,000 primers, respectively.
  • the invention further provides a method of selecting a modified oligonucleotide primer effective for use in characterizing a selected target sequence, via a polymerase-catalyzed primer-extension reaction, where the target has a known sequence of at least 100 nucleotides.
  • the method includes the steps of comparing a subsequence within a known target sequence with the sequences of the primers in a modified oligonucleotide primer library of the invention and selecting the matched-sequence primer for use in a primer-extension reaction.
  • the modified oligonucleotide primer library is a reduced-set library and the method is carried out using a computer interface to select the primer sequence from the library.
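The comparison step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the target is taken to be the strand the primer anneals to, written 5' to 3'; primers are plain A/C/G/T strings written 5' to 3'; and the names `reverse_complement` and `find_matching_primers` are hypothetical, not part of the source.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_matching_primers(target: str, library: set[str], primer_len: int) -> list[tuple[int, str]]:
    """Scan every window of the target and report library primers that anneal there.

    Returns (start_index_in_target, primer_sequence) pairs. A primer anneals to a
    window when it is the reverse complement of that window.
    """
    hits = []
    for start in range(len(target) - primer_len + 1):
        site = target[start:start + primer_len]
        primer = reverse_complement(site)
        if primer in library:
            hits.append((start, primer))
    return hits


# Toy example: a one-primer "library" and a 20-base target.
target = "AAACCCGGGTTTACGTACGA"
library = {"TCGTACGTA"}
print(find_matching_primers(target, library, 9))
```

Storing the library as a set keeps each lookup constant-time, so the scan is linear in the target length regardless of library size.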
  • Figure 1 depicts the structure of an exemplary 9-mer modified oligonucleotide primer of the invention which has 2 modified bases and an acridine pendant group.
  • Figure 2 is a schematic representation of a network 21 interconnecting two clients 22a and 22b and a server 23.
  • Figure 3 is a functional block diagram of a typical computer system 30 that may be used to implement a network client and/or a network server which includes a bus 31 that interconnects a central processing unit (CPU) 32, system memory (RAM) 33, and read-only memory (ROM) 34 and several device interfaces, as further detailed herein.
  • Figure 4 depicts a flow diagram of a method for selection and use of the oligonucleotide primers of the invention in polymerase-catalyzed primer-extension reactions by way of a computer interface.
  • Nucleic acid subunits are referred to herein by their standard base designations: T, thymine; A, adenine; C, cytosine; G, guanine; U, uracil. Variable positions are referred to as described below.
  • the term "naturally occurring nucleotides" means A, C, G, and T for DNA and A, C, G, and U for RNA.
  • nucleoside and nucleotide include those moieties which contain both known purine and pyrimidine bases and heterocyclic bases which have been modified. Such modifications include halogenated purines and pyrimidines, methylated purines or pyrimidines, acylated purines or pyrimidines, or heterocycles with additional fused rings. Modified nucleosides or nucleotides will also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen, aliphatic groups, or are functionalized as ethers, amines, or the like.
  • polynucleotide refers to a polymeric molecule having a backbone which supports bases capable of hydrogen bonding to typical polynucleotides, where the polymer backbone presents the bases linked by phosphodiester bonds in a manner to permit such hydrogen bonding in a sequence specific fashion between the polymeric molecule and a typical polynucleotide (e.g., single-stranded DNA).
  • Polynucleotides include polymers having a polynucleotide which is an N- or C-glycoside of a purine or pyrimidine base, and to other polymers containing non-standard nucleotide backbones, for example, polyamide linkages (e.g., peptide nucleic acids or PNAs), phosphodiamidate morpholine chemistry, and other synthetic sequence-specific nucleic acid molecules providing that the molecules contain nucleobases in a configuration which allows for base pairing and base stacking, such as is found in DNA and RNA.
  • polynucleotide and “oligonucleotide” are used interchangeably herein and refer to the primary structure of the molecule. These terms include modified, variant or substituted nucleic acids (DNA and RNA), both single- and double-stranded.
  • Examples include a nucleic acid sequence comprising: (1) a label, many examples of which are known in the art; (2) methylation or "caps"; (3) a substitution of one or more naturally occurring nucleotides with an analog of a natural nucleotide base; (4) an interbase or backbone modification, such as a modified linkage (e.g., an alpha anomeric nucleic acid, etc.); (5) a pendant moiety, e.g., a protein such as a nuclease, toxin, antibody, signal peptide, poly-L-lysine, etc.; (6) a minor groove binding moiety (e.g., pyrrole/imidazole polyamide, CDPI3, a netropsin or distamycin A analog, etc.); (7) an intercalator (e.g., acridine, psoralen, etc.); (8) a chelator (e.g., a metal, a radioactive metal,
  • Polynucleotides are described as "complementary" to one another when hybridization occurs in an antiparallel configuration between two single-stranded polynucleotides.
  • a double-stranded polynucleotide can be “complementary” to another polynucleotide, if hybridization can occur between one of the strands of the first polynucleotide and the second.
  • Complementarity (the degree that one polynucleotide is complementary with another) is quantifiable in terms of the proportion of bases in opposing strands that are expected to form hydrogen bonds with each other, according to generally accepted base-pairing rules.
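As a rough illustration of quantifying complementarity as a proportion of paired bases, the sketch below aligns two equal-length DNA strands in antiparallel orientation and counts Watson-Crick pairs. It assumes ungapped alignment and natural DNA bases only; the function name `complementarity` is hypothetical.

```python
# Standard Watson-Crick pairs for DNA.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def complementarity(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that form Watson-Crick pairs.

    Both strands are written 5'->3'; strand_b is reversed so that the two
    strands are compared in antiparallel orientation.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum((x, y) in WATSON_CRICK for x, y in zip(strand_a, reversed(strand_b)))
    return paired / len(strand_a)


print(complementarity("ACGT", "ACGT"))  # ACGT is its own reverse complement
print(complementarity("AAAA", "TTTA"))  # three of four positions pair
```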
  • analog with reference to an oligonucleotide means a substance possessing both structural and chemical properties similar to those of an oligonucleotide which has standard or natural structural features.
  • modified with reference to an oligonucleotide means a substance possessing both structural and chemical properties similar to, but not identical to, those of an oligonucleotide which has standard or natural structural features.
  • a modified oligonucleotide primer of the invention is from 7 to 11 nucleotides in length and has a Tm greater than or equal to 35°C.
  • oligonucleotide primer refers to a nucleic acid sequence which may include variant or substituted nucleic acids (DNA and RNA), a modified backbone, a stabilizing pendant group or nucleotide analogs, so long as the oligonucleotide binds to a complementary or near complementary sequence in a manner effective to be extended via a polymerase-catalyzed primer-extension reaction.
  • the primer is constructed so that it is capable of forming stable duplexes under the conditions specified.
  • base stacking refers to the relationship of adjacent bases along a linear DNA molecule, such that as one proceeds along a duplex, adjacent bases are "stacked" above one another, providing a stabilizing effect.
  • natural nucleotide refers to residues, A, C, G, and T for DNA and A, C, G, and U for RNA.
  • standard backbone or “naturally occurring backbone” refers to a phosphodiester linkage which includes deoxyribose (for DNA) or ribose (for RNA) as the backbone or to which the residues, A, C, G, and T for DNA and A, C, G, and U for RNA, or modifications and analogs thereof, are linked.
  • linkage or "interbase linkage” is used interchangeably with the term “backbone”.
  • a polymer having a "modified backbone” is one having other than the standard phosphodiester-linked deoxyribose or ribose for the backbone to which the base residues are linked.
  • a modified backbone as referred to herein has interbase spacing such that appropriate hydrogen bonds can form between an oligonucleotide with a modified backbone and one with a standard phosphodiester backbone.
  • An oligonucleotide having a "chimeric backbone” is one in which the oligonucleotide backbone comprises a segment having a standard backbone and a segment having a modified backbone.
  • An oligonucleotide having a chimeric backbone may also comprise more than one segment of a standard backbone and/or more than one segment of modified backbone, which may be the same or different.
  • the segment of standard backbone and the segment of modified backbone may each be contiguous or the standard and modified backbone segments may be interspersed, e.g., alternating, within one another.
  • degenerate nucleotide and “degenerate base” means that the nucleotide at that position in the sequence may correspond to any one of four nucleotides (A, C, G and T in DNA, or A, C, G and U in RNA), such that the population comprises an equal number of oligonucleotide molecules having each one of the four bases in the particular position.
  • the terms “reduced-set library” and “reduced-set library of oligonucleotide primers” refer to a library of a given size which does not contain all possible sequences of that length. For example, a library of 9mers may contain 9,000 of the 262,144 (4^9) possible 9mers.
  • template DNA refers to a single- or double-stranded deoxyribonucleic acid molecule that contains a segment of nucleotides to be sequenced.
  • the template DNA may be derived from a variety of sources, e.g., directly from biological organisms as genomic DNA or as molecular clones propagated in an appropriate host.
  • the template DNA may be prepared for sequencing by a variety of means, e.g., proteinase K/SDS, chaotropic salts, or the like.
  • the template DNA may come from in vitro amplification procedures such as PCR.
  • hybridizing template region or “hybridizing template nucleotide sequence” or “primer binding site” refers to a primer binding region contained within the template molecule with which a primer will anneal, or form a stable hybrid (or duplex) under desired conditions.
  • hybridization or “annealing” is meant the sequence-specific binding between a primer and a template nucleic acid. It will be appreciated that the binding sequences need not have perfect Watson-Crick complementarity to provide stable hybrids. In many situations, stable hybrids will form where base analogs are paired with naturally occurring bases. They may or may not form hydrogen bonds. Accordingly, as used herein the term “complementary” refers to an oligonucleotide that forms a stable duplex with its “complement” under sequencing conditions.
  • primer refers to a structure comprised of an oligonucleotide, as defined above, that contains a nucleic acid sequence complementary to a nucleic acid sequence present in the template molecule.
  • the polynucleotide regions of a primer may be composed of DNA, and/or RNA, and/or synthetic nucleotide analogs.
  • the backbone of the primer may be a standard backbone or a chimeric backbone.
  • the 3' end of the primer may have any chemical functionality capable of undergoing a DNA polymerase-catalyzed reaction with a nucleotide triphosphate, resulting in an interbase linkage.
  • a “sequencing primer” is an oligonucleotide capable of hybridizing to a segment of the template DNA at a point from which a DNA polymerase can extend the primer by catalyzing the template-dependent addition of nucleotides thereto.
  • An “initiation primer” is a sequencing primer that is capable of specifically hybridizing to a known sequence of nucleotides and can be extended by a DNA polymerase into the segment of the template to be sequenced.
  • Vector primers are special cases of “initiation primers” in which the hybridization site is in the cloning vector.
  • a “walking primer” is a sequencing primer that is capable of specifically hybridizing to a sequence of nucleotides that has been determined either by DNA polymerase extension of an initiation primer or of a previously used walking primer.
  • the term “stability” refers to the binding strength of an oligonucleotide hybridized to its complement.
  • the stability of a primer/template duplex is typically expressed as the “melt temperature” or “Tm” of the duplex, which is the temperature at which 50% of the duplexes are dissociated. "Tm” is used herein with reference to modified oligonucleotides.
  • the term "computer interface” may mean a graphical user interface, a website, or a computer mouse or menu driven mechanism for providing information to, or receiving information from, a computer. It may rely upon various storage media such as magnetic diskettes, tapes or hard drives, or optical storage devices. The information may be conveyed to and from a local computer via serial, parallel, or modem ports, or to or from a remote computer via LANs, WANs or the internet.
  • an oligonucleotide of 9 nucleotides in length has sufficient specificity to hybridize essentially exclusively to its complement.
  • the stability of a hybrid duplex formed between a sequencing primer and the template DNA is also a significant factor which is likely to be limiting.
  • short sequencing primers (e.g., hexamers)
  • Such primers are generally inadequate for cycle sequencing.
  • primers and primer library of the present invention are 7-11 bases in length and can be used individually in sequencing reactions, e.g., cycle sequencing, thus avoiding the use of multiple primers.
  • primers described herein may be used effectively for cycle sequencing.
  • a modified oligonucleotide primer of the invention is from 7 to 11 nucleotides in length and has a nucleotide sequence complementary to that of a subsequence of a selected target polynucleotide sequence and a Tm greater than or equal to 35°C.
  • the sequence of such a modified oligonucleotide primer has the following features: (1) at least the three nucleotides closest to the 3' end of the primer are natural nucleotides linked by natural phosphodiester linkages; (2) among the remaining nucleotides of the primer, two or more adjoining nucleotides are base analogs of natural nucleotide bases that are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides; and (3) an intercalating agent is attached to the 5' end of the primer through a linker effective to allow intercalation of the agent between the two adjacent base analogs.
  • a modified oligonucleotide primer of the invention is from 9 to 11 nucleotides in length and at least the four nucleotides closest to the 3' end of the primer are natural nucleotides linked by natural phosphodiester linkages.
  • Figure 1 shows an exemplary 9-mer containing two adjacent enhanced base stacking base analogs (bold, prime bases) and a 5' attached intercalator annealed to a template sequence.
  • the strands are antiparallel with the bases on opposite strands hydrogen bonded in the standard Watson-Crick fashion.
  • An intercalator is covalently attached, via a flexible linker at the 5' end of the primer.
  • the intercalator has spontaneously intercalated between the modified bases, stabilizing the duplex.
  • the duplex stability of the oligonucleotide primers of the invention is increased by modifications which increase the stability of duplexes formed between such oligonucleotides and a target DNA sequence, without significantly interfering with specificity. These modifications include: (1) non-specific base primer lengtheners (N); (2) base specific strong binders (S); (3) strong binding interbase linkages (L); and (4) stabilizing pendant groups (P).
  • the invention also provides libraries of modified oligonucleotide primers for use in primer extension by way of a polymerase-catalyzed primer-extension reaction.
  • Oligonucleotide primer members of such a primer library are from 7 to 11 nucleotides in length, have a nucleotide sequence complementary to that of a subsequence of a selected target polynucleotide sequence, have at least three nucleotides closest to the 3' end of the primer that are natural nucleotides linked by natural phosphodiester linkages, and the remaining portion of the primer sequence contains one or more modifications resulting in an overall Tm of primer dissociation from a complementary target sequence of at least about 35°C.
  • Preferred modifications include one or more of: (1) two or more adjoining nucleotides which are base analogs of natural nucleotides effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides; (2) an intercalating agent attached to the 5' end of the primer through a linker that permits intercalation of the agent between adjacent bases in the remaining nucleotides of the primer; (3) one or more modified backbone linkages between adjacent nucleotides which are effective to enhance the stability of primer/target duplex formation; and (4) attachment of a minor groove binder (MGB) to the 5' end of the primer.
  • oligonucleotide primer members of a primer library of the invention are chosen such that the probability that at least one primer in the library will hybridize along its entire length with a random sequence contained in a 100mer target sequence is 90% or greater.
  • Library selection further involves the elimination of primers which possess less than optimal properties from the library.
  • a sequencing primer must match the template exactly once to function; primers which do not match do not provide sequence information while primers which match more than once yield unintelligible mixed sequences.
  • the probability of each individual primer matching the template must be high enough that at least one primer from the library matches the desired region in the desired direction, yet low enough that the probability of multiple priming by that primer is negligible.
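The "exactly once" requirement can be checked directly against a known template. The sketch below is an assumption-laden illustration: it considers only perfect matches, treats the template as double-stranded (so priming on either strand counts), and uses hypothetical helper names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_priming_sites(template_top: str, primer: str) -> int:
    """Number of perfect-match priming sites on both strands of a duplex template.

    The primer anneals to the top strand wherever the top strand contains the
    primer's reverse complement, and to the bottom strand wherever the top
    strand contains the primer sequence itself.
    """
    on_top = count_overlapping(template_top, reverse_complement(primer))
    on_bottom = count_overlapping(template_top, primer)
    return on_top + on_bottom


def primes_exactly_once(template_top: str, primer: str) -> bool:
    return count_priming_sites(template_top, primer) == 1


template = "TTTTGACCATGCAAAA"
print(primes_exactly_once(template, "GCATGGTCA"))  # single site on the top strand
print(count_priming_sites(template, "AAAA"))       # one site on each strand
```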
  • Natural DNA sequences are not random. There are clear, statistically significant patterns in biological sequences. Based upon these patterns it is possible to exclude from the library oligonucleotides which would be expected to match natural DNA templates either too frequently or too infrequently.
  • a hairpin consists of a primer basepaired to itself and a dimer consists of two separate, identical primer molecules basepaired with each other. Since Watson-Crick basepairing dictates that the basepaired strands are antiparallel, hairpins and dimers require that the primer contain either interrupted or non-interrupted palindromic sequences to form. Choosing primers which don't interact with themselves then consists of excluding sequences which contain palindromes. Accordingly, primers that promote self-annealing or annealing with other primers of the library are excluded in the formation of a reduced-set library.
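One simple way to screen for the interrupted or uninterrupted palindromes described above is to look for any short stretch whose reverse complement also occurs in the same primer. This is a simplified screen, not the source's procedure: the minimum stem length of 4 is an arbitrary assumption, and loop-length or thermodynamic constraints are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_self_complementarity(primer: str, min_stem: int = 4) -> bool:
    """True if some min_stem-long stretch has its reverse complement in the primer.

    Such a stretch can pair with its partner in the same molecule (hairpin) or
    in a second, identical molecule (dimer). min_stem=4 is an illustrative
    threshold only.
    """
    kmers = {primer[i:i + min_stem] for i in range(len(primer) - min_stem + 1)}
    return any(reverse_complement(kmer) in kmers for kmer in kmers)


print(has_self_complementarity("GAATTCAGG"))  # contains the palindrome AATT
print(has_self_complementarity("GGGGACCCC"))  # interrupted palindrome GGGG...CCCC
print(has_self_complementarity("AAAAAAAAA"))  # no complementary stretch
```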
  • the oligonucleotide primer libraries of the invention are based on standard DNA sequences whose predicted stabilities fall within a central range.
  • Oligonucleotides whose GC content is below 33% or greater than 67% are excluded from the library.
  • Various stabilizing modifications are made to these standard DNA sequences resulting in the modified oligonucleotide primer members of a reduced-set library of the invention.
  • Modified oligonucleotide primers based on standard DNA sequences which possess similar GC content and which have similar stabilizing modifications will have similar overall Tm's. Hence, a collection or library of modified primers chosen in this way are expected to possess fairly uniform Tm's.
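The GC-content window stated above (exclude sequences below 33% or above 67% GC) is straightforward to express in code. A minimal sketch, with hypothetical function names and the thresholds taken from the text:

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_window(seq: str, low: float = 0.33, high: float = 0.67) -> bool:
    """Keep sequences whose GC content lies within [low, high]."""
    return low <= gc_fraction(seq) <= high


for primer in ("AAATTTGCG", "AAAAAAAGC", "GCGCGCGAA"):
    print(primer, f"{gc_fraction(primer):.2f}", passes_gc_window(primer))
```

For a 9-mer this window keeps sequences with 3 to 6 G or C bases, which is consistent with the goal of a library whose members have fairly uniform Tm values.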
  • Additional oligonucleotides can be eliminated from a reduced-set primer library because of experimental constraints. All cloned sequences require a cloning vector and primers which match that vector are typically useless. One reasonable strategy for library selection, therefore, is to exclude primers which match any of the commonly used cloning vectors. Such vector sequences may be removed in the formation of the reduced-set library or excluded while selecting primers from the library, as further described below.
  • a reduced-set library of oligonucleotide primers of the invention has been selected by removal of some sequences, such as sequences: that contain dinucleotides of the form TA, AC, and GT, when read in a 5' to 3' direction; that contain repetitive genomic sequences; that promote self-annealing; and which have a GC content that is below 33% or greater than 67%.
  • the reduced-set library of oligonucleotide primers is further selected to remove at least some sequences that contain dinucleotides of the form AA or TT and/or that contain sequences found in commonly used cloning vectors.
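The dinucleotide criterion can be illustrated by enumerating every sequence of a given length and discarding those containing TA, AC, or GT read 5' to 3'. This sketch applies the exclusion strictly to all such sequences, which is an assumption (the text speaks of removing "some" sequences), and it implements only this one criterion; the names are hypothetical.

```python
from itertools import product

# Dinucleotides named in the text, read 5'->3'.
EXCLUDED_DINUCLEOTIDES = ("TA", "AC", "GT")


def contains_excluded_dinucleotide(seq: str, excluded=EXCLUDED_DINUCLEOTIDES) -> bool:
    return any(dinucleotide in seq for dinucleotide in excluded)


def dinucleotide_filtered_library(primer_len: int) -> list[str]:
    """All sequences of primer_len that contain none of the excluded dinucleotides."""
    candidates = ("".join(bases) for bases in product("ACGT", repeat=primer_len))
    return [seq for seq in candidates if not contains_excluded_dinucleotide(seq)]


survivors = dinucleotide_filtered_library(7)
print(f"{len(survivors)} of {4 ** 7} 7-mers survive the dinucleotide screen")
```

Full enumeration is practical for the 7- to 11-base lengths discussed here, since the largest complete set (11-mers) has about 4.2 million members.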
  • the approximate number of primers predicted to be in such a reduced-set oligonucleotide primer library of the invention is about 400-8,000 for a 7mer; 1,500-8,000 for an 8mer; 6,100-18,000 for a 9mer; 25,000-70,000 for a 10mer; and 97,000-280,000 for an 11mer.
  • a preferred library size is about 1,500 for a 7mer; 6,500 for an 8mer; 9,000 for a 9mer; 35,000 for a 10mer; and 140,000 for an 11mer.
  • This number is "reduced” relative to the number of oligonucleotides in a complete library which is 16,384 for a 7mer; 65,536 for an 8mer; 262,144 for a 9mer; 1,048,576 for a 10mer; and 4,194,304 for an 11mer, respectively.
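To make the degree of reduction concrete, the sketch below recomputes the complete library sizes as 4^n and expresses the preferred sizes quoted above as a fraction of each complete library. The dictionary and function names are illustrative only.

```python
# Preferred reduced-set library sizes quoted in the text, keyed by primer length.
PREFERRED_SIZE = {7: 1_500, 8: 6_500, 9: 9_000, 10: 35_000, 11: 140_000}


def complete_library_size(primer_len: int) -> int:
    """All possible sequences of the given length over four bases."""
    return 4 ** primer_len


def retained_fraction(primer_len: int) -> float:
    """Preferred reduced-set size as a fraction of the complete library."""
    return PREFERRED_SIZE[primer_len] / complete_library_size(primer_len)


for n in sorted(PREFERRED_SIZE):
    print(f"{n}mer: complete={complete_library_size(n):,} "
          f"preferred={PREFERRED_SIZE[n]:,} retained={retained_fraction(n):.2%}")
```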
  • the probability of at least one primer in such a reduced-set library of oligonucleotide primers hybridizing along its entire length with a random sequence contained in a selected 100mer target sequence is 90% or greater.
  • the oligonucleotide primers provided in the reduced-set libraries of the invention contain from 7 to 11 nucleotides.
  • the smallest complete collection of oligonucleotides 8 nucleotides in length includes 65,536 different oligonucleotides.
  • preparing a library of oligonucleotide primers for use in polymerase-catalyzed primer-extension reactions with a significantly smaller number of oligonucleotides would be highly advantageous. It is not necessary to prime sequencing reactions at each position of the template and therefore it is not necessary to include all possible sequences in such a library. Moderately overlapping sequencing runs are in fact preferred because they provide efficient template coverage.
  • a sequencing primer within the last 50 to 100 bases of the previous sequence is desirable.
  • a 9-mer library which contains 8000 primers has a greater than 95% chance of containing a primer which falls within the last 100 bases of a sequence run and is directed in the forward direction. Therefore, for a random sequence, inclusion of only 3% of a complete 9-mer library would still allow for efficient sequencing, e.g., by primer walking.
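The coverage estimate above can be reproduced with a simple independence model: each candidate position in a random sequence matches some library member with probability (library size)/4^n, and positions are treated as independent. That independence, and counting 100 candidate positions in the last 100 bases, are assumptions of this sketch rather than statements from the source; the function name is hypothetical.

```python
def coverage_probability(library_size: int, primer_len: int, positions: int) -> float:
    """Probability that at least one of `positions` random sites matches a library primer.

    Assumes a random template and treats each candidate site as an independent
    trial with success probability library_size / 4**primer_len.
    """
    p_site = library_size / 4 ** primer_len
    return 1.0 - (1.0 - p_site) ** positions


# 8,000 9-mers is about 3% of the 262,144 possible 9-mers.
print(f"fraction of complete library: {8000 / 4 ** 9:.1%}")
print(f"100 candidate positions: {coverage_probability(8000, 9, 100):.3f}")
# If the primer must lie wholly inside the 100-base window, only 92 sites remain.
print(f" 92 candidate positions: {coverage_probability(8000, 9, 92):.3f}")
```

Under this model the result is sensitive to how candidate positions are counted, so the figures should be read as estimates.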
  • the oligonucleotide primers of the reduced-set library have between 9-11 bases wherein at least the four nucleotides closest to the 3' end of each primer are natural nucleotides linked by natural phosphodiester linkages.
  • the oligonucleotide primers of the reduced-set library also have two adjoining nucleotide base analogs effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides and an intercalating agent attached to the 5' end of the primer through a linker effective to allow intercalation of the agent between the two adjacent analog bases.
  • the remaining portion of the primer sequence of the oligonucleotide primers of the reduced-set library have a modified backbone linkage between at least some of the adjacent nucleotides, where the modification is effective to enhance the stability of primer/target duplex formation.
  • the oligonucleotide primer has a minor groove binder (MGB) attached to the 5' end of the primer.
  • the oligonucleotide primers of the invention include nonspecific bases known in the art, such as degenerate bases (D), multiple-binding bases (M) and universal bases (U).
  • a modified oligonucleotide primer comprising a degenerate base may have any of the four bases incorporated at a given base position within the population of primer molecules.
  • the collective primer thus created would have greater stability than the original primer due to its increased length while maintaining the original primer's specificity. In a singly degenerate mixture only one quarter of the primer molecules match the template perfectly and have enhanced stability.
  • An exemplary oligonucleotide primer in a reduced-set primer library of the invention comprises from 6 to 10 specific bases, and from 1 to 5 degenerate bases (D).
  • multiple-binding bases, i.e., base analogs that can pair with two or three natural bases
  • 2-methylaminomethyleneamino-6-methoxyaminopurine and 6H,8H-3,4-dihydro-pyrimido[4,5-c][1,2]oxazin-7-one bind with pyrimidines and purines, respectively (Lin and Brown, 1992).
  • the collective primer can base-pair with any of the four natural bases at that position.
  • only one half of the primer molecules in this mixture will have enhanced stability due to the multiple-binding base. Nevertheless, this constitutes a considerable improvement over the addition of all four specific bases, i.e., the use of a degenerate base position in a primer, since each additional multiple-binding base reduces the effective concentration by only twofold.
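The dilution argument above can be made concrete. In this minimal sketch each degenerate position leaves one quarter of the primer molecules perfectly matched, and each multiple-binding position leaves one half, as the text states. The function name and the assumption that positions combine multiplicatively are illustrative.

```python
def matching_fraction(n_degenerate: int = 0, n_multiple_binding: int = 0) -> float:
    """Fraction of primer molecules in the mixture that match the template perfectly.

    Each degenerate (D) position divides the effective concentration by 4;
    each multiple-binding (M) position divides it by 2.
    """
    return (1 / 4) ** n_degenerate * (1 / 2) ** n_multiple_binding


for k in range(1, 4):
    print(
        f"{k} added position(s): degenerate={matching_fraction(n_degenerate=k):.4f} "
        f"multiple-binding={matching_fraction(n_multiple_binding=k):.4f}"
    )
```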
  • An exemplary oligonucleotide primer in a reduced-set primer library of the invention comprises from 6 to 10 specific bases, and from 1 to 5 multiple-binding bases (M) .
  • a multiple-binding base can be located at any position in the oligonucleotide primer, except in the first 3 base positions at the 3' end of the primer molecule (i.e., the end extended by the polymerase), and is generally not located in the first 6 base positions at the 3' end.
  • universal bases, i.e., base analogs that can pair with all four natural bases
  • There are numerous universal base analogs described in the literature.
  • Some examples of nonspecific base analogs that can be incorporated into a primer to enhance its binding stability include inosine (hypoxanthine), 3-nitropyrrole, 4-nitroindole, 5-nitroindole, 6-nitroindole, 5-nitroindazole, 4,5-imidazoledicarboxamide, 3,5-pyrazoledicarboxamide, 4-nitroimidazole, purine, and the like.
  • An exemplary oligonucleotide primer in a reduced-set primer library of the invention comprises from 6 to 10 specific bases, and from 1 to 5 universal bases (U).
  • a universal base can be located at any position in the oligonucleotide primer, except in the first 3 base positions at the 3' end of the primer molecule (i.e., the end extended by the polymerase), and is generally not located in the first 6 base positions at the 3' end.
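The composition and placement rules for multiple-binding (M) and universal (U) bases can be expressed as a small check. This is a sketch under stated assumptions: the primer is written 5' to 3' as a string, the single letters M and U stand in for the base analogs, and the function name and returned keys are hypothetical.

```python
SPECIFIC = set("ACGT")
NONSPECIFIC = set("MU")  # M = multiple-binding base, U = universal base


def check_nonspecific_placement(primer: str) -> dict:
    """Check a 5'->3' primer string against the stated rules for M and U bases."""
    primer = primer.upper()
    if not set(primer) <= SPECIFIC | NONSPECIFIC:
        raise ValueError("primer may contain only A, C, G, T, M or U")
    n_specific = sum(base in SPECIFIC for base in primer)
    n_nonspecific = len(primer) - n_specific
    return {
        # 6 to 10 specific bases plus 1 to 5 nonspecific bases
        "composition_ok": 6 <= n_specific <= 10 and 1 <= n_nonspecific <= 5,
        # never within the 3 positions at the 3' end
        "allowed": not (NONSPECIFIC & set(primer[-3:])),
        # generally not within the 6 positions at the 3' end
        "preferred": not (NONSPECIFIC & set(primer[-6:])),
    }


print(check_nonspecific_placement("UUACGTACGT"))
print(check_nonspecific_placement("ACGTUACGT"))
```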
  • the stability of the oligonucleotide primers of the invention may be increased through the use of high-affinity base analogs.
  • a number of such base analogs have been described which bind by various mechanisms, e.g., additional hydrogen bonds, enhanced hydrophobic bonding, increased base stacking, and the like.
  • a modified oligonucleotide primer of the invention comprises base analogs at two adjoining nucleotide positions which in most cases are analogs of natural nucleotide bases that are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides. These base analogs may be located at any position in the oligonucleotide primer other than the first 3 nucleotide positions closest to the 3' end of the oligonucleotide.
  • Exemplary base analogs for use in the oligonucleotide primers of the invention include 5-methyl-2'-deoxycytidine, 5-bromo-2'-deoxycytidine, 2-amino-2'-deoxyadenine, 5-(1-propynyl)-2'-deoxyuridine, 5-(4,5-dimethylthiazol-2-yl)-2'-deoxyuridine, 5-(1-propynyl)-2'-deoxycytidine, 7-(1-propynyl)-7-deaza-2'-deoxyguanosine, 7-(1-propynyl)-7-deaza-2'-deoxyadenosine, 5-fluoro-2'-deoxyuridine, 5-bromo-2'-deoxyuridine, 7-deazaadenosine, N2-(imidazolylpropyl)-2'-deoxyguanosine, tricyclic analog
  • the primer will comprise from 1 to 8 strongly binding bases and from 3 to 10 natural bases (A, C, G, or T), where the number of the former plus the number of the latter ranges from 8 to 11.
  • a strongly binding base (S) can be located at any position in the oligonucleotide primer, except within the 3 base positions at the 3' end of the primer molecule
  • the oligonucleotide backbone may be modified to increase duplex stability.
  • Oligonucleotide primers with partial neutral or positively charged backbones are effective in a polymerase-catalyzed primer-extension reaction, e.g., a cycle sequencing reaction. Such modifications may result in reduced repulsion from, or even attraction to, the negatively charged phosphate backbone of the template, or stabilize their duplexes by adopting conformations favorable to annealing.
  • a preferred oligonucleotide primer of the invention comprises nucleotide subunits joined by internucleotide backbone linkages which present the nucleotide bases for hybridization with a target nucleic acid sequence, at temperatures appropriate to cycle sequencing (i.e., for use in a polymerase-catalyzed primer-extension reaction carried out at or above 35°C), and wherein the base sequence of the primer is complementary to portions of a target sequence.
  • Modified backbone linkages between adjacent nucleotides generally comprise linkages effective to enhance the stability of primer/target duplex formation relative to the stability of primer/target duplexes formed using an identical nucleotide sequence under the same hybridization conditions, wherein the nucleotide subunits are linked by natural phosphodiester bonds.
  • Such a modified backbone finds utility in the oligonucleotide primers of the invention so long as the interbase spacing is suitable for the formation of appropriate hydrogen bonds, e.g., Watson-Crick, with a nucleic acid template which has a standard backbone.
  • Oligonucleotides having standard or natural nucleic acid bases attached to non-standard polymer backbones, for example, backbones with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), with negatively charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), and with positively charged linkages (e.g., aminoalkylphosphoramidates, aminoalkylphosphotriesters, guanidine, etc.), may be used as primers for use in extending a target polynucleotide, from a selected target sequence, via a polymerase-catalyzed primer-extension reaction.
  • Oligonucleotide primers which have a combination of one or more of standard or natural nucleic acid bases, non-standard or non-natural bases, standard polymer backbones and non-standard polymer backbones, with or without pendant groups on the 5' end of the primer, find utility in thermostable DNA polymerase-catalyzed primer-extension reactions so long as the resulting oligonucleotides have a Tm greater than or equal to 35°C and are capable of specific hybridization with a given target sequence of interest.
  • these oligonucleotides are generally capable of base pairing in a manner analogous to standard or natural nucleic acids.
  • Exemplary backbone linkages for use in the oligonucleotide primers of the invention have been modified relative to naturally occurring nucleic acid linkages in a manner effective to enhance the stability of primer/target duplex formation.
  • Such exemplary backbone linkages include, but are not limited to, morpholino derivatives, N-(2-aminoethyl)glycine backbones, 2'-O-methyl, 3'-thioformacetal, 5'-amido derivatives, 2'-sugar modifications such as 2'-O-methyl, 2'-O-allyl, 2'-fluoro, 5'-pyrene derivatized phosphate, 5'-N-carbamate, hydroxylamine, methyleneoxy(methylimino), guanidine linkages, and the like, and combinations thereof. See Durand et al., Nucleic Acids Res. 17:1823.
  • such modified interbase linkages can be located at any position in the oligonucleotide primer, except for the linkage between the first 3 bases closest to the 3' end of the primer molecule, and are generally not located between the first 6 bases closest to the 3' end.
  • Molecules which, by their configuration, noncovalently bond with duplex nucleic acids also stabilize such duplexes (Zimmer and Wahnert, 1986). This effect can be magnified by tethering these ligands to the DNA with a short linker.
  • the invention provides a modified oligonucleotide primer library wherein the primers are conjugated to such stabilizing pendant groups, resulting in primers which anneal to the template more stably than unmodified primers without exhibiting increased specificity.
  • a stabilizing pendant group for conjugation to the oligonucleotide primers of the invention can be any molecule which binds to and stabilizes duplex DNA, regardless of the mechanism. Examples of tethered stabilizing molecules which bind DNA by various mechanisms have been described. As will be appreciated by those skilled in the art, a given molecule which serves to stabilize a nucleic acid duplex may interact with DNA through one or more mechanisms.
  • Intercalators are a broad class of compounds which spontaneously insert themselves between consecutive basepairs of duplex nucleic acids, stabilizing the duplex. Intercalation requires a flat geometry and intercalators tend to be planar aromatic ring structures. Many examples of intercalators have been described in the literature, including homodimers and heterodimers of intercalators which possess affinities for duplex DNA that are higher than their monomeric constituents (Glazer and Rye, 1992).
  • Intercalators which are useful in this invention include an intercalator or multimeric form thereof which stabilizes duplex nucleic acids upon intercalation. Oligonucleotides tethered to intercalators which bind to their complement with enhanced stability have been described, e.g., 2-amino anthraquinone (Freier, 1997); 9-aminoellipticine (Vasseur, 1988); [N-(2-hydroxyethyl)phenazinium] (Levina, 1993); 2-methoxy-6-chloro-9-aminoacridine (Asseline, et al.
  • minor groove binding molecules (MGBs)
  • MGBs include a large number of both natural and synthetic compounds.
  • MGBs include: dihydropyrroloindole tripeptide (CDPI3, Kutyavin, 1997); poly(N-methylpyrrole carboxamide) (Sinyakov, 1995); 1,2-dihydro-3H-pyrrolo[3,2-e]indole-7-carboxylate (Lukhtanov, 1996); netropsin (Levina, 1996); and the family of lexitropsins.
  • Lexitropsins are N-methylimidazole analogues of the antiviral agent netropsin, examples of which include but are not limited to: [[1-Methyl-4-[[1-methyl-4-(formylamido)imidazol-2-yl]-carboxamido]pyrrol-2-yl]carboxamido]propionamidine hydrochloride; [[1-Methyl-4-[[1-methyl-4-(formylamido)pyrrol-2-yl]-carboxamido]imidazol-2-yl]carboxamido]propionamidine hydrochloride; [[1-Methyl-4-[[1-methyl-4-(formylamido)imidazol-2-yl]-carboxamido]imidazol-2-yl]carboxamido]propionamidine hydrochloride (Kissinger, 1987) including cross-linked lexitrop
  • pyrrole/imidazole/hydroxypyrrole polyamides are linear polyamides which contain the amino acids N-methylimidazole, N-methylpyrrole, 3-hydroxypyrrole, and gamma-aminobutyric acid. The sequence of these residues within the polyamide backbone determines the binding specificity of this class of MGBs (White, et al., 1998). Oligonucleotides tethered to minor groove binding molecules, which bind to their complement with enhanced stability, have also been described in the literature, e.g., cyclopropapyrroloindole
  • a class of DNA binding molecules which interact by way of electrostatic attraction also find utility as stabilizing pendant groups.
  • Examples include positively charged molecules which appear to be attracted to the poly-anionic phosphodiester backbone of DNA.
  • Examples of oligonucleotides tethered to such molecules, which bind to their complement with enhanced stability, are described in the literature and are exemplified by peptides (Harrison and Balasubramanian, 1998); polyamines (Shinozuka, et al., 1997); and the like.
  • Tethering a stabilizing pendant group to an oligonucleotide primer provides a means to enhance the stability of the duplex by increasing the effective concentration of the oligonucleotide.
  • the site of attachment of the tether to the oligonucleotide can vary and may be the phosphate (e.g., 5', internucleotidic), sugar (e.g., 2', abasic site) or one of the bases (e.g., C5 of uridine).
  • the backbone of the linker contains from 0 to 10 atoms, generally from about 5 to 7 atoms.
  • the tether is flexible enough to allow the pendant group to interact with the DNA duplex and may contain any of a variety of stable linkages (e.g., methylene, ether, peptide and the like or combinations thereof).
  • Such a stabilizing pendant group is generally attached to the 5' end of the oligonucleotide primer molecule.
  • the invention is focused on providing a library of oligonucleotide primers for use in DNA polymerase-catalyzed primer-extension reactions, in particular, cycle sequencing reactions. Having such a library available eliminates the expensive and time-consuming step of chemically synthesizing oligonucleotide primers for each step of the sequencing process.
  • the oligonucleotide primers of the invention find utility as sequencing primers in dideoxy-based sequencing reactions.
  • a library of these primers may function as vector primers, initiation primers, or walking primers.
  • these primers are capable of annealing with the template to form a stable, specific hybrid duplex which exhibits a helical structure, stabilized by base-specific base pairing and base stacking.
  • the overall configuration and dimension of these hybrid duplexes are similar enough to naturally occurring DNA duplexes to be recognized by a DNA polymerase.
  • the oligonucleotides in the library are extended by a DNA polymerase-catalyzed reaction with a nucleotide triphosphate, resulting in an interbase linkage.
  • the invention provides a reduced-set library of oligonucleotide primers as described above.
  • SEQUENCING WITH A REDUCED-SET LIBRARY OF OLIGONUCLEOTIDE PRIMERS
  • A "cycle sequencing" procedure is a method of sequencing a template DNA molecule.
  • An oligonucleotide primer is incubated with the template DNA under conditions in which the primer specifically and stably hybridizes to a specific sequence in the template.
  • a thermostable DNA polymerase, dNTPs and dideoxyNTPs are added to the sequencing reaction mixture under conditions in which the primer is extended from its 3' end as directed by the template DNA.
  • the sequencing reaction process of primer annealing and extension of the primer in the presence of dNTPs/ddNTPs is repeated many times resulting in a linear amplification of the sequencing reaction products, the identity of which is determined using standard methods well known in the art. Repeatedly raising and lowering the reaction temperature in this fashion results in a linear amplification of the sequencing signal.
  • Cycle sequencing allows the use of very small amounts of template, e.g., 50 to 100 ng of single-stranded M13 DNA, 100-300 ng of plasmid DNA, or 1-2 µg of cosmid DNA.
  • the reactions are carried out at temperatures that facilitate sequencing through regions of significant secondary structure as well as the sequencing of double-stranded templates without the need for a separate denaturation step. (See, e.g. U.S. Pat. No. 5,741,640.)
  • next walking primer which is designed to anneal to a sequence within about 50-100 nucleotides of the end of the previously sequenced segment.
  • the results of individual primer extensions allow a next, overlapping region of a nucleic acid to be sequenced.
  • a series of cycle sequencing reactions allows acquisition of extended sequence information ("primer walking"). Primer "walks" frequently begin within vector sequences near unknown cloned sequences, but may begin from any known sequence. In the former case, a "vector primer" annealing to a sequence in the vector adjacent to the cloned DNA is extended by a DNA polymerase into unknown cloned
  • a method of cycle sequencing DNA using such a primer library involves mixing a library sequencing primer with a template DNA to be sequenced, and then incubating the mixture with a DNA polymerase, nucleoside triphosphates and appropriate dideoxynucleoside triphosphate terminators.
  • the dideoxy terminators are fluorescently labeled, the DNA polymerase is thermostable, and the incubation is a "thermocycle".
  • the thermocycle generally comprises a series of temperatures appropriate to: denature the strands of the template (94°C to 98°C); anneal the primer to the template (45°C to 55°C); and extend the primer (55°C to 75°C).
  • temperatures and cycle times of the thermocycling procedure will vary to some extent and can readily be optimized as appropriate to the thermocycler, reagents, etc. It follows that primers having a Tm of about 35°C are effective in cycle sequencing reactions, while primers having a Tm of about 50°C are more effective.
  • the sequencing mixture is exposed to the above "thermocycle" repeatedly, resulting in a linear accumulation of sequencing reaction products and a corresponding amplification of the sequencing signal. Once the reaction has been completed, the terminated extension products are separated according to size to allow determination of the sequence of the template DNA.
  • the invention further relates to selecting a modified oligonucleotide primer from a reduced-set oligonucleotide primer library for use in a polymerase-catalyzed primer-extension reaction and which has a sequence of nucleotide bases complementary to that of a 7-11 nucleotide portion of a known target sequence of about 100 nucleotides.
  • the method comprises matching a subsequence within a selected target polynucleotide sequence with the complementary sequence of a primer in a reduced-set library of oligonucleotide primers, where each primer in the library is characterized as defined above, and selecting a matched-sequence primer for use in the primer-extension reaction.
  • Another aspect of the invention comprises the use of a computer interface and a program of instructions: (1) to facilitate primer selection from a reduced-set library of the invention and (2) as a means to expedite delivery of the primer for use in a polymerase-catalyzed primer extension reaction.
  • the computer interface may be implemented, for example, using a network- or internet-based system for exchanging information between computers.
  • Fig. 2 is a schematic representation of a network 21 interconnecting two clients 22a and 22b, and a server 23. This limited representation is for the ease of illustration only, as the network may (and typically does) include a plurality of both clients and servers.
  • a network client 22a uses network 21 to access resources provided by network server 23 to select a primer and arrange for its expedited delivery.
  • Network server 23 may be a hypermedia server, perhaps operating in conformity with the Hypertext Transfer Protocol (HTTP), although this is not necessary to practice this aspect of the invention.
  • Such paths may be implemented as switched and/or non-switched paths using private and/or public facilities.
  • the topology of the network is not critical and may be implemented in a variety of ways including hierarchical and peer-to-peer networks.
  • the network clients and network server may be locally located with respect to one another and may be implemented on the same hardware.
  • Fig. 3 is a functional block diagram of a typical computer system 30 that may be used to implement a network client 22a, 22b and/or a network server 23.
  • this computer system includes a bus 31 that interconnects a central processing unit (CPU) 32 representing processing circuitry such as a microprocessor, system memory in the form of random-access memory (RAM) 33 and read-only memory (ROM) 34 and several device interfaces.
  • An input controller 35 represents interface circuitry that connects to one or more input devices 36 such as a keyboard and/or mouse.
  • a display controller 37 represents interface circuitry that connects to one or more display devices 38 such as a video display terminal.
  • An I/O controller 39 represents interface circuitry that connects to one or more I/O devices 40 such as a modem or a network connection.
  • a storage controller 41 represents interface circuitry that connects to one or more storage devices 42 such as a magnetic disk or tape drive, optical disk drive or solid-state storage device.
  • a printer controller 43 represents interface circuitry that connects to one or more printer devices 44 such as a laser or ink-jet printer. No particular type of computer system is critical to practice this aspect of the present invention.
  • a program of instructions which may be executed on either the client or server side, or portions on each side, controls the interaction and exchange of information between the client and server to enable a user to select a primer and arrange for its expedited delivery.
  • the sequence comparisons of the primer selection process may be performed on the client side by downloading the instructions from the server or using a disk to load the instructions into the client computer.
  • the comparisons may be performed interactively, with software residing on the server and the user transmitting relevant sequences to the server.
  • the program of instructions may be carried by any computer-readable medium including various magnetic media such as a disk or tape, various optical media such as a compact disc, network paths such as broadband or baseband transmission paths, as well as other communication paths throughout the electromagnetic spectrum, including a carrier wave encoded to transmit the program of instructions.
  • the term computer-readable medium as used herein is intended to cover all such media that carry or transmit the program of instructions.
  • various aspects of the computer instructions may be implemented with functionally equivalent hardware using discrete components, such as application specific integrated circuits (ASICs), or the like.
  • Fig. 4 is a flow chart illustrating an exemplary application of the computer-implemented primer selection process.
  • a user inputs a target sequence of approximately 100 bases into the computer, e.g., a network client, as the starting point for sequencing.
  • a reduced-set library of the invention is presented to the user through a computer interface conveyed by a program of instructions carried, e.g., on a disk or transmitted over the internet.
  • the reduced-set library may be a database containing, e.g., 9000 9-mers.
  • a sequence comparison algorithm is executed which compares the inputted target sequence of about 100 nucleotides to such a reduced-set library to determine if there is a match (step 404).
  • the comparison may be carried out using an algorithm effective to select one or more appropriate oligonucleotide primers from the reduced-set library, each of which has a sequence of nucleotide bases complementary to that of a 7-11 nucleotide portion of the target sequence.
  • the comparison results in selection of one or more
  • in step 405, the user inputs a known sequence to be excluded from the selected primer sequences, e.g., a previously determined sequence.
  • in step 406, a sequence comparison algorithm is executed which compares the inputted sequences to be excluded to the target-matched oligonucleotide primers to determine if there is a match (step 407).
  • the comparison may be carried out using an algorithm effective to eliminate selected oligonucleotide primers which match the inputted sequence, resulting in selection of one or more oligonucleotide primers effective for use in extending the primer via a polymerase-catalyzed primer-extension reaction, where each primer has a greater than 90% probability of hybridizing along its entire length with a random sequence contained in the selected 100mer target sequence.
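The matching and exclusion steps described above (steps 404 through 407) can be sketched as two string searches. The source does not specify the comparison algorithm, so everything here is an assumption: the function names are hypothetical, the target is treated as the strand the primer anneals to (so a primer matches where its reverse complement occurs), and a primer is excluded when that same binding site also occurs in the excluded sequence.

```python
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def select_primers(
    target: str, library: Iterable[str], excluded: str = ""
) -> List[Tuple[str, int]]:
    """Return (primer, position) pairs for library primers usable on the target."""
    target = target.upper()
    excluded = excluded.upper()
    hits = []
    for primer in sorted(library):
        site = reverse_complement(primer)  # sequence the primer would anneal to
        position = target.find(site)
        if position == -1:
            continue  # no match in the target (step 404)
        if excluded and site in excluded:
            continue  # binding site also found in the excluded sequence (step 407)
        hits.append((primer, position))
    return hits


library = {"TGTAATCGG", "AAAAAAAAA", "TAGCCTGTA"}
target = "GGATCCGATTACAGGCTA"
print(select_primers(target, library))
print(select_primers(target, library, excluded="TTCCGATTACATT"))
```

A linear scan is adequate for a library of a few thousand short primers against a target of about 100 bases. A production system might index the library instead.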
  • the user may accept or reject the selected sequence, or in the case where more than one sequence has been selected, accept one or more and reject the others.
  • the user indicates acceptance in step 409, e.g., by either (1) clicking a button available through the computer interface, (2) ordering by telephone, or (3) ordering by fax, any of which results in express delivery of the ready-made oligonucleotide primer(s) by the library provider to the user in step 410, preferably resulting in receipt of the oligonucleotide primer(s) by the user on the following day.
  • the user also provides payment, e.g., in the form of a credit card number, to the library provider.
  • in step 404, in the very unlikely event that there is no match, the user may input a new target sequence in step 411 or exit the system. Similarly, if the user does not accept any of the selected sequences in step 407, the user may begin again or exit the system.
  • in each step of the sequencing process, approximately 400-800 nucleotide bases are reliably sequenced from a single primer extension.
  • the sequence of about 100 nucleotides on the 3' end of this sequence serves as the next target sequence for design of the next primer, which may be selected from a reduced-set library of the invention, as described above.
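The walking step above can be sketched as follows. The window of 100 nucleotides comes from the text. The function names and the step estimate are illustrative assumptions: the estimate supposes that each new read advances by at least the read length minus the window, which is a conservative simplification.

```python
import math


def next_target(read: str, window: int = 100) -> str:
    """Return the 3'-terminal window of a read, used to pick the next primer."""
    return read[-window:]


def estimate_walk_steps(clone_length: int, read_length: int = 500, window: int = 100) -> int:
    """Rough number of primer extensions needed to walk across a clone.

    Assumes every read yields `read_length` bases and the next primer
    sits inside the last `window` bases of the previous read.
    """
    if read_length <= window:
        raise ValueError("read_length must exceed window")
    if clone_length <= read_length:
        return 1
    advance = read_length - window
    return 1 + math.ceil((clone_length - read_length) / advance)


read = "A" * 450 + "C" * 100
print(len(next_target(read)))
print(estimate_walk_steps(5000, read_length=500))
```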
  • the computer interface facilitates efficient primer selection and delivery of ready-made primers by the library provider, resulting in the ability to obtain sequencing information more quickly and in a cost-effective manner.
  • the process of primer walking may be automated even further.
  • All of the basic steps of cycle sequencing can be or have been automated.
  • Pipetting robots are available which are capable of setting up sequencing reactions.
  • Thermocyclers and capillary fluorescence sequencers are in routine use for automatically incubating and analyzing sequencing reactions.
  • Software exists to both interpret the output of fluorescence sequencers and to select sequencing primers from that sequence.
  • a readily available primer library along with appropriate software allows the pipetting robot to initiate the next step of the primer walk.
  • a further application of the oligonucleotide primers and primer libraries of the invention is in random or "shotgun" sequencing.
  • the sequence of large DNA clones can be determined by "shotgun" sequencing using randomly selected oligonucleotide primers which have been stabilized (modified), as described herein.
  • Successfully primed reactions yield sequences, which can be accumulated in substantial numbers and assembled by computer, or alternatively, the initial sequences may serve as a starting point for bi-directional primer walking, as further described herein.
  • Randomly chosen library primers may be used individually or in small sets (i.e., 10 or less) to enhance the probability of obtaining sequence information from a given reaction.
  • the primers in the set may also be used individually to obtain single sequences.
  • approximately 5 random primers selected from a reduced-set library of the invention are used simultaneously in a polymerase-catalyzed primer extension reaction, as described herein. (See, e.g., Messing, et al., 1991.)
  • the invention may also be employed in biochip screening using immobilized oligonucleotide primers of the invention. Sequencing of a cloned DNA sequence using a reduced-set oligonucleotide primer library of the invention may be accelerated by pre-screening the library primers to identify all primers which match somewhere in the clone to be sequenced. A particularly efficient way to do this is to label the cloned DNA and hybridize it with a biochip containing the entire reduced-set library according to protocols known in the art. All immobilized library primers which retain the label are complementary to some portion of the sequence of the clone.
  • the library primers selected by this type of pre-screen can be further screened, possibly by computer, to eliminate primers which match known sequences derived from the vector or other known sequences. The remaining primers are good candidates for random or "shotgun" sequencing. (See, e.g., Pease, et al., 1994; Southern, et al., 1992.)
  • a known sequence is extended from a newly synthesized primer that primes near the end of the known sequence.
  • primers are typically 15-25 bases in length and anneal to their complement with extreme specificity. Being essentially unique, such sequencing primers can be used only once. Synthesis of a new primer for each step results in expense and delay.
  • the present invention provides a reduced-set library of ready-made oligonucleotide primers and methods for the use of such primers in polymerase-catalyzed primer extension reactions, that provide the advantages of (1) shorter sequences (from 7-11 bases in length) which are useful for sequencing many distinct clones; (2) tighter binding than natural oligonucleotide primers of the same length, effective for use in polymerase-catalyzed primer-extension reactions; (3) a greater than 90% probability of having a primer which binds to a region of any given 100mer sequence; (4) primers that are ready-made; and (5) a means for selecting such a primer, using a computer interface.
  • These aspects of the invention provide improved efficiency and lower the cost of the sequencing process.
  • oligonucleotide primers of the invention, libraries of such primers, and methods for their use are generally applicable to any nucleic acid which is to be sequenced.
  • the Robocycler Thermocycler (Stratagene, La Jolla, CA) was used with the following standard thermal cycle for all primers: rapid thermal ramp to 96°C; 96°C for 45 seconds; rapid thermal ramp to 50°C; 50°C for 30 seconds; rapid thermal ramp to 60°C; 60°C for 4 minutes; repeated for 25 cycles.
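The standard thermal cycle above can be written down as data and its total hold time computed. In this sketch the step labels (denature, anneal, extend) are inferred from the general thermocycle description and are not stated for this protocol. Ramp times are not given in the text and are left out, and the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    name: str      # inferred label, not stated in the protocol
    temp_c: int    # hold temperature in degrees Celsius
    seconds: int   # hold time in seconds


CYCLE: Tuple[Step, ...] = (
    Step("denature", 96, 45),
    Step("anneal", 50, 30),
    Step("extend", 60, 4 * 60),
)
N_CYCLES = 25


def total_hold_seconds(cycle: Tuple[Step, ...] = CYCLE, n_cycles: int = N_CYCLES) -> int:
    """Total time spent at hold temperatures, excluding thermal ramps."""
    return n_cycles * sum(step.seconds for step in cycle)


print(f"hold time: {total_hold_seconds() / 60:.2f} minutes over {N_CYCLES} cycles")
```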
  • Table 1 shows the results of cycle-sequencing with a series of 9-mers which are members of the general class of short oligonucleotides containing two adjacent enhanced base stacking base analogs and a 5' attached intercalator.
  • This group of primers contains the modified bases 5-methyl-2'-deoxycytidine, 2-amino-2'-deoxyadenine, and 5-(1-propynyl)-2'-deoxyuridine in various combinations and various positions (i.e., at the 1st and
  • the primers were used to sequence 3 different templates, and with one exception, all of the reactions provided sufficient signal to obtain at least 500 bases of sequence (with 330 bases obtained in the one exception).
  • X 2-methoxy-6-chloro-acridine; lower case bases indicate modified bases signal minus background fold stimulation over acridine only bases
  • a standard full-set library of 9mers will contain about 262,144 different oligonucleotides.
  • about 9000 oligonucleotide primers are found in a 9mer library.
  • Such a reduced-set library of oligonucleotide primers may be designed by starting with biological sequences, e.g., eukaryotic, prokaryotic, and archaeal sequences found in GenBank, as a raw source of 9mer sequences, followed by removal of selected sequences.
  • all 9-mers capable of self-annealing, such as palindromes, are screened out using appropriate software (e.g., OLIGO™); repetitive genomic sequences and sequences containing dinucleotides of the form TA, AC and GT, when read in a 5' to 3' direction, are also screened out; and 9-mers possessing a GC content of less than 33% or greater than 67% are eliminated.
  • other sequences, such as vector sequences and low complexity sequences such as mobile elements, are also screened out.
  • Minimum, preferred and maximum library sizes for exemplary reduced-set libraries of 7, 8, 9, 10 and 11mers are provided in Table 2.

Abstract

Modified oligonucleotide primers, libraries of such primers and methods for their use in characterizing a selected target sequence via a polymerase-catalyzed primer-extension reaction are described. The primers are from 7-11 bases in length, have a Tm greater than or equal to 35°C and are effective for use in characterizing a selected target which has a known sequence of at least about 100 nucleotides. A method of selecting a primer from a ready-made reduced-set library of such oligonucleotides using a computer interface is also described.

Description

A LIBRARY OF MODIFIED PRIMERS FOR NUCLEIC ACID SEQUENCING, AND
METHOD OF USE THEREOF
References
Addess, K.J., and Feigon, J., Nucleic Acids Res 22(24):5484-5491, (1994).
Asseline, U., et al., Proc Natl Acad Sci USA 81(11):3297-3301, (1984).
Azhikina, T., et al., DNA Seq 6(4):211-216, (1996).
Azhikina, T., et al., Proc Natl Acad Sci USA 90(24):11460-11462, (1993).
Bazile, D., et al., Nucleic Acids Res 17(19):7749-7759, (1989).
Blocker, H., and Lincoln, D.N., Comput Appl Biosci 10(2):193-197, (1994).
Bock, J.H., and Slightom, J.L., Biotechniques 19(1):60-62, (1995).
Breslauer, K.J., et al., Proc Natl Acad Sci USA 83(11):3746-3750, (1986).
Burbelo, P.D., and Iadarola, M.J., Biotechniques 16(4):645-646, (1994).
Burge, C., et al., Proc Natl Acad Sci USA 89(4):1358-1362, (1992).
Chen, Y.H., et al., J Biomol Struct Dyn 14(3):341-355, (1996).
Durand, M., et al., Nucleic Acids Res 17(5):1823-1837, (1989).
Freier, S.M., and Altmann, K.H., Nucleic Acids Res 25(22):4429-4443, (1997).
Geierstanger, B.H., and Wemmer, D.E., Annu Rev Biophys Biomol Struct 24:463-493, (1995).
Ghiso, N.S., et al., Genomics 17(3):798-799, (1993).
Glazer, A.N., and Rye, H.S., Nature 359(6398):859-861, (1992).
Hardin, S.H., et al., Genome Res 6(6):545-550, (1996).
Harrison, J.G., and Balasubramanian, S., Nucleic Acids Res 26(13):3136-3145, (1998).
Hou, W., and Smith, L.M., Anal Biochem 221(1):136-141, (1994).
Johnson, A.F., et al., Anal Biochem 241(2):228-231, (1996).
Kaczorowski, T., and Szybalski, W., Anal Biochem 221(1):127-135, (1994).
Kieleczawa, J., et al., Science 258(5089):1787-1791, (1992).
Kissinger, K., et al., Biochemistry 26(18):5590-5595, (1987).
Kotler, L., et al., Biotechniques 17(3):554-559, (1994).
Kumar, S., et al., Nucleic Acids Res 26(3):831-838, (1998).
Kutyavin, I.V., et al., Nucleic Acids Res 25(18):3718-3723, (1997).
Levina, A.S., et al., Antisense Nucleic Acid Drug Dev 6(2):75-85, (1996).
Levina, A.S., et al., Bioconjug Chem 4(5):319-325, (1993).
Lin, P.K., and Brown, D.M., Nucleic Acids Res 20(19):5149-5152, (1992).
Lukhtanov, E.A., et al., Bioconjug Chem 7(5):564-567, (1996).
Maxam, A.M., and Gilbert, W., Proc Natl Acad Sci USA 74(2):560-564, (1977).
Maxam, A.M., and Gilbert, W., Methods Enzymol 65(1):499-560, (1980).
Ono, A., et al., Bioconjug Chem 4(6):499-508, (1993).
Raja, M.C., et al., Biotechniques 23(3):362-368, (1997).
Ruiz-Martinez, M.C., et al., Biotechniques 20(6):1058-1064, (1996).
Sanger, F., et al., Proc Natl Acad Sci USA 74(12):5463-7, (1977).
Shinozuka, K., et al., Nucleic Acids Symp Ser 37:215-216, (1997).
Siemieniak, D.R., and Slightom, J.L., Gene 96(1):121-124, (1990).
Sinyakov, A.N., et al., J. Am. Chem. Soc. 117:4995-4996, (1995).
Slightom, J.L., et al., Biotechniques 17(3):536-537, (1994).
Studier, F.W., Proc Natl Acad Sci USA 86(18):6917-6921, (1989).
Szybalski, W., Gene 90(1):177-178, (1990).
Vasseur, J.J., et al., Biochem Biophys Res Commun 152(1):56-61, (1988).
White, S., et al., Nature 391(6666):468-471, (1998).
Wiederholt, K., et al., Bioconjug Chem 8(2):119-126, (1997).
Zimmer, C., and Wahnert, U., Prog Biophys Mol Biol 47(1):31-112, (1986).
Field Of The Invention
The present invention relates generally to modified oligonucleotide primers, reduced-set libraries of such primers, and methods for their use in polymerase-catalyzed primer-extension reactions.
Background Of The Invention
Efforts are underway throughout industry and academia to obtain the DNA sequence of the entire genome of various life forms. The human genome includes about 3 x 10^9 base pairs, rendering the sequencing of it a formidable task.
Current nucleotide sequencing methods involve generating nucleic acid fragments which are labeled (radioactively, fluorescently or chemically) and resolved according to size using gel electrophoresis. (See, e.g., Maxam & Gilbert, 1977; 1980; and Sanger et al., 1977.)
Sanger, or dideoxy, sequencing (Sanger et al., 1977) is currently the method of choice for DNA sequence determination. It relies upon stable and specific annealing of a single-stranded oligonucleotide (the "sequencing primer") to the template to be sequenced, followed by primer extension using a DNA polymerase. A variation of dideoxy sequencing termed "cycle-sequencing" involves use of each template strand multiple times in each reaction, resulting in signal amplification. One application of cycle sequencing, termed "primer walking", comprises a series of individual cycle sequencing reactions in which the result of each primer extension allows the next, overlapping segment to be sequenced. Approximately 400-800 nucleotides of sequence are reliably determined from each primer extension and are then used to design the next walking primer, which is preferably selected to anneal to a sequence within about 50-100 nucleotides of the end of the previously sequenced segment.
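The placement rule for the next walking primer can be illustrated with a minimal Python sketch. The function name and the 0-based, end-exclusive index convention are assumptions made for illustration; only the 50-100 nucleotide offsets come from the text.

```python
def walking_primer_window(known_length, min_offset=50, max_offset=100):
    """Return (start, end) indices, 0-based and end-exclusive, of the region
    lying min_offset..max_offset nucleotides from the 3' end of the
    previously sequenced segment. Hypothetical helper, not from the source."""
    start = max(0, known_length - max_offset)
    end = max(0, known_length - min_offset)
    return start, end


# A 600-nucleotide read would place the next primer site in bases 500-549.
print(walking_primer_window(600))
```

Candidate primer sites for the next step of the walk would then be searched only inside this window.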
Statistically, any given 15-base DNA sequence is expected to occur about once every 10^9 bases in a random sequence. Primers of this specificity are typically used in sequencing reactions and range from 15 to 25 bases in length. There are 4^15 to 4^25 different primers having a length of 15-25 nucleotides, so producing a bank or library of all possible sequencing primers is not practical. As a result, when using a 15mer to 25mer, the new primer required for each step of the "walk" is generally synthesized by standard chemical methods, based on the sequence obtained from the preceding step of the "walk".
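The counting argument can be checked with a few lines of Python. This is only an arithmetic illustration; the function name is hypothetical.

```python
def n_sequences(length):
    """Number of distinct sequences of a given length over the bases A, C, G, T."""
    return 4 ** length


for n in (9, 15, 25):
    print(f"{n}-mers: {n_sequences(n):,}")
```

A full library of 15mers would therefore already need more than a billion distinct oligonucleotides, whereas the full set of 9mers numbers 262,144.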
One proposed solution to this problem is a reduced library size of short standard DNA primers, which could be prepared in advance and selected as needed. [See, e.g., Studier, 1989; Siemieniak and Slightom, 1990; Blocker and Lincoln, 1994; Burbelo and Iadarola, 1994; Slightom, et al., 1994; Hardin, et al., 1996; Bock and Slightom, 1995.]
Another proposed solution is the use of composite primers, which are combinations of small primer library members that anneal consecutively along a template to create longer, more specific sequencing primers. [See, e.g., Szybalski, 1990; Raja, et al., 1997; Kieleczawa, et al., 1992; Hou and Smith, 1994; Johnson, et al., 1996; Ruiz-Martinez, et al., 1996; Kotler, et al., 1994; Azhikina, et al., 1993; Ghiso, et al., 1993; Kaczorowski and Szybalski, 1994; U.S. Patent No. 5,114,839.]
Despite considerable work, these approaches generally: (1) exhibit poor stability of the primer-template duplex, since standard sequencing primers shorter than about 10 bases generally do not bind with sufficient stability at the temperatures required for cycle sequencing; (2) focus on single-stranded templates, and do not work as well on double-stranded templates; and (3) suffer from less than optimal success rates. (See, e.g., Johnson, et al., 1996; Azhikina, et al., 1996.) Accordingly, there remains a need for improved primers, libraries of such primers and methods for efficient and cost-effective sequencing of DNA.
Summary Of The Invention
The invention includes, in one aspect, a modified oligonucleotide primer for use in characterizing a selected target sequence via a polymerase-catalyzed primer-extension reaction. Such a modified oligonucleotide primer has from 7 to 11 nucleotides extending from a 3' end to a 5' end, has a Tm greater than or equal to 35°C and has a sequence of nucleotide bases complementary to that of the target's selected sequence.
In general, at least the three nucleotides closest to the primer's 3' end are natural nucleotides linked by natural phosphodiester linkages, and the remaining primer nucleotides include two adjoining nucleotide base analogs which are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides. In addition, such primers have an intercalating agent attached to the 5' end of the primer through a linker effective to allow intercalation of the agent between the two adjacent analog bases.
In one preferred embodiment of this aspect of the invention, the modified oligonucleotide primer is composed of between 9-11 bases and at least the four nucleotides closest to the 3' end of the oligonucleotide primer are natural nucleotides linked by natural phosphodiester linkages.
A further aspect of the invention is directed to a library of modified oligonucleotide primers from which can be selected an oligonucleotide primer effective for use in characterizing a selected target sequence via a polymerase-catalyzed primer-extension reaction.
The primer members of the library are composed of from 7-11 nucleotides extending from a 3' end to a 5' end, wherein at least the three nucleotides closest to the 3' end of the primer are natural nucleotides linked by natural phosphodiester linkages and the remaining primer sequence contains one or more stabilizing modifications which result in an overall Tm of primer dissociation from a complementary target sequence of at least about 35°C. The primer members of the oligonucleotide library are chosen such that the probability of at least one primer in the library hybridizing along its entire length with a random sequence contained in a 100mer target sequence is at least 90%. The primer members of the library may be modified in one or more of the following ways:
(1) two or more adjoining nucleotides may be base analogs of natural nucleotides which are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides;
(2) an intercalating agent may be attached to the 5' end of the primer through a linker that permits intercalation of the agent between adjacent bases in the nucleotide sequence of the primer;
(3) the backbone linkages between adjacent nucleotides may be modified in a manner effective to enhance the stability of primer/target duplex formation; and
(4) the 5' end of the primer may be attached to a minor groove binder (MGB) in a manner effective to permit binding of the MGB to the minor groove in the primer/template duplex.
In general, the library is a reduced-set library which includes oligonucleotide primers wherein the primers have been selected to remove sequences: that contain dinucleotides of the form TA, AC, and GT, when read in a 5' to 3' direction; that contain repetitive genomic sequences; that promote self-annealing; and which have a Tm that is predicted to be less than about 35°C.
In some cases, the reduced-set library of oligonucleotide primers is further selected to remove at least some sequences that contain dinucleotides of the form AA or TT and/or that contain sequences found in commonly used cloning vectors.
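The sequence-based screens described above can be sketched in Python as a simple keep/discard filter. This is a minimal illustration under stated assumptions, not the procedure actually used to build the library: the function names are hypothetical, the 4-base window used to flag self-annealing is an arbitrary choice, and the Tm and repetitive-sequence screens are omitted because they need thermodynamic data and a repeat database. The 33-67% GC bounds are taken from the description of the 9mer library elsewhere in this document.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
EXCLUDED_DINUCLEOTIDES = ("TA", "AC", "GT")  # read 5' to 3'


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def may_self_anneal(seq, window=4):
    """Flag primers in which any `window`-base stretch has its reverse
    complement somewhere in the same primer (assumed, simplified criterion)."""
    return any(
        reverse_complement(seq[i:i + window]) in seq
        for i in range(len(seq) - window + 1)
    )


def keep_primer(seq, gc_min=0.33, gc_max=0.67):
    """Apply the dinucleotide, GC-content and self-annealing screens."""
    if any(dinuc in seq for dinuc in EXCLUDED_DINUCLEOTIDES):
        return False
    if not gc_min <= gc_fraction(seq) <= gc_max:
        return False
    return not may_self_anneal(seq)


candidates = ["GGATTCCAG", "ACGGCTGCA", "GGAATTCCG", "AAAAAAAAG"]
print([seq for seq in candidates if keep_primer(seq)])
```

In this toy run the second candidate is dropped for containing AC, the third for the self-complementary stretch AATT, and the fourth for low GC content.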
In general, the modified oligonucleotide primers of the reduced-set library have between 7-11 bases, wherein at least the three nucleotides closest to the 3' end of each primer are natural nucleotides linked by natural phosphodiester linkages.
In some cases, the oligonucleotide primers of the reduced-set library also have two adjoining nucleotide base analogs effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides and an intercalating agent attached to the 5' end of the primer through a linker effective to allow intercalation of the agent between the two adjacent analog bases.
In other cases, the remaining portion of the primer sequence of the oligonucleotide primers of the reduced-set library have a modified backbone linkage between at least some of the adjacent nucleotides, where the modification is effective to enhance the stability of primer/target duplex formation.
In still other cases, the 5' end of the primer is attached to a minor groove binder (MGB) in a manner effective to permit binding of the MGB to the minor groove in the primer/template duplex.
Exemplary reduced-set libraries have a primer length of 7, 8, 9, 10 or 11 nucleotides, and the number of primers in the library is about 400-8,000; 1500-8,000; 6,100-18,000; 25,000-70,000; or 97,000-280,000 primers, respectively.
The invention further provides a method of selecting a modified oligonucleotide primer effective for use in characterizing a selected target sequence via a polymerase-catalyzed primer-extension reaction, where the target has a known sequence of at least 100 nucleotides.
The method includes the steps of comparing a subsequence within a known target sequence with the sequences of the primers in a modified oligonucleotide primer library of the invention and selecting the matched-sequence primer for use in a primer-extension reaction.
In general, the modified oligonucleotide primer library is a reduced-set library and the method is carried out using a computer interface to select the primer sequence from the library.
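A computer-assisted selection of this kind might look like the following Python sketch. It assumes that the library is held as a set of primer sequences written 5' to 3', that a primer anneals to a site whose reverse complement equals the primer, and that only primers with exactly one site in the known sequence are useful. The function names and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def select_primers(target, library, k=9):
    """Return sorted (site_start, primer) pairs for library primers that are
    complementary to exactly one k-base site in the known target sequence."""
    sites = {}
    for i in range(len(target) - k + 1):
        primer = reverse_complement(target[i:i + k])
        if primer in library:
            sites.setdefault(primer, []).append(i)
    return sorted(
        (positions[0], primer)
        for primer, positions in sites.items()
        if len(positions) == 1
    )


known = "GATTCCAGGCTTGACCATGG"
library = {reverse_complement(known[5:14]), "AAAAAAAAA"}
print(select_primers(known, library))
```

Using a set makes each membership test constant time, so scanning a known sequence against a library of thousands of primers stays fast.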
Brief Description Of The Figures
Figure 1 depicts the structure of an exemplary 9-mer modified oligonucleotide primer of the invention which has 2 modified bases and an acridine pendant group.
Figure 2 is a schematic representation of a network 21 interconnecting two clients 22a and 22b and a server 23.
Figure 3 is a functional block diagram of a typical computer system 30 that may be used to implement a network client and/or a network server which includes a bus 31 that interconnects a central processing unit (CPU) 32, system memory (RAM) 33, and read-only memory (ROM) 34 and several device interfaces, as further detailed herein.
Figure 4 depicts a flow diagram of a method for selection and use of the oligonucleotide primers of the invention in polymerase-catalyzed primer-extension reactions by way of a computer interface.
Detailed Description Of The Invention
I. DEFINITIONS
Unless otherwise indicated, all terms used herein have the same meaning as they would to one skilled in the art of the present invention. Practitioners are particularly directed to Sambrook, et al., (1989) and Ausubel, et al., (1989), for definitions and terms of the art. It is to be understood that this invention is not limited by the particular methodology, protocols, and reagents described, as these may vary.
Nucleic acid subunits are referred to herein by their standard base designations: T, thymine; A, adenine; C, cytosine; G, guanine; U, uracil. Variable positions are referred to as described below. The term "naturally occurring nucleotides" means A, C, G, and T for DNA and A, C, G, and U for RNA.
As used herein, the terms "nucleoside" and "nucleotide" include those moieties which contain both known purine and pyrimidine bases and heterocyclic bases which have been modified. Such modifications include halogenated purines and pyrimidines, methylated purines or pyrimidines, acylated purines or pyrimidines, or heterocycles with additional fused rings. Modified nucleosides or nucleotides will also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen, aliphatic groups, or are functionalized as ethers, amines, or the like.
The term "polynucleotide" as used herein refers to a polymeric molecule having a backbone which supports bases capable of hydrogen bonding to typical polynucleotides, where the polymer backbone presents the bases linked by phosphodiester bonds in a manner to permit such hydrogen bonding in a sequence specific fashion between the polymeric molecule and a typical polynucleotide (e.g., single-stranded DNA). "Polynucleotides" include polymers having a polynucleotide which is an N- or C-glycoside of a purine or pyrimidine base, and to other polymers containing non-standard nucleotide backbones, for example, polyamide linkages (e.g., peptide nucleic acids or PNAs), phosphodiamidate morpholine chemistry, and other synthetic sequence-specific nucleic acid molecules providing that the molecules contain nucleobases in a configuration which allows for base pairing and base stacking, such as is found in DNA and RNA.
The terms "polynucleotide" and "oligonucleotide" are used interchangeably herein and refer to the primary structure of the molecule. These terms include modified, variant or substituted nucleic acids (DNA and RNA), both single- and double-stranded. Examples include a nucleic acid sequence comprising: (1) a label, many examples of which are known in the art; (2) methylation or "caps"; (3) a substitution of one or more naturally occurring nucleotides with an analog of a natural nucleotide base; (4) an interbase or backbone modification, such as a modified linkage (e.g., an alpha anomeric nucleic acid, etc.); (5) a pendant moiety, e.g., a protein such as a nuclease, toxin, antibody, signal peptide, poly-L-lysine, etc.; (6) a minor groove binding moiety (e.g., pyrrole/imidazole polyamide, CDPI3, a netropsin or distamycin A analog, etc.); (7) an intercalator (e.g., acridine, psoralen, etc.); (8) a chelator (e.g., a metal, a radioactive metal, a boron, oxidative metal, etc.); (9) an alkylator; and (10) an unmodified natural polynucleotide or oligonucleotide.
Polynucleotides are described as "complementary" to one another when hybridization occurs in an antiparallel configuration between two single-stranded polynucleotides. A double-stranded polynucleotide can be "complementary" to another polynucleotide, if hybridization can occur between one of the strands of the first polynucleotide and the second. Complementarity (the degree that one polynucleotide is complementary with another) is quantifiable in terms of the proportion of bases in opposing strands that are expected to form hydrogen bonds with each other, according to generally accepted base-pairing rules.
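As one way to make this quantification concrete, the following Python sketch scores two equal-length strands by the fraction of positions that form Watson-Crick pairs when the strands are aligned antiparallel. The function name is hypothetical, and the restriction to A-T and G-C pairs between equal-length DNA strands is a simplifying assumption.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def complementarity(strand_a, strand_b):
    """Fraction of positions forming Watson-Crick pairs when strand_b is
    aligned antiparallel to strand_a. Both strands are written 5' to 3'."""
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch only handles strands of equal length")
    paired = sum(
        (x, y) in WATSON_CRICK for x, y in zip(strand_a, reversed(strand_b))
    )
    return paired / len(strand_a)


print(complementarity("GATTCCAGG", "CCTGGAATC"))  # perfect complement
print(complementarity("GATTCCAGG", "CCTGGAATA"))  # one mismatched position
```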
As used herein the term "analog" with reference to an oligonucleotide means a substance possessing both structural and chemical properties similar to those of an oligonucleotide which has standard or natural structural features.
As used herein the term "modified" with reference to an oligonucleotide means a substance possessing both structural and chemical properties similar to, but not identical to, those of an oligonucleotide which has standard or natural structural features. A modified oligonucleotide primer of the invention is from 7 to 11 nucleotides in length and has a Tm greater than or equal to 35°C.
As used herein the term "oligonucleotide primer" refers to a nucleic acid sequence which may include variant or substituted nucleic acids (DNA and RNA), a modified backbone, a stabilizing pendant group or nucleotide analogs, so long as the oligonucleotide binds to a complementary or near complementary sequence in a manner effective to be extended via a polymerase-catalyzed primer-extension reaction. Generally, the primer is constructed so that it is capable of forming stable duplexes under the conditions specified.
As used herein, the term "base stacking" refers to the relationship of adjacent bases along a linear DNA molecule, such that as one proceeds along a duplex, adjacent bases are "stacked" above one another, providing a stabilizing effect.
As used herein, the term "natural nucleotide" refers to residues, A, C, G, and T for DNA and A, C, G, and U for RNA.
As used herein "standard backbone" or "naturally occurring backbone" refers to a phosphodiester linkage which includes deoxyribose (for DNA) or ribose (for RNA) as the backbone to which the residues, A, C, G, and T for DNA and A, C, G, and U for RNA, or modifications and analogs thereof, are linked.
The term "linkage" or "interbase linkage" is used interchangeably with the term "backbone". A polymer having a "modified backbone" is one having other than the standard phosphodiester-linked deoxyribose or ribose for the backbone to which the base residues are linked. A modified backbone as referred to herein has interbase spacing such that appropriate hydrogen bonds can form between an oligonucleotide with a modified backbone and one with a standard phosphodiester backbone.
An oligonucleotide having a "chimeric backbone" is one in which the oligonucleotide backbone comprises a segment having a standard backbone and a segment having a modified backbone. An oligonucleotide having a chimeric backbone may also comprise more than one segment of a standard backbone and/or more than one segment of modified backbone, which may be the same or different. In an oligonucleotide having a chimeric backbone, the segment of standard backbone and the segment of modified backbone may each be contiguous or the standard and modified backbone segments may be interspersed, e.g., alternating, within one another.
As used herein, the terms "degenerate nucleotide" and "degenerate base" mean that the nucleotide at that position in the sequence may correspond to any one of four nucleotides (A, C, G and T in DNA, or A, C, G and U in RNA), such that the population comprises an equal number of oligonucleotide molecules having each one of the four bases in the particular position.
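The population implied by a degenerate position can be enumerated with a short Python sketch. Writing the degenerate position as the letter N is an assumption borrowed from common nucleotide notation, and the function name is hypothetical.

```python
from itertools import product


def expand_degenerate(primer, alphabet="ACGT"):
    """List every concrete sequence represented by a primer in which each
    'N' stands for a degenerate position (any one of the four bases)."""
    choices = [alphabet if base == "N" else base for base in primer]
    return ["".join(combo) for combo in product(*choices)]


print(expand_degenerate("NGATC"))
# ['AGATC', 'CGATC', 'GGATC', 'TGATC']
```

Each additional degenerate position multiplies the number of molecular species by four, so a primer with two such positions represents 16 sequences in equal proportion.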
As used herein, the terms "reduced-set library" and "reduced-set library of oligonucleotide primers" refer to a library of a given size which does not contain all possible sequences of that length. For example, a library of 9mers may contain 9,000 of the 262,144 (4^9) possible 9mers.
The term "template DNA" refers to a single- or double-stranded deoxyribonucleic acid molecule that contains a segment of nucleotides to be sequenced. The template DNA may be derived from a variety of sources, e.g., directly from biological organisms as genomic DNA or as molecular clones propagated in an appropriate host. The template DNA may be prepared for sequencing by a variety of means, e.g., proteinase K/SDS, chaotropic salts, or the like. Alternatively, the template DNA may come from in vitro amplification procedures such as PCR.
As used herein, the term "hybridizing template region" or "hybridizing template nucleotide sequence" or "primer binding site" refers to a primer binding region contained within the template molecule with which a primer will anneal, or form a stable hybrid (or duplex) under desired conditions.
By "hybridization" or "annealing" is meant the sequence-specific binding between a primer and a template nucleic acid. It will be appreciated that the binding sequences need not have perfect Watson-Crick complementarity to provide stable hybrids. In many situations, stable hybrids will form where base analogs are paired with naturally occurring bases. They may or may not form hydrogen bonds. Accordingly, as used herein the term "complementary" refers to an oligonucleotide that forms a stable duplex with its "complement" under sequencing conditions.
As used herein, the term "primer" refers to a structure comprised of an oligonucleotide, as defined above, that contains a nucleic acid sequence complementary to a nucleic acid sequence present in the template molecule. The polynucleotide regions of a primer may be composed of DNA, and/or RNA, and/or synthetic nucleotide analogs. The backbone of the primer may be a standard backbone or a chimeric backbone. The 3' end of the primer may have any chemical functionality capable of undergoing a DNA polymerase-catalyzed reaction with a nucleotide triphosphate, resulting in an interbase linkage.
A "sequencing primer" is an oligonucleotide capable of hybridizing to a segment of the template DNA at a point from which a DNA polymerase can extend the primer by catalyzing the template-dependent addition of nucleotides thereto. An "initiation primer" is a sequencing primer that is capable of specifically hybridizing to a known sequence of nucleotides and can be extended by a DNA polymerase into the segment of the template to be sequenced. "Vector primers" are special cases of "initiation primers" in which the hybridization site is in the cloning vector. A "walking primer" is a sequencing primer that is capable of specifically hybridizing to a sequence of nucleotides that has been determined either by DNA polymerase extension of an initiation primer or of a previously used walking primer.
As used herein, the term "stability" refers to the binding strength of an oligonucleotide hybridized to its complement. The stability of a primer/template duplex is typically expressed as the "melt temperature" or "Tm" of the duplex, which is the temperature at which 50% of the duplexes are dissociated. "Tm" is used herein with reference to modified oligonucleotides.
As used herein, the term "computer interface" may mean a graphical user interface, a website, or a computer mouse or menu driven mechanism for providing information to, or receiving information from, a computer. It may rely upon various storage media such as magnetic diskettes, tapes or hard drives, or optical storage devices. The information may be conveyed to and from a local computer via serial, parallel, or modem ports, or to or from a remote computer via LANs, WANs or the internet.
II. IMPROVED OLIGONUCLEOTIDE PRIMERS AND PRIMER LIBRARIES
In a typical cloned DNA, an oligonucleotide of 9 nucleotides in length has sufficient specificity to hybridize essentially exclusively to its complement. However, in a cycle sequencing procedure, the stability of a hybrid duplex formed between a sequencing primer and the template DNA is also a significant factor which is likely to be limiting. Although the use of short sequencing primers, e.g., hexamers, in DNA sequencing has been described, it has generally been in the context of compound primers, i.e., wherein strings of two, three, or more ligated or unligated hexamers form a template-directed contiguous primer sequence. Such primers are generally inadequate for cycle sequencing. (See, e.g., Szybalski, 1990; Raja, et al., 1997; Kieleczawa, et al., 1992; Hou and Smith, 1994; Johnson, et al., 1996; Ruiz-Martinez, et al., 1996; Kotler, et al., 1994; Azhikina, et al., 1996; Azhikina, et al., 1993; Ghiso, et al., 1993; and Kaczorowski and Szybalski, 1994.)
In contrast, the primers and primer library of the present invention are 7-11 bases in length and can be used individually in sequencing reactions, e.g., cycle sequencing, thus avoiding the use of multiple primers. In addition, the primers described herein may be used effectively for cycle sequencing. A modified oligonucleotide primer of the invention is from 7 to 11 nucleotides in length and has a nucleotide sequence complementary to that of a subsequence of a selected target polynucleotide sequence and a Tm greater than or equal to 35°C.
In general, the sequence of such a modified oligonucleotide primer has the following features: (1) at least the three nucleotides closest to the 3' end of the primer are natural nucleotides linked by natural phosphodiester linkages; (2) among the remaining nucleotides of the primer, two or more adjoining nucleotides are base analogs of natural nucleotide bases that are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides; and (3) an intercalating agent is attached to the 5' end of the primer through a linker effective to allow intercalation of the agent between the two adjacent base analogs.
In one preferred embodiment, a modified oligonucleotide primer of the invention is from 9 to 11 nucleotides in length and at least the four nucleotides closest to the 3' end of the primer are natural nucleotides linked by natural phosphodiester linkages.
Figure 1 shows an exemplary 9-mer containing two adjacent enhanced base stacking base analogs (bold, prime bases) and a 5' attached intercalator annealed to a template sequence. The strands are antiparallel with the bases on opposite strands hydrogen bonded in the standard Watson-Crick fashion. An intercalator is covalently attached, via a flexible linker, at the 5' end of the primer. The intercalator has spontaneously intercalated between the modified bases, stabilizing the duplex.
The duplex stability of the oligonucleotide primers of the invention is increased by modifications which increase the stability of duplexes formed between such oligonucleotides and a target DNA sequence, without significantly interfering with specificity. These modifications include: (1) non-specific base primer lengtheners (N); (2) base specific strong binders (S); (3) strong binding interbase linkages (L); and (4) stabilizing pendant groups (P).
Approaches to increasing the effectiveness of oligonucleotide primers in polymerase-catalyzed primer-extension reactions wherein the primers are from about 7-11 nucleotides in length include one or more of the following: (1) the use of high-affinity base analogs to increase the stability of the oligonucleotides; (2) tethering a ligand to the oligonucleotide, wherein the ligand is known to noncovalently bond with and stabilize duplex nucleic acids, the effect of which can be magnified by tethering these ligands to the DNA by way of a short linker; or (3) increasing the length of the primer to increase the stability of the primer-template duplex, which may be accomplished by incorporation of non-specific base primer lengtheners (N), i.e., degenerate bases (D), multiple-binding bases (M) and/or universal bases (U).
OLIGONUCLEOTIDE PRIMER LIBRARIES
The invention also provides libraries of modified oligonucleotide primers for use in primer extension by way of a polymerase-catalyzed primer-extension reaction.
Oligonucleotide primer members of such a primer library are from 7 to 11 nucleotides in length, have a nucleotide sequence complementary to that of a subsequence of a selected target polynucleotide sequence, have at least three nucleotides closest to the 3' end of the primer that are natural nucleotides linked by natural phosphodiester linkages, and the remaining portion of the primer sequence contains one or more modifications resulting in an overall Tm of primer dissociation from a complementary target sequence of at least about 35°C.
Preferred modifications include one or more of: (1) two or more adjoining nucleotides which are base analogs of natural nucleotides effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides; (2) an intercalating agent attached to the 5' end of the primer through a linker that permits intercalation of the agent between adjacent bases in the remaining nucleotides of the primer; (3) one or more modified backbone linkages between adjacent nucleotides which are effective to enhance the stability of primer/target duplex formation; and (4) attachment of a minor groove binder (MGB) to the 5' end of the primer.
SELECTION OF LIBRARY SEQUENCES
The oligonucleotide primer members of a primer library of the invention are chosen such that the probability that at least one primer in the library will hybridize along its entire length with a random sequence contained in a 100mer target sequence is 90% or greater.
Library selection further involves the elimination from the library of primers which possess less than optimal properties. A sequencing primer must match the template exactly once to function; primers which do not match do not provide sequence information, while primers which match more than once yield unintelligible mixed sequences. The probability of each individual primer matching the template must be high enough that at least one primer from the library matches the desired region in the desired direction, yet low enough that the probability of multiple priming by that primer is negligible.
Natural DNA sequences are not random. There are clear, statistically significant patterns in biological sequences. Based upon these patterns it is possible to exclude from the library oligonucleotides which would be expected to match natural DNA templates either too frequently or too infrequently.
Computer analysis of the occurrence of short combinations of bases within natural sequences revealed that some are under-represented and some are over-represented (Burge, et al., 1992). For example, the dinucleotides TA, AC and GT (when read in the 5' to 3' direction) are consistently under-represented whereas AA and TT are consistently over-represented. A reduced set library limits under-represented sequences. In some cases, a reduced set library will further limit over-represented sequences. It has become clear that repetitive sequences have proliferated in the genomes of higher organisms. Many examples are known and include Alu sequences, GT/CA repeats, Line elements, microsatellites, and the like. Oligonucleotides which match any of these ubiquitous sequences are excluded in the formation of a reduced-set library.
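By way of illustration only, the notion of under- and over-represented dinucleotides can be sketched as the ratio of the observed dinucleotide frequency to the frequency expected from the base composition alone. The function name and the choice of this particular ratio are assumptions made for the sketch; the disclosure does not specify how the cited analysis was computed.

```python
from collections import Counter


def dinucleotide_odds(seq):
    """Return {dinucleotide: observed frequency / expected frequency}.

    The expected frequency is the product of the two single-base
    frequencies; a ratio below 1 suggests under-representation and a
    ratio above 1 suggests over-representation (single strand only).
    """
    seq = seq.upper()
    n_bases = len(seq)
    n_pairs = n_bases - 1
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(n_pairs))
    odds = {}
    for pair, count in pair_counts.items():
        expected = (base_counts[pair[0]] / n_bases) * (base_counts[pair[1]] / n_bases)
        odds[pair] = (count / n_pairs) / expected
    return odds
```

Applied to a large collection of natural sequences, ratios of this kind would be one way to rank candidate primers for exclusion; a meaningful estimate requires far more sequence than a single short template.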
It is also important that functional sequencing primers exhibit minimal self-interactions, namely hairpin and dimer formation. A hairpin consists of a primer basepaired to itself and a dimer consists of two separate, identical primer molecules basepaired with each other. Since Watson-Crick basepairing dictates that the basepaired strands are antiparallel, hairpins and dimers require that the primer contain either interrupted or non-interrupted palindromic sequences to form. Choosing primers which do not interact with themselves then consists of excluding sequences which contain palindromes. Accordingly, primers that promote self-annealing or annealing with other primers of the library are excluded in the formation of a reduced-set library. It is preferable to utilize the same polymerase-catalyzed primer-extension protocol regardless of which library primer is being used. One simple way to create a library whose members' Tm's fall within a restricted range is to exclude sequences of either very high or very low GC content, e.g., a GC content below 33% or greater than 67%. Software packages are available for primer construction according to these principles, an example being OLIGO(TM) Version 4.0 for Macintosh from National Biosciences, Inc. (Plymouth, Minn.).
Using GC content as a criterion, the oligonucleotide primer libraries of the invention are based on standard DNA sequences whose predicted stabilities fall within a central range.
Oligonucleotides whose GC content is below 33% or greater than 67% are excluded from the library. Various stabilizing modifications (as further described herein) are made to these standard DNA sequences, resulting in the modified oligonucleotide primer members of a reduced-set library of the invention.
Modified oligonucleotide primers, based on standard DNA sequences which possess similar GC content and which have similar stabilizing modifications, will have similar overall Tm's. Hence, a collection or library of modified primers chosen in this way is expected to possess fairly uniform Tm's.
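The self-interaction and GC-content criteria described above can be sketched as follows. This is a minimal illustration, not the method of any particular software package: the function names and the minimum paired stretch of 4 bases (min_stem) are assumptions, and a primer is flagged whenever some stretch of it is the reverse complement of a stretch found anywhere in the same sequence, which covers dimers of identical molecules and, as a subset, hairpin candidates.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def gc_in_range(seq, low=0.33, high=0.67):
    """True if GC content is neither below `low` nor above `high`."""
    return low <= gc_fraction(seq) <= high


def has_self_complement(seq, min_stem=4):
    """True if any stretch of `min_stem` bases could pair with the same primer.

    min_stem is an assumed threshold; interrupted and non-interrupted
    palindromes are both detected because the partner stretch may lie
    anywhere in the sequence.
    """
    for i in range(len(seq) - min_stem + 1):
        if reverse_complement(seq[i:i + min_stem]) in seq:
            return True
    return False
```

For example, a primer containing the palindrome GAATTC would be flagged by has_self_complement, whereas a homopolymer such as AAAAAAAAA would pass that test but fail the GC criterion.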
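By way of illustration only, the relationship between GC content and Tm for the underlying standard DNA sequences can be sketched with the widely used rule-of-thumb estimate for short unmodified oligonucleotides, Tm = 2°C x (A+T) + 4°C x (G+C). This rule is not taken from the present disclosure and does not account for any of the stabilizing modifications described herein; the function name is hypothetical.

```python
def rule_of_thumb_tm(seq):
    """Estimate Tm in degrees C for a short, unmodified DNA oligonucleotide.

    Uses the 2 * (A + T) + 4 * (G + C) rule of thumb, which is an
    assumption of this sketch and ignores salt, concentration, and all
    primer modifications.
    """
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc
```

Under this estimate, unmodified 9-mers with 3 to 6 G or C bases (33% to 67% GC) span 24°C to 30°C, compared with 18°C to 36°C for all possible 9-mers, which illustrates how a GC window narrows the range of predicted stabilities.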
Studies have also indicated that the 3' primer base is important for efficient primer extension (Nevinsky, et al., 1990), with G the preferred 3' base because it exhibits both strong binding and high selectivity.
Additional oligonucleotides can be eliminated from a reduced-set primer library because of experimental constraints. All cloned sequences require a cloning vector, and primers which match that vector are typically useless. One reasonable strategy for library selection, therefore, is to exclude primers which match any of the commonly used cloning vectors. Such vector sequences may be removed in the formation of the reduced-set library or excluded while selecting primers from the library, as further described below.
All of the aforementioned selection criteria can be applied more or less strictly and combined together to result in a library size which can be practically assembled and maintained. Accordingly, a reduced-set library of oligonucleotide primers of the invention has been selected by removal of some sequences, such as sequences: that contain dinucleotides of the form TA, AC, and GT, when read in a 5' to 3' direction; that contain repetitive genomic sequences; that promote self-annealing; and which have a GC content that is below 33% or greater than 67%.
In some cases, the reduced-set library of oligonucleotide primers is further selected to remove at least some sequences that contain dinucleotides of the form AA or TT and/or that contain sequences found in commonly used cloning vectors. The approximate number of primers predicted to be in such a reduced-set oligonucleotide primer library of the invention is about 400-8,000 for a 7mer; 1500-8,000 for an 8mer; 6,100-18,000 for a 9mer; 25,000-70,000 for a 10mer; and 97,000-280,000 for an 11mer. A preferred library size is about 1,500 for a 7mer; 6,500 for an 8mer; 9,000 for a 9mer; 35,000 for a 10mer; and 140,000 for an 11mer.
This number is "reduced" relative to the number of oligonucleotides in a complete library, which is 16,384 for a 7mer; 65,536 for an 8mer; 262,144 for a 9mer; 1,048,576 for a 10mer; and 4,194,304 for an 11mer, respectively.
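The combination of selection criteria listed above can be sketched as a filter applied to every possible sequence of a given length. This is a hedged illustration under stated assumptions: every criterion is applied strictly (the text notes they may be applied more or less strictly), the self-annealing test uses an assumed minimum paired stretch of 4 bases, and the repetitive-sequence and cloning-vector filters are omitted because they require sequence databases not given here. All names are hypothetical.

```python
from itertools import product

EXCLUDED_DINUCLEOTIDES = ("TA", "AC", "GT")  # read 5' to 3'
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def keep(seq, min_stem=4, gc_low=0.33, gc_high=0.67):
    """Apply the dinucleotide, GC-content, and self-annealing filters."""
    if any(d in seq for d in EXCLUDED_DINUCLEOTIDES):
        return False
    gc = sum(base in "GC" for base in seq) / len(seq)
    if gc < gc_low or gc > gc_high:
        return False
    # Reject primers in which any stretch can pair with the same sequence.
    for i in range(len(seq) - min_stem + 1):
        stem_rc = seq[i:i + min_stem].translate(COMPLEMENT)[::-1]
        if stem_rc in seq:
            return False
    return True


def reduced_set(n):
    """Enumerate all 4**n sequences of length n and keep those passing."""
    candidates = ("".join(bases) for bases in product("ACGT", repeat=n))
    return [seq for seq in candidates if keep(seq)]
```

Because the thresholds are assumptions, the size of the list returned by reduced_set(n) should not be expected to reproduce the library sizes stated in this description; relaxing or tightening any one filter changes the count substantially.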
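The complete-library sizes quoted above follow from 4 possible bases at each of n positions, i.e., 4^n sequences. The short sketch below recomputes them and expresses the preferred reduced-set sizes stated above as a fraction of the complete library; the variable and function names are hypothetical.

```python
# Preferred reduced-set library sizes as stated in the description.
PREFERRED_REDUCED_SIZES = {7: 1500, 8: 6500, 9: 9000, 10: 35000, 11: 140000}


def complete_library_size(n):
    """Number of distinct oligonucleotides of length n over A, C, G, T."""
    return 4 ** n


for n, reduced in PREFERRED_REDUCED_SIZES.items():
    full = complete_library_size(n)
    print(f"{n}mer: complete={full:,} preferred reduced={reduced:,} "
          f"({reduced / full:.1%} of complete)")
```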
In addition, the probability of at least one primer in such a reduced-set library of oligonucleotide primers hybridizing along its entire length with a random sequence contained in a selected 100mer target sequence is 90% or greater. The oligonucleotide primers provided in the reduced-set libraries of the invention contain from 7 to 11 nucleotides. By way of example, the smallest complete collection of oligonucleotides 8 nucleotides in length includes 65,536 different oligonucleotides. Thus, preparing a library of oligonucleotide primers for use in polymerase-catalyzed primer-extension reactions with a significantly smaller number of oligonucleotides would be highly advantageous. It is not necessary to prime sequencing reactions at each position of the template and therefore it is not necessary to include all possible sequences in such a library. Moderately overlapping sequencing runs are in fact preferred because they provide efficient template coverage.
For example, in a sequencing method based on primer walking, selection of a sequencing primer within the last 50 to 100 bases of the previous sequence is desirable. Applying the Poisson distribution and assuming a random template sequence, a 9-mer library which contains 8000 primers has a greater than 95% chance of containing a primer which falls within the last 100 bases of a sequence run and is directed in the forward direction. Therefore, for a random sequence, inclusion of only 3% of a complete 9-mer library would still allow for efficient sequencing, e.g., by primer walking.
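The Poisson estimate above can be reproduced with a short calculation. The sketch assumes a random template, treats each of 100 positions in the window as an independent candidate priming site in the forward direction, and takes the per-site match probability to be the library size divided by 4^n; the function name and the choice of 100 sites are assumptions, since the text does not state how sites were counted.

```python
import math


def p_at_least_one_primer(library_size, primer_len, n_sites):
    """Poisson probability that at least one library primer matches.

    p_site is the chance that a given site on a random template is
    matched by some member of the library; the expected number of
    matching sites in the window is p_site * n_sites.
    """
    p_site = library_size / 4 ** primer_len
    expected_matches = p_site * n_sites
    return 1.0 - math.exp(-expected_matches)


# 8000 primers out of 262,144 possible 9-mers is about 3% of the complete library.
print(f"{8000 / 4 ** 9:.1%}")
print(f"{p_at_least_one_primer(8000, 9, 100):.3f}")
```

With these assumptions the result is slightly above 95%, in agreement with the figure stated above. Counting only the 92 sites at which a 9-mer lies wholly within the 100-base window gives a somewhat lower value, so the estimate is sensitive to how the window is defined.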
PRIMER MODIFICATIONS
In many cases, the oligonucleotide primers of the reduced-set library have from 9 to 11 bases, wherein at least the four nucleotides closest to the 3' end of each primer are natural nucleotides linked by natural phosphodiester linkages.
In some cases, the oligonucleotide primers of the reduced-set library also have two adjoining nucleotide base analogs effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides and an intercalating agent attached to the 5' end of the primer through a linker effective to allow intercalation of the agent between the two adjacent analog bases.
In other cases, the remaining portion of the primer sequence of the oligonucleotide primers of the reduced-set library has a modified backbone linkage between at least some of the adjacent nucleotides, where the modification is effective to enhance the stability of primer/target duplex formation.
In further cases, the oligonucleotide primer has a minor groove binder (MGB) attached to the 5' end of the primer. It will be understood that all the nucleotides of such oligonucleotide primer members of a reduced-set library of the invention may be natural nucleotides.
Exemplary base analogs of natural nucleotide bases, intercalating agents, modified backbone linkages and minor groove binders are described below.
NON-SPECIFIC PRIMER LENGTHENERS
In general, longer primers bind more strongly than shorter primers. In addition, as a standard DNA primer becomes longer, the number of possible sequences and hence the specificity increases sharply. The incorporation of nonspecific bases which basepair with all four naturally occurring bases (A, C, G, and T) provides a means to increase the length, and hence the stability, of the primer without increasing its specificity.
Hence, in one aspect the oligonucleotide primers of the invention include nonspecific bases known in the art, such as degenerate bases (D), multiple-binding bases (M) and universal bases (U).
A modified oligonucleotide primer comprising a degenerate base may have any of the four bases incorporated at a given base position within the population of primer molecules. The collective primer thus created would have greater stability than the original primer due to its increased length while maintaining the original primer's specificity. In a singly degenerate mixture only one quarter of the primer molecules match the template perfectly and have enhanced stability.
However, this is of minimal consequence because sequencing primers are generally included in large excess in sequencing reactions and the perfectly matched primer would be present in adequate concentration. This strategy is limited to a small number of degenerate bases since each additional degenerate base reduces the effective concentration four fold and the combined effect of degenerate bases is exponential.
An exemplary oligonucleotide primer in a reduced-set primer library of the invention comprises from 6 to 10 specific bases, and from 1 to 5 degenerate bases (D).
ADDITION OF MULTIPLE-BINDING BASES TO SHORT PRIMERS
The addition of multiple-binding bases, i.e., base analogs that can pair with two or three natural bases, can increase stability while increasing the number of primer sequences and decreasing the effective primer concentration only modestly. For example, 2-methylaminomethyleneamino-6-methoxyaminopurine and 6H,8H-3,4-dihydropyrimido[4,5-c][1,2]oxazin-7-one bind with pyrimidines and purines, respectively (Lin and Brown, 1992). If these base analogs are incorporated at a given base position within the population of primer molecules to produce a collective primer in which about 50% of the primer molecules have one analog and about 50% have the other analog, the collective primer can basepair with any of the four natural bases at that position. In actuality, only one half of the primer molecules in this mixture will have enhanced stability due to the multiple-binding base. Nevertheless, this constitutes a considerable improvement over the addition of all four specific bases, i.e., the use of a degenerate base position in a primer, since each additional multiple-binding base reduces the effective concentration by only two fold. There are numerous multiple-binding base analogs described in the literature.
Additional examples of analogs which bind to two bases are N6-methoxy-2'-deoxyadenosine, N4-methoxycytosine, and the like. An exemplary oligonucleotide primer in a reduced-set primer library of the invention comprises from 6 to 10 specific bases, and from 1 to 5 multiple-binding bases (M). A multiple-binding base can be located at any position in the oligonucleotide primer, except in the first 3 base positions at the 3' end of the primer molecule (i.e., the end extended by the polymerase), and is generally not located in the first 6 base positions at the 3' end.
ADDITION OF UNIVERSAL BASES TO SHORT PRIMERS
The addition of universal bases, i.e., base analogs that can pair with all four natural bases, can increase length and therefore stability without increasing the number of primer sequences or decreasing the effective primer concentration. There are numerous universal base analogs described in the literature. Some examples of nonspecific base analogs that can be incorporated into a primer to enhance its binding stability include inosine (hypoxanthine), 3-nitropyrrole, 4-nitroindole, 5-nitroindole, 6-nitroindole, 5-nitroindazole, 4,5-imidazoledicarboxamide, 3,5-pyrazoledicarboxamide, 4-nitroimidazole, purine, and the like. An exemplary oligonucleotide primer in a reduced-set primer library of the invention comprises from 6 to 10 specific bases, and from 1 to 5 universal bases (U). A universal base can be located at any position in the oligonucleotide primer, except in the first 3 base positions at the 3' end of the primer molecule (i.e., the end extended by the polymerase), and is generally not located in the first 6 base positions at the 3' end.
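The effect of the three kinds of non-specific primer lengtheners on effective primer concentration, as described above, can be summarized in a short sketch: each degenerate position (D) leaves one quarter of the molecules perfectly matched, each multiple-binding position (M) leaves one half, and a universal position (U) leaves the fraction unchanged. The function and parameter names are hypothetical.

```python
def matching_fraction(n_degenerate=0, n_multiple_binding=0, n_universal=0):
    """Fraction of primer molecules perfectly matched to a given template.

    Each degenerate position contributes a factor of 1/4, each
    multiple-binding position (two analogs mixed 50:50) a factor of 1/2,
    and each universal position a factor of 1.
    """
    return (1 / 4) ** n_degenerate * (1 / 2) ** n_multiple_binding * 1 ** n_universal


for n in range(1, 6):
    print(n, matching_fraction(n_degenerate=n), matching_fraction(n_multiple_binding=n))
```

The exponential penalty is why the degenerate-base strategy is limited to a small number of positions, whereas universal bases lengthen the primer with no loss of effective concentration.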
BASE SPECIFIC STRONG BINDERS
The stability of the oligonucleotide primers of the invention may be increased through the use of high-affinity base analogs. A number of such base analogs have been described which bind by various mechanisms, e.g., additional hydrogen bonds, enhanced hydrophobic bonding, increased base stacking, and the like.
A modified oligonucleotide primer of the invention comprises base analogs at two adjoining nucleotide positions which in most cases are analogs of natural nucleotide bases that are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides. These base analogs may be located at any position in the oligonucleotide primer other than the first 3 nucleotide positions closest to the 3' end of the oligonucleotide.
Exemplary base analogs for use in the oligonucleotide primers of the invention include 5-methyl-2'-deoxycytidine, 5-bromo-2'-deoxycytidine, 2-amino-2'-deoxyadenine, 5-(1-propynyl)-2'-deoxyuridine, 5-(4,5-dimethylthiazol-2-yl)-2'-deoxyuridine, 5-(1-propynyl)-2'-deoxycytidine, 7-(1-propynyl)-7-deaza-2'-deoxyguanosine, 7-(1-propynyl)-7-deaza-2'-deoxyadenosine, 5-fluoro-2'-deoxyuridine, 5-bromo-2'-deoxyuridine, 7-deazaadenosine, N2-(imidazolylpropyl)-2'-deoxyguanosine, tricyclic analogs of cytosine such as phenothiazine and phenoxazine, and phenoxazine 3-methyl nucleotide, and the like.
In a primer prepared to include such high-affinity base analogs or strongly binding bases (S), the primer will comprise from 1 to 8 strongly binding bases and from 3 to 10 natural bases (A, C, G, or T), where the number of the former plus the number of the latter ranges from 8 to 11. A strongly binding base (S) can be located at any position in the oligonucleotide primer, except within the 3 base positions at the 3' end of the primer molecule (i.e., the end extended by the polymerase), and is generally not located within 6 bases of the 3' end.
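The composition rules for a primer containing strongly binding bases (S) can be expressed as a simple check. The notation is an assumption made for this sketch only: the primer is written 5' to 3', uppercase letters denote natural bases, and lowercase letters denote a strongly binding analog of the corresponding base. The function name is hypothetical.

```python
def is_valid_strong_binder_primer(primer):
    """Check the S-base composition rules for a primer written 5' to 3'.

    Rules checked: 8 to 11 positions in total, 1 to 8 strongly binding
    bases (lowercase), 3 to 10 natural bases (uppercase), and no
    strongly binding base within the 3 positions at the 3' end.
    """
    n_strong = sum(c.islower() for c in primer)
    n_natural = sum(c.isupper() for c in primer)
    return (
        set(primer.upper()) <= set("ACGT")
        and 8 <= len(primer) <= 11
        and 1 <= n_strong <= 8
        and 3 <= n_natural <= 10
        and primer[-3:].isupper()
    )
```

For instance, "ccaGTCAGG" satisfies the rules, whereas "CCAGTCAgG" does not because an analog falls within the last 3 positions at the 3' end. The stricter general preference, that no analog lie within 6 bases of the 3' end, could be tested by replacing primer[-3:] with primer[-6:].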
STRONG BINDING INTERBASE LINKAGES/MODIFIED BACKBONES
In addition to base additions or substitutions in the sequence of a modified oligonucleotide primer, the oligonucleotide backbone may be modified to increase duplex stability.
Oligonucleotide primers with partial neutral or positively charged backbones are effective in a polymerase-catalyzed primer-extension reaction, e.g., a cycle sequencing reaction. Such modifications may result in reduced repulsion from, or even attraction to, the negatively charged phosphate backbone of the template, or stabilize their duplexes by adopting conformations favorable to annealing.
A preferred oligonucleotide primer of the invention comprises nucleotide subunits joined by internucleotide backbone linkages which present the nucleotide bases for hybridization with a target nucleic acid sequence, at temperatures appropriate to cycle sequencing (i.e., for use in a polymerase-catalyzed primer-extension reaction carried out at or above 35°C), and wherein the base sequence of the primer is complementary to portions of a target sequence.
Modified backbone linkages between adjacent nucleotides generally comprise linkages effective to enhance the stability of primer/target duplex formation relative to the stability of primer/target duplexes formed using an identical nucleotide sequence under the same hybridization conditions, wherein the nucleotide subunits are linked by natural phosphodiester bonds.
Such a modified backbone finds utility in the oligonucleotide primers of the invention so long as the interbase spacing is suitable for the formation of appropriate hydrogen bonds, e.g., Watson-Crick, with a nucleic acid template which has a standard backbone.
Oligonucleotides having standard or natural nucleic acid bases attached to non-standard polymer backbones, for example, backbones with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), with negatively charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), and with positively charged linkages (e.g., aminoalkylphosphoramidates, aminoalkylphosphotriesters, guanidine, etc.), may be used as primers for use in extending a target polynucleotide, from a selected target sequence, via a polymerase-catalyzed primer-extension reaction.
Oligonucleotide primers which have a combination of one or more of standard or natural nucleic acid bases, non-standard or non-natural bases, standard polymer backbones and non-standard polymer backbones, with or without pendant groups on the 5' end of the primer, find utility in thermostable DNA polymerase-catalyzed primer-extension reactions so long as the resulting oligonucleotides have a Tm greater than or equal to 35°C and are capable of specific hybridization with a given target sequence of interest.
If the interbase spacing is correct, these oligonucleotides are generally capable of base pairing in a manner analogous to standard or natural nucleic acids.
Exemplary backbone linkages for use in the oligonucleotide primers of the invention have been modified relative to naturally occurring nucleic acid linkages in a manner effective to enhance the stability of primer/target duplex formation. Such exemplary backbone linkages include, but are not limited to, morpholino derivatives, N-(2-aminoethyl)glycine backbones, 2'-O-methyl, 3'-thioformacetal, 5'-amido derivatives, 2'-sugar modifications such as 2'-O-methyl, 2'-O-allyl, 2'-fluoro, 5'-pyrene derivatized phosphate, 5'-N-carbamate, hydroxylamine, methyleneoxy(methylimino), guanidine linkages, and the like, and combinations thereof. See Durand et al., Nucleic Acid Res. 17:1823.
In a reduced-set library oligonucleotide primer, such modified interbase linkages can be located at any position in the oligonucleotide primer, except for the linkage between the first 3 bases closest to the 3' end of the primer molecule, and are generally not located between the first 6 bases closest to the 3' end.
STABILIZING PENDANT GROUPS
Molecules which by their configuration noncovalently bond with duplex nucleic acids also stabilize such duplexes (Zimmer and Wahnert, 1986). This effect can be magnified by tethering these ligands to the DNA with a short linker.
In one aspect, the invention provides a modified oligonucleotide primer library wherein the primers are conjugated to such stabilizing pendant groups, resulting in primers which anneal to the template more stably than unmodified primers without exhibiting increased specificity.
A stabilizing pendant group for conjugation to the oligonucleotide primers of the invention can be any molecule which binds to and stabilizes duplex DNA, regardless of the mechanism. Examples of tethered stabilizing molecules which bind DNA by various mechanisms have been described. As will be appreciated by those skilled in the art, a given molecule which serves to stabilize a nucleic acid duplex may interact with DNA through one or more mechanisms.
Intercalators are a broad class of compounds which spontaneously insert themselves between consecutive basepairs of duplex nucleic acids, stabilizing the duplex. Intercalation requires a flat geometry and intercalators tend to be planar aromatic ring structures. Many examples of intercalators have been described in the literature, including homodimers and heterodimers of intercalators which possess affinities for duplex DNA that are higher than their monomeric constituents (Glazer and Rye, 1992).
Intercalators which are useful in this invention include an intercalator or multimeric form thereof which stabilizes duplex nucleic acids upon intercalation. Oligonucleotides tethered to intercalators which bind to their complement with enhanced stability have been described, e.g., 2-amino anthraquinone (Freier, 1997); 9-aminoellipticine (Vasseur, 1988); [N-(2-hydroxyethyl)phenazinium] (Levina, 1993); 2-methoxy-6-chloro-9-aminoacridine (Asseline, et al., 1984); oxazolopyridocarbazolium (Bazile, 1989); pyrene (Ono, 1993); [N-(2-hydroxyethylethyl)phenazinium] (Levina, et al., 1993); arugomycin; nogalamycin; pluramycins; actinomycin D; triostin A; echinomycin; TANDEM; CysMeTANDEM, and the like.
Most small molecules which bind by a non-intercalative mechanism to DNA do so in the minor groove of the DNA double helix. Such minor groove binding molecules, termed MGBs, include a large number of both natural and synthetic compounds. Examples of MGBs include: dihydropyrroloindole tripeptide (CDPI3, Kutyavin, 1997); poly(N-methylpyrrole carboxamide) (Sinyakov, 1995); 1,2-dihydro-3H-pyrrolo[3,2-e]indole-7-carboxylate (Lukhtanov, 1996); netropsin (Levina, 1996); and the family of lexitropsins. Lexitropsins are N-methylimidazole analogues of the antiviral agent netropsin, examples of which include but are not limited to: [[1-Methyl-4-[[1-methyl-4-(formylamido)imidazol-2-yl]-carboxamido]pyrrol-2-yl]carboxamido]propionamidine hydrochloride; [[1-Methyl-4-[[1-methyl-4-(formylamido)pyrrol-2-yl]-carboxamido]imidazol-2-yl]carboxamido]propionamidine hydrochloride; and [[1-Methyl-4-[[1-methyl-4-(formylamido)imidazol-2-yl]-carboxamido]imidazol-2-yl]carboxamido]propionamidine hydrochloride (Kissinger, 1987), including cross-linked lexitropsins (Chen et al., 1996); and pyrrole/imidazole/hydroxypyrrole polyamides (linear polyamides which contain the amino acids N-methylimidazole, N-methylpyrrole, 3-hydroxypyrrole, and gamma-aminobutyric acid; the sequence of these residues within the polyamide backbone determines the binding specificity of this class of MGBs) (White, et al., 1998). Oligonucleotides tethered to minor groove binding molecules, which bind to their complement with enhanced stability, have also been described in the literature, e.g., cyclopropapyrroloindole (Kumar and Schweitzer, 1998); netropsin and distamycin A (Levina, et al., 1996); Hoechst 33258 (Wiederholt, et al., 1997); and the like.
A class of DNA binding molecules which interact by way of electrostatic attraction also finds utility as stabilizing pendant groups. Examples include positively charged molecules which appear to be attracted to the poly-anionic phosphodiester backbone of DNA. Oligonucleotides tethered to such molecules, which bind to their complement with enhanced stability, are described in the literature and are exemplified by peptides (Harrison and Balasubramanian, 1998); polyamines (Shinozuka, et al., 1997); and the like. Arugomycin, nogalamycin, pluramycins, actinomycin D, triostin A, echinomycin, TANDEM (Geirstanger, 1995), and CysMeTANDEM (Addess, 1994), as well as their structural analogs, bind through both intercalation and minor groove interaction. Given that they form tight complexes with duplex DNA, molecules which bind by multiple mechanisms may function as stabilizing pendant groups.
Tethering a stabilizing pendant group to an oligonucleotide primer provides a means to enhance the stability of the duplex by increasing the effective concentration of the oligonucleotide. The site of attachment of the tether to the oligonucleotide can vary and may be the phosphate (e.g., 5', internucleotidic), sugar (e.g., 2', abasic site) or one of the bases (e.g., C5 of uridine). The backbone of the linker contains from 0-10, generally from about 5-7, atoms. The tether is flexible enough to allow the pendant group to interact with the DNA duplex and may contain any of a variety of stable linkages (e.g., methylene, ether, peptide and the like or combinations thereof).
Such a stabilizing pendant group is generally attached to the 5' end of the oligonucleotide primer molecule.
III. USE OF OLIGONUCLEOTIDE PRIMER LIBRARIES
The invention is focused on providing a library of oligonucleotide primers for use in DNA polymerase-catalyzed primer-extension reactions, in particular, cycle sequencing reactions. Having such a library available eliminates the expensive and time-consuming step of chemically synthesizing oligonucleotide primers for each step of the sequencing process.
The oligonucleotide primers of the invention find utility as sequencing primers in dideoxy-based sequencing reactions. Thus, a library of these primers may function as vector primers, initiation primers, or walking primers. Generally, these primers are capable of annealing with the template to form a stable, specific hybrid duplex which exhibits a helical structure, stabilized by base-specific base pairing and base stacking. The overall configuration and dimension of these hybrid duplexes are similar enough to naturally occurring DNA duplexes to be recognized by a DNA polymerase. In addition, the oligonucleotides in the library are extended by a DNA polymerase-catalyzed reaction with a nucleotide triphosphate, resulting in an interbase linkage. More specifically, the invention provides a reduced-set library of oligonucleotide primers as described above.
SEQUENCING WITH A REDUCED-SET LIBRARY OF OLIGONUCLEOTIDE PRIMERS
A "cycle sequencing" procedure is a method of sequencing a template DNA molecule. An oligonucleotide primer is incubated with the template DNA under conditions in which the primer specifically and stably hybridizes to a specific sequence in the template. A thermostable DNA polymerase, dNTPs and dideoxyNTPs are added to the sequencing reaction mixture under conditions in which the primer is extended from its 3' end as directed by the template DNA. The sequencing reaction process of primer annealing and extension of the primer in the presence of dNTPs/ddNTPs is repeated many times, resulting in a linear amplification of the sequencing reaction products, the identity of which is determined using standard methods well known in the art. Repeatedly raising and lowering the reaction temperature in this fashion results in a linear amplification of the sequencing signal. Cycle sequencing allows the use of very small amounts of template, e.g., 50 to 100 ng of single-stranded M13 DNA, 100-300 ng of plasmid DNA, or 1-2 μg of cosmid DNA.
The reactions are carried out at temperatures that facilitate sequencing through regions of significant secondary structure as well as the sequencing of double-stranded templates without the need for a separate denaturation step. (See, e.g. U.S. Pat. No. 5,741,640.)
Following reliable sequencing of approximately 400-800 nucleotide bases from a single primer extension, the sequence serves as a template for design of the next walking primer, which is designed to anneal to a sequence within about 50-100 nucleotides of the end of the previously sequenced segment. The results of individual primer extensions allow a next, overlapping region of a nucleic acid to be sequenced. A series of cycle sequencing reactions allows acquisition of extended sequence information ("primer walking"). Primer "walks" frequently begin within vector sequences near unknown cloned sequences, but may begin from any known sequence. In the former case, a "vector primer" annealing to a sequence in the vector adjacent to the cloned DNA is extended by a DNA polymerase into unknown cloned DNA. Consecutive steps are performed until the entire sequence of interest has been determined. Such standard cycle sequencing methods require a new primer for each subsequent step. Design and synthesis of such primers is time consuming and costly.
In addition to the primer library disclosed and claimed herein, a method of cycle sequencing DNA using such a primer library is provided. The method involves mixing a library sequencing primer with a template DNA to be sequenced, and then incubating the mixture with a DNA polymerase, nucleoside triphosphates and appropriate dideoxynucleoside triphosphate terminators. In a preferred embodiment, the dideoxy terminators are fluorescently labeled, the DNA polymerase is thermostable, and the incubation is a "thermocycle". The thermocycle generally comprises a series of temperatures appropriate to: denature the strands of the template (94°C to 98°C); anneal the primer to the template (45°C to 55°C); and extend the primer (55°C to 75°C). As will be understood by those of skill in the art, the exact temperatures and cycle times of the thermocycling procedure will vary to some extent and can readily be optimized as appropriate to the thermocycler, reagents, etc. It follows that primers having a Tm of about 35°C are effective in cycle sequencing reactions, while primers having a Tm of about 50°C are more effective.
The sequencing mixture is exposed to the above "thermocycle" repeatedly, resulting in a linear accumulation of sequencing reaction products and a corresponding amplification of the sequencing signal. Once the reaction has been completed, the terminated extension products are separated according to size to allow determination of the sequence of the template DNA. The invention further relates to selecting a modified oligonucleotide primer from a reduced-set oligonucleotide primer library for use in a polymerase-catalyzed primer-extension reaction and which has a sequence of nucleotide bases complementary to that of a 7-11 nucleotide portion of a known target sequence of about 100 nucleotides. The method comprises matching a subsequence within a selected target polynucleotide sequence with the complementary sequence of a primer in a reduced- set library of oligonucleotide primers, where each primer in the library is characterized as defined as above, and selecting a matched-sequence primer for use in the primer-extension reaction. Another aspect of the invention comprises the use of a computer interface and a program of instructions: (1) to facilitate primer selection from a reduced-set library of the invention and (2) as a means to expedite delivery of the primer for use in a polymerase-catalyzed primer extension reaction.
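The primer-selection step described above, in which a subsequence of the target is matched with the complement of a library primer, can be sketched as follows. This is a minimal illustration under stated assumptions: the target is given as a single strand written 5' to 3', only perfect matches on that strand are considered, a primer is retained only if its annealing site occurs exactly once, and all function names are hypothetical. It is not the program of instructions of the invention.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(target, site):
    """Count occurrences of `site` in `target`, including overlapping ones."""
    return sum(target.startswith(site, i) for i in range(len(target) - len(site) + 1))


def select_primers(target, library):
    """Return (primer, site_position) pairs for primers that anneal exactly once.

    A primer anneals where the target contains the primer's reverse
    complement; site_position is the 0-based index of that site.
    """
    matches = []
    for primer in library:
        site = reverse_complement(primer)
        if count_sites(target, site) == 1:
            matches.append((primer, target.find(site)))
    return matches
```

In use, the target would be roughly the last 100 nucleotides of known sequence and the library would be the list of base sequences held in the reduced-set library; a fuller implementation would also check the opposite strand and the remainder of the template for additional priming sites.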
The computer interface may be implemented, for example, using a network- or internet-based system for exchanging information between computers. Fig. 2 is a schematic representation of a network 21 interconnecting two clients 22a and 22b, and a server 23. This limited representation is for ease of illustration only, as the network may (and typically does) include a plurality of both clients and servers. A network client 22a uses network 21 to access resources provided by network server 23 to select a primer and arrange for its expedited delivery. Network server 23 may be a hypermedia server, perhaps operating in conformity with the Hypertext Transfer Protocol (HTTP), although this is not necessary to practice this aspect of the invention. The nature of the communication paths connecting network clients 22a, 22b and network server 23 is not critical to the practice of this aspect of the invention. Such paths may be implemented as switched and/or non-switched paths using private and/or public facilities. Similarly, the topology of the network is not critical and may be implemented in a variety of ways including hierarchical and peer-to-peer networks. The network clients and network server, for example, may be locally located with respect to one another and may be implemented on the same hardware.
Fig. 3 is a functional block diagram of a typical computer system 30 that may be used to implement a network client 22a, 22b and/or a network server 23. As shown, this computer system includes a bus 31 that interconnects a central processing unit (CPU) 32 representing processing circuitry such as a microprocessor, system memory in the form of random-access memory (RAM) 33 and read-only memory (ROM) 34 and several device interfaces. An input controller 35 represents interface circuitry that connects to one or more input devices 36 such as a keyboard and/or mouse. A display controller 37 represents interface circuitry that connects to one or more display devices 38 such as a video display terminal. An I/O controller 39 represents interface circuitry that connects to one or more I/O devices 40 such as a modem or a network connection. A storage controller 41 represents interface circuitry that connects to one or more storage devices 42 such as a magnetic disk or tape drive, optical disk drive or solid-state storage device. A printer controller 43 represents interface circuitry that connects to one or more printer devices 44 such as a laser or ink-jet printer. No particular type of computer system is critical to practice this aspect of the present invention.
A program of instructions (i.e., software), which may be executed on either the client or server side, or portions on each side, controls the interaction and exchange of information between the client and server to enable a user to select a primer and arrange for its expedited delivery. For example, the sequence comparisons of the primer selection process may be performed on the client side by downloading the instructions from the server or using a disk to load the instructions into the client computer. Alternatively, the comparisons may be performed interactively, with software residing on the server and the user transmitting relevant sequences to the server. More broadly, the program of instructions may be carried by any computer-readable medium including various magnetic media such as a disk or tape, various optical media such as a compact disc, network paths such as broadband or baseband transmission paths, as well as other communication paths throughout the electromagnetic spectrum including a carrier wave encoded to transmit the program of instructions. Thus, the term "computer-readable medium" as used herein is intended to cover all such media that carry or transmit the program of instructions. Where appropriate, various aspects of the computer instructions may be implemented with functionally equivalent hardware using discrete components, such as application specific integrated circuits (ASICs), or the like.
Fig. 4 is a flow chart illustrating an exemplary application of the computer-implemented primer selection process. In step 401, a user inputs a target sequence of approximately 100 bases into the computer, e.g., a network client, as the starting point for sequencing. In step 402, a reduced-set library of the invention is presented to the user through a computer interface conveyed by a program of instructions carried, e.g., on a disk or transmitted over the internet. The reduced-set library may be a database containing, e.g., 9000 9-mers. In step 403, a sequence comparison algorithm is executed which compares the inputted target sequence of about 100 nucleotides to such a reduced-set library to determine if there is a match (step 404). The comparison may be carried out using an algorithm effective to select one or more appropriate oligonucleotide primers from the reduced-set library, each of which has a sequence of nucleotide bases complementary to that of a 7-11 nucleotide portion of the target sequence. The comparison results in selection of one or more (typically 1-5) oligonucleotide primers.
In step 405, the user inputs a known sequence to be excluded from the selected primer sequences, e.g., a previously determined sequence. In step 406, a sequence comparison algorithm is executed which compares the inputted sequences to be excluded to the target-matched oligonucleotide primers to determine if there is a match (step 407). The comparison may be carried out using an algorithm effective to eliminate selected oligonucleotide primers which match the inputted sequence, resulting in selection of one or more oligonucleotide primers effective for use in extending the primer via a polymerase-catalyzed primer-extension reaction, where each primer has a greater than 90% probability of hybridizing along its entire length with a random sequence contained in the selected 100mer target sequence. In step 408, the user may accept or reject the selected sequence, or in the case where more than one sequence has been selected, accept one or more and reject the others. The user indicates acceptance in step 409, e.g., by either (1) clicking a button available through the computer interface, (2) ordering by telephone, or (3) ordering by fax, any of which results in express delivery of the ready-made oligonucleotide primer(s) by the library provider to the user in step 410, preferably resulting in receipt of the oligonucleotide primer(s) by the user on the following day. As part of the acceptance process, the user also provides payment, e.g., in the form of a credit card number, to the library provider.
Returning to step 404, in the very unlikely event that there is no match, the user may input a new target sequence in step 411 or exit the system. Similarly, if the user does not accept any of the selected sequences in step 407, the user may begin again or exit the system.
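By way of illustration only, the comparison and exclusion steps of Fig. 4 (steps 403-407) might be sketched as follows. The sketch is not the program of instructions of the invention. The names `reverse_complement` and `select_primers`, the representation of the library as a set of strings written 5' to 3', and the decision to test the excluded sequence on both strands are all assumptions made for the example.

```python
# Illustrative sketch only; names and data layout are assumptions, not part of the disclosure.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def select_primers(target, library, excluded="", lengths=range(7, 12)):
    """Return (primer, position) pairs for library primers complementary to the target.

    target   -- known sequence (about 100 bases), written 5' to 3'
    library  -- iterable of primer sequences, written 5' to 3'
    excluded -- optional sequence (e.g., vector) whose matches are subtracted
    """
    target = target.upper()
    excluded = excluded.upper()
    library = {primer.upper() for primer in library}
    selected = []
    for n in lengths:  # primer lengths of 7-11 nucleotides
        for start in range(len(target) - n + 1):
            site = target[start:start + n]
            primer = reverse_complement(site)  # primer that would anneal to this site
            if primer not in library:
                continue  # steps 403-404: no library match at this position
            if excluded and (site in excluded or primer in excluded):
                continue  # steps 406-407: primer also matches the excluded sequence
            selected.append((primer, start))
    return selected


# Hypothetical short target and two-member library, for demonstration only.
target = "AAAAACCGGATCTCAAAAA"
library = {"GAGATCCGG", "TTTTTTTTT"}
print(select_primers(target, library))
print(select_primers(target, library, excluded="GGGCCGGATCTCGGG"))
```

In this hypothetical example the first call reports the 9-mer GAGATCCGG at position 5 of the target, and the second call returns an empty list because the same site occurs in the excluded sequence. An empty list corresponds to the "no match" branch of step 404, in which the user may input a new target sequence.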
In each step of the sequencing process, approximately 400-800 nucleotide bases are reliably sequenced from a single primer extension. The sequence of about 100 nucleotides on the 3' end of this sequence serves as the next target sequence for design of the next primer, which may be selected from a reduced-set library of the invention, as described above.
This process is repeated until the sequence of the entire target nucleic acid has been determined. The computer interface facilitates efficient primer selection and delivery of ready-made primers by the library provider, resulting in the ability to obtain sequencing information more quickly and in a cost-effective manner.
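As a rough worked example of the repetition described above, the number of primer-extension steps needed to walk across a template can be estimated. The sketch assumes a walk in one direction only and assumes, conservatively, that each step after the first adds the read length minus the roughly 100 nucleotides reused as the next target. The function name `walk_steps` and the 50,000-base template (the cosmid size mentioned herein) are chosen for illustration.

```python
# Illustrative estimate only; the single-direction walk and fixed overlap are assumptions.
def walk_steps(template_len, read_len=500, overlap=100):
    """Estimate the primer-extension steps needed to walk across a template."""
    if read_len <= overlap:
        raise ValueError("read length must exceed the overlap")
    if template_len <= read_len:
        return 1  # a single extension covers the whole template
    net_gain = read_len - overlap          # new bases contributed by each later step
    remaining = template_len - read_len    # bases left after the first read
    return 1 + -(-remaining // net_gain)   # ceiling division


for read_len in (400, 500, 800):
    print(read_len, walk_steps(50_000, read_len))
```

Under these assumptions a 50,000-base template requires 167, 125 and 72 steps for read lengths of 400, 500 and 800 bases respectively, which indicates why a ready-made primer at every step matters for overall turnaround.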
The process of primer walking may be automated even further. With the availability of a ready-made library of sequencing primers useful for extending arbitrary sequences, all of the basic steps of cycle sequencing can be or have been automated. Pipetting robots are available which are capable of setting up sequencing reactions. Thermocyclers and capillary fluorescence sequencers are in routine use for automatically incubating and analyzing sequencing reactions. Software exists both to interpret the output of fluorescence sequencers and to select sequencing primers from that sequence. A readily available primer library along with appropriate software allows the pipetting robot to initiate the next step of the primer walk.
Given that 96-capillary fluorescence sequencers are currently available, and that typical sequencing runs and thermocycles require about 2 hours each, a single instrument capable of producing about 200,000 bases of assembled, de novo sequence per day is well within the reach of current technology.
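The order of magnitude of this estimate can be checked with simple arithmetic. The sketch below assumes that a thermocycle and a sequencing run are performed back to back (about 4 hours per round, hence 6 rounds per day) and that each read contributes between 300 and 700 new bases, i.e., the 400-800 bases per extension stated above less the roughly 100 bases reused as the next target. These assumptions, and the function name `daily_assembled_bases`, are illustrative and are not stated in the text.

```python
# Illustrative arithmetic only; round length and net bases per read are assumptions.
def daily_assembled_bases(capillaries=96, hours_per_round=4, net_bases_per_read=300):
    """Estimate new bases per day for one instrument running rounds back to back."""
    rounds_per_day = 24 // hours_per_round
    return capillaries * rounds_per_day * net_bases_per_read


print(daily_assembled_bases(net_bases_per_read=300))  # low end of the assumed range
print(daily_assembled_bases(net_bases_per_read=700))  # high end of the assumed range
```

With these inputs the estimate ranges from 172,800 to 403,200 bases per day, so the figure of about 200,000 bases falls within the range under the stated assumptions.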
A further application of the oligonucleotide primers and primer libraries of the invention is in random or "shotgun" sequencing. The sequence of large DNA clones can be determined by "shotgun" sequencing using randomly selected oligonucleotide primers which have been stabilized (modified), as described herein. Successfully primed reactions yield sequences, which can be accumulated in substantial numbers and assembled by computer, or alternatively, the initial sequences may serve as a starting point for bi-directional primer walking, as further described herein. Randomly chosen library primers may be used individually or in small sets (i.e., 10 or fewer) to enhance the probability of obtaining sequence information from a given reaction. If multiple sequences are simultaneously obtained in this way, the primers in the set may also be used individually to obtain single sequences. In an exemplary case, approximately 5 random primers selected from a reduced-set library of the invention are used simultaneously in a polymerase-catalyzed primer extension reaction, as described herein. (See, e.g., Messing, et al., 1991.)
The invention may also be employed in biochip screening using immobilized oligonucleotide primers of the invention. Sequencing of a cloned DNA sequence using a reduced-set oligonucleotide primer library of the invention may be accelerated by pre-screening the library primers to identify all primers which match somewhere in the clone to be sequenced. A particularly efficient way to do this is to label the cloned DNA and hybridize it with a biochip containing the entire reduced-set library according to protocols known in the art. All immobilized library primers which retain the label are complementary to some portion of the sequence of the clone. The library primers selected by this type of pre-screen can be further screened, possibly by computer, to eliminate primers which match known sequences derived from the vector or other known sequences. The remaining primers are good candidates for random or "shotgun" sequencing. (See, e.g., Pease, et al., 1994; Southern, et al., 1992.)
IV. Utility
In conventional directed priming methods, a known sequence is extended from a newly synthesized primer that primes near the end of the known sequence. Such primers are typically 15-25 bases in length and anneal to their complement with extreme specificity. Being essentially unique, such sequencing primers can be used only once. Synthesis of a new primer for each step results in expense and delay.
A short, single sequencing primer strategy was proposed in a theoretical paper (Studier, 1989) which concluded that 8-mers, 9-mers and 10-mers possess sufficient specificity to sequence cosmid-sized (50 kilobasepair) templates of arbitrary sequence. However, even a complete library of 8-mers would contain 65,536 primers, such that selection of a smaller subset thereof would be of great benefit. The present invention provides a reduced-set library of ready-made oligonucleotide primers and methods for the use of such primers in polymerase-catalyzed primer extension reactions, that provide the advantages of (1) shorter sequences (from 7-11 bases in length) which are useful for sequencing many distinct clones; (2) tighter binding than natural oligonucleotide primers of the same length, effective for use in polymerase-catalyzed primer-extension reactions; (3) a greater than 90% probability of having a primer which binds to a region of any given 100mer sequence; (4) primers that are ready-made; and (5) a means for selecting such a primer, using a computer interface. These aspects of the invention provide improved efficiency and lower the cost of the sequencing process.
The oligonucleotide primers of the invention, libraries of such primers and methods for their use are generally applicable to any nucleic acid which is to be sequenced.
All patents, patent applications, and publications mentioned herein are hereby expressly incorporated by reference herein. It is to be understood that while the invention has been described in conjunction with the preferred specific embodiments thereof, the description above as well as the examples which follow are intended to illustrate and not limit the scope of the invention. Other aspects, advantages and modifications within the scope of the invention will be apparent to those skilled in the art to which the invention pertains. The practice of the present invention will employ, unless otherwise indicated, conventional techniques of synthetic organic chemistry, biochemistry, molecular biology, and the like, which are within the skill of the art. Such techniques are explained fully in the literature. (See, e.g., Sambrook, et al., 1989; 1984); 1984); Ansorge, et al., 1997). (Wiley, NY) and the series, Methods in Enzymology (Academic Press, Inc.), all of which are expressly incorporated by reference herein.
EXAMPLE 1
Cycle Sequencing with 9-mers Containing two Adjacent Enhanced Base Stacking Base Analogs and a 5' Attached Intercalator
All oligonucleotides were synthesized by Biosource International (Foster City, CA) using solid phase synthesis, and supplied in cartridge purified form. Modified base and acridine phosphoramidites were purchased from Glen Research (Sterling, VA).
All sequencing was performed on the ABI 377 DNA Sequencer (Perkin-Elmer Applied Biosystems, Foster City, CA) utilizing the ABI PRISM® Big Dye Terminator Cycle Sequencing Ready Reaction Kit with Amplitaq® DNA Polymerase, FS. The manufacturer's protocol was used with the following exceptions: (1) 10 μl reactions were used instead of 20 μl reactions and 80% of each reaction was loaded per lane; (2) the Robocycler Thermocycler (Stratagene, La Jolla, CA) was used with the following standard thermal cycle for all primers: rapid thermal ramp to 96°C; 96°C for 45 seconds; rapid thermal ramp to 50°C; 50°C for 30 seconds; rapid thermal ramp to 60°C; 60°C for 4 minutes; repeated for 25 cycles.
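For reference, the total hold time of the thermal cycle stated above can be computed directly. The sketch below encodes the three hold steps as a list and multiplies by the 25 cycles. The step labels follow the denature, anneal and extend roles described earlier, and ramp times between temperatures are ignored, which is an assumption of the example.

```python
# Illustrative arithmetic only; ramp times between temperatures are ignored.
CYCLE = [
    ("denature", 96, 45),   # (label, temperature in deg C, hold time in seconds)
    ("anneal", 50, 30),
    ("extend", 60, 240),
]
CYCLES = 25


def total_hold_seconds(cycle=CYCLE, cycles=CYCLES):
    """Sum the hold times of one cycle and multiply by the number of cycles."""
    return cycles * sum(seconds for _, _, seconds in cycle)


total = total_hold_seconds()
print(f"{total} s = {total / 60:.2f} min")
```

The hold times alone come to 7875 seconds (131.25 minutes), consistent with the statement elsewhere herein that a thermocycle requires about 2 hours.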
Table 1 shows the results of cycle-sequencing with a series of 9-mers which are members of the general class of short oligonucleotides containing two adjacent enhanced base stacking base analogs and a 5' attached intercalator. This group of primers contains the modified bases 5-methyl-2'-deoxycytidine, 2-amino-2'-deoxyadenine, and 5-(1-propynyl)-2'-deoxyuridine in various combinations and various positions (i.e., at the 1st and 2nd, 2nd and 3rd, and 3rd and 4th base positions from the 5' end). The primers were used to sequence 3 different templates, and with one exception, all of the reactions provided sufficient signal to obtain at least 500 bases of sequence (with 330 bases obtained in the one exception).
The signal strengths were compared with identical sequencing reactions using oligonucleotides which lacked only the two base analogs. The average range of stimulation was 15.1-fold and the overall average read length was 580 bases.
TABLE 1.
[Table 1 appears as images (imgf000034_0001 and imgf000035_0001) in the original publication and is not reproduced here. The table notes follow.]
5' Base = 1
"X" = 2-methoxy-6-chloro-acridine; lower case bases indicate modified bases
signal minus background
fold stimulation over acridine only bases
EXAMPLE 2
Library Construction
As discussed herein, a standard full-set library of 9mers will contain 262,144 (4^9) different oligonucleotides. In a reduced-set library of the invention, about 9000 oligonucleotide primers are found in a 9mer library. Such a reduced-set library of oligonucleotide primers may be designed by starting with biological sequences, e.g., eukaryotic, prokaryotic, and archaeal sequences found in GenBank, as a raw source of 9mer sequences, followed by removal of selected sequences.
In forming the reduced-set library, all 9-mers capable of self-annealing, such as palindromes, are screened out using appropriate software (e.g., OLIGO™); repetitive genomic sequences and sequences containing dinucleotides of the form TA, AC and GT, when read in a 5' to 3' direction, are also screened out; and 9-mers possessing a GC content of less than 33% or greater than 67% are eliminated. Optionally, other sequences such as vector sequences and low complexity sequences such as mobile elements are also screened out. Minimum, preferred and maximum library sizes for exemplary reduced-set libraries of 7, 8, 9, 10 and 11mers are provided in Table 2.
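A minimal sketch of the sequence-based screens described above is given below. It is not the OLIGO™ software and does not reproduce the actual library construction. In particular, self-annealing is approximated by the presence of a reverse-complement palindrome of at least 4 bases, the dinucleotide screen is applied anywhere in the sequence, and the repetitive-sequence and vector screens are omitted because they require external sequence databases. These simplifications, and the function names, are assumptions of the example.

```python
# Illustrative sketch only; thresholds and function names are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
EXCLUDED_DINUCLEOTIDES = ("TA", "AC", "GT")  # read in a 5' to 3' direction


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_self_complementary_stretch(seq, min_len=4):
    """True if seq contains a reverse-complement palindrome of at least min_len bases."""
    for size in range(min_len, len(seq) + 1):
        for start in range(len(seq) - size + 1):
            window = seq[start:start + size]
            if window == reverse_complement(window):
                return True
    return False


def passes_filters(seq):
    """Apply the GC-content, dinucleotide and self-annealing screens to one oligomer."""
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq)
    # Integer arithmetic avoids rounding issues at the 33% and 67% limits.
    if gc * 100 < 33 * len(seq) or gc * 100 > 67 * len(seq):
        return False
    if any(dinucleotide in seq for dinucleotide in EXCLUDED_DINUCLEOTIDES):
        return False
    if has_self_complementary_stretch(seq):
        return False
    return True


# Hypothetical candidate 9-mers, for demonstration only.
for candidate in ("AGCCTGGAA", "AGAATTCGG", "CAGGTCCGA", "AAATTTAAA"):
    print(candidate, passes_filters(candidate))
```

Of the four hypothetical candidates, only AGCCTGGAA passes: AGAATTCGG contains the palindrome GAATTC, CAGGTCCGA contains the dinucleotide GT, and AAATTTAAA has no G or C bases.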
TABLE 2.

Primer size    Library size
               Minimum    Preferred    Maximum
7                  400        1,500      8,000
8                1,500        6,500      8,000
9                6,100        9,000     18,000
10              25,000       35,000     70,000
11              97,000      140,000    280,000
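The relationship between library size and the stated probability of finding a matching primer in a 100mer can be illustrated with a simple model. The sketch assumes a uniformly random target, treats each window of the target as an independent trial on one strand only, and treats the library as a random subset of all possible oligomers of the given length. None of these assumptions is stated in the text, and an actual reduced-set library is not a random subset, so the figures are indicative only. The function name `hit_probability` is hypothetical.

```python
# Illustrative model only; uniform random sequence and independent windows are assumptions.
def hit_probability(library_size, primer_len, target_len=100):
    """Probability that at least one window of the target matches some library primer."""
    windows = target_len - primer_len + 1        # candidate priming sites on one strand
    p_site = library_size / 4 ** primer_len      # chance that a given site is in the library
    return 1 - (1 - p_site) ** windows


# Preferred library sizes taken from Table 2.
PREFERRED = {7: 1_500, 8: 6_500, 9: 9_000, 10: 35_000, 11: 140_000}
for primer_len, size in PREFERRED.items():
    full_set = 4 ** primer_len
    print(primer_len, full_set, round(hit_probability(size, primer_len), 3))
```

Under this model a library of 9,000 9-mers, out of the full set of 262,144, gives a probability of about 0.96 of at least one match in a 100mer, and each of the preferred sizes in Table 2 gives a probability above 0.9.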

Claims

IT IS CLAIMED:
1. A modified oligonucleotide primer for use in characterizing a selected target sequence, via a polymerase-catalyzed primer-extension reaction, comprising: an oligonucleotide composed of 7-11 nucleotides extending from a 3' end to a 5' end having a sequence of nucleotide bases complementary to that of a target's selected sequence and a Tm greater than or equal to 35°C, where
(i) at least three nucleotides closest to the primer's 3' end are natural nucleotides linked by natural phosphodiester linkages; and
(ii) among the remaining primer nucleotides, two or more adjoining nucleotides are base analogs of natural nucleotide bases, and are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides, and
(iii) an intercalating agent is attached to the 5' end of the primer through a linker effective to allow intercalation of the agent between the two adjacent analog bases.
2. The primer of claim 1, which is composed of between 9-11 bases, wherein the five nucleotides closest to the primer's 3' end are natural nucleotides linked by natural phosphodiester linkages.
3. The primer of claim 1, wherein the base analogs in the two adjoining primer nucleotides are selected from the group consisting of 5-methyl-2'-deoxycytidine, 5-bromo-2'-deoxycytidine, 2-amino-2'-deoxyadenine, 5-(1-propynyl)-2'-deoxyuridine, 5-(4,5-dimethylthiazol-2-yl)-2'-deoxyuridine, 5-(1-propynyl)-2'-deoxycytidine, 7-(1-propynyl)-7-deaza-2'-deoxyguanosine, 7-(1-propynyl)-7-deaza-2'-deoxyadenosine, 5-fluoro-2'-deoxyuridine, 5-bromo-2'-deoxyuridine, 7-deazaadenosine, N2-(imidazolylpropyl)-2'-deoxyguanosine, tricyclic analogs of cytosine, and phenoxazine 3-methyl nucleotide.
4. The primer of claim 1, wherein the intercalating agent is selected from the group consisting of 2-amino anthraquinone, 9-aminoellipticine, [N-(2-hydroxyethyl)phenazinium], 2-methoxy-6-chloro-9-aminoacridine, oxazolopyridocarbazolium, pyrene, arugomycin, nogalamycin, pluramycins, actinomycin D, triostin A, echinomycin, TANDEM, and CysMeTANDEM.
5. A library of modified oligonucleotide primers from which can be selected an oligonucleotide primer effective for use in characterizing a selected target sequence via a polymerase-catalyzed primer-extension reaction, where:
(a) each primer in the library is composed of from 7-11 nucleotides extending from a 3' end to a 5' end, has a Tm greater than or equal to 35°C, and (i) at least the three nucleotides closest to the primer's 3' end are natural nucleotides linked by natural phosphodiester linkages; (ii) the remaining portion of the primer sequence contains a modification effective to raise the Tm of primer dissociation from a complementary target sequence to at least about 35°C, said modification selected from the group consisting of:
(iia) two or more adjoining nucleotides are base analogs of natural nucleotide bases, and are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides;
(iib) the 5' end of the primer is attached to an intercalating agent through a linker that permits intercalation of the agent between adjacent bases in the remaining nucleotides in said primer;
(iic) the backbone linkages between adjacent nucleotides are modified linkages effective to enhance the stability of primer/target duplex formation;
(iid) the 5' end of the primer is attached to a minor groove binder (MGB) in a manner effective to permit binding of said MGB to the target sequence; and
(iie) combinations of (iia)-(iid); and
(b) the probability of at least one primer in the library of hybridizing along its entire length with a random sequence contained in a 100mer target sequence is 90% or greater.
6. A reduced-set library of oligonucleotide primers according to claim 5 wherein said primer sequences have been selected to remove at least some sequences (i) that contain dinucleotides of the form TA, AC, and GT, when read in a 5' to 3' direction; (ii) that contain repetitive genomic sequences, (iii) that promote self-annealing, and (iv) whose Tm is less than 35°C.
7. The reduced-set library of oligonucleotide primers according to claim 6 wherein said primer sequences have been further selected to remove at least some sequences (vi) that contain dinucleotides of the form AA or TT; and/or (ii) that contain sequences found in commonly used cloning vectors.
8. The reduced-set library of claim 6, wherein the oligonucleotide primers of said library are composed of between 9-11 bases and the five nucleotides closest to the 3' end of the primers are natural nucleotides linked by natural phosphodiester linkages.
9. The reduced-set library of claim 6, wherein each primer has, among the remaining primer nucleotides, two or more adjoining nucleotides which are base analogs of natural nucleotide bases, and are effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides, and an intercalating agent attached to the 5' end of the primer through a linker effective to allow intercalation of the agent between the two adjacent analog bases.
10. The reduced-set library of claim 9, wherein the base analogs in the two adjoining primer nucleotides are selected from the group consisting of 5-methyl-2'-deoxycytidine, 5-bromo-2'-deoxycytidine, 2-amino-2'-deoxyadenine, 5-(1-propynyl)-2'-deoxyuridine, 5-(4,5-dimethylthiazol-2-yl)-2'-deoxyuridine, 5-(1-propynyl)-2'-deoxycytidine, 7-(1-propynyl)-7-deaza-2'-deoxyguanosine, 7-(1-propynyl)-7-deaza-2'-deoxyadenosine, 5-fluoro-2'-deoxyuridine, 5-bromo-2'-deoxyuridine, 7-deazaadenosine, N2-(imidazolylpropyl)-2'-deoxyguanosine, tricyclic analogs of cytosine, and phenoxazine 3-methyl nucleotide.
11. The reduced-set library of claim 9, wherein the intercalating agent is selected from the group consisting of 2-amino anthraquinone, 9-aminoellipticine, [N-(2-hydroxyethyl)phenazinium], 2-methoxy-6-chloro-9-aminoacridine, oxazolopyridocarbazolium, pyrene, arugomycin, nogalamycin, pluramycins, actinomycin D, triostin A, echinomycin, TANDEM and CysMeTANDEM.
12. The reduced-set library of claim 6, wherein the remaining portion of the primer sequence contains a modified backbone linkage between at least some of the adjacent nucleotides, said modification effective to enhance the stability of primer/target duplex formation, and said modified backbone linkage selected from the group consisting of morpholino derivatives, N-(2-aminoethyl)glycine backbones, 2'-O-methyl, 2'-O-allyl, 2'-fluoro, 5'-pyrene derivatized phosphate, 5'-N-carbamate, hydroxylamine, methyleneoxy, guanidine linkages, and combinations thereof.
13. The reduced-set library of claim 6, wherein the primer length is 7 nucleotides and the number of primers in the library is about 1,500.
14. The reduced-set library of claim 6, wherein the primer length is 8 nucleotides and the number of primers in the library is about 6,500.
15. The reduced-set library of claim 6, wherein the primer length is 9 nucleotides and the number of primers in the library is about 9000.
16. The reduced-set library of claim 6, wherein the primer length is 10 nucleotides and the number of primers in the library is about 35,000.
17. The reduced-set library of claim 6, wherein the primer length is 11 nucleotides and the number of primers in the library is about 140,000.
18. A method of selecting a modified oligonucleotide primer effective for use in characterizing a selected target sequence, via a polymerase-catalyzed primer-extension reaction where the target has a known sequence of at least about 100 nucleotides, said method comprising:
(1) matching a subsequence or its complement within the known target sequence with the sequence of a primer in a library of oligonucleotide primers; where
(a) each primer in the library is composed of from 7-11 nucleotides extending from a 3' end to a 5' end, has a Tm greater than or equal to 35°C, and (i) at least the three nucleotides closest to the primer's 3' end are natural nucleotides linked by natural phosphodiester linkages; (ii) the remaining portion of the primer sequence contains modifications effective to raise the Tm of primer dissociation from a complementary target sequence to at least about 35°C, said modifications selected from the group consisting of:
(iia) two or more adjoining nucleotides which are base analogs of natural nucleotide bases effective to enhance base stacking in a duplex primer-target structure relative to that observed with natural nucleotides;
(iib) the 5' end of the primer is attached to an intercalating agent through a linker that permits intercalation of the agent between adjacent bases in the remaining nucleotides in said primer;
(iic) the backbone linkages between adjacent nucleotides are modified linkages effective to enhance the stability of primer/target duplex formation;
(iid) the 5' end of the primer is attached to a minor groove binder (MGB) in a manner effective to permit binding of said MGB to the target sequence; and
(iie) combinations of (iia)-(iid); and
(b) the probability of at least one primer in the library of hybridizing along its entire length with a random sequence contained in a 100mer target sequence is 90% or greater; and
(2) selecting the matched-sequence primer for use in the primer-extension reaction.
19. The method according to claim 18, wherein said library is a reduced-set library, said oligonucleotide primer sequences selected to remove at least some sequences (i) that contain dinucleotides of the form TA, AC, and GT, when read in a 5' to 3' direction; (ii) that contain repetitive genomic sequences, (iii) that promote self-annealing, and (iv) whose Tm is predicted to be less than 35°C.
20. The method according to claim 19, wherein the primer sequences in said reduced set library have been further selected to remove at least some sequences (vi) that contain dinucleotides of the form AA or TT; or (ii) that contain sequences found in commonly used cloning vectors.
21. A computer-based system to facilitate selection from a reduced-set library of modified oligonucleotide primers effective for use in characterizing a selected target sequence, via a polymerase-catalyzed primer-extension reaction where the target has a known sequence of at least about 100 nucleotides, said system comprising:
(1) a processor; and
(2) a program of instructions for controlling the processor to:
(a) display the reduced-set library of oligonucleotides,
(b) enable a user to input a target sequence into the computer system,
(c) compare the target sequence with primer sequences in the reduced-set library of oligonucleotides to select at least one primer sequence from the reduced-set library which has a sequence of nucleotide bases complementary to that of a 7-11 nucleotide portion of the target sequence,
(d) enable a user to input a sequence to be subtracted into the computer system,
(e) compare the sequence to be subtracted with the at least one primer sequence selected in (c) to subtract any primer sequences complementary to a selected sequence,
(f) enable the user to accept or reject the one or more selected and subtracted primers, and
(g) enable the user to provide payment for the one or more selected and subtracted primers.
22. A computer-readable medium embodying a program of instructions for execution by the computer to perform a method of selecting a modified oligonucleotide primer effective for use in characterizing a selected target sequence, via a polymerase-catalyzed primer-extension reaction where the target has a known sequence of at least about 100 nucleotides, said program of instructions comprising instructions for:
(a) displaying the reduced-set library of oligonucleotides,
(b) enabling a user to input a target sequence into the computer system,
(c) comparing the target sequence with sequences in the reduced-set library of oligonucleotides to select at least one sequence from the reduced-set library which has a sequence of nucleotide bases complementary to that of a 7-11 nucleotide portion of the target sequence,
(d) enabling a user to input a sequence to be subtracted into the computer system,
(e) comparing the sequence to be subtracted with the at least one primer sequence selected in (c) to subtract any primer sequences complementary to a selected sequence,
(f) enabling the user to accept or reject the one or more selected and subtracted primers, and
(g) enabling the user to provide payment for the one or more selected and subtracted primers.
PCT/US1999/026431 1998-11-10 1999-11-09 A library of modified primers for nucleic acid sequencing, and method of use thereof WO2000028087A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10776098P 1998-11-10 1998-11-10
US60/107,760 1998-11-10

Publications (1)

Publication Number Publication Date
WO2000028087A1 true WO2000028087A1 (en) 2000-05-18

Family

ID=22318324

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/026431 WO2000028087A1 (en) 1998-11-10 1999-11-09 A library of modified primers for nucleic acid sequencing, and method of use thereof

Country Status (1)

Country Link
WO (1) WO2000028087A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000060123A2 (en) * 1999-04-06 2000-10-12 Genome Technologies, Llc Method for selecting primers for amplification of nucleic acids

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997001645A1 (en) * 1995-06-28 1997-01-16 Amersham Life Science Primer walking cycle sequencing

Non-Patent Citations (16)

* Cited by examiner, † Cited by third party
Title
"Computer-assisted selection of oligonucleotide primers and probes, and calculation of critical parameters using OLIGO, ver. 4.0 primer analysis software", FASEB JOURNAL, vol. 7, no. 7Sup, 1993, pages A1315, XP002132782 *
AZHIKINA T ET AL.: "Strings of contiguous modified pentanucleotides with increased DNA-binding affinity can be used for DNA sequencing by primer walking", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES USA, vol. 90, 1993, pages 11460 - 11462, XP002132780 *
BALL S ET AL.: "The use of tailed octamer primers for cycle sequencing", NUCLEIC ACIDS RESEARCH, vol. 26, no. 22, 1998, pages 5225 - 5227, XP002132786 *
BLÖCKER H AND LINCOLN D N: "The 'shortmer' approach to nucleic acid sequence analysis I: computer simulation of sequencing projects to find economical primer sets", CABIOS, vol. 10, no. 2, 1994, pages 193 - 197, XP002132781 *
DURAND M ET AL.: "Oligothymidylates covalently linked to an acridine derivative and with modified phosphodiester backbone: circular dichroism studies of their interactions with complementary sequences", NUCLEIC ACIDS RESEARCH, vol. 17, no. 5, 1989, pages 1823 - 1837, XP002132785 *
FREIER S M ET AL.: "The ups and downs of nucleic acid duplex stability: structure-stability studies on chemically-modified DNA:RNA duplexes", NUCLEIC ACIDS RESEARCH, vol. 25, no. 22, 1997, pages 4429 - 4443, XP002132784 *
GEIERSTANGER B H AND WEMMER D E: "Complexes of the minor groove of DNA", ANNUAL REVIEWS OF BIOPHYSICAL AND BIOMOLECULAR STRUCTURES, vol. 24, 1995, pages 463 - 493, XP000884117 *
HABENER J F ET AL.: "5-Fluorodeoxyuridine as an alternative to the synthesis of mixed hybridization probes for the detection of specific gene sequences", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES USA, vol. 85, 1988, pages 1735 - 1739, XP002132779 *
HARDIN S H ET AL.: "OCTAMER-PRIMED CYCLE SEQUENCING: DESIGN OF AN OPTIMIZED PRIMER LIBRARY", GENOME RESEARCH, US, COLD SPRING HARBOR LABORATORY PRESS, vol. 6, no. 6, 1 June 1996 (1996-06-01), pages 545 - 550, XP000597087, ISSN: 1088-9051 *
JONES L B AND HARDIN S H: "Octamer-primed cycle sequencing using dye-terminator chemistry", NUCLEIC ACIDS RESEARCH, vol. 26, no. 11, 1998, pages 2824 - 2826, XP002132777 *
KOTLER L ET AL.: "DNA SEQUENCING: MODULAR PRIMERS FOR AUTOMATED WALKING", BIOTECHNIQUES, US, EATON PUBLISHING, NATICK, vol. 17, no. 3, 1 September 1994 (1994-09-01), pages 554 - 556, 558 - 559, XP000466496, ISSN: 0736-6205 *
KUMAR S ET AL.: "Solution structure of a highly stable DNA duplex conjugated to a minor groove binder", NUCLEIC ACIDS RESEARCH, vol. 26, no. 3, 1998, pages 831 - 838, XP002132783 *
KUTYAVIN I V ET AL.: "Oligonucleotides with conjugated dihydropyrroloindole tripeptides: base composition and backbone effects on hybridization", NUCLEIC ACIDS RESEARCH, vol. 25, no. 18, 1997, pages 3718 - 3723, XP002132775 *
RUIZ-MARTINEZ M C ET AL.: "DNA SEQUENCING BY CAPILLARY ELECTROPHORESIS USING SHORT OLIGONUCLEOTIDE PRIMER LIBRARIES", BIOTECHNIQUES, US, EATON PUBLISHING, NATICK, vol. 20, no. 6, 1 June 1996 (1996-06-01), pages 1058 - 1062, 1064, XP000597614, ISSN: 0736-6205 *
SLIGHTOM J L ET AL.: "Nucleotide sequencing double-stranded plasmids with primers selected from a nonamer library", BIOTECHNIQUES, vol. 17, no. 3, 1994, pages 536 - 544, XP002132776 *
TOULMÉ J J ET AL.: "Specific inhibition of mRNA translation by complementary oligonucleotides covalently linked to intercalating agents", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES USA, vol. 83, 1986, pages 1227 - 1231, XP002132778 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000060123A2 (en) * 1999-04-06 2000-10-12 Genome Technologies, Llc Method for selecting primers for amplification of nucleic acids
WO2000060123A3 (en) * 1999-04-06 2002-01-24 Genome Technologies Llc Method for selecting primers for amplification of nucleic acids

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
122 EP: PCT application non-entry in European phase