EP0979307A2 - Sequencing of a polynucleotide using semi-degenerate primers

Sequencing of a polynucleotide using semi-degenerate primers

Info

Publication number
EP0979307A2
Authority
EP
European Patent Office
Prior art keywords
primers
sequence
reaction
primer
semi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP98919324A
Other languages
German (de)
English (en)
Inventor
Andrew Webster
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • This invention relates to a method for determining the sequence of a polynucleotide using semi-degenerate oligonucleotide primers.
  • DNA: deoxyribonucleic acid
  • RNA: ribonucleic acid
  • the standard sequencing method involves the elongation of a DNA primer sequence along a polynucleotide template using a DNA polymerase enzyme, deoxynucleotide triphosphates and dideoxynucleotide triphosphates, as proposed by F. Sanger et al. (PNAS 1977; 74:5463-).
  • the latter species terminate the elongation reaction and, if labelled specifically (e.g. with a specific fluorophore), or separated in four different reaction tubes, allow the determination of sequence.
  • This method, whilst accurate, has a number of disadvantages.
  • the range of accurate sequence that can be determined in one reaction and electrophoresis run is limited to 500-800 bases due mainly to the scarcity of long products which have escaped the earlier dideoxynucleotide termination.
  • experiments to determine unknown sequence cannot easily be performed in parallel as the result of one sequencing reaction needs to be identified before a further clone or polymerase chain reaction (PCR) product is retrieved for the determination of further adjacent sequence.
  • the concentration of template needs to be high such that pre-amplification of the template DNA is necessary before the sequencing step.
  • only one strand of the template DNA can be determined during each reaction, a separate reaction being needed to sequence the complementary strand to check the validity of the determined sequence.
  • a second method to sequence a polynucleotide has been proposed using hybridisation of unknown DNA to a large panel of known oligonucleotides (Drmanac et al., Genomics 1989; 4:114-; Bains et al., ibid. 1991; 11:294-; Southern et al., ibid. 1992; 13:1008-).
  • Such a technique is potentially powerful, especially since the development of microscopic oligonucleotide arrays using photolithographic technology (Fodor et al., Science 1991; 251:767-; Pease et al., PNAS 1994; 91:5022-).
  • hybridisation 'events' occur in parallel on the same array.
  • the number of outcomes from a hybridisation experiment greatly exceeds those from an experiment involving electrophoresis alone, and so hybridisation may potentially be of value in the determination of a megabase sequence.
  • hybridisation of one sequence is not entirely specific to its complementary sequence alone and cross-hybridisations involving one or more mismatched nucleotides are difficult to avoid. This inaccuracy prevents the use of an array to determine unknown sequence.
  • for a given length of oligomer probe on the array, there is a limit to the size of non-contiguous repeated units within the template sequence that can be unambiguously ordered.
  • adjacent repeats of identical sequence such as di- or tri- nucleotides, which occur very commonly in mammalian DNA, can never be accurately determined by using hybridisation alone.
  • the polymerase chain reaction has proved an extremely powerful technique in nucleotide analysis. In its most straightforward form, this involves the amplification of sequence between two smaller known sequences that can be encoded by two oligonucleotides known as 'primers'.
  • a DNA polymerase enzyme extends a 5' to 3' sequence of nucleotides from each primer complementary to the template sequence. Cycles of denaturation, primer annealing and polymerase elongation are performed to manufacture many identical copies of desired double-stranded DNA between these two primer sequences.
  • the primer nucleotide has to anneal to its complementary sequence in the template DNA and the affinity and specificity of this annealing process can be controlled to some extent by the annealing temperature and the salt concentration.
  • with DNA polymerase enzymes that do not possess a 3' to 5' exonuclease function (e.g. Taq polymerase), elongation of the primer sequence will not take place if the 3'-most nucleotides of the primer do not match exactly with the corresponding nucleotides on the template strand.
  • a number of mismatches 5' to these sites can be tolerated, particularly if the annealing temperature is lowered or salt concentration increased, mismatches being tolerated less well the closer they are to the 3' end of the primer.
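The 3'-end rule described above can be illustrated with a minimal sketch. The function name `can_extend` and the parameters `exact_3prime` and `max_5prime_mismatches` are hypothetical and the thresholds are arbitrary; real priming depends on temperature, salt and sequence context, which this toy model ignores.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def can_extend(primer, site, exact_3prime=3, max_5prime_mismatches=2):
    """Toy model of priming by a polymerase lacking 3'->5' exonuclease activity.

    `primer` and `site` are equal-length strings written 5'->3', where `site`
    is the sequence a perfectly matched primer would have at that location.
    The last `exact_3prime` bases must match exactly; a limited number of
    mismatches is tolerated further 5'. Both thresholds are assumptions.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    if not 1 <= exact_3prime <= len(primer):
        raise ValueError("exact_3prime must be between 1 and the primer length")
    if primer[-exact_3prime:] != site[-exact_3prime:]:
        return False  # a 3' mismatch blocks elongation in this model
    five_prime_mismatches = count_mismatches(
        primer[:-exact_3prime], site[:-exact_3prime]
    )
    return five_prime_mismatches <= max_5prime_mismatches


if __name__ == "__main__":
    print(can_extend("ACGTACGT", "ACGTACGT"))  # perfect match
    print(can_extend("ACGTACGT", "ACGTACGA"))  # mismatch at the 3' end
    print(can_extend("TCGTACGT", "ACGTACGT"))  # single mismatch at the 5' end
```

Lowering `max_5prime_mismatches` plays the role of raising stringency in this sketch.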
  • the annealing temperature and salt concentration for each pair of primers has to be determined empirically, although calculation of the (G+C) to (A+T) ratio, primer length and salt concentration do allow a melting temperature (Tm) for each primer to be estimated.
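As a rough illustration of such an estimate, the sketch below uses two widely quoted rules of thumb that are not taken from this document: the Wallace rule for short oligonucleotides and a salt-adjusted formula based on GC content and length. The function names and the default sodium concentration are assumptions, and the results are only starting points for the empirical optimisation described above.

```python
import math


def gc_fraction(primer):
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Wallace rule: 2 degrees C per A/T and 4 degrees C per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def salt_adjusted_tm(primer, sodium_molar=0.05):
    """Common approximation: 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 675/N."""
    n = len(primer)
    return (
        81.5
        + 16.6 * math.log10(sodium_molar)
        + 41.0 * gc_fraction(primer)  # 0.41 * percentage GC
        - 675.0 / n
    )


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGCATGC"
    print(wallace_tm(primer))
    print(round(salt_adjusted_tm(primer, sodium_molar=0.05), 1))
```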
  • Amplification using semi-degenerate primers shows reproducible bands with template DNA, suggesting that the non-random 3' nucleotides still confer specificity of binding.
  • the same conclusion can be drawn from the reproducible amplicons derived from short primers in the process known as random amplification of polymorphic DNA (RAPD).
  • RAPD: random amplification of polymorphic DNA
  • a small single primer of arbitrary sequence is used at low annealing temperature in a PCR reaction to distinguish different strains of organism, on the basis of reproducible amplicon sizes, following amplification. If mismatches could occur all along the primer the bands would not be reproducible; the specificity of the reaction is therefore likely to be caused by the specific matching of the primer 3' ends.
  • the present invention involves a method for the analysis of polynucleotide molecules using semi-degenerate primers, and the polymerase reaction.
  • the method comprises the steps of: a) reacting a target polynucleotide with oligonucleotide primers, a polymerase enzyme, and the other reagents necessary for the polymerase reaction, wherein the oligonucleotide primers are chosen such that polymerase products of varying lengths are produced; and b) analysing the products of the said reaction or reactions; wherein the oligonucleotide primers include an array of semi-degenerate primers whose 3' ends comprise variations of one or more nucleotides A, T, G and C such that the array is complementary to all the polynucleotide sequence.
  • the semi-degenerate primers are used in a set of separate reactions to generate amplicons from a contiguous segment of polynucleotide, the sequence of which is to be determined.
  • Prior knowledge of the primers' 3' ends in each reaction, and subsequent sizing of fragments, allows the unambiguous determination of template sequence.
  • tagging of the 5' ends of the primers with a specifically designed sequence allows manipulations such as separation of amplicons on the basis of complementary sequence hybridisation, the addition of primer sites for further amplification and incorporation of sites for in vitro transcription and hybridisation to oligonucleotide arrays.
  • the frequency of binding of primers on the template, the number of distinct amplicons of specific length generated and the proportion of template covered by amplicons from specific primers can be predicted using statistical analysis.
  • the technique can therefore be designed and adapted appropriately for each range of length of template that requires sequencing.
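The kind of statistical prediction mentioned above can be sketched under a deliberately simple assumption that is not stated in the source: the template is a uniformly random sequence, so a primer with k specified 3' nucleotides matches any given site with probability 1/4^k. The function names are hypothetical.

```python
def site_probability(k):
    """Chance that k specified 3' nucleotides match a random site."""
    return 0.25 ** k


def expected_sites(template_length, k, strands=2):
    """Expected number of exact 3'-end matches on a random template."""
    positions = max(template_length - k + 1, 0)
    return strands * positions * site_probability(k)


def prob_at_least_one_site(template_length, k, strands=2):
    """Probability that the primer finds at least one matching site."""
    positions = max(template_length - k + 1, 0)
    return 1.0 - (1.0 - site_probability(k)) ** (strands * positions)


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, expected_sites(5000, k), round(prob_at_least_one_site(5000, k), 4))
```

Longer specified 3' ends give fewer, more widely spaced binding sites, which is one way the design could be matched to the template length.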
  • amplicon generation using semi-degenerate primers allows the 'translation' of unknown nucleotide sequence into a series of designed oligomer sequence 'tags' for hybridisation to oligonucleotide arrays, so that cross-hybridisation is minimised.
  • the length of product generated by the semi-degenerate sequencing reactions is not limited to a maximum of approximately 1000 bases, instead being limited only by the elongation time during reaction cycles and the specific type of polymerase enzyme used (e.g. 3.5 kb). This increases the number of nucleotides that can be determined in each reaction.
  • Embodiments of the invention allow parallel analysis of a large template polynucleotide in which the sequence is unknown. Experiments that can only be undertaken sequentially by the Sanger method can therefore be set up simultaneously by this method.
  • Embodiments of the invention interrogate both sense and anti-sense strands simultaneously during the sequencing reactions, acting as a check for accurate sequence determination.
  • Embodiments of the invention generate products suitable for hybridisation analysis, providing enormous power to determine the sequence of megabase polynucleotides in single experiments. This power is limited only by the size of the oligonucleotide array and allows the possibility of whole genome sequencing.

Detailed Description of the Invention

  • one non-degenerate specific primer is chosen.
  • This primer could be a specific sequence at the beginning of the DNA sequence to be determined, or complementary to part of a cloning vector if the template DNA has been cloned. Ideally, it should have a low annealing temperature.
  • the specific primer is labelled so that amplicons with incorporated primer can subsequently be identified. This can be done, for example, by end-labelling with 32P-dATP, biotinylation or attachment of a fluorophore.
  • the latter labelling scheme, by using more than one distinct fluorophore with different emission frequencies, may allow later electrophoresis of more than one reaction product in one lane of an electrophoretic gel.
  • a set of semi-degenerate primers is designed so that together their non-degenerate 3' ends cover all possible sequences in the template DNA. Random nucleotides, a universal base (e.g. inosine), or a combination of both entities are then used to make up the middle 5-10 nucleotides of the primer. Inosine has the disadvantage of allowing intercomplementarity of the semi-degenerate and specific primers, whilst random nucleotides have the disadvantage of decreasing by fourfold the effective concentration of each primer for each nucleotide position.
  • a specific tag sequence can optionally be added to the 5' end.
  • Such a tag sequence may encourage the binding of the semi-degenerate primer at the end of an amplicon following the first or subsequent reaction cycles, rather than further internally, thus increasing the number of large amplicons at the expense of small ones. Furthermore, if a similar tag is added to the 5' end of the specific primer, further rounds of amplification using primers to the tag sequences can be used to augment the concentration of final product.
  • the label (e.g. 32P-dATP, etc.)
  • the type of DNA polymerase used in the sequencing reaction is important.
  • the enzyme must not have any 3' to 5' exonuclease activity, as this would allow a semi-degenerate primer to successfully amplify a product at a site that does not exactly correspond to its 3' end.
  • the absence of such activity occurs in a number of enzymes available commercially for use in PCR (e.g. Thermus aquaticus (Taq) polymerase, Thermus thermophilus (Tth) polymerase); although this may reduce the fidelity of polymerisation, it should not affect the accuracy of the technique.
  • a 5' to 3' exonuclease activity is useful, however, in reducing the tendency for semi-degenerate primer binding internally and thus reducing the frequency of short amplicons.
  • Reactions are set up using template DNA, deoxynucleotide triphosphates, a suitable buffer system, a suitable DNA polymerase and a single semi-degenerate primer or a set of them. Because the semi-degenerate primers can manufacture amplicons without the incorporation of the specific primer, and this may reduce the efficiency of specific primer generated reactions, a number of cycles at high stringency (with only the specific primer annealing) can be performed first. Also, the specific primer can be used in excess concentration. This may generate a number of single-stranded products with the specific primer at the ends (i.e. as in 'asymmetric PCR').
  • although the reactions can be performed on whole genomic DNA, the efficiency of specific primer-generated reactions is again increased if the template is separated and/or purified before the reactions.
  • the semi-degenerate primers can be added and the annealing temperature lowered. The exact annealing temperature for any set of semi-degenerate and specific primers may need to be determined empirically. Thereafter, cycles of PCR are repeated as usual with the elongation temperature appropriate to the DNA polymerase (e.g. 72°C for Taq polymerase) and denaturation steps at 96°C.
  • the technique is also of use in determining the sequence of a specific mRNA, from extracted cell total RNA, after reverse transcriptase amplification with a single specific primer. Subsequent use of a further specific primer and a set of semi-degenerate primers may allow determination of the sequence without amplification of other mRNAs.
  • the choice of semi-degenerate primer sets influences the number of reactions and electrophoresis runs as well as the decoding for the final sequence solution.
  • the simplest primer set, which uses four semi-degenerate primers in four separate PCR reactions each with a labelled specific primer, is shown below:
  • R represents equal proportions of each nucleotide A, T, G, C or inosine
  • tag represents a common specified DNA sequence of 10-20 nucleotides that does not share complementarity with itself or the chosen specific primer.
  • the four reactions are run out on a non-denaturing gel (e.g. agarose or polyacrylamide) in separate lanes or in the same lane if four different fluors have been used to label the specific primer in each reaction. The sequence can be deduced easily by the relative lengths of the bands occurring in each lane.
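The decoding of the four-lane experiment can be sketched as follows. This is an idealised model, not the authors' procedure: it assumes every matching site is primed, that band lengths are resolved to a single nucleotide, and that the constant length contributed by primers and tags has been subtracted (the hypothetical `offset` parameter). All function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def band_lengths(template, three_prime_base, offset=0):
    """Labelled band lengths in the lane whose semi-degenerate primer ends in
    `three_prime_base`. The primer's 3' base pairs with its complement on the
    strand extended from the labelled specific primer, so a band appears for
    each position holding that complement."""
    target = COMPLEMENT[three_prime_base]
    return [i + 1 + offset for i, base in enumerate(template) if base == target]


def run_four_lanes(template, offset=0):
    """Simulate the four separate reactions, one per 3' nucleotide."""
    return {base: band_lengths(template, base, offset) for base in "ATGC"}


def decode(lanes):
    """Read the sequence by ordering bands from all four lanes by length."""
    calls = {}
    for three_prime_base, lengths in lanes.items():
        for length in lengths:
            calls[length] = COMPLEMENT[three_prime_base]
    return "".join(calls[length] for length in sorted(calls))


if __name__ == "__main__":
    lanes = run_four_lanes("GATTACA")
    print(lanes)
    print(decode(lanes))
```

In practice missing bands would leave gaps in the ladder; the sketch does not attempt to model that.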
  • Amplicons with semi-degenerate primers at each end do not appear on the gel as they are not labelled. Subsequent autoradiography or fluorescence detection is performed to reveal the specific primer-containing amplicons. A more sophisticated set of four semi-degenerate primers that specify the first two 3' positions is as follows:
  • Primers in which the two 3' positions can each be one of two nucleotides, make up a set of four sequences that match their complementary sequences during the annealing step of the PCR reaction.
  • There are 36 ($6^2$) such primers and 58,905 ways of choosing a set of four different ones ($\binom{36}{4}$).
  • This particular combination allows unambiguous sequence determination, as the pair of nucleotides at any one position in each primer is not reproduced exactly in the other position in any of the four primers.
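The two counts quoted above can be checked by direct enumeration. The sketch assumes, as the text implies, that each of the two 3' positions is an unordered pair of distinct nucleotides; the variable names are illustrative.

```python
from itertools import combinations, product
from math import comb

# Each 3' position is one of the 6 unordered pairs drawn from A, T, G, C.
pairs = ["".join(p) for p in combinations("ATGC", 2)]

# A primer is defined by the pair used at each of its two 3' positions.
primers = list(product(pairs, repeat=2))

# Number of ways to choose four different primers from the 36.
sets_of_four = comb(len(primers), 4)

if __name__ == "__main__":
    print(len(pairs), len(primers), sets_of_four)
```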
  • each position of the template requires the annealing of two complementary primers and the successful amplification of product.
  • a larger set of semi-degenerate primers may allow each nucleotide position to be 'checked' as each would be complementary to two or three primers in the sequencing reactions:
  • This method only sequences one strand of the template DNA (unlike the subsequent embodiments) and so an anti-sense 'check' does not inherently occur. Also, the length of sequence determined in each experiment is limited by both the length of double-stranded amplicon that can be distinguished to the accuracy of a single nucleotide pair during electrophoresis (e.g. approx. 3000 bases) and the length that can be generated by the polymerase enzyme.
  • one or more semi-degenerate primer(s) with specified 3' ends are used in a set of individual sequencing reaction experiments such that each reaction contains different 3' sequences. A specific primer is not used. A number of such experiments are performed on purified template DNA.
  • the size of the amplicons from each experiment is then determined.
  • the final sequence can be solved knowing the length of nucleotide between the flanking primer(s) that were used in each experiment.
  • the number of experiments that are necessary, together with the specificity of binding of the semi-degenerate primers that are needed, can be determined using statistical considerations.
  • the technique can be modified by the use of endonuclease enzymes to fragment the template in a predictable fashion prior to carrying out semi-degenerate primed PCR.
  • the template polynucleotide needs to have been separated from other contaminating polynucleotides because a specific labelled primer is not used. Some contaminating sequence can be tolerated during the reconstruction of sequence, but significant contaminating polynucleotide would prevent the clean separation of legitimate amplicons during size analysis.
  • One method to obtain such purification of a specific segment of DNA, for example without cloning, is to use a biotin/streptavidin or similar binding system.
  • a biotinylated oligonucleotide with a sequence complementary to a part of the sequence to be determined is allowed to hybridise to total genomic DNA following a fractionation or endonuclease step.
  • the bound fragment is then isolated using avidin or streptavidin capture.
  • Other methods include the flow-sorting of nucleotide fragments and electrophoretic separation.
  • the desired template can be pre-amplified by prior PCR and subsequently purified by standard methods.
  • For any single nucleotide within the target polynucleotide to be included in the final reconstructed sequence, two events need to occur: firstly, the nucleotide must be bound to at least one primer within its 3' specified sequence; secondly, this specific nucleotide duplex must go on to be incorporated in an amplicon.
  • the number of different specified 3' primer sequences that are necessary to sequence unambiguously does not need to include all the possible 3' nucleotide combinations. For example, considering the use of 5'-tag-(Random)x-(N)3-3' primers, where (N)3 represents a specified 3mer sequence, not all 64 possible primers need be included. If they were, each nucleotide would be bound at 3 sites by 1 to 3 different primers.
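The "3 sites, 1 to 3 different primers" remark can be illustrated with a minimal sketch. It considers a single strand only and treats each 3mer window that overlaps a nucleotide as one potential binding site; `overlapping_kmers` is a hypothetical helper name.

```python
def overlapping_kmers(template, i, k=3):
    """All k-mer windows of `template` that include position i (0-based)."""
    windows = []
    for start in range(i - k + 1, i + 1):
        if start >= 0 and start + k <= len(template):
            windows.append(template[start:start + k])
    return windows


if __name__ == "__main__":
    # An interior nucleotide lies in three windows, which may or may not differ.
    print(overlapping_kmers("ACGTA", 2))  # three different 3mers
    print(overlapping_kmers("AAAAA", 2))  # the same 3mer three times
```

Because every interior nucleotide is covered by up to three windows, a reduced set of 3mers can still touch each nucleotide at least once, which is the redundancy a smaller primer set exploits.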
  • Various algorithms can be designed to determine a smaller set of 3mers that bind all possible nucleotides (with regard to their surrounding sequence) at least once.
  • AAA AAG, AAC, AGA, AGC, ACA, ACG, ACC, ATA, ATG, ATC, ATT, GGA, GGG, GGC, GGT, GCA, GTA, GTG, GTC, GTT, CGA, CGC, CCA, CCG, CCC, CTA, CTG, CTC, CTT, TGA, TGC, TCA, TTA, TTC, TTT
  • AAAA AAAA, AAGA, AACA, AATA, AATC, AGGA, AGGG, AGGC, AGGT, AGCA, AGCG,
  • ACCG ACCG, ACCC, ACCT, ACTA, ACTG, ACTC, ACTT, ATGA, ATGG, ATGC, ATGT,
  • ATCA ATTA, ATTG, ATTC, ATTT, GAAA, GAGA, GACA, GATA, GATC, GGGA, GGGG, GGGC, GGGT, GGCA, GCGA, GCGG, GCGC, GCGT, GCCA, GCCG, GCCC,
  • GCCT GCCT
  • GCTA GCTG
  • GCTC GCTT
  • GTAA GTGA
  • GTGG GTGG
  • GTGT GTCA
  • GTTA GTTA
  • CCGT CCCA
  • CCCG CCCC
  • CCCT CCTG
  • CTAA CTCA
  • CTTA CTTG
  • TAAA TAAA
  • TAGA TACA, TATA, TATC, TGAA, TGGA, TGGT, TGCA, TGTA, TGTG, TCGA, TCGG, TCGC, TCGT, TCCA, TCCG, TCCC, TCCT, TCTA, TCTG, TCTC, TCTT,
  • TTCA TTCA
  • TTTA TTTG
  • TTTC TTTT
  • 64 simultaneous PCR reactions labelled conventionally (e.g. with α-32P-dCTP during the reactions), each containing a single 3mer semi-degenerate primer, after complete digestion with endonuclease enzymes that have a total probability of cutting of 2/4^5 (e.g. two 5-cutters or two of the form -g, A/T, g, C, A/T, C-), run out on a non-denaturing gel, would give useful information from 640 base pairs (1% probability of no amplicons) to 3760 base pairs (99% probability of no amplicons).
  • the reconstruction of the final sequence using the second method is more complicated than the decoding necessary for the first method.
  • the length x of nucleotide is determined. This can be taken as either exact or to lie within specified limits, depending on the accuracy of nucleotide sizing.
  • a nucleotide sequence or set of sequences that defines the ends of each length x is determined. If n different semi-degenerate 3' end sequences are used in the reaction being considered, then this set will comprise all of these n sequences.
  • Different algorithms can be designed to use these data to piece together the final sequence solution. One algorithm involves searching for lengths of amplicon in a determined fashion until a match, as follows, is made.
  • (A) represents a 3' primer sequence, or set of sequences, used in one semi-degenerate reaction (reaction 1).
  • the complementary sequence or set of sequences A' has been used in another separate reaction (reaction 2) .
  • a, b, c and d represent lengths of amplicon occurring in these two reactions - a, c and d in reaction 1 and b in reaction 2.
  • This embodiment of the invention is an excellent way of generating polynucleotide fragments from a template nucleotide for the use of hybridisation onto an oligonucleotide array.
  • the product from each semi-degenerate reaction can be fragmented using techniques already described (e.g. Chee et al., Science 1996; 274:610-) and hybridised to an array.
  • any non-contiguous repeat sequences could be unambiguously ordered as each repeat would turn up in differently labelled amplicons.
  • concomitant sizing of the reaction products e.g. using electrophoresis may allow the determination of the length of contiguous repeat sequences as well.
  • this third embodiment of the invention proceeds through repeated polymerase reactions with specific pairs of primers, one of which is a semi-degenerate primer with a defined 3' end.
  • the other primer is designed so that it has a nucleotide sequence that is specific to the individual semi-degenerate primer and the individual step in the overall series of reactions.
  • This primer also contains sequences that are recognised by corresponding primers in subsequent reactions so that, after several reactions, the polymerase product contains one end labelled with multiple primers, which define the order of binding and identity of the semi-degenerate primers to the template polynucleotide. Therefore, as the series of reactions proceeds, one end of the template polynucleotide will be shortened as subsequent semi-degenerate primers hybridise 3' to the last one, and one end will be lengthened, as the second primer is incorporated into the polymerase product. Ultimately, no template will remain, and the polymerase product will consist of a defined series of nucleotides. The products can then be hybridised exclusively to specific addresses on an oligonucleotide array.
  • the basic strategy involves an initial single semi-degenerate sequencing reaction to generate amplicons from the template sequence using a set of semi-degenerate primers.
  • the reactions are carried out so that each semi-degenerate primer of the set is contained in a separate compartment.
  • the resulting amplicons are used in the subsequent cycle of reactions.
  • a number of sequential semi-degenerate primed reactions are interspersed with complete mixing of the reaction products and further separation for another round of reactions.
  • Initial amplicons from the first reaction on the template sequence are labelled differently at each end, and only asymmetrically labelled amplicons 'survive' the subsequent processing. This enables one end of the amplicon to be used for 'tagging' and the other for sequencing.
  • Subsequent division of the reactants into separate compartments followed by semi-degenerate sequencing reactions with primers specific to each compartment allows the progressive translation of random sequence at one end into designed tags at the other as the cycles of mixing, separation and polymerisation reactions continue. Whilst the number of determined sequences increases exponentially, the number of primers needed for the sequencing strategy increases only arithmetically.
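The contrast between exponential and arithmetic growth can be made concrete with a simple counting model. The model is an assumption for illustration only: each step uses the same number of compartments, each compartment uses a fixed number of step-specific primers (`primers_per_compartment`, hypothetical), and each path through the compartments yields one distinct tag combination.

```python
def tag_combinations(compartments, steps):
    """Distinct compartment paths, i.e. distinct tag series, after `steps` steps."""
    return compartments ** steps


def primers_needed(compartments, steps, primers_per_compartment=2):
    """Total primers synthesised when every step uses its own primer pairs."""
    return compartments * steps * primers_per_compartment


if __name__ == "__main__":
    for steps in range(1, 6):
        print(steps, tag_combinations(16, steps), primers_needed(16, steps))
```

Adding one more step multiplies the number of addressable sequences by the number of compartments while adding only one more batch of primers.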
  • a final PCR reaction on all the amplicons amplifies only the complete tagged sequence which is then used to hybridise to a specially designed array.
  • Each tag contains sequences that represent the primer ends of the initial amplicon as well as a unique contiguous sequence within that amplicon.
  • Subsequent decoding allows the determination of a proportion of the original template.
  • a number of similar experiments are performed with distinct initial primers to ensure a complete coverage of the template. Decoding can then identify all unique sequences and accurately order sequences between non-contiguous repeats. The only ambiguity remaining will be the determination of the number of contiguous repeat sequences (e.g. di-, tri- and tetra-nucleotide repeats). Sizing (e.g. electrophoresis) of a set of reaction products will then allow the determination of the lengths of these contiguous repeat sequences.
  • the technique depends upon the generation of amplicons in which each end is bound to a different primer, one end being the 'sequencing end', the other being the 'labelling end'.
  • the 'sequencing end' is bound internally by subsequent primers to its 5' end and this ensures that symmetric amplicons with two sequencing ends do not obfuscate the final hybridisation analysis (as the final reaction can only take place if the original 5' tag is available for primer binding).
  • the 'labelling end' is bound at the 5' end. Symmetric amplicons with two labelling ends, if allowed to occur, would undergo subsequent rounds of amplification and be available for the final hybridisation step.
  • Such amplicons can be eliminated by incorporating a separation step in which only amplicons with one or both 'sequencing ends' are retained. This step is undertaken after step A in the following example. It could also be undertaken after step B.
  • this latter method would have the disadvantage that one cycle of reaction mixing will have already occurred and so the reactants would need to be transferred from apparatus designed for cycles of mixing and separation. If this step is undertaken after step B, then this would involve the incorporation of a biotin or similar label in the sequencing primer (primer B1). After step B, the reactants are pooled and captured using an avidin/streptavidin (or similar) system, so eliminating amplicons with two 'labelling ends'. Many variations exist within the basic strategy described above.
  • 'Tag' sequences are domains of primer 5' ends that are specific to both the step of the process and the compartment.
  • Tag_B is specific to step B
  • Tag_1 is specific to compartment 1, and so Tag_B1 was used during step B in compartment 1.
  • 'C' sequences are domains of primer 5' ends that are specific to each step but common to all compartments.
  • C_C has been used in all compartments during step C.
  • 'N' sequences represent the specified 3' sequence of the semi-degenerate primer.
  • Subscripts x, y, etc. represent these different 3' specified sequences.
  • (R)n represents sequences within primers that have randomly applied nucleotides at each position, a universal base (e.g. inosine), or a combination of the two.
  • the steps of the process are designated A, B, C, etc. and the compartments 1, 2, 3, etc.
  • the first reaction mixture includes the template to be sequenced, which can be any purified source of contiguous polynucleotide (e.g. flow-sorted whole mammalian chromosomes).
  • the choice of semi-degenerate primers is determined by the template length and size of array available; the primer binding probability may be adjusted to include a desired proportion of nucleotides in the subsequent amplicons.
  • the annealing temperature is determined empirically for each reaction. In this first single reaction, if a number of such primers (n) is used, this gives n(n+1)/2 types of amplicon with respect to amplicon ends.
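The n(n+1)/2 count follows from treating an amplicon's two ends as an unordered pair of primers, with both ends allowed to carry the same primer. A short enumeration, using hypothetical primer labels, confirms the formula:

```python
from itertools import combinations_with_replacement


def amplicon_end_types(primers):
    """Unordered pairs of primers (including identical ends) flanking an amplicon."""
    return list(combinations_with_replacement(primers, 2))


if __name__ == "__main__":
    primers = ["N1", "N2", "N3", "N4"]  # hypothetical labels for n = 4 primers
    types = amplicon_end_types(primers)
    n = len(primers)
    print(len(types), n * (n + 1) // 2)
```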
  • the semi-degenerate primers for step A are designed to a blueprint similar to the following:
  • 'N' sequences represent the specific 3' ends of the semi-degenerate primers.
  • A_seq and A_lab in a 1:1 proportion.
  • the 'N' sequences are designed to have similar annealing temperatures (the N sequences do not necessarily need to bind all possible nucleotides in the template but instead need to generate a proportion of all template within the resulting amplicons).
  • Primer A_seq will become incorporated as the sequencing end of amplicons and is biotinylated to allow its retrieval (above).
  • Primer A_lab will become incorporated as the labelling strand and is not biotinylated.
  • C regions represent 10-20mer tagging sequences common to all the primers in the reaction, C_Aa, C_Ab, etc. being different from each other, but which are specific to step A and occur in each primer during step A.
  • Tag_Axseq and Tag_Axlab represent 10-20mer tagging sequences which are different to each other and are specific to the primer with the 3' end of (N_x) and to step A.
  • (R)n represents a number of random nucleotides, inosine or a combination thereof.
  • an avidin/streptavidin capture step (or similar technique) is incorporated to eliminate A_lab-A_lab amplicons. Alternatively, this can be done after step B (see above).
  • reagents (i.e. primers, polymerase, dNTPs, etc.)
  • reaction product is divided into separate compartments suitable for polymerase reactions (e.g. Eppendorf tubes, PCR silicon chips, a specifically designed chamber with a facility to erect small water-tight separating walls to create temporarily a number of non-communicating individual compartments).
  • the second reaction step (step B) is designed to separately label each amplicon end such that the 'labelling end' is labelled with a tag sequence specific to the 'sequencing end' of the same amplicon. Subsequent hybridisation at the end of the sequencing experiment will then reveal that the random sequence in question occurred within an amplicon derived in step A from the two specific primer sequences encoded by the first two tag sequences. This is achieved by using each separate compartment to amplify only those amplicons with a specific (or a set of specific) 'sequencing ends'. This principle of the selective amplification of amplicons in each compartment is used in subsequent steps.
  • a typical pair of primers used in this step would be as follows:
  • Step B primers: B1) 5'-C ⁇ -Tag AlK -C Ac -(R) n -N,-3'; B2) 5'-C.-Tag B .-C ⁇ '
  • Primer B1 is designed to bind only to the 'sequencing end' of amplicons that contain the two sequences N x and Tag AlaB ⁇ I . Amplicons with other 'sequencing ends' are not amplified in compartment 1.
  • Primer B2 binds to the C ft , sequence of the labelling end of amplicons applying a tag sequence Tag B1 - that identifies this amplicon as having a N 2 sequence at the other end. It also applies a new step-specific C B sequence which will allow subsequent reactions to take place. After this step, two types of amplicon will be formed in each compartment with regard to amplicon ends:
  • B2 - B2 amplicons will not exist because amplicons without at least one 'sequencing end' will have been eliminated (as above) .
  • B1-B2 amplicons contain the sequence C ⁇ which will be required for the final step that generates tags for hybridisation to an array.
  • B1-B1 amplicons do not possess this sequence and so will not be included in the final sequence analysis.
  • B1-B1 amplicons will only occur when an amplicon has the same compartment-specific sequence at each end, a rarer event than B1-B2 amplicons, which require only that there is a compartment-specific sequence at one end.
  • B1-B2 amplicons will have at their ends the following sequences:
  • B2 ends: 5'-C B -Tag Bx -C Ab -Tag Aylab -C ⁇ -(R) n -N y -3'
  • the B2 end codes for the ends of the relevant amplicon from step A with the tags Tag Bx and Tag Aylab, which in this case code for the step A primers with the specific end sequences of N x and N y respectively.
  • the reactants from each compartment are evenly mixed and again divided into a number of compartments for reactions with a pair of compartment-specific primers.
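The split, tag and mix cycle of step B can be pictured with a small simulation. This is only a sketch under stated assumptions: the `Amplicon` class, the functions `run_compartment` and `split_tag_and_mix`, and the tag names are all hypothetical and do not come from the patent text. Each compartment keeps only amplicons whose sequencing end matches its specific primer and adds a tag to the labelling end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    """Hypothetical model of one amplicon (names are illustrative only)."""
    sequencing_end: str      # specific 3' sequence at the sequencing end, e.g. "N1"
    labelling_tags: tuple    # tags accumulated so far on the labelling end


def run_compartment(pool, end_sequence, tag):
    """Amplify only amplicons whose sequencing end matches this compartment,
    appending the compartment's tag to their labelling end."""
    return [
        Amplicon(a.sequencing_end, a.labelling_tags + (tag,))
        for a in pool
        if a.sequencing_end == end_sequence
    ]


def split_tag_and_mix(pool, compartments):
    """Divide the pool between compartments, react, then mix the products."""
    mixed = []
    for end_sequence, tag in compartments.items():
        mixed.extend(run_compartment(pool, end_sequence, tag))
    return mixed


# Three step A amplicons; only two sequencing ends have a step B compartment.
pool = [
    Amplicon("N1", ("TagA_1",)),
    Amplicon("N2", ("TagA_2",)),
    Amplicon("N3", ("TagA_3",)),
]
compartments = {"N1": "TagB_1", "N2": "TagB_2"}
tagged = split_tag_and_mix(pool, compartments)
for amplicon in tagged:
    print(amplicon)
```

In this toy model an amplicon with no matching compartment simply drops out, which mirrors the statement that amplicons with other sequencing ends are not amplified.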
  • Step C: Subsequent steps are concerned with identifying sequences within the B1-B2 amplicon.
  • the primers used are designed so that a hybridisation event occurring within this region generates a specific tag sequence at the labelling end.
  • An example, termed primer C2, is shown below:
  • domain C B binds to all 'labelling ends' (or B2 ends) of B1-B2 amplicons.
  • successful amplification only occurs if the other end has in it a sequence specific to the particular compartment.
  • primer C2 carries a tag sequence ag ⁇ - that codes for this compartment-specific sequence N x and hence adds it to the two other tag sequences at this 'labelling end' that identify the original step A primers for that amplicon.
  • Domain C c is common to all compartments in step C and will allow subsequent primer binding at the labelling end in the next step, step D.
  • a primer can be designed to interrogate the nucleotides immediately adjacent and upstream of the original step A primer sequence.
  • the primer could interrogate the three nucleotides -XXX- in the following B1 end:
  • domain C ⁇ , because it is unique to all B1 ends, only allows binding of primer C1 to the 'sequencing ends' of amplicons.
  • ⁇ ⁇ is a sequence of, for example, 3 nucleotides that are introduced as mismatches and allow for efficient primer hybridisation in the subsequent step.
  • (R) n spans all (R) n -N x - sequences of B1 ends so that all B1 ends are interrogated.
  • XXX is the compartment-specific sequence that interrogates the upstream three nucleotides and only allows subsequent amplification if this specific sequence is present (in this case only 1 in 64 such amplicons, on average, would be amplified).
  • If the sequences XXX were completely non-degenerate then, for complete sequencing, 64 compartments would be required. If XX non-degenerate sequences were used, 16 compartments would be required. Also, if an A/T-XX sequence was used, in which the first 5' nucleotide can be one of two bases and the other two are specific, 32 compartments would be required, and so on.
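The compartment counts quoted above follow from multiplying the number of choices at each interrogated position. A minimal sketch, assuming that a two-fold degenerate position is split into the classes A/T and C/G (the second class is an assumption, as the text names only A/T); the function name `compartment_patterns` is hypothetical.

```python
from itertools import product

# Choices available at one interrogated position.
NON_DEGENERATE = ["A", "C", "G", "T"]   # one specific base per compartment
TWO_FOLD = ["AT", "CG"]                 # assumed pairing for the A/T-XX case


def compartment_patterns(position_classes):
    """Return one pattern per compartment: every combination of the
    per-position choices."""
    return [tuple(combo) for combo in product(*position_classes)]


xxx = compartment_patterns([NON_DEGENERATE] * 3)
xx = compartment_patterns([NON_DEGENERATE] * 2)
at_xx = compartment_patterns([TWO_FOLD, NON_DEGENERATE, NON_DEGENERATE])

print(len(xxx))    # 64 compartments for XXX
print(len(xx))     # 16 compartments for XX
print(len(at_xx))  # 32 compartments for A/T-XX
```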
  • the advantage of this particular kind of primer C1, because of the presence of the domain C ⁇ , is that it allows the annealing temperature of primer C1 to easily match that of primer C2.
  • the main disadvantage is that only nucleotides directly adjacent to step A 3' primer sequences are interrogated and sequenced.
  • a C1 primer can be designed so that it binds anywhere within the amplicon sequence.
  • An example of such a primer is as follows:
  • This primer will bind wherever the specific sequence XXX occurs within an amplicon. However, the annealing temperature for efficient amplification with such a primer is likely to be low and not correspond with the annealing temperature of primer C2.
  • One way round this is to incorporate one or more 'anchor' non-random nucleotides immediately downstream from the sequence XXX, here shown as sequence (N) n , which are common to all compartments. This limits the sequences that will be interrogated but if n ⁇ 3 such a limitation is easily overcome by the fact that both strands of template are being sequenced and that further experiments with differing anchor sequences can also be performed on the same template.
  • a further way to overcome the likely different annealing temperatures of primers C1 and C2 is to use a number of reaction cycles with only the C2 primers at the appropriate annealing temperature, such that a number of single strands are generated before the addition of C1 primers and lowering of annealing temperature.
  • a primer of the C1(b) type will be used containing two anchor nucleotides -AG- as follows:
  • there will be two types of amplicons with respect to amplicon ends, amplicons C1-C2 and C1-C1.
  • the amplicon ends will be as follows:
  • In step D the amplicons are further interrogated with respect to the nucleotides 3' to the sequencing ends.
  • step D will be described here.
  • the number of subsequent steps will be determined by the size of the oligonucleotide array available for hybridisation, the detrimental effects of the accumulation of primers from previous steps (see below), and the mutation rate of the DNA polymerase. A larger number of steps increases the length of contiguous nucleotide sequence that is encoded in each sequence and also increases the outcome set for each experiment.
  • Step D occurs after the even mixing of all step C reactants and the subsequent division into compartments.
  • a pair of primers is used in each compartment, one each to bind to the sequencing and labelling ends.
  • An example of such a primer pair, used in compartment 1, is shown below:
  • D2 labels the 'labelling end' of each amplicon by virtue of the binding of domain -C c - and labels the amplicon with a tag specific to step D and compartment 1 encoding for the unique sequence XXX.
  • Primer D1 interrogates the 3 nucleotides 3' to those previously interrogated in step C.
  • the presence and binding of sequences Tag ⁇ e ,- and AG allows the annealing temperature to be increased to match primer D2. It also prevents internal binding within the amplicon and so only contiguous sequence is interrogated.
  • Sequence (YYY) Dn ⁇ w is a sequence of 3 nucleotides common to all compartments that is introduced as mismatches and allows for efficient primer hybridisation in the subsequent step.
  • step E might comprise the following two primers:
  • XXX being the compartment specific sequence used to interrogate amplicon sequence and corresponding to the tag sequence Tag E1 - on the labelling end.
  • a final PCR step is performed to amplify the tag sequences on the labelling ends of the amplicons. Only amplicons that have completed each step will be so amplified and used in the hybridisation step. If steps through from A to F have occurred the 'labelling end' of surviving amplicons will look as follows:
  • This sequence contains 6 different tag sequences that together encode for the original amplicon ends as well as a contiguous sequence of 12 nucleotides contained somewhere within that amplicon on the strand shared by sequence B2.
  • the outcome set for this reaction will be 64^6, that is approximately 6.9 x 10^10.
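The size of the outcome set, and the way four interrogation tags translate back into 12 contiguous nucleotides, can be checked with a short sketch. The mapping from a tag index to a triplet (alphabetical order of the 64 triplets) is purely an assumption for illustration; the patent text does not specify any such ordering, and the function names are hypothetical.

```python
from itertools import product

# Hypothetical lookup: tag index 0..63 -> interrogated triplet, alphabetical order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def outcome_set_size(values_per_tag=64, n_tags=6):
    """Number of distinct tag combinations on a labelling end."""
    return values_per_tag ** n_tags


def decode_contiguous(tag_indices):
    """Concatenate the triplets encoded by successive interrogation tags."""
    return "".join(TRIPLETS[i] for i in tag_indices)


size = outcome_set_size()
decoded = decode_contiguous([0, 27, 36, 63])
print(size)     # 68719476736
print(decoded)  # AAACGTGCATTT
```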
  • C AJ ' is complementary to the sequence C ⁇ .
  • This final step can be used to generate RNA using a T7 RNA polymerase promoter or a fluor-labelled single stranded DNA in asymmetric PCR for hybridisation to an array as described by Hacia et al and Shoemaker et al respectively (Nature Genet 1996; 14:441-449) .
  • the form of the final sequences generated for hybridisation will be as follows:
  • Each address on the array would have a nucleotide sequence of this form.
  • the set of (64 x 7) 448 Tag sequences will have been specifically designed so that they do not cross-hybridise to a complementary sequence of any other sequence within the set. This could easily be achieved using a subset of the 16384 possible 7mers for example.
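One simple way to picture choosing 448 tags from the 16384 possible 7mers is a greedy filter. The sketch below is not the patent's method: it uses the number of mismatched positions between two tags as a crude stand-in for cross-hybridisation, and the threshold of 2 mismatches and the function names are assumptions. A real design would also need to consider melting temperature and secondary structure.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_tags(n_tags=448, length=7, min_mismatches=2):
    """Greedily collect tags that differ from every tag already chosen
    in at least `min_mismatches` positions."""
    chosen = []
    for letters in product("ACGT", repeat=length):
        candidate = "".join(letters)
        if all(hamming(candidate, t) >= min_mismatches for t in chosen):
            chosen.append(candidate)
            if len(chosen) == n_tags:
                return chosen
    raise ValueError("not enough tags satisfy the constraint")


tags = pick_tags()
print(len(tags), "tags chosen out of", 4 ** 7, "possible 7mers")
print(tags[:4])
```

Raising `min_mismatches` makes the tags more distinct but shrinks the number that can be found, so the function raises an error rather than returning a short list.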
  • the C domains of the array can be designed to have a number of mismatches with respect to the C domains in the hybridisation sequences.
  • The problem of contaminating primers: one problem with this technique is the accumulation of primers in the reaction mixture from previous steps following step B (above).
  • One way to eliminate such primers is to purify the amplicons, using standard methods for purifying PCR products, after each step. This would, however, add a further manipulation to each step, and make automation of the whole experiment more complex.
  • One way to reduce the chances of this happening, other than purifying the products, is to incorporate a dilution step after step B, once the reactants have been mixed. The only component of the reactants to have increased in concentration will be the successfully amplified products from the previous step. Hence, by diluting the reactants, such amplicons would be the only species to survive in appreciable concentration. Further polymerase enzyme, dNTPs, buffer solution etc. will need to be added before the subsequent steps. Also, the number of reaction cycles for each step should be high enough to ensure an adequate concentration of amplicons after the dilution step. This process could be easily automated within a system specially designed for the repetitive mixing and dividing of reactants.
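The reasoning behind the dilution step can be illustrated with rough arithmetic. This is a sketch under strong assumptions: amplification is modelled as ideal exponential growth with no plateau, carried-over primers are assumed not to increase at all, and the concentrations (arbitrary units), cycle number and dilution factor are invented for illustration.

```python
def after_cycles_and_dilution(amplicon_conc, primer_conc, cycles,
                              dilution_factor, efficiency=1.0):
    """Return (amplicon, carried-over primer) concentrations after
    amplification followed by dilution. efficiency=1.0 means perfect doubling."""
    amplified = amplicon_conc * (1.0 + efficiency) ** cycles
    return amplified / dilution_factor, primer_conc / dilution_factor


# Illustrative numbers only: primers start in 1000-fold excess over amplicons.
amplicon, primer = after_cycles_and_dilution(
    amplicon_conc=1.0, primer_conc=1000.0, cycles=20, dilution_factor=1000.0
)
print(amplicon, primer)
```

With these made-up values the amplicons end up about a thousand-fold more concentrated than the carried-over primers, which is the effect the dilution step relies on.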
  • a further method to eliminate unwanted primers from previous reaction steps is to add oligonucleotides whose 3' ends are complementary to such primers (and not to the primers of the present reaction step) so that dimers can be formed and the 3' ends of the primers be 'neutralised' after polymerase elongation.
  • Non-contiguous repeated sequences in the template of, in the above example, 11 or more nucleotides can be unambiguously ordered, as each will occur in one or more different step A amplicons. If the elongation step of the step A reaction is restricted to the generation of amplicons of 3000 bases or fewer, then the probability of having 3 or more sequences of 11 or more nucleotides repeating in random DNA in the same amplicon is very small. Even if this did occur, the inclusion of one such repeat in another amplicon would allow unambiguous ordering.


Abstract

The invention relates to a method for analysing polynucleotide molecules. The method comprises the steps of: a) reacting a target polynucleotide with nucleotide primers, a polymerase, and the other reagents needed for a polymerase reaction, the nucleotide primers being chosen such that polymerase products of variable length can be generated; b) analysing the products of the above reaction or reactions, wherein the nucleotide primers comprise a set of semi-degenerate primers whose 3' ends comprise variations of one or more of the nucleotides A, T, G, and C, such that the set is complementary to the entire polynucleotide sequence.
EP98919324A 1997-04-28 1998-04-28 Sequencing of a polynucleotide using semi-degenerate primers Withdrawn EP0979307A2 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GB9708606 1997-04-28
GBGB9708606.0A GB9708606D0 (en) 1997-04-28 1997-04-28 Sequencing
PCT/GB1998/001233 WO1998049341A2 (fr) 1997-04-28 1998-04-28 Sequencing of a polynucleotide using semi-degenerate primers

Publications (1)

Publication Number Publication Date
EP0979307A2 true EP0979307A2 (fr) 2000-02-16

Family

ID=10811491

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98919324A Withdrawn EP0979307A2 (fr) 1997-04-28 1998-04-28 Sequencing of a polynucleotide using semi-degenerate primers

Country Status (4)

Country Link
EP (1) EP0979307A2 (fr)
AU (1) AU7220398A (fr)
GB (1) GB9708606D0 (fr)
WO (1) WO1998049341A2 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6322968B1 (en) 1997-11-21 2001-11-27 Orchid Biosciences, Inc. De novo or “universal” sequencing array
US6872521B1 (en) 1998-06-16 2005-03-29 Beckman Coulter, Inc. Polymerase signaling assay
AUPQ008799A0 (en) * 1999-04-30 1999-05-27 Tillett, Daniel Genome sequencing
JP2005218301A (ja) * 2002-03-06 2005-08-18 Takara Bio Inc Method for determining the base sequence of nucleic acids

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5962221A (en) * 1993-01-19 1999-10-05 Univ Tennessee Res Corp Oligonucleotide constructs and methods for the generation of sequence signatures from nucleic acids
US5604097A (en) * 1994-10-13 1997-02-18 Spectragen, Inc. Methods for sorting polynucleotides using oligonucleotide tags

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9849341A2 *

Also Published As

Publication number Publication date
WO1998049341A3 (fr) 1999-01-28
AU7220398A (en) 1998-11-24
WO1998049341A2 (fr) 1998-11-05
GB9708606D0 (en) 1997-06-18


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19991116

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Withdrawal date: 20011105