WO2010113031A2 - Method of altering nucleic acids - Google Patents

Method of altering nucleic acids

Info

Publication number
WO2010113031A2
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
sequence
type iis
nuclease
homology
Prior art date
Application number
PCT/IB2010/000893
Other languages
French (fr)
Other versions
WO2010113031A3 (en)
Inventor
Adrian Francis Stewart
Youming Zhang
Marcello Maresca
Harald Kranz
Stephan Noll
Original Assignee
Gene Bridges Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gene Bridges Gmbh
Publication of WO2010113031A2
Publication of WO2010113031A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22: Ribonucleases RNAses, DNAses
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102: Mutagenizing nucleic acids
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1082: Preparation or screening gene libraries by chromosomal integration of polynucleotide sequences, HR-, site-specific-recombination, transposons, viral vectors
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/87: Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90: Stable introduction of foreign DNA into chromosome
    • C12N15/902: Stable introduction of foreign DNA into chromosome using homologous recombination
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00: Nucleic acids vectors
    • C12N2800/80: Vectors containing sites for inducing double-stranded breaks, e.g. meganuclease restriction sites

Definitions

  • This application is related to a method for alteration of a nucleic acid molecule based on linked steps of recombination and endonuclease digestion.
  • nucleic acid molecules particularly DNA molecules
  • functional genomics (for review, see Vukmirovic and Tilghman, Nature 405 (2000), 820-822)
  • structural genomics (for review, see Skolnick et al., Nature Biotech 18 (2000), 283-287)
  • proteomics (for review, see Banks et al., Lancet 356 (2000), 1749-1756; Pandey and Mann, Nature 405 (2000), 837-846)
  • SDM: site-directed mutagenesis
  • two-step PCR, which encompasses the frequently applied 'Overlap PCR' technique
  • the final product is assembled through 2 rounds of PCR.
  • one or more PCR fragments are produced which are then assembled, or extended, using a differing combination of primers in a second round of PCR.
  • the product of the second round is then cloned into a vector using standard molecular biology techniques. Whilst this method is simple, and can be easily performed with commonplace techniques, it is disadvantaged by the increased mutational frequencies inherent with multiple PCR rounds.
  • a second set of methods are based on amplification of the whole target plasmid.
  • the first of these applies 'inverse PCR'.
  • the whole plasmid is amplified as a single linear fragment, wherein the desired mutations are introduced through the use of mutagenic primers.
  • the product may be digested with the DpnI enzyme to cleave methylated and hemimethylated DNA, i.e. the unmutated template DNA. If the DpnI digestion step is not complete, a proportion of the plasmids recovered are likely to be unmutated parent. However, completion of digestion cannot be confirmed easily.
  • a gel-extraction step may be performed, although this step is likely both to extend and complicate the process, and additionally contribute to substantial loss of the product.
  • the linear fragment is then ligated back to itself to reform a plasmid, before transformation into a host cell.
  • the fragment is treated with a kinase, or cleaved by a restriction enzyme that recognises a sequence at the termini to generate termini with phosphates attached.
  • Restriction enzymes are grouped into broad classes based on their characteristics. Most commonly applied to molecular engineering are the Type II enzymes. These enzymes are linked by their recognition of palindromic DNA sequences, and cleavage within this sequence. Cleavage is defined herein as the action by the restriction endonuclease which causes the introduction of a break into the nucleic acid backbone. In other words, it causes a break in the covalent bond between sugar and phosphate groups in the nucleic acid backbone. The cleavage of both strands of a double-stranded nucleic acid results in the dissociation of the nucleic acid sequences on either side of the cleavage.
  • Type IIS restriction enzymes, for example the much-studied enzyme FokI.
  • These endonucleases recognise asymmetric sequences (herein termed the "recognition sequence") and cleave outside of this recognition sequence (at what is herein termed the "cleavage point" or "point of cleavage").
  • the point of cleavage may be the same on each strand of the DNA helix to leave a blunt end, or may be recessed on one strand relative to the other, to leave a 5' or 3' overhang.
  • the distance between recognition sequence and cleavage points varies with the enzyme, but distances of from 1 up to 20 nucleotides have been recorded (Szybalski et al. Gene 20 (1991) 13-26).
  • Type IIS enzymes are not generally applicable in routine molecular cloning. This is because these enzymes cut outside of their defined recognition sequence, and therefore the termini created by digestion do not have a particular sequence. In other words, the 5' or 3' overhangs left following cleavage by these enzymes can be of any sequence at all.
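
The point above, that the overhang left by a Type IIS enzyme is set by the flanking DNA rather than by the recognition sequence, can be sketched in Python. This is a minimal illustration and not part of the original disclosure: the function name `type_iis_overhang` and the example sequences are hypothetical, and the default parameters assume the commonly quoted BsaI specification GGTCTC(1/5), i.e. a top-strand cut 1 nt and a bottom-strand cut 5 nt downstream of the site. Only a forward-oriented site on the top strand is handled.

```python
def type_iis_overhang(seq, site="GGTCTC", top_offset=1, bottom_offset=5):
    """Return the single-stranded region (top-strand letters, 5'->3')
    produced downstream of a forward-oriented Type IIS site.

    Assumption: offsets count nucleotides past the last base of the site,
    as in the usual GGTCTC(1/5) notation for BsaI.
    """
    start = seq.find(site)
    if start < 0:
        raise ValueError("recognition sequence not found")
    site_end = start + len(site)
    top_cut = site_end + top_offset
    bottom_cut = site_end + bottom_offset
    if max(top_cut, bottom_cut) > len(seq):
        raise ValueError("cleavage point lies beyond the end of the sequence")
    # The bases between the two staggered cuts form the overhang.
    return seq[min(top_cut, bottom_cut):max(top_cut, bottom_cut)]


# Same recognition sequence, different flanking DNA, different overhang.
print(type_iis_overhang("AAGGTCTCAAGCTTTT"))
print(type_iis_overhang("AAGGTCTCTCCGATT"))
```

Because the overhang is read from whatever lies next to the site, a designer can choose it freely, which is the property the method described later exploits.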
  • Type IIS enzymes have been applied in specific situations, which by appropriate design, took advantage of their ability to cleave outside of the recognition sequence. For example, these enzymes have been applied in the construction of coding sequences without extraneous nucleotides resulting from the use of a standard restriction enzyme recognition sequence. Examples of this use include domain swapping experiments, translational fusion tagging, reporter gene fusions and also mutagenesis studies.
  • Type IIS restriction enzymes have been applied in protocols which amplify the whole backbone of the plasmid by PCR. Primers with Type IIS recognition sequences at their 5' end followed by the desired mutation, and a 3' region for annealing to the template were designed. By amplifying the plasmid with these primers, subsequent digestion with a Type IIS enzyme allowed seamless re-ligation of the mutated linear vector, i.e. without the inclusion of unwanted sequences. This approach is exemplified by the Phoenix Mutagenesis system (Shigaki and Hirshi Anal. Biochem. (2001) 298: 118-120). As discussed earlier, methods of this nature have a severe drawback resulting from PCR of the whole fragment, in that there is ample opportunity for errors in DNA replication to occur during the in vitro amplification.
  • a further method that amplifies the plasmid backbone is the "Quick Change" PCR mutagenesis method (Stratagene).
  • This kit is the most popular method for SDM.
  • two complementary primers which also contain the desired mutations, are used to prime extension around the plasmid, from the point of annealing.
  • all amplification is from the parent molecule, which significantly reduces unwanted mutations.
  • the product is then treated with DpnI, as detailed above, and in doing so suffers from the same disadvantages detailed above.
  • this method produces double-stranded PCR products with complementary single-stranded regions at their termini, so no restriction enzyme cleavage/kinase/ligation steps are required for re-circularisation prior to transformation into the host cell.
  • a further limitation of this method stems from the fact that the PCR reactions in which the mutant sequences are generated are primed by synthetic oligonucleotides. Should it be desired to introduce multiple mutations, the maximum distance between the mutations which can be achieved in a single "Quick Change" reaction is limited by current oligonucleotide synthesis technologies, that is, a distance of 100-200 nucleotides. Furthermore, longer oligonucleotide syntheses are more prone to produce mutant products. This reinforces the need to sequence the products to ensure that the desired, and only the desired, mutations have been incorporated into the target.
  • RecA-dependent recombination pathway which is responsible for the majority of recombinogenic processes in the bacterial cell.
  • a second recombination pathway is the RecF-pathway.
  • recombination requires the expression of both components of the RecE/RecT protein pair, or of its functional homologues derived from the lambda phage, Redα/Redβ.
  • ET recombination uses the RecE/RecT protein pair (or its functionally homologous pair Redα/Redβ) for precise DNA engineering (Zhang et al., Nature Genet 20 (1998), 123-128; Muyrers et al., Nucl Acids Res 27 (1999), 1555-1557; International patent application WO99/29837; for review see Muyrers et al., Trends Bioch Sci (2001) 26(5): 325-331).
  • This system is enormously powerful and may be used to introduce substitutions, deletions and insertions into nucleic acid molecules, as desired.
  • Recombinogenic mutagenesis has been used to incorporate selectable markers into the target DNA in conjunction with the desired mutation.
  • selectable markers are often left in the targeted construct following recombination.
  • methods have been developed to incorporate sites for further site-specific recombinases (e.g. Bloor & Cranenburgh, Appl. Env. Micro. (2006) 2520-2525). By designing these sites in the correct orientation relative to one another, the selectable marker can be removed from the construct by the action of the further recombinase.
  • Methods for altering the DNA sequence of a target nucleic acid to date have only applied recombineering approaches on their own, or PCR-based approaches in conjunction with cleavage of the PCR product at the termini with Type IIS enzymes to allow seamless re-ligation. It is the object of the current invention to provide a method for the swift and simple alteration of target nucleic acid molecules, which by virtue of these properties is amenable to high-throughput applications, using an inventive combination of recombineering and endonuclease restriction to effect seamless mutagenesis of target molecules.
  • a method for altering the sequence of a target nucleic acid molecule comprising the steps of: a) introducing a nucleic acid fragment into the target nucleic acid molecule by homologous recombination, wherein the nucleic acid fragment comprises: i) a first region of homology to the target nucleic acid molecule; ii) a first recognition sequence for a Type IIS nuclease; iii) a selectable marker; iv) a second recognition sequence for a Type IIS nuclease; v) a second region of homology to the target nucleic acid molecule; wherein components i) to v) are ordered from 5' to 3'; and a replacement nucleic acid sequence positioned between the first and second regions of homology but flanking the recognition sequences for the Type IIS nucleases; b) selecting for the incorporation of the replacement nucleic acid using the selectable marker; and c) cleaving the product
  • R2S: This method is herein termed R2S.
  • Methods according to the invention are swift, simple, efficient and amenable to high-throughput methodologies.
  • the applications described herein are very suited to plasmid engineering in prokaryotes and eukaryotes, although they are also applicable to engineering large molecules and even genomes.
  • Type IIS nucleases are characterized by their ability to cleave at a short distance away from their binding site, rather than within the binding site as usual for restriction enzymes. Because Type IIS nucleases cleave outside their binding sites in flanking DNA, a repeat of a Type IIS site may be cut by the cognate enzyme, leaving DNA ends that can, by simple design, have complementary, single-stranded extensions that can anneal and through efficient ligation establish covalently closed plasmids (or episomes) for further propagation. In a preferred embodiment the ligated product is then transformed into a host cell.
  • the method can be used for site-directed mutagenesis of a target nucleic acid molecule.
  • the linear target fragment product of step c) is ligated to generate the desired replacement sequence.
  • the site-directed mutagenesis is based on the selection of a focal sequence in the target nucleic acid molecule (such as a plasmid or episome) that will be subject to mutagenesis.
  • this focal sequence can be any 1 to 5 bp sequence, according to which Type IIS enzyme is used - here in Figure 1 it is shown as 4 bp, specifically AGCT. It will be understood by the skilled addressee that the focal sequence may be of any length, and designed such that it is appropriate to the strategy employed.
  • a nucleic acid fragment containing a gene for selection is introduced by homologous recombination.
  • the nucleic acid fragment includes, in order from 5' to 3', a first region of homology to the target nucleic acid molecule, also termed a "homology arm"; a first recognition sequence for a Type IIS nuclease; a selectable marker; a second recognition sequence for the Type IIS nuclease; and a second region of homology to the target nucleic acid molecule.
  • the two homology arms flank the focal sequence in the target nucleic acid molecule.
  • the nucleic acid fragment includes a replacement nucleic acid sequence that is positioned between the first and second regions of homology.
  • the replacement sequence must flank, i.e. lie outside, the recognition sequences for the Type IIS nucleases.
  • the replacement nucleic acid molecule includes two focal sequences. These focal sequences form the replacement sequence to be inserted into the target nucleic acid and are also the cleavage points for the Type IIS nuclease(s).
  • a first portion of the replacement sequence is present in the first focal sequence.
  • a second portion of the replacement sequence is present in the second focal sequence.
  • the length of the focal sequence will depend on the mutagenesis strategy being employed. For example, a focal sequence may be very short in the case where site directed mutagenesis is concerned, and is preferably 1-4 bp.
  • the focal sequences should be designed such that they are cleaved with the Type IIS nucleases to leave compatible termini in the product of step c) of the method of the first aspect of the invention that can re-ligate to form a circular molecule.
  • the replacement sequence will not be made up of the two focal sequences in their entirety, as a portion of each focal sequence will be excised by the action of the Type IIS nuclease(s).
  • the portion of one focal sequence that forms part of the replacement sequence is the portion that extends from the homology region to the cleavage point of the enzyme within that focal sequence, including any region of single-strandedness, as a result of the cleavage by the Type IIS nuclease(s).
  • the recombined target nucleic acid, for example an episome in Figure 1, is purified, and optionally, it can be retransformed to isolate a pure plasmid preparation.
  • step b) the recombined target nucleic acid is digested with the Type IIS nuclease (BsaI in this case) and re-ligated.
  • the Type IIS nuclease can be any type IIS derived restriction enzyme or zinc-finger endonuclease.
  • the two recognition sites for the chosen Type IIS enzyme are placed in inverted orientation so that cleavage at both sites separates the fragment carrying the two recognition sites and including the selection marker from the target nucleic acid molecule.
  • the recognition sequences of the Type IIS nuclease may be in the same orientation or inverted.
  • by 'in the same orientation' is meant that, if the cleavage of the nucleic acid is 5' to the first recognition sequence then it is also 5' to the second recognition sequence.
  • likewise, if the cleavage of the nucleic acid is 3' to the first recognition sequence then it is also 3' to the second recognition sequence.
  • by 'inverted' is meant that, if the cleavage of the strand is 5' to the first recognition sequence then it is 3' to the second recognition sequence, or vice versa.
  • the recognition sequences are inverted.
  • Particularly preferred is an inverted repeat of the recognition sequence which directs cleavage by the Type IIS nuclease(s) such that the recognition sites are located on the same nucleic acid digestion product as the selectable marker.
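
The orientation definitions above can be checked mechanically. The following Python sketch is illustrative only: the helper names (`revcomp`, `site_hits`, `classify_orientation`) and the test sequences are hypothetical, and BsaI's GGTCTC is assumed as the recognition sequence. A site read on the top strand is marked "+" (cleavage 3' of the site on that strand), and a site read on the bottom strand is marked "-" (cleavage 5' of the site on the top strand).

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def site_hits(seq, site):
    """List (position, strand) for each occurrence of `site` on either strand."""
    rc = revcomp(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


def classify_orientation(seq, site="GGTCTC"):
    """Return 'same' or 'inverted' for a sequence carrying exactly two sites."""
    hits = site_hits(seq, site)
    if len(hits) != 2:
        raise ValueError("expected exactly two recognition sequences")
    return "same" if hits[0][1] == hits[1][1] else "inverted"


# Sites pointing away from each other: both stay on the excised middle fragment.
print(classify_orientation("AAAAGAGACCTTTTGGTCTCAAAA"))
# Two sites read on the same strand.
print(classify_orientation("AAAAGGTCTCTTTTGGTCTCAAAA"))
```

In the first example the "-" site cuts to its left and the "+" site cuts to its right, which corresponds to the preferred arrangement in which both recognition sites leave with the selectable marker.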
  • the product of this reaction is transformed to propagate the desired nucleic acid molecule.
  • the transformation can include an optional selection for the loss of rpsL by selection for streptomycin resistance.
  • the obtained product will be the mutated vector.
  • a single point mutation is illustrated, however the same approach can be applied to mutate any of the base pairs that fall within the single-stranded region created by the cleavage of the Type IIS nuclease.
  • mutations may comprise substitutions, insertions or deletions.
  • the number of base pairs changed may range from a single point mutation, to multiple substitutions, insertions or deletions, for example, 2, 3, 5, 10, 20, 50 or more base pairs.
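
The digestion and re-ligation step described above can be sketched as a small simulation. This is a hedged illustration under stated assumptions, not the method of the disclosure itself: the sequences, the marker placeholder and the function name `excise_and_religate` are hypothetical; BsaI is assumed to cut GGTCTC(1/5); only the top strand of a linear stretch is modelled; and the two recognition sequences are assumed to be inverted so that both cuts fall outside the cassette.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def excise_and_religate(seq, site="GGTCTC", top_offset=1, bottom_offset=5):
    """Remove the cassette between an inverted pair of Type IIS sites and
    join the flanks if their overhangs match (top-strand model)."""
    left_site = seq.find(revcomp(site))  # site read on the bottom strand
    right_site = seq.find(site, left_site + len(site)) if left_site >= 0 else -1
    if left_site < 0 or right_site < 0:
        raise ValueError("expected an inverted pair of recognition sequences")
    # Left site cuts upstream of itself; coordinates are mirrored.
    left_cut_top = left_site - bottom_offset
    left_cut_bottom = left_site - top_offset
    # Right site cuts downstream of itself.
    right_end = right_site + len(site)
    right_cut_top = right_end + top_offset
    right_cut_bottom = right_end + bottom_offset
    if left_cut_top < 0 or right_cut_bottom > len(seq):
        raise ValueError("cleavage point lies outside the sequence")
    left_overhang = seq[left_cut_top:left_cut_bottom]
    right_overhang = seq[right_cut_top:right_cut_bottom]
    if left_overhang != right_overhang:
        raise ValueError("termini are not compatible for re-ligation")
    # The shared overhang appears once in the religated product.
    return seq[:left_cut_top] + seq[right_cut_top:]


# Hypothetical recombinant: the focal sequence AGCT has been replaced by AGTT
# on both sides of the cassette (spacer N = 'A', marker shown as a placeholder).
marker = "CATCATCATCAT"
construct = "ATGCCC" + "AGTT" + "A" + "GAGACC" + marker + "GGTCTC" + "A" + "AGTT" + "GGGTAA"
print(excise_and_religate(construct))
```

The religated product carries the altered focal sequence once and retains neither the marker nor the recognition sequences, which is the seamless outcome described in the text.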
  • a nucleic acid fragment is introduced into the target nucleic acid molecule by homologous recombination.
  • the most suitable homologous recombination technique for use in these methods employs the Red operon from phage lambda and is commonly termed 'recombineering' (see Zhang et al., Nature Genet 20 (1998), 123-128; Muyrers et al., Nucl Acids Res 27 (1999), 1555-1557; co-pending International patent application WO99/29837; co-pending European patent application EP1399546; also co-pending International application filed on 20th February 2009 and entitled "Method of nucleic acid recombination"); the contents of all these documents are hereby incorporated by reference.
  • Recombineering uses the Redβ protein, or combinations of proteins comprising the Redβ protein, for example the Redα, Redβ and Redγ proteins, to mediate homologous recombination.
  • Suitable phage annealing proteins for use in the invention include RecT (from the rac prophage), Redβ (from phage λ), and Erf (from phage P22).
  • RecT from the rac prophage
  • Redβ from phage λ
  • Erf from phage P22
  • the identification of the recT gene was originally reported by Hall et al., (J. Bacteriol. 175 (1993), 277-287).
  • the RecT protein is known to be similar to the λ bacteriophage β protein or Redβ (Hall et al. (1993); Muniyappa and Radding, J.Biol.Chem.
  • Erf protein is described by Poteete and Fenton, (J Mol Biol 163 (1983), 257-275) and references therein. Erf is functionally similar to Redβ and RecT (Murphy et al., J Mol Biol 194 (1987), 105-117), and in some cases can substitute for the lambda phage recombination system (Poteete and Fenton, Genetics 134 (1993), 1013-1021).
  • the invention also includes the use of functional equivalents of the molecules that are explicitly identified above as RecT, Redβ and Erf, provided that the functional equivalents retain the ability to mediate recombination, as described herein and in European patent application EP1399546.
  • Such functional equivalents include homologues of elements of recombination systems that are present in bacteriophages, including but not limited to large DNA phages, T4 phage, T7 phage, small DNA phages, isometric phages, filamentous DNA phages, RNA phages, Mu phage, P1 phage, defective phages and phagelike objects, as well as the functional homologues of elements of recombination systems that are present in viruses (e.g.
  • a selection step needs to be included to ensure high efficiency.
  • the selection step should be based on the insertion of an antibiotic or other gene so that the product can be distinguished from substrate.
  • the selectable gene remains at the site of the mutagenesis and must be removed, as the insertion of a gene is incompatible with most genetic engineering strategies, for example, SDM, which requires only a very small change at the site of mutagenesis.
  • any selectable marker may be used, whether it confers resistance or sensitivity, causes fluorescence, and so on.
  • the selectable marker may be an antibiotic resistance marker.
  • the selectable marker may be an enzyme which complements an auxotrophy.
  • the selectable marker may be an enzyme which produces an essential nutrient. In a host which lacks the ability to produce this essential nutrient, only those cells containing the selectable marker will be able to grow in media which lack the nutrient.
  • auxotrophic markers are well known in the art. A non-limiting list of examples includes ura3, pyrG, niaD and trpC.
  • Incorporating a marker ensures simple screening because targets incorporating the introduced nucleic acids can be identified by phenotypic screening, using selective growth media, rather than by sequence or indeed size, on a gel.
  • Use of a fluorescence marker may be particularly applicable for high-throughput methodologies. By using such a marker it will be possible to isolate cells containing the desired product by fluorescence-activated cell sorting (FACS).
  • the introduced nucleic acid molecule may further comprise a counterselectable marker between the two Type IIS nuclease recognition sites.
  • a selection pressure for the absence of both markers can be exerted. For example, applying a counterselection pressure during the growth of the host cell into which the religated product of step c) is transformed will ensure that only clones which lack the marker grow. In this case surviving clones will be those which have been digested by the Type IIS restriction enzyme and then re-ligated such that the selectable marker, the counterselectable marker, and the Type IIS recognition sequences are no longer present.
  • selection for the absence of a counterselectable marker permits the simple and inexpensive screening of numerous clones, as opposed to the considerably more onerous and expensive screening methods involving further PCR and sequencing reactions.
  • counterselectable markers include rpsL, which renders sensitive those E. coli hosts that are naturally resistant to streptomycin. Its removal thus restores resistance.
  • SacB: Another counterselectable gene is SacB, which makes sucrose toxic to the host.
  • a primer region may be included for PCR amplification of a selectable gene.
  • the introduced nucleic acid fragment should possess at least two regions of sequence homology (homology arms) with regions of sequence on the target nucleic acid molecule.
  • homology is meant that when the sequences of the introduced and target nucleic acid molecules are aligned, there are a number of nucleotide residues that are identical between the sequences at equivalent positions. Degrees of homology can be readily calculated (Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing. Informatics and Genome
  • Such regions of homology are preferably at least 9 nucleotides each, more preferably at least 15 nucleotides each, more preferably at least 20 nucleotides each, even more preferably at least 30 nucleotides each.
  • Particularly efficient recombination events may be effected using longer regions of homology, such as 50 nucleotides or more.
  • the regions of sequence homology may be located on the introduced nucleic acid fragment so that one region of homology is at one end of the molecule and the other is at the other end. However, one or both of the regions of homology may also be located internally.
  • the two sequence homology regions should thus be tailored to the requirements of each particular experiment. There are no particular limitations relating to the position for the two sequence homology regions located on the target DNA molecule, except that for circular double-stranded DNA molecules, the repair recombination event should not abolish the capacity to replicate.
  • the sequence homology regions can be interrupted by non-identical sequence regions, provided that sufficient sequence homology is retained to allow the repair recombination reaction to occur.
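
The homology criteria above (identical residues at equivalent positions, and the preferred minimum arm lengths of 9, 15, 20, 30 and 50 nucleotides) can be expressed as two small helpers. This is a simple sketch under assumptions: the function names and tier labels are hypothetical, the two sequences are taken to be already aligned and of equal length, and no gap handling or alignment algorithm is implied.

```python
def identity_fraction(aligned_a, aligned_b):
    """Fraction of equivalent positions carrying identical nucleotides.

    Assumes the two sequences are pre-aligned and of equal, non-zero length.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return matches / len(aligned_a)


def arm_length_tier(length):
    """Map a homology arm length (nt) to the preference tiers stated in the text."""
    tiers = [
        (50, "50 nt or more (particularly efficient)"),
        (30, "at least 30 nt (even more preferred)"),
        (20, "at least 20 nt (more preferred)"),
        (15, "at least 15 nt (more preferred)"),
        (9, "at least 9 nt (preferred minimum)"),
    ]
    for minimum, label in tiers:
        if length >= minimum:
            return label
    return "below the stated minimum of 9 nt"


print(identity_fraction("ACGTACGT", "ACGTACGA"))
print(arm_length_tier(35))
```

An arm interrupted by a few non-identical positions would still score a high identity fraction here, in line with the statement that interruptions are tolerated provided sufficient homology is retained; the threshold for "sufficient" is not specified in the text and is left to the user.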
  • the introduced nucleic acid molecule should also include at least two recognition sites for Type IIS nucleases.
  • a recognition sequence is a nucleic acid DNA sequence to which the nuclease binds during cleavage of its cognate target nucleic acid.
  • the first and second Type IIS recognition sequences in the introduced nucleic acid are recognised by the same Type IIS restriction nuclease.
  • the Type IIS recognition sequences are recognised by differing Type IIS restriction nucleases.
  • Type IIS nucleases include Type IIS restriction enzymes, Type IIS derived zinc-finger endonucleases and any other Type IIS derived enzyme.
  • a list of some currently available Type IIS restriction enzymes can be found at www.neb.com.
  • the invention also includes the use of functional equivalents of the molecules that are explicitly identified above as Type IIS, provided that the functional equivalents retain the ability to recognise a recognition sequence and cleave outside of this sequence. Such functional equivalents can be found in the Type III and Type IV classes of restriction enzymes.
  • fragments of the Type IIS restriction enzymes such as truncated variants, and fusion proteins of which the sequence of a Type IIS restriction enzyme forms a part, that retain the ability to recognise a recognition sequence and cleave outside of this sequence. It is considered that the identification of such functional equivalents is within the ability of the skilled addressee.
  • Type IIS enzyme variants that have been optimised and/or evolved, through, for example, DNA shuffling (Stemmer, W.P. Nature 370, 389-91 (1994)), or Substrate-linked directed evolution (SLiDE, GB 0029375.3).
  • Such functional variants may be evolved such that their recognition sequence is changed (Doyon et al. J. Am. Chem. Soc (2006) 128: 2477-2484).
  • Type IIS enzymes created from other proteins, for example by directed evolution or as a fusion protein, to generate a Type IIS enzyme that can recognize long sequences.
  • An example of such a sequence is the recognition sequence for the site-specific recombinase Cre, whose recognition sequence is called loxP and is 32 bps long.
  • Cre recombinase engineered to become a Type IIS enzyme would serve a useful purpose as a very rare cutting instrument, since such enzymes would allow the invention to be applied to whole genomes, since the introduced recognition sites would be unique.
  • very rare cutting Type IIS enzymes have already been developed based on combinations of zinc fingers, and these are also incorporated by reference herein (e.g.
  • the recognition sequence of the Type IIS nuclease enzyme should be designed such that when the enzyme cleaves it does so within the focal sequence.
  • Type IIS restriction endonuclease recognition sequences are not palindromic.
  • the orientation of the recognition sequence therefore determines the direction in which the cut is made by the endonuclease.
  • the enzyme cleaves within the adjacent focal sequence.
  • the distance between the recognition sequence and the cleavage points is 1-20 nucleotides, for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides.
  • the distance between the recognition sequence and the cleavage point is 1-4 nucleotides. In an alternative, this distance may be more than 20 nucleotides.
  • the distance between the recognition sequence and cleavage point is the distance in nucleotides between the closest nucleotide of the recognition sequence (which is not counted) and the nucleotide prior to the cleavage point. If the distance between the recognition sequence and cleavage point is the same for both strands of the nucleic acid, then digestion with a Type IIS enzyme will result in blunt ended termini. If the distance between the recognition sequence and cleavage point varies then termini with a 5' or 3' overhang will be produced, depending upon which strand has the longer distance between the recognition sequence and cleavage point.
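
The rule above relating the two strand distances to the kind of terminus produced can be written as a short function. This is a sketch under an assumed convention, not a statement from the source: distances are counted downstream of the recognition sequence on the strand where the site is read 5' to 3' ("top") and on its complement ("bottom"), as in the usual N(top/bottom) notation, so that BsaI is (1/5) and FokI is (9/13). The function name `end_type` is hypothetical.

```python
def end_type(top_distance, bottom_distance):
    """Return (kind, overhang_length) for a Type IIS cut.

    Equal distances give blunt termini. Under the assumed convention, a longer
    bottom-strand distance leaves a 5' overhang and a longer top-strand
    distance leaves a 3' overhang.
    """
    if top_distance < 0 or bottom_distance < 0:
        raise ValueError("distances must be non-negative")
    length = abs(top_distance - bottom_distance)
    if length == 0:
        return ("blunt", 0)
    kind = "5' overhang" if bottom_distance > top_distance else "3' overhang"
    return (kind, length)


print(end_type(1, 5))    # BsaI-like spacing
print(end_type(9, 13))   # FokI-like spacing
print(end_type(3, 3))    # equal distances on both strands
```

The overhang length is simply the difference between the two distances, which is why enzymes with a four-nucleotide stagger suit focal sequences of up to 4 bp.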
  • a nucleic acid fragment is introduced by homologous recombination into a target nucleic acid molecule as described above, which fragment includes first and second regions of homology to the target nucleic acid molecule, around a focal sequence.
  • the fragment also includes first and second recognition sequences for a Type IIS nuclease, in inverted orientation, and a selectable marker.
  • two replacement nucleic acid sequences matching portions of the focal sequence are positioned at each end of the molecule between the first and second regions of homology and flanking the recognition sequences for the Type IIS nuclease.
  • a cleavage at a specified nucleotide or between any two specified nucleotides can be achieved without adding any extra nucleotides to the ends after cleavage.
  • Cleavage by the Type IIS nuclease can generate two ends that are compatible for religation. Alternatively, it can generate ends that are not compatible for religation: the single-stranded regions produced by cleavage cannot anneal to one another, so the ends cannot be ligated to each other to form a circular molecule.
  • one terminus may have a 3' overhang, and the other a 5' overhang; one may have a 3' overhang, and the other be blunt; one may have a 5' overhang, and the other be blunt; both may have 5' or 3' overhangs, but of non-complementary sequence such that they are incapable of annealing to one another.
  • the precise format of the termini will depend on the difference in distance between the recognition sequence and the cleavage point, as the skilled person will understand. Accordingly, the target fragment includes the desired replacement sequence but is linearised.
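The compatibility cases listed above can be expressed as a small check. This is a hedged sketch: the representation of a terminus as a `(kind, overhang)` pair and the function name `can_religate` are assumptions made for illustration, with each overhang written 5' to 3'.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# A terminus is modelled as (kind, overhang), where kind is "5'", "3'" or
# "blunt" and overhang is the single-stranded sequence written 5'->3'
# (empty for a blunt end). This representation is hypothetical.
Terminus = Tuple[str, str]


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_religate(a: Terminus, b: Terminus) -> bool:
    """Return True if the two termini could anneal and be ligated together."""
    kind_a, seq_a = a
    kind_b, seq_b = b
    if kind_a != kind_b:
        # 5' with 3', or an overhang with a blunt end: incompatible.
        return False
    if kind_a == "blunt":
        return True
    # Same polarity: the overhangs must be complementary to anneal.
    return seq_a.upper() == reverse_complement(seq_b)


print(can_religate(("5'", "GGAA"), ("5'", "TTCC")))
print(can_religate(("5'", "GGAA"), ("5'", "GGAA")))
print(can_religate(("3'", "ACGT"), ("blunt", "")))
```

A linearised vector whose two termini fail this check cannot recircularise on its own, which is the property exploited for directional cloning.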
  • Such a molecule might conveniently be a vector, but could be a larger molecule, even a genome, as long as the Type IIS nuclease used does not recognize any other sequence in the molecule or genome.
  • This variation has particular applicability to the linearization of BACs (bacterial artificial chromosomes) at chosen sites and the further use of these linearized BACs in complex recombination exercises.
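Whether a chosen Type IIS recognition sequence is absent from the rest of a vector, BAC or genome can be checked computationally before the construct is designed. The sketch below is illustrative only; `count_sites` is a hypothetical helper, and the sequences in the usage lines are invented. Because Type IIS sites are not palindromic, both orientations are searched.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_sites(molecule: str, site: str, circular: bool = True) -> int:
    """Count recognition sites on both strands of a molecule."""
    seq = molecule.upper()
    site = site.upper()
    if circular:
        # Extend the sequence so that sites spanning the origin are found.
        seq += seq[: len(site) - 1]
    total = count_occurrences(seq, site)
    rc = reverse_complement(site)
    if rc != site:
        # Non-palindromic site: the opposite orientation is a distinct match.
        total += count_occurrences(seq, rc)
    return total


# Invented example sequences; GGTCTC is used purely as an example site.
print(count_sites("AAGGTCTCAAATTT", "GGTCTC"))
print(count_sites("AAGGTCTCAAAGAGACCTT", "GGTCTC"))
```

A count of zero in the unmodified target means that the only sites present after recombination are those carried by the introduced fragment.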
  • the efficiency of cloning can be increased by applying a selection pressure against a counter selectable marker excised during the digestion by the Type IIS nucleases. Such selection would be far preferred to gel extraction, or CsCl gradient purification for the exclusion of undigested vector in the cloning product.
  • a vector linearized by this method is used for cloning nucleic acids, such as DNA, for example DNA encoding short hairpin RNAs (shRNAs).
  • the vector which is linearised has first and second regions of homology which comprise or consist of microRNA (miRNA, or miR) sequences. This is particularly useful when the linearized vector is used to clone shRNAs.
  • the first and second regions of homology may comprise or consist of sequences from miR30.
  • the vector may include a promoter.
  • the first and second regions of homology comprising or consisting of miRNA sequences are designed such that the sequences are removed from the shRNA following microRNA processing of the expressed construct.
  • This approach was employed in Silva et al., 2005, Second-generation shRNA libraries covering the mouse and human genomes, Nature Genetics, 37: 1281-8.
  • a system for cloning shRNAs into a vector comprising miRNA sequences, linearized according to this method of the invention, is shown in Figure 5 and detailed in example 5. This method is particularly advantageous because it leaves no scar in the shRNA or miRNA sequences. Such scars may have detrimental effects on the secondary structure of the transcribed RNA.
  • a linearized vector is used for cloning a double stranded DNA formed from annealed oligonucleotides and having overhanging ends, wherein the ends of the double stranded DNA are compatible with the ends of the linearized vector.
  • the double stranded DNA to be inserted into the linearized vector may be prepared by any of the methods known in the art, for example restriction enzyme digestion out of another vector, a PCR fragment, which may then be digested with a restriction enzyme, and also double stranded DNAs prepared by the annealing of complementary single stranded nucleic acids.
  • the double stranded DNA is formed by complementary oligonucleotides and encodes a shRNA.
  • the double stranded nucleic acid may then be inserted into the linearized vector using methods known in the art, such as ligation using a DNA ligase, for example T4 DNA ligase.
  • a further embodiment of the method of the invention provides a simple variation to achieve small insertions and replacements. All steps are the same as illustrated above except the inserted nucleic acid fragment is differently configured to suit the experimental purpose. This methodology is illustrated in Figure 2.
  • the sequence to be inserted or replaced (here denoted as seq X) is ultimately positioned adjacent to a focal sequence in the target nucleic acid molecule.
  • the nucleic acid fragment that is introduced in step a) of the method includes, at either end, adjacent to and internal to the homology arms, two focal sequences that match the focal sequence in the target nucleic acid molecule.
  • the inserted sequence represents the replacement nucleic acid sequence, and is positioned between the first and second regions of homology but external to the recognition sequences for the Type IIS nuclease.
  • the inserted sequence can be any sequence of any length although most often it will be less than 1000 bps for convenience, for example, about 2, 5, 10, 20, 50, 100, 200 or 500.
  • the obtained product will be a vector containing the inserted sequence in any chosen place without any neighbouring selectable gene or any other operational sequence.
  • the inserted sequence can be any sequence that can be introduced to the left (5'), or right (3'), or both (5' and 3'), of the focal sequence.
  • the inserted sequence can be one, or more than one, codon(s) (that can replace codon(s) in the target nucleic acid molecule). Also possible are larger in-frame insertions which can be accomplished with the insertion of larger lengths of nucleic acid. It will generally be preferred that the inserted length of nucleic acid contains a number of nucleotides which is divisible by three, such that the reading frame is maintained. In one preferred embodiment, the inserted sequence is 3 bp long and in-frame within a coding region. Hence this method encompasses codon-based mutagenesis, which is superior to single nucleotide mutagenesis for the mutation of protein coding regions.
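The divisible-by-three requirement for in-frame insertions can be illustrated with a short check. This is a sketch under stated assumptions: `insert_in_frame` is a hypothetical helper that models only the sequence outcome, not the recombination steps, and the example sequences are invented.

```python
def insert_in_frame(coding_sequence: str, codon_index: int, insert: str) -> str:
    """Insert whole codons before the codon at codon_index (0-based).

    Raises ValueError if the insertion would shift the reading frame.
    """
    if len(coding_sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    if len(insert) == 0 or len(insert) % 3 != 0:
        raise ValueError("insert length must be a non-zero multiple of three")
    position = codon_index * 3
    if not 0 <= position <= len(coding_sequence):
        raise ValueError("codon index outside the coding sequence")
    return coding_sequence[:position] + insert + coding_sequence[position:]


# Invented example: add one codon (GCT) after the start codon.
print(insert_in_frame("ATGAAATAA", 1, "GCT"))
```

Restricting the insert to whole codons is what keeps every downstream codon unchanged, which is the basis of the codon-based mutagenesis described above.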
  • Using sequence homology arms that span regions of non-identical sequence compared to the target nucleic acid molecule, mutations such as substitutions (for example, point mutations), insertions and/or deletions may be introduced into the target nucleic acid molecule.
  • the inserted sequence can be a loxP site or any other site-specific recombinase site; it can also be a short tag or a restriction site. Other examples will be clear to those of skill in the art.
  • a further embodiment of the invention provides a method suitable for larger insertions (illustrated in Figure 3).
  • the introduced nucleic acid fragment is comprised of two or more components.
  • “Triple recombination” is an embodiment of the method of this aspect of the invention where the introduced nucleic acid is itself made from two fragments.
  • “Quadruple recombination” is an embodiment of the method of this aspect of the invention where the introduced nucleic acid is itself made from three fragments. The use of triple and quadruple recombination is particularly applicable to the current invention.
  • the introduced nucleic acid fragment is made from two separate fragments in vivo.
  • each component includes a region of homology to the target nucleic acid molecule.
  • Each component also includes a focal sequence which matches the focal sequence in the target nucleic acid molecule.
  • At least one of the components includes a selectable marker.
  • At least one of the components, preferably only one, includes a replacement nucleic acid sequence in the form of an inserted sequence.
  • both components contain a recognition sequence for a Type IIS nuclease.
  • one component may contain two recognition sequences for Type IIS nucleases.
  • both components include a region of mutual homology that allows the components to undergo homologous recombination to knit together to form a single nucleic acid fragment for introduction into the target nucleic acid molecule.
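When designing the two components for triple recombination, the region of mutual homology can be verified in silico. The sketch below is an assumption-laden illustration: it treats the mutual homology as the 3' end of the first component matching the 5' end of the second on the same strand, and the function name, the minimum length parameter and the sequences are all hypothetical.

```python
def mutual_homology(first: str, second: str, min_length: int = 20) -> str:
    """Return the longest suffix of `first` that is also a prefix of `second`.

    An empty string is returned when no shared region of at least
    min_length nucleotides exists.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    a, b = first.upper(), second.upper()
    # Try the longest possible overlap first and shorten until a match is found.
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a[-length:] == b[:length]:
            return a[-length:]
    return ""


# Invented example with a deliberately short overlap.
print(mutual_homology("AAACCCGGG", "CCGGGTTT", min_length=4))
```

In practice the minimum length would be set to whatever homology length the recombination system in use requires; the default of 20 here is only a placeholder.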
  • the introduced nucleic acid fragment is made from three separate fragments in vivo.
  • a first fragment comprises the selectable marker flanked on either side by two annealing regions which are not capable of annealing to the target nucleic acid.
  • a second fragment comprises a first region of homology capable of annealing with a first sequence on the target nucleic acid, and a second region of homology capable of annealing with a first annealing region on the first fragment.
  • a third fragment comprises a first region of homology capable of annealing with a second sequence of the target nucleic acid, and a second region of homology capable of annealing with a second annealing region on the first fragment.
  • the second or third fragment may comprise the coding sequence of a gene.
  • the second and/or third fragments may comprise a partial coding sequence of a gene.
  • the second and/or third fragments may comprise a promoter element.
  • the second and third fragments may comprise partial coding sequences of the same gene such that following excision of the selectable marker by the Type IIS restriction endonuclease digestion, should the termini of the linear molecule be re-ligated the entire coding sequence of the gene is reconstituted.
  • Embodiments of the invention that exploit triple and quadruple recombination may incorporate methods of terminal adaptation, as detailed in co-pending application filed on
  • the appropriate strands of each fragment can be preferentially degraded such that the efficiency of the recombineering step is maximised.
  • the preferentially degraded strands are the strands of the first and third fragments that are not capable of annealing to the lagging strand of the target nucleic acid at a replication fork. It is preferable that the retained strand of the second nucleic acid fragment is the strand which can anneal to the sequence of the preferentially degraded first and third fragments.
  • a method, herein termed Unique Site Elimination (USE), for altering the sequence of a nucleic acid molecule comprising the steps of a) introducing a nucleic acid fragment into the target nucleic acid molecule by homologous recombination, wherein the nucleic acid fragment comprises: i) a first region of homology to the target nucleic acid molecule; ii) a recognition sequence for a nuclease; iii) a selectable marker; and iv) a second region of homology to the target nucleic acid molecule; and b) selecting for the incorporation of the nucleic acid fragment using the selectable marker; c) introducing into the selected product of step b) a replacement nucleic acid fragment comprising, from 5' to 3' : i) a first region of homology to the product of step b); ii) a replacement nucleic acid sequence; iii) a second region of
  • the unique restriction site elimination method allows for the efficient and extensive modification of plasmids and episomes irrespective of size or sequence requirements.
  • An example of the USE method is set out in Figure 6.
  • the key concept is based on the introduction of a selectable gene into an episome by homologous recombination, which simultaneously introduces a restriction enzyme site that is ideally absent from the episome, and that will thereby be unique in the episome.
  • a second homologous recombination step is then made which will eliminate the selectable gene and simultaneously the unique restriction site, but which introduces the desired alteration in the target nucleic acid molecule (e.g. substitution, insertion or deletion).
  • the fragment introduced into the episome in the second homologous recombination step c) need not carry a selectable gene, and ideally will not. The fragment introduced in the first step a) may carry a counterselectable gene in some embodiments, although this is not essential to the method.
  • after the second homologous recombination step, all episomes are harvested. These will include a small number of desired products from the second step c) and a large number of episomes that were not successfully recombined in this second step. The episomes are then digested with the nuclease corresponding to the unique site introduced in the first step a).
  • the nucleic acid fragment introduced in step a) requires a first region of homology to the target nucleic acid molecule; a recognition sequence for a nuclease; a selectable marker; and a second region of homology to the target nucleic acid molecule.
  • the regions of sequence homology may be located on the replacement fragment so that one region of homology is at one end of the molecule and the other is at the other end. However, one or both of the regions of homology may also be located internally. The two sequence homology regions should thus be tailored to the requirements of each particular experiment.
  • sequence homology regions can be interrupted by non-identical sequence regions, provided that sufficient sequence homology is retained to allow the repair recombination reaction to occur.
  • Using sequence homology arms that span regions of non-identical sequence compared to the target nucleic acid molecule, mutations such as substitutions (for example, point mutations), insertions and/or deletions may thus be introduced into the target nucleic acid molecule.
  • the nucleic acid fragment introduced in step a) should include a recognition sequence for a nuclease enzyme.
  • this recognition sequence does not occur elsewhere in the target nucleic acid, so that this recognition sequence is unique.
  • digestion will only occur in those instances where the replacement nucleic acid in step c) has not successfully integrated.
  • the invention is still operable by exploiting very short periods of nuclease digestion, so that partial digestion occurs and the desired products of step c) can still be isolated.
  • a Type III restriction endonuclease or functional equivalent may be employed.
  • Type III enzymes require two copies of the recognition sequence to be present in a single molecule for cleavage to occur. Accordingly, two recognition sequences may be placed in the inserted fragment if there are no occurrences of the recognition sequence in the target molecule. Further, if one recognition sequence is already present in the target molecule, incorporation of a further recognition sequence for the Type III enzyme in the introduced nucleic acid means that cleavage will occur in those molecules which contain both sites, such that only the desired products of step c) are isolated.
  • this recognition sequence has the advantage that digestion with this enzyme in step d) of the method cleaves solely those nucleic acid molecules that have been altered by recombineering by the introduced nucleic acid fragment introduced in step a), but not by the replacement nucleic acid fragment introduced in step c).
  • those nucleic acids which remain uncleaved, i.e. the desired product of the method, will form a vast excess of the product that is recovered.
  • the replacement nucleic acid must not contain the recognition sequence present in the introduced nucleic acid.
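The selection logic of the digestion step can be sketched as a simple filter over a harvested episome population. This is an illustrative model only: the function names, the episome labels and the sequences are invented, and real digestion is of course not a perfect all-or-nothing filter.

```python
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def has_site(episome: str, site: str) -> bool:
    """True if a circular episome carries the site in either orientation."""
    site = site.upper()
    seq = episome.upper()
    seq += seq[: len(site) - 1]  # cover sites that span the origin
    reverse = site.translate(COMPLEMENT)[::-1]
    return site in seq or reverse in seq


def recover_uncleaved(episomes: Dict[str, str], site: str) -> List[str]:
    """Return the names of episomes that would survive digestion intact."""
    return [name for name, seq in episomes.items() if not has_site(seq, site)]


# Invented population: only the product of the second recombination step
# has lost the unique site introduced in the first step.
harvested = {
    "step_b_product": "AAAGGTCTCAAACCC",
    "step_c_product": "AAACCCTTTAAACCC",
}
print(recover_uncleaved(harvested, "GGTCTC"))

# The replacement fragment itself must also be free of the site.
print(has_site("CCCTTTAAA", "GGTCTC"))
```

Cleaved episomes are linear and transform poorly, so the uncleaved molecules returned by this filter correspond to the products that dominate the recovered population.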
  • the USE method is not limited to use with Type IIS enzymes only. Any nuclease may be used that has a cognate recognition sequence in nucleic acid, including restriction enzymes of all types. Rare cutters will be generally preferred, because these nucleases are unlikely to recognise sequences elsewhere in the target nucleic acid and it is therefore easier to introduce a recognition sequence that is unique. Accordingly, the Type IIS nucleases (including Type IIS restriction enzymes, Type IIS derived zinc-finger endonucleases and any other Type IIS derived enzyme) are preferred. A restriction enzyme may be used which is not sensitive to the methylation state of the recognition sequence.
  • the replacement nucleic acid fragment introduced in step c) should be designed such that it comprises homology arms that span a portion of the target nucleic acid sequence which it is desired to mutate.
  • the constituent nucleotide components of a nucleic acid molecule are thus changed in some way. Examples of alterations include the insertion, deletion or substitution of one or more constituent nucleotides in the target nucleic acid molecule, such as the introduction of a point mutation or creation of altered protein reading frames.
  • the replacement nucleic acid sequence may thus contain a substitution, insertion or deletion from the original target nucleic acid molecule and thus it is this sequence which defines the alteration in sequence which is desired.
  • the number of base pairs substituted may range from a single point mutation, to multiple substitutions, insertions or deletions, for example, 2, 3, 5, 10, 20, 50 or more base pairs. Insertions or deletions may also comprise single or a small number of base pair additions or deletions, for example, 2, 3, 5, 10, 20, 50 or more base pairs; however, larger insertions and deletions of 100, 200, 500, 1000 base pairs or more may conveniently be inserted or removed, or even larger regions of many kilobase pairs. The inserted or deleted sequence will often be less than 1000 bps for convenience, for example, about 2, 5, 10, 20, 50, 100, 200 or 500.
  • the insertion inserted by the method is a small insertion, for example of three nucleotides.
  • This embodiment is particularly preferred if the insertion is in a coding sequence. In this case this insertion will add a further codon to the coding sequence of a protein.
  • Larger in-frame insertions can be accomplished with the insertion of larger lengths of nucleic acid, provided that the inserted length of nucleic acid contains a number of nucleotides which is divisible by three, such that the reading frame is maintained. Concerted combinations of insertions, deletions, and substitutions are also possible.
  • There is no restriction on the type of alteration event to which the present application is applied, although the most obvious applications include those which are extremely difficult or time consuming using approaches that are currently available. In particular, the alteration may be one which is not amenable to high-throughput methodologies using current techniques. Examples include the precise modification of endogenous nucleic acid molecules in any species, such as yeast chromosomes, mouse embryonic stem cell chromosomes, C. elegans chromosomes, Arabidopsis and Drosophila chromosomes, human cell lines, viruses and parasites, or exogenous molecules such as plasmids, yeast artificial chromosomes (YACs) and human artificial chromosomes (HACs).
  • the homology arms must span these sequences in the target nucleic acid which is produced in step b). This is evident, for example, from the illustration of an exemplary system according to this embodiment of the invention which is illustrated in Figure 6. It will be within the ambit of the skilled reader, imbued with knowledge of the present invention, to design constructs for use in accordance with the present invention.
  • the introduced nucleic acid fragments may be circular or linear, but are preferably linear DNA or RNA molecules, either double-stranded or single-stranded.
  • DNA is generally preferred.
  • Preferred nucleic acids thus include single-stranded DNA or RNA, in either orientation, 5' or 3'.
  • Annealed oligonucleotides may also be used, either with blunt ends, or possessing 5' or 3' overhangs.
  • single-stranded oligonucleotides are used.
  • single-stranded deoxyribonucleotides are used.
  • Introduced nucleic acid molecules carrying a synthetic modification can also be used.
  • the introduced nucleic acid fragments do not necessarily represent a single species of nucleic acid molecule.
  • a heterogeneous population of nucleic acid molecules may be used, for example, to generate a DNA library, such as a genomic or cDNA library.
  • A number of different types of target nucleic acid molecule may be used in the method of the invention. Accordingly, intact circular double-stranded nucleic acid molecules (DNA and RNA), such as plasmids, and other extrachromosomal DNA molecules based on cosmid, P1, BAC or PAC vector technology may be used as the target nucleic acid molecule according to the invention described above. Examples of such vectors are described, for example, by Sambrook and Russell (Molecular Cloning, Third Edition (2000), Cold Spring Harbor Laboratory Press) and Ioannou et al. (Nature Genet. 6 (1994), 84-89) and the references cited therein.
  • the target nucleic acid molecule may also be a host cell chromosome, such as, for example, the E. coli chromosome.
  • a eukaryotic host cell chromosome for example, from yeast, C. elegans, Drosophila, mouse or human
  • eukaryotic extrachromosomal DNA molecule such as a plasmid, YAC and HAC
  • the target nucleic acid molecule need not be circular, but may be linear.
  • the target nucleic acid molecule is a double-stranded nucleic acid molecule, more preferably, a double-stranded DNA molecule.
  • the method of the invention may be effected, in part, in a host.
  • Suitable hosts include cells of many species, such as prokaryotes and eukaryotes, and also including viruses and parasites, although bacteria, such as gram negative bacteria, are a preferred host.
  • the host cell is an enterobacterial cell, such as a Salmonella, Klebsiella, Bacillus, Neisseria or Escherichia coli cell (the method of the invention works effectively in all strains of E. coli that have been tested so far).
  • the method of the present invention is also suitable for use in eukaryotic cells or organisms, such as fungi, plant or animal cells, as well as viral and parasitic cells and organisms.
  • the system has been demonstrated to function well in ES cells, specifically mouse ES cells, and there is no reason to suppose that it will not also be functional in other eukaryotic cells.
  • the method of the invention may comprise the contacting of the introduced and target nucleic acid molecules in vivo.
  • the introduced nucleic acid molecule may be transformed into a host cell that already harbours the target nucleic acid molecule.
  • the introduced and target nucleic acid molecules may be mixed together in vitro before their co-transformation into the host cell.
  • one or both of the species of nucleic acid molecule may be introduced into the host cell by any means, such as by transfection, transduction, transformation, electroporation and so on. For bacterial cells, a preferred method of transformation or cotransformation is electroporation.
  • the homologous recombination of the method is initiated entirely in vitro, without the participation of host cells or the cellular recombination machinery.
  • Phage annealing proteins such as RecT are able to form complexes in vitro between the protein itself, an oligonucleotide molecule and a double-stranded nucleic acid molecule (Noirot and Kolodner, J Biol Chem 273 (1998), 12274-12280).
  • an example of such a complex is that formed between RecT, a ssDNA oligonucleotide and an intact circular plasmid.
  • joint molecules consisting, in this example, of the plasmid and the ssDNA oligonucleotide.
  • Such joint molecules have been found to be stable after removal of the phage annealing protein. The formation of stable joint molecules has been found to be dependent on the existence of shared homology regions between the ssDNA oligonucleotide and the plasmid.
  • the methods of the invention rely on recombination events that involve the replacement of a section of target nucleic acid for an equivalent section of introduced nucleic acid, to which the introduced fragment is directed through the existence of shared regions of sequence homology between the two molecule types.
  • the introduced nucleic acid becomes covalently attached to the target nucleic acid.
  • the sequence information in the introduced nucleic acid molecule becomes integrated into the target nucleic acid molecule in a precise and specific manner, and with a high degree of fidelity.
  • the efficiency of this step when coupled with a selection step, is high, and allows the simple manipulation of sequences.
  • the nucleic acid molecule fragments used to replace target sequence may be single-stranded.
  • This single-stranded nucleic acid may be generated in vivo or in vitro. In other words the single-stranded nucleic acid may be generated in a host cell.
  • the generation of the single-stranded replacement nucleic acid from the double-stranded nucleic acid substrate prior to recombination may be mediated by any suitable means.
  • the double-stranded nucleic acid substrate may be adapted such that one strand is preferentially degraded entirely to leave the other strand as the single-stranded replacement nucleic acid (see co-pending International application filed on 20th February 2009 and entitled "Method of nucleic acid recombination").
  • the degradation is preferably mediated by an exonuclease.
  • the exonuclease may be a 3' to 5' exonuclease but is preferably a 5' to 3' exonuclease.
  • the 5' to 3' exonuclease is Red alpha (Kovall, R. and Matthews, B.W., 1997, Science, 277, 1824-1827; Carter, D.M. and Radding, C.M., 1971, J. Biol. Chem., 246, 2502-2512; Little, J.W., 1967, J. Biol. Chem., 242, 679-686) or a functional equivalent thereof.
  • the exonuclease is RecBCD.
  • the single-stranded replacement nucleic acid is generated from the double-stranded nucleic acid substrate by a helicase.
  • the helicase separates the dsDNA substrate into two single-stranded nucleic acids, one of which is the single-stranded replacement nucleic acid.
  • the helicase may be either a 5'-3' or 3'-5' helicase.
  • the helicase is RecBCD whilst it is inhibited by Red gamma.
  • the helicase is any helicase of the RecQ, RecG or DnaB classes.
  • the single-stranded replacement nucleic acid generated by the helicase is preferentially stabilised relative to the other single-stranded nucleic acid generated by the helicase.
  • the step of generating the single-stranded replacement nucleic acid from the double-stranded nucleic acid substrate is carried out in a host cell in which the recombination occurs.
  • the step of generating the single-stranded replacement nucleic acid may be carried out in a separate host cell from the host cell in which the recombination occurs and may then be transferred to the host cell in which recombination occurs by any suitable means, for example, by transduction, transfection or electroporation.
  • the step of generating the single-stranded nucleic acid from the double-stranded nucleic acid substrate may be carried out in vitro.
  • the requirement in the host cell in which recombination takes place for Red alpha or an alternative enzyme that preferentially degrades one strand of the double-stranded nucleic acid substrate, or which separates the two strands, may be bypassed by providing the single-stranded nucleic acid to the host cell.
  • adapting one or both 5' ends of the double-stranded nucleic acid increases the yield of the single-stranded nucleic acid.
  • this increase in yield is due to the effect of adapting the 5' end(s) on the enzymes that act to generate the single-stranded nucleic acid.
  • the double-stranded nucleic acid substrate is adapted so that it is asymmetric at its 5' ends.
  • the asymmetry preferably causes one strand to be preferentially degraded. This preferably results in the other strand being maintained and so the production of a single-stranded nucleic acid is favoured, thereby improving the yield of the single-stranded nucleic acid.
  • the method of the invention preferably utilises a double-stranded nucleic acid substrate having asymmetry at its 5' ends wherein the method is conducted in the presence of Red alpha and/or a helicase and in the presence of Red beta.
  • Red gamma is preferably also present as Red gamma inhibits RecBCD, which degrades double-stranded DNA.
  • Red-mediated homologous recombination employs a double-stranded nucleic acid substrate that is adapted to have asymmetric 5' ends in the presence of Red beta and Red gamma, without Red alpha.
  • a less efficient but still operable way to engineer DNA using Red-mediated homologous recombination employs a double-stranded nucleic acid substrate that is adapted to have asymmetric 5' ends in the presence of Red beta, without Red gamma (or a functional equivalent thereof) and without Red alpha (or a functional equivalent thereof). Such a method is also encompassed within the scope of the invention.
  • any suitable method of making a double-stranded nucleic acid substrate asymmetric such that one strand is preferentially degraded whilst the other is maintained is envisaged by the present invention.
  • the asymmetry may be conferred, for example, by one or more features present in only one strand of the double-stranded nucleic acid substrate or by one or more features present in both strands of the double-stranded nucleic acid substrate, wherein different features are present in different strands.
  • the asymmetry is present at or in close proximity to the 5' ends of the two strands of the double-stranded nucleic acid substrate, most preferably at the 5' ends.
  • the asymmetry is preferably present at the 5' end of the 5' identity regions of the double-stranded nucleic acid substrate, or may be present in a region 5' of the 5' identity regions.
  • the "identity regions" of the double-stranded nucleic acid substrate correspond to the regions of the single-stranded nucleic acid that are identical to sequence on the target nucleic acid, or are complementary thereto.
  • the double-stranded nucleic acid substrate may have one or more features at or in close proximity to the 5' end of one of its strands but not at or in close proximity to the 5' end of the other strand which make it asymmetric.
  • the asymmetry is conferred by a modification to the nucleic acid sequence.
  • the modification affects the progression of exonuclease, preferably a 5'-3' exonuclease, preferably Red alpha exonuclease, on one strand but does not affect the progression of the exonuclease on the other strand.
  • the modification may inhibit the progression of exonuclease on one strand such that the exonuclease preferentially degrades the other strand.
  • by "inhibit the progression of exonuclease" is meant that the modification inhibits the progression of the exonuclease on that strand relative to the other strand, for example, by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, most preferably 100%.
  • the modification may be the inclusion of a blocking DNA sequence, such as the Red alpha exonuclease pause sequence, more preferably, the Red alpha pentanucleotide pause sequence GGCGA, more preferably GGCGATTCT, more preferably, the left lambda cohesive end, also called the cos site (Perkins TT, Dalal RV, Mitsis PG, Block SM. Sequence-dependent pausing of single lambda exonuclease molecules. Science 301: 1914-8).
  • the Red alpha exonuclease pause site may, for example, be placed at or in close proximity to the 5' end of one strand but not at or in close proximity to the 5' end of the other strand.
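Whether a designed substrate carries the GGCGA pause sequence near the 5' end of only one strand can be checked with a short script. This is a sketch, not a validated design tool: the 20-nucleotide window used to stand for "close proximity", the function names and the example sequences are all assumptions made for illustration.

```python
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")
PAUSE_SITE = "GGCGA"  # pentanucleotide pause sequence named in the text


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def pause_site_near_5_prime(top_strand: str, window: int = 20) -> Dict[str, bool]:
    """Report, per strand, whether the pause site lies within `window`
    nucleotides of that strand's 5' end. Both strands are read 5'->3'."""
    top = top_strand.upper()
    bottom = reverse_complement(top)
    return {
        "top": PAUSE_SITE in top[:window],
        "bottom": PAUSE_SITE in bottom[:window],
    }


def is_asymmetric(top_strand: str, window: int = 20) -> bool:
    """True if exactly one strand carries the pause site near its 5' end."""
    found = pause_site_near_5_prime(top_strand, window)
    return found["top"] != found["bottom"]


# Invented substrate with the extended pause sequence at the top-strand 5' end.
substrate = "GGCGATTCT" + "ACGT" * 10
print(pause_site_near_5_prime(substrate))
print(is_asymmetric(substrate))
```

The check only reports where the sequence occurs; which strand is then retained depends on the exonuclease behaviour described in the surrounding text.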
  • the modification prevents the exonuclease from binding to one strand of the double-stranded nucleic acid substrate such that only the other strand is degraded. In a further preferred embodiment, the modification does not prevent the exonuclease from binding but blocks it from degrading one strand or both of the double-stranded nucleic acid substrate such that the strand that will anneal to the lagging strand template is stabilized upon separation from the dsDNA substrate by a helicase.
  • the modification may promote the progression of exonuclease, preferably of 5'-3' exonuclease, more preferably Red alpha exonuclease, on one strand such that the exonuclease preferentially degrades that strand relative to the other strand.
  • by "promote the progression of exonuclease" is meant that the modification promotes the progression of exonuclease activity on that strand relative to the other strand, for example, by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 100%, at least 200%, at least 300%, or at least 400%.
  • the modification may serve to preferentially stabilise one strand, for example, by preventing an exonuclease or endonuclease from binding to that strand.
  • the modification prevents exonuclease degradation of both strands such that one strand is protected and can be released from the other by the action of a helicase.
  • the modification is one or more covalent modifications.
  • the covalent modification is present at or in close proximity to the 5' end of one strand but is not present at or in close proximity to the 5' end of the other strand. More preferably, the covalent modification is present at the 5' end of one strand but is not present at the 5' end of the other strand.
  • Preferred covalent modifications are the presence of a replacement nucleotide, such as the presence of a hydroxyl group or a phosphothioester bond. Such covalent modifications disfavour the action of exonucleases.
  • the covalent modification is preferably selected from one or more of the following:
  • the phosphothioate(s) is present in place of the 5'-most bond in the 5' identity region, or are present in place of the first two bonds, or are present in place of up to each of the first six (e.g. 3, 4, 5, 6) or more bonds in the 5' identity region; one or more phosphoacetates in place of one or more phosphodiester bonds.
  • the phosphoacetate(s) is present in place of the 5'-most bond in the 5' identity region, or are present in place of the first two bonds, or are present in place of up to each of the first six (e.g.
  • the one or more locked nucleotides are present in place of the first nucleotide in the 5' identity region, or are present in place of the first two nucleotides, or in place of up to the first six (e.g. 3, 4, 5 or 6) nucleotides; • a hydroxyl group.
  • the 5' most nucleotide of the substrate is also the 5' most nucleotide of the region that is identical to sequence on the target nucleic acid and the hydroxyl group is at the 5' end of this region of sequence identity; a 5' protruding end.
  • the covalent modification may be 2 or more protruding nucleotides, 4 or more protruding nucleotides, 6 or more protruding nucleotides, preferably 11 or more protruding nucleotides, preferably a 5' end containing the Red alpha pause sequence, preferably the left lambda cohesive end known as cos;
  • any other covalent adduct that renders resistance to 5'-3' exonucleases may be modified to contain an attached adduct such as biotin, digoxigenin or a fluorophore such as FITC.
  • the covalent modification is preferably selected from one or more of the following:
  • the stretch of DNA sequence may be, for example, 1-29 bps in length, more preferably 30-99 bps in length, more preferably 100-999 bps in length, even more preferably more than 1 kb in length;
  • a double-stranded nucleic acid substrate for the production of a single-stranded nucleic acid that contains one or more covalent modifications that protect the 5' end of the strand to be maintained and also one or more covalent modifications that render the other strand of the double-stranded nucleic acid substrate sensitive to 5'-3' exonucleases.
  • the double-stranded nucleic acid substrate may lack the 5' phosphate (i.e. presence of hydroxyl) on one strand whilst the other strand comprises the 5' phosphate.
  • the double-stranded nucleic acid substrate is adapted such that it comprises a 5' phosphothioate at one of its 5' ends but not at the other 5' end. Any other chemical modification at or near the 5' end which inhibits or promotes exonuclease progression or blocks exonuclease binding is also encompassed within the scope of the invention.
  • the asymmetry may be caused by the double-stranded nucleic acid substrate having different extensions of single-strandedness; that is different combinations of 5' protruding, blunt (or "flush") or 3' protruding ends.
  • the double-stranded nucleic acid substrate may have only one 5' protruding end, only one 3' protruding end and/or only one blunt end.
  • the asymmetry may be created by restriction cleavage to create different ends on the nucleic acid substrate. Restriction enzymes leave either 5' protruding, blunt or 3' protruding ends. The 5' protruding ends are least favoured for Red alpha digestion.
  • the double-stranded nucleic acid substrate preferably has only one 5' protruding end.
  • each strand of the double-stranded nucleic acid substrate is a continuous nucleic acid strand.
  • the asymmetry may alternatively be caused by the double-stranded nucleic acid substrate having different extensions of double-strandedness.
  • one end may have no additional nucleic acid sequence beyond the end of the identity region, and the other may have additional non-identical sequences.
  • the additional non-identical sequences may be as short as 4 base pairs, however, preferably will be longer than 10 base pairs, and more preferably longer than 100 base pairs.
  • homologous recombination may also occur in the absence of the Red alpha exonuclease when a double-stranded nucleic acid substrate is exposed to a target nucleic acid under conditions suitable for recombination to occur. It is hypothesised that a helicase acts to separate the two strands of the double-stranded nucleic acid substrate and that the strand that is the single-stranded nucleic acid is then available for use in homologous recombination.
  • the double-stranded nucleic acid is symmetrically adapted at both of its 5' ends.
  • the double-stranded nucleic acid substrate is covalently modified at both of its 5' ends. Particularly preferred is the use of a double-stranded nucleic acid substrate in which both 5' ends are covalently modified with a biotin molecule, or more preferably, with a phosphothioate.
  • the recombination is carried out in the absence of Red alpha.
  • the invention also envisages using a helicase to generate the single-stranded nucleic acid from a double-stranded nucleic acid substrate that has 5' asymmetric ends, as described above.
  • the substrate may be dephosphorylated with alkaline phosphatase, and then cleaved with a second restriction enzyme.
  • restriction enzymes usually leave phosphates on the 5' end, this will generate an asymmetrically phosphorylated substrate.
  • Two oligonucleotides may be designed for use as the terminal identity regions as is usual for a recombineering exercise. These oligonucleotides may be chemically synthesized so that their 5' ends are different with respect to the presence of a replacement nucleotide at or in close proximity to the 5' end. These oligonucleotides can be used, for example, for oligonucleotide-directed mutagenesis after annealing, or PCR on templates to create the asymmetrically ended double-stranded nucleic acid substrate or mixed with standard double-stranded nucleic acid cassettes and co-introduced into a host for 'quadruple' recombination.
  • the double-stranded nucleic acid substrate may be made by any suitable method. For example, it may be generated by PCR techniques or may be made from two single-stranded nucleic acids that anneal to each other.
  • the double-stranded nucleic acid substrate may in particular be generated by long range PCR. Long range PCR has been used in the art to generate double-stranded fragments, for example of up to 50 kb (Cheng et al. (1994) Proc Natl Acad Sci 91: 5695-5699).
  • the 5' ends of one or both of the primers used in this long range PCR may be adapted so that the PCR product is suitable for use as the double-stranded nucleic acid substrate in the methods of the invention.
  • a preferred embodiment for the invention is to perform the homologous recombination in a host cell mutated for exonucleases, specifically an E. coli host mutated for sbcB.
  • the invention also provides a method comprising performing the homologous recombination in a host cell in which the activity of its endogenous sbcB exonuclease, or the orthologue or functional equivalent thereof, has been inactivated or reduced.
  • the host cell is E. coli.
  • SbcB or its orthologue or functional equivalent may be inactivated or the activity thereof may be reduced by way of a mutation.
  • the mutation inactivates the SbcB or its orthologue.
  • Any suitable mutation is envisaged, for example, a deletion, insertion or substitution.
  • the entire gene encoding the exonuclease may be deleted or one or more point mutations may be used to inactivate the SbcB or its orthologue.
  • the exonuclease may be inactivated in any other appropriate way, for example, by gene silencing techniques, by the use of exonuclease-specific antagonists or by degradation of the exonuclease.
  • Methods which utilise the mutant of the SbcB/orthologue/functional equivalent described above may be a method according to the present invention. Also provided is the use of the SbcB mutants (and corresponding orthologues/functional equivalents) in broader aspects of homologous recombination technology.
  • a method of altering the sequence of a target nucleic acid comprising (a) bringing a first nucleic acid molecule into contact with a target nucleic acid molecule in the presence of a phage annealing protein, or a functional equivalent or fragment thereof, wherein said first nucleic acid molecule comprises at least two regions of shared sequence homology with the target nucleic acid molecule, under conditions suitable for repair recombination to occur between said first and second nucleic acid molecules and wherein the functional equivalent or fragment retains the ability to mediate recombination and wherein the activity of the host's endogenous sbcB exonuclease or orthologue or functional equivalent thereof has been inactivated or reduced; and (b) selecting a target nucleic acid molecule whose sequence has been altered so as to include sequence from said first nucleic acid molecule.
  • the phage annealing protein is Red beta or a functional equivalent thereof. The method may be carried out in the absence
  • the target nucleic acid is the lagging strand template of a DNA replication fork and the inserted nucleic acid has 5' and 3' homology regions that can anneal to the lagging strand template of the target DNA when it is replicating.
  • the term "lagging strand", as used herein, refers to the strand that is formed during discontinuous synthesis of a dsDNA molecule during DNA replication.
  • the single stranded replacement nucleic acid anneals through its 5' and 3' identity regions to the lagging strand template of the target nucleic acid and promotes Okazaki-like synthesis and is thereby incorporated into the lagging strand.
  • the double-stranded nucleic acid substrate is made from two or more double-stranded nucleic acids or from one or more double-stranded nucleic acids together with one or more single-stranded oligonucleotides.
  • the use of two double-stranded nucleic acids to make the double-stranded nucleic acid substrate is referred to herein as 'triple' recombination because there are two double-stranded nucleic acid molecules which are used to make the double-stranded nucleic acid substrate and there is one target nucleic acid.
  • the use of three nucleic acids to make the double-stranded nucleic acid substrate is referred to herein as 'quadruple' recombination because there are three nucleic acids which are used to make the double-stranded nucleic acid substrate and there is one target nucleic acid.
  • any number of single-stranded and/or double-stranded nucleic acids may be used to make the double-stranded nucleic acid substrate provided that the resulting double-stranded nucleic acid substrate is adapted at one or both of its 5' ends such that preferential degradation of one strand and/or strand separation generates the single-stranded nucleic acid.
  • each of the more than one nucleic acids must be able to anneal with a part of its neighbouring nucleic acid.
  • one end of each double-stranded nucleic acid that is used to make up the double-stranded nucleic acid substrate must be able to anneal to the target, whereas the other ends of each double-stranded nucleic acid that is used to make up the double-stranded nucleic acid substrate must be able to anneal to each other.
  • the two double-stranded nucleic acids that are used to make up the double-stranded nucleic acid substrate are adapted such that one strand of each double-stranded nucleic acid is preferentially maintained. Methods for adaptation that lead to preferential degradation are discussed above. Following degradation of one strand of each of the two double-stranded nucleic acids, the remaining single strands anneal with each other to form the double-stranded nucleic acid substrate of the invention.
  • Figure 1 Basic mechanism for site directed mutagenesis mediated by the R2S system
  • FIG. 1 Small insertions and replacements mediated by the R2S system
  • Figure 5 Scheme for shRNA insertion into a plasmid/BAC after linearization of the plasmid/BAC by the R2S system
  • Site directed mutagenesis was based on the selection of a focal sequence in the plasmid or episome that was to be subjected to mutagenesis.
  • This focal sequence was any 1 to 20 bp sequence, according to which Type IIS enzyme was used. In figure 1 it was 4 bp, specifically AGCT.
  • the Type IIS enzyme (here BsaI) can be any Type IIS restriction enzyme (as well as any Type IIS derived zinc-finger endonuclease or any other Type IIS derived enzyme).
  • Two recognition sequences for the chosen Type IIS enzyme are placed in inverted orientation so that cleavage at both sites separates the fragment carrying the two binding sites from the vector, leaving a complementary region of single-strandedness that can be readily ligated. By designing the cleavage sites in different ways, seamless mutagenesis can be easily achieved.
  • Figure 1 is read in clockwise direction, starting from the upper left part.
  • In step one, a DNA segment containing a gene for selection (usually a gene conferring antibiotic resistance, here to blasticidin) and, optionally, a gene for counterselection (here it is rpsL for counterselection with streptomycin) was introduced by a standard method of homologous recombination (recombineering) and the cells carrying the recombined product were selected with blasticidin.
  • a gene for selection usually a gene conferring antibiotic resistance, here blasticidin
  • a gene for counterselection here it is rpsL for counterselection with streptomycin
  • the DNA segment can be made in any way, but most conveniently was made by PCR with oligonucleotides that encode, from the 5' end, a homology arm for recombination into the plasmid (here the homology arms are indicated by the blue regions); the 4 nucleotides that became single-stranded after cleavage by the Type IIS enzyme; the recognition sequence for the Type IIS enzyme; and a primer region for PCR amplification of a selectable gene (optionally including a counterselectable gene).
  • the recombined plasmid was purified, and optionally retransformed to isolate a pure plasmid preparation or not.
  • the recombined plasmid was digested with the Type IIS restriction enzyme (BsaI in this case) and re-ligated.
  • the product of this reaction is transformed and optionally selected for the loss of rpsL by selection for streptomycin resistance.
  • the obtained product was the mutated vector.
  • a single point mutation is illustrated; however, the same approach was applied to mutate any of the base pairs that are within the single-stranded region created by the cleavage of the Type IIS enzyme.
  • the R2S strategy is particularly suitable for small insertions or small replacements. All steps are the same as described in example one, except the inserted DNA fragment was differently configured to suit the experimental purpose.
  • The sequence to be inserted or replaced (termed Seq X here and in figure 2) was adjacent to the focal sequence. Seq X can be any sequence of any length although most often it was less than 1000 bps for convenience.
  • the obtained product was a vector containing Seq X in any chosen place without a neighbouring selectable gene or any other operational sequence. Seq X was any sequence that can be introduced to the left, or right, or both sides, of the focal point. It was one, or more than one, codon/s (that can replace codon/s in the vector). It was a loxP site or any other site for a site-specific recombinase. It was a short tag or a restriction site. Frequently, Seq X was 3 bp long and in frame within a coding region. Hence the method of this example details codon-based mutagenesis, which is superior to single nucleotide mutagenesis for the mutation of protein coding regions.
  • nucleic acid recombined into the target was the product of recombination between two DNA fragments.
  • this triple intermediate recombination worked with good efficiency because the initial recombination step must occur for the ultimate integration of the selectable gene into the target. This represents a convenient way to introduce larger sequences such as coding regions for fluorescent proteins.
  • Quadruple recombination in which the replacement nucleic acid was constructed from three fragments prior to integration into the target.
  • RIIS was also used to linearize a circular nucleic acid at any position between any two nucleotides, for example within a vector, or indeed within a genome, as long as the Type IIS enzyme did not recognize any other recognition sequences in the molecule or genome.
  • This variation was particularly applicable to the linearization of BACs (bacterial artificial chromosomes) at chosen sites and the further use of these linearized BACs in complex recombination exercises.
  • this method had particular application for the preparation of linearised vector backbones for high efficiency cloning by standard molecular biology techniques.
  • any plasmid having a miR (microRNA) sequence may be used with this strategy.
  • the strategy generates a shRNA expressing plasmid in miR context without addition of any other sequence or restriction site (which can be detrimental to the miR processing) in the final construct.
  • the plasmid vector is based on the R6K plasmid origin gamma without the accompanying pi protein open reading frame (Filutowicz et al., 1986, Positive and negative roles of an initiator protein at an origin of replication, PNAS 83: 9645-9). These plasmids can be grown in E. coli hosts containing the pi protein but will not replicate in E. coli hosts in the absence of pi.
  • the miR30 flanking sequences which improve RNA processing of shRNA transcripts (Silva et al., 2005, Second-generation shRNA libraries covering the mouse and human genomes, Nature Genetics, 37: 1281-8) have been cloned into this plasmid.
  • This example is shown schematically in Figure 5, which is read in clockwise direction, starting from the upper left part.
  • the plasmid was modified by replacing the sequence for a short hairpin RNA with rpsLbsd (conferring resistance to blasticidin and sensitivity to streptomycin) flanked by type IIS restriction sites.
  • the homology regions of the arms were chosen in the common miR sequence. This step was optionally performed in liquid culture using appropriate selection.
  • step 2 of Figure 5 the plasmid was digested in vitro using a Type IIS restriction enzyme.
  • the type IIS restriction enzyme cuts inside the common sequence to generate non-complementary overhangs of few nucleotides specific for the miR sequence. This mirrors the approach discussed in example 4 and shown in the scheme of Figure 4.
  • the linearized plasmid can be made as a batch and kept frozen for step 3.
  • the shRNA is made by commercially sourced complementary oligonucleotides having the desired sequence, that will form dsDNA with overhangs that can anneal to the ends left after the Type IIS restriction cleavage of step 2.
  • the ligation reaction is performed in vitro using standard conditions and counter selection for sensitivity to streptomycin is applied after transformation of the ligation reaction.
  • a plasmid carrying the lacZ gene is illustrated in Figure 6.
  • This plasmid also carries a gene conveying resistance to ampicillin (apR).
  • a gene conveying resistance to ampicillin (apR)
  • kmR kanamycin
  • the kmR gene and the restriction sites were amplified by PCR using oligonucleotides that not only contained the appropriate primers for PCR amplification but also contained 50 extra nucleotides of sequence identity either side of the insertion site in the lacZ gene.
  • the introduced restriction sites were absent from the starting plasmid and so are now unique in the product of the first recombination step. (NB this example employs four restriction enzyme sites but only one is needed).
  • the first step product was then recombined in a second step with an oligonucleotide that contained 50 extra nucleotides of sequence identity either side of the insertion site in the lacZ gene as well as the intended point mutation right in the middle (i.e. 50 nucleotides from each end).
  • the lacZ gene was restored by elimination of the kmR gene however the intended point mutation is also necessarily introduced.
  • Because the second step was unselected, a substantial amount of first step product remained, which needed to be separated from the second step product. This was achieved by harvesting all plasmids together, and cleaving with the enzyme PmeI, which cut only those plasmids which had not incorporated the oligonucleotide of the second step of Figure 6. After restriction, the mixture of linearized first step product and circular second step product was retransformed back into the host and selected for ampicillin resistance. A large proportion of the resistant colonies carried the restored lacZ gene with the introduced point mutation.
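
The digestion and re-ligation steps described above for Figure 1 (a cassette flanked by two inverted Type IIS recognition sequences is excised, and the matching overhangs are joined) can be illustrated with a minimal sketch. It is not part of the original disclosure. It assumes the enzyme is BsaI with recognition sequence GGTCTC, cutting 1 nucleotide downstream on the recognition strand and 5 nucleotides downstream on the opposite strand to leave a 4-nucleotide overhang; it models only the top strand as a string; and the function names and toy sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def excise_and_religate(top: str, site: str = "GGTCTC",
                        spacer: int = 1, overhang: int = 4) -> str:
    """Simulate Type IIS excision of a cassette and seamless re-ligation.

    Expected top-strand layout (assumed, BsaI-like geometry):
      left arm | overhang | spacer | inverted site | cassette |
      site | spacer | overhang | right arm
    The two overhang copies must be identical for the ends to anneal.
    """
    top = top.upper()
    inverted = reverse_complement(site)

    left_site = top.find(inverted)               # site pointing leftwards
    if left_site < 0:
        raise ValueError("inverted recognition sequence not found")
    right_site = top.find(site, left_site + len(inverted))
    if right_site < 0:
        raise ValueError("forward recognition sequence not found")

    left_oh_start = left_site - spacer - overhang
    right_oh_start = right_site + len(site) + spacer
    if left_oh_start < 0 or right_oh_start + overhang > len(top):
        raise ValueError("not enough flanking sequence for the overhangs")

    left_oh = top[left_oh_start:left_oh_start + overhang]
    right_oh = top[right_oh_start:right_oh_start + overhang]
    if left_oh != right_oh:
        raise ValueError("overhangs are not compatible for ligation")

    # Keep the left arm, one copy of the overhang, and the right arm.
    return top[:left_oh_start] + top[right_oh_start:]


if __name__ == "__main__":
    left_arm, right_arm = "ATGCCATTAG", "CCGTAAGGCT"
    focal = "AGTT"                     # mutated version of the focal AGCT
    cassette = "TTTACGTACGTAAA"        # stand-in for the selectable marker
    recombinant = (left_arm + focal + "A" + reverse_complement("GGTCTC")
                   + cassette + "GGTCTC" + "A" + focal + right_arm)
    print(excise_and_religate(recombinant))
```

In this toy model the product contains the mutated focal sequence between the two arms and neither recognition sequence, which is the sense in which the re-ligation is seamless.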

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Plant Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Virology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Mycology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for altering the sequence of a target nucleic acid molecule, said method comprising the steps of: a) introducing a nucleic acid fragment into the target nucleic acid molecule by homologous recombination, wherein the nucleic acid fragment comprises: i) a first region of homology to the target nucleic acid molecule; ii) a first recognition sequence for a Type IIS nuclease; iii) a selectable marker; iv) a second recognition sequence for a Type IIS nuclease; v) a second region of homology to the target nucleic acid molecule; wherein components i) to v) are ordered from 5' to 3'; and a replacement nucleic acid sequence positioned between the first and second regions of homology but flanking the recognition sequences for the Type IIS nucleases; b) selecting for the incorporation of the replacement nucleic acid using the selectable marker, and c) cleaving the product of step b) with Type IIS nucleases such that the selectable marker is excised, to produce a linear target fragment including the desired replacement sequence.

Description

Method of altering nucleic acids
This application is related to a method for alteration of a nucleic acid molecule based on linked steps of recombination and endonuclease digestion.
All publications, patents and patent applications cited herein are incorporated in full by reference.
Background
The engineering of nucleic acid molecules, particularly DNA molecules, is of fundamental importance to Life Science research. For example, the construction and precise manipulation of nucleic acid molecules is required in many studies and applications in the research fields of, for example, functional genomics (for review, see Vukmirovic and Tilghman, Nature 405 (2000), 820-822), structural genomics (for review, see Skolnick et al., Nature Biotech 18 (2000), 283-287) and proteomics (for review, see Banks et al., Lancet 356 (2000), 1749-1756; Pandey and Mann, Nature 405 (2000), 837-846).
A number of methods are currently available for engineering nucleic acid molecules, particularly DNA molecules. Conventional methods, which are still the most widely used, rely wholly on restriction digestion, followed by ligation (see Sambrook J and Russell D. W. Molecular Cloning, a laboratory manual, 3rd ed. (2000) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York). Progress in our understanding of the various mechanisms of nucleic acid recombination has allowed conventional cloning techniques to be complemented and partially replaced by more advanced strategies utilising homologous recombination (in prokaryotes, see below; in eukaryotes, see, for example, Bode et al., Biol Chem 381 (2000), 801-813; Joyner, Gene Targeting, a practical approach, (2000) second edition, Oxford University Press Inc. New York), PCR-directed mutagenesis (see Ling and Robinson, Anal. Biochem. 254 (1997), 157-178), site-specific recombination (for example, Hauser et al., Cells Tissues Organs 167 (2000), 75-80) and transposon mutagenesis (see, for example, Martienssen, Proc. Natl. Acad. Sci. USA 95 (1998), 2021-2026; Parinov and Sundaresan, Curr Opin Biotechnol 11 (2000), 157-161).
One particular area of mutagenesis which is restricted by current technologies is site directed mutagenesis (SDM). SDM is a fundamental methodology in molecular biology whereby a single nucleotide, or a few nucleotides at one site, are altered. This methodology is especially relevant for the analysis of protein function because alteration of a codon in a protein coding gene is the way to change a single amino acid in the context of the rest of the protein. SDM is also powerful for other types of functional enquiry via mutagenesis, including discrete analyses of cis elements in DNA or RNA molecules. SDM requires a high degree of precision. Ideally only one to a few nucleotides are altered in the target DNA molecule, which can be as small as an E. coli plasmid of 2000 bps, or as large as a higher eukaryotic genome of 10,000 million bps. This application addresses the current technical limitations of SDM. Existing methods, particularly those that are popular in commercial practice, are severely limited in size and practice to E. coli plasmids of less than 8000 bps. These methods are limited to small plasmids for two reasons: firstly, random mutagenesis, and secondly, cost in terms of time and money. These limitations are described in more detail below.
In two-step PCR, which encompasses the frequently applied 'Overlap PCR' technique, the final product is assembled through 2 rounds of PCR. In the first round one or more PCR fragments are produced which are then assembled, or extended, using a differing combination of primers in a second round of PCR. The product of the second round is then cloned into a vector using standard molecular biology techniques. Whilst this method is simple, and can be easily performed with commonplace techniques, it is disadvantaged by the increased mutational frequencies inherent with multiple PCR rounds.
A second set of methods are based on amplification of the whole target plasmid. The first of these applies 'inverse PCR'. Here the whole plasmid is amplified as a single linear fragment, wherein the desired mutations are introduced through the use of mutagenic primers. Following amplification, the product may be digested with the DpnI enzyme to cleave methylated and hemimethylated DNA, i.e. the unmutated template DNA. If the DpnI digestion step is not complete, a proportion of the plasmids recovered are likely to be unmutated parent. However, completion of digestion cannot be confirmed easily. In this situation a gel-extraction step may be performed, although this step is likely both to extend and complicate the process, and additionally contribute to substantial loss of the product. The linear fragment is then ligated back to itself to reform a plasmid, before transformation into a host cell. To increase the efficiency of the re-ligation, the fragment is treated with a kinase, or cleaved by a restriction enzyme that recognises a sequence at the termini to generate termini with phosphates attached.
Restriction enzymes are grouped into broad classes based on their characteristics. Most commonly applied to molecular engineering are the Type II enzymes. These enzymes are linked by their recognition of palindromic DNA sequences, and cleavage within this sequence. Cleavage is defined herein as the action by the restriction endonuclease which causes the introduction of a break into the nucleic acid backbone. In other words it causes a break in the covalent bond between sugar and phosphate groups in the nucleic acid backbone. The cleavage of both strands of a double-stranded nucleic acid results in the dissociation of the nucleic acid sequences on either side of the cleavage. Introduction of a Type II enzyme recognition/cleavage site into the linear fragment solely for the purpose of cleavage leads to extraneous nucleotides being incorporated into the re-circularised product, which is generally going to be incompatible with the mutagenesis strategy.
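
The notion of a palindromic recognition sequence can be made concrete with a short sketch: a site is palindromic in this sense when it equals its own reverse complement. The example sites (GAATTC for EcoRI, GGTCTC for BsaI) are supplied here for illustration and are not taken from the passage above, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


if __name__ == "__main__":
    for name, site in [("EcoRI", "GAATTC"), ("BsaI", "GGTCTC")]:
        print(name, site, is_palindromic(site))
```

A palindromic site looks identical on both strands, so the enzyme cuts within it symmetrically; an asymmetric site such as GGTCTC has a direction, which is what allows cleavage to be displaced to one side.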
More irregular are the Type IIS restriction enzymes, for example the much-studied enzyme FokI. These endonucleases recognise asymmetric sequences (herein termed the "recognition sequence") and cleave outside of this recognition sequence (herein termed the "cleavage point" or "point of cleavage"). The point of cleavage may be the same on each strand of the DNA helix to leave a blunt end, or may be recessed on one strand relative to the other, to leave a 5' or 3' overhang. The distance between recognition sequence and cleavage points varies with the enzyme, but distances of from 1 up to 20 nucleotides have been recorded (Szybalski et al. Gene 20 (1991) 13-26). Type IIS enzymes are not generally applicable in routine molecular cloning. This is because these enzymes cut outside of their defined recognition sequence, and therefore the termini created by digestion do not have a particular sequence. In other words, the 5' or 3' overhangs left following cleavage by these enzymes can be of any sequence at all.
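
As a hedged illustration of why the overhang left by a Type IIS enzyme can be of any sequence, the following sketch reports the overhang produced downstream of a recognition sequence on the top strand. It assumes BsaI-like geometry (recognition sequence GGTCTC, cut 1 nucleotide downstream on the top strand and 5 nucleotides downstream on the bottom strand); these offsets, the function name and the toy sequences are assumptions introduced here and would differ for other enzymes.

```python
from typing import Optional


def type_iis_overhang(top: str, site: str = "GGTCTC",
                      top_offset: int = 1,
                      bottom_offset: int = 5) -> Optional[str]:
    """Return the 5' overhang left downstream of the first forward site.

    Offsets count nucleotides past the end of the recognition sequence:
    the top strand is cut after top_offset nucleotides and the bottom
    strand after bottom_offset nucleotides. Returns None if no site is
    found or the sequence is too short to contain both cuts.
    """
    top = top.upper()
    start = top.find(site)
    if start < 0:
        return None
    site_end = start + len(site)
    top_cut = site_end + top_offset
    bottom_cut = site_end + bottom_offset
    if bottom_cut > len(top):
        return None
    # Bases between the two staggered cuts remain single-stranded.
    return top[top_cut:bottom_cut]


if __name__ == "__main__":
    # Same recognition sequence, different neighbouring bases:
    print(type_iis_overhang("AAGGTCTCAAGCTTTT"))
    print(type_iis_overhang("AAGGTCTCTCCGATTT"))
```

The two calls use the same recognition sequence but return different overhangs, because the overhang is determined entirely by the bases next to the site rather than by the site itself.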
Type IIS enzymes have been applied in specific situations, which by appropriate design, took advantage of their ability to cleave outside of the recognition sequence. For example, these enzymes have been applied in the construction of coding sequences without extraneous nucleotides resulting from the use of a standard restriction enzyme recognition sequence. Examples of this use include domain swapping experiments, translational fusion tagging, reporter gene fusions and also mutagenesis studies.
Type IIS restriction enzymes have been applied in protocols which amplify the whole backbone of the plasmid by PCR. Primers with Type IIS recognition sequences at their 5' end followed by the desired mutation, and a 3' region for annealing to the template were designed. By amplifying the plasmid with these primers, subsequent digestion with a Type IIS enzyme allowed seamless re-ligation of the mutated linear vector, i.e. without the inclusion of unwanted sequences. This approach is exemplified by the Phoenix Mutagenesis system (Shigaki and Hirshi Anal. Biochem. (2001) 298: 118-120). As discussed earlier, methods of this nature have a severe drawback resulting from PCR of the whole fragment, in that there is ample opportunity for errors in DNA replication to occur during the in vitro amplification.
A further method that amplifies the plasmid backbone is the "Quick Change" PCR mutagenesis method (Stratagene). This kit is the most popular method for SDM. In this method two complementary primers, which also contain the desired mutations, are used to prime extension around the plasmid, from the point of annealing. In this method, all amplification is from the parent molecule, which significantly reduces unwanted mutations. The product is then treated with DpnI, as detailed above, and in doing so suffers from the same disadvantages detailed above. As this method produces double-stranded PCR products with complementary single-stranded regions at their termini, no restriction enzyme cleavage/kinase/ligation steps are required for re-circularisation prior to transformation into the host cell.
Although PCR-based in vitro strategies which amplify around the plasmid backbone allow precise site-directed mutagenesis to be effected, such methods suffer from the introduction of unwanted artefactual secondary mutations in the targeted molecule during amplification of the mutated nucleic acid product. In order to verify that these molecules are free of erroneous mutations, the entire molecule must be sequenced. This imposes a significant bottleneck on the mutagenesis process in respect of time and/or costs. The costs of screening may not be prohibitive for the mutation of one base to another specified base, but with an increasing number of permutations in a single mutagenic reaction, for example when attempting saturation mutagenesis at a number of positions, the costs may become quite significant. Furthermore, methods amplifying the whole target plasmid with high efficiency and fidelity are presently limited to molecules of a maximal size of around 10-15 kilobasepairs, and typically around 8 kilobasepairs.
A further limitation of this method stems from the fact that the PCR reactions in which the mutant sequences are generated are primed by synthetic oligonucleotides. Should it be desired to introduce multiple mutations, the maximum distance between the mutations which can be achieved in a single "Quick Change" reaction is limited by current oligonucleotide synthesis technologies, that is, a distance of 100-200 nucleotides. Furthermore, longer oligonucleotide syntheses are more prone to produce mutant products. This adds weight to the reality that the products must be sequenced to ensure that the desired, and only the desired, mutations have been incorporated into the target.
Other techniques for mutagenesis do not allow flexible DNA engineering at any chosen position, but instead require specific sequence elements (site-specific recombination based methods) or are based on random targeting (transposon based methods). The application of homologous recombination in DNA engineering has been pioneered in S. cerevisiae (for review see Shashikant et al., Gene 223 (1998), 9-20). However, since several inherent complications limit the usefulness of yeast as a DNA engineering host, homologous recombination based DNA engineering has recently been established in the premier cloning host E. coli (for review see Muyrers et al., Trends in Bioch Sci, (2001) 26(5): 325-331). The most widely studied is the RecA-dependent recombination pathway, which is responsible for the majority of recombinogenic processes in the bacterial cell. A second recombination pathway is the RecF-pathway. In the third pathway, recombination requires the expression of both components of the RecE/RecT protein pair, or of its functional homologues derived from the lambda phage, Redα/Redβ.
In the last few years, a technology termed ET recombination has been developed that uses the RecE/RecT protein pair (or its functionally homologous pair Redα/Redβ) for precise DNA engineering (Zhang et al., Nature Genet 20 (1998), 123-128; Muyrers et al., Nucl Acids Res 27 (1999), 1555-1557; International patent application WO99/29837; for review see Muyrers et al., Trends Bioch Sci (2001) 26(5): 325-331). This system is immensely powerful and may be used to introduce substitutions, deletions and insertions into nucleic acid molecules, as desired.
However, another disadvantage of the methods described above is that there is no simple, generally applicable method for identifying products which contain the mutation from those that do not. In the absence of any easily selectable marker or physical selection systems, PCR-based methods can be used to detect the mutation. Approaches based on these methods do not employ a selectable gene but simply rely on finding the intended mutation amongst a usually large to very large background of unaltered substrate. Such approaches are labour intensive and not efficient.
Recombinogenic mutagenesis (recombineering) has been used to incorporate selectable markers into the target DNA in conjunction with the desired mutation. The ability to select by a physical characteristic, for example resistance to an antibiotic, greatly simplifies the process of screening. A significant disadvantage of the current methods of recombinogenic mutagenesis is that these selectable markers are often left in the targeted construct following recombination. To solve this problem, methods have been developed to incorporate sites for further site-specific recombinases (e.g. Bloor & Cranenburgh, Appl. Env. Micro. (2006) 2520-2525). By designing these sites in the correct orientation relative to one another, the selectable marker can be removed from the construct by the action of the further recombinase. In this case, however, artefacts comprising one copy of the site-specific recombinase sites are left behind in the targeted construct, which can significantly affect the structure and function of the DNA into which they are inserted. This is particularly disadvantageous in methods of site-directed mutagenesis, where often only a single nucleotide change is desired.
Attempts have been made to perform a second recombineering step to remove the marker. These attempts have only met with success in single copy systems, for example BACs (Warming et al., Nucl. Acid Res. (2005) e36). When this approach was applied in multicopy systems, intermolecular recombination was observed, as opposed to the desired intramolecular reaction. This resulted in a mix of desired product, the intermediate product produced in the first recombineering step, higher multimers of these plasmids and also hybrid molecules, such as a multimer containing the sequence of both the desired product and the intermediate product produced in the first recombineering step (Thomason et al., Plasmid (2007) 58: 148-158).
Hence, there is an important need to develop better methods for SDM that are not fundamentally limited to minor tasks, in other words, methods that are not constrained by the maximum size of either the target, or the mutagenic oligonucleotide, or by excessive costs of time and money spent in screening for the desired mutants. A further reason for developing better methods lies with high throughput (HT) methodology. The current methods, particularly the screening steps, which in some cases may require the sequencing of the whole plasmid backbone, are labour intensive, expensive and not easily adaptable to HT processing and automation. Given the vast challenges presented by biological diversity and the clear need to magnify the scale of analysis using HT methodologies, SDM methods that permit HT scale-up will create new applications. That is not to say that such methods are limited to performing solely SDM. Any method of mutagenesis that is amenable to HT applications will be of great use in the field.
Methods for altering the DNA sequence of a target nucleic acid to date have only applied recombineering approaches on their own, or PCR-based approaches in conjunction with cleavage of the PCR product at the termini, with Type IIS enzymes to allow seamless re-ligation. It is the object of the current invention to provide a method for the swift and simple alteration of target nucleic acid molecules, which by virtue of these properties is amenable to high-throughput applications, using an inventive combination of recombineering and endonuclease restriction to effect seamless mutagenesis of target molecules.
Summary of the invention
According to the invention, there is provided a method for altering the sequence of a target nucleic acid molecule, said method comprising the steps of:
a) introducing a nucleic acid fragment into the target nucleic acid molecule by homologous recombination, wherein the nucleic acid fragment comprises:
i) a first region of homology to the target nucleic acid molecule;
ii) a first recognition sequence for a Type IIS nuclease;
iii) a selectable marker;
iv) a second recognition sequence for a Type IIS nuclease;
v) a second region of homology to the target nucleic acid molecule;
wherein components i) to v) are ordered from 5' to 3'; and a replacement nucleic acid sequence positioned between the first and second regions of homology but flanking the recognition sequences for the Type IIS nucleases;
b) selecting for the incorporation of the replacement nucleic acid using the selectable marker; and
c) cleaving the product of step b) with Type IIS nucleases such that the selectable marker is excised, to produce a linear target fragment including the desired replacement sequence.
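The order of components in the introduced fragment can be sketched in a few lines of Python. This is an illustration only, not part of the claimed method: the function name `build_fragment`, the arm and marker sequences, and the one-nucleotide spacer are hypothetical placeholders, and the default recognition sequence assumes BsaI, written GGTCTC(1/5).

```python
# Illustrative sketch only: all sequences and names are hypothetical placeholders.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_fragment(left_arm, right_arm, focal, marker, site="GGTCTC", spacer="A"):
    """Assemble the top strand (5'->3') of the fragment introduced in step a).

    Layout: left arm | focal | spacer | inverted site | marker |
            site | spacer | focal | right arm
    The default site and one-nucleotide spacer assume BsaI, GGTCTC(1/5).
    """
    return (
        left_arm                      # i) first region of homology
        + focal + spacer              # first portion of the replacement sequence
        + reverse_complement(site)    # ii) first recognition sequence, cutting leftwards
        + marker                      # iii) selectable marker (stand-in sequence)
        + site                        # iv) second recognition sequence, cutting rightwards
        + spacer + focal              # second portion of the replacement sequence
        + right_arm                   # v) second region of homology
    )


fragment = build_fragment(
    left_arm="ATGCATTAGCCATTAACGTA",   # hypothetical 20-nt homology arm
    right_arm="TTCAGCATAGCAATCGTTAA",  # hypothetical 20-nt homology arm
    focal="AGGT",                      # hypothetical mutated focal sequence
    marker="CATCATCATCAT",             # stand-in for a real marker gene
)
print(fragment)
```

The replacement portions sit between each homology arm and the nearest recognition sequence, so they remain in the target after the marker and both recognition sequences are excised.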
This method is herein termed R2S. Methods according to the invention are swift, simple, efficient and amenable to high-throughput methodologies. The applications described herein are very suited to plasmid engineering in prokaryotes and eukaryotes, although they are also applicable to engineering large molecules and even genomes. To achieve the inventive breakthrough described here, it was necessary to bypass some of the existing limitations of homologous recombination, as well as to combine homologous recombination using a selectable gene with the employment of a step using Type IIS nucleases in a very specific way.
Type IIS nucleases are characterized by their ability to cleave at a short distance away from their binding site, rather than within the binding site as usual for restriction enzymes. Because Type IIS nucleases cleave outside their binding sites in flanking DNA, a repeat of a Type IIS site may be cut by the cognate enzyme, leaving DNA ends that can, by simple design, have complementary, single-stranded extensions that can anneal and through efficient ligation establish covalently closed plasmids (or episomes) for further propagation. In a preferred embodiment the ligated product is then transformed into a host cell.
Site-directed mutagenesis
According to one embodiment of the invention, the method can be used for site-directed mutagenesis of a target nucleic acid molecule. In this method, the linear target fragment product of step c) is ligated to generate the desired replacement sequence.
This embodiment can be illustrated by reference to Figure 1 herein. The site-directed mutagenesis is based on the selection of a focal sequence in the target nucleic acid molecule (such as a plasmid or episome) that will be subject to mutagenesis. Preferably, this focal sequence can be any 1 to 5 bp sequence, according to which Type IIS enzyme is used - here in Figure 1 it is shown as 4 bp, specifically AGCT. It will be understood by the skilled addressee that the focal sequence may be of any length, and designed such that it is appropriate to the strategy employed.
The Figure should be read in clockwise direction starting from the upper left part.
In step a), a nucleic acid fragment containing a gene for selection, usually a gene conferring antibiotic resistance, (here shown as blasticidin) and, optionally, a gene for counterselection (here it is rpsL for counterselection with streptomycin) is introduced by homologous recombination. The nucleic acid fragment includes, in order from 5' to 3', a first region of homology to the target nucleic acid molecule, also termed a "homology arm"; a first recognition sequence for a Type IIS nuclease; a selectable marker; a second recognition sequence for the Type IIS nuclease; and a second region of homology to the target nucleic acid molecule. The two homology arms flank the focal sequence in the target nucleic acid molecule.
In addition, the nucleic acid fragment includes a replacement nucleic acid sequence that is positioned between the first and second regions of homology. The replacement sequence must flank, i.e. lie outside, the recognition sequences for the Type IIS nucleases. The replacement nucleic acid molecule includes two focal sequences. These focal sequences form the replacement sequence to be inserted into the target nucleic acid and are also the cleavage points for the Type IIS nuclease(s). A first portion of the replacement sequence is present in the first focal sequence. A second portion of the replacement sequence is present in the second focal sequence. The length of the focal sequence will depend on the mutagenesis strategy being employed. For example, a focal sequence may be very short in the case where site-directed mutagenesis is concerned, and is preferably 1-4 bp.
In this aspect of the invention, the focal sequences should be designed such that they are cleaved with the Type IIS nucleases to leave compatible termini in the product of step c) of the method of the first aspect of the invention that can re-ligate to form a circular molecule.
In this embodiment, the replacement sequence will not be made up of the two focal sequences in their entirety, as a portion of each focal sequence will be excised by the action of the Type IIS nuclease(s). The portion of one focal sequence that forms part of the replacement sequence is the portion that extends from the homology region to the cleavage point of the enzyme within that focal sequence, including any region of single-strandedness, as a result of the cleavage by the Type IIS nuclease(s).
The recombined target nucleic acid, for example an episome in Figure 1, is purified, and optionally, it can be retransformed to isolate a pure plasmid preparation.
In step b) the recombined target nucleic acid is digested with the Type IIS nuclease (BsaI in this case) and re-ligated. The Type IIS nuclease can be any Type IIS derived restriction enzyme or zinc-finger endonuclease. The two recognition sites for the chosen Type IIS enzyme are placed in inverted orientation so that cleavage at both sites separates the fragment carrying the two recognition sites and including the selection marker from the target nucleic acid molecule.
The recognition sequences of the Type IIS nuclease may be in the same orientation or inverted. By "in the same orientation" is meant that, if the cleavage of the nucleic acid is 5' to the first recognition sequence then it is also 5' to the second recognition sequence. Alternatively, if the cleavage of the nucleic acid is 3' to the first recognition sequence then it is also 3' to the second recognition sequence. By "inverted" is meant that, if the cleavage of the strand is 5' to the first recognition sequence then it is 3' to the second recognition sequence, or vice versa. In a preferred methodology the recognition sequences are inverted. Particularly preferred is an inverted repeat of the recognition sequence which directs cleavage by the Type IIS nuclease(s) such that the recognition sites are located on the same nucleic acid digestion product as the selectable marker.
Following cleavage, two regions of replacement sequence remain which abut (or are contiguous with) the homology arms. Appropriate design of these sequences means that these form complementary regions of single-strandedness that can be readily ligated. By designing the cleavage sites in different ways, seamless mutagenesis can be easily achieved. The product of this reaction is transformed to propagate the desired nucleic acid molecule. In the example presented in Figure 1, the transformation can include an optional selection for the loss of rpsL by selection for streptomycin resistance.
The obtained product will be the mutated vector. In Figure 1, a single point mutation is illustrated; however, the same approach can be applied to mutate any of the base pairs that fall within the single-stranded region created by the cleavage of the Type IIS nuclease. For example, mutations may comprise substitutions, insertions or deletions. The number of base pairs changed may range from a single point mutation, to multiple substitutions, insertions or deletions, for example, 2, 3, 5, 10, 20, 50 or more base pairs.
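The excision and re-ligation described above can be modelled on the top strand alone. The following Python sketch makes several simplifying assumptions: the enzyme is BsaI, written GGTCTC(1/5); the molecule is treated as a linear string rather than a circle; each recognition sequence occurs exactly once; and the function name `excise_marker` and all sequences are hypothetical.

```python
def excise_marker(seq, site="GGTCTC", top_cut=1, bottom_cut=5):
    """Model cleavage at two inverted Type IIS sites followed by re-ligation.

    `seq` is the top strand (5'->3') of the recombined target. The left site
    is present as its reverse complement, so it cuts towards the left arm;
    the right site cuts towards the right arm. Returns the re-ligated top
    strand and the shared overhang sequence.
    """
    comp = str.maketrans("ACGT", "TGCA")
    inverted = site.translate(comp)[::-1]
    s = seq.find(inverted)                       # start of the left (inverted) site
    r = seq.find(site, s + len(site)) if s >= 0 else -1
    if s < bottom_cut or r < 0:
        raise ValueError("expected an inverted pair of recognition sequences")
    e = r + len(site)                            # first position after the right site
    left_overhang = seq[s - bottom_cut : s - top_cut]
    right_overhang = seq[e + top_cut : e + bottom_cut]
    if left_overhang != right_overhang:
        raise ValueError("termini are not compatible for re-ligation")
    # Keep everything left of the left cut and everything right of the right cut.
    return seq[: s - bottom_cut] + seq[e + top_cut :], left_overhang


recombined = (
    "ATGCATTAGC"      # hypothetical left homology arm
    + "AGGT" + "A"    # first focal sequence and spacer
    + "GAGACC"        # inverted BsaI recognition sequence
    + "CATCATCATCAT"  # stand-in for the selectable marker
    + "GGTCTC"        # BsaI recognition sequence
    + "A" + "AGGT"    # spacer and second focal sequence
    + "TTCAGCATAG"    # hypothetical right homology arm
)
product, overhang = excise_marker(recombined)
print(product, overhang)
```

In this model the product retains one copy of the focal sequence between the two arms, and neither recognition sequence nor the marker survives, which is the sense in which the mutagenesis is seamless.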
In step a) of the method, a nucleic acid fragment is introduced into the target nucleic acid molecule by homologous recombination. The most suitable homologous recombination technique for use in these methods employs the Red operon from phage lambda and is commonly termed 'recombineering' (see Zhang et al., Nature Genet 20 (1998), 123-128; Muyrers et al., Nucl Acids Res 27 (1999), 1555-1557; co-pending International patent application WO99/29837; co-pending European patent application EP1399546; also co-pending International application filed on 20th February 2009 and entitled "Method of nucleic acid recombination"; the contents of all these documents are hereby incorporated by reference). Recombineering uses the Redβ protein, or combinations of proteins comprising the Redβ protein, for example the Redα, Redβ and Redγ proteins, to mediate homologous recombination.
The introduced nucleic acid fragment must therefore be brought into contact with the target nucleic acid molecule in the presence of a phage annealing protein, or a functional equivalent or fragment thereof. Suitable phage annealing proteins for use in the invention (as known at the time of writing) include RecT (from the rac prophage), Redβ (from phage λ), and Erf (from phage P22). The identification of the recT gene was originally reported by Hall et al., (J. Bacteriol. 175 (1993), 277-287). The RecT protein is known to be similar to the λ bacteriophage β protein or Redβ (Hall et al. (1993); Muniyappa and Radding, J. Biol. Chem. 261 (1986), 7472-7478; Kmiec and Holloman, J. Biol. Chem. 256 (1981), 12636-12639). The Erf protein is described by Poteete and Fenton, (J Mol Biol 163 (1983), 257-275) and references therein. Erf is functionally similar to Redβ and RecT (Murphy et al., J Mol Biol 194 (1987), 105-117), and in some cases can substitute for the lambda phage recombination system (Poteete and Fenton, Genetics 134 (1993), 1013-1021). The invention also includes the use of functional equivalents of the molecules that are explicitly identified above as RecT, Redβ and Erf, provided that the functional equivalents retain the ability to mediate recombination, as described herein and in European patent application EP1399546. Such functional equivalents include homologues of elements of recombination systems that are present in bacteriophages, including but not limited to large DNA phages, T4 phage, T7 phage, small DNA phages, isometric phages, filamentous DNA phages, RNA phages, Mu phage, P1 phage, defective phages and phage-like objects, as well as the functional homologues of elements of recombination systems that are present in viruses (e.g. Datta et al., 2008 PNAS 105: 1626-1631).
Of course, as and when additional, functionally equivalent annealing proteins are discovered, for example, as a result of genome sequencing projects of other coliphages and lambdoid phages, it is envisaged that these annealing proteins will be equally suitable to those that are explicitly recited above.
In any context, homologous recombination, even using an efficient system like the Red operon, is a rare event and so according to the methods of the invention, a selection step needs to be included to ensure high efficiency. Usually the selection step should be based on the insertion of an antibiotic resistance gene or other gene so that the product can be distinguished from substrate. In conventional methods, the selectable gene remains at the site of the mutagenesis and must be removed, as the insertion of a gene is incompatible with most genetic engineering strategies, for example, SDM, which requires only a very small change at the site of mutagenesis. The desire to ensure seamless removal of the selectable marker is equally valid for other types of mutagenesis, such as where it is desired to maintain the mutated construct as close as possible to its original form incorporating only the necessary mutations, for example the replacement of a whole domain in a coding sequence with another similar domain. As described above, previous applications of recombineering to SDM attempted to solve this problem by use of counterselectable genes and additional site-specific recombination sites, which met with numerous difficulties. The methods of the present invention overcome all these problems.
Any selectable marker may be used, whether conferring resistance, conferring sensitivity, causing fluorescence and so on. The selectable marker may be an antibiotic resistance marker. Alternatively, the selectable marker may be an enzyme which complements an auxotrophy. In other words, the selectable marker may be an enzyme which produces an essential nutrient. In a host which lacks the ability to produce this essential nutrient, only those cells containing the selectable marker will be able to grow in media which lack the nutrient. Examples of auxotrophic markers are well known in the art. A non-limiting list of examples includes ura3, pyrG, niaD and trpC. Incorporating a marker ensures simple screening because targets incorporating the introduced nucleic acids can be identified by phenotypic screening, using selective growth media, rather than by sequence or indeed size, on a gel. Use of a fluorescence marker may be particularly applicable for high-throughput methodologies. By using such a marker it will be possible to isolate cells containing the desired product by fluorescence-activated cell sorting (FACS).
The introduced nucleic acid molecule may further comprise a counterselectable marker between the two Type IIS nuclease recognition sites. By coupling the selectable marker to a counterselectable marker, a selection pressure for the absence of both markers can be exerted. For example, applying a counterselection pressure during the growth of the host cell into which the religated product of step c) is transformed will ensure that only clones which lack the marker grow. In this case surviving clones will be those which have been digested by the Type IIS restriction enzyme and then re-ligated such that the selectable marker, the counterselectable marker, and the Type IIS recognition sequences are no longer present. In a similar vein to the presence of a selectable marker, selected for following step a) of the method, selection for the absence of a counterselectable marker permits the simple and inexpensive screening of numerous clones, as opposed to the considerably more onerous and expensive screening methods involving further PCR and sequencing reactions. These characteristics further favour the use of this method in high-throughput methodologies.
A non-exhaustive list of counterselectable markers includes rpsL, which renders sensitive those E. coli hosts that are naturally resistant to streptomycin. Its removal thus restores resistance. Another counterselectable gene is sacB, which confers sensitivity to sucrose.
Other methods of selection will be known to those of skill in the art. For example, a primer region may be included for PCR amplification of a selectable gene.
According to the invention, the introduced nucleic acid fragment should possess at least two regions of sequence homology (homology arms) with regions of sequence on the target nucleic acid molecule. By "homology" is meant that when the sequences of the introduced and target nucleic acid molecules are aligned, there are a number of nucleotide residues that are identical between the sequences at equivalent positions. Degrees of homology can be readily calculated (Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part 1, Griffin, A.M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991). Such regions of homology are preferably at least 9 nucleotides each, more preferably at least 15 nucleotides each, more preferably at least 20 nucleotides each, even more preferably at least 30 nucleotides each. Particularly efficient recombination events may be effected using longer regions of homology, such as 50 nucleotides or more. Preferably, the degree of homology over these regions is at least 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98% or 99% or more identity, as determined using BLAST version 2.1.3 using the default parameters specified by the NCBI (the National Center for Biotechnology Information; http://www.ncbi.nlm.nih.gov/) [Blosum 62 matrix; gap open penalty=11 and gap extension penalty=1].
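The definition of homology as identical residues at equivalent positions can be illustrated with a short Python sketch. It is a deliberately simplified stand-in for the BLAST calculation named above: it assumes the two sequences are already aligned and of equal length, with no gaps, and the function name `percent_identity` and both sequences are hypothetical.

```python
def percent_identity(arm: str, target_region: str) -> float:
    """Percentage of positions at which two pre-aligned sequences are identical.

    Simplified illustration only: no gaps and no scoring matrix are used,
    so this is not a substitute for a BLAST alignment.
    """
    if not arm or len(arm) != len(target_region):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), target_region.upper()))
    return 100.0 * matches / len(arm)


# Hypothetical 20-nt homology arm differing from the target at one position.
identity = percent_identity("ATGCATTAGCCATTAACGTA", "ATGCATTAGCCATTAACGTT")
print(f"{identity:.1f}% identity")
```

A single mismatch in a 20-nucleotide arm gives 95% identity, which falls within the preferred ranges recited above.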
The regions of sequence homology may be located on the introduced nucleic acid fragment so that one region of homology is at one end of the molecule and the other is at the other end. However, one or both of the regions of homology may also be located internally. The two sequence homology regions should thus be tailored to the requirements of each particular experiment. There are no particular limitations relating to the position for the two sequence homology regions located on the target DNA molecule, except that for circular double-stranded DNA molecules, the repair recombination event should not abolish the capacity to replicate. As the skilled reader will appreciate, the sequence homology regions can be interrupted by non-identical sequence regions, provided that sufficient sequence homology is retained to allow the repair recombination reaction to occur.
The introduced nucleic acid molecule should also include at least two recognition sites for Type IIS nucleases. A recognition sequence is a nucleic acid sequence to which the nuclease binds during cleavage of its cognate target nucleic acid. In most embodiments of the present invention, the first and second Type IIS recognition sequences in the introduced nucleic acid are recognised by the same Type IIS restriction nuclease. However, in other embodiments the Type IIS recognition sequences are recognised by differing Type IIS restriction nucleases.
Included as Type IIS nucleases are Type IIS restriction enzymes, Type IIS derived zinc-finger endonucleases and any other Type IIS derived enzyme. A list of some currently available Type IIS restriction enzymes can be found at www.neb.com. The invention also includes the use of functional equivalents of the molecules that are explicitly identified above as Type IIS, provided that the functional equivalents retain the ability to recognise a recognition sequence and cleave outside of this sequence. Such functional equivalents can be found in the Type III and Type IV classes of restriction enzymes. Also included as functional equivalents are fragments of the Type IIS restriction enzymes, such as truncated variants, and fusion proteins of which the sequence of a Type IIS restriction enzyme forms a part, that retain the ability to recognise a recognition sequence and cleave outside of this sequence. It is considered that the identification of such functional equivalents is within the ability of the skilled addressee.
Also included as functional variants are Type IIS enzyme variants that have been optimised and/or evolved, through, for example, DNA shuffling (Stemmer, W.P. Nature 370, 389-91 (1994)), or Substrate-linked directed evolution (SLiDE, GB 0029375.3). Such functional variants may be evolved such that their recognition sequence is changed (Doyon et al. J. Am. Chem. Soc (2006) 128: 2477-2484).
Also included are Type IIS enzymes created from other proteins, for example by directed evolution or as a fusion protein, to generate a Type IIS enzyme that can recognize long sequences. An example of such a sequence is the recognition sequence for the site-specific recombinase Cre, whose recognition sequence is called loxP and is 34 bp long. A Cre recombinase engineered to become a Type IIS enzyme would serve a useful purpose as a very rare cutting instrument: because the introduced recognition sites would be unique, such enzymes would allow the invention to be applied to whole genomes. Furthermore, very rare cutting Type IIS enzymes have already been developed based on combinations of zinc fingers, and these are also incorporated by reference herein (e.g. Kim et al PNAS 93 (1996) 1156-1160; Pabo, C. O., Peisach, E., and Grant, R. A. (2001), Annu Rev Biochem 70, 313-40; Isalan, M., Klug, A., and Choo, Y. (2001), Nat Biotechnol 19, 656-60; Berg, J. M. (1997), Nat Biotechnol 15, 323; Jamieson, A. C, Wang, H., and Kim, S. H. (1996), Proc Natl Acad Sci U S A 93, 12834-9; Rebar, E. J., and Pabo, C. O. (1994), Science 263, 671-3; Rhodes, D., and Klug, A. (1993), Sci Am 268, 56-9, 62-5).
Preferably, the recognition sequence of the Type IIS nuclease enzyme should be designed such that when the enzyme cleaves it does so within the focal sequence. In contrast to standard Type II palindromic restriction endonuclease recognition sequences, Type IIS restriction endonuclease recognition sequences are not palindromic. The orientation of the recognition sequence therefore determines the direction in which the cut is made by the endonuclease. In a preferred embodiment the enzyme cleaves within the adjacent focal sequence. In a preferred embodiment the distance between the recognition sequence and the cleavage points is 1-20 nucleotides, for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides. Typically the distance between the recognition sequence and the cleavage point is 1-4 nucleotides. In an alternative, this distance may be more than 20 nucleotides.
The distance between the recognition sequence and cleavage point is the distance in nucleotides between the closest nucleotide of the recognition sequence (which is not counted) and the nucleotide prior to the cleavage point. If the distance between the recognition sequence and cleavage point is the same for both strands of the nucleic acid, then digestion with a Type IIS enzyme will result in blunt ended termini. If the distance between the recognition sequence and cleavage point varies then termini with a 5' or 3' overhang will be produced, depending upon which strand has the longer distance between the recognition sequence and cleavage point.
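The relationship between the two cleavage distances and the resulting terminus can be written as a small Python function. This is a sketch under stated assumptions: distances follow the common N(top/bottom) notation, in which the first number is the distance on the strand carrying the recognition sequence as written and the second is the distance on the complementary strand; the function name `terminus_type` is hypothetical.

```python
def terminus_type(top_distance: int, bottom_distance: int):
    """Classify the terminus left by a Type IIS enzyme.

    top_distance: nucleotides between the recognition sequence and the cut
        on the strand carrying the recognition sequence as written.
    bottom_distance: the same distance on the complementary strand.
    Returns (kind, overhang_length).
    """
    if top_distance == bottom_distance:
        return ("blunt", 0)
    if bottom_distance > top_distance:
        return ("5' overhang", bottom_distance - top_distance)
    return ("3' overhang", top_distance - bottom_distance)


print(terminus_type(1, 5))   # BsaI, GGTCTC(1/5)
print(terminus_type(9, 13))  # FokI, GGATG(9/13)
print(terminus_type(5, 5))   # MlyI, GAGTC(5/5)
print(terminus_type(2, 0))   # hypothetical enzyme cutting (2/0)
```

The strand with the longer distance between recognition sequence and cleavage point determines whether the overhang is 5' or 3', and the difference between the two distances gives its length.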
Linearization
This same system can also be used to linearize a vector at any position between any two nucleotides within a target nucleic acid molecule (see Figure 4). In this embodiment, a nucleic acid fragment is introduced by homologous recombination into a target nucleic acid molecule as described above, which fragment includes first and second regions of homology to the target nucleic acid molecule, around a focal sequence. The fragment also includes first and second recognition sequences for a Type IIS nuclease, in inverted orientation, and a selectable marker. In this embodiment, two replacement nucleic acid sequences matching portions of the focal sequence are positioned at each end of the molecule between the first and second regions of homology and flanking the recognition sequences for the Type IIS nuclease. Therefore by design, a cleavage at a specified nucleotide or between any two specified nucleotides can be achieved without adding any extra nucleotides to the ends after cleavage. Cleavage by the Type IIS nuclease can generate two ends that are compatible for religation or can generate ends that are not compatible for religation, such that upon cleavage the regions of single-strandedness generated do not have compatible ends and are thus incapable of being ligated to each other to form a circular molecule. For example, one terminus may have a 3' overhang, and the other a 5' overhang; one may have a 3' overhang, and the other be blunt; one may have a 5' overhang, and the other be blunt; both may have 5' or 3' overhangs, but of non-complementary sequence such that they are incapable of annealing to one another. The precise format of the termini will depend on the difference in distance between the recognition sequence and the cleavage point, as the skilled person will understand. Accordingly, the target fragment includes the desired replacement sequence but is linearised.
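The compatibility rules listed above can be checked mechanically. The Python sketch below is illustrative only: it assumes each terminus is described by its overhang type and by the overhang sequence written 5' to 3' on the protruding strand, and the function name `can_religate` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def can_religate(end_a, end_b) -> bool:
    """Decide whether two termini can anneal and be ligated to each other.

    Each terminus is a (kind, overhang) pair, where kind is "5'", "3'" or
    "blunt" and overhang is written 5'->3' on the protruding strand
    (empty for blunt ends).
    """
    kind_a, seq_a = end_a
    kind_b, seq_b = end_b
    if kind_a != kind_b:
        return False              # e.g. a 5' overhang cannot pair with a 3' overhang or a blunt end
    if kind_a == "blunt":
        return True               # two blunt ends can be joined
    # Same polarity: the overhangs must be reverse complements of each other.
    return seq_a == seq_b.translate(COMPLEMENT)[::-1]


print(can_religate(("5'", "AGGT"), ("5'", "ACCT")))  # complementary 5' overhangs
print(can_religate(("5'", "AGGT"), ("5'", "AGGT")))  # same polarity, not complementary
print(can_religate(("5'", "AGGT"), ("3'", "ACCT")))  # mixed polarity
print(can_religate(("3'", "TT"), ("blunt", "")))     # overhang against a blunt end
```

Designing the two focal sequences so that this check fails is one way to obtain a vector that stays linear after digestion.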
Such a molecule might conveniently be a vector, but could be a larger molecule, even a genome, as long as the Type IIS nuclease used does not recognize any other sequence in the molecule or genome. This variation has particular applicability to the linearization of BACs (bacterial artificial chromosomes) at chosen sites and the further use of these linearized BACs in complex recombination exercises.
When cloning into a vector linearised by the method described above, the efficiency of cloning can be increased by applying a selection pressure against a counter selectable marker excised during the digestion by the Type IIS nucleases. Such selection would be far preferred to gel extraction, or CsCl gradient purification for the exclusion of undigested vector in the cloning product.
In some embodiments, a vector linearized by this method is used for cloning nucleic acids, such as DNA, for example DNA encoding short hairpin RNAs (shRNAs). In some embodiments the vector which is linearised has first and second regions of homology which comprise or consist of microRNA (miRNA, or miR) sequences. This is particularly useful when the linearized vector is used to clone shRNAs. For example, the first and second regions of homology may comprise or consist of sequences from miR30. In order to transcribe the DNA encoding the shRNA, the vector may include a promoter. In some embodiments the first and second regions of homology comprising or consisting of miRNA sequences, such as miR30 sequences, are designed such that the sequences are removed from the shRNA following microRNA processing of the expressed construct. This approach was employed in Silva et al, 2005, Second-generation shRNA libraries covering the mouse and human genomes, Nature Genetics, 37: 1281-8. A system for cloning shRNAs into a vector comprising miRNA sequences, linearized according to this method of the invention, is shown in Figure 5 and detailed in Example 5. This method is particularly advantageous because it leaves no scar in the shRNA or miRNA sequences. Such scars may have detrimental effects on the secondary structure of the transcribed RNA.
In some embodiments, a linearized vector is used for cloning a double stranded DNA formed from annealed oligonucleotides and having overhanging ends, wherein the ends of the double stranded DNA are compatible with the ends of the linearized vector. The double stranded DNA to be inserted into the linearized vector may be prepared by any of the methods known in the art, for example restriction enzyme digestion out of another vector, a PCR fragment, which may then be digested with a restriction enzyme, and also double stranded DNAs prepared by the annealing of complementary single stranded nucleic acids. In some embodiments, the double stranded DNA is formed by complementary oligonucleotides and encodes a shRNA. The double stranded nucleic acid may then be inserted into the linearized vector using methods known in the art, such as ligation using a DNA ligase, for example T4 DNA ligase.
Insertion
A further embodiment of the method of the invention provides a simple variation to achieve small insertions and replacements. All steps are the same as illustrated above except the inserted nucleic acid fragment is differently configured to suit the experimental purpose. This methodology is illustrated in Figure 2.
According to this embodiment, the sequence to be inserted or replaced (here denoted as seq X) is ultimately positioned adjacent to a focal sequence in the target nucleic acid molecule. In this embodiment, the nucleic acid fragment that is introduced in step a) of the method includes, at either end, adjacent to and internal to the homology arms, two focal sequences that match the focal sequence in the target nucleic acid molecule. In this embodiment of the method, the inserted sequence represents the replacement nucleic acid sequence, and is positioned between the first and second regions of homology but external to the recognition sequences for the Type IIS nuclease. In this manner, cleavage at its recognition sites by the Type IIS nuclease excises the selectable marker and leaves behind a single focal sequence contiguous with the inserted sequence. In this manner, insertion of the sequence into the target nucleic acid molecule is achieved without leaving any scar or residual heterologous sequence.
The inserted sequence can be any sequence of any length although most often it will be less than 1000 bps for convenience, for example, about 2, 5, 10, 20, 50, 100, 200 or 500. The obtained product will be a vector containing the inserted sequence in any chosen place without any neighbouring selectable gene or any other operational sequence. The inserted sequence can be any sequence that can be introduced to the left (5'), or right (3'), or both (5' and 3'), of the focal sequence.
The inserted sequence can be one, or more than one, codon(s) (that can replace codon(s) in the target nucleic acid molecule). Also possible are larger in-frame insertions which can be accomplished with the insertion of larger lengths of nucleic acid. It will generally be preferred that the inserted length of nucleic acid contains a number of nucleotides which is divisible by three, such that the reading frame is maintained. In one preferred embodiment, the inserted sequence is 3 bp long and in-frame within a coding region. Hence this method encompasses codon-based mutagenesis, which is superior to single nucleotide mutagenesis for the mutation of protein coding regions.
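The reading-frame condition described above amounts to a length check on the inserted sequence. A minimal sketch follows; the function names are hypothetical, and the check covers only the length of the insert, not whether the codons it introduces are desirable (for example, it does not look for stop codons).

```python
def maintains_reading_frame(insert):
    """True if the insert length is a non-zero multiple of three nucleotides."""
    return len(insert) > 0 and len(insert) % 3 == 0


def split_codons(insert):
    """Split an in-frame insert into its codons, in order."""
    if not maintains_reading_frame(insert):
        raise ValueError("insert length must be a non-zero multiple of three")
    return [insert[i:i + 3] for i in range(0, len(insert), 3)]


print(maintains_reading_frame("GCT"))
print(maintains_reading_frame("GCTA"))
print(split_codons("GCTGAA"))
```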
Furthermore, by including in the introduced nucleic acid fragment, sequence homology arms that span regions of non-identical sequence compared to the target nucleic acid molecule, mutations such as substitutions, (for example, point mutations), insertions and/or deletions may be introduced into the target nucleic acid molecule.
The inserted sequence can be a loxP site or any other site-specific recombinase site; it can also be a short tag or a restriction site. Other examples will be clear to those of skill in the art.
Large insertion
A further embodiment of the invention provides a method suitable for larger insertions (illustrated in Figure 3). In this method, the introduced nucleic acid fragment is comprised of two or more components. "Triple recombination" is an embodiment of the method of this aspect of the invention where the introduced nucleic acid is itself made from two fragments. "Quadruple recombination" is an embodiment of the method of this aspect of the invention where the introduced nucleic acid is itself made from three fragments. The use of triple and quadruple recombination is particularly applicable to the current invention.
In one preferred embodiment of the method of the first aspect of the invention, the introduced nucleic acid fragment is made from two separate fragments in vivo. In triple recombination, where two components are involved, each component includes a region of homology to the target nucleic acid molecule. Each component also includes a focal sequence which matches the focal sequence in the target nucleic acid molecule. At least one of the components includes a selectable marker. At least one of the components, preferably only one, includes a replacement nucleic acid sequence in the form of an inserted sequence. In one arrangement, both components contain a recognition sequence for a Type IIS nuclease. Alternatively, one component may contain two recognition sequences for Type IIS nucleases. Additionally, both components include a region of mutual homology that allows the components to undergo homologous recombination to knit together to form a single nucleic acid fragment for introduction into the target nucleic acid molecule.
This method is thus the same as the more simple methods described above except that the introduced fragment is the product of a triple recombination process. This process is described in co-pending International application filed on 20th February 2009 and entitled "Method of nucleic acid recombination". Although not as efficient as the simple recombination applications illustrated above, this triple intermediate recombination works with good efficiency because the internal recombination product must occur for the primary recombination step to integrate the selectable marker. This represents a convenient way to introduce larger sequences such as coding regions for fluorescent proteins. Examples of lengths suitable for insertion in accordance with this embodiment of the invention range from a few hundred base pairs to many hundreds or thousands of base pairs.
In one preferred embodiment of the method of the first aspect of the invention, the introduced nucleic acid fragment is made from three separate fragments in vivo. A first fragment comprises the selectable marker flanked on either side by two annealing regions which are not capable of annealing to the target nucleic acid. A second fragment comprises a first region of homology capable of annealing with a first sequence on the target nucleic acid, and a second region of homology capable of annealing with a first annealing region on the first fragment. In this embodiment a third fragment comprises a first region of homology capable of annealing with a second sequence of the target nucleic acid, and a second region of homology capable of annealing with a second annealing region on the first fragment.
In one embodiment the second or third fragment may comprise the coding sequence of a gene. In a further embodiment, the second and/or third fragments may comprise a partial coding sequence of a gene. In a further embodiment the second and/or third fragments may comprise a promoter element. In a further embodiment the second and third fragments may comprise partial coding sequences of the same gene such that following excision of the selectable marker by the Type IIS restriction endonuclease digestion, should the termini of the linear molecule be re-ligated the entire coding sequence of the gene is reconstituted.
Embodiments of the invention that exploit triple and quadruple recombination may incorporate methods of terminal adaptation, as detailed in co-pending application filed on 20th February 2009 and entitled "Method of nucleic acid recombination". According to this methodology, in adapting the fragments the appropriate strands of each fragment can be preferentially degraded such that the efficiency of the recombineering step is maximised. Preferably, the preferentially degraded strands are the strands of the first and third fragments that are not capable of annealing to the lagging strand of the target nucleic acid at a replication fork. It is preferable that the retained strand of the second nucleic acid fragment is the strand which can anneal to the sequence of the preferentially degraded first and third fragments.
Unique site elimination
According to a further aspect of the invention a method, herein termed Unique Site Elimination (USE), for altering the sequence of a nucleic acid molecule is provided, the method comprising the steps of a) introducing a nucleic acid fragment into the target nucleic acid molecule by homologous recombination, wherein the nucleic acid fragment comprises: i) a first region of homology to the target nucleic acid molecule; ii) a recognition sequence for a nuclease; iii) a selectable marker; and iv) a second region of homology to the target nucleic acid molecule; and b) selecting for the incorporation of the nucleic acid fragment using the selectable marker; c) introducing into the selected product of step b) a replacement nucleic acid fragment comprising, from 5' to 3' : i) a first region of homology to the product of step b); ii) a replacement nucleic acid sequence; iii) a second region of homology to the product of step b); and such that the recognition sequence for the nuclease is removed; d) digesting the product of step c) with a nuclease that recognises the recognition sequence introduced in steps a) and b) such that successfully integrated replacement nucleic acids in step c) avoid linearization by the nuclease.
In this manner, the unique restriction site elimination method allows for the efficient and extensive modification of plasmids and episomes irrespective of size or sequence requirements. An example of the USE method is set out in Figure 6. The key concept is based on the introduction of a selectable gene into an episome by homologous recombination, which simultaneously introduces a restriction enzyme site that is ideally absent from the episome, and that will thereby be unique in the episome. A second homologous recombination step is then made which will eliminate the selectable gene and simultaneously the unique restriction site, but which introduces the desired alteration in the target nucleic acid molecule (e.g. substitution, insertion or deletion). The fragment introduced into the episome in the second homologous recombination step c) need not carry a selectable gene, and ideally will not. The fragment introduced in the first step a) may carry a counterselectable gene in some embodiments, although this is not essential to the method. After step c), the second homologous recombination step, all episomes are harvested. These will include a small number of desired products from the second step c) and a large number of episomes that were not successfully recombined in this second step. The episomes are then digested with the nuclease corresponding to the unique site introduced in the first step a). The episomes that were not recombined in step c) still carry this site and will be linearized, whereas the products of the second recombination step will be uncut and remain as circles. The mixture of linear and circular episomes is then retransformed back into host cells. Only the uncut circular episomes will replicate and can be retrieved after propagation. A high proportion of these will be products from the second homologous recombination step c) and will contain the desired alteration in target nucleic acid sequence.
Many of the steps of this method are common between the methods previously discussed, for example, the methods for site-directed mutagenesis, linearization, and introduction of small and large insertions. Accordingly, there is no need to repeat again here the steps of homologous recombination, the details of the design of the homology arms, the selection and counterselection markers, and so on, since all these aspects are common to the various methods.
It will be apparent to the skilled reader how best to design the homology arms on both the nucleic acid fragment introduced in step a) and the fragment introduced in step c). The nucleic acid fragment introduced in step a) requires a first region of homology to the target nucleic acid molecule; a recognition sequence for a nuclease; a selectable marker; and a second region of homology to the target nucleic acid molecule. The regions of sequence homology may be located on the replacement fragment so that one region of homology is at one end of the molecule and the other is at the other end. However, one or both of the regions of homology may also be located internally. The two sequence homology regions should thus be tailored to the requirements of each particular experiment. There are no particular limitations relating to the position for the two sequence homology regions located on the target molecule, except that for circular double-stranded DNA molecules, the repair recombination event should not abolish the capacity to replicate. As the skilled reader will appreciate, the sequence homology regions can be interrupted by non-identical sequence regions, provided that sufficient sequence homology is retained to allow the repair recombination reaction to occur. Although this is not the focus of the invention, by including in the replacement nucleic acid molecule, sequence homology arms that span regions of non-identical sequence compared to the target nucleic acid molecule, mutations such as substitutions, (for example, point mutations), insertions and/or deletions may thus be introduced into the target nucleic acid molecule.
The nucleic acid fragment introduced in step a) should include a recognition sequence for a nuclease enzyme. Preferably, this recognition sequence does not occur elsewhere in the target nucleic acid, so that this recognition sequence is unique. In this scenario, digestion will only occur in those instances where the replacement nucleic acid in step c) has not successfully integrated. However, in some instances, such as in the case of very large constructs, e.g. BACs, it may be that one or a small number of other identical recognition sequences do inevitably exist in the target nucleic acid molecule, but that the invention is still operable by exploiting very short periods of nuclease digestion, so that partial digestion occurs and the desired products of step c) can still be isolated. Alternatively, a Type III restriction endonuclease, or functional equivalent may be employed. Type III enzymes require two copies of the recognition sequence to be present in a single molecule for cleavage to occur. Accordingly, two recognition sequences may be placed in the inserted fragment if there are no occurrences of the recognition sequence in the target molecule. Further, if one recognition sequence is already present in the target molecule, incorporation of a further recognition sequence for the Type III enzyme in the introduced nucleic acid thus means cleavage will occur in those molecules which contain both sites, such that only the desired products of step c) are isolated.
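Whether a candidate recognition sequence would be unique can be checked by counting its occurrences in the target before designing the fragment for step a). The sketch below is an assumption-laden illustration rather than part of the method: the function names are hypothetical, only unambiguous A, C, G and T bases are handled, and the site is searched on both strands, with wrap-around for circular molecules such as plasmids and BACs.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_sites(target, site, circular=True):
    """Count occurrences of a recognition sequence on either strand.

    A palindromic site is counted once per position; a non-palindromic
    site is counted in both orientations.
    """
    target = target.upper()
    site = site.upper()
    if len(target) < len(site):
        return 0
    # For a circular molecule, append the start so that sites spanning
    # the arbitrary origin of the sequence are also found.
    text = target + target[:len(site) - 1] if circular else target
    patterns = {site, reverse_complement(site)}
    count = 0
    for start in range(len(text) - len(site) + 1):
        if text[start:start + len(site)] in patterns:
            count += 1
    return count


def is_absent_from_target(target, site, circular=True):
    """True if introducing the site in step a) would make it unique."""
    return count_sites(target, site, circular) == 0


print(count_sites("AAGGTCTCAA", "GGTCTC", circular=False))
print(count_sites("CTCAAAAGGT", "GGTCTC", circular=True))
print(is_absent_from_target("AAAACCCCAAAA", "GGTCTC"))
```

A count of zero in the unmodified target corresponds to the preferred case described above; a small non-zero count corresponds to the partial digestion or Type III alternatives.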
The introduction of this recognition sequence has the advantage that digestion with this enzyme in step d) of the method cleaves solely those nucleic acid molecules that have been altered by recombineering by the introduced nucleic acid fragment introduced in step a), but not by the replacement nucleic acid fragment introduced in step c). Thus upon the transformation of the product of step d) into a suitable host cell, those nucleic acids which remain uncleaved, i.e. the desired product of the method, will form a vast excess of the product that is recovered. Again, for the method to work efficiently, the replacement nucleic acid must not contain the recognition sequence present in the introduced nucleic acid.
The USE method is not limited to use with Type IIS enzymes only. Any nuclease may be used that has a cognate recognition sequence in nucleic acid, including restriction enzymes of all types. Rare cutters will be generally preferred, because these nucleases are unlikely to recognise sequences elsewhere in the target nucleic acid and it is therefore easier to introduce a recognition sequence that is unique. Accordingly, the Type IIS nucleases (including Type IIS restriction enzymes, Type IIS derived zinc-finger endonucleases and any other Type IIS derived enzyme) are preferred. A restriction enzyme may be used which is not sensitive to the methylation state of the recognition sequence.
The replacement nucleic acid fragment introduced in step c) should be designed such that it comprises homology arms that span a portion of the target nucleic acid sequence which it is desired to mutate. According to this method, the constituent nucleotide components of a nucleic acid molecule are thus changed in some way. Examples of alterations include the insertion, deletion or substitution of one or more constituent nucleotides in the target nucleic acid molecule, such as the introduction of a point mutation or creation of altered protein reading frames. The replacement nucleic acid sequence may thus contain a substitution, insertion or deletion from the original target nucleic acid molecule and thus it is this sequence which defines the alteration in sequence which is desired. The number of base pairs substituted may range from a single point mutation, to multiple substitutions, insertions or deletions, for example, 2, 3, 5, 10, 20, 50 or more base pairs. Insertions or deletions may also comprise single or a small number of base pair additions or deletions, for example, 2, 3, 5, 10, 20, 50 or more base pairs; however, larger insertions and deletions of 100, 200, 500, 1000 base pairs or more may conveniently be inserted or removed, or even larger regions of many kilobase pairs. The inserted or deleted sequence will often be less than 1000 bps for convenience, for example, about 2, 5, 10, 20, 50, 100, 200 or 500. In one preferred embodiment of either the first or second aspects of the invention, the insertion inserted by the method is a small insertion, for example of three nucleotides. This embodiment is particularly preferred if the insertion is in a coding sequence. In this case this insertion will add a further codon to the coding sequence of a protein. Also preferred are larger in-frame insertions which can be accomplished with the insertion of larger lengths of nucleic acid, providing that the inserted length of nucleic acid contains a number of nucleotides which is divisible by three, such that the reading frame is maintained. Concerted combinations of insertions, deletions, and substitutions are also possible.
There is no restriction to the type of alteration event to which the present application is applied, although the most obvious applications include those which are extremely difficult or time consuming using approaches that are currently available. In particular, the alteration is one which is not amenable to high-throughput methodologies using current techniques. Examples include the precise modification of endogenous nucleic acid molecules in any species, such as yeast chromosomes, mouse embryonic stem cell chromosomes, C. elegans chromosomes, Arabidopsis and Drosophila chromosomes, human cell lines, viruses and parasites, or exogenous molecules such as plasmids, yeast artificial chromosomes (YACs) and human artificial chromosomes (HACs).
In order to ensure that the recognition site and selection marker are removed as a result of the second homologous recombination step c), the homology arms must span these sequences in the target nucleic acid which is produced in step b). This is evident, for example, from the illustration of an exemplary system according to this embodiment of the invention which is illustrated in Figure 6. It will be within the ambit of the skilled reader, imbued with knowledge of the present invention, to design constructs for use in accordance with the present invention.
In all the aspects of the invention described herein, the nucleic acid fragments that are introduced may be circular or linear, but are preferably linear DNA or RNA molecules, either double-stranded or single-stranded. DNA is generally preferred. Preferred nucleic acids thus include single-stranded DNA or RNA, in either orientation, 5' or 3'. Annealed oligonucleotides may also be used, either with blunt ends, or possessing 5' or 3' overhangs. In one embodiment, single-stranded oligonucleotides are used. In another embodiment, single-stranded deoxyribonucleotides are used. Introduced nucleic acid molecules carrying a synthetic modification can also be used.
It should be noted that the introduced nucleic acid fragments do not necessarily represent a single species of nucleic acid molecule. For example, it is possible to use a heterogeneous population of nucleic acid molecules, for example, to generate a DNA library, such as a genomic or cDNA library.
A number of different types of target nucleic acid molecule may be used in the method of the invention. Accordingly, intact circular double-stranded nucleic acid molecules (DNA and RNA), such as plasmids, and other extrachromosomal DNA molecules based on cosmid, P1, BAC or PAC vector technology may be used as the target nucleic acid molecule according to the invention described above. Examples of such vectors are described, for example, by Sambrook and Russell (Molecular Cloning, Third Edition (2000), Cold Spring Harbor Laboratory Press) and Ioannou et al. (Nature Genet. 6 (1994), 84-89) and the references cited therein.
The target nucleic acid molecule may also be a host cell chromosome, such as, for example, the E. coli chromosome. Alternatively, a eukaryotic host cell chromosome (for example, from yeast, C. elegans, Drosophila, mouse or human) or eukaryotic extrachromosomal DNA molecule such as a plasmid, YAC and HAC can be used. Alternatively, the target nucleic acid molecule need not be circular, but may be linear. Preferably, the target nucleic acid molecule is a double-stranded nucleic acid molecule, more preferably, a double-stranded DNA molecule.
The method of the invention may be effected, in part, in a host. Suitable hosts include cells of many species, such as prokaryotes and eukaryotes, and also including viruses and parasites, although bacteria, such as Gram-negative bacteria, are a preferred host. More preferably, the host cell is an enterobacterial cell, such as a Salmonella, Klebsiella, Bacillus, Neisseria or Escherichia coli cell (the method of the invention works effectively in all strains of E. coli that have been tested so far). It should be noted, however, that the method of the present invention is also suitable for use in eukaryotic cells or organisms, such as fungi, plant or animal cells, as well as viral and parasitic cells and organisms. The system has been demonstrated to function well in ES cells, specifically mouse ES cells, and there is no reason to suppose that it will not also be functional in other eukaryotic cells.
The method of the invention may comprise the contacting of the introduced and target nucleic acid molecules in vivo. In one embodiment, the introduced nucleic acid molecule may be transformed into a host cell that already harbours the target nucleic acid molecule. In a different embodiment, the introduced and target nucleic acid molecules may be mixed together in vitro before their co-transformation into the host cell. Of course, one or both of the species of nucleic acid molecule may be introduced into the host cell by any means, such as by transfection, transduction, transformation, electroporation and so on. For bacterial cells, a preferred method of transformation or cotransformation is electroporation. In one embodiment the homologous recombination of the method is initiated entirely in vitro, without the participation of host cells or the cellular recombination machinery. Phage annealing proteins such as RecT are able to form complexes in vitro between the protein itself, an oligonucleotide molecule and a double-stranded nucleic acid molecule (Noirot and Kolodner, J Biol Chem 273 (1998), 12274-12280). One example of such a complex is that formed between RecT, a ssDNA oligonucleotide and an intact circular plasmid. Such complexes lead to the formation of complexes that are herein termed "joint molecules" (consisting, in this example, of the plasmid and the ssDNA oligonucleotide). Such joint molecules have been found to be stable after removal of the phage annealing protein. The formation of stable joint molecules has been found to be dependent on the existence of shared homology regions between the ssDNA oligonucleotide and the plasmid.
The potential of RecA to make joint molecules in vitro has already been exploited to allow the isolation of desired DNA strategies from a pool, for example in RecA-assisted cloning (Ferrin and Camerini-Otero, Proc Natl Acad Sci USA 95 (1998), 2152-2157, for review see Ferrin, Methods Mol Biol 152 (2000), 135-147) and in RecA-mediated affinity capture (Zhumabayeva et al., Biotechniques 27 (1999), 834-840). It is proposed herein that so-called "joint molecules" as described above may be used directly to mediate recombination in a host cell, where the host cell does not need to express any phage annealing protein whatsoever.
As detailed herein, the methods of the invention rely on recombination events that involve the replacement of a section of target nucleic acid for an equivalent section of introduced nucleic acid, to which the introduced fragment is directed through the existence of shared regions of sequence homology between the two molecule types. As with conventional homologous recombination events, the introduced nucleic acid becomes covalently attached to the target nucleic acid. In this manner, the sequence information in the introduced nucleic acid molecule becomes integrated into the target nucleic acid molecule in a precise and specific manner, and with a high degree of fidelity. The efficiency of this step, when coupled with a selection step, is high, and allows the simple manipulation of sequences.
The nucleic acid molecule fragments used to replace target sequence may be single-stranded. This single-stranded nucleic acid may be generated in vivo or in vitro. In other words the single-stranded nucleic acid may be generated in a host cell. The generation of the single-stranded replacement nucleic acid from the double-stranded nucleic acid substrate prior to recombination may be mediated by any suitable means. The double-stranded nucleic acid substrate may be adapted such that one strand is preferentially degraded entirely to leave the other strand as the single-stranded replacement nucleic acid (see co-pending International application filed on 20th February 2009 and entitled "Method of nucleic acid recombination"). The degradation is preferably mediated by an exonuclease. The exonuclease may be a 3' to 5' exonuclease but is preferably a 5' to 3' exonuclease. Preferably, the 5' to 3' exonuclease is Red alpha (Kovall, R. and Matthews, B.W., Science, 1997, 277, 1824-1827; Carter, D.M. and Radding, C.M., 1971, J. Biol. Chem. 246, 2502-2512; Little, J.W., 1967, J. Biol. Chem., 242, 679-686) or a functional equivalent thereof. In an alternative embodiment, the exonuclease is RecBCD. Alternatively and/or additionally, the single-stranded replacement nucleic acid is generated from the double-stranded nucleic acid substrate by a helicase. The helicase separates the dsDNA substrate into two single-stranded nucleic acids, one of which is the single-stranded replacement nucleic acid. The helicase may be either a 5'-3' or 3'-5' helicase. Preferably the helicase is RecBCD whilst it is inhibited by Red gamma. In other preferred embodiments, the helicase is any helicase of the RecQ, RecG or DnaB classes. In some embodiments, the single-stranded replacement nucleic acid generated by the helicase is preferentially stabilised relative to the other single-stranded nucleic acid generated by the helicase.
In preferred embodiments, the step of generating the single-stranded replacement nucleic acid from the double-stranded nucleic acid substrate is carried out in a host cell in which the recombination occurs. Alternatively, the step of generating the single-stranded replacement nucleic acid may be carried out in a separate host cell from the host cell in which the recombination occurs and may then be transferred to the host cell in which recombination occurs by any suitable means, for example, by transduction, transfection or electroporation. Alternatively, the step of generating the single-stranded nucleic acid from the double-stranded nucleic acid substrate may be carried out in vitro. Thus, the requirement in the host cell in which recombination takes place for Red alpha or an alternative enzyme that preferentially degrades one strand of the double-stranded nucleic acid substrate, or which separates the two strands, may be bypassed by providing the single-stranded nucleic acid to the host cell.
Advantageously, adapting one or both 5' ends of the double-stranded nucleic acid increases the yield of the single-stranded nucleic acid. Preferably, this increase in yield is due to the effect of adapting the 5' end(s) on the enzymes that act to generate the single-stranded nucleic acid.
Preferably, the double-stranded nucleic acid substrate is adapted so that it is asymmetric at its 5' ends. The asymmetry preferably causes one strand to be preferentially degraded. This preferably results in the other strand being maintained and so the production of a single-stranded nucleic acid is favoured, thereby improving the yield of the single-stranded nucleic acid.
By preparing a double-stranded nucleic acid substrate with asymmetric 5' ends and bringing this into contact with a target nucleic acid in the presence of Red beta and a suitable degradation/separation enzyme (preferably Red alpha or a helicase), it is possible to increase engineering efficiencies to levels greater than any other configuration yet described for recombineering methodologies. Therefore, the method of the invention preferably utilises a double-stranded nucleic acid substrate having asymmetry at its 5' ends wherein the method is conducted in the presence of Red alpha and/or a helicase and in the presence of Red beta. Red gamma is preferably also present as Red gamma inhibits RecBCD, which degrades double-stranded DNA. Another efficient way to engineer DNA using Red-mediated homologous recombination employs a double-stranded nucleic acid substrate that is adapted to have asymmetric 5' ends in the presence of Red beta and Red gamma, without Red alpha. A less efficient but still operable way to engineer DNA using Red-mediated homologous recombination employs a double-stranded nucleic acid substrate that is adapted to have asymmetric 5' ends in the presence of Red beta, without Red gamma (or a functional equivalent thereof) and without Red alpha (or a functional equivalent thereof). Such a method is also encompassed within the scope of the invention.
Any suitable method of making a double-stranded nucleic acid substrate asymmetric such that one strand is preferentially degraded whilst the other is maintained is envisaged by the present invention. The asymmetry may be conferred, for example, by one or more features present in only one strand of the double-stranded nucleic acid substrate or by one or more features present in both strands of the double-stranded nucleic acid substrate, wherein different features are present in different strands.
Preferably, the asymmetry is present at or in close proximity to the 5' ends of the two strands of the double-stranded nucleic acid substrate, most preferably at the 5' ends. For example, the asymmetry is preferably present at the 5' end of the 5' identity regions of the double-stranded nucleic acid substrate, or may be present in a region 5' of the 5' identity regions. The "identity regions" of the double-stranded nucleic acid substrate correspond to the regions of the single-stranded nucleic acid that are identical to sequence on the target nucleic acid, or are complementary thereto. For example, the double-stranded nucleic acid substrate may have one or more features at or in close proximity to the 5' end of one of its strands but not at or in close proximity to the 5' end of the other strand which make it asymmetric.
Preferably, the asymmetry is conferred by a modification to the nucleic acid sequence. Preferably, the modification affects the progression of exonuclease, preferably a 5'-3' exonuclease, preferably Red alpha exonuclease, on one strand but does not affect the progression of the exonuclease on the other strand. For example, the modification may inhibit the progression of exonuclease on one strand such that the exonuclease preferentially degrades the other strand. By "inhibit" the progression of exonuclease is meant that the modification inhibits the progression of the exonuclease on that strand relative to the other strand, for example, by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, most preferably 100%. For example, the modification may be the inclusion of a blocking DNA sequence, such as the Red alpha exonuclease pause sequence, more preferably, the Red alpha pentanucleotide pause sequence GGCGA, more preferably GGCGATTCT, more preferably, the left lambda cohesive end, also called the cos site (Perkins TT, Dalal RV, Mitsis PG, Block SM. Sequence-dependent pausing of single lambda exonuclease molecules. Science 301: 1914-8). The Red alpha exonuclease pause site may, for example, be placed at or in close proximity to the 5' end of one strand but not at or in close proximity to the 5' end of the other strand.
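By way of illustration only, the placement rule for the pause sequence can be checked in silico. The following is a minimal sketch and not part of the described method: the helper names, the 20-nucleotide window used to stand in for "close proximity", and the assumption that the motif is read 5' to 3' on each strand are all hypothetical choices made for the example. Only the GGCGA motif itself is taken from the text.

```python
# Hypothetical sketch: test whether a pause motif lies near the 5' end of
# exactly one strand of a double-stranded substrate.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

PAUSE_MOTIF = "GGCGA"  # pentanucleotide pause sequence named in the text


def revcomp(seq: str) -> str:
    """Return the reverse complement, i.e. the other strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def pause_sites_near_5prime(top: str, window: int = 20, motif: str = PAUSE_MOTIF):
    """Return (top_has_motif, bottom_has_motif) for the 5'-proximal window.

    `window` is an assumed stand-in for "close proximity"; the text gives
    no numerical value.
    """
    bottom = revcomp(top)
    return (motif in top[:window], motif in bottom[:window])


def is_asymmetric(top: str, window: int = 20, motif: str = PAUSE_MOTIF) -> bool:
    """True when the motif is near the 5' end of one strand but not the other."""
    top_has, bottom_has = pause_sites_near_5prime(top, window, motif)
    return top_has != bottom_has
```

For example, `is_asymmetric("GGCGATTCT" + "ACGT" * 6 + "AAATTTAAATTT")` reports an asymmetric substrate because only the top strand carries the motif in its 5'-proximal window.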
In a further preferred embodiment, the modification prevents the exonuclease from binding to one strand of the double-stranded nucleic acid substrate such that only the other strand is degraded. In a further preferred embodiment, the modification does not prevent the exonuclease from binding but blocks it from degrading one strand or both of the double-stranded nucleic acid substrate such that the strand that will anneal to the lagging strand template is stabilized upon separation from the dsDNA substrate by a helicase. In an alternative embodiment, the modification may promote the progression of exonuclease, preferably of 5'-3' exonuclease, more preferably Red alpha exonuclease, on one strand such that the exonuclease preferentially degrades that strand relative to the other strand. By "promote" the progression of exonuclease is meant that the modification promotes the progression of exonuclease activity on that strand relative to the other strand, for example, by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 100%, at least 200%, at least 300%, or at least 400%. In embodiments in which the two strands of the double-stranded nucleic acid substrate are separated by a helicase, the modification may serve to preferentially stabilise one strand, for example, by preventing an exonuclease or endonuclease from binding to that strand. In another embodiment, the modification prevents exonuclease degradation of both strands such that one strand is protected and can be released from the other by the action of a helicase.
In a preferred embodiment, the modification is one or more covalent modifications. Preferably, the covalent modification is present at or in close proximity to the 5' end of one strand but is not present at or in close proximity to the 5' end of the other strand. More preferably, the covalent modification is present at the 5' end of one strand but is not present at the 5' end of the other strand.
Preferred covalent modifications are the presence of a replacement nucleotide, such as the presence of a hydroxyl group or a phosphothioester bond. Such covalent modifications disfavour the action of exonucleases.
For example, in embodiments in which it is desired to protect the 5' end of the strand to be maintained, the covalent modification is preferably selected from one or more of the following:
• one or more phosphothioates in place of one or more phosphodiester bonds. Preferably, the phosphothioate(s) is present in place of the 5'-most bond in the 5' identity region, or are present in place of the first two bonds, or are present in place of up to each of the first six (e.g. 3, 4, 5, 6) or more bonds in the 5' identity region;
• one or more phosphoacetates in place of one or more phosphodiester bonds. Preferably, the phosphoacetate(s) is present in place of the 5'-most bond in the 5' identity region, or are present in place of the first two bonds, or are present in place of up to each of the first six (e.g. 3, 4, 5, 6) or more bonds in the 5' identity region;
• one or more locked nucleotides (preferably LNA; 2'-O and/or 4'-C-Methylen-beta-D-ribofuranosyl) in place of one or more nucleotides. Preferably, the one or more locked nucleotides are present in place of the first nucleotide in the 5' identity region, or are present in place of the first two nucleotides, or in place of up to the first six (e.g. 3, 4, 5 or 6) nucleotides;
• a hydroxyl group. Preferably, the 5' most nucleotide of the substrate is also the 5' most nucleotide of the region that is identical to sequence on the target nucleic acid and the hydroxyl group is at the 5' end of this region of sequence identity;
• a 5' protruding end. For example, the covalent modification may be 2 or more protruding nucleotides, 4 or more protruding nucleotides, 6 or more protruding nucleotides, preferably 11 or more protruding nucleotides, preferably a 5' end containing the Red alpha pause sequence, preferably the left lambda cohesive end known as cos;
• any other covalent adduct that renders resistance to 5'-3' exonucleases. For example, the 5' end may be modified to contain an attached adduct such as biotin, digoxigenin or a fluorophore such as FITC.
In embodiments in which it is desired to render one strand of the double-stranded nucleic acid substrate sensitive to 5'-3' exonucleases such that the other strand is the strand to be maintained, the covalent modification is preferably selected from one or more of the following:
• a 5' phosphate group;
• a 5' end that is either flush or recessed with respect to the adjacent 3' end;
• a 5' end that carries a stretch of DNA sequence that is not identical to the target DNA. The stretch of DNA sequence may be, for example, 1-29 bps in length, more preferably 30-99 bps in length, more preferably 100-999 bps in length, even more preferably more than 1 kb in length;
• a 5' end that includes deoxy uridine nucleotides in place of deoxy thymidine nucleotides in the DNA strand;
• any other covalent adduct that conveys sensitivity to 5'-3' exonucleases.
Also encompassed within the scope of the invention are methods which use a double-stranded nucleic acid substrate for the production of a single-stranded nucleic acid that contains one or more covalent modifications that protect the 5' end of the strand to be maintained and also one or more covalent modifications that render the other strand of the double-stranded nucleic acid substrate sensitive to 5'-3' exonucleases. For example, the double-stranded nucleic acid substrate may lack the 5' phosphate (i.e. presence of hydroxyl) on one strand whilst the other strand comprises the 5' phosphate.
Preferably, the double-stranded nucleic acid substrate is adapted such that it comprises a 5' phosphothioate at one of its 5' ends but not at the other 5' end. Any other chemical modification at or near the 5' end which inhibits or promotes exonuclease progression or blocks exonuclease binding is also encompassed within the scope of the invention.
As mentioned above, the asymmetry may be caused by the double-stranded nucleic acid substrate having different extensions of single-strandedness; that is different combinations of 5' protruding, blunt (or "flush") or 3' protruding ends. For example, the double-stranded nucleic acid substrate may have only one 5' protruding end, only one 3' protruding end and/or only one blunt end. The asymmetry may be created by restriction cleavage to create different ends on the nucleic acid substrate. Restriction enzymes leave either 5' protruding, blunt or 3' protruding ends. The 5' protruding ends are least favoured for Red alpha digestion. Thus, in one embodiment, the double-stranded nucleic acid substrate preferably has only one 5' protruding end. In embodiments in which the asymmetry is generated by different extensions of single-strandedness, it is preferred that each strand of the double-stranded nucleic acid substrate is a continuous nucleic acid strand.
The asymmetry may alternatively be caused by the double-stranded nucleic acid substrate having different extensions of double-strandedness. For example, one end may have no additional nucleic acid sequence beyond the end of the identity region, and the other may have additional non-identical sequences. The additional non-identical sequences may be as short as 4 base pairs, however, preferably will be longer than 10 base pairs, and more preferably longer than 100 base pairs.
As mentioned above, it has also been found that homologous recombination may also occur in the absence of the Red alpha exonuclease when a double-stranded nucleic acid substrate is exposed to a target nucleic acid under conditions suitable for recombination to occur. It is hypothesised that a helicase acts to separate the two strands of the double-stranded nucleic acid substrate and that the strand that is the single-stranded nucleic acid is then available for use in homologous recombination. Surprisingly, it has been found that adapting both of the 5' ends of the double-stranded nucleic acid substrate leads to improved efficiencies of homologous recombination in such systems compared to systems in which the 5' ends are not adapted. Thus, in an alternative embodiment, the double-stranded nucleic acid is symmetrically adapted at both of its 5' ends. In a preferred embodiment, the double-stranded nucleic acid substrate is covalently modified at both of its 5' ends. Particularly preferred is the use of a double-stranded nucleic acid substrate in which both 5' ends are covalently modified with a biotin molecule, or more preferably, with a phosphothioate. Preferably, in such embodiments, the recombination is carried out in the absence of Red alpha. Alternatively, the invention also envisages using a helicase to generate the single-stranded nucleic acid from a double-stranded nucleic acid substrate that has 5' asymmetric ends, as described above.
The skilled person will understand the techniques required to adapt the double-stranded nucleic acid substrate to make it asymmetric. For example, following cleavage by a restriction enzyme, the substrate may be dephosphorylated with alkaline phosphatase, and then cleaved with a second restriction enzyme. As restriction enzymes usually leave phosphates on the 5' end, this will generate an asymmetrically phosphorylated substrate.
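The order of operations in this example can be followed with a simple bookkeeping model. The sketch below is illustrative only and rests on stated assumptions: the `Ends` class and the function names are hypothetical, each restriction cut is assumed to expose one new 5' end carrying a phosphate, and alkaline phosphatase is assumed to convert every exposed 5' phosphate to a hydroxyl.

```python
# Hypothetical bookkeeping model of the 5' end state of a linear substrate.
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Ends:
    """5' end state of each strand; None means the end is not yet exposed."""
    top_5prime: Optional[str] = None
    bottom_5prime: Optional[str] = None


def restriction_cut(ends: Ends, strand_end: str) -> Ends:
    """A restriction cut exposes a new 5' end that carries a phosphate."""
    return replace(ends, **{strand_end: "phosphate"})


def alkaline_phosphatase(ends: Ends) -> Ends:
    """Remove the phosphate from every 5' end that is currently exposed."""
    def strip(state: Optional[str]) -> Optional[str]:
        return None if state is None else "hydroxyl"
    return Ends(strip(ends.top_5prime), strip(ends.bottom_5prime))


def is_asymmetric(ends: Ends) -> bool:
    """True when both 5' ends exist and differ in phosphorylation state."""
    states = (ends.top_5prime, ends.bottom_5prime)
    return None not in states and states[0] != states[1]


# First cut, dephosphorylation, then second cut, as in the text.
substrate = restriction_cut(Ends(), "top_5prime")
substrate = alkaline_phosphatase(substrate)
substrate = restriction_cut(substrate, "bottom_5prime")
```

Under these assumptions the final substrate carries a 5' hydroxyl on one strand and a 5' phosphate on the other, which is the asymmetric phosphorylation the paragraph describes.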
Two oligonucleotides may be designed for use as the terminal identity regions as is usual for a recombineering exercise. These oligonucleotides may be chemically synthesized so that their 5' ends are different with respect to the presence of a replacement nucleotide at or in close proximity to the 5' end. These oligonucleotides can be used, for example, for oligonucleotide-directed mutagenesis after annealing, or PCR on templates to create the asymmetrically ended double-stranded nucleic acid substrate or mixed with standard double-stranded nucleic acid cassettes and co-introduced into a host for 'quadruple' recombination.
The double-stranded nucleic acid substrate may be made by any suitable method. For example, it may be generated by PCR techniques or may be made from two single-stranded nucleic acids that anneal to each other. The double-stranded nucleic acid substrate may in particular be generated by long range PCR. Long range PCR has been used in the art to generate double-stranded fragments, for example of up to 50kb (Cheng et al. (1994) Proc Natl Acad Sci 91: 5695-5699). The 5' ends of one or both of the primers used in this long range PCR may be adapted so that the PCR product is suitable for use as the double-stranded nucleic acid substrate in the methods of the invention.
A preferred embodiment of the invention is to perform the homologous recombination in a host cell mutated for exonucleases, specifically an E. coli host mutated for sbcB. Thus, the invention also provides a method comprising performing the homologous recombination in a host cell in which the activity of its endogenous sbcB exonuclease, or the orthologue or functional equivalent thereof, has been inactivated or reduced. Also provided is a host cell in which the activity of its endogenous sbcB exonuclease, or the orthologue thereof, has been reduced or inactivated relative to its wild-type counterpart; such a host cell forms an aspect of the present invention. Preferably, the host cell is E. coli. SbcB or its orthologue or functional equivalent may be inactivated or the activity thereof may be reduced by way of a mutation. Preferably, the mutation inactivates the SbcB or its orthologue. Any suitable mutation is envisaged, for example, a deletion, insertion or substitution. For example, the entire gene encoding the exonuclease may be deleted or one or more point mutations may be used to inactivate the SbcB or its orthologue.
The exonuclease may be inactivated in any other appropriate way, for example, by gene silencing techniques, by the use of exonuclease-specific antagonists or by degradation of the exonuclease.
Methods which utilise the mutant of the SbcB/orthologue/functional equivalent described above may be methods according to the present invention. Also provided is the use of the SbcB mutants (and corresponding orthologues/functional equivalents) in broader aspects of homologous recombination technology. Thus, there is provided a method of altering the sequence of a target nucleic acid comprising (a) bringing a first nucleic acid molecule into contact with a target nucleic acid molecule in the presence of a phage annealing protein, or a functional equivalent or fragment thereof, wherein said first nucleic acid molecule comprises at least two regions of shared sequence homology with the target nucleic acid molecule, under conditions suitable for repair recombination to occur between said first and second nucleic acid molecules and wherein the functional equivalent or fragment retains the ability to mediate recombination and wherein the activity of the host's endogenous sbcB exonuclease or orthologue or functional equivalent thereof has been inactivated or reduced; and (b) selecting a target nucleic acid molecule whose sequence has been altered so as to include sequence from said first nucleic acid molecule. Preferably, the phage annealing protein is Red beta or a functional equivalent thereof. The method may be carried out in the absence or presence of one or both of Red alpha and/or Red gamma or their functional equivalents.
Preferably, the target nucleic acid is the lagging strand template of a DNA replication fork and the inserted nucleic acid has 5' and 3' homology regions that can anneal to the lagging strand template of the target DNA when it is replicating. The term "lagging strand", as used herein, refers to the strand that is formed during discontinuous synthesis of a dsDNA molecule during DNA replication. The single-stranded replacement nucleic acid anneals through its 5' and 3' identity regions to the lagging strand template of the target nucleic acid and promotes Okazaki-like synthesis and is thereby incorporated into the lagging strand. The direction of replication for plasmids, BACs and chromosomes is known, and so it is possible to design the double-stranded nucleic acid substrate so that the maintained strand is the one that will anneal to the lagging strand template.
In some embodiments, the double-stranded nucleic acid substrate is made from two or more double-stranded nucleic acids or from one or more double-stranded nucleic acids together with one or more single-stranded oligonucleotides. The use of two double-stranded nucleic acids to make the double-stranded nucleic acid substrate is referred to herein as 'triple' recombination because there are two double-stranded nucleic acid molecules which are used to make the double-stranded nucleic acid substrate and there is one target nucleic acid. The use of three nucleic acids to make the double-stranded nucleic acid substrate is referred to herein as 'quadruple' recombination because there are three nucleic acids which are used to make the double-stranded nucleic acid substrate and there is one target nucleic acid.
Any number of single-stranded and/or double-stranded nucleic acids may be used to make the double-stranded nucleic acid substrate provided that the resulting double-stranded nucleic acid substrate is adapted at one or both of its 5' ends such that preferential degradation of one strand and/or strand separation generates the single-stranded nucleic acid.
In all cases where more than one nucleic acid is used to make the double-stranded nucleic acid substrate, a part of each of the more than one nucleic acids must be able to anneal with a part of its neighbouring nucleic acid. For example, for triple recombination, one end of each double-stranded nucleic acid that is used to make up the double-stranded nucleic acid substrate must be able to anneal to the target, whereas the other ends of each double-stranded nucleic acid that is used to make up the double-stranded nucleic acid substrate must be able to anneal to each other. The two double-stranded nucleic acids that are used to make up the double-stranded nucleic acid substrate are adapted such that one strand of each double-stranded nucleic acid is preferentially maintained. Methods for adaptation that lead to preferential degradation are discussed above. Following degradation of one strand of each of the two double-stranded nucleic acids, the remaining single strands anneal with each other to form the double-stranded nucleic acid substrate of the invention.
It is a great strength of the methods of the invention that no complex selection steps are necessary to select for the altered molecules. The advantages of the methods stem from the use of physical markers in both selection steps, namely selection for a selectable marker in step b), which removes any need for labour intensive screening for clones containing the mutation. This advantage may be compounded by the further, optional, use of selection against a counterselectable marker following transformation of the re-ligated product of step c) of the method according to the first aspect of the invention which ensures all altered nucleic acids recovered have had their selectable marker seamlessly excised, or following transformation of the digestion product of step d) of the method according to the second aspect of the invention, to provide a further level of selection for the desired altered nucleic acid.
The invention will now be illustrated by representative examples. It will be appreciated that modification of detail may be made without departing from the scope of the invention.
Brief description of the figures
Figure 1: Basic mechanism for site directed mutagenesis mediated by the R2S system;
Figure 2: Small insertions and replacements mediated by the R2S system;
Figure 3: Large insertions mediated by the R2S system;
Figure 4: Scheme for linearising DNA mediated by the R2S system;
Figure 5: Scheme for shRNA insertion into a plasmid/BAC after linearization of the plasmid/BAC by the R2S system;
Figure 6: Scheme for repairing LacZ through USE.
EXAMPLES
Example 1: Site directed mutagenesis
Site directed mutagenesis was based on the selection of a focal sequence in the plasmid or episome that was to be subjected to mutagenesis. This focal sequence can be any 1 to 20 bp sequence, depending on which Type IIS enzyme is used. In figure 1 it was 4 bp, specifically AGCT. The Type IIS enzyme (here BsaI) can be any Type IIS restriction enzyme (as well as any Type IIS derived zinc-finger endonuclease or any other Type IIS derived enzyme). Two recognition sequences for the chosen Type IIS enzyme are placed in inverted orientation so that cleavage at both sites separates the fragment carrying the two binding sites from the vector, leaving a complementary region of single-strandedness that can be readily ligated. By designing the cleavage sites in different ways, seamless mutagenesis can be easily achieved.
Figure 1 is read in clockwise direction, starting from the upper left part.
In step one, a DNA segment containing a gene for selection (usually a gene conferring antibiotic resistance, here blasticidin) and, optionally, a gene for counterselection (here it is rpsL for counterselection with streptomycin) was introduced by a standard method of homologous recombination (recombineering) and the cells carrying the recombined product were selected with blasticidin. The DNA segment can be made in any way, but most conveniently was made by PCR with oligonucleotides that encode, from the 5' end, a homology arm for recombination into the plasmid (here the homology arms are indicated by the blue regions); the 4 nucleotides that became single-stranded after cleavage by the Type IIS enzyme; the recognition sequence for the Type IIS enzyme; and a primer region for PCR amplification of a selectable gene (optionally including a counterselectable gene).
The recombined plasmid was purified, and optionally retransformed to isolate a pure plasmid preparation. In step 2 the recombined plasmid was digested with the Type IIS restriction enzyme (BsaI in this case) and re-ligated. The product of this reaction was transformed and optionally selected for the loss of rpsL by selection for streptomycin resistance.
The obtained product was the mutated vector. Here a single point mutation is illustrated; however, the same approach can be applied to mutate any of the base pairs that are within the single-stranded region created by the cleavage of the Type IIS enzyme.
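The digestion and re-ligation of step 2 can be followed with a short in silico sketch. This is offered as an illustration under stated assumptions, not as the procedure used in the example: the function names, the single-nucleotide spacer and the placeholder sequences are hypothetical, and only the top strand of the recombined region is modelled. The cut geometry used is the standard one for BsaI, GGTCTC(1/5), which leaves a 4-nucleotide 5' overhang.

```python
# Hypothetical top-strand model of excision by two inverted BsaI sites
# followed by re-ligation of the vector ends.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

BSAI_SITE = "GGTCTC"  # BsaI cuts 1 nt (top) / 5 nt (bottom) downstream
TOP_CUT = 1           # distance from the site to the top-strand cut
OVERHANG = 4          # length of the single-stranded region


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def recombined_region(left_arm: str, focal: str, marker: str, right_arm: str) -> str:
    """Region after step one: both sites point outwards towards the focal copies."""
    spacer = "A" * TOP_CUT  # arbitrary filler between site and overhang
    return (left_arm + focal + spacer + revcomp(BSAI_SITE) + marker
            + BSAI_SITE + spacer + focal + right_arm)


def digest_and_religate(top: str) -> str:
    """Excise the site-bearing fragment and join the two vector ends."""
    inverted = revcomp(BSAI_SITE)
    if top.count(BSAI_SITE) != 1 or top.count(inverted) != 1:
        raise ValueError("expected exactly one site in each orientation")
    left_site, right_site = top.index(inverted), top.index(BSAI_SITE)
    left_cut = left_site - TOP_CUT - OVERHANG
    right_cut = right_site + len(BSAI_SITE) + TOP_CUT
    if left_site > right_site or left_cut < 0:
        raise ValueError("sites are not in the outward-facing arrangement")
    left_overhang = top[left_cut:left_cut + OVERHANG]
    right_overhang = top[right_cut:right_cut + OVERHANG]
    if len(right_overhang) < OVERHANG or left_overhang != right_overhang:
        raise ValueError("overhangs are not compatible for re-ligation")
    # The overhang sequence is kept once; both recognition sites are lost.
    return top[:left_cut] + top[right_cut:]
```

With the focal sequence AGCT placed on both sides of the marker, `digest_and_religate` returns the left arm, one copy of AGCT and the right arm, with no trace of the marker or of the recognition sequences.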
Example 2 : Small insertions and deletions
The R2S strategy is particularly suitable for small insertions or small replacements. All steps are the same as described in example one, except the inserted DNA fragment was differently configured to suit the experimental purpose.
The sequence to be inserted or replaced (termed Seq X here and in figure 2) was adjacent to the focal sequence. Seq X can be any sequence of any length, although most often it was less than 1000 bps for convenience. The obtained product was a vector containing Seq X in any chosen place without a neighbouring selectable gene or any other operational sequence. Seq X can be any sequence introduced to the left, to the right, or on both sides of the focal point. It can be one or more codons (which can replace codons in the vector), a loxP site or any other site for a site-specific recombinase, a short tag or a restriction site. Frequently, Seq X was 3 bp long and in frame within a coding region. Hence the method of this example details codon-based mutagenesis, which is superior to single nucleotide mutagenesis for the mutation of protein coding regions.
Example 3: Triple recombination and Quadruple recombination
The method of this example was the same as those in examples 1 and 2, except that the nucleic acid recombined into the target was the product of recombination between two DNA fragments. Although not as efficient as the simple recombination applications illustrated above, this triple intermediate recombination worked with good efficiency because the initial recombination step must occur for the ultimate integration of the selectable gene into the target. This represents a convenient way to introduce larger sequences such as coding regions for fluorescent proteins. Highly similar is Quadruple recombination, in which the replacement nucleic acid was constructed from three fragments prior to integration into the target.
Example 4: Use of R2S for linearization of circular DNA molecules
R2S was also used to linearize a circular nucleic acid at any position between any two nucleotides, for example within a vector, or indeed within a genome, as long as the Type IIS enzyme did not recognize any other recognition sequences in the molecule or genome. This variation was particularly applicable to the linearization of BACs (bacterial artificial chromosomes) at chosen sites and the further use of these linearized BACs in complex recombination exercises. Furthermore, the applicants found that this method had particular application for the preparation of linearised vector backbones for high efficiency cloning by standard molecular biology techniques. First, a vector linearised by this method was created by digestion with a Type IIS restriction endonuclease, with single-stranded overhangs appropriate to the nucleic acid fragment which was desired to be cloned into the vector. As the person skilled in the art would know, nucleic acid digestions rarely proceed to absolute completion. The fragment to be cloned was generated with termini that were compatible with those in the prepared vector. The fragments were then ligated together. Transformation with the ligation product into a host cell was then followed by selection both for the selectable marker normally present on the vector backbone and for the exclusion of the counterselectable marker that had been incorporated in the vector backbone prior to its linearization and excised by the Type IIS restriction digest. This method allowed the inventors to remove any religated vector backbone that had not been fully cleaved from the products recovered following transformation. This step significantly simplifies the process of cloning, particularly in cases where the cloning of large DNA fragments is desired.
Example 5: Use of R2S linearization in a strategy for shRNA cloning
Any plasmid having a miR (microRNA) sequence may be used with this strategy. The strategy generates a shRNA expressing plasmid in miR context without addition of any other sequence or restriction site (which can be detrimental to the miR processing) in the final construct. Preferably the plasmid vector is based on the R6K plasmid origin gamma without the accompanying pi protein open reading frame (Filutowicz et al., 1986, Positive and negative roles of an initiator protein at an origin of replication, PNAS 83: 9645-9). These plasmids can be grown in E. coli hosts containing the pi protein but will not replicate in E. coli hosts in the absence of pi. Preferably the miR30 flanking sequences, which improve RNA processing of shRNA transcripts (Silva et al., 2005, Second-generation shRNA libraries covering the mouse and human genomes, Nature Genetics, 37: 1281-8) have been cloned into this plasmid. This example is shown schematically in Figure 5, which is read in clockwise direction, starting from the upper left part.
As shown in step 1 of Figure 5, the plasmid was modified by replacing the sequence for a short hairpin RNA with rpsLbsd (conferring resistance to blasticidin and sensitivity to streptomycin) flanked by type IIS restriction sites. The homology regions of the arms were chosen in the common miR sequence. This step was optionally performed in liquid culture using appropriate selection.
In step 2 of Figure 5, the plasmid was digested in vitro using a Type IIS restriction enzyme. The Type IIS restriction enzyme cuts inside the common sequence to generate non-complementary overhangs of a few nucleotides specific for the miR sequence. This mirrors the approach discussed in example 4 and shown in the scheme of Figure 4. After this stage, the linearized plasmid can be made as a batch and kept frozen for step 3.
In step 3, the shRNA is made by commercially sourced complementary oligonucleotides having the desired sequence, that will form dsDNA with overhangs that can anneal to the ends left after the Type IIS restriction cleavage of step 2. The ligation reaction is performed in vitro using standard conditions and counter selection for sensitivity to streptomycin is applied after transformation of the ligation reaction.
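The design of the two complementary oligonucleotides in step 3 can be sketched as follows. This is an illustrative sketch only: the function name and the example sequences are hypothetical, the overhangs are assumed to be 5' overhangs, and both overhangs are written 5' to 3' as they appear on the top strand of the final construct.

```python
# Hypothetical design of two oligonucleotides that anneal into a duplex
# whose 5' overhangs match the two ends of a Type IIS-linearised vector.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(insert: str, left_overhang: str, right_overhang: str):
    """Return (top_oligo, bottom_oligo), each written 5' to 3'.

    The top oligonucleotide starts with the left overhang sequence; the
    bottom oligonucleotide covers the insert plus the right overhang.
    """
    if left_overhang == right_overhang:
        # Identical overhangs would let the empty vector close on itself,
        # which the non-complementary design in step 2 is meant to avoid.
        raise ValueError("vector overhangs must differ from each other")
    top_oligo = left_overhang + insert
    bottom_oligo = revcomp(insert + right_overhang)
    return top_oligo, bottom_oligo


top_oligo, bottom_oligo = design_oligos("GATTACA", "TGCT", "TACT")
```

In this toy case the annealed duplex is double-stranded across GATTACA, with TGCT protruding from the top strand and the complement of TACT protruding from the bottom strand.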
Example 6: Unique Site Elimination for improved recombineering
A plasmid carrying the lacZ gene is illustrated in Figure 6. This plasmid also carries a gene conveying resistance to ampicillin (apR). In the first homologous recombination step, a gene conveying resistance to kanamycin (kmR) was inserted into the lacZ gene at a site chosen to introduce a point mutation. Alongside the kmR gene are several restriction enzyme recognition sequences (NotI, PacI, PmeI, KpnI). The kmR gene and the restriction sites were amplified by PCR using oligonucleotides that not only contained the appropriate primers for PCR amplification but also contained 50 extra nucleotides of sequence identity either side of the insertion site in the lacZ gene. The introduced restriction sites were absent from the starting plasmid and so are now unique in the product of the first recombination step. (NB this example employs four restriction enzyme sites but only one is needed).
The first step product was then recombined in a second step with an oligonucleotide that contained 50 extra nucleotides of sequence identity either side of the insertion site in the lacZ gene as well as the intended point mutation right in the middle (i.e. 50 nucleotides from each end). After recombination, the lacZ gene was restored by elimination of the kmR gene; however, the intended point mutation was also necessarily introduced. Because the second step was unselected, a substantial amount of first step product remained, which needed to be separated from the second step product. This was achieved by harvesting all plasmids together, and cleaving with the enzyme PmeI, which cut only those plasmids which had not incorporated the oligonucleotide of the second step of Figure 6. After restriction, the mixture of linearized first step product and circular second step product was retransformed back into the host and selected for ampicillin resistance. A large proportion of the resistant colonies carried the restored lacZ gene with the introduced point mutation.
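The enrichment achieved by the restriction step can be pictured with a toy model. The sketch is illustrative only: the function names and plasmid strings are hypothetical, and it assumes that every plasmid still carrying the unique site is linearised and is not recovered after retransformation, which is an idealisation. The PmeI recognition sequence GTTTAAAC is palindromic, so scanning one strand is sufficient.

```python
# Hypothetical model of Unique Site Elimination: plasmids that retain the
# unique restriction site are cut and drop out of the recovered population.

PMEI_SITE = "GTTTAAAC"  # PmeI recognition sequence (palindromic)


def has_site(plasmid: str, site: str = PMEI_SITE) -> bool:
    """Search a circular sequence, including across the origin junction."""
    wrapped = plasmid + plasmid[:len(site) - 1]
    return site in wrapped


def enrich(plasmids, site: str = PMEI_SITE):
    """Keep only plasmids that stay circular because they lack the site."""
    return [p for p in plasmids if not has_site(p, site)]


first_step = "ACGT" + PMEI_SITE + "TTGCA"   # still carries the unique site
second_step = "ACGTGGATCCTTGCA"            # site removed by recombination
recovered = enrich([first_step, second_step, first_step])
```

In this idealised model only the second step product is recovered; in practice digestion is rarely complete, so the text reports a large proportion rather than all of the colonies.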

Claims

1. A method for altering the sequence of a target nucleic acid molecule, said method comprising the steps of: a) introducing a nucleic acid fragment into the target nucleic acid molecule by homologous recombination, wherein the nucleic acid fragment comprises: i) a first region of homology to the target nucleic acid molecule; ii) a first recognition sequence for a Type IIS nuclease; iii) a selectable marker; iv) a second recognition sequence for a Type IIS nuclease; v) a second region of homology to the target nucleic acid molecule; wherein components i) to v) are ordered from 5' to 3'; and a replacement nucleic acid sequence positioned between the first and second regions of homology but flanking the recognition sequences for the Type IIS nuclease; b) selecting for the incorporation of the replacement nucleic acid using the selectable marker; and c) cleaving the product of step b) with a Type IIS nuclease such that the selectable marker is excised, to produce a linear target fragment including the desired replacement sequence.
2. The method of claim 1 comprising a further step wherein the linear target fragment product of step c) is ligated.
3. The method of claim 1 or 2 wherein the first and second recognition sequences for a Type IIS nuclease direct cleavage by the enzyme such that the recognition sequences are located on the same nucleic acid digestion product as the selectable marker.
4. The method of any preceding claim wherein the introduced nucleic acid further comprises a counter-selectable marker between the Type IIS nuclease recognition sequences.
5. The method of any preceding claim wherein the Type IIS nuclease is a Type IIS restriction enzyme.
6. The method of any preceding claim wherein the first and second recognition sequences for a Type IIS nuclease are recognised by the same Type IIS nuclease.
7. A method for altering the sequence of a nucleic acid molecule, said method comprising the steps of: a) introducing a nucleic acid fragment into the target nucleic acid molecule by homologous recombination, wherein the nucleic acid fragment comprises: i) a first region of homology to the target nucleic acid molecule; ii) a recognition sequence for a nuclease; iii) a selectable marker; and iv) a second region of homology to the target nucleic acid molecule; and b) selecting for the incorporation of the nucleic acid fragment using the selectable marker; c) introducing into the selected product of step b) a replacement nucleic acid fragment comprising, from 5' to 3': i) a first region of homology to the product of step b); ii) a replacement nucleic acid sequence; iii) a second region of homology to the product of step b); and such that the recognition sequence for the nuclease is removed; d) digesting the product of step c) with a nuclease that recognises the recognition sequence introduced in steps a) and b) such that successfully integrated replacement nucleic acids in step c) avoid linearization by the nuclease.
8. The method of claim 7 wherein the recognition sequence for a nuclease of the nucleic acid fragment introduced in step a) is a recognition sequence for a rare cutting enzyme.
9. The method of claim 7 or claim 8 wherein the recognition sequence for a nuclease of the nucleic acid fragment introduced in step a) is a recognition sequence that does not occur elsewhere in the target nucleic acid.
10. The method of any preceding claim wherein altering the sequence of a target nucleic acid molecule comprises one or more insertions, deletions, or substitutions.
11. The method of any preceding claim wherein the introduced nucleic acid fragment is comprised of two or more components.
12. The method of any preceding claim wherein the homologous recombination uses combinations of proteins comprising the Redβ protein, for example the Redα, Redβ and Redγ proteins.
13. The method of any preceding claim wherein the introduced nucleic acid is a single stranded nucleic acid.
14. The method of claim 13 wherein the single stranded nucleic acid is formed from a double stranded nucleic acid.
15. The method of claim 14 wherein the double stranded nucleic acid is adapted so that it is asymmetric at its 5' ends, wherein the asymmetry causes one strand to be preferentially degraded.
16. The method of any of claims 13-15 wherein the single stranded replacement nucleic acid anneals to the lagging strand template of a DNA replication fork.
17. The method of claims 1-6 or 10-16 wherein steps a) and b) are conducted in a host cell.
18. The method of claims 7-16 wherein steps a), b) and c) are conducted in a host cell.
19. The method of claim 17 or 18 wherein the host cell is a prokaryotic host cell.
20. The method of claim 17 or 18 wherein the host cell is a eukaryotic host cell.
PCT/IB2010/000893 2009-03-30 2010-03-30 Method of altering nucleic acids WO2010113031A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0905458.6 2009-03-30
GB0905458A GB0905458D0 (en) 2009-03-30 2009-03-30 Method of altering nucleic acids

Publications (2)

Publication Number Publication Date
WO2010113031A2 (en) 2010-10-07
WO2010113031A3 (en) 2011-01-27

Family

ID=40671958

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2010/000893 WO2010113031A2 (en) 2009-03-30 2010-03-30 Method of altering nucleic acids

Country Status (2)

Country Link
GB (1) GB0905458D0 (en)
WO (1) WO2010113031A2 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2534163A1 (en) * 2010-02-09 2012-12-19 Sangamo BioSciences, Inc. Targeted genomic modification with partially single-stranded donor molecules
EP2534163A4 (en) * 2010-02-09 2013-10-16 Sangamo Biosciences Inc Targeted genomic modification with partially single-stranded donor molecules
US9005973B2 (en) 2010-02-09 2015-04-14 Sangamo Biosciences, Inc. Targeted genomic modification with partially single-stranded donor molecules
US9255259B2 (en) 2010-02-09 2016-02-09 Sangamo Biosciences, Inc. Targeted genomic modification with partially single-stranded donor molecules
US9970028B2 (en) 2010-02-09 2018-05-15 Sangamo Therapeutics, Inc. Targeted genomic modification with partially single-stranded donor molecules
WO2013144257A1 (en) * 2012-03-27 2013-10-03 Dsm Ip Assets B.V. Cloning method
CN104204205A (en) * 2012-03-27 2014-12-10 帝斯曼知识产权资产管理有限公司 Cloning method
US9738890B2 (en) 2012-03-27 2017-08-22 Dsm Ip Assets B.V. Cloning method
US10865407B2 (en) 2012-03-27 2020-12-15 Dsm Ip Assets B.V. Cloning method
WO2023107713A3 (en) * 2021-12-09 2023-10-12 Bonadea Diagnostics, Llc Sequence conversion reaction

Also Published As

Publication number Publication date
GB0905458D0 (en) 2009-05-13
WO2010113031A3 (en) 2011-01-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10717759

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10717759

Country of ref document: EP

Kind code of ref document: A2