A walk-through technique for in vitro recombination of polynucleotide sequences
A simple and efficient method for random in vitro mutagenesis and recombination of polynucleotide sequences is described. It is based on walk-through synthesis with chain-elongating molecules and chain-terminating molecules, e.g. dNTPs/ddNTPs, followed by reassembly and amplification. The utility of this method for improving protein structure and/or function is demonstrated by creating novel sarcosine oxidase variants and by recombining human placental alkaline phosphatase and calf intestinal alkaline phosphatase.
Various optimization procedures such as genetic algorithms (Holland, J. H. 1975. Adaptation in natural and artificial systems. The University Press, Ann Arbor; Goldberg, D. E. 1989. Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading) and evolutionary strategies (Eigen, M. 1971. Naturwissenschaften 58: 465-523; Rechenberg, I. 1973. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart) have been inspired by natural evolution. These procedures employ mutation, which makes small random changes in each single member of the population, as well as crossover, which combines properties of different individuals, to achieve a specific optimization goal. There is also a strong interplay between mutation and crossover, as shown by computer simulations of different optimization problems (Brady, R. M. 1985. Nature 317: 804-806; Muhlenbein, H. 1991. Parallel Computing 17: 619-632; Pal, K. F. 1993. Biol. Cybern. 69: 539-546; Pal, K. F. 1995. Biol. Cybern. 73: 335-341). Developing efficient and practical experimental techniques to mimic these key processes is a scientific challenge. The application of such techniques should allow us, for example, to explore and optimize the functions of biological molecules such as proteins and nucleic acids, in vivo or even completely free from the constraints of a living system (Joyce, G. F. 1992. Scientific American 267: 90-97; Shao, Z. and Arnold, F. H. 1996. Curr. Opin. Struct. Biol. 6: 513-518).
Proteins are engineered with the goal of better understanding the molecular basis for their functions as well as improving their performance for practical applications. A variety of approaches, involving both 'rational' and 'irrational' design, have been successfully used to optimize protein functions (Shao, Z. and Arnold, F. H. 1996. Curr. Opin. Struct. Biol. 6: 513-518). The choice of approach for a given optimization problem will depend upon the degree of understanding of the relationships between sequence, structure and function. The rational redesign of an enzyme's catalytic site, for example, often requires extensive knowledge of the enzyme structure, the structures of its complexes with various ligands and analogs of reaction intermediates, and details of the catalytic mechanism. Such information is available only for a very few well-studied systems; little is known about the vast majority of potentially interesting enzymes.
For optimizing protein functions, several 'irrational' optimization approaches are being used (recently reviewed by Shao, Z. and Arnold, F. H. 1996. Curr. Opin. Struct. Biol. 6: 513-518). These optimization procedures do not require much specific knowledge about the enzyme itself, only values of its function to be optimized. These procedures are even fairly tolerant to inaccuracies and noise in the function evaluation (Moore, J. C. and Arnold, F. H. 1996. Chem. Eng. Sci. 51: 5091-5102). A particularly promising approach is to use both mutation and crossover synchronously to probe populations of solutions simultaneously for a specific optimization problem. Through multiple generations of screening or selection, bad local optima are evaded and better and better solutions are "bred" (Stemmer, W. P. C. 1994a. Nature 370: 389-391; Stemmer, W. P. C. 1994b. Proc. Natl. Acad. Sci. USA 91: 10747-10751). During the past few years, more and more people have recognized the merits of evolutionary search strategies. Only a few practical techniques, however, such as
• sequential mutagenic PCR (Moore, J. C. and Arnold, F. H. 1996. Chem. Eng. Sci. 51: 5091-5102),
• combinatorial cassette mutagenesis (Reidhaar-Olson, J. F. and Sauer, R. T. 1988. Science 241: 53-57),
• 'DNA shuffling' (Stemmer, W. P. C. 1994a. Nature 370: 389-391; Stemmer, W. P. C. 1994b. Proc. Natl. Acad. Sci. USA 91: 10747-10751),
• random-priming recombination (Shao, Z., Zhao, H., Giver, L. and Arnold, F. H. 1998. Nucleic Acids Res. 26(2): 681-683),
• the staggered extension process (StEP) (Zhao, H. et al. 1998. Nature Biotechnol. 16(3): 258-261),
• artificial recombination (Shao, Z. 1999. Protein Science 8(Suppl. 2): 55; Volkov, A., Shao, Z. and Arnold, F. H. 1999. Nucleic Acids Res. 27(18): e18, i-vi)
have been successfully applied to these problems.
One of the most commonly used techniques for evolutionary optimization is "DNA shuffling" (US 5,834,252; US 5,605,793; US 5,830,721; US 5,837,458; US 5,811,238). DNA shuffling comprises the following steps:
1) creation of random double-stranded fragments from different double-stranded template polynucleotides wherein said different template polynucleotides contain preferably areas of identity and areas of heterology
2) reassembling of the random fragments
3) cloning
4) expression and
5) screening.
The different double-stranded template polynucleotides may belong to the same family of nucleic acids or proteins (i.e. are related) but which differ in their sequence (i.e. are not identical) and hence in their biological activity. The use of DNAse I is described for random fragmentation in the prior art.
DNA shuffling using DNAse fragmentation as described by Stemmer et al. has a number of disadvantages. First of all, DNAse I cleaves the DNA non-randomly. During an early phase of the reaction, cleavage occurs preferably in the middle of a DNA sequence, and in the later phase of fragmentation cleavage occurs preferably between purine and pyrimidine analogues. This leads to a fragmentation procedure which can hardly be controlled. Furthermore, the minimal length of the desired gene is limited because of the non-controllable digestion of the template polynucleotide. Moreover, DNAse has to be removed completely after the digestion step because it disturbs the subsequent reassembly reaction.
The aim of the present invention was, therefore, to develop a novel technique for the recombination of DNA sequences.
The subject of the present invention is a novel technique for in vitro walk-through recombination of DNA sequences. The technique involves walking through a template gene with a mixture of chain-elongating molecules and chain-terminating molecules, e.g. dNTPs/ddNTPs, to generate a pool of DNA fragments with randomly distributed 3' ends and a low level of point mutations. Thus, a fragment ladder is generated similar to the fragment ladders obtained in sequencing reactions. After removing the terminating molecules, e.g. ddNMP ends, these short DNA fragments can prime one another under appropriate reaction conditions based on homology and thus can be reassembled to form full-length genes by repeated thermocycling in the presence of thermostable DNA polymerase. These genes can be further amplified by a conventional PCR and cloned into a proper vector for expression of the encoded proteins. Screening or selection of the expressed mutants leads to new variants with improved or even novel functions. These variants can be immediately used as partial solutions to a practical problem, or they can serve as new starting points for further cycles of walk-through mutagenesis and recombination. This technique has been proven by creating Bacillus subtilis BMTU3420 sarcosine oxidase variants and by recombining human placental alkaline phosphatase and calf intestinal alkaline phosphatase. It was found that the technique is both simple and efficient.
Therefore, subject of the present invention is a method for forming a polynucleotide sequence comprising the steps of
a) generating a nucleic acid fragment ladder by nucleic acid synthesis which is carried out in presence of a reaction mixture comprising at least one primer, an enzyme having nucleic acid-synthesizing activity and a mixture of nucleic acid chain-terminating and nucleic acid chain-elongating molecules and a template polynucleotide,
b) removing the chain-terminating molecules or changing them into non-terminating molecules,
c) reassembly of the polynucleotide by hybridizing these fragments to one another or to template polynucleotides in presence of a thermostable enzyme having nucleic acid-synthesizing activity and conducting nucleic acid synthesis either i) in presence of a mixture of nucleic acid chain-terminating molecules and nucleic acid chain elongating molecules or ii) in presence of a reaction mixture of nucleic acid chain elongating molecules which does not contain chain-terminating molecules whereas in case of i) steps b) and c) will be repeated subsequently.
The step of removing chain-terminators and the step of fragment reassembly can also be done at the same time. The subdivision of the process into steps a, b, c does not imply that three separate steps have to be performed. The three steps may also take place at the same time. It is also possible, and more preferred, that the whole WalkThrough recombination is performed in a single container, vessel or tube.
Even though one primer is enough for an individual fragment synthesis (or chain elongation) reaction, two or more primers are necessary for the different fragment synthesis reactions in order to get correctly orientated fragments for the subsequent reassembly. That is, two or more primers are needed for the whole WalkThrough process.
A single template is sufficient for the WalkThrough process, as diversity can be introduced during the chain synthesis and fragment reassembly (resulting from, for example, synthesis errors of DNA polymerase). That means that, in case the template polynucleotide is not a mixture of different polynucleotides, the DNA fragments may contain a low level of local mutations due to some synthesis errors and thus new sequences will be created.
Using more than one template may significantly increase the diversity of the variant library, and, consequently, finding potentially positive variants may be easier. Preferably, the template polynucleotide in step a) and/or c) is a mixture consisting of different template polynucleotides which may belong to the same family of nucleic acids or proteins (i.e. related) but which differ in their sequence (i.e. not identical) and hence differ in their biological activity. In this case gene shuffling may occur.
Conditions for the WalkThrough technique may vary from case to case and can be optimized at three main different levels:
(a) fragment synthesis in the presence of chain-terminators,
(b) Removing the chain terminators incorporated, and
(c) Reassembling the fragments from the step (b).
a) Fragment synthesis in the presence of chain-terminators
This step is somewhat similar to a normal polynucleotide sequencing reaction, and, therefore, various conditions suitable for sequencing different targets may be applied to this step.
To carry out the WalkThrough procedure, the template can be single- or double-stranded polynucleotides in linear or closed circular form. The template may also be in the form of genomic DNA, that is, in native, intact, unpurified, or uncloned forms. Since, in most cases, the template genes are cloned in vectors into which no additional mutations should be introduced, they are usually first cleaved with restriction endonuclease(s) and purified from the vectors through agarose gel electrophoresis. For example, linear DNA molecules are denatured and annealed to oligodeoxynucleotides for walk-through reactions in the presence of an appropriate amount of chain-elongating/chain-terminating molecules, e.g. dNTPs/ddNTPs. Thus the oligonucleotides prime the DNA of interest at different positions along the entire target region and are extended to generate DNA fragments complementary to each strand of the template DNA. Due to some synthesis errors, these DNA fragments also contain a low level of local mutations. After removing the chain-terminating ddNMP ends, these DNA fragments can prime one another under appropriate reaction conditions based on homology and be reassembled into full-length genes by repeated thermocycling in the presence of thermostable DNA polymerase. The resulting full-length genes will have diverse sequences, most of which, however, still resemble that of the original template DNA.
The fragment synthesis can be carried out either by chain elongation under mild conditions using mesophilic DNA polymerases or by thermocycle sequencing using thermostable polymerase. Adapting PCR to the WalkThrough synthesis provides a more convenient way to obtain more nascent DNA fragments and makes our technique more robust.
If thermocycling is used for the fragment synthesis, the number of thermocycles can be from about 15 to about 55, depending on the amount of template and its purity. The concrete cycle number can be easily determined according to the quality and quantity of the newly synthesized fragments.
The amount of initial DNA template(s) may vary according to the chain elongation conditions. Normally, 0.1-10 pmol of ds-DNA is sufficient for this kind of reaction.
The length of the primers used may vary from 14-mer to 28-mer. The selected primers should be long enough to prevent annealing to unspecific DNA binding positions and to guarantee a good signal-to-noise ratio. We usually use primers of 18- to 21-mer.
The distances between the WalkThrough primers depend on the chain reaction modes and may vary from 100 to 800 nucleotides. The average distance between two primers in our cases is normally around 500 nucleotides.
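The primer length and spacing ranges given above can be illustrated with a small planning helper. This is only a sketch under simplifying assumptions: it places evenly spaced primers on one strand and ignores melting temperature and specificity, which real primer design must consider. The function name and its parameters are hypothetical and not part of the described method.

```python
def primer_start_positions(template_length, spacing=500, primer_length=20):
    """Return 0-based start positions of evenly spaced primers on one strand.

    Assumed defaults: about 500 nt spacing and a 20-mer primer, both inside
    the ranges stated in the text (100-800 nt and 14- to 28-mer).
    """
    if not 100 <= spacing <= 800:
        raise ValueError("spacing outside the 100-800 nt range given in the text")
    if not 14 <= primer_length <= 28:
        raise ValueError("primer length outside the 14- to 28-mer range given in the text")
    last_start = template_length - primer_length
    if last_start < 0:
        raise ValueError("template is shorter than the primer")
    # One primer every `spacing` nucleotides, as long as it still fits.
    return list(range(0, last_start + 1, spacing))


if __name__ == "__main__":
    # Hypothetical 1200 nt template
    print(primer_start_positions(1200))
```

For a hypothetical 1200 nt template, the defaults yield three primers on this strand; the complementary strand would be planned in the same way.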
The concentration of each primer can vary from 0.05 pmol to 2 pmol, depending on what kind of chain reaction is used. Normally, 0.1 pmol primer is used for the chain elongation of B. subtilis sarcosine oxidase gene, while 1.0 pmol each primer is used for the chain elongation of human placental alkaline phosphatase and calf intestinal alkaline phosphatase.
Since dozens of polymerases are currently available, the synthesis of the nascent DNA fragments with randomly distributed 3' ends can be achieved in several different fashions. For example, bacteriophage T4 DNA polymerase (Nossal 1974) or T7 Sequenase version 2.0 DNA polymerase (Tabor and Richardson 1987, 1989), and/or thermostable DNA polymerases can be used for the WalkThrough synthesis. More preferably, other modified DNA polymerases or mixtures of DNA polymerases can be used which incorporate ddNTPs more efficiently than the wild-type polymerases. For a single-stranded RNA template, reverse transcriptase is preferred for the WalkThrough synthesis. Each of these enzymes is usually used under its optimized conditions to promote the chain elongation. Enzymes having nucleic acid-synthesizing activity can be DNA- or RNA-dependent polymerases.
It has been known for a long time that different DNA polymerases can use 2',3'-dideoxynucleotide triphosphates (ddNTPs) as substrates (Sanger, F., Nicklen, S., and Coulson, A.R. 1977. Proc. Natl. Acad. Sci. U.S.A. 74:5463-5467; Sanger, F., Coulson, A.R., Barrell, B.G., Smith, A.J.M., and Roe, B.A. 1980. J. Mol. Biol. 143:161-178) for DNA synthesis on single-stranded templates (Klenow, H. and I. Henningsen. 1970. Proc. Natl. Acad. Sci. 65:168).
When a ddNMP is incorporated at the 3' end of the growing chain, chain elongation is terminated at A, C, G, or T because the chain now lacks a 3' hydroxyl group. To generate the four chain ladders, only one of the four possible ddNTPs is included in each of the four reactions. The ddNTP/dNTP
ratio in each reaction is adjusted such that a portion of the elongating chains terminate at each occurrence of the base in the template corresponding to the included complementary ddNTP. In this way, each of the four elongation reactions contains a population of extended chains, all of which have a fixed 5' end determined by the annealed primers and a variable 3' end terminated at a specific dideoxynucleotide. The chain elongation and termination can also be done in one reaction in the presence of proper amount of ddNTPs/dNTPs.
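The fragment ladder described above, with a fixed 5' end and a variable 3' end, can be sketched in silico. The following is a minimal simulation, assuming that every incorporated nucleotide is a terminator with the same probability regardless of base or sequence context; a real polymerase will deviate from this. The function name and its parameters are hypothetical.

```python
import random


def walk_through_ladder(full_product, primer_length, p_terminate, n_chains, seed=0):
    """Simulate extension products of a single primer.

    full_product : sequence (5'->3') that an unterminated chain would reach
    primer_length: number of leading bases contributed by the annealed primer
    p_terminate  : assumed probability that an incorporated nucleotide is a ddNMP
    Returns a list of fragments; all share the 5' end and differ at the 3' end.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_chains):
        end = primer_length
        while end < len(full_product):
            end += 1                        # one more nucleotide is incorporated
            if rng.random() < p_terminate:  # it was a terminator: the chain stops
                break
        fragments.append(full_product[:end])
    return fragments


if __name__ == "__main__":
    product = "ATGGCTAGCTTAGGCTAACGTTAGC"   # hypothetical sequence
    for fragment in sorted(set(walk_through_ladder(product, 5, 0.2, 50)), key=len):
        print(fragment)
```

Sorting the distinct fragments by length prints a ladder comparable to the one read from a sequencing gel.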
The term "chain-terminating molecules" refers to molecules that terminate nucleic acid synthesis, so that a fragment ladder is generated. After generation of the fragment ladder, suitable chain-terminating molecules can be removed. Suitable chain-terminating nucleotides are e.g. dideoxynucleotides such as ddGTP, ddATP, ddCTP, ddTTP or derivatives thereof. Derivatives of dideoxynucleotides are defined as those dideoxynucleotides that are able to be incorporated by a thermostable DNA polymerase into growing DNA molecules that are synthesized in a thermocycling reaction. These dideoxynucleotides and derivatives are preferably used at a concentration of about 20 μM to 1.0 mM. Derivatives of the chain-elongating deoxynucleotides can include, but are not limited to, thionucleotides, 7-deaza-2'-dGTP, 7-deaza-2'-dATP as well as deoxyinosine triphosphate, which can also be used as a substitute deoxynucleotide for dATP, dGTP, dTTP or dCTP. These deoxynucleotides and derivatives are preferably used at a concentration of about 4 μM to 400 μM.
The concentrations and the ratios of ddNTP/dNTP may vary from case to case in order to get shorter or longer fragments. Though a ratio of dNTP/ddNTP between 1/50 and 5/1 works well for the chain elongation, it has to be optimized for each individual template in order to get fragment pools with randomly distributed 3' ends. This is very important to ensure that every nucleotide of the template is copied at a similar frequency into products, which makes it possible to recombine or dissect two or more mutations even when they are very close to each other.
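How the dNTP/ddNTP ratio shifts the fragment length can be estimated with a simple competition model. The sketch below assumes that, at each position, the terminator and the normal nucleotide compete in proportion to their concentrations, weighted by a relative incorporation efficiency. That efficiency (`relative_ddntp_efficiency`) is a hypothetical parameter, not a value from the text; many polymerases discriminate against ddNTPs, so it has to be determined for the enzyme actually used.

```python
def termination_probability(dntp_to_ddntp, relative_ddntp_efficiency=1.0):
    """Per-position probability that a terminator is incorporated.

    dntp_to_ddntp            : molar ratio dNTP/ddNTP (e.g. 5.0 for 5/1)
    relative_ddntp_efficiency: assumed incorporation efficiency of ddNTP
                               relative to dNTP (1.0 = no discrimination)
    """
    if dntp_to_ddntp <= 0 or relative_ddntp_efficiency <= 0:
        raise ValueError("ratio and efficiency must be positive")
    return relative_ddntp_efficiency / (relative_ddntp_efficiency + dntp_to_ddntp)


def mean_extension_length(dntp_to_ddntp, relative_ddntp_efficiency=1.0):
    """Mean number of nucleotides added before termination (geometric model,
    ignoring the end of the template)."""
    return 1.0 / termination_probability(dntp_to_ddntp, relative_ddntp_efficiency)


if __name__ == "__main__":
    for ratio in (1 / 50, 1.0, 5.0):
        print(ratio, mean_extension_length(ratio, relative_ddntp_efficiency=0.01))
```

Under this model, raising the dNTP/ddNTP ratio or using an enzyme that incorporates ddNTPs less efficiently lengthens the fragments, which matches the qualitative statement that the ratio is adjusted to obtain shorter or longer fragments.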
Preferred WalkThrough buffer systems normally include Tris-HCl at a concentration of about 50 to 500 mM, preferably of about 100 to 250 mM. The pH values of these buffer systems range from pH 6.5 to pH 10.0, depending on the polymerases used. MgCl2 is generally included in these buffer systems at a concentration ranging from 1.0 to 5.0 mM. KCl may also be included at a concentration of 2-80 mM. A certain amount of mercaptoethanol (0.5-1.5%), Tween 20 (0.2-0.4%) and DMSO (1-5%) may also be present in the buffers. Other agents, such as glycerol, betaine, etc., which can lower the melting point of the templates may also be included in the buffers to facilitate the WalkThrough reaction.
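The concentration windows stated for the WalkThrough buffer can be written down as data and used to check a planned recipe. This is a minimal sketch; the dictionary keys and the function name are hypothetical, and only the components with explicit numeric ranges in the text are covered.

```python
# Ranges as stated in the text for the WalkThrough synthesis buffer.
WALKTHROUGH_BUFFER_RANGES = {
    "tris_hcl_mM": (50.0, 500.0),
    "pH": (6.5, 10.0),
    "mgcl2_mM": (1.0, 5.0),
    "kcl_mM": (2.0, 80.0),
    "mercaptoethanol_pct": (0.5, 1.5),
    "tween20_pct": (0.2, 0.4),
    "dmso_pct": (1.0, 5.0),
}


def out_of_range(recipe, ranges=WALKTHROUGH_BUFFER_RANGES):
    """Return {component: value} for every listed component outside its range.

    Components missing from the recipe are treated as optional and skipped.
    """
    return {
        name: value
        for name, value in recipe.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    }


if __name__ == "__main__":
    recipe = {"tris_hcl_mM": 100.0, "pH": 8.3, "mgcl2_mM": 2.5, "kcl_mM": 50.0}
    print(out_of_range(recipe))
```

An empty result only means that the recipe lies inside the stated windows; the optimum still depends on the polymerase used.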
The Walk-Through DNA synthesis is based on the chain elongation guided by template, and the nascent strand is synthesized from the 3'-OH termini at the primers using polymerase and the four deoxynucleoside triphosphates and stop after the ddNTP incorporation. Thus the reaction is independent of the length of the DNA template.
(b) Removing the chain terminators incorporated
This step is the second key step for successful WalkThrough recombination. The end nucleotide or incorporated terminator of each newly generated fragment must be removed so that each such fragment gains a free 3'-OH group necessary for further fragment reassembly.
Methods of removing chain-terminating molecules are known to a person skilled in the art. Chain-terminating molecules can be removed, for example, by DNA polymerases, since many DNA polymerases have a 3'->5' exonuclease activity. This activity removes a single nucleotide at a time, releasing a nucleoside 5' monophosphate. In the absence of dNTPs, this activity will catalyze stepwise degradation from the 3' end of both single- and double-stranded DNA. The polymerases which may serve this purpose include the Klenow fragment of E. coli DNA polymerase I (Jacobsen, H., Klenow, H., and Overgaard-Hansen, K. 1974. The N-terminal amino-acid sequences of DNA polymerase I from Escherichia coli and of the large and the small fragments obtained by a limited proteolysis. Eur. J. Biochem. 45:623-627; Joyce, C.M. and Grindley, N.D.F. 1983. Construction of a plasmid that overproduces the large proteolytic fragment (Klenow fragment) of DNA polymerase I of Escherichia coli. Proc. Natl. Acad. Sci. USA 80:1830-1834), bacteriophage T4 DNA polymerase (Nossal, N.G. 1984. Prokaryotic DNA replication systems. Annu. Rev. Biochem. 53:581-615; Lin, T.C., Rush, J., Spicer, E. K., and Konigsberg, W.H. 1987. Cloning and expression of T4 DNA polymerase. Proc. Natl. Acad. Sci. USA 84:7000-7004), and bacteriophage T7 DNA polymerase (Tabor, S. and Richardson, C.C. 1987. DNA sequencing analysis with a modified bacteriophage T7 DNA polymerase. Proc. Natl. Acad. Sci. USA 84:4767-4771). The 3'-end terminators may be removed from the fragments with a DNA polymerase with 3'->5' proofreading activity in the presence of very limited or no dNTPs under mild conditions. Alternatively, chain-terminating molecules can also be removed by exonucleases, such as Exonuclease III. This enzyme is a multifunctional enzyme that catalyzes hydrolysis of several types of phosphodiester bonds in double-stranded DNA. The main application of Exo III is as a 3'->5' double-strand-specific exonuclease that catalyzes release of 5' mononucleotides from the 3' end of double-stranded DNA (Roger, S.G. and Weiss, B. 1980. Exonuclease III of Escherichia coli K-12, an AP exonuclease. Meth. Enzymol. 65:201-211).
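The effect of the 3'->5' exonuclease step on a terminated fragment can be sketched as a stepwise trim from the 3' end. The model below is deliberately simple and makes one assumption explicit: the enzyme removes the terminal ddNMP and, optionally, a chosen number of further nucleotides, leaving a free 3'-OH end. The `Fragment` type and `remove_terminator` function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    sequence: str      # written 5'->3'
    terminated: bool   # True if the 3' nucleotide is a ddNMP (no 3'-OH)


def remove_terminator(fragment, extra_bases_removed=0):
    """Trim the 3' terminator (plus optional further bases) from a fragment.

    Returns a fragment with a free 3'-OH end; fragments that carry no
    terminator are returned unchanged.
    """
    if extra_bases_removed < 0:
        raise ValueError("extra_bases_removed must not be negative")
    if not fragment.terminated:
        return fragment
    cut = 1 + extra_bases_removed   # the ddNMP itself plus stepwise degradation
    if cut >= len(fragment.sequence):
        raise ValueError("fragment would be degraded completely")
    return Fragment(fragment.sequence[:-cut], terminated=False)


if __name__ == "__main__":
    print(remove_terminator(Fragment("ATGGCA", terminated=True)))
```

Because the exonuclease acts one nucleotide at a time, limiting reaction time or temperature corresponds here to keeping `extra_bases_removed` small.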
Preferred reaction buffers used for terminator-removing reactions normally include Tris-HCl at a concentration of about 40 to 200 mM, preferably of about 50 to 100 mM. The pH values of these buffer systems range from pH 6.5 to pH 9.0, depending on the enzymes with 3'->5' exonuclease activity used. MgCl2 is generally included in these buffer systems at a concentration ranging from 1.0 to 3.0 mM. KCl may also be included at a concentration of 5-50 mM. A certain amount of mercaptoethanol (0.5-1.5%) may also be present in the buffers.
The reaction normally is carried out at 37°C for 0.5-2.0 hours.
Using thermostable DNA polymerases and thermostable enzymes with 3'->5' exonuclease activity in the proper buffer and under the optimized conditions, the steps of terminator removal and reassembly can also be combined, so that no separate (B) and (C) steps are necessary. This method of the present invention is particularly preferred, as demonstrated in the example of recombining human placental alkaline phosphatase and calf intestinal alkaline phosphatase.
(c) Reassembling the fragments
The reassembling of the fragments is a PCR-like reaction, comprising cycles of DNA denaturation, annealing and DNA synthesis. During the annealing, ssDNA fragments hybridize in homologous areas. The overlapping ends of the ssDNA are extended by a polymerase. In case ssDNA random fragments from different polynucleotides hybridize, gene shuffling occurs. Gene shuffling means recombination between homologous but non-identical sequences. The term "identical" means that two nucleic acid sequences have the same sequence or a complementary sequence. Thus, "areas of identity" means that regions or areas of a nucleic acid fragment or polynucleotide are identical or complementary to another polynucleotide or nucleic acid fragment. The term "homologous" means that one single-stranded (ss) nucleic acid sequence may hybridize to a complementary ss nucleic acid sequence. The degree of hybridization may depend on a number of factors including the amount of identity between sequences and the hybridization conditions such as temperature and salt concentration as discussed later.
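The annealing and extension of two single-stranded fragments can be sketched as an overlap search followed by a fill-in. The example below assumes perfect complementarity within the overlap and a hypothetical minimum overlap length; real hybridization tolerates mismatches depending on temperature and salt concentration, which is what permits recombination between non-identical parents. All names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_extend(forward, reverse, min_overlap=4):
    """Extend two fragments from opposite strands that anneal at their 3' ends.

    forward: top-strand fragment, 5'->3'
    reverse: bottom-strand fragment, 5'->3'
    Returns the top strand of the extended duplex, or None if the 3' ends
    do not share a perfectly complementary overlap of at least min_overlap.
    """
    target = reverse_complement(reverse)   # reverse fragment in top-strand terms
    for k in range(min(len(forward), len(target)), min_overlap - 1, -1):
        if forward[-k:] == target[:k]:
            return forward + target[k:]    # polymerase fills in the remainder
    return None


if __name__ == "__main__":
    gene = "ATGGCTAGCTTAGGCTAACGTTAGC"            # hypothetical sequence
    top_fragment = gene[:15]
    bottom_fragment = reverse_complement(gene[9:])
    print(overlap_extend(top_fragment, bottom_fragment) == gene)
```

If the two fragments derive from different but related parents, the extended product carries sequence from both, which corresponds to a crossover at the overlap.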
The reassembly step is the most critical one in the whole WalkThrough process, as all synthesized fragments during Step (A) prime one another for elongation without any additional primers added.
Since the concentration of the DNA is the most important variable, it is useful to set up three separate reactions with different concentrations (high, middle, low). In our experience, an amount of DNA fragments between 0.1 μg and 2.0 μg usually gives satisfactory reassembly results in a reaction volume of 20-50 μl.
The deoxynucleotides during reassembly are preferably used at a concentration of about 100 μM to 400 μM, and the chosen concentration depends on the cycle number of the reaction.
Preferred reassembly buffers normally include Tris-HCl at a concentration of about 5 to 50 mM, preferably of about 10 mM. The pH values of these buffer systems range from pH 7.5 to pH 10.0, depending on the polymerases used. MgCl2 is generally included in these buffer systems at a concentration ranging from 1.0 to 5.0 mM. KCl may also be included at a concentration of 20-80 mM. A certain amount of mercaptoethanol (0.5-1.5%), Tween 20 (0.2-0.4%) or Triton X-100 (0.1-0.5%) may also be present in the buffers.
A typical reassembly cycle consists of three steps. The first step is heat denaturation of the double-stranded target nucleic acid. The exact conditions required for denaturation of the sample nucleic acid depend on the length and composition of the sample nucleic acid. Typically, an incubation at 90°C-100°C for about 10 seconds up to 5 minutes is sufficient to denature the sample nucleic acid. The second step is annealing: the annealing temperature used in the reassembly reaction is about 40°C to 70°C, usually ranging from about 55°C to 65°C, and annealing lasts for a period of 15 seconds to 60 seconds. The third step is elongation, which is usually done under conditions sufficient to provide for polymerization of nucleotides to the fragment ends. To achieve polymerization conditions, the temperature of the reaction mixture will typically be maintained at a temperature ranging from about 65°C to 75°C, more preferably at 68°C to 72°C, for about 15 seconds to 2 minutes, preferably for 30 seconds to 1 minute, depending on the length of the finally reassembled DNA.
The number of thermocycles can be from about 15 to about 55, depending on the amount of template and the length of the finally reassembled DNA. The concrete cycle number can be easily determined according to the quality and quantity of the newly synthesized fragments.
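The temperature and time windows of the reassembly cycle can be captured as data, so that a planned program can be checked against them and its running time estimated. This is a sketch only: the names are hypothetical, the example program is an assumed one inside the stated windows, and ramp times between steps are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # "denature", "anneal" or "extend"
    temp_c: float
    seconds: int


# Windows as stated in the text: (min °C, max °C, min s, max s)
WINDOWS = {
    "denature": (90, 100, 10, 300),
    "anneal": (40, 70, 15, 60),
    "extend": (65, 75, 15, 120),
}


def check_cycle(steps):
    """Return the names of steps that fall outside the stated windows."""
    bad = []
    for step in steps:
        lo_t, hi_t, lo_s, hi_s = WINDOWS[step.name]
        if not (lo_t <= step.temp_c <= hi_t and lo_s <= step.seconds <= hi_s):
            bad.append(step.name)
    return bad


def total_minutes(steps, cycles):
    """Hold time of the whole program in minutes (ramping not included)."""
    return cycles * sum(step.seconds for step in steps) / 60


if __name__ == "__main__":
    cycle = [Step("denature", 94, 30), Step("anneal", 58, 30), Step("extend", 72, 60)]
    print(check_cycle(cycle), total_minutes(cycle, cycles=40))
```

With the assumed program, each cycle holds for two minutes, so 40 cycles take 80 minutes of hold time.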
Several thermostable polymerases with different synthesis fidelity can be used for the reassembly, depending on what kind of the error rate the final reassembly product should have. Each of these enzymes is usually used under its optimized conditions to promote the reassembly. The goal is that the resulting full-length genes will have diverse sequences, most of which, however, still resemble that of the original template DNA.
The sequences obtained after the reassembling step can be further amplified by a conventional PCR and cloned into a vector for expression. Suitable vectors are known to a person skilled in the art, and vectors cited in the following references are herein incorporated by reference: Kingsman SM, Kingsman AJ. Philos. Trans. R. Soc. Lond. B Biol. Sci. 1989, 324(1224):477-485; Bailey JE. Adv. Biochem. Eng. Biotechnol. 1993, 48:29-52. Suitable expression systems are known in the art as well, and expression systems cited in the following reference are herein incorporated by reference: Shatzman AR, Rosenberg M. Methods Enzymol. 1987; 152:661-73. Screening or selection of the expressed mutants should lead to variants with improved or even new specific functions. Suitable screening and selection systems are known in the art, and screening and selection systems cited in the following references are herein incorporated by reference: Kuchner O, Arnold FH. Trends Biotechnol. 1997 Dec; 15(12):523-30; Patel PH, Loeb LA. Proc. Natl. Acad. Sci. USA, 2000 May 9; 97(10):5095-100. These variants can be immediately used as partial solutions to a practical problem, or they can serve as new starting points for further cycles of directed evolution.
Compared to other techniques used for protein optimization, such as combinatorial cassette and oligonucleotide-directed mutagenesis (Reidhaar-Olson, J. F. and Sauer, R. T. 1988. Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences. Science 241: 53-57; Oliphant, A. R., Nussbaum, A. L., and Struhl, K. 1986. Cloning of random-sequence oligodeoxynucleotides. Gene 44: 177-183; Hermes, J. D., Blacklow, S. C., and Knowles, J. R. 1990. Searching sequence space by definably random mutagenesis - improving the catalytic potency of an enzyme. Proc. Natl. Acad. Sci. USA 87: 696-700), error-prone PCR (Leung, D. W., Chen, E., and Goeddel, D. V. 1989. A method for random mutagenesis of a defined DNA segment using a modified polymerase chain reaction. BioTechnique 1: 11-15; Chen, K. and Arnold, F. H. 1993. Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc. Natl. Acad. Sci. USA 90: 5618-5622), or DNA 'shuffling' (Stemmer, W. P. C. 1994a. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370: 389-391; Stemmer, W. P. C. 1994b. DNA shuffling by random fragmentation and reassembly - in vitro recombination for molecular evolution. Proc. Natl. Acad. Sci. USA 91: 10747-10751), this walk-through based technique shows several advantages for in vitro protein optimization:
(1) Since the template used for WalkThrough synthesis may be either single- or double-stranded polynucleotides, one limitation of DNA 'shuffling' (Stemmer, W. P. C. 1994a. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370: 389-391), which has to employ a double-stranded polynucleotide as a template, no longer exists. With the technique described here, any potential mutations and/or crossovers can be introduced at the DNA level by using different DNA-dependent DNA polymerases, or even directly from the mRNA level by using different RNA-dependent DNA polymerases. This provides more opportunities to achieve the goal of optimizing specific protein functions.
(2) Different from the DNA 'shuffling' procedure which requires fragmenting the double-stranded DNA template with DNAse I to get random fragments (Stemmer, W. P. C. 1994a. Nature, 370: 389-391; Stemmer, W. P. C. 1994b. Proc. Natl. Acad. Sci, USA, 91:10747-10751), the technique described here employs WalkThrough synthesis to obtain size-controllable DNA fragments with randomly distributed 3' end as "breeding blocks" for further reassembly (Fig. 1).
(3) Since the WalkThrough chains are a population of fragments that stop at every position, they are uniform in their positional preference and lack a sequence bias. This sequence heterogeneity allows both mutations and crossovers to occur more randomly than, for example, with error-prone PCR or DNA 'shuffling'.
(4) Normal error-prone PCR and DNA shuffling cannot efficiently recombine or dissect two or more mutations if they are very close to each other (Stemmer, W. P. C. 1994a. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370: 389-391). In contrast, the WalkThrough approach allows recombination to occur at every position of the templates and therefore makes it possible to recombine or dissect two or more mutations even when they are very close to each other.
(5) The Walk-Through DNA synthesis is based on the chain elongation guided by template, and the nascent strands are synthesized from the 3'-OH termini at the primers using polymerase and the four deoxynucleoside triphosphates and stop after the ddNTP incorporation. Thus the reaction is independent of the length of the DNA template. This is particularly useful for engineering small peptides or large enzymes or even enzyme pathways.
(6) Since DNase I is an endonuclease that hydrolyzes double-stranded DNA preferentially at sites adjacent to pyrimidine nucleotides, its use in DNA shuffling (Stemmer, W. P. C. 1994a.. Nature, 370: 389-391; Stemmer, W. P. C. 1994b. Proc. Natl. Acad. Sci, USA, 91:10747-10751) may result in bias (particularly for genes with high G+C or high A+T content) at the step of template gene digestion. Effects of this potential bias on the overall mutation rate and recombination frequency have not yet been investigated, but they may be avoided by using the Walk- Through approach.
(7) Since dozens of polymerases are currently available, the nascent DNA fragments with randomly distributed 3' ends can be synthesized in a variety of ways. For example, bacteriophage T4 DNA polymerase (Nossal, N.G. 1974. J. Biol. Chem. 249: 5668-5676) or T7 Sequenase version 2.0 DNA polymerase (Tabor, S. and Richardson, C. C. 1987. Proc. Natl. Acad. Sci. USA, 84:4767-4771; Tabor, S. and Richardson, C. C. 1989. J. Biol. Chem. 264:6447-6458), and/or thermostable DNA polymerases can be used for the WalkThrough synthesis.
For a single-stranded polynucleotide template (particularly an RNA template), reverse transcriptase is preferred for the WalkThrough synthesis. Since this enzyme lacks 3'-5' exonuclease activity, it is prone to error. In the presence of high concentrations of dNTPs and Mn2+, about 1 base in every 500 is misincorporated (Sambrook, J., Fritsch, E. F. and Maniatis, T. 1989. Molecular cloning: A Laboratory Manual. 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY).
(8) One of the key steps in our technique is to control the 3' end of the nascent, single-stranded DNA synthesized during the WalkThrough process. Under certain conditions, this step may be used for efficient terminal and/or internal insertion/deletion, resulting in molecules of different sizes. Efficiently changing the size of target molecules is normally unachievable through error-prone PCR or DNA shuffling.
(9) By modifying the reaction conditions, PCR with a thermostable polymerase can be adapted to the WalkThrough synthesis of short, nascent DNA fragments. Adapting PCR to the WalkThrough synthesis provides a more convenient way to obtain larger amounts of nascent DNA fragments and makes our technique more robust.
The method of the present invention may, however, be combined with other methods known in the art if this appears advantageous.
WalkThrough DNA synthesis was used to generate DNA fragments with randomly distributed 3' ends from denatured, linear, double-stranded DNA (e.g., restriction fragments purified by gel electrophoresis) of the Bacillus subtilis BMTU3420 sarcosine oxidase gene. The purified DNA, mixed with a molar excess of primers, was denatured, and synthesis was then carried out using the Stoffel fragment. This enzyme lacks 5'→3' exonuclease activity, so that the WalkThrough product was synthesized exclusively by extension and was not degraded by exonuclease (see Example 1).
The WalkThrough recombination technique is further demonstrated with another example, the recombination of human placental alkaline phosphatase and calf intestinal alkaline phosphatase (see Example 2). In Example 2, the step of removing chain terminators and the step of fragment reassembly can be carried out at the same time.
The technique described here may be used to explore the vast space of potentially useful catalysts for their optimal performance in a wide range of applications, as well as to develop or evolve new enzymes for basic structure-function studies.
Most of the experimental conditions described here may potentially be used with little modification for optimizing native enzymes, part (or all) of the enzyme pathways, as well as other biomolecules. For example, in terms of reducing proteolytic activity, probing activator genes, enhancing secretion mechanisms, or solving other fundamental enzyme engineering problems, the technique described here may find immediate applications.
While this protocol describes using DNA-dependent DNA polymerase and single-stranded DNA as template, alternative protocols using single-stranded RNA as a template are also feasible. By using specific enzyme mRNA as the template and RNA-dependent DNA polymerase (reverse transcriptase) as the catalyst, our approach may be modified to introduce mutations and crossovers into cDNA clones and to create molecular diversity directly at the mRNA level to achieve the goal of optimizing enzyme functions. Reverse transcriptases are derived from retroviruses, such as avian myeloblastosis virus (AMV) or Moloney murine leukemia virus (MMLV), which use them to make DNA copies of their RNA genomes. AMV and MMLV reverse transcriptases (Verma IM. The reverse transcriptase. Biochim. Biophys. Acta. 1977 Mar. 21; 473(1):1-38. Roth MJ, Tanese N, Goff SP. Gene product of Moloney murine leukemia virus required for proviral integration is a DNA-binding protein. J. Mol. Biol. 1988; 203(1):131-9) are mainly used as RNA-directed DNA polymerases. Specifically, deoxyoligonucleotides are used as primers for extension on RNA (usually messenger RNA) templates to generate complementary DNA.
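The misincorporation rate of about 1 base in 500 quoted for reverse transcriptase can be turned into a rough expectation for a template of a given length. The following is a minimal sketch, assuming that errors occur independently and with the same probability at every position; that independence assumption and the function names are illustrative and not part of the protocol. The 1530-nucleotide length is the approximate size of the alkaline phosphatase genes used in Example 2.

```python
def expected_misincorporations(length_nt: int, error_rate: float = 1 / 500) -> float:
    """Expected number of misincorporated bases in a copy of length_nt nucleotides."""
    return length_nt * error_rate


def prob_at_least_one_error(length_nt: int, error_rate: float = 1 / 500) -> float:
    """Probability that a copy carries one or more errors.

    Assumes every position is an independent trial (illustrative assumption).
    """
    return 1.0 - (1.0 - error_rate) ** length_nt


if __name__ == "__main__":
    for length in (100, 500, 1530):
        print(
            f"{length:5d} nt: "
            f"expected errors = {expected_misincorporations(length):.2f}, "
            f"P(>=1 error) = {prob_at_least_one_error(length):.2f}"
        )
```

Under these assumptions a full-length copy of a gene of about 1530 nucleotides would carry roughly three misincorporations on average, which is why the error-prone enzyme can serve as a source of point mutations.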
Figure 1:
Walking through the single-stranded polynucleotide template with dNTPs/ddNTPs to generate fragments with randomly distributed 3' ends, removing ddNMP from the 3' end, reassembly of the full-length DNA by thermocycling in the presence of DNA polymerase and nucleotides, and amplification of reassembled products by conventional PCR for further cloning and screening.
Figure 2:
Primer arrangement along human placental alkaline phosphatase (hpap) and calf intestinal alkaline phosphatase (ciap) genes
Figure 3:
The N-terminal sequence alignment of parental ciap and hpap genes and their recombination variants (AP01, AP03, AP05, AP06, AP11 and AP15)
Figure 4:
The C-terminal sequence alignment of parental ciap and hpap genes and their recombination variants (AP2, AP9 and AP13)
Figure 5:
Sequences of hpap, ciap, AP01, AP03, AP05, AP06, AP11, AP15, AP2, AP9 and AP13.
Example 1
1. The DNA of interest was digested with appropriate restriction endonuclease(s), and the DNA fragment of interest was purified by gel electrophoresis using the Roche High Pure PCR Prep Kit (Roche Diagnostics GmbH, Germany). As an example, the Bacillus subtilis BMTU3420 sarcosine oxidase gene was cleaved as a 1.2 kb-long PstI-NsiI fragment from the recombinant plasmid pBMTU5823.
2. About 0.5 pmol of the double-stranded DNA dissolved in H2O was mixed with 0.1 pmol each of the SODF1, SODF2, SODR1 and SODR2 primers. After immersion in boiling water for 3 minutes, the mixture was placed immediately in an ice/ethanol bath.
3. Ten μl of 10 x reaction buffer (10 x buffer: 900 mM HEPES, pH 6.6; 0.1 M magnesium chloride; 10 mM dithiothreitol; 5 mM each dATP, dCTP, dGTP and dTTP; and 2 mM each ddATP, ddCTP, ddGTP and ddTTP) was added to the denatured sample, and the total volume of the reaction mixture was brought up to 98 μl with H2O.
4. Ten units (about 2 μl) of the Stoffel fragment were added. All the components were mixed by gently tapping the outside of the tube and centrifuged at 12,000 g for 1-2 seconds in a microfuge to move all the liquid to the bottom. The reaction was carried out under standard cycle sequencing conditions using the Taq DNA Sequencing Kit for Standard and Cycle Sequencing (Roche Diagnostics GmbH, Germany).
5. The reaction products were purified with the Wizard DNA Clean-Up System (Promega, WI, USA) to remove the enzymes, primers, reaction buffer components and dNTPs/ddNTPs. The purified products were incubated in 1 x Klenow buffer with 10 U of Klenow at 37°C for 40 minutes. Two μl of DpnI (10 U/μl) was added to the mixture, and the incubation was continued at 37°C for a further 40 minutes.
6. The digested WalkThrough products were purified with the High Pure PCR Purification Kit (Roche Diagnostics GmbH, Germany) and were used for whole gene reassembly.
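The working concentrations in the WalkThrough synthesis reaction follow from the 10 x buffer composition in step 3 and the final volume of 100 μl (98 μl after addition of H2O plus about 2 μl of enzyme in step 4). The short sketch below performs this dilution arithmetic; the dictionary layout and function name are illustrative only.

```python
# Composition of the 10 x reaction buffer from step 3, in mM.
TEN_X_BUFFER_MM = {
    "HEPES": 900.0,
    "MgCl2": 100.0,       # 0.1 M magnesium chloride
    "DTT": 10.0,
    "each dNTP": 5.0,
    "each ddNTP": 2.0,
}


def final_concentrations(stock_mm, stock_volume_ul, final_volume_ul):
    """Dilute every stock component by stock_volume / final_volume."""
    factor = stock_volume_ul / final_volume_ul
    return {name: conc * factor for name, conc in stock_mm.items()}


# 10 ul of 10 x buffer in a final reaction volume of 100 ul.
final = final_concentrations(TEN_X_BUFFER_MM, 10.0, 100.0)
dntp_to_ddntp_ratio = final["each dNTP"] / final["each ddNTP"]

if __name__ == "__main__":
    for name, conc in final.items():
        print(f"{name}: {conc:g} mM")
    print(f"dNTP:ddNTP ratio = {dntp_to_ddntp_ratio:g}")
```

The dNTP:ddNTP ratio is unaffected by the dilution; changing it in the 10 x buffer is one way the reaction conditions could be varied to influence fragment length.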
Reassembly of the whole gene
1. For SarcOD gene reassembly by PCR, 5 μl of the purified WalkThrough DNA fragments, 20 μl of 2 x PCR pre-mix (5-fold diluted cloned Pfu buffer, 0.5 mM each dNTP, 0.1 U/μl cloned Pfu polymerase (Stratagene, La Jolla, CA)) and 15 μl of H2O were mixed on ice.
2. After incubation at 96°C for 5 min, 40 thermocycles were performed, each with 1.0 min at 95°C, 1.0 min at 55°C and 0 + 5 sec/cycle at 72°C, with the extension step of the last cycle proceeding at 72°C for 10 min, in an Eppendorf Mastercycler Gradient (Eppendorf, Germany).
3. 3 μl aliquots at cycles 20, 30 and 40 were removed from the reaction mixture and analyzed by agarose gel electrophoresis. The reassembled PCR product at 40 cycles contained the correct size product in a smear of larger and smaller sizes.
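The notation "0 + 5 sec/cycle" in step 2 describes an extension step whose duration grows by 5 seconds with each cycle. The sketch below tabulates one possible reading of that schedule. It assumes that the increment is already applied in the first cycle (so cycle 1 extends for 5 s); the protocol does not state this explicitly, and a thermocycler may instead start at 0 s. The function name and parameters are illustrative.

```python
def extension_times(n_cycles=40, base_s=0, increment_s=5, final_extension_s=600):
    """Extension time in seconds for each cycle of the reassembly PCR.

    Assumption: cycle k (1-based) extends for base_s + k * increment_s seconds,
    except the last cycle, which uses the fixed final extension (10 min).
    """
    times = [base_s + increment_s * cycle for cycle in range(1, n_cycles + 1)]
    times[-1] = final_extension_s
    return times


schedule = extension_times()

if __name__ == "__main__":
    # Cycles at which aliquots were taken for gel analysis in step 3.
    for cycle in (20, 30, 40):
        print(f"cycle {cycle}: extension {schedule[cycle - 1]} s")
```

A progressively longer extension step is consistent with the fragments becoming longer as reassembly proceeds, although the text itself does not give the rationale.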
Amplification
The correctly reassembled product of this first PCR was further amplified in a second PCR reaction which contained the PCR primers complementary to the ends of the template DNA.
1. 2.0 μl of the PCR reassembly aliquots were used as template in 100-μl standard PCR reactions, which contained 0.2 μM each of the primers pstF (5' GGTAGAGCGAG-TCTCGAGGGGGAGATG C 3', SEQ ID NO: 1) and NsiR (5' AGCCGGCGTGACGTGGGTCAGC 3', SEQ ID NO: 2), 1.5 mM MgCl2, 10 mM Tris-HCl [pH 9.0], 50 mM KCl, 200 μM each of the four dNTPs, 2.5 U of Taq polymerase (Roche, Germany) and 2.5 U of Pfu polymerase (Stratagene, La Jolla, CA).
2. After incubation at 96°C for 5 min, 30 thermocycles were performed, each with 1.0 min at 95°C and 1.0 min at 72°C, with the extension step of the last cycle proceeding at 72°C for 10 min, in an Eppendorf Mastercycler Gradient (Eppendorf GmbH, Germany).
3. The amplification resulted in a large amount of PCR product with the correct size of the sarcosine oxidase whole gene.
Cloning
1. The PCR product of Bs SarcOD gene was digested with Pst I and Nsi I restriction enzymes, and cloned into pBMTU5823 (Pst/Nsi).
2. E. coli XL1 F' cells were transformed with the above ligation mixture to form a mutant library.
Results:
After only one round of applying this technique to the wild-type SarcOD gene, two Hind III- and Pvu II-resistant mutants were found to be enzymatically active.
Example 2
Recombination of human placental alkaline phosphatase and calf intestinal alkaline phosphatase genes
1. Description of the method
'Walk-through recombination' is a method used to recombine two or more DNA sequences based on their homology. Furthermore, point mutations can also be introduced during the recombination steps. The method consists mainly of five steps:
Fragment synthesis
Template removal
Terminator removal
Reassembly
Amplification
1.1 Fragment synthesis
Fragment synthesis results from a DNA synthesis reaction in which extension is terminated by the incorporation of dideoxynucleotides. Target DNA sequences selected for recombination serve as templates in this reaction. The number of primers should be chosen depending on the length of the target DNA sequences and the length of the resulting fragments. The fragment length may also be controlled by the reaction conditions. Furthermore, a 'Cycle Sequencing reaction' is favorable because of its higher product yield and ease of use.
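How terminator incorporation shapes the fragment population can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that every extension step carries the same fixed probability of incorporating a terminator (p_terminate) and that primers anneal at the hypothetical positions given in primer_starts; neither value comes from the protocol, and real polymerases discriminate between dNTPs and ddNTPs to varying degrees, so p_terminate is not simply the ddNTP fraction of the nucleotide pool.

```python
import random


def walkthrough_fragment(template_len, start, p_terminate, rng):
    """Extend from `start` until a terminator is incorporated or the template ends.

    Returns the position of the 3' end (exclusive index on the template).
    """
    pos = start
    while pos < template_len:
        pos += 1                      # one nucleotide added
        if rng.random() < p_terminate:
            break                     # a ddNMP was incorporated; chain stops
    return pos


def simulate(template_len=1530, primer_starts=(0, 500, 1000),
             p_terminate=0.01, n_fragments=10_000, seed=0):
    """Simulate (start, end) pairs; all parameter values are illustrative."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_fragments):
        start = rng.choice(primer_starts)
        end = walkthrough_fragment(template_len, start, p_terminate, rng)
        fragments.append((start, end))
    return fragments


fragments = simulate()
lengths = [end - start for start, end in fragments]
mean_length = sum(lengths) / len(lengths)

if __name__ == "__main__":
    print(f"mean fragment length: {mean_length:.1f} nt")
    print(f"shortest / longest: {min(lengths)} / {max(lengths)} nt")
```

In this model the mean extension length is close to 1 / p_terminate, so lowering the effective termination probability (for example through the dNTP:ddNTP ratio) yields longer fragments, and every template position downstream of a primer can become a 3' end.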
1.2 Template removal
To avoid interference with the reassembly step, the template DNA has to be completely removed after the fragment synthesis. Several methods can be used for this purpose; we have chosen to use U-DNA and Uracil-DNA Glycosylase. Uracil-DNA Glycosylase removes the uracil base at any site where a deoxyuridylate has been incorporated (U-DNA). The resulting abasic site can subsequently be hydrolyzed by alkali treatment, high temperature or specific endonucleases. In our approach, a simple temperature treatment is sufficient. U-DNA can be prepared by a PCR using dUTP instead of dTTP. After the U-DNA has served as the template for the fragment synthesis, the whole reaction is subjected to the Uracil-DNA Glycosylase treatment and the temperature treatment. A further control PCR can be used to confirm complete removal of the U-DNA.
1.3 Terminator removal
The fragments need a 3'-OH end for the reassembly reaction. Since our fragments are terminated with dideoxynucleotides, they do not carry 3'-OH ends. These terminators can be removed by the 3'-5' exonuclease activity of several nucleases or DNA polymerases, resulting in a 3'-OH end. We combine terminator removal with reassembly by employing a thermostable exonuclease III and Taq polymerase for this step. Exonuclease III removes the terminator by its 3'-5' exonuclease activity (the enzyme is active only on double-stranded DNA, and therefore only after DNA annealing), and Taq polymerase extends the fragment by its 5'-3' polymerase activity.
1.4 Reassembly
Reassembly is a PCR-like reaction without primers, in which the fragments anneal to each other based on their homology and are extended. Through repeated cycles of denaturation, annealing and extension, the fragments grow, or reassemble, up to the length of the original DNA sequences.
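The outcome of reassembly, a chimeric sequence in which the template switches at one or more crossover points, can be pictured with a toy model. The sketch below is not a simulation of annealing or extension: it simply assumes two pre-aligned parents of equal length and a given list of crossover positions, and builds the chimera by alternating between the parents. The function name and the example sequences are hypothetical.

```python
def chimera(parent_a: str, parent_b: str, crossover_positions) -> str:
    """Build a chimera that starts on parent_a and switches parent at each position.

    Toy model: parents must be aligned and of equal length; gaps are ignored.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    parents = (parent_a, parent_b)
    current = 0          # index of the parent currently being copied
    previous = 0         # start of the segment being copied
    segments = []
    for position in sorted(crossover_positions):
        segments.append(parents[current][previous:position])
        current = 1 - current        # template switch = crossover
        previous = position
    segments.append(parents[current][previous:])
    return "".join(segments)


if __name__ == "__main__":
    a = "AAAAAAAAAAAA"   # stand-in for one parental gene
    b = "CCCCCCCCCCCC"   # stand-in for the other parental gene
    print(chimera(a, b, [4, 8]))   # two crossovers
```

In the real reaction a crossover arises whenever a fragment derived from one parent anneals to, and is extended along, a fragment derived from the other; since fragment 3' ends are distributed over all positions, crossover points are in principle possible wherever the local homology permits annealing.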
1.5 Amplification
Amplification of the reassembled DNA takes place in a PCR with sequence-flanking primers in order to provide enough material for the subsequent cloning and analysis steps.
2. Experimental Protocol
2.1 Preparation of U-DNA
dUTP-containing DNA template is generated by PCR using 2.5 mM MgCl2 according to the manual ('Uracil-DNA Glycosylase', Roche # 1 444646), with 600 μM dUTP and the three other nucleotides dATP, dCTP and dGTP at a concentration of 200 μM each. A typical reaction mixture and cycling conditions used for recombining the alkaline phosphatase genes are shown below:
Template DNA (plasmid) 40 ng
Primer APhpaF 20 pmol
Primer APxbaR 20 pmol
10 x Taq buffer 10 μl
Taq DNA polymerase 0.5 U
MgCl2 2.5 mM
dATP, dCTP, dGTP (Roche # 1 969 064) 0.2 mM each
dUTP (Roche # 1 420 470) 0.6 mM
H2O ad 100 μl
Cycling conditions:
95 °C 5 min
30 x (95 °C 1 min; 60 °C 1 min; 72 °C 2 min)
72 °C 10 min
4 °C ∞
The synthesized U-DNA is separated from the template DNA by preparative agarose gel electrophoresis (1% agarose/TAE) and gel extraction ('QIAquick Gel Extraction Kit', Qiagen # 28706).
2.2 Fragment synthesis
Fragments are synthesized using cycle sequencing reactions ('DIG Taq DNA Sequencing Kit for Standard and Cycle Sequencing', Roche # 1449443) with some modifications. Twelve reaction mixtures are set up for a total of six primers and two U-DNA templates. A typical reaction mixture and the cycling conditions used for recombining the two alkaline phosphatase genes are shown below:
U-DNA 175 ng
Primer (1 μM) 1 pmol
Buffer 3 μl
Taq DNA polymerase (5 U/μl) 0.6 μl
Termination mixture ddATP/dGTP 2.5 μl
Termination mixture ddCTP/dGTP 2.5 μl
Termination mixture ddGTP/dGTP 2.5 μl
Termination mixture ddTTP/dGTP 2.5 μl
H2O ad 30 μl
Cycling conditions:
95 °C 5 min
30 x (95 °C 30 sec; 60 °C 30 sec; 72 °C 1 min)
4 °C ∞
The calf intestinal alkaline phosphatase gene (ciap) and the human placental alkaline phosphatase gene (hpap) show a sequence similarity of 81 % and have lengths of about 1530 bp. Six primers are used for fragment synthesis on each gene. The two external primers (APhpaF and APxbaR) are the same for both genes. Figure 2 shows the primer arrangement along the human placental alkaline phosphatase (hpap) and calf intestinal alkaline phosphatase (ciap) genes.
Primer sequences
Flanking primers
APhpaF CTT CGG CGT TCA GTA ACA CGC (SEQ ID NO.: 3)
APxbaR GCT TTC GAG GTG AAT TTC GAC C (SEQ ID NO.: 4)
Primers for ciap gene
CIAPint1F GGT CAC GTC TGT GAT CAA CCG (SEQ ID NO.: 5)
CIAPint1R CGG TTG ATC ACA GAC GTG ACC (SEQ ID NO.: 6)
CIAPint2F CGC AAA GCT TAT ATG GCA CTG AC (SEQ ID NO.: 7)
CIAPint2R GTC AGT GCC ATA TAA GCT TTG CC (SEQ ID NO.: 8)
Primers for hpap gene
HPAPint1F CCA AGA AAG CAG GGA AGT CAG TG (SEQ ID NO.: 9)
HPAPint1R CAC TGA CTT CCC TGC TTT CTT GG (SEQ ID NO.: 10)
HPAPint2F CAT GTT CGA CGA CGC CAT TGA G (SEQ ID NO.: 11)
HPAPint2R CTC AAT GGC GTC GTC GAA CAT G (SEQ ID NO.: 12)
2.3 Uracil-DNA Glycosylase treatment and temperature treatment
This step is accomplished by adding 1 μl of Uracil-DNA Glycosylase (1 U/μl; Roche # 1775375) to each fragment synthesis mixture and incubating at 37 °C for 4 h to cleave the uracil bases, followed by incubation at 95 °C for 2 min to hydrolyse the abasic sites and to inactivate the enzyme. A further PCR may be used to check for complete removal of the template U-DNA.
2.4 Reassembly
It is important to use an appropriate amount of purified DNA in the critical reassembly step. DNA fragments are purified from the cleavage products using Microcon (Microcon 50; Millipore # 42415) and are retained in H2O. A typical mixture and the cycling conditions used for recombining the two alkaline phosphatase genes are shown below:
Purified fragments x μl (appropriate amount)
dNTPs (Roche # 1 969 064) 0.4 mM each
Expand High Fidelity 10 x buffer (Roche # 1 759 175) 5 μl
Taq/Exo III mix 1 U
H2O ad 50 μl
After 30 cycles, 1 μl of Taq/Exo III mix is added to the reaction mixture for further reassembly.
Cycling conditions:
95 °C 5 min
50 x (95 °C 1 min; 45 °C 1 min; 72 °C 30 sec)
72 °C 10 min
4 °C ∞
2.5 Amplification PCR (also used as control PCR)
The amplification PCR is a standard PCR with the two sequence-flanking primers (APhpaF and APxbaR). The following is a typical mixture, with the cycling conditions used for recombining the two alkaline phosphatase genes:
Reassembly mixture 0.5 μl
Primer APhpaF 10 pmol
Primer APxbaR 10 pmol
10 x Taq buffer (+ Mg; Roche # 1 271 318) 5 μl
Taq DNA polymerase (Roche # 1 418 432) 2.5 U
dATP, dCTP, dGTP, dTTP (Roche # 1 969064) 0.2 mM each
H2O ad 50 μl
Cycling conditions:
95 °C 5 min
30 x (95 °C 1 min; 60 °C 1 min; 72 °C 2 min)
72 °C 10 min
4 °C ∞
2.6 Cloning and expression
The amplification product is purified by preparative gel electrophoresis (1% agarose/TAE) and gel extraction ('QIAquick Gel Extraction Kit', Qiagen # 28706), cloned into the vector pCR®-XL-TOPO® according to the supplier's manual (Invitrogen # K 475020) and expressed in the E. coli TOP10 cells delivered with the vector.
2.7 Sequencing results
Nine recombination clones have been analyzed by sequencing using one sequencing primer. The results are summarized in Table 1. Eight of these clones show one or more recombination events within the sequenced region (445 - 645 bp). One variant contains 4 crossovers within this region. The overall mutation rate is about 0.25 %. Since the reassembly products can be ligated into the vector in either orientation, sequencing these clones with only one primer yields sequences coding for either the N-terminal or the C-terminal region.
Clone    Direction of sequencing    Length of sequenced region    Crossovers    Mutations
AP 1     N-terminus                 509 bp                        1             1
AP 3     N-terminus                 445 bp                        1             0
AP 5     N-terminus                 624 bp                        4             3
AP 6     N-terminus                 553 bp                        1             1
AP 11    N-terminus                 599 bp                        2             1
AP 15    N-terminus                 528 bp                        1             2
AP 2     C-terminus                 645 bp                        1             3
AP 9     C-terminus                 639 bp                        0             1
AP 13    C-terminus                 638 bp                        2             1
Table 1: Sequencing results after recombination of two alkaline phosphatase genes
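The summary figures quoted in section 2.7 can be reproduced directly from Table 1. The short sketch below recomputes the overall mutation rate and the number of clones with at least one crossover; the tuple layout and variable names are illustrative.

```python
# (clone, sequenced end, length in bp, crossovers, mutations) from Table 1.
TABLE_1 = [
    ("AP 1", "N", 509, 1, 1),
    ("AP 3", "N", 445, 1, 0),
    ("AP 5", "N", 624, 4, 3),
    ("AP 6", "N", 553, 1, 1),
    ("AP 11", "N", 599, 2, 1),
    ("AP 15", "N", 528, 1, 2),
    ("AP 2", "C", 645, 1, 3),
    ("AP 9", "C", 639, 0, 1),
    ("AP 13", "C", 638, 2, 1),
]

total_bp = sum(row[2] for row in TABLE_1)
total_mutations = sum(row[4] for row in TABLE_1)
mutation_rate_percent = 100.0 * total_mutations / total_bp
recombinant_clones = sum(1 for row in TABLE_1 if row[3] > 0)
max_crossovers = max(row[3] for row in TABLE_1)

if __name__ == "__main__":
    print(f"sequenced: {total_bp} bp, mutations: {total_mutations}")
    print(f"overall mutation rate: {mutation_rate_percent:.2f} %")
    print(f"clones with >= 1 crossover: {recombinant_clones} of {len(TABLE_1)}")
    print(f"maximum crossovers in one clone: {max_crossovers}")
```

The totals (13 mutations in 5180 sequenced base pairs) agree with the stated rate of about 0.25 % and with eight of nine clones being recombinant.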
2.8 Sequences
Figure 3 shows the N-terminal sequence alignment of parental ciap and hpap genes and their recombination variants (AP01, AP03, AP05, AP06, AP11 and AP15).
Figure 4 shows the C-terminal sequence alignment of parental ciap and hpap genes and their recombination variants (AP2, AP9 and AP13).
Figure 5 shows the sequences of hpap, ciap, AP01, AP03, AP05, AP06, AP11, AP15, AP2, AP9 and AP13.