Materials and Methods for Nucleic Acid Synthesis
Field of the Invention
The present invention relates to materials and methods for nucleic acid synthesis. More particularly, the present invention provides methods and kits for synthesizing a DNA sequence which employ a series of overlapping template oligonucleotides having sequences corresponding to one strand of the DNA sequence, the template oligonucleotides being incapable of extension in the synthesis reaction.
Background of the Invention
Gene synthesis finds applications in protein expression and protein engineering, such as: optimisation of species-specific codon usage for protein expression; control of mRNA secondary structure; in vitro mutagenesis; and generation of libraries of variants.
In various organisms some synonymous codons are used more frequently than others. This non-random use of synonymous codons has been well documented and shown to be correlated with the relative levels of particular tRNAs (Ikemura et al., 1981; Maruyama et al., 1986) and with levels of protein expression (Chen and Inouye, 1994). In order to optimise recombinant protein expression, the coding sequence may be altered by gene synthesis to remove rare codons and replace them with codons preferred by the host. This approach has been used successfully for heterologous expression of many genes. Strong mRNA secondary structure at the junction region encompassing the Shine-Dalgarno sequence, the translation initiation codon and the 5' end of the coding sequence can dramatically reduce gene expression by interfering with ribosome binding (Baneyx, F., 1999). Gene synthesis could be used to reduce this secondary structure by incorporating synonymous codons with reduced GC content.
Gene synthesis also allows the introduction of convenient restriction sites, facilitating subsequent cloning (e.g. Withers-Martinez et al., 1999), and the introduction of site-specific mutations. In addition, these methods might be used for other purposes, such as the generation of libraries of mutants and/or truncated forms of a coding sequence, and the production of chimeric fusion proteins (e.g. Majumder et al., 1992).
A number of different approaches have been used to synthesise DNA duplexes in vitro without the use of DNA from a biological source as template. Historically, gene synthesis has been achieved by chemical synthesis of multiple oligonucleotides corresponding to both DNA strands and spanning the entire coding sequence, which were subsequently annealed and ligated to produce the full-length coding sequence (Edge et al., 1981; Ferretti et al., 1986; Bell et al., 1988; Jayaraman et al., 1989). These methods were all rather slow and laborious, involving multiple iterative steps of assembly and cloning of fragments. Alternative procedures, termed double-stranded gap repair methods, involved synthesis of oligos corresponding to partial DNA sequences of both strands, followed by "filling in" of the gaps by DNA polymerase prior to ligation (Rink et al., 1984). The above methods produced low yields of full-length DNA duplex, requiring further amplification by cloning or PCR before further manipulation. Synthetic genes have also been constructed by "splicing by overlap extension" (Higuchi et al., 1988). In this method, PCR products were purified from their amplification primers, annealed with each other and extended, and the process was then repeated until a full-length coding sequence was generated. Several methods use long oligos corresponding to a single DNA strand of the synthetic gene, which are linked by annealing to short complementary bridging oligos before ligation. Two such strands with a single complementary overlap are then annealed and "filled in" by DNA polymerase to produce the double-stranded full-length coding sequence (Commess et al., 1994; Adams et al., 1988; Jayaraman et al., 1991; Kalman et al., 1990).
More recent double-stranded gap repair procedures use PCR to "fill in" gaps between overlapping oligos corresponding to both DNA strands, producing full-length duplex DNA that is then further amplified by PCR (Barnet and Erfle, 1990; Jayaraman et al., 1992; Sandhu et al., 1992; Prodromou and Pearl, 1992; Graham et al., 1993; Stemmer et al., 1995; Casimiro et al., 1997; Brocca et al., 1998) (Figure 1). A similar method, termed "step-wise extension synthesis" (SES), allows gene synthesis from one end of the gene, generating longer duplexes at each step until a full-length gene is synthesised (Majumder et al., 1992).
There are several undesirable drawbacks to these double-stranded gap repair methods. Firstly, by-product strands that are incapable of extension to full length accumulate in each cycle of synthesis (see Figure 1). These compete with primers and other intermediate-length strands, which can be extended to full length, for annealing to template, and thus reduce the annealing efficiency and yield at each step of synthesis, resulting in low yields of full-length product. Secondly, after the first two cycles of synthesis the overlaps between different pairs of annealed intermediate-length by-product and/or productive strands become very variable in length. This may promote undesirable, non-specific annealing between or within strands (i.e. hairpins), because a longer stretch of sequence is more likely to find some fortuitous sequence complementarity, either to itself or to a different strand. Self-annealing may give rise to strong secondary structure that is known to reduce the efficiency of annealing to template strands, and non-specific annealing to other strands may give rise to multiple erroneous by-products. The combination of these drawbacks means that time-consuming optimisation of reaction conditions is often required to obtain full-length product, and yields of full-length product are low, requiring amplification by PCR. Products are sometimes found to contain nucleotide mis-incorporations (e.g. Stemmer et al., 1995) that must then be removed by site-directed mutagenesis.
There is therefore a continuing need in the art for further methods of gene synthesis, and especially for methods that can be adapted for high-throughput gene synthesis.
Summary of the Invention
The present invention concerns new methods for nucleic acid synthesis. In particular, the present invention provides methods and kits for synthesizing a DNA sequence which employ forward and reverse primers and a series of overlapping template oligonucleotides having sequences corresponding to one strand of the DNA sequence, these template oligonucleotides being incapable of extension in the synthesis reaction. The novel method for gene synthesis described herein helps to avoid some of the limitations inherent in current methodologies. The method uses two oligonucleotide primers: a first primer that primes synthesis of a single full-length DNA strand, using as template a series of template oligonucleotides corresponding solely to the complementary strand; and a second primer that primes synthesis of the complementary strand, using the newly synthesised strand as template. In one embodiment, the template oligonucleotides have modified 3' termini to prevent DNA polymerase-mediated strand extension, and nucleotide backbone modifications between one or more of the 3' nucleotides to inhibit or prevent removal of the 3'-modified nucleotides by the 3'-5' exonuclease activity of proof-reading polymerases, i.e. to confer some resistance to the exonuclease. Thus, in the method, gene synthesis proceeds by step-wise extension of a single strand until a full-length single-stranded product has been generated, and the complementary strand is then synthesised in a single step. This process is referred to herein as "ladder and snake gene synthesis". By overcoming some of the limitations inherent in existing gene synthesis methods, this method may prove sufficiently reliable to allow gene synthesis to be used in medium- to high-throughput mode in functional and structural genomics programs and in general protein biochemistry studies, and it is especially adaptable for protein engineering and functional studies.
Accordingly, in a first aspect, the present invention provides a method for synthesizing a DNA sequence, the method employing: a plurality of template oligonucleotides, the oligonucleotides having nucleic acid sequences which overlap to form a template having a sequence corresponding to a first strand of the DNA sequence, wherein the template oligonucleotides are resistant to DNA polymerase-mediated strand extension; a forward primer which is capable of annealing to a first, 3'-most template oligonucleotide and of being extended; and a DNA polymerase and nucleotides; the method comprising: contacting the template oligonucleotides, the forward primer, the DNA polymerase and the nucleotides under conditions such that the forward primer anneals to the first template oligonucleotide and is extended to form a nucleic acid product, said process repeating with successive overlapping template oligonucleotides to synthesize the second strand of the DNA sequence; and employing a reverse primer capable of annealing to the 3' end of the second strand, and extending the reverse primer to synthesize the first strand of the DNA sequence, thereby providing the DNA sequence.
Thus, the reaction proceeds by annealing successive template oligonucleotides to the forward primer or to the extension product based on the forward primer, then denaturing, reannealing the next template oligonucleotide and further extending the extension product.
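The step-wise process just described can be illustrated with a minimal string-level sketch in Python. It assumes an idealised reaction in which every annealing event is perfectly specific and the polymerase copies without error, and it ignores thermodynamics entirely. The helper names (split_first_strand, extend_on_templates, copy_second_strand) and the 300-nt target, 50-nt oligos, 20-nt overlaps and 22-nt primers are hypothetical choices loosely echoing the example of Figure 2. All sequences are written 5' to 3', and the forward primer is taken to be the reverse complement of the 3' end of the first strand.

```python
import random

# Complement table for uppercase DNA.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (5'->3')."""
    return seq.translate(_COMP)[::-1]


def split_first_strand(first: str, oligo_len: int, overlap: int) -> list[str]:
    """Cut the first strand into overlapping template oligos, 3'-most oligo first.

    Hypothetical fixed-length tiling: every oligo except possibly the last
    (5'-most) one is `oligo_len` nt long.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("need 0 < overlap < oligo_len")
    oligos, end = [], len(first)
    while True:
        start = max(0, end - oligo_len)
        oligos.append(first[start:end])
        if start == 0:
            return oligos
        end = start + overlap  # the next oligo shares `overlap` nt with this one


def extend_on_templates(forward_primer: str, oligos: list[str], overlap: int) -> list[str]:
    """Step-wise extension of the forward primer across the template oligos.

    Returns the growing second-strand product after each step. The oligos
    are only read, never extended, mirroring their blocked 3' ends.
    """
    product, steps = forward_primer, []
    for i, oligo in enumerate(oligos):
        k = len(forward_primer) if i == 0 else overlap  # length of the annealed duplex
        if product[-k:] != revcomp(oligo[-k:]):
            raise ValueError(f"product does not anneal to template oligo {i}")
        product += revcomp(oligo[:-k])  # copy the unpaired 5' part of the oligo
        steps.append(product)
    return steps


def copy_second_strand(reverse_primer: str, second: str) -> str:
    """Single-step synthesis of the first strand on the full-length second strand."""
    if second[-len(reverse_primer):] != revcomp(reverse_primer):
        raise ValueError("reverse primer does not anneal to the 3' end of the second strand")
    return revcomp(second)


if __name__ == "__main__":
    rng = random.Random(0)
    first = "".join(rng.choice("ACGT") for _ in range(300))  # hypothetical target
    oligos = split_first_strand(first, oligo_len=50, overlap=20)
    steps = extend_on_templates(revcomp(first[-22:]), oligos, overlap=20)
    print(len(oligos), [len(s) for s in steps])
```

Each entry in the returned list corresponds to one rung of the growing second strand; only after the last template oligonucleotide has been used does the reverse primer copy the full-length strand in a single step.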
In some embodiments, the reverse primer may hybridise to a sequence at or towards the 3' end of the second strand of the DNA sequence synthesised by the method of the present invention. In this case, extension of the reverse primer produces a DNA sequence which corresponds to the original DNA sequence from which the overlapping template oligonucleotides were designed. In alternative embodiments, the reverse primer may be a longer sequence whose 3' end overlaps with, and is capable of hybridising to, the 3' end of the second strand of the DNA sequence. In this case, the 3' end of the reverse primer can be extended to provide a DNA sequence corresponding to the original DNA sequence, while the 3' end of the second strand can be extended using the non-overlapping part of the reverse primer as a template. This provides a way of producing chimeric DNA species, joining a DNA sequence produced according to the present invention to a further DNA sequence based on the non-overlapping part of the reverse primer. The further DNA sequence may be a second sequence produced by the method of the invention or a sequence produced using another method known in the art.
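The chimeric embodiment can be checked with the sketch below, which assumes ideal pairing and treats the 5' part of the longer reverse primer as the "further DNA sequence". The function name join_via_reverse_primer and any example sequences are hypothetical. Both products are written 5' to 3'.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (5'->3')."""
    return seq.translate(_COMP)[::-1]


def join_via_reverse_primer(second: str, reverse_primer: str, overlap: int) -> tuple[str, str]:
    """Return both strands of the chimeric duplex (each 5'->3').

    Assumes the 3'-terminal `overlap` nt of `reverse_primer` pair with the
    3'-terminal `overlap` nt of `second`; the rest of the primer is the
    further sequence to be joined on.
    """
    if not 0 < overlap <= min(len(second), len(reverse_primer)):
        raise ValueError("invalid overlap length")
    if second[-overlap:] != revcomp(reverse_primer[-overlap:]):
        raise ValueError("3' ends of the two strands do not pair")
    # The reverse primer is extended along the second strand.
    extended_primer = reverse_primer + revcomp(second)[overlap:]
    # The second strand is extended along the non-overlapping 5' part of the primer.
    extended_second = second + revcomp(reverse_primer[:-overlap])
    return extended_primer, extended_second
```

The two returned strands are exact reverse complements of each other, which is a convenient sanity check that both 3' ends have been extended consistently.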
Preferably, the reaction mixture comprises a thermostable DNA polymerase and a buffer comprising Mg2+ ions.
Preferably, the method comprises the additional step of amplifying the DNA sequence synthesized according to the method, e.g. using PCR with the forward and reverse primers.
Thus, in contrast to the prior-art gap-filling approaches, which employ a series of overlapping template oligonucleotides corresponding to both DNA strands, the present invention uses a template based on a single strand, thereby helping to avoid the synthesis of intermediate-length by-products. In preferred embodiments, this allows gene synthesis to be performed in a single PCR reaction, and the use of 3' modifications on template oligonucleotides based on one strand of the DNA sequence means that only one strand is synthesised throughout the extension process and no dead-end products are produced.
Thus, the method of the present invention helps to avoid the main problems associated with the prior-art "gap-repair" methods discussed above. The present invention helps to avoid the generation of undesirable by-products by synthesising one DNA strand to full length before synthesising its complementary strand, and therefore enhances the efficiency of synthesis and the final yield of full-length product. Additionally, in the present method, the overlaps between the template oligonucleotides and the first primer or the newly synthesised product can be designed so that all pairings have matched annealing temperatures. Since no by-products are generated, the length of the overlap of each pairing remains constant throughout synthesis, which reduces the chance of mis-annealing of DNA strands. The only source of competition for annealing between a template oligonucleotide and the newly synthesised product acting as its primer is the preceding template oligonucleotide (i.e. the adjacent template oligonucleotide on the 3' side). This method therefore offers increased fidelity and yield over prior-art gene synthesis methods. These advantages may make the method sufficiently reliable to allow synthesis of very large genes or other DNA constructs, and it may also be amenable to high-throughput operation.
Optionally, the method may comprise the initial step of designing template oligonucleotides and/or forward and reverse primers suitable for synthesizing the desired DNA sequence; this design process is described in more detail below. The design of the template oligonucleotides used for the synthesis of a desired DNA sequence is a part of the present invention and, in each given case, may depend on one or more of the following considerations. Firstly, the oligonucleotides may be designed to optimise the coding sequence for protein expression, e.g. to adjust the DNA sequence produced to take account of host cell codon preference or GC content, or to reduce mRNA secondary structure.
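As a hypothetical illustration of codon-level adjustment, the sketch below replaces codons with a preferred synonymous codon drawn from a small lookup table. The table covers only three amino acids and its "preferred" codons are illustrative placeholders, not measured codon-usage data for any particular host; the synonymous codon families themselves follow the standard genetic code. The names PREFERRED, recode and gc_fraction are assumptions made for this example.

```python
# Illustrative only: hypothetical host "preferred codon" choices for three
# amino acids. Real tables would be derived from codon-usage data for the host.
PREFERRED = {"L": "CTG", "R": "CGT", "G": "GGC"}

# Synonymous codon families for those amino acids (standard genetic code).
SYNONYMS = {
    "L": ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG"),
    "R": ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG"),
    "G": ("GGT", "GGC", "GGA", "GGG"),
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def recode(cds: str) -> str:
    """Swap each covered codon for its preferred synonym; leave other codons unchanged."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(PREFERRED.get(CODON_TO_AA.get(c, ""), c) for c in codons)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C residues, one of the properties the text suggests adjusting."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    original = "ATGAGAAGGTTAGGA"  # Met-Arg-Arg-Leu-Gly
    recoded = recode(original)
    print(recoded, gc_fraction(original), gc_fraction(recoded))
```

In this toy example the GC content rises; where the aim is instead to weaken mRNA secondary structure near the translation start, the same approach would favour lower-GC synonymous codons.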
In preferred embodiments, the template oligonucleotides are between about 30 and 100 nucleotides in length, more preferably between about 30 and 70 nucleotides, and most preferably between about 40 and 60 nucleotides. Preferably, the overlaps between adjacent template oligonucleotides are between about 10 and 30 nucleotides in length, more preferably between about 15 and 25 nucleotides, and most preferably between about 17 and 24 nucleotides. The length of the template oligonucleotides and of the overlapping sequence between adjacent template oligonucleotides can be designed so that annealing of each template oligonucleotide to the primer or newly synthesised strand is highly selective at around the same temperature. The sequences of the template oligonucleotides can also be designed for selectivity and annealing temperature by changing the nucleic acid sequence of one or more template oligonucleotides, e.g. by employing an alternative codon encoding the same amino acid at a given position. This can reduce or prevent the template oligonucleotides assembling in more than one order, inhibit side reactions and promote efficient annealing of the specific pairings at the same temperature, thereby increasing the efficiency of DNA synthesis. The selection of the oligonucleotide sequences may also take account of further studies or modifications that might be carried out using the DNA sequence. An example is where the methods described herein are used to carry out mutagenesis studies by changing the amino acid sequence of the polypeptide encoded by the DNA sequence. In this application of the invention, mutations (one or more substitutions, insertions or deletions involving one, two, three, four, five, ten, twenty or more amino acid residues) can be readily introduced into the sequence by modifying the sequence of one or more of the template oligonucleotides. In this case it would be advantageous if the original and modified template oligonucleotides had sequences allowing them to be used under substantially the same reaction conditions, without redesigning the whole set of template oligonucleotides according to the above considerations.
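The length and annealing-temperature constraints on adjacent overlaps can be checked with a short script. This sketch assumes the basic GC-content formula Tm = 64.9 + 41(nGC - 16.4)/N as a stand-in melting-temperature model (the text does not specify one, and a nearest-neighbour model would normally be preferred). The default ranges reflect the "more preferably" values above, the function names are hypothetical, and because absolute Tm values differ between models, the target should be set within whichever model is actually used.

```python
def basic_tm(seq: str) -> float:
    """Rough duplex Tm (degC) from 64.9 + 41*(nGC - 16.4)/N.

    Ignores salt, oligo concentration and nearest-neighbour effects.
    """
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def longest_overlap(left: str, right: str) -> int:
    """Longest k (shorter than both oligos) such that the 3' end of `left`
    equals the 5' end of `right`; assumes designed overlaps are exact matches."""
    for k in range(min(len(left), len(right)) - 1, 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def check_overlaps(oligos, overlap_range=(15, 25), oligo_range=(30, 70),
                   target_tm=60.0, tol=2.0):
    """For adjacent template oligos (listed 5'->3' along the first strand),
    report (overlap length, rounded Tm, acceptable?)."""
    report = []
    for left, right in zip(oligos, oligos[1:]):
        k = longest_overlap(left, right)
        tm = basic_tm(right[:k]) if k else float("nan")
        ok = (overlap_range[0] <= k <= overlap_range[1]
              and abs(tm - target_tm) <= tol
              and all(oligo_range[0] <= len(o) <= oligo_range[1] for o in (left, right)))
        report.append((k, round(tm, 1), ok))
    return report


if __name__ == "__main__":
    left = "A" * 10 + "GCGCGCGCGCATATATATAT"   # hypothetical oligos
    right = "GCGCGCGCGCATATATATAT" + "T" * 10
    print(check_overlaps([left, right], target_tm=52.0))
```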
Conveniently, the template oligonucleotides are derivatised so that they can anneal to the forward primer or to a nucleic acid product derived from the forward primer, but cannot themselves be extended. In one embodiment, this can be achieved by chemically derivatising the template oligonucleotides, e.g. towards or at their 3' termini, to inhibit or prevent polymerase-mediated strand extension. Many different chemical groups might be added to the 3' position of the deoxyribose moiety of the 3'-most nucleotide to confer DNA polymerase blocking activity. The primary considerations in choosing the blocking group are: firstly, that it must not be a substrate for the polymerase activity of the DNA polymerase used in the method; and secondly, that it must be compatible with PCR, e.g. reasonably chemically stable under the conditions used in the method, so that it continues to block the polymerase throughout synthesis. A wide range of chemical derivatisations could satisfy these criteria, including: 3' amino group; 3' phosphate; 3' propyl phosphate; 3' hexaethylene glycol; internal hexaethylene glycol modifications (i.e. between adjacent nucleotides); 3' propanol; 3' propylamino; 3' thiopropyl; 3' phosphorothioate; an abasic site (i.e. a deoxyribose analogue without a base); 3' alkyl phosphonates; 3' phosphoramidates; and many other modifications known to those skilled in the art. One or more of these derivatisations may be used in the template oligonucleotides.
In addition, and particularly when a proof-reading polymerase is employed to synthesize the DNA sequence, it is preferred that the template oligonucleotides are modified to prevent removal of their 3' nucleotides by the 3'-5' exonuclease activity of the polymerase. Conveniently, this can be accomplished by modifying the backbone of the template oligonucleotides. Potential backbone modifications include: various phosphorothioate linkages (Stec et al., 1984; reviewed in De Mesmaeker et al., 1995); ionic and non-ionic methyl phosphonate linkages (reviewed in Wozniak, L. A., 1999); a range of phosphoramidate linkages; and various 2'-sugar modifications such as 2'-F (reviewed by Egli and Gryaznov, 2000). Alternatively, or in addition to backbone modifications, some 3' modifications may be used to block 3'-5' exonuclease activity, such as a 3' amino group, long-chain alkylamino groups and others.
The method may additionally comprise the steps of introducing the nucleic acid into an expression vector and/or transfecting a host cell with the expression vector thus obtained or with the nucleic acid. The present invention therefore also includes a method of producing the polypeptide encoded by the desired DNA sequence, the method comprising culturing the host cells and isolating the polypeptide thus produced.
In a further aspect, the present invention provides a kit for synthesizing a DNA sequence, the kit comprising: a plurality of template oligonucleotides, the oligonucleotides having nucleic acid sequences which overlap to form a template having a sequence corresponding to a first strand of the DNA sequence, wherein the template oligonucleotides are resistant to DNA polymerase-mediated strand extension; a forward primer which is capable of annealing to a first, 3'-most template oligonucleotide and of being extended; a DNA polymerase (preferably a thermostable polymerase); nucleotides; and a buffer comprising Mg2+ ions.
In a further aspect, the present invention provides a method of designing a set of template oligonucleotides and primers for synthesizing a desired DNA molecule, the method comprising:
(a) obtaining the sequence of the DNA molecule;
(b) deciding on a range of template oligonucleotide lengths;
(c) deciding on a range of template oligonucleotide overlap lengths;
(d) deciding on annealing temperatures of template oligonucleotides to a forward primer or an extension product based on the forward primer;
(e) dividing the DNA sequence into a set of template oligonucleotides and designing forward and reverse primers;
(f) checking the sequences of the template oligonucleotides and the primers for degeneracy;
(g) checking the sequences of the oligonucleotide overlaps for annealing temperature;
(h) if there is unacceptable degeneracy or variation in annealing temperature, testing regions of sequence to the 5' and 3' sides, within the range of oligonucleotide lengths decided above, for reduced degeneracy and improved annealing temperature, and, if suitable oligonucleotide overlap(s) cannot be identified within the above length constraints, choosing alternative codons to remove degeneracy or reduce temperature variation; and
(i) repeating the checks on sequence degeneracy and annealing temperature until the method produces a set of template oligonucleotide and primer sequences with acceptable characteristics.
Optionally, the method may comprise one or more of the following additional steps:
(j) checking for codon preference of host cells;
(k) adjusting for modifications to the DNA molecule that might be made in future studies.
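A minimal greedy sketch of steps (a) to (i) follows, under stated assumptions: the basic GC-content Tm formula stands in for a proper annealing-temperature model; "degeneracy" is approximated by requiring each overlap to occur exactly once in the first strand and not at all in its complement; and the search of step (h) is limited to moving each boundary within the allowed oligonucleotide and overlap lengths. The codon-substitution fallback of step (h) and the optional steps (j) and (k) are not implemented; the function raises an error where codon substitution would be attempted. Function names and default values are hypothetical, and a greedy search is only one simple choice, not necessarily how the method would be implemented in practice.

```python
import random

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (5'->3')."""
    return seq.translate(_COMP)[::-1]


def basic_tm(seq: str) -> float:
    """Rough Tm (degC): 64.9 + 41*(nGC - 16.4)/N; a stand-in for a proper model."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def occurrences(seq: str, motif: str) -> int:
    """Number of (possibly overlapping) occurrences of `motif` in `seq`."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def is_unique(overlap: str, first: str) -> bool:
    """Crude degeneracy check (step f): the overlap occurs once in the first
    strand and nowhere in its complement."""
    return occurrences(first, overlap) == 1 and occurrences(revcomp(first), overlap) == 0


def design_templates(first, oligo_range=(40, 60), overlap_range=(17, 24),
                     target_tm=55.0, tol=3.0):
    """Greedy split of the first strand (5'->3') into overlapping template oligos.

    Returns (start, end) coordinates on the first strand, 5'-most oligo first.
    """
    spans, start = [], 0
    while len(first) - start > oligo_range[1]:
        best = None
        # Step (h): scan candidate boundaries within the allowed lengths.
        for end in range(start + oligo_range[0], start + oligo_range[1] + 1):
            for k in range(overlap_range[0], overlap_range[1] + 1):
                overlap = first[end - k:end]
                if not is_unique(overlap, first):
                    continue
                score = abs(basic_tm(overlap) - target_tm)  # step (g)
                if score <= tol and (best is None or score < best[0]):
                    best = (score, end, k)
        if best is None:
            # The text would now try alternative synonymous codons (step h).
            raise ValueError(f"no acceptable overlap after position {start}")
        _, end, k = best
        spans.append((start, end))
        start = end - k  # the next oligo begins inside the chosen overlap
    spans.append((start, len(first)))  # 3'-most oligo takes the remainder
    return spans


def design_primers(first: str, length: int = 22) -> tuple[str, str]:
    """Forward primer pairs with the 3' end of the first strand; the reverse
    primer has the sequence of its 5' end."""
    return revcomp(first[-length:]), first[:length]


if __name__ == "__main__":
    rng = random.Random(1)
    first = "".join(rng.choice("ACGT") for _ in range(300))  # hypothetical target
    for s, e in design_templates(first, target_tm=52.0, tol=10.0):
        print(s, e, first[s:e])
```

The final, 3'-most oligonucleotide simply takes whatever sequence remains, so it may fall below the lower length bound; a fuller implementation would rebalance the last few boundaries.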
In further related aspects, the present invention provides a computer programmed to carry out the above method of designing a set of template oligonucleotides and primers for synthesizing a desired DNA molecule, and a data carrier having stored thereon a program for carrying out the method.
Embodiments of the present invention will now be described by way of example and not limitation with reference to the accompanying figures.
Brief Description of the Figures
Figure 1. A Schematic Representation of Double-Stranded Gap-Repair Methods for Gene Synthesis. This schematic diagram shows the products that are predicted to form during synthesis of a 300bp gene using double-stranded gap repair methods with ten oligos of around 50bp. After the first two cycles of synthesis, the overlaps between complementary products become much longer, increasing the chances of non-specific annealing of products or oligonucleotides and of the generation of erroneous products. It is therefore predicted that fourteen intermediate-length products that could be extended to full-length sequence would be generated during PCR. Only the terminal oligos will be incorporated into full-length product. All intermediate-length products, apart from those that extend to the 5' terminus of one of the strands of the gene, are by-products in that they cannot be extended to full length and act only as templates for extension. On the assumption that all strands are extended to their maximum length, it is predicted that in this example eight different by-products will be created. The number of by-products increases with the size of the gene to be synthesised and the number of oligos used.
Figure 2. Synthesis of a 300bp fragment of a GFP mutant gene using the "ladder and snake" gene synthesis method. This schematic diagram shows all of the predicted products generated during gene synthesis of the 300bp fragment of the GFP coding sequence using the "ladder and snake" method, with ten template oligos of around 50bp and two primers of 22bp. It is predicted that ten intermediate-length products, all of which can be extended to full length, and no by-products would be generated by PCR. The primer for synthesis of the first strand is coloured blue, the template oligos are red and the primer for synthesis of the complementary strand is green. NH2 indicates replacement of the 3' hydroxyl moiety with an amine group that blocks DNA polymerase-mediated chain extension (this modification could be replaced by a number of other modifications that prevent chain extension, such as 3' phosphorylation or a 3'-terminal dideoxynucleotide). XX indicates the presence of two phosphorothioate linkages attaching the two 3'-terminal nucleotides, blocking the 3'-5' exonuclease activity of proof-reading DNA polymerases.
Figure 3. (A) Three 50 µl PCRs were performed with Vent DNA polymerase at 1.5, 2.5 and 3.5 mM MgSO4, with primer concentrations of 1 µM and template oligo concentrations of 0.1 µM. Three 50 µl PCRs were also performed with Vent exo- under the same conditions. (B) Four PCRs were performed with Vent DNA polymerase at 0.5, 1.5, 2.5 and 3.5 mM MgSO4 using only the external 22bp primers and no template oligos; 0.5 µl of PCR product from the previous experiment (lane 4 in Figure 3A) was used as template. Otherwise, PCR conditions were as described in the methods section.
Figure 4. (A) Lane 1, DNA size markers. Lane 2, complete synthesis reaction. Lane 3, minus 5' primer control reaction. Lane 4, minus 3' primer control reaction. Lane 5, minus template oligo 7 control reaction. (B) Lanes 1 and 6, DNA size markers. Lanes 2-5, EcoRI digests of four different pCRBlunt-synthetic-Vpr clones. The cloning site of pCRBlunt is flanked by two EcoRI sites, so EcoRI digestion releases the insert from the vector.
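The EcoRI release strategy of Figure 4B presumes that the synthetic insert itself contains no EcoRI site (GAATTC); an internal site would cut the insert into additional fragments. A minimal Python sketch for checking a designed sequence is given below. The function names are hypothetical, and fragment lengths are computed on one strand of a linear sequence, ignoring the 4-nt overhangs that EcoRI leaves.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts G^AATTC


def find_sites(seq: str, site: str = ECORI_SITE) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def digest_fragment_lengths(seq: str, site: str = ECORI_SITE, cut_offset: int = 1) -> list[int]:
    """Fragment lengths (one strand) after cutting a linear sequence at every site.

    `cut_offset` is the cut position within the site (1 for G^AATTC). Overhangs
    are ignored; because GAATTC is its own reverse complement, scanning one
    strand finds every site.
    """
    cuts = [i + cut_offset for i in find_sites(seq, site)]
    bounds = [0, *cuts, len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    insert = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"  # hypothetical synthetic insert
    print("internal EcoRI sites:", find_sites(insert))
```

An empty site list for the designed insert is the condition under which the digest would be expected to release it as a single band.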
Detailed Description
Methods for Synthesizing and Expressing Nucleic Acid
DNA sequences can be readily obtained, amplified and manipulated according to the present invention based on the information and references contained herein and techniques known in the art (see, for example, Sambrook, Fritsch and Maniatis, "Molecular Cloning: A Laboratory Manual", Cold Spring Harbor Laboratory Press, 1989, and Ausubel et al., 1992). These techniques include (i) the use of the polymerase chain reaction (PCR) to amplify samples of such nucleic acid, (ii) chemical synthesis, and (iii) amplification in a host such as E. coli.
In order to obtain expression of a DNA sequence encoding a polypeptide of interest, the sequence can be incorporated in a vector having control sequences operably linked to the DNA to control its expression. The vectors may include other sequences, such as promoters or enhancers to drive the expression of the inserted nucleic acid, nucleic acid sequences so that the polypeptide is produced as a fusion, and/or nucleic acid encoding secretion signals so that the polypeptide produced in the host cell is secreted from the cell. The polypeptide can then be obtained by transforming the vectors into host cells in which the vector is functional, culturing the host cells so that the polypeptide is produced, and recovering the polypeptide from the host cells or the surrounding medium. Prokaryotic and eukaryotic cells are used for this purpose in the art, including strains of E. coli, yeast, and eukaryotic cells such as COS or CHO cells. The choice of host cell can be used to control the properties of the polypeptide expressed in those cells, e.g. controlling where the polypeptide is deposited in the host cells or affecting properties such as its glycosylation and phosphorylation.
PCR techniques for the amplification of nucleic acid are described in US Patent No. 4,683,195. In general, such techniques require that sequence information from the ends of the target sequence is known, so that suitable forward and reverse oligonucleotide primers can be designed to be identical or similar to the polynucleotide sequence that is the target for amplification. In the present invention, the design of primers for PCR may involve including a sequence comprising 1, 2, 3, 4, 5, 10, 20 or more nucleotides which are resistant to the activity of endonuclease enzymes, e.g. by being chemically modified. PCR comprises steps of denaturation of template nucleic acid (if double-stranded), annealing of primer to target, and polymerisation. The DNA sequences provided herein readily allow the skilled person to design PCR primers. References for the general use of PCR techniques include Mullis et al., Cold Spring Harbor Symp. Quant. Biol., 51:263 (1987); Ehrlich (ed.), PCR Technology, Stockton Press, NY, 1989; Ehrlich et al., Science, 252:1643-1650 (1991); and "PCR Protocols: A Guide to Methods and Applications", Eds. Innis et al., Academic Press, New York (1990).
Systems for cloning and expression of a polypeptide in a variety of different host cells are well known. Suitable host cells include bacteria, eukaryotic cells such as mammalian and yeast cells, and baculovirus systems. Mammalian cell lines available in the art for expression of a heterologous polypeptide include Chinese hamster ovary cells, HeLa cells, baby hamster kidney cells, COS cells and many others. A common, preferred bacterial host is E. coli.
Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate. Vectors may be plasmids or viral, e.g. phage or phagemid, as appropriate. For further details see, for example, Molecular Cloning: a Laboratory Manual, 2nd edition, Sambrook et al., 1989, Cold Spring Harbor Laboratory Press. Many known techniques and protocols for the manipulation of nucleic acid, for example in the preparation of nucleic acid constructs, mutagenesis, sequencing, introduction of DNA into cells and gene expression, and analysis of proteins, are described in detail in Current Protocols in Molecular Biology, Ausubel et al. eds., John Wiley & Sons, 1992.
The DNA produced according to the present invention, having single-stranded overhangs or blunt ends, can be inserted into a replicable vector for cloning (amplification of the DNA) or for expression. Various vectors are publicly available. The vector may, for example, be in the form of a plasmid, cosmid, viral particle or phage. The appropriate nucleic acid sequence may be inserted into the vector by a variety of procedures. In general, DNA is inserted at an appropriate site in the vector having a single-stranded overhang complementary to that produced by the methods described herein. The vector may include one or more other useful sequences including, but not limited to, a signal sequence, an origin of replication, a marker gene, an enhancer element, a promoter or a transcription termination sequence. Construction of suitable vectors containing one or more of these components employs standard ligation techniques known to the skilled person.
Host cells are transfected or transformed with the expression or cloning vectors described herein and cultured in conventional nutrient media modified as appropriate for inducing promoters, selecting transformants or amplifying the genes encoding the desired sequences. The methods of the invention may include the additional step of transforming a host cell with the DNA produced by the methods described herein. For eukaryotic cells, suitable techniques may include calcium phosphate transfection, DEAE-dextran, electroporation, liposome-mediated transfection and transduction using retrovirus or other viruses, e.g. vaccinia or, for insect cells, baculovirus. For bacterial cells, suitable techniques may include calcium chloride transformation, electroporation and transfection using bacteriophage. As an alternative, direct injection of the nucleic acid could be employed. Host cells produced in this way can be cultured to express a polypeptide encoded by the DNA, and the polypeptide thus produced can be isolated for further use.
Designing the Template Oligonucleotides and Primers
The design of the templates and primers may, in a preferred embodiment, be carried out according to the following scheme.
(a) identify a target DNA sequence, such as an open reading frame, and optionally its derived amino acid sequence, optionally incorporating amino acid mutations if required and/or silent mutations to optimise codon usage for a particular host if required.
(b) decide on a range of lengths for the two primer oligonucleotides (e.g. 16-25).
(c) decide on a range of lengths of template oligonucleotides (e.g. 40-75).
(d) decide on a range of lengths of complementary overlaps (e.g. 16-25).
(e) decide on a value for the theoretical melting temperature to be used in the design of all complementary overlaps and the range around this value that will be tolerated (e.g. 65 ± 2°C).
(f) optionally, identify a library of codons that may be used for substitution in subsequent design of oligonucleotides, including for example either all possible codons or including a limited set of codons of preference to the host intended for protein expression.
(g) decide on an upper threshold value for the self-complementarity of sequence in terms of the secondary structure stability of all oligonucleotides (ie. a value of theoretical melting temperature, e.g. 40°C) .
(h) decide on an upper threshold value for undesirable complementary pairing between different oligonucleotides or between oligonucleotides and the newly synthesised strand (i.e. a value of theoretical melting temperature, e.g. 40°C)
(i) decide on an upper threshold value for secondary structure stability of the newly synthesised strand (i.e. a value of theoretical melting temperature, e.g. 40°C) .
(j) computationally generate a "virtual library" of sets of candidate template oligonucleotides corresponding to all possibilities within the above defined ranges, optionally including synonymous codons.
(k) optionally, design the forward primer and 3'-most template oligonucleotide so that the entire sequence of the forward primer is complementary to the 3'-most template oligonucleotide, to minimise the possibility of mispriming by annealing of the forward primer to another template oligonucleotide.
(l) start by designing the forward primer and 3'-most template oligonucleotide, selecting the optimal oligonucleotides from the above library of candidate oligonucleotides on the basis of the following criteria:
(1) the complementary overlap is closest to the "ideal" annealing temperature defined in section (e).
(2) one or two A or T residues in the last five residues at the 3' end of the forward primer or in the last five residues at the 5' end of the first template oligonucleotide, more preferably two or three A or T residues, and most preferably three or more A or T residues, in order to minimise the chances of mispriming.
(3) the secondary structure (i.e. sequence self-complementarity) is below the threshold melting temperature defined in section (g).
(m) select the second template oligonucleotide from the candidate oligonucleotide library on the basis of the following criteria:
(1) the complementary overlap of the candidate oligonucleotide to the predicted extension product of the forward primer and first template oligonucleotide must fulfil the criteria in section (l).
(2) any undesirable sequence complementarity between the candidate oligonucleotide and other template oligonucleotides that have already been selected, or the newly synthesised strand, is below the melting temperature threshold set in section (h).
(3) the secondary structure (i.e. sequence self-complementarity) of the newly synthesised strand, predicted to be produced from the candidate template oligonucleotide, is below the threshold set in section (i) .
(n) repeat section (m) until template oligonucleotides for the entire strand have been designed.
(o) compare all oligonucleotides to the sequence of the newly synthesised full-length strand to check that all oligonucleotides have high complementarity to the intended sequence alone (i.e. that any undesirable complementary pairing is below the threshold set in section (h)).
(p) optionally, design a reverse primer so that the entire sequence of the reverse primer is identical to the 5'-most region of the 5'-most template oligonucleotide, to minimise the possibility of mispriming by annealing of the reverse primer at an undesired site in other oligonucleotides or elsewhere on the newly synthesised strand.
(q) alternatively the last template oligonucleotide designed as in section (m) can be used as reverse primer.
(r) synthesise all template oligonucleotides with DNA polymerase blocking modifications and, optionally, with 3'-5' exonuclease blocking groups. Synthesise both forward and reverse primers without DNA polymerase blocking chemical modifications and, optionally, with 3'-5' exonuclease blocking chemical modifications.
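The overlap-selection steps above, (e), (j) and (l) to (n), can be sketched as a greedy tiling of the first strand. This is a minimal illustration under stated assumptions, not the authors' implementation: the melting temperature uses the basic GC-content formula as a stand-in for whatever estimator is actually chosen, the helper names (`basic_tm`, `best_overlap`, `tile_templates`) are hypothetical, and the primer design and complementarity screens of steps (g) to (i), (k), (o) and (p) are omitted. Each template oligo is taken as the reverse complement of a region of the first strand, and each new template begins at the overlap to which the growing strand's 3' end must anneal.

```python
def basic_tm(seq: str) -> float:
    """Rough Tm (°C) from the basic GC formula 64.9 + 41 * (nGC - 16.4) / N.
    Assumed stand-in for the Tm estimator; ignores salt and nearest neighbours."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def best_overlap(strand, end, ov_range, tm_target, tm_tol):
    """Length of the overlap ending at `end` whose Tm is closest to the target,
    or None if no length in `ov_range` falls within the tolerance."""
    best = None
    for ov in range(ov_range[0], min(ov_range[1], end) + 1):
        err = abs(basic_tm(strand[end - ov:end]) - tm_target)
        if err <= tm_tol and (best is None or err < best[1]):
            best = (ov, err)
    return None if best is None else best[0]


def tile_templates(strand, ov_range=(16, 25), len_range=(40, 75),
                   tm_target=65.0, tm_tol=2.0):
    """Return (start, end, overlap) tuples on the first strand.

    Template oligo i is revcomp(strand[start:end]); `overlap` is the number of
    3'-terminal bases of the growing strand (ending at `end`) that must anneal
    to the 3' end of the next template (0 for the last template)."""
    tiles, start = [], 0
    while len(strand) - start > len_range[1]:
        # Prefer the longest template whose downstream overlap fits the Tm window.
        for end in range(start + len_range[1], start + len_range[0] - 1, -1):
            ov = best_overlap(strand, end, ov_range, tm_target, tm_tol)
            if ov is not None:
                break
        else:
            raise ValueError(f"no valid template starting at position {start}")
        tiles.append((start, end, ov))
        start = end - ov  # the next template's 3' end pairs with this overlap
    # The remainder (at most len_range[1] bases) becomes the final template;
    # it may be shorter than len_range[0].
    tiles.append((start, len(strand), 0))
    return tiles


if __name__ == "__main__":
    first_strand = "GCAT" * 50  # toy 200-nt sequence, not a real gene
    for start, end, ov in tile_templates(first_strand, ov_range=(17, 24), tm_target=59.0):
        print(start, end, ov, revcomp(first_strand[start:end]))
```

Preferring the longest acceptable template keeps the number of oligonucleotides low; a fuller design would score the whole candidate library against every criterion in the list above rather than accepting the first template whose overlap meets the Tm window.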
Preferred Applications of the Method
Expression of recombinant proteins in bacteria and yeast is currently necessary for the production of proteins for applications that require milligram quantities of protein, such as structural analysis by X-ray crystallography or NMR. New "structural genomics" initiatives will require production of multi-milligram quantities of many thousands of proteins from diverse species. However, protein structure projects often fail before the structural analysis due to insufficient quantity or quality of protein (REF). These problems can sometimes be overcome by engineering the coding sequence to: optimise codon usage; reduce mRNA secondary structure; select mutant proteins with improved properties from libraries of point mutants and/or libraries of truncation mutants. Gene synthesis techniques might be used to facilitate this process and may therefore become important tools in protein engineering. We therefore aimed to develop a gene-synthesis methodology sufficiently robust for routine application in an automated high-throughput format.
The present method can be used to generate synthetic open reading frames that can then be used for protein expression. This approach may be particularly useful in the study of proteins from organisms that have skewed codon usage, such as many important human pathogens, including Plasmodium falciparum, Mycobacterium tuberculosis and several Schistosoma species. Most recombinant protein expression is performed in mammalian, insect, bacterial or yeast cell lines, and the codon usage of these systems may often be incompatible with high-level expression of a coding sequence from an organism with a different codon usage bias. Gene synthesis may therefore facilitate recombinant expression of proteins from these organisms, and of particular genes from any organism that happen to contain rare codons. In addition, gene synthesis may be needed for expression of genes from certain cellular organelles and compartments, such as the mitochondria of many eukaryotes and the nuclei of some protozoans, that use a non-standard genetic code and therefore require multiple codon changes to allow recombinant expression of the correct protein sequence.
This method may become particularly important in emerging functional genomics initiatives, in which genome-scale protein expression is performed in order to elucidate protein function. Where the wild-type coding sequence gives insufficient protein expression, a synthetic coding sequence could be generated using this method in order to obtain a useful level of expression. The novelty of this method, in combination with semi-automated oligonucleotide design, may make it suitable for high-throughput application on a large scale. This may find particular utility in structural genomics or other large-scale protein structure initiatives, where large quantities of protein are required for crystallisation and/or NMR studies. The method can also provide a platform for the intensive study of a particular high-value target protein, because it allows a wide range of manipulations to be performed using a single integrated platform. Manipulations include generation of: single or multiple point mutants; truncation mutants; essentially any protein fusion or affinity tag; and libraries of point or truncation mutants to allow screening for functional or structural domains or novel activities.
Other potential applications include:
Generation of novel linear dsDNA sequences for use as DNA vaccines.
The method could also be used to generate linear expression elements (Sykes and Johnston, 1999), including transcription control sequences such as promoters and terminators, for use as DNA vaccines or for in vivo or in vitro protein synthesis.
Generation of tailor-made ssDNA and/or ssRNA that might be used for: in vitro transcription and/or translation studies, for example preparation of recombinant protein samples by cell-free expression; synthesis of novel tRNAs; antisense studies.
Production of 3' ssDNA overlaps on dsDNA products to facilitate subsequent manipulations such as cloning.
Construction of engineered cloning or expression vectors in order to optimise various features, such as size; ease of cloning of inserts; multiple cloning sites; minimal unnecessary restriction sites; promoters; transcription regulation sequences; selectable markers; ease of site-directed mutagenesis of inserts, etc.
Use in DNA computers (see Adleman, L. M. et al., Science 266, 1021-1024, 1994).
Materials and Methods
Two oligonucleotide primers (22 nucleotides in length) and 10 template oligos (51-55 nucleotides in length) were ordered from MWG Biotech with the following modifications: phosphorothioate modifications were introduced between the last two nucleotides at the 3' termini of all oligos; all template oligos were synthesised with 3' amino groups. The program "Oligo Calculator" (JaMBW, EMBL) was used to estimate the melting temperature of each overlap.
All PCR experiments were performed using either Vent DNA polymerase or Vent exo- DNA polymerase (NEB). Conditions for efficient PCR were screened using the HotWax Optistart Kit (Invitrogen) according to the manufacturer's instructions. The PCR program for gene synthesis consisted of 2 min at 94°C, then 45 cycles of: 94°C, 1 min; 55°C, 1 min; 72°C, 1 min. PCR products were analysed on 1.5% (w/v) agarose gels in TAE buffer using standard electrophoresis conditions.
Example 1
Rationale for gene-synthesis
A synthetic coding sequence was designed for expression of the first 300 bp of a green fluorescent protein (GFP) mutant with five amino acid mutations and 17 silent mutations, based on a mutant GFP with enhanced fluorescence (Crameri et al., 1996).
This method uses only two oligonucleotide primers. The first primes synthesis of only one DNA strand (the first strand), for which template is provided by a series of template oligos corresponding solely to the complementary strand; the second corresponds to the 5' end of the complementary strand and uses the newly synthesised first strand as template (Figure 2). The 3' termini of all template oligos were modified to prevent DNA polymerase-mediated strand extension. A number of different chemical groups could be attached at the 3' carbon to prevent chain extension (amino group, dideoxy, phosphate, etc.). This prevents production of the intermediate-length products that accumulate using standard methods (Figure 1). In addition, phosphorothioate (PTO) modifications were introduced between the last two 3' nucleotides of all template oligos, preventing removal of the 3'-modified nucleotides by the 3'-5' exonuclease activity of high-fidelity proof-reading polymerases. The primer oligos were not PTO-modified, so that proof-reading of the full-length product could occur during amplification of the full-length coding sequence. The 3' PTO modifications will also prevent "trimming" of oligos and of intermediate-length and full-length products, which might otherwise affect the length of the overlaps during gene assembly. The rationale for "relay gene synthesis" has some similarity to that of the "step-wise extension (SES)" approach (Majumder et al., 1992), with several notable differences: 1) gene synthesis is performed in a single PCR reaction; 2) 3' modifications of the template oligos ensure that only one strand is synthesised throughout the extension process and that no dead-end products are produced; 3) 3'-5' proof-reading of the template oligos is blocked by PTO modifications.
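The mechanism described above, in which only the growing first strand is ever extended because every template oligo is 3'-blocked, can be illustrated with an idealised simulation. This sketch assumes perfect-match annealing of at least `min_anneal` bases and ignores melting temperature, mispriming and polymerase behaviour; the function names (`anneal_length`, `relay_extend`) are hypothetical. Templates are supplied as an unordered pool, as in the single-tube reaction, and a template is copied out to its 5' end once the strand's 3' end pairs with the template's 3' end.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def anneal_length(strand: str, target: str) -> int:
    """Longest k such that the 3' end of `strand` equals target[:k], i.e. the
    growing strand's 3' terminus can pair with the template's 3' end."""
    for k in range(min(len(strand), len(target)), 0, -1):
        if strand.endswith(target[:k]):
            return k
    return 0


def relay_extend(primer: str, templates: list, min_anneal: int = 4) -> str:
    """Idealised first-strand synthesis. Templates are 3'-blocked, so only the
    growing strand is extended; each template is used at most once."""
    strand, unused = primer.upper(), list(templates)
    while True:
        for t in unused:
            target = revcomp(t)  # first-strand sense copy of this template
            k = anneal_length(strand, target)
            if min_anneal <= k < len(target):
                strand += target[k:]  # copy the template out to its 5' end
                unused.remove(t)
                break
        else:
            return strand  # no template in the pool can extend the strand


if __name__ == "__main__":
    first_strand = "AAAACCCCGGGGTTTT"  # toy sequence
    t1 = revcomp(first_strand[0:10])
    t2 = revcomp(first_strand[6:16])
    print(relay_extend("AAAA", [t2, t1]))  # -> AAAACCCCGGGGTTTT
    print(relay_extend("AAAA", [t2]))      # t1 omitted -> AAAA
```

Leaving a template out of the pool stalls the strand at that point, which is the behaviour that the omitted-oligonucleotide control in Example 2 is designed to test.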
Design and assembly of synthetic coding sequence
The original 5' primer and the overlaps to be generated during extension, which act as primers for subsequent extensions, were all designed to have the following parameters: annealing temperatures of 57-61°C; length between 17 and 24 bases; GC content of 40-60%; two or three A or T bases in the last five 3' residues. The program "Oligo Calculator" (JaMBW, EMBL) was used to estimate the melting temperature of each overlap. The panel of oligos was also screened and matched to meet the following criteria: minimisation of tandem repeats; minimisation of secondary structure; minimisation of sequence complementarity between oligonucleotides. In order to apply the above constraints, several base changes were typically introduced into each overlap (e.g. A/T to G/C mutations to increase the melting temperature of an overlap, or base changes to reduce secondary structure in a particular overlap). These changes were made using only silent mutations (i.e. changing codons to synonymous codons), without choosing any codons of reduced E. coli codon preference.
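The overlap parameters and screening criteria listed above lend themselves to a simple automated filter. The sketch below is hypothetical: it uses the Wallace rule (2°C per A/T, 4°C per G/C) for stretches shorter than 14 bases and the basic GC-content formula otherwise, as an assumed approximation to the "Oligo Calculator" estimates, and it approximates sequence complementarity by the longest perfectly complementary stretch between two sequences (passing the same sequence twice gives a crude self-complementarity check). With this simplified Tm model, short overlaps need high GC content to reach 57°C, so a different estimator will accept a different set of overlaps.

```python
def basic_tm(seq: str) -> float:
    """Approximate Tm in °C (assumed model): Wallace rule below 14 nt,
    basic GC formula 64.9 + 41 * (nGC - 16.4) / N otherwise."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    if len(seq) < 14:
        return 2.0 * (len(seq) - gc) + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def meets_overlap_criteria(overlap: str) -> bool:
    """Example 1 parameters: 17-24 nt, Tm 57-61°C, 40-60% GC,
    and two or three A/T among the last five 3' bases."""
    s = overlap.upper()
    gc_fraction = (s.count("G") + s.count("C")) / len(s)
    at_in_last_five = sum(base in "AT" for base in s[-5:])
    return (17 <= len(s) <= 24
            and 57.0 <= basic_tm(s) <= 61.0
            and 0.40 <= gc_fraction <= 0.60
            and at_in_last_five in (2, 3))


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def longest_complementary_run(a: str, b: str) -> str:
    """Longest substring of `a` whose reverse complement occurs in `b`,
    found as the longest common substring of a and revcomp(b) by dynamic programming."""
    a, rb = a.upper(), revcomp(b)
    best_len, best_end = 0, 0
    prev = [0] * (len(rb) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(rb) + 1)
        for j in range(1, len(rb) + 1):
            if a[i - 1] == rb[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def complementarity_ok(a: str, b: str, max_tm: float = 40.0) -> bool:
    """True if the longest complementary stretch between a and b melts below max_tm."""
    return basic_tm(longest_complementary_run(a, b)) < max_tm


if __name__ == "__main__":
    candidate = "GCAT" * 6  # toy overlap
    print(meets_overlap_criteria(candidate))         # True
    print(complementarity_ok(candidate, candidate))  # False: strongly self-complementary
```

The toy candidate shows why both kinds of check are needed: it satisfies every length, Tm, GC and 3'-end criterion yet would be rejected for self-complementarity.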
PCR gene synthesis trial
The preliminary trial of our gene synthesis method used both Vent DNA polymerase (Vent) and Vent exo- DNA polymerase (Vent exo-) in PCR experiments under otherwise identical conditions. A product of the expected size (~300 bp) was observed by agarose gel electrophoresis for PCRs catalysed by Vent but not by Vent exo-, suggesting that proof-reading activity is necessary for efficient gene synthesis (Figure 3A). A smeared band stretching from around 50 bp to ~300 bp was observed for the Vent exo- reaction. The high intensity of the ~300 bp band produced in the Vent reaction suggests that synthesis was efficient. In order to confirm that the 300 bp product obtained in our study was indeed the intended product, we repeated the PCR using a 1/100 dilution of the initial PCR product as template, with just the 5' and 3' oligonucleotides as primers (Figure 3B).
Discussion
Vent and Vent exo- DNA polymerases were chosen for the preliminary trial of this novel gene-synthesis approach. Vent has the 5'-3' polymerase activity common to all template-dependent DNA polymerases and also the 3'-5' exonuclease activity common to proof-reading polymerases. The 3'-5' exonuclease activity of Vent exo- has been inactivated, removing proof-reading activity. Non-proof-reading thermostable polymerases, such as Taq DNA polymerase, are known to catalyse non-template-directed addition of a single A residue to the 3' end of PCR products. According to the supplier (NEB, Inc.), Vent exo- generates 70% blunt-ended products and 30% single-A overhangs. We predicted that non-proof-reading polymerases would be inefficient at gene synthesis by this method, or by any of the existing PCR methods. The rationale was that the intermediate products generated during PCR-mediated gene synthesis would make poor primers, because the 3' residue would not be base-paired to the template in one in four cases (on average) due to the presence of the non-template-encoded A extensions. We predicted that this would greatly reduce the efficiency of step-wise strand elongation and/or might introduce nucleotide misincorporations. Our predictions appear to have been confirmed, since gene synthesis appeared to be much more efficient using Vent than Vent exo- under identical reaction conditions. Surprisingly, Taq DNA polymerase has been reported to allow successful gene synthesis by other methods (e.g. Prodromou and Pearl, 1992). In these cases a very small amount of product may have been produced and then greatly amplified by PCR.
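The prediction that poorly primed intermediates would greatly reduce efficiency rests on the fact that losses compound over successive extension steps. As a hypothetical illustration (the per-step efficiencies below are assumptions, not measured values), if each of n step-wise extensions succeeds independently with probability p, the fraction of strands reaching full length is p^n; with ten template oligos, as in Example 1, modest per-step losses already cut the full-length yield sharply.

```python
def full_length_fraction(step_efficiency: float, n_steps: int) -> float:
    """Fraction of growing strands that complete n independent extension steps,
    assuming every step succeeds with the same probability."""
    if not 0.0 <= step_efficiency <= 1.0:
        raise ValueError("step_efficiency must be between 0 and 1")
    return step_efficiency ** n_steps


if __name__ == "__main__":
    # Hypothetical per-step efficiencies; n = 10 template oligos as in Example 1.
    for p in (0.99, 0.9, 0.75):
        print(f"p = {p:.2f}: full-length fraction = {full_length_fraction(p, 10):.3f}")
```

For example, a 90% per-step efficiency leaves only about 35% of strands full length after ten steps, before any subsequent PCR amplification.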
In summary, we have developed an optimised gene synthesis method that avoids production of dead-end products and allows control of primer overlap length throughout gene synthesis. Of the drawbacks inherent in previous TGS methods, only potential inhibition of extension reactions by annealing of the preceding template oligonucleotide (i.e. the oligo immediately to the 3' side) remains. In theory, there is essentially no limit to the size of gene that can be synthesised by this method other than the size limitations of standard PCR, and there are no obvious drawbacks other than those normally found for general PCR. We believe that this method may be sufficiently robust to make it attractive for high-throughput applications.
Example 2
Synthesis of the Vpr coding sequence optimised for E. coli codon usage.
Codon usage of many HIV genes is strongly biased away from codons of E. coli preference, and as a result HIV proteins are often expressed at very low levels in bacteria. Attempts to express Vpr in E. coli using an expression vector containing the native Vpr coding sequence produced barely detectable quantities of protein. We therefore designed a synthetic coding sequence for Vpr using the procedure described earlier.
Experimental design
In designing the synthetic 291 base pair coding sequence of Vpr, silent mutations were introduced in ~80% of all codons in order to optimise codon usage towards that of E. coli. In designing the template oligonucleotides for optimal performance in the gene synthesis reaction, it was not necessary to make any silent mutations that would introduce codons of low abundance in E. coli. Thirteen template oligos were designed, ranging in length from 37 to 45 bases, with theoretical melting temperatures ranging from 53 to 55°C.
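A first pass of the codon optimisation described above can be sketched as synonymous-codon substitution. This is only an illustration under assumptions: the hypothetical table below maps the codons commonly described as rare in E. coli (AGA, AGG, ATA, CTA, CCC and GGA, the set supplemented in tRNA-supplemented expression strains) to an abundant synonymous codon, whereas the Vpr design changed ~80% of codons and also had to respect the overlap constraints described earlier. The table name and function name are hypothetical.

```python
# Hypothetical table: codons often described as rare in E. coli, each mapped
# to an abundant synonymous codon (amino acid given in the comment).
RARE_TO_COMMON = {
    "AGA": "CGT",  # Arg
    "AGG": "CGT",  # Arg
    "ATA": "ATC",  # Ile
    "CTA": "CTG",  # Leu
    "CCC": "CCG",  # Pro
    "GGA": "GGC",  # Gly
}


def replace_rare_codons(orf: str):
    """Return (new_orf, n_changed), substituting codons in frame from the first base."""
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    new_codons = [RARE_TO_COMMON.get(c, c) for c in codons]
    n_changed = sum(old != new for old, new in zip(codons, new_codons))
    return "".join(new_codons), n_changed


if __name__ == "__main__":
    print(replace_rare_codons("ATGAGAATACTATAA"))  # ('ATGCGTATCCTGTAA', 3)
```

Because every replacement is synonymous, the encoded protein is unchanged; a full design would then revisit individual codons to tune overlap melting temperatures and secondary structure.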
The following 50 μl reaction mixtures were prepared: 0.2 μM 5' and 3' primers; 0.02 μM of each template oligonucleotide; 200 μM dNTPs; 1 mM MgSO4; 2% DMSO; 0.1% Tween; 2 units of Vent DNA polymerase (NEB Inc.). The thermal cycling protocol was fifty cycles of 94°C for 60 s, 50°C for 60 s and 73°C for 15 s.
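The reaction composition above implies a tenfold molar excess of each primer over each template oligonucleotide. A short worked calculation, reading "0.2 μM" as the concentration of each primer (an assumption), uses the identity 1 μM × 1 μl = 1 pmol; the cycling time counts hold times only and ignores ramping between temperatures.

```python
def picomoles(conc_uM: float, volume_ul: float) -> float:
    """Amount of oligonucleotide in pmol (1 uM x 1 ul = 1 pmol)."""
    return conc_uM * volume_ul


VOLUME_UL = 50.0
primer_pmol = picomoles(0.2, VOLUME_UL)     # each primer (assumed 0.2 uM each)
template_pmol = picomoles(0.02, VOLUME_UL)  # each template oligo
n_templates = 13

# Hold times only: 50 cycles of 60 s + 60 s + 15 s, in minutes.
cycling_min = 50 * (60 + 60 + 15) / 60

if __name__ == "__main__":
    print(f"each primer:   {primer_pmol:.1f} pmol")
    print(f"each template: {template_pmol:.1f} pmol "
          f"({n_templates * template_pmol:.1f} pmol in total)")
    print(f"primer:template ratio {primer_pmol / template_pmol:.0f}:1")
    print(f"cycling hold time ~{cycling_min:.1f} min")
```

This gives 10 pmol of each primer, 1 pmol of each template (13 pmol of template in total) and about 112 minutes of hold time.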
Results and Discussion
Synthesis of the codon-optimised coding sequence for Vpr was performed according to the method outlined earlier. The products of the reaction were analysed by agarose gel electrophoresis (Figure 4a). A DNA product of the expected size was obtained from the reaction mixture described above. The DNA band was sharply defined, with definition similar to that of the DNA size markers. This suggests that the Vpr product is double-stranded and essentially homogeneous with respect to length. Several control experiments were also performed, in which: 1) the 5' primer was omitted; 2) the 3' primer was omitted; 3) template oligonucleotide 7 was omitted (Figure 4a). The results of the control experiments are discussed below:
(1) No full-length product was observed when the 5' primer was omitted. This would be expected, since the 5' primer is needed to prime synthesis of the first strand, so the reaction would not be expected to proceed at all.
(2) In this method the 3' primer is expected to prime synthesis of the second strand using the newly synthesised first strand as template. Omitting it would therefore be expected to produce just the first strand. A weak, diffuse band of slightly lower apparent size than the Vpr synthesis product was observed when the 3' primer was omitted, consistent with the presence of a heterogeneous product and of single-stranded DNA. This indicates that the 3' primer is needed in order to produce full-length Vpr of homogeneous length.
(3) When template oligonucleotide 7 was omitted, a smeared band of lower apparent size than full-length Vpr was observed, consistent with partial extension of the first strand to ~150 bases in length. The failure to produce full-length product was expected, since in the absence of the seventh template oligo the first strand cannot be extended further. The smeared bands are also consistent with the presence of single-stranded DNA, which would be expected since the 3' primer cannot anneal to the intermediate-length first strand.
Analysis of Cloned Synthetic Vpr Coding Sequence
The codon-optimised Vpr synthesis product was cloned into pCRBlunt-TOPO according to the manufacturer's protocol (Invitrogen Inc.). Restriction analysis of the cloned fragments indicated that the inserts were of the expected size (Figure 4b). The ability to clone the synthetic product indicates that it comprises double-stranded, blunt-ended DNA as expected. DNA sequence analysis confirmed the sequence identity of the synthetic Vpr insert and indicated that sequence fidelity was of the same order as that of conventional PCR.
Conclusions
The above experiments indicate that the rationale behind the "ladder and snake" gene synthesis method is valid, that the method produces specific DNA products of the expected size and sequence, and that it does so by the chemical mechanism proposed in the present method.
References:
The references mentioned herein are expressly incorporated by reference.
Adams, S. E., Johnson, I. D., Braddock, M., Kingsman, A. J., Kingsman, S. M. and Edwards, R. M. (1988) Nucleic Acids Res., 16, 4287-4298.
Adleman, L. M. et al. (1994) Science, 266, 1021-1024.
Baneyx, F. (1999) Curr. Opin. Biotech., 10, 411-421.
Barnet, R. and Erfle, H. (1990) Nucleic Acids Res., 18, 3094.
Bell, L. D., Smith, J. C., Derbyshire, R., Finlay, M., Johnson, I., Gilbert, R., Slocombe, P., Cook, E., Richards, H., Clissold, P., Meredith, D., Powell-Jones, C. H., Dawson, K. M., Carter, B. L. and McCullagh, K. G. (1988) Gene, 63, 155-163.
Brocca, S., Schmidt-Dannert, C., Lotti, M., Alberghina, L. and Schmid, R. D. (1998) Protein Sci., 7, 1415-1422.
Casimiro, D. R., Wright, P. E. and Dyson, H. J. (1997) Structure, 5, 1407-1412.
Chen, G.-F. T. and Inouye, M. (1994) Genes and Development, 8, 2641-2652.
Commess, K. M., Shewchuck, L. M., Ivanetich, K. and Walsh, C. T. (1994) Biochemistry, 33, 4175-4186.
Crameri et al. (1996) Nature Biotechnology, 14, 315-319.
Edge, M. D., Greene, A. R., Heathcliffe, G. R., Meacock, P. A., Schuch, W., Scanlon, D. B., Atkinson, T. C., Newton, C. R. and Markham, A. F. (1981) Nature, 292, 756-762.
Ferretti, L., Karnik, S. S., Khorana, H. G., Nassal, M. and Oprian, D. D. (1986) Proc. Natl. Acad. Sci. USA, 83, 599-603.
Graham, R. W., Atkinson, T., Kilburn, D. G., Miller, R. C. and Warren, R. A. J. (1993) Nucleic Acids Res., 21, 4923-4928.
Higuchi, R., Krummel, B. and Saiki, R. (1988) Nucleic Acids Res., 16, 7351-7367.
Ikemura, T. (1981) J. Mol. Biol., 146, 1-21.
Jayaraman, K. et al. (1989) Nucleic Acids Res., 17, 4403.
Jayaraman, K., Fingar, S. A., Shah, J. and Fyles, J. (1991) Proc. Natl. Acad. Sci. USA, 88, 4084-4088.
Jayaraman, K. and Puccini, C. J. (1992) BioTechniques, 12, 392-398.
Kalman, M. I., Cserpan, I., Bajszar, G., Dobi, A., Horvath, E., Pazman, C. and Simoncsits, A. (1990) Nucleic Acids Res., 18, 6075-6081.
Majumder, K. (1992) Gene, 110, 89-92.
Maruyama, T. et al. (1986) Nucleic Acids Res., 14, 151-197.
Prodromou, C. and Pearl, L. H. (1992) Protein Eng., 5, 827-829.
Rink, H., Manfred, L., Sieber, P. and Meyer, F. (1984) Nucleic Acids Res., 15, 6369-6387.
Sandhu, G. S., Aleff, R. A. and Kline, B. C. (1992) BioTechniques, 12, 14-16.
Stemmer, W. P. C. (1994) Nature, 370, 389-391.
Stemmer, W. P. C., Crameri, A., Ha, K. D., Brennan, T. M. and Heyneker, H. L. (1995) Gene, 164, 49-53.
Withers-Martinez, C., Carpenter, E. P., Hackett, F., Ely, B., Sajid, M., Grainger, M. and Blackman, M. J. (1999) Protein Eng., 12, 1113-1120.