In vitro evolution of enzyme specificity
Field to which invention relates
This invention relates to the field of chemical synthesis and is particularly, though not exclusively, concerned with a method of producing active molecules, such as enzymes with desirable characteristics.
Background to invention
The primary focus of drug production in the pharmaceutical industry is on small-molecule (MW <1000 Da) chemicals, as these compounds are often orally available (Buckland B. C. et al (2000) Metab. Eng. 2: 42-48). An increasing awareness of the alternative activities exhibited by different optical isomers of the same drug has led to an increased pressure to manufacture optically pure therapeutics. Syntheses of such pure compounds by chemical methods often require several expensive protection and de-protection steps to achieve selectivity. By contrast, enzymes achieve higher selectivity in a single step. However, the use of enzymes in the synthesis of complex molecules is currently hindered by the time taken to discover or develop an enzyme with the required substrate specificity, as compared to optimising established chemical transformations. Even with an increasing number of known enzymatic reactions, identifying a suitable biocatalyst is extremely difficult, as the known enzymes often do not show activity towards the desired substrate.
Although forced evolution has shown great potential to enhance the properties of enzymes, particularly for hydrolases, enzymes derived by the method are still not widely used in the manufacture of pharmaceuticals. A major bottleneck is the identification of a suitable enzyme that is capable of the desired reaction. While databases such as ENZYME (Bairoch A. (2000), The ENZYME database in 2000. Nucleic Acids Research 28 pp304-305) usually identify a suitable class of enzyme capable of the desired chemistry, it is much more difficult to find an example of that enzyme with activity towards a particular substrate, due to the high substrate specificity exhibited by most natural enzymes. The chemical route is therefore used in drug production rather than the enzymatic route despite the selectivity drawbacks and typically higher process costs of many chemical catalysts (Thayer, A. M. (2001) Chemical & Engineering News 79: 27-34).
To produce enzymes having improved properties suitable for use in chemical production, mutant genes encoding the proteins have to be created and expressed. In the past, the only way to obtain mutations was to isolate naturally occurring mutants with strain-screening methods. However, the rate of natural mutation is very low, and mutants have to be generated by treatment with mutagenic agents such as chemical mutagens or UV light. Moreover, even if an accurate screening test was available, isolation of mutants that were lethal or that did not produce observable changes was not possible (Zaccolo, M., et al, (1996) J Mol. Biol, 255, 589-603).
Random mutagenesis is now usually achieved by PCR methods. These methods have revolutionised the means by which mutants are obtained because they are more precise and more efficient than phenotypic screening, yielding mutations in 50-100% of the proteins created (Smith, M., (1985) Biochimie, 67, 717-723, & Zoller, M., (1991) Curr. Opin. Biotech., 2, 526-531).
In contrast to natural evolution, directed evolution has a defined goal, which is to generate large pools of molecular variants at the DNA level from which proteins with the desired properties can be selected. As opposed to protein engineering by design, directed evolution does not require prior knowledge of the protein structure (Manuela Zaccolo and Ermanno Gherardi, (1999) J Mol Biol, 285, 775-783). Furthermore, it relies on the fact that proteins can tolerate a number of amino acid residue substitutions without dramatic effect on folding or stability (Axe et al, (1996) Proc. Natl. Acad. Sci. USA, 93, 5590-5594; Bowie et al, (1996) Methods Enzymol, 266, 598-616). The other fact is that natural evolution has only screened a subset of potentially useful sequences, and therefore unexplored sequence space can reveal better solutions to biological problems (Manuela Zaccolo and Ermanno Gherardi, J Mol. Biol, (1999) supra). In practice, the mutational load, that is the number of mutations introduced into the gene, cannot exceed a certain rate, and the size of the mutant DNA libraries is limited to about 10^6 to 10^13 clones depending on the mode of screening or selection.
Random mutations have previously been introduced into a stretch of DNA using non-PCR methods such as chemical mutagenesis, UV irradiation, mutator strains and poisoned nucleotides (Kuchner and Arnold, (1997) Trends Biotechnol., 15, 523-530). These methods are reported to be successful but they suffer from a number of disadvantages. First, the mutations can affect any gene in the organism's genome and the average number of mutated genes of interest can be very low. Second, prior to DNA cloning and sequencing technologies there was no way of knowing the location and the type of mutations being induced into the genome.
There was therefore a pressing need for biologists to refine mutagenesis techniques and combine them with PCR amplification methods in order to generate higher numbers of mutants with more precise mutations. These methods have revolutionized the means by which mutants are obtained (Smith M et al, (1985) Biochimie, 67, 717-723, & Zoller, M.J., (1991) Curr. Opin. Biotech., 2 (4), 526-531).
Among the methods that introduce random mutations throughout the entire gene is error-prone PCR (epPCR). The approach has successfully been applied to engineer new protein functions and to improve catalytic activity. Another technique is combinatorial cassette mutagenesis, which uses oligonucleotides containing randomised codons as mutagenic cassettes for their introduction into the gene of interest by PCR methods (Reidhaar-Olson et al, (1999) Methods Enzymol, 208, 564-586). Moreover, a DNA shuffling method developed for random in vitro DNA recombination represents a significant advance in the applications of directed evolution methods (Stemmer et al, (1994) Nature, 370, 389-391).
A typical cycle of directed evolution starts with the selection of DNA sequences encoding proteins that exhibit, to some extent, the sought-after property. The diversity of the sequences is then increased through the mutagenesis step by introducing random point nucleotide mutations and amplifying the DNA fragment using epPCR. These DNA sequences are then cloned into an expression vector and transformed into competent E. coli cells. A screening procedure is then employed to isolate the transformant E. coli cells containing the mutated PCR amplified DNA fragments encoding proteins with improved characteristics.
The selected sequences are then amplified again so that the mutagenesis, amplification and screening are repeated many times until the proteins with the desired properties or functions are obtained.
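By way of illustration only, the cycle of mutagenesis, screening and re-amplification described above can be sketched as a simple loop. The sketch is a toy model and not the method of any cited reference: the function names, the per-base substitution rate, the library size and the scoring function (similarity to a hypothetical target sequence, standing in for a laboratory screen) are all assumptions introduced for the example.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Return a copy of seq in which each base is substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def toy_screen(seq, target):
    """Hypothetical stand-in for a screen: fraction of positions matching `target`."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def directed_evolution(parent, target, rounds=10, library_size=200, rate=0.02, seed=0):
    """Repeat mutagenesis and screening, carrying the best variant into the next round."""
    rng = random.Random(seed)
    best = parent
    history = [toy_screen(best, target)]
    for _ in range(rounds):
        # Mutagenesis step: build a library of variants of the current best sequence.
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        # The parent is retained so that the best score can never decrease.
        library.append(best)
        # Screening step: select the highest-scoring variant.
        best = max(library, key=lambda s: toy_screen(s, target))
        history.append(toy_screen(best, target))
    return best, history
```

Calling `directed_evolution("ACGT" * 10, "ACGA" * 10)` returns the selected sequence together with the score recorded after each round; because the parent is kept in every library, the recorded scores are non-decreasing.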
Remarkable success has been achieved in the industrial application of directed evolution to improve the activities and thermostabilities of vaccines and pharmaceuticals (Schmidt-Dannert & Arnold, (1999) Trends Biotechnol., 17, 135-136). These successful applications have demonstrated the possibilities for future uses of directed evolution in understanding protein functions and in the production of novel biocatalysts (Gregory L. Moore and Costas D. Maranas, (2000) J Theor. Biol., 205 (3), 483-503).
The technique of DNA shuffling creates gene libraries containing combinations of mutations derived from a set of homologous DNA sequences or arising as a result of point mutations (Kuchner and Arnold, (1997) Trends Biotechnol., 15, 523-530). Recombination serves to promote positive traits and to eliminate negative traits in the progeny, resulting in a rapid accumulation of beneficial mutations in separate genes.
In the error-prone PCR method, the gene of interest is amplified through many PCR cycles under conditions that increase normal mis-incorporation errors. The error-prone PCR replication process (Cadwell & Joyce, (1994) PCR Meth. Appl., 3(6), S136-S140) intentionally introduces copying errors by imposing mutagenic reaction conditions, based on the parameters below:
1. The error rate (fidelity) of the Taq polymerase.
2. The length of the mutagenised gene and the number of effective doubling cycles.
3. The concentration of MnCl2 affecting the rate of mutations induced by the Taq polymerase.
The first step of PCR is the denaturation of the DNA into 2 single strands. The second step is the annealing of a primer to the DNA single strands. The third step is the extension by Taq polymerase.
Nucleotides complementary to the single strand template are added by using the original sequence as a template, extending the complementary strands until the normal DNA double strands are recovered. Most mutations occur in this step, where non-complementary nucleotides are incorporated into the chain. The mutation rates induced by Taq range from 10^-7 up to 10^-3 per nucleotide polymerized, as reported by Eckert & Kunkel, (1990) Nucleic Acids Research, 18, 3739-3744. However, these mutations are nucleotide dependent (Cadwell & Joyce, (supra); Shafikhani et al, (1997) Biotechniques, 23, 304-310). Therefore the monitoring of these variable replication errors is crucial for the mutagenesis.
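By way of illustration only, the influence of the error rate, the gene length and the number of effective doublings on the mutational load can be approximated numerically. The sketch below assumes, as a first-order simplification that is not taken from the cited references, that errors accumulate linearly with the number of doublings and that the number of mutations per molecule follows a Poisson distribution. The function names are hypothetical.

```python
import math


def expected_mutations(error_rate, gene_length, doublings):
    """Approximate mean number of mutations per molecule.

    Assumes errors accumulate linearly: error rate per nucleotide polymerized
    x gene length in nucleotides x number of effective doublings.
    """
    return error_rate * gene_length * doublings


def fraction_with_k_mutations(mean, k):
    """Poisson estimate (an assumption) of the fraction of molecules carrying exactly k mutations."""
    return math.exp(-mean) * mean ** k / math.factorial(k)
```

For example, `expected_mutations(1e-4, 1000, 20)` gives the approximate mean load for a 1000-nucleotide gene after 20 doublings at an error rate of 10^-4, and `fraction_with_k_mutations(mean, 0)` then estimates the share of the library that remains unmutated.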
Examples of applications of epPCR already in use include improved solvent stability and thermostability, and enhanced specific activity, of enzymes and proteins (Chen and Arnold, (1993) Proc. Natl. Acad. Sci. USA, 90 (12), 5618-5622; Giordano et al, (1999) Biochemistry-US, 38 (10), 3043-3054; Heneke and Bornscheuer, (1999) Biol. Chem., 380, (7-8), 1029-1033; Moore and Arnold, (1996) Nat. Biotechnol, 14, 458-467; Shibata et al, (1998) Protein Eng., 11 (6), 467-472; You and Arnold, (1996) Protein Eng., 9 (1), 77-83). Therefore specific protein activities and functions that never occur in nature can be easily generated.
The method of epPCR is subject to several disadvantages. First, there is a tendency for neutral and deleterious mutations to accumulate in the selected progeny sequences, increasing with the number of cycles of epPCR. Second, the random mutagenesis is directed across the entire gene sequence in the hope that a mutation resulting in the desired improvement will be found within the generated library. As a result, most mutants that are formed are non-functional (deleterious) or have no effect on the desired property (neutral). Given the restrictions on the library size that can be searched in a practical manner, these redundant mutations further limit the useful sequence space that can be searched.
Combinatorial cassette mutagenesis is sometimes used to mutate selected sequences and therefore reduces the sampling of redundant sequence space. However, the level of mutation can no longer be varied using the method, unless new DNA primers are synthesized for each mutational load required. Control of mutational load is desired in order to control the generated sequence space, and hence the library size, to within a practical limit. It is also extremely difficult to generate combinatorial cassettes that contain only non-disruptive mutations, i.e. encoding only amino acids with similar physico-chemical properties to that of the original wild-type amino acid. Altering an amino acid in a protein sequence frequently disrupts the structure or function of the protein, resulting in the need to search again through redundant sequence space.
Our copending PCT application WO 03/004595, filed 5th July 2002, discloses an improved method of epPCR which allows hyper-mutation by "focused error-prone PCR" at a specific and selected active site of a nucleic acid or polypeptide. The approach is analogous to phage-display libraries and the natural repertoire of antibodies, in that only the active site of the displayed protein is randomised. Phage-display has been used successfully to obtain tighter ligand binding, and also to obtain "catalysis" of a reaction. However, phage-display relies on the linkage between catalysis and a binding property in order to select variants from a large library, thus limiting its potential to the selection of single turnover events and not true catalysis. Focused error-prone PCR is a hybrid approach that combines a direct assay for nucleic acid or polypeptide function and the focused sequence randomization of phage display.
Focused epPCR is a novel method based on conventional epPCR in which a nucleic acid fragment is amplified by PCR using Taq polymerase. Taq polymerase has a low fidelity of replication due to a lack or reduction of 3'-5' exonuclease proof-reading activity. The rate of mutagenesis and hence the mutational load can be altered by varying the concentration of Mn2+ in the PCR reaction according to previously established equations (Fromant et al, (1995) Anal. Biochem., 224, 347-353). During amplification, the sequence between the two PCR primers becomes mutated at random and the sequence of the primers can also become mutated at random, although generally to a lesser degree than the sequence between the primers.
Focused epPCR takes advantage of the fact that the majority of residues important for nucleic acid function (e.g. promoter or enhancer activity) or polypeptide function (e.g. catalysis or substrate binding) make up only a small proportion of the entire nucleic acid sequence or protein in most cases (Clackson et al, (1998) J Mol Biol., 277, 1111-1128). The primers for PCR are thus chosen to complement the sequence at either side of these short and specific regions to be randomised. The primers may complement the sequences within the short and specific regions or may complement sequences just outside the short specific regions to be amplified. The result is that only those regions of the nucleic acid or polypeptide comprising the active site are randomized.
In particular the copending application discloses a method of randomly modifying a specific region of a functional nucleic acid sequence or polypeptide sequence while maintaining the remaining sequence so as to arrive at a functional nucleic acid or polypeptide with improved characteristics.
Specifically, the copending application discloses a method of producing a modified polypeptide with improved characteristics comprising the steps of:
(a) obtaining nucleic acid primers which flank an active site within a parent nucleic acid sequence encoding a parent polypeptide;
(b) carrying out a polymerase chain reaction (PCR) using said primers and the parent nucleic acid sequence as a template under suitable conditions for introducing mutations into the amplified active site sequence;
(c) isolating said mutated active site;
(d) introducing said mutated active site into the parent nucleic acid sequence to replace the non-mutated active site thereby producing a modified nucleic acid sequence, or introducing said mutated active site into a template nucleic acid sequence to produce a modified nucleic acid; and
(e) expressing said modified nucleic acid sequence to produce a modified polypeptide.
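By way of illustration only, the effect of confining mutagenesis to the region flanked by the primers can be sketched on a plain DNA string. The sketch is a simplification under stated assumptions: the primer positions are given as hypothetical index values `start` and `end`, every position in the window is mutated with the same probability, the flanking sequence is copied unchanged, and string splicing stands in for the isolation and re-introduction of steps (c) and (d). Expression of the modified sequence, step (e), is not modelled.

```python
import random

BASES = "ACGT"


def focused_mutagenesis(parent, start, end, rate, rng):
    """Mutate only parent[start:end]; the flanking sequence is returned unchanged."""
    region = parent[start:end]
    mutated = "".join(
        # Substitute a different base with probability `rate`, otherwise keep the base.
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in region
    )
    # Re-introduce the mutated region in place of the original one.
    return parent[:start] + mutated + parent[end:]
```

A library can then be built by calling the function repeatedly, for example `[focused_mutagenesis(gene, 30, 60, 0.05, rng) for _ in range(1000)]` with `rng = random.Random(1)`, where `gene` and the window 30 to 60 are placeholders chosen for the example.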
Forced evolution has previously been used to alter the specificity of an enzyme towards substrates that were already poorly accepted by the wild-type enzyme (Arnold, F. H. et al (1999) Curr. Opin. Chem. Biol, 3: 54-59; May, O. et al (2000) Nat. Biotechnol. 18: 317-320). Ellington and co-workers have since demonstrated that substrate specificity initially broadens as an enzyme is evolved for improved activity towards poorly accepted substrates (Matsumura, I. et al (2001) J Mol. Biol. 305: 331-339). Specifically, the Escherichia coli β-glucuronidase (GUS) was evolved through three rounds of DNA shuffling and screening in vitro to catalyze the hydrolysis of a β-galactoside substrate 500 times more efficiently (kcat/Km) than the wild-type GUS, which has only weak β-galactosidase activity, with a 52 million-fold inversion in specificity. The kinetic behaviour of the purified mutant proteins in reactions with a series of substrate analogues showed that certain mutations account for the changes in substrate specificity, and that they were synergistic. They noted that, during a forced evolution experiment, a second evolutionary intermediate of GUS, unlike the wild-type and evolved forms, exhibited broadened specificity for the substrates pNP-fucoside and oNP-galactoside-6-phosphate, which are dissimilar to either glucuronides or galactosides and on which the wild-type GUS showed no detectable activity. These results were indicated as being consistent with the "patchwork" hypothesis, which postulates that modern enzymes diverged from ancestors with broad specificity. The minor changes in substrate are illustrated in the accompanying drawing Fig 1. In other words, previously undetectable activity towards novel substrates was obtained unintentionally during forced evolution, where the new substrate differed by a few small chemical structure changes that could be accepted by the GUS enzyme after a single mutation (N566S) in one round of random or directed mutagenesis.
In one aspect of the present invention, substrate specificity of an enzyme is intentionally changed to act on a desired substrate where the wild-type enzyme shows substantially no previously detectable activity. By repeating this several times on sequential substrates an enzyme's substrate specificity can be directed along an "evolutionary pathway" comprising stepwise modifications of the substrate structure. This can then be repeated with each new enzyme on substrates that are increasingly distant in structural and chemical-space from the original substrate. By making stepwise modifications of the substrate it is possible to generate an enzyme which has activity against a desired substrate that differs from the substrate of the wild-type enzyme to such a degree that it would be substantially impossible to generate an enzyme having activity against the desired substrate by randomly mutating the wild-type enzyme and selecting for activity against the desired substrate.
It is an object of the invention to provide an efficient method of developing active molecules, particularly enzymes with improved properties and, in particular, specificity for a substrate of commercial interest.
Disclosure of the invention
According to one aspect of the invention there is provided a method of producing new active molecules, the method comprising:
i) a first round comprising mutating a nucleic acid encoding a starting active molecule, which has activity against a starting substrate, to produce one or more second active molecules and detecting activity of the one or more second active molecules against a second substrate on which the starting active molecule has substantially no activity, and selecting one or more second active molecules which have activity against the second substrate; and
ii) a subsequent round comprising mutating a nucleic acid encoding the one or more selected second active molecules to produce one or more third active molecules and detecting the activity of one or more third active molecules against a third substrate on which the second active molecule has substantially no activity, and selecting one or more third active molecules which have activity against the third substrate,
wherein the third substrate is sufficiently different in structure from the starting substrate so that the starting active molecule will have no activity against the third substrate and it would be substantially impossible to obtain an active molecule having activity against the third substrate by performing a single round of random mutagenesis on the starting active molecule.
The basis of the invention is that active molecules against the final substrate (e.g. the third substrate) can only be obtained by using one or more intermediate substrates. The distance between the structure (and chemical space) of the starting substrate and the final substrate is so large that it would be practically impossible to obtain an active molecule having activity against the final substrate by performing a single round of random mutagenesis using current methods. The degree of mutagenesis of the active molecule required would be so large, and the number of mutants generated consequently so large, that it would be practically impossible to screen them in an efficient manner.
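By way of illustration only, the growth in the number of mutants with the degree of mutagenesis can be estimated by simple counting. The sketch below counts the variants of a protein that carry exactly k amino acid substitutions, assuming 19 alternative residues at each substituted position, and compares the count with a library capacity such as the 10^6 to 10^13 clones mentioned in the background. The function names and the 300-residue example are hypothetical.

```python
import math

AMINO_ACID_ALTERNATIVES = 19  # residues other than the one already present


def count_variants(protein_length, substitutions):
    """Number of distinct variants carrying exactly `substitutions` amino acid changes."""
    return math.comb(protein_length, substitutions) * AMINO_ACID_ALTERNATIVES ** substitutions


def max_screenable_substitutions(protein_length, library_capacity):
    """Largest k for which every variant with exactly k substitutions fits in the library."""
    k = 0
    while k < protein_length and count_variants(protein_length, k + 1) <= library_capacity:
        k += 1
    return k
```

For a hypothetical 300-residue protein, all single substitutions amount to 5,700 variants, whereas the count for four simultaneous substitutions already exceeds 10^13, so that `max_screenable_substitutions(300, 10**13)` is 3.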
The term "a single round of directed or random mutagenesis" means that the starting active molecule can be randomly mutated using any current random mutagenic technique, including directed evolution, DNA shuffling, etc., and that the mutated active molecules produced are only tested for activity against the final substrate.
The present method provides a more efficient method for obtaining active molecules having activity against a desired substrate as a series of substrates bridging the structural gap between the starting substrate and the final substrate are used.
The active molecule can be any active molecule encoded by a nucleic acid which has a detectable activity against a substrate. The active molecule may be DNA, RNA or protein including peptides. Preferably, the active molecule is an enzyme, a receptor, an antibody molecule, an antigen, a ligand for a receptor or a substrate for an enzyme. Suitable receptors include cytokine receptors, ion channels, etc. Suitable ligands include cytokines and hormones. Most preferably, the starting active molecule will be an enzyme.
The substrate may be any substrate on which the active molecule has a detectable activity. The substrate may be a protein, a nucleic acid, a carbohydrate, an imprinted polymer, an organic or inorganic chemical compound. For example, when the active molecule is an enzyme, the substrate is an enzyme substrate; when the active molecule is a receptor, the substrate is a receptor ligand; when the active molecule is a receptor ligand, the substrate is a receptor; when the active molecule is an antibody, the substrate is an antigen, etc. Preferably the substrate is an enzyme substrate.
The activity can be any detectable activity, such as an enzymatic activity, which can be measured by detecting the product or by-product of the enzymatic reaction. The detectable activity may be the binding of the active molecule to the substrate, which can be measured by detecting the bound complex or the amount of free active molecule or substrate. It is particularly preferred that the active molecules identified as having activity against the substrate have greater activity against the substrate than the previous active molecule. The active molecules preferably have at least 2 times, more preferably at least 10 times and most preferably at least 100 times as much activity against the substrate as the previous active molecule.
Therefore the invention provides a method for forcing the evolution of active molecules such as enzymes which use different substrates. The invention provides the foundation of a technology by which active molecule variants with the desired substrate specificity can be identified while preserving the high selectivity that the active molecules typically achieve. The method of the invention would greatly reduce the time taken to develop a suitable biocatalytic route for pharmaceuticals, agrochemicals and selected fine chemicals.
Basically, the invention provides a method of producing new active molecules comprising at least 2 rounds of mutating a nucleic acid encoding an active molecule and selecting encoded active molecules which have activity against a substrate, wherein from round to round the substrate used differs from the previously used substrate by one or more minor structural or chemical differences.
Although the mutation step can be carried out using any suitable method known to the skilled worker, including the random mutagenesis processes described above, according to a preferred aspect of the invention, the mutation of the nucleic acid is performed using the focused error-prone PCR technique described above. Specifically, this aspect of the invention provides a method of producing new active molecules, the method comprising:
i) a first round comprising mutating a nucleic acid encoding a starting active molecule, which has activity against a starting substrate, to produce one or more second active molecules and detecting activity of the one or more second active molecules against a second substrate on which the starting active molecule has substantially no activity, and selecting one or more second active molecules which have substantial activity against the second substrate; and
ii) a subsequent round comprising mutating a nucleic acid encoding the one or more selected second active molecules to produce one or more third active molecules and detecting the activity of one or more third active molecules against a third substrate on which the second active molecule has substantially no activity, and selecting one or more third active molecules which have substantial activity against the third substrate,
in which the mutation steps in one or more such rounds are conducted by
(a) obtaining nucleic acid primers which flank an active site within a parent nucleic acid sequence encoding a parent active molecule;
(b) carrying out a polymerase chain reaction (PCR) using said primers and the parent nucleic acid sequence as a template under suitable conditions for introducing mutations into the amplified active site sequence;
(c) isolating said mutated active site;
(d) introducing said mutated active site into the parent nucleic acid sequence to replace the non-mutated active site thereby producing a modified nucleic acid sequence, or introducing said mutated active site into a template nucleic acid sequence to produce a modified nucleic acid; and
(e) expressing said modified nucleic acid sequence to produce a modified active molecule,
wherein the third substrate is sufficiently different in structure from the starting substrate so that the starting active molecule will have no activity against the third substrate and it would be substantially impossible to obtain an active molecule having activity against the third substrate by performing a single round of random mutagenesis on the starting active molecule.
The method of the invention may comprise further rounds of mutation involving further substrates and active molecules and testing the activity of the active molecules produced during each round sufficient to provide active molecules which act on substrates to effect reactions which are of significant commercial value. The method of the invention may involve any number of rounds of mutation and testing. Typically, it may involve 2 to 10 rounds. The number of rounds required will depend on the difference in structure between the starting substrate and the final substrate.
Typically the third and subsequent active molecules will be produced as part of a selection of mutated active molecules which are then selected on the basis of their ability to act on a specified substrate. In particular, when the active molecules are enzymes they are selected on the basis of their ability to catalyse a reaction of interest on a specified enzyme substrate.
The invention requires that the previous active molecule (e.g. the starting active molecule or the second active molecule) has substantially no activity on the new substrate (e.g. the second substrate or the third substrate, respectively). The term "substantially no activity" as used herein means that the previous active molecule may be active against the new substrate at a level of 0 to 50%, preferably 0 to 20%, more preferably 0 to 10%, and most preferably 0 to 5% of its activity against the previous substrate. As a specific example, the second active molecule may be active against the third substrate at a level of 0 to 50% of its activity against the second substrate.
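By way of illustration only, the percentage ranges defining "substantially no activity" can be applied to measured activities as follows. The sketch assumes that the two activities are expressed in the same units; the function names and the tier labels are hypothetical and merely mirror the preference levels recited above.

```python
def relative_activity_percent(activity_on_new, activity_on_previous):
    """Activity on the new substrate as a percentage of activity on the previous substrate."""
    if activity_on_previous <= 0:
        raise ValueError("activity on the previous substrate must be positive")
    return 100.0 * activity_on_new / activity_on_previous


def no_activity_tier(percent):
    """Return the narrowest recited range that contains `percent`."""
    tiers = (
        (5.0, "most preferred (0 to 5%)"),
        (10.0, "more preferred (0 to 10%)"),
        (20.0, "preferred (0 to 20%)"),
        (50.0, "substantially no activity (0 to 50%)"),
    )
    for upper_limit, label in tiers:
        if 0.0 <= percent <= upper_limit:
            return label
    return "outside the definition"
```

For example, an active molecule showing 3 units of activity on the new substrate and 100 units on the previous substrate has a relative activity of 3% and falls within the most preferred range.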
It is indicated that the starting active molecule will have no activity against the third substrate. This means that the starting active molecule will have no detectable activity against the third or subsequent substrate. It is also preferred that any active molecule will have no detectable activity against a substrate used 2 or more rounds later in the method of the present invention. Such subsequent substrates differ in structure from the substrate on which the active molecule does have activity to such a degree that the active molecule will not have any detectable activity against the subsequent structures.
The new active molecule must have activity against the new substrate but may also have activity against one or more of the previous substrates. For example, the third active molecule must have a detectable activity against the third substrate and may have activity against the second substrate. In some situations it may be desirable for the new active molecule to have activity against both the new substrate as well as one or more of the previous substrates. The method of the present invention can therefore be easily modified by testing the new active molecule on the new substrate as well as on one or more of the previous substrates so that active molecules with the desired activity can be selected. By using this method active molecules with an increased range of substrates can be obtained.
Alternatively, the new active molecule can be selected so that it does not have activity against the previous substrates or only has minimal activity against the previous substrates, for example less than 10%, more preferably less than 5%, of the activity of the new active molecule against the new substrate. Using this method an active molecule with a very narrow range of substrates can be obtained.
As indicated above, the active molecule is preferably an enzyme. Enzymes for use as a starting point in the method of the invention, the first enzyme, can be readily selected by the skilled addressee, for example from databases such as ENZYME. One suitable type of enzyme is the transketolases. Other enzymes can be selected from the other classes of enzymes, i.e. the oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. As indicated above, the invention can also be extended to ligand binding to any protein scaffold.
The substrates used in the invention may be selected by the skilled addressee. Preferably each substrate is chosen to have one or more minor differences over the previous substrate. For example, and where the substrates are chemical compounds, the two substrates may differ by the substitution of a hydrogen with a methyl group or the substitution of a methyl group by a hydroxyl group, the removal or addition of a chemical group (e.g. RCH3 to RCH2CH3 or the opposite), inversion or addition of chemical groups around a central atom (e.g. S enantiomer to R enantiomer), oxidation or reduction of a chemical group (e.g. NO2 to NH2 or the reverse), change in bonding by oxidation or reduction (e.g. single bond to double or triple bond, alkane to alkene or alkyne; or the reverse), etc.
Where the substrates are proteins, the two substrates may differ by a few amino acids (e.g. 1 to 10 amino acids). The modifications made will depend on the activity being sought. For example, modifications can be made to amino acids in a protein to alter the structure of the protein, or to alter the active site of the protein. Preferably the few amino acids that differ interact directly with the active molecule. Those skilled in the art could determine what modifications to make in order to achieve the substrate having the desired function and/or structure. Furthermore, standard techniques for altering the amino acids of a protein are well known to those skilled in the art. Where the substrates are nucleic acids, the two substrates may differ by a few nucleotides (e.g. 1 to 10 nucleotides). As indicated above with respect to the protein substrates, the modification made will depend on the activity being sought. Standard techniques for altering the nucleotide sequence of a nucleic acid are well known to those skilled in the art.
The substrates used in the invention are preferably a series of intermediates wherein stepwise modifications of the substrate's structure are made. The starting active molecule is active against a starting substrate and the new active molecule produced by the method of the present invention is active against a desired substrate. Each subsequent substrate used in the method of the invention is modified to be closer in structure to the desired substrate than the previous substrate. Accordingly, each substrate used in the method can be seen as a stepping stone from the starting substrate to the desired substrate. By using such a series of substrates, the "evolution" of the active molecule is directed toward the desired substrate so that an active molecule having activity against the desired substrate is obtained.
The desired or final substrate is so different from the starting substrate that it is substantially impossible to obtain an active molecule having activity against the desired substrate by performing a single round of random mutagenesis. Only by using a series of substrates that differ from each other by relatively minor differences is it possible to efficiently obtain an active molecule having activity against the desired substrate.
It is preferred that the differences between each subsequent substrate are relatively small so that active molecules having activity against the new substrate can be obtained. Preferably, each subsequent substrate differs by less than 20%, more preferably less than 10%, from the previous substrate. Where the substrate is a chemical compound, it is preferred that less than 20%, more preferably less than 10%, of the substituent groups are changed. Where the substrate is a protein, it is preferred that less than 20%, more preferably less than 10%, of the amino acids are changed. Where the substrate is a nucleic acid, it is preferred that less than 20%, more preferably less than 10%, of the nucleotides are changed.
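The percentage preference above can be illustrated for protein or nucleic acid substrates with a minimal sketch. It assumes, purely for illustration, that the two substrates are equal-length sequences compared position by position; the function names and the example sequences are hypothetical and are not part of the described method.

```python
def fraction_changed(previous: str, candidate: str) -> float:
    """Fraction of aligned positions that differ between two sequences."""
    # Assumption of this sketch: both sequences have the same, non-zero length.
    if len(previous) != len(candidate):
        raise ValueError("this sketch only compares sequences of equal length")
    if not previous:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(previous, candidate) if a != b)
    return differences / len(previous)


def classify_step(previous: str, candidate: str) -> str:
    """Label a substrate step against the 'less than 20%' and 'less than 10%' preferences."""
    changed = fraction_changed(previous, candidate)
    if changed < 0.10:
        return "more preferred"
    if changed < 0.20:
        return "preferred"
    return "outside preferred range"


if __name__ == "__main__":
    # Hypothetical 20-residue peptide substrates differing at one position (5%).
    print(classify_step("ACDEFGHIKLACDEFGHIKL", "ACDEFGHIKLACDEFGHIKV"))
```

For chemical compounds the same thresholds would apply to the proportion of substituent groups changed, which this sequence-based sketch does not attempt to model.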
The new active molecules produced by the method of the present invention can be used to perform a reaction of interest. The reaction of interest may be the binding of a specific ligand or receptor, the digestion of a carbohydrate or a particular peptide or nucleotide sequence, protecting a particular substrate from degradation, etc. Particularly preferred reactions of interest include the isolation of optical isomers of compounds that have therapeutic value. Other preferred reactions of interest include reactions that produce compounds that are high-value intermediates to pharmaceuticals, fine chemicals or agrochemicals; reactions that degrade toxic compounds (bioremediation); and reactions towards analytes, e.g. using enzymes in diagnostics or biosensors.
Once an enzyme has been produced which is able to catalyse a reaction of commercial interest it can be further modified to improve its properties such as thermal stability, kinetic activities, etc.
The method of the invention may be automated.
According to another aspect of the invention there is provided an active molecule such as an enzyme obtained or obtainable by a method in accordance with the invention. Preferably the enzyme is obtained by the method of the present invention.
According to a further aspect of the invention there is provided a method of producing a compound of interest, the method involving the use of an active molecule, such as an enzyme, produced by the method according to the present invention in the production of the compound.
Brief Description of the Drawings
Methods in accordance with the invention will now be described, by way of example only, with reference to the further accompanying drawings, Figures 2 to 3, in which:
Fig 2 shows a series of substrates that can be used in a method according to the invention; and
Fig 3 shows an assayable reaction scheme for wild-type transketolase.
Examples
Forced evolution of transketolase and selection of modified enzymes
Transketolase is an important enzyme for the synthesis of asymmetric C-C bonds with a new chiral carbon centre with up to 100% selectivity. Transketolase thus has potential use in the synthesis of compounds such as novel sugars, amino-acids, peptides and polyketides for new antibiotics, antihypertensive vasopeptidase inhibitors, HIV protease inhibitors, antiviral medication, treatments for rheumatoid arthritis, and antitumor compounds (Krix, G. et al. (1997) Journal of Biotechnology. 53: 29-39; Szarka, L. et al (1999) Bioorganic and Medicinal Chemistry 7: 2247-2252; Bommarius, A et al. (1998) Journal of Molecular Catalysis B: Enzymatic 5: 1-11). The substrate specificity of transketolase has previously been altered only by site-directed mutagenesis studies (J Ward (UCL) unpublished), and only for the aldehyde acceptor substrate. The structure of the transketolase enzyme is available (Littlechild, J. A. et al (1995) Acta Crystallographica Section D-Biological Crystallography 51: 1074-1076). The donor substrate binds deep into the transketolase active-site to form many specificity-determining contacts. This makes transketolase a good test system for demonstrating the method of this invention, permitting the expansion of the substrate specificity for the donor ketol substrate, which is currently limited to sugars and β-hydroxypyruvate.
The series of substrates shown in Figure 1 is an example of such a series that can be used, though there are many other commercial compounds that can extend this series as necessary. Forced evolution techniques such as error-prone PCR, mutator strains and DNA shuffling are preferably used to alter the specificity of transketolase at each 'step' of the substrate series.
The first step in the series to be explored is a change in polarity from a hydroxyl to a methyl group, without a significant change in size. Evolution of the transketolase enzyme is likely to produce a corresponding change from a hydrogen-bonding residue in the enzyme structure to a hydrophobic one. The hydroxyl group of the β-hydroxypyruvate has no direct influence on the mechanism of transketolase, although its presence may slightly diminish activation of the ketone. The following illustrative steps, changing the substrate from 2-ketobutyric acid to 2-ketovaleric acid and then to 2-ketoisocaproic acid, involve an increase in the size and branching of the hydrophobic side-chain of the substrate.
Evolution of the enzyme may then increase the cavity size of the active site of the resulting mutated enzyme complementary to this region of the substrate.
The set of new substrates S1 to S3 (Figure 1) are first assessed for their activity towards the wild-type transketolase enzyme. The sequence of wild-type TK is given at Genbank accession no. NP_417410 and GI:16130836 from E. coli K12. Ref: Sprenger GA (1991). Transketolase from E. coli K-12 - DNA-sequence of the gene and purification of the enzyme from recombinant strains. Biol Chem. 372(9): 759-759. A suitable assay uses β-hydroxypyruvate and glycolaldehyde as the ketol and aldehyde substrates respectively (Figure 3). The change in absorbance of Cresol-red indicator is monitored as the pH increases due to consumption of the acidic β-hydroxypyruvate substrate. All of the new substrates contain this acidic carboxylate group as it is necessary for the reaction to proceed irreversibly. An HPLC assay for TK has been produced (C. Ingram, P. Dalby, G. Lye, unpublished) and may be used to design a chiral HPLC assay for verifying the product enantioselectivity. A suitable chiral HPLC system uses Chromtech columns.
Two libraries of transketolase mutants in E. coli BL21 DE3 (O. Miller, G. Lye & P. Dalby, unpublished) are produced, one by focussed epPCR, the other using the XL1-Red mutator strain (Stratagene), and are maintained as glycerol stocks in 96-well deep-well plates at -70°C. For screening, the glycerol stock plates are replicated into another deep-well plate with 0.5 ml LB medium containing 50 μg/ml of Ampicillin. The replicated plates are incubated overnight at 37°C with shaking. The library is assessed in 96-well microplates for activity towards all three new substrates S1, S2 and S3 (Figure 1) using an absorbance microplate reader and the assay shown in Figs. 2 and 3. For the assay, each micro-well contains 150 μl of lysed (1 cycle of freeze-thaw using -70°C freezer) culture; 50 μl of 60 mg/l Cresol red pH-indicator solution; 50 μl of 300 mM glycolaldehyde solution and 50 μl of 600 mM ketol donor solution (substrates S1, S2, S3, etc). The reaction is followed by absorbance at 560 nm using a Fluostar Optima (BMG Labtechnologies) plate reader for at least 2 hours. This allows us to identify new enzyme variants capable of activity towards S1, and simultaneously assess whether activity can be obtained towards S2 or S3 in a single round of error-prone PCR (average of one mutation per gene). This will characterise the extent of substrate structural-space that can be explored in a single round of forced evolution. The enzyme variants that show activity on the substrate with the greatest modification (S3>S2>S1) are then isolated for further rounds of evolution and characterisation by sequencing. If activity is obtained directly on S3 in the first round, further commercial substrates can be chosen with even greater modifications as a new S2 and S3 (e.g. benzeneglyoxylic acid or imidazolepyruvic acid).
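As a worked illustration of the microplate assay composition described above, the following minimal sketch computes the final in-well concentration of each reagent. It assumes that the four listed volumes are simply additive (300 μl per well) and that the lysed culture contributes none of the listed reagents; both are assumptions of this sketch, and the variable and function names are hypothetical.

```python
def final_concentration(stock: float, volume_ul: float, total_ul: float) -> float:
    """Dilute a stock concentration into the total well volume (same units as stock)."""
    return stock * volume_ul / total_ul


# Volumes added to each micro-well, in microlitres, as stated in the text.
volumes_ul = {
    "lysed culture": 150.0,
    "cresol red": 50.0,
    "glycolaldehyde": 50.0,
    "ketol donor": 50.0,
}

# Stock concentrations as stated in the text (cresol red in mg/l, the others in mM).
stocks = {
    "cresol red": 60.0,
    "glycolaldehyde": 300.0,
    "ketol donor": 600.0,
}

# Assumption: volumes are additive, giving the total well volume.
total_ul = sum(volumes_ul.values())

final = {
    name: final_concentration(stock, volumes_ul[name], total_ul)
    for name, stock in stocks.items()
}

if __name__ == "__main__":
    print(f"total well volume: {total_ul} ul")
    for name, value in final.items():
        print(f"{name}: {value}")
```

Under these assumptions each reagent is diluted six-fold on addition to the well.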
The best mutant(s) isolated above are then subjected to a second round of forced evolution using DNA shuffling or error-prone PCR and similarly assessed for activity towards S0, S1, S2 and S3. This characterises the extent to which the substrate specificity is broadening or narrowing, and also the extent to which activity towards a new substrate (S2 or S3) is being obtained. The best mutants are then isolated and characterised.
This iterative process is then repeated until activity towards S3 can be enhanced no further in terms of high activity and selectivity. The approach outlined, whereby activity towards all substrates is continually assessed, allows the identification of the optimum process for evolving specificity towards the final substrate, for example whether evolution towards each new substrate can be attempted in successive rounds, or whether some enhancement must be obtained for each substrate before proceeding to the next one.
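The selection step used in each round, in which variants active on the most modified substrate are carried forward (S3>S2>S1), can be sketched as follows. This is an illustrative sketch only: the activity values, the detection threshold, the variant names and the function names are all hypothetical, and the text does not specify how a positive signal is scored.

```python
from typing import Dict, List, Tuple

# Substrates in order of increasing modification relative to the starting substrate.
SUBSTRATE_ORDER = ["S1", "S2", "S3"]


def most_modified_active(activities: Dict[str, float], threshold: float) -> int:
    """Index of the most modified substrate with activity above threshold, or -1 if none."""
    best = -1
    for index, substrate in enumerate(SUBSTRATE_ORDER):
        if activities.get(substrate, 0.0) > threshold:
            best = index
    return best


def rank_key(activities: Dict[str, float], threshold: float) -> Tuple[int, float]:
    """Sort key: most modified active substrate first, then the signal on that substrate."""
    rank = most_modified_active(activities, threshold)
    signal = activities.get(SUBSTRATE_ORDER[rank], 0.0) if rank >= 0 else 0.0
    return (rank, signal)


def select_variants(screen: Dict[str, Dict[str, float]], threshold: float, keep: int) -> List[str]:
    """Return up to `keep` variants, preferring activity on S3 over S2 over S1."""
    active = [name for name, acts in screen.items()
              if most_modified_active(acts, threshold) >= 0]
    active.sort(key=lambda name: rank_key(screen[name], threshold), reverse=True)
    return active[:keep]


if __name__ == "__main__":
    # Hypothetical absorbance changes per variant (arbitrary units, invented for illustration).
    screen = {
        "A1": {"S1": 0.30, "S2": 0.02, "S3": 0.01},
        "B7": {"S1": 0.10, "S2": 0.15, "S3": 0.00},
        "C3": {"S1": 0.01, "S2": 0.01, "S3": 0.02},
        "D9": {"S1": 0.05, "S2": 0.00, "S3": 0.12},
    }
    print(select_variants(screen, threshold=0.05, keep=2))
```

The selected variants would then serve as the parents for the next round of mutagenesis and screening against the same set of substrates.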
The products of each reaction for the best mutants obtained can also be confirmed at each round of evolution using mass-spectrometry. The sequences of the obtained enzymes can be used to rationalise mutations with the observed changes in substrate specificity, where possible, by comparison with the wild-type transketolase structure.
Although methods of the invention have been described with reference to transketolase, which has great potential in the synthesis of asymmetric C-C bonds with a new chiral centre, especially for novel sugar-based pharmaceuticals such as HIV protease inhibitors, and anti-viral, anti-rheumatoid arthritis, and anti-tumour compounds, the method of the invention can be readily applied to increasing the synthetic repertoire of other enzymes, whether transketolases or from the other classes of enzymes mentioned above.
This new approach will allow the substrate specificity of an active molecule, e.g. an enzyme, to be evolved beyond the presently perceived limitations. Forced evolution can currently be used to improve upon the activity of an existing active molecule, e.g. an enzyme, only where detectable activity on a new substrate can be introduced within the first round of mutagenesis. The new substrates in these cases are only slightly modified: more radical changes in substrate specificity are currently not directly possible.
The method of the invention can be used to modify an active molecule, e.g. an enzyme, such that it can be evolved to accept much more substantially altered substrates and hence greatly extend its range of biosynthetic reactions. An efficient process, such as that outlined in this proposal, for obtaining activity towards non-natural pharmaceutical intermediates would, therefore, be highly desirable, especially as biocatalyst discovery is the first and most critical step in determining the feasibility of a bioconversion. Those skilled in the art of biochemical engineering have the capacity to take such enhanced biocatalysts to process studies and scale-up.
Industrial Application
Methods in accordance with the invention may be used to develop active molecules, e.g. enzymes, for a desired reaction, e.g. for use in the industrial production of chemicals. In particular, the invention will allow us to make full use of the thousands of enzyme chemistries that exist in nature, by increasing their ability to accept a wider range of novel substrates.
All documents cited above are incorporated herein by reference.