WO2003088004A2 - The cop protein design tool - Google Patents
- Publication number
- WO2003088004A2 (PCT/US2003/011613)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rotamer
- residues
- protein
- aars
- binding
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/93—Ligases (6)
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/50—Mutagenesis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
Definitions
- proteins such as enzymes and transcription factors
- co-factors such as vitamins, metal ions, etc.
- natural binding partners including other macromolecules such as nucleic acids or protein, steroid, lipids, mono- and poly-saccharides, etc.
- Protein engineering is a powerful tool for modifying the structural, catalytic, and binding properties of natural proteins and for the de novo design of artificial proteins. Protein engineering relies on an efficient recognition mechanism for incorporating mutant amino acids in the desired protein sequences. This process has been very useful for designing new macromolecules with precise control of composition and architecture. However, a major limitation is that the mutagenesis is restricted to the 20 naturally occurring amino acids. Interestingly, this limitation can be overcome by protein engineering itself, by using the product of a particular protein engineering process: an engineered Aminoacyl tRNA Synthetase (AARS).
- AARS Aminoacyl tRNA Synthetase
- Proteins are synthesized with precise control over nucleotide sequences encoding such proteins, leading to the vast range of specific structures and functional properties observed in nature. Even so the monomer pool for proteins is limited to the 20 natural amino acids. Increasing the monomer pool by incorporating new amino acid analogs would allow development of beautiful new bioderived polymers exhibiting novel but well-controlled architectures [1, 2]. The ability to incorporate amino acid analogs into proteins would greatly expand our ability to rationally and systematically manipulate the structures of proteins, both to probe protein function and create proteins with new properties.
- the ability to synthesize large quantities of proteins containing heavy atoms would facilitate protein structure determination; incorporating selenium-substituted serine can facilitate crystallization processes in proteins [4]; and the ability to site specifically substitute fluorophores or photo-cleavable groups into proteins in living cells would provide powerful tools for studying protein structure and functions in vivo [3].
- AARS aminoacyl-tRNA synthetases
- Clash Opportunity Progressive Design is a method that can computationally design a mutant protein that would preferentially bind an analog ligand over the natural ligand occurring in the wild type protein binding.
- the method has been applied to design a mutant tyrosyl-tRNA synthetase that would selectively bind O-Methyl-L-Tyrosine rather than any natural amino acids.
- One aspect of the invention provides a method for designing a mutant of a polypeptide, said mutant preferentially binds an analog ligand (AL) over at least one inactive ligand (IL) of said polypeptide, comprising: (a) providing (i) an atomic coordinate model for said polypeptide, which model includes coordinates for at least the binding pocket residues for said IL; (ii) an IL rotamer of the most favorable solution conformation, and (iii) one or more AL rotamers, each more stable in solution than a pre-determined level of solution conformation energy; (b) docking said IL rotamer and each said AL rotamers into said binding pocket, wherein the backbone of said IL rotamer and said AL rotamers remain unchanged with respect to one another; (c) for each said AL rotamers docked into said binding pocket, identifying, in said binding pocket, varying residues which have less favorable interactions with said AL rotamer than with said IL rotamer; (d
- the method further comprises: (g) for each stable combinatorial mutation selected in (f), identifying one or more specific combinatorial mutations that preferentially bind said AL rotamer over a pool of ILs structurally similar to said IL.
- any of the above methods further comprises: (h) generating each combinatorial mutation finally selected in (f) or (g) and testing in vivo and/or in vitro for selectively incorporating said AL into proteins over said IL, or said pools of ILs structurally similar to said IL.
- said polypeptide is an Aminoacyl tRNA Synthetase (AARS)
- said IL is a natural ligand of said AARS
- said AL is an analog of said natural ligand.
- said natural ligand is one of the twenty amino acids usually found in natural proteins.
- said AARS is a phenylalanyl-tRNA Synthetase (PheRS) or a tyrosyl-tRNA Synthetase (TyrRS).
- said AARS is from a bacterium. In a preferred embodiment, said AARS is from a eukaryote.
- said AL is a derivative of at least one of the 20 natural amino acids, with one or more functional groups not present in natural amino acids.
- said functional group can be selected from the group consisting of: bromo-, iodo-, ethynyl-, cyano-, azido-, acetyl, aryl ketone, a photolabile group, a fluorescent group, and a heavy metal.
- said AL is a derivative of Phe or Tyr.
- said polypeptide is a G-Protein Coupled Receptor
- GPCR selected from: an adrenergic receptor (AR), an endothelial differential gene (EDG), an Olfactory Receptor (OR), or a Sweet Receptor (SR).
- AR adrenergic receptor
- EDG endothelial differential gene
- OR Olfactory Receptor
- SR Sweet Receptor
- said binding pocket residues comprise residues having at least one atom within a pre-defined distance from any atom of said AL or said IL.
- said pre-defined distance may be 6 Å.
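The distance criterion above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `pocket_residues`, the dictionary layout, and the coordinates are hypothetical and do not come from the patent; only the rule (at least one residue atom within a cutoff of any ligand atom, with 6 Å as one possible cutoff) is taken from the text.

```python
import math

def pocket_residues(residue_atoms, ligand_atoms, cutoff=6.0):
    """Return ids of residues with at least one atom within `cutoff` (in Å)
    of any ligand atom. Coordinates are (x, y, z) tuples."""
    selected = []
    for res_id, atoms in residue_atoms.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ligand_atoms):
            selected.append(res_id)
    return selected

# Made-up coordinates, for illustration only.
residues = {
    "Tyr32": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "Leu65": [(4.0, 0.0, 0.0)],
    "Ala200": [(20.0, 0.0, 0.0)],
}
ligand = [(0.0, 3.0, 0.0)]

print(pocket_residues(residues, ligand))
```

In practice the same test would be run against the union of AL and IL atoms, since the text defines the pocket relative to either ligand.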
- said AL rotamers are provided in (iii) by generating various candidate rotamers of said AL over a grid of dihedral angles, and calculating stability of all candidate rotamers.
- the stability of said one or more AL rotamers are calculated using quantum mechanics, or a suitable force field with molecular mechanics, or both.
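As a rough sketch of the grid-based rotamer generation described above: candidate conformers are enumerated over a regular grid of dihedral angles, scored, and filtered. The helper names, the 60-degree step, and especially `toy_energy` are assumptions made for illustration; a real run would replace the toy function with a quantum mechanics or force-field energy in solution, and the patent's "pre-determined level" is represented here by an energy window above the best conformer.

```python
import itertools
import math

def dihedral_grid(n_dihedrals, step_deg=60):
    """Enumerate all combinations of dihedral angles on a regular grid."""
    angles = range(0, 360, step_deg)
    return list(itertools.product(angles, repeat=n_dihedrals))

def toy_energy(conformer):
    """Placeholder threefold torsional term; NOT a real QM or MM energy."""
    return sum(1.0 + math.cos(math.radians(3 * chi)) for chi in conformer)

def stable_rotamers(conformers, energy_fn, window):
    """Keep conformers whose energy lies within `window` of the minimum."""
    energies = [energy_fn(c) for c in conformers]
    e_min = min(energies)
    return [c for c, e in zip(conformers, energies) if e - e_min <= window]

grid = dihedral_grid(n_dihedrals=2)
kept = stable_rotamers(grid, toy_energy, window=0.5)
print(len(grid), len(kept))
```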
- said IL rotamer or said AL rotamers is/are docked into said binding pocket based on a known three-dimensional complex structure of said IL and said polypeptide.
- said IL rotamer or said AL rotamers is/are docked into said binding pocket based on a docking algorithm.
- said docking algorithm is HIERDOCK.
- step (c) is effectuated by calculating and comparing non-bond energy contribution towards said AL rotamer and said IL rotamer for each binding pocket residues.
- said non-bond energy contribution is calculated using a force field.
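The per-residue comparison in step (c) can be pictured as follows. This is a hedged sketch: the function name, the `tolerance` parameter, and the energy values are invented for illustration, and the per-residue energies are assumed to have been computed elsewhere (for example with a force field). A lower energy is taken to mean a more favorable interaction.

```python
def find_varying_residues(e_al, e_il, tolerance=0.0):
    """Return residues whose non-bond interaction with the AL rotamer is less
    favorable (higher energy) than with the IL rotamer, worst first.

    e_al, e_il: dicts mapping residue id -> interaction energy (kcal/mol).
    """
    return sorted(
        (res for res in e_al if e_al[res] - e_il[res] > tolerance),
        key=lambda res: e_al[res] - e_il[res],
        reverse=True,
    )

# Invented numbers, for illustration only.
e_il = {"Tyr32": -3.0, "Asp158": -5.0, "Leu65": -1.0}
e_al = {"Tyr32": 4.0, "Asp158": -1.5, "Leu65": -1.2}

print(find_varying_residues(e_al, e_il))
```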
- said force field can be selected from: AMBER, AMBER94, AMBER/OPLS, OPLS, OPLS-AA, CHARMM, CHARMM22, Discover, ECEPP/2, GROMOS, MM2, MM3, MM4, MMFF, MMFFs, MMFF94, or UFF.
- said force field is a DREIDING force field.
- said DREIDING force field considers function forms for
- the dielectric constant for said Coulomb functional form is distance dependent or distance independent.
- the charge for either said binding pocket residues, or said AL / IL rotamer, or both can be varied in said Coulomb functional form.
- said charge includes charge from experiment, or charge based on a model selected from: QEq, Del Re, MPEOE, or Gasteiger/PEOE.
- the functional form of said van der Waals interaction uses a Lennard-Jones potential.
- said Lennard-Jones potential is a 6-12 or 6-10 Lennard-Jones potential.
- the functional form of said van der Waals interaction uses a Morse potential.
- the functional form of said hydrogen bond interaction uses a three-body form. In one embodiment, the functional form of said hydrogen bond interaction uses a two-body form or a four-body form.
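For orientation, two of the non-bond functional forms mentioned above can be written out as short functions. This is a generic sketch, not the patent's parameterization: the function names, default arguments, and example numbers are assumptions. The Lennard-Jones term is written in the well-depth form, which has its minimum of -d0 at r = r0, and the Coulomb term uses the usual conversion factor of about 332.0637 for energies in kcal/mol with charges in electron units and distances in Å.

```python
COULOMB_CONSTANT = 332.0637  # kcal*Å/(mol*e^2), common unit-conversion factor

def coulomb(q_i, q_j, r, eps0=1.0, distance_dependent=False):
    """Coulomb energy between two point charges at separation r (Å).

    With distance_dependent=True the dielectric is eps0 * r, so the energy
    falls off as 1/r^2 instead of 1/r.
    """
    eps = eps0 * r if distance_dependent else eps0
    return COULOMB_CONSTANT * q_i * q_j / (eps * r)

def lennard_jones_12_6(r, r0, d0):
    """12-6 Lennard-Jones energy with equilibrium distance r0 and depth d0."""
    x = (r0 / r) ** 6
    return d0 * (x * x - 2.0 * x)

print(lennard_jones_12_6(4.0, r0=4.0, d0=0.1))
print(coulomb(1.0, -1.0, 2.0), coulomb(1.0, -1.0, 2.0, distance_dependent=True))
```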
- said non-bond energy contribution is calculated using quantum mechanics (QM).
- QM quantum mechanics
- said threshold is 50%.
- said varying residues are clash residues.
- step (d) is effectuated by substituting the wild-type amino acid at said varying residue with all 19 natural amino acids, one at a time, and selecting all substitutions that favor binding to said AL rotamer over said IL rotamer.
- the side-chain conformation of each of the 19 substituted natural amino acids is generated over a grid of dihedral angles. In one embodiment, the side-chain conformation of each of the 19 substituted natural amino acids is generated from a backbone-dependent rotamer library.
- the wild-type amino acid at said varying residue is substituted with all 19 natural amino acids, one at a time, using SCWRL. In one embodiment, the wild-type amino acid at said varying residue is substituted with all 19 natural amino acids, one at a time, using SCAP.
- the wild-type amino acid at said varying residue is substituted with all 19 natural amino acids, one at a time, using a side-chain modeling method based on branch-and-bound or dead-end-elimination algorithm.
- the method further comprises optimizing the side-chain of each of the 19 substituted natural amino acids after substitution but before selection.
- said optimization is carried out by energy minimization.
- said energy minimization utilizes a force field.
- said force field is a DREIDING force field.
- said optimization is carried out by Molecular Dynamics or by using Monte Carlo techniques.
- substitutions that favor binding to said AL rotamer are selected based on a score for each substitution, said score comprising a weighted sum of (I) the differential non-bond interaction energy of the substituted varying residue with said IL rotamer and said AL rotamer, and (II) the differential non-bond interaction energy of the substituted varying residue with the remaining residues of said polypeptide.
- said differential non-bond interaction energy of (I) and said differential non-bond interaction energy of (II) are independently calculated using a force field or quantum mechanics.
- said weighted sum comprises 75-100% of said differential non-bond interaction energy of (I).
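A minimal sketch of the single-site scan and its score, under stated assumptions: the names `candidate_substitutions` and `substitution_score` are hypothetical, the sign convention (negative values favor the AL rotamer) is chosen here for illustration, and term (II) is assumed to be measured relative to the wild-type residue, which the text does not spell out. Only the 19-way substitution and the 75-100% weight on term (I) come from the claims.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 natural amino acids

def candidate_substitutions(wild_type):
    """The 19 natural amino acids other than the wild-type residue."""
    return [aa for aa in AMINO_ACIDS if aa != wild_type]

def substitution_score(dd_ligand, dd_protein, w_ligand=0.75):
    """Weighted sum of (I) the residue-ligand differential energy and
    (II) the residue-protein differential energy.

    w_ligand is the weight on term (I), restricted to 0.75..1.0 as in the text.
    """
    if not 0.75 <= w_ligand <= 1.0:
        raise ValueError("w_ligand must lie between 0.75 and 1.0")
    return w_ligand * dd_ligand + (1.0 - w_ligand) * dd_protein

print(len(candidate_substitutions("Y")))
print(substitution_score(-4.0, 2.0, w_ligand=0.75))
```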
- said score includes desolvation penalty for said AL rotamer and said IL rotamer.
- said desolvation penalty is calculated using a continuum implicit solvent model.
- said continuum implicit solvent model is a Surface Generalized Born (SGB) model, a Solvent Accessible Surface Area (SASA) / Analytical Volume Generalized Born (AVGB) model, or a Poisson-Boltzmann (PB) model.
- SGB Surface Generalized Born
- SASA Solvent Accessible Surface Area
- AVGB Analytical Volume Generalized Born
- PB Poisson-Boltzmann
- said desolvation penalty calculation further includes adding explicit solvent molecules.
- said varying residues are opportunity residues. In one embodiment, said opportunity residues are identified after identification of clash residues.
- said opportunity residues stabilize said AL rotamer or destabilize said IL rotamer or both.
- said opportunity residues take advantage of hydrogen bond donor or acceptor atoms that are different between said AL rotamer and said IL rotamer. In one embodiment, said opportunity residues are identified by their proximity to a large void space in said binding pocket after identifying and mutating clash residues.
- step (d) is effectuated by substituting the wild-type amino acid at said varying residue with all 19 natural amino acids, one at a time, and selecting all substitutions that favor binding to said AL rotamer over said IL rotamer.
- step (e) side-chains of each generated combinatorial mutation are optimized to generate optimal side-chain conformation for each combinatorial mutation.
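The combinatorial step can be sketched as a Cartesian product over the per-position subsets selected in step (d). The function name, the position numbers, and the allowed residues below are hypothetical examples; the side-chain optimization of each combination is not shown.

```python
import itertools

def combinatorial_mutations(allowed):
    """Yield every combination of allowed substitutions.

    allowed: dict mapping residue position -> list of candidate amino acids.
    Each yielded item maps position -> chosen amino acid.
    """
    positions = sorted(allowed)
    for combo in itertools.product(*(allowed[p] for p in positions)):
        yield dict(zip(positions, combo))

# Hypothetical subsets for two varying residues.
allowed = {32: ["Q", "A"], 158: ["A", "G", "S"]}
mutants = list(combinatorial_mutations(allowed))
print(len(mutants), mutants[0])
```

The number of combinations grows as the product of the subset sizes, which is one reason the per-residue filtering in step (d) matters before this enumeration.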
- said optimization is carried out using a DREIDING force field with conjugate gradient minimization.
- each of said combinatorial mutation is globally optimized both with said AL rotamer and with said IL rotamer.
- said global optimization both with said AL rotamer and with said IL rotamer is independently carried out using a force field, Molecular Dynamics, or Monte Carlo techniques.
- each said combinatorial mutation is not fixed during global optimization.
- combinatorial mutations which form a more favorable interaction with said AL rotamer than with said IL rotamer are selected based on the differential binding energy of said combinatorial mutations with said AL rotamer and said IL rotamer.
- said differential binding energy is calculated with a force field or quantum mechanics.
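One common way to express a differential binding energy is sketched below. This is an assumption-laden illustration and is not the patent's Equation 2, which is not reproduced in this section: binding energy is taken as E(complex) - E(protein) - E(ligand), and a mutant is kept when the AL binds more favorably (more negatively) than the IL. The mutant labels and energies are invented.

```python
def binding_energy(e_complex, e_protein, e_ligand):
    """Binding energy as complex minus separated protein and ligand."""
    return e_complex - e_protein - e_ligand

def differential_binding_energy(be_al, be_il):
    """Negative values mean the mutant binds the AL better than the IL."""
    return be_al - be_il

def select_mutants(binding_energies):
    """binding_energies: dict mutant -> (be_al, be_il).
    Return mutants favoring the AL, most selective first."""
    diffs = {m: differential_binding_energy(*be) for m, be in binding_energies.items()}
    return sorted((m for m, d in diffs.items() if d < 0.0), key=diffs.get)

# Invented energies in kcal/mol, for illustration only.
energies = {"Y32Q/D158A": (-42.0, -30.5), "Y32A": (-28.0, -33.0)}
print(select_mutants(energies))
```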
- said differential binding energy is calculated with a DREIDING force field with SGB solvation or AVGB solvation.
- said differential binding energy is calculated with Equation 2. In one embodiment, the binding energies for said interactions in steps (c) -
- thermodynamic cycles are calculated using Potential of Mean Force (PMF), average dynamic free energy, Free Energy Perturbation (FEP), or methods based on thermodynamic cycles.
- PMF Potential of Mean Force
- FEP Free Energy Perturbation
- step (f) for each combinatorial mutation selected in step (f),
- the varying residue side-chains are reselected from a backbone-dependent rotamer library, and each said combinatorial mutation is globally optimized with no ligand, using a force field or quantum mechanics with a continuum implicit solvent model.
- said global optimization includes explicit water in said binding pocket.
- the method further includes calculating the differential binding energy for the globally optimized combinatorial mutation with said AL rotamer and said IL rotamer, using a force field or quantum mechanics with a continuum implicit solvent model.
- the global optimization and the differential binding energy are calculated using Equation 2.
- Another aspect of the invention provides a peptide or polypeptide incorporating one or more amino acid analogs, which peptide or polypeptide was produced using a mutant AARS designed by any of the suitable method (or combinations thereof) described above.
- This aspect of the invention contains all the combinations of the features of the methods described above.
- Another aspect of the invention provides a method for conducting a biotechnology business comprising: (i) identifying one or more mutant forms of an AARS sequence, by the method of claim 4, said mutant preferentially binds to an amino acid analog of a natural amino acid substrate of said AARS; (ii) providing a translation system including: (a) a transcript, or means for generating a transcript, that encodes a peptide or polypeptide, (b) a mutant AARS having said identified mutant AARS sequence(s), and (c) said amino acid analog, under circumstances wherein said mutant AARS catalyzes incorporation of said amino acid analog in said peptide or polypeptide.
- the method further comprises the step of providing a packaged pharmaceutical including the peptide or polypeptide, and instructions and/or a label describing how to administer or use said peptide or polypeptide.
- Another aspect of the invention provides a recombinant AARS protein generated by any of the above suitable methods, said AARS protein comprising an optimized protein sequence that incorporates an amino acid analog of a natural amino acid substrate of said AARS into a protein in vivo.
- This aspect of the invention contains all the combinations of the features of the methods described above.
- Another aspect of the invention provides a nucleic acid sequence encoding a recombinant AARS protein described above.
- Another aspect of the invention provides an expression vector comprising the nucleic acid sequence described above.
- This aspect of the invention contains all the combinations of the features of the methods described above.
- Another aspect of the invention provides a host cell comprising the nucleic acid sequence described above.
- Another aspect of the invention provides an apparatus for designing a mutant of a polypeptide, said mutant preferentially binds an analog ligand (AL) over at least one inactive ligand (IL) of said polypeptide, said apparatus comprising: (A) means for providing (i) an atomic coordinate model for said polypeptide, which model includes coordinates for at least the binding pocket residues for said IL; (ii) an IL rotamer of the most favorable solution conformation, and (iii) one or more AL rotamers, each more stable in solution than a pre-determined level of solution conformation energy; (B) means for docking said IL rotamer and each said AL rotamers into said binding pocket, wherein the backbone of said IL rotamer and said AL rotamers remain unchanged with respect to one another; (C) for each said AL rotamers docked into said binding pocket, means for identifying, in said binding pocket, varying residues which have less favorable interactions with said AL rotamer than with said IL
- said IL is a natural ligand of said AARS, and said AL is an analog of said natural ligand.
- Another aspect of the invention contains all the combinations of the features of the methods described above.
- Another aspect of the invention provides a computer system for designing a mutant of a polypeptide, said mutant preferentially binds an analog ligand (AL) over at least one inactive ligand (IL) of said polypeptide, said computer system comprising computer instructions for: (a) providing (i) an atomic coordinate model for said polypeptide, which model includes coordinates for at least the binding pocket residues for said IL; (ii) an IL rotamer of the most favorable solution conformation, and (iii) one or more AL rotamers, each more stable in solution than a pre-determined level of solution conformation energy; (b) docking said IL rotamer and each said AL rotamers into said binding pocket, wherein the backbone of said IL rotamer and said AL rotamers remain unchanged with respect to one another; (c) for each said AL rotamers docked into said binding pocket, identifying, in said binding pocket,
- said polypeptide is an Aminoacyl tRNA Synthetase (AARS)
- said IL is a natural ligand of said AARS
- said AL is an analog of said natural ligand.
- Another aspect of the invention provides a computer-readable medium storing a computer program executable by a plurality of server computers, the computer program comprising computer instructions for: (a) providing (i) an atomic coordinate model for said polypeptide, which model includes coordinates for at least the binding pocket residues for said IL; (ii) an IL rotamer of the most favorable solution conformation, and (iii) one or more AL rotamers, each more stable in solution than a pre-determined level of solution conformation energy; (b) docking said IL rotamer and each said AL rotamers into said binding pocket, wherein the backbone of said IL rotamer and said AL rotamers remain unchanged with respect to one another; (c) for each said AL rotamers docked into said binding pocket, identifying, in said binding pocket, varying residues which have less favorable interactions with said AL rotamer than with said IL rotamer; (d) identifying a subset of mutations for each of said varying residues identified in
- said polypeptide is an Aminoacyl tRNA Synthetase (AARS)
- said IL is a natural ligand of said AARS
- said AL is an analog of said natural ligand.
- Another aspect of the invention provides a computer data signal embodied in a carrier wave, comprising computer instructions for: (a) providing (i) an atomic coordinate model for said polypeptide, which model includes coordinates for at least the binding pocket residues for said IL; (ii) an IL rotamer of the most favorable solution conformation, and (iii) one or more AL rotamers, each more stable in solution than a pre-determined level of solution conformation energy; (b) docking said IL rotamer and each said AL rotamers into said binding pocket, wherein the backbone of said IL rotamer and said AL rotamers remain unchanged with respect to one another; (c) for each said AL rotamers docked into said binding pocket, identifying, in said binding pocket, varying residues which have less favorable interactions with said AL rotamer than with said IL rotamer; (d) identifying a subset of mutations for each of said varying residues identified in (c), wherein each of said mutations
- said polypeptide is an Aminoacyl tRNA Synthetase (AARS)
- said IL is a natural ligand of said AARS
- said AL is an analog of said natural ligand.
- Another aspect of the invention provides an apparatus comprising a computer readable storage medium having instructions stored thereon for: (i) accessing a datafile representative of the coordinates for a plurality of different rotamers of one or more amino acids and one or more analogs of said amino acids; (ii) accessing a datafile representative of a set of structure coordinates for amino acid residues that define a binding pocket for an aminoacyl tRNA synthetase; (iii) a set of modeling routines for (a) calculating differential non-bond interaction energies between residues of the binding pocket with both the amino acids and the amino acid analogs; (b) altering one or more amino acid residues in the binding pocket for the aminoacyl tRNA synthetase (AARS) to produce one or more sets of altered AARS structure coordinates defining altered binding pockets that differentially favor binding of the amino acid analogs over the amino acids; (c) generating a list representative of optimized AARS sequences having preferential interactions with said amino acid analogs over said amino acids.
- Another aspect of the invention provides a method for synthesizing a peptide or protein incorporating one or more amino acid analogs, comprising providing a translational system including: (i) a transcript, or means for generating a transcript, that encodes a peptide or polypeptide, and (ii) one or more mutant AARS having a mutated binding pocket, each said mutant AARS catalyzing preferential incorporation of an amino acid analog of the natural amino acid substrate of said AARS into said peptide or protein under the conditions of the translation system.
- said translation system is a whole cell that expresses said one or more AARS.
- said translation system is a cell lysate or reconstituted protein preparation that is translation competent.
- At least one of said mutant AARS catalyzes incorporation of said amino acid analog with a kcat at least 10-fold greater than that of the wild-type AARS. In one embodiment, at least one of said mutant AARS catalyzes incorporation of said amino acid analog with a kcat at least 10-fold greater than the kcat of the incorporation of the natural amino acid substrate of the wild-type AARS.
- At least one of said mutant AARS catalyzes incorporation of said amino acid analog with a kcat at least 2-fold greater than the kcat of the incorporation of the natural amino acid substrate of the wild-type AARS.
- At least one of said mutant AARS has a sequence identified by any suitable methods described above.
- FIG. 1 The flowchart of an illustrative embodiment of COP.
- Leu65 form a hydrophobic pocket for the methyl group.
- the amide Nε2 of Gln32 has close contact with the oxygen of OMe, while the Oε1 atom of Gln32 is stabilized by forming a weak hydrogen bond (3.58 Å) with the main chain NH of Leu65 (both may have intervening water molecules).
- FIG. 3 The predicted structure for M. jann-TyrRS with explicit side chains for the Tyr32 and Asp158 residues involved in the design.
- the Tyr ligand is also shown (labeled in red).
- the predicted positions of residues Glu107 and Leu62 (labeled in blue) in M. jann-TyrRS.
- Panel 1: the two rotamers of naph-Ala used in the design of Example 2 and Example 6; Panel 2: several non-natural amino acid analogs used in the redesign of mj-TyrRS; Panel 3: the two rotamers of keto-Phe used in Example 4; Panel 4: the two rotamers of NBD-Ala used in Example 5; Panel 5: the two rotamers of bpy-Ala used in Example 5.
- Figure 5. The ribbon representation of PheRS with Phe bound in the active site.
- FIG. 7 A Graphic User Interface (GUI) for COP.
- This invention is a method that can computationally design a mutant protein that would preferentially bind an analog ligand over the natural ligand occurring in the wild type protein binding. It should be understood that the method can be used for any protein-ligand pair, although in certain portions of the following description, the method has been described in detail from the view point of designing mutant AARS that would selectively bind an amino acid analog (such as O-Methyl-L-Tyrosine) over any natural amino acids. This, however, should not be construed to be limiting in any respect.
- an amino acid analog such as O-Methyl-L-Tyrosine
- the Clash-Opportunity Progressive (COP) procedure can be used for structure-based rational redesign of a binding site.
- one embodiment of the invention considers just a single Active Ligand (AL) that will bind specifically to the redesign protein or mutant (the Target Protein or TP), and just a single specific Inactive Ligand (IL), in this case, the natural ligand for the wild-type protein.
- AL Active Ligand
- IL Inactive Ligand
- other embodiments of the invention relate to redesigning a target protein for preferential binding to a list of ALs with respect to a list of ILs.
- the design strategy can be used for designing AARS mutants that preferentially bind and activate a specific amino acid analog compared to all 20 natural amino acids (including the natural, preferred amino acid substrate of the AARS).
- the design goal is for the mutant protein to preferentially bind the target ligand (such as the amino acid analog) versus the wild-type ligand (such as the wild-type amino acid or any other natural amino acid).
- This is achieved by calculating the differential binding energy of the desired AL (such as amino acid analog) against any other potential competitor inactive ligand (IL) that might bind selectively to the mutant.
- the differential binding energy of the amino acid analog against Tyr and Phe is calculated. If the analog is much larger than Tyr, then it may be necessary to consider Trp as a potential competitor for the redesigned mutant AARS.
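When several natural amino acids compete, a simple way to summarize selectivity is the margin against the closest competitor. The sketch below is illustrative only: the function name and all energies are made up, and binding energies are assumed to be precomputed for one mutant, with more negative values meaning tighter binding.

```python
def selectivity_margin(be_analog, be_competitors):
    """Return (closest competitor, margin), where the margin is how much more
    favorably the analog binds than the best-binding competitor.

    A positive margin means the analog is preferred over every competitor.
    """
    closest = min(be_competitors, key=be_competitors.get)
    return closest, be_competitors[closest] - be_analog

# Invented binding energies (kcal/mol) for one hypothetical mutant.
competitors = {"Tyr": -31.0, "Phe": -35.5, "Trp": -20.0}
print(selectivity_margin(-40.0, competitors))
```

Adding Trp to the competitor pool, as the text suggests for analogs much larger than Tyr, only requires one more dictionary entry.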
- COP is different from other protein design protocols, such as the one described in U.S. patent number 6,269,312, in a number of ways.
- COP designs for protein function to recognize a new ligand, such that the designed protein preferentially binds the new ligand over its natural ligand(s).
- U.S. patent number 6,269,312 aims at identifying the optimal side-chains for designing a protein with a certain desired fold (structure).
- COP calculates clashes between the protein and the ligand(s), and only designs the ligand binding site of the target protein. This allows COP to avoid lengthy calculations required in the protein core design.
- the COP design procedure for designing mutant proteins generally comprises several interrelated steps as described below. Although these steps are numbered (steps 1, 2, 3, etc.), the numbering is only used to roughly differentiate one step from another. Although the COP method can be carried out in order from step 1 to step 7, this does not necessarily mean that step n must be carried out before step (n+1). In fact, a few steps can be combined, carried out in parallel, or carried out in reverse order, while some steps may be carried out either before or after certain other steps. The boundaries between the steps are not necessarily clear-cut, since certain calculations may belong to either of two neighboring steps, or to both.
- Step 1 of the COP design tool begins by obtaining accurate descriptions for the structures of the target protein, the inactive ligand (such as the natural amino acid substrate for a particular AARS), and the active ligand (such as the intended amino acid analog for the AARS).
- the protein structures can generally be obtained from known crystallographic or NMR data, if such data are readily available for the target protein. Alternatively, the structure description may be obtained using any of the modeling techniques described below.
- CHARMM22 charges with the nonpolar hydrogen charges summed onto the heavy atoms can be assigned to the ⁇ chain according to the parameters set forth in the DREIDING force field (Mayo et al, J. Phys. Chem.
- the protein can be neutralized by adding counterions (Na+ and Cl-) to the charged residues (Asp, Arg, Glu and Lys) and subjected to a minimization of the potential energy by, for example, the conjugate gradient method using the Surface Generalized Born continuum solvation method (Ghosh et al, J. Phys. Chem. B 102: 10983-10990, 1998).
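A back-of-the-envelope count of the counterions implied by the neutralization step might look like the sketch below. It assumes standard protonation states (Asp and Glu each carry -1, Arg and Lys each carry +1) and one counterion per charged residue; the function name and the three-letter residue list are hypothetical, and ion placement and terminal charges are ignored.

```python
from collections import Counter

def counterions_needed(residue_names):
    """Count Na+ for acidic residues and Cl- for basic residues.

    residue_names: iterable of three-letter residue codes in upper case.
    """
    counts = Counter(residue_names)
    return {
        "Na+": counts["ASP"] + counts["GLU"],
        "Cl-": counts["ARG"] + counts["LYS"],
    }

# A short invented sequence, for illustration only.
sequence = ["MET", "ASP", "GLU", "LYS", "ALA", "ARG", "ASP"]
print(counterions_needed(sequence))
```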
- the RMS in coordinates (CRMS) of all atoms after minimization may be obtained to verify if the CRMS values for the structure is well within experimental error.
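The coordinate RMS check can be sketched as below. This minimal version assumes the two structures list the same atoms in the same order and already share a coordinate frame (as is the case when comparing a structure before and after minimization), so no superposition is performed; the function name `crms` and the example coordinates are illustrative.

```python
import math

def crms(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

before = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
after = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(crms(before, after))
```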
- the crystal waters and other bound molecules can be removed for docking to maximize the searchable surface of the protein.
- the SGB continuum solvation method may be used for all structure optimizations and energy scoring with an internal protein dielectric constant of, for example, 2.5, for all calculations. Although other similar solvation models can be used.
- The structure or favorable conformation of the ligands can be obtained by generating various rotamers of the ligands over a grid of dihedral angles, and calculating their energies in solution using quantum mechanics (QM), or a suitable force field with molecular mechanics (MM), or both.
- An illustrative QM program is Jaguar (Schrodinger, Portland, OR), and an illustrative MM program is Biograf (Accelrys, San Diego, CA).
- MM can be used in the initial calculation to identify the most promising few conformations, then QM can be used to more accurately determine the most favorable ligand conformations.
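The two-stage conformer search described above (a cheap MM pass over a dihedral grid, with only the most promising conformations passed on to QM) can be sketched as follows. This is a minimal illustration, not the procedure implemented in Jaguar or Biograf: `toy_mm_energy`, the 60-degree grid spacing, and the `keep` count are hypothetical stand-ins chosen only to make the example runnable.

```python
import itertools
import math

def dihedral_grid(n_dihedrals, step_deg=60):
    """Return every combination of dihedral angles on a regular grid (degrees)."""
    angles = range(-180, 180, step_deg)
    return itertools.product(angles, repeat=n_dihedrals)

def toy_mm_energy(conformer):
    """Hypothetical stand-in for an MM energy: one threefold torsion term per dihedral."""
    return sum(1.0 + math.cos(math.radians(3 * chi)) for chi in conformer)

def shortlist_conformers(n_dihedrals, energy_fn, keep=5, step_deg=60):
    """Rank all grid conformers with the cheap energy and keep the lowest few."""
    ranked = sorted(dihedral_grid(n_dihedrals, step_deg), key=energy_fn)
    return ranked[:keep]

# The shortlisted conformers would then be re-scored with the more expensive QM method.
candidates = shortlist_conformers(n_dihedrals=2, energy_fn=toy_mm_energy, keep=5)
```

The grid grows exponentially with the number of rotatable bonds, which is why a cheap first pass is useful before any QM calculation is attempted.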
- The ligand conformations can be optimized in the extended conformation at the Hartree-Fock level of theory with a 6-31G** basis set, including Poisson-Boltzmann continuum dielectric solvation using the Jaguar computational suite (Tannor et al, J. Am. Chem. Soc. 116: 11875-11882, 1994) (Schrodinger, Portland, OR). The Mulliken charges ascertained from this calculation can then be retained for the subsequent molecular mechanics simulations.
- Hydrogen atoms, if not already present in the structures of the target protein and/or ligands, can be added manually or by any suitable means, such as Biograf (Accelrys, San Diego, CA).
- Step 2. Ligand Docking and Energy Minimization.
- Once the structures of the target protein, IL, and AL are obtained, the low energy rotamers of the AL can then be docked into the binding site of the target protein in Step 2.
- The binding site can be determined from a crystal structure containing a ligand bound to the target protein. If no ligand is present in the protein structure, the binding site can be modeled using various docking techniques.
- An example of such docking techniques is HierDock (U.S. Patent Application Publication No. US20020099506A1, or PCT publication WO0171347A1, incorporated herein by reference), which has been used extensively to predict and verify the binding of amino acids to AARS and other ligand-protein interactions, including binding of odorants to membrane-bound olfactory receptors (Floriano et al, P.N.A.S. U.S.A. 97: 10712-6, 2000), binding of outer membrane protein A to sugars (Datta et al, J.
- The HierDock ligand screening protocol follows a hierarchical strategy for examining conformations, binding sites and binding energies. Such a hierarchical method has been shown to be necessary for docking algorithms (Halperin et al, Proteins: Struct. Funct. & Genet. 47, 409-443, 2002).
- HierDock involves using coarse-grain docking methods to generate several conformations of protein/ligand complexes, followed by molecular dynamics (MD) simulations, including continuum solvation methods, performed on a subset of good conformations generated from the coarse-grain docking.
- DOCK 4.0 (Ewing and Kuntz, J. Comput. Chem. 18: 1175-1189, 1997; Ewing et al, J. Comput. Aid. Mol. Design 15: 411-428, 2001, incorporated herein by reference)
- DOCK 4.0 can be used to generate and score 20,000 configurations, of which 10% are ranked using the DOCK scoring function.
- the 20 best conformations for each ligand from DOCK are selected, and subjected to annealing molecular dynamics (MD) to further optimize the conformation in the local binding pocket, allowing the atoms of the ligand to move in the field of the protein.
- The system may be heated and cooled from, for example, 50 K to 600 K, in steps of, for example, 10 K (0.05 ps at each temperature) for one cycle.
- the best energy structure is retained.
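The example annealing schedule above (50 K to 600 K in 10 K steps, 0.05 ps per temperature, one cycle) can be written out explicitly. The sketch below assumes that "one cycle" means heating to the peak and then cooling back down without dwelling twice at the peak temperature; the text does not state this, and the function name is hypothetical rather than part of MPSim.

```python
def annealing_schedule(t_low=50, t_high=600, t_step=10, dwell_ps=0.05):
    """Return (temperature_K, dwell_ps) pairs for one heat-then-cool cycle."""
    heating = list(range(t_low, t_high + 1, t_step))
    cooling = heating[-2::-1]  # walk back down without repeating the peak
    return [(t, dwell_ps) for t in heating + cooling]

schedule = annealing_schedule()
total_ps = sum(dwell for _, dwell in schedule)  # total simulated time for the cycle
```

Under these assumptions one cycle visits 111 temperatures and covers roughly 5.55 ps of dynamics per docked conformation.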
- Annealing MD allows the ligand to readjust in the binding pocket to optimize its interaction with the protein. This fine-grain optimization may be performed using MPSim (Lim et al, J. Comput. Chem. 18: 501-521, 1997) with the DREIDING force field.
- the annealing MD procedure will generate 20 protein/ligand complexes for each ligand.
- the potential energy of the full ligand/protein complex in aqueous solution can be minimized (using, for example, conjugate gradients minimization) using SGB.
- This step of protein/ligand-complex optimization is critical for obtaining energetically good conformations for the complex (cavity + ligand). Binding energies as
- Solvation energies can be calculated using the Poisson-Boltzmann continuum solvation method available in the software suite DelPhi.
- the non-bond interaction energies can be calculated exactly using all pair interactions.
- binding energy is given by:
- The IL (such as the natural amino acid substrate) in the binding pocket of the target protein (such as the AARS of interest) can then be replaced with the energetically favorable rotamer(s) of the AL identified above (such as the intended amino acid analog), while keeping the backbone of the AL fixed, so that the reaction center for the AL backbone remains unchanged with respect to the IL backbone.
- the potential energy of the TP-AL complex is minimized using any of the suitable methods, such as DREIDING force field in MPSim (Lim et al, J. Comput. Chem. 18: 501-521, 1997).
- Poisson-Boltzmann dielectric continuum solvent was included to simulate solvation in water.
- Step 3. Energy Calculation and Clash Identification.
- the non-bond energy (E t ) contributions are calculated for each residue k in the binding pocket.
- A binding pocket residue is usually defined as a target protein residue with any atom within a given distance (such as 6 Å) from any atom of the AL (analog).
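The distance-based pocket definition above translates directly into a selection routine. The following is a minimal sketch with a hypothetical data layout (residue identifiers paired with Cartesian coordinates in Å); it treats "within" as "less than or equal to" the cutoff, which is an assumption.

```python
import math

def pocket_residues(protein_atoms, ligand_atoms, cutoff=6.0):
    """Return sorted IDs of residues with any atom within `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples.
    ligand_atoms:  iterable of (x, y, z) tuples.
    """
    ligand_atoms = list(ligand_atoms)
    selected = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in selected:
            continue  # residue already qualified through another atom
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            selected.add(residue_id)
    return sorted(selected)

# Toy coordinates: residue 10 has two atoms, residue 12 is far from the ligand.
protein = [(10, (0.0, 0.0, 0.0)), (10, (1.0, 0.0, 0.0)),
           (11, (8.0, 0.0, 0.0)), (12, (20.0, 0.0, 0.0))]
ligand = [(3.0, 0.0, 0.0)]
pocket = pocket_residues(protein, ligand)
```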
- These calculations can be carried out using any reliable force fields or quantum mechanics.
- One illustrative DREIDING force field (Mayo et al, J. Phys. Chem. 1990, 94:8897) method is described below in Equation 1, which considers the functional forms for such non-bond interaction energies as Coulomb, van der Waals, and hydrogen bond.
- This equation may be generally used to calculate non-bond interaction energy between a first and a second molecule.
- atom i represents an atom on the first molecule (such as an atom of the AL or IL)
- atom j represents an atom on the second molecule (such as an atom belonging to the binding pocket residue k).
- this equation may also be used to calculate the non-bond interaction energy of a particular target protein residue (such as a mutated one, see the clash-relieving step below) with the remaining residues of the target protein.
- atom i may represent an atom of said particular (mutant) residue
- atom j represents an atom belonging to any one of the remaining residues of the target protein.
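Equation 1 itself does not survive in this text, so the sketch below should not be read as the patent's equation. It illustrates the three kinds of terms the passage names (Coulomb, van der Waals, and hydrogen bond) using functional forms commonly associated with DREIDING-type force fields: a point-charge Coulomb term, a 12-6 Lennard-Jones term, and a 12-10 radial hydrogen-bond term with a cos⁴ angular factor. All function names, the unit dielectric default, and every parameter value are assumptions made for illustration.

```python
COULOMB_KCAL = 332.0637  # converts e^2/Å to kcal/mol

def coulomb(q_i, q_j, r, dielectric=1.0):
    """Point-charge electrostatic energy (kcal/mol) between atoms i and j."""
    return COULOMB_KCAL * q_i * q_j / (dielectric * r)

def vdw_12_6(r, r0, d0):
    """12-6 Lennard-Jones energy with well depth d0 at separation r0."""
    rho = r0 / r
    return d0 * (rho ** 12 - 2.0 * rho ** 6)

def hbond_12_10(r, r_hb, d_hb, cos_theta):
    """12-10 radial hydrogen-bond term scaled by cos^4 of the donor-H-acceptor angle."""
    rho = r_hb / r
    return d_hb * (5.0 * rho ** 12 - 6.0 * rho ** 10) * cos_theta ** 4

def nonbond_energy(pairs):
    """Sum Coulomb + vdW over atom pairs given as (q_i, q_j, r, r0, d0) tuples."""
    return sum(coulomb(qi, qj, r) + vdw_12_6(r, r0, d0)
               for qi, qj, r, r0, d0 in pairs)
```

Summing such terms over every atom i of the ligand and every atom j of residue k gives a per-residue interaction energy of the kind used in the clash analysis that follows.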
- the same ligand docking and energy calculation can be done for each (or at least the most energetically favorable) IL rotamer.
- Step 3a. Since the target protein backbone is fixed in Step 3a, no AL (analog) rotamer should have a severe clash with the backbone of the target protein. Thus, any TP-AL complex in which the AL (analog) rotamer has a severe clash with the target protein backbone is discarded without further consideration. A "severe clash" usually includes situations wherein more than a threshold value (say, 50%) of the total non-bond interaction energy between the ligand and residue k is contributed by interaction of the ligand atoms with the backbone atoms of residue k.
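The severe-clash test can be expressed as a simple ratio check. The sketch below assumes the ligand-residue energy has already been split into backbone and side-chain contributions, and that a residue whose net interaction is attractive is not a clash at all; both points, and the function name, are assumptions rather than details given in the text.

```python
def is_severe_clash(e_backbone, e_sidechain, threshold=0.5):
    """Flag residue k when backbone atoms contribute more than `threshold`
    of the total ligand-residue non-bond interaction energy (kcal/mol)."""
    total = e_backbone + e_sidechain
    if total <= 0.0:
        return False  # net attractive interaction: not treated as a clash (assumption)
    return e_backbone / total > threshold
```

For example, `is_severe_clash(8.0, 2.0)` flags the residue because 80% of the repulsion comes from the backbone, whereas `is_severe_clash(2.0, 8.0)` leaves it as an ordinary clash residue that can be mutated.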
- Each binding pocket residue with a minor (below threshold) backbone clash and/or a side-chain clash with the rotamer (collectively called "clash residues") will be substituted by (or "mutated into") the remaining 19 natural amino acids, one at a time, in order to find substitutions (or "mutations") that can either relieve the clash or improve the differential interaction with the AL (analog) with respect to the IL (natural amino acid substrate of AARS), or both.
- Step 3a utilizes rotameric conformations that can be generated over a grid of dihedral angles or by using side chain rotamer libraries (Bower et al, J. Mol. Biol. 267: 1268-82, 1997) (such as a backbone-dependent rotamer library), while the backbone of the target protein is held fixed during the process.
- The side chain alone may be optimized while the rest of the protein is kept fixed. This optimization step can be done using energy minimization (such as the one used in our example), Molecular Dynamics, or Monte Carlo techniques.
- the optimization step may be beneficial because the initial side-chain placement may not be optimum, and may lead to bad local contacts that could give misleading indications regarding this specific side-chain rotameric conformation.
- the contribution from this mutated side-chain to the binding energy of both the AL (analog) and the IL (wild type amino acid) can be calculated, using, for example, Equation 1.
- Clashes of this mutated residue with neighboring residues in the target protein (“constraint energy") are also calculated.
- A score will be generated for each mutation using a scoring energy function, which includes a weighted sum of the differential non-bond interaction energy of the mutated residue with the IL and the AL (analog), and the non-bond interaction energy of the mutated residue with the remaining residues of the target protein.
- weights of 0.75 to 1.0 can generally be used for ligand-protein interaction, and 0.0 to 0.25 can generally be used for protein residue - protein residue interactions. However, under certain circumstances, weights outside the range described above may also be used.
- the differential binding energies for all twenty possible amino acids are calculated, and the best mutations are selected based on these scores (lower score is better). If none of the other 19 residues has a better score, then this particular residue position is either not changed, or changed to the few residues with scores within a predetermined cutoff value (such as within 5 kcal/mole of the best mutation). This precautionary selection may be beneficial since combined mutations (see below) involving these less-than-optimal residue changes may have better selectivity for the AL (analog), or other unexpected overall benefits.
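The scoring and selection logic above can be sketched as a weighted sum followed by a cutoff filter. The sign convention (AL energy minus IL energy, so that more negative favors the analog), the default weights of 0.75 and 0.25 taken from the stated ranges, and all energies in the example are assumptions for illustration only.

```python
def mutation_score(e_al, e_il, e_constraint, w_ligand=0.75, w_protein=0.25):
    """Weighted sum of the AL-minus-IL differential energy and the constraint energy."""
    return w_ligand * (e_al - e_il) + w_protein * e_constraint

def select_candidates(scores, cutoff=5.0):
    """Keep residues scoring within `cutoff` kcal/mol of the best (lowest) score."""
    best = min(scores.values())
    kept = [aa for aa, s in scores.items() if s - best <= cutoff]
    return sorted(kept, key=scores.get)

# Hypothetical (E_AL, E_IL, E_constraint) energies for three substitutions at one position.
energies = {"Ala": (-6.0, -2.0, 1.0), "Gly": (-4.0, -3.0, 0.0), "Trp": (3.0, -5.0, 12.0)}
scores = {aa: mutation_score(*e) for aa, e in energies.items()}
shortlist = select_candidates(scores)
```

Keeping every residue within the cutoff, rather than only the single best one, preserves the less-than-optimal substitutions that may still prove useful in the later combinatorial step.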
- A desolvation penalty may also be added to the above energy calculation for the mutated residue.
- The desolvation penalty can be calculated using any of a variety of methods, such as the SGB (Ghosh, J. Phys. Chem. B 102: 10983-10990, 1998), AVGB (Zamanakos, Ph.D. Thesis, California Institute of Technology, Pasadena, CA, 2001), or Poisson-Boltzmann (Tannor et al, J. Am. Chem. Soc. 116: 11875-11882, 1994) solvation method.
- This differential energy score between the AL (analog) and the IL (natural ligand) selects one or more candidate amino acid mutations for each of the clash residues for further consideration. Since it is the differential energy scores that are considered, both mutations that favor AL (analog) binding and ones that disfavor IL (natural ligand) binding are considered. In summary, based on the differential energy scores, a subset of candidate amino acid changes for each clash residue is selected for later use in simultaneous combinatorial mutations.
- Step 3b. After identifying candidate mutations for relieving clashes at the clash residues, COP then looks for "opportunity mutations" in the binding pocket that would stabilize the analog ligand or disrupt the bonding with the natural ligand. To illustrate, residues positioned near the AL (for example, within a cutoff value of 6 Å) are examined for their ability to take advantage of, for example, hydrogen bond donor or acceptor atoms that differ between the AL (analog) and the IL (natural ligand). Alternatively, the void space in the binding pocket after making the clash mutations is calculated, and any residue that is close to a large void is considered as a candidate for a stabilizing point mutation.
- the selection may be similarly based on a pre-determined threshold differential energy / score, such as 5, 3, 2, 1, or 0.5 kcal/mole, so that any mutation that differentially favors AL binding is selected as a candidate opportunity mutation.
- “Differentially favor” means the mutated residue favors AL binding and disfavors IL binding, such that the differential energy / score is more negative (favored) than the threshold value.
- the pre-set threshold value may be the same or different. After substituting all 20 possible amino acids in each opportunity position, based on the differential scores, a subset of amino acid changes for each opportunity residue is selected for later use in simultaneous combinatorial mutations.
- Step 4. Combinatorial Mutations.
- The above steps lead to the identification of one or more clash residues and/or opportunity residues associated with a given rotamer, and a subset of candidate mutations for each of these clash or opportunity residues that are expected to either relieve clashes or provide stabilization opportunities to the binding of the AL (analog) in preference to the IL (wild type ligand).
- The simultaneous mutation of Step 4 combines these chosen subsets of candidate mutations in all possible combinations.
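Enumerating the combinatorial mutants amounts to taking the Cartesian product of the per-position candidate lists. The sketch below uses hypothetical residue positions and candidate sets; whether the wild-type residue is kept as one of the options at each position is left to the caller.

```python
import itertools

def combinatorial_mutants(candidates):
    """Yield one {position: residue} dict per combination of candidate mutations.

    candidates: {position: [residue, ...]} chosen in the clash and opportunity steps.
    """
    positions = sorted(candidates)
    for choice in itertools.product(*(candidates[p] for p in positions)):
        yield dict(zip(positions, choice))

# Hypothetical candidate sets for three binding-pocket positions.
candidates = {137: ["Gly", "Ala"], 258: ["Thr", "Ser", "Ala"], 261: ["Ala"]}
mutants = list(combinatorial_mutants(candidates))
```

The number of mutants is the product of the list lengths (here 2 × 3 × 1 = 6), which is why restricting each position to a small candidate subset keeps the subsequent per-mutant optimization tractable.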
- the best possible rotamer combination is generated for each of these combinatorial mutants by optimizing the side-chains, using, for example, conjugate gradient minimization or any other suitable means described below.
- residues near any of the mutated residues are re-examined to determine the optimal side-chain conformation.
- the structure of each combinatorial mutant protein as a whole is optimized, both with the IL (natural ligand) and with the AL (analog). This optimization can be done using energy minimization (such as the ones used in our examples), Molecular Dynamics, or Monte Carlo techniques.
- the differential binding energy (with respect to the AL/analog and the IL/natural amino acid) in each combinatorial mutant is calculated.
- a pool of IL may be separately used in this calculation, and the result of each calculation (or at least the best competitor IL) compared to that of the AL (alternatively, this pool competition may be included as the last step, as described below in Step 6).
- These calculations can use any reliable force field or can use quantum mechanics. In one illustrative example below, Equation 2 with DREIDING Force Field and SGB solvation is used.
- Step 5. Relaxation of the free protein without the ligand.
- Step 4 above considers combinatorial mutations and side-chain conformations that enhance binding of the target protein to the AL (analog). However, it is possible that some combinatorial mutations would disrupt the folding of the free protein when the ligand is not present.
- The side-chains of each combinatorial mutant are re-optimized without any ligand in the binding site. This allows the side-chains of each combinatorial mutant to move into the binding site normally occupied by the ligand during the previous steps.
- The side-chain conformations of each combinatorial mutant are first re-selected from a side-chain rotamer library, and the structure of the whole combinatorial mutant is optimized as above (for example, using Equation 2), including solvation (for example, using the SGB continuum solvent procedure).
- If differential binding of the AL is only compared to a single IL to identify the best combinatorial mutant target protein above, it is possible that the identified combinatorial mutant, while preferring the intended AL over that single IL, may nevertheless prefer to bind other IL(s) (such as those with a structure similar to that of the single IL used) rather than the intended AL.
- the mutant Tyrosyl-AARS activates only the Tyrosine analog, but not any other natural amino acid such as Trp or Phe.
- the redesigned AARS should preferentially bind the analog not only over the natural ligand Tyr, but also over Phe and Trp.
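The pool competition described above can be sketched as a comparison of the analog against the best-binding competitor in the pool. The binding energies below are invented for illustration, and the convention that more negative means tighter binding is an assumption carried over from the scoring discussion.

```python
def selectivity(be_al, be_il_pool):
    """Return (differential energy, best competitor) for the AL against a pool of ILs.

    A negative differential means the AL binds more favorably than every IL in the pool.
    """
    competitor = min(be_il_pool, key=be_il_pool.get)  # most tightly bound IL
    return be_al - be_il_pool[competitor], competitor

# Hypothetical binding energies (kcal/mol) for a redesigned tyrosyl-AARS.
pool = {"Tyr": -35.0, "Phe": -38.5, "Trp": -30.0}
differential, competitor = selectivity(-42.0, pool)
```

In this toy case the analog beats Tyr by 7 kcal/mol but beats Phe by only 3.5 kcal/mol, so Phe rather than the original substrate sets the effective selectivity of the design.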
- The designed mutant protein might not be able to fold correctly. For example, a charged residue placed in the protein core without favorable local stabilizing interactions is a strong destabilizing force. In order to detect such cases after the design, a consensus method may be used to evaluate the interactions for each residue involved in the design.
- The consensus is derived from all the AARS structures Applicants have been working on, including TyrRS (PDB: 2ts1, 3ts1, 4ts1), PheRS (PDB: 1b70), SerRS (PDB: 1ses, 1set, 1sry, 1fyf), ArgRS (PDB: 1bs2), MetRS (PDB: 1f4l), HisRS (PDB: 1adj, 1hht).
- Table X lists the energies for each amino acid from the consensus:
- a warning message will be given with a stability score of the residue.
- the score is defined by the energy difference divided by the standard deviation. A score higher than 2 generally means the residue is in a very unfavorable position, i.e., it is not making enough interactions with other residues to stabilize the fold.
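The stability score is a z-score of a residue's interaction energy against the consensus for its amino acid type. In the sketch below the sign of the difference (residue energy minus consensus mean, so that weaker-than-usual interactions give positive scores) is an assumption, and the consensus values are invented placeholders, since Table X is not reproduced in this text.

```python
def stability_score(energy, consensus_mean, consensus_std):
    """Energy difference from the consensus mean, in units of standard deviations."""
    return (energy - consensus_mean) / consensus_std

def stability_warnings(residues, consensus, threshold=2.0):
    """Return (label, score) for every residue scoring above `threshold`.

    residues:  list of (label, amino_acid, interaction_energy) tuples.
    consensus: {amino_acid: (mean_energy, std_energy)}.
    """
    warnings = []
    for label, amino_acid, energy in residues:
        mean, std = consensus[amino_acid]
        score = stability_score(energy, mean, std)
        if score > threshold:
            warnings.append((label, score))
    return warnings

# Placeholder consensus statistics and designed residues (kcal/mol).
consensus = {"Asp": (-20.0, 4.0), "Leu": (-10.0, 2.0)}
designed = [("D45", "Asp", -8.0), ("L99", "Leu", -9.0)]
flagged = stability_warnings(designed, consensus)
```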
- Steps 1-6 are repeated for other low energy rotamers of the AL
- the COP method makes the minimum number of rational structure-based mutations in a protein, so that the protein binds preferentially to the desired ligand compared to the designated inactive ligand (for example, the natural ligand).
- This preferential (or discriminatory) binding to a set of preferred analog ligands (AL) over a set of natural or inactive ligands (IL) is a unique feature of the COP method, since it ensures at least a better (if not exclusive) interaction of the target protein with the analog over the inactive ligand. In the AARS case, it ensures the incorporation of the amino acid analogs with the intended structure into the final protein product.
- Although the redesigned target protein may bind the intended AL, it may also bind the IL or other unintended ligands, sometimes with more affinity / specificity for those other ligands than for the intended AL.
- The COP method uses the principle of minimum-change design, which focuses on mutation of residues in the binding site, although there may be circumstances in which it is appropriate to modify residues outside the active site. This greatly simplifies the problem, partly because the number of residues involved is typically much smaller than the number of residues involved in a protein core design. In addition, the residues that need to be changed (mutated) are often well separated, hence the probability of encountering a combinatorial side-chain placement problem is much lower, if such a problem exists at all. Other advantages of the COP design methods are apparent in the discussion below about alternative embodiments. Briefly:
- COP can use any force field valid for both protein and ligand (particularly valuable here are generic force fields such as DREIDING or UFF that are valid for a large part of the periodic table), and it can use quantum mechanics for the region of the active site.
- COP uses a complete non-bond energy function such as in Equation 1. This function includes (Coulomb) electrostatic interactions (which may be described as point charges as in Eq. 1 or may be described as distributed charges as in QEq (Rappe and Goddard, J. Phys. Chem.
- ReaxFF (van Duin et al, J. Phys. Chem. A 105: 9396-9409, 2001)
- van der Waals non-electrostatic non-bond interaction
- an explicit hydrogen bond potential, which may be described by a radial potential (e.g. 10-12 Lennard-Jones) and may use a three-body cosine angle term as in Eq. 1.
- Solvation is explicitly included in the COP design procedure. Continuum implicit solvation methods such as SGB or AVGB can be used to describe the role of solvation in the structure and energies of protein and ligand in water (or other) solvent. This greatly decreases the computational effort compared with the use of explicit solvent. However, explicit solvent molecules can be included in the evaluation of the best cases for final selection in the design.
- The protein backbone can be allowed to move (distort in response to the mutations, solvent, and ligand) at any part of the algorithm except during clash identification. COP allows the protein backbone to be fully movable in any part of the optimization. The designed protein can be better optimized with backbone flexibility.
- COP design adds the functionality of recognizing a new analog ligand to a mutant target protein (such as AARS), and it selects against any natural ligands. This ensures that the designed target protein (AARS) binds the analog (amino acid) exclusively.
- The redesigned AARS can be used as an orthogonal tRNA-synthetase/tRNA pair that corresponds to the 21st amino acid
- “About” and “approximately” shall generally mean an acceptable degree of error for the quantity measured given the nature or precision of the measurements. Typical, exemplary degrees of error are within 20 percent (%), preferably within 10%, and more preferably within 5% of a given value or range of values. Alternatively, and particularly in biological systems, the terms “about” and “approximately” may mean values that are within an order of magnitude, preferably within 5-fold and more preferably within 2-fold of a given value. Numerical quantities given herein are approximate unless stated otherwise, meaning that the term “about” or “approximately” can be inferred when not expressly stated.
- Amino acid analog is meant to include all amino acid-like compounds that are similar in structure and/or overall shape to one or more of the twenty L-amino acids commonly found in naturally occurring proteins (Ala or A, Cys or C, Asp or D, Glu or E, Phe or F, Gly or G, His or H, Ile or I, Lys or K, Leu or L, Met or M, Asn or N, Pro or P, Gln or Q, Arg or R, Ser or S, Thr or T, Val or V, Trp or W, Tyr or Y, as defined and listed in WIPO Standard ST.25 (1998), Appendix 2, Table 3). Amino acid analogs can also be natural amino acids with modified side chains or backbones.
- These analogs usually are not "substrates" for the AARSs because of the high specificity of the AARSs, although occasionally, certain analogs with structures / shapes sufficiently close to natural amino acids may be erroneously incorporated into the protein by AARSs, especially mutant AARSs with relaxed substrate specificity.
- The analogs share the backbone structures, and/or even most of the side chain structures, of one or more natural amino acids, with the only difference(s) being one or more modified groups in the molecule.
- Such modification may include, without limitation, substitution of an atom (such as N) for a related atom (such as S), addition of a group (such as methyl, or hydroxyl group, etc.) or an atom (such as Cl or Br, etc.), deletion of a group (supra), substitution of a covalent bond (single bond for double bond, etc.), or combinations thereof.
- Amino acid analogs may include α-hydroxy acids and β-amino acids, and can also be referred to as "modified amino acids," "unnatural AARS substrates" or "non-canonical amino acids."
- The amino acid analogs may either be naturally occurring or non-naturally occurring; as will be appreciated by those in the art, any structure for which a set of rotamers is known or can be generated can be used as an amino acid analog.
- the side chains may be in either the (R) or the (S) configuration (or D- or L- configuration). In a preferred embodiment, the amino acids are in the (S) or L- configuration.
- The overall shape and size of the amino acid analogs are such that, upon being charged to tRNAs by the re-designed AARS, the analog-tRNA is a ribosomally accepted complex, that is, the tRNA-analog complex can be accepted by the prokaryotic or eukaryotic ribosomes in an in vivo or in vitro translation system.
- Anchor residues are residue positions in AARS that maintain critical interactions between the AARS and the natural amino acid backbone.
- Backbone includes the backbone atoms and any fixed side chains (such as the anchor residue side chains) of the protein (e.g., AARS).
- the backbone of an analog is treated as part of the AARS backbone.
- Protein backbone structure or grammatical equivalents herein is meant the three dimensional coordinates that define the three dimensional structure of a particular protein.
- The structures which comprise a protein backbone structure (of a naturally occurring protein) are the nitrogen, the carbonyl carbon, the α-carbon, and the carbonyl oxygen, along with the direction of the vector from the α-carbon to the β-carbon.
- The protein backbone structure which is input into the computer can either include the coordinates for both the backbone and the amino acid side chains, or just the backbone, i.e. with the coordinates for the amino acid side chains removed. If the former is done, the side chain atoms of each amino acid of the protein structure may be "stripped" or removed from the structure of a protein, as is known in the art, leaving only the coordinates for the "backbone" atoms (the nitrogen, carbonyl carbon and oxygen, and the α-carbon, and the hydrogens attached to the nitrogen and α-carbon).
- the protein backbone structure may be altered prior to the analysis outlined below. In this embodiment, the representation of the starting protein backbone structure is reduced to a description of the spatial arrangement of its secondary structural elements.
- the relative positions of the secondary structural elements are defined by a set of parameters called supersecondary structure parameters. These parameters are assigned values that can be systematically or randomly varied to alter the arrangement of the secondary structure elements to introduce explicit backbone flexibility. The atomic coordinates of the backbone are then changed to reflect the altered supersecondary structural parameters, and these new coordinates are input into the system for use in the subsequent protein design automation. For details, see U.S. Pat. No. 6,269,312, the entire content incorporated herein by reference.
- Conformational energy refers generally to the energy associated with a particular “conformation”, or three-dimensional structure, of a macromolecule, such as the energy associated with the conformation of a particular protein. Interactions that tend to stabilize a protein have energies that are represented as negative energy values, whereas interactions that destabilize a protein have positive energy values. Thus, the conformational energy for any stable protein is quantitatively represented by a negative conformational energy value. Generally, the conformational energy for a particular protein will be related to that protein's stability. In particular, molecules that have a lower (i.e., more negative) conformational energy are typically more stable, e.g., at higher temperatures (i.e., they have greater "thermal stability").
- The conformational energy of a protein may also be referred to as the "stabilization energy."
- The conformational energy is calculated using an energy "force-field" that calculates or estimates the energy contribution from various interactions which depend upon the conformation of a molecule.
- the force-field is comprised of terms that include the conformational energy of the alpha-carbon backbone, side chain - backbone interactions, and side chain - side chain interactions.
- interactions with the backbone or side chain include terms for bond rotation, bond torsion, and bond length.
- the backbone-side chain and side chain-side chain interactions include van der Waals interactions, hydrogen-bonding, electrostatics and solvation terms.
- Electrostatic interactions may include Coulombic interactions, dipole interactions and quadrupole interactions.
- Force-fields that may be used to determine the conformational energy for a polymer are well known in the art and include the CHARMM (see, Brooks et al, J. Comp. Chem. 4:187-217, 1983; MacKerell et al, in The Encyclopedia of Computational Chemistry, Vol. 1:271-277, John Wiley & Sons, Chichester, 1998), AMBER (see, Cornell et al, J. Amer. Chem. Soc. 1995, 117:5179; Woods et al, J. Phys. Chem. 1995, 99:3832-3846; Weiner et al, J. Comp. Chem. 1986, 7:230; and Weiner et al, J. Amer. Chem. Soc. 1984, 106:765) and DREIDING (Mayo et al, J. Phys. Chem. 1990, 94:8897) force-fields, to name but a few.
- The hydrogen bonding and electrostatics terms are as described in Dahiyat & Mayo, Science 1997 278:82.
- The force field can also be described to include atomic conformational terms (bond angles, bond lengths, torsions), as in other references. See e.g., Nielsen J E, Andersen K V, Honig B, Hooft R W W, Klebe G, Vriend G, & Wade R C, "Improving macromolecular electrostatics calculations," Protein Engineering, 12: 657-662 (1999); Sitkoff D, Lockhart D J, Sharp K A & Honig B, "Calculation of electrostatic effects at the amino-terminus of an alpha-helix," Biophys.
- Coupled residues generally contribute to polymer fitness through the coupling interaction.
- the coupling interaction is a physical or chemical interaction, such as an electrostatic interaction, a van der Waals interaction, a hydrogen bonding interaction, or a combination thereof.
- changing the identity of either residue will affect the "fitness" ofthe molecule, particularly if the change disrupts the coupling interaction between the two residues.
- Coupling interaction may also be described by a distance parameter between residues in a molecule. If the residues are within a certain cutoff distance, they are considered interacting.
- “Fitness” is used to denote the level or degree to which a particular property or a particular combination of properties for a molecule, e.g., a protein, are optimized.
- the fitness of a protein is preferably determined by properties which a user wishes to improve.
- the fitness of a protein may refer to the protein's thermal stability, catalytic activity, binding affinity, solubility (e.g., in aqueous or organic solvent), and the like.
- Other examples of fitness properties include enantioselectivity, activity towards non-natural substrates, and alternative catalytic mechanisms. Coupling interactions can be modeled as a way of evaluating or predicting fitness (stability). Fitness can be determined or evaluated experimentally or theoretically, e.g. computationally.
- the fitness is quantitated so that each molecule, e.g., each amino acid will have a particular "fitness value".
- the fitness of a protein may be the rate at which the protein catalyzes a particular chemical reaction, or the protein's binding affinity for a ligand.
- the fitness of a protein refers to the conformational energy of the polymer and is calculated, e.g., using any method known in the art. See, e.g. Brooks B. R., Bruccoleri R E, Olafson, B D, States D J, Swaminathan S & Karplus M, "CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations," J. Comp.
- the fitness of a protein is quantitated so that the fitness value increases as the property or combination of properties is optimized.
- The "fitness contribution" of a protein residue refers to the level or extent f(i_a) to which the residue i_a, having an identity a, contributes to the total fitness of the protein.
- DEE Dead-end elimination
- GMEC global minimum energy conformation
- Dead end elimination is based on the following concept.
- Consider two rotamers, i_r and i_t, at residue i, and the set of all other rotamer configurations {S} at all residues excluding i (of which rotamer j_s is a member). If the pair-wise energy contributed between i_r and j_s is higher than the pair-wise energy between i_t and j_s for all {S}, then rotamer i_r cannot exist in the global minimum energy conformation, and can be eliminated. This notion is expressed mathematically by the inequality.
- Equation A is not computationally tractable because, to make an elimination, it is required that the entire sequence (rotamer) space be enumerated. To simplify the problem, bounds implied by Equation A can be utilized:
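The displayed inequalities do not appear in this text. As a hedged reconstruction, assuming the formulation follows the standard dead-end elimination literature (the Desmet criterion), Equations A and B would take roughly the following form. The symbols E(i_r) for the energy of rotamer i_r against the fixed template and E(i_r, j_s) for the pair-wise energy are notation assumed here, not taken from the source.

```latex
% Equation A (assumed form): exact condition, tested over every configuration in {S}
\begin{equation*}
E(i_r) + \sum_{j \neq i} E(i_r, j_s) \;>\; E(i_t) + \sum_{j \neq i} E(i_t, j_s)
\qquad \text{for all } \{S\}
\end{equation*}

% Equation B (assumed form): bound implied by Equation A, needing no enumeration of {S}
\begin{equation*}
E(i_r) + \sum_{j \neq i} \min_{s} E(i_r, j_s) \;>\; E(i_t) + \sum_{j \neq i} \max_{s} E(i_t, j_s)
\end{equation*}
```

Equation B compares the best case for i_r with the worst case for i_t, so whenever it holds, Equation A holds for every member of {S}.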
- Equation B can be extended to the elimination of pairs of rotamers inconsistent with the GMEC. This is done by determining that a pair of rotamers i_r at residue i and j_s at residue j always contribute higher energies than rotamers i_u and j_v with all possible rotamer combinations {L}. Similar to Equation B, the strict bound of this statement is given by:
- If a rotamer i_t1 contributes a lower energy than i_r for a portion of the conformational space, and a rotamer i_t2 has a lower energy than i_r for the remaining fraction, then i_r can be eliminated. This case would not be detected by the less sensitive Desmet or Goldstein criteria.
- all of the described enhancements to DEE were used.
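The single-rotamer elimination described above can be sketched in a few lines. This is a minimal illustration of the bound-based (Desmet-style) test only, not the enhanced DEE used in the source; the function name `dee_singles`, the energy tables `self_e` and `pair_e`, and their layout are all hypothetical.

```python
def dee_singles(self_e, pair_e):
    """Iteratively remove rotamers that fail the bound-based DEE test.

    self_e[i][r]        : energy of rotamer r at residue i (assumed layout)
    pair_e[(i, r, j, s)]: pair energy for i < j (assumed layout)
    Returns a list of sets holding the surviving rotamer indices per residue.
    """
    n = len(self_e)
    alive = [set(range(len(rots))) for rots in self_e]

    def e2(i, r, j, s):
        # Pair energies are stored once, with the lower residue index first.
        return pair_e[(i, r, j, s)] if i < j else pair_e[(j, s, i, r)]

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                # Best case for r: lowest pair energy at every other residue.
                low_r = self_e[i][r] + sum(
                    min(e2(i, r, j, s) for s in alive[j])
                    for j in range(n) if j != i)
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Worst case for t: highest pair energy at every other residue.
                    high_t = self_e[i][t] + sum(
                        max(e2(i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i)
                    if low_r > high_t:
                        alive[i].discard(r)  # r cannot be part of the GMEC
                        changed = True
                        break
    return alive
```

Because a rotamer is only removed when a surviving competitor exists at the same residue, every residue keeps at least one rotamer.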
- “Expression system” means a host cell and compatible vector under suitable conditions, e.g. for the expression of a protein coded for by foreign DNA carried by the vector and introduced to the host cell.
- Common expression systems include E. coli host cells and plasmid vectors, insect host cells such as Sf9, Hi5 or S2 cells and Baculovirus vectors, Drosophila cells (Schneider cells) and expression systems, and mammalian host cells and vectors.
- Host cell means any cell of any organism that is selected, modified, transformed, grown or used or manipulated in any way for the production of a substance by the cell.
- a host cell may be one that is manipulated to express a particular gene, a DNA or RNA sequence, a protein or an enzyme.
- Host cells may be cultured in vitro or may be one or more cells in a non-human animal (e.g., a transgenic animal or a transiently transfected animal).
- the methods of the invention may include steps of comparing sequences to each other, including wild-type sequence to one or more mutants.
- Such comparisons typically comprise alignments of polymer sequences, e.g., using sequence alignment programs and/or algorithms that are well known in the art (for example, BLAST, FASTA and MEGALIGN, to name a few).
- sequence similarity in all its grammatical forms, refers to the degree of identity or correspondence between nucleic acid or amino acid sequences that may or may not share a common evolutionary origin (see, Reeck et al, supra).
- sequence similarity when modified with an adverb such as "highly”, may refer to sequence similarity and may or may not relate to a common evolutionary origin.
- a nucleic acid molecule is "hybridizable" to another nucleic acid molecule, such as a cDNA, genomic DNA, or RNA, when a single stranded form of the nucleic acid molecule can anneal to the other nucleic acid molecule under the appropriate conditions of temperature and solution ionic strength (see Sambrook et al, Molecular Cloning: A Laboratory Manual, Second Edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.). The conditions of temperature and ionic strength determine the "stringency" of the hybridization.
- low stringency hybridization conditions corresponding to a T m (melting temperature) of 55°C
- T m melting temperature
- Moderate stringency hybridization conditions correspond to a higher T m , e.g., 40% formamide, with 5 or 6xSSC.
- High stringency hybridization conditions correspond to the highest T m , e.g., 50% formamide, 5 or 6xSSC.
- SSC is 0.15 M NaCl, 0.015 M Na-citrate.
- Hybridization requires that the two nucleic acids contain complementary sequences, although depending on the stringency of the hybridization, mismatches between bases are possible.
- the appropriate stringency for hybridizing nucleic acids depends on the length of the nucleic acids and the degree of complementation, variables well known in the art. The greater the degree of similarity or homology between two nucleotide sequences, the greater the value of T m for hybrids of nucleic acids having those sequences.
- the relative stability (corresponding to higher T m ) of nucleic acid hybridizations decreases in the following order: RNA:RNA, DNA:RNA, DNA:DNA.
- a minimum length for a hybridizable nucleic acid is at least about 10 nucleotides; preferably at least about 15 nucleotides; and more preferably the length is at least about 20 nucleotides.
- standard hybridization conditions refers to a T m of about 55°C, and utilizes conditions as set forth above.
- the T m is 60°C; in a more preferred embodiment, the T m is 65°C.
- “high stringency” refers to hybridization and/or washing conditions at 68°C in 0.2xSSC, at 42°C in 50% formamide, 4xSSC, or under conditions that afford levels of hybridization equivalent to those observed under either of these two conditions.
- Suitable hybridization conditions for oligonucleotides are typically somewhat different than for full-length nucleic acids (e.g., full-length cDNA), because of the oligonucleotides' lower melting temperature. Because the melting temperature of oligonucleotides will depend on the length of the oligonucleotide sequences involved, suitable hybridization temperatures will vary depending upon the oligonucleotide molecules used.
- Exemplary temperatures may be 37°C (for 14-base oligonucleotides), 48°C (for 17-base oligonucleotides), 55°C (for 20-base oligonucleotides) and 60°C (for 23-base oligonucleotides).
- Exemplary suitable hybridization conditions for oligonucleotides include washing in 6xSSC/0.05% sodium pyrophosphate, or other conditions that afford equivalent levels of hybridization.
- “Polypeptide,” “peptide” or “protein” are used interchangeably to describe a chain of amino acids that are linked together by chemical bonds called “peptide bonds.”
- a protein or polypeptide, including an enzyme, may be “native” or “wild-type”, meaning that it occurs in nature; or it may be a “mutant”, “variant” or “modified”, meaning that it has been made, altered, derived, or is in some way different or changed from a native protein or from another mutant.
- “Rotamer” is defined as a set of possible conformers for each amino acid or analog side chain. See Ponder, et al, Acad. Press Inc. (London) Ltd. pp. 775-791 (1987); Dunbrack, et al, Struc Biol.
- a "rotamer library” is a collection of a set of possible / allowable rotameric conformations for a given set of amino acids or analogs.
- a backbone dependent rotamer library allows different rotamers depending on the position of the residue in the backbone; thus for example, certain leucine rotamers are allowed if the position is within an α-helix, and different leucine rotamers are allowed if the position is not in an α-helix.
- a backbone independent rotamer library utilizes all rotamers of an amino acid at every position.
- a backbone independent library is preferred in the consideration of core residues, since flexibility in the core is important.
- backbone independent libraries are computationally more expensive, and thus for surface and boundary positions, a backbone dependent library is preferred.
- either type of library can be used at any position.
- By "variable residue position" herein is meant an amino acid position of the protein to be designed that is not fixed in the design method as a specific residue or rotamer, generally the wild-type residue or rotamer. It should be noted that even if a position is chosen as a variable position, it is possible that the methods of the invention will optimize the sequence in such a way as to select the wild type residue at the variable position. This generally occurs more frequently for core residues, and less regularly for surface residues. In addition, it is possible to fix residues as non-wild type amino acids as well.
- “Fixed residue position” means that the residue is identified in the three dimensional structure as being in a set conformation. In some embodiments, a fixed position is left in its original conformation (which may or may not correlate to a specific rotamer of the rotamer library being used). Alternatively, residues may be fixed as a non-wild type residue depending on design needs; for example, when known site-directed mutagenesis techniques have shown that a particular residue is desirable (for example, to eliminate a proteolytic site or alter the substrate specificity of an AARS), the residue may be fixed as a particular amino acid. Residues which can be fixed include, but are not limited to, structurally or biologically functional residues. For example, the anchor residues.
- variable residues may be at least one, or anywhere from 0.1% to 99.9% of the total number of residues. Thus, for example, it may be possible to change only a few (or one) residues, or most of the residues, with all possibilities in between.
- the protein structure description should be of high quality, preferably obtained from either an X-ray or NMR study.
- protein structures of high quality from theoretical modeling can also be used.
- crystal structure of a homologous protein for example, a homolog from a related species
- a conserved domain to substitute the crystallographic structure description for the necessary atomic coordinates.
- PDB Brookhaven Protein Data Bank
- All contents of PDB are in the public domain.
- PDB contains 20,317 total deposited structures, including 18,289 protein / peptide / virus structures, 843 protein / nucleic acid complex structures, 1,167 nucleic acid structures, and 18 carbohydrates.
- about 4000 - 4500 structures are being deposited to this database every year. More detailed information regarding all aspects of the PDB database can be found at the PDB website.
- MMDB Molecular Modeling DataBase
- the structure database or Molecular Modeling DataBase (MMDB) contains experimental data from crystallographic and NMR structure determinations.
- the data for MMDB are obtained from the Protein Data Bank (PDB).
- PDB Protein Data Bank
- NCBI 3D the NCBI 3D structure viewer, can be used for easy interactive visualization of molecular structures from Entrez.
- the Entrez 3D Domains database contains protein domains from the NCBI Conserved Domain Database (CDD). Computational biologists define conserved domains based on recurring sequence patterns or motifs. CDD currently contains domains derived from two popular collections, Smart and Pfam, plus contributions from colleagues at NCBI, such as COG. The source databases also provide descriptions and links to citations. Since conserved domains correspond to compact structural units, CDs contain links to 3D-structure via Cn3D whenever possible.
- CDD NCBI Conserved Domain Database
- the CD-Search service employs the reverse position-specific BLAST algorithm.
- the query sequence is compared to a position-specific score matrix prepared from the underlying conserved domain alignment. Hits may be displayed as a pair-wise alignment of the query sequence with a representative domain sequence, or as a multiple alignment.
- CD-Search now is run by default in parallel with protein BLAST searches. While the user waits for the BLAST queue to further process the request, the domain architecture of the query may already be studied.
- CDART the Conserved Domain Architecture Retrieval Tool allows users to search for proteins with similar domain architectures. CDART uses precomputed CD-search results to quickly identify proteins with a set of domains similar to that of the query. For more details, see Marchler-Bauer et al, Nucleic Acids Research 31: 383-387, 2003; and Marchler-Bauer et al, Nucleic Acids Research 30: 281-283, 2002.
- Valyl-tRNA Synthetase for mouse has 1263 amino acids (Accession No. AAD26531), and was published by Snoek M. and van Vugt H. in Immunogenetics 49: 468-470(1999); and the Phenylalanyl-tRNA Synthetase sequences for human, Drosophila, S. pombe, S. cerevisiae, Candida albicans, E. coli, and numerous other bacteria including Thermus aquaticus ssp. thermophilus are also available. The database was last updated in May 2000.
- Similar information for other newly identified AARSs can be obtained, for example, by conducting a BLAST search using any of the known sequences in the AARS database as query against the available public (such as the non-redundant database at NCBI, or "nr") or proprietary private databases.
- the AARSs may be from any organism, including prokaryotes and eukaryotes, with enzymes from bacteria, fungi, extremophiles such as the archaebacteria, worm, insects, fish, amphibian, birds, animals (particularly mammals and particularly human) and plants all possible.
- the conformation of the protein in question will be similar to the known crystal structure of the homologous
- the known structure may, therefore, be used as the structure for the protein of interest, or more preferably, may be used to predict the structure of the protein of interest (i.e., in "homology modeling" or "molecular modeling").
- MMDB Molecular Modeling Database
- the Molecular Modeling Database described above (see, Wang et al, Nucl. Acids Res. 2000, 28:243-245; Marchler-Bauer et al, Nucl. Acids Res. 1999, 27:240-243) provides search engines that may be used to identify proteins and/or nucleic acids that are similar or homologous to a protein sequence (referred to as "neighboring" sequences in the MMDB), including neighboring sequences whose three-dimensional structures are known.
- the database further provides links to the known structures along with alignment and visualization tools, such as Cn3D (developed by NCBI), RasMol, etc., whereby the homologous and parent sequences may be compared and a structure may be obtained for the parent sequence based on such sequence alignments and known structures.
- the homologous protein sequence with known 3D-structure is preferably at least about 60%, or at least about 70%, or at least about 80%, or at least about 90%, or at least about 95% identical to the protein of interest, at least in the region that may be involved in interacting with another molecule of interest.
- Such potential binding sites may not be continuous in the primary amino acid sequence of the protein since distant amino acids may come together in the 3D-structure.
- sequence homology or identity can be calculated using, for example, the NCBI standard BLASTp programs for protein using default conditions, in regions aligned together (without insertions or deletions in either of the two sequences being compared) and including residues known to be involved in substrate amino acid binding.
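As a rough illustration of the identity measure described above, the sketch below counts matching residues over the ungapped columns of two sequences that have already been aligned. It is not BLASTp and does not reproduce its scoring; the function name `percent_identity` and the use of "-" as the gap character are assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns without an insertion or deletion in either sequence.
    columns = [(x, y)
               for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("MKV-LA", "MKIDLA")` compares five ungapped columns, four of which match. Restricting the input strings to the binding-site region gives the regional figure the text refers to.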
- the homologous protein is preferably about 35%, or 40%, or 45%, or 50%, or 55% identical overall to the protein of interest.
- Many proteins with just about 20-25% overall sequence homology / identity turn out to be conserved in three-dimensional structure.
- it is typically possible to determine the structure using routine experimental techniques, for example, X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy
- NMR Nuclear Magnetic Resonance
- the three-dimensional structure of a protein sequence may be calculated from the sequence itself and using ab initio molecular modeling techniques already known in the art. See e.g., Smith T F, LoConte L, Bienkowska J, et al, "Current limitations to protein threading approaches," J. Comput.
- Three-dimensional structures obtained from ab initio modeling are typically less reliable than structures obtained using empirical (e.g., NMR spectroscopy or X-ray crystallography) or semi-empirical (e.g., homology modeling) techniques. However, such structures will generally be of sufficient quality, although less preferred, for use in the methods of this invention.
- the AARS protein structure was obtained computationally by combining the STRUCTFAST structure alignment predicting with molecular dynamics using a force field.
- a homologous protein can be used in the design, and mutations can be translated back into the protein of interest according to the sequence alignment between the two proteins.
- a computer-generated molecular model of the protein and its potential binding site(s) can nevertheless be generated using any of a number of techniques available in the art.
- the Cα-carbon positions of a protein sequence of interest can be mapped to a particular coordinate pattern of a protein ("known protein") having a similar sequence and deduced structure using homology modeling techniques, and the structure of the protein of interest and velocities of each atom calculated at a simulation temperature (To) at which a docking simulation with an amino acid analog is to be determined.
- such a protocol involves primarily the prediction of side-chain conformations in the modeled protein of interest, while assuming a main-chain trace taken from a tertiary structure, such as provided by the known protein.
- Computer programs for performing energy minimization routines are commonly used to generate molecular models. For example, both the CHARMM (Brooks et al. (1983) J Comput Chem 4:187-217) and AMBER (see, Cornell et al, J. Amer. Chem. Soc. 1995, 117:5179; Woods et al, J. Phys. Chem. 1995, 99:3832-3846; Weiner et al, J. Comp. Chem. 1986, 7:230; and Weiner et al, J. Amer.
- energy minimization methods can be carried out for a given temperature, Ti, which may be different than the docking simulation temperature, To.
- Ti which may be different than the docking simulation temperature, To.
- coordinates and velocities of all the atoms in the system are computed.
- the normal modes of the system are calculated. It will be appreciated by those skilled in the art that each normal mode is a collective, periodic motion, with all parts of the system moving in phase with each other, and that the motion of the molecule is the superposition of all normal modes.
- the mean square amplitude of motion in a particular mode is inversely proportional to the effective force constant for that mode, so that the motion of the molecule will often be dominated by the low frequency vibrations.
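The inverse proportionality stated above follows from classical equipartition in a harmonic mode. As a brief worked relation, with the symbols κ_k (effective force constant of mode k), μ_k (effective mass) and ω_k (angular frequency) introduced here as assumptions rather than taken from the source:

```latex
\begin{equation*}
\tfrac{1}{2}\,\kappa_k \langle x_k^2 \rangle = \tfrac{1}{2}\,k_B T
\quad\Longrightarrow\quad
\langle x_k^2 \rangle = \frac{k_B T}{\kappa_k} = \frac{k_B T}{\mu_k\,\omega_k^2}
\end{equation*}
```

Since the amplitude scales as 1/ω_k², the lowest-frequency modes contribute the largest displacements at a given temperature.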
- the system is "heated” or "cooled” to the simulation temperature, To, by carrying out an equilibration run where the velocities of the atoms are scaled in a step-wise manner until the desired temperature, To, is reached.
- the system is further equilibrated for a specified period of time until certain properties of the system, such as average kinetic energy, remain constant.
- the coordinates and velocities of each atom are then obtained from the equilibrated system.
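The step-wise velocity scaling described above can be sketched as follows. This is a minimal illustration in reduced units (Boltzmann constant `k_b` defaulting to 1.0), with hypothetical function names `temperature` and `equilibrate`; a real equilibration run would integrate the equations of motion between scaling steps, which is omitted here.

```python
import math

def temperature(masses, velocities, k_b=1.0):
    """Instantaneous temperature from kinetic energy: T = 2*KE / (3*N*k_b)."""
    kinetic = 0.5 * sum(m * (vx * vx + vy * vy + vz * vz)
                        for m, (vx, vy, vz) in zip(masses, velocities))
    return 2.0 * kinetic / (3.0 * len(masses) * k_b)

def equilibrate(masses, velocities, t_target, n_steps=10, k_b=1.0):
    """Scale velocities in n_steps increments until t_target is reached."""
    t_start = temperature(masses, velocities, k_b)
    if t_start <= 0.0:
        raise ValueError("cannot rescale a system with zero kinetic energy")
    for step in range(1, n_steps + 1):
        # Intermediate goal on a straight line from t_start to t_target.
        t_goal = t_start + (t_target - t_start) * step / n_steps
        t_now = temperature(masses, velocities, k_b)
        scale = math.sqrt(t_goal / t_now)  # T is proportional to v squared
        velocities = [(vx * scale, vy * scale, vz * scale)
                      for vx, vy, vz in velocities]
        # A production run would advance the dynamics here before rescaling again.
    return velocities
```

The same routine "heats" or "cools" the system depending on whether `t_target` lies above or below the starting temperature.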
- a second class of methods involves calculating approximate solutions to the constrained EOM for the protein. These methods use an iterative approach to solve for the Lagrange multipliers and, typically, only need a few iterations if the corrections required are small.
- SHAKE Ryckaert et al. (1977) J Comput Phys 23:327; and Van Gunsteren et al. (1977) Mol Phys 34:1311
- RATTLE (Andersen (1983) J Comput Phys 52:24) is based on the velocity version of the Verlet algorithm. Like SHAKE, RATTLE is an iterative algorithm and can be used to energy minimize the model of a subject protein.
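To make the iterative Lagrange-multiplier idea concrete, the sketch below applies a SHAKE-style correction to a single bond-length constraint between two atoms. It is an illustration under simplifying assumptions (one constraint, positions as plain 3-element sequences); the function name `shake_bond` and its parameters are hypothetical and do not come from any particular package.

```python
def shake_bond(r1, r2, r1_old, r2_old, m1, m2, d, tol=1e-10, max_iter=100):
    """Move r1 and r2 along the old bond vector until |r1 - r2| equals d.

    r1, r2         : positions after an unconstrained integration step
    r1_old, r2_old : positions before the step (constraint satisfied there)
    """
    inv1, inv2 = 1.0 / m1, 1.0 / m2
    ref = [a - b for a, b in zip(r1_old, r2_old)]  # old bond vector
    r1, r2 = list(r1), list(r2)
    for _ in range(max_iter):
        s = [a - b for a, b in zip(r1, r2)]        # current bond vector
        diff = sum(x * x for x in s) - d * d       # constraint violation
        if abs(diff) < tol:
            return r1, r2
        # First-order estimate of the Lagrange multiplier for this constraint.
        g = diff / (2.0 * (inv1 + inv2) * sum(x * y for x, y in zip(s, ref)))
        r1 = [a - g * inv1 * x for a, x in zip(r1, ref)]
        r2 = [b + g * inv2 * x for b, x in zip(r2, ref)]
    raise RuntimeError("constraint iteration did not converge")
```

When the required correction is small, as the text notes, only a few passes through the loop are needed. The mass-weighted displacements leave the center of mass of the pair unchanged.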
- MembStruk For predicting membrane protein structures, Vaidehi et al (PNAS USA 99: 12622-7, 2002, incorporated herein by reference) describe a program called MembStruk, which has been used to successfully predict the structure of multi-pass membrane proteins, such as G-Protein Coupled Receptors (GPCR) including adrenergic receptors (AR), olfactory receptors (OR), sweet receptors (SR), endothelial differentiation gene (EDG), etc.
- GPCR G-Protein Coupled Receptors
- AR adrenergic receptors
- OR olfactory receptors
- SR sweet receptors
- EDG endothelial differentiation gene
- Active Ligand Conformation Determination. Generally, the structure and favorable conformations of the active ligand (AL) can be obtained using various art-recognized methods, such as obtaining the structure directly from crystallographic databases, or energy calculation for the AL in solution using quantum mechanics (QM). Alternatively, any force field with molecular mechanics may be used for this calculation.
- QM quantum mechanics
- the favorable conformations of the analog can first be determined by generating various rotamers of the ligand over a grid of dihedral angles, followed by calculating their energies in solution using QM. Alternatively, this step can be carried out using a force field with molecular mechanics.
- each amino acid side chain has a set of possible conformers, called rotamers. See Ponder, et al, Acad. Press Inc. (London) Ltd. pp. 775-791 (1987); Dunbrack, et al, Struc. Biol.
- a preferred embodiment does a type of "fine tuning" of the rotamer library by expanding the possible χ angle values of the rotamers by plus and minus one standard deviation (±1 SD) (or more) about the mean value, in order to minimize possible errors that might arise from the discreteness of the library.
- This is particularly important for aromatic residues, and fairly important for hydrophobic residues, due to the increased requirements for flexibility in the core and the rigidity of aromatic rings; it is not as important for the other residues.
- a preferred embodiment expands the χ1 and χ2 angles for all amino acids except Met, Arg and Lys. For the intended amino acid analogs, the χ1 and χ2 angles are expanded as such in their corresponding rotamers.
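The χ-angle expansion described above amounts to replacing each library rotamer with every combination of (mean − SD, mean, mean + SD) over its expanded angles. A minimal sketch, assuming a rotamer is represented by lists of χ means and standard deviations in degrees; the function name `expand_rotamer` and the example values are hypothetical:

```python
from itertools import product

def expand_rotamer(chi_means, chi_sds, n_sd=1.0):
    """Return all chi-angle tuples at mean - n_sd*SD, mean, mean + n_sd*SD."""
    options = [(mean - n_sd * sd, mean, mean + n_sd * sd)
               for mean, sd in zip(chi_means, chi_sds)]
    # Cartesian product over the expanded angles: 3 values per angle.
    return [tuple(combo) for combo in product(*options)]

# One illustrative rotamer with two expanded angles yields 3 x 3 = 9 sub-rotamers.
sub_rotamers = expand_rotamer([-60.0, 180.0], [10.0, 12.0])
```

Angles are not wrapped into a fixed range here; a fuller implementation would normalize values such as 192° to −168°.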
- Tyr has 36 rotamers
- Cys has 9 rotamers
- Gln has 69 rotamers
- His has 54 rotamers
- Val has 9 rotamers
- Ile has 45 rotamers
- Leu has 36 rotamers
- Met has 21 rotamers
- Ser has 9 rotamers
- Phe has 36 rotamers.
- analog rotamers can be derived from natural amino acids.
- the possible χ1 and χ2 angles are derived from database analysis. Since this is not feasible in the case of artificial amino acids or analogs, the closest approximation for χ1 and χ2 angles for the analogs are taken to be the same as those for the natural substrate amino acid.
- both the χ1 and χ2 torsional angles for the analogs are varied to match those of the natural substrate rotamers in the standard backbone independent rotamer library.
- the torsional angles of the natural substrate in the crystal structure are also included in the new rotamer libraries for both the natural substrate and the analogs.
- the ligand conformation can be calculated using quantum mechanics theory, using any of the software packages implementing QM.
- One of the many comparable programs that can be used in this embodiment is Jaguar (Schrodinger, Portland, OR).
- Jaguar is an extremely fast electronic structure package designed to increase the speed of ab initio calculations in order to accelerate basic and applied research projects and to enable calculations at a higher level of theory.
- Jaguar's speed and power make it possible to study larger systems than ever before, or to study many more systems than previously possible, within a reasonable timeframe.
- Jaguar's fast computational engine allows one to address real-world problems with chemical accuracy in time periods short enough to be relevant in a fast-paced research environment requiring computational treatment at the quantum mechanical level.
- Jaguar implements accurate solvation energies with the improved SCRF
- Jaguar can also be used to predict conformational energy differences. As is well-known in the art, finding low-energy conformers of a structure is an important step in applications such as ligand-receptor binding and identifying intra-molecular hydrogen bonds. While this has largely been a molecular mechanics computational area, it is well known that force field calculations frequently cannot reliably determine accurate conformational energies for novel scaffolds; are often missing parameters or handle non-standard or exotic atom types in a poor manner; and do not include polarization effects.
- Jaguar's computational efficiency makes it possible to routinely calculate the conformational energy of every important low-energy conformer of a structure; to provide an accurate determination of the geometries of and energy differences between conformers; and to obtain chemical accuracy when compared to experimental data.
- Jaguar can also be run in parallel over a user-defined number of processors to further speed up calculations.
- Jaguar also has a highly-accurate pKa Predictor.
- Jaguar provides accurate determination of solute geometries and their solvation free energies, chemical accuracy in calculating conformational energy differences.
- the ligand conformation may also be obtained using any suitable force field with molecular mechanics, such as the AMBER package of molecular simulation programs, which includes:
- sander Simulated annealing with NMR-derived energy restraints. This allows for NMR refinement based on NOE-derived distance restraints, torsion angle restraints, and penalty functions based on chemical shifts and NOESY volumes. Sander is also the "main" program used for molecular dynamics (MD) simulations.
- MD molecular dynamics
- nmode Normal mode analysis program using first and second derivative information, used to search for local minima, perform vibrational analysis, and search for transition states.
- LEaP is an X-windows-based program that provides for basic model building and AMBER coordinate and parameter/topology input file creation. It includes a molecular editor which allows for building residues and manipulating molecules. Motif-style X-windows Athena widgets and a table widget used in LEaP were written by Vladimir Romanovski.
- antechamber This program suite automates the process of developing force field descriptors for most organic molecules. It starts with structures (usually in PDB format), and generates files that can be read into LEaP for use in molecular modeling. The force field description that is generated is designed to be compatible with the usual Amber force fields for proteins and nucleic acids.
- ptraj and carnal These are programs to analyze MD trajectories, computing a variety of things, like RMS deviation from a reference structure, hydrogen bonding analysis, time-correlation functions, diffusional behavior, and so on.
- mm_pbsa This is a script to automate post-processing of MD trajectories, to analyze energetics using continuum solvent ideas. It can be used to break energies into "pieces" arising from different residues, and to estimate free energy differences between conformational basins.
- Directed methods generally fall into two categories: (1) design by analogy in which 3-D structures of known molecules (such as from a crystallographic database) are docked to the binding site structure and scored for goodness-of-fit; and (2) de novo design, in which the analog model is constructed piece-wise in the binding site.
- the latter approach in particular, can facilitate the development of novel molecules, uniquely designed to bind to the target protein (e.g., an AARS mutant) binding site.
- the design of potential analogs that may function with a particular target protein begins from the general perspective of shape complementarity for the binding site of the target protein, and a search algorithm is employed which is capable of scanning a database of small molecules of known three-dimensional structure for candidates which fit geometrically into the substrate binding site.
- libraries can be general small molecule libraries, or can be libraries directed to amino acid analogs or small molecules which can be used to create amino acid analogs. It is not expected that the molecules found in the shape search will necessarily be leads themselves, since no evaluation of chemical interaction is necessarily made during the initial search. Rather, it is anticipated that such candidates might act as the framework for further design, providing molecular skeletons to which appropriate atomic replacements can be made.
- each of a set of small molecules from a particular database is individually docked to the binding site of the target protein in a number of geometrically permissible orientations with use of a docking algorithm.
- a set of computer algorithms called DOCK can be used to characterize the shape of invaginations and grooves that form the binding site. See, for example, Kuntz et al. (1982) J. Mol. Biol 161: 269-288.
- the program can also search a database of small molecules for templates whose shapes are complementary to particular binding site of the target protein. Exemplary algorithms that can be adapted for this purpose are described in, for example, DesJarlais et al. (1988) J Med Chem 31:722-729.
- orientations are evaluated for goodness-of-fit and the best are kept for further examination using molecular mechanics programs, such as AMBER or
- the subject method can utilize an algorithm described by Goodford (1985, J Med Chem 28:849-857) and Boobbyer et al (1989, J Med Chem 32:1083-1094).
- GRID computer program
- Those papers describe a computer program (GRID) which seeks to determine regions of high affinity for different chemical groups (termed probes) on the molecular surface of the binding site.
- GRID hence provides a tool for suggesting modifications to known ligands that might enhance binding. It may be anticipated that some of the sites discerned by GRID as regions of high affinity correspond to "pharmacophoric patterns" determined inferentially from a series of known ligands.
- a pharmacophoric pattern is a geometric arrangement of features of the anticipated amino acid analog that is believed to be important for binding.
- Goodsell and Olson (1990, Proteins: Struct Funct Genet 8:195-202) have used the Metropolis (simulated annealing) algorithm to dock a single known ligand into a target protein, and their approach can be adapted for identifying suitable amino acid analogs for docking with the AARS binding site.
- This algorithm can allow torsional flexibility in the amino acid sidechain and use GRID interaction energy maps as rapid lookup tables for computing approximate interaction energies.
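The Metropolis (simulated annealing) scheme mentioned above can be sketched generically. This is not the Goodsell and Olson implementation: the function name `metropolis_anneal`, the geometric cooling schedule, and the caller-supplied `energy` and `propose` callables are assumptions for illustration. In a docking setting, `energy` could read approximate interaction energies from a precomputed grid, and `propose` could perturb ligand position, orientation and torsions.

```python
import math
import random

def metropolis_anneal(energy, propose, state, t_start=10.0, t_end=0.1,
                      n_steps=2000, seed=0):
    """Return the lowest-energy state visited and its energy."""
    rng = random.Random(seed)
    current, e_cur = state, energy(state)
    best, e_best = current, e_cur
    for k in range(n_steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (k / max(1, n_steps - 1))
        candidate = propose(current, rng)
        e_cand = energy(candidate)
        # Metropolis rule: always accept downhill moves; accept uphill moves
        # with probability exp(-dE / T).
        if e_cand <= e_cur or rng.random() < math.exp(-(e_cand - e_cur) / temp):
            current, e_cur = candidate, e_cand
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best

# Toy usage: a single torsion-like coordinate with a quadratic energy well.
toy_energy = lambda x: (x - 2.0) ** 2
toy_move = lambda x, rng: x + rng.uniform(-0.5, 0.5)
pose, pose_energy = metropolis_anneal(toy_energy, toy_move, 0.0)
```

Accepting some uphill moves at high temperature is what lets the search escape local minima before the schedule cools.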
- Yet a further embodiment of the present invention utilizes a computer algorithm such as CLIX which searches such databases as CCDB for small molecules which can be oriented in the substrate binding site of the target protein in a way that is both sterically acceptable and has a high likelihood of achieving favorable chemical interactions between the candidate molecule and the surrounding amino acid residues.
- the method is based on characterizing the substrate binding site in terms of an ensemble of favorable binding positions for different chemical groups and then searching for orientations of the candidate molecules that cause maximum spatial coincidence of individual candidate chemical groups with members of the ensemble.
- the current availability of computer power dictates that a computer-based search for novel ligands follows a breadth-first strategy.
- a breadth- first strategy aims to reduce progressively the size of the potential candidate search space by the application of increasingly stringent criteria, as opposed to a depth-first strategy wherein a maximally detailed analysis of one candidate is performed before proceeding to the next.
- CLIX conforms to this strategy in that its analysis of binding is rudimentary -it seeks to satisfy the necessary conditions of steric fit and of having individual groups in "correct” places for bonding, without imposing the sufficient condition that favorable bonding interactions actually occur.
- a ranked "shortlist" of molecules, in their favored orientations, is produced which can then be examined on a molecule-by-molecule basis, using computer graphics and more sophisticated molecular modeling techniques.
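The breadth-first strategy described above can be illustrated with a minimal sketch. It is not CLIX itself: the candidate "molecules" are plain strings and the three filters are made-up stand-ins, chosen only to show that each criterion is applied to the whole surviving pool before the next, stricter one is tried.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only.
Candidate = str
Filter = Tuple[str, Callable[[Candidate], bool]]


def breadth_first_screen(
    candidates: Sequence[Candidate], filters: Sequence[Filter]
) -> Tuple[List[Candidate], List[Tuple[str, int]]]:
    """Apply each filter to every surviving candidate before moving on."""
    pool = list(candidates)
    history = [("start", len(pool))]
    for name, keep in filters:
        pool = [c for c in pool if keep(c)]
        history.append((name, len(pool)))
    return pool, history


# Toy library and toy criteria (assumptions, not real chemistry checks).
toy_library = ["CCO", "CCCCCCCCCCCC", "c1ccccc1O", "CC(N)C(=O)O", "NCC(=O)O"]
toy_filters = [
    ("steric_fit", lambda m: len(m) <= 11),                # cheapest test first
    ("has_polar_group", lambda m: "O" in m or "N" in m),
    ("has_amine", lambda m: "N" in m),                     # strictest test last
]

shortlist, history = breadth_first_screen(toy_library, toy_filters)
print(shortlist)
print(history)
```

A depth-first variant would instead run every test, including the expensive ones, on one candidate before touching the next; the funnel above spends the expensive tests only on candidates that survived the cheap ones.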
- CLIX is also capable of suggesting changes to the substituent chemical groups of the candidate molecules that might enhance binding.
- the starting library can be of amino acid analogs or of molecules which can be used to generate the sidechain of an amino acid analog.
- The algorithmic details of CLIX are described in Lawrence et al. (1992) Proteins 12:31-41, and the CLIX algorithm can be summarized as follows.
- the GRID program is used to determine discrete favorable interaction positions (termed target sites) in the binding site of the target protein for a wide variety of representative chemical groups. For each candidate ligand in the CCDB an exhaustive attempt is made to make coincident, in a spatial sense in the binding site of the protein, a pair of the candidate's substituent chemical groups with a pair of corresponding favorable interaction sites proposed by GRID. All possible combinations of pairs of ligand groups with pairs of GRID sites are considered during this procedure.
- Upon locating such coincidence, the program rotates the candidate ligand about the two pairs of groups and checks for steric hindrance and coincidence of other candidate atomic groups with appropriate target sites. Particular candidate/orientation combinations that are good geometric fits in the binding site and show sufficient coincidence of atomic groups with GRID sites are retained. Consistent with the breadth-first strategy, this approach involves simplifying assumptions. Rigid protein and small molecule geometry is maintained throughout. As a first approximation, rigid geometry is acceptable because the energy-minimized coordinates of the binding site of the AARS mutant describe an energy minimum for the molecule, albeit a local one.
- A further assumption implicit in CLIX is that the potential ligand, when introduced into the substrate binding site of the target protein, does not induce change in the protein's stereochemistry or partial charge distribution and so alter the basis on which the GRID interaction energy maps were computed. It must also be stressed that the interaction sites predicted by GRID are used in a positional and type sense only, i.e., when a candidate atomic group is placed at a site predicted as favorable by GRID, no check is made to ensure that the bond geometry, the state of protonation, or the partial charge distribution favors a strong interaction between the protein and that group. Such detailed analysis should form part of more advanced modeling of candidates identified in the CLIX shortlist.
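The pair-coincidence step summarized above can be sketched as follows. This is a simplified illustration, not the CLIX implementation: it only enumerates pairs of ligand groups and pairs of target sites whose chemical types match and whose separations agree within a tolerance. The function name, the tolerance value, and the toy coordinates are assumptions, and the subsequent rotation and steric check are not modeled.

```python
import itertools
import math


def matching_pairs(ligand_groups, target_sites, tol=0.5):
    """Return ((i, j), (k, l)) index pairs that could be made coincident.

    ligand_groups and target_sites are lists of (type, (x, y, z)).
    Ligand groups i, j can be superimposed on sites k, l only if the types
    match and the i-j distance is close to the k-l distance.
    """
    matches = []
    lig_pairs = itertools.combinations(enumerate(ligand_groups), 2)
    for (i, (ti, pi)), (j, (tj, pj)) in lig_pairs:
        d_lig = math.dist(pi, pj)
        site_pairs = itertools.permutations(enumerate(target_sites), 2)
        for (k, (tk, pk)), (l, (tl, pl)) in site_pairs:
            if ti == tk and tj == tl and abs(d_lig - math.dist(pk, pl)) <= tol:
                matches.append(((i, j), (k, l)))
    return matches


# Toy data (made up): three ligand groups and three GRID-like target sites.
ligand = [("hydroxyl", (0, 0, 0)), ("amine", (3, 0, 0)), ("methyl", (0, 4, 0))]
sites = [("amine", (10, 0, 0)), ("hydroxyl", (10, 3.2, 0)), ("methyl", (20, 20, 20))]

print(matching_pairs(ligand, sites))
```

Because all combinations of ligand-group pairs and site pairs are visited, the cost grows with the product of the two pair counts, which is why the text stresses that the later, more detailed checks are deferred.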
- Yet another embodiment of a computer-assisted molecular design method for identifying amino acid analogs that may be utilized by a predetermined AARS mutant comprises the de novo synthesis of potential inhibitors by algorithmic connection of small molecular fragments that will exhibit the desired structural and electrostatic complementarity with the substrate binding site of the enzyme.
- The methodology employs a large template set of small molecules which are iteratively pieced together in a model of the AARS' substrate binding site.
- Each stage of ligand growth is evaluated according to a molecular mechanics-based energy function, which considers van der Waals and Coulombic interactions, internal strain energy of the lengthening ligand, and desolvation of both ligand and enzyme.
- the search space can be managed by use of a data tree which is kept under control by pruning according to the binding criteria.
- potential amino acid analogs can be determined using a method based on an energy minimization-quenched molecular dynamics algorithm for determining energetically favorable positions of functional groups in the substrate binding site of a mutant AARS enzyme. The method can aid in the design of molecules that incorporate such functional groups by modification of known amino acid and amino acid analogs or through de novo synthesis.
- The multiple copy simultaneous search (MCSS) method described by Miranker et al. (1991) Proteins 11: 29-34 can be adapted for use in the subject method.
- To determine and characterize the local minima of a functional group in the force field of the protein, multiple copies of selected functional groups are first distributed in a binding site of interest on the AARS protein. Energy minimization of these copies by molecular mechanics or quenched dynamics yields the distinct local minima. The neighborhood of these minima can then be explored by a grid search or by constrained minimization.
- The MCSS method uses the classical time-dependent Hartree (TDH) approximation to simultaneously minimize or quench many identical groups in the force field of the protein.
- Implementation of the MCSS algorithm requires a choice of functional groups and a molecular mechanics model for each of them.
- Groups must be simple enough to be easily characterized and manipulated (3-6 atoms, few or no dihedral degrees of freedom), yet complex enough to approximate the steric and electrostatic interactions that the functional group would have in substrate binding to the site of the AARS protein.
- A preferred set is, for example, one in which most organic molecules can be described as a collection of such groups (Patai's Guide to the Chemistry of Functional Groups, ed. S. Patai (New York: John Wiley and Sons, 1989)).
- Determination of the local energy minima in the binding site requires that many starting positions be sampled. This can be achieved by distributing, for example, 1,000-5,000 groups at random inside a sphere centered on the binding site; only the space not occupied by the protein needs to be considered. If the interaction energy of a particular group at a certain location with the protein is more positive than a given cut-off (e.g. 5.0 kcal/mole) the group is discarded from that site.
- The results can be examined to eliminate groups converging to the same minimum. This process is repeated until minimization is complete (e.g. RMS gradient of 0.01 kcal/mole/Å).
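The seeding step of this procedure (random placement inside a sphere, followed by an energy cut-off) can be sketched as below. The energy function is a made-up placeholder standing in for a real protein force field, and the function names, sphere radius, and seed are assumptions for illustration; minimization and merging of converged copies are not shown.

```python
import math
import random


def random_point_in_sphere(center, radius, rng):
    """Uniform point in a sphere, by rejection sampling from the enclosing cube."""
    while True:
        offset = [rng.uniform(-radius, radius) for _ in range(3)]
        if sum(c * c for c in offset) <= radius * radius:
            return tuple(c + o for c, o in zip(center, offset))


def seed_groups(n, center, radius, energy, cutoff=5.0, seed=0):
    """Place n copies at random and keep those not above the energy cut-off."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        position = random_point_in_sphere(center, radius, rng)
        # A copy whose interaction energy is more positive than the cut-off
        # (kcal/mole) is discarded from that site.
        if energy(position) <= cutoff:
            kept.append(position)
    return kept


def toy_energy(position):
    """Placeholder: strongly repulsive within 2 Å of one 'protein atom' at the origin."""
    return 100.0 if math.dist(position, (0.0, 0.0, 0.0)) < 2.0 else 0.0


copies = seed_groups(1000, center=(0.0, 0.0, 0.0), radius=8.0, energy=toy_energy)
print(len(copies), "of 1000 copies survive the cut-off")
```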
- The next step is to connect the pieces with spacers assembled from small chemical entities (atoms, chains, or ring moieties) to form amino acid analogs; e.g., the disconnected groups can be linked in space to generate a single molecule using computer programs such as NEWLEAD (Tschinke et al. (1993) J Med Chem 36: 3863-3870).
- The procedure adopted by NEWLEAD executes the following sequence of commands: (1) connect two isolated moieties, (2) retain the intermediate solutions for further processing, (3) repeat the above steps for each of the intermediate solutions until no disconnected units are found, and (4) output the final solutions, each of which is a single molecule.
- Such a program can use, for example, three types of spacers: library spacers, single-atom spacers, and fuse-ring spacers.
- library spacers are optimized structures of small molecules such as ethylene, benzene and methylamide.
- The output produced by programs such as NEWLEAD consists of a set of molecules containing the original fragments now connected by spacers. The atoms belonging to the input fragments maintain their original orientations in space.
- The molecules are chemically plausible because of the simple makeup of the spacers and functional groups, and energetically acceptable because of the rejection of solutions with van der Waals radii violations.
- The method of the present invention may be performed in either hardware, software, or any combination thereof, as those terms are currently known in the art.
- the present method may be carried out by software, firmware, or microcode operating on a computer or computers of any type.
- software embodying the present invention may comprise computer instructions in any form (e.g., source code, object code, interpreted code, etc.) stored in any computer-readable medium (e.g., ROM, RAM, magnetic media, punched tape or card, compact disc (CD) in any form, DVD, etc.).
- such software may also be in the form of a computer data signal embodied in a carrier wave, such as that found within the well-known Web pages transferred among devices connected to the Internet. Accordingly, the present invention is not limited to any particular platform, unless specifically stated otherwise in the present disclosure.
- Exemplary computer hardware means suitable for carrying out the invention can be a Silicon Graphics Power Challenge server with 10 R10000 processors running in parallel.
- the functional forms of the non-bond energy in Equation 1 can have different forms.
- the dielectric constant in the Coulomb term can be distance dependent.
- the charges for both the protein and the ligand can be varied. This includes charges from experiment, or charges based on various models, such as, but not limited to, QEq ("Charge Equilibration," see Rappe and Goddard, J Phys. Chem. 95: 3358-63, 1991), Del Re (Del Re J. Chem. Soc.
- ReaxFF ReaxFF Polarizable Reactive Force Fields for Molecular Dynamics Simulation of Ferroelectrics
- The 6-12 Lennard-Jones potential can be made 6-10, or even softer, to allow closer contact.
- The hydrogen bond term can have several different variations. The three-body form is used in Equation 1, but two-body and four-body forms are also common and can be used similarly. Further, in some force fields, hydrogen bonding can also be treated implicitly as part of the Coulomb term.
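The variations of the non-bond terms listed above can be made concrete with a small sketch. Equation 1 is not reproduced in this passage, so the functional forms below are common textbook forms assumed for illustration: a Coulomb term with an optional distance-dependent dielectric, and a general n-m Lennard-Jones form parameterized by well depth `d0` and equilibrium distance `r0`. The names and the unit constant are assumptions, not taken from the source.

```python
# Assumed conversion factor so that charges in e and distances in Å give kcal/mol.
COULOMB_CONSTANT = 332.0637


def coulomb(q1, q2, r, eps0=1.0, distance_dependent=False):
    """Coulomb energy; with a distance-dependent dielectric, eps = eps0 * r."""
    eps = eps0 * r if distance_dependent else eps0
    return COULOMB_CONSTANT * q1 * q2 / (eps * r)


def lennard_jones(r, r0, d0, n=12, m=6):
    """General n-m potential with minimum -d0 at r = r0 (n=12, m=6 is the usual 6-12)."""
    rho = r0 / r
    return d0 * ((m / (n - m)) * rho**n - (n / (n - m)) * rho**m)


# At a compressed contact (r = 0.8 * r0) the 6-10 form is less repulsive than 6-12.
r0, d0 = 3.5, 0.1
print(lennard_jones(0.8 * r0, r0, d0, n=12))
print(lennard_jones(0.8 * r0, r0, d0, n=10))
print(coulomb(1.0, -1.0, 4.0), coulomb(1.0, -1.0, 4.0, distance_dependent=True))
```

Both Lennard-Jones variants share the same minimum, so softening the repulsive exponent changes only how steeply the energy rises when atoms are pushed closer than `r0`.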
- Solvation is an important factor in determining biomolecular stability and binding properties.
- Implicit solvent is used to minimize the structures and to calculate binding energies.
- These implicit solvent models include, but are not limited to, the Surface Generalized Born (SGB) model (Ghosh et al. J. Phys. Chem. 102: 10983-10990, 1998), the Solvent Accessible Surface Area (SASA) / Analytical Volume Generalized Born (AVGB) model (Zamanakos, Ph.D. Thesis, California Institute of Technology, Pasadena, CA, 2001), and the Poisson-Boltzmann (PB) model.
- Explicit solvent molecules can be easily added to the calculation of binding energy, as long as the solvation model is accurate enough to account for the solvation effect of ligand binding to the target protein. This can usually be used in the evaluation of the best cases for final selection in the design.
- Impact is a molecular mechanics program specifically designed to handle large macromolecular simulations.
- the program effortlessly treats simulations on ligand-protein systems in solvation or in vacuo, enabling the study of large systems in a timely manner.
- Impact's explicit solvation offers the possibility of setting up calculations in solvent boxes, where water or user-defined solvent molecules are placed at distinct sites of a solvent box. Periodic boundary conditions are also available.
- Continuum models greatly enhance computational speed.
- Impact's continuum solvation method is a surface area version of the generalized Born model (SGB). The SGB model has been shown to give significant improvement in accuracy over the uncorrected GB model.
- There are several side chain modeling methods that can be incorporated into the COP procedure. These side chain modeling methods include SCWRL, SCAP, and methods based on branch-and-bound and dead-end-elimination algorithms.
- SCWRL (Bower et al, J. Mol. Biol. 267: 1268-82, 1997) is a program for adding side-chains to a protein backbone based on the backbone-dependent rotamer library.
- The library provides lists of chi1-chi2-chi3-chi4 values and their relative probabilities for residues at given phi-psi values, and the program explores these conformations to minimize sidechain-backbone clashes and sidechain-sidechain clashes. It is possible to get output from the program at any of the three steps: best library rotamers, no clashes relieved; backbone clashes relieved; backbone and sidechain-sidechain clashes relieved.
- The present version of the program (scwrl2.7) predicts chi1 dihedral angles correctly to within 40 degrees for 81.0% of all residues.
- The method used by SCWRL is based on the hypothesis that a great deal of the information needed for sidechain positioning is contained in the local mainchain conformation of each residue, but that a search strategy to resolve steric exclusions is also necessary for the most accurate predictions.
- SCWRL is a single self-contained program optimized for speed and accuracy.
- SCWRL is designed to take full advantage of the rotamer approximation and the strong backbone dependencies rotamers display to create an initial placement for each residue, followed by systematic searches to resolve steric clashes.
- $E = 10.0$ for $R < 0.83\,R_0$, where $R$ is the distance between the atoms and $R_0$ is the sum of their radii.
- The linear portion of the function approximates the repulsive curve of a Lennard-Jones potential.
- A graph showing the energy function used by SCWRL and a Lennard-Jones potential (light line) is displayed on the SCWRL website. To make the steric term more forgiving on the rigid rotamers, the radii of atoms are reduced roughly 15% from their van der Waals radii, to values which approximate the distance where a Lennard-Jones potential would become repulsive.
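A piecewise-linear steric term of this kind can be sketched as follows. Only the capped branch (10.0 below 0.83 times the sum of radii) survives in the text above; the zero branch at and beyond the sum of radii and the straight ramp joining the two are assumptions made here so that the function is continuous, and the exact slope used by SCWRL may differ. The function name and parameter names are likewise hypothetical.

```python
def steric_energy(r, r0, e_max=10.0, flat_below=0.83):
    """Piecewise-linear clash energy between two atoms.

    r  : distance between the atoms
    r0 : sum of their (already reduced) radii
    Assumed shape: 0 for r >= r0, e_max for r <= flat_below * r0,
    and a straight line connecting those two values in between.
    """
    if r >= r0:
        return 0.0
    if r <= flat_below * r0:
        return e_max
    return e_max * (r0 - r) / (r0 * (1.0 - flat_below))


# Example with r0 = 3.0 Å: no clash, halfway up the ramp, and fully capped.
for distance in (3.2, 2.745, 2.0):
    print(distance, steric_energy(distance, 3.0))
```

Capping the penalty keeps a single bad contact from dominating the clash score, which matters when rigid rotamers cannot relax the way a minimized structure would.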
- SCWRL first calculates possible disulfide bonds from rotamers of cysteine residues and resulting distances between sulfur atoms. These residues are then frozen in their disulfide conformation, and the sidechain atoms are treated as part of the backbone scaffold in determining the conformations of the other residue types.
- The search strategy does not involve a search of every rotamer of every sidechain, but rather takes a structure with residues in their most favorable backbone-dependent rotamers and systematically resolves the conflicts that arise from that structure.
- Each residue begins in its most favored rotamer, according to the rotamer database used. This is the first stage structure.
- When a sidechain from the first stage structure has a steric clash as defined by Eq. A with the (fixed) mainchain, the rotamer for that residue is changed to progressively less favorable rotamers until one is found that does not conflict with the mainchain.
- the second stage structure has all of these sidechain to mainchain clashes relieved.
- each one represents an exclusive subset of residues which are allowed to interact with each other.
- Each cluster is solved, in turn, through a combinatorial search to find its minimum steric clash score.
- The stage three structure is output as the solution.
- the search procedure tests each residue and combination of residues in the order they were added to the cluster. Rotamers for each residue are tested in order of decreasing favorability. The first combination of rotamers which is found to have a steric clash score of zero is taken as the final solution. If no such combination is found, all the rotamer combinations are searched, and the combination with the minimum steric clash score is taken as the final solution.
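The search rule just described (stop at the first clash-free combination, otherwise fall back to the minimum clash score) can be sketched as below. This is an illustrative simplification, not SCWRL's code: rotamers are represented only by their rank (0 is the most favorable), the enumeration order is the lexicographic order of `itertools.product` rather than SCWRL's exact ordering, and the clash functions are made up.

```python
import itertools


def solve_cluster(rotamer_counts, clash_score):
    """Return (combination, score) for one cluster.

    rotamer_counts[i] is the number of rotamers of residue i, ranked so that
    index 0 is the most favorable. clash_score maps a tuple of rotamer
    indices to a non-negative steric clash score.
    """
    best_combo, best_score = None, float("inf")
    for combo in itertools.product(*(range(n) for n in rotamer_counts)):
        score = clash_score(combo)
        if score == 0:
            return combo, 0          # first clash-free combination wins
        if score < best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score    # no clash-free solution: take the minimum


def toy_clash(combo):
    """Made-up rule: residues 0 and 1 clash if they pick the same rank,
    and residue 1 also clashes with the backbone in its top rotamer."""
    return (10 if combo[0] == combo[1] else 0) + (10 if combo[1] == 0 else 0)


print(solve_cluster((3, 3), toy_clash))
print(solve_cluster((2, 2), lambda combo: 1 + sum(combo)))
```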
- clusters grow too large to be solved quickly with a combinatorial search.
- the cluster is broken into sub-clusters to speed the solution time.
- the limit is set for clusters that cannot be solved by the combinatorial search in approximately one second, which is reached for clusters containing more than 1.5E7 rotamer combinations, about 15 residues.
- A large cluster is parsed by finding the residue in that cluster whose removal from the cluster results in the smallest sub-clusters. Then each of the sub-clusters is solved in the presence of each of the "keystone" residue's potential rotamers. For example, in a 21-residue cluster where every residue has three potential rotamers, the combinations to search will number 3^21, or about 1.0E10.
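The saving from a keystone residue can be checked with a short calculation. The split assumed here, one keystone separating the 21-residue cluster into two independent 10-residue sub-clusters, is a hypothetical best case chosen for illustration; the text does not state how the example cluster divides.

```python
def full_search(n_residues, rotamers_per_residue):
    """Combinations when the whole cluster is searched at once."""
    return rotamers_per_residue ** n_residues


def keystone_search(sub_cluster_sizes, rotamers_per_residue):
    """Combinations when each sub-cluster is solved once per keystone rotamer."""
    per_keystone_rotamer = sum(rotamers_per_residue ** n for n in sub_cluster_sizes)
    return rotamers_per_residue * per_keystone_rotamer


print(full_search(21, 3))             # 10460353203, about 1.0E10
print(keystone_search([10, 10], 3))   # 354294
print(full_search(15, 3))             # 14348907, close to the 1.5E7 limit
```

Under that assumption the search shrinks by more than four orders of magnitude, and the last line shows why the stated limit of roughly 1.5E7 combinations corresponds to about 15 residues with three rotamers each.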
- residues are searched in order from high entropy sidechains to low entropy sidechains.
- all residues which clash in their library conformations are searched (in order of decreasing entropy) before sidechains which clash with lower probability rotamers of the original clustering residues.
- This parsing of clusters into sub-clusters is a recursive process, and if the sub-clusters still contain more combinations than the cutoff, or if a keystone residue fails to break a cluster into non-interacting sub-clusters, the remaining clusters are passed down to a new level for additional parsing. Only in some very rare cases, the parse routine cannot find a subset of keystones which breaks the cluster up fast enough to overcome the combinatorics.
- the backbone-independent rotamer library is similar to the familiar Ponder & Richards rotamer library (J. Mol. Biol. 193: 775-791, 1987).
- The backbone-dependent rotamer library defines a rotamer distribution for small ranges of phi and psi, basically a rotamer library for every 10 x 10 degree box of phi and psi.
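A lookup keyed on 10 x 10 degree phi/psi boxes might look like the sketch below. The binning convention (each box labeled by its lower edge), the dictionary layout, and the rotamer entries are all assumptions for illustration; the numbers are invented and do not come from the actual library.

```python
import math


def phi_psi_bin(phi, psi, width=10):
    """Lower-left corner of the width x width degree box containing (phi, psi)."""
    return (math.floor(phi / width) * width, math.floor(psi / width) * width)


# Toy library: (residue, phi_bin, psi_bin) -> list of ((chi1, chi2), probability).
toy_library = {
    ("LEU", -70, -50): [((-65.0, 175.0), 0.62), ((-177.0, 65.0), 0.30)],
}


def rotamers_for(residue, phi, psi, library):
    """Rotamers for a residue at the given backbone angles, most probable first."""
    return library.get((residue, *phi_psi_bin(phi, psi)), [])


print(phi_psi_bin(-63.2, -42.5))
print(rotamers_for("LEU", -63.2, -42.5, toy_library))
```

A backbone-independent library would drop the two bin fields from the key, giving one distribution per residue type regardless of phi and psi.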
- The site mentioned above includes backbone-independent and backbone-dependent rotamer libraries and the SCWRL program for making sidechain conformation predictions from the library and input phi and psi values and for evaluating the rotamers and chi angles of a preliminary x-ray, NMR, or model structure in comparison to the experimental distributions of rotamers and chi angles in the Protein Databank.
- the library is subject to frequent update (every couple of months).
- 2002 version of the library contains 850 chains from the PDB of resolution 1.7 Å or better, and less than 40% homology with other chains in the set.
- The list was determined from the Dunbrack group's algorithm for finding sets of PDB chains with maximum sequence identities and resolution cutoffs. More specifically, the chains used in constructing the database are all from x-ray crystallographic structures from the Brookhaven Protein Databank (PDB).
- The algorithm for selecting these lists is similar to that of Hobohm and Sander. The only difference is the addition of resolution cutoffs, so that one gets the largest list possible with PDB entries of a certain resolution or better, and also that the algorithm favors higher resolution structures over lower resolution structures (by proceeding from high resolution to low resolution in the reject-until-done procedure).
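The idea of favoring high-resolution chains while enforcing an identity cutoff can be illustrated with a simplified greedy pass. This is not the Hobohm and Sander procedure or the Dunbrack group's implementation: it simply visits chains from best to worst resolution and keeps a chain only if it stays below the identity cutoff against everything already kept. The chain names, resolutions, and identity values are made up.

```python
def select_chains(chains, identity, max_identity=40.0, max_resolution=1.7):
    """Greedy non-redundant selection.

    chains   : list of (chain_id, resolution_in_angstroms)
    identity : function (chain_a, chain_b) -> percent sequence identity
    """
    eligible = sorted(
        (c for c in chains if c[1] <= max_resolution), key=lambda c: c[1]
    )
    kept = []
    for chain_id, _resolution in eligible:
        if all(identity(chain_id, other) < max_identity for other in kept):
            kept.append(chain_id)
    return kept


# Toy input: D fails the resolution cutoff, A is redundant with the sharper B.
toy_chains = [("A", 1.5), ("B", 1.2), ("C", 1.6), ("D", 2.5)]
toy_identity = {
    frozenset("AB"): 90.0,
    frozenset("AC"): 20.0,
    frozenset("BC"): 30.0,
}

print(select_chains(toy_chains, lambda a, b: toy_identity[frozenset((a, b))]))
```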
- SCAP (Xiang and Honig, J. Mol. Biol. 311: 421-430, 2001): Scap is a program for protein side-chain prediction and residue mutation.
- The program can make predictions on all residues or on certain residues in a protein of multiple chains. It can automatically detect if the residue to be predicted or mutated is backbone only, or complete with all side-chain atoms. If the residue is backbone only, it will first add side-chains and then do predictions; if the residue is to be mutated, the residue is first mutated accordingly and prediction is then performed.
- Scap performs side-chain prediction using coordinate rotamer libraries.
- The side_small_rotamer library has 214 side-chain rotamers with a 40-degree chi angle cutoff.
- side_medium_rotamer has 3222 side-chain rotamers compiled from 297 chains with 20 degree cutoff and 96% representation
- side_large_rotamer has 6487 side-chain rotamers compiled from 297 chains with 10 degree cutoff and 96% representation
- side-mix-rotamer is a mix of side_medium_rotamer and side_large_rotamer. It includes all rotamers from side_large_rotamer except for ARG, LYS and MET which come from side_medium_rotamer.
- More coordinate rotamer and dihedral angle rotamer libraries can be downloaded from the website: http://trantor.bioc.columbia.edu/~xiang/sidechain/index.html. More detailed information about the methods and algorithms in scap can be found in the papers: Xiang Z, Honig B. Extending the accuracy limits of prediction for side-chain conformations. J. Mol. Biol. 2001 311:421-30 (incorporated herein by reference); and Xiang, Z; Honig, B. Colony energy improves prediction of exposed sidechain conformations. The software and manual can be downloaded at: http://trantor.bioc.columbia.edu/~xiang/jackal/index.html.
- Scap supports the following platforms: SGI, Sun Solaris, Linux, and Microsoft Windows.
- COP is a generic method that can be applied to any target protein for recognizing a desired active ligand (AL) vs. one or more structurally related inactive ligand(s) (IL).
- the ligand type can be any molecule that is a binding target to a protein and has some sort of anchoring point as the reaction center.
- The ligand itself may be a protein, and thus the binding can be between two proteins. For example, if protein X normally binds protein Y, then a mutant protein X (denoted X') can be used as a target protein in COP, such that a complementary counterpart protein Y' can be designed for binding to X'.
- The instant invention uses in the design calculation a full force field, including hydrogen-bonding capability and solvation effects. Hence the all-atom energy function is more accurate, and also biased towards recognizing the new ligand compared to its competitors. In addition, proteins are allowed to be fully movable in the stage of binding calculation. The algorithm has been designed to make few mutations to recognize a desired ligand. The conformational search of side-chain rotamers can be exhaustive. All these features and combinations thereof allow COP to be a unique design approach.
- GUI graphic user interface
- Glade a GTK-based free user interface builder
- Figure 7 shows the screen snapshot when COP is started using the graphical interface.
- Clicking a button brings out a popup window with the corresponding information. For example, clicking the "About" button will open a window showing the version and copyright information of COP.
- The help window is designed to let the user know the conventions used in the COP program.
- The four buttons in the bottom right of the window carry out four functions in COP, each corresponding to a different program. "Calculate Clash" runs the clash identification to find mutation residues and their mutation targets to relieve clashes.
- "HB Builder" uses a rotamer library to build possible hydrogen bond donor or acceptor residues in the binding site to stabilize new polar atoms in the analog ligand, if there are any.
- "Combi Mutation" carries out the combined mutation step in COP and calculates the binding energies of each ligand, including competing natural amino acids, to the generated mutants. A list of top candidates is given at the end. Finally, the "Stability Check" step eliminates any mutant that potentially cannot fold correctly.
- The ratio of natural / analog amino acids may be controlled so that the degree of incorporation may be adjusted.
- 100% of one or more of the natural amino acids such as Phe
- Two or more analogs may be used in the same in vitro or in vivo translation system. These analogs can be analogs of the same amino acid (for example, different Phe analogs), or different amino acids (for example, analogs of Phe and Tyr, respectively).
- One or more AARSs of the instant invention can be recombinantly produced and supplied to any of the available in vitro translation systems (such as the commercially available Wheat Germ Lysate-based PROTEINscript-PRO™, Ambion's E. coli system for coupled in vitro transcription/translation; or the rabbit reticulocyte lysate-based Retic Lysate IVT™ Kit from Ambion).
- The in vitro translation system can be selectively depleted of one or more natural AARSs (by, for example, immunodepletion using immobilized antibodies against natural AARS) and/or natural amino acids so that enhanced incorporation of the analog can be achieved.
- nucleic acids encoding the re-designed AARSs may be supplied in place of recombinantly produced AARSs.
- the in vitro translation system is also supplied with the analogs to be incorporated into mature protein products.
- Although in vitro protein synthesis usually cannot be carried out on the same scale as in vivo synthesis, in vitro methods can yield hundreds of micrograms of purified protein containing amino acid analogs.
- Such proteins have been produced in quantities sufficient for their characterization using circular dichroism (CD), nuclear magnetic resonance (NMR) spectrometry, and X-ray crystallography.
- This methodology can also be used to investigate the role of hydrophobicity, packing, side chain entropy and hydrogen bonding in determining protein stability and folding. It can also be used to probe catalytic mechanism, signal transduction and electron transfer in proteins.
- the properties of proteins can be modified using this methodology. For example, photocaged proteins can be generated that can be activated by photolysis, and novel chemical handles have been introduced into proteins for the site specific incorporation of optical and other spectroscopic probes.
- One or more AARSs of the instant invention can be supplied to a host cell (prokaryotic or eukaryotic) as genetic materials, such as coding sequences on plasmids or viral vectors, which may optionally integrate into the host genome and constitutively or inducibly express the re-designed AARSs.
- A heterologous or endogenous protein of interest can be expressed in such a host cell, in the presence of supplied amino acid analogs.
- the protein products can then be purified using any art-recognized protein purification techniques, or techniques specially designed for the protein of interest. The above described uses are merely a few possible means for generating a transcript which encodes a polypeptide.
- any means known in the art for generating transcripts can be employed to synthesize proteins with amino acid analogs.
- Any in vitro transcription system or coupled transcription / translation system can be used to generate a transcript of interest, which then serves as a template for protein synthesis.
- Any cell, engineered cell / cell line, or functional components that are capable of expressing proteins from genetic materials can be used to generate a transcript.
- Such systems typically comprise an RNA polymerase (T7, SP6, etc.), co-factors, nucleotides (ATP, CTP, GTP, UTP), necessary transcription factors, and appropriate buffer conditions, as well as at least one suitable DNA template, but other components may also be added for optimized reaction conditions.
- a skilled artisan would readily envision other embodiments similar to those described herein.
- the instant invention provides methods and implementing computer software for designing mutant proteins (the Target Protein or TP) that will preferentially bind one list of prespecified ligands (Active Ligands or AL) with respect to another list of ligands (The Inactive Ligands or IL).
- Example 1: Designing mutant tyrosyl-tRNA synthetase from Methanococcus jannaschii for recognizing O-methyl-L-tyrosine
- Applicants have applied the COP algorithm to design mutants of tyrosyl-tRNA synthetase from Methanococcus jannaschii (M. jann-TyrRS) for selective binding of OMe-Tyr (see Scheme 1). Since the crystal structure of mj-TyrRS is not available, Applicants predicted the three-dimensional structure for wild-type mj-TyrRS, based on a combination of the STRUCTFAST sequence alignment and structure prediction algorithm with molecular dynamics (MD) including continuum solvent forces. [To select the 5 residues to modify in their experiments, Wang et al.
- TyrRS crystal structures in the Protein Data Bank. They are all from Bacillus stearothermophilus, with different ligands in the structures. Structure 2ts1 has no ligand, 3ts1 has Tyr-AMP bound, and 4ts1 has a Tyr in the binding site.
- Genbank accession number: Q57834
- The three-dimensional structure of the main chain of mj-TyrRS was predicted with the STRUCTFAST homology modeling technique.
- The structure of 4ts1 was used as the template in the prediction. The sequence identity between the two sequences is 32.1%.
- The main chain atoms of the initial predicted mj-TyrRS structure agree with the corresponding residues of the 4ts1 structure to 0.64 Å in root mean square difference (rmsd) in coordinates after aligning the two structures using DALI (Holm and Sander, J. Mol. Biol. 233: 123-38, 1993). After full minimization, the main chain rmsd increases to 1.75 Å for the 139 structurally aligned residues. But the conserved five residues (Tyr32, Tyr151, Gln155, Asp158, and Gln173) in the binding site have an rmsd of 0.62 Å for all heavy atoms.
- Applicants matched the side chain conformation of the five strictly conserved residues (Tyr32, Tyr151, Gln155, Asp158 and Gln173) in the binding site of mj-TyrRS with those conformations from the 4ts1 crystal structure.
- The rest of the side chains for the predicted mj-TyrRS structure were added by using the side chain modeling program SCWRL version 2.7 (Bower et al, J. Mol. Biol 267: 1268-82, 1997; Dunbrack, Proteins Suppl, 81-7, 1999) while keeping the conformations in the binding site fixed.
- AARSs must be able to charge the correct amino acid to its corresponding tRNA.
- The activation step consists of the bound amino acid forming the aminoacyl adenylate complex and subsequently transferring the aminoacyl group to the 3'-end of the bound tRNA.
- The Tyr ligand obtained in mj-TyrRS structure optimization was used to build 19 other amino acids.
- The contact between the zwitterions of the amino acid and the appropriate residues in the binding site was fixed.
- SCWRL was used to mutate the side chain into 19 other amino acids.
- Each of the resulting amino acids was minimized in the binding site of the protein using conjugate gradient method.
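- The binding energy of each amino acid is calculated as: $\Delta G_{\text{bind}} = \Delta G(\text{protein} + \text{ligand}) - \Delta G(\text{protein}) - \Delta G(\text{ligand})$, where $\Delta G(\text{protein} + \text{ligand})$ is the free energy of the complex, and
- $\Delta G(\text{protein})$ and $\Delta G(\text{ligand})$ are the free energies of the protein and ligand alone, respectively.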
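As a worked illustration of this bookkeeping, the sketch below assumes the usual definition of a binding energy: the energy of the complex minus the energies of the separated protein and ligand. The function name is hypothetical and the three input values are placeholders chosen only to make the arithmetic easy to follow; they are not results from the study.

```python
def binding_energy(e_complex, e_protein, e_ligand):
    """Binding energy as complex minus separated partners (all in kcal/mol).

    Assumed convention: a more negative value means tighter binding.
    """
    return e_complex - e_protein - e_ligand


# Placeholder energies, each imagined as the minimized total energy
# (including continuum solvation) of one of the three systems.
example = binding_energy(e_complex=-1250.0, e_protein=-1100.0, e_ligand=-110.0)
print(example)  # -40.0
```

The same three-energy difference would be evaluated once per ligand and per mutant, so that binding of the analog and of the competing natural amino acids can be compared on a common footing.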
- The structure optimization was always done with SGB continuum solvation.
- Because such a continuum solvation model is optimized with the potential of mean force (PMF) from bulk solvent, the total energies are very close to the free energy of the system (Hendsch and Tidor, Protein Sci. 8: 1381-92, 1999). This is especially true for tightly bound complexes.
- Scheme 1 shows the natural amino acid Tyr, and its analog OMe-Tyr (rotamer 1), for which the mutant protein was designed.
- OMe-Tyr has two equally favorable rotamers, one shown in Scheme 1 (denoted 1) and the other with the -OMe group pointed down (denoted 2). Both rotamers were matched in the binding site of Tyr in wild type M. jann-TyrRS, keeping the zwitterion's end fixed in the structure. Component analysis of the contribution of each residue in the binding site to the binding of OMe-Tyr was calculated using Equation 2 (supra). The binding site was defined in this case as the entire residue for all atoms of the protein within 6 Å of any atom of the ligand. This leads to the identification of 26 binding site residues in Table 1. Since rotamer 2 had severe clashes with protein backbone residue Gly34, while rotamer 1 had none, we considered only rotamer 1 further.
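The binding-site definition used here (a whole residue is included if any of its atoms lies within 6 Å of any ligand atom) can be sketched as follows. The atom representation, the function name, and the toy coordinates are assumptions for illustration and are not taken from the modeled structure.

```python
import math


def binding_site_residues(protein_atoms, ligand_atoms, cutoff=6.0):
    """Residues with at least one atom within `cutoff` Å of any ligand atom.

    protein_atoms : list of (residue_id, (x, y, z))
    ligand_atoms  : list of (x, y, z)
    """
    site = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in site:
            continue  # residue already selected through another atom
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            site.add(residue_id)
    return sorted(site)


# Toy coordinates (made up): Tyr32 qualifies through one close atom only.
toy_protein = [
    ("Tyr32", (0.0, 0.0, 0.0)),
    ("Tyr32", (9.0, 0.0, 0.0)),
    ("Gly34", (5.0, 0.0, 0.0)),
    ("Leu65", (20.0, 0.0, 0.0)),
]
toy_ligand = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]

print(binding_site_residues(toy_protein, toy_ligand))
```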
- Tyr-32: Glu, Asp, Gln, Phe, and Met.
- In stage 2, we used SCWRL (Side-chain placement With a Rotamer Library) to generate the side-chain configurations for the two mutated residue positions for each of the five combinations of simultaneous mutations selected from stage 1 (with 5 choices for Tyr32 and 1 for Asp158). These side-chains were optimized separately for the analog and for Tyr in the binding site. This leads to 5 full protein structures for each ligand. We then carried out energy minimization for the new side chains with all other atoms fixed. This was followed with energy minimization of the whole protein with either OMe-Tyr or Tyr bound for all five mutants. The binding energies (including solvation) for both OMe-Tyr and Tyr were then calculated for all mutants. This leads to the binding energies in the top box of Table 3 for OMe-Tyr and Tyr bound to the five mutants.
- the first row is for the wild type, which binds Tyr well but not OMe-Tyr.
- the next five rows (boxed) consider the mutations for Y32 and D158 identified in Table 2.
- the three cases denoted as ** are considered to be promising cases worth testing.
- the last five rows consider these same five mutations, but with the E107T and L162P mutations observed in the experiments of Wang et al, Science 292: 497-500 (see below).
- the case denoted as *** is the one determined experimentally.
- the results for the five mutant proteins can be compared to the wild type (given in the top row of numbers in Table 3), which gives 44 kcal/mol for Tyr but -12 kcal/mol for OMe-Tyr. All five mutants bind Tyr less strongly (38 to 42 kcal/mol), while these five mutants bind OMe-Tyr by 37 to 49 kcal/mol. Of the five mutants, three favor binding of OMe-Tyr over Tyr by at least 5 kcal/mol.
- Figure 2 shows the predicted binding site of OMe-Tyr for the best mutant, [Y32Q, D158A]. It is observed that residues Ala67, Ala158, and Leu65 form a hydrophobic pocket for the methyl group of OMe-Tyr.
- the amide Nε2 of Gln32 has close contact with the oxygen atom of the OMe group (3.79 Å), while the Oε1 atom of Gln32 is stabilized by forming a weak hydrogen bond (3.58 Å) with the main chain NH of Leu65. These H-bonds might be further stabilized by an intervening water.
- the mutant [Y32M, D158A] is also a favorable candidate. However for
- Glu107 is on the surface of the protein, 12.9 Å away from the Tyr ligand (from the C ⁇ of Glu107 to the Oη of the Tyr ligand), and Leu162 is 14.5 Å away from the Tyr ligand.
- Leu162 and Glu107 in M. jann-TyrRS correspond to Leu180 and Asn123 in the 4ts1 structure.
- Leu180 is in the middle of a beta strand on the bottom of the binding site in 4ts1.
- the Leu-to-Pro mutation at this position would be expected to disrupt the secondary structure. Since Asn123 is close to the core of the protein in 4ts1, it seems unlikely that a charged Glu would fold into this structure.
- D158A, E107A, L162P] mutant is calculated to be only about 27 kcal/mol, while the net binding of Tyr to the mutant is calculated to be about 18 kcal/mol. This could explain the observation that the mutant led to an incorporation rate much slower than for the natural amino acid. Thus the calculations do seem to be consistent with the experiment of Wang et al, given how the experiment was carried out.
- Example 2 Designing Mutant mj-Tyr-tRNA Synthetase for recognizing Naphthyl-Alanine
- Applicants compare these mutants with the experimental mutant selected from a library of 20^5 mutants with five positions each replaced by one of the 20 natural amino acids (Wang et al, J. Am. Chem. Soc 124: 1836-1837, 2002). These five positions are Y32, D158, I159, L162, and A167, and the experimental mutant is Y32L-D158P-I159A-L162Q-A167V. Compared with the COP-designed mutants, the first two mutation sites are the same as those COP found. However, the other three residues, I159, L162 and A167, are not in contact with the analog naph-Ala; thus COP did not identify them as mutation residues.
- the mutation Y32L also appears in two designed mutants with good binding affinity toward naph-Ala.
- P was not a choice for D158 in the COP design, because the D158P mutation requires a main chain conformational change.
- the phi/psi angles for D158 in mj-TyrRS are -58°/113°, while for P they should be either -57°/-38° or -63°/139°.
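The backbone-angle comparison can be made concrete with a short calculation. The sketch below only measures how far the quoted D158 angles lie from the two proline regions quoted in the text; the helper names and the use of the larger of the two angular deviations as a distance measure are assumptions made for illustration.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

# Proline phi/psi regions as quoted in the text.
PRO_PHI_PSI = [(-57.0, -38.0), (-63.0, 139.0)]

def closest_proline_region(phi, psi):
    """Return (largest single-angle deviation, reference region) for the
    quoted proline region nearest to the given phi/psi pair."""
    return min(
        (max(angle_diff(phi, p), angle_diff(psi, s)), (p, s))
        for p, s in PRO_PHI_PSI
    )

# phi/psi of D158 as quoted in the text.
deviation, region = closest_proline_region(-58.0, 113.0)
```

Under this measure the nearest quoted region is -63°/139°, and the psi angle would have to shift by 26°, which is in line with the statement that D158P needs a main chain adjustment.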
- the mutation V159A facilitated the main chain conformational change by allowing the backbone to move further away from the ligand.
- The next step is to try all 20 amino acids at positions Y32 and D158, one at a time.
- the mutated residue is minimized with everything else fixed first, followed by calculating the interaction energy of the mutation with the ligands and the rest of the protein.
- a preferential score for each amino acid is then calculated as 95% of the differential interaction energy with keto-Tyr and with Tyr, plus 5% of the constraint energy of the mutated residue with its neighbor residues in protein.
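The 95%/5% weighting can be written as a one-line function. This is a sketch only: the function name, the argument names, and the sign convention (differential taken as analog minus natural ligand, so that a more negative score favors the analog) are assumptions, since this passage does not fix the sign convention. The energies in the example are hypothetical.

```python
def preferential_score(e_analog, e_native, e_constraint,
                       w_diff=0.95, w_constraint=0.05):
    """Weighted score for one candidate amino acid at one site.

    e_analog:     interaction energy of the mutated residue with the analog
    e_native:     interaction energy of the mutated residue with the natural ligand
    e_constraint: constraint energy of the mutated residue with its neighbors
    All energies in kcal/mol; differential = e_analog - e_native (assumed).
    """
    return w_diff * (e_analog - e_native) + w_constraint * e_constraint

# Hypothetical energies, for illustration only.
score = preferential_score(e_analog=-6.0, e_native=-2.0, e_constraint=4.0)
```

Because the constraint term carries only 5% of the weight, it mainly acts as a tie-breaker that penalizes side chains straining against their neighbors.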
- PheRS in the PDB. Instead, PheRS from Thermus thermophilus has been crystallized and solved under different conditions previously (18-21). Because the homology between PheRSs from E. coli and T. thermophilus is very high (46.2% identical residues, with only a few deletions), we used PheRS from T. thermophilus as the modeling system.
- the structure of PheRS complexed with Phe (PDB ID: 1B70, resolution 2.7 Å) was downloaded from the Protein Data Bank, and hydrogens were added using Biograf (Accelrys, San Diego, CA). The structure was minimized with the conjugate gradient method to an rms force of 0.1 kcal/mol/Å or a maximum of 5000 steps.
- the DREIDING force field was used for the energy expression.
- the protein was described using CHARMM22 (supra) charges, while charges for the ligands were Mulliken charges derived from molecular orbitals in quantum mechanics using Jaguar 4.5 (Schrodinger, Portland, OR). The minimized structure was used in the design.
- the keto-Phe analog was built from the Phe ligand in the minimized structure. There are two rotamers with equally favorable energies (see panel 3 of Figure 4). For the opportunity part, Applicants implemented a rotamer-library-based procedure to build potential hydrogen bonds between the protein and the new ligand. The library was based on the protein side chain rotamer library used by SCAP (supra). First the new analog ligand is compared with the wild-type ligand to see if there is an extra polar atom in the analog ligand. If no such atom is found for the analog, no hydrogen bond building is necessary.
- FIG. 5 is the minimized PheRS in ribbon representation with Phe shown as ball model.
- the two rotamers of keto-Phe were built from the Phe ligand in the minimized PheRS/Phe complex structure.
- Each rotamer of keto-Phe was matched into the binding site of Phe in PheRS, and clashes were calculated for each rotamer. Rotamer 1 of keto-Phe clashes with the protein backbone at G284 and A283, while rotamer 2 does not clash with any backbone atoms. Therefore, only rotamer 2 was used in the following steps of design.
- Two residues, V261 and A314, were identified as mutation target residues. Each of them was mutated into all 20 amino acids using SCWRL. A backbone-dependent rotamer library was used to place the side chain conformation with the lowest constraint energy in the mutation site. Using a cutoff of 0 kcal/mol, only one choice was selected for each of V261 and A314, and both were Gly. There is a polar oxygen atom in the keto group. However, the hydrogen bond design algorithm in COP did not find optimal hydrogen bonds for the keto group. Thus COP designed a V261G-A314G mutant only. Previously we did the hydrogen bond design part by visualization and decided to build a hydrogen bond donor residue on V286.
- An L222 mutation was also tried, to make room for the V286 mutation by changing L222 to smaller residues. Both rotamers of keto-Phe and some competitors (Phe and Tyr in this case) were used as binding ligands. As a test, the binding energy to the wild-type PheRS and an A314G mutant were also calculated. The A314G mutant has been previously shown to be able to bind p-Br-Phe. V261G-A314G-V286R was a mutant we designed previously using visualization as a procedure to build a hydrogen bond between the protein and the analog. Using the new hydrogen bond builder with a side chain rotamer library, COP did not choose any residue to build hydrogen bond donors.
- this mutant now shows a less favorable binding energy than Tyr, thus it will also be rejected.
- the stability check also failed to give a stable protein fold.
- the V261G-A314G mutant shows a good differential binding energy between keto-Phe and its competitors, Phe and Tyr in this case. It favors keto-Phe binding over Tyr, the closest competitor, by 7.45 kcal/mol.
- Figure 3 shows the binding site of keto-Phe in the designed V261G-A314G mutant. There is no specific polar interaction with the carbonyl group of the side chain of keto-Phe from the protein, i.e., no hydrogen bond is formed.
- the two mutations V261G and A314G enable the binding of keto-Phe by making the binding site larger to accommodate the extra acetyl group in the side chain of keto-Phe.
- Other interactions remain the same as seen in the wild-type Phe-PheRS complex.
- Residues 258-261, 282-284, and 314-316 form the binding pocket for the side chain of keto-Phe.
- E220, S180 and Q218 form hydrogen bonds with the N-terminus.
- W149, H178 and R204 form hydrogen bonds with the C-terminus.
- Applicants have applied the COP protein design tool to design mutant PheRS for the in vivo incorporation of p-keto-Phe.
- a mutant V261G-A314G was designed, and showed good binding affinity to keto-Phe and good differential binding to keto-Phe over its competitors from the natural amino acids. The mutant has been experimentally tested to be able to recognize the target analog p-keto-phenylalanine (keto-Phe).
- Example 5 Designing Mutant Trp-tRNA Synthetase for recognizing non-natural amino acid analogs
- NBD-Ala 2-amino-3-(7-nitro-benzo[1,2,5]oxadiazol-4-ylamino)-propionic acid
- Bpy-Ala 2-amino-3-[2,2']bipyridinyl-5-yl-propionic acid
- TrpRS from B. stearothermophilus with Trp bound in the binding site was downloaded from the Protein Data Bank. Hydrogens were added to the structure using Biograf (Accelrys, San Diego, CA), followed by annealing on the hydrogens to optimize the hydrogen bond network. The heavy atoms were fixed during the annealing. The structure was subjected to further optimization using conjugate gradient minimization with all atoms movable for 2000 steps and with a convergence criterion of rms force reaching less than 0.1 kcal/mol/Å.
- the simulation program MPSim (supra) was used along with the DREIDING force field (supra).
- a score is calculated by considering the differential interaction energy of the mutated residue with the analog and the wild-type amino acid, and the constraint energy of the mutated residue with the rest of the protein. Mutations with positive scores are selected for use in making mutants in the combined stage. The mutant proteins are then generated by combining the choices for each mutation site, and optimized by minimization. The binding energy of the analog to the mutant is calculated with Equation 2. The binding energies of competitors from natural amino acids are also calculated, and the differential binding energy between the analog and the best competitor is used as a criterion to select mutants.
- the two rotamers of NBD-Ala were placed into the binding site of TrpRS, and the interaction energies of each residue with NBD-Ala and Trp ligand were calculated using Equation 1. The difference of the interaction between NBD-Ala and Trp was then calculated.
- the binding site is defined as within 6 Å of the side chain of NBD-Ala.
- Five residues (V141, V143, D132, M129 and F5) were found to have less favorable interactions with rotamer 1 of NBD-Ala compared to Trp.
- the same five residues had less favorable interactions with rotamer 2 of NBD-Ala.
- a cutoff value of 1 kcal/mol was used to select residues to mutate.
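The cutoff-based selection of residues to mutate can be sketched as follows. This is an illustrative sketch, not the COP code: the function name, the dictionary inputs, and the convention that a more positive interaction energy is less favorable are assumptions, and the residue labels and energies are hypothetical placeholders rather than values from the tables.

```python
def residues_to_mutate(e_analog, e_native, cutoff=1.0):
    """Return (residue, difference) pairs for residues whose interaction with
    the analog is at least `cutoff` kcal/mol less favorable (more positive)
    than with the natural ligand, worst clash first."""
    flagged = []
    for residue, energy_with_analog in e_analog.items():
        difference = energy_with_analog - e_native[residue]
        if difference >= cutoff:
            flagged.append((residue, difference))
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical per-residue interaction energies in kcal/mol.
e_with_analog = {"res1": 12.0, "res2": 1.5, "res3": -0.5, "res4": -2.0}
e_with_native = {"res1": -1.0, "res2": -0.5, "res3": -4.0, "res4": -2.2}

targets = [name for name, _ in residues_to_mutate(e_with_analog, e_with_native)]
```

Sorting by the size of the difference puts the residues with the most severe clashes first, which is convenient when the number of mutation sites has to be kept small.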
- V143 I, M, N, G, S, A, H, C, T, P, V are chosen for D132; M, L, I are chosen for M129; F, W, N are chosen for F5.
- A, G, Q are chosen for V141; T, N, C, A, S, G are chosen for V143; Q, W, L, I, M, G are chosen for D132; M , L, I, N, S are chosen for M129; Y, F are chosen for F5.
- the top mutants designed by COP for rotamer 2 of NBD-Ala are: (1) [V141G, V143T, D132I, M129L, F5F]; (2) [V141G, V143T, D132I, M129N, F5F]; (3) [V141G, V143T, D132L, M129S, F5F]; (4) [V141A, V143T, D132L, M129S, F5F]; (5) [V141G, V143T, D132M, M129I, F5F]; (6) [V141A, V143T, D132M, M129I, F5F]; (7) [V141G, V143T, D132M, M129L, F5F]; (8) [V141G, V143C, D132L, M129N, F5F]; (9) [V141G, V143T, D132I, M129M, F5F]; (10) [V141G, V143S, D132L M129I, F
- V141 and M129 were always mutated to the same residue.
- V141, which had a severe clash with NBD-Ala before mutation, can be mutated to either G or A. This mutation presumably relieves the clash between the nitro group of NBD-Ala and the protein.
- M129 seemed to be less critical as it could be either M or I. Both seem to have the same size.
- the V143T mutation formed a hydrogen bond with the nitro group. D132 recognizes the side chain NH of the Trp ligand in wild-type TrpRS, and mutation D132I blocks the normal binding of Trp. As a result, Trp is hardly a competitor in the competitive binding.
- When these rotamers were simply placed in the binding site of TrpRS, they clashed severely with the backbone of TrpRS. The reason was that these rotamers were not aligned optimally with the binding site for Trp. However, the alignment is significantly better after optimization. In order to optimize the orientation of the side chain of bpy-Ala, the following procedure was adopted to get the conformation with the lowest backbone clash energy. First a mutant TrpRS with all Gly was generated, i.e., all the side chains of the protein were removed.
- a grid of conformations for bpy-Ala was then generated by changing the χ1 and χ2 angles (as usual, χ1 was defined as the dihedral angle of N-Cα-Cβ-Cγ, and χ2 was defined as the dihedral angle of Cα-Cβ-Cγ-Cδ).
- the bpy-Ala analog was put into the binding site of the all Gly TrpRS, and the energy of the complex was calculated after 10 steps of steepest descents minimization of the analog with the protein fixed. The energy was plotted in a two-dimensional energy surface plot.
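The two-dimensional dihedral scan can be sketched as a simple grid search. This is a minimal sketch under stated assumptions: the 30° grid spacing is hypothetical (the text does not give one), and `toy_energy` is a stand-in for the real step of placing the analog in the all-Gly protein and evaluating the energy after a short minimization.

```python
from itertools import product

def chi_grid(step=30):
    """All (chi1, chi2) pairs on a regular grid covering [-180, 180) degrees."""
    angles = range(-180, 180, step)
    return list(product(angles, angles))

def lowest_energy_conformation(energy_fn, step=30):
    """Return the (chi1, chi2) grid point with the lowest energy."""
    return min(chi_grid(step), key=lambda chis: energy_fn(*chis))

def toy_energy(chi1, chi2):
    """Hypothetical smooth surface with a single minimum at (60, -90)."""
    return (chi1 - 60) ** 2 + (chi2 + 90) ** 2

best = lowest_energy_conformation(toy_energy)
```

Keeping the energies of all grid points, rather than only the minimum, would give the data for the two-dimensional energy surface plot described in the text.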
- Rotamer 1 had 8 residues that need to be mutated if the cutoff value of 0.5 kcal/mol is used. Therefore, rotamer 2 was chosen for design first since it had fewer main chain clashes and only 5 mutations to do. The 5 mutation residues were F5, D132, I133, V40, and M129. A cutoff of 0.5 kcal/mol was used. Applicants then tried to mutate each of the five residues to all 20 natural amino acids one by one. The mutated residue conformation was chosen from a rotamer library with the lowest energy rotamer being selected.
- a score was calculated as 95% of the differential interaction energy of the mutated residue with bpy-Ala and Trp, plus 5% of the constraint energy between the mutated residue and the rest of the protein. Mutations with positive score were chosen. For F5, G was chosen; for D132, G and A were chosen; for I133, T, S, M, A, G, C were chosen; for V40, A and G were chosen; for M129, M, L, I, F, H, C, V were chosen.
- Each mutant was scored by the binding energy to bpy-Ala using Equation 2.
- a cutoff of 25 kcal/mol was used to select good binding mutants to calculate binding energies to competitors from natural amino acids.
- Trp, Tyr and Phe were used to select good binding mutants.
- the difference of the binding energy between bpy-Ala and the competitor with the best binding energy was calculated and used to select best mutants.
- Finally a stability check for each mutation was performed for each mutant. Those mutants with mutations making unfavorable protein-protein interactions were discarded due to their possible problem with folding.
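The selection steps above (binding cutoff, then differential binding against the best competitor) can be sketched as a small ranking routine. This is an illustration only: the function and argument names are assumptions, larger binding energies are assumed to mean tighter binding, the stability check is not modeled, and the mutant names and energies are hypothetical.

```python
def rank_mutants(binding, analog, competitors, min_binding=25.0):
    """Rank mutants by differential binding of the analog over the best competitor.

    binding: {mutant_name: {ligand_name: binding_energy_in_kcal_per_mol}}
    Mutants whose analog binding energy is below `min_binding` are dropped.
    Returns a list of (mutant, best_competitor, differential), best first.
    """
    ranked = []
    for mutant, energies in binding.items():
        if energies[analog] < min_binding:
            continue  # analog does not bind well enough to this mutant
        best_competitor = max(competitors, key=lambda lig: energies[lig])
        differential = energies[analog] - energies[best_competitor]
        ranked.append((mutant, best_competitor, differential))
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked

# Hypothetical binding energies in kcal/mol.
toy_binding = {
    "mutant_A": {"bpy-Ala": 30.0, "Trp": 22.0, "Tyr": 18.0, "Phe": 20.0},
    "mutant_B": {"bpy-Ala": 27.0, "Trp": 26.0, "Tyr": 15.0, "Phe": 12.0},
    "mutant_C": {"bpy-Ala": 20.0, "Trp": 10.0, "Tyr": 9.0, "Phe": 8.0},
}
ranking = rank_mutants(toy_binding, "bpy-Ala", ["Trp", "Tyr", "Phe"])
```

In this toy input `mutant_C` is dropped by the binding cutoff even though its differential is large, which shows why the two criteria are applied in sequence.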
- mutants designed by COP, three residues have the same mutation in all of them. These mutations are F5G, V40A and M129C. D132 is mutated to either G or A. I133, which has a slight clash with bpy-Ala in the main chain, can be mutated into A, S, T, or M. In mutant F5G-D132A-I133T-V40A-M129C, it can be seen that the extra six-member ring takes the space opened by mutation F5G. D132A also opens some space for the extra six-member ring in bpy-Ala. Other mutations contribute to the binding of bpy-Ala by shaping the binding site according to the orientation assumed by bpy-Ala upon its binding.
- V40A makes the orientation of the C ⁇ -C ⁇ bond possible.
- I133T seems to form a weak hydrogen bond with the nitrogen atom in the second six-member ring in bpy-Ala.
- the distance between the nitrogen atom and the Oγ1 in T133 is 3.2 Å.
- Example 6 Designing Mutant Phe-tRNA Synthetase for recognizing naphthyl-Ala
- the incorporation of non-natural amino acids in vivo is mostly determined by the recognition of the non-natural amino acids by aminoacyl-tRNA synthetases.
- Manipulation of the activity of aminoacyl-tRNA synthetases provides a way to significantly increase the number of non-natural amino acids that can be incorporated in vivo.
- the mutant has three mutations (V261G, F258I, and A314G), and computationally shows both a good binding affinity to naphthyl-alanine (31.25 kcal/mol) and a good differentiation toward tryptophan (6.08 kcal/mol), the closest competitor from natural amino acids.
- This mutant was selected from 54 mutants Applicants tried using the COP method.
- AARSs aminoacyl-tRNA synthetases
- the accuracy of the reaction is essential to the fidelity of protein biosynthesis.
- protein biosynthesis is a great tool to make biomaterials with precise control over sequence, structure and function.
- the monomer pool is limited to the 20 natural amino acids. It has been shown that the monomer pool of amino acids can be increased by incorporating some non-natural amino acids using the wild-type AARS apparatus.
- non-natural amino acids incorporated into protein using wild-type AARSs are small, and the functionalities carried by these non-natural amino acids are very limited.
- these non-natural amino acids are analogs of natural amino acids with little difference in the side chain.
- non-natural amino acids that have desired chemical or physical properties cannot be incorporated this way. The most important reason is that these amino acid analogs are very different from their natural amino acid counterparts. Therefore, they are rejected by the AARSs in the esterification to tRNAs.
- the promise of AARS activity manipulation is the expansion of the genetic code by developing novel tRNA:AARS pairs orthogonal to the existing pairs in cells. It is typically done by evolving the suppressor tRNA with nonsense codon to pair with a cross-species mutant AARS, which recognizes a non-natural amino acid instead of one of the natural amino acids.
- the design of mutant AARS has been the bottleneck in this process due to the lack of an effective mutant screening method.
- the current technique screens a library of AARS mutants, in which several positions are replaced by all 20 amino acids. Five such positions will generate 20^5 mutants. There has been some success, but it is very time-consuming and cumbersome.
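The size of such a saturation library follows from simple counting: each randomized position can independently take any of the 20 natural amino acids. The short sketch below works through that arithmetic; the function name is an assumption made for illustration.

```python
NUM_NATURAL_AMINO_ACIDS = 20

def library_size(num_positions, alphabet_size=NUM_NATURAL_AMINO_ACIDS):
    """Number of distinct sequences when each of `num_positions` sites
    may independently hold any of `alphabet_size` residues."""
    return alphabet_size ** num_positions

five_site_library = library_size(5)
print(five_site_library)  # 3200000
```

A library of this size has to be screened experimentally, whereas a structure-guided design typically evaluates only a short list of combinations.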
- COP Clash Opportunity Progressive
- the designed mutant has three mutations (V261G, F258I, and A314G), and computationally shows both a good binding affinity to naph-Ala (31.25 kcal/mol) and a good differentiation toward Trp (6.08 kcal/mol), the closest competitor from natural amino acids.
- the first step in COP is to prepare the structures of protein and ligands (both the natural amino acid Phe and non-natural amino acid naph-Ala here).
- the crystal structure of PheRS from Thermus thermophilus (PDB: 1B70, resolution 2.7 Å) was downloaded from the Protein Data Bank. Hydrogens were added using Biograf (Accelrys, San Diego, CA) and the potential energy of the structure was subsequently minimized with the DREIDING force field (see Mayo et al, J. Phys. Chem. 1990, 94:8897) in MPSim (supra).
- the termination criterion was 0.1 kcal/mol/Å in RMS force or a maximum of 2000 steps, and the minimization method was conjugate gradient.
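The two-part termination criterion (RMS force below a tolerance, or a step limit) can be illustrated with a toy minimizer. To keep the sketch short it uses plain steepest descent on a hypothetical harmonic well rather than conjugate gradient, and it is not MPSim. The per-atom RMS definition, the step size, and all function names are assumptions.

```python
from math import sqrt

def rms_force(gradients):
    """RMS of per-atom gradient magnitudes (assumed definition of 'RMS force')."""
    total = sum(gx * gx + gy * gy + gz * gz for gx, gy, gz in gradients)
    return sqrt(total / len(gradients))

def minimize(coords, gradient_fn, step_size=0.01, rms_tol=0.1, max_steps=2000):
    """Steepest descent that stops when the RMS force drops below `rms_tol`
    or after `max_steps` steps. Returns (coords, steps_taken, converged)."""
    for step in range(max_steps):
        grads = gradient_fn(coords)
        if rms_force(grads) < rms_tol:
            return coords, step, True
        coords = [
            tuple(x - step_size * g for x, g in zip(atom, grad))
            for atom, grad in zip(coords, grads)
        ]
    return coords, max_steps, False

def harmonic_gradient(coords, k=10.0):
    """Gradient of a toy energy E = (k / 2) * sum |r|^2 (hypothetical)."""
    return [tuple(k * x for x in atom) for atom in coords]

start = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
final, steps, converged = minimize(start, harmonic_gradient)
```

Returning the convergence flag separately from the step count makes it clear whether a run stopped because the force tolerance was met or because the step limit was reached.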
- CHARMM22 charges (Brooks et al. J Comput Chem 4:187-217, 1983; MacKerell et al, J. Phys. Chem. 102: 3586-3616, 1998) were used for protein and Mulliken charges were used for ligands (naph-Ala and all natural amino acids used here). Mulliken charges were derived from quantum mechanics using Jaguar (Schrodinger, Portland, OR). The quantum mechanics calculation was done at HF level with 6-31G** basis set, and Poisson-Boltzmann dielectric continuum solvent was included to simulate solvation in water (supra). Two rotamers of naph-Ala were built from the ligand Phe in the minimized protein-ligand complex structure. Panel 1 of Figure 4 shows the structure of the two rotamers.
- $\Delta\Delta G_{\text{binding}} = \Delta G(\text{protein}) + \Delta G(\text{ligand}) - \Delta G(\text{protein} + \text{ligand})$
- the binding energies of competitors from natural amino acids are also calculated, and the differential binding energy between the analog and the best competitor is used as a criterion to select mutants. These mutants will be checked for the stability of each mutated residue to make sure that they make enough interactions with the rest of the protein so that the fold is stable.
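The binding-energy expression (free energy of the protein plus free energy of the ligand, minus free energy of the complex) and the differential binding energy can be worked through numerically. In the sketch below the function names are assumptions and the three free energies are hypothetical values chosen only so that the result reproduces the 31.25 kcal/mol quoted for naph-Ala. The last line assumes that the quoted differentiation is the analog binding energy minus the competitor binding energy.

```python
def binding_energy(g_protein, g_ligand, g_complex):
    """Binding energy as G(protein) + G(ligand) - G(protein + ligand).
    With this sign convention a positive value means favorable binding."""
    return g_protein + g_ligand - g_complex

def implied_competitor_binding(analog_binding, differential):
    """Competitor binding energy implied by a differential defined as
    analog binding minus competitor binding (assumed definition)."""
    return analog_binding - differential

# Hypothetical free energies in kcal/mol, chosen to give 31.25.
naph_ala_binding = binding_energy(-1200.0, -50.0, -1281.25)

# Quoted values: 31.25 kcal/mol binding, 6.08 kcal/mol differentiation vs Trp.
trp_binding = round(implied_competitor_binding(31.25, 6.08), 2)
```

Under that assumption the quoted numbers would correspond to a Trp binding energy of about 25.17 kcal/mol for the same mutant.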
- FIG. 5 is the minimized PheRS in ribbon representation with Phe shown as ball model. Both rotamers of naph-Ala shown in Panel 1 of Figure 4 were matched into the binding site of PheRS, and clashes were calculated using Equation 1. This was also done for the wild-type ligand Phe. Table 4 shows these interaction energies and the difference between naph-Ala and Phe.
- the binding site is defined as within 6 Å of the side chain of NBD-Ala. Residues labeled with * have at least 1 kcal/mol less favorable interactions with NBD-Ala compared to Phe.
- a negative score generally means that the mutation prefers the binding of naph-Ala over Phe, and the mutation does not cause a lot of clashing energies inside the protein. From Table 5, we chose G and A for V261; for F258, G, A, S, C, M, T, N, I, and V were selected; for A314, G, A, and S were chosen. Some of these mutations had a positive score, however, they were included because they are within 5 kcal/mol of the best mutation and the combined mutations sometimes have better selectivity for the non-natural amino acid. Also the selection depends on the total number of mutants being tractable.
- 2 x 9 x 3 = 54 mutants were generated using SCWRL. These mutants were first optimized by minimizing the mutated residues while the rest of protein was fixed. This was followed by minimizing the whole protein with AVGB implicit solvation. Finally the binding energies of the mutant with naph-Ala and its competitors from the natural amino acid pool (Phe, Tyr and Trp were considered here) were calculated using Equation 2. These results were listed in Table 6. The mutants were sorted by the differential binding energy between naph-Ala and the competitor with the best binding energy. The stability-check procedure was applied to determine if the mutant can fold correctly. This check was only done for those mutants showing at least 5 kcal/mol differential binding energy to naph-Ala.
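The count of 54 follows from combining the per-site choices listed in the text (2 for V261, 9 for F258, 3 for A314). The sketch below enumerates the combinations with `itertools.product`; the function name and the hyphenated naming scheme for mutants are assumptions made for illustration, and no structures are built.

```python
from itertools import product

# Per-site choices as listed in the text.
SITE_CHOICES = {
    "V261": ["G", "A"],
    "F258": ["G", "A", "S", "C", "M", "T", "N", "I", "V"],
    "A314": ["G", "A", "S"],
}

def enumerate_mutants(site_choices):
    """Return one name per combination of single-site choices,
    e.g. 'V261G-F258I-A314G'."""
    sites = list(site_choices)
    names = []
    for combo in product(*(site_choices[site] for site in sites)):
        names.append("-".join(f"{site}{aa}" for site, aa in zip(sites, combo)))
    return names

mutants = enumerate_mutants(SITE_CHOICES)
print(len(mutants))  # 54
```

Note that A314 keeps alanine as one of its three choices, so some of the 54 combinations leave that position unchanged.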
- the three mutations open the space for the extra six-member ring in the naphthyl group.
- the recognizing forces for the zwitterion part of the ligand remain the same as in wild-type PheRS.
- the NH3+ group interacts with E220, T179, S180 and Q218, while the COO- forms several hydrogen bonds with W149, R204 and Q218.
- Trifluoromethionine into a Phage Lysozyme: Implications and a New Marker for Use in Protein 19F NMR. Biochemistry, 1997. 36: p. 3404-3416.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Chemical & Material Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biotechnology (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Genetics & Genomics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Crystallography & Structural Chemistry (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Organic Chemistry (AREA)
- Medicinal Chemistry (AREA)
- Biomedical Technology (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Peptides Or Proteins (AREA)
- Preparation Of Compounds By Using Micro-Organisms (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP03718406A EP1573441A2 (en) | 2002-04-12 | 2003-04-14 | The cop protein design tool |
AU2003221944A AU2003221944A1 (en) | 2002-04-12 | 2003-04-14 | The cop protein design tool |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US37207402P | 2002-04-12 | 2002-04-12 | |
US60/372,074 | 2002-04-12 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2003088004A2 true WO2003088004A2 (en) | 2003-10-23 |
WO2003088004A3 WO2003088004A3 (en) | 2007-07-26 |
Family
ID=29250787
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2003/011613 WO2003088004A2 (en) | 2002-04-12 | 2003-04-14 | The cop protein design tool |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1573441A2 (en) |
AU (1) | AU2003221944A1 (en) |
WO (1) | WO2003088004A2 (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7139665B2 (en) * | 2002-02-27 | 2006-11-21 | California Institute Of Technology | Computational method for designing enzymes for incorporation of non natural amino acids into proteins |
-
2003
- 2003-04-14 WO PCT/US2003/011613 patent/WO2003088004A2/en not_active Application Discontinuation
- 2003-04-14 EP EP03718406A patent/EP1573441A2/en not_active Withdrawn
- 2003-04-14 AU AU2003221944A patent/AU2003221944A1/en not_active Abandoned
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7139665B2 (en) * | 2002-02-27 | 2006-11-21 | California Institute Of Technology | Computational method for designing enzymes for incorporation of non natural amino acids into proteins |
Non-Patent Citations (6)
Also Published As
Publication number | Publication date |
---|---|
AU2003221944A1 (en) | 2003-10-27 |
WO2003088004A3 (en) | 2007-07-26 |
AU2003221944A8 (en) | 2003-10-27 |
EP1573441A2 (en) | 2005-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7139665B2 (en) | Computational method for designing enzymes for incorporation of non natural amino acids into proteins | |
Pierri et al. | Computational approaches for protein function prediction: a combined strategy from multiple sequence alignment to molecular docking-based virtual screening | |
Bower et al. | Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool | |
US20030215877A1 (en) | Directed protein docking algorithm | |
Vyas et al. | Homology modeling a fast tool for drug discovery: current perspectives | |
George et al. | Evolution-and structure-based computational strategy reveals the impact of deleterious missense mutations on MODY 2 (maturity-onset diabetes of the young, type 2) | |
Harris et al. | Predicting reactive cysteines with implicit-solvent-based continuous constant pH molecular dynamics in amber | |
US7751988B2 (en) | Lead molecule cross-reaction prediction and optimization system | |
Rost | Prediction in 1D: secondary structure, membrane helices, and accessibility | |
DeBartolo et al. | Protein structure prediction enhanced with evolutionary diversity: SPEED | |
Wickstrom et al. | The unfolded state of the villin headpiece helical subdomain: computational studies of the role of locally stabilized structure | |
US20010051855A1 (en) | Computationally targeted evolutionary design | |
Rifai et al. | Combined linear interaction energy and alchemical solvation free-energy approach for protein-binding affinity computation | |
Ivanov et al. | Bioinformatics platform development: from gene to lead compound | |
WO2001061344A1 (en) | Computationally targeted evolutionary design | |
Cloete et al. | Structural and functional effects of nucleotide variation on the human TB drug metabolizing enzyme arylamine N-acetyltransferase 1 | |
US20060121455A1 (en) | COP protein design tool | |
Kumar et al. | Protein folding and function: the N-terminal fragment in adenylate kinase | |
Jones et al. | Molecular dynamics studies of the protein–protein interactions in inhibitor of κB kinase-β | |
WO2005017805A2 (en) | Systems and methods for predicting the structure and function of multipass transmembrane proteins | |
US20050003389A1 (en) | Computationally targeted evolutionary design | |
Gaillard et al. | Full protein sequence redesign with an MMGBSA energy function | |
Brás et al. | Protein ligand docking in drug discovery | |
Sun et al. | Non-Canonical Interaction between Calmodulin and Calcineurin Contributes to the Differential Regulation of Plant-Derived Calmodulins on Calcineurin | |
Yasuo et al. | Structure-based CoMFA as a predictive model-CYP2C9 inhibitors as a test case |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2003718406 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 2003718406 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |