WO1994003627A1 - Method for design of novel proteins and proteins produced thereby

Publication number
WO1994003627A1
Authority
WO
WIPO (PCT)
Prior art keywords
proteins
polar
protein
design
periodicity
Prior art date
Application number
PCT/US1993/006937
Other languages
French (fr)
Original Assignee
British Technology Group Inc.
Application filed by British Technology Group Inc.
Priority to AU47826/93A
Publication of WO1994003627A1

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/11 - DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C - CHEMISTRY; METALLURGY
    • C07 - ORGANIC CHEMISTRY
    • C07K - PEPTIDES
    • C07K14/00 - Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/10 - Processes for the isolation, preparation or purification of DNA or RNA
    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K - PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 - Medicinal preparations containing peptides

Definitions

  • Figure 1 is a ribbon diagram of an idealized four-helix bundle.
  • Figure 2 is a head on view of the idealized four-helix bundle shown in figure 1.
  • Non-polar (hydrophobic) side chains are in red.
  • Polar (hydrophilic) side chains are in blue.
  • Figure 4 is a table showing the genetic code.
  • Non-polar (hydrophobic) side chains encoded by the triplet XTX are in yellow;
  • Polar (hydrophilic) side chains encoded by the triplet ZAZ are in red.
  • X = any base;
  • Z = C, A, or G.
  • Figure 5 shows a polyacrylamide gel electrophoretic separation showing the presence of soluble protein made in accordance with the invention.
  • Figure 6 shows the results of protein separation using size-exclusion chromatography.
  • Figure 7 shows the circular dichroism spectrum of the designed protein #86.
  • in order for the polypeptide chain to form a segment of secondary structure with one hydrophobic face and one hydrophilic face, the amino acid sequence must be designed with a periodicity of hydrophobic and hydrophilic residues that matches the helical repeat for that type of secondary structure.
  • to design a stable β-sheet protein, the sequence must be composed of alternating hydrophobic and hydrophilic residues.
  • to design an α-helical protein, the periodicity of hydrophobic and hydrophilic residues must approximate the 3.6 residue repeat that is characteristic of α-helices.
  • the resulting materials among these sequences can be screened for those that indeed fold into the desired 3-dimensional structures.
  • the collection of designed amino acid sequences will be generated by constructing and expressing a large collection of synthetic genes. Each of these synthetic genes will encode a different amino acid sequence. However, all of the encoded sequences will share the property of having their polar and non-polar amino acids arranged with the appropriate periodicity to generate extremely amphiphilic α-helices that are punctuated by inter-helical turns.
  • the protein structural motif that is focused upon initially is the four-helix bundle (as shown in Figure 1). This topology is simple and frequently observed in natural proteins. In the amphiphilic helices of a 'perfect' four-helix bundle, all the hydrophobic side chains would point towards the interior of the bundle and all of the polar side chains would point towards the solvent. This is diagrammed in the head-on view in Figure 2, wherein non-polar (hydrophobic) side chains are in red and polar (hydrophilic) side chains are in blue.
  • the DNA encoding the α-helices is semi-random (incorporating XTX and ZAZ codons at the appropriate locations for non-polar and polar residues respectively).
  • the ends of each strand code for the inter-helical turns and are composed of unique, non-randomized sequences.
  • the 3' end of Strand #1 will be complementary to the 3' end of strand #2.
  • each one will prime for second-strand syntheses on the other strand as shown above.
  • a parallel strategy is employed to construct the other half of the gene encoding alpha-helices #3 and #4.
  • the resulting pieces of double-stranded DNA are then digested with the appropriate restriction enzymes to generate 'sticky' ends.
  • Finally the two halves of the gene are combined and ligated into an expression vector that employs the strong T7 promoter. In this expression system very large quantities of protein are synthesized (sometimes as much as 50% of total cell protein).
  • each isolated bacterial (E. coli) colony will harbor a different DNA sequence that encodes a different protein sequence.
  • Expression of these sequences can lead to three possible outcomes: a) The protein is soluble and resistant to intracellular proteases. Since unfolded proteins are readily proteolysed, this result would suggest that the designed sequence indeed folds into a stable globular structure. b) The protein is sequestered in inclusion bodies. Interpretation of this result would require that these proteins be solubilized and refolded before any conclusions can be drawn about their structures and/or stabilities. c) The protein is completely degraded in vivo. This result would suggest that a particular amino acid sequence cannot fold into a structure that is sufficiently stable to escape intracellular proteolysis.
  • the inclusion bodies are solubilized and the proteins refolded at low concentrations to prevent re-aggregation.
  • although refolding proteins from inclusion bodies is slightly less direct than working with proteins that come out of the cell in a soluble folded form, inclusion bodies are now so common in the field of biotechnology that refolding proteins from inclusion bodies has become routine.
  • for monomeric proteins, refolding is rarely a problem.
  • a small fraction of the candidates are proteolysed in vivo (condition 'c').
  • this group represents a very small percentage of the total. Since, as described above, one can readily screen for stable proteins among hundreds of candidates, one can easily ensure the isolation of many candidates that are not degraded.
  • the designed sequences in the preferred collection of proteins are constrained by a polar/nonpolar periodicity that strongly favors the formation of amphiphilic α-helices. Such helices will be driven by the hydrophobic effect to pack against one another and thereby exclude water from contact with their non-polar faces. Consequently, the majority of sequences in the collection will fold into 3-dimensional structures resembling the designed four-helix bundle.
  • when a purified protein in the collection is shown to be α-helical by CD spectroscopy, it is essential to determine whether it exists as the expected monomeric four-helix bundle, or alternatively, as a single long helix that buries its non-polar face by forming intermolecular contacts in an oligomer.
  • This can readily be accomplished by size-exclusion chromatography and by a variety of hydrodynamic techniques. While these initial characterizations will not provide the kind of atomic resolution obtainable from NMR or X-ray crystallography, they certainly enable one to choose which of the sequence candidates merit further study by one of these high-resolution techniques. Figure 6 shows the results of size-exclusion chromatography of a novel protein prepared in accordance with the present invention.
  • the essential first step will be the successful design of large molecules (e.g. proteins) that fold into predetermined structures.
  • Pioneering experiments aimed at the design of novel proteins have already shown that the construction of particular sequences that fold into specific 3-dimensional structures is within the reach of current understanding and technology.
  • each of these initial efforts employed a different molecular strategy in pursuit of a different structural goal: They each designed one amino acid sequence that folded into one desired structure. Consequently, these initial strategies are not broadly applicable to the design of entire classes of novel proteins.
  • novel strategies will have to be developed. The invention disclosed herein develops such strategies.
  • novel proteins are designed by specifying only the minimally required sequence information. This leads to the generation of a vast collection of stable protein structures. Each of these structures is slightly different, and each of them will have the potential to serve as a scaffold for incorporating any one of many possible biochemical functions.

Abstract

A general method is described for the design and production of a multiplicity of novel proteins. The method entails constructing a degenerate family of synthetic genes wherein each codon is chosen to specify either a polar or a non-polar amino acid, but the exact identity of the amino acid at each particular position is not specified explicitly. Thus a huge collection of different proteins can be encoded using a simple binary code of polar versus non-polar amino acids. The periodicity of the polar and non-polar amino acids in the sequence is constrained to match the natural repeat of 3.6 residues/turn in α-helices. Thus all the novel proteins are explicitly designed to have only hydrophobic amino acids in their interiors, and only hydrophilic amino acids on their surfaces. The synthetic genes encoding these novel proteins are cloned and expressed in vivo, thus giving rise to a library of novel proteins, each of which is easily purified in large quantities.

Description

METHOD FOR DESIGN OF NOVEL PROTEINS AND PROTEINS PRODUCED THEREBY
Field of the Invention
This invention relates to a method for designing novel proteins and the proteins made therefrom.
Background of the Invention
Biomedical engineering is entering a new era. In the past, most advances in this field have centered on the design of instrumentation that could be built and applied on a macroscopic scale. Although continued research using this 'conventional' engineering approach is likely to yield new and useful applications, many of the most important future applications of biomedical engineering will result from research and engineering that can now be done on the molecular scale. Such engineering will lead to an entire new field of "nano-technology".
The 'machinery' for biomedical engineering on the molecular scale is likely to be based upon the same type of molecular machinery that nature has developed over the course of several billion years of evolution. In nature, most processes are carried out by proteins. The functions of natural proteins range from the transport of oxygen (by hemoglobin) to the regulation of cell growth (by oncogene products). When these proteins do not function properly, the result can be diseases ranging from anemia to cancer. Because of their many roles in all living systems, proteins are at the center of current research aimed at the development of biomedical engineering on the molecular scale. Although some 'protein engineering' has already been accomplished by manipulating the properties of several well-studied natural proteins, future applications of biomedical engineering on the molecular scale will depend upon our ability to design entirely novel proteins that are capable of folding into stable 3-dimensional structures and performing useful biochemical reactions.
Protein design is a new field. Nevertheless, research in several laboratories (including my own) has already demonstrated that the design of novel protein structures is within the reach of current technology. Although the success of these initial attempts is extremely encouraging, protein design is still far from routine. Indeed, each of the initial attempts at protein design used a different strategy to achieve a somewhat different goal. Consequently, the results of these pioneering studies are not broadly applicable. Future progress towards the routine design of novel proteins will require the development of molecular strategies that can be applied to a broad range of design problems. The overall objective of the present invention is to set forth such a generally applicable strategy.
In attempting to design novel proteins that will be of biomedical significance, the first major goal must be to coerce amino acid sequences to fold into unique compact structures. Since a long polypeptide chain might be arranged into an enormous number of possible folds, significant conformational entropy must be overcome in order to constrain the chain to fold into a unique globular structure that is both compact and stable. To overcome this entropic barrier, different research groups have employed different strategies for protein design.
Perhaps the simplest strategy for locking the polypeptide chain into a unique conformation relies upon synthetic templates or other non-native cross-links that covalently join several peptides and thereby fix their locations relative to one another. Since the template directs the peptides into a particular orientation, this methodology bypasses the folding problem that is inherent in designing native-like proteins from single polypeptide chains. This approach has been pioneered by Mutter and coworkers. They constructed several template-assembled synthetic proteins (TASPs) in which β-barrel, α-helical bundle, or βαβ protein topologies are formed by covalently attaching amphiphilic α-helices and/or β-strands to a multifunctional template [1]. Kaiser and coworkers used similar template-directed methods to construct an artificial heme-protein by linking four α-helical peptides directly to a porphyrin derivative [2]. Stewart and coworkers have also attempted to fix the position of peptide chains by cross-linking them [3]. However, they did so without using an artificial template. Instead, they linked together four synthetic polypeptide chains by attaching the C-terminus of each chain to an amino group on the side chain of an ornithine or lysine residue of a neighboring chain. The cross-linked peptide was designed to form a parallel four-helix bundle that would position serine, histidine, and aspartic acid side chains in the same spatial arrangement as they are found in chymotrypsin. The resulting molecule binds chymotrypsin substrates and hydrolyses them at rates approximately 1% of that seen for natural chymotrypsin [3]. These results are extremely encouraging. They indicate that the current synthetic strategies and molecular modeling tools are sufficient, at least in some cases, to design predetermined structures that possess biologically relevant activities.
Although cross-linking peptides has the advantage of reducing the number of possible conformations that the chain could assume (thus reducing the entropic barrier), its utility as a general strategy for protein design is quite limited. Indeed, to attain the long-term goals of engineering large proteins and producing them in significant quantities in biological systems, it will be essential to design single-chain polypeptides that are capable of folding into defined 3-dimensional structures without the assistance of artificial cross-links. This has been accomplished, with at least preliminary success, for three model systems: (i) the anti-parallel β-sheet protein ("Betabellin") of Richardson, Erickson, and colleagues [4]; (ii) the modular four-helix bundle ("α-4") of DeGrado, Eisenberg, and coworkers [5]; and (iii) the four-helix bundle with a native-like sequence ("Felix") that resulted from my own work [6]. These three projects are described below.
Betabellin was designed by Richardson et al. to form a β-sandwich consisting of two identical 4-stranded anti-parallel β-sheets. The first versions of betabellin were extremely hydrophobic and therefore difficult to characterize in aqueous solution. After several cycles of design, analysis, and redesign, a version of betabellin has been constructed that is soluble and readily characterized in aqueous solution. Initial results from circular dichroism, Raman, and NMR spectroscopies indicate that betabellin indeed folds into a β-sheet structure.
One of the most encouraging results in the design of novel proteins is the 4-helix bundle ("α-4") developed by DeGrado, Eisenberg, and coworkers. These researchers built on Eisenberg's work with hydrophobic moments [7] to design a simple amphiphilic α-helix containing only leucine side chains on its hydrophobic face and glutamic acid and lysine side chains on its hydrophilic face. A modular strategy was followed to design the entire protein: first, the sequence of the single amphiphilic helix was optimized so that it self-associated into tetramers; next, two helices were joined to form a hairpin; and finally the entire 4-helix bundle was expressed in vivo from a synthetic gene. The resulting protein is monomeric in solution, α-helical by CD spectroscopy, and extremely stable. The design of α-4 differs significantly from natural proteins: it contains four identical α-helices and each of these helices is composed of only 3 different amino acids. Nonetheless, the extreme stability of the designed structure is very encouraging. It suggests that by using simple strategies to maximize those favorable interactions that are currently understood, it should be possible to successfully design proteins with "native-like" sequences despite the mistakes that are likely to be made by working with complex and non-repetitive amino acid sequences.
In recent work, α-4 has been modified to incorporate a metal-binding site [8]. Since metals are involved in numerous biochemical reactions, and since many natural enzymes contain metals as essential components of their active sites, the successful design of novel metallo-proteins represents an extremely important step towards the ultimate goal of designing novel proteins that will catalyze biologically and medically important reactions.
A final example of a protein designed to fold into a specified structure without the use of templates or cross-links is my own work on the α-helical protein, 'Felix'. Like 'α-4', Felix was designed to form a 4-helix bundle. However, it differs significantly from α-4 in that Felix was explicitly designed to be 'native-like': Each of its four helices is different, their sequences are non-repetitive, and a wide variety of different amino acids was included in the design. Felix has been expressed in bacteria and is easily purified in large quantities. Characterization of its structure and stability shows that Felix adopts a folded structure that is similar to the one in the designed model.
The experiments summarized above indicate that the design of particular sequences that fold into specific 3-dimensional structures is already within the reach of current understanding and technology. Protein design is still a new field, and each of the initial experiments in this field employed a different strategy in pursuit of a different design goal. While it is encouraging to see that so many different approaches can lead to successful designs, it is also clear that in order to attain the long-term goal of designing a variety of biomedically useful proteins, novel strategies will have to be developed. These strategies, in contrast to those used in the initial experiments described above, will have to be of a general nature so that they can be employed for the design of many novel sequences that fold into any one of a number of different structural motifs.
Summary of the Invention
The method disclosed in this application sets forth a generally applicable technique for the production of novel proteins. The powerful techniques of molecular biology are utilized to construct vast libraries of amino acid sequences, all of which are designed to fold into a chosen 3-dimensional structure. The members of this collection are then screened to isolate those amino acid sequences that successfully fold into the desired structure. In doing so, those features of an amino acid sequence that are both necessary and sufficient for the successful design of novel proteins can be elucidated. These results can facilitate progress towards the eventual design of entire classes of proteins that will form the basis of tomorrow's medicine and pharmacology.
Our strategy for protein design is based upon the premise that the dominant force driving an amino acid sequence to fold into a native-like protein structure is the formation of highly amphiphilic secondary structures (α-helices and β-strands) that are capable of packing together in such a way that their non-polar side chains are removed from contact with water. For an amino acid sequence to form such amphiphilic structures, it must be designed such that its polar and non-polar amino acids are arranged with a periodicity that matches the periodicity of the α-helix (3.6 residues/turn) or the β-strand (2.0 residues/turn). A binary code comprising sequences having this kind of polar/nonpolar periodicity will be driven by the hydrophobic effect to pack against one another and thereby bury their non-polar surfaces in an internal environment that excludes water. We propose that the ability of a sequence to form highly amphiphilic secondary structures is not only necessary for the formation of structure, but is in fact the most important factor in determining whether a sequence can fold into a stable globular structure. Indeed, in many cases, the ability of a sequence to form a particular amphiphilic secondary structure is actually sufficient to drive a designed polypeptide chain to collapse into a unique native-like structure.
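As a rough illustration of the periodicity argument above, the following Python sketch places residues around a helical wheel and labels each position as non-polar or polar. The function name `binary_pattern` and the cutoff rule (a position counts as non-polar when it falls on the half of the wheel facing the core, i.e. the cosine of its angle is positive) are assumptions made for illustration only; the application does not specify such a cutoff.

```python
import math


def binary_pattern(length, residues_per_turn):
    """Return a string of 'N' (non-polar) / 'P' (polar), one letter per residue.

    Assumption for illustration: residue i sits at an angle of
    i * 360 / residues_per_turn degrees around the axis, and the half of the
    wheel centred on 0 degrees is treated as the buried, non-polar face.
    """
    step = 360.0 / residues_per_turn  # 100 degrees for an alpha-helix
    pattern = []
    for i in range(length):
        angle = (i * step) % 360.0
        faces_core = math.cos(math.radians(angle)) > 0
        pattern.append("N" if faces_core else "P")
    return "".join(pattern)


if __name__ == "__main__":
    print("alpha-helix (3.6 residues/turn):", binary_pattern(18, 3.6))
    print("beta-strand (2.0 residues/turn):", binary_pattern(8, 2.0))
```

With 2.0 residues per turn this rule reduces to strict alternation of non-polar and polar positions, as expected for a β-strand; with 3.6 residues per turn it yields a less regular pattern that repeats every 18 residues (five turns).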
In order to test whether sequences designed with the appropriate periodicities of polar/non-polar amino acids indeed fold into the expected 3-dimensional structures, the following steps are follows:
1) Construct a vast library of synthetic genes. These genes are designed to encode amino acid sequences that differ from one another in exact sequence, but nonetheless share a common pattern of polar and non-polar amino acids. This pattern is explicitly designed so that polar and non-polar side chains occur with the same periodicity as the natural repeat of the α-helix. The construction of such a library exploits the organization of the genetic code, in which XAX codes for polar side chains while XTX codes for non-polar side chains, where X represents one or more of the bases Adenine, Guanine, Thymine (or Uracil) or Cytosine.
2) Clone this library of synthetic genes into a high-level bacterial expression system.
3) Screen individual members of the expressed library in order to determine which amino acid sequences are capable of folding into compact and stable 3-dimensional structures.
4) Purify a representative sample of the proteins.
5) Characterize the structures and stabilities of these purified proteins by employing various known biophysical techniques.
The folding of these sequences into the predicted designed structures demonstrates: (i) that the ability of a binary-coded sequence of polar and non-polar amino acids to form highly amphiphilic secondary structures is actually sufficient to drive a designed polypeptide chain to collapse into a compact native-like structure; (ii) that desired protein structures can be engineered by carefully controlling the ordering and periodicity of amino acid types; and (iii) that attempts to optimize all specific interactions at the highest level of detail may not be necessary. Based upon this discovery, designing novel proteins that possess biomedically useful properties will become far simpler than previously imagined.
Brief Description of the Drawings
Figure 1 is a ribbon diagram of an idealized four-helix bundle.
Figure 2 is a head-on view of the idealized four-helix bundle shown in figure 1. Non-polar (hydrophobic) side chains are in red. Polar (hydrophilic) side chains are in blue.
Figure 3 depicts a generalized binary-coded amino acid sequence pattern for an idealized four-helix bundle wherein npl = non-polar (hydrophobic) side chains and pol = polar (hydrophilic) side chains.
Figure 4 is a table showing the genetic code. Non-polar (hydrophobic) side chains encoded by the triplet XTX are in yellow; polar (hydrophilic) side chains encoded by the triplet ZAZ are in red. (X = any base; Z = C, A, or G)
Figure 5 shows a polyacrylamide gel electrophoretic separation showing the presence of soluble protein made in accordance with the invention.
Figure 6 shows the results of protein separation using size-exclusion chromatography.
Figure 7 shows the circular dichroism spectrum of the designed protein #86.
Detailed Description of the Invention
Rationale for the Procedure
The minimal sequence requirements necessary for the successful design of novel proteins are related to two unifying themes: (i) Globular, water-soluble proteins invariably fold into structures that maximize the burial of hydrophobic surface area while simultaneously exposing polar side chains to solvent; and (ii) These structures are almost always composed of elements of secondary structures including α-helices, β-sheets, and reverse turns. Taken together, the dual constraints of forming regular secondary structures while at the same time burying hydrophobic side-chains (and exposing hydrophilic ones) have significant implications for the design of proteins de novo: These constraints require that in a successful design, amino acid sequences must be arranged in such a way that they can form amphiphilic α-helices or β-strands. In other words, in order for the polypeptide chain to form a segment of secondary structure with one hydrophobic face, and one hydrophilic face, the amino acid sequence must be designed with a periodicity of hydrophobic and hydrophilic residues that matches the helical repeat for that type of secondary structure. For example, to design a stable β-sheet protein, the sequence must be composed of alternating hydrophobic and hydrophilic residues. Conversely, for an α-helical protein, the periodicity of hydrophobic and hydrophilic residues must approximate the 3.6 residue repeat that is characteristic of α-helices.
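The periodicity requirement can be illustrated numerically. The following is a minimal sketch, not part of the disclosed method: it scores a binary polar/non-polar pattern by a simplified hydrophobic moment, assuming a score of +1 for non-polar and -1 for polar residues and an angular step of 100° per residue for the α-helix (360°/3.6) or 180° for the β-strand. The function names and the example patterns are hypothetical and are not taken from figure 3.

```python
import cmath
import math


def binary_moment(pattern, degrees_per_residue):
    """Per-residue magnitude of a simplified hydrophobic moment.

    pattern: string of 'N' (non-polar, scored +1) and 'P' (polar, scored -1).
    degrees_per_residue: angular step between successive side chains
    (100 for an alpha-helix at 3.6 residues/turn, 180 for a beta-strand).
    """
    total = 0j
    for i, kind in enumerate(pattern):
        score = 1.0 if kind == "N" else -1.0  # assumed binary scale
        total += score * cmath.exp(1j * math.radians(degrees_per_residue * i))
    return abs(total) / len(pattern)


def helix_pattern(length):
    """Hypothetical helical pattern: residue i is non-polar when its
    helical-wheel angle (100 degrees per residue) points toward one face."""
    return "".join(
        "N" if math.cos(math.radians(100 * i)) > 0 else "P"
        for i in range(length)
    )


strand = "NP" * 8          # strict alternation, as required for a beta-strand
helix = helix_pattern(16)  # illustrative length only

print(helix)
print("helix pattern :", binary_moment(helix, 100), binary_moment(helix, 180))
print("strand pattern:", binary_moment(strand, 180), binary_moment(strand, 100))
```

Under these assumptions, each pattern gives a large moment only at the angular step of the secondary structure it was designed for, which is the sense in which the sequence periodicity must match the structural repeat.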
I have found that the ability of a sequence to form highly amphiphilic secondary structures is not only necessary for the formation of structure, but is in fact the most important factor in determining whether a sequence can fold into a stable globular structure. Indeed, based upon this work, it is expected that in many cases, the ability of a sequence to form amphiphilic secondary structures is actually sufficient to drive a designed polypeptide chain to fold into a compact native-like structure. If this is indeed the case, then the construction of sequences that meet these constraints leads to an extremely general and widely applicable method for the design of novel proteins. The above method is implemented by using the power of molecular genetics to generate a vast collection of designed amino acid sequences that satisfy the constraints described above. The resulting sequences can then be screened for those that indeed fold into the desired 3-dimensional structures. The collection of designed amino acid sequences will be generated by constructing and expressing a large collection of synthetic genes. Each of these synthetic genes will encode a different amino acid sequence. However, all of the encoded sequences will share the property of having their polar and non-polar amino acids arranged with the appropriate periodicity to generate extremely amphiphilic α-helices that are punctuated by inter-helical turns.
It should be stressed that the method described above is not limited to the design of α-helical proteins. Indeed, by designing with the appropriate periodicities of polar and non-polar sequences, β-sheet structures or mixed α-helical/β-sheet structures can also be built.
Experimental Strategy - Engineering a Library of Designed Proteins
The protein structural motif that is focused upon initially is the four-helix bundle (as shown in Figure 1). This topology is simple and frequently observed in natural proteins. In the amphiphilic helices of a 'perfect' four-helix bundle, all the hydrophobic side chains would point towards the interior of the bundle and all of the polar side chains would point towards the solvent. This is diagrammed in the head-on view in figure 2 wherein non-polar (hydrophobic) side chains are in red and polar (hydrophilic) side chains are in blue.
An amino acid sequence pattern that is consistent with this idealized four-helix bundle and the 3.6 residue/turn periodicity of the alpha helix is shown in figure 3. The design of this sequence pattern is based upon the structures of natural four-helix bundles and my own experience with designed protein structures. The helices are 16 residues long and the inter-helical turns are 3 residues long. This requires a total length of 74 amino acids. Note that with the exception of the turn residues (which have been designed by analogy to turns in known helical proteins), the pattern does not specify an exact amino acid sequence: It specifies only the locations of polar and non-polar residues. The sequences, however, will have the common feature that all non-polar side chains occur on the inner surface of the α-helix, and all polar side chains occur on the exposed surface of the α-helix, so as to give rise to a water-soluble protein. A large collection of amino acid sequences consistent with this pattern will be generated by expressing a degenerate family of synthetic genes that is constructed to take advantage of the natural distribution of polar and non-polar amino acids in the genetic code (see figure 4). Thus, in a designed gene sequence, whenever a hydrophobic side chain is desired, the DNA triplet XTX is synthesized (where X = A, G, C, or T). Conversely, when a polar amino acid is required, the triplet ZAZ is synthesized (where Z = A, G, or C; T is omitted to avoid stop codons). Thus, although the locations of polar and non-polar amino acids are specified explicitly, the exact identity of the amino acid at each position is semi-variable. (It should be noted that T in the DNA alphabet = U in the RNA alphabet.)
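The consequence of the XTX and ZAZ codon choices can be checked directly against the standard genetic code. The sketch below is illustrative only: it builds the standard codon table and lists which amino acids each degenerate triplet can encode. The helper name `encoded` and the variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by first, second, and third base
# over T, C, A, G. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def encoded(first, third, middle):
    """Map each amino acid (or '*' for stop) to the codons that encode it,
    for codons with a fixed middle base and the given flanking bases."""
    result = {}
    for a, c in product(first, third):
        codon = a + middle + c
        result.setdefault(CODON_TABLE[codon], []).append(codon)
    return result


non_polar = encoded("ACGT", "ACGT", "T")     # XTX, X = any base
polar = encoded("ACG", "ACG", "A")           # ZAZ, Z = A, C, or G
unrestricted = encoded("ACGT", "ACGT", "A")  # middle A with T allowed, for comparison

for label, table in (("XTX", non_polar), ("ZAZ", polar)):
    for aa in sorted(table):
        print(label, aa, " ".join(table[aa]))
print("stop codons if T were allowed:", unrestricted.get("*", []))
```

The comparison set shows why T is excluded from the flanking positions of the polar triplet: with T allowed, the codons TAA and TAG would introduce stop signals into the library.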
Although synthesis of semi-variable DNA of sufficient length to encode an entire protein is non-trivial, it is well within the reach of current technology. The method is to synthesize oligonucleotides of approximately 75 bases by standard automated solid-phase methods and then use DNA polymerase to make the complementary strand, as shown in the following diagram.
[Diagram: two strands, each carrying a restriction site, extended in opposite directions by DNA polymerase. Labels: semi-random sequence of helix #1; non-random turn sequence; semi-random sequence of helix #2.]
The DNA encoding the α-helices is semi-random (incorporating XTX and ZAZ codons at the appropriate locations for non-polar and polar residues respectively). However, the ends of each strand code for the inter-helical turns and are composed of unique, non-randomized sequences. The 3' end of strand #1 will be complementary to the 3' end of strand #2. Thus when the two strands are mixed together, each one will prime second-strand synthesis on the other strand as shown above. A parallel strategy is employed to construct the other half of the gene encoding alpha-helices #3 and #4. The resulting pieces of double-stranded DNA are then digested with the appropriate restriction enzymes to generate 'sticky' ends. Finally the two halves of the gene are combined and ligated into an expression vector that employs the strong T7 promoter. In this expression system very large quantities of protein are synthesized (sometimes as much as 50% of total cell protein).
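The mutual priming of two strands through their complementary 3' ends can be sketched as a simple string operation. This is a minimal illustration under stated assumptions: both strands are written 5' to 3', the overlap length is known, and the short sequences used here are invented for the example and are not the oligonucleotides of the invention.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fill_in(strand1, strand2, overlap):
    """Top strand (5'->3') of the duplex formed when two oligonucleotides
    anneal through their 3' ends and are each extended by DNA polymerase."""
    if strand1[-overlap:] != reverse_complement(strand2[-overlap:]):
        raise ValueError("3' ends are not complementary over the overlap")
    # Strand 1 is extended along strand 2, so the new bases are the reverse
    # complement of the part of strand 2 that lies outside the overlap.
    return strand1 + reverse_complement(strand2)[overlap:]


# Hypothetical sequences: each strand ends in a 6-base non-random region.
strand1 = "AAGCTTCTGAAAGAAGGTCCG"
strand2 = "GGATCCTTCCAGCGGACC"
top = fill_in(strand1, strand2, overlap=6)
bottom = reverse_complement(top)

print(top)
print(bottom)
```

The resulting duplex contains strand #1 at the 5' end of one strand and strand #2 at the 5' end of the other, which is why restriction sites placed at the 5' end of each oligonucleotide end up at both ends of the double-stranded product.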
Following cloning, each isolated bacterial (E. coli) colony will harbor a different DNA sequence that encodes a different protein sequence. Expression of these sequences can lead to three possible outcomes: a) The protein is soluble and resistant to intracellular proteases. Since unfolded proteins are readily proteolysed, this result would suggest that the designed sequence indeed folds into a stable globular structure. b) The protein is sequestered in inclusion bodies. Interpretation of this result would require that these proteins be solubilized and refolded before any conclusions can be drawn about their structures and/or stabilities. c) The protein is completely degraded in vivo. This result would suggest that a particular amino acid sequence cannot fold into a structure that is sufficiently stable to escape intracellular proteolysis. In my own experience expressing designed α-helical proteins in the T7 expression system, I have found that these three situations can readily be distinguished from one another. The presence or absence of a novel over-expressed protein in either the soluble or the insoluble fraction of whole cell lysates is easily detected by polyacrylamide gel electrophoresis. Because of the simplicity of this assay, several hundred candidates can readily be screened over the course of a few months.
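The three outcomes described above amount to a small decision rule applied to the gel results for each colony. The sketch below is one hypothetical way to record that rule; the function name, the argument names, and the choice to report outcome 'a' whenever a band appears in the soluble fraction are assumptions made for illustration.

```python
OUTCOMES = {
    "a": "soluble and resistant to intracellular proteases",
    "b": "sequestered in inclusion bodies",
    "c": "degraded in vivo",
}


def classify_candidate(band_in_soluble_fraction, band_in_insoluble_fraction):
    """Assign outcome a, b, or c from the presence of a novel over-expressed
    band in the soluble and insoluble fractions of a whole-cell lysate."""
    if band_in_soluble_fraction:
        return "a"  # assumption: a soluble band takes precedence
    if band_in_insoluble_fraction:
        return "b"
    return "c"


# Hypothetical gel readings for three colonies: (soluble, insoluble).
readings = {"colony 1": (True, False), "colony 2": (False, True), "colony 3": (False, False)}
for name, (soluble, insoluble) in readings.items():
    outcome = classify_candidate(soluble, insoluble)
    print(name, outcome, OUTCOMES[outcome])
```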
I have found that among the sequence candidates, a significant fraction express proteins that fold into 3-dimensional structures that are both soluble and sufficiently stable to resist intracellular proteolysis (condition 'a'). Such a finding indicates that many different versions of the sequence pattern shown in figure 3 are capable of folding into a stable globular protein structure. This result implies that there is a high level of plasticity in the sequence information that is necessary to encode a stable protein structure and clearly has enormous implications for future designs of a variety of novel proteins possessing a variety of medically or other commercially important activities. Many of the candidates that survive proteolysis can be found in insoluble inclusion bodies (condition 'b'). The formation of inclusion bodies greatly simplifies protein purification since proteins in this insoluble form are both protected from intracellular proteolysis and easily purified. For these candidates, the inclusion bodies are solubilized and the proteins refolded at low concentrations to prevent reaggregation. Although refolding proteins from inclusion bodies is slightly less direct than working with proteins that come out of the cell in a soluble folded form, inclusion bodies are now so common in the field of biotechnology that refolding proteins from inclusion bodies has become routine. For small (<200 amino acids), monomeric proteins, refolding is rarely a problem. A small fraction of the candidates are proteolysed in vivo (condition 'c').
However, based on previous experience with proteins designed de novo, this group represents a very small percentage of the total. Since, as described above, one can readily screen for stable proteins among hundreds of candidates, one can easily ensure the isolation of many candidates that are not degraded. As described above (and shown in figures 2 & 3), the designed sequences in the preferred collection of proteins are constrained by a polar/non-polar periodicity that strongly favors the formation of amphiphilic α-helices. Such helices will be driven by the hydrophobic effect to pack against one another and thereby exclude water from contact with their non-polar faces. Consequently, the majority of sequences in the collection will fold into 3-dimensional structures resembling the designed four-helix bundle. To verify this one must purify and physically characterize several of these proteins. Fortunately, the detection of α-helicity is quite straightforward. Circular dichroism (CD) spectroscopy is well suited for these studies since it is simple to use, requires very small amounts of sample, and readily measures the presence and quantity of α-helix that is present in a protein. An example of the results of CD spectroscopy on a novel protein prepared in accordance with the method disclosed herein is shown in figure 7. The spectrum shows minima at 208 and 222 nanometers, a crossover point at 200 nanometers, and a maximum at 190 nanometers. These features indicate that the protein is predominantly α-helical. Similar spectra have been observed for all of the novel proteins examined.
Once a purified protein in the collection is shown to be α-helical by CD spectroscopy, it is essential to determine whether it exists as the expected monomeric four-helix bundle, or alternatively, as a single long helix that buries its non-polar face by forming intermolecular contacts in an oligomer. This can readily be accomplished by size-exclusion chromatography and by a variety of hydrodynamic techniques. While these initial characterizations will not provide the kind of atomic resolution obtainable from NMR or X-ray crystallography, they certainly enable one to choose which of the sequence candidates merits further study by one of these high-resolution techniques. Referring to figure 6, there are shown the results of size-exclusion chromatography of a novel protein prepared in accordance with the present invention. In this technique, large molecules elute early, and small molecules elute late. Our designed protein (candidate 'F') elutes as a peak centered at 29.22 minutes. A size marker, cytochrome b-562, which has a molecular weight of ~12,000, elutes at 28.61 minutes (data not shown in this figure). These results indicate that the size of our protein in solution is smaller than cytochrome b-562, and is comparable to what would be expected for a globular monomeric protein of molecular weight ~8,000. Similar results have also been obtained for candidates #10 and #86.
These experiments show that many of the designed sequences indeed fold into monomeric 4-helix bundles. Such a result demonstrates that the periodicity of polar and non-polar amino acid types is sufficient to drive a designed polypeptide chain to fold into a native-like structure. Since the precise identity of the amino acid side-chain at each position is not explicitly specified in the design, the implications of this result are that carefully controlling the ordering and periodicity of amino acid types is sufficient to engineer novel protein structures. Hence, designing novel proteins that possess biomedically useful properties will become far simpler than previously imagined.
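The expected monomer molecular weight quoted above can be reproduced with a rough estimate. The sketch below assumes an average residue mass of about 110 daltons, a common rule of thumb rather than a value stated in this disclosure; the exact mass of any library member depends on its actual sequence.

```python
AVERAGE_RESIDUE_MASS = 110  # daltons; rule-of-thumb assumption


def approximate_mass(residues, oligomer_state=1):
    """Rough molecular weight of a protein from its length and the number
    of chains assumed to associate in solution."""
    return residues * AVERAGE_RESIDUE_MASS * oligomer_state


monomer = approximate_mass(74)     # designed length of 74 amino acids
dimer = approximate_mass(74, 2)    # what a two-chain oligomer would give

print(monomer, dimer)
```

On this estimate a monomer is close to the ~8,000 figure, while a dimer would be larger than the ~12,000 size marker and would therefore be expected to elute before it rather than after it.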
This new strategy for protein design will be extremely useful in the production of a vast array of novel macromolecules. These macromolecules can eventually be used for a variety of different functions. In the past, molecular design has focussed on small molecules. We are now entering a new phase of molecular design in which it is possible to design very large molecules (e.g. proteins) to meet our specifications. Just as the ability to design small molecules was applied to solve a variety of different technological problems in the past, this new ability to design macromolecules has the potential to have a wide impact in the future. Thus this novel approach might be used in many fields of technology ranging from industrial catalysis to the development of new pharmaceuticals.
Although the initial prototypes have not yet been shown to possess any particular activity, the method for producing these structures is likely to be widely used in the future. The method for constructing new structures disclosed herein is very general and these structures can serve as scaffolds for incorporating a variety of novel activities. Thus this method can be seen as a first step towards the development of an entire generation of newly designed 'tailor-made' molecules.
Perhaps the best way to illustrate this is by way of a metaphor: In a sense our new method is analogous to a novel strategy for the design and production of buildings. Since we have not yet furnished the buildings, they have not yet been made into homes, offices, or schools. However, since they have the potential to serve any one of these functions, our general approach to construction is likely to be used by engineers interested in constructing any one of a number of different functional structures.
The following is a summary of the more important experimental data: 1) For 15 randomly selected candidates, it has been demonstrated that the protein is expressed at very high levels and is in the soluble fraction of the cells. Since it is well-known that unstable or structure-less proteins are readily degraded in vivo, this result provides very strong evidence that our designed proteins fold into stable globular structures.
2) As a step in our protein purification, the partially pure protein was heated to 70°C, followed by quickly cooling to 0°C, and then centrifuging away whatever had precipitated out of solution. This step is used in some protein purifications in the literature because it causes many contaminating proteins to unfold and precipitate. If the protein of interest is stable, it will not unfold (or it will refold rapidly) and will remain in solution. Therefore, it can be separated from precipitated contaminants by centrifugation. Figure 5 shows the result of this experiment for 3 novel proteins. The three lanes on the right show soluble protein before the heat step, the three lanes on the left show the same proteins after the heat step. This experiment demonstrates that our novel proteins form structures that are extremely stable and robust. Thus far, this heat step has worked on 9 of the 10 novel proteins that we have examined.
3) Three of the purified proteins have been analyzed on sizing columns. Such columns allow one to determine the size of a protein in solution. Large proteins are excluded from the matrix of the column and so they elute faster than small proteins. All three of the designed proteins elute at positions expected for a compact monomeric protein of the expected size (i.e., molecular weight ~8,000). Figure 6 shows an example for one of our proteins, which clearly elutes later (and hence is smaller) than the molecular weight standard indicated by the tick mark.
4) These same 3 proteins were examined by circular dichroism spectroscopy. This technique allows one to determine general features about the structure of a protein. For example, minima at 208 and 222 nanometers indicate that a protein is predominantly α-helical. Figure 7 shows that the circular dichroism spectrum of one of our proteins indeed shows these features. This result has been confirmed for all three of the proteins examined thus far.
Significance of the Invention
The field of biotechnology has seen enormous advances in the past decade. As a result of breakthroughs in disciplines ranging from genetic engineering to computer modelling, it will soon be possible to design and engineer entirely novel bio-molecules that possess predetermined structures and desired activities. Since the reactions and interactions of biological molecules underlie all of the macroscopic and medically observable states of living systems, this new capability for biomedical engineering on the molecular scale has the potential to revolutionize almost every facet of biology and medicine.
To produce novel bio-molecules with medically important functions, the essential first step will be the successful design of large molecules (e.g. proteins) that fold into predetermined structures. Pioneering experiments aimed at the design of novel proteins have already shown that the construction of particular sequences that fold into specific 3-dimensional structures is within the reach of current understanding and technology. However, each of these initial efforts employed a different molecular strategy in pursuit of a different structural goal: They each designed one amino acid sequence that folded into one desired structure. Consequently, these initial strategies are not broadly applicable to the design of entire classes of novel proteins. In order to attain the long-term goal of constructing a variety of biomedically useful proteins, novel strategies will have to be developed. The invention disclosed herein develops such strategies. In the experiments described above, novel proteins are designed by specifying only the minimally required sequence information. This leads to the generation of a vast collection of stable protein structures. Each of these structures is slightly different, and each of them will have the potential to serve as a scaffold for incorporating any one of many possible biochemical functions.
While the specific research described here does not involve clinical studies, its long-term clinical potential is enormous. Since virtually all diseases stem from aberrant reactions or interactions on the molecular scale, our ability to devise novel 'molecular machinery' has the potential to touch every corner of clinical medicine: viral infections might be treated with new proteins that attach to, and inactivate, virus particles; heart disease might be treated with new enzymes that dissolve clots; and cancer might be treated by the injection of 'molecular magic bullets' that seek out and destroy aberrant cells. While some of these applications may not be clinically available for several years, there is no doubt that in the decades to come, virtually all fields of medicine will be revolutionized by the future availability of novel proteins 'tailor made' for specific clinical applications.
BIBLIOGRAPHY
1. Mutter, M. (1988) Trends in Biochem. Sci. 13, 260. Mutter, M. et al. (1989) Proteins 5, 13.
2. Sasaki, T. and Kaiser, E.T. (1989) J. Amer. Chem. Soc. 111, 380. Sasaki, T. and Kaiser, E.T. (1990) Biopolymers 29, 79.
3. Hahn, K.W., Klis, W.A., and Stewart, J.M. (1990) Science 248, 1544.
4. Unson, C.G., Erickson, B.W., Richardson, D.C., and Richardson, J.S. (1984) Fed. Proc. 43, 1837. Richardson, J.S. and Richardson, D.C. (1987) in Protein Engineering (Oxender, D.L. and Fox, C.F., eds.) pp. 149-163, Alan R. Liss, Inc. Richardson, J.S. and Richardson, D.C. (1989) Trends in Biochem. Sci. 14, 304.
5. Eisenberg, D., Wilcox, W., Eshita, S.M., Pryciak, P.M., Ho, S.P., and DeGrado, W.F. (1986) Proteins 1, 16. Ho, S.P. and DeGrado, W.F. (1987) J. Amer. Chem. Soc. 109, 6751. DeGrado, W.F., Regan, L., and Ho, S.P. (1987) Cold Spring Harbor Symp. on Quant. Biol. 52, 521. Regan, L. and DeGrado, W.F. (1988) Science 241, 976. Hill, C.P., Anderson, D.H., Wesson, L., DeGrado, W.F., and Eisenberg, D. (1990) Science 249, 543.
6. Hecht, M.H., Richardson, J.S., Richardson, D.C., and Ogden, R.C. (1990) Science 249, 884.
7. Eisenberg, D., Weiss, R.M., and Terwilliger, T.C. (1982) Nature 299, 371. Eisenberg, D., Weiss, R.M., and Terwilliger, T.C. (1984) Proc. Nat'l. Acad. Sci. (USA) 81, 140.
8. Handel, T. and DeGrado, W.F. (1990) J. Amer. Chem. Soc. 112, 6710. Regan, L. and Clarke, N. (1990) Biochemistry 29, 10878.

Claims

What I claim is:
1) A method for the design and expression of a novel protein comprising the steps of:
a) predetermining an amino acid sequence representing a pattern of hydrophilic and hydrophobic residues arranged in such a pattern that the periodicity of polar and non-polar residues matches the inherent periodicity of an alpha-helical secondary structure;
b) creating a degenerate family of genes that encode protein sequences in accordance with said periodicity;
c) inserting said gene into a high-level expression vector;
d) placing said gene-containing vector into a host;
e) isolating individual clones of cells that express individual members of the degenerate family of genes.
2) The method recited in claim 1, including the further step of purifying protein produced by at least one of said genes.
3) The method recited in claim 2 yields proteins that fold into predetermined 3-dimensional structures that are compact and stable.
4) A method for expressing a library of different folded α-helical proteins having similar solubility properties comprising coding a sequence of polar and non-polar amino acids in accordance with the periodicity of the protein α-helix to create a degenerate set of stable proteins having the desired solubility.
5) The method recited in claim 4, including the step of modifying the protein so produced for developing a resultant protein having a specified function.
6) One or more proteins produced by the method set forth in claim #1.
7) One or more proteins produced by the method set forth in claim #4.
8) At least one member of the family of proteins formed by the method recited in claim #1 and having a periodicity of 3.6 residues/turn.

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU47826/93A AU4782693A (en) 1992-08-04 1993-07-19 Method for design of novel proteins and proteins produced thereby

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US92557292A 1992-08-04 1992-08-04
US07/925,572 1992-08-04

Publications (1)

Publication Number Publication Date
WO1994003627A1 true WO1994003627A1 (en) 1994-02-17

Family

ID=25451932

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1993/006937 WO1994003627A1 (en) 1992-08-04 1993-07-19 Method for design of novel proteins and proteins produced thereby

Country Status (2)

Country Link
AU (1) AU4782693A (en)
WO (1) WO1994003627A1 (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SCIENCE, Vol. 249, issued 24 Aug. 1990, MICHAEL H. HECHT et al., "De Novo Design, Expression, and Characterization of Felix: A Four-Helix Bundle Protein of Native-Like Sequence", pages 884-891. *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1352959A1 (en) * 1997-01-24 2003-10-15 Bioinvent International AB A method for in vitro molecular evolution of protein function
US7432083B2 (en) 1997-01-24 2008-10-07 Bioinvent International Ab Method for in vitro molecular evolution of protein function

Also Published As

Publication number Publication date
AU4782693A (en) 1994-03-03

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU CA JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA