WO1999031266A1 - Method for determining and modifying protein/peptide solubility



Publication number
WO1999031266A1
Authority
WO
WIPO (PCT)
Application number
PCT/US1998/025862
Other languages
French (fr)
Inventor
Geoffrey S. Waldo
Original Assignee
The Regents Of The University Of California
Application filed by The Regents Of The University Of California
Priority to AU16291/99A (AU1629199A)
Publication of WO1999031266A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P21/00 Preparation of peptides or proteins
    • C12P21/02 Preparation of peptides or proteins having a known sequence of two or more amino acids, e.g. glutathione
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6897 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids involving reporter genes operably linked to promoters
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/02 Fusion polypeptide containing a localisation/targetting motif containing a signal sequence
    • C07K2319/60 Fusion polypeptide containing spectroscopic/fluorescent detection, e.g. green fluorescent protein [GFP]
    • C07K2319/61 Fusion polypeptide containing an enzyme fusion for detection (lacZ, luciferase)
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/52 Predicting or monitoring the response to treatment, e.g. for selection of therapy based on assay results in personalised medicine; Prognosis

Definitions

  • Green fluorescent protein has become a widely used reporter of gene expression and regulation.
  • DNA shuffling has been used to obtain a mutant having a whole cell fluorescence 45-times greater than the standard, commercially available plasmid GFP. See, e.g., "Improved Green Fluorescent Protein By Molecular Evolution Using DNA Shuffling," by Andreas Crameri et al., Nature Biotechnology 14, 315 (1996).
  • the screening process optimizes the function of GFP (green fluorescence), and thus uses a functional screen.
  • the bacteria under the control of a T7 promoter, and that the bacteria contained inclusion bodies consisting of protein indistinguishable from jellyfish or soluble recombinant protein on denaturing gels, but that this material was completely nonfluorescent, lacked the visible absorbance bands of the chromophore, and did not become fluorescent when solubilized and subjected to protocols that renature GFP, as opposed to the soluble GFP in the bacteria which undergoes correct folding and, therefore, fluoresces.
  • Another object of the present invention is to provide a solubility reporter for rapidly identifying soluble forms of proteins.
  • Another object of the invention is to provide a method for modifying the solubility of proteins by generating large numbers of genetic mutants of the gene which encodes for the protein to be solubilized which can be expressed and the resulting proteins screened for solubility.
  • the method for determining the solubility of a protein, P, of this invention may include the steps of: fusing a DNA fragment, [P], which codes for the protein with the DNA [R] which codes for a reporter protein, R, which can be detected in solution, forming thereby a fusion
  • the DNA fragment [P] is fused with the DNA fragment [L] which codes for a flexible linker peptide, L, which has been fused with the DNA fragment [R], forming thereby either fusion DNA fragment [P-L-R] or fusion DNA fragment [R-L-P], such that the solubility of the fusion proteins encoded by the [P-L-R] or the [R-L-P] are determined by the solubility of protein P.
  • the DNA fragment bearing [L-R] or [R-L] is part of an expression vector and/or transfection/transformation vector enabling the fusion of [P] to yield the DNA fusions [P-L-R] or [R-L-P] as part of said vectors, thus enabling a host cell to express either the fusion protein P-L-R or the fusion protein R-L-P, such that the solubility of the fusion protein is determined by the solubility of protein P.
  • linker peptide is short, flexible, hydrophilic and soluble.
  • the reporter protein includes green fluorescent protein.
  • the method for modifying the solubility of a protein, P, hereof may include the steps of: introducing mutations into [P], the DNA fragment which codes for the protein, generating thereby a combinatorial library of mutated variants, [X]; in-frame fusing individual [X] variants with a DNA construct such as a plasmid vector which includes a fragment which codes for a reporter protein, [R], which can be detected in solution, forming thereby a set of DNA constructs containing [X-R], which code for the fusion proteins, X-R, such that the solubility of each of the X-R proteins is determined by the solubility of the variant protein X contained therein; and introducing each of the DNA constructs into an expression host such that the fusion protein is overexpressed therein; whereby if one of the fusion proteins X-R is soluble in the host therefor, said reporter protein R can be detected, thereby indicating that the variant
  • the DNA fragment [X] is fused with the DNA fragment which codes for a flexible linker peptide, [L], which has been fused with the DNA fragment [R], forming thereby either fusion DNA fragment [X-L-R] or fusion DNA fragment [R-L-X], such that the solubility of the fusion proteins expressed by the [X-L-R] or the [R-L-X] are determined by the solubility of protein X.
  • the DNA fragment bearing [L-R] or [R-L] is part of an expression vector and/or transfection/transformation vector enabling the fusion of [X] to yield the DNA fusions [X-L-R] or [R-L-X] as part of said vectors, thus enabling a host cell to express either the fusion protein X-L-R or the fusion protein R-L-X, such that the solubility of the fusion protein is determined by the solubility of protein X.
  • linker peptide is short, flexible, hydrophilic and soluble.
  • reporter protein includes green fluorescent protein. It is also preferred that the step of introducing mutations into [P] generating thereby a combinatorial library of mutated variants [X] is achieved using gene shuffling and directed evolution.
  • Benefits and advantages of the present invention include enhancing the solubility of proteins of interest without having to test individually the solubility of each protein modification generated (for example, by large-scale growth of each mutant in question followed by cell lysis, fractionation and sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE)), and general applicability.
  • FIGURE 1 is a flow diagram illustrating the use of the solubility reporter according to the teachings of the present invention. If protein P is insoluble, the fusion protein P-L-GFP is insoluble, aggregated or bound in inclusion bodies, and nonfluorescent, while if protein P is soluble, the fusion protein P-L-GFP is soluble and fluorescent.
  • FIGURE 2 is a flow diagram illustrating the generation of mutated versions of an arbitrary protein, P, which have enhanced solubility, employing fluorescence-assisted cell sorting to identify and select mutants with enhanced solubility.
  • FIGURE 3 illustrates the performance of the GFP solubility reporter in E. coli.
  • FIGURE 4 illustrates the increase in fluorescence of clones expressing the fusion H-type ferritin-L-GFP during the process of directed evolution using nutrient agar plates.
  • FIGURE 5 illustrates the application of the method to improve the solubility of bullfrog H-type ferritin, a protein which is normally insoluble when overexpressed at 37° C in E. coli.
  • the present invention utilizes a solubility reporter protein, expressed by the DNA fragment [R], which gives a specific, measurable signal when the protein encoded by the in-frame fusion DNA fragment, [P-L-R], is soluble, where [P] is the DNA fragment which encodes the protein, P, to be solubilized, and [L] is the DNA fragment which encodes flexible linker peptide, L.
  • R is green fluorescent protein (GFP).
  • Linker peptide L which is preferably optimized for flexibility, hydrophilic nature, and solubility, is fused to the GFP.
  • the fusion protein(s) L-GFP (GFP fused to the C-terminus of L) or GFP-L (GFP fused to the N-terminus of L) are soluble within the expression host and fluorescent.
  • the DNA encoding P is then fused to a reporter vector containing the DNA fragment which encodes the L-GFP construct, and the fusion protein P-L-GFP (P fused to the N-terminus of L-GFP) is caused to be overexpressed in the host cell.
  • the DNA encoding P is fused to a reporter vector containing a DNA fragment which encodes the GFP-L construct, and the fusion protein GFP-L-P (P fused to the C-terminus of GFP-L) is caused to be overexpressed in the host cell.
  • the GFP-L and L-GFP are chosen such that the solubility of the P-L-GFP or GFP-L-P is controlled by the solubility of P. It is anticipated that for some systems, linker peptide L will not be required.
  • the proteins P-L-GFP or GFP-L-P are soluble within the expression host and are fluorescent.
  • FIG. 1 is a schematic representation of the use of the solubility reporter according to the teachings of the present invention.
  • Modification and, more particularly, enhancement of the solubility of protein P is accomplished by use of a DNA construct containing at least the solubility reporter DNA fragments [L-GFP] or [GFP-L], in a directed evolution of [P].
  • a combinatorial library of mutated variants X is generated by gene shuffling, for example.
  • the resulting pool of genes [X] encoding mutated proteins X is then genetically fused in-frame either with a pool of DNA constructs such as vectors containing [L-GFP] to produce a pool of DNA constructs encoding fusion proteins X-L-GFP; or to a pool of DNA constructs containing [GFP-L] to produce a pool of DNA constructs encoding fusion proteins GFP-L-X, each fusion variant having solubility determined by X.
  • the pool of DNA constructs is introduced into an expression host, for example by electroporation of circular plasmid vectors into E. coli.
  • individual variants with increased fluorescence may be screened and separated using fluorescence-assisted cell sorting, as an example.
  • FIG. 2 is a schematic illustration of the generation of mutated versions of an arbitrary protein, P, which have enhanced solubility, employing fluorescence-assisted cell sorting (FACS) to identify and select mutants with enhanced solubility according to the teaching of the present invention.
  • FACS: fluorescence-assisted cell sorting.
  • reporter R should be chosen to have the following characteristics: (1) the observed parameter for R, which indicates the solubility of X-L-R and R-L-X, must not be observable independently of the solubility of X, nor merely because X is present; (2) R should not dominate the solubility of X-L-R; (3) the solubility of X-L-R and R-L-X should be determined primarily by the solubility of X; (4) R should not assist the folding of X; (5) L should not significantly influence the solubility of R-L-X or X-L-R; and (6) L should not dominate the folding of any of X, R, X-L-R, or R-L-X.
  • EXAMPLE 1 As an example of the assembly of a construct which satisfies the above-described six criteria, a Bgl-II/Xho-I fragment of plasmid pET-21a(+), containing: the T7 promoter; lac operator sequence; ribosomal binding site; and multiple cloning site was ligated into the Bgl-II/Xho-I site of pET-28a(+).
  • the resulting hybrid plasmid contained the Kan, lacI, and F1 origin of replication of the pET-28a(+) backbone.
  • the pET21a(+) and pET28a(+) vectors were used as obtained from a commercial source.
  • the vector was digested with Nde-I and BamH-I, the small fragment was discarded, and replaced with an in-frame stuffer such that the sequence, inclusive of the Nde-I and BamH-I sites, was [CATATGTGTAGACAGCTGGGATCC].
  • the vector was digested with BamH-I and EcoR-I and the small stuffer was discarded.
  • the BamH-I/EcoR-I site was filled with the DNA fragment [GGATCCGCTGGCTCCGCTGCTGGTTCTGGCGAATTC], coding for the flexible linker L (GSAGSAAGSGEF).
  • the resulting GFP variant was amplified by PCR using the 5' primer [GATATAGAATTCAGCAAA ⁇ GGAGAAGAACTTTTC], incorporating a 5' EcoR-I site; and the 3' primer [GAATTCGGTACCTTATTTGTAGAGCTCTACCA ⁇ , incorporating a 5' Xho-I site.
  • the resulting vector was digested with EcoR-I/Xho-I, the stuffer discarded, and replaced with the EcoR-I/Xho-I-digested EcoR-I:GFP:Xho-I amplicon, and the circular plasmid produced thereby was transformed by electroporation into the E.
  • the construct in the pET vector system is inducible by IPTG.
  • a transformant was used to inoculate a culture of LB and grown to an optical density (O.D.) at 600 nm of approx. 0.5, IPTG was added to a final concentration of 1 mM, and induction was allowed to proceed for 2 h.
  • the small in-frame stuffer fragment between Nde-I and BamH-I was removed by restriction digest and replaced by an out-of-frame stuffer with 3 translational stops. Cells expressing this fusion were non-fluorescent due to termination of translation prior to the GFP.
  • the vector was digested with Nde-I+BamH-I to remove the stuffer and create a recipient site for Nde-I/BamH-I-flanked inserts. This recipient vector is subsequently referred to as the solubility-reporter vector.
  • the specific examples described below use primers for the genes of interest which contain Nde-I (N-terminus) and BamH-I (C-terminus) sites. The use of an out-of-frame stuffer ensures that any vectors escaping digestion code for non-fluorescent constructs, which has the effect of eliminating false positives.
  • The response of the reporter system prepared as described hereinabove to two proteins (one highly soluble, the other highly insoluble), each of which is efficiently overexpressed in E. coli, is demonstrated in Fig. 3.
  • a fusion to the highly soluble protein malE, which is widely used as a fusion partner to facilitate the purification of overexpressed proteins in E. coli, [malE-L-GFP], was selected to demonstrate the response of the reporter system to a soluble protein.
  • a fusion construct with xylR, a highly insoluble bacterial regulator protein, [xylR-L-GFP] was chosen to demonstrate the response of the reporter system to an insoluble protein.
  • Fig. 3a is a photograph of the resulting brightly fluorescent colonies where the protein malE-L-GFP is overexpressed.
  • Fig. 3b is a photograph of the resulting weakly fluorescent colonies where the protein xylR-L-GFP is overexpressed.
  • the response of the solubility reporter system during improvement of the solubility of bullfrog H-ferritin by directed evolution of the expressed fusion construct, [ferritin-L-GFP], is shown in Fig. 4.
  • the 6 clones of the ninth row are: wild type (barely visible at the extreme left), followed by the optima (brightest, most soluble) from cycles 1, 2, 3 and 4 of directed evolution, and from round 1 of backcrossing of the round 4 optima against the wild-type ferritin.
  • the upper grid of 8 rows, 6 clones per row (48 colonies), are optima from a second round of backcrossing to remove non-essential mutations. With each cycle, the fluorescence (and hence solubility) improves.
  • Figure 5 shows an SDS-PAGE gel illustrating the effectiveness of solubility reporters in a directed evolution process to improve the solubility of bullfrog H-type ferritin expressed in E. coli.
  • Molecular weight marker ladder, M, 10 kDa.
  • EXAMPLE 2 The above-described use of a solubility reporter can be analogously extended to determine the solubility of protein fragments.
  • the DNA [P] is subjected to a partial enzymatic digest (e.g., by DNase I in the presence of the divalent cations Mn²⁺ or Co²⁺) to create a pool of smaller fragments, [F].
  • the fragments can be polished with a proof-reading polymerase bearing 3'-5' exonuclease activity to yield blunt ends, or subsequently given A-overhangs by treatment with a polymerase devoid of 3'-5' exonuclease activity (e.g., Taq polymerase) in the presence of excess dATP.
  • a particular size range of the fragments [F] may be selected, by agarose gel electrophoresis as an example.
  • solubility reporter method may be used to determine the solubility of a protein, its variants (mutants), and fragments thereof.
  • EXAMPLE 3 The foregoing has shown that GFP can be used as a solubility reporter.
  • solubility reporters incorporating a translational fusion [P-L-R] include systems in which R is a protein/peptide other than GFP.
  • R can be a protein/peptide which gives a detectable signal observable by chemical, biological or physical means, when linked to P-L as P-L-R.
  • R could be the beta-galactosidase enzyme, lacZ.
  • Clones expressing P-L-lacZ in which P is a soluble protein are detected by the enzymatic activity of lacZ (See, e.g., "Beta-Galactosidase Gene Fusions For Analyzing Gene Expression In Escherichia Coli And Yeast," by M. Casadaban et al., Methods Enzymol. 100, 293 (1983)) on substrates which yield a colored reaction product (for example, X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside)).
  • Colonies expressing fusion proteins with β-galactosidase activity turn blue on plates containing X-gal.
  • the functionally complementable lacZα fragment is used as a substitute.
  • the complementary fragment ω-lacZ is provided by the host chromosome
  • E. coli strain DH10B (F– mcrA Δ(mrr-hsdRMS-mcrBC) φ80dlacZΔM15
  • Reporter proteins R which have optimal activity when present in a non-fusion context may be employed for assays.
  • the construct P-L-C-R is generated, where C is a unique protease site.
  • C could be the viral protease cleavage site for the plum pox virus NIa protease (See, e.g., M. Martin et al., "Determination of polyprotein processing sites by amino terminal sequencing of nonstructural proteins encoded by plum pox potyvirus", Virus Res. 15, 97 (1990)), and R is the lacZα fragment, as an example.
  • P-L-C-lacZα and the viral protease (NIa) could each be expressed under the control of separately inducible promoters on separate plasmids with compatible origins of replication.
  • for plasmids with cloning sites under independently controlled promoters, see R. Lutz and H. Bujard, "Independent and tight regulation of transcriptional units in E. coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements", Nucleic Acids Res. 25(6), 1203 (1997).
  • the P-L-C-lacZα construct could be expressed under the control of the tet promoter, and the NIa gene under the control of the arabinose promoter/repressor.
  • the plasmid(s) would be transformed into the appropriate E. coli host (see Lutz, supra), and anhydrotetracycline added to the growth medium to induce expression of P-L-C-lacZα.
  • arabinose+IPTG is added to the growth medium to induce expression of the NIa protease.
  • P-L-C-lacZα is soluble, contains a correctly folded lacZα domain, and is cleaved at site C only if P is soluble. Subsequent release of lacZα complements the ω-lacZ fragment and restores lacZ β-galactosidase activity, which is detected by standard colorimetric or fluorometric assays for β-galactosidase activity.
  • R might be an antibiotic selection marker such as the β-lactamase gene (bla), which confers resistance to penicillin-derived antibiotics and is commonly used in cloning vectors.
  • the β-lactamase gene encodes a signal peptide, and the protein is translocated to the periplasm of E. coli.
  • proper processing of the antibiotic resistance protein and its translocation to the periplasm would be impeded by N-terminal fusions, although cleavage by the protease obviates this problem.
  • the P-L-C-β-lactamase fusion protein would be soluble only if P were soluble. Concomitant induction by both anhydrotetracycline and IPTG+arabinose would provide both the fusion protein P-L-C-β-lactamase and the viral cleavage protease NIa.
  • In cells bearing soluble variants of P, the fusion protein P-L-C-β-lactamase would be soluble and cleaved at C by the protease NIa, releasing functional β-lactamase and thereby conferring resistance to the antibiotic ampicillin. Conversely, in cells bearing insoluble variants of P, the fusion protein would be insoluble, and the protease cleavage site C would be buried in inclusion bodies and thereby inaccessible to cleavage by the viral protease. Furthermore, the β-lactamase protein would be buried in inclusion bodies, misfolded and non-functional. Such cells would not be resistant to ampicillin.
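The Example 1 sequences can be checked with a short script. The sketch below is an illustration added here, not part of the original disclosure: it translates the linker insert and the in-frame stuffer with the standard genetic code and lists where the NdeI (CATATG), BamHI (GGATCC) and EcoRI (GAATTC) recognition sites fall. The function names are hypothetical, and the stuffer is read from its ATG on the assumption (as in pET vectors) that the start codon lies inside the NdeI site.

```python
# Standard genetic code, indexed in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

SITES = {"NdeI": "CATATG", "BamHI": "GGATCC", "EcoRI": "GAATTC"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acids ('*' = stop)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def find_sites(dna: str) -> dict:
    """Map each enzyme to the 0-based start positions of its recognition site."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in SITES.items():
        starts = [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]
        if starts:
            hits[enzyme] = starts
    return hits


linker = "GGATCCGCTGGCTCCGCTGCTGGTTCTGGCGAATTC"  # BamHI...EcoRI linker insert, Example 1
stuffer = "CATATGTGTAGACAGCTGGGATCC"              # NdeI...BamHI in-frame stuffer, Example 1

print(translate(linker))       # GSAGSAAGSGEF
print(find_sites(linker))      # {'BamHI': [0], 'EcoRI': [30]}
print(translate(stuffer[3:]))  # MCRQLGS (read from the ATG inside CATATG)
print(find_sites(stuffer))     # {'NdeI': [0], 'BamHI': [18]}
```

The linker translates to GSAGSAAGSGEF as stated, its BamHI codons (GGA TCC) stay in the same frame as those ending the stuffer, and the stuffer read from its ATG contains no stop codon, consistent with its description as in-frame.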

Abstract

A solubility reporter for measuring a protein's solubility in vivo or in vitro is described. The reporter, which can be used in a single living cell, gives a specific signal which can be used to determine whether the cell bears a soluble version of the protein of interest. A pool of random mutants of an arbitrary protein, generated using error-prone in vitro recombination, may also be screened for more soluble versions using the reporter, and these versions may be recombined to yield variants having further enhanced solubility. The method of the present invention includes 'irrational' (random mutagenesis) methods, which do not require a priori knowledge of the three-dimensional structure of the protein of interest. Multiple sequences of mutation/genetic recombination and selection for improved solubility yield versions of the protein which display enhanced solubility.
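The mutate, recombine and screen cycle summarized above can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions: `toy_fluorescence` is a hypothetical stand-in for the measured GFP signal of an X-L-GFP clone (here simply the fraction of residues outside an arbitrary hydrophobic set), single-crossover recombination is a crude substitute for DNA shuffling, and all parameter values are illustrative. It is not a model of real protein solubility.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Arbitrary toy assumption: treat these residues as "aggregation-prone".
HYDROPHOBIC = set("AILMFVW")


def toy_fluorescence(seq: str) -> float:
    """Hypothetical stand-in for the whole-cell GFP signal of an X-L-GFP clone.

    Returns the fraction of residues outside HYDROPHOBIC; a real screen
    would measure fluorescence instead.
    """
    return sum(aa not in HYDROPHOBIC for aa in seq) / len(seq)


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Error-prone step: each position is redrawn at random with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def recombine(a: str, b: str, rng: random.Random) -> str:
    """Single-crossover recombination of two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(wild_type: str, rounds: int = 4, library_size: int = 200,
           keep: int = 10, rate: float = 0.02, seed: int = 0):
    """Run rounds of recombination + mutation, keeping the `keep` brightest clones."""
    rng = random.Random(seed)
    pool = [wild_type]
    history = [toy_fluorescence(wild_type)]
    for _ in range(rounds):
        library = [
            mutate(recombine(rng.choice(pool), rng.choice(pool), rng), rate, rng)
            for _ in range(library_size)
        ]
        # Screen: rank unique clones by signal; previous optima stay in the running.
        candidates = list(dict.fromkeys(library + pool))
        pool = sorted(candidates, key=toy_fluorescence, reverse=True)[:keep]
        history.append(toy_fluorescence(pool[0]))
    return pool[0], history


best, history = evolve("MKVLAAGIVLLAFWAVLSGE")  # arbitrary 20-residue example
print(history)  # best toy score after each round, never decreasing
```

Carrying the previous optima into each new pool guarantees that the best score never decreases from round to round, loosely mirroring the recombination of the brightest clones from one cycle into the next.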

Description

METHOD FOR DETERMINING AND MODIFYING PROTEIN/PEPTIDE SOLUBILITY
FIELD OF THE INVENTION
The present invention relates generally to improving the solubility of proteins/peptides and, more particularly, to a method for identifying more or less soluble proteins/peptides from libraries of mutants thereof generated from the directed evolution of genes which express these proteins/peptides. This invention was made with government support under Contract No. W-7405-ENG-36 awarded by the U.S. Department of Energy to The Regents of the University of California. The government has certain rights in the invention.
BACKGROUND OF THE INVENTION
Protein insolubility constitutes a significant problem in basic and applied bioscience, in many situations limiting the rate of progress in these areas. Protein folding and solubility have been the subject of considerable theoretical and empirical research. However, there still exists no general method for improving intrinsic protein solubility. Such a method would greatly facilitate protein structure-function studies, drug design, de novo peptide and protein design and associated structure-function studies, industrial process optimization using bioreactors and microorganisms, and many disciplines in which a process or application depends on the ability to tailor or improve the solubility of proteins, screen or modify the solubility of large numbers of unique proteins about which little or no structure-function information is available, or adapt the solubility of proteins to new environments when the structure and function of the protein(s) are poorly understood or unknown.
Overexpression of cloned genes using an expression host, for example E. coli, is the principal method of obtaining proteins for most applications. Unfortunately, many such cloned foreign proteins are insoluble or unstable when overexpressed. There are two sets of approaches currently in use which deal with such insoluble proteins. One set of approaches modifies the environment of the protein in vivo and/or in vitro. For example, proteins may be expressed as fusions with more soluble proteins, or directed to specific cellular locations. Chaperones may be coexpressed to assist folding pathways. Insoluble proteins may be purified from inclusion bodies using denaturants and the protein subsequently refolded in the absence of the denaturant. Modified growth media and/or growth conditions can sometimes improve the folding and solubility of a foreign protein. However, these methods are frequently cumbersome, unreliable, ineffective, or lack generality. A second set of approaches changes the sequence of the expressed protein. Rational approaches employ site-directed mutation of key residues to improve protein stability and solubility. Alternatively, a smaller, more soluble fragment of the protein may be expressed. These approaches require a priori knowledge about the structure of the protein, knowledge which is generally unavailable when the protein is insoluble. Furthermore, rational design approaches are best applied when the problem involves only a small number of amino-acid changes. Finally, even when the structure is known, the changes required to improve solubility may be unclear. Thus, many thousands of possible combinations of mutations may have to be investigated, leading to what is essentially an "irrational" or random mutagenesis approach. Such an approach requires a method for rapidly determining the solubility of each version.
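To make the scale of that search concrete: for a protein of n residues, the number of distinct variants carrying exactly k amino-acid substitutions is C(n, k)·19^k, since each of the k chosen positions can take any of the 19 other residues. The short calculation below assumes a hypothetical 100-residue protein and is an illustration only.

```python
from math import comb


def n_variants(length: int, k: int) -> int:
    """Distinct sequences with exactly k substitutions in a protein of `length` residues."""
    return comb(length, k) * 19 ** k


for k in (1, 2, 3):
    print(k, n_variants(100, k))
# 1 1900
# 2 1786950
# 3 1109100300
```

Even restricted to double mutants, the space runs to well over a million sequences, which is why a rapid per-clone solubility readout is needed rather than growth, lysis and gel analysis of each variant.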
Random or "irrational" mutagenesis redesign of protein solubility carries the possibility that the native function of the protein may be destroyed or modified by the inadvertent mutation of residues which are important for function, but not necessarily related to solubility. However, protein solubility is strongly influenced by interaction with the environment through surface amino acid residues, while catalytic activities and/or small substrate recognition often involve partially buried or cleft residues distant from the surface residues. Thus, in many situations, rational mutation of proteins has demonstrated that the solubility of a protein can be modified without destroying the native function of the protein. Modification of the function of a protein without affecting its solubility has also been frequently observed. Furthermore, spontaneous mutants of proteins bearing only 1 or 2 point mutations have been serendipitously isolated which have converted a previously insoluble protein into a soluble one. This suggests that the solubility of a protein can be optimized with a low level of mutation and that protein function can be maintained independently of enhancements or modifications to solubility. Furthermore, a screen for function may be applied concomitantly after each round of solubility selection during the directed evolution process.
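Applying a functional screen after each round of solubility selection amounts to two filters in sequence. The sketch below is a hypothetical illustration: `reporter_signal` and `retains_function` stand in for whatever solubility readout and functional assay are available, and the threshold and toy data are arbitrary.

```python
from typing import Callable, List


def select_round(variants: List[str],
                 reporter_signal: Callable[[str], float],
                 retains_function: Callable[[str], bool],
                 threshold: float) -> List[str]:
    """Keep variants whose solubility-reporter signal exceeds `threshold`,
    then discard any that fail the separate functional assay."""
    soluble = [v for v in variants if reporter_signal(v) > threshold]
    return [v for v in soluble if retains_function(v)]


# Toy data: signals and activities are invented for illustration only.
signal = {"v1": 0.9, "v2": 0.8, "v3": 0.2}
active = {"v1": True, "v2": False, "v3": True}
print(select_round(list(signal), signal.__getitem__, active.__getitem__, threshold=0.5))
# ['v1']
```

Here v3 fails the solubility cut and v2 is soluble but inactive, so only v1 is carried into the next round.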
In the absence of a screen for function, for example when the function is unknown, the final version of the protein can be backcrossed against the wild type in vitro to remove nonessential mutations. This approach has been successfully applied by Stemmer in "Rapid Evolution Of A Protein In Vitro By DNA Shuffling," by W.P.C. Stemmer, Nature 370, 389 (1994), and in "DNA Shuffling By Random Fragmentation And Reassembly: In Vitro Recombination For Molecular Evolution," by W.P.C. Stemmer, Proc. Natl. Acad. Sci. USA 91, 10747 (1994), to problems in which the function of a protein had been optimized and it was desired to remove nonessential mutations accumulated during directed evolution. The development of highly specialized protein variants by directed, in vitro evolution, which exerts unidirectional selection pressure on organisms, is further discussed in: "Searching Sequence Space: Using Recombination To Search More Efficiently And Thoroughly Instead Of Making Bigger Combinatorial Libraries," by Willem P.C. Stemmer, Biotechnology 13, 549 (1995); in "Directed Evolution: Creating Biocatalysts For The Future," by Frances H. Arnold, Chemical Engineering Science 51, 5091 (1996); in "Directed Evolution Of A Fucosidase From A Galactosidase By DNA Shuffling And Screening," by Ji-Hu Zhang et al., Proc. Natl. Acad. Sci. USA 94, 4504 (1997); in "Functional And Nonfunctional Mutations Distinguished By Random Combination Of Homologous Genes," by Huimin Zhao and Frances H. Arnold, Proc. Natl. Acad. Sci. USA 94, 7007 (1997); and in "Strategies For The In Vitro Evolution of Protein Function: Enzyme Evolution By Random Recombination of Improved Sequences", by Jeff Moore et al., J. Mol. Biol. 272, 336-346 (1997). These works describe efficient strategies for engineering new proteins by multiple generations of random mutagenesis and recombination coupled with screening for improved variants.
However, these references contain no teaching concerning the use of directed evolution to improve the solubility of proteins; rather, the mutagenesis was directed to improving protein function. It should be noted, however, that for a protein to function properly in any environment, it must be correctly folded and, therefore, soluble.
Finally, for structural determination it is often not necessary or even desirable to have a fully functional version of the protein. If the mutational rate is low (ensured by molecular backcrossing), the structures of the wild-type and solubility-optimized versions of a protein are likely to be similar. As long as the protein is soluble and a structure can be obtained, it should then be possible to redesign the solubility of the protein using rational methods, if desired.
Green fluorescent protein (GFP) has become a widely used reporter of gene expression and regulation. DNA shuffling has been used to obtain a mutant having whole-cell fluorescence 45 times greater than that of the standard, commercially available plasmid GFP. See, e.g., "Improved Green Fluorescent Protein By Molecular Evolution Using DNA Shuffling," by Andreas Crameri et al., Nature Biotechnology 14, 315 (1996). The screening process optimizes the function of GFP (green fluorescence), and thus uses a functional screen. Although the screening process coincidentally optimizes the solubility of the GFP, since GFP is fluorescent only when properly folded, there is no mention of using soluble GFP as a tag to monitor the solubility of other proteins; that is, the function of the protein, not its solubility, is being modified. In "Wavelength Mutations And Post-translational Auto-oxidation Of Green Fluorescent Protein," by Roger Heim et al., Proc. Natl. Acad. Sci. USA 91, 12501 (1994), GFP was mutagenized and screened for variants with altered absorption or emission spectra. The authors note that, as an alternative to labeling proteins with fluorescent tags to detect their location, and sometimes their conformational changes, both in vitro and in intact cells, a possible strategy would be to concatenate the gene for the nonfluorescent protein of interest with the gene for a naturally fluorescent protein and express the fusion product. However, the focus of this paper is extending the usefulness of GFP, by making available two visibly distinct colors, to the visualization of differential gene expression and protein localization and to the measurement of protein association by fluorescence resonance energy transfer. There is no mention of the use of the gene construct for solubility determinations. The paper further discusses the expression of GFP in E. coli under the control of a T7 promoter: the bacteria contained inclusion bodies consisting of protein indistinguishable from jellyfish or soluble recombinant protein on denaturing gels, but this material was completely nonfluorescent, lacked the visible absorbance bands of the chromophore, and did not become fluorescent when solubilized and subjected to protocols that renature GFP. By contrast, the soluble GFP in the bacteria folds correctly and therefore fluoresces.
Chun Wu et al. in "Novel Green Fluorescent Protein (GFP) Baculovirus Expression Vectors," Gene 190, 157 (1997), describe the construction of Baculovirus expression vectors which contain GFP as a reporter gene. The authors follow the production and purification of a protein of interest expressed in insect cells by cloning its gene in frame with the GFP open reading frame, thereby permitting visualization of the resulting GFP-fusion protein under UV light. However, the purified GFP-XylE fusion protein was found to be insoluble after harvest. In "Application Of A Chimeric Green Fluorescent Protein To Study Protein-Protein Interactions," by N. Garamszegi et al., Biotechniques 23, 864 (1997), the authors discuss the fusion between GFP and human calmodulin-like protein (CLP) and show that this protein retains fluorescence and the known characteristics of CLP. That is, the GFP portion remains responsible for efficient fluorescent signals with little or no influence on the properties of the fused protein of interest. The authors maintain that the GFP fluorescence provides information about the structural integrity of the GFP within the chimeric protein, but not about the integrity of the entire fusion protein; in particular, it does not allow any conclusions concerning the function or integrity of CLP. It is therefore clear that this paper does not contemplate the use of GFP as a solubility reporter for CLP.
Accordingly, it is an object of the present invention to provide a solubility reporter for rapidly identifying soluble forms of proteins. Another object of the invention is to provide a method for modifying the solubility of proteins by generating large numbers of genetic mutants of the gene encoding the protein to be solubilized, which can be expressed and the resulting proteins screened for solubility. Additional objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
SUMMARY OF THE INVENTION
To achieve the foregoing and other objects, and in accordance with the purposes of the present invention, as embodied and broadly described herein, the method for determining the solubility of a protein, P, of this invention may include the steps of: fusing a DNA fragment, [P], which codes for the protein with the DNA [R] which codes for a reporter protein, R, which can be detected in solution, forming thereby a fusion DNA fragment, [P-R], which codes for the fusion protein, P-R, such that the solubility of the P-R is determined by the solubility of protein, P; ligating the [P-R] fragment into an expression vector to form a plasmid DNA; and introducing the plasmid DNA into an expression host such that the fusion protein is overexpressed therein; whereby if the fusion protein P-R is in solution in the host, the reporter protein R can be detected, thereby indicating that the protein P is soluble.
Preferably, the DNA fragment [P] is fused with the DNA fragment [L] which codes for a flexible linker peptide, L, which has been fused with the DNA fragment [R], forming thereby either fusion DNA fragment [P-L-R] or fusion DNA fragment [R-L-P], such that the solubility of the fusion proteins encoded by the [P-L-R] or the [R-L-P] are determined by the solubility of protein P.
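The requirement that [P], [L] and [R] be joined in frame, with translation reading through from one segment into the next, can be illustrated with a short Python sketch. It is an illustration only: the function names and toy sequences are hypothetical, and in practice the fusion is made by restriction cloning or PCR rather than string concatenation. It assumes each part has already been trimmed to its coding sequence, starting at its first codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(seq):
    """Split a coding sequence into codons, refusing lengths that would shift the frame."""
    if len(seq) % 3:
        raise ValueError(f"length {len(seq)} is not a multiple of 3: the fusion would be out of frame")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def assemble_fusion(*parts):
    """Join coding segments N- to C-terminal, e.g. ([P], [L], [R]) or ([R], [L], [P]).

    The terminal stop codon of every part except the last is dropped so that
    translation reads through into the next part; any other in-frame stop is rejected.
    """
    fused = []
    for index, part in enumerate(parts):
        cods = codons(part.upper())
        last = index == len(parts) - 1
        if not last and cods and cods[-1] in STOP_CODONS:
            cods = cods[:-1]  # remove the stop so the next part is translated
        internal = cods[:-1] if last else cods  # only the final part may end in a stop
        if any(c in STOP_CODONS for c in internal):
            raise ValueError(f"part {index} contains an internal in-frame stop codon")
        fused.extend(cods)
    return "".join(fused)

# Hypothetical toy sequences, not real genes.
P = "ATGAAACTGTAA"  # Met-Lys-Leu-stop
L = "GGTTCTGGC"     # Gly-Ser-Gly
R = "ATGGGCAAATAA"  # Met-Gly-Lys-stop
print(assemble_fusion(P, L, R))  # ATGAAACTGGGTTCTGGCATGGGCAAATAA
```

Calling `assemble_fusion(R, L, P)` gives the [R-L-P] orientation instead; a fragment whose length is not a multiple of three raises an error, because every downstream codon, including those of the reporter, would be misread.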
Preferably also, the DNA fragment bearing [L-R] or [R-L] is part of an expression vector and/or transfection/transformation vector enabling the fusion of [P] to yield the DNA fusions [P-L-R] or [R-L-P] as part of said vectors, thus enabling a host cell to express either the fusion protein P-L-R or the fusion protein R-L-P, such that the solubility of the fusion protein is determined by the solubility of protein P.
It is also preferred that the linker peptide is short, flexible, hydrophilic and soluble.
Preferably also, the reporter protein includes green fluorescent protein.
In another aspect of the present invention, in accordance with its objects and purposes, the method for modifying the solubility of a protein, P, hereof may include the steps of: introducing mutations into [P], the DNA fragment which codes for the protein, generating thereby a combinatorial library of mutated variants, [X]; in-frame fusing individual [X] variants with a DNA construct such as a plasmid vector which includes a fragment which codes for a reporter protein, [R], which can be detected in solution, forming thereby a set of DNA constructs containing [X-R], which code for the fusion proteins, X-R, such that the solubility of each of the X-R proteins is determined by the solubility of the variant protein X contained therein; and introducing each of the DNA constructs into an expression host such that the fusion protein is overexpressed therein; whereby if one of the fusion proteins X-R is soluble in the host therefor, said reporter protein R can be detected, thereby indicating that the variant of the protein P is soluble.
Preferably, the DNA fragment [X] is fused with the DNA fragment which codes for a flexible linker peptide, [L], which has been fused with the DNA fragment [R], forming thereby either fusion DNA fragment [X-L-R] or fusion DNA fragment [R-L-X], such that the solubility of the fusion proteins expressed by the [X-L-R] or the [R-L-X] are determined by the solubility of protein X.
Preferably also, the DNA fragment bearing [L-R] or [R-L] is part of an expression vector and/or transfection/transformation vector enabling the fusion of [X] to yield the DNA fusions [X-L-R] or [R-L-X] as part of said vectors, thus enabling a host cell to express either the fusion protein X-L-R or the fusion protein R-L-X, such that the solubility of the fusion protein is determined by the solubility of protein X.
It is preferred that the linker peptide be short, flexible, hydrophilic and soluble. Preferably also the reporter protein includes green fluorescent protein. It is also preferred that the step of introducing mutations into [P], generating thereby a combinatorial library of mutated variants [X], is achieved using gene shuffling and directed evolution.
Benefits and advantages of the present invention include enhancing the solubility of proteins of interest without having to test individually the solubility of each protein modification generated (for example, by large-scale growth of each mutant in question followed by cell lysis, fractionation and sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE)), as well as general applicability.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and form a part of the specification, illustrate the embodiments of the present invention and, together with the description, serve to explain the principles of the invention. In the drawings:
FIGURE 1 is a flow diagram illustrating the use of the solubility reporter according to the teachings of the present invention; if protein, P, is insoluble, the fusion protein, P-L-GFP, is insoluble, aggregated or bound in inclusion bodies, and is nonfluorescent, while if protein P is soluble, fusion protein P-L-GFP is soluble and fluorescent.
FIGURE 2 is a flow diagram illustrating the generation of mutated versions of an arbitrary protein, P, which have enhanced solubility, employing fluorescence-assisted cell sorting to identify and select mutants with enhanced solubility.
FIGURE 3 illustrates the performance of the GFP solubility reporter in E. coli BL21(DE3) induced by isopropyl-β-D-thiogalactopyranoside (IPTG) on Luria-Bertani (LB) media plates.
FIGURE 4 illustrates the increase in fluorescence of clones expressing the fusion H-type ferritin-L-GFP during the process of directed evolution using nutrient agar plates.
FIGURE 5 illustrates the application of the method to improve the solubility of bullfrog H-type ferritin, a protein which is normally insoluble when overexpressed at 37 °C in E. coli.
DETAILED DESCRIPTION
Briefly, the present invention utilizes a solubility reporter protein, encoded by the DNA fragment [R], which gives a specific, measurable signal when the protein encoded by the in-frame fusion DNA fragment [P-L-R] is soluble, where [P] is the DNA fragment which encodes the protein, P, to be solubilized, and [L] is the DNA fragment which encodes the flexible linker peptide, L. In one embodiment of the invention, R is green fluorescent protein (GFP). Linker peptide L, which is preferably optimized for flexibility, hydrophilicity, and solubility, is fused to the GFP. When overexpressed in the host cell, for example E. coli, the fusion proteins L-GFP (GFP fused to the C-terminus of L) or GFP-L (GFP fused to the N-terminus of L) are soluble within the expression host and fluorescent. The DNA encoding P is then fused to a reporter vector containing the DNA fragment which encodes the L-GFP construct, and the fusion protein P-L-GFP (P fused to the N-terminus of L-GFP) is overexpressed in the host cell. Alternatively, the DNA encoding P is fused to a reporter vector containing a DNA fragment which encodes the GFP-L construct, and the fusion protein GFP-L-P (P fused to the C-terminus of GFP-L) is overexpressed in the host cell. The GFP-L and L-GFP are chosen such that the solubility of P-L-GFP or GFP-L-P is controlled by the solubility of P. It is anticipated that for some systems, linker peptide L will not be required. When P is soluble, P-L-GFP or GFP-L-P is soluble within the expression host and fluorescent. When P is insoluble, P-L-GFP or GFP-L-P is found in aggregates within the host, known as inclusion bodies, and is non-fluorescent. Thus, P-L-GFP or GFP-L-P constitutes a solubility reporter for rapidly determining the solubility of P. Figure 1 is a schematic representation of the use of the solubility reporter according to the teachings of the present invention.
Modification and, more particularly, enhancement of the solubility of protein P is accomplished by using a DNA construct containing at least the solubility reporter DNA fragment [L-GFP] or [GFP-L] in a directed evolution of [P]. A combinatorial library of mutated variants X is generated, for example, by gene shuffling. The resulting pool of genes [X] encoding mutated proteins X is then genetically fused in frame either with a pool of DNA constructs, such as vectors, containing [L-GFP] to produce a pool of DNA constructs encoding fusion proteins X-L-GFP, or with a pool of DNA constructs containing [GFP-L] to produce a pool of DNA constructs encoding fusion proteins GFP-L-X, each fusion variant having solubility determined by X. After the DNA is introduced into an expression host, for example by electroporation of circular plasmid vectors into E. coli, individual variants with increased fluorescence (and hence increased solubility) may be screened and separated, for example by fluorescence-assisted cell sorting. Millions of variants can be screened in one hour. Further cycles of directed evolution may be carried out until no further improvement in solubility is observed. Furthermore, mutations unnecessary for enhanced solubility that accumulated during the directed evolution can be removed by in vitro recombination, or backcrossing, of the DNA encoding enhanced variants X of P against an excess of DNA encoding wild-type P, followed by selection, using the solubility reporter, of variants retaining enhanced solubility. Figure 2 is a schematic illustration of the generation of mutated versions of an arbitrary protein, P, having enhanced solubility, employing fluorescence-assisted cell sorting (FACS) to identify and select mutants with enhanced solubility according to the teachings of the present invention.
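The enrichment cycle and the backcross described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the patented protocol: sequences are plain strings, `toy_brightness` is a hypothetical stand-in for the fluorescence of an X-L-GFP clone (and hence for the solubility of X), random point mutation plus block-wise recombination stands in for gene shuffling, and keeping the `keep` brightest clones stands in for a FACS sort gate. All names and parameters are illustrative.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Random point mutagenesis: each base changes with probability `rate`."""
    return "".join(rng.choice(BASES.replace(b, "")) if rng.random() < rate else b for b in seq)

def shuffle_pair(a, b, rng, block=6):
    """Toy stand-in for DNA shuffling: a chimera that picks a parent for every `block` bases."""
    return "".join((a if rng.random() < 0.5 else b)[i:i + block] for i in range(0, len(a), block))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def evolve(wild_type, brightness, library_size=2000, keep=20, rate=0.01, max_rounds=10, seed=0):
    """Repeat library generation and a 'sort gate' keeping the brightest clones,
    stopping once a round gives no further improvement."""
    rng = random.Random(seed)
    pool, best = [wild_type], brightness(wild_type)
    for _ in range(max_rounds):
        library = [mutate(shuffle_pair(rng.choice(pool), rng.choice(pool), rng), rate, rng)
                   for _ in range(library_size)]
        library.sort(key=brightness, reverse=True)
        if brightness(library[0]) <= best:
            break  # no further improvement in reported solubility
        pool, best = library[:keep], brightness(library[0])
    return pool[0]

def backcross(variant, wild_type, brightness, excess=5, library_size=2000, seed=1):
    """Recombine the optimum with an excess of wild type, keep chimeras that are at
    least as bright, and return the one carrying the fewest mutations."""
    rng = random.Random(seed)
    parents = [variant] + [wild_type] * excess
    candidates = [variant] + [shuffle_pair(rng.choice(parents), rng.choice(parents), rng)
                              for _ in range(library_size)]
    target = brightness(variant)
    retained = [c for c in candidates if brightness(c) >= target]
    return min(retained, key=lambda c: hamming(c, wild_type))

# Hypothetical toy landscape, not real data: only three positions affect "solubility".
WILD_TYPE = "ATG" + "GCT" * 19  # 60-nt placeholder gene
TARGET = {7: "A", 23: "G", 41: "A"}

def toy_brightness(seq):
    return sum(seq[i] == base for i, base in TARGET.items())

evolved = evolve(WILD_TYPE, toy_brightness)
cleaned = backcross(evolved, WILD_TYPE, toy_brightness)
print(toy_brightness(evolved), hamming(evolved, WILD_TYPE), hamming(cleaned, WILD_TYPE))
```

Because only three positions matter in this toy landscape, any other mutations picked up during evolution are nonessential, and the backcross step tends to strip them while keeping the brightness, which mirrors the purpose of backcrossing against excess wild-type DNA.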
To screen large numbers of versions of an arbitrary protein, it is desirable, but not essential, that reporter R be chosen to have the following characteristics: (1) the observed parameter for R, which indicates the solubility of X-L-R and R-L-X, must not be observable independently of the solubility of X, nor merely because X is present; (2) R should not dominate the solubility of X-L-R; (3) the solubility of X-L-R and R-L-X should be determined primarily by the solubility of X; (4) R should not assist the folding of X; (5) L should not significantly influence the solubility of R-L-X or X-L-R; and (6) L should not dominate the folding of any of X, R, X-L-R, or R-L-X. Having generally described the invention, the following EXAMPLES illustrate the application of the method of the present invention in greater detail.
EXAMPLE 1
As an example of the assembly of a construct which satisfies the six criteria described above, a Bgl-II/Xho-I fragment of plasmid pET-21a(+), containing the T7 promoter, lac operator sequence, ribosomal binding site, and multiple cloning site, was ligated into the Bgl-II/Xho-I sites of pET-28a(+). The resulting hybrid plasmid contained the Kan, lacI, and f1 origin of replication of the pET-28a(+) backbone. The pET-21a(+) and pET-28a(+) vectors were used as obtained from a commercial source. The vector was digested with Nde-I and BamH-I, and the small fragment was discarded and replaced with an in-frame stuffer such that the sequence, inclusive of the Nde-I and BamH-I sites, was [CATATGTGTAGACAGCTGGGATCC]. Next, the vector was digested with BamH-I and EcoR-I and the small stuffer was discarded. The BamH-I/EcoR-I site was filled with the DNA fragment [GGATCCGCTGGCTCCGCTGCTGGTTCTGGCGAATTC], coding for the flexible linker L (GSAGSAAGSGEF). An improved variant of GFP was created from the soluble variant of Crameri et al., supra, by site-directed mutation using recombinant PCR (see, e.g., "Recombinant PCR" by Russell Higuchi in "PCR Protocols, a Guide to Methods and Applications", Michael A. Innis, David H. Gelfand, John J. Sninsky, and Thomas J. White, eds., Academic Press, Inc., 177 (1990)), to introduce the red-shift S65T mutation (see, e.g., "Improved Green Fluorescence," by Roger Heim et al., Nature 373, 663 (1995)). This mutation improves the performance of the protein in FACS by increasing the absorption by the fluorophore of 488 nm light (near the argon-laser emission line commonly used for FACS). The internal Nde-I and BamH-I sites were abolished by silent mutation. The resulting GFP variant was amplified by PCR using the 5' primer [GATATAGAATTCAGCAAAGGAGAAGAACTTTTC], incorporating a 5' EcoR-I site, and the 3' primer [GAATTCGGTACCTTATTTGTAGAGCTCTACCAη, incorporating a 5' Xho-I site.
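The reading frame through the cloning junctions can be checked by translating the stuffer and linker sequences quoted above. The sketch assumes the standard genetic code and that translation starts at the ATG inside the Nde-I site; the helper names are illustrative, and only the two sequences given in the text are used.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

# Sequences quoted in EXAMPLE 1.
STUFFER = "CATATGTGTAGACAGCTGGGATCC"                # Nde-I ... BamH-I in-frame stuffer
LINKER = "GGATCCGCTGGCTCCGCTGCTGGTTCTGGCGAATTC"     # BamH-I ... EcoR-I linker L

NDE_I, BAMH_I, ECOR_I = "CATATG", "GGATCC", "GAATTC"
assert STUFFER.startswith(NDE_I) and STUFFER.endswith(BAMH_I)
assert LINKER.startswith(BAMH_I) and LINKER.endswith(ECOR_I)

print(translate(LINKER))  # GSAGSAAGSGEF

# Start at the ATG inside Nde-I and join at the shared BamH-I site,
# which reproduces the contiguous stuffer-linker sequence of the vector.
junction = STUFFER[3:-len(BAMH_I)] + LINKER
print(translate(junction))  # MCRQLGSAGSAAGSGEF
```

The linker translates to GSAGSAAGSGEF, matching the peptide stated in the text, and the upstream stuffer continues into it without a stop codon, so a GFP amplicon entering at the EcoR-I site (Glu-Phe) stays in frame.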
The resulting vector was digested with EcoR-I/Xho-I, the stuffer was discarded and replaced with the EcoR-I/Xho-I-digested EcoR-I:GFP:Xho-I amplicon, and the resulting circular plasmid was transformed by electroporation into the commercially available E. coli strain BL21(DE3) (genotype: F⁻ ompT hsdSB(rB⁻ mB⁻) gal dcm (DE3)). The construct in the pET vector system is inducible by IPTG. A transformant was used to inoculate an LB culture, which was grown to an optical density (O.D.) at 600 nm of approx. 0.5; IPTG was then added to a final concentration of 1 mM, and induction was allowed to proceed for 2 h. Bright green fluorescence, visible under room lighting, indicated that the fusion construct was soluble and well expressed. Next, the small in-frame stuffer fragment between Nde-I and BamH-I was removed by restriction digest and replaced by an out-of-frame stuffer containing 3 translational stops. Cells expressing this fusion were non-fluorescent due to termination of translation before the GFP. Finally, the vector was digested with Nde-I + BamH-I to remove the stuffer and create a recipient site for Nde-I/BamH-I-flanked inserts. This recipient vector is subsequently referred to as the solubility-reporter vector. The specific examples described below use primers for the genes of interest which contain Nde-I (N-terminal) and BamH-I (C-terminal) sites. The use of an out-of-frame stuffer ensures that any vectors escaping digestion code for non-fluorescent constructs, and thus eliminates false positives.
The response of the reporter system prepared as described above to two proteins, one highly soluble and the other highly insoluble, each of which is efficiently overexpressed in E. coli, is demonstrated in Fig. 3. A fusion to the highly soluble protein malE, which is widely used as a fusion partner to facilitate the purification of overexpressed proteins in E. coli, [malE-L-GFP], was selected to demonstrate the response of the reporter system to a soluble protein. A fusion construct with xylR, a highly insoluble bacterial regulator protein, [xylR-L-GFP], was chosen to demonstrate the response of the reporter system to an insoluble protein. The constructs were expressed in strain BL21(DE3): clones were grown on nitrocellulose membranes on LB agar plates containing kanamycin until colonies were 1-2 mm in diameter, and the membranes bearing the colonies were then transferred to LB agar plates containing kanamycin and the inducer IPTG to cause overexpression of the fusion proteins. Fig. 3a is a photograph, under long-wavelength UV illumination, of the resulting brightly fluorescent colonies overexpressing malE-L-GFP, while Fig. 3b is a photograph of the resulting weakly fluorescent colonies overexpressing xylR-L-GFP.
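When many colonies are compared, as in Figs. 3 and 4, it may help to express each colony's fluorescence relative to the soluble (malE-L-GFP) and insoluble (xylR-L-GFP) controls imaged on the same plate. The patent does not prescribe any such calculation: the linear scaling below is an assumption, it only ranks clones rather than measuring solubility, and the intensities in the usage line are hypothetical placeholders, not measurements.

```python
def solubility_index(signal, insoluble_ref, soluble_ref):
    """Place a colony's fluorescence on a 0-1 scale between the insoluble control
    (e.g. xylR-L-GFP) and the soluble control (e.g. malE-L-GFP)."""
    if soluble_ref <= insoluble_ref:
        raise ValueError("the soluble control must be brighter than the insoluble control")
    return (signal - insoluble_ref) / (soluble_ref - insoluble_ref)

def rank_colonies(signals, insoluble_ref, soluble_ref):
    """Return (colony id, index) pairs, brightest (presumably most soluble) first."""
    scored = {cid: solubility_index(s, insoluble_ref, soluble_ref) for cid, s in signals.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical background-subtracted intensities in arbitrary units (not measured data).
ranked = rank_colonies({"A1": 120.0, "A2": 640.0, "B1": 380.0},
                       insoluble_ref=100.0, soluble_ref=900.0)
print(ranked)
```

Scaling against on-plate controls makes clones from different plates or imaging sessions roughly comparable; the choice of a linear scale is purely for illustration.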
The response of the solubility reporter system during improvement of the solubility of bullfrog H-type ferritin by directed evolution of the expressed fusion construct, [ferritin-L-GFP], is shown in Fig. 4. The 6 clones of the ninth row are, from left to right: wild type (barely visible at the extreme left); the optima (brightest, most soluble) from cycles 1, 2, 3 and 4 of directed evolution; and round 1 of backcrossing of the round 4 optima against wild-type ferritin. The upper grid of 8 rows of 6 clones (48 colonies) contains optima from a second round of backcrossing to remove non-essential mutations. With each cycle, the fluorescence (and hence solubility) improves. Figure 5 shows an SDS-PAGE gel illustrating the effectiveness of solubility reporters in a directed evolution process to improve the solubility of bullfrog H-type ferritin expressed in E. coli. Cultures expressing non-fusion constructs of ferritin alone were sonicated to lyse the cells, and the soluble and insoluble fractions were separated by centrifugation. Fractions were resolved by SDS-PAGE; here, S = soluble (supernatant) fraction and P = insoluble (pellet) fraction. Molecular weight marker ladder, M = 10 kDa. Lanes 1 and 2 are bullfrog L-type ferritin, a soluble protein used as a control; lanes 3 and 4 are insoluble wild-type bullfrog H-type ferritin; lanes 5 and 6 are the round 4 optimum variant of bullfrog H-type ferritin after 2 rounds of backcrossing against the wild type to remove spontaneous mutations not related to solubility. The improved solubility of the round 4 variant is seen by comparing lane 5 with lane 3 (wild-type H-type ferritin). The round 4 optimum (with 2 backcrossing rounds) was picked from row 3, column 2 of the plate shown in Fig. 4, and shows that the strong fluorescence from the solubility reporter is indeed related to the solubility of the fusion protein construct.
EXAMPLE 2
The above-described use of a solubility reporter can be extended analogously to determine the solubility of protein fragments. For example, to determine the solubility of fragments F of a protein P, the DNA [P] is subjected to a partial enzymatic digest (e.g., by DNase I in the presence of the divalent cations Mn²⁺ or Co²⁺) to create a pool of smaller fragments, [F]. The fragments can be polished with a proofreading polymerase bearing 3'-5' exonuclease activity to yield blunt ends, or subsequently given A-overhangs by treatment with a polymerase devoid of 3'-5' exonuclease activity (e.g., Taq polymerase) in the presence of excess dATP. If desired, a particular size range of the fragments [F] may be selected, for example by agarose gel electrophoresis. After ligation (e.g., blunt-end or T/A overhang) into a pool of an appropriate recipient solubility reporter vector (e.g., bearing a blunt-end or T/A cloning site in frame with [L-R]), some of the fragments [F] will form in-frame translational fusions, [F-L-R]. After transformation into an appropriate host (e.g., E. coli), expressed fusion proteins F-L-R which contain a soluble fragment F will be soluble and detectable in the host by virtue of R (e.g., if R is GFP, the host cells will be fluorescent). Thus, the above-described solubility reporter method may be used to determine the solubility of a protein, its variants (mutants), and fragments thereof.
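Why only some fragments form useful in-frame fusions can be made concrete with a short simulation. This sketch assumes blunt-end cloning into a recipient site cut exactly at a codon boundary, and counts only fusions that read the fragment in P's own reading frame; the uniform random fragment model and the placeholder gene are assumptions, not a model of DNase I kinetics.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}

def native_in_frame(gene, start, length, forward):
    """True if the blunt-ended fragment gene[start:start+length] would be translated in
    P's own reading frame and leave [L-R] in frame downstream.

    Assumes codon 1 of P starts at gene[0] and the recipient site is cut at a codon
    boundary. Reverse-oriented or frame-shifted inserts may still fuse in frame,
    but they encode peptides that are not fragments of P.
    """
    if not forward or start % 3 or length % 3:
        return False
    fragment = gene[start:start + length]
    return all(fragment[i:i + 3] not in STOP_CODONS for i in range(0, length, 3))

def simulate_digest(gene, n, min_len, max_len, seed=0):
    """Toy partial digest: random fragment positions, lengths and ligation orientations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(gene) - length)
        hits += native_in_frame(gene, start, length, forward=rng.random() < 0.5)
    return hits / n

GENE = "ATG" + "GCT" * 200  # hypothetical stop-free placeholder ORF
fraction = simulate_digest(GENE, n=18000, min_len=60, max_len=300)
print(f"{fraction:.3f} of fragments give native in-frame fusions")
```

Under these assumptions roughly one fragment in eighteen (orientation 1/2, start frame 1/3, length frame 1/3) yields a native-frame fusion, which is one reason to screen large fragment libraries with a rapid reporter.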
EXAMPLE 3
EXAMPLE 1 has shown that GFP can be used as a solubility reporter. However, solubility reporters incorporating a translational fusion [P-L-R] include systems in which R is a protein/peptide other than GFP. When the fusion construct [P-L-R] is used, R can be any protein/peptide which gives a detectable signal, observable by chemical, biological or physical means, when linked to P-L as P-L-R. As an example, R could be the β-galactosidase enzyme, lacZ. Clones expressing P-L-lacZ in which P is a soluble protein are detected by the enzymatic activity of lacZ (see, e.g., "Beta-Galactosidase Gene Fusions For Analyzing Gene Expression In Escherichia Coli And Yeast," by M. Casadaban et al., Methods Enzymol. 100, 293 (1983)) on substrates which yield a colored reaction product (for example, X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside)). Colonies expressing fusion proteins with β-galactosidase activity turn blue on plates containing X-gal. Furthermore, in situations where the lacZ protein proves too large, the functionally complementable lacZα fragment is used as a substitute. The complementary fragment Δ-lacZ is provided by the host chromosome (for example, in E. coli strain DH10B (F⁻ mcrA Δ(mrr-hsdRMS-mcrBC) φ80dlacZΔM15 ΔlacX74 deoR recA1 endA1 araD139 Δ(ara, leu)7697 galU galK λ⁻ rpsL nupG), the complementary fragment is provided by φ80dlacZΔM15). Fusion proteins P-L-lacZα containing a soluble protein P are soluble and contain a correctly folded lacZα, thereby leading to complementation of the Δ-lacZ fragment and restoration of lacZ β-galactosidase activity.
EXAMPLE 4
Reporter proteins R which have optimal activity in a non-fusion context may also be employed for assays. The construct P-L-C-R is generated, where C is a unique protease site. For example, C could be the cleavage site for the plum pox virus NIa protease (see, e.g., M. Martin et al., "Determination of polyprotein processing sites by amino terminal sequencing of nonstructural proteins encoded by plum pox polyvirus", Virus Res. 15, 97 (1990)), and R could be the lacZα fragment. The construct P-L-C-lacZα and the viral protease (NIa) could each be expressed under the control of separately inducible promoters on separate plasmids with compatible origins of replication. For an example of the use of multiple compatible plasmids with cloning sites under independently controlled promoters, see R. Lutz and H. Bujard, "Independent and tight regulation of transcriptional units in E. coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements", Nucleic Acids Res. 25(6), 1203 (1997). The plasmids and required E. coli host strains are commercially available; for example, the P-L-C-lacZα construct could be expressed under the control of the tet promoter, and the NIa gene under the control of the arabinose promoter/repressor. The plasmid(s) would be transformed into the appropriate E. coli host (see Lutz, supra), and anhydrotetracycline added to the growth medium to induce expression of P-L-C-lacZα. After the fusion protein P-L-C-lacZα has accumulated, arabinose+IPTG is added to the growth medium to induce expression of the NIa protease. Only if P is soluble will P-L-C-lacZα be soluble, contain a correctly folded lacZα domain, and be cleaved at site C. The released lacZα then complements the Δ-lacZ fragment and restores lacZ β-galactosidase activity, which is detected by standard colorimetric or fluorometric assays for β-galactosidase activity.
As another example, R might be an antibiotic selection marker such as the β-lactamase gene (bla), which confers resistance to the penicillin-derived antibiotics commonly used with cloning vectors. β-Lactamase carries a signal peptide and is translocated to the periplasm of E. coli. N-terminal fusions would impede proper processing of this antibiotic-resistance protein and its translocation to the periplasm, but cleavage by the protease obviates this problem. The P-L-C-β-lactamase fusion protein would be soluble only if P were soluble. Concomitant induction by both anhydrotetracycline and IPTG+arabinose would provide both the fusion protein P-L-C-β-lactamase and the viral protease NIa. In cells bearing soluble variants of P, the fusion protein P-L-C-β-lactamase would be soluble and cleaved at C by the NIa protease, releasing functional β-lactamase and thereby conferring resistance to the antibiotic ampicillin. Conversely, in cells bearing insoluble variants of P, the fusion protein would be insoluble, and the protease cleavage site C would be buried in inclusion bodies and thereby inaccessible to cleavage by the viral protease. Furthermore, the β-lactamase protein would be buried in inclusion bodies, misfolded and non-functional. Such cells would not be resistant to ampicillin. It would be apparent to those having skill in the biochemical arts that cells bearing soluble variants of P (and therefore having antibiotic resistance) could be selected by challenging mixtures of the above-mentioned cells with the selective agent (e.g., the antibiotic ampicillin) supplied in the growth medium.
Moreover, it is likewise apparent to one having skill in the art that both the fusion protein P-L-C-β-lactamase and the protease NIa must be made continuously available to confer antibiotic resistance throughout the life of the cell, and thus both genes must be induced simultaneously (in this example, by providing both anhydrotetracycline and IPTG/arabinose in the growth medium). Cells with antibiotic resistance will survive, thereby selecting for soluble variants of P. Furthermore, additional improvement in the solubility of such variants could be accomplished by increasing the concentration of the selective agent (e.g., ampicillin) during subsequent rounds of recombination and selection.
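The idea of raising the ampicillin concentration over successive rounds can be expressed as a simple threshold model. It is an assumption-laden sketch, not an experimental protocol: resistance is taken to be proportional to the soluble fraction of the P-L-C-β-lactamase fusion, survival is all-or-nothing, recombination between rounds is omitted, and the variant names, fractions and ampicillin levels are hypothetical.

```python
def survivors(population, resistance, ampicillin):
    """Threshold model: a cell survives only if the resistance conferred by its released,
    correctly folded beta-lactamase is at least the ampicillin level."""
    return [cell for cell in population if resistance(cell) >= ampicillin]

def escalating_selection(population, resistance, schedule):
    """Apply increasing ampicillin levels round by round; return the final survivors
    and a history of (level, number surviving). Recombination between rounds is omitted."""
    history = []
    for level in schedule:
        population = survivors(population, resistance, level)
        history.append((level, len(population)))
        if not population:
            break
    return population, history

# Hypothetical variants with made-up soluble fractions of the fusion protein.
SOLUBLE_FRACTION = {"wt": 0.1, "v1": 0.4, "v2": 0.7, "v3": 0.9}

def toy_resistance(variant):
    return 100 * SOLUBLE_FRACTION[variant]  # arbitrary units, assumed proportional

final, history = escalating_selection(list(SOLUBLE_FRACTION), toy_resistance, [25, 50, 75])
print(final, history)
```

Each increase in stringency removes the least soluble survivors of the previous round; in a real experiment the survivors would be recombined before the next, more stringent challenge.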
The foregoing description of the invention has been presented for purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. For example, it would be apparent to one having skill in biochemistry after reviewing the present disclosure that the method of the present invention can be implemented in insect, yeast and mammalian cells, wherein fusion proteins P-L-GFP are expressed to create a solubility reporter. Similarly, directed evolution for improving the solubility of proteins can be performed using insect cells, and the required DNA manipulation according to the teachings of the present invention can be achieved in vitro or in vivo.
The embodiments were chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto.

Claims

WHAT IS CLAIMED IS:
1. A method for determining the solubility of a protein, P, which comprises the steps of:
(a) fusing a DNA fragment, [P], which codes for said protein with the DNA [R] which codes for a reporter protein, R, which can be detected, forming thereby a fusion DNA fragment, [P-R], which codes for the fusion protein, P-R, such that the solubility of P-R is determined by the solubility of protein, P; and
(b) introducing said DNA into an expression system such that fusion protein P-R is overexpressed therein; whereby if fusion protein P-R is soluble in said expression system, reporter protein R can be detected, thereby indicating that protein P is soluble.
2. The method for determining the solubility of a protein as described in claim 1, wherein DNA fragment [P] is fused with the DNA fragment which codes for a flexible linker peptide, [L], which has been fused with DNA fragment [R], forming thereby a fusion DNA fragment selected from the group consisting of [P-L-R] and [R-L-P], such that the solubility of fusion proteins expressed by said [P-L-R] and said [R-L-P] is determined by the solubility of said protein P.
3. The method for determining the solubility of a protein as described in claim 2, wherein linker peptide, L is chosen to be short, flexible, hydrophilic and soluble.
4. The method for determining the solubility of a protein as described in claim 1, wherein reporter protein R is selected from the group consisting of green fluorescent protein and variants thereof, lacZ, the lacZ-α fragment, and selectable marker proteins.
5. The method for determining the solubility of a protein as described in claim 4, wherein said lacZ and lacZ-α fragments include enzymes having chromogenic and fluorogenic substrates.
6. The method for determining the solubility of a protein as described in claim 4, wherein said selectable marker proteins are selected from the group consisting of ampicillin resistance proteins, tetracycline resistance proteins, kanamycin resistance proteins and arsenic resistance proteins.
7. The method for determining the solubility of a protein as described in claim 1, wherein said protein is a fragment of a larger protein and the DNA which codes for said fragment is a fragment of the DNA which codes for said larger protein.
8. The method for determining the solubility of a protein as described in claim 7, wherein said DNA fragments which encode protein fragments of a larger protein are generated using methods from the group consisting of partial DNASE digest, radiation-induced fragmentation, chemical fragmentation, enzymatic digest, endonuclease digest, exonuclease digest, acoustic/mechanical shearing, and fragmentation.
9. The method for determining the solubility of a protein as described in claim 8, wherein said DNA fragments are size selected before said step of fusing said DNA fragment with the DNA [R] which codes for a reporter protein, R, using methods selected from the group consisting of polyacrylamide gel electrophoresis, agarose gel electrophoresis, capillary electrophoresis, and high pressure liquid chromatography.
10. A method for modifying the solubility of a protein, P, which comprises the steps of:
(a) introducing mutations into [P], the DNA fragment which codes for said protein, generating thereby a combinatorial library of mutated variants, [X];
(b) in-frame fusing individual [X] variants, with a DNA construct which contains [R] which codes for a reporter protein R which can be detected in solution, forming thereby a set of DNA constructs containing [X-R], which code for the fusion proteins, X-R, such that the solubility of each of said X-R proteins is determined by the solubility of variant protein, X contained therein; and
(c) introducing each of said DNA constructs into an expression host such that fusion proteins X-R are overexpressed therein; whereby if one of said fusion proteins X-R is soluble in said host therefor, said reporter protein R can be detected, thereby indicating that the mutated variant of said protein P is soluble.
11. The method for modifying the solubility of a protein P as described in claim 10, wherein DNA fragment [X] is fused with the DNA fragment which codes for a flexible linker peptide, [L], which has been fused with said DNA fragment [R], forming thereby a fusion DNA fragment selected from the group consisting of [X-L-R] and [R-L-X], such that the solubility of said fusion proteins expressed by said [X-L-R] and said [R-L-X] is determined by the solubility of protein X.
12. The method for modifying the solubility of a protein as described in claim 11, wherein linker peptide, L is chosen to be short, flexible, hydrophilic and soluble.
13. The method for modifying the solubility of a protein as described in claim 10, further comprising the step of collecting said expression hosts expressing X, a more soluble form of protein P than the form of protein P expressed by the wild-type DNA.
14. The method for modifying the solubility of a protein as described in claim 13, wherein said expression hosts containing a soluble form X of protein P are separated by fluorescence assisted cell sorting from said expression hosts which contain an insoluble form X of protein P, before said step of collecting said expression hosts expressing X.
15. The method for modifying the solubility of a protein as described in claim 13, wherein said expression hosts containing a soluble form X of protein P are separated from said expression hosts which contain an insoluble form X of protein P using nutrient agar plates, before said step of collecting said expression hosts expressing X.
16. The method for modifying the solubility of a protein as described in claim 10, wherein reporter protein R is selected from the group consisting of green fluorescent protein and variants thereof, lacZ, the lacZ-α fragment, and selectable marker proteins.
17. The method for modifying the solubility of a protein as described in claim 16, wherein said lacZ and said lacZ-α fragments include enzymes having chromogenic and fluorogenic substrates.
18. The method for modifying the solubility of a protein as described in claim 16, wherein said selectable marker proteins are selected from the group consisting of ampicillin resistance proteins, tetracycline resistance proteins, kanamycin resistance proteins and arsenic resistance proteins.
19. The method for modifying the solubility of a protein as described in claim 10, wherein said step of introducing mutations into [P], thereby generating a combinatorial library of mutated variants [X], includes methods selected from the group consisting of recombination, error-prone PCR, propagation in error-prone host strains, doping mutagenesis, saturation mutagenesis, chemical mutagenesis, irradiation mutagenesis, site-directed mutation, and combinations thereof.
20. The method for modifying the solubility of a protein as described in claim 13, further comprising the step of recombining the DNA encoding [X] from each of said collected expression hosts expressing a soluble form of said protein P, thereby yielding a pool of variant DNA fragments [X] encoding mutants X of said protein P with further enhanced solubility.
21. The method for modifying the solubility of a protein as described in claim 20, wherein said step of recombining the DNA encoding variants [X] with enhanced solubility is accomplished using recombination.
22. The method for modifying the solubility of a protein as described in claim 21, wherein the recombination is achieved in vitro by gene shuffling.
23. The method for modifying the solubility of a protein as described in claim 21, wherein the recombination is achieved in vivo by cell-mediated recombination.
24. The method for modifying the solubility of a protein as described in claim 10, wherein mutations which do not improve solubility are removed from the DNA encoding protein X by recombination of the DNA encoding protein X with wild type DNA fragments, followed by selection for the most soluble variants.
25. The method for modifying the solubility of a protein as described in claim 10, wherein said protein is a fragment of a larger protein and the DNA which codes for said fragment is a fragment of the DNA which codes for said larger protein.
26. The method for modifying the solubility of a protein as described in claim 25, wherein said DNA fragments which encode protein fragments of a larger protein are generated using methods from the group consisting of partial DNASE digest, radiation-induced fragmentation, chemical fragmentation, enzymatic digest, endonuclease digest, exonuclease digest, acoustic/mechanical shearing, and fragmentation.
27. The method for modifying the solubility of a protein as described in claim 26, wherein said DNA fragments are size selected before said step of fusing said DNA fragment with the DNA [R] which codes for a reporter protein, R, using methods selected from the group consisting of polyacrylamide gel electrophoresis, agarose gel electrophoresis, capillary electrophoresis, and high pressure liquid chromatography.
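The claimed workflow (fusing [P] or its variants [X] in frame with a linker and reporter, mutagenesis, sorting on the reporter signal, and recombining the winners) can be sketched as a toy Python simulation. Everything here is a hypothetical assumption rather than the disclosed protocol: `toy_fluorescence` stands in for detecting reporter R in an X-L-R fusion and simply scores the hydrophilic fraction of X, `dna_shuffle` splices aligned parents at random crossover points as a simplified stand-in for DNA shuffling, and the sequences, linker, gate fraction and mutation rate are placeholders. Fragmentation and size selection (claims 7-9) are not modeled.

```python
import random

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ... GGG).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
HYDROPHOBIC = set("AVILMFWC")


def translate(dna):
    """Translate whole codons; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def fuse_in_frame(p_dna, linker_dna, r_dna):
    """Build [P-L-R] (claims 1-2): drop P's stop codon and keep R in frame."""
    if len(p_dna) >= 3 and translate(p_dna[-3:]) == "*":
        p_dna = p_dna[:-3]
    if len(p_dna) % 3 or len(linker_dna) % 3:
        raise ValueError("P and L must be whole codons to keep R in frame")
    fusion = p_dna + linker_dna + r_dna
    if "*" in translate(fusion)[:-1]:
        raise ValueError("internal stop codon would truncate the fusion")
    return fusion


def error_prone_pcr(dna, rate, rng):
    """Toy mutagenesis (claim 19): substitute each base with probability `rate`."""
    return "".join(rng.choice([b for b in "ACGT" if b != base])
                   if rng.random() < rate else base
                   for base in dna)


def dna_shuffle(parents, rng, crossovers=2):
    """Simplified recombination (claims 20-22): splice aligned, equal-length
    parents at random crossover points."""
    length = len(parents[0])
    cuts = sorted(rng.sample(range(1, length), crossovers))
    pieces, start = [], 0
    for end in cuts + [length]:
        pieces.append(rng.choice(parents)[start:end])
        start = end
    return "".join(pieces)


def toy_fluorescence(x_dna):
    """Hypothetical reporter readout for X-L-R: a premature stop gives no
    signal; otherwise the signal rises as the hydrophobic fraction of X falls."""
    protein = translate(x_dna)
    if "*" in protein:
        return 0.0
    return 1.0 - sum(aa in HYDROPHOBIC for aa in protein) / len(protein)


def sort_top(library, fraction):
    """FACS-style gate (claim 14): keep the brightest `fraction` of cells."""
    ranked = sorted(library, key=toy_fluorescence, reverse=True)
    return ranked[:max(1, int(len(ranked) * fraction))]


def evolve(wild_type, rounds=3, size=200, rate=0.01, seed=1):
    rng = random.Random(seed)
    pool = [wild_type]
    for _ in range(rounds):
        library = [error_prone_pcr(dna_shuffle(pool, rng) if len(pool) > 1 else pool[0],
                                   rate, rng)
                   for _ in range(size)]
        pool = sort_top(library + pool, 0.05)  # parents kept, so the best never regresses
    return pool[0]


WILD_TYPE_P = "ATGCTGATCGTTTTCGCGCTGTGGATCAAAGAA"  # hypothetical ORF: MLIVFALWIKE
LINKER = "GGTGGCGGTTCT"                          # hypothetical Gly-Gly-Gly-Ser linker
REPORTER = "ATGAGCAAAGGCTAA"                     # placeholder [R]: first GFP codons + stop

best = evolve(WILD_TYPE_P)
construct = fuse_in_frame(best, LINKER, REPORTER)
print(translate(WILD_TYPE_P), round(toy_fluorescence(WILD_TYPE_P), 2))
print(translate(best), round(toy_fluorescence(best), 2))
print(translate(construct))
```

Keeping the parents in each sorted pool (an elitist strategy) is a choice made for this toy so the best score never regresses; the claims do not prescribe it. A backcross against wild type in the spirit of claim 24 could be approximated by calling `dna_shuffle([variant, WILD_TYPE_P], rng)` before sorting.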

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU16291/99A AU1629199A (en) 1997-12-12 1998-12-04 Method for determining and modifying protein/peptide solubility

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US98938097A 1997-12-12 1997-12-12
US08/989,380 1997-12-12

Publications (1)

Publication Number Publication Date
WO1999031266A1 (en) 1999-06-24

Family

ID=25535069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/025862 WO1999031266A1 (en) 1997-12-12 1998-12-04 Method for determining and modifying protein/peptide solubility

Country Status (2)

Country Link
AU (1) AU1629199A (en)
WO (1) WO1999031266A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5491084A (en) * 1993-09-10 1996-02-13 The Trustees Of Columbia University In The City Of New York Uses of green-fluorescent protein

Non-Patent Citations (3)

Title
"COMBINATORIAL MULTIPLE CASSETTE MUTAGENESIS CREATES ALL THE PERMUTATIONS OF MUTANT AND WILD-TYPE SEQUENCES", BIOTECHNIQUES, INFORMA HEALTHCARE, US, vol. 18, no. 2, 1 January 1995, pages 194-196, XP002917681, ISSN: 0736-6205 *
STEMMER W P C: "RAPID EVOLUTION OF A PROTEIN IN VITRO BY DNA SHUFFLING", NATURE, NATURE PUBLISHING GROUP, UNITED KINGDOM, vol. 370, 4 August 1994, pages 389-391, XP002917680, ISSN: 0028-0836, DOI: 10.1038/370389a0 *
WU C, ET AL.: "NOVEL GREEN FLUORESCENT PROTEIN (GFP) BACULOVIRUS EXPRESSION VECTORS", GENE, ELSEVIER, AMSTERDAM, NL, vol. 190, 29 April 1997, pages 157-162, XP002917679, ISSN: 0378-1119, DOI: 10.1016/S0378-1119(96)00538-0 *

Cited By (4)

Publication number Priority date Publication date Assignee Title
WO2001023602A1 (en) * 1999-09-30 2001-04-05 The Regents Of The University Of California Method for determining and modifying protein/peptide solubility
WO2001029225A1 (en) * 1999-10-21 2001-04-26 Panorama Research, Inc. A general method for optimizing the expression of heterologous proteins
WO2002083734A2 (en) * 2001-04-17 2002-10-24 Isis Innovation Ltd Modified calcitonin
WO2002083734A3 (en) * 2001-04-17 2003-05-01 Isis Innovation Modified calcitonin

Also Published As

Publication number Publication date
AU1629199A (en) 1999-07-05

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase