WO1994025609A1 - Method for creating optimized regulatory regions affecting protein expression and protein trafficking - Google Patents

Method for creating optimized regulatory regions affecting protein expression and protein trafficking Download PDF

Info

Publication number
WO1994025609A1
WO1994025609A1 PCT/US1994/004651 US9404651W WO9425609A1 WO 1994025609 A1 WO1994025609 A1 WO 1994025609A1 US 9404651 W US9404651 W US 9404651W WO 9425609 A1 WO9425609 A1 WO 9425609A1
Authority
WO
WIPO (PCT)
Prior art keywords
polypeptide
nucleic acid
vector
protein
expression
Prior art date
Application number
PCT/US1994/004651
Other languages
French (fr)
Inventor
Douglas Evan Antelman
Barry S. Wilson
Original Assignee
Hybritech Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hybritech Incorporated filed Critical Hybritech Incorporated
Priority to AU67761/94A priority Critical patent/AU6776194A/en
Publication of WO1994025609A1 publication Critical patent/WO1994025609A1/en

Links

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/11DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62DNA sequences coding for fusion proteins
    • C12N15/625DNA sequences coding for fusion proteins containing a sequence coding for a signal sequence
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K16/00Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/67General methods for enhancing the expression
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K2319/00Fusion polypeptide
    • C07K2319/01Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/02Fusion polypeptide containing a localisation/targetting motif containing a signal sequence
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K2319/00Fusion polypeptide
    • C07K2319/61Fusion polypeptide containing an enzyme fusion for detection (lacZ, luciferase)

Definitions

  • This invention relates generally to methods for the mutagenesis of nucleic acid sequences and more specifically to mutagenesis strategies involving polymerase chain reaction-related technologies and to optimized regulatory sequences generated by these methods.
  • Gene expression can be regulated at various steps on the path from DNA to RNA to mature protein.
  • the overall path is the same for pro aryotes and eukaryotes with regulation possible at the level of transcription of RNA from the DNA template, translation of mRNA at the ribosome, and targeting of protein to appropriate locations within the cell or outside.
  • prokaryotes and eukaryotes lead to a number of differences in how gene expression can be regulated in each. Intracellular membranes and organelles are critical components of eukaryotic cells, but are not found in prokaryotes.
  • the potential for coupled transcription and translation in prokaryotes is not possible in compartmentalized eukaryotic cells.
  • Eukaryotic genes have introns interrupting the coding sequences, prokaryotic genes do not.
  • prokaryotes put coding sequences for multiple genes under a single promoter and transcribe them as a single polycistronic mRNA, but eukaryotes do not.
  • Most eukaryotic genes require RNA splicing, capping, poly-adenylation, and transport from the nucleus to the cytoplasm before the mRNA becomes functional, but prokaryotes do not.
  • eukaryotes require a more complicated process for protein targeting to distinguish proteins destined for different organelles, the cell membrane, and for secretion, as well as post translational modification of many eukaryotic proteins before they can be assembled or become functional.
  • EUKARYOTES the expression of a particular gene can be regulated at transcription, RNA processing, RNA transport, translation, protein targeting, and protein activation/modification. Specific gene sequences are responsible for effects at each level and these can be modified to increase expression according to this invention, if eukaryotic expression is desired.
  • these sequences act directly, by interacting with DNA binding proteins.
  • Examples include: Promoters which interact with transcription factors and RNA polymerase to initiate transcription. Upstream promoter elements which are generally near to the promoter and interact with DNA binding proteins to either improve or inhibit the ability of polymerase to interact with the promoter.
  • the transcription of the genes encoding the DNA binding proteins described above and below is another level of regulation of specific eukaryotic genes. Enhancers, which are quite distant from the promoter, bind to other DNA binding proteins to enhance the level of transcription of the gene, although specific sequences are also present in enhancers which can inhibit transcription under certain conditions or in specific cell types.
  • enhancers, promoters and upstream promoter elements can be modified to increase expression according to this invention.
  • primary transcripts are capped at their 5' ends, poly-adenylated at their 3' ends and have their introns spliced out (sometimes leading to different protein products due to alternative RNA splicing) prior to being transported through pores in the nuclear membrane to cytoplasm, where they associate with ribosomes.
  • Important RNA sequences which can affect the efficiency of this process include the poly-A addition site, the Cap site, and the splice acceptor and donor sites, and DNA sequences encoding these sites can be modified to increase expression according to this invention.
  • a eukaryotic mRNA To participate in translation, a eukaryotic mRNA must be transported out of the nucleus. Specific recognition signals on the mRNA are believed necessary for transit of mRNA through the nuclear pore. Other signals may retain certain mRNAs within the nucleus, where they are degraded. As many as half of the unspliced heterologous nuclear RNAs produced from primary transcripts are degraded without ever leaving the nucleus, offering a potential for regulation at another level, the level of RNA transport.
  • DNA sequences encoding these RNA signals could be modified to increase expression according to this invention as well.
  • Eukaryotic mRNAs vary in susceptibility to degradation. Binding to ribosomes decreases their degradation. Important sequences that can be modified to control expression at the level of translation are those recognized by specific translation repressor proteins, which bind to the 5 1 end of some mRNAs and block translation initiation. For those mRNAs which are not blocked, current data suggests that initiation of translation is by the so-called "scanning model" of initiation.
  • a ribosome component (4OS) binds initially to the 5• end of an mRNA then migrates 3• until it finds an initiation site where the other portion of the ribosome joins it to form the 80S initiation complex.
  • the Kozak sequences surrounding the translational start site determine whether a particular mRNA will be efficiently translated.
  • the presence of an Adenine (A) at the -3 position has as much as a twenty-fold impact on translation.
  • the translation rate is most affected by the nucleotide three prior to the initiator codon and the one immediately after it [best if a Guanine (G) ] .
  • the DNA sequence encoding binding sites for translation represser proteins and for the region around the initiator can be modified to increase expression according to this invention.
  • eukaryotic proteins have a diversity of target locations within the cell including the nucleus, nuclear membrane, cytoplasm, mitochondria, mitochondrial inner membrane, mitochondrial outer membrane, endoplasmic reticulum, golgi apparatus, lysosomes, lysosomal membranes, endoplasmic reticulum membrane, and cell membrane.
  • secreted proteins are targeted to the exterior of the cell. The process begins, as for prokaryotes, with a signal sequence directing the nascent peptide and its ribosomes to a membrane. But in this case, the membrane is the rough endoplasmic reticulum (RER) .
  • RER rough endoplasmic reticulum
  • the protein If the protein is to be secreted, it is translated into the RER and begins a journey through different membrane bound organelles and vesicles to the cell membrane where it is extruded to the outside by exocytosis. Proteins destined to be membrane- bound are also translated into the RER, but do not completely transit the rough endoplasmic reticulum membrane (RERM) , becoming integrated into it. This membrane fuses with that of the Golgi apparatus or of vesicles which, in turn, bud off and fuse with the destination membrane where the protein either becomes bound or transits further to its final target. Targeting to the proper site is almost certainly facilitated by identifiable target signals within the sequence of the protein that direct the protein to the proper destination. Thus, the sequences that can be modified to increase expression according to this invention include those encoding the signal sequence, transmembrane sequences, cytoplasmic anchor sequences, and the putative specific membrane targeting sequences.
  • immunoglobulins consist of multiple chains that must be assembled prior to secretion.
  • the heavy chains cannot be secreted if they are not bound to light chains.
  • additional regulatory elements that can be modified according to the method of this invention to increase expression are those which encode structural features that facilitate or impede correct folding of a peptide or its association with other chains necessary to form the mature protein product.
  • eukaryotic proteins are synthesized as inactive or partially active precursors that become activated by proteolytic cleavage or other modifications (e.g., phosphorylation) .
  • proteolytic cleavage or other modifications e.g., phosphorylation
  • secretion cannot occur without proteolytic cleavage of the signal peptide. Inefficiency in this process can delay or block secretion of functional protein.
  • DNA sequences encoding proteolytic cleavage sites, phosphorylation sites, glycosylation sites, or other sites for post-translational modification can be modified according to this invention to yield increased expression of functional protein.
  • PROKARYOTES there are fewer stages where regulation of gene expression can occur, and prokaryote specific elements and associated proteins are involved. Regulation does take place at the levels of transcription, translation, and protein targeting, as for eukaryotic genes, but RNA processing and RNA transport are totally absent in prokaryotes. There are also far fewer known cases of post-translational modification of proteins in bacteria than in eukaryotes. Of special importance, there are very few authenticated cases of glycosylated bacterial proteins, although proteins are modified in other ways (e.g., fatty acylation). No clear picture of regulation at the level of post-translational modification has emerged. Like eukaryotes, prokaryotes offer a number of specific DNA sequences which can be modified according to this invention to increase expression.
  • RNA polymerase binds RNA polymerase and, with the release of the ⁇ subunit, initiation of transcription occurs. Unlike in eukaryotes, distant enhancer sequences have not been seen in prokaryotes. Different promoter sequences have different affinities for RNA polymerase. Transcription initiation can be positively regulated by binding of a catabolite gene activator protein (CAP) to a site immediately upstream of the promoter and, in many genes, can be negatively regulated by binding of a repressor protein to a site immediately downstream of the promoter called the operator. The level of transcription of these DNA binding proteins can affect the transcriptional level of the regulated gene. As one skilled in the art will appreciate, binding sites can be modified both in the target gene and in the gene encoding the repressor or CAP using the present invention to increase or decrease the level of transcription of prokaryotic genes.
  • CAP catabolite gene activator protein
  • Sites amenable to modification to increase expression at the level of translation also exist in prokaryotic DNA, but are distinct from those in eukaryotes.
  • a region originally known as the Shine and Dalgarno sequence or more recently as the ribosome binding site found on the 5' end of prokaryotic mRNAs is complementary to the 16S ribosomal RNA. Similar to eukaryotic initiators, an Adenine (A) is preferred at the -3 position.
  • A Adenine
  • the Shine and Dalgarno interaction is unique to prokaryotes and is unlike the scanning model used in eukaryotes to initiate translation described above.
  • the DNA sequence which encodes the ribosome binding site in mRNA can be modified according to this invention to increase expression.
  • ribosomal proteins proteins which bind to sites on specific mRNAs to inhibit their translation
  • transcription can be affected by 5' coding sequences which can lead to premature termination of transcription based on charged tRNA availability.
  • This system called attenuation, is seen in genes for amino acid biosynthetic pathways, but is unique to prokaryotes because it requires coupled transcription and translation, which can occur only in cells having no nuclear membrane separating the two processes. Placement of the gene within the operon also has an effect on translation of specific genes.
  • the 5* genes are more efficient in initiation than those at the 3 ' end of the polycistronic message.
  • the "polarity effect" is amplified by the fact that binding to ribosomes actually delays mRNA degradation.
  • the DNA sequences which encode protein sequences responsible for attenuation can be modified according to this invention to increase expression.
  • specific targeting sequences are required for different types of targeting. These sequences encode portions of the protein with hydrophobic or hydrophilic properties corresponding to the protein's point of association with the membrane.
  • the most common sequence is the N-terminal signal sequence, which is responsible for targeting the protein and its associated ribosomes to the cell membrane.
  • Internal sequences determine if portions of the protein will reside within or outside of the lipid bilayer of the membrane or if the protein will be secreted into the periplas ic space (in gram negative) or directly into the media (in gram positive bacteria) .
  • the efficiency of secretion depends upon the signal sequence as well as the cellular machinery required for transport. This includes peptidases for removing the signal peptide and bacterial cell membrane proteins, both of which have been implicated as potential signal peptide receptors.
  • a Signal Recognition Particle (SRP) analogue has recently been confirmed for bacteria (Luirink J., et al. Nature, 359:741-743, 1992.
  • SRP Signal Recognition Particle
  • the absence of functional prokaryotic ribosome binding sequences often leads to little or no translation of the eukaryotic message.
  • the absence of prokaryotic codon usage can slow translation.
  • the DNA sequence encoding an mRNA for a eukaryotic signal could be completely misinterpreted in a prokaryotic expression system. For example, it could be recognized by a prokaryotic DNA binding protein, or the mRNA could experience inhibitory folding or be more rapidly degraded than in the native cell. Because such miscommunication depends in part upon the sequence of the gene to be expressed, it is possible that different genes inserted into the same vector in the same heterologous expression cell will attain different levels of expression. For example, a leader sequence optimized to attain the highest level of expression of one eukaryotic protein gene in E. coli may not be the leader sequence that will optimize expression of a different eukaryotic protein gene in E. coli .
  • the present invention overcomes these difficulties by providing a means to customize heterologous expression for any particular desired protein.
  • the present method assays the combined effect of mutations made in the DNA of one or more particular control sequences upon all steps taken during the whole process of heterologous expression leading up to secretion of a particular desired protein. Therefore, the present method provides a means to customize modifications to discrete sequences in eukaryotic DNA which impact expression of the desired protein in the heterologous host cell.
  • This invention relates to methods for optimizing the expression of polypeptide in a cell and to methods for creating and isolating novel regulatory sequences for the expression of polypeptide in a cell.
  • the regulatory sequences contemplated within the scope of this invention include, but are not limited to, signal sequences, ribosome binding sites, promoter sequences, translational regulatory sequences, transcription regulatory sequences, protein trafficking sequences, enhancer sequences, and the like.
  • a method for optimizing the production of polypeptide in a cell comprising the steps of (a) identifying at least one regulatory region within a nucleic acid sequence to be mutagenized; (b) preparing a nucleic acid vector comprising the regulatory region and a nucleic acid sequence encoding at least one polypeptide regulated by the regulatory region; (c) deleting the regulatory region from the vector; (d) producing a pool of random oligonucleotides; (e) using a polymerase chain reaction to introduce at least one random oligonucleotide into the position previously occupied by the regulatory region in a plurality of vectors to generate a pool of mutagenized vectors; (f) introducing the mutagenized vectors into a cell sample; (g) assaying for the expression of the polypeptide in the cell sample; (h) selecting cells exhibiting optimized polypeptide expression; and (i) isolating optimized polypeptide from the cells of step (h) .
  • the regulatory region is located within the nucleic acid sequence encoding polypeptide and in one preferred embodiment the regulatory region is in a translated portion of the nucleic acid sequence encoding polypeptide. In another preferred embodiment, the regulatory region is outside of the nucleic acid sequence encoding polypeptide.
  • the method is useful for optimizing regulatory regions selected from the group consisting of signal sequences, ribosome binding sites, promoter sequences, translational regulatory sequences, transcription regulatory sequences and protein trafficking sequences.
  • the nucleic acid sequence encodes an antibiotic resistant gene and the selection step consists essentially of growing the cell sample in the presence of an antibiotic.
  • the nucleic acid sequence encoding polypeptide encodes a selectable marker and in yet another aspect of this embodiment, the nucleic acid sequence encoding polypeptide encodes a fusion protein.
  • the nucleic acid sequence encoding polypeptide is derived from a eukaryotic cell and the cell sample is prokaryotic.
  • the method additionally comprises the steps of introducing the nucleic acid vector into a second cell sample, expressing the polypeptide encoded by the vector in the second cell sample, and measuring the level of polypeptide expression in the second cell sample.
  • This method optionally includes the step of selecting cells from the first cell sample that exhibit optimized polypeptide expression relative to the measuring step. Once optimized cells are selected, it is contemplated that the nucleic acid sequence of the random oligonucleotide corresponding to the mutagenized regulatory region producing optimized polypeptide expression can be identified.
  • a method for creating and isolating novel signal sequences comprising the steps of: (a) identifying a signal sequence within a nucleic acid sequence encoding a polypeptide; (b) preparing a nucleic acid vector comprising the nucleic acid sequence encoding the polypeptide; (c) introducing the vector into a first cell sample and expressing the polypeptide in the first cell sample; (d) measuring the level of polypeptide expression in the first cell sample; (e) deleting the signal sequence from the vector; (f) producing a pool of random oligonucleotides; (g) using a polymerase chain reaction to introduce at least one of the random oligonucleotides into the position previously occupied by the signal sequence in a plurality of vectors to generate a pool of mutagenized vectors; (h) introducing the mutagenized vectors into a second cell sample; (i) assaying for the expression of the polypeptide in the second cell sample; (j)
  • the optimized level of polypeptide expression is a level of expression greater than or equal to the level of expression of the polypeptide in step (d) .
  • the nucleic acid vector encodes an antibiotic resistant gene and the selection step additionally consists essentially of growing the cell sample in the presence of an antibiotic.
  • the nucleic acid sequence encoding polypeptide preferably encodes a selectable marker and the polymerase chain reaction is preferably an enzymatic inverse polymerase chain reaction.
  • the nucleic acid sequence encoding polypeptide encodes a .fusion polypeptide and in a particularly preferred embodiment a portion of the fusion protein is derived from 3-lactamase. In another embodiment a portion of the fusion protein is derived from an antibody.
  • the nucleic acid sequence encoding polypeptide is derived from a eukaryotic cell and the first and second cell samples are prokaryotic. Still more preferably, the cell samples are E . coli .
  • the random oligonucleotides are biased and preferably contain at least one positively charged amino acid at the N- terminus, a stretch of at least 8 hydrophobic amino acids and a small amino acid such as alanine, glycine, or valine positioned at the C-terminus.
  • a method for optimizing polypeptide expression in a cell by performing random mutagenesis on a regulatory region regulating polypeptide expression from a nucleic acid vector wherein the improvement comprises deleting a region of nucleic acid to be mutagenized, isolating the vector containing the deletion and replacing the region to be mutagenized with a random nucleic acid sequence.
  • a method for creating and isolating novel ribosome binding sites comprising the steps of: (a) identifying a ribosome binding site within a nucleic acid sequence; (b) preparing a nucleic acid vector comprising the nucleic acid sequence containing a ribosome binding site operably linked to a nucleic acid sequence encoding a polypeptide; (c) introducing the vector into a first cell sample and expressing the polypeptide in the first cell sample; (d) measuring the level of polypeptide expression in the first cell sample; (e) deleting the nucleic acid sequence containing the ribosome binding site from the vector; (f) producing a pool of random oligonucleotides; (g) using a polymerase chain reaction to introduce at least one of the random oligonucleotides into the position previously occupied by the nucleic acid sequence containing the ribosome binding site in a plurality of vectors to generate a pool of mu
  • the nucleic acid sequence encodes an antibiotic resistance gene and the selection step consists essentially of growing the cell samples in the presence of an antibiotic.
  • the nucleic acid sequence encoding polypeptide encodes a selectable marker.
  • the nucleic acid sequence encoding polypeptide is a fusion protein and preferably the fusion protein is derived from an antibody and in another preferred embodiment the fusion protein is derived from jS-lactamase.
  • polypeptide signal sequences are disclosed. These signal sequences correspond to SEQ ID NO: 13, SEQ ID NO: 16 and SEQ ID NO: 14 and to polypeptide signal sequences containing at least contiguous amino acid 10-mers thereof.
  • nucleic acid sequences are disclosed corresponding to ribosome binding sites identified as SEQ ID NO: 35, SEQ ID NO: 37, and SEQ ID NO: 38 and to ribosome binding sites corresponding to at least contiguous 5-mers thereof.
  • a method for creating and isolating novel signal sequences comprising the steps of: (a) identifying a signal sequence within a nucleic acid sequence encoding a polypeptide; (b) preparing a nucleic acid vector comprising the nucleic acid sequence encoding a polypeptides, wherein the polypeptide is a fusion protein having a C-terminus corresponding to /3-lactamase; (c) deleting the signal sequence from the vector; (d) producing a pool of random oligonucleotide suitable for an enzymatic inverse polymerase chain reaction; (e) using an enzymatic inverse polymerase chain reaction to introduce the random oligonucleotides into the position in the vector previously occupied by the signal sequence to generate a pool of mutagenized vectors; (f) introducing the mutagenized vectors into a first cell sample of E .
  • the method additionally comprises the steps of introducing the vector comprising the nucleic acid sequence encoding a polypeptide into a second cell sample of E . coli , expressing the polypeptide in the second cell sample and measuring the level of polypeptide expression in the second cell sample.
  • this method includes the added step of selecting cells from the first cell sample that exhibit optimized polypeptide expression relative to the level of polypeptide expression observed in the measuring step.
  • the optimized polypeptide expression is a level of polypeptide expression greater than or equal to the level of polypeptide expression obtained from the measuring step.
  • the method additionally comprises the step of determining the nucleic acid sequence of the random oligonucleotide introduced into the position in the vector, previously occupied by the signal sequence, isolated from cells identified by the selecting step.
  • the concentration of ampicillin is at least 30 ⁇ g/ml and the nucleic acid sequence encoding polypeptide is a sequence encoding a single-chain antibody.
  • a method for identifying a novel nucleic acid sequence encoding a protein trafficking signal that directs a polypeptide to a desired location in a cell comprising the steps of: (a) identifying a region of nucleic acid containing at least one protein trafficking signal to be mutagenized; (b) preparing a nucleic acid vector comprising the protein trafficking signal sequence to be mutagenized and a nucleic acid sequence encoding at least one polypeptide; (c) deleting the protein trafficking signal sequence from the vector; (d) producing a pool of random oligonucleotides; (e) using a polymerase chain reaction to introduce at least one of the random oligonucleotides into the position previously occupied by the protein trafficking signal in a plurality of vectors to generate a pool of mutagenized vectors; (f) introducing the mutagenized vectors into a cell sample; (g) assaying for the location of the polypeptide in the cell sample;
  • the protein trafficking sequence to be mutagenized is located in the nucleic acid sequence encoding the polypeptide and in another embodiment the polymerase chain reaction is the enzymatic inverse polymerase chain reaction and the nucleic acid sequence encodes an antibiotic resistant gene and the selection step consists essentially of growing the cell sample in the presence of an antibiotic.
  • the desired cell location is extracellular.
  • a method for creating and isolating novel regulatory sequences useful for optimizing the expression of a reco binant polypeptide in prokaryotic cells comprising the steps of: (a) identifying at least one regulatory region within a nucleic acid sequence to be mutagenized; (b) preparing a nucleic acid vector suitable for expressing polypeptide in a prokaryotic cell, the nucleic acid vector comprising the regulatory region and a nucleic acid sequence encoding at least one polypeptide operably linked to the regulatory region; (c) introducing the vector into a first prokaryotic cell sample and expressing the polypeptide encoded by the vector in the first cell sample; (d) measuring the level of polypeptide expression in the first prokaryotic cell sample; (e) deleting the regulatory region from the vector; (f) producing a pool of random oligonucleotides; (g) using a polymerase chain reaction to introduce at least one random oligonucleotide into the position previously
  • the regulatory region is selected from the group consisting of a signal sequence, a ribosome binding site, a promoter sequence, a translational regulatory sequence, a transcription regulatory sequence and a protein trafficking sequence.
  • the nucleic acid sequence encoding polypeptide encodes an antibiotic resistant gene and the selection step additionally consists of growing the second prokaryotic cell sample in the presence of an antibiotic.
  • the first and second cell samples are derived from the same cell type.
  • Figure 1 diagrams the assembly of the single chain antibody ⁇ -lactamase fusion protein and the incorporation of the fusion protein with the Omp A signal sequence into the expression vector pCCHAscl-ampRv2.
  • Figure 2 illustrates the strategies followed to delete the Omp A regulatory region, to incorporate the random oligonucleotides, and to generate a library of random signal peptide mutants.
  • Figure 3 is a diagram of plas id pCLA3ampR that was used to redone identified signal sequences to test for signal sequence effectiveness.
  • Figure 4 is a comparison of the hydrophobicity and the alpha and beta region analysis of two novel signal peptide sequences identified using the methods of this invention as compared with known signal peptide sequences.
  • Figure 5 illustrates the construction of variable length signal peptide libraries using the methods of this invention.
  • Figure 6 is a diagram of the plasmid modifications used to produce the plasmid containing the ribosome binding site library.
  • Figure 7 is a diagram of plasmid pGCEMK.
  • Figure 8 is a diagram of plasmid pNCEMGl.
  • Figure 9 illustrates sequences of some of the oligonucleotides containing the ribosome binding site and signal peptide sequences identified using the methods of this invention.
  • This invention provides methods useful for creating novel regulatory sequences that affect recombinant protein expression within either a prokaryotic or eukaryotic cell.
  • these methods advantageously facilitate the optimization of recombinant protein expression in a cell.
  • modifications can be incorporated into more than one regulatory region and the cumulative effect of these mutations can be assessed by isolating clones expressing the desired level of recombinant protein expression.
  • polypeptide and “protein” are used interchangeably throughout this text.
  • heterologous protein expression is used herein to refer to protein that is not native to the host cell expressing the heterologous protein.
  • optimized protein expression is used herein to refer to the cumulative conditions that provide an optimal level of protein expression for a particular protein in a particular cell system. Under one set of laboratory conditions, optimized protein expression may refer to the highest available level of protein expression, while in another set of laboratory conditions, optimized protein expression may refer to low level protein expression because for that particular application low level protein expression is preferred. In yet another set of laboratory conditions optimized protein expression may refer to the level of protein expression that can coexist with cell life in situations where, under standard conditions, the protein would be cytocidal to the cells.
  • operably linked is used to mean regulatory regions that ultimately influence, or effect the level of recombinant protein expression from a nucleic acid vector.
  • regulatory region is outside of a nucleic acid sequence encoding polypeptide
  • coding region is used to denote that region of DNA that corresponds to RNA beginning with a methionine codon and ending with a polyadenylation signal.
  • the methods of this invention are suitable for the optimization of recombinant protein sequences in either prokaryotes or eukaryotes, it is contemplated that these methods are particularly useful for the optimization of heterologous protein sequences in either prokaryotes or eukaryotes. In particular, these methods are useful for optimizing the expression of eukaryotic protein in prokaryotic cells.
  • eukaryotic protein in a prokaryotic system is often commercially desirable.
  • Prokaryotic cells typically have shorter doubling times than eukaryotic cells and they are easier and less expensive to grow in large quantity than eukaryotes.
  • optimizing expression of a eukaryotic protein in a prokaryotic cell such as E. coli has heretofore been an inefficient process.
  • a eukaryotic gene sequence encoding protein that is incorporated into a prokaryotic expression vector is expressed at relatively low levels or at levels that result in prokaryotic cell death.
  • the methods of this invention do not require a systematic assessment of each individual factor that can effect protein synthesis. Rather, the methods of this invention employ a novel random mutagenesis and selection strategy that advantageously permits one of skill in the art to identify clones that yield optimized levels of heterologous protein expression by assessing the cumulative effect of all of the changes incorporated into the expression vector. Thus, by the method of this invention, it is possible to select clones having mutagenized regulatory sequences individually tailored to optimize expression of any given protein in any given expression cell.
  • mutagenesis techniques involve the systematic replacement of one or more nucleotides with other known nucleotides.
  • a variety of site-directed mutagenesis strategies are known in the art and are commercially available in kit format (see BioRad, Richmond, CA; Stratagene, La Jolla, CA; or Invitrogen, San Diego, CA.)
  • site-directed mutagenesis one begins with a known, efficient regulatory region and improves it by introducing single or multiple point mutations.
  • the mutagenized regulatory region may or may not be biased according to what is known about the chemical properties of the wild-type regulatory region.
  • cassette mutagenesis strategies are also known in the art. These strategies involve removing a section of DNA between two restriction sites and replacing it with a different DNA sequence that is bounded by the same restriction sites (Richards, J.H. , (1991) "Cassette Mutagenesis” in Directed Mutagenesis, A Practical Approach, M.J. McPherson ed. IRL Press) . Chou, et al. (J. Biol . Chem .
  • cassette mutagenesis in eukaryotic cells to study the effect of signal sequence mutants on protein expression (Mol . Cell . Biol . 9:3400-3410, 1989). These techniques replace identified sequences with known alternatives. However, unlike the mutagenesis strategies of this invention, these traditional site-directed and cassette mutagenesis techniques are not efficient methods for creating novel functional sequences.
  • random or biased mutagenesis techniques are useful methods for generating a pool of novel mutants. Regions of oligonucleotides, or whole oligonucleotides, are synthesized using methods that permit the random incorporation of nucleotides. Random mutagenesis combined with good selection systems can be used to identify functional regulatory sequences lacking any predisposed bias to previously identified sequences. The collection of random sequences is incorporated into the gene sequence in place of the native sequence.
  • random mutagenesis will be used to encompass biased mutagenesis techniques.
  • Biased mutagenesis is a type of random mutagenesis in which pools of oligonucleotides are prepared to accommodate the incorporation of more than one type of nucleotide in a given location within a nucleic acid sequence.
  • Biased mutagenesis techniques are particularly useful for generating pools of mutants that maintain the consensus patterns of charge, hydrophobicity, hydrophilicity, etc. within a particular region of protein encoded by a nucleic acid sequence. In this way the chemical characteristics of a molecule can be maintained while the actual nucleic acid sequence and amino acid sequence of an encoding protein is varied.
  • biased mutagenesis to regulatory regions has the advantage that optimized sequences can be identified that are chemically biased toward native sequences but reflect a wider variation than what has previously been identified in the art. Some of these sequences may direct protein expression more efficiently than other sequences heretofore identified.
  • Strategies of random mutagenesis to select clones containing functional mutations is complicated by the contaminating background level of clones containing nonmutated sequences. The presence of nonmutated sequences complicates selection and screening strategies by decreasing the variability within a library and by increasing the number of false positive clones. A large number of the clones produced by traditional random mutagenesis techniques are false positive because they contain the functional nonmutated sequence.
  • the eukaryotic regulatory sequences contemplated within the scope of this invention include, but are not limited to promoters, upstream promoter elements, enhancers, transcriptional regulatory elements, translational regulatory proteins such as translation repressor proteins, signal sequences, protein targeting sequences, chaperon proteins (these binding proteins coat polypeptide chains as they emerge from ribosomes or membranes, preventing aggregation reactions and premature folding (Flynn, et al. Nature 353: 726, 1991) and sequences directing post-translational modifications and the like.
  • prokaryotic regulatory sequences contemplated within the scope of this invention include, but are not limited to, promoters, transcription regulatory sequences, translation regulatory sequences including ribosome binding sites, sequences associated with attenuation, protein targeting sequences, signal sequences, and the like.
  • the regulatory sequences may be in any number of locations relative to the gene sequence encoding protein, for instance, within the translated region of a coding sequence, within a transcribed region of a coding sequence, or external to the coding sequence.
  • the signal sequence for both eukaryotic and prokaryotic cells is positioned at the amino terminus of the nucleic acid sequence encoding protein. Since the primary sequence of the signal peptide has a direct effect on the translocation of the associated protein (Inouye, et al. Proc . Natl . Acad . Sci . , USA 74: 1004-1008, 1977) , the signal peptide is a candidate regulatory sequence suitable for the methods of this invention.
  • Example 1 illustrates the application of the methods of this invention to the generation of novel signal sequences.
  • Example 3 provides an exemplary strategy useful for generating novel ribosome binding sites.
  • the regulatory sequences created by the methods of this invention may be of any length. For example, while most signal sequences fall between a range of between 18-24 amino acids, it is contemplated that novel optimized signal sequences could be created in a variety of lengths.
  • Example 2 provides methods for generating libraries of signal peptides having lengths of 12, 16 or 20 amino acids. Similarly these methods could be applied to other regulatory regions such as ribosome binding sites, protein targeting sequences and the like.
  • the position of the regulatory region to be mutagenized is identified. Regulatory regions contemplated within the scope of this invention have been discussed above. The relative position of the regulatory region can be identified within a particular nucleic acid sequence by studying the consensus data for that particular regulatory region available from the prior art. There exists in the literature a wealth of information concerning the consensus positions of known regulatory regions. For example, regions encoding signal peptides have been localized for both prokaryotic and eukaryotic genes as have regions encoded on mRNA which are required for ribosome binding (Protein Targeting, supra. pp. 52-61; Kozak, Microbiol . Rev .
  • the regulatory region to be mutagenized is identified, it is then incorporated into a nucleic acid vector together with a nucleic acid sequence encoding at least one protein under the control of the regulatory region to be mutagenized.
  • Any suitable expression vector may be used in this invention and exemplary vectors are provided in the Examples below. Those with skill in the art will appreciate that the choice of vector is limited to those vectors capable of directing expression of the nucleic acid sequence encoding protein and to those vectors that can incorporate and support the function of the regulatory region to be mutagenized. Further, the choice of vector is limited by the cell type selected; not all vectors and not all regulatory elements necessary for recombinant protein expression function in all cell types.
  • eukaryotic expression vectors are suitable for protein expression in eukaryotes and prokaryotic expression vectors are suitable for prokaryotes. Both types of vectors are commercially available and those with skill in the art of molecular biology will be able to select the appropriate vectors suitable for recombinant protein expression within a given cell type.
  • nucleic acid vectors Methods for incorporating a particular region of nucleic acid into a nucleic acid vector are well known in the art of molecular biology (see Sambrook, et al., supra) .
  • short regions of nucleic acid can be prepared by generating overlapping oligonucleotide fragments complementary to the nucleic acid. These oligonucleotides are then hybridized to one another, ligated and incorporated into an appropriate expression vector.
  • those with skill in the art of molecular biology will be able to use the polymerase chain reaction technology to amplify a suitable nucleic acid fragment containing the regulatory gene of interest and incorporate this fragment into the expression vector of choice (see generally Erlich, H.A.
  • the nucleic acid vector additionally contains a nucleic acid sequence encoding a recombinant protein product such that the regulatory region is operably linked to and/or regulates expression of the protein encoded by the gene. Since the regulatory region may be positioned in the nucleic acid vector as part of the nucleic acid sequence encoding protein or as a. nucleic acid sequence adjacent to the nucleic acid sequence encoding protein, the assembly of the nucleic acid vector containing the regulatory region to be mutagenized may require multiple steps.
  • nucleic acid vector suitable for expressing a protein product from a gene sequence in a host cell and to incorporate both the gene sequence encoding the protein product and the gene sequence encoding the regulatory region into the vector using techniques of genetic engineering well known in the art. Incorporation of both the regulatory sequence and the gene sequence encoding the protein product into the nucleic acid vector is designed such that once the nucleic acid vector is introduced into a compatible cell sample, if the regulatory sequence is functional, protein expression will be detected.
  • Compatible cells are those cells capable of expressing a protein from the nucleic acid vector when the vector contains those regulatory elements that facilitate protein expression (i.e. origins of replication, etc.) .
  • nucleic acid vector of this invention may additionally be incorporated into the nucleic acid vector of this invention.
  • gene sequences coding for antibiotic resistance, or other gene sequences that facilitate cell selection may be introduced into the vector.
  • the nucleic acid sequence encoding protein can itself encode an antibiotic resistance gene or a portion of an antibiotic resistance gene.
  • Other sequences that may be incorporated into the vector include but are not limited to promoters, enhancers, polyA+ adenylation sites, origins of replication (eukaryotic and/or prokaryotic) , and specific restriction endonuclease cleavage sites (or arrays of sites) .
  • Exemplary nucleic acid vectors containing a regulatory region to be mutagenized and a nucleic acid sequence encoding protein are provided in Examples 1-3 as well as Figures 1-3 and 6.
  • some base line information should be obtained for protein produced under the control of the nonmutagenized regulatory region.
  • This data is later compared to the level of protein expression influenced by the mutated regulatory region.
  • This data may be obtained from the literature, or the data may be obtained first hand.
  • the vector containing the nonmutagenized regulatory region is introduced into a cell sample that is able to support the expression of the gene sequence under the control of the regulatory region.
  • nucleic acid vectors into prokaryotic and eukaryotic cells.
  • the vectors may be introduced as viral vectors, electroporated into the cells, bacterial cell walls may be permeabilized, the vectors may be introduced through standard precipitation methods or through the use of lipophilic agents.
  • Such methods are commercially available as kits and detailed methods are readily available in the literature. Methods for transforming E . coli with nucleic acid vectors are provided in Example 1.
  • Proteins of interest may be concentrated from cell supernatants or cell lysates and quantitated by chromatographic methods, enzyme-linked immunoadsorbant assays (ELISA) , western blot assays, radio-immunoassays, gel electrophoresis, or the like.
  • ELISA enzyme-linked immunoadsorbant assays
  • Exemplary assays for measuring the amount of protein expressed in the periplasmic space of E . coli are provided in Example 1 and are detailed in Stemmer, et al. BioTechnigues 13:114-220, 1992, hereby incorporated by reference and U.S. Patent Application No. 07/641,140, filed April 26, 1991, and U.S.
  • Example 4 an exemplary assay for measuring the amount of protein expressed in a eukaryotic cell is provided in Example 4.
  • Methods to detect protein expression from a nucleic acid vector in a cell will vary depending on the gene sequence encoding protein. Those with skill in the art of molecular biology are well versed at selecting gene sequences encoding protein and matching these sequences with methods suitable for detecting protein expression derived from that gene sequence in a particular cell type.
  • the value of protein expression obtained from cells expressing recombinant protein under the influence of the nonmutagenized regulatory region provides a level of expression that can be compared with that of cells expressing protein from vectors with mutagenized regulatory regions. Deletion of the Regulatory Region to be Mutagenized
  • An important aspect of this invention involves the deletion of the region to be mutagenized before the pool of random oligonucleotides are incorporated into the vector.
  • This step of the invention advantageously removes the endogenous background level of protein expression that contaminates mutagenic libraries produced by other methods.
  • the regulatory region can be deleted from the nucleic acid vector in a variety of ways. For example, during vector construction, two identical vectors may be simultaneously assembled such that one contains and the other lacks the mutagenized regulatory region.
  • the regulatory region can be deleted from the vector using PCR technology and an exemplary strategy is disclosed in Example 1. Similarly the regulatory region can be deleted through the use of restriction endonuclease recognition sites bracketing the regulatory region.
  • Random Oligonucleotides As a next step for practicing the methods of this invention, a pool of random oligonucleotides is prepared that are suitable for PCR methods and can be readily incorporated into the position in the vector previously occupied by the regulatory region. These oligonucleotides are positioned in the nucleic acid vector in place of the deleted regulatory region. As noted previously, the term "random oligonucleotides" includes biased oligonucleotides. Methods for generating random oligonucleotides on a DNA synthesizer are well known in the art.
  • the random oligonucleotides are of a size and design useful for directing a polymerase chain reaction and for incorporating the nucleic acid vector containing the nucleic acid sequence encoding protein. That is, the oligonucleotides should contain regions of homology to the vector, they should include a restriction endonuclease site to facilitate recircularization of the plasmid following incorporation of the vector, and they should include a region containing the random or biased nucleotides.
  • the oligonucleotides can be of any desired length, and the length of the oligonucleotide will depend on the type of regulatory sequence and the number of random oligonucleotides that will be incorporated into the vector.
  • the oligonucleotides prepared in this invention contain at least one region that incorporates more than one nucleotide at a given position within the oligonucleotide.
  • This pool of random oligonucleotides is incorporated into the vector to produce a library of clones, containing random regulatory sequences.
  • Libraries prepared from random oligonucleotides, and particularly libraries prepared using the methods of this invention wherein the regulatory region to be mutagenized is deleted, have the advantage that the selected active clones will likely be novel regulatory sequences. Because of the large theoretical number of random regulatory regions be they signal peptides, ribosome binding sites, or the like, the methods of this invention are more likely to generate novel regulatory sequences that direct increased or optimized protein expression than other methods currently available in the art.
  • One potential difficulty of a completely random regulatory library is that the majority of the sequences generated are inactive. This means that the libraries must be sufficiently large in order to identify a sufficient number of active clones for further study. If the goal of the mutagenesis procedure is to generate a regulatory region that significantly increases the expression of the recombinant protein, then this type of regulatory region would occur infrequently in a random library. For example, in an E . coli library, if the probability of finding a "super sequence" is 10 "9 there is a good chance that it would not appear in a library of 10 8 clones. Exemplary Calculation to Estimate the Library Size Necessary for Identifying Novel Regulatory Sequences
  • the following example is provided as one method for estimating an adequate library size that would be required to identify a novel, optimized sequence.
  • This example is specifically addressed to the library size required to identify an optimized signal peptide in a prokaryotic cell using biased oligonucleotides. It is contemplated that those with skill in the art would be able to apply the strategy exemplified by the calculations used in this example to determine the library size required for other regulatory regions in either prokaryotes or eukaryotes.
  • Known signal peptides contain one or two positively charged amino acids near the N-terminus, a stretch of 8-12 hydrophobic amino acids and a small amino acid such as alanine, glycine or valine positioned at the C-terminus.
  • the peptide can be diagrammed as [M] [+][+] [n] ⁇ 9 [S] where each [] represents an amino acid, [M] represents methionine, [+] represents a positively charged amino acid, [n] represents any hydrophobic amino acid and [S] represents a tiny amino acid.
  • a biased library is difficult to construct due to the constraints on the usage of the genetic code. For example, if one tries to use oligonucleotide doping to produce either a lysine or arginine in a given position, the codon for glutamine will also occur. Similarly, due to the degeneracy of the genetic code, it is impossible to constrain an amino acid to solely a hydrophoibic amino acid using single base oligonucleotide synthesis from variable base mixtures. In some cases it is possible to use doping to achieve the desired amino acid mixture. For example, a 50% occurrence of alanine or glycine will occur if the sequence 5* G(G/C)X is used.
  • the oligonucleotide column swapping DNA synthesis technique developed by Glaser, et al. is used (J . Immunol . 149:3903- 3913) .
  • This method produces a 50% chance of a mutation at any desired amino acid.
  • This is a codon based mutagenesis approach that allows the formation of large diverse libraries.
  • the column-swapping manipulation causes one- half of each synthesis step to contain random codons, while the other half contains wild-type codons, at each position to be mutated.
  • the mutagenized oligonucleotides are introduced using a polymerase chain reaction into the position in the vector previously occupied by the deleted regulatory region.
  • a polymerase chain reaction There are a variety of polymerase chain reaction methods known in the art that are suitable to the methods of this invention. The examples below use the enzymatic inverse polymerase chain reaction (see Stemmer, et al. BioTechnigues 13:114-220, 1992, hereby incorporated by reference and U.S. Patent Application No. 07/641,140, filed April 26, 1991, and U.S. Patent Application No. 07/806,154, filed December 12, 1991) as a preferred PCR strategy.
  • EIPCR advantageously allows targeted mutagenesis of a given DNA sequence and rapid selection of active clones from a large library.
  • Other PCR techniques suitable for use in the methods of this invention include, but are not limited to, inverse PCR (He sley, et al. Nucl . Acids Res . 17:6545- 6551, 1989) , recombinant circle PCR mutagenesis (Jones, et al. BioTechnigues 8:178-183, 1990) , recombination PCR (Jones, et al., BioTechnigues 10:62-66, 1991) and standard PCR techniques such as those described by Saiki, et al. (Science 239:487-491, 1988) .
  • the enzymatic inverse polymerase chain reaction involves the use of oligonucleotides containing a class IIS restriction endonuclease recognition site and the associated downstream class IIS cleavage site.
  • the EIPCR technique uses a small circular vector and the PCR primers are designed to overlap at their class IIS cleavage sites.
  • one oligonucleotide of the oligonucleotide pair used in the EIPCR reaction contains in a first 5' to 3' orientation a 5' tail, a class IIS restriction endonuclease recognition site, a class IIS cleavage site, the region containing the mutation and a region complementary to the nucleic acid vector.
  • the other oligonucleotide of the oligonucleotide pair comprises in a second 5' to 3 1 orientation; a 5 1 tail, a class IIS restriction endonuclease recognition site, a class IIS cleavage site and a region complementary to the nucleic acid vector.
  • both oligonucleotides of the oligonucleotide pair used in the EIPCR reaction contain the first 5' to 3' orientation. Further details of primer design applicable to the mutagenesis application of this invention are provided in Examples 1-3.
  • the amplified linear vectors are digested with the appropriate restriction endonuclease and recircularized to produce a pool of mutagenized vectors.
  • the pool of mutagenized vectors is purified and aliquots are introduced into a cell sample using any of the methods disclosed in association with the discussion relating to cell transformation or transfection discussed above.
  • Example 1 uses electroporation to introduce the mutagenized vectors into the cell sample.
  • the cell samples used for both the mutagenized and nonmutagenized vector transformations or transfections are derived from the same cell type. In this way, protein expression from cells containing the nucleic acid vector having the nonmutagenized regulatory region can be directly compared to expression from cells containing the nucleic acid vector with the mutagenized regulatory region.
  • the cell sample is assayed for the expression of the recombinant protein, that is, the protein associated with the mutagenized regulatory region.
  • the particular assay for assessing the level of protein expression will depend on the particular protein being expressed as well as on the cell type. Exemplary methods for assessing protein expression in the periplasmic space of prokaryotes are provided in Example 1.
  • For enhanced protein expression cells are selected that exhibit protein expression from the nucleic acid vector at a level greater than or equal to the level of protein expression observed in the cell sample containing nonmutagenized vector.
  • other characteristics in addition to the level of protein synthesis, may be important for clone selection. These include cell growth characteristics, the location of protein expression or the like.
  • the processes of transcription, translation, translocation and signal peptide cleavage are parameters that are affected when the signal sequence is mutated.
  • the use of a random signal sequence allows for selection of clones that are optimized for all of these parameters using the internal mechanisms of the host cell. For example, a random signal sequence that generates a nonfunctional secondary structure in RNA would preclude efficient translation and positive clones containing this sequence would not be identified.
  • Assays for recombinant protein expression from the mutagenized vectors permits the cumulative effect of the mutated sequence on protein expression to be quantitated.
  • these parameters are optimized with respect to the individual requirements of the product proteins being expressed by the regulatory regions such as glycosylation requirements, folding requirements for functionality, and the like.
  • the nucleic acid vector is isolated from cells derived from the cell sample that contain mutagenized vector and exhibit enhanced or optimized expression of the recombinant protein.
  • the purified vector is subjected to DNA sequence analysis to determine the nucleic acid sequence of the mutagenized regulatory region corresponding to the random oligonucleotide.
  • DNA sequencing techniques are well known in the field of molecular biology, thus no further disclosure of sequencing techniques is required.
  • this invention creates novel regulatory sequences, including novel signal peptides.
  • novel signal peptides including novel signal peptides.
  • a library of 21 random amino acid sequences would contain a relatively low percentage of functional secreting clones.
  • the library can be restricted to only clones containing functional signal peptides.
  • Selectable marker proteins particularly those in prokaryotes require transport to the periplasmic space to be active, have the advantage that when a regulatory region is operably linked to the amino-terminus of a heterologous protein containing the C-terminus of a periplasmic protein, this will result in transport to the periplasmic space and the selectable marker is able to confer a subsequent advantage to cell growth under certain conditions.
  • the in-frame fusion of a heterologous protein to a selectable marker provides a rapid means to select only active regulatory sequences.
  • the selectable marker is an antibiotic resistant gene, then the use of increasing concentrations of antibiotic allows for selection of clones containing the most efficient regulatory sequences.
  • the enzyme ⁇ -lactamase represents one such selectable marker because it is active only if translocated from the cytoplasm to the periplasm (Pluckthun, et al. J. Biol . Chem . 262(9): 3951-3957, 1987 and Francisco, et al. supra) .
  • Example 1 uses a regulatory region operably linked to a eukaryotic protein fused to the C-terminus of / 5-lactamase, it also is contemplated that the regulatory region can be operably linked to a full-length selectable marker such as /J-lactamase (see Example 3) .
  • novel regulatory regions by assessing protein expression in the periplasmic space is not limited to /3-lactamase fusion proteins.
  • Other selectable marker proteins can also be used.
  • aminoglycoside 3'-phosphotransferase II can be used as a selectable marker in the same way.
  • This enzyme is present in bacteria that are resistant to streptomycin, neomycin or kana ycin. The enzyme renders the antibiotic inactive as it enters the cell through the periplasm.
  • Other periplasmic enzymes include Streptomycin adenylating enzyme and alkyl sulfhydrolase.
  • selectable markers contemplated for use in the methods of this invention include binding proteins for various carbon sources. Proper secretion of this class of proteins leads to their presence on the outer membrane or periplasm, enabling the cell to internalize a particular sugar or amino acid.
  • maltose binding protein when properly secreted, allows bacteria to use maltose as the sole carbon source. If the maltose binding protein is not secreted, cells cannot grow on media containing maltose as the sole carbon source. In some cases, it is necessary to use the appropriate auxotrophic strain of E. coli . For example a his-negative strain requires histidine in the media.
  • selectable proteins that may be useful as fusion proteins include but are not limited to arabinose binding protein, arginine binding protein, cystine and diaminopimelic binding protein, galactose-glucose binding protein, glutamate-aspartate binding protein, glutamine binding protein, histidine binding protein, leucine specific binding protein, lysine-arginine-ornithine binding protein, phosphate binding protein, ribose binding protein, sulfate binding protein, thiamine binding protein and xylose binding protein.
  • Example 1 employs a random sequence library linked to a fusion protein of a single chain antibody with an antibody resistance gene, the regulatory region could also be linked solely to a sequence encoding an antibody sequence and positive clones selected by screening with labelled antigen.
  • the antigen is preferably a small hapten capable of diffusing into the periplasmic space.
  • any regulatory region affecting the level of protein synthesis is contemplated as a potential target for the optimization strategies of this invention.
  • Two specific prokaryotic examples are provided below. These examples illustrate the methods of this invention as they apply to optimizing signal sequences (Example 1) and ribosome binding sites (Example 3) . The methods of this invention readily apply to mutagenesis strategies directed to both these and other regulatory regions in prokaryotic and eukaryotic cells.
  • Example 1 details a method for creating novel signal sequences for the expression of eukaryotic protein in E. coli . In addition to creating novel signal sequences, this method is also useful for optimizing the expression of a particular protein from a given construct in a given cell type.
  • Example 1 combines the mutagenesis strategies of this invention with antibiotic selection to improve secretion of Fv antibody molecules in E . coli .
  • a fusion protein consisting of an N-terminal, hapten-binding single chain antibody (CHA 255) and the C- terminal of /3-lactamase was constructed and expressed in active form in the periplasmic space of E . coli using the Omp A signal peptide.
  • the fusion of /3-lactamase with a second protein to study protein transport was reported by 10 Broome-S ith, et al..Mol . Micro . 4(10) 1637-1644, 1990.
  • the regulatory sequence of interest here the Omp A signal peptide
  • the Omp A signal peptide was deleted and enzymatic inverse PCR was used to construct a library containing random signal peptides at the N-terminus of the fusion protein.
  • None of the references cited above report the deletion of the regulatory region to be mutagenized followed by the isolation of vector containing that deletion to create large libraries of regulatory mutants uncontaminated with false positive clones.
  • Functional signal peptides were selected by plating the clones on media containing ampicillin. Clones were identified that had /3-lactamase activity comparable to or greater than that of the fusion protein containing the Omp A signal peptide. To ensure retention of antibody activity, colony filter lifts were screened with a radioactively labeled hapten, ⁇ In-EOTUBE, recognized by the CHA 255 antibody.
  • Example 1 provides a detailed method for generating novel signal sequences that are 22 amino acids in length.
  • optimized signal sequences like optimized regulatory regions can be selected based on the lengths of known regulatory regions, or, alternatively, regulatory regions of any desired length can be selected for optimization. Therefore, steps similar to those outlined in Example 1 can be performed to generate optimized signal sequences, or other regulatory regions of any length.
  • Example 2 provides methods for creating random leader peptides of 12, 16, or 20 amino acids in length. It is further contemplated that the methods of this invention can be used to create novel regulatory sequences for expression of either heterologous or homologous protein.
  • Example 3 uses a native E. coli protein to identify novel ribosome binding site sequences.
  • ribosome binding site sequences can be incorporated into other vectors to study the effect of these sequences on heterologous protein expression.
  • the steps outlined above are followed for creating a library of random regulatory sequences operably linked to protein except that here the initial nucleic acid vector construct includes the Omp A ribosome binding site linked to the Omp A signal peptide located at the amino terminus of the /3-lactamase protein.
  • the Omp A ribosome binding site sequence was deleted by PCR and the resulting vector was used to incorporate a library of random oligonucleotides.
  • These mutagenized vectors were introduced into E. coli and the level of protein expression from the resulting clones was assessed using a PADAC assay.
  • Example 3 Three clones were identified that expressed elevated levels of protein relative to the nonmutated ribosome binding site sequence. These clones were selected and the nucleic acid sequence of the mutated ribosome binding site was determined. Details of this experimental protocol are provided in Example 3.
  • a method is provided for creating and identifying novel regulatory regions in eukaryotes.
  • Example 4 details a method for producing signal peptide libraries in mammalian cells for improved protein expression.
  • Known eukaryotic signal sequences are disclosed and reviewed by Von Heigne, G. , Eur. J . Biochem . 133:17-21, 1983.
  • Figure 7 illustrates an exemplary vector contemplated for use in this invention.
  • the vector uses the Immunoglobulin heavy chain promoter to initiate transcription of the desired gene inserted downstream from the promoter.
  • the gene to be expressed along with the regulatory region to be mutagenized is inserted into the vector.
  • the regulatory region is deleted and a random or biased regulatory sequence is inserted into the vector.
  • Example 4 teaches the mutagenesis of the signal sequence. Following the incorporation of the random signal sequence into the eukaryotic vector, the vector is introduced into suitable eukaryotic cells that are capable of expressing protein from the expression vector.
  • Example 4 employs the vector of Figure 7 in SP 2/0 cells. Vectors can be introduced into cells using those methods known in the art.
  • Example 4 employs electroporation to mediate transfection.
  • Positive cells are preferably selected by growing the cells in neomycin.
  • the relative level of protein expression is assayed using a Western dot blot.
  • Positive transfected cells can be further quantitated for protein expression by ELISA as compared with cells expressing protein under the control of the native regulatory sequence.
  • Cells expressing equal or greater levels of protein expression as compared with cells containing the native regulatory sequence are subjected to DNA sequencing.
  • the methods of the invention are used to generate targeting sequences to target proteins to particular cellular locations.
  • the terms "protein targeting sequence” and “protein trafficking sequence” are used interchangeably in this disclosure.
  • proteins with N-terminal signal sequences are secreted into the periplasmic space or are inserted into the outer membrane using a secretion apparatus defined by a series of sec genes (Bassford, et al., Cell 65:367, 1991).
  • Other proteins lacking signal sequences are also secreted into the extracellular space in E. coli and some of these proteins contain secretion signals located at the C-terminus of the protein (Delepelaire, P. et al, J . Biol . Chem . 265:17118, 1990) .
  • These secretory proteins lack a classical hydrophobic signal peptide. Instead, the proteins contain a consensus region of 200 amino acids containing highly conserved stretches of amino acids that are postulated to confer secretory activity.
  • targeting sequences are located within a variety of locations in eukaryotic proteins.
  • interleukin proteins lack hydrophobic leader peptides and their targeting sequences are believed to be located in the C-terminal portion of the protein precursor (Marck, et al., Nature , 315: 641, 1985) .
  • E . coli expression of Interleukin-1 results in the translocation of the protein to the periplasmic space, while expression in yeast results in expression of the protein in the cytosol.
  • Particular protein sequences also are implicated in mitochondrial targeting (see Zara, et al., J. Biol . Chem . 267:12077-12081; Adrian, et al., Mol . Cell .
  • the ability to target protein to a particular cell location is an important goal in gene therapy strategies.
  • the method described in the claims and below and detailed in Example 5 permits one with skill in the art to introduce a particular nucleic acid sequence encoding protein into a cell and to select cells expressing protein in the desired cell location.
  • a comparison of the native protein sequence with the mutated protein sequence that targets the protein to the desired cell location permits the identification of the novel sequence that directs protein targeting.
  • a particular region of nucleic acid containing a protein trafficking/targeting signal or a postulated protein trafficking/targeting signal is identified.
  • trafficking signals can be positioned in a variety of locations within a protein sequence.
  • a nucleic acid vector is prepared that contains the nucleic acid sequence encoding the protein trafficking signal to be mutagenized and a nucleic acid sequence encoding at least one protein.
  • the nucleic acid vector may be suitable for directing eukaryotic or prokaryotic expression of the protein, and those with skill in the art of molecular biology will be readily able to select and construct an expression vector suitable for this invention.
  • the nucleic acid sequence containing the protein trafficking signal is deleted from the vector.
  • a pool of random oligonucleotides are produced that are suitable for use in a polymerase chain reaction.
  • oligonucleotides are incorporated into the position in the vector previously occupied by the protein trafficking signal to generate a pool of mutagenized vectors. Suggested methods for the steps of this invention are discussed supra as they relate to methods for creating nucleic acid regulatory sequences and specific examples of the suggested methods are provided in the Examples below.
  • the mutagenized vectors are introduced into a cell sample using electroporation, membrane permeabilization, CaCl 2 precipitation, viral vectors, or the like. Those with skill in the art will use their preferred methods for introducing nucleic acid vectors into a cell sample.
  • the cells are assayed to determine the location of the protein encoded by the nucleic acid vector in the cell sample.
  • Methods to determine the localization of a protein within a cell are known in the art. For example, if antibodies are available that react with the protein encoded by the nucleic acid vector, then these antibodies can be labelled with a fluorescent marker, colloidal gold, or the like.
  • Cell samples expressing the protein can be membrane permeabilized for fluorescent antibody analysis or cells can be fixed and sectioned for protein localization studies using electron microscopy in the presence of colloidal gold.
  • cell samples can be fractionated and individual organelles separated and individually tested for the presence of the protein.
  • Cells expressing the protein in the desired cell location are identified by an assay and the nucleic acid sequence of the novel protein trafficking/targeting signal is determined and compared to the native sequence.
  • Particular embodiments of the invention will be discussed in detail in the following examples and reference will be made to possible variations within the scope of the invention. There are a variety of alternative techniques and procedures available to those of skill in the art which would similarly permit one to successfully perform the intended invention.
  • SCCHA225 was amplified from the construct pUCHAsc2 using a 5'-terminal primer (B444, designated SEQ ID NO: 1, see Figure 1) containing an Xba I restriction site and a 3' primer (B456, designated SEQ ID NO: 2) containing DNA coding for the (gly 4 ser) 2 sequence and a Class IIS restriction endonuclease restriction site, Bsa I.
  • pUCHAsc2 The designation "2" in the name pUCHAsc2, indicates that the a pR gene was a mutation eliminating the BSA1 site, which was accomplished by a single round of EIPCR using primers 939 (SEQ ID NO: 3) and 940 (SEQ ID NO: 4) as described in Example 2) .
  • /3-lactamase was amplified from pUCHAsc2, which contains a modified copy of /3-lactamase lacking the Bsa I restriction site, but containing the native amino acid sequence.
  • a 5 '-terminal primer (B457, designated as SEQ ID NO: 5) containing a Bsa I restriction site and nucleotides coding for (gly 4 ser) 2 were used.
  • the 3 '-overlap primer (B447, designated as SEQ ID NO: 6) introduced 3 consecutive stop codons and a Bam HI restriction site at the 3' end of the /3-lactamase gene.
  • the resulting PCR products were processed with Bsa I, ligated and the ligation products were used as template for a second PCR reaction using the external primers B444 and B447. Ligation conditions are provided below.
  • pCCHAscl is a pUC based plasmid that constitutively expresses a chloramphenicol resistance gene and contains an expression cassette having a Lac promotor driving expression of the CHA 255 single chain antibody through the use of an Omp A derived ribosome binding site and signal sequence.
  • the Omp A sequence was created from primers using the sequence provided in the literature (Mowa, et al. J. Mol . Biol . 147:317-328, 1980).
  • the resulting plasmid pCCHAscl-ampRV2 was verified by restriction digestion and DNA sequencing. Double stranded dideoxy sequencing was performed on a Dupont Genesis 2000, using the DuPont Genesis 2000 sequencing kit according to the manufacturer's instructions. Post gel processing was done with the Base Caller 5.0 program (DuPont, Boston, MA).
  • Oligonucleotides used to produce the fusion protein construct are shown in Figures 1 and 2. Oligonucleotides were synthesized in an Eppendorf Synostat D automated DNA synthesizer (Madison, Wisconsin) .
  • Oligonucleotides used for library EIPCR were synthesized with the 5'-trityl group on and were purified with a Nensorb Prep column (New England Nuclear, Tozer, MA) according to the manufacturer's instructions. Oligonucleotides containing random nucleotide incorporation were prepared by selecting a mixed base option according to the software provided by the manufacturer.
  • Standard ligations contained 0.1-1.0 ⁇ g DNA, lx ligation buffer and 1-2 ⁇ l (400,000 units) of T4 DNA ligase (New England Biolabs, Tozer, MA) in a 20 ⁇ l volume. The ligation reactions were incubated at room temperature for one hour. Mass estimates for chemically synthesized oligonucleotides were obtained spectrophotometrically. In all other cases, DNA mass estimates were made by visual comparison of samples run on agarose gels with a 1 kb ladder DNA standard (GIBCO/BRL, Gaithersburg, MD) loaded to approximate 100 ng DNA per band.
  • GEBCO/BRL 1 kb ladder DNA standard
  • the construct pCCHAscl-ampRv-2 was introduced into E. coli DH-10B (GIBCO-BRL, Gaithersburg, MD) . Bacteria were propagated at 30°C in Terrific Broth medium (Sambrook, et al. 1989. supra) or on agar plates containing 34 ⁇ g/ml chloramphenicol and 100 ⁇ g/ml ampicillin. Positive clones were selected from the ampicillin plates for further study. Assays to monitor the level of protein expression in the bacteria are described below. The completed vector provides the template for EIPCR-based construction of the signal peptide library.
  • the PCR reactions contained 0.2 ng the template, 0.5 _M of each primer, IX Taq buffer (Perkin- Elmer Cetus, Norwalk, CT) , 200 _M of each dNTP, 1.61 mM MgCl 2 , and 5 units of Taq DNA Polymerase (A pliTaqTM, Perkin-Elmer Cetus) in a total volume of 100 ⁇ l.
  • the addition of MgCl 2 beyond 1.61 mM up to, 3.11 mM is often used as a variable that when modified impacts the yield of DNA.
  • Templates were amplified in a Perkin Elmer Thermocycler (Norwalk, CT) using a "hot start" according to manufacturer's instructions (3 min.
  • the EIPCR template DNA is similar in size to the desired EIPCR product, hence they could copurify by agarose gel electrophoresis during further manipulations. While the circular template is present at a much lower concentration than the EIPCR product, it will electroporate at high efficiency in bacteria thereby resulting in the contamination of the library with wild-type clones.
  • pCCHAscl-ampR_SP The terminal extensions of DNA were removed by digestion with Bsa I, yielding compatible cohesive ends that efficiently undergo an intramolecular ligation.
  • the resulting plasmid, pCCHAscl-ampR_SP is a 3.2 kB pUC- derived plasmid which contains a single chain CHA225-
  • FIG. 2b illustrates the use of EIPCR to create a random signal peptide library.
  • Divergent oligonucleotide primers were used to amplify the DNA and to incorporate mutant sequences.
  • the forward primer, B524 (SEQ ID NO: 9) , annealed to the DNA corresponding to the N-terminal region of the scCHA antibody.
  • the reverse primer, B525 (SEQ ID NO: 10) , incorporated the initiator codon and 21 random codon triplets.
  • the reverse primer's annealing location is illustrated in Figure 2b.
  • Both the forward and reverse primers contained the Bsa I restriction site near their termini.
  • the Bsa I restriction site is present at both ends of the linear EIPCR product and was removed by digestion, yielding compatible cohesive ends.
  • the linear PCR product was then ligated and electroporated into E. coli .
  • EIPCR reactions contain only 0.5 ng of template DNA per 100 ⁇ l reaction (Stemmer, et al. supra) . Increasing the amount of template to 25-50 ng per 100 ⁇ l reaction resulted in more EIPCR product and a larger library size. As an example, when pCCHAscl-ampRV2 was used as a template, 5 ng of plasmid yielded only 0.25 ⁇ g of PCR product.
  • PCR product When 50 ng of template was used, 5 ⁇ g of PCR product was obtained. It is contemplated that the activity of different primer-template combinations will be optimized within a range of template concentrations from about 0.1 to 50 ng of template per 100 ⁇ l reaction. Those with skill in the art of molecular biology can readily determine the optimal template concentrations as well as adjust the concentration of MgCl 2 to maximize PCR reaction product.
  • the deletion of the critical regulatory sequence from the template prior to the incorporation of random sequence enables one to use greater amounts of template than those typically used in standard PCR reactions without concern for amplifying nonmutated template. EIPCR reaction conditions for the incorporation of random oligonucleotides were the same as for the cloning reactions.
  • the 100 ⁇ l PCR mixture was extracted with an equal volume of phenol-chloroform-isoamyl alcohol (24:23:1) and was then precipitated using 2 volumes of ethanol.
  • the protruding termini of the PCR end products were filled in and digested according to the method of Stemmer (Biotechnigue ⁇ , 1992 supra) . Briefly, the pellet was resuspended in 70 ⁇ l H 2 0 and 10 ⁇ l of 10X Klenow buffer (NEB, Tozer, MA), 10 ⁇ l dNTP mix (2.5 mM each dNTP), 5 ⁇ l of DNA Polymerase I (large fragment: Klenow) , and 5 ⁇ l of T4 DNA polymerase were added.
  • the reaction was incubated at 37°C for one hour followed by phenol extraction and ethanol precipitation as disclosed above. Blunt-ended EIPCR fragments were then exhaustively digested in a 100 ⁇ l reaction containing 50 units of Bsa I at 60°C for four hours.
  • the Bsa I digested fragment was purified away from both the PCR primers and the short terminal DNA fragment by centrifugation through a Biospin 6 column (BioRad, Richmond, CA) according to the manufacturer's protocol.
  • the purified Bsa I digested fragment was then phenol extracted and ethanol precipitated as described earlier and ligated under standard conditions with the exception that the fragments were ligated at higher concentrations of T4 DNA ligase (106 units/ ⁇ l) and incubation was at 12°C overnight.
  • Ligated or control DNA (pUC 19, GIBCO/BRL) was precipitated in 2 volumes of ethanol and resuspended in 20 ⁇ l of TE (lOmM Tris, pH 7.4, 0.1 mM EDTA, pH 8.0). 10 ⁇ g of yeast tRNA (GIBCO/BRL) was added as a carrier and did not affect electroporation efficiency.
  • the DNA was electroporated in 1-5 ⁇ l aliquots into electrocompetent DH10-B MAX cells (BRL, Bethesda, MD) according to the manufacturer's instructions.
  • pUC19 DNA (0.01 ng) was electroporated to monitor the electroporation efficiency.
  • the cells were centrifuged briefly and resuspended in 400_1 S.O.C.
  • a IO" 4 dilution was prepared in S.O.C. and 0.1 ml aliquots were plated on TB plates containing 34 ⁇ g/ml chloramphenicol with the indicated concentration of ampicillin.
  • This transformation strategy was compared to a transformation using 4 ⁇ l (approximately 1 ⁇ g) of signal peptide library DNA electroporated into 80 ⁇ l electrocompetent E. coli DH10B and processed as described for the control pCCHAscl-ampR plasmid except that the IO "4 dilution step was eliminated.
  • the library size was determined by electroporating 1 ⁇ l of library DNA and plating 90% of the electroporated cells onto media containing 34 ⁇ g/ml chloramphenicol and 100 ⁇ g/ml ampicillin and by plating serial dilutions of the remaining i ⁇ % of the electroporated cells onto media containing only chloramphenicol (34 ⁇ g/ml) . Under the latter conditions all clones produced colonies. The quality of the resulting signal peptide library was verified by DNA sequencing of the mutated region of four nonselected clones. In all clones the mutations were correctly incorporated. The composition of the mixed bases was 31% A, 19% G, 25% C and 25% T where the sample number was 119.
  • colony lifts of 23cmx23cm plates with 0.3- 1 x 10 5 colonies were prepared using BA83 nitrocellulose filters (Schleicher and Schuell, Keene, New Hampshire) .
  • the filters were blocked by incubation in 3% non-fat milk in 25 mM Tris-HCl pH 7.5 for 10 minutes, washed with 25 mM Tris, followed by incubation in 25 mM Tris containing 50 °Ci of chelated n ⁇ Indium per filter for 1 hour at room temperature.
  • the filters were then washed with 25 mM Tris ' for a total of 15 minutes, dried and exposed to Kodak X- Omat AR autoradiography film for several hours.
  • the cells were grown at 30°C in Terrific Broth (see supra) containing 34 ⁇ g/ml chloramphenicol and 50 ⁇ g/ml ampicillin in baffled shaker flasks at 250 rpm for 24 hours.
  • the periplasmic fraction of E . coli was prepared and isolated using the methods described by Witholt, et al. (Anal. Biochem . 74: 160-170, 1976, hereby incorporated by reference) .
  • E13 and E14 showed moderate levels of antibody activity but very low levels of beta lactamase activity. Also, clones like E10, E12 and E16, showed moderate levels of beta lactamase activity while exhibiting very little antibody activity. Only clone E6 showed levels of antibody and enzyme activity that were close to that for the control Omp A clone. Because there is the possibility of obtaining mutations in any PCR technology that may affect the activity of antibody or enzymes, we took several of the periplasmic preparations and evaluated them for their expression levels by Western blotting (Sambrook, et al. supra) since this method should be less sensitive to mutation (see Stemmer, et al. BioTechnigues , 1993 supra) .
  • Samples were developed with purified rabbit polyclonal antibodies to CHAsFv.
  • the antibodies were obtained from rabbit polyclonal antiserum. Rabbits were immunized by a traditional procedure (J. Immunol . 116:1306-1312, 1975) that employed multiple intramuscular injections of CHA255 monoclonal antibody (100 ⁇ g amounts) in Freund's adjuvant. Serum obtained from the rabbits were processed to isolate a purified polyclonal antibody preparation to CHA255 Fv by absorption and elution from an immunoadsorbant column that was conjugated with CHA255 chi eric antibody. The CHA255 chimeric antibody was cloned and produced as described previously (J . Immunol . 145:1200-1204, 1990 and Protein
  • E5 was higher than that of E6, with E12 and E10 slightly lower in expression than E6 (data not shown) .
  • E12 and E10 slightly lower in expression than E6 (data not shown) .
  • E12 and E10 slightly lower in expression than E6 (data not shown) .
  • E12 and E10 slightly lower in expression than E6 (data not shown) .
  • E12 and E10 slightly lower in expression than E6 (data not shown) .
  • E12 and E10 slightly lower in expression than E6 (data not shown) .
  • E6 was subjected to DNA sequencing to determine the sequence of the DNA that coded for the signal peptide (see Table 4) .
  • a number of the clones contained stop codons within their sequence.
  • clone E6, which directed increased protein expression in the antibody and enzyme activity tests and was second only to E5 in the Western blot contained several stop codons in the signal sequence.
  • the EIPCR method involves amplification of the entire plasmid, there is always the possibility that one can have mutational errors occurring in any location in the vector of the selected library clones. Thus, it is always recommended that after the DNA sequence is determined for the mutated library segment, the selected new sequence is recloned using new DNA oligonucleotides with standard cloning procedures to confirm that the sequence is in fact directing the expected effect identified from the originally selected mutant sequence.
  • the signal sequence library E we chose to redone E5 and ElO signal sequences in place of the natural /3-lactamase signal sequence and compare expression with Omp A driven /3-lactamase.
  • oligonucleotides containing the new sequences were cloned into the vector pCLA3ampR ( Figure 3) .
  • This PUC derived vector contains a constitutively expressed chloramphenicol gene, Lad, and the /3-lactamase gene. Expression of /3-lactamase is driven by the Lac promoter and is controlled by the Omp A RBS sequence and Omp A signal sequence. The activity of the Lac promoter is negatively regulated by the LacI gene product, the Lac repressor.
  • pCLA3ampR E5 was accomplished by PCR using primers B947 (SEQ ID NO: 24) and 940 (5' and 3' primers respectively, see Figure 9) with the pCLA3ampR template using standard PCR conditions as disclosed supra. The product was cut with Xba I and Pst I. The PCR product was then ligated back into pCLA3ampR, which had previously been digested with Xba I and Pst I. The construction of pCLA3ampR ElO was accomplished in the same way, except that the 5' primer was B946 (SEQ ID NO: 25) .
  • clones were grown in Terrific Broth containing 30 ⁇ g/ml chloramphenicol in baffled shaker flasks rotating at 250 rpm for 24 hours at 30°C.
  • IPTG Boehringer Mannheim, Indianapolis, IN
  • the periplasmic fraction was obtained as described by Witholt, et al. (supra) .
  • the periplasmic samples were analyzed for /3-lactamase activity using the PADAC assay described earlier.
  • E5 and ElO signal sequences were then compared for structural properties with selected signal sequences derived from E. coli proteins (Omp A, DGAL, LamB, ELT6, Omp T, Pel B, PhoA) . These signal sequences, as well as a variety of others are available from the GenBank database.
  • the program Protean (DNAstar, Madison, WI) was used to evaluate alpha helix vs beta sheet regions (Garnier-Robson and Chou-Fasman methods) as well as the hydrophilicity index (Kyte-Doolittle method) .
  • Both E5 and ElO displayed the characteristic hydrophobic core of amino acids seen in all of the E . coli derived signal sequences ( Figure 4) .
  • the ElO sequence contained no alpha helix regions. Alpha helix content varied with all of the native E . coli signal sequences evaluated. Also, an interesting feature of the E5 signal sequence was that the final 3 residues of the sequence, Thr, Asp, Thr, correspond to those designated in the art as a -3,-1 box which provides the recognition sequence for cleavage by the E. coli signal peptidase (Protein Targeting, supra) . The sequence in the -3,-1 box for E5 differs from all known E. coli sequences. The presence of this unique -3,-1 sequence is not detrimental to the ability of E5 to export a functional /3-lactamase protein to the periplasm.
  • periplasmic extracts were fractionated on SDS- polyacrylamide gels. The proteins were transferred to nitrocellulose and probed with an anti-CHA Fv antibody. Results from the Western blots indicated that the fusion proteins isolated from the bacterial clones were _53 kD. This is close to the predicted size of the fusion protein.
  • FIG. 5 illustrates the use of primers B755, B756, B757, and B758 (corresponding to SEQ ID NOS: 26 - 30 respectively) that were used in EIPCR reactions to produce signal peptide libraries of varying length.
  • These libraries were constructed using primer B755 and one of the following primers: B756, corresponding to the random 12 aa signal peptide library; B757, corresponding to the random 16 aa signal peptide library; and B758, corresponding to the random 20 aa peptide library.
  • the varying length libraries were constructed such that the random signal sequences were incorporated directly onto the amino terminus of the gene sequence encoding /3-lactamase
  • active clones are defined as those clones which grow on 34 ⁇ g/ml chloramphenicol plus 10 ⁇ g/ml ampicillin. The following percentages of active clones were identified. Active clones are defined as those which grown on 34 ⁇ g/ml chloramphenicol plus 10 ⁇ g/ml ampicillin) :
  • the clones were also analyzed by PADAC assay
  • ribosome binding site (GAGG) from the Omp A gene (Omp A sequence from -18 to -1) followed by the Omp A signal sequence (+1 to +63) to direct the /3-lactamase protein to the periplasm.
  • the pCampR vector has a wild type DNA /3-lactamase sequence.
  • Bsa I enzyme for the EIPCR library reaction, a Bsa I site in the 3-lactamase gene was first removed. Removal was accomplished by standard EIPCR methodology (disclosed above) using the oligonucleotide primers 939 (SEQ ID NO: 3) and 940 (SEQ ID NO: 4) (see Figure 6) .
  • the resulting vector was designated pCampR_BSA.
  • the entire region containing the Omp A derived ribosome binding site was removed from pCampR_BSAI.
  • Sixteen bases were removed from position -16 to -1 by a single round of EIPCR using standard conditions and primers B667 (SEQ ID NO: 30) and B668 (SEQ ID NO: 31) (See Figure 6) .
  • the resulting vector was designated pCampR_RBS and was unable to grow on plates containing 100 ⁇ g/ml ampicillin.
  • pCa pR ⁇ RBS was used as the template in an EIPCR reaction containing primers B765 (SEQ ID NO: 32) and B766 (SEQ ID NO: 33) (See Figure 6) .
  • Primer B766 contained a sequence of 16 NTPs (25% mixture of each base) so as to exactly replace the prior deleted 16 base sequence with a random DNA sequence.
  • the EIPCR library reaction was accomplished using the EIPCR conditions as disclosed by Stemmer, et al. (BioTechnigues 13:114-220, 1992) . This reaction included low amounts of template (0.5-1 ng/100 ⁇ l reaction mixture) .
  • Example 1 This is in contrast to Example 1, which required much larger amounts of DNA template (up to 50 ng/100 ⁇ l reaction mixture) .
  • the amount of template DNA that needs to be used in a particular EIPCR library reaction can be experimentally determined by those with skill in the art. As illustrated in Example 1, deletion of the entire functional region of the DNA prior to introducing the library sequence eliminates the concern for wildtype clones contaminating the library and enables the use of large amounts of template DNA for EIPCR, if indeed large amounts are required.
  • the EIPCR reaction material was processed to fill in the ends with Klenow and T4 DNA polymerase, cut with Bsa I, and ligated as described in Example 1 except that after the Bsa I digest, the DNA was electrophoresed on an agarose gel and the band containing the DNA was cut out and processed using GeneClean (BiolOl, San Diego, CA) , according to the manufacturer's instructions, to separate the digested from undigested fragments.
  • GeneClean BiolOl, San Diego, CA
  • the ligated RBS library DNA was electroporated by adding 1 ⁇ l (50 ng) of DNA to 20 ⁇ l of electrocompetent DH10-B MAX cells (BRL, Bethesda, MD) according to the manufacturers instructions. Under conditions to select for the plasmid marker (34 ⁇ g/ml chloramphenicol) , the library size was estimated to be 1.7 x 10 6 colony forming units/ ⁇ g ligated DNA. Under dual selective conditions (34 ⁇ g/ml chloramphenicol + 50 ⁇ g/ml ampicillin) , an equivalent amount of DNA was estimated to produce 24,040 colony forming units (cfu), which indicates that 1.4% of the colonies contained potential mutants.
  • cfu colony forming units
  • PADAC as says to measure j ⁇ -lactamase activity showed a wide range of activity (Table 7) , with several clones exhibiting activities higher than what was obtained with the wildtype Omp A RBS in the pCampR ⁇ BSA vector. After several repeat cultures and PADAC assays, it was clear that RBS clones 1, 8 and 12 showed higher levels of expressed 3-lactamase activity than the wildtype Omp A RBS clones pCampR ⁇ BSAI (see Table 7) .
  • the DNA coding sequence for the RBS regions in four clones were determined by DNA sequencing and new oligonucleotides containing these sequences were synthesized to enable expression analysis in a new vector that had not undergone EIPCR.
  • the DNA sequence for RBS clones #1, 2, 8 and 12 are provided in Figure 8. As noted above, for clones 1, 2 and 8 there was an additional insertion due to improper cutting of the terminus of the EIPCR primer (inserted nucleotides shown in bold) .
  • RBS long the full length RBS sequence containing the insertion is herein referred to as RBS long
  • RBS short the RBS sequence without the insertion
  • DNA oligonucleotide primers coding for resistance, RBS 2 long and short, RBS 8 short and RBS 12 were synthesized. Their sequences are shown in Figure 9.
  • the vector used to retest the activity of these various new RBS sequences was pCLAampR ( Figure 3) , disclosed above.
  • the various clones were produced by performing a standard round of PCR in which the various primers were included and pCLAampR was used as the template ( Figure 8) .
  • the PCR products were filled in with Klenow and T4 DNA polymerase, followed by restriction digestion using Xba I and Pst I.
  • the products were cloned into the pCLAampR vector that had been previously cut with the same restriction enzymes.
  • the DNA was ligated using the methods provided in Example 1, and DH10B bacteria were transformed using the electroporation procedures disclosed above. Clones were selected for growth on chloramphenicol and grown overnight in Terrific Broth media + 34 ⁇ g/ml chloramphenicol. The next day, cultures were seeded at
  • DNA sequence is from the XBAl site (-16 position relative to the ompA signal peptide sequence) to the start of the Omp A signal sequence (-1 position) .
  • Bolded nucleotides indicate the DNA sequence corresponding to an insertional error due to blunt ended ligation of a EIPCR DNA primer that failed to be cut by BSA1.
  • Underlined nucleotides indicate areas of potential complementarity with the 3' end of E. coli 16S ribosomal RNA. (ie. potential Shine Dalgarno sequences) .
  • Example 4
  • FIG. 7a illustrates the vector contemplated for use in this invention, pGCEMK.
  • the vector uses the immunoglobulin Kappa Light Chain promoter (from the kappa light chain gene of the murine monoclonal antibody CEM 231) to direct transcription of a desired gene.
  • the native CEM kappa light chain variable region it is driving the native CEM kappa light chain variable region, but one skilled in the art will be aware that any natural or recombinant variable region could replace the native Variable region and the same process of signal sequence mutagenesis could be accomplished. Further, the entire light chain coding sequences could be replaced by those from any eukaryotic gene and a similar process carried out. In the present example, the aim of the mutagenesis is to determine if an alternate signal sequence can be obtained of equivalent or improved efficacy, as measured by expression of mature light chain.
  • pGCEMK Other components of the vector, pGCEMK, include the major intron from the human kappa light chain gene, containing its native enhancer sequence; the human kappa constant region, containing its native polyadenylation signal; a Xanthine-guanine phosphoribosyl transferase gene (gptR) under the control of an SV 40 promoter to permit selection of stable transformed cells; the bacterial colEl origin of replication, to permit high copy number plasmid replication in E . coli .
  • the plasmid, pGCEMK was constructed from the plasmids pHF-1 and pMLCE-10, which have been deposited with the American Type Culture Collection as Accession numbers 67,637 and 67,639, respectively. These plasmids and the construction pGCEMK are described in U.S. patent application Serial No. 07/727,719, filed July 2, 1991, which is incorporated herein by reference in its entirety.
  • the native signal sequence is removed.
  • this region is replaced with the first two codons of the native signal to allow efficient initiation of translation to take place. These are fused directly to the first codon of the Variable region.
  • Random 60-65 mer oligonucleotides are inserted into the vector by PCR.
  • One skilled in the art has the option of conserving the signal intron in the resultant construct or deleting it, as long as the control containing the native signal has a comparable format.
  • the vector is introduced into suitable eukaryotic cells that are capable of expressing protein from the expression vector (for example, SP 2/0 cells, American Type Culture Collection, Rockville, MD) .
  • electroporation is used to mediate transfection; however, those skilled in the art will be readily able to select other equally appropriate methods for introducing their selected expression vector into a suitable cell type.
  • Positive cells are preferably selected by growing the cells on media containing hypoxanthine, mycophenolic acid and xanthine (HMAX) .
  • cells expressing a detectable level of protein are identified by ELISA. Positive colonies are further quantitated for protein expression by a quantitative ELISA technique as compared with cells expressing protein under the control of the native regulatory sequence, and using the vector without random oligonucleotides added as a negative control. Cells expressing equal or greater levels of protein as compared with cells containing the native regulatory sequence are subjected to DNA sequencing, and identified as optimal signal sequences for the particular protein expressed.
  • a critical component of immunoglobulin (Ig) protein trafficking is the binding of the chaperon protein BiP to immunoglobulin heavy chain and the subsequent displacement of BiP by light chain (Nature 306:387-389,
  • a mammalian expression vector is selected that contains a heavy chain gene.
  • Figure 8 illustrates one heavy chain chimeric vector, pNCEMGl, that is useful for practicing the trafficking methods of this invention.
  • This vector uses the Immunoglobulin heavy chain promoter to transcribe the desired gene inserted into a cloning site.
  • the vector includes an immunoglobulin heavy chain polyadenylation signal and includes a neomycin resistance gene.
  • the neomycin resistance gene is under the control of the SV40 promoter, thereby permitting the selection of stable transformed cells in media containing the antibiotic G418.
  • the bacterial colEl origin of replication permits high copy number plasmid replication in E . coli .
  • the normal Bip binding region located within the human gamma 1 constant region gene is operably linked to the recombinant variable region in this vector.
  • a second vector is now constructed that is identical to the first except that the BiP binding region is deleted from the C H1 domain.
  • Random 12-20 mer sequences are inserted into the vector using the PCR methods described in Example 1.
  • the vector is introduced into suitable eukaryotic cells, which produce a light chain complementary to the heavy chain used (e.g. the cell lines produced in Example 4) .
  • the vector can be introduced into cells using methods well known in the art. Here, electroporation is used to mediate transfection. Positive cells are preferably selected by growing the cells in media containing G418.
  • cells expressing a detectable level of protein are identified by ELISA. Positive cells are further quantitated for protein expression by a quantitative ELISA technique as compared with cells expressing the protein under the control of the native regulatory sequence, and using the vector without random oligonucleotides added as a negative control. Cells expressing equal or greater levels of intact antibody as compared with positive control cells and that do not express an increased level of free light or heavy chains are subjected to DNA sequencing to determine the DNA sequence of the trafficking signals most efficacious for expression of the immunoglobulin. While particular embodiments of the invention have been described in detail, it will be apparent to those skilled in the art that these embodiments are exemplary rather than limiting, and the true scope of the invention is that defined in the following claims.

Landscapes

  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Wood Science & Technology (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Microbiology (AREA)
  • Plant Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Medicinal Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)

Abstract

This invention discloses a method for optimizing the production of a polypeptide in a cell comprising the steps of identifying a regulatory region within a nucleic acid sequence to be mutagenized; preparing a nucleic acid vector comprising that region and encoding at least one polypeptide regulated by said regulatory region; deleting that region from the vector; producing a pool of random oligonucleotides; using PCR to introduce one random oligonucleotide into the position previously occupied by the regulatory region into each of a plurality of vectors to generate a pool of mutagenized vectors; introducing the mutagenized vectors into a cell sample; assaying for the expression of the polypeptide in that sample; and selecting and isolating those cells exhibiting optimized polypeptide expression. The regulatory regions contemplated within the scope of this invention include signal and protein trafficking sequences, ribosome binding sites, promoters, and translational regulatory sequences.

Description

METHOD FOR CREATING OPTIMIZED REGULATORY REGIONS AFFECTING PROTEIN EXPRESSION AND PROTEIN TRAFFICKING
This invention relates generally to methods for the mutagenesis of nucleic acid sequences and more specifically to mutagenesis strategies involving polymerase chain reaction-related technologies and to optimized regulatory sequences generated by these methods.
Gene expression can be regulated at various steps on the path from DNA to RNA to mature protein. The overall path is the same for pro aryotes and eukaryotes with regulation possible at the level of transcription of RNA from the DNA template, translation of mRNA at the ribosome, and targeting of protein to appropriate locations within the cell or outside. However, the differences between prokaryotes and eukaryotes lead to a number of differences in how gene expression can be regulated in each. Intracellular membranes and organelles are critical components of eukaryotic cells, but are not found in prokaryotes. On the other hand, the potential for coupled transcription and translation in prokaryotes is not possible in compartmentalized eukaryotic cells. Eukaryotic genes have introns interrupting the coding sequences, prokaryotic genes do not. On the other hand, prokaryotes put coding sequences for multiple genes under a single promoter and transcribe them as a single polycistronic mRNA, but eukaryotes do not. Most eukaryotic genes require RNA splicing, capping, poly-adenylation, and transport from the nucleus to the cytoplasm before the mRNA becomes functional, but prokaryotes do not. In addition, eukaryotes require a more complicated process for protein targeting to distinguish proteins destined for different organelles, the cell membrane, and for secretion, as well as post translational modification of many eukaryotic proteins before they can be assembled or become functional. In EUKARYOTES, the expression of a particular gene can be regulated at transcription, RNA processing, RNA transport, translation, protein targeting, and protein activation/modification. Specific gene sequences are responsible for effects at each level and these can be modified to increase expression according to this invention, if eukaryotic expression is desired.
At the level of transcription, these sequences act directly, by interacting with DNA binding proteins. Examples include: Promoters which interact with transcription factors and RNA polymerase to initiate transcription. Upstream promoter elements which are generally near to the promoter and interact with DNA binding proteins to either improve or inhibit the ability of polymerase to interact with the promoter. The transcription of the genes encoding the DNA binding proteins described above and below is another level of regulation of specific eukaryotic genes. Enhancers, which are quite distant from the promoter, bind to other DNA binding proteins to enhance the level of transcription of the gene, although specific sequences are also present in enhancers which can inhibit transcription under certain conditions or in specific cell types. One skilled in the art can readily determine whether transcription is increased or decreased by specific sequences within upstream promoter elements or enhancers in a specific cell type. Thus, enhancers, promoters and upstream promoter elements can be modified to increase expression according to this invention. At the level of RNA processing, primary transcripts are capped at their 5' ends, poly-adenylated at their 3' ends and have their introns spliced out (sometimes leading to different protein products due to alternative RNA splicing) prior to being transported through pores in the nuclear membrane to cytoplasm, where they associate with ribosomes. Important RNA sequences which can affect the efficiency of this process include the poly-A addition site, the Cap site, and the splice acceptor and donor sites, and DNA sequences encoding these sites can be modified to increase expression according to this invention.
To participate in translation, a eukaryotic mRNA must be transported out of the nucleus. Specific recognition signals on the mRNA are believed necessary for transit of mRNA through the nuclear pore. Other signals may retain certain mRNAs within the nucleus, where they are degraded. As many as half of the unspliced heterologous nuclear RNAs produced from primary transcripts are degraded without ever leaving the nucleus, offering a potential for regulation at another level, the level of RNA transport.
DNA sequences encoding these RNA signals could be modified to increase expression according to this invention as well.
Not all mRNAs that escape the nucleus are translated into protein. Eukaryotic mRNAs vary in susceptibility to degradation. Binding to ribosomes decreases their degradation. Important sequences that can be modified to control expression at the level of translation are those recognized by specific translation repressor proteins, which bind to the 51 end of some mRNAs and block translation initiation. For those mRNAs which are not blocked, current data suggests that initiation of translation is by the so-called "scanning model" of initiation. A ribosome component (4OS) binds initially to the 5• end of an mRNA then migrates 3• until it finds an initiation site where the other portion of the ribosome joins it to form the 80S initiation complex. The Kozak sequences surrounding the translational start site determine whether a particular mRNA will be efficiently translated. The presence of an Adenine (A) at the -3 position has as much as a twenty-fold impact on translation. In general, the translation rate is most affected by the nucleotide three prior to the initiator codon and the one immediately after it [best if a Guanine (G) ] . The DNA sequence encoding binding sites for translation represser proteins and for the region around the initiator can be modified to increase expression according to this invention.
At the level of protein targeting, eukaryotic proteins have a diversity of target locations within the cell including the nucleus, nuclear membrane, cytoplasm, mitochondria, mitochondrial inner membrane, mitochondrial outer membrane, endoplasmic reticulum, golgi apparatus, lysosomes, lysosomal membranes, endoplasmic reticulum membrane, and cell membrane. In addition, secreted proteins are targeted to the exterior of the cell. The process begins, as for prokaryotes, with a signal sequence directing the nascent peptide and its ribosomes to a membrane. But in this case, the membrane is the rough endoplasmic reticulum (RER) . If the protein is to be secreted, it is translated into the RER and begins a journey through different membrane bound organelles and vesicles to the cell membrane where it is extruded to the outside by exocytosis. Proteins destined to be membrane- bound are also translated into the RER, but do not completely transit the rough endoplasmic reticulum membrane (RERM) , becoming integrated into it. This membrane fuses with that of the Golgi apparatus or of vesicles which, in turn, bud off and fuse with the destination membrane where the protein either becomes bound or transits further to its final target. Targeting to the proper site is almost certainly facilitated by identifiable target signals within the sequence of the protein that direct the protein to the proper destination. Thus, the sequences that can be modified to increase expression according to this invention include those encoding the signal sequence, transmembrane sequences, cytoplasmic anchor sequences, and the putative specific membrane targeting sequences.
In addition, numerous secreted proteins, including immunoglobulins, consist of multiple chains that must be assembled prior to secretion. In immunoglobulins, for example, the heavy chains cannot be secreted if they are not bound to light chains. The heavy chains that do not bind to light chains, either because of improper folding or an absence of free light chains, remain bound to a protein in the RER called BiP, which prevents their transit to the Golgi apparatus. Thus, it is clear that additional regulatory elements that can be modified according to the method of this invention to increase expression are those which encode structural features that facilitate or impede correct folding of a peptide or its association with other chains necessary to form the mature protein product.
At the level of protein activation/modification, many eukaryotic proteins are synthesized as inactive or partially active precursors that become activated by proteolytic cleavage or other modifications (e.g., phosphorylation) . Although, not required for expression in many cases, secretion cannot occur without proteolytic cleavage of the signal peptide. Inefficiency in this process can delay or block secretion of functional protein. DNA sequences encoding proteolytic cleavage sites, phosphorylation sites, glycosylation sites, or other sites for post-translational modification can be modified according to this invention to yield increased expression of functional protein.
In PROKARYOTES, there are fewer stages where regulation of gene expression can occur, and prokaryote specific elements and associated proteins are involved. Regulation does take place at the levels of transcription, translation, and protein targeting, as for eukaryotic genes, but RNA processing and RNA transport are totally absent in prokaryotes. There are also far fewer known cases of post-translational modification of proteins in bacteria than in eukaryotes. Of special importance, there are very few authenticated cases of glycosylated bacterial proteins, although proteins are modified in other ways (e.g., fatty acylation). No clear picture of regulation at the level of post-translational modification has emerged. Like eukaryotes, prokaryotes offer a number of specific DNA sequences which can be modified according to this invention to increase expression.
At the level of transcription in prokaryotes identifiable DNA sequences interact with DNA binding proteins. The promoter binds RNA polymerase and, with the release of the σ subunit, initiation of transcription occurs. Unlike in eukaryotes, distant enhancer sequences have not been seen in prokaryotes. Different promoter sequences have different affinities for RNA polymerase. Transcription initiation can be positively regulated by binding of a catabolite gene activator protein (CAP) to a site immediately upstream of the promoter and, in many genes, can be negatively regulated by binding of a repressor protein to a site immediately downstream of the promoter called the operator. The level of transcription of these DNA binding proteins can affect the transcriptional level of the regulated gene. As one skilled in the art will appreciate, binding sites can be modified both in the target gene and in the gene encoding the repressor or CAP using the present invention to increase or decrease the level of transcription of prokaryotic genes.
Sites amenable to modification to increase expression at the level of translation also exist in prokaryotic DNA, but are distinct from those in eukaryotes. A region originally known as the Shine and Dalgarno sequence or more recently as the ribosome binding site found on the 5' end of prokaryotic mRNAs is complementary to the 16S ribosomal RNA. Similar to eukaryotic initiators, an Adenine (A) is preferred at the -3 position. The Shine and Dalgarno interaction is unique to prokaryotes and is unlike the scanning model used in eukaryotes to initiate translation described above. The DNA sequence which encodes the ribosome binding site in mRNA can be modified according to this invention to increase expression.
There are proteins which bind to sites on specific mRNAs to inhibit their translation (e.g., ribosomal proteins) . In addition, transcription can be affected by 5' coding sequences which can lead to premature termination of transcription based on charged tRNA availability. This system, called attenuation, is seen in genes for amino acid biosynthetic pathways, but is unique to prokaryotes because it requires coupled transcription and translation, which can occur only in cells having no nuclear membrane separating the two processes. Placement of the gene within the operon also has an effect on translation of specific genes. The 5* genes are more efficient in initiation than those at the 3 ' end of the polycistronic message. The "polarity effect" is amplified by the fact that binding to ribosomes actually delays mRNA degradation. The DNA sequences which encode protein sequences responsible for attenuation can be modified according to this invention to increase expression. At the level of protein targeting, as with eukaryotes, specific targeting sequences are required for different types of targeting. These sequences encode portions of the protein with hydrophobic or hydrophilic properties corresponding to the protein's point of association with the membrane. The most common sequence is the N-terminal signal sequence, which is responsible for targeting the protein and its associated ribosomes to the cell membrane. Internal sequences determine if portions of the protein will reside within or outside of the lipid bilayer of the membrane or if the protein will be secreted into the periplas ic space (in gram negative) or directly into the media (in gram positive bacteria) .
The efficiency of secretion depends upon the signal sequence as well as the cellular machinery required for transport. This includes peptidases for removing the signal peptide and bacterial cell membrane proteins, both of which have been implicated as potential signal peptide receptors. A Signal Recognition Particle (SRP) analogue has recently been confirmed for bacteria (Luirink J., et al. Nature, 359:741-743, 1992. Another significant difference between prokaryotes and eukaryotes is the limited number of target membranes in bacteria.
Due to all of these differences in the mechanisms of expression between eukaryotes and prokaryotes heterologous expression of eukaryotic genes in prokaryotic cells faces a number of hurdles. First, all introns must be removed or truncated products will be produced. Second, eukaryotic upstream promoter elements and enhancers are not only non-functional in prokaryotes but can be deleterious. Third, eukaryotic signals, while occasionally functional, are never optimized for prokaryotes, and often are totally ineffective. Often fortuitous prokaryote-like signal sequences are used that are distinct from the native eukaryotic signal. Fourth, the absence of functional prokaryotic ribosome binding sequences often leads to little or no translation of the eukaryotic message. Fifth, the absence of prokaryotic codon usage can slow translation. Finally, the DNA sequence encoding an mRNA for a eukaryotic signal could be completely misinterpreted in a prokaryotic expression system. For example, it could be recognized by a prokaryotic DNA binding protein, or the mRNA could experience inhibitory folding or be more rapidly degraded than in the native cell. Because such miscommunication depends in part upon the sequence of the gene to be expressed, it is possible that different genes inserted into the same vector in the same heterologous expression cell will attain different levels of expression. For example, a leader sequence optimized to attain the highest level of expression of one eukaryotic protein gene in E. coli may not be the leader sequence that will optimize expression of a different eukaryotic protein gene in E. coli .
The present invention overcomes these difficulties by providing a means to customize heterologous expression for any particular desired protein. The present method assays the combined effect of mutations made in the DNA of one or more particular control sequences upon all steps taken during the whole process of heterologous expression leading up to secretion of a particular desired protein. Therefore, the present method provides a means to customize modifications to discrete sequences in eukaryotic DNA which impact expression of the desired protein in the heterologous host cell.
For a general review of the differences in eukaryotic and prokaryotic gene expression and protein transport see: Watson, JD, et al. Molecular Biolocry of the Gene 4th edition. Menlo Park, Benjamin Cummings Press, pp. 359-618 (1987) ; Alberts, et al. Molecular Biology of the Cell 2nd ed. New York, Garland Publ., pp. 551-612 (1989); Reznikoff, W. and Gold, L. eds. Maximizing Gene Expression Boston, Butterworths Publ., pp. 1-34, 195-286 (1986);
Pugsley, A.P. Protein Targeting San Diego, Academic Press, pp. 1-168 (1989) and Watson, et al. Reco binant DNA: A Short Course New York, Freeman Publ. pp. 45-57. For a review of protein translocation across the E . coli plasma membrane see Wichner, et al. Annu . Rev. Biochem . 60:101-24, 1991 and Bieker, et al. Trends in Gen . 6:329-333, 1990. The references listed in this application are incorporated herein by reference to the extent that they supplement, explain, or provide a background for each methodology, technique and/or composition employed herein. This invention relates to methods for optimizing the expression of polypeptide in a cell and to methods for creating and isolating novel regulatory sequences for the expression of polypeptide in a cell. The regulatory sequences contemplated within the scope of this invention include, but are not limited to, signal sequences, ribosome binding sites, promoter sequences, translational regulatory sequences, transcription regulatory sequences, protein trafficking sequences, enhancer sequences, and the like.
In a preferred aspect of the invention, a method is provided for optimizing the production of polypeptide in a cell comprising the steps of (a) identifying at least one regulatory region within a nucleic acid sequence to be mutagenized; (b) preparing a nucleic acid vector comprising the regulatory region and a nucleic acid sequence encoding at least one polypeptide regulated by the regulatory region; (c) deleting the regulatory region from the vector; (d) producing a pool of random oligonucleotides; (e) using a polymerase chain reaction to introduce at least one random oligonucleotide into the position previously occupied by the regulatory region in a plurality of vectors to generate a pool of mutagenized vectors; (f) introducing the mutagenized vectors into a cell sample; (g) assaying for the expression of the polypeptide in the cell sample; (h) selecting cells exhibiting optimized polypeptide expression; and (i) isolating optimized polypeptide from the cells of step (h) . Preferably the regulatory region is located within the nucleic acid sequence encoding polypeptide and in one preferred embodiment the regulatory region is in a translated portion of the nucleic acid sequence encoding polypeptide. In another preferred embodiment, the regulatory region is outside of the nucleic acid sequence encoding polypeptide. The method is useful for optimizing regulatory regions selected from the group consisting of signal sequences, ribosome binding sites, promoter sequences, translational regulatory sequences, transcription regulatory sequences and protein trafficking sequences. In a preferred embodiment of this invention, the nucleic acid sequence encodes an antibiotic resistant gene and the selection step consists essentially of growing the cell sample in the presence of an antibiotic. In another aspect of this embodiment, the nucleic acid sequence encoding polypeptide encodes a selectable marker and in yet another aspect of this embodiment, the nucleic acid sequence encoding polypeptide encodes a fusion protein. In a preferred embodiment the nucleic acid sequence encoding polypeptide is derived from a eukaryotic cell and the cell sample is prokaryotic. In a preferred method for optimizing the production of polypeptide in a cell, the method additionally comprises the steps of introducing the nucleic acid vector into a second cell sample, expressing the polypeptide encoded by the vector in the second cell sample, and measuring the level of polypeptide expression in the second cell sample. This method optionally includes the step of selecting cells from the first cell sample that exhibit optimized polypeptide expression relative to the measuring step. Once optimized cells are selected, it is contemplated that the nucleic acid sequence of the random oligonucleotide corresponding to the mutagenized regulatory region producing optimized polypeptide expression can be identified.
In another preferred embodiment of this invention, a method is provided for creating and isolating novel signal sequences comprising the steps of: (a) identifying a signal sequence within a nucleic acid sequence encoding a polypeptide; (b) preparing a nucleic acid vector comprising the nucleic acid sequence encoding the polypeptide; (c) introducing the vector into a first cell sample and expressing the polypeptide in the first cell sample; (d) measuring the level of polypeptide expression in the first cell sample; (e) deleting the signal sequence from the vector; (f) producing a pool of random oligonucleotides; (g) using a polymerase chain reaction to introduce at least one of the random oligonucleotides into the position previously occupied by the signal sequence in a plurality of vectors to generate a pool of mutagenized vectors; (h) introducing the mutagenized vectors into a second cell sample; (i) assaying for the expression of the polypeptide in the second cell sample; (j) selecting cells exhibiting optimized polypeptide expression relative to step (d) ; and (k) determining the nucleic acid sequence of the regulatory region contained in the mutagenized vector located in the cells of step (j). In one embodiment the optimized level of polypeptide expression is a level of expression greater than or equal to the level of expression of the polypeptide in step (d) . Preferably the nucleic acid vector encodes an antibiotic resistant gene and the selection step additionally consists essentially of growing the cell sample in the presence of an antibiotic. The nucleic acid sequence encoding polypeptide preferably encodes a selectable marker and the polymerase chain reaction is preferably an enzymatic inverse polymerase chain reaction. In one embodiment the nucleic acid sequence encoding polypeptide encodes a .fusion polypeptide and in a particularly preferred embodiment a portion of the fusion protein is derived from 3-lactamase. In another embodiment a portion of the fusion protein is derived from an antibody. Preferably the nucleic acid sequence encoding polypeptide is derived from a eukaryotic cell and the first and second cell samples are prokaryotic. Still more preferably, the cell samples are E . coli .
In yet another preferred embodiment of the method for creating and isolating novel signal sequences, the random oligonucleotides are biased and preferably contain at least one positively charged amino acid at the N- terminus, a stretch of at least 8 hydrophobic amino acids and a small amino acid such as alanine, glycine, or valine positioned at the C-terminus.
In another preferred method of this invention, a method is disclosed for optimizing polypeptide expression in a cell by performing random mutagenesis on a regulatory region regulating polypeptide expression from a nucleic acid vector wherein the improvement comprises deleting a region of nucleic acid to be mutagenized, isolating the vector containing the deletion and replacing the region to be mutagenized with a random nucleic acid sequence.
In a preferred method of this invention a method is provided for creating and isolating novel ribosome binding sites, comprising the steps of: (a) identifying a ribosome binding site within a nucleic acid sequence; (b) preparing a nucleic acid vector comprising the nucleic acid sequence containing a ribosome binding site operably linked to a nucleic acid sequence encoding a polypeptide; (c) introducing the vector into a first cell sample and expressing the polypeptide in the first cell sample; (d) measuring the level of polypeptide expression in the first cell sample; (e) deleting the nucleic acid sequence containing the ribosome binding site from the vector; (f) producing a pool of random oligonucleotides; (g) using a polymerase chain reaction to introduce at least one of the random oligonucleotides into the position previously occupied by the nucleic acid sequence containing the ribosome binding site in a plurality of vectors to generate a pool of mutagenized vectors; (h) introducing the mutagenized vectors into a second cell sample; (i) assaying for the expression of the polypeptide in the second cell sample; (j) selecting cells exhibiting optimized polypeptide expression relative to step (d) ; and determining the nucleic acid sequence containing the ribosome binding site located in the mutagenized vector introduced into the cells of step (j). In a preferred embodiment of this method the nucleic acid sequence encodes an antibiotic resistance gene and the selection step consists essentially of growing the cell samples in the presence of an antibiotic. Preferably, the nucleic acid sequence encoding polypeptide encodes a selectable marker. In one embodiment the nucleic acid sequence encoding polypeptide is a fusion protein and preferably the fusion protein is derived from an antibody and in another preferred embodiment the fusion protein is derived from jS-lactamase.
In another preferred aspect of this invention, polypeptide signal sequences are disclosed. These signal sequences correspond to SEQ ID NO: 13, SEQ ID NO: 16 and SEQ ID NO: 14 and to polypeptide signal sequences containing at least contiguous amino acid 10-mers thereof. In another preferred aspect of this invention, nucleic acid sequences are disclosed corresponding to ribosome binding sites identified as SEQ ID NO: 35, SEQ ID NO: 37, and SEQ ID NO: 38 and to ribosome binding sites corresponding to at least contiguous 5-mers thereof.
In yet another preferred embodiment of this invention, a method is provided for creating and isolating novel signal sequences, comprising the steps of: (a) identifying a signal sequence within a nucleic acid sequence encoding a polypeptide; (b) preparing a nucleic acid vector comprising the nucleic acid sequence encoding a polypeptides, wherein the polypeptide is a fusion protein having a C-terminus corresponding to /3-lactamase; (c) deleting the signal sequence from the vector; (d) producing a pool of random oligonucleotide suitable for an enzymatic inverse polymerase chain reaction; (e) using an enzymatic inverse polymerase chain reaction to introduce the random oligonucleotides into the position in the vector previously occupied by the signal sequence to generate a pool of mutagenized vectors; (f) introducing the mutagenized vectors into a first cell sample of E . coli ,' (g) assaying for the expression of the fusion protein in a first cell sample in the presence of ampicillin; and (h) selecting cells exhibiting optimized polypeptide expression. In one embodiment of this method, the method additionally comprises the steps of introducing the vector comprising the nucleic acid sequence encoding a polypeptide into a second cell sample of E . coli , expressing the polypeptide in the second cell sample and measuring the level of polypeptide expression in the second cell sample. In one embodiment this method includes the added step of selecting cells from the first cell sample that exhibit optimized polypeptide expression relative to the level of polypeptide expression observed in the measuring step. Preferably, the optimized polypeptide expression is a level of polypeptide expression greater than or equal to the level of polypeptide expression obtained from the measuring step. In another preferred embodiment, the method additionally comprises the step of determining the nucleic acid sequence of the random oligonucleotide introduced into the position in the vector, previously occupied by the signal sequence, isolated from cells identified by the selecting step. In yet another preferred embodiment of the invention the concentration of ampicillin is at least 30 μg/ml and the nucleic acid sequence encoding polypeptide is a sequence encoding a single-chain antibody.
In another method of this invention, a method is disclosed for identifying a novel nucleic acid sequence encoding a protein trafficking signal that directs a polypeptide to a desired location in a cell, comprising the steps of: (a) identifying a region of nucleic acid containing at least one protein trafficking signal to be mutagenized; (b) preparing a nucleic acid vector comprising the protein trafficking signal sequence to be mutagenized and a nucleic acid sequence encoding at least one polypeptide; (c) deleting the protein trafficking signal sequence from the vector; (d) producing a pool of random oligonucleotides; (e) using a polymerase chain reaction to introduce at least one of the random oligonucleotides into the position previously occupied by the protein trafficking signal in a plurality of vectors to generate a pool of mutagenized vectors; (f) introducing the mutagenized vectors into a cell sample; (g) assaying for the location of the polypeptide in the cell sample; (h) selecting cells expressing the polypeptide in the desired cell location; and (i) determining the nucleic acid sequence of the protein trafficking signal contained in the mutagenized vector located in the cells of step (h) . In a preferred embodiment of the trafficking method of this invention, the protein trafficking sequence to be mutagenized is located in the nucleic acid sequence encoding the polypeptide and in another embodiment the polymerase chain reaction is the enzymatic inverse polymerase chain reaction and the nucleic acid sequence encodes an antibiotic resistant gene and the selection step consists essentially of growing the cell sample in the presence of an antibiotic. Preferably the desired cell location is extracellular. In another aspect of this invention, a method is provided for creating and isolating novel regulatory sequences useful for optimizing the expression of a reco binant polypeptide in prokaryotic cells comprising the steps of: (a) identifying at least one regulatory region within a nucleic acid sequence to be mutagenized; (b) preparing a nucleic acid vector suitable for expressing polypeptide in a prokaryotic cell, the nucleic acid vector comprising the regulatory region and a nucleic acid sequence encoding at least one polypeptide operably linked to the regulatory region; (c) introducing the vector into a first prokaryotic cell sample and expressing the polypeptide encoded by the vector in the first cell sample; (d) measuring the level of polypeptide expression in the first prokaryotic cell sample; (e) deleting the regulatory region from the vector; (f) producing a pool of random oligonucleotides; (g) using a polymerase chain reaction to introduce at least one random oligonucleotide into the position previously occupied by the regulatory region in a plurality of vectors to generate a pool of mutagenized vectors; (h) introducing the mutagenized vectors into a first prokaryotic cell sample; (i) assaying for the expression of the polypeptide in the second prokaryotic cell sample; (j) selecting cells exhibiting optimized polypeptide expression relative to step (d) ; and (k) determining the nucleic acid sequence of the mutagenized regulatory region located within the vector introduced into the cells of step (j).
In a preferred embodiment of this method, the regulatory region is selected from the group consisting of a signal sequence, a ribosome binding site, a promoter sequence, a translational regulatory sequence, a transcription regulatory sequence and a protein trafficking sequence. In one embodiment the nucleic acid sequence encoding polypeptide encodes an antibiotic resistant gene and the selection step additionally consists of growing the second prokaryotic cell sample in the presence of an antibiotic. Preferably the first and second cell samples are derived from the same cell type. Figure 1 diagrams the assembly of the single chain antibody β-lactamase fusion protein and the incorporation of the fusion protein with the Omp A signal sequence into the expression vector pCCHAscl-ampRv2.
Figure 2 illustrates the strategies followed to delete the Omp A regulatory region, to incorporate the random oligonucleotides, and to generate a library of random signal peptide mutants.
Figure 3 is a diagram of plas id pCLA3ampR that was used to redone identified signal sequences to test for signal sequence effectiveness.
Figure 4 is a comparison of the hydrophobicity and the alpha and beta region analysis of two novel signal peptide sequences identified using the methods of this invention as compared with known signal peptide sequences. Figure 5 illustrates the construction of variable length signal peptide libraries using the methods of this invention.
Figure 6 is a diagram of the plasmid modifications used to produce the plasmid containing the ribosome binding site library.
Figure 7 is a diagram of plasmid pGCEMK.
Figure 8 is a diagram of plasmid pNCEMGl.
Figure 9 illustrates sequences of some of the oligonucleotides containing the ribosome binding site and signal peptide sequences identified using the methods of this invention.
This invention provides methods useful for creating novel regulatory sequences that affect recombinant protein expression within either a prokaryotic or eukaryotic cell. In particular, these methods advantageously facilitate the optimization of recombinant protein expression in a cell. Using the techniques of this invention, modifications can be incorporated into more than one regulatory region and the cumulative effect of these mutations can be assessed by isolating clones expressing the desired level of recombinant protein expression.
The term "polypeptide" and "protein" are used interchangeably throughout this text.
The term "heterologous protein expression" is used herein to refer to protein that is not native to the host cell expressing the heterologous protein.
The term "optimized protein expression" is used herein to refer to the cumulative conditions that provide an optimal level of protein expression for a particular protein in a particular cell system. Under one set of laboratory conditions, optimized protein expression may refer to the highest available level of protein expression, while in another set of laboratory conditions, optimized protein expression may refer to low level protein expression because for that particular application low level protein expression is preferred. In yet another set of laboratory conditions optimized protein expression may refer to the level of protein expression that can coexist with cell life in situations where, under standard conditions, the protein would be cytocidal to the cells. The term "operably linked" is used to mean regulatory regions that ultimately influence, or effect the level of recombinant protein expression from a nucleic acid vector. The term "regulatory region is outside of a nucleic acid sequence encoding polypeptide" is used herein to indicate that the regulatory region of interest is not located within the coding region. For this purpose, the term coding region is used to denote that region of DNA that corresponds to RNA beginning with a methionine codon and ending with a polyadenylation signal.
While the methods of this invention are suitable for the optimization of recombinant protein sequences in either prokaryotes or eukaryotes, it is contemplated that these methods are particularly useful for the optimization of heterologous protein sequences in either prokaryotes or eukaryotes. In particular, these methods are useful for optimizing the expression of eukaryotic protein in prokaryotic cells.
Expression of eukaryotic protein in a prokaryotic system is often commercially desirable. Prokaryotic cells typically have shorter doubling times than eukaryotic cells and they are easier and less expensive to grow in large quantity than eukaryotes. However, optimizing expression of a eukaryotic protein in a prokaryotic cell such as E. coli has heretofore been an inefficient process. Typically a eukaryotic gene sequence encoding protein that is incorporated into a prokaryotic expression vector is expressed at relatively low levels or at levels that result in prokaryotic cell death. Given the multitude of differences in protein expression between prokaryotes and eukaryotes, as outlined in the background of this invention, it is often difficult to identify exactly why a particular eukaryotic sequence encoding protein is either not efficiently expressed or is bacteriocidal. The methods of this invention do not require a systematic assessment of each individual factor that can effect protein synthesis. Rather, the methods of this invention employ a novel random mutagenesis and selection strategy that advantageously permits one of skill in the art to identify clones that yield optimized levels of heterologous protein expression by assessing the cumulative effect of all of the changes incorporated into the expression vector. Thus, by the method of this invention, it is possible to select clones having mutagenized regulatory sequences individually tailored to optimize expression of any given protein in any given expression cell.
Most mutagenesis techniques involve the systematic replacement of one or more nucleotides with other known nucleotides. A variety of site-directed mutagenesis strategies are known in the art and are commercially available in kit format (see BioRad, Richmond, CA; Stratagene, La Jolla, CA; or Invitrogen, San Diego, CA.) Here, using site-directed mutagenesis, one begins with a known, efficient regulatory region and improves it by introducing single or multiple point mutations. In this procedure, the mutagenized regulatory region may or may not be biased according to what is known about the chemical properties of the wild-type regulatory region. For site- directed mutagenesis techniques used to optimize a prokaryotic signal peptide through the modification of a hydrophobic region of the signal peptide see Kendall, et al. Nature 321:706-708, 1986. For a review of site- directed mutagenesis strategies for prokaryotic signal peptides see Ryan, et al. J . Biol . Chem . 261:3389-3395, 1986; Park, et al. Agric . Biol . Chem . 55:1745-1750, 1991; vigal, et al. Mol . Gen . Genet . 231:88-96, 1991; Borchert, et al. J . Bacteriol . 173:276-282, 1991; Morioka-Fujimoto, et al. J. Biol . Chem . 266:1728-1732, 1991. Similarly, cassette mutagenesis strategies are also known in the art. These strategies involve removing a section of DNA between two restriction sites and replacing it with a different DNA sequence that is bounded by the same restriction sites (Richards, J.H. , (1991) "Cassette Mutagenesis" in Directed Mutagenesis, A Practical Approach, M.J. McPherson ed. IRL Press) . Chou, et al. (J. Biol . Chem . 265:2873-2880, 1990) studied the hydrophobicity requirements of the prokaryotic signal sequence using cassette mutagenesis. Ngsee, et al. used cassette mutagenesis in eukaryotic cells to study the effect of signal sequence mutants on protein expression (Mol . Cell . Biol . 9:3400-3410, 1989). These techniques replace identified sequences with known alternatives. However, unlike the mutagenesis strategies of this invention, these traditional site-directed and cassette mutagenesis techniques are not efficient methods for creating novel functional sequences.
In the practice of this invention, random or biased mutagenesis techniques are useful methods for generating a pool of novel mutants. Regions of oligonucleotides, or whole oligonucleotides, are synthesized using methods that permit the random incorporation of nucleotides. Random mutagenesis combined with good selection systems can be used to identify functional regulatory sequences lacking any predisposed bias to previously identified sequences. The collection of random sequences is incorporated into the gene sequence in place of the native sequence.
For purposes of this application, the term random mutagenesis will be used to encompass biased mutagenesis techniques. Biased mutagenesis is a type of random mutagenesis in which pools of oligonucleotides are prepared to accommodate the incorporation of more than one type of nucleotide in a given location within a nucleic acid sequence. Biased mutagenesis techniques are particularly useful for generating pools of mutants that maintain the consensus patterns of charge, hydrophobicity, hydrophilicity, etc. within a particular region of protein encoded by a nucleic acid sequence. In this way the chemical characteristics of a molecule can be maintained while the actual nucleic acid sequence and amino acid sequence of an encoding protein is varied. The application of biased mutagenesis to regulatory regions has the advantage that optimized sequences can be identified that are chemically biased toward native sequences but reflect a wider variation than what has previously been identified in the art. Some of these sequences may direct protein expression more efficiently than other sequences heretofore identified. Strategies of random mutagenesis to select clones containing functional mutations is complicated by the contaminating background level of clones containing nonmutated sequences. The presence of nonmutated sequences complicates selection and screening strategies by decreasing the variability within a library and by increasing the number of false positive clones. A large number of the clones produced by traditional random mutagenesis techniques are false positive because they contain the functional nonmutated sequence. Since the presence of novel optimized sequences from a random or biased pool of nucleotides is a relatively rare event, contaminating protein expression directed from nonmutated sequences can easily block the signal of the rarer, optimized sequence. In this invention the drawback of non-mutated clones is avoided by substantially removing the wild-type sequence and replacing it with the randomly mutagenized sequence before selection of clones for function characteristics. As noted, this invention is particularly useful for generating novel regulatory sequences associated with protein expression. The eukaryotic regulatory sequences contemplated within the scope of this invention include, but are not limited to promoters, upstream promoter elements, enhancers, transcriptional regulatory elements, translational regulatory proteins such as translation repressor proteins, signal sequences, protein targeting sequences, chaperon proteins (these binding proteins coat polypeptide chains as they emerge from ribosomes or membranes, preventing aggregation reactions and premature folding (Flynn, et al. Nature 353: 726, 1991) and sequences directing post-translational modifications and the like. The prokaryotic regulatory sequences contemplated within the scope of this invention include, but are not limited to, promoters, transcription regulatory sequences, translation regulatory sequences including ribosome binding sites, sequences associated with attenuation, protein targeting sequences, signal sequences, and the like.
Thus, the regulatory sequences may be in any number of locations relative to the gene sequence encoding protein, for instance, within the translated region of a coding sequence, within a transcribed region of a coding sequence, or external to the coding sequence. For example, the signal sequence for both eukaryotic and prokaryotic cells is positioned at the amino terminus of the nucleic acid sequence encoding protein. Since the primary sequence of the signal peptide has a direct effect on the translocation of the associated protein (Inouye, et al. Proc . Natl . Acad . Sci . , USA 74: 1004-1008, 1977) , the signal peptide is a candidate regulatory sequence suitable for the methods of this invention. Example 1 illustrates the application of the methods of this invention to the generation of novel signal sequences. As a second example to illustrate that the methods of this invention can be applied to a variety of regulatory sequences, Example 3 provides an exemplary strategy useful for generating novel ribosome binding sites.
The regulatory sequences created by the methods of this invention may be of any length. For example, while most signal sequences fall between a range of between 18-24 amino acids, it is contemplated that novel optimized signal sequences could be created in a variety of lengths. Example 2 provides methods for generating libraries of signal peptides having lengths of 12, 16 or 20 amino acids. Similarly these methods could be applied to other regulatory regions such as ribosome binding sites, protein targeting sequences and the like.
Identification of a Regulatory Region to be Mutagenized As a first step for practicing the methods of this invention, the position of the regulatory region to be mutagenized is identified. Regulatory regions contemplated within the scope of this invention have been discussed above. The relative position of the regulatory region can be identified within a particular nucleic acid sequence by studying the consensus data for that particular regulatory region available from the prior art. There exists in the literature a wealth of information concerning the consensus positions of known regulatory regions. For example, regions encoding signal peptides have been localized for both prokaryotic and eukaryotic genes as have regions encoded on mRNA which are required for ribosome binding (Protein Targeting, supra. pp. 52-61; Kozak, Microbiol . Rev . 47:1-45, 1983; and Kozak, Nucl . Acids Res . 15: 8125- 8147, 1987). The precise identification of the position of a regulatory region within or in association with a nucleic acid sequence encoding protein is not necessary. Rather, any particular region of the nucleic acid containing a putative regulatory region postulated to exist within a nucleic acid sequence can be subjected to the mutagenesis techniques of this invention. These putative regulatory regions may be postulated in the art or they may be identified by trial and error using site-directed mutagenesis, or other mutagenesis techniques known in the art.
Preparing the Nucleic Acid Vector
Once the regulatory region to be mutagenized is identified, it is then incorporated into a nucleic acid vector together with a nucleic acid sequence encoding at least one protein under the control of the regulatory region to be mutagenized. Any suitable expression vector may be used in this invention and exemplary vectors are provided in the Examples below. Those with skill in the art will appreciate that the choice of vector is limited to those vectors capable of directing expression of the nucleic acid sequence encoding protein and to those vectors that can incorporate and support the function of the regulatory region to be mutagenized. Further, the choice of vector is limited by the cell type selected; not all vectors and not all regulatory elements necessary for recombinant protein expression function in all cell types. As a general rule, eukaryotic expression vectors are suitable for protein expression in eukaryotes and prokaryotic expression vectors are suitable for prokaryotes. Both types of vectors are commercially available and those with skill in the art of molecular biology will be able to select the appropriate vectors suitable for recombinant protein expression within a given cell type. In addition, the following vectors used in the
Examples of this invention (pCLA3ampR; pCCHAscl-ampRV2; and pCampR) have been deposited with the American Type Culture
Collection (Bethesda, Maryland) under the Budapest Treaty.
Methods for incorporating a particular region of nucleic acid into a nucleic acid vector are well known in the art of molecular biology (see Sambrook, et al., supra) . For example, short regions of nucleic acid (less than 200 bp) can be prepared by generating overlapping oligonucleotide fragments complementary to the nucleic acid. These oligonucleotides are then hybridized to one another, ligated and incorporated into an appropriate expression vector. Alternatively, those with skill in the art of molecular biology will be able to use the polymerase chain reaction technology to amplify a suitable nucleic acid fragment containing the regulatory gene of interest and incorporate this fragment into the expression vector of choice (see generally Erlich, H.A. PCR Technology: Principles and Applications for DNA Amplification, 1992. W.H. Freeman and Co., New York). As noted in the preceding paragraphs, the nucleic acid vector additionally contains a nucleic acid sequence encoding a recombinant protein product such that the regulatory region is operably linked to and/or regulates expression of the protein encoded by the gene. Since the regulatory region may be positioned in the nucleic acid vector as part of the nucleic acid sequence encoding protein or as a. nucleic acid sequence adjacent to the nucleic acid sequence encoding protein, the assembly of the nucleic acid vector containing the regulatory region to be mutagenized may require multiple steps. Those skilled in the art of molecular biology will be able to select an initial nucleic acid vector suitable for expressing a protein product from a gene sequence in a host cell and to incorporate both the gene sequence encoding the protein product and the gene sequence encoding the regulatory region into the vector using techniques of genetic engineering well known in the art. Incorporation of both the regulatory sequence and the gene sequence encoding the protein product into the nucleic acid vector is designed such that once the nucleic acid vector is introduced into a compatible cell sample, if the regulatory sequence is functional, protein expression will be detected. Compatible cells are those cells capable of expressing a protein from the nucleic acid vector when the vector contains those regulatory elements that facilitate protein expression (i.e. origins of replication, etc.) .
It is anticipated that other gene sequences may additionally be incorporated into the nucleic acid vector of this invention. For example, gene sequences coding for antibiotic resistance, or other gene sequences that facilitate cell selection may be introduced into the vector. Similarly, the nucleic acid sequence encoding protein can itself encode an antibiotic resistance gene or a portion of an antibiotic resistance gene. Other sequences that may be incorporated into the vector include but are not limited to promoters, enhancers, polyA+ adenylation sites, origins of replication (eukaryotic and/or prokaryotic) , and specific restriction endonuclease cleavage sites (or arrays of sites) . Exemplary nucleic acid vectors containing a regulatory region to be mutagenized and a nucleic acid sequence encoding protein are provided in Examples 1-3 as well as Figures 1-3 and 6.
Expression of the Reconibinant Protein under Control of the Regulatory Region to be Mutagenized
In order to identify enhanced or optimized protein expression from a particular mutagenized regulatory region relative to the regulatory region to be mutagenized, some base line information should be obtained for protein produced under the control of the nonmutagenized regulatory region. This data is later compared to the level of protein expression influenced by the mutated regulatory region. This data may be obtained from the literature, or the data may be obtained first hand. As one method for obtaining the data first hand, the vector containing the nonmutagenized regulatory region is introduced into a cell sample that is able to support the expression of the gene sequence under the control of the regulatory region.
There are a variety of methods recognized in the art for introducing nucleic acid vectors into prokaryotic and eukaryotic cells. Depending on whether the cells are prokaryotic or eukaryotic, the vectors may be introduced as viral vectors, electroporated into the cells, bacterial cell walls may be permeabilized, the vectors may be introduced through standard precipitation methods or through the use of lipophilic agents. Such methods are commercially available as kits and detailed methods are readily available in the literature. Methods for transforming E . coli with nucleic acid vectors are provided in Example 1.
Next, the level of protein expression from the nonmutagenized nucleic acid vector is measured in the cell sample. Proteins of interest may be concentrated from cell supernatants or cell lysates and quantitated by chromatographic methods, enzyme-linked immunoadsorbant assays (ELISA) , western blot assays, radio-immunoassays, gel electrophoresis, or the like. Exemplary assays for measuring the amount of protein expressed in the periplasmic space of E . coli are provided in Example 1 and are detailed in Stemmer, et al. BioTechnigues 13:114-220, 1992, hereby incorporated by reference and U.S. Patent Application No. 07/641,140, filed April 26, 1991, and U.S. Patent Application No. 07/806,154, filed December 12, 1991. Alternatively, an exemplary assay for measuring the amount of protein expressed in a eukaryotic cell is provided in Example 4. Methods to detect protein expression from a nucleic acid vector in a cell will vary depending on the gene sequence encoding protein. Those with skill in the art of molecular biology are well versed at selecting gene sequences encoding protein and matching these sequences with methods suitable for detecting protein expression derived from that gene sequence in a particular cell type. The value of protein expression obtained from cells expressing recombinant protein under the influence of the nonmutagenized regulatory region provides a level of expression that can be compared with that of cells expressing protein from vectors with mutagenized regulatory regions. Deletion of the Regulatory Region to be Mutagenized
An important aspect of this invention involves the deletion of the region to be mutagenized before the pool of random oligonucleotides are incorporated into the vector. This step of the invention advantageously removes the endogenous background level of protein expression that contaminates mutagenic libraries produced by other methods. The regulatory region can be deleted from the nucleic acid vector in a variety of ways. For example, during vector construction, two identical vectors may be simultaneously assembled such that one contains and the other lacks the mutagenized regulatory region. The regulatory region can be deleted from the vector using PCR technology and an exemplary strategy is disclosed in Example 1. Similarly the regulatory region can be deleted through the use of restriction endonuclease recognition sites bracketing the regulatory region.
Preparation of Random Oligonucleotides As a next step for practicing the methods of this invention, a pool of random oligonucleotides is prepared that are suitable for PCR methods and can be readily incorporated into the position in the vector previously occupied by the regulatory region. These oligonucleotides are positioned in the nucleic acid vector in place of the deleted regulatory region. As noted previously, the term "random oligonucleotides" includes biased oligonucleotides. Methods for generating random oligonucleotides on a DNA synthesizer are well known in the art. In this invention the random oligonucleotides are of a size and design useful for directing a polymerase chain reaction and for incorporating the nucleic acid vector containing the nucleic acid sequence encoding protein. That is, the oligonucleotides should contain regions of homology to the vector, they should include a restriction endonuclease site to facilitate recircularization of the plasmid following incorporation of the vector, and they should include a region containing the random or biased nucleotides. The oligonucleotides can be of any desired length, and the length of the oligonucleotide will depend on the type of regulatory sequence and the number of random oligonucleotides that will be incorporated into the vector.
The oligonucleotides prepared in this invention contain at least one region that incorporates more than one nucleotide at a given position within the oligonucleotide. This pool of random oligonucleotides is incorporated into the vector to produce a library of clones, containing random regulatory sequences. Libraries prepared from random oligonucleotides, and particularly libraries prepared using the methods of this invention wherein the regulatory region to be mutagenized is deleted, have the advantage that the selected active clones will likely be novel regulatory sequences. Because of the large theoretical number of random regulatory regions be they signal peptides, ribosome binding sites, or the like, the methods of this invention are more likely to generate novel regulatory sequences that direct increased or optimized protein expression than other methods currently available in the art.
One potential difficulty of a completely random regulatory library is that the majority of the sequences generated are inactive. This means that the libraries must be sufficiently large in order to identify a sufficient number of active clones for further study. If the goal of the mutagenesis procedure is to generate a regulatory region that significantly increases the expression of the recombinant protein, then this type of regulatory region would occur infrequently in a random library. For example, in an E . coli library, if the probability of finding a "super sequence" is 10"9 there is a good chance that it would not appear in a library of 108 clones. Exemplary Calculation to Estimate the Library Size Necessary for Identifying Novel Regulatory Sequences
The following example is provided as one method for estimating an adequate library size that would be required to identify a novel, optimized sequence. This example is specifically addressed to the library size required to identify an optimized signal peptide in a prokaryotic cell using biased oligonucleotides. It is contemplated that those with skill in the art would be able to apply the strategy exemplified by the calculations used in this example to determine the library size required for other regulatory regions in either prokaryotes or eukaryotes.
The known requirements for a signal sequence are first incorporated into the calculation. Known signal peptides contain one or two positively charged amino acids near the N-terminus, a stretch of 8-12 hydrophobic amino acids and a small amino acid such as alanine, glycine or valine positioned at the C-terminus. For consensus regions in prokaryotic signal sequences see Maclntyre, et al. Mol . Gen . Genet . 221:466-474, 1990; Gennity, et al. J. Bioenergetics and Biomembranes 22:233-269, 1990; Laforet, et al. J . Biol . Chem . 266:1326-1334, 1991 and Yamane, et al. J . Biol . Chem . 263:19690-19696, 1988 and for consensus regions and exemplary eukaryotic signal sequences see Von Heijne, et al. Eur. J. Biochem . 133:17-21, 1983.
The fraction of active leader peptides in this example is (f) = pi*, p2*, p3* ....pn, where pi, p2, p3...pn represent the probability of a given amino acid or amino acids, occurring in a given position. In a hypothetical library containing a 22 amino acid signal peptide, the peptide can be diagrammed as [M] [+][+] [n]ι9[S] where each [] represents an amino acid, [M] represents methionine, [+] represents a positively charged amino acid, [n] represents any hydrophobic amino acid and [S] represents a tiny amino acid. The fraction of active leader peptides (f) = (1) (.15) (.15) (.66)18(.2) = 7.3 x IO"6 Pugsley, supra. p. 47, Sambrook, supra, Appendix p. 6. Thus a signal peptide of this particular configuration would occur once in 300,000 random clones. This value reflects the incidence of an active signal peptide within a library and not the frequency of an optimized signal peptide. One would expect that the frequency of an optimized signal peptide occurring in a random library to be less; therefore the library size would need to be larger.
Several assumptions were made in the above calculation. First, it was assumed that two positive charges are required at the N-terminus. In fact, there are known signal peptides that carry only a single positive charge at their N-terminus. If only a single positive charge is required, active leader peptides will occur at a higher frequency. Similarly, for this calculation, the hydrophobic region was limited to only hydrophobic amino acids. There are some cases where amino acids such as gin or asn or even acidic or basic residues occur in this region. Thus, the calculation of (0.66)18 may be too restrictive. If 2 amino acids in the hydrophobic stretch are permitted to be any of the 20 amino acids, then we arrive at a value of (0.66)16. Finally, the impact of the degeneracy of the code was ignored in these calculations. Although the last amino acids in known signal peptides are usually glycine or alanine, these amino acids are generally believed to be required for signal peptide cleavage. Since signal peptide cleavage is not required for translocation and β-lactamase activity, that requirement can be eliminated. If these less restrictive requirements are used, the frequency of a leader peptide occurring in a random library is 0.02% or 1 in 5000 random sequences. These types of calculation are important for estimating the required library size needed to identify a novel, optimized sequence.
In practical terms, a biased library is difficult to construct due to the constraints on the usage of the genetic code. For example, if one tries to use oligonucleotide doping to produce either a lysine or arginine in a given position, the codon for glutamine will also occur. Similarly, due to the degeneracy of the genetic code, it is impossible to constrain an amino acid to solely a hydrophoibic amino acid using single base oligonucleotide synthesis from variable base mixtures. In some cases it is possible to use doping to achieve the desired amino acid mixture. For example, a 50% occurrence of alanine or glycine will occur if the sequence 5* G(G/C)X is used. The successive addition of pre-made trimers can obviate this problem. Precise ratios of desired amino acids can be designed into the library through use of premixed trinucleotide codon triplets during DNA synthesis (Sondek, et al. Proc . Natl . Acad . Sci . (USA) 89:3581-3585, 1985) . Given current DNA synthesizer technology this would be a time consuming and expensive undertaking.
As another method for generating a biased library the oligonucleotide column swapping DNA synthesis technique developed by Glaser, et al. is used (J . Immunol . 149:3903- 3913) . This method produces a 50% chance of a mutation at any desired amino acid. This is a codon based mutagenesis approach that allows the formation of large diverse libraries. The column-swapping manipulation causes one- half of each synthesis step to contain random codons, while the other half contains wild-type codons, at each position to be mutated. In this way, one can mutate large stretches of DNA without a high frequency of stop codons, as only one of the three possible stop codons can occur, and this only at a frequency of 1 in 32. The use of a suppressor tRNA for this stop codon can reduce the frequency to zero. However, the preparation of biased oligonucleotides to identify optimized regulatory sequences is not an absolute requirement for the methods of this invention. Optimized sequences can be identified from clones incorporating random oligonucleotides containing no chemical bias. In a preferred embodiment, illustrated in Example 1, the invention demonstrates that even with no preconceived notions about the specific requirements for a regulatory region, the bacterial host will naturally select those regulatory regions that it finds functional in expressing the protein of interest.
Incorporation of the Oligonucleotides into the Nucleic Acid Vector As a next step for practicing the methods of this invention, the mutagenized oligonucleotides are introduced using a polymerase chain reaction into the position in the vector previously occupied by the deleted regulatory region. There are a variety of polymerase chain reaction methods known in the art that are suitable to the methods of this invention. The examples below use the enzymatic inverse polymerase chain reaction (see Stemmer, et al. BioTechnigues 13:114-220, 1992, hereby incorporated by reference and U.S. Patent Application No. 07/641,140, filed April 26, 1991, and U.S. Patent Application No. 07/806,154, filed December 12, 1991) as a preferred PCR strategy. EIPCR advantageously allows targeted mutagenesis of a given DNA sequence and rapid selection of active clones from a large library. Other PCR techniques suitable for use in the methods of this invention include, but are not limited to, inverse PCR (He sley, et al. Nucl . Acids Res . 17:6545- 6551, 1989) , recombinant circle PCR mutagenesis (Jones, et al. BioTechnigues 8:178-183, 1990) , recombination PCR (Jones, et al., BioTechnigues 10:62-66, 1991) and standard PCR techniques such as those described by Saiki, et al. (Science 239:487-491, 1988) .
Briefly, the enzymatic inverse polymerase chain reaction (EIPCR) involves the use of oligonucleotides containing a class IIS restriction endonuclease recognition site and the associated downstream class IIS cleavage site. The EIPCR technique uses a small circular vector and the PCR primers are designed to overlap at their class IIS cleavage sites. In one embodiment, one oligonucleotide of the oligonucleotide pair used in the EIPCR reaction contains in a first 5' to 3' orientation a 5' tail, a class IIS restriction endonuclease recognition site, a class IIS cleavage site, the region containing the mutation and a region complementary to the nucleic acid vector. The other oligonucleotide of the oligonucleotide pair comprises in a second 5' to 31 orientation; a 51 tail, a class IIS restriction endonuclease recognition site, a class IIS cleavage site and a region complementary to the nucleic acid vector. In another preferred embodiment, both oligonucleotides of the oligonucleotide pair used in the EIPCR reaction contain the first 5' to 3' orientation. Further details of primer design applicable to the mutagenesis application of this invention are provided in Examples 1-3. The amplified linear vectors are digested with the appropriate restriction endonuclease and recircularized to produce a pool of mutagenized vectors. The pool of mutagenized vectors is purified and aliquots are introduced into a cell sample using any of the methods disclosed in association with the discussion relating to cell transformation or transfection discussed above. Example 1 uses electroporation to introduce the mutagenized vectors into the cell sample. Preferably the cell samples used for both the mutagenized and nonmutagenized vector transformations or transfections are derived from the same cell type. In this way, protein expression from cells containing the nucleic acid vector having the nonmutagenized regulatory region can be directly compared to expression from cells containing the nucleic acid vector with the mutagenized regulatory region.
Assaying for the Expression of the Recombinant Protein from the Mutagenized Vectors
Following introduction of the mutagenized vectors into the cell sample, the cell sample is assayed for the expression of the recombinant protein, that is, the protein associated with the mutagenized regulatory region. As noted in the section entitled "Expression of the Recombinant Protein under Control of the Regulatory Region to be Mutagenized", the particular assay for assessing the level of protein expression will depend on the particular protein being expressed as well as on the cell type. Exemplary methods for assessing protein expression in the periplasmic space of prokaryotes are provided in Example 1. For enhanced protein expression, cells are selected that exhibit protein expression from the nucleic acid vector at a level greater than or equal to the level of protein expression observed in the cell sample containing nonmutagenized vector. For optimized protein expression, other characteristics, in addition to the level of protein synthesis, may be important for clone selection. These include cell growth characteristics, the location of protein expression or the like.
Referring to Example 1 below, the processes of transcription, translation, translocation and signal peptide cleavage are parameters that are affected when the signal sequence is mutated. The use of a random signal sequence allows for selection of clones that are optimized for all of these parameters using the internal mechanisms of the host cell. For example, a random signal sequence that generates a nonfunctional secondary structure in RNA would preclude efficient translation and positive clones containing this sequence would not be identified. Assays for recombinant protein expression from the mutagenized vectors permits the cumulative effect of the mutated sequence on protein expression to be quantitated. In addition, these parameters are optimized with respect to the individual requirements of the product proteins being expressed by the regulatory regions such as glycosylation requirements, folding requirements for functionality, and the like.
As a final step in the methods of this invention, the nucleic acid vector is isolated from cells derived from the cell sample that contain mutagenized vector and exhibit enhanced or optimized expression of the recombinant protein. The purified vector is subjected to DNA sequence analysis to determine the nucleic acid sequence of the mutagenized regulatory region corresponding to the random oligonucleotide. DNA sequencing techniques are well known in the field of molecular biology, thus no further disclosure of sequencing techniques is required.
Identification of optimized regulatory sequences is a particularly important step in the method of optimizing protein expression from any number of vectors. Thus, this invention provides a method for both optimizing regulatory regions associated with protein expression and for identifying the optimized sequences of these regions. Those with skill in the art of molecular biology will be able to incorporate these optimized sequences into other constructs thereby optimizing expression of any number of proteins from a variety of vectors introduced into a variety of cell types. Incorporation of Selectable Markers into the Nucleic Acid Vector
Rather than improving upon existing regulatory regions, this invention creates novel regulatory sequences, including novel signal peptides. As noted in the section entitled "Exemplary Calculation to Estimate the Library Size Necessary for Identifying Novel Regulatory Sequences", one would expect that a library of 21 random amino acid sequences would contain a relatively low percentage of functional secreting clones. However, by attaching the random regulatory region to a selectable marker, particularly a marker which requires secretion for activity, the library can be restricted to only clones containing functional signal peptides. Selectable marker proteins, particularly those in prokaryotes require transport to the periplasmic space to be active, have the advantage that when a regulatory region is operably linked to the amino-terminus of a heterologous protein containing the C-terminus of a periplasmic protein, this will result in transport to the periplasmic space and the selectable marker is able to confer a subsequent advantage to cell growth under certain conditions. The in-frame fusion of a heterologous protein to a selectable marker provides a rapid means to select only active regulatory sequences. Furthermore, if the selectable marker is an antibiotic resistant gene, then the use of increasing concentrations of antibiotic allows for selection of clones containing the most efficient regulatory sequences.
The enzyme β-lactamase represents one such selectable marker because it is active only if translocated from the cytoplasm to the periplasm (Pluckthun, et al. J. Biol . Chem . 262(9): 3951-3957, 1987 and Francisco, et al. supra) . While Example 1 uses a regulatory region operably linked to a eukaryotic protein fused to the C-terminus of /5-lactamase, it also is contemplated that the regulatory region can be operably linked to a full-length selectable marker such as /J-lactamase (see Example 3) . However, the creation of novel regulatory regions by assessing protein expression in the periplasmic space is not limited to /3-lactamase fusion proteins. Other selectable marker proteins can also be used. As an alternative, aminoglycoside 3'-phosphotransferase II can be used as a selectable marker in the same way. This enzyme is present in bacteria that are resistant to streptomycin, neomycin or kana ycin. The enzyme renders the antibiotic inactive as it enters the cell through the periplasm. Other periplasmic enzymes include Streptomycin adenylating enzyme and alkyl sulfhydrolase.
Other selectable markers contemplated for use in the methods of this invention include binding proteins for various carbon sources. Proper secretion of this class of proteins leads to their presence on the outer membrane or periplasm, enabling the cell to internalize a particular sugar or amino acid. For example, maltose binding protein, when properly secreted, allows bacteria to use maltose as the sole carbon source. If the maltose binding protein is not secreted, cells cannot grow on media containing maltose as the sole carbon source. In some cases, it is necessary to use the appropriate auxotrophic strain of E. coli . For example a his-negative strain requires histidine in the media. Other selectable proteins that may be useful as fusion proteins include but are not limited to arabinose binding protein, arginine binding protein, cystine and diaminopimelic binding protein, galactose-glucose binding protein, glutamate-aspartate binding protein, glutamine binding protein, histidine binding protein, leucine specific binding protein, lysine-arginine-ornithine binding protein, phosphate binding protein, ribose binding protein, sulfate binding protein, thiamine binding protein and xylose binding protein. One can also use nonselectable markers. These are proteins that do not necessarily give the cell any physiological advantage. Their presence in the periplasm is detected based on their interaction with specific ligands. With the use of these markers it is possible to detect the periplasmic protein by probing with the appropriate ligand. Examples of these markers include but are not limited to hapten binding antibodies, cytochrome c, nitrite reductase, phosphoglucomutase and phosphoglucose isomerase. Therefore, while Example 1 employs a random sequence library linked to a fusion protein of a single chain antibody with an antibody resistance gene, the regulatory region could also be linked solely to a sequence encoding an antibody sequence and positive clones selected by screening with labelled antigen. For identification of intact colonies expressing antibody sequences in the periplasmic space, the antigen is preferably a small hapten capable of diffusing into the periplasmic space.
As noted above, any regulatory region affecting the level of protein synthesis is contemplated as a potential target for the optimization strategies of this invention. Two specific prokaryotic examples are provided below. These examples illustrate the methods of this invention as they apply to optimizing signal sequences (Example 1) and ribosome binding sites (Example 3) . The methods of this invention readily apply to mutagenesis strategies directed to both these and other regulatory regions in prokaryotic and eukaryotic cells.
As a particularly preferred strategy for generating optimized signal sequences encoding signal peptide, Example 1 details a method for creating novel signal sequences for the expression of eukaryotic protein in E. coli . In addition to creating novel signal sequences, this method is also useful for optimizing the expression of a particular protein from a given construct in a given cell type.
The strategy detailed in Example 1 combines the mutagenesis strategies of this invention with antibiotic selection to improve secretion of Fv antibody molecules in E . coli . A fusion protein consisting of an N-terminal, hapten-binding single chain antibody (CHA 255) and the C- terminal of /3-lactamase was constructed and expressed in active form in the periplasmic space of E . coli using the Omp A signal peptide. The fusion of /3-lactamase with a second protein to study protein transport was reported by 10 Broome-S ith, et al..Mol . Micro . 4(10) 1637-1644, 1990. For studies linking Omp A with a prokaryotic protein as a fusion protein to ..-lactamase see Francisco, et al. Proc . Natl . Acad . Sci . USA 89:2713-2717, 1992 and Meyer et al. disclosing a method for combining an Fab1 antibody conjugate to 3-lactamase, Bioconj . Chem . 3:42-48, 1992). For site-directed mutagenesis of a eukaryotic signal sequence associated with a eukaryotic protein fused to the C-terminus of /3-lactamase for prokaryotic expression see Stahl, et al. (Gene 71:147-156, 1988)
Unlike previous mutagenesis strategies, the regulatory sequence of interest, here the Omp A signal peptide, was deleted and enzymatic inverse PCR was used to construct a library containing random signal peptides at the N-terminus of the fusion protein. None of the references cited above report the deletion of the regulatory region to be mutagenized followed by the isolation of vector containing that deletion to create large libraries of regulatory mutants uncontaminated with false positive clones. Functional signal peptides were selected by plating the clones on media containing ampicillin. Clones were identified that had /3-lactamase activity comparable to or greater than that of the fusion protein containing the Omp A signal peptide. To ensure retention of antibody activity, colony filter lifts were screened with a radioactively labeled hapten, πιIn-EOTUBE, recognized by the CHA 255 antibody.
Example 1 provides a detailed method for generating novel signal sequences that are 22 amino acids in length. However, optimized signal sequences, like optimized regulatory regions can be selected based on the lengths of known regulatory regions, or, alternatively, regulatory regions of any desired length can be selected for optimization. Therefore, steps similar to those outlined in Example 1 can be performed to generate optimized signal sequences, or other regulatory regions of any length. Example 2 provides methods for creating random leader peptides of 12, 16, or 20 amino acids in length. It is further contemplated that the methods of this invention can be used to create novel regulatory sequences for expression of either heterologous or homologous protein. For example, Example 3 uses a native E. coli protein to identify novel ribosome binding site sequences. Once optimized ribosome binding site sequences are identified they can be incorporated into other vectors to study the effect of these sequences on heterologous protein expression. In this example the steps outlined above are followed for creating a library of random regulatory sequences operably linked to protein except that here the initial nucleic acid vector construct includes the Omp A ribosome binding site linked to the Omp A signal peptide located at the amino terminus of the /3-lactamase protein. The Omp A ribosome binding site sequence was deleted by PCR and the resulting vector was used to incorporate a library of random oligonucleotides. These mutagenized vectors were introduced into E. coli and the level of protein expression from the resulting clones was assessed using a PADAC assay. Three clones were identified that expressed elevated levels of protein relative to the nonmutated ribosome binding site sequence. These clones were selected and the nucleic acid sequence of the mutated ribosome binding site was determined. Details of this experimental protocol are provided in Example 3. In yet another preferred embodiment of this invention, a method is provided for creating and identifying novel regulatory regions in eukaryotes. Example 4 details a method for producing signal peptide libraries in mammalian cells for improved protein expression. Known eukaryotic signal sequences are disclosed and reviewed by Von Heigne, G. , Eur. J . Biochem . 133:17-21, 1983. Figure 7 illustrates an exemplary vector contemplated for use in this invention. The vector uses the Immunoglobulin heavy chain promoter to initiate transcription of the desired gene inserted downstream from the promoter. The gene to be expressed along with the regulatory region to be mutagenized is inserted into the vector. The regulatory region is deleted and a random or biased regulatory sequence is inserted into the vector. Example 4 teaches the mutagenesis of the signal sequence. Following the incorporation of the random signal sequence into the eukaryotic vector, the vector is introduced into suitable eukaryotic cells that are capable of expressing protein from the expression vector. Example 4 employs the vector of Figure 7 in SP 2/0 cells. Vectors can be introduced into cells using those methods known in the art. Example 4 employs electroporation to mediate transfection. However, those with skill in the art will be readily able to select an appropriate method for introducing the vector into the eukaryotic cells. Positive cells are preferably selected by growing the cells in neomycin. In Example 4, the relative level of protein expression is assayed using a Western dot blot. Positive transfected cells can be further quantitated for protein expression by ELISA as compared with cells expressing protein under the control of the native regulatory sequence. Cells expressing equal or greater levels of protein expression as compared with cells containing the native regulatory sequence are subjected to DNA sequencing. In another preferred embodiment of this invention, the methods of the invention are used to generate targeting sequences to target proteins to particular cellular locations. The terms "protein targeting sequence" and "protein trafficking sequence" are used interchangeably in this disclosure. Both prokaryotic and eukaryotic systems studied to date indicate that sequences associated with protein targeting can occur in a variety of locations within a protein sequence. For example, in bacteria, proteins with N-terminal signal sequences are secreted into the periplasmic space or are inserted into the outer membrane using a secretion apparatus defined by a series of sec genes (Bassford, et al., Cell 65:367, 1991). Other proteins lacking signal sequences are also secreted into the extracellular space in E. coli and some of these proteins contain secretion signals located at the C-terminus of the protein (Delepelaire, P. et al, J . Biol . Chem . 265:17118, 1990) . These secretory proteins lack a classical hydrophobic signal peptide. Instead, the proteins contain a consensus region of 200 amino acids containing highly conserved stretches of amino acids that are postulated to confer secretory activity.
Similarly, targeting sequences are located within a variety of locations in eukaryotic proteins. For example, interleukin proteins lack hydrophobic leader peptides and their targeting sequences are believed to be located in the C-terminal portion of the protein precursor (Marck, et al., Nature , 315: 641, 1985) . E . coli expression of Interleukin-1 results in the translocation of the protein to the periplasmic space, while expression in yeast results in expression of the protein in the cytosol. Particular protein sequences also are implicated in mitochondrial targeting (see Zara, et al., J. Biol . Chem . 267:12077-12081; Adrian, et al., Mol . Cell . Biology 6:626- 634, 1986 and Horwich, et al, Cell 44:451-459). Similarly in both prokaryotes and eukaryotes, post-translational modifications, such as glycosylations (eukaryotes) , the addition of lipids, or the like, have been shown to be important in protein translocation and targeting within a cell. It is generally believed that specific regions within the protein sequence are important determinants for post-translational modifications within a cell. Kuchler, et al. Endocrine Reviews 13:499-514, 1992 observed that the C-terminal modification of a chimeric IL-1_: :bacterial a- factor protein resulted in exclusive plasma membrane association. Others have demonstrated that a combination of post-translational modifications of the C-terminal region of a heterologous protein with the presence of a polybasic domain, positioned near the C terminus, result in the targeting of heterologous protein to the plasma membrane in MDCK cells (Hancock, et al. EMBO J. 10:4033, 1991) . In another series of experiments, Bakke, et al. (Cell 63:707-716, 1990) identified the region of a protein and the particular sequence responsible for sorting a plasma membrane protein from the endosomal vesicles to the plasma membrane. For a review of the location and sequences of known eukaryotic sorting sequences see Schwartz, Ann . Rev. Immunol . 8:195-229, 1990.
The ability to target protein to a particular cell location is an important goal in gene therapy strategies. Advantageously, the method described in the claims and below and detailed in Example 5, permits one with skill in the art to introduce a particular nucleic acid sequence encoding protein into a cell and to select cells expressing protein in the desired cell location. A comparison of the native protein sequence with the mutated protein sequence that targets the protein to the desired cell location permits the identification of the novel sequence that directs protein targeting. As a first step for practicing the trafficking or targeting method of this invention, a particular region of nucleic acid containing a protein trafficking/targeting signal or a postulated protein trafficking/targeting signal is identified. As noted, trafficking signals can be positioned in a variety of locations within a protein sequence. They have been identified with the leader sequence as well as within the C-terminal portion of a protein. Those with skill in the art of molecular biology will be able to search the literature for recognized regions of eukaryotic or prokaryotic protein sequences that are postulated to affect protein trafficking and/or targeting. Equivalent regions within the nucleic acid sequence are then identified.
Next, a nucleic acid vector is prepared that contains the nucleic acid sequence encoding the protein trafficking signal to be mutagenized and a nucleic acid sequence encoding at least one protein. The nucleic acid vector may be suitable for directing eukaryotic or prokaryotic expression of the protein, and those with skill in the art of molecular biology will be readily able to select and construct an expression vector suitable for this invention. Once the vector is prepared, the nucleic acid sequence containing the protein trafficking signal is deleted from the vector. As a next step for practicing the methods of this invention, a pool of random oligonucleotides are produced that are suitable for use in a polymerase chain reaction. These oligonucleotides are incorporated into the position in the vector previously occupied by the protein trafficking signal to generate a pool of mutagenized vectors. Suggested methods for the steps of this invention are discussed supra as they relate to methods for creating nucleic acid regulatory sequences and specific examples of the suggested methods are provided in the Examples below. The mutagenized vectors are introduced into a cell sample using electroporation, membrane permeabilization, CaCl2 precipitation, viral vectors, or the like. Those with skill in the art will use their preferred methods for introducing nucleic acid vectors into a cell sample.
As a next step for practicing the methods of this invention, the cells are assayed to determine the location of the protein encoded by the nucleic acid vector in the cell sample. Methods to determine the localization of a protein within a cell are known in the art. For example, if antibodies are available that react with the protein encoded by the nucleic acid vector, then these antibodies can be labelled with a fluorescent marker, colloidal gold, or the like. Cell samples expressing the protein can be membrane permeabilized for fluorescent antibody analysis or cells can be fixed and sectioned for protein localization studies using electron microscopy in the presence of colloidal gold. In addition, cell samples can be fractionated and individual organelles separated and individually tested for the presence of the protein. Cells expressing the protein in the desired cell location are identified by an assay and the nucleic acid sequence of the novel protein trafficking/targeting signal is determined and compared to the native sequence. Particular embodiments of the invention will be discussed in detail in the following examples and reference will be made to possible variations within the scope of the invention. There are a variety of alternative techniques and procedures available to those of skill in the art which would similarly permit one to successfully perform the intended invention.
Example 1 Creation of Novel Signal Sequences in E. Coli
This example illustrates a method for producing novel signal sequences for optimized expression of eukaryotic or prokaryotic protein in prokaryotic cells. The SCCHA225 /3-lactamase fusion protein was cloned as follows: SCCHA225 was amplified from the construct pUCHAsc2 using a 5'-terminal primer (B444, designated SEQ ID NO: 1, see Figure 1) containing an Xba I restriction site and a 3' primer (B456, designated SEQ ID NO: 2) containing DNA coding for the (gly4ser)2 sequence and a Class IIS restriction endonuclease restriction site, Bsa I. The designation "2" in the name pUCHAsc2, indicates that the a pR gene was a mutation eliminating the BSA1 site, which was accomplished by a single round of EIPCR using primers 939 (SEQ ID NO: 3) and 940 (SEQ ID NO: 4) as described in Example 2) . /3-lactamase was amplified from pUCHAsc2, which contains a modified copy of /3-lactamase lacking the Bsa I restriction site, but containing the native amino acid sequence. A 5 '-terminal primer (B457, designated as SEQ ID NO: 5) containing a Bsa I restriction site and nucleotides coding for (gly4ser)2 were used. The 3 '-overlap primer (B447, designated as SEQ ID NO: 6) introduced 3 consecutive stop codons and a Bam HI restriction site at the 3' end of the /3-lactamase gene. The use of overlapping primers, B456 and B457, created a 10 amino acid (gly4ser)2 sequence between the single chain antibody and 3-lactamase domains (Figure 1) . The resulting PCR products were processed with Bsa I, ligated and the ligation products were used as template for a second PCR reaction using the external primers B444 and B447. Ligation conditions are provided below. The 1.6 kb PCR product was then gel purified and subcloned into the expression plasmid pCCHAscl cut with Xba I and Bam HI. pCCHAscl is a pUC based plasmid that constitutively expresses a chloramphenicol resistance gene and contains an expression cassette having a Lac promotor driving expression of the CHA 255 single chain antibody through the use of an Omp A derived ribosome binding site and signal sequence. The Omp A sequence was created from primers using the sequence provided in the literature (Mowa, et al. J. Mol . Biol . 147:317-328, 1980). The resulting plasmid pCCHAscl-ampRV2 (see Figure 1) was verified by restriction digestion and DNA sequencing. Double stranded dideoxy sequencing was performed on a Dupont Genesis 2000, using the DuPont Genesis 2000 sequencing kit according to the manufacturer's instructions. Post gel processing was done with the Base Caller 5.0 program (DuPont, Boston, MA).
Oligonucleotides used to produce the fusion protein construct are shown in Figures 1 and 2. Oligonucleotides were synthesized in an Eppendorf Synostat D automated DNA synthesizer (Madison, Wisconsin) .
Oligonucleotides used for library EIPCR were synthesized with the 5'-trityl group on and were purified with a Nensorb Prep column (New England Nuclear, Tozer, MA) according to the manufacturer's instructions. Oligonucleotides containing random nucleotide incorporation were prepared by selecting a mixed base option according to the software provided by the manufacturer.
Standard ligations contained 0.1-1.0 μg DNA, lx ligation buffer and 1-2 μl (400,000 units) of T4 DNA ligase (New England Biolabs, Tozer, MA) in a 20 μl volume. The ligation reactions were incubated at room temperature for one hour. Mass estimates for chemically synthesized oligonucleotides were obtained spectrophotometrically. In all other cases, DNA mass estimates were made by visual comparison of samples run on agarose gels with a 1 kb ladder DNA standard (GIBCO/BRL, Gaithersburg, MD) loaded to approximate 100 ng DNA per band.
The construct pCCHAscl-ampRv-2 was introduced into E. coli DH-10B (GIBCO-BRL, Gaithersburg, MD) . Bacteria were propagated at 30°C in Terrific Broth medium (Sambrook, et al. 1989. supra) or on agar plates containing 34 μg/ml chloramphenicol and 100 μg/ml ampicillin. Positive clones were selected from the ampicillin plates for further study. Assays to monitor the level of protein expression in the bacteria are described below. The completed vector provides the template for EIPCR-based construction of the signal peptide library.
For EIPCR, the PCR reactions contained 0.2 ng the template, 0.5 _M of each primer, IX Taq buffer (Perkin- Elmer Cetus, Norwalk, CT) , 200 _M of each dNTP, 1.61 mM MgCl2, and 5 units of Taq DNA Polymerase (A pliTaq™, Perkin-Elmer Cetus) in a total volume of 100 μl. The addition of MgCl2 beyond 1.61 mM up to, 3.11 mM is often used as a variable that when modified impacts the yield of DNA. Templates were amplified in a Perkin Elmer Thermocycler (Norwalk, CT) using a "hot start" according to manufacturer's instructions (3 min. at 94°C) followed by 30 cycles of 94°C (1 min.) denaturation, 50°C (1 min.) annealing and 72°C (2 min.) extension. A 5 second auto extension was used. Extension was completed with an additional 10 minute incubation at 72°C.
The EIPCR template DNA is similar in size to the desired EIPCR product, hence they could copurify by agarose gel electrophoresis during further manipulations. While the circular template is present at a much lower concentration than the EIPCR product, it will electroporate at high efficiency in bacteria thereby resulting in the contamination of the library with wild-type clones. The data presented in Table 1, below, demonstrate the problem of wild type clone contamination in library mutagenesis. When pCCHAscl-ampR was used as a template, only the wild- type sequence was recovered from the library under selective conditions.
To render the expression construct nonfunctional and eliminate the appearance of functional template in the EIPCR product, the entire Omp A signal sequence was deleted from pCCHAscl-ampRV-2 in a single EIPCR reaction (see Figure 2A) . Two primers, B593 (SEQ ID NO: 7) and B594 (SEQ ID NO: 8) , were used in an enzymatic inverse polymerase chain reaction to remove the signal sequence. The oligonucleotides hybridized to opposite strands of the plasmid and in a PCR reaction yielded a linear DNA that is slightly smaller than the original template. This linear DNA contains at its termini a 14 bp extension containing the recognition site for the class IIs restriction site Bsa I. The terminal extensions of DNA were removed by digestion with Bsa I, yielding compatible cohesive ends that efficiently undergo an intramolecular ligation. The resulting plasmid, pCCHAscl-ampR_SP, is a 3.2 kB pUC- derived plasmid which contains a single chain CHA225-
/3-lactamase fusion protein absent the Omp A signal peptide, and this vector confers resistance to chloramphenicol. Removal of the signal sequence resulted in ampicillin sensitivity. Thus, any pCCHASCl-ampR_SP clones that appeared in the final EIPCR library, were eliminated during selective expression to yield the desired mutagenized clones (see Table 1, below) .
The random oligonucleotides, created as described above, were incorporated into pCCHAscl-ampRV2_SP using a polymerase chain reaction. Figure 2b illustrates the use of EIPCR to create a random signal peptide library. Divergent oligonucleotide primers were used to amplify the DNA and to incorporate mutant sequences. The forward primer, B524 (SEQ ID NO: 9) , annealed to the DNA corresponding to the N-terminal region of the scCHA antibody. The reverse primer, B525 (SEQ ID NO: 10) , incorporated the initiator codon and 21 random codon triplets. The reverse primer's annealing location is illustrated in Figure 2b. Both the forward and reverse primers contained the Bsa I restriction site near their termini. The Bsa I restriction site is present at both ends of the linear EIPCR product and was removed by digestion, yielding compatible cohesive ends. The linear PCR product was then ligated and electroporated into E. coli .
The infrequent occurrence of an active signal peptide in a library of random signal peptides necessitated several considerations for library construction. First, to ensure a large enough library size and complexity we used a higher than normal amount of template DNA in our EIPCR reactions. Typically EIPCR reactions contain only 0.5 ng of template DNA per 100 μl reaction (Stemmer, et al. supra) . Increasing the amount of template to 25-50 ng per 100 μl reaction resulted in more EIPCR product and a larger library size. As an example, when pCCHAscl-ampRV2 was used as a template, 5 ng of plasmid yielded only 0.25 μg of PCR product. When 50 ng of template was used, 5 μg of PCR product was obtained. It is contemplated that the activity of different primer-template combinations will be optimized within a range of template concentrations from about 0.1 to 50 ng of template per 100 μl reaction. Those with skill in the art of molecular biology can readily determine the optimal template concentrations as well as adjust the concentration of MgCl2 to maximize PCR reaction product. Advantageously, the deletion of the critical regulatory sequence from the template prior to the incorporation of random sequence enables one to use greater amounts of template than those typically used in standard PCR reactions without concern for amplifying nonmutated template. EIPCR reaction conditions for the incorporation of random oligonucleotides were the same as for the cloning reactions.
Table 1
Library Template CFU/uσ DNA % Wildtype (n)
Template DNA/100 μl CAM CAM/AMP CAM/AMP
pccHAscl-ampR 2θng 2O . 4xl06 4511 100% ( 8 ) '
pccHAscl- 50ng 3 . 9 1904 0 ( IO ) * *
AmpRΔSP
CAM (chloramphenicol) = 34 μg/ml AMP (ampicillin) = 50 μg/ml n=number of samples evaluated *=determined by restriction mapping **=determined by DNA sequencing
The 100 μl PCR mixture was extracted with an equal volume of phenol-chloroform-isoamyl alcohol (24:23:1) and was then precipitated using 2 volumes of ethanol. The protruding termini of the PCR end products were filled in and digested according to the method of Stemmer (Biotechnigueε , 1992 supra) . Briefly, the pellet was resuspended in 70 μl H20 and 10 μl of 10X Klenow buffer (NEB, Tozer, MA), 10 μl dNTP mix (2.5 mM each dNTP), 5 μl of DNA Polymerase I (large fragment: Klenow) , and 5 μl of T4 DNA polymerase were added. The reaction was incubated at 37°C for one hour followed by phenol extraction and ethanol precipitation as disclosed above. Blunt-ended EIPCR fragments were then exhaustively digested in a 100 μl reaction containing 50 units of Bsa I at 60°C for four hours. The Bsa I digested fragment was purified away from both the PCR primers and the short terminal DNA fragment by centrifugation through a Biospin 6 column (BioRad, Richmond, CA) according to the manufacturer's protocol. The purified Bsa I digested fragment was then phenol extracted and ethanol precipitated as described earlier and ligated under standard conditions with the exception that the fragments were ligated at higher concentrations of T4 DNA ligase (106 units/μl) and incubation was at 12°C overnight.
Ligated or control DNA (pUC 19, GIBCO/BRL) was precipitated in 2 volumes of ethanol and resuspended in 20 μl of TE (lOmM Tris, pH 7.4, 0.1 mM EDTA, pH 8.0). 10 μg of yeast tRNA (GIBCO/BRL) was added as a carrier and did not affect electroporation efficiency. The DNA was electroporated in 1-5 μl aliquots into electrocompetent DH10-B MAX cells (BRL, Bethesda, MD) according to the manufacturer's instructions. pUC19 DNA (0.01 ng) was electroporated to monitor the electroporation efficiency.
Under conditions to select for the plasmid marker (34 μg/ml chloramphenicol) , a library of 3.9 x 106 colony forming units was obtained from approximately 1 μg of ligated DNA (Table 1) . Under dual selective conditions (34 μg/ml chloramphenicol + 50 μg/ml ampicillin) an equivalent amount of DNA yielded 1904 cfu, which represents a positive rate of 0.05%. Clones showed a variety of colony sizes ranging from 0.2-2 mm.
When greater concentrations of ampicillin were used, the fraction of positive clones decreased (see Table 2, B) . In contrast, the plating efficiency of the control plasmid using CAM selection pCCHAscl-ampR (A) , which contains the wild Omp A signal, did not decrease appreciably at higher concentrations of ampicillin. The libraries formed under higher concentrations of ampicillin presumably contain clones with leader peptides capable of higher levels of secretion. In these experiments 10 μg of pCCHAscl-ampR was electroporated into 20 μl electrocompetent E . coli DH10B and grown for one hour at 37°C in 1 mL S.O.C. (Sambrook, et al. supra) . The cells were centrifuged briefly and resuspended in 400_1 S.O.C. A IO"4 dilution was prepared in S.O.C. and 0.1 ml aliquots were plated on TB plates containing 34 μg/ml chloramphenicol with the indicated concentration of ampicillin. This transformation strategy was compared to a transformation using 4 μl (approximately 1 μg) of signal peptide library DNA electroporated into 80 μl electrocompetent E. coli DH10B and processed as described for the control pCCHAscl-ampR plasmid except that the IO"4 dilution step was eliminated.
Table 2
Restriction of the signal peptide library on increasing concentrations of ampicillin
Clone/Library [amp] cfu/plate (+) cfu by %
(μg/mL) '"In screen positive
pCCHAsclampR 50 254 199 78.3
100 218 149 68.3
200 319 201 63.0
400 211 74 35.1
SP Library 50 640 107 16.7
100 313 98 31.3
200 118 42 35.5
400 12 6 50.0
The library size was determined by electroporating 1 μl of library DNA and plating 90% of the electroporated cells onto media containing 34 μg/ml chloramphenicol and 100 μg/ml ampicillin and by plating serial dilutions of the remaining iθ% of the electroporated cells onto media containing only chloramphenicol (34 μg/ml) . Under the latter conditions all clones produced colonies. The quality of the resulting signal peptide library was verified by DNA sequencing of the mutated region of four nonselected clones. In all clones the mutations were correctly incorporated. The composition of the mixed bases was 31% A, 19% G, 25% C and 25% T where the sample number was 119.
Library clones that were able to grow on 50 μg/ml or greater ampicillin were further verified by colony lift and by hapten binding assays. One possible complication of ampicillin selection can be deletion of all or some of the scCHA insert. Another possibility is that the EIPCR reaction could incorporate mutations in the antibody sequence and thereby eliminate antibody activity. In fact some errors were identified when individual clones were later sequenced. To examine these possible complications, two approaches were used. First, digestion of DNA from positive clones (grown with 200 μg/ml ampicillin) with Xba I and Pst I indicated that about 70% of the clones contained the full length 800 scCHA insert.
Second, a large number of colonies were grown under selective conditions, transferred to nitrocellulose and probed with the radioactive hapten ιnIn-EOTUBE. Labeled chelate was prepared and colony lifts performed as described by Stemmer, et al. (BioTechnigues 14:256-265,
1993) . Briefly, colony lifts of 23cmx23cm plates with 0.3- 1 x 105 colonies were prepared using BA83 nitrocellulose filters (Schleicher and Schuell, Keene, New Hampshire) . The filters were blocked by incubation in 3% non-fat milk in 25 mM Tris-HCl pH 7.5 for 10 minutes, washed with 25 mM Tris, followed by incubation in 25 mM Tris containing 50 °Ci of chelated Indium per filter for 1 hour at room temperature. The filters were then washed with 25 mM Tris ' for a total of 15 minutes, dried and exposed to Kodak X- Omat AR autoradiography film for several hours. Approximately 30-50% of the colonies yielded a positive signal in this assay, indicating that a significant number of clones were not expressing active scCHA antibody (see Table 2, supra) . Loss of scCHA activity also occurred in the control plasmid, though at a lower frequency. Clones yielded positive scCHA signals of varying intensities. Twelve clones yielding a high intensity signal were selected for further work and were purified by a secondary and tertiary colony screen. These positive clones were removed from the primary screen plates and diluted into fresh media, replated and reassayed. A single positive colony from this secondary screen was replated and reassayed in a tertiary screen, which yielded clonal material, since 100% of the colonies yielded a positive signal. The results of representative colony lifts shown demonstrated that the scCHA signal was stable during successive rounds of growth (data not shown) .
In order to study protein expression in the cells, the cells were grown at 30°C in Terrific Broth (see supra) containing 34 μg/ml chloramphenicol and 50 μg/ml ampicillin in baffled shaker flasks at 250 rpm for 24 hours. Following cell growth, the periplasmic fraction of E . coli was prepared and isolated using the methods described by Witholt, et al. (Anal. Biochem . 74: 160-170, 1976, hereby incorporated by reference) .
Determination of antibody expression levels was performed as previously described (Stemmer, et al., BioTechnigues, 1993 supra) . The quantitation of the active fusion protein was based on a comparison to the CHA 255 standard. Isolated clones were further characterized to demonstrate the production of an active fusion protein. To quantitate the amount of fusion protein recognizing the labelled hapten, 50 μl of periplasmic extract was incubated with 50 μl of carrier free In-E0TUBE for 30 minutes. The mixture was then centrifuged through an ultrafree column (10,000 MW cutoff) and the eluate was counted. The % bound was expressed as the (input cpm-eluate cpm) /input cpm. The number of sites (in pmol) was determined from a standard curve utilizing CHA255 antibody. The data of Table 3 indicates that periplasmic supernatants from several clones bound to the radioactive hapten nlIn.
Table 3
Hapten binding and /3-lactamase activity of protein containing mutant signal peptides
clone pmol ng/ l ΔAU/min/ml ng site/ml protein/ml
CHA 12 960.0 NA NA
/3-lac STD NA NA 3.038 680
o pA
Fusion 1.13 59.9 1.075 481 protein
E3 0.009 0.45 0.104 45.6 E5 0.076 4.04 0.079 34.66
E6 0.638 33.8 1.409 618.1
E9 0.098 5.17 0.089 39.05
E10 0.009 0.5 0.371 162.76
E12 0.131 6.93 0.697 305.8 E13 0.33 17.47 0.184 80.72
E14 0.36 19.09 0.086 37.7
E15 0.079 4.19 0.074 32.46
E16 0.007 0.4 0.225 98.7
E17 0.012 0.65 " 0.087 38.16 E18 0.043 2.29 0.083 36.41 E19 0.032 1.7 0.081 35.53
E20 0.014 0.76 0.074 32.5
E21 0.03 1.58 0.104 45.6
E22 0.035 1.85 0.100 43.9
Since library clones were isolated from ampicillin containing plates, one would predict that periplasmic supernatants should also contain significant /3-lactamase enzymatic activity. The activity of the /3-lactamase portion of the fusion protein was determined by the quantitative PADAC (2-[ (N,N-dimethylanilin-4- yl) azo]pyridinium 3 '-cephalosporin) assay (Meyer, et al., Biocon jugate Chem . 3:42-48, 1992, which is hereby incorporated by reference) . 100 μl aliquots of periplasmic supernatant were incubated at 37°C with A570 = 0.5 of PADAC in a 2 ml stirred cuvette and the decrease in the A570 was monitored over time. /3-lactamase activities were expressed as a change in absorbance unit/min/ml and mass determinations were made based on a commercially available E . coli /3-lactamase (Sigma St. Louis, MO) standard of known concentration assayed under identical conditions. Results are provided in Table 3. The data reveals that all of the clones tested had significant /3-lactamase activity. Interestingly, mutant E6 had the highest activity, above that of the control fusion protein. The use of a /3-lactamase standard permitted calculation of the approximate mass of fusion protein produced.
Based on the data in Table 3 , there seems to be no clear direct correlation of antibody binding activity with /3-lactamase activity. For example, some clones like
E13 and E14 showed moderate levels of antibody activity but very low levels of beta lactamase activity. Also, clones like E10, E12 and E16, showed moderate levels of beta lactamase activity while exhibiting very little antibody activity. Only clone E6 showed levels of antibody and enzyme activity that were close to that for the control Omp A clone. Because there is the possibility of obtaining mutations in any PCR technology that may affect the activity of antibody or enzymes, we took several of the periplasmic preparations and evaluated them for their expression levels by Western blotting (Sambrook, et al. supra) since this method should be less sensitive to mutation (see Stemmer, et al. BioTechnigues , 1993 supra) . Samples were developed with purified rabbit polyclonal antibodies to CHAsFv. The antibodies were obtained from rabbit polyclonal antiserum. Rabbits were immunized by a traditional procedure (J. Immunol . 116:1306-1312, 1975) that employed multiple intramuscular injections of CHA255 monoclonal antibody (100 μg amounts) in Freund's adjuvant. Serum obtained from the rabbits were processed to isolate a purified polyclonal antibody preparation to CHA255 Fv by absorption and elution from an immunoadsorbant column that was conjugated with CHA255 chi eric antibody. The CHA255 chimeric antibody was cloned and produced as described previously (J . Immunol . 145:1200-1204, 1990 and Protein
Expr. and Pur if. 2:75-82, 1990). CHA255 chimeric antibody was conjugated to CNBr preactivated Sepharose 4B according to the manufacturers instructions (BioRad Labs, Richmond, CA) . The antibody in the rabbit polyclonal antiserum were applied to the column and eluted as described previously (J. Immunol . 116:1306-1312, 1975). This purified polyclonal antibody preparation was used in Western blots in 100 μg/ml concentrations. The results of the Western Blot experiments indicated that E5 was now the best producer of the mutants tested from Table 3. In this experiment, the Omp A control clone produced higher levels of protein than the mutants. Expression of E5 was higher than that of E6, with E12 and E10 slightly lower in expression than E6 (data not shown) . In addition, several of the selected library E clones were subjected to DNA sequencing to determine the sequence of the DNA that coded for the signal peptide (see Table 4) . A number of the clones contained stop codons within their sequence. In particular, clone E6, which directed increased protein expression in the antibody and enzyme activity tests and was second only to E5 in the Western blot, contained several stop codons in the signal sequence.
Table 4
AMINO ACID SEQUENCES OF SIGNAL PEPTIDE LIBRARY E
LEADER SEOUENCE
OmpA(SEQ ID NO: 11) MKKTAIAIAV ALAGFATVAQ A
E3 (SEQ ID NO : 12) MLYPGEMLCM QNFITLIIEI YI
E5 (SEQ ID NO : 13) MGMYKLRLLI DCLMLYVCLT DT
E6 (SEQ ID NO : 14) MNILLEN*M* RTHDMLIDGT I
E9 (SEQ ID NO : 15) MGMYKLRLL DCLMLYVCST DT
E10(SEQ ID NO : 16) MTSVCKITVL ILFXTGTYNI LG
E12(SEQ ID NO 17) MRNTYV'YXKY GRTEWLVEQI NL
E13(SEQ ID NO 18) MvQCERXIYH YVQRIREYFX FE
E14(SEQ ID NO 19) MvGICTGIGV GILLLLLWG QV
E15(SEQ ID NO 20) MLGA EFKKL SYAILVLSHT PT
E16(SEQ ID NO 21) MDQGIKYRLE KWRGRISLIR VF
E17(SEQ ID NO- 22) MQRRGIRWAY VIYLSMFFGA C
E18(SEQ ID NO: 23) MLGVLTMPCD TMVEDLACDY MS
* = stop codor _
X = unknown anlino acid
Because the EIPCR method involves amplification of the entire plasmid, there is always the possibility that one can have mutational errors occurring in any location in the vector of the selected library clones. Thus, it is always recommended that after the DNA sequence is determined for the mutated library segment, the selected new sequence is recloned using new DNA oligonucleotides with standard cloning procedures to confirm that the sequence is in fact directing the expected effect identified from the originally selected mutant sequence. In the case of the signal sequence library E, we chose to redone E5 and ElO signal sequences in place of the natural /3-lactamase signal sequence and compare expression with Omp A driven /3-lactamase. To accomplish this, oligonucleotides containing the new sequences were cloned into the vector pCLA3ampR (Figure 3) . This PUC derived vector contains a constitutively expressed chloramphenicol gene, Lad, and the /3-lactamase gene. Expression of /3-lactamase is driven by the Lac promoter and is controlled by the Omp A RBS sequence and Omp A signal sequence. The activity of the Lac promoter is negatively regulated by the LacI gene product, the Lac repressor.
Construction of pCLA3ampR E5 was accomplished by PCR using primers B947 (SEQ ID NO: 24) and 940 (5' and 3' primers respectively, see Figure 9) with the pCLA3ampR template using standard PCR conditions as disclosed supra. The product was cut with Xba I and Pst I. The PCR product was then ligated back into pCLA3ampR, which had previously been digested with Xba I and Pst I. The construction of pCLA3ampR ElO was accomplished in the same way, except that the 5' primer was B946 (SEQ ID NO: 25) .
In order to determine the expression level of /3-lactamase, clones were grown in Terrific Broth containing 30 μg/ml chloramphenicol in baffled shaker flasks rotating at 250 rpm for 24 hours at 30°C. IPTG (Boehringer Mannheim, Indianapolis, IN) was added (final culture concentration 0.4 mM) to the cultures when the OD^ reached 0.8 to induce the expression of the 3-lactamase gene. The periplasmic fraction was obtained as described by Witholt, et al. (supra) . The periplasmic samples were analyzed for /3-lactamase activity using the PADAC assay described earlier. The results indicated that the E5 and ElO signal sequences were equivalent to Omp A in their ability to express /3-lactamase from the pCLA3 vector (Table 5) . It should be noted that the DNA sequence coding for the signal peptides of clones E5 and ElO depicted in Table 5, were determined. The sequence of ElO was as previously determined, but E5 showed a T to C base change at position 8, resulting in a change of the third amino acid of E5 from a methionine to a threonine. The results in Table 5 indicate that this single amino acid change in E5 did not adversely affect its ability to function as a signal sequence.
Table 5
Clone Exp. 1 Exp. 2 Exp. 3 Average/ % Of (stdev) Control
AU/min /ml x 10"3 pCLA3 19 23 22 21.3/ 100 ampR (2.1)
Control
RBS 1 40 188 short
RBS 2 46 48 35 43/ (7.0) 202 short
RBS 2 40 47 38 41.7/ 196 long (4.7)
RBS 8 57 71 59 62.3/ 293 short (7.6)
RBS 12 39 47 36 41.7/ 196 (3.5)
Signal 20 22 21/(1.0) 99 peptide E 5
Signal 21 23 22/(1.0) 103 peptide E 10
Blank 0 0 0 0
The library generated E5 and ElO signal sequences were then compared for structural properties with selected signal sequences derived from E. coli proteins (Omp A, DGAL, LamB, ELT6, Omp T, Pel B, PhoA) . These signal sequences, as well as a variety of others are available from the GenBank database. The program Protean (DNAstar, Madison, WI) was used to evaluate alpha helix vs beta sheet regions (Garnier-Robson and Chou-Fasman methods) as well as the hydrophilicity index (Kyte-Doolittle method) . Both E5 and ElO displayed the characteristic hydrophobic core of amino acids seen in all of the E . coli derived signal sequences (Figure 4) . Interestingly, the ElO sequence contained no alpha helix regions. Alpha helix content varied with all of the native E . coli signal sequences evaluated. Also, an interesting feature of the E5 signal sequence was that the final 3 residues of the sequence, Thr, Asp, Thr, correspond to those designated in the art as a -3,-1 box which provides the recognition sequence for cleavage by the E. coli signal peptidase (Protein Targeting, supra) . The sequence in the -3,-1 box for E5 differs from all known E. coli sequences. The presence of this unique -3,-1 sequence is not detrimental to the ability of E5 to export a functional /3-lactamase protein to the periplasm.
To demonstrate that the fusion proteins expressed with mutant signal peptides were processed to the correct size, periplasmic extracts were fractionated on SDS- polyacrylamide gels. The proteins were transferred to nitrocellulose and probed with an anti-CHA Fv antibody. Results from the Western blots indicated that the fusion proteins isolated from the bacterial clones were _53 kD. This is close to the predicted size of the fusion protein.
Example 2
Production of Novel Signal Sequences of Variable Length Figure 5 illustrates the use of primers B755, B756, B757, and B758 (corresponding to SEQ ID NOS: 26 - 30 respectively) that were used in EIPCR reactions to produce signal peptide libraries of varying length. These libraries were constructed using primer B755 and one of the following primers: B756, corresponding to the random 12 aa signal peptide library; B757, corresponding to the random 16 aa signal peptide library; and B758, corresponding to the random 20 aa peptide library. In this example, the varying length libraries were constructed such that the random signal sequences were incorporated directly onto the amino terminus of the gene sequence encoding /3-lactamase
(amp-r) . The resulting plasmids are diagrammed in Figure 5 as pCampR SP-12, pCampR SP-16 and pCampR SP-20.
Following EIPCR, and using the methods outlined in Example 1, several clones from each library were prepared and the DNA was analyzed by restriction endonuclease digestion to verify the predicted size of the constructs. In the table below, active clones are defined as those clones which grow on 34 μg/ml chloramphenicol plus 10 μg/ml ampicillin. The following percentages of active clones were identified. Active clones are defined as those which grown on 34 μg/ml chloramphenicol plus 10 μg/ml ampicillin) :
Library % Active Colonies
12 aa 0.0028%
16 aa 0.0057%
20 aa 0.0141%
The clones were also analyzed by PADAC assay
(outlined in Example 1) and clones with high /3-lactamase activity were selected for sequencing. The results indicated that as the length of the signal sequence increases, the percentage of active clones within the library also increases. Moreover, while it is possible to obtain shorter signal sequences that were active, signal sequences of at least 20 aa have an increased likelihood of being functional.
Example 3
Creation of Novel Ribosome Binding Sites in E . coli
To apply this invention to a second regulatory region, we deleted the DNA sequence in an expression vector coding for a ribosome binding site (RBS) , and generated a random DNA library in its place. Since the RBS sequence initiated translation of /3-lactamase, library clones which contained a useful RBS sequence were selected by growing the library on ampicillin. The RBS library was prepared in the pCampR vector (Figure 6) . This is a pUC derived vector which has a constitutively expressed chloramphenicol resistance gene and expressed /3-lactamase under the control of the Lac promoter. In addition, translation of the /3-lactamase gene was accomplished using a ribosome binding site (GAGG) from the Omp A gene (Omp A sequence from -18 to -1) followed by the Omp A signal sequence (+1 to +63) to direct the /3-lactamase protein to the periplasm. The pCampR vector has a wild type DNA /3-lactamase sequence. In order to use the Bsa I enzyme for the EIPCR library reaction, a Bsa I site in the 3-lactamase gene was first removed. Removal was accomplished by standard EIPCR methodology (disclosed above) using the oligonucleotide primers 939 (SEQ ID NO: 3) and 940 (SEQ ID NO: 4) (see Figure 6) . The resulting vector was designated pCampR_BSA. Next, the entire region containing the Omp A derived ribosome binding site was removed from pCampR_BSAI. Sixteen bases were removed from position -16 to -1 by a single round of EIPCR using standard conditions and primers B667 (SEQ ID NO: 30) and B668 (SEQ ID NO: 31) (See Figure 6) . The resulting vector was designated pCampR_RBS and was unable to grow on plates containing 100 μg/ml ampicillin. To generate the RBS library, pCa pRΔRBS was used as the template in an EIPCR reaction containing primers B765 (SEQ ID NO: 32) and B766 (SEQ ID NO: 33) (See Figure 6) . Primer B766 contained a sequence of 16 NTPs (25% mixture of each base) so as to exactly replace the prior deleted 16 base sequence with a random DNA sequence. The EIPCR library reaction was accomplished using the EIPCR conditions as disclosed by Stemmer, et al. (BioTechnigues 13:114-220, 1992) . This reaction included low amounts of template (0.5-1 ng/100 μl reaction mixture) . This is in contrast to Example 1, which required much larger amounts of DNA template (up to 50 ng/100 μl reaction mixture) . The amount of template DNA that needs to be used in a particular EIPCR library reaction can be experimentally determined by those with skill in the art. As illustrated in Example 1, deletion of the entire functional region of the DNA prior to introducing the library sequence eliminates the concern for wildtype clones contaminating the library and enables the use of large amounts of template DNA for EIPCR, if indeed large amounts are required.
The EIPCR reaction material was processed to fill in the ends with Klenow and T4 DNA polymerase, cut with Bsa I, and ligated as described in Example 1 except that after the Bsa I digest, the DNA was electrophoresed on an agarose gel and the band containing the DNA was cut out and processed using GeneClean (BiolOl, San Diego, CA) , according to the manufacturer's instructions, to separate the digested from undigested fragments.
The ligated RBS library DNA was electroporated by adding 1 μl (50 ng) of DNA to 20 μl of electrocompetent DH10-B MAX cells (BRL, Bethesda, MD) according to the manufacturers instructions. Under conditions to select for the plasmid marker (34 μg/ml chloramphenicol) , the library size was estimated to be 1.7 x 106 colony forming units/μg ligated DNA. Under dual selective conditions (34 μg/ml chloramphenicol + 50 μg/ml ampicillin) , an equivalent amount of DNA was estimated to produce 24,040 colony forming units (cfu), which indicates that 1.4% of the colonies contained potential mutants. When greater concentrations of ampicillin were used, the fraction of positive colonies on the double selected plates decreased precipitously (Table 6) . In contrast, the plating efficiency of the control plasmid, pCampRΔBSA, did not decrease appreciably until 400 or 800 ng/ml ampicillin was used. Referring to Table 6, sixteen library colonies from the dual selective plates (800 μg/ml ampicillin + 34 μg/ml chloramphenicol) were picked and inoculated into LB media and grown overnight. In the morning, cells were harvested and periplasmic fractions were produced according to the above referenced methods. PADAC as says to measure jδ-lactamase activity showed a wide range of activity (Table 7) , with several clones exhibiting activities higher than what was obtained with the wildtype Omp A RBS in the pCampRΔBSA vector. After several repeat cultures and PADAC assays, it was clear that RBS clones 1, 8 and 12 showed higher levels of expressed 3-lactamase activity than the wildtype Omp A RBS clones pCampRΔBSAI (see Table 7) .
Table 6
ELECTROPORATION COLONIES
[AMP] (μg/ml) RBS Library Control pCamp ΔBSA
50 5000 1894 100 760 2007 200 81 1564 400 41 914 800 22 211
Table 7
BETA LACTAMASE ACTIVITIES (PADAC) FOR RIBOSOME BINDING SITE
LIBRARY CLONES Selected * Not Selected **
Clone# Activity*** Clone# Activity* (in ΔAU/min/ml) (ΔAU/min/ml)
RBS 1 5.58 RBS 11 1.95
RBS 2 1.654 RBS 12 5.18
RBS 3 2.72 RBS 13 0.012
RBS 4 3.514 RBS 14 0.021
RBS 5 3.78 RBS 15 0.016
RBS 6 2.094 RBS 16 0.005
RBS 7 0.95
RBS 8 12.764
RBS 9 2.354
RBS 10 3.544 * clones derived from plates containing 800 μg/ml AMP,
34 μg/ml CAM ** clones derived from plates containing 34 μg/ml CAM *** PADAC activity for the wildtype ompA RBS sequence
(pCampRBSAI) is 3.31 _AU/min/ml 4 These clones had an insertion of 17 bases added 3 ' of the library RBS sequences as measured by an enlarged
Xba I, Eco Rl restriction fragment
Purified DNA from the selected RBS clones were initially evaluated for key restriction sites. It was noticed that some of the library clones (RBS 1, 2, 4, 8-10) had acquired an unexpected mutation during formation of the library. Although it is not unusual to pick up errors during PCR techniques, it is important to note that the errors incorporated into the constructs disclosed below had no impact on the activities of the RBS sequences that were isolated from the library. This is known with certainty since we later recloned the selected sequences without the errors and found that the improved RBS sequences were as active as they were in the initial clones from the library.
The error we found resulted in an insertion of about 17 nucleotides arising from incomplete Bsa I digestion of one of the ends of the PCR library reaction, causing a blunt ended ligation. The insertional error did not appear to be related to selection conditions of the library, since it was seen in both the amp/CAM selected (50%, N=24) and CAM only selected clones (19%, n=16) . A comparison of /3-lactamase activities of the ampicillin selected clones also showed no obvious correlations between the insertion error and the level of beta lactamase activity produced (Table 7) .
In order to determine if the insertional mutation played a role in the RBS clones exhibiting increased levels of expressed /3-lactamase, the DNA coding sequence for the RBS regions in four clones were determined by DNA sequencing and new oligonucleotides containing these sequences were synthesized to enable expression analysis in a new vector that had not undergone EIPCR. The DNA sequence for RBS clones #1, 2, 8 and 12 are provided in Figure 8. As noted above, for clones 1, 2 and 8 there was an additional insertion due to improper cutting of the terminus of the EIPCR primer (inserted nucleotides shown in bold) . In clones 1, 2, and 8, the full length RBS sequence containing the insertion is herein referred to as RBS long, while the RBS sequence without the insertion is called RBS short. DNA oligonucleotide primers coding for resistance, RBS 2 long and short, RBS 8 short and RBS 12 were synthesized. Their sequences are shown in Figure 9.
The vector used to retest the activity of these various new RBS sequences was pCLAampR (Figure 3) , disclosed above. The various clones were produced by performing a standard round of PCR in which the various primers were included and pCLAampR was used as the template (Figure 8) . The PCR products were filled in with Klenow and T4 DNA polymerase, followed by restriction digestion using Xba I and Pst I. The products were cloned into the pCLAampR vector that had been previously cut with the same restriction enzymes. The DNA was ligated using the methods provided in Example 1, and DH10B bacteria were transformed using the electroporation procedures disclosed above. Clones were selected for growth on chloramphenicol and grown overnight in Terrific Broth media + 34 μg/ml chloramphenicol. The next day, cultures were seeded at
OD600=0.1 in LB media + 34 μg/ml chloramphenicol and grown at 30°C to OD600=0.8. IPTG was then added to a final concentration of 0.4mM. The cultures were grown for an additional 24 hours at which time the cells were harvested and periplasmic fractions were produced as described in Example 1. The periplasmic samples were next subjected to PADAC assay for /3-lactamase activity determination. The results from three separate sets of cultures are provided in Table 5 and indicate that in the Lac regulated vector all the RBS library sequences shown in Table 5 were more active than the wild type Omp A RBS sequence. Also, the same level of /3-lactamase expression was obtained for both the long and short versions of RBS clone 2, which indicates that the additional insertion from the original EIPCR reaction played no role in initiating translation.
It should be noted that DNA sequencing was used to verify the recloned RBS sites in all the RBS clones shown in Table 5. All sequences were unchanged except for clone 12 , in which a thymidine nucleotide was deleted from the sequence. It is known from the literature that a critical element within an RBS sequence is an area of three or more bases in the mRNA that hybridize with the 3 ' end of 16S rRNA in bacteria (also known as the Shine and Dalgarno sequence) . The underlined bases in Table 8 show potential areas of mRNA complementarity to the 3' end of 16S rRNA.
Potential Shine and Dalgarno sequences were identified for RBS clones 2 and 12, but not for clones 1 and 6.
It should be pointed out that the mutagenesis process described here is very powerful due to its ability to optimize sequences at both the RNA and the protein level. In the case of the RBS sequences, some clones were isolated with improved expression levels, even though the area of the RBS sequence contained a 5 ' unexpected insertion of 17 nucleotides derived from an uncut primer sequence. In fact, these selected RBS sequences proved themselves to function well, whether or not the accidental insertion was present downstream. In addition, some of these sequences are working efficiently even though they do not contain a potential Shine and Dalgarno type sequence. Thus, this data illustrates the ability of the library approach to randomly incorporate sequences, both in predicted locations and in unpredicted locations and to assess the effectiveness of the mutations on protein expression irrespective of the location of the mutations.
Table 8 DNA Sequences from Original Ribosome Binding Site Library
Clones Sequence Names DNA Sequence (5'-3' ton strand)* Omp A RBS TAACGAGGCGCAAAAA
(SEQ ID NO: 34)
Library RBS #1 long AGACACGTACAAACCAATGAAAGAGACCTATTTACTC (SEQ ID NO: 35)
#2 long AAGCAAAGTCCCGCGAATGAAAGAGACCTATTT (SEQ ID NO: 36)
#8 long ACGTTTAAACAGACACATGAAAGAGACCTATTT (SEQ ID NO: 37)
#12 GGAACTCAAAGGCACC (SEQ ID NO: 38)
* DNA sequence is from the XBAl site (-16 position relative to the ompA signal peptide sequence) to the start of the Omp A signal sequence (-1 position) .
Bolded nucleotides indicate the DNA sequence corresponding to an insertional error due to blunt ended ligation of a EIPCR DNA primer that failed to be cut by BSA1.
Underlined nucleotides indicate areas of potential complementarity with the 3' end of E. coli 16S ribosomal RNA. (ie. potential Shine Dalgarno sequences) . Example 4
Creation of Novel Regulatory Sequences in Mammalian Cells As an example of the application of this invention to the development of novel regulatory sequences in mammalian cells, this example provides a method for producing signal peptide libraries in mammalian cells for improved protein expression. Figure 7a illustrates the vector contemplated for use in this invention, pGCEMK. The vector uses the immunoglobulin Kappa Light Chain promoter (from the kappa light chain gene of the murine monoclonal antibody CEM 231) to direct transcription of a desired gene. In this example it is driving the native CEM kappa light chain variable region, but one skilled in the art will be aware that any natural or recombinant variable region could replace the native Variable region and the same process of signal sequence mutagenesis could be accomplished. Further, the entire light chain coding sequences could be replaced by those from any eukaryotic gene and a similar process carried out. In the present example, the aim of the mutagenesis is to determine if an alternate signal sequence can be obtained of equivalent or improved efficacy, as measured by expression of mature light chain. Other components of the vector, pGCEMK, include the major intron from the human kappa light chain gene, containing its native enhancer sequence; the human kappa constant region, containing its native polyadenylation signal; a Xanthine-guanine phosphoribosyl transferase gene (gptR) under the control of an SV 40 promoter to permit selection of stable transformed cells; the bacterial colEl origin of replication, to permit high copy number plasmid replication in E . coli . The plasmid, pGCEMK, was constructed from the plasmids pHF-1 and pMLCE-10, which have been deposited with the American Type Culture Collection as Accession numbers 67,637 and 67,639, respectively. These plasmids and the construction pGCEMK are described in U.S. patent application Serial No. 07/727,719, filed July 2, 1991, which is incorporated herein by reference in its entirety.
In this example the native signal sequence is removed. In one vector, this region is replaced with the first two codons of the native signal to allow efficient initiation of translation to take place. These are fused directly to the first codon of the Variable region. Random 60-65 mer oligonucleotides are inserted into the vector by PCR. One skilled in the art has the option of conserving the signal intron in the resultant construct or deleting it, as long as the control containing the native signal has a comparable format. Following incorporation of the random nucleotides into the eukaryotic vector, the vector is introduced into suitable eukaryotic cells that are capable of expressing protein from the expression vector (for example, SP 2/0 cells, American Type Culture Collection, Rockville, MD) . In this example, electroporation is used to mediate transfection; however, those skilled in the art will be readily able to select other equally appropriate methods for introducing their selected expression vector into a suitable cell type. Positive cells are preferably selected by growing the cells on media containing hypoxanthine, mycophenolic acid and xanthine (HMAX) .
At this point, one skilled in the art could enrich for cells actually expressing protein at a detectable level, for instance, by Fluorescence Activated Flow Cytometry with Cell Sorting. In this example, cells expressing a detectable level of protein are identified by ELISA. Positive colonies are further quantitated for protein expression by a quantitative ELISA technique as compared with cells expressing protein under the control of the native regulatory sequence, and using the vector without random oligonucleotides added as a negative control. Cells expressing equal or greater levels of protein as compared with cells containing the native regulatory sequence are subjected to DNA sequencing, and identified as optimal signal sequences for the particular protein expressed.
Example 5 Creation of Novel Protein Trafficking Signals in Mammalian Cells
A critical component of immunoglobulin (Ig) protein trafficking is the binding of the chaperon protein BiP to immunoglobulin heavy chain and the subsequent displacement of BiP by light chain (Nature 306:387-389,
1983) . Secretion of the intact immunoglobulin molecule is not possible unless BiP is disassociated from the heavy chain. The kinetics of these interactions impact directly upon the level of expression of intact antibody. For example, if binding of heavy chain monomers by BiP is inefficient, non-bound heavy chain monomers can be secreted. These monomers compete with intact molecules and inhibit the secretion process. Alternatively, the inefficient displacement of BiP, by light chain, from BiP- associated heavy chain would slow the process of intact immunoglobulin secretion. The BiP binding regions within the immunoglobulin heavy chain have been narrowed to specific regions on the molecule (in CH! and VH; Blood 79:2181-2195, 1992), encoded by specific regions of DNA in the gene. The methods of this invention can be used to identify optimized sequences that increase overall expression of intact Ig without increasing the level of light or heavy chain monomers (or homodimers) .
A mammalian expression vector is selected that contains a heavy chain gene. Figure 8 illustrates one heavy chain chimeric vector, pNCEMGl, that is useful for practicing the trafficking methods of this invention. This vector uses the Immunoglobulin heavy chain promoter to transcribe the desired gene inserted into a cloning site. The vector includes an immunoglobulin heavy chain polyadenylation signal and includes a neomycin resistance gene. The neomycin resistance gene is under the control of the SV40 promoter, thereby permitting the selection of stable transformed cells in media containing the antibiotic G418. The bacterial colEl origin of replication permits high copy number plasmid replication in E . coli . Construction of this plasmid from plasmids pHGlz and pMHCE30, which have been deposited in the American Type Culture Collection (Accession Nos. 67,638 and 67,640, respectively) as described in pending U.S. patent application Serial No. 07/727,319. In this example, the native CEM immunoglobulin heavy chain V-region sequence is incorporated into the vector, but one skilled in the art will be aware that any natural or recombinant variable region could replace the native variable region and the same process of BiP binding region optimization could be carried out.
The normal Bip binding region located within the human gamma 1 constant region gene is operably linked to the recombinant variable region in this vector. A second vector is now constructed that is identical to the first except that the BiP binding region is deleted from the CH1 domain. Random 12-20 mer sequences are inserted into the vector using the PCR methods described in Example 1. Following incorporation of the random oligonucleotides into the eukaryotic vector, the vector is introduced into suitable eukaryotic cells, which produce a light chain complementary to the heavy chain used (e.g. the cell lines produced in Example 4) . The vector can be introduced into cells using methods well known in the art. Here, electroporation is used to mediate transfection. Positive cells are preferably selected by growing the cells in media containing G418.
At this point, one skilled in the art could enrich for cells actually expressing protein at a detectable level, for instance, by Fluorescence Activated Flow Cytometry with Cell Sorting. In this example, cells expressing a detectable level of protein are identified by ELISA. Positive cells are further quantitated for protein expression by a quantitative ELISA technique as compared with cells expressing the protein under the control of the native regulatory sequence, and using the vector without random oligonucleotides added as a negative control. Cells expressing equal or greater levels of intact antibody as compared with positive control cells and that do not express an increased level of free light or heavy chains are subjected to DNA sequencing to determine the DNA sequence of the trafficking signals most efficacious for expression of the immunoglobulin. While particular embodiments of the invention have been described in detail, it will be apparent to those skilled in the art that these embodiments are exemplary rather than limiting, and the true scope of the invention is that defined in the following claims.
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT: Antelman, Douglas E.
Wilson, Barry S.
(ii) TITLE OF INVENTION: METHOD FOR CREATING OPTIMIZED REGULATORY REGIONS AFFECTING PROTEIN EXPRESSION AND PROTEIN TRAFFICKING
(iii) NUMBER OF SEQUENCES: 43
(iv) CORRESPONDENCE ADDRESS: (A) ADDRESSEE: Eli Lilly and Company
(B) STREET: Lilly Corporate Center
(C) CITY: Indianapolis
(D) STATE: Indiana
(E) COUNTRY: United States of America (F) ZIP: 46285
(V) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: Apple Macintosh (C) OPERATING SYSTEM: Apple System 7.0
(D) SOFTWARE: Microsoft Word
(vi) CURRENT APPLICATION DATA: (A) APPLICATION NUMBER: (B) FILING DATE:
(C) CLASSIFICATION:
(Viϋ) ATTORNEY/AGENT INFORMATION: (A) NAME: Gaylo, Paul J. (B) REGISTRATION NUMBER: 36,808 (C) REFERENCE/DOCKET NUMBER: H-8589
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (317) 276-0756 (B) TELEFAX: (317) 276-3861
(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
ATATATTCTA GATAACGAGG CGCAAA 26
(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 54 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear (ϋ) MOLECULE TYPE: CDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
ATAAATTGGT CTCACCAGAA CCACCACCAC CACCTGCAGA GACAGTGACC
AGAG 54
(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GGTCTCGCGG TATCATTGCA GCACT 25 (2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
AATTGGTCTC GGAACCACGC TCACCGGCTC CAGAT 35
(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 53 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: CDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
ATAAATTGGT CTCATCTGGT GGTGGTGGTT CTCACCCAGA AACGCTGGTG
AAA 53
(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 44 base pairs (B) . TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
( i) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
ATATATGGAT CCGGTACCCT ATCATTACCA ATGCTTAATC AGTG 44
(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7
ATAATTAGGT CTCCCAGGCT GTGGTGACCC AG 32
(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 32 base pairs (B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
ATAATTAGGT CTCACCTGTT TTTGCGCCTC GT 32 (2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
ATAATTAGGT CTCAAGGCTG TGGTGACCCA G 31
(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 100 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
AATAATAGGT CTCTGCCTGN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN
NNNNNNNNNN 60
NNNNNNNNNN NNNNNNNNNN NNCATTTTTT GCGCCTCGTT 100
(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 amino acids
(B) TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECTJLE TYPE: peptide
(iϋ) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
Met Lys Lys Thr Ala lie Ala lie Ala Val Ala Leu Ala Gly Phe
Ala
1 5 10 15 Thr Val Ala Gin Ala
20
( 2 ) INFORMATION FOR SEQ ID NO : 12 :
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
Met Leu Tyr Pro Gly Glu Met Leu Cys Met Gin Asn Phe lie Thr Leu 1 5 10 15
lie lie Glu lie Tyr He 20
(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 22 amino acids (B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13
Met Gly Met Tyr Lys Leu Arg Leu Leu He Asp Cys Leu Met Leu Tyr 1 5 10 15
Val Cys Leu Thr Asp Thr
20
(2) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 amino acids
(B) TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iϋ) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(ix) FEATURE:
(A) NAME/KEY: Stop Codon
(B) LOCATION: 8, 10
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
Met Asn He Leu Leu Glu Asn Xaa Met Xaa Arg Thr His Asp Met Leu 1 5 10 15
He Asp Gly Thr He
20
(2) INFORMATION FOR SEQ ID NO: 15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 amino acids
(B) TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ϋi) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
Met Gly Met Tyr Lys Leu Arg Leu Leu Asp Cys Leu Met Leu Tyr Val 1 5 10 15
Cys Ser Thr Asp Thr 20
(2) INFORMATION FOR SEQ ID NO: 16:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
Met Thr Ser Val Cys Lys He Thr Val Leu He Leu Phe Xaa Thr
Gly
1 5 10 15 Thr Tyr Asn He Leu Gly
20
(2) INFORMATION FOR SEQ ID NO: 17: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
Met Arg Asn Thr Tyr Val Tyr Xaa Lys Tyr Gly Arg Thr Glu Trp Leu
1 5 10 15
Val Glu Gin He Asn Leu
20
(2) INFORMATION FOR SEQ ID NO: 18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
Met Val Gin Cys Glu Arg Xaa He Tyr His Tyr Val Gin Arg He Arg
1 5 10 15
Glu Tyr Phe Xaa Phe Glu
20
(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO ( iv) ANTI-SENSE : NO
(v) FRAGMENT TYPE : N-terminal
(xi) SEQUENCE DESCRIPTION : SEQ ID NO : 19 :
Met Val Gly He Cys Thr Gly He Gly Val Gly He Leu Leu Leu
Leu
1 5 10 15
Leu Val Val Gly Gin Val
20
(2) INFORMATION FOR SEQ ID NO: 20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: Met Leu Gly Ala Met Glu Phe Lys Lys Leu Ser Tyr Ala He Leu
Val
1 5 10 15
Leu Ser His Thr Pro Thr
20
(2) INFORMATION FOR SEQ ID NO: 21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(xi ) SEQUENCE DESCRIPTION : SEQ ID NO : 21 :
Met Asp Gin Gly He Lys Tyr Arg Leu Glu Lys Trp Arg Gly Arg He
1 5 10 15
Ser Leu He Arg Val Phe
20 (2) INFORMATION FOR SEQ ID NO: 22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
Met Gin Arg Arg Gly He Arg Trp Ala Tyr Val He Tyr Leu Ser
Met
1 5 10 15
Phe Phe Gly Ala Cys
20
(2) INFORMATION FOR SEQ ID NO: 23
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
Met Leu Gly Val Leu Thr Met Pro Cys Asp Thr Met Val Glu Asp Leu
1 5 10 15
Ala Cys Asp Tyr Met Ser
20
(2) INFORMATION FOR SEQ ID NO: 24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 120 base pairs (B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
TAATTATTCT AGAATGGGGA TGTACAAGCT GAGACTACTG ATAGATTGCC TAATGCTTTA 60
TGTTTGTTCA ACCGATACCG GAATTCCGGG TCACCCAGAA ACGCTGGTGA AAGTAAAAGA 120
(2) INFORMATION FOR SEQ ID NO: 25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 120 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
TAATTATTCT AGAATGACTT CTGGGAGCAA AATAACTGTT CTTATACTTC TTTTGACTGG 60 TACATACAAT ATATTAGGGG GAATTCCGGG TCACCCAGAA ACGTCGGTGA
AAGTAAAAGA 120
(2) INFORMATION FOR SEQ ID NO: 26:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
ATAATAGGTC TCCACCCAGA AACGCTGGTG A 31
(2) INFORMATION FOR SEQ ID NO: 27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 72 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
ATAATAGGTC TCTGGGTGNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NCATTTTTTG 60 CGCCTCGTTA TC 72
(2) INFORMATION FOR SEQ ID NO: 28:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 84 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: ATAATAGGTC TCTGGGTGNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN
NNNNNNNNNN 60
NNNCATTTTT TGCGCCTCGT TATC 84
(2) INFORMATION FOR SEQ ID NO: 29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 96 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
ATAATAGGTC TCTGGGTGNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 60
NNNNNNNNNN NNNNNCATTT TTTGCGCCTC GTTATC 96
(2) INFORMATION FOR SEQ ID NO: 30:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 31 base pairs (B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
ATAAATAGGT CTCATGAAAA AGACAGCTAT C 31 (2) INFORMATION FOR SEQ ID NO: 31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
ATAAATAGGT CTCCTCTAGA AATTGTGAAA T 31
(2) INFORMATION FOR SEQ ID NO: 32:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
ATAAATAGGT CTCATGAAAA AGACAGCTAT CG 32
(2) INFORMATION FOR SEQ ID NO: 33:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 53 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
ATAAATAGGT CTCTTTCATN NNNNNNNNNN NNNNNTCTAG AAATGTGTGA
AAT 53
(2) INFORMATION FOR SEQ ID NO: 34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34
TAACGAGGCG CAAAAA 16
(2) INFORMATION FOR SEQ ID NO: 35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
AGACACGTAC AAACCAATGA AAGAGACCTA TTTACTC 37 (2) INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
AAGCAAAGTC CCGCGAATGA AAGAGACCTA TTT 33
(2) INFORMATION FOR SEQ ID NO: 37:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
ACGTTTAAAC AGACACATGA AAGAGACCTA TTT 33
(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
GGAACTCAAA GGCACC 16
(2) INFORMATION FOR SEQ ID NO: 39:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 61 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
CACACATTTC TAGAAGACAC GTACAAACCA ATGAAAAAAG ACAGCTATCG CGATTGCAGT 60
G 61
(2) INFORMATION FOR SEQ ID NO:40:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 87 base pairs (B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: CACATTTCTA GAAAGCAAAG TCCCGCGAAT GAAAGAGACC TATTTATGAA AAAGACAGCT 60
ATCGCGATTG CAGTGGCACT GGCTGGT 87
(2) INFORMATION FOR SEQ ID NO: 41:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 70 base pairs
(B) TYPE: nucleic acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iϋ) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
CACATTTCTA GAAAGCAAAG TCCCGCGAAT GAAAAAGACA GCTATCGCGA TTGCAGTGGC 60
ACTGGCTGGT 70
(2) INFORMATION FOR SEQ ID NO: 42:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 70 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
CACATTTCTA GAACGTTTAA ACAGACACAT GAAAAAGACA GCTATCGCGA TTGCAGTGGC 60
ACTGGCTGGT 70
(2) INFORMATION FOR SEQ ID NO: 43:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 70 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: CDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: CACATTTCTA GAGGAACTCA AAGGCACCAT GAAAAAGACA GCTATCGCGA TTGCAGTGGC 60
ACTGGCTGGT 70

Claims

Claims :
1. A method for optimizing the production of polypeptide in a cell, comprising the steps of: (a) identifying at least one regulatory region within a nucleic acid sequence to be mutagenized;
(b) preparing a nucleic acid vector comprising said regulatory region and a nucleic acid sequence encoding at least one polypeptide regulated by said regulatory region;
(c) deleting said regulatory region from said vector;
(d) producing a pool of random oligonucleotides;
(e) using a polymerase chain reaction to introduce at least one random oligonucleotide into the position previously occupied by said regulatory region in a plurality of vectors to generate a pool of mutagenized vectors;
(f) introducing said mutagenized vectors into a cell sample;
(g) assaying for the expression of said polypeptide in said cell sample;
(h) selecting cells exhibiting optimized polypeptide expression; and (i) isolating optimized polypeptide from the cells of step (h) .
2. The method of Claim 1, wherein said regulatory region is located within said nucleic acid sequence encoding polypeptide.
3. The method of Claim 1, wherein said regulatory region is in a translated portion of said nucleic acid sequence encoding polypeptide.
4. The method of Claim 1 wherein said regulatory region is selected from the group consisting of a signal sequence, a ribosome binding site, a promoter sequence, a translational regulatory sequence, a transcription regulatory sequence, and a protein trafficking sequence.
5. The method of Claim 1, wherein said nucleic acid sequence encodes an antibiotic resistant gene and the selection step additionally consists essentially of growing said cell sample in the presence of an antibiotic.
6. The method of Claim 1, wherein said nucleic acid sequence encoding polypeptide encodes a selectable marker.
7. The method of Claim 1, wherein said nucleic acid sequence encoding polypeptide encodes a fusion protein.
8. The method of Claim 1, wherein said nucleic acid sequence encoding polypeptide is derived from a eukaryotic cell and said cell samples is prokaryotic.
9. The method of Claim 1, wherein said random oligonucleotides are biased.
10. The method of Claim 1, additionally comprising the steps of introducing said vector comprising said regulatory region and a nucleic acid sequence encoding at least one polypeptide regulated by said regulatory region into a second cell sample, expressing said polypeptide encoded by said vector in said second cell sample, and measuring the level of polypeptide expression in said second cell sample.
11. The method of Claim 10, additionally comprising the step of selecting cells from said first cell sample that exhibit optimized polypeptide expression relative to the measuring step of Claim 10.
12. The method of Claim 11, additionally comprising the step of determining the nucleic acid sequence of the mutagenized regulatory region producing optimized polypeptide expression.
13. The method of Claim 11, wherein said optimized polypeptide expression is a level of polypeptide expression greater than or equal to the level of polypeptide expression obtained from the measuring step of Claim 10.
14. A method for creating and isolating novel signal sequences, comprising the steps of:
(a) identifying a signal sequence region within a nucleic acid sequence encoding a polypeptide; (b) preparing a nucleic acid vector comprising said nucleic acid sequence encoding the polypeptide;
(c) introducing said vector into a first cell sample and expressing said polypeptide in said first cell sample; (d) measuring the level of polypeptide expression in said first cell sample;
(e) deleting said signal sequence region from said vector;
(f) producing a pool of random oligonucleotides; (g) using a polymerase chain reaction to introduce at least one of the random oligonucleotides into the position previously occupied by said signal sequence in a plurality of vectors to generate a pool of mutagenized vectors; (h) introducing said mutagenized vectors into a second cell sample;
(i) assaying for the expression of said polypeptide in said second cell sample;
(j) selecting cells exhibiting optimized polypeptide expression relative to step (d) ; and
(k) determining the nucleic acid sequence of the regulatory region located in the mutagenized vector contained in the cells of Step (j).
15. The method of Claim 14, wherein said signal sequence region comprises an entire signal sequence.
16. The method of Claim 14, wherein said nucleic acid vector encodes an antibiotic resistant gene and the selection step additionally consists essentially of growing said cell sample in the presence of an antibiotic.
17. The method of Claim 14, wherein said nucleic acid sequence encoding polypeptide encodes a selectable marker.
18. The method of Claim 14, wherein said polymerase chain reaction is the enzymatic inverse polymerase chain reaction.
19. The method of Claim 14, wherein said nucleic acid sequence encoding polypeptide encodes a fusion protein.
20. The method of Claim 16, wherein a portion of said fusion protein is derived from /3-lactamase.
21. The method of Claim 20, wherein a portion of said fusion protein is derived from an antibody.
22. The method of Claim 15, wherein said nucleic acid sequence encoding polypeptide is derived from a eukaryotic cell and said first and second cell samples are prokaryotic.
23. The method of Claim 22, wherein said cells are Escherichia coli .
24. The method of Claim 15, wherein said random oligonucleotides are biased.
25. The method of Claim 24, wherein said random oligonucleotides contain at least one positively charged amino acid at the N-terminus, a stretch of at least 8 hydrophobic amino acids and a small amino acid such as alanine, glycine, or valine positioned at the C-terminus.
26. The method of Claim 15, wherein said optimized level of polypeptide expression is a level greater than or equal to the level of expression of said polypeptide in step (d) .
27. A method for optimizing polypeptide expression in a cell by performing random mutagenesis on a regulatory region regulating polypeptide expression from a nucleic acid vector wherein the improvement comprises deleting a region of nucleic acid to be mutagenized, isolating said vector containing said deletion and replacing said region with random nucleic acid sequences.
28. A method for creating and isolating novel ribosome binding sites, comprising the steps of: (a) identifying a ribosome binding site within a nucleic acid sequence;
(b) preparing a nucleic acid vector comprising said nucleic acid sequence containing a ribosome binding site operably linked to a nucleic acid sequence encoding a polypeptide;
(c) introducing said vector into a first cell sample and expressing said polypeptide in said first cell sample;
(d) measuring the level of polypeptide expression in said first cell sample;
(e) deleting said nucleic acid sequence containing said ribosome binding site from said vector;
(f) producing a pool of random oligonucleotides;
(g) using a polymerase chain reaction to introduce at least one of the random oligonucleotides into the position previously occupied by said nucleic acid sequence containing said ribosome binding site in a plurality of vectors to generate a pool of mutagenized vectors; (h) introducing said mutagenized vectors into a second cell sample;
(i) assaying for the expression of said polypeptide in said second cell sample;
(j) selecting cells exhibiting optimized polypeptide expression relative to step (d) ; and (k) determining the nucleic acid sequence containing said ribosome binding site located in the mutagenized vector introduced into the cells of step (j) .
29. The method of Claim 28, wherein said nucleic sequence encodes an antibiotic resistance gene and said selection step consists essentially of growing said cell samples in the presence of an antibiotic.
30. The method of Claim 28, wherein said nucleic acid sequence encoding polypeptide encodes a selectable marker.
31. The method of Claim 30, wherein said nucleic acid sequence encoding polypeptide is a fusion protein.
32. The method of Claim 30, wherein a portion of said fusion protein is derived from an antibody.
33. The method of Claim 31, wherein a portion of said fusion protein is derived from /3-lactamase.
34. The method of Claim 28, wherein said nucleic acid sequence encoding polypeptide is derived from a eukaryotic cell and said first and second cell samples are prokaryotic.
35. The method of Claim 31, wherein said cells are Escherichia coli .
36. The method of Claim 28, wherein said random oligonucleotides are biased.
37. The method of Claim 28, wherein said polymerase chain reaction is the enzymatic inverse polymerase chain reaction.
38. The method of Claim 28, wherein said optimized level of polypeptide expression is a level greater than or equal to the level of expression of said polypeptide in step (d) .
39. A protein signal sequence corresponding to SEQ ID NO: 13 and to protein signal sequences containing at least contiguous amino acid 10-mers thereof.
40. A protein signal sequence corresponding to SEQ ID NO: 16 and to protein signal sequences containing at least contiguous amino acid 10-mers thereof.
41. A protein signal sequence corresponding to SEQ ID NO: 14 and to protein signal sequences containing at least contiguous amino acid 10-mers thereof.
42. Isolated nucleic acid containing a ribosome binding site sequence identified as SEQ ID NO: 35 and to ribosome binding sites containing at least contiguous 5- mers thereof.
43. Isolated nucleic acid containing a ribosome binding site sequence identified as SEQ ID NO: 37 and to ribosome binding sites containing at least contiguous 5- mers thereof.
44. Isolated nucleic acid containing a ribosome binding site sequence identified as SEQ ID NO: 38 and to ribosome binding sites containing at least contiguous 5- mers thereof.
45. A method for creating and isolating novel signal sequences, comprising the steps of:
(a) identifying a signal sequence region within a nucleic acid sequence encoding a polypeptide;
(b) preparing a nucleic acid vector comprising said nucleic acid sequence encoding a polypeptide, wherein said polypeptide is a fusion protein having a C-terminus corresponding to /3-lactamase;
(c) deleting said signal sequence region from said vector;
(d) producing a pool of random oligonucleotides suitable for an enzymatic inverse polymerase chain reaction;
(e) using an enzymatic inverse polymerase chain reaction to introduce said random oligonucleotides into the position in said vector previously occupied by said signal sequence to generate a pool of mutagenized vectors; (f) introducing said mutagenized vectors into a first cell sample of E . coli ,'
(g) assaying for the expression of said polypeptide in said first cell sample in the presence of ampicillin; and
(h) selecting cells exhibiting optimized polypeptide expression.
46. The method of Claim 45, wherein said signal sequence region comprises an entire signal sequence.
47. The method of Claim 45, additionally comprising the steps of introducing said vector comprising said nucleic acid sequence encoding a polypeptide, wherein said polypeptide is a fusion protein having a C-terminus corresponding to /3-lactamase into a second cell sample of, E. coli expressing said polypeptide in said second cell sample and measuring the level of polypeptide expression in said second cell sample.
48. The method of Claim 47, additionally comprising the step of selecting cells from said first cell sample that exhibit optimized polypeptide expression relative to the level of polypeptide expression observed in the measuring step of Claim 49.
49. The method of Claim 48, wherein said optimized polypeptide expression is a level of polypeptide expression greater than or equal to the level of polypeptide expression obtained from the measuring step of Claim 47.
50. The method of Claim 46 or 48 additionally comprising the step of determining the nucleic acid sequence of said regulatory region contained in the mutagenized vector isolated from cells identified by said selecting step.
51. The method of Claim 46, wherein the concentration of ampicillin in step (i) is at least 30 μg/ml.
52. The method of Claim 46, wherein said nucleic acid sequence encoding a polypeptide encodes a single-chain antibody.
53. The method of Claim 46, wherein said random oligonucleotides are biased.
54. The method of Claim 53, wherein said random oligonucleotides contain at least one positively charged amino acid at the N-terminus, a stretch of at least 8 hydrophobic amino acids and a small amino acid such as alanine, glycine or valine positioned at the C-terminus.
55. A method for identifying a novel nucleic acid sequence encoding a protein trafficking signal that directs a polypeptide to a desired location in a cell, comprising the steps of: (a) identifying a region of nucleic acid containing at least one protein trafficking signal to be mutagenized;
(b) preparing a nucleic acid vector comprising said protein trafficking signal sequence to be mutagenized and a nucleic acid sequence encoding at least one polypeptide;
(c) deleting said protein trafficking signal sequence from said vector;
(d) producing a pool of random oligonucleotides; (e) using a polymerase chain reaction to introduce at least one of the random oligonucleotides into the position previously occupied by said protein trafficking signal in a plurality of vectors to generate a pool of mutagenized vectors; (f) introducing said mutagenized vectors into a cell sample;
(g) assaying for the location of said polypeptide in said cell sample;
(h) selecting cells expressing said polypeptide in the desired cell location; and (i) determining the nucleic acid sequence of the protein trafficking signal contained in the mutagenized vector located in the cells of Step (h) .
56. The method of Claim 55, wherein said protein trafficking sequence to be mutagenized is located in said nucleic acid sequence encoding said polypeptide.
57. The method of Claim 55, wherein said polymerase chain reaction is the enzymatic inverse polymerase chain reaction.
58. The method of Claim 55, wherein said desired cell location is extracellular.
59. The method of Claim 55, wherein said nucleic sequence encodes an antibiotic resistant gene and said selection step consists essentially of growing said cell sample in the presence of an antibiotic.
60. The method of Claim 55, wherein said polypeptide is a selectable marker.
61. The method of Claim 55, wherein said polypeptide is a fusion protein.
62. The method of Claim 55, wherein said polypeptide is derived from a eukaryotic cell and said cell sample is prokaryotic.
63. The method of Claim 55, wherein said random oligonucleotides are biased.
64. A method for creating and isolating novel regulatory sequences for optimizing the expression of a recombinant polypeptide in prokaryotic cells comprising the steps of:
(a) identifying at least one regulatory region within a nucleic acid sequence to be mutagenized;
(b) preparing a nucleic acid vector suitable for expressing polypeptide in a prokaryotic cell, said nucleic acid vector comprising said regulatory region and a nucleic acid sequence encoding at least one polypeptide operably linked to said regulatory region; (c) introducing said vector into a first prokaryotic cell sample and expressing said polypeptide encoded by said vector in said first cell sample;
(d) measuring the level of polypeptide expression in said first prokaryotic cell sample;
(e) deleting said regulatory region from said vector;
(f) producing a pool of random oligonucleotides;
(g) using a polymerase chain reaction to introduce at least one random oligonucleotide into said position previously occupied by said regulatory region in a plurality of vectors to generate a pool of mutagenized vectors;
(h) introducing said mutagenized vectors into a first prokaryotic cell sample;
(i) assaying for the expression of the polypeptide in said second prokaryotic cell sample; (j) selecting cells exhibiting optimized polypeptide expression relative to step (d) ; and (k) determining the nucleic acid sequence of said mutagenized regulatory region located within the vector introduced into the cells of step (j).
65. The method of Claim 64, wherein said regulatory region comprises an entire regulatory sequence.
66. The method of Claim 65, wherein said regulatory region is selected from the group consisting of a signal sequence, a ribosome binding site, a promoter sequence, a translational regulatory sequence, a transcription regulatory sequence and a protein trafficking sequence.
67. The method of Claim 65, wherein said nucleic acid sequence encoding polypeptide encodes an antibiotic resistant gene and the selection step additionally consists of growing said second prokaryotic cell sample in the presence of an antibiotic.
68. The method of Claim 65, wherein said nucleic acid sequence encoding polypeptide encodes a selectable marker.
69. The method of Claim 65, wherein said nucleic acid sequence encoding polypeptide encodes a fusion protein.
70. The method of Claim 65, wherein said random oligonucleotides are biased.
71. The method of Claim 65, wherein said nucleic acid sequence encoding polypeptide encodes a polypeptide derived from a eukaryotic cell.
72. The method of Claim 65, wherein said first and second cell samples are derived from the same cell type.
PCT/US1994/004651 1993-04-28 1994-04-28 Method for creating optimized regulatory regions affecting protein expression and protein trafficking WO1994025609A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU67761/94A AU6776194A (en) 1993-04-28 1994-04-28 Method for creating optimized regulatory regions affecting protein expression and protein trafficking

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US5473093A 1993-04-28 1993-04-28
US08/054,730 1993-04-28

Publications (1)

Publication Number Publication Date
WO1994025609A1 true WO1994025609A1 (en) 1994-11-10

Family

ID=21993131

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1994/004651 WO1994025609A1 (en) 1993-04-28 1994-04-28 Method for creating optimized regulatory regions affecting protein expression and protein trafficking

Country Status (2)

Country Link
AU (1) AU6776194A (en)
WO (1) WO1994025609A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998007846A1 (en) * 1996-08-23 1998-02-26 Peter Ruhdal Jensen Artificial promoter libraries for selected organisms and promoters derived from such libraries
WO2003089605A2 (en) * 2002-04-22 2003-10-30 Genencor International, Inc. Method of creating a library of bacterial clones with varying levels of gene expression
EP1576094A2 (en) * 2002-04-22 2005-09-21 Genencor International, Inc. Methods of creating modified promoters resulting in varying levels of gene expression
WO2009020899A1 (en) * 2007-08-03 2009-02-12 Dow Global Technologies Inc. Translation initiation region sequences for the optimal expression of heterologous proteins
WO2009145606A1 (en) * 2008-05-28 2009-12-03 U-Protein Express B.V. Method for the production of proteins or protein fragments
WO2010002966A2 (en) * 2008-07-03 2010-01-07 Dow Global Technologies Inc. High throughput screening method and use thereof to identify a production platform for a multifunctional binding protein
US20120238456A1 (en) * 2009-08-12 2012-09-20 Uni'TargetingResearch AS Rational library
WO2012151503A3 (en) * 2011-05-04 2013-03-21 The Broad Institute, Inc. Multiplexed genetic reporter assays and compositions
US8691952B2 (en) 2005-12-30 2014-04-08 Merck Patent Gmbh Anti-CD19 antibodies with reduced immunogenicity
US8835606B2 (en) 2004-01-22 2014-09-16 Merck Patent Gmbh Anti-cancer antibodies with reduced complement fixation
US8907066B2 (en) 2009-04-22 2014-12-09 Merck Patent Gmbh Antibody fusion proteins with a modified FcRn binding site
US8926973B2 (en) 2001-03-30 2015-01-06 Merck Patent Gmbh Reducing the immunogenicity of fusion proteins
US9029330B2 (en) 2005-12-30 2015-05-12 Merck Patent Gmbh Methods of treating cancer using interleukin-12p40 variants having improved stability
US9534217B2 (en) 2002-04-22 2017-01-03 Danisco Us In. Method of creating a library of bacterial clones with varying levels of gene expression

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4683195A (en) * 1986-01-30 1987-07-28 Cetus Corporation Process for amplifying, detecting, and/or-cloning nucleic acid sequences
US5096815A (en) * 1989-01-06 1992-03-17 Protein Engineering Corporation Generation and selection of novel dna-binding proteins and polypeptides

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4683195A (en) * 1986-01-30 1987-07-28 Cetus Corporation Process for amplifying, detecting, and/or-cloning nucleic acid sequences
US4683195B1 (en) * 1986-01-30 1990-11-27 Cetus Corp
US5096815A (en) * 1989-01-06 1992-03-17 Protein Engineering Corporation Generation and selection of novel dna-binding proteins and polypeptides

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
NATURE, Volume 354, issued 07 November 1991, R.A. HOUGHTEN et al., "Generation and Use of Synthetic Peptide Combinatorial Libraries for Basic Research and Drug Discovery", pages 84-86. *
NUCLEIC ACIDS RESEARCH, Volume 16, Number 11, issued 10 June 1988, K.T. MIN et al., "Search for the Optimal Sequence of the Ribosomal Binding Site by Random Oligonucleotide-Directed Mutagenesis", pages 5075-5099. *
NUCLEIC ACIDS RESEARCH, Volume 16, Number 15, issued 11 August 1988, A.R. OLIPHANT et al., "Defining the Consensus Sequences of E. Coli Promoter Elements by Random Selection", pages 7673-7683. *
SCIENCE, Volume 235, issued 16 January 1987, C.A. KAISER et al., "Many Random Sequences Functionally Replace the Secretion Signal Sequence of Yeast Invertase", pages 312-317. *
THE JOURNAL OF BIOLOGICAL CHEMISTRY, Volume 264, issued 05 December 1989, D.B. LEMIRE et al., "The Mitochondrial Targeting Function of Randomly Generated Peptide Sequences Correlates with Predicted Helical Amphiphilicity", pages 20206-20215. *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2034021A1 (en) * 1996-08-23 2009-03-11 Peter Ruhdal Jensen Artificial promoter libraries for selected organisms and promoters derived from such libraries
WO1998007846A1 (en) * 1996-08-23 1998-02-26 Peter Ruhdal Jensen Artificial promoter libraries for selected organisms and promoters derived from such libraries
US8926973B2 (en) 2001-03-30 2015-01-06 Merck Patent Gmbh Reducing the immunogenicity of fusion proteins
US9534217B2 (en) 2002-04-22 2017-01-03 Danisco Us In. Method of creating a library of bacterial clones with varying levels of gene expression
EP1576094A4 (en) * 2002-04-22 2006-08-30 Genencor Int Methods of creating modified promoters resulting in varying levels of gene expression
WO2003089605A2 (en) * 2002-04-22 2003-10-30 Genencor International, Inc. Method of creating a library of bacterial clones with varying levels of gene expression
EP1576094A2 (en) * 2002-04-22 2005-09-21 Genencor International, Inc. Methods of creating modified promoters resulting in varying levels of gene expression
WO2003089605A3 (en) * 2002-04-22 2004-12-02 Genencor Int Method of creating a library of bacterial clones with varying levels of gene expression
US8835606B2 (en) 2004-01-22 2014-09-16 Merck Patent Gmbh Anti-cancer antibodies with reduced complement fixation
US10633452B2 (en) 2004-01-22 2020-04-28 Merck Patent Gmbh Anti-cancer antibodies with reduced complement fixation
US9617349B2 (en) 2004-01-22 2017-04-11 Merck Patent Gmbh Anti-cancer antibodies with reduced complement fixation
US10017579B2 (en) 2004-01-22 2018-07-10 Meck Patent Gmbh Anti-cancer antibodies with reduced complement fixation
US11208496B2 (en) 2005-12-30 2021-12-28 Cancer Research Technology Ltd. Anti-CD19 antibodies with reduced immunogenicity
US10072092B2 (en) 2005-12-30 2018-09-11 Merck Patent Gmbh Methods of use of anti-CD19 antibodies with reduced immunogenicity
US8691952B2 (en) 2005-12-30 2014-04-08 Merck Patent Gmbh Anti-CD19 antibodies with reduced immunogenicity
US8957195B2 (en) 2005-12-30 2015-02-17 Merck Patent Gmbh Anti-CD19 antibodies with reduced immunogenicity
US9029330B2 (en) 2005-12-30 2015-05-12 Merck Patent Gmbh Methods of treating cancer using interleukin-12p40 variants having improved stability
WO2009020899A1 (en) * 2007-08-03 2009-02-12 Dow Global Technologies Inc. Translation initiation region sequences for the optimal expression of heterologous proteins
WO2009145606A1 (en) * 2008-05-28 2009-12-03 U-Protein Express B.V. Method for the production of proteins or protein fragments
WO2010002966A3 (en) * 2008-07-03 2010-07-22 Dow Global Technologies Inc. High throughput screening method and use thereof to identify a production platform for a multifunctional binding protein
WO2010002966A2 (en) * 2008-07-03 2010-01-07 Dow Global Technologies Inc. High throughput screening method and use thereof to identify a production platform for a multifunctional binding protein
US8907066B2 (en) 2009-04-22 2014-12-09 Merck Patent Gmbh Antibody fusion proteins with a modified FcRn binding site
US20120238456A1 (en) * 2009-08-12 2012-09-20 Uni'TargetingResearch AS Rational library
WO2012151503A3 (en) * 2011-05-04 2013-03-21 The Broad Institute, Inc. Multiplexed genetic reporter assays and compositions
US11767534B2 (en) 2011-05-04 2023-09-26 The Broad Institute, Inc. Multiplexed genetic reporter assays and compositions

Also Published As

Publication number Publication date
AU6776194A (en) 1994-11-21

Similar Documents

Publication Publication Date Title
Schwer et al. PRP16 is an RNA-dependent ATPase that interacts transiently with the spliceosome
CA2113363C (en) Process for constructing a cdna library and a novel polypeptide and dna coding the same
Bird et al. Translocation in yeast and mammalian cells: not all signal sequences are functionally equivalent.
WO1994025609A1 (en) Method for creating optimized regulatory regions affecting protein expression and protein trafficking
Matsufuji et al. Autoregulatory frameshifting in decoding mammalian ornithine decarboxylase antizyme
Lewis et al. The ERD2 gene determines the specificity of the luminal ER protein retention system
AU693569B2 (en) Synthetic leader peptide sequences
EP0555370B1 (en) Methods and applications for efficient genetic suppressor elements
EP1625219B1 (en) Methods for producing secreted polypeptides
Johnson et al. Pichia pastoris Pex14p, a phosphorylated peroxisomal membrane protein, is part of a PTS–receptor docking complex and interacts with many peroxins
FI88407B (en) DNA MOLEKYL, TRANSFORMER JAESTCELLER OCH FOERFARANDE FOER FRAMSTAELLNING AV HUMAN-LYSOZYM
AU2007310870A1 (en) Improved alpha factor signal peptide for producing a polypeptide
JP4809347B2 (en) Signal peptide for polypeptide production
CS267192A3 (en) Expression systems
JPH07250689A (en) Periplasm membrane connective system for detecting protein-protein interaction
JP4528623B2 (en) Rapidly degradable reporter fusion protein
Nogami et al. Probing novel elements for protein splicing in the yeast Vmal protozyme: a study of replacement mutagenesis and intragenic suppression
Günther et al. The Saccharomyces cerevisiae TRG1 gene is essential for growth and encodes a lumenal endoplasmic reticulum glycoprotein involved in the maturation of vacuolar carboxypeptidase.
US5948677A (en) Reading frame independent epitope tagging
JP4180112B2 (en) Vector for expression of N-terminally extended protein in yeast cells
Allison et al. Single-amino-acid substitutions within the signal sequence of yeast prepro-α-factor affect membrane translocation
JP2769541B2 (en) Equilibrium-type inducible transcription system
DK2646558T3 (en) Promoters for expression of genes in a fungal cell
US20050106671A1 (en) Expression vector, host cell and method for producing fusion proteins
Kölling et al. The first hydrophobic segment of the ABC‐transporter, Ste6, functions as a signal sequence

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AT AU BB BG BR BY CA CH CN CZ DE DK ES FI GB HU JP KG KP KR KZ LK LU LV MD MG MN MW NL NO NZ PL PT RO RU SD SE SI SK TJ TT UA UZ VN

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA