NEW EXPRESSION VECTORS
The present invention relates to novel expression vectors comprising, besides the usual transcription and translation regulating sequences (promoter, operator, ribosomal binding site, translation start codon, transcription terminators, etc.), a sequence coding for a part of β-galactosidase comprising 15-20 amino acids, a repetition of the operator sequence or any part thereof, a further ribosomal binding site, a translation start codon, a sequence coding for a homopolymer oligopeptide comprising 5-10 amino acids, and a further ATG codon. The invention also relates to a method for the production of the above expression vectors.
The method of in vitro DNA recombination can be used for introducing genes of whatever origin and information content into bacterial cells. The use of the proper vehicle molecules, the expression vectors, ensures that the foreign DNA exists stably in the cell and that the encoded information is expressed.
When a gene coding for a protein is expressed in a bacterial cell, the enzymes of the cell first synthesize mRNA from the DNA during transcription and then synthesize the protein from the mRNA during translation. Expression is initiated at a so-called promoter region: the RNA polymerase recognizes the promoter and binds to it at the initiation of transcription. The method of in vitro DNA recombination has practical value only if the target protein is biosynthesized in sufficient amount, that is, if both transcription and translation proceed with adequate intensity and the mRNA and protein produced are sufficiently stable in the host cell.
The expression of foreign (eucaryotic) genes is sometimes very difficult for the following reasons:
- the gene product is toxic to the host cell;
- the mRNA synthesized from the given gene is not stable in the bacterial cell;
- the gene product (the protein) degrades very fast and cannot accumulate.
The first problem can be overcome by a regulatable promoter that can be switched on at a chosen moment, namely when the bacterial culture has reached a suitable density. At that stage it is no longer a problem if the cells stop dividing and only continue to synthesize the product, even though the conditions, precisely because of the synthesis and toxicity of the product, are not ideal for the host cell.
In most cases protection at the mRNA and protein level is achieved by "masking" the gene in question: a part of a gene of bacterial origin is fused to the 5'-end of the gene of the product. In most cases this protecting peptide is a part of the β-galactosidase enzyme, and its relative amount in the fusion product is very high, since a sequence of at least 150 amino acids is needed for protection. This is a serious disadvantage in the purification of the product, since the useful protein makes up only a small part of the fusion protein and the most widely used separation procedure, BrCN cleavage, yields many fragments from which the authentic one is difficult to select.
Recently, the fusion of a short DNA tail coding for a homopolymer of amino acids to the 5'-end of the gene was developed and reported, with the aim of protecting foreign proteins in E. coli (Wing L. Sung et al.: Short synthetic oligodeoxyribonucleotide leader sequences enhance accumulation of human proinsulin synthesized in Escherichia coli, Proc. Natl. Acad. Sci. USA, Vol. 83, pp. 561-565, 1986). With some amino acids (Thr, Gln, Ser) very good expression could be achieved (6-26 % of the total protein content), and the share of the useful product in the fusion protein increased remarkably, since the homopolymer part consisted of only 6-8 amino acids. This vector, however, cannot be used as universally as the vector of the present invention. The reason is that the short sequence coding for β-galactosidase described in the above report, only three amino acids long, does not give sufficient mRNA level protection. It appears from the report that the authors did not attach importance to this sequence since, on the basis of previous experiments, it was supposed that only a sequence of the usual length (coding for approx. 150 amino acids) could have a stabilising effect of this type.
It has been found, rather surprisingly, that mRNA level protection can be achieved if the region coding for the homopolymer of amino acids is inserted after a short β-galactosidase DNA sequence (coding for 15-20 amino acids) and before the gene to be expressed. This "sandwich" arrangement ensures sufficient mRNA and protein level protection. Our conclusion is that the β-galactosidase coding part is important for the mRNA level protection, while the homopolymer amino acid tail is of greater importance at the protein level.
Our aim was to develop a new expression vector that assures a good protein yield and, at the same time, can be reliably regulated, and provides sufficient mRNA and protein level stability in case of any kind of gene to be expressed.
From suitable members of the expression vector family pERVI/23 (described in Hungarian application No. 4111/87), the plasmid pER 23 threo-alpha, which complies with the above requirements, was constructed using in vitro DNA recombination techniques (the restriction and functional map of the plasmid is presented in Fig. 2/a). The construction of this plasmid was made possible by our finding that a surprisingly short sequence (40-60 nucleotides long) coding for β-galactosidase, together with a nucleotide sequence coding for a homopolymer of amino acids, gives significant mRNA and protein level protection. Moreover, the introduction of a nearly complete operator region further improved the controllability of the vector, and the introduction of a further ribosomal binding site and translation start codon significantly widened the range of proteins that can be produced with the vector.
Thus, it is an object of the present invention to provide a method for the construction of new expression vectors, comprising introducing a sequence coding for 15-20 amino acids of β-galactosidase, a repetition of the operator sequence or any part thereof, a further ribosomal binding site, a translation start codon, a sequence coding for a homopolymer oligopeptide comprising 5-10 amino acids, and a further ATG codon into a derivative of plasmid vector pERVI/23, after the regions of the promoter, operator, ribosomal binding site and translation start codon and before the gene to be expressed.
The main characteristics of the expression vector according to the present invention, and of the gene expression provided thereby, are the following:
- the vector carries a very strong regulatable promoter;
- the mRNA and protein level protection is provided by the sequence coding for β-galactosidase and by the homopolymer amino acid tail encoded by the sequence coding for the homopolymer of amino acids, both located before the gene to be expressed;
- the length of the tail is only about 40 amino acids, which makes the purification of the product significantly easier;
- due to the ATG codon inserted before the gene to be expressed, the undesired fused protein parts can be removed from the product protein by a simple BrCN cleavage;
- after the β-galactosidase part a second, nearly intact lac operator sequence can be found (in reading frame with the amino acid coding sequence and the β-galactosidase coding part). This second operator, together with the first intact operator after the promoter, makes the repression perfect even after such a strong promoter as promoter 6/23, formed from the p2 promoter of the ribosomal RNA operon and regulating sequences of the E. coli lac operon (Hungarian application No. 4111/87);
- after the β-galactosidase coding part and before the sequence coding for the homopolymer of amino acids a second ribosomal binding site and a translation start codon can be found. Experimental data show that for certain proteins only the first or only the second translation signal is used, while for others both are used; in practice this increases the degree of freedom of the system, since the translation apparatus of the bacterium can select the more efficient, more easily usable signal;
- the vector can be used for the expression of any other protein gene containing a ClaI linker, or any protein gene inserted into the ClaI site of the vector (other restriction sites of the linker can of course also be used);
- with the aid of the homopolymer amino acid tail, one-step purification of different proteins based on immunological methods is possible;
- in the case of the genes tested so far (human proinsulin, synthetic insulin A and B chains, vasoactive intestinal polypeptide) a very high level of expression could be achieved with the vector (25-30 % of the total protein content of the cell).
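The purification argument in the list above can be made concrete with a little arithmetic. The following is only an illustrative sketch: the 86-residue length of human proinsulin is a textbook value not stated in this specification, the helper names are hypothetical, and the toy fusion sequence is invented. It compares the share of useful protein in a fusion carrying a tail of about 40 residues with one carrying a 150-residue β-galactosidase part, and mimics BrCN cleavage, which cuts the chain on the C-terminal side of each methionine.

```python
PROINSULIN_LEN = 86   # assumption: textbook length of human proinsulin, not from the text


def useful_fraction(product_len, leader_len):
    """Fraction of the fusion protein (by residue count) that is the wanted product."""
    return product_len / (product_len + leader_len)


def brcn_fragments(protein):
    """Split a one-letter protein sequence after every methionine (M).

    Chemically, BrCN converts the methionine into a C-terminal homoserine
    (lactone); the sketch keeps the letter 'M' for readability.
    """
    fragments, current = [], ""
    for residue in protein:
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


short_tail = useful_fraction(PROINSULIN_LEN, 40)    # tail of about 40 residues (from the text)
long_tail = useful_fraction(PROINSULIN_LEN, 150)    # 150-residue protecting part (from the text)

# Invented toy fusion: leader + seven threonines + linker-encoded "SM" + start of a product.
toy_fusion = "MTMITD" + "TTTTTTT" + "SM" + "FVNQHLCGSH"
pieces = brcn_fragments(toy_fusion)
product = pieces[-1]   # everything after the last methionine
```

In this toy case the product is released intact only because it contains no internal methionine; a product with internal methionines would itself be fragmented.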
The method according to the present invention elaborated for the construction of the above mentioned expression vector suitably comprises the following steps, detailed later.
1. A peptide is selected whose gene is available in a cloned form and which is known to be very unstable in E. coli cells (for example human proinsulin).
2. A Cla I linker is ligated to the 5'-end (start) of the gene in such a way that the final 3 nucleotides of the linker (ATG) are located directly before the nucleotide triplet coding for the first amino acid (the amino acid methionine coded by the codon ATG makes the BrCN cleavage of the fusion product possible), then the gene starting with the Cla I linker is cloned into a suitable plasmid following the lac regulator sequences (operator and ribosomal binding site).
3. Into the unique Cla I restriction site of the obtained plasmid a synthetic oligonucleotide is inserted which contains a sequential coding region for several threonine residues. In the case of correct insertion, the threonine triplets are in reading frame with the gene and the Cla I site is regenerated only at the end of the oligonucleotide closer to the gene.
4. From the obtained plasmid, the DNA fragment containing the lac regulator region, the oligonucleotide coding for the threonines, the Cla I linker and the gene is cut out by suitable restriction enzymes and cloned into the unique PvuII restriction site of the α-peptide coding region of plasmid pER VI/23 or any derivative thereof, in such a way that a unique restriction site (R) is generated in the position corresponding to the PvuII site. In the relevant region of the obtained plasmid, DNA regions with the following functions can be found sequentially: 6/23 promoter; lac regulator region I; DNA sequence coding for the first 34 amino acids of the α-peptide; R unique restriction site; lac regulator region II; DNA sequence coding for the homopolymer threonine oligopeptide; Cla I linker with the ATG codon; and the gene of the peptide to be stabilized.
5. In the next step the plasmid is linearized using the R restriction site, then deletions are made with Bal31 exonuclease under conditions in which the enzyme digests only a few tens of nucleotides, then the plasmid is recircularized with DNA ligase.
6. After transformation, clones are selected which express foreign protein of the desired size at a high level.
7. Using immunological methods (RIA, immunoblot) it is proved that the fusion peptide produced contains the protein to be expressed.
8. The precise nucleotide sequence of the plasmid is determined by sequencing.
9. Using the Cla I restriction site, the DNA sequence coding for the gene is replaced by a polylinker sequence, which makes it possible to clone the genes of any other peptides.
The method according to the present invention is illustrated by way of the following example, without limiting the scope of the claims.
Example 1
Construction of plasmid pER23 Threo-α
When carrying out the construction, three different plasmids were used as starting material: pSZI 153, pERVI/23 PLH4 and pERVI/23/+ATG/. The plasmid named pSZI 153 is described in our Hungarian patent application No. 4363/84; the other two are members of the vector family described in our Hungarian patent application No. 4111/87.
The DNA region coding for human proinsulin is cut out from plasmid pSZI 153 with ClaI and HindIII restriction enzymes, where on the removed DNA fragment, the nucleotide triplet originating from the ClaI linker is directly followed by the codon TTT coding for the first amino acid of the protein. In the first step, the DNA fragment (proinsulin gene) produced by digestion with ClaI, HindIII restriction enzymes is cloned into the ClaI-HindIII site of the polylinker region of plasmid pER23(+ATG).
In the second step, a synthetic double-stranded oligonucleotide is inserted into the ClaI site of the obtained plasmid (hereinafter Intermediate I), directly before the proinsulin gene. The oligonucleotide contains, in sequence, the codons of seven threonine residues and carries two sticky ends that can be ligated to a plasmid digested with ClaI enzyme. It is designed in such a way that, when inserted in the correct orientation, the threonine codons are in reading frame with the proinsulin gene and the ClaI restriction site is regenerated only at the end of the oligonucleotide closer to the gene. (This is necessary so that the proinsulin gene can later be replaced by the gene of any other peptide using the resulting unique ClaI restriction site.)
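The design constraints on the oligonucleotide (reading frame kept, ClaI site regenerated only at the gene-proximal end) can be checked in silico. The sketch below is a hypothetical illustration: the actual oligonucleotide sequence is not given in this text, so `CORE`, the flanking `VECTOR` sequence and all function names are invented for the example. It relies only on the known ClaI recognition site AT^CGAT, which leaves 5'-CG overhangs, and on the standard genetic code.

```python
CLAI_SITE = "ATCGAT"                       # ClaI cuts AT^CGAT, leaving 5'-CG overhangs
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def clai_sites(seq):
    """Start positions of every ClaI site in the sequence."""
    return [i for i in range(len(seq) - 5) if seq[i:i + 6] == CLAI_SITE]


def ligate_into_clai(vector, core, forward=True):
    """Top strand after ligating an insert with 5'-CG overhangs into the ClaI site.

    `core` is the top-strand part of the insert that follows its 5'-CG overhang.
    """
    cut = vector.index(CLAI_SITE) + 2      # the cut lies between AT and CGAT
    body = core if forward else revcomp(core)
    return vector[:cut] + "CG" + body + vector[cut:]


# Invented example: upstream filler, ClaI linker ending in ATG, first codons of a gene.
VECTOR = "AGGAAACAGCT" + "ATCGATG" + "TTTGTGAACCAACAC"
# Seven threonine codons; the last one (ACA) plus the extra T ends the core in "AT".
CORE = "ACTACCACGACTACCACGACA" + "T"

CUT = VECTOR.index(CLAI_SITE) + 2
FRAME_START = CUT - 1                      # the T of the left "AT" starts the codon TCG

forward = ligate_into_clai(VECTOR, CORE, forward=True)
reverse = ligate_into_clai(VECTOR, CORE, forward=False)

forward_peptide = translate(forward[FRAME_START:])
reverse_peptide = translate(reverse[FRAME_START:])
```

In this toy design the correct orientation keeps a single ClaI site directly before the ATG and reads through seven threonines into the gene, whereas the reversed insert regenerates the site at the upstream end and runs into a stop codon. Any real design would have to be checked against the full plasmid sequence.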
It should be noted that, contrary to the statements of the Sung et al. report (see the introduction), there was no demonstrable expression in the E. coli clones transformed with the plasmid prepared by the above procedure; this plasmid is hereinafter called Intermediate II.
In the next step the unique NsiI restriction site of the Intermediate II plasmid is converted to an EcoRI restriction site. After NsiI digestion, the 3' overhanging ends are removed by digestion with T4 polymerase, then the plasmid is recircularised with DNA ligase in the presence of a large amount of EcoRI linker phosphorylated by polynucleotide kinase. The resulting plasmid is hereinafter called Intermediate III.
Afterwards, an EcoRI linker is likewise inserted into the unique PvuII site of plasmid pER23 PLH4, which lies in the middle of the α-peptide coding region; the difference is that the T4 polymerase treatment is unnecessary, because the PvuII enzyme generates blunt ends.
Thereupon the EcoRI-HindIII fragment originating from Intermediate III is cloned into the EcoRI-HindIII site of the resulting plasmid (hereinafter called Intermediate IV). The plasmid obtained is hereinafter called Intermediate V; the structure of its relevant region is shown in Figure 1.
In the next step the Intermediate V plasmid is digested with EcoRI restriction enzyme and a series of deletions is generated with Bal31 exonuclease in both directions from the EcoRI site. The member of the series is selected in which the Bal31 enzyme has digested some dozens of nucleotides in both directions. In this way the distance between the α-peptide coding region and the seven threonine codons is reduced.
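Why only some members of the deletion series are useful can be illustrated by simple bookkeeping of the reading frame. The sketch below is an assumption-laden illustration, not the selection procedure of the example (which screens clones by expression): `SPACER_NT`, the distance from the EcoRI site to the first threonine codon, is a hypothetical value, and the function name is invented. It enumerates bidirectional deletions and keeps those that leave 15-20 α-peptide codons and preserve read-through from the α-peptide codons into the threonine codons.

```python
ALPHA_NT = 34 * 3        # α-peptide coding nucleotides upstream of the site (34 codons, from the text)
SPACER_NT = 90           # HYPOTHETICAL distance from the site to the first threonine codon
MIN_CODONS, MAX_CODONS = 15, 20   # size of the retained β-galactosidase part (from the text)


def in_frame_deletions(max_each=60):
    """Return (left, right) deletion sizes, in nucleotides, that satisfy both conditions.

    left  = nucleotides removed upstream of the site (from the α-peptide region)
    right = nucleotides removed downstream of the site (from the spacer)
    """
    hits = []
    for left in range(max_each + 1):
        alpha_left = ALPHA_NT - left
        if not MIN_CODONS * 3 <= alpha_left <= MAX_CODONS * 3:
            continue
        for right in range(min(max_each, SPACER_NT) + 1):
            spacer_left = SPACER_NT - right
            # Read-through needs a whole number of codons before the threonine codons.
            if (alpha_left + spacer_left) % 3 == 0:
                hits.append((left, right))
    return hits


candidates = in_frame_deletions()
```

Under these assumptions about one deletion in three keeps the frame. Because the construct also carries a second ribosomal binding site and start codon before the threonine codons, out-of-frame deletions are not necessarily silent, which is one reason the example relies on screening rather than calculation.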
After recircularisation with DNA ligase the resulting mixture of plasmids is transformed into E. coli JM107 cells, and clones are selected that express, in large amount, a foreign protein in the desired size range. (To characterize the resulting plasmid precisely, a suitable clone was selected and the end-points of the deletion were determined by DNA sequencing of the plasmid prepared from it. The actual construction and sequence are shown in Fig. 2/a, Fig. 2/b and Fig. 3. The fusion protein produced reacts strongly with antibodies specific for human proinsulin.)
Finally, in the last step the ClaI-HindIII fragment containing the proinsulin gene is replaced by a ClaI-HindIII polylinker fragment of plasmid pERVI/23 PLH4, originating from miniplasmid πvx.
Chemicals and preparations
The generally used laboratory chemicals were commercially available REANAL, SIGMA and MERCK products of pro analysi quality. The specific activity of the (γ-32P)ATP and (α-32P)dATP preparations (Hungarian Isotope Institute) was 100-150 TBq/mmol.
The restriction endonucleases PvuII, EcoRI and HindIII were purified by coworkers of the Nucleic Acids Department of the Biochemistry Institute of the Hungarian Academy of Sciences on the basis of published methods (Methods in Enzymology (Eds: L. Grossman, K. Moldave) Vol. 65, pp. 89-180). Enzymes ClaI and NsiI were from New England Biolabs.
Bal31 nuclease and Klenow enzyme (E. coli DNA polymerase I Large Fragment) were from New England Biolabs, bacterial alkaline phosphatase (BAP) was from Worthington, and pancreatic RNase and lysozyme were from Reanal.
T4 induced polynucleotide ligase was prepared by the method of Murray et al. (Murray, N.E., Bruce, S.A. and Murray, K.: Molecular cloning of the DNA ligase gene from bacteriophage T4. J. Mol. Biol., 132, pp. 493-505 (1979)).
Strains
E. coli K12 JM107 (Yanisch-Perron, C. et al.: Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene, 33, 103-119 (1985)).
Escherichia coli strains were grown in liquid medium containing 10 g Bacto-tryptone, 5 g Bacto yeast extract and 5 g NaCl per litre. Solid culture medium was prepared by adding 15 g Bacto-agar per litre to the above liquid medium.
Indicator plates suitable for estimating the expression of the N-terminal part of the β-galactosidase enzyme (α-peptide) were composed of the following components per litre:
20 g casamino acids
6 g Na2HPO4
3 g KH2PO4
0.5 g NaCl
1 g NH4Cl
15 g Bacto-Agar
20 mg X-gal (dissolved in dimethylformamide)
2 mM MgCl2
0.1 mM CaCl2
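For batches other than one litre the recipe above scales linearly. The following small sketch shows the arithmetic; the function name is hypothetical, and the use of 1 M stock solutions for MgCl2 and CaCl2 is an assumption made only to convert the stated final concentrations into pipetting volumes.

```python
# Amounts per litre, taken from the recipe above.
PER_LITRE = {
    "casamino acids": (20, "g"),
    "Na2HPO4": (6, "g"),
    "KH2PO4": (3, "g"),
    "NaCl": (0.5, "g"),
    "NH4Cl": (1, "g"),
    "Bacto-Agar": (15, "g"),
    "X-gal": (20, "mg"),
}
# Final concentrations in mM, taken from the recipe above.
FINAL_MM = {"MgCl2": 2.0, "CaCl2": 0.1}
ASSUMED_STOCK_M = 1.0    # ASSUMPTION: salts are added from 1 M stock solutions


def plate_recipe(volume_l):
    """Amounts needed for `volume_l` litres of indicator medium."""
    amounts = {name: (qty * volume_l, unit) for name, (qty, unit) in PER_LITRE.items()}
    for name, mm in FINAL_MM.items():
        # mM x litres = mmol; a 1 M stock holds 1 mmol per ml.
        stock_ml = mm * volume_l / ASSUMED_STOCK_M
        amounts[f"{name} ({ASSUMED_STOCK_M:g} M stock)"] = (stock_ml, "ml")
    return amounts


half_litre = plate_recipe(0.5)
```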
The cells containing the plasmid carrying the β-lactamase gene were grown in a medium containing 100 μg/ml ampicillin.
For the isolation of plasmid DNA, the bacterial strain carrying the plasmid in question was grown in a medium containing 100 μg/ml ampicillin and, when the culture reached an OD600nm of 0.7-0.8, 170 μg/ml chloramphenicol was added to the culture medium for the amplification of the plasmid. When isolating plasmid DNA in preparative amounts, a clear lysate was prepared by the method of Clewell and Helinski (Clewell, D.B. and Helinski, D.R. (1969) Supercoiled circular DNA-protein complex in Escherichia coli: purification and induced conversion to an open circular DNA form. Proc. Natl. Acad. Sci. USA, 62, pp. 1159-1166), then the plasmid DNA was purified on a Sephacryl S1000 (Pharmacia) column or by ultracentrifugation on a caesium chloride-ethidium bromide density gradient. When isolating plasmid DNA in analytical amounts (from a bacterial culture of 1.0 to 1.5 ml), the potassium acetate method elaborated by Doly and Birnboim and modified by D. Ish-Horowicz was used (Maniatis, T., Fritsch, E.F., Sambrook, J. (1982) Molecular Cloning, Cold Spring Harbor Lab., New York).
Cleavage of DNA samples by restriction endonucleases was conducted under the conditions suggested by New England Biolabs.
When joining DNA fragments with sticky ends to each other, a reaction mixture (30-40 μl) comprising 0.5-1.0 μg DNA, 66 mM Tris-HCl (pH 7.6), 5 mM MgCl2, 5 mM dithiothreitol, 1 mM ATP and 1 unit of induced T4 polynucleotide ligase was used. The ligation was carried out at 14 ºC for 2-3 hours /Maniatis, T., Fritsch, E.F., Sambrook, J. (1982) Molecular Cloning, Cold Spring Harbor Lab., New York/.
Joining of fragments with blunt ends was carried out in a buffer containing 30-40 μg/ml DNA, 25 mM Tris-HCl (pH 7.4), 5 mM MgCl2, 5 mM dithiothreitol, 0.25 mM spermidine, 1 mM ATP and 10 μg/ml BSA (Sigma, Type V). 4-6 units of induced T4 polynucleotide ligase were added to the reaction mixture, which was incubated at 14 ºC for 8-12 hours.
Gel electrophoresis of DNA samples was performed in 0.8-0.2 % agarose gels (Sigma, Type I) in horizontal electrophoresis apparatus by the method of Helling et al. (1974). Polyacrylamide gel electrophoresis, using 4 and 8 %, 1 mm thick vertical gels, was performed by the method of Maniatis, T., Jeffrey, A. and van de Sande, H. (1975) /Chain length determination of small double and single stranded DNA molecules by polyacrylamide gel electrophoresis. Biochemistry, 14, 3787-3794/.
Isolation of DNA fragments from agarose and polyacrylamide gels was done by the method of Winberg, G. and Hammarskjöld, M. using DEAE paper (Isolation of DNA from agarose gels using DEAE-paper. Application to restriction site mapping of adenovirus type 16 DNA. NAR, 8, 253 (1980)).
The DNA fragments were digested with Bal31 nuclease at a final concentration of 100 μg/μl in a reaction mixture containing 600 mM NaCl, 12 mM CaCl2, 20 mM Tris-HCl (pH 8.0) and 1.0 mM EDTA at 30 ºC. Depending on the desired extent of shortening, 0.4-1.2 units of enzyme were added per 1.0 μg of DNA. In each case a test digestion was made with the given DNA fragment and the available Bal31 enzyme, in which the extent of shortening of the fragment after different incubation periods was determined by gel electrophoretic analysis of the samples. The enzyme reaction was stopped by phenol extraction of the reaction mixture and, after alcohol precipitation of the DNA, the ends were repaired to blunt ends by Klenow polymerase under reaction conditions suitable for 3' end-labeling /Maniatis, T., Fritsch, E.F., Sambrook, J. (1982) Molecular Cloning, Cold Spring Harbor Lab., New York/.
For the transformation of the JM 107 bacterial strain, competent cells were prepared according to the CaCl2 method described by Mandel and Higa /Maniatis, T., Fritsch, E.F., Sambrook, J. (1982) Molecular Cloning, Cold Spring Harbor Lab., New York/.
The dephosphorylation of the 5'-ends of the DNA fragments and end-labeling with polynucleotide kinase using γ-32P-ATP, as well as the determination of the nucleotide sequence, were performed according to the protocol described by Maxam, A. and Gilbert, W. /Sequencing end-labelled DNA with base-specific chemical cleavages. Meth. Enzymol. 65, pp. 499-560 (1980)/.
The fusion protein containing insulin was demonstrated by the immunoblot method /Towbin et al.: Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets. Procedure and some applications, Proc. Natl. Acad. Sci. USA, 76, 4350 (1979)/.
In preparing the expression vectors, the methods and procedures described by Maniatis, T., Fritsch, E.F., Sambrook, J. (1982) /Molecular Cloning, Cold Spring Harbor Lab., New York/ were applied.
Data of deposition
The symbol of the strain deposited    Number of deposition    Date of deposition
I 1090/pszI 153 13. 07. 1988 850AM
IM 107/pER 23 Threoalpha 13. 07. 1988 850AM
List of abbreviations used in the figures
Pr : promoter
T1T2 : transcription terminators
sp : Shine-Dalgarno sequence (ribosomal binding site)
aa : amino acid
op : lac operator
∇ : deletion (by Bal 31 exonuclease)
nu : nucleotide
thr : threonine codons
ori : replication origin
α : alpha-peptide gene
h.p.i. : human proinsulin gene
ApR : beta-lactamase gene, providing ampicillin resistance
E : Eco RI restriction enzyme
Ba : Bam HI restriction enzyme
Pst : Pst I restriction enzyme
Bg : Bgl II restriction enzyme
X : XbaI restriction enzyme
H : Hind III restriction enzyme