EP1315822A2

EP1315822A2 - Modified construct downstream of the initiation codon for recombinant protein overexpression

Info

Publication number: EP1315822A2
Application number: EP01947565A
Authority: EP
Inventors: Laurent Chevalet
Original assignee: Pierre Fabre Medicament SA
Current assignee: Pierre Fabre Medicament SA
Priority date: 2000-06-22
Filing date: 2001-06-21
Publication date: 2003-06-04
Also published as: MXPA02012880A; FR2810675B1; CA2413612A1; WO2001098453A2; US20040260060A1; BR0111907A; WO2001098453A3; JP2004500875A; CN1443242A; FR2810675A1; AU6922401A

Abstract

The invention concerns a construct for expressing a gene coding for a recombinant protein of interest placed under the control of the promoter of the tryptophan operon (Ptrp) in a prokaryotic cell, comprising directly downstream of the initiation codon a nucleic acid sequence SEQ ID No. 1 and, downstream of said sequence, a multiple cloning cassette designed to receive the gene coding for said recombinant protein of interest, at least one of the nucleotides of the sequence SEQ ID No. 1 being mutated or deleted so as to enable overexpression of said recombinant protein. The invention also concerns a vector containing such a construct, a prokaryotic host cell transformed by said vector, as well as a method for producing a recombinant protein of interest using the inventive construct.

Description

MODIFIED CONSTRUCTION DOWNSTREAM OF THE INITIATION CODON FOR OVEREXPRESSION OF RECOMBINANT PROTEINS.

The invention relates to a construct for the expression of a gene coding for a recombinant protein of interest placed under the control of the promoter of the tryptophan operon (Ptrp) in a prokaryotic host cell, which comprises directly downstream of the initiation codon a nucleic acid sequence SEQ ID No. 1 and, downstream of this sequence, a multiple cloning cassette intended to receive the gene coding for said recombinant protein of interest, at least one of the nucleotides of the sequence SEQ ID No. 1 being mutated or deleted so as to allow overexpression of said recombinant protein. The subject of the invention is also a vector containing such a construct, a prokaryotic host cell transformed by said vector, and a method for producing a recombinant protein of interest using a construct according to the invention.

The ability of biotechnologists to clone a gene in a short time, to express it as a biologically active protein and then to create variants in order to establish sequence/function relationships has made it possible to offer a wide range of recombinant proteins for medical or research use. Many human diseases are now treated or prevented thanks to the availability of biotechnology-derived molecules in pure form and at an acceptable cost (Koths K., Current Opinion in Biotechnology, 6, 681-687, 1995).

Bacterial cells are preferred hosts for the expression of recombinant proteins because they have limited nutritional requirements while being able to reach high growth densities, and also because they have been the subject of numerous past investigations which have led to the generation of mutants of interest and of various plasmid expression systems. Among the bacteria, Escherichia coli (E. coli) is the most widely used and best characterized organism, judging by the abundant literature on the expression of proteins of prokaryotic or eukaryotic origin. However, not all proteins are expressed in it with the same efficiency, owing to difficulties which may occur at different levels: transcription of the gene of interest, translation, and post-translational events affecting the fate of the molecule in the cytoplasmic or periplasmic environment of the bacterium (Makrides, SC, Microbiological Reviews, 60, 512-538, 1996).

To be efficiently translated, a messenger RNA must contain a sequence specifying the binding of the bacterial ribosome and allowing the initiation of translation. This sequence, called the Ribosome Binding Site (RBS), is located in a region spanning the initiation codon. Statistical analysis of the initiation domains of bacterial mRNAs reveals the existence of a window of 34 nucleotides whose sequence differs from a random distribution (Gold, L., Annual Review of Biochemistry, 57, 199-233, 1988). This sequence, extending from position -20 to position +13 of the mRNA when position +1 is assigned to the first nucleotide of the initiation codon, acts as the RBS by helping the ribosome to distinguish genuine initiation domains from among all the “RBS-like” sequences. Numerous investigations have refined the knowledge of the RBS and defined several characteristic elements:

i) The Shine-Dalgarno sequence (SD):

Since the sequencing of the 3' end of the 16S ribosomal RNA (Shine, J. and Dalgarno, L., Proc. Natl. Acad. Sci. USA, 71, 1342-1346, 1974), the so-called Shine-Dalgarno sequence has been defined as the region of the mRNA located 5' of the initiation codon and complementary to the sequence 5'-CCUCCUUA-3' at the 3' end of the 16S rRNA. The existence of an interaction between the 16S rRNA and the RBS mediated by the Shine-Dalgarno sequence is confirmed by the strong representation of the purine bases A and G in the region [-12; -7] of natural RBS of E. coli mRNAs. This bias is also found in a collection of 158 RBS that were randomized and selected for their capacity to promote the expression of a reporter gene (Barrick, D., et al., Nucleic Acids Res., 22, 1287-1295, 1994).

ii) The initiation codon:

The AUG codon is preferentially used as the initiation codon, although GUG and, to a lesser extent, UUG can occasionally be found (Ringquist, S., et al., Molecular Microbiology, 6, 1219-1229, 1992).

iii) The distance between the SD sequence and the initiation codon:

A comprehensive study by Chen H., et al. (Nucleic Acids Research, 22, 4953-4957, 1994a) has shown the existence of an optimal distance separating the 3' end of the Shine-Dalgarno sequence and the initiation codon. Taking the consensus sequence 5'-UAAGGAGGU-3' as the reference SD sequence, the spacing giving the maximum level of expression is 5 nucleotides. A spacing of between 1 and 9 nucleotides remains favorable, ensuring a level of expression at least equal to 50% of the maximum level.

iv) Other primary sequences:

Two pairings are known to be involved in the initiation of translation: the pairing between the initiation codon of the mRNA and the fMet-tRNA on the one hand, and the pairing between the SD sequence and the 3' end of the 16S rRNA on the other hand. Mutagenesis studies and the analysis of atypical mRNAs (in particular mRNAs lacking a leader sequence) have made it possible to identify new sequence elements in the environment of the AUG codon which may contribute to the overall efficiency of the initiation domain. Adenine-rich motifs immediately downstream of the initiation codon are favorable for the initiation of translation (Scherer, GFE, et al., Nucleic Acids Research, 8, 3895-3907, 1980; Chen, H., et al., Journal of Molecular Biology, 240, 20-27, 1994b). Similarly, the AAA and GCU codons, which are the most frequent in the second codon position (Gold, L., 1988), have a positive effect on translation, especially when the initiation codon is sub-optimal (GUG or UUG) (Ringquist, S., et al., 1992). Another sequence promoting translation was identified on the 0.3 mRNA of phage T7 and named “Downstream Box” (DB) because of its position downstream of the initiation codon (Sprengart, ML, et al., Nucleic Acids Research, 18, 1719-1723, 1990). This sequence of 12 nucleotides is complementary to nucleotides 1469-1483 of the 16S rRNA, and it is found in similar forms in the translation initiation domains of several highly expressed genes of E. coli and of bacteriophages (Sprengart, ML, et al., 1990). This “Downstream Box” allows the initiation of translation even in the absence of an SD sequence (Sprengart, ML, et al., The EMBO Journal, 15, 665-674, 1996). Recent results indicate that, contrary to the hypothesis initially put forward, the DB sequence acts by a mechanism other than pairing with the region 1469-1483 of the 16S rRNA (O'Connor, M., et al., Proc. Natl. Acad. Sci. USA, 96, 8973-8978, 1999).

v) Secondary structures:

The mRNA sequence near the SD region can influence translational efficiency through the formation of secondary structures. De Smit, MH, and van Duin, J. (Journal of Molecular Biology, 235, 173-184, 1994) show that intramolecular pairings within the mRNA can hinder efficient translation by competing with the mRNA/rRNA pairing, all the more so when the complementarity of the SD region with the 16S rRNA is low. Similarly, it has been shown that the expression of prochymosin in E. coli depends on the composition of the region connecting the SD sequence to the initiation codon: a sequence limiting secondary structures promotes the accessibility of the RBS to the ribosome and results in high translational efficiency (Wang, G., et al., Protein Expression and Purification, 6, 284-290, 1995).

Given the importance of the translation initiation step for the expression yield of recombinant proteins, numerous studies have been carried out with the aim of optimizing the RBS region of bacterial expression vectors. An intuitive approach first consisted in placing the complete consensus SD region (UAAGGAGGU) upstream of genes of interest (Jay, G., et al., Proc. Natl. Acad. Sci. USA, 78, 5543-5548, 1981). More systematically, Marquis DM, et al. (Gene, 42, 175-183, 1986) placed this sequence downstream of different promoters and at a variable distance (5 to 9 nucleotides) from the initiation codon. With the IL-2 gene as a model, the results indicate that an SD/AUG spacing of 6 nucleotides is optimal for almost all of the promoters tested. In a comparative study between the consensus SD sequence and the SD sequence of the lacZ gene, Mandecki, W., et al. (Gene, 43, 131-13, 1986) noted however that the consensus SD sequence gave a higher expression than that of lacZ in vitro, but an expression 2 to 2.5 times weaker in vivo.

Whole RBS regions derived from phage genes with their own SD sequence have also been shown to be superior to the consensus SD sequence for the expression of proteins of different origins (plants, mammalian cells, bacteria) (Olins, PO, et al., Gene, 73, 227-235, 1988). Using the tryptophan promoter, Curry, K. and Tomich, CSC (DNA, 1, 173-179, 1988) compared the efficiency of the consensus SD sequence with that naturally present in Ptrp. Their results indicate a very strong dependence on the gene of interest studied, and they conclude that it is impossible to construct an optimal vector that works for all heterologous genes. Also working with the tryptophan promoter and the consensus SD sequence, Olsen MK, et al. (Journal of Biotechnology, 9, 179-190, 1989) obtained very high expression levels (20 to 30% of total proteins) for different heterologous proteins (growth hormones, TNF) by enriching the sequences flanking the SD region in nucleotides A and T. Similar results had been described previously by De Boer HA, et al. (DNA, 2, 231-235, 1983), who noted the positive effect of A and T bases placed downstream of the SD region in the context of the hybrid promoter Ptrp/PlacUV5 expressing α interferon.
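By way of illustration of the spacing rule recalled above (optimum of 5 nucleotides, favorable range of 1 to 9 nucleotides according to Chen et al., 1994a), the following minimal Python sketch measures the number of nucleotides separating the 3' end of a Shine-Dalgarno motif from the initiation codon. The function names, the use of the DNA form of the consensus sequence and the example leader are assumptions made for illustration only; they are not taken from the cited works.

```python
# Minimal sketch: SD-to-initiation-codon spacing (illustrative assumptions only).
SD_CONSENSUS = "TAAGGAGGT"  # DNA form of the consensus 5'-UAAGGAGGU-3'


def sd_spacing(leader, sd=SD_CONSENSUS, start_codon="ATG"):
    """Return the number of nucleotides between the 3' end of the first SD
    motif and the first start codon found after it, or None if either
    element is absent."""
    sd_pos = leader.find(sd)
    if sd_pos < 0:
        return None
    sd_end = sd_pos + len(sd)
    start_pos = leader.find(start_codon, sd_end)
    if start_pos < 0:
        return None
    return start_pos - sd_end


def is_favorable(spacing, low=1, high=9):
    """True when the spacing falls in the 1 to 9 nucleotide range."""
    return spacing is not None and low <= spacing <= high


# Hypothetical leader: consensus SD, a 5-nucleotide spacer, then the ATG.
example = "GCTC" + SD_CONSENSUS + "AAAAA" + "ATGAAAGCA"
spacing = sd_spacing(example)
print(spacing, is_favorable(spacing))
```

Such a helper only checks the primary-sequence distance; it does not account for secondary structure or for the gene-dependent effects discussed above.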

All these results were obtained in experiments where a limited number of parameters were taken into account. Aware of the large number of factors, known or unknown, that influence the initiation of translation, and above all of the a priori non-negligible role of interactions between factors which are not taken into account in iterative approaches, some authors subsequently tried to select optimal synthetic RBS in vivo from large random libraries. Wilson B.S., et al. (BioTechniques, 17, 944-952, 1994) thus screened a repertoire of sequences degenerate at 16 positions upstream of the initiation codon, within an expression cassette containing the β-lactamase gene under the control of the lac promoter/operator. Such an approach made it possible to identify original sequences expressing β-lactamase with an efficiency 3 times greater. With another gene coding for an scFv, the level of overexpression compared to the original RBS is approximately 2-fold.

In the light of these results, it is established that RBS regions described as optimal are always so in a particular context, in which both the sequence of the gene of interest and the sequence of the leader region of the mRNA, which itself depends on the type of promoter used, play a part. The tryptophan promoter (Nichols, BP and Yanofsky, C. Methods in Enzymology, 101, 155-164, 1983) is one of the major systems used for the expression of recombinant proteins (Yansura, DG and Henner, DJ, Methods in Enzymology (Academic Press, Inc., San Diego, CA) 54-60, 1990; Yansura, DG and Bass, SH Methods in Molecular Biology, 62, 55-62, 1997), but its RBS has never been systematically optimized by an approach based on the screening of random sequences. The interest of the biotechnologist wishing to develop processes on an industrial scale is to have tools guaranteeing maximum expression, whatever the protein of interest. There is therefore a strong interest in any improvement making it possible to optimize the expression of recombinant proteins, whether the improvement concerns the host strain, the expression vector, the culture and expression process or any combination of these factors.

More particularly, the present invention demonstrates the advantage, in terms of translational efficiency, of new nucleotide sequences carried by an expression vector in the region of the "Ribosome Binding Site" (RBS), downstream of the tryptophan promoter (Ptrp). By using degenerate oligonucleotides introduced upstream of the initiation codon and then selecting clones overexpressing the reporter gene for chloramphenicol acetyl transferase (CAT), the applicant sought new optimized RBS sequences. While searching for optimized sequences upstream of the initiation codon, it was discovered, quite surprisingly, that the nucleic acid sequence located directly downstream of the initiation codon could be mutated or deleted so as to overexpress recombinant proteins. The sequences thus obtained improve the expression of different genes of interest of various origins.

In addition, a major problem with regard to current quality constraints is to obtain the purest possible recombinant protein, that is to say with a minimum of amino acids originating from the construct used grafted upstream or downstream of the recombinant protein. When the nucleic acid sequence located between the initiation codon and the first cloning site carries deletions allowing overexpression of recombinant proteins, this problem is also solved by the present invention.

A subject of the present invention is thus a construct for the expression of a gene coding for a recombinant protein of interest placed under the control of the promoter of the tryptophan operon (Ptrp) in a prokaryotic host cell, comprising directly downstream of the initiation codon a nucleic acid sequence SEQ ID No. 1 and, downstream of this sequence, a multiple cloning cassette intended to receive the gene coding for said recombinant protein of interest, characterized in that at least one of the nucleotides of the sequence SEQ ID No. 1 is mutated or deleted so as to allow overexpression of said recombinant protein.

It is all the more surprising to obtain overexpression by means of the object of the invention since the prior art teaches the use of this sequence SEQ ID No. 1 as it is. Mention may in particular be made in this respect of the patents US 5,714,589, US 5,468,845, US 5,418,135, US 4,891,310, US 4,789,702, WO 88/09344, US 4,738,921 and EP 0 212 532, which teach the use of the sequence SEQ ID No. 1 downstream of the initiation codon for the expression of proteins of interest.

The term “recombinant protein of interest” is intended to denote any protein, polypeptide or peptide obtained by genetic recombination and capable of being used in fields such as human or animal health, cosmetology, animal nutrition, agro-industry or the chemical industry. Among these proteins of interest, there may be mentioned in particular, but without limitation:

- a cytokine and in particular an interleukin, an interferon, a tumor necrosis factor and a growth factor, in particular a hematopoietic growth factor (G-CSF, GM-CSF), a human growth hormone or insulin, a neuropeptide;

- a factor or cofactor involved in coagulation and in particular factor VIII, von Willebrand factor, antithrombin III, protein C, thrombin and hirudin;

- an enzyme and in particular trypsin, a ribonuclease and β-galactosidase;

- an enzyme inhibitor such as α1 antitrypsin and viral protease inhibitors;

- a protein capable of inhibiting the initiation or progression of cancers, such as the expression products of tumor suppressor genes, for example the P53 gene;

- a protein capable of stimulating an immune response or an antigen, such as for example the proteins, or their active fragments, of the membrane of Gram-negative bacteria, in particular the OmpA proteins from Klebsiella or the G protein from human respiratory syncytial virus;

- a monoclonal antibody, humanized or not, or an antibody fragment such as an scFv;

- a protein capable of inhibiting a viral infection or its development, for example the antigenic epitopes of the virus in question or altered variants of viral proteins capable of competing with the native viral proteins;

- a protein capable of being contained in a cosmetic composition such as substance P or a superoxide dismutase;

- a food protein and in particular a food;

- an enzyme capable of directing the synthesis of chemical or biological compounds, or capable of degrading certain toxic chemical compounds; or

- any protein having a toxicity towards the micro-organism which produces it, in particular if this micro-organism is the bacterium E. coli, such as for example the protease of the HIV-1 virus, the ECP protein ("Eosinophil Cationic Protein") or the 2B and 3A proteins of the poliovirus.

By nucleic acid sequence SEQ ID No. 1 in which at least one of the nucleotides is mutated or deleted so as to allow overexpression of said recombinant protein is meant any sequence which comprises a deletion or a mutation of at least one nucleotide of the sequence SEQ ID No. 1 and which allows overexpression of the recombinant protein, compared to the expression of said recombinant protein obtained using the unmodified sequence SEQ ID No. 1.

By deletion is meant the elimination of one or more nucleotides at one or more nucleotide sites of the sequence SEQ ID No. 1. The resulting sequence is shorter than the original sequence.

By mutation is meant the replacement of one nucleotide by another (A by C, G or T; C by A, G or T; G by A, C or T; T by A, C or G). The resulting sequence is the same size as the original.
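The two definitions above can be sketched in a few lines of Python: a deletion shortens the sequence, whereas a mutation (substitution) leaves its length unchanged. The function names and the placeholder sequence below are assumptions introduced for illustration; SEQ ID No. 1 itself is not reproduced in this text.

```python
# Minimal sketch of the definitions of "deletion" and "mutation"
# (hypothetical helper names; placeholder sequence, not SEQ ID No. 1).
VALID_BASES = {"A", "C", "G", "T"}


def delete_nucleotides(seq, start, length=1):
    """Remove `length` nucleotides starting at 0-based index `start`."""
    if start < 0 or length < 1 or start + length > len(seq):
        raise ValueError("deletion falls outside the sequence")
    return seq[:start] + seq[start + length:]


def substitute_nucleotide(seq, position, new_base):
    """Replace the nucleotide at 0-based `position` by a different base."""
    if new_base not in VALID_BASES:
        raise ValueError("new_base must be one of A, C, G or T")
    if not 0 <= position < len(seq):
        raise ValueError("position falls outside the sequence")
    if seq[position] == new_base:
        raise ValueError("replacing a base by itself is not a mutation")
    return seq[:position] + new_base + seq[position + 1:]


reference = "AAAGCAATTTTCGTACTG"  # placeholder sequence for illustration
shortened = delete_nucleotides(reference, 3, 3)      # deletion of 3 nucleotides
substituted = substitute_nucleotide(reference, 16, "C")  # T replaced by C

print(len(reference), len(shortened), len(substituted))
```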

Overexpression, that is to say an expression greater than that obtained without the modification downstream of the initiation codon, can be determined in particular by using one of the following methods: i) SDS-PAGE migration of the total proteins of the bacterium and detection of the recombinant protein by Coomassie blue staining or by Western blot; ii) assay of the recombinant protein by a method involving a specific antibody (ELISA); iii) enzymatic assay if the recombinant protein has a catalytic activity. Preferably, method ii), detailed in Example III, is used.

By multiple cloning cassette is meant a nucleotide sequence containing one or more restriction sites, which sites can be used during the steps of cloning the gene of interest downstream of the initiation codon.

Preferably, said at least one nucleotide of the sequence SEQ ID No. 1 is deleted so as to allow overexpression of said recombinant protein.

The invention also relates to a construct according to the invention in which said at least one mutated or deleted nucleotide, preferably deleted, is located on the fragment of sequence SEQ ID No. 2 of the sequence SEQ ID No. 1. Another object of the invention relates to constructs in which said at least one mutated or deleted nucleotide, preferably mutated, is located on the codon GTA and/or on the codon GCA and/or on the codon CTG of the sequence SEQ ID No. 1.

In a preferred embodiment of the invention, said sequence SEQ ID No. 1 of which at least one of the nucleotides is mutated or deleted, has at least in position 1, 2 and 3, the nucleotide A.

In a preferred embodiment of the invention, at least one of the nucleotides, and preferably all the nucleotides, located between the nucleic acid sequence SEQ ID No. 1 and the multiple cloning cassette intended to receive the gene coding for said recombinant protein of interest are deleted. In another, even more preferred embodiment of the invention, said sequence SEQ ID No. 1, of which at least one of the nucleotides is mutated or deleted, and all the nucleotides located between the nucleic acid sequence SEQ ID No. 1 and the multiple cloning cassette are completely deleted, so that the initiation codon is directly upstream of the multiple cloning cassette.

In a preferred embodiment of the invention, the constructs contain, directly upstream of the initiation codon, a nucleic acid sequence chosen from among the sequences SEQ ID No. 3, SEQ ID No. 4, SEQ ID No. 5, SEQ ID No. 6, SEQ ID No. 7, SEQ ID No. 8, SEQ ID No. 9 and SEQ ID No. 10.

The invention comprises a construction according to the invention, characterized in that the prokaryotic host cell is a gram negative bacterium, preferably belonging to the species E. coli.

Another object of the invention relates to a vector containing a construct as defined above, just like a prokaryotic host cell, preferably belonging to the species E. coli, transformed by such a vector.

The present invention also relates to a process for the production of a recombinant protein of interest in a host cell using a construct as defined above.

The present invention also relates to a process for the production of a recombinant protein of interest according to the invention, in which said construct is introduced into a prokaryotic host cell, preferably by a vector as defined above.

A process for the production of a recombinant protein of interest according to the invention is preferred, characterized in that it comprises the following steps: a) cloning of a gene of interest in a vector according to the invention; b) transformation of a prokaryotic cell with a vector containing a gene coding for said recombinant protein of interest; c) culturing said transformed cell in a culture medium allowing expression of the recombinant protein; and d) recovering the recombinant protein from the culture medium or from said transformed cell.

The invention further comprises the use of a construct, a vector or a prokaryotic host cell according to the present invention for the production of a recombinant protein.

Finally, the invention relates to the use of a recombinant protein for the preparation of a medicament intended for administration to a patient in need of such treatment, characterized in that said recombinant protein is produced by a process for the production of a recombinant protein of interest according to the invention.

The examples and figures which follow are intended to illustrate the invention without in any way limiting its scope.

Legend of the figures and the tables:

Figure 1: Map of the plasmid vector pTEXmp18 and sequence SEQ ID No. 39 of the region 1-450 comprising the Ptrp promoter/operator, the region of the TrpL leader, the mp18 multiple cloning site and the transcription terminator.

Figure 2: Restriction map of the RBS (Ribosome Binding Site) region on the vector pTEXmp18 (SEQ ID No. 40).

Figure 3: Estimation on SDS-PAGE gel of CAT expression in bacteria transformed by the pTEXCAT or pTEXCAT4 vectors.

Figure 4: Comparative study of the expression of β-galactosidase from the vectors pTEX-βGAL and pTEX4-βGAL (kinetics in fermenter).

Example I

This example illustrates one of the aspects which led to the invention, and in particular the way in which the library of plasmid vectors carrying the tryptophan promoter Ptrp and randomly mutated upstream of the initiation codon is constructed. The original vector is described in FIG. 1. It is a plasmid derived from pBR322 (Bolivar, F., et al., Gene, 2, 95-113, 1977) into which have been inserted the Ptrp promoter/operator (1-298), followed by the sequence coding for the first 7 amino acids of the TrpL leader from E. coli (Yanofsky, C, et al., Nucleic Acids Research, 9, 6647-6668, 1981), a multiple cloning site and the trpt transcription terminator of E. coli (Yanofsky, C, et al., 1981). The 3' part of Ptrp in pTEXmp18 differs from the natural sequence by the presence of an XbaI cloning site upstream of the ATG initiation codon (see FIG. 2) and by a greater spacing between the SD sequence and the initiation codon.

To allow the selection of vectors modified in their RBS region, the reporter gene for chloramphenicol acetyl transferase (CAT) is cloned at the EcoRI and PstI sites of pTEXmp18. For this, the coding sequence of the cat gene is amplified by PCR using the oligonucleotides CATfor and CATrev, whose sequences are:

CATfor (EcoRI site): 5'-CCGGAATTCATGGAGAAAAAAATCACTGG-3' (SEQ ID No. 11)

CATrev (PstI site): 5'-AAACTGCAGTTACGCCCCGCCCTG-3' (SEQ ID No. 12)

The PCR reaction is carried out using the phagemid pBC-SK (Stratagene, La Jolla, CA, USA) as a template. The amplification product is loaded on an agarose gel and purified according to the GeneClean method (Bio101, La Jolla, CA). The cloning of the insert in pTEXmp18 is verified after transformation in E. coli by the appearance of colonies growing on plates of LB agar medium (Sambrook J., et al., Molecular cloning. A laboratory manual, 2nd edition, Plainview, NY: Cold Spring Harbor Laboratory Press, 1989) in the presence of 30 μg/ml of chloramphenicol. The sequence of the insert is confirmed by automatic sequencing using the “Dye Terminator” kit and the 373A DNA sequencer (Perkin Elmer Applied Biosystems, Foster City, CA). The vector obtained is named pTEXCAT.

The insertion of RBS having a degenerate sequence upstream of the initiation codon is carried out by ligation of synthetic oligonucleotides at the SpeI and EcoRI sites of the vector pTEXCAT. The region extending from the SpeI site to the EcoRI site, respectively at positions -49 and +28 (see FIG. 2), is removed by enzymatic digestion and replaced by a heteroduplex formed by two partially degenerate synthetic oligonucleotides hybridized to each other. Two pairs of oligonucleotides are used, involving respectively the oligonucleotides RanSD1/RanSD2 and RanSD3/RanSD4, whose sequences are:

RanSD1: 5'-CTAGTTAACTAGTACGCAAGTTCACGTAAANNNNNNNNNNNNNNNNATGAAAGCAATTTTCGTACTGAATGCGG-3' (SEQ ID No. 13)

RanSD2: 5'-AATTCCGCATTCAGTACGAAAATTGCTTTCATNNNNNNNNNNNNNNNNTTTACGTGAACTTGCGTACTAGTTAA-3' (SEQ ID No. 14)

RanSD3: 5'-CTAGTTAACTAGTACGCAAGTTCACGTAAATRRRRRRRNNNNNNATGAAAGCAATTTTCGTACTGAATGCGG-3' (SEQ ID No. 15)

RanSD4: 5'-AATTCCGCATTCAGTACGAAAATTGCTTTCATNNNNNNYYYYYYYATTTACGTGAACTTGCGTACTAGTTAA-3' (SEQ ID No. 16).

The four oligonucleotides were synthesized by MWG Biotech (Ebersberg, D) under conditions ensuring an equimolar distribution of the bases at each degenerate position. The RanSD1/RanSD2 pair provides complete degeneracy (N = mixture of the 4 nucleotides A, C, T, G) on the 16 nucleotides preceding the ATG codon. The number of combinations (4¹⁶, or approximately 4.3 × 10⁹) allows the screening of RBS which are optimized with respect to their Shine-Dalgarno (SD) sequence, to the sequence located between the SD region and the initiation codon, and to the SD-ATG spacing. This library will be named (N₁₆) in the remainder of the text. The RanSD3/RanSD4 pair provides complete degeneracy on the 6 nucleotides preceding the ATG and partial degeneracy on the 7 nucleotides upstream. The exclusive use of purines (R = A or G) on the positive strand and of pyrimidines (Y = C or T) on the complementary strand favors the representation of Shine-Dalgarno type sequences at an optimal distance (6 nucleotides) from the ATG codon. This second library is named (R₇N₆).
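The library sizes and the pairing of the two oligonucleotide pairs described above can be checked with a short Python sketch. It assumes that the degenerate stretch of RanSD1 and RanSD2 is exactly 16 N (as stated in the text), that each oligonucleotide carries a 4-nucleotide 5' overhang (CTAG on the SpeI side, AATT on the EcoRI side), and it uses the standard IUPAC codes N, R and Y. The helper names are hypothetical.

```python
# Minimal sketch: library size and strand complementarity of the degenerate
# oligonucleotides (helper names are illustrative assumptions).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "N": "N", "R": "Y", "Y": "R"}
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}

RAN_SD1 = "CTAGTTAACTAGTACGCAAGTTCACGTAAA" + "N" * 16 + "ATGAAAGCAATTTTCGTACTGAATGCGG"
RAN_SD2 = "AATTCCGCATTCAGTACGAAAATTGCTTTCAT" + "N" * 16 + "TTTACGTGAACTTGCGTACTAGTTAA"
RAN_SD3 = "CTAGTTAACTAGTACGCAAGTTCACGTAAAT" + "R" * 7 + "N" * 6 + "ATGAAAGCAATTTTCGTACTGAATGCGG"
RAN_SD4 = "AATTCCGCATTCAGTACGAAAATTGCTTTCAT" + "N" * 6 + "Y" * 7 + "ATTTACGTGAACTTGCGTACTAGTTAA"


def reverse_complement(seq):
    """Reverse complement of a sequence written with A, C, G, T, N, R, Y."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def library_size(seq):
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    size = 1
    for base in seq:
        size *= DEGENERACY[base]
    return size


def forms_duplex(top, bottom, overhang=4):
    """True when the two strands pair over their whole length except for
    a 5' single-stranded overhang of `overhang` nucleotides on each strand."""
    return reverse_complement(top[overhang:]) == bottom[overhang:]


print(library_size(RAN_SD1))            # size of the (N16) library
print(library_size(RAN_SD3))            # size of the (R7N6) library
print(forms_duplex(RAN_SD1, RAN_SD2))
print(forms_duplex(RAN_SD3, RAN_SD4))
```

The complementarity test is symbolic only: in the physical library each N, R or Y position is an independent mixture on each strand, which is why the annealed products are described as heteroduplexes.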

The linearization of the vector pTEXCAT and its purification on gel, the hybridization of the oligonucleotides in pairs, the ligation of the heteroduplexes to the linearized pTEXCAT vector and the transformation into E. coli of the library thus constituted are carried out according to the conditions described by Sambrook J., et al. (1989). Typically, 100 fmol of vector and 1000 fmol of insert are used in a ligation reaction in the presence of T4 ligase in a final volume of 15 μl. The reaction is carried out overnight at 16 °C. Electrocompetent TOP10 bacteria (50 μl) are then transformed by electroporation with 3 μl of the ligation mixture under the conditions recommended by the supplier (Invitrogen, Carlsbad, CA). The transformation mixture is spread on LB agar plates containing 200 μg/ml of ampicillin, giving rise to transformed colonies after 16 hours of incubation at 37 °C.

Example II

Screening of the libraries is carried out on the assumption that clones overexpressing the CAT enzyme will have an increased resistance to chloramphenicol. This is validated by an experiment, the results of which are presented in Table 1 below.

Table 1. Resistance to chloramphenicol in TOP10 x pTEXCAT bacteria in the presence or absence of IAA.

The figures (upper row) indicate the number of colonies counted after 18 h of incubation at 37 °C, each medium having been seeded with approximately 100 cells.

The index in the lower row is a qualitative criterion for colony growth (from - = no growth to +++ = maximum growth).

These results show that E. coli TOP10 bacteria (Invitrogen, Carlsbad, CA) transformed by the vector pTEXCAT and spread on plates containing different concentrations of chloramphenicol show, between 300 and 600 μg/ml of chloramphenicol, stronger growth in the presence of 3-β indole acrylic acid (IAA), an analogue of tryptophan acting as an inducer through derepression of Ptrp (Marmorstein, RQ and Sigler, PB, The Journal of Biological Chemistry, 264, 9149-9154, 1989). This suggests that clones overproducing CAT owing to an optimized RBS region may either grow faster than the wild-type population at a chloramphenicol concentration below the MIC (minimum inhibitory concentration), or grow in the presence of chloramphenicol concentrations that are lethal to the wild-type population.

Example III

This example illustrates the selection of clones from the libraries constructed as described in Example 1. The libraries, obtained as colony lawns on LB agar + ampicillin plates, are taken up in sterile water so as to reconstitute a suspension whose optical density (OD) at 580 nm is close to 1. In accordance with the results of Example 2, this suspension is spread on LB agar dishes containing lethal doses of chloramphenicol (600, 700, 800 and 900 μg/ml) at a rate of 100 μl of suspension per Petri dish. The dishes are incubated at 37 °C and the appearance of resistant colonies is monitored, while checking that dishes seeded with a suspension of TOP 10 bacteria transformed by the wild-type vector pTEXCAT give rise to no growth. Resistant colonies are isolated and subcultured several times on the selection medium to confirm their resistance phenotype. The clones selected at this stage are then subjected to a series of analyses: (i) extraction of the plasmid (Qiagen kit, Hilden, D) and sequencing of the region covering the RBS, (ii) culture in Erlenmeyer flasks with induction by IAA, then estimation of the level of expression of CAT by ELISA assay, (iii) SDS-PAGE electrophoresis of the total proteins extracted from the previous cultures and staining with Coomassie blue to visualize the total intracellular proteins. The clones are sequenced using the Dye Terminator kit on an ABI 373A sequencer (Perkin Elmer Applied Biosystems, Foster City, CA). Erlenmeyer cultures are produced by inoculating 25 ml of TSBY medium (Tryptic Soy Broth (DIFCO) 30 g/l + Yeast Extract (DIFCO) 5 g/l) + tetracycline 8 mg/l with a colony from a dish or with a bacterial suspension stored at -80 °C. Each preculture is incubated overnight at 37 °C on a shaker at 200 rpm. A fraction is transferred into 50 ml of the same medium so as to reach an initial optical density equal to 1.

For the induction of the CAT protein, 25 mg/l of IAA is added to the medium, which is then stirred under the same conditions for 5 hours. A fraction of the suspension (3 x 1 ml diluted to OD = 0.1) is centrifuged and the cells are stored at -20 °C for the CAT assay by ELISA (CAT ELISA kit, Roche Diagnostics, Basel, CH). The rest of the biomass is recovered by centrifugation at 10,000 g, 4 °C for 15 minutes. The biomass is taken up in TEL buffer (25 mM Tris, 1 mM EDTA, lysozyme 500 μg/ml, pH 8) at a rate of 5 ml per 1 g of wet biomass. The cells are lysed by sonication (VibraCell sonicator equipped with a micro-probe, Sonics & Materials, Danbury, CT). One ml of the resulting suspension is centrifuged for 5 min at 12,000 rpm. The pellet is taken up in 200 μl of TEL to give the insoluble fraction (I). The supernatant is noted "S". The total proteins contained in fractions I and S are analyzed by electrophoresis under denaturing conditions (SDS-PAGE) and staining with Coomassie blue.
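The inoculation and lysis volumes used in this protocol follow from simple proportionality. The following is a minimal sketch in Python, assuming that the preculture fraction is added on top of the 50 ml of fresh medium and that OD is linear in cell density. The function names and the example preculture OD of 6 are hypothetical and serve only as an illustration; they are not taken from the application.

```python
def inoculum_volume_ml(preculture_od, target_od=1.0, medium_ml=50.0):
    """Volume of preculture to add to medium_ml of fresh medium so that
    the mixture starts at target_od (assumes OD is linear in cell density)."""
    if preculture_od <= target_od:
        raise ValueError("preculture must be denser than the target OD")
    # Mass balance: preculture_od * v = target_od * (v + medium_ml)
    return target_od * medium_ml / (preculture_od - target_od)


def tel_buffer_volume_ml(wet_biomass_g, ml_per_g=5.0):
    """TEL buffer volume at the stated rate of 5 ml per gram of wet biomass."""
    return wet_biomass_g * ml_per_g


if __name__ == "__main__":
    # Hypothetical overnight preculture at OD 6
    print(inoculum_volume_ml(6.0))
    # Hypothetical pellet of 2 g wet biomass
    print(tel_buffer_volume_ml(2.0))
```

If the 50 ml is instead understood as the final volume, the denominator becomes `preculture_od` alone; the text does not settle this point.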

Table 2 below indicates the different RBS sequences obtained after screening the two libraries (N₁₆) and (R₇N₆). Following alignment against the GenBank and EMBL nucleotide databases, we can conclude that none of the 16-nucleotide sequences (strategy (N₁₆)) or 13-nucleotide sequences (strategy (R₇N₆)) located immediately upstream of the AUG codon in the various isolated clones has been described to date.

Table 2. New RBS sequences isolated by one of the strategies (N₁₆) or (R₇N₆).

(*) Each nucleotide sequence (messenger RNA) comprises the mutated region downstream of the initiation codon. The reference sequence of the pTEXCAT vector is shown in the first line of the table. The nucleic acid sequences upstream and downstream of the initiation codon of the vectors are shown in this table after transcription into RNA. At the 3' end of these sequences, only the first two codons of the multiple cloning site are represented, namely GAAUUC.

Thus it has been observed, quite surprisingly, that the clones described in Table 2 carry mutations in the region of the RBS located immediately downstream of the AUG codon. The clones pTEXCAT4, pTEXCAT1' and pTEXCAT3' carry a point mutation affecting one amino acid of the N-terminal part of the encoded protein (Leu7Pro, Ala3Pro and Val6Ala, respectively). The other clones carry larger rearrangements: pTEXCAT2', pTEXCAT5' and pTEXCAT9' have deletions causing the loss of the regions Ala3-Leu7, Ile4-Leu7 and Ile4-Asn8, respectively. Since the analysis of 10 clones taken at random from the library (N₁₆) selected on ampicillin (that is to say, without chloramphenicol selection pressure) shows no modification in the region coding for the TrpL peptide (data not shown), it is deduced that the mutations in TrpL observed in the clones selected for their capacity to express CAT play a role in expression. We thus demonstrate the following original property: the expression of recombinant proteins is positively affected by mutations downstream of the initiation codon. Figure 3 shows an SDS-PAGE analysis of the total proteins of bacteria transformed by pTEXCAT or pTEXCAT4. It confirms the overproducing character of the vector pTEXCAT4: a major protein migrating at the position expected for CAT (28 kDa) is clearly visible in extracts induced by IAA, whereas extracts of the vector pTEXCAT obtained under the same induction conditions show only a low-intensity band.

To rule out the possibility that overproduction is caused by modifications of the vector outside the SpeI-EcoRI fragment, the vector pTEXCAT4 was reconstructed in vitro from pTEXCAT by SpeI-EcoRI digestion and ligation of a duplex formed by the following two phosphorylated oligonucleotides:

SDopt4-f:

5'-CTAGTTAACTAGTACGCAAGTTCACGTAAAACGGAGAAACCCCCCAATGAAAGCAATTTTCGTACCGAATGCGG-3' (SEQ ID No. 17)

SDopt4-r:

5'-AATTCCGCATTCGGTACGAAAATTGCTTTCATTGGGGGGTTTCTCCGTTTTACGTGAACTTGCGTACTAGTTAA-3' (SEQ ID No. 18).

The resulting vector, designated pTEXCAT-SD4, was then transformed into E. coli TOP 10 and compared with pTEXCAT4 in terms of potential for expression of the CAT enzyme. The results obtained indicate that the expression levels of pTEXCAT4 and pTEXCAT-SD4 are comparable to each other and significantly higher than that of pTEXCAT. This supports the hypothesis that the improvement in expression observed with the clones claimed in this patent application is indeed caused specifically by the sequences located between the SpeI and EcoRI sites.
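The compatibility of the SDopt4-f/SDopt4-r duplex with a SpeI-EcoRI digested vector can be checked directly from the two sequences. The sketch below, in Python, is an illustration only and not part of the described method; the function names are hypothetical. It relies on the known cleavage patterns of SpeI (A^CTAGT, leaving a 5'-CTAG overhang) and EcoRI (G^AATTC, leaving a 5'-AATT overhang).

```python
# Oligonucleotides as given in the text (5' -> 3')
SDOPT4_F = ("CTAGTTAACTAGTACGCAAGTTCACGTAAAACGGAGAAACCCCCCAATGA"
            "AAGCAATTTTCGTACCGAATGCGG")   # SEQ ID No. 17
SDOPT4_R = ("AATTCCGCATTCGGTACGAAAATTGCTTTCATTGGGGGGTTTCTCCGTTTT"
            "ACGTGAACTTGCGTACTAGTTAA")    # SEQ ID No. 18

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' -> 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def duplex_overhangs(top, bottom, overhang=4):
    """Check that two oligos anneal with 5' overhangs of the given length.

    Returns (top 5' overhang, bottom 5' overhang, paired core of the top
    strand). Raises ValueError if the cores are not fully complementary.
    """
    core = top[overhang:]
    if core != reverse_complement(bottom[overhang:]):
        raise ValueError("strands are not complementary outside the overhangs")
    return top[:overhang], bottom[:overhang], core


if __name__ == "__main__":
    top_end, bottom_end, core = duplex_overhangs(SDOPT4_F, SDOPT4_R)
    print(top_end, bottom_end, len(core))
```

With the sequences as printed, the two 74-nucleotide strands pair over 70 base pairs and leave a CTAG overhang on one side and an AATT overhang on the other, which is what ligation into SpeI-EcoRI digested pTEXCAT requires.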

In order to demonstrate the specificity of the mutated or deleted sequences located directly downstream of the initiation codon, the leucine codon CTG in seventh position of the wild-type vector pTEXCAT was replaced by a proline codon CCG to give the vector pTEXCAT-L7P. The proline codon CCG in seventh position of the vector pTEXCAT4 was replaced by a leucine codon CTG to give the vector pTEXCAT4-P7L. The results of this experiment are shown in Table 3 below.

Table 3. Comparison between the expression levels given by the vectors pTEXCAT, pTEXCAT4, pTEXCAT-L7P and pTEXCAT4-P7L.

The results (mean ± standard deviation) were obtained from two independent experiments. A level of 1 is arbitrarily assigned to the vector pTEXCAT.
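The codon exchange at position 7 can be illustrated on the first nine codons of the pTEXCAT4 coding region, as read from the initiation codon of oligonucleotide SDopt4-f (SEQ ID No. 17). The following Python sketch is given as an illustration only. The helper names are hypothetical and the codon table is deliberately limited to the codons occurring in this fragment.

```python
# First nine codons downstream of (and including) ATG in SDopt4-f
PTEXCAT4_START = "ATGAAAGCAATTTTCGTACCGAATGCG"

# Partial standard genetic code: only the codons needed here
CODON_TABLE = {
    "ATG": "M", "AAA": "K", "GCA": "A", "ATT": "I", "TTC": "F",
    "GTA": "V", "CTG": "L", "CCG": "P", "AAT": "N", "GCG": "A",
}


def codons(seq):
    """Split a coding sequence into complete codons, dropping any remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq):
    """Translate using the partial table above (KeyError on other codons)."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def swap_codon(seq, position, new_codon):
    """Replace the codon at the given 1-based position."""
    parts = codons(seq)
    parts[position - 1] = new_codon
    return "".join(parts)


if __name__ == "__main__":
    p7l = swap_codon(PTEXCAT4_START, 7, "CTG")   # pTEXCAT4 -> pTEXCAT4-P7L
    print(translate(PTEXCAT4_START), translate(p7l))
```

In this fragment, codon 7 of pTEXCAT4 is CCG (proline), and replacing it with CTG restores a leucine at that position, as described for pTEXCAT4-P7L. The sketch covers only this short region and says nothing about the sequences upstream of the initiation codon.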

These results demonstrate that the mutation downstream of the initiation codon is alone responsible for the overexpression, since this mutation, when reintroduced into the wild-type vector, gives the same overexpression.

Example IV

This example shows that the overexpression effect of the new sequences described is not limited to the reporter gene used to select them, but carries over to other genes as soon as these genes are functionally linked to them on the same vector. To this end, the CAT gene of the vectors pTEXCAT and pTEXCAT4 was replaced by the sequence of the lacZ gene coding for the β-galactosidase of E. coli. Cloning was carried out by amplifying the lacZ sequence by PCR from the vector pβGAL-basic (Clontech, Palo Alto, CA), then inserting this sequence downstream of trpL at the unique BsmI and HindIII sites, to give the vectors pTEX-βGAL and pTEX4-βGAL, respectively.

The two vectors were transformed into the E. coli strain ICONE 200 (French patent application FR 2 777 292, published on October 15, 1999) with a view to culturing in a fermenter with kinetic monitoring of β-galactosidase expression. The recombinant bacteria ICONE 200 x pTEX-βGAL and ICONE 200 x pTEX4-βGAL were cultured conventionally in 200 ml of complete medium (Tryptic Soy Broth (DIFCO) 30 g/l, Yeast Extract (DIFCO) 5 g/l) overnight at 37 °C. The cell suspension obtained was transferred aseptically to a fermenter (model CF3000 from Chemap, capacity 3.5 l) containing 1.8 liters of the following medium (concentrations for 2 liters of final culture): glycerol 90 g/l, (NH4)2SO4 5 g/l, KH2PO4 6 g/l, K2HPO4 4 g/l, Na3-citrate·2H2O 9 g/l, MgSO4·7H2O 2 g/l, yeast extract 1 g/l, trace elements, antifoam 0.06%, tetracycline 8 mg/l, tryptophan 200 mg/l. The pH is adjusted to 7.0 by adding ammonia. The dissolved oxygen level is maintained at 30% of saturation by slaving the stirring speed, and then the aeration rate, to the measured dissolved pO2. When the optical density of the culture reaches a value between 30 and 40, induction is carried out by adding 25 mg/l of IAA (Sigma, St Louis, MO). A kinetic analysis of the optical density of the culture (OD at 580 nm) and of the intracellular β-galactosidase activity was performed. The level of β-galactosidase activity is estimated by a colorimetric assay in which 30 μl of sample (fraction "S", see Example 3), 204 μl of buffer (50 mM Tris-HCl pH 7.5, 1 mM MgCl₂) and 66 μl of ONPG (4 mg/ml in 50 mM Tris-HCl pH 7.5) are mixed. The reaction mixture is incubated at 37 °C. The reaction is stopped by the addition of 500 μl of 1 M Na₂CO₃. The OD at 420 nm divided by the incubation time is proportional to the β-galactosidase activity present in the sample. Since E. coli ICONE 200 has a complete deletion of the lac operon, the β-galactosidase activity measured is due solely to the expression of the plasmid-borne lacZ gene. The results of this comparative study indicate that, in two independent experiments, the vector pTEX4-βGAL gives a level of β-galactosidase activity approximately 50 times higher than pTEX-βGAL (Figure 4). We deduce that the original sequence isolated in the RBS region of the vector pTEXCAT4 potentiates the expression not only of the CAT protein but also of other proteins such as, for example, β-galactosidase. From this example, we can conclude that other proteins of biotechnological interest could advantageously be expressed from one of the vectors according to the invention by introducing their coding sequence downstream of the mutated or deleted sequences according to the invention.

Example V

Comparison between the expression levels given by the vector pTEXwt (not part of the invention) and the vectors pTEX9', pTEX10', pTEX11' and pTEX12'.

The vectors pTEX10', pTEX11' and pTEX12' are derived from the vector pTEX9' but also include additional mutations, as indicated in Table 4 below.

Table 4. Comparison between the expression levels given by the vectors pTEXwt, pTEX9', pTEX10', pTEX11' and pTEX12'.

(*) Each nucleotide sequence (messenger RNA) comprises the mutated region downstream of the initiation codon. The reference sequence of the pTEXCAT vector is shown in the first line of the table. The nucleic acid sequences upstream and downstream of the initiation codon of the vectors are shown in this table after transcription into RNA. At the 3' end of these sequences, only the first two codons of the multiple cloning site are represented, namely GAAUUC.

The methods for determining the expression are those used in the examples above.

These results demonstrate that the deletions downstream of the initiation codon make it possible to obtain an overexpression up to more than 250 times greater than the expression observed using the wild-type vector.

Claims

1. Construct for the expression of a gene coding for a recombinant protein of interest placed under the control of the tryptophan operon promoter Ptrp in a prokaryotic host cell, comprising directly downstream of the initiation codon a nucleic acid sequence SEQ ID No. 1 and, downstream of this sequence, a multiple cloning cassette intended to receive the gene coding for said recombinant protein of interest, characterized in that at least one of the nucleotides of the sequence SEQ ID No. 1 is mutated or deleted so as to allow overexpression of said recombinant protein.

2. Construct according to claim 1, characterized in that at least one of the nucleotides of the sequence SEQ ID No. 1 is deleted.

3. Construct according to claim 1, characterized in that said at least one mutated or deleted nucleotide is located on the fragment of sequence SEQ ID No. 2 of the sequence SEQ ID No. 1.

4. Construct according to claim 1, characterized in that said at least one mutated or deleted nucleotide, preferably mutated, is located on the GTA codon of the sequence SEQ ID No. 1.

5. Construct according to claim 1, characterized in that said at least one mutated or deleted nucleotide, preferably mutated, is located on the GCA codon of the sequence SEQ ID No. 1.

6. Construct according to claim 1, characterized in that said at least one mutated or deleted nucleotide, preferably mutated, is located on the CTG codon of the sequence SEQ ID No. 1.

7. Construct according to claim 1, characterized in that said sequence SEQ ID No. 1, of which at least one of the nucleotides is mutated or deleted, has at least in positions 1, 2 and 3 the nucleotide A.

8. Construct according to claim 1, characterized in that said sequence SEQ ID No. 1 is completely deleted.

9. Construct according to one of claims 1 to 8, characterized in that at least one of the nucleotides, and preferably all the nucleotides, located between the nucleic acid sequence SEQ ID No. 1 and the multiple cloning cassette intended to receive the gene coding for said recombinant protein of interest, are deleted.

10. Construct according to any one of claims 1 to 9, characterized in that the nucleic acid sequence directly upstream of the initiation codon is chosen from the sequences SEQ ID No. 3 to SEQ ID No. 10.

11. Construct according to any one of claims 1 to 10, characterized in that the prokaryotic host cell is a gram-negative bacterium.

12. Construct according to any one of claims 1 to 11, characterized in that the prokaryotic host cell is E. coli.

13. Vector containing a construct according to any one of claims 1 to 12.

14. A prokaryotic host cell transformed by a vector according to claim 13.

15. Prokaryotic host cell according to claim 14, characterized in that it is E. coli.

16. A method of producing a recombinant protein of interest in a host cell using a construct according to any one of claims 1 to 12.

17. A method of producing a recombinant protein of interest according to claim 16, wherein said construct is introduced into a prokaryotic host cell.

18. A method of producing a recombinant protein of interest according to claim 16 or 17, wherein said construct is introduced into a prokaryotic host cell by a vector according to claim 13.

19. A method of producing a recombinant protein of interest according to one of claims 16 to 18, characterized in that it comprises the following steps: a) cloning a gene of interest into a vector according to claim 13; b) transforming a prokaryotic cell with the vector containing the gene coding for said recombinant protein of interest; c) culturing said transformed cell in a culture medium allowing expression of the recombinant protein; and d) recovering the recombinant protein from the culture medium or from said transformed cell.

20. Use of a construct according to one of claims 1 to 12, of a vector according to claim 13 or of a cell according to claim 14 or 15, for the production of a recombinant protein.

21. Use of a recombinant protein for the preparation of a medicament intended for administration to a patient in need of such treatment, characterized in that said recombinant protein is produced by a process according to one of claims 16 to 19.