MODIFIED INSULIN PRECURSORS PRODUCTS AND PROCESSES
Field of the Invention
This invention is related to products and processes useful for modifying polypeptide proteolytic cleavage sites, thus making the sites accessible to highly specific protease enzymes. More particularly, the invention is directed to methods and products whereby active insulin, preferably human insulin, can be obtained in vitro following removal of the C-chain of proinsulin by the action of specific enzymatic and/or chemical cleavage agents.
Background of the Invention
The biosynthesis of insulin takes place through a precursor molecule, proinsulin, wherein the end of the B-chain is connected to the beginning of the A-chain through an intermediate connecting peptide chain (C-chain). The mature active form of insulin results from the cleavage of proinsulin at two sites, thereby releasing the C-chain and yielding the amino-terminal B-chain and the carboxyl-terminal A-chain joined by a pair of disulfide bridges. See Steiner et al., Science, 157:697 (1967). The mechanism by which proinsulin is converted to insulin in vitro is conducted by, as yet, unidentified enzymes.
By convention, the amino acids of proinsulin are numbered according to their position starting at the amino-terminus of the B-chain. The method of the present invention is applicable to all insulin precursors, particularly those comprising human proinsulin in which neither the B-chain nor the A-chain of proinsulin contains the amino acids methionine or aspartic acid.
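The restriction to precursors whose A- and B-chains lack methionine and aspartic acid matters because the cleavage agents discussed in this specification act at those residues (cyanogen bromide at methionine, the P. fragi protease at aspartic acid). The check can be sketched as follows. This is a minimal illustration only: the one-letter chain strings are an assumption transcribed from the published human insulin sequence rather than taken from FIG. 1, and the function names are hypothetical.

```python
# Human insulin chains in one-letter code. ASSUMPTION: transcribed from the
# published human sequence, not from FIG. 1 of this document.
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # conventional positions 1-30
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"           # conventional positions 66-86

# Residues at which the cleavage agents act: M (Met, cyanogen bromide)
# and D (Asp, P. fragi protease).
CLEAVABLE = "MD"


def lacks_residues(chain, forbidden=CLEAVABLE):
    """Return True if none of the forbidden residues occurs in the chain."""
    return not any(aa in forbidden for aa in chain)


def is_applicable(b_chain, a_chain):
    """Return True if neither mature chain would be cut by the two agents."""
    return lacks_residues(b_chain) and lacks_residues(a_chain)
```

Under these assumptions `is_applicable(B_CHAIN, A_CHAIN)` holds for the human chains, whereas a chain carrying an internal Met or Asp would be rejected.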
Human proinsulin has a sequence of 86 amino acids having the structure NH2-(B-chain)-Arg-Arg-(C-chain)-Lys-Arg-(A-chain); see Stein et al., Proc. Nat. Acad. Sci. USA, 67:248 (1970). The B-chain consists of the amino acids at positions 1 through 30. The A-chain of human proinsulin consists of the amino acids at positions 66 through 86. Cleavage of human proinsulin to form insulin occurs between the amino acids at positions 30 and 31 and between the amino acids at positions 65 and 66; see FIG. 1. The lengths and compositions of the A- and B-chains of insulin are highly conserved evolutionarily, i.e., they are fairly constant from species to species. However, the length and composition of the C-chain of proinsulin vary widely according to the source.
The patent and scientific literature is replete with recombinant products and processes reportedly useful in the production of insulin and/or proinsulin. Generally, these techniques involve the synthesis of a desired gene sequence, and the expression of that sequence in either a procaryotic or eucaryotic cell, using techniques commonly available to the skilled artisan. Once a given gene has been isolated, purified and inserted into a transfer vector, the overall result of which is termed the cloning of the gene, its availability in substantial quantity is assured. The vector with its cloned gene is transferred to a suitable microorganism, for example, bacteria or yeast, wherein the vector replicates as the microorganism proliferates and from which the vector can be isolated by conventional means. Thus there is provided a continuously renewable source of the gene for further manipulations, modifications and transfers to other vectors or other loci within the same vector.
Expression may often be obtained by transferring the cloned gene, in proper orientation and reading frame, into an appropriate site in a transfer vector such that translational read-through from a procaryotic or eucaryotic gene results in synthesis of a chimeric fusion protein comprising the amino acid sequence coded by the cloned gene linked to an amino-terminal sequence from the procaryotic or eucaryotic gene. A variety of specific protein cleavage techniques may be used to cleave the chimeric protein at a desired point so as to release the desired amino acid sequence, which may then be purified by conventional means.
Techniques for constructing an expression transfer vector having the cloned gene in proper juxtaposition are described in Polisky et al., Proc. Nat. Acad. Sci. USA, 73:3900 (1976); Itakura et al., Science, 198:1056 (1977); Villa-Komaroff et al., Proc. Nat. Acad. Sci. USA, 75:505 (1978); and Chang et al., Nature, 275:617 (1978), the disclosures of which are incorporated herein by reference.
U.S. Patent No. 4,451,396 describes a process for producing the proinsulin cleaved fused gene product by inhibiting the undesirable irreversible thiol reactions during methionyl cleavage using cyanogen bromide. U.S. Patent No. 4,431,740 describes DNA having a base sequence coding for human proinsulin and novel recombinant DNA transfer vectors containing the cloned DNA. There is no mention of modified proinsulins.
U.S. Patent No. 4,430,266 describes a proinsulin-like disulfide insulin precursor produced from its corresponding linear chain S-sulfonate insulin precursor.
U.S. Patent No. 4,421,685 describes a process for the production of insulin or an insulin analogue. The process comprises bringing the S-sulfonated form of the A-chain and the S-sulfonated form of the B-chain, and a thiol reducing agent together in an aqueous medium.
U.S. Patent No. 4,343,898 describes a process for the preparation of esters of human and other mammalian insulins.
U.S. Patent No. 3,996,268 describes a cross-linking reagent useful in the synthesis of a proinsulin-like precursor. The A- and B-chains are cross-linked with the reagent carbonyl-bis(L-methionine p-nitrophenyl ester) to yield a precursor of proinsulin which can be oxidized to form the appropriate disulfide bridges between the A- and B-chains.
Proinsulin is readily converted to insulin using well established enzymatic and/or chemical techniques. See Kemmler et al., J. Biol. Chem., 242:6786 (1971).
The synthesis of a variety of connecting peptides and human proinsulin is described in Yanaihara et al., Diabetes, 27 (Suppl 1):149 (1978).
Examples of bacterial clones synthesizing proinsulin are shown in Villa-Komaroff et al., Proc. Nat. Acad. Sci. USA, 75:3727 (1978).
PCT application No. 83/03413 (hereinafter, the Seed application) proposed methods and products including modified proinsulin proteins which would be synthesized in bacteria, isolated and cleaved in vitro to produce insulin.
The concept suggested in the Seed application was to allow normal folding of proinsulin to occur, coupled with a method for specific removal of the C-chain. This method entailed altering the first and last codons of the DNA sequence encoding the C-chain such that Arg31 became Asp31 and Arg65 became Met65. This modified proinsulin gene was then to be expressed in E. coli to yield a modified proinsulin protein (presumed to be folded correctly) which would then be treated with cyanogen bromide to cleave the proinsulin molecule on the carboxyl side of Met65. Finally, the once-cleaved protein would be treated with a metalloprotease from Pseudomonas fragi which had been reported in the literature to cleave on the amino side of aspartic acid residues. This enzyme was expected to cut the protein between Thr30 and Asp31 to yield active mature insulin.
As described in greater detail herein below, the method and products described in the Seed application do not produce insulin in vitro. Specifically, by following the methods described in the Seed application, the change, Arg65 to Met65, could not be accomplished. Moreover, the purported cleavage of the Thr30-Asp31 bond by a P. fragi enzyme does not occur.
However, the Seed application does describe some useful concepts, and for that reason, it is incorporated herein by reference.
Brief Description of the Drawings
FIG. 1 represents the amino acid sequence of human proinsulin, showing the cleavage sites resulting in the production of human insulin;
FIG. 2 represents the general scheme by which modified proinsulin genes are inserted into an expression vector. The plasmid pAL181 is a derivative of pUC18 (Norrander et al., Gene, 26:101 (1983)), from which the Nde I site has been removed and the Pvu II to EcoR I fragment which specifies the lac promoter has been deleted and replaced by a portion of the bacteriophage lambda described by Shimatake and Rosenberg, Nature, 292:128 (1981). This segment contains the pL promoter fused to a segment carrying the ribosome binding site and initial ATG of the CII gene. The sequence in which this ATG codon resides is a site for cleavage with restriction endonuclease Nde I. This Nde I site has been fused to the Kpn I site in the polylinker of pUC18; and
FIGS. 3A and 3B represent the amino acid sequences of two modified human proinsulins, showing modifications near the cleavage sites which allow for the in vitro production of human insulin.
Summary of the Invention
The present invention is directed to products and processes for the generation in polypeptides of protease cleavage sites, thereby making said cleavage sites accessible for in vitro cleavage by highly specific proteases. More particularly, this concept has been employed in the production from a modified proinsulin of an insulin containing a B-chain having the proper carboxyl-terminus. A modified proinsulin with several amino acids deleted from its C-chain was cleaved at Thr30-Asp31 and at a site equivalent to the Arg65-Gly66 of the natural proinsulin C-chain (this site having been modified to a Met-Gly sequence) to yield active human insulin in vitro. There are thus provided compositions and methods for the in vitro production of insulin, especially active human insulin.
According to the present invention, insulin is produced by a process comprising the in vitro synthesis of a modified proinsulin gene sequence. Modified proinsulin comprises proinsulin having the amino acid aspartic acid substituted for the amino acid at the amino-terminus of the C-chain, methionine substituted for the amino acid at the carboxyl-terminus of the C-chain, and a deletion from the C-chain of certain charged amino acid groups. This last modification is essential to enable cleavage by a protease at the Thr30-Asp31 bond. The preferred modified proinsulin comprises a modified human proinsulin.
In general, the process whereby a mammalian protein, such as a modified proinsulin, is produced with the aid of recombinant DNA technology first requires the cloning of the mammalian gene. Once cloned, the gene may be produced in quantity, further modified by chemical or enzymic means and transferred to an expression vector or plasmid. Any appropriate expression vector may be selected by the skilled artisan for use in the process of this invention. The cloned gene is also useful for isolating related genes, or, where a fragment is cloned, for isolating the entire gene, by using the cloned gene as a hybridization probe. Further, the cloned gene is useful in providing, by hybridization, the identity or homology of independent isolates of the same or related genes. Because of the nature of the genetic code, the cloned gene, translated in the proper reading frame, will direct the production only of the amino acid sequence for which it codes and no other.
The present invention provides the essential elements needed for the production of human insulin by techniques adaptable to industrial processes. The naturally occurring structural gene for proinsulin has been cloned. Its modification and expression in an appropriate host cell type yields a protein product which is convertible to insulin by known methods. The present invention is fundamentally based upon using the proinsulin molecule, taking advantage of the fact that the C-peptide region (i.e., C-chain) of the proinsulin molecule permits folding such that the A- and B-chains are properly juxtaposed. In such a configuration, the correct pairing of sulfhydryl groups is assured and the disulfide bridges found in the active insulin molecule are readily formed.
A DNA sequence encoding modified proinsulin precursors was prepared by annealing at least one select oligonucleotide primer to a vector containing a single-stranded DNA sequence encoding, or complementary to the sequence encoding, proinsulin. In a preferred embodiment, the primer was annealed to a single-stranded DNA encoding human proinsulin that had been inserted into an M13 vector.
To effect the substitution of aspartic acid for the amino acid at the amino-terminus of the C-chain, the primer contained a nucleotide triplet complementary to that encoding aspartic acid and sufficient nucleotides complementary to a single-stranded, message-synonymous DNA sequence encoding proinsulin to effectively anneal the primer to the single-stranded DNA sequence at a position wherein the aspartic acid encoding triplet encoded the amino acid at the amino-terminus of the C-chain.
Likewise, to substitute an aspartic acid for the amino acid at the amino-terminus of the C-chain, a primer that was the complement of the above-described primer was prepared and annealed to a single-stranded DNA sequence complementary to the sequence encoding proinsulin.
To effect the substitution of methionine for the amino acid at the carboxyl-terminus of the C-chain, the primer contained a nucleotide triplet complementary to that encoding methionine and sufficient nucleotides complementary to a single-stranded DNA sequence encoding proinsulin to effectively anneal the primer to the single-stranded, message-synonymous DNA sequence at a position wherein the methionine encoding triplet encoded the amino acid at the carboxyl-terminus of the C-chain.
Similarly, to substitute methionine for the amino acid at the carboxyl-terminus of the C-chain, a primer that was the complement to the above-described primer containing a triplet encoding methionine was annealed to the single-stranded DNA sequence complementary to the sequence encoding proinsulin.
After the primer or its complement had been annealed to the vector, the primer was extended and ligated to form a circular double-stranded vector which was then inserted into cells such as E. coli. The cells and the vectors were then replicated. This formed at least two groups of vectors, one group comprising an unaltered DNA sequence encoding normal proinsulin and a second group comprising an altered DNA sequence encoding modified proinsulin. The vectors carrying the altered DNA sequence were identified and isolated by conventional techniques such as (1) specific hybridization to the oligonucleotide primer or (2) electrophoresis following digestion with a restriction enzyme, if the alteration of the DNA sequence caused a change in the restriction pattern of the gene. The altered DNA sequence was then cleaved from the vector and inserted into a second vector capable of expressing high levels of proinsulin in production cells. Modified proinsulin produced in these cells was recovered and purified. Any of the purification methods available to the skilled artisan could have been employed, but the preferred method was that of Wetzel et al., Gene, 16:63 (1981), the disclosure of which is incorporated herein by reference.
Once the modified proinsulin was purified, the C-chain and any amino acids attached either to the amino-terminus of the B-chain or to the carboxyl-terminus of the A-chain were cleaved from the modified proinsulin in vitro, to yield active human insulin.
Detailed Description
The present invention overcomes the problems of the Seed application by employing a modified proinsulin capable of being cleaved on the carboxyl side of threonine 30 by a protease from a mutant strain of Pseudomonas fragi. This enzyme, isolated from culture filtrates of the mutant P. fragi Me 1 (Noreau et al., J. Bacteriol., 140:911 (1979)), has been shown to cleave peptide bonds on the amino-terminal side of either aspartic acid or cysteic acid residues in oxidized ribonuclease (Drapeau, J. Biol. Chem., 255:839 (1980)). This is in contrast with wild type P. fragi, which produces a proteolytic enzyme with a more general substrate specificity (Porzio et al., Biochim. Biophys. Acta, 384:235 (1975)). In addition to the results obtained with oxidized ribonuclease, the mutant enzyme has been employed in the digestion of alpha-tubulin from porcine brain (Ponstingl, Proc. Nat. Acad. Sci. USA, 78:2757 (1981)). Here again it cleaved the protein on the amino side of at least some of the aspartic acid residues.
As described above, the present invention represents a modification of the teachings of the Seed application whereby insulin is produced in vitro from a modified proinsulin expressed in E. coli.
Thus, as a first step, the problems associated with the Seed application will be described.
In order to test the methods taught in the Seed application, the gene for human proinsulin had to be isolated. The coding sequence for human proinsulin is known (Bell et al., Nature (London), 282:525 (1979)). Thus, rather than isolating a cDNA clone as suggested in the Seed application, two oligonucleotides complementary to portions of the human proinsulin gene were synthesized. The sequences of these two oligonucleotides are given in Table I. Oligonucleotide Nos. 1 and 2 were radioactively labeled (32P) and used to screen a human genomic library for complementary sequences. This screen yielded a clone which contained the entire coding sequence for preproinsulin as well as one large intron. This cloned segment of DNA was transferred into phage M13 and single-stranded DNA containing the preproinsulin coding sequence was prepared. To this DNA was annealed an oligonucleotide complementary to the sequences on both sides of the intron. The sequence of the oligonucleotide used in the reaction was No. 3 in Table I. This oligonucleotide was extended in length using the large fragment of E. coli DNA polymerase I, which generated a double-stranded DNA molecule from the single-stranded template, and the ends were joined with T4 DNA ligase. This DNA was transferred into E. coli and viral plaques were screened for DNA sequences which would hybridize strongly with oligonucleotide No. 3. Among those M13 plaques were found phage in which the entire intervening sequence from the genomic preproinsulin gene had been lost and the entire coding sequence for the preproinsulin gene was continuous.
TABLE I
Oligonucleotide No.    Oligonucleotide Sequence (5'-3')
1.    d(CTG CCC CTG CTG GCG CTG CTG GCC CTC TGG)
2.    d(CCT CCA GGG CCA AGG GCT GCA GGC TGC CTG)
3.    d(GGA CCT GCA GGT GGG GCA GGT G)
4.    d(C CTG CAG TTC CTC TGC)
5.    d(CCC CAC CTG CAG ATC GGT CTT GGG)
6.    d(TTC CAC AAT GCC CAT ACG CTT CTG CAG)
Once the proinsulin gene had been isolated, it was transferred into a plasmid, e.g. pAL181, which carried the appropriate DNA sequences to allow expression of the proinsulin protein in E. coli. A sample of plasmid pAL181 is on deposit at the American Type Culture Collection, Rockville, Maryland under accession number ATCC No. 40134. These plasmids carried the pL promoter of phage Lambda and the ribosome binding site for the CII gene of Lambda. The proinsulin sequence was transferred into these plasmids by synthesizing an oligonucleotide which was complementary to the first twenty nucleotides of the proinsulin portion of the gene. This oligonucleotide was annealed to the single-stranded M13 phage DNA carrying the preproinsulin coding sequence and extended by the action of the large fragment of E. coli DNA polymerase I. After a short period of synthesis, the DNA was treated with a single-strand-specific nuclease and finally with restriction endonuclease Sph I. The resultant double-stranded DNA fragment was ligated into the expression plasmids which had been prepared by cleaving with Kpn I, treatment with the large fragment of E. coli DNA polymerase I, and finally cleaving the DNA with Sph I (see FIG. 2). This construction placed the proinsulin gene in frame with the initial methionine codon of the Lambda CII gene. This plasmid was transferred into E. coli and used to produce the proinsulin protein.
Once the proinsulin coding sequence was isolated, confirmed and expressed, the Seed modifications were undertaken. The two oligonucleotides taught in the Seed application were synthesized and the reactions described therein were carried out to alter the proinsulin sequence. The Arg31 to Asp31 change was made using the procedures taught by Seed. M13 phage which hybridized strongly to the oligonucleotide were picked and the presence of the new Pvu I site was substantiated. The sequence of the region was determined to be sure no other alterations had occurred.
The replacement of Arg65 with Met65 was problematic. It was found that a second region within the M13 phage is homologous to 12 of the 15 bases of the oligonucleotide in the Seed application and in fact seemed to be the main site for priming during the DNA polymerase mediated synthesis of the second strand of M13 (an essential step in site directed mutagenesis). The reaction was attempted a number of times under various conditions, but was never successful for the generation of the desired substitution of a Met codon for an Arg codon.
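The kind of secondary priming site described above (12 of 15 bases homologous) can be looked for with a simple sliding-window comparison. The sketch below is illustrative only: the primer and template strings are hypothetical toy sequences, not the Seed oligonucleotide or the M13 genome, and for simplicity the primer is compared directly against the strand written in the same sense rather than against its complement.

```python
def partial_matches(primer, template, min_matches):
    """Slide the primer along the template and report every window that
    agrees with the primer in at least min_matches positions.

    Returns a list of (start_index, number_of_matching_bases) tuples,
    with start_index counted from 0.
    """
    n = len(primer)
    hits = []
    for start in range(len(template) - n + 1):
        window = template[start:start + n]
        matches = sum(1 for a, b in zip(primer, window) if a == b)
        if matches >= min_matches:
            hits.append((start, matches))
    return hits


# HYPOTHETICAL toy data: a 15-base primer, an exact target site, and a
# decoy site differing from the primer at 3 of 15 positions.
PRIMER = "ACGTTGCAAGGCTTA"
DECOY = "TCGTTGCTAGGCTTC"
TEMPLATE = "GGGG" + DECOY + "CCCCCC" + PRIMER + "GGGG"

hits = partial_matches(PRIMER, TEMPLATE, min_matches=12)
```

With this toy template the decoy at index 4 is reported with 12 matching bases alongside the intended site at index 25, which is the situation that frustrated the Arg65 to Met65 substitution.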
Despite this, it was decided to use the proinsulin sequence containing the Asp31 codon to test the Seed suggestion of using the P. fragi protease to cleave the proinsulin between Thr30 and Asp31. The analysis of the cleavage pattern of proinsulin posed some problems. Along with Asp31, the modified proinsulin contained a second aspartic acid at position 36. If the P. fragi protease were to cut the modified proinsulin protein specifically on the amino side of aspartic acids, it would cut the molecule twice, leaving a piece 30 amino acids long, a piece 5 amino acids long, and a piece 51 amino acids long. The most reliable method for analyzing these cleavages is to sequence the resultant pieces. Such sequencing is generally done starting from the amino-terminus of each fragment. Small fragments are very difficult to sequence by this method. Thus a quantitative analysis of the cleavage of interest (i.e., between Thr30 and Asp31) would be very difficult.
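The fragment lengths quoted above follow directly from cutting an 86-residue chain on the amino side of positions 31 and 36. A minimal sketch of that arithmetic is given below. The one-letter proinsulin string is an assumption transcribed from the published human sequence rather than from FIG. 1, and the helper names are hypothetical.

```python
# ASSUMPTION: human proinsulin in one-letter code, from the published
# sequence (B-chain 1-30, connecting region 31-65, A-chain 66-86).
PROINSULIN = (
    "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"           # B-chain
    "RREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKR"      # Arg-Arg, C-chain, Lys-Arg
    "GIVEQCCTSICSLYQLENYCN"                    # A-chain
)


def substitute(seq, position, residue):
    """Replace the residue at a 1-based position."""
    return seq[:position - 1] + residue + seq[position:]


def cleave_on_amino_side(seq, residue="D"):
    """Cut the chain immediately before every occurrence of the residue."""
    cuts = [i for i, aa in enumerate(seq) if aa == residue and i > 0]
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


# Singly modified proinsulin: Arg31 -> Asp31 (Asp36 is still present).
singly = substitute(PROINSULIN, 31, "D")
singly_lengths = [len(f) for f in cleave_on_amino_side(singly)]

# Replacing Asp36 with Glu as well leaves a single Asp, at position 31.
doubly = substitute(singly, 36, "E")
doubly_lengths = [len(f) for f in cleave_on_amino_side(doubly)]
```

Here `singly_lengths` comes out as 30, 5 and 51 residues, matching the pieces described in the text, while `doubly_lengths` is 30 and 56, so the only expected cut is the one of interest.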
To obviate these problems, another alteration was made to the proinsulin sequence. This time the Asp36 codon was altered in its third position so that it coded for glutamic acid. This alteration was effected in a manner completely analogous to the method by which the Asp31 codon was produced, only this time oligonucleotide No. 4 from Table I was used as the primer. Sequence analysis confirmed that this new proinsulin contained the desired single aspartic acid at position 31. This doubly modified proinsulin gene was inserted into the same expression vector used to make the singly modified proinsulin, and it too encoded the synthesis of a protein which had a mobility on polyacrylamide gel similar to that of proinsulin.
Both the singly and doubly modified proinsulins were prepared in E. coli and isolated by the method of Wetzel et al., Gene, 16:63 (1981). These preparations were used to determine if the P. fragi protease could indeed cleave the protein as desired (i.e., between Thr30 and Asp31).
Proinsulin protein containing only one aspartic acid residue (Asp31) was mixed with P. fragi protease at ratios varying from 100/1 to 1/1. These solutions were allowed to remain at 37°C for up to 24 hours. The only reaction in which any change was seen was at the highest enzyme to substrate ratio, and then only after many hours. In this case approximately 20% of the proinsulin was cut. The cleavage occurred within the C-peptide between Leu56 and Ala57. This cleavage was interpreted as resulting from a trace impurity in the protease preparation, and not from the proteolytic activity of interest.
Attempts were made to cleave the modified proinsulin under a variety of conditions. These included reactions in buffer plus and minus 2 M urea as well as buffer plus and minus extra zinc ions. In no case did the proinsulin molecule show any sign of cleavage, as judged by a change in mobility on a gel or in its HPLC profile.
Furthermore, the modified proinsulin was reacted with radioactively labeled iodoacetamide to alkylate and label all of the cysteine groups of the protein. This material was refractory to cleavage by P. fragi protease but was readily cleaved by trypsin.
Finally, an equimolar mixture of oxidized ribonuclease and the modified proinsulin was prepared and treated with the protease. Within minutes the ribonuclease was cleaved into many fragments, but the modified proinsulin was unaffected. Thus it was concluded that, despite the presence of an aspartic acid in the modified proinsulin sequence, the P. fragi protease was unable to cleave this protein between Thr30 and Asp31. The process described in the Seed application is thus shown to be unworkable.
These results were surprising in that they represent the first indication that there are aspartic acid residues at which the P. fragi protease will not cleave.
While not wishing to be bound to any particular theory, it was reasoned that the high charge density in the region (from amino acids 32-36) might be inhibiting the protease from either recognizing aspartic acid 31 or from binding to and cleaving at the threonine 30-aspartic acid 31 junction.
Thus the proinsulin gene was modified once again, this time by deleting the 15 nucleotides that had coded for the amino acid sequence arginine 32 to aspartic acid 36; see Figure 3A.
This modification was performed in a manner similar to, but not identical with, the method used to produce the original proinsulin modifications. The expression plasmid pALIA181 carrying the modified proinsulin gene with the Arg31 to Asp31 substitution was cut once with restriction endonuclease Sca I. Plasmid pAL181 was cut with restriction endonuclease Nde I, which cuts pAL181 just before the ATG sequence which constitutes the end of the sequence derived from phage Lambda. This plasmid was also cut with Sph I and the larger fragment was isolated. The two linearized plasmids were mixed in equimolar amounts, the pH was raised to 13.3 for ten minutes, and the denatured DNA was diluted tenfold and the pH adjusted to 8.0. This mixture was heated to 68°C for 3 hours and allowed to slowly cool to 25°C. To this solution oligonucleotide No. 5 from Table I was added. The solution temperature was raised to 37°C for two minutes and then reduced to 15°C for thirty minutes. To this DNA solution the large fragment of E. coli DNA polymerase I and T4 DNA ligase were added. After two hours at 15°C and one hour at 25°C the DNA was transformed into E. coli and the resultant ampicillin-resistant colonies were screened for sequences that would hybridize with oligonucleotide No. 5. Several colonies were identified and sequence analysis confirmed that the desired 15 base pair deletion had been effected in the proinsulin gene. When modified proinsulin was made from this deleted gene, the P. fragi protease cleaved it effectively at the desired bond.
Alternatively, deletion of amino acids, arginine 31 through glutamic acid 35 of normal proinsulin (See Figure 3B) can be accomplished by deleting the codons which code therefor. Synthesizing this modified proinsulin in E. coli followed by isolation, purification and in vitro digestion using P. fragi protease yields efficient cleavage at the desired bond.
The final modification to the proinsulin molecule that was carried out was the insertion of a sequence at the carboxyl-terminus of the C-chain which allowed specific processing at that site. This modification was carried out in a fashion completely analogous to that described above for generating the 15 base pair deletion. In this case the oligonucleotide used was No. 6 from Table I. This oligonucleotide was used to prime the repair of the heteroduplexed DNA and then used to probe the resultant colonies for those which hybridized with this new sequence. This modified proinsulin sequence produced a protein with a methionine at the carboxyl-terminus of its C-chain which allowed for the cleavage of the methionine to glycine bond with cyanogen bromide. Alternatively, Arg65 simply may be replaced by the amino acid Met, if the appropriate oligonucleotide primer is used, thereby creating the proper cleavage site.
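The mutagenic oligonucleotides Nos. 4 to 6 of Table I can be read back into the peptide sequence they specify. The sketch below is a hedged illustration: it assumes that each oligonucleotide is complementary to the coding strand and that the reverse complement is in frame from its first base. Both points are inferences made for this example, not statements from the specification, and the function names are hypothetical.

```python
# Standard genetic code, built in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Sequences as listed in Table I (5'-3'), spaces removed below.
OLIGOS = {
    4: "C CTG CAG TTC CTC TGC",
    5: "CCC CAC CTG CAG ATC GGT CTT GGG",
    6: "TTC CAC AAT GCC CAT ACG CTT CTG CAG",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the first base; drop any remainder."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def encoded_peptide(oligo):
    """ASSUMPTION: the oligo is antisense and in frame after reversal."""
    return translate(reverse_complement(oligo.replace(" ", "")))


peptides = {number: encoded_peptide(seq) for number, seq in OLIGOS.items()}
```

Under these assumptions No. 4 reads Ala-Glu-Glu-Leu-Gln (Glu in place of Asp36), No. 5 reads Pro-Lys-Thr-Asp-Leu-Gln-Val-Gly (Thr30-Asp31 joined directly to residues 37 to 40), and No. 6 reads Leu-Gln-Lys-Arg-Met-Gly-Ile-Val-Glu (a Met codon placed between Arg65 and Gly66), which appears consistent with the modifications described in the text.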
Expression, followed by purification and refolding of this final modified proinsulin, led to a protein which, when treated in vitro with cyanogen bromide and a protease from P. fragi, afforded active human insulin.
This invention also contemplates substituting enzymatic cleavage sites for cyanogen bromide cleavage sites. In order to effect such substitution, an appropriate oligonucleotide primer is used to create a DNA molecule encoding a peptide cleavage site specifically recognized and cleaved by a proteolytic enzyme. A variety of peptide cleavage sites and corresponding proteolytic enzymes may be used so long as the enzyme does not cleave the modified proinsulin within the mature insulin sequence. The C-chain of the proinsulin molecule containing the modified cleavage site sequences may thus be cleaved enzymatically at both of its termini using appropriate enzymes and conventional enzymatic techniques.
As employed in the following claims, the numbering system conventionally employed to describe human proinsulin (see Figure 1) is retained, even though several amino acids have been removed to form a modified proinsulin chain. For example, in reference to the Met65-Gly66 bond of natural human insulin, the removal of four amino acids upstream would cause a change in reference numbers to Met61-Gly62. For the sake of simplicity, regardless of any amino acid deletions, the conventional numbering system is retained.
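The relationship between the retained conventional numbers and the actual position of a residue in a shortened chain can be sketched as a small helper. The function name and the particular sets of deleted positions used below are hypothetical, chosen only to reproduce the arithmetic of the example above.

```python
def actual_index(conventional_position, deleted_positions):
    """Map a conventional proinsulin position to its 1-based index in a
    chain from which the given conventional positions have been deleted."""
    if conventional_position in deleted_positions:
        raise ValueError("residue %d has been deleted" % conventional_position)
    removed_upstream = sum(
        1 for p in deleted_positions if p < conventional_position
    )
    return conventional_position - removed_upstream


# HYPOTHETICAL deletion of four residues upstream of position 65.
four_deleted = {32, 33, 34, 35}
met = actual_index(65, four_deleted)
gly = actual_index(66, four_deleted)

# Deletion of the five residues 32 through 36, as in the 15-nucleotide
# deletion described in the text.
five_deleted = {32, 33, 34, 35, 36}
met_after_five = actual_index(65, five_deleted)
```

With four residues removed upstream, positions 65 and 66 fall at indices 61 and 62, as in the example in the text; residues ahead of the deletion, such as Thr30, keep their index.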