EP0329710A1

EP0329710A1 - Microbial production of peptide oligomers

Info

Publication number: EP0329710A1
Application number: EP19880900652
Authority: EP
Inventors: Jon Ira Williams; Anthony Joseph Salerno; Ina Goldberg; William Turner Mcallister
Original assignee: Allied Corp
Current assignee: Allied Corp
Priority date: 1987-01-07
Filing date: 1987-12-17
Publication date: 1989-08-30
Also published as: JPH02502604A; WO1988005082A1

Abstract

Processes for the microbial production of peptide oligomers, polypeptide products resulting from the application of any of these processes, and microbes for use in such production. Another aspect of the invention relates to processes for obtaining these microbes by genetic engineering methods and to plasmid vectors for use in genetic engineering.

Description

MICROBIAL PRODUCTION OF PEPTIDE OLIGOMERS

1. Field of the Invention

This invention relates to processes for the microbial production of peptide oligomers, to polypeptide products resulting from application of any of these processes, and to microbes for use in such production. Another aspect of this invention relates to processes for genetically engineering such microbes and to plasmid vectors for use in such engineering.

2. Prior Art:

Procedures for genetically engineering microbes are known. Illustrative of certain aspects of these procedures relevant to this application are those described in G. D. Stormo, T. D. Schneider and L. M. Gold, Nucleic Acids Research 10, 2971-2996 (1982); A. Shatzman, Y. S. Ho and M. Rosenberg in Experimental Manipulation of Gene Expression, pp. 1-14. M. Inouye, ed. (Academic Press, 1983); and A. Rattray, S. Altuvia, G. Mahagna, A. B. Oppenheim and M. Gottesman, Journal of Bacteriology 159, 238-242 (1984).

Modern biochemical advances in genetic technology have led to the introduction of new techniques for transferring genes between species. Many of these techniques are based on the use of plasmid vectors with microorganisms as hosts. These vectors allow establishment and expression of foreign genes in microorganisms such as bacteria under controllable conditions. See J. G. Sutcliffe and F. M. Ausubel in Genetic Engineering, pp. 83-111. A. M. Chakrabarty, ed. (CRC Press, 1978) and R. Wu, L.-H. Guo and R.C. Scarpella in Genetic Engineering Techniques, pp. 3-21. P. C. Huang, T.T. Kuo and R. Wu, eds. (Academic Press, 1982). A large number of plasmids are now available that allow cloning of either genes with their naturally associated regulatory DNA sequences or genes which function under the control of regulatory DNA sequences inherent to the parent plasmid. Many of these plasmids have been applied to the isolation, characterization and expression of many genes, gene fragments or gene promoter sequences. Most of the genes which have been cloned and expressed from plasmid vectors in bacteria such as the gram-negative bacterium Escherichia coli code for proteins which are enzymes or which have a physiologic function (e.g., hormones, blood factors, cell growth factors, etc.). Relatively few genes or gene fragments have been cloned that code for all or part of a structural protein such as components of the extracellular matrix in multicellular higher organisms; these proteins include the collagen family, elastin, fibronectin, laminin and other fibrous proteins. Other structural proteins with interesting physical or chemical properties include the protein or glycoprotein elements of thick, intermediate or thin filaments in higher organisms, the annelid or arthropod silks, bacterial flagellin, resilin, eucaryotic egg shell proteins, insect cuticle proteins and architectural proteins involved with eucaryotic developmental processes such as tissue organization. Very few of these cloned genes have been expressed and their protein products isolated, purified and/or biochemically analyzed following their expression in a heterologous bacterial host.

Researchers in recombinant DNA technology using the bacterial host E. coli who have been or who are interested in optimizing foreign gene expression from plasmid vectors have utilized various strategies for increasing protein production from the foreign genes. These strategies include use of runaway replication of the plasmid vector, thermal or chemical induction of the promoter DNA sequence controlling expression of the foreign gene, or use of highly active promoter sequences such as the lac, trp or lpp promoters endogenous to E. coli or natural or synthetic mutant forms thereof. For illustrative examples of such efforts, see B. Uhlin, S. Molin, P. Gustafsson and K. Nordstrom, Gene 6, 91-106 (1979); K. Backman and M. Ptashne, Cell 13, 65-71 (1978); K. Nordstrom, S. Molin and J. Light, Plasmid 12, 71-90 (1984); and P. Stanssens, E. Remaut and W. Fiers, Gene 36, 211-223 (1985). Hybrid promoters which advantageously use a -35 consensus sequence and a 5' flanking region from one promoter and a portion of a promoter/operator sequence including a -10 region sequence and a Shine-Dalgarno sequence from a second natural or synthetic promoter/operator DNA sequence have proven particularly useful for high level expression of foreign genes in E. coli. See literature, in the case of hybrid trp-lac promoters, such as H. A. DeBoer, L. J. Comstock and M. Vasser, Proc. Natl. Acad. Sci. 80, 21-25 (1983); E. Amann, J. Brosius and M. Ptashne, Gene 25, 167-178 (1983); U. S. Patent 4,551,433, issued Nov. 5, 1985 to H. A. DeBoer; and European patent application 0136090 (filed Aug. 24, 1985) by R. Arentzen and S. R. Petteway, Jr.

Plasmid vectors utilizing the controlling elements of the bacteriophage lambda P_L promoter in concert with additional elements such as temperature-sensitive expression of the cI repressor protein governing activity from the P_L promoter and the nutL locus for antitermination activity mediated by the bacteriophage N protein have also provided high levels of foreign gene expression in E. coli and have proved to be as strong as or stronger than other strong promoters such as the lacUV5 promoter in E. coli. See especially E. Remaut, P. Stanssens and W. Fiers, Gene 15, 81-93 (1981); U.S. Patent 4,578,355, issued Mar. 25, 1986 to M. Rosenberg; J. A. Lautenberger, D. Court and T. S. Papas, Gene 23, 75-84 (1983); European patent application 0131843 (filed Mar. 7, 1984) by H. Aviv, M. Gorecki, A. Levanon, A. Oppenheim, T. Vogel, E. Zeelon, and M. Zeevi; and C. A. Caulcott and M. Rhodes, Trends in Biotechnology 4, 142-146 (1986). Most of these publications describe cloning of foreign genes in phase with an initiation codon ATG and production of a fusion protein under the control of the lambda P_LO_L promoter/operator system, N protein-nutL interaction and the lambda cII gene ribosomal binding site. The product fusion protein then includes some portion of the amino terminus peptide sequence from the bacteriophage lambda cII protein.

Applicants are aware that the Department of Health and Human Services, U.S.A., under the names of T. S. Papas and J. A. Lautenberger filed a U.S. Patent application under Serial No. 6-511,108 on July 6, 1983 covering the plasmid pJL6. Portions of this application have been obtained from the National Technical Information Service, U. S. Department of Commerce. However, the claims are not available and are maintained in confidence. The available portions of the application have been reviewed. The construction of pJL6 is described and its use as a cloning and expression vector for heterologous genes is discussed with relevant examples drawn exclusively from molecular cloning experiments with oncogenes. No mention is made in the available application portions of the use of recombination-deficient bacterial hosts, the cloning of synthetic genes or genes coding for structural proteins, or cloning into restriction enzyme recognition sites in pJL6 other than the ClaI site or the ClaI-BamHI site pair. All heterologous genes cloned in pJL6 will therefore necessarily produce fusion protein products whereby the foreign gene product cannot be prepared free of amino acid residues on the amino terminus which derive from the lambda cII gene.

A. Seth, P. Lapis, G. F. Van de Woude and T. S. Papas in Gene 42, 49-57 (1986) describe modification of the expression vector pJL6 to yield a class of plasmid vectors which contain in 5' to 3' order: the lambda bacteriophage P_LO_L promoter/operator sequence, an N gene-cro gene fusion polypeptide, the N gene utilization site (nutL), a ribosomal binding site from the lambda cII gene and a restriction enzyme recognition site which is adjacent to the initiation codon ATG and which allows insertion of foreign genes in phase with the initiation codon so as to code for a protein product with at most one extraneous amino acid residue. The plasmids constructed by A. Seth et al. were specifically designed to be cleaved by an appropriate restriction enzyme and treated with S1 nuclease and also have an NdeI restriction enzyme recognition site which includes the initiation codon ATG as well as a second NdeI restriction site downstream of the unique HpaI, BamHI or KpnI restriction sites described as useful for cloning foreign genes. This article makes no mention of cloning synthetic genes or production of structural proteins for other than the purpose of biochemical research studies. Any advantages of the use of E. coli recombination-deficient bacterial hosts for these plasmids are neither disclosed nor discussed by these authors.

H. Aviv et al. (op. cit.) claim as a composition of matter vectors which include in 5' to 3' order: a DNA sequence which contains the promoter and operator P_LO_L from bacteriophage lambda, the N gene utilization site for binding antiterminator N protein produced by the host cell, a DNA sequence which contains a ribosomal binding site for rendering the mRNA of the desired gene capable of binding to ribosomes within the host cell, an ATG initiation codon or a DNA sequence which is converted into an ATG initiation codon upon insertion of the desired foreign gene into the vector, and a restriction enzyme recognition site for inserting the desired foreign gene into the vector in phase with the ATG initiation codon. This type of vector does not necessarily suffer from the potential disadvantage of producing fusion proteins with unwanted amino acid residues at the amino terminus which cannot be conveniently removed. No mention is made in this patent application of cloning of synthetic genes or of genes with repeating amino acid sequences, of cloning of structural proteins or proteins with interesting physical properties, or of the utility or preferred use of E. coli recombination-deficient bacterial hosts for gene expression from the claimed plasmid vectors.

Gene fusions and hybrid genes have been known in the art of molecular genetics for a number of years. For example, see L. Guarente in Genetic Engineering, Principles and Methods - Volume 6, pp. 233-248 (J. K. Setlow and A. Hollaender, eds.; Plenum Press, 1984) and J. H. Kelly and G. J. Darlington, Annual Reviews of Genetics 19, 273-296 (1985) for reviews. Also see world patent applications WO 83/03547 (U.S.A. priority date April 14, 1982) by J. L. Bittle and R. A. Lerner, WO 85/02611 (filed December 12, 1984) by R. A. Houghten for the Scripps Clinic and Research Foundation and WO 86/01210 (filed August 1985) by D. A. Carson, G. Rhodes and R. Houghten for the Scripps Clinic and Research Foundation, and European patent applications EPA 0141484 (GB priority date June 10, 1983) by C. Weissman and H. Weber for Biogen N.V., EPA 0152736 (GB priority date November 1, 1984) by H. Ferres, R.A.G. Smith and A. J. Garman for Beecham Group P.L.C., and EPA 0161937 (GB priority date May 16, 1984) by K. Nagai and H. C. Thogersen for Celltech Ltd. All of these patent applications describe the production of fusion or hybrid proteins for a variety of pharmacological agents, enzyme conjugates and diagnostic methods and kits. None of these applications, however, refers to the production of proteins preferred for their physical or structural properties or to the production of peptides or proteins from synthetic genes, or discusses a requirement to produce recombinant products in recombination-deficient bacterial hosts. Some of these applications claim peptide or protein products with internally repeating amino acid sequences, including oligomers of a native protein, but without exception these products as discussed in the relevant applications are pharmacologically or antigenically active compounds.

As another aspect of the art of molecular cloning pertinent to the invention described herein, it should be noted that several research groups have successfully cloned synthetic genes. Very few of these cloning efforts have focused on peptide or protein products with internally repeating amino acid sequences. The cloning of a synthetic gene coding for a polymeric form of an oligopeptide, specifically the dipeptide L-aspartyl-L-phenylalanine, is disclosed in M.T. Doel et al., Nucleic Acids Research 8, 4575-4592 (1980). A requirement therein for the use of a recombination-deficient host is recognized by the employment of E. coli strain HB101 (genotype recA13) which is widely used in the art of molecular cloning. However, these researchers only describe a process for producing polymeric forms of short oligopeptides which could be subsequently broken down chemically or enzymatically into short oligopeptides and do not address any potential advantages to production and use of the polymeric peptides directly. The method described in this reference also is limited to those synthetic genes which can be constructed by annealing two completely complementary oligodeoxynucleotides so as to create DNA hybrids with staggered ends that can further anneal into large oligomeric synthetic DNA sequences. There is no disclosure in this reference of any method to further oligomerize the synthetic gene products into even larger synthetic genes.

Other literature in the art of molecular cloning and peptide or protein expression has dealt with the problem of DNA segment oligomerization. Strategies have been presented in several of these references for specifically and efficiently linking equivalent DNA segments into long DNA sequences which code in an uninterrupted fashion for a large peptide or protein product with internally repeating sequence. See J. L. Hartley and T. J. Gregori, Gene 13, 347-353 (1981); T. A. Willson et al., Gene Analytical Techniques 2, 77-82 (1985); and T. Kempe et al. Gene 39, 239-245 (1985). In contrast to the current invention, none of these references discloses production of synthetic genes coding for repeating amino acid sequences which are of essential value in the polymerized state or discloses the preferred use of recombination-deficient bacterial hosts for plasmid expression vectors bearing synthetic genes. The examples and discussion in these articles bear only on aggregates or oligomers of protein or peptide products which are pharmacologically active or have an undisclosed activity.

S. Petty-Saphon and J. A. Light have claimed in U.K. patent application GB 2162190 (filed July 8, 1985) a method of producing polypeptide products which are components of silk, including those wherein the silk protein comprises sets of the sequence (Gly-Ala-Gly-Ala-Gly-Ser). However, this application is not enabling: no examples are given; appropriate plasmid expression vectors and suitable bacterial hosts or other host microorganisms or plant or animal cell hosts are not identified; and a method of producing or isolating a natural or synthetic gene encompassing the aforementioned sets of the sequence (Gly-Ala-Gly-Ala-Gly-Ser) is not described.

SUMMARY OF THE INVENTION

One aspect of this invention relates to a method of preparing double-stranded DNA fragments which code totally for a repeating amino acid sequence for insertion into plasmid vectors, which process comprises the steps of:

(a) annealing a mixture comprising at least two complementary phosphorylated DNA oligodeoxynucleotides which partially overlap upon base pairing by heating said mixture and thereafter slowly cooling said mixture to allow formation of stable base pairs between complementary sequences oriented antiparallel with respect to their 5' to 3' polarity;

(b) treating said mixture of annealed DNA oligodeoxynucleotides with a ligase enzyme to covalently link adjacent oligodeoxynucleotides with the same 5' to 3' polarity into longer DNA segments;

(c) enzymatically attaching duplex oligodeoxynucleotide linker DNAs to said covalently linked DNA segments to provide double-stranded DNA fragments having linkers attached to the ends thereof, said DNA linkers including at least one restriction enzyme recognition site which is unique to the linkers, which is not found within the repeating oligodeoxynucleotide sequences of said double-stranded DNA fragments and which occurs not more than once within the sequences of some plasmid vector, said linker DNAs having non-equivalent single-stranded chain ends, and said linkers also adapted to maintain the genetic code reading frame and to maintain the repeating amino acid sequence of one or more of said DNA segments when attached enzymatically in tandem to said plasmid vector.
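The three conditions placed on the linker recognition site in step (c) can be illustrated with a minimal sketch. The function names, the short linker and vector strings, and the choice of GGATCC (the BamHI recognition sequence) are assumptions introduced only for illustration; they are not sequences disclosed in this specification.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def count_site(seq: str, site: str) -> int:
    """Count occurrences of a recognition site on either strand of seq."""
    targets = {site, reverse_complement(site)}
    return sum(
        1
        for i in range(len(seq) - len(site) + 1)
        if seq[i:i + len(site)] in targets
    )


def linker_site_is_usable(site: str, linker: str, repeat_dna: str, vector: str) -> bool:
    """Apply the three site conditions of step (c) to one candidate site."""
    return (
        count_site(linker, site) >= 1          # present in the linker
        and count_site(repeat_dna, site) == 0  # absent from the repeating sequence
        and count_site(vector, site) <= 1      # not more than once in the vector
    )


# Hypothetical inputs: a repeat coding for (Gly-Ala-Gly-Ala-Gly-Ser) and toy
# linker/vector strings standing in for real sequences.
REPEAT = "GGTGCTGGTGCTGGTTCT" * 3
usable = linker_site_is_usable("GGATCC", "CGGGATCCCG", REPEAT, "AAGGATCCTT")
```

A real check would run against the full vector sequence and would also confirm that the linker ends are non-equivalent, which this sketch does not attempt.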

Another aspect of this invention relates to double-stranded DNA sequences prepared by the process of this invention.

A preferred embodiment of the process of this invention further comprises cleaving said linkers with a restriction enzyme so as to eliminate multimeric forms of linker ends on said DNA fragments or oligomerized forms of said DNA fragments. In another preferred embodiment of the process of this invention the covalently linked base paired complementary DNA sequences are further treated with a DNA polymerase prior to attachment of linkers to totally or partially remove nicks or gaps in the base-paired synthetic DNA sequences. In yet another preferred embodiment of the process of this invention steps (b) and (c) are conducted simultaneously. Still another preferred embodiment of the process of this invention further comprises cooling the oligodeoxynucleotide mixture in step (a) of the present invention at a rate and to an extent sufficient to allow formation of hybridized oligodeoxynucleotide strands base paired to provide the maximum amount of overlap between said oligodeoxynucleotides.

In still another preferred embodiment the process of this invention further comprises:

(d) cooling a mixture comprising one or more of the double-stranded DNA fragments having linkers attached thereto as in step (c), which mixture optionally contains other double-stranded DNA fragments which have compatible termini for ligation and which code for repeating amino acid sequences, to a temperature sufficiently low to allow ligation of said double-stranded DNA sequences into longer double-stranded DNA fragments; and

(e) treating said cooled mixture of double-stranded DNA fragments with a ligase enzyme to covalently link said double-stranded DNA fragments to form linked double-stranded DNA fragments which code for contiguous repeats of one or more amino acid sequences.

Still another aspect of this invention relates to a method of forming a recombinant plasmid comprising a plasmid vector and one or more double-stranded DNA fragments as described above, said method comprising the steps of:

(a) cleaving a plasmid vector at a predetermined restriction site; and

(b) enzymatically attaching one or more double-stranded DNA sequences at said site so as to maintain the genetic code reading frame of said DNA fragments relative to translation initiation DNA sequences in said plasmid vector and to maintain the repeating amino acid sequence through and between any joined DNA fragments, such that said sequences are under the control of a regulatable gene promoter sequence in said plasmid vector whereby said DNA sequence is expressible in said plasmid to form polypeptides composed of a known amino acid sequence when cloned into a microbial organism.
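The requirement that the reading frame and the repeating amino acid sequence be maintained through joined fragments can be checked computationally. The following is a minimal sketch under stated assumptions: the codon table is deliberately partial (only the codons used here), the function names are hypothetical, and the example repeat (Gly-Ala-Gly-Ala-Gly-Ser, written GAGAGS in one-letter code) is chosen for illustration.

```python
# Partial codon table: only the codons needed for this illustration.
CODONS = {
    "ATG": "M",
    "GGT": "G", "GGC": "G",
    "GCT": "A", "GCA": "A", "GCG": "A",
    "TCT": "S", "AGC": "S",
}


def translate(dna: str) -> str:
    """Translate in frame from the first base; unknown codons become 'X'."""
    if len(dna) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return "".join(CODONS.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3))


def maintains_repeat(insert_dna: str, repeat_peptide: str) -> bool:
    """True if the insert is in frame and codes only for whole copies of the repeat."""
    if len(insert_dna) == 0 or len(insert_dna) % 3 != 0:
        return False
    protein = translate(insert_dna)
    copies, remainder = divmod(len(protein), len(repeat_peptide))
    return remainder == 0 and protein == repeat_peptide * copies


# Two joined copies of a hypothetical 18-base repeat unit.
joined = "GGTGCTGGTGCTGGTTCT" * 2
in_frame = maintains_repeat(joined, "GAGAGS")
```

Deleting or adding a single base at a junction makes the length check fail, which is the frame-shift case the attachment step is meant to avoid.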

The invention also relates to recombinant plasmids and recombinant microorganisms formed by the process of this invention. The invention further relates to any of the polypeptide products of the recombinant processes of this invention.

BRIEF DESCRIPTION OF THE DRAWING

Figure 1 is a physical map of the expression vector PAV1.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

One aspect of this invention relates to a process for forming double-stranded DNA fragments which code for a desired repeating amino acid sequence with linker DNA ends which may be inserted into a suitable plasmid vector. As part of the first step of the process at least two synthetic oligodeoxynucleotides which can function as coding or anticoding strands for a desired amino acid sequence are prepared. Oligodeoxynucleotides are polymeric DNA sequences which are linear chains of deoxynucleotides covalently linked through a phosphodiester bond between the C5' and C3' atoms of adjacent deoxyribose sugar moieties. The synthetic method for preparing such oligodeoxynucleotide sequences may vary widely. For example, they can each be chemically synthesized by any one of several available solution or solid phase techniques. See M.H. Caruthers et al., Genetic Engineering, Volume 4 (J. Setlow and A. Hollaender, eds.; Plenum Press, 1982) for a review of the preferred solid phase synthesis technology based on phosphoramidite chemistries as originally disclosed in S. L. Beaucage and M. Caruthers, Tetrahedron Letters 22, 1859-1862 (1981).

The nature of the synthetic oligodeoxynucleotides prepared for use in the practice of this invention is critical. The prepared oligodeoxynucleotides must consist of at least two oligodeoxynucleotides which are capable of base pairing and forming partially double-stranded DNA. Accordingly, at least one of the selected or prepared synthetic oligodeoxynucleotides must be a circularly permuted sequence of an oligodeoxynucleotide which is perfectly complementary to another of the selected and prepared oligodeoxynucleotides using the base pairing rules of guanine with cytosine and adenine with thymine well known in nucleic acid biochemistry and taking into account the antiparallel polarity of the two complementary synthetic oligodeoxynucleotide strands. Thus, for example, if one synthetic oligodeoxynucleotide is represented by the sequence:

5'-a-b-c-d-e.....f-g-h-3'

where a, b, c, d, e, f, g and h each stand for one of the four purine or pyrimidine nucleotides, the other synthetic oligodeoxynucleotide might appear as

3'-d'-e'...f'-g'-h'-a'-b'-c'-5'

with d and d', e and e' and the like, representing the appropriate paired bases. The choice of a circularly permuted sequence for at least one of the synthetic oligodeoxynucleotides is restricted to those sequences which leave unequal numbers of paired bases and unpaired bases in either strand when the two synthetic oligodeoxynucleotides are annealed to one another in later steps of the method. It is further required that the number of unpaired bases following annealing not be zero. Thus, in the above example, the number of bases represented by the sequence 5'-d e....f g h-3' is not equal to the number of bases represented by the sequence 5'-a b c-3'. The DNA sequences of the oligodeoxynucleotides are selected according to these rules in order to control the polar orientation of the unpaired bases following strand annealing (that is, the unpaired bases will either be on the 5' or 3' end of each synthetic oligodeoxynucleotide following the annealing step in the current process) or to prevent hybridizations between at least two complementary oligodeoxynucleotide strands which might leave no unpaired bases and thereby prevent efficient oligomerization subsequent to or during the annealing step of the present invention. The choice of nucleotide sequence in the synthetic oligodeoxynucleotides is governed by the order of amino acids in the basic repeating unit for which directly repeating oligomers are desired in product polypeptides. One or more of the synthetic oligodeoxynucleotides can then be selected to code for the desired basic repeating peptide unit or a circularly permuted version of this coding sequence. Coding sequences are chosen on the basis of the genetic code and preferred codon usage in the host microorganism in which the synthetic gene described in this invention is to be expressed. More than one coding sequence may be chosen in situations where codon preference is unknown or ambiguous for optimum codon usage in the chosen host microorganism.
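The construction of the circularly permuted partner strand, and the two rules on the number of unpaired bases, can be sketched as follows. This is an illustration only: the function names, the offset parameter k (the number of bases corresponding to a-b-c in the nomenclature above) and the 8-base example sequence are assumptions and are not taken from this specification.

```python
def complement(seq: str) -> str:
    """Base-by-base complement, read in the opposite polarity to seq."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in seq)


def permuted_partner(top: str, k: int) -> str:
    """Return the partner strand, written 5' to 3'.

    The perfect complement of top, read 3' to 5', is a'b'c'd'e'...h'.
    Moving its first k bases to the far end gives d'e'...h'a'b'c',
    the circularly permuted form shown in the text.
    """
    comp_3_to_5 = complement(top)
    rotated_3_to_5 = comp_3_to_5[k:] + comp_3_to_5[:k]
    return rotated_3_to_5[::-1]  # reverse to conventional 5'-to-3' notation


def offset_is_allowed(length: int, k: int) -> bool:
    """Unpaired count must be non-zero and must differ from the paired count."""
    return 0 < k < length and k != length - k


top = "ATGCGTAC"              # hypothetical 8-base strand, a-b-c = "ATG"
partner = permuted_partner(top, 3)
```

With these inputs the 3' five bases of each strand pair with one another, and each strand keeps a three-base 5' overhang that is complementary to the overhang of the other strand, which is the property that permits further oligomerization.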
The length of the selected or prepared oligodeoxynucleotides may vary widely. The minimum length of the oligodeoxynucleotide for use in the process of this invention is a number of covalently joined nucleotides which is equal to three times the number of amino acids in the basic repeating peptide unit. The maximum length is not critical and the employment of synthetic oligodeoxynucleotides with integral multiples of this number of bases is also acceptable and is preferred if the number of amino acids in the basic repeating peptide unit is less than about 4. The synthetic oligodeoxynucleotides will generally terminate in a 5' hydroxyl chemical group and will require phosphorylation of the 5' chain end if this moiety does not already bear a phosphate chemical group. Phosphorylation of a 5' hydroxyl chain end can be conveniently done with any enzyme capable of transferring a phosphate chemical group preferably from adenosine triphosphate (ATP) to the 5' hydroxyl site. The preferred enzyme is T4 polynucleotide kinase (E.C. 2.7.1.78) but it is recognized that other phosphorylation enzymes such as phosphatases under the appropriate substrate conditions are also acceptable for the phosphorylation reaction. As part of the initial step of this process, the phosphorylated oligodeoxynucleotides are annealed to form complementary oligomeric forms. These complementary synthetic oligodeoxynucleotides may be annealed by heating a mixture of two or more oligodeoxynucleotides, at least two of which are complementary, in an appropriate buffered salt solution. The final temperature to which the mixture is heated may vary widely, but is preferably above the temperature at which the synthetic oligodeoxynucleotides can stably form hydrogen bonds and base pair and below 100°C. The heated mixture of synthetic oligodeoxynucleotides is slowly cooled to a temperature allowing stable base pairs to form between complementary strands.
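For example, using the nomenclature introduced above, two of the possible resultant DNA hybrid duplex sequences can be represented as

(1) 5'-a-b-c-d-e...f-g-h-3'
          3'-d'-e'...f'-g'-h'-a'-b'-c'-5'

or

(2)                  5'-a-b-c-d-e...f-g-h-3'
    3'-d'-e'...f'-g'-h'-a'-b'-c'-5'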

Other structures may be possible if there is internal sequence degeneracy in the synthetic oligodeoxynucleotides; these structures will possibly have no or at worst a few base pair mismatches. The number of structures formed will be a consequence of the smallest unique sequence contained within the oligodeoxynucleotides and the number of synthetic oligodeoxynucleotides in the reaction mixture. For example, if the sequence 5'-a-b-c-d-e...f-g-h-3' has no internal degeneracy and only two synthetic oligodeoxynucleotides are used, the two base pairing structures shown above will be the only product molecules of thermal annealing between the two synthetic oligodeoxynucleotides. The preferred final temperature is chosen so as to allow only hybrid structures with a defined single stranded polarity (that is, only 5' or only 3' base overhangs) to be formed, but other temperatures below this value can be used as long as the cooling step is sufficiently slow so as to allow complete formation of only the most stable base paired hybrid structures. Oligomeric forms of the overlapping synthetic DNA strands with individual synthetic oligodeoxynucleotides base paired in staggered fashion to two of the complementary and overlapping synthetic oligodeoxynucleotides will subsequently be formed by this method when the sample temperature is lowered further. Lowering of the sample temperature at this stage of the procedure will allow the staggered ends of the overlapping base paired synthetic oligodeoxynucleotides to further anneal and stably form base pairs with other combinations of overlapping base paired synthetic oligodeoxynucleotides. The length of hybrid duplex DNA segments resulting from annealing may vary widely.
In the preferred embodiments of the invention the length of such fragments is generally substantial since the two synthetic oligodeoxynucleotides are selected such that they cannot hybridize in perfect register and thus they do not prematurely terminate duplex chain elongation before substantial lengths have been obtained. The oligomeric DNA strands produced during the annealing step will code for contiguous repeats of the basic repeating peptide unit chosen with the ends of the oligomeric DNA strands coding for some portion of the basic repeating peptide unit. By judicious choice of the annealing temperature and base sequence in the synthetic oligodeoxynucleotides, the oligomeric DNA strands will both have a 5' or both have a 3' overhanging end with the two ends being self-complementary. For example, using the nomenclature introduced above, an oligomeric duplex DNA produced by the annealing step of this process might appear as

(3) 5'-a-b-c-d-e...f-g-h a-b-c-d-e...f-g-h-3'
          3'-d'e'...f'g'h'a'b'c'd'e'...f'g'h'a'b'c'-5'

The number of annealed complementary synthetic oligodeoxynucleotides will vary between each set of stably base paired oligodeoxynucleotides and will form a distribution of sizes ranging from one to many repeating base paired and overlapping synthetic oligodeoxynucleotides per annealed set.
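The geometry of an oligomeric duplex such as structure (3) can be made concrete with a short sketch. It assumes, for illustration only, that n copies of each strand have annealed in the register that leaves 5' overhangs of k bases; the function names and the 8-base example strand are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def oligomer_duplex(top: str, k: int, n: int):
    """Model n annealed copies of a strand and its circularly permuted partner.

    Returns (top_strand, bottom_strand, paired_bases), both strands written
    5' to 3'. The partner equals the reverse complement of top rotated by k.
    """
    partner = reverse_complement(top[k:] + top[:k])
    top_strand = top * n
    bottom_strand = partner * n
    paired_bases = n * len(top) - k  # everything except the two 5' overhangs
    return top_strand, bottom_strand, paired_bases


top_strand, bottom_strand, paired = oligomer_duplex("ATGCGTAC", 3, 2)

# The paired region of the bottom strand is complementary to that of the top.
paired_region_matches = reverse_complement(bottom_strand[3:]) == top_strand[3:]

# The two 5' overhangs are complementary to each other, so sets can keep joining.
overhangs_match = reverse_complement(bottom_strand[:3]) == top_strand[:3]
```

The sketch treats each strand as continuous; in the annealed product before ligation the junctions between adjacent oligodeoxynucleotides are still unlinked, and n varies from set to set.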

The above process has been described in terms of using two or more complementary oligodeoxynucleotides. Certain embodiments of the method of this invention may require more than two oligodeoxynucleotides to be annealed in preparation of oligomeric DNA fragments. Practice of such an embodiment may be preferred when optimal codon usage for a given host cell for peptide or protein expression is unknown or ambiguous or when stringent translational control related to the amounts of transfer RNA (tRNA) molecules in a bacterial cell limits the rate or extent of protein synthesis. Combinations of more than one pair of complementary oligodeoxynucleotides can be annealed into oligomeric DNA fragments with complete base pairing according to the adenine-thymine and guanine-cytosine base pairing requirement if the unpaired bases between any given pair of annealed complementary oligodeoxynucleotides are equivalent in sequence and polarity to the unpaired bases in any or all other pairs of complementary synthetic oligodeoxynucleotides in the reaction mixture. For example, using the nomenclature introduced earlier, two pairs of synthetic oligodeoxynucleotides which could be oligomerized as described above, and in which the unpaired bases a b c and a' b' c' are equivalent in sequence and polarity, are:

(4) 5'-a-b-c-d-e...f-g-h-3'
          3'-d'e'...f'g'h'a'b'c'-5'

and

(5) 5'-a-b-c-j-k...f-g-h-3'
          3'-j'k'...f'g'h'a'b'c'-5'
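The condition that several pairs of oligodeoxynucleotides share equivalent unpaired bases can be sketched as a simple comparison of overhangs. The representation of each pair as a top strand plus an overhang length k, the function names, and the example sequences (two codon variants for the repeat Gly-Ala-Gly-Ala-Gly-Ser) are assumptions made for illustration; all pairs are taken to anneal in the register that leaves 5' overhangs.

```python
def five_prime_overhang(top: str, k: int) -> str:
    """Unpaired bases (a-b-c in the text) left at the 5' end of the top strand."""
    return top[:k]


def pairs_are_compatible(pairs) -> bool:
    """True if every (top_strand, k) pair leaves the same 5' overhang.

    The partner strand's overhang is the reverse complement of the top
    strand's overhang, so comparing the top strands is sufficient here.
    """
    overhangs = {five_prime_overhang(top, k) for top, k in pairs}
    return len(overhangs) == 1


# Two hypothetical coding strands for the same peptide with different codons
# downstream of a shared four-base overhang "GGTG".
pair_4 = ("GGTGCTGGTGCTGGTTCT", 4)
pair_5 = ("GGTGCAGGCGCGGGCAGC", 4)
compatible = pairs_are_compatible([pair_4, pair_5])
```

A strand whose first bases differ, or a pair annealed with a different overhang length, would fail this comparison and could not co-oligomerize with complete base pairing.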

It is also possible to use an odd number of synthetic oligodeoxynucleotides equal to or greater than three within the context of the present invention if all except one such synthetic oligodeoxynucleotide have a complementary oligodeoxynucleotide present as a circularly permuted sequence, and the remaining unpaired synthetic deoxynucleotide has a nucleotide sequence that varies at most at only a few positions from some other synthetic oligodeoxynucleotide in the reaction mixture. The base pair mismatches in this embodiment of the invention must still allow stable formation of duplex molecules through hydrogen bond formation upon annealing at an appropriate temperature. That is, the base pair mismatches in the resulting duplex molecules must be so few in number as to not destabilize duplex formation at the chosen annealing temperature.

In the second step of this embodiment of the invention, the oligomerized DNA strands are treated with a ligase enzyme to covalently link the base paired synthetic oligodeoxynucleotides which are oriented parallel with respect to their 5' to 3' polarity. The ligase enzyme used can be any of several enzymes capable of forming phosphodiester bonds between two DNA strands respectively terminating in 3' hydroxyl and 5' phosphate chemical groups. The preferred enzymes include T4 DNA ligase (E.C. 6.5.1.1) and E. coli DNA ligase (NAD+, E.C. 6.5.1.2). These enzymes would be employed with appropriate cofactors and substrate or substrates in concentrations appropriate for good enzymatic activity using procedures known in the art. The cofactors for any ligating enzyme and most particularly for T4 DNA ligase can include the enzyme T4 RNA ligase (E.C. 6.5.1.3), which is known to stimulate formation of linear ligated DNA products in the presence of T4 DNA ligase (cf. A. Sugino et al., Journal of Biological Chemistry 252, 3987-3994 (1977)), or can include any of several nonspecific polymers such as polyethylene glycol, spermidine, Ficoll, or bovine serum albumin (cf. B.H. Pheiffer and S.B. Zimmerman, Nucleic Acids Research 11, 7853-7871 (1983)).

The double-stranded DNA molecules generated following enzymatic treatment with a DNA ligase enzyme may have one or more nicks or gaps in either or both of the DNA strands. These nicks or gaps can arise by several mechanisms such as incomplete deprotection of 5' or 3' chain ends of the synthetic oligodeoxynucleotides during chemical synthesis of said oligodeoxynucleotides, improper base pairing during the annealing step of the claimed process, contamination of the synthetic oligodeoxynucleotides with chains one or several bases shorter due to premature chain termination or elongation failure during chemical synthesis and subsequent inadequate purification of the desired synthetic oligodeoxynucleotide, incomplete chain ligation, incomplete addition of phosphate chemical groups to 5' hydroxyl chain ends, or nonspecific degradation of the synthetic oligodeoxynucleotides by a contaminating nuclease prior to or subsequent to the ligation step of the claimed process. Many if not all of these problems can be substantially reduced or eliminated by treatment of the ligated double-stranded synthetic DNA fragments with a DNA polymerase. The DNA polymerase will extend extant DNA chains in a 5' to 3' direction by the process of nick translation (cf. R. G. Kelly et al., Journal of Biological Chemistry 245, 39-45 (1970)). The various DNA polymerases or fragments thereof known in the art are useful for this step in the claimed process, with the preferred enzymes being Escherichia coli DNA polymerase I (E.C. 2.7.7.7) or any proteolytic fragment of E. coli DNA polymerase I which retains the polymerase activity of the holoenzyme.

Following treatment with a DNA polymerase, the synthetic double-stranded DNA fragments prepared in certain embodiments of the invention are fractionated to isolate only those fragments of greater than some minimum size for use in subsequent process steps. This purification procedure may also be necessary for any natural genes, gene fragments or DNA copies of messenger RNAs for specific genes or gene fragments which are of utility in certain embodiments of the process of this invention as described below. The method of purification can be chosen from a variety of biochemical techniques including size exclusion chromatography, ion exchange chromatography and affinity chromatography. The current preferred method is size exclusion chromatography over a suitable separation matrix; many such matrices are commercially available.

Any natural or synthetic double-stranded DNA fragment selected or prepared for the process of this invention is generally of a length which is sufficient to code for a polypeptide which has desirable polymeric properties. The selection of a particular polymeric property such as strength, elasticity, thermoplasticity, binding or coordination to other molecules, and the like, will determine more or less the proper relationship between length of polypeptide product and optimized structure-function activity of the polypeptide. However, a general attribute of polymers is that increasing chain length usually enhances the physical property being optimized in the polymer to some degree. It is therefore generally desirable to maximize the size of the double-stranded DNA fragment or fragments to be used in this invention within the exigencies of the molecular cloning aspects of this invention. Cloning of DNA fragments above some minimum size is also convenient for purposes of subsequently identifying bacterial hosts containing plasmid vectors which in turn contain the DNA fragments, and for optimizing the ligation reaction of the natural or synthetic DNA fragment or fragments into the plasmid vector as described below. In the preferred embodiments of this invention, the length of the fragments is at least about 75 base pairs; in the particularly preferred embodiments of the invention the fragments are at least about 100 base pairs in length.
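As a rough illustration of these length thresholds, the following Python sketch computes how many copies of a repeating duplex unit are needed to reach a given fragment length. It is only arithmetic under simplifying assumptions: unpaired end bases and attached linkers are ignored, and the function name `min_repeats` is introduced here for illustration.

```python
import math


def min_repeats(unit_bp: int, min_fragment_bp: int) -> int:
    """Smallest number of repeating units whose combined length
    reaches min_fragment_bp (overhangs and linkers are ignored)."""
    return math.ceil(min_fragment_bp / unit_bp)


# Unit lengths of 9, 15 and 18 bases correspond to repeating
# oligodeoxynucleotides of the sizes used in this specification.
for unit_bp in (9, 15, 18):
    for threshold in (75, 100):
        n = min_repeats(unit_bp, threshold)
        print(f"{unit_bp}-base unit: {n} repeats ({n * unit_bp} bp, "
              f"{n * unit_bp // 3} codons) for at least {threshold} bp")
```

For a 15-base unit, five repeats reach the 75 base pair threshold and seven repeats exceed the 100 base pair threshold.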

In the third step of the primary process of this invention, the ends of double-stranded synthetic or naturally occurring DNA fragments are modified prior to cloning. The ends of said DNA fragments are modified by attachment of a DNA linker with or without prior enzymatic treatment of the said DNA fragment to render the ends of said DNA fragment blunt-ended or flush. The ends of the synthesized double-stranded DNA fragments are made flush enzymatically using conventional techniques such as those described in T. Maniatis et al., Molecular Cloning (Cold Spring Harbor), pp. 113-114, and other like references. DNA linkers as defined for purposes of describing this invention are double-stranded oligodeoxynucleotides which contain at least one restriction enzyme recognition sequence or contain an end sequence for which any unpaired bases are equivalent to those found at an end of duplex DNA following the action of a specific restriction enzyme. The term adaptors within the descriptive text of this specification is equivalent to the term DNA linkers. The attachment of DNA linkers to the double-stranded DNA fragments is carried out enzymatically with a suitable ligase enzyme, preferably NAD-dependent E. coli DNA ligase or T4 DNA ligase. The resulting double-stranded DNA fragments with linkers attached may subsequently require exhaustive digestion with a restriction enzyme which has a recognition sequence within oligomeric forms of the linker DNA so as to limit the number of linker DNA molecules attached to any one end of a double-stranded DNA fragment to one linker molecule per end. The type of linkers selected for use with any particular double-stranded DNA fragment and/or plasmid vector as described below is critical to the process of this invention.
The selected linkers must contain at their ends or internally at least one restriction enzyme recognition site which is not present within the repeating oligodeoxynucleotide sequence and which is preferably present only once within the plasmid vector into which the DNA fragment will be inserted. For example, if the repeating oligodeoxynucleotide sequences used to prepare a synthetic DNA fragment of the type described in this invention are

(6) 5'-GGT GTT GGT GTT CCG-3'

3'-GGC CCA CAA CCA CAA-5'

which code for repeats of the amino acid sequence Glycine-Valine-Glycine-Valine-Proline (Gly-Val-Gly-Val-Pro in three letter amino acid code) when this double-stranded deoxynucleotide is oligomerized with itself, then the linker DNA molecule:

(7) 5'-GGG CCC CCG-3'

3'-GGC CCC GGG-5'

can be attached to oligomeric forms of the above pentadecanucleotide and subsequently cleaved with the restriction enzyme ApaI (recognition sequence GGG CCC), which will cut within the linker DNA molecules but not within the oligomeric forms of the pentadecanucleotide. These DNA fragments can then be inserted into a suitable plasmid vector with a unique ApaI restriction enzyme recognition site.
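The absence of the ApaI recognition sequence from the oligomerized pentadecanucleotide, and its presence in the attached linker, can be confirmed with a simple string scan. The Python below is a minimal sketch: it assumes that the top strands of (6) and (7) join head to tail, it scans the top strand only (sufficient here because GGGCCC reads the same on both strands), and the function names are illustrative.

```python
REPEAT_UNIT = "GGTGTTGGTGTTCCG"  # top strand of (6), Gly-Val-Gly-Val-Pro
LINKER = "GGGCCCCCG"             # top strand of (7)
APAI_SITE = "GGGCCC"             # ApaI recognition sequence


def oligomerize(unit: str, n: int) -> str:
    """Top strand of n head-to-tail copies of a repeating unit."""
    return unit * n


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of site in seq."""
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


oligomer = oligomerize(REPEAT_UNIT, 6)
construct = LINKER + oligomer + LINKER

print(find_sites(oligomer, APAI_SITE))   # sites inside the repeats
print(find_sites(construct, APAI_SITE))  # sites once linkers are attached
```

The scan finds no site within six repeats of the pentadecanucleotide and exactly one site in each flanking linker, so cleavage would be confined to the linker DNA in this example.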

DNA linkers used in the practice of the process of this invention must have additional unique properties. For example, the nucleotide sequence contained within any DNA linker must allow for placement of the attached DNA fragment or fragments in the proper reading frame (as defined by the genetic code) relative to any controlling genetic element, such as a translation initiation DNA sequence found in the plasmid vector into which the DNA fragment or fragments will be inserted. It is also preferable that the repeating amino acid sequence coded for by any DNA fragment attached to a DNA linker be continuous into and within the amino acid sequence coded for by the DNA linker. For example, if the DNA fragment to which DNA linkers are to be attached is oligomeric forms of the sequence

(8) 5'-GTT GGT GTT CCG GGT-3'

3'-CCA CAA CCA CAA GGC-5'

which in oligomeric form codes for repeats of the amino acid sequence Val-Gly-Val-Pro-Gly, then an adequate DNA linker within the context of this invention would be

(9) 5'-GTT GGG GTG CCG GGT-3'

3'-CCA CAA CCC CAC GGC-5'

In this representative example, the linker DNA contains within it a unique recognition sequence for the restriction enzyme BanI (GGTGCC) which is not found within the DNA fragment described above and is preferably also not found in the plasmid vector into which this DNA fragment is to be inserted. This linker DNA further ensures that the reading frame of the DNA fragment will be maintained so as to code for repeats of the amino acid sequence Val-Gly-Val-Pro-Gly, and further continues the reading frame of the coding sequence into and through the linker DNA. That is, the linker DNA also codes for the amino acid sequence Val-Gly-Val-Pro-Gly.
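The reading-frame property of linker (9) can be verified by translating a fragment-linker-fragment construct. The Python below is a sketch under stated assumptions: the codon table is deliberately partial (only the five codons that occur in (8) and (9), with their standard genetic code assignments), the top strands are assumed to join head to tail, and the function name `translate` is illustrative.

```python
# Partial codon table: only the codons that occur in sequences (8) and (9).
CODONS = {"GTT": "Val", "GTG": "Val", "GGT": "Gly", "GGG": "Gly", "CCG": "Pro"}

FRAGMENT_UNIT = "GTTGGTGTTCCGGGT"  # top strand of (8)
LINKER = "GTTGGGGTGCCGGGT"         # top strand of (9)
BANI_SITE = "GGTGCC"               # BanI recognition sequence named in the text


def translate(dna: str) -> list[str]:
    """Translate a coding strand, read in frame from its first base."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


construct = FRAGMENT_UNIT * 2 + LINKER + FRAGMENT_UNIT * 2
peptide = translate(construct)

print("-".join(peptide))
print("BanI sites in construct:", construct.count(BANI_SITE))
print("BanI sites in repeats alone:", (FRAGMENT_UNIT * 4).count(BANI_SITE))
```

The translated construct is an uninterrupted run of Val-Gly-Val-Pro-Gly repeats, and the GGTGCC sequence occurs once, within the linker.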

It is additionally preferred that the single-stranded ends of duplex linker DNAs have self-complementary but not equivalent sequences (that is, the end sequences do not have a two-fold rotational axis of symmetry) in order to ensure that they will only attach to the oligomerized DNA segments in the proper orientation for maintaining the genetic coding capacity for the desired repeating amino acid sequence. For example, if the desired repeating amino acid sequence is coded for by the oligomerized nucleotide sequence shown as (3) above and the chosen DNA linker is

(10) 5'-a-b-c-j-k-l-m-n ...o-p-q-3'

3'-j'k'l'm'n'...o'p'q'a'b'c'-5',

then it is required that 5'-a-b-c not be equivalent to 5'-c'-b'-a'. This avoids DNA linker sequences which might attach with the wrong polarity to the oligomerized DNA and code for an undesirable amino acid sequence from an open reading frame on the anticoding strand. An example of such an anticoding strand is the sequence 5'-c'-b'-a'-q'-p'-o'...n'-m'-l'-k'-j'-3' shown in (10).
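The requirement that 5'-a-b-c not be equivalent to 5'-c'-b'-a' amounts to requiring that the unpaired end sequence differ from its own reverse complement. The Python below is a minimal sketch of that test; the function names are illustrative, and the example overhangs are chosen here for demonstration (AATT and GGCC are symmetric, GTGC and the three-base CCG are not).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_twofold_symmetry(overhang: str) -> bool:
    """True if 5'-a-b-c equals 5'-c'-b'-a', i.e. the unpaired end can
    base pair with a second copy of itself in the opposite polarity."""
    return overhang == revcomp(overhang)


for end in ("AATT", "GGCC", "GTGC", "CCG"):
    status = "symmetric" if has_twofold_symmetry(end) else "non-equivalent"
    print(f"5'-{end}-3' is {status}")
```

An overhang with an odd number of bases can never equal its own reverse complement, because its central base would have to pair with itself.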

Two DNA fragments coding for different amino acid sequences may be joined to the same linker DNA in some aspects of this invention. In this instance, the linker DNA is constrained to maintain the amino acid sequence of at least one of the sequences encoded by one of the two joined DNA fragments, such that the fragment-linker-fragment DNA either encodes a polypeptide chain with the repeating amino acid sequence encoded by each fragment covalently joined to a contiguous repeating amino acid sequence encoded by the other fragment, or encodes a polypeptide chain that covalently joins and overlaps the two sequences at an amino acid residue or residues present in both repeating amino acid sequences. An example of DNA fragments within the scope of this invention which are joined by a linker DNA that provides overlap of the repeating amino acid sequences encoded respectively by each DNA fragment would be the DNA fragments formed as oligomers of the double-stranded deoxynucleotides

(11) 5'-CCG CCG GGT CCG CCG GGT-3'

3'-CCA GGC GGC CCA GGC GGC-5'

Pro Pro Gly Pro Pro Gly

and

(12) 5'-GTT GGT GTT CCG GGT-3'

3'-CCA CAA CCA CAA GGC-5'

Val Gly Val Pro Gly

which in turn can be joined by the DNA linker

(13) 5'-GTT GGG GTG CCG GGT-3'

3'-CCA CAA CCC CAC GGC-5'

Val Gly Val Pro Gly

In this representative example, the linker DNA contains a recognition sequence (GGTGCC) for the restriction enzyme Ban I which is not present in either oligomeric form of the double-stranded deoxynucleotides. The joined DNA fragments will code for a polypeptide which in part can be represented as the amino acid sequences (Pro-Pro-Gly)_n-(Val-Gly-Val-Pro-Gly)_m or (Val-Gly-Val-Pro-Gly)_m-(Pro-Pro-Gly)_n, where the two repeating amino acid sequences overlap at a common Proline-Glycine dipeptide and the recognition sequence for Ban I has been introduced between the two original DNA fragments.

Several classes of linker DNA are preferred for use in the process of this invention. The DNA fragments to which DNA linkers are to be attached will have either blunt-ended or cohesive termini, with the preferred class of DNA fragment ends being cohesive termini. Linker DNA for blunt-ended DNA fragments will preferably have at least one blunt end and will lead to DNA fragments with identical termini once the linker DNA is attached to the DNA fragments and subsequently cleaved by the appropriate restriction enzyme recognizing some unique site within the linker DNA. It is particularly preferred in this instance to use linker DNA with at least two non-overlapping restriction enzyme recognition sites that result in cohesive termini when any of the appropriate restriction enzymes cleave the linker DNA, and for which the cohesive termini produced by any two appropriate restriction enzymes following cleavage are complementary for base pairing and non-equivalent. It is also particularly preferred to use linker DNA in the process of this invention which contains restriction enzyme recognition sites that are non-palindromic and are recognizable and cleavable by restriction enzymes with non-palindromic or multiple recognition sites.

Illustrative examples of such enzymes are AccI, AflIII, AhaII, AvaI, BanI, BanII, BglI, HaeII, HgiAI, HincII, NspBII, XhoII, BbvI, BsmI, FokI, GsuI, HgaI, HphI, MboII, MnlI, SfaNI, SfiI, and Tth111II. The non-equivalence of the DNA fragment termini following attachment and cleavage of such DNA linkers provides that two or more of such DNA fragments, or other DNA fragments with cohesive termini compatible for base pairing to one or both termini of such DNA fragments, can only be attached unidirectionally to one another. The non-equivalence of the cohesive termini on such DNA fragments also ensures that the repeating amino acid sequences encoded by covalently joined aggregates of two or more DNA fragments will be of the type and variety desired. That is, each of the DNA fragments and linker DNAs in such joined aggregates will be certain to express a polypeptide with the desired contiguous repeating amino acid sequence or sequences from the appropriate DNA coding strand, and no DNA fragment or linker DNAs will be joined in the larger aggregate with the wrong polarity. Linker DNAs containing a number of restriction enzyme recognition sites can be used. Where a DNA linker contains at least one non-overlapping restriction enzyme recognition site, it is particularly preferred to use linker DNAs with two or fewer such recognition sites to minimize the size of the linker DNAs which are to be made synthetically, although some embodiments of the current invention may use linker DNA with more than two such recognition sites. DNA linkers with at least two restriction enzyme recognition sites are cleaved sequentially with the appropriate restriction enzymes following attachment to DNA fragments to yield DNA fragments with non-equivalent cohesive termini.

Another class of embodiments which fall within the scope of the present invention are those where DNA linkers are attached to a plasmid expression vector to provide new insertion sites for double-stranded DNA fragments prepared and/or selected by the methods of the present invention. Such linkers can be attached to linearized plasmid DNA by techniques familiar to those practicing the art of molecular cloning and used as linkage sites for natural or synthetic DNA fragments bearing complementary DNA linker sequences.

It is also possible to combine the ligation reactions for any synthetic oligodeoxynucleotides being oligomerized and for linker DNAs being attached to oligomerized synthetic oligodeoxynucleotides in a single step within certain embodiments of the present invention. This approach offers several advantages including control of the size distribution of the synthetic genes by modulating the ratio of duplex oligodeoxynucleotides to linker DNAs. Another advantage is that any DNA chain which becomes circular during the oligodeoxynucleotide ligation step can subsequently be linearized to provide clonable DNA fragments as long as at least one linker DNA has been incorporated into this DNA fragment during enzymatic ligation.
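The dependence of fragment size on the ratio of duplex oligodeoxynucleotides to linker DNAs can be pictured with an idealized model. The Python below is a hypothetical sketch, not a description of measured behavior: it assumes that every position in a ligated chain is occupied by a linker with probability L/(U+L), independently of its neighbors, that duplex units and linkers ligate with equal efficiency, and that digestion cleaves at every linker. Under those assumptions the number of repeat units between two linkers follows a geometric distribution.

```python
def mean_units_per_fragment(units: float, linkers: float) -> float:
    """Mean number of repeat units between consecutive linkers under the
    idealized random-incorporation model: (1 - q) / q with q = L / (U + L)."""
    return units / linkers


def fraction_at_least(units: float, linkers: float, k: int) -> float:
    """Fraction of fragments containing at least k repeat units,
    (1 - q) ** k, under the same idealized model."""
    return (units / (units + linkers)) ** k


# Hypothetical molar ratios of duplex units to linkers.
for ratio in (5, 10, 20):
    print(f"ratio {ratio}:1 -> mean {mean_units_per_fragment(ratio, 1):.0f} units, "
          f"{fraction_at_least(ratio, 1, 5):.2f} of fragments with 5 or more units")
```

In this model, raising the ratio of duplex oligodeoxynucleotides to linkers shifts the distribution toward longer synthetic genes; real ligation mixtures may deviate from it, for example through circularization or unequal ligation efficiencies.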

In the ligation of DNA linkers to double-stranded DNA sequences, multimeric linker species are often formed. As used herein, multimeric species are those in which more than one linker is attached to the end of an oligomeric DNA sequence. In this event, the double-stranded DNA sequences are preferably subjected to exhaustive digestion with an appropriate restriction enzyme which has a recognition sequence within oligomeric forms of linker DNA so as to limit the number of linker DNA molecules attached to any one end of a double-stranded DNA fragment to one linker per end.

At this juncture the double-stranded DNA sequences with linkers attached to the ends can be cloned directly into a suitable restriction enzyme recognition site or pair of sites in a suitable replicable cloning vehicle, preferably a plasmid vector. However, as noted above, there is a direct relationship between the length of the various double-stranded DNA fragments and the molecular weight of the polypeptide expressed from such fragments, and there is also a direct relationship between the molecular weight of the polypeptide expressed from such fragments and the degree of quality of the desirable physical properties of the polypeptide product. Therefore, in the preferred embodiments of the invention it is often desirable to further increase the length of the DNA fragments prior to insertion into a plasmid vector. This increase in length can be conveniently obtained by mixing and cooling the DNA fragments with attached DNA linkers to a temperature sufficiently low to allow ligation of the DNA fragments through their linker ends into longer double-stranded DNA sequences which code for higher molecular weight polypeptides, and then treating this mixture with a suitable ligase enzyme. The desirable temperature may vary widely, and in the preferred embodiments is above the freezing point of the mixture but sufficiently low to allow for maximum alignment of the linker ends of the double-stranded DNA fragments. In this preferred embodiment, the cooling step is also sufficiently slow so as to allow for the complete formation of the most stable aligned structures. After cooling, the mixture can be treated with a suitable ligase enzyme at this lower temperature, and optionally with a DNA polymerase, to covalently link the aligned double-stranded DNA fragments into the desired longer sequences.
Two or more of the fragments may be joined together into larger DNA fragments which each have at least about 75 base pairs using conventional ligation procedures, as for example those described in T. Maniatis et al., Molecular Cloning (Cold Spring Harbor, 1982), pp. 243-246, incorporated herein by reference. The preferred ligase enzyme for this step of the invention is T4 DNA ligase. These larger, joined DNA fragments contain one or more repeating oligodeoxynucleotide sequences which can code for either the same or distinct repeating amino acid sequences, and they are joined so as to continuously maintain the genetic code reading frame for at least one of the repeating amino acid sequences through and between DNA fragments. The symmetry and placement of the joined DNA fragments in the larger DNA fragment may vary, leading to polypeptides encoded by such larger DNA fragments which are either random or alternating block peptide copolymers. The larger DNA fragment will preferably have cohesive termini and most preferably cohesive termini which are non-equivalent.
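The relationship between DNA fragment length and polypeptide molecular weight can be made concrete with a small calculation. The Python below is an illustrative sketch: it uses standard average residue masses for glycine, proline and valine, counts only the repeating sequence itself (no initiating methionine, linker-encoded residues or other flanking sequence), and the function names are introduced here for illustration.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {"Gly": 57.0519, "Pro": 97.1167, "Val": 99.1326}
WATER = 18.0153  # added once for the free chain termini


def coding_length_bp(repeat: list[str], n: int) -> int:
    """Base pairs needed to encode n copies of the repeat (3 per residue)."""
    return 3 * len(repeat) * n


def polypeptide_mass(repeat: list[str], n: int) -> float:
    """Approximate molecular weight of n copies of the repeat."""
    return n * sum(RESIDUE_MASS[res] for res in repeat) + WATER


collagen_like = ["Gly", "Pro", "Pro"]
elastin_like = ["Gly", "Val", "Gly", "Val", "Pro"]

for name, repeat in (("(Gly-Pro-Pro)", collagen_like),
                     ("(Gly-Val-Gly-Val-Pro)", elastin_like)):
    for n in (10, 50, 100):
        print(f"{name}_{n}: {coding_length_bp(repeat, n)} bp, "
              f"about {polypeptide_mass(repeat, n):.0f} Da")
```

Both quantities scale linearly with the number of repeats, which is why joining fragments into larger DNA fragments raises the molecular weight of the encoded polypeptide proportionally.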

Examples of preferred cohesive termini found on each end of any of the aforesaid larger DNA fragments with complementary and equivalent sequences are:

(14) 5'-C ... GGG CC-3'

3'-CCGGG ... C-5'

and

(15) 5'-AATTC ... G-3'

3'-G ... CTTAA-5'

which can base pair with DNA termini left following cleavage with the restriction enzymes Apa I or Eco RI, respectively. Examples of preferred cohesive termini which are non-equivalent and non-palindromic include the following sequences, which result from cleavage of linker DNA with restriction enzymes which have multiple recognition sequences:

(16) Restriction enzyme Ban I:

5'-GTGCC ... G-3'

3'-G ... CCACG-5'

(17) Restriction enzyme Bsp 1286:

5'-C ... GTGCC-3'

3'-ACGGG ... C-5'
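The termini shown in (14) through (17) can be derived from a recognition sequence and the positions at which each strand is cut. The Python below is a sketch under stated assumptions: the cut positions are entered by hand from commonly published enzyme specificities (GGGCC^C, G^AATTC, G^GYRCC and GDGCH^C) and should be verified against a current enzyme reference before use, the sites GGTGCC and GTGCCC are single instances of the degenerate Ban I and Bsp 1286 sequences, and the class and function names are illustrative.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Cut:
    enzyme: str
    site: str        # recognition sequence as read on the top strand
    top_cut: int     # cut position on the top strand, from the site start
    bottom_cut: int  # cut position on the bottom strand, same coordinates


def overhang(cut: Cut) -> tuple[str, str]:
    """Return (overhang type, unpaired bases as read on the top strand)."""
    lo, hi = sorted((cut.top_cut, cut.bottom_cut))
    if cut.top_cut < cut.bottom_cut:
        kind = "5'"
    elif cut.top_cut > cut.bottom_cut:
        kind = "3'"
    else:
        kind = "blunt"
    return kind, cut.site[lo:hi]


def is_equivalent(seq: str) -> bool:
    """True if the overhang equals its own reverse complement."""
    return seq == revcomp(seq)


CUTS = [
    Cut("Apa I", "GGGCCC", 5, 1),
    Cut("Eco RI", "GAATTC", 1, 5),
    Cut("Ban I", "GGTGCC", 1, 5),
    Cut("Bsp 1286", "GTGCCC", 5, 1),
]

for cut in CUTS:
    kind, seq = overhang(cut)
    label = "equivalent" if is_equivalent(seq) else "non-equivalent"
    print(f"{cut.enzyme}: {kind} overhang {seq} ({label})")
```

With these inputs the Apa I and Eco RI overhangs are their own reverse complements, whereas the Ban I and Bsp 1286 overhangs are not, matching the grouping of (14) and (15) versus (16) and (17).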

As an alternative source of double-stranded DNA fragments for use in preparing larger DNA fragments, natural genes or gene fragments or complementary DNA copies of all or a portion of a natural gene in the form of double-stranded DNA fragments can be isolated by techniques well known in the art of molecular cloning. These DNA fragments are restricted to those which mostly or wholly code for repeating amino acid sequences, with the possible exception of their end nucleotides on each DNA strand when the triplet grouping of nucleotide sequences required by the genetic code is taken into account. Illustrative of natural genes or gene fragments which are useful in the practice of this invention are those which code for part or all of any form or isolate of the proteins collagen, elastin, keratin, troponin C, any other intermediate filament protein (cf. E. Lazarides, Nature 283, 249-256 (1980)) or silk fibroin and which include most or all of an amino acid sequence which exhibits some degree of repetitiveness within the protein sequence. The degree of repetitiveness can be judged by DNA or protein sequence homology using various theoretical techniques in peptide biology. See, for example, S.B. Needleman and C.D. Wunsch, Journal of Molecular Biology 48, 443-453 (1970), A.D. McLachlan, Journal of Molecular Biology 61, 409-424 (1971), and D. Eisenberg et al., Proc. Natl. Acad. Sci. (U.S.A.) 81, 140-144 (1984). Exemplary of useful complementary DNA copies in this invention are those resulting from reverse transcription and DNA strand copying from messenger RNA by an appropriate reverse transcription process and DNA strand copying process wherein the messenger RNA is transcribed from genes coding for proteins such as collagen, elastin, keratin, troponin C, any other intermediate filament, or silk fibroin.
These illustrative lists are not meant to be inclusive of all proteins from which part or all of an appropriate double-stranded DNA fragment can be prepared in any of the processes of this invention. These natural DNA fragments will preferably be prepared for isolation using a restriction enzyme which leaves cohesive termini on the natural DNA fragments compatible with the cohesive termini on DNA fragments of synthetic origin. Alternatively, the ends of any natural DNA fragments preferably may be adapted or modified with an appropriate DNA linker or linkers which, subsequent to attachment to the natural DNA fragments, either can be uniquely cleaved with one or more restriction enzymes to reveal, or intrinsically have, one or more cohesive termini compatible with the cohesive termini of one or more synthetic DNA fragments. The repeating amino acid sequences for which the shorter DNA fragments and larger DNA fragments code in the primary process of this invention may vary widely, depending on the shorter DNA fragments selected for joining. Some of the preferred amino acid sequences encoded by the shorter DNA fragments include, in three letter amino acid code, poly(Gly), poly(Ala), poly(Gly-Ala), poly(Ala-Lys), poly(Gly-Ala-Gly-Ala-Gly-Ser), poly(Gly-Ala-Pro), poly(Gly-Pro-Ala), poly(Gly-Pro-Pro), poly(Gly-Val-Gly-Val-Pro), poly(Gly-Lys-Leu-Glu-Ala-Leu-Glu), poly(Ala-Lys-Pro-Thr-Tyr-Lys), poly(Ala-Lys-Pro-Ser-Tyr-Pro-Pro-Thr-Tyr-Lys) and the like, wherein each amino acid residue has the L-amino acid conformation. Hydroxylated forms of any of these sequences are also preferable within certain embodiments of this invention. Some embodiments of this invention preferably select shorter DNA fragments which in part code for proline-containing or proline-rich amino acid sequences in which the DNA fragment-linker junction in oligomerized and larger DNA fragments occurs at or adjacent to a codon for the amino acid proline.
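Designing a shorter DNA fragment for one of the preferred repeating sequences starts from a choice of codons. The Python below is a minimal sketch that back-translates a repeat using only the codon choices that appear in the example sequences of this specification (GGT for Gly, GTT for Val, CCG for Pro). Codon choices for the other residues in the list above are not given in those examples, so the sketch deliberately rejects them rather than guess; the function name `back_translate` is illustrative.

```python
# Codon choices taken from the example sequences in this specification.
# Other residues are omitted on purpose: no codon choice is stated for them.
PREFERRED_CODON = {"Gly": "GGT", "Val": "GTT", "Pro": "CCG"}


def back_translate(repeat: str) -> str:
    """Coding strand (5'->3') for a repeat written as 'Gly-Val-...'."""
    residues = repeat.split("-")
    missing = [res for res in residues if res not in PREFERRED_CODON]
    if missing:
        raise KeyError(f"no codon choice given for: {', '.join(missing)}")
    return "".join(PREFERRED_CODON[res] for res in residues)


for repeat in ("Gly-Val-Gly-Val-Pro", "Val-Gly-Val-Pro-Gly", "Gly-Pro-Pro"):
    print(f"poly({repeat}): 5'-{back_translate(repeat)}-3'")
```

For Gly-Val-Gly-Val-Pro and Val-Gly-Val-Pro-Gly the result reproduces the top strands of the pentadecanucleotide examples given earlier in this description.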

The DNA fragments or larger DNA fragments which code for the desired repeating amino acid sequence or joined repeating amino acid sequences can be inserted into a suitable plasmid vector using conventional techniques. Such techniques are well known in the art, and will not be described herein in detail. The larger DNA fragment will preferably be inserted at a unique site or pair of sites in the plasmid vector that allows perfect base pairing with cohesive termini on the larger DNA fragment. Such insertion may or may not yield a restriction enzyme recognition sequence at any of the junctions between the plasmid vector and the inserted DNA fragment. In the preferred embodiments of this invention such a restriction enzyme recognition sequence is constituted or reconstituted so that the inserted DNA fragment may be removed at a later time if desired in other applications of this invention. The site of DNA fragment insertion is preferably at a position 3' to a strong promoter/operator sequence in the plasmid vector which will regulate the production of sufficient amounts of polypeptide from the inserted DNA fragment, which must be inserted in the correct reading frame and in the proper orientation. Illustrative of suitable plasmid vectors are pAS1 (described in U.S. Patent 4,578,355), pKC30 (described in R. N. Rao, Gene 31, 247-250 (1984)) and pKN403 (described in U.S. Patent 4,495,287). Preferred plasmid vectors include pJL6 (described in J.A. Lautenberger et al., Gene 23, 75-84 (1983)), pAV1 whose construction is described below, ptac12H (described in E. Amann et al., Gene 25, 167-178 (1983)) and pKK233-2 (described in E. Amann and J. Brosius, Gene, in press).
The plasmid vector plus the inserted DNA fragments or larger DNA fragment or fragments can be transformed using conventional techniques known in the art of molecular cloning using an acceptable bacterial host or other suitable microorganism in which the fragments are able to be expressed using established techniques, as for example those techniques described in U.S. Patent 4,237,224; T. Maniatis et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, 1983), pp. 249-255; and D. Hanahan, Journal of Molecular Biology 166, 557-580 (1983), incorporated herein by reference. Useful bacterial species may vary widely and may be strains of such well known species as Escherichia coli, Bacillus subtilis and the like. Preferred bacteria are strains of E. coli which are recombination-deficient in order to prevent recombination events that may be favored between various segments of the inserted DNA fragments which have a substantial degree of internal repetitiveness. Especially preferred strains of E. coli are genotype recA-, especially MH01 (genotype recA-, Tet^r derivative of strain N99) whose construction is described in the examples below, MH03 (recA-, Tet^r derivative of strain N4830 made by P1 transduction from strain N6240 by techniques analogous to those used in the construction of MH01), DC1138 (pro-, leu-, λ ΔsrlR recA301::Tn10, λ_def cI⁺), DC1139A (same as DC1138 except λ_def ΔBam H1 ΔH1 cI857), JM109 and DHB9 (F' lacI^q Z⁺ Y⁺, recA, srl::Tn10, phoR, ΔphoA, ΔmalF, Δara leu, Δlac, galE, galK; derived from MC1000).

After transformation, clonal isolates of transformed bacteria can be screened and selected using conventional techniques, as for example screening by hybridization techniques using a radiolabelled synthetic oligodeoxynucleotide probe. The screened bacterial colonies can be selected and isolated once it is determined that they contain useful plasmid vectors, and can be assayed for expressing the inserted DNA as a polypeptide with the desired repeating amino acid sequence. If the cloned bacteria are capable of polypeptide expression from the DNA fragments utilized in the process of this invention, additional bacteria can be grown under fermentation conditions and these bacteria can be induced to express the desired polypeptide under conditions which are appropriate for the particular plasmid vector-bacterial host gene expression system being utilized. The desired polypeptide can then be isolated from the bacterial growth medium or from the bacteria using appropriate procedures. Illustrative of useful bacterial growth and bacterial product harvest procedures are those described in greater detail in European patent application 0131843, which is incorporated herein by reference.

If the cloned bacteria do not produce the polypeptide having the desired length or the desired repeating amino acid sequence, larger DNA fragments coding for an appropriate length polypeptide or one of the appropriate sequence can be obtained by isolation of one or more of the DNA fragment inserts from bacteria harboring a plasmid vector containing such insert, using one or a pair of restriction enzymes which only cleave the associated linker DNAs, and by oligomerization of such insert DNAs. The techniques for oligomerization and transformation of the newly created larger DNA fragment are obvious extensions of techniques described above in detail. This procedure can be applied to the creation of hybrid DNA fragments containing more than one DNA fragment coding for distinct repeating amino acid sequences. The oligomerization and recloning of DNA fragments can be done several times and can be continued until gene constructs having the described characteristics are formed. For example, individual DNA fragments coding for repeating amino acid sequences (Gly-Pro-Pro) and (Gly-Val-Gly-Val-Pro) can be joined in various recloning procedures to obtain random or alternating block copolymer polypeptides composed of repeating units of these amino acid sequences of various lengths. The sequence (Gly-Pro-Pro)_n is an analogue to the eucaryotic protein collagen and may therefore form triple helical macromolecular aggregates and exhibit physical properties of high tensile strength and low elasticity. The sequence (Gly-Val-Gly-Val-Pro)_m is a consensus sequence extracted from the known amino acid sequence of the eucaryotic protein elastin and has among its various physical properties the quality of elasticity.
A hybrid copolymer polypeptide of these two repeating amino acid sequences might therefore be expected to show degrees of tensile strength and/or elasticity depending upon the nature and size of the larger DNA fragment prepared by the process of this invention which encodes the relevant hybrid copolymer polypeptide.

The process of this invention has many uses. For example, the process can be used to make or create bacteria which produce many useful polypeptide products. Illustrative of such products are analogues to naturally occurring proteins such as collagen, elastin, keratin, protein or glycoprotein elements of thick, intermediate or thin filaments in higher organisms, silk fibroin, tropomyosin, troponin C, resilin, eucaryotic egg shell proteins, insect cuticle proteins or other eucaryotic architectural proteins.

The following examples are presented to more particularly illustrate the invention and are not to be construed as limitations thereon.

EXAMPLE 1

Preparation of a Synthetic Gene for a Collagen Analogue Without DNA Linkers

The following complementary and overlapping oligodeoxynucleotides were prepared using solid phase phosphoramidite chemistry as disclosed in Beaucage and Caruthers, op. cit., on an Applied Biosystems model 380 DNA synthesizer:

A. 5' - CG GGT CCG CCG GGT CCG C - 3'

B. 3' - GGC CCA GGC GGC CCA GGC - 5'

Each oligodeoxynucleotide was isolated from shorter chain-elongation failure products by electrophoresis on and elution from 20% polyacrylamide gels containing 8 M urea. The final product was greater than 95% pure as determined by densitometry of autoradiograms prepared from end-labeled oligodeoxynucleotide products separated by analytical gel electrophoresis. Phosphate was added to the 5' ends of oligodeoxynucleotides A and B in separate reactions that contained 8.6 nmol oligodeoxynucleotide and 20 units T4 polynucleotide kinase in 35-45 ul buffer (66 mM Tris-HCl, pH 7.6, 1 mM spermidine, 10 mM MgCl₂, 15 mM dithiothreitol, 200 ug/ml bovine serum albumin (BSA), and 1 mM [γ-³²P]ATP with a specific activity of 0.2 Ci/mmol). These reaction mixtures were incubated for 2 hr at 37°C, then combined and incubated at 14°C overnight. During this time, oligodeoxynucleotides A and B annealed, presumably to form 17 base pair heteroduplexes with one-base overhanging 3' ends or 10 base pair heteroduplexes with eight-base overhanging 5' ends. T4 DNA ligase (40 units) was added and incubation was continued at 14°C for three days to polymerize the annealed oligodeoxynucleotides into long repetitive heteroduplex DNA coding for multiple repeats of the tripeptide (Gly-Pro-Pro). These synthetic genes were dialyzed against TE buffer (10 mM Tris-HCl, pH 7.5, 1 mM EDTA) to remove unincorporated oligodeoxynucleotides and buffer components. The ends of the synthetic genes were then made blunt using three units of the Klenow fragment of E. coli DNA polymerase I in a reaction (50 ul total volume) containing the following: 600 uM each of dCTP, dGTP, dATP and TTP; 50 mM Tris-HCl, pH 7.8; 9 mM MgCl₂; 10 mM 2-mercaptoethanol; and 50 ug/ml BSA. This reaction mixture was incubated at 14°C for 30 minutes, then Na₃EDTA was added to 10 mM and 150 ul of TE buffer was also added. The synthetic genes were purified on a DE-52 column, then ethanol precipitated.
These synthetic genes were combined with the excluded fraction of another batch of synthetic genes prepared in substantially like manner that had previously been passed over a Sepharose 6B (Pharmacia) column. The combined synthetic genes were size fractionated on a Sepharose 4B (Pharmacia) column. The size distribution of synthetic genes was determined by electrophoresis on a 5% polyacrylamide gel. The relative molecular weight distribution of fractions enriched for highly polymerized synthetic genes was compared on denaturing (i.e., containing 8 M urea) and non-denaturing 5% polyacrylamide gels. These gels showed that the molecular weight distribution of single-stranded synthetic genes was smaller than expected from the molecular weight distribution of heteroduplex synthetic genes, suggesting that nicks and/or gaps were present in the double-stranded heteroduplex DNA. The nicks and/or gaps in 1.2 ug of synthetic genes were nick-translated in vitro using one unit of E. coli DNA polymerase I in the presence of 167 uM of each of dCTP, dGTP, dATP and TTP (with other buffer components as described in the blunt-ending reaction above) at 10°C for 20 minutes (15 ul total volume). Synthetic genes (0.5 ug heteroduplex DNA) were ligated without further manipulation to ClaI-digested and blunt-ended pJL6 plasmid DNA (2.0 ug) using five units of T4 DNA ligase in the buffer described above for the kinasing and ligation reactions (10 ul total volume). The reaction mixture was incubated overnight at 14°C, diluted to 200 ul in TE buffer, and used directly to transform E. coli strain MH01.
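The two annealing registers proposed for oligodeoxynucleotides A and B in this Example can be checked by sliding one strand along the other and counting Watson-Crick pairs. The following Python sketch is illustrative only; the function name and the offset convention are assumptions introduced here and are not part of the original disclosure.

```python
# Illustrative sketch: enumerate mismatch-free annealing registers of oligos A and B.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

OLIGO_A_5TO3 = "CGGGTCCGCCGGGTCCGC"  # oligodeoxynucleotide A, written 5' -> 3'
OLIGO_B_3TO5 = "GGCCCAGGCGGCCCAGGC"  # oligodeoxynucleotide B, written 3' -> 5' as in the text


def paired_registers(top_5to3, bottom_3to5):
    """Map offset -> number of base pairs for registers with no mismatches.

    Offset k means top[i] faces bottom[i + k]. Only registers in which every
    facing position is a Watson-Crick pair are reported.
    """
    registers = {}
    for k in range(-(len(top_5to3) - 1), len(bottom_3to5)):
        facing = [
            (top_5to3[i], bottom_3to5[i + k])
            for i in range(len(top_5to3))
            if 0 <= i + k < len(bottom_3to5)
        ]
        if facing and all(COMPLEMENT[t] == b for t, b in facing):
            registers[k] = len(facing)
    return registers


registers = paired_registers(OLIGO_A_5TO3, OLIGO_B_3TO5)
print(registers[1])   # 17 bp duplex with one-base 3' overhangs
print(registers[-8])  # 10 bp duplex with eight-base 5' overhangs
```

Short registers of only a few base pairs are also reported by this function; only the 17 and 10 base pair registers are discussed in the text.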

EXAMPLE 2: Bacteriophage P1 Transduction of E. coli strain N99cI⁺ and Construction of strain MH01.

In order to ensure that any highly repetitive synthetic gene would not be excised from an expression vector by host-mediated homologous recombination, a recA mutation was introduced into the E. coli strain N99cI⁺. The recA mutation used here originated from E. coli strain N6240(CR63 recA::Tn10). This mutation was transferred into strain N99cI⁺ using the generalized transducing phage P1 cml, clr100. This particular phage carries a gene for chloramphenicol resistance (cml) and makes clear plaques (clr) at high temperature (42°C) but turbid plaques at low temperature (32°C). A high titer stock of P1 cml, clr100 grown on N6240 was used to transduce the recA mutation into N99cI⁺ as disclosed in J. H. Miller, Experiments in Molecular Genetics (Cold Spring Harbor, 1972). Five ml of a fresh overnight culture of N99cI⁺ was resuspended in an equal volume of MC buffer (0.1 M MgSO₄, 5 mM CaCl₂). The cells were then aerated at 37°C for 15 minutes. A 100 ul aliquot of the suspended cells was added to 100 ul of a 10^-1 or 10^-2 dilution of the P1 lysate. After incubation at 37°C for 20 minutes, 200 ul of 1 M sodium citrate was added to each tube. The contents were then plated on LB plates containing 12.5 ug/ml tetracycline using 3 ml R top agar and the plates were incubated overnight at 39°C. Each tetracycline-resistant colony was screened for chloramphenicol sensitivity at 30°C in order to ensure that it was not a fortuitous P1 lysogen. The presence of the recA mutation was confirmed by testing for sensitivity to UV light. Each potential bacterial transductant was streaked across an LB plate and different sections of the streaks were exposed to UV light for 0, 10 or 20 seconds, respectively. The agar plate was subsequently incubated at 30°C overnight. One strain which was unusually UV sensitive relative to its parent (as demonstrated by growth only in the 0 second exposure section of the streak) was saved and designated MH01.

EXAMPLE 3: Transformation of MH01 with the Identification of Plasmids Bearing a Synthetic Collagen Analogue Gene Without DNA Linkers.

Frozen competent cells were prepared and transformed according to the Hanahan procedure (disclosed in D. Hanahan, Journal of Molecular Biology, 166: 557-580 (1983)) except that the FSB buffer contained 10 mM potassium acetate, pH 6.4, 100 mM KCl, 15 mM MnCl₂, 10 mM CaCl₂, and 3 mM hexamine cobalt chloride. About 125 ng of DNA was used for each transformation; cells were subsequently selected for resistance to ampicillin and tetracycline. Transformants were replica plated onto nitrocellulose filters and those containing plasmids carrying synthetic gene inserts were identified by colony hybridization using radiolabeled oligodeoxynucleotide A as a probe. Hybridization was done at 37°C for 2 hr in a solution composed of 20% formamide, 5X SSC, 0.1% SDS, 1 mM Na₂EDTA, 1X Denhardt's solution, and 250 ug/ml denatured, sheared salmon sperm DNA. Nitrocellulose filters were washed three times with 5X SSC, 0.1% SDS at 55°C successively for 20, 10, and 1 minute and then rinsed once in 2X SSC for two minutes at room temperature. Insert-bearing plasmids were subsequently isolated from bacterial clones yielding positive hybridization signals. These plasmids were restricted with the enzymes HindIII and NdeI, and the restriction products were analyzed by agarose gel electrophoresis to determine the size of the insert.

EXAMPLE 4: DNA Sequencing of 5' and 3' Junctions for Synthetic Collagen Analogue Genes in pJL6 and Identification of pAC1

Direct sequencing of the 5' junctions of the synthetic gene insert in several supercoiled plasmid DNAs bearing a synthetic collagen analogue gene without DNA linkers was conducted as disclosed in R. J. Zagursky et al., Gene Analytical Techniques 2: 289-94 (1985). The 5' and 3' gene orientations as used here respectively refer to the proximal and distal junctions relative to the lambda P_L promoter located in pJL6. The following oligodeoxynucleotide was prepared by solid phase automated synthesis for priming DNA sequencing reactions based on the Sanger dideoxynucleotide sequencing method as adapted by Zagursky et al.:

C. 5' - CTTACATATGGTTCGTGCAA - 3'

Primed synthesis reactions using this oligodeoxynucleotide allow sequencing into any gene inserted at the ClaI site of pJL6 and in a direction reading toward the HindIII site of pJL6. On the basis of proper reading frame and correct coding information at both the 5' and 3' junctions, one of these plasmids was designated pAC1 and investigated further.

For determining the junction sequence at the 3' end of the synthetic collagen analogue gene in pAC1, the chemical cleavage method (as disclosed in Maxam and Gilbert, 1980, Methods Enzymol., 65:499-560) was used after restricting pAC1 with HindIII, radiolabeling the linearized plasmid with [γ-³²P]ATP using T4 polynucleotide kinase, digesting the labeled plasmid DNA with the enzyme NdeI, and purifying the synthetic gene fragment on a 5% polyacrylamide gel containing 8 M urea.

EXAMPLE 5: Northern Blot Analysis of pAC1-Encoded Collagen Analogue Gene Messenger RNA

Total RNA was prepared from the following three strains: DC1139A(pro leu r- m⁺ ΔSrlR- recA301::Tn10 λdef ΔBamHI ΔH1 cI857), DC1139A(pJL6), and DC1139A(pAC1). Cultures were grown in 20 ml LB broth at 30°C overnight to OD₆₀₀ = 3. Then the cultures were split and half was shifted to 41°C for 1 h in order to activate the λ P_L promoter. Following the induction, the cultures were chilled to 0°C. The cells were centrifuged at 8000 x g for 5 minutes at 0°C. The pellets were resuspended in 500 ul STE buffer (100 mM NaCl, 10 mM Tris-HCl, pH 7.0, 1 mM Na₂EDTA) and transferred to a 1.5 ml Eppendorf centrifuge tube. A 500 ul sample of hot (65°C) phenol equilibrated with distilled water was added, the tube was vortexed and then the tube was incubated at 65°C for 10 minutes. After a 5 minute centrifugation in an Eppendorf microfuge, the aqueous phase was removed and 500 ul of hot phenol was added. Another 500 ul of STE buffer was added to the first phenol phase, both tubes were vortexed, and both tubes were incubated at 65°C for 5 minutes. Following another 5 minute centrifugation, both aqueous phases were pooled. The phenol extraction was repeated three more times. A final extraction was made with phenol:chloroform (1:1) at room temperature. The samples were extracted with ether and the RNA was precipitated with ethanol. The RNA was redissolved in 300 ul RNA storage buffer (210 ul 100% ethanol, 90 ul RNA buffer consisting of 20 mM sodium phosphate, pH 6.5, 1 mM Na₂EDTA, 99.5% ethanol). The quality of the RNA preparation was monitored by electrophoresing 2 ul of the samples on a 1.2% agarose gel in 10 mM sodium phosphate buffer, pH 7.0. The OD₂₆₀ and OD₂₈₀ of each sample were recorded.
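Absorbance readings of this kind are commonly used to estimate RNA yield and purity before loading a gel. The sketch below is a hypothetical illustration: the conversion factor of about 40 ug/ml per OD₂₆₀ unit is a widely used rule of thumb, and the readings and dilution factor are invented for the example rather than taken from this Example.

```python
# Illustrative sketch: RNA yield and purity from absorbance readings.
RNA_UG_PER_ML_PER_A260 = 40.0  # rule-of-thumb factor for single-stranded RNA (assumption)


def rna_concentration_ug_per_ml(od260, dilution_factor=1.0):
    """Estimate the RNA concentration of the undiluted sample in ug/ml."""
    return od260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(od260, od280):
    """Return OD260/OD280; values near 2.0 are usually taken to indicate clean RNA."""
    return od260 / od280


# Hypothetical readings for a 1:100 dilution of one RNA sample.
od260, od280 = 0.50, 0.25
conc = rna_concentration_ug_per_ml(od260, dilution_factor=100)
volume_for_20_ug = 20.0 / conc * 1000.0  # ul of stock that contains about 20 ug of RNA
print(conc, purity_ratio(od260, od280), volume_for_20_ug)
```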

About 20 ug of each RNA sample was prepared for gel electrophoresis as disclosed in T. Maniatis et al., Molecular Cloning (Cold Spring Harbor, 1982), pp. 202-203. The RNA samples were electrophoresed on a 1.0% agarose-formaldehyde gel at 30 V overnight. The next morning, the gel was stained with acridine orange to visualize the RNA and processed for Northern hybridization analysis according to the procedure disclosed by Barinaga et al. in Transfer of RNA to Solid Supports (Schleicher and Schuell). The agarose-formaldehyde gel was blotted onto DBM paper overnight. Northern prehybridization solution was prepared as described by Barinaga et al. The DBM paper with transferred RNA was incubated in 17 ml prehybridization solution at 42°C overnight. The probe for the Northern blot consisted of oligodeoxynucleotide B of Example 1 radiolabeled with T4 polynucleotide kinase in the presence of [γ-³²P]ATP. The hybridization solution consisted of 25% formamide, 5X SSPE, 0.05% SDS, 1 nM Na₂EDTA, 1X Denhardt's solution and 750 ug/ml salmon sperm DNA (see T. Maniatis et al., op. cit., for definition of 1X SSPE). The probe and hybridization solution were mixed and incubated with the DBM paper containing transferred RNA at 37°C overnight. The DBM paper was then washed successively in 1 l of 4X SSPE, 0.1% SDS for 20 minutes at 55°C, 800 ml of 4X SSPE for 10 minutes at 55°C, 200 ml of 4X SSPE for 1 minute at 55°C, and 500 ml of 2X SSPE at room temperature for 2 minutes. The blot was subsequently dried and exposed to X-ray film overnight. The autoradiogram resulting from this exposure showed very strong probe hybridization to DC1139A(pAC1) RNA for the culture induced at 41°C. Hybridization in all other strains and under other culture conditions, including growth of DC1139A(pAC1) at 30°C, was minimal. These data demonstrate unambiguously that strong induction of collagen analogue oligodeoxynucleotide B-specific messenger RNA synthesis from the λ P_L promoter occurred only at the high temperature and only in strain DC1139A(pAC1), as expected.

EXAMPLE 6: In Vitro Coupled Transcription-Translation Assay for pAC1

A commercially available coupled transcription-translation system (Amersham) was used to investigate the proteins encoded by plasmid pAC1. The reaction mixtures contained 2.5 ug of plasmid DNA and were prepared according to the procedure supplied by the manufacturer. The parent plasmid pJL6 was studied in parallel reactions for comparison with pAC1. Following the in vitro transcription-translation stimulated by these plasmid DNAs, a portion of one sample containing pAC1 DNA was treated with 5 ug collagenase in 60 mM CaCl₂ at 37°C for 30 minutes. Each reaction mixture was diluted with an equal volume of loading buffer (0.08 M Tris-HCl, pH 6.8, 0.1 M dithiothreitol, 2% SDS, 10% glycerol, 0.1 mg/ml bromophenol blue) and heated to 100°C for 5 minutes. Half of each sample was then electrophoresed on a 12.5% SDS-polyacrylamide gel at 50 V overnight. Molecular weight marker proteins run in a parallel lane were bovine serum albumin, ovalbumin, carbonic anhydrase and cytochrome C. The gel was then fixed and fluorographed with En³Hance using the procedure disclosed in A Guide to Autoradiography Enhancement (New England Nuclear). After exposure of the gel to X-ray film overnight, two prominent bands could be visualized on the autoradiogram. The first band occurred in the lane containing pJL6 DNA as well as the lane containing pAC1 DNA. This probably represents the beta-lactamase enzyme which is coded for by both plasmids. The second band is unique to pAC1 DNA and is a protein of 22,000 daltons based on its electrophoretic mobility. This protein is the product of the synthetic collagen analogue gene in pAC1. Supporting evidence for this conclusion is the fact that the band is no longer visible in the sample containing pAC1 DNA and treated with collagenase prior to electrophoresis.

EXAMPLE 7: Peptide Expression from the Synthetic Collagen Analogue Gene Without DNA Linkers Contained in Plasmid pAC1

The in vivo expression of a collagen analogue peptide encoded by the synthetic gene inserted in the plasmid pAC1 was demonstrated using a whole-cell labeling protocol. Overnight cultures of DC1139A(pJL6) and DC1139A(pAC1) were prepared at 30°C. The next morning, one ml of overnight culture was inoculated into 20 ml LB broth (10 g tryptone, 5 g yeast extract and 5 g NaCl in one liter of water) containing 50 ug/ml of ampicillin. The cultures were grown to OD₆₀₀ = 0.4 at 30°C. One ml samples were then taken and washed twice in M63 salt solution. The pellets were resuspended in 1 ml M63 medium plus 0.2% glucose, 1 ug/ml of vitamin B1, and 100 ug/ml of all amino acids except proline. The cultures were preincubated at 41°C for 20 minutes before 2 uCi of [¹⁴C]proline were added to each culture tube and the incubations were continued for an additional 3 minutes. About 1 mg of unlabeled proline was then added to all cultures, incubation was continued for an additional 3 minutes and then the incubations were terminated by pelleting the cells from all cultures. The cell pellets were washed once in 1 ml M63 salts to remove any residual unincorporated [¹⁴C]proline. The final cell pellets were resuspended in 50 ul SDS loading buffer (80 mM Tris-HCl, pH 6.8, 100 mM dithiothreitol, 2% SDS, 10% glycerol, 100 ug/ml of bromophenol blue) and were immediately heated in a boiling water bath for 5 minutes. Aliquots of 20 ul were then electrophoresed on a 12.5% SDS-polyacrylamide gel. The resulting gel was then treated with En³Hance (New England Nuclear) and exposed to X-ray film overnight. The resulting autoradiogram showed a protein band in the lane containing pAC1 proteins which was absent in the lane with pJL6 proteins; this band represented a protein with an apparent molecular weight of 22,000 daltons based on electrophoretic mobility. This protein was therefore of the same size as the pAC1-specific protein band identified in the coupled transcription-translation system of Example 6.

The effect of temperature during culture preincubation was also determined. The experimental protocol was the same as above except that the preincubation temperature was varied between 30° and 47°C. The resulting autoradiogram demonstrated that the temperature of preincubation leading to maximal collagen analogue peptide expression lies between 41° and 44°C.

There appeared to be an inconsistency between the measured molecular weight of the collagen analogue encoded by pAC1 (22,000 daltons) and the calculated molecular weight of 12,000 daltons derived from physical mapping and DNA sequencing data for pAC1. An experiment was therefore undertaken to determine whether or not the collagen analogue was migrating anomalously on SDS-polyacrylamide gels. Three SDS-polyacrylamide gels with 10%, 12.5% or 15% polyacrylamide concentrations were prepared and replicate samples from thermally induced DC1139A(pAC1) cultures labeled with [¹⁴C]proline as well as radiolabeled marker proteins were electrophoresed on all three SDS-polyacrylamide gels. The results indicate that the synthetic collagen analogue peptide migrates abnormally slowly relative to the marker proteins. Therefore, the true molecular weight of the collagen analogue peptide is less than 22,000 daltons.
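To give a sense of scale for the calculated figure, the mass of a pure (Gly-Pro-Pro)n chain can be estimated from average residue masses. The following Python sketch is a rough illustration only: it ignores the vector-encoded residues fused to the analogue, and the function names are hypothetical, so the repeat count it yields is an order-of-magnitude guide rather than the composition of the actual gene product.

```python
# Illustrative sketch: average molecular weight of a (Gly-Pro-Pro)n peptide.
RESIDUE_MASS = {"G": 57.0519, "P": 97.1167}  # average residue masses in daltons
WATER = 18.0153  # one water per free peptide chain


def peptide_mass(sequence):
    """Sum average residue masses and add one water for the chain termini."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def repeats_for_mass(target_daltons, unit="GPP"):
    """Smallest number of repeat units whose chain mass reaches target_daltons."""
    n = 1
    while peptide_mass(unit * n) < target_daltons:
        n += 1
    return n


print(round(peptide_mass("GPP"), 2))  # mass of a single Gly-Pro-Pro tripeptide
print(repeats_for_mass(12000))        # repeats needed to reach about 12,000 daltons
```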

It was also of interest to determine whether the collagen analogue peptide segregates into the soluble or insoluble protein fraction of the cell, since many genetically engineered proteins form insoluble aggregates termed inclusion bodies. A gentle lysis procedure was performed and the proteins present in a high salt pellet as well as in the associated supernatant were analyzed. Two 1 ml samples of DC1139A(pJL6) and DC1139A(pAC1) were labeled with [¹⁴C]proline as described above following induction at 41°C for 1 h. One sample of each culture was processed as before and represented the unfractionated extract. The other portions were pelleted following the pulse-chase and were washed with 1 ml TES buffer (40 mM Tris-HCl, pH 8.0, 1 mM EDTA, 25% sucrose). The pellets were resuspended in 250 ul TES buffer and 1 mg of lysozyme was added. The samples were frozen in dry ice and thawed in a 37°C water bath two times in order to facilitate lysis. Lysis buffer was subsequently added to the following concentrations: 0.5% Nonidet P40, 10 mM MgCl₂, 50 mM NaCl. The viscosity of the cell lysates was reduced by addition of E. coli DNase I enzyme to 20 ug/ml. The samples were then placed on ice for 30 minutes and were centrifuged in an Eppendorf microfuge for 10 minutes at 4°C. Trichloroacetic acid was added to the resulting supernatants to a final concentration of 10%, and they were placed on ice for 15 minutes. These acidified samples were centrifuged (again for 10 minutes at 4°C) and the white pellets were washed two times with 100% ethanol before being resuspended in 50 ul SDS loading buffer and heated in a boiling water bath for 5 minutes. The high salt pellets were washed once in TES buffer before being resuspended and heated in SDS loading buffer. Sufficient amounts of 1 M Tris-HCl, pH 8.0, were added to those samples with a yellow hue to return their color to blue prior to heating in the boiling water bath. The fractionated samples as well as the unfractionated control samples were electrophoresed on a 12.5% polyacrylamide gel, treated with En³Hance and exposed overnight to X-ray film. The autoradiogram of this gel revealed that the major band at an apparent molecular weight of 22,000 daltons specifically found in the lanes containing DC1139A(pAC1) proteins was primarily found in the supernatant fraction and was the main labeled band in this lane.

EXAMPLE 8: Preparation of the Expression Vector pAV1

Genes inserted in the expression vector pJL6 at the ClaI site, when expressed, produce proteins that are fusion peptides containing the first 13 amino acid residues of the λ cII protein. A new expression vector was designed and constructed that removes the protein coding sequences related to the cII protein from pJL6 and also introduces a unique ApaI restriction endonuclease recognition site; this new expression vector was designated pAV1. The plasmid pAV1 still makes use of the λ P_L promoter and a variant of the DNA sequences just upstream of the translational initiation codon for that portion, but the plasmid vector DNA between the NdeI and HindIII sites of pJL6 has been replaced with a chemically synthesized DNA sequence that allows the tripeptide Met-Gly-Pro to be made rather than the first 13 amino acids of the λ cII protein. The new ApaI restriction endonuclease recognition site is located within this sequence such that the DNA encoding the amino acid residue Pro is cleaved. Any synthetic or natural gene or gene segment terminating in the unpaired base sequence ...GGCC-3' can be cloned into the ApaI site of pAV1, and, upon expression of the gene, the aforementioned tripeptide will comprise the amino terminus of the peptide produced under the control of the P_L promoter in pAV1.

The following oligodeoxynucleotides were synthesized as the first step in constructing pAV1 using an Applied Biosystems model 380A automated DNA synthesizer:

D. 5'- TAAGGAAATACTTACATATGGGGCCCTAAGCTTTAATGCGGTAGTT-3'

E. 5' - TAAAGCTTAGGGCCCCATATGTA - 3'

The oligodeoxynucleotide E is completely complementary to a portion of the oligodeoxynucleotide D and produces a DNA fragment having both 5' and 3' overhanging ends. When annealed, oligodeoxynucleotides D and E form a heteroduplex DNA within which are located restriction enzyme recognition sites for both NdeI and HindIII. The most direct method of constructing pAV1 from D and E is to digest the synthetic heteroduplex with NdeI and HindIII and then ligate the heteroduplex product into pJL6 from which the small DNA fragment produced by an NdeI-HindIII double digest has been excised. During the course of constructing pAV1, it was determined that NdeI restricted the synthetic heteroduplex formed by D and E poorly or not at all, necessitating the additional steps described herein. Oligodeoxynucleotides D and E were annealed (270 pmol of each) in 35 ul of 10 mM TE buffer (see Example 1) by allowing the solution to cool slowly from 75°C to room temperature. A portion of this synthetic heteroduplex was radiolabeled by T4 polynucleotide kinase in the presence of [γ-³²P]ATP. After completing the radiolabeling, the synthetic heteroduplex was purified by chromatography on DE-52 cellulose (Whatman) and then precipitated in ethanol. The labeled synthetic heteroduplex was added to the unlabeled material as a tracer and the combined fractions were further purified on a NENSORB-20 column (DuPont) and then concentrated by evaporation. Another 270 pmol each of oligodeoxynucleotides D and E were added to the concentrated solution and the annealing reaction was repeated by allowing the solution to cool slowly from 98°C to 4°C. Proper annealing was monitored by gel electrophoresis of an aliquot of the reaction mixture in 16% polyacrylamide.
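The relationship between oligodeoxynucleotides D and E can be checked by locating the reverse complement of E within D and then searching D for the recognition sequences of the enzymes named above. The Python sketch below is illustrative only; the variable names are introduced here, and the recognition sequences used (CATATG for NdeI, GGGCCC for ApaI, AAGCTT for HindIII) are the standard ones for these enzymes rather than values stated in this Example.

```python
# Illustrative sketch: check how oligo E pairs with oligo D and where the sites lie.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

OLIGO_D = "TAAGGAAATACTTACATATGGGGCCCTAAGCTTTAATGCGGTAGTT"  # 5' -> 3'
OLIGO_E = "TAAAGCTTAGGGCCCCATATGTA"                         # 5' -> 3'

SITES = {"NdeI": "CATATG", "ApaI": "GGGCCC", "HindIII": "AAGCTT"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Position in D (0-based) where the stretch complementary to E begins.
duplex_start = OLIGO_D.find(reverse_complement(OLIGO_E))
five_prime_overhang = duplex_start
three_prime_overhang = len(OLIGO_D) - duplex_start - len(OLIGO_E)

# First occurrence of each recognition sequence in D.
site_positions = {name: OLIGO_D.find(site) for name, site in SITES.items()}

# Short open reading frame starting at the ATG inside the NdeI site.
orf = OLIGO_D[site_positions["NdeI"] + 3:site_positions["ApaI"] + 6]

print(duplex_start, five_prime_overhang, three_prime_overhang)
print(site_positions)
print(orf)
```

Under the standard genetic code, the nine bases printed last (ATG GGG CCC) read Met-Gly-Pro, and all three recognition sequences fall inside the region of D that is paired with E.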

The synthetic heteroduplex was restricted at 37°C for 5 h with 75 units of HindIII restriction enzyme in 50 mM NaCl, 50 mM Tris-HCl, pH 8.0, 10 mM MgCl₂, and 100 ug/ml BSA. Then nine units of NdeI enzyme were added (after adjusting the buffer components to 200 mM NaCl, 60 mM Tris-HCl, pH 8.0, 17 mM MgCl₂, and 200 ug/ml BSA) and incubation at 37°C was continued overnight. The reaction mixture was stored at -20°C and subsequently 7.5 units of HindIII enzyme and 3 units of NdeI enzyme were added and the mixture was again incubated overnight but at room temperature. The reaction mixture was then extracted twice with phenol:chloroform (1:1), once with ether, and then was purified by chromatography through Sephadex G-25 (Pharmacia). The excluded fractions were pooled and the synthetic heteroduplex was again restricted with 15 units NdeI enzyme at room temperature for 24 h in 150 mM NaCl, 10 mM Tris-HCl, pH 7.8, 7 mM MgCl₂, 6 mM 2-mercaptoethanol, and 100 ug/ml BSA. The reaction mixture was extracted once with phenol:chloroform (1:1), and the synthetic heteroduplex was further purified by chromatography on Sephadex G-25 (Pharmacia). The excluded fractions were again pooled and then treated with 100 units of HindIII enzyme for 24 h at room temperature in 50 mM NaCl, 50 mM Tris-HCl, pH 8.0, 10 mM MgCl₂, and 100 ug/ml BSA. This incubation was followed by extraction of the reaction mixture with phenol:chloroform (1:1) and the synthetic heteroduplex was purified by chromatography through Sephadex G-25 (Pharmacia). The excluded fractions were pooled, and a portion of the pooled fractions was analyzed on denaturing and non-denaturing 16% polyacrylamide gels, i.e., in the presence or absence of 8 M urea, respectively.

An aliquot of the pooled material representing approximately a five-fold molar excess was ligated to 10 ug of pJL6 DNA that had been cleaved with both HindIII and NdeI enzymes and purified over a NACS PREPAC minicolumn (Bethesda Research Laboratories) using the manufacturer's directions. The chimeric plasmid-synthetic heteroduplex DNA which was joined at the common HindIII site was purified from unligated material by chromatography on a NACS PREPAC minicolumn (Bethesda Research Laboratories). The purified chimeric DNA was then treated with 3.7 units of T4 DNA polymerase for 5 minutes at 37°C in a 10 ul reaction mixture containing 33 mM Tris-acetate, pH 7.9, 66 mM potassium acetate, 10 mM magnesium acetate, 0.5 mM dithiothreitol, and 100 ug/ml BSA. The reaction was terminated by addition of Na₃EDTA to 10 mM. The solution was extracted once with phenol:chloroform (1:1) and once with ether; residual traces of ether were removed in vacuo. Then the chimeric DNA was circularized by heating it to 65°C and slowly cooling the reaction mixture to 4°C.

The resulting circular chimera contained single-stranded gaps on each side of the annealed region which were filled in with the Klenow fragment of E. coli DNA polymerase I. This was accomplished with 5 ug chimeric DNA in 50 ul of a solution containing 50 mM Tris-HCl, pH 7.8, 60 mM MgCl₂, 1 mM dATP, 1 mM dCTP, 1 mM dGTP, 1 mM TTP, 10 mM 2-mercaptoethanol, 50 ug/ml BSA, and 1 unit of the Klenow fragment of E. coli DNA polymerase I. This reaction mixture was incubated at room temperature for 15 minutes, then 10 ul of 10X ligation buffer (10X = 0.66 M Tris-HCl, pH 7.6, 10 mM ATP, 10 mM spermidine, 0.1 M MgCl₂, 150 mM dithiothreitol, and 2 mg/ml BSA), 3 units of T4 DNA ligase and water (to 100 ul total reaction volume) were added. This reaction mixture was incubated at 16°C for 3-4 h before being used directly to transform E. coli strain MH01 (100 ng DNA per transformation).
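The final concentrations contributed by the 10X ligation buffer follow from a simple dilution calculation. The sketch below is a minimal illustration with hypothetical names; it accounts only for the 10 ul of 10X stock brought to 100 ul and deliberately ignores components carried over from the 50 ul fill-in reaction.

```python
# Illustrative sketch: final concentrations contributed by the 10X ligation buffer.
TEN_X_LIGATION_BUFFER = {
    "Tris-HCl, pH 7.6 (mM)": 660.0,
    "ATP (mM)": 10.0,
    "spermidine (mM)": 10.0,
    "MgCl2 (mM)": 100.0,
    "dithiothreitol (mM)": 150.0,
    "BSA (ug/ml)": 2000.0,
}


def dilute(stock, stock_volume_ul, final_volume_ul):
    """Scale every stock concentration by stock_volume / final_volume."""
    factor = stock_volume_ul / final_volume_ul
    return {name: conc * factor for name, conc in stock.items()}


# 10 ul of 10X buffer brought to a 100 ul reaction, as in the text.
final = dilute(TEN_X_LIGATION_BUFFER, stock_volume_ul=10, final_volume_ul=100)
for name, conc in final.items():
    print(f"{name}: {conc:g}")
```

The resulting 1X values (66 mM Tris-HCl, 1 mM ATP, 1 mM spermidine, 10 mM MgCl₂, 15 mM dithiothreitol, 200 ug/ml BSA) match the kinase and ligation buffer composition given in Example 1.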

Colonies carrying the desired plasmid construct were identified by colony hybridization. Oligodeoxynucleotide F was radiolabeled using T4 polynucleotide kinase and [γ-³²P]ATP and was used as a probe in colony blots. Hybridization to lysed colonies immobilized on nitrocellulose filters occurred overnight at 37°C in 37% formamide, 5X Denhardt's solution, 250 ug/ml yeast tRNA, 1.0 M NaCl, 0.1 M Tris-HCl, pH 8.0, 6 mM Na₂EDTA and 0.1% SDS. Plasmids prepared from colonies hybridizing to the radiolabeled probe were checked for the presence of an ApaI restriction site by treatment with ApaI enzyme and analysis on agarose gels. One plasmid that contained an ApaI site was subjected to plasmid DNA sequencing by the method disclosed in Chen and Seeburg, 1985, DNA, 4:165-170. A pBR322 HindIII site 16-mer primer (New England Biolabs) that could be extended in a counterclockwise fashion relative to the conventional pBR322 physical map was used in these DNA sequencing reactions. The sequencing results were confirmed and extended by the method disclosed in Guo and Wu, 1982, Nucleic Acids Res., 10:2065-2084. Briefly, the plasmid bearing an ApaI site was digested with EcoRV restriction enzyme, then subjected to limited digestion with E. coli exonuclease III. Repair synthesis in the presence of all four standard dideoxynucleotides (ddATP, ddCTP, ddGTP, and ddTTP) as well as dATP, dCTP, dGTP, and TTP was conducted using the Klenow fragment of E. coli DNA polymerase I. The plasmid was then digested with AvaI restriction enzyme and the resultant DNA was analyzed by electrophoresis on polyacrylamide sequencing gels. Determination of the DNA sequence around the ApaI recognition site in this plasmid showed that DNA between the NdeI and HindIII cleavage sites on the coding strand in pJL6 had been replaced with the following sequence (coding strand): 5' - TATGGGGCCCTA - 3'. In addition, an A to G transition had occurred during the plasmid construction at a position 5' to the cII gene translational initiation codon, producing a base sequence in this region which reads:

             *
5'- ..TAAGGAAGTACTTACATATG... - 3'

The starred base is the mutated base, while the underlined regions are respectively the Shine-Dalgarno sequence and the translational initiation codon in this newly constructed plasmid. This plasmid was designated pAV1; a physical map of pAV1 is shown in the accompanying figure.

EXAMPLE 9

Preparation of a Synthetic Gene Without DNA Linkers For an Elastin Analogue

The following complementary and overlapping oligodeoxynucleotides were prepared as described in Example 1:

F. 5' - TTCCGGGTGTTGGTG - 3'

G. 3' - CCACAACCACAAGGC - 5'

The final products were greater than 97% pure, determined as described in Example 1. Phosphate was added to the 5' ends of oligodeoxynucleotides F and G in separate reactions that contained 0.57 nmol oligodeoxynucleotide in 20 ul of the buffer described for the analogous reaction in Example 1 except that unlabeled ATP was added (1 mM). The kinased oligodeoxynucleotides were then combined and heated to 70°C for 15 minutes to inactivate the kinase enzyme. A portion of each of the prepared oligodeoxynucleotides F and G was separately radiolabeled using T4 polynucleotide kinase and [γ-³²P]ATP. Two pmol of each radiolabeled oligodeoxynucleotide was added to the reaction mixture containing the unlabeled oligodeoxynucleotides F and G, and the temperature of the mixture was allowed to decrease slowly from 70°C to 1°C overnight. By allowing oligonucleotides F and G to anneal under these conditions, the formation of the most stable heteroduplex is favored. In this case, one which contains 10 base pairs with a 5 base 5' overhang on each oligodeoxynucleotide strand is the favored heteroduplex. The 5' overhanging ends can further base pair with one another to generate, upon enzymatic ligation, a long synthetic DNA gene coding for an elastin analogue composed of repeats of the sequence Val-Pro-Gly-Val-Gly. After annealing of the oligodeoxynucleotides F and G, fresh ATP (63 uM) and T4 DNA ligase enzyme (5 units) were added to the reaction mixture. The temperature of the reaction mixture was then slowly increased to 14°C over a 2 h period and incubation of the mixture was continued at 14°C overnight. Large synthetic DNA genes were then obtained by size fractionating the enzymatic ligation products using chromatography on Sepharose 4B (Pharmacia). The relative size of synthetic gene DNAs in each fraction was determined as described in Example 1. Any DNA nicks and/or gaps were removed from 0.5 ug of synthetic gene DNA by E. coli DNA polymerase I as described in Example 1 except the final concentration of each of dATP, dCTP, dGTP and TTP was 0.33 mM. The DNA products were directly ligated to ClaI-digested and blunt-ended pJL6 plasmid DNA as described in Example 1. The ligation reaction was diluted to 200 ul in TE buffer and used directly to transform E. coli strain MH01 as described in Example 3.
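
The annealing geometry stated for oligodeoxynucleotides F and G can be checked with a short sketch. The Python below is illustrative only and is not part of the described procedure: the helper names (reverse_complement, best_duplex) are hypothetical, strand G is rewritten 5' to 3', and the reading frame is simply the one that reproduces the stated Val-Pro-Gly-Val-Gly repeat (the frame in an actual clone is set by the vector).

```python
# Illustrative sketch; helper names are hypothetical, not from the source.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
CODONS = {"GTT": "Val", "CCG": "Pro", "GGT": "Gly"}  # only the codons used here


def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def best_duplex(top, bottom):
    """Find the registry with the longest perfectly paired overlap.

    Both strands are given 5'->3'. Returns (n_paired, offset), where a
    non-negative offset is the index on `top` where pairing begins.
    """
    rc = reverse_complement(bottom)
    best = (0, 0)
    for offset in range(-len(rc) + 1, len(top)):
        start = max(0, offset)
        end = min(len(top), offset + len(rc))
        if top[start:end] == rc[start - offset:end - offset]:
            best = max(best, (end - start, offset))
    return best


F = "TTCCGGGTGTTGGTG"        # oligodeoxynucleotide F, 5'->3'
G = "CCACAACCACAAGGC"[::-1]  # G is listed 3'->5' in the text; reversed here

paired, offset = best_duplex(F, G)
f_overhang = F[:offset]            # unpaired 5' end of F (valid for offset >= 0)
g_overhang = G[:len(G) - paired]   # unpaired 5' end of G
overhangs_pair = reverse_complement(f_overhang) == g_overhang

# Translate one 15-base repeat from a tandem copy of F.
# frame_start = 14 is an assumption chosen to reproduce the stated repeat.
tandem = F * 2
frame_start = 14
peptide = [CODONS[tandem[i:i + 3]] for i in range(frame_start, frame_start + 15, 3)]

print(paired, "bp paired;", "5' overhangs:", f_overhang, g_overhang)
print("overhangs complementary:", overhangs_pair)
print("-".join(peptide))
```

Under these assumptions the search should report the 10 base pair duplex with 5 base 5' overhangs described above, and the overhangs of F and G should be complementary to each other, which is what allows head-to-tail ligation into long repeats.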

Bacterial transformant colonies containing plasmids carrying synthetic gene inserts were identified and characterized by physical mapping using restriction enzymes as described in Example 3 for synthetic collagen analogue genes. Oligodeoxynucleotide F was used as a radiolabeled hybridization probe in these experiments. The hybridization temperature was 27°C using this probe and buffer washes were conducted at 30°C rather than 55°C.

EXAMPLE 10

Preparation of ApaI Linkers and Attachment to a Collagen Analogue Gene to Form a Gene Cassette

The following complementary and overlapping oligodeoxynucleotides were prepared and purified as described in Example 9:

H. 5' - GGGCCCCCG - 3'

I. 3' - GGCCCCGGG - 5'

The addition of phosphate to and the annealing of oligodeoxynucleotides H and I were as described in Example 9 except the solution in the annealing step was cooled to room temperature rather than 1°C. Proper annealing of oligodeoxynucleotides H and I to form ApaI linkers was followed by electrophoresis of samples on 20% polyacrylamide gels.
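
As a rough check on the linker design, the sketch below tests whether each linker strand carries the ApaI recognition sequence GGGCCC and whether the single-stranded tails left after the two site copies pair are complementary to one another. It is an illustration under stated assumptions, not a description of the procedure used: the variable and function names are hypothetical, and strand I is rewritten 5' to 3'.

```python
# Illustrative sketch; names are hypothetical, not from the source.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


APAI_SITE = "GGGCCC"          # ApaI recognition sequence
H = "GGGCCCCCG"               # oligodeoxynucleotide H, 5'->3'
I = "GGCCCCGGG"[::-1]         # I is listed 3'->5' in the text; reversed here

# The recognition sequence is its own reverse complement, so one copy on
# each strand can pair to form a double-stranded site.
site_is_palindromic = reverse_complement(APAI_SITE) == APAI_SITE
h_has_site = H.startswith(APAI_SITE)
i_has_site = I.startswith(APAI_SITE)

# Bases beyond the site remain single-stranded as 3' tails.
h_tail = H[len(APAI_SITE):]
i_tail = I[len(APAI_SITE):]
tails_compatible = reverse_complement(h_tail) == i_tail

print("site palindromic:", site_is_palindromic)
print("site on H / I:", h_has_site, i_has_site)
print("3' tails:", h_tail, i_tail, "compatible:", tails_compatible)
```

With the sequences as listed, the paired region is the 6 base pair recognition site and each strand keeps a 3 base 3' tail.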

The following oligodeoxynucleotide was then chemically prepared and purified as described in Example 1:

J. 5' - GGTCCGCCGGGTCCGCCG - 3'

The purified preparation of oligodeoxynucleotide J contained no detectable impurities, determined as described in Example 1. Oligodeoxynucleotide J is complementary to and overlaps with oligodeoxynucleotide B of Example 1. The addition of phosphate to and annealing and ligation of oligodeoxynucleotides B and J to form large synthetic genes was accomplished in substantially like manner to that described for oligodeoxynucleotides F and G of Example 9. Using those annealing conditions, the favored heteroduplex between oligodeoxynucleotides B and J is one that contains 15 base pairs with 3 base 3' overhanging ends on each oligodeoxynucleotide strand. The 3' overhanging ends can further base pair with one another to form, upon ligation, long synthetic collagen analogue genes coding for repeats of the amino acid sequence Gly-Pro-Pro. The synthetic DNA genes so formed were then size fractionated by chromatography on Sepharose 4B (Pharmacia) and the relative size of synthetic gene DNA in each fraction was assessed as described in Example 1. One factor which may ultimately limit the length of synthetic gene DNA attainable during the preceding step is the ligation of oligodeoxynucleotides lacking 5' phosphates at each growing end of these molecules. To ensure that the large synthetic gene DNA does contain phosphate on its 5' ends, the kinase reaction was repeated as described before in several Examples. ApaI linkers (23 pmol) were attached to a portion of this material (about 0.23 pmol) in the kinase reaction mixture after adjusting the ATP concentration to 2 mM and adding 4 units of T4 DNA ligase enzyme. After incubating the reaction mixture at 14°C overnight, any nicks and/or gaps were removed by adding 37 ul 5X polymerase buffer (5X=250 mM Tris-HCl, pH 7.8, 45 mM MgCl₂, 50 mM 2-mercaptoethanol, and 250 ug/ml of BSA), dATP, dCTP, dGTP and TTP to 1 mM, and 7 units of E. coli DNA polymerase I in a total volume of 185 ul. This reaction mixture was incubated at 14°C for 1 h, and then extracted once with phenol:chloroform (1:1). The DNA was then precipitated and digested with ApaI enzyme (up to 100 units) for as long as two days in reaction mixtures that contained 5 mM Tris-HCl, pH 7.4, 6 mM NaCl, 6 mM MgCl₂, 6 mM 2-mercaptoethanol, and 100 ug/ml of BSA. After extraction once with phenol:chloroform (1:1), the DNA products were passed over a Sepharose 4B (Pharmacia) column to remove digested ApaI linkers. The resulting synthetic collagen gene cassettes carrying ApaI linker ends were then ethanol precipitated after pooling the excluded fractions from the Sepharose 6B column.
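
The quantities in this step can be sanity-checked with simple arithmetic. The sketch below is only an illustration (the function name final_concentrations is hypothetical): it scales the stated 5X polymerase buffer by the stated volumes, 37 ul in a 185 ul reaction, and computes the molar excess of linkers over synthetic gene DNA from the stated 23 pmol and approximately 0.23 pmol.

```python
# Illustrative arithmetic; the function name is hypothetical.
# 5X polymerase buffer composition as given in the text.
STOCK_5X = {
    "Tris-HCl (mM)": 250,
    "MgCl2 (mM)": 45,
    "2-mercaptoethanol (mM)": 50,
    "BSA (ug/ml)": 250,
}


def final_concentrations(stock, stock_volume_ul, total_volume_ul):
    """Scale each stock concentration by stock volume / total volume."""
    return {
        name: conc * stock_volume_ul / total_volume_ul
        for name, conc in stock.items()
    }


working = final_concentrations(STOCK_5X, stock_volume_ul=37, total_volume_ul=185)
linker_molar_excess = 23 / 0.23  # pmol linkers / pmol synthetic gene DNA

for name, conc in working.items():
    print(f"{name}: {conc:g}")
print(f"linker excess over gene DNA: about {linker_molar_excess:.0f}-fold")
```

The volumes correspond to a fivefold dilution, so the buffer is used at 1X, and the linkers are present at roughly a hundredfold molar excess over the gene DNA.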

EXAMPLE 11

Transformation of DC 1133 with and Identification of Plasmids Bearing Collagen Analogue Genes With DNA Linkers (Gene Cassettes)

Collagen analogue gene cassettes carrying ApaI linker ends were ligated into pAVl DNA that had been digested with ApaI. Typical ligation reactions contained about 0.3 to 1.2 pmole collagen analogue gene cassettes and 0.03 to 0.12 pmole ApaI-digested plasmid vector in reaction volumes of 10-17 ul. Reaction buffer and other conditions were similar to those in previous Examples. The reactions were diluted to 1 ng total DNA per ul of solution using TE buffer as diluent and were then used to transform E. coli strain DC 1138 as described in Example 3. Three to 10 ng total DNA were used on each transformation plate. Identification of transformant colonies containing plasmids carrying collagen analogue gene cassettes was as described in Example 3. In addition, restriction mapping with the enzymes EcoRI and HindIII was used to give a rough estimate of the size of synthetic gene inserts in pAVl. After more accurate sizing of interesting synthetic gene inserts was completed using the restriction enzymes NdeI and HindIII, the size of the collagen analogue gene cassette was confirmed by digestion of the recombinant plasmid DNA with the enzymes BanII and/or ApaI. Many transformants containing plasmids carrying collagen analogue gene cassettes were identified. Characterization of some of these showed that gene inserts identifiable as single gene cassettes in the bacterial clones analyzed ranged from about 170 to about 350 base pairs in length.

EXAMPLE 12

Identification of a Bacterial Clone Containing a Plasmid With Multiple Collagen Analogue Gene Cassettes

Ligation, transformation, and identification of colonies containing plasmids carrying multiple collagen analogue gene cassettes were as described in Example 11. By using a ten to one molar excess of collagen analogue gene cassettes to plasmid expression vector DNA, the likelihood was increased in this Example that multiple cassettes would be incorporated into a single plasmid. One plasmid was identified as containing a synthetic gene insert of about 440 base pairs when digested with the restriction enzymes NdeI and HindIII. ApaI digestion of this plasmid (designated pAC95) yielded two collagen analogue gene cassettes, one about 200 base pairs and the other about 230 base pairs in length. The NdeI-HindIII synthetic gene cassette fragment was subcloned into the sequencing vector Bluescript M13+ (Stratagene, San Diego, CA) and RNA transcripts of the synthetic gene cassette fragment were prepared. These RNA transcripts were then sequenced with avian myeloblastosis reverse transcriptase enzyme and appropriate oligodeoxynucleotide primers according to the supplier's specifications. Sequence analysis of the NdeI-HindIII region of pAC95 which surrounds the two ApaI sites external to the insert DNA contained within pAC95 showed that the two gene cassettes were inserted in tandem within the ApaI sites of pAVl in the appropriate orientation with respect to the translation initiation signal sequence of the vector. Expression of the tandemly arranged gene cassettes in pAC95 should therefore produce the peptide Met-(Gly-Pro-Pro)₄₈-Gly-Pro.
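
The predicted peptide can be related to the fragment sizes reported above with a little arithmetic. The sketch below is only a consistency check under simple assumptions (three base pairs per codon, sizes taken as the approximate gel estimates quoted in the text); the variable names are hypothetical.

```python
# Illustrative arithmetic; variable names are hypothetical.
BP_PER_CODON = 3

repeat_unit = ("Gly", "Pro", "Pro")
n_repeats = 48

# Predicted peptide: Met-(Gly-Pro-Pro)48-Gly-Pro
peptide = ["Met"] + list(repeat_unit) * n_repeats + ["Gly", "Pro"]

repeat_coding_bp = n_repeats * len(repeat_unit) * BP_PER_CODON
total_coding_bp = len(peptide) * BP_PER_CODON

cassette_sizes_bp = (200, 230)   # approximate ApaI cassette sizes from the text
ndei_hindiii_insert_bp = 440     # approximate NdeI-HindIII insert size from the text

print("residues:", len(peptide))
print("bp encoding the 48 repeats:", repeat_coding_bp)
print("bp encoding the whole peptide:", total_coding_bp)
print("sum of the two cassettes (approx.):", sum(cassette_sizes_bp))
print("NdeI-HindIII insert (approx.):", ndei_hindiii_insert_bp)
```

The computed coding lengths fall close to the approximate insert and cassette sizes estimated by restriction mapping, which is consistent with two cassettes inserted in tandem.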

Claims

WHAT IS CLAIMED IS:

(1) A method of producing double-stranded DNA fragments coding for polypeptides composed of a repeating amino acid sequence, said method comprising the steps of:

(a) annealing at least two complementary DNA oligodeoxynucleotides which have phosphorylated 5' ends and which partially overlap upon base pairing such that the number of perfect base pairs formed upon annealing between any two oligodeoxynucleotides in the mixture does not equal the number of unpaired bases in the annealed strands and does not equal zero, and which code for one or more desired amino acid sequences, by heating a mixture comprising at least two such DNA oligodeoxynucleotides and thereafter cooling said mixture to allow formation of stable base pairs between complementary sequences oriented anti-parallel with respect to their 5' to 3' polarity;

(b) treating said mixture of annealed DNA oligodeoxynucleotides with a ligase enzyme to covalently link adjacent oligodeoxynucleotides with the same 5' to 3' polarity into longer double-stranded DNA segments; and

(c) enzymatically attaching double-stranded linker DNAs to said double-stranded DNA segments to provide double-stranded DNA fragments having linkers attached to the ends thereof, said linker DNAs including at least one restriction enzyme recognition site which does not occur within said DNA segments and which occurs no more than once within the DNA of some plasmid vector, and said linkers adapted to maintain the genetic code reading frame and to maintain the repeating amino acid sequence of one or more of said DNA segments when said segments are attached enzymatically in tandem to said plasmid vector.

(2) A process according to claim 1 which further comprises partially or fully cleaving said linkers with an appropriate restriction enzyme so as to eliminate multimeric forms of linker ends on said DNA fragments.

(3) A method according to claim 1 which further comprises treating said double-stranded DNA segments with a DNA polymerase enzyme to totally or partially remove nicks or gaps in said double-stranded DNA segments.

(4) A method according to claim 1 wherein in step (a) said mixture is cooled at a rate sufficient and to an extent to allow formation of hybridized DNA oligodeoxynucleotides base paired to provide the maximum amount of overlap between said oligodeoxynucleotides.

(5) A method according to claim 1 wherein said mixture comprises two complementary oligodeoxynucleotides.

(6) A method according to claim 1 wherein the ligation reactions of steps (b) and (c) are carried out simultaneously after addition of the said linker DNAs.

(7) A method according to claim 1 wherein said linker DNAs are selected such that they provide non-equivalent ends when attached to said double-stranded DNA segments.

(8) A method according to claim 1 wherein said DNA oligodeoxynucleotides and said linker DNAs are synthetic.

(9) A method according to claim 1 wherein said mixture comprises more than two species of complementary DNA fragments such that said oligomerized DNA fragments are heteropolymers which code for polypeptides having two or more amino acid sequences present as random block peptide copolymers.

(10) A method according to claim 1 wherein two or more species of DNA fragments are further mixed and joined by a ligase enzyme into oligomerized double-stranded DNA fragments, each species of said DNA fragment mixture having at least one single-stranded chain end compatible for base pairing to at least one other single-stranded chain end of a distinct species of DNA fragment, and such oligomerized double-stranded DNA fragments comprising joined fragments which code for heteropolypeptides comprising two or more joined and contiguous random or block repeating amino acid sequences, or joined fragments which code for polypeptides having a single repeating amino sequences, which homopolypeptides having a higher molecular weight than the polypeptide coded for by the joined DNA fragments.

(11) A method according to claim 10 wherein said larger double-stranded DNA fragment codes for heteropolypeptides comprising two or more joined and contiguous random or block repeating amino acid sequences.

(12) A method according to claim 11 wherein said species of DNA fragments are a mixture of synthetic DNA fragments and natural DNA fragments, said natural DNA fragments being comprised of natural DNA genes or gene segments or complementary DNA copies of natural messenger RNA sequences, and wherein said oligomerized DNA fragments comprise contiguous random or alternating block repeats of said synthetic DNA fragments and said natural DNA fragments which code for heteropolypeptides comprising two or more joined and contiguous random or alternating block repeating amino acid sequences.

(13) A method according to claim 1 or claim 10 which further comprises isolating said DNA fragments.

(14) A method according to claim 13 wherein said isolated DNA fragments are more than about 75 base pairs in length.

(15) A method according to claim 14 wherein said isolated DNA fragments are more than about 100 base pairs in length.

(16) A method according to claim 15 wherein said linker DNAs are selected such that when treated with an appropriate restriction enzyme capable of cutting at least one enzyme recognition site in said linker DNAs they provide equivalent ends.

(17) A method according to claim 1 wherein said linker DNAs are comprised of the oligodeoxynucleotides with sequences

5'-GGGCCCCCG-3' and 5'-GGGCCCCGG-3'.

(18) A method according to claim 1 wherein said linker DNAs are selected such that when treated with an appropriate restriction enzyme capable of cutting at least one enzyme recognition site in said linker DNAs they provide non-equivalent ends.

(19) A method according to claim 18 wherein said linker DNAs are comprised of the oligodeoxynucleotides with sequences

5'-GGGCCGCCAGGGCCGCCG-3' and 5'-CGGCCCTGGCGGCCCCGG-3'.

(20) A method according to claim 1 wherein said DNA fragments code for a repeating collagen analogue tripeptide sequence, a repeating elastin analogue pentapeptide sequence, or a random or alternating polypeptide copolymer comprised of joined and contiguous collagen analogue tripeptide and elastin analogue pentapeptide sequences.

(21) The DNA fragment produced by the method of claim 1.

(22) The DNA fragment produced by the method of claim 12.

(23) The fragment of claim 21 of at least about 75 base pairs in length.

(24) The fragment of claim 23 of at least about 100 base pairs in length.

(25) A replicable plasmid cloning vehicle, said vehicle capable of expressing the DNA fragment of claim 21 when said vehicle containing said DNA fragment is cloned in a suitable host microbial organism.

(26) A method of forming a recombinant plasmid comprising a replicable plasmid cloning vehicle according to claim 25 and the DNA fragment of claim 21 said method comprising the steps of:

(a) cleaving a plasmid cloning vehicle at a predetermined restriction endonuclease recognition site; and

(b) enzymatically inserting one or more of the DNA fragments of claim 21 at said site, such that said DNA fragments are under the control of a regulatable gene promoter sequence in said plasmid cloning vehicle and whereby said inserted DNA fragments are expressible to form polypeptides composed of units of one or more repeating amino acid sequences when said recombinant plasmid is cloned into a suitable host microbial organism.

(27) The method of claim 26 wherein two or more said DNA fragments are inserted in tandem in said plasmid cloning vehicle so as to maintain the genetic code reading frame of said DNA fragments relative to translation initiation DNA sequences in said plasmid vector and to maintain one or more of the repeating amino acid sequences through and between inserted DNA fragments.

(28) The method of claim 27 wherein said two or more DNA fragments code totally for the same repeating amino acid sequence and wherein said inserted fragments are expressible to form polypeptides comprising repeating units of said sequence.

(29) The method of claim 27 wherein said two or more DNA fragments code totally or in part for different repeating amino acid sequences and wherein said inserted fragments are expressible to form block or random heteropolypeptides comprising repeat units of said amino acid sequences in block or in random.

(30) The process of claim 26 which further comprises enzymatically inserting a naturally occurring DNA fragment or a complementary DNA copy of all or a portion of a naturally occurring messenger RNA with said one or more DNA fragments.

(31) The recombinant plasmid prepared by the method of claim 26.

(32) A method of forming a microbial organism capable of producing polypeptides composed of repeating amino acid sequences, said method comprising the steps of transforming a recombination-deficient microorganism with the recombinant plasmid of claim 31.

(33) A method of claim 32 wherein said microbial organism is E. coli.

(34) A microbial organism which has been transformed in accordance with the process of claim 32.

(35) A method of producing a polypeptide composed of repeating known amino acid sequences which comprises:

(a) growing said microbial organisms of claim 34 in a culture medium to effect expression of said DNA fragment or fragments containing coding sequences for said polypeptide, thereby forming said polypeptides;

(b) isolating from said microbial organisms a fraction comprising said polypeptide; and

(c) purifying said fraction to provide said polypeptide.

(36) The polypeptide prepared according to claim 35.