AU762399B2 - Recombinant narbonolide polyketide synthase - Google Patents

Recombinant narbonolide polyketide synthase

Publication number
AU762399B2
AU762399B2 AU42137/99A AU4213799A AU762399B2 AU 762399 B2 AU762399 B2 AU 762399B2 AU 42137/99 A AU42137/99 A AU 42137/99A AU 4213799 A AU4213799 A AU 4213799A AU 762399 B2 AU762399 B2 AU 762399B2
Authority
AU
Australia
Prior art keywords
pks
narbonolide
gene
streptomyces
recombinant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU42137/99A
Other versions
AU762399C (en
AU4213799A (en
Inventor
Gary Ashley
Mary Betlach
Melanie C. Betlach
Robert Mcdaniel
Li Tang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kosan Biosciences Inc
Original Assignee
Kosan Biosciences Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/141,908 (US6503741B1)
Application filed by Kosan Biosciences Inc
Publication of AU4213799A
Publication of AU762399B2
Application granted
Publication of AU762399C
Anticipated expiration
Ceased

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/65: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression using markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)

Description

WO 99/61599 PCT/US99/11814
RECOMBINANT NARBONOLIDE POLYKETIDE SYNTHASE
Reference to Government Funding
This invention was supported in part by SBIR grant 1R43-CA75792-01. The U.S. government has certain rights in this invention.
Field of the Invention
The present invention provides recombinant methods and materials for producing polyketides by recombinant DNA technology. More specifically, it relates to narbonolides and derivatives thereof. The invention relates to the fields of agriculture, animal husbandry, chemistry, medicinal chemistry, medicine, molecular biology, pharmacology, and veterinary technology.
Background of the Invention
Polyketides represent a large family of diverse compounds synthesized from 2-carbon units through a series of condensations and subsequent modifications. Polyketides occur in many types of organisms, including fungi and mycelial bacteria, in particular, the actinomycetes. There is a wide variety of polyketide structures, and the class of polyketides encompasses numerous compounds with diverse activities. Tetracycline, erythromycin, FK506, FK520, narbomycin, picromycin, rapamycin, spinosyn, and tylosin are examples of such compounds. Given the difficulty in producing polyketide compounds by traditional chemical methodology, and the typically low production of polyketides in wild-type cells, there has been considerable interest in finding improved or alternate means to produce polyketide compounds. See PCT publication Nos. WO 93/13663; WO 95/08548; WO 96/40968; WO 97/02358; and WO 98/27203; United States Patent Nos. 4,874,748; 5,063,155; 5,098,837; 5,149,639; 5,672,491; and 5,712,146; Fu et al., 1994, Biochemistry 33: 9321-9326; McDaniel et al., 1993, Science 262: 1546-1550; and Rohr, 1995, Angew. Chem. Int. Ed. Engl. 34(8): 881-888, each of which is incorporated herein by reference.
Polyketides are synthesized in nature by polyketide synthase (PKS) enzymes. These enzymes, which are complexes of multiple large proteins, are similar to the synthases that catalyze condensation of 2-carbon units in the biosynthesis of fatty acids. PKS enzymes are encoded by PKS genes that usually consist of three or more open reading frames (ORFs).
Two major types of PKS enzymes are known; these differ in their composition and mode of synthesis. These two major types of PKS enzymes are commonly referred to as Type I or "modular" and Type II or "iterative" PKS enzymes.
Modular PKSs are responsible for producing a large number of 12-, 14-, and 16-membered macrolide antibiotics including methymycin, erythromycin, narbomycin, picromycin, and tylosin. These large multifunctional enzymes (>300 kDa) catalyze the biosynthesis of polyketide macrolactones through multistep pathways involving decarboxylative condensations between acyl thioesters followed by cycles of varying beta-carbon processing activities (see O'Hagan, D. The polyketide metabolites; E. Horwood: New York, 1991, incorporated herein by reference). The modular PKSs are generally encoded in multiple ORFs. Each ORF typically comprises two or more "modules" of ketosynthase activity, each module of which consists of at least two (if a loading module) and more typically three or more enzymatic activities or "domains."
During the past half decade, the study of modular PKS function and specificity has been greatly facilitated by the plasmid-based Streptomyces coelicolor expression system developed with the 6-deoxyerythronolide B (6-dEB) synthase (DEBS) genes (see Kao et al., 1994, Science, 265: 509-512, McDaniel et al., 1993, Science 262: 1546-1550, and U.S. Patent Nos. 5,672,491 and 5,712,146, each of which is incorporated herein by reference). The advantages of this plasmid-based genetic system for DEBS were that it overcame the tedious and limited techniques for manipulating the natural DEBS host organism, Saccharopolyspora erythraea, allowed more facile construction of recombinant PKSs, and reduced the complexity of PKS analysis by providing a "clean" host background. This system also expedited construction of the first combinatorial modular polyketide library in Streptomyces (see PCT publication No. WO 98/49315, incorporated herein by reference).
The ability to control aspects of polyketide biosynthesis, such as monomer selection and degree of beta-carbon processing, by genetic manipulation of PKSs has stimulated great interest in the combinatorial engineering of novel antibiotics (see Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329; Carreras and Santi, 1998, Curr. Opin. Biotech. 9: 403-411; and U.S. Patent Nos. 5,712,146 and 5,672,491, each of which is incorporated herein by reference). This interest has resulted in the cloning, analysis, and manipulation by recombinant DNA technology of genes that encode PKS enzymes. The resulting technology allows one to manipulate a known PKS gene cluster either to produce the polyketide synthesized by that PKS at higher levels than occur in nature or in hosts that otherwise do not produce the polyketide. The technology also allows one to produce molecules that are structurally related to, but distinct from, the polyketides produced from known PKS gene clusters. It has been possible to manipulate modular PKS genes other than the narbonolide PKS using generally known recombinant techniques to obtain altered and hybrid forms. See U.S. Patent Nos. 5,672,491 and 5,712,146 and PCT publication No. WO 98/49315. See also Lau et al., 1999, "Dissecting the role of acyltransferase domains of modular polyketide synthases in the choice and stereochemical fate of extender units," Biochemistry 38(5): 1643-1651, and Gokhale et al., 16 Apr. 1999, "Dissecting and Exploiting Intermodular Communication in Polyketide Synthases," Science 284: 482-485.
The present invention provides methods and reagents relating to the modular PKS gene cluster for the polyketide antibiotics known as narbomycin and picromycin.
Narbomycin is produced in Streptomyces narbonensis, and both narbomycin and picromycin are produced in S. venezuelae. These species are unique among macrolide-producing organisms in that they produce, in addition to the 14-membered macrolides narbomycin and picromycin (picromycin is shown in Figure 1, compound 1), the 12-membered macrolides neomethymycin and methymycin (methymycin is shown in Figure 1, compound 2).
Narbomycin differs from picromycin only by lacking the hydroxyl at position 12. Based on the structural similarities between picromycin and methymycin, it was speculated that methymycin would result from premature cyclization of a hexaketide intermediate in the picromycin pathway.
Glycosylation of the C5 hydroxyl group of the polyketide precursor, narbonolide, is achieved through an endogenous desosaminyl transferase to produce narbomycin. In Streptomyces venezuelae, narbomycin is then converted to picromycin by the endogenously produced narbomycin hydroxylase. (See Figure 1) Thus, as in the case of other macrolide antibiotics, the macrolide product of the narbonolide PKS is further modified by hydroxylation and glycosylation. Figure 1 also shows the metabolic relationships of the compounds discussed above.
Picromycin (Figure 1, compound 1) is of particular interest because of its close structural relationship to ketolide compounds such as HMR 3004 (Figure 1, compound 3). The ketolides are a new class of semi-synthetic macrolides with activity against pathogens resistant to erythromycin (see Agouridas et al., 1998, J. Med. Chem. 41: 4080-4100, incorporated herein by reference). Thus, genetic systems that allow rapid engineering of the narbonolide PKS would be valuable for creating novel ketolide analogs for pharmaceutical applications. Furthermore, the production of picromycin as well as novel compounds with useful activity could be accomplished if the heterologous expression of the narbonolide PKS in Streptomyces lividans and other host cells were possible. The present invention meets these and other needs.
Disclosure of the Invention
The present invention provides recombinant methods and materials for expressing PKSs derived in whole and in part from the narbonolide PKS and other genes involved in narbomycin and picromycin biosynthesis in recombinant host cells. The invention also provides the polyketides derived from the narbonolide PKS. The invention provides the complete PKS gene cluster that ultimately results, in Streptomyces venezuelae, in the production of picromycin. The ketolide product of this PKS is narbonolide. Narbonolide is glycosylated to obtain narbomycin and then hydroxylated at C12 to obtain picromycin. The enzymes responsible for the glycosylation and hydroxylation are also provided in recombinant form by the invention.
Thus, in one embodiment, the invention is directed to recombinant materials that contain nucleotide sequences encoding at least one domain, module, or protein encoded by a narbonolide PKS gene. The recombinant materials may be "isolated." The invention also provides recombinant materials useful for conversion of ketolides to antibiotics. These materials include recombinant DNA compounds that encode the C12 hydroxylase (the picK gene), the desosamine biosynthesis and desosaminyl transferase enzymes, and the beta-glucosidase enzyme involved in picromycin biosynthesis in S. venezuelae and the recombinant proteins that can be produced from these nucleic acids in the recombinant host cells of the invention.
In one embodiment, the invention provides a recombinant expression system that comprises a heterologous promoter positioned to drive expression of the narbonolide PKS, including a "hybrid" narbonolide PKS. In a preferred embodiment, the promoter is derived from a PKS gene. In a related embodiment, the invention provides recombinant host cells comprising the vector that produces narbonolide. In a preferred embodiment, the host cell is Streptomyces lividans or S. coelicolor.
In another embodiment, the invention provides a recombinant expression system that comprises the desosamine biosynthetic genes as well as the desosaminyl transferase gene. In a related embodiment, the invention provides recombinant host cells comprising a vector that produces the desosamine biosynthetic gene products and desosaminyl transferase gene product. In a preferred embodiment, the host cell is Streptomyces lividans or S. coelicolor.
In another embodiment, the invention provides a method for desosaminylating polyketide compounds in recombinant host cells, which method comprises expressing the PKS for the polyketide and the desosaminyl transferase and desosamine biosynthetic genes in a host cell. In a preferred embodiment, the host cell expresses a beta-glucosidase gene as well. This preferred method is especially advantageous when producing desosaminylated polyketides in Streptomyces host cells, because such host cells typically glucosylate desosamine residues of polyketides, which can decrease desired activity, such as antibiotic activity. By coexpression of beta-glucosidase, the glucose residue is removed from the polyketide.
In another embodiment, the invention provides the picK hydroxylase gene in recombinant form and methods for hydroxylating polyketides with the recombinant gene product. The invention also provides polyketides thus produced and the antibiotics or other useful compounds derived therefrom.
In another embodiment, the invention provides a recombinant expression system that comprises a promoter positioned to drive expression of a "hybrid" PKS comprising all or part of the narbonolide PKS and at least a part of a second PKS, or comprising a narbonolide PKS modified by deletions, insertions and/or substitutions. In a related embodiment, the invention provides recombinant host cells comprising the vector that produces the hybrid PKS and its corresponding polyketide. In a preferred embodiment, the host cell is Streptomyces lividans or S. coelicolor.
In a related embodiment, the invention provides recombinant materials for the production of libraries of polyketides wherein the polyketide members of the library are synthesized by hybrid PKS enzymes of the invention. The resulting polyketides can be further modified to convert them to other useful compounds, such as antibiotics, typically through hydroxylation and/or glycosylation. Modified macrolides provided by the invention that are useful intermediates in the preparation of antibiotics are of particular benefit.
In another related embodiment, the invention provides a method to prepare a nucleic acid that encodes a modified PKS, which method comprises using the narbonolide PKS encoding sequence as a scaffold and modifying the portions of the nucleotide sequence that encode enzymatic activities, either by mutagenesis, inactivation, insertion, or replacement.
The thus modified narbonolide PKS encoding nucleotide sequence can then be expressed in a suitable host cell and the cell employed to produce a polyketide different from that produced by the narbonolide PKS. In addition, portions of the narbonolide PKS coding sequence can be inserted into other PKS coding sequences to modify the products thereof. The narbonolide PKS can itself be manipulated, for example, by fusing two or more of its open reading frames, particularly those for extender modules 5 and 6, to make more efficient the production of 14-membered as opposed to 12-membered macrolides.
In another related embodiment, the invention is directed to a multiplicity of cell colonies, constituting a library of colonies, wherein each colony of the library contains an expression vector for the production of a modular PKS derived in whole or in part from the narbonolide PKS. Thus, at least a portion of the modular PKS is identical to that found in the PKS that produces narbonolide and is identifiable as such. The derived portion can be prepared synthetically or directly from DNA derived from organisms that produce narbonolide. In addition, the invention provides methods to screen the resulting polyketide and antibiotic libraries.
The invention also provides novel polyketides and antibiotics or other useful compounds derived therefrom. The compounds of the invention can be used in the manufacture of another compound. In a preferred embodiment, the antibiotic compounds of the invention are formulated in a mixture or solution for administration to an animal or human.
These and other embodiments of the invention are described in more detail in the following description, the examples, and claims set forth below.
Brief Description of the Figures
Figure 1 shows the structures of picromycin (compound 1), methymycin (compound 2), and the ketolide HMR 3004 (compound 3) and the relationship of several compounds related to picromycin.
Figure 2 shows a restriction site and function map of cosmid pKOS023-27.
Figure 3 shows a restriction site and function map of cosmid pKOS023-26.
Figure 4 has three parts. In Part A, the structures of picromycin and methymycin are shown, as well as the related structures of narbomycin, narbonolide, and methynolide. In the structures, the bolded lines indicate the two or three carbon chains produced by each module (loading and extender) of the narbonolide PKS. Part B shows the organization of the narbonolide PKS genes on the chromosome of Streptomyces venezuelae, including the location of the various module encoding sequences (the loading module domains are identified as sKS*, sAT, and sACP), as well as the picB thioesterase gene and two desosamine biosynthesis genes (picCII and picCIII). Part C shows the engineering of the S. venezuelae host of the invention in which the picAI gene has been deleted. In the Figure, ACP is acyl carrier protein; AT is acyltransferase; DH is dehydratase; ER is enoylreductase; KR is ketoreductase; KS is ketosynthase; and TE is thioesterase.
Figure 5 shows the narbonolide PKS genes encoded by plasmid pKOS039-86, the compounds synthesized by each module of that PKS, and the narbonolide (compound 4) and 10-deoxymethynolide (compound 5) products produced in heterologous host cells transformed with the plasmid. The Figure also shows a hybrid PKS of the invention produced by plasmid pKOS038-18, which encodes a hybrid of DEBS and the narbonolide PKS. The Figure also shows the compound, 3,6-dideoxy-3-oxo-erythronolide B (compound 6), produced in heterologous host cells comprising the plasmid.
Figure 6 shows a restriction site and function map of plasmid pKOS039-104, which contains the desosamine biosynthetic, beta-glucosidase, and desosaminyl transferase genes under transcriptional control of actII-4.
Modes of Carrying out the Invention
The present invention provides useful compounds and methods for producing polyketides in recombinant host cells. As used herein, the term recombinant refers to a compound or composition produced by human intervention. The invention provides recombinant DNA compounds encoding all or a portion of the narbonolide PKS. The invention also provides recombinant DNA compounds encoding the enzymes that catalyze the further modification of the ketolides produced by the narbonolide PKS. The invention provides recombinant expression vectors useful in producing the narbonolide PKS and hybrid PKSs composed of a portion of the narbonolide PKS in recombinant host cells. Thus, the invention also provides the narbonolide PKS, hybrid PKSs, and polyketide modification enzymes in recombinant form. The invention provides the polyketides produced by the recombinant PKS and polyketide modification enzymes. In particular, the invention provides methods for producing the polyketides 10-deoxymethynolide, narbonolide, YC17, narbomycin, methymycin, neomethymycin, and picromycin in recombinant host cells.
To appreciate the many and diverse benefits and applications of the invention, the description of the invention below is organized as follows. First, a general description of polyketide biosynthesis and an overview of the synthesis of narbonolide and compounds derived therefrom in Streptomyces venezuelae are provided. This general description and overview are followed by a detailed description of the invention in six sections. In Section I, the recombinant narbonolide PKS provided by the invention is described. In Section II, the recombinant desosamine biosynthesis genes, the desosaminyl transferase gene, and the beta-glucosidase gene provided by the invention are described. In Section III, the recombinant picK hydroxylase gene provided by the invention is described. In Section IV, methods for heterologous expression of the narbonolide PKS and narbonolide modification enzymes provided by the invention are described. In Section V, the hybrid PKS genes provided by the invention and the polyketides produced thereby are described. In Section VI, the polyketide compounds provided by the invention and pharmaceutical compositions of those compounds are described. The detailed description is followed by a variety of working examples illustrating the invention.
The narbonolide synthase gene, like other PKS genes, is composed of coding sequences organized in a loading module, a number of extender modules, and a thioesterase domain. As described more fully below, each of these domains and modules is a polypeptide with one or more specific functions. Generally, the loading module is responsible for binding the first building block used to synthesize the polyketide and transferring it to the first extender module. The building blocks used to form complex polyketides are typically acylthioesters, most commonly acetyl, propionyl, malonyl, methylmalonyl, and ethylmalonyl CoA. Other building blocks include amino acid-like acylthioesters. PKSs catalyze the biosynthesis of polyketides through repeated, decarboxylative Claisen condensations between the acylthioester building blocks. Each module is responsible for binding a building block, performing one or more functions on that building block, and transferring the resulting compound to the next module. The next module, in turn, is responsible for attaching the next building block and transferring the growing compound to the next module until synthesis is complete. At that point, an enzymatic thioesterase activity cleaves the polyketide from the PKS. See, generally, the Figures. Such modular organization is characteristic of the modular class of PKS enzymes that synthesize complex polyketides and is well known in the art. The polyketide known as 6-deoxyerythronolide B is a classic example of this type of complex polyketide. The genes, known as eryAI, eryAII, and eryAIII (also referred to herein as the DEBS genes, for the proteins, known as DEBS1, DEBS2, and DEBS3, that comprise the 6-dEB synthase), that code for the multi-subunit protein known as DEBS that synthesizes 6-dEB, the precursor polyketide to erythromycin, are described in U.S. Patent No. 5,824,513, incorporated herein by reference.
Recombinant methods for manipulating modular PKS genes are described in U.S. Patent Nos. 5,672,491; 5,843,718; 5,830,750; and 5,712,146; and in PCT publication Nos. WO 98/49315 and WO 97/02358, each of which is incorporated herein by reference.
The loading module of DEBS consists of two domains, an acyl-transferase (AT) domain and an acyl carrier protein (ACP) domain. Each extender module of DEBS, like those of other modular PKS enzymes, contains ketosynthase (KS), AT, and ACP domains, and zero, one, two, or three domains for enzymatic activities that modify the beta-carbon of the growing polyketide chain. A module can also contain domains for other enzymatic activities, such as, for example, a methyltransferase or dimethyltransferase activity. Finally, the releasing domain contains a thioesterase and, often, a cyclase activity.
The AT domain of the loading module recognizes a particular acyl-CoA (usually acetyl or propionyl but sometimes butyryl) and transfers it as a thiol ester to the ACP of the loading module. Concurrently, the AT on each of the extender modules recognizes a particular extender-CoA (malonyl or alpha-substituted malonyl, such as methylmalonyl, ethylmalonyl, and carboxylglycolyl) and transfers it to the ACP of that module to form a thioester. Once the PKS is primed with acyl- and malonyl-ACPs, the acyl group of the loading module migrates to form a thiol ester (trans-esterification) at the KS of the first extender module; at this stage, extender module 1 possesses an acyl-KS adjacent to a malonyl (or substituted malonyl) ACP. The acyl group derived from the loading module is then covalently attached to the alpha-carbon of the malonyl group to form a carbon-carbon bond, driven by concomitant decarboxylation, and generating a new acyl-ACP that has a backbone two carbons longer than the loading unit (elongation or extension). The growing polyketide chain is transferred from the ACP to the KS of the next module, and the process continues.
The polyketide chain, growing by two carbons each module, is sequentially passed as covalently bound thiol esters from module to module, in an assembly line-like process. The carbon chain produced by this process alone would possess a ketone at every other carbon atom, producing a polyketone, from which the name polyketide arises. Most commonly, however, additional enzymatic activities modify the beta keto group of each two-carbon unit just after it has been added to the growing polyketide chain, but before it is transferred to the next module. Thus, in addition to the minimal module containing KS, AT, and ACP domains necessary to form the carbon-carbon bond, modules may contain a ketoreductase (KR) that reduces the keto group to an alcohol. Modules may also contain a KR plus a dehydratase (DH) that dehydrates the alcohol to a double bond. Modules may also contain a KR, a DH, and an enoylreductase (ER) that converts the double bond to a saturated single bond using the beta carbon as a methylene function. As noted above, modules may contain additional enzymatic activities as well.
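The module-by-module logic described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not a description of any particular PKS: the function names and the example domain lists are hypothetical, only the KR, DH, and ER activities are modeled, and side-chain carbons contributed by substituted malonyl extender units are ignored.

```python
# Illustrative model only. Function names and the domain lists used with
# them are hypothetical; they do not describe any specific PKS.

CORE_DOMAINS = {"KS", "AT", "ACP"}


def beta_carbon_state(domains):
    """Return the state of the beta-carbon left by one extender module."""
    if not CORE_DOMAINS.issubset(domains):
        raise ValueError("an extender module needs KS, AT, and ACP domains")
    reductive = set(domains) - CORE_DOMAINS
    if not reductive:
        return "ketone"        # no processing: the beta-keto group remains
    if reductive == {"KR"}:
        return "alcohol"       # KR reduces the keto group
    if reductive == {"KR", "DH"}:
        return "double bond"   # DH dehydrates the alcohol
    if reductive == {"KR", "DH", "ER"}:
        return "methylene"     # ER saturates the double bond
    raise ValueError(f"unsupported domain combination: {sorted(reductive)}")


def run_assembly_line(starter_carbons, modules):
    """Track main-chain length and beta-carbon states module by module."""
    backbone = starter_carbons
    states = []
    for domains in modules:
        backbone += 2  # each condensation extends the main chain by two carbons
        states.append(beta_carbon_state(domains))
    return backbone, states
```

In this model, a three-carbon starter unit passed through six extender modules gives a 15-carbon main chain, and the list of states records which beta-carbons remain as ketones, alcohols, double bonds, or methylenes.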
Once a polyketide chain traverses the final extender module of a PKS, it encounters the releasing domain or thioesterase found at the carboxyl end of most PKSs. Here, the polyketide is cleaved from the enzyme and cyclized. The resulting polyketide can be modified further by tailoring enzymes; these enzymes add carbohydrate groups or methyl groups, or make other modifications, such as oxidation or reduction, on the polyketide core molecule.
While the above description applies generally to modular PKS enzymes, there are a number of variations that exist in nature. For example, some polyketides, such as epothilone, incorporate a building block that is derived from an amino acid. PKS enzymes for such polyketides include an activity that functions as an amino acid ligase or as a non-ribosomal peptide synthetase (NRPS). Another example of a variation, which is actually found more often than the two domain loading module construct found in DEBS, occurs when the loading module of the PKS is not composed of an AT and an ACP but instead utilizes an inactivated KS, an AT, and an ACP. This inactivated KS is in most instances called KS^Q, where the superscript letter is the abbreviation for the amino acid, glutamine, that is present instead of the active site cysteine required for activity. For example, the narbonolide PKS loading module contains a KS^Q. Yet another example of a variation has been mentioned above in the context of modules that include a methyltransferase or dimethyltransferase activity; modules can also include an epimerase activity. These variations will be described further below in specific reference to the narbonolide PKS and the various recombinant and hybrid PKSs provided by the invention.
With this general description of polyketide biosynthesis, one can better appreciate the biosynthesis of narbonolide-related polyketides in Streptomyces venezuelae and S. narbonensis. The narbonolide PKS produces two polyketide products, narbonolide and 10-deoxymethynolide. Narbonolide is the polyketide product of all six extender modules of the narbonolide PKS. 10-deoxymethynolide is the polyketide product of only the first five extender modules of the narbonolide PKS. These two polyketides are desosaminylated to yield narbomycin and YC17, respectively. These two glycosylated polyketides are the final products produced in S. narbonensis. In S. venezuelae, these products are hydroxylated by the picK gene product to yield picromycin and either methymycin (hydroxylation at the C10 position of YC17) or neomethymycin (hydroxylation at the C12 position of YC17). (See Figure 1.) The present invention provides the genes required for the biosynthesis of all of these polyketides in recombinant form.
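The relationships among these compounds can be summarized in a small Python sketch. The compound relationships are those stated in the preceding paragraph; the dictionary layout, the function name, and the boolean flag are illustrative assumptions introduced here.

```python
# Relationships taken from the text above; the names and layout are
# illustrative assumptions, not terminology from the source.

# Macrolactone -> desosaminylated product
DESOSAMINYLATION = {
    "narbonolide": "narbomycin",    # narbonolide: all six extender modules
    "10-deoxymethynolide": "YC17",  # 10-deoxymethynolide: first five modules
}

# Desosaminylated product -> hydroxylated product(s) formed by the picK product
HYDROXYLATION = {
    "narbomycin": ["picromycin"],
    "YC17": ["methymycin", "neomethymycin"],  # hydroxylation at C10 or C12
}


def final_products(has_hydroxylase):
    """List end products with or without the picK hydroxylase activity."""
    products = []
    for glycoside in DESOSAMINYLATION.values():
        if has_hydroxylase:
            products.extend(HYDROXYLATION[glycoside])
        else:
            products.append(glycoside)
    return products
```

Calling `final_products(False)` corresponds to the S. narbonensis case described above, and `final_products(True)` to the S. venezuelae case.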
Section I: The Narbonolide PKS
The narbonolide PKS is composed of a loading module, six extender modules, and two thioesterase domains, one of which is on a separate protein. Figure 4, part B, shows the organization of the narbonolide PKS genes on the Streptomyces venezuelae chromosome, as well as the location of the module encoding sequences in those genes, and the various domains within those modules. In the Figure, the loading module is not numbered, and its domains are indicated as sKS*, sAT, and sACP. Also shown in the Figure, part A, are the structures of picromycin and methymycin.
The loading and six extender modules and the thioesterase domain of the narbonolide PKS reside on four proteins, designated PICAI, PICAII, PICAIII, and PICAIV. PICAI includes the loading module and extender modules 1 and 2 of the PKS. PICAII includes extender modules 3 and 4. PICAIII includes extender module 5. PICAIV includes extender module 6 and a thioesterase domain. There is a second thioesterase domain (TEII) on a separate protein, designated PICB. The amino acid sequences of these proteins are shown below.
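As a compact restatement of this layout, the following Python sketch records which protein carries which module and derives the ring sizes mentioned earlier. The module placement is taken from the paragraph above; the identifiers are hypothetical, and the ring-size formula is a simplifying assumption (two ring carbons per extender module, one carbon from the starter unit, and the ring oxygen, with ring closure onto a hydroxyl on the starter-derived carbon).

```python
# Module placement as stated in the text; all identifiers are illustrative.
PROTEIN_CONTENTS = {
    "PICAI": ["loading", "extender 1", "extender 2"],
    "PICAII": ["extender 3", "extender 4"],
    "PICAIII": ["extender 5"],
    "PICAIV": ["extender 6", "thioesterase"],
}


def proteins_carrying(n_extender):
    """Proteins that carry the loading module and extender modules 1..n."""
    wanted = {"loading"} | {f"extender {i}" for i in range(1, n_extender + 1)}
    return [name for name, parts in PROTEIN_CONTENTS.items()
            if wanted.intersection(parts)]


def ring_size(n_extender):
    """Ring atoms under the assumption stated in the lead-in paragraph.

    Two ring carbons per extender module, plus one carbon from the
    starter unit, plus the ring oxygen.
    """
    return 2 * n_extender + 2
```

Under these assumptions, six extender modules correspond to a 14-membered ring and five to a 12-membered ring, matching the 14-membered and 12-membered macrolides described for this PKS.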
Amino acid sequence of narbonolide synthase subunit 1, PICAI (SEQ ID NO: 1)
1 MSTVSKSESE EFVSVSNDAG SAHGTAEPVA VVGISCRVPG ARDPREFWEL
61 121 181 241 301 361 421 481 541 601 661 721 781 841 901 961
VPADRWNAGD
LGWEALERAG
LSYTLGLRGP
GGLSPDGRAY
MTTPDAQAQE
PLLVGSVKTN
YLPWEPEHDG
VVSAKSAAAL
VVGSGPDDLA
ECEAALS PYV
H-SQGEIAAAY
LSVAAVNGPT
AGLS PQAPRV
SAHPVLTMAL
PTYAFQTERH
VLGYATGGQT
FYDPDRSAPG
IDPSSLTGTR
SMVVDSGQSS
T FDARANGYV
AVLREAHERA
I GHLEGAAG I
QRMVVGVSSF
DAQI ERLAAF
AALAAPEGLV
DWSLEAVVRQ
VAGALSLDDA
ATVVSGDPVQ
PFFSTLEGAW
PGTVTGLATL
WLGEIEALAP
EVDRTFREAG
RSNSRWGGFI
TGVFAGAIWD
SLVAVHLACE
RGEGGGFVVL
GTAPADVRYV
AGLI KAVLAV
GMGGTNAHVV
ASRDRTDGVD
RGVAS GVGRV
APGAPTLERV
ARVVTLRSKS
IEELARACEA
ITEPVLDGGY
RRDNGGQDRL
AGE PAVQPAV
CTSLTGVDLR
EDVDRFDAAF
DYATLKHRQG
SLRRGESELA
KRLSRAVADG
ELHGTGTPVG
RGRALPASLN
LEEAPGVVEG
AGAVDAGAVD
AFVFPGQGTQ
DVVQPVTFAV
IAAHLAGKGG
DGVRARVI PV
WYRNLRHRVG
FGISPREAAE
GAAITPHTVT
LAGGVSLNLV
DPVLAVI RGS
DPIEAAALGA
YETPNPAIPF
ASVVESTVGG
AGAVARVLAG
WAGMGAELLD
MVSLARVWQH
MLSLALSEDA
DYASHSRQVE
FAPAVETLAT
LAAGGQAVTD
MDPQQRLALE
GLHRGI IANR
PDSIIGASKF
AVNNGGAAQG
ALGTGRPAGQ
EELNLRVNTE
SAVGGGVVPW
GRAQFEHRAV
S SAVFAAAMA HGVT PQAVVG
VLERLAGFDG
I IESELAEVL
DEGFTHFVEV
VASLAEAWAN GLAVDWSPLL PSATGHHSDL LRTEAAEPAE LDRDEQLRVI LDKVRAQTAQ NRINAAFGVR MAPSMIFDFP TPEALAEQLL WO 99/61599 WO 9961599PCT/US99/1 1814 -12- 1021 L 1081 E 1141 I 1201 F 1261 C 1321 C 1381 C; 1441 1 1501 V' 1561 C 1621 1681 1741 1801 1861 1921 1981 2041 2101 2161 2221 2281 2341 2401 2461 2521 2581 2641 2701 2761 2821 2881 2941 3001 3061 3121 3181 3241 3301 3361 3421 3481 3541 3601 3661 3721 3781 3841 3901 3961 4021 4081 4141 4201 4261 4321 4381 4441 4501
VVHGEAAAN
F'PQDRGWDV
JETS WEAVED
~VSYTLGLEG
~RGLAGDGRS
;LTAPNGPSQ
~PLRLGSLKS
~AVDWPEKQD
qVVSAKSAAA
;ADDLVQALA
kLSPYVDWSL iIAAAYVAGA kVNGPTATVV
?QAPRVPFFS
ILTMTL PET V
AERYWLENT
rALVDAGAKV
GIKAPLWSVT
PHLVTALSGA
A.ARWMAHHGA
PAETPLTAVV
VSSTLGIPGQ
GVPGMDPELA
RDSATSGQGG
FKDIGFDSLA
TAALPATVGAG
RGWDLDGLYD
EAFERAGIEP
FGLEGPATTV
PDGRSKAFSA
NGPSQQRVIR
GSVKSNIGHT
PAGTG PRRAA
EGSEASEAPA
RLRDVGYTLA
TGQGSQRPGA
YTQCALFALE
QELPAGGAML
SGLGRRTRAL
PEYWVRHVRG
S PAGS PADSP RVDLPTYS F1
RTHPWLADHP?
VGAPAGEPGC
PPQGAEEVPI
GGGSAAAAP
SAGTDAVSLE
DPHATS YGP1
GPADGGAEG\
AAAWGLVRTI
ASVRPETGTI
ELVHELEAL(
DVEHVLRPKM
RRRAAGLPAI
HPVLLPLRLI
DGAAETAAV'
LTAVELRNR]
TASRSTAETI
PDGAGSGAE]
PAGAEPAPVA
EGLYHPD PER
AGIDPTSLRG
PALTVDTACS
KAFAASADGT
QRVI RRALAD NIGH TQAAAG
GGLRRAAVSS
LDAQI ERLAA
DPDGLIRGTA
EAVVRQAPGA
LPLDDAARVV
SGDPVQI EEL
TLEGTWITEP
TGLGTLRREQ
PAALATGDDW
EVLTAGADDD
QGAVSVGRLD
TGEDQIAIRT
EILLLVSRSG
HTAGALDDGI
GNYAPHNAYL
LAALESALGR
SSAQGANPLA
GVELRNRLTR
AGAGAGT DAD
ADPDALGRAY
ASLRGSSTGV
DTACSSSLTA
DADGFGAAEG
QALADARLAP
QAAAGAAGI I
VSSFGISGTN
APGSREASLP
TSRTAFAHRA
GRELYDRHPV
VALFRLVESW
AVQAAEDEIR
RVSHAFHSAH
,TVRFLDGVR'V
,AGALRPRPLL
SRDRYWLDAPP
,VLGSVLLPGP
ESAGDGARPv
DGLYERLDG'
GIHPALLDAE
LTDGEGRPL\
AVLGKDELK\
RGTVARTLEI
QTENPGRFG]
APALAPEGTI
3ADVSVAACDI I DAAFLLDEL'.
SLGWGLWAE'.
SAAGLRDAAGI
r' LADRAATVD(
SNSAGGLALPA
D ALLAQLTRLI D RPWAAGDGA,
AAGAVDEPVA
PGTSYVRQGG
RQVGVFTGAN
SSLVALHLAV
SWSEGVGVLL
ARLTTSDVDV
VSGVIKMVQA
FGI SGTNAHV
FASRDRTDDA
SGVGRVAFVF
PTLERVDVVQ
TLRSKSIAAH
AQACKADGFR
VLDGTYWYRN
GGQERLVTSL
RYRI DWKRLP
REALAARLTA
TPADPDRAML
TGLHARRLAR
EQAPGATQLT
VDTLTAEQVR
DALAARRRAT
DETAITVADI
ERLAAAAPGE
ATGLQLPATL
DDP IATVANS
VREGGFLHDA
FIGLSYQDYA
LHLAVRALRS
VGLLLVERLS
GDI DAVETHG
KMVLAM~RHGT
AHVVLEQAPD
GHLPWVLSAK
AVTAADRDGF
FARALDET CA
GMRFAALLGH
VWLETEERYA
MDGMLDGFRP.
LRDLGVRTCL
VALLRRKRSE
LADTAVDTAGI
AM'VELAAHA-4
SLHSRLADAI
IGLAFGPLFQC
LHAIAVGGLN
rSVERLTLRP\
IAAALESAGVI
"LQAWLADEHI
"LDLADDASS'.
I LLTGGTGGL( I ADREALTAVI r' STPAYDLAA] r' SGMTGELGQ ,q DPAGIPALF 3 PARQRLLLE: k. TLVFDHPSP, E GALVLTGLS 3 GGSEDGAGV
IVGMACRLPG
FIENVAGFDA
THEYGPSLRD
QALRKGEVDM
VERLSDARRN
VEAHGTGTRL
MRHGLLPKTL
VLEEAPVVVE
DAGAVDAGAV
PGQGTQWAGM
PVTFAVMVS L
LAGKGGMLSL
ARI IPVDYAS
LRHRVGFAPA
AEAWVNGLPV
AAEGSERTGL
LTTGDGFTGV
WGLGRV VALE
APLHGRRPTR
AELTASGARV
RAHRAKAVGA
GRSAVSVAWG
DWDRFYLAYS
RTEILLGLVR
VFDHPT PLAL
CRYPGDIRSP
AEFDAEFFGV
ARVPNAPRGV
GECTMALAGG
DARRNGHPVL
TGTSLGDPIE
LPKTLHADEP
AAGEVLGADE
DEQSLRGQAA
LDGLATLAQG
HLDGHLELPI
SVGEIAAAHWj
GRLDVAAVNC
VLETVEFRRE
ELGPDGVLTI
TETVADALGE
GLGTADHPLI
ESAGLRDVRI
AGTAWSCHAI
LNAVWRYEGI
1DEPELVRVPI I TADQAAASRI
VGLYPDLAA]
.AGTRLLLVTI
~RTLPSVLMD
3GLVARHVVG]
SDAIPAEHPL'
VMFSSAAAV
~DLRRMSRAG
P DVVGARTVR F VVGEVAEVL P, ALASHLDAE D APGSEEVLE P DFMNASAEE GVASPEDLWR L AFFGISPREA L GGEGLDGYLL T ALAGGVAVMP
T
GHQVLAVVRG S GDPIEAQALI A HVDEPSDQID W GASVVEPSVG G A-IVLADGRAQ
F
GAELLDSSAVF
ARVWQHHGVT
P
ALNEDAVLER
I
HSRQVEIIES E IETLAVDEGF 'I AWTSLLPATA
E
SGRWLAVTPE I
VSLLDGLVPQ
HPERWAGLVD I DWQPHGTVLI I
TIAACDVADP
SVLDELTRDLI
PWDGGGMAAGI
SGRPQPLVEE
AQAAAVLRM~R
VSLLRSEFLG
EDLWRMLSEG
SPREALAMDP
EGYLLTGSTP
VAMMATPHMF
AVVRGTAVNQ
AQGLQATYGK
S PHVDWANSG VPEVSET VAN
ALHAWLSEPA
GTSAHVHLDT
LDVMFAAEGS
AGVFSLADAA
PEAAVLSGDA
SLTVVSNVTG
MAADGLADTP
AHAHGTGPDW
GAVVSLPDRD
LTLLEPLVLP
GLLATDRPEL
VFADIALPAT
HWSGVTVHAA
I GGLMHRVAWR
SQDVAAGAPA
k. GAVRDPEGSG
GLRDEPQLAL
Z WGVRRLLLVS r AVVHTAGVLS F GGAGQGAYAA I GGISDAEGIA A~ RPSAASASTT G HARGHRIDAE L PRGASDQDGA H LRSLRSMVTG L FGLLDQDPST VAGGGDAI S
AMDPQQRLL
GNTASVMSG
PGMFVE FSR
AVNQDGASN
~TYGQGRDDE
SAGAVELLT
SAVGGGVT P
'EHRAVALGA
'AAAMAECEA
~QAVVGHSQG
~SDFDGLSVA
LAQVLAGLS
~HFVEVSAHP
RPGLPTYAF
)HSAQAAAVL
IAWVQALGDA
~PAQPDAAAL
~GGTGALGSH
IAMRTLLDAI
)LDAFVLFSS
DGVAERLRNH
LPEVRRI IDA S PEDVAADRA
DEETADARRS
GEGITPFPTD
QQRMLLTTSW
SVASGRIAYT
VEFSRQRALA
DGASNGLTAP
ERPAERPLAI
LALVTEPI DW
AGTAGTSEVA
ADLSDADGPA
ARDGTTAFLF
AEAALLDETR
RLVAARGRLM
DAAREAEAYW
LAAGPDDLCD
ADSAAGSPVG
HAWFAGSGAH
GLLLTGRLSL
EHGGVELRVT
PVAPDRAANW
TNATAPATAN
GAAAARVRLA
PYALAS SGEQ
PRTVLAPLPA
ADDGGEDLSH
HDGTIRLARL
RRGTDAPGAD
DGTLPSMTTE
ANATLDALAW
LLDAALRDDR
AGTAGTPGTA
RGFLDLGFDS
GNRNGNENGT
ETGTGTASGA
D (SEQ ID NO:1)
Amino acid sequence of narbonolide synthase subunit 2, PICAII (SEQ ID NO:2)
1 61 1211 181 C 241 3012 361 421 4811 541 601 6611 721 781 841 901 961 1021 1081 1141 1201 1261 1321 1381 1441 1501 1561 1621 1681 1741 1801 1861 1921 1981 2041 2101 2161 2221 2281 2341 2401 2461 2521 2581 2641 2701 2761 2821 2881 2941 3001 3061 3121 3181 3241 3301
TSTVNEEKYL
TAGGE DATSE kMDPQQRLLL
;NSGSVASGR
?STEFVEFSRQ
kVNQDGASSG
[YGQGRDGEQ
3AGAVELLTE
PSVGAGLVPW
ILGTUGQDDFA
ECESA-LSRYV
-SQGEIAAAY
US TAAVNGPT kGLSPRTPEV
SAHPVLTMTL
PTYAFQRRHY
EAAPVLAALS
GHPAPFTRGT
ALEHPERWGG
TASPWWQADG
LAGLVAELAD
ADALARVVTA
DALAIGQHRAD
DTAVT TADVD
AEQQRRMQEL
TLVFDYPTPR
LWRLVAGGED
REALAM'DPQQ
YVGTIGNAAS I VMST PTT VE VRG SAVNQDG
ALIATYGQGB
QI DWSAGTVE
VGGVVPWLVE
TGQDDLAAAI
AALAPYVDWS
GEIAAAYVAC
AAVNGPTAT%
SPQTPQVPFF
PVLTMALPEI]
FQTERYWPQI
AVAGTVLLP(
RTFC-LYAHPI
ANGYGYGPL1
FGAGTRLPF)
PAQLAAFSDI
TDLVEAVDR(
VTRDAVAAPJ
GDATVGGTS(
ALPLPAAPA:
ALGMYPDPA:.
WTFAQGASV
SHGKWDALR
GGRF:VEMGK
VTTWDVRRA
GVRRLLLVS
DYLRRATADL
FPQDRG WOVE
EASWEAFEHA
VAYTLGLEGP
RGLAPDGRSK
LTAPNGPSQQ
PLRLGSLKSN
AD4DWPDKGDG
LVSAKTPAAL
QALTAPEGLI
DWSLEAVVRQ
VAGALTLDDA
ATVVSGDPTQ
PFFSTLEGAW
PET VTGLGTL WLHDS PAVQG
GAGADPVQLD
GATLTLVQAL
LIDLPSDADR
TVLVTGAEEP
LGATATVVTC
KATAALHLDR
GPTVTSVAWS
WSSFAPGFTT
VREHLAVVLN
TLAEFLLAEI
AT SGFPQDRG
RLLLETSWEA
MSGRVSYTLG
FSRQRGLAED
ASNGLTAPNG
*DTEQPLRLGS
*LLTEAMDWPP
AKTPAALDAC
AAPEGLVRG'
LEA VVRQAPC ALSLDDAARv
VSGDPTQIQE
STLEGAWITE
VTGLGTLRRI
DLSAAGDIT~c 3TAFVELAFR7
DAPGEAEWTI
-QGVRGVWRR(
k WSGISLYAV(
TLDALHLLEI
3ETPAPATVLI 3GDGLRSTGQ 3 DAALGSALA' L WRLEPGTDG L MGTEGAGVV' P VVFLTAVYA h LGLDDAHIA T DVRDAERVA R. DAFRHVSQA R. RGTDAPGAG
HEARGRLREL
GLYDPN PEAT GI PAATARGT AVTVDTACS S
SFSSTADGTS
RVI RRALADA I GHTQAAAGV GLRRAAVS SF
DAQIGRLAAF
RGT PS DVGRV
APGAPTLERV
ARVVTLRSKS
IQELAQACEA
ITEPVLDGTY
RREQGGQERL
SVQDSWRYRI
VSPLGDRQRL
EDAGVAAPLW
AALDRMTTVL
AAAEAARRLA
DLTDAEAAAR
LLREAAAAGG
PWEGSRVTEG
ARPGTLLADL
HPS PEAVDTG
LGEQAGAGEQ
WDVEGLYDPD
VEDAGIDPTS
LEG PAVTVDT
GRSKAFAAS))
PSQQRVIRRP
LKSNIGHTQP
KQEGGLRRAP
IGRLAAFASC
ASGVGRVAF"%
APTLERVD\
VTLRSKSIGI
LAQACEADG\
PALDGGYWYI
NGGQHRLTT~c 3AGLGAAEHP] k GDQVGCDLVI k HATGVLAAPI 3 DEVFADVAL] 3ATALRVRLAI q TAWDGAAQA'.
I ACPAAGPGG:
AVWGLGRSA,
r ALGSGEPQL 3 LESLTAAPG T ATGPGVTHL.
L RDLADVKPG S SRTLDFESA A DHPGVGYRA R HTGKVVLTM E LVHELEALG
EAKAGEPVAI
GKSYAREAGF
SVGVFTGVM~Y
SLVALHLAVQ
WSEGVGVLLV
RLTT SDVDVV
SGVIKMVQAM
GVSGTNAHVV
ASQGRTDAAD
AFVFPGQGTQ
DVVQPVTFAV
IAAHLAGKGG
DGVRARI IPV
WYRNLRHRVG
VTSLAEAWTN
DWKRLAVADA
AATLGEALAA
CVTHGAVSVG
AGGTGEDQVA
RDGAGHLLLH
LLAGVSDAHP
RPPVLVLFSS
ATGERLRRLG
PEARRALDEQ
RAFRDLGFDS
LPVDGGVDDE
PDASGRTYCR
LQGQQVGVFA
ACSS SLVALH
DGFGPAEGVG
LADARLT TAE
AAGVSGIIK'.
LVSSFGISGTN
GRTDAADPGP
FFPGQGTQWAC
rQPVTFAVMVE
HLAGQGGMLE
7RARIIPVDYI k. NLRHRVGFAI 3LAEAWANGLI SLGAAVALADc
ELTLDAPLVI
~DRTAPVADPI
?AEVAGAEGAI
P AGPDTVSVS) U PGAVVLGGDj P EHVREALHG: D TESPGRFVL:.
P, LRDGALLVPJ.
D AETLAPEPL, A PGDRVMGLL E RLLVHSAAG F RAASGGAGM F DLGEAGPER P SGLDPEGTV A DVSVAACDV VGMACRLPGG V LYEAGEFDAD F HDYATRLTDV P ALRKGEVDMA L ERLSDARRKG H EAHGTGTRLG D RHGVLPKTLH V LEEAPAAEET P PGAVARVLAG G WAGMGAELLD V MVSLAKVWQH H MISLALSEEA TI DYASHSAHVE T FAPAVETLAT D GLTIDWAPVL P SERAGLSGRW L AGGAVDGVLS I
RADHVTSPAQ
VRASGLLARR I TTPSGSEGAE C
LSAVLHLPPT
VAAIWGGAGQ
LRPLAPATAL
QSTTAADDTV
LTAVELRNRL
PVAIVGMACR
AGGFLDEAGE
GTNGPHYEPL
LAVQALRKGE
MLLVERLSDA
VDVVEAHGTG
VQAMRHGVLP
IAHIVLEEAPV
VAR VLAGGRA
MGAELLDVSK
LAKVWQHHGV
LALSEAAVVE
SHSAHVETIE
AVETLATDEG
VDWASLLPTT
3DGCLLTGSLS
PRRGAVRVQL
AWPPPGAEPV
k FGLHPALLDA k~ ADSSGQPVFA
DGLAAALRAG
3LALMQAWLAD L DLAGEARTAG R LARAAAPAAA G PGQVRIAIRA S GAYAPVVVAD G VGMAAVQLAR D VVLNSLAREF I GEMLAEVIAL L LTGGTGALGG A DREALTAVLD
ASPEDLWRL
FGISPREAL
EGIEGYLGT
AGGVTVMST
RI LAVVRGT
PIEAQAVIA
EKPT DQVDW AS EAT PAVE
,RAEFEHRAV
SKEFAAAMA
.GVT PQAVVG
'RQRIENLHG
IESELAEVL
)EGFTHFIEV
~TATGHHPEL
~VVVPEDRSA
~LAWDESAHP
UI4VWGMGRVA
VRASLPAHG
;TSGAAEDSG
lOSE PLAATD 3AYAAGTAFL
'AILDTALGHG
.JSRELGALTG
.CNATGLALPA
UPGGVASPED
EDADFFGIS P
LRNTAEDLEG
CGLALAGGVT
RRNGHRVLAV
TRLGDPIEAQ
KTLHVDRPSD
DEDAPADEPS
QFEHRAVALG
EFAAAMAECE
TPQAVVGHSQ
RLAGFDGLSV
SELADVLAGL
FTHFVEVSAH
TTHPDLPTYA
LRTHPWLADH
SVGASDESGR
DVDGLYERFA
AVQAAGAGGA
ADSLTVLPVb GTE VLS FPDL
ERFTDGRLVL
DATAGDGLTT
DGLAAADGLA
TGLNFRDVLI
ART VARMPEG
HWGVEVHGTA
VDASLRLLGP
FEDGVLRHLP
IVARHVVGEW
SI PAEHPLTA 3361 3421.
3481 3541 3601 3661 3721 VVH TAG VLS D
GAGQGAYAAA
PMDSELTLSL
QRRAAAGGAG
TGFDSLTAVE
OTASAT DRQT ASDDDLFS FT
GTLPSMTAED
NATLDALAWR
LDAAMRRDDP
EADT DLGGRL
LRNRLNAATG
TAALAELDRL
VEHVLRPKVD
RRTAGLPALS
ALVPIALDVA
AAM'TPDDRVA
LRLPATLVFD
EGVLASLAPA
AAFLLDELTS
LGWGLWAETS
ALRAQQRDGM
HLRDLVRTHV
H PTPG ELAG H
AGGRPELAAR
T PGYDLAAFV
GMTGGLSDTD
LAPLLSGLTR
AT VLGHGT PS
LLDELATAAG
LRALAAALGD
MFSSAAAVFG
RSRLARSGAT
GSRVGGAPVN
RVDLERAFRD
GSWAEGTGSG
DGDDATDLDE
DKELGDSDF (SEQ ID NO:2)
Amino acid sequence of narbonolide synthase subunit 3, PICAIII (SEQ ID NO:3)
1 61 121 181 241 301 361 421 481 541 601 661 721 781 841 901 961 1021 1081 1141 1201 1261 1321 1381 1441 1501
MANNEDKLRD
AGDGDAISEF
MDPQQRLSLT
SGAALGFLSG
NADLFVQFSR
SAVNQDGASN
ATYGQEKSSE
WSAGTVELLT
GVVPWPVSAK
ALRDALRMPE
RYVDWSLEAV
AAYVAGALTL
GPTATVVSGD
PH VP FF5TLE
TLPETVTGLG
RFWLQSSAPT
DVLEAGADDD
QGAVSVGRLD
TGEDQIAIRT
EHLLLVSRSG
HTAGAPGGDP
GVYAAANAHL
DELAKALSHD
VAPTGQSSAL
AVQLRNQLST
PLDRLRDAGV
YLKRVTAELQ
PQDRGWDVEG
TAWEAIESAG
RIAYVLGTDG
QRGLAADGRS
GLTAPHGPSQ
QPLRLGALKS
EAVDWPEKQD
T PAAL DAQI G GLVRGTS SDV
VRQEPGAPTL
DDAARVVTLR
PTQI EELART
GTWITEPVLD
TLRREQGGQE
SAADDWRYRV
REALAARLTA
TPADPDRAML
TGLHARRLAR
EQAPGATQLT
LDVTGPEDIA
DALAARRRAR
ET FVAVADVD
AAITALPEPE
VVGNRLPATT
LDTVLRLTGI
QNTRRLREI S
LYDPDPDASG
IDPTALKGSG
PALTVDTACS
KAFATSADGF
QRVI RRA1LAD
NIGHTQAAAG
GGLRRAAVS S
QLAAYADGRT
GRVAFVFPGQ
DRVDVVQPVT
SKS IAAHLAG
CEADGVRARI
GTYWYRNLRH
RLVTSLAEAW
EWKPLTASGQ
LTTGDGFTGV
WGLGRVVALE
APLHGRRPTR
AELTASGARV
RILGAKTSGA
GETATSVAWG
WERFAPAFTV
RRPALLTLVR
VFDHPTPAAL
EPEPGSGGSD
GRTHEPVAIV
RTYCRSGGFL
LGVFVGGWHT
SSLVALHLAV
GPAEGAGVLL
ARLAPGDVDV
VAGVI KMVQA FGI SGTNAHV
DVDPAVAARA
GTQWAGMGAE
FAVMVSLAKV
KGGMI SLALD I PVDYASHSR
RVGFAPAVET
ANGLT IDWAP
ADLSGRWIVA
VSLLDDLVPQ
HPERWAGLVD
DWQPHGTVL I
TIAACDVADP
EVLDDLLRGT
LWAGDGMGRG
SRPSLLLDGV
THAAAVLGHS
AAHLHEAYLA
GGAADPGAE P
GMACRLPGGV
HDAGEFDADF
GYTSGQTTAV
QALRKGECDM
VERLS DARRN
VEAHGTGTRL
MRHGLLPKTL
VLEEAPAVED
LVDSRTAMEH
LLDSSPEFAA
WQHHGI TPQA EAAVLKRLS D QVEI TEKELA
LAVDGFTHFI
ILPTATGHHP
VGSEPEAELL
VAWVQALGDA
LPAQPDAAAL
TGGTGALGSH
HAM'RTLLDAI
PLDAFVLYS S
ADDAYWQRRG
PEARQALAAP
SPDRVAPGRA
PAEPAPTDWE
EASIDDLDAE
ASPEDLWQLV
FGI SPREALA
QSPELEGHLV
ALAGGVTVMP
GHRILAVVRG
GDPI EAQALI HVDEPSDQI D S PAVE PPAGG
RAVAVGDSRE
SMAECETALS
VVGHSQGEIA
FDGLSVAAVN
EVLAGLAPQA
EVSAHPVLTM
ELPTYAFQTE
GALKAAGAEV
GI KAPLWSVT
AHLVTALSGA
AARWMAHHGA
PAETPLTAVV
NAGVWGSGSQ
I RPMS PDRAL
VGAPAPGDAA
FTELGFDSLT
GRVRRALAEL
ALIRMALGPR
1561 (SEQ ID NO:3)
Amino acid sequence of narbonolide synthase subunit 4, PICAIV (SEQ ID NO:4)
1 61 121 181 241 301 361 421 481 541 601 661 721 781 841 901 961 1021
MTSSNEQLVD
AAGKDLVSEV
MDPQQRQLLE
NSSAVASGRI
GAFIEFSSQQ
INQDGASNGL
YGQGRAPGQP
AGSVELLTEA
PWPVSAKTSA
DALRMPEGLV
DWSLEAVVRQ
VAGALTLDDA
ATVVSGDPTQ
PFFSTLEGTW
PDKVTGLATL
RSYWI SPAGP CDS PEEVPVD
AERNWAVAEP
ALRASLKENE
PEERGWDIDS
ASWEVFERAG
AYSLGLEGPA
AMAADGRTKG
TAPHGPSQQR
LRLGTLKSNI
VDWPERPGRL
ALDAQI GQLA
RGTVTDPGRV
APSAPTLDRV
ARVVTLRSKS
IQELAQACEA
ITEPALDGGY
RREDGGQHRL
GEAPAHTASG
RPLREI GFDS S DHEQAEEEK
ELRKESRRRA
LYDPVPGRKG
I DPAS VRGT D VT VDTACSS S
FASAADGLAW
LIRQALADAR
GHTQAASGVA
RRAGVSAFGV
AYAEDRTDVD
AFVFPGQGTQ
DVVQPVTFAV
IAAHLAGKGG
DGIRARIIPV
WYRNLRHRVG
TTSLAEAWAN
REAVAETGLA
LTAVDFRNRV
AAAPAGARSG
DRRQEPMAIV
TTYVRNAAFL
VGVYVGCGYQ
LVALHLALKG
GEGVAVLLLE
LTS SDVDVVE GVI KMVQALR
GGTNAHVVLE
PAVAARALVD
WAGMGAELLD
MVSLAKVWQH
MI SLALSEEA
DYASHSAHVE
FAPAVETLAT
GLALDWASLL
WGPGAEDLDE
NRLTGLQLPP
ADTGAGAGMF
GMSCRFAGGI
DDAAG FDAAF
DYAPDIRVAP
LRNGDCSTAL
RLSDARRKGH
GHGTGTRLGD
HGVLPKTLHV
EAPAVEESPA
SRTAMEHRAV
SSPEFAAAMA
HG ITPEAVI G
TRQRIENLHG
TIENELADVL
DEGFTHFIEV
PATGALS PAV
EGRRSAVLAM
TVVFEHPT PV
RALFRQAVED
RSPEDLWDAV
FGISPREALA
EGTGGYVVTG
VGGVAVLATP
RVLAVVRGSA
PT EAQALLAT
DEPTDQVDWS
VEPPAGGGVV
AVGDSREALR
ECETALS PYV
HSQGETAAAY
LSIAAVNGPT
AGLS PQT PQV
SAHPVLTMTL
PDLPTYAFQH
VMRQAASVLR
ALAERI SDEL DRYGE FLDVL
1081 AEASAFRPQF ASPEACSERL DPVLLAGGPT DRAEGRAVLV GCTGTAANGG
PH-EFLRLSTS
1141 FQEERDFLAV PLPGYGTGTG TGTALLPADL DTALDAQARA ILRAAGDAPV
VLLGHSGGAL
1201 LAHELAFRLE RAHGAPPAGI VLVDPYPPGH QEP[EVWSRQ LGEGLFAGEL EPMSDARLLA 1261 MGRYARFLAG PRPGRSSAPV LLVRASEPLG DWQEERGDWR AI-WDLPH-TVA
DVPGDHFTMM
1321 RDHAPAVAEA VLSWLDAIEG IEGAGK (SEQ ID NO:4)
Amino acid sequence of type II thioesterase, PICB (SEQ ID
1 VTDRPLNVDS GLWIRRFHPA PNSAVRLVCL PHAGGSASYF FRFSEELHPS
VEALSVQYPG
61 RQDRRAEPCL ESVEE 'LAEHV VAATEPWWQE GRLAFFGHSL GASVAFETAR
ILEQRHGVRP
121 EGLYVSGRRA PSLAPDRLVH QLDDRAFLAE IRRLSGTDER FLQDDELLRL
VLPALRSDYK
181 AAETYLHRPS AKLTCPVMAL AGDRDPKAPL NEVAEWRRHT SGPFCLRAYS
GGHFYLNDQW
241 HEICNDISDH LLVTRGAPDA RVVQPPTSLI EGAAKRWQNP R (SEQ ID
The DNA encoding the above proteins can be isolated in recombinant form from the recombinant cosmid pKOS023-27 of the invention, which was deposited with the American Type Culture Collection under the terms of the Budapest Treaty on 20 August 1998 and is available under accession number ATCC 203141. Cosmid pKOS023-27 contains an insert of Streptomyces venezuelae DNA of approximately 38506 nucleotides. The complete sequence of the insert from cosmid pKOS023-27 is shown below. The location of the various ORFs in the insert, as well as the boundaries of the sequences that encode the various domains of the multiple modules of the PKS, are summarized in the Table below.
Figure 2 shows a restriction site and function map of pKOS023-27, which contains the complete coding sequence for the four proteins that constitute the narbonolide PKS and four additional ORFs. One of these additional ORFs encodes the picB gene product, the type II thioesterase mentioned above. PICB shows a high degree of similarity to other type II thioesterases, with an identity of 51%, 49%, and 40% as compared to those of Amycolatopsis mediterranae, S. griseus, S. fradiae and Saccharopolyspora erythraea, respectively. The three additional ORFs in the cosmid pKOS023-27 insert DNA sequence, from the picCII, picCIII, and picCVI genes, are involved in desosamine biosynthesis and transfer and are described in the following section.
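The nucleotide coordinates in the Table that follows can be used to compute the size of an ORF or domain and to cut the corresponding stretch out of the insert sequence. The sketch below is illustrative only. It assumes that the From/To values are 1-based and inclusive, which the specification does not state explicitly; the function names and the short list of rows are hypothetical helpers chosen for this example, with the coordinate values copied from the Table.

```python
# A few rows copied from the coordinate table: (From, To, Description).
FEATURES = [
    (148, 3141, "loading module"),
    (3208, 4497, "KS1"),
    (7336, 7593, "ACP1"),
    (33961, 34806, "picB"),
]


def feature_length(start, end):
    """Length in nucleotides, ASSUMING 1-based inclusive coordinates."""
    return end - start + 1


def extract_feature(insert_seq, start, end):
    """Cut a 1-based inclusive interval out of a sequence string."""
    if not 1 <= start <= end <= len(insert_seq):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return insert_seq[start - 1:end]


def summarize(features):
    """Map each description to (nucleotides, whole codons)."""
    return {
        label: (feature_length(s, e), feature_length(s, e) // 3)
        for s, e, label in features
    }
```

With the full insert sequence held in a string, `extract_feature(insert_seq, 3208, 4497)` would return the stretch that the Table labels KS1, under the coordinate assumption stated above.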
From Nucleotide   To Nucleotide   Description
                  13725           picAI
                  13725           narbonolide synthase 1 (PICAI)
148               3141            loading module
148               1434            KS loading module
1780              2802            AT loading module
2869              3141            ACP loading module
3208              7593            extender module 1
3208              4497            KS1
4828              5847            AT1
6499              7257            KR1
7336              7593            ACP1
7693              13332           extender module 2
7693              8974            KS2
9418              10554           AT2
10594             11160           DH2
12175             12960           KR2
13063             13332           ACP2
13830             25049           picAII
13830             25049           narbonolide synthase 2 (PICAII)
13935             18392           extender module 3
13935             15224           KS3
15540             16562           AT3
17271             18071           KR3 (inactive)
18123             18392           ACP3
18447             24767           extender module 4
18447             19736           KS4
20031             21050           AT4
21093             21626           DH4
22620             23588           ER4
23652             24423           KR4
24498             24765           ACP4
25133             29821           picAIII
25133             29821           narbonolide synthase 3 (PICAIII)
25235             29567           extender module 5
25235             26530
26822             27841
28474             29227
29302             29569
29924             33964           picAIV
29924             33964           narbonolide synthase 4 (PICAIV)
30026             32986           extender module 6
30026             31312           KS6
31604             32635           AT6
32708             32986           ACP6
33068             33961           PKS thioesterase domain
33961             34806           picB
33961             34806           type II thioesterase homolog
34863             36011           picCII
34863             36011           4-keto-6-deoxyglucose isomerase
36159             37439           picCIII
36159             37439           desosaminyl transferase
37529             38242           picCVI
37529             38242           3-amino dimethyltransferase

DNA Sequence of the Insert DNA in Cosmid pKOS023-27 (SEQ ID NO: 19)
1 GATCATGCGG AGCACTCCTT CTCTCGTGCT CCTACCGGTG ATGTGCGCGC CGAATTGATT 61 121 181 241 301 361 421 481 541 601 661 721 781 841 901 961 1021 1081 1141 1201 1261 1321 1381 1441 1501 1561 1621 1681 1741 1801 1861 1921 1981 2041 2101 2161 2221 2281 2342 2401 246] 2521 2581 2641 270: 276: 282: 288: 294: 300: 3061 312 318 324 330 336 342 348 354
CGTGGAGAGA
GACGCCGGTT
GTGCCCGGCG
GTCACCGACG
GCCCCCGGCC
GCCGCCTTCT
GCCCTGGAGC
GGCACCCGCA
CGCCAGGGCG
GCGJAACCGAC
CAGTCCTCGT
GAGCTCGCCC
AGCAAGTTCG
GGCTACGTAC
GCCGACGGCG
GCCCAGGGCA
GAGCIGGGCCG
CCCGTGGGCG
GCCGGACAGC
GCCGGCATCG
AGCCTGAACT
AACACGGAGT
TCC:CGTTCG
GTCGAGGGTG
GTGCCGTGGG
GCCGCGTTCG
GCTGTCGATG
CGGGCCGTCG
GGTCTGGTCC
GGCACGCAGT
GCCATGGCCG
GTACGGCAGG
TTCG'CCGTCA
GTCGTCGGCC
GACGACGCCG
*AAGIGGCGGC1P
*TTCGACGGGC
*CCCGTACAGP
*ATTCCCGTCC
*GAGGTCCTCC
*GGCGCCTGG;
L CGTU'TGGGCI L GTCGAGGTCI
LGCGACCCTGC
LTGGGCCAACC
LTCCGACCTC(
L. CTCGCCCCG( L. CCGGCGGAG( 1 ACGGCCCAGC 1 GAGGCCGGT' 1 GGCGTACGG 1 CAGCTGCTC( 1 CCGC2TGGCGI 1 CTGCCCGGT, 1 GCGATCTCG, 1 CCCGAGCAC 1 TTCGACGCG 1 CGGCTCCTC 1 CTG'CGGGGA
TGTCGACAGT
CCGCGCACGG
CCCGGGACCC
TCCCCGCGGA
GCTCGAACAG
TCGGCATCTC
TGGGCTGGGA
CCGGCGTCTT
GCGCCGCGAT
TCTCGTACAC
CGCTCGTCGC
TCGCCGGCGG
GCGGCCTCTC
GCGGCGAGGG
ACCCGGTGCT
TGACGACCCC
GGACCGCGCC
ACCCGATCGA
CGCTCCTGGT
CCGGCCTCAT
ACGAGACCCC
ACCTGCCGTG
GCATGGGCGG
CTTCGGTCGT
TGGTGTCGGC
CCTCGCGGGA
CGGGTGCTGT
TCGTCGGCAG
GGGGCGTGGC
GGGCCGGCAT
AATGCGAGGC
CCCCCGGTGC
TGGTCTCGC'I
ACTCGCAGGC
CTCGTGTCG I
TGCTGTCCCI
TGTCCGTCGC
TCGAAGAGC']
ACTACGCGT(
CCGGGCTCA(
TCACCGAGC(
TCGCCCCGG(
GCGCCCACC(
-GTCGCGACAj
GACTCGCGG'
CCACCTACG
,CGGGCGAGCI
TCGACCGGG.
3 TGCTGGGGT.
r' GCACCTCCC k. TGGCGCCGT
TCGTCGTGC
3 CGGCCGGTG 3 GGGTCGCCT S AGTTCCCGC C CCGGCACGT G CCTTCTTCG C TCGAAACCT C GGCAGGTCC
GTCCAAGAGT
CACAGCGGAA
GAGAGAGTTC
CGGCTGGAAC
CCGGTGGGGC
GCCCCGCGAG
GGCCCTGGAG
CGCCGGCGCC
CACCCCGCAC
GCTCGGGCTC
CGTCCACCTC
CGTCTCGCTC
CCGCGACGGC
CGGCGGTTTC
CGCCGTGATC
CGACGCGCAG
GGCCGACGTG
GGCCGCTGCG
CGGCTCGGTC
CAAGGCCGTC
GAACCCGGCG
GGAGCCGGAG
CACGAACGCG
GGAGTCGACG
GAAGTCCGCT
TCGTACGGAT
CGCTCGCGTA
CGGGCCGGAC
TTCCGGTGTC
GGGTGCCGAA
CGCACTCTCC
GCCCACGCTG
GGCTCGCGTG
CGAGATCGCC
GACCCTGCGC
CGCGCTGAGC
CGCTGTGAAC
TGCTCGGGCC
CCACAGCCGC
1CCCGCAGGC'I
CGTGCTCGAC
2CGTCGAGACC 2CGTCCTCACC k. CGGCGGTCA( r CGACTGGAGC 7, GTTCCAGAC( C GGCGGTGCA( N. CGAGCAGCT( A CGCGACAGG( T GACCGGCGT( C CATGATCTTI A CGGGGAGGC' C CGTCGACGA' C GCCGGAGGA A GGACCGCGG C GTACGTCCG G GATCTCGCC 'C CTGGGAGGC ;G CGTCTTCAC
GAGTCCGAGG
CCCGTCGCCG
TGGGAACTCC
GCCGGCGACT
GGGTTCATCG
GCCGCGGAGA
CGCGCCGGGA
ATCTGGGACG
ACCGTCACCG
CGCGGCCCCA
GCGTGCGAGA
AACCTGGTGC
CGCGCCTACA
GTCGTCCTGA
CGGGGCAGCG
GCGCAGGAGG
CGGTACGTCG
CTCGGCGCCG
AAGACGAACA
CTGGCGGTCC
ATCCCGTTCG
CACGACGGGC
CATGTCGTGC
GTCGGCGGGT
GCCGCGCTGG
GGTGTCGACG
CTGGCCGGCG
GATCTGGCGG
GGGCGAGTGG
CTGCTGGACT
CCGTACGTCG
GAGCGGGTCG
TGGCAGCACC
GCGGCGTACC
AGCAAGTCCT
GAGGACGCCC
GGGCCCACCC
TGTGAGGCCC
CAGGTCGAGY
CCGCGCGTGC
GGCGGCTAC'
CTGGCCACC(
ATGGCCCTC(
3 GACCGCCTC( 2CCGCTCCTC( 2GAGCGCCACr 3 CCCGCCGTC( 3 CGCGTGATCI 2 GGGCAGATC, 3 GACCTGCGC.
Z GACTTCCCC.
S GCGGCGAAC G CCGGTGGCG C CTGTGGCGG C TGGGACGTG C CAGGGCGGT G CGCGAGGCC C GTCGAGGAC T GGGGCGATC
AATTCGTGTC
TCGTCGGCAT
TGGCGGCAGG
TCTACGACCC
AGGACGTCGA
TGGACCCGCA
TCGACCCGTC
ACTACGCCAC
GCCTCCACCG
GCATGGTCGT
GCCTGCGGCG
CGGACAGCAT
CCTTCGACGC
AG CGCCT CT C
CCGTCAACAA
CCGTGCTCCG
AGCTGCACGG
CCCTCGGCAC
TCGGCCACCT
GCGGTCGCGC
AGGAACTGAA
AGCGGATGGT
TCGAAGAGGC
CGGCGGTCGG
ACGCGCAGAT
CGGGCGCTGT
GGCGTGCTCA
CAGCGCTGGC
CGTTCGTGTI
CTTCCGCGG'I
ACTGGTCGC'I
ATGTCGTGC;
ACGGGGTGAC
TCGCCGGTGC
TCGCCGCCCI
TCCTGGAGCC
CCACCGTGG'
ATGGGGTCCC
k TCATCGAGA( 2CGTTCTTCT( C' GGTACCGCAj 3ACGAGGGCT'
CCGGGACCG'
3 TCGCCTCCC' 2 CCTCCGCGA, r GGCTGGGCG.
2: TCCGCACGG.
Z TGGACAAGG G AGGTCGACC h ACCGGATCA A CCCCCGAGG C CGGCCGGTG A TCGTCGGCA C TGGTGGCCG G AGGGGCTGI T TCATCGAGP C TCGCCATGG :G CCGGGATCC A CCCACGAGI
CGTGTCGAAC
CTCCTGCCGG
CGGCCAGGCC
GGACCGCTCC
CCGGTTCGAC
GCAGCGGCTC
CTCGCTCACC
CCTGAAGCAC
CGGCATCATC
CGACTCCGGC
CGGCGAGTCC
CATCGGGGCG
GCGCGCCAAC
CCGGGCCGTC
CGGCGGCGCC
CGAGGCCCAC
CACCGGCACC
CGGCCGCCCG
GGAGGGCGCG
GCTGCCCGCC
CCTCCGGGTG
CGTCGGCGTG
CCCGGGGGTT
CGGCGGTGTG
CGAGCGGCTT
CGATGCGGGT
GTTCGAGCAC
CGCGCCTGAG
CCCCGGGCAG
GTTCGCGGCG
GGAGGCCGTC
GCCTGTGACG
GCCCCAGGCG
CCTGAGCCTG
SCCTCGCCGGC
IACTGGCCGGG
r CTCCGGTGAC 3 TGCGCGGGTC 3 CGAGCTCGCC 2GACACTCGAA k. CCTGCGCCAT r CACCCACTTC r CACCGGTCTG r' CGCCGAAGCA C CGGCCACCAC A GATCGAGGCG A GGCGGCCGAG T CCGGGCGCAG G GACCTTCCGT A CGCCGCCTTC C TCTCGCGGAG C GGAGCCGGCT *T GGCCTGCCGC G CGGCGGGGAC 'A CCACCCGGAT A CGTCGCCGGC ;A CCCGCAGCAG ;A CCCGACCTCC 'A CGGGCCGAGC WO 99/61599 WO 9961599PCTIUS99/1 1814 18- 3601 3661 3721 3781 3841 3901 3961 4021 4081 4141 4201 4261 4321 4381 4441 4501 4561 4621 4681 4741 4801 4861 4921 4981 5041 5101 5161 5221 5281 5341 5401 5461 5521 5581 5641 5701 5761 5821 5882 5942 6002 606] 6121 618, 624: 630: 636: 64 2: 648:.
654:.
660: 666: 672: 678 684 690 696 702 708
CTGCGGGAG
ATGTCGGGCC
GCCTGCTCGT
GTCGACATGG
TTCAGCCGGC
GACGGCACCA
CGCCGCAACG
GCGAGCAACG
CTGGCGGACG
ACGCGACTCG
GACGACGAAC
GCGGCCGGCG
AAGACGCTGC
CTCCTCACCG
GTCTCCTCCT
GTTGTCGAGG
GTGACGCCTT
CTTGCCGCAT
GGCGCTGTCG
CTCGGCGCCG
GGAADCGGCTT
GCTGGCATGG
TGTGAGGCCG
CCCGGTGCGC
GTCTCGCTGG
TCGCAGGGCG
CGCGTCGTCA
CTGTCCCTCG
TCCGTCGCCG
GAAGAGCTTG
TACGCGTCCC
GGTCTCAGCC
ACCGAGCCCG
GCCCCCGCCP
*GCCCACCCCC
*CGCGAACAGG
*CTTCCCGTGE
*TACGCCTTCC
*GACGACTGGC
*ACCGGCCTGI
-GCCGTGCTC
2
-GACGACGACC
L ACCGGCGTGC
LGGCGACGCGC
LCGTCTCGACZ
L GCCCTTGAGC
LGCCGCCCTC(
LATCCGCACC)
L CCCACCCGC( I. GGCAGCCAC( I CGCAGCGGC( I GCCCGCGTCI 1 GACGCCATC( 1 GACGGCATCI 1 GTCGGCGCC 1 TTCTCGTCC, 1 GCCTACCTC 1 GCCTGGGGA 1 CGCAACCAC
GCGGGGAAGG
GCGTCTCGTA
CGTCGCTGGT
CGCTCGCCGG
AGCGCGGGCT
GCTGGTCCGA
GACACCAGGT
GCCTCACGGC
CCCGGCTGAC
GCGACCCGAT
AGCCGCTGCG
TCTCGGGTGT
ACGTCGACGA
AGGCCGTCGA
TCGGGATCAG
GTGCTTCGGT
GGGTGGTGTC
TCGCCTCGCG
CTCACGTACT
GGGCGGACGA
CCGGTGTCGG
GTGCCGAACT
CGCTGTCCCC
CCACGCTGGA
CTCGCGTGTG
AGATCGCCGC
CCCTGCGCAG
CGCTGAACGA
CCGTCAACGG
CTCAGGCGTG
ACAGCCGGCP
CGCAGGCCCC
TCCTCGACGC
TCGAGACCCI
TCCTCACCAI
GAGGCCAAGT
CATGGACTTC
AGGCCGAGCC
GCTACCGCAI
CCGGCCGCTC
CCGCGCTG'
GTGAGGCCC
TCTCGCTCC'
GAATCAAGG(
k. CCCCCGCCG2
-ACCCCGAAC(
3CCCACCTCG'
CCGGACTCM
3ACTGGCAGCI 3CCGCACGCTI 3AACAAGCCC, k. CCATCGCCG,
:CCGCCGAGA
3 TGGACACGC T CGGTGCTCG G TGTCGAGCA G ACGCCCTCG C CGTGGGACG G GCGTGCCCG CCTCGACGGC T.
CACACTCGGC C CGCCCTGCAC C CGGCGTGGCC G GGCCGGGGAC G GGGCGTCGGC G CCTCGCGGTC G TCCGAACGGG C GACCTCCGAC G CGAGGCGCAG G CCTCGGGTCG T CATCAAGATG G GCCCTCGGAC C
CTGGCCGGAG
CGGCACCAAT G CGTCGAGCCG TI GGCGAAGTCC G GGATCGTACG G GGCTGACGGG C CCTCGTACAG C GCGAGTGGCC 9]
GCTGGACTCTI
GTACGTCGACI
GCGGGTCGAT
GCAGCACCAC
CGCGTACGTC
CAAGTCCATC
GGACGCCGTC
GCCCACCGCC
CAAGGCGGAC
GGTCGAGATC
GCGCGTGCCG
CACCTACTGG
GGCCGTCGAC
GACCCTCCCC
GCGTCTGGTC
GCTCCTGCCC
CTACTGGCTC
CGACTGGAAG
GCTCGCCGTC
CGACGCCGGG
r CGCCGCCCGG C' CGACGGACTC
GCCCCTGTGG
~CCCCGACCGG
3CTGGGCCGGC r CACCGCACTC k. CGCCCGCCGC
CCACGGCACC
3 GATGGCCCAC
CCGGAGCCACC
C *CTGCGACGTC C GCCCCTCACC T GACCGCCGAG A CGAGCTGACC C TCTGGGCATC C GGCTCGCCGC G TGGCGGCATG G CATGGACCCG ACCTGCTGA C TTGAGGGCC C TCGCCGTGC A TGATGCCCA C GCCGGTCGA A TCCTCCTCG T TCCGCGGCA G CCTCGCAGC A TGGACGTCG TI CCCTGATCG C
TGAAGTCCA
TCCAGGCGA TI :AGATCGACT C AGCAGGACG C ;CGCATGTGG 'I 'CGGTTGGCG C ;CTGCCGCGC9
;ATGACGCCG
]GTGCTCAGT
;CGCTGGCCG
TCGTGTTCC
CCGCGGTGT
'GGTCGCTGG
;TCGTGCAGC
3GTGTGACGC
,CCGGAGCCC
GCCGCCCACC
CTGGAGCGAC
kiCTGTCGTGT
GGATTCCGCG
FTCGAGAGCG
rTCTTCTCGA 1ACCGCAACC
GAGGGCTTCA
GAGACCGTCA
ACCTCGCTCG
GCCACGGCCT
GAGAACACTC
CGCCTCCCGG
ACGCCGGAGG
GCGAAGGTCG
CTCACCGCAC
GTACCGCAGG
TCCGTCACCC
GCCATGCTCT
CTCGTCGACC
TCCGGCGCCA
CTCGCCCGCG
GTCCTCATCA
CACGGAGCCG
CAACTCACCG
GCCGACCCCC
GCCGTCGTCC
CAGGTCCGGC
CGGGACCTCG
CCCGGTCAGG
CGGGCCACCG
GCCGCCGGTG
GAACTCGCCC
CGGCAACAC G CGCCCTGAC G *GGCCCTGCG C GCCCGGGAT G *GGCGTTCGC C 'CGAGCGCCT G CGCCGTGAA C LGCGCGTCAT C 'CGAGGCACA C CACCTACGG C LCATCGGGCA C ~GCGCCACGG A ;GTCGGCTGG C ;CGGGCTGCG C ~GCTCGAAGA G ;GTCGGCGGT C ~CGACGCGCA C kCGCCGGTGC TI rCGAGCACCG C k.TCCGGACGG C 'CGGTCAGGG C ICGCGGCGGC %GGCCGTCGT I
CTGTGACGTT
CCCAGGCGGT
rGCCCCTGGA rCGCCGGCAA
TGAGTGACTT
CGGGTGACCC
CGCGGATCAT
AGCTCGCCCA
CGCTCGAAGG
TCCGTCACCG
CGCACTTCGT
CCGGCCTCGG
CCGAGGCGTG
CCCGCCCCGG
CCGCCGCCCT
CCGCCGAGGG
ACCACTCCGC
AGGTGCTGAC
TGACGACCGG
TCGCCTGGGT
AGGGCGCGGT
GGGGCCTCGG
TCCCCGCCCA
CCGGCGAGGA
CACCCCTCCA
CCGGCGGCAC
AACACCTCCT
CCGAACTCAC
ACGCCATGCG
ACACCGCCGG
GGGCCCACCG
ACCTCGACGC
GCAACTACGC
GCCGGTCCGC
ACGGCGTGGC
TGGCCGCACT
GCCAGCGTG
GTGGACACG
AAGGGCGAG
TTCGTCGAG
GCGTCGGCG
TCGGACGCC
CAGGACGGC
CGGCGCGCG
GGCACGGGC
:CAGGGCCGT
ACCCAGGCC
~CTGCTGCCG
GCCGTGGAA
:CGGGCCGCC
;GCCCCGGTG
GGCGGCGGT
ATCGAGCGG
GTCGACGCG
;GCCGTCGCG
;CTGATACGC
ACGCAGTGG
ATGGCCGAG
~CGGCAGGCC
'GCCGTCATG
:GTCGGCCAC
:GACGCCGCC
3GGCGGCATG
CGACGGGCTG
CGTACAGATC
rCCCGTCGAC
GGTCCTCGCC
CACCTGGATC
CGTCGGCTTC
CGAGGTCAGC
CACCCTCCGT
GGTCAACGGG
TCTGCCCACC
GGCCACCGGC
GTCCGAGCGC
GCAGGCCCC
GGCCGGGGCG
TGACGGCTTC
CCAGGCGCTC
CTCCGTCGGA
CCGCGTCGTC
GCCCGATGCC
CCAGATCGCC
CGGACGTCGG
CGGAGCCCTC
CCTCGTCAGC
CGCATCGGGC
CACCCTCCTC
CGCGCTCGAC
TGCGAAGGCC
GTTCGTGCTC
CCCGCACAAC
CGTCTCGGTG
CGAGCGGCTG
GGAGTCCGCG
7141 7201 7261 7321 7381 7441 7501 7561 7621 7681 7741 7801 7861 7921 7981 8041 8101 8161 8221 8281 8341 8401 8461 8521 8581 8641 8701 8761 8821 8881 8941 9001 9061 9121 9181 9241 9301 9361 9421 9481 9541 9601 9661 9721 9782 9842 9902 996] 1002] 1008] 1014] 1020' 1026' 1032' 1038: 1044:.
1050: 1056' 10621
CTCGGCCGGG
GCGTACTCCT
ATCGACGCAC
CCCCTGGCCG
CTCGTACGGG
GACCGCGCCT
CTGACCCGGG
CTGGCCCTCG
CGGCGGTCCG
GATGCCGACG
CGCAGCCCGG
CCCACCGACC
AGGGCGTACG
TTCGGCGTCT
ACGTCCTGGG
ACCGGTGTCT
CGTGGCGTGG
GCGTACACCT
CTGACCGCCC
GCCGGTGGCG
GCGCTCGCCC
GCGGAGGGCG
GCGGTGCTCG
ACCGCGCCCA
CTGGCACCCG
CCCATCGAGG
CTCGCCATCG
GGCATCATCA
GACGAGCCGA
ATCGACTGGC
GGGACGAACG
GCCGATGAGG
GAGGTCGCTC
TCCCTCCCCC
CAGGCCGCCC
GGACCGGCCC
CACCGCGCCC
GCCCAGGGCC
TTCCTCTTC;
CACCCCGTC]
CTGCCCCTGC
*GAGACGCGG]
*GAGAGCTGGC
*GCGCACGTC(
*CGGCTCATG(
*GAGATCCGC(
*GTCAACGGC(
*GCGTACTGG'
L TCCGCGCACJ
LCGGCGCCCC'
LCTGTGCGAC,
LGTCCGTGTC,
L CTCACCGCC.
L CCCGTCGGC L. CCGCTGCTC 1 CTCGGCAGG 1 GGGGCGCAC 1 GCCCCGGCG I CCGCTGCTC
ACGAGACCGC
CCGGTCGCCC
GGGACAGCGC
AGCGGCTGGC
CGCAGGCCGC
TCAAGGACAT
CGACCGGGCT
TGTCGCTGCT
CGGCGCTGCC
ACGATCCGAT
AGGACCTGTG
GCGGCTGGGA
TCCGCGAGGG
CGCCGCGCGA
AGGCCTTCGA
TCATCGGCCT
AGGGTTACCT
TCGGTCTCGA
TGCACCTGGC
TGGCGATGAT
CGGACGGCCG
TCGGCCTGCT
CCGTGGTCCG
ACGGACCCTC
GCGACATCGA
CCCAGGGCCT
GCTCCGTGAA
AGATGGTCCT
GCCCGCACGT
CGGCCGGCAC
CGCACGTCGT
TGCCTGAGGT
AGGGCTCTGA
GGCACCTGCC
CCCTGCACGC
*GCCTGCGGGP
CCGTGACCGC
GCACCTCGGC
SCCGGCCAGGC
TCGCCCGGGC
-TCGACGTGAI
ACACGCAGTC
,GCATGCGGCC
3 CCGGTGTGT]
AGGAGCTGCC
3 TGTGGCTGG)
CCGAGGCCGC
r CCGGGCTCG( k. TGGACGGCA' r CCCTGACCG'
ZCCGAGTACT(
C TGCGCGACC' A. TGGCGGCCG, T CTCCCGCCG G TGGCGCTGC G CGCACGCCC C GCGTGGACC G CCGACACCG G GCGCCGTGG
GATCACCGTC
GCAGCCCCTC
CACGTCCGGA
CGCGCGGCT
CGCCGTGCTC
CGGCTTCGAC
CCAGCTGCCC
CCGCAGCGAG
CGCGACTGTC
CGCGATCGTC
GCGGATGCTG
CCTCGACGGC
CGGGTTCCTG
GGCGCTGGCC
GCGGGCCGGC
CTCCTACCAG
GCTGACCGGC
AGGGCCCGCG
GGTGCGGGCG
GGCGACCCCG
CAGCAAGGCC
GCTCGTGGAG
CGGTACCGCC
GCAGCAGCGG
CGCCGTCGAG
CCAGGCCACG
GTCCAACATC
CGCGATGCC
CGACTGGGCG
CGGTCCGCGC
GCTGGAGCAG
GTCTGAGACG
GGCCTCCGACG
CTGGGTGCTC
GTGGCTGTCC
CGTCGGGTAC
CGCCGACCGC
CCACGTCCAC
CAGTCAGCGC
GCTCGACGAC
'GTTCGCGGCC
3CGCGCTGTTC
GGCCGCACTC
C' CTCGCTCGC(
CGCCGGTGG(
k GACGGAGGA(
:CGTCCTGTC(
3 CCGCAGGAC( r GCTCGACGG( r GGTCTCGAA( 3 GGTCCGGCAI r CGGCGTGCG' P, GGGCCTCGC, G CTCTCCCGC T GCGCCGCAA A CGGCACCGG T GCCCACGTA C GGTGGACAC T CAGCCTTCC
GCGGACATCG
GTCGAGGAGC
CAGGGCGGGA
CCCGGCGAC
CGGATGCGTT
TCGCTCGCCG
GCGACGCTCG
TTCCTCGGTG
GGTGCCGGTG
GCGATGAGCT
TCCGAGGGCG
CTGTACGACG
CACGACGCGG
ATGGACCCGC
ATCGAGCCGG
GACTACGCGG
AGCACGCCGA
ACGACCGTCG
CTGCGCAGCG
CACATGTTCG
TTCTCGGCGG
CGGCTCTCGG
GTCAACCAGG
GTGATCCGGC
ACGCACGGCA
TACGGCAAGG
GGACACACCC
CACGGCACCC
AACAGCGGCC
CGCGCCGCCC
GCGCCGGATC
GTAGCGATGC
GCCCCCGCGC
TCCGCCAAGC
GAGCCCGCCC
ACGCTCGCCI
GACGGGTTCC
CTGGACACCC
CCCGGCGCC(
ATCTGCGCCC
GAGGGCAGCC
GCCCTGGAG(
3CTCGGTCAC'
GACGCCGCC(
GCGATGCTCI
3 CGGTACGCGI
GGCGACGCG,
CGCGCGCTG,
3 TTCCGCGCC
:GTCACCGGC
ZGTCCGCGGC
3 ACCTGCCTG G GACACCCCC C GACTCCGCC G CGGTCGGAG A CCCGACTGG C TCCTTCCGG C GCCGGCCTC G GACCGGGAC ACTCGGACCG C TGCCCGAGGT G GCTCCGCCCA G GTACGGAGAT C CGCCGGAGGA C GTGTCGAGCT G TCTTCGACCA C ACCAGGAGAC G CCGGCGCCGG C GCCGCTACCC C CCGAGGGCAT C CCGACCCGGA C CCGAGTTCGA C AGCAGCGGAT G CATCGCTGCG C CCCGCGTCCC G GCGTCGCGTC G ACACCGCCTG C GCGAGTGCAC C TGGAGTTCAG C ACGCCGACGG C
ACGCCGCGC
ACGGCGCCAG(
AGGCGCTCGC
CGGAACCTC
AGCGGCCCGC
IAGGCCGCGGC
TGCCGAAGAC
TGGCCCTCGT
TCTCCTCCTT
CTGCTGGTGA
CTGGGACGGC
CCCCCGGCAG
ACGAGCAGTC
CCGACCTGTC
k CGAGCCGTAC
TGGACGGGCT
]CCCGGGACGG
GCCGTGAGCT
ACCTCGACGG
3 CGGAGGCCGC 3 TCGCGCTCTT r' CGGTCGCGA
:GCCTGGTCGC
3 CCGTCCAGCC 3 CACGTCTGGA G ACI3CGGCGCG C GGGTCAGCCA G TCCTGGAGAC C TGGCCGCCGG A CCCTCCGCTT
G-AGCTGGGCCC
G CGGATTCCC G CCGGCGCGCT A CCGAGACCGT C ACGCCTGGTT C GCCACCGCTA IC GTCTCGGCAC CG GCCTGCTGCT
TTCTACCTC
CGGCGCATC
GGCGCCAAC
CTCCTCGGT
GTCGCCGCC
CGCAACAGG
CCGACGCtG
GCGGACGCC
GCCGGCACC
GGTGACATC
.ACGCCGTTC
IGCGCTCGGC
:GCGGAGTTC
;CTCCTGACG
:GGCAGCAGC
;AACGCCCCG
;GGCCCTATC
TCGTCGTCG
;ATGGCGCTC
~CGTCAGCGG
TTCGGCGCC
~AACGGTCAC
AACGGGCTG
,GACGCCCGG
3CTGGGCCAC 3GAACGGCCG
~CGTGCGGCG
CCTCCACGCC
CACCGAGCCG
CGGCATCAC
GGTGCTTGGG
TGGGACCTCC
CCGTGAGGCG
GCTGCGCGGC
GGACGCGGAC
CGCCTTCGCG
GGCCACGCTG
CACCACCGCG
GTACGACCGG
TCACCTCGAA
GCTGCTCGAC
CCGGCTCGTC
GATCGCCCC
CGCGCGCGGC
CGCGGAGGAC
CGTCGCCCC
GGAGGCGGAG
CGCCTTCCAC
GGTGGAGTTC
CCCGGACGAC
CCTCGACGGC
CGACCGCTC
TGCCGGCTCC
CCGGCCCCGG
CGCGGACGCC
CGCCGGCTCC
CTGGCTGGAC
CGCCGACCAC
CACCGCCCGC
10681 10741 10801 10861 10921 10981 11041 11101 11161 11221 11281 11341 11401 11461 11521 11581 11641 11701 11761 11821 11881 11941 12001 12061 12121 12181 12241 12301 12361 12421 12481 12541 12601 12661 12721 12781 12841 12901 12961 13021 13081 13141 13201 13261 13321 13381 13441 13501 13561 13621 13681 13741 13801 13861 13921 13981 14041 14101 14161 CTCTCCCTGC C CCCGGCGCCG C GTGCGGGAGC I] CGCGTGACGG I] CGGCCCGTCT C CACGCGACCG C
GCCATGTGGCC
GACGGGAACG C
GAGGGTGAGG
ACCGCGAACG
GACGCTTCGC
GTCCCCTTCC2
CGTCTCGCCT
CCGCTGGTCT
AGCCGCGTCG
GGCGAACAGG2
CTGAAGGTCG
GCCGCGCTGT
CTGCCCGCGG
CTGGAGCTGC
GTCACCCGCG
CTGTCGCACG
TTCGGCCTTC
TCCGACGGG
GCCCGCCTGG
GGCACGGTCC
GTGGGCGAGT
GGCGCCGACG
TGCGACGTCG
CCGCTCACCG
ACGACGGAGG
GAACTCACCT
GCCGTCTTCG
CTCGCCTGGC
GCCGAGACCA
GCGGGCATCG
GACGACCGCC
GCCGGGAACG
GTCCGGGCCC
GGGACGGCGG
GTGGACGGGC
GTACTCGGCC
TTCGACTCCC
CT CC CGGCGA
GCCGAGCTGC
AACGGGACGA
CGCCTGGAAG
CTGGAGCACC
TCCGGAGCCC
GGAGCCGGGG
GAGGAACTCT
CGCCTCCCGC
GGGCGCCTCC
ACTACCTGCG
AGGCGAAGGC
TCGCCTCGCC
TCCCCCAGGA
GCAAGAGTTA
TCTTCGGGAT
;CACCCACCC
~GATGGTCGA
~GACCCTCCT
"CGGGGCGCC
CCTCCACTC
;TCTGCTGGC
~GCCGCAGGG
CCTCGCCTT
rCTTCGCCGA
;CGGCGGGAG
[GCACGCCAT
k~CTGGAGCGG
'CGCGGGGAC
,CGTGGAACG
3CGGGCTGAT
%CCCGCACGC
CCGCCGCCCT
CCCAGGACGT
GTCCCGCCGA
TCCAGGCCTG
GTGCGGTGCG
CGGCCGCCTG
1CGACCTGGC
GCCTGCGCGA
CCTCCGTCCG
TGCTGACCGG
GGGGCGTACG
AGCTCGTGCA
CCGACCGCGA
CGGTCGTCCA
ACGTGGAACA
CGACGCCCGC
GTGGCGCGGG
GCCGCCGGGC
GCGGCATGAC
GCGGGATCAG
ACCCGGTCCT
ACCCGGCCGG
GGCCGTCCGC
ACGGCGCGGC
CCGCACGGCA
ACGCCCGCGG
TGACCGCCGT
CCCTGGTCTT
CGCGCGGCGC
CGGCGTCCCC
GCGCCTTGGI
TGCGGTCCCI
CGGACGGCGC
GCGGGAGTGI
TCGGCCTCCI
CCCGGACCCC
AGGAACTCAI
TCGTGCCACC
GGGCGAGCCC
CGAGGACCTC
CCGCGGCTGC
CGCCCGCGAC
CTCGCCGCG(
GTGGCTCGCG
ACTCGCCGCG
TGAACCGCTG
GGCCGGAGAG
GCGGCTCGCC
CACCGACCGG
CGCCGAGGAG
CGGTCCGCTG
CATCGCGCTC
TGCGGCGGCG
CGCGGTCGGC
TGTCACCGTG
GGACGCCGTC
GCTCACGCTG
GCACCGGGTG
CACTTCGTAC
GGAGTCCGCG
GGCGGCCGGC
CGGCGGCGCG
GCTGGCCGAC
GGACCCCGAG
GGGTCTCGTA
CGACGACGCC
CGAACCGCAG
GCCCGAGACC
CGGCACCGGC
ACGCCTGCTG
CGAGCTGGAG
AGCCCTCACC
CACGGCAGGC
CGTACTGCGG
ATACGACCTG
GCAGGGCGCC
AGCCGGACTC
CGGCGAGCTC
CGACGCCGAG
GCTGCCCCTG
AATCCCGGCG
GGCCTCCGCC
GGAAACGGCC
GCGCCTGCTC
TCACCGGATC
CGAACTCCGC
CGACCACCCI
CTCGGACCAC
GAGCACCGCC
GCTGACGGGC
GCGCTCGATC
CGGGTCCGGC
GGACGGCGCC
CGACCAGGA(
GTCCCGGGC)
GGGGACAGCC
GCGGACCTCC
3GTGGCGATC(
'TGGCGGCTG(
GACGTGGAG(
.7 GCCGGATTC(
GAGGCCCTC(
GACCACGCCG
CACGCTGCGG
GTACTGCCCG
CCCGGTGGCG
GACGCGCCCG
CCCGAGCTTC
GTGCCGCTCG
TTCCAGGGGC
CCCGCCACCA
GCCCCCTACG
GGTCTCGTCG
CACGCGGCCG
TCGCTGTCCC
CGCCCGGTCA
GCCTGGCGTC
GGGCCGACCG
GGCGTCGAAG
GCCCCGGCGC
GAGGGTGTAC
GAGCACCTCG
GGGTCCGGCG
CGGACCGCGC
TCGTCGTACC
CTCGCCCTGC
GGCACCGCCG
GGCCTGGGCG
CTGGTGAGCC
GCCCTGGGAG
GCCGTACTCG
GTCCTCTCCG
CCCAAGGTCG
GCAGCGTTCG
TACGCCGCCG
CCCGCCCTCT
GGCCAGGCGG
GGCATCGCGC
CGGCTCGACC
CTCTTCCGGC
TCGACGACAC
GCGGTCACGC
CTCGAGTTCC
GACGCCGAAC
AACCGGCTCI
AGCCCGGCGC
GACGGAGCCC
GAGACGGACC
CTCTCGGACC
GTCACGGGCC
GCCGAGGAC(
3GGAGTGCCG(
CCCAGCACGC
k CCTCGACTC( 3TGTCCACGG'
ACGAGGCCC(
3 TCGGCATGG( 3 TGGCCGGCG( 3 CCCTGTACGJ
TGTACGAGG,
3 CCATGGACC TCCTGGGGAG C AGTCCGCCGG T AGCACGGTGG C AGTCGGCCGG G CCGGTACCGC C CCGTCGCGCC C ACGGTCTCTA C TGAACGCGGT G CGAATGCGAC C GCATCCACCC C ACGAGCCCGA G GTGCCGCGGC C TGACGGACGG C CCGCCGATCA C CGTACGCCCT C CCGTCCTCGG C TCGGGCTCTA CCCGTACCGT C
GGGGCACGGT
CGGGCACCCG
CCGACGATGG
AGACCGAGAA
GGACCCTGCC
ACGACGGCAC
CACCGGCGCT
GACTGGTCGC
GGCGGGGCAC
CCGACGTCTC
ACGCCATCCC
ACGGCACCCT
ACGCCGCGTT
TCATGTTCTC
CCAACGCCAC
CCCTCGGCTG
ACCTGCGCCG
TCCTCGACGC
CCGCCGGGCT
ACGTCGTCGG
CCGGGACGGC
TCGCCGACCG
TCGTCGGCGA
GGGGCTTCCT
ACTCCGCCGG
CACTCGCCTC
GGAACCGGAA
CGCTGCTGGC
CCCCCGGGAG
3AGACCGGGAC
'GGCCCTGGGC
3 ACTTCATGAA 3 ACTGATCCCT 3 AATCACTTCA r GAACGAAGAG 3 TGGCCGCCTC
:CTGCCGCCTG
3 CGAGGACGCG P, CCCGAACCCG
CGGGCGAGTTC
C GCAGCAGCGT
:GTCCTGCTC
'CTGCGTGAC
:GTCGAGCTG
GACGGCGCA
:TGGTCCTGC
:GACCGTGCG
:GAGCGGCTC
;TGGCGGTAC
GCGCCCGCG
GCCCTGCTC
;CTCGTCCGC
;GCCCGGGTC
~GAGGGACC
;GCGGCGGCG
GCCTCGTCC
AAGGACGAG
~CCCGACCTG
CTTGCGCCG
;GCCCGGACG
'CTGCTCCTG
GGCGAGGAC
'CCCGGCCGC
3TCGGTGCTC
:ATCAGGCTG
CGCCCCGGAG
CCGGCACGTG
GGACGCCCCG
GGTGGCCGCG
CGCCGAACAC
CCCGTCCATG
CCTCCTCGAC
CTCCGCCGCC
CCTCGACGCC
GGGCCTCTGG
GATGAGCCGC
CGCCCTCCGC
GCGGGACGCG
CGCCAGGACC
CGGCACGCCG
GGCCGCCACC
GGTCGCCGAA
CGACCTCGGC
TGGCCTCGCC
CCACCTGGAC
CGGGAACGAG
ACAACTGACC
CGAAGAAGTC
CGGGACCGCG
GGCCGGGGAC
CGCCTCGGCC
GCCGCACGGT
TGCGCGCCTC
AAGTACCTCG
CGCGAGCTGG
CCCGGCGGCG
ATCTCGGAGT
GAGGCCACGG
GACGCCGACT
CTCCTCCTGG
14221 14281 C 14341 C 14401 14461 14521 14581 14641 14701 14761 14821 14881 14941 15001 15061 15121 15181 15241 15301 15361 15421 15481 1554 1 15601 15661 15721 15781 15841 15901 15961 16021 16081 16141 16201 16261 16321 16381 16441 16501 16561 16621 16681 16741 16801 16861 16921 16981 17041 17101 17161 17221 17281 17341 17401 17461 17521 17581 17641 17701 GGCCTCCTG G ~GGTCGGCGT C GGAGGGCAT C ~CGCGTACAC G ,GCTGGTCGC C LCGCCGGCGG C 3CGGGCTGGC G 3GTCCGAGGG C kTCGGATCCT C TCACGGCTCC G GGCTCACGAC C kCCCGATCGA G CGCTGCGCCT C CCGGCGTGAT C rGGAGAAGCC C CCATGGACTG C GCGTCAGCGG C CTGCCTCCGA C TGGTGUTCGGC
CCTCGCAGGG
GGCG\CGCCGAC
AGGCGCTGAC
CGTTCGTGTT
TGTCGAAGGA
ACTGGTCGCT
ACGTCGTCCA
ACGGCGTGAC
TCGCCGGTGC
TCGCCGCCCA
CCCGGCAGCG
CCACCGTGGT
ACGGGGTCCG
CCATCGAGAG
CGTTCTTCTC
GGTACCGCAA
ACGAAGGCTT
CCGAGACCGT
TCACCTCACT
CCACCGCAAC
GGCTCCACGA
ACTGGAAGCG
TCGTCGTCGT
GCGCCGGCGC
CCGCGACGCT
TGCTCGCGTG
GCGCCACCCT
GCGTGACCCA
CCATGGTGTG
TGATCGACCT
CCGGCGGTAC
TCGTCCGCGC
CGGTGCTCGT
GCGACGGCGC
GCACCTCCGG
TGGGCGCGAC
TGCTCGCCGG
TCGACTCCGA
AGGCCACCGC
GTCCGCCCGT
GAGGCGTTC G TTCACCGGC G GAGGGCTAC C CTTGGCCTG G CTGCACCTC G :GTGACGGTC A CCGGACGGC C :CTCGGCGTC C :GCCGTGGTC C ~AACGGGCCG TI :TCCGACGTG G ;GCGCAGGCC G
GGGTCGTTG
AAGATGGTC C ;ACGGACCAG G ;CCGGACAAG C ACGAACGCG C ;GCGACCCG C ;AAGACTCCG C 'CGTACGGAC C 3TTCGAGCACC
'GCTCCGGAA
'CCCGGTCAG
3TTCGCGGCG
GGAGGCCGTC
GCCCGTGACC
GCCGCAGGCC
CCTCACCCTC
CCTCGCCGGC
CATCGAGAAC
I'TCGGGCGAC
CGCACGGATC
CGAACTCGCC
GACACTCGAA
CCTCCGCCAC
CACCCACTTC
CACCGGCCTC
CGCCGAAGCC
CGGCCACCAC
CTCCCCCGCC
CCTCGCGGTC
CCCCGAGGAC
CGACCCCGTA
GGGCGAGGCC
GGACGAGAGC
CACCCTGGTG
CGGCGCGGTG
GGGCATGGGC
GCCCTCGGAC
GGGTGAGGAC
CTCCCTCCCG
CACCGGTGCC
CGGACACCTC
TGCCGCCGAG
GGCCACCGTC
CGTCTCCGAC
GCCGCTCGCC
CGCGCTCCAC
CCTGGTCCTC
AGCACGCCG
TGATGTACC
TGGGCACCG
AGGGGCCGG
CCGTGCAGG
~TGTCGACGC
GGTCGAAGT
:TCCTCGTCG
GGGGCACCG
CGCAGCAGC
ACGTCGTCG
TCATCGCCA
LAGTCCAACA
AGGCGATGC
;TGGACTGGT
;GCGACGGCG
~ACGTCGTGC
;CCGTCGAGC
CCGCGCTGG
CCGCCGATC
,GGGCCGTCG
3GACTGATAC 3GCACGCAGT 3CCATGGCCG 3TCCGGCAGG rTCGCTGTCA
GTCGTCGGCC
GACGACGCCG
A.AGGGCGGCA
CTCCACGGAC
CCCACCCAGA
IkTCCCCGTCG
GAGGTCCTCG
GGCGCCTGGA
CGCGTCGGCT
A'TCGAGGTCA
GGCACCCTCC
TGGACCAACG
CCC GAGC TCC
GTCCAGGGCT
GCCGACGCG'I
CGTTCCGCCC
CAGCTGGACC
CTGGCGGCGC
GCGCACCCCC
CAGGCGCTGC
TCCGTCGGCC
CGGGTCGCCC
GCCGACCGG(
CAGGTCGCGC
GCGCACGGCI
GAGGAGCCT(
CTCCTCCACJ
GACTCCGGC(
GTGACCTGC(
GCGCACCCG(
GCGACCGAC(
CTGGACCGCI
TTCTCCTCG,
GGATCCCGGC
ACGACTACGC
GCAACTCCGG
CCGTCACGGT
CCCTGCGCAA
CCAGCACCTT
CCTTCTCGTC
AGCGCCT GT C
CCGTCAACCA.
GCGTCATCCG
AGGCCCACGG
CGTACGGGCA
TCGGACACAC
GCCACGGCGT
CCGCGGGCCC
GACTGCGCAG
TCGAAGAGGC
CGTCGGTCGG
ACGCCCAGAT
CGGCGCGGT
TCCGGCAC
GCGGCACGCC
GGGCCGGGAT
AGTGCGAGAG
CGCCGGGCGC
TGGTTTCGCT
ACTCGCAGGG
CCCGCGTCGT
TGATCTCCCT
TGTCGATCGC
TCCAAGAGCT
ACTACGCCTC
CCGGGCTCAG
TCACCGAGCC
TCGCCCCCGC
CCCCACCC
GCCGCGAACP
GCCTCACCAI
CCACCTACGC
CCGTGCAG
CCGAGCGCC
AGGCCCCCC
TGTCCCCGC9
CCGGTGGAGC
GCCACCCCGC
AGGACGCCC(
GGGCCGACC2
CCCTGGAGC)
CGGCCCTGGj 3TACGCGCCT( k. CGCCTTCGC( 3CGGCCCCCGi k CCACCCCCT(
'TCCCCGGGC'
3 ACCTCACGG,
'TCAGCGCCG'
3 CGGACGCGC
:TCCTGCGGG.
3 TCGCCGCGA GGCCACCGCC C CACCCGTCTC A CACTGTCGCC TI CGACACCCCC TI GGGCGAGGTC G CGTCGAGTTC P GACGGCCGAC G CGACCCCGT C GGACGGCGCC I ACGTCCCCTG C CACGGCTACG C GGGCCGTGAC C CCACGCCGCC C CCTGCCGAAG I
GGTCGAGCTG
GCCGCGGTC
CCCGGCGCC
CGCCGGCCTG
CGGACGCCTC
CGCTCGCGTA
CGGACAGGAC
CTCGGACGTG
GGGCGCCGAA
CGCGCTCTCC
GCCCACGCTG
GGCGAAGGTC
CGAGATCCC
CACCCTGCGC
CGCCCTCAC
CGCCGTCAAC
CGCTCAGGCG
CCACAGCCC
CCCGCGGACA
GGTGCTCGAC
CGTCGAGACC
CGTCCTCACC
GGGAGGCCAG
CGACTGGGCG
CTTCCAGCGC
CTCCTGGCGC
CGGCTGTCC
GGTGCTCCC
GGGCGACCGG
CCTCGACGC
CCCCTTCACC
3 CCTCGCCGCC k. CGTCACCTCC k. CCCCGAGCGC k. CCGCATGACC
CGGCTGCTC
GTGGTGGCAG
k. CGCCGCACC
:CGGCAGCGAA
1T CGTCGCCGAA k. CCGCAGGC T CCTCCACCTG T CGCCCGTGTC A GGCCGCGGCT T CTGGGGCGGC
:CCCGCACCT
.CCGATGTCC
CGGCCCCG
GCTCGTCCT
;ACATGGCGC
LCCCGTCAC
;GCACCAGCT
GCAAGGGCC
~GCAGCGGCC
;CGGACGCCC
~GACTCGGCG
;GCGAACAGC
CCGGTGTCT
~CGCTCCACG
TCACCGACG
rCCTCCTTCG 3AGGAGACCC 3TGCCGTGGC 3CCGCGTTCG
CTGGCCGGCG
GATTTCGCGC
GGCCGGGTGG
CTCCTCGACG
CGCTATGTCG
GAGCGGGTCG
TGGCAGCACC
GCCGCGTACG
A~GCAAGTCCA
GAGGAAGCGA
GGCCCCACCG
TGTGAGGCCG
CACGTCGAGA
CCTGAGGTC
GGCACCTACT
CTCGCCACCG
ATGACCGTCC
GAGCGTCTGG
CCCGTCCTCC
CGTCACTACT
TACCGCATCG
GGGCGCTGCC
GCGCTGTCCG
CAGCGGCTCG
GTCCTCTCC
CGGGGCACCG
CCGCTGTGGT
CCCGCCCAGG
TCGGGCCGGCC
ACGGTCCTCG
GCCCGCCGCC
GCCGACGGCA
CGGCTGGCCC
GGCGCCGAAG
CT CCGGAC C
GCCGCCCGGC
CCGCCCACCG
GTGACCGCGA
GCCGGAGGCC
GCCGGTCACG
17761 G 17821 G 17881 C 17941 C 18001 G 18061 C 18121 T 18181 T 18241 G 18301 A 18361 C 18421 TI 18481 T1 18541 C 18601 C 18661 18721 C 18781 '1 18841 18901 18961 19021 19081 19141 19201 19261 19321 19381 19441 19501 19561 19621 19681 19741 19801 19861 19921 19981 20041 20101 20161 20221 20281 20341 20401 20461 20521 20581 20641 20701 20761 20821 20881 20941 21001 21061 21121 21181 21241
CGCGTACGC
CCCCACCGT
GACCGGGGA
CGCCCTGGA
GTCGAGCTT
:CGAGGCGCG
'GAGCCGCGA
CCGCGAGCA
GGCCTTCCG
~GAACGCCAC
GCTGGCGGA
'TCCGGTGGA
~GCCGGGCGG
~GATCTCCGG
~GGACGCGTC
'CGACGCCGA
;GCTCCTCCT
~TCAGGGGCA
CCCGCAACAC
CGTCGGGCCG
:CTGCTCCTC
3CGGACTGGC
TCAGCCGGCA
rCGGCTTCGG
GCCGCAACGG
CGAGCAACGG
TCGCGGACGC
CGCGACTCGC
ACACCGAACPz
CCGCCGGTGI
AGACGCTCCI
TGCTCACCGI
TCTCCTCCT]
ACGAGGACGC
CGAAGACTC(
GCCGTACGGI
AGTTCGAGCI
CCGCGCCTGJ
TCCCGGGACj
AGTTCGCGG(
TGGAGGCCG'
AGCCCGTGA(
CCCCGCAAG,
CCCTGAGCC'
ACCTCGCGG,
GACTGGCCG
TTTCGGGCG
GCGCACGGA
GCGAACTCG
CCACCCTCG
ACCTCCGCC
TCACCCACT
TCACCGGCC
TCGCCGAGG
CCACCCACC
ACCTCTCCC
TCGGCGCGC
TCCGTACGC
CGGCGTTCC
CGCCGGTACG
GACCTCGGTG
GCGGCTGCGC
CACCGCGCTC
CGCCCCCGGC
CCGCGCGCTC
GCTCGGTGCG
CCTCGCCGTG
TGACCTCGGA
CGGCCTGGCC
GTTCCTCCTC
CGGCGGGGTC
TGTCGCCTCG
CTTCCCGCAG
CGGGCGGACG
CTTCTTCGGG
GGAGACCTCC
GCAGGTCGGC
CGCCGAGGAT
TGTCTCGTAC
CTCGCTGGTC
GCTCGCGGGC
GCGCGGGCTC
CCCGGCGGAG
ACACCGTGTG
CCTGACCGCC
CCGACTGACC
CGACCCGATC
GCCGCTGCGC
CTCCGGCATC
CGTGGACCGC
GGCCATGGAC
CGGCATCAGC
CCCGGCGGAC
GGCCGCGCTC
k. CGCCGCCGA'.
k. CCGGGCCGT( k. GGGTCTGGT(
GGGCACGCA(
2GGCCATGGC( r CGTCCGACA( 2GTTCGCCGTI 2CGTCGTCGG, 1T GGACGACGC' G CCAGGGCGG G GTTCGACGG A CCCGACCCA T CATCCCCGT C CGACGTCCT A AGGCGCCTG A TCGTGTGGG T CGTCGAGGT T CGGCACCCT ;C CTGGGCCAP :C CGATCTGCC ;C CGCCGGTGP ;C CGTGGCGCI 'A CCCCTGGC I ;T GGAGCTGGC
GCCTTCCTCG
GCCTGGAGCC
CGCCTCGGCC
GGCCACGGCG
TTCACCACGG
GACGAGCAGC
CTCACCGGCG
GTCCTCAACC
TTCGACTCGC
CTCCCGGCCA
GCGGAGATCC
GACGACGAGC
CCGGAGGACC
GACCGCGGCT
TACTGCCGTG
ATCTCGCCGC
TGGGAGGCCG
GTGTTCGCGG
CTTGAGGGTT
ACCCTCGGCC
GCCCTGCACC
GGTGTGACGG
GCGGAGGACG
GGCGTCGGCA
CTGGCGGTCG
CCGAACGGGC
ACCGCCGACG
GAGGCACAGG
CTGGGGTCGT
ATCAAGATGC
CCGTCGGACC
TGGCCGAGGP
GGCACGAACC
GAGCCGTCGC
GACGCCCAG;
PCCGGGCGCGC
7 GCGCTCGGCI
CGGGGTGTGC
3 TGGGCCGGGI 2GAGTGCGAG( 7, GCCCCCGGC(
ATGGTCTCG(
Z: CACTCGCAG( C GCTCGTGTC( C ATGCTGTCC( G CTGTCCGTC, G ATCCAAGAG, C GACTACGCC G GCGGGGTTG G ATCACCGAA C TTCGCCCCG C AGCGCCCAC 'C CGCCGTGAC ,C GGCCTCACC :C ACCTACGCC LC ATCACCTCC C GCGGACTCC 'G GCGGACCAC 'G TTCCGAGCC
ACGCCCTCGC
CCTGGGAGGG
TGCGCCCCCT
ACACCGCCGT
CCCGGCCGGG
AGTCGACGAC
CCGAACAGCA
ACCCCTCCCC
TGACGGCGGT
CTCTGGTCTT
TGGGCGAGCA
CCGTCGCGAT
TGTGGCGGCT
GGGACGTGGA
CCGGTGGCTT
GCGAGGCCCT
TCGAGGACGC
GCACCAACGG
ACGTCGGGAC
TGGAGGGCCC
TCGCCGTGCA
TCATGTCGAC
GCCGGTCGAA
TGCTCCTCGT
TGCGCGGCAG
CCTCGCAGCA
TGGACGTCGI
CCCTCATCGC
TGAAGTCCAP
TCCAGGCGAI
AGATCGACTC
AGCAGGAGGC
CGCACATCG I
TCGGCGGTGI
TCGGACGCCJ
TCGCTCGCG)
k CCGGACAGGI 3CCTCCGGTG' k. TGGGTGCCGJ 3 CCGCGCTCG( 3 CGCCCACGC'
TGGCGAAGG'
3 GCGAGATCGI
TGACCCTGC'
-TCGCGCTGA
3 CCGCCGTCA C TCGCTCAGG T CCCACAGCG T CCCCCCAGA C CCGCCCTCG G CCGTCGAAA C CCGTCCTCIA A ACGGCGGAC G TCGACTGGC T TCCAGACCC :G CCGGTCTCC :G ACGGCTGCC :G CGGTGGCCC :G GGGACCAGC
CGGTCAGCAC
CAGCCGCGTC
CGCCCCCGCG
CACGATCGCC
CACCCTCCTC
GGCCGCCGAC
GCGCCGTATG
CGAGGCCGTC
CGAGCTCCGC
CGACTACCCG
GGCCGGTGCC
CGTCGGCATG
GGTGGCCGGC
GGGGCTGTAC
CCTCGACGAG
CGCCATGGAC
CGGGATCGAC
CCCCCACTAC
GGGCAACGCC
GGCCGTCACG
GGCCCTGCGC
GCCCACGACG
GGCGTTCGCC
CGAGCGCCTG
CGCGGTCAAC
GCGCGTCATC
CGAGGCCCAC
CACCTACGGC
CATCGGACAC
GCGCCACGGC
GTCGGCGGGC
CGGGCTGCGC
GCTCGAAGA-
GGTGCCGTGC
CGCCGCGTTC
rACTGGCCGGC k CGACCTGGCC P GGGTCGAGT(
ACTCCTCGA(
2TCCGTACGT( r GGAGCGGGT( r CTGGCAGCA( C CGCCGCGTA' 3 CAGCAAGTC, G CGAGGCGGC, A CGGGCCTAC C GTGTGAGGC C CCACGTCGA C ACCCCAGGT A CGGCGGCTA .C CCTGGCCAC .C CATGGCCCT 'A GCACCGCCT C CTCTCTCCT A GCGCTACTC ;G GGCGGCCGT 'T GCTCACGGC ;G CACCGTGCI ;T CGGTTGCGI
CGGGCCGACG
ACCGAGGGTG
ACGGCGCTCA
GACGTCGACT
GCCGATCTGC
GACACCGTCC
CAGGAGTTGG
GACACGGGGC
AACCGCCTCA
ACCCCCCGGA
GGCGAGCAGC
GCGTGCCGCC
GGCGAGGACG
GACCCGGACC
GCGGGCGAGT
CCGCAGCAGC
CCGACCTCCC
GAGCCGCTGC
GCCAGCATCA
GTCGACACCG
AAGGGCGAAT
TTCGTGGAGT
GCGTCGGCGG
TCGGACGCCC
CAGGACGGCG
CGGCGCGCGC
GGCACGGGCA
CAGGGGCGCG
ACCCAGGCCG
GTCCTGCCGA
ACGGTCGAGC
CGCGCGGCCG
GCCCCGGTCG
CTCGTGTCCG
3GCCTCGCAGG
'GGGCGTGCGC
3GCCGCACTGG 3 GCGTTCGTGT
GTGTCGAAGG
3 GACTGGTCGC 2GATGTCGTCC 2CACGGGGTGA Z GTCGCCGGTG C ATCGGCGCCC C GTTGTGGAGC C GCCACCGTGG C GACGGGGTCC G ACCATCGAGA C CCCTTCTTCT ,C TGGTACCGCA C GACGAAGGCT G CCCGAGACCG C ACCACCTCCC C CCCACCACGA ;G CCGCAGCCCG LG CACCCGCTGC ;G AGCCTCTCCC 'G CTGCCGGGAA T CTGGTCGAGG WO 99/61599 WO 9961599PCT/US99/1 1814 -23 21301 21361 21421 21481 21541 21601 21661 21721 21781 21841 21901 21961 22021 22081 22141 22201 22261 22321 22381 22441 22501 22561 22621 22681 22741 22801 22861 22921 22981 23041 23101 23161 23221 23281 23341 23401 23461 23521 23581 23641 23701 23761 23821 23881 23941 24001 24061 24121 24181 24241 24301 24361 24421 24481 24541 24601 24661 24721 24781
AGCTCACCCT
CCGTCGGCGC
PACGCGCCGGG
ACCGCACCGC
ACGTGGACGG
AGGGCGTCCG
CCGAGGTCGC
CCGTGCAGGC
GGAGCGGGAT
CCGGCCCGGA
CGGACTCCCT
CTCTGGACGC
CCGGCGCGGT
GCACCGAGGT
AGACCCCGGC
AGCATGTCCG
AGCGGTTCAC
GCGACGGCCT
CGGAGAGCCC
ACGCCACCGC
ACGCCGCCCT
TCCGGGACGG
ACGGCCTCGC
GGCGTCTGGA
CCGAGACCCT
CCGGTCTCAA
TGGGCACCGA
CCGGCGACCG
CGCGGACCGT
TGGTGTTCCT
GCCTCCTGGT
ACTGGGGCGT
TCGGCCTGGA
GTGCCGCTTC
TCGACGCCTC
ACGTCCGCGA
ACCTGGGCGA
TCGAGGACGG
ACGCCTTCCG
CGGGCCTCGA
TCGTGGCCCG
GGGGCACGGA
ACGTCTCGGT
CGATCCCCGC
GCACCCTCCC
CCGCGTTCCT
TGTTCTCCTC
ACGCCACCCT
TCGGCTGGGG
GCTCGCGGCT
TGGACGCGGC
CGCTCCGCGC
GATCGCGGGT
AGGCGGACAC
ACCTGCGGGA
GGGTGGACCT
TCCGCAACCG
ACCCCACCCC
GGTCCTGGGC
C.GACGCGCCG C GAGCGACGAG 9] C~GAGGCGGAG 9I CCCCGTCGCC C rCTGTACGAG rGGTGTCTGG
CGGTGCCGAG
GGCCGGTGCG
.TCCCTGTAC
.ACGGTGTCC
CACGGTGCTG
GCTGCACCTG
CGTGCTGGGC
CCTGTCCTTC
CCCGGCGACC
CGAGGCCCTG
CGATGGGCGC
GCGGTCCACG
GGGCCGGTTC
CGGGGACGGC
CGGCAGCGCC
GGCGCTCCTC
CGCGGCCGAC
GCCCGGTACG
CGCCCCGGAG
CTTCCGCGAC
GGGAGCCGGC
GGTCATGGGC
CGCGCGGATG
GACGGCCGTC
CCACTCCGCC
GGAGGTCCAC
CGACGCGCAC
CGGCGGGGCG
GCTGCGCCTG
CGCGGAGCGG
GGCCGGGCCG
GGTGCTCCGG
GCACGTCAGC
CCCGGAGGGT
GCACGTGGTG
CGCCCCGGGC
GGCCGCGTGC
CGAACACCCG
CTCGATGACA
CCTCGACGAA
CGCCGCCGCC
CGACGCCCTC
CCTCTGGGCC
GGCCCGTTCC
CATGCGCCGC
CCAGCAGCGC
CGGCGGCGCG
GGACCTCGGC
CCTCGTCCGT
GGAGCGGGCC
TCTCAACGCC
GGGGGAGCTC
GGAAGGCACC
~TCGTGCTGC C 'CCGGGCGTC G [GGACGCGGC A ;ACCCGGAGG C GCTTCGCGG C 'GGCGTGGCG T 3GCGCGCGGT TI 3GCGGGGCGT T1 3CGGTCGGCG C 3TGAGCGccG C 'CCGTCGACC C
'TGGAGTGGAC
GGCGACGCCG
CCGGACCTTAC
GTCCTGGTGGC
CACGGGTCGC I
CTGGTGCTCG
GGACAGGCCG
GTCCTGCTCG1
CTGACGACCG
CTCGCGACCG
GTACCCCGCC
GGCCTCGCCG
GACGGCAGCC
CCGCTCGGCC
GTCCTGATCG
GTGGTCACCG
CTGCTCTCCG
CCCGAGGGGT
TACGCCCTGC
GCCGGTGGCG
GGCACGGCGA
ATCGCCTCCT
GGCATGGACG
CTCGGGCCGG
GTCGCCGCCG
GAGCGGATCG
CACCTGCCCG
CAGGCCCGCC
ACGGTCCTGC
GGCGAGTGGG
GCCGGCGAGC
GACGTCGCCG
CTCACCGCGG
GCGGAGGATG
CTCACCTCGA
GTCTTCGGTG
GCCTGGCGCC
GAGACCAGCG
GGGGCGACGC
GACGACCCGG
GACGGCATGC
CCGGTCAACC
GGGCGGCTCG
ACGCACGTGG
TTCCGCGACA
GCGACCGGGC
GCCGGGCACC
GGGTCCGGAG
CCGTCGTGG C ~TACCTTCGG C LCGCCACCGG I] :GTGGCCGCC C GAACGGCTA C
LCGAGGTGTT(
'CGGCCTTCA
CGGCGCGGG
~CACCGCCCT
CGACTCCTC
~CGCGCAGCT
~CGCCTGGGA
CGGTCTCGC
~GGACCTGGT
~CTGCCCCGC
CGCGCTGAT
rGACCCGCGA
'CGTCTGGGG
kCCTCGCCGG 3GGACGCCAC
CCTCGGCTC
EGGCGCGGGC
TCTGCCGCT
TGGAGAGCCT
CGGGACAGGT
CCCTCGGCAT
CGACCGGCCC
GCGCGTACGC
GGACGTTCGC
GCGACCTGGC
TGGGCATGGC
GTCACGGGAA
CCCGCACCCT
TCGTACTGAA
GCGGCCGGTT
PCCACCCCGG
GCGAGATGCT
TCACGACCTG
ACACGGGCAA
TGACCGGCGG
GCGTACGACG
TCGTGCACGA
ACCGCGAAGC
TCGTCCACAC
TGGAACACGT
CGCCCGGCTA
GCGCGGGGCA
GCCGGACAGC
GCATGACCGG
CCATGGACAG
CGCTCGTCCC
TGGCGCCGCT
AGCGCAGGGC
CCGCGAT GAG
CGACCGTCCT
CCGGTTTCGA
TGCGGCTGCC
TGCTCGACGA
ACACGGCCTC
~GCGGTCCGT G ;CTCTACGCG C 'GTGCTGGCC G CCGGGCGCC C ~GGCTACGGC C GCCGACGTG C CCGGCGCTG C ~ACGCGGCTG C 'CGCGTGCGG C 'GGGCAGCCG C 3GCGGCCTTC IZ
MGGTGCCGCGC
'GCGGCGCTG(
3GAGGCCGTC
CGCCGGCCCC
GCAGGCCTGG
CGCGGTCGCC
CCTCGGCCGG
GGAAGCCCG
CGTCGGCGGC
GGGCGAGCCG
CGCCGCGCCC
GCCCGCCGCT
CACGGCGGCG
CCGCATCGCG
GTACCCCGAT
CGGCGTCACG
CCCGGTCGTC
CCAGGGCGCC
GGACGTCAAG
CGCCGTGCAG
GTGGGACGCC
GGACTTCGAG
CTCGCTCGCC
CGTGGAGATG
TGTCGGCTAC
CGCCGAGGTC
GGACGTGCGC
GGTCGTCCTC
CACCGGTGCG
CCTGCTGCTC
GCTGGAGGCC
CCTCACCGCC
GGCAGGCGTC
ACTGCGTCCC
CGACCTGGCA
GGGCGCCTAC
CGGACTCCCC
CGGACTCAGC
CGAGCTGACC
GATCGCCCTG
GCTCAGCGGG
AGCCGCCGGA
ACCGGACGAC
GGGACACGGC
CTCGCTCACC
GGCCACGCTG
ACTCGCCACG
GGCGACCGAT
;TGCAGCTGT
ACCCGGAGG
;CCCGTGCGG
;AGCCGGTGG
~CCCTCTTCC
;CCCTGCCGG
TCGACGCCG
~CGTTCGCCT
~TGGCCCCCG
;TGTTCGCCG
~GCGACCCGA
,AGGCCCTGC
,GCGCCGGTG
3ACCGGGGCG 3GTGGGCCGG
,TGGCCGACG
3CCCGTTCCG rCCGCGCAGA
%CGGCCGGGG
kCCTCTGGAG
CAGCTCGCCC
GCCGCGGCCG
CCGGCCCTCT
CCCGGCGACG
PTCCGGGCCA
CCGGCG CT GA
CACCTCGCCC
GTGGCGGACG
TCCGTGCCGG
CCCGGCGAGC
CTCGCCCGGC
CTGCGCGCGC
TCCGCGTTCC
CGCGAGTTCG
GGGAAGACCG
CGCGCCTTCG
ATCGCCCTCT
CGGGCCCGCG
ACGATGCCGT
CTGGGGGGCA
GTGAGCCGGC
CTGGGAGCCG
GTACTCGACT
CTCTCCGACG
AAGGTCGACG
GCGTTCGTCA
GCCGCCGCCA
GCCCTCTCCC
GACACCGACC
CTGTCCCTCC
GACGTCGCCG
CTCACCCGCG
GGCGCGGGCG
CGGGTCGCGC
ACCCCGAGCC
GCCGTCGAAC
GTCTTCGACC
GCCGCGGGCG
CGGCAGACCA
24841 24901 24961 25021 25081 25141 25201 25261 25321 25381 25441 25501 25561 2562 1 25681 25741 25801 25861 25921 25981 26041 26101 26161 26221 26281 26341 26401 26461 26521 26581 26641 26701 26761 26821 26881 26941 27001 27061 27121 27181 27241 27301 27361 27421 27481 27541 27601 27661 27721 27781 27841 27901 27961 28021 28081 28141 28201 28261 28321
CGGCGGCCCT
CCGGCGGCCG
PCGGCGACGA
ACAAGGAGCT
PCCAGCCCCC
CAACGAAGAC
CAGGCGTCTG
CTGCCGCCTG
CGGGGACGCG
CCCCGACCCG
CGGCGAGTTC
GCAGCAGCGA
GACGGCCCTG
CTCGGGGCAG
GGCGCTGGGC
GACCGTGGAC
CCGCAAGGGC
CCTGTTCGTG
CGCCACCTCG
CCTGTCGGAC
CAACCAGGAC
CATCCGACGG
GCACGGCACG
CGGCCAGGAG
GCACACGCAG
CGGACTGCTG
GGGCACGGTG
GCGCCGCGCG
GGAGGCCCCG
GCCGTGGCCG
CGCGTACGCG
CAGCCGTACG
GGACGCCCTG
GGCGTTCGTC
CAGCTCACCG
CGACTGGTCT
CGACGTCGTC
CCACGGCATC
CGTCGCCGGT
CATCGCCGCC
CGTCCTGAAG
CGCCACCGTC
CGACGGCGTC
GATCATCGAG
GCCGTTCTTC
CTGGTACCGC
TGACGGCTTC
CGAGACCGTC
CACCTCACTC
CACCGCAACC
GCTGCAGAGC
GCCGCTGACG
CGAGCCAGAA
GGAAGCCGGG
CGGCGACGGC
GGTGCAGGCA
GGTCTCCGTC
CGGCCGCGTC
CCAGCCCGAT
'GCCGAACTC
TCCGGAGCTC
GCCACCGAC
GGGCGACTCC
CTCACACACG
DAGCTCCGCG
CGCGAGATCG
CCGGGCGGTG
ATCTCGGAGT
GACGCGTCCG
GACGCCGACT
CTGT CCCT CA
AAGGGCAGCG
PCCACCGCCG
TTCCTGTCCG
PCGGCCTGCT
GAGTGCGACA
CAGTTCAGCC
GCGGACGGCT
GCCCGCCGCA
GGCGCCAGCA
GCCCTGGCGG
GGCACGCGGC
AAGAGCAGCG
GCCGCGGCCG
CCGAAGACGC
GAACTCCTCA
GCTGTCTCCT
GCGGTCGAGG
GTGTCCGCGA
GACGGTCGTA
GCGATGGAGC
CGGATGCCGG
TTCCCCGGCC
GAGTTCGCTG
CTTGAAGCCG
CAGCCCGTGA
ACCCCCCAGG
GCACTCACCC
CACCTCGCCG
CGACTGAGCG
GTCTCCGGCG
CGTGCGCGGA
AAGGAGCTGG
TCCACCCTCG
AACCTGCGCC
ACCCACTTCA
ACCGGCCTCG
GCCGAAGCCT
GGCCACCACC
TCCGCGCCCP
GCCTCCGGCC
GCCGAGCTGC
GCGGACGACE
TTCACCGGCC
CTCGGCGACC
GGACGTCTCC
GTCGCCCTTC
GCCGCCGCCC
GACCGGCTGG
GCCGCCCGGC
CTGGACGAGG
GACTTCTGAC
GAACACGGAA
ACTACCTCAA
AGGGACGCAC
TCGCCTCGCC
TCCCGCAGGA
GCAGGACGTA
TCTTCGGGAT
CCACCGCGTG
GCCTCGGCGT
TGCAGTCGCC
GCCGTATCGC
CGTCCTCGCT
TGGCCCTCGC
GGCAGCGCGG
TCGGCCCCGC
ACGGACACCG
ACGGCCTCAC
ACGCCCGGCT
TCGGCGACCC
AACAGCCGCT
GTGTCGCAGG
TGCACGTCGA
CCGAGGCCGT
CCTTCGGCAT
ACTCCCCGGC
AGACTCCGGC
CGGACGTGGA
ACCGCGCGGT
AAGGACTGGT
AGGGCACGCA
CCTCGATGGC
TCGTCCGACA
CCTTCGCTGT
CCGTCGTCGG
TCGACGACGC
GCAAGGGCGG
ACTTCGACGG
ACCCGACCCP
TCATCCCGGI
CCGAGGTCCI
AAGGCACCTC
ATCGCGTGGC
TCGAGGTCAC
GCACCCTCCC
GGGCCAACGC
CCGAGCTCCC
LCCAGCGCCGC
AGGCGGACCI
TGGGCGCGCI
ACCGTGAGGC
TGGTCTCGC
CCGGAATCAI
ACACCCCCGC
AGCACCCCGI
TCGCCCACC'
AAGGCGTGCT
TCAGGGCGCT
CGTCCGACGA
CTGCCCGACA
CGGACAGGCG
GCGCGTCACC
GCACGAGCCG
CGAGGACCTG
CCGCGGCTGG
CTGCCGGTCC
CTCGCCGCGC
GGAGGCGATC
CTTCGTCGGC
CGAGCTGGAG
GTACGTCCTC
GGTCGCCCTG
CGGTGGTGTC
GCTGGCCGCG
GGAGGGCGCC
GATCCTCGCG
GGCTCCGCAC
CGCGCCGGGT
GATCGAGGCG
GAGGCTGGGC
TGTCATCAAG
CGAGCCCTCG
CGACTGGCCG
CAGCGGGACG
CGTCGAGCCG
CGCGCTGGAC
TCCGGCGGTG
CGCGGTCGGC
ACGCGGCACG
GTGGGCCGGC
CGAATGCGAG
GGAACCCGGC
CATGGTCTCG
CCACTCGCAG
CGCCCGCGTC
CATGATCTCC
ACTCTCCGTC
GATCGAGGAP
CGACTACGCC
CGCCGGACTC
GAT CACC GAG.
CTTCGCCCCC
CGCCCACCCC
CCGCGAACAC
CCTCACCATC
CACCTACGCC
CGACGACTGC
GTCCGGGCGC
GAAGGCCGC(
CCTCGCCGC(
CCTCGACGA(
k. GGCGCCCCTC
'CGACCCCGA(
k ACGCTGGGC( r' CGTCACCGC
CGCCTCCCTC
GGCCGCGGCC
CGACCTCTTC
CCACCGGCAC
AGAACGGGAG
GCCGAGCTGC
GTGGCGATCG
TGGCAGCTGG
GACGTGGAGG
GGCGGATTCC
GAGGCCCTCG
GAGAGCGCGG
GGCTGGCACA
GGCCACCTGG
GGTACGGACG
CACCTCGCCG
ACGGTCATGC
GACGGCCGGT
GGAGTCCTGC
GTCGTCCGCG
GGGCCCTCCC
GACGTGGACG
CAGGCCCTCA
GCGTTGAAGT
ATGGTCCAGG
GACCAGATCG
GAGAAGCAGG
AACGCGCACG
CCGGCCGGTG
GCCCAGATCG
GCCGCCCGCG
GACAGCCGGG
TCCTCGGACG
ATGGGCGCCG
ACCGCGCTCT
GCACCCACGC
CTGGCGAAGG
GGCGAGATCG
GTCACCCTGC
CTCGCCCTCG
GCCGCCGTCA
LCTCGCCCGCA
TCCCACAGCC
GCCCCGCAGG
CCGGTGCTCG
GCCGTGGAGA
GTCCTCACCA
GGAGGCCAGG
GACTGGGCGC
TTCCAGACCG
CGTTACCGCG
;TGGATCGTCG
GGAGCGGAGG
CGGCTCACCG
CTCGTGCCAC
3TGGTCCGTCA
'CGGGCCATGC
'GGCCTCGTCG
CTCTCCGGCG
3CGCCCGCCG
'TGGGGGACG
[CCTTCATCG
:ACCGGCACC
CCATGGCGAA
DGCAGAACAC
rGGGCATGGC rGGCCGGGGA
GGCTGTACGA
rGCACGACGC
CCATGGACCC
GCATCGACCC
CCGGCTACAC
I'CAGCGGCGC
GACCGGCCCT
TGCAGGCCCT
CCAACGCGGA
CGAAGGCGTT
TGGTGGAGCG
GCAGCGCGGT
A~GCAGCGCGT
TCGTCGAGGC
TCGCCACCTA
CGAACATCGG
CGATGCGCCA
ACTGGTCGGC
ACGGCGGGCT
TCGTCCTGGA
GCGGTGTGGT
GGCAGCTCGC
CCCTGGTCGA
AGGCACTGCG
TGGGCCGGGT
AACTCCTTGA
CCCGCTACGT
TCGACCGCGT
TCTGGCAGCA
CCGCCGCGTA
GCAGCAAGTC
ACGAGGCGGC.
ACGGCCCCAC
CCTGCGAGGC
GGCAGGTCGA
CTCCGCACGT
ACGGCACCTA
CCTTGGCGGT
TGACCCTCCC
AGCGTCTGGT
CCATCCTCCC
AGCGCTTCTG
TCGAGTGGAA
CCGTCGGGAG
TCGACGTACT
CACTGACGAC
AGGTCGCCTG
CCCAGGGCGC
TCTGGGGCCT
ACCTCCCCGC
CCACCGGCGA
28381 C 28441 C 28501 C 28561 C 28621 C 28681 C 28741 C 28801C 28861 28921 28981 29041 29101 29161 29221 29281 29341 29401 29461 29521 29581 29641 29701 29761 29821 29881 29941 30001 30061 30121 30181 30241 30301 30361 30421 30481 30541 30601 30661 30721 30781 30841 30901 30961 31021 31081 31141 31201 31261 31321 31381 31441 31501 31561 31621 31681 31741 31801 31861
;GACCAGATC
CACGGACGT
ACCGGAGCC
CTCCTCGTC
~ACCGCATCG
;CGCACCCTC
~GGCGCACCG
'GGCGCGAAG
,GCCTTCGTC
,GCGGCGGCC
3GCGACCTCG
'GCGTACTGG
GGCCAAGGCC
GTTCGCGCCC
CCGGCAGGCG
GACCGGGCAG
GGCGCTCCTC
CCGGGTGGCC
GCTCCGCAAC
CCACCCGACG
GCCGGCCCCG
CCGGC-TGCGG
GCCGGGTTCC
GATCGACGAC
ACCCGACCGC
CACACGCCCA
ACAGTTGGTG
CCc3TCGCCGG
CGCGGGCGGA
GGTCTCCGAG
CGGGCGCAAG
CGACGCGGCC
GCAGCTCCTC
CCGCGGCACC
CCGGGTCGCC
CTCCGGGCGC
GTGCTCCTCT
CTCGACGGCP
CAGCAGCCAG
CGGCCTCGCC
GCGCAAGGGC
GAGCAACGGC
GGCCGACGCC
CCGTCTCGGC
CCCGGGGCAC
TTCGGGTGTC
GACCCTGCAC
GCTCACCGAC
CGCGTTCGGC
GGAGTCCCC'
GAAGACCTC(
CACGGACGTC
GCACCGCGC(
GGAAGGACT(
CCAGGGCAC(
CGCCGCCAT(
CGTCGTCCG,
CACCTTCGCi GGCCGT CAT,
GCCATCCGCA
CGGCCCACCC
CTCGGCAGCC
AGCCGCAGCG
GGCGCCCGCG
CTCGACGCCA
GGCGGCGATC
ACGAGCGGCG
CTCTACTCCT
AACGCCCACC
GTCGCCTGGG
CAGCGTCGCG
CTGAGCCACG
GCGTTCACGG
CTCGCCGCAC
TCGTCGGCGC
ACCCTCGTCC
CCCGGCCGTG
CAGCTCTCCA
CCCGCCGCAC
ACGGACTGGG
GACGCGGGGG
GGCGGTTCGG
CTGGACGCCG
GGTCCTGCCC
CAACCCCATC
GACGCTCTGC
GCCGACCGTC
ATCCGGTCCC
GTACCGGAGG
GGCACGACGT
TTCTTCGGGA
GAAGCCTCCT
GACGTCGGCG
CCCGAAGGCP.
ATCGCGTACI
TCGCTCGTCC
CTCGTGGGCC
CAGGCCATGC
*TGGGGCGAGC
CACCGGGTCC
CTCACGGCTC
CGGCTCACGI
-GACCCGATCC
CCGCTGCGGC
-GCCGGTGTC2
-GTGGACGAG(
3 CCGTGGAC' -GTGGGCGGGj r GCCGTCGAG( 3GCCGCACTG( 3GATCCGGCG(
IGTCGCGGTC(
3GTACGGGGC, 3CAGTGGGCC, 3GCCGAATGC, k. CAGGCTCCC.
ZGTCATGGTC
C GGCCACTCC
CCACCGGACT
GCGACTGGCA
ACGCCGCACG
GCGAACAAGC
TCACCATCGC
TCCCCGCCGA
CGCTGGACGT
CCGAGGTCCT
CGAACGCCGG
TCGACGCGCT
GCCTCTGGGC
GCATCCGTCC
ACGAGACCTT
TGTCCCGTCC
CCGTCGGTGC
TGGCCGCGAT
GTACCCACGC
CCTTCACCGA
CGGTGGTCGG
TCGCCGCGCA
AGc3GGCGGGT
TCCTCGACAC
ACGGCGGCGC
AGGCCCTGAT
CACGCGCCGC
CACGAGCc3GA
GCGCCTCTCT
GGCAGGAGCC
CCGAGGACCT
AGCGCGGCTG
ACGTCCGCAA
TCTCGCCGCG
GGGAGGTCTT
TGTACGTGGC
CCGGCGGTTP
CCCTCGGCCI
CCCTGCACCI
GCGTGGCCG1
CCGCCGACGC
GCGTCGCCGI
TGGCCGTCG]
CGCACGGGCC
CGAGCGACG]
AGGCGCAGGC
TGGGGACGC'
k TCAAGATGG'.
CGACGGACCj r GGCCGGAGC( k. CGAACGCGCA
'CGCCGGCCG
3 ACGCCCAGA' 3 TGGCCGCCC' :3 GCGACAGCC, k. CGGTCACCG.
3 GCATGGGCG G AGACCGCAC A. GCGCACCGA T CCCTCGCCA C AGGGCGAGA
CCACGCCCGC
GCCCCACGGC
CTGGATGGCC
CCCCGGAGCC
CGCCTGCGAC
GACGCCCCTC
CACCGGCCCG
CGACGACCTG
GGTCTGGGGC
CGCCGCCCGG
CGGCGACGGC
GATGAGCCCC
CGTCGCCGTG
CAGCCTTCTG
CCCGGCTCCC
CACCGCGCTC
GGCGGCCGTA
GCTCGGCTTC
CAACAGGCTC
CCTCCACGAG
GCGCCGGGCC
CGTCCTGCGC
CGCCGACCCT
CCGGATGGCT
ACCCCGCGCA
AGACCACACC
CAAGGAGAAC
CATGGCGATC
CTGGGACGCC
GGACATCGAC
CGCCGCGTTC
CGAGGCCCTC
CGAGCGGGCC
CTGTGGCTAC
CGTCGTCACC
GGAGGGACCC
CGCCCTGAAG
CCTCc3CGACC 3CCGGACCAAC
SACTCCTCCTC
'GCGCGGCAGC
CTCCCAGCAC
GGACGTCGTC
GCTGCTCGCC
C GAAGTCGAAC r GCAGGCGCTC
GGTCGACTGC
3 GCCGGGCCGC D, CGTCGTCCT( 3 TGGCGGCGT( r CGGGCAGCT( 3 CGCCCTGGT( G GGAGGCACTI A. TCCGGGCCGI C CGAACTCCT, T CTCCCCGTA, C ACTCGACCG A GGTCTGGCA T CGCCGCCGC
CGCCTCGCCC
ACCGTCCTCA
CACCACGGAG
ACCCAACTCA
c3TCGCCGACC
ACCGCCGTCG
GAGGACATCG
CTCCGCGGCA
AGCGGCAGCC
CGCCGCGCCC
ATGGGCCGGG
GACCGCGCCC
GCCGATGTCG
CTCGACGGCG
GGCGACGCCG
CCCGAGCCCG
CTCGGCCATT
GACTCGCTGA
CCCGCCACCA
GCGTACCTCG
CTGGCCGAAC
CTCACCGGCA
GGTGCGGAGC
CTCGGCCCCC
TCCCGCGCAC
CAGATGACGA
GAAGAACTCC
GTCGGCATGA
GTCGCCGCGG
TCCCTCTACG
CTCGACGACG
GCCATGGACC
GGCATCGACC
CAGGACTACG
GGCAACTCCT
GCCGTGACCC
GGCCTGCGGT
CCGGGCGCGI
GGCTTCGCCI
GAACGGCTCI
GCCATCAACC
CGCCTGATCC
GAGGGCCACC
ACGTACGGG(
ATCGGGCACI
CGCCACGGG(
3 TCGGCCGGT'.
3CTCCGCCGG( 3GAGGAGGCC( 3 GTGCCGTGG(
GCCGCATAC(
GACAGCCGT2 3 CGGGACGCC, 3; GTGGCGTTC, Z GACAGCTCA C GTCGACTGG C GTCGACGTC G CACCACGGC G TACGTCGCC
GCGCACCCCT
TCACCGGCGG
CCGAACACCT
CCGCCGAACT
CCCACGCCAT
TCCACACCGC
CCCGCATCCT
CT CCGC T GGA
AGGGCGTCTA
GGGGCGAGAC
GCGCCGACGA
TGGACGAACT
ACTGGGAGCG
TCCCGGAGGC
CCGTGGCGCC
AG CGCCGGC C
CCTCCCCCGA
CGGCCGTGCA
CGGTCTTCGA
CACCGGCCGA
TGCCCCTCGA
TCGAGCCCGA
CGGAGGCGTC
GTAACACCTG
CACCCG CCC C
GTTCCAACGA
GGAAAGAGAG
GCTGCCGGTT
GCAAGGACCT
ACCCGGTGCC
CCGCCGGATT
CGCAGCAGCG
CCGCGTCGGT
CGCCGGACAT
CCGCCGTGGC
TGGACACGGC
ACGGCGACTG
TCATCGAGTT
CGGCGGCGGA
7CCGACGCGCG
AGGACGGCGC
GCCAGGCCCT
3GCACGGGGAC
AGGGGCGCGC
k. CGCAGGCCGC 3TGCTGCCGAA r CGGTCGAGCT 3 CGGGCGTCTC
CCGGCGGTCGA
CGGTGTCCGC
3 CGGAAGACCG P. CGGCGATGGA C TGCGGATGCC G TCTTCCCCGG C CCGAATTCGC T CTCTCGAAGC G TCCAGCCCGT A TCACCCCCGA G GTGCCCTCAC WO 99/61599 WO 9961599PCTIUS99/1 1814 -26- 31921 31981 32041 32101 32161 32221 32281 32341 32401 32461 32521 32581 32641 32701 32761 32821 32881 32941 33001 33061 33121 33181 33241 33301 33361 33421 33481 33541 33601 33661 33721 33781 33841 33901 33961 34021 34081 34141 34201 34261 34321 34381 34441 34501 34561 34621 34681 34741 34801 34861 34921 34981 35041 35101 35161 35221 35281 35341 35401
CCTCGACGAC
CGGCAAGGGC
GAACCTCCAC
CGACCCCACC
GATCATCCCC
CGCCGACGTC
CGAAGGCACC
CCATCGTGTG
CTTCATCGAG
CCTGGCCACC
GGCCTGGGCC
CAGCCCCGCC
CCCCGCGGGT
GACGGGGCTC
CGTAC-TCGCG
GGTCCCCGTC
CCGCAACCGC
CCCGACGCCC
CGTCGCCGAG
GGCCCGCTCC
GGCCGTGGAG
CCGCCCGCAG
CGGCGGTCCG
GGCGAACGGC
CTTCC-TCGCC
CCCGGCCGAT
GGACGCCCCG
CTTCCGCCTG
TCCGCCGGGC
CGCGGGCGAG
GTTCCTCGCC
CGAACCGCTG
GCACACCGTC
CGT CGCC GAG
GTGACCGACA
CCGAACAGCG
TTCCGCTTCT
CGCCAGGACC
GTCGCGGCCA
GGCGCCTCCG
GAGGGCCTGT
CAGCTGGACG
TTCCTCCAGG
GCGGCGGAGA
GCCGGCGACC
AGCGGGCCGT
CACGAGATCT
CGCGTCGTGC
CGGTGACCGA
CGGTGGCCGA
ACGCCGCGAA
CCGCGTACGA
TCACCGCCGA
GCGCCGACGG
AGCG-CGAGCTA
TCGAGGGGAT
CCTTCGAGCT
TGGGTGTTCC
TGTCCGACAC
GCCGCTCGTG
GGCATGATCT
GGACTGTCGA
CAGATCCAAG
GTCGACTACG
CTGGCGGGGT
TGGATCACCG
GGCTTCGCCC
GTCAGCGCCC
CTCCGACGCG
AACGGCCTCG
GTCCCCGACC
CCCGGCGAGG
GCGTGGGGCC
ATGGTGATGC
GACCGCCCGC
GTCAACCGGC
GTCGCGCTCG
CCGTCGGATC
GGGGCCGACA
GACGACCGGT
TTCGCCTCGC
ACGGACCGGG
GGCCCGCACG
GTACCTCTCC
CTCGACACCG
GTCGTCCTGC
GAGCGGGCGC
CATCAGGAGC
CTGGAGCCGA
GGCCCGCGGC
GGCGACTGGC
GCGGACGTGC
GCCGTCCTCT
GACCTCTGAA
CGGTGCGGCT
CGGAGGAGCT
GGCGTGCCGA
CCGAACCCTG
TCGCCTTCGA
ACGTCTCCGG
ACCGGGCGTT
ACGACGAGCT
CGTACCTGCA
GTGACCCGAA
TCTGCCTCCC
GCAACGACAI
AGCCCCCGAC
CGACCTGACC
CCGTGAACTC
*CGGCGACCCC
*GCGGGTGCG I
*TCACGCCCTC
*CGTCCCGGTC
GGTGCTGCCC
CCACCGGGAC
GCTGGGCGGI
CGCGGACCG(
CCTGCTGGC(
TCGTGACCCT
CCCTCGCCCT
TCGCCGCCGT
AACTTGCTCA
CCTCCCACAG
TGTCCCCCCA
AACCCGCCCT
CGGCCGTCGA
ACCCCGTCCT
AGGACGGCGG
CCCTCGACTG
TCCCGACGTA
CGCCCGCGCA
CGGGTGCCGA
GGCAGGCGGC
TGCGGGAGAT
TGACCGGTCT
CCGAGCGCAT
ACGAGCAGGC
CCGGCGCCGG
ACGGCGAGTT
CCGAGGCCTG
CGGAAGGCCG
AGTTCCTGCG
CCGGCTACGG
CGCTCGACGC
TCGGGCACTC
ACGGCGCGCC
CCATCGAGGT
TGTCCGATGC
CGGGCCGCAG
AGGAGGAGCG
CGGGCGACCA
CCTGGCTCGA
CGTGGACAGC
GGTCTGCCTG
GCACCCCTCC
GCCGTGTCTG
GTGGCAGGAG
GACGGCCCGC
TCGGCGCGCC
CCTGGCCGAG
GCTGCGGCTG
CCGGCCGTCC
GGCGCCGCTG
GGCGTACTCC
CTCCGACCAC
CAGCCTTATC
GGGGCCCTCA
GGCACCCACC
TACGCCACCG
GCCCGCGGCG
GCGGCGAGCP
CCGCAGCAGG
GCGGCCGGTC
ACGCTGGAGG
TTCGTCCGCC
;CGCGCGGAC I
CCGCAGTCCC
CCGCAGCAAG
CAGCGAGGAA
CAACGGGCCT
GGCGTGTGAG
CGCCCACGTC
GACACCCCAG
CGACGGCGGC
GACCCTCGCC
CACCATGACC
ACAGCACCGC
GGCCTCCCTG
CGCCTTCCAG
CACCGCTTCC
GGACCTCGAC
CTCCGTGCTC
CGGCTTCGAC
CCAGCTGCCG
CAGCGACGAG
GGAGGAGGAG
CGCCGGGATG
CCTCGACGTC
CTCGGAGCGG
TGCCGTTCTC
GCTCAGCACC
CACGGGTACG
CCAGGCCCGG
CGGCGGCGCC
GCCGGCCGGG
GTGGAGCAGG
GCGGCTGCTG
CAGCGCGCCC
GGGCGACTGG
CTTCACGATG
CGCCATCGAG
GGACTGTGGA
CCGCACGCCG
GTCGAGGCCC
GAGAGCGTCG
GGCCGGCTGG
ATCCTGGAAC
CCGTCGCTGG
ATCCGGCGGC
GTGCTGCCCC
GCCAAGCTCP
AACGAGGTGC
GGCGGCCAC'I
CTGCTCGTCI
GAAGGAGCGC
CGCAGCCCCC
TCCTGGAGAC
TGCTGCGCGC
CGCTCTCCTI
TCCTCTGCTC
TCCTCTCGTI
ACGTGCCGG)
GTCTCGCGC(
CGGCGGTGA(
TCGCGGATC'
TGCGGACGG'
TCCATCGCCG
GCCACCCGGC
ACCGCCACCG
GCCGACGGCA
GAGACCATCG
GTCCCCTTCT
TACT GGTACC
ACCGACGAAG
CTCCCCGACA
CTCACCACCT
CTGCCCGCCA
CACCGCTCGT
GGGCGCGAGG
GAGGAGGGCC
CGGTGCGACT
TCGCTGACCG
CCCACCGTCG
CTGGCCGAGC
AAGGCCGCCG
TTCCGCGCCC
CTCGCCGAAG
CTCGACCCGG
GTCGGCTGCA
TCCTTCCAGG
GGCACCGGCA
GCGATCCTCC
CTGCTCGCGC
ATCGTCCTGG
CAGCTGGGCG
GCCATGGGCC
GTGCTTCTGG
CGTGCCCACT
ATGCGGGACC
GGCATCGAGG
TCCGGCGCTT
GCGGCTCCGC
TGTCGGTGCA
AGGAGCTCGC
CCTTCTTCGC-
AGCGGCACGC
CGCCGGACCC
TCAGCGGCAC
CGCTGCGCAC
CCTGCCCGG~I
CCGAGTGGCC
TCTACCTCA7
CCCGCGGCGC
CGAAGAGATC
GCTGGGCCG(
CCGCGGCATC
CCAGGCGGAC
CAGCCCGACC
GACGGACTT(
k. CGGGGAGGG( k GGGCGGGCA(
'GGACCCGTC(
'GGCCGCTGC(
r' GCTGGAGCG( r ACGGGCGGC(
CCCACCTCGC
AGCGCATCGA
TGGTTTCGGG
TCCGCGCACG
AGAACGAACT
TCTCCACCCT
GCAACCTCCG
GCTTCACCCA
AGGTCACCGG
CCCTTGCCGA
CGGGCGCCCT
ACTGGATCAG
CCGTCGCCGA
GGCGCAGCGC
CGCCCGAAGA
CCGTCGACTT
TGTTCGAGCA
GGAACTGGGC
CTCCGGCGGG
TGTTCCGGCA
CCTCCGCGTT
TGCTGCTCGC
CCGGCACCGC
AGGAGCGGGA
CGGCCCTCCT
GGGCCGCCGG
ACGAGCTGGC
TCGACCCCTA
AGGGCCTGTT
GGTACGCGCG
TCCGTGCCTC
GGGACCTTCC
ACGCGCCGGC
GGGCGGGCAA
CCACCCCGCG
CAGCTACTTC
GTATCCGGGC
CGAGCATGTG
GCACAGCCTC
GGTACGGCCC
GCTCGTCCAC
CGACGAGCGG
CGACTACAAG
GATGGCCCTG
TCGGCACACC
CGACCAGTGG
GCCCGATGCC
GCAGAACCCA
ACCGTCCGCG
CACTGGATCC
GACCCGTATC
GGCAGCTGGG
GGGGTCTCCG
TGTCCGCTGG
3CGTGCCGTGG 3 GCGTCGTACG
GCCGCCGTGC
3 CTCCGGCCGC 3GACGGCGCGC WO 99/61599 WO 9961599PCT/JS99/I 1814 -27- 35461 35521 35581 35641 35701 35761 35821 35881 35941 36001 36061 36121 36181 36241 36301 36361 36421 36481 36541 36601 36661 36721 36781 36841 36901 36961 37021 37081 37141 37201 37261 37321 37381 37441 37501 37561 37621 37681 37741 37801 37861 37921 37981 38041 38101 38161 38221 38281 38341 38401 38461 rGGCCGAGCT
CGCTCGGGGT
%TCCCGAGCA
%GACCCTCCG
kGCTGGCGGG
GCCGGGACCC
CCGCGCACCT
TCAGGCGGA
GGGACGTGCT
GCAGCTCCTG
CTCGGACCAC
CATCCCGCCC
ICGCACATCA
GGCACGAGGT
TCGCCGCGGT
GCGAGCCGCG
ACTGGGACCA
ACAACGACTC
TGCTGTGGGA
ACGCCCGGGT
TGCGGGACCG
CGCTCGACCG
ACCCGACCCC
TTCCGTACAA
GGGTCTGCCT
AGGGCGACAT
CGAGTCAGCG
CGATGCACGC
ACGCGACCGC
CGGTCAAGGC
TCACGCCGCA
CCGCCGCGCA
CCGAGCTGGA
CCGCACCCCT
CCTCCGAAAG
CCTCTTCTAC
GGTGCGCTCC
GCATCTGGAG
CATGCTCACC
GGACTTCCGG
CCTGAAGACG
CGGTGGCGTC
CAGCGCCGAC
GGAGGGGAAC
GCGGCACTTC
GTTCACGGCC
CTTCGTCGGC
GCACCAAGCP.
CCGCGGCCGC
CACCACCGTC
CCGGGCGGCC
CACGGCGCTG
CACCGCAGCC
GTGGCGGGAG
CTACGACCCG
CCGGCGGCTG
GGAGGTCTTC
CGCGCTGCAC
GGTCGCGCTG
CCGCCCCCGC
AGACACCGGG
GGGGACGGCT
CTCCACCGGC
CACGCACTAC
GCGGGTCGCC
GCCGGTCGGC
CCCGAACCAT
CGCCCTCGGC
GATGGTCGAC
GCCGACGACC
CCTGTGGGGG
GCAGCCGCCC
GTACGGCGCC
GCGGAGCCTG
CGGCACGTCG
GACCCTCGGC
CCTGGAGGCG
CGCCGAGATC
GCTCCTGCCG
CGTGATCAAC
GCGGGCCGTC
GGCCGTGCGG
CCGGCTGCGC
GCGGCTCGCC
CGCCCCAGGC
ACCGAAAGCA
CTGGGTCGCG
CGTACCCCCG
CACTTCACCA
CACGCCCGCA
CTCGGCCGGA
ACCGAGGAAC
GTCGTCGTCC
GTCGTCCGCC
GCGACGCGCP
TCCGACGTCC
GCCGGGCTGC
GTCCCCGCCI
AAGAGAGAGI
ACCACGCCCI
GCGGCCGCCC
GAGCTCGTCC
CTCGCCGATT
GTCCAGCTCA
CTGTGCGACC
CCGGTGCAGC
CCGGCCGGGG
ACGGACCCGG
CCCGCCGGTC
CGGACCCTGG
CGCGCGCCTG
GCCCCGGTCC
CAGACCGTCC
AAGGAAGGAC
TACGGCCTGG
AGCCAGCCCG
ACCGACCACC
CCGGCGATCG
ATCGAGGCGA
GACCTCGTCG
TACGCGGGCG
CCCGACGTGA
GAGCACCGCG
TCCTTCGAAG
CGCCTCGACA
GTCGTGCCGG
GTCTCCGCGG
CTCGCCGACC
CGCAACTACC
AGCTGCTCGG
GCGGTGCCGC
GCCGAGCAGG
GACGCCGTCG
GAGGAGACCT
GCGCAGCACC
CTCACCCCTG
GGAGCACCGT
GCAAGGACTA
AGGCCTCCTC
AGGAGTTCGG
AGCGGCTGCC
AGTTCTCCGC
TCGGCGCGGC
AGCCGTGGTC
GTGACGGGCC
TGGAGGTCCT
ATCTCATCAC
GCGTCGAGTI
GAGCACCGCC
AACGAACCG'I
~GGGCCTTCAC
CTCCCGGCG(
CCCAGATGA(
CGGACGACTC CCCCGGGGCC CTGCTGTCGG CCGGGAACGC GGTGCTCGCG CTCCTCGCGC GGCCCGGGCT CGCGGCGGCC GCGGTGGAGG TCGACGCCCG GGTGGTCCGC GGGGAGAGG CGCATGTCGT CGTCCTGACC GCCGCGACCG AGCGCTTCGA CCTCGCGCGC CCCGACGCCG CGTACGGCCC GGTGGCGTCC CTGGTCCGGC CCGGGCGTTT CCCCGGGCTG CGGCAGGCGG TCGGCCGCGG GCCGCTGAGC GTCCCGGTCA GCCCGGCCCC CCTTCGGACG GACCGGACGG CGTGTGTCCC CGTCCGGCTC CCGTCCGCCC ACGACGCCAT GCGCGTCCTG CTGACCTCGT TGCCCCTGGC CTGGGCGCTG CTCGCCGCCG CGCTCACGGA CACCATCACC GGGTCCGGGC TCATCCACGA GTACCGGGTG CGGATGGCGG CCTTCGACGA GGCCCGTCCC GAGCCGCTGG TCCTCGCCCC GTACTTCTAT CTGCTCGCCA ACTTCGCCCG GTCCTGGCAG CCGGACCTGG CCGTCGCCGC GCAGGTCAGC GGTGCCGCGC TGGGCAGCGC CCGCCGCAAG TTCGTCGCGG AGGACCCCAC CGCGGAGTGG CTGACGTGGA AGGAGCTGCT CACCGGCCAG TTCACGATCG CGGGCCTGCC GACCGTCGGG ATGCGTTATG ACTGGCTGAG TGAGCCGCCC GCGCGGCCCC GTGAGGTCCT CGGCGGCGAG GGCGTCTGGC TCGACATCGA GCTCGTCGCC ACGCTCGACG CGAAGCACAC CCGGTTCACG GACTTCGTGC CGATCATCCA CCACGGCGGG GCGGGCACCT AGGTCATGCT CGCCGAGCTG TGGGACGCGC GGGCGGGGTT CTTCCTGCCG CCGGCCGAGC TCCGCATCCT CGACGACCCC TCGGTCGCCA TCGGCGACCC CACCCCGGCC GGGATCGTCC GCCGCCCGCC GGCCGACGCC CGGCACTGAG TATCTGCGCC GGGGGACGCC CCCGGCCCAC GTACGAAGTC GACGACGCCG ACGTCTACGA CGCCGCCGAG GCCTCCGACA TCGCCGACCT GCTCCTGGAC GTGGCCTGCG GTACGGGCAC CGACACGGCC GGCCTGGAGC TGTCCGAGGA CGACGCCACG CTCCACCAGG GGGACATGCG CGTGGTCAGC ATGTTCAGCT CCGTCGGCTA CGTCGCCTCG TTCGCGGAGC ACCTGGAGCC GTTCCCGGAG ACCTTCGCCG ACGGCTGGGT CACCGTGGCC CGTGTCTCGC ACTCGGTGCG CTTCACCGTG GCCGACCCGG GCAAGGGCGT CCTGTTCCAC CAGGCCGAGT AGGAGGCCGC CCTGGAGGGC GGCCCGTCGG GCCGTGGCCT CAAGACCCCC CGGGGCGGGA CGTCCCGGGT GACAGGTAAG ACCCGAATAC GGCGTGTCCG CCTGGCCGTC GTCGGCACCC TGCTGGCGGG CGCCGACACG GCCAATGTTC AGTACACGAG GCTCGACGAG AAGATC (SEQ ID NO:19) Those of skill in the art will recognize that, due to the degenerate nature of the genetic code, a variety of DNA compounds differing in their nucleotide sequences can be used to encode a given amino acid sequence of the invention. 
The native DNA sequence encoding the narbonolide PKS of Streptomyces venezuelae is shown herein merely to illustrate a preferred embodiment of the invention, and the invention includes DNA compounds of any sequence that encode the amino acid sequences of the polypeptides and proteins of the invention. In similar fashion, a polypeptide can typically tolerate one or more amino acid substitutions, deletions, and insertions in its amino acid sequence without loss or significant loss of a desired activity. The present invention includes such polypeptides with alternate amino acid sequences, and the amino acid sequences shown merely illustrate preferred embodiments of the invention.
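The degeneracy point above can be made concrete with a short sketch. The Python below is illustrative only: it builds the standard genetic code, translates two hypothetical coding sequences (written for this example, not taken from the sequence listing) into the KS motif TVDACSSSL mentioned in this description, and counts how many distinct DNA sequences would encode that nine-residue motif. The function names are assumptions made for the example.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence (length divisible by 3) to one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_encodings(peptide: str) -> int:
    """Count the distinct DNA sequences that encode the given peptide."""
    return prod(
        sum(1 for aa in CODON_TABLE.values() if aa == residue)
        for residue in peptide
    )


MOTIF = "TVDACSSSL"
# Two hypothetical coding sequences for the same motif, differing at every codon.
dna_a = "".join(["ACC", "GTC", "GAC", "GCC", "TGC", "TCC", "TCC", "TCC", "CTG"])
dna_b = "".join(["ACG", "GTG", "GAT", "GCG", "TGT", "AGC", "AGC", "AGC", "CTC"])

print(translate(dna_a), translate(dna_b))
print(count_encodings(MOTIF))
```

Even for this short motif, the count of synonymous coding sequences runs to several hundred thousand, which is why the native sequence is described as only one illustrative embodiment.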
The recombinant nucleic acids, proteins, and peptides of the invention are many and diverse. To facilitate an understanding of the invention and the diverse compounds and methods provided thereby, the following description of the various regions of the narbonolide PKS and corresponding coding sequences is provided.
The loading module of the narbonolide PKS contains an inactivated KS domain, an AT domain, and an ACP domain. The AT domain of the loading module binds propionyl CoA. Sequence analysis of the DNA encoding the KS domain indicates that this domain is enzymatically inactivated, as a critical cysteine residue in the motif TVDACSSSL, which is highly conserved among KS domains, is replaced by a glutamine; the domain is therefore referred to as a KSQ domain. Such inactivated KS domains are also found in the PKS enzymes that synthesize the 16-membered macrolides carbomycin, spiramycin, tylosin, and niddamycin.
While the KS domain is inactive for its usual function in extender modules, it is believed to serve as a decarboxylase in the loading module.
The present invention provides recombinant DNA compounds that encode the loading module of the narbonolide PKS and useful portions thereof. These recombinant DNA compounds are useful in the construction of PKS coding sequences that encode all or a portion of the narbonolide PKS and in the construction of hybrid PKS encoding DNA compounds of the invention, as described in the section concerning hybrid PKSs below. To facilitate description of the invention, reference to a PKS, protein, module, or domain herein can also refer to DNA compounds comprising coding sequences therefor and vice versa.
Also, reference to a heterologous PKS refers to a PKS or DNA compounds comprising coding sequences therefor from an organism other than Streptomyces venezuelae. In addition, reference to a PKS or its coding sequence includes reference to any portion thereof.
The present invention provides recombinant DNA compounds that encode one or more of the domains of each of the six extender modules (modules 1-6, inclusive) of the narbonolide PKS. Modules 1 and 5 of the narbonolide PKS are functionally similar. Each of these extender modules contains a KS domain, an AT domain specific for methylmalonyl CoA, a KR domain, and an ACP domain. Module 2 of the narbonolide PKS contains a KS domain, an AT domain specific for malonyl CoA, a KR domain, a DH domain, and an ACP domain. Module 3 differs from extender modules 1 and 5 only in that it contains an inactive ketoreductase domain. Module 4 of the narbonolide PKS contains a KS domain, an AT domain specific for methylmalonyl CoA, a KR domain, a DH domain, an ER domain, and an ACP domain. Module 6 of the narbonolide PKS contains a KS domain, an AT domain specific for methylmalonyl CoA, and an ACP domain. The approximate boundaries of these "domains" are shown in Table 1.
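The module organization just described can be summarized as a small data structure. The following Python sketch is illustrative only: the domain lists come from the description above, while the rule mapping the set of active reductive domains to the state of the beta-carbon (no KR leaves a ketone, KR gives a hydroxyl, KR plus DH gives an alkene, KR plus DH plus ER gives a methylene) is the general convention for modular PKSs and is an assumption added here. The label "KR0" for the inactive ketoreductase and all function names are hypothetical.

```python
# Domain composition of each narbonolide PKS module, as described in the text.
# "KSQ" is the inactivated KS of the loading module; "KR0" (a label chosen for
# this sketch) marks the inactive ketoreductase of module 3.
MODULES = [
    ("loading", "propionyl-CoA", ["KSQ", "AT", "ACP"]),
    ("module 1", "methylmalonyl-CoA", ["KS", "AT", "KR", "ACP"]),
    ("module 2", "malonyl-CoA", ["KS", "AT", "KR", "DH", "ACP"]),
    ("module 3", "methylmalonyl-CoA", ["KS", "AT", "KR0", "ACP"]),
    ("module 4", "methylmalonyl-CoA", ["KS", "AT", "KR", "DH", "ER", "ACP"]),
    ("module 5", "methylmalonyl-CoA", ["KS", "AT", "KR", "ACP"]),
    ("module 6", "methylmalonyl-CoA", ["KS", "AT", "ACP"]),
]


def beta_carbon_state(domains):
    """Predict the beta-carbon state left by an extender module.

    Assumes the usual modular-PKS convention: reduction proceeds
    ketone -> hydroxyl (KR) -> alkene (DH) -> methylene (ER), and stops
    at the first missing or inactive domain.
    """
    active = set(domains)
    if "KR" not in active:
        return "ketone"
    if "DH" not in active:
        return "hydroxyl"
    if "ER" not in active:
        return "alkene"
    return "methylene"


for name, substrate, domains in MODULES[1:]:  # skip the loading module
    print(f"{name}: {substrate:18s} {'-'.join(domains):22s} -> {beta_carbon_state(domains)}")
```

Under these assumptions, modules 1 and 5 give the same prediction, consistent with their description as functionally similar, and modules 3 and 6 both leave a ketone.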
In one important embodiment, the invention provides a recombinant narbonolide PKS that can be used to express only narbonolide (as opposed to the mixture of narbonolide and 10-deoxymethynolide that would otherwise be produced) in recombinant host cells. This recombinant narbonolide PKS results from a fusion of the coding sequences of the picAIII and picAIV genes so that extender modules 5 and 6 are present on a single protein. This recombinant PKS can be constructed on the Streptomyces venezuelae or S. narbonensis chromosome by homologous recombination. Alternatively, the recombinant PKS can be constructed on an expression vector and introduced into a heterologous host cell. This recombinant PKS is preferred for the expression of narbonolide and its glycosylated and/or hydroxylated derivatives, because a lesser amount or no 10-deoxymethynolide is produced from the recombinant PKS as compared to the native PKS. In a related embodiment, the invention provides a recombinant narbonolide PKS in which the picAIV gene has been rendered inactive by an insertion, deletion, or replacement. This recombinant PKS of the invention is useful in the production of 10-deoxymethynolide and its derivatives without production of narbonolide.
In similar fashion, the invention provides recombinant narbonolide PKS in which any of the domains of the native PKS have been deleted or rendered inactive to make the corresponding narbonolide or 10-deoxymethynolide derivative. Thus, the invention also provides recombinant narbonolide PKS genes that differ from the narbonolide PKS gene by one or more deletions. The deletions can encompass one or more modules and/or can be limited to a partial deletion within one or more modules. When a deletion encompasses an entire module, the resulting narbonolide derivative is at least two carbons shorter than the polyketide produced from the PKS encoded by the gene from which the deleted PKS gene and corresponding polyketide were derived. When a deletion is within a module, the deletion typically encompasses a KR, DH, or ER domain, or both DH and ER domains, or both KR and DH domains, or all three KR, DH, and ER domains.
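The "at least two carbons shorter" statement can be illustrated with simple carbon bookkeeping. The sketch below is a simplification under stated assumptions: it uses the extender specificities given for modules 1 to 6, and it assumes (from general polyketide chemistry rather than from this text) that each extension adds two chain carbons, that a methylmalonyl extender also adds one methyl branch carbon, and that the propionyl starter contributes three carbons. The names `CARBONS`, `EXTENDERS`, and `total_carbons` are hypothetical.

```python
# Carbons contributed by each building block (assumption from general
# polyketide chemistry): two chain carbons per extension, plus one methyl
# branch carbon for methylmalonyl; the propionyl starter supplies three.
CARBONS = {"propionyl": 3, "malonyl": 2, "methylmalonyl": 3}

# Extender unit selected by the AT domain of each extender module (from the text).
EXTENDERS = {
    1: "methylmalonyl",
    2: "malonyl",
    3: "methylmalonyl",
    4: "methylmalonyl",
    5: "methylmalonyl",
    6: "methylmalonyl",
}


def total_carbons(deleted_modules=()):
    """Carbon count of the polyketide after deleting whole extender modules."""
    deleted = set(deleted_modules)
    return CARBONS["propionyl"] + sum(
        CARBONS[unit] for module, unit in EXTENDERS.items() if module not in deleted
    )


native = total_carbons()
for module in EXTENDERS:
    shorter_by = native - total_carbons([module])
    print(f"delete module {module}: product is {shorter_by} carbons smaller")
```

Under these assumptions, deleting the malonyl-specific module 2 removes exactly two carbons, while deleting any methylmalonyl-specific module removes three (two chain carbons and the methyl branch), which is one way to read the phrase "at least two".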
This aspect of the invention is illustrated in Figure 4, parts B and C, which show how a vector of the invention, plasmid pKOS039-16 (not shown), was used to delete or "knock out" the picAI gene from the Streptomyces venezuelae chromosome. Plasmid pKOS039-16 comprises two segments (shown as cross-hatched boxes in Figure 4, part B) of DNA flanking the picA gene and isolated from cosmid pKOS023-27 (shown as a linear segment in the Figure) of the invention. When plasmid pKOS039-16 was used to transform S. venezuelae and a double crossover homologous recombination event occurred, the picAI gene was deleted. The resulting host cell, designated K039-03 in the Figure, does not produce picromycin unless a functional picAI gene is introduced.
This Streptomyces venezuelae K039-03 host cell and corresponding host cells of the invention are especially useful for the production of polyketides produced from hybrid PKS or narbonolide PKS derivatives. Especially preferred for production in this host cell are narbonolide derivatives produced by PKS enzymes that differ from the narbonolide PKS only in the loading module and/or extender modules 1 and/or 2. These are especially preferred, because one need only introduce into the host cell the modified picAI gene or other corresponding gene to produce the desired PKS and corresponding polyketide. These host cells are also preferred for desosaminylating polyketides in accordance with the method of the invention in which a polyketide is provided to an S. venezuelae cell and desosaminylated by the endogenous desosamine biosynthesis and desosaminyl transferase gene products.
The recombinant DNA compounds of the invention that encode each of the domains of each of the modules of the narbonolide PKS are also useful in the construction of expression vectors for the heterologous expression of the narbonolide PKS and for the construction of hybrid PKS expression vectors, as described further below.
Section II: The Genes for Desosamine Biosynthesis and Transfer and for Beta-glucosidase
Narbonolide and 10-deoxymethynolide are desosaminylated in Streptomyces venezuelae and S. narbonensis to yield narbomycin and YC-17, respectively. This conversion requires the biosynthesis of desosamine and the transfer of the desosamine to the substrate polyketides by the enzyme desosaminyl transferase. Like other Streptomyces, S. venezuelae and S. narbonensis produce glucose and a glucosyl transferase enzyme that glucosylates desosamine at the 2' position. However, S. venezuelae and S. narbonensis also produce a beta-glucosidase, which removes the glucose residue from the desosamine. The present invention provides recombinant DNA compounds and expression vectors for each of the desosamine biosynthesis enzymes, desosaminyl transferase, and beta-glucosidase.
As noted above, cosmid pKOS023-27 contains three ORFs that encode proteins involved in desosamine biosynthesis and transfer. The first ORF is from the picCII gene, also known as desVIII, a homologue of eryCII, believed to encode a 4-keto-6-deoxyglucose isomerase. The second ORF is from the picCIII gene, also known as desVII, a homologue of eryCIII, which encodes a desosaminyl transferase. The third ORF is from the picCVI gene, also known as desVI, a homologue of eryCVI, which encodes a 3-amino dimethyltransferase.
The three genes above and the remaining desosamine biosynthetic genes can be isolated from cosmid pKOS023-26, which was deposited with the American Type Culture Collection on 20 Aug 1998 under the Budapest Treaty and is available under the accession number ATCC 203141. Figure 3 shows a restriction site and function map of cosmid pKOS023-26. This cosmid contains a region of overlap with cosmid pKOS023-27 representing nucleotides 14252 to nucleotides 38506 of pKOS023-27.
The remaining desosamine biosynthesis genes on cosmid pKOS023-26 include the following genes. ORF 11, also known as desR, encodes beta-glucosidase and has no ery gene homologue. The picCI gene, also known as desV, is a homologue of eryCI. ORF14, also known as desIV, has no known ery gene homologue and encodes an NDP glucose 4,6-dehydratase. ORF 13, also known as desIII, has no known ery gene homologue and encodes an NDP glucose synthase. The picCV gene, also known as desII, a homologue of eryCV, is required for desosamine biosynthesis. The picCIV gene, also known as desI, is a homologue of eryCIV, and its product is believed to be a 3,4-dehydratase. Other ORFs on cosmid pKOS023-26 include ORF12, believed to be a regulatory gene; ORF15, which encodes an S-adenosylmethionine synthase; and ORF16, which is a homolog of the M. tuberculosis cbhK gene. Cosmid pKOS023-26 also encodes the picK gene, which encodes the cytochrome P450 hydroxylase that hydroxylates the C12 of narbomycin and the C10 and C12 positions of YC-17. This gene is described in more detail in the following section.
Below, the amino acid sequences or partial amino acid sequences of the gene products of the desosamine biosynthesis and transfer and beta-glucosidase genes are shown. These amino acid sequences are followed by the DNA sequences that encode them.
Amino acid sequence of PICCI (desV) (SEQ ID NO:6) 1 61 121 181 241 301 361
VSSRAETPRV
AVG VNSGMDA
PLLVEKAITP
GSSVAAFSFY
MQAAVLRI RL
DELRSHLDAR
QALRVT DAVR
PFLDLKAAYE
LQLAILRGLGI
RTRALLPVHL
PGKNLGCFGD
XHLDSWNGRR
GIDTLTHYPV
ELRAETDAAI
GPGDEVIVPS
YGHPADMDAL
GGAVVTGDPE
SALAAEYLSG
PVHLS PAYAG
ARVLDSGRYL
HTYIASWLAV
RELADRHGLH
LAERLRMLRN
LAGLPGIGLP
EAPPEGSLPR
LGPELEGFEA
SAT GAT PVPV
IVEDAAQAHG
YGSRQKYSHE
VTAPDTDPVW
AES FARQVLS EFAAYCET DE
EPHEDHPTLD
ARYRGRRIGA
TKGTNSRLDE
HLFTVRTERR
LPIGPHLERP
EWAERVDQA (SEQ ID NO:6)
Amino acid sequence of 3-keto-6-deoxyglucose isomerase, PICCII (desVIII) (SEQ ID NO:7) 1 61 121 181 241 301 361
VADRELGTHL
TAD HALAAS I
EGIHRETLEG
SDSLLAPQSL
PEQWRELCDR
RDPEVFTDPE
DVLRPRRAPV
LETRGIHWIH
LCSTDFGVSG
LAPDPSASYA
RTVRAADGAL
PGLAAAAVEE
RFDLARPDAA
GRGPLSVPVS
AANGDPYATV
ADGVPVPQQV
FELLGGFVRP
AELTALLADS
TLRYDPPVQL
AHLALH PAGP SS (SEQ ID
LRGQADDPYP
LSYGEGCPLE
AVTAAAAAVL
DDSPGALLSA
DARVVRGETE
YGPVASLVRL
NO: 7)
AYERVRARGA
REQVLPAAGD
GVPADRRADF
LGVTAAVQLT
LAGRRLPAGA
QAEVALRTLA
LSFSPTGSWV
VPEGGQRAVV
ADLLERLRPL
GNAVLALLAH
HVVVLTAATG
GRFPGLRQAG
Amino acid sequence of desosaminyl transferase, PICCIII (desVII) (SEQ ID NO: 8) 1 61 121 181 241 301 361 MRVLLTS FAH
EYRVRMAGEP
RSWQPDLVLW
TAEWLTWTLD
SEP PARPRVC TRFT DFVPMH
FFLPPAELTP
HTHYYGLVPL
RPNHPAIAFD
EPTTYAGAVA
RYGAS FEE EL
LTLGVSAREV
ALLPSCSAII
QAVRDAVVRI
AWALLAAGHE
EARPEPLDWD
AQVTGAAHAR
LTGQFTIDPT
LGGDGVSQGD
HHGGAGTYAT
LDDPSVATAA
VRVASQPALT
HALGIEAILA
VLWGPDVMGS
PPSLRLDTGL
ILEALADLDI
AVINAVPQVM
HRLREET FGD
DTITGSGLAA
PYFYLLANND
ARRKFVALRD
PTVGMRYVPY
ELVATLDASQ
LAELWDAPVK
PTPAGIVPEL
VPVGTDHLIH
SMVDDLVDFA
RQPPEHREDP
NGTSVVPDWL
RAEIRNYPKH
ARAVAEQGAG
ERLAAQHRRP
421 PADARH (SEQ ID NO:8) Partial amino acid sequence of aminotransferase-dehydrase, PICCIV (desI) (SEQ ID NO:9) 1 61 121 181 241 301 VKSALS DLAF
VAGLAGVRHA
PDTGNLDPDQ
DGRPAGSLGD
NAKM'SEAAAA
VEI DEATTGI
FGGPAAFDQP
VATCNATAGL
VAAAVT PRTS AEVFS FHATK
MGLTSLDAFP
HRDLVMEVLK
LLVGRPNRID
QLLAHAAGLT
AVVGVHLWGR
AVNAFEGGAV
EVI DRNRRNH
AEGVHTRAYF
RARLYERLDR ALDSQWLSNG GEVIMPSMTF AATPHALRWI PCAADQLRKV ADEHGLRLYF VTDDADLAAR IRALHNFGFD AXYREHLADL PGVLVADHDR S (SEQ ID NO:9)
GPLVREFEER
GLTPVFADID
DAAHALGCAV
LPGGS PAGGT
H-GLNNHQYVI
Amino acid sequence of PICCV (desII) (SEQ ID NO: 1 61 121 181 241 301 361 421 481
MTAPALSATA
SPFTPLEEAR
AALARKPVFP
SAMYFSGGLE
LYGLNDEEYE
DFIADLNDAG
DYGYALNSLR
IAGRVT PDTS RGFLR (SEQ
PAERCAHPGA
HDLGVDRDAF
YSVGLYPG PT
PLTNPGLGSL
QTTGKKAAFR
QGRTI DFVNI
TGADAELLRI
LTEVVRDFVE
ID DLGAAVI-iAVG
RRLLALFGQV
CMFRCHFCVR
AAHATDHGLR
RVRENLRRFQ
REDYSGRDDG
KPATMRPTAH
RGGEVAAVDG
QTLAAGGLVP
PELRTAVETG
VTGARYDPSA
PTVYTNSFAL
QLRAERESPI
KLPQEERAEL
PQVAVQVDLL
DEYFMDGFDQ
PDEAGTTARH
PAGAYWKNTL
LDAGNAMFRS
TERTLERQPG
NLGFAYIVLP
QEALNAFEER
GDVYLYREAG
VVTARLNQLE
LVRLAVRYGN
LPLEQRGVFD
VIDET PAGNP
LWGLHAIRTS
GRASRLLDLV
VRERTPGLHI
FPDLDGATRY
RDAADGWEEA
Amino acid sequence of 3-amino dimethyl transferase, PICCVI (desVI) (SEQ ID NO: 11) 1 VYEVDHADVY 61 GTGES 121 AVASFAEHLE 181 HFTVADPGKG DLFYLGRGKD YAAEASDIAD LVRSRTPEAS SLLDVACGTG THLEHFTKEF DMLTHARKRL PDATLHQGDM RDFRLGRKFS AVVSMFSSVG YLKTTEELGA PGGVVVVEPW WFPETFADGW VSADVVRRDG RTVARVSHSV REGNATRMEV VRHFSDVHLI TLFHQAEYEA AFTAAGLRVE YLEGGPSGRG LFVGVPA (SEQ ID NO:11)
Partial amino acid sequence of beta-glucosidase, ORF 11 (desR) (SEQ ID NO: 12) 61 121 181 241 301 361 421 481 541 601 661 721
MTLDEKISFV
ASTFDDTMAD
AQIKGIQGAG
AYNGLNGKPS
PKGEPSPPAK
GAQAVSRKVA
APLDT IKARA
ADGEYRIAVR
SLELGWVTPA
DANPNTIVVL
AAENQHAVAG
TVVRTSTGGL
KTVTVNVDRR
HWALDPDRQN
SYGKVMGRDG
LMTTAKHFAA
CGNDELLNNV
FFGEALKTAV
ENGAVLLRNE
GAGATVTYET
ATGGYATVQL
AADAT IAKAV
NTGSSVLMPW
DPTSYPGVDN
KVTVTVRNSG
QLQFWDAAT D
VGYLPGVPRL
RALNQDMVLG
NNQENNRFSV
LRTQWGFQGW
LNGTVPEAAV
GQALPLAGDA
GEETFGTQI P
GSHTIEAGQV
ESARKARTAV
LSKTRAVLDM
QQTYREGI HV
KRAGQEVVQA
NWKTGTGNRL
GI PELRAADG
PMMNNIRVPH
NAN VDEQTLR VMS DWLATPG
TRSAERIVGQ
GKS IAVIGPT AGNLS PAFNQ
YGKVSSPLLK
VFAYDDGTEG
WY PGQAGAEA
GYRWFDKENV
YLGASPNVTA
LQTGSSSADL
PNGIRLVGQT ATALPAPVAL GGRNYETFSE DPLVSSRTAV EIEFPAFEAS SKAGAGSFMC TDAI TKGLDQ EMGVELPGDV MEKFGLLLAT PAPRPERDKA AVDPKVTGLG SAHVVPDSAA GHQLEPGKAG ALYDGTLTVP LTKGTHKLTI SGFAMSATPL VDRPNLSLPG TQDKLISAVA TAALLYGDVN PSGKLTQSFP KPLFPFGHGL SYTSFTQSAP PQAKKKLVGY TKVSLAAGEA RGSATVNVW (SEQ ID NO:12) Amino acid sequence of transcriptional activator, ORF 12 (regulatory) (SEQ ID NO: 13) 1 61 121 181 241 301 361 421 481 541 601 661 721 781 841 901 MNLVERDGE I
ALPGDRDIPL
TTGCS PS PAG
RAELLRQPHC
RQASHTTLGA
TTAAAVERI
EDT VARHLLV
PHLVAASWRM
P55 DNDALEL
PDNASVAQAE
WEAVFAATRA
QAERVLRQPV
IVPWRTSAAE
AEAVDMLHDS
PGRGGRRAKA
LTRVYRKLNV
AHLRAVLDAS
GVLCQLLRSA
TPFLVAVDDL
RNMWLSGLPP
AGGDEPVHGD
QELAAIGLLD
GGAPDAPWAL
NPHMTTRALA
SLTR4WLAAL
QILQGCRLSE
MIAIRCGDLP
PDAMFDS RHG
VYLRLGNRQK
GDRLEHARAL
VSTELELPGG
TRRADLPISL
AAGDGTLLLV
EQHGADTSAV
THADTASLRF
SGVRQLLAHY
AFAQAVLDCL
EDGTLGQPAI
PLLERGAQQA
LFDRLLSGEL
CPPLLESLPA
ETYEALETAL
TARERAELAL
MEYMHARGRY
ARALAEAQLA
AGMSRHQQAQ
PDVGLLSEAE
SGPAGSGKTE
RDLLDAASRR
LLYCAAHHDQ
YGPEAAERRA
HRSAEGTLET
REAALQDLPA
LFDDRLDDAF
PPSHPVMAL I
TPEPERGPVP
LVLVHADRLD
SHAAPESWGL
WLAXGRLHAA
LVRPGRSRTR
GDNYRARMTA
RRVAALAARG
LLRSLRRLAA
AGTS PPPPTR
GGIGFVMTER
PAYHATTGGN
ARWLAVLEQS
GERTELHRRA
RILEFAVRSS
RCLVWYGRLP
VRLAPRTTAL
RALFWS DALL
AVGMPLSALL
LGEFMLCGEI
GLTLRVLAAA
RLAGDMAWAC
LTNRQIARRL
ERETPVWSVR
RSASTRHTAC
ASQRAGYRVF
PLLLRALTQD
DPLLVERLTG
AEQLHRDGAD
TDNTQLARLA
EAADALSRLR
QAQAGVFQRG
AEAVERRSLG
LACTEAGEYE
LGSWNLDQPS
VDGQQAERLH
GAYPLAEEIV
CVTASTVEQH
AQDKSVTA (SEQ ID NO:13)
Amino acid sequence of dNDP-glucose synthase (glucose-1-phosphate thymidyl transferase), ORF 13 (desIII) (SEQ ID NO: 14) 1 61 121 181 241 NO: 14)
MKGIVLAGGS
FQSLLGNGRH
RDS IARLDGC VDIAKN IRPS
EERQGVWIAG
GT RLH PATS V
LGIELDYAVQ
VLFGYPVKDP
PRGELEITDV
LEEIAFRMGF
ISKQILPVYN
KEPAGIADAL
ERYGVAEVDA
NRVYLERGRA
I DAEACHGLG
KPMIYYPLSV
LVGAEHIGDD
TGRLTDLVEK
ELVNLGRGFA
EGLSRTEYGS
LMLGGIREIQ
TCALILGDNI
PVKPRSNLAV
WLDTGTHDSL
YLMEIAGREG
IISTPQHIEL
FHG PGLYTLL
TGLYLYDNDV
LRAAQYVQVL
AP (SEQ ID
Amino acid sequence of dNDP-glucose 4,6-dehydratase, ORF 14 (desIV) (SEQ ID NO: 1 61 121 181 241 301
VRLLVTGGAG
BGDI RDAGLL
RVVHVSTDEV
NYGPYQHPEK
HIGGGLELTN
ADGLARTVRW
FIGSHFVRQL
ARELRGVDAI
YGSIDSGSWT
LI PLFVTNLL
RELTGILLDS
YRENRGWWEP
LAGAYPDVPA
VHFAAESHVD
ESSPLEPNSP
DGGTLPLYGD
LGADWS SVRK
LKATAPQLPA
DEVIVLDSLT YAGNRANLAP VDADPRLRFV RSIAGASVFT ETNVQGTQTL LQCAVDAGVG YAASKAGSDL VARAYHRTYG LDVRITRCCN GANVREWVHT DDHCRGIALV LAGGRAGETY VADRKGHDLR YSLDGGKIER ELGYRPQVSF TAVEVSA (SEQ ID
Partial amino acid sequence of S-adenosylmethionine synthase, ORF 15 (SAM synthase) (SEQ ID NO: 16) 1 IGYDSSKKGF DGASCGVSVS IGSQSPDIAQ GVDTAYEKRV EGASQRDEGD ELDKQGAGDQ 61 121 181 241 301 GLMFGYAS DE LDT VVVS SQH
GGPMGDAGLT
RCEVQVAYAI
QTAAYGHFGR
TPELMPLPIH
ASDIDLESLL
GRKIIIDTYG
GKAEPVGLFV
ELPDFTWERT
LAHRLSRRLT
APDVRKFVVE
GMARHGGGAF
ETFGTHKIET
DRVDALKKAA
EVRKNGTT PY
HVLAQLVEDG
SGKDPSKVDR
EKIENAIGEV
GL (SEQ ID
LRPDGKTQVT
I KLDTDGYRL
SAAYAMRWVA
FDLRPAAI IR NO: 16)
IEYDGDRAVR
LVNPTGRFEI
KNVVAAGLAS
DLDLLRPIYS
Partial amino acid sequence of ORF 16 (homologous to M. tuberculosis cbhK) (SEQ ID NO:17) 1 MPJAVTGSIA TDHLMTFPGR FAEQILPDQL AHVSLSFLVD 61 GRRPVLVGAV GKDFDGYGQL LRAAGVDTDS VRVSDRQHTA 121 MAEARDIDLG ETAGRPGGID LVLVGADDPE AMVRHTRVCR 181 SVRELVDGAE LLFTNAYERA LLLSKTGWTE QEVLARVGTW TLDIRHGGVA ANIAYGLGLL RFMCTTDEDG NQLAS FYAGA ELGLRRAADP SQQLARLEGD ITTLGAKGCR (SEQ ID NO:17)
While not all of the insert DNA of cosmid pKOS023-26 has been sequenced, five large contigs shown in Figure 3 have been assembled and provide sufficient sequence information to manipulate the genes therein in accordance with the methods of the invention.
The sequences of each of these five contigs are shown below.
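The ORF coordinates stated for contigs 001 to 004 in this section can be sanity-checked with a few lines of code. The Python sketch below is a hedged illustration, not part of the disclosed methods: it records each ORF as a 1-based start and end position as given in the text, treats an ORF whose start exceeds its end (for example "from nucleotide 995 to 1") as lying on the complementary strand, and checks whether its length is a whole number of codons. The `Orf` class, the `extract` helper, and the reading of "Nucleotides 80 2389" as 80 to 2389 are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    contig: str
    start: int  # 1-based position of the first nucleotide, as stated in the text
    end: int    # 1-based position of the last nucleotide

    @property
    def strand(self) -> str:
        return "+" if self.start <= self.end else "-"

    @property
    def length(self) -> int:
        return abs(self.end - self.start) + 1

    @property
    def whole_codons(self) -> bool:
        return self.length % 3 == 0


ORFS = [
    Orf("ORF 11 (desR)", "001", 80, 2389),   # assumes "80 2389" means 80 to 2389
    Orf("picCIV (partial)", "002", 995, 1),
    Orf("picK", "002", 1356, 2606),
    Orf("ORF 12", "002", 2739, 5525),
    Orf("ORF 13", "003", 104, 982),
    Orf("ORF 14", "003", 1114, 2127),
    Orf("picCI", "003", 2124, 3263),
    Orf("ORF 15 (partial)", "004", 1692, 694),
    Orf("ORF 16 (partial)", "004", 692, 1),
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract(contig_seq: str, orf: Orf) -> str:
    """Return the ORF's coding strand, reverse-complementing minus-strand ORFs."""
    lo, hi = sorted((orf.start, orf.end))
    segment = contig_seq[lo - 1:hi]
    return segment if orf.strand == "+" else segment.translate(COMPLEMENT)[::-1]


for orf in ORFS:
    print(f"contig {orf.contig} {orf.name:18s} strand {orf.strand} "
          f"{orf.length:5d} nt  whole codons: {orf.whole_codons}")
```

A length that is not a multiple of three is expected for an ORF that runs off the end of a contig, as with the partial picCIV and ORF 16 sequences. The contig sequences as printed here contain OCR damage, so `extract` would only give meaningful results on a clean copy of the sequence listing.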
Contig 001 from cosmid pKOS023-26 contains 2401 nucleotides, the first 100 bases of which correspond to 100 bases of the insert sequence of cosmid pKOS023-27. Nucleotides 80 to 2389 constitute ORF 11, which encodes beta-glucosidase. (SEQ ID 1 61 121 181 241 301 361 421 481 541 601 661 721 781 841 901 961 1021 1081 1141 1201 1261 1321 1381 1441 1501 1561 1621 1681 1741 1801 1861
CGTGGCGGCC
GGCGGAGCTC
GGACCCCGAC
GCTGCGTGCC
GCCCGCGCCG
GGTCATGGGC
CAACATCCGG
CTCCTCGCGC
GGCCAAGCAC
CGACGAGCAG
CGCGGGCTCC
CGAGCTCCTC
CTGGCTCGCC
CGAGCTCCCC
GGCGCTGAAG
GGAGCGGATC
GCCCGAGCGC
GGTLGCTCCTG
CG'CGGTCATC
CGTCCCGGAC
GACGGTGACG
CAGCCCGGCG
CGGCACGCTG
TTACGCCACG
GAGCAGCCCG
GA.TGAGTGCC
GACGATCGCG
CGACGACGGC
GCTGATCTCG
GTCGGTGCTG
CCAGGCGGGC
GCTCACGCAG
GCCGCTCCCG
GTCGCCCAGA
CGGCAGAACG
GCCGACGGCC
GTCGCCCTGG
CGCGACGGTC
GTGCCGCACG
ACCGCGGTCG
TTCGCGGCCA
ACGCTCCGCG
TTCATGTGTG
AACAACGTGC
ACCCCGGGCA
GGCGACGTCC
ACGGCCGTCC
GTCGGCCAGA
GACAAGGCGG
CGCAACGAGG
GGCCCGACGG
TCGGCGGCGG
TACGAGACGG
TTCAACCAGG
ACCGTGCCCG
GTGCAGCTCG
CTCCTCAAGC
ACCCCGCTCT
AAGGCCGTG
ACCGAGGGCG
GCTGTCGCGG
ATGCCGTGGC
GCCGAGGCCA
AGCTTCCCGG
GCGCCGCCGA
rGACGCTCGA
TCGGCTACCT
CGAACGGCAT
CCAGCACCTT
GCGCGCTCAA
GCGGCCGGAA
CCCAGATCAA
ACAACCAGGA
AGATCGAGTT
CCTACAACGG
TGCGCACGCA
CCGACGCCAT
CGAAGGGCGA
TGAACGGCAC
TGGAGAAGTT
GTGCCCAGGC
GCCAGGCCCT
CCGTCGACCC
CGCCACTCGA
GTGAGGAGAC
GCCACCAGCT
CCGACGGCGA
GCAGCCACAC
TGACCAAGGG
CCCTGGAGCT
AGTCGGCGCG
TCGACCGTCC
ACGCCAACCC
TGTCCAAGAC
CCGCCGCGCT
CCGCCGAGAA
CACGGCCAAT
CGAGAAGATC
TCCCGGCGTG
CCGCCTGGTG
CGACGACACC
CCAGGACATG
CTACGAGACC
GGGCATCCAG
GAACAACCGC
CCCGGCGTTC
CCTCAACGGG
GTGGGGCTTC
CACCAAGGGC
GCCCTCGCCG
GGTCCCCGAG
CGGTCTGCTC
GGTGTCCCGC
GCCGCTCGCC
CAAGGTCACC
CACCATCAAG
CTTCGGGACG
CGAGCCGGGC
GTACCGCATC
CATCGAGGCC
CACGCACAAG
GGGCTGGGTN
GAAGGCCCGT
GAACCTGTCG
GAACACGATC
CCGCGCGGTC
CCTCTACGGT
CCAGCACGCG
GTTCAGTACA
AGCTTCGTCC
CCGCGTCTGG
GGGCAGACCG
ATGGCCGACA
GTCCTGGGCC
TTCAGCGAGG
GGTGCGGGTC
T TCT CCGT GA
GAGGCGTCCT
AAGCCGTCCT
CAGGGCTGGG
CTCGACCAGG
CCGGCCAAGT
GCGGCCGTGA
CTCGCCACTC
AAGGTCGCCG
GGTGACGCCG
GGCCTGGGCA
GCCCGCGCGG
CAGATCCCGG
AAGGCGGGGG
GCGGTCCGTG
GGTCAGGTCT
CTCACGATCT
ACGCCGGCGG
ACGGCGGTCG
CTGCCGGGTA
GTGGTCCTCA
CTGGACATGT
GACGTCAACC
GTCGCCGGCG
CGAGCCGGGC
ACTGGGCGCT
GCATCCCGGA
CCACCGCGCT
GCTACGGCAA
CGATGATGAA
ACCCCCTGGT
TGATGACCAC
ACGCCAATGT
CCAAGGCCGG
GCGGCAACGA
TGATGTCCGA
AGATGGGCGT
TCTTCGGCGA
CGCGGTCGGC
CGGCGCCGCG
AGAACGGCGC
GCAAGAGCAT
GCGCCCACGT
GTGCGGGTGC
CGGGGAACCT
CGCTGTACGA
CCACCGGTGG
ACGGCAAGGT
CGGGCTTCGC
CGGCCGACGC
TCTTCGCCTA
CGCAGGACAA
ACACCGGTTC
GGTACCCGGG
CGAGCGGCAA
ACCCGACCAG
1921 1981 2041 2101 2161 2221 2281 2341
CTACCCGGGC
GTTCGACAAG
GTTCACGCAG
CACGGTCCGC
CAGCCCGAAC
GCTCGCCGCG
CTGGGATGCC
TTCGTCCTCC
GTCGACAACC
GAGAACGTCA
AGCGCCCCGA
AACAGCGGGA
GTGACGGCTC
GGCGAGGCGA
GCCACGGACA
GCCGACCTGC
AGCAGACGTA
AGCCGCTGTT
CCGTCGTGCG
AGCGCGCCGG
CGCAGGCGAA
AGACGGTGAC
ACTGGAAGAC
GGGGCAGCGC
CCGCGAGGGC
CCCGTTCGGG
TACGTCCACG
CCAGGAGGTC
GAAGAAGCTC
GGTGAACGTC
GGGAACGGGC
CACGGTCAAC
ATCCACGTCG
CACGGCCTGT
GGTGGTCTGA
GTCCAGGCGT
GTGGGCTACA
GAGCCGCCGT C
AACCGCCTCC
GTCTGGTGAC
GGTACCGCTG
CGTACACCTC
AGGTCACGGT
ACCTCGGTGC
CGAAGGTCTC
AGCTGCAGTT
TGCAGACCGG
GTGACGCCGT
2401 G (SEQ ID
Contig 002 from cosmid pKOS023-26 contains 5970 nucleotides and the following ORFs: from nucleotide 995 to 1 is an ORF of picCIV that encodes a partial sequence of an amino transferase-dehydrase; from nucleotides 1356 to 2606 is an ORF of picK that encodes a cytochrome P450 hydroxylase; and from nucleotides 2739 to 5525 is ORF 12, which encodes a transcriptional activator. (SEQ ID NO:21) 1 61 121 181 241 301 361 421 481 541 601 661 721 781 841 901 961 1021 1081 1141 1201 1261 1321 1381 1441 1501 1561 1621 1681 1741 1801 1861 1921 1981 2041 2101 2161 2221 2281
GGCGAGAAGT
CGGTGGATGC
CCGTGGCGGT
SCGGCGTGGT
CCCATGGCGG
GGCAGGTCGA
GTGACGACGG
TCGGCGTCGC
GCGTCGAAGT
CAGGGGCGGC
GCCACCTGGT
AGGCCGATCC
TCGCCGGTGA
GCCACGGCAT
GGGCCGCCGT
GCGCGGTCGA
CCGAAGAATG
CGAAATGATT
GACGAGATGG
CCGACAAGAG
TCCATCGGCG
GGAGGGCCGG
TTCCGTTCCG
CCGCTTCTCC
ATCCGACGTA
GGGACGAGGT
GGTTCAGCAA
ACCACAACAT
GTGAGTT GAG
GGCTCGTGGA
CCTGGCCGCT
CCGCCTTCCG
CCGCCATGGC
ACGGCGAGGA
CCTCCGAGGA
TCAATCTGAT
TGCGGGCCGA
CGGTGGAATC
TCCCGGCCGG
AGGCGCGGGT
CGGTGGTGGC
CGTGGTCGGC
TGCGCCGGTT
CGGCGGCCTC
AGCCGAAGTT
CGCCGCCCTC
CGAGGCTGCC
ACAGCCGCAG
CCCAGAGGTG
CCGGGTCGAG
AGCGCAGTGC
GGCCGGCGGC
GCCGGACCCC
TGGACAGCCA
TGCGGTTGGG
CGAGGTCGGA
CGCCGATCCG
TCGATTGTGG
CACGCTATGC
GCCCGAATGT
GGCGCGCTCC
CTTCCGGCCC
CCCGGTACTC
CGCGAGACTG
GTGGCTGGTC
GGACTGGCGC
GCTGGAGTCC
CAT GCGCCGG
CGCCATGCTG
GCCGATCACC
CGTCTGGACC
CGAGATGAGC
*CCTGCTCAGC
*GCTGCTCGGT
CGCCAACGGC
CATGACGCTC
CGCGACCTAC
TGACACGGTC
GTGCACGCCT
CTCGTCGATC
GACGAGGACG
CCGGTCGATG
GCTCATCTTG
GTGGAGGGCG
GAAGGCGTTG
GGCGGGCCGG
GCCGTGCTCG
GACGCCGACG
GTTGCCGGTG
GTGCGGGGTG
GTGCGCGAGG
GGCGAGCCCG
CTGGCTGTCG
CCG CCC CACG
TAAGGCGCTT
GGAATCCCGA
TGGTCGATTT
GCTCTCGATG
GATGATCCTT
GCGGAAGAGT
GGTCTGGAGT
GACCTCGGGG
CGTGCCGAGG
GTCGGCTACG
AACTCCACGA
GACCCGCCGC
GTCGAGTTGC
GCGGCGCCCG
GTGATCTCCG
GACGCCTTCG
GGCTATCTCI
GCGCTCGTGC
ATGGCCCACT
ATGTACGCGC
TTGGACGGCC
CGCTTCCCGC
CTCGTCGTCC
TCGGCCTTCA
TCGACGATCA
CCGGGGAGGT
AC CTCGGGAA
GCGTTGGTCC
CGGATCCGGG
ACGGCCTTGG
CCGTCGACCG
TCGGCGACCT
ACGGCCGAGG
TCCGGGTCGA
GCGGCGAACG
AGCTGGAGCC
GCGACGCGCT
AGGGCCCGGT
AGGAGCGGCT
TTCACGGATG
ACGAGGTCGC
CGGGGGGACT
TGCTTCGGAT
GACAGGATCC
ACGTGTGAGA
TCTCCGTGCG
CCCTGGGGCA
GTCCGGCCCA
ACCGGGCGCG
CTCCCCTGAC
GGCACACCCG
TGCGGCCCCG
ACGGCCGCGC
AACTCCTCGG
TCTTCCCGGA
CCCGGCTCAT
GGACCAGCGA
TCCTGCTCGT
TGCTCTCGCA
CGGTGGAGGPA
TCGAGCCCGI
TGGCCGACGC
GGACCTCCAT
CGTACTGGTG
CCGCGAGGTG
ACGCGTCGAG
CGCCGGCGGG
CGGCGAGGTC
TGGCGTGGAA
CGCAGCCGAG
TCCGCAGCTG
TGCGGGGTGT
TGTCGGCGAA
TCATCGACGG
CGGCCGTGGC
CCTCGAACTC
CGAGCCGCTC
GGTCGAAAGC
TTCCCTCCGG
CGCGCTCCAC
CTAATCCGCG
CACATCCGCC
GGGAATCAGC
AGTCCCGTTC
CCGTACCCAG
GGATTTCGCG
CCGGGTGCGC
GGCGGTCCTC
CGAGGCCGAG
GCTGCGCAAG
GGTCCAGGAG
CGATCTGATG
CGTGCCCGAG
*CGATCCCGCC
CGACTCCAAG
*CGAGGACGGC
CGCGGGGCAC
CCCCGACCAG
GATGTTGCGC
CGACCTGGAC
CCACCGCACC
GACGAGGTCG
GTTGTTGAGG
CTCGCGGTAG
GGAGGTGAGG
GCTGCCGCCG
GGCGTCGTCG
GCTGAAGACC
GGCGTGCGCG
GTCGGCGGCG
GACCGCGGCG
GACCGGGGTG
CATGATCACT
GTTGCAGGTG
GCGGACGAGC
GTACAGCCTG
GGCGGGGCCG
GCCACCGTCA
CGTGACGTAC
CGGAACGGGA
TCCGGGGTAT
CGAGCCGCCG
CTCTTCCCGT
CAGGGAACGA
GCCGATCCGT
ACCCCCGAGG
GCCGATCCCC
GCCGCGCTCA
CTGGTGGCCC
ATCGTCGACG
GAGTCCCTGG
CCGGACCGCG
CAGGCCCAGA
CGCGGGCAGG
TCCCGGCTGA
GAGACCACGG
CTGGCCGCCC
TACGAGGGCC
GGCACGGTCA
CCCGAGCGCT
2341 2401 2461 2521 2581 2641 2701 2761 2821 2881 2941 3001 3061 3121 3181 3241 3301 3361 3421 3481 3541 3601 3661 3721 3781 3841 3901 3961 4021 4081 4141 4201 4261 4321 4381 4441 4501 4561 4621 4681 4741 4801 4861 4921 4981 5041 5101 5161 5221 5281 5341 5401 5461 5521 5581 5641 5701 5761 5821
TCCCGGACCG
ACGGCATCCA
GCGCCCTTCT
GGTATCCGAA
GGGAGGCGGG
GAAGCCCCGG
CGAAGGGTTC
GGGAGATAGC
TACTCGTCTC
TGGCCGCCGA
TCCCCCTGGG
CCGCCGTCCG
CGACGCGCCG
CCGCCGGCAC
TGAGGTTCCT
CCGAGCGGGC
CGCACTGCCG
CCCACTACTA
GCGGGAACCC
TCGGCGCGGC
ACTGCCTGCA
AACAGTCCGA
GCCACATCCA
CCGCGATCCG
GGCGCGCCGC
TGCTGGTCGG
AGCAGGCCCT
GGTCGAGCAC
GGCGGATGAA
GTGAACTGCC
GGCTGCCCGA
TGGAGCTGTC
TGCCGGCCAC
CCGCGCTCCA
AGGCCGAACA
CGGCCCTCTT
CCCTGCTCGC
CCCGGGCGAT
TGGCGCTCTC
CGCTGCTGCT
AGCCGGTGCC
GCCGCTACTG
GGGAGATCCT
CCGCCGAGGT
AGCTCGCCCT
CGGCGGCGGT
ACGACAGCGG
AGGCCCAGGG
GGGCCTGCGG
CGAAGGCGGT
AGGCCGAACG
GCCGGCTCTG
TGAACGTGAC
CCTGAGCCAC
GGACACGCCG
AGGCGCCCGA
ACGCCAGGGA
CCCGGTGCGC
GGCGGCCGGG
GCACCGCTTC
CTTCTGCATC
CGAACGCTGC
CCCGATGATC
CCGCCGTACC
ATCGGTCCCC
GGCGCCCGGA
CCATCTCAGG
CGGACCGGCC
GCGGGAGACC
CGTCCTCTGC
CGACCTGCTG
CTCCGCGTCG
CCCGTTCCTC
CCTGTACTGC
CTCGCAGCGC
CAACATGTGG
CGGCCCCGAG
GCTGCTCCTG
CGGCGGCGAC
CCGCAGCGCC
CCCGCTCCTG
GGAGCTCGCC
CGAGGCCGCC
GGAGCAGCTG
CGGCGCCCCC
GTTCGACGAC
CGACAACACC
CCCGCACATG
GCCCAGCCAC
GGCCGCCGAC
GCTCACCCGG
GCCGGAGCCG
GGCCCAGGCC
GATCCTGCAG
GGTCCTCGTC
CGAGGCCGTG
GATCGCGATC
CCACGCGGCG
CGCCTGCACG
GGACGCGATG
GCTGGCGANC
GGGCAGCTGG
GTACCTGCGG
GGTGCGGCCC
GGACGGCCAG
CGACCGGCTC
GGACAACTAC
CGCGTACCCG
GAGCACGGAG
CCGGGTGGCG
CGTCACCGCG
CCGCCGAGCA
CCCCGGTGTC
GTGCGACACG
GGCGCCCGGT
*CCGCTGGGGA
*CCGGGGACAC
GTGTCCTTCA
3ACATCCGCC 3GCGCCCCCT
CGGACCTCG
CGCGGGCTCA
3GTTGAACCC
CCTCGCCGTA
CGAGGGGGGA
GCCGTTCTTG
GGCAGCGGGA
CCCGTCTGGT
CAGTTACTCC
GACGCCGCCT
ACGAGACACA
GTCGCCGTCG
GCCGCCCACC
GCCGGATACC
CTCTCCGGGC
GCCGCCGAGC
CGGGCG CT GA
GAGCCCGTCC
GAGGGCACAC
GTGGAGCGGC
GCCATCGGCC
CTCCAGGACC
CACCGGGACG
GACGCTCCCT
CGACTCGACG
CAGCTGGCCC
ACGACCCGGG
CCGGTCATGG
GCGCTGTCCC
ATGTGGCTCG
GAGCGGGGTC
GGCGTCTTCC
GGCTGCCGGC
CACGCCGACC
GAGCGGCGGT
CGCTGCGGCG
CCGGAGAGCT
GAGGCCGGCG
TTCGACTCGC
GGCCGGCTGC
AACCTCGACC
CTCGGCAACC
GGGCGCTCCC
CAGGCGGAGC
GAACACGCCC
CGGGCGAGGA
CTGGCCGAGG
CTGGAACTGC
GCCCTGGCAG
AGCACGGTCG
GACCTCCCGA
CCCGTGCGAC
GGGGCGCGCC
GCGGCACCCG
CACCGGGACC
CAGACCGCCG
TCGGTGGGCC
3GGACACCGC C I'GGCCCGGTT G CCCTGGACGT C kGGCCCTGCC C GCACGTCACC C k.CAAGACCTG C CTTCCGCGAT C kCGCATCCGC C 0,GACGGAGCT C CGGTCCGGGC C GCAGCGCCGA I'
CGCGGCGGGC
CCGCCTGCAC
ACGACCTGAC
ACGACCAGGG
GGGTGTTCCG
TTCCCCCCAG
GGCGGGCCCC
CCCAGGACCG
ACGGCGACGC
TGGAGACCGC
TCACGGGAAC
TCCTGGACGA
TGCCGGCCGG
GCGCCGACGA
GGGCGCTGCC
ACGCCTTCCG
GCCTCGCCCC
CCCTCGCACT
CCCTGATCCG
GGCTGCGGCC
CGGCGCTGTG
CCGTCCCCGT
AGCGGGGCCC
TGTCGGAGGA
GGCTCGACCG
CGCTCGGCTG
ACCTCCCGAC
GGGGCCTCGC
AGTACGAACA
GGCACGGCAT
ACGCGGCGCT
AGCCCTCGAT
GCCAGAAGGC
GCACCCGGGG
GGCTGCACGC
GCGCGCTCGC
TGACGGCGCG
AGATCGTGCC
CGGGCGGCCC
CCCGAGGATT
AACAGCACCT
TCAGCCTCGC
GACCCGCCGC
AGGTGCCATG
GAGACGCCAG
TCAGGGACCG
GGACCACCCG
TTCATCGGCA
GGCCATCTC G ;GAGGCCCGG P~ TCCCCCGGC C ;ATCCGCTGG C ATTACGACT C ;TTAGAGTGA I ;AATCTGGTG C ~GCAGGTGAC C ;CTGCGGTCG ;CTGcCGGGT C
.CAACACGGTC
,GGAACCTCA
;ACTGGCTGC
:CACGCCGAC I
GGCATCGGC
MGCCGAGCTG
:GGGGTACGC
'GCGTACCAC
GCAGGCCTCC
CTTCGCCCAG
CCGCTGGCTC
GACCGCCGCC
GGACGGCACC
CGAGCGCACC
GGACACCGTG
CCTGCTCGAA
GATCCTCGAG
PCACCTGGTC
CTTCGACCGG
CTGCCTCGTC
CAGCTCCGAC
CCCGCCGCTC
ACGGCTCGCG
GGACAACGCC
GACGTACGAG
GGCGCTGTTC
GGAGGCGGTC
GGCGCGGGAG
CGTGGGCATG
GGCGGAGCGG
GGAGTACATG
GGGCGAGTTC
CGTGCCCTGG
CAGGGCGCTG
TCTCACCCTG
CGAGGCGGTC
CGGGATGAGC
GCTCGCCGGC
GGGCCGCGGC
GGACGTCGGC
GACGAACCGC
GACGCGCGTC
CCAGGACAAG
ACGGGCCACC
GGGACCTCCG
GACCGCCGGG
CCGGGACCGC
AGGGTGCCCG
GGAGGAAGCG
;CCTTCGGCC
~TCGCCGTCC
;AACTCGTGT
GGCGAGGAC
~CTTGTCACG
'GGAGGACGA
;AACGCGACG
;GGACGCTCT
TCCGCCGGC
ACCGCGACA
CCGACACCT
~CTCCCCCGC
rCTCCGTCTC
~CCGCGTCCC
r'TCGTCATGA
,TCCGCCAGC
'AGTTACTCG
3CGACGACCG
'ACACCACCC
GCCGTCCTCG
GCGGTCCTCG
GCCGTCGAGC
CTGGGACAGC
GAACTGCACC
GCCCGCCACC
CGGGGCGCGC
TTCGCCGTGC
GCGGCCTCCT
CTCCTGAGCG
TGGTACGGNC
A.ACGATGCCT
CTGGAGTCCC
CCGCGGACGA
TCGGTCGCGC
GCCCTGGAGA
TGGTCGGACG
TTCGCCGCGA
CGGGCCGAGC
CCCCTCTCCG
GTCCTGCGGC
CACGCCCGGG
ATGCTCTGCG
CGGACCTCCG
GCCGAGGCCC
CGGGTCCTGG
GACATGCTGC
CGCCACCAGC
GACATGGCGT
GGCCGCCGGG
CTGCTCTCGG
CAGATAGCGC
TACCGCAAAC
TCCGTCACGG
GGGCCCGCCG
TGACCGCCCG
ACCACCGGAG
CCGAGTTGCA
GTGTGGCCCC
ACCGTGAGAC
5881 CCGTCGTGCC GTCGGCGATC AGCCGCCTGT ACGGGCGTCG GACTCCCTGG CGGTCCCGGA 5941 CCCGTCGTAC GGGCTCGCGG GACCCGGTGG (SEQ ID NO:21)
Contig 003 from cosmid pKOS023-26 contains 3292 nucleotides and the following ORFs: from nucleotide 104 to 982 is ORF 13, which encodes dNDP glucose synthase (glucose-1-phosphate thymidyl transferase); from nucleotide 1114 to 2127 is ORF 14, which encodes dNDP-glucose 4,6-dehydratase; and from nucleotide 2124 to 3263 is the picCI ORF.
(SEQ ID NO:22) 61.
121 181 241 301 361 421 481 541 601 661 721 781 841 901 961 1021 1081 1141 1201 1261 1321 1381 1441 1501 1561 1621 1681 1741 1801 1861 1921 1981 2041 2101 2161 2221 2281.
2341 2401 2461 2521 2581 2641 2701
ACCCCCCAAA
ACGCCGACCG
GGCCGGCGGG
TCCGGTCTAC
TCGCGAGATT
AAACGGCAGG
CGCGGACGCA
GGGCGACAAC
CCTCGACGGC
CGAGGTGGAC
CAACCTCGCC
CATCCGGCCC
GCGGGGCCGG
CCACGACTCG
CTGGATCGCG
TCACGGCCTG
CGGCCGCGAG
GCCACCGACA
GCGACCTACA
GGCTTCATCG
GCCGATGAGG
CCGGTGGACG
CTCGCCCGGG
GACCGCTCCA
CTGCTCCAGT
GTGTACGGGT
CCCTACGCGG
GGCCTCGACG
AAGCTCATCC
GACGGCGCGA
GTCCTCGCGG
AACCGCGAAC
AAGGTCGCCG
CGCGAGCTCG
TGGTACCGGG
GCCACCGCCG
TCGACCTCAA
TCCTCGACTC
CCGCGTACTG
TCGCCCTCCG
ACATCGCCAG
ACGAGGACCA
GGGCGCTCCT
TCGCGGACCG
ACCGGGGCCC
AGAACCTCGG
GGGGTGGTGA
TTATCACCGG
AGCGGAACTC
AACAAACCGA
CAAATCATCT
CACCTGGGAA
CTTCTCGTCG
ATCTTCCACG
TGCGTGCTCT
GCGACGGGCC
GTCACCGGCC
TCGCCGCGCG
GCCGAACTCG
CTCCTGCGGG
GGCCTTGAGG
GGAGAAGGCC
GGAGCCCCGT
GTGCGACCCA
GCGCGACCGA
GCTCGCACTT
TGATCGTCCT
CGGACCCGCG
AACTGCGCGG
TCGCGGGCGC
GCGCCGTCGA
CGATCGACTC
CGTCCAAGGC
TACGGATCAC
CCCTCTTCGT
ACGTCCGCGA
GCGGCCGGGC
TCACCGGCAT
ACCGCAAGGG
GCTACCGCCC
AGAACCGCGG
TGGAGGTGTC
GGCCGCCTAC
GGGGCGCTAC
CGAGACGGAC
CGGCCTCGGC
CTGGCTCGCC
CCCCACCCTC
CCCCGTCCAC
GCACGGCCTC
GCGGATCGGC
CTGCTTCGGC
CACTCCCCCT
CGCCCTGCTG
GGCTGCATCC
TGATCTACTA
CGACCCCCCA
TAGAACTCGA
GAGCCGAGCA
GGCCCGGCCT
TCGGCTACCC
GGCTGACCGA
TCTACCTCTA
GCGAGCTGGA
TCAACCTGGG
CCGCCCAGTA
AGATCGCCTT
TCTCCCGCAC
GAGGGCACCT
CACCGCGACC
AAGGAAGACG
CGTGCGGCAG
GGACAGCCTC
ACTGCGCTTC
CGTGGACGCG
GTCCGTGTTC
CGCCGGCGTC
CGGCTCCTGG
CGGCTCCGAC
CCGCTGCTGC
GACGAACCTC
GTGGGTGCAC
CGGCGAGATC
CCTCCTGGAC
CCACGACCTG
GCAGGTCTCC
CTGGTGGGAG
CGCGTGAGCA
GAGGAGCTCC
CTCCTCGGAC
CACGCCGTCG
ATCGGACCCG
GTGTCCGCCA
GACCCGCTGC
CTCTACGGGC
CACATCGTCC
GCCGGGTCGI
GACGGCGGCC
GGGCAGCCCC
CTAGTTTCCG
GGCGACCTCG
TCCGCTGTCG
GCACATCGAA
CTATGCGGTC
CATCGGCGAC
CTACACGCTC
GGTCAAGGAC
CCTCGTCGAG
CGACAACGAC
GATCACCGAC
CCGCGGCTTC
CGTCCAGGTC
CCGCATGGGC
CGAG TAG GGC
CGCGGCCGAC
CGCACCGCCA
GCAGTGCGGC
CTCCTCGCCG
ACCTACGCGG
GTCCACGGCG
ATCGTCCACT
ACCGAGACCA
GGCCGGGTCG
ACCGAGAGCA
CTCGTTGCCC
AACAACTACG
CTCGACGGCG
ACCGACGACC
TACCACATCG
TCGCTCGGCG
CGCTACTCCC
TTCGCGGACG
CCGCTCAAGC
GCCGCGCCGP.
GCGCGGAGAC
CCGAACTCGI!
GCGTGAACAC
GGGACGAGGI
*CCGGCGCGAC
TCGTCGAGAI
ACCCCGCCGI
AGGACGCCGC
*CGGTGGCCG(
CCGTCGTCA(
TAGCGCCCCC
AGAATGAAGG
GTCATTTCGA
GTTCTCATGC
CTCTTCCAGT
CAGAAAGAGC
GACACCTGCG
CTGCGGGACA
CCCGAGCGGT
AAGCCCGTCA
GTCGTCGACA
GTCAACCGCG
GCCTGGCTGG
CTGGAGGAGC
TTCATCGACG
AGCTATCTGA
GCGTTCCCAC
CCGACAGTGC
TTCTGGTGAC
GGGCGTACCC
GCAACCGCGC
ACATCCGCGA
TCGCGGCCGA
ACGTGCAGGG
TGCACGTCTC
GCCCGCTGGA
GCGCCTACCA
GGCCGTACCA
GGACGCTCCC
ACTGCCGGGG
GCGGCGGCCT
CCGACTGGTC
TCGACGGCGG
GCCTCGCGCG-
CGACCGCCCC
LGACCCCCCGC
CGACGCCGCC
AGGATTCGAC
CGGGATGGAC
GATCGTCCCC
CCCCGTGCCC
SGGCGATCACC
SCATGGACGCC
GCAGGGCCAC
GTTCAGCTTC
CGGCGACCC(
CTAACTCGCC
GAATAGTCCT
AGCAGATTCT
TCGGCGGTAT
CGCTTCTCGG
CCGCAGGAAT
CCCTGATCCT
GCATCGCGCG
ACGGCGTCGC
AGCCGCGCTC
TCGCCAAGAA
TCTACCTGGA
ACACCGGCAC
GGCAGGGCGT
CCGAGGCCTG
TGGAGATCGC
GACCGACAGC
GACCCACACC
CGGAGGTGCG
CGACGTGCCC
CAACCTCGCC
CGCCGGCCTC
GAGCCACGTG
CACGCAGACG
CACCGACGAG
GCCCAACTCG
CCGGACGTAC
GCACCCCGAG
GCTGTACGGC
CATCGCGCTC
GGAGCTGACC
CTCGGTCCGG
CAAGATCGAG
GACCGTCCGC
GCAGCTGCCC
GTCCCCTTCC
ATCGCCCGCG
GCGGAGTTG
GCCCTCCAGC
TCGCACACGT
GTCGAGCCGC
CCCCGCACCC
CTCCGCGAGC
GGCGCCCGCT
TACCCGGGCA
GAGCTCGCCG
WO 99/61599 WO 9961599PCT/US99/1 1814 -38- 2761 2821 2881 2941 3001 3061 3121 3181 3241 NO: 22)
AACGGCTCCG
GCACCAACTC
TGGACAGCTG
GACTGCCCGG
TCACCGTGCG
ACACCCTCAC
GGCCGGAAGG
TCGGCCCGCA
CCGAGCGGGT
GATGCTCCGC
CCGCCTGGAC
GAACGGCCGC
CATCGGCCTG
CACCGAGCGC
GCACTACCCG
CTCGCTCCCG
CCTGGAGCGC
CGACCAGGCC
AACTACGGCT
GAGATGCAGG
AGGTCGGCGC
CCGGTGACCG
CGCGACGAGC
GTACCCGTGC
CGGGCCGAGA
CCGCAGGCGC
TAGTCAGGTG
CGCGGCAGAA
CCGCCGTGCT
TGGCCGCGGA
CGCCCGACAC
TGCGCAGCCA
ACCTCTCGCC
GCTTCGCGCG
TGCGGGTGAT
GTCCGGTAGA
GTACAGCCAC
GCGGATCCGG
GTACCTCTCC
CGACCCGGTC
CCTCGACGCC
GGCCTACGCG
GCAGGTCCTC
CGACGCCGTG
CCCAGCAGGC
GAGACGAAGG
CTCGNCCACC
GGGCTCGCG
TGGCACCTCT
CGCGGCATCG
GGCGAGGCAC
AGCCTGCCGA
CGCGAATGGG
CG (SEQ ID
Contig 004 from cosmid pKOS023-26 contains 1693 nucleotides and the following ORFs: from nucleotide 1692 to 694 is ORF15, which encodes a part of S-adenosylmethionine synthetase; and from nucleotide 692 to 1 is ORF16, which encodes a part of a protein homologous to the M. tuberculosis cbhK gene. (SEQ ID NO:23)
1 ATGCGGCACC 61 TGCTCGGTCC 121 AGCTCGGCTC 181 241 301 361 421 481 541 601 661 721 781 841 901 961 1021 1081 1141 1201 1261 1321 1381 1441 1501 1561 1621 1681
GAGGGGTCCG
GCCTCGGGGT
TCGCCCAGGT
TTGCCGTCCT
ACGGAGTCGG
CCGACGGCGC
GCGGCGACGC
GCGAGCTGGT
GTGGCGATCG
GTCCACGCGG
GGCGGCGGTC
GCGGAGGTCG
GCCGAAGGTC
GACCTCGCAG
GTACGCGGCG
GCGGGCCATG
CATCGGGCCG
GTCGAGCTTG
CACGTCGGGG
GACCGTGTCG
GTCGGGACGC
ACGGTGCGCG
GAACATCAGG
GGAGGCACCC
CTGCGACCCG
GTCGTACCCG
CCTTGGCGCC
AGCCCGTCTT
CGTCGACGAG
CGGCCCGGCG
CGTCCGCGCC
CGATGTCGCG
CGTCGGTGGT
TGTCGACGCC
CGACGAGGAC
CGCCGTGCCG
CCGGCAGGAT
ACCCGGTGAC
TCGGTGCGCT
TGGGAGTAGA
AAGACCTCGC
TCGACGAAGA
CGCGAGGCGA
GAGCGGTCGA
CCGCCGTAGG
CCGATCTCGA
ATGCCGTCCT
GCGAGCAGCG
AGACGGACCG
AGGTACGGGA
AGGTGGATCG
CCCTGGTCAG
TCGACCCGCT
ATGGACACCG
GAGCGTGGTG
GCTGAGCAGC
CTCCCGGACG
GAGGCCCAGC
GACGAGGACG
CGCCTCGGCC
GCACATGAAG
CGCGGCGCGG
GGGGCGGCGA
GATGTCCAGG
CTGCTCGGCG
GGCTATACGC
CCCAGGTGAA
TCGGGCGGAG
CGATGGCGTT
GGCCGACGGG
GACCGGCGGC
CCTTCGACGG
TGTCGATGAT
AGCGACCGGT
CGACGAGCTG
ACTCCAGGTC
CCCTGTCGCC
TGGTCCCGTT
GCAGCGGCAT
CGGCGCCCTG
TCTCGTACGC
ACACGCCGCA
ATCCAGGTGC
AGCGCCCGCT
CTGTCGCCCT
TCGCGGCAGA
AGGTCGATCC
ATCGCGCCCG
CGGGCGGTGT
AGCAGCTGCC
CCGAGCAGGC
GTGTCGACGA
AAGCGGCCCG
ATGTCAGAGC
GTCCGGCAGC
CAGGTCGAGG
CTCGATCTTC
CTCGGCCTTG
GACGACGTTC
GTCCTTGCCG
GATCTTGCGG
CGGGTTCACG
CGCAAGCACG
GATGTCCGAG
GTCGTACTCG
CTTGCGGACC
CAGCTCGGGC
CTTGTCGAGC
GGTGTCGACA
GGAGGCGCCG
CGACCCGGGC
CGTAGGCGTT
CCAGCCGGGC
CCCGCGTGTG
CGCCGGGCCG
CGTAGAACGA
GCTGACGGTC
CGTACCCGTC
CGAGGCCGTA
GGAACGACAG
GGAAGGTCAT
CCCGCGGCCT
TCGCGGCCGA
TCGCGGATGA
TCGGTCTCGA
CCGATCGCGT
TTCGCCACCC
GAGAAGGCGC
CCGGTGAGGC
AGCAGGCGGT
TGCTCGACGA
GCGTGCTGCG
ATGGTGACCT
TCGGTCAGGC
GTCTCGTCCG
TCGTCCCCCT
CCCTGGGCGA
TCGAAGCCCT
GAGCACCTCC
CGTGAACAGC
GAGCTGCTGC
CCGCACCATC
GCCGGCCGTC
GGCGAGCTGA
CGACACCCGC
GAAGTCCTTG
CGCGATGTTG
GGACACGTGG
CAGGTGGTCG
TCTTCAGGGC
AGTGGCCGTA
TCGCGGCCGG
TCTTGTGGGT
ACGCGACCTG
AGCGCATCGC
CGCCACCGTG
CGGCGTCGCC
AGCCGTCGGT
CGAACTTCCG.
AGGAGACGAC
GGGTCTTGCC
GGCGCGAGAG
AGGCATAGCC
CGTCCCGCTG
TGTCCGGGGA
TCTTCGAGGA
ATC (SEQ ID NO:23)
Contig 005 from cosmid pKOS023-26 contains 1565 nucleotides and contains the ORF of the picCV gene that encodes PICCV, involved in desosamine biosynthesis. (SEQ ID NO:24)
1 61 121 181 CCCCGCTCGC GGCCCCCCAG ACATCCACGC CCACGATTGG ACGCTCCCGA TGACCGCCCC CGCCCTCTCC GCCACCGCCC CGGCCGAACG CTGCGCGCAC CCCGGAGCCG ATCTGGGGGC GGCGGTCCAC GCCGTCGGCC AGACCCTCGC CGCCGGCGGC CTCGTGCCGC CCGACGAGGC CGGAACGACC GCCCGCCACC TCGTCCGGCT CGCCGTGCGC TACGGCAACA GCCCCTTCAC 241 301 361 421 481 541 601 661 721 781 841 901 961 1021 1081 1141 1201 1261 1321 1381 1441 1501 1561
CCCGCTGGAG
CGCCCTGTTC
GTACTGGAAG
CAGGAAGCCC
CTGCCACTTC
CAACGCCATG
CTTCTCCGGG
CACCGACCAC
CCTGGAGCGC
CAACGACGAG
GAACCTGCGC
CGCCTACATC
CGACCTCAAC
CAGCGGCCGT
CAACGCCTTC
CGCCCTGAAC
CATGCGGCCC
CCTGTACCGC
CGTGACCCCC
GGTGGCGGCC
CCGCCTGAAC
GCGCTGACCC
GGCCC (SEQ
GAGGCCCGCC
GGGCAGGTCC
AACACCCTGC
GTCTTCCCGT
TGCGTCCGTG
TTCCGGTCGG
GGCCTGGAGC
GGCCTGCGGC
CAGCCCGGCC
GAGTACGAGC
CGCTTCCAGC
GTGCTCCCGG
GACGCCGGGC
GACGACGGCA
GAGGAGCGGG
AGCCTGCGCA
ACCGCGCACC
GAGGCCGGCT
GACACCTCCC
GTCGACGGCG
CAGCTGGAGC
GCACCCGCCC
ID NO:24)
ACGACCTGGG
CGGAGCTCCG
TCGCGCTCGA
ACAGCGTCGG
TGACCGGCGC
TCATCGACGA
CGCTCACCAA
CCAGCGTCTA
TCTGGGGCCT
AGACCACCGG
AGCTGCGCGC
GCCGTGCCTC
AGGGCAGGAC
AGCTGCCGCA
TCCGCGAGCG
CCGGGGCCGA
CGCAGGTCGC
TCCCCGACCT
TCACCGAGGT
ACGAGTACTT
GCGACGCCGC
CGATCCCCCC
CGTCGACCGG
CACCGCGGTC
ACAGCGCGGC
CCTCTACCCC
CCGCTACGAC
GATACCCGCG
CCCCGGCCTC
CACGAACTCC
GCACGCCATC
CAAGAAGGCC
CGAGCGCGAG
CCGCCTGCTC
GATCGACTTC
GGAGGAGCGG
CACGCCCGGA
CGCCGAACTG
GGTGCAGGTC
GGACGGCGCG
CGTCAGGGAC
CATGGACGGC
GGACGGCTGG
GATCCCCCC
GACGCCTTCC
GAGACCGGCC
GTCTTCGACG
GGCCCGACCT
CCGTCCGCCC
GGCAACCCCT
GGGAGCCTGG
TTCGCGCTCA
CGCACCTCGC
GCCTTCCGCC
TCGCCGATCA
GACCTGGTCG
GTCAACATTC
GCCGAGCTCC
CTCCACATCG
CTGCGGATCA
GATCTCCTCG
ACCCGCTACA
TTCGTCGAGC
TTCGATCAGG
GAGGAGGCCC
CCACGATCCC
GGCGCCTCCT
GCGCCGGGGC
CGGCGCTCGC
GCATGTTCCG
TCGACGCCGG
CGGCGATGTA
CCGCGCACGC
CCGAGCGCAC
TCTACGGCCT
GCGTCCGCGA
ACCTCGGCTT
ACTTCATCGC
GCGAGGACTA
AGGAGGCCCT
ACTACGGCTA
AGCCCGCCAC
GCGACGTGTA
TCGCGGGCCG
GCGGCGGCGA
TCGTCACCGC
GCGGCTTCCT
CCCACCTGAG
The recombinant desosamine biosynthesis and transfer and beta-glucosidase genes and proteins provided by the invention are useful in the production of glycosylated polyketides in a variety of host cells, as described in Section IV below.
Section III. The Genes for Macrolide Ring Modification: the picK Hydroxylase Gene
The present invention provides the picK gene in recombinant form as well as recombinant PicK protein. The availability of the hydroxylase encoded by the picK gene in recombinant form is of significant benefit in that the enzyme can convert narbomycin into picromycin and accepts in addition a variety of polyketide substrates, particularly those related to narbomycin in structure. The present invention also provides methods of hydroxylating polyketides, which method comprises contacting the polyketide with the recombinant PicK enzyme under conditions such that hydroxylation occurs. This methodology is applicable to large numbers of polyketides.
DNA encoding the picK gene can be isolated from cosmid pKOS023-26 of the invention. The DNA sequence of the picK gene is shown in the preceding section. This DNA sequence encodes one of the recombinant forms of the enzyme provided by the invention.
The amino acid sequence of this form of the picK gene is shown below. The present invention also provides a recombinant picK gene that encodes a picK gene product in which the PicK protein is fused to a number of consecutive histidine residues, which facilitates purification from recombinant host cells.
Amino acid sequence of picromycin/methymycin cytochrome P450 hydroxylase, PicK (SEQ ID NO:18)
1 VRRTQQGTTA SPPVLDLGAL GQDFAADPYP TYARLRAEGP AHRVRTPEGD EVWLVVGYDR
61 ARAVLADPRF SKDWRNSTTP LTEAEAALNH NMLESDPPRH TRLRKLVARE FTMRRVELLR
121 PRVQEIVDGL VDAMLAAPDG RADLMESLAW PLPITVISEL LGVPEPDRAA FRVWTDAFVF
181 PDDPAQAQTA MAEMSGYLSR LIDSKRGQDG EDLLSALVRT SDEDGSRLTS EELLGMAHIL
241 LVAGHETTVN LIANGMYALL SHPDQLAALR ADMTLLDGAV EEMLRYEGPV ESATYRFPVE
301 PVDLDGTVIP AGDTVLVVLA DAHRTPERFP DPHRFDIRRD TAGHLAFGHG IHFCIGAPLA
361 RLEARIAVRA LLERCPDLAL DVSPGELVWY PNPMIRGLKA LPIRWRRGRE AGRRTG
(SEQ ID NO:18)
The recombinant PicK enzyme of the invention hydroxylates narbomycin at the C12 position and YC-17 at either the C10 or C12 position. Hydroxylation of these compounds at the respective positions increases the antibiotic activity of the compound relative to the unhydroxylated compound. Hydroxylation can be achieved by a number of methods. First, the hydroxylation may be performed in vitro using purified hydroxylase, or the relevant hydroxylase can be produced recombinantly and utilized directly in the cell that produces it.
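The numbered, space-separated layout of the SEQ ID NO:18 listing can be collapsed into a contiguous string before it is passed to sequence tools. The following Python is a minimal sketch only: it assumes that position numbers and whitespace are the sole non-residue characters, shows just the first two rows of the listing, and uses hypothetical function names. The histidine-tag helper illustrates the fusion to consecutive histidine residues mentioned above; the tag length of six and its C-terminal placement are assumptions, since the text specifies neither.

```python
def parse_numbered_sequence(listing: str) -> str:
    """Collapse a numbered, space-separated listing into one residue string."""
    # Numeric tokens are row position markers; alphabetic tokens are residues.
    return "".join(tok for tok in listing.split() if tok.isalpha()).upper()


def add_his_tag(protein: str, n_his: int = 6, c_terminal: bool = True) -> str:
    """Append (or prepend) a run of histidines; length and terminus are assumptions."""
    tag = "H" * n_his
    return protein + tag if c_terminal else tag + protein


# First two rows of the PicK listing (SEQ ID NO:18), copied from the text.
listing = """
1 VRRTQQGTTA SPPVLDLGAL GQDFAADPYP TYARLRAEGP AHRVRTPEGD EVWLVVGYDR
61 ARAVLADPRF SKDWRNSTTP LTEAEAALNH NMLESDPPRH TRLRKLVARE FTMRRVELLR
"""

fragment = parse_numbered_sequence(listing)
tagged = add_his_tag(fragment)
print(len(fragment), len(tagged))
```

The same parser should apply to the nucleotide listings only where the row order has survived extraction, which is not the case for the column-scrambled contigs above.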
Thus, hydroxylation may be effected by supplying the nonhydroxylated precursor to a cell that expresses the hydroxylase. These and other details of this embodiment of the invention are described in additional detail below in Section IV and the examples.
Section IV: Heterologous Expression of the Narbonolide PKS; the Desosamine Biosynthetic and Transferase Genes; the Beta-Glucosidase Gene; and the picK Hydroxylase Gene
In one important embodiment, the invention provides methods for the heterologous expression of one or more of the genes involved in picromycin biosynthesis and recombinant DNA expression vectors useful in the method. Thus, included within the scope of the invention in addition to isolated nucleic acids encoding domains, modules, or proteins of the narbonolide PKS, glycosylation, and/or hydroxylation enzymes, are recombinant expression systems. These systems contain the coding sequences operably linked to promoters, enhancers, and/or termination sequences that operate to effect expression of the coding sequence in compatible host cells. The host cells are modified by transformation with the recombinant DNA expression vectors of the invention to contain these sequences either as extrachromosomal elements or integrated into the chromosome. The invention also provides methods to produce PKS and post-PKS tailoring enzymes as well as polyketides and antibiotics using these modified host cells.
As used herein, the term expression vector refers to a nucleic acid that can be introduced into a host cell or cell-free transcription and translation medium. An expression vector can be maintained stably or transiently in a cell, whether as part of the chromosomal or other DNA in the cell or in any cellular compartment, such as a replicating vector in the cytoplasm. An expression vector also comprises a gene that serves to produce RNA, which typically is translated into a polypeptide in the cell or cell extract. To drive production of the RNA, the expression vector typically comprises one or more promoter elements.
Furthermore, expression vectors typically contain additional functional elements, such as, for example, a resistance-conferring gene that acts as a selectable marker.
The various components of an expression vector can vary widely, depending on the intended use of the vector. In particular, the components depend on the host cell(s) in which the vector will be introduced or in which it is intended to function. Components for expression and maintenance of vectors in E. coli are widely known and commercially available, as are components for other commonly used organisms, such as yeast cells and Streptomyces cells.
One important component is the promoter, which can be referred to as, or can be included within, a control sequence or control element, which drives expression of the desired gene product in the heterologous host cell. Suitable promoters include those that function in eucaryotic or procaryotic host cells. In addition to a promoter, a control element can include, optionally, operator sequences, and other elements, such as ribosome binding sites, depending on the nature of the host. Regulatory sequences that allow for regulation of expression of the heterologous gene relative to the growth of the host cell may also be included. Examples of such regulatory sequences known to those of skill in the art are those that cause the expression of a gene to be turned on or off in response to a chemical or physical stimulus.
Preferred host cells for purposes of selecting vector components include fungal host cells such as yeast and procaryotic, especially E. coli and Streptomyces, host cells, but single cell cultures of, for example, mammalian cells can also be used. In hosts such as yeasts, plants, or mammalian cells that ordinarily do not produce polyketides, it may be necessary to provide, also typically by recombinant means, suitable holo-ACP synthases to convert the recombinantly produced PKS to functionality. Provision of such enzymes is described, for example, in PCT publication Nos. WO 97/13845 and WO 98/27203, each of which is incorporated herein by reference. Control systems for expression in yeast, including controls that effect secretion, are widely available and can be routinely used. For E. coli or other bacterial host cells, promoters such as those derived from sugar metabolizing enzymes, such as galactose, lactose (lac), and maltose, can be used. Additional examples include promoters derived from genes encoding biosynthetic enzymes, and the tryptophan (trp), the beta-lactamase (bla), bacteriophage lambda PL, and T5 promoters. In addition, synthetic promoters, such as the tac promoter (U.S. Patent No. 4,551,433), can also be used.
Particularly preferred are control sequences compatible with Streptomyces spp.
Particularly useful promoters for Streptomyces host cells include those from PKS gene clusters that result in the production of polyketides as secondary metabolites, including promoters from aromatic (Type II) PKS gene clusters. Examples of Type II PKS gene cluster promoters are act gene promoters and tcm gene promoters; an example of a Type I PKS gene cluster promoter is the spiramycin PKS gene promoter.
If a Streptomyces or other host ordinarily produces polyketides, it may be desirable to modify the host so as to prevent the production of endogenous polyketides prior to its use to express a recombinant PKS of the invention. Such hosts have been described, for example, in U.S. Patent No. 5,672,491, incorporated herein by reference. In such hosts, it may not be necessary to provide enzymatic activities for all of the desired post-translational modifications of the enzymes that make up the recombinantly produced PKS, because the host naturally expresses such enzymes. In particular, these hosts generally contain holo-ACP synthases that provide the pantotheinyl residue needed for functionality of the PKS.
Thus, in one important embodiment, the vectors of the invention are used to transform Streptomyces host cells to provide the recombinant Streptomyces host cells of the invention.
Streptomyces is a convenient host for expressing narbonolide or 10-deoxymethynolide or derivatives of those compounds, because narbonolide and 10-deoxymethynolide are naturally produced in certain Streptomyces species, and Streptomyces generally produce the precursors needed to form the desired polyketide. The present invention also provides the narbonolide PKS gene promoter in recombinant form, located upstream of the picAI gene on cosmid pKOS023-27. This promoter can be used to drive expression of the narbonolide PKS or any other coding sequence of interest in host cells in which the promoter functions, particularly S. venezuelae and generally any Streptomyces species. As described below, however, promoters other than the promoter of the narbonolide PKS genes will typically be used for heterologous expression.
For purposes of the invention, any host cell other than Streptomyces venezuelae is a heterologous host cell. Thus, S. narbonensis, which produces narbomycin but not picromycin is a heterologous host cell of the invention, although other host cells are generally preferred for purposes of heterologous expression. Those of skill in the art will recognize that, if a Streptomyces host that produces a picromycin or methymycin precursor is used as the host cell, the recombinant vector need drive expression of only a portion of the genes constituting the picromycin gene cluster. As used herein, the picromycin gene cluster includes the narbonolide PKS, the desosamine biosynthetic and transferase genes, the beta-glucosidase gene, and the picK hydroxylase gene. Thus, such a vector may comprise only a single ORF, with the desired remainder of the polypeptides encoded by the picromycin gene cluster provided by the genes on the host cell chromosomal DNA.
The present invention also provides compounds and recombinant DNA vectors useful for disrupting any gene in the picromycin gene cluster (as described above and illustrated in the examples below). Thus, the invention provides a variety of modified host cells (particularly, S. narbonensis and S. venezuelae) in which one or more of the genes in the picromycin gene cluster have been disrupted. These cells are especially useful when it is desired to replace the disrupted function with a gene product expressed by a recombinant DNA vector. Thus, the invention provides such Streptomyces host cells, which are preferred host cells for expressing narbonolide derivatives of the invention. Particularly preferred host cells of this type include those in which the coding sequence for the loading module has been disrupted, those in which one or more of any of the PKS gene ORFs has been disrupted, and/or those in which the picK gene has been disrupted.
In a preferred embodiment, the expression vectors of the invention are used to construct a heterologous recombinant Streptomyces host cell that expresses a recombinant PKS of the invention. As noted above, a heterologous host cell for purposes of the present invention is any host cell other than S. venezuelae, and in most cases other than S. narbonensis as well. Particularly preferred heterologous host cells are those which lack endogenous functional PKS genes. Illustrative host cells of this type include the modified Streptomyces coelicolor CH999 and similarly modified S. lividans described in PCT publication No. WO 96/40968.
The invention provides a wide variety of expression vectors for use in Streptomyces. For replicating vectors, the origin of replication can be, for example and without limitation, a low copy number vector, such as SCP2* (see Hopwood et al., Genetic Manipulation of Streptomyces: A Laboratory Manual (The John Innes Foundation, Norwich, 1985); Lydiate et al., 1985, Gene 35: 223-235; and Kieser and Melton, 1988, Gene 65: 83-91, each of which is incorporated herein by reference), SLP1.2 (Thompson et al., 1982, Gene 20: 51-62, incorporated herein by reference), and pSG5(ts) (Muth et al., 1989, Mol. Gen. Genet. 219: 341-348, and Bierman et al., 1992, Gene 116: 43-49, each of which is incorporated herein by reference), or a high copy number vector, such as pIJ101 and pJV1 (see Katz et al., 1983, J. Gen. Microbiol. 129: 2703-2714; Vara et al., 1989, J. Bacteriol. 171: 5782-5781; and Servin-Gonzalez, 1993, Plasmid 30: 131-140, each of which is incorporated herein by reference). High copy number vectors are generally, however, not preferred for expression of large genes or multiple genes. For non-replicating and integrating vectors and generally for any vector, it is useful to include at least an E. coli origin of replication, such as from pUC, piP, plI, and pBR. For phage based vectors, the phage phiC31 and its derivative KC515 can be employed (see Hopwood et al., supra). Also, plasmid pSET152, plasmid pSAM, and plasmids pSE101 and pSE211, all of which integrate site-specifically in the chromosomal DNA of S. lividans, can be employed.
Preferred Streptomyces host cell/vector combinations of the invention include S. coelicolor CH999 and S. lividans K4-114 host cells, which do not produce actinorhodin, and expression vectors derived from the pRM1 and pRM5 vectors, as described in U.S. Patent No. 5,830,750 and U.S. patent application Serial Nos. 08/828,898, filed 31 Mar. 1997, and 09/181,833, filed 28 Oct. 1998, each of which is incorporated herein by reference.
As described above, particularly useful control sequences are those that alone or together with suitable regulatory systems activate expression during transition from growth to stationary phase in the vegetative mycelium. The system contained in the illustrative plasmid, the actI/actIII promoter pair and the actII-ORF4 activator gene, is particularly preferred. Other useful Streptomyces promoters include without limitation those from the ermE gene and the melC1 gene, which act constitutively, and the tipA gene and the merA gene, which can be induced at any growth stage. In addition, the T7 RNA polymerase system has been transferred to Streptomyces and can be employed in the vectors and host cells of the invention. In this system, the coding sequence for the T7 RNA polymerase is inserted into a neutral site of the chromosome or in a vector under the control of the inducible merA promoter, and the gene of interest is placed under the control of the T7 promoter. As noted above, one or more activator genes can also be employed to enhance the activity of a promoter. Activator genes in addition to the actII-ORF4 gene described above include dnrI, redD, and ptpA genes (see U.S. patent application Serial No. 09/181,833, supra).
Typically, the expression vector will comprise one or more marker genes by which host cells containing the vector can be identified and/or selected. Selectable markers are often preferred for recombinant expression vectors. A variety of markers are known that are useful in selecting for transformed cell lines and generally comprise a gene that confers a selectable phenotype on transformed cells when the cells are grown in an appropriate selective medium.
Such markers include, for example, genes that confer antibiotic resistance or sensitivity to the plasmid. Alternatively, several polyketides are naturally colored, and this characteristic can provide a built-in marker for identifying cells. Preferred selectable markers include antibiotic resistance conferring genes. Preferred for use in Streptomyces host cells are the ermE (confers resistance to erythromycin and lincomycin), tsr (confers resistance to thiostrepton), aadA (confers resistance to spectinomycin and streptomycin), aacC4 (confers resistance to apramycin, kanamycin, gentamicin, geneticin (G418), and neomycin), hyg (confers resistance to hygromycin), and vph (confers resistance to viomycin) resistance conferring genes.
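The marker genes and resistances listed above lend themselves to a simple lookup table, for example when checking which marker is compatible with a given selective medium. The Python below is only a sketch: the gene-to-antibiotic pairs are taken from the preceding text, while the dictionary name, the function name, and the case-insensitive substring matching are illustrative assumptions.

```python
from typing import Dict, List

# Marker gene -> antibiotics it confers resistance to, as listed in the text.
STREPTOMYCES_MARKERS: Dict[str, List[str]] = {
    "ermE": ["erythromycin", "lincomycin"],
    "tsr": ["thiostrepton"],
    "aadA": ["spectinomycin", "streptomycin"],
    "aacC4": ["apramycin", "kanamycin", "gentamicin", "geneticin (G418)", "neomycin"],
    "hyg": ["hygromycin"],
    "vph": ["viomycin"],
}


def markers_for(antibiotic: str) -> List[str]:
    """Return marker genes whose listed resistances mention the given antibiotic."""
    wanted = antibiotic.lower()
    return sorted(
        gene
        for gene, drugs in STREPTOMYCES_MARKERS.items()
        if any(wanted in drug.lower() for drug in drugs)
    )


print(markers_for("thiostrepton"))
print(markers_for("G418"))
```

Substring matching is used so that a query such as "G418" finds the "geneticin (G418)" entry; a stricter exact match may be preferable in practice.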
To provide a preferred host cell and vector for purposes of the invention, the narbonolide PKS genes were placed on a recombinant expression vector that was transferred to the non-macrolide producing host Streptomyces lividans K4-114, as described in Example 3. Transformation of S. lividans K4-114 with this expression vector resulted in a strain which produced two compounds in similar yield (~5-10 mg/L each). Analysis of extracts by LC/MS followed by 1H-NMR spectroscopy of the purified compounds established their identity as narbonolide (Figure 5, compound 4) and 10-deoxymethynolide (Figure 5, compound 5), the respective 14- and 12-membered polyketide precursors of narbomycin and YC-17.
To provide a host cell of the invention that produces the narbonolide PKS as well as an additional narbonolide biosynthetic gene and to investigate the possible role of the PIC TEII in picromycin biosynthesis, the picB gene was integrated into the chromosome to provide the host cell of the invention Streptomyces lividans K39-18. The picB gene was cloned into the Streptomyces genome integrating vector pSET152 (see Bierman et al., 1992, Gene 116: 43, incorporated herein by reference) under control of the same promoter (PactI) as the PKS on plasmid pKOS039-86.
A comparison of strains Streptomyces lividans K39-18/pKOS039-86 and K4-114/pKOS039-86 grown under identical conditions indicated that the strain containing TEII produced 4-7 times more total polyketide. This increased production indicates that the enzyme is functional in this strain and is consistent with the observation that yields fall to below 5% for both picromycin and methymycin when picB is disrupted in S. venezuelae. Because the production levels of compounds 4 and 5 from K39-18/pKOS039-86 increased by the same relative amounts, TEII does not appear to influence the ratio of 12- and 14-membered lactone ring formation. Thus, the invention provides methods of coexpressing the picB gene product or any other type II thioesterase with the narbonolide PKS or any other PKS in heterologous host cells to increase polyketide production. However, transformation of a 6-dEB-producing Streptomyces lividans/pCK7 strain with an expression vector of the invention that produces PIC TEII resulted in little or no increase in 6-dEB levels, indicating that TEII enzymes may have some specificity for their cognate PKS complexes and that use of homologous TEII enzymes will provide optimal activity.
In accordance with the methods of the invention, picromycin biosynthetic genes in addition to the genes encoding the PKS and PIC TEII can be introduced into heterologous host cells. In particular, the picK gene, desosamine biosynthetic genes, and the desosaminyl transferase gene can be expressed in the recombinant host cells of the invention to produce any and all of the polyketides in the picromycin biosynthetic pathway (or derivatives thereof).
Those of skill will recognize that the present invention enables one to select whether only the 12-membered polyketides, or only the 14-membered polyketides, or both 12- and 14-membered polyketides will be produced. To produce only the 12-membered polyketides, the invention provides expression vectors in which the last module is deleted or the KS domain of that module is deleted or rendered inactive. If module 6 is deleted, then one preferably deletes only the non-TE domain portion of that module or one inserts a heterologous TE domain, as the TE domain facilitates cleavage of the polyketide from the PKS and cyclization and thus generally increases yields of the desired polyketide. To produce only the 14-membered polyketides, the invention provides expression vectors in which the coding sequences of extender modules 5 and 6 are fused to provide only a single polypeptide.
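The relationship between the number of active extender modules and the ring size can be illustrated with a short calculation. The Python sketch below rests on a stated assumption that is not spelled out in the text: each extender module adds two carbons to the polyketide backbone, and the macrolactone closes onto the hydroxyl carried by the carbon derived from the starter unit, so the ring contains two atoms per extension plus one starter carbon and the lactone oxygen. The function name is hypothetical.

```python
def macrolactone_ring_size(n_extender_modules: int) -> int:
    """Ring atoms for a macrolactone closed onto the starter-derived hydroxyl.

    Assumption: two backbone carbons per extender module, plus one ring
    carbon from the starter unit and one lactone oxygen.
    """
    if n_extender_modules < 1:
        raise ValueError("at least one extender module is required")
    return 2 * n_extender_modules + 2


# Six extension cycles correspond to the 14-membered narbonolide ring;
# skipping the last module (five cycles) corresponds to the 12-membered
# 10-deoxymethynolide ring.
for n in (5, 6):
    print(n, macrolactone_ring_size(n))
```

Under this assumption, deleting or inactivating the last module removes one extension cycle and therefore two ring atoms, which matches the 12- versus 14-membered outcome described above.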
In one important embodiment, the invention provides methods for desosaminylating polyketides or other compounds. In this method, a host cell other than Streptomyces venezuelae is transformed with one or more recombinant vectors of the invention comprising the desosamine biosynthetic and desosaminyl transferase genes and control sequences positioned to express those genes. The host cells so transformed can either produce the polyketide to be desosaminylated naturally or can be transformed with expression vectors encoding the PKS that produces the desired polyketide. Alternatively, the polyketide can be supplied to the host cell containing those genes. Upon production of the polyketide and expression of the desosamine biosynthetic and desosaminyl transferase genes, the desired desosaminylated polyketide is produced. This method is especially useful in the production of polyketides to be used as antibiotics, because the presence of the desosamine residue is known to increase, relative to their undesosaminylated counterparts, the antibiotic activity of many polyketides significantly. The present invention also provides a method for desosaminylating a polyketide by transforming an S. venezuelae or S. narbonensis host cell with a recombinant vector that encodes a PKS that produces the polyketide and culturing the transformed cell under conditions such that said polyketide is produced and desosaminylated.
In this method, use of an S. venezuelae or S. narbonensis host cell of the invention that does not produce a functional endogenous narbonolide PKS is preferred.
In a related aspect, the invention provides a method for improving the yield of a desired desosaminylated polyketide in a host cell, which method comprises transforming the host cell with a beta-glucosidase gene. This method is not limited to host cells that have been transformed with expression vectors of the invention encoding the desosamine biosynthetic and desosaminyl transferase genes of the invention but instead can be applied to any host cell that desosaminylates polyketides or other compounds. Moreover, while the beta-glucosidase gene from Streptomyces venezuelae provided by the invention is preferred for use in the method, any beta-glucosidase gene may be employed. In another embodiment, the beta-glucosidase treatment is conducted in a cell free extract.
Thus, the invention provides methods not only for producing narbonolide and 10-deoxymethynolide in heterologous host cells but also for producing narbomycin and YC-17 in heterologous host cells. In addition, the invention provides methods for expressing the picK gene product in heterologous host cells, thus providing a means to produce picromycin, methymycin, and neomethymycin in heterologous host cells. Moreover, because the recombinant expression vectors provided by the invention enable the artisan to provide for desosamine biosynthesis and transfer and/or C10 or C12 hydroxylation in any host cell, the invention provides methods and reagents for producing a very wide variety of glycosylated and/or hydroxylated polyketides. This variety of polyketides provided by the invention can be better appreciated upon consideration of the following section relating to the production of polyketides from heterologous or hybrid PKS enzymes provided by the invention.
Section V: Hybrid PKS Genes
The present invention provides recombinant DNA compounds encoding each of the domains of each of the modules of the narbonolide PKS, the proteins involved in desosamine biosynthesis and transfer to narbonolide, and the PicK protein. The availability of these compounds permits their use in recombinant procedures for production of desired portions of the narbonolide PKS fused to or expressed in conjunction with all or a portion of a heterologous PKS. The resulting hybrid PKS can then be expressed in a host cell, optionally with the desosamine biosynthesis and transfer genes and/or the picK hydroxylase gene, to produce a desired polyketide.
Thus, in accordance with the methods of the invention, a portion of the narbonolide PKS coding sequence that encodes a particular activity can be isolated and manipulated, for example, to replace the corresponding region in a different modular PKS. In addition, coding sequences for individual modules of the PKS can be ligated into suitable expression systems and used to produce the portion of the protein encoded. The resulting protein can be isolated and purified or can be employed in situ to effect polyketide synthesis. Depending on the host for the recombinant production of the domain, module, protein, or combination of proteins, suitable control sequences such as promoters, termination sequences, enhancers, and the like are ligated to the nucleotide sequence encoding the desired protein in the construction of the expression vector.
In one important embodiment, the invention thus provides a hybrid PKS and the corresponding recombinant DNA compounds that encode those hybrid PKS enzymes. For purposes of the invention, a hybrid PKS is a recombinant PKS that comprises all or part of one or more extender modules, loading module, and/or thioesterase/cyclase domain of a first PKS and all or part of one or more extender modules, loading module, and/or thioesterase/cyclase domain of a second PKS. In one preferred embodiment, the first PKS is most but not all of the narbonolide PKS, and the second PKS is only a portion or all of a non-narbonolide PKS. An illustrative example of such a hybrid PKS includes a narbonolide PKS in which the natural loading module has been replaced with a loading module of another PKS. Another example of such a hybrid PKS is a narbonolide PKS in which the AT domain of extender module 3 is replaced with an AT domain that binds only malonyl CoA.
In another preferred embodiment, the first PKS is most but not all of a non-narbonolide PKS, and the second PKS is only a portion or all of the narbonolide PKS. An illustrative example of such a hybrid PKS includes a DEBS PKS in which an AT specific for methylmalonyl CoA is replaced with the AT from the narbonolide PKS specific for malonyl CoA.
Those of skill in the art will recognize that all or part of either the first or second PKS in a hybrid PKS of the invention need not be isolated from a naturally occurring source. For example, only a small portion of an AT domain determines its specificity. See U.S. provisional patent application Serial No. 60/091,526, and Lau et al., infra, incorporated herein by reference. The state of the art in DNA synthesis allows the artisan to construct de novo DNA compounds of size sufficient to construct a useful portion of a PKS module or domain. Thus, the desired derivative coding sequences can be synthesized using standard solid phase synthesis methods such as those described by Jaye et al., 1984, J. Biol. Chem. 259: 6331, and instruments for automated synthesis are available commercially from, for example, Applied Biosystems, Inc. For purposes of the invention, such synthetic DNA compounds are deemed to be a portion of a PKS.
With this general background regarding hybrid PKSs of the invention, one can better appreciate the benefit provided by the DNA compounds of the invention that encode the individual domains, modules, and proteins that comprise the narbonolide PKS. As described above, the narbonolide PKS is comprised of a loading module, six extender modules composed of a KS, AT, ACP, and optional KR, DH, and ER domains, and a thioesterase domain. The DNA compounds of the invention that encode these domains individually or in combination are useful in the construction of the hybrid PKS encoding DNA compounds of the invention.
The recombinant DNA compounds of the invention that encode the loading module of the narbonolide PKS and the corresponding polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence that encodes the narbonolide PKS loading module is inserted into a DNA compound that comprises the coding sequence for a heterologous PKS. The resulting construct, in which the coding sequence for the loading module of the heterologous PKS is replaced by the coding sequence for the narbonolide PKS loading module, provides a novel PKS. Examples of heterologous PKS coding sequences include the 6-deoxyerythronolide B, rapamycin, FK506, FK520, rifamycin, and avermectin PKS coding sequences. In another embodiment, a DNA compound comprising a sequence that encodes the narbonolide PKS loading module is inserted into a DNA compound that comprises the coding sequence for the narbonolide PKS or a recombinant narbonolide PKS that produces a narbonolide derivative in a different location in the modular system.
In another embodiment, a portion of the loading module coding sequence is utilized in conjunction with a heterologous coding sequence. In this embodiment, the invention provides, for example, replacing the propionyl CoA specific AT with an acetyl CoA, butyryl CoA, or other CoA specific AT. In addition, the KSQ and/or ACP can be replaced by another inactivated KS and/or another ACP. Alternatively, the KSQ, AT, and ACP of the loading module can be replaced by an AT and ACP of a loading module such as that of DEBS. The resulting heterologous loading module coding sequence can be utilized in conjunction with a coding sequence for a PKS that synthesizes narbonolide, a narbonolide derivative, or another polyketide.
The recombinant DNA compounds of the invention that encode the first extender module of the narbonolide PKS and the corresponding polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence that encodes the narbonolide PKS first extender module is inserted into a DNA compound that comprises the coding sequence for a heterologous PKS. The resulting construct, in which the coding sequence for a module of the heterologous PKS is either replaced by that for the first extender module of the narbonolide PKS or the latter is merely added to coding sequences for modules of the heterologous PKS, provides a novel PKS coding sequence. In another embodiment, a DNA compound comprising a sequence that encodes the first extender module of the narbonolide PKS is inserted into a DNA compound that comprises coding sequences for the narbonolide PKS or a recombinant narbonolide PKS that produces a narbonolide derivative or into a different location in the modular system.
In another embodiment, a portion or all of the first extender module coding sequence is utilized in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or carboxyglycolyl CoA specific AT; deleting (which includes inactivating) the KR; inserting a DH or a DH and ER; and/or replacing the KR with another KR, a DH and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence for another module of the narbonolide PKS, from a gene for a PKS that produces a polyketide other than narbonolide, or from chemical synthesis. The resulting heterologous first extender module coding sequence can be utilized in conjunction with a coding sequence for a PKS that synthesizes narbonolide, a narbonolide derivative, or another polyketide.
In an illustrative embodiment of this aspect of the invention, the invention provides recombinant PKSs and recombinant DNA compounds and vectors that encode such PKSs in which the KS domain of the first extender module has been inactivated. Such constructs are especially useful when placed in translational reading frame with the remaining modules and domains of a narbonolide PKS or narbonolide derivative PKS. The utility of these constructs is that host cells expressing, or cell free extracts containing, the PKS encoded thereby can be fed or supplied with N-acetylcysteamine thioesters of novel precursor molecules to prepare narbonolide derivatives. See U.S. patent application Serial No. 60/117,384, filed 27 Jan. 1999, and PCT publication Nos. WO 99/03986 and WO 97/02358, each of which is incorporated herein by reference.
The recombinant DNA compounds of the invention that encode the second extender module of the narbonolide PKS and the corresponding polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence that encodes the narbonolide PKS second extender module is inserted into a DNA compound that comprises the coding sequence for a heterologous PKS. The resulting construct, in which the coding sequence for a module of the heterologous PKS is either replaced by that for the second extender module of the narbonolide PKS or the latter is merely added to coding sequences for the modules of the heterologous PKS, provides a novel PKS. In another embodiment, a DNA compound comprising a sequence that encodes the second extender module of the narbonolide PKS is inserted into a DNA compound that comprises the coding sequences for the narbonolide PKS or a recombinant narbonolide PKS that produces a narbonolide derivative.
In another embodiment, a portion or all of the second extender module coding sequence is utilized in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the malonyl CoA specific AT with a methylmalonyl CoA, ethylmalonyl CoA, or carboxyglycolyl CoA specific AT; deleting (or inactivating) the KR, the DH, or both the DH and KR; replacing the KR or the KR and DH with a KR, a KR and a DH, or a KR, DH, and ER; and/or inserting an ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence for another module of the narbonolide PKS, from a coding sequence for a PKS that produces a polyketide other than narbonolide, or from chemical synthesis. The resulting heterologous second extender module coding sequence can be utilized in conjunction with a coding sequence from a PKS that synthesizes narbonolide, a narbonolide derivative, or another polyketide.
The recombinant DNA compounds of the invention that encode the third extender module of the narbonolide PKS and the corresponding polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence that encodes the narbonolide PKS third extender module is inserted into a DNA compound that comprises the coding sequence for a heterologous PKS. The resulting construct, in which the coding sequence for a module of the heterologous PKS is either replaced by that for the third extender module of the narbonolide PKS or the latter is merely added to coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes the third extender module of the narbonolide PKS is inserted into a DNA compound that comprises coding sequences for the narbonolide PKS or a recombinant narbonolide PKS that produces a narbonolide derivative.
In another embodiment, a portion or all of the third extender module coding sequence is utilized in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or carboxyglycolyl CoA specific AT; deleting the inactive KR; and/or inserting a KR, or a KR and DH, or a KR, DH, and ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence for another module of the narbonolide PKS, from a gene for a PKS that produces a polyketide other than narbonolide, or from chemical synthesis. The resulting heterologous third extender module coding sequence can be utilized in conjunction with a coding sequence for a PKS that synthesizes narbonolide, a narbonolide derivative, or another polyketide.
The recombinant DNA compounds of the invention that encode the fourth extender module of the narbonolide PKS and the corresponding polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence that encodes the narbonolide PKS fourth extender module is inserted into a DNA compound that comprises the coding sequence for a heterologous PKS. The resulting construct, in which the coding sequence for a module of the heterologous PKS is either replaced by that for the fourth extender module of the narbonolide PKS or the latter is merely added to coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes the fourth extender module of the narbonolide PKS is inserted into a DNA compound that comprises coding sequences for the narbonolide PKS or a recombinant narbonolide PKS that produces a narbonolide derivative.
In another embodiment, a portion of the fourth extender module coding sequence is utilized in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or carboxyglycolyl CoA specific AT; deleting any one, two, or all three of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER, DH, and KR with either a KR, a DH and KR, or a KR, DH, and ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence for another module of the narbonolide PKS, from a coding sequence for a PKS that produces a polyketide other than narbonolide, or from chemical synthesis. The resulting heterologous fourth extender module coding sequence can be utilized in conjunction with a coding sequence for a PKS that synthesizes narbonolide, a narbonolide derivative, or another polyketide.
The recombinant DNA compounds of the invention that encode the fifth extender module of the narbonolide PKS and the corresponding polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence that encodes the narbonolide PKS fifth extender module is inserted into a DNA compound that comprises the coding sequence for a heterologous PKS. The resulting construct, in which the coding sequence for a module of the heterologous PKS is either replaced by that for the fifth extender module of the narbonolide PKS or the latter is merely added to coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes the fifth extender module of the narbonolide PKS is inserted into a DNA compound that comprises the coding sequence for the narbonolide PKS or a recombinant narbonolide PKS that produces a narbonolide derivative.
In another embodiment, a portion or all of the fifth extender module coding sequence is utilized in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or carboxyglycolyl CoA specific AT; deleting (or inactivating) the KR; inserting a DH or a DH and ER; and/or replacing the KR with another KR, a DH and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence for another module of the narbonolide PKS, from a coding sequence for a PKS that produces a polyketide other than narbonolide, or from chemical synthesis. The resulting heterologous fifth extender module coding sequence can be utilized in conjunction with a coding sequence for a PKS that synthesizes narbonolide, a narbonolide derivative, or another polyketide.
The recombinant DNA compounds of the invention that encode the sixth extender module of the narbonolide PKS and the corresponding polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence that encodes the narbonolide PKS sixth extender module is inserted into a DNA compound that comprises the coding sequence for a heterologous PKS. The resulting construct, in which the coding sequence for a module of the heterologous PKS is either replaced by that for the sixth extender module of the narbonolide PKS or the latter is merely added to coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes the sixth extender module of the narbonolide PKS is inserted into a DNA compound that comprises the coding sequences for the narbonolide PKS or a recombinant narbonolide PKS that produces a narbonolide derivative.
In another embodiment, a portion or all of the sixth extender module coding sequence is utilized in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or carboxyglycolyl CoA specific AT; and/or inserting a KR, a KR and DH, or a KR, DH, and ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence for another module of the narbonolide PKS, from a coding sequence for a PKS that produces a polyketide other than narbonolide, or from chemical synthesis. The resulting heterologous sixth extender module coding sequence can be utilized in conjunction with a coding sequence for a PKS that synthesizes narbonolide, a narbonolide derivative, or another polyketide.
The sixth extender module of the narbonolide PKS is followed by a thioesterase domain. This domain is important in the cyclization of the polyketide and its cleavage from the PKS. The present invention provides recombinant DNA compounds that encode hybrid PKS enzymes in which the narbonolide PKS is fused to a heterologous thioesterase or a heterologous PKS is fused to the narbonolide synthase thioesterase. Thus, for example, a thioesterase domain coding sequence from another PKS gene can be inserted at the end of the sixth extender module coding sequence in recombinant DNA compounds of the invention.
Recombinant DNA compounds encoding this thioesterase domain are therefore useful in constructing DNA compounds that encode the narbonolide PKS, a PKS that produces a narbonolide derivative, and a PKS that produces a polyketide other than narbonolide or a narbonolide derivative.
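The domain organization described in the preceding paragraphs can be summarized in a small data structure. The following Python sketch is illustrative only: the class name `Module`, the tuple `NARBONOLIDE_PKS`, and the helper `swap_at` are hypothetical names introduced here, and the model records only the AT substrate and the reductive-cycle domains that the text attributes to each module. It then performs, in silico, the AT replacement in extender module 3 mentioned earlier as an example of a hybrid PKS.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Module:
    """Simplified view of one PKS module (hypothetical model, not a real API)."""
    name: str
    at_substrate: str                  # CoA substrate selected by the AT domain
    reductive: Tuple[str, ...]         # reductive-cycle domains present (KR, DH, ER)
    inactive: Tuple[str, ...] = ()     # domains present but inactive


# Domain organization as stated in the text: the loading module carries an
# inactivated KS, a propionyl CoA specific AT, and an ACP; the extender modules
# each carry KS, AT, and ACP plus the reductive domains listed below.
NARBONOLIDE_PKS = (
    Module("loading", "propionyl", ()),
    Module("extender 1", "methylmalonyl", ("KR",)),
    Module("extender 2", "malonyl", ("KR", "DH")),
    Module("extender 3", "methylmalonyl", ("KR",), inactive=("KR",)),
    Module("extender 4", "methylmalonyl", ("KR", "DH", "ER")),
    Module("extender 5", "methylmalonyl", ("KR",)),
    Module("extender 6", "methylmalonyl", ()),
)


def swap_at(pks, index, new_substrate):
    """Return a new PKS description with the AT specificity of one module replaced."""
    modules = list(pks)
    modules[index] = replace(modules[index], at_substrate=new_substrate)
    return tuple(modules)


# Hybrid in which the AT of extender module 3 is replaced by a malonyl CoA specific AT.
hybrid = swap_at(NARBONOLIDE_PKS, 3, "malonyl")
for module in hybrid:
    print(module.name, module.at_substrate, module.reductive)
```

Because the description is immutable, each hybrid is a new tuple and the native organization remains available for further variants.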
The following Table lists references describing illustrative PKS genes and corresponding enzymes that can be utilized in the construction of the recombinant hybrid PKSs and the corresponding DNA compounds that encode them of the invention. Also presented are various references describing tailoring enzymes and corresponding genes that can be employed in accordance with the methods of the invention.
Avermectin U.S. Pat. No. 5,252,474 to Merck.
MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied Molecular Genetics, Baltz, Hegeman, Skatrud, eds. (ASM), pp. 245-256, A Comparison of the Genes Encoding the Polyketide Synthases for Avermectin, Erythromycin, and Nemadectin.
MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the Streptomyces avermitilis genes encoding the avermectin polyketide synthase.
Candicidin (FR008) Hu et al., 1994, Mol. Microbiol. 14: 163-172.
Epothilone U.S. patent application Serial No. 60/130,560, filed 22 Apr. 1999, and Serial No. 60/122,620, filed 3 Mar. 1999.
Erythromycin PCT Pub. No. WO 93/13663 to Abbott.
US Pat. No. 5,824,513 to Abbott.
Donadio et al., 1991, Science 252:675-9.
Cortes et al., 8 Nov. 1990, Nature 348:176-8, An unusually large multifunctional polypeptide in the erythromycin producing polyketide synthase of Saccharopolyspora erythraea.
Glycosylation Enzymes PCT Pat. App. Pub. No. WO 97/23630 to Abbott.
FK506 Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone ring of the immunosuppressant FK506, Eur. J. Biochem. 256: 528-534.
Motamedi et al., 1997, Structural organization of a multifunctional polyketide synthase involved in the biosynthesis of the macrolide immunosuppressant FK506, Eur. J. Biochem. 244: 74-80.
Methyltransferase US 5,264,355, issued 23 Nov. 1993, Methylating enzyme from Streptomyces MA6858. 31-O-desmethyl-FK506 methyltransferase.
Motamedi et al., 1996, Characterization of methyltransferase and hydroxylase genes involved in the biosynthesis of the immunosuppressants FK506 and FK520, J. Bacteriol. 178: 5243-5248.
FK520 U.S. patent application Serial No. 60/123,800, filed 11 Mar. 1999.
Immunomycin Nielsen et al., 1991, Biochem. 30:5789-96.
Lovastatin U.S. Pat. No. 5,744,350 to Merck.
Nemadectin MacNeil et al., 1993, supra.
Niddamycin Kakavas et al., 1997, Identification and characterization of the niddamycin polyketide synthase genes from Streptomyces caelestis, J. Bacteriol. 179: 7515-7522.
Oleandomycin Swan et al., 1994, Characterization of a Streptomyces antibioticus gene encoding a type I polyketide synthase which has an unusual coding sequence, Mol. Gen. Genet. 242: 358-362.
Olano et al., 1998, Analysis of a Streptomyces antibioticus chromosomal region involved in oleandomycin biosynthesis, which encodes two glycosyltransferases responsible for glycosylation of the macrolactone ring, Mol. Gen. Genet. 259(3): 299-308.
U.S. patent application Serial No. 60/120,254, filed 16 Feb. 1999, and Serial No. 60/106,100, filed 29 Oct. 1998.
Platenolide EP Pat. App. Pub. No. 791,656 to Lilly.
Pradimicin PCT Pat. Pub. No. WO 98/11230 to Bristol-Myers Squibb.
Rapamycin Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the polyketide rapamycin, Proc. Natl. Acad. Sci. USA 92:7839-7843.
Aparicio et al., 1996, Organization of the biosynthetic gene cluster for rapamycin in Streptomyces hygroscopicus: analysis of the enzymatic domains in the modular polyketide synthase, Gene 169: 9-16.
Rifamycin August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic rifamycin: deductions from the molecular analysis of the rif biosynthetic gene cluster of Amycolatopsis mediterranei S699, Chemistry & Biology, 69-79.
Soraphen U.S. Pat. No. 5,716,849 to Novartis.
Schupp et al., 1995, J. Bacteriology 177: 3673-3679. A Sorangium cellulosum (Myxobacterium) Gene Cluster for the Biosynthesis of the Macrolide Antibiotic Soraphen A: Cloning, Characterization, and Homology to Polyketide Synthase Genes from Actinomycetes.
Spiramycin U.S. Pat. No. 5,098,837 to Lilly.
Activator Gene U.S. Pat. No. 5,514,544 to Lilly.
Tylosin EP Pub. No. 791,655 to Lilly.
Kuhstoss et al., 1996, Gene 183:231-6, Production of a novel polyketide through the construction of a hybrid polyketide synthase.
U.S. Pat. No. 5,876,991 to Lilly.
Tailoring enzymes Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355. Analysis of five tylosin biosynthetic genes from the tylBA region of the Streptomyces fradiae genome.
As the above Table illustrates, there is a wide variety of PKS genes that serve as readily available sources of DNA and sequence information for use in constructing the hybrid PKS-encoding DNA compounds of the invention. Methods for constructing hybrid PKS-encoding DNA compounds are described without reference to the narbonolide PKS in U.S. Patent Nos. 5,672,491 and 5,712,146 and PCT publication No. WO 98/49315, each of which is incorporated herein by reference.
In constructing hybrid PKSs of the invention, certain general methods may be helpful. For example, it is often beneficial to retain the framework of the module to be altered to make the hybrid PKS. Thus, if one desires to add DH and ER functionalities to a module, it is often preferred to replace the KR domain of the original module with a KR, DH, and ER domain-containing segment from another module, instead of merely inserting DH and ER domains.
One can alter the stereochemical specificity of a module by replacement of the KS domain with a KS domain from a module that specifies a different stereochemistry. See Lau et al., 1999, "Dissecting the role of acyltransferase domains of modular polyketide synthases in the choice and stereochemical fate of extender units", Biochemistry 38(5):1643-1651, incorporated herein by reference. One can alter the specificity of an AT domain by changing only a small segment of the domain. See Lau et al., supra. One can also take advantage of known linker regions in PKS proteins to link modules from two different PKSs to create a hybrid PKS. See Gokhale et al., 16 Apr. 1999, "Dissecting and Exploiting Intermodular Communication in Polyketide Synthases", Science 284: 482-485, incorporated herein by reference.
The hybrid PKS-encoding DNA compounds of the invention can be and often are hybrids of more than two PKS genes. Even where only two genes are used, there are often two or more modules in the hybrid gene in which all or part of the module is derived from a second (or third) PKS gene. Thus, as one illustrative example, the invention provides a hybrid narbonolide PKS that contains the naturally occurring loading module and thioesterase domain as well as extender modules one, two, four, and six of the narbonolide PKS and further contains hybrid or heterologous extender modules three and five. Hybrid or heterologous extender modules three and five contain AT domains specific for malonyl CoA and derived from, for example, the rapamycin PKS genes.
To construct a hybrid PKS or narbonolide derivative PKS of the invention, one can employ a technique, described in PCT Pub. No. WO 98/27203, which is incorporated herein by reference, in which the large PKS gene cluster is divided into two or more, typically three, segments, and each segment is placed on a separate expression vector. In this manner, each of the segments of the gene can be altered, and various altered segments can be combined in a single host cell to provide a recombinant PKS gene of the invention. This technique makes more efficient the construction of large libraries of recombinant PKS genes, vectors for expressing those genes, and host cells comprising those vectors.
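The efficiency gain of the multi-vector approach comes from simple combinatorics: if each segment is altered independently, the number of distinct recombinant PKS genes is the product of the number of variants of each segment. The following Python sketch illustrates this under stated assumptions; the three segments and the variant labels ("variant A" and so on) are placeholders invented for illustration and do not correspond to any constructs named in the text.

```python
from itertools import product

# Hypothetical variants of each of three gene segments, each segment carried
# on its own expression vector. The labels are placeholders only.
segment_variants = {
    "segment 1": ["wild-type", "variant A"],
    "segment 2": ["wild-type", "variant B", "variant C"],
    "segment 3": ["wild-type", "variant D"],
}


def enumerate_library(variants):
    """List every combination of one variant per segment."""
    names = list(variants)
    combos = product(*(variants[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


library = enumerate_library(segment_variants)
print(len(library))  # 2 * 3 * 2 = 12 combinations from only 7 vectors
```

In this toy case seven vectors yield twelve host cell genotypes; the ratio grows quickly as more variants of each segment are prepared.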
Included in the definition of "hybrid" are PKSs in which alterations (including deletions, insertions, and substitutions) are made directly using the narbonolide PKS as a substrate.
The invention also provides libraries of PKS genes, PKS proteins, and ultimately, of polyketides, that are constructed by generating modifications in the narbonolide PKS so that the protein complexes produced have altered activities in one or more respects and thus produce polyketides other than the natural product of the PKS. Novel polyketides may thus be prepared, or polyketides in general prepared more readily, using this method. By providing a large number of different genes or gene clusters derived from a naturally occurring PKS gene cluster, each of which has been modified in a different way from the native cluster, an effectively combinatorial library of polyketides can be produced as a result of the multiple variations in these activities. As will be further described below, the metes and bounds of this embodiment of the invention can be described on both the protein level and the encoding nucleotide sequence level.
As described above, a modular PKS "derived from" the narbonolide or other naturally occurring PKS is a subset of the "hybrid" PKS family and includes a modular PKS (or its corresponding encoding gene(s)) that retains the scaffolding of the utilized portion of the naturally occurring gene. Not all modules need be included in the constructs. On the constant scaffold, at least one enzymatic activity is mutated, deleted, replaced, or inserted so as to alter the activity of the resulting PKS relative to the original PKS. Alteration results when these activities are deleted or are replaced by a different version of the activity, or simply mutated in such a way that a polyketide other than the natural product results from these collective activities. This occurs because there has been a resulting alteration of the starter unit and/or extender unit, and/or stereochemistry, and/or chain length or cyclization, and/or reductive or dehydration cycle outcome at a corresponding position in the product polyketide. Where a deleted activity is replaced, the origin of the replacement activity may come from a corresponding activity in a different naturally occurring PKS or from a different region of the narbonolide PKS. Any or all of the narbonolide PKS genes may be included in the derivative or portions of any of these may be included, but the scaffolding of the PKS protein is retained in whatever derivative is constructed. The derivative preferably contains a thioesterase activity from the narbonolide or another PKS.
In summary, a PKS "derived from" the narbonolide PKS includes a PKS that contains the scaffolding of all or a portion of the narbonolide PKS. The derived PKS also contains at least two extender modules that are functional, preferably three extender modules, and more preferably four or more extender modules, and most preferably six extender modules. The derived PKS also contains mutations, deletions, insertions, or replacements of one or more of the activities of the functional modules of the narbonolide PKS so that the nature of the resulting polyketide is altered. This definition applies both at the protein and DNA sequence levels. Particular preferred embodiments include those wherein a KS, AT, KR, DH, or ER has been deleted or replaced by a version of the activity from a different PKS or from another location within the same PKS. Also preferred are derivatives where at least one noncondensation cycle enzymatic activity (KR, DH, or ER) has been deleted or added or wherein any of these activities has been mutated so as to change the structure of the polyketide synthesized by the PKS.
Conversely, also included within the definition of a PKS derived from the narbonolide PKS are functional PKS modules or their encoding genes wherein at least one portion, preferably two portions, of the narbonolide PKS activities have been inserted. Exemplary is the use of the narbonolide AT for extender module 2, which accepts a malonyl CoA extender unit rather than methylmalonyl CoA, to replace a methylmalonyl specific AT in a PKS. Other examples include insertion of portions of non-condensation cycle enzymatic activities or other regions of narbonolide synthase activity into a heterologous PKS. Again, the "derived from" definition applies to the PKS at both the genetic and protein levels.
Thus, there are at least five degrees of freedom for constructing a hybrid PKS in terms of the polyketide that will be produced. First, the polyketide chain length is determined by the number of modules in the PKS. Second, the nature of the carbon skeleton of the polyketide is determined by the specificities of the acyl transferases that determine the nature of the extender units at each position: malonyl, methylmalonyl, ethylmalonyl, or other substituted malonyl. Third, the loading module specificity also has an effect on the resulting carbon skeleton of the polyketide. The loading module may use a different starter unit, such as acetyl, butyryl, and the like. As noted above and in the examples below, another method for varying loading module specificity involves inactivating the KS activity in extender module 1 (KS1) and providing alternative substrates, called diketides, that are chemically synthesized analogs of extender module 1 diketide products, for extender module 2. This approach was illustrated in PCT publication Nos. WO 97/02358 and WO 99/03986, incorporated herein by reference, wherein the KS1 activity was inactivated through mutation.
Fourth, the oxidation state at various positions of the polyketide will be determined by the dehydratase and reductase portions of the modules. This will determine the presence and location of ketone and alcohol moieties and C-C double bonds or C-C single bonds in the polyketide. Finally, the stereochemistry of the resulting polyketide is a function of three aspects of the synthase. The first aspect is related to the AT/KS specificity associated with substituted malonyls as extender units, which affects stereochemistry only when the reductive cycle is missing or when it contains only a ketoreductase, as the dehydratase would abolish chirality. Second, the specificity of the ketoreductase may determine the chirality of any beta-OH. Finally, the enoylreductase specificity for substituted malonyls as extender units may influence the result when there is a complete KR/DH/ER available.
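The fourth degree of freedom can be made concrete with a small lookup. The Python sketch below maps the set of active reductive-cycle domains in a module to the functional group expected at the corresponding position of the polyketide. It assumes the usual ordering of the reductive cycle (the DH acts only on the product of a KR, and the ER only on the product of a DH); the function name `beta_carbon_outcome` is hypothetical, and stereochemistry is deliberately not modeled.

```python
def beta_carbon_outcome(active_domains):
    """Predict the beta-carbon functionality from the active reductive domains.

    Assumes the canonical order KR -> DH -> ER, so a DH without an active KR,
    or an ER without an active DH, has no effect in this simplified model.
    """
    domains = set(active_domains)
    if "KR" not in domains:
        return "ketone"            # no reduction of the beta-keto group
    if "DH" not in domains:
        return "alcohol"           # ketoreduction only
    if "ER" not in domains:
        return "C-C double bond"   # ketoreduction then dehydration
    return "C-C single bond"       # complete KR/DH/ER cycle


for domains in ([], ["KR"], ["KR", "DH"], ["KR", "DH", "ER"]):
    print(domains, "->", beta_carbon_outcome(domains))
```

Under this model, deleting or inactivating a KR converts an alcohol position to a ketone, and replacing a KR with a KR, DH, and ER segment converts it to a fully reduced position.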
Thus, the modular PKS systems, and in particular, the narbonolide PKS system, permit a wide range of polyketides to be synthesized. As compared to the aromatic PKS systems, a wider range of starter units including aliphatic monomers (acetyl, propionyl, butyryl, isovaleryl, etc.), aromatics (aminohydroxybenzoyl), alicyclics (cyclohexanoyl), and heterocyclics (thiazolyl) are found in various macrocyclic polyketides. Recent studies have shown that modular PKSs have relaxed specificity for their starter units (Kao et al., 1994, Science, supra). Modular PKSs also exhibit considerable variety with regard to the choice of extender units in each condensation cycle. The degree of beta-ketoreduction following a condensation reaction has also been shown to be altered by genetic manipulation (Donadio et al., 1991, Science, supra; Donadio et al., 1993, Proc. Natl. Acad. Sci. USA 90: 7119-7123).
Likewise, the size of the polyketide product can be varied by designing mutants with the appropriate number of modules (Kao et al., 1994, J. Am. Chem. Soc. 116:11612-11613).
Lastly, these enzymes are particularly well known for generating an impressive range of asymmetric centers in their products in a highly controlled manner. The polyketides and antibiotics produced by the methods of the invention are typically single stereoisomeric forms. Although the compounds of the invention can occur as mixtures of stereoisomers, it may be beneficial in some instances to generate individual stereoisomers. Thus, the combinatorial potential within modular PKS pathways based on any naturally occurring modular PKS scaffold, such as the narbonolide PKS, is virtually unlimited.
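To give a rough sense of scale for this combinatorial potential, the following sketch counts the nominal designs obtained by varying only two of the degrees of freedom per extender module. It is illustrative only: the two extender units and four beta-carbon states chosen here, and the function names, are assumptions made for the example, and many nominal combinations would not be expected to yield a functional PKS.

```python
from itertools import product

# Assumed, deliberately small choice sets for illustration only.
AT_CHOICES = ("malonyl", "methylmalonyl")
BETA_STATES = ("ketone", "alcohol", "double bond", "single bond")


def count_designs(n_modules, at_choices=AT_CHOICES, beta_states=BETA_STATES):
    """Nominal number of designs when each module is varied independently."""
    return (len(at_choices) * len(beta_states)) ** n_modules


def enumerate_designs(n_modules, at_choices=AT_CHOICES, beta_states=BETA_STATES):
    """Lazily yield every design as a tuple of (extender, beta state) pairs."""
    per_module = list(product(at_choices, beta_states))
    return product(per_module, repeat=n_modules)
```

Under these assumptions a PKS with six extender modules has `count_designs(6)` nominal variants, that is, 8 to the sixth power, before starter unit, stereochemistry, or chain length are varied.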
The combinatorial potential is increased even further when one considers that mutations in DNA encoding a polypeptide can be used to introduce, alter, or delete an activity in the encoded polypeptide. Mutations can be made to the native sequences using conventional techniques. The substrates for mutation can be an entire cluster of genes or only one or two of them; the substrate for mutation may also be portions of one or more of these genes. Techniques for mutation include preparing synthetic oligonucleotides including the mutations and inserting the mutated sequence into the gene encoding a PKS subunit using restriction endonuclease digestion. See, Kunkel, 1985, Proc. Natl. Acad. Sci. USA 82: 448; Geisselsoder et al., 1987, BioTechniques 5:786. Alternatively, the mutations can be effected using a mismatched primer (generally 10-20 nucleotides in length) that hybridizes to the native nucleotide sequence, at a temperature below the melting temperature of the mismatched duplex. The primer can be made specific by keeping primer length and base composition within relatively narrow limits and by keeping the mutant base centrally located.
See Zoller and Smith, 1983, Methods Enzymol. 100:468. Primer extension is effected using DNA polymerase, the product cloned, and clones containing the mutated DNA, derived by segregation of the primer extended strand, selected. Identification can be accomplished using the mutant primer as a hybridization probe. The technique is also applicable for generating multiple point mutations. See, Dalbie-McFarland et al., 1982, Proc. Natl. Acad. Sci. USA 79: 6409. PCR mutagenesis can also be used to effect the desired mutations.
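The design of a mismatched primer with a centrally located mutant base can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the primer is written in the same sense as the supplied sequence (in practice the primer is complementary to the strand to which it anneals), and the melting temperature is estimated with the Wallace rule, which is only a rough approximation for short oligonucleotides and does not account for the mismatch.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C) (Wallace rule)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def mismatched_primer(template: str, pos: int, new_base: str, length: int = 17) -> str:
    """Return a primer of `length` nt centred on `pos` (0-based) carrying `new_base`.

    An odd length is required so that the mutant base sits exactly in the
    centre, with equal numbers of matched bases on either side.
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so the mutant base is central")
    half = length // 2
    if pos - half < 0 or pos + half >= len(template):
        raise ValueError("mutation site too close to the end of the template")
    window = template[pos - half : pos + half + 1].upper()
    return window[:half] + new_base.upper() + window[half + 1 :]
```

For example, `mismatched_primer(seq, 12, "A")` returns a 17-mer whose ninth base is the mutant base; `wallace_tm` of the matched flanks gives a starting point for choosing an annealing temperature below the melting temperature of the mismatched duplex.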
Random mutagenesis of selected portions of the nucleotide sequences encoding enzymatic activities can also be accomplished by several different techniques known in the art, for example, by inserting an oligonucleotide linker randomly into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating incorrect nucleotides during in vitro DNA synthesis, by error-prone PCR mutagenesis, by preparing synthetic mutants, or by damaging plasmid DNA in vitro with chemicals. Chemical mutagens include, for example, sodium bisulfite, nitrous acid, nitrosoguanidine, hydroxylamine, agents which damage or remove bases thereby preventing normal base-pairing such as hydrazine or formic acid, analogues of nucleotide precursors such as 5-bromouracil, 2-aminopurine, or acridine intercalating agents such as proflavine, acriflavine, quinacrine, and the like. Generally, plasmid DNA or DNA fragments are treated with chemicals, transformed into E. coli and propagated as a pool or library of mutant plasmids.
In constructing a hybrid PKS of the invention, regions encoding enzymatic activity, regions encoding corresponding activities from different PKS synthases or from different locations in the same PKS, can be recovered, for example, using PCR techniques with appropriate primers. By "corresponding" activity encoding regions is meant those regions encoding the same general type of activity. For example, a KR activity encoded at one location of a gene cluster "corresponds" to a KR encoding activity in another location in the gene cluster or in a different gene cluster. Similarly, a complete reductase cycle could be considered corresponding. For example, KR/DH/ER corresponds to KR alone.
If replacement of a particular target region in a host PKS is to be made, this replacement can be conducted in vitro using suitable restriction enzymes. The replacement can also be effected in vivo using recombinant techniques involving homologous sequences framing the replacement gene in a donor plasmid and a receptor region in a recipient plasmid.
Such systems, advantageously involving plasmids of differing temperature sensitivities are described, for example, in PCT publication No. WO 96/40968, incorporated herein by reference. The vectors used to perform the various operations to replace the enzymatic activity in the host PKS genes or to support mutations in these regions of the host PKS genes can be chosen to contain control sequences operably linked to the resulting coding sequences in a manner such that expression of the coding sequences can be effected in an appropriate host.
However, simple cloning vectors may be used as well. If the cloning vectors employed to obtain PKS genes encoding derived PKS lack control sequences for expression operably linked to the encoding nucleotide sequences, the nucleotide sequences are inserted into appropriate expression vectors. This need not be done individually, but a pool of isolated encoding nucleotide sequences can be inserted into expression vectors, the resulting vectors transformed or transfected into host cells, and the resulting cells plated out into individual colonies.
The various PKS nucleotide sequences can be cloned into one or more recombinant vectors as individual cassettes, with separate control elements, or under the control of a single promoter. The PKS subunit encoding regions can include flanking restriction sites to allow for the easy deletion and insertion of other PKS subunit encoding sequences so that hybrid PKSs can be generated. The design of such unique restriction sites is known to those of skill in the art and can be accomplished using the techniques described above, such as site-directed mutagenesis and PCR.
The expression vectors containing nucleotide sequences encoding a variety of PKS enzymes for the production of different polyketides are then transformed into the appropriate host cells to construct the library. In one straightforward approach, a mixture of such vectors is transformed into the selected host cells and the resulting cells plated into individual colonies and selected to identify successful transformants. Each individual colony has the ability to produce a particular PKS synthase and ultimately a particular polyketide. Typically, there will be duplications in some, most, or all of the colonies; the subset of the transformed colonies that contains a different PKS in each member colony can be considered the library.
Alternatively, the expression vectors can be used individually to transform hosts, which transformed hosts are then assembled into a library. A variety of strategies are available to obtain a multiplicity of colonies each containing a PKS gene cluster derived from the naturally occurring host gene cluster so that each colony in the library produces a different PKS and ultimately a different polyketide. The number of different polyketides that are produced by the library is typically at least four, more typically at least ten, and preferably at least 20, and more preferably at least 50, reflecting similar numbers of different altered PKS gene clusters and PKS gene products. The number of members in the library is arbitrarily chosen; however, the degrees of freedom outlined above with respect to the variation of starter, extender units, stereochemistry, oxidation state, and chain length is quite large.
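The notion that the library is the subset of transformed colonies containing a different PKS in each member can be sketched as a simple de-duplication step. The sketch below is illustrative only: the function names and the representation of a colony as a (colony identifier, PKS genotype) pair are assumptions, and the size tiers simply restate the numbers given above.

```python
from collections import defaultdict


def distinct_library(colonies):
    """Keep one representative colony per distinct PKS genotype.

    `colonies` is an iterable of (colony_id, genotype) pairs; the first
    colony seen for each genotype is kept as its representative.
    """
    by_genotype = defaultdict(list)
    for colony_id, genotype in colonies:
        by_genotype[genotype].append(colony_id)
    return {genotype: ids[0] for genotype, ids in by_genotype.items()}


def size_tier(n_members):
    """Classify a library size against the thresholds stated in the text."""
    tiers = (
        (50, "more preferred (at least 50)"),
        (20, "preferred (at least 20)"),
        (10, "more typical (at least ten)"),
        (4, "typical (at least four)"),
    )
    for threshold, label in tiers:
        if n_members >= threshold:
            return label
    return "below the typical minimum of four"
```

For a plate of picked colonies, `size_tier(len(distinct_library(colonies)))` indicates where the resulting library falls relative to the stated sizes.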
Methods for introducing the recombinant vectors of the invention into suitable hosts are known to those of skill in the art and typically include the use of CaCl2 or agents such as other divalent cations, lipofection, DMSO, protoplast transformation, infection, transfection, and electroporation. The polyketide producing colonies can be identified and isolated using known techniques and the produced polyketides further characterized. The polyketides produced by these colonies can be used collectively in a panel to represent a library or may be assessed individually for activity.
The libraries of the invention can thus be considered at four levels: a multiplicity of colonies each with a different PKS encoding sequence; colonies that contain the proteins that are members of the PKS library produced by the coding sequences; the polyketides produced; and antibiotics or compounds with other desired activities derived from the polyketides. Of course, combination libraries can also be constructed wherein members of a library derived, for example, from the narbonolide PKS can be considered as a part of the same library as those derived from, for example, the rapamycin PKS or DEBS.
Colonies in the library are induced to produce the relevant synthases and thus to produce the relevant polyketides to obtain a library of polyketides. The polyketides secreted into the media can be screened for binding to desired targets, such as receptors, signaling proteins, and the like. The supernatants per se can be used for screening, or partial or complete purification of the polyketides can first be effected. Typically, such screening methods involve detecting the binding of each member of the library to receptor or other target ligand. Binding can be detected either directly or through a competition assay. Means to screen such libraries for binding are well known in the art. Alternatively, individual polyketide members of the library can be tested against a desired target. In this event, screens wherein the biological response of the target is measured can more readily be included.
Antibiotic activity can be verified using typical screening assays such as those set forth in Lehrer et al., 1991, J. Immunol. Meth. 137:167-173, incorporated herein by reference, and in the examples below.
The invention provides methods for the preparation of a large number of polyketides.
These polyketides are useful intermediates in formation of compounds with antibiotic or other activity through hydroxylation and glycosylation reactions as described above. In general, the polyketide products of the PKS must be further modified, typically by hydroxylation and glycosylation, to exhibit antibiotic activity. Hydroxylation results in the novel polyketides of the invention that contain hydroxyl groups at C6, which can be accomplished using the hydroxylase encoded by the eryF gene, and/or C12, which can be accomplished using the hydroxylase encoded by the picK or eryK gene. The presence of hydroxyl groups at these positions can enhance the antibiotic activity of the resulting compound relative to its unhydroxylated counterpart.
Glycosylation is important in conferring antibiotic activity to a polyketide as well.
Methods for glycosylating the polyketides are generally known in the art; the glycosylation may be effected intracellularly by providing the appropriate glycosylation enzymes or may be effected in vitro using chemical synthetic means as described herein and in PCT publication No. WO 98/49315, incorporated herein by reference. Preferably, glycosylation with desosamine is effected in accordance with the methods of the invention in recombinant host cells provided by the invention. In general, the approaches to effecting glycosylation mirror those described above with respect to hydroxylation. The purified enzymes, isolated from native sources or recombinantly produced may be used in vitro. Alternatively and as noted, glycosylation may be effected intracellularly using endogenous or recombinantly produced intracellular glycosylases. In addition, synthetic chemical methods may be employed.
The antibiotic modular polyketides may contain any of a number of different sugars, although D-desosamine, or a close analog thereof, is most common. Erythromycin, picromycin, narbomycin and methymycin contain desosamine. Erythromycin also contains L-cladinose (3-O-methyl mycarose). Tylosin contains mycaminose (4-hydroxy desosamine), mycarose and 6-deoxy-D-allose. 2-acetyl-1-bromodesosamine has been used as a donor to glycosylate polyketides by Masamune et al., 1975, J. Am. Chem. Soc. 97: 3512-3513. Other, apparently more stable donors include glycosyl fluorides, thioglycosides, and trichloroacetimidates; see Woodward et al., 1981, J. Am. Chem. Soc. 103: 3215; Martin et al., 1997, J. Am. Chem. Soc. 119: 3193; Toshima et al., 1995, J. Am. Chem. Soc. 117: 3717; Matsumoto et al., 1988, Tetrahedron Lett. 29: 3575. Glycosylation can also be effected using the polyketide aglycones as starting materials and using Saccharopolyspora erythraea or Streptomyces venezuelae to make the conversion, preferably using mutants unable to synthesize macrolides.
To provide an illustrative hybrid PKS of the invention as well as an expression vector for that hybrid PKS and host cells comprising the vector and producing the hybrid polyketide, a portion of the narbonolide PKS gene was fused to the DEBS genes. This construct also allowed the examination of whether the TE domain of the narbonolide PKS (pikTE) could promote formation of 12-membered lactones in the context of a different PKS. A construct was generated, plasmid pKOS039-18, in which the pikTE ORF was fused with the DEBS genes in place of the DEBS TE ORF (see Figure). To allow the TE to distinguish between substrates most closely resembling those generated by the narbonolide PKS, the fusion junction was chosen between the AT and ACP to eliminate ketoreductase activity in DEBS extender module 6 (KR 6). This results in a hybrid PKS that presents the TE with a beta-ketone heptaketide intermediate and a beta-(S)-hydroxy hexaketide intermediate to cyclize, as in narbonolide and 10-deoxymethynolide biosynthesis.
Analysis of this construct indicated the production of the 14-membered ketolide 3,6-dideoxy-3-oxo-erythronolide B (Figure 5, compound 6). Extracts were analyzed by LC/MS.
The identity of compound 6 was verified by comparison to a previously authenticated sample (see PCT publication No. WO 98/49315, incorporated herein by reference). The predicted 12-membered macrolactone, (8R,9S)-8,9-dihydro-8-methyl-9-hydroxy-10-deoxymethynolide (see Kao et al. J. Am. Chem. Soc. (1995) 117:9105-9106 incorporated herein by reference) was not detected. Because the 12-membered intermediate can be formed by other recombinant PKS enzymes, see Kao et al., 1995, supra, the PIC TE domain appears incapable of forcing premature cyclization of the hexaketide intermediate generated by DEBS. This result, along with others reported herein, suggests that protein interactions between the narbonolide PKS modules play a role in formation of the 12- and 14-membered macrolides.
The above example also illustrates how engineered PKSs can be improved for production of novel compounds. Compound 6 was originally produced by deletion of the KR 6 domain in DEBS to create a 3-ketolide producing PKS (see U.S. patent application Serial No. 09/073,538, filed 6 May 1998, and PCT publication No. WO 98/49315, each of which is incorporated herein by reference). Although the desired molecule was made, purification of compound 6 from this strain was hampered by the presence of 2-desmethyl ketolides that could not be easily separated. Extracts from Streptomyces lividans K4-114/pKOS039-18, however, do not contain the 2-desmethyl compounds, greatly simplifying purification. Thus, the invention provides a useful method of producing such compounds. The ability to combine the narbonolide PKS with DEBS and other modular PKSs provides a significant advantage in the production of macrolide antibiotics.
Two other hybrid PKSs of the invention were constructed that yield this same compound. These constructs also illustrate the method of the invention in which hybrid PKSs are constructed at the protein, as opposed to the module, level. Thus, the invention provides a method for constructing a hybrid PKS which comprises the coexpression of at least one gene from a first modular PKS gene cluster in a host cell that also expresses at least one gene from a second PKS gene cluster. The invention also provides novel hybrid PKS enzymes prepared in accordance with the method. This method is not limited to hybrid PKS enzymes composed of at least one narbonolide PKS gene, although such constructs are illustrative and preferred.
Moreover, the hybrid PKS enzymes are not limited to hybrids composed of unmodified proteins; as illustrated below, at least one of the genes can optionally be a hybrid PKS gene.
In the first construct, the eryAI and eryAII genes were coexpressed with picAIV and a gene encoding a hybrid extender module 5 composed of the KS and AT domains of extender module 5 of DEBS3 and the KR and ACP domains of extender module 5 of the narbonolide PKS. In the second construct, the picAIV coding sequence was fused to the hybrid extender module 5 coding sequence used in the first construct to yield a single protein. Each of these constructs produced 3-deoxy-3-oxo-6-deoxyerythronolide B. In a third construct, the coding sequence for extender module 5 of DEBS3 was fused to the picAIV coding sequence, but the levels of product produced were below the detection limits of the assay.
A variant of the first construct hybrid PKS was constructed that contained an inactivated DEBS1 extender module 1 KS domain. When host cells containing the resultant hybrid PKS were supplied the appropriate diketide precursor, the desired 13-desethyl-13-propyl compounds were obtained, as described in the examples below.
Other illustrative hybrid PKSs of the invention were made by coexpressing the picAI and picAII genes with genes encoding DEBS3 or DEBS3 variants. These constructs illustrate the method of the invention in which a hybrid PKS is produced from coexpression of PKS genes unmodified at the modular or domain level. In the first construct, the eryAIII gene was coexpressed with the picAI and picAII genes, and the hybrid PKS produced 10,11-anhydro-6-deoxyerythronolide B in Streptomyces lividans. Such a hybrid PKS could also be constructed in accordance with the method of the invention by transformation of S. venezuelae with an expression vector that produces the eryAIII gene product, DEBS3. In a preferred embodiment, the S. venezuelae host cell has been modified to inactivate the picAIII gene.
In the second construct, the DEBS3 gene was a variant that had an inactive KR in extender module 5. The hybrid PKS produced 5,6-dideoxy-5-oxo-10-desmethyl-10,11-anhydroerythronolide B in Streptomyces lividans.
In the third construct, the DEBS3 gene was a variant in which the KR domain of extender module 5 was replaced by the DH and KR domains of extender module 4 of the rapamycin PKS. This construct produced 5,6-dideoxy-5-oxo-10-desmethyl-10,11-anhydroerythronolide B and 5,6-dideoxy-4,5-anhydro-10-desmethyl-10,11-anhydroerythronolide B in Streptomyces lividans, indicating that the rapamycin DH and KR domains functioned only inefficiently in this construct.
In the fourth construct, the DEBS3 gene was a variant in which the KR domain of extender module 5 was replaced by the DH, KR, and ER domains of extender module 1 of the rapamycin PKS. This construct produced 5,6-dideoxy-5-oxo-10-desmethyl-10,11-anhydroerythronolide B as well as 5,6-dideoxy-10-desmethyl-10,11-anhydroerythronolide B in Streptomyces lividans, indicating that the rapamycin DH, KR, and ER domains functioned only inefficiently in this construct.
In the fifth construct, the DEBS3 gene was a variant in which the KR domain of extender module 6 was replaced by the DH and KR domains of extender module 4 of the rapamycin PKS. This construct produced 3,6-dideoxy-2,3-anhydro-10-desmethyl-10,11-anhydroerythronolide B in Streptomyces lividans.
In the sixth construct, the DEBS3 gene was a variant in which the AT domain of extender module 6 was replaced by the AT domain of extender module 2 of the rapamycin PKS. This construct produced 2,10-didesmethyl-10,11-anhydro-6-deoxyerythronolide B in Streptomyces lividans.
These hybrid PKSs illustrate the wide variety of polyketides that can be produced by the methods and compounds of the invention. These polyketides are useful as antibiotics and as intermediates in the synthesis of other useful compounds, as described in the following section.
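The domain replacements described for the DEBS3 variants can be represented, at the bookkeeping level, as an edit on an ordered list of domains. The sketch below is illustrative only: the function `replace_domain` and the labels used for the rapamycin-derived domains are hypothetical, the KS/AT/KR/ACP composition of extender module 5 is taken from the description above, and the sketch says nothing about whether a given hybrid module is functional.

```python
def replace_domain(module, target, replacement):
    """Return a new domain list in which `target` is replaced by `replacement`.

    `module` is an ordered list of domain names; exactly one occurrence of
    `target` is required so that the edit is unambiguous. The input list is
    not modified.
    """
    if module.count(target) != 1:
        raise ValueError(f"expected exactly one {target} domain in {module}")
    i = module.index(target)
    return module[:i] + list(replacement) + module[i + 1 :]


# Extender module 5 of DEBS3, written as the domains named in the text.
debs_module5 = ["KS", "AT", "KR", "ACP"]

# Analogue of the third construct: KR replaced by the DH and KR domains of
# rapamycin extender module 4 (labels are hypothetical).
third = replace_domain(debs_module5, "KR", ["DH(rap4)", "KR(rap4)"])

# Analogue of the fourth construct: KR replaced by the DH, KR, and ER domains
# of rapamycin extender module 1, listed in the order named in the text.
fourth = replace_domain(debs_module5, "KR", ["DH(rap1)", "KR(rap1)", "ER(rap1)"])
```

Keeping the donor module in each label makes it possible to trace which heterologous domains are present when several variants are compared.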
Section VI: Compounds
The methods and recombinant DNA compounds of the invention are useful in the production of polyketides. In one important aspect, the invention provides methods for making ketolides, polyketide compounds with significant antibiotic activity. See Griesgraber et al., 1996, J. Antibiot. 49: 465-477, incorporated herein by reference. Most if not all of the ketolides prepared to date are synthesized using erythromycin A, a derivative of 6-dEB, as an intermediate. While the invention provides hybrid PKSs that produce a polyketide different in structure from 6-dEB, the invention also provides methods for making intermediates useful in preparing traditional, 6-dEB-derived ketolide compounds.
Because 6-dEB in part differs from narbonolide in that it comprises a 10-methyl group, the novel hybrid PKS genes of the invention based on the narbonolide PKS provide many novel ketolides that differ from the known ketolides only in that they lack a 10-methyl group. Thus, the invention provides the 10-desmethyl analogues of the ketolides and intermediates and precursor compounds described in, for example, Griesgraber et al., supra; Agouridas et al., 1998, J. Med. Chem. 41: 4080-4100, U.S. Patent Nos. 5,770,579; 5,760,233; 5,750,510; 5,747,467; 5,747,466; 5,656,607; 5,635,485; 5,614,614; 5,556,118; 5,543,400; 5,527,780; 5,444,051; 5,439,890; 5,439,889; and PCT publication Nos. WO 98/09978 and WO 98/28316, each of which is incorporated herein by reference. Because the invention also provides hybrid PKS genes that include a methylmalonyl-specific AT domain in extender module 2 of the narbonolide PKS, the invention also provides hybrid PKSs that can be used to produce the 10-methyl-containing ketolides known in the art.
Thus, a hybrid PKS of the invention that produces 10-methyl narbonolide is constructed by substituting the malonyl-specific AT domain of the narbonolide PKS extender module 2 with a methylmalonyl-specific AT domain from a heterologous PKS. A hybrid narbonolide PKS in which the AT of extender module 2 was replaced with the AT from DEBS extender module 2 was constructed using boundaries described in PCT publication No. WO 98/49315, incorporated herein by reference. However, when the hybrid PKS expression vector was introduced into Streptomyces venezuelae, detectable quantities of 10-methyl picromycin were not produced. Thus, to construct such a hybrid PKS of the invention, an AT domain from a module other than DEBS extender module 2 is preferred. One could also employ the AT domain of DEBS extender module 2 or another methylmalonyl-specific AT but use boundaries different from those used for the substitution described above. In addition, one can construct such a hybrid PKS by substituting, in addition to the AT domain, additional extender module 2 domains, including the KS, the KR, and the DH, and/or additional extender module 3 domains.
Although modification of extender module 2 of the narbonolide PKS is required, the extent of hybrid modules engineered need not be limited to module 2 to make 10-methyl narbonolide. For example, substitution of the KS domain of extender module 3 of the narbonolide PKS with a heterologous domain or module can result in more efficient processing of the intermediate generated by the hybrid extender module 2. Likewise, a heterologous TE domain may be more efficient in cyclizing 10-methyl narbonolide.
Substitution of the entire extender module 2 of the narbonolide PKS with a module encoding the correct enzymatic activities, a KS, a methylmalonyl-specific AT, a KR, a DH, and an ACP, can also be used to create a hybrid PKS of the invention that produces a 10-methyl ketolide. Modules useful for such whole module replacements include extender modules 4 and 10 from the rapamycin PKS, extender modules 1 and 5 from the FK506 PKS, extender module 2 of the tylosin PKS, and extender module 4 of the rifamycin PKS. Thus, the invention provides many different hybrid PKSs that can be constructed starting from the narbonolide PKS that can be used to produce 10-methyl narbonolide. While narbonolide is referred to in describing these hybrid PKSs, those of skill recognize that the invention also therefore provides the corresponding derivatives produced by glycosylation and hydroxylation. For example, if the hybrid PKS is expressed in Streptomyces narbonensis or S. venezuelae, the compounds produced are 10-methyl narbomycin and picromycin, respectively. Alternatively, the PKS can be expressed in a host cell transformed with the vectors of the invention that encode the desosamine biosynthesis and desosaminyl transferase and picK hydroxylase genes.
Other important compounds provided by the invention are the 6-hydroxy ketolides.
These compounds include 3-deoxy-3-oxo erythronolide B, 6-hydroxy narbonolide, and 6narbonolide. In the examples below, the invention provides a method for utilizing EryF to hydroxylate 3-ketolides that is applicable for the production of any 6-hydroxy-3-ketolide.
Thus, the hybrid PKS genes of the invention can be expressed in a host cell that contains the desosamine biosynthetic genes and desosaminyl transferase gene as well as the required hydroxylase gene(s), which may be either picK (for the C12 position) or eryK (for the C12 position) and/or eryF (for the C6 position). The resulting compounds have antibiotic activity but can be further modified, as described in the patent publications referenced above, to yield a desired compound with improved or otherwise desired properties. Alternatively, the aglycone compounds can be produced in the recombinant host cell, and the desired glycosylation and hydroxylation steps carried out in vitro or in vivo, in the latter case by supplying the converting cell with the aglycone.
The compounds of the invention are thus optionally glycosylated forms of the polyketide set forth in formula (2) below which are hydroxylated at either the C6 or the C12 or both. The compounds of formula (2) can be prepared using the loading and the six extender modules of a modular PKS, modified or prepared in hybrid form as herein described. These polyketides have the formula:
[Formula (2): structural drawing not reproduced in this text]
wherein R* is substituted or unsubstituted hydrocarbyl of 1-15C;
each of R¹-R⁶ is independently H or alkyl (1-4C) wherein any alkyl at R¹ may optionally be substituted;
each of X¹-X⁵ is independently two H, H and OH, or =O; or
each of X¹-X⁵ is independently H and the compound of formula (2) contains a double bond in the ring adjacent to the position of said X at 2-3, 4-5, 6-7, 8-9 and/or 10-11;
with the proviso that at least two of R¹-R⁶ are alkyl (1-4C).
Preferred compounds comprising formula 2 are those wherein at least three of R¹-R⁶ are alkyl, preferably methyl or ethyl; more preferably wherein at least four of R¹-R⁶ are alkyl, preferably methyl or ethyl. Also preferred are those wherein X² is two H, or H and OH, and/or X³ is H, and/or X¹ is OH and/or X⁴ is OH and/or X⁵ is OH. Also preferred are compounds with variable R* when R¹-R⁵ are methyl, X² is =O, and X⁴ and X⁵ are OH. The glycosylated forms of the foregoing are also preferred.
The invention also provides the 12-membered macrolides corresponding to the compounds above but produced from a narbonolide-derived PKS lacking extender modules and 6 of the narbonolide PKS.
The compounds of the invention can be produced by growing and fermenting the host cells of the invention under conditions known in the art for the production of other polyketides. The compounds of the invention can be isolated from the fermentation broths of these cultured cells and purified by standard procedures. The compounds can be readily formulated to provide the pharmaceutical compositions of the invention. The pharmaceutical compositions of the invention can be used in the form of a pharmaceutical preparation, for example, in solid, semisolid, or liquid form. This preparation will contain one or more of the compounds of the invention as an active ingredient in admixture with an organic or inorganic carrier or excipient suitable for external, enteral, or parenteral application. The active ingredient may be compounded, for example, with the usual non-toxic, pharmaceutically acceptable carriers for tablets, pellets, capsules, suppositories, solutions, emulsions, suspensions, and any other form suitable for use.
The carriers which can be used include water, glucose, lactose, gum acacia, gelatin, mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin, colloidal silica, potato starch, urea, and other carriers suitable for use in manufacturing preparations, in solid, semisolid, or liquefied form. In addition, auxiliary stabilizing, thickening, and coloring agents and perfumes may be used. For example, the compounds of the invention may be utilized with hydroxypropyl methylcellulose essentially as described in U.S. Patent No. 4,916,138, incorporated herein by reference, or with a surfactant essentially as described in EPO patent publication No. 428,169, incorporated herein by reference.
Oral dosage forms may be prepared essentially as described by Hondo et al., 1987, Transplantation Proceedings XIX, Supp. 6:17-22, incorporated herein by reference. Dosage forms for external application may be prepared essentially as described in EPO patent publication No. 423,714, incorporated herein by reference. The active compound is included in the pharmaceutical composition in an amount sufficient to produce the desired effect upon the disease process or condition.
For the treatment of conditions and diseases caused by infection, a compound of the invention may be administered orally, topically, parenterally, by inhalation spray, or rectally in dosage unit formulations containing conventional non-toxic pharmaceutically acceptable carriers, adjuvant, and vehicles. The term parenteral, as used herein, includes subcutaneous injections, and intravenous, intramuscular, and intrasternal injection or infusion techniques.
Dosage levels of the compounds of the invention are of the order from about 0.01 mg to about 50 mg per kilogram of body weight per day, preferably from about 0.1 mg to about 10 mg per kilogram of body weight per day. The dosage levels are useful in the treatment of the above-indicated conditions (from about 0.7 mg to about 3.5 g per patient per day, assuming a 70 kg patient). In addition, the compounds of the invention may be administered on an intermittent basis, at semi-weekly, weekly, semi-monthly, or monthly intervals.
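The conversion from a per-kilogram dosage range to a per-patient daily range is simple arithmetic and can be sketched as follows. The function name and default arguments are assumptions introduced for illustration; the defaults are the broad range stated above, and the sketch is not dosing guidance.

```python
def daily_dose_range_mg(weight_kg, low_mg_per_kg=0.01, high_mg_per_kg=50.0):
    """Return (low, high) total daily dose in mg for a patient of `weight_kg`."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return (low_mg_per_kg * weight_kg, high_mg_per_kg * weight_kg)


# Broad range and preferred range for the 70 kg patient assumed in the text.
broad = daily_dose_range_mg(70)
preferred = daily_dose_range_mg(70, low_mg_per_kg=0.1, high_mg_per_kg=10.0)
```

For a 70 kg patient the broad range works out to about 0.7 mg to about 3500 mg (3.5 g) per day, and the preferred range to about 7 mg to about 700 mg per day.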
The amount of active ingredient that may be combined with the carrier materials to produce a single dosage form will vary depending upon the host treated and the particular mode of administration. For example, a formulation intended for oral administration to humans may contain from 0.5 mg to 5 mg of active agent compounded with an appropriate and convenient amount of carrier material, which may vary from about 5 percent to about 95 percent of the total composition. Dosage unit forms will generally contain from about 0.5 mg to about 500 mg of active ingredient.
For external administration, the compounds of the invention may be formulated within the range of, for example, 0.00001% to 60% by weight, preferably from 0.001% to 10% by weight, and most preferably from 0.005% to 0.8% by weight.
It will be understood, however, that the specific dose level for any particular patient will depend on a variety of factors. These factors include the activity of the specific compound employed; the age, body weight, general health, sex, and diet of the subject; the time and route of administration and the rate of excretion of the drug; whether a drug combination is employed in the treatment; and the severity of the particular disease or condition for which therapy is sought.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
Throughout this specification the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
A detailed description of the invention having been provided above, the following examples are given for the purpose of illustrating the invention and shall not be construed as being a limitation on the scope of the invention or claims.
Example 1
General Methodology
Bacterial strains, plasmids, and culture conditions. Streptomyces coelicolor CH999, described in WO 95/08548, published 30 March 1995, or S. lividans K4-114, described in Ziermann and Betlach, Jan. 99, Recombinant Polyketide Synthesis in Streptomyces: Engineering of Improved Host Strains, BioTechniques 26:106-110, incorporated herein by reference, was used as an expression host. DNA manipulations were performed in Escherichia coli XL1-Blue, available from Stratagene. E. coli MC1061 is also suitable for use as a host for plasmid manipulation. Plasmids were passaged through E. coli ET12567 (dam dcm hsdS Cmr) (MacNeil, 1988, J. Bacteriol. 170: 5607, incorporated herein by reference) to generate unmethylated DNA prior to transformation of S. coelicolor. E. coli strains were grown under standard conditions.
S. coelicolor strains were grown on R2YE agar plates (Hopwood et al., Genetic manipulation of Streptomyces. A laboratory manual. The John Innes Foundation: Norwich, 1985, incorporated herein by reference).
Many of the expression vectors of the invention illustrated in the examples are derived from plasmid pRM5, described in WO 95/08548, incorporated herein by reference.
This plasmid includes a ColE1 replicon, an appropriately truncated SCP2* Streptomyces replicon, two act promoters to allow for bidirectional cloning, the gene encoding the actII-ORF4 activator which induces transcription from act promoters during the transition from growth phase to stationary phase, and appropriate marker genes. Engineered restriction sites in the plasmid facilitate the combinatorial construction of PKS gene clusters starting from cassettes encoding individual domains of naturally occurring PKSs. When plasmid pRM5 is used for expression of a PKS, all relevant biosynthetic genes can be plasmid-borne and therefore amenable to facile manipulation and mutagenesis in E. coli. This plasmid is also suitable for use in Streptomyces host cells. Streptomyces is genetically and physiologically well-characterized and expresses the ancillary activities required for in vivo production of most polyketides. Plasmid pRM5 utilizes the act promoter for PKS gene expression, so polyketides are produced in a secondary metabolite-like manner, thereby alleviating the toxic effects of synthesizing potentially bioactive compounds in vivo.
Manipulation of DNA and organisms. Polymerase chain reaction (PCR) was performed using Pfu polymerase (Stratagene; Taq polymerase from Perkin Elmer Cetus can also be used) under conditions recommended by the enzyme manufacturer. Standard in vitro techniques were used for DNA manipulations (Sambrook et al. Molecular Cloning: A Laboratory Manual (Current Edition)). E. coli was transformed using standard calcium chloride-based methods; a Bio-Rad E. coli pulsing apparatus and protocols provided by Bio-Rad could also be used. S. coelicolor was transformed by standard procedures (Hopwood et al. Genetic manipulation of Streptomyces. A laboratory manual. The John Innes Foundation: Norwich, 1985), and, depending on which selectable marker was employed, transformants were selected using 1 mL of a 1.5 mg/mL thiostrepton overlay, 1 mL of a 2 mg/mL apramycin overlay, or both.
Example 2
Cloning of the Picromycin Biosynthetic Gene Cluster from Streptomyces venezuelae
Genomic DNA (100 μg) isolated from Streptomyces venezuelae ATCC15439 using standard procedures was partially digested with Sau3AI endonuclease to generate fragments [...] kbp in length. SuperCosI (Stratagene) DNA cosmid arms were prepared as directed by the manufacturer. A cosmid library was prepared by ligating 2.5 μg of the digested genomic DNA with 1.5 μg of cosmid arms in a 20 μL reaction. One microliter of the ligation mixture was propagated in E. coli XL1-Blue MR (Stratagene) using a GigapackIII XL packaging extract kit (Stratagene). The resulting library of ~3000 colonies was plated on a 10x150 mm agar plate and replicated to a nylon membrane.
The library was initially screened by direct colony hybridization with a DNA probe specific for ketosynthase domain coding sequences of PKS genes. Colonies were alkaline lysed, and the DNA was crosslinked to the membrane using UV irradiation. After overnight incubation with the probe at 42°C, the membrane was washed twice at 25°C in 2xSSC buffer with 0.1% SDS for 15 minutes, followed by two 15 minute washes with 2xSSC buffer at 55°C.
Approximately 30 colonies gave positive hybridization signals with the degenerate probe.
Several cosmids were selected and divided into two classes based on restriction digestion patterns. A representative cosmid was selected from each class for further analysis. The representative cosmids were designated pKOS023-26 and pKOS023-27. These cosmids were determined by DNA sequencing to comprise the narbonolide PKS genes, the desosamine biosynthesis and transferase genes, the beta-glucosidase gene, and the picK hydroxylase gene.
These cosmids were deposited with the American Type Culture Collection in accordance with the terms of the Budapest Treaty. Cosmid pKOS023-26 was assigned accession number ATCC 203141, and cosmid pKOS023-27 was assigned accession number ATCC 203142.
To demonstrate that the narbonolide PKS genes had been cloned and to illustrate how the invention provides methods and reagents for constructing deletion variants of narbonolide PKS genes, a narbonolide PKS gene was deleted from the chromosome of Streptomyces venezuelae. This deletion is shown schematically in Figure 4, parts B and C. A ~2.4 kb EcoRI-KpnI fragment and a ~2.1 kb KpnI-XhoI fragment, which together comprise both ends of the picAI gene (but lack a large portion of the coding sequence), were isolated from cosmid pKOS023-27 and ligated together into the commercially available vector pLitmus 28 (digested with restriction enzymes EcoRI and XhoI) to give plasmid pKOS039-07. The ~4.5 kb HindIII-SpeI fragment from plasmid pKOS039-07 was ligated with the 2.5 kb HindIII-NheI fragment of integrating vector pSET152, available from the NRRL, which contains an E. coli origin of replication and an apramycin resistance-conferring gene, to create plasmid pKOS039-16. This vector was used to transform S. venezuelae, and apramycin-resistant transformants were selected.
Then, to select for double-crossover mutants, the selected transformants were grown in TSB liquid medium without antibiotics for three transfers and then plated onto nonselective media to provide single colony isolates. The isolated colonies were tested for sensitivity to apramycin, and the apramycin-sensitive colonies were then tested to determine if they produced picromycin. The tests performed included a bioassay and LC/MS analysis of the fermentation media. Colonies determined not to produce picromycin (or methymycin or neomethymycin) were then analyzed using PCR to detect an amplification product diagnostic of the deletion. A colony designated K39-03 was identified, providing confirmation that the narbonolide PKS genes had been cloned. Transformation of strain K39-03 with plasmid pKOS039-27 comprising an intact picA gene under the control of the ermE* promoter from plasmid pWHM3 (see Vara et al., J. Bact. (1989) 171: 5872-5881, incorporated herein by reference) was able to restore picromycin production.
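The screening sequence above amounts to three successive filters: apramycin sensitivity, loss of picromycin production, and a positive diagnostic PCR. A minimal sketch of that decision logic follows, assuming hypothetical isolate records; the class, field names, and isolate names are illustrative only and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass
class Isolate:
    name: str                      # hypothetical isolate label
    apramycin_sensitive: bool      # vector backbone lost (second crossover)
    produces_picromycin: bool      # from bioassay and LC/MS of the medium
    deletion_pcr_positive: bool    # amplification product diagnostic of the deletion


def is_candidate_deletion_mutant(isolate):
    """Apply the three screens in the order described in the text."""
    return (
        isolate.apramycin_sensitive
        and not isolate.produces_picromycin
        and isolate.deletion_pcr_positive
    )


isolates = [
    Isolate("iso-1", True, True, False),    # reverted to wild type
    Isolate("iso-2", False, True, False),   # still carries the integrated vector
    Isolate("iso-3", True, False, True),    # passes all three screens
]
candidates = [i.name for i in isolates if is_candidate_deletion_mutant(i)]
print(candidates)
```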
To determine that the cosmids also contained the picK hydroxylase gene, each cosmid was probed by Southern hybridization using a labeled DNA fragment amplified by PCR from the Saccharopolyspora erythraea C12-hydroxylase gene, eryK. The cosmids were digested with BamHI endonuclease and electrophoresed on a 1% agarose gel, and the resulting fragments were transferred to a nylon membrane. The membrane was incubated with the eryK probe overnight at 42°C, washed twice at 25°C in 2XSSC buffer with 0.1% SDS for [...] minutes, followed by two 15 minute washes with 2XSSC buffer at 50°C. Cosmid pKOS023-26 produced an ~3 kb fragment that hybridized with the probe under these conditions. This fragment was subcloned into the PCRscript(TM) (Stratagene) cloning vector to yield plasmid pKOS023-28 and sequenced. The ~1.2 kb gene designated picK above was thus identified.
The picK gene product is homologous to eryK and other known macrolide cytochrome P450 hydroxylases.
By such methodology, the complete set of picromycin biosynthetic genes was isolated and identified. DNA sequencing of the cloned DNA provided further confirmation that the correct genes had been cloned. In addition, and as described in the following example, the identity of the genes was confirmed by expression of narbomycin in heterologous host cells.
Example 3
Heterologous Expression of the Narbonolide PKS and the Picromycin Biosynthetic Gene Cluster
To provide a preferred host cell and vector for purposes of the invention, the narbonolide PKS was transferred to the non-macrolide producing host Streptomyces lividans K4-114 (see Ziermann and Betlach, 1999, Biotechniques 26, 106-110, and U.S. patent application Serial No. 09/181,833, filed 28 Oct. 1998, each of which is incorporated herein by reference). This was accomplished by replacing the three DEBS ORFs on a modified version of pCK7 (see Kao et al., 1994, Science 265, 509-512, and U.S. Patent No. 5,672,491, each of which is incorporated herein by reference) with all four narbonolide PKS ORFs to generate plasmid pKOS039-86 (see Figure [...]). The pCK7 derivative employed, designated pCK7'Kan', differs from pCK7 only in that it contains a kanamycin resistance conferring gene inserted at its HindII restriction enzyme recognition site. Because the plasmid contains two selectable markers, one can select for both markers and so minimize contamination with cells containing rearranged, undesired vectors.
Protoplasts were transformed using standard procedures and transformants selected using overlays containing antibiotics. The strains were grown in liquid R5 medium for growth/seed and production cultures at 30°C. A 2 L shake flask culture of S. lividans K4-114/pKOS039-86 was grown for 7 days at 30°C. The mycelia were filtered, and the aqueous layer was extracted with 2 x 2 L ethyl acetate. The organic layers were combined, dried over MgSO4, filtered, and evaporated to dryness. Polyketides were separated from the crude extract by silica gel chromatography (1:4 to 1:2 ethyl acetate:hexane gradient) to give an [...] mg mixture of narbonolide and 10-deoxymethynolide, as indicated by LC/MS and 1H NMR.
Purification of these two compounds was achieved by HPLC on a C-18 reverse phase column (20-80% acetonitrile in water over 45 minutes). This procedure yielded ~5 mg each of narbonolide and 10-deoxymethynolide. Polyketides produced in the host cells were analyzed by bioassay against Bacillus subtilis and by LC/MS analysis. Analysis of extracts by LC/MS followed by 1H-NMR spectroscopy of the purified compounds established their identity as narbonolide (Figure 5, compound 4; see Kaiho et al., 1982, J. Org. Chem. 47: 1612-1614, incorporated herein by reference) and 10-deoxymethynolide (Figure 5, compound 5; see Lambalot et al., 1992, J. Antibiotics 45, 1981-1982, incorporated herein by reference), the respective 14 and 12-membered polyketide aglycones of YC17, narbomycin, picromycin, and methymycin.
The production of narbonolide in Streptomyces lividans represents the expression of an entire modular polyketide pathway in a heterologous host. The combined yields of compounds 4 and 5 are similar to those obtained with expression of DEBS from pCK7 (see Kao et al., 1994, Science 265: 509-512, incorporated herein by reference). Furthermore, based on the relative ratios of compounds 4 and 5 produced, it is apparent that the narbonolide PKS itself possesses an inherent ability to produce both 12 and 14-membered macrolactones without the requirement of additional activities unique to S. venezuelae.
Although the existence of a complementary enzyme present in S. lividans that provides this function is possible, it would be unusual to find such a specific enzyme in an organism that does not produce any known macrolide.
To provide a heterologous host cell of the invention that produces the narbonolide PKS and the picB gene, the picB gene was integrated into the chromosome of Streptomyces lividans harboring plasmid pKOS039-86 to yield S. lividans K39-18/pKOS039-86. To provide the integrating vector utilized, the picB gene was cloned into the Streptomyces genome integrating vector pSET152 (see Bierman et al., 1992, Gene 116, 43, incorporated herein by reference) under control of the same promoter (PactI) as the PKS on plasmid pKOS039-86.
A comparison of strains K39-18/pKOS039-86 and K4-114/pKOS039-86 grown under identical conditions indicated that the strain containing TEII produced 4-7 times more total polyketide. Each strain was grown in 30 mL of R5 (see Hopwood et al., Genetic Manipulation of Streptomyces: A Laboratory Manual; John Innes Foundation: Norwich, UK, 1985, incorporated herein by reference) liquid (with 20 μg/mL thiostrepton) at 30°C for 9 days. The fermentation broth was analyzed directly by reverse phase HPLC. Absorbance at 235 nm was used to monitor compounds and measure relative abundance. This increased production indicates that the enzyme is functional in this strain. As noted above, because the production levels of compounds 4 and 5 from K39-18/pKOS039-86 increased by the same relative amounts, TEII does not appear to influence the ratio of 12 and 14-membered lactone ring formation.
To express the glycosylated counterparts of narbonolide (narbomycin) and 10-deoxymethynolide (YC17) in heterologous host cells, the desosamine biosynthetic genes and desosaminyl transferase gene were transformed into the host cells harboring plasmid pKOS039-86 (and, optionally, the picB gene, which can be integrated into the chromosome as described above).
Plasmid pKOS039-104, see Figure 6, comprises the desosamine biosynthetic genes, the beta-glucosidase gene, and the desosaminyl transferase gene. This plasmid was constructed by first inserting a polylinker oligonucleotide, containing a restriction enzyme recognition site for PacI, a Shine-Dalgarno sequence, and restriction enzyme recognition sites for NdeI, BglII, and HindIII, into a pUC19 derivative, called pKOS24-47, to yield plasmid pKOS039-98.
An ~0.3 kb PCR fragment comprising the coding sequence for the N-terminus of the desI gene product and an ~0.12 kb PCR fragment comprising the coding sequence for the C-terminus of the desR gene product were amplified from cosmid pKOS23-26 (ATCC 203141) and inserted together into pLitmus28 treated with restriction enzymes NsiI and EcoRI to produce plasmid pKOS039-101. The ~6 kb SphI-PstI restriction fragment of pKOS23-26 containing the desI, desII, desIII, desIV, and desV genes was inserted into plasmid pUC19 (Stratagene) to yield plasmid pKOS039-102. The ~6 kb SphI-EcoRI restriction fragment from plasmid pKOS039-102 was inserted into pKOS039-101 to produce plasmid pKOS039-103.
The ~6 kb BglII-PstI fragment from pKOS23-26 that contains the desR, desVI, desVII, and desVIII genes was inserted into pKOS39-98 to yield pKOS39-100. The ~6 kb PacI-PstI restriction fragment of pKOS39-100 and the ~6.4 kb NsiI-EcoRI fragment of pKOS39-103 were cloned into pKOS39-44 to yield pKOS39-104.
When introduced into Streptomyces lividans host cells comprising the recombinant narbonolide PKS of the invention, plasmid pKOS39-104 drives expression of the desosamine biosynthetic genes, the beta-glucosidase gene, and the desosaminyl transferase gene. The glycosylated antibiotic narbomycin was produced in these host cells, and it is believed that YC17 was produced as well. When these host cells are transformed with vectors that drive expression of the picK gene, the antibiotics methymycin, neomethymycin, and picromycin are produced.
In similar fashion, when plasmid pKOS039-18, which encodes a hybrid PKS of the invention that produces 3-deoxy-3-oxo-6-deoxyerythronolide B, was expressed in Streptomyces lividans host cells transformed with plasmid pKOS39-104, the desosaminylated analog was produced. Likewise, when plasmid pCK7, which encodes DEBS, which produces 6-deoxyerythronolide B, was expressed in Streptomyces lividans host cells transformed with plasmid pKOS39-104, the 5-desosaminylated analog was produced.
These compounds have antibiotic activity and are useful as intermediates in the synthesis of other antibiotics.
Example 4
Expression Vector for Desosaminyl Transferase
While the invention provides expression vectors comprising all of the genes required for desosamine biosynthesis and transfer to a polyketide, the invention also provides expression vectors that encode any subset of those genes or any single gene. As one illustrative example, the invention provides an expression vector for desosaminyl transferase.
This vector is useful to desosaminylate polyketides in host cells that produce NDP-desosamine but lack a desosaminyl transferase gene or express a desosaminyl transferase that does not function as efficiently on the polyketide of interest as does the desosaminyl transferase of Streptomyces venezuelae. This expression vector was constructed by first amplifying the desosaminyl transferase coding sequence from pKOS023-27 using the primers: N3917: 5'-CCCTGCAGCGGCAAGGAAGGACACGACGCCA-3' (SEQ ID NO:25); and N3918: 5'-AGGTCTAGAGCTCAGTGCCGGGCGTCGGCCGG-3' (SEQ ID NO:26), to give a 1.5 kb product. This product was then treated with restriction enzymes PstI and XbaI and ligated with HindIII and XbaI digested plasmid pKOS039-06 together with the 7.6 kb PstI-HindIII restriction fragment of plasmid pWHM1104 to provide plasmid pKOS039-14. Plasmid pWHM1104, described in Tang et al., 1996, Molec. Microbiol. 22(5): 801-813, incorporated herein by reference, encodes the ermE* promoter. Plasmid pKOS039-14 is constructed so that the desosaminyl transferase gene is placed under the control of the ermE* promoter and is suitable for expression of the desosaminyl transferase in Streptomyces, Saccharopolyspora erythraea, and other host cells in which the ermE* promoter functions.
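The cloning step relies on restriction sites carried in the 5' tails of primers N3917 and N3918. As a hedged illustration, the sketch below scans the two primer sequences given above for a few recognition sequences. The recognition sequences are the standard published ones for these enzymes and are not stated in the specification; the dictionary and function names are illustrative. Because all of the listed sites are palindromic, scanning one strand is sufficient.

```python
# Standard recognition sequences (5'->3'); supplied here as background, not from the text.
RECOGNITION_SITES = {
    "PstI": "CTGCAG",
    "XbaI": "TCTAGA",
    "HindIII": "AAGCTT",
    "EcoRI": "GAATTC",
    "NdeI": "CATATG",
    "NsiI": "ATGCAT",
}

# Primer sequences exactly as given in Example 4.
PRIMERS = {
    "N3917": "CCCTGCAGCGGCAAGGAAGGACACGACGCCA",
    "N3918": "AGGTCTAGAGCTCAGTGCCGGGCGTCGGCCGG",
}


def sites_in(sequence):
    """Return the sorted names of the listed enzymes whose site occurs in the sequence."""
    seq = sequence.upper()
    return sorted(name for name, site in RECOGNITION_SITES.items() if site in seq)


for primer_name, primer_seq in PRIMERS.items():
    print(primer_name, sites_in(primer_seq))
```

Under these assumptions the scan finds a PstI site in N3917 and an XbaI site in N3918, consistent with the PstI and XbaI digestion of the PCR product described above.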
Example 5
Heterologous Expression of the picK Gene Product in E. coli
The picK gene was PCR amplified from plasmid pKOS023-28 using the oligonucleotide primers: N024-36B (forward): 5'-TTGCATGCATATGCGCCGTACCCAGCAGGGAACGACC (SEQ ID NO:27); and N024-37B (reverse): [...] (SEQ ID NO:28). These primers alter the Streptomyces GTG start codon to ATG and introduce a SpeI site at the C-terminal end of the gene, resulting in the substitution of a serine for the terminal glycine amino acid residue. The blunt-ended PCR product was subcloned into the commercially available vector pCRscript at the SrfI site to yield plasmid pKOS023-60. An ~1.3 kb NdeI-XhoI fragment was then inserted into the NdeI/XhoI sites of the T7 expression vector pET22b (Novagen, Madison, WI) to generate pKOS023-61. Plasmid pKOS023-61 was digested with restriction enzymes SpeI and EcoRI, and a short linker fragment encoding 6 histidine residues and a stop codon (composed of oligonucleotides 30-85a: 5'-CTAGTATGCATCATCATCATCATCATTAA-3' (SEQ ID NO:29); and 30-85b: 5'-AATTTTAATGATGATGATGATGATGCATA-3' (SEQ ID NO:30)) was inserted to obtain plasmid pKOS023-68. Both plasmid pKOS023-61 and pKOS023-68 produced active PicK enzyme in recombinant E. coli host cells.
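The design of the histidine-tag linker can be checked directly from the two oligonucleotide sequences. The sketch below is illustrative only: it tests whether oligonucleotides 30-85a and 30-85b anneal to leave four-nucleotide 5' overhangs and translates the top strand. The reading frame used (codons beginning at the sixth nucleotide of 30-85a, so that the SpeI site ACTAGT is read as ACT AGT) is an assumption inferred from the serine substitution described above, and the function names are not from the specification.

```python
OLIGO_30_85A = "CTAGTATGCATCATCATCATCATCATTAA"  # top strand, as given in the text
OLIGO_30_85B = "AATTTTAATGATGATGATGATGATGCATA"  # bottom strand, as given in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the codons that occur in this linker; not a full genetic code table.
_CODONS = {"ATG": "M", "CAT": "H", "TAA": "*"}


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def anneals_with_overhangs(top, bottom, overhang=4):
    """True if the strands pair everywhere except the first `overhang` nt of each."""
    return top[overhang:] == reverse_complement(bottom[overhang:])


def translate(seq, start):
    """Translate seq from index `start`, codon by codon, using the small table above."""
    codons = [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]
    return "".join(_CODONS[c] for c in codons)


print(anneals_with_overhangs(OLIGO_30_85A, OLIGO_30_85B))
print(OLIGO_30_85A[:4], OLIGO_30_85B[:4])   # 5' overhangs of each strand
print(translate(OLIGO_30_85A, start=5))     # assumed reading frame
```

Under these assumptions the annealed linker carries a CTAG overhang (compatible with SpeI-cut DNA) and an AATT overhang (compatible with EcoRI-cut DNA), and the top strand reads as methionine, six histidines, and a stop codon.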
Plasmid pKOS023-61 was transformed into E. coli BL21-DE3. Successful transformants were grown in LB containing carbenicillin (100 μg/ml) at 37°C to an OD600 of 0.6. Isopropyl-beta-D-thiogalactopyranoside (IPTG) was added to a final concentration of 1 mM, and the cells were grown for an additional 3 hours before harvesting. The cells were collected by centrifugation and frozen at -80°C. A control culture of BL21-DE3 containing the vector plasmid pET21c (Invitrogen) was prepared in parallel.
The frozen BL21-DE3/pKOS023-61 cells were thawed, suspended in 2 mL of cold cell disruption buffer (5 mM imidazole, 500 mM NaCl, 20 mM Tris/HCl, pH 8.0) and sonicated to facilitate lysis. Cellular debris and supernatant were separated by centrifugation and subjected to SDS-PAGE on 10-15% gradient gels, with Coomassie Blue staining, using a Pharmacia Phast Gel Electrophoresis system. The soluble crude extract from BL21-DE3/pKOS023-61 contained a Coomassie stained band of Mr ~46 kDa, which was absent in the control strain BL21-DE3/pET21c.
The hydroxylase activity of the picK protein was assayed as follows. The crude supernatant (20 μL) was added to a reaction mixture (100 μL total volume) containing [...] mM Tris/HCl (pH [...]), 20 μM spinach ferredoxin, 0.025 Unit of spinach ferredoxin:NADP+ oxidoreductase, 0.8 Unit of glucose-6-phosphate dehydrogenase, 1.4 mM NADP+, 7.6 mM glucose-6-phosphate, and 20 nmol of narbomycin. The narbomycin was purified from a culture of Streptomyces narbonensis, and upon LC/MS analysis gave a single peak of [...]. The reaction was allowed to proceed for 105 minutes at 30°C. Half of the reaction mixture was loaded onto an HPLC, and the effluent was analyzed by evaporative light scattering (ELSD) and mass spectrometry. The control extract (BL21-DE3/pET21c) was processed identically. The BL21-DE3/pKOS023-61 reaction contained a compound not present in the control having the same retention time, molecular weight and mass fragmentation pattern as picromycin [...]. The conversion of narbomycin to picromycin under these conditions was estimated to be greater than 90% by ELSD peak area.
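For orientation, the substrate amount in this assay can be restated as a concentration. The sketch below is simple unit arithmetic on the stated values (20 nmol of narbomycin in a 100 microliter reaction, half of which is loaded onto the HPLC); the function name is illustrative and not from the specification.

```python
def concentration_uM(amount_nmol, volume_uL):
    """Convert an amount in nmol dissolved in a volume in microliters to micromolar.

    1 nmol per microliter equals 1 mM, i.e. 1000 micromolar.
    """
    return amount_nmol * 1000 / volume_uL


substrate_uM = concentration_uM(20, 100)       # narbomycin in the 100 uL reaction
loaded_nmol = 20 / 2                           # half of the reaction goes onto the HPLC
min_product_loaded_nmol = 0.9 * loaded_nmol    # at the stated >90% conversion

print(substrate_uM, loaded_nmol, min_product_loaded_nmol)
```

On these figures the starting narbomycin concentration is 200 micromolar, and a conversion above 90% corresponds to more than 9 nmol of picromycin in the 10 nmol of macrolide loaded onto the column.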
The poly-histidine-linked PicK hydroxylase was prepared from pKOS023-68 transformed into E. coli BL21 (DE3) and cultured as described above. The cells were harvested and the PicK protein purified as follows. All purification steps were performed at 4°C. E. coli cell pellets were suspended in 32 μL of cold binding buffer (20 mM Tris/HCl, pH 8.0, 5 mM imidazole, 500 mM NaCl) per mL of culture and lysed by sonication. For analysis of E. coli cell-free extracts, the cellular debris was removed by low-speed centrifugation, and the supernatant was used directly in assays. For purification of PicK/6-His, the supernatant was loaded (0.5 mL/min.) onto a 5 mL HiTrap Chelating column (Pharmacia, Piscataway, New Jersey), equilibrated with binding buffer. The column was washed with 25 mL of binding buffer and the protein was eluted with a 35 mL linear gradient (5-500 mM imidazole in binding buffer). Column effluent was monitored at 280 nm and 416 nm. Fractions corresponding to the 416 nm absorbance peak were pooled and dialyzed against storage buffer (45 mM Tris/HCl, pH 7.5, 0.1 mM EDTA, 0.2 mM DTT, [...] glycerol). The purified 46 kDa protein was analyzed by SDS-PAGE using Coomassie blue staining, and enzyme concentration and yield were determined.
Narbomycin was purified as described above from a culture of Streptomyces narbonensis ATCC 19790. Reactions for kinetic assays (100 μL) consisted of 50 mM Tris/HCl (pH [...]), 100 μM spinach ferredoxin, 0.025 Unit of spinach ferredoxin:NADP+ oxidoreductase, 0.8 U glucose-6-phosphate dehydrogenase, 1.4 mM NADP+, 7.6 mM glucose-6-phosphate, 20-500 μM narbomycin substrate, and 50-500 nM of PicK enzyme. The reaction proceeded at 30°C, and samples were withdrawn for analysis at 5, 10, 15, and [...] minutes. Reactions were stopped by heating to 100°C for 1 minute, and denatured protein was removed by centrifugation. Depletion of narbomycin and formation of picromycin were determined by high performance liquid chromatography (HPLC, Beckman C-18 0.46x15 cm column) coupled to atmospheric pressure chemical ionization (APCI) mass spectroscopic detection (Perkin Elmer/Sciex API 100) and evaporative light scattering detection (Alltech 500 ELSD).
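The specification does not state how the time-course samples were analyzed. One common approach, shown here only as a sketch under that assumption, is to estimate an initial rate as the least-squares slope of product concentration against time and to normalize it to the enzyme concentration. The product concentrations below are synthetic placeholders, not measured values, and the function names are illustrative.

```python
def initial_rate_uM_per_min(times_min, product_uM):
    """Ordinary least-squares slope of product concentration versus time."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_p = sum(product_uM) / n
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_min, product_uM))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx


def turnover_per_min(rate_uM_per_min, enzyme_nM):
    """Rate divided by enzyme concentration (nM converted to uM), in min^-1."""
    return rate_uM_per_min * 1000 / enzyme_nM


# Synthetic illustration: sampling at 5, 10, and 15 minutes with 100 nM enzyme.
times = [5, 10, 15]
picromycin_uM = [10.0, 20.0, 30.0]   # placeholder values, not data from the text

rate = initial_rate_uM_per_min(times, picromycin_uM)
print(rate, turnover_per_min(rate, enzyme_nM=100))
```

Repeating such an estimate across the 20-500 micromolar substrate range would give the rate-versus-substrate data needed for a kinetic fit; the choice of fitting model is left open here.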
Example 6
Expression of the picK Gene Encoding the Hydroxylase in Streptomyces narbonensis
To produce picromycin in Streptomyces narbonensis, a host that produces narbomycin but not picromycin, the methods and vectors of the invention were used to express the picK gene in this host.
The picK gene was amplified from cosmid pKOS023-26 using the primers: N3903: 5'-TCCTCTAGACGTTTCCGT-3' (SEQ ID NO:31); and N3904: 5'-TGAAGCTTGAATTCAACCGGT-3' (SEQ ID NO:32) to obtain an ~1.3 kb product. The product was treated with restriction enzymes XbaI and HindIII and ligated with the 7.6 kb XbaI-HindIII restriction fragment of plasmid pWHM1104 to provide plasmid pKOS039-01, placing the picK gene under the control of the ermE* promoter. The resulting plasmid was transformed into purified stocks of S. narbonensis by protoplast fusion and electroporation. The transformants were grown in suitable media and shown to convert narbomycin to picromycin at a yield of over [...].
Example 7
Construction of a Hybrid DEBS/Narbonolide PKS
This example describes the construction of illustrative hybrid PKS expression vectors of the invention. The hybrid PKS contains portions of the narbonolide PKS and portions of rapamycin and/or DEBS PKS. In the first constructs, pKOS039-18 and pKOS039-19, the hybrid PKS comprises the narbonolide PKS extender module 6 ACP and thioesterase domains and the DEBS loading module and extender modules 1-5 as well as the KS and AT domains of DEBS extender module 6 (but not the KR domain of extender module 6). In pKOS039-19, the hybrid PKS is identical except that the KS1 domain is inactivated, i.e., the ketosynthase in extender module 1 is disabled. The inactive DEBS KS1 domain and its construction are described in detail in PCT publication Nos. WO 97/02358 and WO 99/03986, each of which is incorporated herein by reference. To construct pKOS039-18, the 2.33 kb BamHI-EcoRI fragment of pKOS023-27, which contains the desired sequence, was amplified by PCR and subcloned into plasmid pUC19. The primers used in the PCR were: N3905: 5'-TTTATGCATCCCGCGGGTCCCGGCGAG-3' (SEQ ID NO:33); and N3906: 5'-TCAGAATTCTGTCGGTCACTTGCCCGC-3' (SEQ ID NO:34).
The 1.6 kb PCR product was digested with PstI and EcoRI and cloned into the corresponding sites of plasmid pKOS015-52 (this plasmid contains the relevant portions of the coding sequence for the DEBS extender module 6) and commercially available plasmid pLitmus 28 to provide plasmids pKOS039-12 and pKOS039-13, respectively. The BglII-EcoRI fragment of plasmid pKOS039-12 was cloned into plasmid pKOS011-77, which contains the functional DEBS gene cluster, and into plasmid pJRJ2, which contains the mutated DEBS gene that produces a DEBS PKS in which the KS domain of extender module 1 has been rendered inactive. Plasmid pJRJ2 is described in PCT publication Nos. WO 99/03986 and WO 97/02358, incorporated herein by reference.
Plasmids pKOS039-18 and pKOS039-19, respectively, were obtained. These two plasmids were transformed into Streptomyces coelicolor CH999 by protoplast fusion.
The resulting cells were cultured under conditions such that expression of the PKS occurred.
Cells transformed with plasmid pKOS039-18 produced the expected product 3-deoxy-3-oxo-6-deoxyerythronolide B. When cells transformed with plasmid pKOS039-19 were provided (2S,3R)-2-methyl-3-hydroxyhexanoate NACS, 13-desethyl-13-propyl-3-deoxy-3-oxo-6-deoxyerythronolide B was produced.
Example 8
6-Hydroxylation of 3,6-dideoxy-3-oxoerythronolide B using the eryF hydroxylase
Certain compounds of the invention can be hydroxylated at the C6 position in a host cell that expresses the eryF gene. These compounds can also be hydroxylated in vitro, as illustrated by this example.
The 6-hydroxylase encoded by eryF was expressed in E. coli, and partially purified.
The hydroxylase (100 pmol in 10 μL) was added to a reaction mixture (100 μL total volume) containing 50 mM Tris/HCl (pH [...]), 20 μM spinach ferredoxin, 0.025 Unit of spinach ferredoxin:NADP+ oxidoreductase, 0.8 Unit of glucose-6-phosphate dehydrogenase, 1.4 mM NADP+, 7.6 mM glucose-6-phosphate, and 10 nmol 6-deoxyerythronolide B. The reaction was allowed to proceed for 90 minutes at 30°C. Half of the reaction mixture was loaded onto an HPLC, and the effluent was analyzed by mass spectrometry. The production of erythronolide B was evidenced by a new peak eluting earlier in the gradient and showing [...]. Conversion was estimated at 50% based on relative total ion counts.
Those of skill in the art will recognize the potential for hemiketal formation in the above compound and compounds of similar structure. To reduce the amount of hemiketal formed, one can use more basic (as opposed to acidic) conditions or employ sterically hindered derivative compounds, such as 5-desosaminylated compounds.
Example 9
Measurement of Antibacterial Activity
Antibacterial activity was determined either by disk diffusion assays with Bacillus cereus as the test organism or by measurement of minimum inhibitory concentrations (MIC) in liquid culture against sensitive and resistant strains of Staphylococcus pneumoniae.
The invention having now been described by way of written description and example, those of skill in the art will recognize that the invention can be practiced in a variety of embodiments and that the foregoing description and examples are for purposes of illustration and not limitation of the following claims.

Claims (36)

1. A recombinant DNA compound that comprises a sequence encoding a domain of a narbonolide polyketide synthase (PKS) of Streptomyces or an isolated DNA compound comprising a degenerate sequence to said Streptomyces sequence.
2. A recombinant DNA compound according to claim 1 wherein Streptomyces is Streptomyces narbonensis.
3. A recombinant DNA compound according to claim 1 wherein the Streptomyces is Streptomyces venezuelae.
4. The recombinant DNA compound according to any one of claims 1 to 3, wherein said domain is selected from the group consisting of a thioesterase domain, a KSQ domain, an AT domain, a KS domain, an ACP domain, a KR domain, a DH domain, and an ER domain.
5. The recombinant DNA compound of claim 4 that comprises the coding sequence of a loading module, thioesterase domain, and all six extender modules of the narbonolide PKS.
6. The recombinant DNA compound of claim 4 that comprises a hybrid PKS.
7. The recombinant DNA compound of claim 6 wherein said hybrid PKS comprises at least a portion of a narbonolide PKS gene and at least a portion of a second PKS gene for a macrolide aglycone other than narbonolide.
8. The recombinant DNA compound of claim 7 wherein said second PKS gene is a DEBS gene.
9. The recombinant DNA compound of claim 8 wherein said hybrid PKS is composed of a loading module and extender modules 1 through 6 of DEBS excluding a KR domain of extender module 6 of DEBS and an ACP of extender module 6 and a thioesterase domain of the narbonolide PKS.
10. A recombinant DNA compound that comprises a coding sequence of a desosamine biosynthetic gene or a desosaminyl transferase gene or a beta-glucosidase gene of Streptomyces venezuelae or a degenerate sequence to said S. venezuelae sequence.
11. A recombinant DNA compound that comprises a coding sequence of a picK hydroxylase gene of Streptomyces venezuelae or a degenerate sequence to said S. venezuelae sequence.
12. The DNA compound of any one of claims 1 to 11 that further comprises a promoter operably linked to said coding sequence.
13. The recombinant DNA compound of claim 12, wherein said promoter is a promoter derived from a cell other than a Streptomyces venezuelae cell.
14. The recombinant DNA compound of claim 13 that is a recombinant DNA expression vector.
15. The expression vector of claim 14 that expresses a PKS in a Streptomyces host cell.
16. A recombinant host cell, which in its untransformed state does not produce 10-deoxymethynolide or narbonolide, wherein said cell comprises the recombinant DNA expression vector of claim 14 encoding a narbonolide PKS and wherein said cell produces 10-deoxymethynolide or narbonolide.
17. The recombinant host cell of claim 16 that further comprises a picB gene.
18. The recombinant host cell of claim 16 or 17 that further comprises desosamine biosynthetic genes and a gene for desosaminyl transferase and produces YC17 or narbomycin.
19. The recombinant host cell of claim 18 that further comprises a picK gene and produces methymycin, neomethymycin, or picromycin.
20. The recombinant host cell of claim 19 that is Streptomyces coelicolor or Streptomyces lividans.
21. A recombinant host cell other than a Streptomyces venezuelae cell that expresses a picK hydroxylase gene of S. venezuelae or a degenerate sequence of S. venezuelae.
22. A recombinant host cell other than a Streptomyces venezuelae host cell that expresses a desosamine biosynthetic gene or desosaminyl transferase gene of S. venezuelae or a degenerate sequence of S. venezuelae.
23. A method of increasing the yield of a desosaminylated polyketide in a cell, which method comprises transforming the cell with a recombinant expression vector that encodes a functional beta-glucosidase gene.
24. A hybrid PKS which comprises at least one domain of a Streptomyces narbonolide PKS.
25. The hybrid PKS of claim 24 wherein said hybrid PKS comprises at least a portion of a narbonolide PKS gene and at least a portion of a second PKS gene for a macrolide aglycone other than narbonolide.
26. The hybrid PKS of claim 25 wherein said second PKS gene is a DEBS gene.
27. The hybrid PKS of claim 26 wherein said hybrid PKS is comprised of a loading module and extender modules 1 through 6 of DEBS excluding a KR domain of extender module 6 of DEBS and an ACP of extender module 6 and a thioesterase domain of the narbonolide PKS.
28. A method of producing a polyketide which comprises providing starter, extender and/or intermediate ketide units to the hybrid PKS of claim 24.
29. A polyketide produced by the method of claim 28.
30. A method of desosaminylating a polyketide compound comprising expressing a recombinant DNA compound according to claim 1 and/or claim 10 and/or claim 11 in a host cell.
31. A method according to claim 30 wherein the recombinant DNA is from Streptomyces venezuelae.
32. A method according to claim 30 wherein the recombinant DNA is from Streptomyces narbonensis.
33. A method according to any one of claims 30 to 32 wherein the host cell is Streptomyces lividans.
34. A method according to any one of claims 30 to 32 wherein the host cell is Streptomyces coelicolor.
35. A method according to any one of claims 30 to 34 wherein a beta-glucosidase gene is also expressed in the host cell.
36. A host cell that has been transformed with the recombinant DNA according to any one of claims 1 to 14 such that it expresses a Streptomyces narbonolide PKS and/or a Streptomyces desosaminyl transferase and/or a Streptomyces desosamine biosynthetic enzyme.
37. A host cell according to claim 36 wherein the Streptomyces is Streptomyces venezuelae.
38. A host cell according to claim 36 wherein the Streptomyces is Streptomyces narbonensis.
39. A host cell according to any one of claims 36 to 38 wherein the cell also expresses a beta-glucosidase enzyme.
40. A recombinant DNA compound according to any one of claims 1 to 14 or the expression vector of claim 15 substantially as hereinbefore described with reference to the examples and/or the drawings.
41. A recombinant host cell according to any one of claims 16 to 22 or a host cell according to any one of claims 36 to 39 substantially as hereinbefore described with reference to the examples and/or the drawings.
42. A method for increasing the yield of a desosaminylated polyketide in a cell according to claim 23 substantially as hereinbefore described with reference to the examples and/or the drawings.
43. A hybrid PKS according to any one of claims 24 to 27 substantially as hereinbefore described with reference to the examples and/or the drawings.
44. A method of producing a polyketide according to claim 28 or a polyketide according to claim 29 substantially as hereinbefore described with reference to the examples and/or the drawings.
45. A method of desosaminylating a polyketide compound according to any one of claims 30 to 35 substantially as hereinbefore described with reference to the examples and/or the drawings.
Dated this TWENTY EIGHTH day of APRIL 2003
KOSAN Biosciences, Inc.
Patent Attorneys for the Applicant: F B RICE CO
AU42137/99A 1998-05-28 1999-05-27 Recombinant narbonolide polyketide synthase Ceased AU762399C (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US8708098P 1998-05-28 1998-05-28
US60/087080 1998-05-28
US09/141908 1998-08-28
US09/141,908 US6503741B1 (en) 1998-05-28 1998-08-28 Polyketide synthase genes from Streptomyces venezuelae
US10088098P 1998-09-22 1998-09-22
US60/100880 1998-09-22
US11913999P 1999-02-08 1999-02-08
US60/119139 1999-02-08
PCT/US1999/011814 WO1999061599A2 (en) 1998-05-28 1999-05-27 Recombinant narbonolide polyketide synthase

Publications (3)

Publication Number Publication Date
AU4213799A AU4213799A (en) 1999-12-13
AU762399B2 AU762399B2 (en) 2003-06-26
AU762399C AU762399C (en) 2004-02-05

Family

ID=27492096

Family Applications (1)

Application Number Title Priority Date Filing Date
AU42137/99A Ceased AU762399C (en) 1998-05-28 1999-05-27 Recombinant narbonolide polyketide synthase

Country Status (6)

Country Link
EP (1) EP1082439A2 (en)
JP (1) JP2002516090A (en)
AU (1) AU762399C (en)
CA (1) CA2328427A1 (en)
NZ (1) NZ509006A (en)
WO (1) WO1999061599A2 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6265202B1 (en) 1998-06-26 2001-07-24 Regents Of The University Of Minnesota DNA encoding methymycin and pikromycin
EP1754700A3 (en) 1999-01-27 2007-05-09 Kosan Biosciences, Inc. Synthesis of oligoketides
US6524841B1 (en) 1999-10-08 2003-02-25 Kosan Biosciences, Inc. Recombinant megalomicin biosynthetic genes and uses thereof
US7033818B2 (en) 1999-10-08 2006-04-25 Kosan Biosciences, Inc. Recombinant polyketide synthase genes
US6838265B2 (en) 2000-05-02 2005-01-04 Kosan Biosciences, Inc. Overproduction hosts for biosynthesis of polyketides
WO2002059322A2 (en) * 2000-10-17 2002-08-01 Cubist Pharmaceuticals, Inc. Compositions and methods relating to the daptomycin biosynthetic gene cluster
WO2003078411A1 (en) 2002-03-12 2003-09-25 Bristol-Myers Squibb Company C3-cyano epothilone derivatives
DK2668284T3 (en) 2011-01-28 2014-12-15 Amyris Inc Screening of colony micro encapsulated in gel
MX2013013065A (en) 2011-05-13 2013-12-02 Amyris Inc Methods and compositions for detecting microbial production of water-immiscible compounds.
US10927382B2 (en) 2012-08-07 2021-02-23 Amyris, Inc. Methods for stabilizing production of acetyl-coenzyme a derived compounds
KR102159691B1 (en) 2013-03-15 2020-09-24 아미리스 인코퍼레이티드 Use of phosphoketolase and phosphotransacetylase for production of acetyl-coenzyme a derived compounds
PT3030662T (en) 2013-08-07 2020-03-25 Amyris Inc Methods for stabilizing production of acetyl-coenzyme a derived compounds
AU2016284696B2 (en) 2015-06-25 2021-10-28 Amyris, Inc. Maltose dependent degrons, maltose-responsive promoters, stabilization constructs, and their use in production of non-catabolic compounds

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5712146A (en) * 1993-09-20 1998-01-27 The Leland Stanford Junior University Recombinant combinatorial genetic library for the production of novel polyketides
JP4173551B2 (en) * 1996-07-05 2008-10-29 バイオティカ テクノロジー リミティド Novel erythromycin and method for producing the same

Also Published As

Publication number Publication date
WO1999061599A3 (en) 2000-01-27
AU762399C (en) 2004-02-05
NZ509006A (en) 2003-09-26
EP1082439A2 (en) 2001-03-14
JP2002516090A (en) 2002-06-04
CA2328427A1 (en) 1999-12-02
AU4213799A (en) 1999-12-13
WO1999061599A2 (en) 1999-12-02

Similar Documents

Publication Publication Date Title
US6509455B1 (en) Recombinant narbonolide polyketide synthase
US6902913B2 (en) Recombinant narbonolide polyketide synthase
AU762399B2 (en) Recombinant narbonolide polyketide synthase
Rowe et al. Engineering a polyketide with a longer chain by insertion of an extra module into the erythromycin-producing polyketide synthase
US6391594B1 (en) Modified modular PKS with retained scaffold
US6251636B1 (en) Recombinant oleandolide polyketide synthase
Tang et al. Formation of functional heterologous complexes using subunits from the picromycin, erythromycin and oleandomycin polyketide synthases
US6670168B1 (en) Recombinant Streptomyces hygroscopicus host cells that produce 17-desmethylrapamycin
US6303767B1 (en) Nucleic acids encoding narbonolide polyketide synthase enzymes from streptomyces narbonensis
US6399789B1 (en) Multi-plasmid method for preparing large libraries of polyketides and non-ribosomal peptides
WO2001083803A1 (en) Overproduction hosts for biosynthesis of polyketides
US6503741B1 (en) Polyketide synthase genes from Streptomyces venezuelae
US20060269528A1 (en) Production detection and use of transformant cells
EP1171583B1 (en) A multi-plasmid method for preparing large libraries of polyketides and non-ribosomal peptides
Boom Recent developments in the molecular genetics of the erythromycin-producing organism Saccharopolyspora erythraea
US20030148469A1 (en) Combinatorial polyketide libraries produced using a modular PKS gene cluster as scaffold
US20040072165A1 (en) Method for producing dna encoding polypeptides that are composed of several section, and for producing polypeptides by expressing the dna thus obtained
US20040209322A1 (en) Combinatorial polyketide libraries produced using a modular PKS gene cluster as scaffold
US20010024810A1 (en) Method to prepare macrolide analogs
AU5780501A (en) Combinatorial polyketide libraries produced using a modular PKS gene cluster as scaffold

Legal Events

Date Code Title Description
DA3 Amendments made section 104

Free format text: THE NATURE OF THE AMENDMENT IS: AMEND THE APPLICANTS NAME TO READ KOSAN BIOSCIENCES, INC.

DA2 Applications for amendment section 104

Free format text: THE NATURE OF THE PROPOSED AMENDMENT IS AS SHOWN IN THE STATEMENT(S) FILED 20030605

FGA Letters patent sealed or granted (standard patent)
DA3 Amendments made section 104

Free format text: THE NATURE OF THE AMENDMENT IS AS WAS NOTIFIED IN THE OFFICIAL JOURNAL DATED 20030710