MXPA00012342A - Genes for the biosynthesis of epothilones - Google Patents

Genes for the biosynthesis of epothilones

Info

Publication number
MXPA00012342A
Authority
MX
Mexico
Prior art keywords
seq
nucleotides
amino acids
leu
ala
Prior art date
Application number
MXPA/A/2000/012342A
Other languages
Spanish (es)
Inventor
Thomas Schupp
James Madison Ligon
Molnar Istvan
Ross Zirkle
Jorn Gorlach
Devon Cyr
Original Assignee
Novartis Ag
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Novartis Ag
Publication of MXPA00012342A

Abstract

Nucleic acid molecules that encode polypeptides necessary for the biosynthesis of epothilones are isolated from Sorangium cellulosum. Disclosed are methods for the production of epothilones in recombinant hosts transformed with the genes of the invention. In this manner, epothilones can be produced in quantities large enough to enable their purification and use in pharmaceutical formulations such as those for the treatment of cancer.

Description

GENES FOR THE BIOSYNTHESIS OF EPOTHILONES

FIELD OF THE INVENTION

The present invention relates generally to polyketides and genes for their synthesis. In particular, the present invention relates to the isolation and characterization of novel polyketide synthase and non-ribosomal peptide synthetase genes from Sorangium cellulosum that are necessary for the biosynthesis of epothilones A and B.
BACKGROUND OF THE INVENTION

Polyketides are compounds synthesized from two-carbon building blocks, the β-carbon of which always carries a keto group, hence the name polyketide. These compounds include many important antibiotics, immunosuppressants, cancer chemotherapeutic agents, and other compounds possessing a broad range of biological properties. Their tremendous structural diversity derives from the different lengths of the polyketide chain, the different side chains introduced (either as part of the two-carbon building blocks or after the polyketide backbone is formed), and the stereochemistry of these groups. The keto groups may also be reduced to hydroxyls or enoyls, or removed altogether. Each round of two-carbon addition is carried out by a complex of enzymes called the polyketide synthase (PKS), in a manner similar to fatty acid biosynthesis. The biosynthetic genes for an increasing number of polyketides have been isolated and sequenced. See, for example, U.S. Patent Nos. 5,639,949; 5,693,774; and 5,716,849, all of which are incorporated herein by reference, which describe genes for the biosynthesis of soraphen. See also Schupp et al., FEMS Microbiology Letters 159: 201-207 (1998), and International Publication No. WO 98/07868, which describe genes for the biosynthesis of rifamycin, and U.S. Patent No. 5,876,991, which describes genes for the biosynthesis of tylactone, all of which are incorporated herein by reference. The encoded proteins generally fall into two types: type I and type II.
Type I proteins are polyfunctional, with several catalytic domains that carry out different enzymatic steps covalently linked together (e.g., the PKSs for erythromycin, soraphen, rifamycin, and avermectin; MacNeil et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics (eds.: Baltz et al.), American Society for Microbiology, Washington D.C., pp. 245-256 (1993)), whereas type II proteins are monofunctional (Hutchinson et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics (eds.: Baltz et al.), American Society for Microbiology, Washington D.C., pp. 203-216 (1993)). For the simpler polyketides, such as actinorhodin (produced by Streptomyces coelicolor), the several rounds of two-carbon additions are carried out iteratively on PKS enzymes encoded by one set of PKS genes. In contrast, the synthesis of more complicated compounds, such as erythromycin and soraphen, involves PKS enzymes that are organized into modules, whereby each module carries out one round of two-carbon addition (for a review, see Hopwood et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics (eds.: Baltz et al.), American Society for Microbiology, Washington D.C., pp. 267-275 (1993)).

Complex polyketides and secondary metabolites in general may contain substructures that are derived from amino acids instead of simple carboxylic acids. These building blocks are incorporated by non-ribosomal peptide synthetases (NRPSs). NRPSs are multienzymes that are organized into modules. Each module is responsible for the addition (and the additional processing, if required) of one amino acid building block. NRPSs activate amino acids by forming aminoacyl-adenylates, and capture the activated amino acids on the thiol groups of phosphopantetheinyl prosthetic groups on peptidyl carrier protein domains.
In addition, NRPSs modify the amino acids by epimerization, N-methylation, or cyclization where necessary, and catalyze the formation of peptide bonds between the enzyme-bound amino acids. NRPSs are responsible for the biosynthesis of peptide secondary metabolites such as cyclosporin, may provide polyketide chain terminator units as in rapamycin, or may form mixed systems with PKSs as in the biosynthesis of yersiniabactin.

Epothilones A and B are 16-membered macrocyclic polyketides with an acylcysteine-derived starter unit that are produced by the bacterium Sorangium cellulosum strain So ce90 (Gerth et al., J. Antibiotics 49: 560-563 (1996), incorporated herein by reference). The structure of epothilones A and B, wherein R signifies hydrogen (epothilone A) or methyl (epothilone B), is:

(structural formula not reproduced)

The epothilones have a narrow antifungal spectrum and show especially high cytotoxicity in animal cell cultures (see Höfle et al., German Patent DE 4138042 (1993), incorporated herein by reference). Of significant importance, epothilones mimic the biological effects of taxol, both in vitro and in cultured cells (Bollag et al., Cancer Research 55: 2325-2333 (1995), incorporated herein by reference). Taxol and taxotere, which stabilize cellular microtubules, are cancer chemotherapeutic agents with significant activity against various human solid tumors (Rowinsky et al., J. Natl. Cancer Inst. 83: 1773-1781 (1991)). Competition studies have revealed that epothilones act as competitive inhibitors of taxol binding to microtubules, consistent with the interpretation that they share the same microtubule-binding site and possess a microtubule affinity similar to that of taxol. However, epothilones enjoy a significant advantage over taxol in that they exhibit a much smaller drop in potency than taxol against a multidrug-resistant cell line (Bollag et al. (1995)).
Furthermore, epothilones are exported from cells by P-glycoprotein considerably less efficiently than taxol (Gerth et al. (1996)). In addition, several epothilone analogs have been synthesized that have superior cytotoxic activity compared to epothilone A or epothilone B, as demonstrated by their enhanced ability to induce the polymerization and stabilization of microtubules (International Publication No. WO 98/25929, incorporated herein by reference).

Despite the promise shown by the epothilones as anticancer agents, problems pertaining to the production of these compounds currently limit their commercial potential. The compounds are too complex for chemical synthesis on an industrial scale and must therefore be produced by fermentation. Techniques for the genetic manipulation of myxobacteria such as Sorangium cellulosum are described in U.S. Patent No. 5,686,295, incorporated herein by reference. However, Sorangium cellulosum is notoriously difficult to ferment, and production levels of epothilones are consequently low. Recombinant production of epothilones in heterologous hosts that are more amenable to fermentation could solve the current production problems. However, the genes encoding the polypeptides responsible for epothilone biosynthesis have not been isolated thus far. Furthermore, the strain that produces epothilones, i.e., So ce90, also produces at least one additional polyketide, spirangien, which would be expected to greatly complicate the isolation of the genes particularly responsible for epothilone biosynthesis.

Accordingly, in view of the above, an objective of the present invention is to isolate the genes that are involved in the synthesis of epothilones, particularly the genes that are involved in the synthesis of epothilones A and B in myxobacteria of the Sorangium/Polyangium group, i.e., Sorangium cellulosum strain So ce90.
A further objective of the invention is to provide a method for the recombinant production of epothilones, for application in anti-cancer formulations.
SUMMARY OF THE INVENTION

In pursuit of the aforementioned and other objectives, the present invention unexpectedly overcomes the difficulties set forth above to provide, for the first time, a nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of epothilone. In a preferred embodiment, the nucleotide sequence is isolated from a species belonging to the Myxobacteria, more preferably Sorangium cellulosum.

In another preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein said polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: SEQ ID NO: 2, amino acids 11-437 of SEQ ID NO: 2, amino acids 543-864 of SEQ ID NO: 2, amino acids 974-1273 of SEQ ID NO: 2, amino acids 1314-1385 of SEQ ID NO: 2, SEQ ID NO: 3, amino acids 72-81 of SEQ ID NO: 3, amino acids 118-125 of SEQ ID NO: 3, amino acids 199-212 of SEQ ID NO: 3, amino acids 353-363 of SEQ ID NO: 3, amino acids 549-565 of SEQ ID NO: 3, amino acids 588-603 of SEQ ID NO: 3, amino acids 669-684 of SEQ ID NO: 3, amino acids 815-821 of SEQ ID NO: 3, amino acids 868-892 of SEQ ID NO: 3, amino acids 903-912 of SEQ ID NO: 3, amino acids 918-940 of SEQ ID NO: 3, amino acids 1268-1274 of SEQ ID NO: 3, amino acids 1285-1297 of SEQ ID NO: 3, amino acids 973-1256 of SEQ ID NO: 3, amino acids 1344-1351 of SEQ ID NO: 3, SEQ ID NO: 4, amino acids 7-432 of SEQ ID NO: 4, amino acids 539-859 of SEQ ID NO: 4, amino acids 869-1037 of SEQ ID NO: 4, amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1722-1792 of SEQ ID NO: 4, SEQ ID NO: 5, amino acids 39-457 of SEQ ID NO: 5, amino acids 563-884 of SEQ ID NO: 5, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, SEQ ID NO: 6, amino acids 35-454 of SEQ ID NO: 6, amino acids 561-881 of SEQ ID NO: 6, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, amino acids 2383-2551 of SEQ ID NO: 6, amino acids 2671-3045 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, SEQ ID NO: 7, amino acids 32-450 of SEQ ID NO: 7, amino acids 556-877 of SEQ ID NO: 7, amino acids 887-1051 of SEQ ID NO: 7, amino acids 1478-1790 of SEQ ID NO: 7, amino acids 1810-2055 of SEQ ID NO: 7, amino acids 2093-2164 of SEQ ID NO: 7, amino acids 2165-2439 of SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, and SEQ ID NO: 22.
In a more preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein said polypeptide comprises an amino acid sequence selected from the group consisting of: SEQ ID NO: 2, amino acids 11-437 of SEQ ID NO: 2, amino acids 543-864 of SEQ ID NO: 2, amino acids 974-1273 of SEQ ID NO: 2, amino acids 1314-1385 of SEQ ID NO: 2, SEQ ID NO: 3, amino acids 72-81 of SEQ ID NO: 3, amino acids 118-125 of SEQ ID NO: 3, amino acids 199-212 of SEQ ID NO: 3, amino acids 353-363 of SEQ ID NO: 3, amino acids 549-565 of SEQ ID NO: 3, amino acids 588-603 of SEQ ID NO: 3, amino acids 669-684 of SEQ ID NO: 3, amino acids 815-821 of SEQ ID NO: 3, amino acids 868-892 of SEQ ID NO: 3, amino acids 903-912 of SEQ ID NO: 3, amino acids 918-940 of SEQ ID NO: 3, amino acids 1268-1274 of SEQ ID NO: 3, amino acids 1285-1297 of SEQ ID NO: 3, amino acids 973-1256 of SEQ ID NO: 3, amino acids 1344-1351 of SEQ ID NO: 3, SEQ ID NO: 4, amino acids 7-432 of SEQ ID NO: 4, amino acids 539-859 of SEQ ID NO: 4, amino acids 869-1037 of SEQ ID NO: 4, amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1722-1792 of SEQ ID NO: 4, SEQ ID NO: 5, amino acids 39-457 of SEQ ID NO: 5, amino acids 563-884 of SEQ ID NO: 5, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, SEQ ID NO: 6, amino acids 35-454 of SEQ ID NO: 6, amino acids 561-881 of SEQ ID NO: 6, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, amino acids 2383-2551 of SEQ ID NO: 6, amino acids 2671-3045 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, SEQ ID NO: 7, amino acids 32-450 of SEQ ID NO: 7, amino acids 556-877 of SEQ ID NO: 7, amino acids 887-1051 of SEQ ID NO: 7, amino acids 1478-1790 of SEQ ID NO: 7, amino acids 1810-2055 of SEQ ID NO: 7, amino acids 2093-2164 of SEQ ID NO: 7, amino acids 2165-2439 of SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, and SEQ ID NO: 22.
In still another preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein this nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO: 1, nucleotides 3415-5556 of SEQ ID NO: 1, nucleotides 7610-11875 of SEQ ID NO: 1, nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, nucleotides 15901-15924 of SEQ ID NO: 1, nucleotides 16251-21749 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 21746-43519 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 43524-54920 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, nucleotides 51534-52657 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, nucleotides 54935-62254 of SEQ ID NO: 1, nucleotides 55028-56284 of SEQ ID NO: 1, nucleotides 56600-57565 of SEQ ID NO: 1, nucleotides 57593-58087 of SEQ ID NO: 1, nucleotides 59366-60304 of SEQ ID NO: 1, nucleotides 60362-61099 of SEQ ID NO: 1, nucleotides 61211-61426 of SEQ ID NO: 1, nucleotides 61427-62254 of SEQ ID NO: 1, nucleotides 62369-63628 of SEQ ID NO: 1, nucleotides 67334-68251 of SEQ ID NO: 1, and nucleotides 1-68750 of SEQ ID NO: 1.
In an especially preferred embodiment, the present invention provides a nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein this nucleotide sequence is selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO: 1, nucleotides 3415-5556 of SEQ ID NO: 1, nucleotides 7610-11875 of SEQ ID NO: 1, nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, nucleotides 15901-15924 of SEQ ID NO: 1, nucleotides 16251-21749 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 21746-43519 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 43524-54920 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, nucleotides 51534-52657 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, nucleotides 54935-62254 of SEQ ID NO: 1, nucleotides 55028-56284 of SEQ ID NO: 1, nucleotides 56600-57565 of SEQ ID NO: 1, nucleotides 57593-58087 of SEQ ID NO: 1, nucleotides 59366-60304 of SEQ ID NO: 1, nucleotides 60362-61099 of SEQ ID NO: 1, nucleotides 61211-61426 of SEQ ID NO: 1, nucleotides 61427-62254 of SEQ ID NO: 1, nucleotides 62369-63628 of SEQ ID NO: 1, nucleotides 67334-68251 of SEQ ID NO: 1, and nucleotides 1-68750 of SEQ ID NO: 1.
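The regions listed above are given as coordinate pairs on SEQ ID NO: 1, and one of them is given as "the complement of" a coordinate pair. As a minimal sketch of how such a region might be pulled out of a sequence string, the Python below assumes that the coordinates are 1-based and inclusive and that "the complement of" refers to the reverse complement, i.e., the same span read on the opposite strand. Both are assumptions about convention rather than statements from the text, and the function names and the toy sequence are hypothetical.

```python
# Illustrative helpers only; the names and the toy sequence are hypothetical.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def region(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end, assuming 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def complement_region(seq: str, start: int, end: int) -> str:
    """Same span on the opposite strand (assumed meaning of 'the complement of')."""
    return reverse_complement(region(seq, start, end))


toy = "ATGCCGTAAGCT"  # 12-nt stand-in; the real SEQ ID NO: 1 is not reproduced here
print(region(toy, 2, 4))             # TGC
print(complement_region(toy, 2, 4))  # GCA
print(3171 - 1900 + 1)               # 1272 nucleotides in the span 1900-3171
```

Under the inclusive convention, the span 1900-3171 is 1272 nucleotides long, which is a whole number of codons (424).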
In yet another preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein said nucleotide sequence comprises a nucleotide portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs identical in sequence to a respective portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs of a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO: 1, nucleotides 3415-5556 of SEQ ID NO: 1, nucleotides 7610-11875 of SEQ ID NO: 1, nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, nucleotides 15901-15924 of SEQ ID NO: 1, nucleotides 16251-21749 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 21746-43519 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 43524-54920 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, nucleotides 51534-52657 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, nucleotides 54935-62254 of SEQ ID NO: 1, nucleotides 55028-56284 of SEQ ID NO: 1, nucleotides 56600-57565 of SEQ ID NO: 1, nucleotides 57593-58087 of SEQ ID NO: 1, nucleotides 59366-60304 of SEQ ID NO: 1, nucleotides 60362-61099 of SEQ ID NO: 1, nucleotides 61211-61426 of SEQ ID NO: 1, nucleotides 61427-62254 of SEQ ID NO: 1, nucleotides 62369-63628 of SEQ ID NO: 1, nucleotides 67334-68251 of SEQ ID NO: 1, and nucleotides 1-68750 of SEQ ID NO: 1.

The present invention also provides a chimeric gene comprising a heterologous promoter sequence operably linked to a nucleic acid molecule of the invention. In addition, the present invention provides a recombinant vector comprising this chimeric gene, wherein the vector can be stably transformed into a host cell.
Still further, the present invention provides a recombinant host cell comprising this chimeric gene, wherein the host cell can express the nucleotide sequence that encodes at least one polypeptide necessary for the biosynthesis of an epothilone. In a preferred embodiment, the recombinant host cell is a bacterium belonging to the order Actinomycetales, and in a more preferred embodiment, the recombinant host cell is a strain of Streptomyces. In other embodiments, the recombinant host cell is any other bacterium amenable to fermentation, such as a pseudomonad or E. coli. Still further, the present invention provides a BAC clone comprising a nucleic acid molecule of the invention, preferably BAC clone pEPO15.

In another aspect, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes an epothilone synthase domain. According to one embodiment, the epothilone synthase domain is a β-ketoacyl-synthase (KS) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO: 2, amino acids 7-432 of SEQ ID NO: 4, amino acids 39-457 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 35-454 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, and amino acids 32-450 of SEQ ID NO: 7. According to this embodiment, this KS domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO: 2, amino acids 7-432 of SEQ ID NO: 4, amino acids 39-457 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 35-454 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, and amino acids 32-450 of SEQ ID NO: 7.
Also, according to this embodiment, this nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, and nucleotides 55028-56284 of SEQ ID NO: 1. According to this embodiment, this nucleotide sequence more preferably comprises a nucleotide portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs identical in sequence to a respective portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, and nucleotides 55028-56284 of SEQ ID NO: 1. In addition, according to this embodiment, this nucleotide sequence is most preferably selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, and nucleotides 55028-56284 of SEQ ID NO: 1.
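The criterion of a portion of 20 (or 25, 30, 35, 40, 45, or 50) consecutive base pairs identical in sequence to a respective portion of a listed region can be read as an exact shared-window test. The sketch below is one hypothetical way to check it for two plain nucleotide strings. The function name, the toy reference, and the choice to compare a single strand only, ignoring case, are all assumptions made for illustration and are not taken from the text.

```python
def shares_identical_window(query: str, reference: str, k: int = 20) -> bool:
    """True if query and reference share at least one identical run of k bases.

    Only the given strand is compared; a fuller check might also test the
    reverse complement of the query (not done in this sketch).
    """
    query, reference = query.upper(), reference.upper()
    if k < 1 or len(query) < k or len(reference) < k:
        return False
    # Index every k-base window of the reference once, then scan the query.
    windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(query[i:i + k] in windows for i in range(len(query) - k + 1))


# Hypothetical 40-nt stand-in for one listed region (not real patent sequence).
ref = "ATGGCGTACCTGAAGCTTGACCGTTAGCATCGGATCCAAT"
print(shares_identical_window("GG" + ref[10:30] + "TT", ref))  # True
print(shares_identical_window(ref[10:29], ref))                # False (only 19 bases)
print(shares_identical_window(ref[10:29], ref, k=25))          # False
```

Building the set of reference windows first keeps the scan roughly linear in the combined sequence length, which matters for a reference the size of the regions listed above.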
According to another embodiment, the epothilone synthase domain is an acyltransferase (AT) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO: 2, amino acids 539-859 of SEQ ID NO: 4, amino acids 563-884 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 561-881 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, and amino acids 556-877 of SEQ ID NO: 7. According to this embodiment, this AT domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO: 2, amino acids 539-859 of SEQ ID NO: 4, amino acids 563-884 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 561-881 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, and amino acids 556-877 of SEQ ID NO: 7. Also, according to this embodiment, this nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, and nucleotides 56600-57565 of SEQ ID NO: 1.
According to this embodiment, this nucleotide sequence more preferably comprises a nucleotide portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs identical in sequence to a respective portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, and nucleotides 56600-57565 of SEQ ID NO: 1. In addition, according to this embodiment, this nucleotide sequence is most preferably selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, and nucleotides 56600-57565 of SEQ ID NO: 1.

According to still another embodiment, the epothilone synthase domain is an enoyl reductase (ER) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO: 2, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, and amino acids 1478-1790 of SEQ ID NO: 7. According to this embodiment, this ER domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO: 2, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, and amino acids 1478-1790 of SEQ ID NO: 7.
Also, according to this embodiment, this nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, and nucleotides 59366-60304 of SEQ ID NO: 1. According to this embodiment, this nucleotide sequence more preferably comprises a portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs identical in sequence to a respective portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, and nucleotides 59366-60304 of SEQ ID NO: 1. In addition, according to this embodiment, this nucleotide sequence is more preferably selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, and nucleotides 59366-60304 of SEQ ID NO: 1. According to another embodiment, the epothilone synthase domain is an acyl carrier protein (ACP) domain, wherein this polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO: 2, amino acids 1722-1792 of SEQ ID NO: 4, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, and amino acids 2093-2164 of SEQ ID NO: 7.
According to this embodiment, this ACP domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO: 2, amino acids 1722-1792 of SEQ ID NO: 4, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, and amino acids 2093-2164 of SEQ ID NO: 7. Also, according to this embodiment, this nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, and nucleotides 61211-61426 of SEQ ID NO: 1. According to this embodiment, this nucleotide sequence more preferably comprises a portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs identical in sequence to a respective portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, and nucleotides 61211-61426 of SEQ ID NO: 1.
In addition, according to this embodiment, this nucleotide sequence is more preferably selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, and nucleotides 61211-61426 of SEQ ID NO: 1. According to another embodiment, the epothilone synthase domain is a dehydratase (DH) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO: 4, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 2383-2551 of SEQ ID NO: 6, and amino acids 887-1051 of SEQ ID NO: 7. According to this embodiment, this DH domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO: 4, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 2383-2551 of SEQ ID NO: 6, and amino acids 887-1051 of SEQ ID NO: 7. Also, according to this embodiment, this nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, and nucleotides 57593-58087 of SEQ ID NO: 1.
According to this embodiment, this nucleotide sequence more preferably comprises a portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs identical in sequence to a respective portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, and nucleotides 57593-58087 of SEQ ID NO: 1. In addition, according to this embodiment, this nucleotide sequence is more preferably selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, and nucleotides 57593-58087 of SEQ ID NO: 1. According to still another embodiment, the epothilone synthase domain is a β-ketoreductase (KR) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, and amino acids 1810-2055 of SEQ ID NO: 7. According to this embodiment, this KR domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, and amino acids 1810-2055 of SEQ ID NO: 7.
Also, according to this embodiment, this nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, and nucleotides 60362-61099 of SEQ ID NO: 1. According to this embodiment, this nucleotide sequence more preferably comprises a portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs identical in sequence to a respective portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, and nucleotides 60362-61099 of SEQ ID NO: 1. In addition, according to this embodiment, this nucleotide sequence is more preferably selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, and nucleotides 60362-61099 of SEQ ID NO: 1. According to a further embodiment, the epothilone synthase domain is a methyltransferase (MT) domain comprising an amino acid sequence substantially similar to amino acids 2671-3045 of SEQ ID NO: 6. According to this embodiment, this MT domain preferably comprises amino acids 2671-3045 of SEQ ID NO: 6.
Also, according to this embodiment, this nucleotide sequence is preferably substantially similar to nucleotides 51534-52657 of SEQ ID NO: 1. According to this embodiment, this nucleotide sequence more preferably comprises a portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs identical in sequence to a respective portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs of nucleotides 51534-52657 of SEQ ID NO: 1. In addition, according to this embodiment, this nucleotide sequence is more preferably nucleotides 51534-52657 of SEQ ID NO: 1. According to another embodiment, the epothilone synthase domain is a thioesterase (TE) domain comprising an amino acid sequence substantially similar to amino acids 2165-2439 of SEQ ID NO: 7. According to this embodiment, this TE domain preferably comprises amino acids 2165-2439 of SEQ ID NO: 7. Also, according to this embodiment, this nucleotide sequence is preferably substantially similar to nucleotides 61427-62254 of SEQ ID NO: 1. According to this embodiment, this nucleotide sequence more preferably comprises a portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs identical in sequence to a respective portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs of nucleotides 61427-62254 of SEQ ID NO: 1. In addition, according to this embodiment, this nucleotide sequence is more preferably nucleotides 61427-62254 of SEQ ID NO: 1.
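The recurring criterion above, a portion of 20 to 50 consecutive base pairs identical to a respective portion of a reference region, can be illustrated with a short Python sketch. The function name and the toy sequences are hypothetical and are not part of the disclosure; the sketch compares one strand only and assumes plain A/C/G/T strings.

```python
def shares_identical_window(candidate: str, reference: str, window: int = 20) -> bool:
    """Return True if `candidate` contains a run of `window` consecutive bases
    that is identical to some run of `window` consecutive bases in `reference`.

    Illustrative helper only: same-strand comparison, case-insensitive.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    candidate = candidate.upper()
    reference = reference.upper()
    if len(candidate) < window or len(reference) < window:
        return False
    # Index every window-length substring of the reference for fast lookup.
    reference_windows = {
        reference[i:i + window] for i in range(len(reference) - window + 1)
    }
    # Slide the same window along the candidate and look for any exact hit.
    return any(
        candidate[i:i + window] in reference_windows
        for i in range(len(candidate) - window + 1)
    )


# Toy data (hypothetical, not taken from SEQ ID NO: 1).
toy_reference = "ATGGCCAAGCTTGGATCCGAATTCCTGCAGGTCGACTCTAGA"
toy_candidate = "NNNN" + toy_reference[10:32] + "NNNN"  # 22 shared bases
```

With the toy data, a 20-base window is found, whereas a 25-base window is not, because only 22 consecutive bases are shared.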
In still another aspect, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes a non-ribosomal peptide synthetase, wherein this non-ribosomal peptide synthetase comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: SEQ ID NO: 3, amino acids 72-81 of SEQ ID NO: 3, amino acids 118-125 of SEQ ID NO: 3, amino acids 199-212 of SEQ ID NO: 3, amino acids 353-363 of SEQ ID NO: 3, amino acids 549-565 of SEQ ID NO: 3, amino acids 588-603 of SEQ ID NO: 3, amino acids 669-684 of SEQ ID NO: 3, amino acids 815-821 of SEQ ID NO: 3, amino acids 868-892 of SEQ ID NO: 3, amino acids 903-912 of SEQ ID NO: 3, amino acids 918-940 of SEQ ID NO: 3, amino acids 1268-1274 of SEQ ID NO: 3, amino acids 1285-1297 of SEQ ID NO: 3, amino acids 973-1256 of SEQ ID NO: 3, and amino acids 1344-1351 of SEQ ID NO: 3. According to this embodiment, this non-ribosomal peptide synthetase preferably comprises an amino acid sequence selected from the group consisting of: SEQ ID NO: 3, amino acids 72-81 of SEQ ID NO: 3, amino acids 118-125 of SEQ ID NO: 3, amino acids 199-212 of SEQ ID NO: 3, amino acids 353-363 of SEQ ID NO: 3, amino acids 549-565 of SEQ ID NO: 3, amino acids 588-603 of SEQ ID NO: 3, amino acids 669-684 of SEQ ID NO: 3, amino acids 815-821 of SEQ ID NO: 3, amino acids 868-892 of SEQ ID NO: 3, amino acids 903-912 of SEQ ID NO: 3, amino acids 918-940 of SEQ ID NO: 3, amino acids 1268-1274 of SEQ ID NO: 3, amino acids 1285-1297 of SEQ ID NO: 3, amino acids 973-1256 of SEQ ID NO: 3, and amino acids 1344-1351 of SEQ ID NO: 3.
Also, according to this embodiment, this nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, and nucleotides 15901-15924 of SEQ ID NO: 1. According to this embodiment, this nucleotide sequence more preferably comprises a portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs identical in sequence to a respective portion of 20, 25, 30, 35, 40, 45, or 50 (preferably 20) consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, and nucleotides 15901-15924 of SEQ ID NO: 1.
In addition, according to this embodiment, this nucleotide sequence is more preferably selected from the group consisting of: nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, and nucleotides 15901-15924 of SEQ ID NO: 1. The present invention further provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOS: 2-23. According to another aspect, the present invention also provides methods for the recombinant production of polyketides, such as epothilones, in quantities large enough to enable their purification and use in pharmaceutical formulations, such as those for the treatment of cancer. A specific advantage of these production methods is the chirality of the molecules produced; production in transgenic organisms avoids the generation of racemic mixtures, within which some enantiomers may have reduced activity.
In particular, the present invention provides a method for the heterologous expression of epothilone in a recombinant host, which comprises: (a) introducing into a host a chimeric gene comprising a heterologous promoter sequence operably linked to a nucleic acid molecule of the invention comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of epothilone; and (b) growing the host under conditions that allow the biosynthesis of epothilone in the host. The present invention also provides a method for producing epothilone, which comprises: (a) expressing epothilone in a recombinant host by the aforementioned method; and (b) extracting the epothilone from the recombinant host. According to yet another aspect, the present invention provides an isolated polypeptide comprising an amino acid sequence that consists of an epothilone synthase domain. According to one embodiment, the epothilone synthase domain is a β-ketoacyl synthase (KS) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO: 2, amino acids 7-432 of SEQ ID NO: 4, amino acids 39-457 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 35-454 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, and amino acids 32-450 of SEQ ID NO: 7. According to this embodiment, this KS domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO: 2, amino acids 7-432 of SEQ ID NO: 4, amino acids 39-457 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 35-454 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, and amino acids 32-450 of SEQ ID NO: 7. According to another embodiment, the epothilone synthase domain is an acyltransferase (AT) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO: 2, amino acids 539-859 of SEQ ID NO: 4, amino acids 563-884 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 561-881 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, and amino acids 556-877 of SEQ ID NO: 7. According to this embodiment, this AT domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO: 2, amino acids 539-859 of SEQ ID NO: 4, amino acids 563-884 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 561-881 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, and amino acids 556-877 of SEQ ID NO: 7. According to yet another embodiment, the epothilone synthase domain is an enoyl reductase (ER) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO: 2, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, and amino acids 1478-1790 of SEQ ID NO: 7. According to this embodiment, this ER domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO: 2, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, and amino acids 1478-1790 of SEQ ID NO: 7.
According to another embodiment, the epothilone synthase domain is an acyl carrier protein (ACP) domain, wherein the polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO: 2, amino acids 1722-1792 of SEQ ID NO: 4, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, and amino acids 2093-2164 of SEQ ID NO: 7. According to this embodiment, this ACP domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO: 2, amino acids 1722-1792 of SEQ ID NO: 4, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, and amino acids 2093-2164 of SEQ ID NO: 7. According to another embodiment, the epothilone synthase domain is a dehydratase (DH) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO: 4, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 2383-2551 of SEQ ID NO: 6, and amino acids 887-1051 of SEQ ID NO: 7. According to this embodiment, this DH domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO: 4, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 2383-2551 of SEQ ID NO: 6, and amino acids 887-1051 of SEQ ID NO: 7.
According to still another embodiment, the epothilone synthase domain is a β-ketoreductase (KR) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, and amino acids 1810-2055 of SEQ ID NO: 7. According to this embodiment, this KR domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, and amino acids 1810-2055 of SEQ ID NO: 7. According to a further embodiment, the epothilone synthase domain is a methyltransferase (MT) domain comprising an amino acid sequence substantially similar to amino acids 2671-3045 of SEQ ID NO: 6. According to this embodiment, this MT domain preferably comprises amino acids 2671-3045 of SEQ ID NO: 6. According to another embodiment, the epothilone synthase domain is a thioesterase (TE) domain comprising an amino acid sequence substantially similar to amino acids 2165-2439 of SEQ ID NO: 7. According to this embodiment, this TE domain preferably comprises amino acids 2165-2439 of SEQ ID NO: 7. Other aspects and advantages of the present invention will become apparent to those skilled in the art from a study of the following description of the invention and of the non-limiting examples.
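The amino acid ranges and nucleotide ranges recited for each domain are related by simple codon arithmetic. The following Python sketch illustrates that relationship for genes on the forward strand; the function name is hypothetical, and the gene start coordinates used in the usage note (7610 for epoA, 11872 for epoP) are those given in the description of SEQ ID NOS: 2 and 3.

```python
from typing import Tuple


def residues_to_nucleotides(
    gene_start: int, first_residue: int, last_residue: int
) -> Tuple[int, int]:
    """Map a 1-based, inclusive amino acid range to the 1-based, inclusive
    nucleotide range that encodes it.

    Assumes a forward-strand gene whose first codon begins at `gene_start`
    and contains no introns (illustrative helper, not part of the disclosure).
    """
    if not 1 <= first_residue <= last_residue:
        raise ValueError("expected 1 <= first_residue <= last_residue")
    # Each residue occupies three nucleotides; residue 1 starts at gene_start.
    start = gene_start + 3 * (first_residue - 1)
    end = gene_start + 3 * last_residue - 1
    return start, end
```

For example, `residues_to_nucleotides(7610, 543, 864)` gives (9236, 10201), matching the AT domain of SEQ ID NO: 2, and `residues_to_nucleotides(11872, 72, 81)` gives (12085, 12114), matching the first recited motif of SEQ ID NO: 3. Not every recited pair agrees exactly with this arithmetic, so the sketch is a consistency aid rather than a restatement of the recited coordinates.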
DEFINITIONS In describing the present invention, the following terms are used and are intended to be defined as indicated below.
Associated With / Operably Linked: Refers to two DNA sequences that are related physically or functionally. For example, a promoter or regulatory DNA sequence is said to be "associated with" a DNA sequence that codes for an RNA or a protein if the two sequences are operably linked, or situated such that the regulatory DNA sequence affects the expression level of the coding or structural DNA sequence.
Chimeric Gene: A recombinant DNA sequence in which a promoter or regulatory DNA sequence is operably linked to, or associated with, a DNA sequence that codes for an mRNA or that is expressed as a protein, such that the regulatory DNA sequence is able to regulate transcription or expression of the associated DNA sequence. The regulatory DNA sequence of the chimeric gene is not normally operably linked to the associated DNA sequence as found in nature.
Coding DNA Sequence: A DNA sequence that is translated in an organism to produce a protein.
Domain: That portion of a polyketide synthase necessary for a given distinct activity. Examples include the acyl carrier protein (ACP), β-ketosynthase (KS), acyltransferase (AT), β-ketoreductase (KR), dehydratase (DH), enoyl reductase (ER), and thioesterase (TE) domains.
Epothilones: 16-membered macrocyclic polyketides naturally produced by the bacterium Sorangium cellulosum strain So ce90, which mimic the biological effects of taxol. In this application, "epothilone" refers to the class of polyketides that includes epothilone A and epothilone B, as well as analogs thereof such as those described in International Publication Number WO 98/25929.
Epothilone Synthase: A polyketide synthase responsible for the biosynthesis of epothilone.
Gene: A defined region that is located within a genome and that, in addition to the aforementioned coding DNA sequence, comprises other, primarily regulatory, DNA sequences responsible for the control of expression, i.e., transcription and translation of the coding portion.
Heterologous DNA Sequence: A DNA sequence not naturally associated with a host cell into which it is introduced, including non-naturally occurring multiple copies of a naturally occurring DNA sequence.
Homologous DNA Sequence: A DNA sequence naturally associated with a host cell into which it is introduced.
Homologous Recombination: The reciprocal exchange of DNA fragments between homologous DNA molecules.
Isolated: In the context of the present invention, an isolated nucleic acid molecule or an isolated enzyme is a nucleic acid molecule or enzyme that, by the hand of man, exists apart from its native environment and is therefore not a product of nature. An isolated nucleic acid molecule or enzyme may exist in a purified form, or may exist in a non-native environment such as, for example, a recombinant host cell.
Module: A genetic element that encodes all the distinct activities required in a single round of polyketide biosynthesis, that is, one condensation step and all the β-carbonyl processing steps associated with it.
Each module encodes an ACP, a KS, and an AT activity to accomplish the condensation portion of the biosynthesis, and selected post-condensation activities to effect the β-carbonyl processing.
NRPS: A non-ribosomal polypeptide synthetase, which is a complex of enzymatic activities responsible for the incorporation of amino acids into secondary metabolites, including, for example, amino acid adenylation, epimerization, N-methylation, cyclization, peptidyl carrier protein, and condensation domains. A functional NRPS is one that catalyzes the incorporation of an amino acid into a secondary metabolite.
NRPS Gene: One or more genes encoding NRPSs for producing functional secondary metabolites, for example epothilones A and B, when under the direction of one or more compatible control elements.
Nucleic Acid Molecule: A linear segment of single-stranded or double-stranded DNA or RNA that can be isolated from any source. In the context of the present invention, the nucleic acid molecule is preferably a segment of DNA.
ORF: Open reading frame.
PKS: A polyketide synthase, which is a complex of enzymatic activities (domains) responsible for the biosynthesis of polyketides, including, for example, ketoreductase, dehydratase, acyl carrier protein, enoyl reductase, ketoacyl-ACP synthase, and acyltransferase. A functional PKS is one that catalyzes the synthesis of a polyketide.
PKS Genes: One or more genes encoding the various polypeptides required for producing functional polyketides, for example epothilones A and B, when under the direction of one or more compatible control elements.
Substantially Similar: With respect to nucleic acids, a nucleic acid molecule that has at least 60 percent sequence identity with a reference nucleic acid molecule.
In a preferred embodiment, a substantially similar DNA sequence is at least 80 percent identical to a reference DNA sequence; in a more preferred embodiment, a substantially similar DNA sequence is at least 90 percent identical to a reference DNA sequence; and in a most preferred embodiment, a substantially similar DNA sequence is at least 95 percent identical to a reference DNA sequence. A substantially similar DNA sequence preferably encodes a protein or peptide having substantially the same activity as the protein or peptide encoded by the reference DNA sequence. A substantially similar nucleotide sequence typically hybridizes to a reference nucleic acid molecule, or fragments thereof, under the following conditions: hybridization in 7 percent sodium dodecyl sulfate (SDS), 0.5 M NaPO4, pH 7.0, 1 mM EDTA at 50 °C; wash with 2X SSC, 1 percent SDS, at 50 °C. With respect to proteins or peptides, a substantially similar amino acid sequence is an amino acid sequence that is at least 90 percent identical to the amino acid sequence of a reference protein or peptide and that has substantially the same activity as the reference protein or peptide.
Transformation: A process for introducing heterologous nucleic acid into a host cell or organism.
Transformed / Transgenic / Recombinant: Refers to a host organism, such as a bacterium, into which a heterologous nucleic acid molecule has been introduced. The nucleic acid molecule can be stably integrated into the genome of the host, or the nucleic acid molecule can be present as an extrachromosomal molecule. Such an extrachromosomal molecule can be self-replicating. Transformed cells, tissues, or plants are understood to encompass not only the end product of a transformation process, but also transgenic progeny thereof. A "non-transformed", "non-transgenic", or "non-recombinant" host refers to a wild-type organism, e.g. a bacterium, that does not contain the heterologous nucleic acid molecule.
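The identity thresholds in the definition of "substantially similar" (60, 80, 90, and 95 percent for DNA) can be expressed as a small Python sketch. The definition does not state how percent identity is to be computed, so the helper below makes an explicit assumption: the two sequences have already been aligned to equal length by some external method, and gap positions ("-") count as mismatches. Function names and tier labels are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same base.

    Assumption (not from the source text): inputs are pre-aligned, equal-length
    strings, and any column containing a gap character '-' is a mismatch.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def dna_similarity_tier(identity_percent: float) -> str:
    """Classify a DNA percent identity against the thresholds in the definition."""
    if identity_percent >= 95:
        return "most preferred"
    if identity_percent >= 90:
        return "more preferred"
    if identity_percent >= 80:
        return "preferred"
    if identity_percent >= 60:
        return "substantially similar"
    return "not substantially similar"
```

Under these assumptions, two aligned 10-base sequences that differ at one position are 90 percent identical and fall in the "more preferred" tier. The hybridization criterion and the requirement of substantially the same activity are experimental and are not modeled here.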
Nucleotides are indicated by their bases using the following standard abbreviations: adenine (A), cytosine (C), thymine (T), and guanine (G). Amino acids are likewise indicated by the following standard abbreviations: alanine (Ala; A), arginine (Arg; R), asparagine (Asn; N), aspartic acid (Asp; D), cysteine (Cys; C), glutamine (Gln; Q), glutamic acid (Glu; E), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V). In addition, (Xaa; X) represents any amino acid.
DESCRIPTION OF THE SEQUENCES IN THE SEQUENCE LISTING
SEQ ID NO: 1 is the nucleotide sequence of a 68,750 base pair contig containing 22 open reading frames (ORFs), which comprises the epothilone biosynthesis genes.
SEQ ID NO: 2 is the protein sequence of a type I polyketide synthase (EPOS A) encoded by epoA (nucleotides 7610-11875 of SEQ ID NO: 1).
SEQ ID NO: 3 is the protein sequence of a non-ribosomal peptide synthetase (EPOS P) encoded by epoP (nucleotides 11872-16104 of SEQ ID NO: 1).
SEQ ID NO: 4 is the protein sequence of a type I polyketide synthase (EPOS B) encoded by epoB (nucleotides 16251-21749 of SEQ ID NO: 1).
SEQ ID NO: 5 is the protein sequence of a type I polyketide synthase (EPOS C) encoded by epoC (nucleotides 21746-43519 of SEQ ID NO: 1).
SEQ ID NO: 6 is the protein sequence of a type I polyketide synthase (EPOS D) encoded by epoD (nucleotides 43524-54920 of SEQ ID NO: 1).
SEQ ID NO: 7 is the protein sequence of a type I polyketide synthase (EPOS E) encoded by epoE (nucleotides 54935-62254 of SEQ ID NO: 1).
SEQ ID NO: 8 is the protein sequence of a cytochrome P450 oxygenase homolog (EPOS F) encoded by epoF (nucleotides 62369-63628 of SEQ ID NO: 1).
SEQ ID NO: 9 is a partial protein sequence (partial Orf1) encoded by orf1 (nucleotides 1-1826 of SEQ ID NO: 1).
SEQ ID NO: 10 is a protein sequence (Orf2) encoded by orf2 (nucleotides 3171-1900 on the reverse complement strand of SEQ ID NO: 1).
SEQ ID NO: 11 is a protein sequence (Orf3) encoded by orf3 (nucleotides 3415-5556 of SEQ ID NO: 1).
SEQ ID NO: 12 is a protein sequence (Orf4) encoded by orf4 (nucleotides 5992-5612 on the reverse complement strand of SEQ ID NO: 1).
SEQ ID NO: 13 is a protein sequence (Orf5) encoded by orf5 (nucleotides 6226-6675 of SEQ ID NO: 1).
SEQ ID NO: 14 is a protein sequence (Orf6) encoded by 0rf6 (nucleotides 63779-64333 of SEQ ID NO: 1). SEQ ID NO: 15 is a sequence of protein (Orf7) encoded by orf7 (nucleotides 64290-63853 on the reverse complement chain of SEQ ID NO: 1). SEQ ID NO: 16 is a sequence of protein (Orf8) encoded by orf8 (nucleotides 64363-64920 of SEQ ID NO: 1. SEQ ID NO: 17 is a sequence of protein (Orf 9) encoded by orf9 (nucleotides 64727- 64287 on the reverse complement strand of SEQ ID NO: 1) SEQ ID NO: 18 is a protein sequence (Orf 10) encoded by orflO (nucleotides 65063-65767 of SEQ ID NO: 1) SEQ ID NO. : 19 is a protein sequence (Orf 11) encoded by orf11 (nucleotides 65874-65008 on the reverse complement chain of SEQ ID NO: 1) SEQ ID NO: 20 is a protein sequence (Orf 12) encoded by orf12 (nucleotides 66338-65871 on the reverse complement strand of SEQ ID NO: 1) SEQ ID NO: 21 is a protein sequence (Orf 13) encoded by orfl3 (nucleotides 66667-67137 of the SEQ ID NO: 1) SEQ ID NO: 22 is a protein sequence (Orf 14) encoded by orfl4 (nucleotides 67334-68251 of SEQ ID NO: 1) SEQ ID NO: 23 is a prototyping sequence. ein (partial Orf 15) encoded by orfl5 (nucleotides 68346-68750 of SEQ ID O: 1).
SEQ ID NO: 24 is the sequence of the universal reverse polymerase chain reaction primer.
SEQ ID NO: 25 is the sequence of the universal forward polymerase chain reaction primer.
SEQ ID NO: 26 is the sequence of the polymerase chain reaction primer NH24 end "B".
SEQ ID NO: 27 is the sequence of the polymerase chain reaction primer NH2 end "A".
SEQ ID NO: 28 is the sequence of the polymerase chain reaction primer NH2 end "B".
SEQ ID NO: 29 is the sequence of the polymerase chain reaction primer pEPO15-NH6 end "B".
SEQ ID NO: 30 is the sequence of the polymerase chain reaction primer pEPO15-H2.7 end "A".
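The coordinates listed above lend themselves to a simple consistency check. The following Python sketch, offered as an illustration only, computes the length of several of the listed open reading frames and confirms that each complete frame spans a whole number of codons. The dictionary `ORFS` and the helper names `orf_length` and `codon_count` are hypothetical names introduced here; the coordinates are copied from the list above, and a start coordinate larger than the end coordinate is taken to mean the reverse complement strand.

```python
# Coordinates (1-based, inclusive) copied from the sequence descriptions.
# A start larger than the end denotes the reverse complement strand.
ORFS = {
    "epoA": (7610, 11875),
    "epoP": (11872, 16104),
    "epoB": (16251, 21749),
    "epoC": (21746, 43519),
    "epoD": (43524, 54920),
    "epoF": (62369, 63628),
    "orf2": (3171, 1900),
    "orf4": (5992, 5612),
}


def orf_length(start, end):
    """Number of nucleotides in an inclusive coordinate range."""
    return abs(end - start) + 1


def codon_count(start, end):
    """Number of codons in a complete ORF; fails if not a multiple of 3."""
    length = orf_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"{length} nt is not a whole number of codons")
    return length // 3


for name, (start, end) in ORFS.items():
    strand = "forward" if start <= end else "reverse"
    print(name, strand, orf_length(start, end), "nt,",
          codon_count(start, end), "codons")
```

Under the assumption that each listed range includes the stop codon, epoA for instance spans 4266 nucleotides, or 1422 codons. Partial frames such as orf1 (nucleotides 1-1826) are deliberately left out, since they need not be a multiple of three.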
DEPOSIT INFORMATION The following material has been deposited with the Agricultural Research Service, Patent Culture Collection (NRRL), 1815 North University Street, Peoria, Illinois 61604, under the terms of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure. All restrictions on the availability of the deposited material will be irrevocably removed upon the granting of a patent.
Deposited Material    Accession Number    Deposit Date
pEPO15                NRRL B-30033        June 11, 1998
pEPO32                NRRL B-30119        April 16, 1999
DETAILED DESCRIPTION OF THE INVENTION The genes involved in the biosynthesis of epothilones can be isolated using the techniques according to the present invention. The preferred procedure for the isolation of epothilone biosynthesis genes requires the isolation of genomic DNA from an organism identified as producing epothilones A and B, and the transfer of the isolated DNA on a suitable plasmid or vector to a host organism that does not normally produce the polyketide, followed by the identification of transformed host colonies to which the ability to produce epothilone has been conferred. Using a technique such as λ::Tn5 transposon mutagenesis (de Bruijn and Lupski, Gene 27: 131-149 (1984)), the exact region of the transforming DNA that confers epothilone production can be defined more precisely. Alternatively or additionally, the transforming epothilone-conferring DNA can be cleaved into smaller fragments, and the smallest fragment that retains the ability to confer epothilone production is further characterized. Whereas the host organism lacking the ability to produce epothilone may be a different species from the organism from which the polyketide is derived, a variation of this technique involves the transformation of host DNA into the same host after its epothilone-producing ability has been disrupted by mutagenesis. In this method, an epothilone-producing organism is mutated, and non-epothilone-producing mutants are isolated. These are then complemented with genomic DNA isolated from the epothilone-producing parent strain. A further example of a technique that can be used to isolate genes required for epothilone biosynthesis is the use of transposon mutagenesis to generate mutants of an epothilone-producing organism that, after mutagenesis, fail to produce the polyketide.
Thus, the region of the host genome responsible for epothilone production is tagged by the transposon, and can be recovered and used as a probe to isolate the native genes from the parent strain. PKS genes that are required for the synthesis of polyketides, and that are similar to known PKS genes, can be isolated by virtue of their sequence homology to biosynthetic genes whose sequence is known, such as those for the biosynthesis of rifamycin or soraphen. Suitable techniques for isolation by homology include standard library screening by DNA hybridization. As a probe molecule, it is preferred to use a DNA fragment that can be obtained from a gene or other DNA sequence that plays a part in the synthesis of a known polyketide. A preferred probe molecule comprises a 1.2 kb SmaI DNA fragment encoding the ketosynthase domain of the fourth module of the soraphen PKS (U.S. Patent No. 5,716,849), and a more preferred probe molecule comprises the β-ketoacyl synthase domains of the first and second modules of the rifamycin PKS (Schupp et al., FEMS Microbiology Letters 159: 201-207 (1998)). These can be used to probe a gene library of an epothilone-producing microorganism in order to isolate the PKS genes responsible for epothilone biosynthesis. Despite the well-known difficulties with PKS gene isolation in general, and despite the difficulties expected with the isolation of epothilone biosynthesis genes in particular, the methods described in this specification surprisingly allow the biosynthetic genes for epothilones A and B to be cloned from a microorganism that produces that polyketide. Using the methods of gene manipulation and recombinant production described in this specification, the cloned PKS genes can be modified and expressed in transgenic host organisms.
The isolated epothilone biosynthetic genes can be expressed in heterologous hosts to enable the production of the polyketide with greater efficiency than might be possible from native hosts. Techniques for these genetic manipulations are specific for the different available hosts and are known in the art. For example, heterologous genes can be expressed in Streptomyces and other actinomycetes using techniques such as those described in McDaniel et al., Science 262: 1546-1550 (1993) and Kao et al., Science 265: 509-512 (1994), both of which are incorporated herein by reference. See also Rowe et al., Gene 216: 215-223 (1998); Holmes et al., EMBO Journal 12(8): 3183-3191 (1993); and Bibb et al., Gene 38: 215-226 (1985), all of which are incorporated herein by reference. Alternatively, the genes responsible for polyketide biosynthesis, i.e., the epothilone biosynthetic genes, can also be expressed in other host organisms such as pseudomonads and E. coli. Here too, the techniques are host-specific and known in the art. For example, PKS genes have been successfully expressed in E. coli using the pT7-7 vector, which uses the T7 promoter. See Tabor et al., Proc. Natl. Acad. Sci. USA 82: 1074-1078 (1985), incorporated herein by reference. In addition, the expression vectors pKK223-3 and pKK223-2 can be used to express heterologous genes in E. coli, either in transcriptional or translational fusion, behind the tac or trc promoter. For the expression of operons encoding multiple open reading frames, the simplest procedure is to insert the operon into a vector such as pKK223-3 in transcriptional fusion, allowing the cognate ribosome binding site of the heterologous genes to be used.
Techniques for overexpression in gram-positive species such as Bacillus are also known in the art, and can be used in the context of this invention (Quax et al., in: Industrial Microorganisms: Basic and Applied Molecular Genetics, editors: Baltz et al., American Society for Microbiology, Washington (1993)). Other expression systems that can be used with the epothilone biosynthetic genes of the invention include yeast and baculovirus expression systems.
See, for example, "The Expression of Recombinant Proteins in Yeasts," Sudbery, P.E., Curr. Opin. Biotechnol. 7(5): 517-524 (1996); "Methods for Expressing Recombinant Proteins in Yeast," Mackay, et al., Editors: Carey, Paul R., Protein Eng. Des. 105-153, Publisher: Academic, San Diego, Calif. (1996); "Expression of heterologous gene products in yeast," Pichuantes, et al., Editors: Cleland, J.L., Craik, C.S., Protein Eng. 129-161, Publisher: Wiley-Liss, New York, N.Y. (1996); International Publication Number WO 98/27203; Kealey et al., Proc. Natl. Acad. Sci. USA 95: 505-509 (1998); "Insect Cell Culture: Recent Advances, Bioengineering Challenges and Implications in Protein Production," Palomares, et al., Editors: Galindo, Enrique; Ramírez, Octavio T., Adv. Bioprocess Eng. Vol. II, Invited Pap. Int. Symp., 2nd (1998) 25-52, Publisher: Kluwer, Dordrecht, Neth.; "Baculovirus Expression Vectors," Jarvis, Donald L., Editors: Miller, Lois K., Baculoviruses 389-431, Publisher: Plenum, New York, N.Y. (1997); "Production Of Heterologous Proteins Using The Baculovirus/Insect Expression System," Griffiths, et al., Methods Mol. Biol. (Totowa, N.J.) 75 (Basic Cell Culture Protocols (2nd edition)) 427-440 (1997); and "Insect Cell Expression Technology," Luckow, Verne A., Protein Eng. 183-218, Publisher: Wiley-Liss, New York, N.Y. (1996); all of which are incorporated herein by reference. Another consideration for the expression of PKS genes in heterologous hosts is that PKS enzymes must be post-translationally modified by phosphopantetheinylation before they can synthesize polyketides. However, the enzymes responsible for this modification of type I PKS enzymes, phosphopantetheinyl (P-pant) transferases, are not normally present in many hosts such as E. coli. This problem can be solved by coexpression of a P-pant transferase with the PKS genes in the heterologous host, as described by Kealey et al., Proc. Natl. Acad. Sci.
USA 95: 505-509 (1998), incorporated herein by reference. Therefore, for the purposes of polyketide production, the significant criteria in the choice of host organism are its ease of manipulation, its rapidity of growth (i.e., fermentation), its possession of suitable molecular machinery for processes such as post-translational modification, and its lack of susceptibility to the polyketide being overproduced. The most preferred host organisms are actinomycetes such as strains of Streptomyces. Other preferred host organisms are pseudomonads and E. coli. The above-described methods of polyketide production have significant advantages over the technology currently used in the preparation of the compounds. These advantages include the lower cost of production, the ability to produce greater quantities of the compounds, and the ability to produce compounds of a preferred biological enantiomer, as opposed to the racemic mixtures inevitably generated by organic synthesis. Compounds produced by heterologous hosts can be used in medical applications (e.g., cancer treatment in the case of epothilones) as well as agricultural applications.
EXPERIMENTAL The invention will be further described with reference to the following detailed examples. These examples are provided for purposes of illustration only, and are not intended to be limiting unless otherwise specified. The standard recombinant DNA and molecular cloning techniques used herein are well known in the art and are described by Ausubel (ed.), Current Protocols in Molecular Biology, John Wiley and Sons, Inc. (1994); T. Maniatis, E. F. Fritsch and J. Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989); and by T.J. Silhavy, M.L. Berman, and L.W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1984).
Example 1: Cultivation of an Epothilone-Producing Strain of Sorangium cellulosum Sorangium cellulosum strain 90 (DSM 6773, Deutsche Sammlung von Mikroorganismen und Zellkulturen, Braunschweig) is streaked out and grown (30 °C) on an agar plate with SolE medium (0.35 percent glucose, 0.05 percent tryptone, 0.15 percent MgSO4 x 7H2O, 0.05 percent ammonium sulfate, 0.1 percent CaCl2, 0.002 percent K2HPO4, 0.01 percent sodium dithionite, 0.0008 percent Fe-EDTA, 1.2 percent HEPES, 3.5 percent [volume/volume] supernatant of sterilized stationary S. cellulosum culture), pH adjusted to 7.4. Cells from about 1 square centimeter are picked and inoculated into 5 milliliters of G51t liquid medium (0.2 percent glucose, 0.5 percent starch, 0.2 percent tryptone, 0.1 percent probion S, 0.01 percent CaCl2 x 2H2O, 0.05 percent MgSO4 x 7H2O, 1.2 percent HEPES, pH adjusted to 7.4) and incubated at 30 °C with shaking at 225 rpm. After 4 days, the culture is transferred into 500 milliliters of G51t and incubated as above for 5 days. This culture is used to inoculate 500 milliliters of G51t, which is incubated as above for 6 days. The culture is centrifuged for 10 minutes at 4,000 rpm, and the cell pellet is resuspended in 50 milliliters of G51t.
Example 2: Generation of a Bacterial Artificial Chromosome (Bac) Library To generate a Bac library, S. cellulosum cells cultivated as described in Example 1 above are embedded in agarose blocks and lysed, and the liberated genomic DNA is partially digested with the restriction enzyme HindIII. The digested DNA is separated on an agarose gel by pulsed-field electrophoresis. Large DNA fragments (approximately 90 to 150 kb) are isolated from the agarose gel and ligated into the vector pBelobacII. pBelobacII contains a gene encoding chloramphenicol resistance, a multiple cloning site in the lacZ gene that provides blue/white selection on an appropriate medium, and the genes required for the replication and maintenance of the plasmid at one or two copies per cell. The ligation mixture is used to transform electrocompetent Escherichia coli DH10B cells using standard electroporation techniques. Chloramphenicol-resistant recombinant colonies (white, lacZ mutant) are transferred to a positively charged nylon membrane filter in a 384 3x3 grid format. The clones are lysed and the DNA is cross-linked to the filters. The same clones are also preserved as liquid cultures at -80 °C.
Example 3: Screening the Bac Library of Sorangium cellulosum 90 for the Presence of Sequences Related to Type I Polyketide Synthases The Bac library filters are probed by standard Southern hybridization procedures. The DNA probes used encode the β-ketoacyl synthase domains of the first and second modules of the rifamycin polyketide synthase (Schupp et al., FEMS Microbiology Letters 159: 201-207 (1998)). The probe DNAs are generated by polymerase chain reaction with primers flanking each ketosynthase domain, using plasmid pNE95 as the template (pNE95 is equal to cosmid 2 described in Schupp et al. (1998)). 25 nanograms of the DNA amplified by polymerase chain reaction are isolated from a 0.5 percent agarose gel and labeled with 32P-dCTP using a random primer labeling kit (Gibco-BRL, Bethesda MD, USA) according to the manufacturer's instructions. Hybridization is at 65 °C for 36 hours, and the membranes are washed at high stringency (3 times with 0.1x SSC and 0.5 percent SDS for 20 minutes at 65 °C). The labeled blot is exposed on a phosphor screen, and the signals are detected on a PhosphorImager 445SI (phosphor screen and 445SI from Molecular Dynamics). This results in strong hybridization of certain Bac clones to the probes. These clones are selected and grown overnight in 5 milliliters of Luria broth (LB) at 37 °C. Bac DNA from the Bac clones of interest is isolated by a typical miniprep procedure. The cells are resuspended in 200 microliters of lysozyme solution (50 mM glucose, 10 mM EDTA, 25 mM Tris-HCl, 5 milligrams/milliliter of lysozyme) and lysed in 400 microliters of lysis solution (0.2 N NaOH and 2 percent SDS), the proteins are precipitated (3.0 M potassium acetate, adjusted to pH 5.2 with acetic acid), and the Bac DNA is precipitated with isopropanol. The DNA is resuspended in 20 microliters of nuclease-free distilled water, digested with BamHI (New England Biolabs, Inc.), and separated on a 0.7 percent agarose gel.
The gel is blotted for Southern hybridization as described above and probed under the conditions described above, with a 1.2 kb SmaI DNA fragment encoding the ketosynthase domain of the fourth module of the soraphen polyketide synthase as the probe (see United States Patent Number 5,716,849). Five different hybridization patterns are observed. One clone representing each of the five patterns is selected, and the clones are designated pEPO15, pEPO20, pEPO30, pEPO31, and pEPO33, respectively.
Example 4: Subcloning of BamHI Fragments from pEPO15, pEPO20, pEPO30, pEPO31, and pEPO33 DNA from the five selected Bac clones is digested with BamHI, and random fragments are subcloned into pBluescript II SK+ (Stratagene) at the BamHI site. Subclones carrying inserts of between 2 and 10 kb in size are selected for sequencing of the flanking ends of the inserts, and are also probed with the 1.2 kb SmaI probe as described above. Subclones that show a high degree of sequence homology to known polyketide synthases and/or strong hybridization to the soraphen ketosynthase domain are used for the gene disruption experiments.
Example 5: Preparation of Spontaneous Streptomycin-Resistant Mutants of Sorangium cellulosum So ce90 0.1 milliliter of a three-day-old culture of Sorangium cellulosum strain So ce90, grown in G52-H liquid medium (0.2 percent yeast extract, 0.2 percent defatted soy flour, 0.8 percent potato starch, 0.2 percent glucose, MgSO4 x 7H2O, 0.1 percent CaCl2 x 2H2O, 0.008 percent Fe-EDTA, pH adjusted to 7.4 with KOH), is plated on agar plates with SolE medium supplemented with 100 micrograms/milliliter of streptomycin. The plates are incubated at 30 °C for 2 weeks. The colonies growing on this medium are streptomycin-resistant mutants, which are streaked out and cultivated once more on the same agar medium with streptomycin for purification. One of these streptomycin-resistant mutants is selected and is called BCE28/2.
Example 6: Gene Disruptions in Sorangium cellulosum BCE28/2 Using the Subcloned BamHI Fragments The BamHI inserts of the subclones generated from the five selected Bac clones as described above are isolated and ligated into the unique BamHI site of plasmid pCIB132 (see United States Patent Number 5,716,849). The pCIB132 derivatives carrying the inserts are transformed into Escherichia coli ED8767 containing the helper plasmid pUZ8 (Hedges and Matthew, Plasmid 2: 269-278 (1979)). The transformants are used as donors in conjugation experiments with Sorangium cellulosum BCE28/2 as the recipient. For conjugation, 5-10 x 10^9 cells of Sorangium cellulosum BCE28/2 from an early stationary phase culture (reaching approximately 5 x 10^8 cells/milliliter) grown at 30 °C in liquid medium G51b (G51b is equal to medium G51t with tryptone replaced by peptone) are mixed in a 1:1 cell ratio with a late log phase culture (in LB liquid medium) of E. coli ED8767 containing the pCIB132 derivatives carrying the subcloned BamHI fragments and the helper plasmid pUZ8. The mixed cells are then centrifuged at 4,000 rpm for 10 minutes and resuspended in 0.5 milliliters of G51b medium. This cell suspension is then plated as a drop in the center of a SolE agar plate containing 50 milligrams/liter of kanamycin. The cells obtained after incubation for 24 hours at 30 °C are harvested and resuspended in 0.8 milliliters of G51b medium, and 0.1 to 0.3 milliliters are plated on selective SolE solid medium containing phleomycin (30 milligrams/liter), streptomycin (300 milligrams/liter), and kanamycin (50 milligrams/liter). The counter-selection of the Escherichia coli donor strain takes place with the aid of streptomycin.
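As a rough worked example of the cell numbers given for the conjugation, the following Python sketch estimates the volume of recipient culture that supplies 5-10 x 10^9 cells at a density of about 5 x 10^8 cells per milliliter, and the number of donor cells implied by the 1:1 cell ratio. The function and constant names are hypothetical, and the calculation assumes that the stated density is exact.

```python
RECIPIENT_DENSITY = 5e8    # cells per milliliter, early stationary phase
DONOR_TO_RECIPIENT = 1.0   # 1:1 cell ratio stated for the mating mixture


def culture_volume_ml(cells_needed, cells_per_ml):
    """Volume of culture (ml) that contains the requested number of cells."""
    return cells_needed / cells_per_ml


for recipient_cells in (5e9, 10e9):
    volume = culture_volume_ml(recipient_cells, RECIPIENT_DENSITY)
    donor_cells = recipient_cells * DONOR_TO_RECIPIENT
    print(f"{recipient_cells:.0e} recipient cells: {volume:.0f} ml culture, "
          f"{donor_cells:.0e} donor cells")
```

On these assumptions, the recipient cells correspond to roughly 10 to 20 milliliters of the Sorangium cellulosum culture.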
The colonies that grow on this selective medium after an incubation time of 8 to 12 days at 30 °C are isolated with a plastic loop and streaked out and grown on the same agar medium for a second round of selection and purification. The cultures derived from the colonies that grow on this selective agar medium after 7 days at 30 °C are transconjugants of Sorangium cellulosum BCE28/2 that have acquired phleomycin resistance by conjugative transfer of the pCIB132 derivatives carrying the subcloned BamHI fragments. Integration of the pCIB132-derived plasmids into the chromosome of Sorangium cellulosum BCE28/2 by homologous recombination is verified by Southern hybridization. For this experiment, total DNA is isolated from 5 to 10 transconjugants per transferred BamHI fragment (from 10 milliliter cultures grown in G52-H medium for three days) by applying the method described by Pospiech and Neumann, Trends Genet. 11: 217 (1995). For the Southern blot, the DNA isolated as described above is cleaved with the restriction enzymes BglII, ClaI, or NotI, and the respective BamHI inserts or pCIB132 are used as 32P-labeled probes.
Example 7: Analysis of the Effect of the Integrated BamHI Fragments on Epothilone Production by Sorangium cellulosum After Gene Disruption Transconjugant cells grown on about 1 square centimeter of surface of the selective SolE plates of the second round of selection (see Example 6) are transferred with a sterile plastic loop into 10 milliliters of G52-H medium in a 50 milliliter Erlenmeyer flask. After incubation at 30 °C and 180 rpm for 3 days, the culture is transferred into 50 milliliters of G52-H medium in a 200 milliliter Erlenmeyer flask.
After incubation at 30 °C and 180 rpm for 4 to 5 days, 10 milliliters of this culture are transferred into 50 milliliters of 23B3 medium (0.2 percent glucose, 2 percent potato starch, 1.6 percent defatted soybean meal, 0.0008 percent Fe-EDTA sodium salt, 0.5 percent HEPES (4-(2-hydroxyethyl)-piperazine-1-ethanesulfonic acid), 2 percent volume/volume polystyrene resin XAD16 (Rohm & Haas), pH adjusted to 7.8 with NaOH) in a 200 milliliter Erlenmeyer flask. Quantitative determination of the epothilone produced takes place after incubation of the cultures at 30 °C and 180 rpm for 7 days. The complete culture broth is filtered by suction through a 150 micron nylon filter. The resin remaining on the filter is resuspended in 10 milliliters of isopropanol and extracted by shaking the suspension at 180 rpm for 1 hour. One milliliter of this suspension is removed and centrifuged at 12,000 rpm in an Eppendorf microcentrifuge. The amount of epothilones A and B is determined by HPLC with detection at 250 nanometers using a UV DAD detector (HPLC with a Waters Symmetry C18 column and a gradient of 0.02 percent phosphoric acid from 60 percent to 0 percent and acetonitrile from 40 percent to 100 percent). Transconjugants with three different integrated BamHI fragments subcloned from pEPO15, namely transconjugants with the BamHI fragment of plasmid pEPO15-21, transconjugants with the BamHI fragment of plasmid pEPO15-4-5, and transconjugants with the BamHI fragment of plasmid pEPO15-4-1, are tested in the manner described above. HPLC analysis reveals that none of these transconjugants produces epothilone A or B any longer. By contrast, epothilones A and B are detectable at a concentration of 2 to 4 milligrams/liter in transconjugants with integrated BamHI fragments derived from pEPO20, pEPO30, pEPO31, and pEPO33, and in the parental strain BCE28/2.
Example 8: Determination of the Nucleotide Sequence of the Cloned Fragments and Construction of Contigs A. BamHI Insert of Plasmid pEPO15-21 Plasmid DNA is isolated from Escherichia coli strain DH10B [pEPO15-21], and the nucleotide sequence of the 2.3 kb BamHI insert in pEPO15-21 is determined. Automated DNA sequencing is performed on the double-stranded DNA template by the dideoxynucleotide chain termination method, using Applied Biosystems model 377 sequencers. The primers used are the universal reverse primer (5' GGA AAC AGC TAT GAC CAT G 3' (SEQ ID NO: 24)) and the universal forward primer (5' GTA AAA CGA CGG CCA GT 3' (SEQ ID NO: 25)). In subsequent rounds of sequencing reactions, custom-synthesized oligonucleotides, designed for the 3' ends of the previously determined sequences, are used to extend and join the contigs. Both strands are sequenced entirely, and every nucleotide is sequenced at least twice. The nucleotide sequence is compiled using the program Sequencher version 3.0 (Gene Codes Corporation) and analyzed using the University of Wisconsin Genetics Computer Group programs. The nucleotide sequence of the 2213 base pair insert corresponds to nucleotides 20779-22991 of SEQ ID NO: 1.
B. BamHI Insert of Plasmid pEPO15-4-1 Plasmid DNA is isolated from Escherichia coli strain DH10B [pEPO15-4-1], and the nucleotide sequence of the 3.9 kb BamHI insert in pEPO15-4-1 is determined as described in (A) above. The nucleotide sequence of the 3909 base pair insert corresponds to nucleotides 16876-20784 of SEQ ID NO: 1.
C. BamHI Insert of Plasmid pEPO15-4-5 Plasmid DNA is isolated from Escherichia coli strain DH10B [pEPO15-4-5], and the nucleotide sequence of the 2.3 kb BamHI insert in pEPO15-4-5 is determined as described in (A) above. The nucleotide sequence of the 2233 base pair insert corresponds to nucleotides 42528-44760 of SEQ ID NO: 1.
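The insert sizes and coordinates reported in this example can be cross-checked with simple arithmetic. The Python sketch below is illustrative only, with hypothetical helper names: it recomputes the lengths of the pEPO15-21 and pEPO15-4-1 inserts from their coordinates, derives the start coordinate of the pEPO15-4-5 insert from its stated length of 2233 base pairs and its end coordinate of 44760, and measures the overlap between the two adjacent inserts.

```python
def span_length(start, end):
    """Length in base pairs of a 1-based, inclusive coordinate range."""
    return end - start + 1


def start_from_end(end, length):
    """Start coordinate of an inclusive range of known end and length."""
    return end - length + 1


def overlap_length(a, b):
    """Number of positions shared by two inclusive (start, end) ranges."""
    lo = max(a[0], b[0])
    hi = min(a[1], b[1])
    return max(0, hi - lo + 1)


PEPO15_21 = (20779, 22991)    # insert of pEPO15-21 in SEQ ID NO: 1
PEPO15_4_1 = (16876, 20784)   # insert of pEPO15-4-1 in SEQ ID NO: 1

print(span_length(*PEPO15_21))                # 2213
print(span_length(*PEPO15_4_1))               # 3909
print(start_from_end(44760, 2233))            # 42528
print(overlap_length(PEPO15_4_1, PEPO15_21))  # 6
```

The 6 base pair overlap between the pEPO15-4-1 and pEPO15-21 inserts is what one would expect if the two BamHI fragments are adjacent in the genome and each reported insert includes the shared 6 base pair BamHI recognition site; this interpretation is an inference from the coordinates and is not stated in the description.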
Example 9: Subcloning and Ordering of DNA Fragments from pEPO15 Containing Epothilone Biosynthetic Genes pEPO15 is digested to completion with the restriction enzyme HindIII, and the resulting fragments are subcloned into pBluescript II SK- or pNEB193 (New England Biolabs) that has been cut with HindIII and dephosphorylated with calf intestinal alkaline phosphatase. Six different clones are generated and designated pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24 (all based on pNEB193), and pEPO15-H2.7 and pEPO15-H3.0 (both based on pBluescript II SK-). The BamHI insert of pEPO15-21 is isolated and labeled with DIG (Non-radioactive DNA Labeling and Detection System, Boehringer Mannheim), and used as a probe in high-stringency DNA hybridization experiments against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7, and pEPO15-H3.0. A strong hybridization signal is detected for pEPO15-NH24, indicating that pEPO15-21 is contained within pEPO15-NH24. The BamHI insert of pEPO15-4-1 is isolated and labeled with DIG as above, and used as a probe in high-stringency DNA hybridization experiments against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7, and pEPO15-H3.0. Strong hybridization signals are detected for pEPO15-NH24 and pEPO15-H2.7. Nucleotide sequence data generated from one end each of pEPO15-NH24 and pEPO15-H2.7 are also in complete agreement with the previously determined sequence of the BamHI insert of pEPO15-4-1. These experiments demonstrate that pEPO15-4-1 (which contains an internal HindIII site) overlaps pEPO15-H2.7 and pEPO15-NH24, and that pEPO15-H2.7 and pEPO15-NH24, in this order, are contiguous. The BamHI insert of pEPO15-4-5 is isolated and labeled with DIG as above, and used as a probe in high-stringency DNA hybridization experiments against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7, and pEPO15-H3.0.
A strong hybridization signal is detected for pEPO15-NH2, indicating that pEPO15-4-5 is contained within pEPO15-NH2. Nucleotide sequence data are generated from both ends of pEPO15-NH2 and from the end of pEPO15-NH24 that does not overlap with pEPO15-4-1. The polymerase chain reaction primers NH24 end "B": GTGACTGGCGCCTGGAATCTGCATGAGC (SEQ ID NO: 26), NH2 end "A": AGCGGGAGCTTGCTAGACATTCTGTTTC (SEQ ID NO: 27), and NH2 end "B": GACGCGCCTCGGGCAGCGCCCCAA (SEQ ID NO: 28), pointing towards the HindIII sites, are designed based on these sequences and used in amplification reactions with pEPO15 and, in separate experiments, with Sorangium cellulosum So ce90 genomic DNA as the templates. Specific amplification is found with the primer pair NH24 end "B" and NH2 end "A" with both templates. The amplimers are cloned into pBluescript II SK- and completely sequenced. The sequences of the amplimers are identical, and also agree completely with the end sequences of pEPO15-NH24 and pEPO15-NH2, fused at the HindIII site, establishing that the HindIII fragments of pEPO15-NH2 and pEPO15-NH24 are, in this order, contiguous. The HindIII insert of pEPO15-H2.7 is isolated and labeled with DIG as above, and used as a probe in a high-stringency DNA hybridization experiment against pEPO15 digested with NotI. A NotI fragment of about 9 kb in size shows strong hybridization, and is further subcloned into pBluescript II SK- that has been digested with NotI and dephosphorylated with calf intestinal alkaline phosphatase, to yield pEPO15-N9-16. The NotI insert of pEPO15-N9-16 is isolated and labeled with DIG as above, and used as a probe in high-stringency DNA hybridization experiments against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7, and pEPO15-H3.0. Strong hybridization signals are detected for pEPO15-NH6, and also for the expected clones pEPO15-H2.7 and pEPO15-NH24.
Nucleotide sequence data are generated from both ends of pEPO15-NH6 and from the end of pEPO15-H2.7 that does not overlap with pEPO15-4-1. Polymerase chain reaction primers are designed pointing towards the HindIII sites, and used in amplification reactions with pEPO15 and, in separate experiments, with Sorangium cellulosum So ce90 genomic DNA as the templates. Specific amplification is found with the primer pair pEPO15-NH6 end "B": CACCGAAGCGTCGATCTGGTCCATC (SEQ ID NO: 29) and pEPO15-H2.7 end "A": CGGTCAGATCGACGACGGGCTTTCC (SEQ ID NO: 30) with both templates. The amplimers are cloned into pBluescript II SK- and completely sequenced. The sequences of the amplimers are identical, and also agree completely with the end sequences of pEPO15-NH6 and pEPO15-H2.7, fused at the HindIII site, establishing that the HindIII fragments of pEPO15-NH6 and pEPO15-H2.7 are, in this order, contiguous. All of these experiments, taken together, establish a contig of HindIII fragments covering a region of about 55 kb and consisting of the HindIII inserts of pEPO15-NH6, pEPO15-H2.7, pEPO15-NH24, and pEPO15-NH2, in this order. The inserts of the two remaining HindIII subclones, namely pEPO15-NH1 and pEPO15-H3.0, are found not to be part of this contig.
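The two primers that gave specific amplification across the junction between pEPO15-NH6 and pEPO15-H2.7 can be characterized with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and GC content and reverse complementation are shown as generic primer properties, not as design criteria stated in this description. The HindIII recognition sequence AAGCTT is included because the primers point towards HindIII sites.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Primer sequences copied from SEQ ID NO: 29 and SEQ ID NO: 30.
PRIMERS = {
    "pEPO15-NH6 end B": "CACCGAAGCGTCGATCTGGTCCATC",
    "pEPO15-H2.7 end A": "CGGTCAGATCGACGACGGGCTTTCC",
}

HINDIII_SITE = "AAGCTT"


def gc_count(seq):
    """Number of G and C bases in a DNA sequence."""
    return sum(1 for base in seq if base in "GC")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


for name, seq in PRIMERS.items():
    fraction = gc_count(seq) / len(seq)
    print(f"{name}: {len(seq)} nt, GC fraction {fraction:.2f}")

# A palindromic restriction site reads the same on both strands.
print(reverse_complement(HINDIII_SITE) == HINDIII_SITE)
```

Both primers are 25 nucleotides long, with 15 and 16 G or C bases, respectively.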
Example 10: Further Extension of the Subclone Contig Covering the Epothilone Biosynthesis Genes A BamHI-HindIII fragment of approximately 2.2 kb, derived from the downstream end of the insert of pEPO15-NH2 and therefore representing the downstream end of the subclone contig described in Example 9, is isolated, labeled with DIG, and used in Southern hybridization experiments against pEPO15 and pEPO15-NH2 DNAs digested with several enzymes. The strongly hybridizing bands are always found to be of equal size between the two target DNAs, indicating that the Sorangium cellulosum So ce90 genomic DNA fragment cloned into pEPO15 ends with the HindIII site at the downstream end of pEPO15-NH2. A cosmid DNA library of Sorangium cellulosum So ce90 is generated using established procedures in pScosTriplex-II (Ji, et al., Genomics 31: 185-392 (1996)). Briefly, high molecular weight genomic DNA of Sorangium cellulosum So ce90 is partially digested with the restriction enzyme Sau3AI to provide fragments with average sizes of about 40 kb, and ligated to pScosTriplex-II digested with BamHI and XbaI. The ligation mixture is packaged with Gigapack III XL (Stratagene) and used to transfect E. coli XL1 Blue MR cells. The cosmid library is screened by colony hybridization, using as a probe the approximately 2.2 kb BamHI-HindIII fragment derived from the downstream end of the insert of pEPO15-NH2. A strongly hybridizing clone, named pEPO4E7, is selected. pEPO4E7 DNA is isolated, digested with several restriction endonucleases, and probed in Southern hybridization experiments with the 2.2 kb BamHI-HindIII fragment. A strongly hybridizing NotI fragment of approximately 9 kb in size is selected and subcloned into pBluescript II SK- to yield pEPO4E7-N9-8. Further Southern hybridization experiments reveal that the approximately 9 kb NotI insert of pEPO4E7-N9-8 overlaps pEPO15-NH2 over 6 kb in a NotI-HindIII fragment, while the remaining HindIII-NotI fragment of about 3 kb would extend the subclone contig described in Example 9. End sequencing reveals, however, that the downstream end of the insert of pEPO4E7-N9-8 contains the BamHI-NotI polylinker of pScosTriplex-II, thus indicating that the genomic DNA insert of pEPO4E7 ends at a Sau3AI site within the extending HindIII-NotI fragment, and that the NotI site is derived from pScosTriplex-II.
An approximately 1.6 kb PstI-SalI fragment derived from the approximately 3 kb extending HindIII-NotI subfragment of pEP04E7-N9-8, containing only Sorangium cellulosum So ce90-derived sequences free of vector, is used as a probe against the bacterial artificial chromosome library described in Example 2. In addition to the previously isolated pEP015, a BAC clone named pEP032 is found to hybridize strongly to the probe. pEP032 is isolated, digested with several restriction endonucleases, and hybridized with the approximately 1.6 kb PstI-SalI probe. A HindIII-EcoRV fragment of approximately 13 kb in size is found to hybridize strongly to the probe, and is subcloned into pBluescript II SK- digested with HindIII and HincII to produce pEP032-HEV15. Oligonucleotide primers are designed based on the downstream end sequence of pEP015-NH2 and on the upstream (HindIII) end sequence derived from pEP032-HEV15, and used in sequencing reactions with pEP04E7-N9-8 as the template. The sequences reveal the existence of a small HindIII fragment (EPO4E7-H0.02) of 24 base pairs, undetectable in standard restriction analysis, separating the HindIII site at the downstream end of pEP015-NH2 from the HindIII site at the upstream end of pEP032-HEV15. Accordingly, the subclone contig described in Example 9 is extended to include the HindIII fragment EPO4E7-H0.02 and the insert of pEP032-HEV15, and consists of the inserts of: pEP015-NH6, pEP015-H2.7, pEP015-NH24, pEP015-NH2, EPO4E7-H0.02 and pEP032-HEV15, in this order.
Example 11: Determination of the Nucleotide Sequence of the Subclone Contig Covering the Epothilone Biosynthesis Genes
The nucleotide sequence of the subclone contig described in Example 10 is determined as follows.
pEP015-H2.7. Plasmid DNA is isolated from Escherichia coli strain DH10B [pEP015-H2.7], and the nucleotide sequence of the 2.7 kb BamHI insert in pEP015-H2.7 is determined. Automated DNA sequencing is performed on the double-stranded DNA template by the dideoxynucleotide chain termination method, using Applied Biosystems model 377 sequencers. The primers used are the universal reverse primer (5' GGA AAC AGC TAT GAC CAT G 3' (SEQ ID NO: 24)) and the universal forward primer (5' GTA AAA CGA CGG CCA GT 3' (SEQ ID NO: 25)). In subsequent rounds of sequencing reactions, custom-synthesized oligonucleotides, designed to the 3' ends of the previously determined sequences, are used to extend and join the contigs.
pEP015-NH6, pEP015-NH24 and pEP015-NH2. The HindIII inserts of these plasmids are isolated and subjected to random fragmentation using a Hydroshear apparatus (Genomic Instrumentation Services, Inc.) to give an average fragment size of 1 to 2 kb. The fragments are end-repaired using T4 DNA polymerase and Klenow DNA polymerase in the presence of deoxynucleotide triphosphates, and phosphorylated with T4 DNA kinase in the presence of ribo-ATP. Fragments in the size range of 1.5 to 2.2 kb are isolated from agarose gels and ligated into pBluescript II SK- that has been cut with EcoRV and dephosphorylated. Random subclones are sequenced using the universal reverse and forward primers.
pEP032-HEV15. pEP032-HEV15 is digested with HindIII and SspI, and the approximately 13.3 kb fragment, containing the approximately 13 kb HindIII-EcoRV insert from So. cellulosum So ce90 and a 0.3 kb HincII-SspI fragment from pBluescript II SK-, is isolated and partially digested with HaeIII to produce fragments with an average size of 1 to 2 kb. Fragments in the size range of 1.5 to 2.2 kb are isolated from agarose gels and ligated into pBluescript II SK- that has been cut with EcoRV and dephosphorylated. Random subclones are sequenced using the universal reverse and forward primers.
The chromatograms are analyzed and assembled into contigs with the Phred, Phrap and Consed programs (Ewing, et al., Genome Res. 8 (3): 175-185 (1998); Ewing, et al., Genome Res. 8 (3): 186-194 (1998); Gordon, et al., Genome Res. 8 (3): 195-202 (1998)). Contig gaps are filled, sequence discrepancies are resolved, and low-quality regions are resequenced using custom-designed oligonucleotide primers for sequencing on either the original subclones or selected clones from the random subclone libraries. Both strands are completely sequenced, and every base pair is covered with at least a minimum cumulative Phred score of 40 (confidence level of 99.99%). The nucleotide sequence of the 68,750 base pair contig is shown as SEQ ID NO: 1.
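The relation between the Phred score of 40 and the 99.99% confidence level quoted above follows from the standard definition of the Phred quality score, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The following is a minimal sketch of that arithmetic; the function names are illustrative and are not part of the Phred, Phrap or Consed programs.

```python
import math


def phred_error_probability(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def phred_confidence_percent(q: float) -> float:
    """Confidence level (in percent) corresponding to Phred score q."""
    return 100 * (1 - phred_error_probability(q))


def phred_score_from_error(p: float) -> float:
    """Inverse relation: Phred score for an error probability p."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # A score of 40 corresponds to one expected error in 10,000 bases.
    print(phred_error_probability(40))
    print(round(phred_confidence_percent(40), 2))
```

A score of 20 would correspond, by the same formula, to a confidence level of 99%, so the threshold of 40 used here is the stricter of the commonly quoted cutoffs.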
Example 12: Analysis of the Nucleotide Sequence of the Epothilone Biosynthesis Genes
SEQ ID NO: 1 is found to contain 22 open reading frames, as detailed in Table 1 below.
Table 1
* On the complementary strand. Numbering according to SEQ ID NO: 1.
epoA (nucleotides 7610-11875 of SEQ ID NO: 1) codes for EPOS A (SEQ ID NO: 2), a type I polyketide synthase consisting of a single module and harboring the following domains: β-ketoacyl synthase (KS) (nucleotides 7643-8920 of SEQ ID NO: 1, amino acids 11-437 of SEQ ID NO: 2); acyltransferase (AT) (nucleotides 9236-10201 of SEQ ID NO: 1, amino acids 543-864 of SEQ ID NO: 2); enoyl reductase (ER) (nucleotides 10529-11428 of SEQ ID NO: 1, amino acids 974-1273 of SEQ ID NO: 2); and acyl carrier protein (ACP) homologous domain (nucleotides 11549-11764 of SEQ ID NO: 1, amino acids 1314-1385 of SEQ ID NO: 2). Sequence comparisons and motif analysis (Haydock, et al., FEBS Lett. 374: 246-248 (1995); Tang, et al., Gene 216: 255-265 (1998)) reveal that the AT encoded by EPOS A is specific for malonyl-CoA. EPOS A should be involved in the initiation of epothilone biosynthesis by loading onto the multienzyme complex the acetate unit that will eventually form part of the 2-methylthiazole ring (C26 and C20).
epoP (nucleotides 11872-16104 of SEQ ID NO: 1) codes for EPOS P (SEQ ID NO: 3), a non-ribosomal peptide synthetase containing one module.
EPOS P harbors the following domains:
• a peptide bond formation domain, delineated by motif K (amino acids 72-81 [FPLTDIQESY] of SEQ ID NO: 3, corresponding to nucleotide positions 12085-12114 of SEQ ID NO: 1); motif L (amino acids 118-125 [WARHDML] of SEQ ID NO: 3, corresponding to nucleotide positions 12223-12246 of SEQ ID NO: 1); motif M (amino acids 199-212 [SIDLINVDLGSLSI] of SEQ ID NO: 3, corresponding to nucleotide positions 12466-12507 of SEQ ID NO: 1); and motif O (amino acids 353-363 [GDFTSMVLLDI] of SEQ ID NO: 3, corresponding to nucleotide positions 12928-12960 of SEQ ID NO: 1);
• an aminoacyl adenylate formation domain, delineated by motif A (amino acids 549-565 [LTYEELSRRSRRLGARL] of SEQ ID NO: 3, corresponding to nucleotide positions 13516-13566 of SEQ ID NO: 1); motif B (amino acids 588-603 [VAVLAVLESGAAYVPI] of SEQ ID NO: 3, corresponding to nucleotide positions 13633-13680 of SEQ ID NO: 1); motif C (amino acids 669-684 [AYVIYTSGSTGLPKGV] of SEQ ID NO: 3, corresponding to nucleotide positions 13876-13923 of SEQ ID NO: 1); motif D (amino acids 815-821 [SLGGATE] of SEQ ID NO: 3, corresponding to nucleotide positions 14313-14334 of SEQ ID NO: 1); motif E (amino acids 868-892 [GQLYIGGVGLALGYWRDEEKTRKSF] of SEQ ID NO: 3, corresponding to nucleotide positions 14473-14547 of SEQ ID NO: 1); motif F (amino acids 903-912 [YKTGDLGRYL] of SEQ ID NO: 3, corresponding to nucleotide positions 14578-14607 of SEQ ID NO: 1); motif G (amino acids 918-940 [EFMGREDNQIKLRGYRVELGEIE] of SEQ ID NO: 3, corresponding to nucleotide positions 14623-14692 of SEQ ID NO: 1); motif H (amino acids 1268-1274 [LPEYMVP] of SEQ ID NO: 3, corresponding to nucleotide positions 15673-15693 of SEQ ID NO: 1); and motif I (amino acids 1285-1297 [LTSNGKVDRKALR] of SEQ ID NO: 3, corresponding to nucleotide positions 15724-15762 of SEQ ID NO: 1);
• an unknown domain, inserted between motifs G and H of the aminoacyl adenylate formation domain (amino acids 973-1256 of SEQ ID NO: 3, corresponding to nucleotide positions 14788-15639 of SEQ ID NO: 1); and
• a peptidyl carrier protein (PCP) homologous domain, delineated by motif J (amino acids 1344-1351 [GATSIHIV] of SEQ ID NO: 3, corresponding to nucleotide positions 15901-15924 of SEQ ID NO: 1).
It is proposed that EPOS P is involved in the activation of a cysteine by adenylation, the binding of the activated cysteine as an aminoacyl-S-PCP, the formation of a peptide bond between the enzyme-bound cysteine and the acetyl-S-ACP supplied by EPOS A, and the formation of the initial thiazoline ring by intramolecular heterocyclization. The unknown domain of EPOS P shows very weak homologies to NAD(P)H oxidases and reductases from Bacillus species. Accordingly, this unknown domain and/or the ER domain of EPOS A may be involved in the oxidation of the initial 2-methylthiazoline ring to a 2-methylthiazole.
epoB (nucleotides 16251-21749 of SEQ ID NO: 1) codes for EPOS B (SEQ ID NO: 4), a type I polyketide synthase consisting of a single module and harboring the following domains: KS (nucleotides 16269-17546 of SEQ ID NO: 1, amino acids 7-432 of SEQ ID NO: 4); AT (nucleotides 17865-18827 of SEQ ID NO: 1, amino acids 539-859 of SEQ ID NO: 4); dehydratase (DH) (nucleotides 18855-19361 of SEQ ID NO: 1, amino acids 869-1037 of SEQ ID NO: 4); β-ketoreductase (KR) (nucleotides 20565-21302 of SEQ ID NO: 1, amino acids 1439-1684 of SEQ ID NO: 4); and ACP (nucleotides 21414-21626 of SEQ ID NO: 1, amino acids 1722-1792 of SEQ ID NO: 4). Sequence comparisons and motif analysis reveal that the AT encoded by EPOS B is specific for methylmalonyl-CoA. EPOS B should be involved in the first polyketide chain extension by catalyzing the Claisen-type condensation of the starter group 2-methyl-4-thiazolecarboxyl-S-PCP with methylmalonyl-S-ACP, and the concomitant reduction of the β-keto group of C17 to an enoyl.
epoC (nucleotides 21746-43519 of SEQ ID NO: 1) codes for EPOS C (SEQ ID NO: 5), a type I polyketide synthase consisting of 4 modules. The first module harbors a KS (nucleotides 21860-23116 of SEQ ID NO: 1, amino acids 39-457 of SEQ ID NO: 5); a malonyl-CoA-specific AT (nucleotides 23431-24397 of SEQ ID NO: 1, amino acids 563-884 of SEQ ID NO: 5); a KR (nucleotides 25184-25942 of SEQ ID NO: 1, amino acids 1147-1399 of SEQ ID NO: 5); and an ACP (nucleotides 26045-26263 of SEQ ID NO: 1, amino acids 1434-1506 of SEQ ID NO: 5). This module incorporates an acetate extender unit (C14-C13) and reduces the β-keto group at C15 to the hydroxyl group that takes part in the final lactonization of the epothilone macrolactone ring. The second module of EPOS C harbors a KS (nucleotides 26318-27595 of SEQ ID NO: 1, amino acids 1524-1950 of SEQ ID NO: 5); a malonyl-CoA-specific AT (nucleotides 27911-28876 of SEQ ID NO: 1, amino acids 2056-2377 of SEQ ID NO: 5); a KR (nucleotides 29678-30429 of SEQ ID NO: 1, amino acids 2645-2895 of SEQ ID NO: 5); and an ACP (nucleotides 30539-30759 of SEQ ID NO: 1, amino acids 2932-3005 of SEQ ID NO: 5). This module incorporates an acetate extender unit (C12-C11) and reduces the β-keto group at C13 to a hydroxyl group. Accordingly, the nascent polyketide chain of epothilone corresponds to epothilone A, and incorporation of the C12 methyl side chain in epothilone B would require a post-PKS C-methyltransferase activity. The formation of the C13-C12 epoxy ring would likewise require a post-PKS oxidation step.
The third module of EPOS C harbors a KS (nucleotides 30815-32092 of SEQ ID NO: 1, amino acids 3024-3449 of SEQ ID NO: 5); a malonyl-CoA-specific AT (nucleotides 32408-33373 of SEQ ID NO: 1, amino acids 3555-3876 of SEQ ID NO: 5); a DH (nucleotides 33401-33889 of SEQ ID NO: 1, amino acids 3886-4048 of SEQ ID NO: 5); an ER (nucleotides 35042-35902 of SEQ ID NO: 1, amino acids 4433-4719 of SEQ ID NO: 5); a KR (nucleotides 35930-36667 of SEQ ID NO: 1, amino acids 4729-4974 of SEQ ID NO: 5); and an ACP (nucleotides 36773-36991 of SEQ ID NO: 1, amino acids 5010-5082 of SEQ ID NO: 5). This module incorporates an acetate extender unit (C10-C9) and completely reduces the β-keto group at C11. The fourth module of EPOS C harbors a KS (nucleotides 37052-38320 of SEQ ID NO: 1, amino acids 5103-5525 of SEQ ID NO: 5); a methylmalonyl-CoA-specific AT (nucleotides 38636-39598 of SEQ ID NO: 1, amino acids 5631-5951 of SEQ ID NO: 5); a DH (nucleotides 39635-40141 of SEQ ID NO: 1, amino acids 5964-6132 of SEQ ID NO: 5); an ER (nucleotides 41369-42256 of SEQ ID NO: 1, amino acids 6542-6837 of SEQ ID NO: 5); a KR (nucleotides 42314-43048 of SEQ ID NO: 1, amino acids 6857-7101 of SEQ ID NO: 5); and an ACP (nucleotides 43163-43378 of SEQ ID NO: 1, amino acids 7140-7211 of SEQ ID NO: 5). This module incorporates a propionate extender unit (C24 and C8-C7) and completely reduces the β-keto group at C9.
epoD (nucleotides 43524-54920 of SEQ ID NO: 1) codes for EPOS D (SEQ ID NO: 6), a type I polyketide synthase consisting of 2 modules. The first module harbors a KS (nucleotides 43626-44885 of SEQ ID NO: 1, amino acids 35-454 of SEQ ID NO: 6); a methylmalonyl-CoA-specific AT (nucleotides 45204-46166 of SEQ ID NO: 1, amino acids 561-881 of SEQ ID NO: 6); a KR (nucleotides 46950-47702 of SEQ ID NO: 1, amino acids 1143-1393 of SEQ ID NO: 6); and an ACP (nucleotides 47811-48032 of SEQ ID NO: 1, amino acids 1430-1503 of SEQ ID NO: 6).
This module incorporates a propionate extender unit (C23 and C6-C5) and reduces the β-keto group at C7 to a hydroxyl group. The second module harbors a KS (nucleotides 48087-49361 of SEQ ID NO: 1, amino acids 1522-1946 of SEQ ID NO: 6); a methylmalonyl-CoA-specific AT (nucleotides 49680-50642 of SEQ ID NO: 1, amino acids 2053-2373 of SEQ ID NO: 6); a DH (nucleotides 50670-51176 of SEQ ID NO: 1, amino acids 2383-2551 of SEQ ID NO: 6); a methyltransferase (MT, nucleotides 51534-52657 of SEQ ID NO: 1, amino acids 2671-3045 of SEQ ID NO: 6); a KR (nucleotides 53697-54431 of SEQ ID NO: 1, amino acids 3392-3636 of SEQ ID NO: 6); and an ACP (nucleotides 54540-54758 of SEQ ID NO: 1, amino acids 3673-3745 of SEQ ID NO: 6). This module incorporates a propionate extender unit (C21 or C22 and C4-C3) and reduces the β-keto group at C5 to a hydroxyl group. This reduction is somewhat unexpected, because the epothilones contain a keto group at C5. Discrepancies of this kind between the deduced reductive capabilities of PKS modules and the reduction-oxidation state of the corresponding positions in the final polyketide products have, however, been reported in the literature (see, for example, Schwecke, et al., Proc. Natl. Acad. Sci. USA 92: 7839-7843 (1995) and Schupp, et al., FEMS Microbiology Letters 159: 201-207 (1998)). An important feature of epothilones is the presence of gem-methyl side groups at C4 (C21 and C22). The second module of EPOS D is predicted to incorporate a propionate unit into the growing polyketide chain, providing one methyl side chain at C4. This module also contains a methyltransferase domain integrated into the PKS between the DH and KR domains, in an arrangement similar to that seen in the yersiniabactin synthase HMWP1 (Gehring, A.M., DeMoll, E., Fetherston, J.D., Mori, I., Mayhew, G.F., Blattner, F.R., Walsh, C.T., and Perry, R.D.: Iron acquisition in plague: modular logic in enzymatic biogenesis of yersiniabactin by Yersinia pestis. Chem. Biol. 5, 573-586, 1998). It is proposed that this MT domain in EPOS D is responsible for the incorporation of the second methyl side group (C21 or C22) at C4.
epoE (nucleotides 54935-62254 of SEQ ID NO: 1) codes for EPOS E (SEQ ID NO: 7), a type I polyketide synthase consisting of one module, which harbors a KS (nucleotides 55028-56284 of SEQ ID NO: 1, amino acids 32-450 of SEQ ID NO: 7); a malonyl-CoA-specific AT (nucleotides 56600-57565 of SEQ ID NO: 1, amino acids 556-877 of SEQ ID NO: 7); a DH (nucleotides 57593-58087 of SEQ ID NO: 1, amino acids 887-1051 of SEQ ID NO: 7); a probably nonfunctional ER (nucleotides 59366-60304 of SEQ ID NO: 1, amino acids 1478-1790 of SEQ ID NO: 7); a KR (nucleotides 60362-61099 of SEQ ID NO: 1, amino acids 1810-2055 of SEQ ID NO: 7); an ACP (nucleotides 61211-61426 of SEQ ID NO: 1, amino acids 2093-2164 of SEQ ID NO: 7); and a thioesterase (TE) (nucleotides 61427-62254 of SEQ ID NO: 1, amino acids 2165-2439 of SEQ ID NO: 7). The ER domain in this module harbors an active site motif with some highly unusual amino acid substitutions that probably render this domain inactive. The module incorporates an acetate extender unit (C2-C1) and reduces the β-keto group at C3 to an enoyl group. The epothilones contain a hydroxyl group at C3, so this reduction also appears to be excessive, as discussed for the second module of EPOS D. The TE domain of EPOS E takes part in the release and cyclization of the grown polyketide chain by lactonization between the carboxyl group of C1 and the hydroxyl group of C15.
Five open reading frames are detected upstream of epoA in the sequenced region. The partially sequenced orf1 has no homologs in the sequence data banks.
The deduced protein product (Orf2, SEQ ID NO: 10) of orf2 (nucleotides 3171-1900 on the reverse complement strand of SEQ ID NO: 1) shows strong similarities to hypothetical open reading frames from Mycobacterium and Streptomyces coelicolor, and more distant similarities to carboxypeptidases and DD-peptidases from different bacteria. The deduced protein product of orf3 (nucleotides 3415-5556 of SEQ ID NO: 1), Orf3 (SEQ ID NO: 11), shows homologies to Na/H antiporters of different bacteria. Orf3 might take part in the export of epothilones from the producer strain. orf4 and orf5 have no homologs in the sequence data banks. Eleven open reading frames are found downstream of epoE in the sequenced region. epoF (nucleotides 62369-63628 of SEQ ID NO: 1) codes for EPOS F (SEQ ID NO: 8), a deduced protein with strong sequence similarities to cytochrome P450 oxygenases. EPOS F may take part in the adjustment of the reduction-oxidation state of the carbons C12, C5, and/or C3. The deduced protein product of orf14 (nucleotides 67334-68251 of SEQ ID NO: 1), Orf14 (SEQ ID NO: 22), shows strong similarities to GI:3293544, a hypothetical protein with no proposed function from Streptomyces coelicolor, and also to GI:2654559, the human embryonic lung protein. It is also more distantly related to cation efflux system proteins such as GI:2623026 from Methanobacterium thermoautotrophicum, so it might also take part in the export of epothilones from the producer cells. The remaining open reading frames (orf6-orf13 and orf15) show no homologies to entries in the sequence data banks.
Example 13: Recombinant Expression of Epothilone Biosynthetic Genes
Epothilone synthase genes according to the present invention are expressed in heterologous organisms for the purpose of producing epothilone in greater quantities than can be achieved by fermentation of Sorangium cellulosum. A preferred host for heterologous expression is Streptomyces, for example Streptomyces coelicolor, which natively produces the polyketide actinorhodin. Techniques for recombinant PKS gene expression in this host are described in McDaniel et al., Science 262: 1546-1550 (1993) and Kao et al., Science 265: 509-512 (1994). See also Holmes et al., EMBO Journal 12 (8): 3183-3191 (1993) and Bibb et al., Gene 38: 215-226 (1985), as well as United States Patents Numbers 5,521,077, 5,672,491, and 5,712,146, which are incorporated herein by reference. According to one method, the heterologous host strain is engineered to contain a chromosomal deletion of the actinorhodin (act) gene cluster. Expression plasmids containing the epothilone synthase genes of the invention are constructed by transferring DNA from a temperature-sensitive donor plasmid to a recipient shuttle vector in E. coli (McDaniel et al. (1993) and Kao et al. (1994)), such that the synthase genes are built up by homologous recombination within the vector. Alternatively, the epothilone synthase gene cluster is introduced into the vector by restriction fragment ligation. Following selection, for example as described in Kao et al. (1994), DNA from the vector is introduced into the act-minus Streptomyces coelicolor strain according to the protocols set forth in Hopwood et al., Genetic Manipulation of Streptomyces. A Laboratory Manual (John Innes Foundation, Norwich, United Kingdom, 1985), incorporated herein by reference. The recombinant Streptomyces strain is grown on R2YE medium (Hopwood et al. (1985)) and produces epothilones.
Alternatively, the epothilone synthase genes according to the present invention are expressed in other host organisms, such as pseudomonads, Bacillus, yeast, insect cells, and/or E. coli. The PKS and NRPS genes are preferably expressed in E. coli using the pT7-7 vector, which uses the T7 promoter. See Tabor et al., Proc. Natl. Acad. Sci. USA 82: 1074-1078 (1985). In another embodiment, the expression vectors pKK223-3 and pKK223-2 are used to express the PKS and NRPS genes in E. coli, either in a transcriptional or a translational fusion, behind the tac or trc promoter. Expression of PKS and NRPS genes in heterologous hosts that do not naturally have the phosphopantetheinyl (P-pant) transferases needed for the post-translational modification of the PKS enzymes requires the co-expression in the host of a P-pant transferase, as described by Kealey et al., Proc. Natl. Acad. Sci. USA 95: 505-509 (1998).
Example 14: Isolation of Epothilones from the Production Strains
Examples of culture, fermentation, and extraction procedures for polyketide isolation, which are useful for extracting epothilones from both native and recombinant hosts according to the present invention, are given in International Publication Number WO 93/10121, incorporated herein by reference, in Example 57 of United States Patent Number 5,639,949, in Gerth et al., J. Antibiotics 49: 560-563 (1996), and in Swiss Patent Application No. 396/98, filed February 19, 1998, and United States Patent Application Number 09/248,910 (which also discloses preferred mutant strains of Sorangium cellulosum), both of which are incorporated herein by reference. The following procedures are useful for the isolation of epothilones from cultured Sorangium cellulosum strains, such as So ce90, and may also be used for the isolation of epothilone from recombinant hosts.
A: Culture of epothilone-producing strains
Strain: Sorangium cellulosum Soce-90 or a recombinant host strain according to the present invention.
Preservation of the strain: In liquid N2.
Media:
Precultures and intermediate cultures: G52
Main culture: 1B12

Medium G52:
yeast extract, low salt (BioSpringer, Maison Alfort, France): 2 g/l
MgSO4 (7 H2O): 1 g/l
CaCl2 (2 H2O): 1 g/l
defatted soybean meal Soyamine 50T (Lucas Meyer, Hamburg, Germany): 2 g/l
potato starch Noredux A-150 (Blattmann, Waedenswil, Switzerland): 8 g/l
anhydrous glucose: 2 g/l
EDTA-Fe(III)-Na salt (8 g/l): 1 ml/l
pH 7.4, corrected with KOH
Sterilization: 20 min at 120 °C

Medium 1B12:
potato starch Noredux A-150 (Blattmann, Waedenswil, Switzerland): 20 g/l
defatted soybean meal Soyamine 50T (Lucas Meyer, Hamburg, Germany): 11 g/l
EDTA-Fe(III)-Na salt: 8 mg/l
pH 7.8, corrected with KOH
Sterilization: 20 min at 120 °C

Addition of cyclodextrins and cyclodextrin derivatives: Cyclodextrins (Fluka, Buchs, Switzerland, or Wacker Chemie, Munich, Germany) in different concentrations are sterilized separately and added to medium 1B12 before inoculation.
Cultivation: 1 milliliter of the Sorangium cellulosum Soce 90 suspension is transferred from a liquid N2 vial to 10 milliliters of G52 medium (in a 50 milliliter Erlenmeyer flask), and incubated for 3 days at 180 rpm in a shaker at 30 °C, with a displacement of 25 millimeters. 5 milliliters of this culture are added to 45 milliliters of G52 medium (in a 200 milliliter Erlenmeyer flask), and incubated for 3 days at 180 rpm in a shaker at 30 °C, with a displacement of 25 millimeters. Then 50 milliliters of this culture are added to 450 milliliters of G52 medium (in a 2-liter Erlenmeyer flask), and incubated for 3 days at 180 rpm in a shaker at 30 °C, with a displacement of 50 millimeters.
Maintenance culture: The culture is subcultured every 3 to 4 days by adding 50 milliliters of culture to 450 milliliters of G52 medium (in a 2-liter Erlenmeyer flask). All experiments and fermentations are carried out starting with this maintenance culture.
Tests in flasks: (i) Preculture in a shake flask: Starting with 500 milliliters of the maintenance culture, 1 x 450 milliliters of G52 medium is inoculated with 50 milliliters of the maintenance culture and incubated for 4 days at 180 rpm in a shaker at 30 °C, with a displacement of 50 millimeters. (ii) Main culture in a shake flask: 40 milliliters of 1B12 medium plus 5 grams/liter of 4-morpholinepropanesulfonic acid (= MOPS) powder (in a 200 milliliter Erlenmeyer flask) are mixed with 5 milliliters of a 10x concentrated cyclodextrin solution, inoculated with 10 milliliters of preculture, and incubated for 5 days at 180 rpm in a shaker at 30 °C, with a displacement of 50 millimeters.
Fermentation: The fermentations are carried out on a scale of 10 liters, 100 liters, and 500 liters. The 20 liter and 100 liter fermentations serve as an intermediate culture step.
Whereas the precultures and intermediate cultures are inoculated with 10 percent (volume/volume) of the maintenance culture, the main cultures are inoculated with 20 percent (volume/volume) of the intermediate culture. Important: in contrast to the shake cultures, the ingredients of the fermentation media are calculated on the final culture volume, including the inoculum. For example, if 18 liters of medium plus 2 liters of inoculum are combined, the substances are weighed out for 20 liters but dissolved in only 18 liters. Preculture in a shake flask: Starting with 500 milliliters of maintenance culture, 4 x 450 milliliters of G52 medium (in 2-liter Erlenmeyer flasks) are each inoculated with 50 milliliters thereof, and incubated for 4 days at 180 rpm in a shaker at 30 °C, with a displacement of 50 millimeters.
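The rule that fermentation media are calculated on the final culture volume, including the inoculum, can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, the recipe values are the per-liter amounts listed above for medium G52, and the example reproduces the 18 liter plus 2 liter case from the text.

```python
# Per-liter amounts for medium G52 as listed above (grams per liter,
# except the EDTA-Fe(III)-Na stock solution, which is in milliliters per liter).
G52_PER_LITER = {
    "yeast extract (g)": 2,
    "MgSO4 x 7 H2O (g)": 1,
    "CaCl2 x 2 H2O (g)": 1,
    "defatted soybean meal (g)": 2,
    "potato starch (g)": 8,
    "anhydrous glucose (g)": 2,
    "EDTA-Fe(III)-Na stock, 8 g/l (ml)": 1,
}


def weigh_out(recipe_per_liter: dict, medium_liters: float, inoculum_liters: float) -> dict:
    """Amounts to weigh out, calculated on the final volume (medium + inoculum)."""
    final_volume = medium_liters + inoculum_liters
    return {name: amount * final_volume for name, amount in recipe_per_liter.items()}


def inoculum_fraction(medium_liters: float, inoculum_liters: float) -> float:
    """Inoculum as a fraction (volume/volume) of the final culture volume."""
    return inoculum_liters / (medium_liters + inoculum_liters)


if __name__ == "__main__":
    # 18 liters of medium plus 2 liters of inoculum: weigh out for 20 liters.
    for name, amount in weigh_out(G52_PER_LITER, 18, 2).items():
        print(f"{name}: {amount}")
    print("inoculum fraction:", inoculum_fraction(18, 2))
```

With these definitions the 20 liter intermediate culture comes out at a 10 percent inoculum, and a main culture made up of 80 liters of medium and cyclodextrin solution plus 20 liters of intermediate culture comes out at 20 percent.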
Intermediate culture, 20 liters or 100 liters: 20 liters: 18 liters of G52 medium in a fermentor having a total volume of 30 liters are inoculated with 2 liters of preculture. Cultivation lasts 3 to 4 days, and the conditions are: 30 °C, 250 rpm, 0.5 liters of air per liter of liquid per minute, 0.5 bar overpressure, no pH control. 100 liters: 90 liters of G52 medium in a fermentor having a total volume of 150 liters are inoculated with 10 liters of the 20 liter intermediate culture. Cultivation lasts 3 to 4 days, and the conditions are: 30 °C, 250 rpm, 0.5 liters of air per liter of liquid per minute, 0.5 bar overpressure, no pH control.
Main culture, 10 liters, 100 liters, or 500 liters: 10 liters: The medium substances for 10 liters of 1B12 medium are sterilized in 7 liters of water, then 1 liter of a sterile 10 percent 2-(hydroxypropyl)-β-cyclodextrin solution is added, and the medium is inoculated with 2 liters of a 20 liter intermediate culture. The main culture lasts 6 to 7 days, and the conditions are: 30 °C, 250 rpm, 0.5 liters of air per liter of liquid per minute, 0.5 bar overpressure, pH control with H2SO4/KOH to pH 7.6 +/- 0.5 (i.e., no control between pH 7.1 and 8.1). 100 liters: The medium substances for 100 liters of 1B12 medium are sterilized in 70 liters of water, then 10 liters of a sterile 10 percent 2-(hydroxypropyl)-β-cyclodextrin solution are added, and the medium is inoculated with 20 liters of a 20 liter intermediate culture. The main culture lasts 6 to 7 days, and the conditions are: 30 °C, 200 rpm, 0.5 liters of air per liter of liquid per minute, 0.5 bar overpressure, pH control with H2SO4/KOH to pH 7.6 +/- 0.5.
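The pH control described for the main cultures acts only outside a band around the setpoint: no correction is made between pH 7.1 and 8.1, acid is dosed above the band, and base below it. The following is a minimal sketch of that dead-band logic, assuming that H2SO4 is the correcting agent above the band and KOH below it; the function name and return strings are illustrative and do not describe any particular fermentor controller.

```python
def ph_control_action(ph: float, setpoint: float = 7.6, tolerance: float = 0.5) -> str:
    """Return the correcting action for a measured pH under dead-band control.

    No action is taken while the pH stays within setpoint +/- tolerance.
    """
    if ph > setpoint + tolerance:
        return "dose H2SO4"  # culture too alkaline, correct with acid
    if ph < setpoint - tolerance:
        return "dose KOH"    # culture too acidic, correct with base
    return "no action"


if __name__ == "__main__":
    for measured in (6.9, 7.3, 7.6, 7.9, 8.4):
        print(measured, ph_control_action(measured))
```

Leaving the pH uncontrolled inside the band avoids alternating acid and base additions around the setpoint.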
The inoculation chain for a 100 liter fermentation is shown schematically as follows: maintenance culture (500 milliliters, medium G52), of which 10% is used to continue the maintenance culture (500 milliliters, medium G52) and 10% to inoculate the precultures (4 x 500 milliliters, medium G52); 10% of preculture into the intermediate culture (e.g., 20 liters, medium G52); 20% of intermediate culture into the main culture (e.g., 100 liters, medium 1B12 + HP-β-CD).
500 liters: The medium substances for 500 liters of 1B12 medium are sterilized in 350 liters of water, then 50 liters of a sterile 10 percent 2-(hydroxypropyl)-β-cyclodextrin solution are added, and the medium is inoculated with 100 liters of a 100 liter intermediate culture. The main culture lasts 6 to 7 days, and the conditions are: 30 °C, 120 rpm, 0.5 liters of air per liter of liquid per minute, 0.5 bar overpressure, pH control with H2SO4/KOH to pH 7.6 +/- 0.5.
Product analysis:
Sample preparation: 50 milliliter samples are mixed with 2 milliliters of the polystyrene resin Amberlite XAD16 (Rohm + Haas, Frankfurt, Germany) and shaken at 180 rpm for 1 hour at 30 °C. The resin is then filtered off using a 150 micron nylon sieve, washed with a little water, and transferred together with the filter to a 15 milliliter Nunc tube.
Elution of the product from the resin: 10 milliliters of isopropanol (> 99 percent) are added to the tube containing the filter and the resin. The sealed tube is then shaken for 30 minutes at room temperature on a Rota-Mixer (Labinco BV, Holland). 2 milliliters of the liquid are then centrifuged, and the supernatant is pipetted into HPLC tubes.
HPLC analysis:
Column: Waters Symmetry C18, 100 x 4 mm, 3.5 microns, WAT066220 + precolumn 3.9 x 20 mm, WAT054225
Solvents: A: 0.02 percent phosphoric acid; B: acetonitrile (HPLC grade)
Gradient: 41% B from 0 to 7 min; 100% B from 7.2 to 7.8 min; 41% B from 8 to 12 min
Oven temperature: 30 °C
Detection: 250 nm, UV-DAD detection
Injection volume: 10 μl
Retention time: Epo A: 4.30 min; Epo B: 5.38 min
B: Effect of the addition of cyclodextrin and cyclodextrin derivatives on the epothilone concentrations obtained
Cyclodextrins are cyclic α-(1,4)-linked oligosaccharides of α-D-glucopyranose with a relatively hydrophobic central cavity and a hydrophilic external surface. The following are distinguished in particular (the figures in parentheses give the number of glucose units per molecule): α-cyclodextrin (6), β-cyclodextrin (7), γ-cyclodextrin (8), δ-cyclodextrin (9), ε-cyclodextrin (10), ζ-cyclodextrin (11), η-cyclodextrin (12), and θ-cyclodextrin (13). Especially preferred are δ-cyclodextrin and in particular α-cyclodextrin, β-cyclodextrin or γ-cyclodextrin, or mixtures thereof. Cyclodextrin derivatives are primarily derivatives of the above-mentioned cyclodextrins, especially of α-cyclodextrin, β-cyclodextrin or γ-cyclodextrin, primarily those in which one or more and up to all of the hydroxyl groups (3 per glucose radical) are etherified or esterified.
The ethers are primarily the alkyl ethers, especially lower alkyl ether, such as methyl or ethyl ether, also propyl or butyl ether; the aryl hydroxyalkyl ethers, such as phenyl-hydroxyalkyl lower ether, especially phenyl-hydroxyethyl; hydroxyalkyl ethers, in particular hydroxyalkyl lower ethers, especially 2-hydroxyethyl-, hydroxypropyl-, such as 2-hydroxypropyl- or hydroxybutyl-, such as 2-hydroxybutyl ether; the carboxyalkyl ethers, in particular the carboxyalkyl lower ethers, especially carboxymethyl- or carboxyethyl ether; the carboxyalkyl ethers derivatives, in particular derived carboxyalkyl lower ether, wherein the carboxyl derivative is carboxyl etherified or amidated (primarily aminocarbonyl, mono- or di-lower alkyl-aminocarbonyl, morpholino-, piperidino-, pyrrolidino-, or piperazinecarbonyl or alkyloxycarbonyl), in particular lower alkoxycarbonylalkyl lower alkyl ether, for example methyloxycarbonylpropyl ether or ethyloxycarbonylpropyl ether; the sulfoalkyl ethers, in particular the sulfoalkyl ethers, especially sulfobutyl ether; cyclodextrins wherein one or more OH groups are etherified with a radical of the form: -O- [alk-0-] nH wherein alk is alkyl, especially lower alkyl, and n is an integer from 2 to 12, especially from 2 to 5, in particular 2 or 3; cyclodextrins wherein one or more OH groups are etherified with a radical of the formula: wherein R 'is hydrogen, hydroxyl, -0- (alk-O) ZH, -O- (alk (-R) -0-) pH or -O- (alk (-R) -0-) q-alk -C0-Y; alk in all cases is alkyl, especially lower alkyl; m, n, p, q, and z are an integer from 1 to 12, preferably from 1 to 5, in particular from 1 to 3; and Y is ORi, or NR2R3, wherein Ri, R2, and R3, independently of one another, are hydrogen or lower alkyl, or R2 and R3 combined together with the linking nitrogen, mean morpholino, piperidino, pyrrolidino, or piperazine; or branched cyclodextrins, wherein there are etherifications or acetals with other sugar molecules present, especially glucosyl-, diglucosyl-, 
(G2-β-cyclodextrin), maltosyl- or di-maltosyl-cyclodextrin, or N-acetylglucosaminyl-, glucosaminyl-, N-acetylgalactosaminyl-, or galactosaminyl-cyclodextrin.

The esters are primarily the alkanoyl esters, in particular the lower alkanoyl esters, such as acetyl esters of cyclodextrins. It is also possible to have cyclodextrins in which two or more of these different ether and ester groups are present at the same time. Mixtures of two or more of these cyclodextrins and/or cyclodextrin derivatives can also be used.

Preference is given in particular to α-, β- or γ-cyclodextrins, or to the lower alkyl ethers thereof, such as methyl-β-cyclodextrin or in particular 2,6-di-O-methyl-β-cyclodextrin, or in particular to the hydroxy-lower alkyl ethers thereof, such as 2-hydroxypropyl-α-, 2-hydroxypropyl-β-, or 2-hydroxypropyl-γ-cyclodextrin.

Cyclodextrins or cyclodextrin derivatives are added to the culture medium preferably in a concentration of 0.02 to 10, preferably 0.05 to 5, especially 0.1 to 4, for example 0.1 to 2 percent by weight (weight/volume).

Cyclodextrins or cyclodextrin derivatives are known or can be produced by known processes (see, for example, Patents Numbers US 3,459,731; US 4,383,992; US 4,535,152; US 4,659,696; EP 0 094 157; EP 0 149 197; EP 0 197 571; EP 0 300 526; EP 0 320 032; EP 0 499 322; EP 0 503 710; EP 0 818 469; WO 90/12035; WO 91/11200; WO 93/19061; WO 95/08993; WO 96/14090; GB 2,189,245; DE 3,118,218; DE 3,317,064, and the references mentioned therein, which also relate to the synthesis of cyclodextrins or cyclodextrin derivatives, or also: T. Loftsson and M.E. Brewster (1996): Pharmaceutical Applications of Cyclodextrins: Drug Solubilization and Stabilization: Journal of Pharmaceutical Sciences 85 (10): 1017-1025; R.A. Rajewski and V.J. Stella (1996): Pharmaceutical Applications of Cyclodextrins: In Vivo Drug Delivery: Journal of Pharmaceutical Sciences 85 (11): 1142-1169). 
All of the cyclodextrin derivatives tested herein can be obtained from Fluka, Buchs, CH. The tests are carried out in 200-milliliter shake flasks with a culture volume of 50 milliliters. As controls, flasks with Amberlite XAD-16 adsorber resin (Rohm & Haas, Frankfurt, Germany) and flasks without any added adsorber are used. After incubation for 5 days, the following epothilone titers can be determined by HPLC:

Table 2

1) Apart from Amberlite (percent by volume/volume), all percentages are by weight (percent by weight/volume).
A few of the cyclodextrins tested (2,6-di-O-methyl-β-cyclodextrin, methyl-β-cyclodextrin) exhibit no effect or a negative effect on the production of epothilone at the concentrations used. At 1 to 2 percent, 2-hydroxypropyl-β-cyclodextrin and β-cyclodextrin increase the production of epothilone in the examples 6- to 8-fold compared with production without cyclodextrins.
C: Fermentation of 10 liters with 1 percent 2-(hydroxypropyl)-β-cyclodextrin: The fermentation takes place in a 15-liter glass fermenter. The medium contains 10 grams/liter of 2-(hydroxypropyl)-β-cyclodextrin from Wacker Chemie, Munich, Germany. The progress of the fermentation is illustrated in Table 3. The fermentation is ended after 6 days, and processing takes place.

Table 3: Progress of a 10-liter fermentation

D: Fermentation of 100 liters with 1 percent 2-(hydroxypropyl)-β-cyclodextrin: The fermentation takes place in a 150-liter fermenter. The medium contains 10 grams/liter of 2-(hydroxypropyl)-β-cyclodextrin. The progress of the fermentation is illustrated in Table 4. The fermentation is harvested after 7 days and processed.

Table 4: Progress of a 100-liter fermentation

E: Fermentation of 500 liters with 1 percent 2-(hydroxypropyl)-β-cyclodextrin: The fermentation takes place in a 750-liter fermenter. The medium contains 10 grams/liter of 2-(hydroxypropyl)-β-cyclodextrin. The progress of the fermentation is illustrated in Table 5. The fermentation is harvested after 7 days and processed.

Table 5: Progress of a 500-liter fermentation

F: Comparison example: fermentation of 10 liters without addition of an adsorber: The fermentation takes place in a 15-liter glass fermenter. The medium contains neither cyclodextrin nor any other adsorber. The progress of the fermentation is illustrated in Table 6. The fermentation is not harvested or processed.

Table 6: Progress of a 10-liter fermentation without adsorber.
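The cyclodextrin charge for fermentations C, D and E follows directly from the stated 1 percent (weight/volume), i.e. 10 grams/liter. The following is a minimal sketch of that arithmetic; the function name `cyclodextrin_mass_g` is hypothetical, and the nominal volumes of 10, 100 and 500 liters are taken as the medium volumes, which is an assumption.

```python
def cyclodextrin_mass_g(volume_l: float, percent_w_v: float) -> float:
    """Grams of additive needed for a medium volume at a given percent (w/v).

    1 percent (w/v) is taken as 1 g per 100 mL, i.e. 10 g per liter,
    which matches the 10 grams/liter stated for the 1 percent fermentations.
    """
    if volume_l < 0 or percent_w_v < 0:
        raise ValueError("volume and percentage must be non-negative")
    grams_per_liter = percent_w_v * 10.0
    return volume_l * grams_per_liter


# Nominal volumes of fermentations C, D and E (assumed to be medium volumes).
for volume_l in (10, 100, 500):
    mass_g = cyclodextrin_mass_g(volume_l, 1.0)
    print(f"{volume_l} L at 1 % (w/v): {mass_g:.0f} g")
```

The same function covers the shake-flask tests: a 50-milliliter culture at 2 percent (w/v) corresponds to `cyclodextrin_mass_g(0.05, 2.0)`, i.e. about 1 gram.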
G: Processing of epothilones: Isolation from a main culture of 500 liters: The harvest volume from the 500-liter main culture of Example 2D is 450 liters, and it is separated using a Westfalia clarifying separator type SA-20-06 (rpm = 6,500) into the liquid phase (centrifugate + rinse water = 650 liters) and the solid phase (cells = approximately 15 kilograms). The main part of the epothilones is found in the centrifugate. The centrifuged cell pulp contains > 15 percent of the determined epothilone portion, and it is not further processed. The 650 liters of centrifugate are then placed in a 4,000-liter stirring vessel, mixed with 10 liters of Amberlite XAD-16 (centrifugate : resin volume = 65:1), and stirred. After a contact period of approximately 2 hours, the resin is centrifuged off in a Heine overflow centrifuge (basket contents: 40 liters, rpm = 2,800). The resin is discharged from the centrifuge and washed with 10 to 15 liters of deionized water. Desorption is effected by stirring the resin twice, each time in portions with 30 liters of isopropanol, in 30-liter glass stirring vessels for 30 minutes. The isopropanol phase is separated from the resin using a suction filter. The isopropanol is then removed from the combined isopropanol phases, with the addition of 15 to 20 liters of water, in a vacuum-operated circulating evaporator (Schmid-Verdampfer), and the resulting water phase of approximately 10 liters is extracted three times, each time with 10 liters of ethyl acetate. The extraction is carried out in 30-liter glass stirring vessels. The ethyl acetate extract is concentrated to 3 to 5 liters in a vacuum-operated circulating evaporator (Schmid-Verdampfer), and then concentrated to dryness under vacuum in a rotary evaporator (Büchi type). The result is an ethyl acetate extract of 50.2 grams. 
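The phase ratio and solvent volumes quoted for the resin adsorption and extraction steps can be checked with a few lines of arithmetic. This is only a sketch; the helper names `phase_ratio` and `total_solvent_l` are hypothetical, and the isopropanol desorption is read as two portions of 30 liters each.

```python
def phase_ratio(liquid_l: float, resin_l: float) -> float:
    """Volume ratio of liquid phase to adsorber resin."""
    if resin_l <= 0:
        raise ValueError("resin volume must be positive")
    return liquid_l / resin_l


def total_solvent_l(portions: int, liters_per_portion: float) -> float:
    """Total solvent used when the same portion size is applied repeatedly."""
    return portions * liters_per_portion


# Figures taken from the processing description above.
ratio = phase_ratio(650.0, 10.0)              # centrifugate : XAD-16 resin
isopropanol_l = total_solvent_l(2, 30.0)      # two desorption portions (assumed reading)
ethyl_acetate_l = total_solvent_l(3, 10.0)    # three extractions of the water phase

print(f"centrifugate : resin = {ratio:.0f}:1")
print(f"isopropanol for desorption: {isopropanol_l:.0f} L")
print(f"ethyl acetate for extraction: {ethyl_acetate_l:.0f} L")
```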
The ethyl acetate extract is dissolved in 500 milliliters of methanol, the insoluble portion is filtered off using a folded filter, and the solution is applied to a 10-kilogram Sephadex LH-20 column (Pharmacia, Uppsala, Sweden) (column diameter 20 centimeters, fill height approximately 1.2 meters). Elution is carried out with methanol as eluent. Epothilones A and B are present predominantly in fractions 21 to 23 (fraction size: 1 liter). These fractions are concentrated to dryness under vacuum in a rotary evaporator (total weight 9.0 grams). Subsequently, these Sephadex peak fractions (9.0 grams) are dissolved in 92 milliliters of acetonitrile : water : methylene chloride = 50:40:2, the solution is filtered through a folded filter, and applied to an RP column (Prepbar 200 apparatus, Merck; 2.0 kilograms of LiChrospher RP-18, Merck; particle size 12 microns; column diameter 10 centimeters; fill height 42 centimeters; Merck, Darmstadt, Germany). Elution is carried out with acetonitrile : water = 3:7 (flow rate = 500 milliliters/minute; retention time of epothilone A = approximately 51-59 minutes; retention time of epothilone B = approximately 60-69 minutes). Fractionation is monitored with an ultraviolet detector at 250 nanometers. The fractions are concentrated to dryness under vacuum in a Büchi Rotavapor rotary evaporator. The weight of the peak fraction of epothilone A is 700 milligrams, and according to HPLC (external standard) it has a content of 75.1 percent. The peak fraction of epothilone B weighs 1,980 milligrams, and its content according to HPLC (external standard) is 86.6 percent. Finally, the epothilone A fraction (700 milligrams) is crystallized from 5 milliliters of ethyl acetate : toluene = 2:3, and yields 170 milligrams of pure crystalline epothilone A [content according to HPLC (area percent) = 94.3 percent]. 
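The masses and HPLC contents reported for the chromatography steps allow a rough mass balance. The sketch below computes step-wise mass yields and the content-corrected amount of epothilone in each peak fraction. The `Fraction` class and `mass_percent` helper are hypothetical names introduced for illustration. The crystal purity is given as area percent while the peak-fraction content is given against an external standard, so the two are not treated as directly comparable here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fraction:
    """A chromatographic fraction with its HPLC content (external standard)."""
    name: str
    mass_mg: float
    content_percent: float

    @property
    def active_mg(self) -> float:
        """Content-corrected mass of epothilone in the fraction."""
        return self.mass_mg * self.content_percent / 100.0


def mass_percent(part: float, whole: float) -> float:
    """Mass of `part` as a percentage of `whole` (same units)."""
    if whole <= 0:
        raise ValueError("reference mass must be positive")
    return 100.0 * part / whole


# Values taken from the description above.
extract_g = 50.2           # ethyl acetate extract
sephadex_pool_g = 9.0      # pooled Sephadex LH-20 fractions 21 to 23
epo_a_peak = Fraction("epothilone A peak fraction", 700.0, 75.1)
epo_b_peak = Fraction("epothilone B peak fraction", 1980.0, 86.6)
epo_a_crystals_mg = 170.0

print(f"Sephadex pool / extract: {mass_percent(sephadex_pool_g, extract_g):.1f} %")
for frac in (epo_a_peak, epo_b_peak):
    print(f"{frac.name}: {frac.active_mg:.1f} mg epothilone by content")
print(
    "epothilone A crystallization mass yield: "
    f"{mass_percent(epo_a_crystals_mg, epo_a_peak.mass_mg):.1f} %"
)
```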
The crystallization of the epothilone B fraction (1,980 milligrams) is carried out from 18 milliliters of methanol, and yields 1,440 milligrams of pure crystalline epothilone B [content according to HPLC (area percent) = 99.2 percent]. M.p. (epothilone B): for example 124-125 °C.

1H-NMR data for epothilone B: 500 MHz NMR, solvent: DMSO-d6. Chemical shift δ in ppm relative to TMS. s = singlet; d = doublet; m = multiplet.

δ (multiplicity)              Integral (number of H)
7.34 (s)                      1
6.50 (s)                      1
5.28 (d)                      1
5.08 (d)                      1
4.46 (d)                      1
4.08 (m)                      1
3.47 (m)                      1
3.11 (m)                      1
2.83 (dd)                     1
2.64 (s)                      3
2.36 ()                       2
2.09 (s)                      3
2.04 (m)                      1
1.83 (m)                      1
1.61 (m)                      1
1.47-1.24 (m)                 4
1.18 (s)                      6
1.13 (m)                      2
1.06 (d)                      3
0.89 (d + s, overlapping)     6
                              Σ = 41

Example 15: Medical Uses of the Recombinantly Produced Epothilones.

Pharmaceutical preparations or compositions comprising epothilones are used, for example, in the treatment of cancerous diseases, such as various human solid tumors. These anti-cancer formulations comprise, for example, an active amount of an epothilone together with one or more pharmaceutically suitable organic or inorganic, liquid or solid carrier materials. These formulations are administered, for example, enterally, nasally, rectally, orally, or parenterally, in particular intramuscularly or intravenously. The dosage of the active ingredient depends on the weight, age, and physical and pharmacokinetic condition of the patient, and also on the route of administration. Because epothilones mimic the biological effects of taxol, epothilones can be used in place of taxol in compositions and methods that use taxol in the treatment of cancer. See, for example, United States Patents Nos. 5,496,804; 5,565,478; and 5,641,803, all of which are incorporated herein by reference.

For example, for the treatments, epothilone B is supplied in individual 2-milliliter glass vials, formulated at 1 milligram/1 milliliter as a clear, colorless intravenous concentrate. 
The substance is formulated in polyethylene glycol 300 (PEG 300), and diluted with 50 or 100 milliliters of 0.9 percent Sodium Chloride Injection, USP, to achieve the desired final concentration of the drug for infusion. It is administered as a single 30-minute intravenous infusion every 21 days (treatment every three weeks) for six cycles, or as a single 30-minute intravenous infusion every 7 days (weekly treatment).

Preferably, for the weekly treatment, the dose is between about 0.1 and about 6 milligrams/square meter, preferably between about 0.1 and about 5 milligrams/square meter, more preferably between about 0.1 and about 3 milligrams/square meter, still more preferably between 0.1 and 1.7 milligrams/square meter, most preferably between about 0.3 and about 1 milligram/square meter; for treatment every three weeks (every third week), the dose is between about 0.3 and about 18 milligrams/square meter, preferably between about 0.3 and about 15 milligrams/square meter, more preferably between about 0.3 and about 12 milligrams/square meter, still more preferably between about 0.3 and about 7.5 milligrams/square meter, still more preferably between about 0.3 and about 5 milligrams/square meter, and most preferably between about 1.0 and about 3.0 milligrams/square meter. This preferred dose is administered to the human by intravenous (i.v.) administration over 2 to 180 minutes, preferably 2 to 120 minutes, more preferably over about 5 to about 30 minutes, still more preferably over about 10 to about 30 minutes, for example over approximately 30 minutes.

Although the present invention has been described with reference to specific embodiments thereof, it will be appreciated that numerous variations, modifications, and embodiments are possible, and accordingly, all such variations, modifications, and embodiments are to be regarded as being within the spirit and scope of the present invention.
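The dosing description in Example 15 combines a dose per square meter, a 1 milligram/milliliter concentrate in 2-milliliter vials, and dilution with 50 or 100 milliliters of saline. The sketch below only illustrates that arithmetic. The function name `infusion_plan`, the body surface area input, the assumption that the full 2 milliliters of each vial are usable, and the assumption that volumes are additive on dilution are all hypothetical and not taken from the text.

```python
import math

CONCENTRATE_MG_PER_ML = 1.0          # "1 milligram/1 milliliter" concentrate
VIAL_VOLUME_ML = 2.0                 # individual 2-milliliter vials
ALLOWED_DILUENT_ML = (50.0, 100.0)   # 0.9 percent sodium chloride volumes
# Broadest dose ranges stated in the text, in milligrams/square meter.
DOSE_RANGE_MG_PER_M2 = {
    "weekly": (0.1, 6.0),
    "three_weekly": (0.3, 18.0),
}


def infusion_plan(dose_mg_per_m2: float, bsa_m2: float,
                  schedule: str, diluent_ml: float) -> dict:
    """Return total dose, concentrate volume, vial count and final concentration.

    bsa_m2 (body surface area) is an assumed, externally supplied input.
    """
    low, high = DOSE_RANGE_MG_PER_M2[schedule]
    if not low <= dose_mg_per_m2 <= high:
        raise ValueError(f"dose outside the {schedule} range {low} to {high} mg/m2")
    if diluent_ml not in ALLOWED_DILUENT_ML:
        raise ValueError("diluent volume must be 50 or 100 mL")
    if bsa_m2 <= 0:
        raise ValueError("body surface area must be positive")

    dose_mg = dose_mg_per_m2 * bsa_m2
    concentrate_ml = dose_mg / CONCENTRATE_MG_PER_ML
    return {
        "dose_mg": dose_mg,
        "concentrate_ml": concentrate_ml,
        # Whole vials needed, assuming each vial yields its full 2 mL.
        "vials": math.ceil(concentrate_ml / VIAL_VOLUME_ML),
        # Assumes concentrate and diluent volumes are simply additive.
        "final_mg_per_ml": dose_mg / (diluent_ml + concentrate_ml),
    }


plan = infusion_plan(2.5, 1.8, "three_weekly", 100.0)
print(plan)
```

In this example a dose of 2.5 milligrams/square meter and an assumed body surface area of 1.8 square meters give a total dose of 4.5 milligrams, drawn from three vials.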
SEQUENCE LISTING
<110> Novartis AG
<120> GENES FOR THE BIOSYNTHESIS OF EPOTHILONES
<130> 4-30582A
<140>
<141>
<160> 30
<170> PatentIn Ver. 2.0
<210> 1
<211> 68750
<212> DNA
<213> Sorangium cellulosum
<400> 1
aagcttcgct cgacgccctc ttcgcccgcg ccacctctgc ccgtgtgctc gatgatggcc 60 acggccgggc cacggagcgg catgtgctcg ccgaggcgcg cgggatcgag gacctccgcg 120 ccctccgaga gcacctccgc atccaggaag gggggccgtc ctttcactgc atgtgcctcg 180 gcgacctgac ggtggagctc ctcgcgcacg accagcccct cgcgtccatc agcttccacc 240 atgcccgcag cctgaggcac cccgactgga cctcggacgc gatgctcgtc gacggccccg 300 cgctcgtccg gtggctcgcc gcgcgcggcg cgccgggtcc cctccgcgag tacgaagagg 360 agcgcgagcg agcccgaacc gcgcaggagg cgaggcgcct gtggctcgcg gccgcgccgc 420 cctgcttcgc gcccgatctg ccccgcttcg aggacgacgc caacgggctg ccgctcggcc 480 cgatgtcgcc tgaagtcgcc gaggccgagc ggcgcctccg cgcctcgtac gcgactcctg 540 agctcgcctg tgccgcgctg ctcgcctggc tcgggacggg cgcgggtccc tggtccggat 600 atcccgccta cgagatgctg ccagagaatc tgctcctcgg gtttggcctc ccgaccgcga 660 tcgccgcggc ctccgcgccc ggcacatcgg aggccgctct ccgcggcgca gcgcggctgt 720 ggaggtcgta tcgcctcctg tcgagcaaga agagccagct cggcaacatc cccgaagccc 780 tgtgggagcg gctccggacg atcgtccgcg cgatgggcaa tgccgacaac ctctctcgct 840 tcgagcgcgc cgaggcgatc gcggcggagg tgcgccgcct gcgcgcacag ccggcgccct 900 tcgcggcggg cgccggcctg gcggtcgctg gggtctcctc gagcggccgg ctctcgggcc 960 cggagacgca tcgtgaccga ttgtactccg gcgacggcaa cgacatcgtc atgttccaac 1020 ccggccggat ctcgccggtc gtgctgctcg ccggaaccga tcccttcttc gagctcgcac 1080 cgcccctcag ccagatgctc ttcgtcgcgc acgccaacgc gggcaccatc tccaaggtcc 1140 tgacggaagg cagccccctc atcgtgatgg caagaaacca ggcgcgaccg atgagcctcg 1200 tccacgctcg cgggttcatg gcgtgggtca accaggccat ggtgcccgac cccgagcggg 1260 gcgcgccctt cgtcgtccag cgctcgacca tcatggaatt cgagcacccc acgcctcgtt 1320 gtctccacga gcccgccggc agcgctttct ccctcgcctg cgacgaggag cacctctact 1380 ggtgcgagct ttcggctggc cggctcgagc tatggcgcca cccgcaccac cgccccggcg 1440 ccccgagccg cttcgcgtac ctcggcgagc accccattgc ggcgacctgg tacccctcgc 1500 
tcaccctcaa tgcgacccac gtgctgtggg ccgaccctga tcgcagggcc atcctcgggg 1560 tcgacaagcg caccggcgta gagcccatcg tcctcgcgga gacgcgccat cccccggcgc 1620 acgtcgtgtc cgaggaccgg gacatcttcg cgcttaccgg acagcccgac tcccgcgact 1680 ggcacgtcga gcacatccgc tccgg cgcct ccaccgtcgt ggccgactac cagcgccagc 1740 tatgggaccg ccctgacatg gtgctcaatc ggcgcggcct cttcttcacg acgaacgacc 1800 gcatcctgac gctcgcccgc agctgacatc gctcgacgcc gggccgctca tcgagggcgc 1860 ccggaccgag ctggcgaccc gccgctggcg ggccgcagct catgccgatt cggtggcgac 1920 gtagacgctg cgccagaaac gctcgagagc ccccgagaac aggaagccgg cggattgtgt 1980 catcacgatc ccgatcagct cgcggcccgg atcattgatc caggacgtcc cgaacccgcc 2040 gtcccaccca tagcgcccgg gcacctccga gaccgcgtcc ggcgccgtga ccacggccat 2100 cccataaccc cagccgtgcg tctcgaagaa gcccgggaaa aacgaggacg ccgccttctg 2160 .,% »» H ».« ** - ^ * 116 ggccggcgtg aggtgatcgg ccgtcatctc gcgcaccgag gcggcgctca agagccgccg 2220 gccctcgtgc acaccgccgt tcatgagcat gcgcgcgaac aggaggtagt cgtccaccgt 2280 cgacacgagc ccggcggcgc ccgaagggaa cgccggcggg ctggcatagg cgctctcggc 2340 cccgtcgcga tccatgcgcg tcttctcccc cgtctgctcg tcggtgaagt aaccgcagcc 2400 cgcgaaccga gcgagcttgt ccgccgggac gtgaaagtcg gtgtcccgca tcccgagcgg 2460 cgcgaggatg cgctcgcgca cgaacgcatc gaagccctgg tcggccgcgc gccccacgag 2520 caccccctgc accaggctcc ccgtgttgta catccactgc gcccccggct gatgcatgag 2580 cggcagcgtc ccgagccgcc ggatccactc gtctggcccg tgcggcgtca tcggcaccgg 2640 ctgcgcgttg acgagcccga gctcgtcgat ggcccgctgg atcggcgacg atgcgtcgaa 2700 cgagattccg aagcccatcg tgaacgtcat caggtcgcgc accgtgatcg gccgctccgc 2760 gggcaccgtc tcgtcgatcg gaccatcgat gcgcgccagc accttccggt tcgcgagctc 2820 cggcaaccat cggtcgacgg gggagtcgag gtcgagcttg ccttcctcga cgagcatcat 2880 caccgccgtc gcggtgaccg ccttcgtcat cgaggcgatc cggaagatcg tgtcccgccg 2940 catgggcgcg ctgccgccga gctcggtcac gcccaccgcg tccacgtgca cgtcgtcgcc 3000 gcgcgcgacc agccagaccg ctcccggcat ctgccccgcc gccacctccg ccgccatcac 3060 ctcgcgcgcg ggcgccagcg cgccggcccc cgcgtcctgc cctggctgcc cctcctcctc 3120 ggccccaccc aacgcgcacc ccggcgccgc cacgctgatc aaagctccca 
taaactcccg 3180 ccttctcatg accgtcgatg cctctccgag cgggggcgcc tgcccctgcc gagagcactg 3240 actgcccgcg cccgaaaaaa tcatcggtgc cccgtcacga tcgccgccgg gcgtggctcc 3300 gcccggccgc ccgctcgggc gcccgcccct ggacgagcaa agctcgcccg cccgcgctca 3360 gcacgccgct tgccatgtcc ggcctgcacc cacaccgagg agccacccac cctgatgcac 3420 ggcctcaccg agcggcaggt cctgctctcg ctcgtcaccc tcgcgctcat cctcgtgacc 3480 gcgcgcgcct ccggcgagct cgcgcggcgg ctgcgccagc ccgaggtgct cggggagctc 3540 ttcggcggcg tcgtgctggg cccctccgtc gtcggcgcgc tcgcgcccgg gttccatcga 3600 gccctcttcc aggagccggc ggtcggggtc gtgctctcgg gcatctcctg gataggcgcg 3660 ctcctcctgc tgctgatggc gggcatcgag gtcgacgtgg gcatcctgcg caaggaggcg 3720 cgccccgggg cgctctcggc gctcggcgcg atcgcgcccc cgctcgcggc gggcgccgcc 3780 ttctcggcgc tcgtgctcga tcggcccctt ccgagcggcc tcttcctcgg gatcgtgctc 3840 tcggtgacgg cggtcagcgt gatcgcgaag gtgctgatcg agcgcgagtc gatgcgccgc 3900 agctatgcgc aggtgacgct cgcggcgggg gtggtcagcg aggtcgctgc ctgggtgctc 3960 gtcgcgatga cgtcgtcgag ctacggcgcg tcgcccgcgc tggcggtcgc ccggagcgcg 4020 ctcctggcga gcggattctt gctgttcatg gtgctcgtcg ggcggcggct cacccacctc 4080 gcgatgcgct gggtggccga cgcgacgcgc gtctccaagg gacaggtgtc gctcgtcctc 4140 gtcctcacgt tcctggccgc ggcgctgacg cagcggctcg gcctgcaccc gctgctcggc 4200 gcgttcgcgc tcggcgtgct gctcaacagc gctcctcgca ccaaccgccc tctcctcgac 4260 ggcgtgcaga cgctcgtggc gggcctcttc gcgcctgtgt tcttcgtcct cgcgggcatg 4320 cgcgtcgacg tgtcgcagct gcgcacgccg gcggcgtggg ggacggtcgc gttgctgctg 4380 cggcggcgaa gcgaccgcga ggtcgtcccc gccgcgctcg gcgcgcggct cggcgggctc 4440 aggggcagcg aggcggcgct cgtggcggtg ggcctgaaca tgaagggcgg cacggacctc 4500 40 atcgtcgcga tcgtcggcgt cgagctcggg ctcctctcca acgaggctta tacgatgtac 4560 gccgtcgtcg cgctggtcac ggtgaccgcc tcacccgcgc tcctcatctg gctcgagaaa 4620 agggcgcctc cgacgcagga ggagtcggct cgcctcgagc gcgaggaggc cgcgaggcgc 4680 gcgtacatcc ccggggtcga gcggatcctc gtcccgatcg tggcgcacgc cctgcccggg 4740 ttcgccacgg acatcgtgga gagcatcgtc gcctccaagc gaaagctcgg cgagacggtc 4800 45 gacatcacgg agctctccgt ggagcagcag gcgcccggcc catcgcgcgc 
cgcgggggag 4860 gcgagccggg ggctcgcgag gctcggcgcg cgcctccgcg tcggcatctg gcggcaaagg 4920 cgcgagctgc gcggctcgat ccaggcgatc ctgcgcgcct cgcgggatca cgatctgctc 4980 gtgatcggcg cgcgatcgcc ggcgcgcgcg cgcggaatgt cgttcggtcg cctgcaggac 5040 gcgatcgtcc agcgggccga gtccaacgtg ctcgtcgtgg tgggcgaccc tccggcggcg 5100 50 gagcgcgcct ccgcgcggcg gatcctcgtc ccgatcatcg gcctcgagta ctccttcgcc 5160 gccgccgatc tcgcggccca cgtggcgctg gcgtgggacg ccgagctcgt gctgctcagc 5220 agcgcgcaga ccgatccggg cgcggtcgtc tggcgcgatc gcgagccatc ccgggtgcgc 5280 gcggtggcgc ggagcgtcgt cgacgaggcg gtcttccggg ggcgccggct cggcgtgcgc 5340 gtctcgtcgc gcgtgcacgt gggcgcgcac ccgagcgacg agataacgcg ggagctcgcg 5400 55 cgcgccccgt acgatctgct cgtgctcgga tgctacgacc atgggccgct cggccggctc 5460 tacctcggca gcacggtcga gtcggtggtg gtccggagcc gggtgccggt cgcgttgctc 5520 gtcgcgcatg gagggactcg agagcaggtg aggtgaggct tccaccgcgc tcgcccgtga 5580 ggaagcgagc gcccggctct gccgacgatc gtcactcccg gtccgtgtag gcgatcgtgc 5640 tgagcagcgc gttctccgcc tgacgcgagt cgagccgggt atgctgcacg acgatggggg 5700 cgtccgattc gatcacgctg gcatagtccg tatcgcgcgg gatcggctcg ggttcggtca 5760 gatcgttgaa ccggacgtgc cgggtgcgcc tcgctggaac ggtcacccgg taaggcccgg 5820 cggggtcgcg gtcgctgaag taaacggtga tggcgacctg cgcgtcccgg tccgacgcat 5880 ggccgtctca tcaacaggca tggctcgtca tctgcggctc aggtccgttg ctcccgcctg 5940 ggatgtagcc ctctgcgatt gcacagcgcg tccgcccgat cggcttgtcc atgtgtcctc 6000 cctcctggct cctctttggc agcctccctc tgctgtccag gagcgatggc ctcttcgctc 6060 gacgcgctcg gggatccatg gctgaggatc ctcgccgagc gctccctgcc gaccggcgcg 6120 ccgagcgccg acgggctttg aaagcgcgcg accggccagc ccggacgcgg gcccgagagg 6180 gacagtgggt ccgccgtgaa gcagagaggc gatcgaggtg gtgagatgaa acacgtcgac 6240 acgggccgac gattcggccg ccggataggg cacacgctcg gtcttctcgc gagcatggcg 6300 ctcgccggct gcggcggtcc gagcga gaaa accgtgcagg gcacgcggct cgcgcccggc 6360 gccgatgcgc gcgtcaccgc cgacgtcgac cccgacgccg cgaccacgcg gctggcggtg 6420 gacgtcgttc acctctcgcc gcccgagcgg ctcgaggccg gcagcgagcg gttcgtcgtc 6480 tggcagcgtc cgagccccga gtccccgtgg cgacgggtcg gagtgctcga 
ctacaatgct 6540 gacagccgaa gaggcaagct ggccgagacg accgtgccgt atgccaactt cgagctgctc 6600 atcaccgccg agaagcagag cagccctcag tcgccatcgt ctgccgccgt catcgggccg 6660 acgtctgtcg ggtgacatcg cgctatcagc agcgctgagc ccgccagcag gccccagggc 6720 cctgcctcga tggccttccc catcacccct gcgcactcct ccagcgacgg ccgcgcagcg 6780 acggccgcgt ccaagcaacc gccgtgccgg cgcggctcca cgcgcgcgac aggcgagcgt 6840 cctggcgcgg cctgcgcatc gctggaagga tcggcggagc atggatagag aatcgaggat 6900 cgcgatcttt gttgccatcg cagccaacgt ggcgatcgcg gcggtcaagt tcatcgccgc 6960 cgccgtgacc ggcagctcgg cgaggcgttt gccgacttcg gcggcgtccc gcgcgtgctg 7020 ctctacgaca acctcaagag cgccgtcgtc gagcgccacg gcgacgcgat ccggttccac 7080 cccacgctgc tggctctgtc ggcgcattac cgcttcgagc cgcgccccgt cgccgtcgcc 7140 cgcggcaacg agaagggccg cgtccagcgc g ccatcacgg cgtggacgac atggcgcgga 7200 aacgtcgtcg taaccgccca gcaatgtcat gggaatggcc ccttgaaatg gccccttgag 7260 ggggctggcc ggggtcgacg atatcgcgcg atctccccgt tggtaaaaga caattcccga 7320 atagatcgta aaaatttgtc agctgtgata gtggtctgtc ttacgttgcg tcttccgcac 7380 ctcgagcgag ttctctcgga taactttcaa tttttccgag gggggcttgg tctctggttc 7440 ctcaggaagc ctgatcggga cgagctaatt cccatccatt tttttgaggc tctgctcaaa 7500 gggattagat cgagtgagac agttcttttg cagtgcgcga agaacctggg cctcgaccgg 7560 aggacgatcg acgtccgcga gcgggtcagc cgctgaggat gtgcccgtcg tggcggatcg 7620 tcccatcgag cgcgcagccg aagatccgat tgcgatcgtc ggagcgagtt gccgtctgcc 7680 cggtggcgtg atcgatctga gcgggttctg gacgctcctc gagggctcgc gcgacaccgt 7740 cgggcgagtc cccgccgaac gctgggatgc agcagcgtgg tttgatcccg accccgatgc 7800 cccggggaag acgcccgtta cgcgcgcatc tttcctgagc gacgtagcct gcttcgacgc 7860 ctccttcttc ggcatctcgc ctcgcgaagc gctgcggatg gaccctgcac atcgactctt 7920 gctggaggtg tgctgggagg cgctggagaa cgccgcgatc gctccatcgg cgctcgtcgg 7980 tacggaaacg ggagtgttca tcgggatcgg cccgtcc gaa tatgaggccg cgctgccgca 8040 agcgacggcg tccgcagaga tcgacgctca tggcgggctg gggacgatgc ccagcgtcgg 8100 agcgggccga atctcgtatg ccctcgggct gcgagggccg tgtgtcgcgg tggatacggc 8160 ctattcgtcc tcgctggtgg ccgttcatct ggcctgtcag agcttgcgct 
ccggggaatg 8220 ctccacggcc ctggctggtg gggtatcgct gatgttgtcg ccgagcaccc tcgtgtggct 8280 ctcgaagacc cgggcgctgg ccagggacgg tcgctgcaag gcattttcgg cggaggccga 8340 tgggttcgga cgaggcgaag ggtgcgccgt cgtggtcctc aagcggctca gtggagcccg 8400 cgcggacggc gatcggatat tggcggtgat tcgaggatcc gcgatcaatc acgacggtgc 8460 gagcagcggt ctgaccgtgc cgaacgggag ctcccaagaa atcgtgctga aacgggccct 8520 ggcggacgca ggctgcgccg cgtcttcggt gggttatgtc gaggcacacg gcacgggcac 8580 gacgcttggt gaccccatcg aaatccaagc tctgaatgcg gtatacggcc tcgggcgaga 8640 tgtcgccacg ccgctgctga tcgggtcggt gaagaccaac cttggccatc ctgagtatgc 8700 gtcggggatc actgggctgc tgaaggtcgt cttgtccctt cagcacgggc agattcctgc 8760 gcacctccac gcgcaggcgc tgaacccccg gatctcatgg ggtgatcttc ggctgaccgt 8820 cacgcgcgcc cggacaccgt ggccggactg gaatacgccg cg acgggcgg gggtgagctc 8880 gttcggcatg agcgggacca acgcgcacgt ggtgctggaa gaggcgccgg cggcgacgtg 8940 cacaccgccg gcgccggagc gaccggcaga gctgctggtg ctgtcggcaa ggaccgcgtc 9000 agccctggat gcacaggcgg cgcggctgcg cgaccatctg gagacctacc cttcgcagtg 9060 tctgggcgat gtggcgttca gtctggcgac gacgcgcagc gcgatggagc accggctcgc 9120 ggtggcggcg acgtcgaggg aggggctgcg ggcagccctg gacgctgcgg cgcagggaca 9180 gacgtcgccc ggtgcggtgc gcagtatcgc cgattcctca cgcggcaagc tcgcctttct 9240 cttcaccgga cagggggcgc agacgctggg catgggccgt gggctgtacg atgtatggtc 9300 cgcgttccgc gaggcgttcg acctgtgcgt gaggctgttc aaccaggagc tcgaccggcc 9360 gctccgcgag gtgatgtggg ccgaaccggc cagcgtcgac gccgcgctgc tcgaccagac 9420 agccttcacc cagccggcgc tgttcacctt cgaatatgcg ctcgccgcgc tgtggcggtc 9480 gtggggtgta gagccggagt tggtcgccgg ccatagcatc ggtgagctgg tggctgcctg 9540 cgtggcgggc gtgttctcgc ttgaggacgc ggtgttcctg gtggctgcgc gcgggcgcct 9600 gatgcaggcg ctgccggccg gcggggcgat ggtgtcgatc gaggcgccgg aggccgatgt 9660 ggctgctgcg gtggcgccgc acgcagcgtc ggtgtcgatc gccgcggtca acgctc cgga 9720 ccaggtggtc atcgcgggcg ccgggcaacc cgtgcatgcg atcgcggcgg cgatggccgc 9780 gcgcggggcg cgaaccaagg cgctccacgt ctcgcatgcg ttccactcac cgctcatggc 9840 cccgatgctg gaggcgttcg ggcgtgtggc cgagtcggtg agctaccggc 
ggccgtcgat 9900 cgtcctggtc agcaatctga gcgggaaggc ttgcacagac gaggtgagct cgccgggcta 9960 ttgggtgcgc cacgcgcgag aggtggtgcg cttcgcggat ggagtgaagg cgctgcacgc 10020 ggccggtgcg ggcaccttcg tcgaggtcgg tccgaaatcg acgctgctcg gcctggtgcc 10080 tgcctgcatg ccggacgccc ggccggcgct gctcgcatcg tcgcgcgctg ggcgtgacga 10140 gccggcgacc gtgctcgagg cgctcggcgg gctctgggcc gtcggtggcc tggtctcctg 10200 ggccggcctc ttcccctcag gggggcggcg ggtgccgctg cccacgtacc cttggcagcg 10260 tggatcgaca cgagcgctac cgaaagccga cgacgcggcg cgtggcgacc gccgtgctcc 10320 gggagcgggt cacgacgagg tcgaggaggg gggcgcggtg cgcggcggcg accggcgcag 10380 cgctcggctc gaccatccgc cgcccgagag cggacgccgg gagaaggtcg aggccgccgg 10440 cgaccgtccg ttccggctcg agatcgatga gccaggcgtg cttgatcacc tcgtgcttcg 10500 ggtcacggag cggcgcgccc ctggtctggg cgaggtcgag atcgccgtcg acg cggcggg 10560 gctcagcttc aatgatgtcc agctcgcgct gggcatggtg cccgacgacc tgccgggaaa 10620 gcccaaccct ccgctgctgc tcggaggcga gtgcgccggg cgcatcgtcg ccgtgggcga 10680 gggcgtgaac ggcctcgtgg tgggccaacc ggtcatcgcc ctttcggcgg gagcgtttgc 10740 tacccacgtc accacgtcgg ctgcgctggt gctgcctcgg cctcaggcgc tctcggcgat 10800 cgaggcggcc gccatgcccg tcgcgtacct gacggcatgg tacgcgctcg acagaatagc 10860 ccgccttcag ccgggggagc gggtgctgat ccatgcggcg accggcgggg tcggtctcgc 10920 cgcggtgcag tgggcgcagc acgtgggagc cgaggtccat gcgacggccg gcacgcccga 10980 gaaacgcgcc tacctggagt cgctgggcgt gcggtatgtg agcgattccc gctcggaccg 11040 gttcgtcgcc gacgtgcgcg cgtggacggg cggcgaggga gtagacgtcg tgctcaactc 11100 gctctcgggc gagctgatcg acaagagttt caatctcctg cgatcgcacg gccggtttgt 11160 ggagctcggc aagcgcgact gttacgcgga taaccagctc gggctgcggc cgttcctgcg 11220 caatctctcc ttctcgctgg tggatctccg ggggatgatg ctcgagcggc cggcgcgggt 11280 ccgtgcgctc ttggaggagc tcctcggcct ggcgtgttca gatcgcggca cccctccccc 11340 catcgcgacg ctcccgatcg cccgtgtcgc cgatgcgttc cggagc ATGG cgcaggcgca 11400 gcatcttggg aagctcgtac tcacgctggg tgacccggag gtccagatcc gtattccaac 11460 ccacgcaggc gccggcccgt ccaccgggga tcgggacctg ctcgacaggc tcgcgtcagc 11520 tgcgccggcc gcgcgcgcgg cggcgctgga 
ggcgttcctc cgtacgcagg tctcgcaggt 11580 gctgcgcacg cccgaaatca aggtcggcgc ggaggcgctg ttcacccgcc tcggcatgga 11640 ctcgctcatg gccgtggagc tgcgcaatcg tatcgaggcg agcctcaagc tgaagctgtc 11700 gacgacgttc ctgtccacgt cccccaatat cgccttgttg gcccaaaacc tgttggatgc 11760 tctcgccaca gctctctcct tggagcgggt ggcggcggag aacctacggg caggcgtgca 11820 aaacgacttc gtctcatcgg gcgcagatca agactgggaa atcattgccc tatgacgatc 11880 aatcagcttc tgaacgagct cgagcaccag ggtatcaagc tggcggccga tggggagcgc 11940 ctccagatac aggcccccaa gaacgccctg aacccgaacc tgctcgctcg aatctccgag 12000 cacaaaagca cgatcctgac gatgctccgt cagagactcc ccgcagaatc catcgtgccc 12060 gccccagccg agcggcacgc tccgtttcct ctcacagaca tccaagaatc ctactggctg 12120 ggccggacag gagcgtttac ggtccccagc gggatccacg cctatcgcga atacgactgt 12180 acggatctcg acgtgccgag gctgagccgc gcctttcgg to aagtcgtcgc gcggcacgac 12240 atgcttcggg cccacacgct gcccgacatg atgcaggtga tcgagcctaa agtcgacgcc 12300 gacatcgaga tcatcgatct gcgcgggctc gaccggagca cacgggaagc gaggctcgtg 12360 tcgttgcgag atgcgatgtc gcaccgcatc tatgacaccg agcgccctcc gctctatcac 12420 ttcggctgga gtcgtcgccg cgagcggcaa acccgtctcg tgctcagtat cgatctcatt 12480 aacgttgacc taggcagcct gtccatcatc ttcaaggact ggctcagctt ctacgaagat 12540 cccgagacct ctctccctgt cctggagctc tcgtaccgcg attatgtact cgcgctggag 12600 tctcgcaaga agtctgaggc gcatcaacga tcgatggatt actggaagcg gcgcatcgcc 12660 gagctcccac ctccgccgac gcttccgatg aaggccgatc catctaccct gaaggagatc 12720 acacggagca cgcttccggc atggctgccg tcggactcct ggggtcgatt gaagcggcgt 12780 gtcggggagc gcgggctgac cccgacgggc gtcatcctgg ctgcattttc cgaggtgatc 12840 gggcgctgga gcgcgagccc ccggtttacg ctcaacataa cgctcttcaa ccggctcccc 12900 gtccatccgc gcgtgaacga tatcaccggg gacttcacgt cgatggtcct cctggacatc 12960 gacaccactc gcgacaagag cttcgaacag cgcgctaagc gtattcaaga gcagctgtgg 13020 gaagcgatgg atcactgcga cgtaagcggt atc gaggtcc agcgagaggc cgcccgggtc 13080 ctggggatcc aacgaggcgc attgttcccc gtggtgctca cgagcgcgct taaccagcaa 13140 gtcgttggtg tcacctcgtt gcagaggctc ggaactccgg tgtacaccag cacgcagact 13200 cctcagctgc 
tgctggatca tcagctctac gagcacgatg gggacctcgt cctcgcgtgg 13260 gacatcgtcg acggagtgtt cccgcccgac cttctggacg acatgctcga agcgtacgtc 13320 gtttttctcc ggcggctcac tgaggaacca tggggtgaac aggtgcgctg ttcgcttccg 13380 cctgcccagc tagaagcgcg aacgcgacca ggcgagcgca acgcgctgct gagcgagcat 13440 acgctgcacg gcctgttcgc ggcgcgggtc gagcagctgc ccatgcagct cgccgtggtg 13500 tcggcgcgca agacgctcac gtacgaagag ctttcgcgcc gttcgcggcg acttggcgcg 13560 cggctgcgcg agcagggggc acgcccgaac acattggtcg cggtggtgat ggagaaaggc 13620 tgggagcagg ttgtcgcggt tctcgcggtg ctcgagtcag gcgcggccta cgtgccgatc 13680 gatgccgacc taccggcgga gcgtatccac tacctcctcg atcatggtga ggtaaagctc 13740 gtgctgacgc agccatggct ggatggcaaa ctgtcatggc cgccggggat ccagcggctg 13800 ctcgtgagcg aggccggcgt cgaaggcgac ggcgaccagc ctccgatgat gcccattcag 13860 acaccttcgg atctcgcgta tgtcat CTAC acctcgggat ccacagggtt gcccaagggg 13920 gtgatgatcg atcatcgggg tgccgtcaac accatcctgg acatcaacga gcgcttcgaa 13980 atagggcccg gagacagggt gctggcgctc tcctcgctga gcttcgatct ctcggtctat 14040 gatgtgttcg ggatcctggc ggcgggcggt acgatcgtgg tgccggacgc gtccaagctg 14100 cgcgatccgg cgcattgggc agagttgatc gaacgagaga aggtgacggt gtggaactcg 14160 gtgccggcgc tgatgcggat gctcgtcgag cattttgagg gtcgccccga ttcgctcgct 14220 aggtctctgc ggctttcgct gctgagcggc gactggatcc cggtgggcct gcctggcgag 14280 ctccaggcca tcaggcccgg cgtgtcggtg atcagcctgg gcggggccac cgaagcgtcg 14340 atctggtcca tcgggtaccc cgtgaggaac gtcgacctat cgtgggcgag catcccctac 14400 ggccgtccgc tgcgcaacca gacgttccac gtgctcgatg aggcgctcga accgcgcccg 14460 gtctgggttc cggggcaact ctacattggc ggggtcgggc tggcactggg ctactggcgc 14520 agacgcgcaa gatgaagaga gagcttcctc gtgcaccccg agaccgggga gcgcctctac 14580 aagaccggcg atctgggccg ctacctgccc gatggaaaca tcgagttcat ggggcgtgag 14640 gacaaccaaa tcaagcttcg cggataccgc gttgagctcg gggaaatcga ggaaacgctc 14700 aagtcgcatc cgaacgtac g cgacgcggtg attgtgcccg tcgggaacga cgcggcgaac 14760 aagctccttc tagcctatgt ggtcccggag ggcacacgga gacgcgctgc cgagcaggac 14820 gcgagcctca agaccgagcg gatcgacgcg agagcacacg ccgccgaagc 
ggacggcttg 14880 agcgacggcg agagggtgca gttcaagctc gctcgacacg gactccggag ggacctggac 14940 ggaaagcccg tcgtcgatct gaccgggcag gatccgcggg aggcggggct ggacgtctac 15000 gcgcgtcgcc gtagcgtccg aacgttcctt gaggccccga ttccgtttgt tgagtttggt 15060 cgattcctga gctgcttgag cagcgtggag cccgacggcg cgacccttcc caaattccgt 15120 tatccatcgg cgggcagcac gtacccggtg caaacctacg cgtatgtcaa atccggccgc 15180 atcgagggcg tggacgaggg cttctattat taccacccgt tcgagcaccg tttgctgaag 15240 ctctccgatc acgggatcga gcgcggagcg cacgttcggc aaaacttcga cgtgttcgat 15300 gaagcggcgt tcaacctcct gttcgtgggc aggatcgacg ccatcgagtc gctgtatgga 15360 tcgtcgtcgc gagaattttg cctgctggag gccggatata tggcgcagct cctgatggag 15420 caggcgcctt cctgcaacat cggcgtctgt ccggtggggc aattcaattt tgaacaggtt 15480 cggccggttc tcgacctgcg acattcggac gtttacgtgc acggcatgct gggcgggcgg gtagacccgc 15540 g gcagttcca ggtctgtacg ctcggtcagg attcctcacc gaggcgcgcc 15600 acgacgcgcg gcgcccctcc cggccgcgag cagcacttcg ccgatatgct tcgcgacttc 15660 aactacccga ttgaggacca gtacatggtg cctacagtct tcgtggagct cgatgcgttg 15720 ccgctgacgt ccaacggcaa ggtcgatcgt aaggccctgc gcgagcggaa ggatacctcg 15780 attcggggca tcgccgcggc cacggcgcca cgggacgcct tggaggagat cctcgtcgcg 15840 gtcgtacggg aggtgctcgg gctggaggtg gtcgggctcc agcagagctt cgtcgatctt 15900 ggtgcgacat cgattcacat cgttcgcatg aggagcctgt tgcagaagag gctggatagg 15960 gagatcgcca tcaccgagtt gttccagtac ccgaacctcg gctcgctggc gtccggtttg 16020 cgccgagact cgagagatct agatcagcgg ccgaacatgc aggaccgagt ggaggttcgg 16080 cgcaagggca ggagacgtag ctaagagcgc cgaacaaaac caggccgagc gggccgatga 16140 gccgcaagcc cgcctgcgtc accctgggac tcatctgatc tgatcgcggg tacgcgtcgc 16200 gggtgtgcgc gttgagccgt gttgttcgaa cgctgaggaa cggtgagctc atggaagaac 16260 cgctatcgca aagagtcctc gtcatcggca tgtcgggccg ttttccgggg gcgcgggatc 16320 tggacgaatt ctggaggaac cttcgagacg gcacggaggc cgtgcagcgc ttctccgagc 16380 aggagctcgc ggc gtccgga gtcgaccccg cgctggtgct ggacccgagc tacgtccggg 16440 cgggcagcgt gctggaagac gtcgaccggt tcgacgctgc tttcttcggc atcagcccgc 16500 gcgaggcaga gctcatggat ccgcagcacc 
ggatcttcat ggaatgcgcc tgggaggcgc 16560 tggagaacgc cggatacgac ccgacggctt acgagggctc tatcggcgtg tacgccggcg 16620 ccaacatgag ctcgtacttg acgtcgaacc tccacgagca cccagcgatg atgcggtggc 16680 ccggctggtt tcagacgttg atcggcaacg acaaggatta cctcgcgacc cacgtctcct 16740 acaggctgaa tctgagaggg ccgagcatct ccgttcaaac tgcctgctcc acctcgctcg 16800 tggcggttca cttggcgtgc atgagcctcc tggaccgcga gtgcgacatg gcgctggccg 16860 gcgggattac cgtccggatc ccccatcgag ccggctatgt atatgctgag gggggcatct 16920 tctctcccga cggccattgc cgggccttcg acgccaaggc gaacggcacg atcatgggca 16980 acggctgcgg cgttgtcctc ctgaagccgc tggaccgggc gctctccgat ggtgatcccg 17040 tccgcgcggt tatccttggg tctgccacaa acaacgacgg agcgaggaag atcgggttca 17100 ctgcgcccag tgaggtgggc caggcgcaag cgatcatgga ggcgctggcg ctggcagggg 17160 tcgaggcccg gtccatccaa tacatcgaga cccacgggac cggcacgctg ctcggagacg 17220 ccatcg agac ggcggcgctg cggcgggtgt tcggtcgcga cgcttcggcc cggaggtctt 17280 gcgcgatcgg ctccgtgaag accggcatcg gacacctcga atcggcggct ggcatcgccg 17340 gtttgatcaa gacggtcttg gcgctggagc accggcagct gccgcccagc ctgaacttcg 17400 agtctcctaa cccatcgatc gatttcgcga gcagcccgtt ctacgtcaat acctctctta 17460 aggattggaa taccggctcg actccgcggc gggccggcgt cagctcgttc gggatcggcg 17520 gcaccaacgc ccatgtcgtg ctggaggaag cgcccgcggc gaagcttcca gccgcggcgc 17580 cggcgcgctc tgccgagctc ttcgtcgtct cggccaagag cgcagcggcg ctggatgccg 17640 cggcggcacg gctacgagat catctgcagg cgcaccaggg gatttcgttg ggcgacgtcg 17700 ccttcagcct ggcgacgacg cgcagcccca tggagcaccg gctcgcgatg gcggcgccgt 17760 cgcgcgaggc gttgcgagag gggctcgacg cagcggcgcg aggccagacc ccgccgggcg 17820 ccgtgcgtgg ccgctgctcc ccaggcaacg tgccgaaggt ggtcttcgtc tttcccggcc 17880 agggctctca gtgggtcggc atgggccggc agctcctggc tgaggaaccc gtcttccacg 17940 cggcgctttc ggcgtgcgac cgggccatcc aggccgaagc tggttggtcg ctgctcgcgg 18000 agctcgccgc cgacgaaggg tcctcccagc tcgagcgcat cgacgtggtg cagccggtgc 1806 0 tgttcgccct cgcggtggca tttgcggcgc tgtggcggtc gtggggtgtc gcgcccgacg 18120 tcgtgatcgg ccacagcatg ggcgaggtag ccgccgcgca tgtggccggg gcgctgtcgc 18180 tcgaggatgc 
ggtggcgatc atctgccggc gcagccggct gctccggcgc atcagcggtc 18240 agggcgagat ggcggtgacc gagctgtcgc tggccgaggc cgaggcggcg ctccgaggct 18300 acgaggatcg ggtgagcgtg gccgtgagca acagcccgcg ctcgacggtg ctctcgggcg 18360 agccggcagc gatcggcgag gtgctgtcgt ccctgaacgc gaagggggtg ttctgccgtc 18420 gggtgaaggt ggatgtcgcc agccacagcc cgcaggtcga cccgctgcgc gaggacctct 18480 tggcagccct gggcgggctc cggccgggtg cggctgcggt gccgatgcgc tcgacggtga 18540 cgggcgccat ggtagcgggc ccggagctcg gagcgaatta ctggatgaac aacctcaggc 18600 agccagtgcg cttcgccgag gtagtccagg cgcagctcca aggcggccac ggtctgttcg 18660 tggagatgag cccgcatccg atcctaacga cttcggtcga ggagatgcgg cgcgcggccc 18720 agcgggcggg cgcagcggtg ggctcgctgc ggcgggggca ggacgagcgc ccggcgatgc 18780 tggaggcgct gggcacgctg tgggcgcagg gctaccctgt accctggggg cggctgtttc 18840 gcggcgggta ccgcgggggg ccgctgccga cctatccctg gcagcgcgag cggtact gga 18900 tcgaagcgcc ggccaagagc gccgcgggcg atcgccgcgg cgtgcgtgcg ggcggtcacc 18960 cgctcctcgg tgaaatgcag accctgtcaa cccagacgag cacgcggctg tgggagacga 19020 cgctggatct caagcggctg ccgtggctcg gcgaccaccg ggtgcaggga gcggtcgtgt 19080 ttccgggcgc ggcgtacctg gagatggcga tttcgtcggg ggccgaggct ttgggcgatg 19140 gccctttgca gataactgac gtggtgctcg ccgaggcgct ggccttcgcg ggcgacgcgg 19200 cggtgttggt ccaggtggtg acgacggagc agccgtcggg gcggctgcag ttccagatcg 19260 cgagccgggc gccgggcgct ggccacgcgt ccttccgggt ccacgctcgc ggcgcgttgc 19320 tccgagtgga gcgcaccgag gtcccggctg ggcttacgct ttccgctgtg cgcgcgcggc 19380 tccaggccag catacccgcc gcggccacct acgcggagct gaccgagatg gggctgcagt 19440 acggccctgc cttccagggg attgctgagc tatggcgggg tgaaggcgag gcgctgggac 19500 gggtacgcct gcccgacgcg gccggctcgg cagcggagta tcggttgcat cctgcgctgc 19560 tggacgcgtg cttccagatc gtcggcagcc tcttcgcccg cagtggcgag gcgacgccgt 19620 gggtgcccgt ggagttgggc tcgctgcggc tcttgcagcg gccttcgggg gagctgtggt 19680 gccatgcgcg cgtcgtgaac catgggcacc aaacccccga tcggcagggc gcc gactttt 19740 gggtggtcga cagctcgggt gcagtggtcg ccgaagtttg cgggctcgtg gcgcagcggc 19800 ttccgggagg ggtgcgccgg cgcgaagaag acgattggtt cctggagctc 
gagtgggaac 19860 ccgcagcggt cggcacagcc aaggtcaacg cgggccggtg gctgctcctc ggcggcggcg 19920 gtgggctcgg cgccgcgttg cgcgcgatgc tggaggccgg cggccatgcc gtcgtgcatg 19980 cggcagagaa caacacgagc gctgccggcg tacgcgcgct cctggcaaag gcctttgacg 20040 gccaggctcc gacggcggtg gtgcacctcg gcagcctcga tgggggtggc gagctcgacc 20100 cagggctcgg ggcgcaaggc gcattggacg cgccccggag cgccgacgtc agtcccgatg 20160 ccctcgatcc ggcgctggta cgtggctgcg acagcgtgct ctggaccgtg caggccctgg 20220 ccggcatggg ctttcgagac gccccgcgat tgtggctttt gacccgcggc gcacaggccg 20280 tcggcgccgg cgacgtctcc gtgacacagg caccgctgct ggggctgggc cgcgtcatcg 20340 ccatggagca cgcggatctg cgctgcgctc gggtcgacct cgatccagcc cggcccgagg 20400 gggagctcgc tgccctgctg gccgagctgc tggccgacga cgccgaagcg gaagtcgcgt 20460 tgcgcggtgg cgagcgatgc gtcgctcgga tcgtccgccg gcagcccgag acccggcccc 20520 gggggaggat cgagagctgc gttccgaccg acgtcaccat ccgcgc GGAC agcacctacc 20580 ttgtgaccgg cggtctgggt gggctcggtc tgagcgtggc cggatggctg gccgagcgcg 20640 gcgctggtca cctggtgctg gtgggccgct ccggcgcggc gagcgtggag caacgggcag 20700 ccgtcgcggc gctcgaggcc cgcggcgcgc gcgtcaccgt ggcgaaggcg gatgtcgccg 20760 atcgggcgca gctcgagcgg atcctccgcg aggttaccac gtcggggatg ccgctgcggg 20820 gcgtcgtcca tgcggccggc atcttggacg acgggctgct gatgcagcag actcccgcgc 20880 ggtttcgtaa ggtgatggcg cccaaggtcc agggggcctt gcacctgcac gcgttgacgc 20940 gcgaagcgcc gctttccttc ttcgtgctgt acgcttcggg agtagggctc ttgggctcgc 21000 cgggccaggg caactacgcc gcggccaaca cgctctggcg cgttcctcga caccaccgga 21060 gggcgcaggg gctgccagcg ttgagcgtcg actggggcct gttcgcggag gtgggcatgg 21120 cggccgcgca ggaagatcgc ggcgcgcggc tggtctcccg cggaatgcgg agcctcaccc 21180 ccgacgaggg gctgtccgct ctggcacggc tgctcgaaag cggccgcgct caggtggggg 21240 tgatgccggt gaacccgcgg ctgtgggtgg agctctaccc cgcggcggcg tcttcgcgaa 21300 tgttgtcgcg cctggtgacg gcgcatcgcg cgagcgccgg cgggccagcc ggggacgggg 21360 acctgctccg ccgcctcgcc gctgccgagc cgagcgcgc g gagcgcgctc ctggagccgc 21420 tcctccgcgc gcagatctcg caggtgctgc gcctccccga gggcaagatc gaggtggacg 21480 ccccgctcac gagcctgggc atgaactcgc 
tgatggggct cgagctgcgc aaccgcatcg 21540 aggccatgct gggcatcacc gtaccggcaa cgctgttgtg gacctatccc acggtggcgg 21600 cgctgagcgg gcatctggcg cgggaggcat gcgaagccgc tcctgtggag tcaccgcaca 21660 ccaccgccga ctctgccgtc gagatcgagg agatgtcgca ggacgatctg acgcagttga 21720 tcgcagcaaa attcaaggcg cttacatgac tactcgcggt cctacggcac agcagaatcc 21780 gcggccatca gctgaaacaa tcattcagcg gctggaggag cggctcgctg ggctcgcaca 21840 ggcggagctg gaacggaccg agccgatcgc catcgtcggt atcggctgcc gcttccctgg 21900 cggtgcggac gctccggaag cgttttggga gctgctcgac gcggagcgcg acgcggtcca 21960 gccgctcgac atgcgctggg cgctggtggg tgtcgctccc gtcgaggccg tgccgcactg 22020 ggcggggctg ctcaccgagc cgatagattg cttcgatgct gcgttcttcg gcatctcgcc 22080 tcgggaggcg cgatcgctcg acccgcagca tcgtctgttg ctggaggtcg cttgggaggg 22140 gctcgaggac gccggtatcc cgccccggtc catcgacggg agccgcaccg gtgtgttcgt 22200 cggcgctttc acggcggact acgcgcgcac g gtcgctcgg ctgccgcgcg aggagcgaga 22260 gccaccggca cgcgtacagc acatgctcag catcgccgcc ggacggctgt cgtacacgct 22320 ggggttgcag ggaccttgcc tgaccgtcga cacggcgtgc tcgtcatcgc tggtggcgat 22380 tcacctcgcc tgccgcagcc tgcgcgcagg agagagcgat ctcgcgttgg cgggaggggt 22440 cagcgcgctc ctctcccccg acatgatgga agccgcggcg cgcacgcaag cgctgtcgcc 22500 cgatggtcgt tgccggacct tcgatgcttc ggccaacggg tccgtccgtg gcgagggctg 22560 tggcctggtc gtcctcaaac ggctctccga cgcgcaacgg gatggcgacc gcatctgggc 22620 gctgatccgg ggctcggcca tggccggtcg tcaaccatga accgggttga ccgcgcccaa 22680 cgtgctggct caggagacgg tcttgcgcga ggcgctgcgg agcgcccacg tcgaagctgg 22740 ggccgtcgat tacgtcgaga cccacggaac agggacctcg ctgggcgatc ccatcgaggt 22800 cgaggcgctg cgggcgacgg tggggccggc gcgctccgac ggcacacgct gcgtgctggg 22860 cgcggtgaag accaacatcg gccatctcga ggccgcggca ggcgtagcgg gcctgatcaa 22920 ggcagcgctt tcgctgacgc acgagcgcat cccgagaaac ctcaacttcc gcacgctcaa 22980 tccgcggatc cggctcgagg gcagcgcgct cgcgttggcg accgagccgg tgccgtggcc 23040 gcgcacggac cgcccgcgct tcgcgggggt gag ctcgttc gggatgagcg gaacgaacgc 23100 gcatgtggtg ctggaagagg cgccggcggt ggagctgtgg cctgccgcgc cggagcgctc 23160 ggcggagctt 
ttggtgctgt cgggcaagag cgagggggcg ctcgatgcgc aggcggcgcg 23220 cacctggaca gctgcgcgag tgcacccgga gctcgggctc ggggacgtgg cgttcagcct 23280 ggcgacgacg cgcagcgcga tgagccaccg gctcgcggtg gcggtgacgt cgcgcgaggg 23340 gctgctggcg gcgctctcgg ccgtggcgca ggggcagacg ccggcggggg cggcgcgctg 23400 catcgcgagc tcctcgcgcg gcaagctggc gttcctgttc accggacagg gcgcgcagac 23460 gccgggcatg ggccgggggc tttgcgcggc gtggccagcg ttccgggagg cgttcgaccg 23520 gtgcgtggcg ctgttcgacc gggagctgga ccgcccgctg tgtgggcgga cgcgaggtga 23580 ggcggggagc gccgagtcgt tgttgctcga ccagacggcg ttcacccagc ccgcgctctt 23640 cgcggtggag tacgcgctga cggcgctgtg gcggtcgtgg ggcgtagagc cggagctcct 23700 ggttgggcat agcatcgggg agctggtggc ggcgtgcgtg gcgggggtgt tctcgctgga 23760 agatggggtg aggctcgtgg cggcgcgcgg gcggctgatg caggggctct cggcgggcgg 23820 cgcgatggtg tcgctcggag cgccggaggc ggaggtggcg gcggcggtgg cgccgcacgc 23880 ggcgtcggtg tcgatcgcgg cggtca ATGG gccggagcag gtggtgatcg cgggcgtgga 23940 gcaagcggtg caggcgatcg cggcggggtt cgcggcgcgc ggcgcgcgca ccaagcggct 24000 gcatgtctcg cacgcgttcc actcgccgct gatggaaccg atgctggagg agttcgggcg 24060 ggtggcggcg tcggtgacgt accggcggcc aagcgtttcg ctggtgagca acctgagcgg 24120 gaaggtggtc acggacgagc tgagcgcgcc ggggtactgg gtgcggcacg tgcgggaggc 24180 ggtgcgcttc gcggacgggg tgaaggcgct gcacgaagcc ggcgcgggga cgttcgtcga 24240 agtgggcccg aagccgacgc tgctcgggct gttgccagcc tgcctgccgg aggcggagcc 24300 gacgctgctg gcgtcgttgc gcgccgggcg cgaggaggct gcgggggtgc tcgaggcgct 24360 gggcaggctg tgggccgccg gcggctcggt cagctggccg ggcgtcttcc ccacggctgg 24420 gcggcgggtg ccgctgccga cctatccgtg gcagcggcag cggtactgga tcgaggcgcc 24480 ctcggagcca ggccgaaggg cggccgccga tgcgctggcg cagtggttct accgggtgga 24540 ctggcccgag atgcctcgct catccgtgga ttcgcggcga gcccggtccg gcgggtggct 24600 ggtgctggcc gaccggggtg gagtcgggga ggcggccgcg gcggcgcttt cgtcgcaggg 24660 atgttcgtgc gccgtgctcc atgcgcccgc cgaggcctcc gcggttgccg agcaggtgac 24720 ccaggccctc ggtggccgc to acgactggca gggggtgctg tacctgtggg gtctggacgc 24780 cgtcgtggag gcgggggcat cggccgaaga ggtcgccaaa gtcacccatc 
ttgccgcggc 24840 gccggtgctc gcgctgattc aggcgctcgg cacggggccg cgctcacccc ggctctggat 24900 ggggcctgca cgtgacccga cggtgggcgg cgagcctgac gctgccccct gtcaggcggc 24960 gctgtggggt atgggccggg tcgcggcgct agagcatccc ggctcctggg gcgggctcgt 25020 ccggaggaga ggacctggat gcccgacgga ggtcgaggcc ctggtggccg agctgctttc 25080 gccggacgcc gaggatcagc tggcattccg ccaggggcgc cggcgcgcag cgcggcttgt 25140 ccggagggaa ggccgcccca acgcagcgcc ggtgtcgctg tctgcggagg ggagttactt 25200 ggtgacgggt gggctgggcg cccttggcct cctcgttgcg cggtggttgg tggagcgcgg 25260 cttgtgctga ggcggggcac tcagccggca cggattgccc gaccgcgagg aatggggccg 25320 agatcagccg ccagaggtgc gcgcgcgcat tgcggcgatc gaggcgctgg aggcgcaggg 25380 cgcgcgggtc accgtggcgg cggtcgacgt ggccgatgcc gaaggcatgg cggcgctctt 25440 ggcggccgtc gagccgccgc tgcggggggt agtgcacgcc gcgggtctgc tcgacgacgg 25500 gctgctggcc caccaggacg ctggtcggct cgcccgggtg ttgcgcccca aggtggaggg 25560 ggcatgggtg c tgcacaccc ttacccgcga gcagccgctg gacctcttcg tactgttttc 25620 ctcggcgtcg ggcgtcttcg gctcgatcgg ccagggcagc tacgcggcag gcaatgcctt 25680 tttggacgcg ctggcggacc tccgccgaac gcaggggctc gccgccctga gcatcgcctg 25740 gggcctgtgg gcggaggggg ggatgggctc gcaggcgcag cgccgggaac acgaggcatc 25800 gggaatctgg gcgatgccga cgagtcgggc cctggcggcg atggaatggc tgctcggtac 25860 gcgcgcgacg cagcgcgtgg tcatccagat ggattgggcc catgcgggag cggcgccgcg 25920 cgacgcgagc cgaggccgct tctgggatcg gctggtaact gccacgaaag aggcctcctc 25980 ctcggccgtg ccagctgtgg agcgctggcg caacgcgtct gttgtggaga cccgctcggc 26040 gctctacgag cttgtgcgcg gcgtggtcgc cggggtgatg ggccttaccg accagggcac 26100 gctcgacgtg cgacgaggct tcgccgagca gggcctcgac tccctgatgg ccgtggagat 26160 ccgcaaacgg cttcagggtg agctgggtat gccgctgtcg gcgacgctag cgttcgacca 26220 tccgaccgtg gagcggctgg tggaatactt gctgagccag gcgctggagc tgcaggaccg 26280 caccgacgtg cggagcgttc ggttgccggc gacagaggac ccgatcgcca tcgtgggtgc 26340 cgcctgccgc ttcccgggcg gggtcgagga cctggagtcc tactggcagc tgttgaccga gggcgtggtg 26400 gtc agcaccg aggtgccggc cgaccggtgg aatggggcag acgggcgcgt 26460 ccccggctcg ggagaggcac agagacagac 
ctacgtgccc aggggtggct ttctgcgcga 26520 ggtggagacg ttcgatgcgg cgttcttcca catctcgcct tgagcctgga cgggaggcga 26580 cccgcaacag cggctgctgc tggaagtgag ctgggaggcg atcgagcgcg cgggccagga 26640 cccgtcggcg ctgcgcgaga gccccacggg cgtgttcgtg ggcgcgggcc ccaacgaata 26700 tgccgagcgg gtgcaggaac tcgccgatga ggcggcgggg ctctacagcg gcaccggcaa 26760 catgctcagc gttgcggcgg gacggctatc atttttcctg ggcctgcacg ggccgaccct 26820 ggctgtggat acggcgtgct cctcgtcgct ggtggcgctg cacctcggct gccagagctt 26880 gcgacggggc gagtgcgacc aagccctggt tggcggggtc aacatgctgc tctcgccgaa 26940 gaccttcgcg ctgctctcac ggatgcacgc actttcgccc ggcgggcggt gcaagacgtt 27000 ctcggccgac gcggacggct acgcgcgggc cgagggctgc gccgtggtgg tgctcaagcg 27060 gctctccgac gcgcagcgcg accgcgaccc catcctggcg gtgatccggg gtacggcgat 27120 ggcccgagca caatcatgat gcgggctgac agtgcccagc ggccctgccc aggaggcgct 27180 gttacgccag gcgctggcgc acgcaggggt ggttccggcc gacgtcgatt tcgtggaatg 27240 ccacgg GACC gggacggcgc tgggcgaccc gatcgaggtg gcgacgtgta cgtgcgctga 27300 cgggcaagcc cgccctgcgg accgaccgct gatcctggga gccgccaagg ccaaccttgg 27360 gcacatggag cccgcggcgg gcctggccgg cttgctcaag gcggtgctcg cgctggggca 27420 agagcaaata ccagcccagc cggagctggg cgagctcaac ccgctcttgc cgtgggaggc 27480 gctgccggtg gcggtggccc gcgcagcggt gccgtggccg cgcacggacc gcccgcgctt 27540 cgcgggggtg agctcgttcg ggatgagcgg aacgaacgcg catgtggtgc tggaagaggc 27600 gccggcggtg gagctgtggc ctgccgcgcc ggagcgctcg gcggagcttt tggtgctgtc 27660 gggcaagagc gagggggcgc tcgatgcgca ggcggcgcgg ctgcgcgagc acctggacat 27720 gcacccggag ctcgggctcg gggacgtggc gttcagcctg gcgacgacgc gcagcgcgat 27780 gaaccaccgg ctcgcggtgg cggtgacgtc gcgcgagggg ctgctggcgg cgctttcggc 27840 cgtggcgcag gggcagacgc cgccgggggc ggcgcgctgc atcgcgagct cgtcgcgcgg 27900 ttcctgttca caagctggcg ccggacaggg cgcgcagacg ccgggcatgg gccgggggct 27960 ttgcgcggcg tggccagcgt tccgggaggc gttcgaccgg tgcgtggcgc tgttcgaccg 28020 ggagctggac cgcccgctgc gcgaggtgat gtgggcggag ccggggagcg ccgagtcgtt 2808 0 gttgctcgac cagacggcgt tcacccagcc cgcgctcttc acggtggagt acgcgctgac 28140 ggcgctgtgg 
cggtcgtggg gcgtagagcc ggagctggtg gctgggcata gcgccgggga 28200 gctggtggcg gcgtgcgtgg cgggggtgtt ctcgctggaa gatggggtga ggctcgtggc 28260 ggcgcgcggg cggctgatgc aggggctctc ggcgggcggc gcgatggtgt cgctcggagc 28320 gccggaggcg gaggtggcgg cggcggtggc gccgcacgcg gcgtcggtgt cgatcgcggc 28380 ggtcaatggg ccggagcagg tggtgatcgc gggcgtggag caagcggtgc aggcgatcgc 28440 ggcggggttc gcggcgcgcg gcgcgcgcac caagcggctg catgtctcgc acgcgtccca 28500 ctcgccgctg atggaaccga tgctggagga gttcgggcgg gtggcggcgt cggtgacgta 28560 ccggcggcca agcgtttcgc tggtgagcaa cctgagcggg aaggtggtcg cggacgagct 28620 gagcgcgccg gggtactggg tgcggcacgt gcgggaggcg gtgcgcttcg cggacggggt 28680 gaaggcgctg cacgaagccg gtgcgggcac gttcgtcgaa gtgggcccga agccgacgct 28740 gctcgggctg ttgccagcct gcctgccgga ggcggagccg acgctgctgg cgtcgttgcg 28800 cgccgggcgc gaggaggctg cgggggtgct cgaggcgctg ggcaggctgt gggccgccgg 28860 cggctcggtc agctggccgg gcgtcttccc cacggctggg cggcgggtgc cgctgcc gac 28920 ctatccgtgg cagcggcagc ggtactggcc cgacatcgag cctgacagcc gtcgccacgc 28980 agccgcggat ccgacccaag gctggttcta tcgcgtggac tggccggaga tacctcgcag 29040 cctccagaaa tcagaggagg cgagccgcgg gagctggctg gtattggcgg ataagggtgg 29100 agtcggcgag gcggtcgctg cagcgctgtc gacacgtgga cttccatgcg tcgtgctcca 29160 tgcgccggca gagacatccg cgaccgccga gctggtgacc gaggctgccg gcggtcgaag 29220 cgattggcag gtagtgctct acctgtgggg tctggacgcc gtcgtcggtg cggaggcgtc 29280 gatcgatgag atcggcgacg cgacccgtcg tgctaccgcg ccggtgctcg gcttggctcg 29340 gtttctgagc accgtgtctt gttcgccccg actctgggtc gtgacccggg gggcatgcat 29400 cgttggcgac gagcctgcga tcgccccttg tcaggcggcg ttatggggca tgggccgggt 29460 ggcggcgctc gagcatcccg gggcctgggg cgggctcgtg gacctggatc cccgagcgag 29520 gccagcccga cccgccccaa tcgacggcga gatgctcgtc accgagctat tgtcgcagga 29580 gaccgaggat cagctcgcct tccgccatgg gcgccggcac gcggcacggc tggtggccgc 29640 cccgccacag gggcaagcgg caccggtgtc gctgtctgcg gaggcgagct acctggtgac 29700 gggaggcctc ggtgggctgg gcctgatcgt ggcccagtgg ctggtggagc tgg gagcgcg 29760 gcacttggtg ctgaccagcc ggcgcgggtt gcccgaccgg caggcgtggt 
gcgagcagca 29820 gccgcctgag atccgcgcgc ggatcgcagc ggtcgaggcg ctggaggcgc ggggtgcacg 29880 ggtgaccgtg gcagcggtgg acgtggccga cgtcgaaccg atgacagcgc tggtttcgtc 29940 ggtcgagccc ccgctgcgag gggtggtgca cgccgctggc gtcagcgtca tgcgtccact 30000 ggcggagacg gacgagaccc tgctcgagtc ggtgctccgt cccaaggtgg ccgggagctg 30060 gctgctgcac cggctgctgc acggccggcc tctcgacctg ttcgtgctgt tctcgtcggg 30120 cgcagcggtg tggggtagcc atagccaggg tgcgtacgcg gcggccaacg ctttcctcga 30180 cgggctcgcg catcttcggc gttcgcaatc gctgcctgcg ttgagcgtcg cgtggggtct 30240 gtgggccgag ggaggcatgg cggacgcgga ggctcatgca cgtctgagcg acatcggggt 30300 tctgcccatg tcgacgtcgg cagcgttgtc ggcgctccag cgcctggtgg agaccggcgc 30360 ggctcagcgc acggtgaccc ggatggactg ggcgcgcttc gcgccggtgt acaccgctcg 30420 agggcgtcgc aacctgcttt cggcgctggt cgcagggcgc gacatcatcg cgccttcccc 30480 gcaacccgga tccggcggca actggcgtgg cctgtccgtt gcggaagccc gcgtggctct 30540 gcacgagatc gtccatgggg ccgtcgctcg ggtgctgggc ttcctc GACC cgagcgcgct 30600 cgatcctggg atggggttca atgagcaggg cctcgactcg ttgatggcgg tggagatccg 30660 caacctcctt caggctgagc tggacgtgcg gctttcgacg acgctggcct ttgatcatcc 30720 gacggtacag cggctggtgg agcatctgct cgtcgatgta ctgaagctgg aggatcgcag 30780 cgacacccag catgttcggt cgttggcgtc agacgagccc atcgccatcg tgggagccgc 30840 ctgccgcttc ccgggcgggg tggaggacct ggagtcctac tggcagctat tggccgaggg 30900 cgtggtggtc agcgccgagg tgccggccga ccggtgggat gcggcggact ggtacgaccc 30960 tgatccggag atcccaggcc ggacttacgt gaccaaaggc gccttcctgc gcgatttgca 31020 gagattggat gcgaccttct tccgcatctc gcctcgcgag gcgatgagcc tcgacccgca 31080 gcagcggttg ctcctggagg taagctggga agcgctcgag agcgcgggta tcgctccgga 31140 gatagcccca tacgctgcga ccggggtgtt cgtgggtgcg gggcccaatg agtactacac 31200 gcagcggctg cgaggcttca ccgacggagc ggcagggttg tacggcggca ccgggaacat 31260 gctcagcgtt acggctggac ggctgtcgtt tttcctgggt ctgcacggcc cgacgctggc 31320 catggatacg gcgtgctcgt catccctggt cgcgctgcac ctcgcctgcc agagcctgcg 31380 actgggcgag tgcgatcaag cgctggttgg cggggtcaa c gtgctgctcg cgccggagac 31440 cttcgtgctg ctctcacgga tgcgcgcgct 
ttcgcccgac gggcggtgca agacgttctc 31500 ggccgacgcg gacggctacg cgcggggcga ggggtgcgcc gtggtggtgc tcaagcggct 31560 gcgcgatgcg cagcgcgccg gcgactccat cctggcgctg atccggggaa gcgcggtgaa 31620 ccacgacggc ccgagcagcg ggctgaccgt acccaacgga cccgcccagc aagcattgct 31680 gcgccaggcg ctttcgcaag caggcgtgtc tccggtcgac gttgattttg tggagtgtca 31740 cgggacaggg acggcgctgg gcgacccgat cgaggtgcag gcgctgagcg aggtgtatgg 31800 tccagggcgc tccggggacc gaccgctggt gctgggggcc gccaaggcca acgtcgcgca 31860 tctggaggcg gcatctggct tggccagcct gctcaaggcc gtgcttgcgc tgcggcacga 31920 gcagatcccg gcccagccgg agctggggga gctcaacccg cacttgccgt ggaacacgct 31980 gccggtggcg gtgccacgta aggcggtgcc gtgggggcgc ggcgcacgcc cgcgtcgggc 32040 cggcgtgagc gcgttcgggt tgagcggaac caacgtgcat gtcgtgctgg aggaggcacc 32100 ggaggtggag ccggcgcccg cggcgccggc gcgaccggtg gagctggtcg tgctatcggc 32160 caagagcgcg gcggcgctgg acgccgcggc ggcacggctc tcggcgcacc tgtccgcgca 32220 cccggagctg agcctcggcg acgtggcgtt c agcctggcg acgacgcgca gcccgatgga 32280 gcaccggctc gccatcgcga cgacctcgcg cgaggccctg cgaggcgcgc tggacgccgc 32340 ggcgcagcaa aagacgccgc agggcgcggt gcgcggcaag gccgtgtcct cacgcggtaa 32400 gctggctttc ctgttcaccg gacagggcgc gcaaatgccg ggcatgggcc gtgggctgta 32460 cgaaacgtgg cctgcgttcc gggaggcgtt cgaccggtgc gtggcgctct tcgatcggga 32520 gatcgaccag cctctgcgcg aggtgatgtg ggctgcgccg ggcctcgctc aggcggcgcg 32580 gctcgatcag accgcgtacg cgcagccggc tctctttgcg ctggagtacg cgctggctgc 32640 cctgtggcgt tcgtggggcg tggagccgca ggtcatagca cgtactgctc tcggcgagct 32700 ggtcgccgcc tgcgtggcgg gcgtgttctc gctcgaagat gcggtgaggt tggtggccgc 32760 gcgcgggcgg ctgatgcagg cgctacccgc cggcggtgcc atggtagcca tcgcagcgtc 32820 cgaggccgag gtggccgcct ccgtggcgcc ccacgccgcc acggtgtcga tcgccgcggt 32880 caacggtcct gacgccgtcg tgatcgccgg cgccgaggta caggtgctcg ccctcggcgc 32940 May 1 gacgttcgcg gcgcgtggga tacgcacgaa gaggctcgcc gtctcccatg cgttccactc 33000 gccgctcatg gatccgatgc tggaagactt ccagcgggtc gctgcgacga tcgcgtaccg 33060 cgcgccagac cgcccggtgg tgtcgaatgt caccggccac gtcgcaggcc ccgagatcgc 33120 
cacgcccgag tattgggtcc ggcatgtgcg aagcgccgtg cgcttcggcg acggggcaaa 33180 ggcgttgcat gccgcgggtg ccgccacgtt cgtcgaggtt ggcccgaagc cggtcctgct 33240 cgggctgttg ccagcgtgcc tcggggaagc ggacgcggtc ctcgtgccgt cgctacgcgc 33300 ggaccgctcg gaatgcgagg tggtcctcgc ggcgctcggg gcttggtatg cctggggggg 33360 tgcgctcgac tggaagggcg tgttccccga tggcgcgcgc cgcgtggctc tgcccatgta 33420 tccatggcag cgtgagcgcc attggatgga cctcaccccg cgaagcgccg cgcctgcagg 33480 gatcgcaggt cgctggccgc tggctggtgt cgggctctgc atgcccggcg ctgtgttgca 33540 ccacgtgctc tcgatcggac cacgccatca gcccttcctc ggtgatcacc tcgtgtttgg 33600 caaggtggtg gtgcccggcg cctttcatgt cgcggtgatc ctcagcatcg ccgccgagcg 33660 ctggcccgag cgggcgatcg agctgacagg cgtggagttc tcgcgatgga ctgaaggcca 33720 gcccgaccag gaggtcgagc tccacgccgt gctcaccccc gaagccgccg gggatggct to 33780 cctgttcgag ctggcgaccc tggcggcgcc ggagaccgaa cgccgatgga cgacccacgc 33840 ccgcggtcgg gtgcagccga cagacggcgc gcccggcgcg ttgccgcgcc tcgaggtgct 33900 ggaggaccgc gcgatccagc ccctcgactt cgccggattc ctcgacaggt tatcggcggt 33960 gcggatcggc tggggtccgc tttggcgatg gctgcaggac gggcgcgtcg gcgacgaggc 34020 ctcgcttgcc accctcgtgc cgacctatcc gaacgcccac gacgtggcgc ccttgcaccc 34080 gatcctgctg gacaacggct ttgcggtgag cctgctgtca acccggagcg agccggagga 34140 cgacgggacg cccccgctgc cgttcgccgt ggaacgggtg cggtggtggc gggcgccggt 34200 tggaagggtg cggtgtggcg gcgtgccgcg gtcgcaggca ttcggtgtct cgagcttcgt 34260 gctggtcgac gaaactggcg aggtggtcgc cgaggtggag ggatttgttt gccgccgggc 34320 gccgcgagag gtgttcctgc ggcaggagtc gggcgcgtcg actgcagcct tgtaccgcct 34380 cgactggccc gaagcgccct tgcccgatgc gcctgcggaa cggatcgagg agagctgggt 34440 cgtggtggca gcacctggct cggagatggc cgcggcgctc gcaacacggc tcaaccgctg 34500 cgtcctcgcc gaacccaaag gcctcgaggc ggccctcgcg ggggtgtctc ccgcaggtgt 34560 gatctgcctc tgggaggctg gagcccacga gcggcggcgc to ggaagctccg gcgtgtggc 34620 gaccgagggc ctctcggtgg tgcaggcgct cagggaccgc gcggtgcgcc tgtggtgggt 34680 gaccatgggc gcagtggccg tcgaggccgg tgagcgggtg caggtcgcca cagcgccggt 34740 atggggcctc ggccggacag tgatgcagga gcgcccggag 
ctcagctgca ctctggtgga 34800 tttggagccg gaggccgatg cagcgcgctc agctgacgtt ctgttgcggg agctcggtcg 34860 cgctgacgac gagacacagg tggctttccg ttccggaaag cgccgcgtag cgcggctggt 34920 caaagcgacg acccccgaag ggctcctggt ccctgacgca gagtcctatc gactggaggc 34980 tgggcagaag ggcacattgg accagctccg cctcgcgccg gcacagcgcc gggcacctgg 35040 cccgggcgag gtcgagatca aggtaaccgc ctcggggctc aacttccgga ccgtcctcgc 35100 tgtgctggga atgtatccgg gcgacgccgg gccgatgggc ggagattgtg ccggtgtcgc 35160 cacggcggtg ggccaggggg tgcgccacgt cgcggtcggc gatgctgtca tgacgctggg 35220 gacgttgcat cgattcgtca cggtcgacgc gcggctggtg gtccggcagc ctgcagggct 35280 gactcccgcg caggcagcta cggtgccggt cgcgttcctg acggcctggc tcgctctgca 35340 cgacctgggg aatctgcggc gcggcgagcg ggtgctgatc catgctgcgg ccggcggtgt 35400 gggcatggcc gcggtgcaaa tcgcccgatg gataggggcc gagg tgttcg ccacggcgag 35460 cccgtccaag tgggcagcgg ttcaggccat gggcgtgccg cgcacgcaca tcgccagctc 35520 gcggacgctg gagtttgctg agacgttccg gcaggtcacc ggcggccggg gcgtggacgt 35580 ggtgctcaac gcgctggccg gcgagttcgt ggacgcgagc ctgtccctgc tgtcgacggg 35640 cgggcggttc ctcgagatgg gcaagaccga catacgggat cgagccgcgg tcgcggcggc 35700 gcatcccggt gttcgctatc gggtattcga catcctggag ctcgctccgg atcgaactcg 35760 agagatcctc gagcgcgtgg tcgagggctt tgctgcggga catctgcgcg cattgccggt 35820 gcatgcgttc gcgatcacca aggccgaggc agcgtttcgg ttcatggcgc aagcgcggca 35880 tcagggcaag gtcgtgctgc tgccggcgcc ctccgcagcg cccttggcgc cgacgggcac 35940 cgtactgctg accggtgggc tgggagcgtt ggggctccac gtggcccgct ggctcgccca 36000 gcagggcgtg ccgcacatgg tgctcacagg tcggcggggc ctggatacgc cgggcgctgc 36060 caaagccgtc gcggagatcg aagcgctcgg cgctcgggtg acgatcgcgg cgtcggatgt 36120 cgccgatcgg aatgcgctgg aggctgtgct ccaggccatt ccggcggagt ggccgttaca 36180 gggcgtgatc catgcagccg gagcgctcga tgatggtgtg cttgatgagc agaccaccga 36240 ccgcttctcg cgggtgctgg caccgaaggt gactggc gcc tggaatctgc atgagctcac 36300 ggcgggcaac gatctcgctt tcttcgtgct gttctcctcc atgtcggggc tcttgggctc 36360 ggccgggcag tccaactatg cggcggccaa caccttcctc gacgcgctgg ccgcgcatcg 36420 gcgggccgaa ggcctggcgg 
cgcagagcct cgcgtggggc ccatggtcgg acggaggcat 36480 ggcagcgggg ctcagcgcgg cgctgcaggc gcggctcgct cggcatggga tgggagctct 36540 gtcgccggct cagggcaccg cgctgctcgg gcaggcgctg gctcggccgg aaacgcagct 36600 cggggcgatg tcgctcgacg tgcgtgcggc aagccaagct tcgggagcgg cagtgccgcc 36660 tgtgtggcgc gcgttggtgc gcgcggaggc gcgccatacg gcggctgggg cgcagggggc 36720 attggccgcg cgtcttgggg cgctgcccga ggcgcgtcgc gccgacgagg tgcgcaaggt 36780 cgtgcaggcc gagatcgcgc gcgtgctttc atggagcgcc gcgagcgccg tgcccgtcga 36840 tcggccgctg tcggacttgg gcctcgactc gctcacggcg gtggagctgc gcaacgtgct 36900 gtgggtgcga cggccagcgg gacgctggca cgctgccggc cgacggtcga ttcgatcacc 36960 cgcgctcacg cgctggctgc tcgataaggt cctggccgtg gccgagccga gcgtatcgtc 37020 cgcaaagtcg tcgccgcagg tcgccctcga cgagcccatt gccatcatcg gcatcggctg 37080 ccgtttccca ggcggcgtgg ccgatccgga gtc gttttgg cggctgctcg aagagggcag 37140 cgatgccgtc gtcgaggtgc cgcatgagcg atgggacatc gacgcgttct atgatccgga 37200 tccggatgtg cgcggcaaga tgacgacacg ctttggcggc ttcctgtccg atatcgaccg 37260 gttcgatccg gccttcttcg gcatctcgcc gcgcgaagcg acgaccatgg atccgcagca 37320 gcggctgctc ctggagacga gctgggaggc gttcgagcgc gccgggattt tgcccgagcg 37380 gctgatgggc agcgataccg gcgtgttcgt ggggctcttc taccaggagt acgctgcgct 37440 cgccggcggc atcgaggcgt tcgatggcta tctaggcacc ggcaccacgg ccagcgtcgc 37500 ctcgggcagg atctcttatg tgctcgggct aaaggggccg agcctgacgg tggacaccgc 37560 gtgctcctcg tcgctggtcg cggtgcacct ggcctgccag gcgctgcggc ggggcgagtg 37620 ttcggtggcg ctggccggcg gcgtggcgct gatgctcacg ccggcgacgt tcgtggagtt 37680 cagccggctg cgaggcctgg ctcccgacgg acggtgcaag agcttctcgg ccgcagccga 37740 cggcgtgggg tggagcgaag gctgcgccat gctcctgctc aaaccgcttc gcgatgcgca 37800 gcgcgatggg gatccgatcc tggcggtgat ccgcggcacc gcggtgaacc aggatgggcg 37860 cagcaacggg ctgacggcgc ccaacgggtc gtcgcagcaa gaggtgatcc gtcgggccct 37920 ggagcaggcg gggctggctc cggcgg ACGT cagctacgtc gagtgccacg gcaccggcac 37980 gacgttgggc gaccccatcg aagtgcaggc cctgggcgcc gtgctggcac aggggcgacc 38040 ctcggaccgg ccgctcgtga tcgggtcggt gaagtccaat atcggacata cgcaggctgc 38100 
ggcgggcgtg gccggtgtca tcaaggtggc gctggcgctc gagcgcgggc ttatcccgag 38160 gagcctgcat ttcgacgcgc ccaatccgca cattccgtgg tcggagctcg ccgtgcaggt 38220 ggccgccaaa cccgtcgaat ggacgagaaa cggcgtgccg cgacgagccg gggtgagctc 38280 agcgggacca gtttggcgtc acgcgcacgt ggtgctggag gaggcgccag cggcggcgtt 38340 cgcgcccgcg gcggcgcgtt cagcggagct tttcgtgctg tcggcgaaga gcgccgcggc 38400 gctggacgcg caggcggcgc ggctttcggc gcacgtcgtt gcgcacccgg agctcggcct 38460 cggcgacctg gcgttcagcc tggcgacgac ccgcagcccg atgacgtacc ggctcgcggt 38520 ggcggcgacc tcgcgcgagg cgctgtctgc cgcgctcgac acagcggcgc aggggcaggc 38580 gccgcccgca gcggctcgcg gccacgcttc cacaggcagc gccccaaagg tggttttcgt 38640 ctttcctggc cagggctccc agtggctggg catgggccaa aagctcctct cggaggagcc 38700 cgtcttccgc gacgcgctct cggcgtgtga ccgagcgatt caggccgaag ccggctggtc 38760 gctgctcgcc gagctcgcg g ccgatgagac cacctcgcag ctcggccgca tcgacgtggt 38820 gcagccggcg ctgttcgcga tcgaggtcgc gctgtcggcg ctgtggcggt cgtggggcgt 38880 cgagccggat gcagtggtag gccacagcat gggcgaagtg gcggccgcgc acgtcgccgg 38940 cgccctgtcg ctcgaggatg ctgtagcgat catctgccgg cgcagcctgc tgctgcggcg 39000 gatcagcggc caaggcgaga tggcggtcgt cgagctttcc ctggccgagg ccgaggcagc 39060 gctcctgggc tacgaagacc ggctcagcgt ggcggtgagc aacagcccgc gctcgacggt 39120 gctggcgggc gagccggcag cgctcgcaga ggtgctggcg atccttgcgg caaagggggt 39180 gttctgccgt cgagtcaagg tggacgtcgc cagccacagc ccacagatcg acccgctgcg 39240 cgacgagcta ttggcagcat tgggcgagct cgagccgcga caagcgaccg tgtcgatgcg 39300 ctcgacggtg acgagcacga tcatggcggg cccggagctc gtggcgagct actgggcgga 39360 cagccggtgc caacgttcga agcggtgcaa gcttcgccga aagacggtca tcgttgatgg 39420 tgggctgttc gtggagatga gcccgcatcc gatcctgacg acatcggtcg aggagatccg 39480 acgggcgacg aagcgggagg gagtcgcggt gggctcgttg cggcgtggac aggacgagcg 39540 cctgtccatg ttggaggcgc tgggagcgct ctgggtacac ggccaggcgg tgggctggga 39600 gcggctgttc t ccgcgggcg gcgcgggcct ccgtcgcgtg ccgctgccga cctatccctg 39660 gcagcgcgag cggtactggg tcgatgcgcc gaccggcggc gcggcgggcg gcagccgctt 39720 tgctcatgcg ggcagtcacc cgctcctggg tgaaatgcag 
accctgtcga cccagaggag 39780 cacgcgcgtg tgggagacga cgctggatct caaacggctg ccgtggctcg gcgatcaccg 39840 ggtgcagggg gcggtcgtgt tcccgggcgc ggcgtacctg gagatggcgc tttcgtccgg 39900 ggccgaggcc ttgggtgacg gtccgctcca ggtcagcgat gtggtgctcg ccgaggcgct 39960 ggccttcgcg gatgatacgc cggcggcggt gcaggtcatg gcgaccgagg agcgaccagg 40020 ccgcctgcaa ttccacgttg cgagccgggt gccgggccac ggcggtgctg cctttcgaag 40080 ccatgcccgc ggggtgctgc gccagatcga gcgcgccgag gtcccggcga ggctggatct 40140 ggccgcgctt cgtgcccggc ttcaggccag cgcacccgct gcggctacct atgcggcgct 40200 ggccgagatg gggctcgagt acggcccagc gttccagggg cttgtcgagc tgtggcgggg 40260 ggagggcgag gcgctgggac gtgtgcggct ccccgaggcc gccggctccc cagccgcgtg 40320 ccggctccac cccgcgctct tggatgcgtg cttccacgtg agcagcgcct tcgctgaccg 40380 cggcgaggcg acgccatggg tacccgtgga aatcggctcg ctgcggtggt tccagcggcc 40440 ctg gtcgggggag tggtgtc atgcgcggag tgtgagccac ggaaagccaa cacccgaccg 40500 gcggagtacc gacttctggg tggtcgacag cacgggcgcg atcgtcgccg agatctccgg 40560 gctcgtggcg cagcggctcg cgggaggtgt acgccggcgc gaagaagacg actggttcat 40620 ggagccggct tgggaaccga ccgcggtccc cggatccgag gtcatggcgg gccggtggct 40680 gctcatcggc tcgggcggcg ggctcggcgc tgcgctccac tcggcgctga cggaagctgg 40740 ccattccgtc gtccacgcga cagggcgcgg cacgagcgcc gccgggttgc aggcactctt 40800 gacggcgtcc ttcgacggcc aggccccgac gtcggtggtg gcctcgatga cacctcggca 40860 gcgtggcgtg ctcgacgcgg atgccccctt cgacgccgat gcgcttgagg agtcgctggt 40920 gcgcggctgc gacagcgtgc tctggaccgt gcaggccgtg gccggggcgg gcttccgaga 40980 tcctccgcgg ttgtggctcg tgacacgcgg cgctcaggcc atcggcgccg gcgacgtctc 41040 tgtggcgcaa gcgccgctcc tggggctggg ccgcgttatc gccttggagc acgccgagct 41100 gcgctgcgct cggatcgacc tcgatccagc gcggcgcgac ggagaagtcg atgagctgct 41160 tgccgagctg ttggccgacg acgccgagga ggaagtcgcg tttcgcggcg gtgagcggcg 41220 ctcgtccgaa cgtggcccgg ggctgcccga gaccgactgc cgagagaaaa tcgagcccgc 41280 ggaagg GGCC ccgttccggc tggagatcga tgggtccggc gtgctcgacg acctggtgct 41340 ccgagccacg gagcggcgcc ctcctggccc gggcgaggtc gagatcgccg tcgaggcggc 41400 ggggctcaac tttctcgacg 
tgatgagggc catggggatc taccctgggc ccggggacgg 41460 tccggttgcg ctgggcgccg agtgctccgg ccgaattgtc gcgatgggcg aaggtgtcga 41520 gagccttcgt atcggccagg acgtcgtggc cgtcgcgccc ttcagtttcg gcacccacgt 41580 caccatcgac gcccggatgc tcgcacctcg ccccgcggcg ctgacggccg cgcaggcagc 41640 gtcgcattca cgcgctgccc tgacggcctg gtacggtctc gtccatctgg ggaggctccg 41700 ggccggcgag cgcgtgctca tccactcggc gacggggggc accgggctcg ctgctgtgca 41760 gatcgcccgc cacctcggcg cggagatatt tgcgaccgct ggtacaccgg agaagcgggc 41820 gtggctgcgc gagcagggga tcgcgcacgt gatggactcg cggtcgctgg acttcgccga 41880 gcaagtgctg gccgcgacga agggcgaggg ggtcgacgtc gtgttgaact cgctgtctgg 41940 cgccgcgatc gacgcgagcc tttcgaccct cgtgccggac ggccgcttca tcgagctcgg 42000 caagacggac atctatgcag atcgctcgct ggggctcgct cacttcagga agagcctgtc 42060 ctacagcgcc gtcgatcttg cgggcttggc cgtgcgtcgg cccgagcgcg tcgcagcgct 4212 0 gctggcggag gtggtggacc tgctcgcacg gggagcgctg cagccgcttc cggtagagat 42180 cttccccctc tcgcgggccg cggacgcgtt ccggaaaatg gcgcaagcgc agcatctcgg 42240 gaagctcgtg ctcgcgctgg aggacccgga cgtgcggatc cgcgttccgg gcgaatccgg 42300 cgtcgccatc cgcgcggacg gcgcctacct cgtgaccggc ggtctggggg ggctcggtct 42360 gagcgtggct ggatggctgg ccgagcaggg ggctgggcat ctggtgctgg tgggccgctc 42420 cggcgcggtg agcgcggagc agcagacggc tgtcgccgcg ctcgaggcgc acggcgcgcg 42480 gcgagggcag tgtcacggta acgtcgccga tcgggcgcag tcctccgcga atggagcgga 42540 ggttaccgcg tcggggatgc cgctccgcgg cgtcgttcat gcggccggaa tcctggacga 42600 cgggctgctg atgcagcaaa cccccgcgcg gttccgcgcg gtcatggcgc ccaaggtccg 42660 aggggccttg cacctgcatg cgttgacacg cgaagcgccg ctctccttct tcgtgctgta 42720 cgcttcggga gcagggctct tgggctcgcc gggccagggc aactacgccg cggccaacac 42780 gttcctcgac gcactggcac accaccggag ggcgcagggg ctgccagcat tgagcatcga 42840 ctggggcctg ttcgcggacg tgggtttggc cgccgggcag caaaatcgcg gcgcacggct 42900 ggtcacccgc gggacgcgga gcctcacccc cgacgaaggg ctgtgggcgc tcgagcg cct 42960 gctcgacggc gatcgcaccc aggccggggt catgccgttc gacgtgcggc agtgggtgga 43020 gttctacccg gcggcggcat cttcgcggag gttgtcgcgg ctcatgacgg cacggcgcgt 43080 
ggcttccggt cggctcgccg gggatcggga cctgctcgaa cggctcgcca ccgccgaggc 43140 gggcgcgcgg gcagggatgc tgcaggaggt cgtgcgcgcg caggtctcgc aggtgctgcg 43200 cctctccgaa ggcaagctcg acgtggatgc gccgctcacg agcctgggaa tggactcgct 43260 gagctgcgca gatggggcta accgcatcga ggccgtgctc ggcatcacca tgccggcgac 43320 cctgctgtgg acctacccca cggtggcagc gctgagtgcg catctggctt ctcatgtcgt 43380 ctctacgggg gatggggaat ccgcgcgccc gccggataca gggagcgtgg ctccaacgac 43440 ccacgaagtc gcttcgctcg acgaagacgg gttgttcgcg ttgattgatg agtcactcgc 43500 gcgcgcggga aagaggtgat tgcgtgacag accgagaagg ccagctcctg gagcgcttgc 43560 gtgaggttac tctggccctt cgcaagacgc tgaacgagcg cgataccctg gagctcgaga 43620 agaccgagcc gatcgccatc gtggggatcg gctgccgctt ccccggcgga gcgggcactc 43680 cggaggcgtt ctgggagctg ctcgacgacg ggcgcgacgc gatccggccg ctcgaggagc 43740 gctgggcgct cgtaggtgtc gacccaggcg acgacgtacc gcgctgggcg ggg ctgctca 43800 ccgaggccat cgacggcttc gacgccgcgt tcttcggtat cgccccccgg gaggcacggt 43860 cgctcgaccc gcagcatcgc ctgctgctgg aggtcgcctg ggaggggttc gaagacgccg 43920 gcatcccgcc caggtccctc gtcgggagcc gcaccggcgt gttcgtcggc gtctgcgcca 43980 cggagtacct ccacgccgcc gtcgcgcacc agccgcgcga agagcgggac gcgtacagca 44040 ccaccggcaa catgctcagc atcgccgccg gacggctatc gtacacgctg gggctgcagg 44100 gaccttgcct gaccgtcgat acggcgtgct cgtcatcgct ggtggccatt cacctcgcct 44160 gccgcagcct gcgcgctcga gagagcgatc tcgcgctggc gggaggggtc aacatgcttc 44220 cacgatgcga tctcccccga gctctggcgc gcacccaggc gctgtcgccc aatggccgtt 44280 gccagacctt cgacgcgtcg gccaacgggt tcgtccgtgg ggagggctgc ggtctgatcg 44340 tgctcaagcg attgagcgac gcgcggcggg atggggaccg gatctgggcg ctgatccgag 44400 gatcggccat caatcaggac ggccggtcga cggggttgac ggcgcccaac gtgctcgccc 44460 agggggcgct cttgcgcgag gcgctgcgga acgccggcgt cgaggccgag gccatcggtt 44520 acatcgagac ccacggggcg gcaacctcgc tgggcgaccc catcgagatc gaagcgctgc 44580 gcgctgtggt ggggccggcg cgagccgacg gagcgcgctg cgtgct GGGC gcggtgaaga 44640 ccaacctcgg ccacctggag ggcgctgccg gcgtggcggg cctgatcaag gcgacgcttt 44700 cgctacatca cgagcgcatc ccgaggaacc tcaactttcg 
tacgctcaat ccgcggatcc 44760 ggatcgaggg gaccgcgctc gcgttggcga ccgaaccggt gccctggccg cggacgggcc 44820 ggacgcgctt cgcgggagtg agctcgttcg ggatgagcgg gaccaacgcg catgtggtgt 44880 tggaggaggc gccggcggtg gagcctgagg ccgcggcccc cgagcgcgca gcggagctgt 44940 tcgtcctgtc ggcgaagagc gcggcggcgc tggatgcgca ggcagcccgg ctgcgggacc 45000 acctggagaa gcacgtcgag cttggcctcg gcgatgtggc gttcagcctg gcgacgacgc 45060 gcagcgcgat ggagcaccgg ctggcggtgg ccgcgagctc gcgcgaggcg ctgcgagggg 45120 cgctttcggc cgcagcgcag gggcacacgc cgccgggagc cgtgcgtggg cgggcctcgg 45180 gcggcagcgc gccgaaggtg gtcttcgtgt ttcccggtca gggctcgcag tgggtgggca 45240 tgggccgaaa gctcatggcc gaagagccgg tcttccgggc ggcgctggag ggttgcgacc 45300 gggccatcga ggcggaagcg ggctggtcgc tgctcgggga gctctccgcc gacgaggccg 45360 cctcgcagct cgggcgcatc gacgtggttc agccggtgct cttcgccatg gaagtagcgc 45420 tttctgcgct gtggcggtcg tggggagtgg agccggaag c ggtggtgggc cacagcatgg 45480 gcgaggttgc ggcggcgcac gtggccggcg cgctgtcgct cgaggacgcg gtggcgatca 45540 tctgccggcg cagccggctg ctgcggcgga tcagcggtca gggggagatg gcgctggtcg 45600 agctgtcgct ggaggaggcc gaggcggcgc tgcgtggcca tgagggtcgg ctgagcgtgg 45660 cggtgagcaa cagcccgcgc tcgaccgtgc tcgccggcga gccggcggcg ctctcggagg 45720 tgctggcggc gctgacggcc aagggggtgt tctggcggca ggtgaaggtg gacgtcgcca 45780 gccatagccc gcaggtcgac ccgctgcgcg aagagctgat cgcggcgctg ggagcgatcc 45840 ggccgcgagc ggctgcggtg ccgatgcgct cgacggtgac gggcggggtg atcgcgggtc 45900 cggagctcgg tgcgagctac tgggcggaca accttcggca gccggtgcgc ttcgctgcgg 45960 cggcgcaagc gctgctggag ggtggccccg cgctgttcat cgagatgagc ccgcacccga 46020 tcctggtgcc gcccctggac gagatccaga cggcggccga gcaagggggc gctgcggtgg 46080 gctcgctgcg gcgagggcag gacgagcgcg cgacgctgct ggaggcgctg gggacgctgt 46140 gggcgtccgg ctatccggtg agctgggctc ggctgttccc cgcgggcggc aggcgggttc 46200 cgctgccgac ctatccctgg cagcacgagc ggtgctggat cgaggtcgag cctgacgccc 46260 gccgcctcgc cccaccaagg to cgcagccgac ctggttcta ccgaacggac tggcccgagg 46320 tgccccgcgc cgccccgaaa tcggagacag ctcatgggag ctggctgctg ttggccgaca 46380 ggggtggggt cggtgaggcg 
gtcgctgcag cgctgtcgac gcgcggactt tcctgcaccg 46440 tgcttcatgc gtcggctgac gcctccaccg tcgccgagca ggcatccgaa gctgccagtc 46500 ctggcaggga gccgaaacga gtcctctacc tgtggggcct cgacgccgtc gtcgatgctg 46560 gggcatcggc cgacgaagtc agcgaggcta cccgccgtgc caccgcaccc gtccttgggc 46620 tggttcgatt cctgagcgct gcgccccatc ctcctcgctt ctgggtggtg acccgcgggg 46680 catgcacggt gggcggcgag ccagaggcct ctctttgcca agcggcgttg tggggcctcg 46740 cgcgcgtcgc ggcgctggag caccccgctg cctggggtgg cctcgtggac ctggatcctc 46800 agaagagccc gacggagatc gagcccctgg tggccgagct gctttcgccg gacgccgagg 46860 atcaactggc gttccgcagc ggtcgcaggc acgcagcacg ccttgtagcc gccccgccgg 46920 agggcgacgt cgcaccgata tcgctgtccg cggaggggag ctacctggtg acgggcgggc 46980 tgggtggcct tggtctgctc gtggctcggt ggctggtgga gcggggagct cgacatctgg 47040 tgctcaccag ccggcacggg ctgccagagc gacaggcgtc gggcggagag cagccgccgg 47100 aggcccgcgc gcgcatcgca gcggtcgagg ggc tggaagc gcagggcgcg cgggtgaccg 47160 tggcagcggt ggatgtcgcc gaggccgatc ccatgacggc gctgctggcc gccatcgagc 47220 ccccgttgcg cggggtggtg cacgccgccg gcgtcttccc cgtgcgtcac ctggcggaga 47280 cggacgaggc cctgctggag tcggtgctcc gtcccaaggt ggccgggagc tggctgctgc 47340 accggctgct gcgcgaccgg cctctcgacc tgttcgtgct gttctcgtcg ggcgcggcgg 47400 tgtggggtgg caaaggccaa ggcgcatacg ccgcggccaa tgcgttcctc gacgggctcg 47460 cgcaccatcg ccgcgcgcac tcgctgccgg cgttgagcct cgcctggggc ttatgggccg 47520 agggaggcat ggttgatgca aaggctcatg cacgtctgag cgacatcggg gtcctgccca 47580 tggccacggg gccggccttg tcggcgctgg agcgcctggt gaacaccagc gctgtccagc 47640 gttcggtcac acggatggac tgggcgcgct tcgcgccggt ctatgccgcg cgagggcggc 47700 gcaacttgct ttcggctctg gtcgcggagg acgagcgcgc tgcgtctccc ccggtgccga 47760 cggcaaaccg gatctggcgc ggcctgtccg ttgcggagag ccgctcagcc ctctacgagc 47820 tcgttcgcgg catcgtcgcc cgggtgctgg gcttctccga cccgggcgcg ctcgacgtcg 47880 gccgaggctt cgccgagcag gggctcgact ccctgatggc tctggagatc cgtaaccgcc 47940 gctgggcgaa ttcagcgcga cggctg TCGG cgactctggc cttcgaccac ccgacggtgg 48000 agcggctggt ggcgcatctc ctcaccgacg tgctgaagct ggaggaccgg agcgacaccc 48060 
ggcacatccg gtcggtggcg gcggatgacg acatcgccat cgtcggtgcc gcctgccggt 48120 tcccaggtgg ggatgagggc ctggagacat actggcggca tctggccgag ggcatggtgg 48180 tcagcaccga ggtgccagcc gaccggtggc gcgcggcgga ctggtacgac cccgatccgg 48240 aggttccggg ccggacctat gtggccaagg gtgccttcct ccgcgatgtg cgcagcttgg 48300 atgcggcgtt cttcgccatt tcccctcgtg aggcgatgag cctggacccg caacagcggc 48360 tgttgctgga ggtgagctgg gaggcgatcg agcgcgctgg ccaggacccg atggcgctgc 48420 gcgagagcgc cacgggcgtg ttcgtgggca tgatcgggag cgagcacgcc gagcgggtgc 48480 agggcctcga cgacgacgcg gcgttgctgt acggcaccac cggcaacctg ctcagcgtcg 48540 ccgctggacg gctgtcgttc ttcctgggtc tgcacggccc gacgatgacg gtggacaccg 48600 cctgctcgtc gtcgctggtg gcgttgcacc tcgcctgcca gagcctgcga ttgggcgagt 48660 gcgaccaggc cctggccggc gggtccagcg tgcttttgtc gccgcggtca ttcgtcgcgg 48720 cgtcgcgcat gcgtttgctt tcgccagatg ggcggtgcaa gacgttctcg gccgctgcag 48780 gcgggccga acggctttgc g ggctgcgccg tggtggtgct caagcggctc cgtgacgcgc 48840 agcgcgaccg cgaccccatc ctggcggtgg tcaggagcac ggcgatcaac cacgatggcc 48900 cgagcagcgg gctcacggtg cccagcggtc ctgcccagca ggcgttgcta cgccaggcgc 48960 tggcgcaagc gggcgtggcg ccggccgagg tcgatttcgt ggagtgccac gggacgggga 49020 cagcgctggg tgacccgatc gaggtgcagg cgctgggcgc ggtgtacggg cggggccgcc 49080 ccgcggagcg gccgctctgg ctgggcgctg tcaaggccaa cctcggccac ctggaggccg 49140 cggcgggctt ggccggcgtg ctcaaggtgc tcttggcgct ggagcacgag cagattccgg 49200 ctcaaccgga gctcgacgag ctcaacccgc acatcccgtg ggcagagctg ccagtggccg 49260 ttgtccgcag ggcggtcccc tggccgcgcg gcgcgcgccc gcgtcgtgca ggcgtgagcg 49320 ctttcggcct gagcgggacc aacgcgcatg tggtgttgga ggaggcgccg gcggtggagc 49380 ctgtggccgc ggcccccgag cgcgcagcgg agctgttcgt cctgtcggcg aagagcgcgg 49440 tgcgcaggca cggcgctgga gcccggctgc gggaccacct ggagaagcat gtcgagcttg 49500 gcctcggcga tgtggcgttc agcctggcga cgacgcgcag cgcgatggag caccggctgg 49560 cggtggccgc gagctcgcgc gaggcgctgc gaggggcgct ttcggccgca gcgcaggggc acacgccgcc 49620 g ggagccgtg cgtgggcggg cctcgggcgg cagcgcgccg aaggtggtct 49680 tcgtgtttcc cggccagggc tcgcagtggg tgggcatggg 
ccgaaagctc atggccgaag 497 ^ 0 agccggtctt ccgggcggcg ctggagggtt gcgaccgggc catcgaggcg gaagcgggct 49800 ggtcgctgct cggggagctc tccgccgacg aggccgcctc gcagctcggg cgcatcgacg 49860 tggttcagcc ggtgctgttc gccatggaag tagcgctttc tgcgctgtgg cggtcgtggg 49920 gagtggagcc ggaagcggtg gtgggccaca gcatgggcga ggttgcggcg gcgcacgtgg 49980 ccggcgcgct gtcgctcgag gacgcggtgg cgatcatctg ccggcgcagc cggctgctgc 50040 ggcggatcag cggtcagggg gagatggcgc tggtcgagct gtcgctggag gaggccgagg 50,100 cggcgctgcg tggccatgag ggtcggctga gcgtggcggt gagcaacagc ccgcgctcga 50160 ccgtgctcgc cggcgagccg gcggcgctct cggaggtgct ggcggcgctg acggccaagg 50220 gggtgttctg gcggcaggtg aaggtggacg tcgccagcca tagcccgcag gtcgacccgc 50280 tgcgcgaaga gctgatcgcg gcgctgggag cgatccggcc gcgagcggct gcggtgccga 50340 tgcgctcgac ggtgacgggc ggggtgatcg cgggtccgga gctcggtgcg agctactggg 50400 cggacaacct tcggcagccg gtgcgcttcg ctgcggcggc gcaagcgctg ctggagggtg 50460 gccccgcgct gtt catcgag atgagcccgc acccgatcct ggtgccgccc ctggacgaga 50520 tccagacggc ggccgagcaa gggggcgctg cggtgggctc gctgcggcga gggcaggacg 50580 agcgcgcgac gctgctggag gcgctgggga cgctgtgggc gtccggctat ccggtgagct 50640 gggctcggct gttccccgcg ggcggcaggc gggttccgct gccgacctat ccctggcagc 50700 acgagcggta ctggatcgag gacagcgtgc atgggtcgaa gccctcgctg cggcttcggc 50760 agcttcgcaa cggcgccacg gaccatccgc tgctcggggc tccattgctc gtctcggcgc 50820 gacccggagc tcacttgtgg gagcaagcgc tgagcgacga gaggctatcc tacctttcgg 50880 ccatggcgaa aacatagggt gccgtgttgc ccagcgcggc gtatgtagag atggcgctcg 50940 ccgccggcgt agatctctat ggcacggcga cgctggtgct ggagcagctg gcgctcgagc 51000 gagccctcgc cgtgccctcc gaaggcggac gcatcgtgca agtggccctc agcgaagaag 51060 gtcccggtcg ggcctcattc caggtatcga gtcgtgagga ggcaggtagg agctgggtgc 51120 ggcacgccac ggggcacgtg tgtagcggcc agagctcagc ggtgggagcg ttgaaggaag 51180 ctccgtggga gattcaacgg cgatgtccga gcgtcctgtc gtcggaggcg ctctatccgc 51240 tgctcaacga gcacgccctc gactatggtc cctgcttcca gggcgtggag caggtgtggc 51300 tcggca CGGG ggaggtgctc ggccgggtac gcttgccagg agacatggca tcctcaagtg 51360 gcgcctaccg gattcatccc 
gccttgttgg atgcatgttt tcaggtgctg acagcgctgc 51420 tcaccacgcc ggaatccatc gagattcgga ggcggctgac ggatctccac gaaccggatc 51480 tcccgcggtc cagggctccg gtgaatcaag cggtgagtga cacctggctg tgggacgccg 51540 cgctggacgg tggacggcgc cagagcgcga gcgtgcccgt cgacctggtg ctcggcagct 51600 tccatgcgaa gtgggaggtc atggagcgcc tcgcgcaggc gtacatcatc ggcactctcc 51660 gcatatggaa cgtcttctgc gctgctggag agcgtcacac gatagacgag ttgctcgtca 51720 ggcttcaaat ctctgtcgtc tacaggaagg tcatcaagcg atggatggaa caccttgtcg 51780 cgatcggcat ccttgtaggg gacggagagc attttgtgag ctctcagccg ctgccggagc 51840 ctgatttggc ggcggtgctc gaggaggccg ggagggtgtt cgccgacctc ccagtcctat 51900 ttgagtggtg caagtttgcc ggggaacggc tcgcggacgt attgaccggt aagacgctcg 51960 cgctcgagat cctcttccct ggtggctcgt tcgatatggc ggagcgaatc tatcgagatt 52020 cgcccatcgc ccgttactcg aacggcatcg tgcgcggtgt cgtcgagtcg gcggcgcggg 52080 tggtagcacc gtcgggaatg ttcagcatct tggagatcgg agcagggacg ggcgcgacca 5214 0 ccgccgccgt cctcccggtg ttgctgcctg accggacgga gtaccatttc accgatgttt 52200 ctccgctctt ccttgctcgc gcggagcaaa gatttcgaga ttatccattc ctgaagtatg 52260 gcattctgga tgtcgaccag gagccagctg gccagggata cgcacatcag aggtttgacg 52320 tcatcgtcgc ggccaatgtc atccatgcga cccgcgatat aagagccacg gcgaagcgtc 52380 tcctgtcgtt gctcgcgccc ggaggccttc tggtgctggt cgagggcaca gggcatccga 52440 tctggttcga tatcaccacg ggattgattg aggggtggca gaagtacgaa gatgatcttc 52500 tccgctcctg gtatcgacca cctgctcgga cctggtgtga cgtcctgcgc cgggtaggct 52560 ttgcggacgc cgtgagtctg ccaggcgacg gatctccggc ggggatcctc ggacagcacg 52620 tgatcctctc gcgcgcgccg ggcatagcag gagccgcttg tgacagctcc ggtgagtcgg 52680 cgaccgaatc gccggccgcg cgtgcagtac ggcaggaatg ggccgatggc tccgctgacg 52740 tcgtccatcg gatggcgttg gagaggatgt acttccaccg ccggccgggc cggcaggttt 52800 gggtccacgg tcgattgcgt accggtggag gcgcgttcac gaaggcgctc gctggagatc 52860 tgctcctgtt cgaagacacc gggcaggtcg tggcagaggt tcaggggctc cgcctgccgc 52920 agctcgaggc ttctgctttc gcgccgcggg acccgcggga agagtggttg tacgctt tgg 52980 aatggcagcg caaagaccct ataccagagg ctccggcagc cgcgtcttct tcctccgcgg 53040 
gggcttggct cgtgctgatg gaccagggcg ggacaggcgc tgcgctcgta tcgctgctgg 53100 aag gc9a g cgaggcgtgc gtgcgcgtca tcgcgggtac ggcatacgcc tgcctcgcgc 53160 cggggctgta tcaagtcgat ccggcgcagc cagatggctt tcataccctg ctccgcgatg 53220 cattcggcga ggaccggatt tgtcgcgcgg tagtgcatat gtggagcctt gatgcgacgg 53280 gagggcgaca cagcagggga ttcaggccga gcggagtcgc gggagcctga tcaactcctg 53340 gcgcgctttc tctggtgcag gcgctggtgc gccggaggtg gcgcaacatg ccgcggcttt 53400 ggctcttgac ccgcgccgtg catgcggtgg gcgcggagga cgcagcggcc tcggtggcgc 53460 aggcgccggt gtggggcctc ggtcggacgc tcgcgctcga gcatccagag ctgcggtgca 53520 cgctcgtgga cgtgaacccg gcgccgtctc cagaggacgc agccgcactg gcggtggagc 53580 tcggggcgag cgacagagag gaccaggtcg cattgcgctc ggatggccgc tacgtggcgc 53640 gcctcgtgcg gagctccttt tccggcaagc ctgctacgga ttgcggcatc cgggcggacg 53700 gcagctatgt gatcaccgat ggcatgggga gagtggggct ctcggtcgcg caatggatgg 53760 tgatgcaggg ggcccgccat gtggtgctcg tggatcgcgg cggcgcttcc gag gcatccc 53820 gggatgccct ccggtccatg gccgaggctg gcgcggaggt gcagatcgtg gaggccgacg 53880 tggctcggcg cgacgatgtc gctcggctcc tctcgaagat cgaaccgtcg atgccgccgc 53940 ttcgggggat cgtgtacgtg gacgggacct tccagggcga ctcctcgatg ctggagctgg 54000 atgcccgtcg cttcaaggag tggatgtatc ccaaggtgct cggagcgtgg aacctgcacg 54060 cgctgaccag ggatagatcg ctggacttct tcgtcctgta ttcctcgggc acctcgcttc 54120 tgggcttgcc aggacagggg agccgcgccg ccggtgacgc cttcttggac gccatcgcgc 54180 atcaccggtg caaggtgggc cttacagcga tgagcatcaa ctggggattg ctctccgaag 54240 catcatcgcc ggcgaccccg aacgacggcg gagcacggct cgaataccgg gggatggaag 54300 gcctcacgct ggagcaggga gcggcggcgc tcgggcgctt gctcgcacga cccagggcgc 54360 aggtaggggt gatgcggctg aatctgcgcc agtggttgga gttctatccc aacgcggccc 54420 gattggcgct gtgggcggag ctgctgaagg agcgtgaccg ggcgcgtcga cgccgaccga 54480 acgcgtcgaa cctgcgcgag gcgctgcaga gcgccaggcc cgaagatcgt cagttgattc 54540 tggagaagca cttgagcgag ctgttggggc gggggctgcg ccttccgccg gagaggatcg 54600 agcggcacgt gccgttcagc aatctcggca tggactcgct gatagg CCTG gagctccgca 54660 accgcatcga ggccgcgctc ggcatcaccg tgccggcgac 
cctgctatgg acctacccta 54720 acgtagcagc tctgagcggg agcttgctag acattctgtt tccgaatgcc ggcgcgaccc 54780 acgctccggc caccgagcgg gagaagagct tcgagaacga tgccgcagat ctcgaggctc 54840 tgcggggcat gacggacgag cagaaggacg cgttgctcgc cgaaaagctg gcgcagctcg 54900 cgcagatcgt tggtgagtaa gggaccgagg gagtatggcg accacgaatg ccgggaagct 54960 tgagcatgcc cttctgctca tggacaagct tgcgaaaaag aacgcgtctt tggagcaaga 55020 gcggaccgag ccgatcgcca tcgtaggcat tggctgccgc ttccccggcg gagcggacac 55080 tccggaggca ttctgggagc tgctcgactc aggccgagac gcggtccagc cgctcgaccg 55140 gcgctgggcg ctggtcggcg tccatcccag cgaggaggtg ccgcgctggg ccggactgct 55200 caccgaggcg gtggacggct tcgacgccgc gttctttggc acctcgcctc gggaggcgcg 55260 gtcgctcgat cctcagcaac gcctgctgct ggaggtcacc tgggaagggc tcgaggacgc 55320 cggcatcgca ccccagtccc tcgacggcag ccgcaccggg gtgttcctgg gcgcatgcag 55380 cagcgactac tcgcataccg ttgcgcaaca gcggcgcgag gagcaggacg catacgacat 55440 caccggcaat acgctcagcg tcgccgccgg acggttgtc t tatacgctag ggctgcaggg 55500 accctgcctg accgtcgaca cggcctgctc gtcgtcgctc gtggccatcc accttgcctg 55560 ccgcagcctg cgcgctcgcg agagcgatct cgcgctggcg ggaggcgtca acatgctcct 55620 ttcgtccaag acgatgataa tgctggggcg catccaggcg ctgtcgcccg atggccactg 55680 ccggacattc gacgcctcgg ccaacgggtt cgtccgtggg gagggctgcg gtatggtcgt 55740 gctcaaacgg ctctccgacg cccagcgaca cggcgatcgg atctgggctc tgatccgggg 55800 ttcggccatg aatcaggatg gccggtcgac agggttgatg gcacccaatg tgctcgctca 55860 ggaggcgctc ttgcgcgagg cgctgcagag cgctcgcgtc gacgccgggg ccatcggtta 55920 tgtcgagacc cacggaacgg ggacctcgct cggcgacccg atcgaggtcg aggcgctgcg 55980 tgccgtgttg gggccggcgc gggccgatgg gagccgctgc gtgctgggcg cagtgaagac 56040 aaacctcggc cacctggagg gcgctgcagg cgtggcgggt ttgatcaagg cggcgctggc 56100 tctgcaccac gaactgatcc cgcgaaacct ccatttccac acgctcaatc cgcggatccg 56160 gatcgagggg accgcgctcg cgctggcgac ggagccggtg ccgtggccgc gggcgggccg 56220 accgcgcttc gcgggggtga gcgcgttcgg cctcagcggc accaacgtcc atgtcgtgct 56280 ggaggaggcg ccggccacgg tgctcgcacc g gcgacgccg gggcgctcag cggagctttt 56340 ggtgctgtcg gcgaagagcg 
ccgccgcgct ggacgcacag gcggcgcggc tctcagcgca 56400 catcgccgcg tacccggagc agggtctcgg agacgtcgcg ttcagcctgg tatcgacgcg 56460 tagcccgatg gagcaccggc tcgcggtggc ggcgacctcg cgcgaggcgc tgcgaagcgc 56520 gctggaggtt gcggcgcagg ggcagacccc ggcaggcgcg gcgcgcggca gggccgcttc 56580 ctcgcccggc aagctcgcct tcctgttcgc cgggcagggc gcgcaggtgc cgggcatggg 56640 ccgtgggttg tgggaggcgt ggccggcgtt ccgcgagacc ttcgaccggt gcgtcacgct 56700 cttcgaccgg gagctccatc agccgctctg cgaggtgatg tgggccgagc cgggcagcag 56760 caggtcgtcg ttgctggacc agacggcgtt cacccagccg gcgctctttg cgctggagta 56820 cgcgctggcc gcgctcttcc ggtcgtgggg cgtggagccg gagctcgtcg ctggccatag 56880 cctcggcgag ctggtggccg cctgcgtggc gggtgtgttc tccctcgagg acgccgtgcg 56940 cttggtggtc gcgcgcggcc ggttgatgca ggcgctgccg gccggcggcg cgatggtatc 57000 gatcgccgcg ccggaggccg acgtggctgc cgcggtggcg ccgcacgcag cgttggtgtc 57060 gatcgcggca gtcaatgggc cggagcaggt ggtgatcgcg aattcgtgca ggcgccgaga 57120 gcagatcgcg gcggcgttcg cggcgcgggg ggc gcgaacc aaaccgctgc atgtctcgca 57180 tcgccgctca cgcgttccac tggatccgat gctggaggcg ttccggcggg tgactgagtc 57240 ggtgacgtac cggcggcctt cgatcgcgct ggtgagcaac ctgagcggga agccctgcac 57300 cgatgaggtg agcgcgccgg gttactgggt gcgtcacgcg cgagaggcgg tgcgcttcgc 57360 ggacggagtg aaggcgctgc acgcggccgg tgcgggcctc ttcgtcgagg tggggccgaa 57420 gccgacgctg ctcggccttg tgccggcctg cctgccggat gccaggccgg tgctgctccc 57480 agcgtcgcgc gccgggcgtg acgaggctgc gagcgcgcta gaggcgctgg gtgggttctg 57540 ggtcgtcggt ggatcggtca cctggtcggg tgtcttccct tcgggcggac ggcgggtacc 57600 gctgccaacc tatccctggc agcgcgagcg ttactggatc gaagcgccgg tcgatcgtga 57660 ggcggacggc accggccgtg ctcgggcggg gggccacccc cttctgggtg aagtcttttc 57720 cgtgtcgacc catgccggtc tgcgcctgtg ggagacgacg ctggaccgaa agcggctgcc 57780 gtggctcggc gagcaccggg cgcaggggga ggtcgtgttt cctggcgccg ggtacctgga 57840 gatggcgctg tcgtcggggg ccgagatctt gggcgatgga ccgatccagg tcacggatgt 57900 ggtgctcatc gagacgctga ccttcgcggg cgatacggcg gtaccggtcc aggtggtgac 57960 gaccgaggag cgaccgggac ggctgc GGTT ccaggtagcg agtcgggagc cgggggaacg 58020 
tcgcgcgccc ttccggatcc acgcccgcgg cgtgctgcgc cggatcgggc gcgtcgagac 58080 cccggcgagg tcgaacctcg ccgccctgcg cgcccggctt catgccgccg tgcccgctgc 58140 ggctatctat ggtgcgctcg ccgagatggg gcttcaatac ggcccggcgt tgcgggggct 58200 cgccgagctg tggcggggtg agggcgaggc gctgggcagg gtgagactgc ctgaggccgc 58260 cggctccgcg acagcctacc agctgcatcc ggtgctgctg gacgcgtgcg tccaaatgat 58320 tgttggcgcg ttcgccgatc gcgatgaggc gacgccgtgg gcgccggtgg aggtgggctc 58380 ggtgcggctg ttccagcggt ctcctgggga gctatggtgc catgcgcgcg tcgtgagcga 58440 tggtcaacag gcctccagcc ggtggagcgc cgactttgag ttgatggacg gtacgggcgc 58500 ggtggtcgcc gagatctccc ggctggtggt ggagcggctt gcgagcggtg tacgccggcg 58560 cgacgcagac gactggttcc tggagctgga ttgggagccc gcggcgctcg gtgggcccaa 58620 gatcacagcc ggccggtggc tgctgctcgg cgagggtggt gggctcgggc gctcgttgtg 58680 ctcggcgctg aaggccgccg gccatgtcgt cgtccacgcc gcgggggacg acacgagcac 58740 tgcaggaatg cgcgcgctcc tggccaacgc gttcgacggc caggccccga cggccgtggt 58800 gcacctcagc agcctcgac g ggggcggcca gctcggcccg gggctcgggg cgcagggcgc 58860 gctcgacgcg ccccggagcc cagatgtcga tgccgatgcc ctcgaatcgg cgctgatgcg 58920 tggttgcgac agcgtgctct ccctggtgca agcgctggtc ggcatggacc tccgaaacgc 58980 gccgcggctg tggctcttga cccgcggggc tcaggcggcc gccgccggcg atgtctccgt 59040 ggtgcaagcg ccgctgttgg ggctgggccg caccatcgcc ttggagcacg ccgagctgcg 59100 ctgtatcagc gtcgacctcg atccagccga gcctgaaggg gaagccgatg ctttgctggc 59160 cgagctactt gcagatgatg ccgaggagga ggtcgcgctg cgcggtggcg accggctcgt 59220 tgcgcggctc gtccaccggc tgcccgacgc tcagcgccgg gagaaggtcg agcccgccgg 59280 tgacaggccg ttccggctag agatcgatga acccggcgcg ctggaccaac tggtgctccg 59340 agccacgggg cggcgcgctc ctggtccggg cgaggtcgag atctccgtcg aagcggcggg 59400 gctcgactcc atcgacatcc agctggcgtt gggcgttgct cccaatgatc tgcctggaga 59460 agaaatcgag ccgttggtgc tcggaagcga gtgcgccggg cgcatcgtcg ctgtgggcga 59520 gggcgtgaac ggccttgtgg tgggccagcc ggtgatcgcc cttgcggcgg gagtatttgc 59580 tacccatgtc accacgtcgg ccacgctggt gttgcctcgg cctctggggc tctcggcgac cgaggcggcc 59640 g cgatgcccc tcgcgtattt gacggcctgg 
tacgccctcg acaaggtcgc 59700 ccacctgcag gcgggggagc gggtgctgat ccatgcggag gccggtggtg tcggtctttg 59760 cgcggtgcga tgggcgcagc gcgtgggcgc cgaggtgtat gcgaccgccg acacgcccga 59820 gaaccgtgcc tacctggagt cgctgggcgt gcggtacgtg agcgattccc gctcgggccg 59880 gttcgtcaca gacgtgcatg catggacgga cggcgagggt gtggacgtcg tgctcgactc 59940 gctttcgggc gagcgcatcg acaagagcct catggtcctg cgcgcctgtg gtcgccttgt 60000 gaagctgggc aggcgcgacg actgcgccga cacgcagcct gggctgccgc cgctcctacg 60060 gaatttttcc ttctcgcagg tggacttgcg gggaatgatg ctcgatcaac cggcgaggat 60120 ccgtgcgctc ctcgacgagc tgttcgggtt ggtcgcagcc ggtgccatca gcccactggg 60180 gtcggggttg cgcgttggcg gatccctcac gccaccgccg gtcgagacct tcccgatctc 60240 tcgcgcagcc gaggcattcc ggaggatggc gcaaggacag catctcggga agctcgtgct 60300 cacgctggac gacccggagg tgcggatccg cgctccggcc gaatccagcg tcgccgtccg 60360 cgcggacggc acctaccttg tgaccggcgg tctgggtggc ctcggtctgc gcgtggccgg 60420 atggctggcc gagcggggcg cggggcaact ggtgctggtg ggccgctccg gtgcggcgag 60480 cga cgcagagcag gccgccg tggcggcgct ggaggcccac ggcgcgcgcg tcacggtggc 60540 gaaagcggac gtcgccgatc ggtcacagat cgagcgggtc ctccgcgagg ttaccgcgtc 60600 ggggatgccg ctgcggggtg tcgtgcatgc ggcaggtctc gtggatgacg ggctgctgat 60660 gcagcagact ccggcgcggt tccgcacggt gatgggacct aaggtccagg gggccttgca 60720 cttgcacacg ctgacacgcg aagcgcctct ttccttcttc gtgctgtacg cttctgcagc 60780 tgggcttttc ggctcgccag gccagggcaa ctatgccgca gccaacgcgt tcctcgacgc 60840 cctttcgcat caccgaaggg cgcagggcct gccggcgctg agcatcgact ggggcatgtt 60900 cacggaggtg gggatggccg ttgcgcaaga aaaccgtggc gcgcggcaga tctctcgcgg 60960 gatgcggggc atcacccccg atgagggtct gtcagctctg gcgcgcttgc tcgagggtga 61020 tcgcgtgcag acgggggtga taccgatcac tccgcggcag tgggtggagt tctacccggc 61080 aacagcggcc tcacggaggt tgtcgcggct ggtgaccacg cagcgcgcgg tcgctgatcg 61140 gaccgccggg gatcgggacc tgctcgaaca gcttgcgtcg gctgagccga gcgcgcgggc 61200 ggggctgctg caggacgtcg tgcgcgtgca ggtctcgcat gtgctgcgtc tccctgaaga 61260 caagatcgag gtggatgccc cgctctcgag catgggcatg tgagcctgga gactcgctga 61320 gctgcg caac cgcatcgagg 
ctgcgctggg cgtcgccgcg cctgcagcct tggggtggac 61380 gtacccaacg gtagcagcga taacgcgctg gctgctcgac gacgccctcg tcgtccggct 61440 tggcggcggg tcggacacgg acgaatcgac ggcgagcgcc ggttcgttcg tccacgtcct 61500 cctgtcgtca ccgctttcgt agccgcgggc tcgtctcttc tgttttcacg gttctggcgg 61560 ctcgcccgag ggcttccgtt cctggtcgga gaagtctgag tggagcgatc tggaaatcgt 61620 cacgatcgca ggccatgtgg gcctcgcctc cgaggacgcg cctggtaaga agtacgtcca 61680 agaggcggcc tcgctgattc agcactatgc agacgcaccg tttgcgttag tagggttcag 61740 cctgggtgtc cggttcgtca tggggacagc cgtggagctc gccagtcgtt ccggcgcacc 61800 ggctccgctg gccgtcttca cgttgggcgg cagcttgatc tcttcttcag agatcacccc 61860 ggagatggag accgatataa tagccaagct cttcttccga aatgccgcgg gtttcgtgcg 61920 atccacccaa caagtccagg ccgatgctcg cgcagacaag gtcatcacag acaccatggt 61980 ggctccggcc cccggggact cgaaggagcc gcccgtgaag atcgcggtcc ctatcgtcgc 62040 catcgccggc tcggacgatg tgatcgtgcc tccgagcgac gttcaggatc tacaatctcg 62100 caccacggag cgcttctata tgcatctcct tcccggagat cacgaatttc tcgtcgatcg 6216 0 agggcgcgag atcatgcaca tcgtcgactc gcatctcaat ccgctgctcg ccgcgaggac 62220 gacgtcgtca ggccccgcgt tcgaggcaaa atgatggcag cctccctcgg gcgcgcgaga 62280 tggttgggag cagcgtgggc gctggcggcc ggcggcaggc cgcggaggcg catgagcctt 62340 cctggacgtt tgcagtatag gagattttat gacacaggag caagcgaatc agagtgagac 62400 ttcgacttca gaagcctgct agccgttcgc gcctgggtac gcggaggacc cgttccccgc 62460 gatcgagcgc ctgagagagg caacccccat cttctactgg gatgaaggcc gctcctgggt 62520 cctcacccga taccacgacg tgtcggcggt gttccgcgac gaacgcttcg cggtcagtcg 62580 agaagagtgg gaatcgagcg cggagtactc gtcggccatt gcgatatgaa cccgagctca 62640 ttgttcgggc gaagtacgga tgccgccgga ggatcacgct cgggtccgca agctcgtcaa 62700 cccgtcgttt acgtcacgcg ccatcgacct gctgcgcgcc gaaatacagc gcaccgtcga 62760 ccagctgctc gatgctcgct ccggacaaga ggagttcgac gttgtgcggg attacgcgga 62820 gggaatcccg atgcgcgcga tcagcgctct gttgaaggtt ccggccgagt gtgacgagaa 62880 gttccgtcgc ttcggctcgg cgactgcgcg cgcgctcggc gtgggtttgg tgccccaggt 62940 cgatgaggag accaagaccc tggtcgcgtc cgtcaccgag gggctcgcgc tgctcca tga 63000 
cgtcctcgat gagcggcgca ggaacccgct cgaaaatgac tgctgcttca gtcttgacga 63060 gacggcagca ggccgaggcc ggctgagcac gaaggagctg gtcgcgctcg tgggtgcgat 63120 ggcaccgata tatcgctgct ccacgatcta ccttatcgcg ttcgctgtgc tcaacctgct 63180 gcggtcgccc gaggcgctcg agctggtgaa ggccgagccc gggctcatga ggaacgcgct 63240 cgatgaggtg ctccgcttcg acaatatcct cagaatagga actgtgcgtt tcgccaggca 63300 ggacctggag tactgcgggg catcgatcaa gaaaggggag atggtctttc tcctgatccc 63360 gagcgccctg agagatggga ctgtattctc caggccagac gtgtttgatg tgcgacggga 63420 cacgggcgcg agcctcgcgt acggtagagg cccccatgtc tgccccgggg tgtcccttgc 63480 tcgcctcgag gcggagatcg ccgtgggcac catcttccgt aggttccccg agatgaagct 63540 gaaagaaact cccgtgtttg gataccaccc cgcgttccgg aacatcgaat cactcaacgt 63600 catcttgaag ccctccaaag ctggatagct cgcgggggta tcgcttcccg aacctcattc 63660 cctcatgata cagctcgcgc gcgggtgctg tctgccgcgg gtgcgattcg atccagcgga 63720 caagcccatt gtcagcgcgc gaagatcgaa tccacggccc ggagaagagc ccgtccgggt 63780 gacgtcggaa gaagtgccgg gcgccgccct gggagcgcaa agctcgctcg ttc gcgctca 63840 gcacgccgct cgtcatgtcc ggccctgcac ccgcgccgag gagccgcccg ccctgatgca 63900 cggcctcacc gagcggcagg ttctgctctc gctcgtcgcc ctcgcgctcg tcctcctgac 63960 cgcgcgcgcc ttcggcgagc tcgcgcggcg gctgcgccag cccgaggtgc tcggcgagct 64020 cttcggcggc gtggtgctgg gcccgtccgt cgtcggcgcg ctcgctcctg ggttccatcg 64080 agtcctcttc caggatccgg cggtcggggt cgtgctctcc ggcatctcct ggataggcgc 64140 gctcgtcctg ctgctcatgg cgggtatcga ggtcgatgtg agcatcctgc gcaaggaggc 64200 gcgccccggg gcgctctcgg cgctcggcgc gatcgcgccc ccgctgcgca cgccggggcc 64260 gctggtgcag cgcatgcagg gcgcgttcac gtgggatctc gacgtctcgc cgcgacgctc 64320 tgcgcaagcc tgagcctcgg cgcctgctcg tacacctcgc cggtgctcgc tccgcccgcg 64380 gacatccggc cgcccgccgc ggcccagctc gagccggact cgccggatga cgaggccgac 64440 gaggccgacg aggcgctccg cccgttccgc gacgcgatcg ccgcgtactc ggaggccgtt 64500 cggtgggcgg aggcggcgca gcggccgcgg ctggagagcc tcgtgcggct cgcgatcgtg 64560 aggcgctcga cggctgggca caaggtccct ttcgcgcaca cgacggccgg cgtctcccag 64620 gactccagaa atcgccggca cgatgcggtc tggttcgatg tcgccg 
CCCG gtacgcgagc 64680 ttccgcgcgg cgacggagca cgcgctccgc gacgcggcgt cggccatgga ggcgctcgcg 64740 gccggcccgt accgcggatc gagccgcgtg tccgctgccg taggggagtt tcggggggag 64800 gcggcgcgcc ttcaccccgc ggaccgtgta cccgcgtccg accagcagat cctgaccgcg 64860 ctgcgcgcag ccgagcgggc gctcatcgcg ctctacactg cgttcgcccg tgaggagtga 64920 gcctctctcg ggcgcagccg agcggcggcg tgccggtggt tccctcttcg caaccatgac 64980 cggagccgcg ctcggtccgc gcagcggcta gcgcgcgtcg cggcagagat cgctggagcg 65040 acaggcgacg acccgcccga gggtgtcgaa cggattgccg cagccctcat tgcggatccc 65100 ctccagacac tcgttcagct gcttggcgtc gatgccgcct gggcactcgc cgaaggtcag 65160 ctcgtcgcgc cactcggatc ggatcttgtt cgagcacgcg tccttgctcg aatactcccg 65220 gtcttgtccg atgttgttgc accgcgcctc gcggtcgcac cgcgccgcca cgatgctatc 65280 gacggcgctg ccgactggca ccggcgcctc gccctgcgcg ccacccgggg tttgcgcctc 65340 cccgcctgac cgcttttcgc cgccgcacgc cgcgagcagg ctcattcccg acaccgagat 65400 caggcccacg accagcttcc cagcaatctt ttgcatggct tcccctccct cacgacacgt 65460 cacatcagag actctccgct cggctcgtcg gttcgacag c cggcgacggc cacgagcaga 65520 accgtccccg accagaacag ccgcatgcgg gtttctcgca acatgccccg acatccttgc 65580 gactagcgtg cctccgctcg tgccgagatc ggctgtcctg tgcgacggca atatcctgcg 65640 atcggccggg caggaggtac cgacacgggc gccgggcggg aggtgccgcc acgggctcga 65700 aatgtgctgc ggcaggcgcc tccatgcccg cagccgggaa cgcggcgccc ggccagcctc 65760 ggggtgacgc cgcaaacggg agatgctccc ggagaggcgc cgggcacagc cgagcgccgt 65820 caccaccgtg cgcactcgtg agctccagct cctcggcata gaagagaccg tcactcccgg 65880 tccgtgtagg cgatcgtgct gatcagcgcg ttctccgcct gacgcgagtc gagccgggta 65940 tgctgcacga caatgggaac gtccgattcg atcacgctgg catagtccgt atcgcgcggg 66000 atcggctcgg gttcggtcag atcgttgaac cggacgtgcc gggtgcgcct cgctgggacg 66060 gtcacccggt acggcccggc ggggtcgcgg tcgctgaagt agacggtgat ggcgacctgc 66120 gcgtcccggt ccgacgcatt caacaggcag gccgtctcat ggctcgtcat ctgcggctcg 66180 ggtccgttgc tccggcctgg gatgtagccc tctgcgattg cccagcgcgt ccgcccgatc 66240 ggcttctcca tatgtcctcc ctgctggctc ctctttggct gcctccctct gctgtccagg 66300 agcgacggcc tcttctcccg acgcgctcgg 
g gatccatgg ctgaggatcc tcgccgagcg 66360 ctccttgccg accggcgcgc cgagcgccga cgggctttga aagcacgcga ccggacacgt 66420 gatgccggcg cgacgaggcc gccccgcgtc tgatcccgat cgtgacatcg cgacgtccgc 66480 cggcgcctct gcaggccggc ctgagcgttg cgcggtcatg gtcgtcctcg cgtcaccgcc 66540 acccgccgat tcacatccca ccgcggcacg acgcttgctc aaaccgcggc gagacggccg 66600 ggcggctgtg gtaccggcca gcccggacgc gaggcccgag agggacagtg ggtccgccgt 66660 gaagcagtga ggcgatcgag gtggcagatg aaacacgttg acacgggccg acgagtcggc 66720 cgccggatag ggctcacgct cggtctcctc gcgagcatgg cgctcgccgg ctgtggcggc 66780 ccgagcgaga aaatcgtgca gggcacgcgg ctcgcgcccg gcgccgatgc gea; cgtcgcc 66840 gccgacgtcg accccgacgc cgcgaccacg cggctggcgg tggacgtcgt tea; cctctcg 66900 ccgcccgagc gcatcgaggc cggcagcgag cggttcgtcg tctggcagcg tc < : Gagctcc 66960 gagtccccgt ggcaacgggt cggagtgctc gactacaacg ctgccagccg aagaggcaag 67020 ctggccgaga cgaccgtgcc gcatgccaac ttcgagctgc tcatcaccgt cgagaagcag 67080 agcagccctc agtctccatc ttctgccgcc gtcatcgggc cgacgtccgt cgggtaacat 67140 gcagcgctga cgcgctatca gcccgccagc aggccccaga gccctgcctc gatcgccttc 67200 tccatcatat catccctgcg tactcctcca gcgacggccg cgtcgaagca accgccgtgc 67260 cggcgcggct ctacgtgcgc gacaggagag cgtcctggcg cggcctgcgc atcgctggaa 67320 ggatcggcgg agcatggaga aagaatcgag gatcgcgatc tacggcgcca tcgcagccaa 67380 gcggcggtca cgtggcgatc agttcatcgc cgccgccgtg accggcagct cggcgatgct 67440 ctccgagggc gtgcactccc tcgtcgatac tgcagacggg ctcctcctcc tgctcggcaa 67500 gcaccggagc gcacgcccgc ccgacgccga gcatccgttc ggccacggca aggagctcta 67560 tttctggacg ctgatcgtcg ccatcatgat cttcgccgcg ggcggcggcg tctcgatcta 67620 cgaagggatc ttgcacctct tgcacccgcg ccagatcgag gatccgacgt ggaactacgt 67680 cgtcctcggc gcagcggccg tcttcgaggg gacgtcgctc atcatctcga tccacgagtt 67740 caagaagaag gacggacagg gctacctcgc ggcgatgcgg tccag FACs acccgacgac 67800 gttcacgatc gtcctggagg actccgcggc gctcgccggg ctcaccatcg ccttcctcgg 67860 cgtctggctc gggcaccgcc tgggaaaccc ctacctcgac ggcgcggcgt cgatcggcat 67920 cggcctcgtg ctcgccgcgg tcgcggtctt cctcgccagc cagagccgtg ggctcctcgt 67980 
gggggagagc gcggacaggg agctcctcgc cgcgatccgc gcgctcgcca gcgcagatcc 68040 tggcgtgtcg gcggtggggc ggcccctgac gatgcacttc ggtccgcacg aagtcctggt 68100 cgtgctgcgc atcgagttcg acgccgcgct cacggcgtcc ggggtcgcgg aggcgatcga 68160 gcgcatcgag acccggatac ggagcgagcg acccgacgtg aagcacatct acgtcgaggc 68220 caggtcgctc caccagcgcg cgagggcgtg acgcgccgtg gagagaccgc gcgcggcctc 68280 cgccatcctc cgcggcgccc gggctcaggt ggccctcgca gcagggcgcg cctggcgggc 68340 aaaccgtgca gacgtcgtcc ttcgacgcga ggtacgctgg ttgcaagtcg tcacgccgta 68400 tcgcgaggtc cggcagcgcc ggagcccggg cgggccgggc gcacgaaggc gcggcgagcg 68460 caggcttcga ggggggcgac gtcatgagga aggccagggc gcatggggcg atgctcggcg 68520 ggcgagatga cggctggcgt cgcggcctcc ccggcgccgg cgcgcttcgc gccgcgctcc 68580 agcgcggtcg ctcgcgcgat ctcgcccggc gccggctcat cgcctccgtg tccctcgccg 68640 gcggcgccag catggcggtc gtctcgctgt tccagctcgg gatcatcgag cgcctgcccg 68700 atcctccgct tccagggttc gattcggcca aggtgacgag ctccgatatc 68750 <210> 2 <211> 1421 <212> PRT <213> Sorangium cellulosum <400> 2 Val Ala Asp Arg Pro Ile Glu Arg Ala Ala Glu Asp Pro Ile Ala Ile 1 5 10 15 Val Gly Ala Ser Cys Arg Leu Pro Gly Gly Val Ile Asp Leu Ser Gly 20 25 30 Phe Trp Thr Leu Leu Glu Gly Ser Arg Asp Thr Val Gly Arg Val Pro 35 40 45 Ala Glu Arg Trp Asp Ala Ala Ala Trp Phe Asp Pro Asp Pro Asp Ala 50 55 60 Pro Gly Lys Thr Pro Val Thr Arg Ala Ser Phe Leu Ser Asp Val Ala 65 70 75 80 Cys Phe Asp Ala Ser Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Arg 85 90 95 Met Asp Pro Ala His Arg Leu Leu Leu Glu Val Cys Trp Glu Ala Leu 100 105 110 Glu Asn Ala Ala Ile Ala Pro Ala Ala Leu Val Gly Thr Glu Thr Gly 115 120 125 Val Phe Ile Gly Ile Gly Pro Ser Glu Tyr Glu Ala Ala Leu Pro Gln 130 135 140 Ala Thr Ala Ser Ala Glu Ile Asp Ala His Gly Gly Leu Gly Thr Met 145 150 155 160 Pro Ser Val Gly Ala Gly Arg Ile Ser Tyr Ala Leu Gly Leu Arg Gly 165 170 175 Pro Cys Val Ala Val Asp Thr Ala Tyr Ser Ser Ser Leu Val Ala Val 180 185 190 His Leu Ala Cys Gln Ser Leu Arg Ser Gly Glu Cys Ser Thr Ala Leu 195 200
205 Ala Gly Gly Val Ser Leu Met Leu Ser Pro Ser Thr Leu Val Trp Leu 210 215 220 Ser Lys Thr Arg Ala Leu Ala Arg Asp Gly Arg Cys Lys Ala Phe Ser 225 230 235 240 Ala Glu Ala Asp Gly Phe Gly Arg Gly Glu Gly Cys Ala Val Val Val 245 250 255 Leu Lys Arg Leu Ser Gly Ala Arg Ala Asp Gly Asp Arg Ile Leu Ala 260 265 270 Val Ile Arg Gly Ser Ala Ile Asn His Asp Gly Ala Ser Ser Gly Leu 275 280 285 Thr Val Pro Asn Gly Ser Ser Gln Glu Ile Val Leu Lys Arg Ala Leu 290 295 300 Ala Asp Ala Gly Cys Ala Ala Ser Ser Val Gly Tyr Val Glu Ala His 305 310 315 320 Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Ile Gln Ala Leu Asn 325 330 335 Ala Val Tyr Gly Leu Gly Arg Asp Val Ala Thr Pro Leu Leu Ile Gly 340 345 350 Ser Val Lys Thr Asn Leu Gly His Pro Glu Tyr Ala Ser Gly Ile Thr 355 360 365 Gly Leu Leu Lys Val Val Leu Ser Leu Gln His Gly Gln Ile Pro Ala 370 375 380 His Leu His Ala Gln Ala Leu Asn Pro Arg Ile Ser Trp Gly Asp Leu 385 390 395 400 Arg Leu Thr Val Thr Arg Ala Arg Thr Pro Trp Pro Asp Trp Asn Thr 405 410 415 Pro Arg Arg Ala Gly Val Ser Ser Phe Gly Met Ser Gly Thr Asn Ala 420 425 430 His Val Val Leu Glu Ala Ala Pro Ala Ala Thr Cys Thr Pro Pro Ala 435 440 445 Pro Glu Arg Pro Ala Glu Leu Leu Val Leu Ala Ala Arg Thr Ala Ser 450 455 460 Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Asp His Leu Glu Thr Tyr 465 470 475 480 Pro Ser Gln Cys Leu Gly Asp Val Ala Phe Ser Leu Ala Thr Thr Arg 485 490 495 Ser Ala Met Glu His Arg Leu Ala Val Ala Ala Thr Ser Arg Glu Gly 500 505 510 Leu Arg Ala Ala Leu Asp Ala Ala Gln Gly Gln Thr Ser Pro Gly 515 520 525 Ala Val Arg Ser Ala Asp Ser Ser Arg Gly Lys Leu Ala Phe Leu 530 535 540 Phe Thr Gly Gln Gly Ala Gln Thr Leu Gly Met Gly Arg Gly Leu Tyr 545 550 555 560 Asp Val Trp Ser Ala Phe Arg Glu Ala Phe Asp Leu Cys Val Arg Leu 565 570 575 Phe Asn Gln Glu Leu Asp Arg Pro Leu Arg Glu Val Met Trp Ala Glu 580 585 590 Pro Ala Ser Val Asp Ala Ala Leu Leu Asp Gln Thr Ala Phe Thr Gln 595 600 605 Pro Ala Leu Phe Thr Phe Glu Tyr
Ala Leu Ala Ala Leu Trp Arg Ser 610 615 620 Trp Gly Val Glu Pro Glu Leu Val Ala Gly His Ser Ile Gly Glu Leu 625 630 635 640 Val Ala Ala Cys Val Ala Gly Val Phe Ser Leu Glu Asp Ala Val Phe 645 650 655 Leu Val Ala Ala Arg Gly Arg Leu Met Gln Ala Leu Pro Ala Gly Gly 660 665 670 Ala Met Val Ser Ile Glu Ala Pro Glu Ala Asp Val Ala Ala Ala Val 675 680 685 Ala Pro His Ala Ala Ser Val Val Ile Ala Ala Val Asn Ala Pro Asp 690 695 700 Gln Val Val Ile Ala Gly Ala Gly Gln Pro Val His Ala Ile Ala Ala 705 710 715 720 Ala Ala Ala Ala Arg Gly Ala Arg Thr Lys Ala Leu His Val Ser His 725 730 735 Ala Phe His Ser Pro Leu Met Ala Pro Met Leu Glu Ala Phe Gly Arg 740 745 750 Val Ala Glu Ser Val Ser Tyr Arg Arg Pro Ser Ile Val Leu Val Ser 755 760 765 Asn Leu Ser Gly Lys Ala Cys Thr Asp Glu Val Ser Ser Pro Gly Tyr 770 775 780 Trp Val Arg His Ala Arg Glu Val Val Arg Phe Ala Asp Gly Val Lys 785 790 795 800 Ala Leu His Ala Ala Ala Gly Ala Gly Thr Phe Val Glu Val Gly Pro Lys 805 810 815 Ser Thr Leu Leu Gly Leu Val Pro Ala Cys Met Pro Asp Ala Arg Pro 820 825 830 Ala Leu Leu Ala Ser Ser Arg Ala Gly Arg Asp Glu Pro Ala Thr Val 835 840 845 Leu Glu Ala Leu Gly Gly Leu Trp Ala Val Gly Gly Leu Val Ser Trp 850 855 860 Ala Gly Leu Phe Pro Ser Gly Gly Arg Arg Val Pro Leu Pro Thr Tyr 865 870 875 880 Pro Trp Gln Arg Glu Arg Tyr Trp Ile Asp Thr Lys Ala Asp Asp Ala 885 890 895 Ala Arg Gly Asp Arg Arg Ala Pro Gly Ala Gly His Asp Glu Val Glu 900 905 910 Glu Gly Gly Ala Val Arg Gly Gly Asp Arg Arg Ser Ala Arg Leu Asp 915 920 925 Pro His Pro Pro Glu Ser Gly Arg Arg Glu Lys Val Glu Ala Ala Gly 930 935 940 Asp Arg Pro Phe Arg Leu Glu Ile Asp Glu Pro Gly Val Leu Asp His 945 950 955 960 Leu Val Leu Arg Val Thr Glu Arg Arg Ala Pro Gly Leu Gly Val Glu 965 970 975 Glu Ile Ala Val Asp Ala Ala Gly Leu Ser Phe Asn Asp Val Gln Leu 980 985 990 Ala Leu Gly Met Val Pro Asp Asp Leu Pro Gly Lys Pro Asn Pro Pro 995 1000 1005 Leu Leu Leu Gly Gly Glu Cys Ala Gly Arg Ile Val Ala Val Gly Glu 1010
1015 1020 Gly Val Asn Gly Leu Val Val Gly Gln Pro Val Ile Ala Leu Ser Ala 1025 1030 1035 1040 Gly Ala Phe Ala Thr His Val Thr Thr Ser Ala Ala Leu Val Leu Pro 1045 1050 1055 Arg Pro Gln Ala Leu Ser Ala Ile Glu Ala Ala Ala Pro Val Ala 1060 1065 1070 Tyr Leu Thr Ala Trp Tyr Ala Leu Asp Arg Ile Ala Arg Leu Gln Pro 1075 1080 1085 Gly Glu Arg Val Leu Ile His Ala Ala Thr Gly Gly Val Gly Leu Ala 1090 1095 1100 Ala Val Gln Trp Ala Gln His Val Gly Ala Glu Val His Ala Thr Ala 1105 1110 1115 1120 Gly Thr Pro Glu Lys Arg Ala Tyr Leu Glu Ser Leu Gly Val Arg Tyr 1125 1130 1135 Val Ser Asp Ser Arg Ser Asp Arg Phe Val Ala Asp Val Arg Ala Trp 1140 1145 1150 Thr Gly Gly Glu Gly Val Asp Val Val Leu Asn Ser Leu Ser Gly Glu 1155 1160 1165 Leu Ile Asp Lys Ser Phe Asn Leu Leu Arg Ser His Gly Arg Phe Val 1170 1175 1180 Glu Leu Gly Lys Arg Asp Cys Tyr Ala Asp Asn Gln Leu Gly Leu Arg 1185 1190 1195 1200 Pro Phe Leu Arg Asn Leu Ser Phe Ser Leu Val Asp Leu Arg Gly Met 1205 1210 1215 Met Leu Glu Arg Pro Ala Arg Val Arg Ala Leu Leu Glu Glu Leu Leu 1220 1225 1230 Gly Leu Ile Ala Ala Gly Val Phe Thr Pro Pro Pro Ile Ala Thr Leu 1235 1240 1245 Pro Ala Ala Arg Val Ala Asp Ala Phe Arg Ser Met Ala Gln Ala Gln 1250 1255 1260 His Leu Gly Lys Leu Val Leu Thr Leu Gly Asp Pro Glu Val Gln Ile 1265 1270 1275 1280 Arg Ile Pro Thr His Ala Gly Ala Gly Pro Ser Thr Gly Asp Arg Asp 1285 1290 1295 Leu Leu Asp Arg Leu Ala Ala Ala Pro Ala Ala Arg Ala Ala Ala 1300 1305 1310 Leu Glu Ala Phe Leu Arg Thr Gln Val Ser Gln Val Leu Arg Thr Pro 1315 1320 1325 Glu Ile Lys Val Gly Ala Glu Ala Leu Phe Thr Arg Leu Gly Met Asp 1330 1335 1340 Ser Leu Met Ala Val Glu Leu Arg Asn Arg Ile Glu Ala Ser Leu Lys 1345 1350 1355 1360 Leu Lys Leu Ser Thr Thr Phe Leu Ser Thr Ser Pro Asn Ile Ala Leu 1365 1370 1375 Leu Ala Gln Asn Leu Leu Asp Ala Leu Ala Thr Ala Leu Ser Leu Glu 1380 1385 1390 Arg Val Ala Ala Glu Asn Leu Arg Ala Gly Val Gln Asn Asp Phe Val 1395 1400 1405 Ser Ser Gly Ala Asp Gln Asp
Trp Glu He He Ala Leu 1410 1415 1420 < 210 > 3 < 211 > 1410 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 3 Met Thr He Asn Gln Leu Leu Asn Glu Leu Glu His Gln Gly He Lys 1 5 10 15 Leu Ala Ala Asp Gly Glu Arg Leu Gln He Gln Wing Pro Lys Asn Wing 20 25 30 Leu Asn Pro Asn Leu Leu Wing Arg He Ser Glu His Lys Ser Thr He 35 40 45 Leu Thr Met Leu Arg Gln Arg Leu Pro Wing Glu He Val Pro Wing 50 55 60 Pro Wing Glu Arg His Wing Pro Phe Pro Leu Thr Asp He Gln Glu Ser 65 70 75 80 Tyr Trp Leu Gly Arg Thr Gly Wing Phe Thr Val Pro Ser Gly He His 85 90 95 Wing Tyr Arg Glu Tyr Asp Cys Thr Asp Leu Asp Val Pro Arg Leu Ser 100 105 110 Arg Wing Phe Arg Lys Val Val Wing Arg His Asp Met Leu Arg Wing His 115 120 125 Thr Leu Pro Asp Met Met Gln Val He Glu Pro Lys Val Asp Wing Asp 130 135 140 He Glu He He Asp Leu Arg Gly Leu Asp Arg Ser Thr Arg Glu Wing 145 150 155 160 Arg Leu Val Ser Leu Arg Asp Ala Met Ser His Arg He Tyr Asp Thr 165 170 175 Glu Arg Pro Pro Leu Tyr His Val Val Wing Val Arg Leu Asp Glu Arg 180 185 190 Gln Thr Arg Leu Val Leu Ser He Asp Leu He Asn Val Asp Leu Gly 195 200 205 Ser Leu Ser He He Phe Lys Asp Trp Leu Ser Phe Tyr Glu Asp Pro 210 215 220 Glu Thr Ser Leu Pro Val Leu Glu Leu Ser Tyr Arg Asp Tyr Val Leu 225 230 235 240 Wing Leu Glu Be Arg Lys Lys Ser Glu Wing His Gln Arg Ser Met Asp 245 250 255 Tyr Trp Lys Arg Arg He Wing Glu Leu Pro Pro Pro Thr Leu Pro 260 265 270 Met Lys Wing Asp Pro Ser Thr Leu Lys Glu He Arg Phe Arg His Thr 275 280 285 Glu Gln Trp Leu Pro Ser Asp Ser Trp Gly Arg Leu Lys Arg Arg Val 290 295 300 Gly Glu Arg Gly Leu Thr Pro Thr Gly Val He Leu Wing Ala Phe Ser 305 310 315 320 Glu Val He Gly Arg Trp Ser Wing Ser Pro Arg Phe Thr Leu Asn He 325 330 335 Thr Leu Phe Asn Arg Leu Pro Val His Pro Arg Val Asn Asp He Thr 340 345 350 Gly Asp Phe Thr Ser Met Val Leu Leu Asp He Asp Thr Thr Arg Asp 355 360 365 Lys Ser Phe Glu Gln Arg Ala Lys Arg He Gln Glu Gln Leu Trp Glu 370 375 380 Wing Met Asp His Cys Asp Val Ser Gly He Glu Val Gln Arg Glu Ala 385 390 395 400 
Wing Arg Val Leu Gly He Gln Arg Gly Wing Leu Phe Pro Val Val Leu 405 410 415 Thr Ser Wing Leu Asn Gln Gln Val Val Gly Val Thr Ser Leu Gln Arg 420 425 430 Leu Gly Thr Pro Val Tyr Thr Ser Thr Gln Thr Pro Gln Leu Leu Leu 435 440 445 Asp His Gln Leu Tyr Glu His Asp Gly Asp Leu Val Leu Wing Trp Asp 450 455 460 He Val Asp Gly Val Phe Pro Pro Asp Leu Leu Asp Asp Met Leu Glu 465 470 475 480 Wing Tyr Val Val Phe Leu Arg Arg Leu Thr Glu Glu Pro Trp Gly Glu 485 490 495 Gln Val Arg Cys Ser Leu Pro Pro Wing Gln Leu Glu Wing Arg Wing Ser 500 505 510 Wing Asn Wing Thr Asn Wing Leu Leu Ser Glu His Thr Leu His Gly Leu 515 520 525 Phe Wing Wing Arg Val Glu Gln Leu Pro Met Gln Leu Wing Val Val Ser 530 535 540 Wing Arg Lys Thr Leu Thr Tyr Glu Glu Leu Ser Arg Arg Ser Arg Arg 545 550 555 560 Leu Gly Wing Arg Leu Arg Glu Gln Gly Wing Arg Pro Asn Thr Leu Val 565 570 575 Wing Val Val Met Glu Lys Gly Trp Glu Gln Val Val Wing Val Leu Wing 580 585 590 Val Leu Glu Ser Gly Wing Wing Tyr Val Pro He Asp Wing Asp Leu Pro 595 600 605 Wing Glu Arg He His Tyr Leu Leu Asp His Gly Glu Val Lys Leu Val 610 615 620 Leu Thr Gln Pro Trp Leu Asp Gly Lys Leu Ser Trp Pro Pro Gly He 625 630 635 640 Gln Arg Leu Leu Val Ser Glu Ala Gly Val Glu Gly Asp Gly Asp Gln 645 650 655 Pro Pro Met Met Pro He Gln Thr Pro Ser Asp Leu Wing Tyr Val He 660 665 670 Tyr Thr Ser Gly Ser Thr Gly Leu Pro Lys Gly Val Met He Asp His 675 680 685 Arg Gly Wing Val Asn Thr He Leu Asp He Asn Glu Arg Phe Glu He 690 695 700 Gly Pro Gly Asp Arg Val Leu Ala Leu Ser Ser Leu Ser Phe Asp Leu 705 710 715 720 Ser Val Tyr Asp Val Phe Gly He Leu Wing Wing Gly Gly Thr He Val 725 730 735 Val Pro Asp Wing Ser Lys Leu Arg Asp Pro Wing His Trp Wing Glu Leu 740 745 750 He Glu Arg Glu Lys Val Thr Val Trp Asn Ser Val Pro Ala Leu Met 755 760 765 Arg Met Leu Val Glu His Phe Glu Gly Arg Pro Asp Ser Leu Ala Arg 770 775 780 Ser Leu Arg Leu Ser Leu Leu Ser Gly Asp Trp He Pro Val Gly Leu 785 790 795 800 Pro Gly Glu Leu Gln Wing He Arg Pro Gly Val Ser Val He Ser Leu 
805 810 815 Gly Gly Ala Thr Glu Ala Ser Ile Trp Ser Ile Gly Tyr Pro Val Arg 820 825 830 Asn Val Asp Leu Ser Trp Ala Ser Pro Pro Tyr Gly Arg Pro Leu Arg 835 840 845 Asn Gln Thr Phe His Val Leu Asp Glu Ala Leu Glu Pro Arg Pro Val 850 855 860 Trp Val Pro Gly Gln Leu Tyr Ile Gly Gly Val Gly Leu Ala Leu Gly 865 870 875 880 Tyr Trp Arg Asp Glu Glu Lys Thr Arg Lys Ser Phe Leu Val His Pro 885 890 895 Glu Thr Gly Glu Arg Leu Tyr Lys Thr Gly Asp Leu Gly Arg Tyr Leu 900 905 910 Pro Asp Gly Asn Ile Glu Phe Met Gly Arg Glu Asp Asn Gln Ile Lys 915 920 925 Leu Arg Gly Tyr Arg Val Glu Leu Gly Glu Ile Glu Glu Thr Leu Lys 930 935 940 Ser His Pro Asn Val Arg Asp Ala Val Val Val Val Val Gly Asn Asp 945 950 955 960 Ala Ala Asn Lys Leu Leu Leu Ala Tyr Val Val Pro Glu Gly Thr Arg 965 970 975 Arg Arg Ala Ala Glu Gln Asp Ala Ser Leu Lys Thr Glu Arg Ile Asp 980 985 990 Ala Arg Ala His Ala Ala Glu Ala Asp Gly Leu Ser Asp Gly Glu Arg 995 1000 1005 Val Gln Phe Lys Leu Ala Arg His Gly Leu Arg Arg Asp Leu Asp Gly 1010 1015 1020 Lys Pro Val Val Asp Leu Thr Gly Gln Asp Pro Arg Glu Ala Gly Leu 1025 1030 1035 1040 Asp Val Tyr Ala Arg Arg Arg Ser Val Arg Thr Phe Leu Glu Ala Pro 1045 1050 1055 Ile Pro Phe Val Glu Phe Gly Arg Phe Leu Ser Cys Leu Ser Val 1060 1065 1070 Glu Pro Asp Gly Ala Thr Leu Pro Lys Phe Arg Tyr Pro Ser Ala Gly 1075 1080 1085 Ser Thr Tyr Pro Val Gln Thr Tyr Ala Tyr Val Lys Ser Gly Arg Ile
1090 1095 1100 Glu Gly Val Asp Glu Gly Phe Tyr Tyr Tyr His Pro Phe Glu His Arg 1105 1110 1115 1120 Leu Leu Lys Leu Ser Asp His Gly Ile Glu Arg Gly Ala His Val Arg 1125 1130 1135 Gln Asn Phe Asp Val Phe Asp Glu Ala Ala Phe Asn Leu Leu Phe Val 1140 1145 1150 Gly Arg Ile Asp Ala Ile Glu Ser Leu Tyr Gly Ser Ser Arg Glu 1155 1160 1165 Phe Cys Leu Leu Glu Ala Gly Tyr Met Ala Gln Leu Leu Met Glu Gln 1170 1175 1180 Ala Pro Ser Cys Asn Ile Gly Val Cys Pro Val Gly Gln Phe Asn Phe 1185 1190 1195 1200 Glu Gln Val Arg Pro Val Leu Asp Leu Arg His Ser Asp Val Tyr Val 1205 1210 1215 His Gly Met Leu Gly Gly Arg Val Asp Pro Arg Gln Phe Gln Val Cys 1220 1225 1230 Thr Leu Gly Gln Asp Ser Ser Pro Arg Arg Ala Thr Thr Arg Gly Ala 1235 1240 1245 Pro Pro Gly Arg Glu Gln His Phe Ala Asp Met Leu Arg Asp Phe Leu 1250 1255 1260 Arg Thr Lys Leu Pro Glu Tyr Met Val Pro Thr Val Phe Val Glu Leu 1265 1270 1275 1280 Asp Ala Leu Pro Leu Thr Ser Asn Gly Lys Val Asp Arg Lys Ala Leu 1285 1290 1295 Arg Glu Arg Lys Asp Thr Ser Ser Pro Arg His Ser Gly His Thr Ala 1300 1305 1310 Pro Arg Asp Ala Leu Glu Glu Ile Leu Val Ala Val Val Arg Glu Val 1315 1320 1325 Leu Gly Leu Glu Val Val Gly Leu Gln Gln Ser Phe Val Asp Leu Gly 1330 1335 1340 Ala Thr Ser Ile His Ile Val Arg Met Arg Ser Leu Leu Gln Lys Arg 1345 1350 1355 1360 Leu Asp Arg Glu Ile Ala Ile Thr Glu Leu Phe Gln Tyr Pro Asn Leu 1365 1370 1375 Gly Ser Leu Ala Ser Gly Leu Arg Arg Asp Ser Arg Asp Leu Asp Gln 1380 1385 1390 Arg Pro Asn Met Gln Asp Arg Val Glu Val Arg Arg Lys Gly Arg Arg 1395 1400 1405 Arg Ser 1410 < 210 > 4 < 211 > 1832 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 4 Met Glu Glu Gln Glu Ser Ser Ala Ile Ala Val Ile Gly Met Ser Gly 1 5 10 15 Arg Phe Pro Gly Ala Arg Asp Leu Asp Glu Phe Trp Arg Asn Leu Arg 20 25 30 Asp Gly Thr Glu Ala Val Gln Arg Phe Ser Glu Gln Glu Leu Ala Ala 35 40 45 Ser Gly Val Asp Pro Ala Leu Val Leu Asp Pro Ser Tyr Val Arg Ala 50 55 60 Gly Ser Val Leu Glu Asp Val Asp Arg Phe Asp Ala Ala Phe Phe Gly 65 70 75 80 Ile
Ser Pro Arg Glu Wing Glu Leu Met Asp Pro Gln His Arg He Phe 85 90 95 Met Glu Cys Wing Trp Glu Wing Leu Glu Asn Wing Gly Tyr Asp Pro Thr 100 105 110 Wing Tyr Glu Gly Ser He Gly Val Tyr Wing Gly Wing Asn Met Ser Ser 115 120 125 Tyr Leu Thr Ser Asn Leu His Glu His Pro Wing Met Met Arg Trp Pro 130 135 140 Gly Trp Phe Gln Thr Leu He Gly Asn Asp Lys Asp Tyr Leu Wing Thr 145 150 155 160 His Val Ser Tyr Arg Leu Asn Leu Arg Gly Pro Ser He He Ser Val Gln 165 170 175 Thr Ala Cys Ser Thr Ser Leu Val Wing Val His Leu Wing Cys Met Ser 180 185 190 Leu Leu Asp Arg Glu Cys Asp Met Wing Leu Wing Gly Gly He Thr Val 195 200 205 Arg He Pro His Arg Wing Gly Tyr Val Tyr Wing Glu Gly Gly He Phe 210 215 220 Ser Pro Asp Gly His Cys Arg Ala Phe Asp Ala Lys Ala Asn Gly Thr 225 230 235 240 He Met Gly Asn Gly Cys Gly Val Val Leu Leu Pro Lys Asp Arg 245 250 255 Wing Leu Being Asp Gly Asp Pro Val Arg Wing Val He Leu Gly Ser Wing 260 265 270 Thr Asn Asn Asp Gly Wing Arg Lys He Gly Phe Thr Wing Pro Ser Glu 275 280 285 Val Gly Gln Wing Gln Wing He Met Glu Wing Leu Wing Leu Ala Gly Val 290 295 300 Glu Ala Arg Ser He Gln Tyr He Glu Thr His Gly Thr Gly Thr Leu 305 310 315 320 Leu Gly Asp Ala He Glu Thr Ala Ala Leu Arg Arg Val Phe Gly Arg 325 330 335 Asp Wing Being Wing Arg Arg Being Cys Wing He Gly Being Val Lys Thr Gly 340 345 350 He Gly His Leu Glu Being Wing Wing Gly He Wing Gly Leu He Lys Thr 355 360 365 Val Leu Wing Leu Glu His Arg Gln Leu Pro Pro Ser Leu Asn Phe Glu 370 375 380 Ser Pro Asn Pro Ser He Asp Phe Ala Be Ser Pro Phe Tyr Val Asn 385 390 395 400 Thr Ser Leu Lys Asp Trp Asn Thr Gly Ser Thr Pro Arg Arg Wing Gly 405 410 415 Val Ser Ser Phe Gly He Gly Gly Thr Asn Wing His Val Val Leu Glu 420 425 430 Glu Wing Pro Wing Wing Lys Leu Pro Wing Wing Pro Wing Wing Arg Wing 435 440 445 Glu Leu Phe Val Val Wing Wing Wing Wing Wing Wing Leu Asp Wing Wing 450 455 460 Wing Wing Arg Leu Arg Asp His Leu Gln Wing His Gln Gly He Ser Leu 465 470 475 480 Gly Asp Val Ala Phe Ser Leu Ala Thr Thr Arg Ser Pro Met 
Glu His 485 490 495 Arg Leu Ala Ala Ala Ala Pro Ser Arg Glu Ala Leu Arg Glu Gly Leu 500 505 510 Asp Wing Wing Wing Arg Gly Gln Thr Pro Pro Gly Wing Val Arg Gly Arg 515 520 525 Cys Ser Pro Gly Asn Val Pro Lys Val Val Phe Val Phe Pro Gly Gln 530 535 540 Gly Ser Gln Trp Val Gly Met Gly Arg Gln Leu Leu Ala Glu Glu Pro 545 550 555 560 Val Phe His Ala Ala Leu Ser Ala Cys Asp Arg Ala He Gln Ala Glu 565 570 575 Wing Gly Trp Ser Leu Leu Wing Glu Leu Wing Wing Asp Glu Gly Ser Ser 580 585 590 Gln Leu Glu Arg He Asp Val Val Gln Pro Val Leu Phe Ala Leu Wing 595 600 605 Val Wing Phe Wing Wing Leu Trp Arg Ser Trp Gly Val Wing Pro Asp Val 610 615 620 Val He Gly His Ser Met Gly Glu Val Wing Wing Wing His Val Wing Gly 625 630 635 640 Ala Leu Ser Leu Glu Asp Ala Val Ala He He Cys Arg Arg Ser Arg 645 650 655 Leu Leu Arg Arg He Ser Gly Gln Gly Glu Met Wing Val Thr Glu Leu 660 665 670 Ser Leu Wing Glu Wing Glu Wing Wing Leu Arg Gly Tyr Glu Asp Arg Val 675 680 685 Ser Val Wing Val Ser Asn Ser Pro Arg Ser Thr Val Leu Ser Gly Glu 690 695 700 Pro Wing Wing He Gly Glu Val Leu Ser Ser Leu Asn Wing Lys Gly Val 705 '710 715 720 Phe Cys Arg Arg Val Lys Val Asp Val Ala Ser His Ser Pro Gln Val 725 730 735 Asp Pro Leu Arg Glu Asp Leu Leu Wing Wing Leu Gly Gly Leu Arg Pro 740 745 750 Gly Wing Wing Wing Val Pro Met Arg Wing Thr Val Thr Gly Wing Wing Val 755 760 765 Wing Gly Pro Glu Leu Gly Wing Asn Tyr Trp Met Asn Asn Leu Arg Gln 770 775 780 Pro Val Arg Phe Ala Glu Val Val Gln Ala Gln Leu Gln Gly His 785 790 795 800 Gly Leu Phe Val Glu Met Ser Pro Pro Pro Le Le Thr Thr Ser Val 805 810 815 Glu Glu Met Arg Arg Wing Wing Gln Arg Wing Gly Wing Wing Val Gly Ser 820 825 830 Leu Arg Arg Gly Gln Asp Glu Arg Pro Wing Met Leu Glu Wing Leu Gly 835 840 845 Thr Leu Trp Wing Gln Gly Tyr Pro Val Pro Trp Gly Arg Leu Phe Pro 850 855 860 Wing Gly Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln Arg Glu 865 870 875 880 Arg Tyr Trp He Glu Wing Pro Wing Lys Wing Wing Gly Asp Arg Arg 885 890 895 Gly Val Arg Wing Gly Gly 
His Pro Leu Leu Gly Glu Met Gln Thr Leu 900 905 910 Ser Thr Gln Thr Ser Thr Arg Leu Trp Glu Thr Thr Leu Asp Leu Lys 915 920 925 Arg Leu Pro Trp Leu Gly Asp His Arg Val Gln Gly Ala Val Val Phe 930 935 940 Pro Gly Wing Wing Tyr Leu Glu Met Wing He Ser Ser Gly Wing Glu Wing 945 950 955 960 Leu Gly Asp Gly Pro Leu Gln He Thr Asp Val Val Leu Wing Glu Wing 965 970 975 Leu Wing Phe Wing Gly Asp Wing Wing Val Leu Val Gln Val Val Thr Thr 980 985 990 Glu Gln Pro Ser Gly Arg Leu Gln Phe Gln He Wing Ser Arg Wing Pro 995 1000 1005 Gly Wing Gly His Wing Being Phe Arg Val His Wing Arg Gly Wing Leu Leu 1010 1015 1020 Arg Val Glu Arg Thr Glu Val Pro Wing Gly Leu Thr Leu Ser Wing Val 1025 1030 1035 1040 Arg Ala Arg Leu Gln Ala Ser He Pro Ala Ala Ala Thr Tyr Ala Glu 1045 1050 1055 Leu Thr Glu Met Gly Leu Gln Tyr Gly Pro Ala Phe Gln Gly He Ala 1060 1065 1070 Glu Leu Trp Arg Gly Glu Gly Glu Wing Leu Gly Arg Val Arg Leu Pro 1075 1080 1085 Asp Ala Ala Gly Be Ala Ala Glu Tyr Arg Leu His Pro Ala Leu Leu 1090 1095 1100 Asp Ala Cys Phe Gln He Val Gly Ser Leu Phe Ala Arg Ser Gly Glu 1105 1110 1115 1120 Wing Thr Pro Trp Val Pro Val Glu Leu Gly Ser Leu Arg Leu Leu Gln 1125 1130 1135 Arg Pro Ser Gly Glu Leu Trp Cys His Wing Arg Val Val Asn His Gly 1140 1145 1150 His Gln Thr Pro Asp Arg Gln Gly Wing Asp Phe Trp Val Val Asp Ser 1155 1160 1165 Ser Gly Ala Val Val Ala Glu Val Cys Gly Leu Val Ala Gln Arg Leu 1170 1175 1180 Pro Gly Gly Val Arg Arg Glu Glu Asp Asp Trp Phe Leu Glu Leu 1185 1190 1195 1200 Glu Trp Glu Pro Ala Ala Val Gly Thr Ala Lys Val Asn Ala Gly Arg 1205 1210 1215 Trp Leu Leu Leu Gly Gly Gly Gly Glu Leu Gly Wing Ala Leu Arg Wing 1220 1225 1230 Met Leu Glu Wing Gly Gly His Wing Val Val His Wing Wing Ala Glu Asn Asn 1235 1240 1245 Thr Ser Wing Wing Wing Val Arg Wing Leu Leu Wing Lys Wing Phe Asp Gly 1250 1255 1260 Gln Wing Pro Thr Wing Val Val His Leu Gly Ser Leu Asp Gly Gly Gly 1265 1270 1275 1280 Glu Leu Asp Pro Gly Leu Gly Wing Gln Gly Wing Leu Asp Wing Pro Arg 1285 1290 1295 Be Wing 
Asp Val Ser Pro Asp Wing Leu Asp Pro Wing Leu Val Arg Gly 1300 1305 1310 Cys Asp Ser Val Leu Trp Thr Val Gln Ala Leu Wing Gly Met Gly Phe 1315 1320 1325 Arg Asp Ala Pro Arg Leu Trp Leu Leu Thr Arg Gly Ala Gln Ala Val 1330 1335 1340 Gly Wing Gly Asp Val Ser Val Thr Gln Wing Pro Leu Leu Gly Leu Gly 1345 1350 1355 1360 Arg Val He Wing Met Glu His Wing Asp Leu Arg Cys Wing Arg Val Asp 1365 1370 1375 Leu Asp Pro Ala Arg Pro Glu Glu Glu Leu Ala Ala Leu Leu Ala Glu 1380 1385 1390 Leu Leu Wing Asp Asp Wing Glu Wing Glu Val Wing Leu Arg Gly Gly Glu 1395 1400 1405 Arg Cys Val Wing Arg He Val Arg Arg Gln Pro Glu Thr Arg Pro Arg 1410 1415 1420 Gly Arg He Glu Ser Cys Val Pro Thr Asp Val Thr He Arg Wing Asp 1425 1430 1435 1440 Ser Thr Tyr Leu Val Thr Gly Gly Leu Gly Gly Leu Glu Leu Ser Val 1445 1450 1455 Wing Gly Trp Leu Ala Glu Arg Gly Ala Gly His Leu Val Leu Val Gly 1460 1465 1470 Arg Ser Gly Ala Ala Ser Val Glu Gln Arg Ala Ala Val Ala Ala Leu 1475 1480 1485 Glu Ala Arg Gly Ala Arg Val Thr Val Ala Ala Ala Ala Asp Val Ala Asp 1490 1495 1500 Arg Ala Gln Leu Glu Arg He Leu Arg Glu Val Thr Thr Ser Gly Met 1505 1510 1515 1520 Pro Leu Arg Gly Val Val His Wing Wing Gly He Leu Asp Asp Gly Leu 1525 1530 1535 Leu Met Gln Gln Thr Pro Wing Arg Phe Arg Lys Val Met Wing Pro Lys 1540 1545 1550 Val Gln Gly Wing Leu His Leu His Wing Leu Thr Arg Glu Wing Pro Leu 1555 1560 1565 Ser Phe Phe Val Leu Tyr Wing Ser Gly Val Gly Leu Leu Gly Ser Pro 1570 1575 1580 Gly Gln Gly Asn Tyr Wing Wing Wing Asn Thr Phe Leu Asp Wing Leu Wing 1585 1590 1595 1600 His His Arg Arg Wing Gln Gly Leu Pro Wing Leu Ser Val Asp Trp Gly 1605 1610 1615 Leu Phe Wing Glu Val Gly Met Wing Wing Gln Glu Asp Arg Gly Wing 1620 1625 1630 Arg Leu Val Ser Arg Gly Met Arg Ser Leu Thr Pro Asp Glu Gly Leu 1635 1640 1645 Ser Ala Leu Ala Arg Leu Leu Glu Ser Gly Arg Ala Gln Val Gly Val 1650 1655 1660 Met Pro Val Asn Pro Arg Leu Trp Val Glu Leu Tyr Pro Ala Wing Wing 1665 1670 1675 1680 Ser Ser Arg Met Leu Ser Arg Leu Val Thr Ala His Arg Ala 
Ser Ala 1685 1690 1695 Gly Gly Pro Wing Gly Asp Gly Asp Leu Leu Arg Wing Arg Leu Wing Wing 1700 1705 1710 Glu Pro Be Wing Arg Wing Wing Leu Leu Glu Pro Leu Leu Arg Wing Gln 1715 1720 1725 He Be Gln Val Leu Arg Leu Pro Glu Gly Lys He Glu Val Asp Wing 1730 1735 1740 Pro Leu Thr Ser Leu Gly Met Asn Ser Leu Met Gly Leu Glu Leu Arg 1745 1750 1755 1760 Asn Arg He Glu Wing Met Leu Gly He Thr Val Pro Wing Thr Leu Leu 1765 1770 1775 Trp Thr Tyr Pro Thr Val Wing Wing Leu Ser Gly His Leu Wing Arg Glu 1780 1785 1790 Wing Cys Glu Wing Wing Pro Val Glu Ser Pro His Thr Thr Wing Asp Ser 1795 1800 1805 Wing Val Glu He Glu Glu Met Ser Gln Asp Asp Leu Thr Gln Leu He 1810 1815 1820 Wing Wing Lys Phe Lys Wing Leu Thr 1825 1830 < 210 > 5 < 211 > 7257 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 5 Met Thr Thr Arg Gly Pro Thr Wing Gln Gln Asn Pro Leu Lys Gln Wing 1 5 10 15 Wing He He He Gln Arg Leu Glu Glu Arg Leu Wing Gly Leu Wing Gln 20 25 30 Wing Glu Leu Glu Arg Thr Glu Pro He Wing He Val Gly He Gly Cys 35 40 45 Arg Phe Pro Gly Gly Wing Asp Wing Pro Glu Wing Phe Trp Glu Leu Leu 50 55 60 Asp Wing Glu Arg Asp Wing Val Gln Pro Leu Asp Met Arg Trp Wing Leu 65 70 75 80 Val Gly Val Ala Pro Val Glu Ala Val Pro His Trp Ala Gly Leu Leu 85 90 95 Thr Glu Pro He Asp Cys Phe Asp Wing Wing Phe Phe Gly He Ser Pro 100 105 110 Arg Glu Wing Arg Ser Leu Asp Pro Gln His Arg Leu Leu Leu Glu Val 115 120 125 Wing Trp Glu Gly Leu Glu Asp Wing Gly He Pro Pro Arg Ser He Asp 130 135 140 Gly Ser Arg Thr Gly Val Phe Val Gly Ala Phe Thr Ala Asp Tyr Ala 145 150 155 160 Arg Thr Val Wing Arg Leu Pro Arg Glu Glu Arg Asp Wing Tyr Ser Wing 165 170 175 Thr Gly Asn Met Leu Ser Wing Wing Wing Gly Arg Leu Ser Tyr Thr Leu 180 185 190 Gly Leu Gln Gly Pro Cys Leu Thr Val Asp Thr Wing Cys Ser Ser 195 195 205 Leu Val Wing His Leu Le Cys Arg Ser Leu Arg Wing Gly Glu Ser 210 215 220 Asp Leu Wing Leu Wing Gly Gly Val Ser Wing Leu Leu Ser Pro Asp Met 225 230 235 240 Met Glu Ala Ala Ala Arg Thr Gln Ala Leu Ser Pro Asp Gly Arg Cys 
245 250 255 Arg Thr Phe Asp Wing Being Wing Asn Gly Phe Val Arg Gly Glu Gly Cys 260 265 270 Gly Leu Val Val Leu Lys Arg Leu Ser Asp Wing Gln Arg Asp Gly Asp 275 280 285 Arg He Trp Wing Leu He Arg Gly Being Wing He Asn His Asp Gly Arg 290 295 300 Ser Thr Gly Leu Thr Wing Pro Asn Val Leu Wing Gln Glu Thr Val Leu 305 310 315 320 Arg Glu Ala Leu Arg Ser Ala His Val Glu Ala Gly Ala Val Asp Tyr 325 330 335 Val Glu Thr His Gly Thr Gly Thr Ser Leu Gly Asp Pro He Glu Val 340 345 350 Glu Ala Leu Arg Ala Thr Val Gly Pro Ala Arg Ser Asp Gly Thr Arg 355 360 365 Cys Val Leu Gly Ala Val Lys Thr Asn He Gly His Leu Glu Ala Ala 370 375 380 Ala Gly Val Ala Gly Leu He Lys Ala Ala Leu Ser Leu Thr His Glu 385 390 395 400 Arg He Pro Arg Asn Leu Asn Phe Arg Thr Leu Asn Pro Arg He Arg 405 410 415 Leu Glu Gly Pro Wing Ala Leu Wing Leu Wing Thr Glu Pro Val Pro Trp Pro 420 425 430 Arg Thr Asp Arg Pro Arg Phe Wing Gly Val Ser Ser Phe Gly Met Ser 435 440 445 Gly Thr Asn Wing His Val Val Leu Glu Glu Wing Pro Wing Val Glu Leu 450 455 460 Trp Pro Wing Wing Pro Glu Arg Wing Wing Glu Leu Leu Val Leu Ser Gly 465 470 475 480 Lys Ser Glu Gly Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Glu His 485 490 495 Leu Asp Met His Pro Glu Leu Gly Leu Gly Asp Val Wing Phe Ser Leu 500 505 510 Wing Thr Thr Arg Ser Wing Met Ser His Arg Leu Ala Val Ala Val Thr 515 520 525 Ser Arg Glu Gly Leu Leu Ala Ala Leu Ser Ala Val Ala Gln Gly Gln 530 535 540 10 Thr Pro Ala Gly Ala Ala Arg Cys He Ala Ser Ser Ser Arg Gly Lys 545 550 555 560 Leu Ala Phe Leu Phe Thr Gly Gln Gly Ala Gln Thr Pro Gly Met Gly 565 570 575 Arg Gly Leu Cys Wing Wing Trp Pro Wing Phe Arg Glu Wing Phe Asp Arg 580 585 590 20 Cys Val Wing Leu Phe Asp Arg Glu Leu Asp Arg Pro Leu Arg Glu Val 595 600 605 Met Trp Wing Glu Wing Gly Ser Wing Glu Ser Leu Leu Leu Asp Gln Thr 610 615 620 25 Wing Phe Thr Gln Pro Wing Leu Phe Wing Val Glu Tyr Wing Leu Thr Wing 625 630 635 640 Leu Trp Arg Ser Trp Gly Val Glu Pro Glu Leu Leu Val Gly His Ser 30 645 650 655 He Gly Glu Leu Val Ala 
Wing Cys Val Wing Gly Val Phe Ser Leu Glu 660 665 670 35 Asp Gly Val Arg Leu Val Ala Wing Arg Gly Arg Leu Met Gln Gly Leu 675 680 685 Be Wing Gly Gly Wing Met Val Be Leu Gly Wing Pro Glu Wing Glu Val 690 695 700 40 Wing Wing Wing Val Wing Pro Wing Wing Wing Val Wing Wing Wing Wing Val 705 710 715 720 Asn Gly Pro Glu Gln Val Val He Wing Wing Gly Val Glu Gln Wing Val Gln 45 725 730 735 Wing Wing Wing Wing Gly Wing Wing Wing Arg Gly Wing Arg Thr Lys Arg Leu 740 745 750 50 His Val Ser His Wing Phe His Ser Pro Leu Met Glu Pro Met Leu Glu 755 760 765 Glu Phe Gly Arg Val Wing Wing Ser Val Thr Tyr Arg Arg Pro Ser Val 770 775 780 DH Ser Leu Val Ser Asn Leu Ser Gly Lys Val Val Thr Asp Glu Leu Ser 785 790 795 800 Wing Pro Gly Tyr Trp Val Arg His Val Arg Glu Wing Val Arg Phe Wing 805 810 815 Asp Gly Val Lys Ala Leu His Glu Ala Gly Ala Gly Thr Phe Val Glu 820 825 830 Val Gly Pro Lys Pro Thr Leu Leu Glu Leu Leu Pro Wing Cys Leu Pro 835 840 845 Glu Ala Glu Pro Thr Leu Leu Ala Ser Leu Arg Ala Gly Arg Glu Glu 850 855 860 Wing Wing Gly Val Leu Glu Wing Leu Gly Arg Leu Trp Wing Wing Gly Gly 865 870 875 880 Ser Val Ser Trp Pro Gly Val Phe Pro Thr Wing Gly Arg Arg Val Pro 885 890 895 Leu Pro Thr Tyr Pro Trp Gln Arg Gln Arg Tyr Trp He Glu Wing Pro 900 905 910 Wing Glu Gly Leu Gly Wing Thr Wing Wing Asp Wing Leu Wing Gln Trp Phe 915 920 925 Tyr Arg Val Asp Trp Pro Glu Met Pro Arg Ser Ser Val Asp Ser Arg 930 935 940 Arg Wing Arg Ser Gly Gly Trp Leu Val Leu Wing Asp Arg Gly Gly Val 945 950 955 960 Gly Glu Wing Wing Wing Wing Wing Leu Being Ser Gln Gly Cys Ser Cys Wing 965 970 975 Val Leu His Wing Pro Wing Glu Wing Wing Wing Val Wing Glu Gln Val Thr 980 985 990 Gln Ala Leu Gly Gly Arg Asn Asp Trp Gln Gly Val Leu Tyr Leu Trp 995 1000 1005 Gly Leu Asp Ala Val Val Glu Ala Gly Ala Ser Ala Glu Glu Val Ala 1010 1015 1020 Lys Val Thr His Leu Ala Ala Ala Pro Val Leu Ala Leu He Gln Ala 1025 1030 1035 1040 Leu Gly Thr Gly Pro Arg Ser Pro Arg Leu Trp He Val Thr Arg Gly 1045 1050 1055 Wing Cys Thr Val 
Gly Gly Pro Asp Ala Ala Pro Cys Gln Ala Ala 1060 1065 1070 Leu Trp Gly Met Gly Arg Val Ala Ala Leu Glu His Pro Gly Ser Trp 1075 1080 1085 Gly Gly Leu Val Asp Leu Asp Pro Glu Glu Ser Pro Thr Glu Val Glu 1090 1095 1100 Ala Leu Val Ala Glu Leu Leu Ser Pro Asp Ala Glu Asp Gln Leu Ala 1105 1110 1115 1120 Phe Arg Gln Gly Arg Arg Arg Ala Ala Arg Leu Val Ala Ala Pro Pro 1125 1130 1135 Glu Gly Asn Ala Ala Pro Val Ser Leu Ser Ala Glu Gly Ser Tyr Leu 1140 1145 1150 Val Thr Gly Gly Leu Gly Ala Leu Gly Leu Leu Val Ala Arg Trp Leu 1155 1160 1165 Val Glu Arg Gly Ala Gly His Leu Val Leu Ile Ser Arg His Gly Leu 1170 1175 1180 Pro Asp Arg Glu Glu Trp Gly Arg Asp Gln Pro Pro Glu Val Arg Ala 1185 1190 1195 1200 Arg Ile Ala Ala Ile Glu Ala Leu Glu Ala Gln Gly Ala Arg Val Thr 1205 1210 1215 Val Ala Ala Val Asp Val Ala Asp Ala Glu Gly Met Ala Ala Leu Leu 1220 1225 1230 Ala Ala Val Glu Pro Pro Leu Arg Val Gly Val His Ala Ala Gly Leu 1235 1240 1245 Leu Asp Asp Gly Leu Leu Ala His Gln Asp Ala Gly Arg Leu Ala Arg 1250 1255 1260 Val Leu Arg Pro Lys Val Glu Gly Ala Trp Val Leu His Thr Leu Thr 1265 1270 1275 1280 Arg Glu Gln Pro Leu Asp Leu Phe Val Leu Phe Ser Ser Ala Ser Gly 1285 1290 1295 Val Phe Gly Ser Ile Gly Gln Gly Ser Tyr Ala Ala Gly Asn Ala Phe 1300 1305 1310 Leu Asp Ala Leu Ala Asp Leu Arg Arg Thr Gln Gly Leu Ala Ala Leu 1315 1320 1325 Ser Ile Ala Trp Gly Leu Trp Ala Glu Gly Gly Met Gly Ser Gln Ala 1330 1335 1340 Gln Arg Arg Glu His Glu Ala Ser Gly Ile Trp Ala Met Pro Thr Ser 1345 1350 1355 1360 Arg Ala Leu Ala Ala Met Glu Trp Leu Leu Gly Thr Arg Ala Thr Gln 1365 1370 1375 Arg Val Val Ile Gln Met Asp Trp Ala His Ala Gly Ala Ala Pro Arg 1380 1385 1390 Asp Ala Ser Arg Gly Arg Phe Trp Asp Arg Leu Val Thr Ala Thr Lys 1395 1400 1405 Glu Ala Ser Ser Ala Val Pro Ala Val Glu Arg Trp Arg Asn Ala 1410 1415 1420 Ser Val Val Glu Thr Arg Ser Ala Leu Tyr Glu Leu Val Arg Gly Val 1425 1430 1435 1440 Val Ala
Gly Val Met Gly Phe Thr Asp Gln Gly Thr Leu Asp Val Arg 1445 1450 1455 Arg Gly Phe Wing Glu Gln Gly Leu Asp Being Leu Met Wing Val Glu He 1460 1465 1470 Arg Lys Arg Leu Gln Glu Glu Leu Gly Met Pro Leu Ser Wing Thr Leu 1475 1480 1485 Wing Phe Asp His Pro Thr Val Glu Arg Leu Val Glu Tyr Leu Leu Ser 1490 1495 1500 Gln Ala Leu Glu Leu Gln Asp Arg Thr Asp Val Arg Ser Val Arg Leu 1505 1510 1515 1520 Pro Wing Thr Glu Asp Pro He Wing He Val Gly Wing Wing Cys Arg Phe 1525 1530 1535 Pro Gly Val Glu Asp Leu Glu Ser Tyr Trp Gln Leu Leu Thr Glu 1540 1545 1550 Gly Val Val Val Ser Thr Glu Val Pro Wing Asp Arg Trp Asn Gly Wing 1555 1560 1565 Asp Gly Arg Val Pro Gly Ser Gly Glu Wing Gln Arg Gln Thr Tyr Val 1570 1575 1580 Pro Arg Gly Gly Phe Leu Arg Glu Val Glu Thr Phe Asp Ala Wing Phe 1585 1590 1595 1600 Phe His He Ser Pro Arg Glu Wing Met Ser Leu Asp Pro Gln Gln Arg 1605 1610 1615 Leu Leu Leu Glu Val Ser Trp Glu Wing He Glu Arg Wing Gly Gln Asp 1620 1625 1630 Pro Be Wing Leu Arg Glu Ser Pro Thr Gly Val Phe Val Gly Wing Gly 1635 1640 1645 Pro Asn Glu Tyr Wing Glu Arg Val Gln Glu Leu Wing Asp Glu Wing Wing 1650 1655 1660 Gly Leu Tyr Ser Gly Thr Gly Asn Met Leu Ser Val Wing Wing Gly Arg 1665 1670 1675 1680 Leu Ser Phe Phe Leu Gly Leu His Gly Pro Thr Leu Ala Val Asp Thr 1685 1690 1695 Ala Cys Ser Ser Leu Val Ala Leu His Leu Gly Cys Gln Ser Leu 1700 1705 1710 Arg Arg Gly Glu Cys Asp Gln Ala Leu Val Gly Gly Val Asn Met Leu 1715 1720 1725 Leu Ser Pro Lys Thr Phe Ala Leu Leu Ser Arg Met His Ala Leu Ser 1730 1735 1740 Pro Gly Gly Arg Cys Lys Thr Phe Ser Wing Asp Wing Asp Gly Tyr Ala 1745 1750 1755 1760 Arg Ala Glu Gly Cys Ala Val Val Val Leu Lys Arg Leu Ser Asp Ala 1765 1770 1775 Gln Arg Asp Arg Asp Pro He Leu Wing Val He Arg Gly Thr Wing He 1780 1785 1790 Asn His Asp Gly Pro Ser Ser Gly Leu Thr Val Pro Ser Gly Pro Wing 1795 1800 1805 Gln Glu Wing Leu Leu Arg Gln Wing Leu Wing His Wing Gly Val Val Pro 1810 1815 1820 Wing Asp Val Asp Phe Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly 1825 1830 
1835 1840 Asp Pro He Glu Val Arg Ala Leu Ser Asp Val Tyr Gly Gln Ala Arg 1845 1850 1855 Pro Wing Asp Arg Pro Leu He Leu Gly Wing Wing Lys Wing Asn Leu Gly 1860 1865 1870 His Met Glu Pro Wing Wing Gly Leu Wing Gly Leu Leu Wing Wing Val Leu 1875 1880 1885 Wing Leu Gly Gln Glu Gln He Pro Wing Gln Pro Glu Leu Gly Glu Leu 1890 1895 1900 Asn Pro Leu Leu Pro Trp Glu Wing Leu Pro Val Wing Val Wing Arg Wing 1905 1910 1915 1920 Wing Val Pro Trp Pro Arg Thr Asp Arg Pro Arg Phe Wing Gly Val Ser 1925 1930 1935 Be Phe Gly Met Be Gly Thr Asn Ala His Val Val Leu Glu Glu Ala 1940 1945 1950 Pro Ala Val Glu Leu Trp Pro Ala Ala Pro Glu Arg Ser Ala Glu Leu 1955 1960 1965 Leu Val Leu Ser Gly Lys Ser Glu Gly Ala Leu Asp Ala Gln Ala Ala 1970 1975 1980 Arg Leu Arg Glu His Leu Asp Met His Pro Glu Leu Gly Leu Gly Asp 1985 1990 1995 2000 Val Wing Phe Ser Leu Wing Thr Thr Arg Wing Wing Asn His Arg Leu 2005 2010 2015 Wing Val Wing Val Thr Ser Arg Glu Gly Leu Wing Wing Wing Leu Wing 2020 2025 2030 Wing Wing Gln Gly Gln Thr Pro Pro Gly Wing Wing Arg Cys He Wing Ser 2035 2040 2045 Ser Ser Arg Gly Lys Leu Wing Phe Leu Phe Thr Gly Gln Gly Wing Gln 2050 2055 2060 Thr Pro Gly Met Gly Arg Gly Leu Cys Wing Wing Trp Pro Wing Phe Arg 2065 2070 2075 2080 Glu Ala Phe Asp Arg Cys Val Ala Leu Phe Asp Arg Glu Leu Asp Arg 2085 2090 2095 Pro Leu Arg Glu Val Met Trp Wing Glu Pro Gly Ser Wing Glu Ser Leu 2100 2105 2110 Leu Leu Asp Gln Thr Ala Phe Thr Gln Pro Ala Leu Phe Thr Val Glu 2115 2120 2125 Tyr Ala Leu Thr Ala Leu Trp Arg Ser Trp Gly Val Glu Pro Glu Leu 2130 2135 2140 Val Ala Gly His Ser Ala Gly Glu Leu Val Ala Ala Cys Val Ala Gly 2145 2150 2155 2160 Val Phe Ser Leu Glu Asp Gly Val Arg Leu Val Ala Wing Arg Gly Arg 2165 2170 2175 Leu Met Gln Gly Leu Be Wing Gly Gly Wing Met Val Ser Leu Gly Wing 2180 2185 2190 Pro Glu Wing Glu Val Wing Wing Wing Val Wing Pro Wing Wing Wing Val 2195 2200 2205 Ser Wing Wing Val Asn Gly Pro Glu Val Val Val He Wing Wing Gly Val 2210 2215 2220 Glu Gln Wing Wing Gln Wing Wing Wing 
Wing Gly Wing Wing Wing Arg Gly Wing 2225 2230 2235 2240 Arg Thr Lys Arg Leu His Val Ser His Wing Ser His Ser Pro Leu Met 2245 2250 2255 Glu Pro Met Leu Val Glu Glu Phe Gly Arg Val Wing Wing Val Thr Tyr 2260 2265 2270 Arg Arg Pro Ser Val Ser Leu Val Ser Asn Leu Ser Val Val 2275 2280 2285 Wing Asp Glu Leu Ser Wing Pro Gly Tyr Trp Val Arg His Val Arg Glu 2290 2295 2300 Wing Val Arg Phe Wing Asp Gly Val Lys Wing Leu His Glu Wing Gly Wing 2305 2310 2315 2320 Gly Thr Phe Val Glu Val Gly Pro Lys Pro Thr Leu Leu Glu Leu Leu 2325 2330 2335 Pro Ala Cys Leu Pro Glu Ala Glu Pro Thr Leu Leu Ala Be Leu Arg 2340 2345 2350 Wing Gly Arg Glu Glu Wing Wing Gly Val Leu Glu Wing Leu Gly Arg Leu 2355 2360 2365 10 Trp Wing Wing Gly Gly Ser Val Ser Trp Pro Gly Val Phe Pro Thr Wing 2370 2375 2380 Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln Arg Gln Arg Tyr 2385 2390 2395 2400 Trp Pro Asp He Glu Pro Asp Ser Arg Arg His Wing Wing Asp Pro 2405 2410 2415 Thr Gln Gly Trp Phe Tyr Arg Val Asp Trp Pro Glu He Pro Arg Ser 2420 2425 2430 Leu Gln Lys Ser Glu Glu Wing Ser Arg Gly Ser Trp Leu Val Leu Wing 2435 2440 2445 25 Asp Lys Gly Gly Val Gly Glu Wing Val Wing Ala Ala Leu Ser Thr Arg 2450 2455 2460 Gly Leu Pro Cys Val Val Leu His Ala Pro Ala Glu Thr Ser Ala Thr 30 2465 2470 2475 2480 Wing Glu Leu Val Thr Glu Wing Wing Gly Gly Arg Ser Asp Trp Gln Val 2485 2490 2495 Val Leu Tyr Leu Trp Gly Leu Asp Wing Val Val Gly Wing Glu Wing Ser 2500 2505 2510 He Asp Glu He Gly Asp Wing Thr Arg Arg Wing Thr Wing Pro Val Leu 2515 2520 2525 40 Gly Leu Wing Arg Phe Leu Ser Thr Val Ser Cys Ser Pro Arg Leu Trp 2530 2535 2540 Val Val Thr Arg Gly Ala Cys He Val Gly Asp Glu Pro Ala He Wing 45 2545 2550 2555 2560 Pro Cys Gln Ala Ala Leu Trp Gly Met Gly Arg Val Ala Ala Leu Glu 2565 2570 2575 50 His Pro Gly Wing Trp Gly Gly Leu Val Asp Leu Asp Pro Arg Wing Ser 2580 2585 2590 Pro Pro Gln Wing Pro Pro He Asp Gly Glu Met Leu Val Thr Glu Leu 2595 2600 2605 DD Leu Ser Gln Glu Thr Glu Asp Gln Leu Ala Phe Arg His Gly Arg Arg 
2610 2615 2620 His Ala Ala Arg Leu Val Ala Ala Pro Pro Gln Gln Gln Ala Ala Pro 2625 2630 2635 2640 Val Ser Leu Ser Ala Glu Ala Ser Tyr Leu Val Thr Gly Gly Leu Gly 2645 2650 2655 Gly Leu Gly Leu He Val Wing Gln Trp Leu Val Glu Leu Gly Wing Arg 2660 2665 2670 His Leu Val Leu Thr Ser Arg Arg Gly Leu Pro Asp Arg Gln Wing Trp 2675 2680 2685 Cys Glu Gln Gln Pro Pro Glu He Arg Ala Arg He Wing Wing Val Glu 2690 2695 2700 Wing Leu Glu Wing Arg Gly Wing Arg Val Thr Val Wing Wing Val Asp Val 2705 2710 2715 2720 Wing Asp Val Glu Pro Met Thr Wing Leu Val Ser Ser Val Glu Pro Pro 2725 2730 2735 Leu Arg Gly Val Val His Wing Wing Gly Val Ser Val Met Arg Pro Leu 2740 2745 2750 Wing Glu Thr Asp Glu Thr Leu Leu Glu Ser Val Leu Arg Pro Lys Val 2755 2760 2765 Wing Gly Ser Trp Leu Leu His Arg Leu Leu His Gly Arg Pro Leu Asp 2770 2775 2780 Leu Phe Val Leu Phe Ser Ser Gly Wing Wing Val Trp Gly Ser His Ser 2785 2790 2795 2800 Gln Gly Ala Tyr Ala Ala Ala Asn Ala Phe Leu Asp Gly Leu Ala His 2805 2810 2815 Leu Arg Arg Ser Gln Ser Leu Pro Ala Leu Ser Val Wing Trp Gly Leu 2820 2825 2830 Trp Wing Glu Gly Gly Met Wing Asp Wing Glu Wing His Wing Arg Leu Ser 2835 2840 2845 Asp He Gly Val Leu Pro Met Ser Thr Ser Wing Wing Leu Ser Ala Leu 2850 2855 2860 Gln Arg Leu Val Glu Thr Gly Ala Ala Gln Arg Thr Val Thr Arg Met 2865 2870 2875 2880 Asp Trp Wing Arg Phe Wing Pro Val Tyr Thr Wing Arg Gly Arg Arg Asn 2885 2890 2895 Leu Leu Be Ala Leu Val Ala Wing Gly Arg Asp He He Wing Pro Pro 2900 2905 2910 Pro Wing Wing Thr Arg Asn Trp Arg Gly Leu Ser Val Wing Glu Wing 2915 2920 2925 Arg Val Wing Leu His Glu He Val His Gly Wing Val Ala Arg Val Leu 2930 2935 2940 Gly Phe Leu Asp Pro Be Wing Leu Asp Pro Gly Met Gly Phe Asn Glu 2945 2950 2955 2960 Gln Gly Leu Asp Ser Leu Met Wing Val Glu He Arg Asn Leu Leu Gln 2965 2970 2975 Wing Glu Leu Asp Val Arg Leu Ser Thr Thr Leu Wing Phe Asp His Pro 2980 2985 2990 Thr Val Gln Arg Leu Val Glu His Leu Leu Val Asp Val Leu Lys Leu 2995 3000 3005 Glu Asp Arg Ser Asp Thr Gln His Val 
Arg Ser Leu Wing Ser Asp Glu 3010 3015 3020 20 Pro He Wing He Val Gly Ala Wing Cys Arg Phe Pro Gly Val Glu 3025 3030 3035 3040 Asp Leu Glu Ser Tyr Trp Gln Leu Leu Wing Glu Gly Val Val Val Ser 3045 3050 3055 Wing Glu Val Pro Wing Asp Arg Trp Asp Wing Wing Asp Trp Tyr Asp Pro 3060 3065 3070 Asp Pro Glu He Pro Gly Arg Thr Tyr Val Thr Lys Gly Wing Phe Leu 3075 3080 3085 Arg Asp Leu Gln Arg Leu Asp Wing Thr Phe Phe Arg He Ser Pro Arg 3090 3095 3100 35 Glu Ala Met Ser Leu Asp Pro Gln Gln Arg Leu Leu Leu Glu Val Ser 3105 3110 3115 3120 Trp Glu Wing Leu Glu Be Wing Gly He Wing Pro Asp Thr Leu Arg Asp 3125 3130 3135 40 Ser Pro Thr Gly Val Phe Val Gly Wing Gly Pro Asn Glu Tyr Tyr Thr 3140 3145 3150 Gln Arg Leu Arg Gly Phe Thr Asp Gly Wing Wing Gly Leu Tyr Gly Gly 45 3155 3160 3165 Thr Gly Asn Met Leu Ser Val Thr Wing Gly Arg Leu Ser Phe Phe Leu 3170 3175 3180 50 Gly Leu His Gly Pro Thr Leu Wing Met Asp Thr Wing Cys Ser Being 3185 3190 3195 3200 Leu Val Ala Leu His Leu Ala Cys Gln Ser Leu Arg Leu Gly Glu Cys 3205 3210 3215 DD Asp Gln Ala Leu Val Gly Gly Val Asn Val Leu Leu Wing Pro Glu Thr 3220 3225 3230 Phe Val Leu Leu Ser Arg Met Arg Ala Leu Ser Pro Asp Gly Arg Cys 3235 3240 3245 Lys Thr Phe Ser Wing Asp Wing Asp Gly Tyr Ala Arg Gly Glu Gly Cys 3250 3255 3260 Wing Val Val Leu Lys Arg Leu Arg Asp Wing Gln Arg Wing Gly Asp 3265 3270 3275 3280 Be He Leu Wing Leu He Arg Gly Be Wing Val Asn His Asp Gly Pro 3285 3290 3295 Be Ser Gly Leu Thr Val Pro Asn Gly Pro Wing Gln Gln Wing Leu Leu 3300 3305 3310 Arg Gln Ala Leu Ser Gln Wing Gly Val Ser Pro Val Asp Val Asp Phe 3315 3320 3325 20 Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly Asp Pro He Glu Val 3330 3335 3340 Gln Ala Leu Ser Glu Val Tyr Gly Pro Gly Arg Ser Gly Asp Arg Pro 3345 3350 3355 3360 Leu Val Leu Gly Ala Ala Lys Ala Asn Val Ala His Leu Glu Ala Ala 3365 3370 3375 Ser Gly Leu Ala Ser Leu Leu Lys Ala Val Leu Ala Leu Arg His Glu 30 3380 3385 3390 Gln He Pro Ala Gln Pro Glu Leu Gly Glu Leu Asn Pro His Leu Pro 3395 3400 3405 35 
Trp Asn Thr Leu Pro Val Wing Val Pro Arg Lys Wing Val Pro Trp Gly 3410 3415 3420 Arg Gly Wing Arg Pro Arg Arg Wing Gly Val Ser Wing Phe Gly Leu Ser 3425 3430 3435 3440 40 Gly Thr Asn Val Val Val Leu Glu Glu Ala Pro Glu Val Glu Pro 3445 3450 3455 Wing Pro Wing Wing Pro Wing Arg Pro Val Val Leu Val Val Leu Ser Wing 45 3460 3465 3470 Lys Ser Wing Wing Wing Leu Asp Wing Wing Wing Wing Arg Leu Ser Wing His 3475 3480 3485 50 Leu Ser Wing His Pro Glu Leu Ser Leu Gly Asp Val Wing Phe Ser Leu 3490 3495 3500 Wing Thr Thr Arg Ser Pro Met Glu His Arg Leu Wing He Wing Thr Thr 3505 3510 3515 3520 D5 Ser Arg Glu Wing Leu Arg Gl / Wing Leu Asp Wing Wing Gln Gln Lys 3525 3530 3535 Thr Pro Gln Gly Wing Val Arg Gly Lys Wing Val Being Ser Arg Gly Lys 3540 3545 3550 Leu Wing Phe Leu Phe Thr Gly Gln Gly Wing Gln Met Pro Gly Met Gly 3555 3560 3565 Arg Gly Leu Tyr Glu Thr Trp Pro Wing Phe Arg Glu Wing Phe Asp Arg 3570 3575 3580 Cys Val Wing Leu Phe Asp Arg Glu He Asp Gln Pro Leu Arg Glu Val 3585 3590 3595 3600 Met Trp Wing Wing Pro Gly Leu Wing Gln Wing Wing Arg Leu Asp Gln Thr 3605 3610 3615 Ala Tyr Ala Gln Pro Ala Leu Phe Ala Leu Glu Tyr Ala Leu Ala Ala 3620 3625 3630 Leu Trp Arg Ser Trp Gly Val Glu Pro His Val Leu Leu Gly His Ser 3635 3640 3645 He Gly Glu Leu Val Ala Ala Cys Val Ala Gly Val Phe Ser Leu Glu 3650 3655 3660 Asp Wing Val Arg Leu Val Wing Wing Arg Gly Arg Leu Met Gln Wing Leu 3665 3670 3675 3680 Pro Ala Gly Gly Ala Met Val Ala Ala Ala Ala Ser Glu Ala Glu Val 3685 3690 3695 Wing Wing Wing Val Wing Pro Wing Wing Wing Thr Val Wing Wing Wing Val 3700 3705 3710 Asn Gly Pro Asp Wing Val Val Wing Wing Gly Wing Glu Val Gln Val Leu 3715 3720 3725 Wing Leu Gly Wing Thr Phe Wing Wing Arg Gly He Arg Thr Lys Arg Leu 3730 3735 3740 Wing Val Ser His Wing Phe His Ser Pro Leu Met Asp Pro Met Leu Glu 3745 3750 3755 3760 Asp Phe Gln Arg Val Wing Wing Thr He Wing Tyr Arg Wing Pro Asp Arg 3765 3770 3775 Pro Val Val Ser Asn Val Thr Gly His Val Wing Gly Pro Glu He Wing 3780 3785 3790 Thr Pro Glu 
Tyr Trp Val Arg His Val Arg Ser Wing Val Arg Phe Gly 3795 3800 3805 Asp Gly Ala Lys Ala Leu His Ala Ala Gly Ala Ala Thr Phe Val Glu 3810 3815 3820 Val Gly Pro Lys Pro Val Leu Leu Gly Leu Leu Pro Ala Cys Leu Gly 3825 3830 3835 3840 Glu Ala Ala Asp Ala Val Leu Val Pro Ser Leu Arg Ala Asp Arg Ser Glu 3845 3850 3855 Cys Glu Val Val Leu Ala Ala Leu Gly Ala Trp Tyr Ala Trp Gly Gly 3860 3865 3870 Ala Leu Asp Trp Lys Gly Val Phe Pro Asp Gly Wing Arg Arg Val Wing 3875 3880 3885 10 Leu Pro Met Tyr Pro Trp Gln Arg Glu Arg His Trp Met Asp Leu Thr 3890 3895 3900 Pro Arg Ser Ala Ala Pro Ala Gly He Ala Gly Arg Trp Pro Leu Ala 3905 3910 3915 3920 Gly Val Gly Leu Cys Met Pro Gly Ala Val Leu His His Val Leu Ser 3925 3930 3935 He Gly Pro Arg His Gln Pro Phe Leu Gly Asp His Leu Val Phe Gly 3940 3945 3950 Lys Val Val Pro Gly Ala Phe His Val Ala Val He Leu Ser He 3955 3960 3965 25 Wing Wing Glu Arg Trp Pro Glu Arg Wing He Glu Leu Thr Gly Val Glu 3970 3975 3980 Phe Leu Lys Ala He Wing Met Glu Pro Asp Gln Glu Val Glu Leu His 30 3985 3990 3995 4000 Wing Val Leu Thr Pro Glu Wing Wing Gly Asp Gly Tyr Leu Phe Glu Leu 4005 4010 4015 Ala Thr Leu Ala Ala Pro Glu Thr Glu Arg Arg Trp Thr Thr His Wing 4020 4025 4030 Arg Gly Arg Val Gln Pro Thr Asp Gly Ala Pro Gly Ala Leu Pro Arg 4035 4040 4045 40 Leu Glu Val Leu Glu Asp Arg Ala He Gln Pro Leu Asp Phe Wing Gly 4050 4055 4060 Phe Leu Asp Arg Leu Ser Wing Val Arg He Gly Trp Gly Pro Leu Trp 45 4065 4070 4075 4080 Arg Trp Leu Gln Asp Gly Arg Val Gly Asp Glu Wing Ser Leu Wing Thr 4085 4090 4095 50 Leu Val Pro Thr Tyr Pro Asn Wing His Asp Val Wing Pro Leu His Pro 4100 4105 4110 He Leu Leu Asp Asn Gly Phe Wing Val Ser Leu Leu Ser Thr Arg Ser 4115 4120 4125 DD Glu Pro Glu Asp Asp Gly Thr Pro Pro Leu Pro Phe Wing Val Glu Arg 4130 4135 4140 Val Arg Trp Trp Arg Ala Pro Val Gly Arg Val Arg Cys Gly Val 4145 4150 4155 4160 Pro Arg Ser Gln Wing Phe Gly Val Ser Ser Phe Val Leu Val Asp Glu 4165 4170 4175 Thr Gly Glu Val Val Glu Wing Glu Val Glu Gly Phe Val Cys Arg 
Arg Ala 4180 4185 4190 Pro Arg Glu Val Phe Leu Arg Gln Glu Ser Gly Ala Ser Thr Wing Ala 4195 4200 4205 Leu Tyr Arg Leu Asp Trp Pro Glu Ala Pro Leu Pro Asp Ala Pro Wing 4210 4215 4220 Glu Arg He Glu Glu Ser Trp Val Val Val Wing Wing Pro Gly Ser Glu 4225 4230 4235 4240 Met Wing Wing Wing Leu Wing Thr Arg Leu Asn Arg Cys Wing Leu Val Glu 4245 4250 4255 Pro Lys Gly Leu Glu Wing Ala Leu Wing Gly Val Ser Pro Wing Gly Val 4260 4265 4270 He Cys Leu Trp Glu Wing Gly Wing His Glu Wing Glu Wing Pro Wing Wing Wing 4275 4280 4285 Gln Arg Val Wing Thr Glu Gly Leu Ser Val Val Gln Wing Leu Arg Asp 4290 4295 4300 Arg Wing Val Arg Leu Trp Trp Val Thr Met Gly Wing Val Wing Val Glu 4305 4310 4315 4320 Wing Gly Glu Arg Val Gln Val Wing Ala Thr Wing Pro Val Trp Gly Leu Gly 4325 4330 4335 Arg Thr Val Met Gln Glu Arg Pro Glu Leu Ser Cys Thr Leu Val Asp 4340 4345 4350 Leu Glu Pro Glu Wing Asp Wing Wing Arg Wing Wing Asp Val Leu Leu Arg 4355 4360 4365 Glu Leu Gly Arg Wing Asp Asp Glu Thr Gln Val Wing Phe Arg Ser Gly 4370 4375 4380 Lys Arg Arg Val Wing Arg Leu Val Lys Wing Thr Thr Pro Glu Gly Leu 4385 4390 4395 4400 Leu Val Pro Asp Wing Glu Ser Tyr Arg Leu Glu Wing Gly Gln Lys Gly 4405 4410 4415 Thr Leu Asp Gln Leu Arg Leu Wing Pro Wing Gln Arg Arg Wing Pro Gly 4420 4425 4430 Pro Gly Glu Val Glu He Lys Val Thr Wing Ser Gly Leu Asn Phe Arg 4435 4440 4445 Thr Val Leu Wing Val Leu Gly Met Tyr Pro Gly Asp Wing Gly Pro Met 4450 4455 4460 Gly Gly Asp Cys Wing Gly Val Allah Thr Wing Val Gly Gln Gly Val Arg 4465 4470 * > 4475 4480 Hxs Val Wing Val Gly Asp Wing Val Met Thr Leu Gly Thr Leu His Arg 4485 4490 4495 Phe Val Thr Val Asp Ala Arg Leu Val Val Arg Gln Pro Ala Gly Leu 4500 4505 4510 Thr Pro Ala Gln Ala Ala Thr Val Pro Val Ala Phe Leu Thr Ala Trp 4515 4520 4525 Leu Ala Leu His Asp Leu Gly Asn Leu Arg Arg Gly Glu Arg Val Leu 4530 4535 4540 20 He His Wing Wing Wing Wing Gly Gly Val Gly Met Wing Wing Val Gln He Wing 4545 4550 4555 4560 Arg Trp He Gly Wing Glu Val Phe Wing Thr Wing Ser Pro Ser Lys 
Trp 4565 4570 4575 Ala Ala Gl Gln Ala Met Gly Val Pro Arg Thr His Ile Ala Ser Ser 4580 4585 4590 Arg Thr Leu Glu Phe Ala Glu Thr Phe Arg Gln Val Thr Gly Gly Arg 4595 4600 4605 Gly Val Asp Val Val Leu Asn Ala Leu Ala Gly Glu Phe Val Asp Ala 4610 4615 4620 Ser Leu Ser Leu Leu Ser Thr Gly Gly Arg Phe Leu Glu Met Gly Lys 4625 4630 4635 4640 Thr Asp Ile Arg Asp Arg Ala Ala Val Ala Ala Ala His Pro Gly Val 4645 4650 4655 Arg Tyr Arg Val Phe Asp Ile Leu Glu Leu Ala Pro Asp Arg Thr Arg 4660 4665 4670 Glu Ile Leu Glu Arg Val Val Glu Gly Phe Ala Ala Gly His Leu Arg 4675 4680 4685 Ala Leu Pro Val His Ala Phe Ala Ile Thr Lys Ala Glu Ala Ala Phe 4690 4695 4700 Arg Phe Met Ala Gln Ala Arg His Gln Gly Lys Val Val Leu Leu Pro 4705 4710 4715 4720 Ala Pro Ala Ala Pro Pro Leu Ala Pro Thr Gly Thr Val Leu Leu Thr 4725 4730 4735 Gly Gly Leu Gly Ala Leu Gly Leu His Val Ala Arg Trp Leu Ala Gln 4740 4745 4750
Gln Gly Val Pro His Met Val Leu Thr Gly Arg Arg Gly Leu Asp Thr 4755 4760 4765 Pro Gly Wing Wing Lys Wing Val Wing Glu He Glu Wing Leu Gly Wing Arg 4770 4775 4780 Val Thr He Wing Wing Being Asp Val Wing Asp Arg Asn Ala Leu Glu Ala 4785 4790 4795 4800 Val Leu Gln Ala He Pro Ala Glu Trp Pro Leu Gln Gly Val He His 4805 4810 4815 Ala Ala Gly Ala Leu Asp Asp Gly Val Leu Asp Glu Gln Thr Thr Asp 4820 4825 4830 Arg Phe Ser Arg Val Leu Wing Pro Lys Val Thr Gly Wing Trp Asn Leu 4835 4840 4845 His Glu Leu Thr Wing Gly Asn Asp Leu Wing Phe Phe Val Leu Phe Ser 4850 4855 4860 Ser Met Ser Gly Leu Leu Gly Ser Wing Gly Gln Ser Asn Tyr Ala Wing 4865 4870 4875 4880 Wing Asn Thr Phe Leu Asp Wing Leu Wing Wing His Arg Arg Wing Glu Gly 4885 4890 4895 Leu Ala Ala Gln Ser Leu Ala Trp Gly Pro Trp Ser Asp Gly Gly Met 4900 4905 4910 Wing Wing Gly Leu Wing Wing Leu Gln Wing Arg Leu Wing Arg His Gly 4915 4920 4925 Met Gly Ala Leu Ser Pro Wing Gln Gly Thr Ala Leu Leu Gly Gln Wing 4930 4935 4940 Leu Wing Arg Pro Glu Thr Gln Leu Gly Wing Met Ser Leu Asp Val Arg 4945 4950 4955 4960 Ala Ala Ser Gln Ala Ser Gly Ala Ala Val Pro Pro Val Trp Arg Ala 4965 4970 4975 Leu Val Arg Ala Glu Ala Arg His Thr Ala Ala Gly Ala Gln Gly Ala 4980 4985 4990 Leu Ala Ala Arg Leu Gly Ala Leu Pro Glu Ala Arg Arg Ala Asp Glu 4995 5000 5005 Val Arg Lys Val Val Gln Ala Glu He Ala Arg Val Leu Ser Trp Ser 5010 5015 5020 Wing Wing Wing Val Pro Val Asp Arg Pro Leu Ser Asp Leu Gly Leu 5025 5030 5035 5040 Asp Ser Leu Thr Ala Val Glu Leu Arg Asn Val Leu Gly Gln Arg Val 5045 5050 5055 Gly Wing Thr Leu Pro Wing Thr Leu Wing Phe Asp Hxs Pro Thr Val Asp 5060 5065 5070 Wing Leu Thr Arg Trp Leu Leu Asp Lys Val Leu Wing Val Ala Glu Pro 5075 5080 5085 Ser Val Ser Be Ala Lys Ser Ser Pro Gln Val Ala Leu Asp Glu Pro 5090 5095 5100 He Wing He He Gly He Gly Cys Arg Phe Pro Gly Val Wing Asp 5105 5110 5115 5120 Pro Glu Ser Phe Trp Arg Leu Leu Glu Glu Gly Ser Asp Ala Val Val 5125 5130 5135 Glu Val Pro His Glu Arg Trp Asp He Asp Wing Phe Tyr Asp 
Pro Asp 5140 5145 5150 Pro Asp Val Arg Gly Lys Met Thr Thr Arg Phe Gly Gly Phe Leu Ser 5155 5160 5165 Asp He Asp Arg Phe Asp Pro Wing Phe Phe Gly He Ser Pro Arg Glu 5170 5175 5180 Wing Thr Thr Met Asp Pro Gln Gln Arg Leu Leu Glu Thr Ser Trp 5185 5190 5195 5200 Glu Ala Phe Glu Arg Ala Gly He Leu Pro Glu Arg Leu Met Gly Ser 5205 5210 5215 Asp Thr Gly Val Phe Val Gly Leu Phe Tyr Gln Glu Tyr Ala Ala Leu 5220 5225 5230 Wing Gly Gly He Glu Wing Phe Asp Gly Tyr Leu Gly Thr Gly Thr Thr 5235 5240 5245 Wing Ser Val Wing Ser Gly Arg He Ser Tyr Val Leu Gly Leu Lys Gly 5250 5255 5260 Pro Ser Leu Thr Val Asp Thr Wing Cys Ser Ser Ser Leu Val Wing Val 5265 5270 5275 5280 Hxs Leu Ala Cys Gln Ala Leu Arg Arg Gly Glu Cys Ser Val Ala Leu 5285 5290 5295 Wing Gly Gly Val Wing Leu Wing Leu Thr Pro Wing Thr Phe Val Glu Phe 5300 5305 5310 Ser Arg Leu Arg Gly Leu Wing Pro Asp Gly Arg Cys Lys Ser Phe Ser 5315 5320 5325 Ala Wing Wing Asp Gly Val Gly Trp Ser Glu Gly Cys Wing Met Leu Leu 5330 5335 5340 Leu Lys Pro Leu Arg Asp Wing Gln Arg Asp Gly Asp Pro He Leu Wing 5345 5350 5355 5360 Val He Arg Gly Thr Wing Val Asn Gln Asp Gly Arg Ser Asn Gly Leu 5365 5370 5375 Thr Ala Pro Asn Gly Be Ser Gln Gln Glu Val He Arg Arg Ala Leu 5380 5385 5390 Glu Gln Wing Gly Leu Wing Pro Wing Asp Val Ser Tyr Val Glu Cys His 5395 5400 5405 10 Gly Thr Gly Thr Thr Leu Gly Asp Pro He Glu Val Gln Ala Leu Gly 5410 5415 5420 Wing Val Leu Wing Gln Gly Arg Pro Ser Asp Arg Pro Leu Val He Gly 5425 5430 5435 5440 Ser Val Lys Ser Asn He Gly His Thr Gln Ala Wing Wing Gly Val Wing 5445 5450 5455 Gly Val He Lys Val Ala Leu Ala Leu Glu Arg Gly Leu He Pro Arg 5460 5465 5470 Ser Leu His Phe Asp Ala Pro Asn Pro His He Pro Trp Ser Glu Leu 5475 5480 5485 25 Wing Val Gln Val Wing Ala Lys Pro Val Glu Trp Thr Arg Asn Gly Val 5490 5495 5500 Pro Arg Arg Wing Gly Val Being Ser Phe Gly Val Ser Gly Thr Asn Wing 30 5505 5510 5515 5520 His Val Val Leu Glu Glu Ala Pro Ala Ala Ala Phe Ala Pro Ala Ala 5525 5530 5535 Wing Arg Wing Wing Glu Leu Phe 
Val Leu Ser Wing Lys Wing Wing Wing 5540 5545 5550 Leu Asp Wing Gln Wing Wing Arg Leu Wing His Val Val Wing His Pro 5555 5560 5565 40 Glu Leu Gly Leu Gly Asp Leu Wing Phe Ser Leu Wing Thr Thr Arg Ser 5570 5575 5580 Pro Met Thr Tyr Arg Leu Wing Val Wing Wing Thr Ser Arg Glu Wing Leu 45 5585 5590 5595 5600 Be Ala Ala Leu Asp Thr Ala Ala Gln Gly Gln Ala Pro Pro Ala Ala 5605 5610 5615 50 Wing Arg Gly His Wing Ser Thr Gly Wing Wing Pro Lys Val Val Phe Val 5620 5625 5630 Phe Pro Gly Gln Gly Ser Gln Trp Leu Gly Met Gly Gln Lys Leu Leu 5635 5640 5645 DD Ser Glu Glu Pro Val Phe Arg Asp Ala Leu Be Ala Cys Asp Arg Ala 5650 5655 5660 He Gln Ala Glu Ala Gly Trp Ser Leu Leu Ala Glu Leu Ala Ala Asp 5665 5670 5675 5680 Glu Thr Thr Ser Gln Leu Gly Arg He Asp Val Val Gln Pro Ala Leu 5685 5690 5695 Phe Ala He Glu Val Ala Leu Ser Ala Leu Trp Arg Ser Trp Gly Val 5700 5705 5710 Glu Pro Asp Ala Val Val Gly His Ser Met Gly Glu Val Ala Ala Ala 5715 5720 5725 His Val Ala Gly Ala Leu Ser Leu Glu Asp Ala Val Wing He He Cys 5730 5735 5740 Arg Arg Ser Leu Leu Leu Arg Arg He Ser Gly Gln Gly Met Ala 5745 5750 5755 5760 Val Val Glu Leu Ser Leu Ala Glu Ala Glu Ala Ala Leu Leu Gly Tyr 5765 5770 5775 Glu Asp Arg Leu Ser Val Wing Val Ser Asn Ser Pro Arg Ser Thr Val 5780 5785 5790 Leu Wing Gly Glu Pro Wing Wing Leu Wing Glu Val Leu Wing He Leu Wing 5795 5800 5805 Wing Lys Gly Val Phe Cys Arg Arg Val Lys Val Asp Val Ala Ser His 5810 5815 5820 Ser Pro Gln He Asp Pro Leu Arg Asp Glu Leu Leu Wing Wing Leu Gly 5825 5830 5835 5840 Glu Leu Glu Pro Arg Gln Wing Thr Val Ser Met Arg Ser Thr Val Thr 5845 5850 5855 Be Thr He Met Wing Gly Pro Glu Leu Val Wing Ser Tyr Trp Wing Asp 5860 5865 5870 Asn Val Arg Gln Pro Val Arg Phe Wing Glu Wing Val Gln Ser Leu Met 5875 5880 5885 Glu Asp Gly His Gly Leu Phe Val Glu Met Ser Pro His Pro He Leu 5890 5895 5900 Thr Thr Ser Val Glu Glu He Arg Arg Wing Thr Lys Arg Glu Gly Val 5905 5910 5915 5920 Wing Val Gly Ser Leu Arg Arg Gly Gln Asp Glu Arg Leu Ser Met Leu 5925 5930 5935 Glu 
Ala Leu Gly Ala Leu Trp Val His Gly Gln Ala Val Gly Trp Glu 5940 5945 5950 Arg Leu Phe Ser Ala Gly Gly Ala Gly Leu Arg Arg Val Pro Leu Pro 5955 5960 5965
Thr Tyr Pro Trp Gln Arg Glu Arg Tyr Trp Val Asp Wing Pro Thr Gly 5970 5975 5980 Gly Wing Wing Gly Gly Ser Arg Phe Wing His Wing Gly Ser His Pro Leu 5985 5990 5995 6000 Leu Gly Glu Met Gln Thr Leu Ser Thr Gln Arg Ser Thr Arg Val Trp 6005 6010 6015 Glu Thr Thr Leu Asp Leu Lys Arg Leu Pro Trp Leu Gly Asp His Arg 6020 6025 6030 Val Gln Gly Ala Val Val Phe Pro Gly Ala Ala Tyr Leu Glu Met Ala 6035 6040 6045 Leu Ser Ser Gly Ala Glu Ala Leu Gly Asp Gly Pro Leu Gln Val Ser 6050 6055 6060 Asp Val Val Leu Wing Glu Wing Leu Wing Phe Wing Asp Asp Thr Pro Wing 6065 6070 6075 6080 Wing Val Gln Val Met Wing Thr Glu Glu Arg Pro Gly Arg Leu Gln Phe 6085 6090 6095 His Val Ala Ser Arg Val Pro Gly His Gly Gly Ala Wing Phe Arg Ser 6100 6105 6110 His Wing Arg Gly Val Leu Arg Gln He Glu Arg Wing Glu Val Pro Wing 6115 6120 6125 Arg Leu Asp Leu Wing Wing Leu Arg Wing Arg Leu Gln Ala Ser Ala Pro 6130 6135 6140 Wing Wing Thr Wing Tyr Wing Wing Leu Wing Glu Met Gly Leu Glu Tyr Gly 6145 6150 6155 6160 Pro Ala Phe Gln Gly Leu Val Glu Leu Trp Arg Gly Glu Gly Glu Ala 6165 6170 6175 Leu Gly Arg Val Arg Leu Pro Glu Ala Wing Gly Ser Pro Ala Wing Cys 6180 6185 6190 Arg Leu 'His Pro Wing Leu Leu Asp Wing Cys Phe His Val Ser Wing 6195 6200 6205 Phe Wing Asp Arg Gly Glu Wing Thr Pro Trp Val Pro Val Glu He Gly 6210 6215 6220 Ser Leu Arg Trp Phe Gln Arg Pro Ser Gly Glu Leu Trp Cys His Wing 6225 6230 6235 6240 Arg Ser Val Ser His Gly Lys Pro Thr Pro Asp Arg Arg Ser Thr Asp 6245 6250 6255 Phe Trp Val Val Asp Ser Thr Gly Wing He Val Wing Glu He Ser Gly 6260 6265 6270 Leu Val Wing Gln Arg Leu Wing Gly Gly Val Arg Arg Ar Glu Glu Asp 6275 6280 6285 Asp Trp Phe Met Glu Pro Wing Trp Glu Pro Thr Wing Val Pro Gly Ser 6290 6295 6300 Glu Val Met Wing Gly Arg Trp Leu Leu He Gly Ser Gly Gly Gly Leu 6305 6310 6315 6320 Gly Ala Ala Leu His Ser Ala Leu Thr Glu Ala Gly His Ser Val Val 6325 6330 6335 His Wing Thr Gly Arg Gly Thr Wing Wing Wing Gly Leu Gln Wing Leu Leu 6340 6345 6350 Thr Wing Being Phe Asp Gly Gln Wing Pro 
Thr Ser Val Val His Leu Gly 6355 6360 6365 Ser Leu Asp Glu Arg Gly Val Leu Asp Wing Asp Wing Pro Phe Asp Wing 6370 6375 6380 Asp Ala Leu Glu Glu Ser Leu Val Arg Gly Cys Asp Ser Val Leu Trp 6385 6390 6395 6400 Thr Val Gln Ala Val Ala Gly Ala Gly Phe Arg Asp Pro Pro Arg Leu 6405 6410 6415 Trp Leu Val Thr Arg Gly Wing Gln Wing He Gly Wing Gly Asp Val Ser 6420 6425 6430 Val Wing Gln Wing Pro Leu Leu Gly Leu Gly Arg Val He Wing Leu Glu 6435 6440 6445 His Wing Glu Leu Arg Cys Wing Arg He Asp Leu Asp Pro Arg Wing Arg 6450 6455 6460 Asp Gly Glu Val Asp Glu Leu Leu Wing Glu Leu Leu Wing Asp Asp Wing 6465 6470 6475 6480 Glu Glu Glu Val Wing Phe Arg Gly Gly Glu Arg Arg Val Wing Arg Leu 6485 6490 6495 Val Arg Arg Leu Pro Glu Thr Asp Cys Arg Glu Lys He Glu Pro Wing 6500 6505 6510 Glu Gly Arg Pro Phe Arg Leu Glu He Asp Gly Ser Gly Val Leu Asp 6515 6520 6525 Asp Leu Val Leu Arg Wing Thr Glu Arg Arg Pro Pro Gly Pro Gly Glu 6530 6535 6540 Val Glu He Ala Val Glu Ala Ala Gly Leu Asn Phe Leu Asp Val Met 6545 6550 6555 6560 Arg Wing Met Gly He Tyr Pro Gly Pro Gly Asp Gly Pro Val Wing Leu 6565 6570 6575 Gly Wing Glu Cys Ser Gly Arg He Val Wing Met Gly Glu Glu Val Glu 6580 6585 6590 Ser Leu Arg He Gly Gln Asp Val Val Wing Val Wing Pro Phe Ser Phe 6595 6600 6605 Gly Thr His Val Thr He Asp Wing Arg Met Leu Wing Pro Arg Pro Wing 6610 6615 6620 Wing Leu Thr Wing Wing Gln Wing Wing Wing Leu Pro Val Wing Phe Met Thr 6625 6630 6635 6640 Wing Trp Tyr Gly Leu Val His Leu Gly Arg Leu Arg Wing Gly Glu Arg 6645 6650 6655 Val Leu He His Be Wing Thr Gly Gly Thr Gly Leu Wing Ala Gl Gln 6660 6665 6670 He Wing Arg His Leu Gly Wing Glu He Phe Wing Thr Wing Gly Thr Pro 6675 6680 6685 Glu Lys Arg Wing Trp Leu Arg Glu Gln Gly He Wing His Val Met Asp 6690 6695 6700 Ser Arg Ser Leu Asp Phe Wing Glu Gln Val Leu Wing Wing Thr Lys Gly 6705 6710 6715 6720 Glu Gly Val Asp Val Val Leu Asn Ser Leu Ser Gly Ala Wing Asp 6725 6730 6735 Wing Ser Leu Ser Thr Leu Val Pro Asp Gly Arg Phe He Glu Leu Gly 6740 6745 6750 Lys 
Thr Asp He Tyr Wing Asp Arg Ser Leu Gly Leu Wing His Phe Arg 6755 6760 6765 Lys Ser Leu Ser Tyr Ser Wing Val Asp Leu Wing Gly Leu Ala Val Arg 6770 6775 6780 Arg Pro Glu Arg Val Ala Ala Leu Leu Ala Glu Val Val Asp Leu Leu 6785 6790 6795 6800 Wing Arg Gly Wing Leu Gln Pro Leu Pro Val Glu He Phe Pro Leu Ser 6805 6810 6815 Arg Wing Wing Asp Wing Phe Arg Lys Met Wing Gln Wing Gln Hxs Leu Gly 6820 6825 6830 Lys Leu Val Leu Wing Leu Glu Asp Pro Asp Val Arg He Arg Val Pro 6835 6840 6845 Gly Glu Ser Gly Val Wing He Arg Wing Asp Gly Wing Tyr Leu Val Thr 6850 6855 6860 Gly Gly Leu Gly Leu Le Gly Leu Ser Val Wing Gly Trp Leu Wing Glu 6865 6870 6875 6880 Gln Gly Wing Gly His Leu Val Leu Val Gly Ar Ser Gly Wing Val Ser 6885 6890 6895 Wing Glu Gln Gln Thr Wing Wing Wing Wing Leu Glu Wing His Gly Wing Arg 6900 6905 6910 Val Thr Val Wing Arg Wing Asp Val Wing Asp Arg Wing Gln Met Glu Arg 6915 6920 6925 He Leu Arg Glu Val Thr Wing Ser Gly Met Pro Leu Arg Gly Val Val 6930 6935 6940 His Wing Wing Gly He Leu Asp Asp Gly Leu Leu Met Gln Gln Thr Pro 6945 6950 6955 6960 Ala Arg Phe Arg Ala Val Met Ala Pro Lys Val Arg Gly Ala Leu His 6965 6970 6975 Leu His Ala Leu Thr Arg Glu Ala Pro Leu Ser Phe Phe Val Leu Tyr 6980 6985 6990 Wing Ser Gly Wing Gly Leu Leu Gly Ser Pro Gly Gln Gly Asn Tyr Wing 6995 7000 7005 Ala Ala Asn Thr Phe Leu Asp Ala Leu Ala His His Arg Arg Ala Gln 7010 7015 7020 Gly Leu Pro Ala Leu Ser He Asp Trp Gly Leu Phe Wing Asp Val Gly 7025 7030 7035 7040 Leu Wing Wing Gly Gln Gln Asn Arg Gly Wing Arg Leu Val Thr Arg Gly 7045 7050 7055 Thr Arg Ser Leu Thr Pro Asp Glu Gly Leu Trp Wing Leu Glu Arg Leu 7060 7065 7070 Leu Asp Gly Asp Arg Thr Gln Wing Gly Val Met Pro Phe Asp Val Arg 7075 7080 7085 Gln Trp Val Glu Phe Tyr Pro Ala Ala Ala Ser Ser Arg Arg Leu Ser 7090 7095 7100 Arg Leu Met Thr Ala Arg Arg Val Ala Ser Gly Arg Leu Ala Gly Asp 7105 7110 7115 7120 Arg Asp Leu Leu Glu Arg Leu Ala Thr Ala Glu Ala Gly Ala Arg Ala 7125 7130 7135 Gly Met Leu Gln Glu Val Val Arg Ala Gln Val Ser 
Gln Val Leu Arg 7140 7145 7150 Leu Ser Glu Gly Lys Leu Asp Val Asp Ala Pro Leu Thr Ser Leu Gly 7155 7160 7165 Met Asp Ser Leu Met Gly Leu Glu Leu Arg Asn Arg He Glu Wing Val 7170 7175 7180 Leu Gly He Thr Met Pro Wing Thr Leu Leu Trp Thr Tyr Pro Thr Val 7185 7190 7195 7200 Ala Ala Leu Ser Ala His Leu Ala Ser His Val Val Ser Thr Gly Asp 7205 7210 7215 Gly Glu Be Wing Arg Pro Pro Asp Thr Gly Ser Val Wing Pro Thr Thr 7220 7225 7230 His Glu Val Wing Ser Leu Asp Glu Asp Gly Leu Phe Wing Leu He Asp 7235 7240 7245 Glu Ser Leu Wing Arg Wing Gly Lys Arg 7250 7255 < 210 > 6 < 211 > 3798 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 6 Val Thr Asp Arg Glu Gly Gln Leu Leu Glu Arg Leu Arg Glu Val Thr 1 5 10 15 Leu Ala Leu Arg Lys Thr Leu Asn Glu Arg Asp Thr Leu Glu Leu Glu 20 25 30 Lys Thr Glu Pro He Wing He Val Gly He Gly Cys Arg Phe Pro Gly 35 40 45 Gly Wing Gly Thr Pro Glu Wing Phe Trp Glu Leu Leu Asp Asp Gly Arg 50 55 60 Asp Wing He Arg Pro Leu Glu Glu Arg Trp Wing Leu Val Gly Val Asp 65 70 75 80 Pro Gly Asp Asp Val Pro Arg Trp Wing Gly Leu Leu Thr Glu Wing He 85 90 95 Asp Gly Phe Asp Wing Wing Phe Phe Gly He Wing Pro Arg Glu Wing Arg 100 105 110 Being Leu Asp Pro Gln His Arg Leu Leu Leu Glu Val Wing Trp Glu Gly 115 120 125 Phe Glu Asp Wing Gly Pro Pro Arg Ser Leu Val Gly Ser Arg Thr 130 135 140 Gly Val Phe Val Gly Val Cys Ala Thr Glu Tyr Leu Hxs Ala Ala Val 145 150 155 160 Wing Hxs Gln Pro Arg Glu Glu Arg Asp Wing Tyr Ser Thr Thr Gly Asn 165 170 175 Met Leu Ser Wing Wing Wing Gly Arg Leu Ser Tyr Thr Leu Gly Leu Gln 180 185 190 Gly Pro Cys Leu Thr Val Asp Thr Wing Cys Ser Ser Leu Val Wing 195 200 205 He His Leu Wing Cys Arg Ser Leu Arg Wing Arg Glu Be Asp Leu Wing 210 215 220 Leu Wing Gly Gly Val Asn Met Leu Leu Ser Pro Asp Thr Met Arg Wing 225 230 235 240 Leu Ala Arg Thr Gln Ala Leu Ser Pro Asn Gly Arg Cys Gln Thr Phe 245 250 255 Asp Wing Being Wing Asn Gly Phe Val Arg Gly Glu Gly Cly Gly Leu He 260 265 270 Val Leu Lys Arg Leu Ser Asp Wing Arg Arg Asp Gly Asp Arg He Trp 
275 280 285 Wing Leu He Arg Gly Wing He Asn Gln Asp Gly Arg Ser Thr Gly 290 295 300 Leu Thr Ala Pro Asn Val Leu Ala Gln Gly Ala Leu Leu Arg Glu Ala 305 310 315 320 Leu Arg Asn Wing Gly Val Glu Wing Glu Wing He Gly Tyr He Glu Thr 325 330 335 His Gly Wing Wing Thr Ser Leu Gly Asp Pro He Glu He Glu Wing Leu 340 345 350 Arg Wing Val Val Gly Pro Wing Arg Wing Asp Gly Wing Arg Cys Val Leu 355 360 365 Gly Wing Val Lys Thr Asn Leu Gly His Leu Glu Gly Wing Wing Gly Val 370 375 380 Wing Gly Leu He Lys Wing Thr Leu Ser Leu His His Glu Arg He Pro 385 390 395 400 Arg Asn Leu Asn Phe Arg Thr Leu Asn Pro Arg He Arg He Glu Gly 405 410 415 Thr Ala Leu Ala Leu Ala Thr Glu Pro Val Pro Trp Pro Arg Thr Gly 420 425 430 Arg Thr Arg Phe Wing Gly Val Ser Ser Phe Gly Met Ser Gly Thr Asn 435 440 445 Wing His Val Val Leu Glu Wing Ala Pro Wing Val Glu Pro Glu Wing Wing 450 455 460 Wing Pro Glu Arg Wing Wing Glu Leu Phe Val Leu Ser Ala Lys Ser Ala 465 470 475 480 Ala Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Asp His Leu Glu Lys 485 490 495 Hxs Val Glu Leu Gly Leu Gly Asp Val Wing Phe Ser Leu Wing Thr Thr 500 505 510 Arg Wing Wing Met Glu Hxs Arg Leu Wing Val Wing Wing Being Ser Arg Glu 515 520 525 Wing Leu Arg Gly Wing Leu Wing Wing Wing Gln Gly Hxs Thr Pro Pro 530 535 540 Gly Wing Val Arg Gly Arg Wing Ser Gly Gly Ser Wing Pro Lys Val Val 545 550 555 560 Phe Val Phe Pro Gly Gln Gly Ser Gln Trp Val Gly Met Gly Arg Lys 565 570 575 Leu Met Wing Glu Glu Pro Val Phe Arg Wing Wing Leu Glu Gly Cly Asp 580 585 590 Arg Wing He Glu Wing Glu Wing Gly Trp Ser Leu Leu Gly Glu Leu Ser 595 600 605 Wing Asp Glu Wing Wing Gln Leu Gly Arg He Asp Val Val Gln Pro 610 615 620 Val Leu Phe Wing Met Glu Val Wing Leu Ser Wing Leu Trp Arg Ser Trp 625 630 635 640 Gly Val Glu Pro Glu Val Wing Val Gly Hxs Ser Met Gly Val Val Wing 645 650 655 Ala Ala Hxs Val Ala Gly Ala Leu Ser Leu Glu Asp Ala Val Ala He 660 665 670 He Cys Arg Arg Ser Arg Leu Leu Arg Arg He Ser Gly Gln Glu 675 680 685 Met Ala Leu Val Glu Leu Ser Leu Glu Glu 
Ala Glu Ala Ala Leu Arg 690 695 700 Gly Hxs Glu Gly Arg Leu Ser Val Ala Val Ser Asn Ser Pro Arg Ser 705 710 715 720 Thr Val Leu Ala Gly Glu Pro Ala Ala Leu Ser Glu Val Leu Ala Ala 725 730 735 Leu Thr Wing Lys Gly Val Phe Trp Arg Gln Val Lys Val Asp Val Wing 740 745 750 Ser Hxs Ser Pro Gln Val Asp Pro Leu Arg Glu Glu Leu He Wing Wing 755 760 765 Leu Gly Wing He Arg Pro Arg Wing Wing Wing Val Pro Met Arg Ser Thr 770 775 780 Val Thr Gly Gly Val He Wing Gly Pro Glu Leu Gly Wing Ser Tyr Trp 785 790 795 800 Wing Asp Asn Leu Arg Gln Pro Val Arg Phe Wing Wing Wing Wing Gln Wing 805 810 815 Leu Leu Glu Gly Gly Pro Wing Ala Le Phe He Glu Met Ser Pro His Pro 820 825 830 He Leu Val Pro Pro Leu Asp Glu He Gln Thr Ala Wing Glu Gln Gly 835 840 845 Gly Wing Wing Val Gly Ser Leu Arg Arg Gly Gln Asp Glu Arg Wing Thr 850 855 860 Leu Leu Glu Wing Leu Gly Thr Leu Trp Wing Ser Gly Tyr Pro Val Ser 865 870 875 880 Trp Wing Arg Leu Phe Pro Wing Gly Gly Arg Arg Val Pro Leu Pro Thr 885 890 895 Tyr Pro Trp Gln His Glu Arg Cys Trp He Glu Val Glu Pro Asp Wing 900 905 910 Arg Arg Leu Wing Wing Asp Pro Thr Lys Asp Trp Phe Tyr Arg Thr 915 920 925 Asp Trp Pro Glu Val Pro Arg Ala Wing Pro Lys Ser Glu Thr Ala His 930 935 940 Gly Ser Trp Leu Leu Leu Wing Asp Arg Gly Gly Val Gly Glu Ala Val 945 950 955 960 Ala Ala Ala Leu Ser Thr Arg Gly Leu Ser Cys Thr Val Leu His Ala 965 970 975 Be Wing Asp Wing Ser Thr Val Wing Glu Gln Val Ser Glu Wing Wing Ser 980 985 990 Arg Arg Asn Asp Trp Gln Gly Val Leu Tyr Leu Trp Gly Leu Asp Wing 995 1000 1005 Val Val Asp Wing Gly Wing Ser Wing Asp Glu Val Ser Glu Ala Thr Arg 1010 1015 1020 Arg Ala Thr Ala Pro Val Leu Gly Leu Val Arg Phe Leu Ser Ala Ala 1025 1030 1035 1040 Pro His Pro Pro Arg Phe Trp Val Val Thr Arg Gly Ala Cys Thr Val 1045 1050 1055 Gly Gly Glu Pro Glu Wing Be Leu Cys Gln Ala Wing Leu Trp Gly Leu 1060 1065 1070 Wing Arg Wing Wing Ala Leu Glu His Pro Wing Wing Trp Gly Gly Leu Val 1075 1080 1085 Asp Leu Asp Pro Gln Lys Ser Pro Thr Glu He Glu Pro Leu Val Ala . 
1090 1095 1100 Glu Leu Leu Ser Pro Asp Wing Glu Asp Gln Leu Wing Phe Arg Ser Gly 1105 1110 1115 1120 Arg Arg His Ala Ala Arg Leu Val Ala Ala Pro Pro Glu Gly Asp Val 1125 1130 1135 Wing Pro Be Ser Leu Ser Wing Glu Gly Ser Tyr Leu Val Thr Gly Gly 1140 1145 1150 Leu Gly Gly Leu Leu Leu Val Wing Arg Trp Leu Val Glu Arg Gly 1155 1160 1165 Wing Arg His Leu Val Leu Thr Ser Arg His Gly Leu Pro Glu Arg Gln 1170 1175 1180 Wing Ser Gly Gly Glu Gln Pro Pro Glu Wing Arg Wing Arg He Wing Wing 1185 1190 1195 1200 Val Glu Gly Leu Glu Wing Gln Gly Wing Arg Val Thr Val Wing Wing Val 1205 1210 1215 Asp Val Ala Glu Ala Asp Pro Met Thr Ala Ala Leu Leu Ala Ala He Glu 1220 1225 1230 Pro Pro Leu Arg Gly Val Val His Ala Ala Gly Val Phe Pro Val Arg 1235 1240 1245 His Leu Ala Glu Thr Asp Glu Ala Leu Leu Glu Ser Val Leu Arg Pro 1250 1255 1260 Lys Val Wing Gly Ser Trp Leu Leu His Arg Leu Leu Arg Asp Arg Pro 1265 1270 1275 1280 Leu Asp Leu Phe Val Leu Phe Ser Ser Gly Ala Wing Val Trp Gly Gly 1285 1290 1295 Lys Gly Gln Gly Ala Tyr Ala Ala Ala Asn Ala Phe Leu Asp Gly Leu 1300 1305 1310 Wing Hxs Hxs Arg Arg Wing Hxs Ser Leu Pro Wing Leu Ser Leu Wing Trp 1315 1320 1325 Gly Leu Trp Wing Glu Gly Gly Met Val Asp Wing Lys Wing Hxs Wing Arg 1330 1335 1340 Leu Ser Asp He Gly Val Leu Pro Met Ala Thr Gly Pro Ala Leu Ser 1345 1350 1355 1360 Wing Leu Glu Arg Leu Val Asn Thr Ser Wing Val Gln Arg Ser Val Thr 1365 1370 1375 Arg Met Asp Trp Wing Arg Phe Wing Pro Val Tyr Wing Wing Arg Gly Arg 1380 1385 1390 Arg Asn Leu Leu Be Wing Leu Val Wing Glu Asp Glu Arg Wing Wing Ser 1395 1400 1405 Pro Pro Pro Thr Wing Asn Arg He Trp Arg Gly Leu Ser Val Wing 1410 1415 1420 Glu Ser Arg Ser Wing Leu Tyr Glu Leu Val Arg Gly He Val Wing Arg 1425 1430 1435 1440 Val Leu Gly Phe Ser Asp Pro Gly Ala Leu Asp Val Gly Arg Gly Phe 1445 1450 1455 Wing Glu Gln Gly Leu Asp Being Leu Met Wing Leu Glu He Arg Asn Arg 1460 1465 1470 Leu Gln Arg Glu Leu Gly Glu Arg Leu Ser Wing Thr Leu Wing Phe Asp 1475 1480 1485 His Pro Thr Val Glu Arg Leu 
Val Ala His Leu Leu Thr Asp Val Leu 1490 1495 1500 Lys Leu Glu Asp Arg Ser Asp Thr Arg His He Arg Ser Val Ala Ala 1505 1510 1515 1520 Asp Asp Asp He Wing He Val Gly Wing Wing Cys Arg Phe Pro Gly Gly 1525 1530 1535 Asp Glu Gly Leu Glu Thr Tyr Trp Arg His Leu Wing Glu Gly Met Val 1540 1545 1550 Val Ser Thr Glu Val Pro Wing Asp Arg Trp Arg Wing Wing Asp Trp Tyr 1555 1560 1565 Asp Pro Asp Pro Glu Val Pro Gly Arg Thr Tyr Val Ala Lys Gly Ala 1570 1575 1580 Phe Leu Arg Asp Val Arg Ser Leu Asp Ala Ala Phe Phe Ala He Ser 1585 1590 1595 1600 Pro Arg Glu Ala Met Ser Leu Asp Pro Gln Gln Arg Leu Leu Lelu Glu 1605 1610 1615 Val Ser Trp Glu Wing He Glu Arg Wing Gly Gln Asp Pro Met Wing Leu 1620 1625 1630 Arg Glu Be Wing Thr Gly Val Phe Val Gly Met He Gly Ser Glu His 1635 1640 1645 Wing Glu Arg Val Gln Gly Leu Asp Asp Asp Ala Wing Leu Leu Tyr Gly 1650 1655 1660 Thr Thr Gly Asn Leu Leu Ser Val Wing Wing Gly Arg Leu Ser Phe Phe 1665 1670 1675 1680 Leu Gly Leu His Gly Pro Thr Met Thr Val Asp Thr Ala Cys Ser Ser 1685 1690 1695 Be Leu Val Ala Leu His Leu Ala Cys Gln Ser Leu Arg Leu Gly Glu 1700 1705 1710 Cys Asp Gln Ala Leu Ala Gly Gly Be Ser Val Leu Leu Ser Pro Arg 1715 1720 1725 Be Phe Val Ala Ala Be Arg Met Arg Leu Leu Ser Pro Asp Gly Arg 1730 1735 1740 Cys Lys Thr Phe Ser Wing Wing Wing Asp Gly Phe Wing Arg Wing Glu Gly 1745 1750 1755 1760 Cys Ala Val Val Val Leu Lys Arg Leu Arg Asp Ala Gln Arg Asp Arg 1765 1770 1775 Asp Pro He Leu Ala Val Val Arg Ser Thr Ala He Asn His Asp Gly 1780 1785 1790 Pro Ser Ser Gly Leu Thr Val Pro Ser Gly Pro Ala Gln Gln Ala Leu 1795 1800 1805 Leu Arg Gln Ala Leu Ala Gln Ala Gly Val Ala Pro Ala Glu Val Asp 1810 1815 1820 Phe Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly Asp Pro He Glu 1825 1830 1835 1840 Val Gln Wing Ala Leu Gly Wing Val Tyr Gly Arg Gly Arg Pro Wing Glu Arg 1845 1850 1855 Pro Leu Trp Leu Gly Wing Val Lys Wing Asn Leu Gly His Leu Glu Wing 1860 1865 1870 Wing Wing Gly Leu Wing Gly Val Leu Lys Val Leu Leu Ala Leu Glu His 1875 1880 1885 Glu Gln 
Ile Pro Ala Gln Pro Glu Leu Asp Glu Leu Asn Pro His Ile 1890 1895 1900 Pro Trp Ala Glu Leu Pro Val Ala Val Val Arg Arg Ala Val Pro Trp 1905 1910 1915 1920 Pro Arg Gly Ala Arg Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Leu 1925 1930 1935 Ser Gly Thr Asn Ala His Val Val Leu Glu Ala Ala Pro Ala Val Glu 1940 1945 1950 Pro Val Ala Ala Pro Pro Glu Arg Ala Ala Glu Leu Phe Val Leu Ser 1955 1960 1965 Ala Lys Ser Ala Ala Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Asp 1970 1975 1980 His Leu Glu Lys His Val Glu Leu Gly Leu Gly Asp Val Ala Phe Ser 1985 1990 1995 2000 Leu Ala Thr Thr Arg Ser Ala Met Glu His Arg Leu Ala Val Ala Ala 2005 2010 2015 Ser Ser Arg Glu Ala Leu Arg Gly Ala Leu Ser Ala Ala Ala Gln Gly 2020 2025 2030 His Thr Pro Pro Gly Ala Val Arg Gly Arg Ala Ser Gly Gly Ser Ala 2035 2040 2045 Pro Lys Val Val Phe Val Phe Pro Gly Gln Gly Ser Gln Trp Val Gly 2050 2055 2060 Met Gly Arg Lys Leu Met Ala Glu Glu Pro Val Phe Arg Ala Ala Leu 2065 2070 2075 2080 Glu Gly Cys Asp Arg Ala Ile Glu Ala Glu Ala Gly Trp Ser Leu Leu 2085 2090 2095 Gly Glu Leu Ser Ala Asp Glu Ala Ala Gln Leu Gly Arg Ile Asp 2100 2105 2110 Val Val Gln Pro Val Leu Phe Ala Met Glu Val Ala Leu Ser Ala Leu 2115 2120 2125 Trp Arg Ser Trp Gly Val Glu Pro Glu Val Val Ala Gly His Ser 2130 2135 2140 Gly Glu Val Ala Ala Ala His Val Ala Ala Gly Ala Leu Ser Leu Glu Asp 2145 2150 2155 2160 Ala Val Ala Ile Ile Cys Arg Arg Ser Arg Leu Leu Arg Arg Ile Ser 2165 2170 2175 Gly Gln Gly Met Glu Ala Leu Glu Val Leu Ser Leu Glu Glu Glu Ala 2180 2185 2190 Ala Ala Leu Arg Gly His Glu Gly Arg Leu Ser Val Ala Val Ser Asn 2195 2200 2205 Ser Pro Arg Ser Thr Val Leu Ala Gly Glu Pro Ala Ala Leu Ser Glu 2210 2215 2220 Val Leu Ala Ala Leu Thr Ala Lys Gly Val Phe Trp Arg Gln Val Lys 2225 2230 2235 2240 Val Asp Val Ala Ser His Ser Pro Gln Val Asp Pro Leu Arg Glu Glu 2245 2250 2255 Leu Ile Ala Ala Leu Gly Ala Ile Arg Pro Arg Ala Ala Ala Ala Pro 2260 2265 2270 Met Arg Ser Thr Val Thr Gly Gly Val 
Ile Ala Gly Pro Glu Leu Gly 2275 2280 2285 Ala Ser Tyr Trp Ala Asp Asn Leu Arg Gln Pro Val Arg Phe Ala Ala 2290 2295 2300 Ala Ala Gln Ala Leu Leu Glu Gly Gly Pro Ala Leu Phe Ile Glu Met 2305 2310 2315 2320 Ser Pro Pro Pro Ile Leu Val Pro Pro Leu Asp Glu Ile Gln Thr Ala 2325 2330 2335 Ala Glu Gln Gly Gly Ala Ala Val Gly Ser Leu Arg Gly Gln Asp 2340 2345 2350 Glu Arg Ala Thr Leu Leu Glu Ala Leu Gly Thr Leu Trp Ala Ser Gly 2355 2360 2365 Tyr Pro Val Ser Trp Ala Arg Leu Phe Pro Ala Gly Gly Arg Arg Val 2370 2375 2380 Pro Leu Pro Thr Tyr Pro Trp Gln His Glu Arg Tyr Trp Ile Glu Asp 2385 2390 2395 2400 Ser Val His Gly Ser Lys Pro Ser Leu Arg Leu Arg Gln Leu Arg Asn 2405 2410 2415 Gly Ala Thr Asp His Pro Leu Leu Gly Ala Pro Leu Leu Val Ser Ala 2420 2425 2430 Arg Pro Gly Ala His Leu Trp Glu Gln Ala Leu Ser Asp Glu Arg Leu 2435 2440 2445 Ser Tyr Leu Ser Glu His Arg Val His Gly Glu Ala Val Leu Pro Ser 2450 2455 2460 Ala Ala Tyr Val Glu Met Ala Ala Ala Ala Gly Val Asp Leu Tyr Gly 2465 2470 2475 2480 Thr Ala Thr Leu Val Leu Glu Gln Leu Ala Leu Glu Arg Ala Leu Ala 2485 2490 2495 Val Pro Ser Glu Gly Gly Arg Ile Val Gln Val Ala Leu Ser Glu Glu 2500 2505 2510 Gly Pro Gly Arg Ala Ser Phe Gln Val Ser Ser Arg Glu Glu Ala Gly 2515 2520 2525 Arg Ser Trp Val Arg His Ala Thr Gly His Val Cys Ser Gly Gln Ser 2530 2535 2540 Ser Ala Val Gly Ala Leu Lys Glu Ala Pro Trp Glu Ile Gln Arg Arg 2545 2550 2555 2560 Cys Pro Ser Val Leu Ser Ser Glu Ala Leu Tyr Pro Leu Leu Asn Glu 2565 2570 2575 His Ala Leu Asp Tyr Gly Pro Cys Phe Gln Gly Val Glu Gln Val Trp 2580 2585 2590 Leu Gly Thr Gly Val Leu Gly Arg Val Arg Leu Pro Gly Asp Met 2595 2600 2605 Ala Ser Ser Gly Ala Tyr Arg Ile His Pro Ala Leu Leu Asp Ala 2610 2615 2620 Cys Phe Gln Val Leu Thr Ala Leu Leu Thr Thr Pro Glu Ser Ile Glu 2625 2630 2635 2640 Ile Arg Arg Arg Leu Thr Asp Leu His Glu Pro Asp Leu Pro Arg Ser 2645 2650 2655 Arg Ala Pro Val Asn Gln Ala Val Ser Asp Thr Trp Leu Trp Asp Ala 2660 2665 2670 Ala Leu Asp Gly Arg Arg Gln 
Ser Wing Ser Val Pro Val Asp Leu 2675 2680 2685 Val Leu Gly Ser Phe His Wing Lys Trp Glu Val Met Glu Arg Leu Wing 2690 2695 2700 Gln Ala Tyr He He Gly Thr Leu Arg He Trp Asn Val Phe Cys Ala 2705 2710 2715 2720 Wing Gly Glu Arg His Thr He Asp Glu Leu Leu Val Arg Leu Gln He 2725 2730 2735 Ser Val Val Tyr Arg Lys Val He Lys Arg Trp Met Glu His Leu Val 2740 2745 2750 Wing He Gly He Leu Val Gly Asp Gly Glu His Phe Val Ser Ser Gln 2755 2760 2765 Pro Leu Pro Glu Pro Asp Leu Ala Wing Val Leu Glu Glu Wing Gly Arg 2770 2775 2780 Val Phe Wing Asp Leu Pro Val Leu Phe Glu Trp Cys Lys Phe Wing Gly 2785 2790 2795 2800 Glu Arg Leu Wing Asp Val Leu Thr Gly Lys Thr Leu Ala Leu Glu He 2805 2810 2815 Leu Phe Pro Gly Gly Be Phe Asp Met Wing Glu Arg He Tyr Arg Asp 2820 2825 2830 Ser Pro He Wing Arg Tyr Ser Asn Gly He Val Arg Gly Val Val Glu 2835 2840 2845 Wing Wing Arg Val Val Wing Pro Ser Gly Met Phe Ser He Leu Glu 2850 2855 2860 He Gly Wing Gly Thr Gly Wing Thr Thr Wing Wing Val Leu Pro Val Leu 2865 2870 2875 2880 Leu Pro Asp Arg Thr Glu Tyr His Phe Thr Asp Val Ser Pro Leu Phe 2885 2890 2895 Leu Ala Arg Ala Glu Gln Arg Phe Arg Asp Tyr Pro Phe Leu Lys Tyr 2900 2905 2910 Gly He Leu Asp Val Asp Gin Glu Pro Wing Gly Gln Gly Tyr Wing His 2915 2920 2925 Gln Arg Phe Asp Val He Val Wing Wing Asn Val He His Wing Thr Arg 2930 2935 2940 Asp He Arg Wing Thr Wing Lys Arg Leu Leu Ser Leu Leu Wing Pro Gly 2945 2950 2955 2960 Gly Leu Leu Val Leu Val Glu Gly Thr Gly His Pro He Trp Phe Asp 2965 2970 2975 He Thr Thr Gly Leu He Glu Gly Trp Gln Lys Tyr Glu Asp Asp Leu 2980 2985 2990 Arg He Asp His Pro Leu Leu Pro Wing Arg Thr Trp Cys Asp Val Leu 2995 3000 3005 Arg Arg Val Gly Phe Wing Asp Wing Val Ser Leu Pro Gly Asp Gly Ser 3010 3015 3020 Pro Wing Gly He Leu Gly Gln His Val He Leu Wing Arg Wing Pro Gly 3025 3030 3035 3040 He Wing Wing Gly Wing Wing Cys Asp Being Ser Gly Glu Wing Wing Thr Glu Ser 3045 3050 3055 Pro Ala Ala Arg Ala Ala Arg Gln Glu Trp Ala Asp Gly Ser Ala Asp 3060 3065 3070 Val Val His Arg 
Met Ala Leu Glu Arg Met Tyr Phe His Arg Arg Pro 3075 3080 3085 Gly Arg Gln Val Trp Val His Gly Arg Leu Arg Thr Gly Gly Gly Ala 3090 3095 3100 Phe Thr Lys Ala Leu Ala Gly Asp Leu Leu Leu Phe Glu Asp Thr Gly 3105 3110 3115 3120 Gln Val Val Ala Glu Val Gln Gly Leu Arg Leu Pro Gln Leu Glu Ala 3125 3130 3135 Ser Ala Phe Ala Pro Arg Asp Pro Arg Glu Glu Trp Leu Tyr Ala Leu 3140 3145 3150 Glu Trp Gln Arg Lys Asp Pro Pro Glu Ala Pro Ala Ala Ala Ser 3155 3160 3165 Ser Ser Ala Gly Ala Trp Leu Val Leu Met Asp Gln Gly Gly Thr 3170 3175 3180 Gly Ala Ala Leu Val Ser Leu Leu Glu Gly Arg Gly Glu Ala Cys Val 3185 3190 3195 3200 Arg Val Ile Ala Gly Thr Ala Tyr Ala Cys Leu Ala Pro Gly Leu Tyr 3205 3210 3215 Gln Val Asp Pro Ala Gln Pro Asp Gly Phe His _t.r Leu Leu Arg Asp 3220 3225 3230 Ala Phe Gly Glu Asp Arg Ile Cys Arg Ala Val Val His Met Trp Ser 3235 3240 3245 Leu Asp Ala Thr Ala Ala Gly Glu Arg Ala Thr Ala Glu Ser Leu Gln 3250 3255 3260 Ala Asp Gln Leu Leu Gly Ser Leu Ser Ala Leu Ser Leu Val Gln Ala 3265 3270 3275 3280 Leu Val Arg Arg Arg Trp Arg Asn Met Pro Arg Leu Trp Leu Leu Thr 3285 3290 3295 Arg Ala Val His Ala Val Gly Ala Glu Asp Ala Ala Ala Ser Val Ala 3300 3305 3310 Gln Ala Pro Val Trp Gly Leu Gly Arg Thr Leu Ala Leu Glu His Pro 3315 3320 3325 Glu Leu Arg Cys Thr Leu Val Asp Val Asn Pro Pro Ala Pro Glu 3330 3335 3340 Asp Ala Ala Ala Leu Ala Val Glu Leu Gly Ala Ser Asp Arg Glu Asp 3345 3350 3355 3360 Gln Val Ala Leu Arg Ser Asp Gly Arg Tyr Val Ala Arg Leu Val Arg 3365 3370 3375 Ser Ser Phe Ser Gly Lys Pro Ala Thr Asp Cys Gly Ile Arg Ala Asp 3380 3385 3390 Gly Ser Tyr Val Ile Thr Asp Gly Met Gly Arg Val Gly Leu Ser Val 3395 3400 3405 Ala Gln Trp Met Val Met Gln Gly Ala Arg His Val Val Leu Val Asp 3410 3415 3420 Arg Gly Gly Ala Ser Glu Ala Ser Arg Asp Ala Leu Arg Ser Met Ala 3425 3430 3435 3440 Glu Ala Gly Ala Glu Val Gln Ile Val Glu Ala Asp Val Ala Arg Arg 3445 3450 3455 Asp Asp Val Ala Arg Leu Leu Ser Lys Ile Glu Pro Ser Met Pro 
Pro 3460 3465 3470 Leu Arg Gly He Val Tyr Val Asp Gly Thr Phe Gln Gly Asp Ser Ser 3475 3480 3485 Met Leu Glu Leu Asp Wing Arg Arg Phe Lys Glu Trp Met Tyr Pro Lys 3490 3495 3500 Val Leu Gly Wing Trp Asn Leu His Wing Leu Thr Arg Asp Arg Ser Leu 3505 3510 3515 3520 Asp Phe Phe Val Leu Tyr Ser Ser Gly Thr Ser Leu Leu Gly Leu Pro 3525 3530 3535 Gly Gln Gly Ser Arg Wing Wing Gly Asp Wing Phe Leu Asp Wing Wing 3540 3545 3550 His His Arg Cys Lys Val Gly Leu Thr Wing Met Ser He Asn Trp Gly 3555 3560 3565 Leu Leu Ser Glu Wing Ser Ser Pro Wing Thr Pro Asn Asp Gly Gly Ala 3570 3575 3580 Arg Leu Glu Tyr Arg Gly Met Glu Gly Leu Thr Leu Glu Gln Gly Wing 3585 3590 3595 3600 Ala Ala Leu Gly Arg Leu Leu Ala Arg Pro Arg Ala Gln Val Gly Val 3605 3610 3615 Met Arg Leu Asn Leu Arg Gln Trp Leu Glu Phe Tyr Pro Asn Wing Wing 3620 3625 3630 Arg Leu Wing Leu Trp Wing Glu Leu Leu Lys Glu Arg Asp Arg Wing Asp 3635 3640 3645 Arg Gly Wing Being Asn Wing Being Asn Leu Arg Glu Wing Leu Gln Ser Wing 3650-3655 3660 Arg Pro Glu Asp Arg Gln Leu He Leu Glu Lys His Leu Ser Glu Leu 3665 3670 3675 3680 Leu Gly Arg Gly Leu Arg Leu Pro Pro Glu Arg He Glu Arg His Val 3685 3690 3695 Pro Phe Be Asn Leu Gly Met Asp Ser Leu He Gly Leu Glu Leu Arg 3700 3705 3710 Asn Arg He Glu Ala Ala Leu Gly He Thr Val Pro Ala Thr Leu Leu 3715 3720 3725 Trp Thr Tyr Pro Asn Val Ala Ala Leu Ser Gly Ser Leu Leu Asp He 3730 3735 3740 Leu Phe Pro Asn Wing Gly Wing Thr His Wing Pro Wing Thr Glu Arg Glu 3745 3750 3755 3760 Lys Ser Phe Glu Asn Asp Wing Wing Asp Leu Glu Wing Leu Arg Gly Met 3765 3770 3775 Thr Asp Glu Gln Lys Asp Wing Leu Leu Wing Glu Lys Leu Wing Gln Leu 3780 3785 3790 Wing Gln He Val Gly Glu 3795 < 210 > 7 < 211 > 2439 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 7 Met Wing Thr Thr Asn Wing Gly Lys Leu Glu His Wing Leu Leu Leu Met 1 5 10 15 Asp Lys Leu Wing Lys Lys Asn Wing Being Leu Glu Gln Glu Arg Thr Glu 20 25 30 Pro He Wing He Val Gly He Gly Cys Arg Phe Pro Gly Wing Wing Asp 35 40 45 Thr Pro Glu Wing Phe Trp Glu 
Leu Leu Asp Ser Gly Arg Asp Ala Val 50 55 60 Gln Pro Leu Asp Arg Arg Trp Ala Leu Val Gly Val His Pro Ser Glu 65 70 75 80 Glu Val Pro Arg Trp Wing Gly Leu Leu Thr Glu Wing Val Asp Gly Phe 85 90 95 Asp Wing Wing Phe Phe Gly Thr Ser Pro Arg Glu Wing Arg Ser Leu Asp 100 105 110 Pro Gln Gln Arg Leu Leu Glu Val Thr Trp Glu Gly Leu Glu Asp 115 120 125 Wing Gly He Wing Pro Gln Ser Leu Asp Gly Ser Arg Thr Gly Val Phe 130 135 140 Leu Gly Wing Cys Ser Ser Asp Tyr Ser His Thr Val Wing Gln Gln Arg 145 150 155 160 Arg Glu Glu Gln Asp Wing Tyr Asp He Thr Gly Asn Thr Leu Ser Val 165 170 175 Ala Ala Gly Arg Leu Ser Tyr Thr Leu Gly Leu Gln Gly Pro Cys Leu 180 185 190 Thr Val Asp Thr Ala Cys Ser Ser Leu Val Ala W Hxs Leu Wing 195 200 205 Cys Arg Ser Leu Arg Ala Arg Glu Ser Asp Leu Ala Leu Wing Gly Gly 210 215 220 Val Asn Met Leu Leu Be Ser Lys Thr Met He Met Leu Gly Arg He 225 230 235 240 Gln Ala Leu Ser Pro Asp Gly His Cys Arg Thr Phe Asp Wing Ser Wing 245 250 255 Asn Gly Phe Val Arg Gly Glu Gly Cys Gly Met Val Val Leu Lys Arg 260 265 270 Leu Ser Asp Wing Gln Arg Hxs Gly Asp Arg He Trp Wing Leu He Arg 275 280 285 Gly Being Wing Met Asn Gln Asp Gly Arg Ser Thr Gly Leu Met Ala Pro 290 295 300 Asn Val Leu Ala Gln Glu Ala Leu Leu Arg Glu Ala Leu Gln Ser Ala 305 310 315 320 Arg Val Asp Ala Gly Ala He Gly Tyr Val Glu Thr His Gly Thr Gly 325 330 335 Thr Ser Leu Gly Asp Pro He Glu Val Glu Ala Leu Arg Ala Val Leu 340 345 350 Gly Pro Ala Arg Ala Asp Gly Ser Arg Cys Val Leu Gly Ala Val Lys 355 360 365 Thr Asn Leu Gly His Leu Glu Gly Wing Wing Gly Val Wing Gly Leu He 370 375 380 Lys Ala Ala Leu Ala Leu His Hxs Glu Leu He Pro Arg Asn Leu His 385 390 395 400 Phe His Thr Leu Asn Pro Arg He Arg He Glu Gly Thr Ala Leu Wing 405 410 415 Leu Wing Thr Glu Pro Val Pro Trp Pro Arg Wing Gly Arg Pro Arg Phe 420 425 430 Wing Gly Val Ser Wing Phe Gly Leu Ser Gly Thr Asn Val Val Val 435 440 445 Leu Glu Glu Wing Pro Wing Thr Val Leu Wing Pro Wing Pro Pro Gly Arg 450 455 460 Ser Wing Glu Leu Leu Val Leu Ser Wing 
Lys Ser Ala Ala Ala Ala Asu 465 470 475 480 Wing Gln Wing Wing Arg Leu Wing His Wing He Wing Wing Tyr Pro Glu Gln 485 490 495 Gly Leu Gly Asp Val Wing Phe Ser Leu Val Ser Thr Arg Ser Pro Met 500 505 510 Glu Hxs Arg Leu Wing Ala Wing Thr Ser Arg Glu Wing Leu Arg Ser 515 520 525 Wing Leu Glu Val Wing Wing Gln Gly Gln Thr Pro Wing Gly Wing Wing Arg 530 535 540 Gly Arg Wing Wing Being Ser Pro Gly Lys Leu Ala Phe Leu Phe Ala Gly 545 550 555 560 Gln Gly Wing Gln Val Pro Gly Met Gly Arg Gly Leu Trp Glu Wing Trp 565 570 575 Pro Wing Phe Arg Glu Thr Phe Asp Arg Cys Val Tr.r Leu Phe Asp Arg 580 585 590 Glu Leu His Gln Pro Leu Cys Glu Val Met Trp Wing Glu Pro Gly Ser 595 600 605 Ser Arg Ser Ser Leu Leu Asp Gln Thr Wing Phe Thr Gln Pro Wing Leu 610 615 620 Phe Ala Leu Glu Tyr Ala Leu Ala Wing Leu Phe Arg Ser Trp Gly Val 625 630 635 640 Glu Pro Glu Leu Val Gly Wing His Ser Leu Gly Glu Leu Val Wing Ala 645 650 655 Cys Val Wing Gly Val Phe Ser Leu Glu Asp Wing Val Arg Leu Val Val 660 665 670 Wing Arg Gly Arg Leu Met Gln Wing Leu Pro Wing Gly Gly Wing Met Val 675 680 685 Ser He Wing Wing Pro Glu Wing Asp Val Wing Wing Wing Val Ala Pro His 690 695 700 Wing Wing Leu Val Ser Wing Wing Wing Val Asn Gly Pro Glu Val Val 705 710 715 720 Wing Wing Gly Wing Glu Lys Phe Val Gln Gln Wing Wing Wing Wing Phe Wing 725 730 735 Wing Arg Gly Wing Arg Thr Lys Pro Leu His Val Ser His Wing Phe His 740 745 750 Pro Pro Leu Met Asp Pro Met Leu Glu Wing Phe Arg Arg Val Thr Glu 755 760 765 Ser Val Thr Tyr Arg Arg Pro Ser He Wing Leu Val Ser Asn Leu Ser 770 775 780 Gly Lys Pro Cys Thr Asp Glu Val Ser Wing Pro Gly Tyr Trp Val Arg 785 790 795 800 His Wing Arg Glu Wing Val Arg Phe Wing Asp Gly Val Lys Wing Leu His 805 810 815 Ala Ala Gly Ala Gly Leu Phe Val Val Glu Pro Lys Pro Thr Leu 820 825 830 Leu Le Val Val Leu Pro Cry Leu Pro Asp Ala Pro Val Leu Leu 835 840 845 Pro Ala Ser Arg Ala Gly Arg Asp Glu Ala Ala Ser Wing Leu Glu Wing 850 855 860 Leu Gly Gly Phe Trp Val Val Gly Gly Ser Val Thr Trp Ser Gly Val 865 
870 875 880 Phe Pro Ser Gly Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln 885 890 895 Arg Glu Arg Tyr Trp He Glu Wing Pro Val Asp Arg Glu Wing Asp Gly 900 905 910 Thr Gly Arg Wing Arg Wing Gly Gly His Pro Leu Leu Gly Glu Val Phe 915 920 925 Ser Val Ser Thr His Wing Gly Leu Arg Leu Trp Glu Thr Thu Leu Asp 930 935 940 Arg Lys Arg Leu Pro Trp Leu Gly Glu His Arg Wing Gln Gly Glu Val 945 950 955 960 Val Phe Pro Gly Wing Gly Tyr Leu Glu Met Wing Leu Ser Ser Gly Wing 965"" 970 975 Glu He Leu Gly Asp Gly Pro He Gln Val Thr Asp Val Val Leu He 980 985 990 Glu Thr Leu Thr Phe Wing Gly Asp Thr Wing Val Pro Val Gln Val Val 995 1000 1005 Thr Thr Glu Glu Arg Pro Gly Arg Leu Arg Phe Gln Val Ala Ser Arg 1010 1015 1020 Glu Pro Gly Glu Arg Arg Wing Pro Phe Arg He His Wing Arg Gly Val 1025 1030 1035 1040 Leu Arg Arg He Gly Arg Val Glu Thr Pro Ala Arg Ser Asn Leu Ala 1045 1050 1055 Ala Leu Arg Ala Arg Leu His Ala Ala Ala Pro Ala Ala Ala He Tyr 1060 1065 1070 Gly Ala Leu Ala Glu Met Gly Leu Gln Tyr Gly Pro Ala Leu Arg Gly 1075 1080 1085 Leu Ala Glu Leu Trp Arg Gly Glu Gly Glu Ala Leu Gly Arg Val Arg 1090 1095 1100 Leu Pro Glu Wing Wing Gly Be Wing Thr Wing Tyr Gln Leu His Pro Val 1105 1110 1115 1120 Leu Leu Asp Wing Cys Val Gln Met He Val Gly Wing Phe Wing Asp Arg 1125 1130 1135 Asp Glu Wing Thr Pro Trp Wing Pro Val Glu Val Gly Ser Val Arg Leu 1140 1145 1150 Phe Gln Arg Ser Pro Gly Glu Leu Trp Cys His Wing Arg Val Val Ser 1155 1160 1165 Asp Gly Gln Gln Wing Being Ser Arg Trp Ser Wing Asp Phe Glu Leu Met 1170 1175 1180 Asp Gly Thr Gly Wing Val Val Wing Glu He Ser Arg Leu Val Val Glu 1185 1190 1195 1200 Arg Leu Wing Ser Gly Val Arg Arg Arg Asp Wing Asp Asp Trp Phe Leu 1205 1210 1215 Glu Leu Asp Trp Glu Pro Wing Wing Leu Gly Gly Pro Lys He Thr Wing 1220 1225 1230 Gly Arg Trp Leu Leu Glu Glu Gly Gly Gly Leu Gly Arg Ser Leu 1235 1240 1245 Cys Ser Ala Leu Ala Ala Ala Gly His Val Val Val His Wing Wing Gly 1250 1255 1260 Asp Asp Thr Ser Thr Wing Gly Met Arg Wing Leu Leu Wing Asn Wing Phe 
1265 1270 1275 1280 Asp Gly Gln Ala Pro Thr Ala Val Val His Leu Ser Ser Leu Asp Gly 1285 1290 1295 Gly Gly Gln Leu Gly Pro Gly Leu Gly Wing Gln Gly Wing Leu Asp Wing 1300 1305 1310 Pro Arg Ser Pro Asp Val Asp Wing Asp Wing Leu Glu Wing Wing Leu Met 1315 1320 1325 Arg Gly Cys Asp Ser Val Leu Ser Leu Val Gln Ala Leu Val Gly Met 1330 1335 1340 Asp Leu Arg Asn Wing Pro Arg Leu Trp Leu Leu Thr Arg Gly Wing Gln 1345 1350 1355 1360 Wing Wing Wing Wing Gly Asp Val Ser Val Val Gln Wing Pro Leu Leu Gly 1365 1370 1375 Leu Gly Arg Thr He Ala Leu Glu His Wing Glu Leu Arg Cys He Ser 1380 1385 1390 Val Asp Leu Asp Pro Wing Glu Pro Glu Gly Glu Wing Asp Wing Leu Leu 1395 1400 1405 Wing Glu Leu Leu Wing Asp Asp Wing Glu Glu Glu Val Wing Leu Arg Gly 1410 1415 1420 Gly Asp Arg Leu Val Wing Arg Leu Val His Arg Leu Pro Asp Wing Gln 1425 1430 1435 1440 Arg Arg Glu Lys Val Glu Pro Wing Gly Asp Arg Pro Phe Arg Leu Glu 1445 1450 1455 He Asp Glu Pro Gly Wing Leu Asp Gln Leu Val Leu Arg Wing Thr Gly 1460 1465 1470 Arg Arg Wing Pro Gly Pro Gly Glu Val Glu He Ser Val Glu Wing Wing 1475 1480 1485 Gly Leu Asp Ser He Asp He Gln Leu Ala Leu Gly Val Ala Pro Asn 1490 1495 1500 Asp Leu Pro Gly Glu Glu He Glu Pro Leu Val Leu Gly Ser Glu Cys 1505 1510 1515 1520 Wing Gly Arg He Val Wing Val Gly Glu Gly Val Asn Gly Leu Val Val 1525 1530 1535 Gly Gln Pro Val He Wing Leu Wing Wing Gly Val Phe Wing Thr His Val 1540 1545 1550 Thr Thr Ser Wing Thr Leu Val Leu Pro Arg Pro Leu Gly Leu Ser Wing 1555 1560 1565 Thr Glu Wing Wing Wing Met Pro Leu Wing Tyr Leu Thr Wing Trp Tyr Wing 1570 1575 1580 Leu Asp Lys Val Wing His Leu Gln Wing Gly Glu Arg Val Leu IIe HlS 1585 1590 1595 1600 Wing Glu Wing Gly Gly Val Gly Leu Cys Wing Val Arg Trp Wing Gln Arg 1605 1610 1615 Val Gly Wing Glu Val Tyr Wing Thr Wing Asp Thr Pro Glu Asn Arg Wing 1620 1625 1630 Tyr Leu Glu Ser Leu Gly Val Arg Tyr Val Ser Asp Ser Arg Ser Gly 1635 1640 1645 Arg Phe Val Thr Asp Val His Wing Trp Thr Asp Gly Glu Gly Val Asp 1650 1655 1660 Val Val Leu Asp 
Ser Leu Ser Gly Glu Arg He Asp Lys Ser Leu Met 1665 1670 1675 1680 Val Leu Arg Ala Cys Gly Arg Leu Val Lys Leu Gly Arg Arg Asp Asp 1685 1690 1695 Cys Wing Asp Thr Gln Pro Gly Leu Pro Pro Leu Leu Arg Asn Phe Ser 1700 1705 1710 Phe Ser Gln Val Asp Leu Arg Gly Met Met Leu Asp Gln Pro Wing Arg 1715 1720 1725 He Arg Ala Leu Leu Asp Glu Leu Phe Gly Leu Val Wing Ala Gly Wing 1730 1735 '1740 He Ser Pro Leu Gly Ser Gly Leu Arg Val Gly Gly Ser Leu Thr Pro 1745 1750 1755 1760 Pro Pro Val Glu Thr Phe Pro He Ser Arg Ala Wing Glu Wing Phe Arg 1765 1770 1775 Arg Met Wing Gln Gly Gln His Leu Gly Lys Leu Val Leu Thr Leu Asp 1780 1785 1790 Asp Pro Glu Val Arg He Arg Wing Pro Wing Glu Ser Ser Val Val Wing 1795 1800 1805 Arg Ala Asp Gly Thr Tyr Leu Val Thr Gly Gly Leu Gly Gly Leu Gly 1810 1815 1820 Leu Arg Val Wing Gly Trp Leu Wing Glu Arg Gly Wing Gly Gln Leu Val 1825 1830 1835 1840 Leu Val Gly Arg Be Gly Wing Wing Be Wing Glu Gln Arg Wing Wing Val 1845 1850 1855 Wing Wing Leu Glu Wing His Gly Wing Arg Val Thr Val Wing Wing Lys Wing Asp 1860 1865 1870 Val Wing Asp Arg Ser Gln He Glu Arg Val Leu Arg Glu Val Thr Wing 1875 1880 1885 Ser Gly Met Pro Leu Arg Gly Val Val His Wing Wing Gly Leu Val Asp 1890 1895 1900 Asp Gly Leu Leu Met Gln Gln Thr Pro Wing Arg Phe Arg Thr Val Met 1905 1910 1915 1920 Gly Pro Lys Val Gln Gly Ala Leu His Leu His Thr Leu Thr Arg Glu 1925 1930 1935 Ala Pro Leu Ser Phe Phe Val Leu Tyr Ala Ser Ala Ala Gly Leu Phe 1940 1945 1950 Gly Ser Pro Gly Gln Gly Asn Tyr Ala Ala Ala Asn Wing Phe Leu Asp 1955 1960 1965 Wing Leu Ser His His Arg Arg Wing Gln Gly Leu Pro Wing Leu Ser He 1970 1975 1980 Asp Trp Gly Met Phe Thr Glu Val Gly Met Wing Val Wing Gln Glu Asn 1985 1990 1995 2000 Arg Gly Ala Arg Gln Be Arg Gly Met Arg Gly He Thr Pro Asp 2005 2010 2015 Glu Gly Leu Be Ala Leu Ala Arg Leu Leu Glu Gly Asp Arg Val Gln 2020 2025 2030 Thr Gly Val He Pro He Thr Pro Arg Gln Trp Val Glu Phe Tyr Pro 2035 2040 2045 Wing Thr Wing Wing Being Arg Arg Leu Being Arg Leu Val Thr Thr Gln Arg 
2050 2055 2060 Ala Val Ala Asp Arg Thr Ala Gly Asp Arg Asp Leu Leu Glu Gln Leu 2065 2070 2075 2080 Ala Ser Ala Glu Pro Ala Ala Arg Ala Gly Leu Leu Gln Asp Val Val 2085 2090 2095 Arg Val Gln Val Ser His Val Leu Arg Leu Pro Glu Asp Lys Ile Glu 2100 2105 2110 Val Asp Ala Pro Leu Ser Met Met Gly Met Asp Ser Leu Met Ser Leu 2115 2120 2125 Glu Leu Arg Asn Arg Ile Glu Ala Ala Leu Gly Val Ala Ala Pro Ala 2130 2135 2140 Ala Leu Gly Trp Thr Tyr Pro Thr Val Ala Ala Ile Thr Arg Trp Leu 2145 2150 2155 2160 Leu Asp Asp Ala Leu Val Val Arg Leu Gly Gly Gly Ser Asp Thr Asp 2165 2170 2175 Glu Ser Thr Ala Ser Ala Gly Ser Phe Val His Val Leu Arg Phe Arg 2180 2185 2190 Pro Val Val Lys Pro Arg Ala Arg Leu Phe Cys Phe His Gly Ser Gly 2195 2200 2205 Gly Ser Pro Glu Gly Phe Arg Ser Trp Ser Glu Lys Ser Glu Trp Ser 2210 2215 2220 Asp Leu Glu Ile Val Ala Met Trp His Asp Arg Ser Leu Ala Ser Glu 2225 2230 2235 2240 Asp Ala Pro Gly Lys Lys Tyr Val Gln Glu Ala Ala Ser Leu Ile Gln 2245 2250 2255 His Tyr Ala Asp Ala Pro Phe Ala Leu Val Gly Phe Ser Leu Gly Val 2260 2265 2270 Arg Phe Val Met Gly Thr Ala Val Glu Leu Ala Ser Arg Ser Gly Ala 2275 2280 2285 Pro Ala Pro Leu Ala Val Phe Thr Leu Gly Gly Ser Leu Ile Ser Ser 2290 2295 2300 Ser Glu Ile Thr Pro Glu Met Glu Thr Asp Ile Ile Ala Lys Leu Phe 2305 2310 2315 2320 Phe Arg Asn Ala Ala Gly Phe Val Arg Ser Thr Gln Gln Val Gln Ala 2325 2330 2335 Asp Ala Arg Ala Asp Lys Val Ile Thr Asp Thr Met Val Ala Pro Ala 2340 2345 2350 Pro Gly Asp Ser Lys Glu Pro Pro Val Lys Ile Ala Val Pro Ile Val 2355 2360 2365 Ala Ala Ala Gly Ser Asp Val Val Hep Pro Pro Ser Asp Val Gln 2370 2375 2380 Asp Leu Gln Ser Arg Thr Thr Glu Arg Phe Tyr Met His Leu Leu Pro 2385 2390 2395 2400 Gly Asp His Glu Phe Leu Val Asp Arg Gly Arg Glu Ile Met His Ile 2405 2410 2415 Val Asp Ser His Leu Asn Pro Leu Leu Ala Ala Arg Thr Thr Ser Ser 2420 2425 2430 Gly Pro Ala Phe Glu Ala Lys 2435 < 210 > 8 < 211 > 419 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 8 Met Thr 
Gln Glu Gln Ala Asn Gln Ser Glu Thr Lys Pro Ala Phe Asp 1 5 10 15 Phe Lys Pro Phe Ala Pro Gly Tyr Ala Glu Asp Pro Phe Pro Ala Ile 20 25 30 Glu Arg Leu Arg Glu Ala Thr Pro Ile Phe Tyr Trp Asp Glu Gly Arg 35 40 45 Ser Trp Val Leu Thr Arg Tyr His Asp Val Ser Ala Val Phe Arg Asp 50 55 60 Glu Arg Phe Ala Val Ser Arg Glu Glu Trp Glu Ser Ser Ala Glu Tyr 65 70 75 80 Ser Ser Ala Ile Pro Glu Leu Ser Asp Met Lys Lys Tyr Gly Leu Phe 85 90 95 Gly Leu Pro Pro Glu Asp His Ala Arg Val Arg Lys Leu Val Asn Pro 100 105 110 Ser Phe Thr Ser Arg Ala Ile Asp Leu Leu Arg Ala Glu Ile Gln Arg 115 120 125 Thr Val Asp Gln Leu Leu Asp Ala Arg Ser Gly Gln Glu Glu Phe Asp 130 135 140 Val Val Arg Asp Tyr Ala Glu Gly Ile Pro Met Arg Ala Ile Ser Ala 145 150 155 160 Leu Leu Lys Val Pro Ala Glu Cys Asp Glu Lys Phe Arg Arg Phe Gly 165 170 175 Ser Ala Thr Ala Arg Ala Leu Gly Val Gly Leu Val Pro Gln Val Asp 180 185 190 Glu Glu Thr Lys Thr Leu Val Ala Ser Val Thr Glu Glu Leu Ala Leu 195 200 205 Leu His Asp Val Leu Asp Glu Arg Arg Arg Asn Pro Leu Glu Asn Asp 210 215 220 Val Leu Thr Met Leu Leu Gln Ala Glu Ala Asp Gly Ser Arg Leu Ser 225 230 235 240 Thr Lys Glu Leu Val Ala Leu Val Gly Ala Ile Ile Ala Ala Gly Thr 245 250 255 Asp Thr Thr Ile Tyr Leu Ile Ala Phe Ala Val Leu Asn Leu Leu Arg 260 265 270 Ser Pro Glu Ala Leu Glu Leu Val Lys Ala Glu Pro Gly Leu Met Arg 275 280 285 Asn Ala Leu Asp Glu Val Leu Arg Phe Asp Asn Ile Leu Arg Ile Gly 290 295 300 Thr Val Arg Phe Ala Arg Gln Asp Leu Glu Tyr Cys Gly Ala Ser Ile 305 310 315 320 Lys Lys Gly Glu Met Val Phe Leu Leu Ile Pro Ser Ala Leu Arg Asp 325 330 335 Gly Thr Val Phe Ser Arg Pro Asp Val Phe Asp Val Arg Arg Asp Thr 340 345 350 Gly Ala Ser Leu Ala Tyr Gly Arg Gly Pro His Val Cys Pro Gly Val 355 360 365 Ser Leu Ala Arg Leu Glu Ala Glu Ile Ala Val Gly Thr Ile Phe Arg 370 375 380 Arg Phe Pro Glu Met Lys Leu Lys Glu Thr Pro Val Phe Gly Tyr His 385 390 395 400 Pro Ala Phe Arg Asn Ile Glu Ser Leu Asn Val Ile Leu Lys Pro Ser 
405 410 415 Lys Ala Gly < 210 > 9 < 211 > 607 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 9 Ala Ser Leu Asp Ala Leu Phe Ala Arg Ala Thr Ser Ala Arg Val Leu 1 5 10 15 Asp Asp Gly His Gly Arg Ala Thr Glu Arg His Val Leu Ala Glu Ala 20 25 30 Arg Gly Ile Glu Asp Leu Arg Ala Leu Arg Glu His Leu Arg Ile Gln 35 40 45 Glu Gly Gly Pro Ser Phe His Cys Met Cys Leu Gly Asp Leu Thr Val 50 55 60 Glu Leu Leu Ala His Asp Gln Pro Leu Ala Ser Ser Ser Phe His His 65 70 75 80 Ala Arg Ser Leu Arg His Pro Asp Trp Thr Ser Asp Ala Met Leu Val 85 90 95 Asp Gly Pro Ala Leu Val Arg Trp Leu Ala Ala Arg Gly Ala Pro Gly 100 105 110 Pro Leu Arg Glu Tyr Glu Glu Glu Arg Glu Arg Ala Arg Thr Ala Gln 115 120 125 Glu Ala Arg Arg Leu Trp Leu Ala Ala Ala Pro Pro Cys Phe Ala Pro 130 135 140 Asp Leu Pro Arg Phe Glu Asp Asp Ala Asn Gly Leu Pro Leu Gly Pro 145 150 155 160 Met Ser Pro Glu Val Ala Glu Ala Glu Arg Arg Leu Arg Ala Ser Tyr 165 170 175 Ala Thr Pro Glu Leu Ala Cys Ala Ala Leu Leu Ala Trp Leu Gly Thr 180 185 190 Gly Ala Gly Pro Trp Ser Gly Tyr Pro Ala Tyr Glu Met Leu Pro Glu 195 200 205 Asn Leu Leu Leu Gly Phe Gly Leu Pro Thr Ala Ile Ala Ala Ala Ser 210 215 220 Ala Pro Gly Thr Ser Glu Ala Ala Leu Arg Gly Ala Ala Arg Leu Phe 225 230 235 240 Ala Ser Trp Glu Val Val Ser Ser Lys Lys Ser Gln Leu Gly Asn Ile 245 250 255 Pro Glu Ala Leu Trp Glu Arg Leu Arg Thr Ile Val Arg Ala Met Gly 260 265 270 Asn Ala Asp Asn Leu Ser Arg Phe Glu Arg Ala Glu Ala Ala Ala 275 280 285 Glu Val Arg Arg Leu Arg Ala Gln Pro Ala Pro Phe Ala Ala Gly Ala 290 295 300 Gly Leu Ala Ala Ala Gly Val Ser Ser Gly Arg Leu Ser Gly Leu 305 310 315 320 Val Thr Asp Gly Asp Ala Leu Tyr Ser Gly Asp Gly Asn Asp Ile Val 325 330 335 Met Phe Gln Pro Gly Arg Ile Ser Val Val Leu Leu Ala Gly Thr 340 345 350 Asp Pro Phe Glu Leu Ala Pro Pro Leu Ser Gln Met Leu Phe Val 355 360 365 Ala His Ala Asn Ala Gly Thr Ile Ser Lys Val Leu Thr Glu Gly Ser 370 375 380 Pro Leu Ile Val Met Ala Arg Asn Gln Ala Arg Pro Met Ser Leu Val 385 
390 395 400 His Wing Arg Gly Phe Met Wing Trp Val Asn Gln Wing Met Val Pro Asp 405 410 415 Pro Glu Arg Gly Wing Pro Phe Val Val Gln Arg Ser Thr He Met Glu 420 425 430 Phe Glu His Pro Thr Pro Arg Cys Leu His Glu Pro Wing Gly Ser Wing 435 440 445 Phe Ser Leu Wing Cys Asp Glu Glu His Leu Tyr Trp Cys Glu Leu Ser 450 455 460 Wing Gly Arg Leu Glu Leu Trp Arg His Pro His Hxs Arg Pro Gly Wing 465 470 475 480 Pro Ser Arg Phe Wing Tyr Leu Gly Glu His Pro Wing Wing Thr Trp 485 490 495 Tyr Pro Ser Leu Thr Leu Asn Wing Thr His Val Leu Trp Wing Asp Pro 500 505 510 Asp Arg Arg Wing He Leu Gly Val Asp Lys Arg Thr Gly Val Glu Pro 515 520 525 He Val Leu Wing Glu Thr Arg His Pro Pro Wing His Val Val Ser Glu 530 535 540 Asp Arg Asp He Phe Wing Leu Thr Gly Gln Pro Asp Ser Arg Asp Trp 545 550 555 560 His Val Glu His He Arg Ser Gly Wing Ser Thr Val Val Wing Asp Tyr 565 570 575 Gln Arg Gln Leu Trp Asp Arg Pro Asp Met Val Leu Asn Arg Arg Gly 580 585 590 Leu Phe Phe Thr Thr Asn Asp Arg He Leu Thr Leu Wing Arg Ser 595 600 605 < 210 > 10 < 211 > 423 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 10 Met Gly Ala Leu He Ser Val Ala Ala Pro Gly Cys Ala Leu Gly Gly 1 5 10 15 Ala Glu Glu Glu Gly Gln Pro Gly Gln Asp Ala Gly Ala Gly Ala Leu 25 30 Wing Pro Wing Arg Glu Val Met Wing Wing Glu Val Wing Wing Gly Gln Met 35 40 45 Pro Gly Wing Val Trp Leu Val Wing Arg Gly Asp Asp Val His Val Asp 50 55 60 Wing Val Gly Val Thr Glu Leu Gly Gly Be Wing Pro Met Arg Arg Asp 65 70 75 80 Thr He Phe Arg He Wing Ser Met Thr Lys Wing Val Thr Wing Thr Wing 85 90 95 Val Met Met Leu Val Glu Glu Gly Lys Leu Asp Leu Asp Ser Val Val 105 105 110 Asp Arg Trp Leu Pro Glu Leu Wing Asn Arg Lys Val Leu Wing Arg He 115 120 125 Asp Gly Pro He Asp Glu Thr Val Pro Wing Glu Arg Pro He Thr Val 130 135 140 Arg Asp Leu Met Thr Phe Thr Met Gly Phe Gly He Ser Phe Asp Wing 145 150 155 160 Ser Ser Pro He Gln Arg Wing He Asp Glu Leu Gly Leu Val Asn Wing 165 170 175 Gln Pro Val Pro Met Thr Pro His Gly Pro Asp Glu Trp He Arg Arg 180 185 190 
Leu Gly Thr Leu Pro Leu Met His Gln Pro Gly Wing Gln Trp Met Tyr 195 200 205 Asn Thr Gly Ser Leu Val Gln Gly Val Leu Val Gly Arg Wing Wing Asp 210 215 220 Gln Gly Phe Asp Wing Phe Val Arg Glu Arg He Leu Wing Pro Leu Gly 225 230 235 240 Met Arg Asp Thr Asp Phe His Val Pro Wing Asp Lys Leu Wing Arg Phe 245 250 255 Wing Gly Cys Gly Tyr Phe Thr Asp Glu Gln Thr Gly Glu Lys Thr Arg 260 265-270 Met Asp Arg Asp Gly Wing Glu Wing Wing Tyr Wing Pro Pro Wing Phe 275 280 285 Pro Ser Gly Wing Wing Gly Leu Val Ser Thr Val Asp Asp Tyr Leu Leu 290 295 300 Phe Wing Arg Met Leu Met Asn Gly Gly Val His Glu Gly Arg Arg Leu 305 310 315 320 Leu Ser Ala Ala Ser Val Arg Glu Met Thr Wing Asp His Leu Thr Pro 325 330 335 Wing Gln Lys Wing Wing Being Phe Phe Pro Gly Phe Phe Glu Thr His 340 345 350 Gly Trp Gly Tyr Gly Met Wing Val Val Thr Wing Pro Asp Wing Val Ser 355 360 365 Glu Val Pro Gly Arg Tyr Gly Trp Asp Gly Gly Phe Gly Thr Ser Trp 370 375 380 He Asn Asp Pro Gly Arg Glu Leu He Gly He Val Met Thr Gln Ser 385 390 395 400 Wing Gly Phe Leu Phe Ser Gly Wing Leu Glu Arg Phe Trp Arg Ser Val 405 410 415 Tyr Val Ala Thr Glu Ser Wing 420 < 210 > 11 < 211 > 713 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 11 Met His Gly Leu Thr Glu Arg Gln Val Leu Leu Ser Leu Val Thr Leu 1 5 10 15 Ala Leu He Leu Val Thr Ala Arg Ala Ser Gly Glu Leu Ala Arg Arg 20 25 30 Leu Arg Gln Pro Glu Val Leu Gly Glu Leu Phe Gly Gly Val Val Leu 35 40 45 Gly Pro Ser Val Val Gly Ala Leu Ala Pro Gly Phe His Arg Ala Leu 50 55 60 Phe Gln Glu Pro Wing Val Gly Val Val Leu Ser Gly He Ser Trp He 65 70 75 80 Gly Ala Leu Leu Leu Leu Le Met Met Gly He Glu Val Asp Val Gly 85 90 95 He Leu Arg Lys Glu Ala Arg Pro Gly Ala Leu Ser Ala Leu Gly Ala 100 105 110 He Ala Pro Pro Leu Ala Ala Gly Ala Ala Phe Be Ala Leu Val Leu 115 120 125 Asp Arg Pro Leu Pro Be Gly Leu Phe Leu Gly He Val Leu Ser Val 130 135 140 Thr Wing Val Ser Val Wing Wing Lys Val Leu He Glu Arg Glu Ser Met 145 150 155 160 Arg Arg Ser Tyr Ala Gln Val Thr Leu Ala Ala Gly Val Val 
Ser Glu 165 170 175 Val Ala Ala Trp Val Leu Val Ala Met Thr Ser Ser Ser Tyr Gly Ala 180 185 190 Ser Pro Ala Leu Ala Val Ala Arg Ser Ala Leu Leu Ala Ser Gly Phe 195 200 205 Leu Leu Phe Met Val Leu Val Gly Arg Arg Leu Thr His Leu Ala Met 210 215 220 Arg Trp Val Ala Asp Ala Thr Arg Val Ser Lys Gly Gln Val Ser Leu 225 230 235 240 Val Leu Val Leu Thr Phe Leu Ala Ala Ala Leu Thr Gln Arg Leu Gly 245 250 255 Leu His Pro Leu Leu Gly Ala Phe Ala Leu Gly Val Leu Leu Asn Ser 260 265 270 Ala Pro Arg Thr Asn Arg Pro Leu Leu Asp Gly Val Gln Thr Leu Val 275 280 285 Ala Gly Leu Phe Ala Pro Val Phe Phe Val Leu Ala Gly Met Arg Val 290 295 300 Asp Val Ser Gln Leu Arg Thr Pro Ala Ala Trp Gly Thr Val Ala Leu 305 310 315 320 Leu Leu Ala Thr Ala Thr Ala Ala Lys Val Val Pro Ala Ala Leu Gly 325 330 335 Ala Arg Leu Gly Gly Leu Arg Gly Ser Glu Ala Ala Leu Val Ala Val 340 345 350 Gly Leu Asn Met Lys Gly Gly Thr Asp Leu Ile Val Ala Ile Val Gly 355 360 365 Val Glu Leu Gly Leu Leu Ser Asn Glu Ala Tyr Thr Met Tyr Ala Val 370 375 380 Val Ala Leu Val Thr Val Thr Ala Ser Pro Ala Leu Leu Ile Trp Leu 385 390 395 400 Glu Lys Arg Ala Pro Pro Thr Gln Glu Glu Ser Ala Arg Leu Glu Arg 405 410 415 Glu Glu Ala Ala Arg Arg Ala Tyr Ile Pro Gly Val Glu Arg Ile Leu 420 425 430 Val Pro Ile Val Ala His Ala Ala Leu Pro Gly Phe Ala Thr Asp Ile Val 435 440 445 Glu Ser Ile Val Ala Ser Lys Arg Lys Leu Gly Glu Thr Val Asp Ile 450 455 460 Thr Glu Leu Ser Val Glu Gln Gln Ala Pro Gly Pro Ser Arg Ala Ala 465 470 475 480 Gly Glu Ala Ser Arg Gly Leu Ala Arg Leu Gly Ala Arg Leu Arg Val 485 490 495 Gly Ile Trp Arg Gln Arg Arg Glu Leu Arg Gly Ser Ile Gln Ala Ile 500 505 510 Leu Arg Ala Ser Arg Asp His Asp Leu Leu Val Ile Gly Ala Arg Ser 515 520 525 Pro Ala Arg Ala Arg Gly Met Ser Phe Gly Arg Leu Gln Asp Ala Ile 530 535 540 Val Gln Arg Ala Glu Ser Asn Val Leu Val Val Val Gly Asp Pro Pro 545 550 555 560 Ala Ala Glu Arg Ala Ser Ala Arg Arg Ile Leu Val Pro Ile Ile Gly 565 570 575 Leu Glu Tyr Ser Phe Ala Ala 
Ala Asp Leu Ala Ala His Ala Ala Ala 589 585 590 Trp Wing Asp Ala Glu Leu Val Leu Leu Ser Wing Gln Thr Asp Pro 595 600 605 Gly Wing Val Val Trp Arg Asp Arg Glu Pro Ser Arg Val Arg Wing Val 610 615 620 Wing Arg Ser Val Val Asp Glu Wing Val Phe Arg Gly Arg Arg Leu Gly 625 630 635 640 Val Arg Val Ser Ser Arg Val His Val Gly Ala Pro Ser Asp Glu 645 650 655 He Thr Arg Glu Leu Ala Arg Ala Pro Tyr Asp Leu Leu Val Leu Gly 660 665 670 Cys Tyr Asp His Gly Pro Leu Gly Arg Leu Tyr Leu Gly Ser Thr Val 675 680 685 Glu Ser Val Val Val Arg Ser Arg Val Pro Val Ala Leu Leu Val Wing 690 695 700 His Gly Gly Thr Arg Glu Gln Val Arg 705 710 < 210 > 12 < 211 > 126 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 12 Met Asp Lys Pro He Gly Arg Thr Arg Cys Wing He Wing Glu Gly Tyr 1 5 10 15 He Pro Gly Gly Ser Asn Gly Pro Glu Pro Gln Met Thr Ser His Glu 20 25 30 Thr Wing Cys Leu Leu Asn Wing Being Asp Arg Asp Wing Gln Val Wing 35 40 45 Thr Val Tyr Phe Ser Asp Arg Asp Pro Wing Gly Pro Tyr Arg Val Thr 50 55 60 Val Pro Wing Arg Arg Thr Arg His Val Arg Phe Asn Asp Leu Thr Glu 65 70 75 80 Pro Glu Pro He Pro Arg Asp Thr Asp Tyr Ala Ser Val He Glu Ser 85 90 95 Asp Ala Pro He Val Val Gln His Thr Arg Leu Asp Ser Arg Gln Ala 100 105 110 Glu Asn Ala Leu Leu Ser Thr He Ala Tyr Thr Asp Arg Glu 115 120 125 < 210 > 13 < 211 > 149 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 13 Met Lys His Val Asp Thr Gly Arg Arg Phe Gly Arg Arg He Gly His 1 5 10 15 Thr Leu Gly Leu Leu Wing Pro Met Wing Wing Le Gly Cly Gly Gly Pro 20 25 30 Ser Glu Lys Thr Val Gln Gly Thr Arg Leu Wing Pro Gly Wing Asp Wing 35 40 45 Arg Val Thr Wing Asp Val Asp Pro Asp Wing Wing Thr Thr Arg Leu Wing 50 55 60 Val Asp Val Val His Leu Ser Pro Pro Glu Arg Leu Glu Wing Gly Ser 65 70 75 80 Glu Arg Phe Val Val Trp Gln Arg Pro Ser Pro Glu Ser Pro Trp Arg 85 90 95 Arg Val Gly Val Leu Asp Tyr Asn Wing Asp Ser Arg Arg Gly Lys Leu 100 105 110 Wing Glu Thr Thr Val Pro Tyr Wing Asn Phe Glu Leu Leu He Thr Wing 115 120 125 Glu Lys Gln Ser Ser Pro Gln Ser 
Pro Ser Ala Ala Val Ile Gly 130 135 140 Pro Thr Ser Val Gly 145 < 210 > 14 < 211 > 184 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 14 Val Thr Ser Glu Glu Val Pro Gly Ala Ala Glu Ala Gln Ser Ser 1 5 10 15 Leu Val Arg Ala Gln His Ala Ala Arg His Val Arg Pro Cys Thr Arg 20 25 30 Ala Glu Glu Pro Pro Ala Leu Met His Gly Leu Thr Glu Arg Gln Val 35 40 45 Leu Leu Ser Leu Val Ala Leu Ala Leu Val Leu Leu Thr Ala Arg Ala 50 55 60 Phe Gly Glu Leu Ala Arg Arg Leu Arg Gln Pro Glu Val Leu Gly Glu 65 70 75 80 Leu Phe Gly Gly Val Val Leu Gly Pro Ser Val Val Gly Ala Leu Ala 85 90 95 Pro Gly Phe His Arg Val Leu Phe Gln Asp Pro Ala Val Gly Val Val 100 105 110 Leu Ser Gly Ile Ser Trp Ile Gly Ala Leu Val Leu Leu Leu Met Ala 115 120 125 Gly Ile Glu Val Asp Val Ser Ile Leu Arg Lys Glu Ala Arg Pro Gly 130 135 140 Ala Leu Ser Ala Leu Gly Ala Ala Pro Pro Leu Arg Thr Pro Gly 145 150 155 160 Pro Leu Val Gln Arg Met Gln Gly Ala Phe Thr Trp Asp Leu Asp Val 165 170 175 Ser Pro Arg Arg Ser Ala Gln Ala 180 < 210 > 15 < 211 > 145 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 15 Val Asn Ala Pro Cys Met Arg Cys Thr Ser Gly Pro Gly Val Arg Ser 1 5 10 15 Gly Gly Ala Ala Pro Ala Ala Glu Ala Ala Pro Gly Arg Ala Ser 20 25 30 Leu Arg Arg Met Leu Thr Ser Thr Ser Ile Pro Ala Met Ser Ser Arg 35 40 45 Thr Ala Pro Pro Ile Gln Glu Met Pro Glu Ser Thr Thr Pro Thr Ala 50 55 60 Gly Ser Trp Lys Arg Thr Arg Trp Asn Pro Gly Ala Ser Ala Pro Thr 65 70 75 80 Thr Asp Gly Pro Ser Thr Thr Pro Pro Lys Ser Ser Pro Ser Thr Ser 85 90 95 Gly Trp Arg Ser Arg Arg Ala Ser Pro Lys Ala Arg Ala Val Arg 100 105 110 Arg Thr Ser Ala Arg Ala Thr Ser Glu Ser Arg Thr Cys Arg Ser Val 115 120 125 Arg Pro Cys Ile Arg Ala Gly Gly Ser Ala Arg Val Gln Gly Arg 130 135 140 Thr 145 < 210 > 16 < 211 > 185 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 16 Val Leu Ala Pro Pro Ala Asp Ile Arg Pro Pro Ala Ala Ala Gln Leu 1 5 10 15 Glu Pro Asp Ser Pro Asp Asp Glu Ala Asp Glu Ala Asp Glu Ala Leu 20 25 30 Arg Pro Phe Arg 
Asp Wing He Wing Wing Tyr Ser Glu Wing Val Arg Trp 35 40 45 Wing Glu Wing Wing Gln Arg Pro Arg Leu Glu Being Leu Val Arg Leu Ala 50 55 60 He Val Arg Leu Gly Lys Ala Leu Asp Lys Val Pro Phe Ala His Thr 65 70 75 80 Thr Ala Gly Val Ser Gln He Wing Gly Arg Leu Gln Asn Asp Wing Val 85 90 95 Trp Phe Asp Val Wing Wing Arg Tyr Wing Ser Phe Arg Wing Wing Thr Glu 100 105 110 His Wing Ala Leu Arg Wing Ala Wing Wing Met Wing Glu Wing Leu Wing Wing Wing 115 120 125 Pro Tyr Arg Gly Ser Ser Val Val Wing Wing Wing Val Gly Glu Phe Arg 130 135 140 Gly Glu Wing Wing Arg Leu His Pro Wing Asp Arg Val Pro Wing As Asp 145 150 155 160 Gln Gln He Leu Thr Wing Leu Arg Wing Wing Glu Arg Wing Leu He Wing 165 170 175 Leu Tyr Thr Ala Phe Ala Arg Glu Glu 180 185 < 210 > 17 < 211 > 146 < 212 > PRT < 213 > Sorangxum cellulosum < 400 > 17 Met Ala Asp Ala Ala Ser Arg Be Ala Cys Ser Val Ala Ala Arg Lys 10 15 Leu Ala Tyr Arg Ala Ala Thr Ser Asn Gln Thr Ala Ser Phe Trp Ser 25 30 Leu Pro Wing He Trp Glu Thr Pro Wing Val Val Cys Wing Lys Gly Thr 35 40 45 Leu Ser Wing Leu Pro Ser Arg Thr Wing Wing Being Arg Thr Arg Leu 50 55 60 Ser Ser Arg Gly Arg Cys Wing Wing Wing Hxs Arg Thr Wing Ser Glu 65 70 75 80 Tyr Wing Wing Wing Being Arg Asn Gly Arg Being Wing Being Wing Being 85 90 95 Being Wing Being Being Gly Glu Being Gly Ser Being Trp Wing Wing Wing Gly 100 105 110 Gly Arg Met Be Wing Gly Gly Wing Be Thr Gly Glu Val Tyr Glu Gln 115 120 125 Wing Pro Arg Leu Arg Leu Wing Gln Ser Val Wing Wing Arg Arg Arg Asp 130 135 140 Pro Thr 145 < 210 > 18 < 211 > 288 < 212 > PRT < 213 > Sorangxum cellulosum < 400 > 18 Val Thr Val Ser Ser Met Pro Arg Ser Trp Ser Ser Arg Val Arg Thr 1 5 10 15 Val Val Thr Ala Leu Gly Cys Ala Arg Arg Leu Ser Gly Ser He Ser 20 25 30 Arg Leu Arg Arg His Pro Glu Ala Gly Arg Ala Pro Arg Ser Arg Leu 35 40 45 Arg Ala Trp Arg Arg Leu Pro Gln His He Ser Ser Pro Trp Arg His 50 55 60 Leu Pro Pro Gly Wing Arg Val Gly Thr Ser Cys Pro Wing Asp Arg Arg 65 70 75 80 He Leu Pro Ser His Arg Thr Wing Asp Leu 
Gly Thr Ser Gly Gly Thr 85 90 95 Leu Val Wing Arg Met Ser Gly His Val Wing Arg Asn Pro His Wing Wing 100 105 110 Val Leu Val Gly Asp Gly Ser Wing Arg Gly Arg Arg Arg Leu Ser Asn 115 120 125 Arg Arg Wing Glu Arg Arg Val Ser Asp Val Thr Cys Arg Glu Gly Gly 130 135 140 Glu Wing Met Gln Lys He Wing Gly Lys Leu Val Val Gly Leu He Ser 145 150 155 160 Val Ser Gly Met Ser Leu Leu Ala Wing Cys Gly Gly Glu Lys Arg Ser 165 170 175 Gly Gly Glu Wing Gln Thr Pro Gly Gly Wing Gln Gly Glu Wing Pro Val 180 185 190 Pro Val Gly Ser Wing Val Asp Ser He Val Wing Wing Arg Cys Asp Arg 195 200 205 Glu Wing Arg Cys Asn Asn He Gly Gln Asp Arg Glu Tyr Ser Ser Lys 210 215 220 Asp Wing Cys Ser Asn Lys He Arg Ser Glu Trp Arg Asp Glu Leu Thr 225 230 235 240 Phe Gly Glu Cys Pro Gly Gly He Asp Ala Lys Gln Leu Asn Glu Cys 245 250 255. Leu Glu Gly He Arg Asn Glu Gly Cys Gly Asn Pro Phe Asp Thr Leu 260 265 270 Gly Arg Val Val Ala Cys Arg Ser Ser Asp Leu Cys Arg Asp Ala Arg 275 280 285 < 210 > 19 < 211 > 288 < 12 > PRT < 213 > Sorangium cellulosum < 400 > 19 Val Thr Val Ser Ser Met Pro Arg Ser Trp Ser Ser Arg Val Arg Thr 1 5 10 15 Val Val Thr Ala Leu Gly Cys Ala Arg Arg Leu Ser Gly Ser He Ser 20 25 30 Arg Leu Arg Arg His Pro Glu Ala Gly Arg Ala Pro Arg Ser Arg Leu 35 40 45 Arg Ala Trp Arg Arg Leu Pro Gln His He Ser Ser Pro Trp Arg His 50 55 60 Leu Pro Pro Gly Wing Arg Val Gly Thr Ser Cys Pro Wing Asp Arg Arg 65 70 75 80 He Leu Pro Ser His Arg Thr Wing Asp Leu Gly Thr Ser Gly Gly Thr 85 90 95 Leu Val Ala Arg Met Ser Gly His Val Ala Arg Asn Pro His Ala Ala 100 105 110 Val Leu Val Gly Asp Gly Ser Ala Arg Gly Arg Arg Arg Leu Ser Asn 115 120 125 Arg Arg Ala Glu Arg Arg Val Ser Asp Val Thr Cys Arg Glu Gly Gly 130 135 140 Glu Wing Met Gln Lys He Wing Gly Lys Leu Val Val Gly Leu He Ser 145 150 155 160 Val Ser Gly Met Ser Leu Leu Ala Ala Cys Gly Gly Glu Lys Arg Ser 165 170 175 Gly Gly Glu Wing Gln Thr Pro Gly Gly Wing Gln Gly Glu Wing Pro Val 180 185 190 Pro Val Gly Ser Wing Val Asp Ser He Val Wing Wing Arg 
Cys Asp Arg 195 200 205 Glu Wing Arg Cys Asn Asn He Gly Gln Asp Arg Glu Tyr Ser Ser Lys 210 215 220 Asp Wing Cys Ser Asn Lys He Arg Ser Glu Trp Arg Asp Glu Leu Thr 225 230 235 240 Phe Gly Glu Cys Pro Gly Gly He Asp Ala Lys Gln Leu Asn Glu Cys 245 250 255 Leu Glu Gly He Arg Asn Glu Gly Cys Gly Asn Pro Phe Asp Thr Leu 260 265 270 Gly Arg Val Val Wing Cys Arg Ser Being Asp Leu Cys Arg Asp Wing Arg 275 280 285 < 210 > 20 < 211 > 155 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 20 Met Asp Pro Arg Ala Arg Arg Glu Lys Arg Pro Ser Leu Leu Asp Ser 1 5 10 15 Arg Gly Arg Gln Pro Lys Arg Ser Gln Gln Gly Gly His Met Glu Lys 20 25 30 Pro He Gly Arg Thr Arg Trp Wing He Wing Glu Gly Tyr He Pro Gly 35 40 45 Arg Ser Asn Gly Pro Glu Pro Gln Met Thr Ser His Glu Thr Ala Cys 50 55 60 Leu Leu Asn Wing Being Asp Arg Asp Wing Gln Val Wing He Thr Val Tyr 65 70 75 80 Phe Ser Asp Arg Asp Pro Wing Gly Pro Tyr Arg Val Thr Val Pro Wing 85 90 95 Arg Arg Thr Arg His Val Arg Phe Asn Asp Leu Thr Glu Pro Glu Pro 100 105 110 He Pro Arg Asp Thr Asp Tyr Ala Ser Val He Glu Ser Asp Val Pro 115 120 125 He Val Val Gln His Thr Arg Leu Asp Ser Arg Gln Wing Glu Asn Wing 130 135 140 Leu He Ser Thr He Wing Tyr Thr Asp Arg Glu 145 150 155 < 210 > 21 < 211 > 156 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 21 Val Arg Arg Ser Arg Trp Gln Met Lys His Val Asp Thr Gly Arg Arg 1 5 10 15 Val Gly Arg Arg He Gly Leu Thr Leu Gly Leu Leu Wing Ser Met Wing 20 25 30 Leu Wing Gly Cys Gly Gly Pro Ser Glu Lys He Val Gln Gly Thr Arg 35 40 45 Leu Wing Pro Gly Wing Asp Wing Hxs Val Wing Wing Asp Val Asp Pro Asp 50 55 60 Wing Wing Thr Thr Arg Leu Wing Val Asp Val Val His Leu Ser Pro Pro 65 70 75 80 Glu Arg He Glu Wing Gly Ser Glu Arg Phe Val Val Trp Gln Arg Pro 85 90 95 Being Ser Glu Being Pro Trp Gln Arg Val Gly Val Leu Asp Tyr Asn Wing 100 105 110 Wing Being Arg Arg Gly Lys Leu Wing Glu Thr Thr Val Pro His Wing Asn 115 120 125 Phe Glu Leu Leu He Thr Val Glu Lys Gln Ser Ser Pro Gln Ser Pro 130 135 140 Ser Ser Wing Wing Val 
He Gly Pro Thr Ser Val Gly 145 150 155 < 210 > 22 < 211 > 305 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 22 Met Glu Lys Glu Ser Arg He Wing He Tyr Gly Wing He Wing Wing Asn 1 5 10 15 Val Ala Wing Wing Wing Val Lys Phe Wing Wing Wing Val Thr Gly Ser 20 25 30 Ser Wing Met Leu Ser Glu Gly Val His Ser Leu Val Asp Thr Wing Asp 35 40 45 Gly Leu Leu Leu Leu Leu Gly Lys His Arg Ser Wing Arg Pro Pro Asp 50 55 60 Wing Glu His Pro Phe Gly His Gly Lys Glu Leu Tyr Phe Trp Thr Leu 65 70 75 80 He Val Ala He Met He Phe Wing Wing Gly Gly Gly Val Ser He Tyr 85 90 95 Glu Gly He Leu His Leu Leu His Pro Arg Gln He Glu Asp Pro Thr 100 105 110 Trp Asn Tyr Val Val Leu Gly Ala Wing Wing Val Phe Glu Gly Thr Ser 115 120 125 Leu He He Ser He His Glu Phe Lys Lys Lys Asp Gly Gln Gly Tyr 130 135 140 Leu Ala Ala Met Arg Ser Ser Lys Asp Pro Thr Thr Phe Thr He Val 145 150 155 160 Leu Glu Asp Ser Ala Ala Leu Ala Gly Leu Thr He Ala Phe Leu Gly 165 170 175 Val Trp Leu Gly His Arg Leu Gly Asn Pro Tyr Leu Asp Gly Ala Wing 180 185 190 Ser He Gly He Gly Leu Val Leu Ala Wing Val Wing Val Phe Leu Wing 195 200 205 Ser Gln Ser Arg Gly Leu Leu Val Gly Glu Ser Wing Asp Arg Glu Leu 210 215 220 Leu Wing Wing Wing Arg Wing Leu Wing Wing Wing Asp Pro Gly Val Wing Wing 225 230 235 240 Val Gly Arg Pro Leu Thr Met His Phe Gly Pro His Glu Val Leu Val 245 250 255 Val Leu Arg He Glu Phe Asp Wing Wing Leu Thr Wing Being Gly Val Wing 260 265 270 Glu Wing He Glu Arg He Glu Thr Arg He Arg Ser Glu Arg Pro Asp 275 280 285 Val Lys His He Tyr Val Glu Ala Arg Ser Leu His Gln Arg Ala Arg 290 295 300 Wing 305 < 210 > 23 < 211 > 135 < 212 > PRT < 213 > Sorangium cellulosum < 400 > 23 Val Gln Thr Ser Ser Phe Asp Wing Arg Tyr Wing Gly Cys Lys Ser Ser 1 5 10 15 Arg Arg He Wing Arg Wing Gly Wing Wing Wing Arg Wing Wing Arg Wing 20 25 30 His Glu Gly Wing Wing Wing Wing Gly Phe Glu Gly Gly Asp Val Met Arg 35 40 45 Lys Wing Arg Wing His Gly Wing Met Leu Gly Gly Arg Asp Aep Gly Trp 50 55 60 Arg Arg Gly Leu Pro Gly Wing Gly Wing Leu 
Arg Ala Ala Leu Gln Arg 65 70 75 80 Gly Arg Ser Arg Asp Leu Ala Arg Arg Arg Leu Ile Ala Ser Val Ser 85 90 95 Leu Ala Gly Gly Ala Ser Met Ala Val Val Ser Leu Phe Gln Leu Gly 100 105 110 Ile Ile Glu Arg Leu Pro Asp Pro Pro Leu Pro Gly Phe Asp Ser Ala 115 120 125 Lys Val Thr Ser Ser Asp Ile 130 135 < 210 > 24 < 211 > 19 < 212 > DNA < 213 > Artificial Sequence < 220 > < 223 > Description of the Artificial Sequence: universal reverse primer < 400 > 24 ggaaacagct atgaccatg 19 < 210 > 25 < 211 > 17 < 212 > DNA < 213 > Artificial Sequence < 220 > < 223 > Description of the Artificial Sequence: universal forward primer < 400 > 25 gtaaaacgac ggccagt 17 < 210 > 26 < 211 > 28 < 212 > DNA < 213 > Artificial Sequence < 220 > < 223 > Description of the Artificial Sequence: PCR primer NH24 end "B" < 400 > 26 gtgactggcg cctggaatct gcatgagc 28 < 210 > 27 < 211 > 28 < 212 > DNA < 213 > Artificial Sequence < 220 > < 223 > Description of the Artificial Sequence: PCR primer NH2 end "A" < 400 > 27 agcgggagct tgctagacat tctgtttc 28 < 210 > 28 < 211 > 24 < 212 > DNA < 213 > Artificial Sequence < 220 > < 223 > Description of the Artificial Sequence: PCR primer NH2 end "B" < 400 > 28 gacgcgcctc gggcagcgcc ccaa 24 < 210 > 29 < 211 > 25 < 212 > DNA < 213 > Artificial Sequence < 220 > < 223 > Description of the Artificial Sequence: PCR primer pEP015-NH6 end "B" < 400 > 29 caccgaagcg tcgatctggt ccatc 25 < 210 > 30 < 211 > 25 < 212 > DNA < 213 > Artificial Sequence < 220 > < 223 > Description of the Artificial Sequence: PCR primer pEP015H2.7 end "A" < 400 > 30 cggtcagatc gacgacgggc tttcc 25

Claims (93)

1. An isolated nucleic acid molecule comprising a nucleotide sequence encoding at least one polypeptide involved in the biosynthesis of epothilone.
2. An isolated nucleic acid molecule according to claim 1, wherein said nucleotide sequence is isolated from a myxobacterium.
3. An isolated nucleic acid molecule according to claim 2, wherein said myxobacterium is Sorangium cellulosum.
4. A chimeric gene comprising a heterologous promoter sequence operably linked to a nucleic acid molecule according to claim 1.
5. A recombinant vector comprising a chimeric gene according to claim 4.
6. A recombinant host cell comprising a chimeric gene according to claim 4.
7. The recombinant host cell of claim 6, which is a bacterium.
8. The recombinant host cell of claim 7, which is an Actinomycete.
9. The recombinant host cell of claim 8, which is Streptomyces.
10. A Bac clone comprising a nucleic acid molecule according to claim 1.
11. The Bac clone of claim 10, which is pEP015.
12. An isolated nucleic acid molecule according to claim 1, wherein said polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: SEQ ID NO: 2, amino acids 11-437 of SEQ ID NO: 2, amino acids 543-864 of SEQ ID NO: 2, amino acids 974-1273 of SEQ ID NO: 2, amino acids 1314-1385 of SEQ ID NO: 2, SEQ ID NO: 3, amino acids 72-81 of SEQ ID NO: 3, amino acids 118-125 of SEQ ID NO: 3, amino acids 199-212 of SEQ ID NO: 3, amino acids 353-363 of SEQ ID NO: 3, amino acids 549-565 of SEQ ID NO: 3, amino acids 588-603 of SEQ ID NO: 3, amino acids 669-684 of SEQ ID NO: 3, amino acids 815-821 of SEQ ID NO: 3, amino acids 868-892 of SEQ ID NO: 3, amino acids 903-912 of SEQ ID NO: 3, amino acids 918-940 of SEQ ID NO: 3, amino acids 1268-1274 of SEQ ID NO: 3, amino acids 1285-1297 of SEQ ID NO: 3, amino acids 973-1256 of SEQ ID NO: 3, amino acids 1344-1351 of SEQ ID NO: 3, SEQ ID NO: 4, amino acids 7-432 of SEQ ID NO: 4, amino acids 539-859 of SEQ ID NO: 4, amino acids 869-1037 of SEQ ID NO: 4, amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1722-1792 of SEQ ID NO: 4, SEQ ID NO: 5, amino acids 39-457 of SEQ ID NO: 5, amino acids 563-884 of SEQ ID NO: 5, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, SEQ ID NO: 6, amino acids 35-454 of SEQ ID NO: 6, amino acids 561-881 of SEQ ID NO: 6, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, amino acids 2383-2551 of SEQ ID NO: 6, amino acids 2671-3045 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, SEQ ID NO: 7, amino acids 32-450 of SEQ ID NO: 7, amino acids 556-877 of SEQ ID NO: 7, amino acids 887-1051 of SEQ ID NO: 7, amino acids 1478-1790 of SEQ ID NO: 7, amino acids 1810-2055 of SEQ ID NO: 7, amino acids 2093-2164 of SEQ ID NO: 7, amino acids 2165-2439 of SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, and SEQ ID NO: 22.
13. An isolated nucleic acid molecule according to claim 12, wherein said polypeptide comprises an amino acid sequence selected from the group consisting of: SEQ ID NO: 2, amino acids 11-437 of SEQ ID NO: 2, amino acids 543-864 of SEQ ID NO: 2, amino acids 974-1273 of SEQ ID NO: 2, amino acids 1314-1385 of SEQ ID NO: 2, SEQ ID NO: 3, amino acids 72-81 of SEQ ID NO: 3, amino acids 118-125 of SEQ ID NO: 3, amino acids 199-212 of SEQ ID NO: 3, amino acids 353-363 of SEQ ID NO: 3, amino acids 549-565 of SEQ ID NO: 3, amino acids 588-603 of SEQ ID NO: 3, amino acids 669-684 of SEQ ID NO: 3, amino acids 815-821 of SEQ ID NO: 3, amino acids 868-892 of SEQ ID NO: 3, amino acids 903-912 of SEQ ID NO: 3, amino acids 918-940 of SEQ ID NO: 3, amino acids 1268-1274 of SEQ ID NO: 3, amino acids 1285-1297 of SEQ ID NO: 3, amino acids 973-1256 of SEQ ID NO: 3, amino acids 1344-1351 of SEQ ID NO: 3, SEQ ID NO: 4, amino acids 7-432 of SEQ ID NO: 4, amino acids 539-859 of SEQ ID NO: 4, amino acids 869-1037 of SEQ ID NO: 4, amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1722-1792 of SEQ ID NO: 4, SEQ ID NO: 5, amino acids 39-457 of SEQ ID NO: 5, amino acids 563-884 of SEQ ID NO: 5, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, SEQ ID NO: 6, amino acids 35-454 of SEQ ID NO: 6, amino acids 561-881 of SEQ ID NO: 6, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, amino acids 2383-2551 of SEQ ID NO: 6, amino acids 2671-3045 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, SEQ ID NO: 7, amino acids 32-450 of SEQ ID NO: 7, amino acids 556-877 of SEQ ID NO: 7, amino acids 887-1051 of SEQ ID NO: 7, amino acids 1478-1790 of SEQ ID NO: 7, amino acids 1810-2055 of SEQ ID NO: 7, amino acids 2093-2164 of SEQ ID NO: 7, amino acids 2165-2439 of SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, and SEQ ID NO: 22.
14. An isolated nucleic acid molecule according to claim 12, wherein said nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO: 1, nucleotides 3415-5556 of SEQ ID NO: 1, nucleotides 7610-11875 of SEQ ID NO: 1, nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, nucleotides 15901-15924 of SEQ ID NO: 1, nucleotides 16251-21749 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 21746-43519 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 43524-54920 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, nucleotides 51534-52657 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, nucleotides 54935-62254 of SEQ ID NO: 1, nucleotides 55028-56284 of SEQ ID NO: 1, nucleotides 56600-57565 of SEQ ID NO: 1, nucleotides 57593-58087 of SEQ ID NO: 1, nucleotides 59366-60304 of SEQ ID NO: 1, nucleotides 60362-61099 of SEQ ID NO: 1, nucleotides 61211-61426 of SEQ ID NO: 1, nucleotides 61427-62254 of SEQ ID NO: 1, nucleotides 62369-63628 of SEQ ID NO: 1, nucleotides 67334-68251 of SEQ ID NO: 1, and nucleotides 1-68750 of SEQ ID NO: 1.
15. A nucleic acid molecule according to claim 12, wherein said nucleotide sequence is selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO: 1, nucleotides 3415-5556 of SEQ ID NO: 1, nucleotides 7610-11875 of SEQ ID NO: 1, nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, nucleotides 15901-15924 of SEQ ID NO: 1, nucleotides 16251-21749 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 21746-43519 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 43524-54920 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, nucleotides 51534-52657 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, nucleotides 54935-62254 of SEQ ID NO: 1, nucleotides 55028-56284 of SEQ ID NO: 1, nucleotides 56600-57565 of SEQ ID NO: 1, nucleotides 57593-58087 of SEQ ID NO: 1, nucleotides 59366-60304 of SEQ ID NO: 1, nucleotides 60362-61099 of SEQ ID NO: 1, nucleotides 61211-61426 of SEQ ID NO: 1, nucleotides 61427-62254 of SEQ ID NO: 1, nucleotides 62369-63628 of SEQ ID NO: 1, nucleotides 67334-68251 of SEQ ID NO: 1, and nucleotides 1-68750 of SEQ ID NO: 1.
16. A chimeric gene comprising a heterologous promoter sequence operably linked to a nucleic acid molecule according to claim 12.
17. A recombinant vector comprising a chimeric gene according to claim 16.
18. A recombinant host cell comprising a chimeric gene according to claim 16.
19. The recombinant host cell of claim 18, which is a bacterium.
20. The recombinant host cell of claim 19, which is an Actinomycete.
21. The recombinant host cell of claim 20, which is Streptomyces.
22. An isolated nucleic acid molecule according to claim 1, wherein said nucleotide sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to a consecutive 20 base pair portion of a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO: 1, nucleotides 3415-5556 of SEQ ID NO: 1, nucleotides 7610-11875 of SEQ ID NO: 1, nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, nucleotides 15901-15924 of SEQ ID NO: 1, nucleotides 16251-21749 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 21746-43519 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 43524-54920 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, nucleotides 51534-52657 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, nucleotides 54935-62254 of SEQ ID NO: 1, nucleotides 55028-56284 of SEQ ID NO: 1, nucleotides 56600-57565 of SEQ ID NO: 1, nucleotides 57593-58087 of SEQ ID NO: 1, nucleotides 59366-60304 of SEQ ID NO: 1, nucleotides 60362-61099 of SEQ ID NO: 1, nucleotides 61211-61426 of SEQ ID NO: 1, nucleotides 61427-62254 of SEQ ID NO: 1, nucleotides 62369-63628 of SEQ ID NO: 1, nucleotides 67334-68251 of SEQ ID NO: 1, and nucleotides 1-68750 of SEQ ID NO: 1.
23. A chimeric gene comprising a heterologous promoter sequence operably linked to a nucleic acid molecule according to claim 22.
24. A recombinant vector comprising a chimeric gene according to claim 23.
25. A recombinant host cell comprising a chimeric gene according to claim 23.
26. The recombinant host cell of claim 25, which is a bacterium.
27. The recombinant host cell of claim 26, which is an Actinomycete.
28. The recombinant host cell of claim 27, which is Streptomyces.
29. An isolated nucleic acid molecule comprising a nucleotide sequence encoding at least one epothilone synthase domain.
30. An isolated nucleic acid molecule according to claim 29, wherein said epothilone synthase domain is a β-ketoacyl synthase domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO: 2, amino acids 7-432 of SEQ ID NO: 4, amino acids 39-457 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 35-454 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, and amino acids 32-450 of SEQ ID NO: 7.
31. An isolated nucleic acid molecule according to claim 30, wherein the β-ketoacyl synthase domain comprises an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO: 2, amino acids 7-432 of SEQ ID NO: 4, amino acids 39-457 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 35-454 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, and amino acids 32-450 of SEQ ID NO: 7.
32. An isolated nucleic acid molecule according to claim 30, wherein said nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, and nucleotides 55028-56284 of SEQ ID NO: 1.
33. An isolated nucleic acid molecule according to claim 30, wherein said nucleotide sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to a consecutive 20 base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, and nucleotides 55028-56284 of SEQ ID NO: 1.
34. An isolated nucleic acid molecule according to claim 30, wherein said nucleotide sequence is selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO: 1, nucleotides 16269-17546 of SEQ ID NO: 1, nucleotides 21860-23116 of SEQ ID NO: 1, nucleotides 26318-27595 of SEQ ID NO: 1, nucleotides 30815-32092 of SEQ ID NO: 1, nucleotides 37052-38320 of SEQ ID NO: 1, nucleotides 43626-44885 of SEQ ID NO: 1, nucleotides 48087-49361 of SEQ ID NO: 1, and nucleotides 55028-56284 of SEQ ID NO: 1.
35. An isolated nucleic acid molecule according to claim 29, wherein said epothilone synthase domain is an acyltransferase domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO: 2, amino acids 539-859 of SEQ ID NO: 4, amino acids 563-884 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 561-881 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, and amino acids 556-877 of SEQ ID NO: 7.
36. An isolated nucleic acid molecule according to claim 35, wherein the acyltransferase domain comprises an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO: 2, amino acids 539-859 of SEQ ID NO: 4, amino acids 563-884 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 561-881 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, and amino acids 556-877 of SEQ ID NO: 7.
37. An isolated nucleic acid molecule according to claim 35, wherein the nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, and nucleotides 56600-57565 of SEQ ID NO: 1.
38. An isolated nucleic acid molecule according to claim 35, wherein said nucleotide sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to a consecutive 20 base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, and nucleotides 56600-57565 of SEQ ID NO: 1.
39. An isolated nucleic acid molecule according to claim 35, wherein said nucleotide sequence is selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO: 1, nucleotides 17865-18827 of SEQ ID NO: 1, nucleotides 23431-24397 of SEQ ID NO: 1, nucleotides 27911-28876 of SEQ ID NO: 1, nucleotides 32408-33373 of SEQ ID NO: 1, nucleotides 38636-39598 of SEQ ID NO: 1, nucleotides 45204-46166 of SEQ ID NO: 1, nucleotides 49680-50642 of SEQ ID NO: 1, and nucleotides 56600-57565 of SEQ ID NO: 1.
40. An isolated nucleic acid molecule according to claim 29, wherein the epothilone synthase domain is an enoyl-reductase domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO: 2, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, and amino acids 1478-1790 of SEQ ID NO: 7.
41. An isolated nucleic acid molecule according to claim 40, wherein the enoyl-reductase domain comprises an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO: 2, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, and amino acids 1478-1790 of SEQ ID NO: 7.
42. An isolated nucleic acid molecule according to claim 40, wherein the nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, and nucleotides 59366-60304 of SEQ ID NO: 1.
43. An isolated nucleic acid molecule according to claim 40, wherein the nucleotide sequence comprises a portion of nucleotides of 20 consecutive base pairs of a sequence identical to a portion of 20 consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, and nucleotides 59366-60304 of SEQ ID NO: 1.
44. An isolated nucleic acid molecule according to claim 40, wherein this nucleotide sequence is selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO: 1, nucleotides 35042-35902 of SEQ ID NO: 1, nucleotides 41369-42256 of SEQ ID NO: 1, and nucleotides 59366-60304 of SEQ ID NO: 1.
45. An isolated nucleic acid molecule according to claim 29, wherein the epothilone synthase domain is an acyl carrier protein domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO: 2, amino acids 1722-1792 of SEQ ID NO: 4, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, and amino acids 2093-2164 of SEQ ID NO: 7.
46. An isolated nucleic acid molecule according to claim 45, wherein the acyl carrier protein domain comprises an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO: 2, amino acids 1722-1792 of SEQ ID NO: 4, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, and amino acids 2093-2164 of SEQ ID NO: 7.
47. An isolated nucleic acid molecule according to claim 45, wherein this nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, and nucleotides 61211-61426 of SEQ ID NO: 1.
48. An isolated nucleic acid molecule according to claim 45, wherein said nucleotide sequence comprises a nucleotide portion of 20 consecutive base pairs of a sequence identical to a portion of 20 consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, and nucleotides 61211-61426 of SEQ ID NO: 1.
49. An isolated nucleic acid molecule according to claim 45, wherein said nucleotide sequence is selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO: 1, nucleotides 21414-21626 of SEQ ID NO: 1, nucleotides 26045-26263 of SEQ ID NO: 1, nucleotides 30539-30759 of SEQ ID NO: 1, nucleotides 36773-36991 of SEQ ID NO: 1, nucleotides 43163-43378 of SEQ ID NO: 1, nucleotides 47811-48032 of SEQ ID NO: 1, nucleotides 54540-54758 of SEQ ID NO: 1, and nucleotides 61211-61426 of SEQ ID NO: 1.
50. An isolated nucleic acid molecule according to claim 29, wherein the epothilone synthase domain is a dehydratase domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO: 4, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 2383-2551 of SEQ ID NO: 6, and amino acids 887-1051 of SEQ ID NO: 7.
51. An isolated nucleic acid molecule according to claim 50, wherein the dehydratase domain comprises an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO: 4, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 2383-2551 of SEQ ID NO: 6, and amino acids 887-1051 of SEQ ID NO: 7.
52. An isolated nucleic acid molecule according to claim 50, wherein the nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, and nucleotides 57593-58087 of SEQ ID NO: 1.
53. An isolated nucleic acid molecule according to claim 50, wherein the nucleotide sequence comprises a nucleotide portion of 20 consecutive base pairs of a sequence identical to a portion of 20 consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, and nucleotides 57593-58087 of SEQ ID NO: 1.
54. An isolated nucleic acid molecule according to claim 50, wherein the nucleotide sequence is selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO: 1, nucleotides 33401-33889 of SEQ ID NO: 1, nucleotides 39635-40141 of SEQ ID NO: 1, nucleotides 50670-51176 of SEQ ID NO: 1, and nucleotides 57593-58087 of SEQ ID NO: 1.
55. An isolated nucleic acid molecule according to claim 29, wherein the epothilone synthase domain is a β-ketoreductase domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, and amino acids 1810-2055 of SEQ ID NO: 7.
56. An isolated nucleic acid molecule according to claim 55, wherein the β-ketoreductase domain comprises an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, and amino acids 1810-2055 of SEQ ID NO: 7.
57. An isolated nucleic acid molecule according to claim 55, wherein the nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, and nucleotides 60362-61099 of SEQ ID NO: 1.
58. An isolated nucleic acid molecule according to claim 55, wherein the nucleotide sequence comprises a nucleotide portion of 20 consecutive base pairs of a sequence identical to a portion of 20 consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, and nucleotides 60362-61099 of SEQ ID NO: 1.
59. An isolated nucleic acid molecule according to claim 55, wherein the nucleotide sequence is selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO: 1, nucleotides 25184-25942 of SEQ ID NO: 1, nucleotides 29678-30429 of SEQ ID NO: 1, nucleotides 35930-36667 of SEQ ID NO: 1, nucleotides 42314-43048 of SEQ ID NO: 1, nucleotides 46950-47702 of SEQ ID NO: 1, nucleotides 53697-54431 of SEQ ID NO: 1, and nucleotides 60362-61099 of SEQ ID NO: 1.
60. An isolated nucleic acid molecule according to claim 29, wherein the epothilone synthase domain is a methyltransferase domain comprising an amino acid sequence substantially similar to amino acids 2671-3045 of SEQ ID NO: 6.
61. An isolated nucleic acid molecule according to claim 60, wherein the methyltransferase domain comprises amino acids 2671-3045 of SEQ ID NO: 6.
62. An isolated nucleic acid molecule according to claim 60, wherein the nucleotide sequence is substantially similar to nucleotides 51534-52657 of SEQ ID NO: 1.
63. An isolated nucleic acid molecule according to claim 60, wherein said nucleotide sequence comprises a portion of nucleotides of 20 consecutive base pairs of a sequence identical to a portion of 20 consecutive base pairs of nucleotides 51534-52657 of SEQ ID NO: 1.
64. An isolated nucleic acid molecule according to claim 60, wherein said nucleotide sequence is nucleotides 51534-52657 of SEQ ID NO: 1.
65. An isolated nucleic acid molecule according to claim 29, wherein the epothilone synthase domain is a thioesterase domain comprising an amino acid sequence substantially similar to amino acids 2165-2439 of SEQ ID NO: 7.
66. An isolated nucleic acid molecule according to claim 65, wherein the thioesterase domain comprises amino acids 2165-2439 of SEQ ID NO: 7.
67. An isolated nucleic acid molecule according to claim 65, wherein the nucleotide sequence is substantially similar to nucleotides 61427-62254 of SEQ ID NO: 1.
68. An isolated nucleic acid molecule according to claim 65, wherein said nucleotide sequence comprises a nucleotide portion of 20 consecutive base pairs of a sequence identical to a portion of 20 consecutive base pairs of nucleotides 61427-62254 of SEQ ID NO: 1.
69. An isolated nucleic acid molecule according to claim 65, wherein this nucleotide sequence is nucleotides 61427-62254 of SEQ ID NO: 1.
70. An isolated nucleic acid molecule comprising a nucleotide sequence encoding a non-ribosomal peptide synthetase, wherein said non-ribosomal peptide synthetase comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: SEQ ID NO: 3, amino acids 72-81 of SEQ ID NO: 3, amino acids 118-125 of SEQ ID NO: 3, amino acids 199-212 of SEQ ID NO: 3, amino acids 353-363 of SEQ ID NO: 3, amino acids 549-565 of SEQ ID NO: 3, amino acids 588-603 of SEQ ID NO: 3, amino acids 669-684 of SEQ ID NO: 3, amino acids 815-821 of SEQ ID NO: 3, amino acids 868-892 of SEQ ID NO: 3, amino acids 903-912 of SEQ ID NO: 3, amino acids 918-940 of SEQ ID NO: 3, amino acids 1268-1274 of SEQ ID NO: 3, amino acids 1285-1297 of SEQ ID NO: 3, amino acids 973-1256 of SEQ ID NO: 3, and amino acids 1344-1351 of SEQ ID NO: 3.
71. An isolated nucleic acid molecule according to claim 70, wherein said non-ribosomal peptide synthetase comprises an amino acid sequence selected from the group consisting of: SEQ ID NO: 3, amino acids 72-81 of SEQ ID NO: 3, amino acids 118-125 of SEQ ID NO: 3, amino acids 199-212 of SEQ ID NO: 3, amino acids 353-363 of SEQ ID NO: 3, amino acids 549-565 of SEQ ID NO: 3, amino acids 588-603 of SEQ ID NO: 3, amino acids 669-684 of SEQ ID NO: 3, amino acids 815-821 of SEQ ID NO: 3, amino acids 868-892 of SEQ ID NO: 3, amino acids 903-912 of SEQ ID NO: 3, amino acids 918-940 of SEQ ID NO: 3, amino acids 1268-1274 of SEQ ID NO: 3, amino acids 1285-1297 of SEQ ID NO: 3, amino acids 973-1256 of SEQ ID NO: 3, and amino acids 1344-1351 of SEQ ID NO: 3.
72. An isolated nucleic acid molecule according to claim 70, wherein this nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, and nucleotides 15901-15924 of SEQ ID NO: 1.
73. An isolated nucleic acid molecule according to claim 70, wherein this nucleotide sequence comprises a portion of nucleotides of 20 consecutive base pairs of a sequence identical to a portion of 20 consecutive base pairs of a nucleotide sequence selected from the group consisting of: nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, and nucleotides 15901-15924 of SEQ ID NO: 1.
74. An isolated nucleic acid molecule according to claim 70, wherein this nucleotide sequence is selected from the group consisting of: nucleotides 11872-16104 of SEQ ID NO: 1, nucleotides 12085-12114 of SEQ ID NO: 1, nucleotides 12223-12246 of SEQ ID NO: 1, nucleotides 12466-12507 of SEQ ID NO: 1, nucleotides 12928-12960 of SEQ ID NO: 1, nucleotides 13516-13566 of SEQ ID NO: 1, nucleotides 13633-13680 of SEQ ID NO: 1, nucleotides 13876-13923 of SEQ ID NO: 1, nucleotides 14313-14334 of SEQ ID NO: 1, nucleotides 14473-14547 of SEQ ID NO: 1, nucleotides 14578-14607 of SEQ ID NO: 1, nucleotides 14623-14692 of SEQ ID NO: 1, nucleotides 15673-15693 of SEQ ID NO: 1, nucleotides 15724-15762 of SEQ ID NO: 1, nucleotides 14788-15639 of SEQ ID NO: 1, and nucleotides 15901-15924 of SEQ ID NO: 1.
75. A method for the heterologous expression of epothilone in a recombinant host, which comprises: (a) introducing a chimeric gene according to claim 4 into a host; and (b) growing the host under conditions that allow the biosynthesis of epothilone in the host.
76. A method for producing epothilone, which comprises: (a) expressing epothilone in a recombinant host by the method of claim 75; and (b) extracting epothilone from the recombinant host.
77. An isolated polypeptide comprising an amino acid sequence consisting of an epothilone synthase domain.
78. An isolated polypeptide according to claim 77, wherein the epothilone synthase domain is a β-ketoacyl synthase domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO: 2, amino acids 7-432 of SEQ ID NO: 4, amino acids 39-457 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 35-454 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, and amino acids 32-450 of SEQ ID NO: 7.
79. An isolated polypeptide according to claim 78, wherein the β-ketoacyl synthase domain comprises an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO: 2, amino acids 7-432 of SEQ ID NO: 4, amino acids 39-457 of SEQ ID NO: 5, amino acids 1524-1950 of SEQ ID NO: 5, amino acids 3024-3449 of SEQ ID NO: 5, amino acids 5103-5525 of SEQ ID NO: 5, amino acids 35-454 of SEQ ID NO: 6, amino acids 1522-1946 of SEQ ID NO: 6, and amino acids 32-450 of SEQ ID NO: 7.
80. An isolated polypeptide according to claim 77, wherein the epothilone synthase domain is an acyltransferase domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO: 2, amino acids 539-859 of SEQ ID NO: 4, amino acids 563-884 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 561-881 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, and amino acids 556-877 of SEQ ID NO: 7.
81. An isolated polypeptide according to claim 80, wherein the acyltransferase domain comprises an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO: 2, amino acids 539-859 of SEQ ID NO: 4, amino acids 563-884 of SEQ ID NO: 5, amino acids 2056-2377 of SEQ ID NO: 5, amino acids 3555-3876 of SEQ ID NO: 5, amino acids 5631-5951 of SEQ ID NO: 5, amino acids 561-881 of SEQ ID NO: 6, amino acids 2053-2373 of SEQ ID NO: 6, and amino acids 556-877 of SEQ ID NO: 7.
82. An isolated polypeptide according to claim 77, wherein the epothilone synthase domain is an enoyl-reductase domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO: 2, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, and amino acids 1478-1790 of SEQ ID NO: 7.
83. An isolated polypeptide according to claim 82, wherein the enoyl reductase domain comprises an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO: 2, amino acids 4433-4719 of SEQ ID NO: 5, amino acids 6542-6837 of SEQ ID NO: 5, and amino acids 1478-1790 of SEQ ID NO: 7.
84. An isolated polypeptide according to claim 77, wherein the epothilone synthase domain is an acyl carrier protein domain, wherein said polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO: 2, amino acids 1722-1792 of SEQ ID NO: 4, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, and amino acids 2093-2164 of SEQ ID NO: 7.
85. An isolated polypeptide according to claim 84, wherein the acyl carrier protein domain comprises an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO: 2, amino acids 1722-1792 of SEQ ID NO: 4, amino acids 1434-1506 of SEQ ID NO: 5, amino acids 2932-3005 of SEQ ID NO: 5, amino acids 5010-5082 of SEQ ID NO: 5, amino acids 7140-7211 of SEQ ID NO: 5, amino acids 1430-1503 of SEQ ID NO: 6, amino acids 3673-3745 of SEQ ID NO: 6, and amino acids 2093-2164 of SEQ ID NO: 7.
86. An isolated polypeptide according to claim 77, wherein the epothilone synthase domain is a dehydratase domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO: 4, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 2383-2551 of SEQ ID NO: 6, and amino acids 887-1051 of SEQ ID NO: 7.
87. An isolated polypeptide according to claim 86, wherein the dehydratase domain comprises an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO: 4, amino acids 3886-4048 of SEQ ID NO: 5, amino acids 5964-6132 of SEQ ID NO: 5, amino acids 2383-2551 of SEQ ID NO: 6, and amino acids 887-1051 of SEQ ID NO: 7.
88. An isolated polypeptide according to claim 77, wherein the epothilone synthase domain is a β-ketoreductase domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, and amino acids 1810-2055 of SEQ ID NO: 7.
89. An isolated polypeptide according to claim 88, wherein the β-ketoreductase domain comprises an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO: 4, amino acids 1147-1399 of SEQ ID NO: 5, amino acids 2645-2895 of SEQ ID NO: 5, amino acids 4729-4974 of SEQ ID NO: 5, amino acids 6857-7101 of SEQ ID NO: 5, amino acids 1143-1393 of SEQ ID NO: 6, amino acids 3392-3636 of SEQ ID NO: 6, and amino acids 1810-2055 of SEQ ID NO: 7.
90. An isolated polypeptide according to claim 77, wherein the epothilone synthase domain is a methyltransferase domain comprising an amino acid sequence substantially similar to amino acids 2671-3045 of SEQ ID NO: 6.
91. An isolated polypeptide according to claim 90, wherein the methyltransferase domain comprises amino acids 2671-3045 of SEQ ID NO: 6.
92. An isolated polypeptide according to claim 77, wherein the epothilone synthase domain is a thioesterase domain comprising an amino acid sequence substantially similar to amino acids 2165-2439 of SEQ ID NO: 7.
93. An isolated polypeptide according to claim 77, wherein the thioesterase domain comprises the amino acids 2165-2439 of SEQ ID NO: 7.
ABSTRACT
Nucleic acid molecules are isolated from Sorangium cellulosum which encode the polypeptides necessary for the biosynthesis of epothilone. Methods for the production of epothilone in recombinant hosts transformed with the genes of the invention are disclosed. In this way epothilone can be produced in sufficiently large amounts to make possible its purification and use in pharmaceutical formulations, such as those for the treatment of cancer.
MXPA/A/2000/012342A 1998-06-18 2000-12-13 Genes for the biosynthesis of epothilones MXPA00012342A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/099,504 1998-06-18
US60/101,631 1998-09-24
US60/118,906 1999-02-05

Publications (1)

Publication Number Publication Date
MXPA00012342A (en) 2001-11-21
