AU753567B2 - Genes for the biosynthesis of epothilones - Google Patents

Genes for the biosynthesis of epothilones

Publication number
AU753567B2
AU753567B2 AU46116/99A AU4611699A AU753567B2 AU 753567 B2 AU753567 B2 AU 753567B2 AU 46116/99 A AU46116/99 A AU 46116/99A AU 4611699 A AU4611699 A AU 4611699A AU 753567 B2 AU753567 B2 AU 753567B2
Authority
AU
Australia
Prior art keywords
seq
nucleotides
amino acids
ala
leu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU46116/99A
Other versions
AU4611699A (en)
Inventor
Devon Cyr
Jorn Gorlach
James Madison Ligon
Istvan Molnar
Thomas Schupp
Ross Zirkle
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Novartis AG
Original Assignee
Novartis AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Novartis AG
Publication of AU4611699A
Application granted
Publication of AU753567B2
Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/52 Genes encoding for enzymes or proenzymes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P17/00 Preparation of heterocyclic carbon compounds with only O, N, S, Se or Te as ring hetero atoms
    • C12P17/18 Preparation of heterocyclic carbon compounds with only O, N, S, Se or Te as ring hetero atoms containing at least two hetero rings condensed among themselves or condensed with a common carbocyclic ring system, e.g. rifamycin
    • C12P17/181 Heterocyclic compounds containing oxygen atoms as the only ring heteroatoms in the condensed system, e.g. Salinomycin, Septamycin
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00 Antineoplastic agents
    • A61P35/04 Antineoplastic agents specific for metastasis
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria

Landscapes

  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Microbiology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Animal Behavior & Ethology (AREA)
  • Oncology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Physics & Mathematics (AREA)
  • Plant Pathology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)
  • Peptides Or Proteins (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Enzymes And Modification Thereof (AREA)

Description

WO 99/66028 PCT/EP99/04171
GENES FOR THE BIOSYNTHESIS OF EPOTHILONES
FIELD OF THE INVENTION
The present invention relates generally to polyketides and genes for their synthesis. In particular, the present invention relates to the isolation and characterization of novel polyketide synthase and nonribosomal peptide synthetase genes from Sorangium cellulosum that are necessary for the biosynthesis of epothilones A and B.
BACKGROUND OF THE INVENTION
Polyketides are compounds synthesized from two-carbon building blocks, the β-carbon of which always carries a keto group, thus the name polyketide. These compounds include many important antibiotics, immunosuppressants, cancer chemotherapeutic agents, and other compounds possessing a broad range of biological properties. The tremendous structural diversity derives from the different lengths of the polyketide chain, the different side-chains introduced (either as part of the two-carbon building blocks or after the polyketide backbone is formed), and the stereochemistry of such groups. The keto groups may also be reduced to hydroxyls, enoyls, or removed altogether. Each round of two-carbon addition is carried out by a complex of enzymes called the polyketide synthase (PKS) in a manner similar to fatty acid biosynthesis.
The biosynthetic genes for an increasing number of polyketides have been isolated and sequenced. For example, see U.S. Patent Nos. 5,639,949, 5,693,774, and 5,716,849, all of which are incorporated herein by reference, which describe genes for the biosynthesis of soraphen. See also, Schupp et al., FEMS Microbiology Letters 159: 201-207 (1998) and WO 98/07868, which describe genes for the biosynthesis of rifamycin, and U.S. Patent No. 5,876,991, which describes genes for the biosynthesis of tylactone, all of which are incorporated herein by reference. The encoded proteins generally fall into two types: type I and type II. Type I proteins are polyfunctional, with several catalytic domains carrying out different enzymatic steps covalently linked together (e.g. PKS for erythromycin, soraphen, rifamycin, and avermectin (MacNeil et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics, (ed.: Baltz et al.), American Society for Microbiology, Washington D. C. pp. 245-256 (1993))); whereas type II proteins are monofunctional (Hutchinson et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics, (ed.: Baltz et al.), American Society for Microbiology, Washington D. C. pp. 203-216 (1993)).
For the simpler polyketides such as actinorhodin (produced by Streptomyces coelicolor), the several rounds of two-carbon additions are carried out iteratively on PKS enzymes encoded by one set of PKS genes. In contrast, synthesis of the more complicated compounds such as erythromycin and soraphen involves PKS enzymes that are organized into modules, whereby each module carries out one round of two-carbon addition (for review, see Hopwood et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics, (ed.: Baltz et al.), American Society for Microbiology, Washington D. C. pp. 267-275 (1993)).
Complex polyketides and secondary metabolites in general may contain substructures that are derived from amino acids instead of simple carboxylic acids. Incorporation of these building blocks is accomplished by non-ribosomal polypeptide synthetases (NRPSs). NRPSs are multienzymes that are organized in modules. Each module is responsible for the addition (and the additional processing, if required) of one amino acid building block. NRPSs activate amino acids by forming aminoacyl-adenylates, and capture the activated amino acids on thiol groups of phosphopantetheinyl prosthetic groups on peptidyl carrier protein domains. Further, NRPSs modify the amino acids by epimerization, N-methylation, or cyclization if necessary, and catalyse the formation of peptide bonds between the enzyme-bound amino acids. NRPSs are responsible for the biosynthesis of peptide secondary metabolites like cyclosporin, can provide polyketide chain terminator units as in rapamycin, or form mixed systems with PKSs as in yersiniabactin biosynthesis.
Epothilones A and B are 16-membered macrocyclic polyketides with an acylcysteine-derived starter unit that are produced by the bacterium Sorangium cellulosum strain So ce90 (Gerth et al., J. Antibiotics 49: 560-563 (1996), incorporated herein by reference). The structure of epothilone A and B wherein R signifies hydrogen (epothilone A) or methyl (epothilone B) is:
[Structural formula of epothilones A and B, with substituent R]
The epothilones have a narrow antifungal spectrum and especially show a high cytotoxicity in animal cell cultures (see, Höfle et al., Patent DE 4138042 (1993), incorporated herein by reference). Of significant importance, epothilones mimic the biological effects of taxol, both in vivo and in cultured cells (Bollag et al., Cancer Research 55: 2325-2333 (1995), incorporated herein by reference). Taxol and taxotere, which stabilize cellular microtubules, are cancer chemotherapeutic agents with significant activity against various human solid tumors (Rowinsky et al., J. Natl. Cancer Inst. 83: 1778-1781 (1991)). Competition studies have revealed that epothilones act as competitive inhibitors of taxol binding to microtubules, consistent with the interpretation that they share the same microtubule-binding site and possess a similar microtubule affinity as taxol. However, epothilones enjoy a significant advantage over taxol in that epothilones exhibit a much lower drop in potency compared to taxol against a multiple drug-resistant cell line (Bollag et al. (1995)). Furthermore, epothilones are considerably less efficiently exported from the cells by P-glycoprotein than is taxol (Gerth et al. (1996)). In addition, several epothilone analogs have been synthesized that have a superior cytotoxic activity as compared to epothilone A or epothilone B as demonstrated by their enhanced ability to induce the polymerization and stabilization of microtubules (WO 98/25929, incorporated herein by reference).
Despite the promise shown by the epothilones as anticancer agents, problems pertaining to the production of these compounds presently limit their commercial potential. The compounds are too complex for industrial-scale chemical synthesis and so must be produced by fermentation. Techniques for the genetic manipulation of myxobacteria such as Sorangium cellulosum are described in U.S. Patent No. 5,686,295, incorporated herein by reference. However, Sorangium cellulosum is notoriously difficult to ferment and production levels of epothilones are therefore low. Recombinant production of epothilones in heterologous hosts that are more amenable to fermentation could solve current production problems. However, the genes that encode the polypeptides responsible for epothilone biosynthesis have heretofore not been isolated. Furthermore, the strain that produces epothilones, i.e. So ce90, also produces at least one additional polyketide, spirangien, which would be expected to greatly complicate the isolation of the genes particularly responsible for epothilone biosynthesis.
Therefore, in view of the foregoing, one preferred object of the present invention is to isolate the genes that are involved in the synthesis of epothilones, particularly the genes that are involved in the synthesis of epothilones A and B in myxobacteria of the Sorangium/Polyangium group, i.e., Sorangium cellulosum strain So ce90. A further preferred object of the invention is to provide a method for the recombinant production of epothilones for application in anticancer formulations.
SUMMARY OF THE INVENTION
In furtherance of the aforementioned and other objects, the present invention unexpectedly overcomes the difficulties set forth above to provide for the first time a nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of epothilone. In a preferred embodiment, the nucleotide sequence is isolated from a species belonging to Myxobacteria, most preferably Sorangium cellulosum.
In another preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of epothilone, wherein the complement of said nucleotide sequence hybridizes to a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 SEQ ID NO:1, under conditions of hybridization at 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA at 50°C and washing with 2X SSC, 1% SDS at 50°C.
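The interval notation used in the list above can be illustrated in code. The following is a minimal sketch, assuming that "nucleotides X-Y" denotes 1-based, inclusive positions and that "the complement of" an interval is read as its reverse complement; both readings are assumptions made for illustration, and the helper names `subsequence` and `reverse_complement` are hypothetical. The toy string is a stand-in and is not SEQ ID NO:1.

```python
# Hypothetical helpers for illustration only; not part of the patent text.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def subsequence(seq: str, start: int, end: int) -> str:
    """Return positions start..end of seq, read as 1-based and inclusive (assumption)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("interval lies outside the sequence")
    return seq[start - 1:end]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


toy = "ATGCCGTTAGC"  # toy stand-in, NOT SEQ ID NO:1
fragment = subsequence(toy, 1, 4)
print(fragment)                      # ATGC
print(reverse_complement(fragment))  # GCAT
```

Under this reading, an interval such as 7643-8920 would span 8920 - 7643 + 1 = 1278 positions.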
In another preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of epothilone, wherein said nucleotide sequence has at least 60 percent sequence identity with a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 SEQ ID NO:1.
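The 60 percent identity threshold can be illustrated with a simple calculation. The sketch below assumes the two sequences have already been aligned (gaps written as "-") and counts identical columns over the alignment length. This passage does not state the alignment method or the exact definition of identity, so both are assumptions, and `percent_identity` is a hypothetical helper.

```python
# Hypothetical helper for illustration only; the identity definition is an assumption.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(identity)          # 90.0
print(identity >= 60.0)  # True
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values, so any real comparison would need the definition fixed first.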
In another preferred embodiment, the present invention provides an isolated nucleic acid molecule as outlined above comprising a nucleotide sequence that encodes a polypeptide which comprises an amino acid sequence selected from the group consisting of: SEQ ID NO:2, amino acids 11-437 of SEQ ID NO:2, amino acids 543-864 of SEQ ID NO:2, amino acids 974-1273 of SEQ ID NO:2, amino acids 1314-1385 of SEQ ID NO:2, SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, amino acids 1344-1351 of SEQ ID NO:3, SEQ ID NO:4, amino acids 7-432 of SEQ ID NO:4, amino acids 539-859 of SEQ ID NO:4, amino acids 869-1037 of SEQ ID NO:4, amino acids 1439-1684 of SEQ ID NO:4, amino acids 1722-1792 of SEQ ID NO:4, SEQ ID NO:5, amino acids 39-457 of SEQ ID NO:5, amino acids 563-884 of SEQ ID NO:5, amino acids 1147-1399 of SEQ ID NO:5, amino acids 1434-1506 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 3886-4048 of SEQ ID NO:5, amino acids 4433-4719 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, SEQ ID NO:6, amino acids 35-454 of SEQ ID NO:6, amino acids 561-881 of SEQ ID NO:6, amino acids 1143-1393 of SEQ ID NO:6, amino acids 1430-1503 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, amino acids 2383-2551 of SEQ ID NO:6, amino acids 2671-3045 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, SEQ ID NO:7, amino acids 32-450 of SEQ ID NO:7, amino acids 556-877 of SEQ ID NO:7, amino acids 887-1051 of SEQ ID NO:7, amino acids 1478-1790 of SEQ ID NO:7, amino acids 1810-2055 of SEQ ID NO:7, amino acids 2093-2164 of SEQ ID NO:7, amino acids 2165-2439 of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:11, and SEQ ID NO:22.
In another preferred embodiment, the present invention provides a nucleic acid molecule as outlined above comprising a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 SEQ ID NO:1.
In yet another preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein said nucleotide sequence comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 SEQ ID NO:1.
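The requirement of a consecutive identical portion of a given length can be sketched as a shared-substring check. The example below is a minimal illustration, assuming plain string equality on a single strand; the function name `shares_identical_run` and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
# Hypothetical helper for illustration only; compares one strand by exact string match.
def shares_identical_run(query: str, reference: str, k: int = 20) -> bool:
    """Return True if query and reference share an identical run of k consecutive bases."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Collect every length-k window of the reference, then scan the query windows.
    reference_windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(query[i:i + k] in reference_windows for i in range(len(query) - k + 1))


reference = "ACGT" * 10                   # toy stand-in, 40 bases
query = "TT" + reference[5:25] + "GG"     # carries a 20-base run copied from the reference
print(shares_identical_run(query, reference, k=20))     # True
print(shares_identical_run("T" * 30, reference, k=20))  # False
```

The default of 20 mirrors the stated preference; the other recited lengths (25 to 50) would be checked by passing a different `k`.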
The present invention also provides a chimeric gene comprising a heterologous promoter sequence operatively linked to a nucleic acid molecule of the invention. Further, the present invention provides a recombinant vector comprising such a chimeric gene, wherein the vector is capable of being stably transformed into a host cell. Still further, the present invention provides a recombinant host cell comprising such a chimeric gene, wherein the host cell is capable of expressing the nucleotide sequence that encodes at least one polypeptide necessary for the biosynthesis of an epothilone. In a preferred embodiment, the recombinant host cell is a bacterium belonging to the order Actinomycetales, and in a more preferred embodiment the recombinant host cell is a strain of Streptomyces. In other embodiments, the recombinant host cell is any other bacterium amenable to fermentation, such as a pseudomonad or E. coli. Even further, the present invention provides a Bac clone comprising a nucleic acid molecule of the invention, preferably Bac clone In another aspect, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes an epothilone synthase domain.
According to one embodiment, the epothilone synthase domain is a β-ketoacyl-synthase (KS) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7. According to this embodiment, said KS domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7. Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, and nucleotides 55028-56284 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, and nucleotides 55028-56284 of SEQ ID NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably is selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, and nucleotides 55028-56284 of SEQ ID NO:1.
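As a quick arithmetic check on the KS-domain nucleotide intervals recited above, the sketch below computes each interval's length and the number of whole codons it spans. It assumes the intervals are 1-based and inclusive, which is an assumption about the notation; the names `KS_INTERVALS` and `interval_length` are hypothetical.

```python
# KS-domain nucleotide intervals as recited in the text (positions in SEQ ID NO:1).
KS_INTERVALS = [
    (7643, 8920), (16269, 17546), (21860, 23116),
    (26318, 27595), (30815, 32092), (37052, 38320),
    (43626, 44885), (48087, 49361), (55028, 56284),
]


def interval_length(start: int, end: int) -> int:
    """Length of an interval read as 1-based and inclusive (assumption)."""
    return end - start + 1


for start, end in KS_INTERVALS:
    length = interval_length(start, end)
    codons, remainder = divmod(length, 3)
    print(f"{start}-{end}: {length} nt, {codons} codons, remainder {remainder}")
```

Under the inclusive reading, every listed interval has a length divisible by three, the first being 1278 nucleotides (426 codons).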
According to another embodiment, the epothilone synthase domain is an acyltransferase (AT) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. According to this embodiment, said AT domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, and nucleotides 56600-57565 of SEQ ID NO:1.
According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, and nucleotides 56600-57565 of SEQ ID NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably is selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, and nucleotides 56600-57565 of SEQ ID NO:1.
According to still another embodiment, the epothilone synthase domain is an enoyl reductase (ER) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. According to this embodiment, said ER domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, and nucleotides 59366-60304 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, and nucleotides 59366-60304 of SEQ ID NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably is selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, and nucleotides 59366-60304 of SEQ ID NO:1.
According to another embodiment, the epothilone synthase domain is an acyl carrier protein (ACP) domain, wherein said polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7. According to this embodiment, said ACP domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7. Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, and nucleotides 61211-61426 of SEQ ID NO:1.
According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, and nucleotides 61211-61426 of SEQ ID NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably is selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, and nucleotides 61211-61426 of SEQ ID NO:1.
According to another embodiment, the epothilone synthase domain is a dehydratase (DH) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7. According to this embodiment, said DH domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7. Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ ID NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably is selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ ID NO:1.
According to yet another embodiment, the epothilone synthase domain is a β-ketoreductase (KR) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7. According to this embodiment, said KR domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7. Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and nucleotides 60362-61099 of SEQ ID NO:1.
According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and nucleotides 60362-61099 of SEQ ID NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably is selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and nucleotides 60362-61099 of SEQ ID NO:1.
According to an additional embodiment, the epothilone synthase domain is a methyltransferase (MT) domain comprising an amino acid sequence substantially similar to amino acids 2671-3045 of SEQ ID NO:6. According to this embodiment, said MT domain preferably comprises amino acids 2671-3045 of SEQ ID NO:6. Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to nucleotides 51534-52657 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of nucleotides 51534-52657 of SEQ ID NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably is nucleotides 51534-52657 of SEQ ID NO:1.
According to another embodiment, the epothilone synthase domain is a thioesterase (TE) domain comprising an amino acid sequence substantially similar to amino acids 2165-2439 of SEQ ID NO:7. According to this embodiment, said TE domain preferably comprises amino acids 2165-2439 of SEQ ID NO:7. Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to nucleotides 61427-62254 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of nucleotides 61427-62254 of SEQ ID NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably is nucleotides 61427-62254 of SEQ ID NO:1.
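The domain-encoding regions recited above are given as position ranges within SEQ ID NO:1. As a minimal sketch, assuming those positions are 1-based and inclusive (the usual sequence-listing convention, which this passage does not state explicitly), a recited range could be extracted from a sequence string as follows. The function names and the toy sequence are hypothetical and are not part of the disclosure.

```python
def extract_region(sequence: str, start: int, end: int) -> str:
    """Return positions start..end, assuming 1-based inclusive numbering."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("range lies outside the sequence")
    # Python slices are 0-based and end-exclusive, so only the start shifts.
    return sequence[start - 1:end]


def region_length(start: int, end: int) -> int:
    """Number of positions in a 1-based inclusive range."""
    return end - start + 1


# Toy sequence for illustration only; it is not taken from SEQ ID NO:1.
toy = "ATGCGTACCGGTTAGC"
print(extract_region(toy, 4, 9))
# Length of the TE-encoding range recited above (nucleotides 61427-62254).
print(region_length(61427, 62254))
```

Under the same assumption, the same helper applies to the amino acid ranges recited for SEQ ID NOs:2-7.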
In still another aspect, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes a non-ribosomal peptide synthetase, wherein said non-ribosomal peptide synthetase comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, and amino acids 1344-1351 of SEQ ID NO:3. According to this embodiment, said non-ribosomal peptide synthetase preferably comprises an amino acid sequence selected from the group consisting of: SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, and amino acids 1344-1351 of SEQ ID NO:3.
Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID NO:1.
In addition, according to this embodiment, said nucleotide sequence most preferably is selected from the group consisting of: nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID NO:1.
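Several of the embodiments above turn on whether a nucleotide sequence contains a consecutive 20 (or 25, 30, 35, 40, 45, or 50) base pair portion identical to a portion of a recited region. The following is a minimal sketch of how such a check might be expressed, assuming both sequences are plain strings read on the same strand; the function name and the toy sequences are hypothetical, and the sketch does not consider the reverse complement.

```python
def shares_identical_window(query: str, reference: str, k: int = 20) -> bool:
    """True if query and reference share an identical run of k consecutive bases."""
    query, reference = query.upper(), reference.upper()
    if len(query) < k or len(reference) < k:
        return False
    # Index every k-base window of the reference once, then scan the query.
    reference_windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(query[i:i + k] in reference_windows
               for i in range(len(query) - k + 1))


# Toy sequences for illustration only; they are not taken from SEQ ID NO:1.
reference = "GATTACAGATTACAGATTACAGGCCTTAAGG"
query_hit = "TTTT" + reference[5:25] + "CCCC"   # embeds a 20-base stretch of the reference
query_miss = "A" * 30                           # shares no 20-base stretch with it

print(shares_identical_window(query_hit, reference))
print(shares_identical_window(query_miss, reference))
```

Passing a different `k` (for example 25 or 50) would cover the other window sizes recited above.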
The present invention further provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs:2-23.
In accordance with another aspect, the present invention also provides methods for the recombinant production of polyketides such as epothilones in quantities large enough to enable their purification and use in pharmaceutical formulations such as those for the treatment of cancer. A specific advantage of these production methods is the chirality of the molecules produced; production in transgenic organisms avoids the generation of populations of racemic mixtures, within which some enantiomers may have reduced activity. In particular, the present invention provides a method for heterologous expression of epothilone in a recombinant host, comprising: introducing into a host a chimeric gene comprising a heterologous promoter sequence operatively linked to a nucleic acid molecule of the invention that comprises a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of epothilone; and growing the host in conditions that allow biosynthesis of epothilone in the host. The present invention also provides a method for producing epothilone, comprising: expressing epothilone in a recombinant host by the aforementioned method; and extracting epothilone from the recombinant host.
According to still another aspect, the present invention provides an isolated polypeptide involved in the biosynthesis of epothilone, wherein said polypeptide comprises an amino acid sequence that is at least 90% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:2, amino acids 11-437 of SEQ ID NO:2, amino acids 543-864 of SEQ ID NO:2, amino acids 974-1273 of SEQ ID NO:2, amino acids 1314-1385 of SEQ ID NO:2, SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, amino acids 1344-1351 of SEQ ID NO:3, SEQ ID NO:4, amino acids 7-432 of SEQ ID NO:4, amino acids 539-859 of SEQ ID NO:4, amino acids 869-1037 of SEQ ID NO:4, amino acids 1439-1684 of SEQ ID NO:4, amino acids 1722-1792 of SEQ ID NO:4, SEQ ID NO:5, amino acids 39-457 of SEQ ID NO:5, amino acids 563-884 of SEQ ID NO:5, amino acids 1147-1399 of SEQ ID NO:5, amino acids 1434-1506 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 3886-4048 of SEQ ID NO:5, amino acids 4433-4719 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, 
SEQ ID NO:6, amino acids 35-454 of SEQ ID NO:6, amino acids 561-881 of SEQ ID NO:6, amino acids 1143-1393 of SEQ ID NO:6, amino acids 1430-1503 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, amino acids 2383-2551 of SEQ ID NO:6, amino acids 2671-3045 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, SEQ ID NO:7, amino acids 32-450 of SEQ ID NO:7, amino acids 556-877 of SEQ ID NO:7, amino acids 887-1051 of SEQ ID NO:7, amino acids 1478-1790 of SEQ ID NO:7, amino acids 1810-2055 of SEQ ID NO:7, amino acids 2093-2164 of SEQ ID NO:7, amino acids 2165-2439 of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:11, and SEQ ID NO:22.
According to one embodiment, the epothilone synthase domain is a β-ketoacyl-synthase (KS) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7. According to this embodiment, said KS domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7.
According to another embodiment, the epothilone synthase domain is an acyltransferase (AT) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. According to this embodiment, said AT domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7.
According to still another embodiment, the epothilone synthase domain is an enoyl reductase (ER) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. According to this embodiment, said ER domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7.
According to another embodiment, the epothilone synthase domain is an acyl carrier protein (ACP) domain, wherein said polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7. According to this embodiment, said ACP domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7.
According to another embodiment, the epothilone synthase domain is a dehydratase (DH) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7. According to this embodiment, said DH domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7.
According to yet another embodiment, the epothilone synthase domain is a β-ketoreductase (KR) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7. According to this embodiment, said KR domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7.
According to an additional embodiment, the epothilone synthase domain is a methyltransferase (MT) domain comprising an amino acid sequence substantially similar to amino acids 2671-3045 of SEQ ID NO:6. According to this embodiment, said MT domain preferably comprises amino acids 2671-3045 of SEQ ID NO:6.
According to another embodiment, the epothilone synthase domain is a thioesterase (TE) domain comprising an amino acid sequence substantially similar to amino acids 2165-2439 of SEQ ID NO:7. According to this embodiment, said TE domain preferably comprises amino acids 2165-2439 of SEQ ID NO:7.
Other aspects and advantages of the present invention will become apparent to those skilled in the art from a study of the following description of the invention and non-limiting examples.
DEFINITIONS
In describing the present invention, the following terms will be employed, and are intended to be defined as indicated below.
Associated With / Operatively Linked: Refers to two DNA sequences that are related physically or functionally. For example, a promoter or regulatory DNA sequence is said to be "associated with" a DNA sequence that codes for an RNA or a protein if the two sequences are operatively linked, or situated such that the regulator DNA sequence will affect the expression level of the coding or structural DNA sequence.
Chimeric Gene: A recombinant DNA sequence in which a promoter or regulatory DNA sequence is operatively linked to, or associated with, a DNA sequence that codes for an mRNA or which is expressed as a protein, such that the regulator DNA sequence is able to regulate transcription or expression of the associated DNA sequence. The regulator DNA sequence of the chimeric gene is not normally operatively linked to the associated DNA sequence as found in nature.
Coding DNA Sequence: A DNA sequence that is translated in an organism to produce a protein.
Domain: That part of a polyketide synthase necessary for a given distinct activity.
Examples include acyl carrier protein (ACP), β-ketosynthase (KS), acyltransferase (AT), β-ketoreductase (KR), dehydratase (DH), enoylreductase (ER), and thioesterase (TE) domains.
Epothilones: 16-membered macrocyclic polyketides naturally produced by the bacterium Sorangium cellulosum strain So ce90, which mimic the biological effects of taxol. In this application, "epothilone" refers to the class of polyketides that includes epothilone A and epothilone B, as well as analogs thereof such as those described in WO 98/25929.
Epothilone Synthase: A polyketide synthase responsible for the biosynthesis of epothilone.
Gene: A defined region that is located within a genome and that, besides the aforementioned coding DNA sequence, comprises other, primarily regulatory, DNA sequences responsible for the control of the expression, that is to say the transcription and translation, of the coding portion.
Heterologous DNA Sequence: A DNA sequence not naturally associated with a host cell into which it is introduced, including non-naturally occurring multiple copies of a naturally occurring DNA sequence.
Homologous DNA Sequence: A DNA sequence naturally associated with a host cell into which it is introduced.
Homologous Recombination: Reciprocal exchange of DNA fragments between homologous DNA molecules.
Isolated: In the context of the present invention, an isolated nucleic acid molecule or an isolated enzyme is a nucleic acid molecule or enzyme that, by the hand of man, exists apart from its native environment and is therefore not a product of nature. An isolated nucleic acid molecule or enzyme may exist in a purified form or may exist in a non-native environment such as, for example, a recombinant host cell.
Module: A genetic element encoding all of the distinct activities required in a single round of polyketide biosynthesis, i.e., one condensation step and all the β-carbonyl processing steps associated therewith. Each module encodes an ACP, a KS, and an AT activity to accomplish the condensation portion of the biosynthesis, and selected post-condensation activities to effect the β-carbonyl processing.
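The module definition can be pictured as a small data structure: a set of domain activities in which ACP, KS, and AT are always present and the β-carbonyl processing activities vary. The sketch below is illustrative only. It assumes that the processing activities are the KR, DH, and ER domains named elsewhere in this description, and the class, its methods, and the example module are hypothetical rather than a statement of any actual epothilone module's composition.

```python
from dataclasses import dataclass

# Activities every module encodes, per the definition above.
CONDENSATION = frozenset({"ACP", "KS", "AT"})
# Assumed set of optional beta-carbonyl processing activities.
PROCESSING = frozenset({"KR", "DH", "ER"})


@dataclass(frozen=True)
class Module:
    name: str
    domains: frozenset

    def is_complete(self) -> bool:
        """True if the module carries all three condensation activities."""
        return CONDENSATION <= self.domains

    def processing_domains(self) -> frozenset:
        """The post-condensation activities present in this module."""
        return self.domains & PROCESSING


# Hypothetical module used only to exercise the class.
example = Module("example", frozenset({"KS", "AT", "KR", "ACP"}))
print(example.is_complete())
print(sorted(example.processing_domains()))
```

A module lacking any of ACP, KS, or AT would report `is_complete()` as false under this sketch.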
NRPS: A non-ribosomal polypeptide synthetase, which is a complex of enzymatic activities responsible for the incorporation of amino acids into secondary metabolites including, for example, amino acid adenylation, epimerization, N-methylation, cyclization, peptidyl carrier protein, and condensation domains. A functional NRPS is one that catalyzes the incorporation of an amino acid into a secondary metabolite.
NRPS gene: One or more genes encoding NRPSs for producing functional secondary metabolites, epothilones A and B, when under the direction of one or more compatible control elements.
Nucleic Acid Molecule: A linear segment of single- or double-stranded DNA or RNA that can be isolated from any source. In the context of the present invention, the nucleic acid molecule is preferably a segment of DNA.
ORF: Open Reading Frame.
PKS: A polyketide synthase, which is a complex of enzymatic activities (domains) responsible for the biosynthesis of polyketides including, for example, ketoreductase, dehydratase, acyl carrier protein, enoylreductase, ketoacyl ACP synthase, and acyltransferase.
A functional PKS is one that catalyzes the synthesis of a polyketide.
PKS Genes: One or more genes encoding various polypeptides required for producing functional polyketides, epothilones A and B, when under the direction of one or more compatible control elements.
Substantially Similar: With respect to nucleic acids, a nucleic acid molecule that has at least 60 percent sequence identity with a reference nucleic acid molecule. In a preferred embodiment, a substantially similar DNA sequence is at least 80% identical to a reference DNA sequence; in a more preferred embodiment, a substantially similar DNA sequence is at least 90% identical to a reference DNA sequence; and in a most preferred embodiment, a substantially similar DNA sequence is at least 95% identical to a reference DNA sequence.
A substantially similar DNA sequence preferably encodes a protein or peptide having substantially the same activity as the protein or peptide encoded by the reference DNA sequence. A substantially similar nucleotide sequence typically hybridizes to a reference nucleic acid molecule, or fragments thereof, under the following conditions: hybridization at 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA at 50°C; wash with 2X SSC, 1% SDS, at 50°C. With respect to proteins or peptides, a substantially similar amino acid sequence is an amino acid sequence that is at least 90% identical to the amino acid sequence of a reference protein or peptide and has substantially the same activity as the reference protein or peptide.
Transformation: A process for introducing heterologous nucleic acid into a host cell or organism.
Transformed / Transgenic / Recombinant: Refers to a host organism such as a bacterium into which a heterologous nucleic acid molecule has been introduced. The nucleic acid molecule can be stably integrated into the genome of the host or the nucleic acid molecule can also be present as an extrachromosomal molecule. Such an extrachromosomal molecule can be auto-replicating. Transformed cells, tissues, or plants are understood to encompass not only the end product of a transformation process, but also transgenic progeny thereof. A "non-transformed", "non-transgenic", or "non-recombinant" host refers to a wild-type organism, for example a bacterium, which does not contain the heterologous nucleic acid molecule.
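The identity thresholds in the definition of "Substantially Similar" above can be illustrated with a short sketch. It assumes the two sequences have already been aligned to equal length (gaps written as "-") and counts identity as matching columns over all aligned columns; that convention, and the function names `percent_identity` and `dna_similarity_tier`, are illustrative assumptions and are not prescribed by the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all aligned columns.

    Assumed convention: a column counts as a match only when both
    sequences carry the same non-gap symbol.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def dna_similarity_tier(identity: float) -> str:
    """Map a percent identity onto the nucleic acid tiers of the definition."""
    if identity >= 95:
        return "most preferred"
    if identity >= 90:
        return "more preferred"
    if identity >= 80:
        return "preferred"
    if identity >= 60:
        return "substantially similar"
    return "not substantially similar"


# Toy sequences (hypothetical, not taken from the sequence listing).
reference = "ACGTACGTAC"
candidate = "ACGTACGTAA"  # one mismatch in ten aligned columns
identity = percent_identity(reference, candidate)
print(identity, dna_similarity_tier(identity))
```

Sequence identity is only one part of the definition; the activity and hybridization criteria are experimental and are not modeled here.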
Nucleotides are indicated by their bases by the following standard abbreviations: adenine (A), cytosine (C), thymine (T), and guanine (G). Amino acids are likewise indicated by the following standard abbreviations: alanine (Ala; A), arginine (Arg; R), asparagine (Asn; N), aspartic acid (Asp; D), cysteine (Cys; C), glutamine (Gln; Q), glutamic acid (Glu; E), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V). Furthermore, (Xaa; X) represents any amino acid.
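The three-letter and one-letter amino acid abbreviations correspond one to one, so converting between them is a simple lookup. The following is a minimal sketch; the names `THREE_TO_ONE` and `to_one_letter` are hypothetical helpers introduced only for illustration.

```python
# Standard amino acid abbreviations (three-letter -> one-letter).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # any amino acid
}


def to_one_letter(residues):
    """Convert an iterable of three-letter codes to a one-letter string."""
    try:
        return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)
    except KeyError as exc:
        raise ValueError(f"unknown residue code: {exc.args[0]}") from None


print(to_one_letter(["Met", "Ala", "Leu", "Xaa"]))
```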
DESCRIPTION OF THE SEQUENCES IN THE SEQUENCE LISTING
SEQ ID NO:1 is the nucleotide sequence of a 68750 bp contig containing 22 open reading frames (ORFs), which comprises the epothilone biosynthesis genes.
SEQ ID NO:2 is the protein sequence of a type I polyketide synthase (EPOS A) encoded by epoA (nucleotides 7610-11875 of SEQ ID NO:1).
SEQ ID NO:3 is the protein sequence of a non-ribosomal peptide synthetase (EPOS P) encoded by epoP (nucleotides 11872-16104 of SEQ ID NO:1).
SEQ ID NO:4 is the protein sequence of a type I polyketide synthase (EPOS B) encoded by epoB (nucleotides 16251-21749 of SEQ ID NO:1).
SEQ ID NO:5 is the protein sequence of a type I polyketide synthase (EPOS C) encoded by epoC (nucleotides 21746-43519 of SEQ ID NO:1).
SEQ ID NO:6 is the protein sequence of a type I polyketide synthase (EPOS D) encoded by epoD (nucleotides 43524-54920 of SEQ ID NO:1).
SEQ ID NO:7 is the protein sequence of a type I polyketide synthase (EPOS E) encoded by epoE (nucleotides 54935-62254 of SEQ ID NO:1).
SEQ ID NO:8 is the protein sequence of a cytochrome P450 oxygenase homologue (EPOS F) encoded by epoF (nucleotides 62369-63628 of SEQ ID NO:1).
SEQ ID NO:9 is a partial protein sequence (partial Orf 1) encoded by orf1 (nucleotides 1-1826 of SEQ ID NO:1).
SEQ ID NO:10 is a protein sequence (Orf 2) encoded by orf2 (nucleotides 3171-1900 on the reverse complement strand of SEQ ID NO:1).
SEQ ID NO:11 is a protein sequence (Orf 3) encoded by orf3 (nucleotides 3415-5556 of SEQ ID NO:1).
SEQ ID NO:12 is a protein sequence (Orf 4) encoded by orf4 (nucleotides 5992-5612 on the reverse complement strand of SEQ ID NO:1).
SEQ ID NO:13 is a protein sequence (Orf 5) encoded by orf5 (nucleotides 6226-6675 of SEQ ID NO:1).
SEQ ID NO:14 is a protein sequence (Orf 6) encoded by orf6 (nucleotides 63779-64333 of SEQ ID NO:1).
SEQ ID NO:15 is a protein sequence (Orf 7) encoded by orf7 (nucleotides 64290-63853 on the reverse complement strand of SEQ ID NO:1).
SEQ ID NO:16 is a protein sequence (Orf 8) encoded by orf8 (nucleotides 64363-64920 of SEQ ID NO:1).
SEQ ID NO:17 is a protein sequence (Orf 9) encoded by orf9 (nucleotides 64727-64287 on the reverse complement strand of SEQ ID NO:1).
SEQ ID NO:18 is a protein sequence (Orf 10) encoded by orf10 (nucleotides 65063-65767 of SEQ ID NO:1).
SEQ ID NO:19 is a protein sequence (Orf 11) encoded by orf11 (nucleotides 65874-65008 on the reverse complement strand of SEQ ID NO:1).
SEQ ID NO:20 is a protein sequence (Orf 12) encoded by orf12 (nucleotides 66338-65871 on the reverse complement strand of SEQ ID NO:1).
SEQ ID NO:21 is a protein sequence (Orf 13) encoded by orf13 (nucleotides 66667-67137 of SEQ ID NO:1).
SEQ ID NO:22 is a protein sequence (Orf 14) encoded by orf14 (nucleotides 67334-68251 of SEQ ID NO:1).
SEQ ID NO:23 is a partial protein sequence (partial Orf 15) encoded by orf15 (nucleotides 68346-68750 of SEQ ID NO:1).
SEQ ID NO:24 is the universal reverse PCR primer sequence.
SEQ ID NO:25 is the universal forward PCR primer sequence.
SEQ ID NO:26 is the NH24 end PCR primer sequence.
SEQ ID NO:27 is the NH2 end PCR primer sequence.
SEQ ID NO:28 is the NH2 end PCR primer sequence.
SEQ ID NO:29 is the pEPO15-NH6 end PCR primer sequence.
SEQ ID NO:30 is the pEPO15-H2.7 end PCR primer sequence.
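The coordinate convention used in the sequence descriptions above (1-based inclusive positions, with reverse-strand ORFs written with the larger coordinate first) can be made concrete with a short sketch. The `Orf` type, the `extract` helper, and the toy contig are hypothetical; only the gene coordinates are taken from the list above, and SEQ ID NO:1 itself is not reproduced here.

```python
from typing import NamedTuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class Orf(NamedTuple):
    name: str
    start: int  # 1-based, as written in the sequence description
    end: int    # 1-based, inclusive; start > end means reverse strand

    @property
    def on_reverse_strand(self) -> bool:
        return self.start > self.end

    @property
    def length(self) -> int:
        return abs(self.end - self.start) + 1


def extract(contig: str, orf: Orf) -> str:
    """Return the coding strand of an ORF from a contig string."""
    low, high = sorted((orf.start, orf.end))
    segment = contig[low - 1:high]  # convert 1-based inclusive to a Python slice
    if orf.on_reverse_strand:
        segment = segment.translate(COMPLEMENT)[::-1]
    return segment


# Coordinates as listed for SEQ ID NO:2-8 and SEQ ID NO:10.
ORFS = [
    Orf("epoA", 7610, 11875), Orf("epoP", 11872, 16104),
    Orf("epoB", 16251, 21749), Orf("epoC", 21746, 43519),
    Orf("epoD", 43524, 54920), Orf("epoE", 54935, 62254),
    Orf("epoF", 62369, 63628), Orf("orf2", 3171, 1900),
]

for orf in ORFS:
    # A complete ORF should span a whole number of codons.
    print(orf.name, orf.length, orf.length % 3 == 0, orf.on_reverse_strand)

# Toy contig (hypothetical) to show strand handling.
print(extract("AACCGGTT", Orf("fwd", 2, 4)), extract("AACCGGTT", Orf("rev", 4, 2)))
```

All eight listed spans are multiples of three nucleotides, which is what one would expect if the coordinates include the stop codon.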
DEPOSIT INFORMATION
The following material has been deposited with the Agricultural Research Service, Patent Culture Collection (NRRL), 1815 North University Street, Peoria, Illinois 61604, under the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure. All restrictions on the availability of the deposited material will be irrevocably removed upon the granting of a patent.
Deposited Material    Accession Number    Deposit Date
                      NRRL B-30033        June 11, 1998
pEPO32                NRRL B-30119        April 16, 1999
DETAILED DESCRIPTION OF THE INVENTION
The genes involved in the biosynthesis of epothilones can be isolated using the techniques according to the present invention. The preferable procedure for the isolation of epothilone biosynthesis genes requires the isolation of genomic DNA from an organism identified as producing epothilones A and B, and the transfer of the isolated DNA on a suitable plasmid or vector to a host organism that does not normally produce the polyketide, followed by the identification of transformed host colonies to which the epothilone-producing ability has been conferred. Using a technique such as λ::Tn5 transposon mutagenesis (de Bruijn & Lupski, Gene 27: 131-149 (1984)), the exact region of the transforming epothilone-conferring DNA can be more precisely defined. Alternatively or additionally, the transforming epothilone-conferring DNA can be cleaved into smaller fragments and the smallest that maintains the epothilone-conferring ability further characterized. Whereas the host organism lacking the ability to produce epothilone may be a different species from the organism from which the polyketide derives, a variation of this technique involves the transformation of host DNA into the same host that has had its epothilone-producing ability disrupted by mutagenesis. In this method, an epothilone-producing organism is mutated and non-epothilone-producing mutants are isolated. These are then complemented by genomic DNA isolated from the epothilone-producing parent strain.
A further example of a technique that can be used to isolate genes required for epothilone biosynthesis is the use of transposon mutagenesis to generate mutants of an epothilone-producing organism that, after mutagenesis, fails to produce the polyketide. Thus, the region of the host genome responsible for epothilone production is tagged by the transposon and can be recovered and used as a probe to isolate the native genes from the parent strain. PKS genes that are required for the synthesis of polyketides and that are similar to known PKS genes may be isolated by virtue of their sequence homology to the biosynthetic genes for which the sequence is known, such as those for the biosynthesis of rifamycin or soraphen. Techniques suitable for isolation by homology include standard library screening by DNA hybridization.
Preferred for use as a probe molecule is a DNA fragment that is obtainable from a gene or another DNA sequence that plays a part in the synthesis of a known polyketide. A preferred probe molecule comprises a 1.2 kb SmaI DNA fragment encoding the ketosynthase domain of the fourth module of the soraphen PKS (U.S. Patent No. 5,716,849), and a more preferred probe molecule comprises the β-ketoacyl synthase domains from the first and second modules of the rifamycin PKS (Schupp et al., FEMS Microbiology Letters 159: 201-207 (1998)). These can be used to probe a gene library of an epothilone-producing microorganism to isolate the PKS genes responsible for epothilone biosynthesis.
Despite the well-known difficulties with PKS gene isolation in general and despite the difficulties expected to be encountered with the isolation of epothilone biosynthesis genes in particular, by using the methods described in the instant specification, biosynthetic genes for epothilones A and B can surprisingly be cloned from a microorganism that produces that polyketide. Using the methods of gene manipulation and recombinant production described in this specification, the cloned PKS genes can be modified and expressed in transgenic host organisms.
The isolated epothilone biosynthetic genes can be expressed in heterologous hosts to enable the production of the polyketide with greater efficiency than might be possible from native hosts. Techniques for these genetic manipulations are specific for the different available hosts and are known in the art. For example, heterologous genes can be expressed in Streptomyces and other actinomycetes using techniques such as those described in McDaniel et al., Science 262: 1546-1550 (1993) and Kao et al., Science 265: 509-512 (1994), both of which are incorporated herein by reference. See also, Rowe et al., Gene 216: 215-223 (1998); Holmes et al., EMBO Journal 12(8): 3183-3191 (1993) and Bibb et al., Gene 38: 215-226 (1985), all of which are incorporated herein by reference.
Alternatively, genes responsible for polyketide biosynthesis, such as the epothilone biosynthetic genes, can also be expressed in other host organisms such as pseudomonads and E. coli. Techniques for these genetic manipulations are specific for the different available hosts and are known in the art. For example, PKS genes have been successfully expressed in E. coli using the pT7-7 vector, which uses the T7 promoter. See, Tabor et al., Proc. Natl. Acad. Sci. USA 82: 1074-1078 (1985), incorporated herein by reference. In addition, the expression vectors pKK223-3 and pKK223-2 can be used to express heterologous genes in E. coli, either in transcriptional or translational fusion, behind the tac or trc promoter. For the expression of operons encoding multiple ORFs, the simplest procedure is to insert the operon into a vector such as pKK223-3 in transcriptional fusion, allowing the cognate ribosome binding site of the heterologous genes to be used. Techniques for overexpression in gram-positive species such as Bacillus are also known in the art and can be used in the context of this invention (Quax et al., in: Industrial Microorganisms: Basic and Applied Molecular Genetics, Eds. Baltz et al., American Society for Microbiology, Washington (1993)).
Other expression systems that may be used with the epothilone biosynthetic genes of the invention include yeast and baculovirus expression systems. See, for example, "The Expression of Recombinant Proteins in Yeasts," Sudbery, P. Curr. Opin. Biotechnol. 517-524 (1996); "Methods for Expressing Recombinant Proteins in Yeast," Mackay, et al., Editor(s): Carey, Paul Protein Eng. Des. 105-153, Publisher: Academic, San Diego, Calif. (1996); "Expression of heterologous gene products in yeast," Pichuantes, et al., Editor(s): Cleland, J. Craik, C. Protein Eng. 129-161, Publisher: Wiley-Liss, New York, N.Y. (1996); WO 98/27203; Kealey et al., Proc. Natl. Acad. Sci. USA 95: 505-509 (1998); "Insect Cell Culture: Recent Advances, Bioengineering Challenges And Implications In Protein Production," Palomares, et al., Editor(s): Galindo, Enrique; Ramirez, Octavio T., Adv. Bioprocess Eng. Vol. II, Invited Pap. Int. Symp., 2nd (1998) 25-52, Publisher: Kluwer, Dordrecht, Neth.; "Baculovirus Expression Vectors," Jarvis, Donald Editor(s): Miller, Lois Baculoviruses 389-431, Publisher: Plenum, New York, N.Y. (1997); "Production Of Heterologous Proteins Using The Baculovirus/Insect Expression System," Griffiths, et al., Methods Mol. Biol. (Totowa, N.J.) 75 (Basic Cell Culture Protocols (2nd Edition)) 427-440 (1997); and "Insect Cell Expression Technology," Luckow, Verne Protein Eng. 183-218, Publisher: Wiley-Liss, New York, N.Y. (1996); all of which are incorporated herein by reference.
Another consideration for expression of PKS genes in heterologous hosts is the requirement of enzymes for posttranslational modification of PKS enzymes by phosphopantetheinylation before they can synthesize polyketides. However, the enzymes responsible for this modification of type I PKS enzymes, phosphopantetheinyl (P-pant) transferases, are not normally present in many hosts such as E. coli. This problem can be solved by coexpression of a P-pant transferase with the PKS genes in the heterologous host, as described by Kealey et al., Proc. Natl. Acad. Sci. USA 95: 505-509 (1998), incorporated herein by reference.
Therefore, for the purposes of polyketide production, the significant criteria in the choice of host organism are its ease of manipulation, rapidity of growth (fermentation), possession of the proper molecular machinery for processes such as posttranslational modification, and its lack of susceptibility to the polyketide being overproduced. Most preferred host organisms are actinomycetes such as strains of Streptomyces. Other preferred host organisms are pseudomonads and E. coli. The above-described methods of polyketide production have significant advantages over the technology currently used in the preparation of the compounds. These advantages include the cheaper cost of production, the ability to produce greater quantities of the compounds, and the ability to produce compounds of a preferred biological enantiomer, as opposed to racemic mixtures inevitably generated by organic synthesis. Compounds produced by heterologous hosts can be used in medical (for example, cancer treatment in the case of epothilones) as well as agricultural applications.
EXPERIMENTAL
The invention will be further described by reference to the following detailed examples. These examples are provided for purposes of illustration only, and are not intended to be limiting unless otherwise specified. Standard recombinant DNA and molecular cloning techniques used here are well known in the art and are described by Ausubel, Current Protocols in Molecular Biology, John Wiley and Sons, Inc. (1994); T. Maniatis, E. F. Fritsch and J. Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989); and by T.J. Silhavy, M.L. Berman, and L.W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1984).
Example 1: Cultivation of an Epothilone-Producing Strain of Sorangium cellulosum
Sorangium cellulosum strain 90 (DSM 6773, Deutsche Sammlung von Mikroorganismen und Zellkulturen, Braunschweig) is streaked out and grown (30°C) on an agar plate of SolE medium (0.35% glucose, 0.05% tryptone, 0.15% MgSO4 x 7H2O, 0.05% ammonium sulfate, 0.1% CaCl2, 0.006% K2HPO4, 0.01% sodium dithionite, 0.0008% Fe-EDTA, 1.2% HEPES, 3.5% [vol/vol] supernatant of sterilized stationary S. cellulosum culture) pH ad. 7.4. Cells from about 1 square cm are picked and inoculated into 5 ml of G51t liquid medium (glucose, 0.5% starch, 0.2% tryptone, 0.1% probion S, 0.05% CaCl2 x 2H2O, 0.05% MgSO4 x 7H2O, 1.2% HEPES, pH ad. 7.4) and incubated at 30°C with shaking at 225 rpm. After 4 days, the culture is transferred into 50 ml of G51t and incubated as above for days. This culture is used to inoculate 500 ml of G51t and incubated as above for 6 days. The culture is centrifuged for 10 minutes at 4000 rpm and the cell pellet is resuspended in ml of G51t.
Example 2: Generation of a Bacterial Artificial Chromosome (Bac) Library
To generate a Bac library, S. cellulosum cells cultivated as described in Example 1 above are embedded into agarose blocks, lysed, and the liberated genomic DNA is partially digested by the restriction enzyme HindIII. The digested DNA is separated on an agarose gel by pulsed-field electrophoresis. Large (approximately 90-150 kb) DNA fragments are isolated from the agarose gel and ligated into the vector pBelobacII. pBelobacII contains a gene encoding chloramphenicol resistance, a multiple cloning site in the lacZ gene providing for blue/white selection on appropriate medium, as well as the genes required for the replication and maintenance of the plasmid at one or two copies per cell. The ligation mixture is used to transform Escherichia coli DH10B electrocompetent cells using standard electroporation techniques. Chloramphenicol-resistant recombinant (white, lacZ mutant) colonies are transferred to a positively charged nylon membrane filter in 384 3X3 grid format. The clones are lysed and the DNA is cross-linked to the filters. The same clones are also preserved as liquid cultures at -80°C.
Example 3: Screening the Bac Library of Sorangium cellulosum 90 for the Presence of Type I Polyketide Synthase-Related Sequences
The Bac library filters are probed by standard Southern hybridization procedures. The DNA probes used encode β-ketoacyl synthase domains from the first and second modules of the rifamycin polyketide synthase (Schupp et al., FEMS Microbiology Letters 159: 201-207 (1998)). The probe DNAs are generated by PCR with primers flanking each ketosynthase domain using the plasmid pNE95 as the template (pNE95 equals cosmid 2 described in Schupp et al. (1998)). 25 ng of PCR-amplified DNA is isolated from an agarose gel and labeled with 32P-dCTP using a random primer labeling kit (Gibco-BRL, Bethesda MD, USA) according to the manufacturer's instructions. Hybridization is at for 36 hours and membranes are washed at high stringency (3 times with 0.1x SSC and SDS for 20 min at 65°C). The labeled blot is exposed on a phosphorescent screen and the signals are detected on a PhosphorImager 445SI (screen and 445SI from Molecular Dynamics). This results in strong hybridization of certain Bac clones to the probes. These clones are selected and cultured overnight in 5 ml of Luria broth (LB) at 37°C. Bac DNA from the Bac clones of interest is isolated by a typical miniprep procedure. The cells are resuspended in 200 µl lysozyme solution (50 mM glucose, 10 mM EDTA, 25 mM Tris-HCl, lysozyme), lysed in 400 µl lysis solution (0.2 N NaOH and 2% SDS), the proteins are precipitated (3.0 M potassium acetate, adjusted to pH 5.2 with acetic acid), and the Bac DNA is precipitated with isopropanol. The DNA is resuspended in 20 µl of nuclease-free distilled water, restricted with BamHI (New England Biolabs, Inc.) and separated on a 0.7% agarose gel. The gel is blotted by Southern hybridization as described above and probed under conditions described above, with a 1.2 kb SmaI DNA fragment encoding the ketosynthase domain of the fourth module of the soraphen polyketide synthase as the probe (see, U.S. Patent No. 5,716,849). Five different hybridization patterns are observed. One clone representing each of the five patterns is selected and named pEPO15, pEPO20, pEPO30, pEPO31, and pEPO33, respectively.
Example 4: Subcloning of BamHI Fragments from pEPO15, pEPO20, pEPO30, pEPO31, and pEPO33
The DNA of the five selected Bac clones is digested with BamHI and random fragments are subcloned into pBluescript II SK+ (Stratagene) at the BamHI site. Subclones carrying inserts between 2 and 10 kb in size are selected for sequencing of the flanking ends of the inserts and also probed with the 1.2 kb SmaI probe as described above. Subclones that show a high degree of sequence homology to known polyketide synthases and/or strong hybridization to the soraphen ketosynthase domain are used for gene disruption experiments.
Example 5: Preparation of Streptomycin-Resistant Spontaneous Mutants of Sorangium cellulosum strain So ce90
0.1 ml of a three day old culture of Sorangium cellulosum strain So ce90, which is raised in liquid medium G52-H (yeast extract, 0.2% soyameal defatted, 0.8% potato starch, 0.2% glucose, 0.1% MgSO4 x 7H2O, 0.1% CaCl2 x 2H2O, 0.008% Fe-EDTA, pH ad 7.4 with KOH), is plated out on agar plates with SolE medium supplemented with 100 µg/ml streptomycin. The plates are incubated at 30°C for 2 weeks. The colonies growing on this medium are streptomycin-resistant mutants, which are streaked out and cultivated once more on the same agar medium with streptomycin for purification. One of these streptomycin-resistant mutants is selected and is called BCE28/2.
Example 6: Gene Disruptions in Sorangium cellulosum BCE28/2 Using the Subcloned BamHI Fragments
The BamHI inserts of the subclones generated from the five selected Bac clones as described above are isolated and ligated into the unique BamHI site of plasmid pCIB132 (see, U.S. Patent No. 5,716,849). The pCIB132 derivatives carrying the inserts are transformed into Escherichia coli ED8767 containing the helper plasmid pUZ8 (Hedges and Matthew, Plasmid 2: 269-278 (1979)). The transformants are used as donors in conjugation experiments with Sorangium cellulosum BCE28/2 as recipient. For the conjugation, 5-10 x 10^9 cells of Sorangium cellulosum BCE28/2 from an early stationary phase culture (reaching about 5 x 10^8 cells/ml) grown at 30°C in liquid medium G51b (G51b equals medium G51t with tryptone replaced by peptone) are mixed in a 1:1 cellular ratio with a late-log phase culture (in LB liquid medium) of E. coli ED8767 containing pCIB132 derivatives carrying the subcloned BamHI fragments and the helper plasmid pUZ8. The mixed cells are then centrifuged at 4000 rpm for 10 minutes and resuspended in 0.5 ml G51b medium. This cell suspension is then plated as a drop in the center of a plate with SolE agar containing 50 mg/l kanamycin. The cells obtained after incubation for 24 hours at 30°C are harvested and resuspended in 0.8 ml of G51b medium, and 0.1 to 0.3 ml of this suspension is plated out on a selective SolE solid medium containing phleomycin (30 mg/l), streptomycin (300 mg/l), and kanamycin (50 mg/l). The counterselection of the donor Escherichia coli strain takes place with the aid of streptomycin. The colonies that grow on this selective medium after an incubation time of 8-12 days at a temperature of 30°C are isolated with a plastic loop and streaked out and cultivated on the same agar medium for a second round of selection and purification.
The colony-derived cultures that grow on this selective agar medium after 7 days at a temperature of 30°C are transconjugants of Sorangium cellulosum BCE28/2 that have acquired phleomycin resistance by conjugative transfer of the pCIB132 derivatives carrying the subcloned BamHI fragments.
Integration of the pCIB132-derived plasmids into the chromosome of Sorangium cellulosum BCE28/2 by homologous recombination is verified by Southern hybridization.
For this experiment, complete DNA from 5-10 transconjugants per transferred BamHI fragment is isolated (from 10 ml cultures grown in medium G52-H for three days) applying the method described by Pospiech and Neumann, Trends Genet. 11: 217 (1995). For the Southern blot, the DNA isolated as described above is cleaved with one of the restriction enzymes BglII, ClaI, or NotI, and the respective BamHI inserts or pCIB132 are used as 32P-labelled probes.
Example 7: Analysis of the Effect of the Integrated BamHI Fragments on Epothilone Production by Sorangium cellulosum After Gene Disruption
Transconjugant cells grown on about 1 square cm surface of the selective SolE plates of the second round of selection (see Example 6) are transferred by a sterile plastic loop into 10 ml of medium G52-H in a 50 ml Erlenmeyer flask. After incubation at 30°C and 180 rpm for 3 days, the culture is transferred into 50 ml of medium G52-H in a 200 ml Erlenmeyer flask. After incubation at 30°C and 180 rpm for 4-5 days, 10 ml of this culture is transferred into 50 ml of medium 23B3 (0.2% glucose, 2% potato starch, 1.6% soya meal defatted, 0.0008% Fe-EDTA sodium salt, 0.5% HEPES (4-(2-hydroxyethyl)-piperazine-1-ethane-sulfonic acid), 2% vol/vol polystyrene resin XAD16 (Rohm & Haas), pH adjusted to 7.8 with NaOH) in a 200 ml Erlenmeyer flask.
Quantitative determination of the epothilone produced takes place after incubation of the cultures at 30°C and 180 rpm for 7 days. The complete culture broth is filtered by suction through a 150 µm nylon filter. The resin remaining on the filter is then resuspended in 10 ml isopropanol and extracted by shaking the suspension at 180 rpm for 1 hour. 1 ml is removed from this suspension and centrifuged at 12,000 rpm in an Eppendorf Microfuge.
The amount of epothilones A and B therein is determined by means of HPLC and detection at 250 nm with a UV-DAD detector (HPLC with Waters Symmetry C18 column and a gradient of 0.02% phosphoric acid 60%-0% and acetonitrile 40%-100%).
Transconjugants with three different integrated BamHI fragments subcloned from pEPO15, namely transconjugants with the BamHI fragment of plasmid pEPO15-21, transconjugants with the BamHI fragment of plasmid pEPO15-4-5, and transconjugants with the BamHI fragment of plasmid pEPO15-4-1, are tested in the manner described above. HPLC analysis reveals that all of these transconjugants no longer produce epothilone A or B. By contrast, epothilones A and B are detectable in a concentration of 2-4 mg/l in transconjugants with integrated BamHI fragments that are derived from pEPO20, pEPO30, pEPO31, pEPO33, and in the parental strain BCE28/2.
Example 8: Nucleotide Sequence Determination of the Cloned Fragments and Construction of Contigs
A. BamHI Insert of Plasmid pEPO15-21
Plasmid DNA is isolated from the strain Escherichia coli DH10B [pEPO15-21], and the nucleotide sequence of the 2.3-kb BamHI insert in pEPO15-21 is determined. Automated DNA sequencing is done on the double-stranded DNA template by the dideoxynucleotide chain termination method, using Applied Biosystems model 377 sequencers. The primers used are the universal reverse primer (5' GGA AAC AGC TAT GAC CAT G 3' (SEQ ID NO:24)) and the universal forward primer (5' GTA AAA CGA CGG CCA GT 3' (SEQ ID NO:25)). In subsequent rounds of sequencing reactions, custom-synthesized oligonucleotides, designed for the 3' ends of the previously determined sequences, are used to extend and join contigs. Both strands are entirely sequenced, and every nucleotide is sequenced at least two times. The nucleotide sequence is compiled using the program Sequencher vers. 3.0 (Gene Codes Corporation), and analyzed using the University of Wisconsin Genetics Computer Group programs. The nucleotide sequence of the 2213-bp insert corresponds to nucleotides 20779-22991 of SEQ ID NO:1.
B. BamHI Insert of Plasmid pEPO15-4-1
Plasmid DNA is isolated from the strain Escherichia coli DH10B [pEPO15-4-1], and the nucleotide sequence of the 3.9-kb BamHI insert in pEPO15-4-1 is determined as described in A above. The nucleotide sequence of the 3909-bp insert corresponds to nucleotides 16876-20784 of SEQ ID NO:1.
C. BamHI Insert of Plasmid pEPO15-4-5
Plasmid DNA is isolated from the strain Escherichia coli DH10B [pEPO15-4-5], and the nucleotide sequence of the 2.3-kb BamHI insert in pEPO15-4-5 is determined as described in A above. The nucleotide sequence of the 2233-bp insert corresponds to nucleotides 42528-44760 of SEQ ID NO:1.
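The insert sizes and positions reported in Example 8 can be cross-checked against the gene coordinates given in the description of the sequences. The sketch below does only coordinate arithmetic; the helper names (`span_length`, `overlap`, `genes_hit`) are hypothetical, and the coordinates are copied from the text.

```python
# 1-based inclusive coordinates on SEQ ID NO:1, as stated in the text.
INSERTS = {
    "pEPO15-4-1": (16876, 20784),
    "pEPO15-21": (20779, 22991),
    "pEPO15-4-5": (42528, 44760),
}
GENES = {
    "epoA": (7610, 11875), "epoP": (11872, 16104), "epoB": (16251, 21749),
    "epoC": (21746, 43519), "epoD": (43524, 54920), "epoE": (54935, 62254),
    "epoF": (62369, 63628),
}


def span_length(span):
    """Length in bp of an inclusive coordinate span."""
    return span[1] - span[0] + 1


def overlap(a, b):
    """Number of positions shared by two inclusive spans (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def genes_hit(span):
    """Names of the listed genes that a span overlaps, in map order."""
    return [name for name, gene in GENES.items() if overlap(span, gene) > 0]


for name, span in INSERTS.items():
    print(name, span_length(span), genes_hit(span))

# Adjacent BamHI fragments share their 6-bp boundary site.
print(overlap(INSERTS["pEPO15-4-1"], INSERTS["pEPO15-21"]))
```

The computed lengths reproduce the stated 3909, 2213, and 2233 bp. The 6-bp overlap between pEPO15-4-1 and pEPO15-21 is consistent with both inserts being reported with their flanking BamHI recognition sites, which would make the two fragments adjacent on the contig. Each of the three inserts falls within or across epoB, epoC, or epoD, in line with the loss of epothilone production observed in Example 7.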
Example 9: Subcloning and Ordering of DNA Fragments from pEPO15 Containing Epothilone Biosynthesis Genes
pEPO15 is digested to completion with the restriction enzyme HindIII and the resulting fragments are subcloned into pBluescript II SK- or pNEB193 (New England Biolabs) that has been cut with HindIII and dephosphorylated with calf intestinal alkaline phosphatase. Six different clones are generated and named pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24 (all based on pNEB193), and pEPO15-H2.7 and pEPO15-H3.0 (both based on pBluescript II SK-).
The BamHI insert of pEPO15-21 is isolated and DIG-labeled (Non-radioactive DNA labeling and detection system, Boehringer Mannheim), and used as a probe in DNA hybridization experiments at high stringency against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. A strong hybridization signal is detected for pEPO15-NH24, indicating that pEPO15-21 is contained within pEPO15-NH24.
The BamHI insert of pEPO15-4-1 is isolated and DIG-labeled as above, and used as a probe in DNA hybridization experiments at high stringency against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. Strong hybridization signals are detected for pEPO15-NH24 and pEPO15-H2.7. Nucleotide sequence data generated from one end each of pEPO15-NH24 and pEPO15-H2.7 are also in complete agreement with the previously determined sequence of the BamHI insert of pEPO15-4-1. These experiments demonstrate that pEPO15-4-1 (which contains one internal HindIII site) overlaps pEPO15-H2.7 and pEPO15-NH24, and that pEPO15-H2.7 and pEPO15-NH24, in this order, are contiguous.
The BamHI insert of pEPO15-4-5 is isolated and DIG-labeled as above, and used as a probe in DNA hybridization experiments at high stringency against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. A strong hybridization signal is detected for pEPO15-NH2, indicating that pEPO15-4-5 is contained within pEPO15-NH2.
Nucleotide sequence data is generated from both ends of pEPO15-NH2 and from the end of pEPO15-NH24 that does not overlap with pEPO15-4-1. PCR primers NH24 end GTGACTGGCGCCTGGAATCTGCATGAGC (SEQ ID NO:26), NH2 end AGCGGGAGCTTGCTAGACATTCTGTTTC (SEQ ID NO:27), and NH2 end GACGCGCCTCGGGCAGCGCCCCAA (SEQ ID NO:28), pointing towards the HindIII sites, are designed based on these sequences and used in amplification reactions with pEPO15 and, in separate experiments, with Sorangium cellulosum So ce90 genomic DNA as the templates. Specific amplification is found with primer pair NH24 end and NH2 end "A" with both templates. The amplimers are cloned into pBluescript II SK- and completely sequenced. The sequences of the amplimers are identical, and also agree completely with the end sequences of pEPO15-NH24 and pEPO15-NH2, fused at the HindIII site, establishing that the HindIII fragments of pEPO15-NH24 and pEPO15-NH2 are, in this order, contiguous.
The HindIII insert of pEPO15-H2.7 is isolated and DIG-labeled as above, and used as a probe in a DNA hybridization experiment at high stringency against pEPO15 digested by NotI. A NotI fragment of about 9 kb in size shows strong hybridization, and is further subcloned into pBluescript II SK- that has been digested with NotI and dephosphorylated with calf intestinal alkaline phosphatase, to yield pEPO15-N9-16. The NotI insert of pEPO15-N9-16 is isolated and DIG-labeled as above, and used as a probe in DNA hybridization experiments at high stringency against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. Strong hybridization signals are detected for pEPO15-NH6, and also for the expected clones pEPO15-H2.7 and pEPO15-NH24. Nucleotide sequence data is generated from both ends of pEPO15-NH6 and from the end of pEPO15-H2.7 that does not overlap with pEPO15-4-1. PCR primers are designed pointing towards the HindIII sites and used in amplification reactions with pEPO15 and, in separate experiments, with Sorangium cellulosum So ce90 genomic DNA as the templates. Specific amplification is found with primer pair pEPO15-NH6 end CACCGAAGCGTCGATCTGGTCCATC (SEQ ID NO:29) and pEPO15-H2.7 end CGGTCAGATCGACGACGGGCTTTCC (SEQ ID NO:30) with both templates. The amplimers are cloned into pBluescript II SK- and completely sequenced. The sequences of the amplimers are identical, and also agree completely with the end sequences of pEPO15-NH6 and pEPO15-H2.7, fused at the HindIII site, establishing that the HindIII fragments of pEPO15-NH6 and pEPO15-H2.7 are, in this order, contiguous.
All of these experiments, taken together, establish a contig of HindIII fragments covering a region of about 55 kb and consisting of the HindIII inserts of pEPO15-NH6, pEPO15-H2.7, pEPO15-NH24, and pEPO15-NH2, in this order. The inserts of the remaining two HindIII subclones, namely pEPO15-NH1 and pEPO15-H3.0, are not found to be parts of this contig.
Example 10: Further Extension of the Subclone Contig Covering the Epothilone Biosynthesis Genes
An approximately 2.2 kb BamHI-HindIII fragment derived from the downstream end of the insert of pEPO15-NH2, and thus representing the downstream end of the subclone contig described in Example 9, is isolated, DIG-labeled, and used in Southern hybridization experiments against pEPO15 and pEPO15-NH2 DNAs digested with several enzymes. The strongly hybridizing bands are always found to be the same in size between the two target DNAs, indicating that the Sorangium cellulosum So ce90 genomic DNA fragment cloned into pEPO15 ends with the HindIII site at the downstream end of pEPO15-NH2.
A cosmid DNA library of Sorangium cellulosum So ce90 is generated, using established procedures, in pScosTriplex-II (Ji, et al., Genomics 31: 185-192 (1996)). Briefly, high-molecular weight genomic DNA of Sorangium cellulosum So ce90 is partially digested with the restriction enzyme Sau3AI to provide fragments with average sizes of about 40 kb, and ligated to BamHI and XbaI digested pScosTriplex-II. The ligation mix is packaged with Gigapack III XL (Stratagene) and used to transfect E. coli XL1 Blue MR cells.
The cosmid library is screened with the approximately 2.2 kb BamHI-HindIII fragment, derived from the downstream end of the insert of pEPO15-NH2, used as a probe in colony hybridization. A strongly hybridizing clone, named pEPO4E7, is selected.
pEPO4E7 DNA is isolated, digested with several restriction endonucleases, and probed in Southern hybridization experiments with the 2.2 kb BamHI-HindIII fragment. A strongly hybridizing NotI fragment of approximately 9 kb in size is selected and subcloned into pBluescript II SK- to yield pEPO4E7-N9-8. Further Southern hybridization experiments reveal that the approximately 9 kb NotI insert of pEPO4E7-N9-8 overlaps pEPO15-NH2 over 6 kb in a NotI-HindIII fragment, while the remaining approximately 3 kb HindIII-NotI fragment would extend the subclone contig described in Example 9. End sequencing reveals, however, that the downstream end of the insert of pEPO4E7-N9-8 contains the BamHI-NotI polylinker of pScosTriplex-II, thereby indicating that the genomic DNA insert of pEPO4E7 ends at a Sau3AI site within the extending HindIII-NotI fragment and that the NotI site is derived from pScosTriplex-II.
An approximately 1.6 kb PstI-SalI fragment derived from the approximately 3 kb extending HindIII-NotI subfragment of pEPO4E7-N9-8, containing only Sorangium cellulosum So ce90-derived sequences free of vector, is used as a probe against the bacterial artificial chromosome library described in Example 2. Besides the previously isolated EPO15, a BAC clone, named EPO32, is found to strongly hybridize to the probe.
pEPO32 is isolated, digested with several restriction endonucleases, and hybridized with the approximately 1.6 kb PstI-SalI probe. A HindIII-EcoRV fragment of about 13 kb in size is found to strongly hybridize to the probe, and is subcloned into pBluescript II SK- digested with HindIII and HincII to yield pEPO32-HEV15.
Oligonucleotide primers are designed based on the downstream end sequence of pEPO15-NH2 and on the upstream (HindIII) end sequence derived from pEPO32-HEV15, and used in sequencing reactions with pEPO4E7-N9-8 as the template. The sequences reveal the existence of a small HindIII fragment (EPO4E7-H0.02) of 24 bp, undetectable in standard restriction analysis, separating the HindIII site at the downstream end of pEPO15-NH2 from the HindIII site at the upstream end of pEPO32-HEV15.
Thus, the subclone contig described in Example 9 is extended to include the HindIII fragment EPO4E7-H0.02 and the insert of pEPO32-HEV15, and constitutes the inserts of: pEPO15-NH6, pEPO15-H2.7, pEPO15-NH24, pEPO15-NH2, EPO4E7-H0.02 and pEPO32-HEV15, in this order.
Example 11: Nucleotide Sequence Determination of the Subclone Contig Covering the Epothilone Biosynthesis Genes
The nucleotide sequence of the subclone contig described in Example 10 is determined as follows.
pEPO15-H2.7. Plasmid DNA is isolated from the strain Escherichia coli [pEPO15-H2.7], and the nucleotide sequence of the 2.7-kb BamHI insert in pEPO15-H2.7 is determined. Automated DNA sequencing is done on the double-stranded DNA template by the dideoxynucleotide chain termination method, using Applied Biosystems model 377 sequencers. The primers used are the universal reverse primer (5' GGA AAC AGC TAT GAC CAT G 3' (SEQ ID NO:24)) and the universal forward primer (5' GTA AAA CGA CGG CCA GT 3' (SEQ ID NO:25)). In subsequent rounds of sequencing reactions, custom-synthesized oligonucleotides, designed for the 3' ends of the previously determined sequences, are used to extend and join contigs.
pEPO15-NH6, pEPO15-NH24 and pEPO15-NH2. The HindIII inserts of these plasmids are isolated, and subjected to random fragmentation using a Hydroshear apparatus (Genomic Instrumentation Services, Inc.) to yield an average fragment size of 1-2 kb. The fragments are end-repaired using T4 DNA Polymerase and Klenow DNA Polymerase enzymes in the presence of deoxynucleotide triphosphates, and phosphorylated with T4 DNA Kinase in the presence of ribo-ATP. Fragments in the size range of 1.5-2.2 kb are isolated from agarose gels, and ligated into pBluescript II SK- that has been cut with EcoRV and dephosphorylated. Random subclones are sequenced using the universal reverse and the universal forward primers.
pEPO32-HEV15. pEPO32-HEV15 is digested with HindIII and SspI, the approximately 13.3 kb fragment containing the ~13 kb HindIII-EcoRV insert from So. cellulosum So ce90 and a 0.3 kb HincII-SspI fragment from pBluescript II SK- is isolated, and partially digested with HaeIII to yield fragments with an average size of 1-2 kb. Fragments in the size range of 1.5-2.2 kb are isolated from agarose gels, and ligated into pBluescript II SK- that has been cut with EcoRV and dephosphorylated. Random subclones are sequenced using the universal reverse and the universal forward primers.
The chromatograms are analyzed and assembled into contigs with the Phred, Phrap and Consed programs (Ewing, et al., Genome Res. 8: 175-185 (1998); Ewing, et al., Genome Res. 8: 186-194 (1998); Gordon, et al., Genome Res. 8: 195-202 (1998)).
Contig gaps are filled, sequence discrepancies are resolved, and low-quality regions are resequenced using custom-designed oligonucleotide primers for sequencing on either the original subclones or selected clones from the random subclone libraries. Both strands are completely sequenced, and every basepair is covered with an aggregated Phred score of at least 40 (confidence level of 99.99%).
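The link between a Phred score of 40 and a 99.99% confidence level follows from the standard Phred definition, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. A short Python sketch of the conversion (function names are illustrative):

```python
import math

def phred_to_error_probability(q: float) -> float:
    """Estimated probability that a base call is wrong, given Phred score q."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

def confidence_percent(q: float) -> float:
    """Base-call confidence, in percent, for Phred score q."""
    return 100 * (1 - phred_to_error_probability(q))

# Phred 40 corresponds to one expected error in 10,000 bases.
q40_error = phred_to_error_probability(40)
q40_confidence = confidence_percent(40)
```

Under this definition, each additional 10 Phred units reduces the expected error rate by a factor of ten.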
The nucleotide sequence of the 68750 bp contig is shown as SEQ ID NO:1.
Example 12: Nucleotide Sequence Analysis of the Epothilone Biosynthesis Genes
SEQ ID NO:1 is found to contain 22 ORFs as detailed below in Table 1:

Table 1
ORF      Start codon                   Stop codon                    Homology of deduced protein                                             Proposed function of deduced protein
orf1     outside of sequenced region   1826
orf2*    3171                          1900                          Hypothetical protein SP:Q11037; DD-peptidase SP:P15555
orf3     3415                          5556                          Na/H antiporter PID:DIN17724                                            Transport
orf4*    5992                          5612
orf5     6226                          6675
epoA     7610                          11875                         Type I polyketide synthase                                              Epothilone synthase: Thiazole ring
epoP     11872                         16104                         Non-ribosomal peptide synthetase                                        Epothilone synthase: Thiazole ring
epoB     16251                         21749                         Type I polyketide synthase                                              Epothilone synthase: Polyketide formation
epoC     21746                         43519                         Type I polyketide synthase                                              Epothilone synthase: Polyketide backbone formation
epoD     43524                         54920                         Type I polyketide synthase                                              Epothilone synthase: Polyketide backbone formation
epoE     54935                         62254                         Type I polyketide synthase                                              Epothilone synthase: Polyketide formation
epoF     62369                         63628                         Cytochrome P450                                                         Epothilone macrolactone oxidase
orf6     63779                         64333
orf7*    64290                         63853
orf8     64363                         64920
orf9*    64727                         64287
orf10    65063                         65767
orf11*   65874                         65008
orf12*   66338                         65871
orf13    66667                         67137
orf14    67334                         68251                         Hypothetical protein GI:3293544; Cation efflux system protein GI:2623026   Transport
orf15    68346                         outside of sequenced region
*On the reverse complement strand. Numbering according to SEQ ID NO:1.
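The coordinates in Table 1 can be sanity-checked with a few lines of Python. The sketch below is illustrative only: it assumes that both coordinates are inclusive and that the stop coordinate is the last base of the stop codon, so that the number of amino acids is one less than the number of codons. Only a subset of the ORFs is included, and the dictionary and function names are hypothetical.

```python
# (start codon position, stop codon position) for selected ORFs from Table 1.
ORFS = {
    "orf2": (3171, 1900),
    "epoA": (7610, 11875),
    "epoP": (11872, 16104),
    "epoC": (21746, 43519),
    "epoF": (62369, 63628),
}

def describe_orf(name: str) -> dict:
    """Derive strand, length and codon count from the table coordinates."""
    start, stop = ORFS[name]
    strand = "+" if start < stop else "-"   # start > stop means reverse strand
    length_bp = abs(stop - start) + 1       # inclusive coordinates (assumed)
    codons, remainder = divmod(length_bp, 3)
    return {
        "strand": strand,
        "length_bp": length_bp,
        "codons": codons,                   # includes the stop codon (assumed)
        "in_frame": remainder == 0,
    }

summary = {name: describe_orf(name) for name in ORFS}
```

A length that is not a multiple of three would point to a transcription error in the coordinates, which makes this a cheap consistency check when copying values out of the table.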
epoA (nucleotides 7610-11875 of SEQ ID NO:1) codes for EPOS A (SEQ ID NO:2), a type I polyketide synthase consisting of a single module, and harboring the following domains: β-ketoacyl-synthase (KS) (nucleotides 7643-8920 of SEQ ID NO:1, amino acids 11-437 of SEQ ID NO:2); acyltransferase (AT) (nucleotides 9236-10201 of SEQ ID NO:1, amino acids 543-864 of SEQ ID NO:2); enoyl reductase (ER) (nucleotides 10529-11428 of SEQ ID NO:1, amino acids 974-1273 of SEQ ID NO:2); and acyl carrier protein homologous domain (ACP) (nucleotides 11549-11764 of SEQ ID NO:1, amino acids 1314-1385 of SEQ ID NO:2). Sequence comparisons and motif analysis (Haydock, et al. FEBS Lett. 374: 246-248 (1995); Tang, et al., Gene 216: 255-265 (1998)) reveal that the AT encoded by EPOS A is specific for malonyl-CoA. EPOS A should be involved in the initiation of epothilone biosynthesis by loading the acetate unit to the multienzyme complex that will eventually form part of the 2-methylthiazole ring (C26 and
epoP (nucleotides 11872-16104 of SEQ ID NO:1) codes for EPOS P (SEQ ID NO:3), a non-ribosomal peptide synthetase containing one module.
EPOS P harbors the following domains: peptide bond formation domain, as delineated by motif K (amino acids 72-81 [FPLTDIQESY] of SEQ ID NO:3, corresponding to nucleotide positions 12085-12114 of SEQ ID NO:1); motif L (amino acids 118-125 [VVARHDML] of SEQ ID NO:3, corresponding to nucleotide positions 12223-12246 of SEQ ID NO:1); motif M (amino acids 199-212 [SIDLINVDLGSLSI] of SEQ ID NO:3, corresponding to nucleotide positions 12466-12507 of SEQ ID NO:1); and motif O (amino acids 353-363 [GDFTSMVLLDI] of SEQ ID NO:3, corresponding to nucleotide positions 12928-12960 of SEQ ID NO:1); aminoacyl adenylate formation domain, as delineated by motif A (amino acids 549-565 [LTYEELSRRSRRLGARL] of SEQ ID NO:3, corresponding to nucleotide positions 13516-13566 of SEQ ID NO:1); motif B (amino acids 588-603 [VAVLAVLESGAAYVPI] of SEQ ID NO:3, corresponding to nucleotide positions 13633-13680 of SEQ ID NO:1); motif C (amino acids 669-684 [AYVIYTSGSTGLPKGV] of SEQ ID NO:3, corresponding to nucleotide positions 13876-13923 of SEQ ID NO:1); motif D (amino acids 815-821 [SLGGATE] of SEQ ID NO:3, corresponding to nucleotide positions 14313-14334 of SEQ ID NO:1); motif E (amino acids 868-892 [GQLYIGGVGLALGYWRDEEKTRKSF] of SEQ ID NO:3, corresponding to nucleotide positions 14473-14547 of SEQ ID NO:1); motif F (amino acids 903-912 [YKTGDLGRYL] of SEQ ID NO:3, corresponding to nucleotide positions 14578-14607 of SEQ ID NO:1); motif G (amino acids 918-940 [EFMGREDNQIKLRGYRVELGEIE] of SEQ ID NO:3, corresponding to nucleotide positions 14623-14692 of SEQ ID NO:1); motif H (amino acids 1268-1274 [LPEYMVP] of SEQ ID NO:3, corresponding to nucleotide positions 15673-15693 of SEQ ID NO:1); and motif I (amino acids 1285-1297 [LTSNGKVDRKALR] of SEQ ID NO:3, corresponding to nucleotide positions 15724-15762 of SEQ ID NO:1); an unknown domain, inserted between motifs G and H of the aminoacyl adenylate formation domain (amino acids 973-1256 of SEQ ID NO:3, corresponding to nucleotide positions 14788-15639 of SEQ ID NO:1); and a peptidyl carrier protein homologous domain (PCP), delineated by motif J (amino acids 1344-1351 [GATSIHIV] of SEQ ID NO:3, corresponding to nucleotide positions 15901-15924 of SEQ ID NO:1).
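The paired amino acid and nucleotide ranges quoted for the motifs are related by simple codon arithmetic. The Python sketch below shows that relationship for a forward-strand gene; the function name is illustrative, and the calculation assumes 1-based inclusive coordinates with the first base of the start codon at the gene start position (11872 for epoP).

```python
def aa_range_to_nt_range(gene_start: int, aa_first: int, aa_last: int):
    """Map an inclusive amino acid range to inclusive nucleotide positions.

    Assumes a forward-strand gene whose start codon begins at gene_start,
    so residue n is encoded by the three bases starting at
    gene_start + 3 * (n - 1).
    """
    nt_first = gene_start + 3 * (aa_first - 1)
    nt_last = gene_start + 3 * aa_last - 1
    return nt_first, nt_last

EPOP_START = 11872  # start of epoP in SEQ ID NO:1, from the text

motif_k = aa_range_to_nt_range(EPOP_START, 72, 81)      # peptide bond formation domain
motif_j = aa_range_to_nt_range(EPOP_START, 1344, 1351)  # PCP domain
```

With these inputs the function reproduces the motif K and motif J nucleotide ranges given above, which is a convenient way to cross-check individual entries in such a listing.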
It is proposed that EPOS P is involved in the activation of a cysteine by adenylation, binding the activated cysteine as an aminoacyl-S-PCP, forming a peptide bond between the enzyme-bound cysteine and the acetyl-S-ACP supplied by EPOS A, and the formation of the initial thiazoline ring by intramolecular heterocyclization. The unknown domain of EPOS P displays very weak homologies to NAD(P)H oxidases and reductases from Bacillus species.
Thus, this unknown domain and/or the ER domain of EPOS A may be involved in the oxidation of the initial 2-methylthiazoline ring to a 2-methylthiazole.
epoB (nucleotides 16251-21749 of SEQ ID NO:1) codes for EPOS B (SEQ ID NO:4), a type I polyketide synthase consisting of a single module, and harboring the following domains: KS (nucleotides 16269-17546 of SEQ ID NO:1, amino acids 7-432 of SEQ ID NO:4); AT (nucleotides 17865-18827 of SEQ ID NO:1, amino acids 539-859 of SEQ ID NO:4); dehydratase (DH) (nucleotides 18855-19361 of SEQ ID NO:1, amino acids 869-1037 of SEQ ID NO:4); β-ketoreductase (KR) (nucleotides 20565-21302 of SEQ ID NO:1, amino acids 1439-1684 of SEQ ID NO:4); and ACP (nucleotides 21414-21626 of SEQ ID NO:1, amino acids 1722-1792 of SEQ ID NO:4). Sequence comparisons and motif analysis reveal that the AT encoded by EPOS B is specific for methylmalonyl-CoA. EPOS B should be involved in the first polyketide chain extension by catalysing the Claisen-like condensation of the 2-methyl-4-thiazolecarboxyl-S-PCP starter group with the methylmalonyl-S-ACP, and the concomitant reduction of the β-keto group of C17 to an enoyl.
epoC (nucleotides 21746-43519 of SEQ ID NO:1) codes for EPOS C (SEQ ID NO:5), a type I polyketide synthase consisting of 4 modules. The first module harbors a KS (nucleotides 21860-23116 of SEQ ID NO:1, amino acids 39-457 of SEQ ID NO:5); a malonyl CoA-specific AT (nucleotides 23431-24397 of SEQ ID NO:1, amino acids 563-884 of SEQ ID NO:5); a KR (nucleotides 25184-25942 of SEQ ID NO:1, amino acids 1147-1399 of SEQ ID NO:5); and an ACP (nucleotides 26045-26263 of SEQ ID NO:1, amino acids 1434-1506 of SEQ ID NO:5). This module incorporates an acetate extender unit (C14-C13) and reduces the β-keto group at C15 to the hydroxyl group that takes part in the final lactonization of the epothilone macrolactone ring. The second module of EPOS C harbors a KS (nucleotides 26318-27595 of SEQ ID NO:1, amino acids 1524-1950 of SEQ ID NO:5); a malonyl CoA-specific AT (nucleotides 27911-28876 of SEQ ID NO:1, amino acids 2056-2377 of SEQ ID NO:5); a KR (nucleotides 29678-30429 of SEQ ID NO:1, amino acids 2645-2895 of SEQ ID NO:5); and an ACP (nucleotides 30539-30759 of SEQ ID NO:1, amino acids 2932-3005 of SEQ ID NO:5). This module incorporates an acetate extender unit (C12-C11) and reduces the β-keto group at C13 to a hydroxyl group. Thus, the nascent polyketide chain of epothilone corresponds to epothilone A, and the incorporation of the methyl side chain at C12 in epothilone B would require a post-PKS C-methyltransferase activity. The formation of the epoxide ring at C13-C12 would also require a post-PKS oxidation step.
The third module of EPOS C harbors a KS (nucleotides 30815-32092 of SEQ ID NO:1, amino acids 3024-3449 of SEQ ID NO:5); a malonyl CoA-specific AT (nucleotides 32408-33373 of SEQ ID NO:1, amino acids 3555-3876 of SEQ ID NO:5); a DH (nucleotides 33401-33889 of SEQ ID NO:1, amino acids 3886-4048 of SEQ ID NO:5); an ER (nucleotides 35042-35902 of SEQ ID NO:1, amino acids 4433-4719 of SEQ ID NO:5); a KR (nucleotides 35930-36667 of SEQ ID NO:1, amino acids 4729-4974 of SEQ ID NO:5); and an ACP (nucleotides 36773-36991 of SEQ ID NO:1, amino acids 5010-5082 of SEQ ID NO:5). This module incorporates an acetate extender unit (C10-C9) and fully reduces the β-keto group at C11. The fourth module of EPOS C harbors a KS (nucleotides 37052-38320 of SEQ ID NO:1, amino acids 5103-5525 of SEQ ID NO:5); a methylmalonyl CoA-specific AT (nucleotides 38636-39598 of SEQ ID NO:1, amino acids 5631-5951 of SEQ ID NO:5); a DH (nucleotides 39635-40141 of SEQ ID NO:1, amino acids 5964-6132 of SEQ ID NO:5); an ER (nucleotides 41369-42256 of SEQ ID NO:1, amino acids 6542-6837 of SEQ ID NO:5); a KR (nucleotides 42314-43048 of SEQ ID NO:1, amino acids 6857-7101 of SEQ ID NO:5); and an ACP (nucleotides 43163-43378 of SEQ ID NO:1, amino acids 7140-7211 of SEQ ID NO:5). This module incorporates a propionate extender unit (C24 and C8-C7) and fully reduces the β-keto group at C9.
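The reduction states assigned to the EPOS B and EPOS C modules above follow the usual reading of the reductive domains in a type I polyketide synthase module: KR alone gives a hydroxyl, KR plus DH gives an enoyl, and KR plus DH plus ER gives a fully reduced carbon. The Python sketch below encodes that rule as an illustration; the function name and data layout are assumptions, and the rule deliberately ignores inactive domains and other exceptions.

```python
def beta_carbon_state(domains) -> str:
    """Predict the beta-carbon state from the reductive domains of a module."""
    present = set(domains)
    if "KR" not in present:
        return "keto"
    if "DH" not in present:
        return "hydroxyl"
    if "ER" not in present:
        return "enoyl"
    return "fully reduced"

# Domain composition of the modules as listed in the text.
MODULES = {
    "EPOS B": ["KS", "AT", "DH", "KR", "ACP"],
    "EPOS C module 1": ["KS", "AT", "KR", "ACP"],
    "EPOS C module 2": ["KS", "AT", "KR", "ACP"],
    "EPOS C module 3": ["KS", "AT", "DH", "ER", "KR", "ACP"],
    "EPOS C module 4": ["KS", "AT", "DH", "ER", "KR", "ACP"],
}

predicted = {name: beta_carbon_state(doms) for name, doms in MODULES.items()}
```

For these five modules the rule gives the same reduction states as stated in the text: an enoyl for EPOS B, hydroxyl groups for the first two modules of EPOS C, and full reduction for the third and fourth.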
epoD (nucleotides 43524-54920 of SEQ ID NO:1) codes for EPOS D (SEQ ID NO:6), a type I polyketide synthase consisting of 2 modules. The first module harbors a KS (nucleotides 43626-44885 of SEQ ID NO:1, amino acids 35-454 of SEQ ID NO:6); a methylmalonyl CoA-specific AT (nucleotides 45204-46166 of SEQ ID NO:1, amino acids 561-881 of SEQ ID NO:6); a KR (nucleotides 46950-47702 of SEQ ID NO:1, amino acids 1143-1393 of SEQ ID NO:6); and an ACP (nucleotides 47811-48032 of SEQ ID NO:1, amino acids 1430-1503 of SEQ ID NO:6). This module incorporates a propionate extender unit (C23 and C6-C5) and reduces the β-keto group at C7 to a hydroxyl group. The second module harbors a KS (nucleotides 48087-49361 of SEQ ID NO:1, amino acids 1522-1946 of SEQ ID NO:6); a methylmalonyl CoA-specific AT (nucleotides 49680-50642 of SEQ ID NO:1, amino acids 2053-2373 of SEQ ID NO:6); a DH (nucleotides 50670-51176 of SEQ ID NO:1, amino acids 2383-2551 of SEQ ID NO:6); a methyltransferase (MT, nucleotides 51534-52657 of SEQ ID NO:1, amino acids 2671-3045 of SEQ ID NO:6); a KR (nucleotides 53697-54431 of SEQ ID NO:1, amino acids 3392-3636 of SEQ ID NO:6); and an ACP (nucleotides 54540-54758 of SEQ ID NO:1, amino acids 3673-3745 of SEQ ID NO:6). This module incorporates a propionate extender unit (C21 or C22 and C4-C3) and reduces the β-keto group at C5 to a hydroxyl group. This reduction is somewhat unexpected, since epothilones contain a keto group at C5. Discrepancies of this kind between the deduced reductive capabilities of PKS modules and the redox state of the corresponding positions in the final polyketide products have been, however, reported in the literature (see, for example, Schwecke, et al., Proc. Natl. Acad. Sci. USA 92: 7839-7843 (1995) and Schupp, et al., FEMS Microbiology Letters 159: 201-207 (1998)). An important feature of epothilones is the presence of gem-methyl side groups at C4 (C21 and C22).
The second module of EPOS D is predicted to incorporate a propionate unit into the growing polyketide chain, providing one methyl side chain at C4. This module also contains a methyltransferase domain integrated into the PKS between the DH and the KR domains, in an arrangement similar to the one seen in the HMWP1 yersiniabactin synthase (Gehring, DeMoll, E., Fetherston, Mori, Mayhew, Blattner, Walsh, and Perry, Iron acquisition in plague: modular logic in enzymatic biogenesis of yersiniabactin by Yersinia pestis. Chem. Biol. 5, 573-586, 1998). This MT domain in EPOS D is proposed to be responsible for the incorporation of the second methyl side group (C21 or C22) at C4.
epoE (nucleotides 54935-62254 of SEQ ID NO:1) codes for EPOS E (SEQ ID NO:7), a type I polyketide synthase consisting of one module, harboring a KS (nucleotides 55028-56284 of SEQ ID NO:1, amino acids 32-450 of SEQ ID NO:7); a malonyl CoA-specific AT (nucleotides 56600-57565 of SEQ ID NO:1, amino acids 556-877 of SEQ ID NO:7); a DH (nucleotides 57593-58087 of SEQ ID NO:1, amino acids 887-1051 of SEQ ID NO:7); a probably nonfunctional ER (nucleotides 59366-60304 of SEQ ID NO:1, amino acids 1478-1790 of SEQ ID NO:7); a KR (nucleotides 60362-61099 of SEQ ID NO:1, amino acids 1810-2055 of SEQ ID NO:7); an ACP (nucleotides 61211-61426 of SEQ ID NO:1, amino acids 2093-2164 of SEQ ID NO:7); and a thioesterase (TE) (nucleotides 61427-62254 of SEQ ID NO:1, amino acids 2165-2439 of SEQ ID NO:7). The ER domain in this module harbors an active site motif with some highly unusual amino acid substitutions that probably render this domain inactive. The module incorporates an acetate extender unit (C2-C1), and reduces the β-keto at C3 to an enoyl group. Epothilones contain a hydroxyl group at C3, so this reduction also appears to be excessive as discussed for the second module of EPOS D. The TE domain of EPOS E takes part in the release and cyclization of the grown polyketide chain via lactonization between the carboxyl group of C1 and the hydroxyl group of C15.
Five ORFs are detected upstream of epoA in the sequenced region. The partially sequenced orf1 has no homologues in the sequence databanks. The deduced protein product (Orf 2, SEQ ID NO:10) of orf2 (nucleotides 3171-1900 on the reverse complement strand of SEQ ID NO:1) shows strong similarities to hypothetical ORFs from Mycobacterium and Streptomyces coelicolor, and more distant similarities to carboxypeptidases and DD-peptidases of different bacteria.
The deduced protein product of orf3 (nucleotides 3415-5556 of SEQ ID NO:1), Orf 3 (SEQ ID NO:11), shows homologies to Na/H antiporters of different bacteria. Orf 3 might take part in the export of epothilones from the producer strain. orf4 and orf5 have no homologues in the sequence databanks.
Eleven ORFs are found downstream of epoE in the sequenced region. epoF (nucleotides 62369-63628 of SEQ ID NO:1) codes for EPOS F (SEQ ID NO:8), a deduced protein with strong sequence similarities to cytochrome P450 oxygenases. EPOS F may take part in the adjustment of the redox state of the carbons C12, C5, and/or C3. The deduced protein product of orf14 (nucleotides 67334-68251 of SEQ ID NO:1), Orf 14 (SEQ ID NO:22), shows strong similarities to GI:3293544, a hypothetical protein with no proposed function from Streptomyces coelicolor, and also to GI:2654559, the human embryonic lung protein. It is also more distantly related to cation efflux system proteins like GI:2623026 from Methanobacterium thermoautotrophicum, so it might also take part in the export of epothilones from the producing cells. The remaining ORFs (orf6-orf13 and orf15) show no homologies to entries in the sequence databanks.
Example 13: Recombinant Expression of Epothilone Biosynthesis Genes
Epothilone synthase genes according to the present invention are expressed in heterologous organisms for the purposes of epothilone production at greater quantities than can be accomplished by fermentation of Sorangium cellulosum. A preferable host for heterologous expression is Streptomyces, e.g. Streptomyces coelicolor, which natively produces the polyketide actinorhodin. Techniques for recombinant PKS gene expression in this host are described in McDaniel et al., Science 262:1546-1550 (1993) and Kao et al., Science 265: 509-512 (1994). See also, Holmes et al., EMBO Journal 12(8): 3183-3191 (1993) and Bibb et al., Gene 38: 215-226 (1985), as well as U.S. Patent Nos. 5,521,077, 5,672,491, and 5,712,146, which are incorporated herein by reference.
According to one method, the heterologous host strain is engineered to contain a chromosomal deletion of the actinorhodin (act) gene cluster. Expression plasmids containing the epothilone synthase genes of the invention are constructed by transferring DNA from a temperature-sensitive donor plasmid to a recipient shuttle vector in E. coli (McDaniel et al. (1993) and Kao et al. (1994)), such that the synthase genes are built-up by homologous recombination within the vector. Alternatively, the epothilone synthase gene cluster is introduced into the vector by restriction fragment ligation. Following selection, e.g. as described in Kao et al. (1994), DNA from the vector is introduced into the act-minus Streptomyces coelicolor strain according to protocols set forth in Hopwood et al., Genetic Manipulation of Streptomyces. A Laboratory Manual (John Innes Foundation, Norwich, United Kingdom, 1985), incorporated herein by reference. The recombinant Streptomyces strain is grown on R2YE medium (Hopwood et al. (1985)) and produces epothilones.
Alternatively, the epothilone synthase genes according to the present invention are expressed in other host organisms such as pseudomonads, Bacillus, yeast, insect cells and/or E. coli. PKS and NRPS genes are preferably expressed in E. coli using the pT7-7 vector, which uses the T7 promoter. See, Tabor et al., Proc. Natl. Acad. Sci. USA 82:1074-1078 (1985). In another embodiment, the expression vectors pKK223-3 and pKK223-2 are used to express PKS and NRPS genes in E. coli, either in transcriptional or translational fusion, behind the tac or trc promoter. Expression of PKS and NRPS genes in heterologous hosts, which do not naturally have the phosphopantetheinyl (P-pant) transferases needed for posttranslational modification of PKS enzymes, requires the coexpression in the host of a P-pant transferase, as described by Kealey et al., Proc. Natl. Acad. Sci. USA 95: 505-509 (1998).
Example 14: Isolation of Epothilones from Producing Strains
Examples of cultivation, fermentation, and extraction procedures for polyketide isolation, which are useful for extracting epothilones from both native and recombinant hosts according to the present invention, are given in WO 93/10121, incorporated herein by reference, in Example 57 of U.S. Patent No. 5,639,949, in Gerth et al., J. Antibiotics 49: 560-563 (1996), and in Swiss patent application no. 396/98, filed February 19, 1998, and U.S. patent application no. 09/248,910 (that discloses also preferred mutant strains of Sorangium cellulosum), both of which are incorporated herein by reference. The following are procedures that are useful for isolating epothilones from cultured Sorangium cellulosum strains such as So ce90, and may also be used for the isolation of epothilone from recombinant hosts.
A: Cultivation of epothilone-producing strains: Strain: Sorangium cellulosum Soce-90 or a recombinant host strain according to the present invention.
Preservation of the strain: In liquid N2
Media:
Precultures and intermediate cultures: G52
Main culture: 1B12
G52 Medium:
yeast extract, low in salt (BioSpringer, Maison Alfort, France)    2 g/l
MgSO4 (7 H2O)    1 g/l
CaCl2 (2 H2O)    1 g/l
soya meal defatted Soyamine 50T (Lucas Meyer, Hamburg, Germany)    2 g/l
potato starch Noredux A-150 (Blattmann, Waedenswil, Switzerland)    8 g/l
glucose anhydrous    2 g/l
EDTA-Fe(III)-Na salt (8 g/l)    1 ml/l
pH 7.4, corrected with KOH
Sterilisation: 20 mins. 120 °C
1B12 Medium:
potato starch Noredux A-150 (Blattmann, Waedenswil, Switzerland)    20 g/l
soya meal defatted Soyamine 50T (Lucas Meyer, Hamburg, Germany)    11 g/l
EDTA-Fe(III)-Na salt    8 mg/l
pH 7.8, corrected with KOH
Sterilisation: 20 mins. 120 °C
Addition of cyclodextrins and cyclodextrin derivatives: Cyclodextrins (Fluka, Buchs, Switzerland, or Wacker Chemie, Munich, Germany) in different concentrations are sterilised separately and added to the 1B12 medium prior to seeding.
Cultivation: 1 ml of the suspension of Sorangium cellulosum Soce-90 from a liquid N2 ampoule is transferred to 10 ml of G52 medium (in a 50 ml Erlenmeyer flask) and incubated for 3 days at 180 rpm in an agitator at 30°C, 25 mm displacement. 5 ml of this culture is added to 45 ml of G52 medium (in a 200 ml Erlenmeyer flask) and incubated for 3 days at 180 rpm in an agitator at 30°C, 25 mm displacement. 50 ml of this culture is then added to 450 ml of G52 medium (in a 2 litre Erlenmeyer flask) and incubated for 3 days at 180 rpm in an agitator at 30°C, 50 mm displacement.
Maintenance culture: The culture is overseeded every 3-4 days, by adding 50 ml of culture to 450 ml of G52 medium (in a 2 litre Erlenmeyer flask). All experiments and fermentations are carried out by starting with this maintenance culture.
Tests in a flask:
(i) Preculture in an agitating flask: Starting with the 500 ml of maintenance culture, 1 x 450 ml of G52 medium are seeded with 50 ml of the maintenance culture and incubated for 4 days at 180 rpm in an agitator at 30°C, 50 mm displacement.
(ii) Main culture in the agitating flask: ml of 1B12 medium plus 5 g/l 4-morpholine-propane-sulfonic acid (MOPS) powder (in a 200 ml Erlenmeyer flask) are mixed with 5 ml of a 10x concentrated cyclodextrin solution, seeded with 10 ml of preculture and incubated for 5 days at 180 rpm in an agitator at 30°C, mm displacement.
Fermentation: Fermentations are carried out on a scale of 10 litres, 100 litres and 500 litres.
20 litre and 100 litre fermentations serve as an intermediate culture step. Whereas the precultures and intermediate cultures are seeded in the same way as the maintenance culture (10%), the main cultures are seeded with 20% of the intermediate culture. Important: In contrast to the agitating cultures, the ingredients of the media for the fermentation are calculated on the final culture volume including the inoculum. If, for example, 18 litres of medium and 2 litres of inoculum are combined, then substances for 20 litres are weighed in, but are only mixed with 18 litres.
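The weigh-in rule can be made concrete with a short calculation. The Python sketch below uses the solid components of the G52 medium as listed earlier in this Example and the 18 litre plus 2 litre case from the text; the function and dictionary names are illustrative assumptions, and the liquid EDTA-Fe(III)-Na stock is left out for simplicity.

```python
# Solid components of G52 medium in g per litre, as listed in this Example.
G52_G_PER_L = {
    "yeast extract": 2.0,
    "MgSO4 (7 H2O)": 1.0,
    "CaCl2 (2 H2O)": 1.0,
    "soya meal": 2.0,
    "potato starch": 8.0,
    "glucose": 2.0,
}

def weigh_in(recipe_g_per_l: dict, medium_l: float, inoculum_l: float) -> dict:
    """Grams of each component, calculated on the final culture volume.

    The components are weighed for medium plus inoculum, although they are
    dissolved in the medium volume only.
    """
    final_volume_l = medium_l + inoculum_l
    return {name: conc * final_volume_l for name, conc in recipe_g_per_l.items()}

def inoculum_fraction(medium_l: float, inoculum_l: float) -> float:
    """Inoculum as a fraction of the final culture volume."""
    return inoculum_l / (medium_l + inoculum_l)

# 18 litres of medium seeded with 2 litres of inoculum (20 litres final).
amounts_g = weigh_in(G52_G_PER_L, medium_l=18, inoculum_l=2)
seed_fraction = inoculum_fraction(18, 2)
```

For this case the substances are weighed for 20 litres, for example 160 g of potato starch rather than the 144 g that 18 litres alone would suggest, and the inoculum makes up 10% of the final volume.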
Preculture in an agitating flask: Starting with the 500 ml maintenance culture, 4 x 450 ml of G52 medium (in a 2 litre Erlenmeyer flask) are each seeded with 50 ml thereof, and incubated for 4 days at 180 rpm in an agitator at 30°C, 50 mm displacement.
Intermediate culture, 20 litres or 100 litres:
20 litres: 18 litres of G52 medium in a fermenter having a total volume of 30 litres are seeded with 2 litres of the preculture. Cultivation lasts for 3-4 days, and the conditions are: 30°C, 250 rpm, 0.5 litres of air per litre liquid per min, 0.5 bars excess pressure, no pH control.
100 litres: 90 litres of G52 medium in a fermenter having a total volume of 150 litres are seeded with 10 litres of the 20 litre intermediate culture. Cultivation lasts for 3-4 days, and the conditions are: 30°C, 150 rpm, 0.5 litres of air per litre liquid per min, 0.5 bars excess pressure, no pH control.
Main culture, 10 litres, 100 litres or 500 litres:
10 litres: The media substances for 10 litres of 1B12 medium are sterilised in 7 litres of water, then 1 litre of a sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution is added, and seeded with 2 litres of a 20 litre intermediate culture. The duration of the main culture is 6-7 days, and the conditions are: 30°C, 250 rpm, 0.5 litres of air per litre of liquid per min, 0.5 bars excess pressure, pH control with H2SO4/KOH to pH 7.6 ± 0.5 (i.e. no control between pH 7.1 and 8.1).
100 litres: The media substances for 100 litres of 1B12 medium are sterilised in 70 litres of water, then 10 litres of a sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution are added, and seeded with 20 litres of a 20 litre intermediate culture. The duration of the main culture is 6-7 days, and the conditions are: 30°C, 200 rpm, 0.5 litres air per litre liquid per min., 0.5 bars excess pressure, pH control with H2SO4/KOH to pH 7.6 ± 0.5. The chain of seeding for a 100 litre fermentation is shown schematically as follows:
maintenance culture (500 ml), G52 medium -> precultures (4 x 500 ml), G52 medium -> intermediate culture (20 l), G52 medium -> main culture (100 l), 1B12 medium + HP-β-CD
500 litres: The media substances for 500 litres of 1B12 medium are sterilised in 350 litres of water, then 50 litres of a sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution are added, and seeded with 100 litres of a 100 litre intermediate culture. The duration of the main culture is 6-7 days, and the conditions are: 30°C, 120 rpm, 0.5 litres air per litre liquid per min., 0.5 bars excess pressure, pH control with H2SO4/KOH to pH 7.6.
Product analysis:
Preparation of the sample: [...] ml samples are mixed with 2 ml of polystyrene resin Amberlite XAD16 (Rohm & Haas, Frankfurt, Germany) and shaken at 180 rpm for one hour at 30°C. The resin is subsequently filtered using a 150 µm nylon sieve, washed with a little water and then added together with the filter to a 15 ml Nunc tube.
Elution of the product from the resin: [...] ml of isopropanol are added to the tube with the filter and the resin. Afterwards, the sealed tube is shaken for 30 minutes at room temperature on a Rota-Mixer (Labinco BV, Netherlands). Then, 2 ml of the liquid are centrifuged off and the supernatant is added using a pipette to HPLC tubes.
HPLC analysis:
Column: Waters Symmetry C18, 100 x 4 mm, 3.5 µm (WAT066220), with preliminary column 3.9 x 20 mm (WAT054225)
Solvents: A: 0.02% phosphoric acid; B: acetonitrile (HPLC quality)
Gradient: 41% B from 0 to 7 min; 100% B from 7.2 to 7.8 min; 41% B from 8 to 12 min
Oven temp.: [...]
Detection: 250 nm, UV-DAD detection
Injection vol.: 10 µl
Retention time: Epo A: 4.30 min; Epo B: 5.38 min

B: Effect of the addition of cyclodextrin and cyclodextrin derivatives on the epothilone concentrations attained.
Cyclodextrins are cyclic (α-1,4)-linked oligosaccharides of α-D-glucopyranose with a relatively hydrophobic central cavity and a hydrophilic external surface area.
The following are distinguished in particular (the figures in parenthesis give the number of glucose units per molecule): α-cyclodextrin (6), β-cyclodextrin (7), γ-cyclodextrin (8), δ-cyclodextrin (9), ε-cyclodextrin (10), ζ-cyclodextrin (11), η-cyclodextrin (12) and θ-cyclodextrin (13). Especially preferred are δ-cyclodextrin and in particular α-cyclodextrin, β-cyclodextrin or γ-cyclodextrin, or mixtures thereof.
Cyclodextrin derivatives are primarily derivatives of the above-mentioned cyclodextrins, especially of α-cyclodextrin, β-cyclodextrin or γ-cyclodextrin, primarily those in which one or more up to all of the hydroxy groups (3 per glucose radical) are etherified or esterified. Ethers are primarily alkyl ethers, especially lower alkyl, such as methyl or ethyl ether, also propyl or butyl ether; the aryl-hydroxyalkyl ethers, such as phenyl-hydroxy-lower-alkyl, especially phenyl-hydroxyethyl ether; the hydroxyalkyl ethers, in particular hydroxy-lower-alkyl ethers, especially 2-hydroxyethyl, hydroxypropyl such as 2-hydroxypropyl or hydroxybutyl such as 2-hydroxybutyl ether; the carboxyalkyl ethers, in particular carboxy-lower-alkyl ethers, especially carboxymethyl or carboxyethyl ether; derivatised carboxyalkyl ethers, in particular derivatised carboxy-lower-alkyl ether in which the derivatised carboxy is etherified or amidated carboxy (primarily aminocarbonyl, mono- or di-lower-alkyl-aminocarbonyl, morpholino-, piperidino-, pyrrolidino- or piperazino-carbonyl, or alkyloxycarbonyl), in particular lower alkoxycarbonyl-lower-alkyl ether, for example methyloxycarbonylpropyl ether or ethyloxycarbonylpropyl ether; the sulfoalkyl ethers, in particular sulfo-lower-alkyl ethers, especially sulfobutyl ether; cyclodextrins in which one or more OH groups are etherified with a radical of formula -O-[alk-O-]n-H wherein alk is alkyl, especially lower alkyl, and n is a whole number from 2 to 12, especially 2 to 5, in particular 2 or 3; cyclodextrins in which one or more OH groups are etherified with a radical of formula
-O-[alk(-R')-O-]m-alk-CO-Y
wherein R' is hydrogen, hydroxy, -O-(alk-O)z-H, or -O-(alk(-R)-O-)q-alk-CO-Y; alk in all cases is alkyl, especially lower alkyl; m, n, p, q and z are a whole number from 1 to 12, preferably 1 to 5, in particular 1 to 3; and Y is OR1 or NR2R3, wherein R1, R2 and R3, independently of one another, are hydrogen or lower alkyl, or R2 and R3 combined together with the linking nitrogen signify morpholino, piperidino, pyrrolidino or piperazino; or branched cyclodextrins, in which etherifications or acetals with other sugar molecules are present, especially glucosyl-, diglucosyl- (G2-β-cyclodextrin), maltosyl- or dimaltosyl-cyclodextrin, or N-acetylglucosaminyl-, glucosaminyl-, N-acetylgalactosaminyl- or galactosaminyl-cyclodextrin.
Esters are primarily alkanoyl esters, in particular lower alkanoyl esters, such as acetyl esters of cyclodextrins.
It is also possible to have cyclodextrins in which two or more different said ether and ester groups are present at the same time.
Mixtures of two or more of the said cyclodextrins and/or cyclodextrin derivatives may also exist.
Preference is given in particular to β- or γ-cyclodextrins or the lower alkyl ethers thereof, such as methyl-β-cyclodextrin or in particular 2,6-di-O-methyl-β-cyclodextrin, or in particular the hydroxy lower alkyl ethers thereof, such as 2-hydroxypropyl-α-, 2-hydroxypropyl-β- or 2-hydroxypropyl-γ-cyclodextrin.
The cyclodextrins or cyclodextrin derivatives are added to the culture medium preferably in a concentration of 0.02 to 10, preferably 0.05 to 5, especially 0.1 to 4, for example 0.1 to 2 percent by weight. Cyclodextrins or cyclodextrin derivatives are known or may be produced by known processes (see for example US 3,459,731; US 4,383,992; US 4,535,152; US 4,659,696; EP 0 094 157; EP 0 149 197; EP 0 197 571; EP 0 300 526; EP 0 320 032; EP 0 499 322; EP 0 503 710; EP 0 818 469; WO 90/12035; WO 91/11200; WO 93/19061; WO 95/08993; WO 96/14090; GB 2,189,245; DE 3,118,218; DE 3,317,064 and the references mentioned therein, which also refer to the synthesis of cyclodextrins or cyclodextrin derivatives, or also: T. Loftsson and M.E. Brewster (1996): Pharmaceutical Applications of Cyclodextrins: Drug Solubilization and Stabilisation: Journal of Pharmaceutical Science 85 (10): 1017-1025; R.A. Rajewski and V.J. Stella (1996): Pharmaceutical Applications of Cyclodextrins: In Vivo Drug Delivery: Journal of Pharmaceutical Science 85 (11): 1142-1169).
All the cyclodextrin derivatives tested here are obtainable from the company Fluka, Buchs, CH. The tests are carried out in 200 ml agitating flasks with 50 ml culture volume. As controls, flasks with adsorber resin Amberlite XAD-16 (Rohm & Haas, Frankfurt, Germany) and without any adsorber addition are used. After incubation for 5 days, the following epothilone titres can be determined by HPLC:

Table 2:
Addition (order No.); Conc. [% w/v] 1); Epo A [mg/l]; Epo B [mg/l]
Amberlite XAD-16; 2.0; 9.2; 3.8
2-hydroxypropyl-β-cyclodextrin (56332); 0.1; 2.7; 1.7
2-hydroxypropyl-β-cyclodextrin; 0.5; 4.7; 3.3
2-hydroxypropyl-β-cyclodextrin; 1.0; 4.7; 3.4
2-hydroxypropyl-β-cyclodextrin; 2.0; 4.7; 4.1
2-hydroxypropyl-β-cyclodextrin; 5.0; 1.7
2-hydroxypropyl-α-cyclodextrin (56330); 0.5; 1.2; 1.2
2-hydroxypropyl-α-cyclodextrin; 1.0; 1.2; 1.2
2-hydroxypropyl-α-cyclodextrin; 5.0; 2.5; 2.3
β-cyclodextrin (28707); 0.1; 1.6; 1.3
β-cyclodextrin; 0.5; 3.6
β-cyclodextrin; 1.0; 4.8; 3.7
β-cyclodextrin; 2.0; 4.8; 2.9
β-cyclodextrin; 5.0; 1.1; 0.4
methyl-β-cyclodextrin (66292); 0.5; 0.8; <0.3
methyl-β-cyclodextrin; 1.0; <0.3; <0.3
methyl-β-cyclodextrin; 2.0; <0.3; <0.3
2,6-di-O-methyl-β-cyclodextrin (139915); 1.0; <0.3; <0.3
2-hydroxypropyl-γ-cyclodextrin (56334); 0.1; 0.3; <0.3
2-hydroxypropyl-γ-cyclodextrin; 0.5; 0.9; 0.8
2-hydroxypropyl-γ-cyclodextrin; 1.0; 1.1; 0.7
2-hydroxypropyl-γ-cyclodextrin; 2.0; 2.6; 0.7
2-hydroxypropyl-γ-cyclodextrin; 5.0; 5.0; 1.1
no addition; 0.5
1) Apart from Amberlite, all percentages are by weight.

Few of the cyclodextrins tested (2,6-di-O-methyl-β-cyclodextrin, methyl-β-cyclodextrin) display no effect or a negative effect on epothilone production at the concentrations used.
1-2% 2-hydroxypropyl-β-cyclodextrin and β-cyclodextrin increase epothilone production in the examples by 6 to 8 times compared with production using no cyclodextrins.
C: 10 litre fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin: Fermentation is carried out in a 15 litre glass fermenter. The medium contains 10 g/l of 2-(hydroxypropyl)-β-cyclodextrin from Wacker Chemie, Munich, Germany. The progress of fermentation is illustrated in Table 3. Fermentation is ended after 6 days and working up takes place.
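The "1%" in the heading and the "10 g/l" in the text are the same concentration, and the main-culture recipe reaches it by adding 1 litre of a 10% stock solution per 10 litres of final volume. A minimal Python sketch of that conversion follows; the function names are hypothetical and the percentages are taken as weight per volume, which is how the text uses them.

```python
def percent_wv_to_g_per_l(percent_wv):
    """1% (w/v) is 1 g per 100 ml, i.e. 10 g per litre."""
    return percent_wv * 10.0


def stock_volume_l(final_volume_l, final_percent, stock_percent):
    """Volume of stock solution needed to reach final_percent in final_volume_l."""
    return final_volume_l * final_percent / stock_percent


print(percent_wv_to_g_per_l(1.0))  # g/l in the finished medium
for final_l in (10, 100, 500):
    # litres of 10% stock needed for a 1% final concentration
    print(final_l, stock_volume_l(final_l, 1.0, 10.0))
```

The three stock volumes it prints (1, 10 and 50 litres) match the amounts of cyclodextrin solution stated for the 10, 100 and 500 litre main cultures.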
Table 3: Progress of a 10 litre fermentation
duration of culture [d]; Epothilone A [mg/l]; Epothilone B [mg/l]
0; 0; 0
1; 0; 0
2; 0.5; 0.3
3; 1.8
4; 3.0; 5.1
5; 3.7; 5.9
6; 3.6; 5.7

D: 100 litre fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin: Fermentation is carried out in a 150 litre fermenter. The medium contains 10 g/l of 2-(hydroxypropyl)-β-cyclodextrin. The progress of fermentation is illustrated in Table 4. The fermentation is harvested after 7 days and worked up.
Table 4: Progress of a 100 litre fermentation
duration of culture [d]; Epothilone A [mg/l]; Epothilone B [mg/l]
0; 0; 0
1; 0; 0
2; 0
3; 0.9; 1.1
4; 1.5; 2.3
5; 1.6; 3.3
6; 1.8; 3.7
7; 1.8

E: 500 litre fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin: Fermentation is carried out in a 750 litre fermenter. The medium contains 10 g/l of 2-(hydroxypropyl)-β-cyclodextrin. The progress of fermentation is illustrated in Table 5. The fermentation is harvested after 7 days and worked up.
Table 5: Progress of a 500 litre fermentation
duration of culture [d]; Epothilone A [mg/l]; Epothilone B [mg/l]
0; 0; 0
1; 0; 0
2; 0; 0
3; 0.6; 0.6
4; 1.7; 2.2; 3.1
6; 3.1; 5.1

F: Comparison example: 10 litre fermentation without adding an adsorber: Fermentation is carried out in a 15 litre glass fermenter. The medium does not contain any cyclodextrin or other adsorber. The progress of fermentation is illustrated in Table 6.
The fermentation is not harvested and worked up.
Table 6: Progress of a 10 litre fermentation without adsorber.
duration of culture [d]; Epothilone A [mg/l]; Epothilone B [mg/l]
0; 0; 0
1; 0; 0
2; 0; 0
3; 0; 0
4; 0.7; 0.7; 0.7
6; 0.8; 1.3

G: Working up of the epothilones: Isolation from a 500 litre main culture: The volume of harvest from the 500 litre main culture of example 2D is 450 litres and is separated using a Westfalia clarifying separator Type SA-20-06 (rpm = 6500) into the liquid phase (centrifugate + rinsing water = 650 litres) and solid phase (cells = ca. 15 kg).
The main part of the epothilones is found in the centrifugate. The centrifuged cell pulp contains 15% of the determined epothilone portion and is not further processed. The 650 litre centrifugate is then placed in a 4000 litre stirring vessel, mixed with 10 litres of Amberlite XAD-16 (centrifugate:resin volume 65:1) and stirred. After a period of contact of ca. 2 hours, the resin is centrifuged away in a Heine overflow centrifuge (basket content [...] litres; rpm 2800). The resin is discharged from the centrifuge and washed with 10-15 litres of deionised water. Desorption is effected by stirring the resin twice, each time in portions with 30 litres of isopropanol in 30 litre glass stirring vessels for 30 minutes.
Separation of the isopropanol phase from the resin takes place using a suction filter. The isopropanol is then removed from the combined isopropanol phases by adding 15-20 litres of water in a vacuum-operated circulating evaporator (Schmid-Verdampfer) and the resulting water phase of ca. 10 litres is extracted 3x each time with 10 litres of ethyl acetate.
Extraction is effected in 30 litre glass stirring vessels. The ethyl acetate extract is concentrated to 3-5 litres in a vacuum-operated circulating evaporator (Schmid-Verdampfer) and afterwards concentrated to dryness in a rotary evaporator (Büchi type) under vacuum.
The result is an ethyl acetate extract of 50.2 g. The ethyl acetate extract is dissolved in 500 ml of methanol, the insoluble portions filtered off using a folded filter, and the solution added to a 10 kg Sephadex LH 20 column (Pharmacia, Uppsala, Sweden) (column diameter 20 cm, filling level ca. 1.2 m). Elution is effected with methanol as eluant.
Epothilones A and B are present predominantly in fractions 21-23 (at a fraction size of 1 litre).
These fractions are concentrated to dryness in a vacuum on a rotary evaporator (total weight 9.0 g). These Sephadex peak fractions (9.0 g) are thereafter dissolved in 92 ml of acetonitrile:water:methylene chloride 50:40:2, the solution filtered through a folded filter and added to a RP column (equipment Prepbar 200, Merck; 2.0 kg LiChrospher RP-18 Merck, grain size 12 µm, column diameter 10 cm, filling level 42 cm; Merck, Darmstadt, Germany). Elution is effected with acetonitrile:water 3:7 (flow rate 500 ml/min.; retention time of epothilone A ca. 51-59 mins.; retention time of epothilone B ca. 60-69 mins.).
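The column geometry, flow rate and retention windows given for the reversed-phase step are enough to estimate the bed volume and the eluent volume at which each epothilone leaves the column. The Python sketch below does that arithmetic. The function names are hypothetical, and the bed volume is the plain geometric cylinder volume, which is an assumption that ignores packing and dead volume.

```python
import math


def bed_volume_l(diameter_cm, height_cm):
    """Geometric volume of a cylindrical column bed, in litres."""
    return math.pi * (diameter_cm / 2) ** 2 * height_cm / 1000.0


def elution_volume_l(flow_ml_per_min, time_min):
    """Eluent volume that has passed through the column after time_min."""
    return flow_ml_per_min * time_min / 1000.0


bed = bed_volume_l(10, 42)  # RP column: 10 cm diameter, 42 cm filling level
windows = {"epothilone A": (51, 59), "epothilone B": (60, 69)}
for name, (start_min, end_min) in windows.items():
    v_start = elution_volume_l(500, start_min)
    v_end = elution_volume_l(500, end_min)
    print(f"{name}: {v_start:.1f}-{v_end:.1f} l "
          f"({v_start / bed:.1f}-{v_end / bed:.1f} bed volumes)")
```

Under these assumptions the bed holds roughly 3.3 litres, so both epothilones elute only after several bed volumes of acetonitrile:water 3:7 have passed.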
Fractionation is monitored with a UV detector at 250 nm. The fractions are concentrated to dryness under vacuum on a Büchi Rotavapor rotary evaporator. The weight of the epothilone A peak fraction is 700 mg, and according to HPLC (external standard) it has a content of 75.1%. That of the epothilone B peak fraction is 1980 mg, and the content according to HPLC (external standard) is 86.6%. Finally, the epothilone A fraction (700 mg) is crystallised from 5 ml of ethyl acetate:toluene 2:3, and yields 170 mg of epothilone A pure crystallisate (content according to HPLC: [...]% of area). Crystallisation of the epothilone B fraction (1980 mg) is effected from 18 ml of methanol and yields 1440 mg of epothilone B pure crystallisate (content according to HPLC: [...]% of area). m.p. (epothilone B): e.g. 124-125 °C; 1H-NMR data for epothilone B: 500 MHz-NMR, solvent: DMSO-d6. Chemical displacement δ in ppm relative to TMS. s = singlet; d = doublet; m = multiplet.
δ (multiplicity), integral (number of H): 7.34, 1; 6.50, 1; 5.28, 1; 5.08, 1; 4.46, 1; 4.08 (m); 3.47 (m); 3.11 (m); 2.83 (dd); 2.64 (s); 2.36 (m); 2.09 (s); 2.04 (m); 1.83 (m); 1.61 (m); 1.47-1.24 (m); 1.18 (s); 1.13 (m); 1.06 (d); 0.89 (d, s, overlapping); 6 2 3 6; Σ = 41

Example 15: Medical Uses of Recombinantly Produced Epothilones
Pharmaceutical preparations or compositions comprising epothilones are used for example in the treatment of cancerous diseases, such as various human solid tumors.
Such anticancer formulations comprise, for example, an active amount of an epothilone together with one or more organic or inorganic, liquid or solid, pharmaceutically suitable carrier materials. Such formulations are delivered, for example, enterally, nasally, rectally, orally, or parenterally, particularly intramuscularly or intravenously. The dosage of the active ingredient is dependent upon the weight, age, and physical and pharmacokinetical condition of the patient and is further dependent upon the method of delivery. Because epothilones mimic the biological effects of taxol, epothilones may be substituted for taxol in compositions and methods utilizing taxol in the treatment of cancer. See, for example, U.S. Patent Nos. 5,496,804, 5,565,478, and 5,641,803, all of which are incorporated herein by reference.
For example, for treatments, epothilone B is supplied in individual 2 ml glass vials formulated as 1 mg/1 ml of clear, colorless intravenous concentrate. The substance is formulated in polyethylene glycol 300 (PEG 300) and diluted with 50 or 100 ml 0.9% Sodium Chloride Injection, USP, to achieve the desired final concentration of the drug for infusion. It is administered as a single 30-minute intravenous infusion every 21 days (treatment three-weekly) for six cycles, or as a single 30-minute intravenous infusion every 7 days (weekly treatment).
Preferably, for weekly treatment, the dose is between about 0.1 and about 6, preferably about 0.1 and about 5 mg/m², more preferably about 0.1 and about 3 mg/m², even more preferably 0.1 and 1.7 mg/m², most preferably about 0.3 and about 1 mg/m²; for three-weekly treatment (treatment every three weeks or every third week) the dose is between about 0.3 and about 18 mg/m², preferably about 0.3 and about 15 mg/m², more preferably about 0.3 and about 12 mg/m², even more preferably about 0.3 and about [...] mg/m², still more preferably about 0.3 and about 5 mg/m², most preferably about 1.0 and about 3.0 mg/m². This dose is preferably administered to the human by intravenous administration during 2 to 180 min, preferably 2 to 120 min, more preferably during about [...] to about 30 min, most preferably during about 10 to about 30 min, e.g. during about 30 min.
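The relation between a dose stated in mg/m², the 1 mg/ml concentrate and the diluted infusion is simple arithmetic, sketched below in Python. The function name and the body surface area of 1.8 m² are illustrative assumptions and do not come from the specification; the sketch also assumes that the 50 or 100 ml of sodium chloride solution is the volume added to the concentrate rather than the final volume.

```python
def infusion_plan(dose_mg_per_m2, body_surface_m2, diluent_ml,
                  concentrate_mg_per_ml=1.0):
    """Hypothetical helper: absolute dose, concentrate volume, final concentration."""
    dose_mg = dose_mg_per_m2 * body_surface_m2
    concentrate_ml = dose_mg / concentrate_mg_per_ml
    # assumption: the diluent is added to the concentrate, so the volumes sum
    final_mg_per_ml = dose_mg / (diluent_ml + concentrate_ml)
    return dose_mg, concentrate_ml, final_mg_per_ml


# 1.0 mg/m2 lies inside the stated weekly range; 1.8 m2 is an assumed value
dose_mg, concentrate_ml, final_conc = infusion_plan(1.0, 1.8, 100)
print(f"{dose_mg:.2f} mg = {concentrate_ml:.2f} ml concentrate, "
      f"{final_conc:.4f} mg/ml after dilution")
```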
While the present invention has been described with reference to specific embodiments thereof, it will be appreciated that numerous variations, modifications, and embodiments are possible, and accordingly, all such variations, modifications and embodiments are to be regarded as being within the spirit and scope of the present invention.
Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.
The reference to any prior art in this specification is not, and should not be taken as, an acknowledgment or any form of suggestion that that prior art forms part of the common general knowledge in Australia.
EDITORIAL NOTE 46116/99 SEQUENCE LISTING PAGES 1 TO 88 FOLLOW PAGE 61 OF THE
DESCRIPTION.
SEQUENCE LISTING
<110> Novartis AG
<120> GENES FOR THE BIOSYNTHESIS OF EPOTHILONES
<130> 4-30582A
<140>
<141>
<160>
<170> PatentIn Ver.
<210> 1
<211> 68750
<212> DNA
<213> Sorangium cellulosum
<400> 1
aagct tcgc t acggccgggc ccct.ccgaga gcgacctuac atgccgcag cgctCgtccg agcgcgagcg ccrtgcttcgc cgatgtcgcc agctCgcCtg atcccgccta tcgccgcggC tcgcctcctg tgtgggagcg tcgagcgcgc tcgcggcggg tcgtgacca ccggcgat cgcccctcag tgacggaagg tccacgctcg gcgcgCCCtt gtctccacga ggtgcgagcz ccccgagccgtcaccctcaa tcgacaagcg acgtcgtgtc ggcacgtcga tacgggaccg gcatcctgac ccggaccgag gcagacgctg catcacgatc atcccaccce cccataaccc ggcCggcgtc gccctcgtgC cgacacgagc cccgtcgcgz cgcgaaccgz cgcgaggatc caccccctgc cggcagcgtc cgacaccc cc cacggagcgg gcaccr.ccgc aatggagctc cctgaggcac gtggctcgc agcccgaacc gcccgatctg tgaagtcgcc tgccgcgctg cgagatgctg ctccgcgccc ggaggtcgta gctccggacg cgaggcgatc cgccggcctg cggagacgca cscgccagtc ccagatgctc cagccccctc cgggttcatg cgtcgtccag gcccgccggC .tcggctggc cz.:cgcgtac Egacgacccac caccggcgta cgaggaccgq gcacatccgc ccctgacatc gctCgCCCgC ctggcgaCc fcgccagaaac ccgatcagcl tagcgcccg5 cagccgtgcc, aggtgatcg acaccgccgl ccggcggCg( i tccatgcgci g cgagcttg ;cgctCgCgc, accagctC ccgagcCgc ttcgcccgcg catgtgctcg atccaggaag ctcgcgcacg cccgactgga gcgcgcggcg gcgcaggagg ccccgcttcg gaggccgagc ctcgcCtggC ccagagaatc ggcacatcgg tcgagcaaga atcg'tccgcg gcggcggagg gcggtcgCtg ttgtactccg gtgctgctcg ttcgtcgcgc atcgtgatgg gcgtgggtca cgctcgacca agcgctttct cggctcgagc ctcggcgagc gtgctgtggg gagcccatcg rgacatcttcg :tccggcgcct Igtgctcaatc -agctgacatc :gccgctggcc- :gctcgagagc :cgcggCCCgq ;gcacctccgz ;tctcgaagaa ;ccgtcatctc :tcatgagcal ccgaagggai ;tcttCtCCCC t ccgccgggac a cgaacgcat( c ccgtgttgti c ggatccact( ccacctc~iac ccgaggcgcg gggggccatc accagcccc:: cctcggacgc cgccgggtcC cgaggcgcct aggacgacgc ggcgcczccg tcgggacggg tgctcctcgg aggccgctct agagccagct cgatgggcaa tgcgccgcct gggtctcc.C gcgacggcaa ccggaaccga acgccaac caagaaacca accaggccat tcatggaatt ccctcgcctg tatggcocca accccazzgc ccgaccczga tcctcgcgga cgcttaCcgg ccaccgtcgt ggcgcggcct gctcgacgcc rggccgcagCt ccccgagaac atcattgatc kgaccgcgtcc igcccgggaai gcgcaccga(.
gcgcgcgaa( I cgccggcg :CgtctgCtc( :gtgaaagto :gaagcCtg' a catccactg :gtCtggCCC ccgtgtgctC cgggatcgag Ct::tcactgC cgcgzccatc gatgctcgtc cctccgcgag gtggctcgcg caacgggctg cgcctcgtac cgcgggtccC gtttggcctC ccgcggcgca cggcaacatc tgccgacaac acgcgcacag gagcggccgg cgacat.cgtc cccttc: tc gggcaccatc ggcgcgaccg ggtgcccgac cgagcaccc cgacgaggag cccgcaccac ggcgacctgg *tcgcagggcc *gacgcgccat facagcccgac *ggccgactac cttcttcaccgggcCgCiCa catgccgat aggaagccgq caggacgtcc ggcgccgtgE i aacgaggacc 3 gcggCgCtCz :aggaggtagl g ctggcatag g tcggtgaagl g gtgtccCgCi g tcggccgcgi -gccCccggC, g tgcggcgtc, caaataacc gacctccgcg 120 atgtgcctcg 180 agcttccacc 240 gacggccccg 300 tacgaagagg 360 gccgcgccgc 420 ccgctcggcc 480 gcgactcctg 540 tggtccggat 600 ccgaccgcga 660 gcgcggctgt 720 cccgaagccc 780 ctctctcgct 840 ccggcgccct 900 ctctcgggcc 960 atgttccaac 1020 gagcmcgcac 1080 tccaaggtcc 1140 atgagcctcg 1200 cccgagcggg 1260 acgcctcgtt 1320 cacctctact 1380 cgccccggcg 1440 tacccctcgc 1500 atcctcgggg 1560 cccccggcgc 1620 tcccgcgact 1680 *cagcgccagc 1740 acgaacgacc 1800 tcgagggcgc 1860 *cggtggcgac 1920 1cggattgtgt 1980 -cgaacccgcc 2040 iccacggccat 2100 7ccgccttctg 2160 iagagccgccg 2220 cgtccaccgt 2280 ;cgctctcggc 2340 -aaccgcagcc 2400 t cccgagcgg 2460 -gccccacgag 2520 r gacgcatgag 2580 a tcggcaccgg 2640 WO 99/66028 WO 9966028PCT/EP99/04171 ctgcgcgttg cgagattccg gggcaccgtc cggcaaccat caccgccgtc catgggcgcg gcgcgcgacc ctcgcgcgcg ggccccaCc ccttctcatg actgcccgcg gcccggccgc gcacgccgct ggcctcaccg gcgcgcgCCt ttcggcggcg gccctCttcc ctcctcctgc cgcccCgggg ctcccggcgc tcggtgacgg agctatgcgc gtcgcgatga ctcctggcga gcgatgcgct gtcctcacgt gcgttCgCgC ggcgtgcaga cgcgtcgacg gcgaccgcga aggggcagcg atcgtcgcga gccgtcgtcg agggcgcctc gcgtacatcc ttcgccacgg gacatcacgg gcgagccggg cgcgagctgc gtgatcggcg gcgatcgtcc gagcgcgcct gccgccgatc agcgcgcaga gcggtggCgC gtctcgtCgc cgcgccccgt tacctcggca gtcgcgcatc ggaagcgagc tgagcagcgC cgtccgattc gatcgttgaz cggggtcgcc tcaacaggci ggatgtagcc cctcctggcl gacgcgctcc, ccgagcgcc, gacagtggg' acgggccga, ctcgccggC 
gccgatgcg gacgtcgtt cggcagcgt acgagcccga aagcccatcg tcgtcgatcg cggtcgacgg gcggtgaccg ctgccgccga agccagaccg ggcgccagcg aacgcgcacc accgtcgatg cccgaaaaaa ccgctcgggc tgccatgCcc agcggcaggt ccggcgagct tcgtgctggg aggagccggc tgctgatggc cgctCtCggc tcgtgctcga cggtcagcgt aggtgacgct cgtcgtcgag gcggattctt gggtggccga tcctggccgc tcggcgtgct cgctcgtggc tgtcgcagct cggcggcgaa aggcggcgct tcgtcggCgt cgctggccac cgacgcagga ccggggtcga acatcgtgga agctctccgt ggctcgcgag gcggctcgat cgcgatcgcc agcgggccga ccgcgcggcq tcgcggccCa ccgatccggg ggagcgtcgt gcgtgcacgt acgat-ctgct igcacggtcgz Igagggactcc :gcccggctCt -gttctccgcc -gatcacgctc.
i ccggacgtgc I gtcgctgaa(.
a ggcCgtctc :ctctgcgati cctctttg g gggatcccati g acgggCttt, t. ccgccgtga, -gattcggcc, t gcggcggtC c gcgtcaccg c acctctcgc c cgagccccg gctcgtcgat tgaacgtcat gaccatcgat gggagtcgag ccttCgtCat gctcggtcac ctcccggcat cgccggcccC ccggcgCcgC cctctccgag tcatcggtgc gcccgcccct ggcctgcacc cctgctCtCg cacgcggCgg cccctccgtc ggtCggggtC gggcatcgag actcggcgcg tcggcccctt gatcgcgaag cgcggCgggg ctacggcgcg gctgttcatg cgcgacgcgc ggcgctgacg gctcaacagc gggCCtCttC gcgcacgccg ggtCgtCCCC cgtggcggcg caagctcggg ggtgaccgc ggagtcggct gcggatcctc gagcatcgtc ggagcagaa gctcggCgcc ccaggccatc ggcacgcgcc grtccaacgtc gatcctcgt-c *Cgr-ggcgc Icgcggtcgtc *cgacgaggcc *gggcgcgcac *cgtgctcg igtcggtggt(.
Iagagcaggt-( gccgacgat4 -tgacgcgagi ;gcatagt-Ccl -caggtgcgci aaacggtg a tggctcgtc -gcacagcgc agcct-ccct g gctgaggat ;aaagcgcgc Sgcagagagg Sccggatagg Sgagcgagaa cgacgtcga gcccgagcg a gt Ccccgtg ggcccgctgg caggtcgcgc gcgcgccagc gtcgagctg cgaggcgatc gcccaccgcg ctgccccgCC cgcgtcctgC cacgctgatc cgggg~CC cccgtcacga ggacgagcaa cacaccgagg ctcgtcacc cItgcgccagc gtcggcgcgc gtgctctcgg gtcgacgtgg atcgcgcccc ccgagcggcc gtgctgatcg gtggtcagcg tcgcccgCgc gtgctCgtcg gtctccaagg cagcggctcg gctcctcgca gcgcctgtgt gcggCgtggg accgcgCtCg ggcctgaaca ctcctctcca tcacccgcgc cgcctcgagc gtcccgat~cg gcctccaagc gcgccCggCC rcgcctccgCg :ctgcgcgCCt Tcgcggaatgt Tctcgtcgtgg -ccgatcatcg- I gcgtgggacc- Stggcgcgatc I gtcttCCggq :ccgagcgacc 3. tgctacgacc g gtccggagcc ;aggtgaggct gtcactcccc r- cgagccgggt g tatcgcgcg ctcgctggaac a tggcgacct(, a tctgcggctc g tccgcccga, tgctgtccai ctcgccgag, g accggccag, c gatcgaggt, g cacacgctc a accgtgcag cccgacgcc g ctcgaggcc 9g cgacgggtc atcggcgacg a accgtgatcg g accttccggt t ccttcctcga c cggaagatcg t tccacgtgca c gccacctccg c cctggctgcc c aaagctccca t tgCCCCtgCC g tcgccgccgg g agctcgcccg c agccacccac c tcgcgctcat c ccgaggtgct c tcgcgcccgg g gcatctcctg g gcatcctgcg c cgctcgCggC g tcttcctcgg g agcgcgagtCc aggtcgctgc c tggcggtcgc ggcggcggctC gacaggtgtc gcctgcacc ccaaccgccc ggacggtcgc gcgcgcggcc tgaagggcgg acgaggccta tcctcatctg gcgaggaggc tggcgcacgc gaaagctcgg catcgcgCgc tcggcatctg cgcgggatca cgttcggtcg tgggcgaccc fgcctcgagca ccgagctcgt gcgagccatc IggcgccggC"t Fagataacgcg atgggccgct gggtgccggt tccaccgcgc gtccgtgtag atgctgcacg j gatcggctcg :ggtcacccgg g cgcgtcccgg :aggtccgttg C: cggcttgtcc j gagcgatggc cgctccctgcc :ccggacgcgg g gtgagatgaa g gtcttctcgc g gcacgcggct g cgaccacgcg g gcagcgagcg g gagtgctcga tgcgtcgaa 2700 ccactccgc 2760 cgcgagctc 2820 gagcatcat 2880 gtcccgccg 2940 grcgtcgcc 3000 cgccatcac 3060 ctcctcctc 3120 aaactcccg 3180 agagcactg 3240 cgtggctcc 3300 ccgcgctca 3360 ctgatgcac 3420 ctcgtgacc 3480 :ggggagctc 3540 -t.CCatcga 3600 ~ataggcgcg 3660 :aaggaggcg 3720 
~ggcgccgcc 3780 ~atcgtgctc 3840 ;atgcgccgc 3900 :tgggtgctc 3960 :cggagcgcg 4020 :acccacctc 4080 ;ctcgtcctc 4140 :rctgctcggc 4200 tctcctcgac 4260 cgcgggcatg 4320 gttgctgc:g 4380 :agcgggczc 4440 cacggacc c 4500 t.acgacgtac 4560 gctcgagaaa 4620 cgcgaggcgc 4680 cctgcccggg 4740 cgagacggtc 4800 cgcgggggag 4860 gcggcaaagg 4920 caatctactc 4980 cczgcaggac 5040 tccggcggcg 5100 ctccttcgcc 5160 gctgctcagc 5220 ccgggtgcgc 5280 cggcgtgcgc 5340 ggagctcgcg 5400 cggccggctc 5460 ccgctgctc 5520 ccgcccgtga 5580 gcgatcgtgc 5640 acgatggggg 5700 ggttcggtca 5760 caaggcccgg 5820 tccgaacgcat 5880 ctcccgcctg 5940 atgtgtcctc 6000 ctcttcgctc 6060 gaccagcgcg 6120 gcccgagagg 6180 acacgtcgac 6240 gagcatggcg 6300 cgcgcccggc 6360 gctggcggtg 6420 gctcgtcgtc 6480 ctacaatgct~ 6540 WO 99/66028 WO 9966028PCTIEP99/04 171 gacagccgaa atcaccgccg acgtctgtcg cctgc ctcga acggccgcgt cctggcgCgg cgcgatcttt cgccgtgacc ctctacgaca cccacgctgc cgcggcaacg aacgtcgtcg ggggc tggcc aaaatttgtc c tcgagcgag ctcaggaagc gggattagat aggacgatcg tcccatcgag cggtggcgtg cgggcgagc cccggggaag C tCC ttCttC gctggaggtg tacggaaacg agcgacggcg agcgggccga ctattcgtcc ctccacggcc c"tcgaaaacc tgggtctcgga cgcggacggc gagcagcggt ggcggacgca gacgcttggt tgtcgccacg gtcggggatc gcacctccac cacgcgcgcc gttcggcatgc cacaccgccg agccctggat tctgggcat ggtggcggccgacgtcgccc cttcaccggz cgcgttccgc gctccgcgac agccttcacc gtggggtgtz cgtggcgggc gatgcaggcc ggctgctgc(.
ccaggtggtc gcgcggggc(.
cccgatgct-( CgtcCtggtc ttgggtgcgi ggCCggrtgc tgcctgcat, gccggcgac, ggccggcct cgagcgcta gggagcggg .cgctcggCt.
gaggcaagct agaagcagag ggtgacatcg tggccttccc ccaagcaacc cctgcgcatc gttgccatcg ggcagctcgg acctcaagag EggCtCtgtC agaagggccg taaccgccca ggggtcgacg atagatcgta ttctctcgga ctgatcggga cgagtgagac acgtccgcga cgcgcagccg atcgat.ctga cccgccgaac acgcccgtta ggcatctcgc tgctgggagg ggagtgttca tccgcagaga a Ectcgtatg tcgctggtgg ctggctggtg cgggcgctgg cgaggcgaag gatcggatat ctgaccgtgc ggctgcgccg gaccccatcg ccgctgctga actgggctgc gcgcaggcgc cggacaccgt agcgggacca gcgccggagc *gcacaggcgg *gtggcgttCa Iacgtcgaggg :ggtgcggtgc Lcagggggcgc :gaggcgttcc I atgatgtggg :cagccggcgc i gagccggagt gtgttCtcgc ;ctgccggCCc gtggCgCCgc atcgcgggcc.
3cgaaccaagi g gaggcgttcS.
agcaatCtgi 2cacgcgcga( g ggcaccttc g ccggacgcci cgtgctcgag, c ttcccctcal c tggatcgac t cacgacgag c gaccatccg ggccgagacg cagccctcag cgctatcagc catcacccct gccgtgccgg gctggaagga cagccaacgt cgaggcgttt cgccgtcgtC ggcgcattac cgtccagcgc gcaatgtcat atatcgcgcg agctgtgata taactttcaa cgagctaatt agttcttttg gcgggtcagc aagatccgat gcgggttctg gctgggatgc cgcgcgcatc ctcgcgaagc cgctggagaa tcgggaccgg tcgacgctca ccctcgggct ccgttcatct gggtatcgct ccagggacgg ggtgcgccgt tggcggtgat cgaacgggag cgtcttcggt aaatccaagc tcgggtcggr tgaaggtcgt tgaacccccg ggccggactq acgcgcacgt gaccggcaga cgcggctgcc gtctggcgac aggggctgcc *gcagtatcgc *agacgctggS Iacctgtgcgt ccgaaccggc -tgttcacct-, ggtcgccg *ttgaggacg( I gcggggcgal -acgcagcgti ;ccgggcaaci a cgctccacg 9 ggCgtgtgg, a gcgggaagg, g aggtggtgc' j tcgaggtcg ggcCcc g cgctcggcg gggggCggC a cgaaagccg g tcgaggagg c cgcccgaga accgtgccgt tcgccatcgt agcgctgagc gcgcactcct cgcggctcca tcggcggagc ggcgatcgcg gccgacttcg gagcgccacg cgcttcgagc gccaccacgg gggaatggcc atctccccgt gtggtctgtc tttttccgag cccatccat cagtgcgcga cgctaaggat tgcgatcgtc gacgctcczc agcagcgtgg tttcccgagc gctgcggatg cgccgcgatc cccatccaaa tggcgggctg gcgagggccg ggcctgrtcag gatgttgtcg tcgctgcaag cgtggtcctC ccgaggaccc ct.cccaagaa gggttatgtc tctgaatgcg a aagaccaac *cttgtccctt gatctcatgg gaatacacc c* gt~ggaa gctgctggtc Icgaccatctc acgcgcagc Iggcagccctc cgattcctcz catgggccgt gaggctgttc cagcgtcgac cgaatatgcc ;ccatagcatc ggtgttcct(.
gtgtcgat-( ggtgtcgat( cgtgcatgCo r- crtcgcat-gci =cgagtccggti c ttgcacaga' g cttcgcgga g tccgaaatc t gctcgcatc g gctctgggc g ggtgccgct a cgacgcggc g gggcgcggt g cggacgccg atgccaactt ctgccgcCgt ccgccagcag ccagcgacgg cgcgcgcgac atggatagag gcggtcaagt gcggCgtCCC gcgacgcgat cgcgcCCCgt cgtggacgac ccttgaaatg caattcccga ttacgttgcg gggggcttgg tttttgaggc agaacctggg gtgcCgtcg ggagcgagtt gagggctcgc tttgatccccg gacgtagcct gaccctgcac gctccatcgg tatgaggccg gggacgatgc tgtgtcgcqg agcttgcgct ccgagcaccc gcattttcgg aagcggctca gcgatcaatc atcgtgctga gaggcacacg gtatacggcc cttggccatc cagcacgggc ggtgatcttc cgacgggcgg tgaggcgccgq Ictgtcggcaz gagacctacc gcgatggagc Igacgctgcgq icgcggcaagc ,gggctgtacc -aaccaggagc :gccgcgctgc Sctcgccgcgc ggtgagctgi 3 gtggctgcg4 gaggcgccgi gccgcggtc g atcgcggcgi g ttccactca g agctaccgg cgaggtgagc t ggagtgaag g acgctgcrc g tcgcgCgCt C gtcggtggC g cccacgtac g cgtggcgac g cgcggCggC g gagaaggiC cgagctgctc 6600 catcgggccg 6660 gccccagggc 6720 ccgcgcagcg 6780 aggcgagcgt 6840 aatcgaggat 6900 tcatcgccgc 6960 gcgcgtgctg 7020 ccggttccac 7080 cgccgtcgcc 7140 atggcgcgga 7200 gccccttgag 7260 tggtaaaaga 7320 tcttccgcac 7380 tctctggttc 7440 tctgctcaaa 7500 cctcgaccgg 7560 tggcggatcg 7620 gccgtctacc 7680 gcgacaccgt 7740 accccgatgc 7800 gcttcgacgc 7860 atcgactct 7920 cgctcgtcgg 7980 cgctgccgca 8040 ccagcgtcgg 8100 tggatacggc 8160 ccggggaatg 8220 tcgtgcggcz 8280 cggaggccga 8340 gtggagcccg 8400 acgacggtac 3460 aacgggccct 8520 gcacgggcac 8580 t cgggcgaga 8640 ctgagtatgc 8700 *agattcctgc 8760 *ggctgaccgt 8820 1gggtgagctc 8880 1cggcgacgtg 8940 tggaccgcgtc 9000 cttcgcagtg 9060 accggctcgC 9120 1cgcagggaca 9180 -tcgcctttct 9240 ;atgtatggtc 9300 -tcgaccggcc 9360 tcgaccagac 9420 tgtggcggtc 9480 j tggctgcctg 9540 gcgggcgcct 9600 g aggccgatgt 9660 a acgctccgga 9720 g cgatggccgc 9780 c cgctcatggc 9840 c ggccgtcgat 9900 t cgccgggcta 9960 g cgctgcacgc 10020 g gcctggtgcc 10080 g ggcgtgacga 10140 tggtctcctg 10200 cttggcagcg 10260 gccgtgctcc 10320 g accggcgcag 10380 g aggccgccgg 10440 WO 99/66028 
WO 9966028PCT[EP99/041 71 .4cgaccgtccg ggtcacggag gctcagcttc gcccaaccct gggcgtgaac tacccacgtc cgaggcggcc ccgcctcag cgcggtgcag gaaacgcgcc gttcgtcgcc gctcccgggc ggagctcggc caatctctcc ccgtgcgctc catcgcgacg gcatcttggg ccacgcaggc tgcgccggCC gctgcgcacg ctcgctcatg gacgacgttc tctcgccaca aaacgacttc aatcagcttc ctccagatac cacaaaagca gccccagccg ggccggacag acggatctcg atcgcttcggg gacatcgaga tcgttgcgag gtcgtcgccg aacgttgacc cccgagacct tctcgcaaga gagctcccac cgcttccggc gtcggggagc gggcgctgga gtccatccgc gacaccactc gaagcgatgg ctggggatcc gtcgttggtc cctcagctgc gacatcgtcc gtttttctcc cctgcccagc acgctgcacc tcggcgcacz cggctgcgcc tgggagcaac gatgccgacc gtgctgacgc ctcgtgagcc.
acaccttcg gtgatgatc(.
atagggccc cgcgatccg gtgCCggCgi aggtctctgi ccccaggcc ttccggctcg cggcgcgccc aatgatgtcc ccgctgctgc ggcctcgtgg accacgtcgg gccatgcccg ccgggggagc cgggcgcagc tacctggagt gacgtgcgcg gagctgatcg aagcgcgac t ttctcgctgg ttggaggagc ctcccgatcg aagctcgtac gccggcccgt gcgcgcgcgg cccgaaacca gccgtggagc ctgtccacgt gctctctcct gtctcatcgg tgaacgagct aggcccccaa cgatcctgac agcggcacgc gagcgtttac acatgccgag cccacacgct tcatcgatct atgcgatgtc ttcggcrtgga taggcagcct ctctccctgt agtctgaggc ctccgccgac acacggagca gcgggctgac gcgcgagccc gcgtgaacga gcgacaagag atcactgcga aacgaggcgc tcacctcgtt tgctggatca racggagtgtt ggcggctcaC tagaagcgcc 7gcctgttcgc iagacgctcac agcagggggc trgtcgcggt :taccggcggz :agccatggct I aggcc-ggcgl j atctcgcgti ;atcatcggg ;gagacagggl ;ggatcctg' cgcattgggi tgatgcgga c ggctttCgC a tcaggcccg agatcgatga ctggtctggg agctcgcgct tcggaggcga tgggccaacc ctgcgctggt tcgcgtacct gggtgctgat acgtgggagc cgctgggcgt cgtggacggg acaagagttt gttacgcgga tggatctccg tcctcggcct cccgtgtcgc tcacgctggg ccaccgggga cggcgctgga aggtcggcgc tgcgcaatcg cccccaatat tggagcgggt gcgcagatca cgagcaccag gaacgccctg gatgctccgt tccgtttcct ggtccccagc gctgagccgc gcccgacatg gcgcgggctc gcaccgcatc cgagcggcaa gtccatcatc cctggagctc gcatcaacga gcttccaatg atggctgccg cccgacgggc czggtttacg tatcaccggg *cttcgaacag *cgtaagcggt attgttcccc *gcagaggctc i cagctctac cccgcccgac tgraggaaccz ggcgagcgcz ggcgcgggtc gr-acgaagac acgcccgaac Ztctcgcggtc i gcgtatccac :ggatggcaai :cgaaggcgac a tgtcatcta( g tgccgtcaa( r. 
gctggcgct( ggcgggcggl agagttgati t gctcgtcgai ,gctgagcgg, g cgtgtcggt gccaggcgtg cgaggtcgag gggcatggtg gtgcgccggg ggtcatcgcc gctgcctcgg gacggcatgg ccatgcggcg cgaggtccat gcggtatgtg cggcgaggga caatctcctg taaccagctc ggggatgatg gatcgcggca cgatgcgttc tgacccggag tcgggacctg ggcgttcctc ggaggcgctg tatcgaggcg cgccttgttg ggcggcggag agactgggaa ggtatcaagc aacccgaacc cagagactcc ctcacagaca gggatccacg gcctttcgga atgcaggtga gaccggagca tatgacaccg acccgtctcg t tcaaggac t tcgtaccgcg tcgatggat~t aaggccgatc tcggacccct gtcatcctgg ctcaacataa gacttcacgt cgcgctaagc atcgaggtcc gtggtgctca ggaactccgg gagcacgatc cttctggacc ttggggtgaac iaacgcgaccz gagcagctgc Ictttcgcgcc ,acattggtcc I ctcgagtcac acctcctc( a ctgtcatgg( ggcgaccagc acctcgggai accatcctgI tcctcgctgi r- acgatcgtg gaacgagag Scattttgag, gactggatc Satcagcctg cttgatcacc t accgccgtcg a cccgacgacc t cgcatcgtcg c ctttCggcgg g cctcaggcgc t tacgcgctcg a accggcgggg t gcgacggccg g agcgattccc g gtagacgtcg t cgatcgcacg g gggctgcggc c ctcgagcggc c ggcgtgttca c cggagcatgg c gt-ccagatcc ctcgacaggct cgtacgcagg ttcacccgcc agcct.caaac gcccaaaacc aacctacggg atcactgccc tggcggccga1 tgctcgctcg ccgcagaa~c tccaagaatc cctatcgcga aagtcgtCgc zcgagcct-aa cacgggaagc agcgccctcc tgctcagtat ggctcagctt attatgtact actggaagcg catctaccct ggggtcgatt *ctgcattttc *cgctcttcaa *cgaiggtcct gtattcaaga agcgagaggc Lcgagcgcgct ftgtacaccag Igggacctcgt acatgctcga aggtgcgctg iacgcgctgct :ccatgcagct :gt tcgcggCg cggtggtgat gcgcggccta atcatggtga :cgccggggat :ctccgatgat :ccacagggtt j acatcaacga a gcttcgatct U tgccggacgc a agrgacggt 9 gtcgccccga c cggtgggCCt g gcggggccac cgtgctrtcg 10500 cgcggcggg 10560 gccgggaaa 10620 cgtgggcga 10680 agcgtttgc 10740 *ctcggcgat 10800 .cagaacagc 10860 *cggtctcgc 10920 rcacgcccga 10980 rctcggaccg 11040 ~gctcaactc 11100 ~ccggtttgt 11160 :gttcctgcg 11220 :ggcgcgggt 11280 :ccctccccc 11340 :gcaggcgca 11400 ;tattccaac 11460 :cgcgtcagc 11520 :ctcgcaggt 11580 :cggcatgga 11640 tgaagctgtc 11700 :gttggatgc 11760 :aggcgtgca 11820 :atgacgatc 11880 tggggagcgc 11940 
aatctccgag 12000 :atcgtgccc 12060 ctactggctg 12120 atacgactgt 12180 gcggcacgac 12240 agtcgacgcc 12300 gaggctcgtg 12360 gctctatcac 12420 cgatctcatt 12480 ctacgaagat 12540 cgcgctggag 12600 gcgcatcgcc 12660 gaaggagatc 12720 aaagcggcgt~ 12780 cgaggtgatc 12840 ccggctcccc 12900 cctggacatc 12960 gcagctgtgg 13020 cgcccgggtc 13080 taaccagcaa 13140 cacgcagact 13200 cctcgcgtgg 13260 agcgtacgtc 13320 ttcgcttccg 13380 gagcgagcat 13440 cgccgtggtg 13500 acttggcgcg 13560 agagaaaggc 13620 cgtgccgatc 13680 ggtaaagctc 13740 ccagcggctg 13800 gcccattcag 13860 gcccaagggg 13920 gcgcttcgaa 13980 ctcggtctat 14040 gtccaagctg 14100 gtggaactcg 14160 ttcgctcgct 14220 gcctggcgag 14280 cgaagcgtcg 14340 WO 99/66028 WO 9966028PCT/EP99/04171 atctggtcca ggCCgtCCgC gtctgggttC gatgaagaga aagaccggcg gacaaccaaa aagtcgcatc aagctccttc gcgagcctca agcgacggcg ggaaagcccg gcgcgtcgec cgattcctga tatccatcgg atcgagggcg ctctccgatc gaagcggCgt tcgtcgtcgc caggcgcctt cggccggttc gtagacccgc acgacgcgcg tt~gaggacca ccgctgacgt tcgccgcggc gtcgtacggg ggtgcgacat gagatcgcca cgccgagact cacaagggca gccgcaagcc gggtg"tgcgc aaaarcctc tggacgaatt aggagctcgc cgggcagcgt gcgaggcaga tggagaacgc ccaacatgag ccggctggtt acaggctgaa tggcaggtca gcgggattac tctctcccga acggctgcgg tccgcgcggt ctgcgcccag tcgaggcccc ccatcgagac gcgcgatcg gtttaatcae agtctcctaz aggattggaz gcaccaacgc cggcgcgctc cggcggcacc cct-tcagcct cgcgcgaggc ccgtgcgtgc agggctctcz cggcgcttt( agctcgccgc zgttcgcc tccgatcg tcgaggatgl tcgggtaccc tgcgcaacca cggggcaact agacgcgcaa atctgggccg tcaagcttcg cgaacgtacg tagcctatgt agaccgadcg agagggtgca tcgtcgatct gtagcgtccg gctgcttgag cgggcagcac tggacgaggg acgggatcga tcaacctcct gagaattttg cctgcaacat tcgacctgcg ggcagttcca gcgcccctcc aactacccga ccaacggcaa attcggggca aggtgctcgg cgattcacat tcaccgagtt cgagagatct ggagacgtag cgcctgcgtC gttgagccgt cgctatcgca ctggaggaac ggcgtccgga gctggaagac gctcatggat cggatacgac ctcgtacttg tcagacgttg tc tgagaggg cttggcgtgc cgtccggatc cggccattgc cgttgtcctc tatccttggg tgaggtgggc gtccatccaa ggcggCgCtg fctccgtgaag Lgacggtcttg 
kcccatcgatc ktaccggctccccatgtcgtc tgccgagctc Igctacgagat ggcgacgacc grttgcgagac Sccgctgctcc a gtgggtcggc :ggcgtgcgac cgacgaagg cgcggtggci ;ccacagcatc ggtggcgatc cgtgaggaac gacgttccac ctacattggc gagcttcctc ctacctgccc cggataccgc cgacgcggtg ggtcccggag gatcgacgcg gttcaagctc gaccgggcag aacgttcctt cagcgtggag gtacccggtg cttciattat gcgcggagcg gttcgtgggc cctgctggag cggcgtctgt acattcggac ggtctgtacg cggccgcgag gtacatggtg ggtcgatcgt cacggcgcca gctggaggtg cgttcgcatg gttccagtac agatcagcgg ctaagagcgc accctgggac gttgttcgaa gtcatcggca cttcgagacg gtcgaccccg gtcgaccggt ccgcagcacc ccgacggctt acatcgaacc atcggcaacg ccgagcatct atgagcctcc ccccatcgac cgggccttcc ctgaagccgc tctgccacaa caggcgcaac tacatcgagz cggcgggtgt accggcatcc Igcgcrtggagc *gatttcgcgz actccgcggc I ctggaggaa'.
*trtcgrtcgtci *catctgcag cgcagcccci Igggctcgact -ccaggcaaci *atgggccgg, -cgggccatci ;tcctcccag, I tttgcggcg ;ggcgaggta -atctgccgg gtcgacc tat gtgctcgatg ggggtcgggc gtgcaccccg gatggaaaca gttgagctcg attgtgcccg ggcacacgga agagcacacg gctcgacacg gatccgcggg gaggccccga cccgacggcg caaacctacg taccacccgt cacgttcggc aggatcgacg gccggatata ccggtggggc gtttacgtgc ctcggtcagg cagcacttcg cctacagtct aaggccctgc cgggacgcct gtcgggctcc aggagcctgt ccgaacctcg ccgaacatgc cgaacaaaac tcacctgatc cgccgaggaa tgtcgggccg gcacggaggc cgctggtgct tcgacgctgc ggatcttcat acgagggctc tccacgagca acaaggatta ccgttcaaac tggaccgcga iccggctatgt racgccaaggc :tggaccgggc iacaacgacgg Icgatcatgga icccacgggac tcggtcgcga gacacctcga accggcagct i gcagcccgt :gggcCcct j cgcccgcggc -cggccaagac j cgcaccagg a tggagcaccc g cagcggcgcc g tgccgaag zagctcctgg( zaggccgaagc c tcgagcgcal c tgtggcggt( g ccgccgcgci c gcagccggci cgtgggcgag c aggcgctcga a tggcactggg c agaccgggga tcgagttcat gggaaatcga q tcgggaacga c gacgcgctgc c ccgccgaagc gactccggag aggcggggct ttccgtttgt cgacccttcc cgtatgtcaa tcgagcaccg1 aaaacttcga ccatcgagtc tggcgcagct aattcaattt acggcatgct attcctcacc ccgatatgct tcgtggagct gcgagcggaa tggaggagat agcagagctt tgcagaagag gctcgctggc aggaccgagt caggccgagc tgatcgcggg cggtgagctc ttttccgggg cgtgcagcgc ggacccgagc tttcttcggc ggaatgcgcc tatcggcgtg cccagcgatg cctcgcgacc tgcctgctcc gtgcgacatg atatgctgag gaacggcacg gctctccgat 1agcgaggaag ggcgctggCg cggcacgctg Icgcttcggcc Latcggcggct gccgcccagc ctacgtcaat cagctcgtt-c gaagcttcca Icgcagcggcg Igatttcgttg gctcgcgatg aggccagacc :ggtcttcgtc :tgaggaaCc :tggttggtCg :cgacgtggtg :gtggggtgtC a tgtggCCggg tgctcCggCgc ~atcccctac 14400 Lccgcgcccg 14460 :tactggcac 14520 ~cgcctctac 14580 ~gggcgtgag 14640 ;gaaacgctc 14700 ~gcggcgaac 14760 :gagcaggac 14820 ;gacggcttg 14880 Igacctggac 14940 ;gacgtctac 15000 :gagtttggt 15060 :aaactccgt 15120 itccggccgc 15180 tttgctgaag 15240 cgttcgat 15300 gctgtatgga 15360 cgatggag 15420 tgaacaggzt, 15480 gggcaggcgg 15540 gaggcgcacc 15600 tcgcgac-tc 15660 cgatgcgttg 15720 
ggatacctcg 15780 cctcatcgcg 15840 cgtcgatcc: 15900 gctggacagg 15960 gtccggcctg 16020 ggaggttcgg 16080 gggccgatga 16140 tacacatcgc 16200 atggaaaaac 16260 gcgcgggacc 16320 ttccccgagc 16380 tacgtccggg 16440 atcagcccgc 16500 cgggaggcgc 16560 tacgccggcg 16620 atgcagtggc 16680 cacgtctcct- 16740 acctcgctcg 16800 gcgctggccg 16860 gggggcatct 16920 atcatgggca 16980 ggtgatcccg 17040 atcgggctca 17100 ctggcagggg 17160 ctcggagacg 17220 cggaggtctt 17280 ggcatcgccg 17340 ctgaacttcg 17400 acctctctta 17460 gggatcggcg 17520 gccgcggcgc 17580 ctggatgccg 17640 ggcgacgtcg 17700 gcggcgccgt 17760 ccgccgggcg 17820 tttcccggcc 17880 gtcttccacg 17940 ctgctcgcgg 18000 cagccggtgc 18060 gcgcccgacg 18120 gcgczgtcgc 18180 atcagcggtc 18240 WO 99/66028 WO 9966028PCTIEP99/041 71 agggcgagat acgaggatcg agccggcagc gggtgaaggt tggcagccct cgggcgccat agccagtgcg tggagatgag agcgggcggg tggaggcgct ccgcgggggg tcgaagcgcc cgctcctCgg cgctggatct ttccgggcgc gccctttgca cggtgttggt cgagccgggc tccgagtgga tccaggccag acagccctgc gggtacgcct tggacgcgtg gggtgcCCgt gccacgcgcg gggtggtcga ttccgggagg ccgcagcggt gcgggct.Cgg cggcagagaa gc caggc tcc cagggctcgg ccctcgatcc ccggcatggg tcggcgccgg ccatggagca gggagctcgc tgcgcggtgg gggggagga t t.gtgaccgg gcgctggtca ccgtcgcggC atcgggcgca gcgtcgtcca ggtttcgtaa gcgaagcgcc cgggccaggg gggcgcaggg cggccgcgca ccgacgaggg tgatgccggt tgitgtcgcg acctgctccg ccctccgcgc ccccgctcac aggccatgct cgctgagcgq ccaccgccge tcgcagcaaa gctgaaacaz ggcggagctc cggtgcggac gccgctcgac ggcggggctc zcgggaggcc ggcggtgacc ggtgagcgtg gatcggcgag ggatgtcgcc gggcgggctc ggtagcgggc cttcgccgag cccgcatccg cacagcggtg gggcacgctg gcggcgggta ggccaagagc tgaaatgcag caagcggctg ggcgtacctg gataactgac ccaggtggtg gccgggcgct gcgcaccgag catacccgcc cttccagggg gcccgacgcg cttccagatc ggagctgggc catcgtgaac caactcgggt ggtgcgccg cggcacagcc cgccgcgt tg caacacgagc gacggcggtg ggcgcaaggc ggcgccggta ctttcgagac cgacgtctcc cgcggatctg tgccctgctg cgagcgatgc cuagagctgc cggtctgggt cctggtgctg gctcgaggcc gctcgagcgg tgcggccggc ggtgatggcg gctttccttc caactacgcc gctgccagcg 
ggaagatcgc gctgtccgct gaacccgcgg cctggtgacc ccgcctcgcc :gcagatctcc gagcctgggc gggcatcacc gcatctggcc LcrtCtgCCgtc kattcaaggcc tgcggccatc2 Igaacggaccc ,gctccggaac, :atgcgctaac Sctzcaccgagc Scgar-cgctc( gagctgtcgc gccgtgagca gtgctgtcgt agccacagcc CggCgggtg ccggagctcg gtagtccagg atcctaacga ggctcgctgc tgggcgcagg ccgctgccga gccgcgggCg accctgtcaa ccgtggctcg gagatggcga gtggtgctcg acgacggagc ggccacgcgt gtcccggCtg gcggccacct attgctgagc gccggctcgg gtcggcagcc tcgctgcggc catgggcacc gcagtggtcg cgcgaagaag aaggtcaacg cgcgcgatgc gctgccggcg gtgcacctcg gcattggacg cgtggctgcg gccccgcgat gcgacacagg cgctgcgctC gccgagctgc gtcgctcgga attccgaccg gggctcggtc gtgggcCgCt cgcggcgcgC atcctccgcg atcttggacg cccaaggtcc ttcgtgctgt gcggccaaca *ttgagcgtcg ggcgCgCggC ctggcacggc Ictgtgggtgg gcgcatcgcc *gctgccgagc caggtgctgc *ar-gaactcgc *gtaccggcaz Icgggaggcat -gagatcgagq cttacargac i tcattcagcc ;agccgatcgc cgcttgggi -cgatagattc.
;acccgcagc tggccgaggc acagcccgcg ccctgaacgc cgcaggtcga cggctgcggt gagcgaatta cgcagctcca cttcggtcga ggcgggggca gctaccctgt cctatccctg atcgccgcgg cccagacgag gcgaccaccg tttcgtcggg ccgaggcgct agccgtcggg ccttccgggt ggcttacgct acgcggagct tatggcgggg cagcggagta tcttcgcccg tcttgcagcg aaacccccga ccgaagtt ta acgattggtt cgggccggtg tggaggccgg tacgcgcgct gcagcctcaa cgccccggag acagcgtgct tgtggctttt caccgctgct gggtcgacct tggccgacga tcgtccgccg acgtcaccat tgagcgtggc ccggcgcggc gcgtcaccgt aggttaccac acgggctgct agggggcctt acgcttcggg cgttcctcga *actggggcct *tggtctcccg tgctcgaaac fagctctaccc fcgagcgccgq cgagcgcgcc gcctccccgz tgatggggct icgctgttgtc -gcgaagccgc IagatgtcacE tactcgcggt gctggaggac catcgtcggi i gctgctcgaa gtcgCtccc Scttcgatgcl cgaggcggcg c ctcgacggtg c gaagggggtg t cccgctgcgc g gccgatgcgc t ctggatgaac a aggcggccac g ggagatgcgg c ggacgagcgc c accctggggg c gcagcgcgag c cgtgcgtgcgc cacgcggctgt ggtgcaggga ggccgaggCt t ggccttcgcg gcggctgcag ccacgctcgc ttccgctgtg aaccgagatg tgaaggcgag tcggttgcat cagtggcgag gccttcgggg tcggcagggc cgggctcgtg cctggagctc gctgctcctc cggccatgcc cctggcaaag tgggggtagc cgccgacgtc ctggaccgtg gacccgcggc ggggctgggc cgatccagcc cgccgaagcg gcagcccgag ccacgcggac cggatggctg gagcgtggag ggcgaaggcg gtcggggatg gatgcagcag gcacctgcac agcagggctc cgctctggcg gttcgcggag fcggaatgcgg fcggccgcgct !cgcggcggcg fcaggccagcc jgagcgcgctc igggcaagatc -cgagctgcgc Igacctatccc :tcctgtggag iggacgatctg icctacggcac Icggctcgctg :atcggctgcc :gcggagcgcg :gzcgaggccg :gcgticttcg ctggaggtcg tccgaggct 18300 *tctcgggcg 18360 *tctgccgtc 18420 raggacctct 18480 .cgacggtga 18540 Lacctcaggc 18600 rgtctgttcg 18660 *gcgcggccc 18720 *cggcgatgc 18780 .ggctgtttc 18840 *ggtactgga 18900 ~gcggtcacc 18960 .gggagacga 19020 ;cggtcgtgt 19080 tgggcgatg 19140 jgcgacgcgg 19200 .tccagatcg 19260 jgcgcgttgc 19320 .gcgcgcggc 19380 Iggctgcagt 19440 gcgctgggac 19500 cctgcgctgc 19560 gcgacgccgt 19620 gagctgtggt 19680 gccgactttt 19740 gcgcagcggc 19800 gagtgggaac 19860 ggcggcggcg 19920 gtcgtgcacg 19980 gcctttgacg 20040 
gagctcgacc 20100 agtcccgatg 20160 caggccctgg 20220 gcacaggccg 20280 cgcgtcatcg 20340 cggcccgagg 20400 gaagtcgcgt 20460 acccggcccc 20520 agcacctacc 20580 gccgagcgcg 20640 caacgggcag 20700 gatgtcgccg 20760 ccgctgcggg 20820 actcccgcgc 20880 gcgttgacgc 20940 ttgggctcgc 21000 caccaccgga 21060 gtgggcatgg 21120 agcctcaccc 21180 caggtggggg 21240 tcttcgcgaa 21300 ggggacgggg 21360 ctggagccgc 21420 gaggtggacg 21480 aaccgcatcg 21540 acggtggcgg 21600 icaccgcaca 21660 acgcagttga 21720 agcagaatcc 21780 ggctcgcaca 21840 gcttccctgg 21900 acgcggtcca 21960 tgccgcactg 22020 gcatctcgcc 22080 cttgggaggg 22140 WO 99/66028 WO 9966028PCT/EP99/04171 -7gctcgaggac cggcgctttC cgcgtacagc ggggttgcag tcacctcgcc cagcgcgctc cgatggtcgt tggcctggtC gctgatccgg cgtgctggct ggccgtcgat cgaggcgctg cgcggtgaag ggcagcgdtt tccgcggatc gcgcacggac gcatgtggtg ggcggagctt gctgcgcgag ggcgacgacg gctgctggCg catcgcgagc gccgggcatg gtgcgtggCg ggcggggagc cgcggtggag ggttgggCat agatggggtg cgcgatggtg ggcgtcggtg gcaagcggtg gcatgtctcg ggtggcggcg gaaggtggtc ggtgCgCttc agtgggcccg gacgctgctg gggcaggctg gcggCgggtg ggccgaaggg crtggcccgag ggtgctggc acgttcgtgc ccaggccctc cgtcgtggac gccggtgctC cgtgacccga gctgtggggt ggacctggat gccggacgcc ggccgccccz ggt-gacgrggt ggcggggcac agatcagccc cgcgcgggt-c ggcggccgtC gctgCtggc ggcartgggt(.
ctcggcgtc(.
r-ttggacgc(.
gggcctgtg7 gggaatctgl gcgcgcgac, cgacgcgag, ctcggccgt gccggtatcc acggcggact gccaccggca ggaccttgc tgccgcagcc ctctcccccg tgccggacct gtcctcaaac ggctcggcca caggagacgg tacgtcgaga cgggcgacgg accaacatcg tcgctgacgc cggctcgagg cgcccgcgct ctggaagagg ttggtgctgt cacctggaca cgcagcgcga gcgctCtCgg tcctcgcgcg ggCCgggggc ctgttcgacc gccgagtcgt tacgcgctga agcatcgggg aggctcgtgg tcgctcggag tcgatcgcgg caggcgatcg cacgcgttcc tcggtgacgt acggacgagc gcggacgggg aagccgacgc gcgtcgttgc tgggccgccc ccgctgccga *ctcggagccE atgcctcgct *gaccggggtc *gccgtgctcc ggtggCCgCz gcgggggcat gcgctgattc LggggCCtgCZ atgggccgg ccggaggagi gaggatcag( iccggagggai gggctgggcc.
cttgtgctgi ccagaggtgi accgtggcgi gagccgccgi caccaggac ;ctgcacac ;ggcgtcttc ;ctggcggac ;gcggagggg I gcgatgccg g cagcgcgtg c cgaggccgc g ccagctgtg cgccCCggtC acgcgcgcac acatgctcag tgaccgtcga tgcgcgcagg acatgatgga tcgatgcttc ggctctccga tcaaccatga tcttgcgcga cccacggaac tggggccggc gccatctcga acgagcgcat gcagcgcgct tcgcgggggt cgccggcggt cgggcaagag tgcacccgga tgagccaccg ccgtggcgca gcaagctggc tttgcgcggC gggagctgga tgttgctcga cggcgctgtg agctggtggc cggcgcgCgg cgccggaggc cggtcaatgg cggcggggt~t actcgccgct accggcggcc tgagcgcgc *tgaaggcgct *tgctcgggct *gcgccgggcg Igcggctcggt Lcctatccgrtg Lcggccgccga catccgtgga jgagtcgggga atgcgcccgc iacgactggca cggccgaaga :aggcgctcgg I cggtgggcgq I tcgcggcgct a gcccgacggz tggcattccc a acgcagcgcc 3 cccttggcct a tcagccggcz gcgcgcgcal g cggrtcgacgi :tgcggggg9l g ctggtcggCl r- ttacccgcgi g gctcgatcg c tccgccgaai g ggatgggcr-, a cgagtcggg, g tcacccaga t tctgggatcc g agcgctggc catcgacggg ggtcgCtCgg catcgccgcc cacggcgtgc agagagcgat agccgcggcg ggccaacggg cgcgcaacgg tggccggtcg ggcgctgcgg agggacctcg gcgctccgac ggccgcggca cccgagaaac cgcgttggcg gagctcgttc ggagctgtgg cgagggggcg gctcgggCtC gctcgcggtg ggggcagacg gttcctgttc gtggccagcg ccgcccgctg ccagacggcg gcggtcgtgg ggcgtgcgtg gcggctgatg ggaggtggcg gccggagcag cgcggcgcgC gatggaac cg aagcgtttcg ggggtactgg gcacgaagcc gttgccagc cgaggaggct cagctggccg gcagcggcag tgcgC tggcg ttcgcggcga ggcggccgcq *cgaggcctcc L gggggtgCt- Lggtcgccaae fcacggggccc rcgagcctgac agagcatCc ggtcgaggcc ccaggggcgc ggtgtCgctc cctcgttgcc, i cggattgccc :tgcggcgatc :ggccgatgc( :agtgcacgci :cgcccgggti a gcagccgct, g ccagggcag, :gcaggggct =gcaggcgca =cctggcggc t ggattgggc g gctggtaac g caacgcgtc agccgcaccg ctgccgcgcg ggacggctgt tcgtcatcgc ctcgcgttgg cgcacgcaag ttcgtccgtg gatggcgacc accgggttga agcgcccacg ctgggcgatc ggcacacgct ggcgtagcgg ctcaacttcc accgagccgg gggatgagcg cctgccgcgc ctcgatgcgc ggggacgtgg gcggtgacgt ccggcggggg accggacagg ttccgggagg cgcgaggtga ttcacccagc ggcgtagagc gcgggggtgt caggggctct gcggcggtgg gtggtgatca ggcgcgcgca atgctggagg 
ctggtgagca gtgcggcacg ggcgcgggga tgcctgccgg gcgggggtgc ggcgtCttCC cggtactgga fcagtggttct LgcccggtCCg rgcggcgctt"t Sgcggttgccg Ttacctgtggg igtcacccatc cgctcaccc gctgCCCCCt ggctcctggq ctggtggccc cggcgcgcac ;tctgcggag 3 Cggtggttg :gaccgcgag :gaggcgctg :gaaggcatg :gcgggtCtgc 3 ttgcgccmc g gacctcttcc c tacgcggcac, C gccgccctgi g cgccgggaai g atggaatggi c catgcgggai t gccacgaaal t gttgtggag, gtgtgttcgt 22200 aagagcgaga 22260 cgtacacgct 22320 tggtggcgat 22380 cgggaggggt 22440 cgctgtcgcc 22500 gcgagggctg 22560 gcatctgggc 22620 ccgcgcccaa 22680 tcgaagctgg 22740 ccatcgaggt 22800 gcgtgctggg 22860 gcctgatcaa 22920 gcacgctcaa 22980 tgccgtggcc 23040 gaacgaacgc 23100 cggagcgctc 23160 aagcggcgcg 23220 cactcagcct 23280 cacgcgaggg 23340 cggcgcgccg 23400 gcgcgcagac 23460 cgttcgaccg 23520 tgtgggcgga 23580 ccgcgctctt 23640 cagagctcct 23700 zctcgctgga 23760 cggcgggcgg 23820 cgccgcacgc 23880 cgggcgtgga 23940 ccaagcggct 24000 agttcgggcg 24060 acctgagcgg 24120 tgcgggaggc 24180 cgttcgtcga 24240 aggcggagcc 24300 tcgaggcgct 24360 ccacggctgg 24420 tcgaggcgcc 24480 accgggtgga 24540 gcgggtggct 24600 cgtcgcaggg 24660 agcaggtgac 24720 gtctggacgc 24780 ttgccgcggc 24840 *ggctctggat 24900 *gtcaggcggc 24960 gcgggctcgt 25020 agctgctttc 25080 cgcggcttgt 25140 ggagttactt 25200 1taggagcgcgg 25260 ;aatggggccg 25320 ;aggcgcaggg 25380 1cggcgcr-Ctt 25440 tcgacgacgg 25500 a aggtggaggg 25560 ;tactgttttc 25620 gcaatgcctt 25680 a acatcgcctg 25740 :acgaggcatc 25800 tactcggtac 25860 g cggcgccgcg 25920 g aggcctcctc 25980 a cccgctcggc 26040 4 WO 99/66028 WO 9966028PCTIEP99/041 71 -8gctctacgag gctcgacgtg ccgcaaacgg tccgaccgtg caccgacgtg cgcctgccgc gggcgtggtg ccccggctcg ggtggagacg cccgcaacag cccgtcggCg tgccgagcgg catgctcagc ggctgtggat gcgacggggc gaccttcgcg ctcggccgac gctctccgac caatcaigat gttacgccag ccacgggacc cgggcaagcc gcacatggag agagcaaata gctgccggtg cgcgggggtggccggcggtg gggcaagagC gcacccggag gaaccaccgg cgtggcgcag caagctggcg tgcgcggcg ggagctggac gt.tgctcgac ggcgCtgtgg gctggtggCg ggcgCgcgg gccggaggcc ggtcaatggq 
ggcggggttC ctcgcCgCt.q ccggcggCCe gagcgcgccc gaaggcgctc gctcgggctc cgccgggcgc cggctcggE-c agccgcggai cctccagaai agtcggcgac.
tgcgccggci cgattggcac, gatcgatga gtttctgagi cgttggcgal ggcggcgcsl cccgcccca gaccgagga cccgccaca gggaggccc gcacttggc gccgcctga ggtgaccgtc ttgtgcgcg cgacgaggct cttcagggtg gagcggctgg cggagcgttc rtcccgggcg gtcagcaccg ggagaggcac ctcgat-gcggcggctgctgc ctgcgcgaga gtgcaggaac gttgcggCgg acggcgtgct gagtgcgacc ctgctctcac gcggacggct gcgcagcgcg ggcccgagca gcgctggcgc gggacggcgc cgccctgcgg cccgcggcgg ccagcccagc gcggtggccc agctcgttcg gagctgtggc gagggggcgc czcgggctCg ctcgcggtgg gggcagacgc t: icctgttca tggccagcgt cgcccgctgc cagacggcgt cggtcgtggg gcgt-gcgtgg cggctgatgc gaggtggcgg rccggagcagg acggcgCgcg atggaaccga iagcgtttcgc ;gg gtactggg Icacgaagccc ttgccagcct gaggaggctc agctggccg ;cagcggcagc ccgacccaac a tcagaggag5 3 gcggtcgCtc a gagacatccc.
g gtagrtgctcl g atcggcgac(, :accgtgtCtl cgagcctgcgi cgagcatccoc a gccagccCg, c cagctcgcc g gggcaagcg ggtgggctg ctaaccagc g atccgcgcg g gcagcggcg gcgtggtcgc c' tcgccgagca g agctgggtat g tggaacactt g ggttgCCggC g gggtcgagga c aggtgccggc c agagacagac c cgttcttcca c tggaagtgag c gccccacggg c tcgccgatga g gacggctatc a cctcgtcgct g aagccctggt t ggatgcacgc a acgcgcgggc c accgcgaccc c gcgggctgac a acgcaggggt g tgggCgaCCC g accgaccgct q gcctggCcgg c cggagctggg c gcgcagcggt ggatgagcgg a ctgccgcgcc tcgatgcgca gggacgtggc cggtgacgtc cgccgggggc ccggacaggg tccgggaggc gcgaggtgat tcacccagcc gcgtagagcc cgggggtgtt aggggc-.Ctc cggcggtggc tggtgatcgc fgcgcgcgcac tgctggagga *tggtgagcaa rtgcggcacgt gtgcgggcac i gcctgccgga Icgggggtgct gcgtcttccc -ggtactggcc ;gctggttcta ;cgagccgcgg ;cagcgctgtc I cgaccgccga acctgtgggg g cgaccgtcg t gttCgCCCCg a tcgccccttg g gggcctgggg a tcgacggcga t tccgccatgg g caccggtgtc g gcctgatcgt ggcgCgggtt ggatcgcagc g acgtggccga ;gggtgatg gc ;gcctcgac tc ccgctgtcg g~ ctgagccag gi acagaggac c( ctggagtcc t gaccggtgg a tacgtgccc a atctcgcct c tgggaggcg a gtgctcgtg g gcggcgggg C tttttcctg g gtggcgctg c .ggcggggtc a .ctttcgccc g :gagggctgc g :atcctggcg g Lgtgcccagc g ~gttccagcc g ~atcgaggt-g c ~atcctggga g :ttgctcaag g :gagctcaac c ;ccgtggccg c ~acgaacgcg c ;gagcgctcg g lgcggcgcgg C ttcagcctg g gcgcgagggg c ggcgcgctgc a cgcgcagacg gttcgaccgg gtgggcggag zgcgctcttCa ggagctggtg ctcgctggaa ggcgggcggc gccgcacgcg gggcgcggao caagcggctg gttcgggcgg cctgagcggg gcgggaggcg gttcgtcgaa ggcggagccg cgaggcgctg cacggctggg cgacatca tcgcgtggac gagctggctg gacacgtgga gc tggtgac c tctggacgcc tgctaccgcg actctgggc tcaggcggcg cgggctcgzg gatgctcatc gcgccggcac gCtgtctgcg ggcccagtgg gcccgaccg ggtcgaggcg cgtcgaaccg gctttaccg a :cctgatgg c cgacgctag c :gctggagc t cgatcgcca t actggcagc t atggggcag a ggggtggct t gggaggcga t tcgagcgcg c gCgCgggCC c tctacagcg g gcctgcacg g acctcggct g acatgctgc t gcgggcggt 9 ccgtggtgg t tgatccggg g gccctgccc a acgtcgatt t gtgcgctga 'ccgccaagg c cggtgCtCg :cgctcttgcc 
:gcacggacc :atgtggtgc ~cggagcttt :tgcgcgagc ;cgacgacgc Z~gctggcgg itcgcgagct :cgggcatgg :gcgtggcgc :cggggagcg icggtggagt gctgggcata gatggggtga gcgatggtgt gcgtcggtgt caagcggtgc catgtctcgc gtggCggCgt aaggtggtcg gtgcgCttCg gtgggcccga acgctgctgg ggcaggctgt cggcgggtgc cctgacagcc tggccggaga gzattggcgg cttccatgcg gaggctgccg gtcgtCggtg ccggtgctcg gtgacccggg ttatggggca gacctggatc accgagctat acggcacggc gaggcgagct ctggtggagc caggcgtggt ctggaggcgc atgacagcgc ccagggcac 26100 cgtggagat 26160 gttcgacca 26220 gcaggaccg 26280 cgtgggtgc 26340 gttgaccga 26400 cgggcgcgt 26460 tctgcgcga 26520 gagcctgga- 26580 gggccagga 26640 caacgaata 26700 'caccggcaa 26760 rgccgaccct 26820 rccagagctt 26880 *ctcgccgaa 26940 rcaagacgtz 27000 .gctcaagcg 27060 ~tacggcgat 27120 ~ggaggcgct 27180 cgtggaatg 27240 ;cgacgtgta 27300 .caaccttgg 27360 gctggggca 27420 gtgggaggc 27480 ;cccgcgctz 27540 tggaagaggc 27600 tggtgctgtc 27660 acctggacat 27720 gcagcgcgat 27780 cgctttcggc 2-7840 cgtcgcgcgg 27900 gccgggggct 27960 tgttcgaccg 28020 ccgagtcgtt 28080 acgcgctgac 28140 gcgccgggga 28200 ggct~cgtggc 28260 cgctcggagc 28320 cgatcgcggc 28380 aggcgatcgc 28440 acgcgtccca 28500 cggtgacgta 28560 cggacgagct 28620 cggacggggt 28680 agccgacgct 28740 cgrtcgttgcg 28800 gggccgccgg 28860 cgctgccgac 28920 gtcgccacgc 28980 tacctcgcag 29040 ataagggtgg 29100 tcgtgctcca 29160 gcggtcgaag 29220 cggaggcgtc 29280 gcttggctcg 29340 gggcatgcat 29400 tgggccgggt 29460 cccgagcgag 29520 tgtcgcagga 29580 tggtggccgc 29640 acctggtgac 29700 tgggagcgcg 29760 gcgagcagca 29820 ggggtgcacg 29880 tggtttcgcc 29940 WO 99/66028 WO 9966028PCT/EP99/04 171 -9ggtcgagCc ggcggagacg gctgctgcac cgcagcggtg cgggctcgcg gtgggccgag tctgcccatg ggctcagcgc agggcgtcgc tccggcggca gcacgagatc cgatcctggg caacctcctt gacggtac-ag cgacacccag ctgccgcttc cgtggtggtC tgatccggag gagattggat gcagcggttg tacgctgcga gcagcggctg gctcagcgtt catggatacg actgggcgag cttcgtgctg ggccgacgcg gcgcgatgcg ccacgacggc gcgccaggcg cgggacaggg tccagggcgc tctggaggcg gcagatccg gccggtggcg cggcgtgagc ggaggtggag 
caagagcgcg cccggagctg gcaccggctc ggcgcagcaa gctggctttC cgaaacgtgg gatcgaccac gctcgatcac cctgtggcgt ggtcgccgCC gcgcgggcgq cgaggccgac caacggtcct gacgttcgcc gccgctcatc cgcgccagac cacgcccga( ggcgttgcal cgggctgttc.
ggaccgc tcc tgcgctcga( tccatggcal gat-cgcagq ccacgtgct, caaggtggt', ctggcccga gcccgacca cctgtcga ccgctgcgag gacgagaccc cggctgc tgc tggggtagcc catcttcggc ggaggcatgg tcgacgtcgg acggtgaccc aacctgcttt gcaacccgga gtccatgggg atggggttca caggctgagc cggctggtgg catgttcggt ccgggcgggg agcgccgagg atcccaggcc gcgaccttct ctcctggagg gatagccca cgaggcttca acggctggac gcgtgctcgt tgcgatcaag ctctcacgga gacggctacg cagcgcgccg ccgagcagcg ctttcgcaaa acggcgctgg tccggggacc gcatctggct gcccagccgg gtgccacgta gcgttcgggt ccggcgcccc gcggcgctgg agcctcggcc gc cat cg cg aagacgccgc ctgttcaccc fcctgcgttc Icctctgcgcc accgcgtacc tcgtggggcc tgcgtggCg Ictgatgcag gtggccgCC-, gacgccgtcc.
i gcgcgtgggi I gaccgatg( cgccggtg( ;tar-tgggtc; gccgcgggti j ccagcgtgc' g gaatgcgag :tggaagggc j cgtgagcgc c cgctggccg c tcgatcgga g gtgcccggc g cgggcgatc g gaggtcgag g ctggcgacc gggtggtgca tgctcgagtc acggccggcc atagccaggg gttcgcaatc cggacgcgga cagcgttgtc agatggactg cggcgctggt actggcgtgg ccgtcgctcg a tgagcaggg tggacgtgcg agcatctgct cgttggcgtc tggaggacct cgccggccga ggacttacgt tccgcatctc caagctggga ccggggtgt~t ccgacggagc ggctgtcgtt catccctggt cgctggttgg tgcgcgcgct cgcggggcga gcgactccat ggctgaccgt caggcgtgtc gcgacccgat gaccgctggt ::ggccagcct *agctggggga *aggcggtgcc *tgagcggaac cggcgccggc facgccgcggc facgtggcgtt: Lcgacctcgcc agggcgcggt Igacagggcgc -gggaggcgtt aggtgatgtc I ca~cagccggc ;tggagccgcz 3 gcgtgttctc cgctacccgc ccgtggcgcc g tgatcgccg5 a tacgcacgai ~ggaagac-.i gtcgaatgi gcatgt-gc( g cgccacgt, c zcggggaagi g :ggtcctcg, g :-gttccccg attggatgg tggctggtg cacgccatc cctttcatg agctgacag :ccacC9 :ggCggcgc cgccgctggc ggtgctccgt tctcgacctg tgcgtacgcg gctgcctgcg ggctcatgca ggcgctccag ggcgcgcttc cgcagggcgc cctgtccgtt ggtgctgggc cctcgactcg gctttcgacg cgtcgatgta agacgagccc ggagtcctac ccggtgggat gaccaaaggc gcctcgcgag agcgctcgag cgtgggtgcg ggcagggttg tttcctgggt cgcgctgcac cggggtcaac ttcgcccgac ggggtgcgcc cctggcgctg acccaacgga tccggtcgac cgaggtgcag gctgggggcc gctcaaggcc gctcaacccg gtgggggcgc caacgtgcat *gcgaccggtg ggcacggctc *cagcctggcg fcgaggccctg gcgcggcaag gcaaatgccg cgaccggtgc Iggctgcgccg icgtactgctc gctcgaagat cggcggtgcc ccacgccgcc j cgccgaggtz i gaggctcgcc ccagcgggtc caccggccac g aagcgccgtc r cgtcgaggti =ggacgcggtc C ggcgctcggT a tggcgcgcgc aL cctcaccccc t cgggctctg( a gcccttccti t cgcggtgati g cgtggagtt, t gctcacccc c ggagaccga, gtcagcgtca cccaaggtgg ttcgtgctgt gcggccaacg ttgagcgtcg cgtctgagcg cgcctggtgg gcgccggtgt gacatcatcg gcggaagccc ttcctcgacc ttgatggcgg acgctggcct ctgaagctgg atcgccatcg tggcagctat gcggcggact gccttcctgC gcgatgagcc agcgcgggta gggcccaatg tacggcggca ctgcacggcc ctcgcctgcc gtgctgctCg gggcggtgca gtggtggtgC atccggggaa cccgcccagc gttgaztttg gcgctoagcg 
gccaaggcca gtgcttgcgC cacttgccgt ggcgcacgc gtcgtgctgg gagctggtcg tcggcgcacc acgacgcgca cgaggcgcgc gccgtgtcct ggcatgggcc gtggcgctc, ggcctcgCtc fctggagtacc ggtcatagcz gcggtgaggt artggtagccz acggtgt~cgz icaggtgctc :gtctcccatc :gczgcgacgz :gt-cgcaggc( cgcttcggc(.
ggcccgaag( ctcgtgccgi cgcgtggCt( I cgaagcgcci, atgCCCggC, ggtga-caci ctcagcatc' ctgaaggcc gaagccgcc a cgccgacgg tgcgtccact 30000 ccgggagctg 30060 tctcgtcggg 30120 ctttcctcga 30180 cgtggggtct 30240 acatcggggt 30300 ag-accggcgc 30360 acaccgctcg 30420 cgccttcccc 30480 acgtggctct 30540 cgagcgcgct 30600 tggagatccg 30660 ttgatcatcc 30720 aggatcgcag 30780 tgggagccgc 30840 tggccgaggg 30900 ggtacgaccc 30960 gcgatttgca 31020 tcgacccgca 31080 tcgctccgga 31140 agtactacac 31200 ccgggaacat 31260 cgacgctggc 31320 agagcctgcg 31380 cgccggagac 31440 agacgttctc 31500 tcaagcggct 31560 gcgcggtgaa 31620 aagcaztgct 31680 r-agagcgtca 31740 aggtgtatgg 31800 acgtcgcgca 31860 tgcggcacga 31920 ggaacacgct 31980 cgcgtcgggc 32040 aggaggcacc 32100 tgctatcggc 32160 tgtccgcgca 32220 gcccgatgga 32280 tgagacgccgc 32340 *cacgcggtaa 32400 *gtgggctgta 32460 *tcgatcggga 32520 *aggcggcgcg 32580 1cgctggctgc 32640 ttcggcgagct 32700 tggtggccgc 32760 itcgcagcgtc 32820 i cgccgcggt 32880 jccctcggcgc 32940 ;cgttccactc 33000 i tcgcgtaccg 33060 ccgagatcgc 33120 acggggcaaa 33180 cggtcctgct 33240 -cgctacgcgc 33300 g cctggggggg 33360 tgcccatgta 33420 g cgcctgcagg 33480 j ctatgttgca 33540 tcgtgtttgg 33600 Sccgccgagcg 33660 a tcgcgatgga 33720 g gggacggcta 33780 a cgacccacgc 33840 WO 99/66028 WO 9966028PCT/EP99/04171 ccgcggtcgg ggaggaccgc gcggatcggc ctcgcttgcc gatcctgctg cgacgggacg tggaagggtg gctggtcgac gccgcgagag cgactggccc cgtggtggca cgtcctcgcC gatctgcctc gaccgagggc gaccatgggc atggggcctc tttggagccg cgc tgacgac caaagcgacg tgggcagaag cccgggcgag tgtgctggga cacggcggtg gacgttgcat gact-cccgcg cgacctgggg gggcacggcc cccgtccaag gcggacgctg ggtogctcaac cgggcggttc gcatcccggt agagatcctc gcatgcgttc tcagggcaag cgtactgctg gcagggcgtg caaagccgtc cgccgatcgg gggcgtgatc ccgcctctcg ggcgggcaac ggccgggcag gcgggccgaa ggcaagcgggg gtcgccggct cggggcgatg tgtgtggcgc arttggccgcc cgtgcaggcc tcggccgctc cggccagcgg cgcgctcacc cgcaaagtcc ccgtttcccz cgatgccgtc tccggatg7tc gttcgatccc gcggctgctc gctgatgggc cgccggcggc ctcgggCag gtgctcctCc, ttcggtggcc 
cagccggct, gtgcagccga gcgatccagc tggggtCCgC accctcgtgc gacaacggct cccccgctgc cggtgtggcg gaaac tggcg gtgttcctgc gaagcgccct gcacctggct gaacccaaag tgggaggctg ctctcggtgg gcagtggccg ggccggacag gaggccgatg gagacacagg acccccgaag ggcacattgg gtcgagatca atgtatccgg ggccaggggg cgattcgtca caggcagcta aatctgcggc gcggtgcaaa tgggcagcgg gagtttgctg gcgctggccg ctcgagatgg gttcgctatc gagcgcgtgg gcgatcacca gtcgtgctgc accggcgggc ccgcacatgg gcggagatcg aatgcgctgg catgcagccg cgggtgctgg gatctcgctt tccaactatc ggcctggcgq ctcagcgcgq cagggcaccc tcgctcgacc gcgttggtgc IcgtcttgggM :gagatcgcgc r cagacttg gt-gggtgcga cgctggCtg( Itcgccgcag Iggcggcgtgi gtcgaggtgc cgcggcaagi IgccttCttCt.
ctggagacgi agcgatacci atcgaggcg, ;atctcttati t cgctggtcl ;ctgg9CCggc, ;cgaggcctg cagacggcgc ccctcgactt tttggcgatg cgacctatcc ttgcggtgag cgttcgCCgt gcgtgccgcg aggtggtcgc ggcaggagtc tgcccgatgc cggagatggc gcct~cgaggc gagcccacga tgcaggcgct tcgaggccgg tgatgcagga cagcgcgctc tggctttcCg ggctcctggc accagctccg aggtaaccgc gcgacgccgg tgcgccacgt cggtcgacgc cggtgccggt gcggcgagcg tcgcccgatg ttcaggccat agacgttccg gcgagttcgt gcaagaccga gggtattcga tcgagggctt aggccgaggc tgccggcgcc tgggagcgtt tgctcacagg aagcgctcgg aagctgtgc-.
gagcgctcga caccgaaggt tcttcgtgct rcggcggccaa rcgcagagcct rcgctgcaggc Icgctgctcgg Itgcgtgcggc gcgcggaggc cgctgcccga -gcgtgctttc I gcctcgactc I cgctgccggc -tcgataaggt ;tcgccctcgz ;ccgatccgga -cgcatgagcc a tgacgacacc 9 gcatctcgcc a gcrtgggaggc 3 gcgtgttCgt t tcgatggct2 g tgctCgggC" g cggtgcaccl g gcgtggcgcl g ctcccgacg gcccggcgcg cgccggattc gctgcaggac gaacgcccac cctgctgtca ggaacgggtg gccgcaggca cgaggtggag gggcgcgtcg gcctgcggaa cgcggcgctc ggccctcgcg ggaagctccg cagggaccgc tgagcgggtg gcgcccggag agctgacgtt ttccggaaag ccctgacgca cctcgcgccg ctcggggctc gccgatgggc cgcggtcggc gcggctggtg cgcgttcctg ggtgctgatc gataggggcc gggcgtgccg gcaggtcacc ggacgcgagc catacgggat catcctggag tgctgcggga agcgtttcgg ctccgcagcg ggggctccac tcggcggggc cgctcgggtg ccaggccatt tgatggtgtg gactggcgcc gttctcctcc caccttcctc cgcgtggggc gcggctcgct gcaggcgctc aagccaagct gcgccatacc ggcgCgtCgc atggagcgcc gc-tcacggcc gacgctggcz cctggccgtc Lcgagcccatt Iatgggacatc ctttggcgg( gcgcgaaacc gttcgagcgc ggggctcttc Itctaggcacc aaaggggcc(, ggcctgccai =gatgctcacc ;acggtgcaai ttgccgcgcc t ctcgacaggt t gggcgcgtcg 9 gacgtggcgc c acccggagcg a cggtggtggc q ttcggtgtct c ggatttgtttc actgcagcctt cggatcgagg a gcaacacggct ggggtgtCtcc gcggcggcgC E gcggtgcgcc caggtcgcca ctcagctgca ctgttgcggg cgccgcgtag gagtcctatc gcacagcgc aacttccgga ggagattgtg gatgctgtca gtccggcagc acggcctggc catgctgcgg gaggtgttcg cgcacgcaca ggcggCCggg ctgtccctgc cgagccgcgg czcgctccgg catctgcgcg ttcatggcgc cccttggcgc gtggcccgct ctggatacgc acgatcgcgg ccggcggagt cttgatgagc tggaatctgc atgtcggggc gacgcgctgg ccatggtcgg cggcatggga IgctcggCCgg tcgggagcgg Igcggctgggg gccgacgagg gcgagcgccg gt-ggagctgc i tcgatcacc gccgagccga gccatcatcg cggctgctcg gacgcgttct ttcctgtccg ;acgaccatgg gccgggattt taccaggagt ggcaccacgg ;agcctgacgg g gcgCtgCggc g ccggcgacgt g agcttctcgg .cgaggtgct 33900 .atcggcggt 33960 rcgacaaggc 34020 .cttgcaccc 34080 igccggagga 34140 ~ggcgccggt 34200 .gagcttcrt 34260 ;ccgccgggc 34320 .gtaccgcct 34380 ~gagccgggt 34440 caaccgctg 34500 cgcaggtgt 34560 ~gcgtgtggc 34620 
:gtggtgggt 34680 :agcgccggt 34740 :tctggtgga 34800 agctcggtcg 34860 cgcggctggt 34920 gactggaggc 34980 gggcacctgg 35040 ccgtcccgc 35100 ccggtgtcgc 35160 tgacgctggg 35220 ctgcagggct 35280 tcgctctgca 35340 ccggcggtgt 35400 ccacggcgag 35460 tcgccagctc 35520 gcgtggacat 355B0 tgtcgacggg 35640 tcgcggcggc 35700 atcgaactcg 35760 cattgccggt 35820 aagcgcggca 35880 cgacgggcac 35940 ggctcgccca 36000 cgggcgctgc 36060 cgtcggatgt 36120 ggccgttaca 36180 agaccaccaa 36240 atgaactcac 36300 tcttgggczc 36360 ccgcgcatcg 36420 acggaggcat 36480 tgggagctct 36540 aaacgcagct 36600 cagtgccgcc 36660 cgcagggggc 36720 tgcgcaaggt 36780 tgcccgtcga 36840 gcaacgtgct 36900 cgacggtcga 36960 gcgtaccgtc 37020 gcatcggctg 37080 aagagggcag 37140 atgatccgga 37200 atatcgaccg 37260 atccgcagca 37320 tgcccgagcg 37380 acgctgcgct 37440 ccagcgtcgc 37500 tggacaccgc 37560 ggggcgagtg 37620 tcgtggagtt 37680 ccgcagccga 37740 WO 99/66028 WO 9966028PCT/EP99/041 71 11 cggcgtgggg gcgcgatggg cagcaacggg ggagcaggcg gacgttgggc ctcggaccgg ggcgggcgtg gagcctgcat ggccgccaaa gtttggCgtC cgcgcccgcg gctggacgcg cggcgacctg ggcggcgacc gccgcccgca ctttcctggc cgtcttcCgC gctgCtCgCC gcagccggcg cgagccggat cgccctgtcg gatcagcggc gctcctgggC gctggcgggC gttctgCcgt cgacgagcta ctcgacggcg caacgt tcga tgggctgttC acgggcgacg cctgtccatg gcggctgttc gcagcgcgac tgctcatgcc_ cacgcgcgtc ggtgcagggq ggccgaggcc ggccttCgCc ccgcctgcaz ccatgcccgc ggccgCgCtt ggccgagar-c ggagggcgac ccggctcca( cggcgaggcc.
gtcggggga(.
gcggagtac gctcgtggCi ggagccggc, gctcatcgg, ccattccgt gacggcgtc gcgtggCgt gcgcggctg tcctccgcg tgtggcgca gcgctgcgc t gccgagct cgtggccCg ggaaggccg ccgagccac ggggctcaa tccggttgC gagccttCg caccatcga tggagcgaag gatccgatcc ctgacggcgc gggctggctc gaccccatcg ccgctcgtga gccggtgtca ttcgacgcgc cccgtcgaat agcgggacca gcggcgcgtt caggcggcgc gcgttcagcc tcgcgcgagg gcggctcgcg cagggctccc gacgcgctct gagctcgcgg ctgttcgcga gcagtggtag czcgaggatg caaggcgaga tacgaagacc gagccggcag cgagtcaagg ttggcagcat acgagcacga cagccggtgc gtggagacga aagcgggagg ttggaggcgc tccgcgggcg fcggrtact-ggg Iggcagtcacc rtgggagacga gcggt-cgtgt ttgggtgacq gatgatacgc ittccacgt :ggggt-gct-gC :cgt-gcccggc ;gggctcgagt ;gcgctgggac cccgcgctct j acgccatgg u ctgtggt-gtc :gacttctgg5 U cagcggctc.
t tgggaaccgi c tcgggcggc c gtccacgcg c ttcgacggc' g ctcgacgcg, c gacagcgtg g ttgtggctC a gcgccgctc cggatccgac g ttggccgac g cr-cgzccga g ccgttccgg g gagcggcgc .c tttctcgac :g ctgggcgcc rt atcggccag LC gcccggatq gctgcgccat tggcggtgat ccaacgggtc cggcggacgt aagtgcaggC tcgggtcggt tcaaggtggc ccaatccgca ggacgagaaa acgcgcacgt cagcggagct ggctttCggC tggcgacgac cgctgtCtgC gccacgcttc agtggctggg cggcgtgtga ccgatgagac tcgaggtcgc gccacagcat ctgtagcgat tggcggtcgt ggctcagcgt cgctcgcaga tggacgtcgc tgggcgagct tcatggcggg gcttcgccga gcccgcatcc gaaccgcggt tgggagcgct gcgcgggCCt tcgatgcgcc cgctcctggg *cgctggatct *tcccgggCgc gtccgccca *cggCggcggt rcg-agccgggt *gccagatcgz -ttcaggccac *acggcccagc -gcgtgcggct -tggatgcgtc I tacccgtggi atgcgcgga( ;tggtcgacac j cgggaggtgi a ccgcggtcm ggctcggcgl a cagggcgcgi agccccgal g atgccccct c tctggaccg g tgacacgcg c tggggctgg tcgatccag g acgccgagg a ggctgcccg tggagatcg C ctcctggcC g tgatgaggg a agtgctCCg -g acgtcgtgg c tcgcacctc gctcctgctc ccgcggcacc gtcgcagcaa cagctacgtc cctgggcgcc gaagtccaat gctggcgctc cattccgtgg cggcgtgCcg ggtgctggag tttcgtgctg gcacgtcgt t ccgcagcccg cgcgctcgac cacaggcagc catgggccaa ccgagcgat~t cacctcgcag gctgtcggcg gggcgaagtg catctgccgg cgagctttcc ggcggtgagc ggtgCtggCg cagccacagc cgagccgcga cccggagctc agcggtgcaa gatcctgacg gggct.Cgttg ctgggtacac ccgtcgcgtg gaccggcggc *tgaaatgcac *caaacggctc ggcgtacctc Lggtcagcgat gcaggtcatc gccgggccac igcgcgccgac Tcgcacccgcl :gttccaggg :ccccgaggc( I cttccacgtc.
i aatcggctcc.
I tgtgagccaC j cacgggcgc4( :acgccggcgl :cggatccgal tgcgctcca, g cacgagcgc :gtCggtggt t cgacgcca t gcaggccgz g cgctcaggc g ccgcgttat gcggcgcga a ggaagtcgc a gaccgactg a tgggtccgg gggcgaggt ,c catggggat ,g ccgaattgt C cgtcgcgcc :g ccccgcggc aaaccgcttc gcggtgaacc gaggtgatcc gagtgccacg gcgctggcac atcggacata gagcgcaggc tcggagctcg cgacgagccg gaggcgccag tcggcgaaga gcgcacccgg atgacgtacc acagcggcgc gccccaaagg aagct-cctct caggccgaag ctcggccgca cztgtggcggt gcggccgcgc cacagcctgc c!tggccgagg aacagcccgc atccttgcgg ccacagatcg caagcgaccg gtggcgagct tcgttgatgg acatcg-gtcg cggcatggac ggccaggcgg ccgctgccga gcggcgggccaccctgtcgE rccgtggctcc gagatggcgC gt-ggtgctcc Igcgaccgag agcggtgctc g.ccccggcgz gcggctaccl ;cttgtcgagc gccggctcc ;agcagcgccl 3 ctgcggrtggl ggaaagccai ;atcgtcgcl gaagaagac, ;gtcatggcg, c zcggcgctg c gccgggttg g cacctcggc t gcgcttgag g gccggggcg atcggcgcc gccttggag ggagaagtc g tttcgcggc c cgagagaaa c gcgctcgac c gagatcgcc .c taccctggg .c gcgatgggc c ttcagtttc :g ctgacggc gcgatgcgca 37800 aggatgggcg 37860 gtcgggccct 37920 gcaccggcac 37980 aggggcgacc 38040 cgcaggctgc 38100 ttatcccgag 38160 ccgtgcaggt 38220 gggtgagctc 38280 cggcggcgtt 38340 gcgccgcggc 38400 agctcggcct 38460 ggctcgcggt 38520 aggggcaggc 38580 tggttttcgt 38640 cggaggagcc 38700 ccggctggtc 38760 tcgacgtggt 38820 cgtggggcgt 38880 acgtcgccgg 38940 tgctgcggcg 39000 ccgaggcagc 39060 gctcgacggt 39120 caaagggggt 39180 acccgctgcg 39240 tgtcgatgcg 39300 actgggcgga 39360 aagacggtca 39420 aggagatccg 39480 aggacgagcg 39540 ftgggctggga 39600 Lcctatccctg 39660 rgcagccgctt 39720 icccagaggag 39780 1gcgatcaccg 39840 :tttcgtccgg 39900 jccgaggcgct 39960 ;agcgaccagg 40020 7 cctttcgaag 40080 i ggctggatct 40140 atgcggcgct 40200 tgtggcgggg 40260 cagccgcgtg 40320 t-cgctgaccg 40380 r tccagcggcc 40440 a cacccgaccg 40500 g agatctccgg 40560 g actggttcat 40620 g gccggtggct 40680 a cggaagctgg 40740 aggcactctt 40800 gcctcgatga 40860 g agtcgctggt 40920 g gcttccgaga 40980 g gcgacgtctc 41040 c acgccgagct 41100 g atgagctgct 41160 .g gtgagcggcg 
41220 ,a tcgagcccgc 41280 !g acctggtgct 41340 :g tcgaggcggc 41400 rc ccggggacgg 41460 -g aaggigtcga 41520 :g gcacccacgt 41580 ,g cgcaggcagc 41640 WO 99/66028 WO 9966028PCTIEP99/041 71 12cgcgctgccc g ggccggcgag c gatcgcccgc c gtggctgCgC 9 gcaagtgctg 9 cgccgcgatc g caagacggac a ctacagcgcc g gctggcggag cttcccCCtc t gaagctcgtg c cgtcgccatc c gagcgtggct cggcgcggtg tgtcacggta ggttaccgcg cgggctgctg aggggccttg cgctt cggga gctcctcgac ccggggcctg1 ggtcacccgC gctcgacggc gttctacccg ggczzccggt gggcgcgcgg cctczccgaa aatagggct~a cctgctgtgg c -ctacgggg ccacgaagtc gcgcgcggga gtgaggttac agaccgagcc cggaggcgtt gctgggcgCt ccgaggccat cgctcgaccc gcaccccgcc cggagtacct ccaccggcaa gaccttgcct gccgcagcc.
rctcccccga gccagacctt tgctcaagcg gatcggccat agggggcgct acatcgagac gcgctgtggt ccaacctcgg cgccacatca ggatcgaggg ggacgcgctt ztggaggaggc.
tcgtcctgtc acctggagaa gcagcgcgat cgctttCggC gcggcagcgc taggccgaaa gggccatcga ~ccgcagct tttctgcgct gcgaggttgc rtcgcattca t :gcgtgctca t :acctcggcg c ragcagggga t jccgcgacga a racgcgagcc t tctatgcag a ~tcgatcttg c ~tggtggacc t :cgCgggccg c :tcgcgctgg a :gcgcggacg g ;gatggctgg c igcgcggagc a ;cgagggcag a tcggggatgc c itgcagcaaa c :acctgcatg c ;cagggctct t acactggcac a :tcgcggacgt gggacgcgga q gatcgcacccz gcggcggcat zggctcgccg gcagggatgc ggcaagctcg gagctgcgca acctacccca gatggggaat gcttcgctcg aagaggtgat tctggccctt gatcgccatc ctgggagctg cgtaggtgtc cgacggcttc gcagcatcgc caggtccctc ccacgccgcc catgctcagc gaccgtcgat gcgcgctcga cacgatgcga cgacgcgtcg attgagcgac caatcaggac cttgcgcgag ccacggggcg ggggCCggcg ccacctggag cgagcgcatc gaccgcgctc cgcgggagtg gccggcggtg ggcgaagagc gcacgtcgag ggagcaccgg cgcagcgcag gccgaaggtg gctcatggcc ggcggaagcg cgggcgcatc gtggCggtcg ggcggcgcac gacggcctg g ccactcagc g.
ggagatatt t cgcgcacgt g gggcgaggg g ttcgacccc. c tcgcccgcc g gggcttggc c gctcgcacg g ggacgcatt c .ggacccgga c rcgcctacct c :cgagcaggg g 1 gcagacggc t Lcgt.cgccga t :gctccgcgg c :ccccgcgcg g :gttgacacg c :gggctCgcC g iccaccggag g :gggtttggc C ;cctcacccc c 9ggccggggt C :ttcgcggag g ;ggatcggga c :gcaggaggt c acgtggatgc q accgcatca cggtggcagcS cgcgcaccc acaagacgg tgcgtgacag cgcaagacgc gtggggatcg ctcgacgacg gacccaggcg gacgccgcgt1 ctgctgctgg gtcgggagcc gtcgcgcacc atcgccgccg acggcgtgct gagagcgatc gctctggcgC gccaacgggt gcgcagcggg ggccggtcga gcgctgcgga gcaacctcgc cgagccaaCg ggcgctgccg ccgaggaacc gcgttggcga agctcgttcg gagcctgagg gcggcggcgc cttggcctCg ctggcggtgg gggcacacgc gtcttcgtgt gaagagccgg ggctggtcgc gacgtggttc tggggagcgg gtggccggCg tacggtctc g acggggggc a gcgaccgct g atggactcg c gtcgacgtc g gtgccggac g gggctcgct c gtgcgtcgg c ggagcgctg c cggaaaatgc gtgcggatc c gtgaccggc gctgggcatc gtcgccgcg cgggcgcag gtcgttcat ttccgcgcg 'gaagcgccg rggccagggc rgcgcagggg :gccgggcag :gacgaaggg :atgccgttc ttgtcgcgg :ctgctcgaa :gtgcgcgcg ~ccgctcacg ;gccgtgctc ;ctgagtgcg iccggataca ;ttattcgcg accgagaagg tgaacgagc g gctgccgctt ggcgcgacgc acgacgtacc :cttcggtat aggtcgcctg gcaccggcgt agccgcgcga gacggctatc cgtcatcgct tcgcgctggc gcacccaggc tcgtccgtgg atggggaccg cggggttgac acgccggcgt tgggcgacc gagcgcgctg gcgtggcggg tcaactttcg ccgaaccggt ggatgagcgg ccgcggccc tggatgcgca gcgatgtggc ccgcgagctc cgccgggagc ttcccggtca tcttccgggc tgctcgggga agccggtgct agccggaagc cgctgtcgct rtccatctgg g Lccgggctcg c rgtacaccgg a :ggtcgctgg a tgttgaact c ~gccgcttca t :acttcagga a :ccgagcgcg t :agccgctcC c ~cgcaagcgc a :gCgttCCgg g ;gtCtggggg 9 :tggtgctgg t :Ccgaggcgc a itggagcgga t ;cggccggaa t ;tcatggcgc c :ccCcttct t aactacgccg c ctgccagcat t caaaatcgcg :cgtgggcgct gacgtgcggcz ctcatgacgg c cggctcgccac caggtctcgc agccgggaa ggcatcacca catctggctt gggagcgtg tgattgatg ccagctcctg cgataccctg ccccggcgga gatccggccg gcgCtgggcg cgccccccgg ggaggggttc gttcgtcggc agagcgggac gtacacgctg ggtggccatt gggaggggtc gctgtcgcc ggagggctgc gatctgggcg 
ggcgcccaac cgaggccgag catcgagatc cgtgctgggc cctgatcaag tacgctcaat gccctggccg gaccaacgcg cgagcgcgca ggcagcccgg gttcagcctg gcgcgaggcg cgtgcgtggg gggctcgcag ggcgctggag gctctccgCC cttcgccatg ggtggtgggC cgaggacgCg gaggctccg 41700 tgctgtgca 41760 gaagcgggc 41820 cttcgccga 41880 gctgtctgg 41940 cgagctcgg 42000 gagcczgtc 42060 cgcagcgct 42120 ggtagagat 42180 .gcatctcgg 42240 'cgaatccgg 42300 rgctcggtct 42360 .gggccgctc 42420 .cggcqgcgcg 42480 ccr-ccgcga 42540 ~cctggacga 42600 :caaggtccg 42660 :cgtgctgta 42720 :ggccaacac 42780 agccazcga 42840 ;cgcacggct 42900 :cgagcgcct 42960 igtgggtgga 43020 :acggcgcgt 43080 :cgccgaggc 43140 iggtgctgcg 43200 :ggaczcgct 43260 :gccggcgac 43320 ctcatgtcgt. 43380 czccaacaac 43440 agtcaczcgc 43500 gagcgcttgc 43560 gagctcgaga 43620 gcgggcactc 43680 :ccgaggagc 43740 gggctgctca 43800 gaggcacg 43860 gaagacgccg 43920 gtctgcgcca 43980 ccgtacagca 44040 gggctgcagg 44100 cacctcgcct 44160 aacatgcttc 44220 aatggccgtt 44280 ggtctgatcg 44340 ctgatccgag 44400 gtgctcgccc 44460 gccatcggtt 44520 gaagcgczgc 44580 gcggtgaaga 44640 gcgacgcttt 44700 ccgcggatcc 44760 cggacgggcc 44820 catgtggtgt 44880 gcggagctgt 44940 czgcgggacc 45000 gcgacgacgc 45060 ctgcgagggg 45120 cgggcctcgg 45180 tgggtgggca 45240 ggttgcgacc 45300 gacgaggccg 45360 gaagitagcgc 45420 cacagcatgg 45480 gtggcgatca 45540 WO 99/66028 WO 9966028PCT/EP99/04171 -13tctgccggcg cagccggctg ctgcggcgga agctgtcgct ggaggaggcc gaggcggcgc cggtgagcaa cagcccgcgc tcgaccgtgc tgctggcggc gctgacggcc aagggggtgt gccatagccc gcaggtcgac ccgctgcgcg ggccgcgagc ggctgcggt.g ccgatgcgct cggagctcgg tgcgagctac tgggcggaca cggcgcaagC gctgctggag ggtggccccg tcctggtgcc gcccctggac gagatccaga gctcgctgcg gcgagggcag gacgagcgcg gggcgtccgg ctatccggtg agctgggctc cgctgccgac ctatccctgg cagcacgagc gccgcctcgc cgcagccgac cccaccaagg tgccccgcgc cgccccgaaa tcggagacag ggggtggggt cggtgaggcg gtcgctgcag tgcttcatgc gccggctgac gcctccaccg gccgaaacga ctggcaggga gtcctctacc gggcatcggc cgacgaagtc agcgaggcta tggttcgatt cccgagcgct gcgccccatc catgcacggt gggcggcgag 
ccagaggcct cgcgcgtcgc ggcgctggag caccccgctg agaagagccc gacggagatc gagcccctgg atcaactggc gttccgcagc ggtcgcaggc agggcgacgt cgcaccgata tcgctgtccg tgggtggcCt tggtCztgCtc gtggctcggt tgctcaccag ccggcacggg ctgccagagc aggcccgcgc gcgcatcgca gcggtcgagg tggcagcggt ggatgtcgcc gaggccgatc ccccgttgcg cggggtggtg cacgccgccg cggacgaggc cctgctggag tcggtgctcc accggctgct gcgcgaccgg cctctcgacc tgtggggtgg caaaggccaa ggcgcatacc cgcaccatcg ccgcgcgcac tcgctgccgg agggaggcat ggttgatgca aaggctcatc tggccacggg gccggccttg tcggcgctgq gttcggtcac acggatggac tgggcgcgct gcaacttgct ttcggctctg gtcgcggag5 cggcaaaccg gatctggcgc ggcctgtccc tcgttcgcgg catcgtcgcc cgggtgctg gccgaggctt cgccgagcag gggctcgacl ttcagcgcga gctgggcgaa cggctgtcg agcggctggt ggcgcatctc ctcaccgac( ggcacatccg gtcggtggcg gcggatgacc, tcccaggtgg ggatgagggc ctggagacal tcagcaccga ggtgccagcc gaccggtggi aggttccggg ccggacctat gtggccaag, atgcggcgtt cttcgccatt tcccctcgt' tgttgctgga ggtgagctgg gaggcgatc gcgagagcgc cacgggcgtg ttcgtgggc agggcctcga cgacgacgcg gcgttgccg ccgctggacg gctgtcgttc ttcctgggt cctgctcgtc gtcgctggtg gcgttgcac gcgaccaggc cctggccggc gggtccagc cgtcgcgcat gcgtttgctt tcgccagat acggctttgc gcgggccgag ggctgcgcc agcgcgaccg cgaccccatc ctggcggtg cgagcagcgg gctcacggtg cccagcggt tggcgcaagc gggcgtggcg ccggccgag cagcgctggg tgacccgatc gaggtgcag ccgcggagcg gccgctctgg ctgggcgct cggcgggctt ggccggcgtg ctcaaggtc cccaaccgga gctcgacgag ctcaacccc ttgtccgcag ggcggtcccc tggccgcgc cttcggcct gagcgggacc aacgcgcat ctgtggccgc ggcccccgag cgcgcagcc tcagcggtca tgcgtggcca tcgccggcga tctggcggca aagagctgat cgacggtgac accttcggca cgctgttcat cggcggccga cgacgctgc t ggctgttCc ggtgctggat ac tggt tc ta ctcatgggag cgctgtcgac tcgccgagca tgiggggCct cccgccgtgc ctcctcgctt ctctttgcca cctggggtgg tggccgagct acgcagcacg cggaggggag ggctggtgga gacaggcgtc ggctagaagc ccatgacggc gcgtctzccc gtcccaaggt tgttcgtgct ccgcggccaa cgttgagcct Fcacgtctgag Tagcgcctggt :tcgcgcCggt Iacgagcgcgc tgcggagac 3 gcttctccgz ccctgatggc ;cgactctggc g tgctgaagct g acatcgccal r. 
actggcggcz :gcgcggcggi g gtgccttccl g aggcgatga(.
g agcgcgcr-g a tgartcggga( t acggcaccai tgcacggcci c tcgcctgcc, g tgcttttgt, g ggcggtgca g tggtggtgc g tcaggagca c ctgcccagc ,g tcgatttcg 'g cgctgggcg .g tcaaggcca [C tcttggcgC rc acatcccgt :g gcgcgcgc .g tggcgttgg ig agctgttc gggggagatg gc tgagggtcgg ci gccggcggcg Ci ggtgaaggtg g9 cgcggCgctg g ggcggggtg am gccggtgcgC t' cgagatgagc ci gcaagggggc g ggaggcgctg gi cgcgggcggc a cgaggtcgag c ccgaacggac t ctggctgctg t gcgcggactt t ggtatccgaa g cgacgccgtc g caccgcaccc g ctgggtggtg a agcggcgttg t cct-cgtggac c gctttcgccg g ccttgtagcc g ctacctggtg a gcgjgggagct c gggcggagag c gcaqgggcg c gctgctggCc g cgtgcgtcac c ggccgggagct gttctcgtcg tgcgttcctC cgcctggggC cgacatcggg *gaacaccagc *ctatgccgcg tgCgtcCCCC Fccgctcagcc LcccgggcgCg tccggagatc cttcgaccac ggaggaccgg cgtcggtgcc i tctggccgag a ctggtacgac ,ccgcgatgtg j cctggacccg ccaggacccg g cgagcacgcc cggcaacctg gacgatgacg a. gagcctgCga c gccgcggtca a gacgttctcg t caagcggctc c ggcgatcaac a ggcgttgcta t ggagtgccac c ggt-gtacggg a cctcggccac t ggagcacgag g ggcagagctg 'c gcgtcgtgca [a ggaggcgccg [t cctgtcggcg :gctggtcg 45600 :gagcgtgg 45660 tctcggagg 45720 acgtcgcca 45780 gagcgatcc 45840 tcgcgggtc 45900 tcgczgcgg 45960 cgcacccga 46020 ctgcggtgg 46080 ggacgctgt 46140 ggcgggttc 46200 ctgacgccc 46260 ggcccgagg 46320 tggccgaca 46380 cctgcaccg 46440 ctgccagtc 46500 tcgatgctg 46560 tccttgggc 46620 cccgcgggg 46680 ggggcctcg 46740 tggatcctc 46800 racgccgagg 46860 rccccgccgg 46920 cgggcgggc 46980 gacatctgg 47040 :agccgccgg 47100 :gggtgaccg 47160 ;ccaccgagc 47220 :Cggcggaga 47280 :ggctgctgc 47340 ;gcgcggcgg 47400 ;acgggctcg 47460 :Catgggccg 47520 Itcctgccca 47580 Ictgtccagc 47640 cgagggcggc 47700 ccggtgccga 47760 ctctacgagc 47820 :ccgacgtcg 47880 cgtaaccgcc 47940 ccgacggtgg 48000 agcgacaccc 48060 gcctgccggt 48120 ggcatggtgg 48180 cccgatccgg 48240 cgcagcttgg 48300 caacagcggc 48360 atggcgctgc 48420 gagcgggtgc 48480 ctcagcgtcg 48540 gtggacaccg 48600 ttgggcgagt 48660 ttcgtcgcgg 48720 gccgctgcag 48780 cgtgacgcgc 48840 cacgatggcc 48900 cgccaggCgc 
48960 gggacgggga 49020 cggggccgcc 49080 ctggaggccg 49140 cagattccgg 49200 ccagtggCcg 49260 ggcgtgagCg 49320 gcggtggagc 49380 aagagcgcgg 49440 WO 99/66028 WO 9966028PCTIEP99/04171 14cggcgctgga gcctcggcga cggtggccgc acacgccgcc tcgtgtttcc agccggtctt ggtcgctgct tggttcagcc gagtggagcc ccggcgCgCt ggcggatcag cggcgCtgCg ccgtgctcgc gggtgttCtg tgcgcgaaga tgcgctcgac cggacaacct gccccgcgct cccagacggc agcgcgcgac gggctCggCt acgagcggta agcttcgcaa gacccggagc aacatagggt ccgccggcgt gagccctcgc gtcccggtCg ggcacgccac ctccgtggga cgctcaacga ccggcacggg gcgcctaccg tcaccacgcc tcccgcggtc cgctggacgg tccatgcgaa gcatatggaa ggcttcaaat cgatcggcat ctaatttggc ttgagtggcc cgctcgagat cacccatcgc tggtagcacc ccgccgccgt ctccgctctt gcattctggz tcatcgtcgc tcctgtcgtt tctggticg4 gtatcgacci ttgcggacg( tgatcctctc( cgaccgaat tcgtccaCc( gggtccacgi t-gctCCtgt agctcgagg aatggcagc gggcttggC aagggcgag cggggCtgt cattcggcg cagcagggg tgcgcaggca tgtggcgttc gagctcgcgc gggagccgtg cggccagggc ccgggcggcg cggggagctc ggtgctgttC ggaagcggtg gtcgctcgag cggtcagggg tggccatgag cggcgagccg gcggcaggtg gctgatcgcg ggtgacgggc tcggcagccg gttcatcgag ggccgagcaa gctgctggag gttcccCgCg ctggatcgag cggcgccacg tcacttgtgg ccatggcgaa agatcttat cgtgccctCC ggcctcattc ggggcacgtg gattcaacgg gcacgccctc ggaggtgctc gattcatccc ggaatccatc cagggctccc *tggacggcgc *gtgggaggtc *cgtcttctgc *ctctgtcgtc *ccttgtagg ggcggtgct< Icaagtttgc cctcttccci ccgttactcc.
gtcgggaat(.
cctcccggt(.
ccttgctcg4 Stgtcgaccai ggccaatgti gctcgCgcCl i tatcaccac, a tccgctcct :cgtgagtct :gcgcgcgcc :gccggccgc g gatggcgtt g tcgattgcg c cgaagacac c ttctgcttt g caaagaccc t cgtgctgat g cgaggcgtg a tcaagtcga a ggaccggar a gagggcgac gcccggctgC agcctggcga gaggcgctgc cgtgggcggg tcgcagtggg ctggagggtt tccgccgacg gccatggaag gtgggccaca gacgcggtgg gagatggcgc ggtcggctga gcggcgctCt aaggtggacg gcgctgggag ggggtgatcg gtgcgcttCg atgagcccgc gggggcg-ctg gcgctgggga ggcggcaggc gacagcgtgc gaccatccgc gagcaaqgcg gccgtgttgC ggcacggcga gaaggcggac caggtatcga tgtagcggcc *cgatgtccga gactatggtc ggccgggtac *gccttgttgg gagattcgga Tgtgaatcaag *cagagcgcga atggagcgcc gctactggac tacaggaagq I gacggagagc -gaggaggccc -ggggaacggc -ggtggct-cgt 3 aacggcatcc j ttcagcatct 3 Ctgctgcctc :gcggagcaaz g gagccagctc :atccatgcgi c ggaggcctt( g cctgctcg g ccaggcgaci g ggcatagcal g cgcgcagta g gagaggatg t accggtgga c gggcaggtc c gcgccgcgg t ataccagag g gaccagggc C gtgcgCgtC t ccggcgcag a gcggagtc~g gggaccacct cgacgcgcag gaggggcgct cctcgggCgg tgggcatggg gcgaccgggc aggccgcctc tagcgctttc gcatgggcga cgatcatctg tggtcgagct gcgtggcggt cggaggtgct tcgccagcca cgatccggcc cgggtccgga c tgcggcggC acccgatcct cggtgggctc cgctgtgggC gggttccgct atgggtcgaa tgctcggggC tgagcgacga ccagcgcggc cgctggtgCt gcatcgtgca gtcgtgagga agagc tcagc gcgicctgtc cctgcttcca gcttgCcagg atgcatgttt ggcggctgac cggtgagtgz LgCgtgCCCgt :tcgcgcaggc Iagcgtcacac i catcaagcc attttgtgac Iggaggg tgtt tcacggacgi tcgatatggc tgcgcggtgl tggagatcg ;accggacggi i gatttcgagi I gccagggati a cccgcgata, :tggtgCtgg j aggggtggc, a cctggtgtg g gatctccgg g gagCCgctt c ggcaggaat t acttccac g gcgcgttca g t-ggcagagg g acccgcggg g ctccggcag g ggacaggcg a tcgcgggta c cagatggct g tagtgcata Ic ttcaggccc ggagaagcat g~ cgcgatggag c~ ttcggccgca g cagcgcgccg a ccgaaagctc a catcgaggcg g.
gcagctcggg c tgcgctgtgg C ggttgCggCg..g ccggcgcagc c gtcgctggag g gagcaacagc c ggcggcgCtg a tagcccgcag g gcgagcggct g gctcggtgcg a gcaagcgctg c ggtgccgccc c gctgcggcga a gtzcggctat c gccgacctat c gccctcgctg C tccattgctc g gaggctatcc t gtatgtagag a ggagcagctg g agtggccctc z ggcaggtagg ggtgggagcgt gtcggaggcgc *gggcgtggagc agacatggca *tcaggtgctg ggatctccac Lcacctggctg cgacctggtg gtacatcatc gatagacgag j atggataggaa ;ctctcagccg cgc cgacct.c attgaccggt ggagcgaatc cgtcgagtcg j agcagggacg a gtaccatttc a cgcacatcag L aagagccacg t cgagggcaca a gaagtacgaa a cgtcctgcgc c ggggatcctc g tgacagctc g ggccaatggc g ccggccgggc c gaaggcgctc t tcaggggctc a agagtggttg C cgcgtcttct C tgcgctCqta ,c ggcatacgcc ,t tcataccctg Lt gtggacicct [a tcaacrtg tcgagcttg 49500 accggctgg 49560 cgcaggggc 49620 aggtggtct 49680 tggccgaag 49740 aagcgggct 49800 gcatcgacg 49860 ggtcgtggg 49920 cgcacgtgg 49980 ggctgctgc 50040 aggccgagg 50100 cgcgctcga 50160 cggccaagg 50220 tcgacccgc 50280 cggtgccga 50340 gctactggg 50400 cggagggtg 50460 ~.agacgaga 50520 ggcaggacg 50580 ccgtgagcc 50640 *cctggcagc 50700 *ggct-tcgrgc 50760 jtctcggcgc 50820 *acctttcgg 50880 tggcgctcg 50940 ;cgctcgagc 51000 ~gcgaagaag 51060 Igctgggtgc 51120 tgaaggaag 51180 zctatccgc 51240 .aggtgtggc 51300 tcctcaagtg 51360 acagcgctgc 51420 gaaccggatc 51480 tgggacgccg 51540 ctcggcagct 51600 ggcactctcc 51660 ttgctcgtca 51720 caccttgtca 51780 c: ccggagc 51840 ccagtcctat 51900 aagacgctcg 51960 zatcaagatt. 
52020 gcggcgcggg 52080 ggcgcgacca 52140 accgatgttt 52200 ccgaagtatg 52260 aggtttgacg 52320 gcgaagcgtc 52380 gggcatccga 52440 gatgatcttc 52500 cgggtaggct 52560 ggacagcaCg 52620 ggtgagtcgg 52680 cccgctgacg 52740 cggcaggttt 52800 gczggagatc 52860 cgcctgccgc 52920 tacgctttgg 52980 tcctccgcgg 53040 tcgctgccgg 53100 tgcctcgcgC 53160 ctccgcgacg 53220 c-atgcgacgg 53280 gggagcctga 53340 WO 99/66028 WO 9966028PCT/EP99/04171 gcgcgctttC ggctcttgac aggcgccggt cgctcgtgga tcggggcgag gcctcgtgcg gcagctatgt tgatgcaggg gggatgccct tggctcggcg ttcgggggat atgcccgtcg cgctgaccag tgggCttgCC atcaccggtg catcatcgcc gcctcacgct aggtaggggt gattggcgct acgcg~cgaa tggaaaagca agcggcacgt accgcatcga acgtagcagc acgctccggc tgcggggcat cgcagatcgt tgagcatgcc gcggaccgag cccggaggca gcgctgggCg caccgaggcggtcgctcgat cggcatcgca cagcgactac caccggcaat accctgcctc ccgcagccl: 7rtcgtccaac ccggacattc gctcaaacg ttcggccatc, ggaggcgctc( tgtcgagacc tgccgtgtt4, aaacctcgg, tctgcaccai gatcgaggg, accgcgctt ggaggaggc ggtgctatc cat.cgccgc tagcccgat gctggaggt ctcgcccgg ccgtgggtt cttcgaccg caggtcgtc cgcgCtggc cctcggCga cttggtaat gatcgccgc gatcgcggc gcagatCgtC cgcgttccz tctggtgcag ccgcgccgtg gtggggCCtC cgtgaacccg cgacagagag gagctccttt gat.caccgat ggcccgccat ccggtccatg cgacgatgtc cgtgtacgtg cctcaaggag ggatagatcg aggacagggg caaggtgggc ggcgaCcccg ggagcaggga agcggctg atgggcggag cctgcgcgag cttgagcgag gccgttcagc ggccgcgctC tctgagcggg caccgagcgg gacggacgag tggtgagtaa cttctgctca ccgatcgcca ctctgggagc *ctggtcggcg gtggacggct *cctcagcaac Lccccagtccc tcgcataccg acgctcagcg Iaccgtcgaca ;cgcgctcgcc IacgatgartaE gacgcct-cg ctccgacc Saatcaggatc ttgcgcgag cacggaacg cacctggag gaactgatc g accgcgct-c( c gcgggggtg g ccggccacgi g gcgaagagci g tacccggag, g gagcaccgg t gcggcgcag c aagctcgcc g tgggaggcg g gagctccat g ttgctggac c gcgctcttc 1g ctggtggcc .c gccgcgcggc :g ccggaggcc -a gtcaatggg -g gcggcgttc IC tcgccgctc gcgctggtgC gc catgcggtgg 9c ggtcggacgc tc gcgccgtCtC cz gaccaggtcg cz tccggcaagc c~ ggcatgggga g~ gtggtgCtCg t~ gccgaggctg g( gct~cggctcc t( 
gacgggacct t( tggatgtatc c ctggacttct t agccgcgccg c cttacagcga t aacgacggcg g gcggcggCg C t aatctgcgcc a ctgctgaagg a gcgctgcaga g ct~gttggggC g aatctcggca c ggcatcaccg t agcttgctag a gagaagagct t cagaaggacg c gggaccgagg g tggacaagct t tcgtaggcat t tgctcgactc a tccatcccag c tcgacgccgc gcctgctgCE tcgacggcagc ttgcgcaaca tcgccgccgg cggcctgctc Iagagcgatct L rgctggggCg ;ccaacgggtt- Icccagcgaca ;gccggtcgac Icgctgcagag I ggacctcgct -gggccgatgg ;gcgctgcagg -cgcgaaacct g cgctggcgac a gcgcgttcgg g tgctcgcacc g ccgccgcgct c agggtctcgg c tcgcggtggc g ggcagacccc t tcctgttcgc t ggccggcgtt agccgctctg agacggcgtt ggtCgtgggg g cctgcgtggC c ggttgacgca 'g acgtggct-gc zC cggagcaggr- :g cggCgCgggg a tggatCCgat :cggaggtg :gcggagga :gcgctcga Lgaggacgc ittgcgCtC :gctacgga lgtggggCrt ;gatcgcgg cgcggaggt ctcgaagat ccagggcga caaggtgct cgtcctgta cggtgacgc gagcatcaa agcacggct cgggcgcctt gtggttgga acgtgaccg cgccaggCC ggggc agcg ggactcgc t gccggcgaC cattctcgtt cgagaacga :gttgCtCgC agtatggcg .gcgaaaaag ~ggctgccgC aggccgagac :gaggaggtc ;:tctttggc ~gaggtcacc :cgcacCggq ;cggcgcgac acggttgtcl gtcgtcgctc :gcgctggcc catccagc :gt CCgtgg cggcgatcg agggttgat(, cgctcgcgt( cggcgaccci gagccgctgl cgtggcggg' ccatttcca ggagccggt cctcagcgg ggcgacgcc ggacgcaca agacgtcgc ggcgacctc ggcaggcgc cgggcaggg ccgcgagac cgaggtgat cacccagcc cgtggagcc gggtgtgtt ggCgctgc cgcggtggc ggtgatcac ggcgcgaac gct-ggaggc gcgcaacatg cgcagcggcc gcatccagag agccgcactg ggatggccgc ttgcggcatc ctcggtcgcg cggcgcttcc gcagatcgtg cgaaccgtcg ctcctcgatg cggagcgtgg ttcctcgggc cttcttggac ctggggattg cgaataccgg gctcgcacga gttctatccc cgccgaccga cgaagatcgt ccttccgccg gataggcctg cctgctatgg tccgaatgcc tgccgcagat cgaaaagctg accacgaatg aacgcgtctt ttccccggcg gcggtccagc fccgcgctggg acctcgcctc tgggaagggc Igtgttcctgg ;gagcaggacg itatacgctag gtggccatcc ;ggaggcgt-ca a ctgrcgcccc j gagggctgcc a atct-gggctc j gcacccaatc :gacgccggg ;j atcgaggtcc =gtgctgggCc t ttgatcaag c acgctcaatc g ccgtggCCgC c accaacgtc( g gggcgctcai g gCggCgCgg4 g ttcagcctgl g cgcgaggcg, g gcg~cgC9 
c gcgcaggtg c ttcgaccgg .g rtgggccgag :g gcgctcttt :g gagctcgtc :c tccctcgag :g gccggCggC :g ccgcacgca -g ggcgccgag -c aaaccgct-9 :g ttccggCgq ccgcggcttt 53400 tcggtggcgc 53460 ctgcggtgca 53520 gcggtggagc 53580 tacgtggcgc 53640 cgggcggacg 53700 caatggatgg 53760 gaggcatccc 53820 gaggccgacg 53880 atgccgccgc 53940 ctggagctgg 54000 aacctgcacg 54060 acctcgcttc 54120 gccatcgcgc 54180 ctctccgaag 54240 gggatggaag 54300 cccagggcgc 54360 aacgcggccc 54420 ggcgcgtcaa 54480 cagctgactc 54540 gagaggatca 54600 gagctccaca 54660 acctacccta 54720 ggcgcgaccc 54780 ctcgaggczc 54840 gcgcagctcg 54900 ccgggaagct 54960 tggagcaaga 55020 gagcggacac 55080 cgctcgaccg 55140 ccggactgct 55200 gggaggcgcg 55260 tcgaggacgc 55320 gcgcatgcag 55380 cacacgacat 55440 ggctgcaggg 55500 accttgcctg 55560 kacatgctcct, 55620 1atggccactg 55680 1gtatggtcgt 55740 -tgatccgggg 55800 tgctcgctca 55860 ccatcggtta 55920 1aggcgctgcg 55980 jcagtgaagac 56040 j cggcgctggc 56100 :cgcggatccg 56160 gggcgggccg 56220 atgtcgtgct 56280 g cggagctttt 56340 :tctcagcgca 56400 g tatcgacgcg 56460 ctgcgaagcgc 56520 a gggccgcttc 56580 c cgggcatggg 56640 t gcgtcacgct 56700 c cgggcagcag 56760 g cgctggagta 56820 g ctggccatag 56880 g acgccgtgcg 56940 g cgatggtatc 57000 .g cgttggtgtc 57060 [a aattcgtgca 57120 [c atgt.cgca 57180 [g tgactgagtc 57240 WO 99/66028 WO 9966028PCTIEP99/04171 16ggtgacgtac cgatgaggtg ggacggagtg gccgacgctg agcgtcgcgc ggtcgtcggt gctgccaacc ggcggacggc cgtgtcgacc gtggctcggc gatggcgctg ggtgctcatc gaccgaggag tcgcgCgccc cccggcgagg ggctatctat cgccgagctg cggctCCgCg tgttggcgcg ggtgCggccg tggt-caacag ggtggtcgCC cgacgcagac gatcacagcc ct.cggcgctg tgcaggaatg gcacctcagc gctcgacgcg tggttgcgac gccacggctg ggtgcaagcg ctgtatcagc cgagctactt tgcgcggctc tgacaggccc agccacggg gctcgactcc agaaatcgac gggcgtgaaC( zacccatgtc cgaggc-ggc( ccacctgcac cgcggrtgCgi gaaccgtgc( gttcgtcaci gctttCggg, gaagctggg, gaatttttc ccgtgcgct gtcggggtt tcgcgcagc cacgctgga cgcggacgg atggctggc cgcagagca gaaagcgga ggggatgc gcagcagac cttgcacac tgggCtttt cctttcgca cacggaggt 
gacgCgMg tcgcgtgCz aacagcggc cggcggcctt agcgcgccgg aaggcgctgc ctcggccttg gccgggCgtg ggatcggt~ca tatccctggc accggccgtg catgccggtc gagcaccggg tcgtcggggg gagacgctga cgaccgggac ttccggatcc tcgaacctcg ggtgcgctcg tggcggggrtg acagcctacc ttcgccgatc ttccagcg gcctccagcc gagatctcc gactggttc ggccggtggc aaggccgccg cgcgcgctcC agcctcgacg ccccggagcc agcgtgctct tggct-cttge ccgctgttgq gtcgacctcc gcagatgatc gtccaccggc Ittccggctac Icggcgcgctc atcgacatc ccgttggta< ggCCttgtg-c~ accacgtcgc.
gcgatgccc( j gcgggggag a tgggcgcagi :tacctggag BL gacgtgcat', =gagcgcatc caggcgcgac c ttctcgcag c ctcgacgag g cgcgttggc gaggcattc gacccggag c acctacctt c gagcggggc g cgagccgcc c gtcgccgat g ctgcggggt: -t ccaggcga :g ctgacacgc .c ggctcgcca Lt caccgaacc :g gggat-ggcc ic ar-caccccc ig acgggggtc :c tcacggagc cgatcgcgct gttactgggt acgcggccgg tgccggcctg acgaggctgc cctggtcggg agcgcgagcg ctcgggcggg tgcgcCtgtg cgcaggggga ccgagatctt ccttcgcggg ggctgcggtt acgcccgcgg ccgccctgcg ccgagatggg agggcgaggc agctgcatcc acgatgaggc c:cctgggga ggtcggagcgc ggctggtggt tggagctgga tgctgctcgg gccatgtcgt tggccaacgc ggggcggcca *cagatgtcga *ccctggtgca cccgcggggc fggctggg9CCg atccagccga Iccgaggagga tgcccgacgc Iagatcgatga ctggtccggg :agctggcgtt :tcggaagcga I taggccagcc j ccacgctagt zcgcgtattt aggtgctgat -gcgtgggcgc r- cgctgggtgt g catggacggi g acaagagccl g actgcgccgi g tggacttgc(.
c tgttcgggtl g gatccctca c ggaggat-gg, g tgcggatccl g tgacCggCg, g cggggcaac !9 tggcggCgC .c cgtcacaga .g tcgtgcatg Ft r-ccgcacgg .g aagcgcctc Lg g~Ccagggca Iao cgcagggtc g ,tgcgcaag -a at-gagggtc ;a taccgatcp cgtCgCggC ggtgagcaac gcgtcacgcg tgcgggcctc cctgccggat gagcgcgcta tgtcttccCt ttactggatc gggccacccc ggagacgacg ggtcgtgttt gggcgatgga cgatacggcg ccaggtagcg cgtgctgcgc cgcccggctt gciztcaatac gctgggCagg ggtgctgctg gacgccgtgg gctatggtgc cgactttgag ggagcggctt ttgggagccc cgagggtggt cgtccacgcc gttcgacggc gctcggCCCg tgccgatgcc agcgctggtc tcaggcggcc caccatcgcc gcc tgaaggg ggtCgcgCtg tcagcgccgg acccggcgcg cgaggtcgac gggcgttgCt Igt-gCgccggq ggtgatcgcc gttgcctCgq gacggcctgc ccatgcggac cgaggtgtat gcggtacgtc i cggcgagggi :catggtcctc 3L cacgcagcci g gggaatgati, r ggtcgcagci gccaccgcc, gcaaggacal g cgctcCggC 9 tctgggtgg t ggtgctggt t ggaggccca t cgagcgggt c ggcaggtct t gatgggacc t ttccttCtt ,a ctatgccgc t gccggcgct [a aaaccgtgg :t gtcagctct LC tCCgCggCa 't ggtgaccac ctgagcggga a~ cgagaggcgg tc ttcgtcgagg tc gccaggccgg t~ gaggcgctgg gi tcgggcggac g gaagcgccag t~ cttctgggtg a ctggaccgaa a cctggcgccg gi ccgatccagg t gtaccggtcc a agtcgggagc c cggatcgggc g catgccgccg t ggcccggcgt t gtgagaCtgc c gacgcgtgcg t gcgccggtgg a catgcgcgcg t ttgatggacg 9 gcgagcggtg t gcggcgCtCg 9 gggctCgggC 9 gcgggggacg a caggccccga c gggctcgggg C ctcgaatcgg c gcatggacct gccgccggcg ttagagcacg gaagccgatg cgcggtggcg gagaaggt-cg ct-ggaccaac1 fatct.ccgttg- *cccaatgar-c f cacatcg- cg czzcacg cc :-Ctggggc j cacgccctcg g ccggtg gcgaccgccg SagcgattccC gtggacgtcg g cgcgCCtgtg gggctgCCgC ;ctcgatcaac ggtgccatca g gtcgagacct g catctcggga c gaatccagcg c ctcggtctgC g ggccgCtCCg c ggCgCgCgCg c ctccgcgagg c gcggatgacg t aaggtccaag c gtgctgtacg a gccaacgcat .g agcat-cgact rc gcgcggcaga g gCgCgCttgC Lg tgggtggagt- :q cagcgcgCgg ;ccctgcac 57300 ;cgcttcgc 57360 ggggccgaa 57420 gctgctccc 57480 tgggttctg 57540 gcgggtacc 57600 cgatcgtga 57660 agrtcttttc 57720 gcggctgcc 57780 ;tacctgga 57840 cacggatgt 57900 ggtggtgac 57960 
gggggaacg 58020 cgtcgagac 58080 gcccgctgc 58140 gcgggggct 58200 ztgaggccgc 58260 ccaaatgat 58320 ggcgggctc 58380 cgtgagcga 58440 'tacgggcgc 58500 *acgccggcg 58560 itgggcccaa 58620 Fctcgttgtg 58680 kcacgagcac 58740 ~ggccgtggt 58800 :gcagggcgc 58860 ~gctgatgcg 58920 :ccgaaacgc 58980 itgtctccgt 59040 :cgagctgcg 59100 :tttgctggc 59160 accggctcgt 59220 agcccgccgg 59280 tggtgctccg 59340 aagcggcggg 59400 tgcctggaga 59460 ctgtgggcga 59520 ;agtatttgc 59580 -czcggcgac 59640 acaaggtcgc 59700 tcggtctttg 59760 acacgcccga 59820 gcccgggccg 59880 tgctcgactc 59940 gtcgccttgt 60000 cgctcctacg 60060 cggcgaggat 60120 gcccactggg 60180 tcccgatctc 60240 agctcgtgct 60300 tcgccgtccg 60360 gcgtggccgg 60420 gtgcggcgag 60480 tcacggtggc 60540 ttaccgcgtc 60600 ggctgctgat 60660 gggccttgca 60720 cttctgcagc 60780 tcctcgacgc 60840 g-gggcacgtt 60900 tctctcgcgg 60960 tcgagggtga 61020 tctacccggc 61080 tcgctgatcg 61140 WO 99/66028 WO 9966028PCTIEP99/041 71 17gaccgccggg ggggCtgCtg caagatcgag gctgcgcaac gtacccaacg tggcggcggg ccgctttcgt ctcgcccgag ggccatgtgg agaggcggcc cctgggtgtc ggctcCgCtg ggagatggag atccacccaa ggctccggcC catcgccggc caccacggag agggcgcgag gacgtcgtca zggttgggag cctggacgtt gaagcctgct gatcgagcgc cctcacccga agaagagtgg gaagrtacgga cccgtcgttt ccagctgctc gggaatccg gttc gtcgc cgatgaggag cgtcctcgat ggccgaggcc zatcgctgct gcggtcgccc cgatgaggtg ggacctggag gagcgccctg cacaggcgcg zcgcctcgag gaaagaaact catctrgaag cctcatgata caagcccatt gacgtcggaE gcacgccgct cggcctcacc cgcgcgcgc cttcggcggc agtcctcttc gctcgtCCtc gctggtgcac zgcgcaagc gacatccggc gaggccgac, cggtgggg cggctgggC, a:cagccggc.
zzccgcgcg gccggccCg gcggcgcgc czgcgcgca gcctctctc cggagccgc gatcgggacc caggacgtcg gtggatgccc cgcatcgagg gtagcagcga tcggacacgg cctgtcgtca ggcttCCgtt cacgatcgca tcgctgattc cggttcgtca gccgtcttca accgatataa caagtccagg cccggggact tcggacgatg cgcrtctata atcatgcaca ggccCCgCgt cagcgtgggc tgcagtatag ttcgacttca ctgagagagg taccacgacg gaatcgagcg ttgttcgggc acgtcacgcg gatgctcgct a tgcgcgcga t tcggc tcgg accaagaccc gagcggcgca gacggcagca ggcaccgata gaggcgctcg ctccgcttcg tactgcgggg aaagacggga agcctcgcgt gcggagatcc ccctccaaac Lcagctcgcgc gtcagcgcgc Lgaagtgccgq cgtcatgtc gagcggcag ttcggcgagc gtggtgCtgS caggat~ccg ;ctgctcatg ;gcgCtctcg ;cgcatgcag tgagcctcg! cgcccgccgl ;aggcgctcc ;aggcggcgc aaggcgctcg agactccaga g cgacggagc c accgcggat c ttcaccccg g ccgagcggg g ggcgcagcc g ctcggtccg tgctcgaaca tgcgcgtgca cgctctcgag ctgcgctggg taacgcgctg acgaatcgac agccgcgggc cctggtcgga gcctcgCCtC agcactatgc tggggacagc cgttgggCgg tagccaagct ccgatgctcg cgaaggagcc tgatcgtgcc tgcatct~cct tcgtcgactc tcgaggcaaa gctggcggcc gagattttat agccgttcgc caacccccat tgtcggcggt cggagtactc tgccgccgga ccatcgacct ccggacaaga tcagcgctct cgactgcgcg tggtcgcgtc ggaacccagct ggctgagcac ccacgatcta agctggtgae acaatatcct catcgatcaz *ctgtattcztc *acggtagag ccgtgggcac gataccaccc fctggatagcl *gcgggtgCtc.
-gaagatcga4 IgcgccgCCCl *ggccctgca4 ;ttctgctcti -tcgcgcggci ;gcccgtccg ;cggtcgggg ;cgggtatcg ;cgctcggcg j gcgcgttca a cgcctgctc ggcccagct g cccgttccg gcggccgcg a caaggtccc a cgatgcggt a cgcgctccg gagccgcgt ggaccgtgt gctcatcgc g agcggcggc 'C gcagcggct gcttgcgtcg ggtctcgcat catgggcatg cgtcgccgcg gctgctcgac ggcgagcgc ,cgtctcttc gaagtctgag cgaggacgcg agacgcaccg cgtggagc tc cagcttgatc cttcttccga cgcagacaag gcccgtgaag tccgagcgac tcccggagat gcatctcaat atgatggcag ggcggcaggc gacacaggag gcctgggtac cttctactgg gttccgcgac gtcggccatt ggatcacgct gctgcgcgcc ggagttcgac gttgaaggtt cgcgctcggc cgtcaccgag cgaaaatgac gaaggagctg iccttatcgcg Lggccgagccc :cagaatagga igaaaggggag :caggccagac I cccccatgtc catrcttccgt zcgcgttccgq cgcgggggtE ;tctgccgcg a tccacggccc :gggagcgcaa :ccgcgccgac =gctcgtCgcc g gctgcgccac t cgtcggcgcc t cgtgctctcc a ggtcgatgt( c gatcgcgcci c gtgggatcti g tacacctcgi gagccggac gacgcgatc g ctggagagc t ttcgcgaca c tggtctcgat c gacgcggcg .g tccgctgcc .a cccgcgrtCC :g ctctacacr- :g tgccggtgg a gcgcgcgtc gctgagccga g gtgctgcgtC t gactcgctga t cctgcagcct t gacgccctcg t ggttCgttCg t tgttttcacg g tggagcgatc t cctggtaaga a tttgcgttag t gccagtcgtt c tcttcttcag a aatgccgcgg g gtcatcacag a atcgcggtcc c gtzcaggatc t cacgaatttc t ccgctgctcg c cctccctcgg q cgc-ggaggcgc caagcgaatcz gcggaggaccc gatgaaggcc gaacgcttcg cccgagctca cgggtccgca gaaatacagc gttgtgcggg ccggccgagt gtgggtttgg gggctcgcgc gtcttgacga gtcgcgctcg itcgctgtgc gggctcatga actgtgcgtt atggtctttc gtgtttgatg cgccccgggg aggttccccg raacatcgaat Ltcgcttcccg Tgtgcgattcg :ggagaagagc iagctcgctcg ;gagccgcccg ctcgcgctCg Scccgaggtgc SctcgctCCtg ggcatctcct g agcatcctgc :ccgctgcgca =gacgtctcgc :cggtgctcgC t cgccggatga g ccgcgtactc C tcgtgcggCt a cgacggccgg g tcgccgcccg t cggccatgga g iaggggagtt g accagcagat g cgttcgcccg t tccctcttcg g cggcagagat cgcgcgggc 61200 ccctgaaga 61260 gagcctgga 61320 ggggtggac 61380 cgtccggct 61440 ccacgtcct 61500 ttctggcgg 61560 ggaaatcgt 61620 gtacgt-cca 61680" agggttcag 61740 'cggcgcacc 61800 .gatcacccc 61860 
rtttcgtgcg 61920 Lcaccatggt 61980 tatcgtcgc 62040 .acaatctcg 62100 .cgtcgatcg 62160 *cgcgaggac 62220 ~cgcgcgaga 62280 atgagcc::z 62340 ~gagcgagac 62400 gctccccgc 62460 Ictcctgggt 62520 :ggtcagtcg 62580 3cgatatgaa 62640 agctcaccaa 62700 gcaccutcga 62760 actacgcgga 62820 gtgacgagaa 62880 tgccccaggt 62940 tgctccatga 63000 tgctgctcca 63060 tgggtgcgat 63120 tcaacctgct 63180 ggaacgcgct 63240 tcgccaagca 63300 tcctcatccc 63360 tgcgacggga 63420 tgtccc::gc 63480 agatgaagcz 63540 cactcaacgt 63600 aacctcattc 63660 atccagcgga 63720 ccgtccgggt 63780 ttcgcactca 63840 ccctgatgca 63900 tcctcctgac 63960 tcggcgagct 64020 ggttccaccg 64080 ggataggcgc 64140 gcaaggaggc 64200 cgccggggcc 64260 cgcgacgCtc 64320 tccgcccgcg 64380 cgaggccgac 64440 ggaggccgtt 64500 cgcgatcgtg 64560 cgtczcccag 64620 gtacgcgagc 64680 ggcgctcgcg 64740 tcggggggag 64800 cctgaccgcg 64860 tgaggagcga 64920 caaccatgac 64980 cgctggagcg 65040 WO 99/66028 WO 9966028PCT/EP99/041 71 -18acaggcgacg ctccagacac ctcgtcgcgC gtcttgtCCg gacggcgctg cccgccigac caggcccacg cacatcagag accgtcccg gactagcgtg atcggccggg aatgtgctgc ggggtgacgc caccaccgtg tccgtgtagg tgctgcacga atcggct.cgg gtcacccggt gcgtcCcggt ggtccgttgC ggcttctcca agcgacggc ctccttgccg gatgccggcg cggcgcctCt acccgccgat ggcggCrtgtg gaagcagtga cgccggatag ccgagcgaga gccgacgtcg c cgcc cgagc gagtccccgt ctggccgage agcagccctc cgcgctatcz tccatcatat cggcgcggct ggatcggcg c gr-ggc ga t ctccgaggg( gcaccggagc tttctggacc.
cgaagggatu cgtcctcggc caagaagaa( gttcacgati cgtctggcti cggcctcgt gggggagag tggcgtgtc cgtgctgCg gcgcatcga caggtcgct cgccatcct aaaccgtgc ccgcgaggt caggcttcg ggcgagatg agcgcggtc gcggCgCca atcctccgc acccgcccga g~ tcgttcagct g cactcggatc g~ atgttgttgc a' ccgactggca c cgcttttcgc ci accagcttcc c actctccgct c accagaacag c cctccgctcg t caggaggtac c ggcaggcgcc t cgcaaacggg a cgcactcgtg a cgatcgtgct g caatgggaac g gttcggtcag a acggcccggc 9 ccgacgcatt c tccggcctgg g catgtcctcc c tcttctcccg a accggcgcgc c cgacgaggcc q gcaggccggc c tcacatccca c gtaccggcca ggcgatcgag ggctcacgct aaatcgtgca faccccgacgc gcatcgaggc ggcaacgggt kcgaccgtgcc :agtctccatc igcagcgctga icatccctgcg ctacgtgcgc aacatggaga gcggcggtca gtgcactccc gcacgcccgc g ctgatcgtcg :ttgcacctct :gcagcggccg y gacggacagg gtcctggagg cgggcaccgcc g ctcgccgcgg c gcggacaggg g gcggtggggC c atcgagttcg g acccggatac caccagcgcg C cgcggcgccc a gacgtcgtcc cggcagcgcc *a gggggcgac a cggctggcgtg CTcgcgCgat .g cat-ggCggtC :t tccagggrtc ggtgtcgaa :ttggCgtC gatcttgtt :cgcgcCtC :ggcgcCtC gccgcacgc agcaatctt ggctcgtcg cgcatgcgg gccgagatc gacacgggc ccatgcccg gatgctccc gctccagct atcagcgcg tccgaattcg tcgttgaac rgggtCgCgg :aacaggcag ratgtagccc :tgCtggCtC LCgcgCtCgg :gagcgccga ~cccCgcgtC :tgagcgttg :cgcggcacg ;cccggacgc Itggcagatg :ggtCtCCtC Iggcacgcgg cgcgaccacg cggcagcgag cggagtgctc gcatgccaac ttctgccgcC gcccgccagc tactcctcca gacaggagag aagaatccaa agttcatcgc tcgtcgatac ccgacgccgz ccatcatgat tgcacccgc~ tcttcgagg gctacctcgc actccgcgg tgggaaaccc tcgcggtct' agctcctcg ggcccctga acgccgcgc ggagcgagc cgagggcgt gggc tcagg ttcgacgcg ggagcccgg gtcatgagg cgcggcctc ctCgcccgg atctcgctg gattcggcc cggattgccg cagccctcat t gatgccgcct gggcactcgc C cgagcacgcg tccttgctcg a gcggtcgcac cgcgccgcca c gccctgcgcg ccacccgggg t cgcgagcagg ctcattcccg a ttgcatggct tcccctccct c gitcgacagc cggcgacggc c gtttctcgca- acatgccccg a ggctgtcctg tgcgacggca a gccgggcggg aggtgccgcc a cagccgggaa cgcggcgccc 9 ggagaggcgc cgggcacagc c cctcggcata gaagagaccg t ttctccgcct gacgcgagtc g atcacgctgg 
catagzccgt a cggacgtgcc gggtgcg-cct c tcgctgaagt agacggtgat g gccgtctcat ggctcgtcat c tctgcgattg cccagcgcgt c ctctttggCt gcCtccctct ggatccatgg ctgaggatcc cgggctttga aagcacgcgac tgatcccgat cgtgacatcg cgcggtcatg gtcgtcctcg acgcttgctc aaaccgcggc gaggcccgag agggacagtg aaacacglttg acacgggccg gcgagcatgg cgctcgccgg ctcgcgcccg gcgccgatgc cggctggcgg tggacgtcgt cggttcgtcg tctggcagcg gactacaacg ctgccagccg ttcgagctgc tcatcaccgt gtcatcgggc cgacgtccgt aggccccaga gccctgCCtC gcgacggccg- cgtcgaagca cgtcctggcg cggcctgcgc gatcgcgatc tacggcacca cgccgccgtg accggcagct tgcagacggg ctcctcCtCC Sgcatccgttc ggccacggca cttcgccgcg ggcggcggCg ;ccagatcgag gatccgacgt ;gacgtcgctc atcatctcga "ggcgatgcgg tccagcaagg "gctcgccggg ctcaccatcg "ctacctcgac ggcgcggcgt cctcgccagc cagagccgtg "cgcgatccgc gcgctcgcca gatgcacttc ggtccgcacg t cacggcgtcc ggggtcgcgg g acccgacgtg aagcacatct g acgcgccgtg gagagaccgc t ggccctcgca gcagggcgcg a ggtacgctgg ttgcaagtcg g cgggccgggc gcacgaaggc a aggccagggc gcatggggcg c ccggcgCCgg cgcgcttcgc c gccggctcat cgcctccgtg t tccagctcgg gatcatcgag 'a aggtgacgag ct.ccgatatc gcggatccc 6! gaaggtcag 6! atactcccg 6! 
gatgctatc 6' ttgcgcct.c 6' caccgagat 6 acgacacgt 6 acgagcaga 6 catccttgc 6 tatcctgcg 6 cgggctcga 6 'gccagcctc 6 :gagcgccgt 6 .cactcccgg 6 ragccgggta 6 tCgcgcggg 6 :gctgggacg 6 ~gcgacctgc 6 :tgcggctcg 6 :cgcccgatc 6 ~ctgtccagg 6 :cgccgagcg 6 :cggacacgt 6 :gacgtccgc 6 cgtcaccgcc6 gagacggccg6 ggtccgccgt acgagtcggc :tgtggcggc gcacgtcgcc ecacctctcg tccgagctcc aagaggcaag cgagaagcag cgggtaacat gatcgccttc accgccgcgc atcgctggaa tcgcagccaa cggcgatgct tgctcggcaa aggagctcta tctcgatcta ggaactacgt tccacgagtt acccgacgac ccttcctcgg cgat~cggcat ggcictCgt gcgcagatcc aagtcctggt aggcgatcga acgtcgaggc gcgcggCCtC cctggcgggC tcacgccgta gcggcgagcg atgctcggcg gCcgcgCtCC tccctcgccg cgcctgCCCg 5100 5160 5220 5280 3340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 56540 56600 56660 66720 66780 66840 66900 66960 67020 67080 67140 67200 67260 67320 67380 67440 67500 67560 67620 67680 67740 67800 67860 67920 67980 68040 68100 68160 68220 68280 68340 68400 68460 68520 68580 68640 68700 68750 <210> 2 <211> 1421 WO 99/66028 PCT/EP99/04171 -19- <212> PRT <213> Sorangium cellulosum <400> 2 Val Ala 1 Val Gly Phe Trp Ala Glu Pro Gly Cys Phe Met Asp Glu Asn Val Phe 130 Ala Thr 145 Pro Ser Pro Cys His Leu Ala Gly 210 Ser Lys 225 Ala Glu Leu Lys Val Ile Thr Val 290 Ala Asp 305 Asp Ala rhr Arg Lys Asp Pro Ala 115 Ile Ala Val Val Ala 195 Gly Thr Ala Arg Arg 275 Pro Ala Arg Ser Leu Trp Thr Ala Ala 100 Ala Gly Ser Gly Ala 180 Cys Val Arg Asp Leu 260 Gly Asn ?ro Ile Glu Arg Ala Ala Glu Asp Pro Ile Ala Ile 10 5 Cys Leu Asp Pro Ser His Ile Ile Ala Ala 165 Val Gin Ser Ala Gly 245 Ser Ser Gly Arg Glu Ala Val 70 Phe Arg Ala Gly Glu 150 Gly Asp Ser Leu Leu 230 Phe Gly Ala Sex Leu Gly Ala 55 Thr Phe Leu Pro Pro 135 Ile Arg Thr Leu Met 215 Ala Gly Ala Ile Ser 295 Pro Ser 40 Ala Arg Gly Leu Ser 120 Ser Asp Ile Ala Arg 200 Leu Arg Arg Arg Asr 28( G1 Gly C 25 Arg Trp Ala Ile Leu 105 Ala Glu Ala Ser Tyr 185 Ser Ser Asp Gly Ala 265 His i Glu r Ser Gly Asp Phe Ser Ser 90 Glu 
Leu Tyr His Tyr 170 Ser Gly Pro Gly Glu.
250 Asp Asp I1 Va Val Thr Asp Phe 75 Pro Val Val Glu Gly 155 Ala Ser Glu Ser Arg 235 Gly Gly Gly e Val 1 Gly 315 Ile Val Pro Leu Arg Cys Gly Ala 140 Gly Leu Ser Cys Thr 220 Cys Cys Asp Ala Leu 300 Tyr Asp Gly Asp Ser Glu Trp Thr 125 Ala Leu Gly Leu Ser 205 Leu Lys Ala Arg Ser 285 Lys Val Leu Arg Pro Asp Ala Glu 110 Glu Leu Gly Leu Val 190 Thr Val Ala Val Ile 270 Ser Arg Glu Ser Val Asp.
Val Leu Ala Thr Pro Thr Arg 175 Ala Ala Trp Phe Val 255 Leu Gly Ala Ala Gly Pro Ala Ala Arg Leu Gly Gin Met 160 Gly Val Leu Leu Ser 240 Val Ala Leu Leu His 320 Gly Cys Ala Ala Se 310 Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Ile Gin Ala Leu Asn WO 99/66028 PCT/EP99/04171 Ala Val Tyr Gly Ser Gly His 385 Arg Pro His Pro Ala 465 Pro Ser Leu Ala Phe 545 Asp Phe Pro Pro Trp 625 Val Va 1 Leu 370 Leu Leu Arg Val Glu 450 Leu Ser Al a Arg Val 530 Thr Val Asn Ala Ala 610 Gly Ala 3 Lys T 355 Leu L His Thr Arg Val 435 Arg Asp Gln Met Ala 515 Arg Gly Trp Gin Ser 595 Leu Val Ala 40 hr ys Ala Val Ala 420 ,eu Pro Ala Cys Glu 500 Ala Ser Gin Ser Glu 580 Val Ph Gi Cy, 325 Leu Asn Val Gin Thr 405 Gly Glu Ala Gin Leu 485 His Leu ile Gly Ala 565 Leu Asp Th .I Pr c ly Arg Asp Leu Gly H 3 Val Leu 375 Ala Leu 390 Arg Ala Val Ser Glu Ala Glu Leu 455 Ala Ala 470 Gly Asp Arg Leu Asp Ala Ala Asp 535 Ala Gin 550 Phe Arg Asp Arg Ala Ala Phe Glu 615 Glu Leu 630 is 60 er Asn Arg Ser Pro 440 Leu Arg Val Ala Ala 520 Ser Thr Glu Pro Leu.
600 Val Ala Thr 345 Pro Glu 1yr Leu Gin His Pro Arg Ile 395 Thr Pro Trp 410 Phe Gly Met 425 Ala Ala Thr Val Leu Ser Leu Arg Asp 475 Ala Phe Ser 490 Val Ala Ala 505 Ala Gin Gly Ser Arg Gly Leu Gly Met 555 Ala Phe Asp 570 Leu Arg Glu 585 Leu Asp Gin Pro Ala 380 Ser Pro Ser Cys Ala 460 His Leu Thr Gin Lys 540 Gly Leu Val Thr Ala 620 Ser Glu Leu Leu 350 Ser Gly 365 Gin Ile Trp Gly Asp Trp, Gly Thr 430 Thr Pro 445 Arg Thr Leu Glu Ala Thr Ser Arg 510 Thr Ser 525 Leu Ala Arg Gly Cys Val Met Trp, 590 Ala Phe 605 Leu Trp Ile Gly Asp Ala 335 Ile Ile Pro Asp Asn 415 Asn Pro Ala Thr Thr 495 Glu Pro Phe Leu Arg 575 Ala Thr Arg Giu Val 655 G1y Thr Ala Leu 400 Thr Ala Ala Ser Tyr 480 Arg Gly Gly Leu Tyr 560 Leu Glu Gin Ser Leu 640 Phe Tyr Ala Leu Ala Val Ala Gly His 635 s Val Ala Gly Val Phe Ser Leu 650 Leu Val Ala Ala Arg Gly Arg Leu Met G'n Ala Leu Pro 660 665 Ala Gly Gly 670 WO 99/66028 PCT/EP99/04171 -21 Ala Met Val 675 Ser Ile Glu Ala Pr 69 Gin Va 705 Ala Me Ala Ph Val Al Asn Le 77 Trp Va 785 Ala Le Ser T1 Ala Le Leu G] 81 Ala G 865 Pro T: Ala Ai Glu G: His P: 9: Asp K 945 Leu V Glu I Ala L 0 0 1 t e a u '0 l _u hr eu lu 50 rg ly al le eu His 2 Val Ala His Glu 755 Ser Arg His Leu Leu 835 Ala Leu Gin Gly Gly 915 Pro Pro Leu Ala Gly 995 Ala Ile Ala Ser 740 Ser Gly His Ala Leu 820 Ala Leu Phe Arg Asp 900 Ala Pro Phe Arg Val 980 Met Ala Ala Arg 725 Pro Val Lys Ala Ala 805 Gly Ser Gly Pro Glu 885 Arg Val Glu Arg Val 965 Asp Ser Gly 710 Gly Leu Ser Ala Arg 790 Gly Leu Ser Gly Ser 870 Arg Arg Arg Ser Leu 950 Thr Ala Ala Pro Glu Ala Asp 680 Val Ser Ile Ala Ala 695 Ala Gly Gln Pro Val 715 Ala Arg Thr Lys Ala 730 Met Ala Pro Met Leu 745 Tyr Arg Arg Pro Ser 760 Cys Thr Asp Glu Val 775 Glu Val Val Arg Phe 795 Ala Gly Thr Phe Val 810 Val Pro Ala Cys Met 825 Arg Ala Gly Arg Asp 840 Leu Trp Ala Val Gly 855 Gly Gly Arg Arg Val 875 Tyr Trp le Asp Thr 890 Ala Pro Gly Ala Gly 905 Gly Gly Asp Arg Arg 920 Gly Arg Arg Glu Lys 935 Glu lie Asp Glu Pro 955 Glu Arg Arg Ala Pro 970 Ala Gly Leu Ser Phe 
985 Asp Asp Leu Pro Gly 1000 Val Val 700 His Leu Glu Ile Ser 780 Ala Glu Pro Glu Gly 860 Pro Lys His Ser Val 940 Gly Gly Asr Lys Ala 685 Asn Ala His Ala Val 765 Ser Asp Val Asp Pro 845 Leu Leu Ala Asp Ala 925 SGlu Val Leu Asp Pro 1005 Ala Val Phe 750 Leu Pro Gly Gly Ala 830 Ala Val Pro Asp Glu 910 Arg Ala Leu Gly Val 990 Asn Ala Pro Ala Ser 735 Gly Val Gly Val Pro 815 Arg Thr Ser Thr Asp 895 Val Leu Ala Asp Glu 975 Gln Pro Val Asp Ala 720 His Arg Ser Tyr Lys 800 Lys Pro Val Trp Tyr 880 Ala Glu Asp Gly His 960 Val Leu Pro Val Pro Leu Leu Leu Gly Gly Glu Cys Ala L015 Gly Arg Ile Val Ala Val Gly Glu 1020 1010 WO 99/66028 PCT/EP99/04171 -22- Gly Val Asn Gly Leu Val Val Gly Gin Pro Val Ile Ala Leu Ser Ala 1025 1030 1035 1040 Gly Ala Phe Ala Thr His Val Thr Thr Ser Ala Ala Leu Val Leu Pro 1045 1050 1055 Arg Pro Gln Ala Leu Ser Ala Ile Glu Ala Ala Ala Met Pro Val Ala 1060 1065 1070 Tyr Leu Thr Ala Trp Tyr Ala Leu Asp Arg Ile Ala Arg Leu Gin Pro 1075 1080 1085 Gly Glu Arg Val Leu Ile His Ala Ala Thr Gly Gly Val Gly Leu Ala 1090 1095 1100 Ala Val Gin Trp Ala Gln His Val Gly Ala Glu Val His Ala Thr Ala 1105 1110 1115 1120 Gly Thr Pro Glu Lys Arg Ala Tyr Leu Glu Ser Leu Gly Val Arg Tyr 1125 1130 1135 Val Ser Asp Ser Arg Ser Asp Arg Phe Val Ala Asp Val Arg Ala Trp 1140 1145 1150 Thr Gly Gly Glu Gly Val Asp Val Val Leu Asn Ser Leu Ser Gly Glu 1155 1160 1165 Leu Ile Asp Lys Ser Phe Asn Leu Leu Arg Ser His Gly Arg Phe Val 1170 1175 1180 Glu Leu Gly Lys Arg Asp Cys Tyr Ala Asp Asn Gin Leu Gly Leu Arg 1185 1190 1195 1200 Pro Phe Leu Arg Asn Leu Ser Phe Ser Leu Val Asp Leu Arg Gly Met 1205 1210 1215 Met Leu Glu Arg Pro Ala Arg Val Arg Ala Leu Leu Glu Glu Leu Leu 1220 1225 1230 Gly Leu Ile Ala Ala Gly Val Phe Thr Pro Pro Pro Ile Ala Thr Leu 1235 1240 1245 Pro Ile Ala Arg Val Ala Asp Ala Phe Arg Ser Met Ala Gin Ala Gin 1250 1255 1260 His Leu Gly Lys Leu Val Leu Thr Leu Gly Asp Pro Glu Val Gln Ile 1265 1270 1275 1280 Arg ile Pro Thr His Ala Gly Ala Gly Pro Ser Thr Gly Asp Arg Asp 1285 
1290 1295
Leu Leu Asp Arg Leu Ala Ser Ala Ala Pro Ala Ala Arg Ala Ala Ala
1300 1305 1310
Leu Glu Ala Phe Leu Arg Thr Gln Val Ser Gln Val Leu Arg Thr Pro
1315 1320 1325
Glu Ile Lys Val Gly Ala Glu Ala Leu Phe Thr Arg Leu Gly Met Asp
1330 1335 1340
Ser Leu Met Ala Val Glu Leu Arg Asn Arg Ile Glu Ala Ser Leu Lys
1345 1350 1355 1360
Leu Lys Leu Ser Thr Thr Phe Leu Ser Thr Ser Pro Asn Ile Ala Leu
1365 1370 1375
Leu Ala Gln Asn Leu Leu Asp Ala Leu Ala Thr Ala Leu Ser Leu Glu
1380 1385 1390
Arg Val Ala Ala Glu Asn Leu Arg Ala Gly Val Gln Asn Asp Phe Val
1395 1400 1405
Ser Ser Gly Ala Asp Gln Asp Trp Glu Ile Ile Ala Leu
1410 1415 1420
<210> 3
<211> 1410
<212> PRT
<213> Sorangium cellulosum
<400> 3
Met Thr Ile Asn Gln Leu Leu Asn Glu Leu Glu His Gln Gly Ile Lys
1 Leu Leu Leu Pro Tyr Ala Arg Thr Ile 145 Arg Glu Gln Ser Ala Asn Thr Ala Trp Tyr Ala Leu 130 Glu Leu Arg Thr Leu 210 Ala Pro Met Glu Leu Arg Phe 115 Pro Ile Val Pro Arg 195 Ser Asp Asn Leu Ara Gly Glu 100 Arg Asp Ile Ser Pro 180 Leu :1 e 5 Gly Leu Arg His Arg Tyr Lys Met Asp Leu 165 Leu Val Ile Glu Leu Gln Ala 70 Thr Asp Val Met Leu 150 Arg Tyr Leu Phe Arg Leu Gln Ile Ala Arg 55 Pro Gly Cys Val Gln 135 Arg Asp His Ser Lys 215 Leu Arg 40 Leu Phe Ala Thr Ala 120 Val Gly Ala Val Ile 200 Asp Glu 25 Ile Pro Pro Phe Asp 105 Arg Ile Leu Met Val 185 Asp Trp Leu Ser Ala Leu Thr 90 Leu His Glu Asp Ser 170 Ala Leu Leu Gln Glu Glu Thr 75 Val Asp Asp Pro Arg 155 His Val Ile Ser Ala Pro Lys His Lys Ser Ser Ile Val Asp Ile Gln Pro Ser Gly Val Pro Arg 110 Met Leu Arg 125 Lys Val Asp 140 Ser Thr Arg Arg Ile Tyr Arg Leu Asp 190 Asn Val Asp 205 Phe Tyr Glu 220 Arg Asp Tyr Asn Thr Pro Glu Ile Leu Ala Ala Glu Asp 175 Glu Leu Asp Val Ala Ile Ala Ser His Ser His Asp Ala 160 Thr Arg Gly Pro Leu 240 Glu Thr Ser Leu Pro Val Ser Tyr 235 Ala Leu Glu Ser Arg 245 Lys Lys Ser Glu His Gln Arg Ser Met Asp 255 Tyr Trp Lys Arg Arg Ile 260 Ala Glu Met Glu Gly 305 Glu Thr Gly Lys Ala 385
Thr Leu Asp lie 465 Ala Gin Ala Phe Ala 545 Leu Ala Lys An 290 Glu Jal Leu Asp Ser 370 Met Arg Ser Gly His 450 Val Tyr Val Asn Ala 530 Arg Gly Val Ala 275 Trp Arg Ile Phe Phe 355 Phe Asp Val Ala Thr 435 Gin Asp Val Arg Ala 515 Ala Lys Ala Val Asp Leu Gly Gly Asn 340 Thr Glu His Leu Leu 420 Pro Leu Gly Val Cys 500 Thr Arg Thr Ar Me Pro S Pro Leu 1 Arg 325 Arg Ser Gin Cys Gly 405 Asn Val Tyr Val Phe 485 Ser Asn Val Leu g Leu 565 t Glu er Ser Thr 310 Trp Leu Met Arg Asp 390 Ile Gln Tyr Glu Phe 470 Leu Leu Ala Glu Thr 550 Arg Lys Thr L 2 Asp S 295 Pro I Ser Pro Val Ala 375 Val Gin Gln Thr His 455 Pro Arg Pro Leu Gin 535 Tyr Glu Gly eu 80 er hr ~la Ala Val Leu 360 Lys Ser Arg Val Ser 440 Asp Pro Arg Pro Leu 520 Leu Giu G1 Tr Leu Pro Pro 265 Lys Glu Ile Trp Gly Arg Gly Val Ile 315 Ser Pro Arg 330 His Pro Arg 345 Leu Asp Ile Arg Ile Gin Gly Ile Glu 395 Gly Ala Leu 410 Val Gly Val 425 Thr Gin Thr Gly Asp Leu Asp Leu Leu 475 Leu Thr Glu 490 Ala Gin Leu 505 Ser Glu His Pro Met Gin Glu Leu Ser 555 i Gly Ala Arg 570 p Glu Gin Val 585 Pro Arg Leu 300 Leu Phe Val Asp Glu 380 Val Phe Thr Pro Val 460 Asp Glu Glu Thr Leu 540 Arg Pro Val Pro Phe 285 Lys Ala Thr Asn Thr 365 Gin Gin Pro Ser Gln 445 Leu Asp Pro Ala Leu 525 Ala Arg Asn Ala Thr 270 Arg Arg Ala Leu Asp 350 Thr Leu Arg Val Leu 430 Leu Ala Met Trp Arg 510 His Val Ser Thr Val Leu His Arg Phe Asn 335 Ile A.rg Trp Glu Val 415 GIn Leu Trp Leu Gly 495 Ala Gly Val Arg Leui 575 Let.
Pro Thr Val Ser 320 Ile Thr Asp Glu Ala 400 Leu Arg Leu Asp Glu 480 Glu Ser Leu Ser Arg 560 Val Ala 580 590 Val Leu Glu Ser Gly Ala Ala Tyr Val Pro Ile Asp Ala Asp Leu Pro WO 99/66028 PCT/EP99/04171 595 Ala Glu Arg 610 Leu Thr Gin 625 Gin Arg Leu Pro Pro Met Tyr Thr Ser 675 Arg Gly Ala 690 Gly Pro Gly 705 Ser Val Tyr Val Pro Asp Ile Glu Arg 755 Arg Met Leu 770 Ser Leu Arg 785 Pro Gly Glu Gly Gly Ala Asn Val Asp 835 Asn Gin Thr 850 Trp Val Pro 865 Tyr Trp Arg Glu Thr Gly Pro Asp Gly 915 Ile His Tyr Leu Asp His Gly Pro Leu Met 660 Gly Val Asp Asp Ala 740 Glu Val Leu Leu Thr 820 Leu Phe Gly Asp Glu 900 Asr Trp I Val 645 Pro Ser Asn Arg Val 725 Ser Lys Glu Ser Gin 805 Glu Ser His Gin Glu 885 Arg Ile Leu 630 3er Ile rhr Thr Val 710 Phe Lys Val His Leu 790 Ala Ala Trp, Val Leu 870 Glu Leu Glu 615 Asp Glu Gin Gly Ile 695 Leu Gly Leu Thr Phe 775 Leu Ile Ser Ala Leu 855 Tyr Lys Tyr Phe( Gly I Ala C Thr Leu 680 Leu Ala Ile Arg Val 760 Glu Ser Arg Ile Ser 840 Asp Ile Thr Lys Met 920 Lys Gly Pro 665 Pro Asp Leu Leu Asp 745 Trp Gly Gly Pro Trp 825 Ile Glu Gly Arg Th 90 G1 Leu S 6 Val C 650 Ser I Lys Ile Ser Ala 730 Pro Asn Arg Asp Gly 810 Ser Pro Ala Gly Lys 890 Gly 5 y Arg er 35 ;lu Asp Gly Asn Ser 715 Ala Ala Ser Pro Tr-p 795 Val Ile Tyr Let Va.- 87 Se As G1 Glu 620 Trp Gly Leu Val Glu 700 Leu Gly His Val Asp 780 Ile Ser Gly Gly Glu 860 Gly 5 r Phe p Leu u Asp Glu 940 605 al Pro Asp Ala Met 685 Arg Ser Gly Trp Pro 765 Ser Pro Val Tyr Arg 845 Pro Leu Leu Gly Asn 925 Pro C Gly Tyr 670 Ile Phe Phe Thr Ala 750 Ala Leu Val Ile Pro 830 Pro Arg Ala Val Arg 910 Gin Gly Asp 655 Val Asp Glu Asp Ile 735 Glu Leu Ala Gly Ser 815 Val Leu Pro Leu His 895 Tyr Ile Lys Leu Val Ile 640 Gin Ile His Ile Leu 720 Val Leu Met Arg Leu 800 Leu Arg Arg Val Gly 880 Pro Leu Lys Leu Arg Gly Tyr Arg Val 930 Leu Gly Glu Il Glu Thr Leu Lys WO 99/66028 PCT/EP99/04171 -26- Ser His Pro Asn Val Arg Asp Ala Val Ile Val Pro Val Gly Asn Asp 945 950 955 960 Ala Ala Asn Lys Leu Leu Leu Ala Tyr Val Val Pro Glu Gly Thr 
Arg
965 970 975
Arg Arg Ala Ala Glu Gln Asp Ala Ser Leu Lys Thr Glu Arg Ile Asp
980 985 990
Ala Arg Ala His Ala Ala Glu Ala Asp Gly Leu Ser Asp Gly Glu Arg
995 1000 1005
Val Gln Phe Lys Leu Ala Arg His Gly Leu Arg Arg Asp Leu Asp Gly
1010 1015 1020
Lys Pro Val Val Asp Leu Thr Gly Gln Asp Pro Arg Glu Ala Gly Leu
1025 1030 1035 1040
Asp Val Tyr Ala Arg Arg Arg Ser Val Arg Thr Phe Leu Glu Ala Pro
1045 1050 1055
Ile Pro Phe Val Glu Phe Gly Arg Phe Leu Ser Cys Leu Ser Ser Val
1060 1065 1070
Glu Pro Asp Gly Ala Thr Leu Pro Lys Phe Arg Tyr Pro Ser Ala Gly
1075 1080 1085
Ser Thr Tyr Pro Val Gln Thr Tyr Ala Tyr Val Lys Ser Gly Arg Ile
1090 1095 1100
Glu Gly Val Asp Glu Gly Phe Tyr Tyr Tyr His Pro Phe Glu His Arg
1105 1110 1115 1120
Leu Leu Lys Leu Ser Asp His Gly Ile Glu Arg Gly Ala His Val Arg
1125 1130 1135
Gln Asn Phe Asp Val Phe Asp Glu Ala Ala Phe Asn Leu Leu Phe Val
1140 1145 1150
Gly Arg Ile Asp Ala Ile Glu Ser Leu Tyr Gly Ser Ser Ser Arg Glu
1155 1160 1165
Phe Cys Leu Leu Glu Ala Gly Tyr Met Ala Gln Leu Leu Met Glu Gln
1170 1175 1180
Ala Pro Ser Cys Asn Ile Gly Val Cys Pro Val Gly Gln Phe Asn Phe
1185 1190 1195 1200
Glu Gln Val Arg Pro Val Leu Asp Leu Arg His Ser Asp Val Tyr Val
1205 1210 1215
His Gly Met Leu Gly Gly Arg Val Asp Pro Arg Gln Phe Gln Val Cys
1220 1225 1230
Thr Leu Gly Gln Asp Ser Ser Pro Arg Arg Ala Thr Thr Arg Gly Ala
1235 1240 1245
Pro Pro Gly Arg Glu Gln His Phe Ala Asp Met Leu Arg Asp Phe Leu
1250 1255 1260
Arg Thr Lys Leu Pro Glu Tyr Met Val Pro Thr Val Phe Val Glu Leu
1265 1270 1275 1280
Asp Ala Leu Pro Leu Thr Ser Asn Gly Lys Val Asp Arg Lys Ala Leu
1285 1290 1295
Arg Glu Arg Lys Asp Thr Ser Ser Pro Arg His Ser Gly His Thr Ala
1300 1305 1310
Pro Arg Asp Ala Leu Glu Glu Ile Leu Val Ala Val Val Arg Glu Val
1315 1320 1325
Leu Gly Leu Glu Val Val Gly Leu Gln Gln Ser Phe Val Asp Leu Gly
1330 1335 1340
Ala Thr Ser Ile His Ile Val Arg Met Arg Ser Leu Leu Gln Lys Arg
1345 1350 1355 1360
Leu Asp Arg Glu Ile Ala
Ile Thr Glu Leu Phe Gln Tyr Pro Asn Leu
1365 1370 1375
Gly Ser Leu Ala Ser Gly Leu Arg Arg Asp Ser Arg Asp Leu Asp Gln
1380 1385 1390
Arg Pro Asn Met Gln Asp Arg Val Glu Val Arg Arg Lys Gly Arg Arg
1395 1400 1405
Arg Ser
1410
<210> 4
<211> 1832
<212> PRT
<213> Sorangium cellulosum
<400> 4
Met Arg Asp Ser Gly Ile Met Ala Tyr Gly 145 Glu Phe Gly Gly Ser Ser Glu Tyr Leu 130 Trp Glu Pro Thr Val Val Pro Cys Glu 115 Thr Phe Gln Gly Glu Asp Leu Arg Ala 100 Gly Ser Gln Glu Ala Ala Pro Glu Glu Trp Ser Asn Thr Ser Ser Ala Ile Ala Val le Gly Met Ser Gly Asp Leu Gln Arg 40 Leu Val 55 Val Asp Glu Leu Ala Leu Gly Val 120 His Glu 135 Asp 25 Phe Leu Arg Met Glu 105 Tyr His Glu Ser Asp Phe Asp 90 Asn Ala Pro Asp Phe Glu Pro Asp 75 Pro Ala Gly Ala Lys 155 Trp Gln Ser Ala Gln Gly Ala Met 140 Asp Arg Glu Tyr Ala His Tyr Asn 125 Met Asn Leu Leu Ala Val Arg Phe Phe Arg Ile Asp Pro 110 Met Ser Arg Trp Leu Ala Arg Ala Ala Gly Phe Thr Ser Pro Thr 160 Ile Gly Asn His Val Ser Tyr Arg 165 Leu Asn Leu Arg Gly Pro Ser Ile Ser 170 Val Gln 175 Thr Leu Arg Ser 225 Ile Ala Thr Val Glu 305 Leu Asp Ile Val Ser 385 Thr Val Glu Glu Ala 465 Ala Leu Ile 210 Pro Met Leu Asn Gly 290 Ala Gly Ala Gly Leu 370 Pro Ser Ser Ala Leu 450 Ala Cys Asp 195 Pro Asp Gly Ser Asn 275 Gln Arg Asp Ser His 355 Ala Asn Leu Ser Pro 435 Phe Arg Ser 180 Arg His Gly Asn Asp 260 Asp Ala Ser Ala Ala 340 Leu Leu Pro Lys Phe 420 Ala Val Leu Thr Ser Leu Glu Cys Asp Arg Ala Gly 215 His Cys Arg 230 Gly Cys Gly 245 Gly Asp Pro Gly Ala Arg Gln Ala Ile 295 Ile Gln Tyr 310 Ile Glu Thr 325 Arg Arg Ser Glu Ser Ala Glu His Arg 375 Ser Ile Asp 390 Asp Trp Asn 405 Gly Ile Gly Ala Lys Leu Val Ser Ala 455 Arg Asp His 470 Phe Ser Leu 485 ral Met 00 Tyr Ala Val Val Lys 280 Met Ile Ala Cys Ala 360 Gln Phe Thr Gly Pro 440 Lys Let Al Ala V 185 Ala L Val T Phe Val I 2 Arg 265 Ile Glu Glu Ala Ala 345 Gly Leu Ala Gly Thr 425 Ala Ser Gln a Thr al eu lyr sp Leu 250 Ala ;ly Ala Thr Leu 330 Ile Ile Pro Ser Ser 410 Asn
Ala Ala Ala Thr 490 His I Ala C Ala C Ala 235 Leu Val Phe Leu His 315 Arg Gly Ala Pro Ser 395 Thr Ala Ala Ala His 475 Arg Leu Gly lu 220 Lys Lys Ile Thr Ala 300 Gly Arg Ser Gly Ser 380 Pro Pro His Pro Ala 460 Gln Ser Ala Gly 205 Gly Ala Pro Leu Ala 285 Leu Thr Val Val Leu 365 Leu Phe Arg Val Ala 445 Leu Gly Pro Cys Met S 190 Ile Thr Gly Ile Asn Gly Leu Asp 255 Gly Ser 270 Pro Ser Ala Gly Gly Thr Phe Gly 335 Lys Thr 350 Ile Lys Asn Phe Tyr Val Arg Ala 415 Val Leu 430 Arg Ser Asp Ala Ile Ser Met Glu 495 Glu Gly 510 er Val ?he Thr 240 Arg Ala Glu Val Leu 320 Arg Gly Thr Glu Asn 400 Gly Glu Ala Ala Leu 480 His Leu Gly Asp Val Ala Arg Leu Ala Met Ala Ala Pro Ser Arg Glu Ala Leu Arg 500 505 Asp Ala Ala 515 Ala Arg Gly Gin Thr Pro 520 Pro Gly Ala Arg Gly Arg WO 99/66028 PCT/EP99/04171 -29- Cys Gly 545 Val Ala Gin Val Val 625 Ala Leu Ser Ser Pro 705 Phe Asp Gly Ala Pro 785 Ser 530 Ser Phe Gly Leu Ala 610 Ile Leu Leu Leu Val 690 Ala Cys Pro Ala Gly 770 Val Pro Gln His Trp Glu 595 Phe Gly Ser Arg Ala 675 Ala Ala Arg Leu Ala 755 Pro Arg ly Trp Ala Ser 580 Arg Ala His Leu Arg 660 Glu Val Ile Arg Arg 740 Ala Glu Phe Asn Val Ala 565 Leu Ile Ala Ser Glu 645 Ile Ala Ser Gly Val 725 Glu Val Leu Ala Gli.
805 Val Pro I 535 Gly Met C 550 Leu Ser Leu Ala Asp Val Leu Trp 615 Met Gly 630 Asp Ala Ser Gly Glu Ala Asn Ser 695 Glu Val 710 Lys Val Asp Leu Pro Met Gly Ala 775 Glu Val 790 Met Ser Lys Gly Ala Glu Val 600 Arg Glu Val Gin Ala 680 Pro Leu Asp Leu Arg 760 Asn Val Pr
GI
Val Arg C Cys Leu 585 Gin Ser Val Ala Gly 665 Leu Arg Ser Val Ala 745 Ser Tyr Gin o His i Arg 825 g Pro 0 Val Gin Asp 570 Ala Pro Trp Ala Ile 650 Glu Arg Ser Ser Ala 730 Ala Thr Trp Ala Pro 810 Ala Ala Phe Leu 555 Arg Ala Val Gly Ala 635 Ile Met Gly Thr Leu 715 Ser Leu Val Met Gin 795 Ile Gly Met TrI Val P 540 Leu A Ala I Asp G Leu Val 620 Ala Cys Ala Tyr Val 700 Asn His Gly Thr Asn 780 Leu Leu Ala Leu Gly 860 he la le lu ?he 605 Ala His Arg Val Glu 685 Leu Ala Ser Gly Gly 765 Asn Gin Thr Ala Glu 845 Arg Pro Gly C Glu Glu Gin Ala 575 Gly Ser 590 Ala Leu Pro Asp Val Ala Arg Ser 655 Thr Glu 670 Asp Arg Ser Gly Lys Gly Pro Gin 735 Leu Arg 750 Ala Met Leu Arg Gly Gly Thr Ser 815 Val Gly 830 Gin ?ro 560 Glu Ser Ala Val Gly 640 Arg Leu Val Glu Val 720 Val Pro Val Gln His 800 Val Ser Gly Pro Gly Leu Phe Val Glu Glu Met Arg Arg Ala Ala 820 Leu Arg Arg Gly Gin Asp Glu Ar 835 84
L
Thr Leu Trp Ala Gin Gly Tyr 850 855 Pro Val Pro Ala Gly Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln Arg Glu WO 99/66028 PCT/EP99/04171 865 870 875 880 Arg Tyr Trp Ile Glu Ala Pro Ala Lys Ser Ala Ala Gly Asp Arg Arg 885 890 895 Gly Val Arg Ala Gly Gly His Pro Leu Leu Gly Glu Met Gln Thr Leu 900 905 910 Ser Thr Gln Thr Ser Thr Arg Leu Trp Glu Thr Thr Leu Asp Leu Lys 915 920 925 Arg Leu Pro Trp Leu Gly Asp His Arg Val Gln Gly Ala Val Val Phe 930 935 940 Pro Gly Ala Ala Tyr Leu Glu Met Ala Ile Ser Ser Gly Ala Glu Ala 945 950 955 960 Leu Gly Asp Gly Pro Leu Gln Ile Thr Asp Val Val Leu Ala Glu Ala 965 970 975 Leu Ala Phe Ala Gly Asp Ala Ala Val Leu Val Gln Val Val Thr Thr 980 985 990 Glu Gln Pro Ser Gly Arg Leu Gln Phe Gln lie Ala Ser Arg Ala Pro 995 1000 1005 Gly Ala Gly His Ala Ser Phe Arg Val His Ala Arg Gly Ala Leu Leu 1010 1015 1020 Arg Val Glu Arg Thr Glu Val Pro Ala Gly Leu Thr Leu Ser Ala Val 1025 1030 1035 1040 Arg Ala Arg Leu Gln Ala Ser Ile Pro Ala Ala Ala Thr Tyr Ala Glu 1045 1050 1055 Leu Thr Glu Met Gly Leu Gln Tyr Gly Pro Ala Phe Gln Gly Ile Ala 1060 1065 1070 Glu Leu Trp Arg Gly Glu Gly Glu Ala Leu Gly Arg Val Arg Leu Pro 1075 1080 1085 Asp Ala Ala Gly Ser Ala Ala Glu Tyr Arg Leu His Pro Ala Leu Leu 1090 1095 1100 Asp Ala Cys Phe Gln Ile Val Gly Ser Leu Phe Ala Arg Ser Gly Glu 1105 1110 1115 1120 Ala Thr Pro Trp Val Pro Val Glu Leu Gly Ser Leu Arg Leu Leu Gin 1125 1130 1135 Arg Pro Ser Gly Glu Leu Trp Cys His Ala Arg Val Val Asn His Gly 1140 1145 1150 His Gln Thr Pro Asp Arg Gln Gly Ala Asp Phe Trp Val Val Asp Ser 1155 1160 1165 Ser Gly Ala Val Val Ala Glu Val Cys Gly Leu Val Ala Gln Arg Leu 1170 1175 1180 Pro Gly Gly Val Arg Arg Arg Glu Glu Asp Asp Trp Phe Leu Glu Leu 1185 1190 1195 1200 Glu Trp Glu Pro Ala Ala Val Gly Thr Ala Lys Val Asn Ala Gly Arg 1205 1210 1215 WO 99/66028 PCT/EP99/04171 -31 Trp Leu Leu Leu Gly Gly Gly Gly Gly Leu Gly Ala Ala Leu Arg Ala 1220 1225 1230 Met Leu Glu Ala Gly Gly His Ala Val Val His Ala Ala Glu Asn Asn 1235 1240 1245 
Thr Ser Ala Ala Gly Val Arg Ala Leu Leu Ala Lys Ala Phe Asp Gly
1250 1255 1260
Gln Ala Pro Thr Ala Val Val His Leu Gly Ser Leu Asp Gly Gly Gly
1265 1270 1275 1280
Glu Leu Asp Pro Gly Leu Gly Ala Gln Gly Ala Leu Asp Ala Pro Arg
1285 1290 1295
Ser Ala Asp Val Ser Pro Asp Ala Leu Asp Pro Ala Leu Val Arg Gly
1300 1305 1310
Cys Asp Ser Val Leu Trp Thr Val Gln Ala Leu Ala Gly Met Gly Phe
1315 1320 1325
Arg Asp Ala Pro Arg Leu Trp Leu Leu Thr Arg Gly Ala Gln Ala Val
1330 1335 1340
Gly Ala Gly Asp Val Ser Val Thr Gln Ala Pro Leu Leu Gly Leu Gly
1345 1350 1355 1360
Arg Val Ile Ala Met Glu His Ala Asp Leu Arg Cys Ala Arg Val Asp
1365 1370 1375
Leu Asp Pro Ala Arg Pro Glu Gly Glu Leu Ala Ala Leu Leu Ala Glu
1380 1385 1390
Leu Leu Ala Asp Asp Ala Glu Ala Glu Val Ala Leu Arg Gly Gly Glu
1395 1400 1405
Arg Cys Val Ala Arg Ile Val Arg Arg Gln Pro Glu Thr Arg Pro Arg
1410 1415 1420
Gly Arg Ile Glu Ser Cys Val Pro Thr Asp Val Thr Ile Arg Ala Asp
1425 1430 1435 1440
Ser Thr Tyr Leu Val Thr Gly Gly Leu Gly Gly Leu Gly Leu Ser Val
1445 1450 1455
Ala Gly Trp Leu Ala Glu Arg Gly Ala Gly His Leu Val Leu Val Gly
1460 1465 1470
Arg Ser Gly Ala Ala Ser Val Glu Gln Arg Ala Ala Val Ala Ala Leu
1475 1480 1485
Glu Ala Arg Gly Ala Arg Val Thr Val Ala Lys Ala Asp Val Ala Asp
1490 1495 1500
Arg Ala Gln Leu Glu Arg Ile Leu Arg Glu Val Thr Thr Ser Gly Met
1505 1510 1515 1520
Pro Leu Arg Gly Val Val His Ala Ala Gly Ile Leu Asp Asp Gly Leu
1525 1530 1535
Leu Met Gln Gln Thr Pro Ala Arg Phe Arg Lys Val Met Ala Pro Lys
1540 1545 1550
Val Gln Gly Ala Leu His Leu His Ala Leu Thr Arg Glu Ala Pro Leu
1555 1560 1565
Ser Phe 1570 Gly Gln 1585 Phe Val Leu Tyr Ala Ser 1575 Gly Asn Tyr Ala Ala Ala 1590 Gly Val Gly Leu 1580 Asn Thr Phe Leu 1595 Gly Ser Pro Ala Leu Ala 1600
His His Arg Arg Ala Gln Gly Leu Pro Ala Leu Ser Val Asp Trp Gly
1605 1610 1615
Leu Phe Ala Glu Val Gly Met Ala Ala Ala Gln Glu Asp Arg Gly Ala
1620 1625 1630
Arg Leu Val Ser Arg Gly Met Arg Ser Leu Thr Pro Asp
Glu Gly Leu 1635 1640 1645 Ser Ala Leu Ala Arg Leu Leu Glu Ser Gly Arg Ala Gln Val Gly Val 1650 1655 1660 Met Pro Val Asn Pro Arg Leu Trp Val Glu Leu Tyr Pro Ala Ala Ala 1665 1670 1675 1680 Ser Ser Arg Met Leu Ser Arg Leu Val Thr Ala His Arg Ala Ser Ala 1685 1690 1695 Gly Gly Pro Ala Gly Asp Gly Asp Leu Leu Arg Arg Leu Ala Ala Ala 1700 1705 1710 Glu Pro Ser Ala Arg Ser Ala Leu Leu Glu Pro Leu Leu Arg Ala Gin 1715 1720 1725 Ile Ser Gln Val Leu Arg Leu Pro Glu Gly Lys Ile Glu Val Asp Ala 1730 1735 1740 Pro Leu Thr Ser Leu Gly Met Asn Ser Leu Met Gly Leu Glu Leu Arg 1745 1750 1755 1760 Asn Arg Ile Glu Ala Met Leu Gly Ile Thr Val Pro Ala Thr Leu Leu 1765 1770 1775 Trp Thr Tyr Pro Thr Val Ala Ala Leu Ser Gly His Leu Ala Arg Glu 1780 1785 1790 Ala Cys Glu Ala Ala Pro Val Glu Ser Pro His Thr Thr Ala Asp Ser 1805 1795 1800UU Ala Val Glu Ile 1810 Ala Ala Lys Phe 1825 <210> <211> 7257 <212> PRT <213> Sorangium <400> Met Thr Thr Arg 1 Ala Ile Ile Ile Glu Lys Glu Met Ser Gin 1815 Ala Leu Thr 1830 Asp Asp Leu Thr Gln Leu Ile 1820 cellulosum Gly Pro Thr Ala Gln Gin Asn Pro Leu Lys Gln Ala 5 10 Gln Arg Leu Glu Glu Arg Leu Ala Gly Leu Ala Gln 25 WO 99/66028 PCT/EP99/04171 -33- Ala Glu Leu Glu Arg Thr Glu Pro Ile Ala Ile Val Gly Ile Gly Cys 40 Arg Phe Pro Gly Gly Ala Asp Ala Pro Glu Ala Phe Trp Glu Leu Leu 55 Asp Ala Glu Arg Asp Ala Val Gln Pro Leu Asp Met Arg Trp Ala Leu 70 75 Val Gly Val Ala Pro Val Glu Ala Val Pro His Trp Ala Gly Leu Leu 90 Thr Glu Pro Ile Asp Cys Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro 100 105 110 Arg Glu Ala Arg Ser Leu Asp Pro Gln His Arg Leu Leu Leu Glu Val 115 120 125 Ala Trp Glu Gly Leu Glu Asp Ala Gly Ile Pro Pro Arg Ser Ile Asp 130 135 140 Gly Ser Arg Thr Gly Val Phe Val Gly Ala Phe Thr Ala Asp Tyr Ala 145 150 155 160 Arg Thr Val Ala Arg Leu Pro Arg Glu Glu Arg Asp Ala Tyr Ser Ala 165 170 175 Thr Gly Asn Met Leu Ser Ile Ala Ala Gly Arg Leu Ser Tyr Thr Leu 180 185 190 Gly Leu Gln Gly Pro Cys Leu Thr Val Asp Thr Ala Cys Ser Ser Ser 195 200 205 Leu Val Ala 
Ile His Leu Ala Cys Arg Ser Leu Arg Ala Gly Glu Ser
210 215 220
Asp Leu Ala Leu Ala Gly Gly Val Ser Ala Leu Leu Ser Pro Asp Met
225 230 235 240
Met Glu Ala Ala Ala Arg Thr Gln Ala Leu Ser Pro Asp Gly Arg Cys
245 250 255
Arg Thr Phe Asp Ala Ser Ala Asn Gly Phe Val Arg Gly Glu Gly Cys
260 265 270
Gly Leu Val Val Leu Lys Arg Leu Ser Asp Ala Gln Arg Asp Gly Asp
275 280 285
Arg Ile Trp Ala Leu Ile Arg Gly Ser Ala Ile Asn His Asp Gly Arg
290 295 300
Ser Thr Gly Leu Thr Ala Pro Asn Val Leu Ala Gln Glu Thr Val Leu
305 310 315 320
Arg Glu Ala Leu Arg Ser Ala His Val Glu Ala Gly Ala Val Asp Tyr
325 330 335
Val Glu Thr His Gly Thr Gly Thr Ser Leu Gly Asp Pro Ile Glu Val
340 345 350
Glu Ala Leu Arg Ala Thr Val Gly Pro Ala Arg Ser Asp Gly Thr Arg
355 360 365
Cys Val Leu Gly Ala Val Lys Thr Asn Ile Gly His Leu Glu Ala Ala
370 375 380
Ala Gly Val Ala Gly Leu Ile Lys Ala Ala
385 390
3 Arg Ile Pro Arg Leu Arg Gly Trp 465 Lys Leu Ala Ser Thr 545 Leu Arg Cys Met Ala 625 Leu Ile Asp Ser 3lu Thr Thr 450 Pro Ser Asp Thr Arg 530 Pro Ala Gly Val Trp 610 Phe Trp Gly Gly Ala 690 Gly Asp 435 Asn Ala Glu Met Thr 515 Glu Ala Phe Leu Ala 595 Ala Thr Arg Glu Val 675 Gly Ser 420 Arg Ala Ala Gly His 500 Arg Gly Gly Leu Cys 580 Leu Glu Gln Ser Lei.
66( Arg Gl Asn L 405 Ala L Pro His Pro C Ala 485 Pro Ser Leu Ala Phe 565 Ala Phe Ala Pro Trp 645 Val Leu y Ala eu eu Arg Val Glu 470 Leu Glu Ala Leu Ala 550 Thr Ala Asp Gly Ala 630 Gly Ala Val Met Asn Ala Phe Val 455 Arg Asp Leu Met Ala 535 Arg Gly Trp Arg Ser 615 Leu Val Al Al Va 69 Phe A Leu Ala C 440 Leu C Ser Ala Gly Ser 520 Ala Cys Gin Pro Glu 600 Ala Phe Glu I Cys a Ala 680 1 Ser 5 s Ala Lrg la 425 Gly Glu Ala G'1n Leu 505 His Leu Ile Gly Ala 585 Leu Gl.
Ala Pr Va 66 Ar Le Thr L 410 Thr G Val S Glu P Glu 1 4 Ala 490 Gly Arg Ser Ala Ala 570 Phe Asp Ser Val o Glu 650 1 Ala 5 g Gly u Gly eu 95 eu lu er la Leu 75 Ala Asp Leu Ala Ser 555 Gln Arg Arg Leu Glu 635 Leu Gly Arg Ala Ser Leu T Asn Pro A Pro Val 1 Ser Phe 445 Pro Ala 460 Leu Val Arg Leu Val Ala Ala Val 525 Val Ala 540 Ser Ser Thr Pro Glu Ala Pro Leu 605 Leu Leu 620 Tyr Ala Leu Val Val Phe Leu Met 685 Pro Glu 700 Ser Ile hr His Glu 400 rg Ile Arg 415 Pro Trp Pro 430 Gly Met Ser Val Glu Leu Leu Ser Gly 480 Arg Glu His 495 Phe Ser Leu 510 Ala Val Thr Gln Gly Gin Arg Gly Lys 560 Gly Met Gly 575 Phe Asp Arg 590 Arg Glu Val Asp Gin Thr Leu Thr Ala 640 Gly His Ser 655 Ser Leu Glu 670 Gin Gly Leu Ala Glu Val Ala Ala Val 720
Ala Ala Ala Val Ala Pro Hi Ala Ser Val 710 715 Asn Gly Pro Glu Gin Val Val lie Ala Gly Val Glu Gin Ala Val Gin WO 99/66028 PCT/EP99/04171 Ala Ile His Val Glu Phe 770 Ser Leu 785 Ala Pro Asp Gly Val Gly Glu Ala 850 Ala Ala 865 Ser Val Leu Pro Ala Glu Tyr Arg 930 Arg Ala 945 Gly Glu Val Leu Gin Ala Gly Leui 101C Lys Val 1025 Ala Ala C 740 Ser His 755 Gly Arg Val Ser Gly Tyr Val Lys 820 Pro Lys 835 Glu Pro Gly Val Ser Trp Thr Tyr 900 Gly Leu 915 Val Asp Arg Ser Ala Ala His Ala 980 Leu Gly 995 i Asp Ala Thr His 725 Gly I Ala I Val Asn Trp 805 Ala Pro Thr Leu Pro 885 Pro Gly Trp Gly Ala 965 Pro Gly Val Phe ?he Ala Leu 790 Jal Leu Thr Leu Glu 870 Gly Trp Ala Pro Gly 950 Ala Ala Ala His Ala 775 Ser Arg His Leu Leu 855 Ala Val Gln Thr Glu 935 Trp Ala Glu Ala Ser 760 Ser Gly His Glu Leu 840 Ala Leu Phe Arg Ala 920 Met Leu Leu Ala Arg 745 Pro Val Lys Val Ala 825 Gly Ser Gly Pro Gin 905 Ala Pro Val Ser Se: 985 Trj Gl 730 Gly Leu Thr Val Arg 810 Gly Leu Leu Arg Thr 890 Arg Asp Arg Leu Ser 970 Ala p Gin y Ala Ala Arg Thr I Met Glu Pro 765 Tyr Arg Arg 780 Val Thr Asp 795 Glu Ala Val Ala Gly Thr Leu Pro Ala 845 Arg Ala Gly 860 Leu Trp Ala 875 Ala Gly Arg Tyr Trp Ile Ala Leu Ala 925 Ser Ser Val 940 Ala Asp Arg 955 Gln Gly Cys Val Ala Glu Gly Val Leu 1005 i Ser Ala Glu 1020 Leu Ala Leu 1035 Trp Ile Val -ys 750 4et Pro Glu Arg Phe 830 Cys Arg Ala Arg Glu 910 Gin Asp Gly Ser Gln 990 Tyr Glu Ile Th 735 Arg Leu Leu Glu Ser Val Leu Ser 800 Phe Ala 815 Val Glu Leu Pro Glu Glu Gly Gly 880 Val Pro 895 Ala Pro Trp Phe Ser Arg Gly Val 960 Cys Ala 975 Val Thr Leu Trp Val Ala SGn Ala 1040 r Arg Gly 1055 Arg Asn Asp 1000 Val Glu Ala 1015 Leu Ala Ala Ala Pro Val 1030 Leu Gly Thr Gly Pro Arg Ser Pro Arg Leu 1045 1050 Ala Cys Thr Val Gly Gly Glu Pro Asp Ala Ala Pro 1060 1065 Cys Gin Ala Ala 1070 WO 99/66028 PCT/EP99/04171 -36- Leu Trp Gly Met Gly Arg Val Ala Ala Leu Glu His Pro Gly Ser Trp 1075 1080 1085 Gly Gly Leu Val Asp Leu Asp Pro Glu Glu Ser Pro Thr Glu Val Glu 1090 1095 1100 Ala Leu Val Ala 
Glu Leu Leu Ser Pro Asp Ala Glu Asp Gln Leu Ala 1105 1110 1115 1120 Phe Arg Gln Gly Arg Arg Arg Ala Ala Arg Leu Val Ala Ala Pro Pro 1125 1130 1135 Glu Gly Asn Ala Ala Pro Val Ser Leu Ser Ala Glu Gly Ser Tyr Leu 1140 1145 1150 Val Thr Gly Gly Leu Gly Ala Leu Gly Leu Leu Val Ala Arg Trp Leu 1155 1160 1165 Val Glu Arg Gly Ala Gly His Leu Val Leu Ile Ser Arg His Gly Leu 1170 1175 1180 Pro Asp Arg Glu Glu Trp Gly Arg Asp Gln Pro Pro Glu Val Arg Ala 1185 1190 1195 1200 Arg Ile Ala Ala Ile Glu Ala Leu Glu Ala Gln Gly Ala Arg Val Thr 1205 1210 1215 Val Ala Ala Val Asp Val Ala Asp Ala Glu Gly Met Ala Ala Leu Leu 1220 1225 1230 Ala Ala Val Glu Pro Pro Leu Arg Gly Val Val His Ala Ala Gly Leu 1235 1240 1245 Leu Asp Asp Gly Leu Leu Ala His Gln Asp Ala Gly Arg Leu Ala Arg 1250 1255 1260 Val Leu Arg Pro Lys Val Glu Gly Ala Trp Val Leu His Thr Leu Thr 1265 1270 1275 1280 Arg Glu Gln Pro Leu Asp Leu Phe Val Leu Phe Ser Ser Ala Ser Gly 1285 1290 1295 Val Phe Gly Ser Ile Gly Gln Gly Ser Tyr Ala Ala Gly Asn Ala Phe 1300 1305 1310 Leu Asp Ala Leu Ala Asp Leu Arg Arg Thr Gln Gly Leu Ala Ala Leu 1315 1320 1325 Ser Ile Ala Trp Gly Leu Trp Ala Glu Gly Gly Met Gly Ser Gln Ala 1330 1335 1340 Gln Arg Arg Glu His Glu Ala Ser Gly Ile Trp Ala Met Pro Thr Ser 1345 1350 1355 1360 Arg Ala Leu Ala Ala Met Glu Trp Leu Leu Gly Thr Arg Ala Thr Gln 1365 1370 1375 Arg Val Val Ile Gln Met Asp Trp Ala His Ala Gly Ala Ala Pro Arg 1380 1385 1390 Asp Ala Ser Arg Gly Arg Phe Trp Asp Arg Leu Val Thr Ala Thr Lys 1395 1400 1405 Glu Ala Ser Ser Ser Ala Val Pro Ala Val Glu Arg Trp Arg Asn Ala 1410 1415 1420 Ser Val Val Glu Thr Arg Ser Ala Leu Tyr Glu Leu Val Arg Gly Val 1425 1430 1435 1440 Val Ala Gly Val Met Gly Phe Thr Asp Gln Gly Thr Leu Asp Val Arg 1445 1450 1455 Arg Gly Phe Ala Glu Gln Gly Leu Asp Ser Leu Met Ala Val Glu Ile 1460 1465 1470 Arg Lys Arg Leu Gln Gly Glu Leu Gly Met Pro Leu Ser Ala Thr Leu 1475 1480 1485 Ala Phe Asp His Pro Thr Val Glu Arg Leu Val Glu Tyr Leu Leu
Ser 1490 1495 1500 Gln Ala Leu Glu Leu Gln Asp Arg Thr Asp Val Arg Ser Val Arg Leu 1505 1510 1515 1520 Pro Ala Thr Glu Asp Pro Ile Ala Ile Val Gly Ala Ala Cys Arg Phe 1525 1530 1535 Pro Gly Gly Val Glu Asp Leu Glu Ser Tyr Trp Gln Leu Leu Thr Glu 1540 1545 1550 Gly Val Val Val Ser Thr Glu Val Pro Ala Asp Arg Trp Asn Gly Ala 1555 1560 1565 Asp Gly Arg Val Pro Gly Ser Gly Glu Ala Gln Arg Gln Thr Tyr Val 1570 1575 1580 Pro Arg Gly Gly Phe Leu Arg Glu Val Glu Thr Phe Asp Ala Ala Phe 1585 1590 1595 1600 Phe His Ile Ser Pro Arg Glu Ala Met Ser Leu Asp Pro Gln Gln Arg 1605 1610 1615 Leu Leu Leu Glu Val Ser Trp Glu Ala Ile Glu Arg Ala Gly Gln Asp 1620 1625 1630 Pro Ser Ala Leu Arg Glu Ser Pro Thr Gly Val Phe Val Gly Ala Gly 1635 1640 1645 Pro Asn Glu Tyr Ala Glu Arg Val Gln Glu Leu Ala Asp Glu Ala Ala 1650 1655 1660 Gly Leu Tyr Ser Gly Thr Gly Asn Met Leu Ser Val Ala Ala Gly Arg 1665 1670 1675 1680 Leu Ser Phe Phe Leu Gly Leu His Gly Pro Thr Leu Ala Val Asp Thr 1685 1690 1695 Ala Cys Ser Ser Ser Leu Val Ala Leu His Leu Gly Cys Gln Ser Leu 1700 1705 1710 Arg Arg Gly Glu Cys Asp Gln Ala Leu Val Gly Gly Val Asn Met Leu 1715 1720 1725 Leu Ser Pro Lys Thr Phe Ala Leu Leu Ser Arg Met His Ala Leu Ser 1730 1735 1740 Pro Gly Gly Arg Cys Lys Thr Phe Ser Ala Asp Ala Asp Gly Tyr Ala 1745 1750 1755 1760 Arg Ala Glu Gly Cys Ala Val Val Val Leu Lys Arg Leu Ser Asp Ala 1765 1770 1775 Gln Arg Asp Arg Asp Pro Ile Leu Ala Val Ile Arg Gly Thr Ala Ile 1780 1785 1790 Asn His Asp Gly Pro Ser Ser Gly Leu Thr Val Pro Ser Gly Pro Ala 1795 1800 1805 Gln Glu Ala Leu Leu Arg Gln Ala Leu Ala His Ala Gly Val Val Pro 1810 1815 1820 Ala Asp Val Asp Phe Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly 1825 1830 1835 1840 Asp Pro Ile Glu Val Arg Ala Leu Ser Asp Val Tyr Gly Gln Ala Arg 1845 1850 1855 Pro Ala Asp Arg Pro Leu Ile Leu Gly Ala Ala Lys Ala Asn Leu Gly 1860 1865 1870 His Met Glu Pro Ala Ala Gly Leu Ala Gly Leu Leu Lys Ala Val Leu 1875 1880 1885 Ala Leu Gly Gln Glu Gln Ile
Pro Ala Gln Pro Glu Leu Gly Glu Leu 1890 1895 1900 Asn Pro Leu Leu Pro Trp Glu Ala Leu Pro Val Ala Val Ala Arg Ala 1905 1910 1915 1920 Ala Val Pro Trp Pro Arg Thr Asp Arg Pro Arg Phe Ala Gly Val Ser 1925 1930 1935 Ser Phe Gly Met Ser Gly Thr Asn Ala His Val Val Leu Glu Glu Ala 1940 1945 1950 Pro Ala Val Glu Leu Trp Pro Ala Ala Pro Glu Arg Ser Ala Glu Leu 1955 1960 1965 Leu Val Leu Ser Gly Lys Ser Glu Gly Ala Leu Asp Ala Gln Ala Ala 1970 1975 1980 Arg Leu Arg Glu His Leu Asp Met His Pro Glu Leu Gly Leu Gly Asp 1985 1990 1995 2000 Val Ala Phe Ser Leu Ala Thr Thr Arg Ser Ala Met Asn His Arg Leu 2005 2010 2015 Ala Val Ala Val Thr Ser Arg Glu Gly Leu Leu Ala Ala Leu Ser Ala 2020 2025 2030 Val Ala Gln Gly Gln Thr Pro Pro Gly Ala Ala Arg Cys Ile Ala Ser 2035 2040 2045 Ser Ser Arg Gly Lys Leu Ala Phe Leu Phe Thr Gly Gln Gly Ala Gln 2050 2055 2060 Thr Pro Gly Met Gly Arg Gly Leu Cys Ala Ala Trp Pro Ala Phe Arg 2065 2070 2075 2080 Glu Ala Phe Asp Arg Cys Val Ala Leu Phe Asp Arg Glu Leu Asp Arg 2085 2090 2095 Pro Leu Arg Glu Val Met Trp Ala Glu Pro Gly Ser Ala Glu Ser Leu 2100 2105 2110 Leu Leu Asp Gln Thr Ala Phe Thr Gln Pro Ala Leu Phe Thr Val Glu 2115 2120 2125 Tyr Ala Leu Thr Ala Leu Trp Arg Ser Trp Gly Val Glu Pro Glu Leu 2130 2135 2140 Val Ala Gly His Ser Ala Gly Glu Leu Val Ala Ala Cys Val Ala Gly 2145 2150 2155 2160 Val Phe Ser Leu Glu Asp Gly Val Arg Leu Val Ala Ala Arg Gly Arg 2165 2170 2175 Leu Met Gln Gly Leu Ser Ala Gly Gly Ala Met Val Ser Leu Gly Ala 2180 2185 2190 Pro Glu Ala Glu Val Ala Ala Ala Val Ala Pro His Ala Ala Ser Val 2195 2200 2205 Ser Ile Ala Ala Val Asn Gly Pro Glu Gln Val Val Ile Ala Gly Val 2210 2215 2220 Glu Gln Ala Val Gln Ala Ile Ala Ala Gly Phe Ala Ala Arg Gly Ala 2225 2230 2235 2240 Arg Thr Lys Arg Leu His Val Ser His Ala Ser His Ser Pro Leu Met 2245 2250 2255 Glu Pro Met Leu Glu Glu Phe Gly Arg Val Ala Ala Ser Val Thr Tyr 2260 2265 2270 Arg Arg Pro Ser Val Ser Leu Val Ser Asn Leu Ser Gly Lys Val Val 2275 2280
2285 Ala Asp Glu Leu Ser Ala Pro Gly Tyr Trp Val Arg His Val Arg Glu 2290 2295 2300 Ala Val Arg Phe Ala Asp Gly Val Lys Ala Leu His Glu Ala Gly Ala 2305 2310 2315 2320 Gly Thr Phe Val Glu Val Gly Pro Lys Pro Thr Leu Leu Gly Leu Leu 2325 2330 2335 Pro Ala Cys Leu Pro Glu Ala Glu Pro Thr Leu Leu Ala Ser Leu Arg 2340 2345 2350 Ala Gly Arg Glu Glu Ala Ala Gly Val Leu Glu Ala Leu Gly Arg Leu 2355 2360 2365 Trp Ala Ala Gly Gly Ser Val Ser Trp Pro Gly Val Phe Pro Thr Ala 2370 2375 2380 Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln Arg Gln Arg Tyr 2385 2390 2395 2400 Trp Pro Asp Ile Glu Pro Asp Ser Arg Arg His Ala Ala Ala Asp Pro 2405 2410 2415 Thr Gln Gly Trp Phe Tyr Arg Val Asp Trp Pro Glu Ile Pro Arg Ser 2420 2425 2430 Leu Gln Lys Ser Glu Glu Ala Ser Arg Gly Ser Trp Leu Val Leu Ala 2435 2440 2445 Asp Lys Gly Gly Val Gly Glu Ala Val Ala Ala Ala Leu Ser Thr Arg 2450 2455 2460 Gly Leu Pro Cys Val Val Leu His Ala Pro Ala Glu Thr Ser Ala Thr 2465 2470 2475 2480 Ala Glu Leu Val Thr Glu Ala Ala Gly Gly Arg Ser Asp Trp Gln Val 2485 2490 2495 Val Leu Tyr Leu Trp Gly Leu Asp Ala Val Val Gly Ala Glu Ala Ser 2500 2505 2510 Ile Asp Glu Ile Gly Asp Ala Thr Arg Arg Ala Thr Ala Pro Val Leu 2515 2520 2525 Gly Leu Ala Arg Phe Leu Ser Thr Val Ser Cys Ser Pro Arg Leu Trp 2530 2535 2540 Val Val Thr Arg Gly Ala Cys Ile Val Gly Asp Glu Pro Ala Ile Ala 2545 2550 2555 2560 Pro Cys Gln Ala Ala Leu Trp Gly Met Gly Arg Val Ala Ala Leu Glu 2565 2570 2575 His Pro Gly Ala Trp Gly Gly Leu Val Asp Leu Asp Pro Arg Ala Ser 2580 2585 2590 Pro Pro Gln Ala Ser Pro Ile Asp Gly Glu Met Leu Val Thr Glu Leu 2595 2600 2605 Leu Ser Gln Glu Thr Glu Asp Gln Leu Ala Phe Arg His Gly Arg Arg 2610 2615 2620 His Ala Ala Arg Leu Val Ala Ala Pro Pro Gln Gly Gln Ala Ala Pro 2625 2630 2635 2640 Val Ser Leu Ser Ala Glu Ala Ser Tyr Leu Val Thr Gly Gly Leu Gly 2645 2650 2655 Gly Leu Gly Leu Ile Val Ala Gln Trp Leu Val Glu Leu Gly Ala Arg 2660 2665 2670 His Leu Val Leu Thr Ser Arg Arg Gly Leu Pro
Asp Arg Gln Ala Trp 2675 2680 2685 Cys Glu Gln Gln Pro Pro Glu Ile Arg Ala Arg Ile Ala Ala Val Glu 2690 2695 2700 Ala Leu Glu Ala Arg Gly Ala Arg Val Thr Val Ala Ala Val Asp Val 2705 2710 2715 2720 Ala Asp Val Glu Pro Met Thr Ala Leu Val Ser Ser Val Glu Pro Pro 2725 2730 2735 Leu Arg Gly Val Val His Ala Ala Gly Val Ser Val Met Arg Pro Leu 2740 2745 2750 Ala Glu Thr Asp Glu Thr Leu Leu Glu Ser Val Leu Arg Pro Lys Val 2755 2760 2765 Ala Gly Ser Trp Leu Leu His Arg Leu Leu His Gly Arg Pro Leu Asp 2770 2775 2780 Leu Phe Val Leu Phe Ser Ser Gly Ala Ala Val Trp Gly Ser His Ser 2785 2790 2795 2800 Gln Gly Ala Tyr Ala Ala Ala Asn Ala Phe Leu Asp Gly Leu Ala His 2805 2810 2815 Leu Arg Arg Ser Gln Ser Leu Pro Ala Leu Ser Val Ala Trp Gly Leu 2820 2825 2830 Trp Ala Glu Gly Gly Met Ala Asp Ala Glu Ala His Ala Arg Leu Ser 2835 2840 2845 Asp Ile Gly Val Leu Pro Met Ser Thr Ser Ala Ala Leu Ser Ala Leu 2850 2855 2860 Gln Arg Leu Val Glu Thr Gly Ala Ala Gln Arg Thr Val Thr Arg Met 2865 2870 2875 2880 Asp Trp Ala Arg Phe Ala Pro Val Tyr Thr Ala Arg Gly Arg Arg Asn 2885 2890 2895 Leu Leu Ser Ala Leu Val Ala Gly Arg Asp Ile Ile Ala Pro Ser Pro 2900 2905 2910 Pro Ala Ala Ala Thr Arg Asn Trp Arg Gly Leu Ser Val Ala Glu Ala 2915 2920 2925 Arg Val Ala Leu His Glu Ile Val His Gly Ala Val Ala Arg Val Leu 2930 2935 2940 Gly Phe Leu Asp Pro Ser Ala Leu Asp Pro Gly Met Gly Phe Asn Glu 2945 2950 2955 2960 Gln Gly Leu Asp Ser Leu Met Ala Val Glu Ile Arg Asn Leu Leu Gln 2965 2970 2975 Ala Glu Leu Asp Val Arg Leu Ser Thr Thr Leu Ala Phe Asp His Pro 2980 2985 2990 Thr Val Gln Arg Leu Val Glu His Leu Leu Val Asp Val Leu Lys Leu 2995 3000 3005 Glu Asp Arg Ser Asp Thr Gln His Val Arg Ser Leu Ala Ser Asp Glu 3010 3015 3020 Pro Ile Ala Ile Val Gly Ala Ala Cys Arg Phe Pro Gly Gly Val Glu 3025 3030 3035 3040 Asp Leu Glu Ser Tyr Trp Gln Leu Leu Ala Glu Gly Val Val Val Ser 3045 3050 3055 Ala Glu Val Pro Ala Asp Arg Trp Asp Ala Ala Asp Trp Tyr Asp Pro 3060 3065 3070 Asp Pro Glu
Ile Pro Gly Arg Thr Tyr Val Thr Lys Gly Ala Phe Leu 3075 3080 3085 Arg Asp Leu Gln Arg Leu Asp Ala Thr Phe Phe Arg Ile Ser Pro Arg 3090 3095 3100 Glu Ala Met Ser Leu Asp Pro Gln Gln Arg Leu Leu Leu Glu Val Ser 3105 3110 3115 3120 Trp Glu Ala Leu Glu Ser Ala Gly Ile Ala Pro Asp Thr Leu Arg Asp 3125 3130 3135 Ser Pro Thr Gly Val Phe Val Gly Ala Gly Pro Asn Glu Tyr Tyr Thr 3140 3145 3150 Gln Arg Leu Arg Gly Phe Thr Asp Gly Ala Ala Gly Leu Tyr Gly Gly 3155 3160 3165 Thr Gly Asn Met Leu Ser Val Thr Ala Gly Arg Leu Ser Phe Phe Leu 3170 3175 3180 Gly Leu His Gly Pro Thr Leu Ala Met Asp Thr Ala Cys Ser Ser Ser 3185 3190 3195 3200 Leu Val Ala Leu His Leu Ala Cys Gln Ser Leu Arg Leu Gly Glu Cys 3205 3210 3215 Asp Gln Ala Leu Val Gly Gly Val Asn Val Leu Leu Ala Pro Glu Thr 3220 3225 3230 Phe Val Leu Leu Ser Arg Met Arg Ala Leu Ser Pro Asp Gly Arg Cys 3235 3240 3245 Lys Thr Phe Ser Ala Asp Ala Asp Gly Tyr Ala Arg Gly Glu Gly Cys 3250 3255 3260 Ala Val Val Val Leu Lys Arg Leu Arg Asp Ala Gln Arg Ala Gly Asp 3265 3270 3275 3280 Ser Ile Leu Ala Leu Ile Arg Gly Ser Ala Val Asn His Asp Gly Pro 3285 3290 3295 Ser Ser Gly Leu Thr Val Pro Asn Gly Pro Ala Gln Gln Ala Leu Leu 3300 3305 3310 Arg Gln Ala Leu Ser Gln Ala Gly Val Ser Pro Val Asp Val Asp Phe 3315 3320 3325 Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly Asp Pro Ile Glu Val 3330 3335 3340 Gln Ala Leu Ser Glu Val Tyr Gly Pro Gly Arg Ser Gly Asp Arg Pro 3345 3350 3355 3360 Leu Val Leu Gly Ala Ala Lys Ala Asn Val Ala His Leu Glu Ala Ala 3365 3370 3375 Ser Gly Leu Ala Ser Leu Leu Lys Ala Val Leu Ala Leu Arg His Glu 3380 3385 3390 Gln Ile Pro Ala Gln Pro Glu Leu Gly Glu Leu Asn Pro His Leu Pro 3395 3400 3405 Trp Asn Thr Leu Pro Val Ala Val Pro Arg Lys Ala Val Pro Trp Gly 3410 3415 3420 Arg Gly Ala Arg Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Leu Ser 3425 3430 3435 3440 Gly Thr Asn Val His Val Val Leu Glu Glu Ala Pro Glu Val Glu Pro 3445 3450 3455 Ala Pro Ala Ala Pro Ala Arg Pro Val Glu Leu Val Val Leu
Ser Ala 3460 3465 3470 Lys Ser Ala Ala Ala Leu Asp Ala Ala Ala Ala Arg Leu Ser Ala His 3475 3480 3485 Leu Ser Ala His Pro Glu Leu Ser Leu Gly Asp Val Ala Phe Ser Leu 3490 3495 3500 Ala Thr Thr Arg Ser Pro Met Glu His Arg Leu Ala Ile Ala Thr Thr 3505 3510 3515 3520 Ser Arg Glu Ala Leu Arg Gly Ala Leu Asp Ala Ala Ala Gln Gln Lys 3525 3530 3535 Thr Pro Gln Gly Ala Val Arg Gly Lys Ala Val Ser Ser Arg Gly Lys 3540 3545 3550 Leu Ala Phe Leu Phe Thr Gly Gln Gly Ala Gln Met Pro Gly Met Gly 3555 3560 3565 Arg Gly Leu Tyr Glu Thr Trp Pro Ala Phe Arg Glu Ala Phe Asp Arg 3570 3575 3580 Cys Val Ala Leu Phe Asp Arg Glu Ile Asp Gln Pro Leu Arg Glu Val 3585 3590 3595 3600 Met Trp Ala Ala Pro Gly Leu Ala Gln Ala Ala Arg Leu Asp Gln Thr 3605 3610 3615 Ala Tyr Ala Gln Pro Ala Leu Phe Ala Leu Glu Tyr Ala Leu Ala Ala 3620 3625 3630 Leu Trp Arg Ser Trp Gly Val Glu Pro His Val Leu Leu Gly His Ser 3635 3640 3645 Ile Gly Glu Leu Val Ala Ala Cys Val Ala Gly Val Phe Ser Leu Glu 3650 3655 3660 Asp Ala Val Arg Leu Val Ala Ala Arg Gly Arg Leu Met Gln Ala Leu 3665 3670 3675 3680 Pro Ala Gly Gly Ala Met Val Ala Ile Ala Ala Ser Glu Ala Glu Val 3685 3690 3695 Ala Ala Ser Val Ala Pro His Ala Ala Thr Val Ser Ile Ala Ala Val 3700 3705 3710 Asn Gly Pro Asp Ala Val Val Ile Ala Gly Ala Glu Val Gln Val Leu 3715 3720 3725 Ala Leu Gly Ala Thr Phe Ala Ala Arg Gly Ile Arg Thr Lys Arg Leu 3730 3735 3740 Ala Val Ser His Ala Phe His Ser Pro Leu Met Asp Pro Met Leu Glu 3745 3750 3755 3760 Asp Phe Gln Arg Val Ala Ala Thr Ile Ala Tyr Arg Ala Pro Asp Arg 3765 3770 3775 Pro Val Val Ser Asn Val Thr Gly His Val Ala Gly Pro Glu Ile Ala 3780 3785 3790 Thr Pro Glu Tyr Trp Val Arg His Val Arg Ser Ala Val Arg Phe Gly 3795 3800 3805 Asp Gly Ala Lys Ala Leu His Ala Ala Gly Ala Ala Thr Phe Val Glu 3810 3815 3820 Val Gly Pro Lys Pro Val Leu Leu Gly Leu Leu Pro Ala Cys Leu Gly 3825 3830 3835 3840 Glu Ala Asp Ala Val Leu Val Pro Ser Leu Arg Ala Asp Arg Ser Glu 3845
3850 3855 Cys Glu Val Val Leu Ala Ala Leu Gly Ala Trp Tyr Ala Trp Gly Gly 3860 3865 3870 Ala Leu Asp Trp Lys Gly Val Phe Pro Asp Gly Ala Arg Arg Val Ala 3875 3880 3885 Leu Pro Met Tyr Pro Trp Gln Arg Glu Arg His Trp Met Asp Leu Thr 3890 3895 3900 Pro Arg Ser Ala Ala Pro Ala Gly Ile Ala Gly Arg Trp Pro Leu Ala 3905 3910 3915 3920 Gly Val Gly Leu Cys Met Pro Gly Ala Val Leu His His Val Leu Ser 3925 3930 3935 Ile Gly Pro Arg His Gln Pro Phe Leu Gly Asp His Leu Val Phe Gly 3940 3945 3950 Lys Val Val Val Pro Gly Ala Phe His Val Ala Val Ile Leu Ser Ile 3955 3960 3965 Ala Ala Glu Arg Trp Pro Glu Arg Ala Ile Glu Leu Thr Gly Val Glu 3970 3975 3980 Phe Leu Lys Ala Ile Ala Met Glu Pro Asp Gln Glu Val Glu Leu His 3985 3990 3995 4000 Ala Val Leu Thr Pro Glu Ala Ala Gly Asp Gly Tyr Leu Phe Glu Leu 4005 4010 4015 Ala Thr Leu Ala Ala Pro Glu Thr Glu Arg Arg Trp Thr Thr His Ala 4020 4025 4030 Arg Gly Arg Val Gln Pro Thr Asp Gly Ala Pro Gly Ala Leu Pro Arg 4035 4040 4045 Leu Glu Val Leu Glu Asp Arg Ala Ile Gln Pro Leu Asp Phe Ala Gly 4050 4055 4060 Phe Leu Asp Arg Leu Ser Ala Val Arg Ile Gly Trp Gly Pro Leu Trp 4065 4070 4075 4080 Arg Trp Leu Gln Asp Gly Arg Val Gly Asp Glu Ala Ser Leu Ala Thr 4085 4090 4095 Leu Val Pro Thr Tyr Pro Asn Ala His Asp Val Ala Pro Leu His Pro 4100 4105 4110 Ile Leu Leu Asp Asn Gly Phe Ala Val Ser Leu Leu Ser Thr Arg Ser 4115 4120 4125 Glu Pro Glu Asp Asp Gly Thr Pro Pro Leu Pro Phe Ala Val Glu Arg 4130 4135 4140 Val Arg Trp Trp Arg Ala Pro Val Gly Arg Val Arg Cys Gly Gly Val 4145 4150 4155 4160 Pro Arg Ser Gln Ala Phe Gly Val Ser Ser Phe Val Leu Val Asp Glu 4165 4170 4175 Thr Gly Glu Val Val Ala Glu Val Glu Gly Phe Val Cys Arg Arg Ala 4180 4185 4190 Pro Arg Glu Val Phe Leu Arg Gln Glu Ser Gly Ala Ser Thr Ala Ala 4195 4200 4205 Leu Tyr Arg Leu Asp Trp Pro Glu Ala Pro Leu Pro Asp Ala Pro Ala 4210 4215 4220 Glu Arg Ile Glu Glu Ser Trp Val Val Val Ala Ala Pro Gly Ser Glu 4225 4230 4235 4240 Met Ala Ala Ala Leu Ala Thr Arg Leu Asn
Arg Cys Val Leu Ala Glu 4245 4250 4255 Pro Lys Gly Leu Glu Ala Ala Leu Ala Gly Val Ser Pro Ala Gly Val 4260 4265 4270 Ile Cys Leu Trp Glu Ala Gly Ala His Glu Glu Ala Pro Ala Ala Ala 4275 4280 4285 Gln Arg Val Ala Thr Glu Gly Leu Ser Val Val Gln Ala Leu Arg Asp 4290 4295 4300 Arg Ala Val Arg Leu Trp Trp Val Thr Met Gly Ala Val Ala Val Glu 4305 4310 4315 4320 Ala Gly Glu Arg Val Gln Val Ala Thr Ala Pro Val Trp Gly Leu Gly 4325 4330 4335 Arg Thr Val Met Gln Glu Arg Pro Glu Leu Ser Cys Thr Leu Val Asp 4340 4345 4350 Leu Glu Pro Glu Ala Asp Ala Ala Arg Ser Ala Asp Val Leu Leu Arg 4355 4360 4365 Glu Leu Gly Arg Ala Asp Asp Glu Thr Gln Val Ala Phe Arg Ser Gly 4370 4375 4380 Lys Arg Arg Val Ala Arg Leu Val Lys Ala Thr Thr Pro Glu Gly Leu 4385 4390 4395 4400 Leu Val Pro Asp Ala Glu Ser Tyr Ara Leu Glu Ala Gly Gln Lys Gly 4405 4410 4415 Thr Leu Asp Gln Leu Arg Leu Ala Pro Ala Gln Arg Arg Ala Pro Gly 4420 4425 4430 Pro Gly Glu Val Glu Ile Lys Val Thr Ala Ser Gly Leu Asn Phe Arg 4435 4440 4445 Thr Val Leu Ala Val Leu Gly Met Tyr Pro Gly Asp Ala Gly Pro Met 4450 4455 4460 Gly Gly Asp Cys Ala Gly Val Ala Thr Ala Val Gly Gln Gly Val Arg 4465 4470 4475 4480 His Val Ala Val Gly Asp Ala Val Met Thr Leu Gly Thr Leu His Arg 4485 4490 4495 Phe Val Thr Val Asp Ala Arg Leu Val Val Arg Gln Pro Ala Gly Leu 4500 4505 4510 Thr Pro Ala Gln Ala Ala Thr Val Pro Val Ala Phe Leu Thr Ala Trp 4515 4520 4525 Leu Ala Leu His Asp Leu Gly Asn Leu Arg Arg Gly Glu Arg Val Leu 4530 4535 4540 Ile His Ala Ala Ala Gly Gly Val Gly Met Ala Ala Val Gln Ile Ala 4545 4550 4555 4560 Arg Trp Ile Gly Ala Glu Val Phe Ala Thr Ala Ser Pro Ser Lys Trp 4565 4570 4575 Ala Ala Val Gln Ala Met Gly Val Pro Arg Thr His Ile Ala Ser Ser 4580 4585 4590 Arg Thr Leu Glu Phe Ala Glu Thr Phe Arg Gln Val Thr Gly Gly Arg 4595 4600 4605 Gly Val Asp Val Val Leu Asn Ala Leu Ala Gly Glu Phe Val Asp Ala 4610 4615 4620 Ser Leu Ser Leu Leu Ser Thr Gly Gly Arg Phe Leu Glu Met Gly Lys 4625 4630 4635 4640 Thr Asp
Ile Arg Asp Arg Ala Ala Val Ala Ala Ala His Pro Gly Val 4645 4650 4655 Arg Tyr Arg Val Phe Asp Ile Leu Glu Leu Ala Pro Asp Arg Thr Arg 4660 4665 4670 Glu Ile Leu Glu Arg Val Val Glu Gly Phe Ala Ala Gly His Leu Arg 4675 4680 4685 Ala Leu Pro Val His Ala Phe Ala Ile Thr Lys Ala Glu Ala Ala Phe 4690 4695 4700 Arg Phe Met Ala Gln Ala Arg His Gln Gly Lys Val Val Leu Leu Pro 4705 4710 4715 4720 Ala Pro Ser Ala Ala Pro Leu Ala Pro Thr Gly Thr Val Leu Leu Thr 4725 4730 4735 Gly Gly Leu Gly Ala Leu Gly Leu His Val Ala Arg Trp Leu Ala Gln 4740 4745 4750 Gln Gly Val Pro His Met Val Leu Thr Gly Arg Arg Gly Leu Asp Thr 4755 4760 4765 Pro Gly Ala Ala Lys Ala Val Ala Glu Ile Glu Ala Leu Gly Ala Arg 4770 4775 4780 Val Thr Ile Ala Ala Ser Asp Val Ala Asp Arg Asn Ala Leu Glu Ala 4785 4790 4795 4800 Val Leu Gln Ala Ile Pro Ala Glu Trp Pro Leu Gln Gly Val Ile His 4805 4810 4815 Ala Ala Gly Ala Leu Asp Asp Gly Val Leu Asp Glu Gln Thr Thr Asp 4820 4825 4830 Arg Phe Ser Arg Val Leu Ala Pro Lys Val Thr Gly Ala Trp Asn Leu 4835 4840 4845 His Glu Leu Thr Ala Gly Asn Asp Leu Ala Phe Phe Val Leu Phe Ser 4850 4855 4860 Ser Met Ser Gly Leu Leu Gly Ser Ala Gly Gln Ser Asn Tyr Ala Ala 4865 4870 4875 4880 Ala Asn Thr Phe Leu Asp Ala Leu Ala Ala His Arg Arg Ala Glu Gly 4885 4890 4895 Leu Ala Ala Gln Ser Leu Ala Trp Gly Pro Trp Ser Asp Gly Gly Met 4900 4905 4910 Ala Ala Gly Leu Ser Ala Ala Leu Gln Ala Arg Leu Ala Arg His Gly 4915 4920 4925 Met Gly Ala Leu Ser Pro Ala Gln Gly Thr Ala Leu Leu Gly Gln Ala 4930 4935 4940 Leu Ala Arg Pro Glu Thr Gln Leu Gly Ala Met Ser Leu Asp Val Arg 4945 4950 4955 4960 Ala Ala Ser Gln Ala Ser Gly Ala Ala Val Pro Pro Val Trp Arg Ala 4965 4970 4975 Leu Val Arg Ala Glu Ala Arg His Thr Ala Ala Gly Ala Gln Gly Ala 4980 4985 4990 Leu Ala Ala Arg Leu Gly Ala Leu Pro Glu Ala Arg Arg Ala Asp Glu 4995 5000 5005 Val Arg Lys Val Val Gln Ala Glu Ile Ala Arg Val Leu Ser Trp Ser 5010 5015 5020 Ala Ala Ser Ala Val Pro Val Asp Arg Pro Leu Ser Asp Leu
Gly Leu 5025 5030 5035 5040 Asp Ser Leu Thr Ala Val Glu Leu Arg Asn Val Leu Gly Gln Arg Val 5045 5050 5055 Gly Ala Thr Leu Pro Ala Thr Leu Ala Phe Asp His Pro Thr Val Asp 5060 5065 5070 Ala Leu Thr Arg Trp Leu Leu Asp Lys Val Leu Ala Val Ala Glu Pro 5075 5080 5085 Ser Val Ser Ser Ala Lys Ser Ser Pro Gln Val Ala Leu Asp Glu Pro 5090 5095 5100 Ile Ala Ile Ile Gly Ile Gly Cys Arg Phe Pro Gly Gly Val Ala Asp 5105 5110 5115 5120 Pro Glu Ser Phe Trp Arg Leu Leu Glu Glu Gly Ser Asp Ala Val Val 5125 5130 5135 Glu Val Pro His Glu Arg Trp Asp Ile Asp Ala Phe Tyr Asp Pro Asp 5140 5145 5150 Pro Asp Val Arg Gly Lys Met Thr Thr Arg Phe Gly Gly Phe Leu Ser 5155 5160 5165 Asp Ile Asp Arg Phe Asp Pro Ala Phe Phe Gly Ile Ser Pro Arg Glu 5170 5175 5180 Ala Thr Thr Met Asp Pro Gln Gln Arg Leu Leu Leu Glu Thr Ser Trp 5185 5190 5195 5200 Glu Ala Phe Glu Arg Ala Gly Ile Leu Pro Glu Arg Leu Met Gly Ser 5205 5210 5215 Asp Thr Gly Val Phe Val Gly Leu Phe Tyr Gln Glu Tyr Ala Ala Leu 5220 5225 5230 Ala Gly Gly Ile Glu Ala Phe Asp Gly Tyr Leu Gly Thr Gly Thr Thr 5235 5240 5245 Ala Ser Val Ala Ser Gly Arg Ile Ser Tyr Val Leu Gly Leu Lys Gly 5250 5255 5260 Pro Ser Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Val 5265 5270 5275 5280 His Leu Ala Cys Gln Ala Leu Arg Arg Gly Glu Cys Ser Val Ala Leu 5285 5290 5295 Ala Gly Gly Val Ala Leu Met Leu Thr Pro Ala Thr Phe Val Glu Phe 5300 5305 5310 Ser Arg Leu Arg Gly Leu Ala Pro Asp Gly Arg Cys Lys Ser Phe Ser 5315 5320 5325 Ala Ala Ala Asp Gly Val Gly Trp Ser Glu Gly Cys Ala Met Leu Leu 5330 5335 5340 Leu Lys Pro Leu Arg Asp Ala Gln Arg Asp Gly Asp Pro Ile Leu Ala 5345 5350 5355 5360 Val Ile Arg Gly Thr Ala Val Asn Gln Asp Gly Arg Ser Asn Gly Leu 5365 5370 5375 Thr Ala Pro Asn Gly Ser Ser Gln Gln Glu Val Ile Arg Arg Ala Leu 5380 5385 5390 Glu Gln Ala Gly Leu Ala Pro Ala Asp Val Ser Tyr Val Glu Cys His 5395 5400 5405 Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Val Gln Ala Leu Gly 5410 5415 5420 Ala Val Leu Ala Gln Gly
Arg Pro Ser Asp Arg Pro Leu Val Ile Gly 5425 5430 5435 5440 Ser Val Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala 5445 5450 5455 Gly Val Ile Lys Val Ala Leu Ala Leu Glu Arg Gly Leu Ile Pro Arg 5460 5465 5470 Ser Leu His Phe Asp Ala Pro Asn Pro His Ile Pro Trp Ser Glu Leu 5475 5480 5485 Ala Val Gln Val Ala Ala Lys Pro Val Glu Trp Thr Arg Asn Gly Val 5490 5495 5500 Pro Arg Arg Ala Gly Val Ser Ser Phe Gly Val Ser Gly Thr Asn Ala 5505 5510 5515 5520 His Val Val Leu Glu Glu Ala Pro Ala Ala Ala Phe Ala Pro Ala Ala 5525 5530 5535 Ala Arg Ser Ala 5540 Leu Asp Ala Gln 5555 Glu Leu Gly Leu 5570 Glu Ala Gly Phe Val Leu 5545 Arg Leu Ser 5560 Leu Ala Phe 5575 Ala Lys Ser Ala 5550 His Val Val Ala 5565 Leu Ala Thr Thr 5580 Ala His Arg Pro Met Thr Tyr Arg Leu Ala Val Ala Ala Thr Ser Arg Glu Ala Leu 5585 5590 5595 5600 Ser Ala Ala Leu Asp Thr Ala Ala Gln Gly Gln Ala Pro Pro Ala Ala 5605 5610 5615 Ala Arg Gly His Ala Ser Thr Gly Ser Ala Pro Lys Val Val Phe Val 5620 5625 5630 Phe Pro Gly Gln Gly Ser Gln Trp Leu Gly Met Gly Gln Lys Leu Leu 5635 5640 5645 Ser Glu Glu Pro Val Phe Arg Asp Ala Leu Ser Ala Cys Asp Arg Ala 5650 5655 5660 Ile Gln Ala Glu Ala Gly Trp Ser Leu Leu Ala Glu Leu Ala Ala Asp 5665 5670 5675 5680 Glu Thr Thr Ser Gln Leu Gly Arg Ile Asp Val Val Gln Pro Ala Leu 5685 5690 5695 Phe Ala Ile Glu Val Ala Leu Ser Ala Leu Trp Arg Ser Trp Gly Val 5700 5705 5710 Glu Pro Asp Ala Val Val Gly His Ser Met Gly Glu Val Ala Ala Ala 5715 5720 5725 His Val Ala Gly Ala Leu Ser Leu Glu Asp Ala Val Ala Ile Ile Cys 5730 5735 5740 Arg Arg Ser Leu Leu Leu Arg Arg Ile Ser Gly Gln Gly Glu Met Ala 5745 5750 5755 5760 Val Val Glu Leu Ser Leu Ala Glu Ala Glu Ala Ala Leu Leu Gly Tyr 5765 5770 5775 Glu Asp Arg Leu Ser Val Ala Val Ser Asn Ser Pro Arg Ser Thr Val 5780 5785 5790 Leu Ala Gly Glu Pro Ala Ala Leu Ala Glu Val Leu Ala Ile Leu Ala 5795 5800 5805 Ala Lys Gly Val Phe Cys Arg Arg Val Lys Val Asp Val Ala Ser His 5810 5815 5820 Ser Pro Gln Ile Asp Pro Leu
Arg Asp Glu Leu Leu Ala Ala Leu Gly 5825 5830 5835 5840 Glu Leu Glu Pro Arg Gln Ala Thr Val Ser Met Arg Ser Thr Val Thr 5845 5850 5855 Ser Thr Ile Met Ala Gly Pro Glu Leu Val Ala Ser Tyr Trp Ala Asp 5860 5865 5870 Asn Val Arg Gln Pro Val Arg Phe Ala Glu Ala Val Gln Ser Leu Met 5875 5880 5885 Glu Asp Gly His Gly Leu Phe Val Glu Met Ser Pro His Pro Ile Leu 5890 5895 5900 Thr Thr Ser Val Glu Glu Ile Arg Arg Ala Thr Lys Arg Glu Gly Val 5905 5910 5915 5920 Ala Val Gly Ser Leu Arg Arg Gly Gln Asp Glu Arg Leu Ser Met Leu 5925 5930 5935 Glu Ala Leu Gly Ala Leu Trp Val His Gly Gln Ala Val Gly Trp Glu 5940 5945 5950 Arg Leu Phe Ser Ala Gly Gly Ala Gly Leu Arg Arg Val Pro Leu Pro 5955 5960 5965 Thr Tyr Pro Trp Gln Arg Glu Arg Tyr Trp Val Asp Ala Pro Thr Gly 5970 5975 5980 Gly Ala Ala Gly Gly Ser Arg Phe Ala His Ala Gly Ser His Pro Leu 5985 5990 5995 6000 Leu Gly Glu Met Gln Thr Leu Ser Thr Gln Arg Ser Thr Arg Val Trp 6005 6010 6015 Glu Thr Thr Leu Asp Leu Lys Arg Leu Pro Trp Leu Gly Asp His Arg 6020 6025 6030 Val Gln Gly Ala Val Val Phe Pro Gly Ala Ala Tyr Leu Glu Met Ala 6035 6040 6045 Leu Ser Ser Gly Ala Glu Ala Leu Gly Asp Gly Pro Leu Gln Val Ser 6050 6055 6060 Asp Val Val Leu Ala Glu Ala Leu Ala Phe Ala Asp Asp Thr Pro Ala 6065 6070 6075 6080 Ala Val Gln Val Met Ala Thr Glu Glu Arg Pro Gly Arg Leu Gln Phe 6085 6090 6095 His Val Ala Ser Arg Val Pro Gly His Gly Gly Ala Ala Phe Arg Ser 6100 6105 6110 His Ala Arg Gly Val Leu Arg Gln Ile Glu Arg Ala Glu Val Pro Ala 6115 6120 6125 Arg Leu Asp Leu Ala Ala Leu Arg Ala Arg Leu Gln Ala Ser Ala Pro 6130 6135 6140 Ala Ala Ala Thr Tyr Ala Ala Leu Ala Glu Met Gly Leu Glu Tyr Gly 6145 6150 6155 6160 Pro Ala Phe Gln Gly Leu Val Glu Leu Trp Arg Gly Glu Gly Glu Ala 6165 6170 6175 Leu Gly Arg Val Arg Leu Pro Glu Ala Ala Gly Ser Pro Ala Ala Cys 6180 6185 6190 Arg Leu His Pro Ala Leu Leu Asp Ala Cys Phe His Val Ser Ser Ala 6195 6200 6205 Phe Ala Asp Arg Gly Glu Ala Thr Pro Trp Val Pro Val Glu Ile Gly 6210 6215 6220
Ser Leu Arg Trp Phe Gln Arg Pro Ser Gly Glu Leu Trp Cys His Ala 6225 6230 6235 6240 Arg Ser Val Ser His Gly Lys Pro Thr Pro Asp Arg Arg Ser Thr Asp 6245 6250 6255 Phe Trp Val Val Asp Ser Thr Gly Ala Ile Val Ala Glu Ile Ser Gly 6260 6265 6270 Leu Val Ala Gln Arg Leu Ala Gly Gly Val Arg Arg Arg Glu Glu Asp 6275 6280 6285 Asp Trp Phe Met Glu Pro Ala Trp Glu Pro Thr Ala Val Pro Gly Ser 6290 6295 6300 Glu Val Met Ala Gly Arg Trp Leu Leu Ile Gly Ser Gly Gly Gly Leu 6305 6310 6315 6320 Gly Ala Ala Leu His Ser Ala Leu Thr Glu Ala Gly His Ser Val Val 6325 6330 6335 His Ala Thr Gly Arg Gly Thr Ser Ala Ala Gly Leu Gln Ala Leu Leu 6340 6345 6350 Thr Ala Ser Phe Asp Gly Gln Ala Pro Thr Ser Val Val His Leu Gly 6355 6360 6365 Ser Leu Asp Glu Arg Gly Val Leu Asp Ala Asp Ala Pro Phe Asp Ala 6370 6375 6380 Asp Ala Leu Glu Glu Ser Leu Val Arg Gly Cys Asp Ser Val Leu Trp 6385 6390 6395 6400 Thr Val Gln Ala Val Ala Gly Ala Gly Phe Arg Asp Pro Pro Arg Leu 6405 6410 6415 Trp Leu Val Thr Arg Gly Ala Gln Ala Ile Gly Ala Gly Asp Val Ser 6420 6425 6430 Val Ala Gln Ala Pro Leu Leu Gly Leu Gly Arg Val Ile Ala Leu Glu 6435 6440 6445 His Ala Glu Leu Arg Cys Ala Arg Ile Asp Leu Asp Pro Ala Arg Arg 6450 6455 6460 Asp Gly Glu Val Asp Glu Leu Leu Ala Glu Leu Leu Ala Asp Asp Ala 6465 6470 6475 6480 Glu Glu Glu Val Ala Phe Arg Gly Gly Glu Arg Arg Val Ala Arg Leu 6485 6490 6495 Val Arg Arg Leu Pro Glu Thr Asp Cys Arg Glu Lys Ile Glu Pro Ala 6500 6505 6510 Glu Gly Arg Pro Phe Arg Leu Glu Ile Asp Gly Ser Gly Val Leu Asp 6515 6520 6525 Asp Leu Val Leu Arg Ala Thr Glu Arg Arg Pro Pro Gly Pro Gly Glu 6530 6535 6540 Val Glu Ile Ala Val Glu Ala Ala Gly Leu Asn Phe Leu Asp Val Met 6545 6550 6555 6560 Arg Ala Met Gly Ile Tyr Pro Gly Pro Gly Asp Gly Pro Val Ala Leu 6565 6570 6575 Gly Ala Glu Cys Ser Gly Arg Ile Val Ala Met Gly Glu Gly Val Glu 6580 6585 6590 Ser Leu Arg Ile Gly Gln Asp Val Val Ala Val Ala Pro Phe Ser Phe 6595 6600 6605 Gly Thr His Val Thr Ile Asp Ala Arg Met Leu Ala
Pro Arg Pro Ala 6610 6615 6620 Ala Leu Thr Ala Ala Gln Ala Ala Ala Leu Pro Val Ala Phe Met Thr 6625 6630 6635 6640 Ala Trp Tyr Gly Leu Val His Leu Gly Arg Leu Arg Ala Gly Glu Arg 6645 6650 6655 Val Leu Ile His Ser Ala Thr Gly Gly Thr Gly Leu Ala Ala Val Gln 6660 6665 6670 Ile Ala Arg His Leu Gly Ala Glu Ile Phe Ala Thr Ala Gly Thr Pro 6675 6680 6685 Glu Lys Arg Ala Trp Leu Arg Glu Gln Gly Ile Ala His Val Met Asp 6690 6695 6700 Ser Arg Ser Leu Asp Phe Ala Glu Gln Val Leu Ala Ala Thr Lys Gly 6705 6710 6715 6720 Glu Gly Val Asp Val Val Leu Asn Ser Leu Ser Gly Ala Ala Ile Asp 6725 6730 6735 Ala Ser Leu Ser Thr Leu Val Pro Asp Gly Arg Phe Ile Glu Leu Gly 6740 6745 6750 Lys Thr Asp Ile Tyr Ala Asp Arg Ser Leu Gly Leu Ala His Phe Arg 6755 6760 6765 Lys Ser Leu Ser Tyr Ser Ala Val Asp Leu Ala Gly Leu Ala Val Arg 6770 6775 6780 Arg Pro Glu Arg Val Ala Ala Leu Leu Ala Glu Val Val Asp Leu Leu 6785 6790 6795 6800 Ala Arg Gly Ala Leu Gln Pro Leu Pro Val Glu Ile Phe Pro Leu Ser 6805 6810 6815 Arg Ala Ala Asp Ala Phe Arg Lys Met Ala Gln Ala Gln His Leu Gly 6820 6825 6830 Lys Leu Val Leu Ala Leu Glu Asp Pro Asp Val Arg Ile Arg Val Pro 6835 6840 6845 Gly Glu Ser Gly Val Ala Ile Arg Ala Asp Gly Ala Tyr Leu Val Thr 6850 6855 6860 Gly Gly Leu Gly Gly Leu Gly Leu Ser Val Ala Gly Trp Leu Ala Glu 6865 6870 6875 6880 Gln Gly Ala Gly His Leu Val Leu Val Gly Arg Ser Gly Ala Val Ser 6885 6890 6895 Ala Glu Gln Gln Thr Ala Val Ala Ala Leu Glu Ala His Gly Ala Arg 6900 6905 6910 Val Thr Val Ala Arg Ala Asp Val Ala Asp Arg Ala Gln Met Glu Arg 6915 6920 6925 Ile Leu Arg Glu Val Thr Ala Ser Gly Met Pro Leu Arg Gly Val Val 6930 6935 6940 His Ala Ala Gly Ile Leu Asp Asp Gly Leu Leu Met Gln Gln Thr Pro 6945 6950 6955 6960 Ala Arg Phe Arg Ala Val Met Ala Pro Lys Val Arg Gly Ala Leu His 6965 6970 6975 Leu His Ala Leu Thr Arg Glu Ala Pro Leu Ser Phe Phe Val Leu Tyr 6980 6985 6990 Ala Ser Gly Ala Gly Leu Leu Gly Ser Pro Gly Gln Gly Asn Tyr Ala
6995 7000 7005 Ala Ala' Asn Thr Phe Leu Asp Ala Leu Ala His His Arg Arg Ala Gin 7010 7015 7020 Gly Leu Pro Ala Leu Ser Ile Asp Trp Gly Leu Phe Ala Asp Val Gly 7025 7030 7035 7040 Leu Ala Ala Gly Gln Gln Asn Arg Gly Ala Arg Leu Val Thr Arg Gly 7045 7050 7055 Thr Arg Ser Leu Thr Pro Asp Glu Gly Leu Trp Ala Leu Glu Arg Leu 7060 7065 7070 Leu Asp Gly Asp Arg Thr Gin Ala Gly Val Met Pro Phe Asp Val Arg 7075 7080 7085 Gin Trp Val Glu Phe Tyr Pro Ala Ala Ala Ser Ser Arg Arg Leu Ser 7090 7095 7100 Arg Leu Met Thr Ala Arg Arg Val Ala Ser Gly Arg Leu Ala Gly Asp 7105 7110 7115 7120 Arg Asp Leu Leu Glu Arg Leu Ala Thr Ala Glu Ala Gly Ala Arg Ala 7125 7130 7135 Gly Met Leu Gln Glu Val Val Arg Ala Gln Val Ser Gln Val Leu Arg 7140 7145 7150 Leu Ser Glu Gly Lys Leu Asp Val Asp Ala Pro Leu Thr Ser Leu Gly 7155 7160 7165 Met Asp Ser Leu Met Gly Leu Glu Leu Arg Asn Arg Ile Glu Ala Val 7170 7175 7180 Leu Gly Ile Thr Met Pro Ala Thr Leu Leu Trp Thr Tyr Pro Thr Val 7185 7190 7195 7200 Ala Ala Leu Ser Ala His Leu Ala Ser His Val Val Ser Thr Gly Asp 7205 7210 7215 Gly Glu Ser Ala Arg Pro Pro Asp Thr Gly Ser Val Ala Pro Thr Thr 7220 7225 7230 His Glu Val Ala Ser Leu Asp Glu Asp Gly Leu Phe Ala Leu Ile Asp 7235 7240 7245 Glu Ser Leu Ala Arg Ala Gly Lys Arg 7250 7255 <210> 6 <211> 3798 <212> PRT <213> Sorangium cellulosum <400> 6 Val Thr Asp Arg Glu Gly Gln Leu Leu Glu Arg Leu Arg Glu Val Thr 1 5 10 WO 99/66028 PCT/EP99/04171 -54- Leu Ala Leu Lys Thr Leu Asn Glu Arg Asp Thr Leu Glu Leu Glu Lys Gly Asp Pro Asp Ser Phe Gly 145 Ala Met Gly Ile Leu 225 Leu Asp Val Al Thr Ala Ala Gly Gly Leu Glu 130 Val His Leu Pro His 210 Ala Ala Ala Leu a Leu 290 lu Gy Ile Asp Phe Asp 115 Asp Phe Gln Ser Cys 195 Leu Gly Arg Ser Lys 275 Ile Pro Thr Arg Asp Asp 100 Pro Ala Val Pro Ile 180 Leu Ala Gly Thr Ala 260 Arg Arg Ile Pro Pro Val Ala Gin Gly Gly Arg 165 Ala Thr Cys Val Gln 245 Asn Leu Gly Ala Ile N Glu Ala 55.
Leu Glu 70 Pro Arg Ala Phe His Arg Ile Pro 135 Val Cys 150 Glu Glu Ala Gly Val Asp Arg Ser 215 Asn Met 230 Ala Leu Gly Phe Ser Asp Ser Ala 295 al 40 ?he Glu Trp Phe Leu 120 Pro Ala Arg Arg Thr 200 Leu Leu Ser Val Ala 280 Gly Trp Arg Ala Gly 105 Leu Arg Thr Asp Leu 185 Ala Arg Leu Pro Arg 265 Arg Ile Gly Cys A Glu Leu Leu Trp Ala Leu 75 Gly Leu Leu 9 90 Ile Ala Pro Leu Glu Val Ser Leu Val 140 Glu Tyr Leu 155 Ala Tyr Ser 170 Ser Tyr Thr Cys Ser Ser Ala Arg Glu 220 Ser Pro Asp 235 Asn Gly Arg 250 Gly Glu Gly Arg Asp Gly Gin Asp Gly 300 Gly Ala Leu 315 i Ala Ile Gly 330 Lrg sp Val Thr Arg %la 125 Gly His Thr Leu Ser 205 Ser Thr Cys Cys Asp 285 Arg Let Ty Phe Asp Gly Glu Glu 110 Trp, Ser Ala Thr Gly 190 Leu Asp Met Gin Gly 270 Arg Ser Arg Ile Pro 3Gy Val Ala Ala Glu Arg Ala Gly 175 Leu Val Leu Arg Thr 255 Leu Ile Thr Glu Glu 335 Gly Arg Asp Ile Arg Gly Thr Val 160 Asn Gin Ala Ala Ala 240 Phe Ile Trp Gly Ala 320 Thr Ile Asn Leu Thr Ala Pro Asn Val Leu Ala Gir 310 Val Glu Ala Gl Leu Arg Asn Ala Gly 325 His Gly Ala Ala Thr Ser Leu Gly 340 Pro Ile Glu Ile Glu Ala Leu 350 Arg Ala Val Val Gly Pro Ala Arg Ala Asp Gly Ala Arg Cys Val Leu WO 99/66028 PCTIEP99/04171 355 Gly Ala Val 370 Ala Gly Leu 385 Arg" Asn Leu Thr Ala Leu Arg Thr Arg 435 Ala His Val 450 Ala Pro Glu 465 Ala Ala Leu His Val Glu Arg Ser Ala 515 Ala Leu Arg 530 Gly Ala Val 545 Phe Val Phe Leu Met Ala Arg Ala Ile 595 Ala Asp Glu 610 Val Leu Phe 625 Gly Val Glu Ala Ala His Lys Ile Asn Ala 420 Phe Val Arg Asp Leu 500 Met Gly Arg Pro Glu 580 Glu Ala Ala Pro Val 660 Thr Asn Leu Gly His 375 Lys Ala Thr Leu Ser 390 Phe Arg Thr Leu Asn 405 Leu Ala Thr Glu Pro 425 Ala Gly Val Ser Ser 440 Leu Glu Glu Ala Pro 455 Ala Ala Glu Leu Phe 470 Ala Gln Ala Ala Arg 485 Gly Leu Gly Asp Val 505 Glu His Arg Leu Ala 520 Ala Leu Ser Ala Ala 535 Gly Arg Ala Ser Gly 550 Gly Gln Gly Ser Gin 565 Glu Pro Val Phe Arg 585 Ala Glu Ala Gly Trp 600 Ala Ser Gln Leu Gly 615 i Met Glu Val Ala Leu 630 Glu Ala Val Val Gly 645 Ala Gly Ala Leu Ser 665 
Leu Glu Gly 380 Ala Gly Val Leu His His C 395 Pro Arg Ile 410 Val Pro Trp Phe Gly Met Ala Val Glu 460 Val Leu Ser 475 Leu Arg Asp 490 Ala Phe Ser Val Ala Ala Ala Gln Gly 540 Gly Ser Ala 555 Trp Val Gly 570 Ala Ala Leu Ser Leu Leu Arg Ile Asp 620 Ser Ala Leu 635 His Ser Met 650 Leu Glu Asp Arg Ile Ser 3lu Arg Arg Ile Pro Arg 430 Ser Gly 445 Pro Glu Ala Lys His Leu Leu Ala 510 Ser Ser 525 His Thr Pro Lys Met Gly Glu Gly 590 Gly Glu 605 Val Val Trp Arg Gly Glu Ala Val 670 Gly Gln 685 Lle 3lu 415 Thr Thr Ala Ser Glu 495 Thr Arg Pro Val Arg 575 Cys Leu Gln Ser Val 655 Ala Gly Pro 400 Gly Gly Asn Ala Ala 480 Lys Thr Glu Pro Val 560 Lys Asp Ser Pro Trp 640 Ala lie Glu Ile Cys Arg Arg 675 Ser Arg Leu Leu Arg 680 Met Ala 690 Leu Val Glu Leu Ser 695 Leu Glu Glu Ala Ala Ala Leu Arg WO 99/66028 PCT/EP99/04171 -56 Gly His 705 Thr Val Leu Thr Ser His Leu Gly 770 Val Thr 785 Ala Asp Leu Leu Ile Leu Gly Ala 850 Leu Leu 865 Trp Ala Tyr Pro Arg Arg Asp Trp 930 Gly Ser 945 Ala Ala Ser Ala Arg Arg Val Val 1010 Arg Ala 1025 3lu Leu Ala Ser 755 Ala Gly Asn Glu Val 835 Ala Glu Arg Tr-p Leu 915 Pro Trp Ala Asp Asn 995 Asp Thr Gly Arg Leu Ser 710 Ala Gly Glu Pro 725 Lys Gly Val Phe 740 Pro Gln Val Asp Ile Arg Pro Arg 775 Gly Val Ile Ala 790 Leu Arg Gln Pro 805 Gly Gly Pro Ala 820 Pro Pro Leu Asp Val Gly Ser Leu 855 Ala Leu Gly Thr 870 Leu Phe Pro Ala 885 Gln His Glu Arg 900 Ala Ala Ala Asp Glu Val Pro Arg 935 Leu Leu Leu Ala 950 Leu Ser Thr Arg 965 Ala Ser Thr Val 980 Asp Trp Gln Gly Ala Gly Ala Ser 1015 Ala Pro Val Leu Val Ala Trp Pro 760 Ala Gly Val Leu Glu 840 Arg Leu Gly Cys Pro 920 Ala Asp Gly Ala Val 1000 Ala Ala Arg 745 Leu Ala Pro Arg Phe 825 Ile Arg Trp Gly Trp 905 Thr Ala Arg Leu Glu 985 Leu Val Leu 730 Gin Arg Ala Glu Phe 810 Ile Gin Gly Ala Arg 890 Ile Lys Pro Gly Ser 970 Gln Tyr Ser 715 Ser Val Glu Val Leu 795 Ala Glu Thr Gin Ser 875 Arg Glu Asp Lys Gly 955 Cys Val Leu Asn S Glu Lys Glu I Pro 780 Gly Ala Met Ala Asp 860 Gly Val Val Trp Ser 940 Val Thr Ser Trp 1 Ser 1020 Ser 
Val Val Leu 765 Met Ala Ala Ser Ala 845 Glu Tyr Pro Glu Phe 925 Glu Gly Val Glu Gly .005 Pro Leu Asp 750 Ile Arg Ser Ala Pro 830 Glu Arg Pro Leu Pro 910 Tyr Thr Glu Leu Ala 990 Leu Arg Ala 735 Val Ala Ser Tyr Gin 815 His Gin Ala Val Pro 895 Asp Arg Ala Ala His 975 Ala Asp Ser 720 Ala Ala Ala Thr Trp 800 Ala Pro Gly Thr Ser 880 Thr Ala Thr His Val 960 Ala Ser Ala Ala Asp Glu Val Glu Ala Thr Arg Gly Leu Val Arg Phe Leu Ser Ala Ala 1035 1040 1030 Pro His Pro Pro Arg Phe Trp Val Val Thr Arg Gly Ala Cys Thr Val 1045 1050 1055 WO 99/66028 PCT/EP99/04171 -57- Gly Gly Glu Pro Glu Ala Ser Leu Cys Gln Ala Ala Leu Trp Gly Leu 1060 1065 1070 Ala Arg Val Ala Ala Leu Glu His Pro Ala Ala Trp Gly Gly Leu Val 1075 1080 1085 Asp Leu Asp Pro Gln Lys Ser Pro Thr Glu Ile Glu Pro Leu Val Ala 1090 1095 1100 Glu Leu Leu Ser Pro Asp Ala Glu Asp Gln Leu Ala Phe Arg Ser Gly 1105 1110 1115 1120 Arg Arg His Ala Ala Arg Leu Val Ala Ala Pro Pro Glu Gly Asp Val 1125 1130 1135 Ala Pro Ile Ser Leu Ser Ala Glu Gly Ser Tyr Leu Val Thr Gly Gly 1140 1145 1150 Leu Gly Gly Leu Gly Leu Leu Val Ala Arg Trp Leu Val Glu Arg Gly 1155 1160 1165 Ala Arg His Leu Val Leu Thr Ser Arg His Gly Leu Pro Glu Arg Gin 1170 1175 1180 Ala Ser Gly Gly Glu Gln Pro Pro Glu Ala Arg Ala Arg Ile Ala Ala 1185 1190 1195 1200 Val Glu Gly Leu Glu Ala Gln Gly Ala Arg Val Thr Val Ala Ala Val 1205 1210 1215 Asp Val Ala Glu Ala Asp Pro Met Thr Ala Leu Leu Ala Ala Ile Glu 1220 1225 1230 Pro Pro Leu Arg Gly Val Val His Ala Ala Gly Val Phe Pro Val Arg 1235 1240 1245 His Leu Ala Glu Thr Asp Glu Ala Leu Leu Glu Ser Val Leu Arg Pro 1250 1255 1260 Lys Val Ala Gly Ser Trp Leu Leu His Arg Leu Leu Arg Asp Arg Pro 1265 1270 1275 1280 Leu Asp Leu Phe Val Leu Phe Ser Ser Gly Ala Ala Val Trp Gly Gly 1285 1290 1295 Lys Gly Gln Gly Ala Tyr Ala Ala Ala Asn Ala Phe Leu Asp Gly Leu 1300 1305 1310 Ala His His Arg Arg Ala His Ser Leu Pro Ala Leu Ser Leu Ala Trp 1315 1320 1325 Gly Leu Trp Ala Glu Gly Gly Met Val Asp Ala Lys Ala His Ala Arg 1330 1335 1340 Leu 
Ser Asp Ile Gly Val Leu Pro Met Ala Thr Gly Pro Ala Leu Ser 1345 1350 1355 1360 Ala Leu Glu Arg Leu Val Asn Thr Ser Ala Val Gln Arg Ser Val Thr 1365 1370 1375 Arg Met Asp Trp Ala Arg Phe Ala Pro Val Tyr Ala Ala Arg Gly Arg 1380 1385 1390 Arg Asn Leu Leu Ser Ala Leu Val Ala Glu Asp Glu Arg Ala Ala Ser 1395 1400 1405 Pro Pro Val Pro Thr Ala Asn Arg Ile Trp Arg Gly Leu Ser Val Ala 1410 1415 1420 Glu Ser Arg Ser Ala Leu Tyr Glu Leu Val Arg Gly Ile Val Ala Arg 1425 1430 1435 1440 Val Leu Gly Phe Ser Asp Pro Gly Ala Leu Asp Val Gly Arg Gly Phe 1445 1450 1455 Ala Glu Gln Gly Leu Asp Ser Leu Met Ala Leu Glu Ile Arg Asn Arg 1460 1465 1470 Leu Gln Arg Glu Leu Gly Glu Arg Leu Ser Ala Thr Leu Ala Phe Asp 1475 1480 1485 His Pro Thr Val Glu Arg Leu Val Ala His Leu Leu Thr Asp Val Leu 1490 1495 1500 Lys Leu Glu Asp Arg Ser Asp Thr Arg His Ile Arg Ser Val Ala Ala 1505 1510 1515 1520 Asp Asp Asp Ile Ala Ile Val Gly Ala Ala Cys Arg Phe Pro Gly Gly 1525 1530 1535 Asp Glu Gly Leu Glu Thr Tyr Trp Arg His Leu Ala Glu Gly Met Val 1540 1545 1550 Val Ser Thr Glu Val Pro Ala Asp Arg Trp Arg Ala Ala Asp Trp Tyr 1555 1560 1565 Asp Pro Asp Pro Glu Val Pro Gly Arg Thr Tyr Val Ala Lys Gly Ala 1570 1575 1580 Phe Leu Arg Asp Val Arg Ser Leu Asp Ala Ala Phe Phe Ala Ile Ser 1585 1590 1595 1600 Pro Arg Glu Ala Met Ser Leu Asp Pro Gln Gln Arg Leu Leu Leu Glu 1605 1610 1615 Val Ser Trp Glu Ala Ile Glu Arg Ala Gly Gln Asp Pro Met Ala Leu 1620 1625 1630 Arg Glu Ser Ala Thr Gly Val Phe Val Gly Met Ile Gly Ser Glu His 1635 1640 1645 Ala Glu Arg Val Gln Gly Leu Asp Asp Asp Ala Ala Leu Leu Tyr Gly 1650 1655 1660 Thr Thr Gly Asn Leu Leu Ser Val Ala Ala Gly Arg Leu Ser Phe Phe 1665 1670 1675 1680 Leu Gly Leu His Gly Pro Thr Met Thr Val Asp Thr Ala Cys Ser Ser 1685 1690 1695 Ser Leu Val Ala Leu His Leu Ala Cys Gln Ser Leu Arg Leu Gly Glu 1700 1705 1710 Cys Asp Gln Ala Leu Ala Gly Gly Ser Ser Val Leu Leu Ser Pro Arg 1715 1720 1725 Ser Phe Val Ala Ala Ser Arg Met Arg Leu Leu Ser Pro 
Asp Gly Arg 1730 1735 1740 Cys Lys Thr Phe Ser Ala Ala Ala Asp Gly Phe Ala Arg Ala Glu Gly 1745 1750 1755 1760 Cys Ala Val Val Val Leu Lys Arg Leu Arg Asp Ala Gln Arg Asp Arg 1765 1770 1775 Asp Pro Ile Leu Ala Val Val Arg Ser Thr Ala Ile Asn His Asp Gly 1780 1785 1790 Pro Ser Ser Gly Leu Thr Val Pro Ser Gly Pro Ala Gln Gln Ala Leu 1795 1800 1805 Leu Arg Gln Ala Leu Ala Gln Ala Gly Val Ala Pro Ala Glu Val Asp 1810 1815 1820 Phe Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly Asp Pro Ile Glu 1825 1830 1835 1840 Val Gln Ala Leu Gly Ala Val Tyr Gly Arg Gly Arg Pro Ala Glu Arg 1845 1850 1855 Pro Leu Trp Leu Gly Ala Val Lys Ala Asn Leu Gly His Leu Glu Ala 1860 1865 1870 Ala Ala Gly Leu Ala Gly Val Leu Lys Val Leu Leu Ala Leu Glu His 1875 1880 1885 Glu Gln Ile Pro Ala Gln Pro Glu Leu Asp Glu Leu Asn Pro His Ile 1890 1895 1900 Pro Trp Ala Glu Leu Pro Val Ala Val Val Arg Arg Ala Val Pro Trp 1905 1910 1915 1920 Pro Arg Gly Ala Arg Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Leu 1925 1930 1935 Ser Gly Thr Asn Ala His Val Val Leu Glu Glu Ala Pro Ala Val Glu 1940 1945 1950 Pro Val Ala Ala Ala Pro Glu Arg Ala Ala Glu Leu Phe Val Leu Ser 1955 1960 1965 Ala Lys Ser Ala Ala Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Asp 1970 1975 1980 His Leu Glu Lys His Val Glu Leu Gly Leu Gly Asp Val Ala Phe Ser 1985 1990 1995 2000 Leu Ala Thr Thr Arg Ser Ala Met Glu His Arg Leu Ala Val Ala Ala 2005 2010 2015 Ser Ser Arg Glu Ala Leu Arg Gly Ala Leu Ser Ala Ala Ala Gln Gly 2020 2025 2030 His Thr Pro Pro Gly Ala Val Arg Gly Arg Ala Ser Gly Gly Ser Ala 2035 2040 2045 Pro Lys Val Val Phe Val Phe Pro Gly Gln Gly Ser Gln Trp Val Gly 2050 2055 2060 Met Gly Arg Lys Leu Met Ala Glu Glu Pro Val Phe Arg Ala Ala Leu 2065 2070 2075 2080 Glu Gly Cys Asp Arg Ala Ile Glu Ala Glu Ala Gly Trp Ser Leu Leu 2085 2090 2095 Gly Glu Leu Ser Ala Asp Glu Ala Ala Ser Gln Leu Gly Arg Ile Asp 2100 2105 2110 Val Val Gln Pro Val Leu Phe Ala Met Glu Val Ala Leu Ser Ala Leu 2115 
2120 2125 Trp Arg Ser Trp Gly Val Glu Pro Glu Ala Val Val Gly His Ser Met 2130 2135 2140 Gly Glu Val Ala Ala Ala His Val Ala Gly Ala Leu Ser Leu Glu Asp 2145 2150 2155 2160 Ala Val Ala Ile Ile Cys Arg Arg Ser Arg Leu Leu Arg Arg Ile Ser 2165 2170 2175 Gly Gln Gly Glu Met Ala Leu Val Glu Leu Ser Leu Glu Glu Ala Glu 2180 2185 2190 Ala Ala Leu Arg Gly His Glu Gly Arg Leu Ser Val Ala Val Ser Asn 2195 2200 2205 Ser Pro Arg Ser Thr Val Leu Ala Gly Glu Pro Ala Ala Leu Ser Glu 2210 2215 2220 Val Leu Ala Ala Leu Thr Ala Lys Gly Val Phe Trp Arg Gln Val Lys 2225 2230 2235 2240 Val Asp Val Ala Ser His Ser Pro Gln Val Asp Pro Leu Arg Glu Glu 2245 2250 2255 Leu Ile Ala Ala Leu Gly Ala Ile Arg Pro Arg Ala Ala Ala Val Pro 2260 2265 2270 Met Arg Ser Thr Val Thr Gly Gly Val Ile Ala Gly Pro Glu Leu Gly 2275 2280 2285 Ala Ser Tyr Trp Ala Asp Asn Leu Arg Gln Pro Val Arg Phe Ala Ala 2290 2295 2300 Ala Ala Gln Ala Leu Leu Glu Gly Gly Pro Ala Leu Phe Ile Glu Met 2305 2310 2315 2320 Ser Pro His Pro Ile Leu Val Pro Pro Leu Asp Glu Ile Gln Thr Ala 2325 2330 2335 Ala Glu Gln Gly Gly Ala Ala Val Gly Ser Leu Arg Arg Gly Gln Asp 2340 2345 2350 Glu Arg Ala Thr Leu Leu Glu Ala Leu Gly Thr Leu Trp Ala Ser Gly 2355 2360 2365 Tyr Pro Val Ser Trp Ala Arg Leu Phe Pro Ala Gly Gly Arg Arg Val 2370 2375 2380 Pro Leu Pro Thr Tyr Pro Trp Gln His Glu Arg Tyr Trp Ile Glu Asp 2385 2390 2395 2400 Ser Val His Gly Ser Lys Pro Ser Leu Arg Leu Arg Gln Leu Arg Asn 2405 2410 2415 Gly Ala Thr Asp His Pro Leu Leu Gly Ala Pro Leu Leu Val Ser Ala 2420 2425 2430 Arg Pro Gly Ala His Leu Trp Glu Gln Ala Leu Ser Asp Glu Arg Leu WO 99/66028 PCT/EP99/04171 -61 2435 Ser Tyr Leu Ser Glu His Arg 2450 2455 Ala Ala Tyr Val Glu Met Ala 2465 2470 Thr Ala Thr Leu Val Leu Glu 2485 Val Pro Ser Glu Gly Gly Arg 2500 2440 Val Leu Gin Ile 2445 His Gly Glu Ala Val Leu Pro Ser 2460 Ala Ala Gly Val Asp Leu Tyr Gly 2475 2480 Leu-Ala Leu Glu Arg Ala Leu Ala 2490 2495 Val Gln Val Ala Leu Ser Glu Glu 2505 2510 Gly Pro Gly Arg Ala Ser Phe Gln Val 
Ser Ser Arg Glu Glu Ala Gly 2515 2520 2525 Arg Ser Trp Val Arg His Ala Thr Gly His Val Cys Ser Gly Gln Ser 2530 2535 2540 Ser Ala Val Gly Ala Leu Lys Glu Ala Pro Trp Glu Ile Gln Arg Arg 2545 2550 2555 2560 Cys Pro Ser Val Leu Ser Ser Glu Ala Leu Tyr Pro Leu Leu Asn Glu 2565 2570 2575 His Ala Leu Asp Tyr Gly Pro Cys Phe Gln Gly Val Glu Gln Val Trp 2580 2585 2590 Leu Gly Thr Gly Glu Val Leu Gly Arg Val Arg Leu Pro Gly Asp Met 2595 2600 2605 Ala Ser Ser Ser Gly Ala Tyr Arg Ile His Pro Ala Leu Leu Asp Ala 2610 2615 2620 Cys Phe Gln Val Leu Thr Ala Leu Leu Thr Thr Pro Glu Ser Ile Glu 2625 2630 2635 2640 Ile Arg Arg Arg Leu Thr Asp Leu His Glu Pro Asp Leu Pro Arg Ser 2645 2650 2655 Arg Ala Pro Val Asn Gln Ala Val Ser Asp Thr Trp Leu Trp Asp Ala 2660 2665 2670 Ala Leu Asp Gly Gly Arg Arg Gln Ser Ala Ser Val Pro Val Asp Leu 2675 2680 2685 Val Leu Gly Ser Phe His Ala Lys Trp Glu Val Met Glu Arg Leu Ala 2690 2695 2700 Gin Ala Tyr Ile Ile Gly Thr Leu Arg Ile Trp Asn Val Phe Cys Ala 2705 2710 2715 2720 Ala Gly Glu Arg His Thr Ile Asp Glu Leu Leu Val Arg Leu Gln Ile 2725 2730 2735 Ser Val Val Tyr Arg Lys 2740 Ala Ile Gly Ile 2755 Pro Leu Pro Glu 2770 Val Asp Val Ile Lys Arg 2745 Gly Asp Gly Glu 2760 Leu Ala Ala Val 2775 s Phe Val Ser 2765 i Glu Glu Ala 2780 Trp Met Glu His Leu Val 2750 Ser Gin Gly Arg WO 99/66028 PCT/EP99/04171 -62- Val Phe Ala Asp Leu Pro Val Leu Phe Glu Trp Cys Lys Phe Ala Gly 2785 2790 2795 2800 Glu Arg Leu Ala Asp Val Leu Thr Gly Lys Thr Leu Ala Leu Glu Ile 2805 2810 2815 Leu Phe Pro Gly Gly Ser Phe Asp Met Ala Glu Arg Ile Tyr Arg Asp 2820 2825 2830 Ser Pro Ile Ala Arg Tyr Ser Asn Gly Ile Val Arg Gly Val Val Glu 2835 2840 2845 Ser Ala Ala Arg Val Val Ala Pro Ser Gly Met Phe Ser Ile Leu Glu 2850 2855 2860 Ile Gly Ala Gly Thr Gly Ala Thr Thr Ala Ala Val Leu Pro Val Leu 2865 2870 2875 2880 Leu Pro Asp Arg Thr Glu Tyr His Phe Thr Asp Val Ser Pro Leu Phe 2885 2890 2895 Leu Ala Arg Ala Glu Gin Arg Phe Arg Asp Tyr Pro Phe Leu Lys Tyr 2900 2905 2910 Gly Ile Leu Asp 
Val Asp Gln Glu Pro Ala Gly Gln Gly Tyr Ala His 2915 2920 2925 Gln Arg Phe Asp Val Ile Val Ala Ala Asn Val Ile His Ala Thr Arg 2930 2935 2940 Asp Ile Arg Ala Thr Ala Lys Arg Leu Leu Ser Leu Leu Ala Pro Gly 2945 2950 2955 2960 Gly Leu Leu Val Leu Val Glu Gly Thr Gly His Pro Ile Trp Phe Asp 2965 2970 2975 Ile Thr Thr Gly Leu Ile Glu Gly Trp Gln Lys Tyr Glu Asp Asp Leu 2980 2985 2990 Arg Ile Asp His Pro Leu Leu Pro Ala Arg Thr Trp Cys Asp Val Leu 2995 3000 3005 Arg Arg Val Gly Phe Ala Asp Ala Val Ser Leu Pro Gly Asp Gly Ser 3010 3015 3020 Pro Ala Gly Ile Leu Gly Gln His Val Ile Leu Ser Arg Ala Pro Gly 3025 3030 3035 3040 Ile Ala Gly Ala Ala Cys Asp Ser Ser Gly Glu Ser Ala Thr Glu Ser 3045 3050 3055 Pro Ala Ala Arg Ala Val Arg Gln Glu Trp Ala Asp Gly Ser Ala Asp 3060 3065 3070 Val Val His Arg Met Ala Leu Glu Arg Met Tyr Phe His Arg Arg Pro 3075 3080 3085 Gly Arg Gln Val Trp Val His Gly Arg Leu Arg Thr Gly Gly Gly Ala 3090 3095 3100 Phe Thr Lys Ala Leu Ala Gly Asp Leu Leu Leu Phe Glu Asp Thr Gly 3105 3110 3115 3120 Gln Val Val Ala Glu Val Gln Gly Leu Arg Leu Pro Gln Leu Glu Ala 3125 3130 3135 Ser Ala Phe Ala Pro Arg Asp Pro Arg Glu Glu Trp Leu Tyr Ala Leu 3140 3145 3150 Glu Trp Gln Arg Lys Asp Pro Ile Pro Glu Ala Pro Ala Ala Ala Ser 3155 3160 3165 Ser Ser Ser Ala Gly Ala Trp Leu Val Leu Met Asp Gln Gly Gly Thr 3170 3175 3180 Gly Ala Ala Leu Val Ser Leu Leu Glu Gly Arg Gly Glu Ala Cys Val 3185 3190 3195 3200 Arg Val Ile Ala Gly Thr Ala Tyr Ala Cys Leu Ala Pro Gly Leu Tyr 3205 3210 3215 Gln Val Asp Pro Ala Gln Pro Asp Gly Phe His Thr Leu Leu Arg Asp 3220 3225 3230 Ala Phe Gly Glu Asp Arg Ile Cys Arg Ala Val Val His Met Trp Ser 3235 3240 3245 Leu Asp Ala Thr Ala Ala Gly Glu Arg Ala Thr Ala Glu Ser Leu Gln 3250 3255 3260 Ala Asp Gln Leu Leu Gly Ser Leu Ser Ala Leu Ser Leu Val Gln Ala 3265 3270 3275 3280 Leu Val Arg Arg Arg Trp Arg Asn Met Pro Arg Leu Trp Leu Leu Thr 3285 3290 3295 Arg Ala Val His Ala Val Gly Ala Glu Asp Ala Ala Ala Ser Val 
Ala 3300 3305 3310 Gln Ala Pro Val Trp Gly Leu Gly Arg Thr Leu Ala Leu Glu His Pro 3315 3320 3325 Glu Leu Arg Cys Thr Leu Val Asp Val Asn Pro Ala Pro Ser Pro Glu 3330 3335 3340 Asp Ala Ala Ala Leu Ala Val Glu Leu Gly Ala Ser Asp Arg Glu Asp 3345 3350 3355 3360 Gln Val Ala Leu Arg Ser Asp Gly Arg Tyr Val Ala Arg Leu Val Arg 3365 3370 3375 Ser Ser Phe Ser Gly Lys Pro Ala Thr Asp Cys Gly Ile Arg Ala Asp 3380 3385 3390 Gly Ser Tyr Val Ile Thr Asp Gly Met Gly Arg Val Gly Leu Ser Val 3395 3400 3405 Ala Gln Trp Met Val Met Gln Gly Ala Arg His Val Val Leu Val Asp 3410 3415 3420 Arg Gly Gly Ala Ser Glu Ala Ser Arg Asp Ala Leu Arg Ser Met Ala 3425 3430 3435 3440 Glu Ala Gly Ala Glu Val Gln Ile Val Glu Ala Asp Val Ala Arg Arg 3445 3450 3455 Asp Asp Val Ala Arg Leu Leu Ser Lys Ile Glu Pro Ser Met Pro Pro 3460 3465 3470 Leu Arg Gly Ile Val Tyr Val Asp Gly Thr Phe Gln Gly Asp Ser Ser 3475 3480 3485 Met Leu Glu Leu Asp Ala Arg Arg Phe Lys Glu Trp Met Tyr Pro Lys 3490 3495 3500 Val Leu Gly Ala Trp Asn Leu His Ala Leu Thr Arg Asp Arg Ser Leu 3505 3510 3515 3520 Asp Phe Phe Val Leu Tyr Ser Ser Gly Thr Ser Leu Leu Gly Leu Pro 3525 3530 3535 Gly Gln Gly Ser Arg Ala Ala Gly Asp Ala Phe Leu Asp Ala Ile Ala 3540 3545 3550 His His Arg Cys Lys Val Gly Leu Thr Ala Met Ser Ile Asn Trp Gly 3555 3560 3565 Leu Leu Ser Glu Ala Ser Ser Pro Ala Thr Pro Asn Asp Gly Gly Ala 3570 3575 3580 Arg Leu Glu Tyr Arg Gly Met Glu Gly Leu Thr Leu Glu Gln Gly Ala 3585 3590 3595 3600 Ala Ala Leu Gly Arg Leu Leu Ala Arg Pro Arg Ala Gln Val Gly Val 3605 3610 3615 Met Arg Leu Asn Leu Arg Gln Trp Leu Glu Phe Tyr Pro Asn Ala Ala 3620 3625 3630 Arg Leu Ala Leu Trp Ala Glu Leu Leu Lys Glu Arg Asp Arg Ala Asp 3635 3640 3645 Arg Gly Ala Ser Asn Ala Ser Asn Leu Arg Glu Ala Leu Gln Ser Ala 3650 3655 3660 Arg Pro Glu Asp Arg Gln Leu Ile Leu Glu Lys His Leu Ser Glu Leu 3665 3670 3675 3680 Leu Gly Arg Gly Leu Arg Leu Pro Pro Glu Arg Ile Glu Arg His Val 3685 3690 3695 Pro Phe Ser Asn Leu Gly Met 
Asp Ser Leu Ile Gly Leu Glu Leu Arg 3700 3705 3710 Asn Arg Ile Glu Ala Ala Leu Gly Ile Thr Val Pro Ala Thr Leu Leu 3715 3720 3725 Trp Thr Tyr Pro Asn Val Ala Ala Leu Ser Gly Ser Leu Leu Asp Ile 3730 3735 3740 Leu Phe Pro Asn Ala Gly Ala Thr His Ala Pro Ala Thr Glu Arg Glu 3745 3750 3755 3760 Lys Ser Phe Glu Asn Asp Ala Ala Asp Leu Glu Ala Leu Arg Gly Met 3765 3770 3775 Thr Asp Glu Gin Lys Asp Ala Leu Leu Ala Glu Lys Leu Ala Gin Leu 3780 3785 3790 Ala Gin Ile Val Gly Glu 3795 <210> 7 <211> 2439 WO 99/66028 PCTIEP99/04171 <212> PRT <213> Sorangium <400> 7 Met Ala Thr Thr 1 Asp Lys Leu Ala Pro Ile Ala Ile Thr Pro Glu Ala Gin Pro Leu Asp Glu Val Pro Arg Asp Ala Ala Phe 100 Pro Gin Gin Arg 115 Ala Gly 1le Ala 130 Leu Gly Ala Cys 145 Arg Glu Glu Gin Ala Ala Gly Arg 180 Thr Val Asp Thr 195 Cys Arg Ser Leu 210 Val Asn Met Leu 225 GIn Ala Leu Ser Asn Gly Phe Val 260 Leu Ser Asp Ala 275 Gly Ser Ala Met 290 Asn Val Leu Ala 305 cellulosum Asn-Ala Gly Lys Leu Glu 5 10 His Ala Leu Leu Leu Met Lys Val Phe Arg Trp Phe Leu Pro Ser Asp 165 Leu Al a Arg Leu Pro 245 Arg Gin Asr Glr Lys Gly Trp Arg 70 Ala Gly Leu Gin Ser 150 Ala Ser Cys Ala Ser 230 Asp Gly Arg Gin Asn Ile Glu 55 Trp Gly Thr Leu Ser 135 Asp Tyr Tyr Ser Arg 215 Ser Gly Glu His Asp 295 Ala Gly 40 Leu Ala Leu Ser Glu 120 Leu Tyr Asp Thr Ser 200 Glu Lys His Gly Gly 280 Gly Ser 25 -ys Leu Leu Leu Pro 105 Val Asp Ser Ile Leu 185 Ser Ser Thr Cys Cys 265 Asp Ar Leu Arg Asp Val Thr 90 Arg Thr Gly His Thr 170 Gly Leu Asp Met Arg 250 Gly Arg I Ser u Arg 3lu Phe Ser Gly 75 Glu Glu Trp Ser Thr 155 Gly Leu Val Leu Ile 235 Thr Met Ile Thr Glu Gin C Pro C Gly Val Ala Ala Glu Arg 140 Val Asn Gin Ala Ala 220 Met Phe Val Trp, Gly 300 Ala lu ly Arg His Val Arg Gly 125 Thr Ala Thr Gly 1le 205 Leu Leu Asp Val Ala 285 Leu Leu Arg Gly, Asp Pro Asp Ser 110 Leu Gly Gin Leu Pro 190 His Ala Gly Ala Leu 270 Leu Met Gin Thr Ala Ala Ser Gly Leu Glu Val Gin Ser 175 Cys Leu Gly Arg Ser 255 Lys Ile Ala Ser Glu Asp Val Glu Phe Asp Asp Phe Arg 160 Val Leu 
Ala Gly Ile 240 Ala Arg Arg Pro Ala Glu Ala Leu Le 310 315 320 Arg Val Asp Ala Gly Ala Ile Gly Tyr Val Glu Thr His Gly Thr Gly WO 99/66028 PCT/EP99/04171 -66- Thr S Gly I Thr Lys 1 385 Phe Leu Ala Leu Ser 465 Ala Gly Glu Ala G- Y 545 Gin Pro Glu Ser Phe 625 Glu er Pro Asn 370 Ala iis Ala Gly Glu 450 Ala Gn Leu His Leu 530 Arg Gly Ala Leu Arg 610 Ala Prc Leu C Ala 355 Leu Ala Thr Thr Val 435 Glu Glu Ala Gly Arg 515 Glu Ala Ala Phe His 595 Ser Leu Glu ly 340 Arg Gly Leu Leu Glu 420 Ser Ala Leu Ala Asp 500 Leu Val Ala Gln Arg 580 Gin Ser Glu Lei.
325 Asp Ala His Ala Asn 405 Pro Ala Pro Leu Arg 485 Val Ala Ala Ser Val 565 Glu Prc Le Ty Va Pro I Asp C Leu C Leu 1 390 Pro Val Phe Ala Val 470 Leu Ala Val Ala Ser 550 Pro Thr Leu a Leu r Ala 630 1 Ala le ly Glu 375 is krg Pro Gly Thr 455 Leu Ser Phe Ala Gin 535 Pro Gly Phe Cys Asp 615 Glu Ser 360 31y His Ile Trp Leu 440 Val Ser Ala Ser Ala 520 Gly G 1y Met Asp Glu 600 Gin al 45 Arg Ala Glu Arg Pro 425 Ser Leu Ala His Leu 505 Thr Gin Gly Arg 585 Val Thr 330 Glu Cys N Ala Leu Ile 410 Arg Gly Ala Lys Ile 490 Val Ser Thr Leu Arg 570 Cys Met Ala Ala Val Gly Ile 395 Glu Ala Thr Pro Ser 475 Ala Ser Arg Pro Ala 555 Gly Val Trp Phe Leu Leu C Val 380 Pro Gly' Gly Asn Ala 460 Ala Ala Thr Glu Ala 540 Phe Leu Thr Ala Thr 620 Arg A 3 Gly P 365 Ala C Arg Thr Arg Val 445 Thr Ala Tyr Arg Ala 525 Gly Leu Trp Leu Glu 605 Gin la la.
ly Asn Ala Pro 430 His Pro Ala Pro Ser 510 Leu Ala Phe Glu Ph 59 Pr Pr 335 Val I Val Leu Leu Leu 415 Arg Val Gly Leu Glu 495 Pro Arg Ala Ala Ala 575 Asp 0 Gly Ala Leu ,ys Ile His 400 Ala Phe Val Arg Asp 480 Gin Met Ser Arg Gly 560 Trp Arg Ser Leu Leu Ala Ala Leu Arg Ser Trp Gly Val 645 Gly His Ser Leu Gly Glu Leu Val Ala Ala 650 655 Cys Val Ala Gly 660 Val Phe Ser Leu Glu Asp Ala Val Arg 665 Leu Val Val 670 WO 99/66028 WO 9966028PCTIEP99/04171 67 Ala Arg Gly Arg Leu Met Gin Ala Leu Pro Ala Gly Gly Ala Met Val 675 680 685 Ser Ile Ala Ala Pro Glu Ala Asp Val Ala Ala Ala Val Ala Pro His 690 695 700 Ala Ala Leu Val Ser Ile Ala Ala Val Asn Gly Pro Glu Gin Val Val 705 710 715 720 Ile Ala Giy Ala Giu Lys Phe Val Gin Gin Ile Ala Ala Ala Phe Ala 725 730 735 Ala Arg Gly Ala Arg Thr Lys Pro Leu His Val Ser His Ala Phe His 740 '745 750 Ser Pro Leu Met Asp Pro Met Leu Giu Ala Phe Axg Arg Vai Thr Glu 755 760 765 Ser Val Thr Tyr Arg Arg Pro Ser Ile Ala Leu Val Ser Asn Leu Ser 770 775 780 Gly Lys Pro Cys Thr Asp Giu Val Ser Ala Pro Gly Tyr Trp Val Arg 785 790 795 800 His Ala Ara Glu Ala Val Arg Phe Ala Asp Gly Val Lys Ala Leu His 805 810 815 Ala Ala Giy Ala Gly Leu Phe Val Giu Val Gly Pro Lys Pro Thr Leu 820 825 830 Leu Giy Leu Val Pro Ala Cys Leu Pro Asp Ala Arg Pro Val Leu Leu 835 840 845 Pro Ala Ser Arg Ala Gly Arg Asp Glu Ala Ala Ser Ala Leu Giu Ala 850 855 860 Leu Gly Gly Phe Trp Val Val Gly Gly Ser Val Thr Trp Ser Gly Val 865 870 875 880 ?he Pro Ser Gly Giy Arg Arg Val Pro Leu Pro Thr T.yr Pro Trp Gin 885 890 895 Arg Glu Arg Tyr Trp Ilie Giu Ala Pro Val Asp Arg Glu Ala Asp Gly 900 905 910 Thr Gly Arg Ala Arg Ala Gly Gly His Pro Leu Leu Gly Giu Val Phe 915 920 925 Ser Val Ser Thr His Ala Gly Leu Arg Leu Trp Glu Thr Thr Leu Asp 930 935 940 Arg Lys Ar g Leu Pro Trp Leu Gly Glu His Arg Ala Gln Gly Glu Val 945 950 955 960 Vai Phe Pro Gly Ala Gly Tyr Leu Giu Met Ala Leu Ser Ser Gly Ala 965 970 975 Glu Ile Leu Gly Asp Gly Pro Ile Gin Val Thr Asp Val Val Leu Ile 980 985 990 Giu Thr 
Leu Thr Phe Ala Gly Asp Thr Ala Val Pro Val Gln Val Val 995 1000 1005 Thr Thr Glu Glu Arg Pro Gly Arg Leu Arg Phe Gln Val Ala Ser Arg 1010 1015 1020 Glu Pro Gly Glu Arg Arg Ala Pro Phe Arg Ile His Ala Arg Gly Val 1025 1030 1035 1040 Leu Arg Arg Ile Gly Arg Val Glu Thr Pro Ala Arg Ser Asn Leu Ala 1045 1050 1055 Ala Leu Arg Ala Arg Leu His Ala Ala Val Pro Ala Ala Ala Ile Tyr 1060 1065 1070 Gly Ala Leu Ala Glu Met Gly Leu Gln Tyr Gly Pro Ala Leu Arg Gly 1075 1080 1085 Leu Ala Glu Leu Trp Arg Gly Glu Gly Glu Ala Leu Gly Arg Val Arg 1090 1095 1100 Leu Pro Glu Ala Ala Gly Ser Ala Thr Ala Tyr Gln Leu His Pro Val 1105 1110 1115 1120 Leu Leu Asp Ala Cys Val Gln Met Ile Val Gly Ala Phe Ala Asp Arg 1125 1130 1135 Asp Glu Ala Thr Pro Trp Ala Pro Val Glu Val Gly Ser Val Arg Leu 1140 1145 1150 Phe Gln Arg Ser Pro Gly Glu Leu Trp Cys His Ala Arg Val Val Ser 1155 1160 1165 Asp Gly Gln Gln Ala Ser Ser Arg Trp Ser Ala Asp Phe Glu Leu Met 1170 1175 1180 Asp Gly Thr Gly Ala Val Val Ala Glu Ile Ser Arg Leu Val Val Glu 1185 1190 1195 1200 Arg Leu Ala Ser Gly Val Arg Arg Arg Asp Ala Asp Asp Trp Phe Leu 1205 1210 1215 Glu Leu Asp Trp Glu Pro Ala Ala Leu Gly Gly Pro Lys Ile Thr Ala 1220 1225 1230 Gly Arg Trp Leu Leu Leu Gly Glu Gly Gly Gly Leu Gly Arg Ser Leu 1235 1240 1245 Cys Ser Ala Leu Lys Ala Ala Gly His Val Val Val His Ala Ala Gly 1250 1255 1260 Asp Asp Thr Ser Thr Ala Gly Met Arg Ala Leu Leu Ala Asn Ala Phe 1265 1270 1275 1280 Asp Gly Gln Ala Pro Thr Ala Val Val His Leu Ser Ser Leu Asp Gly 1285 1290 1295 Gly Gly Gln Leu Gly Pro Gly Leu Gly Ala Gln Gly Ala Leu Asp Ala 1300 1305 1310 Pro Arg Ser Pro Asp Val Asp Ala Asp Ala Leu Glu Ser Ala Leu Met 1315 1320 1325 Arg Gly Cys Asp Ser Val Leu Ser Leu Val Gln Ala Leu Val Gly Met 1330 1335 1340 Asp Leu Arg Asn Ala Pro Arg Leu Trp Leu Leu Thr Arg Gly Ala Gln 1345 1350 1355 1360 Ala Ala Ala Ala Gly Asp Val Ser Val Val Gln Ala Pro Leu Leu Gly 1365 1370 1375 Leu Gly Arg Thr Ile Ala 
Leu Glu His Ala Glu Leu Arg Cys Ile Ser 1380 1385 1390 Val Asp Leu Asp Pro Ala Glu Pro Glu Gly Glu Ala Asp Ala Leu Leu 1395 1400 1405 Ala Glu Leu Leu Ala Asp Asp Ala Glu Glu Glu Val Ala Leu Arg Gly 1410 1415 1420 Gly Asp Arg Leu Val Ala Arg Leu Val His Arg Leu Pro Asp Ala Gln 1425 1430 1435 1440 Arg Arg Glu Lys Val Glu Pro Ala Gly Asp Arg Pro Phe Arg Leu Glu 1445 1450 1455 Ile Asp Glu Pro Gly Ala Leu Asp Gln Leu Val Leu Arg Ala Thr Gly 1460 1465 1470 Arg Arg Ala Pro Gly Pro Gly Glu Val Glu Ile Ser Val Glu Ala Ala 1475 1480 1485 Gly Leu Asp Ser Ile Asp Ile Gln Leu Ala Leu Gly Val Ala Pro Asn 1490 1495 1500 Asp Leu Pro Gly Glu Glu Ile Glu Pro Leu Val Leu Gly Ser Glu Cys 1505 1510 1515 1520 Ala Gly Arg Ile Val Ala Val Gly Glu Gly Val Asn Gly Leu Val Val 1525 1530 1535 Gly Gln Pro Val Ile Ala Leu Ala Ala Gly Val Phe Ala Thr His Val 1540 1545 1550 Thr Thr Ser Ala Thr Leu Val Leu Pro Arg Pro Leu Gly Leu Ser Ala 1555 1560 1565 Thr Glu Ala Ala Ala Met Pro Leu Ala Tyr Leu Thr Ala Trp Tyr Ala 1570 1575 1580 Leu Asp Lys Val Ala His Leu Gln Ala Gly Glu Arg Val Leu Ile His 1585 1590 1595 1600 Ala Glu Ala Gly Gly Val Gly Leu Cys Ala Val Arg Trp Ala Gln Arg 1605 1610 1615 Val Gly Ala Glu Val Tyr Ala Thr Ala Asp Thr Pro Glu Asn Arg Ala 1620 1625 1630 Tyr Leu Glu Ser Leu Gly Val Arg Tyr Val Ser Asp Ser Arg Ser Gly 1635 1640 1645 Arg Phe Val Thr Asp Val His Ala Trp Thr Asp Gly Glu Gly Val Asp 1650 1655 1660 Val Val Leu Asp Ser Leu Ser Gly Glu Arg Ile Asp Lys Ser Leu Met 1665 1670 1675 1680 Val Leu Arg Ala Cys Gly Arg Leu Val Lys Leu Gly Arg Arg Asp Asp 1685 1690 1695 Cys Ala Asp Thr Gln Pro Gly Leu Pro Pro Leu Leu Arg Asn Phe Ser 1700 1705 1710 Phe Ser Gln Val Asp Leu Arg Gly Met Met Leu Asp Gln Pro Ala Arg 1715 1720 1725 Ile Arg Ala Leu Leu Asp Glu Leu Phe Gly Leu Val Ala Ala Gly Ala 1730 1735 1740 Ile Ser Pro Leu Gly Ser Gly Leu Arg Val Gly Gly Ser Leu Thr Pro 1745 1750 1755 1760 Pro Pro Val Glu Thr Phe Pro Ile Ser Arg Ala Ala Glu Ala Phe Arg 1765 
1770 1775 Arg Met Ala Gln Gly Gln His Leu Gly Lys Leu Val Leu Thr Leu Asp 1780 1785 1790 Asp Pro Glu Val Arg Ile Arg Ala Pro Ala Glu Ser Ser Val Ala Val 1795 1800 1805 Arg Ala Asp Gly Thr Tyr Leu Val Thr Gly Gly Leu Gly Gly Leu Gly 1810 1815 1820 Leu Arg Val Ala Gly Trp Leu Ala Glu Arg Gly Ala Gly Gln Leu Val 1825 1830 1835 1840 Leu Val Gly Arg Ser Gly Ala Ala Ser Ala Glu Gln Arg Ala Ala Val 1845 1850 1855 Ala Ala Leu Glu Ala His Gly Ala Arg Val Thr Val Ala Lys Ala Asp 1860 1865 1870 Val Ala Asp Arg Ser Gln Ile Glu Arg Val Leu Arg Glu Val Thr Ala 1875 1880 1885 Ser Gly Met Pro Leu Arg Gly Val Val His Ala Ala Gly Leu Val Asp 1890 1895 1900 Asp Gly Leu Leu Met Gln Gln Thr Pro Ala Arg Phe Arg Thr Val Met 1905 1910 1915 1920 Gly Pro Lys Val Gln Gly Ala Leu His Leu His Thr Leu Thr Arg Glu 1925 1930 1935 Ala Pro Leu Ser Phe Phe Val Leu Tyr Ala Ser Ala Ala Gly Leu Phe 1940 1945 1950 Gly Ser Pro Gly Gln Gly Asn Tyr Ala Ala Ala Asn Ala Phe Leu Asp 1955 1960 1965 Ala Leu Ser His His Arg Arg Ala Gln Gly Leu Pro Ala Leu Ser Ile 1970 1975 1980 Asp Trp Gly Met Phe Thr Glu Val Gly Met Ala Val Ala Gln Glu Asn 1985 1990 1995 2000 Arg Gly Ala Arg Gln Ile Ser Arg Gly Met Arg Gly Ile Thr Pro Asp 2005 2010 2015 Glu Gly Leu Ser Ala Leu Ala Arg Leu Leu Glu Gly Asp Arg Val Gln 2020 2025 2030 Thr Gly Val Ile Pro Ile Thr Pro Arg Gln Trp Val Glu Phe Tyr Pro 2035 2040 2045 Ala Thr Ala Ala Ser Arg Arg Leu Ser Arg Leu Val Thr Thr Gln Arg 2050 2055 2060 Ala Val Ala Asp Arg Thr Ala Gly Asp Arg Asp Leu Leu Glu Gln Leu 2065 2070 2075 2080 Ala Ser Ala Glu Pro Ser Ala Arg Ala Gly Leu Leu Gln Asp Val Val 2085 2090 2095 Arg Val Gln Val Ser His Val Leu Arg Leu Pro Glu Asp Lys Ile Glu 2100 2105 2110 Val Asp Ala Pro Leu Ser Ser Met Gly Met Asp Ser Leu Met Ser Leu 2115 2120 2125 Glu Leu Arg Asn Arg Ile Glu Ala Ala Leu Gly Val Ala Ala Pro Ala 2130 2135 2140 Ala Leu Gly Trp Thr Tyr Pro Thr Val Ala Ala Ile Thr Arg Trp Leu 2145 2150 2155 2160 Leu Asp Asp Ala Leu Val Val Arg Leu 
Gly Gly Gly Ser Asp Thr Asp 2165 2170 2175 Glu Ser Thr Ala Ser Ala Gly Ser Phe Val His Val Leu Arg Phe Arg 2180 2185 2190 Pro Val Val Lys Pro Arg Ala Arg Leu Phe Cys Phe His Gly Ser Gly 2195 2200 2205 Gly Ser Pro Glu Gly Phe Arg Ser Trp Ser Glu Lys Ser Glu Trp Ser 2210 2215 2220 Asp Leu Glu Ile Val Ala Met Trp His Asp Arg Ser Leu Ala Ser Glu 2225 2230 2235 2240 Asp Ala Pro Gly Lys Lys Tyr Val Gln Glu Ala Ala Ser Leu Ile Gln 2245 2250 2255 His Tyr Ala Asp Ala Pro Phe Ala Leu Val Gly Phe Ser Leu Gly Val 2260 2265 2270 Arg Phe Val Met Gly Thr Ala Val Glu Leu Ala Ser Arg Ser Gly Ala 2275 2280 2285 Pro Ala Pro Leu Ala Val Phe Thr Leu Gly Gly Ser Leu Ile Ser Ser 2290 2295 2300 Ser Glu Ile Thr Pro Glu Met Glu Thr Asp Ile Ile Ala Lys Leu Phe 2305 2310 2315 2320 Phe Arg Asn Ala Ala Gly Phe Val Arg Ser Thr Gln Gln Val Gln Ala 2325 2330 2335 Asp Ala Arg Ala Asp Lys Val Ile Thr Asp Thr Met Val Ala Pro Ala 2340 2345 2350 Pro Gly Asp Ser Lys Glu Pro Pro Val Lys Ile Ala Val Pro Ile Val 2355 2360 2365 Ala Ile Ala Gly Ser Asp Asp Val Ile Val Pro Pro Ser Asp Val Gln 2370 2375 2380 Asp Leu Gln Ser Arg Thr Thr Glu Arg Phe Tyr Met His Leu Leu Pro 2385 2390 2395 2400 Gly Asp His Glu Phe Leu Val Asp Arg Gly Arg Glu Ile Met His Ile 2405 2410 2415 Val Asp Ser His Leu Asn Pro Leu Leu Ala Ala Arg Thr Thr Ser Ser 2420 2425 2430 Gly Pro Ala Phe Glu Ala Lys 2435 <210> 8 <211> 419 <212> PRT <213> Sorangium cellulosum <400> 8 Met Thr Gln Glu Gln Ala Asn Gln Ser Glu Thr Lys Pro Ala Phe Asp 1 5 10 15 Phe Lys Pro Phe Ala Pro Gly Tyr Ala Glu Asp Pro Phe Pro Ala Ile 20 25 30 Glu Arg Leu Arg Glu Ala Thr Pro Ile Phe Tyr Trp Asp Glu Gly Arg 35 40 45 Ser Trp Val Leu Thr Arg Tyr His Asp Val Ser Ala Val Phe Arg Asp 50 55 60 Glu Arg Phe Ala Val Ser Arg Glu Glu Trp Glu Ser Ser Ala Glu Tyr 65 70 75 80 Ser Ser Ala Ile Pro Glu Leu Ser Asp Met Lys Lys Tyr Gly Leu Phe 85 90 95 Gly Leu Pro Pro Glu Asp His Ala Arg Val Arg Lys Leu Val Asn Pro 100 105 110 Ser Phe Thr Ser Arg Ala Ile Asp Leu Leu Arg Ala Glu Ile Gln Arg 
115 120 125 Thr Val Asp Gln Leu Leu Asp Ala Arg Ser Gly Gln Glu Glu Phe Asp 130 135 140 Val Val Arg Asp Tyr Ala Glu Gly Ile Pro Met Arg Ala Ile Ser Ala 145 150 155 160 Leu Leu Lys Val Pro Ala Glu Cys Asp Glu Lys Phe Arg Arg Phe Gly 165 170 175 Ser Ala Thr Ala Arg Ala Leu Gly Val Gly Leu Val Pro Gln Val Asp 180 185 190 Glu Glu Thr Lys Thr Leu Val Ala Ser Val Thr Glu Gly Leu Ala Leu 195 200 205 Leu His Asp Val Leu Asp Glu Arg Arg Arg Asn Pro Leu Glu Asn Asp 210 215 220 Val Leu Thr Met Leu Leu Gln Ala Glu Ala Asp Gly Ser Arg Leu Ser 225 230 235 240 Thr Lys Glu Leu Val Ala Leu Val Gly Ala Ile Ile Ala Ala Gly Thr 245 250 255 Asp Thr Thr Ile Tyr Leu Ile Ala Phe Ala Val Leu Asn Leu Leu Arg 260 265 270 WO 99/66028 PCTIEP99/04171 -73- Glu 275 Leu Arg Gly Val Ser 355 Ala Pro Phe Gly Ala Asp Phe Glu Phe 340 Leu Arg Glu Arg <210> 9 <211> 607 <212> PRT <213> Sorangium <400> 9 Ala Ser Leu Asp 1 Asp Asp Gly His Arg Gly Ile Glu Glu Gly Gly Pro Glu Leu Leu Ala Ala Arg Ser Leu Asp Gly Pro Ala 100 Pro Leu Arg Glu 115 Glu Ala Arg Arg 130 Leu Glu Leu Glu Val Leu 295 Ala Arg Gin 310 Met Val Phe 325 Ser Arg Pro Ala Tyr Gly Leu Glu Ala 375 Met Lys Leu 390 Asn Ile Glu 405 cellulosum Ala Leu Phe 5 Gly Arg Ala Asp Leu Arg Ser Phe His 55 His Asp Gln 70 Arg His Pro Leu Val Arg Tyr Glu Glu Leu Trp Leu 135 Val 280 Arg Asp Leu Asp, Arg 360 Glu Lys Ser Ala Thr Ala 40 Cys Pro Asp Trp, Glu 120 Ala Lys Phe Leu Leu Jal 345 Gly Ile Glu Leu Ala Asp Glu Ile 330 Phe Pro Ala Thr Asn 410 Glu Asn Tyr 315 Pro Asp His Val Pro 395 Val Thr His Glu Leu Ser 75 Ser Ala Arg Pro Pro Ile 300 Cys Ser Val Val Gly 380 Val Ile Ser Val His Gly Ile Asp Arg Ala Pro 140 Gly 285 Leu Gly Ala Arg Cys 365 Thr Phe Leu Ala Leu Leu Asp Ser Ala Gly Arg 125 Cys Leu Arg Ala Leu Arg 350 Pro Ile Gly Lys Arg Ala Arg Leu Phe Met Ala 110 Thr Phe Met Ile Ser Arg 335 Asp Gly Phe Tyr Pro 415 Val Glu Ile Thr His Leu Pro Ala Ala Arg Gly Ile 320 Asp Thr Val Arg His 400 Ser Leu Ala Gln Val His Val Gly Gln Pro Arg Ala 10 Glu Arg 25 Leu Arg Met Cys 
Leu Ala Trp Thr 90 Leu Ala 105 Arg Glu Ala Ala WO 99/66028 PCT/EP99/04171 -74- Asp 145 Met Ala Gly Asn Ala 225 Ala Pro G ly 305 Val Met Aso Ala Pro His Pro Phe Phe Ala 465 Pro Leu Ser Thr Ala Leu 210 Pro Ser Glu Val Leu Thr Phe Pro His 370 Leu Ala Glu Glu Ser 450 Gly Ser Pro Pro Pro Gly 195 Leu Gly Trp Ala Asp Arg Al a Asp Gin Phe Ala Ile Arg Arg His 435 Leu Arg Arg Arg Glu Glu 180 Pro Leu Thr Glu Leu 260 Asn Arg Val Gly Pro 340 Phe Asn Val Gly Gly 420 Pro Ala Leu Phe Phe Val 165 Leu Trp, Gly Ser Val 245 Trp Leu Leu Ala Asp 325 Gly Glu Ala Met Phe 405 Ala Thr Cys Glu Ala 485 Glu 150 Ala Ala Ser Phe Glu 230 Val Glu Ser Arg Gly 310 Ala Arg Leu Gly Ala 390 Met Pro Pro Asp Leu 470 Tyr Asp Glu Cys Gly Gly 215 Ala Ser Arg Arg Ala 295 Val Leu Ile Ala Thr 375 Arg Ala Phe Arg Glu 455 Trp Leu Asp Ala Ala .Glu Ala Ala 185 Tyr Pro 200 Leu Pro Ala Leu Ser Lys Leu Arg 265 Phe Glu 280 Gln Pro Ser Ser Tyr Ser Ser Pro 345 Pro Pro 360 Ile Ser Asn Gln Trp Val Val Val 425 Cys Leu 440 Glu His Arg His Gly Glu Asn Arg 170 Leu Ala Thr Arg Lys 250 Thr Arg Ala Ser Gly 330 Val Leu Lys Ala Asn 410 Gln His Leu Pro His 490 Gly 155 Arg Leu Tyr Ala Gly 235 Ser Ile Ala Pro Gly 315 Asp Val Ser Val Arg 395 Gln Arg Glu Tyr His 475 Pro Leu Leu Ala Glu Ile 220 Ala Gin Val Glu Phe 300 Arg Gly Leu Gin Leu 380 Pro Ala Ser Pro Trp 460 His Ile Pro Arg Trp Met 205 Ala Ala Leu Arg Ala 285 Ala Leu Asn Leu Met 365 Thr Met Met Thr Ala 445 Cys Arg Ala Leu Ala Leu 190 Leu Ala Arg Gly Ala 270 Ile Ala Ser Asp Ala 350 Leu Glu Ser Val Ile 430 Gly Glu Pro Ala Gly Ser 175 Gly Pro Ala Leu Asn 255 Met Ala Gly Gly Ile 335 Gly Phe Gly Leu Pro 415 Met Ser Leu Gly Thr 495 Pro 160 Tyr Thr Glu Ser Phe 240 Ile Gly Ala Ala Leu 320 Val Thr Val Ser Val 400 Asp Glu Ala Ser Ala 480 Trp WO 99/66028 WO 9966028PCTIEP99/041 71 T~yr Asp Ile Asp 545 His Gin Leu Val Arg Ala Pro 555 Thr Val Thr Asp Giu Ser Asp Asp 575 Arg Ser Pro Pro Giu Trp 560 Tyr Giy <210> i0 <21i> 423 212> PR7 <2:13> Sorangii'n celiuioswri <400> i0 Met Giy .1 Aia Giu Ala 
Pro Pro Giy Ala Val Thr Ile Val Met Asp Arg Asp Giy 130 Arg Asp 145 Ser Ser Ala Leu Ilie 5 Giu Giu Giy Ala Arg Giu Ala Val1 Trp Gly Val Thr Phe Arg Ile Met Leu Val 100 Trp Leu Pro 115 Pro Ile Asp Leu Met Thr Pro Ile Gin 165 Ser Val Ala Aia Pro Giy Cys Aia Leu Pro Gly Met Aia 40 Vai Ala Leu Gly Ser Met Giu Gly Leu Ala 120 Thr Vai 135 Thr Met Ala Ile 10 Gin Asp Ala Giu Ara Gly Gly Ser Thr Lys 90 Lys Leu 105 Asn Arg Pro Ala Giy Phe Asp Giu 170 Ala Val1 Asp Ala 75 Ala Asp Lys Giu Gly 155 Leu Gly Al a Aso) Pro Val1 Leu Val1 Arg 140 Ile Gly Aia Ala Val1 Met Thr Asp Leu 125 Pro Ser Leu Gly Gly Ala Leu Gin Met Val Asp Arg Asp Thr Ala Pro Val Arg Ile Thr Val Asp Ala 160 Asn Ala 175 Gin Pro Val Pro 180 Met Thr Pro His Gly Pro Asp GiU Trp 185 Ile Arg Arg 190 WO 99/66028 PCT/EP99/04171 -76- Leu Gly Thr Leu Pro Leu Met 195 Asn Thr 210 Gin Gly 225 Met Arg Ala Gly Met Asp Pro Ser 290 Phe Ala 305 Leu Ser Ala Gln Gly Trp Glu Val 370 Ile Asn 385 Ala Gly Tyr Val Gly Phe Asp Cys Arg 275 Gly Arg Ala Lys Gly 355 Pro Asp ?he A Ia Leu Ala Asp 245 Tyr Gly Ala Leu Ser 325 Ala Gly Arg Gly Phe 405 Glu Val Phe 230 Phe Phe Ala Gly Met 310 Val Ser Met Tyr Arg 390 Ser Ser Gin 215 Val His Thr Glu Leu 295 Asn Arg Ser Ala Gly 375 Glu Gly Ala His Gin Pro 200 Gly Val Leu Arg Glu Arg Val Pro Ala 250 Asp Glu Gin 265 Ser Ala Tyr 280 Val Ser Thr Gly Gly Val Glu Met Thr 330 Phe Phe Pro 345 Val Val Thr 360 Trp Asp Gly Leu Ile Gly Ala Leu Glu 410 Gly Val Ile 235 Asp Thr Ala Val His 315 Ala Gly Ala Gly Ile 395 Arg Ala Gly 220 Leu Lys Gly Ser Asp 300 Glu Asp Phe Pro Phe 380 Val Phe Gin Trp 205 Arg Ala Ala Pro Leu Ala Glu Lys 270 Pro Pro 285 Asp Tyr Gly Arg His Leu Phe Glu 350 Asp Ala 365 Gly Thr Met Thr Trp Arg Met Tyr Ala Asp Leu Gly 240 Arg Phe 255 Thr Arg Ala Phe Leu Leu Arg Leu 320 Thr Pro 335 Thr His Val Ser Ser Trp Gin Ser 400 Ser Val 415 <210> 11 <211> 713 <212> PRT <213> Sorangium <400> 11 Met His Gly Leu 1 Ala Leu Ile Leu Leu Arg Gin Pro cellulosum Thr Glu Arg Gin Val Leu Leu Ser Leu Val Thr Leu 5 10 
Val Thr Ala Arg Ala Ser Gly Glu Leu Ala Arg Arg 25 Glu Val Leu Gly Glu Leu Phe Gly Gly Val Val Leu 40 Gly Pro Ser Val Val Gly Ala Leu Ala Pro Gly Phe His Arg Ala Leu 55 Phe Gln Glu Pro Ala Val Gly Val Val Leu Ser Gly Ile Ser Trp Ile 70 75 Gly Ala Leu Leu Leu Leu Leu Met Ala Gly Ile Glu Val Asp Val Gly 90 Ile Leu Arg Lys Glu Ala Arg Pro Gly Ala Leu Ser Ala Leu Gly Ala 100 105 110 Ile Ala Pro Pro Leu Ala Ala Gly Ala Ala Phe Ser Ala Leu Val Leu 115 120 125 Asp Arg Pro Leu Pro Ser Gly Leu Phe Leu Gly Ile Val Leu Ser Val 130 135 140 Thr Ala Val Ser Val Ile Ala Lys Val Leu Ile Glu Arg Glu Ser Met 145 150 155 160 Arg Arg Ser Tyr Ala Gln Val Thr Leu Ala Ala Gly Val Val Ser Glu 165 170 175 Val Ala Ala Trp Val Leu Val Ala Met Thr Ser Ser Ser Tyr Gly Ala 180 185 190 Ser Pro Ala Leu Ala Val Ala Arg Ser Ala Leu Leu Ala Ser Gly Phe 195 200 205 Leu Leu Phe Met Val Leu Val Gly Arg Arg Leu Thr His Leu Ala Met 210 215 220 Arg Trp Val Ala Asp Ala Thr Arg Val Ser Lys Gly Gln Val Ser Leu 230 235 240 Val Leu Val Leu Thr Phe Leu Ala Ala Ala Leu Thr Gln Arg Leu Gly 245 250 255 Leu His Pro Leu Leu Gly Ala Phe Ala Leu Gly Val Leu Leu Asn Ser 260 265 270 Ala Pro Arg Thr Asn Arg Pro Leu Leu Asp Gly Val Gln Thr Leu Val 275 280 285 Ala Gly Leu Phe Ala Pro Val Phe Phe Val Leu Ala Gly Met Arg Val 290 295 300 Asp Val Ser Gln Leu Arg Thr Pro Ala Ala Trp Gly Thr Val Ala Leu 305 310 315 320 Leu Leu Ala Thr Ala Thr Ala Ala Lys Val Val Pro Ala Ala Leu Gly 325 330 335 Ala Arg Leu Gly Gly Leu Arg Gly Ser Glu Ala Ala Leu Val Ala Val 340 345 350 Gly Leu Asn Met Lys Gly Gly Thr Asp Leu Ile Val Ala Ile Val Gly 355 360 365 Val Glu Leu Gly Leu Leu Ser Asn Glu Ala Tyr Thr Met Tyr Ala Val 370 375 380 Val Ala Leu Val Thr Val Thr Ala Ser Pro Ala Leu Leu Ile Trp Leu 385 390 395 400 Glu Lys Arg Ala Pro Pro Thr Gln Glu Glu Ser Ala Arg Leu Glu Arg 405 410 415 Glu Glu Ala Ala Arg Arg Ala Tyr 420 Val Glu Thr 465 Gly Gly Leu Pro Val 545 Ala Leu Ala Gly Ala
625 Val Ile Cys Glu Pro Ser 450 Glu Glu Ile Arg Ala 530 Gin Ala Glu Trp Ala 610 Arc Arg Thr Tyr Ser 690 Ile 435 Ile Leu Ala Trp Ala 515 Arg Arg Glu Tyr Asp 595 Val Ser Val Arg Asp 675 Val Val Val Ser Ser Arg 500 Ser Ala Ala Arg Ser 580 Ala Val Val Ser Glu 660 His Val Ala Ala Val Arg 485 Gin Arg Arg Glu Ala 565 Phe Glu Trp Val Ser 645 Leu Gly Val His Ala Ser Lys 455 Glu Gn 470 Gly Leu Arg Arg Asp His Gly Met 535 Ser Asn 550 Ser Ala Ala Ala Leu Val Arg Asp 615 Asp Glu 630 Arg Val Ala Arg Pro Leu Arg Ser 695 Leu 440 Arg Gln Ala Glu Asp 520 Ser Val Arg Ala Leu 600 Arg Ala His Ala Gly 680 Arg Ile Pro 425 Pro Gly Lys Leu Ala Pro Arg Leu 490 Leu Arg 505 Leu Leu Phe Gly Leu Val Arg Ile 570 Asp Leu 585 Leu Ser Glu Pro Val Phe Val Gly 650 Pro Tyr 665 Arg Leu Val Pro Gly Val Phe Ala Gly Glu 460 Gly Pro 475 Gly Ala Gly Ser Val Ile Arg Leu 540 Val Val 555 Leu Val Ala Ala Ser Ala Ser Arg 620 Arg Gly 635 Ala His Asp Leu Tyr Leu Val Ala 700 Glu Arg 430 Thr Asp 445 Thr Val Ser Arg Arg Leu Ile Gin 510 Gly Ala 525 Gin Asp Gly Asp Pro Ile His Val 590 Gin Thr 605 Val Arg Arg Arg Pro Ser Leu Val 670 Gly Ser 685 Leu Leu Ile Ile Asp Ala Arg 495 Ala Arg Ala Pro Ile 575 Ala Asp Ala Leu Asp 655 Leu Thr Val Leu Val Ile Ala 480 Val Ile Ser Ile Pro 560 Gly Leu Pro Val Gly 640 Glu Gly Val Ala His Gly Gly Thr Arg Glu Gln Val Arg 705 710 <210> 12 <211> 126 <212> PRT <213> Sorangium cellulosum <400> 12 Met Asp Lys Pro Ile Gly Arg Thr Arg Cys Ala Ile Ala Glu Gly Tyr WO 99/66028 WO 9966028PCT/EP99/041 71 -79 <210> 13 <211> 149 <212> PRT <213> Sorangiun ceilulosum <400> 13 Met Lys His Val Asp Thr Gly Thr Leu Gly Leu Leu Ala Ser Ser Glu Lys Thr Val Gin Gly Arg Val Thr Ala Asp Val Asp 55 Val Asp Val Val His Leu Ser 70 Giu Arg Phe Val Val Trp Gin Arg Val Gly Val Leu Asp Tyr 100 Ala Giu Thr Thr Val Pro Tyr 115 Giu Lys Gin Ser Ser Pro Gin 130 135 Pro Thr Ser Val Giy 145 <210> 14 <211> 184 <212> PRT <213> Sorangiun cellulosun Pro Ser Asp His Thr His Ile 120 Arg Met Thr 40 Pro Pro Arg Asn Ala 120 Ser Giu Asp 
Pro Val Asp Thr 105 Ala Arg Aia 25 Arg Asp Pro Pro Ala 105 Asn Pro Pro Gin Met Arg Asp Ala Ala Gly Pro Arg Phe Asn Tyr Ala Ser Arg Leu Asp Tyr Thr Asp Phe Gly Arg 10 Leu Ala Gly Leu Ala Pro Ala Ala Thr Giu A-rg Leu 75 Ser Pro Glu 90 Asp Ser Arg Phe Giu Leu Ser Ser Ala 140 His Ala Val Thr Glu Gin Gly Gly Asp Leu Gly Trp, Lys Thr Ile Giu Ile Thr Glu Ser Ala His Pro Ala Ala Ser Arg Leu Ala Gly Arg Ile Cys Gly Gly Ala Thr Arg Glu Ala Ser Pro Arg Gly 110 Leu Ile 125 Ala Val WO 99/66028 WO 9966028PCT/EP99/041 71 <400> 14 Val Thr Ser Glu Giu Val Pro Gly Ala Ala Leu Gly Ala Gin 1 5 10 Ser Ser Val Glu Leu Gly Phe Gly Ser 130 7 eu Pro Arg Glu Ser Glu Gly Phe Gly 115 Glu Ser Val1 Arg Ala Pro Leu Leu Gly His 100 Ile Val1 Gin Arg 180 Gin Pro Val Ala Val Arg Ser Asp Leu Arg 165 Ser Ala Met Ala Leu Gly Phe Gly 120 Ile Ile Gly Ala Arg 25 His Leu Arg Pro Gin 105 Ala Leu Al a Al a Val Leu Leu Pro Val Pro Val Lys Pro 155 Thr Cys Arg Ala Leu Ala Gly 110 Leu Arg Thr Leu Thr Gin Arg Gly Leu Val Met Pro Pro Asp 175 Sorangium <400> Val Asri Ala Pro Gly Gly Ala Ile Leu Arg Arg Met Thr Ser Ala Pro Gly Ser Trp Lys Thr Asp Gly Pro Gly Trp Arg Ser 100 cel lulosum Cys Met Arg Ala Pro Ser Leu Thr Ser Ile Gin Giu Arg Thr Arg 70 Ser Thr Thr Arg Arg Ala Thr Glu Ser Pro Asn Pro Ser 105 Gly Al a Pro Ser Gly 75 Ser Lys Ser Ser Arg Ala Thr Ser Arg WO 99/66028 WO 9966028PCT/EP99/04171 -81 Arg Thr Ser Ala Arg Ala Thr Ser Glu Ser Arg Thr Cys Arg Ser Val 115 120 125 Arg Pro Cys Ile Arg Ala Gly Gly Ser Ser Ala Arg Val Gln Gly Arg 130 135 140 Thr 145 <210> 16 <211> 185 <212> PRT <213> Sorangium cellulosum <400> 16 Val1 Leuj 1 Glu Proj Arg Pro I Ala Gluj I le Val Thr Ala Trp Phe Pis Ala Pro Tyr 130 Gly Giu 145 Gin Gin Leu TPyr Asp Asp Ile Pro Al a Ile Arg Ala Arg 135 His Leu Arg Ile Giu Ala Arg Leu Ala Tyr Ser 120 Val1 Pro Arg Glu Arg Al a Al a Leu Asp Gly Ala 105 Ala Ser Ala Ala Giu 185 Ala Ala Asp Giu Ala Val Val Arg Phe Ala Asn Asp Ala Ala 110 Leu Ala 125 Gly Glu Pro Ala Ala Leu <210> 17 <211> 146 <212> 
PRT <213> Sorangium <400> 17 Met Ala Asp Ala 1 Leu Ala Tyr Arg cel lulosun Ala Ser Arg Ser Ala Cys Ser Val Ala Ala Arg Lys 5 10 Ala Ala Th~r Ser Asn Gin Thr Ala Ser Phe Trp Ser 25 WO 99/66028 WO 9966028PCT/EP99/04171 -82 Leu Pro Ala Ile Leu Ser Ser Ala Ser Ser Arg Gly Tyr Ala Ala Ile Ser Ala Ser Ser 100 Gly Arg Met Ser 115 Ala Pro Arg Leu 130 Pro Thr 145 <210> 18 <211> 288 <212> PRT <213> Sorarigium Thr Ser Ala Arg Glu Gly Ala 135 Pro 40 Arg Al a Asn Ser Ala 120 Gln Val Val Ile Ala Ala His 75 Arg Ser Ser Ser Thr Giy Val Ala Ala Lys Arg Thr Thr Ala Ser Ser Ala Ala 110 Val Tyr 125 Arg Arg Thr Leu Glu Ser Giy Gin Asp cellulosum <400> 18 Val Thr 1 Vai Val Arg Leu Arg Ala Leu Pro Ile Leu Leu Val Val Leu Arg Arg 130 Glu Ala 145 Val1 Thr Ar g Trp Pro Pro Al a Val 115 Ala Met Ser Ala Arg Arg Gly Ser Arg 100 Gly Giu Gin Ser Met Pro Arg Ser 5 Leu Gly His Pro Arg Leu Ala Arg 70 His Arg Met Ser Asp Gly Arg Arg Lys Ile 150 Ala Arg 25 Ala Gly 40 Gin His Gly Thr Ala Asp His Val 105 Ala Arg 120 Ser Asp Gly Lys Trp 10 Arg Arg Ile Ser Leu 90 Ala Gly Val Leu Ser Ser Arg Val Leu Ala Ser Cys 75 Gly Arg Arg Thr Val 155 Gly Ser Arg Ser Pro Trp Ala Asp Ser Gly Pro His 110 Arg Leu 125 Arg Glu Gly Leu Arg Ile Arg Arg Arg Gly Ala Ser Gly Ile Thr Ser Leu His Arg Thr Ala Asn Gly Ser 160 Val Ser Gly Met Ser 165 Leu Leu Ala Ala Cys Giy Gly Giu Lys Arg Ser 170 175 WO 99/66028 WO 9966028PCTIEP99/041 71 83 Gly Pro Glu Asp 225 Phe Leu Gly Glu Ala 180 Gly Ser 195 Arg Cys Cys Ser Glu Cys Gly Ile 260 Val Val 275 Gly Ser 200 Gly Arg Ile Gly Ser 280 Glu Ala 190 Arg Cys 205 Tyr Ser Asp Glu Leu Asn Phe Asp 270 Arg Asp 285 Val Arg Lys Thr 240 Cys Leu Arg <210> 19 <211> 288 <212> PRT <213> Sorangiun cel lulosurn Vaj. 
Thr Val Val Val Thr Arg Leu Arg Arg Ala Trp Leu Pro Pro Ile Leu Pro Leu Val Ala Val Leu Val 115 Arg Arg Ala 130 Glu Ala Met 145 Val Ser Gly Ser Ala Arg A g Gly Ser Arg 100 Gly Glu Gln Met Ser Leu His Arg Ala His Met Asp Arg Lys Ser Met Gly Pro Leu Arg 70 Arg Ser Gly Arg Ile 150 Leu Pro Cys Glu Pro Val Thr Gly Ser Val 135 Ala Leu Arg Ala Ala 40 Gln Gly Ala His Ala 120 Ser Gly Ala Ser Trp Arg Arg 25 Gly Arg His Ile Thr Ser Asp Leu Val Ala 105 Arg Gly Asp Val Lys Leu Ala Cys 170 Ser Leu Ala Ser Cys 75 Gly Arg Arg Thr Val 155 Gly Ser Arg Val Ser Gly Ser Pro Arg Ser Ser Pro Trp Pro Ala Asp Thr Ser Gly Asn Pro His 110 Arg Arg Leu 125 Cys Arg Glu 140 Val Gly Leu Gly Glu Lys Thr Ser Leu His Arg Thr Ala Asn Gly Ser 160 Ser 165 Gly Gly Glu Ala Gln Thr Pro Gly Gly Ala Gln Gly Glu Ala Pro Val 180 185 190 Pro Val Gly Ser Ala Val Asp Ser Ile Val Ala Ala Arg Cys Asp Arg 195 200 205 Glu Ala Arg Cys Asn Asn Ile Gly Gln Asp Arg Glu Tyr Ser Ser Lys 210 215 220 Asp Ala Cys Ser Asn Lys Ile Arg Ser Glu Trp Arg Asp Glu Leu Thr 225 230 235 240 Phe Gly Glu Cys Pro Gly Gly Ile Asp Ala Lys Gln Leu Asn Glu Cys 245 250 255 Leu Glu Gly Ile Arg Asn Glu Gly Cys Gly Asn Pro Phe Asp Thr Leu 260 265 270 Gly Arg Val Val Ala Cys Arg Ser Ser Asp Leu Cys Arg Asp Ala Arg 275 280 285 <210> 20 <211> 155 <212> PRT <213> Sorangium cellulosum <400> 20 Met Asp Pro Arg Ala Arg Arg Glu Lys Arg Pro Ser Leu Leu Asp Ser 1 5 10 Arg Gly Arg Gln Pro Lys Arg Ser Gln Gln Gly Gly His Met Glu Lys 25 Pro Ile Gly Arg Thr Arg Trp Ala Ile Ala Glu Gly Tyr Ile Pro Gly 40 Arg Ser Asn Gly Pro Glu Pro Gln Met Thr Ser His Glu Thr Ala Cys 55 Leu Leu Asn Ala Ser Asp Arg Asp Ala Gln Val Ala Ile Thr Val Tyr 70 75 Phe Ser Asp Arg Asp Pro Ala Gly Pro Tyr Arg Val Thr Val Pro Ala 90 Arg Arg Thr Arg His Val Arg Phe Asn Asp Leu Thr Glu Pro Glu Pro 100 105 110 Ile Pro Arg Asp Thr Asp Tyr Ala Ser Val Ile Glu Ser Asp Val Pro 115 120 125 Ile Val Val Gln His Thr Arg Leu Asp Ser Arg Gln Ala Glu Asn Ala 130 135 140
Leu Ile Ser Thr Ile Ala Tyr Thr Asp Arg Glu 145 150 155 <210> 21 <211> 156 <212> PRT <213> Sorangium cellulosum WO 99/66028 PCT/EP99/04171 <400> 21 Val Arg Arg Ser 1 Val Gly Arg Arg Leu Ala Gly Cys Leu Ala Pro Gly Ala Ala Thr Thr Glu Arg Ile Glu Ser Ser Glu Ser 100 Ala Ser Arg Arg 115 Phe Glu Leu Leu 130 Ser Ser Ala Ala 145 <210> 22 <211> 305 <212> PRT <213> Sorangium <400> 22 Met Glu Lys Glu 1.
Val Ala Ile Ala Ser Ala Met Leu Gly Leu Leu Leu Ala Glu His Pro Ile Val Ala Ile Glu Gly Ile Leu 100 Trp Asn Tyr Val 115 Leu Ile Ile Ser 130 Arg Trp Gin 5 Ile Gly Leu Gly Gly Pro Ala Asp Ala 55 Arg Leu Ala 70 Ala Gly Ser Pro Trp Gn Gly Lys Leu Ile Thr Val 135 Val Ile Gly 150 cellulosum Ser Arg Ile 5 Ala Val Lys Ser Glu Gly Leu Leu Gly 55 Phe Gly His 70 Met Ile Phe His Leu Leu Val Leu Gly Ile His Glu 135 Met Thr Ser 40 His Val Glu Arg Ala 120 Glu Pro Ala Phe Val 40 Lys Gly Ala His Ala 120 Phe Lys Leu Glu Val Asp Arg Val 105 Glu Lys Thr Ile Ile 25 His His Lys Ala Pro 105 Ala Lys His 10 Gly Lys Ala Val Phe 90 Gly Thr Gin Ser Tyr 10 Ala Ser Arg Glu Gly 90 Arg Ala Lys Val Leu Ile Ala Val 75 Val Val Thr Ser Val 155 Gly Ala Leu Ser Leu 75 Gly Gin Val Lys Asp Leu Val Asp His Val Leu Val Ser 140 Gly Ala Ala Val Ala Tyr Gly Ile Phe Asp 140 Thr Ala Gln Val Leu Trp Asp Pro 125 Pro Arg Met Thr Pro Pro Arg Asn Ala Ser Ala Gly Ala Pro Thr Ile Pro Thr Gly Arg Ala Arg Asp Pro Pro Ala Asn Pro Asn Ser Asp Asp Leu Tyr Thr Ser Tyr Ile Ala Val Thr Asp Thr Arg Pro Phe Trp Val Ser Glu Asp 110 Glu Gly 125 Gly Gin WO 99/66028 WO 9966028PCTIEP99/04 171 86 Leu Ala Ala Met 145 Leu Glu Asp Ser Val Trp Leu Gly 180 Ser Ile Gly Ile 195 Ser Gin Ser Arg 210 Leu Ala Ala Ile 225 Val Gly Arg Pro Vai Leu Arg Ile 260 Glu Ala lie Glu 275 Val Lys His Ile 290 -a <210> 23 <211> 135 <212> PRT <213> Sorangium <400> 23 Val Gin Thr Ser Arg Arg Ilie Ala His Giu Giy Ala Lys Ala Arg Ala Arg Arg Gly Leu Gly Arg Ser Arg Leu Ala Gly Gly 100 Ilie Ile Glu Arg 115 Ser Leu Leu Val Leu 215 Leu Met Asp Glu Glu 295 Asp Gly Asn 185 A.la Gly Ser Phe Ala 265 Arg Arg Arg Al a 25 Phe Leu Ala Arg Val 105 Pro Thr Ile Leu Ala Ala 220 Pro His Ala Ser His 300 Gly Ar g Gly Arg Al a Ile Leu Gly Phe Thr Ala Phe Asp Gly 190 Val Phe 205 Asp Arg Gly Val Glu Val Ser Gly 270 Glu Arg 285 Gin Arg Cys Lys Ala Gly Asp Val Asp Asp Ala Leu Ala Ser Phe Gin 110 Phe Asp 125 Ile Leu 17 kla Leu Glu Ser Leu 255 Val Pro Al a Ser Arg Met Gly Gin Val Leu Ser 
Val 160 Gly Ala Ala Leu Ala 240 Val Ala Asp Arg Ser Ala Arg Trp Arg Ser Gly Ala cellulosum Ser Phe Asp Ala Arg Ser Gly Ser Ala Ser Ala Gly 40 His Gly Ala Met 55 Pro Gly Ala Gly 70 Asp Leu Ala Arg Ala Ser Met Ala Leu Pro Asp Pro 120 Tyr Ala 10 Gly Ala Glu Gly Gly Gly Leu Arg 75 Arg Leu 90 Val Ser Leu Pro Lys Val Thr Ser Ser Asp Ile 130 135 <210> 24 <211> 19 <212> DNA <213> Artificial Sequence <220> <223> Description of Artificial Sequence: universal reverse primer <400> 24 ggaaacagct atgaccatg 19 <210> 25 <211> 17 <212> DNA <213> Artificial Sequence <220> <223> Description of Artificial Sequence: universal forward primer <400> 25 gtaaaacgac ggccagt 17 <210> 26 <211> 28 <212> DNA <213> Artificial Sequence <220> <223> Description of Artificial Sequence: PCR primer NH24 end "B" <400> 26 gtgactggcg cctggaatct gcatgagc 28 <210> 27 <211> 28 <212> DNA <213> Artificial Sequence <220> <223> Description of Artificial Sequence: PCR primer NH2 end "A" <400> 27 agcgggagct tgctagacat tctgtttc 28 <210> 28 <211> 24 <212> DNA <213> Artificial Sequence <220> <223> Description of Artificial Sequence: PCR primer NH2 end "B" <400> 28 gacgcgcctc gggcagcgcc ccaa 24 <210> 29 <211> 25 <212> DNA <213> Artificial Sequence <220> <223> Description of Artificial Sequence: PCR primer pEPO15-NH6 end "B" <400> 29 caccgaagcg tcgatctggt ccatc 25 <210> 30 <211> 25 <212> DNA <213> Artificial Sequence <220> <223> Description of Artificial Sequence: PCR primer pEPO15H2.7 end "A" <400> 30 cggtcagatc gacgacgggc tttcc 25

Claims (14)

1. An isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of epothilone, wherein the complement of said nucleotide sequence hybridizes to a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 SEQ ID NO:1, under conditions of hybridization at 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 1 mM EDTA at 50°C and washing with 2X SSC, 1% SDS at 50°C.
2. An isolated nucleic acid molecule according to claim 1 comprising a nucleotide sequence whose complement hybridizes to a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 SEQ ID NO:1, under conditions of hybridization at 65°C for 36 hours and washing 3 times at high stringency with 0.1X SSC and 0.5% SDS for 20 minutes at 65°C.
3. An isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of epothilone, wherein said nucleotide sequence has at least 60 percent sequence identity with a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 SEQ ID NO:1.
4. An isolated nucleic acid molecule according to claim 3 comprising a nucleotide sequence that has at least 80 percent sequence identity with a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 SEQ ID NO:1.
5. An isolated nucleic acid molecule according to claim 4 comprising a nucleotide sequence that has at least 90 percent sequence identity with a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.
6. An isolated nucleic acid molecule according to claim 5 comprising a nucleotide sequence that has at least 95 percent sequence identity with a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.
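Claims 4 to 6 raise the identity threshold from 80 to 90 to 95 percent, but the claims themselves do not state how percent identity is computed. The following is a minimal sketch, assuming identity is counted over the ungapped columns of a pairwise alignment produced elsewhere. The names `percent_identity` and `meets_threshold` are illustrative, and the choice of denominator is an assumption, since alignment tools differ on it.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs are assumed to be rows of an existing pairwise alignment,
    with '-' marking gaps. This is one common convention, not the only one.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair reaches at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With this convention, a candidate that satisfies the 95 percent test also satisfies the 90 and 80 percent tests, which mirrors the nesting of claims 4, 5, and 6.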
7. An isolated nucleic acid molecule according to either claim 1 or claim 3 comprising a nucleotide sequence that encodes a polypeptide which comprises an amino acid sequence selected from the group consisting of: SEQ ID NO:2, amino acids 11-437 of SEQ ID NO:2, amino acids 543-864 of SEQ ID NO:2, amino acids 974-1273 of SEQ ID NO:2, amino acids 1314-1385 of SEQ ID NO:2, SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, amino acids 1344-1351 of SEQ ID NO:3, SEQ ID NO:4, amino acids 7-432 of SEQ ID NO:4, amino acids 539-859 of SEQ ID NO:4, amino acids 869-1037 of SEQ ID NO:4, amino acids 1439-1684 of SEQ ID NO:4, amino acids 1722-1792 of SEQ ID NO:4, SEQ ID NO:5, amino acids 39-457 of SEQ ID NO:5, amino acids 563-884 of SEQ ID NO:5, amino acids 1147-1399 of SEQ ID NO:5, amino acids 1434-1506 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 3886-4048 of SEQ ID NO:5, amino acids 4433-4719 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, SEQ ID NO:6, amino acids 35-454 of SEQ ID NO:6, amino acids 561-881 of SEQ ID NO:6, amino acids 1143-1393 of SEQ ID NO:6, amino acids 1430-1503 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, amino acids 2383-2551 of SEQ ID NO:6, amino acids 2671-3045 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, SEQ ID NO:7, amino acids 32-450 of SEQ ID NO:7, amino acids 556-877 of SEQ ID NO:7, amino acids 887-1051 of SEQ ID NO:7, amino acids 1478-1790 of SEQ ID NO:7, amino acids 1810-2055 of SEQ ID NO:7, amino acids 2093-2164 of SEQ ID NO:7, amino acids 2165-2439 of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:11, and SEQ ID NO:22.
8. An isolated nucleic acid molecule according to either claim 1 or claim 3 comprising a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.
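The ranges recited in claim 8 read naturally as 1-based, inclusive positions within SEQ ID NO:1, and the first member of the group is a complement. The sketch below shows how such a range could be pulled out of a sequence held as a string. The helper names are hypothetical, the toy sequence is not SEQ ID NO:1, and reading "the complement" as the reverse complement is an assumption rather than something the claim states.

```python
# Translation table for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def range_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive range such as 'nucleotides 1900-3171'."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def subsequence(seq: str, start: int, end: int) -> str:
    """Return positions start..end of seq, using 1-based inclusive coordinates."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range falls outside the sequence")
    return seq[start - 1:end]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (assumed meaning of 'the complement')."""
    return seq.translate(_COMPLEMENT)[::-1]


# Toy stand-in sequence, for illustration only.
toy = "AACCGGTTAC"
fragment = subsequence(toy, 3, 6)
opposite_strand = reverse_complement(subsequence(toy, 1, 4))
```

Under the inclusive convention, `range_length(1900, 3171)` is 1272 nucleotides.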
9. An isolated nucleic acid molecule according to either claim 1 or claim 3, wherein said polypeptide comprises a β-ketoacyl-synthase domain comprising an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7.
10. An isolated nucleic acid molecule according to claim 9, wherein said nucleotide sequence is selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, and nucleotides 55028-56284 of SEQ ID NO:1.
11. An isolated nucleic acid molecule according to either claim 1 or claim 3, wherein said polypeptide comprises an acyltransferase domain comprising an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7.
12. An isolated nucleic acid molecule according to claim 11, wherein said nucleotide sequence is selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, and nucleotides 56600-57565 of SEQ ID NO:1.
13. An isolated nucleic acid molecule according to either claim 1 or claim 3, wherein said polypeptide comprises an enoyl reductase domain comprising an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7.
14. An isolated nucleic acid molecule according to claim 13, wherein said nucleotide sequence is selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, and nucleotides 59366-60304 of SEQ ID NO:1.
15. An isolated nucleic acid molecule according to either claim 1 or claim 3, wherein said polypeptide comprises an acyl carrier protein domain comprising an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7.
16. An isolated nucleic acid molecule according to claim 15, wherein said nucleotide sequence is selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, and nucleotides 61211-61426 of SEQ ID NO:1.
17. An isolated nucleic acid molecule according to either claim 1 or claim 3, wherein said polypeptide comprises a dehydratase domain comprising an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7.
18. An isolated nucleic acid molecule according to claim 17, wherein said nucleotide sequence is selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ ID NO:1.
19. An isolated nucleic acid molecule according to either claim 1 or claim 3, wherein said polypeptide comprises a β-ketoreductase domain comprising an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7.
20. An isolated nucleic acid molecule according to claim 19, wherein said nucleotide sequence is selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and nucleotides 60362-61099 of SEQ ID NO:1.
21. An isolated nucleic acid molecule according to either claim 1 or claim 3, wherein said polypeptide comprises a methyltransferase domain comprising amino acids 2671-3045 of SEQ ID NO:6.
22. An isolated nucleic acid molecule according to claim 21, wherein said nucleotide sequence is nucleotides 51534-52657 of SEQ ID NO:1.
23. An isolated nucleic acid molecule according to either claim 1 or claim 3, wherein said polypeptide comprises a thioesterase domain comprising amino acids 2165-2439 of SEQ ID NO:7.
24. An isolated nucleic acid molecule according to claim 23, wherein said nucleotide sequence is nucleotides 61427-62254 of SEQ ID NO:1.
25. An isolated nucleic acid molecule according to either claim 1 or claim 3, wherein said polypeptide comprises a non-ribosomal peptide synthetase comprising an amino acid sequence selected from the group consisting of: SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, and amino acids 1344-1351 of SEQ ID NO:3.
26. An isolated nucleic acid molecule according to claim 25, wherein said nucleotide sequence is selected from the group consisting of: nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID NO:1.
27. An isolated nucleic acid molecule according to any one of claims 1 to 26, wherein said nucleotide sequence is isolated from a myxobacterium.
28. An isolated nucleic acid molecule according to claim 27, wherein said myxobacterium is Sorangium cellulosum.
29. A chimeric gene comprising a heterologous promoter sequence operatively linked to a nucleic acid molecule according to any one of claims 1 to 28.
30. A recombinant vector comprising a chimeric gene according to claim 29.
31. A recombinant host cell comprising a chimeric gene according to claim 29.
32. The recombinant host cell of claim 31, which is a bacteria.
33. The recombinant host cell of claim 32, which is an Actinomycete.
34. The recombinant host cell of claim 33, which is Streptomyces.
35. A Bac clone comprising a nucleic acid molecule according to any one of claims 1 to 28.
36. The Bac clone of claim 35, which is
37. A method for heterologous expression of epothilone in a recombinant host, comprising: introducing a chimeric gene according to claim 29 into a host; and growing the host in conditions that allow biosynthesis of epothilone in the host.
38. A method for producing epothilone, comprising: expressing epothilone in a recombinant host by the method of claim 37; and extracting epothilone from the recombinant host.
39. An isolated polypeptide involved in the biosynthesis of epothilone, wherein said polypeptide comprises an amino acid sequence that is at least 90% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:2, amino acids 11-437 of SEQ ID NO:2, amino acids 543-864 of SEQ ID NO:2, amino acids 974-1273 of SEQ ID NO:2, amino acids 1314-1385 of SEQ ID NO:2, SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, amino acids 1344-1351 of SEQ ID NO:3, SEQ ID NO:4, amino acids 7-432 of SEQ ID NO:4, amino acids 539-859 of SEQ ID NO:4, amino acids 869-1037 of SEQ ID NO:4, amino acids 1439-1684 of SEQ ID NO:4, amino acids 1722-1792 of SEQ ID NO:4, SEQ ID NO:5, amino acids 39-457 of SEQ ID NO:5, amino acids 563-884 of SEQ ID NO:5, amino acids 1147-1399 of SEQ ID NO:5, amino acids 1434-1506 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 3886-4048 of SEQ ID NO:5, amino acids 4433-4719 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, SEQ ID NO:6, amino acids 35-454 of SEQ ID NO:6, amino acids 561-881 of SEQ ID NO:6, amino acids 1143-1393 of SEQ ID NO:6, amino acids 1430-1503 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, amino acids 2383-2551 of SEQ ID NO:6, amino acids 2671-3045 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, SEQ ID NO:7, amino acids 32-450 of SEQ ID NO:7, amino acids 556-877 of SEQ ID NO:7, amino acids 887-1051 of SEQ ID NO:7, amino acids 1478-1790 of SEQ ID NO:7, amino acids 1810-2055 of SEQ ID NO:7, amino acids 2093-2164 of SEQ ID NO:7, amino acids 2165-2439 of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:11, and SEQ ID NO:22.
40. An isolated polypeptide according to claim 39 comprising an amino acid sequence selected from the group consisting of: SEQ ID NO:2, amino acids 11-437 of SEQ ID NO:2, amino acids 543-864 of SEQ ID NO:2, amino acids 974-1273 of SEQ ID NO:2, amino acids 1314-1385 of SEQ ID NO:2, SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, amino acids 1344-1351 of SEQ ID NO:3, SEQ ID NO:4, amino acids 7-432 of SEQ ID NO:4, amino acids 539-859 of SEQ ID NO:4, amino acids 869-1037 of SEQ ID NO:4, amino acids 1439-1684 of SEQ ID NO:4, amino acids 1722-1792 of SEQ ID NO:4, SEQ ID NO:5, amino acids 39-457 of SEQ ID NO:5, amino acids 563-884 of SEQ ID NO:5, amino acids 1147-1399 of SEQ ID NO:5, amino acids 1434-1506 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 3886-4048 of SEQ ID NO:5, amino acids 4433-4719 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, SEQ ID NO:6, amino acids 35-454 of SEQ ID NO:6, amino acids 561-881 of SEQ ID NO:6, amino acids 1143-1393 of SEQ ID NO:6, amino acids 1430-1503 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, amino acids 2383-2551 of SEQ ID NO:6, amino acids 2671-3045 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, SEQ ID NO:7, amino acids 32-450 of SEQ ID NO:7, amino acids 556-877 of SEQ ID NO:7, amino acids 887-1051 of SEQ ID NO:7, amino acids 1478-1790 of SEQ ID NO:7, amino acids 1810-2055 of SEQ ID NO:7, amino acids 2093-2164 of SEQ ID NO:7, amino acids 2165-2439 of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:11, and SEQ ID NO:22.
41. An isolated polypeptide according to claim 39, wherein said polypeptide comprises a β-ketoacyl-synthase domain comprising an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7.
42. An isolated polypeptide according to claim 39, wherein said polypeptide comprises an acyltransferase domain comprising an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7.
43. An isolated polypeptide according to claim 39, wherein said polypeptide comprises an enoyl reductase domain comprising an amino acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7.
44. An isolated polypeptide according to claim 39, wherein said polypeptide comprises an acyl carrier protein domain comprising an amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7.
45. An isolated polypeptide according to claim 39, wherein said polypeptide comprises a dehydratase domain comprising an amino acid sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7.
46. An isolated polypeptide according to claim 39, wherein said polypeptide comprises a β-ketoreductase domain comprising an amino acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7.
47. An isolated polypeptide according to claim 39, wherein said polypeptide comprises a methyltransferase domain comprising amino acids 2671-3045 of SEQ ID NO:6.
48. An isolated polypeptide according to claim 39, wherein said polypeptide comprises a thioesterase domain comprising amino acids 2165-2439 of SEQ ID NO:7.
49. An isolated polypeptide according to claim 39, wherein said polypeptide comprises a non-ribosomal peptide synthetase comprising an amino acid sequence selected from the group consisting of: SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, and amino acids 1344-1351 of SEQ ID NO:3.
50. A recombinant host cell comprising a recombinantly expressed polypeptide according to any one of claims 39 to 49.
51. The recombinant host cell of claim 50, which is a bacteria.
52. The recombinant host cell of claim 51, which is an Actinomycete.
53. The recombinant host cell of claim 52, which is Streptomyces.
54. The isolated nucleic acid molecule according to either claim 1 or claim 3, substantially as hereinbefore described with reference to the examples.
55. The isolated polypeptide according to claim 39, substantially as hereinbefore described with reference to the examples.
DATED this 25th day of July 2002. Novartis AG by DAVIES COLLISON CAVE Patent Attorneys for the Applicant
AU46116/99A 1998-06-18 1999-06-16 Genes for the biosynthesis of epothilones Ceased AU753567B2 (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US9950498A 1998-06-18 1998-06-18
US09/099504 1998-06-18
US10163198P 1998-09-24 1998-09-24
US60/101631 1998-09-24
US11890699P 1999-02-05 1999-02-05
US60/118906 1999-02-05
PCT/EP1999/004171 WO1999066028A2 (en) 1998-06-18 1999-06-16 Genes for the biosynthesis of epothilones

Publications (2)

Publication Number Publication Date
AU4611699A AU4611699A (en) 2000-01-05
AU753567B2 AU753567B2 (en) 2002-10-24

Family

ID=27378840

Family Applications (1)

Application Number Title Priority Date Filing Date
AU46116/99A Ceased AU753567B2 (en) 1998-06-18 1999-06-16 Genes for the biosynthesis of epothilones

Country Status (16)

Country Link
EP (1) EP1088078A2 (en)
JP (3) JP2002518004A (en)
KR (1) KR100511233B1 (en)
CN (1) CN100374565C (en)
AU (1) AU753567B2 (en)
BR (1) BR9911349A (en)
CA (1) CA2329774A1 (en)
HU (1) HUP0102186A3 (en)
ID (1) ID29128A (en)
IL (3) IL139735A0 (en)
NO (2) NO20006195L (en)
NZ (1) NZ508326A (en)
PL (1) PL200157B1 (en)
SK (1) SK19242000A3 (en)
TR (1) TR200003759T2 (en)
WO (1) WO1999066028A2 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6242469B1 (en) 1996-12-03 2001-06-05 Sloan-Kettering Institute For Cancer Research Synthesis of epothilones, intermediates thereto, analogues and uses thereof
FR2775187B1 (en) 1998-02-25 2003-02-21 Novartis Ag USE OF EPOTHILONE B FOR THE MANUFACTURE OF AN ANTIPROLIFERATIVE PHARMACEUTICAL PREPARATION AND A COMPOSITION COMPRISING EPOTHILONE B AS AN IN VIVO ANTIPROLIFERATIVE AGENT
DE19846493A1 (en) * 1998-10-09 2000-04-13 Biotechnolog Forschung Gmbh DNA sequence coding for products involved in the biosynthesis of polyketide or heteropolyketide compounds, especially epothilone
AU768220B2 (en) 1998-11-20 2003-12-04 Kosan Biosciences, Inc. Recombinant methods and materials for producing epothilone and epothilone derivatives
US6410301B1 (en) 1998-11-20 2002-06-25 Kosan Biosciences, Inc. Myxococcus host cells for the production of epothilones
WO2001053533A2 (en) * 2000-01-21 2001-07-26 Kosan Biosciences, Inc. Method for cloning polyketide synthase genes
US6998256B2 (en) 2000-04-28 2006-02-14 Kosan Biosciences, Inc. Methods of obtaining epothilone D using crystallization and /or by the culture of cells in the presence of methyl oleate
ES2332727T3 (en) * 2000-04-28 2010-02-11 Kosan Biosciences, Inc. CRYSTALLINE EPOTHILONE D.
WO2002030356A2 (en) 2000-10-13 2002-04-18 The University Of Mississippi Synthesis of epothilones and related analogs
US7257562B2 (en) 2000-10-13 2007-08-14 Thallion Pharmaceuticals Inc. High throughput method for discovery of gene clusters
DK1483251T3 (en) 2002-03-12 2010-04-12 Bristol Myers Squibb Co C3-cyano-epothilone derivatives
AU2006211216B2 (en) * 2005-01-31 2011-02-03 Merck Sharp & Dohme Llc Purification process for plasmid DNA
AU2012211052A1 (en) 2011-01-28 2013-09-05 Amyris, Inc. Gel-encapsulated microcolony screening
EP2707722A1 (en) 2011-05-13 2014-03-19 Amyris, Inc. Methods and compositions for detecting microbial production of water-immiscible compounds
CA2879178C (en) 2012-08-07 2020-11-24 Amyris, Inc. Methods for stabilizing production of acetyl-coenzyme a derived compounds
US9410214B2 (en) 2013-03-15 2016-08-09 Amyris, Inc. Use of phosphoketolase and phosphotransacetylase for production of acetyl-coenzyme A derived compounds
EP3663392A1 (en) 2013-08-07 2020-06-10 Amyris, Inc. Methods for stabilizing production of acetyl-coenzyme a derived compounds
CN108474008A (en) 2015-06-25 2018-08-31 Amyris, Inc. Maltose dependence degron, maltose responsive promoter stabilize construct and its purposes in generating non-decomposition metabolic compounds
CN106916834B (en) * 2015-12-24 2022-08-05 Wuhan Hesheng Technology Co., Ltd. Biosynthetic gene cluster of compounds and application thereof
CN111138444B (en) * 2020-01-08 2022-05-03 Shandong University Epothilone B glucoside compounds and enzymatic preparation and application thereof

Citations (1)

Publication number Priority date Publication date Assignee Title
AU5483798A (en) * 1996-11-18 1998-06-10 Helmholtz-Zentrum Fuer Infektionsforschung Gmbh Epothilone c, d, e and f, production process, and their use as cytostatic as well as phytosanitary agents

Non-Patent Citations (1)

Title
SCHUPP T ET AL. J.BACTERIOLOGY 177(13): 3673-3679 (1995) *

Also Published As

Publication number Publication date
AU4611699A (en) 2000-01-05
JP2008092958A (en) 2008-04-24
SK19242000A3 (en) 2001-07-10
EP1088078A2 (en) 2001-04-04
ID29128A (en) 2001-08-02
JP2006061166A (en) 2006-03-09
WO1999066028A3 (en) 2000-06-29
KR100511233B1 (en) 2005-08-31
CN1305530A (en) 2001-07-25
TR200003759T2 (en) 2001-06-21
NO20006195L (en) 2001-02-16
WO1999066028A2 (en) 1999-12-23
HUP0102186A2 (en) 2001-10-28
IL139735A (en) 2009-06-15
NO20006195D0 (en) 2000-12-06
CA2329774A1 (en) 1999-12-23
NO20091055L (en) 2001-02-16
KR20010052962A (en) 2001-06-25
HUP0102186A3 (en) 2005-10-28
PL345579A1 (en) 2001-12-17
JP2002518004A (en) 2002-06-25
BR9911349A (en) 2001-03-13
IL190391A0 (en) 2008-11-03
IL139735A0 (en) 2002-02-10
CN100374565C (en) 2008-03-12
NZ508326A (en) 2003-10-31
PL200157B1 (en) 2008-12-31

Similar Documents

Publication Publication Date Title
US6383787B1 (en) Genes for the biosynthesis of epothilones
AU753567B2 (en) Genes for the biosynthesis of epothilones
US7172884B2 (en) Methods for the preparation, isolation and purification of epothilone B, and x-ray crystal structures of epothilone B
DK2271666T3 (en) NRPS-PKS GROUP AND ITS MANIPULATION AND APPLICABILITY
NZ521788A (en) Production of polyketides
KR20100049580A (en) Thiopeptide precursor protein, gene encoding it and uses thereof
JP2023012549A (en) Modified streptomyces fungicidicus isolates and use thereof
CN100374566C (en) Genes for the biosynthesis of epothilones
MXPA00012342A (en) Genes for the biosynthesis of epothilones
CZ20004693A3 (en) Isolated nucleic acid encoding polypeptide participating in biosynthesis of epothilone, chimeric gene, vector and host cells containing such nucleic acid
KR20050050146A (en) Genes and proteins for the biosynthesis of the glycopeptide antibiotic a40926
RU2265054C2 (en) Recombinant cell-host (variants) and bac clone
RU2234532C2 (en) Nucleic acid (variants), it using for expression of epotilones, polypeptide (variants), escherichia coli microorganism clone
CN100359014C (en) Novel epothilones compound and its preparation method and application
CN115247179B (en) Polyketide skeleton and biosynthetic gene cluster of post-modifier thereof and application thereof
KR20130097538A (en) Chejuenolide biosynthetic gene cluster from hahella chejuensis
Julien et al. Genetic Engineering of Myxobacterial Natural Product Biosynthetic Genes

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)