CN100374566C - Genes for the biosynthesis of epothilones - Google Patents


Info

Publication number
CN100374566C
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CNB2004100637938A
Other languages
Chinese (zh)
Other versions
CN1715414A (en)
Inventor
T·斯彻普
J·M·利根
I·莫尔纳
R·泽克尔
J·戈拉彻
D·西尔
Current Assignee
Novartis AG
Original Assignee
Novartis AG
Application filed by Novartis AG
Publication of CN1715414A
Application granted
Publication of CN100374566C
Anticipated expiration
Expired - Fee Related


Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P20/00 Technologies relating to chemical industry
    • Y02P20/50 Improvements relating to the production of bulk chemicals
    • Y02P20/52 Improvements relating to the production of bulk chemicals using catalysts, e.g. selective catalysts

Abstract

Nucleic acid molecules encoding polypeptides necessary for the biosynthesis of epothilone are isolated from Sorangium cellulosum. Methods are disclosed for producing epothilone in recombinant hosts transformed with the genes of the invention. In this way, epothilone can be produced in quantities large enough to allow its purification and use in pharmaceutical formulations, such as those for the treatment of cancer.

Description

Genes for the biosynthesis of epothilones
Field of the invention
The present invention relates generally to polyketides and the genes for their synthesis. In particular, the present invention relates to the isolation and identification of novel polyketide synthase and non-ribosomal peptide synthetase genes from Sorangium cellulosum that are necessary for the synthesis of epothilones A and B.
Background of the invention
Polyketides are compounds synthesized from two-carbon building units whose β-carbon always carries a keto group, hence the name polyketide. These compounds include many important antibiotics, immunosuppressants, cancer chemotherapeutic agents and other compounds with a broad range of biological properties. Their great structural diversity derives from the different lengths of the polyketide chain, the different side chains introduced (either as part of the two-carbon building units or after formation of the polyketide backbone), and the stereochemistry of these groups. The keto groups can also be reduced to hydroxyl or enoyl groups, or removed altogether. Each round of two-carbon addition is carried out by an enzyme complex called polyketide synthase (PKS), in a manner similar to fatty acid biosynthesis.
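The options for processing the keto group can be pictured with a toy model. This is an illustration, not a method from the patent: it assumes the textbook assignment of ketoreductase (KR), dehydratase (DH) and enoyl reductase (ER) domains to the three possible reductions (ER domains are also named later in this document), and the function name is hypothetical.

```python
def beta_carbon_state(domains: set) -> str:
    """Toy model of beta-keto processing after one two-carbon extension.

    KR reduces the beta-keto group to a hydroxyl, DH dehydrates the hydroxyl
    to an enoyl (C=C) group, and ER reduces the double bond to a saturated
    methylene. Each step acts on the product of the previous one, so
    processing stops at the first domain that is missing.
    """
    state = "keto"  # every extension round starts with a beta-keto group
    for domain, product in (("KR", "hydroxyl"), ("DH", "enoyl"), ("ER", "methylene")):
        if domain not in domains:
            break
        state = product
    return state


for module_domains in (set(), {"KR"}, {"KR", "DH"}, {"KR", "DH", "ER"}):
    print(sorted(module_domains), "->", beta_carbon_state(module_domains))
```

The ordered loop encodes the dependency between steps: in this model an ER domain without a preceding DH leaves the keto group untouched.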
The number of polyketide biosynthetic genes that have been isolated and sequenced is growing steadily. For example, U.S. Patent Nos. 5,639,949, 5,693,774 and 5,716,849, all incorporated herein by reference, describe genes for soraphen biosynthesis. See also Schupp et al., FEMS Microbiology Letters 159:201-207 (1998) and WO98/07868, which describe genes for rifamycin biosynthesis, and U.S. Patent No. 5,876,991, which describes genes for tylactone biosynthesis, all incorporated herein by reference. The encoded proteins generally fall into two classes, type I and type II. Type I proteins are multifunctional, with several catalytic domains that carry out different enzymatic steps covalently linked together (e.g., the PKSs for erythromycin, soraphen, rifamycin and avermectin; MacNeil et al., Industrial Microorganisms: Basic and Applied Molecular Genetics, Baltz et al., eds., American Society for Microbiology, Washington, D.C., pp. 245-256 (1993)), whereas type II proteins are monofunctional (Hutchinson et al., Industrial Microorganisms: Basic and Applied Molecular Genetics, Baltz et al., eds., American Society for Microbiology, Washington, D.C., pp. 203-216 (1993)).
For simpler polyketides such as actinorhodin (produced by Streptomyces coelicolor), the several rounds of two-carbon addition are carried out iteratively on a PKS enzyme encoded by a single set of PKS genes. In contrast, the synthesis of more complex compounds such as erythromycin and soraphen involves PKS enzymes organized into modules, each of which carries out one round of two-carbon addition (for a review, see Hopwood et al., Industrial Microorganisms: Basic and Applied Molecular Genetics, Baltz et al., eds., American Society for Microbiology, Washington, D.C., pp. 267-275 (1993)).
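The contrast between iterative and modular PKS organization can be sketched as two control-flow patterns. This is a toy model: the class and function names and the two-carbon starter value are hypothetical simplifications, and the sketch tracks only backbone length and a per-round processing label.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chain:
    carbons: int                                     # backbone carbons so far
    rounds: List[str] = field(default_factory=list)  # processing applied in each round


def extend(chain: Chain, processing: str) -> Chain:
    """One round of chain extension: add a two-carbon unit."""
    return Chain(chain.carbons + 2, chain.rounds + [processing])


def iterative_pks(starter_carbons: int, n_rounds: int, processing: str) -> Chain:
    # One set of PKS enzymes acts repeatedly on the same growing chain,
    # so every round is handled the same way.
    chain = Chain(starter_carbons)
    for _ in range(n_rounds):
        chain = extend(chain, processing)
    return chain


def modular_pks(starter_carbons: int, modules: List[str]) -> Chain:
    # Each module acts once, in order, and then hands the chain on;
    # the number and order of modules fix the length and processing pattern.
    chain = Chain(starter_carbons)
    for processing in modules:
        chain = extend(chain, processing)
    return chain


print(iterative_pks(2, 3, "keto"))
print(modular_pks(2, ["hydroxyl", "methylene", "keto"]))
```

Both calls yield an eight-carbon backbone; only the modular version lets each round be processed differently.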
Complex polyketides and secondary metabolites often contain substructures derived from amino acids rather than from simple carboxylic acid derivatives. These building units are incorporated by non-ribosomal polypeptide synthetases (NRPSs). NRPSs are multienzymes organized into modules, each responsible for adding one amino acid building unit (with additional processing, if required). The NRPS activates the amino acid by forming an aminoacyl adenylate and captures the activated amino acid on the thiol group of the phosphopantetheine prosthetic group of a peptidyl carrier protein (PCP) domain. The NRPS then modifies the amino acid by epimerization, N-methylation or cyclization (if required) and catalyzes peptide bond formation between the enzyme-bound amino acids. NRPSs are responsible for the biosynthesis of peptide secondary metabolites such as cyclosporin, can provide a polyketide chain-terminating unit as in rapamycin, or can form a mixed system with a PKS as in yersiniabactin biosynthesis.
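The module cycle described above (adenylation, loading onto the peptidyl carrier protein, optional modification, peptide bond formation) can be traced with a toy assembly line. The class and function names and the Ala/Val substrates in the usage line are hypothetical; the sketch follows only the order of steps given in the paragraph and ignores everything else, such as chain release.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NrpsModule:
    amino_acid: str
    modification: Optional[str] = None  # e.g. "epimerization", "N-methylation", "cyclization"


def run_nrps(modules: List[NrpsModule]) -> List[str]:
    """Toy NRPS assembly line; returns a log whose last entry is the product."""
    log: List[str] = []
    peptide: List[str] = []
    for module in modules:
        aa = module.amino_acid
        log.append(f"activate {aa} as aminoacyl adenylate")
        log.append(f"load {aa} onto the PCP phosphopantetheine thiol")
        residue = aa
        if module.modification:
            log.append(f"{module.modification} of {aa}")
            residue = f"{module.modification}({aa})"
        if peptide:  # the first residue has no upstream partner to bond with
            log.append(f"peptide bond {peptide[-1]} -> {residue}")
        peptide.append(residue)
    log.append("product: " + " | ".join(peptide))
    return log


for step in run_nrps([NrpsModule("Ala"), NrpsModule("Val", "N-methylation")]):
    print(step)
```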
Epothilones A and B are 16-membered macrocyclic polyketides with an acylcysteine-derived starter unit, produced by the myxobacterium Sorangium cellulosum strain So ce90 (Gerth et al., J. Antibiotics 49:560-563 (1996), incorporated herein by reference). The structures of epothilones A and B are as follows (R is hydrogen in epothilone A and methyl in epothilone B):
[Figure: structures of epothilone A (R = H) and epothilone B (R = CH3)]
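The only structural difference stated here is the R group. As a worked check, the sketch below derives the molecular formula of epothilone B from that of epothilone A by swapping R = H for R = CH3, and shows that the mass difference is one CH2 unit. The epothilone A formula (C26H39NO6S) and the rounded atomic weights are inputs supplied for this example, not values given in the patent text.

```python
from collections import Counter

# Rounded standard atomic weights (g/mol); supplied for this example.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def molar_mass(formula: Counter) -> float:
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


def hill(formula: Counter) -> str:
    """Hill-order formula string: C, then H, then the other elements alphabetically."""
    order = ["C", "H"] + sorted(e for e in formula if e not in ("C", "H"))
    return "".join(e + (str(formula[e]) if formula[e] > 1 else "") for e in order if formula[e])


epothilone_a = Counter({"C": 26, "H": 39, "N": 1, "O": 6, "S": 1})  # R = H

# Epothilone B: R = H becomes R = CH3, i.e. lose one H, gain one C and three H.
epothilone_b = epothilone_a - Counter({"H": 1}) + Counter({"C": 1, "H": 3})

print(hill(epothilone_b), round(molar_mass(epothilone_b) - molar_mass(epothilone_a), 3))
```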
Epothilones have a narrow antifungal spectrum and, in particular, show high cytotoxicity in animal cell cultures (see Höfle et al., German Patent 4138042 (1993), incorporated herein by reference). Of great importance, epothilones mimic the biological effects of taxol both in vivo and in cultured cells (Bollag et al., Cancer Research 55:2325-2333 (1995), incorporated herein by reference). Taxol and taxotere, which stabilize cellular microtubules, are cancer chemotherapeutic agents with significant activity against various human solid tumors (Rowinsky et al., J. Natl. Cancer Inst. 83:1778-1781 (1991)). Competition studies have shown that epothilones are competitive inhibitors of taxol binding to microtubules, consistent with the interpretation that they share the same microtubule binding site and have a microtubule affinity similar to that of taxol. However, epothilones have a significant advantage over taxol: their potency against multidrug-resistant cell lines is reduced much less than that of taxol (Bollag et al. (1995)). Furthermore, epothilones are exported from cells by P-glycoprotein less efficiently than taxol (Gerth et al. (1996)). In addition, several epothilone analogs have been synthesized that, as shown by their enhanced ability to induce microtubule polymerization and stabilization, have higher cytotoxic activity than epothilone A or epothilone B (WO98/25929, incorporated herein by reference).
Although epothilones are promising anticancer agents, difficulties in producing these compounds currently limit their commercial potential. The compounds are too complex for industrial-scale chemical synthesis and must therefore be produced by fermentation. Techniques for the genetic manipulation of myxobacteria such as Sorangium cellulosum are described in U.S. Patent No. 5,686,295, incorporated herein by reference. However, Sorangium cellulosum is difficult to ferment, and epothilone yields are therefore low. Recombinant production of epothilones in heterologous hosts better suited to fermentation could solve the current production problems. However, the genes encoding the polypeptides responsible for epothilone biosynthesis have not been isolated to date. Moreover, the epothilone-producing strain So ce90 also produces at least one other polyketide, spirangien, which may further complicate the specific isolation of the genes responsible for epothilone biosynthesis.
Therefore, in view of the above, one object of the present invention is to isolate the genes involved in epothilone biosynthesis, in particular from Sorangium/Polyangium myxobacteria, namely the genes involved in the synthesis of epothilones A and B in Sorangium cellulosum strain So ce90. Another object of the present invention is to provide methods for the recombinant production of epothilones for use in anticancer formulations.
Summary of the invention
To achieve the above and other objects, the present invention has surprisingly overcome the difficulties set out above and provides, for the first time, a nucleic acid molecule comprising a nucleotide sequence encoding at least one polypeptide involved in epothilone biosynthesis. In preferred embodiments, the nucleotide sequence is isolated from a species of fruiting myxobacteria, most preferably from Sorangium cellulosum.
In another preferred embodiment, the invention provides an isolated nucleic acid molecule comprising a nucleotide sequence encoding at least one polypeptide involved in epothilone biosynthesis, wherein the polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: SEQ ID NO:2; amino acids 11-437, 543-864, 974-1273 and 1314-1385 of SEQ ID NO:2; SEQ ID NO:3; amino acids 72-81, 118-125, 199-212, 353-363, 549-565, 588-603, 669-684, 815-821, 868-892, 903-912, 918-940, 1268-1274, 1285-1297, 973-1256 and 1344-1351 of SEQ ID NO:3; SEQ ID NO:4; amino acids 7-432, 539-859, 869-1037, 1439-1684 and 1722-1792 of SEQ ID NO:4; SEQ ID NO:5; amino acids 39-457, 563-884, 1147-1399, 1434-1506, 1524-1950, 2056-2377, 2645-2895, 2932-3005, 3024-3449, 3555-3876, 3886-4048, 4433-4719, 4729-4974, 5010-5082, 5103-5525, 5631-5951, 5964-6132, 6542-6837, 6857-7101 and 7140-7211 of SEQ ID NO:5; SEQ ID NO:6; amino acids 35-454, 561-881, 1143-1393, 1430-1503, 1522-1946, 2053-2373, 2383-2551, 2671-3045, 3392-3636 and 3673-3745 of SEQ ID NO:6; SEQ ID NO:7; amino acids 32-450, 556-877, 887-1051, 1478-1790, 1810-2055, 2093-2164 and 2165-2439 of SEQ ID NO:7; SEQ ID NO:8; SEQ ID NO:10; SEQ ID NO:11; and SEQ ID NO:22.
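Each member of the group above is either a whole sequence or a residue interval within one. The sketch below shows how such an interval can be represented and applied, assuming the usual sequence-listing convention that positions are 1-based and both endpoints are included (the text does not state this explicitly). The class name and the short demonstration sequence are hypothetical; the SEQ ID listings themselves are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueRange:
    seq_id: int  # the "SEQ ID NO" the interval refers to
    start: int   # first residue, 1-based
    end: int     # last residue, inclusive

    def length(self) -> int:
        return self.end - self.start + 1

    def extract(self, sequence: str) -> str:
        """Return residues start..end of `sequence` (1-based, inclusive)."""
        if not 1 <= self.start <= self.end <= len(sequence):
            raise ValueError(
                f"{self.start}-{self.end} lies outside a sequence of length {len(sequence)}"
            )
        return sequence[self.start - 1 : self.end]  # shift to 0-based, half-open slicing


# Two intervals recited above.
first = ResidueRange(seq_id=2, start=11, end=437)
second = ResidueRange(seq_id=3, start=72, end=81)
print(first.length(), second.length())

# Extraction on a made-up sequence, only to show the indexing convention.
print(ResidueRange(seq_id=0, start=3, end=5).extract("MKTAYIAKQR"))
```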
In a more preferred embodiment, the invention provides an isolated nucleic acid molecule comprising a nucleotide sequence encoding at least one polypeptide involved in epothilone biosynthesis, wherein the polypeptide comprises an amino acid sequence selected from the group consisting of: SEQ ID NO:2; amino acids 11-437, 543-864, 974-1273 and 1314-1385 of SEQ ID NO:2; SEQ ID NO:3; amino acids 72-81, 118-125, 199-212, 353-363, 549-565, 588-603, 669-684, 815-821, 868-892, 903-912, 918-940, 1268-1274, 1285-1297, 973-1256 and 1344-1351 of SEQ ID NO:3; SEQ ID NO:4; amino acids 7-432, 539-859, 869-1037, 1439-1684 and 1722-1792 of SEQ ID NO:4; SEQ ID NO:5; amino acids 39-457, 563-884, 1147-1399, 1434-1506, 1524-1950, 2056-2377, 2645-2895, 2932-3005, 3024-3449, 3555-3876, 3886-4048, 4433-4719, 4729-4974, 5010-5082, 5103-5525, 5631-5951, 5964-6132, 6542-6837, 6857-7101 and 7140-7211 of SEQ ID NO:5; SEQ ID NO:6; amino acids 35-454, 561-881, 1143-1393, 1430-1503, 1522-1946, 2053-2373, 2383-2551, 2671-3045, 3392-3636 and 3673-3745 of SEQ ID NO:6; SEQ ID NO:7; amino acids 32-450, 556-877, 887-1051, 1478-1790, 1810-2055, 2093-2164 and 2165-2439 of SEQ ID NO:7; SEQ ID NO:8; SEQ ID NO:10; SEQ ID NO:11; and SEQ ID NO:22.
In another preferred embodiment, the invention provides an isolated nucleic acid molecule comprising a nucleotide sequence encoding at least one polypeptide involved in epothilone biosynthesis, wherein the nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1; and nucleotides 3415-5556, 7610-11875, 7643-8920, 9236-10201, 10529-11428, 11549-11764, 11872-16104, 12085-12114, 12223-12246, 12466-12507, 12928-12960, 13516-13566, 13633-13680, 13876-13923, 14313-14334, 14473-14547, 14578-14607, 14623-14692, 15673-15693, 15724-15762, 14788-15639, 15901-15924, 16251-21749, 16269-17546, 17865-18827, 18855-19361, 20565-21302, 21414-21626, 21746-43519, 21860-23116, 23431-24397, 25184-25942, 26045-26263, 26318-27595, 27911-28876, 29678-30429, 30539-30759, 30815-32092, 32408-33373, 33401-33889, 35042-35902, 35930-36667, 36773-36991, 37052-38320, 38636-39598, 39635-40141, 41369-42256, 42314-43048, 43163-43378, 43524-54920, 43626-44885, 45204-46166, 46950-47702, 47811-48032, 48087-49361, 49680-50642, 50670-51176, 51534-52657, 53697-54431, 54540-54758, 54935-62254, 55028-56284, 56600-57565, 57593-58087, 59366-60304, 60362-61099, 61211-61426, 61427-62254, 62369-63628, 67334-68251 and 1-68750 of SEQ ID NO:1.
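Most members of this group are plain intervals of SEQ ID NO:1, but the first is the complement of an interval, that is, a sequence read from the opposite strand. The sketch assumes 1-based inclusive coordinates and takes "complement" to mean the reverse complement written 5' to 3', the usual reading for a gene on the other strand. Function names and the demonstration sequence are hypothetical.

```python
# Base-pairing table for both upper- and lower-case input.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def subsequence(seq: str, start: int, end: int) -> str:
    """Nucleotides start..end of seq (1-based, inclusive)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"{start}-{end} lies outside a sequence of length {len(seq)}")
    return seq[start - 1 : end]


def reverse_complement(seq: str) -> str:
    """Complementary strand, read 5' to 3' (complement each base, then reverse)."""
    return seq.translate(_COMPLEMENT)[::-1]


def complement_of_range(seq: str, start: int, end: int) -> str:
    """E.g. 'the complement of nucleotides 1900-3171' of a longer sequence."""
    return reverse_complement(subsequence(seq, start, end))


toy = "AAACCCGGGT"  # made-up sequence; SEQ ID NO:1 is not reproduced here
print(subsequence(toy, 2, 5), complement_of_range(toy, 2, 5))
```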
In particularly preferred embodiments, the invention provides a nucleic acid molecule comprising a nucleotide sequence encoding at least one polypeptide involved in epothilone biosynthesis, wherein the nucleotide sequence is selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1; and nucleotides 3415-5556, 7610-11875, 7643-8920, 9236-10201, 10529-11428, 11549-11764, 11872-16104, 12085-12114, 12223-12246, 12466-12507, 12928-12960, 13516-13566, 13633-13680, 13876-13923, 14313-14334, 14473-14547, 14578-14607, 14623-14692, 15673-15693, 15724-15762, 14788-15639, 15901-15924, 16251-21749, 16269-17546, 17865-18827, 18855-19361, 20565-21302, 21414-21626, 21746-43519, 21860-23116, 23431-24397, 25184-25942, 26045-26263, 26318-27595, 27911-28876, 29678-30429, 30539-30759, 30815-32092, 32408-33373, 33401-33889, 35042-35902, 35930-36667, 36773-36991, 37052-38320, 38636-39598, 39635-40141, 41369-42256, 42314-43048, 43163-43378, 43524-54920, 43626-44885, 45204-46166, 46950-47702, 47811-48032, 48087-49361, 49680-50642, 50670-51176, 51534-52657, 53697-54431, 54540-54758, 54935-62254, 55028-56284, 56600-57565, 57593-58087, 59366-60304, 60362-61099, 61211-61426, 61427-62254, 62369-63628, 67334-68251 and 1-68750 of SEQ ID NO:1.
In another preferred embodiment, the invention provides an isolated nucleic acid molecule comprising a nucleotide sequence encoding at least one polypeptide involved in epothilone biosynthesis, wherein the nucleotide sequence comprises a nucleotide segment of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) contiguous base pairs that is identical to a corresponding portion of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) contiguous base pairs of a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1; and nucleotides 3415-5556, 7610-11875, 7643-8920, 9236-10201, 10529-11428, 11549-11764, 11872-16104, 12085-12114, 12223-12246, 12466-12507, 12928-12960, 13516-13566, 13633-13680, 13876-13923, 14313-14334, 14473-14547, 14578-14607, 14623-14692, 15673-15693, 15724-15762, 14788-15639, 15901-15924, 16251-21749, 16269-17546, 17865-18827, 18855-19361, 20565-21302, 21414-21626, 21746-43519, 21860-23116, 23431-24397, 25184-25942, 26045-26263, 26318-27595, 27911-28876, 29678-30429, 30539-30759, 30815-32092, 32408-33373, 33401-33889, 35042-35902, 35930-36667, 36773-36991, 37052-38320, 38636-39598, 39635-40141, 41369-42256, 42314-43048, 43163-43378, 43524-54920, 43626-44885, 45204-46166, 46950-47702, 47811-48032, 48087-49361, 49680-50642, 50670-51176, 51534-52657, 53697-54431, 54540-54758, 54935-62254, 55028-56284, 56600-57565, 57593-58087, 59366-60304, 60362-61099, 61211-61426, 61427-62254, 62369-63628, 67334-68251 and 1-68750 of SEQ ID NO:1.
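The fragment language (a segment of 20, 25, ... or 50 contiguous base pairs identical to a corresponding portion of a listed sequence) amounts to an exact shared-substring test. A minimal sketch, assuming exact identity on one strand only and using a made-up sequence in place of SEQ ID NO:1: collect every k-mer of the reference and look for any query k-mer among them. Function names are hypothetical.

```python
def kmers(seq: str, k: int) -> set:
    """All contiguous substrings of length k."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def shares_segment(query: str, reference: str, k: int = 20) -> bool:
    """True if some k contiguous bases of `query` are identical to k
    contiguous bases of `reference` (exact match, same strand only)."""
    if len(query) < k or len(reference) < k:
        return False
    ref_kmers = kmers(reference, k)  # set lookup keeps each check constant-time on average
    return any(query[i : i + k] in ref_kmers for i in range(len(query) - k + 1))


reference = "ATGACCGTTAGCCTAGGACTTCAGGCATCG"  # made-up 30 bp reference
query = "GGGG" + reference[4:24] + "TTTT"       # carries a 20 bp segment of it
print(shares_segment(query, reference))           # default k = 20
print(shares_segment(reference[:19], reference))  # only 19 bp long
```

Passing `k=25`, `k=50` and so on checks the longer segment lengths named in the text.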
The present invention also provides chimeric genes comprising a heterologous promoter sequence operably linked to a nucleic acid molecule of the invention. Further, the invention provides recombinant vectors comprising such a chimeric gene, wherein the vector can be stably transformed into a host cell. The invention also provides recombinant host cells comprising such a chimeric gene, wherein the host cell is capable of expressing a nucleotide sequence encoding at least one polypeptide necessary for epothilone biosynthesis. In preferred embodiments, the recombinant host cell is a bacterium of the order Actinomycetales, and in more preferred embodiments it is a Streptomyces strain. In other embodiments, the recombinant host cell is any other bacterium suitable for fermentation, such as Pseudomonas or E. coli. Finally, the invention provides Bac clones comprising a nucleic acid molecule of the invention, preferably Bac clone pEP015.
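A chimeric gene as defined here pairs a coding sequence of the invention with a promoter from a different source. The sketch models only that definition. The class names, the placeholder sequences, and the rule that "heterologous" means "different natural source" are illustrative assumptions, and a real expression construct also needs elements the sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    source: str    # where the part occurs naturally (organism or native gene)
    sequence: str


@dataclass(frozen=True)
class ChimericGene:
    promoter: Part
    coding_sequence: Part

    def __post_init__(self) -> None:
        # Simplified reading of "heterologous": the promoter and the coding
        # sequence must not come from the same natural source.
        if self.promoter.source == self.coding_sequence.source:
            raise ValueError("promoter is not heterologous to the coding sequence")

    def assemble(self) -> str:
        # Simplification: promoter placed directly upstream of the coding sequence,
        # ignoring ribosome binding sites, terminators and the vector backbone.
        return self.promoter.sequence + self.coding_sequence.sequence


gene = ChimericGene(
    promoter=Part("host promoter", "Streptomyces host", "GGCCTT"),            # placeholder
    coding_sequence=Part("epothilone gene", "Sorangium cellulosum", "ATGCCC"),  # placeholder
)
print(gene.assemble())
```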
In another aspect, the invention provides an isolated nucleic acid molecule comprising a nucleotide sequence encoding an epothilone synthase domain.
According to one embodiment, the epothilone synthase domain is a β-ketoacyl synthase (KS) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2; amino acids 7-432 of SEQ ID NO:4; amino acids 39-457, 1524-1950, 3024-3449 and 5103-5525 of SEQ ID NO:5; amino acids 35-454 and 1522-1946 of SEQ ID NO:6; and amino acids 32-450 of SEQ ID NO:7. In this embodiment, the KS domain preferably comprises an amino acid sequence selected from that same group. Also in this embodiment, the nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of nucleotides 7643-8920, 16269-17546, 21860-23116, 26318-27595, 30815-32092, 37052-38320, 43626-44885, 48087-49361 and 55028-56284 of SEQ ID NO:1. More preferably, the nucleotide sequence comprises a nucleotide segment of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) contiguous base pairs that is identical to a corresponding portion of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) contiguous base pairs of one of those nucleotide sequences. Most preferably, the nucleotide sequence is one of those nucleotide sequences.
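Because each codon is three nucleotides, the KS nucleotide intervals and KS residue intervals recited above can be compared by length. The sketch pairs the two lists by their order of appearance, which is an assumption (the text does not state the pairing), and uses 1-based inclusive coordinates; function names are hypothetical.

```python
def codon_count(nt_start: int, nt_end: int) -> int:
    """Codons in nucleotides nt_start..nt_end (1-based, inclusive)."""
    span = nt_end - nt_start + 1
    if span % 3:
        raise ValueError(f"{nt_start}-{nt_end} is not a whole number of codons")
    return span // 3


def residue_count(aa_start: int, aa_end: int) -> int:
    return aa_end - aa_start + 1


# KS intervals recited above, paired by list position (assumed pairing).
KS_NT = [(7643, 8920), (16269, 17546), (21860, 23116), (26318, 27595), (30815, 32092),
         (37052, 38320), (43626, 44885), (48087, 49361), (55028, 56284)]
KS_AA = [(2, 11, 437), (4, 7, 432), (5, 39, 457), (5, 1524, 1950), (5, 3024, 3449),
         (5, 5103, 5525), (6, 35, 454), (6, 1522, 1946), (7, 32, 450)]

for (nt_s, nt_e), (seq_id, aa_s, aa_e) in zip(KS_NT, KS_AA):
    codons, residues = codon_count(nt_s, nt_e), residue_count(aa_s, aa_e)
    print(f"nt {nt_s}-{nt_e}: {codons} codons | "
          f"SEQ ID NO:{seq_id} {aa_s}-{aa_e}: {residues} residues | diff {codons - residues}")
```

With these inputs, seven pairs agree exactly and two (the SEQ ID NO:2 interval and the second SEQ ID NO:5 interval) differ by one residue.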
According to another embodiment, the epothilone synthase domain is an acyltransferase (AT) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884, 2056-2377, 3555-3876 and 5631-5951 of SEQ ID NO:5, amino acids 561-881 and 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. Preferably, according to this embodiment, the AT domain comprises an amino acid sequence selected from the group consisting of amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884, 2056-2377, 3555-3876 and 5631-5951 of SEQ ID NO:5, amino acids 561-881 and 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. Further, according to this embodiment, the nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of nucleotides 9236-10201, 17865-18827, 23431-24397, 27911-28876, 32408-33373, 38636-39598, 45204-46166, 49680-50642 and 56600-57565 of SEQ ID NO:1. More preferably, according to this embodiment, the nucleotide sequence comprises a segment of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) consecutive base pairs that is identical to a corresponding segment of the same number of consecutive base pairs in a nucleotide sequence selected from the group consisting of nucleotides 9236-10201, 17865-18827, 23431-24397, 27911-28876, 32408-33373, 38636-39598, 45204-46166, 49680-50642 and 56600-57565 of SEQ ID NO:1. Most preferably, according to this embodiment, the nucleotide sequence is selected from the group consisting of nucleotides 9236-10201, 17865-18827, 23431-24397, 27911-28876, 32408-33373, 38636-39598, 45204-46166, 49680-50642 and 56600-57565 of SEQ ID NO:1.
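Each domain is given both as a residue range of a protein (SEQ ID NOs:2 and 4-7) and as a nucleotide range of SEQ ID NO:1. The two can be related through the start coordinate of the encoding ORF, as listed in the description of the Sequence Listing. The minimal Python sketch below makes several assumptions: forward-strand ORFs, 1-based inclusive coordinates, and a first codon that begins exactly at the listed ORF start. `ORF_START` and `aa_to_nt` are hypothetical names. For example, the sketch reproduces the AT ranges of EPOS A and EPOS E. Use it as a consistency aid, not as a replacement for the sequence listing.

```python
# Start coordinates on SEQ ID NO:1 of the ORFs encoding SEQ ID NOs:2 and 4-7
# (epoA, epoB, epoC, epoD, epoE), as given in the sequence descriptions.
ORF_START = {2: 7610, 4: 16251, 5: 21746, 6: 43524, 7: 54935}


def aa_to_nt(seq_id: int, aa_start: int, aa_end: int) -> tuple[int, int]:
    """Map residues aa_start..aa_end of a protein to codons on SEQ ID NO:1.

    Assumes a forward-strand ORF whose first codon begins at ORF_START[seq_id]
    and 1-based, inclusive coordinates on both sequences.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("invalid residue range")
    orf = ORF_START[seq_id]
    nt_start = orf + 3 * (aa_start - 1)  # first base of codon aa_start
    nt_end = orf + 3 * aa_end - 1        # last base of codon aa_end
    return nt_start, nt_end


# AT domain of EPOS A: amino acids 543-864 of SEQ ID NO:2
print(aa_to_nt(2, 543, 864))  # (9236, 10201)
```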
According to another embodiment, the epothilone synthase domain is an enoylreductase (ER) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 and 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. Preferably, according to this embodiment, the ER domain comprises an amino acid sequence selected from the group consisting of amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 and 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. Further, according to this embodiment, the nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of nucleotides 10529-11428, 35042-35902, 41369-42256 and 59366-60304 of SEQ ID NO:1. More preferably, according to this embodiment, the nucleotide sequence comprises a segment of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) consecutive base pairs that is identical to a corresponding segment of the same number of consecutive base pairs in a nucleotide sequence selected from the group consisting of nucleotides 10529-11428, 35042-35902, 41369-42256 and 59366-60304 of SEQ ID NO:1. Most preferably, according to this embodiment, the nucleotide sequence is selected from the group consisting of nucleotides 10529-11428, 35042-35902, 41369-42256 and 59366-60304 of SEQ ID NO:1.
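The "more preferably" criterion requires a run of 20 (or 25 to 50) consecutive base pairs that is identical to a run in one of the reference ranges. The minimal Python sketch below tests for such a shared run using a set of k-mers. It reads "corresponding" loosely as "occurring anywhere in the reference, on the same strand", which is an assumption. A stricter, position-by-position reading would require an alignment first. `shares_identical_segment` is a hypothetical helper, and the sequences in the usage lines are toy data rather than epothilone sequences.

```python
def shares_identical_segment(query: str, reference: str, k: int = 20) -> bool:
    """True if query contains k consecutive bases that also occur in reference.

    Exact, same-strand matching only; no reverse complement is considered.
    """
    if k < 1:
        raise ValueError("k must be positive")
    query, reference = query.upper(), reference.upper()
    if len(query) < k or len(reference) < k:
        return False
    # Every k-long window of the reference, stored for O(1) membership tests.
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(query[i:i + k] in ref_kmers for i in range(len(query) - k + 1))


reference = "GATTACA" * 5                     # toy 35-bp reference
query = "TTTT" + reference[3:23] + "CCCC"     # embeds a 20-bp stretch of it
print(shares_identical_segment(query, reference))        # True
print(shares_identical_segment(query, reference, k=25))  # False
```

A set of k-mers keeps the check linear in the combined sequence length. This scales well enough for domain-sized ranges of a few hundred to a few thousand base pairs.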
According to another embodiment, the epothilone synthase domain is an acyl carrier protein (ACP) domain, wherein the polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506, 2932-3005, 5010-5082 and 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 and 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7. Preferably, according to this embodiment, the ACP domain comprises an amino acid sequence selected from the group consisting of amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506, 2932-3005, 5010-5082 and 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 and 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7. Further, according to this embodiment, the nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of nucleotides 11549-11764, 21414-21626, 26045-26263, 30539-30759, 36773-36991, 43163-43378, 47811-48032, 54540-54758 and 61211-61426 of SEQ ID NO:1. More preferably, according to this embodiment, the nucleotide sequence comprises a segment of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) consecutive base pairs that is identical to a corresponding segment of the same number of consecutive base pairs in a nucleotide sequence selected from the group consisting of nucleotides 11549-11764, 21414-21626, 26045-26263, 30539-30759, 36773-36991, 43163-43378, 47811-48032, 54540-54758 and 61211-61426 of SEQ ID NO:1. Most preferably, according to this embodiment, the nucleotide sequence is selected from the group consisting of nucleotides 11549-11764, 21414-21626, 26045-26263, 30539-30759, 36773-36991, 43163-43378, 47811-48032, 54540-54758 and 61211-61426 of SEQ ID NO:1.
According to another embodiment, the epothilone synthase domain is a dehydratase (DH) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 and 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7. Preferably, according to this embodiment, the DH domain comprises an amino acid sequence selected from the group consisting of amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 and 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7. Further, according to this embodiment, the nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of nucleotides 18855-19361, 33401-33889, 39635-40141, 50670-51176 and 57593-58087 of SEQ ID NO:1. More preferably, according to this embodiment, the nucleotide sequence comprises a segment of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) consecutive base pairs that is identical to a corresponding segment of the same number of consecutive base pairs in a nucleotide sequence selected from the group consisting of nucleotides 18855-19361, 33401-33889, 39635-40141, 50670-51176 and 57593-58087 of SEQ ID NO:1. Most preferably, according to this embodiment, the nucleotide sequence is selected from the group consisting of nucleotides 18855-19361, 33401-33889, 39635-40141, 50670-51176 and 57593-58087 of SEQ ID NO:1.
According to another embodiment, the epothilone synthase domain is a β-ketoreductase (KR) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399, 2645-2895, 4729-4974 and 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 and 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7. Preferably, according to this embodiment, the KR domain comprises an amino acid sequence selected from the group consisting of amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399, 2645-2895, 4729-4974 and 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 and 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7. Further, according to this embodiment, the nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of nucleotides 20565-21302, 25184-25942, 29678-30429, 35930-36667, 42314-43048, 46950-47702, 53697-54431 and 60362-61099 of SEQ ID NO:1. More preferably, according to this embodiment, the nucleotide sequence comprises a segment of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) consecutive base pairs that is identical to a corresponding segment of the same number of consecutive base pairs in a nucleotide sequence selected from the group consisting of nucleotides 20565-21302, 25184-25942, 29678-30429, 35930-36667, 42314-43048, 46950-47702, 53697-54431 and 60362-61099 of SEQ ID NO:1. Most preferably, according to this embodiment, the nucleotide sequence is selected from the group consisting of nucleotides 20565-21302, 25184-25942, 29678-30429, 35930-36667, 42314-43048, 46950-47702, 53697-54431 and 60362-61099 of SEQ ID NO:1.
According to another embodiment, the epothilone synthase domain is a methyltransferase (MT) domain comprising an amino acid sequence substantially similar to amino acids 2671-3045 of SEQ ID NO:6. Preferably, according to this embodiment, the MT domain comprises amino acids 2671-3045 of SEQ ID NO:6. Further, according to this embodiment, the nucleotide sequence is preferably substantially similar to nucleotides 51534-52657 of SEQ ID NO:1. More preferably, according to this embodiment, the nucleotide sequence comprises a segment of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) consecutive base pairs that is identical to a corresponding segment of the same number of consecutive base pairs in nucleotides 51534-52657 of SEQ ID NO:1. Most preferably, according to this embodiment, the nucleotide sequence is nucleotides 51534-52657 of SEQ ID NO:1.
According to another embodiment, the epothilone synthase domain is a thioesterase (TE) domain comprising an amino acid sequence substantially similar to amino acids 2165-2439 of SEQ ID NO:7. Preferably, according to this embodiment, the TE domain comprises amino acids 2165-2439 of SEQ ID NO:7. Further, according to this embodiment, the nucleotide sequence is preferably substantially similar to nucleotides 61427-62254 of SEQ ID NO:1. More preferably, according to this embodiment, the nucleotide sequence comprises a segment of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) consecutive base pairs that is identical to a corresponding segment of the same number of consecutive base pairs in nucleotides 61427-62254 of SEQ ID NO:1. Most preferably, according to this embodiment, the nucleotide sequence is nucleotides 61427-62254 of SEQ ID NO:1.
In another aspect, the invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes a non-ribosomal peptide synthetase, wherein the non-ribosomal peptide synthetase comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of SEQ ID NO:3 and amino acids 72-81, 118-125, 199-212, 353-363, 549-565, 588-603, 669-684, 815-821, 868-892, 903-912, 918-940, 1268-1274, 1285-1297, 973-1256 and 1344-1351 of SEQ ID NO:3. Preferably, according to this embodiment, the non-ribosomal peptide synthetase comprises an amino acid sequence selected from the group consisting of SEQ ID NO:3 and amino acids 72-81, 118-125, 199-212, 353-363, 549-565, 588-603, 669-684, 815-821, 868-892, 903-912, 918-940, 1268-1274, 1285-1297, 973-1256 and 1344-1351 of SEQ ID NO:3. Further, according to this embodiment, the nucleotide sequence is preferably substantially similar to a nucleotide sequence selected from the group consisting of nucleotides 11872-16104, 12085-12114, 12223-12246, 12466-12507, 12928-12960, 13516-13566, 13633-13680, 13876-13923, 14313-14334, 14473-14547, 14578-14607, 14623-14692, 15673-15693, 15724-15762, 14788-15639 and 15901-15924 of SEQ ID NO:1. More preferably, according to this embodiment, the nucleotide sequence comprises a segment of 20, 25, 30, 35, 40, 45 or 50 (preferably 20) consecutive base pairs that is identical to a corresponding segment of the same number of consecutive base pairs in a nucleotide sequence selected from the group consisting of nucleotides 11872-16104, 12085-12114, 12223-12246, 12466-12507, 12928-12960, 13516-13566, 13633-13680, 13876-13923, 14313-14334, 14473-14547, 14578-14607, 14623-14692, 15673-15693, 15724-15762, 14788-15639 and 15901-15924 of SEQ ID NO:1. Most preferably, according to this embodiment, the nucleotide sequence is selected from the group consisting of nucleotides 11872-16104, 12085-12114, 12223-12246, 12466-12507, 12928-12960, 13516-13566, 13633-13680, 13876-13923, 14313-14334, 14473-14547, 14578-14607, 14623-14692, 15673-15693, 15724-15762, 14788-15639 and 15901-15924 of SEQ ID NO:1.
The present invention also provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes a polypeptide comprising an amino acid sequence selected from SEQ ID NOs:2-23.
According to another aspect, the present invention provides methods for the recombinant production of polyketides such as epothilones in quantities sufficient for their purification and for use in pharmaceutical formulations, for example for the treatment of cancer. A particular advantage of these production methods is that the molecules are produced in chiral form: production in a transgenic organism avoids generating large quantities of racemic mixtures, some enantiomers of which may have reduced activity. In particular, the invention provides a method for the heterologous expression of epothilones in a recombinant host, comprising (a) introducing into the host a chimeric gene comprising a heterologous promoter sequence operatively linked to a nucleic acid molecule of the invention that comprises a nucleotide sequence encoding at least one polypeptide involved in epothilone biosynthesis; and (b) culturing the host under conditions suitable for the biosynthesis of epothilones in the host. The invention also provides a method for producing epothilones, comprising (a) expressing epothilones in a recombinant host by the method described above; and (b) extracting the epothilones from the recombinant host.
According to another aspect, the invention provides an isolated polypeptide comprising an amino acid sequence that consists of an epothilone synthase domain.
According to one embodiment, the epothilone synthase domain is a β-ketoacyl synthase (KS) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457, 1524-1950, 3024-3449 and 5103-5525 of SEQ ID NO:5, amino acids 35-454 and 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7. Preferably, according to this embodiment, the KS domain comprises an amino acid sequence selected from the group consisting of amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457, 1524-1950, 3024-3449 and 5103-5525 of SEQ ID NO:5, amino acids 35-454 and 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7.
According to another embodiment, the epothilone synthase domain is an acyltransferase (AT) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884, 2056-2377, 3555-3876 and 5631-5951 of SEQ ID NO:5, amino acids 561-881 and 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. Preferably, according to this embodiment, the AT domain comprises an amino acid sequence selected from the group consisting of amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884, 2056-2377, 3555-3876 and 5631-5951 of SEQ ID NO:5, amino acids 561-881 and 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7.
According to another embodiment, the epothilone synthase domain is an enoylreductase (ER) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 and 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. Preferably, according to this embodiment, the ER domain comprises an amino acid sequence selected from the group consisting of amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 and 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7.
According to another embodiment, the epothilone synthase domain is an acyl carrier protein (ACP) domain, wherein the polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506, 2932-3005, 5010-5082 and 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 and 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7. Preferably, according to this embodiment, the ACP domain comprises an amino acid sequence selected from the group consisting of amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506, 2932-3005, 5010-5082 and 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 and 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7.
According to another embodiment, the epothilone synthase domain is a dehydratase (DH) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 and 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7. Preferably, according to this embodiment, the DH domain comprises an amino acid sequence selected from the group consisting of amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 and 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7.
According to another embodiment, the epothilone synthase domain is a β-ketoreductase (KR) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399, 2645-2895, 4729-4974 and 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 and 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7. Preferably, according to this embodiment, the KR domain comprises an amino acid sequence selected from the group consisting of amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399, 2645-2895, 4729-4974 and 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 and 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7.
According to another embodiment, the epothilone synthase domain is a methyltransferase (MT) domain comprising an amino acid sequence substantially similar to amino acids 2671-3045 of SEQ ID NO:6. Preferably, according to this embodiment, the MT domain comprises amino acids 2671-3045 of SEQ ID NO:6.
According to another embodiment, the epothilone synthase domain is a thioesterase (TE) domain comprising an amino acid sequence substantially similar to amino acids 2165-2439 of SEQ ID NO:7. Preferably, according to this embodiment, the TE domain comprises amino acids 2165-2439 of SEQ ID NO:7.
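The domain coordinates listed above for EPOS C (SEQ ID NO:5) can be arranged into modules in the sense of the "Module" definition below, where each module carries KS, AT and ACP activities plus optional β-carbonyl processing domains. The minimal Python sketch below does this using the heuristic assumption that each module begins at a KS domain, so every other domain is assigned to the nearest preceding KS. The grouping rule and the helper names `group_into_modules` and `has_core` are illustrative only and do not come from the patent.

```python
from bisect import bisect_right

# EPOS C (SEQ ID NO:5) domains: (name, first residue, last residue),
# copied from the embodiments above.
EPOS_C_DOMAINS = [
    ("KS", 39, 457), ("AT", 563, 884), ("KR", 1147, 1399), ("ACP", 1434, 1506),
    ("KS", 1524, 1950), ("AT", 2056, 2377), ("KR", 2645, 2895), ("ACP", 2932, 3005),
    ("KS", 3024, 3449), ("AT", 3555, 3876), ("DH", 3886, 4048), ("ER", 4433, 4719),
    ("KR", 4729, 4974), ("ACP", 5010, 5082),
    ("KS", 5103, 5525), ("AT", 5631, 5951), ("DH", 5964, 6132), ("ER", 6542, 6837),
    ("KR", 6857, 7101), ("ACP", 7140, 7211),
]


def group_into_modules(domains):
    """Assign each domain to the module opened by the nearest preceding KS.

    Heuristic assumption: every module starts with a KS domain.
    """
    ordered = sorted(domains, key=lambda d: d[1])
    ks_starts = [start for name, start, _ in ordered if name == "KS"]
    modules = [[] for _ in ks_starts]
    for name, start, _ in ordered:
        idx = bisect_right(ks_starts, start) - 1  # last KS starting at or before this domain
        if idx < 0:
            raise ValueError(f"{name} at residue {start} precedes the first KS")
        modules[idx].append(name)
    return modules


def has_core(module):
    """True if the module has the KS, AT and ACP needed for one condensation."""
    return {"KS", "AT", "ACP"} <= set(module)


for i, module in enumerate(group_into_modules(EPOS_C_DOMAINS), start=1):
    print(i, "-".join(module), "core complete:", has_core(module))
```

With the coordinates above, this grouping yields four modules: two of the form KS-AT-KR-ACP and two of the form KS-AT-DH-ER-KR-ACP.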
Other aspects and advantages of the present invention will become apparent to those skilled in the art from a study of the following description of the invention and the non-limiting examples.
Definitions
The following terms are used in the description of the invention and have the meanings given below.
Associated with / operatively linked: Refers to two DNA sequences that are related physically or functionally. For example, a promoter or regulatory DNA sequence is said to be "associated with" a DNA sequence that codes for an RNA or a protein if the two sequences are operatively linked, or situated such that the regulatory DNA sequence will affect the expression level of the coding or structural DNA sequence.
Chimeric gene: A recombinant DNA sequence in which a promoter or regulatory DNA sequence is operatively linked to, or associated with, a DNA sequence that codes for an mRNA or is expressed as a protein, such that the regulatory DNA sequence is able to regulate transcription or expression of the associated DNA sequence. The regulatory DNA sequence of the chimeric gene is not normally operatively linked to the associated DNA sequence as found in nature.
Coding DNA sequence: A DNA sequence that is translated in an organism to produce a protein.
Domain: That portion of a polyketide synthase that is necessary for a distinct activity. Examples include acyl carrier protein (ACP), β-ketosynthase (KS), acyltransferase (AT), β-ketoreductase (KR), dehydratase (DH), enoylreductase (ER) and thioesterase (TE) domains.
Epothilone: A 16-membered macrocyclic polyketide, naturally produced by the myxobacterium Sorangium cellulosum strain So ce90, that mimics the biological effects of taxol. In this application, "epothilone" refers to the class of polyketides that includes epothilone A and epothilone B, as well as analogs thereof such as those described in WO 98/25929.
Epothilone synthase: A polyketide synthase responsible for the biosynthesis of epothilone.
Gene: A defined region that is located within a genome and that, besides the coding DNA sequence defined above, comprises other, primarily regulatory, DNA sequences responsible for controlling the expression (that is, the transcription and translation) of the coding sequence.
Heterologous DNA sequence: A DNA sequence not naturally associated with the host cell into which it is introduced, including non-naturally occurring multiple copies of a naturally occurring DNA sequence.
Homologous DNA sequence: A DNA sequence naturally associated with the host cell into which it is introduced.
Homologous recombination: The reciprocal exchange of DNA fragments between homologous DNA molecules.
Isolated: In the context of the present invention, an isolated nucleic acid molecule or an isolated enzyme is a nucleic acid molecule or enzyme that, by human intervention, exists apart from its native environment and is therefore not a product of nature. An isolated nucleic acid molecule or enzyme may exist in purified form or in a non-native environment such as a recombinant host cell.
Module: The genetic element encoding all of the distinct activities required for a single round of polyketide biosynthesis, that is, one condensation step and all of the β-carbonyl processing steps associated with it. Each module encodes ACP, KS and AT activities to carry out the condensation portion of the biosynthesis, and selected post-condensation activities to carry out β-carbonyl processing.
NRPS: Non-ribosomal polypeptide synthetase, a complex of enzymatic activities responsible for incorporating amino acids into secondary metabolites, including, for example, amino acid adenylation, isomerization, N-methylation, cyclization, peptidyl carrier protein and condensation domains. A functional NRPS catalyzes the incorporation of amino acids into a secondary metabolite.
NRPS gene: One or more genes encoding an NRPS that, under the direction of one or more compatible control elements, produces a functional secondary metabolite (e.g. epothilones A and B).
Nucleic acid molecule: A linear segment of single- or double-stranded DNA or RNA that can be isolated from any source. In the context of the present invention, the nucleic acid molecule is preferably a segment of DNA.
ORF: open reading frame.
PKS: Polyketide synthase, a complex of enzymatic activities (domains) responsible for the biosynthesis of polyketides, including, for example, ketoreductase, dehydratase, acyl carrier protein, enoylreductase, ketoacyl ACP synthase and acyltransferase. A functional PKS catalyzes the synthesis of a polyketide.
PKS gene: One or more genes encoding the various polypeptides required to produce a functional polyketide (e.g. epothilones A and B) when under the direction of one or more compatible control elements.
Substantially similar: With respect to nucleic acids, a nucleic acid molecule that is at least 60% identical in sequence to a reference nucleic acid molecule. In a preferred embodiment, a substantially similar DNA sequence is at least 80% identical to the reference DNA sequence; in a more preferred embodiment, it is at least 90% identical; and in the most preferred embodiment, it is at least 95% identical. A substantially similar DNA sequence preferably encodes a protein or peptide having substantially the same activity as the protein or peptide encoded by the reference DNA sequence. A substantially similar nucleotide sequence typically hybridizes to the reference nucleic acid molecule, or a fragment thereof, under the following conditions: hybridization at 50 °C in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA, and washing at 50 °C with 2X SSC, 1% SDS. With respect to proteins or peptides, a substantially similar amino acid sequence is one that is at least 90% identical to the amino acid sequence of a reference protein or peptide and that has substantially the same activity as the reference protein or peptide.
Transformation: The process of introducing a heterologous nucleic acid into a host cell or organism.
Transformed / transgenic / recombinant: Refers to a host organism, such as a bacterium, into which a heterologous nucleic acid molecule has been introduced. The nucleic acid molecule can be stably integrated into the genome of the host, or it can be present as an extrachromosomal molecule. Such an extrachromosomal molecule can replicate autonomously. Transformed cells, tissues or plants are understood to encompass not only the end product of a transformation process but also its transgenic progeny. A "non-transformed", "non-transgenic" or "non-recombinant" host refers to a wild-type organism, e.g. a bacterium, that does not contain the heterologous nucleic acid molecule.
Nucleotides are indicated by their bases using the following standard abbreviations: adenine (A), cytosine (C), thymine (T) and guanine (G). Likewise, amino acids are indicated by the following standard abbreviations: alanine (Ala; A), arginine (Arg; R), asparagine (Asn; N), aspartic acid (Asp; D), cysteine (Cys; C), glutamine (Gln; Q), glutamic acid (Glu; E), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y) and valine (Val; V). In addition, (Xaa; X) represents any amino acid.
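The amino acid abbreviations above map directly onto a lookup table. The Python sketch below converts a hyphen-separated three-letter sequence into one-letter code; the input format and the function name three_to_one are illustrative assumptions, not part of the specification.

```python
# Three-letter to one-letter amino acid codes, as listed above.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # any amino acid
}


def three_to_one(peptide: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Ala-Xaa' to one-letter code ('MAX')."""
    # Normalise case so 'MET' or 'met' are accepted as 'Met'.
    residues = [r.strip().capitalize() for r in peptide.split(sep) if r.strip()]
    unknown = sorted({r for r in residues if r not in THREE_TO_ONE})
    if unknown:
        raise ValueError(f"unrecognized residue code(s): {unknown}")
    return "".join(THREE_TO_ONE[r] for r in residues)


print(three_to_one("Met-Ala-Xaa-Trp"))  # MAXW
```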
Description of the Sequences in the Sequence Listing
SEQ ID NO:1 is the nucleotide sequence of a 68,750-bp contig containing 22 open reading frames (ORFs), including the epothilone biosynthetic genes.
SEQ ID NO:2 is the protein sequence of a type I polyketide synthase (EPOS A) encoded by epoA (nucleotides 7610-11875 of SEQ ID NO:1).
SEQ ID NO:3 is the protein sequence of a non-ribosomal peptide synthetase (EPOS P) encoded by epoP (nucleotides 11872-16104 of SEQ ID NO:1).
SEQ ID NO:4 is the protein sequence of a type I polyketide synthase (EPOS B) encoded by epoB (nucleotides 16251-21749 of SEQ ID NO:1).
SEQ ID NO:5 is the protein sequence of a type I polyketide synthase (EPOS C) encoded by epoC (nucleotides 21746-43519 of SEQ ID NO:1).
SEQ ID NO:6 is the protein sequence of a type I polyketide synthase (EPOS D) encoded by epoD (nucleotides 43524-54920 of SEQ ID NO:1).
SEQ ID NO:7 is the protein sequence of a type I polyketide synthase (EPOS E) encoded by epoE (nucleotides 54935-62254 of SEQ ID NO:1).
SEQ ID NO:8 is the protein sequence of a cytochrome P450 oxygenase homologue (EPOS F) encoded by epoF (nucleotides 62369-63628 of SEQ ID NO:1).
SEQ ID NO:9 is the partial protein sequence (part of Orf1) encoded by orf1 (nucleotides 1-1826 of SEQ ID NO:1).
SEQ ID NO:10 is the protein sequence (Orf2) encoded by orf2 (nucleotides 3171-1900 of SEQ ID NO:1, reverse complement strand).
SEQ ID NO:11 is the protein sequence (Orf3) encoded by orf3 (nucleotides 3415-5556 of SEQ ID NO:1).
SEQ ID NO:12 is the protein sequence (Orf4) encoded by orf4 (nucleotides 5992-5612 of SEQ ID NO:1, reverse complement strand).
SEQ ID NO:13 is the protein sequence (Orf5) encoded by orf5 (nucleotides 6226-6675 of SEQ ID NO:1).
SEQ ID NO:14 is the protein sequence (Orf6) encoded by orf6 (nucleotides 63779-64333 of SEQ ID NO:1).
SEQ ID NO:15 is the protein sequence (Orf7) encoded by orf7 (nucleotides 64290-63853 of SEQ ID NO:1, reverse complement strand).
SEQ ID NO:16 is the protein sequence (Orf8) encoded by orf8 (nucleotides 64363-64920 of SEQ ID NO:1).
SEQ ID NO:17 is the protein sequence (Orf9) encoded by orf9 (nucleotides 64727-64287 of SEQ ID NO:1, reverse complement strand).
SEQ ID NO:18 is the protein sequence (Orf10) encoded by orf10 (nucleotides 65063-65767 of SEQ ID NO:1).
SEQ ID NO:19 is the protein sequence (Orf11) encoded by orf11 (nucleotides 65874-65008 of SEQ ID NO:1, reverse complement strand).
SEQ ID NO:20 is the protein sequence (Orf12) encoded by orf12 (nucleotides 66338-65871 of SEQ ID NO:1, reverse complement strand).
SEQ ID NO:21 is the protein sequence (Orf13) encoded by orf13 (nucleotides 66667-67137 of SEQ ID NO:1).
SEQ ID NO:22 is the protein sequence (Orf14) encoded by orf14 (nucleotides 67334-68251 of SEQ ID NO:1).
SEQ ID NO:23 is the partial protein sequence (part of Orf15) encoded by orf15 (nucleotides 68346-68750 of SEQ ID NO:1).
SEQ ID NO:24 is the universal reverse PCR primer sequence.
SEQ ID NO:25 is the universal forward PCR primer sequence.
SEQ ID NO:26 is the NH24 end "B" PCR primer sequence.
SEQ ID NO:27 is the NH2 end "A" PCR primer sequence.
SEQ ID NO:28 is the NH2 end "B" PCR primer sequence.
SEQ ID NO:29 is the pEPO15-NH6 end "B" PCR primer sequence.
SEQ ID NO:30 is the pEPO15-H2.7 end "A" PCR primer sequence.
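The ORF coordinates in this listing follow a simple convention: a start coordinate larger than the end coordinate denotes an ORF read on the reverse complement strand of SEQ ID NO:1. The Python sketch below derives strand, length and codon count from such a coordinate pair and extracts the coding strand from a sequence string. It assumes 1-based, inclusive coordinates, and the helper names are hypothetical. As a check on the listing, every complete ORF (all except the partial orf1 and orf15) spans a whole number of codons; epoA (7610-11875), for example, spans 4266 nt, or 1422 codons.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def orf_span(start: int, end: int) -> tuple[str, int, int]:
    """(strand, length in nt, codons) for 1-based inclusive coordinates.

    A start greater than the end marks an ORF on the reverse complement strand.
    Raises ValueError if the span is not a whole number of codons, as for a
    truncated ORF such as orf1 (1826 nt).
    """
    strand = "+" if start <= end else "-"
    length = abs(end - start) + 1
    if length % 3:
        raise ValueError(f"{length} nt is not a whole number of codons")
    return strand, length, length // 3


def extract_orf(seq: str, start: int, end: int) -> str:
    """Return the coding strand for an ORF given in SEQ ID NO:1-style coordinates."""
    lo, hi = min(start, end), max(start, end)
    region = seq[lo - 1:hi]  # 1-based inclusive -> 0-based half-open slice
    return region if start <= end else reverse_complement(region)


print(orf_span(7610, 11875))  # epoA: ('+', 4266, 1422)
print(orf_span(3171, 1900))   # orf2: ('-', 1272, 424)
```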
Deposit Information
The following material has been deposited with the Agricultural Research Service, Patent Culture Collection (NRRL), 1815 North University Street, Peoria, Illinois 61604, under the terms of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure. All restrictions on the availability of the deposited material will be removed upon the granting of a patent.
Deposited material | Accession number | Date of deposit
pEPO15 | NRRL B-30033 | June 11, 1998
pEPO32 | NRRL B-30119 | April 16, 1999
Brief Description of the Drawing
Fig. 1 is a schematic diagram of the inoculum chain for the 100-liter fermentation described in Example 14.
Detailed Description of the Invention
Genes involved in epothilone biosynthesis can be isolated using the techniques of the present invention. A preferred procedure for isolating epothilone biosynthetic genes requires isolating genomic DNA from an organism identified as producing epothilones A and B, transferring this DNA on a suitable plasmid or vector into a host organism that does not normally produce polyketides, and then identifying transformed host colonies that have acquired the ability to produce epothilones. Using techniques such as λ::Tn5 transposon mutagenesis (de Bruijn and Lupski, Gene 27:131-149 (1984)), the exact region of the transforming DNA that confers epothilone-producing ability can be determined more precisely. Alternatively or additionally, the transforming DNA that confers epothilone-producing ability can be cleaved into smaller fragments, and the smallest fragment that retains the ability to confer epothilone production can be further characterized. Whereas the host organism lacking the ability to produce epothilones can be a species different from the polyketide-producing organism, a variation of this technique involves transforming the host DNA into the same host in which epothilone-producing ability has been disrupted by mutagenesis. In this method, an epothilone-producing organism is mutagenized and mutants that no longer produce epothilones are isolated. These mutants are then complemented with genomic DNA isolated from the epothilone-producing parent strain.
A further example of a method for isolating genes required for epothilone biosynthesis is to use transposon mutagenesis to generate mutants of an epothilone-producing organism that, after mutagenesis, fail to produce the polyketide. In this way, the region of the host genome responsible for epothilone production is tagged by the transposon and can be recovered and used as a probe to isolate the native genes from the parent strain. PKS genes that are required for polyketide synthesis and that are similar to known PKS genes can be isolated by virtue of their sequence homology to biosynthetic genes of known sequence, such as the PKS genes for rifamycin or soraphen biosynthesis. Suitable techniques for isolation by homology include conventional library screening by DNA hybridization.
DNA fragments obtained from genes known to function in polyketide synthesis, or other DNA sequences, can be used as preferred probe molecules. One preferred probe molecule comprises the 1.2-kb SmaI DNA fragment encoding the ketosynthase domain of module 4 of the soraphen PKS (U.S. Patent No. 5,716,849); another preferred probe molecule comprises the β-ketoacyl synthase domains of modules 1 and 2 of the rifamycin PKS (Schupp et al., FEMS Microbiology Letters 159:201-207 (1998)). These probes can be used to screen gene libraries of epothilone-producing microorganisms in order to isolate the PKS genes responsible for epothilone biosynthesis.
Although PKS genes are generally known to be very difficult to isolate, and isolation of the epothilone biosynthetic genes was expected to be particularly difficult, the biosynthetic genes for this polyketide were surprisingly cloned from microorganisms producing epothilones A and B by using the methods described in this specification. Using the genetic manipulation methods and recombinant products described in this specification, the cloned PKS genes can be modified and expressed in transformed host organisms.
The isolated epothilone biosynthetic genes can be expressed in heterologous hosts to produce the polyketide more efficiently than in the native host. Techniques for these genetic manipulations are specific to the different available hosts and are known to those skilled in the art. For example, heterologous genes can be expressed in Streptomyces and other actinomycetes using techniques such as those described in McDaniel et al., Science 262:1546-1550 (1993) and Kao et al., Science 265:509-512 (1994), incorporated herein by reference. See also Rowe et al., Gene 216:215-226 (1998); Holmes et al., EMBO Journal 12(8):3183-3191 (1993); and Bibb et al., Gene 38:215-226 (1985), incorporated herein by reference.
Alternatively, the genes responsible for polyketide biosynthesis, i.e., the epothilone biosynthetic genes, can be expressed in other host organisms such as Pseudomonas and E. coli. Techniques for these genetic manipulations are specific to the different available hosts and are known to those skilled in the art. For example, PKS genes have been successfully expressed in E. coli using the pT7-7 vector, which uses the T7 promoter. See Tabor et al., Proc. Natl. Acad. Sci. USA 82:1074-1078 (1985), incorporated herein by reference. In addition, the expression vectors pKK223-3 and pKK223-2 can be used to express heterologous genes in E. coli, as transcriptional or translational fusions behind the tac or trc promoter. For expression of operons encoding multiple ORFs, the simplest procedure is to insert the operon into a vector such as pKK223-3 as a transcriptional fusion, so that the cognate ribosome binding sites of the heterologous genes can be used. Techniques for overexpression in Gram-positive species such as Bacillus are also known to those skilled in the art and can be used in the present invention (Quax et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics, Baltz et al., eds., American Society for Microbiology, Washington (1993)).
Other expression systems that can be used with the epothilone biosynthetic genes of the present invention include yeast and baculovirus expression systems. See, for example, "The expression of recombinant proteins in yeast", P. E. Sudbery, Curr. Opin. Biotechnol. 7(5):517-524 (1996); "Methods for expressing recombinant proteins in yeast", Mackay et al., in Protein Eng. Des., Paul R. Carey, ed., pp. 105-153, Academic, San Diego, California (1996); "Expression of heterologous gene products in yeast", Pichuantes et al., in Protein Eng., J. L. Cleland and C. S. Craik, eds., pp. 129-161, Wiley-Liss, New York, New York (1996); WO 98/27203; Kealey et al., Proc. Natl. Acad. Sci. USA 95:505-509 (1998); "Insect cell culture: recent advances, bioengineering challenges and implications in protein production", Palomares et al., in Adv. Bioprocess Eng., Vol. 2, Enrique Galindo and Octavio T. Ramirez, eds., Invited Pap. Int. Symp., 2nd (1998), pp. 25-52, Kluwer, Dordrecht, Neth.; "Baculovirus expression vectors", Donald L. Jarvis, in Baculoviruses, Lois K. Miller, ed., pp. 389-431, Plenum, New York, New York (1997); "Production of heterologous proteins using the baculovirus/insect expression system", Griffiths et al., Methods Mol. Biol. 75 (Basic Cell Culture Protocols, 2nd Edition), pp. 427-440 (1997), Totowa, N.J.; "Insect cell expression technology", Verne A. Luckow, in Protein Eng., pp. 183-218, Wiley-Liss, New York, New York (1996); all of which are incorporated herein by reference.
A further consideration in expressing PKS genes in heterologous hosts is the requirement for enzymes that carry out posttranslational modification of the PKS enzymes, which must be phosphopantetheinylated before they can synthesize polyketides. However, the enzyme responsible for this modification of type I PKS enzymes, phosphopantetheinyl (P-pant) transferase, is not normally present in many hosts such as E. coli. This problem can be solved by coexpressing a P-pant transferase together with the PKS genes in the heterologous host, as described by Kealey et al., Proc. Natl. Acad. Sci. USA 95:505-509 (1998), incorporated herein by reference.
Thus, for the production of polyketides, the main criteria for selecting a host organism are ease of handling, rapid growth (i.e., fermentation), possession of the appropriate molecular machinery for processing steps such as posttranslational modification, and insensitivity to the overproduced polyketide. The most preferred host organisms are actinomycetes such as strains of Streptomyces. Other preferred host organisms are Pseudomonas and E. coli. The above methods for producing polyketides have significant advantages over the techniques currently used to prepare these compounds. These advantages include lower production cost, the ability to produce larger quantities of the compounds, and the ability to produce the compounds as the preferred biological enantiomer, as opposed to the racemic mixtures that organic synthesis inevitably produces. The compounds produced by the heterologous hosts can be used in medical applications (e.g., treatment of cancer with epothilones) and in agricultural applications.
Experimental
The invention will be further described by reference to the following specific examples. These examples are provided for illustration only and are not intended to be limiting unless otherwise specified. The standard recombinant DNA and molecular cloning techniques used herein are well known to those of ordinary skill in the art and are described in Ausubel (ed.), Current Protocols in Molecular Biology, John Wiley and Sons Inc. (1994); T. Maniatis, E. F. Fritsch and J. Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989); and T. J. Silhavy, M. L. Berman and L. W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1994).
Example 1: Cultivation of an epothilone-producing strain of Sorangium cellulosum
Sorangium cellulosum strain So ce90 (DSM 6773, Deutsche Sammlung von Mikroorganismen und Zellkulturen, Braunschweig) was streaked onto agar plates of SolE medium (0.35% glucose, 0.05% tryptone, 0.15% MgSO4·7H2O, 0.05% ammonium sulfate, 0.1% CaCl2, 0.006% K2HPO4, 0.01% sodium dithionite, 0.0008% Fe-EDTA, 1.2% HEPES, 3.5% [vol/vol] sterilized supernatant of a stationary-phase Sorangium cellulosum culture, adjusted to pH 7.4) and incubated at 30 °C. Cells from 1 cm² were picked, inoculated into 5 ml of G51t liquid medium (0.2% glucose, 0.5% starch, 0.2% tryptone, 0.1% Probion S, 0.05% CaCl2·2H2O, 0.05% MgSO4·7H2O, 1.2% HEPES, adjusted to pH 7.4) and incubated at 30 °C with shaking at 225 rpm. After 4 days, the culture was transferred into 50 ml of G51t and cultivated as above for 5 days. This culture was used to inoculate 500 ml of G51t, which was cultivated as above for 6 days. The culture was centrifuged at 4000 rpm for 10 minutes, and the cell pellet was resuspended in 50 ml of G51t.
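The media in these examples are specified as percentages. Reading the solid components as % w/v (grams per 100 ml) is an assumption here, since the text does not say so explicitly; components marked [vol/vol] are % v/v and are not covered. Under that reading, the Python sketch below converts the G51t recipe into amounts for a chosen batch volume. The dictionary and function names are illustrative.

```python
from __future__ import annotations

# G51t recipe from Example 1; assumption: values are % w/v (g per 100 ml).
G51T_PERCENT = {
    "glucose": 0.2,
    "starch": 0.5,
    "tryptone": 0.2,
    "Probion S": 0.1,
    "CaCl2.2H2O": 0.05,
    "MgSO4.7H2O": 0.05,
    "HEPES": 1.2,
}


def grams_for_volume(recipe_percent: dict[str, float], volume_ml: float) -> dict[str, float]:
    """Convert % w/v amounts into grams for a batch of `volume_ml` millilitres."""
    # % w/v means grams per 100 ml, so scale by volume / 100.
    return {name: pct * volume_ml / 100.0 for name, pct in recipe_percent.items()}


for name, grams in grams_for_volume(G51T_PERCENT, 500).items():
    print(f"{name}: {grams:.3f} g")  # e.g. glucose: 1.000 g, HEPES: 6.000 g
```

The pH adjustment to 7.4 is a separate titration step and is not modelled.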
Example 2: Construction of a bacterial artificial chromosome (BAC) library
To construct the BAC library, Sorangium cellulosum cells cultured as described in Example 1 were embedded in agarose and lysed, and the released genomic DNA was partially digested with the restriction enzyme HindIII. The digested DNA was separated on an agarose gel by pulsed-field electrophoresis. Large DNA fragments (approximately 90-150 kb) were isolated from the agarose gel and ligated into the vector pBelobacII. pBelobacII contains a gene encoding chloramphenicol resistance, a multiple cloning site located in the lacZ gene that allows blue/white selection on appropriate media, and the genes required for replication and for maintenance of the plasmid at one or two copies per cell. The ligation mixture was transformed into electrocompetent E. coli DH10B cells using standard electroporation techniques. Chloramphenicol-resistant recombinants (white colonies, lacZ mutants) were transferred onto positively charged nylon filters in 384 3 × 3 grids. The clones were lysed and the DNA was cross-linked to the filters. The same clones were stored as liquid cultures at -80 °C.
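The number of BAC clones a library needs can be estimated with the Clarke-Carbon relation, N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size and P is the desired probability that any given locus is represented. The Python sketch below applies it. The genome size used in the example is a hypothetical input (the text does not state one), and the formula assumes randomly distributed inserts, which partial HindIII digestion only approximates.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, prob: float) -> int:
    """Clarke-Carbon estimate of library size: N = ln(1 - P) / ln(1 - f).

    f = insert_bp / genome_bp is the fraction of the genome in one clone;
    P is the desired probability that a given locus is present in the library.
    Assumes inserts are randomly distributed across the genome.
    """
    if not 0 < prob < 1:
        raise ValueError("prob must lie strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be smaller than the genome")
    f = insert_bp / genome_bp
    # log1p(-x) computes ln(1 - x) accurately for small x.
    return math.ceil(math.log1p(-prob) / math.log1p(-f))


# Hypothetical 12-Mb genome, 120-kb inserts (within the 90-150 kb range above).
print(clones_needed(12_000_000, 120_000, 0.99))  # 459
```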
Example 3: Screening the BAC library of Sorangium cellulosum So ce90 for sequences related to type I polyketide synthases
The BAC library filters were probed using a standard Southern hybridization procedure. The probes used encoded the β-ketoacyl synthase domains of modules 1 and 2 of the rifamycin polyketide synthase (Schupp et al., FEMS Microbiology Letters 159:201-207 (1998)). The DNA probes were generated by PCR using plasmid pNE95 (the cosmid described in Schupp et al. (1998)) as template and primers flanking each ketosynthase domain. 25 ng of the PCR-amplified DNA was isolated from a 0.5% agarose gel and labeled with 32P-dCTP using a random primer labeling kit (Gibco-BRL, Bethesda, MD, USA) according to the supplier's instructions. Hybridization was carried out at 65 °C for 36 hours, and the membranes were washed under high-stringency conditions (0.1X SSC and 0.5% SDS at 65 °C, three times for 20 minutes each). The labeled blots were exposed to a phosphor screen, and the signals were detected on a PhosphorImager 445SI (screen and 445SI from Molecular Dynamics). Several BAC clones hybridized strongly with the probe. These clones were selected and grown overnight at 37 °C in 5 ml of Luria broth (LB) medium. BAC DNA was isolated from the BAC clones of interest using a typical miniprep procedure: the cells were resuspended in 200 μl of lysozyme solution (50 mM glucose, 10 mM EDTA, 25 mM Tris-HCl, 5 mg/ml lysozyme) and lysed in 400 μl of lysis solution (0.2 N NaOH and 2% SDS), proteins were precipitated (3.0 M potassium acetate, adjusted to pH 5.2 with acetic acid), and the BAC DNA was precipitated with isopropanol. The DNA was resuspended in 20 μl of nuclease-free distilled water, digested with the restriction enzyme BamHI (New England Biolabs, Inc.) and separated on a 0.7% agarose gel. The gel was blotted as described above for Southern hybridization and probed under the conditions described above with the 1.2-kb SmaI DNA fragment encoding the ketosynthase domain of module 4 of the soraphen polyketide synthase (see U.S. Patent No. 5,716,849). Five different hybridization patterns were observed. One clone was selected as representative of each pattern, and these were designated pEPO15, pEPO20, pEPO30, pEPO31 and pEPO33.
Example 4: Subcloning of BamHI fragments of pEPO15, pEPO20, pEPO30, pEPO31 and pEPO33
DNA from the five selected BAC clones was digested with BamHI, and random fragments were subcloned into the BamHI site of pBluescriptII SK+ (Stratagene). Subclones with inserts between 2 and 10 kb were selected, the sequences at both ends of the inserts were determined, and the subclones were probed with the 1.2-kb SmaI probe described above. Subclones showing high sequence homology to known polyketide synthases and/or hybridizing strongly with the soraphen ketosynthase domain were used for gene disruption experiments.
Example 5: Preparation of spontaneous streptomycin-resistant mutants of Sorangium cellulosum strain So ce90
0.1 ml of a culture of Sorangium cellulosum strain So ce90 grown for 3 days in liquid medium G52-H (0.2% yeast extract, 0.2% defatted soy flour, 0.8% potato starch, 0.2% glucose, 0.1% MgSO4·7H2O, 0.1% CaCl2·2H2O, 0.008% Fe-EDTA, adjusted to pH 7.4 with KOH) was plated on SolE agar plates supplemented with 100 μg/ml streptomycin sulfate. The plates were incubated at 30 °C for 2 weeks. Colonies growing on this medium were streptomycin-resistant mutants; they were restreaked and cultivated on the same streptomycin-containing agar medium for purification. One of these streptomycin-resistant mutants was selected and designated BCE28/2.
Example 6: Gene disruption in Sorangium cellulosum BCE28/2 (using subcloned BamHI fragments)
The BamHI inserts of the subclones generated from the five selected BAC clones as described above were isolated and ligated into the unique BamHI site of plasmid pCIB132 (see U.S. Patent No. 5,716,849). The pCIB132 derivatives carrying the inserts were used to transform E. coli ED8767 (Hedges and Matthew, Plasmid 2:269-278 (1979)) containing the helper plasmid pUZ8. The transformants were used as donors in conjugation experiments, with Sorangium cellulosum BCE28/2 as the recipient. For conjugation, 5-10 × 10^9 cells of an early stationary-phase culture of S. cellulosum BCE28/2 (approximately 5 × 10^8 cells/ml) grown at 30 °C in liquid medium G51b (equivalent to G51t, except that peptone replaces tryptone) were mixed at a 1:1 cell ratio with a late exponential-phase culture (in LB liquid medium) of E. coli ED8767 containing the helper plasmid pUZ8 and a pCIB132 derivative carrying a subcloned BamHI fragment. The mixed cells were then centrifuged at 4000 rpm for 10 minutes and resuspended in 0.5 ml of G51b medium. This cell suspension was then spotted dropwise onto the center of a SolE agar plate containing 50 mg/l kanamycin. After incubation at 30 °C for 24 hours, the cells were harvested and resuspended in 0.8 ml of G51b medium, and 0.1-0.3 ml of this suspension was plated on selective SolE solid medium containing phleomycin (30 mg/l), streptomycin (300 mg/l) and kanamycin (50 mg/l). Streptomycin was used to counterselect the donor E. coli strain. After incubation at 30 °C for 8-12 days, colonies growing on this selective medium were isolated with plastic loops and streaked on the same agar medium for a second round of selection and purification. The cultures derived from colonies grown on this selective agar medium after 7 days at 30 °C were phleomycin-resistant transconjugants of S. cellulosum BCE28/2, obtained by conjugal transfer of the pCIB132 derivatives carrying the subcloned BamHI fragments.
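The mating step fixes the number of recipient cells (5-10 × 10^9) and their density (about 5 × 10^8 cells/ml), which together determine how much recipient culture to harvest; the 1:1 ratio then fixes how much donor culture to add. The Python sketch below does this arithmetic. The donor density is a hypothetical input, since the text does not give one; in practice it would come from an optical density or plate-count measurement.

```python
def mating_volumes(recipient_cells, recipient_per_ml, donor_per_ml, donor_to_recipient=1.0):
    """Culture volumes (ml) to mix for a conjugation at a given donor:recipient ratio.

    Returns (recipient_ml, donor_ml).
    """
    if min(recipient_cells, recipient_per_ml, donor_per_ml, donor_to_recipient) <= 0:
        raise ValueError("all inputs must be positive")
    recipient_ml = recipient_cells / recipient_per_ml
    donor_ml = recipient_cells * donor_to_recipient / donor_per_ml
    return recipient_ml, donor_ml


# 5 x 10^9 recipients at ~5 x 10^8 cells/ml; donor at 1 x 10^9 cells/ml is hypothetical.
print(mating_volumes(5e9, 5e8, 1e9))  # (10.0, 5.0)
```

At the upper end of the stated range (10^10 recipient cells), the same density gives 20 ml of recipient culture.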
Integration of the pCIB132-derived plasmids into the chromosome of S. cellulosum BCE28/2 by homologous recombination was verified by Southern hybridization. For this experiment, total DNA was isolated from 5-10 transconjugants for each transferred BamHI fragment (from 10-ml cultures grown for 3 days in medium G52-H), using the method described by Pospiech and Neumann, Trends Genet. 11:217 (1995). For Southern blotting, the isolated DNA was cut with the restriction enzyme BglII, ClaI or NotI as described above and probed with the corresponding 32P-labeled BamHI insert or with pCIB132.
Example 7: Analysis of the effect on epothilone production of gene disruption in Sorangium cellulosum by integrated BamHI fragments
Approximately 1 cm² of transconjugant cells (see Example 6) grown on the surface of the SolE plates from the second round of selection was transferred with a sterile plastic loop into 10 ml of G52-H medium in a 50-ml Erlenmeyer flask. After incubation at 30 °C and 180 rpm for 3 days, the culture was transferred into 50 ml of G52-H medium in a 200-ml Erlenmeyer flask. After incubation at 30 °C and 180 rpm for 4-5 days, 10 ml of this culture was transferred into 50 ml of 23B3 medium (0.2% glucose, 2% potato starch, 1.6% defatted soy flour, 0.0008% Fe-EDTA sodium salt, 0.5% HEPES (4-(2-hydroxyethyl)-piperazine-1-ethanesulfonic acid), 2% vol/vol polystyrene resin XAD16 (Rohm and Haas), adjusted to pH 7.8 with NaOH) in a 200-ml Erlenmeyer flask.
After the culture had been incubated at 30 °C and 180 rpm for 7 days, the epothilones produced were quantified. The whole culture broth was filtered by suction through a 150-μm nylon sieve. The resin retained on the sieve was then resuspended in 10 ml of isopropanol and extracted by shaking the suspension at 180 rpm for 1 hour. 1 ml of this suspension was removed and centrifuged at 12,000 rpm in an Eppendorf microcentrifuge. The amounts of epothilones A and B were determined by HPLC with UV-DAD detection at 250 nm (HPLC on a Waters Symmetry C18 column with a gradient of 60%-0% 0.02% phosphoric acid and 40%-100% acetonitrile).
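Converting HPLC peak areas into culture titers such as the 2-4 mg/l reported below involves two steps that the text does not spell out: a calibration against standards, and scaling the extract concentration back to the culture volume. The Python sketch below shows one conventional way to do both (an ordinary least-squares external-standard line). It assumes that the epothilones are quantitatively adsorbed to the XAD16 resin and fully recovered in the 10 ml of isopropanol. The standard concentrations, peak areas and the roughly 60-ml culture volume (10-ml inoculum plus 50 ml of 23B3) are hypothetical illustration values.

```python
def fit_calibration(concs_mg_per_l, peak_areas):
    """Ordinary least-squares line: area = slope * concentration + intercept."""
    n = len(concs_mg_per_l)
    if n < 2 or n != len(peak_areas):
        raise ValueError("need at least two paired standards")
    mean_c = sum(concs_mg_per_l) / n
    mean_a = sum(peak_areas) / n
    sxx = sum((c - mean_c) ** 2 for c in concs_mg_per_l)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((c - mean_c) * (a - mean_a) for c, a in zip(concs_mg_per_l, peak_areas))
    slope = sxy / sxx
    return slope, mean_a - slope * mean_c


def conc_from_area(area, slope, intercept):
    """Invert the calibration line to get the concentration in the extract."""
    return (area - intercept) / slope


def culture_titer(extract_mg_per_l, extract_ml, culture_ml):
    """Scale the extract concentration back to the culture volume it came from."""
    return extract_mg_per_l * extract_ml / culture_ml


# Hypothetical standards and sample peak area, for illustration only.
slope, intercept = fit_calibration([0, 5, 10, 20], [0, 50, 100, 200])
extract = conc_from_area(180, slope, intercept)  # 18.0 mg/l in the isopropanol extract
print(culture_titer(extract, extract_ml=10, culture_ml=60))  # 3.0 (mg/l of culture)
```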
Three different transconjugants carrying integrated BamHI fragments from pEPO15 subclones were tested by the method described above: a transconjugant containing the BamHI fragment of plasmid pEPO15-21, a transconjugant containing the BamHI fragment of plasmid pEPO15-4-5, and a transconjugant containing the BamHI fragment of plasmid pEPO15-4-1. HPLC analysis revealed that none of these transconjugants produced epothilones A and B any longer. In contrast, epothilones A and B were detected at concentrations of 2-4 mg/l in transconjugants carrying integrated BamHI fragments from pEPO20, pEPO30, pEPO31 and pEPO33, and in the parent strain BCE28/2.
Example 8: Determination of the nucleotide sequences of the cloned fragments and construction of a contig
A. BamHI insert of plasmid pEPO15-21
Plasmid DNA was isolated from E. coli strain DH10B[pEPO15-21], and the nucleotide sequence of the 2.3-kb BamHI fragment inserted in pEPO15-21 was determined. Automated DNA sequencing was performed on double-stranded DNA templates by the dideoxynucleotide chain-termination method, using an Applied Biosystems Model 377 sequencer. The primers used were the universal reverse primer (5' GGA AAC AGC TAT GAC CAT G 3' (SEQ ID NO:24)) and the universal forward primer (5' GTA AAA CGA CGG CCA GT 3' (SEQ ID NO:25)). In subsequent rounds of sequencing reactions, custom-synthesized oligonucleotides designed from the 3' ends of previously determined sequences were used to extend and join the contigs. Both strands were completely sequenced, and every nucleotide was sequenced at least twice. The nucleotide sequence was edited with the Sequencher program, version 3.0 (Gene Codes Corporation), and analyzed with the Wisconsin Genetics Computer Group programs. The nucleotide sequence of the 2213-bp insert corresponds to nucleotides 20779-22991 of SEQ ID NO:1.
B. BamHI insert of plasmid pEPO15-4-1
Plasmid DNA was isolated from E. coli strain DH10B[pEPO15-4-1], and the nucleotide sequence of the 3.9-kb BamHI fragment inserted in pEPO15-4-1 was determined as described in (A) above. The nucleotide sequence of the 3909-bp insert corresponds to nucleotides 16876-20784 of SEQ ID NO:1.
C. BamHI insert of plasmid pEPO15-4-5
Plasmid DNA was isolated from E. coli strain DH10B[pEPO15-4-5], and the nucleotide sequence of the 2.3-kb BamHI fragment inserted in pEPO15-4-5 was determined as described in (A) above. The nucleotide sequence of the 2233-bp insert corresponds to nucleotides 42528-44760 of SEQ ID NO:1.
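The three reported insert lengths can be cross-checked against their SEQ ID NO:1 coordinates (1-based, inclusive), and overlaps between inserts can be computed the same way. In the Python sketch below, the pEPO15-4-1 and pEPO15-21 inserts share 6 bp, the length of a BamHI recognition site (GGATCC). This would fit their being adjacent BamHI fragments whose reported ranges each include the shared site, but that reading is an inference rather than a statement in the text. The dictionary and helper names are illustrative.

```python
# (start, end, reported insert length in bp); 1-based inclusive coordinates in SEQ ID NO:1.
BAMHI_INSERTS = {
    "pEPO15-21": (20779, 22991, 2213),
    "pEPO15-4-1": (16876, 20784, 3909),
    "pEPO15-4-5": (42528, 44760, 2233),
}


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


def overlap_bp(a: tuple, b: tuple) -> int:
    """Number of shared positions between two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Every reported length agrees with its coordinates.
for name, (start, end, reported) in BAMHI_INSERTS.items():
    assert span_length(start, end) == reported, name

print(overlap_bp(BAMHI_INSERTS["pEPO15-4-1"], BAMHI_INSERTS["pEPO15-21"]))  # 6
```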
Example 9: Subcloning and ordering of the DNA fragments in pEPO15 that contain the epothilone biosynthetic genes
pEPO15 was completely digested with the restriction enzyme HindIII, and the resulting fragments were subcloned into pBluescriptII SK- or pNEB193 (New England Biolabs) that had been cut with HindIII and dephosphorylated with calf intestinal alkaline phosphatase. Six different clones were obtained and designated pEPO15-NH1, pEPO15-NH2, pEPO15-NH6 and pEPO15-NH24 (all derived from pNEB193) and pEPO15-H2.7 and pEPO15-H3.0 (both derived from pBluescriptII SK-).
The BamHI insert of pEPO15-21 was isolated, labeled with DIG (nonradioactive DNA labeling and detection system, Boehringer Mannheim), and used as a probe against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0 in DNA hybridization experiments under high-stringency conditions. A strong hybridization signal was detected with pEPO15-NH24, indicating that pEPO15-NH24 contains the pEPO15-21 insert.
The BamHI insert of pEPO15-4-1 was isolated and DIG-labeled as described above and used as a probe against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0 in DNA hybridization experiments under high-stringency conditions. Strong hybridization signals were detected with pEPO15-NH24 and pEPO15-H2.7. In addition, the nucleotide sequence information obtained from the ends of pEPO15-NH24 and pEPO15-H2.7 agreed fully with the previously determined sequence of the BamHI insert of pEPO15-4-1. These results show that pEPO15-4-1 (which contains an internal HindIII site) overlaps pEPO15-H2.7 and pEPO15-NH24, and that pEPO15-H2.7 and pEPO15-NH24 are adjacent, in that order.
The BamHI insert of pEPO15-4-5 was isolated and DIG-labeled as described above and used as a probe against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0 in DNA hybridization experiments under high-stringency conditions. A strong hybridization signal was detected with pEPO15-NH2, indicating that pEPO15-NH2 contains the pEPO15-4-5 insert.
Nucleotide sequence information was obtained from both ends of pEPO15-NH2 and from the end of pEPO15-NH24 that does not overlap pEPO15-4-1. Based on these sequences, PCR primers pointing toward the HindIII sites were designed, namely NH24 end "B": GTGACTGGCGCCTGGAATCTGCATGAGC (SEQ ID NO:26), NH2 end "A": AGCGGGAGCTTGCTAGACATTCTGTTTC (SEQ ID NO:27) and NH2 end "B": GACGCGCCTCGGGCAGCGCCCCAA (SEQ ID NO:28), and were used in separate amplification reactions with pEPO15 or Sorangium cellulosum So ce90 genomic DNA as template. Specific amplification was obtained with both templates using NH24 end "B" and NH2 end "A" as the primer pair. The amplification products were cloned into pBluescript II SK- and completely sequenced. The sequences of the two amplification products were identical, agreed fully with the end sequences of pEPO15-NH24 and pEPO15-NH2, and were joined at a HindIII site, establishing that the HindIII fragments of pEPO15-NH24 and pEPO15-NH2 are adjacent, in that order.
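The junction-PCR logic above (two primers facing a HindIII site from different subclones give a product from genomic DNA only if the two fragments are adjacent in the genome) can be sketched as an exact-match in-silico PCR. The Python sketch below locates the forward primer on the top strand and the reverse complement of the reverse primer downstream of it, and returns the predicted product. The toy template and primers are made up, because SEQ ID NO:1 is not reproduced here. Mismatch tolerance, melting temperature and secondary structure, which matter in real primer design, are not modelled; HindIII's recognition sequence AAGCTT is standard.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence (palindromic)


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(_COMPLEMENT)[::-1]


def predicted_amplicon(template: str, fwd: str, rev: str) -> str | None:
    """Exact-match in-silico PCR on the top strand of `template`.

    `fwd` anneals to the bottom strand, so it appears verbatim on the top strand;
    `rev` anneals to the top strand, so its reverse complement must appear downstream.
    Returns the predicted product, or None if either binding site is missing.
    """
    t = template.upper()
    start = t.find(fwd.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    stop = t.find(rev_site, start + len(fwd))
    if stop == -1:
        return None
    return t[start:stop + len(rev_site)]


# Toy junction: two made-up "fragments" joined at a HindIII site.
junction = "GGGG" + "ACGTAC" + "TTTT" + HINDIII_SITE + "CCCC" + "GATCGA" + "GGGG"
product = predicted_amplicon(junction, "ACGTAC", "TCGATC")
print(product)                  # ACGTACTTTTAAGCTTCCCCGATCGA
print(HINDIII_SITE in product)  # True
```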
Separate the also HindIII insertion fragment of DIG mark pEPO2.7 as mentioned above, and in the DNA hybrid experiment, under highly rigorous condition, use as probe at the pEPO15 that digests through Not I.The Not I fragment that size is approximately 9kb shows strong hybridization signal, with its further subclone to Not I digestion and with among the dephosphorylized pBluescript II of the calf intestine alkaline phosphatase SK-, thereby generation pEPO15-N9-16.Separate the also Not I insertion fragment of DIG mark pEPO15-N9-16 as mentioned above, and in the DNA hybrid experiment, under highly rigorous condition, use as probe at pEPO15-NH1, pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0.Clone pEPO15-H2.7 and pEPO15-NH24 to pEPO15-NH6 and expection detect strong hybridization signal.Two ends and pEPO15-H2.7 by pEPO15-NH6 do not obtain nucleotide sequence information with that end of pEPO15-4-1 eclipsed.Design towards the PCR in HindIII site primer and be used for pEPO15 and in Individual testwas with the amplified reaction of sorangium cellulosum So ce90 genomic dna as template.Right with pEPO15-NH6 end " B ": CACCGAAGCGTCGATCTGGTCCATC (SEQ ID NO:29) and pEPO15-H2.7 end " A ": CGGTCAGATCGACGACGGGCTTTCC (SEQ ID NO:30) as primer, in two kinds of templates, all find specific amplified.Be cloned into amplified material among the pBluescript H SK-and order-checking fully.The sequence of amplified material is identical, but also in full accord with the end sequence of pEPO15-NH6 and pEPO15-NH2.7, merges in the HindIII site, has determined that the HindIII fragment of pEPO15-NH6 and pEPO15-NH2.7 is adjacent with this order.
Taken together, these experiments established a contig of adjacent HindIII fragments covering a region of about 55 kb, consisting of the HindIII inserts of pEPO15-NH6, pEPO15-H2.7, pEPO15-NH24 and pEPO15-NH2, in that order. The inserts of the remaining two HindIII subclones (designated pEPO15-NH1 and pEPO15-H3.0) were not found to be part of this contig.
Embodiment 10: Further extension of the subclone contig covering the epothilone biosynthesis genes
An approximately 2.2 kb BamHI-HindIII fragment derived from the downstream end of the pEPO15-NH2 insert, and therefore representing the downstream end of the subclone contig described in Embodiment 9, was isolated, DIG-labeled and used in Southern hybridization experiments against pEPO15 and pEPO15-NH2 DNA digested with a range of restriction enzymes. The strongly hybridizing bands were always of identical size in the two target DNAs, indicating that the Sorangium cellulosum So ce90 genomic DNA fragment cloned in pEPO15 ends at the HindIII site at the downstream end of pEPO15-NH2.
A cosmid DNA library of Sorangium cellulosum So ce90 was constructed in pScosTriplex-II (Ji et al., Genomics 31:185-192 (1996)) using established procedures. Briefly, high-molecular-weight genomic DNA of Sorangium cellulosum So ce90 was partially digested with the restriction enzyme Sau3AI to give fragments with an average size of about 40 kb, which were ligated into BamHI- and XbaI-digested pScosTriplex-II. The ligation mixture was packaged with Gigapack III XL (Stratagene) and used to transfect Escherichia coli XL1-Blue MR cells.
The cosmid library was screened by colony hybridization using the approximately 2.2 kb BamHI-HindIII fragment described above (derived from the downstream end of the pEPO15-NH2 insert) as a probe, and one strongly hybridizing clone, designated pEPO4E7, was selected.
pEPO4E7 DNA was isolated, digested with various restriction enzymes and probed by Southern hybridization with the 2.2 kb BamHI-HindIII fragment. A strongly hybridizing NotI fragment of approximately 9 kb was selected and subcloned into pBluescript II SK- to generate pEPO4E7-N9-8. Further Southern hybridization experiments revealed that the approximately 9 kb NotI insert of pEPO4E7-N9-8 overlaps the NotI-HindIII fragment of pEPO15-NH2 by more than 6 kb, and that the remaining approximately 3 kb HindIII-NotI portion would extend the subclone contig described in Embodiment 9. However, end sequencing revealed that the downstream end of the pEPO4E7-N9-8 insert contains the BamHI-NotI multiple cloning site of pScosTriplex-II. This indicates that the genomic DNA in the pEPO4E7 insert ends at a Sau3AI site within the HindIII-NotI extension fragment, and that this NotI site is derived from pScosTriplex-II.
An approximately 1.6 kb PstI-SalI fragment derived from the approximately 3 kb HindIII-NotI extension subfragment of pEPO4E7-N9-8 (containing no vector sequence, only sequence derived from Sorangium cellulosum So ce90) was used as a probe against the bacterial artificial chromosome library described in Embodiment 2. In addition to the previously isolated EPO15, a BAC clone designated EPO32 also hybridized strongly with this probe. pEPO32 was isolated, digested with various restriction enzymes and hybridized with the approximately 1.6 kb PstI-SalI probe. An approximately 13 kb HindIII-EcoRV fragment hybridized strongly with this probe and was subcloned into HindIII- and HincII-digested pBluescript II SK- to generate pEPO32-HEV15.
Oligonucleotide primers were designed from the downstream end sequence of pEPO15-NH2 and the upstream (HindIII) end sequence of pEPO32-HEV15 and used in sequencing reactions with pEPO4E7-N9-8 as template. The sequence revealed a small 24 bp HindIII fragment (EPO4E7-H0.02), not detectable by conventional restriction analysis, that separates the HindIII site at the downstream end of pEPO15-NH2 from the HindIII site at the upstream end of pEPO32-HEV15.
The subclone contig described in Embodiment 9 was thus extended by the HindIII fragment EPO4E7-H0.02 and the insert of pEPO32-HEV15, and now consists of the inserts of pEPO15-NH6, pEPO15-H2.7, pEPO15-NH24, pEPO15-NH2, EPO4E7-H0.02 and pEPO32-HEV15, in that order.
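In Embodiments 9 and 10, each PCR or sequencing experiment establishes that two fragments are directly adjacent, and the contig order follows from chaining those links. A minimal Python sketch of that chaining step, assuming each link is already oriented left to right; the pairs are read off the contig order stated above, and the function name `order_contig` is hypothetical.

```python
# Directed adjacencies (left fragment, right fragment) between neighbouring
# HindIII fragments, listed in arbitrary order.
ADJACENT = [
    ("pEPO15-NH24", "pEPO15-NH2"),
    ("pEPO15-NH6", "pEPO15-H2.7"),
    ("pEPO15-NH2", "EPO4E7-H0.02"),
    ("pEPO15-H2.7", "pEPO15-NH24"),
    ("EPO4E7-H0.02", "pEPO32-HEV15"),
]

def order_contig(pairs):
    """Chain left->right adjacencies into one linear contig, or raise."""
    right_of = {}
    for left, right in pairs:
        if left in right_of:
            raise ValueError(f"{left} has two right-hand neighbours")
        right_of[left] = right
    has_left = {right for _, right in pairs}
    starts = [left for left in right_of if left not in has_left]
    if len(starts) != 1:
        raise ValueError("adjacencies do not define a single start fragment")
    order, seen = [starts[0]], {starts[0]}
    while order[-1] in right_of:
        nxt = right_of[order[-1]]
        if nxt in seen:
            raise ValueError("adjacencies contain a cycle")
        order.append(nxt)
        seen.add(nxt)
    if len(order) != len(pairs) + 1:
        raise ValueError("adjacencies do not form a single linear contig")
    return order

print(" - ".join(order_contig(ADJACENT)))
```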
Embodiment 11: Determination of the nucleotide sequence of the subclone contig covering the epothilone biosynthesis genes
The nucleotide sequence of the subclone contig described in Embodiment 10 was determined as follows.
pEPO15-H2.7. Plasmid DNA was isolated from E. coli strain DH10B[pEPO15-H2.7], and the nucleotide sequence of the 2.7 kb BamHI insert of pEPO15-H2.7 was determined. Automated DNA sequencing was performed on double-stranded DNA templates by the dideoxynucleotide chain-termination method using an Applied Biosystems model 377 sequencer. The primers used were the universal reverse primer (5'-GGA AAC AGC TAT GAC CAT G-3' (SEQ ID NO:24)) and the universal forward primer (5'-GTA AAA CGA CGG CCA GT-3' (SEQ ID NO:25)). In subsequent rounds of sequencing, custom synthetic oligonucleotides designed from the 3' ends of previously determined sequences were used to extend and join the contigs.
pEPO15-NH6, pEPO15-NH24 and pEPO15-NH2. The HindIII inserts of these plasmids were isolated and randomly fragmented with a Hydroshear apparatus (Genomic Instrumentation Services, Inc.) to give fragments with an average size of 1-2 kb. The fragments were end-repaired with T4 DNA polymerase and Klenow DNA polymerase in the presence of deoxynucleoside triphosphates and phosphorylated with T4 polynucleotide kinase in the presence of ATP. Fragments in the 1.5-2.2 kb size range were isolated from an agarose gel and ligated into EcoRV-cut, dephosphorylated pBluescript II SK-. The random subclones were sequenced with the universal reverse and forward primers.
pEPO32-HEV15. pEPO32-HEV15 was digested with HindIII and SspI. The approximately 13.3 kb fragment, comprising the approximately 13 kb HindIII-EcoRV insert from Sorangium cellulosum So ce90 and the 0.3 kb HincII-SspI fragment of pBluescript II SK-, was isolated and partially digested with HaeIII to give fragments with an average size of 1-2 kb. Fragments in the 1.5-2.2 kb size range were isolated from an agarose gel and ligated into EcoRV-cut, dephosphorylated pBluescript II SK-. The random subclones were sequenced with the universal reverse and forward primers.
The chromatograms were analyzed and assembled into contigs with the Phred, Phrap and Consed programs (Ewing et al., Genome Res. 8(3):175-185 (1998); Ewing et al., Genome Res. 8(3):186-194 (1998); Gordon et al., Genome Res. 8(3):195-202 (1998)). Contig gaps were closed, sequence discrepancies resolved and low-accuracy regions resequenced using custom-designed oligonucleotide primers, either on the original subclones or on clones selected from the random subclone libraries. Both strands were sequenced completely, and every base pair has a cumulative Phred score of at least 40 (99.99% confidence).
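The "score of at least 40 (99.99% confidence)" criterion follows from the Phred definition Q = -10 log10(P), where P is the estimated probability that a base call is wrong. A minimal Python sketch of the conversion; the final line is only an illustrative upper bound that assumes every base of the contig sits exactly at Q40.

```python
import math

def phred_error_probability(q: float) -> float:
    """Estimated probability that a base call is wrong, from Q = -10*log10(P)."""
    return 10 ** (-q / 10)

def phred_quality(p: float) -> float:
    """Inverse conversion: Phred quality for an error probability p."""
    return -10 * math.log10(p)

for q in (20, 30, 40):
    p = phred_error_probability(q)
    print(f"Q{q}: P(error) = {p:g}, accuracy = {100 * (1 - p):.2f}%")

# Upper bound on expected errors if each of the 68,750 bases were exactly Q40.
print(68750 * phred_error_probability(40))
```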
The nucleotide sequence of this 68,750 bp contig is shown as SEQ ID NO:1.
Embodiment 12: Analysis of the nucleotide sequence of the epothilone biosynthesis genes
SEQ ID NO:1 was found to contain 22 ORFs, which are listed in Table 1:
Table 1
| ORF | Start codon | Stop codon | Homology of the deduced protein | Putative function of the deduced protein |
|---|---|---|---|---|
| orf1 | outside sequenced region | 1826 | | |
| orf2* | 3171 | 1900 | hypothetical protein SP:Q11037; DD-peptidase SP:P15555 | |
| orf3 | 3415 | 5556 | Na+/H+ antiporter PID:D1017724 | transport |
| orf4* | 5992 | 5612 | | |
| orf5 | 6226 | 6675 | | |
| epoA | 7610 | 11875 | type I polyketide synthase | epothilone synthase: thiazole ring formation |
| epoP | 11872 | 16104 | nonribosomal peptide synthetase | epothilone synthase: thiazole ring formation |
| epoB | 16251 | 21749 | type I polyketide synthase | epothilone synthase: polyketide backbone formation |
| epoC | 21746 | 43519 | type I polyketide synthase | epothilone synthase: polyketide backbone formation |
| epoD | 43524 | 54920 | type I polyketide synthase | epothilone synthase: polyketide backbone formation |
| epoE | 54935 | 62254 | type I polyketide synthase | epothilone synthase: polyketide backbone formation |
| epoF | 62369 | 63628 | cytochrome P450 | epothilone macrolide oxidase |
| orf6 | 63779 | 64333 | | |
| orf7* | 64290 | 63853 | | |
| orf8 | 64363 | 64920 | | |
| orf9* | 64727 | 64287 | | |
| orf10 | 65063 | 65767 | | |
| orf11* | 65874 | 65008 | | |
| orf12* | 66338 | 65871 | | |
| orf13 | 66667 | 67137 | | |
| orf14 | 67334 | 68251 | hypothetical protein GI:3293544; cation efflux protein family GI:2623026 | transport |
| orf15 | 68346 | outside sequenced region | | |

ORFs marked * lie on the reverse complementary strand. Numbering is according to SEQ ID NO:1.
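The coordinates in Table 1 also determine the strand, length and deduced protein size of each ORF. The Python sketch below assumes the listed positions are 1-based and inclusive and that the stop position is the last base of the stop codon; under that assumption epoE gives 2,439 residues, matching the end of the EPOS E thioesterase domain at amino acid 2439 reported below. Intergenic distance is computed as next start - previous stop - 1, so negative values are overlaps. Function names are hypothetical.

```python
# Start/stop coordinates (SEQ ID NO:1) from Table 1; reverse-strand ORFs have
# start > stop. The partial ORFs orf1 and orf15 are omitted.
ORFS = {
    "orf2": (3171, 1900), "orf3": (3415, 5556), "orf4": (5992, 5612),
    "orf5": (6226, 6675), "epoA": (7610, 11875), "epoP": (11872, 16104),
    "epoB": (16251, 21749), "epoC": (21746, 43519), "epoD": (43524, 54920),
    "epoE": (54935, 62254), "epoF": (62369, 63628), "orf6": (63779, 64333),
    "orf7": (64290, 63853), "orf8": (64363, 64920), "orf9": (64727, 64287),
    "orf10": (65063, 65767), "orf11": (65874, 65008), "orf12": (66338, 65871),
    "orf13": (66667, 67137), "orf14": (67334, 68251),
}

def orf_summary(start: int, stop: int):
    """Return (strand, length in nt, deduced protein length in residues)."""
    strand = "+" if stop > start else "-"
    length_nt = abs(stop - start) + 1
    if length_nt % 3:
        raise ValueError(f"{length_nt} nt is not a whole number of codons")
    # Assumes the stop coordinate includes the stop codon, which is not translated.
    return strand, length_nt, length_nt // 3 - 1

def intergenic(prev_gene: str, next_gene: str) -> int:
    """Gap in nt between two forward-strand genes; negative means overlap."""
    return ORFS[next_gene][0] - ORFS[prev_gene][1] - 1

for name, (start, stop) in ORFS.items():
    strand, nt, aa = orf_summary(start, stop)
    print(f"{name:6s} {strand} {nt:6d} nt {aa:5d} aa")

epo_genes = ["epoA", "epoP", "epoB", "epoC", "epoD", "epoE", "epoF"]
for a, b in zip(epo_genes, epo_genes[1:]):
    print(f"{a} -> {b}: {intergenic(a, b)} nt")
```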
epoA (nucleotides 7610-11875 of SEQ ID NO:1) encodes EPOS A (SEQ ID NO:2), a type I polyketide synthase that consists of a single module and comprises the following domains: β-ketoacyl synthase (KS) (nucleotides 7643-8920 of SEQ ID NO:1, amino acids 11-437 of SEQ ID NO:2), acyltransferase (AT) (nucleotides 9236-10201 of SEQ ID NO:1, amino acids 543-864 of SEQ ID NO:2), enoyl reductase (ER) (nucleotides 10529-11428 of SEQ ID NO:1, amino acids 974-1273 of SEQ ID NO:2) and an acyl carrier protein domain (ACP) (nucleotides 11549-11764 of SEQ ID NO:1, amino acids 1314-1385 of SEQ ID NO:2). Sequence comparison and motif analysis (Haydock et al., FEBS Lett. 374:246-248 (1995); Tang et al., Gene 216:255-265 (1998)) revealed that the AT of EPOS A is specific for malonyl-CoA. EPOS A is therefore expected to initiate epothilone biosynthesis by loading an acetate unit onto the multienzyme complex; this unit ultimately forms part (C26 and C20) of the 2-methylthiazole ring.
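Each domain is reported twice, as a nucleotide range in SEQ ID NO:1 and as a residue range in the deduced protein. For a forward-strand ORF the conversion is simple codon arithmetic. A Python sketch, assuming 1-based inclusive coordinates with residue 1 encoded by the first three nucleotides of the ORF (the helper name `nt_to_aa` is hypothetical); applied to epoA it reproduces the residue ranges given above for the AT, ER and ACP domains.

```python
def nt_to_aa(orf_start: int, nt_first: int, nt_last: int):
    """Map an inclusive nucleotide range in a forward-strand ORF to residues.

    Residue 1 is encoded by nucleotides orf_start..orf_start+2 (1-based).
    """
    first_offset = nt_first - orf_start       # bases before the range starts
    last_offset = nt_last - orf_start + 1     # bases up to and including the range end
    if first_offset < 0 or first_offset % 3 or last_offset % 3:
        raise ValueError("range is not codon-aligned with the ORF start")
    return first_offset // 3 + 1, last_offset // 3

EPOA_START = 7610  # first nucleotide of epoA in SEQ ID NO:1
EPOA_DOMAINS = {"AT": (9236, 10201), "ER": (10529, 11428), "ACP": (11549, 11764)}

for domain, (first, last) in EPOA_DOMAINS.items():
    print(domain, nt_to_aa(EPOA_START, first, last))
```

The same helper works for the other forward-strand genes once their start coordinate from Table 1 is substituted.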
epoP (nucleotides 11872-16104 of SEQ ID NO:1) encodes EPOS P (SEQ ID NO:3), a nonribosomal peptide synthetase containing a single module. EPOS P comprises the following domains:
● a peptide-bond-forming (condensation) domain, delineated by motif K (amino acids 72-81 [FPLTDIQESY] of SEQ ID NO:3, corresponding to nucleotides 12085-12114 of SEQ ID NO:1), motif L (amino acids 118-125 [VVARHDML] of SEQ ID NO:3, nucleotides 12223-12246 of SEQ ID NO:1), motif M (amino acids 199-212 [SIDLINVDLGSLSI] of SEQ ID NO:3, nucleotides 12466-12507 of SEQ ID NO:1) and motif O (amino acids 353-363 [GDFTSMVLLDI] of SEQ ID NO:3, nucleotides 12928-12960 of SEQ ID NO:1);
● an aminoacyl adenylate-forming (adenylation) domain, delineated by motif A (amino acids 549-565 [LTYEELSRRSRRLGARL] of SEQ ID NO:3, corresponding to nucleotides 13516-13566 of SEQ ID NO:1), motif B (amino acids 588-603 [VAVLAVLESGAAYVPI], nucleotides 13633-13680), motif C (amino acids 669-684 [AYVIYTSGSTGLPKGV], nucleotides 13876-13923), motif D (amino acids 815-821 [SLGGATE], nucleotides 14313-14334), motif E (amino acids 868-892 [GQLYIGGVGLALGYWRDEEKTRKSF], nucleotides 14473-14547), motif F (amino acids 903-912 [YKTGDLGRYL], nucleotides 14578-14607), motif G (amino acids 918-940 [EFMGREDNQIKLRGYRVELGEIE], nucleotides 14623-14692), motif H (amino acids 1268-1274 [LPEYMVP], nucleotides 15673-15693) and motif I (amino acids 1285-1297 [LTSNGKVDRKALR], nucleotides 15724-15762);
● a domain of unknown function inserted between motifs G and H of the aminoacyl adenylate-forming domain (amino acids 973-1256 of SEQ ID NO:3, corresponding to nucleotides 14788-15639 of SEQ ID NO:1); and
● a peptidyl carrier protein domain (PCP), delineated by motif J (amino acids 1344-1351 [GATSIHIV] of SEQ ID NO:3, corresponding to nucleotides 15901-15924 of SEQ ID NO:1).
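The motif annotations for EPOS P can be cross-checked the same way: each motif string should be as long as its residue span, and its nucleotide range should cover exactly those codons of epoP, which starts at nucleotide 11872. A Python sketch of such a consistency check, with the data copied from the list above (the function name is hypothetical). As written, the nucleotide ranges for motifs D (14313-14334, 22 nt) and G (14623-14692, 70 nt) are not whole numbers of codons, so the check reports them; all other motifs pass.

```python
EPOP_START = 11872  # first nucleotide of epoP in SEQ ID NO:1

# motif: (sequence, first aa, last aa, first nt, last nt), copied from the text
MOTIFS = {
    "K": ("FPLTDIQESY", 72, 81, 12085, 12114),
    "L": ("VVARHDML", 118, 125, 12223, 12246),
    "M": ("SIDLINVDLGSLSI", 199, 212, 12466, 12507),
    "O": ("GDFTSMVLLDI", 353, 363, 12928, 12960),
    "A": ("LTYEELSRRSRRLGARL", 549, 565, 13516, 13566),
    "B": ("VAVLAVLESGAAYVPI", 588, 603, 13633, 13680),
    "C": ("AYVIYTSGSTGLPKGV", 669, 684, 13876, 13923),
    "D": ("SLGGATE", 815, 821, 14313, 14334),
    "E": ("GQLYIGGVGLALGYWRDEEKTRKSF", 868, 892, 14473, 14547),
    "F": ("YKTGDLGRYL", 903, 912, 14578, 14607),
    "G": ("EFMGREDNQIKLRGYRVELGEIE", 918, 940, 14623, 14692),
    "H": ("LPEYMVP", 1268, 1274, 15673, 15693),
    "I": ("LTSNGKVDRKALR", 1285, 1297, 15724, 15762),
    "J": ("GATSIHIV", 1344, 1351, 15901, 15924),
}

def check_motif(seq, aa_first, aa_last, nt_first, nt_last, orf_start=EPOP_START):
    """Return a list of inconsistencies; an empty list means the entry agrees."""
    problems = []
    if len(seq) != aa_last - aa_first + 1:
        problems.append("motif length differs from residue span")
    expected_first = orf_start + 3 * (aa_first - 1)
    expected_last = orf_start + 3 * aa_last - 1
    if (nt_first, nt_last) != (expected_first, expected_last):
        problems.append(f"expected nucleotides {expected_first}-{expected_last}")
    return problems

for name, entry in MOTIFS.items():
    print(name, check_motif(*entry) or "ok")
```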
EPOS P is proposed to carry out the adenylation (activation) of cysteine, to bind the activated cysteine as aminoacyl-S-PCP, to form a peptide bond between the enzyme-bound cysteine and the acetyl-S-ACP supplied by EPOS A, and to form the initial thiazoline ring by intramolecular heterocyclization. The domain of unknown function in EPOS P shows very weak homology to NAD(P)H oxidases and reductases from Bacillus species. This domain and/or the ER domain of EPOS A may therefore be involved in converting the initial 2-methylthiazoline ring into the 2-methylthiazole.
epoB (nucleotides 16251-21749 of SEQ ID NO:1) encodes EPOS B (SEQ ID NO:4), a type I polyketide synthase that consists of a single module and comprises the following domains: KS (nucleotides 16269-17546 of SEQ ID NO:1, amino acids 7-432 of SEQ ID NO:4), AT (nucleotides 17865-18827 of SEQ ID NO:1, amino acids 539-859 of SEQ ID NO:4), dehydratase (DH) (nucleotides 18855-19361 of SEQ ID NO:1, amino acids 869-1037 of SEQ ID NO:4), β-ketoreductase (KR) (nucleotides 20565-21302 of SEQ ID NO:1, amino acids 1439-1684 of SEQ ID NO:4) and ACP (nucleotides 21414-21626 of SEQ ID NO:1, amino acids 1722-1792 of SEQ ID NO:4). Sequence comparison and motif analysis revealed that the AT of EPOS B is specific for methylmalonyl-CoA. EPOS B is likely to carry out the first round of polyketide chain extension by catalyzing the Claisen-type condensation of the 2-methyl-4-thiazolecarboxyl-S-PCP starter group with methylmalonyl-S-ACP, followed by reduction of the β-keto group at C17 to an enoyl group.
epoC (nucleotides 21746-43519 of SEQ ID NO:1) encodes EPOS C (SEQ ID NO:5), a type I polyketide synthase consisting of 4 modules. The first module comprises a KS domain (nucleotides 21860-23116 of SEQ ID NO:1, amino acids 39-457 of SEQ ID NO:5), a malonyl-CoA-specific AT (nucleotides 23431-24397 of SEQ ID NO:1, amino acids 563-884 of SEQ ID NO:5), a KR (nucleotides 25184-25942 of SEQ ID NO:1, amino acids 1147-1399 of SEQ ID NO:5) and an ACP (nucleotides 26045-26263 of SEQ ID NO:1, amino acids 1434-1506 of SEQ ID NO:5). This module introduces an acetate extender unit (C14-C13) and reduces the β-keto group at C15 to the hydroxyl group that takes part in the final lactonization of the epothilone macrolactone ring. The second module of EPOS C comprises a KS (nucleotides 26318-27595 of SEQ ID NO:1, amino acids 1524-1950 of SEQ ID NO:5), a malonyl-CoA-specific AT (nucleotides 27911-28876 of SEQ ID NO:1, amino acids 2056-2377 of SEQ ID NO:5), a KR (nucleotides 29678-30429 of SEQ ID NO:1, amino acids 2645-2895 of SEQ ID NO:5) and an ACP (nucleotides 30539-30759 of SEQ ID NO:1, amino acids 2932-3005 of SEQ ID NO:5). This module introduces an acetate extender unit (C12-C11) and reduces the β-keto group at C13 to a hydroxyl group. The nascent epothilone polyketide chain therefore corresponds to epothilone A; introducing the C12 methyl side chain of epothilone B would require a subsequent, post-PKS C-methyltransferase activity. Formation of the epoxide ring at C13-C12 would likewise require a post-PKS oxidation step. The third module of EPOS C comprises a KS (nucleotides 30815-32092 of SEQ ID NO:1, amino acids 3024-3449 of SEQ ID NO:5), a malonyl-CoA-specific AT (nucleotides 32408-33373 of SEQ ID NO:1, amino acids 3555-3876 of SEQ ID NO:5), a DH (nucleotides 33401-33889 of SEQ ID NO:1, amino acids 3886-4048 of SEQ ID NO:5), an ER (nucleotides 35042-35902 of SEQ ID NO:1, amino acids 4433-4719 of SEQ ID NO:5), a KR (nucleotides 35930-36667 of SEQ ID NO:1, amino acids 4729-4974 of SEQ ID NO:5) and an ACP (nucleotides 36773-36991 of SEQ ID NO:1, amino acids 5010-5082 of SEQ ID NO:5). This module introduces an acetate extender unit (C10-C9) and fully reduces the β-keto group at C11. The fourth module of EPOS C comprises a KS (nucleotides 37052-38320 of SEQ ID NO:1, amino acids 5103-5525 of SEQ ID NO:5), a methylmalonyl-CoA-specific AT (nucleotides 38636-39598 of SEQ ID NO:1, amino acids 5631-5951 of SEQ ID NO:5), a DH (nucleotides 39635-40141 of SEQ ID NO:1, amino acids 5964-6132 of SEQ ID NO:5), an ER (nucleotides 41369-42256 of SEQ ID NO:1, amino acids 6542-6837 of SEQ ID NO:5), a KR (nucleotides 42314-43048 of SEQ ID NO:1, amino acids 6857-7101 of SEQ ID NO:5) and an ACP (nucleotides 43163-43378 of SEQ ID NO:1, amino acids 7140-7211 of SEQ ID NO:5). This module introduces a propionate extender unit (C24 and C8-C7) and fully reduces the β-keto group at C9.
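The reasoning applied to each EPOS C module, where the AT specificity selects the extender unit and the set of reductive domains (KR, DH, ER) determines how far the β-keto group is processed, follows the standard colinearity model for modular type I polyketide synthases. A simplified Python sketch of that rule, assuming every listed domain is active (an assumption the text itself questions for the second module of EPOS D and for the ER of EPOS E); the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Extender unit implied by the AT specificity.
EXTENDER = {
    "malonyl": "acetate extender (no side chain)",
    "methylmalonyl": "propionate extender (methyl side chain)",
}

@dataclass(frozen=True)
class Module:
    name: str
    at_substrate: str      # "malonyl" or "methylmalonyl"
    reductive: frozenset   # reductive-loop domains present: any of KR, DH, ER

def beta_carbon_state(module: Module) -> str:
    """Simplified colinearity rule; assumes every listed domain is active."""
    d = module.reductive
    if "KR" not in d:
        return "keto"
    if "DH" not in d:
        return "hydroxyl"
    if "ER" not in d:
        return "enoyl (double bond)"
    return "fully reduced (methylene)"

EPOS_C = [
    Module("EPOS C module 1", "malonyl", frozenset({"KR"})),
    Module("EPOS C module 2", "malonyl", frozenset({"KR"})),
    Module("EPOS C module 3", "malonyl", frozenset({"DH", "ER", "KR"})),
    Module("EPOS C module 4", "methylmalonyl", frozenset({"DH", "ER", "KR"})),
]

for m in EPOS_C:
    print(f"{m.name}: {EXTENDER[m.at_substrate]}, beta-carbon -> {beta_carbon_state(m)}")
```

For EPOS C the rule agrees with the text: hydroxyl groups at C15 and C13, full reduction at C11 and C9, and a methyl branch only from the methylmalonyl-specific fourth module.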
epoD (nucleotides 43524-54920 of SEQ ID NO:1) encodes EPOS D (SEQ ID NO:6), a type I polyketide synthase consisting of 2 modules. The first module comprises a KS domain (nucleotides 43626-44885 of SEQ ID NO:1, amino acids 35-454 of SEQ ID NO:6), a methylmalonyl-CoA-specific AT (nucleotides 45204-46166 of SEQ ID NO:1, amino acids 561-881 of SEQ ID NO:6), a KR (nucleotides 46950-47702 of SEQ ID NO:1, amino acids 1143-1393 of SEQ ID NO:6) and an ACP (nucleotides 47811-48032 of SEQ ID NO:1, amino acids 1430-1503 of SEQ ID NO:6). This module introduces a propionate extender unit (C23 and C6-C5) and reduces the β-keto group at C7 to a hydroxyl group. The second module comprises a KS (nucleotides 48087-49361 of SEQ ID NO:1, amino acids 1522-1946 of SEQ ID NO:6), a methylmalonyl-CoA-specific AT (nucleotides 49680-50642 of SEQ ID NO:1, amino acids 2053-2373 of SEQ ID NO:6), a DH (nucleotides 50670-51176 of SEQ ID NO:1, amino acids 2383-2551 of SEQ ID NO:6), a methyltransferase (MT; nucleotides 51534-52657 of SEQ ID NO:1, amino acids 2671-3045 of SEQ ID NO:6), a KR (nucleotides 53697-54431 of SEQ ID NO:1, amino acids 3392-3636 of SEQ ID NO:6) and an ACP (nucleotides 54540-54758 of SEQ ID NO:1, amino acids 3673-3745 of SEQ ID NO:6). This module introduces a propionate extender unit (C21 or C22 and C4-C3) and reduces the β-keto group at C5 to a hydroxyl group. This reduction is somewhat unexpected, because epothilone carries a keto group at C5. However, such inconsistencies between the reducing activities predicted for a PKS module and the redox state of the corresponding position in the final polyketide product have been reported in the literature (see, for example, Schwecke et al., Proc. Natl. Acad. Sci. USA 92:7839-7843 (1995) and Schupp et al., FEMS Microbiol. Lett. 159:201-207 (1998)). A key structural feature of epothilone is the gem-dimethyl group (C21 and C22) at C4. The second module of EPOS D is predicted to incorporate a propionate unit into the growing polyketide chain, providing one methyl side chain at C4. This module also contains a methyltransferase domain inserted between the DH and KR domains of the PKS, the same arrangement as in the HMWP1 yersiniabactin synthetase (A.M. Gehring, E. DeMoll, J.D. Fetherston, I. Mori, G.F. Mayhew, F.R. Blattner, C.T. Walsh and R.D. Perry, Iron acquisition in plague: modular logic in enzymatic biogenesis of yersiniabactin by Yersinia pestis, Chem. Biol. 5:573-586 (1998)). It is proposed that this MT domain of EPOS D introduces the second methyl side chain (C21 or C22) at C4.
epoE (nucleotides 54935-62254 of SEQ ID NO:1) encodes EPOS E (SEQ ID NO:7), a type I polyketide synthase consisting of one module, which comprises a KS (nucleotides 55028-56284 of SEQ ID NO:1, amino acids 32-450 of SEQ ID NO:7), a malonyl-CoA-specific AT (nucleotides 56600-57565 of SEQ ID NO:1, amino acids 556-877 of SEQ ID NO:7), a DH (nucleotides 57593-58087 of SEQ ID NO:1, amino acids 887-1051 of SEQ ID NO:7), a probably non-functional ER (nucleotides 59366-60304 of SEQ ID NO:1, amino acids 1478-1790 of SEQ ID NO:7), a KR (nucleotides 60362-61099 of SEQ ID NO:1, amino acids 1810-2055 of SEQ ID NO:7), an ACP (nucleotides 61211-61426 of SEQ ID NO:1, amino acids 2093-2164 of SEQ ID NO:7) and a thioesterase (TE) (nucleotides 61427-62254 of SEQ ID NO:1, amino acids 2165-2439 of SEQ ID NO:7). The active-site motif of the ER domain in this module carries several highly unusual amino acid substitutions that may inactivate the domain. This module introduces an acetate extender unit (C2-C1) and reduces the β-keto group at C3 to an enoyl group. Because epothilone carries a hydroxyl group at C3, this reduction is likewise unexpected, as discussed for the second module of EPOS D. The TE domain of EPOS E releases and cyclizes the completed polyketide chain by lactonization between the C1 carboxyl group and the C15 hydroxyl group.
Five ORFs were detected in the sequenced region upstream of epoA. The partially sequenced orf1 has no homologous sequences in the sequence databases. The protein deduced from orf2 (nucleotides 3171-1900 on the reverse complementary strand of SEQ ID NO:1), Orf2 (SEQ ID NO:10), shows strong similarity to hypothetical ORFs from Mycobacterium and Streptomyces coelicolor, and more distant similarity to carboxypeptidases and DD-peptidases from various bacteria. The protein deduced from orf3 (nucleotides 3415-5556 of SEQ ID NO:1), Orf3 (SEQ ID NO:11), shows homology to Na+/H+ antiporters from various bacteria, and Orf3 may be involved in exporting epothilone from the producing strain. Orf4 and orf5 have no homologous sequences in the sequence databases.
Downstream of epoE, the sequenced region contains epoF and ten further ORFs (orf6-orf15). epoF (nucleotides 62369-63628 of SEQ ID NO:1) encodes EPOS F (SEQ ID NO:8), a deduced protein with strong sequence similarity to cytochrome P450 oxygenases. EPOS F may be involved in modifying the redox state of carbons C12, C5 and/or C3. The protein deduced from orf14 (nucleotides 67334-68251 of SEQ ID NO:1), Orf14 (SEQ ID NO:22), shows strong similarity to GI:3293544 (a hypothetical protein of unknown function from Streptomyces coelicolor) and GI:2654559 (a human embryonic lung (HEL) protein). It is also more distantly related to GI:2623026, a cation efflux protein from Methanobacterium thermoautotrophicum, and may therefore also be involved in exporting epothilone from the producing cells. The remaining ORFs (orf6-orf13 and orf15) show no homology to entries in the sequence databases.
Embodiment 13: Recombinant expression of the epothilone biosynthesis genes
To achieve higher yields than are obtained by fermentation of Sorangium cellulosum, the epothilone synthase genes of the invention are expressed in a heterologous organism. A preferred host for heterologous expression is Streptomyces, for example Streptomyces coelicolor, which naturally produces the polyketide actinorhodin. Techniques for expressing recombinant PKS genes in this host are described in McDaniel et al., Science 262:1546-1550 (1993) and Kao et al., Science 265:509-512 (1994). See also Holmes et al., EMBO J. 12(8):3183-3191 (1993), Bibb et al., Gene 38:215-226 (1985), and U.S. Patent Nos. 5,521,077, 5,672,491 and 5,712,146, which are incorporated herein by reference.
Using known methods, the heterologous host strain is engineered to carry a chromosomal deletion of the actinorhodin (act) gene cluster. Expression plasmids containing the epothilone synthase genes of the invention are constructed by transferring DNA from a temperature-sensitive donor plasmid to a recipient shuttle vector in E. coli (McDaniel et al. (1993); Kao et al. (1994)), such that the synthase genes are transferred into the vector by homologous recombination. Alternatively, the epothilone synthase gene cluster is introduced into the vector by ligation of restriction fragments. After selection, for example as described by Kao et al. (1994), DNA from the vector is introduced into the act- Streptomyces coelicolor strain according to the protocols of Hopwood et al., Genetic Manipulation of Streptomyces: A Laboratory Manual (John Innes Foundation, Norwich, United Kingdom, 1985), incorporated herein by reference. The recombinant Streptomyces strain is cultured on R2YE medium (Hopwood et al. (1985)) to produce epothilone. Alternatively, the epothilone synthase genes of the invention are expressed in other host organisms such as Pseudomonas, Bacillus, yeast, insect cells and/or E. coli. PKS and NRPS genes are preferably expressed in E. coli from the pT7-7 vector, which uses the T7 promoter; see Tabor et al., Proc. Natl. Acad. Sci. USA 82:1074-1078 (1985). In another embodiment, the expression vectors pKK223-3 and pKK223-2 are used to express PKS and NRPS genes in E. coli from the tac or trc promoter, as transcriptional or translational fusions. When PKS and NRPS genes are expressed in a heterologous host that does not naturally contain the phosphopantetheinyl (P-pant) transferase required for post-translational modification of the PKS enzymes, a P-pant transferase must be coexpressed in the host, as described by Kealey et al., Proc. Natl. Acad. Sci. USA 95:505-509 (1998).
Embodiment 14: Isolation of epothilones from producing strains
Examples of cultivation, fermentation and extraction procedures for isolating polyketides, which are useful for obtaining epothilones from natural hosts and from recombinant hosts of the invention, are given in WO 93/10121, in Example 57 of U.S. Patent No. 5,639,949, in Gerth et al., J. Antibiot. 49:560-563 (1996), in Swiss Patent Application No. 396/98 (filed February 19, 1998) and in U.S. Patent Application No. 09/248,910 (which also discloses preferred Sorangium cellulosum mutant strains), all incorporated herein by reference. The following procedures can be used to isolate epothilones from artificially cultured Sorangium cellulosum strains such as So ce90, and also from recombinant hosts.
A: Cultivation of epothilone-producing strains
Strain: Sorangium cellulosum So ce90 or a recombinant host strain of the invention
Strain storage: liquid nitrogen
Media: precultures and intermediate cultures: G52
Main culture: 1B12
G52 medium:
Yeast extract, low salt (BioSpringer, Maison Alfort, France) 2 g/l
MgSO4·7H2O 1 g/l
CaCl2·2H2O 1 g/l
Defatted soy flour Soyamine 50T (Lucas Meyer, Hamburg, Germany) 2 g/l
Potato starch Noredux A-150 (Blattmann, Waedenswil, Switzerland) 8 g/l
Glucose, anhydrous 2 g/l
EDTA-Fe(III)-Na salt (8 g/l stock) 1 ml/l
Adjust to pH 7.4 with KOH
Sterilization: 20 min, 120 ℃
1B12 medium:
Potato starch Noredux A-150 (Blattmann, Waedenswil, Switzerland) 20 g/l
Defatted soy flour Soyamine 50T (Lucas Meyer, Hamburg, Germany) 11 g/l
EDTA-Fe(III)-Na salt 8 mg/l
Adjust to pH 7.8 with KOH
Sterilization: 20 min, 120 ℃
Addition of cyclodextrins and cyclodextrin derivatives:
Cyclodextrins at various concentrations (Fluka, Buchs, Switzerland, or Wacker Chemie, Munich, Germany) are sterilized separately and added to the 1B12 medium before inoculation.
Cultivation: 1 ml of Sorangium cellulosum So ce90 suspension from a liquid-nitrogen ampoule is transferred to 10 ml of G52 medium (in a 50 ml Erlenmeyer flask) and incubated on a shaker at 180 rpm and 30 ℃ with a 25 mm amplitude for 3 days. 5 ml of this culture is added to 45 ml of G52 medium (in a 200 ml Erlenmeyer flask) and incubated on a shaker at 180 rpm and 30 ℃ with a 25 mm amplitude for 3 days. 50 ml of this culture is then added to 450 ml of G52 medium (in a 2 l Erlenmeyer flask) and incubated on a shaker at 180 rpm and 30 ℃ with a 50 mm amplitude for 3 days.
Maintenance culture: Every 3-4 days the culture is subcultured by adding 50 ml of culture to 450 ml of G52 medium (in a 2 l Erlenmeyer flask). All experiments and fermentations are started from this maintenance culture.
Shake-flask experiments:
(i) Preculture in shake flasks
Starting from a 500 ml maintenance culture, 1 × 450 ml of G52 medium is inoculated with 50 ml of maintenance culture and incubated on a shaker at 180 rpm and 30 ℃ with a 50 mm amplitude for 4 days.
(ii) Main culture in shake flasks
40 ml of 1B12 medium supplemented with 5 g/l 4-morpholinepropanesulfonic acid (MOPS) powder (in a 200 ml Erlenmeyer flask) is mixed with 5 ml of 10× concentrated cyclodextrin solution, inoculated with 10 ml of preculture and incubated on a shaker at 180 rpm and 30 ℃ with a 50 mm amplitude for 5 days.
Fermentation: Fermentations are run at 10 l, 100 l and 500 l scale, with 20 l and 100 l fermentations serving as intermediate culture steps. Precultures and intermediate cultures are inoculated at 10% (v/v), and main cultures with 20% (v/v) intermediate culture. Important: unlike shake-flask cultures, the composition of fermentation media is calculated on the final culture volume (including the inoculum). For example, if 18 l of medium is to be combined with 2 l of inoculum, the components for 20 l are weighed out but made up to only 18 l.
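The final-volume rule is easy to get wrong at the bench, so here is a small Python sketch of the calculation for the 20 l G52 intermediate culture: component amounts are scaled to the final 20 l, while the medium is made up only to the final volume minus the 10% (v/v) inoculum. The function and dictionary names are hypothetical.

```python
# G52 amounts per litre of FINAL culture volume, from the G52 recipe above.
G52_PER_LITRE = {
    "yeast extract, low salt (g)": 2.0,
    "MgSO4.7H2O (g)": 1.0,
    "CaCl2.2H2O (g)": 1.0,
    "defatted soy flour Soyamine 50T (g)": 2.0,
    "potato starch Noredux A-150 (g)": 8.0,
    "glucose, anhydrous (g)": 2.0,
    "EDTA-Fe(III)-Na stock, 8 g/l (ml)": 1.0,
}

def fermenter_batch(final_volume_l, inoculum_fraction, recipe):
    """Scale components to the final volume; make up medium to final minus inoculum."""
    inoculum_l = final_volume_l * inoculum_fraction
    medium_l = final_volume_l - inoculum_l
    amounts = {item: per_l * final_volume_l for item, per_l in recipe.items()}
    return medium_l, inoculum_l, amounts

medium_l, inoculum_l, amounts = fermenter_batch(20, 0.10, G52_PER_LITRE)
print(f"make up to {medium_l:g} l, inoculate with {inoculum_l:g} l")
for item, amount in amounts.items():
    print(f"  {item}: {amount:g}")
```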
Preculture in shake flasks:
Starting from a 500 ml maintenance culture, 4 × 450 ml of G52 medium (each in a 2 l Erlenmeyer flask) are each inoculated with 50 ml of maintenance culture and incubated on a shaker at 180 rpm and 30 ℃ with a 50 mm amplitude for 4 days.
Intermediate culture, 20 l or 100 l:
20 l: 18 l of G52 medium in a fermenter with a total volume of 30 l is inoculated with 2 l of preculture. Cultivation lasts 3-4 days under the following conditions: 30 ℃, 250 rpm, aeration at 0.5 l of air per liter of liquid per minute, 0.5 bar overpressure, no pH control.
100 l: 90 l of G52 medium in a fermenter with a total volume of 150 l is inoculated with 10 l of the 20 l intermediate culture. Cultivation lasts 3-4 days under the following conditions: 30 ℃, 150 rpm, aeration at 0.5 l of air per liter of liquid per minute, 0.5 bar overpressure, no pH control.
Main culture, 10 liters, 100 liters or 500 liters:
10 liters: The medium components for 10 liters of medium 1B12 are sterilized in 7 liters of water; 1 liter of sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution is then added, and the medium is inoculated with 2 liters of the 20-liter intermediate culture. The main culture lasts 6-7 days under the following conditions: 30 °C, 250 rpm, aeration 0.5 liter of air per liter of liquid per minute, 0.5 bar overpressure, pH controlled to 7.6 ± 0.5 with H₂SO₄/KOH (i.e., no correction between pH 7.1 and 8.1).
100 liters: The medium components for 100 liters of medium 1B12 are sterilized in 70 liters of water; 10 liters of sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution are then added, and the medium is inoculated with 20 liters of the 20-liter intermediate culture. The main culture lasts 6-7 days under the following conditions: 30 °C, 200 rpm, aeration 0.5 liter of air per liter of liquid per minute, 0.5 bar overpressure, pH controlled to 7.6 ± 0.5 with H₂SO₄/KOH. The inoculation chain of the 100-liter fermentation is shown schematically in Fig. 1.
500 liters: The medium components for 500 liters of medium 1B12 are sterilized in 350 liters of water; 50 liters of sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution are then added, and the medium is inoculated with 100 liters of the 100-liter intermediate culture. The main culture lasts 6-7 days under the following conditions: 30 °C, 120 rpm, aeration 0.5 liter of air per liter of liquid per minute, 0.5 bar overpressure, pH controlled to 7.6 ± 0.5 with H₂SO₄/KOH.
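The volumes in the seed and main-culture steps above can be cross-checked. This sketch (names are illustrative) sums the stated volumes and confirms the 10% (v/v) seed inocula, the 20% (v/v) main-culture inocula, and the 1% (w/v) 2-(hydroxypropyl)-β-cyclodextrin level obtained from the 10% stock. It assumes the dissolved medium components add negligible volume.

```python
# (medium or water, cyclodextrin stock, inoculum), all in the same volume unit,
# taken from the pre-culture, intermediate-culture and main-culture steps above.
STAGES = {
    "shake flask (ml)": (450, 0, 50),
    "20 l intermediate (l)": (18, 0, 2),
    "100 l intermediate (l)": (90, 0, 10),
    "10 l main (l)": (7, 1, 2),
    "100 l main (l)": (70, 10, 20),
    "500 l main (l)": (350, 50, 100),
}
CD_STOCK_PERCENT_WV = 10.0  # sterile 10% (w/v) 2-(hydroxypropyl)-beta-cyclodextrin


def check_stage(medium, cd_stock, inoculum):
    """Return (total volume, inoculum % v/v, cyclodextrin % w/v in the broth)."""
    total = medium + cd_stock + inoculum   # assumes dissolved solids add no volume
    inoculum_pct = 100.0 * inoculum / total
    cd_pct_wv = CD_STOCK_PERCENT_WV * cd_stock / total
    return total, inoculum_pct, cd_pct_wv


for name, volumes in STAGES.items():
    total, inoc, cd = check_stage(*volumes)
    print(f"{name:<24} total {total:>4}  inoculum {inoc:4.1f}%  HP-beta-CD {cd:.1f}% w/v")
```

A 1% (w/v) level corresponds to the 10 g/l of 2-(hydroxypropyl)-β-cyclodextrin stated for the fermentations in sections C to E.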
Product analysis:
Sample preparation:
A 50 ml sample is mixed with 2 ml of the polystyrene resin Amberlite XAD16 (Rohm + Haas, Frankfurt, Germany) and shaken at 180 rpm and 30 °C for one hour. The resin is then collected on a 150 μm nylon mesh, rinsed with a small amount of water, and transferred together with the filter into a 15 ml Nunc tube.
Elution of the product from the resin:
10 ml of isopropanol (>99%) is added to the tube containing the filter and resin. The closed tube is then shaken for 30 minutes at room temperature on a Rota-Mixer (Labinco BV, Netherlands). Finally, 2 ml of the liquid is centrifuged and the supernatant is pipetted into an HPLC vial.
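Because the epothilones from 50 ml of broth end up in 10 ml of isopropanol, the eluate is about five times more concentrated than the broth if adsorption and desorption are complete. The helper below is a sketch that converts an HPLC concentration measured in the eluate back to a broth titer. The `recovery` parameter is a hypothetical correction factor not given in the source, and liquid retained on the resin and filter is ignored.

```python
def broth_titer_mg_per_l(eluate_mg_per_l: float,
                         sample_ml: float = 50.0,
                         eluent_ml: float = 10.0,
                         recovery: float = 1.0) -> float:
    """Convert an eluate concentration (mg/l) to the concentration in the broth."""
    if not 0.0 < recovery <= 1.0:
        raise ValueError("recovery must be in (0, 1]")
    # mass in the eluate = c_eluate * V_eluent, spread back over the sample volume
    return eluate_mg_per_l * eluent_ml / (sample_ml * recovery)


print(broth_titer_mg_per_l(20.0))  # 4.0
```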
HPLC analysis:
Column: Waters Symmetry C18, 100 × 4 mm, 3.5 μm (WAT066220)
Guard column: 3.9 × 20 mm (WAT054225)
Solvent A: 0.02% phosphoric acid
Solvent B: acetonitrile (HPLC grade)
Gradient: 41% B, 0-7 min
100% B, 7.2-7.8 min
41% B, 8-12 min
Column oven temperature: 30 °C
Detection: 250 nm, UV-DAD
Injection volume: 10 μl
Retention times: Epo A 4.30 min, Epo B 5.38 min
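The gradient program and retention times above can be encoded for processing chromatograms. In this sketch the linear ramps between the listed hold segments (7.0-7.2 min and 7.8-8.0 min) and the ±0.1 min identification window are assumptions, not values from the source.

```python
from bisect import bisect_right
from typing import Optional

# (time in min, % solvent B). The source gives only the hold segments;
# linear ramps between 7.0-7.2 min and 7.8-8.0 min are an assumption.
GRADIENT = [(0.0, 41.0), (7.0, 41.0), (7.2, 100.0),
            (7.8, 100.0), (8.0, 41.0), (12.0, 41.0)]
RETENTION_MIN = {"epothilone A": 4.30, "epothilone B": 5.38}


def percent_b(t_min: float) -> float:
    """Return % acetonitrile (solvent B) at time t by linear interpolation."""
    times = [t for t, _ in GRADIENT]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time outside the 0-12 min program")
    i = bisect_right(times, t_min) - 1
    if i == len(GRADIENT) - 1:          # exactly at the end of the run
        return GRADIENT[-1][1]
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def assign_peak(rt_min: float, window_min: float = 0.1) -> Optional[str]:
    """Match a peak to epothilone A or B; the +/-0.1 min window is hypothetical."""
    for name, ref in RETENTION_MIN.items():
        if abs(rt_min - ref) <= window_min:
            return name
    return None


print(percent_b(5.0), percent_b(7.5), assign_peak(5.41))  # 41.0 100.0 epothilone B
```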
B: Effect of adding cyclodextrins and cyclodextrin derivatives on the epothilone concentration obtained
Cyclodextrins are cyclic oligosaccharides of α-1,4-linked α-D-glucopyranose units, with a relatively hydrophobic central cavity and a hydrophilic outer surface.
Specifically, the following are distinguished (the number in parentheses gives the number of glucose units per molecule): α-cyclodextrin (6), β-cyclodextrin (7), γ-cyclodextrin (8), δ-cyclodextrin (9), ε-cyclodextrin (10), ζ-cyclodextrin (11), η-cyclodextrin (12) and θ-cyclodextrin (13). Especially preferred is δ-cyclodextrin and, in particular, α-, β- or γ-cyclodextrin, or mixtures thereof.
Cyclodextrin derivatives are primarily derivatives of the cyclodextrins mentioned above, especially of α-, β- or γ-cyclodextrin, chiefly those in which one or more, up to all, of the hydroxy groups (3 per glucose residue) are etherified or esterified. Suitable ethers are primarily alkyl ethers, especially lower alkyl ethers such as methyl or ethyl ethers, and also propyl or butyl ethers; aryl hydroxyalkyl ethers, such as phenyl hydroxy-lower-alkyl ethers, especially phenyl hydroxyethyl ethers; hydroxyalkyl ethers, in particular hydroxy-lower-alkyl ethers, especially 2-hydroxyethyl ethers, hydroxypropyl ethers such as 2-hydroxypropyl ethers, or hydroxybutyl ethers such as 2-hydroxybutyl ethers; carboxyalkyl ethers, in particular carboxy-lower-alkyl ethers, especially carboxymethyl or carboxyethyl ethers; derivatized carboxyalkyl ethers, in particular derivatized carboxy-lower-alkyl ethers in which the derivatized carboxy group is esterified or amidated carboxy (mainly aminocarbonyl, mono- or di-lower-alkylaminocarbonyl, morpholino-, piperidino-, pyrrolidino- or piperazino-carbonyl, or alkoxycarbonyl), in particular lower-alkoxycarbonyl-lower-alkyl ethers, for example methoxycarbonylpropyl ether or ethoxycarbonylpropyl ether; sulfoalkyl ethers, in particular sulfo-lower-alkyl ethers, especially sulfobutyl ethers; cyclodextrins in which one or more OH groups are etherified with a group of the formula:
-O-[alk-O-]n-H
Wherein alk is an alkyl, and especially low alkyl group, and n is the integer of 2-12, especially 2-5, particularly 2 or 3; One of them or several OH group are by the cyclodextrin of the group etherificate of following formula:
[Structural formula not reproduced in this text (image C20041006379300601)]
where R' is hydrogen, hydroxy, -O-(alk-O)z-H, -O-(alk(R)-O-)p-H or -O-(alk(R)-O-)q-alk-CO-Y; alk is in each case alkyl, especially lower alkyl; m, n, p, q and z are integers from 1 to 12, preferably 1 to 5, in particular 1 to 3; Y is OR1 or NR2R3, where R1, R2 and R3 are each independently hydrogen or lower alkyl, or R2 and R3 together with the nitrogen to which they are attached form morpholino, piperidino, pyrrolidino or piperazino; or branched cyclodextrins in which other sugar molecules are bound by ether or acetal linkages, especially glucosyl-, diglucosyl- (G2-β-cyclodextrin), maltosyl- or dimaltosyl-cyclodextrin, or N-acetylglucosaminyl-, glucosaminyl-, N-acetylgalactosaminyl- or galactosaminyl-cyclodextrin.
Esters are primarily alkanoyl esters, in particular lower alkanoyl esters, such as the acetyl esters of cyclodextrins.
Two or more different ether and ester groups of the kinds described may also be present in one cyclodextrin at the same time.
Mixtures of two or more of the cyclodextrins and/or cyclodextrin derivatives described may also be present.
Particularly preferred are α-, β- or γ-cyclodextrin or their lower alkyl ethers, such as methyl-β-cyclodextrin or in particular 2,6-di-O-methyl-β-cyclodextrin, or especially their hydroxy-lower-alkyl ethers, such as 2-hydroxypropyl-α-, 2-hydroxypropyl-β- or 2-hydroxypropyl-γ-cyclodextrin.
The cyclodextrin or cyclodextrin derivative is added to the medium at a concentration of preferably 0.02-10%, more preferably 0.05-5%, especially 0.1-4%, for example 0.1-2% by weight (w/v).
Cyclodextrins and cyclodextrin derivatives are known or can be prepared by known methods (see, for example, US 3,459,731; US 4,383,992; US 4,535,152; US 4,659,696; EP 0094157; EP 0149197; EP 0197571; EP 0300526; EP 0320032; EP 0499322; EP 0503710; EP 0818469; WO 90/12035; WO 91/11200; WO 93/19061; WO 95/08993; WO 96/14090; GB 2,189,245; DE 3,118,218; DE 3,317,064 and the references cited therein on the synthesis of cyclodextrins and cyclodextrin derivatives; and also T. Loftsson and M. E. Brewster (1996), Pharmaceutical Applications of Cyclodextrins: Drug Solubilization and Stabilization, Journal of Pharmaceutical Sciences 85(10): 1017-1025; R. A. Rajewski and V. J. Stella (1996), Pharmaceutical Applications of Cyclodextrins: In Vivo Drug Delivery, Journal of Pharmaceutical Sciences 85(11): 1142-1169).
All cyclodextrin derivatives tested here are commercially available from Fluka, Buchs, CH. The tests were carried out in 200 ml shake flasks with a culture volume of 50 ml. As controls, shake flasks containing the adsorber resin Amberlite XAD-16 (Rohm and Haas, Frankfurt, Germany) and shake flasks without any adsorbent were used. After 5 days of cultivation, the following epothilone concentrations were determined by HPLC:
Table 2:
Additive | Order no. | Concentration [% w/v]¹ | Epo A [mg/l] | Epo B [mg/l]
Amberlite XAD-16 | - | 2.0 (% v/v) | 9.2 | 3.8
2-Hydroxypropyl-β-cyclodextrin | 56332 | 0.1 | 2.7 | 1.7
2-Hydroxypropyl-β-cyclodextrin | 56332 | 0.5 | 4.7 | 3.3
2-Hydroxypropyl-β-cyclodextrin | 56332 | 1.0 | 4.7 | 3.4
2-Hydroxypropyl-β-cyclodextrin | 56332 | 2.0 | 4.7 | 4.1
2-Hydroxypropyl-β-cyclodextrin | 56332 | 5.0 | 1.7 | 0.5
2-Hydroxypropyl-α-cyclodextrin | 56330 | 0.5 | 1.2 | 1.2
2-Hydroxypropyl-α-cyclodextrin | 56330 | 1.0 | 1.2 | 1.2
2-Hydroxypropyl-α-cyclodextrin | 56330 | 5.0 | 2.5 | 2.3
β-Cyclodextrin | 28707 | 0.1 | 1.6 | 1.3
β-Cyclodextrin | 28707 | 0.5 | 3.6 | 2.5
β-Cyclodextrin | 28707 | 1.0 | 4.8 | 3.7
β-Cyclodextrin | 28707 | 2.0 | 4.8 | 2.9
β-Cyclodextrin | 28707 | 5.0 | 1.1 | 0.4
Methyl-β-cyclodextrin | 66292 | 0.5 | 0.8 | <0.3
Methyl-β-cyclodextrin | 66292 | 1.0 | <0.3 | <0.3
Methyl-β-cyclodextrin | 66292 | 2.0 | <0.3 | <0.3
2,6-Di-O-methyl-β-cyclodextrin | 39915 | 1.0 | <0.3 | <0.3
2-Hydroxypropyl-γ-cyclodextrin | 56334 | 0.1 | 0.3 | <0.3
2-Hydroxypropyl-γ-cyclodextrin | 56334 | 0.5 | 0.9 | 0.8
2-Hydroxypropyl-γ-cyclodextrin | 56334 | 1.0 | 1.1 | 0.7
2-Hydroxypropyl-γ-cyclodextrin | 56334 | 2.0 | 2.6 | 0.7
2-Hydroxypropyl-γ-cyclodextrin | 56334 | 5.0 | 5.0 | 1.1
No additive | - | - | 0.5 | 0.5
¹ Except for Amberlite (% v/v), all percentages are weight per volume (% w/v).
At the concentrations used, a few of the cyclodextrins tested (2,6-di-O-methyl-β-cyclodextrin, methyl-β-cyclodextrin) had no effect or a negative effect on epothilone production. In this example, 1-2% 2-hydroxypropyl-β-cyclodextrin and β-cyclodextrin increased epothilone production 6- to 8-fold compared with production without cyclodextrin.
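The fold increases can be recomputed from Table 2. The sketch below takes the 1% and 2% rows for 2-hydroxypropyl-β-cyclodextrin and β-cyclodextrin and divides by the no-additive control. The result depends on whether epothilone A, epothilone B or their sum is compared: roughly 5.8- to 9.6-fold for the individual compounds and 7.7- to 8.8-fold for the sum. The dictionary layout is illustrative.

```python
# Titers in mg/l after 5 days, copied from Table 2: (additive, % w/v) -> (Epo A, Epo B)
TABLE2 = {
    ("no additive", 0.0): (0.5, 0.5),
    ("2-hydroxypropyl-beta-CD", 1.0): (4.7, 3.4),
    ("2-hydroxypropyl-beta-CD", 2.0): (4.7, 4.1),
    ("beta-CD", 1.0): (4.8, 3.7),
    ("beta-CD", 2.0): (4.8, 2.9),
}


def fold_change(key):
    """Return (fold A, fold B, fold A+B) relative to the no-additive control."""
    ctrl_a, ctrl_b = TABLE2[("no additive", 0.0)]
    a, b = TABLE2[key]
    return a / ctrl_a, b / ctrl_b, (a + b) / (ctrl_a + ctrl_b)


for key in TABLE2:
    if key[0] != "no additive":
        fa, fb, ftot = fold_change(key)
        print(f"{key[0]} {key[1]}%: A x{fa:.1f}, B x{fb:.1f}, A+B x{ftot:.1f}")
```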
C: 10-liter fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin
The fermentation was carried out in a 15-liter glass fermenter. The medium contained 10 g/l of 2-(hydroxypropyl)-β-cyclodextrin (from Wacker Chemie, Munich, Germany). The course of the fermentation is shown in Table 3. The fermentation was stopped after 6 days and extraction was started.
Table 3: Course of the 10-liter fermentation
Culture time [days] | Epothilone A [mg/l] | Epothilone B [mg/l]
0 | 0 | 0
1 | 0 | 0
2 | 0.5 | 0.3
3 | 1.8 | 2.5
4 | 3.0 | 5.1
5 | 3.7 | 5.9
6 | 3.6 | 5.7
D: 100-liter fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin
The fermentation was carried out in a 150-liter fermenter. The medium contained 10 g/l of 2-(hydroxypropyl)-β-cyclodextrin. The course of the fermentation is shown in Table 4. The broth was harvested after 7 days and extracted.
Table 4: Course of the 100-liter fermentation
Culture time [days] | Epothilone A [mg/l] | Epothilone B [mg/l]
0 | 0 | 0
1 | 0 | 0
2 | 0.3 | 0
3 | 0.9 | 1.1
4 | 1.5 | 2.3
5 | 1.6 | 3.3
6 | 1.8 | 3.7
7 | 1.8 | 3.5
E: 500-liter fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin
The fermentation was carried out in a 750-liter fermenter. The medium contained 10 g/l of 2-(hydroxypropyl)-β-cyclodextrin. The course of the fermentation is shown in Table 5. The broth was harvested after 7 days and extracted.
Table 5: Course of the 500-liter fermentation
Culture time [days] | Epothilone A [mg/l] | Epothilone B [mg/l]
0 | 0 | 0
1 | 0 | 0
2 | 0 | 0
3 | 0.6 | 0.6
4 | 1.7 | 2.2
5 | 3.1 | 4.5
6 | 3.1 | 5.1
F: Comparative example: 10-liter fermentation without adsorbent
The fermentation was carried out in a 15-liter glass fermenter. The medium contained no cyclodextrin or other adsorbent. The course of the fermentation is shown in Table 6. The broth was not harvested for extraction.
Table 6: Course of the 10-liter fermentation without adsorbent
Culture time [days] | Epothilone A [mg/l] | Epothilone B [mg/l]
0 | 0 | 0
1 | 0 | 0
2 | 0 | 0
3 | 0 | 0
4 | 0.7 | 0.7
5 | 0.7 | 1.0
6 | 0.8 | 1.3
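The time courses in Tables 3 to 6 can be compared by their peak combined titer. The sketch below finds the day with the highest epothilone A + B concentration in each run. It then divides the 10-liter run with cyclodextrin (Table 3) by the 10-liter run without adsorbent (Table 6), both in the same 15-liter vessel type. This ratio is recomputed from the tables and is not a figure stated in the source. Run labels are illustrative.

```python
# Titers (Epo A, Epo B) in mg/l, indexed by culture day, from Tables 3-6.
RUNS = {
    "10 l + 1% HP-beta-CD (Table 3)": [(0, 0), (0, 0), (0.5, 0.3), (1.8, 2.5),
                                       (3.0, 5.1), (3.7, 5.9), (3.6, 5.7)],
    "100 l + 1% HP-beta-CD (Table 4)": [(0, 0), (0, 0), (0.3, 0), (0.9, 1.1),
                                        (1.5, 2.3), (1.6, 3.3), (1.8, 3.7), (1.8, 3.5)],
    "500 l + 1% HP-beta-CD (Table 5)": [(0, 0), (0, 0), (0, 0), (0.6, 0.6),
                                        (1.7, 2.2), (3.1, 4.5), (3.1, 5.1)],
    "10 l, no adsorbent (Table 6)": [(0, 0), (0, 0), (0, 0), (0, 0),
                                     (0.7, 0.7), (0.7, 1.0), (0.8, 1.3)],
}


def peak_total(series):
    """Return (day, mg/l) of the highest combined epothilone A + B titer."""
    totals = [a + b for a, b in series]
    best = max(totals)
    return totals.index(best), best


for name, series in RUNS.items():
    day, titer = peak_total(series)
    print(f"{name}: max A+B {titer:.1f} mg/l on day {day}")

with_cd = peak_total(RUNS["10 l + 1% HP-beta-CD (Table 3)"])[1]
without = peak_total(RUNS["10 l, no adsorbent (Table 6)"])[1]
print(f"10 l with vs. without cyclodextrin: x{with_cd / without:.1f}")
```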
G: Stepwise isolation of the epothilones from the 500-liter main culture
The volume harvested from the 500-liter main culture of Example 2D, 450 liters, is separated with a Westfalia clarifying separator, type SA-20-06 (rpm = 6500), into a liquid phase (centrifugate + rinse water = 650 liters) and a solid phase (cells, approx. 15 kg). Most of the epothilone is found in the centrifugate; the centrifuged cell paste contains <15% of the epothilone measured and is not processed further. The 650 liters of centrifugate are then placed in a 4000-liter stirred vessel, mixed with 10 liters of Amberlite XAD-16 (centrifugate : resin = 65 : 1) and stirred. After a contact time of about 2 hours, the resin is separated off in a Heine overflow centrifuge (drum capacity 40 liters; rpm = 2800). The resin is discharged from the centrifuge and washed with 10-15 liters of deionized water. Desorption is carried out by stirring the resin, in portions, twice with 30 liters of isopropanol for 30 minutes each time in a 30-liter glass-lined stirred vessel. The isopropanol phase is separated from the resin with a suction filter. The isopropanol is then removed from the combined isopropanol phases in a vacuum-operated circulation evaporator (Schmid-Verdampfer) after addition of 15-20 liters of water, and the resulting aqueous phase of about 10 liters is extracted 3 times with 10 liters of ethyl acetate each time. The extraction is carried out in a 30-liter glass-lined stirred vessel. The ethyl acetate extract is concentrated to 3-5 liters in the vacuum-operated circulation evaporator (Schmid-Verdampfer) and then to dryness under vacuum in a rotary evaporator (Büchi type), giving 50.2 g of ethyl acetate extract.
The ethyl acetate extract is dissolved in 500 ml of methanol, the insoluble part is removed with a pleated filter, and the solution is applied to a 10 kg Sephadex LH 20 column (Pharmacia, Uppsala, Sweden; column diameter 20 cm, packing height approx. 1.2 m). Elution is carried out with methanol as the eluent. Epothilones A and B are found mainly in fractions 21-23 (1 liter per fraction). These fractions are concentrated to dryness in vacuo in a rotary evaporator (total weight 9.0 g). The Sephadex peak fractions (9.0 g) are then dissolved in 92 ml of acetonitrile : water : dichloromethane = 50 : 40 : 2, and the solution is filtered through a pleated filter and applied to an RP column (Prepbar 200 system, Merck; 2.0 kg LiChrospher RP-18, Merck, particle size 12 μm, column diameter 10 cm, packing height 42 cm; Merck, Darmstadt, Germany). Elution is carried out with acetonitrile : water = 3 : 7 (flow rate = 500 ml/min; retention time of epothilone A approx. 51-59 min; retention time of epothilone B approx. 60-69 min). The fractions are monitored with a UV detector at 250 nm and concentrated to dryness under vacuum on a Büchi Rotavapor rotary evaporator. The epothilone A peak fraction weighs 700 mg and has a content of 75.1% by HPLC (external standard). The epothilone B peak fraction weighs 1980 mg and has a content of 86.6% by HPLC (external standard).
Finally, the epothilone A fraction (700 mg) is crystallized from 5 ml of ethyl acetate : toluene = 2 : 3, giving 170 mg of pure epothilone A crystals [content by HPLC (area %) = 94.3%]. The epothilone B fraction (1980 mg) is crystallized from 18 ml of methanol, giving 1440 mg of pure epothilone B crystals [content by HPLC (area %) = 99.2%]. M.p. (epothilone B): e.g. 124-125 °C. 1H-NMR data of epothilone B: 500 MHz NMR, solvent DMSO-d6. Chemical shifts δ are given in ppm relative to TMS. s = singlet; d = doublet; dd = doublet of doublets; m = multiplet.
δ [ppm] (multiplicity) | Integral (number of H)
7.34 (s) | 1
6.50 (s) | 1
5.28 (d) | 1
5.08 (d) | 1
4.46 (d) | 1
4.08 (m) | 1
3.47 (m) | 1
3.11 (m) | 1
2.83 (dd) | 1
2.64 (s) | 3
2.36 (m) | 2
2.09 (s) | 3
2.04 (m) | 1
1.83 (m) | 1
1.61 (m) | 1
1.47-1.24 (m) | 4
1.18 (s) | 6
1.13 (m) | 2
1.06 (d) | 3
0.89 (d + s, overlapping) | 6
Σ = 41
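Two quick consistency checks on the isolation data of section G can be scripted. The sketch recomputes the crystallization recovery from the stated weights and HPLC contents. It treats the external-standard content of the peak fractions and the area-% content of the crystals as mass fractions, which is only an approximation. It also checks that the 1H integrals sum to the 41 hydrogens of epothilone B, molecular formula C27H41NO6S (a literature value not stated in this section). The function names are illustrative.

```python
import re

# (delta in ppm, multiplicity, integral) from the 1H-NMR table of epothilone B
NMR_B = [
    ("7.34", "s", 1), ("6.50", "s", 1), ("5.28", "d", 1), ("5.08", "d", 1),
    ("4.46", "d", 1), ("4.08", "m", 1), ("3.47", "m", 1), ("3.11", "m", 1),
    ("2.83", "dd", 1), ("2.64", "s", 3), ("2.36", "m", 2), ("2.09", "s", 3),
    ("2.04", "m", 1), ("1.83", "m", 1), ("1.61", "m", 1), ("1.47-1.24", "m", 4),
    ("1.18", "s", 6), ("1.13", "m", 2), ("1.06", "d", 3), ("0.89", "d+s", 6),
]


def hydrogen_count(formula: str) -> int:
    """Count H atoms in a simple formula such as 'C27H41NO6S' (no brackets)."""
    return sum(int(n or 1) for elem, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
               if elem == "H")


def crystallization_recovery(fraction_mg, fraction_purity, crystals_mg, crystal_purity):
    """Share of the epothilone in the peak fraction that is recovered as crystals."""
    return (crystals_mg * crystal_purity) / (fraction_mg * fraction_purity)


print(sum(n for _, _, n in NMR_B), hydrogen_count("C27H41NO6S"))   # 41 41
print(f"{crystallization_recovery(700, 0.751, 170, 0.943):.1%}")    # 30.5%
print(f"{crystallization_recovery(1980, 0.866, 1440, 0.992):.1%}")  # 83.3%
```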
Example 15: Medical use of recombinantly produced epothilones
Pharmaceutical preparations or compositions comprising epothilones can be used, for example, for the treatment of cancer (such as various human solid tumors). These anticancer preparations comprise, for example, an effective amount of an epothilone together with one or more organic or inorganic, liquid or solid, pharmaceutically acceptable carriers. The preparations are administered, for example, enterally, nasally, rectally, orally or parenterally, especially intramuscularly or intravenously. The dose of the active ingredient depends on the patient's body weight, age and constitution, on pharmacokinetic parameters, and on the mode of administration. Because epothilones mimic the biological effect of paclitaxel (Taxol), epothilones may replace paclitaxel in compositions and methods for treating cancer with paclitaxel. See, for example, U.S. Patent Nos. 5,496,804, 5,565,478 and 5,641,803, which are incorporated herein by reference.
For example, for therapeutic use, epothilone B is supplied in individually packaged 2 ml glass vials as a clear, colorless 1 mg/ml concentrate for intravenous infusion. The material is formulated in liquid polyethylene glycol (PEG 300) and diluted with 50 or 100 ml of 0.9% Sodium Chloride Injection USP to the final concentration required for infusion. Treatment consists of six cycles of a 30-minute intravenous infusion given once every 21 days (three-weekly treatment), or of a 30-minute intravenous infusion given once every 7 days (weekly treatment).
Preferably, for weekly treatment the dose is between about 0.1 and about 6 mg/m², preferably between about 0.1 and about 5 mg/m², more preferably between about 0.1 and about 3 mg/m², even more preferably between 0.1 and 1.7 mg/m², and most preferably between about 0.3 and about 1 mg/m². For three-weekly treatment (once every three weeks, i.e. in every third week), the dose is between about 0.3 and 18 mg/m², preferably between about 0.3 and about 15 mg/m², more preferably between about 0.3 and 12 mg/m², even more preferably between about 0.3 and about 7.5 mg/m², still more preferably between about 0.3 and about 5 mg/m², and most preferably between about 1.0 and 3.0 mg/m². This dose is preferably administered to humans by intravenous (i.v.) infusion over 2-180 min, preferably 2-120 min, more preferably about 5 to about 30 min, and most preferably about 10 to about 30 min (e.g. about 30 min).
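Doses given per m² of body surface area become absolute amounts only once a body surface area (BSA) is known. This sketch is purely illustrative arithmetic. It assumes the Mosteller formula for BSA, which the source does not specify, and a hypothetical patient. It checks the dose against the broadest ranges stated above and converts it to milligrams and to milliliters of the 1 mg/ml concentrate.

```python
import math

CONCENTRATE_MG_PER_ML = 1.0   # 1 mg/ml concentrate (from the text)
# Broadest dose ranges stated above, in mg/m^2
RANGES = {"weekly": (0.1, 6.0), "every 3 weeks": (0.3, 18.0)}


def bsa_mosteller(height_cm: float, weight_kg: float) -> float:
    """Body surface area in m^2 (Mosteller formula; an assumption, not from the text)."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def dose_plan(dose_mg_per_m2: float, schedule: str, bsa_m2: float):
    """Return (dose in mg, volume of concentrate in ml) after a range check."""
    low, high = RANGES[schedule]
    if not low <= dose_mg_per_m2 <= high:
        raise ValueError(f"{dose_mg_per_m2} mg/m^2 is outside the stated {schedule} range")
    dose_mg = dose_mg_per_m2 * bsa_m2
    return dose_mg, dose_mg / CONCENTRATE_MG_PER_ML


bsa = bsa_mosteller(170, 70)              # hypothetical patient: 170 cm, 70 kg
print(round(bsa, 2), dose_plan(2.5, "every 3 weeks", bsa))
```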
Although the invention has been described with reference to particular embodiments, numerous variations, modifications and embodiments are evidently possible, and accordingly all such variations, modifications and embodiments are to be regarded as being within the spirit and scope of the invention.
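In the sequence listing, the <400> block gives the DNA in lines of up to six 10-nucleotide groups, each line followed by the position of its last nucleotide. The small parser sketched here assumes every line follows that layout (the function name is illustrative). It rebuilds the sequence and verifies the running positions, whose final value can be compared with the length declared under <211>.

```python
def parse_sequence_lines(lines):
    """Join PatentIn-style lines ('acgt... acgt... 60') into one sequence.

    Each line holds nucleotide groups followed by the position of its last
    residue; that running number is checked as the groups are joined.
    """
    parts, length = [], 0
    for line in lines:
        if not line.strip():
            continue                     # tolerate blank lines
        *groups, position = line.split()
        chunk = "".join(groups).lower()
        if set(chunk) - set("acgtn"):
            raise ValueError(f"unexpected characters before position {position}")
        parts.append(chunk)
        length += len(chunk)
        if length != int(position):
            raise ValueError(f"length {length} does not match position {position}")
    return "".join(parts)


seq = parse_sequence_lines([
    "aagcttcgct cgacgccctc ttcgcccgcg ccacctctgc ccgtgtgctc gatgatggcc 60",
    "acggccgggc cacggagcgg catgtgctcg ccgaggcgcg cgggatcgag gacctccgcg 120",
])
print(len(seq))  # 120
```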
Sequence listing
<110>Novartis AG
<120> Genes for the biosynthesis of epothilones
<130>4-30582A
<140>
<141>
<160>30
<170>PatentIn Ver.2.0
<210>1
<211>68750
<212>DNA
<213> Sorangium cellulosum
<400>1
aagcttcgct cgacgccctc ttcgcccgcg ccacctctgc ccgtgtgctc gatgatggcc 60
acggccgggc cacggagcgg catgtgctcg ccgaggcgcg cgggatcgag gacctccgcg 120
ccctccgaga gcacctccgc atccaggaag gggggccgtc ctttcactgc atgtgcctcg 180
gcgacctgac ggtggagctc ctcgcgcacg accagcccct cgcgtccatc agcttccacc 240
atgcccgcag cctgaggcac cccgactgga cctcggacgc gatgctcgtc gacggccccg 300
cgctcgtccg gtggctcgcc gcgcgcggcg cgccgggtcc cctccgcgag tacgaagagg 360
agcgcgagcg agcccgaacc gcgcaggagg cgaggcgcct gtggctcgcg gccgcgccgc 420
cctgcttcgc gcccgatctg ccccgcttcg aggacgacgc caacgggctg ccgctcggcc 480
cgatgtcgcc tgaagtcgcc gaggccgagc ggcgcctccg cgcctcgtac gcgactcctg 540
agctcgcctg tgccgcgctg ctcgcctggc tcgggacggg cgcgggtccc tggtccggat 600
atcccgccta cgagatgctg ccagagaatc tgctcctcgg gtttggcctc ccgaccgcga 660
tcgccgcggc ctccgcgccc ggcacatcgg aggccgctct ccgcggcgca gcgcggctgt 720
tcgcctcctg ggaggtcgta tcgagcaaga agagccagct cggcaacatc cccgaagccc 780
tgtgggagcg gctccggacg atcgtccgcg cgatgggcaa tgccgacaac ctctctcgct 840
tcgagcgcgc cgaggcgatc gcggcggagg tgcgccgcct gcgcgcacag ccggcgccct 900
tcgcggcggg cgccggcctg gcggtcgctg gggtctcctc gagcggccgg ctctcgggcc 960
tcgtgaccga cggagacgca ttgtactccg gcgacggcaa cgacatcgtc atgttccaac 1020
ccggccggat ctcgccggtc gtgctgctcg ccggaaccga tcccttcttc gagctcgcac 1080
cgcccctcag ccagatgctc ttcgtcgcgc acgccaacgc gggcaccatc tccaaggtcc 1140
tgacggaagg cagccccctc atcgtgatgg caagaaacca ggcgcgaccg atgagcctcg 1200
tccacgctcg cgggttcatg gcgtgggtca accaggccat ggtgcccgac cccgagcggg 1260
gcgcgccctt cgtcgtccag cgctcgacca tcatggaatt cgagcacccc acgcctcgtt 1320
gtctccacga gcccgccggc agcgctttct ccctcgcctg cgacgaggag cacctctact 1380
ggtgcgagct ttcggctggc cggctcgagc tatggcgcca cccgcaccac cgccccggcg 1440
ccccgagccg cttcgcgtac ctcggcgagc accccattgc ggcgacctgg tacccctcgc 1500
tcaccctcaa tgcgacccac gtgctgtggg ccgaccctga tcgcagggcc atcctcgggg 1560
tcgacaagcg caccggcgta gagcccatcg tcctcgcgga gacgcgccat cccccggcgc 1620
acgtcgtgtc cgaggaccgg gacatcttcg cgcttaccgg acagcccgac tcccgcgact 1680
ggcacgtcga gcacatccgc tccggcgcct ccaccgtcgt ggccgactac cagcgccagc 1740
tatgggaccg ccctgacatg gtgctcaatc ggcgcggcct cttcttcacg acgaacgacc 1800
gcatcctgac gctcgcccgc agctgacatc gctcgacgcc gggccgctca tcgagggcgc 1860
ccggaccgag ctggcgaccc gccgctggcg ggccgcagct catgccgatt cggtggcgac 1920
gtagacgctg cgccagaaac gctcgagagc ccccgagaac aggaagccgg cggattgtgt 1980
catcacgatc ccgatcagct cgcggcccgg atcattgatc caggacgtcc cgaacccgcc 2040
gtcccaccca tagcgcccgg gcacctccga gaccgcgtcc ggcgccgtga ccacggccat 2100
cccataaccc cagccgtgcg tctcgaagaa gcccgggaaa aacgaggacg ccgccttctg 2160
ggccggcgtg aggtgatcgg ccgtcatctc gcgcaccgag gcggcgctca agagccgccg 2220
gccctcgtgc acaccgccgt tcatgagcat gcgcgcgaac aggaggtagt cgtccaccgt 2280
cgacacgagc ccggcggcgc ccgaagggaa cgccggcggg ctggcatagg cgctctcggc 2340
cccgtcgcga tccatgcgcg tcttctcccc cgtctgctcg tcggtgaagt aaccgcagcc 2400
cgcgaaccga gcgagcttgt ccgccgggac gtgaaagtcg gtgtcccgca tcccgagcgg 2460
cgcgaggatg cgctcgcgca cgaacgcatc gaagccctgg tcggccgcgc gccccacgag 2520
caccccctgc accaggctcc ccgtgttgta catccactgc gcccccggct gatgcatgag 2580
cggcagcgtc ccgagccgcc ggatccactc gtctggcccg tgcggcgtca tcggcaccgg 2640
ctgcgcgttg acgagcccga gctcgtcgat ggcccgctgg atcggcgacg atgcgtcgaa 2700
cgagattccg aagcccatcg tgaacgtcat caggtcgcgc accgtgatcg gccgctccgc 2760
gggcaccgtc tcgtcgatcg gaccatcgat gcgcgccagc accttccggt tcgcgagctc 2820
cggcaaccat cggtcgacgg gggagtcgag gtcgagcttg ccttcctcga cgagcatcat 2880
caccgccgtc gcggtgaccg ccttcgtcat cgaggcgatc cggaagatcg tgtcccgccg 2940
catgggcgcg ctgccgccga gctcggtcac gcccaccgcg tccacgtgca cgtcgtcgcc 3000
gcgcgcgacc agccagaccg ctcccggcat ctgccccgcc gccacctccg ccgccatcac 3060
ctcgcgcgcg ggcgccagcg cgccggcccc cgcgtcctgc cctggctgcc cctcctcctc 3120
ggccccaccc aacgcgcacc ccggcgccgc cacgctgatc aaagctccca taaactcccg 3180
ccttctcatg accgtcgatg cctctccgag cgggggcgcc tgcccctgcc gagagcactg 3240
actgcccgcg cccgaaaaaa tcatcggtgc cccgtcacga tcgccgccgg gcgtggctcc 3300
gcccggccgc ccgctcgggc gcccgcccct ggacgagcaa agctcgcccg cccgcgctca 3360
gcacgccgct tgccatgtcc ggcctgcacc cacaccgagg agccacccac cctgatgcac 3420
ggcctcaccg agcggcaggt cctgctctcg ctcgtcaccc tcgcgctcat cctcgtgacc 3480
gcgcgcgcct ccggcgagct cgcgcggcgg ctgcgccagc ccgaggtgct cggggagctc 3540
ttcggcggcg tcgtgctggg cccctccgtc gtcggcgcgc tcgcgcccgg gttccatcga 3600
gccctcttcc aggagccggc ggtcggggtc gtgctctcgg gcatctcctg gataggcgcg 3660
ctcctcctgc tgctgatggc gggcatcgag gtcgacgtgg gcatcctgcg caaggaggcg 3720
cgccccgggg cgctctcggc gctcggcgcg atcgcgcccc cgctcgcggc gggcgccgcc 3780
ttctcggcgc tcgtgctcga tcggcccctt ccgagcggcc tcttcctcgg gatcgtgctc 3840
tcggtgacgg cggtcagcgt gatcgcgaag gtgctgatcg agcgcgagtc gatgcgccgc 3900
agctatgcgc aggtgacgct cgcggcgggg gtggtcagcg aggtcgctgc ctgggtgctc 3960
gtcgcgatga cgtcgtcgag ctacggcgcg tcgcccgcgc tggcggtcgc ccggagcgcg 4020
ctcctggcga gcggattctt gctgttcatg gtgctcgtcg ggcggcggct cacccacctc 4080
gcgatgcgct gggtggccga cgcgacgcgc gtctccaagg gacaggtgtc gctcgtcctc 4140
gtcctcacgt tcctggccgc ggcgctgacg cagcggctcg gcctgcaccc gctgctcggc 4200
gcgttcgcgc tcggcgtgct gctcaacagc gctcctcgca ccaaccgccc tctcctcgac 4260
ggcgtgcaga cgctcgtggc gggcctcttc gcgcctgtgt tcttcgtcct cgcgggcatg 4320
cgcgtcgacg tgtcgcagct gcgcacgccg gcggcgtggg ggacggtcgc gttgctgctg 4380
gcgaccgcga cggcggcgaa ggtcgtcccc gccgcgctcg gcgcgcggct cggcgggctc 4440
aggggcagcg aggcggcgct cgtggcggtg ggcctgaaca tgaagggcgg cacggacctc 4500
atcgtcgcga tcgtcggcgt cgagctcggg ctcctctcca acgaggctta tacgatgtac 4560
gccgtcgtcg cgctggtcac ggtgaccgcc tcacccgcgc tcctcatctg gctcgagaaa 4620
agggcgcctc cgacgcagga ggagtcggct cgcctcgagc gcgaggaggc cgcgaggcgc 4680
gcgtacatcc ccggggtcga gcggatcctc gtcccgatcg tggcgcacgc cctgcccggg 4740
ttcgccacgg acatcgtgga gagcatcgtc gcctccaagc gaaagctcgg cgagacggtc 4800
gacatcacgg agctctccgt ggagcagcag gcgcccggcc catcgcgcgc cgcgggggag 4860
gcgagccggg ggctcgcgag gctcggcgcg cgcctccgcg tcggcatctg gcggcaaagg 4920
cgcgagctgc gcggctcgat ccaggcgatc ctgcgcgcct cgcgggatca cgatctgctc 4980
gtgatcggcg cgcgatcgcc ggcgcgcgcg cgcggaatgt cgttcggtcg cctgcaggac 5040
gcgatcgtcc agcgggccga gtccaacgtg ctcgtcgtgg tgggcgaccc tccggcggcg 5100
gagcgcgcct ccgcgcggcg gatcctcgtc ccgatcatcg gcctcgagta ctccttcgcc 5160
gccgccgatc tcgcggccca cgtggcgctg gcgtgggacg ccgagctcgt gctgctcagc 5220
agcgcgcaga ccgatccggg cgcggtcgtc tggcgcgatc gcgagccatc ccgggtgcgc 5280
gcggtggcgc ggagcgtcgt cgacgaggcg gtcttccggg ggcgccggct cggcgtgcgc 5340
gtctcgtcgc gcgtgcacgt gggcgcgcac ccgagcgacg agataacgcg ggagctcgcg 5400
cgcgccccgt acgatctgct cgtgctcgga tgctacgacc atgggccgct cggccggctc 5460
tacctcggca gcacggtcga gtcggtggtg gtccggagcc gggtgccggt cgcgttgctc 5520
gtcgcgcatg gagggactcg agagcaggtg aggtgaggct tccaccgcgc tcgcccgtga 5580
ggaagcgagc gcccggctct gccgacgatc gtcactcccg gtccgtgtag gcgatcgtgc 5640
tgagcagcgc gttctccgcc tgacgcgagt cgagccgggt atgctgcacg acgatggggg 5700
cgtccgattc gatcacgctg gcatagtccg tatcgcgcgg gatcggctcg ggttcggtca 5760
gatcgttgaa ccggacgtgc cgggtgcgcc tcgctggaac ggtcacccgg taaggcccgg 5820
cggggtcgcg gtcgctgaag taaacggtga tggcgacctg cgcgtcccgg tccgacgcat 5880
tcaacaggca ggccgtctca tggctcgtca tctgcggctc aggtccgttg ctcccgcctg 5940
ggatgtagcc ctctgcgatt gcacagcgcg tccgcccgat cggcttgtcc atgtgtcctc 6000
cctcctggct cctctttggc agcctccctc tgctgtccag gagcgatggc ctcttcgctc 6060
gacgcgctcg gggatccatg gctgaggatc ctcgccgagc gctccctgcc gaccggcgcg 6120
ccgagcgccg acgggctttg aaagcgcgcg accggccagc ccggacgcgg gcccgagagg 6180
gacagtgggt ccgccgtgaa gcagagaggc gatcgaggtg gtgagatgaa acacgtcgac 6240
acgggccgac gattcggccg ccggataggg cacacgctcg gtcttctcgc gagcatggcg 6300
ctcgccggct gcggcggtcc gagcgagaaa accgtgcagg gcacgcggct cgcgcccggc 6360
gccgatgcgc gcgtcaccgc cgacgtcgac cccgacgccg cgaccacgcg gctggcggtg 6420
gacgtcgttc acctctcgcc gcccgagcgg ctcgaggccg gcagcgagcg gttcgtcgtc 6480
tggcagcgtc cgagccccga gtccccgtgg cgacgggtcg gagtgctcga ctacaatgct 6540
gacagccgaa gaggcaagct ggccgagacg accgtgccgt atgccaactt cgagctgctc 6600
atcaccgccg agaagcagag cagccctcag tcgccatcgt ctgccgccgt catcgggccg 6660
acgtctgtcg ggtgacatcg cgctatcagc agcgctgagc ccgccagcag gccccagggc 6720
cctgcctcga tggccttccc catcacccct gcgcactcct ccagcgacgg ccgcgcagcg 6780
acggccgcgt ccaagcaacc gccgtgccgg cgcggctcca cgcgcgcgac aggcgagcgt 6840
cctggcgcgg cctgcgcatc gctggaagga tcggcggagc atggatagag aatcgaggat 6900
cgcgatcttt gttgccatcg cagccaacgt ggcgatcgcg gcggtcaagt tcatcgccgc 6960
cgccgtgacc ggcagctcgg cgaggcgttt gccgacttcg gcggcgtccc gcgcgtgctg 7020
ctctacgaca acctcaagag cgccgtcgtc gagcgccacg gcgacgcgat ccggttccac 7080
cccacgctgc tggctctgtc ggcgcattac cgcttcgagc cgcgccccgt cgccgtcgcc 7140
cgcggcaacg agaagggccg cgtccagcgc gccatcacgg cgtggacgac atggcgcgga 7200
aacgtcgtcg taaccgccca gcaatgtcat gggaatggcc ccttgaaatg gccccttgag 7260
ggggctggcc ggggtcgacg atatcgcgcg atctccccgt caattcccga tggtaaaaga 7320
aaaatttgtc atagatcgta agctgtgata gtggtctgtc ttacgttgcg tcttccgcac 7380
ctcgagcgag ttctctcgga taactttcaa tttttccgag gggggcttgg tctctggttc 7440
ctcaggaagc ctgatcggga cgagctaatt cccatccatt tttttgaggc tctgctcaaa 7500
gggattagat cgagtgagac agttcttttg cagtgcgcga agaacctggg cctcgaccgg 7560
aggacgatcg acgtccgcga gcgggtcagc cgctgaggat gtgcccgtcg tggcggatcg 7620
tcccatcgag cgcgcagccg aagatccgat tgcgatcgtc ggagcgagtt gccgtctgcc 7680
cggtggcgtg atcgatctga gcgggttctg gacgctcctc gagggctcgc gcgacaccgt 7740
cgggcgagtc cccgccgaac gctgggatgc agcagcgtgg tttgatcccg accccgatgc 7800
cccggggaag acgcccgtta cgcgcgcatc tttcctgagc gacgtagcct gcttcgacgc 7860
ctccttcttc ggcatctcgc ctcgcgaagc gctgcggatg gaccctgcac atcgactctt 7920
gctggaggtg tgctgggagg cgctggagaa cgccgcgatc gctccatcgg cgctcgtcgg 7980
tacggaaacg ggagtgttca tcgggatcgg cccgtccgaa tatgaggccg cgctgccgca 8040
agcgacggcg tccgcagaga tcgacgctca tggcgggctg gggacgatgc ccagcgtcgg 8100
agcgggccga atctcgtatg ccctcgggct gcgagggccg tgtgtcgcgg tggatacggc 8160
ctattcgtcc tcgctggtgg ccgttcatct ggcctgtcag agcttgcgct ccggggaatg 8220
ctccacggcc ctggctggtg gggtatcgct gatgttgtcg ccgagcaccc tcgtgtggct 8280
ctcgaagacc cgggcgctgg ccagggacgg tcgctgcaag gcattttcgg cggaggccga 8340
tgggttcgga cgaggcgaag ggtgcgccgt cgtggtcctc aagcggctca gtggagcccg 8400
cgcggacggc gatcggatat tggcggtgat tcgaggatcc gcgatcaatc acgacggtgc 8460
gagcagcggt ctgaccgtgc cgaacgggag ctcccaagaa atcgtgctga aacgggccct 8520
ggcggacgca ggctgcgccg cgtcttcggt gggttatgtc gaggcacacg gcacgggcac 8580
gacgcttggt gaccccatcg aaatccaagc tctgaatgcg gtatacggcc tcgggcgaga 8640
tgtcgccacg ccgctgctga tcgggtcggt gaagaccaac cttggccatc ctgagtatgc 8700
gtcggggatc actgggctgc tgaaggtcgt cttgtccctt cagcacgggc agattcctgc 8760
gcacctccac gcgcaggcgc tgaacccccg gatctcatgg ggtgatcttc ggctgaccgt 8820
cacgcgcgcc cggacaccgt ggccggactg gaatacgccg cgacgggcgg gggtgagctc 8880
gttcggcatg agcgggacca acgcgcacgt ggtgctggaa gaggcgccgg cggcgacgtg 8940
cacaccgccg gcgccggagc gaccggcaga gctgctggtg ctgtcggcaa ggaccgcgtc 9000
agccctggat gcacaggcgg cgcggctgcg cgaccatctg gagacctacc cttcgcagtg 9060
tctgggcgat gtggcgttca gtctggcgac gacgcgcagc gcgatggagc accggctcgc 9120
ggtggcggcg acgtcgaggg aggggctgcg ggcagccctg gacgctgcgg cgcagggaca 9180
gacgtcgccc ggtgcggtgc gcagtatcgc cgattcctca cgcggcaagc tcgcctttct 9240
cttcaccgga cagggggcgc agacgctggg catgggccgt gggctgtacg atgtatggtc 9300
cgcgttccgc gaggcgttcg acctgtgcgt gaggctgttc aaccaggagc tcgaccggcc 9360
gctccgcgag gtgatgtggg ccgaaccggc cagcgtcgac gccgcgctgc tcgaccagac 9420
agccttcacc cagccggcgc tgttcacctt cgaatatgcg ctcgccgcgc tgtggcggtc 9480
gtggggtgta gagccggagt tggtcgccgg ccatagcatc ggtgagctgg tggctgcctg 9540
cgtggcgggc gtgttctcgc ttgaggacgc ggtgttcctg gtggctgcgc gcgggcgcct 9600
gatgcaggcg ctgccggccg gcggggcgat ggtgtcgatc gaggcgccgg aggccgatgt 9660
ggctgctgcg gtggcgccgc acgcagcgtc ggtgtcgatc gccgcggtca acgctccgga 9720
ccaggtggtc atcgcgggcg ccgggcaacc cgtgcatgcg atcgcggcgg cgatggccgc 9780
gcgcggggcg cgaaccaagg cgctccacgt ctcgcatgcg ttccactcac cgctcatggc 9840
cccgatgctg gaggcgttcg ggcgtgtggc cgagtcggtg agctaccggc ggccgtcgat 9900
cgtcctggtc agcaatctga gcgggaaggc ttgcacagac gaggtgagct cgccgggcta 9960
ttgggtgcgc cacgcgcgag aggtggtgcg cttcgcggat ggagtgaagg cgctgcacgc 10020
ggccggtgcg ggcaccttcg tcgaggtcgg tccgaaatcg acgctgctcg gcctggtgcc 10080
tgcctgcatg ccggacgccc ggccggcgct gctcgcatcg tcgcgcgctg ggcgtgacga 10140
gccggcgacc gtgctcgagg cgctcggcgg gctctgggcc gtcggtggcc tggtctcctg 10200
ggccggcctc ttcccctcag gggggcggcg ggtgccgctg cccacgtacc cttggcagcg 10260
cgagcgctac tggatcgaca cgaaagccga cgacgcggcg cgtggcgacc gccgtgctcc 10320
gggagcgggt cacgacgagg tcgaggaggg gggcgcggtg cgcggcggcg accggcgcag 10380
cgctcggctc gaccatccgc cgcccgagag cggacgccgg gagaaggtcg aggccgccgg 10440
cgaccgtccg ttccggctcg agatcgatga gccaggcgtg cttgatcacc tcgtgcttcg 10500
ggtcacggag cggcgcgccc ctggtctggg cgaggtcgag atcgccgtcg acgcggcggg 10560
gctcagcttc aatgatgtcc agctcgcgct gggcatggtg cccgacgacc tgccgggaaa 10620
gcccaaccct ccgctgctgc tcggaggcga gtgcgccggg cgcatcgtcg ccgtgggcga 10680
gggcgtgaac ggcctcgtgg tgggccaacc ggtcatcgcc ctttcggcgg gagcgtttgc 10740
tacccacgtc accacgtcgg ctgcgctggt gctgcctcgg cctcaggcgc tctcggcgat 10800
cgaggcggcc gccatgcccg tcgcgtacct gacggcatgg tacgcgctcg acagaatagc 10860
ccgccttcag ccgggggagc gggtgctgat ccatgcggcg accggcgggg tcggtctcgc 10920
cgcggtgcag tgggcgcagc acgtgggagc cgaggtccat gcgacggccg gcacgcccga 10980
gaaacgcgcc tacctggagt cgctgggcgt gcggtatgtg agcgattccc gctcggaccg 11040
gttcgtcgcc gacgtgcgcg cgtggacggg cggcgaggga gtagacgtcg tgctcaactc 11100
gctctcgggc gagctgatcg acaagagttt caatctcctg cgatcgcacg gccggtttgt 11160
ggagctcggc aagcgcgact gttacgcgga taaccagctc gggctgcggc cgttcctgcg 11220
caatctctcc ttctcgctgg tggatctccg ggggatgatg ctcgagcggc cggcgcgggt 11280
ccgtgcgctc ttggaggagc tcctcggcct gatcgcggca ggcgtgttca cccctccccc 11340
catcgcgacg ctcccgatcg cccgtgtcgc cgatgcgttc cggagcatgg cgcaggcgca 11400
gcatcttggg aagctcgtac tcacgctggg tgacccggag gtccagatcc gtattccaac 11460
ccacgcaggc gccggcccgt ccaccgggga tcgggacctg ctcgacaggc tcgcgtcagc 11520
tgcgccggcc gcgcgcgcgg cggcgctgga ggcgttcctc cgtacgcagg tctcgcaggt 11580
gctgcgcacg cccgaaatca aggtcggcgc ggaggcgctg ttcacccgcc tcggcatgga 11640
ctcgctcatg gccgtggagc tgcgcaatcg tatcgaggcg agcctcaagc tgaagctgtc 11700
gacgacgttc ctgtccacgt cccccaatat cgccttgttg gcccaaaacc tgttggatgc 11760
tctcgccaca gctctctcct tggagcgggt ggcggcggag aacctacggg caggcgtgca 11820
aaacgacttc gtctcatcgg gcgcagatca agactgggaa atcattgccc tatgacgatc 11880
aatcagcttc tgaacgagct cgagcaccag ggtatcaagc tggcggccga tggggagcgc 11940
ctccagatac aggcccccaa gaacgccctg aacccgaacc tgctcgctcg aatctccgag 12000
cacaaaagca cgatcctgac gatgctccgt cagagactcc ccgcagaatc catcgtgccc 12060
gccccagccg agcggcacgc tccgtttcct ctcacagaca tccaagaatc ctactggctg 12120
ggccggacag gagcgtttac ggtccccagc gggatccacg cctatcgcga atacgactgt 12180
acggatctcg acgtgccgag gctgagccgc gcctttcgga aagtcgtcgc gcggcacgac 12240
atgcttcggg cccacacgct gcccgacatg atgcaggtga tcgagcctaa agtcgacgcc 12300
gacatcgaga tcatcgatct gcgcgggctc gaccggagca cacgggaagc gaggctcgtg 12360
tcgttgcgag atgcgatgtc gcaccgcatc tatgacaccg agcgccctcc gctctatcac 12420
gtcgtcgccg ttcggctgga cgagcggcaa acccgtctcg tgctcagtat cgatctcatt 12480
aacgttgacc taggcagcct gtccatcatc ttcaaggact ggctcagctt ctacgaagat 12540
cccgagacct ctctccctgt cctggagctc tcgtaccgcg attatgtact cgcgctggag 12600
tctcgcaaga agtctgaggc gcatcaacga tcgatggatt actggaagcg gcgcatcgcc 12660
gagctcccac ctccgccgac gcttccgatg aaggccgatc catctaccct gaaggagatc 12720
cgcttccggc acacggagca atggctgccg tcggactcct ggggtcgatt gaagcggcgt 12780
gtcggggagc gcgggctgac cccgacgggc gtcatcctgg ctgcattttc cgaggtgatc 12840
gggcgctgga gcgcgagccc ccggtttacg ctcaacataa cgctcttcaa ccggctcccc 12900
gtccatccgc gcgtgaacga tatcaccggg gacttcacgt cgatggtcct cctggacatc 12960
gacaccactc gcgacaagag cttcgaacag cgcgctaagc gtattcaaga gcagctgtgg 13020
gaagcgatgg atcactgcga cgtaagcggt atcgaggtcc agcgagaggc cgcccgggtc 13080
ctggggatcc aacgaggcgc attgttcccc gtggtgctca cgagcgcgct taaccagcaa 13140
gtcgttggtg tcacctcgtt gcagaggctc ggaactccgg tgtacaccag cacgcagact 13200
cctcagctgc tgctggatca tcagctctac gagcacgatg gggacctcgt cctcgcgtgg 13260
gacatcgtcg acggagtgtt cccgcccgac cttctggacg acatgctcga agcgtacgtc 13320
gtttttctcc ggcggctcac tgaggaacca tggggtgaac aggtgcgctg ttcgcttccg 13380
cctgcccagc tagaagcgcg ggcgagcgca aacgcgacca acgcgctgct gagcgagcat 13440
acgctgcacg gcctgttcgc ggcgcgggtc gagcagctgc ccatgcagct cgccgtggtg 13500
tcggcgcgca agacgctcac gtacgaagag ctttcgcgcc gttcgcggcg acttggcgcg 13560
cggctgcgcg agcagggggc acgcccgaac acattggtcg cggtggtgat ggagaaaggc 13620
tgggagcagg ttgtcgcggt tctcgcggtg ctcgagtcag gcgcggccta cgtgccgatc 13680
gatgccgacc taccggcgga gcgtatccac tacctcctcg atcatggtga ggtaaagctc 13740
gtgctgacgc agccatggct ggatggcaaa ctgtcatggc cgccggggat ccagcggctg 13800
ctcgtgagcg aggccggcgt cgaaggcgac ggcgaccagc ctccgatgat gcccattcag 13860
acaccttcgg atctcgcgta tgtcatctac acctcgggat ccacagggtt gcccaagggg 13920
gtgatgatcg atcatcgggg tgccgtcaac accatcctgg acatcaacga gcgcttcgaa 13980
atagggcccg gagacagggt gctggcgctc tcctcgctga gcttcgatct ctcggtctat 14040
gatgtgttcg ggatcctggc ggcgggcggt acgatcgtgg tgccggacgc gtccaagctg 14100
cgcgatccgg cgcattgggc agagttgatc gaacgagaga aggtgacggt gtggaactcg 14160
gtgccggcgc tgatgcggat gctcgtcgag cattttgagg gtcgccccga ttcgctcgct 14220
aggtctctgc ggctttcgct gctgagcggc gactggatcc cggtgggcct gcctggcgag 14280
ctccaggcca tcaggcccgg cgtgtcggtg atcagcctgg gcggggccac cgaagcgtcg 14340
atctggtcca tcgggtaccc cgtgaggaac gtcgacctat cgtgggcgag catcccctac 14400
ggccgtccgc tgcgcaacca gacgttccac gtgctcgatg aggcgctcga accgcgcccg 14460
gtctgggttc cggggcaact ctacattggc ggggtcgggc tggcactggg ctactggcgc 14520
gatgaagaga agacgcgcaa gagcttcctc gtgcaccccg agaccgggga gcgcctctac 14580
aagaccggcg atctgggccg ctacctgccc gatggaaaca tcgagttcat ggggcgtgag 14640
gacaaccaaa tcaagcttcg cggataccgc gttgagctcg gggaaatcga ggaaacgctc 14700
aagtcgcatc cgaacgtacg cgacgcggtg attgtgcccg tcgggaacga cgcggcgaac 14760
aagctccttc tagcctatgt ggtcccggag ggcacacgga gacgcgctgc cgagcaggac 14820
gcgagcctca agaccgagcg gatcgacgcg agagcacacg ccgccgaagc ggacggcttg 14880
agcgacggcg agagggtgca gttcaagctc gctcgacacg gactccggag ggacctggac 14940
ggaaagcccg tcgtcgatct gaccgggcag gatccgcggg aggcggggct ggacgtctac 15000
gcgcgtcgcc gtagcgtccg aacgttcctt gaggccccga ttccgtttgt tgagtttggt 15060
cgattcctga gctgcttgag cagcgtggag cccgacggcg cgacccttcc caaattccgt 15120
tatccatcgg cgggcagcac gtacccggtg caaacctacg cgtatgtcaa atccggccgc 15180
atcgagggcg tggacgaggg cttctattat taccacccgt tcgagcaccg tttgctgaag 15240
ctctccgatc acgggatcga gcgcggagcg cacgttcggc aaaacttcga cgtgttcgat 15300
gaagcggcgt tcaacctcct gttcgtgggc aggatcgacg ccatcgagtc gctgtatgga 15360
tcgtcgtcgc gagaattttg cctgctggag gccggatata tggcgcagct cctgatggag 15420
caggcgcctt cctgcaacat cggcgtctgt ccggtggggc aattcaattt tgaacaggtt 15480
cggccggttc tcgacctgcg acattcggac gtttacgtgc acggcatgct gggcgggcgg 15540
gtagacccgc ggcagttcca ggtctgtacg ctcggtcagg attcctcacc gaggcgcgcc 15600
acgacgcgcg gcgcccctcc cggccgcgag cagcacttcg ccgatatgct tcgcgacttc 15660
ttgaggacca aactacccga gtacatggtg cctacagtct tcgtggagct cgatgcgttg 15720
ccgctgacgt ccaacggcaa ggtcgatcgt aaggccctgc gcgagcggaa ggatacctcg 15780
tcgccgcggc attcggggca cacggcgcca cgggacgcct tggaggagat cctcgtcgcg 15840
gtcgtacggg aggtgctcgg gctggaggtg gtcgggctcc agcagagctt cgtcgatctt 15900
ggtgcgacat cgattcacat cgttcgcatg aggagcctgt tgcagaagag gctggatagg 15960
gagatcgcca tcaccgagtt gttccagtac ccgaacctcg gctcgctggc gtccggtttg 16020
cgccgagact cgagagatct agatcagcgg ccgaacatgc aggaccgagt ggaggttcgg 16080
cgcaagggca ggagacgtag ctaagagcgc cgaacaaaac caggccgagc gggccgatga 16140
gccgcaagcc cgcctgcgtc accctgggac tcatctgatc tgatcgcggg tacgcgtcgc 16200
gggtgtgcgc gttgagccgt gttgttcgaa cgctgaggaa cggtgagctc atggaagaac 16260
aagagtcctc cgctatcgca gtcatcggca tgtcgggccg ttttccgggg gcgcgggatc 16320
tggacgaatt ctggaggaac cttcgagacg gcacggaggc cgtgcagcgc ttctccgagc 16380
aggagctcgc ggcgtccgga gtcgaccccg cgctggtgct ggacccgagc tacgtccggg 16440
cgggcagcgt gctggaagac gtcgaccggt tcgacgctgc tttcttcggc atcagcccgc 16500
gcgaggcaga gctcatggat ccgcagcacc ggatcttcat ggaatgcgcc tgggaggcgc 16560
tggagaacgc cggatacgac ccgacggctt acgagggctc tatcggcgtg tacgccggcg 16620
ccaacatgag ctcgtacttg acgtcgaacc tccacgagca cccagcgatg atgcggtggc 16680
ccggctggtt tcagacgttg atcggcaacg acaaggatta cctcgcgacc cacgtctcct 16740
acaggctgaa tctgagaggg ccgagcatct ccgttcaaac tgcctgctcc acctcgctcg 16800
tggcggttca cttggcgtgc atgagcctcc tggaccgcga gtgcgacatg gcgctggccg 16860
gcgggattac cgtccggatc ccccatcgag ccggctatgt atatgctgag gggggcatct 16920
tctctcccga cggccattgc cgggccttcg acgccaaggc gaacggcacg atcatgggca 16980
acggctgcgg cgttgtcctc ctgaagccgc tggaccgggc gctctccgat ggtgatcccg 17040
tccgcgcggt tatccttggg tctgccacaa acaacgacgg agcgaggaag atcgggttca 17100
ctgcgcccag tgaggtgggc caggcgcaag cgatcatgga ggcgctggcg ctggcagggg 17160
tcgaggcccg gtccatccaa tacatcgaga cccacgggac cggcacgctg ctcggagacg 17220
ccatcgagac ggcggcgctg cggcgggtgt tcggtcgcga cgcttcggcc cggaggtctt 17280
gcgcgatcgg ctccgtgaag accggcatcg gacacctcga atcggcggct ggcatcgccg 17340
gtttgatcaa gacggtcttg gcgctggagc accggcagct gccgcccagc ctgaacttcg 17400
agtctcctaa cccatcgatc gatttcgcga gcagcccgtt ctacgtcaat acctctctta 17460
aggattggaa taccggctcg actccgcggc gggccggcgt cagctcgttc gggatcggcg 17520
gcaccaacgc ccatgtcgtg ctggaggaag cgcccgcggc gaagcttcca gccgcggcgc 17580
cggcgcgctc tgccgagctc ttcgtcgtct cggccaagag cgcagcggcg ctggatgccg 17640
cggcggcacg gctacgagat catctgcagg cgcaccaggg gatttcgttg ggcgacgtcg 17700
ccttcagcct ggcgacgacg cgcagcccca tggagcaccg gctcgcgatg gcggcgccgt 17760
cgcgcgaggc gttgcgagag gggctcgacg cagcggcgcg aggccagacc ccgccgggcg 17820
ccgtgcgtgg ccgctgctcc ccaggcaacg tgccgaaggt ggtcttcgtc tttcccggcc 17880
agggctctca gtgggtcggc atgggccggc agctcctggc tgaggaaccc gtcttccacg 17940
cggcgctttc ggcgtgcgac cgggccatcc aggccgaagc tggttggtcg ctgctcgcgg 18000
agctcgccgc cgacgaaggg tcctcccagc tcgagcgcat cgacgtggtg cagccggtgc 18060
tgttcgccct cgcggtggca tttgcggcgc tgtggcggtc gtggggtgtc gcgcccgacg 18120
tcgtgatcgg ccacagcatg ggcgaggtag ccgccgcgca tgtggccggg gcgctgtcgc 18180
tcgaggatgc ggtggcgatc atctgccggc gcagccggct gctccggcgc atcagcggtc 18240
agggcgagat ggcggtgacc gagctgtcgc tggccgaggc cgaggcggcg ctccgaggct 18300
acgaggatcg ggtgagcgtg gccgtgagca acagcccgcg ctcgacggtg ctctcgggcg 18360
agccggcagc gatcggcgag gtgctgtcgt ccctgaacgc gaagggggtg ttctgccgtc 18420
gggtgaaggt ggatgtcgcc agccacagcc cgcaggtcga cccgctgcgc gaggacctct 18480
tggcagccct gggcgggctc cggccgggtg cggctgcggt gccgatgcgc tcgacggtga 18540
cgggcgccat ggtagcgggc ccggagctcg gagcgaatta ctggatgaac aacctcaggc 18600
agccagtgcg cttcgccgag gtagtccagg cgcagctcca aggcggccac ggtctgttcg 18660
tggagatgag cccgcatccg atcctaacga cttcggtcga ggagatgcgg cgcgcggccc 18720
agcgggcggg cgcagcggtg ggctcgctgc ggcgggggca ggacgagcgc ccggcgatgc 18780
tggaggcgct gggcacgctg tgggcgcagg gctaccctgt accctggggg cggctgtttc 18840
ccgcgggggg gcggcgggta ccgctgccga cctatccctg gcagcgcgag cggtactgga 18900
tcgaagcgcc ggccaagagc gccgcgggcg atcgccgcgg cgtgcgtgcg ggcggtcacc 18960
cgctcctcgg tgaaatgcag accctgtcaa cccagacgag cacgcggctg tgggagacga 19020
cgctggatct caagcggctg ccgtggctcg gcgaccaccg ggtgcaggga gcggtcgtgt 19080
ttccgggcgc ggcgtacctg gagatggcga tttcgtcggg ggccgaggct ttgggcgatg 19140
gccctttgca gataactgac gtggtgctcg ccgaggcgct ggccttcgcg ggcgacgcgg 19200
cggtgttggt ccaggtggtg acgacggagc agccgtcggg gcggctgcag ttccagatcg 19260
cgagccgggc gccgggcgct ggccacgcgt ccttccgggt ccacgctcgc ggcgcgttgc 19320
tccgagtgga gcgcaccgag gtcccggctg ggcttacgct ttccgctgtg cgcgcgcggc 19380
tccaggccag catacccgcc gcggccacct acgcggagct gaccgagatg gggctgcagt 19440
acggccctgc cttccagggg attgctgagc tatggcgggg tgaaggcgag gcgctgggac 19500
gggtacgcct gcccgacgcg gccggctcgg cagcggagta tcggttgcat cctgcgctgc 19560
tggacgcgtg cttccagatc gtcggcagcc tcttcgcccg cagtggcgag gcgacgccgt 19620
gggtgcccgt ggagttgggc tcgctgcggc tcttgcagcg gccttcgggg gagctgtggt 19680
gccatgcgcg cgtcgtgaac catgggcacc aaacccccga tcggcagggc gccgactttt 19740
gggtggtcga cagctcgggt gcagtggtcg ccgaagtttg cgggctcgtg gcgcagcggc 19800
ttccgggagg ggtgcgccgg cgcgaagaag acgattggtt cctggagctc gagtgggaac 19860
ccgcagcggt cggcacagcc aaggtcaacg cgggccggtg gctgctcctc ggcggcggcg 19920
gtgggctcgg cgccgcgttg cgcgcgatgc tggaggccgg cggccatgcc gtcgtgcatg 19980
cggcagagaa caacacgagc gctgccggcg tacgcgcgct cctggcaaag gcctttgacg 20040
gccaggctcc gacggcggtg gtgcacctcg gcagcctcga tgggggtggc gagctcgacc 20100
cagggctcgg ggcgcaaggc gcattggacg cgccccggag cgccgacgtc agtcccgatg 20160
ccctcgatcc ggcgctggta cgtggctgcg acagcgtgct ctggaccgtg caggccctgg 20220
ccggcatggg ctttcgagac gccccgcgat tgtggctttt gacccgcggc gcacaggccg 20280
tcggcgccgg cgacgtctcc gtgacacagg caccgctgct ggggctgggc cgcgtcatcg 20340
ccatggagca cgcggatctg cgctgcgctc gggtcgacct cgatccagcc cggcccgagg 20400
gggagctcgc tgccctgctg gccgagctgc tggccgacga cgccgaagcg gaagtcgcgt 20460
tgcgcggtgg cgagcgatgc gtcgctcgga tcgtccgccg gcagcccgag acccggcccc 20520
gggggaggat cgagagctgc gttccgaccg acgtcaccat ccgcgcggac agcacctacc 20580
ttgtgaccgg cggtctgggt gggctcggtc tgagcgtggc cggatggctg gccgagcgcg 20640
gcgctggtca cctggtgctg gtgggccgct ccggcgcggc gagcgtggag caacgggcag 20700
ccgtcgcggc gctcgaggcc cgcggcgcgc gcgtcaccgt ggcgaaggcg gatgtcgccg 20760
atcgggcgca gctcgagcgg atcctccgcg aggttaccac gtcggggatg ccgctgcggg 20820
gcgtcgtcca tgcggccggc atcttggacg acgggctgct gatgcagcag actcccgcgc 20880
ggtttcgtaa ggtgatggcg cccaaggtcc agggggcctt gcacctgcac gcgttgacgc 20940
gcgaagcgcc gctttccttc ttcgtgctgt acgcttcggg agtagggctc ttgggctcgc 21000
cgggccaggg caactacgcc gcggccaaca cgttcctcga cgctctggcg caccaccgga 21060
gggcgcaggg gctgccagcg ttgagcgtcg actggggcct gttcgcggag gtgggcatgg 21120
cggccgcgca ggaagatcgc ggcgcgcggc tggtctcccg cggaatgcgg agcctcaccc 21180
ccgacgaggg gctgtccgct ctggcacggc tgctcgaaag cggccgcgct caggtggggg 21240
tgatgccggt gaacccgcgg ctgtgggtgg agctctaccc cgcggcggcg tcttcgcgaa 21300
tgttgtcgcg cctggtgacg gcgcatcgcg cgagcgccgg cgggccagcc ggggacgggg 21360
acctgctccg ccgcctcgcc gctgccgagc cgagcgcgcg gagcgcgctc ctggagccgc 21420
tcctccgcgc gcagatctcg caggtgctgc gcctccccga gggcaagatc gaggtggacg 21480
ccccgctcac gagcctgggc atgaactcgc tgatggggct cgagctgcgc aaccgcatcg 21540
aggccatgct gggcatcacc gtaccggcaa cgctgttgtg gacctatccc acggtggcgg 21600
cgctgagcgg gcatctggcg cgggaggcat gcgaagccgc tcctgtggag tcaccgcaca 21660
ccaccgccga ctctgccgtc gagatcgagg agatgtcgca ggacgatctg acgcagttga 21720
tcgcagcaaa attcaaggcg cttacatgac tactcgcggt cctacggcac agcagaatcc 21780
gctgaaacaa gcggccatca tcattcagcg gctggaggag cggctcgctg ggctcgcaca 21840
ggcggagctg gaacggaccg agccgatcgc catcgtcggt atcggctgcc gcttccctgg 21900
cggtgcggac gctccggaag cgttttggga gctgctcgac gcggagcgcg acgcggtcca 21960
gccgctcgac atgcgctggg cgctggtggg tgtcgctccc gtcgaggccg tgccgcactg 22020
ggcggggctg ctcaccgagc cgatagattg cttcgatgct gcgttcttcg gcatctcgcc 22080
tcgggaggcg cgatcgctcg acccgcagca tcgtctgttg ctggaggtcg cttgggaggg 22140
gctcgaggac gccggtatcc cgccccggtc catcgacggg agccgcaccg gtgtgttcgt 22200
cggcgctttc acggcggact acgcgcgcac ggtcgctcgg ctgccgcgcg aggagcgaga 22260
cgcgtacagc gccaccggca acatgctcag catcgccgcc ggacggctgt cgtacacgct 22320
ggggttgcag ggaccttgcc tgaccgtcga cacggcgtgc tcgtcatcgc tggtggcgat 22380
tcacctcgcc tgccgcagcc tgcgcgcagg agagagcgat ctcgcgttgg cgggaggggt 22440
cagcgcgctc ctctcccccg acatgatgga agccgcggcg cgcacgcaag cgctgtcgcc 22500
cgatggtcgt tgccggacct tcgatgcttc ggccaacggg ttcgtccgtg gcgagggctg 22560
tggcctggtc gtcctcaaac ggctctccga cgcgcaacgg gatggcgacc gcatctgggc 22620
gctgatccgg ggctcggcca tcaaccatga tggccggtcg accgggttga ccgcgcccaa 22680
cgtgctggct caggagacgg tcttgcgcga ggcgctgcgg agcgcccacg tcgaagctgg 22740
ggccgtcgat tacgtcgaga cccacggaac agggacctcg ctgggcgatc ccatcgaggt 22800
cgaggcgctg cgggcgacgg tggggccggc gcgctccgac ggcacacgct gcgtgctggg 22860
cgcggtgaag accaacatcg gccatctcga ggccgcggca ggcgtagcgg gcctgatcaa 22920
ggcagcgctt tcgctgacgc acgagcgcat cccgagaaac ctcaacttcc gcacgctcaa 22980
tccgcggatc cggctcgagg gcagcgcgct cgcgttggcg accgagccgg tgccgtggcc 23040
gcgcacggac cgcccgcgct tcgcgggggt gagctcgttc gggatgagcg gaacgaacgc 23100
gcatgtggtg ctggaagagg cgccggcggt ggagctgtgg cctgccgcgc cggagcgctc 23160
ggcggagctt ttggtgctgt cgggcaagag cgagggggcg ctcgatgcgc aggcggcgcg 23220
gctgcgcgag cacctggaca tgcacccgga gctcgggctc ggggacgtgg cgttcagcct 23280
ggcgacgacg cgcagcgcga tgagccaccg gctcgcggtg gcggtgacgt cgcgcgaggg 23340
gctgctggcg gcgctctcgg ccgtggcgca ggggcagacg ccggcggggg cggcgcgctg 23400
catcgcgagc tcctcgcgcg gcaagctggc gttcctgttc accggacagg gcgcgcagac 23460
gccgggcatg ggccgggggc tttgcgcggc gtggccagcg ttccgggagg cgttcgaccg 23520
gtgcgtggcg ctgttcgacc gggagctgga ccgcccgctg cgcgaggtga tgtgggcgga 23580
ggcggggagc gccgagtcgt tgttgctcga ccagacggcg ttcacccagc ccgcgctctt 23640
cgcggtggag tacgcgctga cggcgctgtg gcggtcgtgg ggcgtagagc cggagctcct 23700
ggttgggcat agcatcgggg agctggtggc ggcgtgcgtg gcgggggtgt tctcgctgga 23760
agatggggtg aggctcgtgg cggcgcgcgg gcggctgatg caggggctct cggcgggcgg 23820
cgcgatggtg tcgctcggag cgccggaggc ggaggtggcg gcggcggtgg cgccgcacgc 23880
ggcgtcggtg tcgatcgcgg cggtcaatgg gccggagcag gtggtgatcg cgggcgtgga 23940
gcaagcggtg caggcgatcg cggcggggtt cgcggcgcgc ggcgcgcgca ccaagcggct 24000
gcatgtctcg cacgcgttcc actcgccgct gatggaaccg atgctggagg agttcgggcg 24060
ggtggcggcg tcggtgacgt accggcggcc aagcgtttcg ctggtgagca acctgagcgg 24120
gaaggtggtc acggacgagc tgagcgcgcc ggggtactgg gtgcggcacg tgcgggaggc 24180
ggtgcgcttc gcggacgggg tgaaggcgct gcacgaagcc ggcgcgggga cgttcgtcga 24240
agtgggcccg aagccgacgc tgctcgggct gttgccagcc tgcctgccgg aggcggagcc 24300
gacgctgctg gcgtcgttgc gcgccgggcg cgaggaggct gcgggggtgc tcgaggcgct 24360
gggcaggctg tgggccgccg gcggctcggt cagctggccg ggcgtcttcc ccacggctgg 24420
gcggcgggtg ccgctgccga cctatccgtg gcagcggcag cggtactgga tcgaggcgcc 24480
ggccgaaggg ctcggagcca cggccgccga tgcgctggcg cagtggttct accgggtgga 24540
ctggcccgag atgcctcgct catccgtgga ttcgcggcga gcccggtccg gcgggtggct 24600
ggtgctggcc gaccggggtg gagtcgggga ggcggccgcg gcggcgcttt cgtcgcaggg 24660
atgttcgtgc gccgtgctcc atgcgcccgc cgaggcctcc gcggttgccg agcaggtgac 24720
ccaggccctc ggtggccgca acgactggca gggggtgctg tacctgtggg gtctggacgc 24780
cgtcgtggag gcgggggcat cggccgaaga ggtcgccaaa gtcacccatc ttgccgcggc 24840
gccggtgctc gcgctgattc aggcgctcgg cacggggccg cgctcacccc ggctctggat 24900
cgtgacccga ggggcctgca cggtgggcgg cgagcctgac gctgccccct gtcaggcggc 24960
gctgtggggt atgggccggg tcgcggcgct agagcatccc ggctcctggg gcgggctcgt 25020
ggacctggat ccggaggaga gcccgacgga ggtcgaggcc ctggtggccg agctgctttc 25080
gccggacgcc gaggatcagc tggcattccg ccaggggcgc cggcgcgcag cgcggcttgt 25140
ggccgcccca ccggagggaa acgcagcgcc ggtgtcgctg tctgcggagg ggagttactt 25200
ggtgacgggt gggctgggcg cccttggcct cctcgttgcg cggtggttgg tggagcgcgg 25260
ggcggggcac cttgtgctga tcagccggca cggattgccc gaccgcgagg aatggggccg 25320
agatcagccg ccagaggtgc gcgcgcgcat tgcggcgatc gaggcgctgg aggcgcaggg 25380
cgcgcgggtc accgtggcgg cggtcgacgt ggccgatgcc gaaggcatgg cggcgctctt 25440
ggcggccgtc gagccgccgc tgcggggggt agtgcacgcc gcgggtctgc tcgacgacgg 25500
gctgctggcc caccaggacg ctggtcggct cgcccgggtg ttgcgcccca aggtggaggg 25560
ggcatgggtg ctgcacaccc ttacccgcga gcagccgctg gacctcttcg tactgttttc 25620
ctcggcgtcg ggcgtcttcg gctcgatcgg ccagggcagc tacgcggcag gcaatgcctt 25680
tttggacgcg ctggcggacc tccgccgaac gcaggggctc gccgccctga gcatcgcctg 25740
gggcctgtgg gcggaggggg ggatgggctc gcaggcgcag cgccgggaac acgaggcatc 25800
gggaatctgg gcgatgccga cgagtcgggc cctggcggcg atggaatggc tgctcggtac 25860
gcgcgcgacg cagcgcgtgg tcatccagat ggattgggcc catgcgggag cggcgccgcg 25920
cgacgcgagc cgaggccgct tctgggatcg gctggtaact gccacgaaag aggcctcctc 25980
ctcggccgtg ccagctgtgg agcgctggcg caacgcgtct gttgtggaga cccgctcggc 26040
gctctacgag cttgtgcgcg gcgtggtcgc cggggtgatg ggctttaccg accagggcac 26100
gctcgacgtg cgacgaggct tcgccgagca gggcctcgac tccctgatgg ccgtggagat 26160
ccgcaaacgg cttcagggtg agctgggtat gccgctgtcg gcgacgctag cgttcgacca 26220
tccgaccgtg gagcggctgg tggaatactt gctgagccag gcgctggagc tgcaggaccg 26280
caccgacgtg cggagcgttc ggttgccggc gacagaggac ccgatcgcca tcgtgggtgc 26340
cgcctgccgc ttcccgggcg gggtcgagga cctggagtcc tactggcagc tgttgaccga 26400
gggcgtggtg gtcagcaccg aggtgccggc cgaccggtgg aatggggcag acgggcgcgt 26460
ccccggctcg ggagaggcac agagacagac ctacgtgccc aggggtggct ttctgcgcga 26520
ggtggagacg ttcgatgcgg cgttcttcca catctcgcct cgggaggcga tgagcctgga 26580
cccgcaacag cggctgctgc tggaagtgag ctgggaggcg atcgagcgcg cgggccagga 26640
cccgtcggcg ctgcgcgaga gccccacggg cgtgttcgtg ggcgcgggcc ccaacgaata 26700
tgccgagcgg gtgcaggaac tcgccgatga ggcggcgggg ctctacagcg gcaccggcaa 26760
catgctcagc gttgcggcgg gacggctatc atttttcctg ggcctgcacg ggccgaccct 26820
ggctgtggat acggcgtgct cctcgtcgct ggtggcgctg cacctcggct gccagagctt 26880
gcgacggggc gagtgcgacc aagccctggt tggcggggtc aacatgctgc tctcgccgaa 26940
gaccttcgcg ctgctctcac ggatgcacgc actttcgccc ggcgggcggt gcaagacgtt 27000
ctcggccgac gcggacggct acgcgcgggc cgagggctgc gccgtggtgg tgctcaagcg 27060
gctctccgac gcgcagcgcg accgcgaccc catcctggcg gtgatccggg gtacggcgat 27120
caatcatgat ggcccgagca gcgggctgac agtgcccagc ggccctgccc aggaggcgct 27180
gttacgccag gcgctggcgc acgcaggggt ggttccggcc gacgtcgatt tcgtggaatg 27240
ccacgggacc gggacggcgc tgggcgaccc gatcgaggtg cgtgcgctga gcgacgtgta 27300
cgggcaagcc cgccctgcgg accgaccgct gatcctggga gccgccaagg ccaaccttgg 27360
gcacatggag cccgcggcgg gcctggccgg cttgctcaag gcggtgctcg cgctggggca 27420
agagcaaata ccagcccagc cggagctggg cgagctcaac ccgctcttgc cgtgggaggc 27480
gctgccggtg gcggtggccc gcgcagcggt gccgtggccg cgcacggacc gcccgcgctt 27540
cgcgggggtg agctcgttcg ggatgagcgg aacgaacgcg catgtggtgc tggaagaggc 27600
gccggcggtg gagctgtggc ctgccgcgcc ggagcgctcg gcggagcttt tggtgctgtc 27660
gggcaagagc gagggggcgc tcgatgcgca ggcggcgcgg ctgcgcgagc acctggacat 27720
gcacccggag ctcgggctcg gggacgtggc gttcagcctg gcgacgacgc gcagcgcgat 27780
gaaccaccgg ctcgcggtgg cggtgacgtc gcgcgagggg ctgctggcgg cgctttcggc 27840
cgtggcgcag gggcagacgc cgccgggggc ggcgcgctgc atcgcgagct cgtcgcgcgg 27900
caagctggcg ttcctgttca ccggacaggg cgcgcagacg ccgggcatgg gccgggggct 27960
ttgcgcggcg tggccagcgt tccgggaggc gttcgaccgg tgcgtggcgc tgttcgaccg 28020
ggagctggac cgcccgctgc gcgaggtgat gtgggcggag ccggggagcg ccgagtcgtt 28080
gttgctcgac cagacggcgt tcacccagcc cgcgctcttc acggtggagt acgcgctgac 28140
ggcgctgtgg cggtcgtggg gcgtagagcc ggagctggtg gctgggcata gcgccgggga 28200
gctggtggcg gcgtgcgtgg cgggggtgtt ctcgctggaa gatggggtga ggctcgtggc 28260
ggcgcgcggg cggctgatgc aggggctctc ggcgggcggc gcgatggtgt cgctcggagc 28320
gccggaggcg gaggtggcgg cggcggtggc gccgcacgcg gcgtcggtgt cgatcgcggc 28380
ggtcaatggg ccggagcagg tggtgatcgc gggcgtggag caagcggtgc aggcgatcgc 28440
ggcggggttc gcggcgcgcg gcgcgcgcac caagcggctg catgtctcgc acgcgtccca 28500
ctcgccgctg atggaaccga tgctggagga gttcgggcgg gtggcggcgt cggtgacgta 28560
ccggcggcca agcgtttcgc tggtgagcaa cctgagcggg aaggtggtcg cggacgagct 28620
gagcgcgccg gggtactggg tgcggcacgt gcgggaggcg gtgcgcttcg cggacggggt 28680
gaaggcgctg cacgaagccg gtgcgggcac gttcgtcgaa gtgggcccga agccgacgct 28740
gctcgggctg ttgccagcct gcctgccgga ggcggagccg acgctgctgg cgtcgttgcg 28800
cgccgggcgc gaggaggctg cgggggtgct cgaggcgctg ggcaggctgt gggccgccgg 28860
cggctcggtc agctggccgg gcgtcttccc cacggctggg cggcgggtgc cgctgccgac 28920
ctatccgtgg cagcggcagc ggtactggcc cgacatcgag cctgacagcc gtcgccacgc 28980
agccgcggat ccgacccaag gctggttcta tcgcgtggac tggccggaga tacctcgcag 29040
cctccagaaa tcagaggagg cgagccgcgg gagctggctg gtattggcgg ataagggtgg 29100
agtcggcgag gcggtcgctg cagcgctgtc gacacgtgga cttccatgcg tcgtgctcca 29160
tgcgccggca gagacatccg cgaccgccga gctggtgacc gaggctgccg gcggtcgaag 29220
cgattggcag gtagtgctct acctgtgggg tctggacgcc gtcgtcggtg cggaggcgtc 29280
gatcgatgag atcggcgacg cgacccgtcg tgctaccgcg ccggtgctcg gcttggctcg 29340
gtttctgagc accgtgtctt gttcgccccg actctgggtc gtgacccggg gggcatgcat 29400
cgttggcgac gagcctgcga tcgccccttg tcaggcggcg ttatggggca tgggccgggt 29460
ggcggcgctc gagcatcccg gggcctgggg cgggctcgtg gacctggatc cccgagcgag 29520
cccgccccaa gccagcccga tcgacggcga gatgctcgtc accgagctat tgtcgcagga 29580
gaccgaggat cagctcgcct tccgccatgg gcgccggcac gcggcacggc tggtggccgc 29640
cccgccacag gggcaagcgg caccggtgtc gctgtctgcg gaggcgagct acctggtgac 29700
gggaggcctc ggtgggctgg gcctgatcgt ggcccagtgg ctggtggagc tgggagcgcg 29760
gcacttggtg ctgaccagcc ggcgcgggtt gcccgaccgg caggcgtggt gcgagcagca 29820
gccgcctgag atccgcgcgc ggatcgcagc ggtcgaggcg ctggaggcgc ggggtgcacg 29880
ggtgaccgtg gcagcggtgg acgtggccga cgtcgaaccg atgacagcgc tggtttcgtc 29940
ggtcgagccc ccgctgcgag gggtggtgca cgccgctggc gtcagcgtca tgcgtccact 30000
ggcggagacg gacgagaccc tgctcgagtc ggtgctccgt cccaaggtgg ccgggagctg 30060
gctgctgcac cggctgctgc acggccggcc tctcgacctg ttcgtgctgt tctcgtcggg 30120
cgcagcggtg tggggtagcc atagccaggg tgcgtacgcg gcggccaacg ctttcctcga 30180
cgggctcgcg catcttcggc gttcgcaatc gctgcctgcg ttgagcgtcg cgtggggtct 30240
gtgggccgag ggaggcatgg cggacgcgga ggctcatgca cgtctgagcg acatcggggt 30300
tctgcccatg tcgacgtcgg cagcgttgtc ggcgctccag cgcctggtgg agaccggcgc 30360
ggctcagcgc acggtgaccc ggatggactg ggcgcgcttc gcgccggtgt acaccgctcg 30420
agggcgtcgc aacctgcttt cggcgctggt cgcagggcgc gacatcatcg cgccttcccc 30480
tccggcggca gcaacccgga actggcgtgg cctgtccgtt gcggaagccc gcgtggctct 30540
gcacgagatc gtccatgggg ccgtcgctcg ggtgctgggc ttcctcgacc cgagcgcgct 30600
cgatcctggg atggggttca atgagcaggg cctcgactcg ttgatggcgg tggagatccg 30660
caacctcctt caggctgagc tggacgtgcg gctttcgacg acgctggcct ttgatcatcc 30720
gacggtacag cggctggtgg agcatctgct cgtcgatgta ctgaagctgg aggatcgcag 30780
cgacacccag catgttcggt cgttggcgtc agacgagccc atcgccatcg tgggagccgc 30840
ctgccgcttc ccgggcgggg tggaggacct ggagtcctac tggcagctat tggccgaggg 30900
cgtggtggtc agcgccgagg tgccggccga ccggtgggat gcggcggact ggtacgaccc 30960
tgatccggag atcccaggcc ggacttacgt gaccaaaggc gccttcctgc gcgatttgca 31020
gagattggat gcgaccttct tccgcatctc gcctcgcgag gcgatgagcc tcgacccgca 31080
gcagcggttg ctcctggagg taagctggga agcgctcgag agcgcgggta tcgctccgga 31140
tacgctgcga gatagcccca ccggggtgtt cgtgggtgcg gggcccaatg agtactacac 31200
gcagcggctg cgaggcttca ccgacggagc ggcagggttg tacggcggca ccgggaacat 31260
gctcagcgtt acggctggac ggctgtcgtt tttcctgggt ctgcacggcc cgacgctggc 31320
catggatacg gcgtgctcgt catccctggt cgcgctgcac ctcgcctgcc agagcctgcg 31380
actgggcgag tgcgatcaag cgctggttgg cggggtcaac gtgctgctcg cgccggagac 31440
cttcgtgctg ctctcacgga tgcgcgcgct ttcgcccgac gggcggtgca agacgttctc 31500
ggccgacgcg gacggctacg cgcggggcga ggggtgcgcc gtggtggtgc tcaagcggct 31560
gcgcgatgcg cagcgcgccg gcgactccat cctggcgctg atccggggaa gcgcggtgaa 31620
ccacgacggc ccgagcagcg ggctgaccgt acccaacgga cccgcccagc aagcattgct 31680
gcgccaggcg ctttcgcaag caggcgtgtc tccggtcgac gttgattttg tggagtgtca 31740
cgggacaggg acggcgctgg gcgacccgat cgaggtgcag gcgctgagcg aggtgtatgg 31800
tccagggcgc tccggggacc gaccgctggt gctgggggcc gccaaggcca acgtcgcgca 31860
tctggaggcg gcatctggct tggccagcct gctcaaggcc gtgcttgcgc tgcggcacga 31920
gcagatcccg gcccagccgg agctggggga gctcaacccg cacttgccgt ggaacacgct 31980
gccggtggcg gtgccacgta aggcggtgcc gtgggggcgc ggcgcacgcc cgcgtcgggc 32040
cggcgtgagc gcgttcgggt tgagcggaac caacgtgcat gtcgtgctgg aggaggcacc 32100
ggaggtggag ccggcgcccg cggcgccggc gcgaccggtg gagctggtcg tgctatcggc 32160
caagagcgcg gcggcgctgg acgccgcggc ggcacggctc tcggcgcacc tgtccgcgca 32220
cccggagctg agcctcggcg acgtggcgtt cagcctggcg acgacgcgca gcccgatgga 32280
gcaccggctc gccatcgcga cgacctcgcg cgaggccctg cgaggcgcgc tggacgccgc 32340
ggcgcagcaa aagacgccgc agggcgcggt gcgcggcaag gccgtgtcct cacgcggtaa 32400
gctggctttc ctgttcaccg gacagggcgc gcaaatgccg ggcatgggcc gtgggctgta 32460
cgaaacgtgg cctgcgttcc gggaggcgtt cgaccggtgc gtggcgctct tcgatcggga 32520
gatcgaccag cctctgcgcg aggtgatgtg ggctgcgccg ggcctcgctc aggcggcgcg 32580
gctcgatcag accgcgtacg cgcagccggc tctctttgcg ctggagtacg cgctggctgc 32640
cctgtggcgt tcgtggggcg tggagccgca cgtactgctc ggtcatagca tcggcgagct 32700
ggtcgccgcc tgcgtggcgg gcgtgttctc gctcgaagat gcggtgaggt tggtggccgc 32760
gcgcgggcgg ctgatgcagg cgctacccgc cggcggtgcc atggtagcca tcgcagcgtc 32820
cgaggccgag gtggccgcct ccgtggcgcc ccacgccgcc acggtgtcga tcgccgcggt 32880
caacggtcct gacgccgtcg tgatcgccgg cgccgaggta caggtgctcg ccctcggcgc 32940
gacgttcgcg gcgcgtggga tacgcacgaa gaggctcgcc gtctcccatg cgttccactc 33000
gccgctcatg gatccgatgc tggaagactt ccagcgggtc gctgcgacga tcgcgtaccg 33060
cgcgccagac cgcccggtgg tgtcgaatgt caccggccac gtcgcaggcc ccgagatcgc 33120
cacgcccgag tattgggtcc ggcatgtgcg aagcgccgtg cgcttcggcg acggggcaaa 33180
ggcgttgcat gccgcgggtg ccgccacgtt cgtcgaggtt ggcccgaagc cggtcctgct 33240
cgggctgttg ccagcgtgcc tcggggaagc ggacgcggtc ctcgtgccgt cgctacgcgc 33300
ggaccgctcg gaatgcgagg tggtcctcgc ggcgctcggg gcttggtatg cctggggggg 33360
tgcgctcgac tggaagggcg tgttccccga tggcgcgcgc cgcgtggctc tgcccatgta 33420
tccatggcag cgtgagcgcc attggatgga cctcaccccg cgaagcgccg cgcctgcagg 33480
gatcgcaggt cgctggccgc tggctggtgt cgggctctgc atgcccggcg ctgtgttgca 33540
ccacgtgctc tcgatcggac cacgccatca gcccttcctc ggtgatcacc tcgtgtttgg 33600
caaggtggtg gtgcccggcg cctttcatgt cgcggtgatc ctcagcatcg ccgccgagcg 33660
ctggcccgag cgggcgatcg agctgacagg cgtggagttc ctgaaggcca tcgcgatgga 33720
gcccgaccag gaggtcgagc tccacgccgt gctcaccccc gaagccgccg gggatggcta 33780
cctgttcgag ctggcgaccc tggcggcgcc ggagaccgaa cgccgatgga cgacccacgc 33840
ccgcggtcgg gtgcagccga cagacggcgc gcccggcgcg ttgccgcgcc tcgaggtgct 33900
ggaggaccgc gcgatccagc ccctcgactt cgccggattc ctcgacaggt tatcggcggt 33960
gcggatcggc tggggtccgc tttggcgatg gctgcaggac gggcgcgtcg gcgacgaggc 34020
ctcgcttgcc accctcgtgc cgacctatcc gaacgcccac gacgtggcgc ccttgcaccc 34080
gatcctgctg gacaacggct ttgcggtgag cctgctgtca acccggagcg agccggagga 34140
cgacgggacg cccccgctgc cgttcgccgt ggaacgggtg cggtggtggc gggcgccggt 34200
tggaagggtg cggtgtggcg gcgtgccgcg gtcgcaggca ttcggtgtct cgagcttcgt 34260
gctggtcgac gaaactggcg aggtggtcgc cgaggtggag ggatttgttt gccgccgggc 34320
gccgcgagag gtgttcctgc ggcaggagtc gggcgcgtcg actgcagcct tgtaccgcct 34380
cgactggccc gaagcgccct tgcccgatgc gcctgcggaa cggatcgagg agagctgggt 34440
cgtggtggca gcacctggct cggagatggc cgcggcgctc gcaacacggc tcaaccgctg 34500
cgtcctcgcc gaacccaaag gcctcgaggc ggccctcgcg ggggtgtctc ccgcaggtgt 34560
gatctgcctc tgggaggctg gagcccacga ggaagctccg gcggcggcgc agcgtgtggc 34620
gaccgagggc ctctcggtgg tgcaggcgct cagggaccgc gcggtgcgcc tgtggtgggt 34680
gaccatgggc gcagtggccg tcgaggccgg tgagcgggtg caggtcgcca cagcgccggt 34740
atggggcctc ggccggacag tgatgcagga gcgcccggag ctcagctgca ctctggtgga 34800
tttggagccg gaggccgatg cagcgcgctc agctgacgtt ctgttgcggg agctcggtcg 34860
cgctgacgac gagacacagg tggctttccg ttccggaaag cgccgcgtag cgcggctggt 34920
caaagcgacg acccccgaag ggctcctggt ccctgacgca gagtcctatc gactggaggc 34980
tgggcagaag ggcacattgg accagctccg cctcgcgccg gcacagcgcc gggcacctgg 35040
cccgggcgag gtcgagatca aggtaaccgc ctcggggctc aacttccgga ccgtcctcgc 35100
tgtgctggga atgtatccgg gcgacgccgg gccgatgggc ggagattgtg ccggtgtcgc 35160
cacggcggtg ggccaggggg tgcgccacgt cgcggtcggc gatgctgtca tgacgctggg 35220
gacgttgcat cgattcgtca cggtcgacgc gcggctggtg gtccggcagc ctgcagggct 35280
gactcccgcg caggcagcta cggtgccggt cgcgttcctg acggcctggc tcgctctgca 35340
cgacctgggg aatctgcggc gcggcgagcg ggtgctgatc catgctgcgg ccggcggtgt 35400
gggcatggcc gcggtgcaaa tcgcccgatg gataggggcc gaggtgttcg ccacggcgag 35460
cccgtccaag tgggcagcgg ttcaggccat gggcgtgccg cgcacgcaca tcgccagctc 35520
gcggacgctg gagtttgctg agacgttccg gcaggtcacc ggcggccggg gcgtggacgt 35580
ggtgctcaac gcgctggccg gcgagttcgt ggacgcgagc ctgtccctgc tgtcgacggg 35640
cgggcggttc ctcgagatgg gcaagaccga catacgggat cgagccgcgg tcgcggcggc 35700
gcatcccggt gttcgctatc gggtattcga catcctggag ctcgctccgg atcgaactcg 35760
agagatcctc gagcgcgtgg tcgagggctt tgctgcggga catctgcgcg cattgccggt 35820
gcatgcgttc gcgatcacca aggccgaggc agcgtttcgg ttcatggcgc aagcgcggca 35880
tcagggcaag gtcgtgctgc tgccggcgcc ctccgcagcg cccttggcgc cgacgggcac 35940
cgtactgctg accggtgggc tgggagcgtt ggggctccac gtggcccgct ggctcgccca 36000
gcagggcgtg ccgcacatgg tgctcacagg tcggcggggc ctggatacgc cgggcgctgc 36060
caaagccgtc gcggagatcg aagcgctcgg cgctcgggtg acgatcgcgg cgtcggatgt 36120
cgccgatcgg aatgcgctgg aggctgtgct ccaggccatt ccggcggagt ggccgttaca 36180
gggcgtgatc catgcagccg gagcgctcga tgatggtgtg cttgatgagc agaccaccga 36240
ccgcttctcg cgggtgctgg caccgaaggt gactggcgcc tggaatctgc atgagctcac 36300
ggcgggcaac gatctcgctt tcttcgtgct gttctcctcc atgtcggggc tcttgggctc 36360
ggccgggcag tccaactatg cggcggccaa caccttcctc gacgcgctgg ccgcgcatcg 36420
gcgggccgaa ggcctggcgg cgcagagcct cgcgtggggc ccatggtcgg acggaggcat 36480
ggcagcgggg ctcagcgcgg cgctgcaggc gcggctcgct cggcatggga tgggagctct 36540
gtcgccggct cagggcaccg cgctgctcgg gcaggcgctg gctcggccgg aaacgcagct 36600
cggggcgatg tcgctcgacg tgcgtgcggc aagccaagct tcgggagcgg cagtgccgcc 36660
tgtgtggcgc gcgttggtgc gcgcggaggc gcgccatacg gcggctgggg cgcagggggc 36720
attggccgcg cgtcttgggg cgctgcccga ggcgcgtcgc gccgacgagg tgcgcaaggt 36780
cgtgcaggcc gagatcgcgc gcgtgctttc atggagcgcc gcgagcgccg tgcccgtcga 36840
tcggccgctg tcggacttgg gcctcgactc gctcacggcg gtggagctgc gcaacgtgct 36900
cggccagcgg gtgggtgcga cgctgccggc gacgctggca ttcgatcacc cgacggtcga 36960
cgcgctcacg cgctggctgc tcgataaggt cctggccgtg gccgagccga gcgtatcgtc 37020
cgcaaagtcg tcgccgcagg tcgccctcga cgagcccatt gccatcatcg gcatcggctg 37080
ccgtttccca ggcggcgtgg ccgatccgga gtcgttttgg cggctgctcg aagagggcag 37140
cgatgccgtc gtcgaggtgc cgcatgagcg atgggacatc gacgcgttct atgatccgga 37200
tccggatgtg cgcggcaaga tgacgacacg ctttggcggc ttcctgtccg atatcgaccg 37260
gttcgatccg gccttcttcg gcatctcgcc gcgcgaagcg acgaccatgg atccgcagca 37320
gcggctgctc ctggagacga gctgggaggc gttcgagcgc gccgggattt tgcccgagcg 37380
gctgatgggc agcgataccg gcgtgttcgt ggggctcttc taccaggagt acgctgcgct 37440
cgccggcggc atcgaggcgt tcgatggcta tctaggcacc ggcaccacgg ccagcgtcgc 37500
ctcgggcagg atctcttatg tgctcgggct aaaggggccg agcctgacgg tggacaccgc 37560
gtgctcctcg tcgctggtcg cggtgcacct ggcctgccag gcgctgcggc ggggcgagtg 37620
ttcggtggcg ctggccggcg gcgtggcgct gatgctcacg ccggcgacgt tcgtggagtt 37680
cagccggctg cgaggcctgg ctcccgacgg acggtgcaag agcttctcgg ccgcagccga 37740
cggcgtgggg tggagcgaag gctgcgccat gctcctgctc aaaccgcttc gcgatgcgca 37800
gcgcgatggg gatccgatcc tggcggtgat ccgcggcacc gcggtgaacc aggatgggcg 37860
cagcaacggg ctgacggcgc ccaacgggtc gtcgcagcaa gaggtgatcc gtcgggccct 37920
ggagcaggcg gggctggctc cggcggacgt cagctacgtc gagtgccacg gcaccggcac 37980
gacgttgggc gaccccatcg aagtgcaggc cctgggcgcc gtgctggcac aggggcgacc 38040
ctcggaccgg ccgctcgtga tcgggtcggt gaagtccaat atcggacata cgcaggctgc 38100
ggcgggcgtg gccggtgtca tcaaggtggc gctggcgctc gagcgcgggc ttatcccgag 38160
gagcctgcat ttcgacgcgc ccaatccgca cattccgtgg tcggagctcg ccgtgcaggt 38220
ggccgccaaa cccgtcgaat ggacgagaaa cggcgtgccg cgacgagccg gggtgagctc 38280
gtttggcgtc agcgggacca acgcgcacgt ggtgctggag gaggcgccag cggcggcgtt 38340
cgcgcccgcg gcggcgcgtt cagcggagct tttcgtgctg tcggcgaaga gcgccgcggc 38400
gctggacgcg caggcggcgc ggctttcggc gcacgtcgtt gcgcacccgg agctcggcct 38460
cggcgacctg gcgttcagcc tggcgacgac ccgcagcccg atgacgtacc ggctcgcggt 38520
ggcggcgacc tcgcgcgagg cgctgtctgc cgcgctcgac acagcggcgc aggggcaggc 38580
gccgcccgca gcggctcgcg gccacgcttc cacaggcagc gccccaaagg tggttttcgt 38640
ctttcctggc cagggctccc agtggctggg catgggccaa aagctcctct cggaggagcc 38700
cgtcttccgc gacgcgctct cggcgtgtga ccgagcgatt caggccgaag ccggctggtc 38760
gctgctcgcc gagctcgcgg ccgatgagac cacctcgcag ctcggccgca tcgacgtggt 38820
gcagccggcg ctgttcgcga tcgaggtcgc gctgtcggcg ctgtggcggt cgtggggcgt 38880
cgagccggat gcagtggtag gccacagcat gggcgaagtg gcggccgcgc acgtcgccgg 38940
cgccctgtcg ctcgaggatg ctgtagcgat catctgccgg cgcagcctgc tgctgcggcg 39000
gatcagcggc caaggcgaga tggcggtcgt cgagctttcc ctggccgagg ccgaggcagc 39060
gctcctgggc tacgaagacc ggctcagcgt ggcggtgagc aacagcccgc gctcgacggt 39120
gctggcgggc gagccggcag cgctcgcaga ggtgctggcg atccttgcgg caaagggggt 39180
gttctgccgt cgagtcaagg tggacgtcgc cagccacagc ccacagatcg acccgctgcg 39240
cgacgagcta ttggcagcat tgggcgagct cgagccgcga caagcgaccg tgtcgatgcg 39300
ctcgacggtg acgagcacga tcatggcggg cccggagctc gtggcgagct actgggcgga 39360
caacgttcga cagccggtgc gcttcgccga agcggtgcaa tcgttgatgg aagacggtca 39420
tgggctgttc gtggagatga gcccgcatcc gatcctgacg acatcggtcg aggagatccg 39480
acgggcgacg aagcgggagg gagtcgcggt gggctcgttg cggcgtggac aggacgagcg 39540
cctgtccatg ttggaggcgc tgggagcgct ctgggtacac ggccaggcgg tgggctggga 39600
gcggctgttc tccgcgggcg gcgcgggcct ccgtcgcgtg ccgctgccga cctatccctg 39660
gcagcgcgag cggtactggg tcgatgcgcc gaccggcggc gcggcgggcg gcagccgctt 39720
tgctcatgcg ggcagtcacc cgctcctggg tgaaatgcag accctgtcga cccagaggag 39780
cacgcgcgtg tgggagacga cgctggatct caaacggctg ccgtggctcg gcgatcaccg 39840
ggtgcagggg gcggtcgtgt tcccgggcgc ggcgtacctg gagatggcgc tttcgtccgg 39900
ggccgaggcc ttgggtgacg gtccgctcca ggtcagcgat gtggtgctcg ccgaggcgct 39960
ggccttcgcg gatgatacgc cggcggcggt gcaggtcatg gcgaccgagg agcgaccagg 40020
ccgcctgcaa ttccacgttg cgagccgggt gccgggccac ggcggtgctg cctttcgaag 40080
ccatgcccgc ggggtgctgc gccagatcga gcgcgccgag gtcccggcga ggctggatct 40140
ggccgcgctt cgtgcccggc ttcaggccag cgcacccgct gcggctacct atgcggcgct 40200
ggccgagatg gggctcgagt acggcccagc gttccagggg cttgtcgagc tgtggcgggg 40260
ggagggcgag gcgctgggac gtgtgcggct ccccgaggcc gccggctccc cagccgcgtg 40320
ccggctccac cccgcgctct tggatgcgtg cttccacgtg agcagcgcct tcgctgaccg 40380
cggcgaggcg acgccatggg tacccgtgga aatcggctcg ctgcggtggt tccagcggcc 40440
gtcgggggag ctgtggtgtc atgcgcggag tgtgagccac ggaaagccaa cacccgaccg 40500
gcggagtacc gacttctggg tggtcgacag cacgggcgcg atcgtcgccg agatctccgg 40560
gctcgtggcg cagcggctcg cgggaggtgt acgccggcgc gaagaagacg actggttcat 40620
ggagccggct tgggaaccga ccgcggtccc cggatccgag gtcatggcgg gccggtggct 40680
gctcatcggc tcgggcggcg ggctcggcgc tgcgctccac tcggcgctga cggaagctgg 40740
ccattccgtc gtccacgcga cagggcgcgg cacgagcgcc gccgggttgc aggcactctt 40800
gacggcgtcc ttcgacggcc aggccccgac gtcggtggtg cacctcggca gcctcgatga 40860
gcgtggcgtg ctcgacgcgg atgccccctt cgacgccgat gcgcttgagg agtcgctggt 40920
gcgcggctgc gacagcgtgc tctggaccgt gcaggccgtg gccggggcgg gcttccgaga 40980
tcctccgcgg ttgtggctcg tgacacgcgg cgctcaggcc atcggcgccg gcgacgtctc 41040
tgtggcgcaa gcgccgctcc tggggctggg ccgcgttatc gccttggagc acgccgagct 41100
gcgctgcgct cggatcgacc tcgatccagc gcggcgcgac ggagaagtcg atgagctgct 41160
tgccgagctg ttggccgacg acgccgagga ggaagtcgcg tttcgcggcg gtgagcggcg 41220
cgtggcccgg ctcgtccgaa ggctgcccga gaccgactgc cgagagaaaa tcgagcccgc 41280
ggaaggccgg ccgttccggc tggagatcga tgggtccggc gtgctcgacg acctggtgct 41340
ccgagccacg gagcggcgcc ctcctggccc gggcgaggtc gagatcgccg tcgaggcggc 41400
ggggctcaac tttctcgacg tgatgagggc catggggatc taccctgggc ccggggacgg 41460
tccggttgcg ctgggcgccg agtgctccgg ccgaattgtc gcgatgggcg aaggtgtcga 41520
gagccttcgt atcggccagg acgtcgtggc cgtcgcgccc ttcagtttcg gcacccacgt 41580
caccatcgac gcccggatgc tcgcacctcg ccccgcggcg ctgacggccg cgcaggcagc 41640
cgcgctgccc gtcgcattca tgacggcctg gtacggtctc gtccatctgg ggaggctccg 41700
ggccggcgag cgcgtgctca tccactcggc gacggggggc accgggctcg ctgctgtgca 41760
gatcgcccgc cacctcggcg cggagatatt tgcgaccgct ggtacaccgg agaagcgggc 41820
gtggctgcgc gagcagggga tcgcgcacgt gatggactcg cggtcgctgg acttcgccga 41880
gcaagtgctg gccgcgacga agggcgaggg ggtcgacgtc gtgttgaact cgctgtctgg 41940
cgccgcgatc gacgcgagcc tttcgaccct cgtgccggac ggccgcttca tcgagctcgg 42000
caagacggac atctatgcag atcgctcgct ggggctcgct cacttcagga agagcccgtc 42060
ctacagcgcc gtcgatcttg cgggcttggc cgtgcgtcgg cccgagcgcg tcgcagcgct 42120
gctggcggag gtggtggacc tgctcgcacg gggagcgctg cagccgcttc cggtagagat 42180
cttccccctc tcgcgggccg cggacgcgtt ccggaaaatg gcgcaagcgc agcatctcgg 42240
gaagctcgtg ctcgcgctgg aggacccgga cgtgcggatc cgcgttccgg gcgaatccgg 42300
cgtcgccatc cgcgcggacg gcgcctacct cgtgaccggc ggtctggggg ggctcggtct 42360
gagcgtggct ggatggctgg ccgagcaggg ggctgggcat ctggtgctgg tgggccgctc 42420
cggcgcggtg agcgcggagc agcagacggc tgtcgccgcg ctcgaggcgc acggcgcgcg 42480
tgtcacggta gcgagggcag acgtcgccga tcgggcgcag atggagcgga tcctccgcga 42540
ggttaccgcg tcggggatgc cgctccgcgg cgtcgttcat gcggccggaa tcctggacga 42600
cgggctgctg atgcagcaaa cccccgcgcg gttccgcgcg gtcatggcgc ccaaggtccg 42660
aggggccttg cacctgcatg cgttgacacg cgaagcgccg ctctccttct tcgtgctgta 42720
cgcttcggga gcagggctct tgggctcgcc gggccagggc aactacgccg cggccaacac 42780
gttcctcgac gcactggcac accaccggag ggcgcagggg ctgccagcat tgagcatcga 42840
ctggggcctg ttcgcggacg tgggtttggc cgccgggcag caaaatcgcg gcgcacggct 42900
ggtcacccgc gggacgcgga gcctcacccc cgacgaaggg ctgtgggcgc tcgagcgcct 42960
gctcgacggc gatcgcaccc aggccggggt catgccgttc gacgtgcggc agtgggtgga 43020
gttctacccg gcggcggcat cttcgcggag gttgtcgcgg ctcatgacgg cacggcgcgt 43080
ggcttccggt cggctcgccg gggatcggga cctgctcgaa cggctcgcca ccgccgaggc 43140
gggcgcgcgg gcagggatgc tgcaggaggt cgtgcgcgcg caggtctcgc aggtgctgcg 43200
cctctccgaa ggcaagctcg acgtggatgc gccgctcacg agcctgggaa tggactcgct 43260
gatggggcta gagctgcgca accgcatcga ggccgtgctc ggcatcacca tgccggcgac 43320
cctgctgtgg acctacccca cggtggcagc gctgagtgcg catctggctt ctcatgtcgt 43380
ctctacgggg gatggggaat ccgcgcgccc gccggataca gggagcgtgg ctccaacgac 43440
ccacgaagtc gcttcgctcg acgaagacgg gttgttcgcg ttgattgatg agtcactcgc 43500
gcgcgcggga aagaggtgat tgcgtgacag accgagaagg ccagctcctg gagcgcttgc 43560
gtgaggttac tctggccctt cgcaagacgc tgaacgagcg cgataccctg gagctcgaga 43620
agaccgagcc gatcgccatc gtggggatcg gctgccgctt ccccggcgga gcgggcactc 43680
cggaggcgtt ctgggagctg ctcgacgacg ggcgcgacgc gatccggccg ctcgaggagc 43740
gctgggcgct cgtaggtgtc gacccaggcg acgacgtacc gcgctgggcg gggctgctca 43800
ccgaggccat cgacggcttc gacgccgcgt tcttcggtat cgccccccgg gaggcacggt 43860
cgctcgaccc gcagcatcgc ctgctgctgg aggtcgcctg ggaggggttc gaagacgccg 43920
gcatcccgcc caggtccctc gtcgggagcc gcaccggcgt gttcgtcggc gtctgcgcca 43980
cggagtacct ccacgccgcc gtcgcgcacc agccgcgcga agagcgggac gcgtacagca 44040
ccaccggcaa catgctcagc atcgccgccg gacggctatc gtacacgctg gggctgcagg 44100
gaccttgcct gaccgtcgat acggcgtgct cgtcatcgct ggtggccatt cacctcgcct 44160
gccgcagcct gcgcgctcga gagagcgatc tcgcgctggc gggaggggtc aacatgcttc 44220
tctcccccga cacgatgcga gctctggcgc gcacccaggc gctgtcgccc aatggccgtt 44280
gccagacctt cgacgcgtcg gccaacgggt tcgtccgtgg ggagggctgc ggtctgatcg 44340
tgctcaagcg attgagcgac gcgcggcggg atggggaccg gatctgggcg ctgatccgag 44400
gatcggccat caatcaggac ggccggtcga cggggttgac ggcgcccaac gtgctcgccc 44460
agggggcgct cttgcgcgag gcgctgcgga acgccggcgt cgaggccgag gccatcggtt 44520
acatcgagac ccacggggcg gcaacctcgc tgggcgaccc catcgagatc gaagcgctgc 44580
gcgctgtggt ggggccggcg cgagccgacg gagcgcgctg cgtgctgggc gcggtgaaga 44640
ccaacctcgg ccacctggag ggcgctgccg gcgtggcggg cctgatcaag gcgacgcttt 44700
cgctacatca cgagcgcatc ccgaggaacc tcaactttcg tacgctcaat ccgcggatcc 44760
ggatcgaggg gaccgcgctc gcgttggcga ccgaaccggt gccctggccg cggacgggcc 44820
ggacgcgctt cgcgggagtg agctcgttcg ggatgagcgg gaccaacgcg catgtggtgt 44880
tggaggaggc gccggcggtg gagcctgagg ccgcggcccc cgagcgcgca gcggagctgt 44940
tcgtcctgtc ggcgaagagc gcggcggcgc tggatgcgca ggcagcccgg ctgcgggacc 45000
acctggagaa gcacgtcgag cttggcctcg gcgatgtggc gttcagcctg gcgacgacgc 45060
gcagcgcgat ggagcaccgg ctggcggtgg ccgcgagctc gcgcgaggcg ctgcgagggg 45120
cgctttcggc cgcagcgcag gggcacacgc cgccgggagc cgtgcgtggg cgggcctcgg 45180
gcggcagcgc gccgaaggtg gtcttcgtgt ttcccggtca gggctcgcag tgggtgggca 45240
tgggccgaaa gctcatggcc gaagagccgg tcttccgggc ggcgctggag ggttgcgacc 45300
gggccatcga ggcggaagcg ggctggtcgc tgctcgggga gctctccgcc gacgaggccg 45360
cctcgcagct cgggcgcatc gacgtggttc agccggtgct cttcgccatg gaagtagcgc 45420
tttctgcgct gtggcggtcg tggggagtgg agccggaagc ggtggtgggc cacagcatgg 45480
gcgaggttgc ggcggcgcac gtggccggcg cgctgtcgct cgaggacgcg gtggcgatca 45540
tctgccggcg cagccggctg ctgcggcgga tcagcggtca gggggagatg gcgctggtcg 45600
agctgtcgct ggaggaggcc gaggcggcgc tgcgtggcca tgagggtcgg ctgagcgtgg 45660
cggtgagcaa cagcccgcgc tcgaccgtgc tcgccggcga gccggcggcg ctctcggagg 45720
tgctggcggc gctgacggcc aagggggtgt tctggcggca ggtgaaggtg gacgtcgcca 45780
gccatagccc gcaggtcgac ccgctgcgcg aagagctgat cgcggcgctg ggagcgatcc 45840
ggccgcgagc ggctgcggtg ccgatgcgct cgacggtgac gggcggggtg atcgcgggtc 45900
cggagctcgg tgcgagctac tgggcggaca accttcggca gccggtgcgc ttcgctgcgg 45960
cggcgcaagc gctgctggag ggtggccccg cgctgttcat cgagatgagc ccgcacccga 46020
tcctggtgcc gcccctggac gagatccaga cggcggccga gcaagggggc gctgcggtgg 46080
gctcgctgcg gcgagggcag gacgagcgcg cgacgctgct ggaggcgctg gggacgctgt 46140
gggcgtccgg ctatccggtg agctgggctc ggctgttccc cgcgggcggc aggcgggttc 46200
cgctgccgac ctatccctgg cagcacgagc ggtgctggat cgaggtcgag cctgacgccc 46260
gccgcctcgc cgcagccgac cccaccaagg actggttcta ccgaacggac tggcccgagg 46320
tgccccgcgc cgccccgaaa tcggagacag ctcatgggag ctggctgctg ttggccgaca 46380
ggggtggggt cggtgaggcg gtcgctgcag cgctgtcgac gcgcggactt tcctgcaccg 46440
tgcttcatgc gtcggctgac gcctccaccg tcgccgagca ggtatccgaa gctgccagtc 46500
gccgaaacga ctggcaggga gtcctctacc tgtggggcct cgacgccgtc gtcgatgctg 46560
gggcatcggc cgacgaagtc agcgaggcta cccgccgtgc caccgcaccc gtccttgggc 46620
tggttcgatt cctgagcgct gcgccccatc ctcctcgctt ctgggtggtg acccgcgggg 46680
catgcacggt gggcggcgag ccagaggcct ctctttgcca agcggcgttg tggggcctcg 46740
cgcgcgtcgc ggcgctggag caccccgctg cctggggtgg cctcgtggac ctggatcctc 46800
agaagagccc gacggagatc gagcccctgg tggccgagct gctttcgccg gacgccgagg 46860
atcaactggc gttccgcagc ggtcgcaggc acgcagcacg ccttgtagcc gccccgccgg 46920
agggcgacgt cgcaccgata tcgctgtccg cggaggggag ctacctggtg acgggcgggc 46980
tgggtggcct tggtctgctc gtggctcggt ggctggtgga gcggggagct cgacatctgg 47040
tgctcaccag ccggcacggg ctgccagagc gacaggcgtc gggcggagag cagccgccgg 47100
aggcccgcgc gcgcatcgca gcggtcgagg ggctggaagc gcagggcgcg cgggtgaccg 47160
tggcagcggt ggatgtcgcc gaggccgatc ccatgacggc gctgctggcc gccatcgagc 47220
ccccgttgcg cggggtggtg cacgccgccg gcgtcttccc cgtgcgtcac ctggcggaga 47280
cggacgaggc cctgctggag tcggtgctcc gtcccaaggt ggccgggagc tggctgctgc 47340
accggctgct gcgcgaccgg cctctcgacc tgttcgtgct gttctcgtcg ggcgcggcgg 47400
tgtggggtgg caaaggccaa ggcgcatacg ccgcggccaa tgcgttcctc gacgggctcg 47460
cgcaccatcg ccgcgcgcac tcgctgccgg cgttgagcct cgcctggggc ttatgggccg 47520
agggaggcat ggttgatgca aaggctcatg cacgtctgag cgacatcggg gtcctgccca 47580
tggccacggg gccggccttg tcggcgctgg agcgcctggt gaacaccagc gctgtccagc 47640
gttcggtcac acggatggac tgggcgcgct tcgcgccggt ctatgccgcg cgagggcggc 47700
gcaacttgct ttcggctctg gtcgcggagg acgagcgcgc tgcgtctccc ccggtgccga 47760
cggcaaaccg gatctggcgc ggcctgtccg ttgcggagag ccgctcagcc ctctacgagc 47820
tcgttcgcgg catcgtcgcc cgggtgctgg gcttctccga cccgggcgcg ctcgacgtcg 47880
gccgaggctt cgccgagcag gggctcgact ccctgatggc tctggagatc cgtaaccgcc 47940
ttcagcgcga gctgggcgaa cggctgtcgg cgactctggc cttcgaccac ccgacggtgg 48000
agcggctggt ggcgcatctc ctcaccgacg tgctgaagct ggaggaccgg agcgacaccc 48060
ggcacatccg gtcggtggcg gcggatgacg acatcgccat cgtcggtgcc gcctgccggt 48120
tcccaggtgg ggatgagggc ctggagacat actggcggca tctggccgag ggcatggtgg 48180
tcagcaccga ggtgccagcc gaccggtggc gcgcggcgga ctggtacgac cccgatccgg 48240
aggttccggg ccggacctat gtggccaagg gtgccttcct ccgcgatgtg cgcagcttgg 48300
atgcggcgtt cttcgccatt tcccctcgtg aggcgatgag cctggacccg caacagcggc 48360
tgttgctgga ggtgagctgg gaggcgatcg agcgcgctgg ccaggacccg atggcgctgc 48420
gcgagagcgc cacgggcgtg ttcgtgggca tgatcgggag cgagcacgcc gagcgggtgc 48480
agggcctcga cgacgacgcg gcgttgctgt acggcaccac cggcaacctg ctcagcgtcg 48540
ccgctggacg gctgtcgttc ttcctgggtc tgcacggccc gacgatgacg gtggacaccg 48600
cctgctcgtc gtcgctggtg gcgttgcacc tcgcctgcca gagcctgcga ttgggcgagt 48660
gcgaccaggc cctggccggc gggtccagcg tgcttttgtc gccgcggtca ttcgtcgcgg 48720
cgtcgcgcat gcgtttgctt tcgccagatg ggcggtgcaa gacgttctcg gccgctgcag 48780
acggctttgc gcgggccgag ggctgcgccg tggtggtgct caagcggctc cgtgacgcgc 48840
agcgcgaccg cgaccccatc ctggcggtgg tcaggagcac ggcgatcaac cacgatggcc 48900
cgagcagcgg gctcacggtg cccagcggtc ctgcccagca ggcgttgcta cgccaggcgc 48960
tggcgcaagc gggcgtggcg ccggccgagg tcgatttcgt ggagtgccac gggacgggga 49020
cagcgctggg tgacccgatc gaggtgcagg cgctgggcgc ggtgtacggg cggggccgcc 49080
ccgcggagcg gccgctctgg ctgggcgctg tcaaggccaa cctcggccac ctggaggccg 49140
cggcgggctt ggccggcgtg ctcaaggtgc tcttggcgct ggagcacgag cagattccgg 49200
ctcaaccgga gctcgacgag ctcaacccgc acatcccgtg ggcagagctg ccagtggccg 49260
ttgtccgcag ggcggtcccc tggccgcgcg gcgcgcgccc gcgtcgtgca ggcgtgagcg 49320
ctttcggcct gagcgggacc aacgcgcatg tggtgttgga ggaggcgccg gcggtggagc 49380
ctgtggccgc ggcccccgag cgcgcagcgg agctgttcgt cctgtcggcg aagagcgcgg 49440
cggcgctgga tgcgcaggca gcccggctgc gggaccacct ggagaagcat gtcgagcttg 49500
gcctcggcga tgtggcgttc agcctggcga cgacgcgcag cgcgatggag caccggctgg 49560
cggtggccgc gagctcgcgc gaggcgctgc gaggggcgct ttcggccgca gcgcaggggc 49620
acacgccgcc gggagccgtg cgtgggcggg cctcgggcgg cagcgcgccg aaggtggtct 49680
tcgtgtttcc cggccagggc tcgcagtggg tgggcatggg ccgaaagctc atggccgaag 49740
agccggtctt ccgggcggcg ctggagggtt gcgaccgggc catcgaggcg gaagcgggct 49800
ggtcgctgct cggggagctc tccgccgacg aggccgcctc gcagctcggg cgcatcgacg 49860
tggttcagcc ggtgctgttc gccatggaag tagcgctttc tgcgctgtgg cggtcgtggg 49920
gagtggagcc ggaagcggtg gtgggccaca gcatgggcga ggttgcggcg gcgcacgtgg 49980
ccggcgcgct gtcgctcgag gacgcggtgg cgatcatctg ccggcgcagc cggctgctgc 50040
ggcggatcag cggtcagggg gagatggcgc tggtcgagct gtcgctggag gaggccgagg 50100
cggcgctgcg tggccatgag ggtcggctga gcgtggcggt gagcaacagc ccgcgctcga 50160
ccgtgctcgc cggcgagccg gcggcgctct cggaggtgct ggcggcgctg acggccaagg 50220
gggtgttctg gcggcaggtg aaggtggacg tcgccagcca tagcccgcag gtcgacccgc 50280
tgcgcgaaga gctgatcgcg gcgctgggag cgatccggcc gcgagcggct gcggtgccga 50340
tgcgctcgac ggtgacgggc ggggtgatcg cgggtccgga gctcggtgcg agctactggg 50400
cggacaacct tcggcagccg gtgcgcttcg ctgcggcggc gcaagcgctg ctggagggtg 50460
gccccgcgct gttcatcgag atgagcccgc acccgatcct ggtgccgccc ctggacgaga 50520
tccagacggc ggccgagcaa gggggcgctg cggtgggctc gctgcggcga gggcaggacg 50580
agcgcgcgac gctgctggag gcgctgggga cgctgtgggc gtccggctat ccggtgagct 50640
gggctcggct gttccccgcg ggcggcaggc gggttccgct gccgacctat ccctggcagc 50700
acgagcggta ctggatcgag gacagcgtgc atgggtcgaa gccctcgctg cggcttcggc 50760
agcttcgcaa cggcgccacg gaccatccgc tgctcggggc tccattgctc gtctcggcgc 50820
gacccggagc tcacttgtgg gagcaagcgc tgagcgacga gaggctatcc tacctttcgg 50880
aacatagggt ccatggcgaa gccgtgttgc ccagcgcggc gtatgtagag atggcgctcg 50940
ccgccggcgt agatctctat ggcacggcga cgctggtgct ggagcagctg gcgctcgagc 51000
gagccctcgc cgtgccctcc gaaggcggac gcatcgtgca agtggccctc agcgaagaag 51060
gtcccggtcg ggcctcattc caggtatcga gtcgtgagga ggcaggtagg agctgggtgc 51120
ggcacgccac ggggcacgtg tgtagcggcc agagctcagc ggtgggagcg ttgaaggaag 51180
ctccgtggga gattcaacgg cgatgtccga gcgtcctgtc gtcggaggcg ctctatccgc 51240
tgctcaacga gcacgccctc gactatggtc cctgcttcca gggcgtggag caggtgtggc 51300
tcggcacggg ggaggtgctc ggccgggtac gcttgccagg agacatggca tcctcaagtg 51360
gcgcctaccg gattcatccc gccttgttgg atgcatgttt tcaggtgctg acagcgctgc 51420
tcaccacgcc ggaatccatc gagattcgga ggcggctgac ggatctccac gaaccggatc 51480
tcccgcggtc cagggctccg gtgaatcaag cggtgagtga cacctggctg tgggacgccg 51540
cgctggacgg tggacggcgc cagagcgcga gcgtgcccgt cgacctggtg ctcggcagct 51600
tccatgcgaa gtgggaggtc atggagcgcc tcgcgcaggc gtacatcatc ggcactctcc 51660
gcatatggaa cgtcttctgc gctgctggag agcgtcacac gatagacgag ttgctcgtca 51720
ggcttcaaat ctctgtcgtc tacaggaagg tcatcaagcg atggatggaa caccttgtcg 51780
cgatcggcat ccttgtaggg gacggagagc attttgtgag ctctcagccg ctgccggagc 51840
ctgatttggc ggcggtgctc gaggaggccg ggagggtgtt cgccgacctc ccagtcctat 51900
ttgagtggtg caagtttgcc ggggaacggc tcgcggacgt attgaccggt aagacgctcg 51960
cgctcgagat cctcttccct ggtggctcgt tcgatatggc ggagcgaatc tatcgagatt 52020
cgcccatcgc ccgttactcg aacggcatcg tgcgcggtgt cgtcgagtcg gcggcgcggg 52080
tggtagcacc gtcgggaatg ttcagcatct tggagatcgg agcagggacg ggcgcgacca 52140
ccgccgccgt cctcccggtg ttgctgcctg accggacgga gtaccatttc accgatgttt 52200
ctccgctctt ccttgctcgc gcggagcaaa gatttcgaga ttatccattc ctgaagtatg 52260
gcattctgga tgtcgaccag gagccagctg gccagggata cgcacatcag aggtttgacg 52320
tcatcgtcgc ggccaatgtc atccatgcga cccgcgatat aagagccacg gcgaagcgtc 52380
tcctgtcgtt gctcgcgccc ggaggccttc tggtgctggt cgagggcaca gggcatccga 52440
tctggttcga tatcaccacg ggattgattg aggggtggca gaagtacgaa gatgatcttc 52500
gtatcgacca tccgctcctg cctgctcgga cctggtgtga cgtcctgcgc cgggtaggct 52560
ttgcggacgc cgtgagtctg ccaggcgacg gatctccggc ggggatcctc ggacagcacg 52620
tgatcctctc gcgcgcgccg ggcatagcag gagccgcttg tgacagctcc ggtgagtcgg 52680
cgaccgaatc gccggccgcg cgtgcagtac ggcaggaatg ggccgatggc tccgctgacg 52740
tcgtccatcg gatggcgttg gagaggatgt acttccaccg ccggccgggc cggcaggttt 52800
gggtccacgg tcgattgcgt accggtggag gcgcgttcac gaaggcgctc gctggagatc 52860
tgctcctgtt cgaagacacc gggcaggtcg tggcagaggt tcaggggctc cgcctgccgc 52920
agctcgaggc ttctgctttc gcgccgcggg acccgcggga agagtggttg tacgctttgg 52980
aatggcagcg caaagaccct ataccagagg ctccggcagc cgcgtcttct tcctccgcgg 53040
gggcttggct cgtgctgatg gaccagggcg ggacaggcgc tgcgctcgta tcgctgctgg 53100
aagggcgagg cgaggcgtgc gtgcgcgtca tcgcgggtac ggcatacgcc tgcctcgcgc 53160
cggggctgta tcaagtcgat ccggcgcagc cagatggctt tcataccctg ctccgcgatg 53220
cattcggcga ggaccggatt tgtcgcgcgg tagtgcatat gtggagcctt gatgcgacgg 53280
cagcagggga gagggcgaca gcggagtcgc ttcaggccga tcaactcctg gggagcctga 53340
gcgcgctttc tctggtgcag gcgctggtgc gccggaggtg gcgcaacatg ccgcggcttt 53400
ggctcttgac ccgcgccgtg catgcggtgg gcgcggagga cgcagcggcc tcggtggcgc 53460
aggcgccggt gtggggcctc ggtcggacgc tcgcgctcga gcatccagag ctgcggtgca 53520
cgctcgtgga cgtgaacccg gcgccgtctc cagaggacgc agccgcactg gcggtggagc 53580
tcggggcgag cgacagagag gaccaggtcg cattgcgctc ggatggccgc tacgtggcgc 53640
gcctcgtgcg gagctccttt tccggcaagc ctgctacgga ttgcggcatc cgggcggacg 53700
gcagctatgt gatcaccgat ggcatgggga gagtggggct ctcggtcgcg caatggatgg 53760
tgatgcaggg ggcccgccat gtggtgctcg tggatcgcgg cggcgcttcc gaggcatccc 53820
gggatgccct ccggtccatg gccgaggctg gcgcggaggt gcagatcgtg gaggccgacg 53880
tggctcggcg cgacgatgtc gctcggctcc tctcgaagat cgaaccgtcg atgccgccgc 53940
ttcgggggat cgtgtacgtg gacgggacct tccagggcga ctcctcgatg ctggagctgg 54000
atgcccgtcg cttcaaggag tggatgtatc ccaaggtgct cggagcgtgg aacctgcacg 54060
cgctgaccag ggatagatcg ctggacttct tcgtcctgta ttcctcgggc acctcgcttc 54120
tgggcttgcc aggacagggg agccgcgccg ccggtgacgc cttcttggac gccatcgcgc 54180
atcaccggtg caaggtgggc cttacagcga tgagcatcaa ctggggattg ctctccgaag 54240
catcatcgcc ggcgaccccg aacgacggcg gagcacggct cgaataccgg gggatggaag 54300
gcctcacgct ggagcaggga gcggcggcgc tcgggcgctt gctcgcacga cccagggcgc 54360
aggtaggggt gatgcggctg aatctgcgcc agtggttgga gttctatccc aacgcggccc 54420
gattggcgct gtgggcggag ctgctgaagg agcgtgaccg cgccgaccga ggcgcgtcga 54480
acgcgtcgaa cctgcgcgag gcgctgcaga gcgccaggcc cgaagatcgt cagttgattc 54540
tggagaagca cttgagcgag ctgttggggc gggggctgcg ccttccgccg gagaggatcg 54600
agcggcacgt gccgttcagc aatctcggca tggactcgct gataggcctg gagctccgca 54660
accgcatcga ggccgcgctc ggcatcaccg tgccggcgac cctgctatgg acctacccta 54720
acgtagcagc tctgagcggg agcttgctag acattctgtt tccgaatgcc ggcgcgaccc 54780
acgctccggc caccgagcgg gagaagagct tcgagaacga tgccgcagat ctcgaggctc 54840
tgcggggcat gacggacgag cagaaggacg cgttgctcgc cgaaaagctg gcgcagctcg 54900
cgcagatcgt tggtgagtaa gggaccgagg gagtatggcg accacgaatg ccgggaagct 54960
tgagcatgcc cttctgctca tggacaagct tgcgaaaaag aacgcgtctt tggagcaaga 55020
gcggaccgag ccgatcgcca tcgtaggcat tggctgccgc ttccccggcg gagcggacac 55080
tccggaggca ttctgggagc tgctcgactc aggccgagac gcggtccagc cgctcgaccg 55140
gcgctgggcg ctggtcggcg tccatcccag cgaggaggtg ccgcgctggg ccggactgct 55200
caccgaggcg gtggacggct tcgacgccgc gttctttggc acctcgcctc gggaggcgcg 55260
gtcgctcgat cctcagcaac gcctgctgct ggaggtcacc tgggaagggc tcgaggacgc 55320
cggcatcgca ccccagtccc tcgacggcag ccgcaccggg gtgttcctgg gcgcatgcag 55380
cagcgactac tcgcataccg ttgcgcaaca gcggcgcgag gagcaggacg catacgacat 55440
caccggcaat acgctcagcg tcgccgccgg acggttgtct tatacgctag ggctgcaggg 55500
accctgcctg accgtcgaca cggcctgctc gtcgtcgctc gtggccatcc accttgcctg 55560
ccgcagcctg cgcgctcgcg agagcgatct cgcgctggcg ggaggcgtca acatgctcct 55620
ttcgtccaag acgatgataa tgctggggcg catccaggcg ctgtcgcccg atggccactg 55680
ccggacattc gacgcctcgg ccaacgggtt cgtccgtggg gagggctgcg gtatggtcgt 55740
gctcaaacgg ctctccgacg cccagcgaca cggcgatcgg atctgggctc tgatccgggg 55800
ttcggccatg aatcaggatg gccggtcgac agggttgatg gcacccaatg tgctcgctca 55860
ggaggcgctc ttgcgcgagg cgctgcagag cgctcgcgtc gacgccgggg ccatcggtta 55920
tgtcgagacc cacggaacgg ggacctcgct cggcgacccg atcgaggtcg aggcgctgcg 55980
tgccgtgttg gggccggcgc gggccgatgg gagccgctgc gtgctgggcg cagtgaagac 56040
aaacctcggc cacctggagg gcgctgcagg cgtggcgggt ttgatcaagg cggcgctggc 56100
tctgcaccac gaactgatcc cgcgaaacct ccatttccac acgctcaatc cgcggatccg 56160
gatcgagggg accgcgctcg cgctggcgac ggagccggtg ccgtggccgc gggcgggccg 56220
accgcgcttc gcgggggtga gcgcgttcgg cctcagcggc accaacgtcc atgtcgtgct 56280
ggaggaggcg ccggccacgg tgctcgcacc ggcgacgccg gggcgctcag cggagctttt 56340
ggtgctgtcg gcgaagagcg ccgccgcgct ggacgcacag gcggcgcggc tctcagcgca 56400
catcgccgcg tacccggagc agggtctcgg agacgtcgcg ttcagcctgg tatcgacgcg 56460
tagcccgatg gagcaccggc tcgcggtggc ggcgacctcg cgcgaggcgc tgcgaagcgc 56520
gctggaggtt gcggcgcagg ggcagacccc ggcaggcgcg gcgcgcggca gggccgcttc 56580
ctcgcccggc aagctcgcct tcctgttcgc cgggcagggc gcgcaggtgc cgggcatggg 56640
ccgtgggttg tgggaggcgt ggccggcgtt ccgcgagacc ttcgaccggt gcgtcacgct 56700
cttcgaccgg gagctccatc agccgctctg cgaggtgatg tgggccgagc cgggcagcag 56760
caggtcgtcg ttgctggacc agacggcgtt cacccagccg gcgctctttg cgctggagta 56820
cgcgctggcc gcgctcttcc ggtcgtgggg cgtggagccg gagctcgtcg ctggccatag 56880
cctcggcgag ctggtggccg cctgcgtggc gggtgtgttc tccctcgagg acgccgtgcg 56940
cttggtggtc gcgcgcggcc ggttgatgca ggcgctgccg gccggcggcg cgatggtatc 57000
gatcgccgcg ccggaggccg acgtggctgc cgcggtggcg ccgcacgcag cgttggtgtc 57060
gatcgcggca gtcaatgggc cggagcaggt ggtgatcgcg ggcgccgaga aattcgtgca 57120
gcagatcgcg gcggcgttcg cggcgcgggg ggcgcgaacc aaaccgctgc atgtctcgca 57180
cgcgttccac tcgccgctca tggatccgat gctggaggcg ttccggcggg tgactgagtc 57240
ggtgacgtac cggcggcctt cgatcgcgct ggtgagcaac ctgagcggga agccctgcac 57300
cgatgaggtg agcgcgccgg gttactgggt gcgtcacgcg cgagaggcgg tgcgcttcgc 57360
ggacggagtg aaggcgctgc acgcggccgg tgcgggcctc ttcgtcgagg tggggccgaa 57420
gccgacgctg ctcggccttg tgccggcctg cctgccggat gccaggccgg tgctgctccc 57480
agcgtcgcgc gccgggcgtg acgaggctgc gagcgcgcta gaggcgctgg gtgggttctg 57540
ggtcgtcggt ggatcggtca cctggtcggg tgtcttccct tcgggcggac ggcgggtacc 57600
gctgccaacc tatccctggc agcgcgagcg ttactggatc gaagcgccgg tcgatcgtga 57660
ggcggacggc accggccgtg ctcgggcggg gggccacccc cttctgggtg aagtcttttc 57720
cgtgtcgacc catgccggtc tgcgcctgtg ggagacgacg ctggaccgaa agcggctgcc 57780
gtggctcggc gagcaccggg cgcaggggga ggtcgtgttt cctggcgccg ggtacctgga 57840
gatggcgctg tcgtcggggg ccgagatctt gggcgatgga ccgatccagg tcacggatgt 57900
ggtgctcatc gagacgctga ccttcgcggg cgatacggcg gtaccggtcc aggtggtgac 57960
gaccgaggag cgaccgggac ggctgcggtt ccaggtagcg agtcgggagc cgggggaacg 58020
tcgcgcgccc ttccggatcc acgcccgcgg cgtgctgcgc cggatcgggc gcgtcgagac 58080
cccggcgagg tcgaacctcg ccgccctgcg cgcccggctt catgccgccg tgcccgctgc 58140
ggctatctat ggtgcgctcg ccgagatggg gcttcaatac ggcccggcgt tgcgggggct 58200
cgccgagctg tggcggggtg agggcgaggc gctgggcagg gtgagactgc ctgaggccgc 58260
cggctccgcg acagcctacc agctgcatcc ggtgctgctg gacgcgtgcg tccaaatgat 58320
tgttggcgcg ttcgccgatc gcgatgaggc gacgccgtgg gcgccggtgg aggtgggctc 58380
ggtgcggctg ttccagcggt ctcctgggga gctatggtgc catgcgcgcg tcgtgagcga 58440
tggtcaacag gcctccagcc ggtggagcgc cgactttgag ttgatggacg gtacgggcgc 58500
ggtggtcgcc gagatctccc ggctggtggt ggagcggctt gcgagcggtg tacgccggcg 58560
cgacgcagac gactggttcc tggagctgga ttgggagccc gcggcgctcg gtgggcccaa 58620
gatcacagcc ggccggtggc tgctgctcgg cgagggtggt gggctcgggc gctcgttgtg 58680
ctcggcgctg aaggccgccg gccatgtcgt cgtccacgcc gcgggggacg acacgagcac 58740
tgcaggaatg cgcgcgctcc tggccaacgc gttcgacggc caggccccga cggccgtggt 58800
gcacctcagc agcctcgacg ggggcggcca gctcggcccg gggctcgggg cgcagggcgc 58860
gctcgacgcg ccccggagcc cagatgtcga tgccgatgcc ctcgaatcgg cgctgatgcg 58920
tggttgcgac agcgtgctct ccctggtgca agcgctggtc ggcatggacc tccgaaacgc 58980
gccgcggctg tggctcttga cccgcggggc tcaggcggcc gccgccggcg atgtctccgt 59040
ggtgcaagcg ccgctgttgg ggctgggccg caccatcgcc ttggagcacg ccgagctgcg 59100
ctgtatcagc gtcgacctcg atccagccga gcctgaaggg gaagccgatg ctttgctggc 59160
cgagctactt gcagatgatg ccgaggagga ggtcgcgctg cgcggtggcg accggctcgt 59220
tgcgcggctc gtccaccggc tgcccgacgc tcagcgccgg gagaaggtcg agcccgccgg 59280
tgacaggccg ttccggctag agatcgatga acccggcgcg ctggaccaac tggtgctccg 59340
agccacgggg cggcgcgctc ctggtccggg cgaggtcgag atctccgtcg aagcggcggg 59400
gctcgactcc atcgacatcc agctggcgtt gggcgttgct cccaatgatc tgcctggaga 59460
agaaatcgag ccgttggtgc tcggaagcga gtgcgccggg cgcatcgtcg ctgtgggcga 59520
gggcgtgaac ggccttgtgg tgggccagcc ggtgatcgcc cttgcggcgg gagtatttgc 59580
tacccatgtc accacgtcgg ccacgctggt gttgcctcgg cctctggggc tctcggcgac 59640
cgaggcggcc gcgatgcccc tcgcgtattt gacggcctgg tacgccctcg acaaggtcgc 59700
ccacctgcag gcgggggagc gggtgctgat ccatgcggag gccggtggtg tcggtctttg 59760
cgcggtgcga tgggcgcagc gcgtgggcgc cgaggtgtat gcgaccgccg acacgcccga 59820
gaaccgtgcc tacctggagt cgctgggcgt gcggtacgtg agcgattccc gctcgggccg 59880
gttcgtcaca gacgtgcatg catggacgga cggcgagggt gtggacgtcg tgctcgactc 59940
gctttcgggc gagcgcatcg acaagagcct catggtcctg cgcgcctgtg gtcgccttgt 60000
gaagctgggc aggcgcgacg actgcgccga cacgcagcct gggctgccgc cgctcctacg 60060
gaatttttcc ttctcgcagg tggacttgcg gggaatgatg ctcgatcaac cggcgaggat 60120
ccgtgcgctc ctcgacgagc tgttcgggtt ggtcgcagcc ggtgccatca gcccactggg 60180
gtcggggttg cgcgttggcg gatccctcac gccaccgccg gtcgagacct tcccgatctc 60240
tcgcgcagcc gaggcattcc ggaggatggc gcaaggacag catctcggga agctcgtgct 60300
cacgctggac gacccggagg tgcggatccg cgctccggcc gaatccagcg tcgccgtccg 60360
cgcggacggc acctaccttg tgaccggcgg tctgggtggc ctcggtctgc gcgtggccgg 60420
atggctggcc gagcggggcg cggggcaact ggtgctggtg ggccgctccg gtgcggcgag 60480
cgcagagcag cgagccgccg tggcggcgct ggaggcccac ggcgcgcgcg tcacggtggc 60540
gaaagcggac gtcgccgatc ggtcacagat cgagcgggtc ctccgcgagg ttaccgcgtc 60600
ggggatgccg ctgcggggtg tcgtgcatgc ggcaggtctc gtggatgacg ggctgctgat 60660
gcagcagact ccggcgcggt tccgcacggt gatgggacct aaggtccagg gggccttgca 60720
cttgcacacg ctgacacgcg aagcgcctct ttccttcttc gtgctgtacg cttctgcagc 60780
tgggcttttc ggctcgccag gccagggcaa ctatgccgca gccaacgcgt tcctcgacgc 60840
cctttcgcat caccgaaggg cgcagggcct gccggcgctg agcatcgact ggggcatgtt 60900
cacggaggtg gggatggccg ttgcgcaaga aaaccgtggc gcgcggcaga tctctcgcgg 60960
gatgcggggc atcacccccg atgagggtct gtcagctctg gcgcgcttgc tcgagggtga 61020
tcgcgtgcag acgggggtga taccgatcac tccgcggcag tgggtggagt tctacccggc 61080
aacagcggcc tcacggaggt tgtcgcggct ggtgaccacg cagcgcgcgg tcgctgatcg 61140
gaccgccggg gatcgggacc tgctcgaaca gcttgcgtcg gctgagccga gcgcgcgggc 61200
ggggctgctg caggacgtcg tgcgcgtgca ggtctcgcat gtgctgcgtc tccctgaaga 61260
caagatcgag gtggatgccc cgctctcgag catgggcatg gactcgctga tgagcctgga 61320
gctgcgcaac cgcatcgagg ctgcgctggg cgtcgccgcg cctgcagcct tggggtggac 61380
gtacccaacg gtagcagcga taacgcgctg gctgctcgac gacgccctcg tcgtccggct 61440
tggcggcggg tcggacacgg acgaatcgac ggcgagcgcc ggttcgttcg tccacgtcct 61500
ccgctttcgt cctgtcgtca agccgcgggc tcgtctcttc tgttttcacg gttctggcgg 61560
ctcgcccgag ggcttccgtt cctggtcgga gaagtctgag tggagcgatc tggaaatcgt 61620
ggccatgtgg cacgatcgca gcctcgcctc cgaggacgcg cctggtaaga agtacgtcca 61680
agaggcggcc tcgctgattc agcactatgc agacgcaccg tttgcgttag tagggttcag 61740
cctgggtgtc cggttcgtca tggggacagc cgtggagctc gccagtcgtt ccggcgcacc 61800
ggctccgctg gccgtcttca cgttgggcgg cagcttgatc tcttcttcag agatcacccc 61860
ggagatggag accgatataa tagccaagct cttcttccga aatgccgcgg gtttcgtgcg 61920
atccacccaa caagtccagg ccgatgctcg cgcagacaag gtcatcacag acaccatggt 61980
ggctccggcc cccggggact cgaaggagcc gcccgtgaag atcgcggtcc ctatcgtcgc 62040
catcgccggc tcggacgatg tgatcgtgcc tccgagcgac gttcaggatc tacaatctcg 62100
caccacggag cgcttctata tgcatctcct tcccggagat cacgaatttc tcgtcgatcg 62160
agggcgcgag atcatgcaca tcgtcgactc gcatctcaat ccgctgctcg ccgcgaggac 62220
gacgtcgtca ggccccgcgt tcgaggcaaa atgatggcag cctccctcgg gcgcgcgaga 62280
tggttgggag cagcgtgggc gctggcggcc ggcggcaggc cgcggaggcg catgagcctt 62340
cctggacgtt tgcagtatag gagattttat gacacaggag caagcgaatc agagtgagac 62400
gaagcctgct ttcgacttca agccgttcgc gcctgggtac gcggaggacc cgttccccgc 62460
gatcgagcgc ctgagagagg caacccccat cttctactgg gatgaaggcc gctcctgggt 62520
cctcacccga taccacgacg tgtcggcggt gttccgcgac gaacgcttcg cggtcagtcg 62580
agaagagtgg gaatcgagcg cggagtactc gtcggccatt cccgagctca gcgatatgaa 62640
gaagtacgga ttgttcgggc tgccgccgga ggatcacgct cgggtccgca agctcgtcaa 62700
cccgtcgttt acgtcacgcg ccatcgacct gctgcgcgcc gaaatacagc gcaccgtcga 62760
ccagctgctc gatgctcgct ccggacaaga ggagttcgac gttgtgcggg attacgcgga 62820
gggaatcccg atgcgcgcga tcagcgctct gttgaaggtt ccggccgagt gtgacgagaa 62880
gttccgtcgc ttcggctcgg cgactgcgcg cgcgctcggc gtgggtttgg tgccccaggt 62940
cgatgaggag accaagaccc tggtcgcgtc cgtcaccgag gggctcgcgc tgctccatga 63000
cgtcctcgat gagcggcgca ggaacccgct cgaaaatgac gtcttgacga tgctgcttca 63060
ggccgaggcc gacggcagca ggctgagcac gaaggagctg gtcgcgctcg tgggtgcgat 63120
tatcgctgct ggcaccgata ccacgatcta ccttatcgcg ttcgctgtgc tcaacctgct 63180
gcggtcgccc gaggcgctcg agctggtgaa ggccgagccc gggctcatga ggaacgcgct 63240
cgatgaggtg ctccgcttcg acaatatcct cagaatagga actgtgcgtt tcgccaggca 63300
ggacctggag tactgcgggg catcgatcaa gaaaggggag atggtctttc tcctgatccc 63360
gagcgccctg agagatggga ctgtattctc caggccagac gtgtttgatg tgcgacggga 63420
cacgggcgcg agcctcgcgt acggtagagg cccccatgtc tgccccgggg tgtcccttgc 63480
tcgcctcgag gcggagatcg ccgtgggcac catcttccgt aggttccccg agatgaagct 63540
gaaagaaact cccgtgtttg gataccaccc cgcgttccgg aacatcgaat cactcaacgt 63600
catcttgaag ccctccaaag ctggatagct cgcgggggta tcgcttcccg aacctcattc 63660
cctcatgata cagctcgcgc gcgggtgctg tctgccgcgg gtgcgattcg atccagcgga 63720
caagcccatt gtcagcgcgc gaagatcgaa tccacggccc ggagaagagc ccgtccgggt 63780
gacgtcggaa gaagtgccgg gcgccgccct gggagcgcaa agctcgctcg ttcgcgctca 63840
gcacgccgct cgtcatgtcc ggccctgcac ccgcgccgag gagccgcccg ccctgatgca 63900
cggcctcacc gagcggcagg ttctgctctc gctcgtcgcc ctcgcgctcg tcctcctgac 63960
cgcgcgcgcc ttcggcgagc tcgcgcggcg gctgcgccag cccgaggtgc tcggcgagct 64020
cttcggcggc gtggtgctgg gcccgtccgt cgtcggcgcg ctcgctcctg ggttccatcg 64080
agtcctcttc caggatccgg cggtcggggt cgtgctctcc ggcatctcct ggataggcgc 64140
gctcgtcctg ctgctcatgg cgggtatcga ggtcgatgtg agcatcctgc gcaaggaggc 64200
gcgccccggg gcgctctcgg cgctcggcgc gatcgcgccc ccgctgcgca cgccggggcc 64260
gctggtgcag cgcatgcagg gcgcgttcac gtgggatctc gacgtctcgc cgcgacgctc 64320
tgcgcaagcc tgagcctcgg cgcctgctcg tacacctcgc cggtgctcgc tccgcccgcg 64380
gacatccggc cgcccgccgc ggcccagctc gagccggact cgccggatga cgaggccgac 64440
gaggccgacg aggcgctccg cccgttccgc gacgcgatcg ccgcgtactc ggaggccgtt 64500
cggtgggcgg aggcggcgca gcggccgcgg ctggagagcc tcgtgcggct cgcgatcgtg 64560
cggctgggca aggcgctcga caaggtccct ttcgcgcaca cgacggccgg cgtctcccag 64620
atcgccggca gactccagaa cgatgcggtc tggttcgatg tcgccgcccg gtacgcgagc 64680
ttccgcgcgg cgacggagca cgcgctccgc gacgcggcgt cggccatgga ggcgctcgcg 64740
gccggcccgt accgcggatc gagccgcgtg tccgctgccg taggggagtt tcggggggag 64800
gcggcgcgcc ttcaccccgc ggaccgtgta cccgcgtccg accagcagat cctgaccgcg 64860
ctgcgcgcag ccgagcgggc gctcatcgcg ctctacactg cgttcgcccg tgaggagtga 64920
gcctctctcg ggcgcagccg agcggcggcg tgccggtggt tccctcttcg caaccatgac 64980
cggagccgcg ctcggtccgc gcagcggcta gcgcgcgtcg cggcagagat cgctggagcg 65040
acaggcgacg acccgcccga gggtgtcgaa cggattgccg cagccctcat tgcggatccc 65100
ctccagacac tcgttcagct gcttggcgtc gatgccgcct gggcactcgc cgaaggtcag 65160
ctcgtcgcgc cactcggatc ggatcttgtt cgagcacgcg tccttgctcg aatactcccg 65220
gtcttgtccg atgttgttgc accgcgcctc gcggtcgcac cgcgccgcca cgatgctatc 65280
gacggcgctg ccgactggca ccggcgcctc gccctgcgcg ccacccgggg tttgcgcctc 65340
cccgcctgac cgcttttcgc cgccgcacgc cgcgagcagg ctcattcccg acaccgagat 65400
caggcccacg accagcttcc cagcaatctt ttgcatggct tcccctccct cacgacacgt 65460
cacatcagag actctccgct cggctcgtcg gttcgacagc cggcgacggc cacgagcaga 65520
accgtccccg accagaacag ccgcatgcgg gtttctcgca acatgccccg acatccttgc 65580
gactagcgtg cctccgctcg tgccgagatc ggctgtcctg tgcgacggca atatcctgcg 65640
atcggccggg caggaggtac cgacacgggc gccgggcggg aggtgccgcc acgggctcga 65700
aatgtgctgc ggcaggcgcc tccatgcccg cagccgggaa cgcggcgccc ggccagcctc 65760
ggggtgacgc cgcaaacggg agatgctccc ggagaggcgc cgggcacagc cgagcgccgt 65820
caccaccgtg cgcactcgtg agctccagct cctcggcata gaagagaccg tcactcccgg 65880
tccgtgtagg cgatcgtgct gatcagcgcg ttctccgcct gacgcgagtc gagccgggta 65940
tgctgcacga caatgggaac gtccgattcg atcacgctgg catagtccgt atcgcgcggg 66000
atcggctcgg gttcggtcag atcgttgaac cggacgtgcc gggtgcgcct cgctgggacg 66060
gtcacccggt acggcccggc ggggtcgcgg tcgctgaagt agacggtgat ggcgacctgc 66120
gcgtcccggt ccgacgcatt caacaggcag gccgtctcat ggctcgtcat ctgcggctcg 66180
ggtccgttgc tccggcctgg gatgtagccc tctgcgattg cccagcgcgt ccgcccgatc 66240
ggcttctcca tatgtcctcc ctgctggctc ctctttggct gcctccctct gctgtccagg 66300
agcgacggcc tcttctcccg acgcgctcgg ggatccatgg ctgaggatcc tcgccgagcg 66360
ctccttgccg accggcgcgc cgagcgccga cgggctttga aagcacgcga ccggacacgt 66420
gatgccggcg cgacgaggcc gccccgcgtc tgatcccgat cgtgacatcg cgacgtccgc 66480
cggcgcctct gcaggccggc ctgagcgttg cgcggtcatg gtcgtcctcg cgtcaccgcc 66540
acccgccgat tcacatccca ccgcggcacg acgcttgctc aaaccgcggc gagacggccg 66600
ggcggctgtg gtaccggcca gcccggacgc gaggcccgag agggacagtg ggtccgccgt 66660
gaagcagtga ggcgatcgag gtggcagatg aaacacgttg acacgggccg acgagtcggc 66720
cgccggatag ggctcacgct cggtctcctc gcgagcatgg cgctcgccgg ctgtggcggc 66780
ccgagcgaga aaatcgtgca gggcacgcgg ctcgcgcccg gcgccgatgc gcacgtcgcc 66840
gccgacgtcg accccgacgc cgcgaccacg cggctggcgg tggacgtcgt tcacctctcg 66900
ccgcccgagc gcatcgaggc cggcagcgag cggttcgtcg tctggcagcg tccgagctcc 66960
gagtccccgt ggcaacgggt cggagtgctc gactacaacg ctgccagccg aagaggcaag 67020
ctggccgaga cgaccgtgcc gcatgccaac ttcgagctgc tcatcaccgt cgagaagcag 67080
agcagccctc agtctccatc ttctgccgcc gtcatcgggc cgacgtccgt cgggtaacat 67140
cgcgctatca gcagcgctga gcccgccagc aggccccaga gccctgcctc gatcgccttc 67200
tccatcatat catccctgcg tactcctcca gcgacggccg cgtcgaagca accgccgtgc 67260
cggcgcggct ctacgtgcgc gacaggagag cgtcctggcg cggcctgcgc atcgctggaa 67320
ggatcggcgg agcatggaga aagaatcgag gatcgcgatc tacggcgcca tcgcagccaa 67380
cgtggcgatc gcggcggtca agttcatcgc cgccgccgtg accggcagct cggcgatgct 67440
ctccgagggc gtgcactccc tcgtcgatac tgcagacggg ctcctcctcc tgctcggcaa 67500
gcaccggagc gcacgcccgc ccgacgccga gcatccgttc ggccacggca aggagctcta 67560
tttctggacg ctgatcgtcg ccatcatgat cttcgccgcg ggcggcggcg tctcgatcta 67620
cgaagggatc ttgcacctct tgcacccgcg ccagatcgag gatccgacgt ggaactacgt 67680
cgtcctcggc gcagcggccg tcttcgaggg gacgtcgctc atcatctcga tccacgagtt 67740
caagaagaag gacggacagg gctacctcgc ggcgatgcgg tccagcaagg acccgacgac 67800
gttcacgatc gtcctggagg actccgcggc gctcgccggg ctcaccatcg ccttcctcgg 67860
cgtctggctc gggcaccgcc tgggaaaccc ctacctcgac ggcgcggcgt cgatcggcat 67920
cggcctcgtg ctcgccgcgg tcgcggtctt cctcgccagc cagagccgtg ggctcctcgt 67980
gggggagagc gcggacaggg agctcctcgc cgcgatccgc gcgctcgcca gcgcagatcc 68040
tggcgtgtcg gcggtggggc ggcccctgac gatgcacttc ggtccgcacg aagtcctggt 68100
cgtgctgcgc atcgagttcg acgccgcgct cacggcgtcc ggggtcgcgg aggcgatcga 68160
gcgcatcgag acccggatac ggagcgagcg acccgacgtg aagcacatct acgtcgaggc 68220
caggtcgctc caccagcgcg cgagggcgtg acgcgccgtg gagagaccgc gcgcggcctc 68280
cgccatcctc cgcggcgccc gggctcaggt ggccctcgca gcagggcgcg cctggcgggc 68340
aaaccgtgca gacgtcgtcc ttcgacgcga ggtacgctgg ttgcaagtcg tcacgccgta 68400
tcgcgaggtc cggcagcgcc ggagcccggg cgggccgggc gcacgaaggc gcggcgagcg 68460
caggcttcga ggggggcgac gtcatgagga aggccagggc gcatggggcg atgctcggcg 68520
ggcgagatga cggctggcgt cgcggcctcc ccggcgccgg cgcgcttcgc gccgcgctcc 68580
agcgcggtcg ctcgcgcgat ctcgcccggc gccggctcat cgcctccgtg tccctcgccg 68640
gcggcgccag catggcggtc gtctcgctgt tccagctcgg gatcatcgag cgcctgcccg 68700
atcctccgct tccagggttc gattcggcca aggtgacgag ctccgatatc 68750
<210>2
<211>1421
<212>PRT
<213>Sorangium cellulosum
<400>2
Val Ala Asp Arg Pro Ile Glu Arg Ala Ala Glu Asp Pro Ile Ala Ile
1 5 10 15
Val Gly Ala Ser Cys Arg Leu Pro Gly Gly Val Ile Asp Leu Ser Gly
20 25 30
Phe Trp Thr Leu Leu Glu Gly Ser Arg Asp Thr Val Gly Arg Val Pro
35 40 45
Ala Glu Arg Trp Asp Ala Ala Ala Trp Phe Asp Pro Asp Pro Asp Ala
50 55 60
Pro Gly Lys Thr Pro Val Thr Arg Ala Ser Phe Leu Ser Asp Val Ala
65 70 75 80
Cys Phe Asp Ala Ser Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Arg
85 90 95
Met Asp Pro Ala His Arg Leu Leu Leu Glu Val Cys Trp Glu Ala Leu
100 105 110
Glu Asn Ala Ala Ile Ala Pro Ser Ala Leu Val Gly Thr Glu Thr Gly
115 120 125
Val Phe Ile Gly Ile Gly Pro Ser Glu Tyr Glu Ala Ala Leu Pro Gln
130 135 140
Ala Thr Ala Ser Ala Glu Ile Asp Ala His Gly Gly Leu Gly Thr Met
145 150 155 160
Pro Ser Val Gly Ala Gly Arg Ile Ser Tyr Ala Leu Gly Leu Arg Gly
165 170 175
Pro Cys Val Ala Val Asp Thr Ala Tyr Ser Ser Ser Leu Val Ala Val
180 185 190
His Leu Ala Cys Gln Ser Leu Arg Ser Gly Glu Cys Ser Thr Ala Leu
195 200 205
Ala Gly Gly Val Ser Leu Met Leu Ser Pro Ser Thr Leu Val Trp Leu
210 215 220
Ser Lys Thr Arg Ala Leu Ala Arg Asp Gly Arg Cys Lys Ala Phe Ser
225 230 235 240
Ala Glu Ala Asp Gly Phe Gly Arg Gly Glu Gly Cys Ala Val Val Val
245 250 255
Leu Lys Arg Leu Ser Gly Ala Arg Ala Asp Gly Asp Arg Ile Leu Ala
260 265 270
Val Ile Arg Gly Ser Ala Ile Asn His Asp Gly Ala Ser Ser Gly Leu
275 280 285
Thr Val Pro Asn Gly Ser Ser Gln Glu Ile Val Leu Lys Arg Ala Leu
290 295 300
Ala Asp Ala Gly Cys Ala Ala Ser Ser Val Gly Tyr Val Glu Ala His
305 310 315 320
Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Ile Gln Ala Leu Asn
325 330 335
Ala Val Tyr Gly Leu Gly Arg Asp Val Ala Thr Pro Leu Leu Ile Gly
340 345 350
Ser Val Lys Thr Asn Leu Gly His Pro Glu Tyr Ala Ser Gly Ile Thr
355 360 365
Gly Leu Leu Lys Val Val Leu Ser Leu Gln His Gly Gln Ile Pro Ala
370 375 380
His Leu His Ala Gln Ala Leu Asn Pro Arg Ile Ser Trp Gly Asp Leu
385 390 395 400
Arg Leu Thr Val Thr Arg Ala Arg Thr Pro Trp Pro Asp Trp Asn Thr
405 410 415
Pro Arg Arg Ala Gly Val Ser Ser Phe Gly Met Ser Gly Thr Asn Ala
420 425 430
His Val Val Leu Glu Glu Ala Pro Ala Ala Thr Cys Thr Pro Pro Ala
435 440 445
Pro Glu Arg Pro Ala Glu Leu Leu Val Leu Ser Ala Arg Thr Ala Ser
450 455 460
Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Asp His Leu Glu Thr Tyr
465 470 475 480
Pro Ser Gln Cys Leu Gly Asp Val Ala Phe Ser Leu Ala Thr Thr Arg
485 490 495
Ser Ala Met Glu His Arg Leu Ala Val Ala Ala Thr Ser Arg Glu Gly
500 505 510
Leu Arg Ala Ala Leu Asp Ala Ala Ala Gln Gly Gln Thr Ser Pro Gly
515 520 525
Ala Val Arg Ser Ile Ala Asp Ser Ser Arg Gly Lys Leu Ala Phe Leu
530 535 540
Phe Thr Gly Gln Gly Ala Gln Thr Leu Gly Met Gly Arg Gly Leu Tyr
545 550 555 560
Asp Val Trp Ser Ala Phe Arg Glu Ala Phe Asp Leu Cys Val Arg Leu
565 570 575
Phe Asn Gln Glu Leu Asp Arg Pro Leu Arg Glu Val Met Trp Ala Glu
580 585 590
Pro Ala Ser Val Asp Ala Ala Leu Leu Asp Gln Thr Ala Phe Thr Gln
595 600 605
Pro Ala Leu Phe Thr Phe Glu Tyr Ala Leu Ala Ala Leu Trp Arg Ser
610 615 620
Trp Gly Val Glu Pro Glu Leu Val Ala Gly His Ser Ile Gly Glu Leu
625 630 635 640
Val Ala Ala Cys Val Ala Gly Val Phe Ser Leu Glu Asp Ala Val Phe
645 650 655
Leu Val Ala Ala Arg Gly Arg Leu Met Gln Ala Leu Pro Ala Gly Gly
660 665 670
Ala Met Val Ser Ile Glu Ala Pro Glu Ala Asp Val Ala Ala Ala Val
675 680 685
Ala Pro His Ala Ala Ser Val Ser Ile Ala Ala Val Asn Ala Pro Asp
690 695 700
Gln Val Val Ile Ala Gly Ala Gly Gln Pro Val His Ala Ile Ala Ala
705 710 715 720
Ala Met Ala Ala Arg Gly Ala Arg Thr Lys Ala Leu His Val Ser His
725 730 735
Ala Phe His Ser Pro Leu Met Ala Pro Met Leu Glu Ala Phe Gly Arg
740 745 750
Val Ala Glu Ser Val Ser Tyr Arg Arg Pro Ser Ile Val Leu Val Ser
755 760 765
Asn Leu Ser Gly Lys Ala Cys Thr Asp Glu Val Ser Ser Pro Gly Tyr
770 775 780
Trp Val Arg His Ala Arg Glu Val Val Arg Phe Ala Asp Gly Val Lys
785 790 795 800
Ala Leu His Ala Ala Gly Ala Gly Thr Phe Val Glu Val Gly Pro Lys
805 810 815
Ser Thr Leu Leu Gly Leu Val Pro Ala Cys Met Pro Asp Ala Arg Pro
820 825 830
Ala Leu Leu Ala Ser Ser Arg Ala Gly Arg Asp Glu Pro Ala Thr Val
835 840 845
Leu Glu Ala Leu Gly Gly Leu Trp Ala Val Gly Gly Leu Val Ser Trp
850 855 860
Ala Gly Leu Phe Pro Ser Gly Gly Arg Arg Val Pro Leu Pro Thr Tyr
865 870 875 880
Pro Trp Gln Arg Glu Arg Tyr Trp Ile Asp Thr Lys Ala Asp Asp Ala
885 890 895
Ala Arg Gly Asp Arg Arg Ala Pro Gly Ala Gly His Asp Glu Val Glu
900 905 910
Glu Gly Gly Ala Val Arg Gly Gly Asp Arg Arg Ser Ala Arg Leu Asp
915 920 925
His Pro Pro Pro Glu Ser Gly Arg Arg Glu Lys Val Glu Ala Ala Gly
930 935 940
Asp Arg Pro Phe Arg Leu Glu Ile Asp Glu Pro Gly Val Leu Asp His
945 950 955 960
Leu Val Leu Arg Val Thr Glu Arg Arg Ala Pro Gly Leu Gly Glu Val
965 970 975
Glu Ile Ala Val Asp Ala Ala Gly Leu Ser Phe Asn Asp Val Gln Leu
980 985 990
Ala Leu Gly Met Val Pro Asp Asp Leu Pro Gly Lys Pro Asn Pro Pro
995 1000 1005
Leu Leu Leu Gly Gly Glu Cys Ala Gly Arg Ile Val Ala Val Gly Glu
1010 1015 1020
Gly Val Asn Gly Leu Val Val Gly Gln Pro Val Ile Ala Leu Ser Ala
1025 1030 1035 1040
Gly Ala Phe Ala Thr His Val Thr Thr Ser Ala Ala Leu Val Leu Pro
1045 1050 1055
Arg Pro Gln Ala Leu Ser Ala Ile Glu Ala Ala Ala Met Pro Val Ala
1060 1065 1070
Tyr Leu Thr Ala Trp Tyr Ala Leu Asp Arg Ile Ala Arg Leu Gln Pro
1075 1080 1085
Gly Glu Arg Val Leu Ile His Ala Ala Thr Gly Gly Val Gly Leu Ala
1090 1095 1100
Ala Val Gln Trp Ala Gln His Val Gly Ala Glu Val His Ala Thr Ala
1105 1110 1115 1120
Gly Thr Pro Glu Lys Arg Ala Tyr Leu Glu Ser Leu Gly Val Arg Tyr
1125 1130 1135
Val Ser Asp Ser Arg Ser Asp Arg Phe Val Ala Asp Val Arg Ala Trp
1140 1145 1150
Thr Gly Gly Glu Gly Val Asp Val Val Leu Asn Ser Leu Ser Gly Glu
1155 1160 1165
Leu Ile Asp Lys Ser Phe Asn Leu Leu Arg Ser His Gly Arg Phe Val
1170 1175 1180
Glu Leu Gly Lys Arg Asp Cys Tyr Ala Asp Asn Gln Leu Gly Leu Arg
1185 1190 1195 1200
Pro Phe Leu Arg Asn Leu Ser Phe Ser Leu Val Asp Leu Arg Gly Met
1205 1210 1215
Met Leu Glu Arg Pro Ala Arg Val Arg Ala Leu Leu Glu Glu Leu Leu
1220 1225 1230
Gly Leu Ile Ala Ala Gly Val Phe Thr Pro Pro Pro Ile Ala Thr Leu
1235 1240 1245
Pro Ile Ala Arg Val Ala Asp Ala Phe Arg Ser Met Ala Gln Ala Gln
1250 1255 1260
His Leu Gly Lys Leu Val Leu Thr Leu Gly Asp Pro Glu Val Gln Ile
1265 1270 1275 1280
Arg Ile Pro Thr His Ala Gly Ala Gly Pro Ser Thr Gly Asp Arg Asp
1285 1290 1295
Leu Leu Asp Arg Leu Ala Ser Ala Ala Pro Ala Ala Arg Ala Ala Ala
1300 1305 1310
Leu Glu Ala Phe Leu Arg Thr Gln Val Ser Gln Val Leu Arg Thr Pro
1315 1320 1325
Glu Ile Lys Val Gly Ala Glu Ala Leu Phe Thr Arg Leu Gly Met Asp
1330 1335 1340
Ser Leu Met Ala Val Glu Leu Arg Asn Arg Ile Glu Ala Ser Leu Lys
1345 1350 1355 1360
Leu Lys Leu Ser Thr Thr Phe Leu Ser Thr Ser Pro Asn Ile Ala Leu
1365 1370 1375
Leu Ala Gln Asn Leu Leu Asp Ala Leu Ala Thr Ala Leu Ser Leu Glu
1380 1385 1390
Arg Val Ala Ala Glu Asn Leu Arg Ala Gly Val Gln Asn Asp Phe Val
1395 1400 1405
Ser Ser Gly Ala Asp Gln Asp Trp Glu Ile Ile Ala Leu
1410 1415 1420
<210>3
<211>1410
<212>PRT
<213>Sorangium cellulosum
<400>3
Met Thr Ile Asn Gln Leu Leu Asn Glu Leu Glu His Gln Gly Ile Lys
1 5 10 15
Leu Ala Ala Asp Gly Glu Arg Leu Gln Ile Gln Ala Pro Lys Asn Ala
20 25 30
Leu Asn Pro Asn Leu Leu Ala Arg Ile Ser Glu His Lys Ser Thr Ile
35 40 45
Leu Thr Met Leu Arg Gln Arg Leu Pro Ala Glu Ser Ile Val Pro Ala
50 55 60
Pro Ala Glu Arg His Ala Pro Phe Pro Leu Thr Asp Ile Gln Glu Ser
65 70 75 80
Tyr Trp Leu Gly Arg Thr Gly Ala Phe Thr Val Pro Ser Gly Ile His
85 90 95
Ala Tyr Arg Glu Tyr Asp Cys Thr Asp Leu Asp Val Pro Arg Leu Ser
100 105 110
Arg Ala Phe Arg Lys Val Val Ala Arg His Asp Met Leu Arg Ala His
115 120 125
Thr Leu Pro Asp Met Met Gln Val Ile Glu Pro Lys Val Asp Ala Asp
130 135 140
Ile Glu Ile Ile Asp Leu Arg Gly Leu Asp Arg Ser Thr Arg Glu Ala
145 150 155 160
Arg Leu Val Ser Leu Arg Asp Ala Met Ser His Arg Ile Tyr Asp Thr
165 170 175
Glu Arg Pro Pro Leu Tyr His Val Val Ala Val Arg Leu Asp Glu Arg
180 185 190
Gln Thr Arg Leu Val Leu Ser Ile Asp Leu Ile Asn Val Asp Leu Gly
195 200 205
Ser Leu Ser Ile Ile Phe Lys Asp Trp Leu Ser Phe Tyr Glu Asp Pro
210 215 220
Glu Thr Ser Leu Pro Val Leu Glu Leu Ser Tyr Arg Asp Tyr Val Leu
225 230 235 240
Ala Leu Glu Ser Arg Lys Lys Ser Glu Ala His Gln Arg Ser Met Asp
245 250 255
Tyr Trp Lys Arg Arg Ile Ala Glu Leu Pro Pro Pro Pro Thr Leu Pro
260 265 270
Met Lys Ala Asp Pro Ser Thr Leu Lys Glu Ile Arg Phe Arg His Thr
275 280 285
Glu Gln Trp Leu Pro Ser Asp Ser Trp Gly Arg Leu Lys Arg Arg Val
290 295 300
Gly Glu Arg Gly Leu Thr Pro Thr Gly Val Ile Leu Ala Ala Phe Ser
305 310 315 320
Glu Val Ile Gly Arg Trp Ser Ala Ser Pro Arg Phe Thr Leu Asn Ile
325 330 335
Thr Leu Phe Asn Arg Leu Pro Val His Pro Arg Val Asn Asp Ile Thr
340 345 350
Gly Asp Phe Thr Ser Met Val Leu Leu Asp Ile Asp Thr Thr Arg Asp
355 360 365
Lys Ser Phe Glu Gln Arg Ala Lys Arg Ile Gln Glu Gln Leu Trp Glu
370 375 380
Ala Met Asp His Cys Asp Val Ser Gly Ile Glu Val Gln Arg Glu Ala
385 390 395 400
Ala Arg Val Leu Gly Ile Gln Arg Gly Ala Leu Phe Pro Val Val Leu
405 410 415
Thr Ser Ala Leu Asn Gln Gln Val Val Gly Val Thr Ser Leu Gln Arg
420 425 430
Leu Gly Thr Pro Val Tyr Thr Ser Thr Gln Thr Pro Gln Leu Leu Leu
435 440 445
Asp His Gln Leu Tyr Glu His Asp Gly Asp Leu Val Leu Ala Trp Asp
450 455 460
Ile Val Asp Gly Val Phe Pro Pro Asp Leu Leu Asp Asp Met Leu Glu
465 470 475 480
Ala Tyr Val Val Phe Leu Arg Arg Leu Thr Glu Glu Pro Trp Gly Glu
485 490 495
Gln Val Arg Cys Ser Leu Pro Pro Ala Gln Leu Glu Ala Arg Ala Ser
500 505 510
Ala Asn Ala Thr Asn Ala Leu Leu Ser Glu His Thr Leu His Gly Leu
515 520 525
Phe Ala Ala Arg Val Glu Gln Leu Pro Met Gln Leu Ala Val Val Ser
530 535 540
Ala Arg Lys Thr Leu Thr Tyr Glu Glu Leu Ser Arg Arg Ser Arg Arg
545 550 555 560
Leu Gly Ala Arg Leu Arg Glu Gln Gly Ala Arg Pro Asn Thr Leu Val
565 570 575
Ala Val Val Met Glu Lys Gly Trp Glu Gln Val Val Ala Val Leu Ala
580 585 590
Val Leu Glu Ser Gly Ala Ala Tyr Val Pro Ile Asp Ala Asp Leu Pro
595 600 605
Ala Glu Arg Ile His Tyr Leu Leu Asp His Gly Glu Val Lys Leu Val
610 615 620
Leu Thr Gln Pro Trp Leu Asp Gly Lys Leu Ser Trp Pro Pro Gly Ile
625 630 635 640
Gln Arg Leu Leu Val Ser Glu Ala Gly Val Glu Gly Asp Gly Asp Gln
645 650 655
Pro Pro Met Met Pro Ile Gln Thr Pro Ser Asp Leu Ala Tyr Val Ile
660 665 670
Tyr Thr Ser Gly Ser Thr Gly Leu Pro Lys Gly Val Met Ile Asp His
675 680 685
Arg Gly Ala Val Asn Thr Ile Leu Asp Ile Asn Glu Arg Phe Glu Ile
690 695 700
Gly Pro Gly Asp Arg Val Leu Ala Leu Ser Ser Leu Ser Phe Asp Leu
705 710 715 720
Ser Val Tyr Asp Val Phe Gly Ile Leu Ala Ala Gly Gly Thr Ile Val
725 730 735
Val Pro Asp Ala Ser Lys Leu Arg Asp Pro Ala His Trp Ala Glu Leu
740 745 750
Ile Glu Arg Glu Lys Val Thr Val Trp Asn Ser Val Pro Ala Leu Met
755 760 765
Arg Met Leu Val Glu His Phe Glu Gly Arg Pro Asp Ser Leu Ala Arg
770 775 780
Ser Leu Arg Leu Ser Leu Leu Ser Gly Asp Trp Ile Pro Val Gly Leu
785 790 795 800
Pro Gly Glu Leu Gln Ala Ile Arg Pro Gly Val Ser Val Ile Ser Leu
805 810 815
Gly Gly Ala Thr Glu Ala Ser Ile Trp Ser Ile Gly Tyr Pro Val Arg
820 825 830
Asn Val Asp Leu Ser Trp Ala Ser Ile Pro Tyr Gly Arg Pro Leu Arg
835 840 845
Asn Gln Thr Phe His Val Leu Asp Glu Ala Leu Glu Pro Arg Pro Val
850 855 860
Trp Val Pro Gly Gln Leu Tyr Ile Gly Gly Val Gly Leu Ala Leu Gly
865 870 875 880
Tyr Trp Arg Asp Glu Glu Lys Thr Arg Lys Ser Phe Leu Val His Pro
885 890 895
Glu Thr Gly Glu Arg Leu Tyr Lys Thr Gly Asp Leu Gly Arg Tyr Leu
900 905 910
Pro Asp Gly Asn Ile Glu Phe Met Gly Arg Glu Asp Asn Gln Ile Lys
915 920 925
Leu Arg Gly Tyr Arg Val Glu Leu Gly Glu Ile Glu Glu Thr Leu Lys
930 935 940
Ser His Pro Asn Val Arg Asp Ala Val Ile Val Pro Val Gly Asn Asp
945 950 955 960
Ala Ala Asn Lys Leu Leu Leu Ala Tyr Val Val Pro Glu Gly Thr Arg
965 970 975
Arg Arg Ala Ala Glu Gln Asp Ala Ser Leu Lys Thr Glu Arg Ile Asp
980 985 990
Ala Arg Ala His Ala Ala Glu Ala Asp Gly Leu Ser Asp Gly Glu Arg
995 1000 1005
Val Gln Phe Lys Leu Ala Arg His Gly Leu Arg Arg Asp Leu Asp Gly
1010 1015 1020
Lys Pro Val Val Asp Leu Thr Gly Gln Asp Pro Arg Glu Ala Gly Leu
1025 1030 1035 1040
Asp Val Tyr Ala Arg Arg Arg Ser Val Arg Thr Phe Leu Glu Ala Pro
1045 1050 1055
Ile Pro Phe Val Glu Phe Gly Arg Phe Leu Ser Cys Leu Ser Ser Val
1060 1065 1070
Glu Pro Asp Gly Ala Thr Leu Pro Lys Phe Arg Tyr Pro Ser Ala Gly
1075 1080 1085
Ser Thr Tyr Pro Val Gln Thr Tyr Ala Tyr Val Lys Ser Gly Arg Ile
1090 1095 1100
Glu Gly Val Asp Glu Gly Phe Tyr Tyr Tyr His Pro Phe Glu His Arg
1105 1110 1115 1120
Leu Leu Lys Leu Ser Asp His Gly Ile Glu Arg Gly Ala His Val Arg
1125 1130 1135
Gln Asn Phe Asp Val Phe Asp Glu Ala Ala Phe Asn Leu Leu Phe Val
1140 1145 1150
Gly Arg Ile Asp Ala Ile Glu Ser Leu Tyr Gly Ser Ser Ser Arg Glu
1155 1160 1165
Phe Cys Leu Leu Glu Ala Gly Tyr Met Ala Gln Leu Leu Met Glu Gln
1170 1175 1180
Ala Pro Ser Cys Asn Ile Gly Val Cys Pro Val Gly Gln Phe Asn Phe
1185 1190 1195 1200
Glu Gln Val Arg Pro Val Leu Asp Leu Arg His Ser Asp Val Tyr Val
1205 1210 1215
His Gly Met Leu Gly Gly Arg Val Asp Pro Arg Gln Phe Gln Val Cys
1220 1225 1230
Thr Leu Gly Gln Asp Ser Ser Pro Arg Arg Ala Thr Thr Arg Gly Ala
1235 1240 1245
Pro Pro Gly Arg Glu Gln His Phe Ala Asp Met Leu Arg Asp Phe Leu
1250 1255 1260
Arg Thr Lys Leu Pro Glu Tyr Met Val Pro Thr Val Phe Val Glu Leu
1265 1270 1275 1280
Asp Ala Leu Pro Leu Thr Ser Asn Gly Lys Val Asp Arg Lys Ala Leu
1285 1290 1295
Arg Glu Arg Lys Asp Thr Ser Ser Pro Arg His Ser Gly His Thr Ala
1300 1305 1310
Pro Arg Asp Ala Leu Glu Glu Ile Leu Val Ala Val Val Arg Glu Val
1315 1320 1325
Leu Gly Leu Glu Val Val Gly Leu Gln Gln Ser Phe Val Asp Leu Gly
1330 1335 1340
Ala Thr Ser Ile His Ile Val Arg Met Arg Ser Leu Leu Gln Lys Arg
1345 1350 1355 1360
Leu Asp Arg Glu Ile Ala Ile Thr Glu Leu Phe Gln Tyr Pro Asn Leu
1365 1370 1375
Gly Ser Leu Ala Ser Gly Leu Arg Arg Asp Ser Arg Asp Leu Asp Gln
1380 1385 1390
Arg Pro Asn Met Gln Asp Arg Val Glu Val Arg Arg Lys Gly Arg Arg
1395 1400 1405
Arg Ser
1410
<210>4
<211>1832
<212>PRT
<213>Sorangium cellulosum
<400>4
Met Glu Glu Gln Glu Ser Ser Ala Ile Ala Val Ile Gly Met Ser Gly
1 5 10 15
Arg Phe Pro Gly Ala Arg Asp Leu Asp Glu Phe Trp Arg Asn Leu Arg
20 25 30
Asp Gly Thr Glu Ala Val Gln Arg Phe Ser Glu Gln Glu Leu Ala Ala
35 40 45
Ser Gly Val Asp Pro Ala Leu Val Leu Asp Pro Ser Tyr Val Arg Ala
50 55 60
Gly Ser Val Leu Glu Asp Val Asp Arg Phe Asp Ala Ala Phe Phe Gly
65 70 75 80
Ile Ser Pro Arg Glu Ala Glu Leu Met Asp Pro Gln His Arg Ile Phe
85 90 95
Met Glu Cys Ala Trp Glu Ala Leu Glu Asn Ala Gly Tyr Asp Pro Thr
100 105 110
Ala Tyr Glu Gly Ser Ile Gly Val Tyr Ala Gly Ala Asn Met Ser Ser
115 120 125
Tyr Leu Thr Ser Asn Leu His Glu His Pro Ala Met Met Arg Trp Pro
130 135 140
Gly Trp Phe Gln Thr Leu Ile Gly Asn Asp Lys Asp Tyr Leu Ala Thr
145 150 155 160
His Val Ser Tyr Arg Leu Asn Leu Arg Gly Pro Ser Ile Ser Val Gln
165 170 175
Thr Ala Cys Ser Thr Ser Leu Val Ala Val His Leu Ala Cys Met Ser
180 185 190
Leu Leu Asp Arg Glu Cys Asp Met Ala Leu Ala Gly Gly Ile Thr Val
195 200 205
Arg Ile Pro His Arg Ala Gly Tyr Val Tyr Ala Glu Gly Gly Ile Phe
210 215 220
Ser Pro Asp Gly His Cys Arg Ala Phe Asp Ala Lys Ala Asn Gly Thr
225 230 235 240
Ile Met Gly Asn Gly Cys Gly Val Val Leu Leu Lys Pro Leu Asp Arg
245 250 255
Ala Leu Ser Asp Gly Asp Pro Val Arg Ala Val Ile Leu Gly Ser Ala
260 265 270
Thr Asn Asn Asp Gly Ala Arg Lys Ile Gly Phe Thr Ala Pro Ser Glu
275 280 285
Val Gly Gln Ala Gln Ala Ile Met Glu Ala Leu Ala Leu Ala Gly Val
290 295 300
Glu Ala Arg Ser Ile Gln Tyr Ile Glu Thr His Gly Thr Gly Thr Leu
305 310 315 320
Leu Gly Asp Ala Ile Glu Thr Ala Ala Leu Arg Arg Val Phe Gly Arg
325 330 335
Asp Ala Ser Ala Arg Arg Ser Cys Ala Ile Gly Ser Val Lys Thr Gly
340 345 350
Ile Gly His Leu Glu Ser Ala Ala Gly Ile Ala Gly Leu Ile Lys Thr
355 360 365
Val Leu Ala Leu Glu His Arg Gln Leu Pro Pro Ser Leu Asn Phe Glu
370 375 380
Ser Pro Asn Pro Ser Ile Asp Phe Ala Ser Ser Pro Phe Tyr Val Asn
385 390 395 400
Thr Ser Leu Lys Asp Trp Asn Thr Gly Ser Thr Pro Arg Arg Ala Gly
405 410 415
Val Ser Ser Phe Gly Ile Gly Gly Thr Asn Ala His Val Val Leu Glu
420 425 430
Glu Ala Pro Ala Ala Lys Leu Pro Ala Ala Ala Pro Ala Arg Ser Ala
435 440 445
Glu Leu Phe Val Val Ser Ala Lys Ser Ala Ala Ala Leu Asp Ala Ala
450 455 460
Ala Ala Arg Leu Arg Asp His Leu Gln Ala His Gln Gly Ile Ser Leu
465 470 475 480
Gly Asp Val Ala Phe Ser Leu Ala Thr Thr Arg Ser Pro Met Glu His
485 490 495
Arg Leu Ala Met Ala Ala Pro Ser Arg Glu Ala Leu Arg Glu Gly Leu
500 505 510
Asp Ala Ala Ala Arg Gly Gln Thr Pro Pro Gly Ala Val Arg Gly Arg
515 520 525
Cys Ser Pro Gly Asn Val Pro Lys Val Val Phe Val Phe Pro Gly Gln
530 535 540
Gly Ser Gln Trp Val Gly Met Gly Arg Gln Leu Leu Ala Glu Glu Pro
545 550 555 560
Val Phe His Ala Ala Leu Ser Ala Cys Asp Arg Ala Ile Gln Ala Glu
565 570 575
Ala Gly Trp Ser Leu Leu Ala Glu Leu Ala Ala Asp Glu Gly Ser Ser
580 585 590
Gln Leu Glu Arg Ile Asp Val Val Gln Pro Val Leu Phe Ala Leu Ala
595 600 605
Val Ala Phe Ala Ala Leu Trp Arg Ser Trp Gly Val Ala Pro Asp Val
610 615 620
Val Ile Gly His Ser Met Gly Glu Val Ala Ala Ala His Val Ala Gly
625 630 635 640
Ala Leu Ser Leu Glu Asp Ala Val Ala Ile Ile Cys Arg Arg Ser Arg
645 650 655
Leu Leu Arg Arg Ile Ser Gly Gln Gly Glu Met Ala Val Thr Glu Leu
660 665 670
Ser Leu Ala Glu Ala Glu Ala Ala Leu Arg Gly Tyr Glu Asp Arg Val
675 680 685
Ser Val Ala Val Ser Asn Ser Pro Arg Ser Thr Val Leu Ser Gly Glu
690 695 700
Pro Ala Ala Ile Gly Glu Val Leu Ser Ser Leu Asn Ala Lys Gly Val
705 710 715 720
Phe Cys Arg Arg Val Lys Val Asp Val Ala Ser His Ser Pro Gln Val
725 730 735
Asp Pro Leu Arg Glu Asp Leu Leu Ala Ala Leu Gly Gly Leu Arg Pro
740 745 750
Gly Ala Ala Ala Val Pro Met Arg Ser Thr Val Thr Gly Ala Met Val
755 760 765
Ala Gly Pro Glu Leu Gly Ala Asn Tyr Trp Met Asn Asn Leu Arg Gln
770 775 780
Pro Val Arg Phe Ala Glu Val Val Gln Ala Gln Leu Gln Gly Gly His
785 790 795 800
Gly Leu Phe Val Glu Met Ser Pro His Pro Ile Leu Thr Thr Ser Val
805 810 815
Glu Glu Met Arg Arg Ala Ala Gln Arg Ala Gly Ala Ala Val Gly Ser
820 825 830
Leu Arg Arg Gly Gln Asp Glu Arg Pro Ala Met Leu Glu Ala Leu Gly
835 840 845
Thr Leu Trp Ala Gln Gly Tyr Pro Val Pro Trp Gly Arg Leu Phe Pro
850 855 860
Ala Gly Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln Arg Glu
865 870 875 880
Arg Tyr Trp Ile Glu Ala Pro Ala Lys Ser Ala Ala Gly Asp Arg Arg
885 890 895
Gly Val Arg Ala Gly Gly His Pro Leu Leu Gly Glu Met Gln Thr Leu
900 905 910
Ser Thr Gln Thr Ser Thr Arg Leu Trp Glu Thr Thr Leu Asp Leu Lys
915 920 925
Arg Leu Pro Trp Leu Gly Asp His Arg Val Gln Gly Ala Val Val Phe
930 935 940
Pro Gly Ala Ala Tyr Leu Glu Met Ala Ile Ser Ser Gly Ala Glu Ala
945 950 955 960
Leu Gly Asp Gly Pro Leu Gln Ile Thr Asp Val Val Leu Ala Glu Ala
965 970 975
Leu Ala Phe Ala Gly Asp Ala Ala Val Leu Val Gln Val Val Thr Thr
980 985 990
Glu Gln Pro Ser Gly Arg Leu Gln Phe Gln Ile Ala Ser Arg Ala Pro
995 1000 1005
Gly Ala Gly His Ala Ser Phe Arg Val His Ala Arg Gly Ala Leu Leu
1010 1015 1020
Arg Val Glu Arg Thr Glu Val Pro Ala Gly Leu Thr Leu Ser Ala Val
1025 1030 1035 1040
Arg Ala Arg Leu Gln Ala Ser Ile Pro Ala Ala Ala Thr Tyr Ala Glu
1045 1050 1055
Leu Thr Glu Met Gly Leu Gln Tyr Gly Pro Ala Phe Gln Gly Ile Ala
1060 1065 1070
Glu Leu Trp Arg Gly Glu Gly Glu Ala Leu Gly Arg Val Arg Leu Pro
1075 1080 1085
Asp Ala Ala Gly Ser Ala Ala Glu Tyr Arg Leu His Pro Ala Leu Leu
1090 1095 1100
Asp Ala Cys Phe Gln Ile Val Gly Ser Leu Phe Ala Arg Ser Gly Glu
1105 1110 1115 1120
Ala Thr Pro Trp Val Pro Val Glu Leu Gly Ser Leu Arg Leu Leu Gln
1125 1130 1135
Arg Pro Ser Gly Glu Leu Trp Cys His Ala Arg Val Val Asn His Gly
1140 1145 1150
His Gln Thr Pro Asp Arg Gln Gly Ala Asp Phe Trp Val Val Asp Ser
1155 1160 1165
Ser Gly Ala Val Val Ala Glu Val Cys Gly Leu Val Ala Gln Arg Leu
1170 1175 1180
Pro Gly Gly Val Arg Arg Arg Glu Glu Asp Asp Trp Phe Leu Glu Leu
1185 1190 1195 1200
Glu Trp Glu Pro Ala Ala Val Gly Thr Ala Lys Val Asn Ala Gly Arg
1205 1210 1215
Trp Leu Leu Leu Gly Gly Gly Gly Gly Leu Gly Ala Ala Leu Arg Ala
1220 1225 1230
Met Leu Glu Ala Gly Gly His Ala Val Val His Ala Ala Glu Asn Asn
1235 1240 1245
Thr Ser Ala Ala Gly Val Arg Ala Leu Leu Ala Lys Ala Phe Asp Gly
1250 1255 1260
Gln Ala Pro Thr Ala Val Val His Leu Gly Ser Leu Asp Gly Gly Gly
1265 1270 1275 1280
Glu Leu Asp Pro Gly Leu Gly Ala Gln Gly Ala Leu Asp Ala Pro Arg
1285 1290 1295
Ser Ala Asp Val Ser Pro Asp Ala Leu Asp Pro Ala Leu Val Arg Gly
1300 1305 1310
Cys Asp Ser Val Leu Trp Thr Val Gln Ala Leu Ala Gly Met Gly Phe
1315 1320 1325
Arg Asp Ala Pro Arg Leu Trp Leu Leu Thr Arg Gly Ala Gln Ala Val
1330 1335 1340
Gly Ala Gly Asp Val Ser Val Thr Gln Ala Pro Leu Leu Gly Leu Gly
1345 1350 1355 1360
Arg Val Ile Ala Met Glu His Ala Asp Leu Arg Cys Ala Arg Val Asp
1365 1370 1375
Leu Asp Pro Ala Arg Pro Glu Gly Glu Leu Ala Ala Leu Leu Ala Glu
1380 1385 1390
Leu Leu Ala Asp Asp Ala Glu Ala Glu Val Ala Leu Arg Gly Gly Glu
1395 1400 1405
Arg Cys Val Ala Arg Ile Val Arg Arg Gln Pro Glu Thr Arg Pro Arg
1410 1415 1420
Gly Arg Ile Glu Ser Cys Val Pro Thr Asp Val Thr Ile Arg Ala Asp
1425 1430 1435 1440
Ser Thr Tyr Leu Val Thr Gly Gly Leu Gly Gly Leu Gly Leu Ser Val
1445 1450 1455
Ala Gly Trp Leu Ala Glu Arg Gly Ala Gly His Leu Val Leu Val Gly
1460 1465 1470
Arg Ser Gly Ala Ala Ser Val Glu Gln Arg Ala Ala Val Ala Ala Leu
1475 1480 1485
Glu Ala Arg Gly Ala Arg Val Thr Val Ala Lys Ala Asp Val Ala Asp
1490 1495 1500
Arg Ala Gln Leu Glu Arg Ile Leu Arg Glu Val Thr Thr Ser Gly Met
1505 1510 1515 1520
Pro Leu Arg Gly Val Val His Ala Ala Gly Ile Leu Asp Asp Gly Leu
1525 1530 1535
Leu Met Gln Gln Thr Pro Ala Arg Phe Arg Lys Val Met Ala Pro Lys
1540 1545 1550
Val Gln Gly Ala Leu His Leu His Ala Leu Thr Arg Glu Ala Pro Leu
1555 1560 1565
Ser Phe Phe Val Leu Tyr Ala Ser Gly Val Gly Leu Leu Gly Ser Pro
1570 1575 1580
Gly Gln Gly Asn Tyr Ala Ala Ala Asn Thr Phe Leu Asp Ala Leu Ala
1585 1590 1595 1600
His His Arg Arg Ala Gln Gly Leu Pro Ala Leu Ser Val Asp Trp Gly
1605 1610 1615
Leu Phe Ala Glu Val Gly Met Ala Ala Ala Gln Glu Asp Arg Gly Ala
1620 1625 1630
Arg Leu Val Ser Arg Gly Met Arg Ser Leu Thr Pro Asp Glu Gly Leu
1635 1640 1645
Ser Ala Leu Ala Arg Leu Leu Glu Ser Gly Arg Ala Gln Val Gly Val
1650 1655 1660
Met Pro Val Asn Pro Arg Leu Trp Val Glu Leu Tyr Pro Ala Ala Ala
1665 1670 1675 1680
Ser Ser Arg Met Leu Ser Arg Leu Val Thr Ala His Arg Ala Ser Ala
1685 1690 1695
Gly Gly Pro Ala Gly Asp Gly Asp Leu Leu Arg Arg Leu Ala Ala Ala
1700 1705 1710
Glu Pro Ser Ala Arg Ser Ala Leu Leu Glu Pro Leu Leu Arg Ala Gln
1715 1720 1725
Ile Ser Gln Val Leu Arg Leu Pro Glu Gly Lys Ile Glu Val Asp Ala
1730 1735 1740
Pro Leu Thr Ser Leu Gly Met Asn Ser Leu Met Gly Leu Glu Leu Arg
1745 1750 1755 1760
Asn Arg Ile Glu Ala Met Leu Gly Ile Thr Val Pro Ala Thr Leu Leu
1765 1770 1775
Trp Thr Tyr Pro Thr Val Ala Ala Leu Ser Gly His Leu Ala Arg Glu
1780 1785 1790
Ala Cys Glu Ala Ala Pro Val Glu Ser Pro His Thr Thr Ala Asp Ser
1795 1800 1805
Ala Val Glu Ile Glu Glu Met Ser Gln Asp Asp Leu Thr Gln Leu Ile
1810 1815 1820
Ala Ala Lys Phe Lys Ala Leu Thr
1825 1830
<210>5
<211>7257
<212>PRT
<213>Sorangium cellulosum
<400>5
Met Thr Thr Arg Gly Pro Thr Ala Gln Gln Asn Pro Leu Lys Gln Ala
1 5 10 15
Ala Ile Ile Ile Gln Arg Leu Glu Glu Arg Leu Ala Gly Leu Ala Gln
20 25 30
Ala Glu Leu Glu Arg Thr Glu Pro Ile Ala Ile Val Gly Ile Gly Cys
35 40 45
Arg Phe Pro Gly Gly Ala Asp Ala Pro Glu Ala Phe Trp Glu Leu Leu
50 55 60
Asp Ala Glu Arg Asp Ala Val Gln Pro Leu Asp Met Arg Trp Ala Leu
65 70 75 80
Val Gly Val Ala Pro Val Glu Ala Val Pro His Trp Ala Gly Leu Leu
85 90 95
Thr Glu Pro Ile Asp Cys Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro
100 105 110
Arg Glu Ala Arg Ser Leu Asp Pro Gln His Arg Leu Leu Leu Glu Val
115 120 125
Ala Trp Glu Gly Leu Glu Asp Ala Gly Ile Pro Pro Arg Ser Ile Asp
130 135 140
Gly Ser Arg Thr Gly Val Phe Val Gly Ala Phe Thr Ala Asp Tyr Ala
145 150 155 160
Arg Thr Val Ala Arg Leu Pro Arg Glu Glu Arg Asp Ala Tyr Ser Ala
165 170 175
Thr Gly Asn Met Leu Ser Ile Ala Ala Gly Arg Leu Ser Tyr Thr Leu
180 185 190
Gly Leu Gln Gly Pro Cys Leu Thr Val Asp Thr Ala Cys Ser Ser Ser
195 200 205
Leu Val Ala Ile His Leu Ala Cys Arg Ser Leu Arg Ala Gly Glu Ser
210 215 220
Asp Leu Ala Leu Ala Gly Gly Val Ser Ala Leu Leu Ser Pro Asp Met
225 230 235 240
Met Glu Ala Ala Ala Arg Thr Gln Ala Leu Ser Pro Asp Gly Arg Cys
245 250 255
Arg Thr Phe Asp Ala Ser Ala Asn Gly Phe Val Arg Gly Glu Gly Cys
260 265 270
Gly Leu Val Val Leu Lys Arg Leu Ser Asp Ala Gln Arg Asp Gly Asp
275 280 285
Arg Ile Trp Ala Leu Ile Arg Gly Ser Ala Ile Asn His Asp Gly Arg
290 295 300
Ser Thr Gly Leu Thr Ala Pro Asn Val Leu Ala Gln Glu Thr Val Leu
305 310 315 320
Arg Glu Ala Leu Arg Ser Ala His Val Glu Ala Gly Ala Val Asp Tyr
325 330 335
Val Glu Thr His Gly Thr Gly Thr Ser Leu Gly Asp Pro Ile Glu Val
340 345 350
Glu Ala Leu Arg Ala Thr Val Gly Pro Ala Arg Ser Asp Gly Thr Arg
355 360 365
Cys Val Leu Gly Ala Val Lys Thr Asn Ile Gly His Leu Glu Ala Ala
370 375 380
Ala Gly Val Ala Gly Leu Ile Lys Ala Ala Leu Ser Leu Thr His Glu
385 390 395 400
Arg Ile Pro Arg Asn Leu Asn Phe Arg Thr Leu Asn Pro Arg Ile Arg
405 410 415
Leu Glu Gly Ser Ala Leu Ala Leu Ala Thr Glu Pro Val Pro Trp Pro
420 425 430
Arg Thr Asp Arg Pro Arg Phe Ala Gly Val Ser Ser Phe Gly Met Ser
435 440 445
Gly Thr Asn Ala His Val Val Leu Glu Glu Ala Pro Ala Val Glu Leu
450 455 460
Trp Pro Ala Ala Pro Glu Arg Ser Ala Glu Leu Leu Val Leu Ser Gly
465 470 475 480
Lys Ser Glu Gly Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Glu His
485 490 495
Leu Asp Met His Pro Glu Leu Gly Leu Gly Asp Val Ala Phe Ser Leu
500 505 510
Ala Thr Thr Arg Ser Ala Met Ser His Arg Leu Ala Val Ala Val Thr
515 520 525
Ser Arg Glu Gly Leu Leu Ala Ala Leu Ser Ala Val Ala Gln Gly Gln
530 535 540
Thr Pro Ala Gly Ala Ala Arg Cys Ile Ala Ser Ser Ser Arg Gly Lys
545 550 555 560
Leu Ala Phe Leu Phe Thr Gly Gln Gly Ala Gln Thr Pro Gly Met Gly
565 570 575
Arg Gly Leu Cys Ala Ala Trp Pro Ala Phe Arg Glu Ala Phe Asp Arg
580 585 590
Cys Val Ala Leu Phe Asp Arg Glu Leu Asp Arg Pro Leu Arg Glu Val
595 600 605
Met Trp Ala Glu Ala Gly Ser Ala Glu Ser Leu Leu Leu Asp Gln Thr
610 615 620
Ala Phe Thr Gln Pro Ala Leu Phe Ala Val Glu Tyr Ala Leu Thr Ala
625 630 635 640
Leu Trp Arg Ser Trp Gly Val Glu Pro Glu Leu Leu Val Gly His Ser
645 650 655
Ile Gly Glu Leu Val Ala Ala Cys Val Ala Gly Val Phe Ser Leu Glu
660 665 670
Asp Gly Val Arg Leu Val Ala Ala Arg Gly Arg Leu Met Gln Gly Leu
675 680 685
Ser Ala Gly Gly Ala Met Val Ser Leu Gly Ala Pro Glu Ala Glu Val
690 695 700
Ala Ala Ala Val Ala Pro His Ala Ala Ser Val Ser Ile Ala Ala Val
705 710 715 720
Asn Gly Pro Glu Gln Val Val Ile Ala Gly Val Glu Gln Ala Val Gln
725 730 735
Ala Ile Ala Ala Gly Phe Ala Ala Arg Gly Ala Arg Thr Lys Arg Leu
740 745 750
His Val Ser His Ala Phe His Ser Pro Leu Met Glu Pro Met Leu Glu
755 760 765
Glu Phe Gly Arg Val Ala Ala Ser Val Thr Tyr Arg Arg Pro Ser Val
770 775 780
Ser Leu Val Ser Asn Leu Ser Gly Lys Val Val Thr Asp Glu Leu Ser
785 790 795 800
Ala Pro Gly Tyr Trp Val Arg His Val Arg Glu Ala Val Arg Phe Ala
805 810 815
Asp Gly Val Lys Ala Leu His Glu Ala Gly Ala Gly Thr Phe Val Glu
820 825 830
Val Gly Pro Lys Pro Thr Leu Leu Gly Leu Leu Pro Ala Cys Leu Pro
835 840 845
Glu Ala Glu Pro Thr Leu Leu Ala Ser Leu Arg Ala Gly Arg Glu Glu
850 855 860
Ala Ala Gly Val Leu Glu Ala Leu Gly Arg Leu Trp Ala Ala Gly Gly
865 870 875 880
Ser Val Ser Trp Pro Gly Val Phe Pro Thr Ala Gly Arg Arg Val Pro
885 890 895
Leu Pro Thr Tyr Pro Trp Gln Arg Gln Arg Tyr Trp Ile Glu Ala Pro
900 905 910
Ala Glu Gly Leu Gly Ala Thr Ala Ala Asp Ala Leu Ala Gln Trp Phe
915 920 925
Tyr Arg Val Asp Trp Pro Glu Met Pro Arg Ser Ser Val Asp Ser Arg
930 935 940
Arg Ala Arg Ser Gly Gly Trp Leu Val Leu Ala Asp Arg Gly Gly Val
945 950 955 960
Gly Glu Ala Ala Ala Ala Ala Leu Ser Ser Gln Gly Cys Ser Cys Ala
965 970 975
Val Leu His Ala Pro Ala Glu Ala Ser Ala Val Ala Glu Gln Val Thr
980 985 990
Gln Ala Leu Gly Gly Arg Asn Asp Trp Gln Gly Val Leu Tyr Leu Trp
995 1000 1005
Gly Leu Asp Ala Val Val Glu Ala Gly Ala Ser Ala Glu Glu Val Ala
1010 1015 1020
Lys Val Thr His Leu Ala Ala Ala Pro Val Leu Ala Leu Ile Gln Ala
1025 1030 1035 1040
Leu Gly Thr Gly Pro Arg Ser Pro Arg Leu Trp Ile Val Thr Arg Gly
1045 1050 1055
Ala Cys Thr Val Gly Gly Glu Pro Asp Ala Ala Pro Cys Gln Ala Ala
1060 1065 1070
Leu Trp Gly Met Gly Arg Val Ala Ala Leu Glu His Pro Gly Ser Trp
1075 1080 1085
Gly Gly Leu Val Asp Leu Asp Pro Glu Glu Ser Pro Thr Glu Val Glu
1090 1095 1100
Ala Leu Val Ala Glu Leu Leu Ser Pro Asp Ala Glu Asp Gln Leu Ala
1105 1110 1115 1120
Phe Arg Gln Gly Arg Arg Arg Ala Ala Arg Leu Val Ala Ala Pro Pro
1125 1130 1135
Glu Gly Asn Ala Ala Pro Val Ser Leu Ser Ala Glu Gly Ser Tyr Leu
1140 1145 1150
Val Thr Gly Gly Leu Gly Ala Leu Gly Leu Leu Val Ala Arg Trp Leu
1155 1160 1165
Val Glu Arg Gly Ala Gly His Leu Val Leu Ile Ser Arg His Gly Leu
1170 1175 1180
Pro Asp Arg Glu Glu Trp Gly Arg Asp Gln Pro Pro Glu Val Arg Ala
1185 1190 1195 1200
Arg Ile Ala Ala Ile Glu Ala Leu Glu Ala Gln Gly Ala Arg Val Thr
1205 1210 1215
Val Ala Ala Val Asp Val Ala Asp Ala Glu Gly Met Ala Ala Leu Leu
1220 1225 1230
Ala Ala Val Glu Pro Pro Leu Arg Gly Val Val His Ala Ala Gly Leu
1235 1240 1245
Leu Asp Asp Gly Leu Leu Ala His Gln Asp Ala Gly Arg Leu Ala Arg
1250 1255 1260
Val Leu Arg Pro Lys Val Glu Gly Ala Trp Val Leu His Thr Leu Thr
1265 1270 1275 1280
Arg Glu Gln Pro Leu Asp Leu Phe Val Leu Phe Ser Ser Ala Ser Gly
1285 1290 1295
Val Phe Gly Ser Ile Gly Gln Gly Ser Tyr Ala Ala Gly Asn Ala Phe
1300 1305 1310
Leu Asp Ala Leu Ala Asp Leu Arg Arg Thr Gln Gly Leu Ala Ala Leu
1315 1320 1325
Ser Ile Ala Trp Gly Leu Trp Ala Glu Gly Gly Met Gly Ser Gln Ala
1330 1335 1340
Gln Arg Arg Glu His Glu Ala Ser Gly Ile Trp Ala Met Pro Thr Ser
1345 1350 1355 1360
Arg Ala Leu Ala Ala Met Glu Trp Leu Leu Gly Thr Arg Ala Thr Gln
1365 1370 1375
Arg Val Val Ile Gln Met Asp Trp Ala His Ala Gly Ala Ala Pro Arg
1380 1385 1390
Asp Ala Ser Arg Gly Arg Phe Trp Asp Arg Leu Val Thr Ala Thr Lys
1395 1400 1405
Glu Ala Ser Ser Ser Ala Val Pro Ala Val Glu Arg Trp Arg Asn Ala
1410 1415 1420
Ser Val Val Glu Thr Arg Ser Ala Leu Tyr Glu Leu Val Arg Gly Val
1425 1430 1435 1440
Val Ala Gly Val Met Gly Phe Thr Asp Gln Gly Thr Leu Asp Val Arg
1445 1450 1455
Arg Gly Phe Ala Glu Gln Gly Leu Asp Ser Leu Met Ala Val Glu Ile
1460 1465 1470
Arg Lys Arg Leu Gln Gly Glu Leu Gly Met Pro Leu Ser Ala Thr Leu
1475 1480 1485
Ala Phe Asp His Pro Thr Val Glu Arg Leu Val Glu Tyr Leu Leu Ser
1490 1495 1500
Gln Ala Leu Glu Leu Gln Asp Arg Thr Asp Val Arg Ser Val Arg Leu
1505 1510 1515 1520
Pro Ala Thr Glu Asp Pro Ile Ala Ile Val Gly Ala Ala Cys Arg Phe
1525 1530 1535
Pro Gly Gly Val Glu Asp Leu Glu Ser Tyr Trp Gln Leu Leu Thr Glu
1540 1545 1550
Gly Val Val Val Ser Thr Glu Val Pro Ala Asp Arg Trp Asn Gly Ala
1555 1560 1565
Asp Gly Arg Val Pro Gly Ser Gly Glu Ala Gln Arg Gln Thr Tyr Val
1570 1575 1580
Pro Arg Gly Gly Phe Leu Arg Glu Val Glu Thr Phe Asp Ala Ala Phe
1585 1590 1595 1600
Phe His Ile Ser Pro Arg Glu Ala Met Ser Leu Asp Pro Gln Gln Arg
1605 1610 1615
Leu Leu Leu Glu Val Ser Trp Glu Ala Ile Glu Arg Ala Gly Gln Asp
1620 1625 1630
Pro Ser Ala Leu Arg Glu Ser Pro Thr Gly Val Phe Val Gly Ala Gly
1635 1640 1645
Pro Asn Glu Tyr Ala Glu Arg Val Gln Glu Leu Ala Asp Glu Ala Ala
1650 1655 1660
Gly Leu Tyr Ser Gly Thr Gly Asn Met Leu Ser Val Ala Ala Gly Arg
1665 1670 1675 1680
Leu Ser Phe Phe Leu Gly Leu His Gly Pro Thr Leu Ala Val Asp Thr
1685 1690 1695
Ala Cys Ser Ser Ser Leu Val Ala Leu His Leu Gly Cys Gln Ser Leu
1700 1705 1710
Arg Arg Gly Glu Cys Asp Gln Ala Leu Val Gly Gly Val Asn Met Leu
1715 1720 1725
Leu Ser Pro Lys Thr Phe Ala Leu Leu Ser Arg Met His Ala Leu Ser
1730 1735 1740
Pro Gly Gly Arg Cys Lys Thr Phe Ser Ala Asp Ala Asp Gly Tyr Ala
1745 1750 1755 1760
Arg Ala Glu Gly Cys Ala Val Val Val Leu Lys Arg Leu Ser Asp Ala
1765 1770 1775
Gln Arg Asp Arg Asp Pro Ile Leu Ala Val Ile Arg Gly Thr Ala Ile
1780 1785 1790
Asn His Asp Gly Pro Ser Ser Gly Leu Thr Val Pro Ser Gly Pro Ala
1795 1800 1805
Gln Glu Ala Leu Leu Arg Gln Ala Leu Ala His Ala Gly Val Val Pro
1810 1815 1820
Ala Asp Val Asp Phe Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly
1825 1830 1835 1840
Asp Pro Ile Glu Val Arg Ala Leu Ser Asp Val Tyr Gly Gln Ala Arg
1845 1850 1855
Pro Ala Asp Arg Pro Leu Ile Leu Gly Ala Ala Lys Ala Asn Leu Gly
1860 1865 1870
His Met Glu Pro Ala Ala Gly Leu Ala Gly Leu Leu Lys Ala Val Leu
1875 1880 1885
Ala Leu Gly Gln Glu Gln Ile Pro Ala Gln Pro Glu Leu Gly Glu Leu
1890 1895 1900
Asn Pro Leu Leu Pro Trp Glu Ala Leu Pro Val Ala Val Ala Arg Ala
1905 1910 1915 1920
Ala Val Pro Trp Pro Arg Thr Asp Arg Pro Arg Phe Ala Gly Val Ser
1925 1930 1935
Ser Phe Gly Met Ser Gly Thr Asn Ala His Val Val Leu Glu Glu Ala
1940 1945 1950
Pro Ala Val Glu Leu Trp Pro Ala Ala Pro Glu Arg Ser Ala Glu Leu
1955 1960 1965
Leu Val Leu Ser Gly Lys Ser Glu Gly Ala Leu Asp Ala Gln Ala Ala
1970 1975 1980
Arg Leu Arg Glu His Leu Asp Met His Pro Glu Leu Gly Leu Gly Asp
1985 1990 1995 2000
Val Ala Phe Ser Leu Ala Thr Thr Arg Ser Ala Met Asn His Arg Leu
2005 2010 2015
Ala Val Ala Val Thr Ser Arg Glu Gly Leu Leu Ala Ala Leu Ser Ala
2020 2025 2030
Val Ala Gln Gly Gln Thr Pro Pro Gly Ala Ala Arg Cys Ile Ala Ser
2035 2040 2045
Ser Ser Arg Gly Lys Leu Ala Phe Leu Phe Thr Gly Gln Gly Ala Gln
2050 2055 2060
Thr Pro Gly Met Gly Arg Gly Leu Cys Ala Ala Trp Pro Ala Phe Arg
2065 2070 2075 2080
Glu Ala Phe Asp Arg Cys Val Ala Leu Phe Asp Arg Glu Leu Asp Arg
2085 2090 2095
Pro Leu Arg Glu Val Met Trp Ala Glu Pro Gly Ser Ala Glu Ser Leu
2100 2105 2110
Leu Leu Asp Gln Thr Ala Phe Thr Gln Pro Ala Leu Phe Thr Val Glu
2115 2120 2125
Tyr Ala Leu Thr Ala Leu Trp Arg Ser Trp Gly Val Glu Pro Glu Leu
2130 2135 2140
Val Ala Gly His Ser Ala Gly Glu Leu Val Ala Ala Cys Val Ala Gly
2145 2150 2155 2160
Val Phe Ser Leu Glu Asp Gly Val Arg Leu Val Ala Ala Arg Gly Arg
2165 2170 2175
Leu Met Gln Gly Leu Ser Ala Gly Gly Ala Met Val Ser Leu Gly Ala
2180 2185 2190
Pro Glu Ala Glu Val Ala Ala Ala Val Ala Pro His Ala Ala Ser Val
2195 2200 2205
Ser Ile Ala Ala Val Asn Gly Pro Glu Gln Val Val Ile Ala Gly Val
2210 2215 2220
Glu Gln Ala Val Gln Ala Ile Ala Ala Gly Phe Ala Ala Arg Gly Ala
2225 2230 2235 2240
Arg Thr Lys Arg Leu His Val Ser His Ala Ser His Ser Pro Leu Met
2245 2250 2255
Glu Pro Met Leu Glu Glu Phe Gly Arg Val Ala Ala Ser Val Thr Tyr
2260 2265 2270
Arg Arg Pro Ser Val Ser Leu Val Ser Asn Leu Ser Gly Lys Val Val
2275 2280 2285
Ala Asp Glu Leu Ser Ala Pro Gly Tyr Trp Val Arg His Val Arg Glu
2290 2295 2300
Ala Val Arg Phe Ala Asp Gly Val Lys Ala Leu His Glu Ala Gly Ala
2305 2310 2315 2320
Gly Thr Phe Val Glu Val Gly Pro Lys Pro Thr Leu Leu Gly Leu Leu
2325 2330 2335
Pro Ala Cys Leu Pro Glu Ala Glu Pro Thr Leu Leu Ala Ser Leu Arg
2340 2345 2350
Ala Gly Arg Glu Glu Ala Ala Gly Val Leu Glu Ala Leu Gly Arg Leu
2355 2360 2365
Trp Ala Ala Gly Gly Ser Val Ser Trp Pro Gly Val Phe Pro Thr Ala
2370 2375 2380
Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln Arg Gln Arg Tyr
2385 2390 2395 2400
Trp Pro Asp Ile Glu Pro Asp Ser Arg Arg His Ala Ala Ala Asp Pro
2405 2410 2415
Thr Gln Gly Trp Phe Tyr Arg Val Asp Trp Pro Glu Ile Pro Arg Ser
2420 2425 2430
Leu Gln Lys Ser Glu Glu Ala Ser Arg Gly Ser Trp Leu Val Leu Ala
2435 2440 2445
Asp Lys Gly Gly Val Gly Glu Ala Val Ala Ala Ala Leu Ser Thr Arg
2450 2455 2460
Gly Leu Pro Cys Val Val Leu His Ala Pro Ala Glu Thr Ser Ala Thr
2465 2470 2475 2480
Ala Glu Leu Val Thr Glu Ala Ala Gly Gly Arg Ser Asp Trp Gln Val
2485 2490 2495
Val Leu Tyr Leu Trp Gly Leu Asp Ala Val Val Gly Ala Glu Ala Ser
2500 2505 2510
Ile Asp Glu Ile Gly Asp Ala Thr Arg Arg Ala Thr Ala Pro Val Leu
2515 2520 2525
Gly Leu Ala Arg Phe Leu Ser Thr Val Ser Cys Ser Pro Arg Leu Trp
2530 2535 2540
Val Val Thr Arg Gly Ala Cys Ile Val Gly Asp Glu Pro Ala Ile Ala
2545 2550 2555 2560
Pro Cys Gln Ala Ala Leu Trp Gly Met Gly Arg Val Ala Ala Leu Glu
2565 2570 2575
His Pro Gly Ala Trp Gly Gly Leu Val Asp Leu Asp Pro Arg Ala Ser
2580 2585 2590
Pro Pro Gln Ala Ser Pro Ile Asp Gly Glu Met Leu Val Thr Glu Leu
2595 2600 2605
Leu Ser Gln Glu Thr Glu Asp Gln Leu Ala Phe Arg His Gly Arg Arg
2610 2615 2620
His Ala Ala Arg Leu Val Ala Ala Pro Pro Gln Gly Gln Ala Ala Pro
2625 2630 2635 2640
Val Ser Leu Ser Ala Glu Ala Ser Tyr Leu Val Thr Gly Gly Leu Gly
2645 2650 2655
Gly Leu Gly Leu Ile Val Ala Gln Trp Leu Val Glu Leu Gly Ala Arg
2660 2665 2670
His Leu Val Leu Thr Ser Arg Arg Gly Leu Pro Asp Arg Gln Ala Trp
2675 2680 2685
Cys Glu Gln Gln Pro Pro Glu Ile Arg Ala Arg Ile Ala Ala Val Glu
2690 2695 2700
Ala Leu Glu Ala Arg Gly Ala Arg Val Thr Val Ala Ala Val Asp Val
2705 2710 2715 2720
Ala Asp Val Glu Pro Met Thr Ala Leu Val Ser Ser Val Glu Pro Pro
2725 2730 2735
Leu Arg Gly Val Val His Ala Ala Gly Val Ser Val Met Arg Pro Leu
2740 2745 2750
Ala Glu Thr Asp Glu Thr Leu Leu Glu Ser Val Leu Arg Pro Lys Val
2755 2760 2765
Ala Gly Ser Trp Leu Leu His Arg Leu Leu His Gly Arg Pro Leu Asp
2770 2775 2780
Leu Phe Val Leu Phe Ser Ser Gly Ala Ala Val Trp Gly Ser His Ser
2785 2790 2795 2800
Gln Gly Ala Tyr Ala Ala Ala Asn Ala Phe Leu Asp Gly Leu Ala His
2805 2810 2815
Leu Arg Arg Ser Gln Ser Leu Pro Ala Leu Ser Val Ala Trp Gly Leu
2820 2825 2830
Trp Ala Glu Gly Gly Met Ala Asp Ala Glu Ala His Ala Arg Leu Ser
2835 2840 2845
Asp Ile Gly Val Leu Pro Met Ser Thr Ser Ala Ala Leu Ser Ala Leu
2850 2855 2860
Gln Arg Leu Val Glu Thr Gly Ala Ala Gln Arg Thr Val Thr Arg Met
2865 2870 2875 2880
Asp Trp Ala Arg Phe Ala Pro Val Tyr Thr Ala Arg Gly Arg Arg Asn
2885 2890 2895
Leu Leu Ser Ala Leu Val Ala Gly Arg Asp Ile Ile Ala Pro Ser Pro
2900 2905 2910
Pro Ala Ala Ala Thr Arg Asn Trp Arg Gly Leu Ser Val Ala Glu Ala
2915 2920 2925
Arg Val Ala Leu His Glu Ile Val His Gly Ala Val Ala Arg Val Leu
2930 2935 2940
Gly Phe Leu Asp Pro Ser Ala Leu Asp Pro Gly Met Gly Phe Asn Glu
2945 2950 2955 2960
Gln Gly Leu Asp Ser Leu Met Ala Val Glu Ile Arg Asn Leu Leu Gln
2965 2970 2975
Ala Glu Leu Asp Val Arg Leu Ser Thr Thr Leu Ala Phe Asp His Pro
2980 2985 2990
Thr Val Gln Arg Leu Val Glu His Leu Leu Val Asp Val Leu Lys Leu
2995 3000 3005
Glu Asp Arg Ser Asp Thr Gln His Val Arg Ser Leu Ala Ser Asp Glu
3010 3015 3020
Pro Ile Ala Ile Val Gly Ala Ala Cys Arg Phe Pro Gly Gly Val Glu
3025 3030 3035 3040
Asp Leu Glu Ser Tyr Trp Gln Leu Leu Ala Glu Gly Val Val Val Ser
3045 3050 3055
Ala Glu Val Pro Ala Asp Arg Trp Asp Ala Ala Asp Trp Tyr Asp Pro
3060 3065 3070
Asp Pro Glu Ile Pro Gly Arg Thr Tyr Val Thr Lys Gly Ala Phe Leu
3075 3080 3085
Arg Asp Leu Gln Arg Leu Asp Ala Thr Phe Phe Arg Ile Ser Pro Arg
3090 3095 3100
Glu Ala Met Ser Leu Asp Pro Gln Gln Arg Leu Leu Leu Glu Val Ser
3105 3110 3115 3120
Trp Glu Ala Leu Glu Ser Ala Gly Ile Ala Pro Asp Thr Leu Arg Asp
3125 3130 3135
Ser Pro Thr Gly Val Phe Val Gly Ala Gly Pro Asn Glu Tyr Tyr Thr
3140 3145 3150
Gln Arg Leu Arg Gly Phe Thr Asp Gly Ala Ala Gly Leu Tyr Gly Gly
3155 3160 3165
Thr Gly Asn Met Leu Ser Val Thr Ala Gly Arg Leu Ser Phe Phe Leu
3170 3175 3180
Gly Leu His Gly Pro Thr Leu Ala Met Asp Thr Ala Cys Ser Ser Ser
3185 3190 3195 3200
Leu Val Ala Leu His Leu Ala Cys Gln Ser Leu Arg Leu Gly Glu Cys
3205 3210 3215
Asp Gln Ala Leu Val Gly Gly Val Asn Val Leu Leu Ala Pro Glu Thr
3220 3225 3230
Phe Val Leu Leu Ser Arg Met Arg Ala Leu Ser Pro Asp Gly Arg Cys
3235 3240 3245
Lys Thr Phe Ser Ala Asp Ala Asp Gly Tyr Ala Arg Gly Glu Gly Cys
3250 3255 3260
Ala Val Val Val Leu Lys Arg Leu Arg Asp Ala Gln Arg Ala Gly Asp
3265 3270 3275 3280
Ser Ile Leu Ala Leu Ile Arg Gly Ser Ala Val Asn His Asp Gly Pro
3285 3290 3295
Ser Ser Gly Leu Thr Val Pro Asn Gly Pro Ala Gln Gln Ala Leu Leu
3300 3305 3310
Arg Gln Ala Leu Ser Gln Ala Gly Val Ser Pro Val Asp Val Asp Phe
3315 3320 3325
Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly Asp Pro Ile Glu Val
3330 3335 3340
Gln Ala Leu Ser Glu Val Tyr Gly Pro Gly Arg Ser Gly Asp Arg Pro
3345 3350 3355 3360
Leu Val Leu Gly Ala Ala Lys Ala Asn Val Ala His Leu Glu Ala Ala
3365 3370 3375
Ser Gly Leu Ala Ser Leu Leu Lys Ala Val Leu Ala Leu Arg His Glu
3380 3385 3390
Gln Ile Pro Ala Gln Pro Glu Leu Gly Glu Leu Asn Pro His Leu Pro
3395 3400 3405
Trp Asn Thr Leu Pro Val Ala Val Pro Arg Lys Ala Val Pro Trp Gly
3410 3415 3420
Arg Gly Ala Arg Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Leu Ser
3425 3430 3435 3440
Gly Thr Asn Val His Val Val Leu Glu Glu Ala Pro Glu Val Glu Pro
3445 3450 3455
Ala Pro Ala Ala Pro Ala Arg Pro Val Glu Leu Val Val Leu Ser Ala
3460 3465 3470
Lys Ser Ala Ala Ala Leu Asp Ala Ala Ala Ala Arg Leu Ser Ala His
3475 3480 3485
Leu Ser Ala His Pro Glu Leu Ser Leu Gly Asp Val Ala Phe Ser Leu
3490 3495 3500
Ala Thr Thr Arg Ser Pro Met Glu His Arg Leu Ala Ile Ala Thr Thr
3505 3510 3515 3520
Ser Arg Glu Ala Leu Arg Gly Ala Leu Asp Ala Ala Ala Gln Gln Lys
3525 3530 3535
Thr Pro Gln Gly Ala Val Arg Gly Lys Ala Val Ser Ser Arg Gly Lys
3540 3545 3550
Leu Ala Phe Leu Phe Thr Gly Gln Gly Ala Gln Met Pro Gly Met Gly
3555 3560 3565
Arg Gly Leu Tyr Glu Thr Trp Pro Ala Phe Arg Glu Ala Phe Asp Arg
3570 3575 3580
Cys Val Ala Leu Phe Asp Arg Glu Ile Asp Gln Pro Leu Arg Glu Val
3585 3590 3595 3600
Met Trp Ala Ala Pro Gly Leu Ala Gln Ala Ala Arg Leu Asp Gln Thr
3605 3610 3615
Ala Tyr Ala Gln Pro Ala Leu Phe Ala Leu Glu Tyr Ala Leu Ala Ala
3620 3625 3630
Leu Trp Arg Ser Trp Gly Val Glu Pro His Val Leu Leu Gly His Ser
3635 3640 3645
Ile Gly Glu Leu Val Ala Ala Cys Val Ala Gly Val Phe Ser Leu Glu
3650 3655 3660
Asp Ala Val Arg Leu Val Ala Ala Arg Gly Arg Leu Met Gln Ala Leu
3665 3670 3675 3680
Pro Ala Gly Gly Ala Met Val Ala Ile Ala Ala Ser Glu Ala Glu Val
3685 3690 3695
Ala Ala Ser Val Ala Pro His Ala Ala Thr Val Ser Ile Ala Ala Val
3700 3705 3710
Asn Gly Pro Asp Ala Val Val Ile Ala Gly Ala Glu Val Gln Val Leu
3715 3720 3725
Ala Leu Gly Ala Thr Phe Ala Ala Arg Gly Ile Arg Thr Lys Arg Leu
3730 3735 3740
Ala Val Ser His Ala Phe His Ser Pro Leu Met Asp Pro Met Leu Glu
3745 3750 3755 3760
Asp Phe Gln Arg Val Ala Ala Thr Ile Ala Tyr Arg Ala Pro Asp Arg
3765 3770 3775
Pro Val Val Ser Asn Val Thr Gly His Val Ala Gly Pro Glu Ile Ala
3780 3785 3790
Thr Pro Glu Tyr Trp Val Arg His Val Arg Ser Ala Val Arg Phe Gly
3795 3800 3805
Asp Gly Ala Lys Ala Leu His Ala Ala Gly Ala Ala Thr Phe Val Glu
3810 3815 3820
Val Gly Pro Lys Pro Val Leu Leu Gly Leu Leu Pro Ala Cys Leu Gly
3825 3830 3835 3840
Glu Ala Asp Ala Val Leu Val Pro Ser Leu Arg Ala Asp Arg Ser Glu
3845 3850 3855
Cys Glu Val Val Leu Ala Ala Leu Gly Ala Trp Tyr Ala Trp Gly Gly
3860 3865 3870
Ala Leu Asp Trp Lys Gly Val Phe Pro Asp Gly Ala Arg Arg Val Ala
3875 3880 3885
Leu Pro Met Tyr Pro Trp Gln Arg Glu Arg His Trp Met Asp Leu Thr
3890 3895 3900
Pro Arg Ser Ala Ala Pro Ala Gly Ile Ala Gly Arg Trp Pro Leu Ala
3905 3910 3915 3920
Gly Val Gly Leu Cys Met Pro Gly Ala Val Leu His His Val Leu Ser
3925 3930 3935
Ile Gly Pro Arg His Gln Pro Phe Leu Gly Asp His Leu Val Phe Gly
3940 3945 3950
Lys Val Val Val Pro Gly Ala Phe His Val Ala Val Ile Leu Ser Ile
3955 3960 3965
Ala Ala Glu Arg Trp Pro Glu Arg Ala Ile Glu Leu Thr Gly Val Glu
3970 3975 3980
Phe Leu Lys Ala Ile Ala Met Glu Pro Asp Gln Glu Val Glu Leu His
3985 3990 3995 4000
Ala Val Leu Thr Pro Glu Ala Ala Gly Asp Gly Tyr Leu Phe Glu Leu
4005 4010 4015
Ala Thr Leu Ala Ala Pro Glu Thr Glu Arg Arg Trp Thr Thr His Ala
4020 4025 4030
Arg Gly Arg Val Gln Pro Thr Asp Gly Ala Pro Gly Ala Leu Pro Arg
4035 4040 4045
Leu Glu Val Leu Glu Asp Arg Ala Ile Gln Pro Leu Asp Phe Ala Gly
4050 4055 4060
Phe Leu Asp Arg Leu Ser Ala Val Arg Ile Gly Trp Gly Pro Leu Trp
4065 4070 4075 4080
Arg Trp Leu Gln Asp Gly Arg Val Gly Asp Glu Ala Ser Leu Ala Thr
4085 4090 4095
Leu Val Pro Thr Tyr Pro Asn Ala His Asp Val Ala Pro Leu His Pro
4100 4105 4110
Ile Leu Leu Asp Asn Gly Phe Ala Val Ser Leu Leu Ser Thr Arg Ser
4115 4120 4125
Glu Pro Glu Asp Asp Gly Thr Pro Pro Leu Pro Phe Ala Val Glu Arg
4130 4135 4140
Val Arg Trp Trp Arg Ala Pro Val Gly Arg Val Arg Cys Gly Gly Val
4145 4150 4155 4160
Pro Arg Ser Gln Ala Phe Gly Val Ser Ser Phe Val Leu Val Asp Glu
4165 4170 4175
Thr Gly Glu Val Val Ala Glu Val Glu Gly Phe Val Cys Arg Arg Ala
4180 4185 4190
Pro Arg Glu Val Phe Leu Arg Gln Glu Ser Gly Ala Ser Thr Ala Ala
4195 4200 4205
Leu Tyr Arg Leu Asp Trp Pro Glu Ala Pro Leu Pro Asp Ala Pro Ala
4210 4215 4220
Glu Arg Ile Glu Glu Ser Trp Val Val Val Ala Ala Pro Gly Ser Glu
4225 4230 4235 4240
Met Ala Ala Ala Leu Ala Thr Arg Leu Asn Arg Cys Val Leu Ala Glu
4245 4250 4255
Pro Lys Gly Leu Glu Ala Ala Leu Ala Gly Val Ser Pro Ala Gly Val
4260 4265 4270
Ile Cys Leu Trp Glu Ala Gly Ala His Glu Glu Ala Pro Ala Ala Ala
4275 4280 4285
Gln Arg Val Ala Thr Glu Gly Leu Ser Val Val Gln Ala Leu Arg Asp
4290 4295 4300
Arg Ala Val Arg Leu Trp Trp Val Thr Met Gly Ala Val Ala Val Glu
4305 4310 4315 4320
Ala Gly Glu Arg Val Gln Val Ala Thr Ala Pro Val Trp Gly Leu Gly
4325 4330 4335
Arg Thr Val Met Gln Glu Arg Pro Glu Leu Ser Cys Thr Leu Val Asp
4340 4345 4350
Leu Glu Pro Glu Ala Asp Ala Ala Arg Ser Ala Asp Val Leu Leu Arg
4355 4360 4365
Glu Leu Gly Arg Ala Asp Asp Glu Thr Gln Val Ala Phe Arg Ser Gly
4370 4375 4380
Lys Arg Arg Val Ala Arg Leu Val Lys Ala Thr Thr Pro Glu Gly Leu
4385 4390 4395 4400
Leu Val Pro Asp Ala Glu Ser Tyr Arg Leu Glu Ala Gly Gln Lys Gly
4405 4410 4415
Thr Leu Asp Gln Leu Arg Leu Ala Pro Ala Gln Arg Arg Ala Pro Gly
4420 4425 4430
Pro Gly Glu Val Glu Ile Lys Val Thr Ala Ser Gly Leu Asn Phe Arg
4435 4440 4445
Thr Val Leu Ala Val Leu Gly Met Tyr Pro Gly Asp Ala Gly Pro Met
4450 4455 4460
Gly Gly Asp Cys Ala Gly Val Ala Thr Ala Val Gly Gln Gly Val Arg
4465 4470 4475 4480
His Val Ala Val Gly Asp Ala Val Met Thr Leu Gly Thr Leu His Arg
4485 4490 4495
Phe Val Thr Val Asp Ala Arg Leu Val Val Arg Gln Pro Ala Gly Leu
4500 4505 4510
Thr Pro Ala Gln Ala Ala Thr Val Pro Val Ala Phe Leu Thr Ala Trp
4515 4520 4525
Leu Ala Leu His Asp Leu Gly Asn Leu Arg Arg Gly Glu Arg Val Leu
4530 4535 4540
Ile His Ala Ala Ala Gly Gly Val Gly Met Ala Ala Val Gln Ile Ala
4545 4550 4555 4560
Arg Trp Ile Gly Ala Glu Val Phe Ala Thr Ala Ser Pro Ser Lys Trp
4565 4570 4575
Ala Ala Val Gln Ala Met Gly Val Pro Arg Thr His Ile Ala Ser Ser
4580 4585 4590
Arg Thr Leu Glu Phe Ala Glu Thr Phe Arg Gln Val Thr Gly Gly Arg
4595 4600 4605
Gly Val Asp Val Val Leu Asn Ala Leu Ala Gly Glu Phe Val Asp Ala
4610 4615 4620
Ser Leu Ser Leu Leu Ser Thr Gly Gly Arg Phe Leu Glu Met Gly Lys
4625 4630 4635 4640
Thr Asp Ile Arg Asp Arg Ala Ala Val Ala Ala Ala His Pro Gly Val
4645 4650 4655
Arg Tyr Arg Val Phe Asp Ile Leu Glu Leu Ala Pro Asp Arg Thr Arg
4660 4665 4670
Glu Ile Leu Glu Arg Val Val Glu Gly Phe Ala Ala Gly His Leu Arg
4675 4680 4685
Ala Leu Pro Val His Ala Phe Ala Ile Thr Lys Ala Glu Ala Ala Phe
4690 4695 4700
Arg Phe Met Ala Gln Ala Arg His Gln Gly Lys Val Val Leu Leu Pro
4705 4710 4715 4720
Ala Pro Ser Ala Ala Pro Leu Ala Pro Thr Gly Thr Val Leu Leu Thr
4725 4730 4735
Gly Gly Leu Gly Ala Leu Gly Leu His Val Ala Arg Trp Leu Ala Gln
4740 4745 4750
Gln Gly Val Pro His Met Val Leu Thr Gly Arg Arg Gly Leu Asp Thr
4755 4760 4765
Pro Gly Ala Ala Lys Ala Val Ala Glu Ile Glu Ala Leu Gly Ala Arg
4770 4775 4780
Val Thr Ile Ala Ala Ser Asp Val Ala Asp Arg Asn Ala Leu Glu Ala
4785 4790 4795 4800
Val Leu Gln Ala Ile Pro Ala Glu Trp Pro Leu Gln Gly Val Ile His
4805 4810 4815
Ala Ala Gly Ala Leu Asp Asp Gly Val Leu Asp Glu Gln Thr Thr Asp
4820 4825 4830
Arg Phe Ser Arg Val Leu Ala Pro Lys Val Thr Gly Ala Trp Asn Leu
4835 4840 4845
His Glu Leu Thr Ala Gly Asn Asp Leu Ala Phe Phe Val Leu Phe Ser
4850 4855 4860
Ser Met Ser Gly Leu Leu Gly Ser Ala Gly Gln Ser Asn Tyr Ala Ala
4865 4870 4875 4880
Ala Asn Thr Phe Leu Asp Ala Leu Ala Ala His Arg Arg Ala Glu Gly
4885 4890 4895
Leu Ala Ala Gln Ser Leu Ala Trp Gly Pro Trp Ser Asp Gly Gly Met
4900 4905 4910
Ala Ala Gly Leu Ser Ala Ala Leu Gln Ala Arg Leu Ala Arg His Gly
4915 4920 4925
Met Gly Ala Leu Ser Pro Ala Gln Gly Thr Ala Leu Leu Gly Gln Ala
4930 4935 4940
Leu Ala Arg Pro Glu Thr Gln Leu Gly Ala Met Ser Leu Asp Val Arg
4945 4950 4955 4960
Ala Ala Ser Gln Ala Ser Gly Ala Ala Val Pro Pro Val Trp Arg Ala
4965 4970 4975
Leu Val Arg Ala Glu Ala Arg His Thr Ala Ala Gly Ala Gln Gly Ala
4980 4985 4990
Leu Ala Ala Arg Leu Gly Ala Leu Pro Glu Ala Arg Arg Ala Asp Glu
4995 5000 5005
Val Arg Lys Val Val Gln Ala Glu Ile Ala Arg Val Leu Ser Trp Ser
5010 5015 5020
Ala Ala Ser Ala Val Pro Val Asp Arg Pro Leu Ser Asp Leu Gly Leu
5025 5030 5035 5040
Asp Ser Leu Thr Ala Val Glu Leu Arg Asn Val Leu Gly Gln Arg Val
5045 5050 5055
Gly Ala Thr Leu Pro Ala Thr Leu Ala Phe Asp His Pro Thr Val Asp
5060 5065 5070
Ala Leu Thr Arg Trp Leu Leu Asp Lys Val Leu Ala Val Ala Glu Pro
5075 5080 5085
Ser Val Ser Ser Ala Lys Ser Ser Pro Gln Val Ala Leu Asp Glu Pro
5090 5095 5100
Ile Ala Ile Ile Gly Ile Gly Cys Arg Phe Pro Gly Gly Val Ala Asp
5105 5110 5115 5120
Pro Glu Ser Phe Trp Arg Leu Leu Glu Glu Gly Ser Asp Ala Val Val
5125 5130 5135
Glu Val Pro His Glu Arg Trp Asp Ile Asp Ala Phe Tyr Asp Pro Asp
5140 5145 5150
Pro Asp Val Arg Gly Lys Met Thr Thr Arg Phe Gly Gly Phe Leu Ser
5155 5160 5165
Asp Ile Asp Arg Phe Asp Pro Ala Phe Phe Gly Ile Ser Pro Arg Glu
5170 5175 5180
Ala Thr Thr Met Asp Pro Gln Gln Arg Leu Leu Leu Glu Thr Ser Trp
5185 5190 5195 5200
Glu Ala Phe Glu Arg Ala Gly Ile Leu Pro Glu Arg Leu Met Gly Ser
5205 5210 5215
Asp Thr Gly Val Phe Val Gly Leu Phe Tyr Gln Glu Tyr Ala Ala Leu
5220 5225 5230
Ala Gly Gly Ile Glu Ala Phe Asp Gly Tyr Leu Gly Thr Gly Thr Thr
5235 5240 5245
Ala Ser Val Ala Ser Gly Arg Ile Ser Tyr Val Leu Gly Leu Lys Gly
5250 5255 5260
Pro Ser Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Val
5265 5270 5275 5280
His Leu Ala Cys Gln Ala Leu Arg Arg Gly Glu Cys Ser Val Ala Leu
5285 5290 5295
Ala Gly Gly Val Ala Leu Met Leu Thr Pro Ala Thr Phe Val Glu Phe
5300 5305 5310
Ser Arg Leu Arg Gly Leu Ala Pro Asp Gly Arg Cys Lys Ser Phe Ser
5315 5320 5325
Ala Ala Ala Asp Gly Val Gly Trp Ser Glu Gly Cys Ala Met Leu Leu
5330 5335 5340
Leu Lys Pro Leu Arg Asp Ala Gln Arg Asp Gly Asp Pro Ile Leu Ala
5345 5350 5355 5360
Val Ile Arg Gly Thr Ala Val Asn Gln Asp Gly Arg Ser Asn Gly Leu
5365 5370 5375
Thr Ala Pro Asn Gly Ser Ser Gln Gln Glu Val Ile Arg Arg Ala Leu
5380 5385 5390
Glu Gln Ala Gly Leu Ala Pro Ala Asp Val Ser Tyr Val Glu Cys His
5395 5400 5405
Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Val Gln Ala Leu Gly
5410 5415 5420
Ala Val Leu Ala Gln Gly Arg Pro Ser Asp Arg Pro Leu Val Ile Gly
5425 5430 5435 5440
Ser Val Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala
5445 5450 5455
Gly Val Ile Lys Val Ala Leu Ala Leu Glu Arg Gly Leu Ile Pro Arg
5460 5465 5470
Ser Leu His Phe Asp Ala Pro Asn Pro His Ile Pro Trp Ser Glu Leu
5475 5480 5485
Ala Val Gln Val Ala Ala Lys Pro Val Glu Trp Thr Arg Asn Gly Val
5490 5495 5500
Pro Arg Arg Ala Gly Val Ser Ser Phe Gly Val Ser Gly Thr Asn Ala
5505 5510 5515 5520
His Val Val Leu Glu Glu Ala Pro Ala Ala Ala Phe Ala Pro Ala Ala
5525 5530 5535
Ala Arg Ser Ala Glu Leu Phe Val Leu Ser Ala Lys Ser Ala Ala Ala
5540 5545 5550
Leu Asp Ala Gln Ala Ala Arg Leu Ser Ala His Val Val Ala His Pro
5555 5560 5565
Glu Leu Gly Leu Gly Asp Leu Ala Phe Ser Leu Ala Thr Thr Arg Ser
5570 5575 5580
Pro Met Thr Tyr Arg Leu Ala Val Ala Ala Thr Ser Arg Glu Ala Leu
5585 5590 5595 5600
Ser Ala Ala Leu Asp Thr Ala Ala Gln Gly Gln Ala Pro Pro Ala Ala
5605 5610 5615
Ala Arg Gly His Ala Ser Thr Gly Ser Ala Pro Lys Val Val Phe Val
5620 5625 5630
Phe Pro Gly Gln Gly Ser Gln Trp Leu Gly Met Gly Gln Lys Leu Leu
5635 5640 5645
Ser Glu Glu Pro Val Phe Arg Asp Ala Leu Ser Ala Cys Asp Arg Ala
5650 5655 5660
Ile Gln Ala Glu Ala Gly Trp Ser Leu Leu Ala Glu Leu Ala Ala Asp
5665 5670 5675 5680
Glu Thr Thr Ser Gln Leu Gly Arg Ile Asp Val Val Gln Pro Ala Leu
5685 5690 5695
Phe Ala Ile Glu Val Ala Leu Ser Ala Leu Trp Arg Ser Trp Gly Val
5700 5705 5710
Glu Pro Asp Ala Val Val Gly His Ser Met Gly Glu Val Ala Ala Ala
5715 5720 5725
His Val Ala Gly Ala Leu Ser Leu Glu Asp Ala Val Ala Ile Ile Cys
5730 5735 5740
Arg Arg Ser Leu Leu Leu Arg Arg Ile Ser Gly Gln Gly Glu Met Ala
5745 5750 5755 5760
Val Val Glu Leu Ser Leu Ala Glu Ala Glu Ala Ala Leu Leu Gly Tyr
5765 5770 5775
Glu Asp Arg Leu Ser Val Ala Val Ser Asn Ser Pro Arg Ser Thr Val
5780 5785 5790
Leu Ala Gly Glu Pro Ala Ala Leu Ala Glu Val Leu Ala Ile Leu Ala
5795 5800 5805
Ala Lys Gly Val Phe Cys Arg Arg Val Lys Val Asp Val Ala Ser His
5810 5815 5820
Ser Pro Gln Ile Asp Pro Leu Arg Asp Glu Leu Leu Ala Ala Leu Gly
5825 5830 5835 5840
Glu Leu Glu Pro Arg Gln Ala Thr Val Ser Met Arg Ser Thr Val Thr
5845 5850 5855
Ser Thr Ile Met Ala Gly Pro Glu Leu Val Ala Ser Tyr Trp Ala Asp
5860 5865 5870
Asn Val Arg Gln Pro Val Arg Phe Ala Glu Ala Val Gln Ser Leu Met
5875 5880 5885
Glu Asp Gly His Gly Leu Phe Val Glu Met Ser Pro His Pro Ile Leu
5890 5895 5900
Thr Thr Ser Val Glu Glu Ile Arg Arg Ala Thr Lys Arg Glu Gly Val
5905 5910 5915 5920
Ala Val Gly Ser Leu Arg Arg Gly Gln Asp Glu Arg Leu Ser Met Leu
5925 5930 5935
Glu Ala Leu Gly Ala Leu Trp Val His Gly Gln Ala Val Gly Trp Glu
5940 5945 5950
Arg Leu Phe Ser Ala Gly Gly Ala Gly Leu Arg Arg Val Pro Leu Pro
5955 5960 5965
Thr Tyr Pro Trp Gln Arg Glu Arg Tyr Trp Val Asp Ala Pro Thr Gly
5970 5975 5980
Gly Ala Ala Gly Gly Ser Arg Phe Ala His Ala Gly Ser His Pro Leu
5985 5990 5995 6000
Leu Gly Glu Met Gln Thr Leu Ser Thr Gln Arg Ser Thr Arg Val Trp
6005 6010 6015
Glu Thr Thr Leu Asp Leu Lys Arg Leu Pro Trp Leu Gly Asp His Arg
6020 6025 6030
Val Gln Gly Ala Val Val Phe Pro Gly Ala Ala Tyr Leu Glu Met Ala
6035 6040 6045
Leu Ser Ser Gly Ala Glu Ala Leu Gly Asp Gly Pro Leu Gln Val Ser
6050 6055 6060
Asp Val Val Leu Ala Glu Ala Leu Ala Phe Ala Asp Asp Thr Pro Ala
6065 6070 6075 6080
Ala Val Gln Val Met Ala Thr Glu Glu Arg Pro Gly Arg Leu Gln Phe
6085 6090 6095
His Val Ala Ser Arg Val Pro Gly His Gly Gly Ala Ala Phe Arg Ser
6100 6105 6110
His Ala Arg Gly Val Leu Arg Gln Ile Glu Arg Ala Glu Val Pro Ala
6115 6120 6125
Arg Leu Asp Leu Ala Ala Leu Arg Ala Arg Leu Gln Ala Ser Ala Pro
6130 6135 6140
Ala Ala Ala Thr Tyr Ala Ala Leu Ala Glu Met Gly Leu Glu Tyr Gly
6145 6150 6155 6160
Pro Ala Phe Gln Gly Leu Val Glu Leu Trp Arg Gly Glu Gly Glu Ala
6165 6170 6175
Leu Gly Arg Val Arg Leu Pro Glu Ala Ala Gly Ser Pro Ala Ala Cys
6180 6185 6190
Arg Leu His Pro Ala Leu Leu Asp Ala Cys Phe His Val Ser Ser Ala
6195 6200 6205
Phe Ala Asp Arg Gly Glu Ala Thr Pro Trp Val Pro Val Glu Ile Gly
6210 6215 6220
Ser Leu Arg Trp Phe Gln Arg Pro Ser Gly Glu Leu Trp Cys His Ala
6225 6230 6235 6240
Arg Ser Val Ser His Gly Lys Pro Thr Pro Asp Arg Arg Ser Thr Asp
6245 6250 6255
Phe Trp Val Val Asp Ser Thr Gly Ala Ile Val Ala Glu Ile Ser Gly
6260 6265 6270
Leu Val Ala Gln Arg Leu Ala Gly Gly Val Arg Arg Arg Glu Glu Asp
6275 6280 6285
Asp Trp Phe Met Glu Pro Ala Trp Glu Pro Thr Ala Val Pro Gly Ser
6290 6295 6300
Glu Val Met Ala Gly Arg Trp Leu Leu Ile Gly Ser Gly Gly Gly Leu
6305 6310 6315 6320
Gly Ala Ala Leu His Ser Ala Leu Thr Glu Ala Gly His Ser Val Val
6325 6330 6335
His Ala Thr Gly Arg Gly Thr Ser Ala Ala Gly Leu Gln Ala Leu Leu
6340 6345 6350
Thr Ala Ser Phe Asp Gly Gln Ala Pro Thr Ser Val Val His Leu Gly
6355 6360 6365
Ser Leu Asp Glu Arg Gly Val Leu Asp Ala Asp Ala Pro Phe Asp Ala
6370 6375 6380
Asp Ala Leu Glu Glu Ser Leu Val Arg Gly Cys Asp Ser Val Leu Trp
6385 6390 6395 6400
Thr Val Gln Ala Val Ala Gly Ala Gly Phe Arg Asp Pro Pro Arg Leu
6405 6410 6415
Trp Leu Val Thr Arg Gly Ala Gln Ala Ile Gly Ala Gly Asp Val Ser
6420 6425 6430
Val Ala Gln Ala Pro Leu Leu Gly Leu Gly Arg Val Ile Ala Leu Glu
6435 6440 6445
His Ala Glu Leu Arg Cys Ala Arg Ile Asp Leu Asp Pro Ala Arg Arg
6450 6455 6460
Asp Gly Glu Val Asp Glu Leu Leu Ala Glu Leu Leu Ala Asp Asp Ala
6465 6470 6475 6480
Glu Glu Glu Val Ala Phe Arg Gly Gly Glu Arg Arg Val Ala Arg Leu
6485 6490 6495
Val Arg Arg Leu Pro Glu Thr Asp Cys Arg Glu Lys Ile Glu Pro Ala
6500 6505 6510
Glu Gly Arg Pro Phe Arg Leu Glu Ile Asp Gly Ser Gly Val Leu Asp
6515 6520 6525
Asp Leu Val Leu Arg Ala Thr Glu Arg Arg Pro Pro Gly Pro Gly Glu
6530 6535 6540
Val Glu Ile Ala Val Glu Ala Ala Gly Leu Asn Phe Leu Asp Val Met
6545 6550 6555 6560
Arg Ala Met Gly Ile Tyr Pro Gly Pro Gly Asp Gly Pro Val Ala Leu
6565 6570 6575
Gly Ala Glu Cys Ser Gly Arg Ile Val Ala Met Gly Glu Gly Val Glu
6580 6585 6590
Ser Leu Arg Ile Gly Gln Asp Val Val Ala Val Ala Pro Phe Ser Phe
6595 6600 6605
Gly Thr His Val Thr Ile Asp Ala Arg Met Leu Ala Pro Arg Pro Ala
6610 6615 6620
Ala Leu Thr Ala Ala Gln Ala Ala Ala Leu Pro Val Ala Phe Met Thr
6625 6630 6635 6640
Ala Trp Tyr Gly Leu Val His Leu Gly Arg Leu Arg Ala Gly Glu Arg
6645 6650 6655
Val Leu Ile His Ser Ala Thr Gly Gly Thr Gly Leu Ala Ala Val Gln
6660 6665 6670
Ile Ala Arg His Leu Gly Ala Glu Ile Phe Ala Thr Ala Gly Thr Pro
6675 6680 6685
Glu Lys Arg Ala Trp Leu Arg Glu Gln Gly Ile Ala His Val Met Asp
6690 6695 6700
Ser Arg Ser Leu Asp Phe Ala Glu Gln Val Leu Ala Ala Thr Lys Gly
6705 6710 6715 6720
Glu Gly Val Asp Val Val Leu Asn Ser Leu Ser Gly Ala Ala Ile Asp
6725 6730 6735
Ala Ser Leu Ser Thr Leu Val Pro Asp Gly Arg Phe Ile Glu Leu Gly
6740 6745 6750
Lys Thr Asp Ile Tyr Ala Asp Arg Ser Leu Gly Leu Ala His Phe Arg
6755 6760 6765
Lys Ser Leu Ser Tyr Ser Ala Val Asp Leu Ala Gly Leu Ala Val Arg
6770 6775 6780
Arg Pro Glu Arg Val Ala Ala Leu Leu Ala Glu Val Val Asp Leu Leu
6785 6790 6795 6800
Ala Arg Gly Ala Leu Gln Pro Leu Pro Val Glu Ile Phe Pro Leu Ser
6805 6810 6815
Arg Ala Ala Asp Ala Phe Arg Lys Met Ala Gln Ala Gln His Leu Gly
6820 6825 6830
Lys Leu Val Leu Ala Leu Glu Asp Pro Asp Val Arg Ile Arg Val Pro
6835 6840 6845
Gly Glu Ser Gly Val Ala Ile Arg Ala Asp Gly Ala Tyr Leu Val Thr
6850 6855 6860
Gly Gly Leu Gly Gly Leu Gly Leu Ser Val Ala Gly Trp Leu Ala Glu
6865 6870 6875 6880
Gln Gly Ala Gly His Leu Val Leu Val Gly Arg Ser Gly Ala Val Ser
6885 6890 6895
Ala Glu Gln Gln Thr Ala Val Ala Ala Leu Glu Ala His Gly Ala Arg
6900 6905 6910
Val Thr Val Ala Arg Ala Asp Val Ala Asp Arg Ala Gln Met Glu Arg
6915 6920 6925
Ile Leu Arg Glu Val Thr Ala Ser Gly Met Pro Leu Arg Gly Val Val
6930 6935 6940
His Ala Ala Gly Ile Leu Asp Asp Gly Leu Leu Met Gln Gln Thr Pro
6945 6950 6955 6960
Ala Arg Phe Arg Ala Val Met Ala Pro Lys Val Arg Gly Ala Leu His
6965 6970 6975
Leu His Ala Leu Thr Arg Glu Ala Pro Leu Ser Phe Phe Val Leu Tyr
6980 6985 6990
Ala Ser Gly Ala Gly Leu Leu Gly Ser Pro Gly Gln Gly Asn Tyr Ala
6995 7000 7005
Ala Ala Asn Thr Phe Leu Asp Ala Leu Ala His His Arg Arg Ala Gln
7010 7015 7020
Gly Leu Pro Ala Leu Ser Ile Asp Trp Gly Leu Phe Ala Asp Val Gly
7025 7030 7035 7040
Leu Ala Ala Gly Gln Gln Asn Arg Gly Ala Arg Leu Val Thr Arg Gly
7045 7050 7055
Thr Arg Ser Leu Thr Pro Asp Glu Gly Leu Trp Ala Leu Glu Arg Leu
7060 7065 7070
Leu Asp Gly Asp Arg Thr Gln Ala Gly Val Met Pro Phe Asp Val Arg
7075 7080 7085
Gln Trp Val Glu Phe Tyr Pro Ala Ala Ala Ser Ser Arg Arg Leu Ser
7090 7095 7100
Arg Leu Met Thr Ala Arg Arg Val Ala Ser Gly Arg Leu Ala Gly Asp
7105 7110 7115 7120
Arg Asp Leu Leu Glu Arg Leu Ala Thr Ala Glu Ala Gly Ala Arg Ala
7125 7130 7135
Gly Met Leu Gln Glu Val Val Arg Ala Gln Val Ser Gln Val Leu Arg
7140 7145 7150
Leu Ser Glu Gly Lys Leu Asp Val Asp Ala Pro Leu Thr Ser Leu Gly
7155 7160 7165
Met Asp Ser Leu Met Gly Leu Glu Leu Arg Asn Arg Ile Glu Ala Val
7170 7175 7180
Leu Gly Ile Thr Met Pro Ala Thr Leu Leu Trp Thr Tyr Pro Thr Val
7185 7190 7195 7200
Ala Ala Leu Ser Ala His Leu Ala Ser His Val Val Ser Thr Gly Asp
7205 7210 7215
Gly Glu Ser Ala Arg Pro Pro Asp Thr Gly Ser Val Ala Pro Thr Thr
7220 7225 7230
His Glu Val Ala Ser Leu Asp Glu Asp Gly Leu Phe Ala Leu Ile Asp
7235 7240 7245
Glu Ser Leu Ala Arg Ala Gly Lys Arg
7250 7255
<210>6
<211>3798
<212>PRT
<213>Sorangium cellulosum
<400>6
Val Thr Asp Arg Glu Gly Gln Leu Leu Glu Arg Leu Arg Glu Val Thr
1 5 10 15
Leu Ala Leu Arg Lys Thr Leu Asn Glu Arg Asp Thr Leu Glu Leu Glu
20 25 30
Lys Thr Glu Pro Ile Ala Ile Val Gly Ile Gly Cys Arg Phe Pro Gly
35 40 45
Gly Ala Gly Thr Pro Glu Ala Phe Trp Glu Leu Leu Asp Asp Gly Arg
50 55 60
Asp Ala Ile Arg Pro Leu Glu Glu Arg Trp Ala Leu Val Gly Val Asp
65 70 75 80
Pro Gly Asp Asp Val Pro Arg Trp Ala Gly Leu Leu Thr Glu Ala Ile
85 90 95
Asp Gly Phe Asp Ala Ala Phe Phe Gly Ile Ala Pro Arg Glu Ala Arg
100 105 110
Ser Leu Asp Pro Gln His Arg Leu Leu Leu Glu Val Ala Trp Glu Gly
115 120 125
Phe Glu Asp Ala Gly Ile Pro Pro Arg Ser Leu Val Gly Ser Arg Thr
130 135 140
Gly Val Phe Val Gly Val Cys Ala Thr Glu Tyr Leu His Ala Ala Val
145 150 155 160
Ala His Gln Pro Arg Glu Glu Arg Asp Ala Tyr Ser Thr Thr Gly Asn
165 170 175
Met Leu Ser Ile Ala Ala Gly Arg Leu Ser Tyr Thr Leu Gly Leu Gln
180 185 190
Gly Pro Cys Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala
195 200 205
Ile His Leu Ala Cys Arg Ser Leu Arg Ala Arg Glu Ser Asp Leu Ala
210 215 220
Leu Ala Gly Gly Val Asn Met Leu Leu Ser Pro Asp Thr Met Arg Ala
225 230 235 240
Leu Ala Arg Thr Gln Ala Leu Ser Pro Asn Gly Arg Cys Gln Thr Phe
245 250 255
Asp Ala Ser Ala Asn Gly Phe Val Arg Gly Glu Gly Cys Gly Leu Ile
260 265 270
Val Leu Lys Arg Leu Ser Asp Ala Arg Arg Asp Gly Asp Arg Ile Trp
275 280 285
Ala Leu Ile Arg Gly Ser Ala Ile Asn Gln Asp Gly Arg Ser Thr Gly
290 295 300
Leu Thr Ala Pro Asn Val Leu Ala Gln Gly Ala Leu Leu Arg Glu Ala
305 310 315 320
Leu Arg Asn Ala Gly Val Glu Ala Glu Ala Ile Gly Tyr Ile Glu Thr
325 330 335
His Gly Ala Ala Thr Ser Leu Gly Asp Pro Ile Glu Ile Glu Ala Leu
340 345 350
Arg Ala Val Val Gly Pro Ala Arg Ala Asp Gly Ala Arg Cys Val Leu
355 360 365
Gly Ala Val Lys Thr Asn Leu Gly His Leu Glu Gly Ala Ala Gly Val
370 375 380
Ala Gly Leu Ile Lys Ala Thr Leu Ser Leu His His Glu Arg Ile Pro
385 390 395 400
Arg Asn Leu Asn Phe Arg Thr Leu Asn Pro Arg Ile Arg Ile Glu Gly
405 410 415
Thr Ala Leu Ala Leu Ala Thr Glu Pro Val Pro Trp Pro Arg Thr Gly
420 425 430
Arg Thr Arg Phe Ala Gly Val Ser Ser Phe Gly Met Ser Gly Thr Asn
435 440 445
Ala His Val Val Leu Glu Glu Ala Pro Ala Val Glu Pro Glu Ala Ala
450 455 460
Ala Pro Glu Arg Ala Ala Glu Leu Phe Val Leu Ser Ala Lys Ser Ala
465 470 475 480
Ala Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Asp His Leu Glu Lys
485 490 495
His Val Glu Leu Gly Leu Gly Asp Val Ala Phe Ser Leu Ala Thr Thr
500 505 510
Arg Ser Ala Met Glu His Arg Leu Ala Val Ala Ala Ser Ser Arg Glu
515 520 525
Ala Leu Arg Gly Ala Leu Ser Ala Ala Ala Gln Gly His Thr Pro Pro
530 535 540
Gly Ala Val Arg Gly Arg Ala Ser Gly Gly Ser Ala Pro Lys Val Val
545 550 555 560
Phe Val Phe Pro Gly Gln Gly Ser Gln Trp Val Gly Met Gly Arg Lys
565 570 575
Leu Met Ala Glu Glu Pro Val Phe Arg Ala Ala Leu Glu Gly Cys Asp
580 585 590
Arg Ala Ile Glu Ala Glu Ala Gly Trp Ser Leu Leu Gly Glu Leu Ser
595 600 605
Ala Asp Glu Ala Ala Ser Gln Leu Gly Arg Ile Asp Val Val Gln Pro
610 615 620
Val Leu Phe Ala Met Glu Val Ala Leu Ser Ala Leu Trp Arg Ser Trp
625 630 635 640
Gly Val Glu Pro Glu Ala Val Val Gly His Ser Met Gly Glu Val Ala
645 650 655
Ala Ala His Val Ala Gly Ala Leu Ser Leu Glu Asp Ala Val Ala Ile
660 665 670
Ile Cys Arg Arg Ser Arg Leu Leu Arg Arg Ile Ser Gly Gln Gly Glu
675 680 685
Met Ala Leu Val Glu Leu Ser Leu Glu Glu Ala Glu Ala Ala Leu Arg
690 695 700
Gly His Glu Gly Arg Leu Ser Val Ala Val Ser Asn Ser Pro Arg Ser
705 710 715 720
Thr Val Leu Ala Gly Glu Pro Ala Ala Leu Ser Glu Val Leu Ala Ala
725 730 735
Leu Thr Ala Lys Gly Val Phe Trp Arg Gln Val Lys Val Asp Val Ala
740 745 750
Ser His Ser Pro Gln Val Asp Pro Leu Arg Glu Glu Leu Ile Ala Ala
755 760 765
Leu Gly Ala Ile Arg Pro Arg Ala Ala Ala Val Pro Met Arg Ser Thr
770 775 780
Val Thr Gly Gly Val Ile Ala Gly Pro Glu Leu Gly Ala Ser Tyr Trp
785 790 795 800
Ala Asp Asn Leu Arg Gln Pro Val Arg Phe Ala Ala Ala Ala Gln Ala
805 810 815
Leu Leu Glu Gly Gly Pro Ala Leu Phe Ile Glu Met Ser Pro His Pro
820 825 830
Ile Leu Val Pro Pro Leu Asp Glu Ile Gln Thr Ala Ala Glu Gln Gly
835 840 845
Gly Ala Ala Val Gly Ser Leu Arg Arg Gly Gln Asp Glu Arg Ala Thr
850 855 860
Leu Leu Glu Ala Leu Gly Thr Leu Trp Ala Ser Gly Tyr Pro Val Ser
865 870 875 880
Trp Ala Arg Leu Phe Pro Ala Gly Gly Arg Arg Val Pro Leu Pro Thr
885 890 895
Tyr Pro Trp Gln His Glu Arg Cys Trp Ile Glu Val Glu Pro Asp Ala
900 905 910
Arg Arg Leu Ala Ala Ala Asp Pro Thr Lys Asp Trp Phe Tyr Arg Thr
915 920 925
Asp Trp Pro Glu Val Pro Arg Ala Ala Pro Lys Ser Glu Thr Ala His
930 935 940
Gly Ser Trp Leu Leu Leu Ala Asp Arg Gly Gly Val Gly Glu Ala Val
945 950 955 960
Ala Ala Ala Leu Ser Thr Arg Gly Leu Ser Cys Thr Val Leu His Ala
965 970 975
Ser Ala Asp Ala Ser Thr Val Ala Glu Gln Val Ser Glu Ala Ala Ser
980 985 990
Arg Arg Asn Asp Trp Gln Gly Val Leu Tyr Leu Trp Gly Leu Asp Ala
995 1000 1005
Val Val Asp Ala Gly Ala Ser Ala Asp Glu Val Ser Glu Ala Thr Arg
1010 1015 1020
Arg Ala Thr Ala Pro Val Leu Gly Leu Val Arg Phe Leu Ser Ala Ala
1025 1030 1035 1040
Pro His Pro Pro Arg Phe Trp Val Val Thr Arg Gly Ala Cys Thr Val
1045 1050 1055
Gly Gly Glu Pro Glu Ala Ser Leu Cys Gln Ala Ala Leu Trp Gly Leu
1060 1065 1070
Ala Arg Val Ala Ala Leu Glu His Pro Ala Ala Trp Gly Gly Leu Val
1075 1080 1085
Asp Leu Asp Pro Gln Lys Ser Pro Thr Glu Ile Glu Pro Leu Val Ala
1090 1095 1100
Glu Leu Leu Ser Pro Asp Ala Glu Asp Gln Leu Ala Phe Arg Ser Gly
1105 1110 1115 1120
Arg Arg His Ala Ala Arg Leu Val Ala Ala Pro Pro Glu Gly Asp Val
1125 1130 1135
Ala Pro Ile Ser Leu Ser Ala Glu Gly Ser Tyr Leu Val Thr Gly Gly
1140 1145 1150
Leu Gly Gly Leu Gly Leu Leu Val Ala Arg Trp Leu Val Glu Arg Gly
1155 1160 1165
Ala Arg His Leu Val Leu Thr Ser Arg His Gly Leu Pro Glu Arg Gln
1170 1175 1180
Ala Ser Gly Gly Glu Gln Pro Pro Glu Ala Arg Ala Arg Ile Ala Ala
1185 1190 1195 1200
Val Glu Gly Leu Glu Ala Gln Gly Ala Arg Val Thr Val Ala Ala Val
1205 1210 1215
Asp Val Ala Glu Ala Asp Pro Met Thr Ala Leu Leu Ala Ala Ile Glu
1220 1225 1230
Pro Pro Leu Arg Gly Val Val His Ala Ala Gly Val Phe Pro Val Arg
1235 1240 1245
His Leu Ala Glu Thr Asp Glu Ala Leu Leu Glu Ser Val Leu Arg Pro
1250 1255 1260
Lys Val Ala Gly Ser Trp Leu Leu His Arg Leu Leu Arg Asp Arg Pro
1265 1270 1275 1280
Leu Asp Leu Phe Val Leu Phe Ser Ser Gly Ala Ala Val Trp Gly Gly
1285 1290 1295
Lys Gly Gln Gly Ala Tyr Ala Ala Ala Asn Ala Phe Leu Asp Gly Leu
1300 1305 1310
Ala His His Arg Arg Ala His Ser Leu Pro Ala Leu Ser Leu Ala Trp
1315 1320 1325
Gly Leu Trp Ala Glu Gly Gly Met Val Asp Ala Lys Ala His Ala Arg
1330 1335 1340
Leu Ser Asp Ile Gly Val Leu Pro Met Ala Thr Gly Pro Ala Leu Ser
1345 1350 1355 1360
Ala Leu Glu Arg Leu Val Asn Thr Ser Ala Val Gln Arg Ser Val Thr
1365 1370 1375
Arg Met Asp Trp Ala Arg Phe Ala Pro Val Tyr Ala Ala Arg Gly Arg
1380 1385 1390
Arg Asn Leu Leu Ser Ala Leu Val Ala Glu Asp Glu Arg Ala Ala Ser
1395 1400 1405
Pro Pro Val Pro Thr Ala Asn Arg Ile Trp Arg Gly Leu Ser Val Ala
1410 1415 1420
Glu Ser Arg Ser Ala Leu Tyr Glu Leu Val Arg Gly Ile Val Ala Arg
1425 1430 1435 1440
Val Leu Gly Phe Ser Asp Pro Gly Ala Leu Asp Val Gly Arg Gly Phe
1445 1450 1455
Ala Glu Gln Gly Leu Asp Ser Leu Met Ala Leu Glu Ile Arg Asn Arg
1460 1465 1470
Leu Gln Arg Glu Leu Gly Glu Arg Leu Ser Ala Thr Leu Ala Phe Asp
1475 1480 1485
His Pro Thr Val Glu Arg Leu Val Ala His Leu Leu Thr Asp Val Leu
1490 1495 1500
Lys Leu Glu Asp Arg Ser Asp Thr Arg His Ile Arg Ser Val Ala Ala
1505 1510 1515 1520
Asp Asp Asp Ile Ala Ile Val Gly Ala Ala Cys Arg Phe Pro Gly Gly
1525 1530 1535
Asp Glu Gly Leu Glu Thr Tyr Trp Arg His Leu Ala Glu Gly Met Val
1540 1545 1550
Val Ser Thr Glu Val Pro Ala Asp Arg Trp Arg Ala Ala Asp Trp Tyr
1555 1560 1565
Asp Pro Asp Pro Glu Val Pro Gly Arg Thr Tyr Val Ala Lys Gly Ala
1570 1575 1580
Phe Leu Arg Asp Val Arg Ser Leu Asp Ala Ala Phe Phe Ala Ile Ser
1585 1590 1595 1600
Pro Arg Glu Ala Met Ser Leu Asp Pro Gln Gln Arg Leu Leu Leu Glu
1605 1610 1615
Val Ser Trp Glu Ala Ile Glu Arg Ala Gly Gln Asp Pro Met Ala Leu
1620 1625 1630
Arg Glu Ser Ala Thr Gly Val Phe Val Gly Met Ile Gly Ser Glu His
1635 1640 1645
Ala Glu Arg Val Gln Gly Leu Asp Asp Asp Ala Ala Leu Leu Tyr Gly
1650 1655 1660
Thr Thr Gly Asn Leu Leu Ser Val Ala Ala Gly Arg Leu Ser Phe Phe
1665 1670 1675 1680
Leu Gly Leu His Gly Pro Thr Met Thr Val Asp Thr Ala Cys Ser Ser
1685 1690 1695
Ser Leu Val Ala Leu His Leu Ala Cys Gln Ser Leu Arg Leu Gly Glu
1700 1705 1710
Cys Asp Gln Ala Leu Ala Gly Gly Ser Ser Val Leu Leu Ser Pro Arg
1715 1720 1725
Ser Phe Val Ala Ala Ser Arg Met Arg Leu Leu Ser Pro Asp Gly Arg
1730 1735 1740
Cys Lys Thr Phe Ser Ala Ala Ala Asp Gly Phe Ala Arg Ala Glu Gly
1745 1750 1755 1760
Cys Ala Val Val Val Leu Lys Arg Leu Arg Asp Ala Gln Arg Asp Arg
1765 1770 1775
Asp Pro Ile Leu Ala Val Val Arg Ser Thr Ala Ile Asn His Asp Gly
1780 1785 1790
Pro Ser Ser Gly Leu Thr Val Pro Ser Gly Pro Ala Gln Gln Ala Leu
1795 1800 1805
Leu Arg Gln Ala Leu Ala Gln Ala Gly Val Ala Pro Ala Glu Val Asp
1810 1815 1820
Phe Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly Asp Pro Ile Glu
1825 1830 1835 1840
Val Gln Ala Leu Gly Ala Val Tyr Gly Arg Gly Arg Pro Ala Glu Arg
1845 1850 1855
Pro Leu Trp Leu Gly Ala Val Lys Ala Asn Leu Gly His Leu Glu Ala
1860 1865 1870
Ala Ala Gly Leu Ala Gly Val Leu Lys Val Leu Leu Ala Leu Glu His
1875 1880 1885
Glu Gln Ile Pro Ala Gln Pro Glu Leu Asp Glu Leu Asn Pro His Ile
1890 1895 1900
Pro Trp Ala Glu Leu Pro Val Ala Val Val Arg Arg Ala Val Pro Trp
1905 1910 1915 1920
Pro Arg Gly Ala Arg Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Leu
1925 1930 1935
Ser Gly Thr Asn Ala His Val Val Leu Glu Glu Ala Pro Ala Val Glu
1940 1945 1950
Pro Val Ala Ala Ala Pro Glu Arg Ala Ala Glu Leu Phe Val Leu Ser
1955 1960 1965
Ala Lys Ser Ala Ala Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Asp
1970 1975 1980
His Leu Glu Lys His Val Glu Leu Gly Leu Gly Asp Val Ala Phe Ser
1985 1990 1995 2000
Leu Ala Thr Thr Arg Ser Ala Met Glu His Arg Leu Ala Val Ala Ala
2005 2010 2015
Ser Ser Arg Glu Ala Leu Arg Gly Ala Leu Ser Ala Ala Ala Gln Gly
2020 2025 2030
His Thr Pro Pro Gly Ala Val Arg Gly Arg Ala Ser Gly Gly Ser Ala
2035 2040 2045
Pro Lys Val Val Phe Val Phe Pro Gly Gln Gly Ser Gln Trp Val Gly
2050 2055 2060
Met Gly Arg Lys Leu Met Ala Glu Glu Pro Val Phe Arg Ala Ala Leu
2065 2070 2075 2080
Glu Gly Cys Asp Arg Ala Ile Glu Ala Glu Ala Gly Trp Ser Leu Leu
2085 2090 2095
Gly Glu Leu Ser Ala Asp Glu Ala Ala Ser Gln Leu Gly Arg Ile Asp
2100 2105 2110
Val Val Gln Pro Val Leu Phe Ala Met Glu Val Ala Leu Ser Ala Leu
2115 2120 2125
Trp Arg Ser Trp Gly Val Glu Pro Glu Ala Val Val Gly His Ser Met
2130 2135 2140
Gly Glu Val Ala Ala Ala His Val Ala Gly Ala Leu Ser Leu Glu Asp
2145 2150 2155 2160
Ala Val Ala Ile Ile Cys Arg Arg Ser Arg Leu Leu Arg Arg Ile Ser
2165 2170 2175
Gly Gln Gly Glu Met Ala Leu Val Glu Leu Ser Leu Glu Glu Ala Glu
2180 2185 2190
Ala Ala Leu Arg Gly His Glu Gly Arg Leu Ser Val Ala Val Ser Asn
2195 2200 2205
Ser Pro Arg Ser Thr Val Leu Ala Gly Glu Pro Ala Ala Leu Ser Glu
2210 2215 2220
Val Leu Ala Ala Leu Thr Ala Lys Gly Val Phe Trp Arg Gln Val Lys
2225 2230 2235 2240
Val Asp Val Ala Ser His Ser Pro Gln Val Asp Pro Leu Arg Glu Glu
2245 2250 2255
Leu Ile Ala Ala Leu Gly Ala Ile Arg Pro Arg Ala Ala Ala Val Pro
2260 2265 2270
Met Arg Ser Thr Val Thr Gly Gly Val Ile Ala Gly Pro Glu Leu Gly
2275 2280 2285
Ala Ser Tyr Trp Ala Asp Asn Leu Arg Gln Pro Val Arg Phe Ala Ala
2290 2295 2300
Ala Ala Gln Ala Leu Leu Glu Gly Gly Pro Ala Leu Phe Ile Glu Met
2305 2310 2315 2320
Ser Pro His Pro Ile Leu Val Pro Pro Leu Asp Glu Ile Gln Thr Ala
2325 2330 2335
Ala Glu Gln Gly Gly Ala Ala Val Gly Ser Leu Arg Arg Gly Gln Asp
2340 2345 2350
Glu Arg Ala Thr Leu Leu Glu Ala Leu Gly Thr Leu Trp Ala Ser Gly
2355 2360 2365
Tyr Pro Val Ser Trp Ala Arg Leu Phe Pro Ala Gly Gly Arg Arg Val
2370 2375 2380
Pro Leu Pro Thr Tyr Pro Trp Gln His Glu Arg Tyr Trp Ile Glu Asp
2385 2390 2395 2400
Ser Val His Gly Ser Lys Pro Ser Leu Arg Leu Arg Gln Leu Arg Asn
2405 2410 2415
Gly Ala Thr Asp His Pro Leu Leu Gly Ala Pro Leu Leu Val Ser Ala
2420 2425 2430
Arg Pro Gly Ala His Leu Trp Glu Gln Ala Leu Ser Asp Glu Arg Leu
2435 2440 2445
Ser Tyr Leu Ser Glu His Arg Val His Gly Glu Ala Val Leu Pro Ser
2450 2455 2460
Ala Ala Tyr Val Glu Met Ala Leu Ala Ala Gly Val Asp Leu Tyr Gly
2465 2470 2475 2480
Thr Ala Thr Leu Val Leu Glu Gln Leu Ala Leu Glu Arg Ala Leu Ala
2485 2490 2495
Val Pro Ser Glu Gly Gly Arg Ile Val Gln Val Ala Leu Ser Glu Glu
2500 2505 2510
Gly Pro Gly Arg Ala Ser Phe Gln Val Ser Ser Arg Glu Glu Ala Gly
2515 2520 2525
Arg Ser Trp Val Arg His Ala Thr Gly His Val Cys Ser Gly Gln Ser
2530 2535 2540
Ser Ala Val Gly Ala Leu Lys Glu Ala Pro Trp Glu Ile Gln Arg Arg
2545 2550 2555 2560
Cys Pro Ser Val Leu Ser Ser Glu Ala Leu Tyr Pro Leu Leu Asn Glu
2565 2570 2575
His Ala Leu Asp Tyr Gly Pro Cys Phe Gln Gly Val Glu Gln Val Trp
2580 2585 2590
Leu Gly Thr Gly Glu Val Leu Gly Arg Val Arg Leu Pro Gly Asp Met
2595 2600 2605
Ala Ser Ser Ser Gly Ala Tyr Arg Ile His Pro Ala Leu Leu Asp Ala
2610 2615 2620
Cys Phe Gln Val Leu Thr Ala Leu Leu Thr Thr Pro Glu Ser Ile Glu
2625 2630 2635 2640
Ile Arg Arg Arg Leu Thr Asp Leu His Glu Pro Asp Leu Pro Arg Ser
2645 2650 2655
Arg Ala Pro Val Asn Gln Ala Val Ser Asp Thr Trp Leu Trp Asp Ala
2660 2665 2670
Ala Leu Asp Gly Gly Arg Arg Gln Ser Ala Ser Val Pro Val Asp Leu
2675 2680 2685
Val Leu Gly Ser Phe His Ala Lys Trp Glu Val Met Glu Arg Leu Ala
2690 2695 2700
Gln Ala Tyr Ile Ile Gly Thr Leu Arg Ile Trp Asn Val Phe Cys Ala
2705 2710 2715 2720
Ala Gly Glu Arg His Thr Ile Asp Glu Leu Leu Val Arg Leu Gln Ile
2725 2730 2735
Ser Val Val Tyr Arg Lys Val Ile Lys Arg Trp Met Glu His Leu Val
2740 2745 2750
Ala Ile Gly Ile Leu Val Gly Asp Gly Glu His Phe Val Ser Ser Gln
2755 2760 2765
Pro Leu Pro Glu Pro Asp Leu Ala Ala Val Leu Glu Glu Ala Gly Arg
2770 2775 2780
Val Phe Ala Asp Leu Pro Val Leu Phe Glu Trp Cys Lys Phe Ala Gly
2785 2790 2795 2800
Glu Arg Leu Ala Asp Val Leu Thr Gly Lys Thr Leu Ala Leu Glu Ile
2805 2810 2815
Leu Phe Pro Gly Gly Ser Phe Asp Met Ala Glu Arg Ile Tyr Arg Asp
2820 2825 2830
Ser Pro Ile Ala Arg Tyr Ser Asn Gly Ile Val Arg Gly Val Val Glu
2835 2840 2845
Ser Ala Ala Arg Val Val Ala Pro Ser Gly Met Phe Ser Ile Leu Glu
2850 2855 2860
Ile Gly Ala Gly Thr Gly Ala Thr Thr Ala Ala Val Leu Pro Val Leu
2865 2870 2875 2880
Leu Pro Asp Arg Thr Glu Tyr His Phe Thr Asp Val Ser Pro Leu Phe
2885 2890 2895
Leu Ala Arg Ala Glu Gln Arg Phe Arg Asp Tyr Pro Phe Leu Lys Tyr
2900 2905 2910
Gly Ile Leu Asp Val Asp Gln Glu Pro Ala Gly Gln Gly Tyr Ala His
2915 2920 2925
Gln Arg Phe Asp Val Ile Val Ala Ala Asn Val Ile His Ala Thr Arg
2930 2935 2940
Asp Ile Arg Ala Thr Ala Lys Arg Leu Leu Ser Leu Leu Ala Pro Gly
2945 2950 2955 2960
Gly Leu Leu Val Leu Val Glu Gly Thr Gly His Pro Ile Trp Phe Asp
2965 2970 2975
Ile Thr Thr Gly Leu Ile Glu Gly Trp Gln Lys Tyr Glu Asp Asp Leu
2980 2985 2990
Arg Ile Asp His Pro Leu Leu Pro Ala Arg Thr Trp Cys Asp Val Leu
2995 3000 3005
Arg Arg Val Gly Phe Ala Asp Ala Val Ser Leu Pro Gly Asp Gly Ser
3010 3015 3020
Pro Ala Gly Ile Leu Gly Gln His Val Ile Leu Ser Arg Ala Pro Gly
3025 3030 3035 3040
Ile Ala Gly Ala Ala Cys Asp Ser Ser Gly Glu Ser Ala Thr Glu Ser
3045 3050 3055
Pro Ala Ala Arg Ala Val Arg Gln Glu Trp Ala Asp Gly Ser Ala Asp
3060 3065 3070
Val Val His Arg Met Ala Leu Glu Arg Met Tyr Phe His Arg Arg Pro
3075 3080 3085
Gly Arg Gln Val Trp Val His Gly Arg Leu Arg Thr Gly Gly Gly Ala
3090 3095 3100
Phe Thr Lys Ala Leu Ala Gly Asp Leu Leu Leu Phe Glu Asp Thr Gly
3105 3110 3115 3120
Gln Val Val Ala Glu Val Gln Gly Leu Arg Leu Pro Gln Leu Glu Ala
3125 3130 3135
Ser Ala Phe Ala Pro Arg Asp Pro Arg Glu Glu Trp Leu Tyr Ala Leu
3140 3145 3150
Glu Trp Gln Arg Lys Asp Pro Ile Pro Glu Ala Pro Ala Ala Ala Ser
3155 3160 3165
Ser Ser Ser Ala Gly Ala Trp Leu Val Leu Met Asp Gln Gly Gly Thr
3170 3175 3180
Gly Ala Ala Leu Val Ser Leu Leu Glu Gly Arg Gly Glu Ala Cys Val
3185 3190 3195 3200
Arg Val Ile Ala Gly Thr Ala Tyr Ala Cys Leu Ala Pro Gly Leu Tyr
3205 3210 3215
Gln Val Asp Pro Ala Gln Pro Asp Gly Phe His Thr Leu Leu Arg Asp
3220 3225 3230
Ala Phe Gly Glu Asp Arg Ile Cys Arg Ala Val Val His Met Trp Ser
3235 3240 3245
Leu Asp Ala Thr Ala Ala Gly Glu Arg Ala Thr Ala Glu Ser Leu Gln
3250 3255 3260
Ala Asp Gln Leu Leu Gly Ser Leu Ser Ala Leu Ser Leu Val Gln Ala
3265 3270 3275 3280
Leu Val Arg Arg Arg Trp Arg Asn Met Pro Arg Leu Trp Leu Leu Thr
3285 3290 3295
Arg Ala Val His Ala Val Gly Ala Glu Asp Ala Ala Ala Ser Val Ala
3300 3305 3310
Gln Ala Pro Val Trp Gly Leu Gly Arg Thr Leu Ala Leu Glu His Pro
3315 3320 3325
Glu Leu Arg Cys Thr Leu Val Asp Val Asn Pro Ala Pro Ser Pro Glu
3330 3335 3340
Asp Ala Ala Ala Leu Ala Val Glu Leu Gly Ala Ser Asp Arg Glu Asp
3345 3350 3355 3360
Gln Val Ala Leu Arg Ser Asp Gly Arg Tyr Val Ala Arg Leu Val Arg
3365 3370 3375
Ser Ser Phe Ser Gly Lys Pro Ala Thr Asp Cys Gly Ile Arg Ala Asp
3380 3385 3390
Gly Ser Tyr Val Ile Thr Asp Gly Met Gly Arg Val Gly Leu Ser Val
3395 3400 3405
Ala Gln Trp Met Val Met Gln Gly Ala Arg His Val Val Leu Val Asp
3410 3415 3420
Arg Gly Gly Ala Ser Glu Ala Ser Arg Asp Ala Leu Arg Ser Met Ala
3425 3430 3435 3440
Glu Ala Gly Ala Glu Val Gln Ile Val Glu Ala Asp Val Ala Arg Arg
3445 3450 3455
Asp Asp Val Ala Arg Leu Leu Ser Lys Ile Glu Pro Ser Met Pro Pro
3460 3465 3470
Leu Arg Gly Ile Val Tyr Val Asp Gly Thr Phe Gln Gly Asp Ser Ser
3475 3480 3485
Met Leu Glu Leu Asp Ala Arg Arg Phe Lys Glu Trp Met Tyr Pro Lys
3490 3495 3500
Val Leu Gly Ala Trp Asn Leu His Ala Leu Thr Arg Asp Arg Ser Leu
3505 3510 3515 3520
Asp Phe Phe Val Leu Tyr Ser Ser Gly Thr Ser Leu Leu Gly Leu Pro
3525 3530 3535
Gly Gln Gly Ser Arg Ala Ala Gly Asp Ala Phe Leu Asp Ala Ile Ala
3540 3545 3550
His His Arg Cys Lys Val Gly Leu Thr Ala Met Ser Ile Asn Trp Gly
3555 3560 3565
Leu Leu Ser Glu Ala Ser Ser Pro Ala Thr Pro Asn Asp Gly Gly Ala
3570 3575 3580
Arg Leu Glu Tyr Arg Gly Met Glu Gly Leu Thr Leu Glu Gln Gly Ala
3585 3590 3595 3600
Ala Ala Leu Gly Arg Leu Leu Ala Arg Pro Arg Ala Gln Val Gly Val
3605 3610 3615
Met Arg Leu Asn Leu Arg Gln Trp Leu Glu Phe Tyr Pro Asn Ala Ala
3620 3625 3630
Arg Leu Ala Leu Trp Ala Glu Leu Leu Lys Glu Arg Asp Arg Ala Asp
3635 3640 3645
Arg Gly Ala Ser Asn Ala Ser Asn Leu Arg Glu Ala Leu Gln Ser Ala
3650 3655 3660
Arg Pro Glu Asp Arg Gln Leu Ile Leu Glu Lys His Leu Ser Glu Leu
3665 3670 3675 3680
Leu Gly Arg Gly Leu Arg Leu Pro Pro Glu Arg Ile Glu Arg His Val
3685 3690 3695
Pro Phe Ser Asn Leu Gly Met Asp Ser Leu Ile Gly Leu Glu Leu Arg
3700 3705 3710
Asn Arg Ile Glu Ala Ala Leu Gly Ile Thr Val Pro Ala Thr Leu Leu
3715 3720 3725
Trp Thr Tyr Pro Asn Val Ala Ala Leu Ser Gly Ser Leu Leu Asp Ile
3730 3735 3740
Leu Phe Pro Asn Ala Gly Ala Thr His Ala Pro Ala Thr Glu Arg Glu
3745 3750 3755 3760
Lys Ser Phe Glu Asn Asp Ala Ala Asp Leu Glu Ala Leu Arg Gly Met
3765 3770 3775
Thr Asp Glu Gln Lys Asp Ala Leu Leu Ala Glu Lys Leu Ala Gln Leu
3780 3785 3790
Ala Gln Ile Val Gly Glu
3795
<210>7
<211>2439
<212>PRT
<213>Sorangium cellulosum
<400>7
Met Ala Thr Thr Asn Ala Gly Lys Leu Glu His Ala Leu Leu Leu Met
1 5 10 15
Asp Lys Leu Ala Lys Lys Aln Ala Ser Glu Glu Gln Glu Arg Thr Glu
20 25 30
Pro Ile Ala Ile Val Gly Ile Gly Cys Arg Phe Pro Gly Gly Ala Asp
35 40 45
Thr Pro Glu Ala Phe Trp Glu Leu Leu Asp Ser Gly Arg Asp Ala Val
50 55 60
Gln Pro Leu Asp Arg Arg Trp Ala Leu Val Gly Val His Pro Ser Glu
65 70 75 80
Glu Val Pro Arg Trp Ala Gly Leu Leu Thr Glu Ala Val Asp Gly Phe
85 90 95
Asp Ala Ala Phe Phe Gly Thr Ser Pro Arg Glu Ala Arg Ser Leu Asp
100 105 110
Pro Gln Gln Arg Leu Leu Leu Glu Val Thr Trp Glu Gly Leu Glu Asp
115 120 125
Ala Gly Ile Ala Pro Gln Ser Leu Asp Gly Ser Arg Thr Gly Val Phe
130 135 140
Leu Gly Ala Cys Ser Ser Asp Tyr Ser His Thr Val Ala Gln Gln Arg
145 150 155 160
Arg Glu Glu Gln Asp Ala Tyr Asp Ile Thr Gly Asn Thr Leu Ser Val
165 170 175
Ala Ala Gly Arg Leu Ser Tyr Thr Leu Gly Leu Gln Gly Pro Cys Leu
180 185 190
Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Ile His Leu Ala
195 200 205
Cys Arg Ser Leu Arg Ala Arg Glu Ser Asp Leu Ala Leu Ala Gly Gly
210 215 220
Val Asn Met Leu Leu Ser Ser Lys Thr Met Ile Met Leu Gly Arg Ile
225 230 235 240
Gln Ala Leu Ser Pro Asp Gly His Cys Arg Thr Phe Asp Ala Ser Ala
245 250 255
Asn Gly Phe Val Arg Gly Glu Gly Cys Gly Met Val Val Leu Lys Arg
260 265 270
Leu Ser Asp Ala Gln Arg His Gly Asp Arg Ile Trp Ala Leu Ile Arg
275 280 285
Gly Ser Ala Met Asn Gln Asp Gly Arg Ser Thr Gly Leu Met Ala Pro
290 295 300
Asn Val Leu Ala Gln Glu Ala Leu Leu Arg Glu Ala Leu Gln Ser Ala
305 310 315 320
Arg Val Asp Ala Gly Ala Ile Gly Tyr Val Glu Thr His Gly Thr Gly
325 330 335
Thr Ser Leu Gly Asp Pro Ile Glu Val Glu Ala Leu Arg Ala Val Leu
340 345 350
Gly Pro Ala Arg Ala Asp Gly Ser Arg Cys Val Leu Gly Ala Val Lys
355 360 365
Thr Asn Leu Gly His Leu Glu Gly Ala Ala Gly Val Ala Gly Leu Ile
370 375 380
Lys Ala Ala Leu Ala Leu His His Glu Leu Ile Pro Arg Asn Leu His
385 390 395 400
Phe His Thr Leu Asn Pro Arg Ile Arg Ile Glu Gly Thr Ala Leu Ala
405 410 415
Leu Ala Thr Glu Pro Val Pro Trp Pro Arg Ala Gly Arg Pro Arg Phe
420 425 430
Ala Gly Val Ser Ala Phe Gly Leu Ser Gly Thr Asn Val His Val Val
435 440 445
Leu Glu Glu Ala Pro Ala Thr Val Leu Ala Pro Ala Thr Pro Gly Arg
450 455 460
Ser Ala Glu Leu Leu Val Leu Ser Ala Lys Ser Ala Ala Ala Leu Asp
465 470 475 480
Ala Gln Ala Ala Arg Leu Ser Ala His Ile Ala Ala Tyr Pro Glu Gln
485 490 495
Gly Leu Gly Asp Val Ala Phe Ser Leu Val Ser Thr Arg Ser Pro Met
500 505 510
Glu His Arg Leu Ala Val Ala Ala Thr Ser Arg Glu Ala Leu Arg Ser
515 520 525
Ala Leu Glu Val Ala Ala Gln Gly Gln Thr Pro Ala Gly Ala Ala Arg
530 535 540
Gly Arg Ala Ala Ser Ser Pro Gly Lys Leu Ala Phe Leu Phe Ala Gly
545 550 555 560
Gln Gly Ala Gln Val Pro Gly Met Gly Arg Gly Leu Trp Glu Ala Trp
565 570 575
Pro Ala Phe Arg Glu Thr Phe Asp Arg Cys Val Thr Leu Phe Asp Arg
580 585 590
Glu Leu His Gln Pro Leu Cys Glu Val Met Trp Ala Glu Pro Gly Ser
595 600 605
Ser Arg Ser Ser Leu Leu Asp Gln Thr Ala Phe Thr Gln Pro Ala Leu
610 615 620
Phe Ala Leu Glu Tyr Ala Leu Ala Ala Leu Phe Arg Ser Trp Gly Val
625 630 635 640
Glu Pro Glu Leu Val Ala Gly His Ser Leu Gly Glu Leu Val Ala Ala
645 650 655
Cys Val Ala Gly Val Phe Ser Leu Glu Asp Ala Val Arg Leu Val Val
660 665 670
Ala Arg Gly Arg Leu Met Gln Ala Leu Pro Ala Gly Gly Ala Met Val
675 680 685
Ser Ile Ala Ala Pro Glu Ala Asp Val Ala Ala Ala Val Ala Pro His
690 695 700
Ala Ala Leu Val Ser Ile Ala Ala Val Asn Gly Pro Glu Gln Val Val
705 710 715 720
Ile Ala Gly Ala Glu Lys Phe Val Gln Gln Ile Ala Ala Ala Phe Ala
725 730 735
Ala Arg Gly Ala Arg Thr Lys Pro Leu His Val Ser His Ala Phe His
740 745 750
Ser Pro Leu Met Asp Pro Met Leu Glu Ala Phe Arg Arg Val Thr Glu
755 760 765
Ser Val Thr Tyr Arg Arg Pro Ser Ile Ala Leu Val Ser Asn Leu Ser
770 775 780
Gly Lys Pro Cys Thr Asp Glu Val Ser Ala Pro Gly Tyr Trp Val Arg
785 790 795 800
His Ala Arg Glu Ala Val Arg Phe Ala Asp Gly Val Lys Ala Leu His
805 810 815
Ala Ala Gly Ala Gly Leu Phe Val Glu Val Gly Pro Lys Pro Thr Leu
820 825 830
Leu Gly Leu Val Pro Ala Cys Leu Pro Asp Ala Arg Pro Val Leu Leu
835 840 845
Pro Ala Ser Arg Ala Gly Arg Asp Glu Ala Ala Ser Ala Leu Glu Ala
850 855 860
Leu Gly Gly Phe Trp Val Val Gly Gly Ser Val Thr Trp Ser Gly Val
865 870 875 880
Phe Pro Ser Gly Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln
885 890 895
Arg Glu Arg Tyr Trp Ile Glu Ala Pro Val Asp Arg Glu Ala Asp Gly
900 905 910
Thr Gly Arg Ala Arg Ala Gly Gly His Pro Leu Leu Gly Glu Val Phe
915 920 925
Ser Val Ser Thr His Ala Gly Leu Arg Leu Trp Glu Thr Thr Leu Asp
930 935 940
Arg Lys Arg Leu Pro Trp Leu Gly Glu His Arg Ala Gln Gly Glu Val
945 950 955 960
Val Phe Pro Gly Ala Gly Tyr Leu Glu Met Ala Leu Ser Ser Gly Ala
965 970 975
Glu Ile Leu Gly Asp Gly Pro Ile Gln Val Thr Asp Val Val Leu Ile
980 985 990
Glu Thr Leu Thr Phe Ala Gly Asp Thr Ala Val Pro Val Gln Val Val
995 1000 1005
Thr Thr Glu Glu Arg Pro Gly Arg Leu Arg Phe Gln Val Ala Ser Arg
1010 1015 1020
Glu Pro Gly Glu Arg Arg Ala Pro Phe Arg Ile His Ala Arg Gly Val
1025 1030 1035 1040
Leu Arg Arg Ile Gly Arg Val Glu Thr Pro Ala Arg Ser Asn Leu Ala
1045 1050 1055
Ala Leu Arg Ala Arg Leu His Ala Ala Val Pro Ala Ala Ala Ile Tyr
1060 1065 1070
Gly Ala Leu Ala Glu Met Gly Leu Gln Tyr Gly Pro Ala Leu Arg Gly
1075 1080 1085
Leu Ala Glu Leu Trp Arg Gly Glu Gly Glu Ala Leu Gly Arg Val Arg
1090 1095 1100
Leu Pro Glu Ala Ala Gly Ser Ala Thr Ala Tyr Gln Leu His Pro Val
1105 1110 1115 1120
Leu Leu Asp Ala Cys Val Gln Met Ile Val Gly Ala Phe Ala Asp Arg
1125 1130 1135
Asp Glu Ala Thr Pro Trp Ala Pro Val Glu Val Gly Ser Val Arg Leu
1140 1145 1150
Phe Gln Arg Ser Pro Gly Glu Leu Trp Cys His Ala Arg Val Val Ser
1155 1160 1165
Asp Gly Gln Gln Ala Ser Ser Arg Trp Ser Ala Asp Phe Glu Leu Met
1170 1175 1180
Asp Gly Thr Gly Ala Val Val Ala Glu Ile Ser Arg Leu Val Val Glu
1185 1190 1195 1200
Arg Leu Ala Ser Gly Val Arg Arg Arg Asp Ala Asp Asp Trp Phe Leu
1205 1210 1215
Glu Leu Asp Trp Glu Pro Ala Ala Leu Gly Gly Pro Lys Ile Thr Ala
1220 1225 1230
Gly Arg Trp Leu Leu Leu Gly Glu Gly Gly Gly Leu Gly Arg Ser Leu
1235 1240 1245
Cys Ser Ala Leu Lys Ala Ala Gly His Val Val Val His Ala Ala Gly
1250 1255 1260
Asp Asp Thr Ser Thr Ala Gly Met Arg Ala Leu Leu Ala Asn Ala Phe
1265 1270 1275 1280
Asp Gly Gln Ala Pro Thr Ala Val Val His Leu Ser Ser Leu Asp Gly
1285 1290 1295
Gly Gly Gln Leu Gly Pro Gly Leu Gly Ala Gln Gly Ala Leu Asp Ala
1300 1305 1310
Pro Arg Ser Pro Asp Val Asp Ala Asp Ala Leu Glu Ser Ala Leu Met
1315 1320 1325
Arg Gly Cys Asp Ser Val Leu Ser Leu Val Gln Ala Leu Val Gly Met
1330 1335 1340
Asp Leu Arg Asn Ala Pro Arg Leu Trp Leu Leu Thr Arg Gly Ala Gln
1345 1350 1355 1360
Ala Ala Ala Ala Gly Asp Val Ser Val Val Gln Ala Pro Leu Leu Gly
1365 1370 1375
Leu Gly Arg Thr Ile Ala Leu Glu His Ala Glu Leu Arg Cys Ile Ser
1380 1385 1390
Val Asp Leu Asp Pro Ala Glu Pro Glu Gly Glu Ala Asp Ala Leu Leu
1395 1400 1405
Ala Glu Leu Leu Ala Asp Asp Ala Glu Glu Glu Val Ala Leu Arg Gly
1410 1415 1420
Gly Asp Arg Leu Val Ala Arg Leu Val His Arg Leu Pro Asp Ala Gln
1425 1430 1435 1440
Arg Arg Glu Lys Val Glu Pro Ala Gly Asp Arg Pro Phe Arg Leu Glu
1445 1450 1455
Ile Asp Glu Pro Gly Ala Leu Asp Gln Leu Val Leu Arg Ala Thr Gly
1460 1465 1470
Arg Arg Ala Pro Gly Pro Gly Glu Val Glu Ile Ser Val Glu Ala Ala
1475 1480 1485
Gly Leu Asp Ser Ile Asp Ile Gln Leu Ala Leu Gly Val Ala Pro Asn
1490 1495 1500
Asp Leu Pro Gly Glu Glu Ile Glu Pro Leu Val Leu Gly Ser Glu Cys
1505 1510 1515 1520
Ala Gly Arg Ile Val Ala Val Gly Glu Gly Val Asn Gly Leu Val Val
1525 1530 1535
Gly Gln Pro Val Ile Ala Leu Ala Ala Gly Val Phe Ala Thr His Val
1540 1545 1550
Thr Thr Ser Ala Thr Leu Val Leu Pro Arg Pro Leu Gly Leu Ser Ala
1555 1560 1565
Thr Glu Ala Ala Ala Met Pro Leu Ala Tyr Leu Thr Ala Trp Tyr Ala
1570 1575 1580
Leu Asp Lys Val Ala His Leu Gln Ala Gly Glu Arg Val Leu Ile His
1585 1590 1595 1600
Ala Glu Ala Gly Gly Val Gly Leu Cys Ala Val Arg Trp Ala Gln Arg
1605 1610 1615
Val Gly Ala Glu Val Tyr Ala Thr Ala Asp Thr Pro Glu Asn Arg Ala
1620 1625 1630
Tyr Leu Glu Ser Leu Gly Val Arg Tyr Val Ser Asp Ser Arg Ser Gly
1635 1640 1645
Arg Phe Val Thr Asp Val His Ala Trp Thr Asp Gly Glu Gly Val Asp
1650 1655 1660
Val Val Leu Asp Ser Leu Ser Gly Glu Arg Ile Asp Lys Ser Leu Met
1665 1670 1675 1680
Val Leu Arg Ala Cys Gly Arg Leu Val Lys Leu Gly Arg Arg Asp Asp
1685 1690 1695
Cys Ala Asp Thr Gln Pro Gly Leu Pro Pro Leu Leu Arg Asn Phe Ser
1700 1705 1710
Phe Ser Gln Val Asp Leu Arg Gly Met Met Leu Asp Gln Pro Ala Arg
1715 1720 1725
Ile Arg Ala Leu Leu Asp Glu Leu Phe Gly Leu Val Ala Ala Gly Ala
1730 1735 1740
Ile Ser Pro Leu Gly Ser Gly Leu Arg Val Gly Gly Ser Leu Thr Pro
1745 1750 1755 1760
Pro Pro Val Glu Thr Phe Pro Ile Ser Arg Ala Ala Glu Ala Phe Arg
1765 1770 1775
Arg Met Ala Gln Gly Gln His Leu Gly Lys Leu Val Leu Thr Leu Asp
1780 1785 1790
Asp Pro Glu Val Arg Ile Arg Ala Pro Ala Glu Ser Ser Val Ala Val
1795 1800 1805
Arg Ala Asp Gly Thr Tyr Leu Val Thr Gly Gly Leu Gly Gly Leu Gly
1810 1815 1820
Leu Arg Val Ala Gly Trp Leu Ala Glu Arg Gly Ala Gly Gln Leu Val
1825 1830 1835 1840
Leu Val Gly Arg Ser Gly Ala Ala Ser Ala Glu Gln Arg Ala Ala Val
1845 1850 1855
Ala Ala Leu Glu Ala His Gly Ala Arg Val Thr Val Ala Lys Ala Asp
1860 1865 1870
Val Ala Asp Arg Ser Gln Ile Glu Arg Val Leu Arg Glu Val Thr Ala
1875 1880 1885
Ser Gly Met Pro Leu Arg Gly Val Val His Ala Ala Gly Leu Val Asp
1890 1895 1900
Asp Gly Leu Leu Met Gln Gln Thr Pro Ala Arg Phe Arg Thr Val Met
1905 1910 1915 1920
Gly Pro Lys Val Gln Gly Ala Leu His Leu His Thr Leu Thr Arg Glu
1925 1930 1935
Ala Pro Leu Ser Phe Phe Val Leu Tyr Ala Ser Ala Ala Gly Leu Phe
1940 1945 1950
Gly Ser Pro Gly Gln Gly Asn Tyr Ala Ala Ala Asn Ala Phe Leu Asp
1955 1960 1965
Ala Leu Ser His His Arg Arg Ala Gln Gly Leu Pro Ala Leu Ser Ile
1970 1975 1980
Asp Trp Gly Met Phe Thr Glu Val Gly Met Ala Val Ala Gln Glu Asn
1985 1990 1995 2000
Arg Gly Ala Arg Gln Ile Ser Arg Gly Met Arg Gly Ile Thr Pro Asp
2005 2010 2015
Glu Gly Leu Ser Ala Leu Ala Arg Leu Leu Glu Gly Asp Arg Val Gln
2020 2025 2030
Thr Gly Val Ile Pro Ile Thr Pro Arg Gln Trp Val Glu Phe Tyr Pro
2035 2040 2045
Ala Thr Ala Ala Ser Arg Arg Leu Ser Arg Leu Val Thr Thr Gln Arg
2050 2055 2060
Ala Val Ala Asp Arg Thr Ala Gly Asp Arg Asp Leu Leu Glu Gln Leu
2065 2070 2075 2080
Ala Ser Ala Glu Pro Ser Ala Arg Ala Gly Leu Leu Gln Asp Val Val
2085 2090 2095
Arg Val Gln Val Ser His Val Leu Arg Leu Pro Glu Asp Lys Ile Glu
2100 2105 2110
Val Asp Ala Pro Leu Ser Ser Met Gly Met Asp Ser Leu Met Ser Leu
2115 2120 2125
Glu Leu Arg Asn Arg Ile Glu Ala Ala Leu Gly Val Ala Ala Pro Ala
2130 2135 2140
Ala Leu Gly Trp Thr Tyr Pro Thr Val Ala Ala Ile Thr Arg Trp Leu
2145 2150 2155 2160
Leu Asp Asp Ala Leu Val Val Arg Leu Gly Gly Gly Ser Asp Thr Asp
2165 2170 2175
Glu Ser Thr Ala Ser Ala Gly Ser Phe Val His Val Leu Arg Phe Arg
2180 2185 2190
Pro Val Val Lys Pro Arg Ala Arg Leu Phe Cys Phe His Gly Ser Gly
2195 2200 2205
Gly Ser Pro Glu Gly Phe Arg Ser Trp Ser Glu Lys Ser Glu Trp Ser
2210 2215 2220
Asp Leu Glu Ile Val Ala Met Trp His Asp Arg Ser Leu Ala Ser Glu
2225 2230 2235 2240
Asp Ala Pro Gly Lys Lys Tyr Val Gln Glu Ala Ala Ser Leu Ile Gln
2245 2250 2255
His Tyr Ala Asp Ala Pro Phe Ala Leu Val Gly Phe Ser Leu Gly Val
2260 2265 2270
Arg Phe Val Met Gly Thr Ala Val Glu Leu Ala Ser Arg Ser Gly Ala
2275 2280 2285
Pro Ala Pro Leu Ala Val Phe Thr Leu Gly Gly Ser Leu Ile Ser Ser
2290 2295 2300
Ser Glu Ile Thr Pro Glu Met Glu Thr Asp Ile Ile Ala Lys Leu Phe
2305 2310 2315 2320
Phe Arg Asn Ala Ala Gly Phe Val Arg Ser Thr Gln Gln Val Gln Ala
2325 2330 2335
Asp Ala Arg Ala Asp Lys Val Ile Thr Asp Thr Met Val Ala Pro Ala
2340 2345 2350
Pro Gly Asp Ser Lys Glu Pro Pro Val Lys Ile Ala Val Pro Ile Val
2355 2360 2365
Ala Ile Ala Gly Ser Asp Asp Val Ile Val Pro Pro Ser Asp Val Gln
2370 2375 2380
Asp Leu Gln Ser Arg Thr Thr Glu Arg Phe Tyr Met His Leu Leu Pro
2385 2390 2395 2400
Gly Asp His Glu Phe Leu Val Asp Arg Gly Arg Glu Ile Met His Ile
2405 2410 2415
Val Asp Ser His Leu Asn Pro Leu Leu Ala Ala Arg Thr Thr Ser Ser
2420 2425 2430
Gly Pro Ala Phe Glu Ala Lys
2435
<210>8
<211>419
<212>PRT
<213>Sorangium cellulosum
<400>8
Met Thr Gln Glu Gln Ala Asn Gln Ser Glu Thr Lys Pro Ala Phe Asp
1 5 10 15
Phe Lys Pro Phe Ala Pro Gly Tyr Ala Glu Asp Pro Phe Pro Ala Ile
20 25 30
Glu Arg Leu Arg Glu Ala Thr Pro Ile Phe Tyr Trp Asp Glu Gly Arg
35 40 45
Ser Trp Val Leu Thr Arg Tyr His Asp Val Ser Ala Val Phe Arg Asp
50 55 60
Glu Arg Phe Ala Val Ser Arg Glu Glu Trp Glu Ser Ser Ala Glu Tyr
65 70 75 80
Ser Ser Ala Ile Pro Glu Leu Ser Asp Met Lys Lys Tyr Gly Leu Phe
85 90 95
Gly Leu Pro Pro Glu Asp His Ala Arg Val Arg Lys Leu Val Asn Pro
100 105 110
Ser Phe Thr Ser Arg Ala Ile Asp Leu Leu Arg Ala Glu Ile Gln Arg
115 120 125
Thr Val Asp Gln Leu Leu Asp Ala Arg Ser Gly Gln Glu Glu Phe Asp
130 135 140
Val Val Arg Asp Tyr Ala Glu Gly Ile Pro Met Arg Ala Ile Ser Ala
145 150 155 160
Leu Leu Lys Val Pro Ala Glu Cys Asp Glu Lys Phe Arg Arg Phe Gly
165 170 175
Ser Ala Thr Ala Arg Ala Leu Gly Val Gly Leu Val Pro Gln Val Asp
180 185 190
Glu Glu Thr Lys Thr Leu Val Ala Ser Val Thr Glu Gly Leu Ala Leu
195 200 205
Leu His Asp Val Leu Asp Glu Arg Arg Arg Asn Pro Leu Glu Asn Asp
210 215 220
Val Leu Thr Met Leu Leu Gln Ala Glu Ala Asp Gly Ser Arg Leu Ser
225 230 235 240
Thr Lys Glu Leu Val Ala Leu Val Gly Ala Ile Ile Ala Ala Gly Thr
245 250 255
Asp Thr Thr Ile Tyr Leu Ile Ala Phe Ala Val Leu Asn Leu Leu Arg
260 265 270
Ser Pro Glu Ala Leu Glu Leu Val Lys Ala Glu Pro Gly Leu Met Arg
275 280 285
Asn Ala Leu Asp Glu Val Leu Arg Phe Asp Asn Ile Leu Arg Ile Gly
290 295 300
Thr Val Arg Phe Ala Arg Gln Asp Leu Glu Tyr Cys Gly Ala Ser Ile
305 310 315 320
Lys Lys Gly Glu Met Val Phe Leu Leu Ile Pro Ser Ala Leu Arg Asp
325 330 335
Gly Thr Val Phe Ser Arg Pro Asp Val Phe Asp Val Arg Arg Asp Thr
340 345 350
Gly Ala Ser Leu Ala Tyr Gly Arg Gly Pro His Val Cys Pro Gly Val
355 360 365
Ser Leu Ala Arg Leu Glu Ala Glu Ile Ala Val Gly Thr Ile Phe Arg
370 375 380
Arg Phe Pro Glu Met Lys Leu Lys Glu Thr Pro Val Phe Gly Tyr His
385 390 395 400
Pro Ala Phe Arg Asn Ile Glu Ser Leu Asn Val Ile Leu Lys Pro Ser
405 410 415
Lys Ala Gly
<210>9
<211>607
<212>PRT
<213>Sorangium cellulosum
<400>9
Ala Ser Leu Asp Ala Leu Phe Ala Arg Ala Thr Ser Ala Arg Val Leu
1 5 10 15
Asp Asp Gly His Gly Arg Ala Thr Glu Arg His Val Leu Ala Glu Ala
20 25 30
Arg Gly Ile Glu Asp Leu Arg Ala Leu Arg Glu His Leu Arg Ile Gln
35 40 45
Glu Gly Gly Pro Ser Phe His Cys Met Cys Leu Gly Asp Leu Thr Val
50 55 60
Glu Leu Leu Ala His Asp Gln Pro Leu Ala Ser Ile Ser Phe His His
65 70 75 80
Ala Arg Ser Leu Arg His Pro Asp Trp Thr Ser Asp Ala Met Leu Val
85 90 95
Asp Gly Pro Ala Leu Val Arg Trp Leu Ala Ala Arg Gly Ala Pro Gly
100 105 110
Pro Leu Arg Glu Tyr Glu Glu Glu Arg Glu Arg Ala Arg Thr Ala Gln
115 120 125
Glu Ala Arg Arg Leu Trp Leu Ala Ala Ala Pro Pro Cys Phe Ala Pro
130 135 140
Asp Leu Pro Arg Phe Glu Asp Asp Ala Asn Gly Leu Pro Leu Gly Pro
145 150 155 160
Met Ser Pro Glu Val Ala Glu Ala Glu Arg Arg Leu Arg Ala Ser Tyr
165 170 175
Ala Thr Pro Glu Leu Ala Cys Ala Ala Leu Leu Ala Trp Leu Gly Thr
180 185 190
Gly Ala Gly Pro Trp Ser Gly Tyr Pro Ala Tyr Glu Met Leu Pro Glu
195 200 205
Asn Leu Leu Leu Gly Phe Gly Leu Pro Thr Ala Ile Ala Ala Ala Ser
210 215 220
Ala Pro Gly Thr Ser Glu Ala Ala Leu Arg Gly Ala Ala Arg Leu Phe
225 230 235 240
Ala Ser Trp Glu Val Val Ser Ser Lys Lys Ser Gln Leu Gly Asn Ile
245 250 255
Pro Glu Ala Leu Trp Glu Arg Leu Arg Thr Ile Val Arg Ala Met Gly
260 265 270
Asn Ala Asp Asn Leu Ser Arg Phe Glu Arg Ala Glu Ala Ile Ala Ala
275 280 285
Glu Val Arg Arg Leu Arg Ala Gln Pro Ala Pro Phe Ala Ala Gly Ala
290 295 300
Gly Leu Ala Val Ala Gly Val Ser Ser Ser Gly Arg Leu Ser Gly Leu
305 310 315 320
Val Thr Asp Gly Asp Ala Leu Tyr Ser Gly Asp Gly Asn Asp Ile Val
325 330 335
Met Phe Gln Pro Gly Arg Ile Ser Pro Val Val Leu Leu Ala Gly Thr
340 345 350
Asp Pro Phe Phe Glu Leu Ala Pro Pro Leu Ser Gln Met Leu Phe Val
355 360 365
Ala His Ala Asn Ala Gly Thr Ile Ser Lys Val Leu Thr Glu Gly Ser
370 375 380
Pro Leu Ile Val Met Ala Arg Asn Gln Ala Arg Pro Met Ser Leu Val
385 390 395 400
His Ala Arg Gly Phe Met Ala Trp Val Asn Gln Ala Met Val Pro Asp
405 410 415
Pro Glu Arg Gly Ala Pro Phe Val Val Gln Arg Ser Thr Ile Met Glu
420 425 430
Phe Glu His Pro Thr Pro Arg Cys Leu His Glu Pro Ala Gly Ser Ala
435 440 445
Phe Ser Leu Ala Cys Asp Glu Glu His Leu Tyr Trp Cys Glu Leu Ser
450 455 460
Ala Gly Arg Leu Glu Leu Trp Arg His Pro His His Arg Pro Gly Ala
465 470 475 480
Pro Ser Arg Phe Ala Tyr Leu Gly Glu His Pro Ile Ala Ala Thr Trp
485 490 495
Tyr Pro Ser Leu Thr Leu Asn Ala Thr His Val Leu Trp Ala Asp Pro
500 505 510
Asp Arg Arg Ala Ile Leu Gly Val Asp Lys Arg Thr Gly Val Glu Pro
515 520 525
Ile Val Leu Ala Glu Thr Arg His Pro Pro Ala His Val Val Ser Glu
530 535 540
Asp Arg Asp Ile Phe Ala Leu Thr Gly Gln Pro Asp Ser Arg Asp Trp
545 550 555 560
His Val Glu His Ile Arg Ser Gly Ala Ser Thr Val Val Ala Asp Tyr
565 570 575
Gln Arg Gln Leu Trp Asp Arg Pro Asp Met Val Leu Asn Arg Arg Gly
580 585 590
Leu Phe Phe Thr Thr Asn Asp Arg Ile Leu Thr Leu Ala Arg Ser
595 600 605
<210>10
<211>423
<212>PRT
<213>Sorangium cellulosum
<400>10
Met Gly Ala Leu Ile Ser Val Ala Ala Pro Gly Cys Ala Leu Gly Gly
1 5 10 15
Ala Glu Glu Glu Gly Gln Pro Gly Gln Asp Ala Gly Ala Gly Ala Leu
20 25 30
Ala Pro Ala Arg Glu Val Met Ala Ala Glu Val Ala Ala Gly Gln Met
35 40 45
Pro Gly Ala Val Trp Leu Val Ala Arg Gly Asp Asp Val His Val Asp
50 55 60
Ala Val Gly Val Thr Glu Leu Gly Gly Ser Ala Pro Met Arg Arg Asp
65 70 75 80
Thr Ile Phe Arg Ile Ala Ser Met Thr Lys Ala Val Thr Ala Thr Ala
85 90 95
Val Met Met Leu Val Glu Glu Gly Lys Leu Asp Leu Asp Ser Pro Val
100 105 110
Asp Arg Trp Leu Pro Glu Leu Ala Asn Arg Lys Val Leu Ala Arg Ile
115 120 125
Asp Gly Pro Ile Asp Glu Thr Val Pro Ala Glu Arg Pro Ile Thr Val
130 135 140
Arg Asp Leu Met Thr Phe Thr Met Gly Phe Gly Ile Ser Phe Asp Ala
145 150 155 160
Ser Ser Pro Ile Gln Arg Ala Ile Asp Glu Leu Gly Leu Val Asn Ala
165 170 175
Gln Pro Val Pro Met Thr Pro His Gly Pro Asp Glu Trp Ile Arg Arg
180 185 190
Leu Gly Thr Leu Pro Leu Met His Gln Pro Gly Ala Gln Trp Met Tyr
195 200 205
Asn Thr Gly Ser Leu Val Gln Gly Val Leu Val Gly Arg Ala Ala Asp
210 215 220
Gln Gly Phe Asp Ala Phe Val Arg Glu Arg Ile Leu Ala Pro Leu Gly
225 230 235 240
Met Arg Asp Thr Asp Phe His Val Pro Ala Asp Lys Leu Ala Arg Phe
245 250 255
Ala Gly Cys Gly Tyr Phe Thr Asp Glu Gln Thr Gly Glu Lys Thr Arg
260 265 270
Met Asp Arg Asp Gly Ala Glu Ser Ala Tyr Ala Ser Pro Pro Ala Phe
275 280 285
Pro Ser Gly Ala Ala Gly Leu Val Ser Thr Val Asp Asp Tyr Leu Leu
290 295 300
Phe Ala Arg Met Leu Met Asn Gly Gly Val His Glu Gly Arg Arg Leu
305 310 315 320
Leu Ser Ala Ala Ser Val Arg Glu Met Thr Ala Asp His Leu Thr Pro
325 330 335
Ala Gln Lys Ala Ala Ser Ser Phe Phe Pro Gly Phe Phe Glu Thr His
340 345 350
Gly Trp Gly Tyr Gly Met Ala Val Val Thr Ala Pro Asp Ala Val Ser
355 360 365
Glu Val Pro Gly Arg Tyr Gly Trp Asp Gly Gly Phe Gly Thr Ser Trp
370 375 380
Ile Asn Asp Pro Gly Arg Glu Leu Ile Gly Ile Val Met Thr Gln Ser
385 390 395 400
Ala Gly Phe Leu Phe Ser Gly Ala Leu Glu Arg Phe Trp Arg Ser Val
405 410 415
Tyr Val Ala Thr Glu Ser Ala
420
<210>11
<211>713
<212>PRT
<213>Sorangium cellulosum
<400>11
Met His Gly Leu Thr Glu Arg Gln Val Leu Leu Ser Leu Val Thr Leu
1 5 10 15
Ala Leu Ile Leu Val Thr Ala Arg Ala Ser Gly Glu Leu Ala Arg Arg
20 25 30
Leu Arg Gln Pro Glu Val Leu Gly Glu Leu Phe Gly Gly Val Val Leu
35 40 45
Gly Pro Ser Val Val Gly Ala Leu Ala Pro Gly Phe His Arg Ala Leu
50 55 60
Phe Gln Glu Pro Ala Val Gly Val Val Leu Ser Gly Ile Ser Trp Ile
65 70 75 80
Gly Ala Leu Leu Leu Leu Leu Met Ala Gly Ile Glu Val Asp Val Gly
85 90 95
Ile Leu Arg Lys Glu Ala Arg Pro Gly Ala Leu Ser Ala Leu Gly Ala
100 105 110
Ile Ala Pro Pro Leu Ala Ala Gly Ala Ala Phe Ser Ala Leu Val Leu
115 120 125
Asp Arg Pro Leu Pro Ser Gly Leu Phe Leu Gly Ile Val Leu Ser Val
130 135 140
Thr Ala Val Ser Val Ile Ala Lys Val Leu Ile Glu Arg Glu Ser Met
145 150 155 160
Arg Arg Ser Tyr Ala Gln Val Thr Leu Ala Ala Gly Val Val Ser Glu
165 170 175
Val Ala Ala Trp Val Leu Val Ala Met Thr Ser Ser Ser Tyr Gly Ala
180 185 190
Ser Pro Ala Leu Ala Val Ala Arg Ser Ala Leu Leu Ala Ser Gly Phe
195 200 205
Leu Leu Phe Met Val Leu Val Gly Arg Arg Leu Thr His Leu Ala Met
210 215 220
Arg Trp Val Ala Asp Ala Thr Arg Val Ser Lys Gly Gln Val Ser Leu
225 230 235 240
Val Leu Val Leu Thr Phe Leu Ala Ala Ala Leu Thr Gln Arg Leu Gly
245 250 255
Leu His Pro Leu Leu Gly Ala Phe Ala Leu Gly Val Leu Leu Asn Ser
260 265 270
Ala Pro Arg Thr Asn Arg Pro Leu Leu Asp Gly Val Gln Thr Leu Val
275 280 285
Ala Gly Leu Phe Ala Pro Val Phe Phe Val Leu Ala Gly Met Arg Val
290 295 300
Asp Val Ser Gln Leu Arg Thr Pro Ala Ala Trp Gly Thr Val Ala Leu
305 310 315 320
Leu Leu Ala Thr Ala Thr Ala Ala Lys Val Val Pro Ala Ala Leu Gly
325 330 335
Ala Arg Leu Gly Gly Leu Arg Gly Ser Glu Ala Ala Leu Val Ala Val
340 345 350
Gly Leu Asn Met Lys Gly Gly Thr Asp Leu Ile Val Ala Ile Val Gly
355 360 365
Val Glu Leu Gly Leu Leu Ser Asn Glu Ala Tyr Thr Met Tyr Ala Val
370 375 380
Val Ala Leu Val Thr Val Thr Ala Ser Pro Ala Leu Leu Ile Trp Leu
385 390 395 400
Glu Lys Arg Ala Pro Pro Thr Gln Glu Glu Ser Ala Arg Leu Glu Arg
405 410 415
Glu Glu Ala Ala Arg Arg Ala Tyr Ile Pro Gly Val Glu Arg Ile Leu
420 425 430
Val Pro Ile Val Ala His Ala Leu Pro Gly Phe Ala Thr Asp Ile Val
435 440 445
Glu Ser Ile Val Ala Ser Lys Arg Lys Leu Gly Glu Thr Val Asp Ile
450 455 460
Thr Glu Leu Ser Val Glu Gln Gln Ala Pro Gly Pro Ser Arg Ala Ala
465 470 475 480
Gly Glu Ala Ser Arg Gly Leu Ala Arg Leu Gly Ala Arg Leu Arg Val
485 490 495
Gly Ile Trp Arg Gln Arg Arg Glu Leu Arg Gly Ser Ile Gln Ala Ile
500 505 510
Leu Arg Ala Ser Arg Asp His Asp Leu Leu Val Ile Gly Ala Arg Ser
515 520 525
Pro Ala Arg Ala Arg Gly Met Ser Phe Gly Arg Leu Gln Asp Ala Ile
530 535 540
Val Gln Arg Ala Glu Ser Asn Val Leu Val Val Val Gly Asp Pro Pro
545 550 555 560
Ala Ala Glu Arg Ala Ser Ala Arg Arg Ile Leu Val Pro Ile Ile Gly
565 570 575
Leu Glu Tyr Ser Phe Ala Ala Ala Asp Leu Ala Ala His Val Ala Leu
580 585 590
Ala Trp Asp Ala Glu Leu Val Leu Leu Ser Ser Ala Gln Thr Asp Pro
595 600 605
Gly Ala Val Val Trp Arg Asp Arg Glu Pro Ser Arg Val Arg Ala Val
610 615 620
Ala Arg Ser Val Val Asp Glu Ala Val Phe Arg Gly Arg Arg Leu Gly
625 630 635 640
Val Arg Val Ser Ser Arg Val His Val Gly Ala His Pro Ser Asp Glu
645 650 655
Ile Thr Arg Glu Leu Ala Arg Ala Pro Tyr Asp Leu Leu Val Leu Gly
660 665 670
Cys Tyr Asp His Gly Pro Leu Gly Arg Leu Tyr Leu Gly Ser Thr Val
675 680 685
Glu Ser Val Val Val Arg Ser Arg Val Pro Val Ala Leu Leu Val Ala
690 695 700
His Gly Gly Thr Arg Glu Gln Val Arg
705 710
<210>12
<211>126
<212>PRT
<213>Sorangium cellulosum
<400>12
Met Asp Lys Pro Ile Gly Arg Thr Arg Cys Ala Ile Ala Glu Gly Tyr
1 5 10 15
Ile Pro Gly Gly Ser Asn Gly Pro Glu Pro Gln Met Thr Ser His Glu
20 25 30
Thr Ala Cys Leu Leu Asn Ala Ser Asp Arg Asp Ala Gln Val Ala Ile
35 40 45
Thr Val Tyr Phe Ser Asp Arg Asp Pro Ala Gly Pro Tyr Arg Val Thr
50 55 60
Val Pro Ala Arg Arg Thr Arg His Val Arg Phe Asn Asp Leu Thr Glu
65 70 75 80
Pro Glu Pro Ile Pro Arg Asp Thr Asp Tyr Ala Ser Val Ile Glu Ser
85 90 95
Asp Ala Pro Ile Val Val Gln His Thr Arg Leu Asp Ser Arg Gln Ala
100 105 110
Glu Asn Ala Leu Leu Ser Thr Ile Ala Tyr Thr Asp Arg Glu
115 120 125
<210>13
<211>149
<212>PRT
<213>Sorangium cellulosum
<400>13
Met Lys His Val Asp Thr Gly Arg Arg Phe Gly Arg Arg Ile Gly His
1 5 10 15
Thr Leu Gly Leu Leu Ala Ser Met Ala Leu Ala Gly Cys Gly Gly Pro
20 25 30
Ser Glu Lys Thr Val Gln Gly Thr Arg Leu Ala Pro Gly Ala Asp Ala
35 40 45
Arg Val Thr Ala Asp Val Asp Pro Asp Ala Ala Thr Thr Arg Leu Ala
50 55 60
Val Asp Val Val His Leu Ser Pro Pro Glu Arg Leu Glu Ala Gly Ser
65 70 75 80
Glu Arg Phe Val Val Trp Gln Arg Pro Ser Pro Glu Ser Pro Trp Arg
85 90 95
Arg Val Gly Val Leu Asp Tyr Asn Ala Asp Ser Arg Arg Gly Lys Leu
100 105 110
Ala Glu Thr Thr Val Pro Tyr Ala Asn Phe Glu Leu Leu Ile Thr Ala
115 120 125
Glu Lys Gln Ser Ser Pro Gln Ser Pro Ser Ser Ala Ala Val Ile Gly
130 135 140
Pro Thr Ser Val Gly
145
<210>14
<211>184
<212>PRT
<213>Sorangium cellulosum
<400>14
Val Thr Ser Glu Glu Val Pro Gly Ala Ala Leu Gly Ala Gln Ser Ser
1 5 10 15
Leu Val Arg Ala Gln His Ala Ala Arg His Val Arg Pro Cys Thr Arg
20 25 30
Ala Glu Glu Pro Pro Ala Leu Met His Gly Leu Thr Glu Arg Gln Val
35 40 45
Leu Leu Ser Leu Val Ala Leu Ala Leu Val Leu Leu Thr Ala Arg Ala
50 55 60
Phe Gly Glu Leu Ala Arg Arg Leu Arg Gln Pro Glu Val Leu Gly Glu
65 70 75 80
Leu Phe Gly Gly Val Val Leu Gly Pro Ser Val Val Gly Ala Leu Ala
85 90 95
Pro Gly Phe His Arg Val Leu Phe Gln Asp Pro Ala Val Gly Val Val
100 105 110
Leu Ser Gly Ile Ser Trp Ile Gly Ala Leu Val Leu Leu Leu Met Ala
115 120 125
Gly Ile Glu Val Asp Val Ser Ile Leu Arg Lys Glu Ala Arg Pro Gly
130 135 140
Ala Leu Ser Ala Leu Gly Ala Ile Ala Pro Pro Leu Arg Thr Pro Gly
145 150 155 160
Pro Leu Val Gln Arg Met Gln Gly Ala Phe Thr Trp Asp Leu Asp Val
165 170 175
Ser Pro Arg Arg Ser Ala Gln Ala
180
<210>15
<211>145
<212>PRT
<213>Sorangium cellulosum
<400>15
Val Asn Ala Pro Cys Met Arg Cys Thr Ser Gly Pro Gly Val Arg Ser
1 5 10 15
Gly Gly Ala Ile Ala Pro Ser Ala Glu Ser Ala Pro Gly Arg Ala Ser
20 25 30
Leu Arg Arg Met Leu Thr Ser Thr Ser Ile Pro Ala Met Ser Ser Arg
35 40 45
Thr Ser Ala Pro Ile Gln Glu Met Pro Glu Ser Thr Thr Pro Thr Ala
50 55 60
Gly Ser Trp Lys Arg Thr Arg Trp Asn Pro Gly Ala Ser Ala Pro Thr
65 70 75 80
Thr Asp Gly Pro Ser Thr Thr Pro Pro Lys Ser Ser Pro Ser Thr Ser
85 90 95
Gly Trp Arg Ser Arg Arg Ala Ser Ser Pro Lys Ala Arg Ala Val Arg
100 105 110
Arg Thr Ser Ala Arg Ala Thr Ser Glu Ser Arg Thr Cys Arg Ser Val
115 120 125
Arg Pro Cys Ile Arg Ala Gly Gly Ser Ser Ala Arg Val Gln Gly Arg
130 135 140
Thr
145
<210>16
<211>185
<212>PRT
<213>Sorangium cellulosum
<400>16
Val Leu Ala Pro Pro Ala Asp Ile Arg Pro Pro Ala Ala Ala Gln Leu
1 5 10 15
Glu Pro Asp Ser Pro Asp Asp Glu Ala Asp Glu Ala Asp Glu Ala Leu
20 25 30
Arg Pro Phe Arg Asp Ala Ile Ala Ala Tyr Ser Glu Ala Val Arg Trp
35 40 45
Ala Glu Ala Ala Gln Arg Pro Arg Leu Glu Ser Leu Val Arg Leu Ala
50 55 60
Ile Val Arg Leu Gly Lys Ala Leu Asp Lys Val Pro Phe Ala His Thr
65 70 75 80
Thr Ala Gly Val Ser Gln Ile Ala Gly Arg Leu Gln Asn Asp Ala Val
85 90 95
Trp Phe Asp Val Ala Ala Arg Tyr Ala Ser Phe Arg Ala Ala Thr Glu
100 105 110
His Ala Leu Arg Asp Ala Ala Ser Ala Met Glu Ala Leu Ala Ala Gly
115 120 125
Pro Tyr Arg Gly Ser Ser Arg Val Ser Ala Ala Val Gly Glu Phe Arg
130 135 140
Gly Glu Ala Ala Arg Leu His Pro Ala Asp Arg Val Pro Ala Ser Asp
145 150 155 160
Gln Gln Ile Leu Thr Ala Leu Arg Ala Ala Glu Arg Ala Leu Ile Ala
165 170 175
Leu Tyr Thr Ala Phe Ala Arg Glu Glu
180 185
<210>17
<211>146
<212>PRT
<213>Sorangium cellulosum
<400>17
Met Ala Asp Ala Ala Ser Arg Ser Ala Cys Ser Val Ala Ala Arg Lys
1 5 10 15
Leu Ala Tyr Arg Ala Ala Thr Ser Asn Gln Thr Ala Ser Phe Trp Ser
20 25 30
Leu Pro Ala Ile Trp Glu Thr Pro Ala Val Val Cys Ala Lys Gly Thr
35 40 45
Leu Ser Ser Ala Leu Pro Ser Arg Thr Ile Ala Ser Arg Thr Arg Leu
50 55 60
Ser Ser Arg Gly Arg Cys Ala Ala Ser Ala His Arg Thr Ala Ser Glu
65 70 75 80
Tyr Ala Ala Ile Ala Ser Arg Asn Gly Arg Ser Ala Ser Ser Ala Ser
85 90 95
Ser Ala Ser Ser Ser Gly Glu Ser Gly Ser Ser Trp Ala Ala Ala Gly
100 105 110
Gly Arg Met Ser Ala Gly Gly Ala Ser Thr Gly Glu Val Tyr Glu Gln
115 120 125
Ala Pro Arg Leu Arg Leu Ala Gln Ser Val Ala Ala Arg Arg Arg Asp
130 135 140
Pro Thr
145
<210>18
<211>288
<212>PRT
<213>Sorangium cellulosum
<400>18
Val Thr Val Ser Ser Met Pro Arg Ser Trp Ser Ser Arg Val Arg Thr
1 5 10 15
Val Val Thr Ala Leu Gly Cys Ala Arg Arg Leu Ser Gly Ser Ile Ser
20 25 30
Arg Leu Arg Arg His Pro Glu Ala Gly Arg Ala Pro Arg Ser Arg Leu
35 40 45
Arg Ala Trp Arg Arg Leu Pro Gln His Ile Ser Ser Pro Trp Arg His
50 55 60
Leu Pro Pro Gly Ala Arg Val Gly Thr Ser Cys Pro Ala Asp Arg Arg
65 70 75 80
Ile Leu Pro Ser His Arg Thr Ala Asp Leu Gly Thr Ser Gly Gly Thr
85 90 95
Leu Val Ala Arg Met Ser Gly His Val Ala Arg Asn Pro His Ala Ala
100 105 110
Val Leu Val Gly Asp Gly Ser Ala Arg Gly Arg Arg Arg Leu Ser Asn
115 120 125
Arg Arg Ala Glu Arg Arg Val Ser Asp Val Thr Cys Arg Glu Gly Gly
130 135 140
Glu Ala Met Gln Lys Ile Ala Gly Lys Leu Val Val Gly Leu Ile Ser
145 150 155 160
Val Ser Gly Met Ser Leu Leu Ala Ala Cys Gly Gly Glu Lys Arg Ser
165 170 175
Gly Gly Glu Ala Gln Thr Pro Gly Gly Ala Gln Gly Glu Ala Pro Val
180 185 190
Pro Val Gly Ser Ala Val Asp Ser Ile Val Ala Ala Arg Cys Asp Arg
195 200 205
Glu Ala Arg Cys Asn Asn Ile Gly Gln Asp Arg Glu Tyr Ser Ser Lys
210 215 220
Asp Ala Cys Ser Asn Lys Ile Arg Ser Glu Trp Arg Asp Glu Leu Thr
225 230 235 240
Phe Gly Glu Cys Pro Gly Gly Ile Asp Ala Lys Gln Leu Asn Glu Cys
245 250 255
Leu Glu Gly Ile Arg Asn Glu Gly Cys Gly Asn Pro Phe Asp Thr Leu
260 265 270
Gly Arg Val Val Ala Cys Arg Ser Ser Asp Leu Cys Arg Asp Ala Arg
275 280 285
<210>19
<211>288
<212>PRT
<213>Sorangium cellulosum
<400>19
Val Thr Val Ser Ser Met Pro Arg Ser Trp Ser Ser Arg Val Arg Thr
1 5 10 15
Val Val Thr Ala Leu Gly Cys Ala Arg Arg Leu Ser Gly Ser Ile Ser
20 25 30
Arg Leu Arg Arg His Pro Glu Ala Gly Arg Ala Pro Arg Ser Arg Leu
35 40 45
Arg Ala Trp Arg Arg Leu Pro Gln His Ile Ser Ser Pro Trp Arg His
50 55 60
Leu Pro Pro Gly Ala Arg Val Gly Thr Ser Cys Pro Ala Asp Arg Arg
65 70 75 80
Ile Leu Pro Ser His Arg Thr Ala Asp Leu Gly Thr Ser Gly Gly Thr
85 90 95
Leu Val Ala Arg Met Ser Gly His Val Ala Arg Asn Pro His Ala Ala
100 105 110
Val Leu Val Gly Asp Gly Ser Ala Arg Gly Arg Arg Arg Leu Ser Asn
115 120 125
Arg Arg Ala Glu Arg Arg Val Ser Asp Val Thr Cys Arg Glu Gly Gly
130 135 140
Glu Ala Met Gln Lys Ile Ala Gly Lys Leu Val Val Gly Leu Ile Ser
145 150 155 160
Val Ser Gly Met Ser Leu Leu Ala Ala Cys Gly Gly Glu Lys Arg Ser
165 170 175
Gly Gly Glu Ala Gln Thr Pro Gly Gly Ala Gln Gly Glu Ala Pro Val
180 185 190
Pro Val Gly Ser Ala Val Asp Ser Ile Val Ala Ala Arg Cys Asp Arg
195 200 205
Glu Ala Arg Cys Asn Asn Ile Gly Gln Asp Arg Glu Tyr Ser Ser Lys
210 215 220
Asp Ala Cys Ser Asn Lys Ile Arg Ser Glu Trp Arg Asp Glu Leu Thr
225 230 235 240
Phe Gly Glu Cys Pro Gly Gly Ile Asp Ala Lys Gln Leu Asn Glu Cys
245 250 255
Leu Glu Gly Ile Arg Asn Glu Gly Cys Gly Asn Pro Phe Asp Thr Leu
260 265 270
Gly Arg Val Val Ala Cys Arg Ser Ser Asp Leu Cys Arg Asp Ala Arg
275 280 285
<210>20
<211>155
<212>PRT
<213>Sorangium cellulosum
<400>20
Met Asp Pro Arg Ala Arg Arg Glu Lys Arg Pro Ser Leu Leu Asp Ser
1 5 10 15
Arg Gly Arg Gln Pro Lys Arg Ser Gln Gln Gly Gly His Met Glu Lys
20 25 30
Pro Ile Gly Arg Thr Arg Trp Ala Ile Ala Glu Gly Tyr Ile Pro Gly
35 40 45
Arg Ser Asn Gly Pro Glu Pro Gln Met Thr Ser His Glu Thr Ala Cys
50 55 60
Leu Leu Asn Ala Ser Asp Arg Asp Ala Gln Val Ala Ile Thr Val Tyr
65 70 75 80
Phe Ser Asp Arg Asp Pro Ala Gly Pro Tyr Arg Val Thr Val Pro Ala
85 90 95
Arg Arg Thr Arg His Val Arg Phe Asn Asp Leu Thr Glu Pro Glu Pro
100 105 110
Ile Pro Arg Asp Thr Asp Tyr Ala Ser Val Ile Glu Ser Asp Val Pro
115 120 125
Ile Val Val Gln His Thr Arg Leu Asp Ser Arg Gln Ala Glu Asn Ala
130 135 140
Leu Ile Ser Thr Ile Ala Tyr Thr Asp Arg Glu
145 150 155
<210>21
<211>156
<212>PRT
<213>Sorangium cellulosum
<400>21
Val Arg Arg Ser Arg Trp Gln Met Lys His Val Asp Thr Gly Arg Arg
1 5 10 15
Val Gly Arg Arg Ile Gly Leu Thr Leu Gly Leu Leu Ala Ser Met Ala
20 25 30
Leu Ala Gly Cys Gly Gly Pro Ser Glu Lys Ile Val Gln Gly Thr Arg
35 40 45
Leu Ala Pro Gly Ala Asp Ala His Val Ala Ala Asp Val Asp Pro Asp
50 55 60
Ala Ala Thr Thr Arg Leu Ala Val Asp Val Val His Leu Ser Pro Pro
65 70 75 80
Glu Arg Ile Glu Ala Gly Ser Glu Arg Phe Val Val Trp Gln Arg Pro
85 90 95
Ser Ser Glu Ser Pro Trp Gln Arg Val Gly Val Leu Asp Tyr Asn Ala
100 105 110
Ala Ser Arg Arg Gly Lys Leu Ala Glu Thr Thr Val Pro His Ala Asn
115 120 125
Phe Glu Leu Leu Ile Thr Val Glu Lys Gln Ser Ser Pro Gln Ser Pro
130 135 140
Ser Ser Ala Ala Val Ile Gly Pro Thr Ser Val Gly
145 150 155
<210>22
<211>305
<212>PRT
<213>Sorangium cellulosum
<400>22
Met Glu Lys Glu Ser Arg Ile Ala Ile Tyr Gly Ala Ile Ala Ala Asn
1 5 10 15
Val Ala Ile Ala Ala Val Lys Phe Ile Ala Ala Ala Val Thr Gly Ser
20 25 30
Ser Ala Met Leu Ser Glu Gly Val His Ser Leu Val Asp Thr Ala Asp
35 40 45
Gly Leu Leu Leu Leu Leu Gly Lys His Arg Ser Ala Arg Pro Pro Asp
50 55 60
Ala Glu His Pro Phe Gly His Gly Lys Glu Leu Tyr Phe Trp Thr Leu
65 70 75 80
Ile Val Ala Ile Met Ile Phe Ala Ala Gly Gly Gly Val Ser Ile Tyr
85 90 95
Glu Gly Ile Leu His Leu Leu His Pro Arg Gln Ile Glu Asp Pro Thr
100 105 110
Trp Asn Tyr Val Val Leu Gly Ala Ala Ala Val Phe Glu Gly Thr Ser
115 120 125
Leu Ile Ile Ser Ile His Glu Phe Lys Lys Lys Asp Gly Gln Gly Tyr
130 135 140
Leu Ala Ala Met Arg Ser Ser Lys Asp Pro Thr Thr Phe Thr Ile Val
145 150 155 160
Leu Glu Asp Ser Ala Ala Leu Ala Gly Leu Thr Ile Ala Phe Leu Gly
165 170 175
Val Trp Leu Gly His Arg Leu Gly Asn Pro Tyr Leu Asp Gly Ala Ala
180 185 190
Ser Ile Gly Ile Gly Leu Val Leu Ala Ala Val Ala Val Phe Leu Ala
195 200 205
Ser Gln Ser Arg Gly Leu Leu Val Gly Glu Ser Ala Asp Arg Glu Leu
210 215 220
Leu Ala Ala Ile Arg Ala Leu Ala Ser Ala Asp Pro Gly Val Ser Ala
225 230 235 240
Val Gly Arg Pro Leu Thr Met His Phe Gly Pro His Glu Val Leu Val
245 250 255
Val Leu Arg Ile Glu Phe Asp Ala Ala Leu Thr Ala Ser Gly Val Ala
260 265 270
Glu Ala Ile Glu Arg Ile Glu Thr Arg Ile Arg Ser Glu Arg Pro Asp
275 280 285
Val Lys His Ile Tyr Val Glu Ala Arg Ser Leu His Gln Arg Ala Arg
290 295 300
Ala
305
<210>23
<211>135
<212>PRT
<213>Sorangium cellulosum
<400>23
Val Gln Thr Ser Ser Phe Asp Ala Arg Tyr Ala Gly Cys Lys Ser Ser
1 5 10 15
Arg Arg Ile Ala Arg Ser Gly Ser Ala Gly Ala Arg Ala Gly Arg Ala
20 25 30
His Glu Gly Ala Ala Ser Ala Gly Phe Glu Gly Gly Asp Val Met Arg
35 40 45
Lys Ala Arg Ala His Gly Ala Met Leu Gly Gly Arg Asp Asp Gly Trp
50 55 60
Arg Arg Gly Leu Pro Gly Ala Gly Ala Leu Arg Ala Ala Leu Gln Arg
65 70 75 80
Gly Arg Ser Arg Asp Leu Ala Arg Arg Arg Leu Ile Ala Ser Val Ser
85 90 95
Leu Ala Gly Gly Ala Ser Met Ala Val Val Ser Leu Phe Gln Leu Gly
100 105 110
Ile Ile Glu Arg Leu Pro Asp Pro Pro Leu Pro Gly Phe Asp Ser Ala
115 120 125
Lys Val Thr Ser Ser Asp Ile
130 135
<210>24
<211>19
<212>DNA
<213>Artificial sequence
<220>
<223>Description of artificial sequence: universal reverse primer
<400>24
ggaaacagct atgaccatg 19
<210>25
<211>17
<212>DNA
<213>Artificial sequence
<220>
<223>Description of artificial sequence: universal forward primer
<400>25
gtaaaacgac ggccagt 17
<210>26
<211>28
<212>DNA
<213>Artificial sequence
<220>
<223>Description of artificial sequence: PCR primer NH24 end "B"
<400>26
gtgactggcg cctggaatct gcatgagc 28
<210>27
<211>28
<212>DNA
<213>Artificial sequence
<220>
<223>Description of artificial sequence: PCR primer NH2 end "A"
<400>27
agcgggagct tgctagacat tctgtttc 28
<210>28
<211>24
<212>DNA
<213>Artificial sequence
<220>
<223>Description of artificial sequence: PCR primer NH2 end "B"
<400>28
gacgcgcctc gggcagcgcc ccaa 24
<210>29
<211>25
<212>DNA
<213>Artificial sequence
<220>
<223>Description of artificial sequence: PCR primer pEPO15-NH6 end "B"
<400>29
caccgaagcg tcgatctggt ccatc 25
<210>30
<211>25
<212>DNA
<213>Artificial sequence
<220>
<223>Description of artificial sequence: PCR primer pEPO15H2.7 end "A"
<400>30
cggtcagatc gacgacgggc tttcc 25
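The listing above follows the WIPO ST.25 layout: <210> is the SEQ ID number, <211> the declared length, <212> the molecule type (PRT or DNA), <213> the organism, and <400> introduces the sequence, with residue-position rulers printed under each protein line. As a minimal sketch that is not part of the patent, the Python below reads such a listing, converts three-letter amino acid codes to one-letter codes, and compares each declared <211> length with the residues actually present. The function names `parse_st25` and `length_mismatches` are hypothetical, and the parser assumes the numeric tags use plain ASCII angle brackets. A check of this kind can flag extraction damage: two codes fused into one token or a misspelled code raises an error, while dropped or duplicated lines show up as length mismatches.

```python
import re

# Standard three-letter -> one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Numeric identifier tags such as "<210>8" or "<400>24".
TAG = re.compile(r"^<(\d{3})>\s*(.*)$")


def parse_st25(text):
    """Parse a simplified ST.25-style listing into {seq_id: record}.

    Each record holds the declared length (<211>), the molecule type
    (<212>) and the sequence read after <400>. Lines made only of
    digits (the position rulers under protein lines) are skipped.
    Unknown three-letter codes raise ValueError.
    """
    records = {}
    current = None
    in_sequence = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG.match(line)
        if match:
            tag, value = match.groups()
            if tag == "210":
                current = {"id": int(value), "declared": None,
                           "type": None, "seq": ""}
                records[current["id"]] = current
                in_sequence = False
            elif current is None:
                continue  # tag seen before any <210>; ignore it
            elif tag == "211":
                current["declared"] = int(value)
            elif tag == "212":
                current["type"] = value
            elif tag == "400":
                in_sequence = True
            continue  # other tags (<213>, <220>, <223>) carry free text
        if current is None or not in_sequence:
            continue
        tokens = line.split()
        if all(tok.isdigit() for tok in tokens):
            continue  # position ruler, e.g. "1 5 10 15"
        if current["type"] == "PRT":
            for tok in tokens:
                if tok not in THREE_TO_ONE:
                    raise ValueError(
                        f"SEQ ID NO:{current['id']}: unknown residue code {tok!r}")
                current["seq"] += THREE_TO_ONE[tok]
        else:
            # Nucleotide lines end with a running base count; drop it.
            current["seq"] += "".join(tok for tok in tokens if not tok.isdigit())
    return records


def length_mismatches(records):
    """Return (seq_id, declared, observed) where <211> disagrees with the sequence."""
    return [
        (sid, rec["declared"], len(rec["seq"]))
        for sid, rec in sorted(records.items())
        if rec["declared"] != len(rec["seq"])
    ]
```

For example, feeding it the two primer entries SEQ ID NO:24 and 25 yields a 19-nt and a 17-nt sequence and an empty mismatch list, since both match their declared <211> lengths.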

Claims (27)

1. An isolated nucleic acid molecule comprising a nucleotide sequence encoding at least one polypeptide involved in epothilone biosynthesis, wherein the complement of said nucleotide sequence hybridizes, under the hybridization conditions defined below, to a nucleotide sequence selected from the group consisting of nucleotides 21746-43519 of SEQ ID NO:1 and nucleotides 21746-54920 of SEQ ID NO:1; said hybridization conditions being hybridization at 50 ℃ in 7% sodium lauryl sulfate, 0.5 M NaPO4 pH 7.0, 1 mM EDTA, and washing at 50 ℃ in 2× SSC, 1% SDS.
2. The isolated nucleic acid molecule of claim 1, comprising a nucleotide sequence whose complement hybridizes, under the hybridization conditions defined below, to a nucleotide sequence selected from the group consisting of nucleotides 21746-43519 of SEQ ID NO:1 and nucleotides 21746-54920 of SEQ ID NO:1; said hybridization conditions being hybridization at 65 ℃ for 36 hours, followed by three 20-minute washes at 65 ℃ in 0.1× SSC, 0.5% SDS.
3. The isolated nucleic acid molecule of claim 1 or 2, comprising a nucleotide sequence encoding a polypeptide that comprises the amino acid sequence of SEQ ID NO:5.
4. The isolated nucleic acid molecule of claim 1 or 2, comprising a nucleotide sequence selected from the group consisting of nucleotides 21746-43519 of SEQ ID NO:1 and nucleotides 21746-54920 of SEQ ID NO:1.
5. An isolated nucleic acid molecule comprising a nucleotide sequence involved in epothilone biosynthesis, wherein the complement of said nucleotide sequence hybridizes, under the hybridization conditions defined below, to nucleotides 20779-22991 of SEQ ID NO:1; said hybridization conditions being hybridization at 50 ℃ in 7% sodium lauryl sulfate, 0.5 M NaPO4 pH 7.0, 1 mM EDTA, and washing at 50 ℃ in 2× SSC, 1% SDS.
6. The isolated nucleic acid molecule of claim 5, comprising a nucleotide sequence whose complement hybridizes, under the hybridization conditions defined below, to nucleotides 20779-22991 of SEQ ID NO:1; said hybridization conditions being hybridization at 65 ℃ for 36 hours, followed by three 20-minute washes at 65 ℃ in 0.1× SSC, 0.5% SDS.
7. The isolated nucleic acid molecule of claim 5 or 6, comprising a nucleotide sequence encoding a polypeptide that comprises the amino acid sequence of SEQ ID NO:5.
8. The isolated nucleic acid molecule of claim 5 or 6, comprising nucleotides 20779-22991 of SEQ ID NO:1.
9. The isolated nucleic acid molecule of any one of claims 1-2 and 5-6, wherein said nucleotide sequence is isolated from a myxobacterium.
10. The isolated nucleic acid molecule of claim 9, wherein said myxobacterium is Sorangium cellulosum.
11. A chimeric gene comprising the nucleic acid molecule of any one of claims 1-10 operably linked to a heterologous promoter sequence.
12. A recombinant vector comprising the chimeric gene of claim 11.
13. A recombinant host cell comprising the chimeric gene of claim 11.
14. The recombinant host cell of claim 13, which is a bacterium.
15. The recombinant host cell of claim 14, which is an actinomycete.
16. The recombinant host cell of claim 15, which is a Streptomyces.
17. A BAC clone comprising the nucleic acid molecule of any one of claims 1-10.
18. The BAC clone of claim 17, which is pEPO15.
19. the method for heterogenous expression epothilone in recombinant host comprises:
A) mosaic gene with claim 11 imports the host; With
B) under the condition that is fit to the synthetic epothilone of host living beings, cultivate the host.
20. produce the method for epothilone, comprising:
A) in recombinant host, express epothilone with the method for claim 19; With
B) from recombinant host, extract epothilone.
21. An isolated polypeptide involved in epothilone biosynthesis, encoded by a nucleotide sequence whose complement hybridizes, under the hybridization conditions defined below, to nucleotides 21746-43519 of SEQ ID NO:1; said hybridization conditions being hybridization at 50 ℃ in 7% sodium lauryl sulfate, 0.5 M NaPO4 pH 7.0, 1 mM EDTA, and washing at 50 ℃ in 2× SSC, 1% SDS.
22. The isolated polypeptide of claim 21, wherein the complement of said nucleotide sequence hybridizes, under the hybridization conditions defined below, to nucleotides 21746-43519 of SEQ ID NO:1; said hybridization conditions being hybridization at 65 ℃ for 36 hours, followed by three 20-minute washes at 65 ℃ in 0.1× SSC, 0.5% SDS.
23. The isolated polypeptide of claim 21 or 22, comprising the amino acid sequence of SEQ ID NO:5.
24. A recombinant host cell comprising the recombinantly expressed polypeptide of claim 21, 22 or 23.
25. The recombinant host cell of claim 24, which is a bacterium.
26. The recombinant host cell of claim 25, which is an actinomycete.
27. The recombinant host cell of claim 26, which is a Streptomyces.
CNB2004100637938A 1998-06-18 1999-06-16 Genes for the biosynthesis of epothilones Expired - Fee Related CN100374566C (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US9950498A 1998-06-18 1998-06-18
US09/099,504 1998-06-18
US60/101,631 1998-09-24
US60/118,906 1999-02-05

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CNB998074217A Division CN100374565C (en) 1998-06-18 1999-06-16 Genes for biosynthesis of EPOTHILONE

Publications (2)

Publication Number Publication Date
CN1715414A CN1715414A (en) 2006-01-04
CN100374566C true CN100374566C (en) 2008-03-12

Family

ID=35821574

Family Applications (2)

Application Number Title Priority Date Filing Date
CNA2007100890997A Pending CN101161817A (en) 1998-06-18 1999-06-16 Genes for the biosynthesis of epothilones
CNB2004100637938A Expired - Fee Related CN100374566C (en) 1998-06-18 1999-06-16 Genes for the biosynthesis of epothilones

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CNA2007100890997A Pending CN101161817A (en) 1998-06-18 1999-06-16 Genes for the biosynthesis of epothilones

Country Status (1)

Country Link
CN (2) CN101161817A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102532287B (en) * 2010-12-13 2013-08-14 首都师范大学 Stress-resistant protein PpLEA3-17 of bryophyte as well as encoding gene and application thereof
CN114181922B (en) * 2021-12-10 2023-06-23 安徽医科大学 Recombinant esterase, gene, recombinant bacterium and application of recombinant esterase and recombinant bacterium in degradation of phthalate

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Sorangium cellulosum (myxobacterium) gene cluster for the biosynthesis of the macrolide antibiotic soraphen A: cloning, characterization, and homology to polyketide synthase genes from actinomycetes. Thomas Schupp et al. Journal of Bacteriology, Vol. 177, No. 13, 1995 *
Towards the synthesis of epothilone A: enantioselective preparation of the thiazole side chain and macrocyclic ring closure. Richard E. Taylor et al. Tetrahedron Letters, Vol. 38, No. 12, 1997 *

Also Published As

Publication number Publication date
CN1715414A (en) 2006-01-04
CN101161817A (en) 2008-04-16

Similar Documents

Publication Publication Date Title
US6346404B1 (en) Genes for the biosynthesis of epothilones
CN100374565C (en) Genes for biosynthesis of EPOTHILONE
AU753546B2 (en) Epothilone C, D, E and F, production process, and their use as cytostatic as well as phytosanitary agents
CN1227362C (en) Biosynthetic genes for spinosyn insecticide production
DK2271666T3 (en) NRPS-PKS GROUP AND ITS MANIPULATION AND APPLICABILITY
TWI291464B (en) Methods for the preparation, isolation and purification of epothilone B, and X-ray crystal structures of epothilone B
KR20100049580A (en) Thiopeptide precursor protein, gene encoding it and uses thereof
US20030180760A1 (en) Compositions and methods for hydroxylating epothilones
CN100374566C (en) Genes for the biosynthesis of epothilones
JP2023012549A (en) Modified streptomyces fungicidicus isolates and use thereof
MXPA00012342A (en) Genes for the biosynthesis of epothilones
CZ20004693A3 (en) Isolated nucleic acid encoding polypeptide participating in biosynthesis of epothilone, chimeric gene, vector and host cells containing such nucleic acid
KR20050050146A (en) Genes and proteins for the biosynthesis of the glycopeptide antibiotic a40926
RU2265054C2 (en) Recombinant cell-host (variants) and bac clone
RU2234532C2 (en) Nucleic acid (variants), it using for expression of epotilones, polypeptide (variants), escherichia coli microorganism clone
CN100359014C (en) Novel epothilones compound and its preparation method and application
CN1311820A (en) BASB027 protein and genes from i(Moraxella Catarrhalis), antigens, antibodies, and uses
Julien et al. Genetic Engineering of Myxobacterial Natural Product Biosynthetic Genes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: NOVARTIS CO., LTD.

Free format text: FORMER NAME: NOVARTIS AG

CP01 Change in the name or title of a patent holder

Address after: Basel

Patentee after: Novartis Ag

Address before: Basel

Patentee before: Novartis AG

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080312

Termination date: 20140616

EXPY Termination of patent right or utility model