MXPA01005097A - Recombinant methods and materials for producing epothilone and epothilone derivatives - Google Patents

Recombinant methods and materials for producing epothilone and epothilone derivatives

Info

Publication number
MXPA01005097A
Authority
MX
Mexico
Prior art keywords
epothilone
pks
domain
gene
cells
Prior art date
Application number
MXPA/A/2001/005097A
Other languages
Spanish (es)
Inventor
Chaitan Khosla
Bryan Julien
Leonard Katz
Li Tang
Rainer Ziermann
Original Assignee
Kosan Biosciences Inc
Application filed by Kosan Biosciences Inc

Abstract

Recombinant nucleic acids that encode all or a portion of the epothilone polyketide synthase (PKS) are used to express recombinant PKS genes in host cells for the production of epothilones, epothilone derivatives, and polyketides that are useful as cancer chemotherapeutics, fungicides, and immunosuppressants.

Description

RECOMBINANT METHODS AND MATERIALS FOR PRODUCING EPOTHILONE AND EPOTHILONE DERIVATIVES
REFERENCE TO GOVERNMENT SUPPORT This invention was supported in part by SBIR grant 1 R43-CA79228-01. The U.S. government has certain rights in this invention.
FIELD OF THE INVENTION The present invention provides recombinant methods and materials for producing epothilone and epothilone derivatives. The invention relates to the fields of agriculture, chemistry, medicinal chemistry, medicine, molecular biology and pharmacology.
BACKGROUND OF THE INVENTION The epothilones were first identified by Gerhard Hofle and colleagues at the National Biotechnology Research Institute as an antifungal activity extracted from the myxobacterium Sorangium cellulosum (see K. Gerth et al., 1996, J. Antibiotics 49: 560-563 and German Patent No. DE 41 38 042). The epothilones were subsequently found to have activity in a tubulin polymerization assay used to identify antitumor agents (see D. Bollag et al., 1995, Cancer Res. 55: 2325-2333) and have since been studied extensively as potential antitumor agents for the treatment of cancer. The chemical structure of the epothilones produced by Sorangium cellulosum strain So ce 90 was described in Hofle et al., 1996, Epothilone A and B - novel 16-membered macrolides with cytotoxic activity: isolation, crystal structure, and conformation in solution, Angew. Chem. Int. Ed. Engl. 35 (13/14): 1567-1569, incorporated herein by reference. The strain was found to produce two epothilone compounds, designated A (R = H) and B (R = CH3), as shown below, which show broad cytotoxic activity against eukaryotic cells and noticeable activity and selectivity against breast and colon tumor cell lines.
The desoxy counterparts of epothilones A and B, also known as epothilones C (R = H) and D (R = CH3), are known to be less cytotoxic, and the structures of these epothilones are shown below.
Two other naturally occurring epothilones have been described. These are epothilones E and F, in which the methyl side chain of the thiazole moiety of epothilones A and B has been hydroxylated to yield epothilones E and F, respectively. Because of the potential use of the epothilones as anticancer agents, and because of the low levels of epothilone produced by the native So ce 90 strain, a number of research teams undertook to synthesize the epothilones. This effort has been successful (see Balog et al., 1996, Total synthesis of (-)-epothilone A, Angew. Chem. Int. Ed. Engl. 35 (23/24): 2801-2803; Su et al., 1997, Total synthesis of (-)-epothilone B: an extension of the Suzuki coupling method and insights into structure-activity relationships of the epothilones, Angew. Chem. Int. Ed. Engl. 36 (7): 757-759; Meng et al., 1997, Total synthesis of epothilones A and B, JACS 119 (42): 10073-10092; and Balog et al., 1998, A novel aldol condensation with 2-methyl-4-pentenal and its application to an improved total synthesis of epothilone B, Angew. Chem. Int. Ed. Engl. 37 (19): 2675-2678, each of which is incorporated herein by reference). Despite the success of these efforts, the chemical synthesis of the epothilones is tedious, time-consuming and expensive. Indeed, the methods have been characterized as impractical for the large-scale pharmaceutical development of an epothilone. A number of epothilone derivatives, as well as epothilones A-D, have been studied in vitro and in vivo (see Su et al., 1997, Structure-activity relationships of the epothilones and the first in vivo comparison with paclitaxel, Angew. Chem. Int. Ed. Engl. 36 (19): 2093-2096; and Chou et al., Aug. 1998, Desoxyepothilone B: an efficacious microtubule-targeted antitumor agent with a promising in vivo profile relative to epothilone B, Proc. Natl. Acad. Sci. USA 95: 9642-9647, each of which is incorporated herein by reference).
Additional epothilone derivatives and methods for synthesizing epothilones and epothilone derivatives are disclosed in PCT Patent Publication Nos. 99/54330, 99/54319, 99/54318, 99/43653, 99/43320, 99/42602, 99/40047, 99/27890, 99/07692, 99/02514, 99/01124, 98/25929, 98/22461, 98/08849 and 97/19086; U.S. Patent No. 5,969,145; and German Patent Publication No. DE 41 38 042, each of which is incorporated herein by reference. There remains a need for economical means to produce not only the naturally occurring epothilones but also derivatives or precursors thereof, as well as new epothilone derivatives with improved properties. There also remains a need for a host cell that produces epothilones or epothilone derivatives and is easier to handle and ferment than the natural producer Sorangium cellulosum. The present invention meets these and other needs.
BRIEF DESCRIPTION OF THE INVENTION In one embodiment, the present invention provides recombinant DNA compounds that encode the proteins required to produce epothilones A, B, C and D. The present invention also provides recombinant DNA compounds that encode portions of these proteins. The present invention also provides recombinant DNA compounds that encode a hybrid protein, which hybrid protein includes all or a portion of a protein involved in epothilone biosynthesis and all or a portion of a protein involved in the biosynthesis of another polyketide or non-ribosomal peptide. In a preferred embodiment, the recombinant DNA compounds of the invention are recombinant DNA cloning vectors that facilitate manipulation of the coding sequences, or recombinant DNA expression vectors that direct the expression of one or more of the proteins of the invention in recombinant host cells. In another embodiment, the present invention provides recombinant host cells that produce one or more of the epothilones or epothilone derivatives at higher levels than those produced in the naturally occurring organisms that produce epothilones. In another embodiment, the invention provides host cells that produce mixtures of epothilones that are less complex than the mixtures produced by naturally occurring host cells. In another embodiment, the present invention provides recombinant non-Sorangium host cells that produce an epothilone or epothilone derivative. In a preferred embodiment, the host cells of the invention produce less complex mixtures of epothilones than naturally occurring epothilone-producing cells do. Naturally occurring epothilone-producing cells typically produce a mixture of epothilones A, B, C, D, E and F. The table below summarizes the epothilones produced in different illustrative host cells of the invention.
Cell type   Epothilones produced   Epothilones not produced
1           A, B, C, D, E, F       none
2           A, C, E                B, D, F
3           B, D, F                A, C, E
4           A, B, C, D             E, F
5           A, C                   B, D, E, F
6           C                      A, B, D, E, F
7           B, D                   A, C, E, F
8           D                      A, B, C, E, F
In addition, cell types can be constructed that produce only the newly discovered epothilones G and H, as discussed below, or one or the other of G and H, or both in combination with the downstream epothilones. Thus, it is understood, based on the present invention, that the biosynthetic pathways relating the naturally occurring epothilones are, respectively, G → C → A → E and H → D → B → F. Appropriate enzymes can also convert members of each pathway to the corresponding member of the other. Thus, the recombinant cells of the invention also include host cells that produce only one desired epothilone or epothilone derivative. In another embodiment, the invention provides Sorangium host cells that have been genetically modified to produce epothilones at higher levels than those observed in naturally occurring host cells, or as less complex mixtures of epothilones than those produced by naturally occurring host cells, or to produce an epothilone derivative that does not occur in nature. In a preferred embodiment, the host cell produces the epothilones at a level equal to or greater than 20 mg/L. In another embodiment, the recombinant host cells of the invention are host cells other than Sorangium cellulosum that have been genetically modified to produce an epothilone or an epothilone derivative. In a preferred embodiment, the host cell produces the epothilones at a level equal to or greater than 20 mg/L. In a more preferred embodiment, the recombinant host cells are Myxococcus, Pseudomonas or Streptomyces host cells that produce epothilones or an epothilone derivative at a level equal to or greater than 20 mg/L. In another embodiment, the present invention provides novel compounds useful in agriculture, veterinary practice and medicine.
In one embodiment, the compounds are useful as fungicides. In another embodiment, the compounds are useful in cancer chemotherapy. In a preferred embodiment, the compound is an epothilone derivative that is at least as potent against tumor cells as epothilone B or D. In another embodiment, the compounds are useful as immunosuppressants. In another embodiment, the compounds are useful in the manufacture of another compound. In a preferred embodiment, the compounds are formulated in a mixture or solution for administration to a human or animal. These and other embodiments of the invention are described in more detail in the description, examples and claims set forth below.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows a restriction site map of the Sorangium cellulosum genomic DNA inserts in four overlapping cosmid clones (designated 8A3, 1A2, 4 and 85 and corresponding to pKOS35-70.8A3, pKOS35-70.1A2, pKOS35-70.4 and pKOS35-79.85, respectively) spanning the epothilone gene cluster. A functional map of the epothilone gene cluster is also shown. The loading domain (loading, epoA), the non-ribosomal peptide synthetase module (NRPS, module 1, epoB) and each of the remaining eight modules (modules 2 to 9, epoC, epoD, epoE and epoF) of the epothilone synthase genes are shown, as is the location of the epoK gene, which encodes a cytochrome P-450-like epoxidation enzyme.
Figure 2 shows a number of precursor compounds, N-acylcysteamine thioester derivatives, that can be supplied to an epothilone PKS of the invention in which the NRPS-like module 1 or the KS domain of module 2 has been inactivated, to produce a novel epothilone derivative. A general synthetic procedure for making such compounds is also shown.
Figure 3 shows restriction site and function maps of plasmids pKOS35-82.1 and pKOS35-82.2.
Figure 4 shows restriction site and function maps of plasmids pKOS35-154 and pKOS90-22.
Figure 5 shows a schematic of the protocol for introducing the epothilone PKS and modification enzyme genes into the chromosome of a Myxococcus xanthus host cell, as described in Example 3.
Figure 6 shows restriction site and function maps of plasmids pKOS039-124 and pKOS039-124R.
Figure 7 shows a restriction site and function map of plasmid pKOS039-126R.
Figure 8 shows a restriction site and function map of plasmid pKOS039-141.
Figure 9 shows a restriction site and function map of plasmid pKOS045-12.
Figure 10 shows a map of BACII pELO.
Figure 11 shows plasmid pCK7.
Figure 12 shows plasmid pKOS010-153, a derivative of pSET152.
DETAILED DESCRIPTION OF THE INVENTION The present invention provides, in recombinant and isolated form, the genes and proteins that synthesize the epothilones in Sorangium cellulosum. As used herein, the term "recombinant" refers to a compound or composition produced by human intervention, typically by specific and directed manipulation of a gene or portion thereof. The term "isolated" refers to a compound or composition in a preparation that is substantially free of contaminating or undesired materials or, with respect to a compound or composition found in nature, substantially free of the materials with which that compound or composition is associated in its natural state. The epothilones (epothilones A, B, C, D, E and F) and structurally related compounds (epothilone derivatives) are potent cytotoxic agents specific for eukaryotic cells. These compounds have application as antifungals, cancer chemotherapeutics and immunosuppressants. The epothilones are produced at very low levels in the naturally occurring Sorangium cellulosum cells in which they have been identified. Moreover, S. cellulosum grows very slowly, and fermentation of S. cellulosum strains is difficult and time-consuming. One important benefit conferred by the present invention is the ability simply to produce an epothilone or epothilone derivative in a non-S. cellulosum host cell. Another advantage of the present invention is the ability to produce the epothilones at higher levels and in greater amounts in the recombinant host cells provided by the invention than is possible in the naturally occurring epothilone-producing cells. Yet another advantage is the ability to produce an epothilone derivative in a recombinant host cell. The recombinant DNA encoding the epothilone biosynthetic genes was isolated by probing a genomic DNA library of Sorangium cellulosum SMP44. As described more fully in Example 1 below, the library was prepared by partially digesting S.
cellulosum genomic DNA with the restriction enzyme Sau3AI and inserting the DNA fragments generated into BamHI-digested Supercos cosmid DNA (Stratagene). Cosmid clones containing epothilone gene sequences were identified by probing with DNA probes specific for PKS gene sequences and re-probing with secondary probes comprising nucleotide sequences identified with the primary probes. Four overlapping cosmid clones were identified by this effort. These four cosmids were deposited with the American Type Culture Collection (ATCC), Manassas, VA, USA, under the terms of the Budapest Treaty, and were assigned ATCC accession numbers. The clones (and accession numbers) were designated cosmids pKOS35-70.1A2 (ATCC 203782), pKOS35-70.4 (ATCC 203781), pKOS35-70.8A3 (ATCC 203783) and pKOS35-79.85 (ATCC 203780). The cosmids contain insert DNA that completely spans the epothilone gene cluster. A restriction site map of these cosmids is shown in Figure 1. Figure 1 also provides a function map of the epothilone gene cluster, showing the location of the six epothilone PKS genes and the epoK P450 epoxidase gene. The epothilone PKS genes, like other PKS genes, are composed of coding sequences organized to encode a loading domain, a number of modules, and a thioesterase domain. As described more fully below, each of these domains and modules corresponds to a polypeptide with one or more specific functions. Generally, the loading domain is responsible for binding the first building block used to synthesize the polyketide and transferring it to the first module. The building blocks used to form complex polyketides are typically acylthioesters, most commonly acetyl, propionyl, malonyl, methylmalonyl and ethylmalonyl CoA. Other building blocks include amino acid-like acylthioesters. PKSs catalyze the biosynthesis of polyketides through repeated decarboxylative Claisen condensations between the acylthioester building blocks.
Each module is responsible for binding a building block, performing one or more functions on that building block, and transferring the resulting compound to the next module. The next module, in turn, is responsible for attaching the next building block and transferring the growing compound to the next module, until synthesis is complete. At that point, an enzymatic thioesterase (TE) activity cleaves the polyketide from the PKS. Such modular organization is characteristic of the class of PKS enzymes that synthesize complex polyketides and is well known in the art. Recombinant methods for manipulating modular PKS genes are described in U.S. Patent Nos. 5,672,491; 5,712,146; 5,830,750 and 5,843,718 and in PCT Patent Publication Nos. 98/49315 and 97/02358, each of which is incorporated herein by reference. The polyketide known as 6-deoxyerythronolide B (6-dEB) is synthesized by a PKS that is a prototypical modular PKS enzyme. The genes, known as eryAI, eryAII and eryAIII, that code for the multisubunit protein known as deoxyerythronolide B synthase or DEBS (each subunit is known as DEBS1, DEBS2 or DEBS3), which synthesizes 6-dEB, are described in U.S. Patent Nos. 5,712,146 and 5,824,513, incorporated herein by reference. The loading domain of the DEBS PKS consists of an acyltransferase (AT) and an acyl carrier protein (ACP). The AT of the DEBS loading domain recognizes propionyl CoA (other loading domain ATs can recognize other acyl-CoAs, such as acetyl, malonyl, methylmalonyl or butyryl CoA) and transfers it as a thioester to the ACP of the loading domain. Concurrently, the AT in each of the six extender modules recognizes a methylmalonyl CoA (other extender module ATs can recognize other CoAs, such as malonyl or alpha-substituted malonyl CoA, i.e., malonyl, ethylmalonyl and 2-hydroxymalonyl CoA) and transfers it to the ACP of that module to form a thioester.
Once DEBS is primed with acyl- and methylmalonyl-ACPs, the acyl group of the loading domain migrates to form a thioester (trans-esterification) at the KS of the first module; at this stage, module one possesses an acyl-KS adjacent to a methylmalonyl-ACP. The acyl group derived from the DEBS loading domain is then covalently attached to the alpha carbon of the extender group to form a carbon-carbon bond, driven by concomitant decarboxylation, generating a new acyl-ACP whose backbone is two carbons longer than the loading unit (elongation or extension). The growing polyketide chain is transferred from the ACP to the KS of the next DEBS module, and the process continues. The polyketide chain, growing by two carbons for each DEBS module, passes sequentially as a covalently bound thioester from module to module, in an assembly line-like process. The carbon chain produced by this process alone would possess a ketone at every other carbon atom, producing a polyketone, from which the name polyketide arises. Commonly, however, additional enzymatic activities modify the beta-keto group of each two-carbon unit just after it has been added to the growing polyketide chain but before it is transferred to the next module. Thus, in addition to the minimal module containing the KS, AT and ACP necessary to form the carbon-carbon bond, modules may contain a ketoreductase (KR) that reduces the keto group to an alcohol. Modules may also contain a KR plus a dehydratase (DH) that dehydrates the alcohol to a double bond. Modules may also contain a KR, a DH and an enoylreductase (ER) that converts the double bond to a saturated single bond, leaving a methylene function at the beta carbon. The DEBS modules include those with only a KR domain, those with only an inactive KR domain, and those with all three KR, DH and ER domains. Once a polyketide chain traverses the final module of a PKS, it encounters the releasing domain or thioesterase found at the carboxyl terminus of most PKSs.
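The domain logic just described can be sketched in a few lines of Python. This is only an illustrative model, not part of the original disclosure: the function names `beta_carbon_state` and `backbone_carbons` are hypothetical, and the sketch assumes that a DH or ER domain has no effect unless the upstream reductive domain (KR, or KR plus DH) is also present and active.

```python
def beta_carbon_state(domains):
    """Return the functional group a module leaves at the beta carbon.

    `domains` lists the active domains of one extender module,
    e.g. ["KS", "AT", "KR", "ACP"]. Inactive domains should be omitted.
    """
    present = set(domains)
    if not {"KS", "AT", "ACP"} <= present:
        raise ValueError("an extender module needs at least KS, AT and ACP")
    if "KR" not in present:
        return "ketone"        # minimal module: beta-keto group is untouched
    if "DH" not in present:
        return "hydroxyl"      # KR reduces the keto group to an alcohol
    if "ER" not in present:
        return "double bond"   # DH dehydrates the alcohol
    return "methylene"         # ER saturates the double bond


def backbone_carbons(starter_carbons, extender_modules):
    """Backbone length when each extender module adds two carbons."""
    return starter_carbons + 2 * extender_modules


if __name__ == "__main__":
    print(beta_carbon_state(["KS", "AT", "DH", "KR", "ACP"]))
    # DEBS: propionyl starter (3 carbons) plus six extender modules.
    print(backbone_carbons(3, 6))
```

For DEBS, a three-carbon propionyl starter extended by six modules gives the 15-carbon backbone of 6-dEB; side-chain methyl groups contributed by methylmalonyl extender units are not counted in this simple tally.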
At the thioesterase, the polyketide is cleaved from the enzyme, and most but not all polyketides are cyclized. The polyketide can be further modified by tailoring or modification enzymes; these enzymes add carbohydrate groups or methyl groups, or make other modifications, i.e., oxidation or reduction, on the polyketide core molecule. For example, 6-dEB is hydroxylated, methylated and glycosylated to yield the well-known antibiotic erythromycin A in the Saccharopolyspora erythraea cells in which it is naturally produced. While the above description applies generally to modular PKS enzymes and specifically to DEBS, a number of variations exist in nature. For example, many PKS enzymes comprise loading domains that, unlike the DEBS loading domain, comprise an "inactive" KS domain that functions as a decarboxylase. This inactive KS is in most cases called KS^Q, where the superscript is the single-letter abbreviation for the amino acid (glutamine) that is present in place of the active-site cysteine required for ketosynthase activity. The epothilone PKS loading domain contains a KS^Y domain, in which tyrosine has replaced the cysteine; this domain is absent from the other PKS enzymes for which amino acid sequences are currently available. The present invention provides recombinant DNA coding sequences for this novel KS domain. Another important variation among PKS enzymes relates to the type of building block incorporated. Some polyketides, including epothilone, incorporate an amino acid-derived building block. PKS enzymes that make such polyketides require specialized modules for incorporation. Such modules are referred to as non-ribosomal peptide synthetase (NRPS) modules. The epothilone PKS, for example, contains an NRPS module. Another example of a variation relates to additional activities in a module.
For example, one module of the epothilone PKS contains a methyltransferase (MT) domain, a domain not previously known in PKS enzymes that make modular polyketides. The complete nucleotide sequence of the coding sequences of the open reading frames (ORFs) of the epothilone PKS genes and the epothilone tailoring (modification) enzyme genes is provided in Example 1, below. This sequence information, together with the information provided below regarding the locations of the open reading frames of the genes within that sequence, provides the amino acid sequences of the encoded proteins. Those skilled in the art will recognize that, due to the degenerate nature of the genetic code, a variety of DNA compounds differing in their nucleotide sequences can be used to encode a given amino acid sequence of the invention. The native DNA sequence encoding the epothilone PKS and epothilone modification enzymes of Sorangium cellulosum is shown herein merely to illustrate a preferred embodiment of the invention. The present invention includes DNA compounds of any sequence that encode the amino acid sequences of the polypeptides and proteins of the invention. Similarly, a polypeptide can typically tolerate one or more amino acid substitutions, deletions and insertions in its amino acid sequence without loss, or without significant loss, of a desired activity and, in some instances, even with an improvement in the desired activity. The present invention includes such polypeptides with alternative amino acid sequences, and the amino acid sequences shown merely illustrate preferred embodiments of the invention. The present invention provides recombinant genes for the production of epothilones. The invention is exemplified by the cloning, characterization and manipulation of the epothilone PKS and modification enzymes of Sorangium cellulosum SMP44.
The description of the invention and the recombinant vectors deposited in connection with that description enable the identification, cloning and manipulation of epothilone PKS and modification enzymes from any naturally occurring host cell that produces an epothilone. Such host cells include other S. cellulosum strains, such as So ce 90, other Sorangium species, and non-Sorangium cells. Such identification, cloning and characterization can be conducted by those of ordinary skill in accordance with the present invention using standard methodology for identifying homologous DNA sequences and for identifying genes that encode a protein of function similar to that of a known protein. In addition, the present invention provides epothilone PKS and modification enzyme genes that are synthesized de novo or are assembled from non-epothilone PKS genes to provide an ordered array of domains and modules in one or more proteins that assemble to form a PKS that produces epothilone or an epothilone derivative.
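To give a sense of scale for the degeneracy argument above, the short sketch below counts how many distinct DNA coding sequences specify a given peptide under the standard genetic code. It is an illustration only: the function name `count_coding_sequences` is hypothetical, the peptide strings in the usage lines are arbitrary examples rather than epothilone PKS sequences, and stop codons are ignored.

```python
from math import prod

# Number of sense codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the 61 sense codons in total).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(peptide):
    """Count the distinct DNA sequences that encode `peptide`."""
    return prod(CODONS_PER_AMINO_ACID[residue] for residue in peptide.upper())


if __name__ == "__main__":
    for example in ("MW", "ML", "CYS"):
        print(example, count_coding_sequences(example))
```

Because the count is a product over residues, it grows exponentially with peptide length, which is why the native Sorangium cellulosum sequence is only one of a very large number of sequences encoding the same proteins.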
The recombinant nucleic acids, proteins and peptides of the invention are many and diverse. To facilitate an understanding of the invention and the diverse compounds and methods provided herein, the following discussion describes various regions of the epothilone PKS and the corresponding coding sequences. This discussion begins with a general description of the genes that encode the PKS, the location of the various domains and modules in those genes, and the location of the various domains in those modules. A more detailed discussion follows, focusing first on the loading domain, then on the NRPS module, and then on the remaining eight modules of the epothilone PKS. There are six epothilone PKS genes. The epoA gene encodes the 149 kDa loading domain (which can also be referred to as a loading module). The epoB gene encodes module 1, the 158 kDa NRPS module. The epoC gene encodes the 193 kDa module 2. The epoD gene encodes a 765 kDa protein that comprises modules 3 to 6, inclusive. The epoE gene encodes a 405 kDa protein that comprises modules 7 and 8. The epoF gene encodes a 257 kDa protein that comprises module 9 and the thioesterase domain. Immediately downstream of the epoF gene is epoK, the P450 epoxidase gene, which encodes a 47 kDa protein, followed immediately by the epoL gene, which may encode a 24 kDa dehydratase. The epoL gene is followed by a number of ORFs that include genes believed to encode proteins involved in transport and regulation.
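The gene-to-module assignments and protein sizes listed above can be collected into a small lookup, which makes it easy to total the mass of the PKS subunits or to find the gene for a given module. This is a convenience sketch only; the dictionary and function names are hypothetical, and the total is simply the sum of the six stated subunit masses, not a measured mass of the assembled complex.

```python
# Protein sizes (kDa) and module content as stated in the text above.
EPOTHILONE_PKS = {
    "epoA": {"kda": 149, "modules": ["loading"]},
    "epoB": {"kda": 158, "modules": [1]},
    "epoC": {"kda": 193, "modules": [2]},
    "epoD": {"kda": 765, "modules": [3, 4, 5, 6]},
    "epoE": {"kda": 405, "modules": [7, 8]},
    "epoF": {"kda": 257, "modules": [9]},
}


def gene_for_module(module):
    """Return the name of the gene that encodes the given module."""
    for gene, info in EPOTHILONE_PKS.items():
        if module in info["modules"]:
            return gene
    raise KeyError(f"no epothilone PKS gene encodes module {module!r}")


def total_subunit_mass_kda():
    """Sum of the stated masses of the six PKS subunits."""
    return sum(info["kda"] for info in EPOTHILONE_PKS.values())


if __name__ == "__main__":
    print(gene_for_module(5))
    print(total_subunit_mass_kda())
```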
The sequences of these genes are shown in Example 1 in one contiguous sequence, or contig, of 71,989 nucleotides. This contig also contains two genes that appear to originate from a transposon and are identified below as ORF A and ORF B. These two genes are believed not to be involved in epothilone biosynthesis but could possibly contain sequences that function as promoters or enhancers. The contig also contains more than 12 additional ORFs, only 12 of which, designated ORF2 through ORF12 and ORF2 complement, are identified below. As noted, ORF2 is actually two ORFs, because the complement of the strand shown also comprises an ORF. The function of the corresponding gene product, if any, of these ORFs has not yet been established. The table below provides the location of various open reading frames, module-coding sequences and domain-coding sequences within the contig sequence shown in Example 1. Those skilled in the art will recognize, upon consideration of the sequence shown in Example 1, that the actual start locations of several of the genes may differ from the start locations shown in the table, because of the presence of in-frame codons for methionine or valine in close proximity to the codon indicated as the start codon. The actual start codon can be confirmed by amino acid sequencing of the proteins expressed from the genes.
Start   Stop    Comment
        992     transposase gene ORF A, not part of the PKS
989     1501    transposase gene ORF B, not part of the PKS
1998    6263    epoA gene, encodes the loading domain
2031    3548    KSY of the loading domain
3621    4661    AT of the loading domain
4917    5810    ER of the loading domain, potentially involved in formation of the thiazole moiety
5856    6155    ACP of the loading domain
6260    10493   epoB gene, encodes module 1, the NRPS module
6620    6649    condensation domain C2 of the NRPS module
6861    6887    heterocyclization signature sequence
6962    6982    condensation domain C4 of the NRPS module
7358    7366    condensation domain C7 (partial) of the NRPS module
7898    7921    adenylation domain A1 of the NRPS module
8261    8308    adenylation domain A3 of the NRPS module
8411    8422    adenylation domain A4 of the NRPS module
8861    8905    adenylation domain A6 of the NRPS module
8966    8983    adenylation domain A7 of the NRPS module
9090    9179    adenylation domain A8 of the NRPS module
9183    9992    oxidation region for forming the thiazole
10121   10138   adenylation domain A10 of the NRPS module
10261   10306   thiolation domain (PCP) of the NRPS module
10639   16137   epoC gene, encodes module 2
10654   12033   KS2, the KS domain of module 2
12250   13287   AT2, the AT domain of module 2
13327   13899   DH2, the DH domain of module 2
14962   15756   KR2, the KR domain of module 2
15763   16008   ACP2, the ACP domain of module 2
16134   37907   epoD gene, encodes modules 3-6
16425   17606   KS3
17817   18857   AT3
19581   20396   KR3
20424   20642   ACP3
20706   22082   KS4
22296   23336   AT4
24069   24647   KR4
24867   25151   ACP4
25203   26576   KS5
26793   27833   AT5
27966   29574   DH5
29433   30287   ER5
30321   30869   KR5
31077   31373   ACP5
31440   32807   KS6
33018   34067   AT6
34107   34676   DH6
35760   36641   ER6
36705   37256   KR6
37470   37769   ACP6
37912   49308   epoE gene, encodes modules 7 and 8
38014   39375   KS7
39589   40626   AT7
41341   41922   KR7
42181   42423   ACP7
42478   43851   KS8
44065   45102   AT8
45262   45810   DH (inactive)
46072   47172   MT8, the methyltransferase domain of module 8
48103   48636   KR8, this domain is inactive
48850   49149   ACP8
49323   56642   epoF gene, encodes module 9 and the TE domain
49416   50774   KS9
50985   52025   AT9
52173   53414   DH (inactive)
54747   55313   KR9
55593   55805   ACP9
55878   56600   TE9, the thioesterase domain
56757   58016   epoK gene, encodes the P450 epoxidase
58194   58733   epoL gene (putative dehydratase)
59405   59974   ORF2 complement, complement of the strand shown
59460   60249   ORF2
60271   60738   ORF3, complement of the strand shown
61730   62647   ORF4 (putative transporter)
63725   64333   ORF5
64372   65643   ORF6
66237   67472   ORF7 (putative oxidoreductase)
67572   68837   ORF8 (putative oxidoreductase membrane subunit)
68837   69373   ORF9
69993   71174   ORF10 (putative transporter)
71171   71542   ORF11
71557   71989   ORF12
With this overview of the organization and sequence of the epothilone gene cluster, one can better appreciate the many different recombinant DNA compounds provided by this invention. The epothilone PKS is a multiprotein complex composed of the products of the epoA, epoB, epoC, epoD, epoE and epoF genes. To confer on a host cell the ability to produce epothilones, one provides the host cell with the epoA, epoB, epoC, epoD, epoE and epoF genes of the present invention, and optionally other genes, in a form capable of expression in that host cell. Those skilled in the art will appreciate that, while the epothilone PKS and other PKS enzymes may be referred to herein as single entities, these enzymes are typically multisubunit proteins. Thus, one can make a derivative PKS (a PKS that differs from a naturally occurring PKS by deletion or mutation) or a hybrid PKS (a PKS that is composed of portions of two different PKS enzymes) by altering one or more genes that encode one or more of the multiple proteins that make up the PKS. The post-PKS modification or tailoring of epothilone includes multiple steps mediated by multiple enzymes. These enzymes are referred to herein as tailoring or modification enzymes.
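The coordinate table above lends itself to simple consistency checks. The sketch below, using only the gene-level rows of that table, computes each span length, flags spans that are not a whole number of codons, and measures the overlap between neighbouring genes. The helper names are hypothetical, and the sketch assumes that Start and Stop are inclusive, 1-based positions on the strand shown; the table does not say whether the stop codon is included.

```python
# Gene-level (start, stop) coordinates copied from the table above.
GENES = {
    "epoA": (1998, 6263),
    "epoB": (6260, 10493),
    "epoC": (10639, 16137),
    "epoD": (16134, 37907),
    "epoE": (37912, 49308),
    "epoF": (49323, 56642),
    "epoK": (56757, 58016),
}


def span_length(start, stop):
    """Length in nucleotides of an inclusive, 1-based interval."""
    return stop - start + 1


def overlap(first, second):
    """Number of nucleotides shared by two inclusive intervals."""
    shared = min(first[1], second[1]) - max(first[0], second[0]) + 1
    return max(0, shared)


def not_whole_codons(genes):
    """Names of genes whose listed span is not a multiple of three."""
    return [name for name, (start, stop) in genes.items()
            if span_length(start, stop) % 3 != 0]


if __name__ == "__main__":
    for name, (start, stop) in GENES.items():
        print(name, span_length(start, stop), "nt")
    print("overlap epoA/epoB:", overlap(GENES["epoA"], GENES["epoB"]))
    print("not whole codons:", not_whole_codons(GENES))
```

With the coordinates as listed, the epoB span is the only one that does not divide evenly into codons. That may reflect a transcription error in the table or the uncertainty in start positions noted in the text, so the listed coordinates are best confirmed against the sequence in Example 1.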
Surprisingly, the products of the epothilone PKS domains predicted to be functional by analysis of the genes encoding them are compounds that have not been previously reported. These compounds are referred to herein as epothilones G and H. Epothilones G and H lack the C-12-C-13 pi bond of epothilones C and D and the C-12-C-13 epoxide of epothilones A and B, having instead a hydrogen and a hydroxyl group at C-13, a single bond between C-12 and C-13, and a hydrogen and either a hydrogen or a methyl group at C-12. These compounds are predicted to result from the epothilone PKS because the DNA and the corresponding amino acid sequence for module 4 of the epothilone PKS do not appear to include a DH domain. As described below, however, expression of the epoA, epoB, epoC, epoD, epoE and epoF genes of the epothilone PKS in certain heterologous host cells that do not express epoK or epoL leads to the production of epothilones C and D, which lack the C-13 hydroxyl and have a double bond between C-12 and C-13. The dehydration reaction that mediates the formation of this double bond may be due to the action of an as yet unrecognized domain of the epothilone PKS (for example, dehydration may occur in the next module, which has an active DH domain and could generate a conjugated diene precursor before its reduction by an ER domain) or to an endogenous enzyme in the heterologous host cells (Streptomyces coelicolor) in which it was observed. In the latter event, epothilones G and H may be produced in Sorangium cellulosum or in other host cells and then converted to epothilones C and D by the action of a dehydratase, which may be encoded by the epoL gene. In any event, epothilones C and D are converted to epothilones A and B by an epoxidase encoded by the epoK gene. Epothilones A and B are converted to epothilones E and F by a hydroxylase, which may be encoded by one of the ORFs identified above or by another gene endogenous to Sorangium cellulosum.
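The reasoning that a module lacking a DH domain leaves a hydroxyl group behind can be illustrated with a minimal Python sketch. It assumes the conventional rule for modular PKSs (no KR: keto; KR only: hydroxyl; KR and DH: double bond; KR, DH and ER: methylene), which the text implies but does not state in this form. The names beta_carbon_state and EPOTHILONE_MODULES are hypothetical, and the domain sets are read from the domain table above with domains annotated as inactive left out.

```python
def beta_carbon_state(active_domains):
    """Predict the beta-carbon functionality left by one PKS module.

    Assumed canonical rule: each reductive domain can act only if the
    preceding one in the KR -> DH -> ER series is present and active.
    """
    domains = set(active_domains)
    if "KR" not in domains:
        return "keto"
    if "DH" not in domains:
        return "hydroxyl"
    if "ER" not in domains:
        return "double bond"
    return "methylene"


# Active reductive domains per module, per the domain table above.
# Domains annotated there as inactive (DH and KR of module 8, DH of
# module 9) are omitted.
EPOTHILONE_MODULES = {
    2: {"KR", "DH"},
    3: {"KR"},
    4: {"KR"},
    5: {"KR", "DH", "ER"},
    6: {"KR", "DH", "ER"},
    7: {"KR"},
    8: set(),
    9: {"KR"},
}

if __name__ == "__main__":
    for module, domains in sorted(EPOTHILONE_MODULES.items()):
        print(f"module {module}: {beta_carbon_state(domains)}")
```

Under this rule module 4 is predicted to leave a hydroxyl group, which matches the C-13 hydroxyl described for epothilones G and H; the observed C-12-C-13 double bond of epothilones C and D is what the rule does not predict.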
Thus, one can produce an epothilone or epothilone derivative modified as desired in a host cell by providing that host cell with one or more of the recombinant modification enzyme genes provided by the invention or by using a host cell that naturally expresses (or does not express) the modification enzyme. Thus, in general, by using the appropriate host and by appropriate inactivation, if desired, of modification enzymes, one can interrupt the progression G -> C -> A -> E, or the corresponding downstream processing of epothilone H, at any desired point; by controlling methylation, one or both pathways can be selected. Thus, the present invention provides a wide variety of recombinant DNA compounds and host cells for expressing the naturally occurring epothilones A, B, C and D and derivatives thereof. The invention also provides recombinant host cells, particularly Sorangium cellulosum host cells, that produce epothilone derivatives modified in a manner similar to epothilones E and F. In addition, the invention provides host cells that can produce the previously unknown epothilones G and H, either by expression of the epothilone PKS genes in host cells that do not express the dehydratase that converts epothilones G and H to C and D or by mutating or altering the PKS to abolish the dehydratase function, if it is present in the epothilone PKS. The macrolide compounds that are products of the PKS cluster can thus be modified in various ways. In addition to the modifications described above, the PKS products can be glycosylated, hydroxylated, dehydroxylated, oxidized, methylated and demethylated using appropriate enzymes.
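The idea of interrupting the tailoring pathway at a chosen point can be sketched as a small lookup. The following Python sketch is illustrative only. It assumes a strictly linear pathway (dehydratase, then the epoK epoxidase, then the hydroxylase) and assumes that epothilone H follows the series H, D, B, F by analogy with the G, C, A, E series named in the text; the text itself leaves open whether the dehydration is carried out by a separate enzyme. The names SERIES, STEPS, and final_epothilone are hypothetical.

```python
# Assumed linear tailoring pathway, following the text:
#   G/H --dehydratase--> C/D --epoxidase (epoK)--> A/B --hydroxylase--> E/F
SERIES = {
    "hydrogen": ["G", "C", "A", "E"],  # C-12 bears H
    "methyl": ["H", "D", "B", "F"],    # C-12 bears CH3 (assumed by analogy)
}
STEPS = ["dehydratase", "epoxidase", "hydroxylase"]


def final_epothilone(c12_substituent, enzymes_present):
    """Return the last epothilone reached given the enzymes available.

    Progression stops at the first missing enzyme, because each step
    needs the product of the step before it.
    """
    series = SERIES[c12_substituent]
    stage = 0
    for enzyme in STEPS:
        if enzyme not in enzymes_present:
            break
        stage += 1
    return series[stage]


if __name__ == "__main__":
    print(final_epothilone("hydrogen", {"dehydratase"}))
    print(final_epothilone("methyl", {"dehydratase", "epoxidase"}))
```

In this model a host lacking the epoxidase stops at epothilone C or D even if the hydroxylase is present, which is the sense in which the progression can be interrupted at any desired point.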
Thus, in addition to modifying the product of the PKS cluster, additional compounds within the scope of the invention can be produced by additional enzyme-catalyzed activity, either provided by a host cell in which the polyketide synthase is produced, or by modifying such cells to contain additional enzymes, or by additional modification in vitro using purified enzymes or crude extracts, or, indeed, by chemical modification. The present invention also provides a wide variety of recombinant DNA compounds and host cells that make epothilone derivatives. As used herein, the phrase "epothilone derivative" refers to a compound that is produced by a recombinant epothilone PKS in which at least one domain has been inactivated, mutated to alter its catalytic function, or replaced by a domain with a different function, or in which a domain has been inserted. In any event, the "epothilone derivative PKS" functions to produce a compound that differs in structure from a naturally occurring epothilone but retains its ring backbone structure and so is called an "epothilone derivative". To facilitate a better understanding of the recombinant DNA compounds and host cells provided by the invention, a detailed discussion of the loading domain and each of the modules of the epothilone PKS, as well as novel recombinant derivatives thereof, is provided below.
The loading domain of the epothilone PKS includes an inactive KS domain, KSY, an AT domain specific for malonyl CoA (which is believed to be decarboxylated by the KSY domain to yield an acetyl group), and an ACP domain. The present invention provides recombinant DNA compounds encoding the epothilone loading domain. The coding sequence of the epothilone loading domain is contained within an ~8.3 kb EcoRI restriction fragment of cosmid pKOS35-70.8A3.
The KS domain is referred to as inactive because the active site region "TAYSSSL" of the KS domain of the loading domain has a Y residue in place of the cysteine required for ketosynthase activity; this domain does have decarboxylase activity. See Witkowski et al., 7 Sep. 1999, Biochem. 38 (36): 11643-11650, incorporated herein by reference. The presence of the Y residue in place of a Q residue (which typically occurs in an inactive loading domain KS) may make the KS domain less efficient at decarboxylation. The present invention provides a recombinant epothilone PKS loading domain, and the corresponding DNA sequences encoding it, in which the Y residue has been changed to a Q residue by changing the codon for that residue in the coding sequence of the loading domain. The present invention also provides recombinant PKS enzymes comprising such loading domains, host cells for producing such enzymes, and the polyketides produced thereby. These recombinant loading domains include those in which only the Y residue has been changed and those in which the complete KSY domain has been replaced by a complete KSQ domain. The latter embodiment includes but is not limited to a recombinant epothilone loading domain in which the KSY domain has been replaced by the KSQ domain of the oleandolide PKS or the narbonolide PKS (see the references cited below in connection with the oleandomycin, narbomycin and picromycin PKS and modification enzymes). The epothilone loading domain also contains an AT domain that is believed to bind malonyl CoA. The sequence "QTAFTQPALFTFEYALAALW ... GHSIG" in the AT domain is consistent with malonyl CoA specificity. As noted above, the malonyl CoA is believed to be decarboxylated by the KSY domain to yield acetyl CoA.
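The distinction drawn above between an active KS, a KSQ domain, and the KSY domain turns on a single residue in the active site region. The following minimal Python sketch, offered as an illustration only, classifies a KS sequence by that residue. It assumes that the seven-residue region quoted in the text generalizes to the pattern TA(C/Q/Y)SSSL; real KS domains may deviate from this pattern, and the names KS_MOTIF, KS_CLASSES, and classify_ks, as well as the flanking residues in the usage lines, are hypothetical.

```python
import re

# Assumed motif: the active site region quoted in the text, with the
# third residue allowed to be C (active), Q (KSQ) or Y (KSY).
KS_MOTIF = re.compile(r"TA([CQY])SSSL")

KS_CLASSES = {
    "C": "active ketosynthase (catalytic cysteine present)",
    "Q": "KSQ (inactive as ketosynthase, decarboxylating)",
    "Y": "KSY (as described for the epothilone loading domain)",
}


def classify_ks(protein_seq):
    """Return (residue, description) for the first motif hit, else None."""
    match = KS_MOTIF.search(protein_seq.upper())
    if match is None:
        return None
    residue = match.group(1)
    return residue, KS_CLASSES[residue]


if __name__ == "__main__":
    # Flanking residues are made up for illustration.
    print(classify_ks("GGTAYSSSLVA"))
    print(classify_ks("GGTACSSSLVA"))
```

A search of this kind only flags candidates; confirming activity would still require the kind of biochemical evidence the cited reference provides.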
The present invention provides recombinant epothilone derivative loading domains, and the DNA sequences encoding them, in which the malonyl-specific AT domain or its coding sequence has been changed to another specificity, such as methylmalonyl CoA, ethylmalonyl CoA or 2-hydroxymalonyl CoA. When expressed with the other epothilone PKS proteins, such loading domains lead to the production of epothilones in which the methyl substituent of the thiazole ring of epothilone is replaced with, respectively, ethyl, propyl and hydroxymethyl. The present invention provides recombinant PKS enzymes comprising such loading domains, host cells for producing such enzymes, and the polyketides produced thereby. Those skilled in the art will recognize that an AT domain that is specific for 2-hydroxymalonyl CoA will result in a polyketide with a hydroxyl group at the corresponding location in the polyketide produced, and that the hydroxyl group can be methylated to yield a methoxy group by polyketide modification enzymes. See, for example, the patent applications cited in connection with the FK-520 PKS in the table below. Consequently, reference herein to a PKS having an AT domain specific for 2-hydroxymalonyl similarly refers to polyketides produced by that PKS that have either a hydroxyl or a methoxyl group at the corresponding location in the polyketide. The loading domain of the epothilone PKS also comprises an ER domain. This ER domain may be involved in forming one of the double bonds in the thiazole moiety of epothilone (in the reverse of its normal reaction), or it may be non-functional. In any event, the invention provides recombinant DNA compounds encoding the epothilone PKS loading domain with or without the ER region, as well as hybrid loading domains that contain an ER domain from another PKS (either active or inactive, with or without accompanying KR and DH domains) in place of the ER domain of the epothilone loading domain.
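The correspondence stated above between the specificity of the loading domain AT and the substituent on the thiazole ring can be written as a lookup table. The following Python sketch simply restates that correspondence; the names THIAZOLE_SUBSTITUENT and thiazole_substituent are hypothetical, and no specificity beyond those named in the text is assumed.

```python
# Loading-domain AT specificity -> thiazole ring substituent, as stated
# in the text (malonyl CoA is the native specificity).
THIAZOLE_SUBSTITUENT = {
    "malonyl CoA": "methyl",
    "methylmalonyl CoA": "ethyl",
    "ethylmalonyl CoA": "propyl",
    "2-hydroxymalonyl CoA": "hydroxymethyl",
}


def thiazole_substituent(at_specificity):
    """Look up the thiazole substituent for a loading AT specificity."""
    try:
        return THIAZOLE_SUBSTITUENT[at_specificity]
    except KeyError:
        raise ValueError(
            f"no substituent recorded for AT specificity {at_specificity!r}"
        ) from None


if __name__ == "__main__":
    for substrate in THIAZOLE_SUBSTITUENT:
        print(f"{substrate} -> {thiazole_substituent(substrate)}")
```

Each substituent is one carbon shorter than the acyl group of the substrate, consistent with the decarboxylation the text attributes to the KSY domain.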
The present invention also provides recombinant PKS enzymes comprising such loading domains, host cells for producing such enzymes, and the polyketides produced thereby. The recombinant nucleic acid compounds of the invention that encode the loading domain of the epothilone PKS and the corresponding polypeptides encoded by them are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence encoding the epothilone loading domain is coexpressed with the proteins of a heterologous PKS. As used herein, reference to a heterologous modular PKS (or to the coding sequence therefor) refers to all or part of a PKS, including each of the multiple proteins that make up the PKS, that synthesizes a polyketide other than an epothilone or epothilone derivative (or to the coding sequence therefor). This coexpression can take one of two forms. The epothilone loading domain can be coexpressed as a discrete protein with the other proteins of the heterologous PKS or as a fusion protein in which the loading domain is fused to one or more modules of the heterologous PKS. In either event, the hybrid PKS formed, in which the loading domain of the heterologous PKS is replaced by the epothilone loading domain, provides a novel PKS. Examples of heterologous PKSs that can be used to prepare such hybrid PKS enzymes of the invention include but are not limited to DEBS and the picromycin (narbonolide), oleandolide, rapamycin, FK-506, FK-520, rifamycin and avermectin PKS enzymes and their corresponding coding sequences. In another embodiment, a nucleic acid compound comprising a sequence encoding the epothilone loading domain is coexpressed with the proteins that constitute the remainder of the epothilone PKS (i.e., the epoB, epoC, epoD, epoE and epoF gene products) or with a recombinant epothilone PKS that produces an epothilone derivative due to an alteration or mutation in one or more of the epoB, epoC, epoD, epoE and epoF genes.
As used herein, reference to an epothilone PKS or a PKS that produces an epothilone derivative (or to the coding sequence therefor) refers to all or any one of the proteins that make up the PKS (or to the coding sequences therefor). In another embodiment, the invention provides recombinant nucleic acid compounds that encode a loading domain composed of part of the epothilone loading domain and part of a heterologous PKS. In this embodiment, the invention provides, for example, replacement of the AT specific for malonyl CoA with an AT specific for methylmalonyl CoA, ethylmalonyl CoA or 2-hydroxymalonyl CoA. This replacement, like the others described herein, is typically mediated by replacing the corresponding coding sequences to provide a recombinant DNA compound of the invention; the recombinant DNA is then used to prepare the corresponding protein. Such changes (including not only replacements but also deletions and insertions) may be referred to herein at either the DNA or the protein level. The compounds of the invention also include those in which both the KSY and AT domains of the epothilone loading domain have been replaced but the ACP and/or the linker regions of the epothilone loading domain have been left intact. Linker regions are those segments of amino acids between the domains in the loading domain and the modules of a PKS that help form the tertiary structure of the protein and are involved in the correct alignment and positioning of the domains of a PKS. These compounds include, for example, a recombinant loading domain coding sequence in which the coding sequences for the KSY and AT domains of the epothilone PKS have been replaced by the coding sequences for the KSQ and AT domains of, for example, the oleandolide PKS or the narbonolide PKS.
There are also PKS enzymes that do not employ a KSQ domain but instead merely use an AT domain that binds acetyl CoA, propionyl CoA or butyryl CoA (the DEBS loading domain) or isobutyryl CoA (the avermectin loading domain). Thus, the compounds of the invention also include, for example, a recombinant loading domain coding sequence in which the coding sequences for the KSY and AT domains of the epothilone PKS have been replaced by the coding sequence for an AT domain of the DEBS or avermectin PKS. The present invention also provides recombinant DNA compounds encoding loading domains in which the ACP domain or any of the linker regions of the epothilone loading domain has been replaced by another ACP or linker region. Any of the loading domain coding sequences described above is coexpressed with the other proteins that constitute a PKS that synthesizes epothilone, an epothilone derivative or another polyketide to provide a PKS of the invention. If the desired product is epothilone or an epothilone derivative, then the loading domain coding sequence is typically expressed as a discrete protein, as is the loading domain in the naturally occurring epothilone PKS. If the desired product is produced by the loading domain of the invention together with proteins from one or more non-epothilone PKS enzymes, then the loading domain is expressed either as a discrete protein or as a fusion protein with one or more modules of the heterologous PKS. The present invention also provides hybrid PKS enzymes in which the epothilone loading domain has been replaced in its entirety by a loading domain from a heterologous PKS, with the remainder of the PKS proteins provided by modified or unmodified epothilone PKS proteins. The present invention also provides recombinant expression vectors and host cells for producing such enzymes and the polyketides produced thereby.
In one embodiment, the heterologous loading domain is expressed as a discrete protein in a host cell that expresses the epoB, epoC, epoD, epoE and epoF gene products. In another embodiment, the heterologous loading domain is expressed as a fusion protein with the epoB gene product in a host cell that expresses the epoC, epoD, epoE and epoF gene products. In a related embodiment, the present invention provides recombinant epothilone PKS enzymes in which the loading domain has been deleted and replaced by an NRPS module, as well as the corresponding recombinant DNA compounds and expression vectors. In this embodiment, the recombinant PKS enzymes thus produce an epothilone derivative that comprises a dipeptide moiety, as in the compound leinamycin. The invention provides such enzymes in which the remainder of the epothilone PKS is identical in function to the native epothilone PKS as well as those in which the remainder is a recombinant PKS that produces an epothilone derivative of the invention. The present invention also provides reagents and methods useful for deleting the loading domain coding sequence or any portion thereof from the chromosome of a host cell, such as Sorangium cellulosum, or for replacing those sequences or any portion thereof with sequences that encode a recombinant loading domain. Using a recombinant vector that comprises DNA complementary to the DNA that includes and/or flanks the loading domain coding sequence on the Sorangium chromosome, one can employ the vector and homologous recombination to replace the native loading domain coding sequence with a recombinant loading domain coding sequence or to delete the sequence altogether.
In addition, while the foregoing discussion focuses on deleting or replacing the coding sequences of the epothilone loading domain, those skilled in the art will recognize that the present invention provides recombinant DNA compounds, vectors and methods useful for deleting or replacing all or any portion of an epothilone PKS gene or an epothilone modification enzyme gene. These methods and materials are useful for a variety of purposes. One purpose is to construct a host cell that does not make a naturally occurring epothilone or epothilone derivative. For example, a host cell that has been modified so as not to produce a naturally occurring epothilone may be particularly preferred for making epothilone derivatives or other polyketides free of any naturally occurring epothilone. Another purpose is to replace the deleted gene with a gene that has been altered so as to provide a different product or to produce more of one product than another. If the coding sequence of the epothilone loading domain has been deleted or otherwise rendered non-functional in a Sorangium cellulosum host cell, then the resulting host cell will produce a non-functional epothilone PKS. This PKS could still bind and process extender units, but the thiazole moiety of epothilone would not form, leading to the production of a novel epothilone derivative. Because this derivative would predictably contain a free amino group, it would be produced at most in low quantities. As noted above, however, providing a heterologous or other recombinant loading domain to the host cell would result in the production of an epothilone derivative with a structure determined by the loading domain provided. The loading domain of the epothilone PKS is followed by the first module of the PKS, which is an NRPS module specific for cysteine. This NRPS module is naturally expressed as a discrete protein, the product of the epoB gene.
The present invention provides the epoB gene in recombinant form. The recombinant nucleic acid compounds of the invention that encode the NRPS module of the epothilone PKS and the corresponding polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a nucleic acid compound comprising a sequence that encodes the epothilone NRPS module is coexpressed with genes that encode one or more proteins of a heterologous PKS. The NRPS module can be expressed as a discrete protein or as a fusion protein with one of the proteins of the heterologous PKS. The resulting PKS, in which at least one module of the heterologous PKS is replaced by the epothilone NRPS module or the NRPS module is in effect added as a module to the heterologous PKS, provides a novel PKS. In another embodiment, a DNA compound comprising a sequence that encodes the epothilone NRPS module is coexpressed with the other epothilone PKS proteins or modified versions thereof to provide a recombinant epothilone PKS that produces an epothilone or an epothilone derivative. Two hybrid PKS enzymes provided by the invention illustrate this aspect. Both hybrid PKS enzymes are hybrids of DEBS and the epothilone NRPS module. The first hybrid PKS is composed of four proteins: (i) DEBS1; (ii) a fusion protein composed of the KS domain of module 3 of DEBS and all of the loading domain of the epothilone PKS except the KS domain; (iii) the epothilone NRPS module; and (iv) a fusion protein composed of the KS domain of module 2 of the epothilone PKS fused to the AT domain of module 5 of DEBS and the remainder of DEBS3. This hybrid PKS produces a novel polyketide with a thiazole moiety incorporated into the macrolactone ring and a molecular weight of 413.53 when expressed in Streptomyces coelicolor. Glycosylated, hydroxylated and methylated derivatives can be produced by expression of the hybrid PKS in Saccharopolyspora erythraea.
Diagrammatically, the construct is represented: [diagram not reproduced]. The structure of the product is: [structure not reproduced]. The second hybrid PKS illustrating this aspect of the invention is composed of five proteins: (i) DEBS1; (ii) a fusion protein composed of the KS domain of module 3 of DEBS and all of the loading domain of the epothilone PKS except the KS domain; (iii) the epothilone NRPS module; (iv) a fusion protein composed of the KS domain of module 2 of the epothilone PKS fused to the AT domain of module 5 of DEBS and the remainder of DEBS2; and (v) DEBS3. This hybrid PKS produces a novel polyketide with a thiazole moiety incorporated into the macrolactone ring and a molecular weight of 455.61 when expressed in Streptomyces coelicolor. Glycosylated, hydroxylated and methylated derivatives can be produced by expression of the hybrid PKS in Saccharopolyspora erythraea. Diagrammatically, the construct is represented: [diagram not reproduced]. The structure of the product is: [structure not reproduced]. In another embodiment, a portion of the NRPS module coding sequence is used in conjunction with a heterologous coding sequence. In this embodiment, the invention provides, for example, changing the specificity of the NRPS module of the epothilone PKS from cysteine to another amino acid. This change is accomplished by constructing a coding sequence in which all or a portion of the coding sequence of the epothilone PKS NRPS module has been replaced by sequences coding for an NRPS module of a different specificity. In one illustrative embodiment, the specificity of the NRPS module is changed from cysteine to serine or threonine. When the NRPS module thus modified is expressed with the other epothilone PKS proteins, the recombinant PKS produces an epothilone derivative in which the thiazole moiety of the epothilone (or of an epothilone derivative) is changed to an oxazole or methyloxazole moiety, respectively.
Alternatively, the present invention provides recombinant PKS enzymes composed of the products of the epoA, epoC, epoD, epoE and epoF genes (or modified versions thereof) without an NRPS module or with an NRPS module from a heterologous PKS. The heterologous NRPS module can be expressed as a discrete protein or as a fusion protein with the product of either the epoA or the epoC gene. The invention also provides methods and reagents useful in changing the specificity of a heterologous NRPS module from another amino acid to cysteine. This change is accomplished by constructing a coding sequence in which the sequences that determine the specificity of the heterologous NRPS module have been replaced by those that specify cysteine from the coding sequence of the epothilone NRPS module. The resulting heterologous NRPS module is typically coexpressed with the proteins that constitute a heterologous PKS that synthesizes a polyketide other than epothilone or an epothilone derivative, although the heterologous NRPS module can also be used to produce epothilone or an epothilone derivative. In another embodiment, the invention provides recombinant epothilone PKS enzymes and the corresponding recombinant nucleic acid compounds and vectors in which the NRPS module has been inactivated or deleted. Such enzymes, compounds and vectors are generally constructed in accordance with the teaching above for deleting or inactivating the genes for the epothilone PKS or modification enzymes. The inactive NRPS module proteins and the coding sequences therefor provided by the invention include those in which the peptidyl carrier protein (PCP) domain has been wholly or partially deleted or otherwise rendered inactive by changing the active site serine (the site of phosphopantetheinylation) to another amino acid, such as alanine, or in which the adenylation domains have been deleted or otherwise rendered inactive.
In one embodiment, both the loading domain and the NRPS module have been deleted or rendered inactive. In any event, the resulting epothilone PKS can then function only if it is provided with a substrate that binds to the KS domain of module 2 (or a subsequent module) of the epothilone PKS or of a PKS for an epothilone derivative. In a method provided by the invention, the cells thus modified are then fed activated acylthioesters that are bound by, preferably, the second module, but potentially any subsequent module, and processed into novel epothilone derivatives. Thus, in one embodiment, the invention provides Sorangium and non-Sorangium host cells that express an epothilone PKS (or a PKS that produces an epothilone derivative) with an inactive NRPS module. The host cell is fed activated acylthioesters to produce novel epothilone derivatives of the invention. Host cells expressing the PKS, or cell free extracts containing it, can be fed or supplied with N-acylcysteamine (NACS) thioesters of novel precursor molecules to prepare epothilone derivatives. See U.S. patent application Serial No. 60/117,384, filed January 27, 1999, and PCT patent publication No. US99/03986, both incorporated herein by reference, and Example 6, below.
The second module (the first non-NRPS module) of the epothilone PKS includes a KS, an AT specific for methylmalonyl CoA, a DH, a KR and an ACP. This module is encoded by a sequence within an ~13.1 kb EcoRI-NsiI restriction fragment of cosmid pKOS35-70.8A3. The recombinant nucleic acid compounds of the invention that encode the second module of the epothilone PKS and the corresponding polypeptides encoded by them are useful for a variety of applications. The second module of the epothilone PKS is produced as a discrete protein by the epoC gene. The present invention provides the epoC gene in recombinant form. In one embodiment, a DNA compound comprising a sequence that encodes the second epothilone module is coexpressed with the proteins that constitute a heterologous PKS, either as a discrete protein or as a fusion protein with one or more modules of the heterologous PKS. The resulting PKS, in which a module of the heterologous PKS is either replaced by the second module of the epothilone PKS or the latter is merely added to the modules of the heterologous PKS, provides a novel PKS. In another embodiment, a DNA compound comprising a sequence that encodes the second module of the epothilone PKS is coexpressed with the other proteins that constitute the epothilone PKS or a recombinant epothilone PKS that produces an epothilone derivative. In another embodiment, all or only a portion of the coding sequence of the second module is used in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the AT specific for methylmalonyl CoA with an AT specific for malonyl CoA, ethylmalonyl CoA or 2-hydroxymalonyl CoA; deleting either the DH or the KR or both; replacing the DH or the KR or both with a DH or KR or both that specify a different stereochemistry; and/or inserting an ER.
Generally, any reference herein to inserting or replacing a KR, DH and/or ER domain of a PKS includes replacement of the associated KR, DH or ER domains in that module, typically with the corresponding domains from the module from which the inserted or replacing domain is obtained. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER or ACP coding sequence can originate from a coding sequence for another module of the epothilone PKS, from a gene for a PKS that produces a polyketide other than epothilone, or from chemical synthesis. The resulting heterologous second module coding sequence can be coexpressed with the other proteins that constitute a PKS that synthesizes epothilone, an epothilone derivative or another polyketide. Alternatively, one can delete or replace the second module of the epothilone PKS with a module from a heterologous PKS, which can be expressed as a discrete protein or as a fusion protein fused to either the epoB or the epoD gene product.
Exemplary recombinant PKS genes of the invention include those in which the AT domain coding sequences for the second module of the epothilone PKS have been altered or replaced to change the AT domain encoded thereby from an AT specific for methylmalonyl to an AT specific for malonyl. Nucleic acids encoding such malonyl-specific AT domains can be isolated, for example and without limitation, from the PKS genes encoding the narbonolide PKS, the rapamycin PKS (i.e., modules 2 and 12) and the FK-520 PKS (i.e., modules 3, 7 and 8). When such a hybrid second module is coexpressed with the other proteins that constitute the epothilone PKS, the resulting epothilone derivative produced is a 16-demethyl epothilone derivative. In addition, the invention provides DNA compounds and vectors encoding recombinant epothilone PKS enzymes, and the corresponding recombinant proteins, in which the KS domain of the second (or a subsequent) module has been inactivated or deleted. In a preferred embodiment, this inactivation is accomplished by changing the codon for the active site cysteine to an alanine codon. As with the variants described above for the NRPS module, the resulting recombinant epothilone PKS enzymes are unable to produce an epothilone or epothilone derivative unless supplied with a precursor that can be bound and extended by the remaining domains and modules of the recombinant PKS enzyme. Illustrative diketides are described in Example 6, below.
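Inactivating a KS domain by changing the active site cysteine codon to an alanine codon is a single-codon edit to the coding sequence. The following Python sketch shows such an edit on a toy sequence and checks the result by translation with the standard genetic code. The toy DNA string, its codon choices, and the function names translate and replace_codon are hypothetical; the actual codon in the epothilone genes is not given in the text.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def replace_codon(dna, codon_index, new_codon):
    """Return dna with the codon at the 0-based codon_index replaced."""
    if len(new_codon) != 3:
        raise ValueError("a codon must be exactly three nucleotides")
    start = 3 * codon_index
    if start + 3 > len(dna):
        raise IndexError("codon index lies beyond the end of the sequence")
    return dna[:start] + new_codon.upper() + dna[start + 3:]


if __name__ == "__main__":
    # Hypothetical toy sequence encoding the motif TACSSSL.
    toy = "ACCGCCTGCTCGTCGTCGCTG"
    mutant = replace_codon(toy, 2, "GCC")  # Cys codon -> Ala codon
    print(translate(toy), "->", translate(mutant))
```

The same helper would express the Y to Q change described for the loading domain KS by replacing a tyrosine codon with a glutamine codon such as CAG.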
The third module of the epothilone PKS includes a KS, an AT specific for malonyl CoA, a KR and an ACP. This module is encoded by a sequence within an ~8 kb BglII-NsiI restriction fragment of cosmid pKOS35-70.8A3. The recombinant DNA compounds of the invention that encode the third module of the epothilone PKS and the corresponding polypeptides encoded by them are useful for a variety of applications. The third module of the epothilone PKS is expressed as part of a protein, the product of the epoD gene, which also contains modules 4, 5 and 6. The present invention provides the epoD gene in recombinant form. The present invention also provides recombinant DNA compounds that encode each of epothilone PKS modules 3, 4, 5 and 6 as discrete coding sequences without coding sequences for the other epothilone modules. In one embodiment, a DNA compound comprising a sequence that encodes the third epothilone module is coexpressed with proteins that constitute a heterologous PKS. The third module of the epothilone PKS can be expressed either as a discrete protein or as a fusion protein fused to one or more modules of the heterologous PKS. The resulting PKS, in which a module of the heterologous PKS is either replaced by the third module of the epothilone PKS or the latter is merely added to the modules of the heterologous PKS, provides a novel PKS. In another embodiment, a DNA compound comprising a sequence that encodes the third module of the epothilone PKS is coexpressed with the other proteins that make up the remainder of the epothilone PKS or a recombinant epothilone PKS that produces an epothilone derivative, typically as a protein comprising not only the third module but also the fourth, fifth and sixth modules. In another embodiment, all or only a portion of the coding sequence of the third module is used in conjunction with other PKS coding sequences to create a hybrid module.
In this embodiment, the invention provides, for example, replacing the AT specific for malonyl CoA with an AT specific for methylmalonyl CoA, ethylmalonyl CoA or 2-hydroxymalonyl CoA; deleting the KR; replacing the KR with a KR that specifies a different stereochemistry; and/or inserting a DH or a DH and an ER. As noted above, reference to inserting a DH or a DH and an ER includes replacing the KR with a DH and KR or with an ER, DH and KR. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER or ACP coding sequence can originate from a coding sequence for another module of the epothilone PKS, from a gene for a PKS that produces a polyketide other than epothilone, or from chemical synthesis. The resulting heterologous third module coding sequence can be coexpressed with the other proteins that constitute a PKS that synthesizes epothilone, an epothilone derivative or another polyketide.
Exemplary recombinant PKS genes of the invention include those in which the AT domain coding sequences for the third epothilone PKS module have been altered or replaced to change the AT domain encoded thereby from a malonyl-specific AT to a methylmalonyl-specific AT. Such nucleic acids encoding a methylmalonyl-specific AT domain can be isolated, for example and without limitation, from the PKS genes encoding DEBS, the narbonolide PKS, the rapamycin PKS, and the FK-520 PKS. When coexpressed with the remaining modules and proteins of the epothilone PKS or an epothilone PKS derivative, the recombinant PKS produces the 14-methyl epothilone derivatives of the invention. Those skilled in the art will recognize that the KR domain of the third module of the PKS is responsible for forming the hydroxyl group involved in the cyclization of epothilone. Consequently, abolishing the KR domain of the third module, or adding a DH or DH and ER domains, will interfere with the cyclization, leading either to a linear molecule or to a molecule cyclized at a location different from that of epothilone.
The fourth module of the epothilone PKS includes a KS, an AT that can bind either malonyl CoA or methylmalonyl CoA, a KR, and an ACP. This module is encoded by a sequence within an ~10 kb NsiI-HindIII restriction fragment of cosmid pKOS35-70.1A2.
The recombinant DNA compounds of the invention encoding the fourth epothilone PKS module, and the corresponding polypeptides encoded by them, are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence encoding the fourth epothilone module is inserted into a DNA compound comprising the coding sequence for one or more modules of a heterologous PKS. The resulting construct encodes a protein in which a heterologous PKS module is either replaced by the fourth epothilone PKS module or the latter is merely added to the modules of the heterologous PKS. Together with the other proteins that make up the heterologous PKS, this protein provides a novel PKS. In another embodiment, a DNA compound comprising a sequence encoding the fourth epothilone PKS module is expressed in a host cell that also expresses the remaining modules and proteins of the epothilone PKS or a recombinant epothilone PKS that produces an epothilone derivative. To make epothilone or epothilone derivatives, the recombinant fourth module is usually expressed as a protein that also contains the third, fifth, and sixth epothilone modules or modified versions thereof. In another embodiment, all or only a portion of the coding sequence of the fourth module is used in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the AT specific for malonyl CoA and methylmalonyl CoA with an AT specific for malonyl CoA, methylmalonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA; deleting the KR; replacing the KR, optionally with a KR that specifies a different stereochemistry; and/or inserting a DH or a DH and an ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP.
In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequences can originate from a coding sequence for another epothilone PKS module, from a gene for a PKS that produces a polyketide other than epothilone, or from chemical synthesis. The resulting heterologous fourth module coding sequence is incorporated into a protein subunit of a recombinant PKS that synthesizes epothilone, an epothilone derivative, or another polyketide. If the desired polyketide is an epothilone or epothilone derivative, the recombinant fourth module is typically expressed as a protein that also contains the third, fifth, and sixth epothilone modules or modified versions thereof. Alternatively, the invention provides recombinant PKS enzymes for epothilones and epothilone derivatives in which the entire fourth module has been deleted or replaced by a module from a heterologous PKS. In a preferred embodiment, the invention provides recombinant DNA compounds comprising the coding sequence for the fourth module of the epothilone PKS modified to encode an AT that binds methylmalonyl CoA and not malonyl CoA. These recombinant molecules are used to express a protein that is a recombinant derivative of the epoD protein comprising the modified fourth module as well as modules 3, 5, and 6 of the epothilone PKS, one or more of which may optionally be in derivative form. In another preferred embodiment, the invention provides recombinant DNA compounds comprising the coding sequence for the fourth module of the epothilone PKS modified to encode an AT that binds malonyl CoA and not methylmalonyl CoA. These recombinant molecules are used to express a protein that is a recombinant derivative of the epoD protein comprising the modified fourth module as well as modules 3, 5, and 6 of the epothilone PKS, one or more of which may optionally be in derivative form.
Before the present invention, it was known that Sorangium cellulosum produced epothilones A, B, C, D, E, and F and that epothilones A, C, and E had a hydrogen at C-12, while epothilones B, D, and F had a methyl group at this position. Prior to the present invention, neither the order in which these compounds were synthesized in S. cellulosum nor the mechanism by which some of the compounds had a C-12 hydrogen while others had a methyl group at this position was appreciated. The present disclosure reveals that epothilones A and B are derived from epothilones C and D by the action of the epoK gene product and that the presence of a hydrogen or methyl moiety at C-12 is due to the AT domain of module 4 of the epothilone PKS. This domain can bind either malonyl CoA or methylmalonyl CoA and, consistent with its greater similarity to malonyl-specific AT domains than to methylmalonyl-specific AT domains, binds malonyl CoA more often than methylmalonyl CoA. Thus, the invention provides recombinant DNA compounds and expression vectors, and the corresponding recombinant PKSs, in which a hybrid fourth module with a methylmalonyl-specific AT has been incorporated. The methylmalonyl-specific AT coding sequence can originate, for example and without limitation, from coding sequences for the oleandolide PKS, DEBS, the narbonolide PKS, the rapamycin PKS, or any other PKS comprising a methylmalonyl-specific AT domain. According to the invention, the hybrid fourth module expressed from this coding sequence is incorporated into the epothilone PKS (or the PKS for an epothilone derivative), typically as a derivative of the epoD gene product.
The resulting recombinant epothilone PKS produces epothilones with a methyl moiety at C-12, i.e., epothilone H (or an epothilone H derivative) if there is no dehydratase activity to form the C-12-C-13 alkene; epothilone D (or an epothilone D derivative) if the dehydratase activity is present but not the epoxidase activity; epothilone B (or an epothilone B derivative) if both the dehydratase and epoxidase activities are present but not the hydroxylase activity; and epothilone F (or an epothilone F derivative) if all three activities, dehydratase, epoxidase, and hydroxylase, are present. As indicated parenthetically above, the cell will produce the corresponding epothilone derivative if there have been other changes to the epothilone PKS. If the recombinant PKS comprising the hybrid methylmalonyl-specific fourth module is expressed in, for example, Sorangium cellulosum, the appropriate modifying enzymes are present (unless they have been rendered inactive according to the methods herein), and epothilones D, B, and/or F are produced. Such production is typically carried out in a recombinant S. cellulosum provided by the present invention in which the native epothilone PKS is unable to function at all or unable to function except in conjunction with the recombinant fourth module provided. In an illustrative example, one can use the methods and reagents of the invention to inactivate the epoD gene in the native host. One can then transform that host with a vector comprising the recombinant epoD gene containing the hybrid fourth module coding sequence. The recombinant vector can exist as an extrachromosomal element or as a segment of DNA integrated into the chromosome of the host cell. In the latter embodiment, the invention provides that one can simply integrate the recombinant methylmalonyl-specific module 4 coding sequence into wild-type S. 
cellulosum by homologous recombination with the native epoD gene to ensure that only the desired epothilone is produced. The invention provides that the S. cellulosum host can either express or not express (by mutation or homologous recombination of the native genes thereof) the dehydratase, epoxidase, and/or oxidase gene products and thus form or not form the epothilone D, B, and F compounds, as the practitioner chooses. Sorangium cellulosum modified as described above is only one of the recombinant host cells provided by the invention. In a preferred embodiment, the recombinant methylmalonyl-specific epothilone fourth module coding sequences are used according to the methods of the invention to produce epothilone D, B, and F (or their corresponding derivatives) in heterologous host cells. Thus, the invention provides reagents and methods for introducing the epothilone or epothilone derivative PKS genes and the epothilone dehydratase, epoxidase, and hydroxylase genes, and combinations thereof, into heterologous host cells. The recombinant methylmalonyl-specific epothilone fourth module coding sequences provided by the invention afford important alternative methods for producing desired epothilone compounds in host cells. Thus, the invention provides a hybrid fourth module coding sequence in which, in addition to the replacement of the endogenous AT coding sequence with a coding sequence for an AT specific for methylmalonyl CoA, coding sequences for a DH and KR from, for example and without limitation, module 10 of the rapamycin PKS or modules 1 to 5 of the FK-520 PKS have replaced the endogenous KR coding sequence.
When the gene product comprising the hybrid fourth module and modules 3, 5, and 6 of the epothilone PKS (or derivatives thereof) encoded by this coding sequence is incorporated into a PKS comprising the other epothilone PKS proteins (or derivatives thereof) produced in a host cell, the cell makes either epothilone D or its trans isomer (or derivatives thereof), depending on the stereochemical specificity of the inserted DH and KR domains. Similarly, and as noted above, the invention provides recombinant DNA compounds comprising the coding sequence for the fourth module of the epothilone PKS modified to encode an AT that binds malonyl CoA and not methylmalonyl CoA. The invention provides recombinant DNA compounds and vectors, and the corresponding recombinant PKSs, in which this hybrid fourth module has been incorporated into a derivative of the epoD gene product. When incorporated into the epothilone PKS (or the PKS for an epothilone derivative), the resulting recombinant epothilone PKS produces epothilones C, A, and E, depending, again, on whether the epothilone modification enzymes are present. As noted above, depending on the host, on whether the fourth module includes a KR and DH domain, and on whether and which of the dehydratase, epoxidase, and oxidase activities are present, the practitioner of the invention can produce one or more of the epothilone G, C, A, and E compounds and derivatives thereof using the compounds, host cells, and methods of the invention.
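The dependence of the product on the module 4 AT specificity and on the modifying activities can be summarized as a small decision function. This is an illustrative sketch only, and the function name and parameters are hypothetical. Two assumptions are made that the text does not state outright: epothilone G is taken to be the C-12 hydrogen counterpart of epothilone H (by analogy with the H, D, B, F series), and the epoxidase and hydroxylase are assumed to have no effect unless the preceding activity is present.

```python
def predicted_epothilone(c12_methyl: bool,
                         dehydratase: bool,
                         epoxidase: bool,
                         hydroxylase: bool) -> str:
    """Return the epothilone letter expected for the given activities.

    c12_methyl is True when the module 4 AT loads methylmalonyl CoA
    (methyl at C-12) and False when it loads malonyl CoA (hydrogen at C-12).
    """
    if not dehydratase:
        # No C-12-C-13 alkene is formed. The pairing of G with H is an
        # assumption drawn by analogy, not an explicit statement in the text.
        return "H" if c12_methyl else "G"
    if not epoxidase:
        return "D" if c12_methyl else "C"
    if not hydroxylase:
        return "B" if c12_methyl else "A"
    return "F" if c12_methyl else "E"


# Methylmalonyl-specific module 4 with dehydratase and epoxidase only.
print(predicted_epothilone(True, True, True, False))
```

A host lacking the epoxidase would, under the same assumptions, be expected to accumulate epothilone D with a methylmalonyl-specific module 4 and epothilone C with a malonyl-specific one.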
The fifth module of the epothilone PKS includes a KS, an AT that binds malonyl CoA, a DH, an ER, a KR, and an ACP. This module is encoded by a sequence within an ~12.4 kb NsiI-NotI restriction fragment of cosmid pKOS35-70.1A2. The recombinant DNA compounds of the invention encoding the fifth epothilone PKS module, and the corresponding polypeptides encoded by them, are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence encoding the fifth epothilone module is inserted into a DNA compound comprising the coding sequence for one or more modules of a heterologous PKS. The resulting construct, in which a module of the heterologous PKS is either replaced by the fifth module of the epothilone PKS or the latter is merely added to the modules of the heterologous PKS, can be incorporated into an expression vector and used to produce the recombinant protein thus encoded. When the recombinant protein is combined with the other proteins of the heterologous PKS, a novel PKS is produced. In another embodiment, a DNA compound comprising a sequence encoding the fifth module of the epothilone PKS is inserted into a DNA compound comprising the coding sequences for the epothilone PKS or a recombinant epothilone PKS that produces an epothilone derivative. In the latter construct, the fifth epothilone module is typically expressed as a protein comprising the third, fourth, and sixth modules of the epothilone PKS or derivatives thereof. In another embodiment, a portion of the coding sequence of the fifth module is used in conjunction with other PKS coding sequences to create a hybrid module coding sequence and the hybrid module encoded by it.
In this embodiment, the invention provides, for example, replacing the AT specific for malonyl CoA with an AT specific for methylmalonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA; deleting any one, two, or all three of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER, DH, and KR with either a KR, a DH and KR, or a KR, DH, and ER, optionally specifying a different stereochemistry. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequences can originate from a coding sequence for another epothilone PKS module, from a coding sequence for a PKS that produces a polyketide other than epothilone, or from chemical synthesis. The resulting hybrid fifth module coding sequence can be used in conjunction with a coding sequence for a PKS that synthesizes epothilone, an epothilone derivative, or another polyketide. Alternatively, the fifth module of the epothilone PKS can be deleted or replaced in its entirety by a module of a heterologous PKS to produce a protein that, in combination with the other proteins of the epothilone PKS or derivatives thereof, constitutes a PKS that produces an epothilone derivative. Exemplary recombinant PKS genes of the invention include recombinant epoD gene derivatives in which the AT domain coding sequences for the fifth epothilone PKS module have been altered or replaced to change the AT domain encoded thereby from a malonyl-specific AT to a methylmalonyl-specific AT. Such nucleic acids encoding a methylmalonyl-specific AT domain can be isolated, for example and without limitation, from the PKS genes encoding DEBS, the narbonolide PKS, the rapamycin PKS, and the FK-520 PKS. When such a recombinant epoD gene derivative is coexpressed with the epoA, epoB, epoC, epoE, and epoF genes (or derivatives thereof), the PKS thus composed produces the 10-methyl epothilones or derivatives thereof.
Another recombinant epoD gene derivative provided by the invention includes not only this altered module 5 coding sequence but also module 4 coding sequences that encode an AT domain that binds only methylmalonyl CoA. When incorporated into a PKS with the epoA, epoB, epoC, epoE, and epoF genes, the recombinant epoD gene derivative product leads to the production of 10-methyl epothilone B and/or D derivatives. Other exemplary recombinant epoD gene derivatives of the invention include those in which the ER, DH, and KR domain coding sequences for the fifth module of the epothilone PKS have been replaced with those encoding (i) a KR and DH domain; (ii) a KR domain; and (iii) an inactive KR domain. These recombinant epoD gene derivatives of the invention are coexpressed with the epoA, epoB, epoC, epoE, and epoF genes to produce a recombinant PKS that makes the corresponding (i) C-11 alkene, (ii) C-11 hydroxy, and (iii) C-11 keto epothilone derivatives. These recombinant epoD gene derivatives can also be coexpressed with recombinant epo genes containing other alterations, or can themselves be further altered, to produce a PKS that makes the corresponding C-11 epothilone derivatives. For example, one recombinant epoD gene derivative provided by the invention also includes module 4 coding sequences that encode an AT domain that binds only methylmalonyl CoA. When incorporated into a PKS with the epoA, epoB, epoC, epoE, and epoF genes, the recombinant epoD gene derivative product leads to the production of 10-methyl epothilone B and/or D derivatives. Functionally similar epoD genes for producing C-11 epothilone derivatives can also be made by inactivation of one, two, or all three of the ER, DH, and KR domains of the fifth epothilone module.
However, the preferred mode for altering such domains in any module is by replacement with the complete set of desired domains taken from another module of the same or a heterologous PKS coding sequence. In this way, the natural architecture of the PKS is preserved. Also, when present, KR and DH or KR, DH, and ER domains that function together in a native PKS are preferably used in the recombinant PKS. Illustrative replacement domains for the substitutions described above include, for example and without limitation, the inactive KR domain from module 3 of the rapamycin PKS to form the ketone, the KR domain from module 5 of the rapamycin PKS to form the alcohol, and the KR and DH domains from module 4 of the rapamycin PKS to form the alkene. Other nucleic acids encoding such inactive KR, active KR, and active KR and DH domains can be isolated from, for example and without limitation, the PKS genes encoding DEBS, the narbonolide PKS, and the FK-520 PKS. Each of the resulting PKS enzymes produces a polyketide compound comprising a functional group at the C-11 position that can be further derivatized in vitro by standard chemical methodology to produce semi-synthetic epothilone derivatives of the invention.
The sixth module of the epothilone PKS includes a KS, an AT that binds methylmalonyl CoA, a DH, an ER, a KR, and an ACP. This module is encoded by a sequence within an ~14.5 kb HindIII-NsiI restriction fragment of cosmid pKOS35-70.1A2. The recombinant DNA compounds of the invention encoding the sixth module of the epothilone PKS, and the corresponding polypeptides encoded by them, are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence encoding the sixth epothilone module is inserted into a DNA compound comprising the coding sequence for one or more modules of a heterologous PKS.
The resulting construct, in which a module of the heterologous PKS is either replaced by the sixth module of the epothilone PKS or the latter is merely added to the modules of the heterologous PKS, provides a novel PKS when coexpressed with the other PKS proteins. In another embodiment, a DNA compound comprising a sequence encoding the sixth module of the epothilone PKS is inserted into a DNA compound comprising the coding sequence for modules 3, 4, and 5 of the epothilone PKS or a recombinant epothilone PKS that produces an epothilone derivative, and is coexpressed with the other proteins of the epothilone or epothilone derivative PKS to produce a PKS that makes epothilone or an epothilone derivative in a host cell. In another embodiment, a portion of the coding sequence of the sixth module is used in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the AT specific for methylmalonyl CoA with an AT specific for malonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA; deleting any one, two, or all three of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER, DH, and KR with either a KR, a DH and KR, or a KR, DH, and ER, optionally specifying a different stereochemistry. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequences can originate from a coding sequence for another module of the epothilone PKS, from a coding sequence for a PKS that produces a polyketide other than epothilone, or from chemical synthesis. The resulting hybrid sixth module coding sequence can be used in conjunction with a coding sequence for a protein subunit of a PKS that makes epothilone, an epothilone derivative, or another polyketide.
If the PKS makes epothilone or an epothilone derivative, the hybrid sixth module is typically expressed as a protein comprising modules 3, 4, and 5 of the epothilone PKS or derivatives thereof. Alternatively, the sixth module of the epothilone PKS can be deleted or replaced in its entirety by a module of a heterologous PKS to produce a PKS for an epothilone derivative. Exemplary recombinant PKS genes of the invention include those in which the AT domain coding sequences for the sixth epothilone PKS module have been altered or replaced to change the AT domain encoded thereby from a methylmalonyl-specific AT to a malonyl-specific AT. Such nucleic acids encoding a malonyl-specific AT domain can be isolated, for example and without limitation, from the PKS genes encoding the narbonolide PKS, the rapamycin PKS, and the FK-520 PKS. When a recombinant epoD gene of the invention encoding such a hybrid module 6 is coexpressed with the other epothilone PKS genes, the recombinant PKS makes the 8-desmethyl epothilone derivatives. This recombinant epoD gene derivative can also be coexpressed with recombinant epo gene derivatives that contain other alterations, or can itself be further altered, to produce a PKS that makes the corresponding 8-desmethyl epothilone derivatives. For example, one recombinant epoD gene provided by the invention also includes module 4 coding sequences that encode an AT domain that binds only methylmalonyl CoA. When incorporated into a PKS with epoA, epoB, epoC, epoE, and epoF, the recombinant epoD gene derivative product leads to the production of the 8-desmethyl derivatives of epothilones B and D. Other exemplary recombinant epoD gene derivatives of the invention include those in which the ER, DH, and KR domain coding sequences for the sixth module of the epothilone PKS have been replaced with those encoding (i) a KR and DH domain; (ii) a KR domain; and (iii) an inactive KR domain.
These recombinant epoD gene derivatives of the invention are coexpressed with the other epothilone PKS genes to produce a recombinant PKS that makes the corresponding (i) C-9 alkene, (ii) C-9 hydroxy, and (iii) C-9 keto epothilone derivatives. These recombinant epoD gene derivatives can also be coexpressed with recombinant epo genes that contain other alterations, or can themselves be further altered, to produce a PKS that makes the corresponding C-9 epothilone derivatives. For example, one recombinant epoD gene derivative provided by the invention also includes module 4 coding sequences that encode an AT domain that binds only methylmalonyl CoA. When incorporated into a PKS with the epoA, epoB, epoC, epoE, and epoF genes, the recombinant epoD gene derivative product leads to the production of the C-9 derivatives of epothilones B and D. Functionally equivalent sixth modules can also be made by inactivating one, two, or all three of the ER, DH, and KR domains of the sixth epothilone module. The preferred mode for altering such domains in any module is by replacement with the complete set of desired domains taken from another module of the same or a heterologous PKS coding sequence. Illustrative replacement domains for the substitutions described above include, but are not limited to, the inactive KR domain from module 3 of the rapamycin PKS to form the ketone, the KR domain from module 5 of the rapamycin PKS to form the alcohol, and the KR and DH domains from module 4 of the rapamycin PKS to form the alkene. Other nucleic acids encoding such inactive KR, active KR, and active KR and DH domains can be isolated from, for example and without limitation, the PKS genes encoding DEBS, the narbonolide PKS, and the FK-520 PKS. Each of the resulting PKS enzymes produces a polyketide compound comprising a functional group at the C-9 position that can be further derivatized in vitro by standard chemical methodology to produce semi-synthetic epothilone derivatives of the invention.
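The relationship between the reductive domains left active in modules 5 and 6 and the functional group obtained at C-11 or C-9 can be written as a lookup. The sketch below is illustrative only, with hypothetical names. The keto, hydroxy, and alkene entries and the carbon positions come from the text; the "methylene" label for the native, fully reduced case (KR, DH, and ER all active) is an added assumption based on general polyketide chemistry.

```python
# Carbon whose functional group is set by each module's reductive domains,
# as stated in the text (module 5 -> C-11, module 6 -> C-9).
TARGET_CARBON = {5: "C-11", 6: "C-9"}

# Active reductive domains -> functional group at that carbon.
GROUP_BY_DOMAINS = {
    frozenset(): "keto",                            # inactive KR only
    frozenset({"KR"}): "hydroxy",                   # active KR
    frozenset({"KR", "DH"}): "alkene",              # active KR and DH
    frozenset({"KR", "DH", "ER"}): "methylene",     # native set (assumed label)
}


def describe_derivative(module: int, active_domains) -> str:
    """Name the position and functional group for a module 5 or 6 variant."""
    if module not in TARGET_CARBON:
        raise ValueError("only modules 5 and 6 are covered by this sketch")
    group = GROUP_BY_DOMAINS.get(frozenset(active_domains))
    if group is None:
        raise ValueError(f"unsupported domain set: {sorted(active_domains)}")
    return f"{TARGET_CARBON[module]} {group}"


print(describe_derivative(5, {"KR"}))
print(describe_derivative(6, {"KR", "DH"}))
```

Domain sets that cannot act in sequence, such as a DH without an active KR, are rejected rather than guessed at.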
The seventh module of the epothilone PKS includes a KS, an AT specific for methylmalonyl CoA, a KR, and an ACP. This module is encoded by a sequence within an ~8.7 kb BglII restriction fragment from cosmid pKOS35-70.4. The recombinant DNA compounds of the invention encoding the seventh module of the epothilone PKS, and the corresponding polypeptides encoded by them, are useful for a variety of applications. The seventh module of the epothilone PKS is contained in the gene product of the epoE gene, which also contains the eighth module. The present invention provides the epoE gene in recombinant form, but also provides DNA compounds encoding the seventh module without coding sequences for the eighth module as well as DNA compounds encoding the eighth module without coding sequences for the seventh module. In one embodiment, a DNA compound comprising a sequence encoding the seventh epothilone module is inserted into a DNA compound comprising the coding sequence for one or more modules of a heterologous PKS. The resulting construct, in which the coding sequence for a module of the heterologous PKS is either replaced by that of the seventh module of the epothilone PKS or the latter is merely added to the coding sequences for the modules of the heterologous PKS, provides a novel PKS coding sequence that can be expressed in a host cell. Alternatively, the seventh epothilone module can be expressed as a discrete protein. In another embodiment, a DNA compound comprising a sequence encoding the seventh module of the epothilone PKS is expressed to form a protein that, together with other proteins, constitutes the epothilone PKS or a PKS that produces an epothilone derivative. In these embodiments, the seventh module is typically expressed as a protein comprising the eighth module of the epothilone PKS or a derivative thereof and is coexpressed with the epoA, epoB, epoC, epoD, and epoF genes or derivatives thereof to constitute the PKS.
In another embodiment, a portion or all of the coding sequence of the seventh module is used in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the AT specific for methylmalonyl CoA with an AT specific for malonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA; deleting the KR; replacing the KR with a KR that specifies a different stereochemistry; and/or inserting a DH or a DH and an ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequences can originate from a coding sequence for another epothilone PKS module, from a coding sequence for a PKS that produces a polyketide other than epothilone, or from chemical synthesis. The resulting hybrid seventh module coding sequence is used, optionally in conjunction with other coding sequences, to express a protein that together with other proteins constitutes a PKS that synthesizes epothilone, an epothilone derivative, or another polyketide. When used to prepare epothilone or an epothilone derivative, the seventh module is typically expressed as a protein comprising the eighth module or a derivative thereof and is coexpressed with the epoA, epoB, epoC, epoD, and epoF genes or derivatives thereof to constitute the PKS. Alternatively, the coding sequences for the seventh module of the epothilone PKS in the epoE gene can be deleted or replaced by those of a heterologous module to prepare a recombinant epoE gene derivative that, together with the epoA, epoB, epoC, epoD, and epoF genes, can be expressed to make a PKS for an epothilone derivative.
Exemplary recombinant epoE gene derivatives of the invention include those in which the AT domain coding sequences for the seventh module of the epothilone PKS have been altered or replaced to change the AT domain encoded thereby from a methylmalonyl-specific AT to a malonyl-specific AT. Such nucleic acids encoding a malonyl-specific AT domain can be isolated, for example and without limitation, from the PKS genes encoding the narbonolide PKS, the rapamycin PKS, and the FK-520 PKS. When coexpressed with the other epothilone PKS genes, epoA, epoB, epoC, epoD, and epoF, or derivatives thereof, a PKS is produced for an epothilone derivative with a C-6 hydrogen instead of a C-6 methyl. Thus, if the genes confer no other alterations, the compounds produced are the 6-desmethyl epothilones.
The eighth module of the epothilone PKS includes a KS, an AT specific for methylmalonyl CoA, inactive KR and DH domains, a methyltransferase (MT) domain, and an ACP. This module is encoded by a sequence within an ~10 kb NotI restriction fragment of cosmid pKOS35-79.85. The recombinant DNA compounds of the invention that encode the eighth module of the epothilone PKS, and the corresponding polypeptides encoded by them, are useful for a variety of applications. In one embodiment, a DNA compound comprising a sequence encoding the eighth epothilone module is inserted into a DNA compound comprising the coding sequence for one or more modules of a heterologous PKS. The resulting construct, in which the coding sequence for a module of the heterologous PKS is either replaced by that of the eighth module of the epothilone PKS or the latter is merely added to the coding sequences for the modules of the heterologous PKS, provides a novel PKS coding sequence that is expressed with the other proteins constituting the PKS to provide a novel PKS. Alternatively, the eighth module can be expressed as a discrete protein that can associate with other PKS proteins to constitute a novel PKS.
In another embodiment, a DNA compound comprising a sequence encoding the eighth module of the epothilone PKS is co-expressed with other proteins that constitute the epothilone PKS or a PKS that produces an epothilone derivative. In these embodiments, the eighth module is typically expressed as a protein comprising the seventh module or a derivative thereof.
In another embodiment, a portion or all of the coding sequence of the eighth module is used in conjunction with other PKS coding sequences to create a hybrid module. In this embodiment, the invention provides, for example, replacing the AT specific for methylmalonyl CoA with an AT specific for malonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA; deleting the inactive KR and/or the inactive DH; replacing the inactive KR and/or DH with an active KR and/or DH; and/or inserting an ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequences can originate from a coding sequence for another module of the epothilone PKS, from a coding sequence for a PKS that produces a polyketide other than epothilone, or from chemical synthesis. The resulting hybrid eighth module coding sequence is expressed as a protein that is used in conjunction with other proteins to constitute a PKS that synthesizes epothilone, an epothilone derivative, or another polyketide. When used to prepare epothilone or an epothilone derivative, the heterologous or hybrid eighth module is typically expressed as a product of a recombinant epoE gene that also contains the seventh module. Alternatively, the coding sequences for the eighth module in the epoE gene can be deleted or replaced by those of a heterologous module to prepare a recombinant epoE gene that, together with the epoA, epoB, epoC, epoD, and epoF genes, can be expressed to make a PKS for an epothilone derivative.
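The AT-domain substitutions described above for modules 3, 5, 6, and 7 each change the substituent at one named ring carbon. The following illustrative sketch (hypothetical names, not part of the disclosure) collects those statements in one table. Module 4 is left out because its native AT accepts both extender units, and module 8 is left out because the substitution pattern at its position also depends on the MT domain. The final check, that the stated positions follow 20 minus twice the module number, is only an observation about the numbers given in the text and is not a rule stated by it.

```python
# Native AT specificity and affected ring carbon for each module, as
# stated in the text for the exemplary AT swaps.
NATIVE_AT = {3: "malonyl", 5: "malonyl", 6: "methylmalonyl", 7: "methylmalonyl"}
AFFECTED_CARBON = {3: 14, 5: 10, 6: 8, 7: 6}


def at_swap_product(module: int) -> str:
    """Name the derivative class made when a module's AT is swapped
    between malonyl and methylmalonyl specificity."""
    if module not in NATIVE_AT:
        raise ValueError("this sketch covers modules 3, 5, 6, and 7 only")
    carbon = AFFECTED_CARBON[module]
    if NATIVE_AT[module] == "malonyl":
        # Malonyl -> methylmalonyl adds a methyl group at this carbon.
        return f"{carbon}-methyl epothilone"
    # Methylmalonyl -> malonyl removes the methyl group at this carbon.
    return f"{carbon}-desmethyl epothilone"


for m in sorted(NATIVE_AT):
    print(m, at_swap_product(m))

# Observation only: the positions named in the text fit 20 - 2 * module.
pattern_holds = all(AFFECTED_CARBON[m] == 20 - 2 * m for m in AFFECTED_CARBON)
print(pattern_holds)
```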
The eighth module of the epothilone PKS also comprises a methylation or methyltransferase (MT) domain with an activity that methylates the epothilone precursor. This function can be deleted to produce a recombinant epoD gene derivative of the invention, which can be expressed with the other PKS genes or derivatives thereof to make an epothilone derivative lacking one or both methyl groups (depending on whether the AT domain of the eighth module has also been changed to a malonyl-specific AT domain) at the corresponding C-4 position of the epothilone molecule. In another important embodiment, the present invention provides recombinant DNA compounds encoding a polypeptide with this methylation domain and activity, and a variety of recombinant PKS coding sequences that encode recombinant PKS enzymes incorporating this polypeptide. The availability of this MT domain and of the coding sequences for it provides a significant number of new polyketides that differ from known polyketides by the presence of at least one additional methyl group. The MT domain of the invention can in fact be added to any PKS module to direct methylation at the corresponding location in the polyketide produced by the PKS. As an illustrative example, the present invention provides the recombinant nucleic acid compounds that result from inserting the coding sequence for this MT activity into a coding sequence for any one or more of the six modules of the DEBS enzyme to produce a recombinant DEBS that synthesizes a 6-deoxyerythronolide B derivative comprising one or more additional methyl groups at the C-2, C-4, C-6, C-8, C-10, and/or C-12 positions. In such constructs, the MT domain can be inserted adjacent to the AT or the ACP. The ninth module of the epothilone PKS includes a KS, an AT specific for malonyl CoA, a KR, an inactive DH, and an ACP. This module is encoded by a sequence within a ~14.7 kb HindIII-BglII restriction fragment of cosmid pKOS35-79.85.
The recombinant DNA compounds of the invention encoding the ninth module of the epothilone PKS, and the corresponding polypeptides encoded by them, are useful for a variety of applications. The ninth module of the epothilone PKS is expressed as a protein, the product of the epoF gene, which also contains the TE domain of the epothilone PKS. The present invention provides the epoF gene in recombinant form, as well as DNA compounds encoding the ninth module without the coding sequences of the TE domain and DNA compounds encoding the TE domain without the coding sequences of the ninth module. In one embodiment, a DNA compound comprising a sequence encoding the ninth epothilone module is inserted into a DNA compound comprising the coding sequence for one or more modules of a heterologous PKS. The resulting construct, in which the coding sequence for a module of the heterologous PKS is either replaced by that of the ninth module of the epothilone PKS or the latter is merely added to the coding sequences for the modules of the heterologous PKS, provides a novel PKS coding sequence that is expressed with the other proteins constituting a PKS to provide a novel PKS. The coding sequence of the ninth module can also be expressed as a discrete protein with or without an attached TE domain. In another embodiment, a DNA compound comprising a sequence encoding the ninth module of the epothilone PKS is expressed as a protein together with other proteins to constitute an epothilone PKS or a PKS that produces an epothilone derivative. In these embodiments, the ninth module is typically expressed as a protein that also contains the TE domain of either the epothilone PKS or a heterologous PKS. In another embodiment, a portion or all of the coding sequence of the ninth module is used in conjunction with other PKS coding sequences to create a hybrid module.
In this embodiment, the invention provides, for example, for replacing the AT specific for malonyl CoA with an AT specific for methylmalonyl CoA, ethylmalonyl CoA, or 2-hydroxymalonyl CoA; deleting the KR; replacing the KR with a KR that specifies a different stereochemistry; and/or inserting a DH or a DH and an ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence for another epothilone PKS module, from a coding sequence for a PKS that produces a polyketide other than epothilone, or from chemical synthesis. The resulting heterologous ninth module coding sequence is coexpressed with the other proteins that constitute a PKS that synthesizes epothilone, an epothilone derivative, or another polyketide. Alternatively, the present invention provides a PKS for an epothilone or epothilone derivative in which the ninth module has been replaced by a module from a heterologous PKS or has been deleted in its entirety. In the latter embodiment, the TE domain is expressed as a discrete protein or fused to the eighth module. The ninth module of the epothilone PKS is followed by a thioesterase domain. This domain is encoded in the ~14.7 kb HindIII-BglII restriction fragment comprising the coding sequence of the ninth module. The present invention provides recombinant DNA compounds encoding hybrid PKS enzymes in which the ninth module of the epothilone PKS is fused to a heterologous thioesterase, or in which one or more modules of a heterologous PKS are fused to the thioesterase of the epothilone PKS. Thus, for example, a sequence encoding a thioesterase domain from another PKS can be inserted at the end of the ACP coding sequence of the ninth module in the recombinant DNA compounds of the invention.
The recombinant DNA compounds encoding this thioesterase domain are therefore useful for constructing DNA compounds that encode an epothilone PKS protein, a PKS that produces an epothilone derivative, and a PKS that produces a polyketide other than epothilone or an epothilone derivative.
In an important embodiment, the present invention thus provides a hybrid PKS and the corresponding recombinant DNA compounds that encode the proteins constituting such hybrid PKS enzymes. For purposes of the present invention, a hybrid PKS is a recombinant PKS comprising all or part of one or more modules, loading domain, and thioesterase/cyclase domain of a first PKS and all or part of one or more modules, loading domain, and thioesterase/cyclase domain of a second PKS. In a preferred embodiment, the first PKS is most but not all of the epothilone PKS, and the second PKS is only a portion or all of a non-epothilone PKS. An illustrative example of such a hybrid PKS is an epothilone PKS in which the natural loading domain has been replaced with the loading domain of another PKS. Another example of such a hybrid PKS is an epothilone PKS in which the AT domain of module four is replaced with an AT domain from a heterologous PKS that binds only methylmalonyl CoA. In another preferred embodiment, the first PKS is most but not all of a non-epothilone PKS, and the second PKS is only a portion or all of the epothilone PKS. An illustrative example of such a hybrid PKS is an erythromycin PKS in which an AT specific for methylmalonyl CoA is replaced with an AT from the epothilone PKS specific for malonyl CoA. Another example is an erythromycin PKS that includes the MT domain of the epothilone PKS. Those skilled in the art will recognize that all or a portion of either the first or second PKS in a hybrid PKS of the invention need not be isolated from a naturally occurring source. For example, only a small portion of an AT domain determines its specificity. See U.S. patent application Serial No. 09/346,860 and PCT patent application No. WO 99/15047, each of which is incorporated herein by reference.
The state of the art in DNA synthesis allows the artisan to construct de novo DNA compounds of sufficient size to construct a useful portion of a PKS module or domain. For purposes of the present invention, such synthetic DNA compounds are deemed to be a portion of a PKS. The following table lists references describing illustrative PKS genes and corresponding enzymes that can be used in the construction of the recombinant PKSs and the corresponding DNA compounds of the invention that encode them. Also presented are several references describing polyketide tailoring and modification enzymes and corresponding genes that can be used to make the recombinant DNA compounds of the present invention.

Avermectin
U.S. Patent No. 5,252,474 to Merck.
MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied Molecular Genetics, Baltz, Hegeman, & Skatrud, eds. (ASM), pp. 245-256, A Comparison of the Genes Encoding the Polyketide Synthases for Avermectin, Erythromycin, and Nemadectin.
MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the Streptomyces avermitilis genes encoding the avermectin polyketide synthase.
Ikeda and Omura, 1997, Chem. Rev. 97: 2599-2609, Avermectin biosynthesis.

Candicidin (FR008)
Hu et al., 1994, Mol. Microbiol. 14: 163-172.

Erythromycin
PCT Pub. No. 93/13663 to Abbott.
U.S. Patent No. 5,824,513 to Abbott.
Donadio et al., 1991, Science 252: 675-9.
Cortes et al., Nov. 8, 1990, Nature 348: 176-8, An unusually large multifunctional polypeptide in the erythromycin producing polyketide synthase of Saccharopolyspora erythraea.

Glycosylation enzymes
PCT Pat. App. Pub. No. 97/23630 to Abbott.

FK-506
Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone ring of the immunosuppressant FK-506, Eur. J. Biochem. 256: 528-534.
Motamedi et al., 1997, Structural organization of a multifunctional polyketide synthase involved in the biosynthesis of the macrolide immunosuppressant FK-506, Eur. J. Biochem. 244: 74-80.

Methyltransferase
U.S. Patent No. 5,264,355, filed on November 23, 1993, methylation enzymes from Streptomyces MA6858. 31-O-desmethyl-FK-506 methyltransferase.
Motamedi et al., 1996, Characterization of methyltransferase and hydroxylase genes involved in the biosynthesis of the immunosuppressants FK-506 and FK-520, J. Bacteriol. 178: 5243-5248.

FK-520
U.S. patent application Serial No. 09/154,083, filed September 16, 1998.
U.S. patent application Serial No. 09/410,551, filed October 1, 1999.
Nielsen et al., 1991, Biochem. 30: 5789-96.

Lovastatin
U.S. Patent No. 5,744,350 to Merck.

Narbomycin
U.S. patent application Serial No. 60/107,093, filed November 5, 1998.

Nemadectin
MacNeil et al., 1993, supra.

Niddamycin
Kakavas et al., 1997, Identification and characterization of the niddamycin polyketide synthase genes from Streptomyces caelestis, J. Bacteriol. 179: 7515-7522.

Oleandomycin
Swan et al., 1994, Characterization of a Streptomyces antibioticus gene encoding a type I polyketide synthase which has an unusual coding sequence, Mol. Gen. Genet. 242: 358-362.
U.S. patent application Serial No. 60/120,254, filed February 16, 1999.
U.S. patent application Serial No. 09/, filed October 28, 1999, claiming priority thereto, by inventors S. Shah, M. Betlach, R. McDaniel, and L. Tang, attorney docket No. 30063-20029.00.
Olano et al., 1998, Analysis of a Streptomyces antibioticus chromosomal region involved in oleandomycin biosynthesis, which encodes two glycosyltransferases responsible for glycosylation of the macrolactone ring, Mol. Gen. Genet. 259 (3): 299-308.

Picromycin
PCT Patent Application No. WO9999/11814, filed May 28, 1999.
U.S. patent application Serial No. 09/320,878, filed May 27, 1999.
U.S. patent application Serial No. 09/141,908, filed August 28, 1998.
Xue et al., 1998, Hydroxylation of macrolactones YC-17 and narbomycin is mediated by the pikC-encoded cytochrome P450 in Streptomyces venezuelae, Chemistry & Biology 5 (11): 661-667.
Xue et al., Oct. 1998, A gene cluster for macrolide antibiotic biosynthesis in Streptomyces venezuelae: Architecture of metabolic diversity, Proc. Natl. Acad. Sci. USA 95: 12111-12116.

Platenolide
EP Pat. App. No. 791,656 to Lilly.

Pradimicin
PCT Pat. Pub. No. WO 98/11230 to Bristol-Myers Squibb.

Rapamycin
Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the polyketide rapamycin, Proc. Natl. Acad. Sci. USA 92: 7839-7843.
Aparicio et al., 1996, Organization of the biosynthetic gene cluster for rapamycin in Streptomyces hygroscopicus: analysis of the enzymatic domains in the modular polyketide synthase, Gene 169: 9-16.

Rifamycin
PCT Pat. Pub. No. WO 98/07868 to Novartis.
August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic rifamycin: deductions from the molecular analysis of the rif biosynthetic gene cluster of Amycolatopsis mediterranei S699, Chemistry & Biology 5 (2): 69-79.

Sorangium PKS
U.S. patent application Serial No. 09/144,085, filed August 31, 1998.

Soraphen
U.S. Patent No. 5,716,849 to Novartis.
Schupp et al., 1995, J. Bacteriology 177: 3673-3679, A Sorangium cellulosum (Myxobacterium) Gene Cluster for the Biosynthesis of the Macrolide Antibiotic Soraphen A: Cloning, Characterization, and Homology to Polyketide Synthase Genes from Actinomycetes.

Spiramycin
U.S. Patent No. 5,098,837 to Lilly.

Activator gene
U.S. Patent No. 5,514,544 to Lilly.

Tylosin
U.S. Patent No. 5,876,991 to Lilly.
EP Pub. No. 791,655 to Lilly.
Kuhstoss et al., 1996, Gene 183: 231-6, Production of a novel polyketide through the construction of a hybrid polyketide synthase.

Tailoring enzymes
Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355, Analysis of five tylosin biosynthetic genes from the tylBA region of the Streptomyces fradiae genome.
As the above table illustrates, there is a wide variety of PKS genes that serve as readily available sources of DNA and sequence information for use in constructing the hybrid PKS-encoding DNA compounds of the invention. Methods for constructing hybrid PKS-encoding DNA compounds are described without reference to the epothilone PKS in U.S. Patent Nos. 5,672,491 and 5,712,146 and in U.S. patent application Serial Nos. 09/073,538, filed May 6, 1998, and 09/141,908, filed August 28, 1998, each of which is incorporated herein by reference. Preferred PKS enzymes, and coding sequences for the proteins that constitute them, for purposes of isolating heterologous PKS domain coding sequences to construct hybrid PKS enzymes of the invention are the soraphen PKS and the PKS described as the Sorangium PKS in the above table.

To summarize the functions of the genes cloned and sequenced in Example 1:

Gene   Protein   Module    Domains present
epoA   EpoA      Loading   KSY mAT ER ACP
epoB   EpoB      1         NRPS: condensation, heterocyclization, adenylation, thiolation, PCP
epoC   EpoC      2         KS mmAT DH KR ACP
epoD   EpoD      3         KS mAT KR ACP
                 4         KS mAT KR ACP
                 5         KS mAT DH ER KR ACP
                 6         KS mmAT DH ER KR ACP
epoE   EpoE      7         KS mmAT KR ACP
                 8         KS mmAT MT DH* KR* ACP
epoF   EpoF      9         KS mAT KR DH* ACP TE

NRPS, non-ribosomal peptide synthetase; KS, ketosynthase; mAT, acyltransferase specific for malonyl CoA; mmAT, acyltransferase specific for methylmalonyl CoA; DH, dehydratase; ER, enoylreductase; KR, ketoreductase; MT, methyltransferase; TE, thioesterase; *, inactive domain.

The hybrid PKS-encoding DNA compounds of the invention can be and often are hybrids of more than two PKS genes. Even when only two genes are used, there are often two or more modules in the hybrid gene in which all or part of the module is derived from a second (or third) PKS gene.
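The summary table above lends itself to a simple lookup structure. The following is a minimal sketch, not part of the original disclosure, that transcribes the table into a Python dictionary. The names `EPOTHILONE_PKS`, `gene_for_module`, and `inactive_domains` are hypothetical and chosen only for illustration; a trailing `*` marks a domain the table lists as inactive.

```python
# Hypothetical transcription of the gene/module/domain summary table.
# A trailing "*" marks a domain listed as inactive in the table.
EPOTHILONE_PKS = {
    "epoA": {"loading": ["KSY", "mAT", "ER", "ACP"]},
    "epoB": {1: ["NRPS"]},  # condensation, heterocyclization, adenylation, thiolation, PCP
    "epoC": {2: ["KS", "mmAT", "DH", "KR", "ACP"]},
    "epoD": {
        3: ["KS", "mAT", "KR", "ACP"],
        4: ["KS", "mAT", "KR", "ACP"],
        5: ["KS", "mAT", "DH", "ER", "KR", "ACP"],
        6: ["KS", "mmAT", "DH", "ER", "KR", "ACP"],
    },
    "epoE": {
        7: ["KS", "mmAT", "KR", "ACP"],
        8: ["KS", "mmAT", "MT", "DH*", "KR*", "ACP"],
    },
    "epoF": {9: ["KS", "mAT", "KR", "DH*", "ACP", "TE"]},
}


def gene_for_module(module):
    """Return the name of the gene that encodes the given module."""
    for gene, modules in EPOTHILONE_PKS.items():
        if module in modules:
            return gene
    raise KeyError(f"unknown module: {module!r}")


def inactive_domains():
    """Map each module to the domains the table marks as inactive."""
    result = {}
    for modules in EPOTHILONE_PKS.values():
        for module, domains in modules.items():
            inactive = [d[:-1] for d in domains if d.endswith("*")]
            if inactive:
                result[module] = inactive
    return result
```

With this layout, `gene_for_module(8)` identifies epoE as the gene to alter when the eighth module is the target, and `inactive_domains()` collects the starred entries of the table.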
Illustrative examples of recombinant epothilone derivative PKS genes of the invention, which are identified by listing the specificities of the hybrid modules (the other modules having the same specificity as in the epothilone PKS), include:
(a) module 4 with a methylmalonyl-specific AT (mmAT) and a KR, and module 2 with a malonyl-specific AT (mAT) and a KR;
(b) module 4 with mmAT and a KR, and module 3 with mmAT and a KR;
(c) module 4 with mmAT and a KR, and module 5 with mmAT and an ER, DH, and KR;
(d) module 4 with mmAT and a KR, and module 5 with mmAT and a DH and KR;
(e) module 4 with mmAT and a KR, and module 5 with mmAT and a KR;
(f) module 4 with mmAT and a KR, and module 5 with mmAT and an inactive KR;
(g) module 4 with mmAT and a KR, and module 6 with mAT and an ER, DH, and KR;
(h) module 4 with mmAT and a KR, and module 6 with mAT and a DH and KR;
(i) module 4 with mmAT and a KR, and module 6 with mAT and a KR;
(j) module 4 with mmAT and a KR, and module 6 with mAT and an inactive KR;
(k) module 4 with mmAT and a KR, and module 7 with mAT;
(l) hybrids (c) through (f), except that module 5 has an mAT;
(m) hybrids (g) through (j), except that module 6 has an mmAT; and
(n) hybrids (a) through (m), except that module 4 has an mAT.
The above list is illustrative only and should not be construed as limiting the invention, which includes other recombinant epothilone PKS genes and enzymes with not only two hybrid modules other than those shown but also with three or more hybrid modules.
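The hybrids listed above are each defined by the AT specificity assigned to particular modules. As a rough sketch only, the AT swaps of hybrids (a) and (k) can be expressed as overrides applied to the native AT assignments given in the summary table of gene functions. The names `NATIVE_AT`, `make_hybrid`, and `changed_modules` are hypothetical, and the sketch tracks AT specificity alone, ignoring the KR, DH, and ER changes that also distinguish the listed hybrids.

```python
# Native AT specificity per extender module, as given in the summary table
# (module 1 is the NRPS module and has no AT, so it is omitted).
NATIVE_AT = {2: "mmAT", 3: "mAT", 4: "mAT", 5: "mAT",
             6: "mmAT", 7: "mmAT", 8: "mmAT", 9: "mAT"}


def make_hybrid(at_swaps):
    """Return a copy of the native AT map with the given modules overridden."""
    unknown = set(at_swaps) - set(NATIVE_AT)
    if unknown:
        raise ValueError(f"no such extender module(s): {sorted(unknown)}")
    hybrid = dict(NATIVE_AT)
    hybrid.update(at_swaps)
    return hybrid


def changed_modules(hybrid):
    """List the modules whose AT specificity differs from the native PKS."""
    return sorted(m for m in NATIVE_AT if hybrid[m] != NATIVE_AT[m])


# Hybrid (a): module 4 with mmAT, module 2 with mAT.
hybrid_a = make_hybrid({4: "mmAT", 2: "mAT"})
# Hybrid (k): module 4 with mmAT, module 7 with mAT.
hybrid_k = make_hybrid({4: "mmAT", 7: "mAT"})
```

Modeling a hybrid as a set of overrides keeps the unmodified modules implicit, which mirrors how the list itself names only the hybrid modules.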
Those skilled in the art will appreciate that a hybrid PKS of the invention includes but is not limited to a PKS of any of the following types: (i) an epothilone or epothilone derivative PKS that contains a module in which at least one of the domains is from a heterologous module; (ii) an epothilone or epothilone derivative PKS that contains a module from a heterologous PKS; (iii) an epothilone or epothilone derivative PKS that contains a protein from a heterologous PKS; and (iv) combinations of the foregoing. While an important embodiment of the present invention relates to hybrid PKS genes, the present invention also provides recombinant epothilone PKS genes in which there is no second PKS gene sequence present but which differ from the epothilone PKS gene by one or more deletions. The deletions can encompass one or more modules and/or can be limited to a partial deletion within one or more modules. When a deletion encompasses an entire module other than the NRPS module, the resulting epothilone derivative is at least two carbons shorter than the compound produced by the PKS from which the deleted version was derived. The deletion can also encompass the NRPS module and/or the loading domain, as noted above. When a deletion is within a module, the deletion typically encompasses a KR, DH, or ER domain, or both DH and ER domains, or both KR and DH domains, or all three KR, DH, and ER domains.
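The "at least two carbons shorter" statement can be illustrated with simple bookkeeping. The sketch below is an assumption-laden illustration rather than a method from the disclosure: it assumes each extender module adds a two-carbon backbone unit, that a methylmalonyl-specific AT (mmAT) adds one further methyl carbon, and that an MT domain adds one more. It also assumes the truncated PKS remains functional, which the sketch does not check. The names `MODULES`, `carbons_contributed`, and `carbons_lost` are hypothetical.

```python
# Domain composition of the PKS extender modules, taken from the summary
# table of gene functions (module 1, the NRPS module, is excluded).
MODULES = {
    2: ["KS", "mmAT", "DH", "KR", "ACP"],
    3: ["KS", "mAT", "KR", "ACP"],
    4: ["KS", "mAT", "KR", "ACP"],
    5: ["KS", "mAT", "DH", "ER", "KR", "ACP"],
    6: ["KS", "mmAT", "DH", "ER", "KR", "ACP"],
    7: ["KS", "mmAT", "KR", "ACP"],
    8: ["KS", "mmAT", "MT", "DH*", "KR*", "ACP"],
    9: ["KS", "mAT", "KR", "DH*", "ACP", "TE"],
}


def carbons_contributed(domains):
    """Carbons one extender module adds under the stated assumptions."""
    carbons = 2                      # two-carbon backbone extension
    if "mmAT" in domains:
        carbons += 1                 # methyl branch from methylmalonyl CoA
    if "MT" in domains:
        carbons += 1                 # methyl added by the methyltransferase
    return carbons


def carbons_lost(deleted_modules):
    """Total carbons removed when the listed whole modules are deleted."""
    return sum(carbons_contributed(MODULES[m]) for m in deleted_modules)
```

Under these assumptions, deleting a malonyl-specific module such as module 3 removes exactly two carbons, while deleting a methylmalonyl-specific module removes more, which is consistent with the "at least" wording.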
The catalytic properties of epothilone PKS domains and modules and of epothilone modification enzymes can also be altered by random or site-specific mutagenesis of the corresponding genes. A wide variety of mutagenizing agents and methods are known in the art and are suitable for this purpose. The technique known as DNA shuffling can also be employed. See, for example, U.S. Patent Nos. 5,830,721; 5,811,238; and 5,605,793; and references cited therein, each of which is incorporated herein by reference.

Recombinant manipulations

To construct a hybrid PKS gene or a PKS gene for an epothilone derivative, or simply to express unmodified epothilone biosynthetic genes, one can employ a technique, described in PCT Pub. No. 98/27203 and U.S. patent application Serial Nos. 08/989,332, filed December 11, 1997, and 60/129,731, filed April 16, 1999, each of which is incorporated herein by reference, in which the various genes of the PKS are divided into two or more, often three, segments, and each segment is placed on a separate expression vector. In this manner, the full complement of genes can be assembled and manipulated more readily for heterologous expression; each of the gene segments can be altered, and several altered segments can be combined in a single host cell to provide a recombinant PKS of the invention. This technique makes more efficient the construction of large libraries of recombinant PKS genes, vectors for expressing those genes, and host cells comprising those vectors. In this and other contexts, the genes encoding the desired PKS are not only present on two or more vectors but can also be ordered or arranged differently from the native producer organism from which the genes were derived. Several examples of this technique as applied to the epothilone PKS are described in the examples below.
In one embodiment, the epoA, epoB, epoC, and epoD genes are present on a first plasmid, and the epoE and epoF genes and optionally either the epoK gene or the epoK and epoL genes are present on a second (or third) plasmid. Thus, in an important embodiment, the recombinant nucleic acid compounds of the invention are expression vectors. As used herein, the term "expression vector" refers to any nucleic acid that can be introduced into a host cell or a cell-free transcription and translation medium. An expression vector can be maintained stably or transiently in a cell, either as part of the chromosomal or other DNA in the cell or in any cellular compartment, such as a replicating vector in the cytoplasm. An expression vector also comprises a gene that serves to produce RNA that is translated into a polypeptide in the cell or cell extract. Thus, the vector typically includes a promoter to enhance gene expression but alternatively may serve to incorporate the relevant coding sequence under the control of an endogenous promoter. In addition, expression vectors typically contain additional functional elements, such as resistance-conferring genes that act as selectable markers and regulatory genes that enhance promoter activity. The various components of an expression vector can vary widely, depending on the intended use of the vector. In particular, the components depend on the host cell(s) in which the vector will be used or is intended to function. Vector components for the expression and maintenance of vectors in E. coli are widely known and commercially available, as are vector components for other commonly used organisms, such as yeast cells and Streptomyces cells. In one embodiment, the vectors of the invention are used to transform Sorangium host cells to provide the recombinant Sorangium host cells of the invention. U.S. Patent No. 5,686,295, incorporated herein by reference, describes a method for transforming Sorangium host cells, although other methods may also be employed. Sorangium is a convenient host for expressing epothilone derivatives of the invention in which the recombinant PKS that produces such derivatives is expressed from a recombinant vector in which the epothilone PKS gene promoter is positioned to drive expression of the recombinant coding sequence. The epothilone PKS gene promoter is provided in recombinant form by the present invention and is an important embodiment thereof. The promoter is contained within an ~500 nucleotide sequence between the end of the transposon sequence and the start site of the open reading frame of the epoA gene. Optionally, one can include sequences from further upstream of this 500 bp region in the promoter. Those skilled in the art will recognize that, if a Sorangium host that produces epothilone is used as the host cell, the recombinant vector need drive expression of only a portion of the PKS containing the altered sequences. Thus, such a vector may comprise only a single altered epothilone PKS gene, with the remainder of the epothilone PKS polypeptides provided by the genes in the host cell chromosomal DNA. If the host cell naturally produces an epothilone, the epothilone derivative will thus be produced in a mixture containing the naturally occurring epothilone(s). Those skilled in the art will also recognize that the recombinant DNA compounds of the invention can be used to construct Sorangium host cells in which one or more genes involved in epothilone biosynthesis have been rendered inactive. Thus, the invention provides such Sorangium host cells, which may be preferred host cells for expressing epothilone derivatives of the invention so that complex mixtures of epothilones are avoided.
Particularly preferred host cells of this type include those in which one or more of any of the epothilone PKS gene ORFs has been disrupted, and/or those in which any one or more of the epothilone modification enzyme genes have been disrupted. Such host cells are typically constructed by a process involving homologous recombination using a vector that contains DNA homologous to the regions flanking the gene segment to be altered and positioned so that the desired homologous double crossover recombination event will occur. Homologous recombination can thus be used to delete, disrupt, or alter a gene. In a preferred illustrative embodiment, the present invention provides a recombinant epothilone-producing Sorangium cellulosum host cell in which the epoK gene has been deleted or disrupted by homologous recombination using a recombinant DNA vector of the invention. This host cell, unable to make the epoxidase epoK gene product, is unable to make epothilones A and B and so is a preferred source of epothilones C and D. Homologous recombination can also be used to alter the specificity of a PKS module by replacing the coding sequences for the module, or for a domain of the module, with those specifying a module or domain of the desired specificity. In another preferred illustrative embodiment, the present invention provides a recombinant epothilone-producing Sorangium cellulosum host cell in which the coding sequence for the AT domain of module 4, encoded by the epoD gene, has been altered by homologous recombination using a recombinant DNA vector of the invention to encode an AT domain that binds only methylmalonyl CoA. This host cell, unable to make epothilones A, C, and E, is a preferred source of epothilones B, D, and F. The invention also provides recombinant Sorangium host cells in which both the alteration and the deletion have been made, producing a host cell that makes only epothilone D.
Similarly, those skilled in the art will appreciate that the present invention provides a wide variety of recombinant Sorangium cellulosum host cells that make less complex mixtures of epothilones than the wild-type producing cells, as well as those that make one or more epothilone derivatives. Such host cells include those that make only epothilones A, C, and E; those that make only epothilones B, D, and F; those that make only epothilone D; and those that make only epothilone C. In another preferred embodiment, the present invention provides expression vectors and recombinant Myxococcus, preferably M. xanthus, host cells containing those expression vectors that express a recombinant epothilone PKS or a PKS for an epothilone derivative. Presently, vectors that replicate extrachromosomally in M. xanthus are not known. There are, however, a number of phages known to integrate into M. xanthus chromosomal DNA, including Mx8, Mx9, Mx81, and Mx82. The integration and attachment functions of these phages can be placed on plasmids to create phage-based expression vectors that integrate into the M. xanthus chromosomal DNA. Of these, phages Mx9 and Mx8 are preferred for purposes of the present invention. Plasmid pPLH343, described in Salmi et al., Feb. 1998, Genetic determinants of immunity and integration of temperate Myxococcus xanthus phage Mx8, J. Bact. 180 (3): 614-621, is a plasmid that replicates in E. coli and comprises the phage Mx8 genes that encode the attachment and integration functions. The promoter of the epothilone PKS gene functions in Myxococcus xanthus host cells. Thus, in one embodiment, the present invention provides a recombinant promoter for use in recombinant host cells that is derived from the promoter of the Sorangium cellulosum epothilone PKS gene. The promoter can be used to drive expression of one or more epothilone PKS genes or of another useful gene product in recombinant host cells.
The invention also provides an epothilone PKS expression vector in which one or more of the epothilone PKS or epothilone modification enzyme genes are under the control of their own promoter. Another preferred promoter for use in Myxococcus xanthus host cells for purposes of expressing a recombinant PKS of the invention is the promoter of the M. xanthus pilA gene. This promoter, as well as two M. xanthus strains that express high levels of gene products from genes controlled by the pilA promoter, a pilA deletion strain and a pilS deletion strain, are described in Wu and Kaiser, Dec. 1997, Regulation of expression of the pilA gene in Myxococcus xanthus, J. Bact. 179 (24): 7748-7758, incorporated herein by reference. Optionally, the invention provides recombinant Myxococcus host cells comprising both the pilA and pilS deletions. Another preferred promoter is the starvation-dependent promoter of the sdcK gene. Selectable markers for use in Myxococcus xanthus include genes that confer resistance to kanamycin, tetracycline, chloramphenicol, zeocin, spectinomycin, and streptomycin. The recombinant DNA expression vectors of the invention for use in Myxococcus typically include such a selectable marker and may further comprise the promoter derived from an epothilone PKS or epothilone modification enzyme gene. The present invention provides preferred expression vectors for use in preparing the recombinant Myxococcus xanthus expression vectors and host cells of the invention. These vectors, designated plasmids pKOS35-82.1 and pKOS35-82.2 (figure 3), are able to replicate in E. coli host cells as well as integrate into the chromosomal DNA of M. xanthus. The vectors comprise the Mx8 attachment and integration genes as well as the pilA promoter, with restriction enzyme recognition sites conveniently placed downstream.
The two vectors differ from each other merely in the orientation of the pilA promoter in the vector and can be easily modified to include the genes of the epothilone PKS enzyme and epothilone modifying enzymes of the invention. The construction of the vectors is described in example 2.
Especially preferred Myxococcus host cells are those that produce an epothilone or epothilone derivative, or mixtures of epothilones or epothilone derivatives, at a level equal to or greater than 20 mg/L, more preferably equal to or greater than 200 mg/L, and most preferably equal to or greater than 1 g/L. Especially preferred are M. xanthus host cells that produce at these levels. M. xanthus host cells that can be used for purposes of the invention include the DZ1 cell line (Campos et al., 1978, J. Mol. Biol. 119: 167-178, incorporated herein by reference), the TA-producing cell line ATCC 31046, DK1219 (Hodgkin and Kaiser, 1979, Mol. Gen. Genet. 171: 177-191, incorporated herein by reference), and the DK1622 cell line (Kaiser, 1979, Proc. Natl. Acad. Sci. USA 76: 5952-5956, incorporated herein by reference). In another preferred embodiment, the present invention provides expression vectors and recombinant Pseudomonas fluorescens host cells that contain those expression vectors and express a recombinant PKS of the invention. One plasmid for use in constructing the P. fluorescens expression vectors and host cells of the invention is plasmid pRSF1010, which replicates in E. coli and P. fluorescens host cells (see Scholz et al., 1989, Gene 75: 271-8, incorporated herein by reference). Low copy number replicons and vectors can also be used. As noted above, the invention also provides the promoter of the Sorangium cellulosum epothilone PKS and epothilone modification enzyme genes in recombinant form. The promoter can be used to drive expression of an epothilone PKS gene or of another gene in P. fluorescens host cells. Also, the promoter of the soraphen PKS genes can be used in any host cell in which a Sorangium promoter functions. Thus, in one embodiment, the present invention provides an epothilone PKS expression vector for use in P. fluorescens host cells.
In another preferred embodiment, the expression vectors of the invention are used to construct recombinant Streptomyces host cells that express a recombinant PKS of the invention. Streptomyces host cells useful according to the invention include S. coelicolor, S. lividans, S. venezuelae, S. ambofaciens, S. fradiae, and the like. Preferred Streptomyces host cell/vector combinations of the invention include the host cells S. coelicolor CH999 and S. lividans K4-114 and K4-155, which do not produce actinorhodin, and expression vectors derived from the pRM1 and pRM5 vectors, as described in U.S. Patent No. 5,830,750 and in U.S. patent application Serial Nos. 08/828,898, filed March 31, 1997, and 09/181,833, filed October 28, 1998. Especially preferred Streptomyces host cells of the invention are those that produce an epothilone or epothilone derivative, or mixtures of epothilones or epothilone derivatives, at equal to or greater than 20 mg/L, more preferably at equal to or greater than 200 mg/L, and most preferably at equal to or greater than 1 g/L. Especially preferred are S. coelicolor host cells that produce at these levels. Also, species of the closely related genus Saccharopolyspora can be used to produce epothilones, including but not limited to S. erythraea. The present invention provides a wide variety of expression vectors for use in Streptomyces. For replicating vectors, the origin of replication can be, for example and without limitation, a low copy number replicon and vectors comprising the same, such as SCP2* (see Hopwood et al., Genetic Manipulation of Streptomyces: A Laboratory Manual (The John Innes Foundation, Norwich, U.K., 1985); Lydiate et al., 1985, Gene 35: 223-235; and Kieser and Melton, 1988, Gene 65: 83-91, each of which is incorporated herein by reference), SLP1.2 (Thompson et al., 1982, Gene 20: 51-62, incorporated herein by reference), and pSG5(ts) (Muth et al., 1989, Mol. Gen. Genet. 
: 341-348, and Bierman et al., 1992, Gene 116: 43-49, each of which is incorporated herein by reference), or a high copy number replicon and vectors comprising the same, such as pIJ101 and pJV1 (see Katz et al., 1983, J. Gen. Microbiol. 129: 2703-2714; Vara et al., 1989, J. Bacteriol. 171: 5772-5781; and Servin-Gonzalez, 1993, Plasmid 30: 131-140, each of which is incorporated herein by reference). High copy number vectors are, however, generally not preferred for the expression of long genes or multiple genes. For non-replicating and integrating vectors, and generally for any vector, it is useful to include at least one E. coli origin of replication, such as from pUC, p1P, p1I, and pBR. For phage-based vectors, phage phiC31 and its derivative KC515 can be used (see Hopwood et al., supra). Also, plasmid pSET152, plasmid pSAM, and plasmids pSE101 and pSE211, all of which integrate site-specifically into the chromosomal DNA of S. lividans, can be used. Typically, the expression vector will comprise one or more marker genes by which host cells containing the vector can be identified and/or selected. Useful antibiotic resistance genes for use in Streptomyces host cells include the ermE (confers resistance to erythromycin and lincomycin), tsr (confers resistance to thiostrepton), aadA (confers resistance to spectinomycin and streptomycin), aacC4 (confers resistance to apramycin, kanamycin, gentamicin, geneticin (G418), and neomycin), hyg (confers resistance to hygromycin), and vph (confers resistance to viomycin) resistance-conferring genes. The recombinant PKS gene in the vector will be under the control of a promoter, typically with an accompanying ribosome binding site sequence. A preferred promoter is the actI promoter and its accompanying activator gene actII-ORF4, which is provided in the aforementioned expression vectors pRM1 and pRM5. 
This promoter is activated in the stationary phase of growth, when secondary metabolites are synthesized. Other useful Streptomyces promoters include without limitation those from the ermE gene and the melC1 gene, which act constitutively, and the tipA gene and the merA gene, which can be induced at any growth stage. In addition, the T7 RNA polymerase system has been transferred to Streptomyces and can be used in the vectors and host cells of the invention. In this system, the coding sequence for T7 RNA polymerase is inserted into a neutral site of the chromosome or into a vector under the control of the inducible merA promoter, and the gene of interest is placed under the control of the T7 promoter. As noted above, one or more activator genes can also be used to enhance the activity of a promoter. Activator genes in addition to the actII-ORF4 gene discussed above include the dnrI, redD, and ptpA genes (see U.S. patent application Serial No. 09/181,833, supra), which can be used with their cognate promoters to drive expression of a recombinant gene of the invention. The present invention also provides recombinant expression vectors that drive expression of the epothilone PKS and of PKS enzymes that produce epothilone or epothilone derivatives in plant cells. Such vectors are constructed in accordance with the teachings in U.S. patent application Serial No. 09/114,083, filed July 10, 1998, and PCT Patent Publication No. 99/02669, each of which is incorporated herein by reference. Plants and plant cells that express epothilone are disease resistant and able to resist fungal infection. For improved production of an epothilone or epothilone derivative in any heterologous host cell, including plant, Myxococcus, Pseudomonas, and Streptomyces host cells, one can also transform the cell to express a heterologous phosphopantetheinyl transferase. See U.S. patent application Serial No. 
08/728,742, filed October 11, 1996, and PCT Patent Publication No. 97/13845, both incorporated herein by reference. In addition to providing recombinant expression vectors that encode the epothilone PKS or a PKS for an epothilone derivative, the present invention also provides, as discussed above, DNA compounds that encode epothilone modification enzyme genes. As discussed above, these gene products convert epothilones C and D to epothilones A and B, and convert epothilones A and B to epothilones E and F. The present invention also provides recombinant expression vectors, and host cells transformed with those vectors, that express any one or more of those genes and so produce the corresponding epothilone or epothilone derivative. In one aspect, the present invention provides the epoK gene in recombinant form and host cells that express the gene product thereof, which converts epothilones C and D to epothilones A and B, respectively. In another important embodiment, as noted above, the present invention provides vectors for disrupting the function of any one or more of epoL, epoK, and any of the ORFs associated with the epothilone PKS gene cluster in Sorangium cells. The invention also provides recombinant Sorangium host cells that lack (or contain inactivated forms of) any one or more of these genes. These cells can be used to produce the corresponding epothilones and epothilone derivatives that result from the absence of any one or more of these genes. The invention also provides non-Sorangium host cells that contain an epothilone PKS or a PKS for an epothilone derivative but do not contain (or contain non-functional forms of) any modification enzyme gene. These host cells of the invention are expected to produce epothilones G and H in the absence of a dehydratase activity capable of forming the C-12-C-13 alkene of epothilones C and D. 
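The relationship between the modification activities and the expected products stated above can be summarized in a short sketch. This is only an illustration of the conversions named in the text (G and H to C and D by a dehydratase activity, C and D to A and B by the epoK epoxidase, A and B to E and F by a hydroxylase). It assumes a strictly linear pathway and complete conversion at each step, which are simplifications not asserted in the source; the function and variable names are hypothetical.

```python
# Ordered tailoring steps as named in the text. Only "epoK" is a gene name
# taken from the source; the other labels are descriptive placeholders.
TAILORING_STEPS = [
    ("dehydratase", ("C", "D")),   # G/H -> C/D (forms the C-12-C-13 alkene)
    ("epoxidase", ("A", "B")),     # C/D -> A/B (epoK gene product)
    ("hydroxylase", ("E", "F")),   # A/B -> E/F
]


def expected_epothilones(dehydratase: bool, epoxidase: bool, hydroxylase: bool):
    """Return the epothilone pair expected to accumulate.

    Assumption (not from the source): the pathway is linear, so a downstream
    activity has no substrate when an upstream activity is missing, and each
    step that is present runs to completion.
    """
    present = {
        "dehydratase": dehydratase,
        "epoxidase": epoxidase,
        "hydroxylase": hydroxylase,
    }
    products = ("G", "H")  # expected when no dehydratase activity is present
    for activity, converted in TAILORING_STEPS:
        if not present[activity]:
            break
        products = converted
    return products


if __name__ == "__main__":
    print(expected_epothilones(False, False, False))
    print(expected_epothilones(True, False, False))
    print(expected_epothilones(True, True, False))
    print(expected_epothilones(True, True, True))
```

In practice a host would be expected to give mixtures rather than a single pair, so the sketch is best read as a lookup of the dominant expected product class under the stated assumptions.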
This dehydration reaction is believed to take place in the absence of the epoL gene product in Streptomyces host cells. The host cells produce epothilones C and D (or the corresponding epothilone C and D derivatives) when the dehydratase activity is present and the P450 epoxidase and hydroxylase (which converts epothilones A and B to epothilones E and F, respectively) genes are absent. The host cells also produce epothilones A and B (or the corresponding epothilone A and B derivatives) when only the hydroxylase gene is absent. Preferred for expression in these host cells are the recombinant epothilone PKS enzymes of the invention that contain the hybrid module 4 with an AT specific for methylmalonyl CoA only, optionally in combination with one or more additional hybrid modules. Also preferred for expression in these host cells are the recombinant epothilone PKS enzymes of the invention that contain the hybrid module 4 with an AT specific for malonyl CoA only, optionally in combination with one or more additional hybrid modules. The recombinant host cells of the invention can also include other genes and corresponding gene products that enhance production of the desired epothilone or epothilone derivative. As a non-limiting example, the epothilone PKS proteins require phosphopantetheinylation of the ACP domains of the loading domain and modules 2 through 9, as well as of the PCP domain of the NRPS. Phosphopantetheinylation is mediated by enzymes called phosphopantetheinyl transferases (PPTases). To produce functional PKS enzyme in host cells that do not naturally express a PPTase able to act on the desired PKS enzyme, or to increase the amounts of functional PKS enzyme in host cells in which the PPTase is rate-limiting, one can introduce a heterologous PPTase, including but not limited to Sfp, as described in PCT Patent Publication Nos. 97/13845 and 98/27203, and in U.S. patent application Serial Nos. 
08/728,742, filed October 11, 1996, and 08/989,332, each of which is incorporated herein by reference. The host cells of the invention can be grown and fermented under conditions known in the art for other purposes to produce the compounds of the invention. The compounds of the invention can be isolated from the fermentation broths of these cultured cells and purified by standard procedures. Fermentation conditions for producing the compounds of the invention from Sorangium host cells can be based on the protocols described in PCT Patent Publication Nos. 93/10121, 97/19086, 98/22461, and 99/42602, each of which is incorporated herein by reference. The novel epothilone analogs of the present invention, as well as the epothilones produced by the host cells of the invention, can be derivatized and formulated as described in PCT Patent Publication Nos. 93/10121, 97/19086, 98/08846, 98/22461, 98/25929, 99/01124, 99/02514, 99/07682, 99/27890, 99/39694, 99/40047, 99/42602, 99/43653, 99/43320, 99/54319, and 99/54330, and in U.S. Patent No. 5,969,145, each of which is incorporated herein by reference.
Compounds of the invention Preferred compounds of the invention include the 14-methyl epothilone derivatives (made by using the hybrid module 3 of the invention having an AT that binds methylmalonyl CoA instead of malonyl CoA); the 8,9-dehydro epothilone derivatives (made by using the hybrid module 6 of the invention having a DH and KR instead of an ER, DH, and KR); the 10-methyl epothilone derivatives (made by using the hybrid module 5 of the invention having an AT that binds methylmalonyl CoA instead of malonyl CoA); the 9-hydroxy epothilone derivatives (made by using the hybrid module 6 of the invention having a KR instead of an ER, DH, and KR); the 8-demethyl-14-methyl epothilone derivatives (made by using the hybrid module 3 of the invention having an AT that binds methylmalonyl CoA instead of malonyl CoA and a hybrid module 6 that binds malonyl CoA instead of methylmalonyl CoA); and the 8-demethyl-8,9-dehydro epothilone derivatives (made by using the hybrid module 6 of the invention having a DH and KR instead of an ER, DH, and KR and an AT that specifies malonyl CoA instead of methylmalonyl CoA). More generally, preferred epothilone derivative compounds of the invention are those that can be produced by altering the epothilone PKS genes as described herein and optionally by the action of epothilone modification enzymes and/or by chemically modifying the resulting epothilones produced when those genes are expressed. Thus, the present invention provides compounds of the formula (1), including the glycosylated forms thereof and stereoisomeric forms where the stereochemistry is not shown, wherein A is a substituted or unsubstituted, straight or branched chain alkyl, alkenyl, or alkynyl residue optionally containing 1-3 heteroatoms selected from O, S, and N, or wherein A comprises a substituted or unsubstituted aromatic residue; R2 represents H,H, or H,lower alkyl, or lower alkyl,lower alkyl; X5 represents =O or a derivative thereof, or H,OH or H,NR2 wherein R is H, alkyl, or acyl, or H,OCOR or H,OCONR2 wherein R is H or alkyl, or is H,H; R6 represents H or lower alkyl, and the remaining substituent on the corresponding carbon is H; X7 represents OR or NR2, wherein R is H, alkyl, or acyl, or is OCOR or OCONR2 wherein R is H or alkyl, or X7 taken together with X9 forms a carbonate or carbamate cycle, and wherein the remaining substituent on the corresponding carbon is H; R8 represents H or lower alkyl, and the remaining substituent on the corresponding carbon is H; X9 represents =O or a derivative thereof, or is H,OR or H,NR2 wherein R is H, alkyl, or acyl, or is H,OCOR or H,OCONR2 wherein R is H or alkyl, or represents H,H, or wherein X9 together with X7 or with X11 can form a carbonate or carbamate cycle; R10 is H,H, or H,lower alkyl, or lower alkyl,lower alkyl; X11 is =O or a derivative thereof, or is H,OR or H,NR2 wherein R is H, alkyl, or acyl, or is H,OCOR or H,OCONR2 wherein R is H or alkyl, or is H,H, or wherein X11 in combination with X9 can form a carbonate or carbamate cycle; R12 is H,H, or H,lower alkyl, or lower alkyl,lower alkyl; X13 is =O or a derivative thereof, or H,OR or H,NR2 wherein R is H, alkyl, or acyl, or is H,OCOR or H,OCONR2 wherein R is H or alkyl; R14 is H,H, or H,lower alkyl, or lower alkyl,lower alkyl; R16 is H or lower alkyl; and wherein optionally H or another substituent can be removed from positions 12 and 13 and/or 8 and 9 to form a double bond, wherein said double bond can optionally be converted to an epoxide. Particularly preferred are the compounds of the formulas [structures not reproduced; label 1(c)], wherein the noted substituents are as defined above. Especially preferred are the compounds of the formulas [structures not reproduced], wherein both Z are O or one Z is N and the other Z is O, and the remaining substituents are as defined above. 
As used herein, a substituent which "comprises an aromatic portion" contains at least one aromatic ring, such as phenyl, pyridyl, pyrimidyl, thiophenyl, or thiazolyl. The substituent may also include fused aromatic residues such as naphthyl, indolyl, benzothiazolyl, and the like. The aromatic portion can also be fused to a non-aromatic ring and/or can be coupled to the remainder of the compound in which it is a substituent through a non-aromatic residue, for example, an alkylene residue. The aromatic portion can be substituted or unsubstituted, as can the remainder of the substituent. Preferred embodiments of A include the "R" groups shown in Figure 2. As used herein, the term "alkyl" refers to a saturated, straight or branched chain C1 to C8 hydrocarbon radical derived from a hydrocarbon portion by removal of a single hydrogen atom. Alkenyl and alkynyl refer to the corresponding unsaturated forms. Examples of alkyl include but are not limited to methyl, ethyl, propyl, isopropyl, n-butyl, tert-butyl, neopentyl, n-hexyl, n-heptyl, and n-octyl. Lower alkyl (or alkenyl or alkynyl) refers to a 1-4C radical. Methyl is preferred. Acyl refers to alkylCO, alkenylCO, or alkynylCO. The terms halo and halogen as used herein refer to an atom selected from fluorine, chlorine, bromine, and iodine. The term haloalkyl as used herein denotes an alkyl group to which one, two, or three halogen atoms are attached at any one carbon, and includes without limitation chloromethyl, bromoethyl, trifluoromethyl, and the like. 
The term "heteroaryl" as used herein refers to a cyclic aromatic radical having from five to ten ring atoms, of which one ring atom is selected from S, O, and N; zero, one, or two ring atoms are additional heteroatoms independently selected from S, O, and N; and the remaining ring atoms are carbon, the radical being joined to the rest of the molecule via any of the ring atoms, such as, for example, pyridyl, pyrazinyl, pyrimidinyl, pyrrolyl, pyrazolyl, imidazolyl, thiazolyl, oxazolyl, isoxazolyl, thiadiazolyl, oxadiazolyl, thiophenyl, furanyl, quinolinyl, isoquinolinyl, and the like. The term "heterocycle" includes but is not limited to pyrrolidinyl, pyrazolinyl, pyrazolidinyl, imidazolinyl, imidazolidinyl, piperidinyl, piperazinyl, oxazolidinyl, isoxazolidinyl, morpholinyl, thiazolidinyl, isothiazolidinyl, and tetrahydrofuryl. The term "substituted" as used herein refers to a group substituted by independent replacement of any of the hydrogen atoms thereon with, for example, Cl, Br, F, I, OH, CN, alkyl, alkoxy, alkoxy substituted with aryl, haloalkyl, alkylthio, amino, alkylamino, dialkylamino, mercapto, nitro, carboxaldehyde, carboxy, alkoxycarbonyl, or carboxamide. Any one substituent can be an aryl, heteroaryl, or heterocycloalkyl group.
It will be apparent that the nature of the substituents at positions 2, 4, 6, 8, 10, 12, 14, and 16 in formula (1) is determined, at least initially, by the specificity of the AT catalytic domain of modules 9, 8, 7, 6, 5, 4, 3, and 2, respectively. Because AT domains that accept malonyl CoA, methylmalonyl CoA, ethylmalonyl CoA (and, in general, lower alkyl malonyl CoA), as well as hydroxymalonyl CoA, are available, one of the substituents at these positions can be H, and the other can be H, lower alkyl, especially methyl and ethyl, or OH. Further reaction at these positions, for example, a methyl transferase reaction such as that catalyzed by module 8 of the epothilone PKS, can be used to replace H at these positions as well. In addition, an H,OH embodiment can be oxidized to =O or, with the adjacent ring C, be dehydrated to form a π-bond. Both OH and =O are readily derivatized as further described below. Thus, a wide variety of embodiments of R2, R6, R8, R10, R12, R14, and R16 is synthetically available. The restrictions set forth with respect to embodiments of these substituents in the definitions with respect to formula (1) above reflect the information described in the SAR description in example 8 below. Similarly, β-carbonyl modifications (or absence of modification) can readily be controlled by modifying the epothilone PKS gene cluster to include the appropriate sequences at the corresponding positions of the cluster, so that it will or will not contain active KR, DH, and/or ER domains. Thus, the synthetically available embodiments of X5, X7, X9, X11, and X13 are numerous, including the formation of π-bonds with the adjacent ring positions. Positions occupied by OH are readily converted to ethers or esters by means well known in the art; protection of OH at positions not to be derivatized may be required. In addition, a hydroxyl can be converted to a leaving group, such as a tosylate, and replaced by an amino or halo substituent. 
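The correspondence stated above between ring positions 2 through 16 and the AT domains of modules 9 through 2 can be written as a simple lookup. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the mapping from extender unit to substituent simply restates the text (malonyl CoA gives H; methylmalonyl, ethylmalonyl, and hydroxymalonyl CoA give methyl, ethyl, and OH). It does not account for later reactions such as the methyl transferase step attributed to module 8.

```python
# Ring position -> PKS module whose AT domain initially sets the substituent,
# as listed in the text (positions 2..16 pair with modules 9..2).
POSITIONS = [2, 4, 6, 8, 10, 12, 14, 16]
MODULES = [9, 8, 7, 6, 5, 4, 3, 2]
POSITION_TO_MODULE = dict(zip(POSITIONS, MODULES))

# Extender unit accepted by the AT domain -> non-hydrogen substituent introduced.
EXTENDER_TO_SUBSTITUENT = {
    "malonyl CoA": "H",
    "methylmalonyl CoA": "methyl",
    "ethylmalonyl CoA": "ethyl",
    "hydroxymalonyl CoA": "OH",
}


def substituents_at(position: int, extender: str):
    """Return (module, substituent pair) for a ring position and extender unit.

    One substituent is H and the other is set by the extender unit, following
    the text. Raises KeyError for positions or extenders not listed there.
    """
    module = POSITION_TO_MODULE[position]
    return module, ("H", EXTENDER_TO_SUBSTITUENT[extender])


if __name__ == "__main__":
    # A 14-methyl derivative corresponds to a methylmalonyl-specific AT
    # in module 3, consistent with the hybrid module 3 described earlier.
    print(substituents_at(14, "methylmalonyl CoA"))
```

The mapping agrees with the hybrid modules named for the preferred compounds: position 14 maps to module 3, position 10 to module 5, and position 8 to module 6.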
A wide variety of "hydroxyl derivatives" such as those discussed above is known in the art. Similarly, ring positions that contain oxo groups can be converted to "carbonyl derivatives" such as oximes, ketals, and the like. The initial products of reaction with the oxo portions can be further reacted to obtain more complex derivatives. As described in example 8, such derivatives can ultimately result in a cyclic substituent linking two ring positions. Enzymes useful in modification of the initially synthesized polyketide, such as transmethylases, dehydratases, oxidases, glycosylation enzymes, and the like, can be supplied endogenously by a host cell when the polyketide is synthesized intracellularly, by modifying a host cell to contain the recombinant materials for the production of these modification enzymes, or they can be supplied in a cell-free system, either in purified form or as relatively crude extracts. Thus, for example, epoxidation of the π-bond at position 12-13 can be effected using the protein product of the epoK gene directly in vitro. The nature of A is most conveniently controlled by employing an epothilone PKS comprising an inactivated module 1 NRPS (using a module 2 substrate) or a KS2 knock-out (using a module 3 substrate), as described in example 6 below. Limited variation can be obtained by altering the AT catalytic specificity of the loading module; further variation is achieved by replacing the NRPS of module 1 with an NRPS of different specificity or with a conventional PKS module. However, at present, variants are more readily prepared by feeding synthetic module 2 substrate precursors and module 3 substrate precursors to cells containing the appropriately altered epothilone PKS, as described in example 6.
Pharmaceutical compositions The compounds can be readily formulated to provide the pharmaceutical compositions of the invention. The pharmaceutical compositions of the invention can be used in the form of a pharmaceutical preparation, for example, in solid, semisolid, or liquid form. This preparation will contain one or more of the compounds of the invention as an active ingredient in admixture with an organic or inorganic carrier or excipient suitable for external, enteral, or parenteral application. The active ingredient can be compounded, for example, with the usual non-toxic, pharmaceutically acceptable carriers for tablets, pills, capsules, suppositories, vaginal suppositories, solutions, emulsions, suspensions, and any other form suitable for use. The carriers that can be used include water, glucose, lactose, acacia gum, gelatin, mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin, colloidal silica, potato starch, urea, and other carriers suitable for use in manufacturing preparations in solid, semisolid, or liquid form. In addition, auxiliary stabilizing, thickening, and coloring agents and perfumes can be used. For example, the compounds of the invention can be used with hydroxypropyl methylcellulose essentially as described in U.S. Patent No. 4,916,138, incorporated herein by reference, or with a surfactant essentially as described in EPO Patent Publication No. 428,169, incorporated herein by reference. Oral dosage forms can be prepared essentially as described by Hondo et al., 1987, Transplantation Proceedings XIX, Supp. 6: 17-22, incorporated herein by reference. Dosage forms for external application can be prepared essentially as described in EPO Patent Publication No. 423,714, incorporated herein by reference. The active compound is included in the pharmaceutical composition in an amount sufficient to produce the desired effect on the disease process or condition. 
For the treatment of conditions and diseases caused by infection, immune system disorder (or to suppress immune function), or cancer, a compound of the invention can be administered orally, topically, parenterally, by inhalation spray, or rectally in dosage unit formulations containing conventional non-toxic pharmaceutically acceptable carriers, adjuvants, and vehicles. The term parenteral, as used herein, includes subcutaneous injections, and intravenous, intrathecal, intramuscular, and intrasternal injection or infusion techniques. Dosage levels of the compounds of the present invention are of the order of from about 0.01 mg to about 100 mg per kilogram of body weight per day, preferably from about 0.1 mg to about 50 mg per kilogram of body weight per day. The dosage levels are useful in the treatment of the conditions indicated above (from about 0.7 mg to about 3.5 mg per patient per day, assuming a 70 kg patient). In addition, the compounds of the present invention can be administered on an intermittent basis, i.e., at semi-weekly, weekly, semi-monthly, or monthly intervals. The amount of active ingredient that can be combined with the carrier materials to produce a single dosage form will vary depending on the host treated and the particular mode of administration.
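The conversion from a per-kilogram dosage level to a per-patient daily amount is a single multiplication, sketched below for the 70 kg patient assumed in the text. The function name and the range labels are hypothetical; only the per-kilogram figures and the 70 kg body weight come from the passage. Under this arithmetic the lower bound of 0.01 mg/kg corresponds to 0.7 mg per day, and the per-kilogram upper bounds of 50 mg and 100 mg correspond to 3500 mg and 7000 mg per day.

```python
def daily_dose_mg(dose_mg_per_kg: float, body_weight_kg: float = 70.0) -> float:
    """Convert a dosage level in mg per kg per day to mg per patient per day."""
    return dose_mg_per_kg * body_weight_kg


# Per-kilogram ranges quoted in the text (mg per kg of body weight per day).
RANGES_MG_PER_KG = {
    "broad": (0.01, 100.0),
    "preferred": (0.1, 50.0),
}

if __name__ == "__main__":
    for label, (low, high) in RANGES_MG_PER_KG.items():
        low_mg = daily_dose_mg(low)
        high_mg = daily_dose_mg(high)
        # ":g" trims floating-point noise when printing.
        print(f"{label}: {low_mg:g} mg to {high_mg:g} mg per 70 kg patient per day")
```

This is arithmetic only and carries no clinical recommendation; actual dosing depends on the factors the specification lists for the individual patient.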
For example, a formulation intended for oral administration to humans may contain from 0.5 mg to 5 g of active agent compounded with an appropriate and convenient amount of carrier material, which may vary from about 5 percent to about 95 percent of the total composition. Dosage unit forms will generally contain from about 0.5 mg to about 500 mg of active ingredient. For external administration, the compounds of the invention may be formulated within the range of, for example, 0.00001% to 60% by weight, preferably from 0.001% to 10% by weight, and most preferably from about 0.005% to 0.8% by weight. It will be understood, however, that the specific dose level for any particular patient will depend on a variety of factors. These factors include the activity of the specific compound employed; the age, body weight, general health, sex, and diet of the subject; the time and route of administration and the rate of excretion of the drug; whether a drug combination is employed in the treatment; and the severity of the particular disease or condition for which therapy is sought. A detailed description of the invention having been provided above, the following examples are given for the purpose of illustrating the present invention and should not be construed as limiting the scope of the invention or claims.
EXAMPLE 1 DNA sequencing of cosmid clones and subclones thereof The epothilone-producing strain, Sorangium cellulosum SMP44, was grown on a cellulose-containing medium (see Bollag et al., 1995, Cancer Research 55: 2325-2333, incorporated herein by reference), and epothilone production was confirmed by LC/MS analysis of the culture supernatant. Total DNA was prepared from this strain using the procedure described by Jaoua et al., 1992, Plasmid 28: 157-165, incorporated herein by reference. To prepare a cosmid library, S. cellulosum genomic DNA was partially digested with Sau3AI and ligated with BamHI-digested pSupercos (Stratagene). The DNA was packaged in lambda phage as recommended by the manufacturer, and the mixture was then used to infect E. coli XL1-Blue MR cells. This procedure yielded approximately 3,000 isolated colonies on LB-ampicillin plates. Because the size of the S. cellulosum genome is estimated to be about 10^7 nucleotides, the DNA inserts present among the 3000 colonies would correspond to about 10 S. cellulosum genomes. To screen the library, two segments of KS domains were used to design oligonucleotide primers for PCR with Sorangium cellulosum genomic DNA as template. The fragment generated was then used as a probe to screen the library. This approach was chosen because it was found, from the examination of more than a dozen PKS genes, that KS domains are the most highly conserved (at the amino acid level) of all the PKS domains examined. Therefore, it was expected that the probes produced would detect not only the epothilone PKS genes but also other PKS gene clusters represented in the library. The two degenerate oligonucleotides synthesized using conserved regions within the ketosynthase (KS) domains compiled from the DEBS and soraphen PKS gene sequences were (standard nomenclature for degenerate positions is used): CTSGTSKCSSTBCACCTSGCSTGC and TGAYRTGSGCGTTSGTSCCGSWGA. 
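The two primers above are written in the standard IUPAC nomenclature for degenerate positions (for example, S is C or G, K is G or T, B is C, G, or T). As an illustration of what that notation implies, the following sketch counts and enumerates the individual sequences each degenerate primer represents. The function names are hypothetical and the code is not part of the described procedure.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer)


def expand(primer: str):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer)):
        yield "".join(combo)


if __name__ == "__main__":
    for primer in ("CTSGTSKCSSTBCACCTSGCSTGC", "TGAYRTGSGCGTTSGTSCCGSWGA"):
        print(primer, len(primer), "nt, degeneracy", degeneracy(primer))
```

Both primers are 24 nucleotides long; the first contains six S positions plus one K and one B, and the second contains four S positions plus one each of Y, R, and W.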
A single band of ~750 bp, corresponding to the predicted size, was observed on an agarose gel after PCR using the oligonucleotides as primers and S. cellulosum SMP44 genomic DNA as template. The fragment was removed from the gel and cloned into the HincII site of pUC118 (which is a derivative of pUC18 with an insert sequence for making single-stranded DNA). After transformation of E. coli, plasmid DNA from ten independent clones was isolated and sequenced. The analysis revealed nine unique sequences, each corresponding to a common segment of the KS domain in PKS genes. Of the nine, three were identical to a polyketide synthase gene cluster previously isolated from this organism and determined, from analysis of its modules, not to belong to the epothilone gene cluster. The six remaining KS fragments were excised from the vector, pooled, end-labeled with 32P, and used as probes in hybridizations with the colonies containing the cosmid library under high-stringency conditions. The screen identified 15 cosmids that hybridized with the pooled KS probes. DNA was prepared from each cosmid, digested with NotI, separated on an agarose gel, and transferred to a nitrocellulose membrane for Southern hybridization using the pooled KS fragments as probe. The results revealed that two of the cosmids did not contain KS-hybridizing inserts, leaving 13 cosmids for further analysis. The blot was stripped of the label and re-probed, under less stringent conditions, with labeled DNA containing the sequence corresponding to the enoylreductase domain from module four of the DEBS gene cluster. Because it was anticipated that the epothilone PKS gene cluster would encode two consecutive modules containing an ER domain, and because not all PKS gene clusters have ER domain-containing modules, hybridization with the ER probe was predicted to identify cosmids containing insert DNA from the epothilone PKS gene cluster. 
It was found that two cosmids hybridized strongly with the ER probe, one hybridized moderately, and a final cosmid hybridized weakly. Analysis of the restriction pattern of the NotI fragments indicated that the two cosmids that hybridized strongly with the ER probe overlapped each other. The nucleotide sequence was also obtained from the ends of each of the 13 cosmids using the T7 and T3 primer binding sites. All contained sequences that showed homology to PKS genes. The sequence from one of the cosmids that hybridized strongly with the ER probe showed homology to NRPS sequences and, in particular, to the adenylation domain of an NRPS. Because it was anticipated that the thiazole portion of epothilone would be derived from the formation of an amide bond between an acetate and a cysteine molecule (with a subsequent cyclization step), the presence of an NRPS domain in a cosmid that also contained ER domain(s) supported the prediction that this cosmid would contain all or part of the epothilone PKS gene cluster. Preliminary restriction analysis of the remaining 12 cosmids suggested that three might overlap with the cosmid of interest. To verify this, oligonucleotides were synthesized for each end of the four cosmids (determined from the end sequencing described above) and used as primer sets in PCR with DNA from each of the four cosmids. Overlap would be indicated by the appearance of a band from a reaction in which the primers and the template came from different cosmids. The result of this experiment verified that two of the cosmids overlapped with the cosmid containing the NRPS. Restriction mapping of the three cosmids revealed that the cosmids did, in fact, overlap. In addition, because PKS sequences extended to the end of the insert in the last overlapping fragment, based on the assumption that the NRPS would map to the 5' end of the cluster, the results also indicated that the 3' end of the gene cluster had not been isolated among the clones identified. 
To isolate the remaining segment of the epothilone biosynthesis genes, a PCR fragment was generated from the cosmid containing the most 3'-terminal region of the putative gene cluster. This fragment was used as a probe to screen a newly prepared cosmid library of Sorangium cellulosum genomic DNA, again of approximately 3000 colonies. Several hybridizing clones were identified; DNA was prepared from six of them. Analysis of the NotI-digested fragments indicated that they all contained overlapping regions. The cosmid containing the largest insert DNA that also had the shortest overlap with the cosmid used to make the probe was selected for further analysis. Restriction maps were created for the four cosmids, as shown in Figure 1. The sequence obtained from one end of cosmid pKOS35-70.8A3 showed no homology to PKS sequences or to any associated modification enzyme. Similarly, the sequence from one end of cosmid pKOS35-79.85 also did not contain sequences corresponding to a PKS region. These findings support the observation that the epothilone gene cluster is contained within the ~70 kb region encompassed by the inserts of the four cosmids.
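The statement in this example that a library of roughly 3000 cosmid colonies corresponds to about 10 genome equivalents can be checked with a short calculation, sketched below. The genome size of about 10^7 nucleotides and the colony count are taken from the text. The average insert size of 35 kb is an assumption introduced here for illustration (the source does not state it), and the function names are hypothetical. The probability estimate uses the standard random-library relation P = 1 - (1 - f)^N, where f is the fraction of the genome carried by one clone.

```python
import math


def genome_equivalents(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * insert_bp / genome_bp


def probability_represented(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Probability that a given locus is in at least one clone: P = 1 - (1 - f)^N."""
    fraction = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(probability: float, insert_bp: float, genome_bp: float) -> int:
    """Clones required to reach a target probability: N = ln(1 - P) / ln(1 - f)."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


if __name__ == "__main__":
    GENOME_BP = 1e7        # estimate given in the text
    N_COLONIES = 3000      # approximate library size given in the text
    INSERT_BP = 35_000     # ASSUMED average cosmid insert size, not from the source

    print(f"genome equivalents: {genome_equivalents(N_COLONIES, INSERT_BP, GENOME_BP):.1f}")
    print(f"P(locus present):   {probability_represented(N_COLONIES, INSERT_BP, GENOME_BP):.5f}")
    print(f"clones for P=0.99:  {clones_needed(0.99, INSERT_BP, GENOME_BP)}")
```

With the assumed insert size the library comes to roughly ten genome equivalents, in line with the estimate in the text; a different insert size would scale the result proportionally.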
To sequence the inserts in the cosmids, each of the NotI restriction fragments from the four cosmids was cloned into the NotI site of the commercially available pBluescript plasmid. Initial sequencing was carried out at the ends of each of the clones. Analysis of these sequences allowed the prediction, before the complete sequence was available, that there would be 10 modules in this PKS gene cluster: a loading domain plus 9 modules.

The sequence of the complete PKS was obtained as follows. Each of the 13 non-overlapping NotI fragments was isolated and subjected to partial digestion with HinPI. Fragments of ~2 to 4 kb in length were recovered from an agarose gel and cloned into the AccI site of pUC118. Sufficient clones from each library of HinPI fragments were sequenced to provide at least four-fold coverage of each. To sequence across each of the NotI sites, a set of oligonucleotides was made, one 5' and the other 3' of each NotI site, and used as primers for PCR amplification of a fragment containing each NotI site. Each fragment produced in this way was cloned and sequenced.

The nucleotide sequence was determined for a linear segment corresponding to ~72 kb. Analysis revealed a PKS gene cluster with a loading domain and nine modules. Downstream of the PKS sequence is an ORF, designated epoK, which shows strong homology with cytochrome P450 oxidase genes and encodes the epothilone epoxidase. The nucleotide sequence of 15 kb downstream of epoK has also been determined: a number of additional ORFs have been identified, but no ORF showing homology with any known dehydratase has been identified. The epoL gene may encode a dehydratase activity, but this activity may instead reside within the epothilone PKS or may be encoded by another gene. The PKS genes are organized in 6 open reading frames.
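As a rough illustration of the NotI-based subcloning strategy above, the following Python sketch locates NotI sites in a linear sequence and reports the fragment lengths of a complete digest. It is a minimal sketch under stated assumptions: it scans only the given strand (sufficient because the NotI site GCGGCCGC is palindromic), it ignores partial digestion and methylation, and the toy sequence and the function name `noti_fragment_lengths` are hypothetical, not taken from the epothilone cluster.

```python
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence
NOTI_CUT_OFFSET = 2     # NotI cuts the top strand after GC (GC^GGCCGC)


def noti_fragment_lengths(seq):
    """Return fragment lengths from a complete NotI digest of a linear sequence."""
    seq = seq.upper()
    cuts = []
    start = seq.find(NOTI_SITE)
    while start != -1:
        cuts.append(start + NOTI_CUT_OFFSET)
        # Advance by one base so that overlapping sites are not missed.
        start = seq.find(NOTI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy example (hypothetical sequence with two NotI sites).
toy = "AAAA" + NOTI_SITE + "TTTTTT" + NOTI_SITE + "CC"
lengths = noti_fragment_lengths(toy)
```

The cut coordinates collected in `cuts` are also the positions that a primer pair placed 5' and 3' of each NotI site would need to span in order to sequence across the junctions between adjacent NotI fragments.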
At the polypeptide level, the loading domain and modules 1, 2, and 9 appear on individual polypeptides; their corresponding genes are designated epoA, epoB, epoC and epoF, respectively. Modules 3, 4, 5, and 6 are contained in a single polypeptide whose gene is designated epoD, and modules 7 and 8 are on another polypeptide whose gene is designated epoE. It is clear from the spacing between the ORFs that epoC, epoD, epoE and epoF constitute an operon. The epoA, epoB and epoK genes may also be part of this large operon, but there are gaps of approximately 100 bp between epoB and epoC and of 115 bp between epoF and epoK that could contain a promoter. The present invention provides the intergenic sequences in recombinant form. At least one, but potentially more than one, promoter is used to express all of the epothilone genes. The epothilone PKS gene cluster is shown schematically below.
epoA (loading domain), epoB (module 1, NRPS), epoC (module 2), epoD (modules 3, 4, 5, and 6), epoE (modules 7 and 8), epoF (module 9), epoK (P450).

A detailed examination of the modules shows an organization and composition consistent with one capable of being used for epothilone biosynthesis. The description that follows is at the polypeptide level. The sequence of the AT domain in the loading module and in modules 3, 4, 5, and 9 shows similarity to the consensus sequence for malonyl-loading domains, consistent with the presence of an H side chain at C-14, C-12 (epothilones A and C), C-10, and C-2, respectively, as well as in the loading region. The AT domains in modules 2, 6, 7, and 8 resemble the consensus sequence of methylmalonyl-specific AT domains, again consistent with the presence of methyl side chains at C-16, C-8, C-6, and C-4, respectively.

The loading module contains a KS domain in which the cysteine residue usually present at the active site is instead a tyrosine. This domain is designated KSY and serves as a decarboxylase, which is part of its normal function, but it cannot function as a condensing enzyme. Thus, the loading domain is expected to load malonyl CoA, transfer it to the ACP, and decarboxylate it to produce the acetyl residue required for condensation with cysteine. Module 1 is the non-ribosomal peptide synthetase that activates cysteine and catalyzes the condensation with acetate on the loading module. The sequence contains segments highly similar to ATP-binding and ATPase domains, required for amino acid activation, a phosphopantetheinylation site, and an elongation domain. In database searches, module 1 shows very high similarity to a number of previously identified peptide synthetases. Module 2 determines the structure of epothilone at C-15 to C-17. The presence of the DH domain in module 2 produces the C-16-17 dehydro moiety in the molecule.
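The gene organization and AT-domain specificities described so far can be collected into a small lookup structure. The Python sketch below is illustrative only: the dictionary and function names are assumptions introduced here, module number 0 is used as a stand-in for the loading domain, and the values simply restate the assignments given in the text.

```python
# Gene -> modules, as listed in the text (0 stands for the loading domain).
GENE_MODULES = {
    "epoA": [0],
    "epoB": [1],           # NRPS module
    "epoC": [2],
    "epoD": [3, 4, 5, 6],
    "epoE": [7, 8],
    "epoF": [9],
}

# AT-domain specificity per PKS module as described in the text.
# Module 1 is the NRPS and is therefore not listed.
AT_SPECIFICITY = {
    0: "malonyl",
    2: "methylmalonyl",
    3: "malonyl",
    4: "malonyl",
    5: "malonyl",
    6: "methylmalonyl",
    7: "methylmalonyl",
    8: "methylmalonyl",
    9: "malonyl",
}

# Carbon whose side chain is set by each extender module, per the text.
MODULE_CARBON = {2: "C-16", 3: "C-14", 4: "C-12", 5: "C-10",
                 6: "C-8", 7: "C-6", 8: "C-4", 9: "C-2"}

SIDE_CHAIN = {"malonyl": "H", "methylmalonyl": "CH3"}


def gene_for_module(module):
    """Return the gene encoding the given module number."""
    for gene, modules in GENE_MODULES.items():
        if module in modules:
            return gene
    raise KeyError(module)


def side_chain_by_carbon():
    """Map each backbone carbon to the side chain implied by its AT domain."""
    return {carbon: SIDE_CHAIN[AT_SPECIFICITY[module]]
            for module, carbon in MODULE_CARBON.items()}


side_chains = side_chain_by_carbon()
```

Keeping specificity and carbon position in separate tables makes it straightforward to explore, on paper, how exchanging one AT domain would change the predicted substituent at a single position.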
The domains in module 3 are consistent with the structure of epothilone at C-14 and C-15; the OH generated by the action of the KR is used in the lactonization of the molecule. Module 4 controls the structure at C-12 and C-13, where there is a double bond in epothilones C and D, consistent with the presence of a DH domain. Although the sequence of the AT domain appears similar to those that specify malonate loading, it can also load methylmalonate, which partly explains the mixture of epothilones found in the fermentation broths of the naturally producing organisms.

A significant departure from the expected set of functions was found in module 4. This module was expected to contain a DH domain, thereby directing the synthesis of epothilones C and D as the products of the PKS. Rigorous analysis revealed that the space between the AT and KR domains of module 4 was not large enough to accommodate a functional DH domain. Thus, the degree of reduction at module 4 does not proceed beyond ketoreduction of the beta-keto group formed after the condensation directed by module 4. Because the C-12,13 unsaturation has been demonstrated (epothilones C and D), there must be an additional dehydratase function that introduces the double bond, and it is believed that this function resides in the PKS itself or in an ORF in the epothilone biosynthetic gene cluster. Thus, the action of the dehydratase could occur either during the synthesis of the polyketide or after cyclization has taken place. In the former case, the compounds produced at the end of acyl chain growth would be epothilones C and D. If the C-12,13 dehydration were a post-polyketide event, the completed acyl chain would have a hydroxyl group at C-13, as shown below. The names epothilones G and H have been assigned to the 13-hydroxy compounds produced in the absence of, or prior to, the action of the dehydratase.
Epothilones G (R = H) and H (R = CH3).

Modules 5 and 6 each have the complete set of reduction domains (KR, DH and ER) to produce the methylene functions at C-11 and C-9. Modules 7 and 9 have KR domains to produce the hydroxyls at C-7 and C-3, and module 8 does not have a functional KR domain, consistent with the presence of the keto group at C-5. Module 8 also contains a methyltransferase (MT) domain that results in the presence of the geminal dimethyl function at C-4. Module 9 has a thioesterase domain that terminates polyketide synthesis and catalyzes ring closure. The genes, proteins, modules, and domains of the epothilone PKS are summarized in the table below.

Inspection of the sequence has revealed translational coupling between epoA and epoB (loading domain and module 1) and between epoC and epoD. Very small gaps are seen between epoD and epoE and between epoE and epoF, but gaps exceeding 100 bp are found between epoB and epoC and between epoF and epoK. These intergenic regions may contain promoters. Sequencing efforts have not revealed the presence of regulatory genes, and it is possible that epothilone synthesis is not regulated by operon-specific regulation in Sorangium cellulosum. The sequence of the epothilone PKS and flanking regions has been compiled into a single contiguous sequence, as shown below.
1 TCGTGCGCGG GCACGTCGAG GCGTTTGCCG ACTTCGGCGG CGTCCCGCGC GTGCTGCTCT 61 ACGACAACCT CAAGAACGCC GTCGTCGAGC GCCACGGCGA CGCGATCCGG T CCACCCCA 121 CGCTGCTGGC TCTGTCGGCG GATTACCGCT TCGAGCCGCG CCCCGTCGCC GTCGCCCGCG 131 GCAACGAGAA GGGCCGCGTC GAGCGCGCCA TCCGCTACGT CCGCGAGGGC TTCTTCGAGGCTA CGCCGACCTC GGAGACCTCA ACCGCCAAGC GACCGAGTGG ACCAGCTCCG CGGC3CTCGA TCGCTCCTGG GTCGAGGACC GCGCCCGCAC CGTGCGTCAG GCCTTCGACG 361 AC3AGCGCAG CGTGCTGCTG CGACACCCTG ACACACCGTT TCCGGACCAC GAGCGCGTCG 421 .GGTCGAGGT CGGAAAGACC CCCT.ACGCGC GCTTCGATCT CAACG.ACTAC TCGGTCCCCC 431 ACGACCGGAC GC3CCGCACG CTGGTCGTCC TCGCCGACCT CAGTCAGGTA CGCATCGCCG • 5 541 ACGGCAACCA GATCGTCGCG ACCC.ACGTCC GTTCGTGGGA CCGCGGCCAG CAGATCGAGC ÓOi AGCCCGAGCA CCTCCAGCGC CTGGTCGACG AGAAGCGCCG CGCCCGCGAG CACCGCGGCG 661 TTGATCGCCT CGCGCGCGCC GCCCGCAGCA GCCAGGCATT CCTGCGCATC GTCGCCGAGC 721 GCGGCGATAA CGTCGGC.GC GCGATCGCCC GGCTTCTGCA ACTGCTCGAC GCCGTGGGCG 781 CCGCCGAGCT CGAAGAGGCC CTGGTCGAGG TGCTTGAGCG CGACACCATC CACATCGGTG 841 CCGTCCGCCA GGTGATCGAC CGCCGCCGCT CCGAGCGCCA CCTGCCGCCT CCAGTCTCAA 901 TCCCCGTCAC CCGCGGCGAG CACGCCGCCC TCGTCGTCAC GCCGCATTCC CTCACCACCT 961 ACGACGCCCT GAAGAAGGAC CCGACGCCAT GACCGACCTG ACGCCCACCG AGACCAAAGA 102 CCGGCTCAAG AGCCTCGGCC TCTTCGGCCT GCTCGCCTGC TGGGAGCAGC TCGCCGACAA 1CS1 GCCCTGGCTT CGCGAGGTGC TCGCCATCGA GGAGCGCGAG CGCCACAAGC GCAGCCTCGA 1141 ACGCCGCCTG AAGAACTCCC GCGTCGCCGC CTTCAAGCCC ATGACCGACT TCGACTCGTC 12C1 CTGGCCCAAG AAGATCGACC GCGAGGCCGT CGACGACCTC TACGATAGCC GCTACGCGGA • | Q 1261 CCTGCTCTTC GAGGTCGTCA CCCGTCGCTA CGACGCGCAG AAGCCGCTCT TGCTCAGCAC 1321 GAACAAGGCA TTCGCCGACT GGGGCCAGGT CTTCCCGCAC GCCGCGTGCG TCGTCACGCT 1381 CGTCGACCGG CTCGTGCACC GCGCCGAGGT GATCGAGATC GAGGCCGAGA GCTACCGGCT 1441 GAAGGAAGCC AAGGAGCTCA ACGCCACCCG CACCAAGCAG CGCCGCACCA AGAAGCACTG 501 AGCGGCATTT TCACCGGTGA ACTTCACCGA AATCCCGCGT GTTGCCGAGA TCATCTACAG 1561 GCGG.ATCGAG ACCGTGCTCA CGGCGTGGAC GACATGGCGC GGAAACGTCG TCGCAACTGC 1621 CCAGC.AATGT CATGGGAATG GCCCCTTGAG GGGCTGGCCG GGGTCGACGA TATCGCGCGA 1631 TCTCCCCGTC AATTCCCGAG 
CGTAAAAGAA AAATTTGTCA TAGATCGTAA GCTGTGCTAG_1741_TGATCTGCCT TACGTTACGT CTTCCGCACC TCGAGCGAAT TCTCTCGGAT AACTTTCAAG 1801 TTTTCTGAGG GGGCTTGGTC TCTGGTTCCT CAGGAAGCCT GATCGGGACG AGCTAATTCC 1361 CATCCATTTT TTTGAGACTC TGCTCAAAGG GATTAGACCG AGTGAGACAG TTCTTTTGCA 1921 GTGAGCGAAG AACCTGGGGC TCGACCGGAG GACGATCGAC GTCCGCGAGC GGGTCAGCCG 1S81 CTGAGGATGT GCCCGTCGTG GCGGATCGTC CCATCGAGCG CGCAGCCGAA GATCCGATTG 20 January CGATCGTCGG AGCGGGCTGC CGTCTGCCCG GTGGCGTGAT CGATCTGAGC GGGTTCTGGA 2101 CGCTCCTCGA GGGCTCGCGC GACACCGTCG GGCAAGTCCC CGCCGAACGC TGGGATGCAG 2161 CAGCGTGGTT TGATCCCGAC CTCGATGCCC CGGGGAAGAC GCCCGTTACG CGCGCATCTT 2221 TCCTGAGCGA CGTAGCCTGC TTCGACGCCT CCTTCTTCGG CATCTCGCCT CGCGAAGCGC 2281 TGCGGATGGA CCCTGCACAT CGACTCTTGC TGGAGGTGTG CTGGGAGGCG CTGGAGAACG 2341 CCGCGATCGC TCCATCGGCG CTCGTCGGTA CGGAAACGGG AGTGTTCATC GGGATCGGCC 2401 CGTCCGAATA TGAGGCCGCG CTGCCGCGAG CGACGGCGTC CGCAGAGATC GACGCTCATG 2461 GCGGGCTGGG GACGATGCCC AGCGTCGGAG CGGGCCGAAT CTCGTATGTC CTCGGGCTGC 2521 GAGGGCCGTG TGTCGCGGTG GATACGGCCT ATTCGTCCTC GCTCGTGGCC GTTCATCTGG 2581 CCTGTCAGAG CTTGCGCTCC GGGGAATGCT CCACGGCCCT GGCTGGTGGG GTATCGCTGA January 26 TGTTGTCGCC GAGCACCCTC GTGTGGCTCT CGAAGACCCG CGCGCTGGCC ACGGACGGTC 2701 GCTGCAAGGC GTTTTCGGCG GAGGCCG.TG GGTTCGGACG AGGCGAAGGG TGCGCCGTCG 2761 TGGTCCTCAA GCGGCTCAGT GGAGCCCGCG CGGACGGCGA CCGGATATTG GCGGTGATTC 2S21 GAGGATCCGC GATCAATCAC GACGGAGCGA GCAGCGGTCT GACCGTGCCG .AACGGGAGCT 2831 CCCAAGAAAT CGTGCTGAAA CGGGCCCTGG CGGACGCAGG CTGCGCCGCG TCTTCGGTGG 2941 GTTA7GTCGA GGC.ACACGGC ACGGGCACGA CGCTTGGTGA CCCCATCGAA ATCCAAGCTC 3001 TG.AATGCGGT ATACGGCCTC GGGCGAGACG TCGCCACGCC GCTGCTGATC GGGTCGGTGA 3C61 AGACCAACCT TGGCCATCC? GAG7ATGCGT CGGGGATCAC TGGGCTGCTG AAGGTCGTCT 3121 TGTCCCTTCA GC. 
^ CGGGCAG ATTCCTGCGC ACCTCCACGC GCAGGCGCTG AACCCCCGGA 3131 TCTCATGGGG TGATCTTCGG CTGACCGTCA CGCGCGCCCG GACACCGTGG CCGGACTGGA 3241 ATACGCCGCG ACGGGCGGGG GTGAGCTCGT TCGGCATGAG CGGGACCA.AC GCGCACGTGG 3301 TGC7GGAAGA GGCGCCGGCG GCGACGTGCA CACCGCCGGC 3CCCG.AGCGG CCGGCAGAGC 3361 TGCTGGTGCT GTCGGCAAGG ACCGCGGCAG CCTTGGATGC ACACGCGGCG CGGCTGCGCG 3421 ACCATCTGGA GACCT.ACCCT TCGC GT3TC TGGGCGAT3T GGCGTTCAGT CTGGC3ACGA 34 81 CGCGCAGCGC GATGGAGCAC CGGC7CGCGG TGGCGGCGAC G7CGAGCGAG GGGCTGCGGG 354 1 CAGCCCTGGA CGCTGCGGCG CAGGGACAGA CGCCGCCCGG TGTGGTGCGC GGTATCGCCG 3501 .ATTCCTCACG CGGCAAGCTC GCCT7TCTCT TCACCGGACA GGGGGCGCAG ACGCTGGGCA 3661 TGGGCCGTGG GCTGTATGAT GTATGGCCC3 CGTTCCGCGA GGCGTTCGAC CTGTGCGTGA 37? 1 GGCTGTTCAA CC.AGGAGCTC GACCGGCCGC TCCGCGAGG7 GATGTGGGCC GAACCGGCCA 3781 GCGTCGACGC CGCGCTGCTC GACCAG.ACAG CCTTTACCCA GCCGGCGCTG TTC.ACCTTCG 334 1 AG7ATGCGCT CGCCGCGCTG TGGC3GTCGT GGGGCGTAGA GCCGGAGTTG GTCGCTGGCC 3901 ATAGCATCGG TGAGCTGGTG GC7GCCTGCG TGGCGGGCGT GT7C7CGC7T GAGGACGCGG 3961 7GTTCCTGGT GGCTGCGCGC 3GGCGCCTGA TGCAGGC3CT GCCGGCCGGC GGGGCGA7GG C 21 TGTCG. 
TCGC GGCGCCGGAG GCCGATGTGG CTGCTGC3GT GGCGCCGCAC GCAGCGTCGG 4031 7GTCGATCGC CGCGGTCAAC GGTCCGG.ACC AGGTGGTCAT CGCGGGCGCC GGGCAACCCG April 1 TGCATGCGAT CGCGGCGGCG A7GGCCGCGC GCGGGGCGCG AACCAAGGCG CTCCACGTCT 4201 CGCATGCGTT CCACTCACCG CTCATGGCCC CGATGCTGGA GGCGTTCGGG CGTGTGGCCG 4251 AGTCGGTGAG CTACCGGCGG CCG7CGATCG TCCTGGTCAG CAATCTGAGC GGGAAGGCTG 4321 GCACAGACGA GGTGAGCTCG CCGGGCTATT GGGTGCGCCA CGCGCGAG.AG GTGGTGCGCT 4331 TCGCGGATGG AGTGAAGGCG CTGCACGCGG CCGGTGCGGG CACCTTCGTC GAGGTCGGTC 4441 CGAAATCGAC GCTGCTCGGC CTGGTGCCTG CCTGCCTGCC GGACGCCCGG CCGGCGCTGC 4501 TCGCATCGTC GCGCGCTGGG CGTGACGAGC CAGCGACCGT GCTCGAGGCG CTCGGCGGGC 4561 TCTGGGCCGT CGGTGGCCTG GTCTCCTGGG CCGGCCTCTT CCCCTCAGGG GGGCGGCGGG 4621 TGCCGCTGCC CACGTACCCT 7GGCAGCGCG AGCGCTACTG GATCGACACG AAAGCCGACG 4 6 S 1 ACGCGGCGCG TGGCGACCGC CGTGCTCCGG GAGCGGGTCA CGACGAGGTC GAGAAGGGGG 47 1 GCGCGGTGCG CGGCGGCGAC CGGCGCAGCG CTCGGCTCGA CCATCCGCCG CCCGAGAGCG i 801 GACGCCGGGA GAAGGTCGAG GCCGCCGGCG ACCGTCCGTT CCGGCTCGAG ATCGATGAGC 4361 CAGGCGTGCT CGATCGCCTG GTGCTTCGGG TCACGGAGCG GCGCGCCCCT GGTCTTGGC G 4921 AGGTCGAGAT CGCCGTCGAC GCGGCGGGGC TCAGCTTCAA TGATGTCCAG CTCGCGCTGG 4961 GCATGGTGCC CGACGACCTG CCGGGAAAGC CCAACCCTCC GCTGCTGCTC GGAGGCGAGT 5041 GCGCCGGGCG CA7CGTCGCC GTGGGCGAGG GCGTGAACGG CCTTGTGGTG GGCCAACCGG 5101 TCATCGCCCT TTCGGCGGGA GCGTTTGCTA CCCACGTCAC CACGTCGGCT GCGCTGGTGC 5161 TGCCTCGGCC TCAGGCGCTC TCGGCGACCG AGGCGGCCGC CATGCCCGTC GCGTACCTGA 5221 CGGCATGGTA CGCGCTCGAC GGAATAGCCC GCCT7CAGCC GGGGGAGCGG GTGCTGATCC 5281 ACGCGGCGAC CGGCGGGGTC GGTCTCGCCG CGGTGCAGTG GGCGCAGCAC GTGGGAGCCG 534 1 AGGTCCATGC GACGGCCGGC ACGCCCGAGA AGCGCGCCTA CCTGGAGTCG CTGGGCGTGC 5 C l GGTATGTGAG CGATTCCCGC TCGGACCGGT TCGTCGCCGA CGTGCGCGCG TGGACGGGCG 5461 GCGAGGGAGT AGACGTCGTG CTCAACTCGC TTTCGGGCGA GCTGATCGAC AAGAGTTTCA 5521 A7CTCCTGCG ATCGCACGGC CGGTTTGTGG AGCTCGGCAA GCGCGACTGT TACGCGGATA 5531 ACCAGCTCGG GCTGCGGCCG TTCCTGCGCA ATCTCTCCTT CTCGCTGGTG GATCTCCGGG 564 1 GGATGATGCT CGAGCGGCCG GCGCGGGTCC GTGCGCTCTT CGAGGAGCTC CTCGGCCTGA 5701 
TCGCGGCAGG CGTGTTCACC CCTCCCCCCA TCGCGACGCT CCCGATCGCT CGTGTCGCCG 5 761 ATGCGTTCCG GAGCATGGCG CAGGCGCAGC ATCTTGGGAA GCTCGTACTC ACGCTGGGTG 5821 ACCCGGAGGT CCAGATCCGT ATTCCGACCC ACGCAGGCGC CGGCCCGTCC ACCGGGGATC 5881 GGGATCTGCT CGACAGGCTC GCGTCAGCTG CGCCGGCCGC GCGCGCGGCG GCGCTGGAGG 594 1 CGTTCCTCCG TACGCAGGTC TCGCAGG7GC TGCGCACGCC CGAAATCAAG GTCGGCGCGG 6001 AGGCGCTGTT CACCCGCCTC GGCATGGACT CGCTCATGGC CGTGGAGCTG CGCAATCGTA 6061 TCGAGGCGAG CCTCAAGCTG AAGCTGTCGA CGACGTTCCT GTCCACGTCC CCCAATATCG 6121 CCTTGTTGAC CCAAAACCTG TTGGATGCTC TCGCCACAGC TCTCTCCTTG GAGCGGGTGG 618 1 CGGCGGAGAA CCTACGGGCA GGCGTGCAAA GCGACTTCGT CTCATCGGGC GCAGATCAAG 624 1 ACTGGGAAAT CATTGCCCTA TGACGATCAA TCAGCTTCTG AACGAGCTCG AGCACCAGGG 6301 TGTC.AAGCTG GCGGCCGATG GGGAGCGCCT CCAGATACAG GCCCCCAAGA ACGCCCTGAA 6361 CCCGAACCTG CTCGCTCGAA TCTCCGAGCA CAAAAGCACG ATCCTGACGA TGC7CCGTCA 642 GAGACTCCCC GCAGAGTCCA TCGTGCCCGC CCCAGCCGAG CGGCACGTTC CGTTTCCTCT 6481 CACAGACATC CAAGGATCCT ACTGGCTGGG TCGGACAGGA GCGTTTACGG TCCCCAGCGG 654 1 GATCCACGCC TATCGCGAAT ACGACTGTAC GGATCTCGAC GTGGCGAGGC TGAGCCGCGC 660 1 CTTTCGGAAA GTCGTCGCGC GGCACGACAT GCTTCGGGCC CACACGCTGC CCGACATGAT 6661 GCAGGTGATC GAGCCTAAAG TC3ACGCCGA CATCGAGATC ATCGATCTGC GCGGGCTCGA 6721 CCGGAGCACA CGGGAAGCGA GGCTCGTATC GTTGCGaAGAT GCGATGTCGC ACCGCATCTA 6781 TGACACCGAG CGCCCTCCGC TCTATCACGT CGTCGCCGTT CGGCTGGACG AGCAGCAAAC 684 1 CCGTCTCGTG CTCAGTATCG ATCTCAT7AA CGTTG.ACCTA GGCAGCCTGT CCATCATC77 69C 1 CAAGGATTGG CTCAGCT7CT ACGAAGATCC CGAGACCTCT CTCCC7GTCC TGG.AGCTCTC 6961 GTACCGCGAC TATGTGCTCG CGCTGGAGTC TCGCAAGAAG TCTGAGGCGC ATCAACGATC 7021 GATGGATTAC TGGAAGCGGC GCGTCGCCGA GCTCCCACCT CCGCCGATGC TTCCGATGAA 7081 GGCCGATCCA TCTACCCTGA GGGAGATCCG CTTCCGGCAC ACGGAGCAAT GGCTGCCGTC 71 1 GGACTCCTGG AGTCGATTGA AGCAGCGTGT CGGGGAGCGC GGGCTGACCC CGACGGGCGT 7201 CATTC7GGCT GCATTTTCCG .AGGTGATCGG GCGCTGG.AGC GCGAGCCCCC GGTTTACGCT 7261 CAACA7AACG CTCTTCAACC GGCTCCCCGT CCATCCGCGC GTGAACGATA TCACCGGGGA 7321 CTTCACG7C3 ATGGTCCTCC TGGACATCGA CACCACTCGC GACAAGAGCT TCGAACAGCG 
7381 CGCT AGCGT ATTCAAG.AGC AGCTGTGGGA AGCGATGGAT CACTGCGACG TAAGCGGTAT 74 4 1 CGAGGTCCAG CG.AGAGGCCG CCCGGGTCCT GGGGATCCAA CGAGGCGCAT TGTTCCCCGT 7501 GGTGCTCACG AGCGCGCTCA ACCAGCAAGT CG7TGGTGTC ACCTCGCTGC AGAGGCTCGG 7561 CACTCCGG73 TACACCAGCA CGCAGACTCC TCAGCTGCTG CTGGATCATC AGCTCTACGA 7621 GCACGA7GGG GACCTCGTCC TCGCGTGGGA CATCGTCGAC GGAGTGTTCC CGCCCGACC7"7681 TCTGGACGAC A7GCTCGAAG CGTACGTCGC TTTTCTCCGG CGGCTCACTG AGGAACCATG 77 1 GAGTGAACAG ATGCGCTGTT CGCTTCCGCC TGCCCAGCTA GAAGCGCGGG CGAGCGCAAA 7801 CGAGACCAAC TCGCTGCTGA GCGAGCATAC GCTGCACGGC CTGTTCGCGG CGCGGGTCGA 7861 GCAGCTGCCT ATGCAGCTCG CCGTGGTGTC GGCGCGCAAG ACGCTCACGT ACGAAGAGCT 792 1 TTCGCGCCGT TCGCGGCGAC TTGGCGCGCG GCTGCGCGAG CAGGGGGCAC GCCCGAACAC 7981 ATTGGTCGCG GTGGTGATGG AGAAAGGCTG GGAGCAGGTT GTCGCGGTTC TCGCGGTGCT 80 1 CGAGTCAGGC GCGGCCTACG TGCCGATCGA TGCCGACCTA CCGGCGGAGC GTATCCACTA 8101 CCTCCTCGAT CATGGTGAGG TAAAGCTCGT GCTGACGCAG CCATGGCTGG ATGGCAAACT 8161 GTCATGGCCG CCGGGGATCC AGCGGCTGCT CGTGAGCGAT GCCGGCGTCG AAGGCGACGG 8221 CGACC.AGCTT CCGATGATGC CCATTCAGAC ACCTTCGGAT CTCGCGTATG TCATCTACAC 8281 CTCGGGATCC ACAGGGTTGC CCAAGGGGGT GATGATCGAT CATCGGGGTG CCGTCAACAC 83 1 ^ CATCCTGGAC ATCA CGAGC GCTTCGAAAT AGGGCCCGGA GACAGAGTGC TGGCGCTCTC 8401 CTCGCTGAGC TTCGATCTCT CGGTCTACGA TGTGTTCGGG ATCCTGGCGG CGGGCGGTAC 84 61 GATCGTGGTG CCGGACGCGT CCAAGCTGCG CGATCCGGCG C.ATTGGGCAG CG7TGATCGA 8521 ACGAGAG AG GTGACGGTGT GGAACTCGGT GCCGGCGCTG ATGCGGATGC TCGTCGAGCA 8581 TTCCGAGGGT CGCCCCGATT CGCTCGCTAG GTCTCTGCGG CTTTCGCTGC TGAGCGGCGA 864 1 CTGGATCCCG GTGGGCCTGC CTGGCGAGCT CCAGGCCATC AGGCCCGGCG TGTCGGTGAT 8701 CAGCCTGGGC GGGGCCACCG AAGCGTCGAT CTGGTCCATC GGGTACCCCG TGAGGAACGT 8761 CGATCCATCG TGGGCG.AGCA TCCCCTACGG CCGTCCGCTG CGCAACCAGA CGTTCCACGT 8821 GCTCGATGAG GCGCTCGAAC CGCGCCCGGT CTGGGTTCCG GGGCAACTCT ACATTGGCGG 8881 GGTCGGACTG GCACTGGGCT ACTGGCGCGA TGAAGAGAAG ACGCGCAACA GCTTCCTCGT 8941 GCACCCCGAG ACCGGGGAGC GCCTCTACAA GACCGGCGAT CTGGGCCGCT ACCTGCCCGA 9001 TGGAAACATC GAGTTCATGG GGCGGGAGGA CAACCAAATC AAGCTTCGCG GATACCGCGT 
9061 TGAGCTCGGG GAAATCGAGG AAACGCTCAA GTCGCATCCG AACGTACGCG ACGCGGTGAT 9121 TGTGCCCGTC GGGAACGACG CGGCGAACAA GCTCCTTCTA GCCTATGTGG TCCCGGAAGG 9181 CACACGGAGA CGCGCTGCCG AGCAGGACGC GAGCCTCAAG ACCGAGCGGG TCGACGCGAG 92 1 AGCACACGCC GCCAAAGCGG ACGGATTGAG CGACGGCGAG AGGGTGCAGT TC.AAGCTCGC 9301 TCGACACGGA CTCCGGAGGG ATCTGGACGG AAAGCCCGTC GTCGATCTGA CCGGGCTGGT 9361 TCCGCGGGAG GCGGGGCTGG ACGTCTACGC GCGTCGCCGT AGCGTCCGAA CGTTCCTCGA 9421 GGCCCCGATT CCATTTGTTG AATTCGGCCG ATTCCTGAGC TGCCTGAGCA GCGTGGAGCC 9481 CGACGGCGCG GCCCTTCCCA AATTCCGTTA TCCATCGGCT GGCAGCACGT ACCCGGTGCA 95 1 AACCTACGCG TACGCCAAAT CCGGCCGCAT CGAGGGCGTG GACGAGGGCT TCTATTATTA 9601 CCACCCGTTC GAGCACCGTT TGCTGAAGGT CTCCGATCAC GGGATCGAGC GCGGAGCGCA 9661 CGTTCCGCAA AACTTCGACG TGTTCGATGA AGCGGCGTTC GGCCTCCTGT TCGTGGGCAG 9721 GATCGATGCC ATCGAGTCGC TGTATGGATC GTTGTCACGA GAATTCTGCC TGCTGGAGGC 9781 CGGAT.ATATG GCGCAGCTCC TGATGGAGCA GGCGCCTTCC TGCAACATCG GCGTCTGTCC 9S41 GGTGGGTCAA TTCGATTTTG AACAGGTTCG GCCGGTTCTC GACCTGCGGC ATTCGGACGT 9901 TT.ACGTGCAC GGCATGCTGG GCGGGCGGGT AGACCCGCGG CAGT7CCAGG TCTGTACGCT 9961 CGGTCAGGAT TCCTCACCGA GGCGCGCCAC GACGCGCGGC GCCCCTCCCG GCCGCGATCA 10021 GC.ACTTCGCC GATATCCTTC GCGACTTCTT GAGGACCAAA CTACCCGAGT ACATGGTGCC 10081 TACAGTCTTC GTGGAGCTCG .ATGCGTTGCC GCTGACGTCC AACGGCAAGG TCGATCGTAA 101 GGCCC7GCGC GAGCGGAAGG ATACCTCGTC GCCGCGGCAT TCGGGGCACA CGGCGCCACG 10201 GGACGCCTTG G.AGGAGATCC TCGTTGCGGT CGTACGGGAG GTGCTCGGGC TGGAGGTGGT 10261 TGGGCTCCAG CAGAGCTTCG TCGATCTTGG TGCGACATCG ATTC.ACATCG TTCGCATGAG 10321 GAGTCTGTTG CAG.AAGAGGC TGGATAGGGA GATCGCCATC ACCGAGTTGT TCCAGTACCC 10381 GAACCTCGGC TCGCTGGCGT CCGGTTTGCG CCGAGACTCG AAAGATCTAG AGCAGCGGCC 10441 10501 GAACATGCAG CAAGGGCAGG AGACGTAGCT AAGAGCGCCG AACA.AAACCA GGCCGA i j GCCAATGAAC CGCAAGCCCG CTA- -GC TCAC CCTGGG.ACTC -ATCTGATCTG 10561 10621 ATCGCGGGTA GTGTGCGCGT TGAGCCG7GT TGCTCGAACG CT3AGGAACG GTGAGCTCAT GGAAGAACAA GAGTCCTCCG CT.A7CGCAGT CA7CGGCATG 10691 TCGGGCCGTT TTCCGGGGGC Cjtataltal. 
J \ a a t) GACGAATTCT GG.AGGAACC7 TGCAGCGCTT CTCCGAGCAG 10741 10301 GAGCTCGCGG CGTCCGGAGT CGACCCAGCG CTGGTGCTGG ACCCGAACTA CGTCCGGGCG GGCAGCGTGC TGGAAGATGT .¡-c CG ~ G i CT 10361 GACGCTGCTT TCTTCGGCAT CAGCCCGCGC GAGGCAGAGC TCATGGATCC GC 3CACCGC 10921 A7CTTCATGG AATGCGCCTG GGAGGCGC7G GAGAACGCCG GATACGACCC GACAGCCTAC 10981 GAGGGCTC7 TCGGCGTGTA AACATGAGCT CGTACTTCAC GTCGAACCTC 110 1 CACGAGCACC CAGCGATGAT GCGGTGCCCC GGCTGGTTTC AGACGT7GAT CGGCAaCGAC 111C1 AAGGATTACC TCGCGACCCA CGTCTCCTAC AGGCTGAATC TGAGAGGGCC GAGCATCTCC 11161 GTTC.AAa.CT3 CCTGCTCTAC CTCGCTCGTG GCGGTTCACT TGGCGTGC ^ T GAGCCTCCTC-11221 GACCGCGAGT GCGACATGGC GCTGGCCGGC GGGATTACCG TCCGGATCCC CCATCGAGCC GGCTAT TAT 11281 ATGCTGA.GGG GGGCATCTTC TCTCCCGACG GCCATTGCCG GGCCTTCGAC 11341 GCCAAGGCGA ACGGCAC3AT CATGGGCAAC GGCTGCGGGG TTG7CCTCCT GAAGCCGCTG 1140 G.ACCGGGCGC TCTCCGATGG TGATCCCGTC CGCGCGGTCA TCCTTGGGTC TGCCACAAAC 11461 AACGACGGAG CGAGGAAGAT CGGGTTCACT GCGCCCAGTG AGGTGGGCCA GGCGCAAGCG 11521 ATCATGGAGG CGCTGGCGCT GGCAGGGGTC GAGGCCCGGT CCATCCAATA CATCGAGACC 11531 CACGGG.ACCG GCACGCTGCT CGGAGACGCC ATCGAGACGG CGGCGTTGCG GCGGGTGTTC 116 1 GATCGCGACG CTTCGACCCG GAGGTCTTGC GCGATCGGCT CCGTGAAGAC CGGCATCGGA 11"O1 CACCTCG AT CGGCGGCTGG CATCGCCGGT 7TGATCAAGA CGGTCTTGGC GCTGGAGCAC 11761 CGGCAGCTGC CGCCCAGCCT GAACTTCGAG TCTCCTAACC CATCGATCGA TTTCGCGAGC 11821 AGCCCGTTCT ACGTCAATAC CTCTCTTAAG GATTGGAATA CCGGCTCGAC TCCGCGGCGG 11881 GCCGGCGTCA GCTCGTTCGG GATCGGCGGC ACCAACGCCC ATGTCGTGCT GGAGGAAGCA 11941 CCCGCGGCGA AGCTTCCAGC CGCGGCGCCG GCGCGCTCTG CCGAGCTCTT CGTCGTCTCG 12001 GCCAAGAGCG CAGCGGCGCT GGATGCCGCG GCGGCACGGC TACGAGATCA TCTGCAGGCG 12061 CACCAGGGGC TTTCGTTGGG CGACGTCGCC TTCAGCCTGG CGACGACGCG CAGTCCCATG 12121 GAGCACCGGC TCGCGATGGC GGCACCGTCG CGCGAGGCGT TGCGAGAGGG GCTCGACGCA 12181 GCGGCGCGAG GCCAGACCCC GCCGGGCGCC GTGCGTGGCC GCTGCTCCCC AGGCAACGTG 12241 CCGAAGGTGG TCTTCGTCTT TCCCGGCCAG GGCTCTCAGT GGGTCGGTAT GGGCCGTCAG 12301 CTCCTGGCTG AGGAACCCGT CTTCCACGCG GCGCTTTCGG CGTGCGACCG GGCCATCCAG 12361 GCCGAAGCTG GTTGGTCGCT 
GCTCGCCGAG CTCGCCGCCG ACGAAGGGTC GTCCCAGATC 12421 GAGCGCATCG ACGTGGTGCA GCCGGTGCTG TTCGCGCTCG CGGTGGCATT TGCGGCGCTG 12481 TGGCGGTCGT GGGGTGTCGG GCCCGACGTC GTGATCGGCC ACAGCATGGG CGAGGTAGCC 12541 GCCGCGCATG TGGCCGGGGC GCTGTCGCTC GAGGATGCGG TGGCGATCAT AGC CTGCCGGCGC 12601 CGGCTGC TCCGGCGCAT CAGCGGTCAG GGCGAGATGG CGGTGACCGA GCTGTCGCTG 12661 GCCGAGGCCG AGGCAGCGCT CCGAGGCTAC GAGGATCGGG TGAGCGTGGC CGTGAGCAAC 12721 AGCCCGCGCT CGACGGTGCT CTCGGGCGAG CCGGCAGCGA TCGGCGAGGT GCTGTCGTCC 12781 CTGAACGCGA AGGGGGTGTT CTGCCGTCGG GTGAAGGTGG ATGTCGCCAG CCACAGCCCG 12841 CAGGTCGACC CGCTGCGCGA GGACCTCTTG GCAGCGCTGG GCGGGCTCCG GCCGCGTGCG 12901 GCTGCGGTGC CGATGCGCTC GACGGTGACG GGCGCCATGG TAGCGGGCCC GGAGCTCGGA 12961 GCGAATTACT GGATGAACAA TCTCAGGCAG CCTGTGCGCT TCGCCGAGGT AGTCC.AGGCG 13021 C.AGCTCCAAG GCGGCCACGG TCTGTTCGTG GAGATGAGCC CGCATCCGAT CCTAACGACT 13081 TCGGTCGAGG AGATGCGGCG CGCGGCCCAG CGGGCGGGCG CAGCGGTGGG CTCGCTGCGG 13141 CGAGGGCAGG ACGAGCGCCC GGCGATGCTG GAGGCGCTGG GCGCGCTGTG GGCGCAGGGC 13201 ACCCTGTAC CCTGGGGGCG GCTGTTTCCC GCGGGGGGGC GGCGGGTACC GCTGCCGACC 13261 TATCCCTGGC AGCGCGAGCG GTACTGGATC GAAGCGCCGG CCAAGAGCGC CGCGGGCGAT 13321 CGCCGCGGCG TGCGTGCGGG CGGTCACCCG C7CCTCGGTG AAATGCAGAC CCTATCAACC 13381 CAGACGAGCA CGCGGCTGTG GGAGACGACG CTGGATCTCA AGCGGCTGCC GTGGCTCGGC 13441 GACCACCGGG TGCAGGGAGC GGTCGTGTTT CCGGGCGCGG CGTACCTGGA GATGGCGAT7 13501 TCGTCGGGGG CCGAGGCTTT GGGCGATGGC CCATTGCAGA TAACCGACGT GGTGCTCGCC 13561 GAGGCGCTGG CCTTCGCGGG CG.ACGCGGCG GTGTTGGTCC AGGTGGTGAC GACGGAGCAG 13621 CCGTCGGGAC GGCTGCAGTT CCAGATCGCG AGCCGGGCGC CGGGC: TGG CCACGCGTCC 13531 TTCCGGGTCC ACGCTCGCGG CGCGT7GCTC CGAGTGGAGC GCACCGAGGT 13741 CCCGGCTGGG CTT .- CGCTTT CCGCCGTGCG CGCACGGCTC CAGGCCAGCA TGCCCGCCGC GGCCACCTAC. 
13801 GCGGAGCTGA CCGAG.ATGGG GCTGCAGTAC GGCCCTGCCT TCCAGGGGAT TGCTGAGCTA 13861 AGGGCGAG3C GCTG3GACGG GTACGCCTGC CCGACGCGGC CGGCTCGGCA 3921 3CGGAGTATC GGTTGCATCC TGCGCTGCTG GACGCGTGCT TCCAGGTCGT CGGCAGCCTC 13981 TTCGCCGGCG 3TGGCGAGGC GACGCCGTGG GTGCCCGTGG AAGTGGGCTC GCTGCGGC7C 14041 TTGCAGCGGC CTTCGGGGGA GCTGTGGTGC CATGCGCGCG TCGTG.AACCA CGGGCGCCAA. 1,101 ACCCCCGATC GGCAGGGCGC CGACTTTTGG GTGGTCGACA GCTCGGGTGC AGTG3TCGCC 14161 GAAGTC.AGCG GGCTCGTGGC GCAGCGGCTT CCGGGAGGGG TGCGCCGGCG CGAAGAAGAC 14221 GAT7GGTTCC TGGAGCTCGA GTGGG.AACCC GCAGCGGTCG GCACAGCCAA GGTCAACGCG 14281 GGCCGGTGGC TGCTCCTCGG CGGCGGCGGT GGGCTCGGCG CCGCGTTGC3 CTCGATGCTG 143 1 GAGGCCGGCG GCCATGCCGT CGTCCATGCG GCAGAGAGCA ACACGAGCGC TGCCGGCG7A 1,401 CGC3C3CTCC 7GGCAAAGGC CTTTGACGGC CAGGCTCCGA CGGC3GTGGT GCACCTCGGC 14461 AGCCTCGATG GGGGTGGCGA GCTCGACCCA GGGCTCGGGG CGCAAGGCGC ATTGGACGCG 14521 CCCCGGAGCG CCGACGTC.AG TCCCGATGCC CTCGATCCGG CGCTGGTACG TGGCTG7GAC 14581 .AGCG7GC7C7 GG.ACCGTGCA GGCCCTGGCC GGCATGGGCT TTCG.AGACGC CCCGCGATTG 146 1 7GGC7TCTGA CCCGCGGCGC ACAGGCCGTC GGCGCCGGCG ACGTCTCCGT GAC.ACAGGCA 1 7C1 CCGC7GCTGG GGCTGGGCCG CGTCATCGCC ATGGAGCACG CGGATCTGCG C7GCGC7CGG 14761 GTCGACCTCG .ATCCGACCCG GCCCGATGGG GAGCTCGGTG CCC7GCTGGC CG GCTGCTG 14821 GCCGACGACG CCGAAGCGGA AGTCGCGTTG CGCGGTGGCG AGCGATGCGT CGCTCGGATC 14881 GTCCGCCGGC AGCCCGAGAC CCGGCCCCGG GGG.GGATCG AGAGCT TCCGACCGAC GCGT 149 1 G7CACCATCC GCGCGGACAG CACCTACCTT GTGACCGGCG GTCTGGGTGG GCTCGGTCTG 15001 AGCGTGGCCG GATGGCTGGC CGAGCGCGGC GCTGGTCACC TGGTGCTGGT GGGCCGCTCC 15061 GGCGCGGCGA GCGTGGAGCA ACGGGCAGCC GTCGCGGCGC TCGAGGCCCG CGGCGCGCGC 15121 GTCACCGTGG CGAAGGCAGA TGTCGCCGAT CGGGCGCAGC TCGAGCGGAT CCTCCGCGAG 15131 GTTACCAC3T CGGGGATGCC GCTGCGGGGC GTCGTCCATG CGGCCGGC.AT CTTGGACGAC 15241 GGGCTGCTGA TGCAGCAGAC TCCCGCGCGG TTTCGTAAGG TGATGGCGCC CAAGGTCCAG 15301 GGGGCCTTGC ACCTGCACGC GTTGACGCGC GAAGCGCCGC TTTCCTTCTT CGTGCTGTAC 15361 GC7TCGGGAG TAGGGCTCTT GGGCTCGCCG GGCCAGGGCA ACT.ACGCCGC GGCCAACACG 15421 TTCCTCGACG CTCTGGCGCA CCACCGGAGG GCGCAGGGGC 
TGCCAGCGTT GAGCGTCGAC 15 81 TGGGGCCTGT TCGCGGAGGT GGGCATGGCG GCCGCGCAGG AAGATCGCGG CGCGCGGCTG 15541 GTCTCCCGCG GAATGCGGAG CCTCACCCCC GACGAGGGGC TGTCCGCTCT GGCACGGCTG 15601 CTCGAAAGCG GCCGCGTGCA GGTGGGGGTG ATGCCGGTGA ACCCGCGGCT GTGGGTGGAG 15661 CTCTACCCCG CGGCGGCGTC TTCGCGAATG TTGTCGCGCC TGGTGACGGC GCATCGCGCG 15721 AGCGCCGGCG GGCCAGCCGG GGACGGGGAC CTGCTCC GCC GCCTCGCTGC TGCCGAGCCG 15731 AGCGCGCGGA GCGGGCTCCT GGAGCCGCTC CTCCGCGCGC AGATCTCGCA GGTGCTGCGC 15841 CTCCCCGAGG GCAAGATCGA GGTGGACGCC CCGCTCACGA GCCTGGGCAT GAACTCGCTG 159 1 ATGGGGCTCG AGCTGCGCAA CCGCATCGAG GCCATGCTGG GCATCACCGT ACCGGCAACG 15961 CTGTTGTGGA CCTATCCCAC GGTGGCGGCG CTGAGCGGGC ATCTGGCGCG GGAGGCATGC 16021 GAAGCCGCTC CTGTGGAGTC ACCGCACACC ACCGCCGATT CTGCTGTCGA GATCGAGGAG 16081 ATGTCGCAGG ACGATCTGAC GCAGTTGATC GCAGCAAAAT TCAAGGCGCT 7ACATG.ACTA 16141 CTCGCGGTCC TACGGCACAG CAG.AATCCGC TGAAACAAGC GGCCATCATC ATTCAGCGGC 16201 TGGAGGAGCG GCTCGCTGGG CTCGCACAGG CGGAGCTGGA ACGGACCGAG CCGATCGCCA 16261 TCGTCGGTAT CGGCTGCCGC 7TCCCTGGCG GTGCGGACGC TCCGGAAGCG TTTTGGGAGC 16321 TGCTCGACGC GGAGCGCGAC GCGGTCCAGC CGCTCGACAG GCGCTGGGCG CTGGTAGGTG 16381 TCGCTCCCGT CGAGGCCGTG CCGCACTGGG CGGGGCTGCT CACCGAGCCG ATAGATTGCT 16441 TCGATGCTGC GTTCTTCGGC ATCTCGCCTC GGGAGGCGCG ATCGCTCGAC CCGCAGCATC 16501 GTCTGTTGCT GGAGGTCGCT TGGGAGGGGC TCGAGGACGC CGGTATCCCG CCCCGGTCCA 16561 TCGACGGGAG CCGCACCGGT GTGTTCGT CG GCGCTTTCAC GGCGGACTAC GCGCGCACGG 16621 TCGCTCGGTT GCCGCGCGAG GAGCGAGACG CGTACAGCGC CACCGGCAAC ATGCTCAGCA 16681 TCGCCGCCGG ACGGCTGTCG TACACGCTGG GGCTGCAGGG ACCTTGCCTG ACCGTCGACA 16741 CGGCGTGCTC GTCATCGCTG GTGGCGATTC ACCTCGCCTG CCGCAGCCTG CGCGCAGGAG 16801 AGAGCGATCT CGCGTTGGCG GGAGGGGTCA GCACGCTCCT C7CCCCCGAC ATGATGGAAG 16861 CCGCGGCGCG CACGCAAGCG CTGTCGCCCG ATGGTCGTTG CCGGACCTTC GATGC7TCGG 16921 CCAACGGGTT CGTCCGTGGC GAGGGCTGTG GCCTGGTCGT CCTCAAACGG CTCTCCGACG 16931 CGCAACGGGA TGGCG. 
ACCGC ATCTGGGCGC TGATCCGGGG CTCGGCCATC AACCATGATG 17041 GCCGGTCGAC CGGG7TGACC GCGCCCAACG TGCTGGCTCA GGAGACGGTC TTGCGCGAGG 17101 CCCTGCGGAG CGCCC.ACGTC GAAGCTGGGG CCGTCGATTA CGTCGAGACC CACGGAACAG January 61 GGACCTCGCT GGGCGATCCC ATCGAGGTCG AGGCGCTGCG GGCGACGGTG GGGCCGGCGC 17221 GCTCCGACGG CACACGCTGC GTGCTGGGCG CGGTGAAGAC CAACATCGGC CATCTCGAGG 17281 CCGCGGCAGG CGTAGCGGGC CTGATCAAGG CAGCGCTTTC GCTGACGCAC GAGCGCATCC 173 1 CGAGAA.ACCT CAACTTCCGC ACGCTCAATC CGCGGATCCG GCTCGAGGGC AGCGCGCTCG 17401 CGTTGGC3AC CG.AGCCGG TG CCGTGGCCGC GCACGGACCG TCCGCGCTTC GCGGGGGTGA 1"? 461 GCTCGTTCGG GATGAGCGGA ACGAACGCGC ATGTGGTGCT GGAAGAGGCG CCGGCGGTGG 17521 AGCTGTGGCC TGCCGCGCCG GAGCGCTCGG CGGAGCTTTT GGTGCTGTCG GGCAAGAGCG 17531 AGGGGGCGCT CGACGCGCAG GCGGCGCGGC TGCGCGAGCA CCTGGACATG CACCCGGAGC 17641 TCGGGCTCGG GGACGTGGCG TTCAGCCTGG CGACGACGCG CAGCGCGATG ACCCACCGGC 17701 TCGCGGTGGC GGTGACGTCG CGCGAGGGGC TGCTGGCGGC GCTTTCGGCC GTGGCGCAGG 17761 GGC.AGACGCC i 'j- saj * AA * r * ts-iu Ru- f? ~ ? - * n ~ GCGC3CTGCA 1b TCGCGAGCTC CTCGCGCGGC AAGCTGGCGT 17821 TGCTGTTCAC CGGACAGGGC GCGC.Í.GACGC CGGGC.ATGGG CCGGGGGCTC TGCGCGGCGT 17331 GGCCAGCG7T CCGGGAGGCG TTCGACCGGT GCGTGACGCT GTTCGACCGG GAGCTGGACC 17941 GCCCGCTGCG CGAGG7GATG TGGGCGGAGG CGGGGAGCGC CGAGTCGTTG TTGCTGGACC 13001 AGACGGCGTT CACCCAGCCC GCGCTC7TCG CGGTGGAGTA CGCGCTGACG GCGCTGTGGC 13061 oo: AUI UUU CGTAGAGCCG GAGCTCCTGG TTGGGCATAG C.ATCGGGGAG C i anGTGGCtjG 18121 CGTGCGTG3C GGGGGTG7TC TCGCTGGAAG ATGGGGTGAG GCTCGTGGCG GCGCGCGGGC 13181 GGCTGATGCA GGGGCTCTCG GC3GGCGGCG CGATGGTGTC GCTCGGAGCG 13241 AGGTGGCCGC GjCGGx G v- CGTGGGTGTC G.ATCGCGGCG GTCAATGGGC 13301 CGGAGCAGGT GGTGATCGCG GGCGTGGAGC AAGCGGTGCA GGCGATCGCG GCGGGG7TCG 13361 CGGCGCGCGG CGTGCGCACC A.AGCGGCTGC ATGTCTCGCA CGCGTTCCAC TCGCCGCTGA 18421 TGGAACC3AT GCTGGAGGAG T7CGGGCGGG TGGCGGCGTC GGTGACGTAC CGGCGGCCAA 13481 GCGTTTCGCT GGTGAGCAAC CTGAGCGGGA AGGTGGTCAC GGACG.AGCTG AGCGCGCCGG 18541 GCTACTGGGT GCGGCACGTG CGGGAGGCGG TGCGCTTCGC GGACGGGGTG. 
AAGGCGCTGC 13601 ACGAAGCCGG CGCGGGCACG TTCCTCGAAG TGGGCCCGAA GCCGACGCTG CTCGGCCTGT 18661 7GCCAGCTTG CCTGCCGGAG GCGGAGCCGA CGTTGCTGGC GTCGTTGCGC G ^ CGGGCGCG 13721 AGGAGGCTGC GGGGGTGCTC GAGGCGCTGG GCAGGCTGTG GGCCGCTGGC GGCTCGGTCA 18781 GCTGGCCGGG CGTCTTCCCC ACGGCTGGGC GGCGGGTGCC GCTGCCGACC TATCCGTGGC 13841 AGCGGCAGCG GTACTGGATC GAGGCGCCGG CCGAAGGGCT CGGAGCCACG GCCGCCGATG 18901 CGCTGGCGCA GTGGTTCTAC CGGGTGGACT GGCCCGAGAT GCCTCGCTCA TCCGTGGATT 18961 CGCGGCGAGC CCGGTCCGGC GGGTGGCTGG TGCTGGCCGA CCGGGGTGGA GTCGGGGAGG 19021 CGGCCGCGGC GGCGCTTTCG TCGCAGGGAT GTTCGTGCGC CGTGCTCCAT GCGCCCGCC G 19081 AGGCCTCCGC GGTCGCCGAG CAGGTGACCC AGGCCCTCGG TGGCCGCAAC GACTGGCAGG 19141 GGGTGCTGTA CCTGTGGGGT CTGGACGCCG TCGTGGAGGC GGGGGCATCG GCCGAAGAGG 19201 TCGGC.AAAGT CACCCATCTT GCCACGGCGC CGGTGCTCGC GCTGATTCAG GCGGTGGGCA 19261 CGGGGCCGCG CTCACCCCGG CTCTGGATCG TGACCCGAGG GGCCTGCACG GTGGGCGGCG 19321 AGCCTGACGC TGCCCCCTGT CAGGCGGCGC TGTGGGGTAT GGGCCGGGTC GCGGCGCTGG 9381 A3CATCCCGG CTCCTGGGGC GGGCTCGTGG ACCTGGATCC GGAGGAGAGC CCGACGGAGG 19441 TCGAGGCCCT GGTGGCCGAG CTGCTTTCGC CGGACGCCGA GGATCAGCTG GCATTCCGCC 19501 AGGGGCGCCG GCGCGCAGCG CGGCTCGTGG CCGCCCCACC GGAGGGAAAC GCAGCGCCGG 19561 TGTCGCTGTC TGCGGAGGGG AGTTACTTGG TGACGGGTGG GCTGGGCGCC CTTGGCCTCC 19621 TCGTTGCGCG GTGGTTGGTG GAGCGCGGGG CGGGGCACCT TGTGCTGATC AGCCGGCACG 19681 GATTGCCCGA CCGCGAGGAA TGGGGCCGAG ATCAGCCGCC AGAGGTGCGC GCGCGCATTG 19741 CGGCGATCGA GGCGCTGGAG GCGCAGGGCG CGCGGGTCAC CGTGGCGGCG GTCGACGTGG 19801 CCGATGCCGA AGGCATGGCG GCGCTCTTGG CGGCCGTCGA GCCGCCGCTG CGGGGGGTCG 19861 TGCACGCCGC GGGTCTGCTC GACGACGGGC TGCTGGCCCA CCAGGACGCC G GTCGGCTCG 19921 CCCGGGTGTT GCGCCCCAAG GTGGAGGGGG CATGGGTGCT GCACACCCTT ACCCGCGAGC 19981 AGCCGCTGGA CCTCTTCGTA CTGTTTTCCT CGGCGTCGGG CGTCTTCGGC TCGATCGGCC 20041 AGGGCAGCTA CGCGGCAGGC AATGCCTTTT TGGACGCGCT GGCGGACCTC CGTCGAACGC 20101 AGGGGCTCGC CGCCCTGAGC .ATCGCCTGGG GCCTGTGGGC GGAGGGGGGG ATGGGCTCGC 20161 AGGCGCAGCG CCGGGAACAT GAGGCATCGG GAATCTGGGC GATGCCGACG AGTCGTGCCC 20221 TGGCGGCGAT GGAATGGCTG CTCGGTACGC 
GCGCGACGCA GCGCGTGGTC ATCCAGATGG 20281 ATTGGGCCCA TGCGGGAGCG GCTCCGCGCG ACGCGAGCCG AGGCCGCTTC TGGGATCGGC 20341 TGGTAACTGT CACGAAAGCG GCCTCCTCCT CGGCCGTGCC AGCTGTAGAG CGCTGGCGCA 20401 ACGCGTCTGT TGTGGAGACC CGCTCGGCGC TCTACGAGCT TGTGCGCGGC GTGGTCGCCG 20461 GGGTGATGGG CTTTACCGAC CAAGGCACGC TCGACGTGCG ACGAGGCTTC GCCGAGCAGG 20521 GCCTCGACTC CCTGATGGCT GTGGAGATCC GCAAACGGCT TCAGGGTGAG CTGGGTATGC 20581 CGCTGTCGGC GACGCTGGCG TTCGACCATC CGACCGTGGA GCGGCTGGTG GAATACTTGC 20641 TGAGCCAGGC GCTGGAGCTG CAGGACCGCA CCGACGTGCG AAGCGTTCGG TTGCCGGCGA 20701 CAGAGGACCC GATCGCCATC GTGGGTGCCG CCTGCCGCTT CCCGGGCGGG GTCGAGGACC 20761 TGGAGTCCTA CTGGCAGCTG TTGACCGAGG GCGTGGTGGT CAGCACCGAG GTGCCGGCCG 20821 ACCGGTGGAA TGGGGCAGAC GGGCGCGGCC CCGGCTCGGG AGAGGCTCCG AGACAGACCT 20881 ACGTGCCCAG GGGTGGCTTT CTGCGCGAGG TGGAGACGTT CGATGCGGCG TTCTTCCACA 20941 TCTCGCCTCG GGAGGCGATG AGCCTGGACC CGCAACAGCG GCTGCTGCTG GAAGTGAGCT 21001 GGGAGGCGAT CG CG • V > - GGCCAGGACC CGTCGGCGCT GCGCGAGAGC CCCACGGGCG 21061 TGTTCGTGGG CGCGGGCCCC AACGAATATG CCGAGCGGGT GCAGGACCTC GCCGATGAGG 21121 CGGCGGGGCT CTACAGCGGC ACCGGCAACA TGCTCAGCGT TGCGGCGGGA CGGCTGTCAT 21181 TTTTCCTGGG CCTGCACGGG CCGACCCTGG CTGTGGATAC GGCGTGCTCC TCGTCGCTCG 21241 TGGCGCTGCA CCTCGGCTGC CAGAGCTTGC GACGGGGCGA GTGCGACCAA GCCCTGGTTG 21301 GCGGGGTCAA CATGCTGCTC TCGCCGAAGA CCTTCGCGCT GCTCTCACGG ATGCACGCGC 21361 TTTCGCCCGG CGGGCGGTGC AAGACGTTCT CGGCCGACGC GGACGGCTAC GCGCGGGCCG 21421 AGGGCTGCGC CGTGGTGGTG CTCAAGCGGC TCTCCGACGC GCAGCGCGAC CGCGACCCCA 21481 TCCTGGCGGT GATCCGGGGT ACGGCGATCA ATCATGATGG CCCGAGCAGC GGGCTGACAG 21541 TGCCCAGCGG CCCTGCCCAG GAGGCGCTGT TACGCCAGGC GCTGGCGCAC GCAGGGGTGG 21601 TTCCGGCCGA CGTCGATTTC GTGGAATGCC ACGGGACCGG GACGGCGCTG GGCGACCCGA 21661 TCGAGGTGCG GGCGCTGAGC GACGTGTACG GGCAAGCCCG CCCTGCGGAC CGACCGCTGA 21721 TCCTGGGAGC CGCCAAGGCC AACCTTGGGC ACATGGAGCC CGCGGCGGGC CTGGCCGGCT 21781 TGCTCAAGGC GGTGCTCGCG CTGGGGCAAG AGCAAATACC AGCCCAGCCG GAGCTGGGCG 21841 AGCTCAACCC GCTCTTGCCG TGGGAGGCGC TGCCGGTGGC GGTGGCCCGC GCAGCGGTGC 21901
CGTGGCCGCG CACGGACCGT CCGCGCTTCG CGGGGGTGAG CTCGTTCGGG ATGAGCGGAA 21961 CGAACGCGCA TGTGGTGCTG GAAGAGGCGC CGGCGGTGGA GCTGTGGCCT GCCGCGCCGG 22021 AGCGCTCGGC GGAGCTTTTG GTGCTGTCGG GCAAGAGCGA GGGGGCGCTC GACGCGCAGG 22081 CGGCGCGGCT GCGCGAGCAC CTGGACATGC ACCCGGAGCT CGGGCTCGGG GACGTGGCGT 22141 TCAGCCTGGC GACGACGCGC AGCGCGATGA ACCACCGGCT CGCGGTGGCG GTGACGTCGC 22201 GCGAGGGGCT GCTGGCGGCG CTTTCGGCCG TGGCGCAGGG GCAGACGCCG CCGGGGGCGG 22261 CGCGCTGCAT CGCGAGCTCG TCGCGCGGCA AGCTGGCGTT CCTGTTCACC GGACAGGGCG 22321 CGCAGACGCC GGGCATGGGC CGGGGGCTCT GCGCGGCGTG GCCAGCGTTC CGAGAGGCGT 22381 TCGACCGGTG CGTGGCGCTG TTCGACCGGG AGCTGGACCG CCCGCTGTGC GAGGTGATGT 22441 GGGCGGAGCC GGGGAGCGCC GAGTCGTTGT TGCTCGACCA GACGGCGTTC ACCCAGCCCG 22501 CGCTCTTCAC GGTGGAGTAC GCGCTGACGG CGCTGTGGCG GTCGTGGGGC GTAGAGCCGG 22561 AGCTGGTGGC TGGGCATAGC GCCGGGGAGC TGGTGGCGGC GTGCGTGGCG GGGGTGTTCT 22621 CGCTGGAAGA TGGGGTGAGG CTCGTGGCGG CGCGCGGGCO GCTGATGCAG GGGCTCTCGG 22681 CGGGCGGCGC GATGGTGTCG CTCGGAGCGC CGGAGGCGGA GGTGGCCGCG GCGGTGGCGC 22741 CGCACGCGGC GTGGGTGTCG ATCGCGGCGG TCAATGGGCC GGAGCAGGTG GTGATCGCGG 22801 GCGTGGAGCA AGCGGTGCAG GCGATCGCGG CGGGGTTCGC GGCGCGCGGC GTGCGCACCA 22861 AGCGGCTGCA TGTCTCGCAC GCATCCCACT CGCCGCTGAT GGAACCGATG CTGGAGGAGT 22921 TCGGGCGGGT GGCGGCGTCG GTGACGTACC GGCGGCCAAG CGTTTCGCTG GTGAGCAACC 22981 TGAGCGGGAA GGTGGTCACG GACGAGCTGA GCGCGCCGGG CTACTGGGTG CGGCACGTGC 23041 GGGAGGCGGT GCGCTTCGCG GACGGGGTGA AGGCGCTGCA CGAAGCCGGC GCGGGGACGT 23101 TCCTCGAAGT GGGCCCGAAG CCGACGCTGC TCGGCCTGTT GCCAGCTTGC CTGCCGGAGG 23161 CGGAGCCGAC GCTGCTGGCG TCGTTGCGCG CCGGGCGCGA GGAGGCTGCG GGGGTGCTCG 23221 AGGCGCTGGG CAGGCTGTGG GCCGCCGGCG GCTCGGTCAG CTGGCCGGGC GTCTTCCCCA 23281 CGGCTGGGCG GCGGGTGCCG CTGCCGACCT ATCCGTGGCA GCGGCAGCGG TACTGGCCCG 23341 ACATCGAGCC TGACAGCCGT CGCCACGCAG CCGCGGATCC GACCCAAGGC TGGTTCTATC 23401 GCGTGGACTG GCCGGAGATA CCTCGCAGCC TCCAGAAATC AGAGGAGGCG AGCCGCGGGA 23461 GCTGGCTGGT ATTGGCGGAT AAGGGTGGAG TCGGCGAGGC GGTCGCTGCA GCGCTGTCGA 23521 CACGTGGACT TCCATGCGTC GTGCTCCATG CGCCGGCAGA
GACATCCGCG ACCGCCGAGC 23581 TGGTGACCGA GGCTGCCGGC GGTCGAAGCG ATTGGCAGGT AGTGCTCTAC CTGTGGGGTC 23641 TGGACGCCGT CGTCGGCGCG GAGGCGTCGA TCGATGAGAT CGGCGACGCG ACCCGTCGTG 23701 CTACCGCGCC GGTGCTCGGC TTGGCTCGGT TTCTGAGCAC CGTGTCTTGT TCGCCCCGAC 23761 TCTGGGTCGT GACCCGGGGG GCATGCATCG TTGGCGACGA GCCTGCGATC GCCCCTTGTC 23821 AGGCGGCGTT ATGGGGCATG GGCCGGGTGG CGGCGCTCGA GCATCCCGGG GCCTGGGGCG 23881 GGCTCGTGGA CCTGGATCCC CGAGCGAGCC CGCCCCAAGC CAGCCCGATC GACGGCGAGA 23941 TGCTCGTCAC CGAGCTATTG TCGCAGGAGA CCGAGGACCA GCTCGCCTTC CGCCATGGGC 24001 GCCGGCACGC GGCACGGCTG GTGGCCGCCC CGCCACGGGG GGAAGCGGCA CCGGCGTCGC 24061 TGTCTGCGGA GGCGAGCTAC CTGGTGACGG GAGGCCTCGG TGGGCTGGGC CTGATCGTGG 24121 CCCAGTGGCT GGTGGAGCTG GGAGCGCGGC ACTTGGTGCT GACCAGCCGG CGCGGGTTGC 24181 CCGACCGGCA GGCGTGGCGC GAGCAGCAGC CGCCTGAGAT CCGCGCGCGG ATCGCAGCGG 24241 TCGAGGCGCT GGAGGCGCGG GGTGCACGGG TGACCGTGGC AGCGGTGGAC GTGGCCGACG 24301 TCGAACCGAT GACAGCGCTG GTTTCGTCGG TCGAGCCCCC GCTGCGAGGG GTGGTGCACG 24361 CCGCTGGCGT CAGCGTCATG CGTCCACTGG CGGAGACGGA CGAGACCCTG CTCGAGTCGG 24421 TGCTCCGTCC CAAGGTGGCC GGGAGCTGGC TGCTGCACCG GCTGCTGCAC GGCCGGCCTC 24481 TCGACCTGTT CGTGCTGTTC TCGTCGGGCG CAGCGGTGTG GGGTAGCCAT AGCCAGGGTG 24541 CGTACGCGGC GGCCAACGCT TTCCTCGACG GGCTCGCGCA TCTTCGGCGT TCGCAATCGC 24601 TGCCTGCGTT GAGCGTCGCG TGGGGTCTGT GGGCCGAGGG AGGCATGGCG GACGCGGAGG 24661 CTCATGCACG TCTGAGCGAC ATCGGGGTTC TGCCCATGTC GACGTCGGCA GCGTTGTCGG 24721 CGCTCCAGCG CCTGGTGGAG ACCGGCGCGG CTCAGCGCAC GGTGACCCGG ATGGACTGGG 24781 CGCGCTTCGC GCCGGTGTAC ACCGCTCGAG GGCGTCGCAA CCTGCTTTCG GCGCTGGTCG 24841 CAGGGCGCGA CATCATCGCG CCTTCCCCTC CGGCGGCAGC AACCCGGAAC TGGCGTGGCC 24901 TGTCCGTTGC GGAAGCCCGC ATGGCTCTGC ACGAGGTCGT CCATGGGGCC GTCGCTCGGG 24961 TGCTGGGCTT CCTCGACCCG AGCGCGCTCG ATCCTGGGAT GGGGTTCAAT GAGCAGGGCC 25021 TCGACTCGTT GATGGCGGTG GAGATCCGCA ACCTCCTTCA GGCTGAGCTG GACGTGCGGC 25081 TTTCGACGAC GCTGGCCTTT GATCATCCGA CGGTACAGCG GCTGGTGGAG CATCTGCTCG 25141 TCGATGTACT GAAGCTGGAG GATCGCAGCG ACACCCAGCA TGTTCGGTCG TTGGCGTCAG 25201 ACGAGCCCAT CGCCATCGTG
GGAGCCGCCT GCCGCTTCCC GGGCGGGGTG GAGGACCTGG 25261 AGTCCTACTG GCAGCTGTTG GCCGAGGGCG TGGTGGTCAG CGCCGAGGTG CCGGCCGACC 25321 GGTGGGATGC GGCGGACTGG TACGACCCTG ATCCGGAGAT CCCAGGCCGG ACTTACGTGA 25381 CCAAAGGCGC CTTCCTGCGC GATTTGCAGA GATTGGATGC GACCTTCTTC CGCATCTCGC 25441 CTCGCGAGGC GATGAGCCTC GACCCGCAGC AGCGGTTGCT CCTGGAGGTA AGCTGGGAGG 25501 CGCTCGAGAG CGCGGGTATC GCTCCGGATA CGCTGCGAGA TAGCCCCACC GGGGTGTTCG 25561 TGGGTGCGGG GCCCAATGAG TACTACACGC AGCGGCTGCG AGGCTTCACC GACGGAGCGG 25621 CAGGGCTGTA CGGCGGCACC GGGAACATGC TCAGCGTTGC GGCTGGACGG CTGTCGTTTT 25681 TCCTGGGTCT GCACGGCCCG ACGCTGGCCA TGGATACGGC GTGCTCGTCC TCCCTGGTCG 25741 CGCTGCACCT CGCCTGCCAG AGCCTGCGAC TGGGCGAGTG CGATCAAGCG CTGGTTGGCG 25801 GGGTCAACGT GCTGCTCGCG CCGGAGACCT TCGTGCTGCT CTCACGGATG CGCGCGCTTT 25861 CGCCCGACGG GCGGTGCAAG ACGTTCTCGG CCGACGCGGA CGGCTACGCG CGGGGCGAGG 25921 GGTGCGCCGT GGTGGTGCTC AAGCGGCTGC GCGATGCGCA GCGCGCCGGC GACTCCATCC 25981 TGGCGCTGAT CCGGGGAAGC GCGGTGAACC ACGACGGCCC GAGCAGCGGG CTGACCGTGC 26041 CCAACGGACC CGCCCAGCAA GCATTGCTGC GCCAGGCGCT TTCGCAAGCA GGCGTGTCTC 26101 CGGTCGACGT TGATTTTGTG GAGTGTCACG GGACAGGGAC GGCGCTGGGC GACCCGATCG 26161 AGGTGCAGGC GCTGAGCGAG GTGTATGGTC CAGGGCGCTC CGAGGATCGA CCGCTGGTGC 26221 TGGGGGCCGT CAAGGCCAAC GTCGCGCATC TGGAGGCGGC ATCCGGCTTG GCCAGCCTGC 26281 TCAAGGCCGT GCTTGCGCTG CGGCACGAGC AGATCCCGGC CCAGCCGGAG CTGGGGGAGC 26341 TCAACCCGCA CTTGCCGTGG AACACGCTGC CGGTGGCGGT GCCACGTAAG GCGGTGCCGT 26401 GGGGGCGCGG CGCACGGCCG CGTCGGGCCG GCGTGAGCGC GTTCGGGTTG AGCGGAACCA 26461 ACGTGCATGT CGTGCTGGAG GAGGCACCGG AGGTGGAGCT GGTGCCCGCG GCGCCGGCGC 26521 GACCGGTGGA GCTGGTTGTG CTATCGGCCA AGAGCGCGGC GGCGCTGGAC GCCGCGGCGG 26581 AACGGCTCTC GGCGCACCTG TCCGCGCACC CGGAGCTGAG CCTCGGCGAC GTGGCGTTCA 26641 GCCTGGCGAC GACGCGCAGC CCGATGGAGC ACCGGCTCGC CATCGCGACG ACCTCGCGCG 26701 AGGCCCTGCG AGGCGCGCTG GACGCCGCGG CGCAGCGGCA GACGCCGCAG GGCGCGGTGC 26761 GCGGCAAGGC CGTGTCCTCA CGCGGTAAGT TGGCTTTCCT GTTCACCGGA CAGGGCGCGC 26821 AAATGCCGGG CATGGGCCGT GGGCTGTACG AGGCGTGGCC AGCGTTCCGG GAGGCGTTCG
26881 ACCGGTGCGT GGCGCTCTTC GATCGGGAGC TCGACCAGCC TCTGCGCGAG GTGATGTGGG 26941 CTGCGCCGGG CCTCGCTCAG GCGGCGCGGC TCGATCAGAC CGCGTACGCG CAGCCGGCTC 27001 TCTTTGCGCT GGAGTACGCG CTGGCTGCCC TGTGGCGTTC GTGGGGCGTG GAGCCGCACG 27061 TACTCCTCGG TCATAGCATC GGCGAGCTGG TCGCCGCCTG CGTGGCGGGC GTGTTCTCGC 27121 TCGAAGACGC GGTGAGGTTG GTGGCCGCGC GCGGGCGGCT GATGCAGGCG CTGCCCGCCG 27181 GCGGTGCCAT GGTCGCCATC GCAGCGTCCG AGGCCGAGGT GGCCGCCTCC GTGGCACCCC 27241 ACGCCGCCAC GGTGTCGATC GCCGCGGTCA ACGGTCCTGA CGCCGTCGTG ATCGCTGGCG 27301 CCGAGGTACA GGTGCTCGCC CTCGGCGCGA CGTTCGCGGC GCGTGGGATA CGCACGAAGA 27361 GGCTCGCCGT CTCCCATGCG TTCCACTCGC CGCTCATGGA TCCGATGCTG GAAGACTTCC 27421 AGCGGGTCGC TGCGACGATC GCGTACCGCG CGCCAGACCG CCCGGTGGTG TCGAATGTCA 27481 CCGGCCACGT CGCAGGCCCC GAGATCGCCA CGCCCGAGTA TTGGGTCCGG CATGTGCGAA 27541 GCGCCGTGCG CTTCGGCGAT GGGGCAAAGG CGTTGCATGC CGCGGGTGCC GCCACGTTCG 27601 TCGAGATTGG CCCGAAGCCG GTCCTGCTCG GGCTATTGCC AGCGTGCCTC GGGGAAGCGG 27661 ACGCGGTCCT CGTGCCGTCG CTACGCGCGG ACCGCTCGGA ATGCGAGGTG GTCCTCGCGG 27721 CGCTCGGGAC TTGGTATGCC TGGGGGGGTG CGCTCGACTG GAAGGGCGTG TTCCCCGATG 27781 GCGCGCGCCG CGTGGCTCTG CCCATGTATC CATGGCAGCG TGAGCGCCAT TGGATGGACC 27841 TCACCCCGCG AAGCGCCGCG CCTGCAGGGA TCGCAGGTCG CTGGCCGCTG GCTGGTGTCG 27901 GGCTCTGCAT GCCCGGCGCT GTGTTGCACC ACGTGCTCTC GATCGGACCA CGCCATCAGC 27961 CCTTCCTCGG TGATCACCTC GTGTTTGGCA AGGTGGTGGT GCCCGGCGCC TTTCATGTCG 28021 CGGTGATCCT CAGCATCGCC GCCGAGCGCT GGCCCGAGCG GGCGATCGAG CTGACAGGCG 28081 TGGAGTTCCT GAAGGCGATC GCGATGGAGC CCGACCAGGA GGTCGAGCTC CACGCCGTGC 28141 TCACCCCCGA AGCCGCCGGG GATGGCTACC TGTTCGAGCT GGCGACCCTG GCGGCGCCGG 28201 AGACCGAACG CCGATGGACG ACCCACGCCC GCGGTCGGGT GCAGCCGACA GACGGCGCGC 28261 CCGGCGCGTT GCCGCGCCTC GAGGTGCTGG AGGACCGCGC GATCCAGCCC CTCGACTTCG 28321 CCGGATTCCT CGACAGGTTA TCGGCGGTGC GGATCGGCTG GGGTCCGCTT TGGCGATGGC 28381 TGCAGGACGG GCGCGTCGGC GACGAGGCCT CGCTTGCCAC CCTCGTGCCG ACCTATCCGA 28441 ACGCCCACGA CGTGGCGCCC TTGCACCCGA TCCTGCTGGA CAACGGCTTT GCGGTGAGCC 28501 TGCTGGCAAC CCGGAGCGAG CCGGAGGACG ACGGGACGCC
CCCGCTGCCG TTCGCCGTGG 28561 AACGGGTGCG GTGGTGGCGG GCGCCGGTTG GAAGGGTGCG GTGTGGCGGC GTGCCGCGGT 28621 CGCAGGCATT CGGTGTCTCG AGCTTCGTGC TGGTCGACGA AACTGGCGAG GTGGTCGCTG 28681 AGGTGGAGGG ATTTGTTTGC CGCCGGGCGC CGCGAGAGGT GTTCCTGCGG CAGGAGTCGG 28741 GCGCGTCGAC TGCAGCCTTG TACCGCCTCG ACTGGCCCGA AGCCCCCTTG CCCGATGCGC 28801 CTGCGGAACG GATGGAGGAG AGCTGGGTCG TGGTGGCAGC ACCTGGCTCG GAGATGGCCG 28861 CGGCGCTCGC AACACGGCTC AACCGCTGCG TACTCGCCGA ACCCAAAGGC CTCGAGGCGG 28921 CCCTCGCGGG GGTGTCTCCC GCAGGTGTGA TCTGCCTCTG GGAACCTGGA GCCCACGAGG 28981 AAGCTCCGGC GGCGGCGCAG CGTGTGGCGA CCGAGGGCCT TTCGGTGGTG CAGGCGCTCA 29041 GGGATCGCGC GGTGCGCCTG TGGTGGGTGA CCACGGGCGC CGTGGCTGTC GAGGCCGGTG 29101 AGCGGGTGCA GGTCGCCACA GCGCCGGTAT GGGGCCTGGG CCGGACAGTG ATGCAGGAGC 29161 GCCCGGAGCT CAGCTGCACT CTGGTGGATT TGGAGCCGGA GGTCGATGCC GCGCGTTCAG 29221 CTGACGTTCT GCTGCGGGAG CTCGGTCGCG CTGACGACGA GACCCAGGTG GTTTTCCGTT 29281 CCGGAGAGCG CCGCGTAGCG CGGCTGGTCA AAGCGACAAC CCCCGAAGGG CTCTTGGTCC 29341 CTGACGCAGA ATCCTATCGA CTGGAGGCTG GGCAGAAGGG CACATTGGAC CAGCTCCGCC 29401 TCGCGCCGGC ACAGCGCCGG GCACCCGGCC CGGGCGAGGT CGAGATCAAG GTAACCGCCT 29461 CGGGGCTCCA CTTCCGGACC GTCCTCGCTG TGCTGGGAAT GTATCCGGGC GACGCTGGGC 29521 CGATGGGCGG AGATTGTGCC GGTATCGTCA CGGCGGTGGG CCAGGGGGTG CACCACCTCT 29581 CGGTCGGCGA TGCTGTCATG ACGCTGGGGA CGTTGCATCG ATTCGTCACG GTCGACGCGC 29641 GGCTGGTGGT CCGGCAGCCT GCAGGGCTGA CTCCCGCGCA GGCAGCTACG GTGCCGGTTG 29701 CGTTCCTGAC GGCCTGGCTC GCTCTGCACG ACCTGGGGAA TCTGCGGCGC GGCGAGCGGG 29761 TGCTGATCCA TGCTGCGGCC GGCGGCGTGG GCATGGCCGC GGTGCAAATC GCCCGATGGA 29821 TAGGGGCCCCGA GGTGTTCGCC ACGGCGAGCC CGTCCAAGTG GGCAGCGGTT CAGGCCATGG 29881 GCGTGCCGCG CACGCACATC GCCAGCTCGC GGACGCTGGA GTTTGCTGAG ACGTTCCGGC 29941 AGGTCaACCGG CGGCCGGGGC GTGGACGTGG TGCTCAACGC GCTGGCCGGC GAGTTCGTGG 30001 ACGCGAGCCT GTCCCTGCTG ACGACGGGCG GGCGGTTCCT CGAGATGGGC AAGACCGACA 30061 TACGGGATCG AGCCGCGGTC GCGGCGGCGC ATCCCGGTGT TCGCTATCGG GTATTCGACA 30121 TCCTGGAGCT CGCTCCGGAT CGAACTCGAG AGATCCTCGA GCGCGTGGTC GAGGGCTTTG 30181 CTGCGGGACA
TCTGCGCGCA TTGCCGGTGC ATGCGTTCGC GATCACCAAG GCCGAGGCAG 30241 CGTTTCGGTT CATGGCGCAA GCGCGGCATC AGGGCAAGGT CGTGCTGCTG CCGGCGCCCT 30301 CCGCAGCGCC CTTGGCGCCG ACGGGCACCG TACTGCTGAC CGGTGGGCTG GGAGCGTTGG 30361 GGCTCCACGT GGCCCGCTGG CTCGCCCAGC AGGGCGCGCC GCACATGGTG CTCACAGGTC 30421 GGCGGGGCCT GGATACGCCG GGCGCTGCCA AAGCCGTCGC GGAGATCGAA GCGCTCGGCG 30481 CTCGGGTGAC GATCGCGGCG TCGGATGTCG CCGATCGGAA CGCGCTGGAG GCTGTGCTCC 30541 AGGCCATTCC GGCGGAGTGG CCGTTACAGG GCGTGATCCA TGCAGCCGGA GCGCTCGATG 30601 ATGGTGTGCT TGATGAGCAG ACCACCGACC GCTTCTCGCG GGTGCTGGCA CCGAAGGTGA 30661 CTGGCGCCTG GAATCTGCAT GAGCTCACGG CGGGCAACGA TCTCGCTTTC TTCGTGCTGT 30721 TCTCCTCCAT GTCGGGGCTC TTGGGCTCGG CCGGGCAGTC CAACTATGCG GCGGCCAACA 30781 CCTTCCTCGA CGCGCTGGCC GCGCATCGGC GGGCCGAAGG CCTGGCGGCG CAGAGCCTCG 30841 CGTGGGGCCC ATGGTCGGAC GGAGGCATGG CAGCGGGGCT CAGCGCGGCG CTGCAGGCGC 30901 GGCTCGCTCG GCATGGGATG GGAGCGCTGT CGCCCGCTCA GGGCACCGCG CTGCTCGGGC 30961 AGGCGCTGGC TCGGCCGGAA ACGCAGCTCG GGGCGATGTC GCTCGACGTG CGTGCGGCAA 31021 GCCAAGCTTC GGGAGCGGCA GTGCCGCCTG TGTGGCGCGC GCTGGTGCGC GCGGAGGCGC 31081 GCCATGCGGC GGCTGGGGCG CAGGGGGCAT TGGCCGCGCG CCTTGGGGCG CTGCCCGAGG 31141 CGCGTCGCGC CGACGAGGTG CGCAAGGTCG TGCAGGCCGA GATCGCGCGC GTGCTTTCAT 31201 VJG GC oO > - C GAGCGCCGTG CCCGTCGATC GGCCGCTGTC GGACTTGGGC CTCGACTCGC 31261 TCACGGCGGT GGAGCTGCGC AACGTGCTCG GCCAGCGGGT GGGTGCGACG CTGCCGGCGA 31321 CGCTGGCATT CGATCACCCG ACGGTCGACG CGCTCACGCG CTGGCTGCTC GATAAGGTCC 31381 TGGCCGTGGC CGAGCCGAGC GTATCGCCCG CAAAGTCGTC GCCGCAGGTC GCCCTCGACG 31441 AGCCCATTGC GGTGATCGGC ATCGGCTGCC GTTTCCCAGG CGGCGTGACC GATCCGGAGT 31501 CGTTTTGGCG GCTGCTCGAA GAGGGCAGCG ATGCCGTCGT CGAGGTGCCG CATGAGCGAT 31561 GGGACATCGA CGCGTTCTAT GATCCGGATC CGGATGTGCG CGGCAAGATG ACGACACGCT 31621 TTGGCGGCTT CCTGTCCGAT ATCGACCGGT TCGAGCCGGC CTTCTTCGGC ATCTCGCCGC 31681 GCGAAGCGAC GACCATGGAT CCGCAGCAGC GGCTGCTCCT GGAGACGAGC TGGGAGGCGT 31741 TCGAGCGCGC CGGGATTTTG CCCGAGCGGC TGATGGGCAG CGATACCGGC GTGTTCGTGG 31801 GGCTCTTCTA CCAGGAGTAC GCTGCGCTCG CCGGCGGCAT CGAGGCGTTC
GATGGCTATC 31861 TAGGCACCGG CACCACGGCC AGCGTCGCCT CGGGCAGGAT CTCTTATGTG CTCGGGCTAA 31921 AGGGGCCGAG CCTGACGGTG GACACCGCGT GCTCCTCGTC GCTGGTCGCG GTGCACCTGG 31981 CCTGCCAGGC GCTGCGGCGG GGCGAGTGTT CGGTGGCGCT GGCCGGCGGC GTGGCGCTGA 32041 TGCTCACGCC GGCGACGTTC GTGGAGTTCA GCCGGCTGCG AGGCCTGGCT CCCGACGGAC 32101 GGTGCAAGAG CTTCTCGGCC GCAGCCGACG GCGTGGGGTG GAGCGAAGGC TGCGCCATGC 32161 TCCTGCTCAA ACCGCTTCGC GATGCTCAGC GCGATGGGGA TCCGATCCTG GCGGTGATCC 32221 GCGGCACCGC GGTGAACCAG GATGGGCGCA GCAACGGGCT GACGGCGCCC AACGGGTCGT 32281 CGCAGCAAGA GGTGATCCGT CGGGCCCTGG AGCAGGCGGG GCTGGCTCCG GCGGACGTCA 32341 GCTACGTCGA GTGCCACGGC ACCGGCACGA CGTTGGGCGA CCCCATCGAA GTGCAGGCCC 32401 TGGGCGCCGT GCTGGCACAG GGGCGACCCT CGGACCGGCC GCTCGTGATC GGGTCGGTGA 32461 AGTCCAATAT CGGACATACG CAGGCTGCGG CGGGCGTGGC CGGTGTCATC AAGGTGGCGC 32521 TGGCGCTCGA GCGCGGGCTT ATCCCGAGGA GCCTGCATTT CGACGCGCCC AATCCGCACA 32581 TTCCGTGGTC GGAGCTCGCC GTGCAGGTGG CCGCCAAACC CGTCGAATGG ACGAGAAACG 32641 GCGCGCCGCG ACGAGCCGGG GTGAGCTCGT TTGGCGTCAG CGGGACCAAC GCGCACGTGG 32701 TGCTGGAGGA GGCGCCAGCG GCGGCGTTCG CGCCCGCGGC GGCGCGTTCA GCGGAGCTTT 32761 TCGTGCTGTC GGCGAAGAGC GCCGCGGCGC TGGACGCGCA GGCGGCGCGG CTTTCGGCGC 32821 ATGTCGTTGC GCACCCGGAG CTCGGCCTCG GCGACCTGGC GTTCAGCCTG GCGACGACCC 32881 GCAGCCCGAT GACGTACCGG CTCGCGGTGG CGGCGACCTC GCGCGAGGCG CTGTCTGCGG 32941 CGCTCGACAC AGCGGCGCAG GGGCAGGCGC CGCCCGCAGC GGCTCGCGGC CACGCTTCCA 33001 CAGGCAGCGC CCCAAAGGTG GTTTTCGTCT TTCCTGGCCA GGGCTCCCAG TGGCTGGGCA 33061 TGGGCCAAAA GCTCCTCTCG GAGGAGCCCG TCTTCCGCGA CGCGCTCTCG GCGTGTGACC 33121 GAGCGATTCA GGCCGAAGCC GGCTGGTCGC TGCTCGCCGA GCTCGCGGCC GATGAGACCA 33181 CCTCGCAGCT CGGCCGCATC GACGTGGTGC AGCCGGCGCT GTTCGCGATC GAGGTCGCGC 33241 TGTCGGCGCT GTGGCGGTCG TGGGGCGTCG AGCCGGATGC AGTGGTAGGC CACAGCATGG 33301 GCGAAGTGGC GGCCGCGCAC GTCGCCGGCG CCCTGTCGCT CGAGGATGCT GTAGCGATCA 33361 TCTGCCGGCG CAGCCTGCTG CTGCGGCGGA TCAGCGGCCA AGGCGAGATG GCGGTCGTCG 33421 AGCTCTCCCT GGCCGAGGCC GAGGCAGCGC TCCTGGGCTA CGAAGATCGG CTCAGCGTGG 33481 CGGTGAGCAA
CAGCCCGCGA TCGACGGTGC TGGCGGGCGA GCCGGCAGCG CTCGCAGAGG 33541 TGCTGGCGAT CCTTGCGGCA AAGGGGGTGT TCTGCCGTCG AGTCAAGGTG GACGTCGCCA 33601 GCCACAGCCC ACAGATCGAC CCGCTGCGCG ACGAGCTATT GGCAGCATTG GGCGAGCTCG 33661 AGCCGCGACA AGCGACCGTG TCGATGCGCT CGACGGTGAC GAGCACGATC GTGGCGGGCC 33721 CGGAGCTCGT GGCGAGCTAC TGGGCGGACA ACGTTCGACA GCCGGTGCGC TTCGCCGAAG 33781 CGGTGCAATC GTTGATGGAA GGCGGTCATG GGCTGTTCGT GGAGATGAGC CCGCATCCGA 33841 TCCTGACGAC GTCGGTCGAG GAGATCCGAC GGGCGACGAA GCGGGAGGGA GTCGCGGTGG 33901 GCTCGTTGCG GCGTGGACAG GACGAGCGCC TGTCCATGTT GGAGGCGCTG GGAGCGCTCT 33961 GGGTACACGG CCAGGCGGTG GGCTGGGAGC GGCTGTTCTC CGCGGGCGGC GCGGGCCTCC 34021 GTCGCGTGCC GCTGCCGACC TATCCCTGGC AGCGCGAGCG GTACTGGGTC GAAGCGCCGA 34081 CCGGCGGCGC GGCGAGCGGC AGCCGCTTTG CTCATGCGGG CAGTCACCCG CTCCTGGGTG 34141 AAATGCAGAC CCTGTCGACC CAGAGGAGCA CGCGCGTGTG GGAGACGACG CTGGATCTCA 34201 AACGGCTGCC GTGGCTCGGC GATCACCGGG TGCAGGGGGC GGTCGTGTTC CCGGGCGCGG 34261 CGTACCTGGA GATGGCGCTT TCGTCTGGGG CCGAGGCCTT GGGTGACGGT CCGCTCCAGG 34321 TCAGCGATGT GGTGCTCGCC GAGGCGCTGG CCTTCGCGGA TGATACGCCG GTGGCGGTGC 34381 AGGTCATGGC GACCGAGGAG CGACCAGGCC GCCTGCAATT CCACGTTGCG AGCCGGGTGC 34441 CGGGCCACGG CCGTGCTGCC TTTCGAAGCC ATGCCCGCGG GGTGCTGCGC CAGACCG GC 34501 GCGCCGAGGT CCCGGCGAGG CTGGATCTGG CCGCGCTTCG TGCCCGGCTT CAGGCCAGCG 34561 CACCCGCTGC GGCTACCTAT GCGGCGCTGG CCGAGATGGG GCTCGAGTAC GGCCCAGCGT 34621 TCCAGGGGCT TGTCGAGCTG TGGCGGGGGG AGGGCGAGGC GCTGGGACGT GTGCGGCTCC 34681 CCGAGGCCGC CGGCTCCCCA GCCGCGTGCC GGCTCCACCC CGCGCTCTTG GATGCGTGCT 34741 TCCACGTGAG CAGCGCCTTC GCTGACCGCG GCGAGGCGAC GCCATGGGTA CCCGTCGAAA 34801 TCGGCTCGCT GCGGTGGTTC CAGCGGCCGT CGGGGGAGCT GTGGTGTCAT GCGCGGAGCG 34861 TGAGCCACGG AAAGCCAACA CCCGATCGGC GGAGTACCGA CTTTTGGGTG GTCGACAGCA 34921 CGGGCGCGAT CGTCGCCGAG ATCTCCGGGC TCGTGGCGCA GCGGCTCGCG GGAGGTGTAC 34981 GCCGGCGCGA AGAAGACGAC TGGTTCATGG AGCCGGCTTG GGAACCGACC GCGGTCCCCG 35041 GATCCGAGGT CACGGCGGGC CGGTGGCTGC TCATCGGCTC GGGCGGCGGG CTCGGCGCTG 35101 CGCTCTACTC GGCGCTGACG GAAGCTGGCC ATTCCGTCGT
CCACGCGACA GGGCACGGCA 35161 CGAGCGCCGC CGGGTTGCAG GCACTCCTGA CGGCGTCCTT CGACGGCCAG GCCCCGACGT 35221 CGGTGGTGCA CCTCGGCAGC CTCGATGAGC GTGGCGTGCT CGACGCGGAT GCCCCCTTCG 35281 ACGCCGATGC CCTCGAGGAG TCGCTGGTGC GCGGCTGCGA CAGCGTGCTC TGGACCGTGC 35341 AGGCCGTGGC CGGGGCGGGC TTCCGAGATC CTCCGCGGTT GTGGCTCGTG ACACGCGGCG 35401 CTCAGGCCAT CGGCGCCGGC GACGTCTCCG TGGCGCAAGC GCCGCTCCTG GGGCTGGGCC 35461 GCGTTATCGC CTTGGAGCAC GCCGAGCTGC GCTGCGCTCG GATCGACCTC GATCCAGCGC 35521 GGCGCGACGG AGAGGTCGAT GAGCTGCTTG CCGAGCTGTT GGCCGACGAC GCCGAGGAGG 35581 AAGTCGCGTT TCGCGGCGGT GAGCGGCGCG TGGCCCGGCT CGTCCGAAGG CTGCCCGAGA 35641 CCGACTGCCG AGAGAAAATC GAGCCCGCGG AAGGCCGGCC GTTCCGGCTG GAGATCGATG 35701 GGTCCGGCGT GCTCGACGAC CTGGTGCTCC GAGCCACGGA GCGGCGCCCT CCTGGCCCGG 35761 GCGAGGTCGA GATCGCCGTC GAGGCGGCGG GGCTCAACTT TCTCGACGTG ATGAGGGCCA 35821 TGGGGATCTA CCCTGGGCCC GGGGACGGTC CGGTTGCGCT GGGCGCCGAG TGCTCCGGCC 35881 GAATTGTCGC GATGGGCGAA GGTGTCGAGA GCCTTCGTAT CGGCCAGGAC GTCGTGGCCG 35941 TCGCGCCCTT CAGTTTCGGC ACCCACGTCA CCATCGACGC CCGGATGGTC GCACCTCGCC 36001 CCGCGGCGCT GACGGCCGCG CAGGCAGCCG CGCTGCCCGT CGCATTCATG ACGGCCTGGT 36061 ACGGTCTCGT CCATCTGGGG AGGCTCCGGG CCGGCGAGCG CGTGCTCATC CACTCGGCGA 36121 CGGGGGGCAC CGGGCTCGCT GCTGTGCAGA TCGCCCGCCA CCTCGGCGCG GAGATATTTG 36181 CGACCGCTGG TACGCCGGAG AAGCGGGCGT GGCTGCGCGA GCAGGGGATC GCGCACGTGA 36241 TGGACTCGCG GTCGCTGGAC TTCGCCGAGC AAGTGCTGGC CGCGACGAAG GGCGAGGGGG 36301 TCGACGTCG GTTGAACTCG CTGTCTGGCG CCGCGATCGA CGCGAGCCTT GCGACCCTCG 36361 TGCCGGACGG CCGCTTCATC GAGCTCGGCA AGACGGACAT CTATGCAGAT CGCTCGCTGG 36421 GGCTCGCTCA CTTTAGGAAG AGCCTGTCCT ACAGCGCCGT CGATCTTGCG GGTTTGGCCG 36481 TGCGTCGGCC CGAGCGCGTC GCAGCGCTGC TGGCGGAGGT GGTGGACCTG CTCGCACGGG 36541 GAGCGCTGCA GCCGCTTCCG GTAGAGATCT TCCCCCTCTC GCGGGCCGCG GACGCGTTCC 36601 GGAAAATGGC GCAAGCGCAG CATCTCGGGA AGCTCGTGCT CGCGCTGGAG GACCCGGACG 36661 TGCGGATCCG CGTTCCGGGC GAATCCGGCG TCGCCATCCG CGCGGACGGC ACCTACCTCG 36721 TGACCGGCGG TCTGGGTGGG CTCGGTCTGA GCGTGGCTGG ATGGCTGGCC GAGCAGGGGG 36781 CTGGGCATCT GGTGCTGGTG
GGCCGCTCCG GTGCGGTGAG CGCGGAGCAG CAGACGGCTG 36841 TCGCCGCGCT CGAGGCGCAC GGCGCGCGTG TCACGGTAGC GAGGGCAGAC GTCGCCGATC 36901 GGGCGCAGAT CGAGCGGATC CTCCGCGAGG TTACCGCGTC GGGGATGCCG CTCCGCGGCG 36961 TCGTTCATGC GGCCGGTATC CTGGACGACG GGCTGCTGAT GCAGCAAACC CCCGCGCGGT 37021 TCCGCGCGGT CATGGCGCCC AAGGTCCGAG GGGCCTTGCA CCTGCATGCG TTGACACGCG 37081 AAGCGCCGCT CTCCTTCTTC GTGCTGTACG CTTCGGGAGC AGGGCTCTTG GGCTCGCCGG 37141 GCCAGGGCAA CTACGCCGCG GCCAACACGT TCCTCGACGC TCTGGCACAC CACCGGAGGG 37201 CGCAGGGGCT GCCAGCATTG AGCATCGACT GGGGCCTGTT CGCGGACGTG GGTTTGGCCG 37261 CCGGGCAGCA AAATCGCGGC GCACGGCTGG TCACCCGCGG GACGCGGAGC CTCACCCCCG 37321 ACGAAGGGCT GTGGGCGCTC GAGCGTCTGC TCGACGGCGA TCGCACCCAG GCCGGGGTCA 37381 TGCCGTTCGA CGTGCGGCAG TGGGTGGAGT TCTACCCGGC GGCGGCATCT TCGCGGAGGT 37441 TGTCGCGGCT GGTGACGGCA CGGCGCGTGG CTTCCGGTCG GCTCGCCGGG GATCGGGACC 37501 TGCTCGAACG GCTCGCCACC GCCGAGGCGG GCGCGCGGGC AGGAATGCTG CAGGAGGTCG 37561 TGCGCGCGCA GGTCTCGCAG GTGCTGCGCC TCCCCGAAGG CAAGCTCGAC GTGGATGCGC 37621 CGCTCACGAG CCTGGGAATG GACTCGCTGA TGGGGCTAGA GCTGCGCAAC CGCATCGAGG 37681 CCGTGCTCGG CATCACCATG CCGGCGACCC TGCTGTGGAC CTACCCCACG GTGGCAGCGC 37741 TGAGTGCGCA TCTGGCTTCT CATGTCGTCT CTACGGGGGA TGGGGAATCC GCGCGCCCGC 37801 CGGATACAGG GAACGTGGCT CCAATGACCC ACGAAGTCGC TTCGCTCGAC GAAGACGGGT 37861 TGTTCGCGTT GATTGATGAG TCACTCGCGC GTGCGGGAAA GAGGTGATTG CGTGACAGAC 37921 CGAGAAGGCC AGCTCCTGGA GCGCTTGCGT GAGGTTACTC TGGCCCTTCG CAAGACGCTG 37981 AACGAGCGCG ATACCCTGGA GCTCGAGAAG ACCGAGCCGA TCGCCATCGT GGGGATCGGC 38041 TGCCGCTTCC CCGGCGGAGC GGGCACTCCG GAGGCGTTCT GGGAGCTGCT CGACGACGGG 38101 CGCGACGCGA TCCGGCCGCT CGAGGAGCGC TGGGCGCTCG TAGGTGTCGA CCCAGGCGAC 38161 GACGTACCGC GCTGGGCGGG GCTGCTCACC GAAGCCATCG ACGGCTTCGA CGCCGCGTTC 38221 TTCGGTATCG GGGA GGCACGGTCG CTCGACCCGC AGCATCGCTT GCTGCTGGAG 38281 GTCGCCTGGG AGGGGTTCGA AGACGCCGGC ATCCCGCCTA GGTCCCTCGT CGGGAGCCGC 38341 ACCGGCGTGT TCGTCGGCGT CTGCGCCACG GAGTATCTCC ACGCCGCCGT CGCGCACCAG 38401 CCGCGCGAAG AGCGGGACGC GTACAGCACC ACCGGCAACA TGCTCAGCAT CGCCGCCGGA 38461
CGGCTATCGT ACACGCTGGG GCTGCAGGGA CCTTGCCTGA CCGTCGACAC GGCGTGCTCG 38521 TCATCGCTGG TGGCCATTCA CCTCGCCTGC CGCAGCCTGC GCGCTCGAGA GAGCGATCTC 38581 GCGCTGGCGG GAGGGGTCAA CATGCTTCTC TCCCCCGACA CGATGCGAGC TCTGGCGCGC 38641 ACCCAGGCGC TGTCGCCCAA TGGCCGTTGC CAGACCTTCG ACGCGTCGGC CAACGGGTTC 38701 GTCCGTGGGG AGGGCTGCGG TCTGATCGTG CTCAAGCGAT TGAGCGACGC GCGGCGGGAT 38761 GGGGACCGGA TCTGGGCGCT GATCCGAGGA TCGGCCATCA ATCAGGACGG CCGGTCGACG 38821 GGGTTGACGG CGCCCAACGT GCTCGCCCAG GGGGCGCTCT TGCGCGAGGC GCTGCGGAAC 38881 GCCGGCGTCG AGGCCGAGGC CATCGGTTAC ATCGAGACCC ACGGGGCGGC GACCTCGCTG 38941 GGCGACCCCA TCGAGATCGA AGCGCTGCGC ACCGTGGTGG GGCCGGCGCG AGCCGACGGA 39001 GCGCGCTGCG TGCTGGGCGC GGTGAAGACC AACCTCGGCC ACCTGGAGGG CGCTGCCGGC 39061 GTGGCGGGCC TGATCAAGGC TACACTTTCG CTACATCACG AGCGCATCCC GAGGAACCTC 39121 AACTTTCGTA CGCTCAATCC GCGGATCCGG ATCGAGGGGA CCGCGCTCGC GTTGGCGACC 39181 GAACCGGTGC CCTGGCCGCG GACGGGCCGG ACGCGCTTCG CGGGAGTGAG CTCGTTCGGG 39241 ATGAGCGGGA CCAACGCGCA TGTGGTGTTG GAGGAGGCGC CGGCGGTGGA GCCTGAGGCC 39301 GCGGCCCCCG AGCGCGCTGC GGAGCTGTTC GTCCTGTCGG CGAAGAGCGT GGCGGCGCTG 39361 GATGCGCAGG CAGCCCGGCT GCGGGACCAC CTGGAGAAGC ATGTCGAGCT TGGCCTCGGC 39421 GATGTGGCGT TCAGCCTGGC GACGACGCGC AGCGCGATGG AGCACCGGCT aJaj-a-lGTGtjCC 39481 GCGAGCTCGC GCGAGGCGCT GCGAGGGGCG CTTTCGGCCG CAGCGCAGGG GCATACGCCG 39541 CCGGGAGCCG TGCGTGGGCG GGCCTCCGGC GGCAGCGCGC CGAAGGTGGT CTTCGTGTTT 39601 CCCGGCCAGG GCTCGCAGTG GGTGGGCATG GGCCGAAAGC TCATGGCCGA AGAGCCGGTC
39661 - * ^ V- V3V3V ^ V3 CGCTGGAGGG TTGCGACCGG GCCATCGAGG CGGAAGCGGG CTGGTCGCTG 39721 CTCGGGGAGC TCTCCGCCGA CGAGGCCGCC TCGCAGCTCG GGCGCATCGA CGTGGTTCAG 39781 CCGGTGCTCT TCGCCATGGA AGTAGCGCTT TCTGCGCTGT GGCGGTCGTG GGGAGTGGAG 39841 CCGGAAGCGo T »jGTGu * _CA CAGCATGGGC GAGGTGGCGG CGGCGCACGT GGCCGGCGCG 39901 CTGTCGCTCG AGGACGCGGT GGCGATCATC TGCCGGCGCA GCCGGCTGCT GCGGCGGATC 39961 AGCGGTCAGG GCGAGATGGC CCTGGTCGAG CTGTCGCTGG AGGAGGCCGA GGCGGCGCTG 40021 CGTGGCCATG AGGGTCGGCT GAGCGTGGCG GTGAGCAACA GCCCGCGCTC GACCGTGCTC 40081 GCAGGCGAGC CGGCGGCGCT CTCGGAGGTG CTGGCGGCGC TGACGGCCAA GGGGGTGTTC 40141 TGGCGGCAGG TGAAGGTGGA CGTCGCCAGC CATAGCCCGC AGGTCGACCC GCTGCGCGAA 40201 GAGCTGATCG CGGCGCTGGG GGCGATCCGG CCGCGAGCGG CTGCGGTGCC GATGCGCTCG 40261 ACGGTGACGG GCGGGGTGAT CGCGGGTCCG GAGCTCGGTG CGAGCTACTG GGCGGACAAT 40321 CTTCGGCAGC CGGTGCGCTT CGCTGCGGCG GCGCAAGCGC TGCTGGAAGG TGGCCCCACG 40381 CTGTTCATCG AGATGAGCCC GCACCCGATC CTGGTGCCGC CCCTGGACGA GATCCAGACG 40441 GCGGTCGAGC AAGGGGGCGC TGCGGTGGGC TCGCTGCGGC GAGGGCAGGA CGAGCGCGCG 40501 ACGCTGCTGG AGGCGCTGGG GACGCTGTGG GCGTCCGGCT ATCCGGTGAG CTGGGCTCGG 40561 CTGTTCCCCG CGGGCGGCAG GCGGGTTCCG CTGCCGACCT ATCCCTGGCA GCACGAGCGG 40621 TGCTGGATCG AGGTCGAGCC TGACGCCCGC CGCCTCGCCG CAGCCGACCC CACCAAGGAC 40681 TGGTTCTACC GGACGGACTG GCCCGAGGTG CCCCGCGCCG CCCCGAAATC GGAGACAGCT 40741 CATGGGAGCT GGCTGCTGTT GGCCGACAGG GGTGGGGTCG GCGAGGCGGT CGCTGCAGCG 40801 CTGTCGACGC GCGGACTTTC CTGCACCGTG CTTCATGCGT CGGCTGACGC CTCCACCGTC 40861 GCCGAGCAGG TATCCGAAGC TGCCAGTCGC CGAAACGACT GGCAGGGAGT CCTCTACCTG 40921 TGGGGCCTCG ACGCCGTCGT CGATGCTGGG GCATCGGCCG ACGAAGTCAG CGAGGCTACC 40981 CGCCGTGCCA CCGCACCCGT CCTTGGGCTG GTTCGATTCC TGAGCGCTGC GCCCCATCCT 41041 CCTCGCTTCT GGGTGGTGAC CCGCGGGGCA TGCACGGTGG GCGGCGAGCC AGAGGTCTCT 41101 CTTTGCCAAG CGGCGTTGTG GGGCCTCGCG CGCGTCGTGG CGCTGGAGCA TCCCGCTGCC 41161 TGGGGTGGCC TCGTGGACCT GGATCCTCAG AAGAGCCCGA CGGAGATCGA GCCCCTGGTG 41221 GCCGAGCTGC TTTCGCCGGA CGCCGAGGAT CAACTGGCGT TCCGCAGCGG TCGCCGGCAC 41281 GCAGCACGCC TTGTAGCCGC CCCGCCGGAG
GGCGACGTCG CACCGATATC GCTGTCCGCG 41341 GAGGGAAGCT ACCTGGTGAC GGGTGGGCTG GGTGGCCTTG GTCTGCTCGT GGCTCGGTGG 41401 CTGGTGGAGC GGGGAGCTCG ACATCTGGTG CTCACCAGCC GGCACGGGCT GCCAGAGCGA 41461 CAGGCGTCGG GCGGAGAGCA GCCGCCGGAG GCCCGCGCGC GCATCGCAGC GGTCGAGGGG 41521 CTGGAAGCGC AGGGCGCGCG GGTGACCGTG GCAGCGGTGG ATGTCGCCGA GGCCGATCCC 41581 ATGACGGCGC TGCTGGCCGC CATCGAGCCC CCGTTGCGCG GGGTGGTGCA CGCCGCCGGC 41641 GTCTTCCCCG TGCGTCCCCT GGCGGAGACG GACGAGGCCC TGCTGGAGTC GGTGCTCCGT 41701 CCCAAGGTGG CCGGGAGCTG GCTGCTGCAC CGGCTGCTGC GCGACCGGCC TCTCGACCTG 41761 TTCGTGCTGT TCTCGTCGGG CGCGGCGGTG TGGGGTGGCA AAGGCCAAGG CGCATACGCC 41821 GCGGCCAATG CGTTCCTCGA CGGGCTCGCG CACCATCGCC GCGCGCACTC CCTGCCGGCG 41881 TTGAGCCTCG CCTGGGGCCT ATGGGCCGAG GGAGGCGTGG TTGATGCAAA GGCTCATGCA 41941 CGTCTGAGCG ACATCGGAGT CCTGCCCATG GCCACGGGGC CGGCCTTGTC GGCGCTGGAG 42001 CGCCTGGTGA ACACCAGCGC TGTCCAGCGT TCGGTCACAC GGATGGACTG GGCGCGCTTC 42061 GCGCCGGTCT ATGCCGCGCG AGGGCGGCGC AACTTGCTTT CGGCTCTGGT CGCGGAGGAC 42121 GAGCGCACTG CGTCTCCCCC GGTGCCGACG GCAAACCGGA TCTGGCGCGG CCTGTCCGTT 42181 GCGGAGAGCC GCTCAGCCCT CTACGAGCTC GTTCGCGGCA TCGTCGCCCG GGTGCTGGGC 42241 TTCTCCGACC CGGGCGCGCT CGACGTCGGC CGAGGCTTCG CCGAGCAGGG GCTCGACTCC 42301 CTGATGGCTC TGGAGATCCG TAACCGCCTT CAGCGCGAGC TGGGCGAACG GCTGTCGGCG 42361 ACTCTGGCCT TCGACCACCC GACGGTGGAG CGGCTGGTGG CGCATCTCCT CACCGACGTG 42421 CTGAAGCTGG AGGACCGGAG CGACACCCGG CACATCCGGT CGGTGGCGGC GGATGACGAC 42481 ATCGCCATCG TCGGTGCCGC CTGCCGGTTC CCGGGCGGGG ATGAGGGCCT GCAGACATaAC 42541 TGGCGGCATC TGGCCGAGGG CATGGTGGTC AGCACCGAGG TGCCAGCCGA CCGGTGGCGC 42601 GCGGCGGACT GGTACGACCC CGATCCGGAG GTTCCGGGCC GGACCTATGT GGCCAAGGGG 42661 GCCTTCCTCC GCGATGTGCG CAGCTTGGAT GCGGCGTTCT TCTCCATCTC CCCTCGTGAG 42721 GCGATGAGCC TGGACCCGCA ACAGCGGCTG TTGCTGGAGG TGAGCTGGGA GGCGATCGAG 42781 CGCGCTGGCC AGGACCCGAT GGCGCTGCGC GAGAGCGCCA CGGGCGTGTT CGTGGGCATG 42841 ATCGGGAGCG AGCACGCCGA GCGGGTGCAG GGCCTCGACG ACGACGCGGC GTTGCTGTAC 42901 GGCACCACCG GCAACCTGCT CAGCGTCGCC GCTGGACGGC TGTCGTTCTT CCTGGGTCTG 42961
CACGGCCCGA CGATGACGGT GGACACCGCG TGCTCGTCGT CGCTGGTGGC GTTGCACCTC 43021 GCCTGCCAGA GCCTGCGATT GGGCGAGTGC GACCAGGCAC TGGCCGGCGG GTCCAGCGTG 43081 CTTTTGTCGC CGCGGTCATT CGTCGCGGCA TCGCGCATGC GTTTGCTTTC GCCAGATGGG 43141 CGGTGCAAGA CGTTCTCGGC CGCTGCAGAC GGCTTTGCGC GGGCCGAGGG CTGCGCCGTG 43201 GTGGTGCTCA AGCGGCTCCG TGACGCGCAG CGCGACCGCG ACCCCATCCT GGCGGTGGTC 43261 CGGAGCACGG CGATCAACCA CGATGGCCCG AGCAGCGGGC TCACGGTGCC CAGCGGTCCT 43321 GCCCAGCAGG CGTTGCTAGG CCAGGCGCTG GCGCAAGCGG GCGTGGCACC GGCCGAGGTC 43381 GATTTCGTGG AGTGCCACGG GACGGGGACA GCGCTGGGTG ACCCGATCGA GGTGCAGGCG 43441 CTGGGCGCGG TGTATGGCCG GGGCCGCCCC GCGGAGCGGC CGCTCTGGCT GGGCGCTGTC 43501 AAGGCCAACC TCGGCCACCT GGAGGCCGCG GCGGGCTTGG CCGGCGTGCT CAAGGTGCTC 43561 TTGGCGCTGG AGCACGAGCA GATTCCGGCT CAACCGGAGC TCGACGAGCT CAACCCGCAC 43621 ATCCCGTGGG CAGAGCTGCC AGTGGCCGTT GTCCGCGCGG CGGTCCCCTG GCCGCGCGGC 43681 GCGCGCCCGC GTCGTGCAGG CGTGAGCGCT TTCGGCCTGA GCGGGACCAA CGCGCATGTG 43741 GTGTTGGAGG AGGCGCCGGC GGTGGAGCCT GAGGCCGCGG CCCCCGAGCG CGCTGCGGAG 43801 CTGTTCGTCC TGTCGGCGAA GAGCGTGGCG GCGCTGGATG CGCAGGCAGC CCGGCTGCGG 43861 GATCATCTGG AGAAGCATGT CGAGCTTGGC CTCGGCGATG TGGCGTTCAG CCTGGCGACG 43921 ACGCGCAGCG CGATGGAGCA CCGGCTGGCG GTGGCCGCGA GCTCGCGCGA GGCGCTGCGA 43981 GGGGCGCTTT CGGCCGCAGC GCAGGGGCAT ACGCCGCCGG GAGCCGTGCG TGGGCGGGCC 44041 TCCGGCGGCA GCGCGCCGAA GGTGGTCTTC GTGTTTCCCG GCCAGGGCTC GCAGTGGGTG 44101 GGCATGGGCC GAAAGCTCAT GGCCGAAGAG CCGGTCTTCC GGGCGGCGCT GGAGGGTTGC 44161 GACCGGGCCA TCGAGGCGGA AGCGGGCTGG TCGCTGCTCG GGGAGCTCTC CGCCGACGAG 44221 GCCGCCTCGC AGCTCGGGCG CATCGACGTG GTTCAGCCGG TGCTCTTCGC CGTGGAAGTA 44281 GCGCTTTCAG CGCTGTGGCG GTCGTGGGGA GTGGAGCCGG AAGCGGTGGT GGGCCACAGC 44341 ATGGGCGAGG TTGCGGCGGC GCACGTGGCC GGCGCGCTGT CGCTCGAGGA TGCGGTGGCG 44401 ATCATCTGCC GGCGCAGCCG GCTGCTGCGG CGGATCAGCG GTCAGGGCGA GATGGCGCTG 44461 GTCGAGCTGT CGCTGGAGGA GGCCGAGGCG GCGCTGCGTG GCCATGAGGG TCGGCTGAGC 44521 GTGGCGGTGA GCAACAGCCC GCGCTCGACC GTGCTCGCAG GCGAGCCGGC GGCGCTCTCG 44581 GAGGTGCTGG CGGCGCTGAC GGCCAAGGGG GTGTTCTGGC
GGCAGGTGAA GGTGGACGTC 44641 GCCAGCCATA GCCCGCAGGT CGACCCGCTG CGCGAAGAGC TGGTCGCGGC GCTGGGAGCG 44701 ATCCGGCCGC GAGCGGCTGC GGTGCCGATG CGCTCGACGG TGACGGGCGG GGTGATTGCG 44761 GGTCCGGAGC TCGGTGCGAG CTACTGGGCG GACAATCTTC GGCAGCCGGT GCGCTTCGCT 44821 GCGGCGGCGC AAGCGCTGCT GGAAGGTGGC CCCACGCTGT TCATCGAGAT GAGCCCGCAC 44881 CCGATCCTGG TGCCGCCTCT GGACGAGATC CAGACGGCGG TCGAGCAAGG GGGCGCTGCG 44941 GTGGGCTCGC TGCGGCGAGG GCAGGACGAG CGCGCGACGC TGCTGGAGGC GCTGGGGACG 45001 CTGTGGGCGT CCGGCTATCC GGTGAGCTGG GCTCGGCTGT TCCCCGCGGG CGGCAGGCGG 45061 GTTCCGCTGC CGACCTATCC CTGGCAGCAC GAGCGGTACT GGATCGAGGA CAGCGTGCAT 45121 GGGTCGAAGC CCTCGCTGCG GCTTCGGCAG CTTCATAACG GCGCCACGGA CCATCCGCTG 45181 CTCGGGGCTC CATTGCTCGT CTCGGCGCGA CCCGGAGCTC ACTTGTGGGA GCAAGCGCTG 45241 AGCGACGAGA GGCTATCCTA TCTTTCGGAA CATAGGGTCC ATGGCGAAGC CGTGTTGCCC 45301 AGCGCGGCGT ATGTAGAGAT GGCGCTCGCC GCCGGCGTAG ATCTCTATGG CGCGGCGACG 45361 CTGGTGCTGG AGCAGCTGGC GCTCGAGCGA GCCCTCGCCG TGCCTTCCGA AGGCGGACGC 45421 ATCGTGCAAG TGGCCCTCAG CGAAGAAGGG CCCGGTCGGG CCTCATTCCA GGTATCGAGC 45481 CGTGAGGAGG CAGGTAGAAG CTGGGTTCGG CACGCCACGG GGCACGTGTG TAGCGACCAG 45541 AGCTCAGCAG TGGGAGCGTT GAAGGAAGCT CCGTGGGAGA TTCAACAGCG ATGTCCGAGC 45601 GTCCTGTCGT CGGAGGCGCT CTATCCGCTG CTCAACGAGC ACGCCCTCGA CTATGGCCCC 45661 TGCTTCCAGG GTGTGGAGCA GGTGTGGCTC GGCACGGGGG AGGTGCTCGG CCGGGTACGC 45721 TTGCCAGAAG ACATGGCATC CTCAAGTGGC GCCTATCGGA TTCATCCCGC CTTGTTGGAT 45781 GCATGTTTTC AAGTGCTGAC CGCGCTGCTC ACCACGCCGG AATCCATCGA GATTCGGAGG 45841 CGGCTGACGG ATCTCCACGA ACCGGATCTC CCGCGGTCCA GGGCTCCGGT GAATCAAGCG 45901 GTGAGTGACA CCTGGCTGTG GGACGCCGCG CTGGACGGTG GACGGCGCCA GAGCGCGAGC 45961 GTGCCCGTCG ACCTGGTGCT CGGCAGCTTC CACGCGAAGT GGGAGGTCAT GGATCGCCTC 46021 GCGCAGACGT ACATCATCCG CACTCTCCGC ACATGGAACG TCTTCTGCGC TGCTGGAGAG 46081 CGTCACACGA TAGACGAGTT GCTCGTCAGG CTCCAAATCT CTGCTGTCTA CAGGAAGGTC 46141 ATCAAGCGAT GGATGGATCA CCTTGTCGCG ATCGGCGTCC TTGTAGGGGA CGGAGAGCAT 46201 CTTGTGAGCT CTCAGCCGCT GCCGGAGCAT GATTGGGCGG CGGTGCTCGA GGAGGCCGCG 46261 ACGGTGTTCG
CCGACCTCCC AGTCCTACTT GAGTGGTGCA AGTTTGCCGG GGAACGGCTC 46321 GCGGACGTGT TGACCGGGAA GACGCTGGCG CTCGAGATCC TCTTCCCTGG CGGCTCGTTC 46381 GATATGGCGG AGCGAATCTA TCAAGATTCG CCCATCGCCC GTTACTCGAA CGGCATCGTG 46441 CGCGGTGTCG TCGAGTCGGC GGCGCGGGTG GTAGCACCGT CGGGAACGTT CAGCATCTTG 46501 GAGATCGGAG CAGGGACGGG CGCGACCACC GCCGCCGTCC TCCCGGTGTT GCTGCCTGAC 46561 CGGACAGAAT ACCATTTCAC CGATGTTTCT CCGCTCTTCC TTGCTCGTGC GGAGCAAAGA 46621 TTTCGAGATC ATCCATTCCT GAAGTATGGT ATTCTGGATA TCGACCAGGA GCCAGCTGGC 46681 CAGGGATACG CACATCAGAA GTTCGACGTC ATCGTCGCGG CCAACGTCAT CCATGCGACC 46741 CGCGATATAA GAGCCACGGC GAAGCGTCTC CTGTCGTTGC TCGCGCCCGG AGGCCTTCTG 46801 GTGCTGGTCG AGGGCACAGG GCATCCGATC TGGTTCGATA TCACCACGGG ATTGATCGAG 46861 GGGTGGCAGA AGTACGAAGA TGATCTTCGT ACCGACCATC CGCTCCTGCC TGCTCGGACC 46921 TGGTGTGACG TCCTGCGCCG GGTAGGCTTT GCGGATGCCG TGAGTCTGCC AGGCGACGGA 46981 TCTCCGGCGG GGATCCTCGG ACAGCACGTG ATCCTCTCGC GCGCTCCGGG CATAGCAGGA 47041 GCCGCTTGTG ACAGCTCCGG TGAGTCGGCG ACCGAATCGC CGGCCGCGCG TGCAGTACGG 47101 CAGGAATGGG CCGATGGCTC CGCTGACGGC GTCCATCGGA TGGCGTTGGA GAGAATGTAC
47161 TTCCACCGCC GGCCGGGCCG GCAGGTTTGG GTCCACGGTC GATTGCGTAC CGGTGGAGGC 47221 GCGTTCACGA AGGCGCTCAC TGGAGATCTG CTCCTGTTCG AAGAGACCGG GCAGGTCGTG 47281 GCAGAGGTTC AGGGGCTCCG CCTGCCGCAG CTCGAGGCTT CTGCTTTCGC GCCGCGGGAC 47341 CCGCGGGAAG AGTGGTTGTA CGCGTTGGAA TGGCAGCGCA AAGACCCTAT ACCAGAGGCT 47401 CCGGCAGCCG CGTCTTCTTC CACCGCGGGG GCTTGGCTCG TGCTGATGGA CCAGGGCGGG 47461 ACAGGCGCTG CGCTCGTATC GCTGCTGGAA GGGCGAGGCG AGGCGTGCGT GCGCGTCGTC 47521 GCGGGTACGG CATACGCCTG CCTCGCGCCG GGGCTGTATC AAGTCGATCC GGCGCAGCCA 47581 GATGGCTTTC ATACCCTGCT CCGCGATGCA TTCGGCGAGG ACCGGATGTG CCGCGCGGTA 47641 GTGCATATGT GGAGCCTTGA TGCGAAGGCA GCAGGGGAGA GGACGACAGC GGAGTCGCTT 47701 CAGGCCGATC AACTCCTGGG GAGCCTGAGC GCGCTTTCTC TGGTGCAGGC GCTGGTGCGC 47761 CGGAGGTGGC GCAACATGCC GCGACTTTGG CTCTTGACCC GCGCCGTGCA TGCGGTGGGC 47821 GCGGAGGACG CAGCGGCCTC GGTGGCGCAG GCGCCGGTGT GGGGCCTCGG TCGGACGCTC 47881 GCGCTCGAGC ATCCAGAGCT GCGGTGCACG CTCGTGGACG TGAACCCGGC GCCGTCTCCA 47941 GAGGACGCAG CTGCACTCGC GGTGGAGCTC GGGGCGAGCG ACAGAGAGGA CCAGATCGCA 48001 TTGCGCTCGA ATGGCCGCTA CGTGGCGCGC CTCGTGCGGA GCTCCTTTTC CGGCAAGCCT 48061 GCTACGGATT GCGGCATCCG GGCGGACGGC AGTTATGTGA TCACCGATGG CATGGGGAGA 48121 GTGGGGCTCT CGGTCGCGCA ATGGATGGTG ATGCAGGGGG CCCGCCATGT GGTGCTCGTG 48181 GATCGCGGCG GCGCTTCCGA CGCCTCCCGG GATGCCCTCC GGTCCATGGC CGAGGCTGGC 48241 GCAGAGGTGC AGATCGTGGA GGCCGACGTG GCTCGGCGCG TCGATGTCGC TCGGCTTCTC 48301 TCGAAGATCG AACCGTCGAT GCCGCCGCTT CGGGGGATCG TGTACGTGGA CGGGACCTTC 48361 CAGGGCGACT CCTCGATGCT GGAGCTGGAT GCCCATCGCT TCAAGGAGTG GATGTATCCC 48421 AAGGTGCTCG GAGCGTGGAA CCTGCACGCG CTGACCAGGG ATAGATCGCT GGACTTCTTC 48481 GTCCTGTACT CCTCGGGCAC CTCGCTTCTG GGCTTGCCCG GACAGGGGAG CCGCGCCGCC 48541 GGTGACGCCT TCTTGGACGC CATCGCGCAT CACCGGTGTA GGCTGGGCCT CACAGCGATG 48601 AGCATCAACT GGGGATTGCT CTCCGAAGCA TCATCGCCGG CGACCCCGAA CGACGGCGGC 48661 GCACGGCTCC AATACCGGGG GATGGAAGGT CTCACGCTGG AGCAGGGAGC GGAGGCGCTC 48721 GGGCGCTTGC TCGCACAACC CAGGGCGCAG GTAGGGGTAA TGCGGCTGAA TCTGCGCCAG 48781 TGGCTGGAGT TCTATCCCAA CGCGGCCCGA
CTGGCGCTGT GGGCGGAGTT GCTGAAGGAG 4884 1 CGTGACCGCA CCGACCGGAG CGCGTCGAAC GCATCGAACC TGCGCGAGGC GCTGCAGAGC 48901 GCCAGGCCCG AAGATCGTCA GTTGGTTCTG GAGAAGCACT TGAGCGAGCT GTTGGGGCGG 48961 GGGCTGCGCC TTCCGCCGGA GAGGATCGAG CGGCACGTGC CGTTCAGCAA TCTCGGCATG 49021 GACTCGTTGA TAGGCCTGGA GCTCCGCAAC CGCATCGAGG CCGCGCTCGG CATCACCGTG 4 9081 CCGGCGACCC TGCTATGGAC TTACCCTACC GT.AGCAGCTC TGAGCGGGAA CCTGCTAGAT 4914 1 ATTCTGTTCC CGAATGCCGG CGCGACTCAC GCTCCGGCCA CCGAGCGGGA GAAGAGCTTC 4 9201 GAGAACGATG CCGCAGATCT CGAGGCTC7G CGGGGTATGA CGGACGAGCA GAAGGACGCG 4 9261 TTGCTCGCCG AAAAGCTGGC GCAGCTCGCG CAGATCGTTG GTGAGTAAGG GACTGAGGGA 49321 GTATGGCGAC CACGAATGCC GGGAAGCTTG .AGCATGCCCT TCTG C7CATG GACAAGCTTG 49381 CGAAAAAG.AA CGCGTC7TTG GAGCAAGAGC GGACCGAGCC GATCGCCATC ATAGGTATTG 49441 GCTGCCGCTT CCCCGGCGGA GCGGACACTC C3GAGGCATT CTGGGAGCTG CTCGACTCGG 49501 GCCGAGACGC GGTCCAGCCG CTCGACCGGC GCTGGGCGCT GGTCGGCGTC CATCCCAGCG 49561 AGGAGGTGCC GCGCTGGGCC GGACTGCTCA CCGAGGCGGT GGACGGCT7C GACGCCGCGT 49621 TCTTTGGC.AC CTCGCCTCGG GAGGCGCGGT CGCTCG.ATCC TCAGCAACGC CTGCTGCTGG 49631 AGGTCACCTG GGAAGGGCTC GAGGACGCCG GC.ATCGCACC CCAGTCCCTC GACGGCAGCC 49741 GCACCGGGGT ATTCCTGGGC GCATGCAGCA GCGACTACTC GCATACCGTT GCGCAACAGC 49801 GGCGCGAGGA GCAGG.ACGCG TACGACATCA CCGGCAATAC GCTCAGCGTC GCCGCCGGAC 49861 GG7TGTCTTA T.ACGCTAGGG CTGCAGGGAC CCTGCCTGAC CGTCGACACG GCC73CTCGT 49921 CGTCGCTCGT GGCCATCCAC CTTGCC7GCC GCAGCCTGCG CGCTCGCGAG AGCGATCTCG 49981 CGCTGGCGGG GGGCGTCAAC ATGCTCCTTT CGTCCAAGAC G.ATGATAATG CTGGGGCGCA 50041 TCCAGGCGCT GTCGCCCGAT GGCCACTGCC GGACATTCGA CGCCTCGGCC AACGGGTTCG S01C1 7CCGTGGGGA GGGCTGCGGT ATGGTCG7GC TCAAACGGCT CTCCGACGCC CAGCGACATG 5C161 GCGATCGGAT CTGGGCTCTG ATCCGGGGTT CGGCCATGAA TCAGGAT GGC CGGTCGACAG 50221 GGTTGATGGC ACCCAATGTG C7CGCTCAGG AGGCGCTCTT ACGCCAGGCG CTGC.AGAGCG 50281 CTCGCGTCGA CGCCGGGGCC ATCGATTATG TCGAGACCCA CGGAACGGGG ACCTCGCTCG 50341 GCGACCCGAT CG.AGGTCGAT GCGCTGCGTG CCGTGATGGG GCCGGCGCGG GCCGATGGGA 50401 GCCGCTGCGT GCTGGGCGCA GTGAAGACCA ACCTCGGCCA CCTGGAGGGC GCTGCAGGCG 
50461 TGGCGGGTTT GATCAAGGCG GCGCTGGCTC TGCACCACGA ATCGATCCCG CGAAACCTCC 50521 A7T7TCACAC GCTCAATCCG CGGATCCGGA TCGAGGGG.AC CGCGCTCGCG CTGGCGACGG 50581 AGCCGGTGCC GTGGCCGCGG GCGGGCCGAC CGCGCTTCGC GGGGGTGAGC GCGTTCGGCC 50641 TCAGCGGCAC CAACGTCCAT GTCGTGCTGG AGGAGGCGCC GGCCACGGTG CTCGCACCGG 50701 CGACGCCGGG GCGCTCAGCA GAGCTTTTGG TGCTGTCGGC GAAGAGCACC GCCGCGCTGG 50761 ACGCACAGGC GGCGCGGCTC TCAGCGCACA TCGCCGCGTA CCCGGAGCAG GGCCTCGGAG 50821 ACGTCGCGT7 CAGCCTGGTA GCGACGCGGA GCCCGATGGA GCACCGGCTC GCGGTGGCGG 50881 CGACCTCGCG CGAGGCGCTG CGAAGCGCGC TGGAAGCTGC GGCGCAGGGG CAGACCCCGG 509 1 CAGGCGCGGC GCGCGGCAGG GCCGCTTCCT CGCCCGGCAA GCTCGCCTTC CTGTTCGCCG 51001 GGCAGGGCGC GCAGGTGCCG GGCATGGGCC GTGGGTT GTG GGAGGCGTGG CCGGCGTTCC 51061 GCGAGACCTT CGACCGGTGC GTCACGCTCT TCGACCGGGA GCTCCATCAG CCGCTCTGCG 51121 AGGTGATGTG GGCCGAGCCG GGCAGCAGCA GGTCGTCGTT GCTGGACCAG ACGGCATTCA 51181 CCC.AGCCGGC GCTCTTTGCG CTGGAGTACG CGCTGGCCGC GCTCTTCCGG TCGTGGGGCG 51241 TGGAGCCGGA GCTCATCGCT GGCCATAGCC TCGGCGAGCT GGTGGCCGCC TGCGTGGCGG 51301 GTGTGTTCTC CCTCGAGGAC GCCGTGCGCT TGGTGGTCGC GCGCGGCCGG TTGATGCAGG 51361 CGCTGCCGGC CGGCGGTGCG ATGGTATCGA TCGCCGCGCC GGAGGCCGAC GTGGCTGCCG 51421 CGGTGGCGCC GCACGCAGCG TCGGTGTCGA TCGCGGCAGT CAATGGGCCG GAGCAGGTGG 51431 TGATCGCGGG CGCCGAGAAA TTCGTGCAGC AGATCGCGGC GGCGT7CGCG GCGCGGGGGG 515 1 CGCGAACCAA ACCGCTGCAT GTTTCGCACG CGTTCCACTC GCCGCTCATG GATCCGATGC 51601 TGGAGGCGTT CCGGCGGGTG ACCGAGTCGG TGACGTATCG GCGGCCTTCG ATGGCGCTGG 51661 TGAGCAACCT GAGCGGGAAG CCCTGCACGG ATGAGGTGTG CGCGCCGGGT TACTGGGTGC 51721 GTCACGCGCG AGAGGCGGTG CGCTTCGCGG ACGGCGTGAA GGCGCTGCAC GCGGCCGGTG 51781 CGGGCATCTT CGTCGAGGTG GGCCCGAAGC CGGCGCTGCT CGGCCTTTTG CCGGCCTGCC 518 1 TGCCGGATGC CAGGCCGGTG CTGCTCCCA G CGTCGCGCGC CGGGCGTGAC GAGGCTGCGA 51901 GCGCGCTGGA GGCGCTGGGT GGGT7CTGGG TCGTCGGTGG ATCGGTCACC TGGTCGGGTG 51961 TCTTCCCTTC GGGCGGACGG CGGGTACCGC TGCCAACCTA TCCCTGGCAG CGCGAGCGTT 52021 ACTGGATCGA AGCGCCGGTC GATGGTGAGG CGGACGGCAT CGGCCGTGCT CAGGCGGGGG 52081 ACCACCCCCT TCTGGGTGAA GCCTTTTCCG TGTCGACCCA 
TGCCGGTCTG CGCCTGTGGG 52141 AGACGACGCT GGACCGAAAG CGGCTGCCGT GGCTCGGCGA GCACCGGGCG CAGGGGGAGG 52201 TCGTGTTTCC TGGCGCCGGG TACCTGG.AGA TGGCGCTGTC GTCGGGGGCC GAGATCTTGG 52261 GCGATGGACC GATCCAGGTC ACGGATGTGG TGCTCATCGA GACGCTGACC TTCGCGGGCG 52321 ATACGGCGGT ACCGGTCCAG GTGGTGACGA CCGAGGAGCG ACCGGGACGG CTGCGGTTCC 52381 AGGTAGCGAG TCGGGAGCCG GGGGCACGTC GCGCGTCCTT CCGGATCCAC GCCCGCGGCG 52441 TGC7GCGCCG GGTCGGGCGC GCCGAGACCC CGGCGAGGTT GAACCTCGCC GCCCTGCGCG 52501 CCCG3CTTCA TGCCGCCGTG CCCGCTGCGG CTATCTATGG GGCGCTCGCC GAGATGGGGC 52561 TTCAATACGG CCCGGCGTTG CGGGGGCTCG CCGAGCTGTG GCGGGGTGAG GGCGAGGCGC 52621 TGGGCAGAGT GAGACTGCCT GAGTCCGCCG GCTCCGCGAC AGCCTACCAG CTGC.ATCCGG 52681 7GC7GCTGGA CGCGTGCGTC CAA.ATGATTG TTGGCGCGTT CGCCGATCGC GATGAGGCGA 527 1 CGCCG7GGGC GCCGGTGGAG GTGGGCTCGG TGCGGCTGTT CCAGCGGTCT CCTGGGGAGC 52801 7ATGGTGCCA TGCGCGCGTC GTGAGCGATG GTCAACAGGC CCCCAGCCGG TGGAGCGCCG 52361 ACTTTGAGT7 GATGGACGGT ACGGGCGCGG TGGTCGCCGACTTGC GAGCGGTGTA CGCCGGCGCG ACGC.AGACGA CTGGTTCCTG GAGCTGGA7T 52981 GGGAGCCCGC GGCGC7CGAG GGGCCCA GA TCACAGCCGG CCGG7GGCTG CTGC7CGGCG 530 1 AGGGTGGTGG GCTCGGGCGC TCGTTGTGCT CAGCGCTGAA GGCCGCCGGC CATG7CGTCG 53101 TCCACGCCGC GGGGGACGAC ACGAGCGCTG CAGGAATGCG CGCGCTCCTG GCCAACGCGT 53161 TCGACGGCCA GGCCCCGACG GCCG7GG7GC ACCTCAGCAG CCTCGACGGG GGCGGCCAGC 53221 TCGACCCGGG GCTCGGGGCG CAGGGCGCGC TCG.ACGCGCC CCGG.AGCCCA GATGTCGATG 53281 CCGATGCCCT CGAGTCGGCG CTGATGCGTG GTTGCGACAG CGTGCTCTCC CTGGTGCAAG 533 1 CGCTGGTCGG CATGG.ACCTC CGAAATGCGC CGCGGCTGTG GCTTTTGACC CGCGGGGCTC 53401 .AGGCGGCCGC CGCCGGCGAT GTCTCCGTGG TGCAAGCGCC GCTGTTGGGG C7GGGCCGCA 534 61 CCATCGCCTT GGAGCACGCC GAGCTGCGCT GTATCAGCGT CG.ACCTCGAT CCAGCCCAGC 53521 CTGAAGGGGA .AGCCGATGCT TTGCTGGCCG AGCTACTTGC AGATGATGCC GAGGAGGAGG 53581 TCGCGCTGCG CGG7GGCGAG CGG7TTGTTG CGCGGCTCG7 CCACCGGCTG CCCGAGGCTC 53641 AACGCCGGGA G.AAGATCGCG CCCGCCGGTG ACAGGCCGTT CCGGCTAG.AG A7CGA7GAAC 53701 CCGGCG7GCT GGACCAACTG GTGCTCCGGG CCACGGGGCG GCGC GCTCCT GGTCCGGGCG 53761 AGGTCGAGA7 CGCCGTCGAA GCGGCGGGGC TCG.ACTCCAT 
CGACATCCA3 CTGGCGGTGG 53821 GCGTTGCTCC CAATGACCTG CCTGGAGGAG AAATCGAGCC GTCGGTGCTC GGAAGCG.AGT 53381 GCGCCGGGCG CATCGTCGCT GTGGGCGAGG GCGTGAACGG CCTTGTGGTG GGCCAGCCGG 53941 TGATCGCCCT TGCGGCGGGA GTATTTGCTA CCCATGTCAC CACGTCGGCC ACGCTGGTGT 54001 TGCCTCGGCC 7C7GGGGCTC TCGGCGACCG AGGCGGCCGC GATGCCCCTC GCGTATTTGA 54061 CGGCCTGGTA CGCCCTCGAC AAGGTCGCCC .ACCTGCAGGC GGGGGAGCGG GTGCTGATCC 5 121 GTGCGGAGGC CGGTGGTATC GGTCTTTGCG CGGTGCGATG GGCGCAGCGC GTGGGCGCCG 54131 AGGTGTATGC GACCGCCGAC ACGCCCGAGA AACGTGCCTA CCTGGAGTCG CTGGGCGTGC 5424 1 GGTACGTGAG CGAT7CCCGC TCGGGCCGGT TCGCCGCAGA CGTGCATGCA TGG.ACGGACG 54301 GCGAGGGTGT GGACGTCGTG CTCGACTCGC TTTCGGGCGA GCACATCGAC AAGAGCCTCA 54361 TGGTCCTGCG CGCCTGTGGC CGCCTTGTGA AGCTGGGCAG GCGCGACGAC TGCGCCGACA 54421 CGCAGCCTGG GCTGCCGCCG CTCCTACGGA ATTTTTCCTT CTCGCAGGTG GACTTGCGGG 54481 GAATGATGCT CGATCAACCG GCGAGGATCC GTGCGCTCCT CGACGAGCTG TTCGGGTTGG 545 1 TCGCAGCCGG TGCCATCAGC CCACTGGGGT C GGGGTTGCG CGTTGGCGGA TCCCTCACGC 54601 CACCGCCGGT CGAGACCTTC CCGATCTCTC GCGCAGCCGA GGCATTCCGG AGGATGGCGC 54661 AAGGACAGCA TCTCGGGAAG CTCGTGCTCA CGCTGGACGA CCCGGAGGTG CGGATCCGCG 54721 CTCCGGCCGA ATCCAGCGTC GCCGTCCGCG CGGACGGCAC CTACCTTGTG ACCGGCGGTC 54781 TGGGTGGGCT CGGTCTGCGC GTGGCCGGAT GGCTGGCCGA GCGGGGCGCG GGGCAACTGG 54841 TGCTGGTGGG CCGCTCCGGT GCGGCGAGCG CAGAGCAGCG AGCCGCCGTG GCGGCGCTAG_54901_AGGCCC.ACGG CGCGCGCGTC ACGGTGGCGA AAGCGGATGT CGCCGATCGG TCACAGATCG 54961 AGCGGGTCCT CCGCGAGGTT ACCGCGTCGG GGATGCCGCT GCGGGGTGTC GTGCATGCGG 55021 CAGGTCTTGT GGATGACGGG CTGCTGATGC AGCAGACTCC GGCGCGGCTC CGCACGGTGA 55081 TGGGACCTAA GGTCCAGGGA GCCTTGCAC7 TGCACACGCT GACACGCGAA GCGCCTCTTT 5514 1 CCTTCTTCGT GCTGTACGCT TCTGCAGCTG GGCTGTTCGG CTCGCCAGGC CAGGGCAACT 55201 ATGCCGC.AGC CAACGCGTTC CTCGACGCCC TTTCGCATCA CCGCAGGGCG CACGGCCTGC 55261 CGGCGCTGAG CATCGACTGG GGCATGTTCA CGGAGGTGGG GATGGCCGTT GCGCA.AGAAA 55321 ACCGTGGCGC GCGGCTGATC TCTCGCGGGA TGCGGGGCAT CACCCCCGAT GAGGGTCTGT 55381 CAGCTCTGGC GCGCTTGCTC GAGGGTGATC GCGTGCAGAC GGGGGTGATA CCGATCACTC 5544 1 CGCGGCAGTG 
GGTGGAGTTC TACCCGGCAA CAGCGGCCTC ACGGAGGTTG TCGCGGCTGG 55501 TGACCACGCA GCGCGCGGTT GCTGATCGGA CCGCCGGGGA TCGGGACCTG CTCGAACAGC 55561 TTGCCTCGGC TGAGCCGAGC GCGCGGGCGG GGCTGCTGCA GGACGTCGTG CGCGTGCAGG 55621 TCTCGCATGT GCTGCGTCTC CCTGAAGACA AGATCGAGGT GGATGCCCCG CTCTCGAGCA 55681 TGGGCATGGA CTCGCTGATG AGCCTGGAGC TGCGCAACCG CATCGAGGCT GCGCTGGGCG 5574 1 TCGCCGCGCC TGCAGCCTTG GGGTGGACGT ACCCAACGGT AGCAGCGATA ACGCGCTGGC 55801 TGCTCGACGA CGCCCTCGCC G7CCGGCT7G GCGGCGGGTC GGACACGGAC GAATCGACGG 55861 CAAGCGCCGG ATCGTTCGTC CACGTCCTCC GCTTTCGTCC TGTCGTCAAG CCGCGGGCTC 55921 GTCTCTTCTG TTTTCACGGT TCTGGCGGCT CGCCCGAGGG CTTCCGTTCC TGGTCGGAGA 55981 AGTCTGAGTG GAGCGATCTG GAAATCGTGG CCATGTGGCA CGATCGCAGC CTCGCCTCCG 5604 1 AGGACGCGCC TGGTAAGAAG TACGTCCAAG AGGCGGCCTC GCTGATTCAG CACTATGCAG 56101 ACGCACCG77 TGCGTTAGTA GGGTTCAGCC TGGGTGTCCG GTTCGTCATG GGGACAGCCG 56161 TGGAGCTCGC TAGTCG7TCC GGCGCACCGG CTCCGCTGGC CGTTTTTGCG TTGGGCGGCA 56221 GCTTGATCTC TTCTTCAGAG ATCACCCCGG AGATGGAGAC CGAT.ATAATA GCCAAGCTCT 56281 TCTTCCGAAA TGCCGCGGGT TTCGTGCGAT CCACCCAACA AGTTCAGGCC GATGCTCGCG 5634 1 CAGACAAGGT CATCACAGAC ACCATGGTGG CTCCGGCCCC CGGGGACTCG AAGGAGCCGC 56401 CCTCGAAGAT CGCGGTCCCT ATCGTCGCCA TCGCCGGCTC GGACGATGTG ATCGTGCC7C 56461 CAAGCGACG7 TCAGGATCTA CAATCTCGCA CCACGGAGCG CTTCTATATG CATCTC 56521 CCGGAGATCA CGAGTT7CTC GTCGATCGAG GGCGCGAGAT CATGCACATC GTCGACTCGC 56531 ATCTCAATCC GCTGCTCGCC GCGAGGACGA CGTCGTCAGG CCCCGCGTTC GAGGCAAAAT 56641 GATGGCAGCC TCGGGC GCGCGAGATG GTTGGG.AGCA GCGTGGGTGC TGGTGGCCGG 56701 CGGCAGGCAG CGGAGGCTCA TGAGCC TGGAAGTTTG CAGCATAGGA GATT7TATGA 56761 CACAGGAGCA AGCGAATCAG AGTGAG.ACGA AGC7TT CGACTTCAAG CCGTTCGCGC 56321 CTGGG7ACGC G3AGGACCCG TTTCCCGCGA TCGAGCGGAGAGAGGCA ACCCCCATCT 56381 7C7AC7GGGA TGAAGGCCGC TGGTCC TCACCCGA7A CCACGACG7G TCGGCGGTGT 56941 TCCGCGACGA ACGCTTCGCG GTCAGTCGAG AAG.AATGGGA ATCGAGCGCG GAGTAC7CGT 57001 CGGCCAT7CC CGAGCTCAGC GATATG.AAGA AGTACGGATT GTTCGGGCTG CCGCCGGAGG 57061 ATCACGCTCG GG7CCGCAAG CTCGTCAACC CATCGTTTAC GTCACGCGCG ATCGAC 57121 TGCGCGCCGA 
AATACAGCGC ACCGTCGACC AGCTGCTCGA TGCTCGCTCC GGACAAGAGG 57181 AGTTCGACGT TGTGCGGGAT TACGCGGAGG GAATCCCGAT GCGTGCGATC AGCGCTCTGT 57241 TGAAGG77CC GGCCGAGTGT GACGAGAAGT TCCGTCGCTT CGGCTCGGCG ACTGCGCGCG 57301 CGCTCGGCGT GGGTTTGGTG CCCCGGGTCG A7GAGGAGAC CAAGACCC7G GTCGCGTCCG 57361 TCACCGAGGG GCTCGCGCTG CTCCATGGCG TG.ATGA GCGGCGCAGG AACCCGCTCG 57421 AAAATGACGT CTTGACGATG CTGCTTCAGG CCGAGGCCGA CGGCAGCAGG CTGAGCACGA 57481 AGGAGCTGGT CGCGCTCGTG GGTGCGATTA TCGCTGCTGG CACCGATACC ACGATCTACC 57541 7T.ATCGCGTT CGCTGTGCTC AACTGC GGTCGCCCGA GGCGCTCGAG CTGGTGAAGG 57601 CCGAGCCCGG GCTCATGAGG AACGCGCTCG ATGAGGTGCT CCGCTTCGAC AATATA 57661 GAAT.AGGAAC TGTGCGTTTC GCCAGGCAGG AGAGTA CTGCGGGGCA TCGATCAAGA 57721 AAGGGGAGAT GGTCTTTCTC CTGATCCCGA GCGCAG AGATGGGACT GTATTCTCCA 57781 GGCCAGACGT GTTTGATGTG CGACGGGACA CG.AGCGCGAG GCGTAC GGTAGAGGCC 57841 CCCATGTCTG CCCCGGGGTG TCGCTC GGAGGC GGAGATCGCC GTGGGCACCA 579C1 TCTTCCGTAG GTTCCCCGAG ATGAAGCTGA AAGAAACTCC CGTGTTTGGA TACCACCCCG 57961 CGTTCCGGAA CATCGAATCA CTCAACGTCA TCTTGAAGCC CTCCAAAGCT GGATAACTCG 58021 CGGGGGCATC GCTTCCCGAA ATTCTT TCATGATGCA ACTCGCGCGC GGGTGCTGTC 58081 TGCCGCGGGT GCGATTCGAT CCAGCGGACA AGCCCATTGT CAGCGCGCGA AGATCGAATC 58141 CACGGCCCGG AGAAGAGCCC GATGGCGAGC CCGTCCGGGT AACGTCGGAA GAAGTGCCGG 58201 GCGCCGCCC7 GGGAGCGCAA AGCTCGCTCG CTCGCGCTCA GCGCGCCGCT TGCCATGTCC 58261 GGCCAC CCGCACCGAG GAGCCACCCG CATGCA CGGACC GAGCGGCAGG 58321 TTCTGCTCTC GCTCGTCGCC CTCGCGCTCG TCTGAC CGCGCGCGCC TTCGGCGAGC 58381 TCGCGCGGCG GCTGCGCCAG CCCGAGGTGC TCGGCGAGCT CTTCGGCGGC GTGGTGCTGG 58441 GCCCGTCCGT CGTCGGCGCG CTCGCT GGTTCCATCG AGTTTC CAGGATCCGG 58501 CGGTCGGGGG CGTGCTCTCC GGCATCTGGATAGGCGC GCTCGT CTGCTCATGG 58561 CGGGTATCGA GGTCGATGTG AGCATTCTAC GCAAGGAGGC GCGCCCCGGG GCGCTCTCGG 58621 CGCTCGGCGC GATCGCGCCC CCGCTGCGCA CGCCGGGCCC GCTGGTGCAG CGCATGCAGG 58681 GCACGTTGAC GTGGGATCTC GACGTCTCGC CGCGACGCTC TGCGCAAGCC TGAGGG 58741 CGCTCG TACAGC CGGTGCTCGC TCCGCCCGCG GACATCCGGC CGCCCCCCGC 58801 GGCCCAGCTC GAGCCGGACT CGCCGGATGA CGAGGCCGAC GAGGCGCTCC GCCCGTTCCG 
58861 CGACGCGATC GCCGCGTACT CGGAGGCCGT TCGGTGGGCG GAGGCGGCGC AGCGGCCGCG 58921 GCTGGAGAGC CTCGTGCGGC TCGCGATCGT GCGGCTGGGC AAGGCGCTCG AC.AAGGCACC 58981 TTTCGCGCAC ACGACGGCCG GCGTCTCCCA GATCGCCGGC AGACTTCCCC AG.AAAACGAA 59041 TGCGGTCTGG TTCGATGTCG CCGCCCGGTA CGCGAGCTTC CGCGCGGCGA CGGAGCACGC 59101 GCTCCGCGAC GCGGCGTCGG CCACGGAGGC GCTCGCGGCC GGCCCGTACC GCGGATCGAG 59161 CAGCGTGTCC GCTGCCGTAG GGGAGTTTCG GGGGGAGGCG GCGCGC ACCCCGCGGA 59221 CCGCGTACCC GCGTCCGACC AGCAGATGACCGCGCTG CGCGCAGCCG AGCGGGCGCT 59281 CATCGCGCTC TACACCGCGT TCGCCCGTGA GGAGTGAGCC TCTCTCGGGC GCAGCCGAGC 59341 GGCGGCGTGC CGGTTGTTCC CTCTTCGCAA CCATGACCGG AGCCGCGCCC GGTCCGCGCA 59 01 GCGGCTAGCG CGCGTCGAGG CAGAGAC-CGC TGGAGCGACA GGCGACGACC CGCCCGAGGG 59461 TGTCGAACGG ATTGCCGCAG CATTGC GGATCC CAGACACTCG TTCAGCG59521 TGGCGTCGAT GCCGGG CACTCGCCGA AGGTCAGCTC GTCGCGCCAG TCGGATCGGA 59531 TCTTG77CGA GCACGCATCC TTGCTCGAAT ACTCCCGGTC TTGTCCGATG T7G77GCACC 59641 GCGGCG GTCGCACCGC GCCGCCACGA TGCTATCGAC GGCGCTGCCG ACTGGCACCG 59701 GCGGCC TTGCGCGCCA CCCGGGGTTT GCGCCC GACCGC TTTTCGCCGC 59761 CGCACGCCGC CGCGAGCAGG CTCATTCCCG ACATCGAGAT CAGGCCCACG ACCAGTTTCC 59821 CAGCAATCTT TTGCATGGCT TCCCACGACACGT CACATCAGAG ATTCTCCGC7 59881 CGGCTCGTCG GTTCGACAGC CGGCG.ACGGC CACGAGCAGA ACCGTCCCCG ACCAGAACAG 599 1 CCGCATGCGG GTTTCTCGCA GCATGCCACG ACATGC GACTAGCGTG CGCTCG 60001 TGCCGAGATC GGCTGTCCTG TGCGACGGCA ATGTCCTGCG ATCGGCCGGG CAGGATCGAC 60061 CGACACGGGC GCCGGGCTGG AGGTGCCGCC ACGGGCTCG.A AATGCGCTGT GGCAGGCGCC 60121 TCCATGCCCG CTGCCGGGAA CGCAGCGCCC GGCCAGCCTC GGGGCGACGC TGCGAACGGG 60181 AGATGCTCCC GGAGAGGCGC CGGGCACAGC CGAGCGCCGT CACCACCGTG CGCACTCGTG 60241 AGCGCTAGCT CCTCGGCATA GAAGAGACCG TCACTCCCGG TCCGTGTAGG CGATCGTGCT 60301 GATCAGCGCG TCCTCCGCCT GACGCGAG7C GAGCCGGGTA TGCTGCACGA CGATGGGCAC 60361 GTCCGATTCG ATCACGCTGG CATAGTCCGT ATCGCGCGGG ATCGGCTCGG GG7CGGTCAG 60421 ATCGTTGAAC CGGACGTGCC GGGTGCGCCT CGCTGGAACG GTCACCCGGT ACGGCCCGGC 60481 GGGGTCGCGG TCGCTGAAGT AGACGGTGAT GGCGACCTGC GCGTCCCGGT CCGACGCATT 60541 CAACAGGCAG GCCGTCTCAT 
GGCTCGTCA7 CTGCGGCTCA GGTCCGTTGC TCCGGCCTGG 60601 GATGTAGCCC TCTGCGATTG CCCAGCGCGT CCGCCCGATC GGCTTGTCCA TGTGTCCTCC 60661 CTCCTGGCTC CTCTTTGGCA GCCTCCCTCT GCTGTCCAGG TGCGACGGCC TCTTCGC7CG 60721 ACGCGCTCGG GGCTCCATGG CTGAGAATCC TCGCCGAGCG CTCCTTGCCG ACCGGCGCGC 60781 TGAGCGCCGA CGGGCCTTGA AAGCACGCGA CCGGACACGG GATGCCGGCG CGACGAGGCC 60841 GCCCCGCGTC TGATCCCGAT CGTGGCATCA CGACGTCCGC CGACGCCTCG GCAGGCCGGC 60901 GTGAGCGCTG CGCGG7CATG GTCGTCCTCG CGTCACCGCC ACCCGCCGAT TCACATCCCA 60961 CCGCGGCACG ACGCTTGCTC AAACCGCGAC GACACGGCCG GGCGGCTGTG GTACCGGCCA 61021 GCCCGGACGC GAGGCCCGAG AGGGACAGTG GGTCCGCCGT GAAGCAGAGA GGCGATCG.AG 61081 GTGGTGAGAT GAAACACGTT GACACGGGCC GACGAGTCGG CCGCCGGATA GGGCTCACGC 61141 TCGGTCTCCT CGCGAGCATG GCGCTCGCCG GCTGCGGCGG CCCGAGCGAG AAGACCGTGC 61201 AGGGCACGCG GCTCGCGCCC GGCGCCGATG CGCACGTCAC CGCCGACGTC GACGCCGACG 61261 CCGCGACCAC GCGGCTGGCG GTGGACGTCG TTCACCTCTC GCCGCCCGAG CGGATCGAGG 61321 CCGGCAGCGA GCGGTTCGTC GTCTGGCAGC GTCCGAACTC CGAGTCCCCG TGGCTACGGG 61381 TCGGAGTGCT CGACTACAAC GCTGCCAGCC GAAGAGGCAA GCTGGCCGAG ACGACCGTGC 61441 CGCATGCCAA CTTCGAGCTG CTCATCACCG TCGAGAAGCA GAGCAGCCCT CAGTCGCCAT 615C1 CGTCTGCCGC CGTCATCGGG CCGACGTCCG TCGGGTAACA TCGCGCTATC AGCAGCGCTG 61561 AGCCCGCCAG CATGCCCCAG AGCCCTGCCT CGATCGCTTT CCCCATCATC CGTGCGCACT 61621 CCTCCAGCGA CGGCCGCGTC AAAGCAACCG CCGTGCCGGC GCGGCTCTAC GTGCGCGACA 61631 GGAGAGCGTC CTAGCGCGGC CTGCGCATCG CTGGAAGGAT CGGCGGAGCA TGGAGAAAGA 61741 ATCGAGGATC GCGATCTACG GCGCCGTCGC CGCCAACGTG GCGATCGCGG CGGTCAAGTT 618C1 CATCGCCGCC GCCGTGACCG GCAGCTCTGC GATGCTCTCC GAGGGCGTGC ACTCCCTCGT 61861 CGATACCGCA GACGGGCTCC TCCTCCTGCT CGGCAAGCAC CGGAGCGCCC GCCCGCCCGA 61921 CGCCGAGCAT CCGTTCGGCC ACGGCAAGGA GCTCTATTTC TGGACGCTGA TCGTCGCCAT 61981 CATGATCTTC GCCGCGGGCG GCGGCGTCTC GATCTACGAA GGGATCTTGC ACCTCTTGCA 62041 CCCGCGCTCG ATCGAGGATC CGACGTGGAA CTACGTTGTC CTCGGCGCAG CGGCCGTCTT 62101 CGAGGGGACG TCGCTCGCCA TCTCGATCCA CGAGTTCAAG AAGAAAGACG GACAGGGCTA 62161 CGTCGCGGCG ATGCGGTCCA GCAAGGACCC GACGACGTTC ACGATCGTCC TGGAGGATTC 62221 
CGCGGCGCTC GCCGGGCTCG CCATCGCCTT CCTCGGCGTC TGGCTTGGGC ACCGCCTGGG 62281 AAACCCCTAC CTCGACGGCG CGGCGTCGAT CGGCATCGGC CTCGTGCTCG CCGCGGTCGC 62341 GGTCTTCCTC GCCAGCCAGA GCCGTGGACT CCTCGTAGGG GAGAGCGCGG ACAGGGAGCT 62401 CCTCGCCGCG ATCCGCGCGC TCGCCAGCGC AGATCCTGGC GTGTCGGCGG TGGGGCGGCC 62461 CCTGACGATG CACTTCGGTC CGCACGAAGT CCTGGTCGTG CTGCGCATCG AGTTCGACGC 62521 CGCGCTCACG GCGTCCGGGG TCGCGGAGGC GATCGAGCGA ATCGAGACAC GGATACGGAG 62581 CGAGCGACCC GACGTGAAGC ACATCTACGT CGAGGCCAGG TCGCTCCACC AGCGCGCGAG 62641 GGCGTGACGC GCCGTGGAGA GACCGCTCGC GGCCTCCGCC ATCCTCCGCG GCGCCCGGGC 62701 TCGGGTAGCC CTCGCAGCAG GGCGCGCCTG GCGGGCAAAC CGTGAAGACG TCGTCCTTCG 62761 ACGCGAGGTA CGCTGGTTGC AAGTTGTCAC GCCGTATCGC GAGGTCCGGC AGCGCCGGAG 62821 CCCGGGCGGT CCGGGCGCAC GAAGGCCCGG CGAGCGCGGG CTTCGAGGGG GCGACGTCAT 62831 GAGGAAGGGC AGGGCGCATG GGGCGATGCT CGGCGGGCGA GAGGACGGCT GGCGTCGCGG 62941 CCTCCCCGGC GCCGGCGCGC TTCGCGCCGC GCTCCAGCGC GGTCGCTCGC GCGATCTCGC 63001 CCGGCGCCGG CTCATCGCCG CCGTGTCCCT CACCGGCGGC GCCAGCATGG CGGTCGTCTC 63061 GCTGTTCCAG CTCGGGATCA TCGAGCACCT GCCCGATCCT CCGCTTCCAG GG7TCGATTC 63121 GGCCAAGGTG ACGAGCTCCG ATATCGCGTT CGGGCTCACG ATGCCGGACG CGCCGCTCGC 63181 GCTCACCAGC TTCGCGTCCA ACCTGGCGCT GGCTGGCTGG GGAGGCGCCG AGCGCGCCAG 63241 GAACACCCCC TGGATCCCCG TCGCCGTGGC GGCCAAGGCG GCCGTCGAGG CGGCCGTGTC 63301 CGGATGGCTC CTCGTCCAGA TGCGACGGCG GGAGAGGGCC TGGTGCGCGT ACTGCCTGGT 63361 CGCCATGGCG GCCAACATGG CCGTGTTCGC GCTCTCGCTC CCGGAAGGGT GGGCGGCGCT 63421 GAGGAAGGCG CGAGCGCGCT CGTGACAGGG CCGTGCGGGC GCCGCGGCCA TCGGAGGCCG 63481 GCGTGC.ACCC GCTCCGTCAC GCCCCGGCCC GCGCCGCGGT GAGCTGCCGC GGACAGGGCG 63541 CGTACCGTGG ACCCCGCACG CGCCGCGTCG ACGGACATCC CC3GCGGC7C GCGCGGCGCG 63601 GCCGGCGCAA CTCCGGCCCG CCGCCGGGCA TCGACATCTC CCGCGAGCAA GGGCAC7CCG 63661 CTCCTGCCCG CGTCCGCGAA CGA7GGCTGC GCTGTTTCCA CCCTGGAGCA ACTCCGTT7A 63721 CCGCGTGGCG CTCG7CGGGC TCATCGCCTC GGCGGGCGGC GCCATCCTCG CGCTCATGAT 63781 CTACGTCCGC ACGCCGTGGA AGCGATACCA GTTCGAGCCC GTCGATCAGC CGGTGCAGTT 63841 CGATCACCGC CATCACGTGC AGGACGATGG CATCGATTGC GTCTACTGCC 
ACACCAC3G7 63901 GACCCGCTCG CCGACGGCGG GGATGCCGCC GACGGCCACG TGCATGGGGT GCCAC. AGCCA 63961 G.ATCTGGAAT CAGAGCGTCA TGCTCGAGCC CGTGCGGCGG AGC7GGTTCT CCGGCATGCC 64021 GATCCCGTGG AACCGGGTGA ACTCCGTGCC CGACTTCGTT 7AT7TCAACC ACGCGATTCA 64081 CGTGAACAAG GGCGTGGGCT GCGTGAGCTG CC.ACGGGCGC GTGGACGAGA TGGCGGCCG7 64141 CTACAAGGTG GCGCCGATGA CGATGGGCTG GTGCC7GGAG TGCCATCGCC TGCCGGAGCC 64201 GCACCTGCGC CCGCTCTCCG CGATCACCGA CATGCGCTGG GACCCGGGGG AACGGAGGGA 64261 CGAGCTCGGG GCGAAGCTCG CGAAGGAGTA CGGGGTCCGG CGGCTCACGC ACTGC & CAGC 64321 GTGCCATCGA TGAACGATGA ACAGGGGATC TCCGTGAAAG ACGCAGA 7GA GATGA ^ GGAA 64381 TGGTGGCTAG AAGCGCTCGG GCCGGCGGGA GA3CGCGCGT CCT.ACAGGCT GCTGGCGCCG 64441 CTCATCGAGA GCCCGGAGCT CCGCGCGCTC GCCGCGGGCG AACCGCCCCG GGGCGTGGAC 64501 GAGCCGGCGG GCGTCAGCCG CCGCGCGCTG CTCAAGCTGC TCGGCGCGAG CATGGCGCTC 64561 GCCGGCGTCG CGGGCTGCAC CCCGCATGAG CCCGAGAAGA 7CCTGCCGTA CAACGAGACC 64621 CCGCCCGGCG TCGTGCCGGG TCTCTCCCAG TCCTACGCGA CGAGCATGGT GCTCGACGGG 64681 TATGCCATGG GCCTCCTCGC CAAGAGCTAC GCGGGGCGGC CCATCAAGAT CGAGGGCAAC 64741 CCCGCGCACC CGGCGAGCCT CGGCGCGACC GGCGTCCACG AGCAGGCCTC GATCCTCTCG 64801 CTGTACGACC CGTACCGCGC GCGCGCGCCG ACGCGCGGCG GCCAGGTCGC GTCGTGGGAG 64861 GCGCTCTCCG CGCGCTTCGG CGGCGACCGC GAGGACGGCG GCGCTGGCCT CCGCTTCGTC 64921 CTCCAGCCCA CGAGCTCGCC CCTCATCGCC GCGCTGATCG AGCGCGTCCG GCGCAGGTTC 64981 CCCGGCGCGC GGTTCACCTT CTGGTCGCCG GTCCACGCCG AGCAAGCGCT CGAAGGCGCG 65041 CGGGCGGCGC TCGGCCTCAG GCTCTTGCCT CAGCTCGACT TCGACCAGGC CGAGGTGATC 65101 CTCGCCCTGG ACGCGGACTT CCTCGCGGAC A7GCCGTTCA GCGTGCGCTA TGCGCGCGAC 65161 TTCGCCGCGC GCCGCCGACC CGCGAGCCCG GCGGCGGCCA TGAACCGCCT CTACGTCGCG 65221 GAGGCGATGT TCACGCCCAC GGGGACGCTC GCCGACCACC GGCTCCGCGT GCGGCCCGCC 65281 GAGGTCGCGC GCGTCGCGGC CGGCGTCGCG GCGGAGCTCG TGCACGGCCT CGGCCTGCGC 653 1 CCGCGCGGGA TC.ACGGACGC CGACGCCGCC GCGCTGCGCG CGCTCCGCCC CCCGGACGGC 65401 GAGGGGCACG GCGCCT7CGT CCGGGCGCTC GCGCGCGATC TCGCGCGCGC GGGGGGCGCC 65461 GGCG7CGCCG 7CGTCGGCGA CGGCCAGCCG CCCATCGTCC ACGCCCTCGG GCACGTCATC 65521 AACGCCGCGC TCCGCAGCCG 
GGCGGCCTGG ATGGTCGATC CTGTGCTGAT CGACGCGGGC 65581 CCCTCCACGC .AGGGCTTCTC CGAGCTCGTC GGCGAGCTCG GGCGCGGCGC GG7CGACACC 65641 TGATCCTCCT CGACGTGAAC CCCGTGTACG CCGCGCCGGC CGACGTCGAT TTCGCGGGCC 65701 TCCTCGCGCG CGTGCCCACG AGCTTGAAGG CCGGGCTCTA CGACGACGAG ACCGCCCGCG 65761 CTTGCACGTG GTTCGTGCCG ACCCGGCATT ACCTCGAGTC GTGGGGGGAC GCGCGGGCGT 65821 ACGACGGGAC GGTCTCGTTC GTGCAACCCC TCGTCCGGCC GCTGTTCGAC GGCCGGGCGG 65881 TGCCCGAGCT GCTCGCCGTC TTCGCGGGGG ACGAGCGCCC GGATCCCCGG CTGCTGCTGC 65941 GCGAGCACTG GCGCGGCGCG CGCGGAGAGG CGGATTTCGA GGCCTTCTGG GGCGAGGCA7 66001 TGAAGCGCGG CTTCCTCCCT GACAGCGCCC GGCCGAGGCA GACACCGGAT CTCGCGCCGG 66061 CCGACCTCGC CAAGGAGCTC GCGCGGCTCG CCGCCGCGCC GCGGCCGGCC GGCGGCGCGC 66121 TCGACGTGGC GTTCCTCAGG TCGCCGTCGG TCCACGACGG CAGGTTCGCC AACAACCCCT 66181 GGCTGCAAGA GCTCCCGCGG CCGATCACCA GGCTCACCTG GGGCAACGCC GCCATGATGA 66241 GCGCGGCGAC CGCGGCGCGG CTCGGCGTCG AGCGCGGCGA TGTCGTCGAG CTCGCGCTGC 66301 GCGGCCGTAC GATCGAGATC CCGGCCGTCG TCGTCCGCGG GCACGCCG.AC GACGTGaATCA 66361 GCGTCGACCT CGGCTACGGG CGCGACGCCG GCGAGGAGGT CGCGCGCGGG GTGGGCGTGT 66421 CGGCGTATCG GATCCGCCCG TCCGACGCGC GGTGGTTCGC GGGGGGCC7C TCCGTGAGGA 66481 AGACCGGCGC CACGGCCGCG CTCGCGCTGG CTCAGATCGA GCTGTCCCAG CACGACCGTC 66541 CCATCGCGCT CCGGAGGACG CTGCCGCAGT ACCGTGAACA GCCCGGTTTC G CGGAGGAGC 66601 ACAAGGGGCC GGTCCGCTCG ATCCTGCCGG AGGTCGAGTA CACCGGCGCG CAATGGGCGA 66661 TG7CCA7CGA CATGTCGATC TGCACCGGGT GC7CC7CGTG CGTCGTGGCC TGTCAGGCCG 66721 AGAACAACGT CC7CGTCGTC GGCAAGGAGG AGGTGATGCA CGGCCGCGAG ATGCAGTG3T 66781 TGCGGATCGA TCAGTACTTC GAGGGTGGAG GCGACGAGGT GAGCGTCGTC AACC GCCGA 66841 TGCTCTGCCA GCACTGCGAG AAGGCGCCGT GCGAGTACGT CTGTCCGGTG AACGCGACGG 66901 TCCAC.AGCCC CGATGGCCTC AACGAGATGA TCTACAACCG ATGCATCGGG ACGCGCTTTT 66961 GCTCCAACAA C7GTCCGTAC AAGATCCGGC GGTTCAATTT CTTCGACTAC AATGCCCACG 67021 TCCCGTACAA CGCCGGCCTC CGCAGGCTCC AGCGCAACCC GGACGTCACC GTCCGCGCCC 67081 GCGGCGTCAT GGAGAAATGC ACGTACTGCG TGCAGCGGAT CCGAGAGGCG GACATCCGCG 67141 CGCAGATCGA GCGGCGGCCG CTCCGGCCGG GCGAGGTGGT C.ACCGCCTGC CAGCAGGCCT 67201 
GTCCGACCGG CGCGATCCAG TTCGGGTCGC TGGATCACGC GGATACAAAG ATGGTCGCGT 67261 GGCGCAGGGA GCCGCGCGCG TACGCCGTGC TCCACGACCT CGGCACCCGG CCGCGGACGG 67321 AGTACCTCGC CAAGATCGAG .AACCCGAACC CGGGGCTCGG GGCGGAGGGC GCCGAGAGGC 67381 GACCCGGAGC CCCGAGCGTC AAACCCGCGC TCGGGGCGGA GGGCGCCGAG AGGCGACCCG 74 1 GAGCCCCGAG CGTCAAACCG GAGATTGAAT GAGCCATGGC GGGCCCGCTC ATCCTGGACG 67531 CACC3ACCGA CGATCAGCTG TCGAAGCAGC TCCTCGAGCC GGTATGGAAG CCGCGCTCCC 67561 GGCTCGGCTG GATGCTCGCG TTCGGGCTCG CGCTCGGCGG CACGGGCCTG C7CTTCCTCG 67621 CGATCACCTA CACCGTCCTC ACCGGGATCG GCGTGTGGGG CAACAACATC CCGGTCGCCT 67631 GG3CCTTCGC GATCACCAAC TTCGTCTGGT GGATCGGGAT CGGCCACGCC GGGACGTTCA 67741 TCTCCGCGAT CCTCCTCCTG CTCGAGCAGA AGTGGCGGAC GAGCATCAAC CGCTTCGCCG 67301 AGGCGATGAC GCTC77CGCG G7CG7CCAGG CCGGCCTCTT TCCGGTCCTC CACCTCGGCC 67361 3CCCCTGGTT CGCCTACTGG ATCTTCCCGT ACCCCGCGAC GATGCAGGTG TGGCCGCAGT 67921 TCCGGAGCGC GCTGCCGTGG GACGCCGCCG CGATCGCGAC CTACTTCACG GTGTCGCTCC 67931 7GTTCTGG7A CATGGGCCTC GTCCCGGATC TGGCGGCGCT GCGCGACCAC GCCCCGGGCC 68041 GCGTCCGGCG GGTGATCT.AC GGGCTCATGT CGTTCGGCTG GCACGGCGCG GCCGACCACT 63101 TCCGGCAT7A CCGGGTGC7G TACGGGCTGC TCGCGGGGCT CGCGACGCCC CTCGTCGTCT 68161 CGGTGCACTC GATCGTGAGC AGCGATT7CG CGATCGCCCT GGTGCCCGGC TGGCACTCGA 6322 CGCTCT7TCC GCCGTTCTTC GTCGCGGGCG CGATCTTCTC CGGGTTCGCG ATGGTGC7CA 68281 CGCTGCTCAT CCCGGTGCGG CGGATCTACG GGCTCCATAA CGTCGTGACC GCGCGCCACC 63341 TCGACGA7CT CGCGAAGATG ACGCTCGTGA CCGGCTGGAT CG7CATCCTC TCGTACATCA 68401 TCGAGAACTT CCTCGCCTGG TACAGCGGCT CGGCGTACGA GATGCATC .AG TT7TTCCAGA 68461 CGCGCCTGCA CGGCCCGAAC AGCGCCGCCT ACTGGGCCCA GCACGTCTGC AACGTGCTCG 68521 TCATCCAGCT CCTCTGGAGC GAGCGGATCC GGACGAGCCC CGTCGCGCTC TGGCTCATCT 68581 CCCTCCTGGT CAACGTCGGG ATGTGGAGCG AGCGGTTCAC GCTCATCGTG ATGTCGCTCG 68641 AGCAAGAGTT CC7CCCGTCC AAGTGGCACG GCTACAGCCC GACGTGGGTG GACTGGAGCC 63701 7CTTCATCGG GTCAGGCGGC TTCTTCATGC TCCTGTTCCT GAGCTTTTTG CGCGTCTTTC 68761 CGTTCATCCC CGTCGCGGAG GTCAAGGAGC TCAACCATGA AGAGCTGGAG AAGGCTCGGG 63821 GCGAGGGGGG CCGCTGATGG AGACCGGAAT GCTCGGCGAG 
TTCGATGACC CG GAGGCGAT 68881 GCTCCATGCG ATCCGAGAGC TCAGGCGGCG CGGCTACCGC CGGGTGG.AAG CGTTCACGCC 68941 CTATCCGGTG AAGGGGCTCG ACGAGGCGCT CGGCCTCCCG CGCTCGAACC TCAACCGGAT 69001 GGTGCTGCCC TTCGCGATCC TGGGGGTCGT GGGCGGCTAC TTCGTCCAGT GGTTCTGCAA 69061 CGCTTTCCAC TATCCGCTGA ACGTGGGCGG GCGCCCGCTG AACTCGGCGC CGGCGTTC.AT 69121 CCCGATCACG TTCGAGATGG GGGTGCTCTC CACCTCGATC TTCGGCGTGC TCATCGGCTT 69181 TTACCTGACG AGGCTGCCGA GGCTCTACCT CCCGCTCTTC GACGCCCCGG GCTTCGAGCG CGTCACGCTG 69241 G .ATCGGTTTC TGGTCGGGCT CGACGACACG GAACCTTCCT TCTCGAGCGC 69301 CCAGGCGGAG CGCGACCTCC TCGCGCTCGG CGCCCGGCGC GTCGTCGTCG CGAGGAGGCG 69361 CG.AGGAGCCA TGAGGGCCGG CGCCCCGGCT CGCCCTCTCG GGCGCGCGCT CGCGCCGTTC 69421 GCCCTCGTCC TGCTCGCCGG GTGCCGCGAG AAGGTGCTGC CCGAGCCGGA CTTCGAGCGG 69481 ATGATCCGCC AGGAGAAATA CGGACTCTGG GAGCCGTGCG AGCACTTCGA CGACGGCCGC 69541 GCGATGCAGC ACCCGCCCGA GGGGACCGTC GCGCGCGGGC GCGTCACCGG GCCGCCCGGC 69601 TATCTCCAGG GCGTCCTCGA CGGGGCGTAC GTCACGGAGG TGCCGCTCTT GCTCACGGTC 69661 GAGCTCGTGC AGCGCGGCCG GCAGCGCTTC GAGACCTTCT G CGCGCCGTG CCACGGGATC 69721 CTCGGCGACG GCAGCTCGCG CGTGGCGACG AACATGACGC TGCGCCCGCC CCCGTCGCTC 69781 ATCGGACCCG AGGCGCGGAG CTTCCCGCCG GGCAGGATCT ACCAGGTCAT C.ATCGAGGGC 698 1 TACGGCCTGA TGCCGCGCTA CTCGGACGAT C7GCCCGACA TCGAAGAGCG CTGGGCCGTG 69901 GTCGCCTACG TGAAGGCGCT TCAGCTGAGC CGCGGAGTGG CCGCGGGCGC CCTCCCGCCA 69961 GCGCTCCGCG GCCGGGCAGA GCAGGAGCTG CGATGAACAG GGATGCC.ATC GAGTACAAGG 70021 GCGGCGCGAC GATCGCGGCC TCGCTCGCGA TCGCGGCGCT CGGCGCGGTC GCCGCGATCG 70081 TCGGCGGCTT CGTCGATCTC CGCCGGTTCT TCTTCTCGTA CCTCGCCGCG TGGTCGTTCG 70141 CGGTGTTTCT GTCCGTGGGC GCGCTCGTC.A CGCTCCTCAC CTGCAACGCC ATGCGCGCGG 70201 GC7GGCCCAC GGCGGTGCGC CGCCTCCTCG AGACGATGGT GGCGCCGCTG CCTCTGCTCG 70261 CGGCGCTC7C CGCGCCGATC C7GGTCGGCC TGG.ACACGCT GTATCCGTGG ATGCACCCCG 70321 AGCGGATCGC CGGCGAGCAC GCGCGGCGCA TCCTCGAGCA CAGGGCGCCC TACTTCAATC 70381 C.AGGCTTCTT CGTCGTGCGC TCGGCGATCT ACTTCGCGAT CTGGATCGCC GTCGCCC7CG 70441 TGCTCCGCCG GCGATCG7TC GCGC.AGGACC GTGAGCCGAG GGCCGACGTC AAGGACGCGA 70501 TGTATGGCCT 
G.AGCGGCGCC .ATGCTGC CGG TCGTGGCGAT CACGATCGTC TTCTCGTCGT 70551 TCGACTGGCT CATGTCCCTC GACGCGACCT GGT.ACTCGAC GATGTTCCCG GTCTACGTGT 70621 TCGCGAGCGC CTTCGTGACC GCCGTCGGCG CGCTCACGG7 CCTCTCGTA7 GCCGCGCAGA 70681 CGTCCGGC7A CCTCGCG GG CTGAACGACT CGCACTATTA CGCGC7CGGG CGGCTGCTCC 70741 7CGCGTTCAC GATATTCTGG GCCTATGCGG CCTATTTCCA GTTCATGTTG ATCTGGATCG 70801 CGAACAAGCC CGATGAGG7C GCCTTCTTCC TCGACCGCTG GGAAGGGCCC TGGCGGCCGA 70861 CCTCCGTGCT CGTCG7CC7C ACGCGGT7CG TCGTCCCGTT CCTGATCCTG ATGTCGTACG 70921 CGATCAAGCG GCGCCCGCGC CAGCTCTCGT GGATGGCGCT CTGGGTCGTC GTCTCCGGC7 70981 ACATCGACTT 7CACTGGCTC GTGGTGCCGG CGACAGGGCG CCACGGGTTC GCCTATCACT 71041 GGCTCGACCT CGCGACCCTG TGCGTCGTGG GCGGCCTCTC GACCGCGTTC GCCGCGTGGC 71101 GGCTGCGAGG GCGGCCGGTG GTCCCGGTCC ACGACCCGCG GCTCGAAGAG GCCTTTGCGT 71161 ACCGGAGCAT ATGATGTTCC GTTTCCGTCA CAGCGAGGTT CyC * rt v? G? AGGACACGCT 71221 CCCCTGGGGG CGCGTGATCC TCGCGTTCGC CGTCGTGCTC GCGATCGGCG GCGCGCTGAC 71231 GC7C7GGGCC TGGCTCGCGA TGCGGGCCCG CGAGGCGGAT CTGCGGCCCT CCCTCGCG7T 71341 CCCCG.AGAAG GATCTCGGGC C CGGCGCvrA GGTCGGCATG GTCCAGCAGT CGCTGTTCGA 71401 CGAGGCGCGC CTGGGCCAGC AGCTCGTCGA CGCGCAGCGC GCGGAGCTCC rG.CCGCTTCGG 71461 CGTC3TCGAT vG jAGAGGo GCATCGTGAG C.ATCCCGATC GACGACGCGA TCGAGCTCAT 71521 GGTGGCGGGG GGCGCGCGAT GAGCCGGGCC GTCGCCGTGG CCCTCCTGCT GGC.AGCCGGC 71581 CTCGTGTCGC GCCCGGGCGC CGCGTCCGAG CCCGAGCGCG CGCGCCCCGC GCTGGGCCCG 71641 TCCGCGGCCG ACGCCGCGCC GGCGAGCGAC GGCTCCGGCG CGGAGGAGCC GCCCGAAGGC 71701 GCCT7CCTGG AGCCCACGCG CGGGGTGGAC ATCGAGGAGC GCCTCGGCCG CCCGGTGGAC 71761 CGCGAGCTCG CCT7CACCGA CATGGACGGG CGGCGGGTGC GCCTCGGCGA CTACTTCGCC 71821 G.ACGGCAAGC CCCTCCTCCT CGTCCTCGCG TACTACCGGT GTCCCGCGCT GTGCGGCCTC 71881 GTGCTGCGCG GCGCCGTCGA GGGGCTGAAG CTCCTCCCGT ACCGGCTCGG CGAGC.AGTTC 71941 C.ACGCGCTCA CGGTCAGCTT CGACCCGCGC GAGCGCCCGG CGGCCGCDD

EXAMPLE 2 Construction of an expression vector in Myxococcus xanthus

The DNA providing the integration and attachment function of phage Mx8 was inserted into the commercially available vector pACYC184 (New
England Biolabs). An ~2360 bp MfeI-SmaI fragment from plasmid pPLH343, described in Salmi et al., Feb. 1998, J. Bact. 180(3): 614-621, was isolated and ligated to the large EcoRI-XmnI restriction fragment of plasmid pACYC184. The circular DNA thus formed was ~6 kb in size and was designated plasmid pKOS35-77. Plasmid pKOS35-77 serves as a convenient plasmid for expressing the recombinant PKS genes of the invention under the control of the epothilone PKS gene promoter. In an illustrative embodiment, the complete epothilone PKS gene with its homologous promoter is inserted in one or more fragments into the plasmid to produce an expression vector of the invention. The present invention also provides expression vectors in which the recombinant PKS genes of the invention are under the control of a Myxococcus xanthus promoter. To construct an illustrative vector, the promoter of the M. xanthus pilA gene was isolated as a PCR amplification product. Plasmid pSWU357, which comprises the pilA gene promoter and is described in Wu and Kaiser, Dec. 1997, J. Bact. 179(24): 7748-7758, was mixed with PCR primers Seq1 and Mxpil1: Seq1: 5'-AGCGGATAACAATTTCACACAGGAAACAGC-3'; and Mxpil1: 5'-TTAATTAAGAGAAGGTTGCAACGGGGGGC-3', and amplified using standard PCR conditions to yield an ~800 bp fragment. This fragment was cleaved with restriction enzyme KpnI and ligated to the large KpnI-EcoRV restriction fragment of the commercially available plasmid pLitmus 28 (New England Biolabs). The resulting circular DNA was designated plasmid pKOS35-71B. The pilA gene promoter was isolated from plasmid pKOS35-71B as an ~800 bp EcoRV-SnaBI restriction fragment and ligated to the large MscI restriction fragment of plasmid pKOS35-77 to yield a circular DNA ~6.8 kb in size.
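
The relationship between the two PCR primers quoted above and the pilA promoter listing given later in this example can be checked with a short script. The following is only a minimal sketch, not part of the original method. It assumes the second primer's name reads Mxpil1, takes the last 40 nucleotides of the promoter listing as `PROMOTER_3PRIME_END`, and relies on the fact that TTAATTAA is the recognition sequence of PacI. The function names are hypothetical.

```python
# Sketch: compare primer Mxpil1 with the 3' end of the pilA promoter listing.
# All names below are illustrative; the sequences are copied from this example.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

SEQ1 = "AGCGGATAACAATTTCACACAGGAAACAGC"
MXPIL1 = "TTAATTAAGAGAAGGTTGCAACGGGGGGC"
PACI_SITE = "TTAATTAA"  # PacI recognition sequence

# Last 40 nt of the pilA promoter listing shown later in this example.
PROMOTER_3PRIME_END = "GCGGGCGCAGTGCCCCCCGTTGCAACCTTCTCTTAATTAA"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_defines_3prime_end(primer: str, listing_end: str) -> bool:
    """True if the primer's reverse complement is the end of the listing."""
    return listing_end.endswith(reverse_complement(primer))


if __name__ == "__main__":
    print("Mxpil1 starts with PacI site:", MXPIL1.startswith(PACI_SITE))
    print("Mxpil1 reverse complement:", reverse_complement(MXPIL1))
    print("Matches end of listing:",
          primer_defines_3prime_end(MXPIL1, PROMOTER_3PRIME_END))
```

If the check passes, it is consistent with the PacI sequence at the end of the promoter fragment having been carried in on the 5' tail of the Mxpil1 primer, although the text does not state this explicitly.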
Because the ~800 bp fragment could be inserted in either of two orientations, the ligation produced two plasmids of the same size, which were designated plasmids pKOS35-82.1 and pKOS35-82.2. Restriction site and function maps of these plasmids are presented in Figure 3. Plasmids pKOS35-82.1 and pKOS35-82.2 serve as convenient starting materials for the vectors of the invention in which a recombinant PKS gene is placed under the control of the Myxococcus xanthus pilA gene promoter. These plasmids comprise a single recognition sequence for the restriction enzyme PacI located immediately downstream of the transcription start site of the promoter. In an illustrative embodiment, the entire epothilone PKS gene without its homologous promoter is inserted in one or more fragments into the plasmids at the PacI site to yield expression vectors of the invention. The sequence of the pilA promoter in these plasmids is shown below.
CGACGCAGGTGAAGCTGCTTCGTGTGCTCCAGGAGCGGAAGGTGAAGCCGGTCGGCAGCGCCGCGGAGATTC
CCTTCCAGGCGCGTGTCATCGCGGCAACGAACCGGCGGCTCGAAGCCGAAGTAAAGGCCGGACGCTTTCGTG
AGGACCTCTTCTACCGGCTCAACGTCATCACGTTGGAGCTGCCTCCACTGCGCGAGCGTTCCGGCGACGTGT
CGTTGCTGGCGAACTACTTCCTGTCCAGACTGTCGGAGGAGTTGGGGCGACCCGGTCTGCGTTTCTCCCCCG
AGACACTGGGGCTATTGGAGCGCTATCCCTTCCCAGGCAACGTGCGGCAGCTGCAGAACATGGTGGAGCGGG
CCGCGACCCTGTCGGATTCAGACCTCCTGGGGCCCTCCACGCTTCCACCCGCAGTGCGGGGCGATACAGACC
CCGCCGTGCGTCCCGTGGAGGGCAGTGAGCCAGGGCTGGTGGCGGGCTTCAACCTGGAGCGGCATCTCGACG
ACAGCGAGCGGCGCTATCTCGTCGCGGCGATGAAGCAGGCCGGGGGCGTGAAGACCCGTGCTGCGGAGTTGC
TGGGCCTTTCGTTCCGTTCATTCCGCTACCGGTTGGCCAAGCATGGGCTGACGGATGACTTGGAGCCCGGGA
GCGCTTCGGATGCGTAGGCTGATCGACAGTTATCGTCAGCGTCACTGCCGAATTTTGTCAGCCCTGGACCCA
TCCTCGCCGAGGGGATTGTTCCAAGCCTTGAGAATTGGGGGGCTTGGAGTGCGCACCTGGGTTGGCATGCGT
AGTGCTAATCCCATCCGCGGGCGCAGTGCCCCCCGTTGCAACCTTCTCTTAATTAA
To make the Myxococcus xanthus host cells of the invention, M. xanthus cells are grown in CYE medium (Fields and Zusman, 1975, Regulation of development in Myxococcus xanthus: effect of 3':5'-cyclic AMP, ADP, and nutrition, Proc. Natl. Acad. Sci. USA 72: 518-522) to a Klett of 100 at 30 °C with shaking at 300 rpm. The remainder of the protocol is conducted at 25 °C unless otherwise indicated. The cells are pelleted by centrifugation (8000 rpm for 10 min in an SS34 or SA600 rotor) and resuspended in deionized water. The cells are then pelleted again and resuspended in 1/100 of the original volume. DNA (one to two μL) is electroporated into the cells in a 0.1 cm cuvette at room temperature at 400 ohm, 25 μFD, 0.65 V with a time constant in the range of 8.8 to 9.4. The DNA must be free of salts and so should be resuspended in distilled and deionized water or dialyzed on a 0.025 μm type VS membrane (Millipore). For low efficiency electroporations, briefly dialyze the DNA and allow additional outgrowth in CYE.
Immediately after electroporation, add 1 mL of CYE, and pool the cells from the cuvette with an additional 1.5 mL of CYE previously added to a 50 mL Erlenmeyer flask (total volume 2.5 mL). Allow the cells to grow for four to eight hours (or overnight) at 30 to 32 °C at 300 rpm to allow expression of the selectable marker. Then, plate the cells in CYE soft agar on plates with selection. If kanamycin is the selectable marker, then typical yields are 10^3 to 10^5 colonies per μg of DNA. If streptomycin is the selectable marker, then it must be included in the top agar, because it binds agar.
With this method, the recombinant DNA expression vectors of the invention are electroporated into Myxococcus host cells, which then express the recombinant PKSs of the invention and produce the epothilone, epothilone derivatives, and other novel polyketides encoded thereby.
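The yield quoted above for kanamycin selection (10^3 to 10^5 colonies per μg of DNA) can be compared with a given experiment by a short calculation. The following Python sketch is illustrative only: the function names, the `plated_fraction` parameter, and the example numbers are assumptions and are not part of the protocol.

```python
def transformation_efficiency(colonies, dna_ug, plated_fraction=1.0):
    """Return transformants per microgram of DNA.

    colonies        -- colonies counted on the selection plate
    dna_ug          -- micrograms of DNA electroporated
    plated_fraction -- fraction of the outgrowth culture that was plated
                       (hypothetical parameter; 1.0 means the whole culture)
    """
    if dna_ug <= 0:
        raise ValueError("dna_ug must be positive")
    if not 0 < plated_fraction <= 1:
        raise ValueError("plated_fraction must be in (0, 1]")
    return colonies / (dna_ug * plated_fraction)


def in_typical_kanamycin_range(efficiency):
    """True if the value lies in the 10^3 to 10^5 per ug range quoted in the text."""
    return 1e3 <= efficiency <= 1e5


# Hypothetical experiment: 150 colonies, 0.5 ug DNA, one quarter of the culture plated.
eff = transformation_efficiency(150, 0.5, plated_fraction=0.25)
print(eff, in_typical_kanamycin_range(eff))
```

A value well below the quoted range would suggest, per the protocol, dialyzing the DNA and extending the outgrowth.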
EXAMPLE 3
Construction of a bacterial artificial chromosome (BAC) for the expression of epothilone in Myxococcus xanthus
To express the epothilone PKS and modification enzyme genes in a heterologous host to produce epothilones by fermentation, Myxococcus xanthus, which is closely related to Sorangium cellulosum and for which a number of cloning vectors are available, can also be employed in accordance with the methods of the invention. Because both M. xanthus and S. cellulosum are myxobacteria, they are expected to share common elements of gene expression, translational control, and post-translational modification (if any), thereby enhancing the likelihood that the epo genes of S. cellulosum can be expressed to produce epothilone in M. xanthus. Second, M. xanthus has been developed for gene cloning and expression. DNA can be introduced by electroporation, and a number of vectors and genetic markers are available for the introduction of foreign DNA, including those that permit its stable insertion into the chromosome. Finally, M. xanthus can be grown with relative ease in complex media in fermentors and can be subjected to manipulations to increase gene expression, if required.
To introduce the epothilone gene cluster into Myxococcus xanthus, one can build the epothilone cluster in the chromosome by using cosmids of the invention and homologous recombination to assemble the complete gene cluster. Alternatively, the complete epothilone gene cluster can be cloned on a bacterial artificial chromosome (BAC) and then moved into M. xanthus for integration into the chromosome.
To assemble the gene cluster from cosmids pKOS35-70.1A2 and pKOS35-79.85, small regions of homology from these cosmids have to be introduced into Myxococcus xanthus to provide recombination sites for larger pieces of the gene cluster. As shown in Figure 4, plasmids pKOS35-154 and pKOS90-22 were created to introduce these recombination sites.
The strategy for assembling the epothilone gene cluster in the M. xanthus chromosome is shown in Figure 5. Initially, a neutral site in the bacterial chromosome is chosen that does not disrupt any genes or transcriptional units. One such region is downstream of the devS gene, which has been shown not to affect the growth or development of M. xanthus. The first plasmid, pKOS35-154, is linearized with DraI and electroporated into M. xanthus. This plasmid contains two regions of the dev locus flanking two fragments of the epothilone gene cluster. Inserted between the epo gene regions are the kanamycin resistance marker and the galK gene. Kanamycin-resistant colonies arise if the DNA recombines into the dev region by a double recombination using the dev sequences as regions of homology. This strain, K35-159, contains small regions of the epothilone gene cluster that will allow recombination of pKOS35-79.85. Because the resistance markers on pKOS35-79.85 are the same as those in K35-159, a tetracycline transposon was transposed into the cosmid, and cosmids carrying the transposon inserted into the kanamycin marker were selected. This cosmid, pKOS90-23, was electroporated into K35-159, and oxytetracycline-resistant colonies were selected to create strain K35-174.
To remove the unwanted regions of the cosmid and leave only the epothilone genes, the cells were plated on CYE plates containing 1% galactose. The presence of the galK gene makes the cells sensitive to 1% galactose. Galactose-resistant colonies of K35-174 represent cells that have lost the galK marker either by recombination or by a mutation in the galK gene. If the recombination event has occurred, then the galactose-resistant strain is sensitive to kanamycin and oxytetracycline. Strains sensitive to both antibiotics are verified by Southern blot analysis.
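The marker logic of this selection and counter-selection scheme can be summarized in a small model. The Python sketch below is illustrative only: the `Strain` class, the marker labels, and the marker sets assigned to each strain are assumptions inferred from the description above, not data from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    name: str
    markers: frozenset  # labels are hypothetical: "kan", "tet", "galK"


def phenotype(strain):
    """Predict plate phenotypes from the markers a strain carries."""
    return {
        "kanamycin_resistant": "kan" in strain.markers,
        "oxytetracycline_resistant": "tet" in strain.markers,
        # A functional galK gene makes cells sensitive to 1% galactose.
        "galactose_sensitive": "galK" in strain.markers,
    }


def is_second_crossover_candidate(strain):
    """Galactose resistant AND sensitive to both antibiotics.

    A galK point mutant is also galactose resistant but keeps the
    antibiotic markers, so it fails this screen.
    """
    p = phenotype(strain)
    return not (p["galactose_sensitive"]
                or p["kanamycin_resistant"]
                or p["oxytetracycline_resistant"])


# Assumed marker sets, for illustration only.
k35_159 = Strain("K35-159", frozenset({"kan", "galK"}))
k35_174 = Strain("K35-174", frozenset({"kan", "galK", "tet"}))
galk_mutant = Strain("K35-174 galK mutant", frozenset({"kan", "tet"}))
resolved = Strain("resolved recombinant", frozenset())

for s in (k35_159, k35_174, galk_mutant, resolved):
    print(s.name, is_second_crossover_candidate(s))
```

Candidates that pass this phenotypic screen would still need the Southern blot verification described above.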
The correct strain is identified and designated K35-175; it contains the epothilone gene cluster from module 7 through the two open reading frames past the epoL gene. To introduce modules 1 through 7, the above process is repeated once more. Plasmid pKOS90-22 is linearized with DraI and electroporated into K35-175 to create K35-180. This strain is electroporated with the tetracycline-resistant version of pKOS35-70.1A2, pKOS90-38, and oxytetracycline-resistant colonies are selected. This creates strain K35-185. Recombinants that now have the complete epothilone gene cluster are selected by their resistance to 1% galactose. This results in strain K35-188. This strain contains all of the epothilone genes as well as all potential promoters. This strain is fermented and tested for the production of epothilones A and B.
To clone the complete gene cluster as one fragment, a bacterial artificial chromosome (BAC) library is constructed. First, SMP44 cells are embedded in agarose and lysed according to the BIO-RAD genomic DNA plug kit. The DNA is partially digested with a restriction enzyme, such as Sau3AI or HindIII, and electrophoresed on a FIGE or CHEF gel. The DNA fragments are isolated by electroeluting the DNA from the agarose or by using gelase to degrade the agarose. The method of choice for isolating the fragments is electroelution, as described in Strong et al., 1997, Nucleic Acids Res. 19: 3959-3961, incorporated herein by reference. The DNA is ligated into the BAC (pBeloBACII) cleaved with the appropriate enzyme. A map of pBeloBACII is shown in Figure 10. The DNA is electroporated into DH10B cells by the method of Sheng et al., 1995, Nucleic Acids Res. 23: 1990-1996, incorporated herein by reference, to create a S. cellulosum library. The colonies are screened using a probe from the NRPS region of the epothilone cluster. Positive clones are picked, and the DNA is isolated for restriction analysis to confirm the presence of the complete gene cluster.
This positive clone is designated pKOS35-178. To create a strain that can be used to introduce pKOS35-178, a plasmid, pKOS35-164, is constructed that contains regions of homology that are upstream and downstream of the epothilone gene cluster flanked by the dev locus and containing the kanamycin resistance galK cassette, analogous to plasmids pKOS90-22 and pKOS35-154. This plasmid is linearized with DraI and electroporated into M. xanthus, in accordance with the method of Kashefi et al., 1995, Mol. Microbiol. 15: 483-494, to create K35-183. Plasmid pKOS35-178 can be introduced into K35-183 by electroporation or by transduction with bacteriophage P1, and chloramphenicol-resistant colonies are selected. Alternatively, a version of pKOS35-178 that contains the origin of conjugative transfer from pRP4 can be constructed for transfer of DNA from E. coli to K35-183. This plasmid is made by first constructing a transposon containing the oriT region from RP4 and the tetracycline resistance marker from pACYC184 and then transposing the transposon in vitro or in vivo onto pKOS35-178. This plasmid is transformed into S17-1 and conjugated into M. xanthus. This strain, K35-190, is grown in the presence of 1% galactose to select for the second recombination event. This strain contains all of the epothilone genes as well as all potential promoters. This strain will be fermented and tested for the production of epothilones A and B.
In addition to integrating pKOS35-178 at the dev locus, it can also be integrated at a phage attachment site using integration functions from the myxophages Mx8 or Mx9. A transposon is constructed that contains the integration genes and att site from either Mx8 or Mx9 along with the tetracycline gene from pACYC184. Alternative versions of this transposon may have only the attachment site.
In this version, the integration genes are supplied in trans either by co-electroporation of a plasmid containing the integrase gene or by having the integrase protein expressed in the electroporated strain from any constitutive promoter, such as the mgl promoter (see Magrini et al., Jul. 1999, J. Bact. 181 (13): 4062-4070, incorporated herein by reference). Once the transposon is constructed, it is transposed onto pKOS35-178 to create pKOS35-191. This plasmid is introduced into Myxococcus xanthus as described above. This strain contains all of the epothilone genes as well as all potential promoters. This strain is fermented and tested for the production of epothilones A and B.
Once the epothilone genes have been established in a strain of Myxococcus xanthus, manipulation of any part of the gene cluster, such as changing promoters or swapping modules, can be performed using the kanamycin resistance and galK cassette. Cultures of Myxococcus xanthus containing the epo genes are grown in a number of media and examined for production of epothilones. If the levels of production of epothilones (in particular B or D) are too low to permit large-scale fermentation, the M. xanthus producing clones are subjected to media development and strain improvement, as described below for enhancing production in Streptomyces.
EXAMPLE 4
Construction of an expression vector in Streptomyces
The present invention provides recombinant expression vectors for the heterologous expression of modular polyketide synthase genes in Streptomyces hosts. These vectors include expression vectors that employ the actI promoter, which is regulated by the gene actII ORF4, to allow regulated expression at high levels when growing cells enter stationary phase. Among the available vectors are plasmids pRM1 and pRM5, and derivatives thereof such as pCK7 (Figure 11), which is a stable, low copy number plasmid that carries the marker for thiostrepton resistance in actinomycetes. Such plasmids can accommodate large inserts of cloned DNA and have been used for the expression of the DEBS PKS in S. coelicolor and S. lividans, the picromycin PKS genes in S. lividans, and the oleandomycin PKS genes in S. lividans. See U.S. Patent No. 5,712,146. Those skilled in the art will recognize that S. lividans does not make the tRNA that recognizes the TTA codon for leucine until late-stage growth, and that if production of a protein is desired earlier, then appropriate codon modifications can be made.
Another vector is a derivative of plasmid pSET152 and comprises the actII ORF4-PactI expression system but carries the selectable marker for apramycin resistance. These vectors contain the attP site and integrase gene of the actinophage phiC31 and do not replicate autonomously in Streptomyces hosts but integrate by site-specific recombination into the chromosome at the attachment site for phiC31 after introduction into the cell. Derivatives of pCK7 and pSET152 (Figure 12) have been used together for the heterologous production of a polyketide, with different PKS genes expressed from each plasmid. See U.S. patent application Serial No. 60/129,731, filed April 16, 1999, incorporated herein by reference.
The need to develop expression vectors for the epothilone PKS that function in Streptomyces is significant. The epothilone compounds are currently produced in the slow-growing, genetically intractable host Sorangium cellulosum or are made synthetically. The streptomycetes, bacteria that produce more than 70% of all known antibiotics and important complex polyketides, are important hosts for the production of epothilone and epothilone derivatives. S. lividans and S. coelicolor have been developed for the expression of heterologous PKS systems. These organisms can stably maintain cloned heterologous PKS genes, express them at high levels under controlled conditions, and modify the corresponding PKS proteins (e.g., by phosphopantetheinylation) so that they are capable of producing the polyketide they encode. Furthermore, these hosts contain the necessary pathways to produce the substrates required for polyketide synthesis, for example, malonyl CoA and methylmalonyl CoA. A wide variety of cloning and expression vectors are available for these hosts, as are methods for the introduction and stable maintenance of large segments of foreign DNA. Relative to the slow-growing Sorangium host, S. lividans and S. coelicolor grow well on a number of media and have been adapted for high-level production of polyketides in fermentors. A number of approaches are available for yield improvements, including rational approaches to increase expression rates, increase precursor supply, and the like. Empirical methods to increase polyketide titers, long proven effective for numerous other polyketides produced in streptomycetes, can also be employed for the production of epothilone and epothilone derivatives in the host cells of the invention.
To produce epothilones by fermentation in a heterologous Streptomyces host, the epothilone PKS genes (including the NRPS module) are cloned in two segments in derivatives of pCK7 (loading domain through module 6) and pKOS10-153 (modules 7 through 9).
The two plasmids are introduced into S. lividans using selection for thiostrepton and apramycin resistance. In this arrangement, the pCK7 derivative replicates autonomously, whereas the pKOS10-153 derivative is integrated into the chromosome. In both vectors, expression of the epothilone genes is from the actI promoter resident within the plasmid. To facilitate cloning, the two segments encoding the epothilone PKS (one for the loading domain through module six and one for modules seven through nine) were cloned as translational fusions with the N-terminal segment of the KS domain of module 5 of the ery PKS. High-level expression has been demonstrated from this promoter employing KS5 as the first translated sequence, see Jacobsen et al., 1998, Biochemistry 37: 4928-4934, incorporated herein by reference. A convenient BsaBI site is contained within the segment encoding the amino acid sequence EPIAV, which is highly conserved in many KS domains, including the KS-encoding regions of epoA and of module 7 in epoE.
The expression vector for the loading domain and modules one through six of the epothilone PKS was designated pKOS039-124, and the expression vector for modules seven through nine was designated pKOS039-126. Those skilled in the art will recognize that other vectors and vector components can be used to make equivalent vectors. Because the preferred expression vectors of the invention, described below and derived from pKOS039-124 and pKOS039-126, have been deposited under the terms of the Budapest Treaty, only a summary of the construction of plasmids pKOS039-124 and pKOS039-126 is provided below.
The eryKS5 linker coding sequences were cloned as a ~0.4 kb PacI-BglII restriction fragment from plasmid pKOS10-153 into pKOS039-98 to construct plasmid pKOS039-117. The coding sequences for the eryKS5 linker were linked to those for the epothilone loading domain by inserting the ~8.7 kb EcoRI-XbaI restriction fragment from cosmid pKOS35-70.1A2 into pLitmus28 digested with EcoRI-XbaI. The ~3.4 kb BsaBI-NotI and ~3.7 kb NotI-HindIII restriction fragments from the resulting plasmid were inserted into BsaBI-HindIII digested plasmid pKOS039-117 to construct plasmid pKOS039-120. The ~7 kb PacI-XbaI restriction fragment of plasmid pKOS039-120 was inserted into plasmid pKA018' to construct plasmid pKOS039-123. The final expression vector pKOS039-124 was constructed by ligating the ~34 kb XbaI-AvrII restriction fragment of cosmid pKOS35-70.1A2 with the ~21.1 kb AvrII-XbaI restriction fragment of pKOS039-123.
The plasmid expression vector pKOS039-126 was constructed as follows.
First, the coding sequences for module 7 were linked from cosmids pKOS35-70.4 and pKOS35-79.85 by cloning the ~6.9 kb BglII-NotI restriction fragment of pKOS35-70.4 and the ~5.9 kb NotI-HindIII restriction fragment of pKOS35-79.85 into plasmid pLitmus28 digested with BglII-HindIII to construct plasmid pKOS039-119. The ~12 kb NdeI-NheI restriction fragment of cosmid pKOS35-79.85 was cloned into NdeI-XbaI digested plasmid pKOS039-119 to construct plasmid pKOS039-122. To fuse the eryKS5 linker coding sequences with the coding sequences for module 7, the ~1 kb BsaBI-BglII restriction fragment derived from cosmid pKOS35-70.4 was cloned into BsaBI-BclI digested plasmid pKOS039-117 to construct plasmid pKOS039-121. The ~21.5 kb AvrII restriction fragment from plasmid pKOS039-122 was cloned into AvrII-XbaI digested plasmid pKOS039-121 to construct plasmid pKOS039-125. The ~21.8 kb PacI-EcoRI restriction fragment of plasmid pKOS039-125 was ligated with the ~9 kb PacI-EcoRI restriction fragment of plasmid pKOS039-44 to construct pKOS039-126.
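Where the text gives the sizes of both fragments joined in a ligation, the expected size of the product is their sum, which offers a quick sanity check on a construct. The Python sketch below is illustrative only: the helper name and the dictionary layout are assumptions, the fragment sizes are the approximate values quoted in the text, and the totals are therefore approximate as well.

```python
def ligation_size_kb(*fragments_kb):
    """Approximate size of a circular ligation product: the sum of its fragments."""
    return round(sum(fragments_kb), 1)


# Approximate fragment sizes (kb) as quoted in the text.
constructs = {
    "pKOS35-82.1 / pKOS35-82.2": (6.0, 0.8),   # pKOS35-77 backbone + pilA promoter
    "pKOS039-124": (34.0, 21.1),               # cosmid fragment + pKOS039-123 fragment
    "pKOS039-126": (21.8, 9.0),                # pKOS039-125 fragment + pKOS039-44 fragment
}

for name, fragments in constructs.items():
    print(f"{name}: ~{ligation_size_kb(*fragments)} kb")
```

For the pilA promoter plasmids the sum agrees with the size stated in the text (about 6.8 kb); the other two totals are derived values, not figures given in the source.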
Plasmids pKOS039-124 and pKOS039-126 were introduced into S. lividans K4-114 sequentially, employing selection for the corresponding drug resistance marker. Because plasmid pKOS039-126 does not replicate autonomously in streptomycetes, the selection is for cells in which the plasmid has integrated into the chromosome by site-specific recombination at the attB site of phiC31. Because the plasmid integrates stably, continued selection for apramycin resistance is not required. Selection can be maintained if desired. The presence of thiostrepton in the medium is maintained to ensure continued selection for plasmid pKOS039-124. Plasmids pKOS039-124 and pKOS039-126 were transformed into Streptomyces lividans K4-114, and transformants containing the plasmids were cultured and tested for production of epothilones. Initial tests did not indicate the presence of an epothilone.
To improve production of epothilones from these vectors, the eryKS5 linker sequences were replaced by epothilone PKS gene coding sequences, and the vectors were introduced into Streptomyces coelicolor CH999. To amplify the coding sequences from the epoA gene coding sequence, two oligonucleotide primers were used: N39-73, 5'-GCTTAATTAAGGAGGACACATATGCCCGTCGTGGCGGATCGTCC-3'; and N39-74, 5'-GCGGATCCTCGAATCACCGCCAATATC-3'.
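The cloning sites carried by primers such as these can be located with a simple string search. The Python sketch below is illustrative only: the function and variable names are assumptions, and the table lists just three standard recognition sequences (PacI, NdeI, BamHI); it does not model cut positions, methylation sensitivity, or the reverse strand beyond noting that these three sites are palindromic.

```python
# Recognition sequences (5'->3'); all three are palindromic, so searching
# one strand is sufficient for these enzymes.
SITES = {
    "PacI": "TTAATTAA",
    "NdeI": "CATATG",
    "BamHI": "GGATCC",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each recognition sequence."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # step by 1 to allow overlapping hits
        hits[enzyme] = positions
    return hits


# Primer sequences as given in the text.
N39_73 = "GCTTAATTAAGGAGGACACATATGCCCGTCGTGGCGGATCGTCC"
N39_74 = "GCGGATCCTCGAATCACCGCCAATATC"

print("N39-73:", find_sites(N39_73))
print("N39-74:", find_sites(N39_74))
```

In this search N39-73 carries a PacI site near its 5' end and N39-74 carries a BamHI site, consistent with the PacI and BamHI digestion of the amplification product described in the text.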
The DNA template was derived from cosmid pKOS35-70.8A3. The ~0.8 kb amplification product was digested with restriction enzymes PacI and BamHI and then ligated with the ~2.4 kb BamHI-NotI restriction fragment and the ~6.4 kb PacI-NotI restriction fragment of plasmid pKOS039-120 to construct plasmid pKOS039-136. To make the expression vector for the epoA, epoB, epoC, and epoD genes, the ~5 kb PacI-AvrII restriction fragment of plasmid pKOS039-136 was ligated with the ~50 kb PacI-AvrII restriction fragment of plasmid pKOS039-124 to construct the expression plasmid pKOS039-124R. Plasmid pKOS039-124R has been deposited with the ATCC under the terms of the Budapest Treaty and is available under the accession number.
To amplify the coding sequences from the epoE gene sequence, two oligonucleotide primers were used: N39-67A, 5'-GCTTAATTAAGGAGGACACATATGACCGACCGAGAAGGCCAGCTCCTGGA-3'; and N39-68, 5'-GGACCTAGGCGGGATGCCGGCGTCT-3'. The DNA template was derived from cosmid pKOS35-70.1A2. The ~0.4 kb amplification product was digested with restriction enzymes PacI and AvrII and ligated with either the ~29.5 kb PacI-AvrII restriction fragment of plasmid pKOS039-126 or the ~23.8 kb PacI-AvrII restriction fragment of plasmid pKOS039-125 to construct plasmid pKOS039-126R or plasmid pKOS039-125R, respectively. Plasmid pKOS039-126R was deposited with the ATCC under the terms of the Budapest Treaty and is available under the accession number.
The plasmid pair pKOS039-124R and pKOS039-126R (as well as the plasmid pair pKOS039-124 and pKOS039-126) contains the full complement of the epoA, epoB, epoC, epoD, epoE, epoF, epoK, and epoL genes. The latter two genes are present on plasmid pKOS039-126R (as well as plasmid pKOS039-126); however, to ensure that these genes were expressed at high levels, another expression vector of the invention, plasmid pKOS039-141 (Figure 8), was constructed in which the epoK and epoL genes were placed under the control of the ermE* promoter.
The epoK gene sequences were amplified by PCR using the oligonucleotide primers: N39-69, 5'-AGGCATGCATATGACCCAGGAGCAAGCGAATCAGAGTG-3'; and N39-70, 5'-CCAAGCTTTATCCAGCTTTGGAGGGCTTCAAG-3'. The epoL gene sequences were amplified by PCR using the oligonucleotide primers: N39-71A, 5'-GTAAGCTTAGGAGGACACATATGATTGCAACTCGCGCGCGGGTG-3'; and N39-72, 5'-GCCTGCAGGCTCAGGCTTGCGCAGAGCGT-3'. The DNA template for the amplification was derived from cosmid pKOS35-79.85. The PCR products were subcloned into PCR-script for sequence analysis. Then, the epoK and epoL genes were isolated from the clones as NdeI-HindIII and HindIII-EcoRI restriction fragments, respectively, and ligated with the ~6 kb NdeI-EcoRI restriction fragment of plasmid pKOS039-134B, which contains the ermE* promoter, to construct plasmid pKOS039-140. The ~2.4 kb NheI-PstI restriction fragment of plasmid pKOS039-140 was cloned into XbaI-PstI digested plasmid pSAM-Hyg, a pSAM2 derivative containing a gene that confers hygromycin resistance, to construct plasmid pKOS039-141.
Another variant of plasmid pKOS039-126R was constructed to provide the epoE and epoF genes on an expression vector without the epoK and epoL genes. This plasmid, pKOS045-12 (Figure 9), was constructed as follows. Plasmid pXH106 (described in J. Bact., 1991, 173: 5573-5577, incorporated herein by reference) was digested with restriction enzymes StuI and BamHI, and the ~2.8 kb restriction fragment containing the xylE gene and the gene conferring hygromycin resistance was isolated and cloned into plasmid pLitmus28 digested with EcoRV-BglII.
The ~2.8 kb NcoI-AvrII restriction fragment of the resulting plasmid was ligated to the ~18 kb PacI-BspHI restriction fragment of plasmid pKOS039-125R and the ~9 kb SpeI-PacI restriction fragment of plasmid pKOS039-42 to construct plasmid pKOS045-12. To construct an expression vector that comprised only the epoL gene, plasmid pKOS039-141 was partially digested with restriction enzyme NdeI, the ~9 kb NdeI restriction fragment was isolated, and the fragment was then circularized by ligation to yield plasmid pKOS045-150.
The various expression vectors described above were then transformed into Streptomyces coelicolor CH999 and S. lividans K4-114 in a variety of combinations, and the transformed host cells were fermented on plates and in liquid culture (R5 medium, which is identical to R2YE medium without agar). Typical fermentation conditions follow. First, a seed culture of about 5 mL containing 50 μg/L thiostrepton was inoculated and grown at 30 °C for two days. Then, about 1 to 2 mL of the seed culture was used to inoculate a production culture of about 50 mL containing 50 μg/L thiostrepton and 1 mM cysteine, and the production culture was grown at 30 °C for 5 days. The seed culture was also used to prepare plates of cells (the plates contained the same medium as the production culture with 10 mM propionate), which were grown at 30 °C for nine days.
Certain of the Streptomyces coelicolor cultures and culture broths were analyzed for production of epothilones. The liquid cultures were extracted three times with equal volumes of ethyl acetate, the organic extracts were combined and evaporated, and the residue was dissolved in acetonitrile for LC/MS analysis.
The agar plate medium was chopped and extracted twice with equal volumes of acetone, and the acetone extracts were combined and evaporated to an aqueous slurry, which was extracted three times with equal volumes of ethyl acetate. The organic extracts were combined and evaporated, and the residue was dissolved in acetonitrile for LC/MS analysis.
Production of epothilones was assessed using LC-mass spectrometry. The output flow from the UV detector of an analytical HPLC was split equally between a Perkin-Elmer/Sciex API100LC mass spectrometer and an Alltech 500 evaporative light scattering detector. Samples were injected onto a 4.6 x 150 mm reversed phase HPLC column (MetaChem 5 μm ODS-3 Inertsil) equilibrated in water with a flow rate of 1.0 mL/min. UV detection was set at 250 nm. Sample components were separated using H2O for 1 minute, then a linear gradient from 0 to 100% acetonitrile over 10 minutes. Under these conditions, epothilone A elutes at 10.2 minutes and epothilone B elutes at 10.5 minutes. The identity of these compounds was confirmed by the mass spectra obtained using an atmospheric pressure chemical ionization source with orifice and ring voltages set at 75 V and 300 V, respectively, and a mass resolution of 0.1 amu. Under these conditions, epothilone A shows [M+H]+ at 494.4 amu, with observed fragments at 476.4, 318.3, and 306.4 amu. Epothilone B shows [M+H]+ at 508.4 amu, with observed fragments at 490.4, 320.3, and 302.4 amu.
Transformants containing the vector pairs pKOS039-124R and pKOS039-126R or pKOS039-124 and pKOS039-126R produced detectable amounts of epothilones A and B. Transformants containing these plasmid pairs and the additional plasmid pKOS039-141 produced similar amounts of epothilones A and B, indicating that the additional copies of the epoK and epoL genes were not required for production under the test conditions employed.
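The LC/MS figures reported above lend themselves to two simple checks: the mass differences between each [M+H]+ ion and its fragments, and the programmed mobile-phase composition at a given run time. The Python sketch below is illustrative only: the function names and the dictionary layout are assumptions, the masses are those quoted in the text, and the gradient function returns the nominal programmed composition without any correction for system dwell volume.

```python
# Masses (amu) as quoted in the text.
observed = {
    "epothilone A": {"mh": 494.4, "fragments": [476.4, 318.3, 306.4]},
    "epothilone B": {"mh": 508.4, "fragments": [490.4, 320.3, 302.4]},
}


def neutral_losses(mh, fragments):
    """Mass difference between the [M+H]+ ion and each fragment, to 0.1 amu."""
    return [round(mh - f, 1) for f in fragments]


def percent_acetonitrile(t_min, hold_min=1.0, ramp_min=10.0):
    """Nominal % acetonitrile: water hold, then a linear 0-100% ramp."""
    if t_min <= hold_min:
        return 0.0
    if t_min >= hold_min + ramp_min:
        return 100.0
    return (t_min - hold_min) / ramp_min * 100.0


for name, data in observed.items():
    print(name, neutral_losses(data["mh"], data["fragments"]))

# Difference between the two parent ions.
print(round(observed["epothilone B"]["mh"] - observed["epothilone A"]["mh"], 1))

# Nominal composition at the two reported retention times.
for rt in (10.2, 10.5):
    print(rt, round(percent_acetonitrile(rt), 1))
```

The first fragment of each compound lies 18.0 amu below its parent ion, which would be consistent with loss of water, and the two parent ions differ by 14.0 amu, consistent with the additional methyl group that distinguishes epothilone B (R = CH3) from epothilone A (R = H).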
Thus, these transformants produced epothilones A and B when the recombinant epoA, epoB, epoC, epoD, epoE, epoF, epoK, and epoL genes were present. In some cultures, the absence of propionate was found to increase the ratio of epothilone B to epothilone A.
Transformants containing the plasmid pair pKOS039-124R and pKOS045-12 produced epothilones C and D, as did transformants containing this plasmid pair and the additional plasmid pKOS039-150. These results showed that the epoL gene is not required under the test conditions employed to form the C-12-C-13 double bond. These results indicate that either the epothilone PKS gene alone is capable of forming the double bond or that Streptomyces coelicolor expresses a gene product capable of converting epothilones G and H to epothilones C and D. Thus, these transformants produced epothilones C and D when the recombinant epoA, epoB, epoC, epoD, epoE, and epoF genes were present.
The heterologous expression of the epothilone PKS described herein is believed to represent the recombinant expression of the largest proteins and active enzyme complex that have been expressed in recombinant host cells. The epothilone-producing Streptomyces coelicolor transformants exhibited growth characteristics indicating that either the epothilone PKS genes, or their products, or the epothilones inhibited cell growth or were somewhat toxic to the cells. Any such inhibition or toxicity could be due to accumulation of the epothilones in the cell, and it is believed that the native Sorangium producer cells may contain transporter proteins that in effect pump epothilones out of the cell. Such transporter genes are believed to be included among the ORFs located downstream of the epoK gene and described above. Thus, the present invention provides Streptomyces and other host cells that include recombinant genes encoding the products of one or more, including all, of the ORFs in this region.
For example, each ORF can be cloned behind the ermE* promoter, see Stassi et al., 1998, Appl. Microbiol. Biotechnol. 49: 725-731, incorporated herein by reference, in a pSAM2-based plasmid that can integrate into the chromosome of Streptomyces coelicolor and S. lividans at a site distinct from the attB site of phage phiC31, see Smokvina et al., 1990, Gene 94: 53-59, incorporated herein by reference. A pSAM2-based vector carrying the gene for hygromycin resistance is modified to carry the ermE* promoter along with additional cloning sites. Each downstream ORF is cloned by PCR into the vector, which is then introduced into the host cell (also containing pKOS039-124R and pKOS039-126R or other expression vectors of the invention) employing hygromycin selection. Clones carrying each individual gene downstream of epoK are analyzed for increased production of epothilones.
Additional fermentation and strain improvement efforts can be conducted as illustrated by the following. The levels of expression of the epothilone PKS genes in the various constructs can be measured by assaying the levels of the corresponding mRNAs (by quantitative RT-PCR) relative to the levels of other heterologous PKS mRNAs (e.g., picromycin) produced from genes cloned in similar expression vectors in the same host. If one of the epothilone transcripts is underproduced, experiments to enhance its production are conducted by cloning the corresponding DNA segments in a different expression vector. For example, multiple copies of any one or more of the epothilone genes can be introduced into a cell if one or more gene products are rate limiting for biosynthesis. If the basis for low-level production is not related to low-level expression of the PKS genes (at the RNA level), an empirical mutagenesis and screening approach is undertaken, which is the basis of yield improvement for every commercially important fermentation product.
The spores are exposed to UV, X-ray, or chemical mutagens, and the individual survivors are plated, picked, and tested for the level of compound produced in small-scale fermentations. Although this process can be automated, several thousand isolates can also be examined for quantifiable epothilone production using the susceptible fungus Mucor hiemalis as the test organism. Another method for increasing the yield of epothilones produced is to change the KSY domain of the loading domain of the epothilone PKS to a KSQ domain. Such altered loading domains can be constructed in any of a variety of ways, but an illustrative method follows. Plasmid pKOS039-124R of the invention can conveniently be used as the starting material. To amplify the DNA fragments useful in the construction, four oligonucleotide primers are employed: N39-83: 5'-CCGGTATCCACCGCGGACACACGGC-3'; N39-84: 5'-GCCAGTCGTCCTCGCTCGTGGCCGTTC-3'; and N39-73 and N39-74, which have been described above. The PCR fragment generated with N39-73 and N39-83 and the PCR fragment generated with N39-74 and N39-84 are treated with the restriction enzymes PacI and BamHI, respectively, and ligated with the ~3.1 kb PacI-BamHI restriction fragment of plasmid pKOS39-120 to construct plasmid pKOS039-148. The ~0.8 kb PacI-BamHI restriction fragment of plasmid pKOS039-148 (comprising the two PCR amplification products) is ligated with the ~2.4 kb BamHI-NotI restriction fragment of plasmid pKOS39-120 and with the ~6.4 kb PacI-NotI restriction fragment of plasmid pKOS39-120 to construct pKOS39-136Q. The ~5 kb PacI-AvrII restriction fragment of plasmid pKOS039-136Q is ligated to the ~50 kb PacI-AvrII restriction fragment of plasmid pKOS039-124 to construct plasmid pKOS39-124Q. Plasmids pKOS039-124Q and pKOS039-126R are then transformed into Streptomyces coelicolor CH999 for epothilone production.
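As a bookkeeping aid for the construction just described, the following minimal sketch tallies the lengths and GC content of the two primers whose sequences are given and adds up the approximate fragment sizes joined in each ligation. The helper names are hypothetical, and the fragment sizes are only the approximate values stated in the text, so the totals are rough expectations rather than measured plasmid sizes.

```python
# Sketch only: primer sequences and approximate fragment sizes are copied from
# the construction described above; all helper names are hypothetical.

PRIMERS = {
    "N39-83": "CCGGTATCCACCGCGGACACACGGC",
    "N39-84": "GCCAGTCGTCCTCGCTCGTGGCCGTTC",
}

# Approximate sizes (bp) of the restriction fragments joined in each ligation.
LIGATIONS = {
    "pKOS39-136Q": [800, 2400, 6400],
    "pKOS39-124Q": [5000, 50000],
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_count(seq: str) -> int:
    """Number of G or C bases in a DNA sequence."""
    return sum(1 for base in seq.upper() if base in "GC")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def expected_size_bp(fragments) -> int:
    """Expected size of a ligation product as the sum of its fragments."""
    return sum(fragments)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        gc_fraction = gc_count(seq) / len(seq)
        print(f"{name}: {len(seq)} nt, GC {gc_fraction:.0%}")
    for plasmid, fragments in LIGATIONS.items():
        print(f"{plasmid}: ~{expected_size_bp(fragments) / 1000:.1f} kb")
```

A restriction digest of the finished plasmid that departs markedly from these totals would suggest a missing or duplicated fragment.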
Cloning and expression of the epoA through epoF genes, optionally with epoK or with epoK plus epoL, is sufficient for the synthesis of epothilone compounds, and the distribution of the C-12 H to C-12 methyl congeners appears to be similar to that seen in the natural host (A:B :: 2:1). This ratio reflects the fact that the AT domain of module 4 more closely resembles the malonyl-specifying than the methylmalonyl-specifying AT consensus domains. Thus, epothilones D and B are produced in lower amounts than their C-12 unmethylated counterparts C and A. The invention provides PKS genes that produce epothilone D and/or B exclusively. Specifically, methylmalonyl CoA-specifying AT domains from a number of sources (for example, the narbonolide PKS, the rapamycin PKS, and others listed above) can be used to replace the naturally occurring AT domain in module 4. The exchange is performed by direct cloning of the incoming DNA into the appropriate site in the DNA segment encoding the epothilone PKS or by gene replacement through homologous recombination. For gene replacement through homologous recombination, the donor sequence to be exchanged is placed in a delivery vector between segments of at least 1 kb in length that flank the DNA encoding the AT domain of epo module 4. Crossovers in the homologous regions result in exchange of the epo AT4 domain for the one on the delivery vector. Because pKOS039-124 and pKOS039-124R contain the AT4 coding sequences, they can be used as the host DNA for the replacement. The adjacent DNA segments are cloned into one of a number of E. coli plasmids that are temperature-sensitive for replication. The heterologous AT domains can be cloned into these plasmids in the correct orientation between the homologous regions as cassettes, which makes it possible to carry out several AT exchanges simultaneously.
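The gene replacement logic described above can be sketched in a few lines of string handling. This is a simplified model under stated assumptions, not a description of the actual constructs: the coordinates, the toy sequences, and the function names are all hypothetical, and a real design would also have to preserve the reading frame at both junctions.

```python
# Simplified model of an AT-domain swap by double crossover. All names,
# coordinates, and sequences are hypothetical illustrations.

MIN_ARM_BP = 1000  # the text calls for flanking segments of at least 1 kb


def build_delivery_insert(host: str, at_start: int, at_end: int,
                          donor_at: str, arm_len: int = MIN_ARM_BP) -> str:
    """Return left arm + donor AT domain + right arm for the delivery vector.

    host[at_start:at_end] is the resident AT-encoding segment to be replaced.
    """
    if MIN_ARM_BP > arm_len:
        raise ValueError("homology arms should be at least 1 kb")
    if arm_len > at_start or at_end + arm_len > len(host):
        raise ValueError("host sequence too short for the requested arms")
    left_arm = host[at_start - arm_len:at_start]
    right_arm = host[at_end:at_end + arm_len]
    return left_arm + donor_at + right_arm


def double_crossover(host: str, at_start: int, at_end: int,
                     donor_at: str) -> str:
    """Host sequence after one crossover in each arm exchanges the AT domain."""
    return host[:at_start] + donor_at + host[at_end:]


if __name__ == "__main__":
    # Toy host: 1.2 kb flank, 300 bp resident AT segment, 1.2 kb flank.
    host = "A" * 1200 + "G" * 300 + "C" * 1200
    donor = "T" * 330
    insert = build_delivery_insert(host, 1200, 1500, donor)
    swapped = double_crossover(host, 1200, 1500, donor)
    print(len(insert), len(swapped))
```

Because the arms are copied directly from the host sequence, the recombinant differs from the host only within the exchanged AT segment.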
The reconstructed plasmid (pKOS039-124* or pKOS039-124R*) is tested for its ability to direct the synthesis of epothilone D and/or B by introducing it, together with pKOS039-126 or pKOS039-126R, into Streptomyces coelicolor and/or S. lividans. Because polyketide titers can vary among strains carrying the different gene replacements, the invention provides a number of heterologous methylmalonyl CoA-specifying AT domains to ensure production of epothilone D at titers equivalent to those of the mixture of epothilones C and D produced in the Streptomyces coelicolor host described above. In addition, longer segments of the donor genes can be used for the replacements, including, in addition to the AT domain, adjacent 5' and 3' sequences corresponding to an entire module. If an entire module is used for the replacement, the DNA segments encoding the KS, methylmalonyl AT, DH, KR, and ACP can be obtained from, for example and without limitation, the DNA encoding the tenth module of the rapamycin PKS or the first or fifth module of the FK-520 PKS.
EXAMPLE 5 Heterologous expression of EpoK and conversion of epothilone D to epothilone B This example describes the construction of E. coli expression vectors for epoK. The epoK gene product was expressed in E. coli as a fusion protein with a polyhistidine tag (his tag). The fusion protein was purified and used to convert epothilone D to epothilone B. Plasmids were constructed to encode fusion proteins composed of six histidine residues fused to either the amino or the carboxyl terminus of EpoK. The following oligonucleotides were used to construct the plasmids: 55-101.a: 5'-AAAAACATATGCACCACCACCACCACCACATGACACAGGAGCAAGCGAATCAGAGTGAG-3'; 55-101.b: 5'-AAAAAGGATCCTTAATCCAGCTTTGGAGGGCTT-3'; 55-101.c: 5'-AAAAACATATGACACAGGAGCAAGCGAAT-3'; and 55-101.d: 5'-AAAAAGGATCCTTAGTGGTGGTGGTGGTGGTGTCCAGCTTTGGAGGGCTTCAAGATGAC-3'.
The plasmid encoding the amino-terminal his-tagged fusion protein, pKOS55-121, was constructed using primers 55-101.a and 55-101.b, and the one encoding the carboxyl-terminal his tag, pKOS55-129, was constructed using primers 55-101.c and 55-101.d, in PCR reactions containing pKOS35-83.5 as the template DNA. Plasmid pKOS35-83.5 contains the ~5 kb NotI fragment comprising the epoK gene ligated into pBluescriptSKII+ (Stratagene). The PCR products were digested with the restriction enzymes BamHI and NdeI and ligated into the BamHI and NdeI sites of pET22b (Invitrogen). Both plasmids were sequenced to verify that no mutations had been introduced during PCR amplification. Protein gels were run as known in the art. The purification of EpoK was carried out as follows. Plasmids pKOS55-121 and pKOS55-129 were transformed into BL21(DE3) containing the groELS-expressing plasmid pREP4-groELS (Caspers et al., 1994, Cellular and Molecular Biology 40(5): 635-644). The strains were inoculated into 250 mL of M9 medium supplemented with 2 mM MgSO4, 1% glucose, 20 mg of thiamin, 5 mg of FeCl2, 4 mg of CaCl2, and 50 mg of levulinic acid. The cultures were grown to an OD600 between 0.4 and 0.6, at which point IPTG was added to 1 mM, and the cultures were allowed to grow for an additional two hours. The cells were harvested and frozen at -80°C. The frozen cells were resuspended in 10 mL of buffer 1 (5 mM imidazole, 500 mM NaCl, and 45 mM Tris pH 7.6) and lysed by sonication three times for 15 seconds each at setting 8. Cell debris was pelleted by centrifugation in an SS-34 rotor at 16,000 rpm for 30 minutes. The supernatant was removed and centrifuged again at 16,000 rpm for 30 minutes. The supernatant was loaded onto a 5 mL nickel column (Novagen), after which the column was washed with 50 mL of buffer 1 (Novagen). EpoK was eluted with a gradient from 5 mM to 1 M imidazole.
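For preparing the resuspension and wash buffer described above (5 mM imidazole, 500 mM NaCl, 45 mM Tris), the following minimal sketch converts the stated concentrations into masses for a chosen batch volume. The 1 L batch size, the use of Tris base (with the pH adjusted separately), and the rounded molar masses are assumptions made for illustration; none of them is specified in the text.

```python
# Sketch: masses of the solid components for a batch of the lysis/wash buffer.
# Batch volume, use of Tris base, and the rounded molar masses are assumptions.

MOLAR_MASS_G_PER_MOL = {
    "imidazole": 68.08,
    "NaCl": 58.44,
    "Tris base": 121.14,
}

BUFFER_MILLIMOLAR = {
    "imidazole": 5.0,
    "NaCl": 500.0,
    "Tris base": 45.0,
}


def grams_needed(conc_millimolar: float, molar_mass: float,
                 volume_liters: float) -> float:
    """Mass in grams giving the requested concentration in the given volume."""
    return conc_millimolar / 1000.0 * molar_mass * volume_liters


def recipe(volume_liters: float) -> dict:
    """Grams of each component for one batch of buffer."""
    return {
        name: grams_needed(conc, MOLAR_MASS_G_PER_MOL[name], volume_liters)
        for name, conc in BUFFER_MILLIMOLAR.items()
    }


if __name__ == "__main__":
    for name, grams in recipe(1.0).items():
        print(f"{name}: {grams:.2f} g per liter")
```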
Fractions containing EpoK were pooled and dialyzed twice against 1 L of dialysis buffer (45 mM Tris pH 7.6, 0.2 mM DTT, 0.1 mM EDTA, and 20% glycerol). Aliquots were frozen in liquid nitrogen and stored at -80°C. The protein preparations were more than 90% pure. The EpoK assay was carried out as follows (see Betlach et al., Biochem (1998) 37: 14937, incorporated herein by reference). Briefly, the reactions consisted of 50 mM Tris (pH 7.5), 21 μl of spinach ferredoxin, 0.132 units of spinach ferredoxin:NADP+ oxidoreductase, 0.8 units of glucose-6-phosphate dehydrogenase, 1.4 mM NADP, 7.1 mM glucose-6-phosphate, 100 μM or 200 μM epothilone D (a generous gift of S. Danishefsky), and 1.7 μM amino-terminal his-tagged EpoK or 1.6 μM carboxyl-terminal his-tagged EpoK in a volume of 100 μL. The reactions were incubated at 30°C for 67 minutes and stopped by heating at 90°C for 2 minutes. The insoluble material was removed by centrifugation, and 50 μL of the supernatant was analyzed by LC/MS. HPLC conditions: Metachem 5 μ ODS-3 Inertsil (4.6 X 150 mm); 80% H2O for 1 minute, then to 100% MeCN over 10 minutes at 1 mL/min, with UV (λmax = 250 nm), ELSD, and MS detection. Under these conditions, epothilone D eluted at 16.6 minutes and epothilone B at 9.3 minutes. The LC/MS spectra were obtained using an atmospheric pressure chemical ionization source with orifice and ring voltages set at 20 V and 250 V, respectively, at a mass resolution of 1 amu. Under these conditions, epothilone D shows an [M+H] at m/z 493, with fragments observed at 405 and 304. Epothilone B shows an [M+H] at m/z 509, with fragments observed at 491 and 320. Reactions containing EpoK and epothilone D contained a compound, absent in the control, that exhibited the same retention time, molecular weight, and mass fragmentation pattern as pure epothilone B.
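The reported ions can be cross-checked with simple differences. The sketch below uses only the m/z values given above and shows that the two [M+H] ions, and one pair of fragments, differ by 16 mass units, which is what the addition of a single oxygen atom in an epoxidation would be expected to produce. The dictionary layout and function names are hypothetical, and no assignment of the fragments is implied.

```python
# Sketch: nominal mass differences among the ions reported in the EpoK assay.
# Only m/z values stated in the text are used; names are hypothetical.

REPORTED_IONS = {
    "epothilone D": {"MH": 493, "fragments": [405, 304]},
    "epothilone B": {"MH": 509, "fragments": [491, 320]},
}

NOMINAL_MASS_OXYGEN = 16


def mh_shift(substrate: str, product: str) -> int:
    """Difference between the [M+H] ions of product and substrate."""
    return REPORTED_IONS[product]["MH"] - REPORTED_IONS[substrate]["MH"]


def neutral_losses(compound: str) -> list:
    """Mass lost from [M+H] to reach each reported fragment."""
    entry = REPORTED_IONS[compound]
    return [entry["MH"] - fragment for fragment in entry["fragments"]]


if __name__ == "__main__":
    shift = mh_shift("epothilone D", "epothilone B")
    print("D to B shift:", shift, "matches O:", shift == NOMINAL_MASS_OXYGEN)
    for name in REPORTED_IONS:
        print(name, "neutral losses:", neutral_losses(name))
```

Both compounds show a fragment 189 units below the parent ion (304 and 320), so that fragment pair carries the same 16-unit shift as the intact molecules.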
With an epothilone D concentration of 100 μM, the amino-terminal and carboxyl-terminal his-tagged EpoK converted 82% and 58% of it to epothilone B, respectively. At 200 μM, the conversion was 44% and 21%, respectively. These results demonstrate that EpoK can convert epothilone D to epothilone B.
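To put the percent conversions on an absolute scale, the following sketch computes the concentration of epothilone B formed in each reaction and the amount of product per enzyme molecule, using the enzyme concentrations given in the assay description (1.7 μM and 1.6 μM). The function names are hypothetical, and the calculation assumes the stated conversions apply to the full 67 minute incubation.

```python
# Sketch: absolute product formed in the EpoK conversions reported above.
# Function names are hypothetical; input values are those stated in the text.

# (tag position, enzyme uM, epothilone D uM, percent converted)
RESULTS = [
    ("amino-terminal", 1.7, 100, 82),
    ("carboxyl-terminal", 1.6, 100, 58),
    ("amino-terminal", 1.7, 200, 44),
    ("carboxyl-terminal", 1.6, 200, 21),
]


def product_uM(substrate_uM: float, conversion_pct: float) -> float:
    """Concentration of epothilone B formed."""
    return substrate_uM * conversion_pct / 100


def product_per_enzyme(substrate_uM: float, conversion_pct: float,
                       enzyme_uM: float) -> float:
    """Molecules of product formed per molecule of enzyme."""
    return product_uM(substrate_uM, conversion_pct) / enzyme_uM


if __name__ == "__main__":
    for tag, enzyme, substrate, pct in RESULTS:
        formed = product_uM(substrate, pct)
        ratio = product_per_enzyme(substrate, pct, enzyme)
        print(f"{tag}, {substrate} uM D: {formed:.0f} uM B, "
              f"{ratio:.1f} per enzyme")
```

On this basis, the amino-terminal tagged enzyme formed a similar absolute amount of epothilone B at both substrate concentrations (82 μM and 88 μM), even though the percent conversion roughly halved.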
EXAMPLE 6 Modified epothilones from chemobiosynthesis This example describes a series of thioesters provided by the invention for the production of epothilone derivatives via chemobiosynthesis.
The DNA sequence of the epothilone biosynthetic gene cluster from Sorangium cellulosum indicates that priming of the PKS involves a mixture of polyketide and amino acid components. Priming involves loading of the PKS-like portion of the loading domain with malonyl CoA followed by decarboxylation, and loading of the module one NRPS with cysteine, followed by condensation to form enzyme-bound N-acetylcysteine. Cyclization to form a thiazole is followed by oxidation to form enzyme-bound 2-methylthiazole-4-carboxylate, the product of the loading domain and NRPS. Subsequent condensation with methylmalonyl CoA by the ketosynthase of module 2 provides the substrate for module 3, as shown in the following diagram.
[Diagram: module 2 substrate + MM-CoA → module 3 substrate → epothilone] The present invention provides methods and reagents for chemobiosynthesis to produce epothilone derivatives in a manner similar to that described for making 6-dEB and erythromycin analogs in PCT Patent Publication Nos. 99/03986 and 97/02358. Two types of feeding substrates are provided: analogs of the NRPS product, and analogs of the module 3 substrate. The module 2 substrates are used with PKS enzymes having a mutated NRPS-like domain, and the module 3 substrates are used with PKS enzymes having a mutated KS domain in module 2. The module 2 substrates (as N-acetylcysteamine thioesters) are illustrated below for use as substrates for an epothilone PKS with an inactivated modified NRPS: The module 2 substrates are prepared by activation of the corresponding carboxylic acid and treatment with N-acetylcysteamine. Activation methods include formation of the acid chloride, formation of a mixed anhydride, or reaction with a condensing reagent such as a carbodiimide.
Exemplary module 3 substrates, likewise as NAc thioesters, for use as substrates for an epothilone PKS with a KS2 knockout are: These compounds are prepared in a three-step process. First, the appropriate aldehyde is treated with a Wittig reagent or an equivalent to form the substituted acrylic ester. The ester is saponified to the acid, which is then activated and treated with N-acetylcysteamine.
Illustrative reaction schemes for making the module 2 and module 3 substrates are provided below. Additional compounds suitable for making starting materials for polyketide synthesis by the epothilone PKS are shown in Figure 2 as carboxylic acids (or aldehydes that can be converted to carboxylic acids), which are converted to the N-acetylcysteamine thioesters for feeding to the host cells of the invention.
A. Thiophene-3-carboxylate N-acetylcysteamine thioester A solution of thiophene-3-carboxylic acid (128 mg) in 2 mL of dry tetrahydrofuran under inert atmosphere was treated with triethylamine (0.25 mL) and diphenylphosphoryl azide (0.50 mL). After 1 hour, N-acetylcysteamine (0.25 mL) was added, and the reaction was allowed to proceed for 12 hours. The mixture was poured into water and extracted three times with equal volumes of ethyl acetate. The organic extracts were combined, washed sequentially with water, 1 N HCl, saturated CuSO4, and brine, and then dried over MgSO4, filtered, and concentrated under vacuum. Chromatography on SiO2 using ether followed by ethyl acetate yielded the pure product, which crystallized on standing.
B. Furan-3-carboxylate N-acetylcysteamine thioester A solution of furan-3-carboxylic acid (112 mg) in 2 mL of dry tetrahydrofuran under inert atmosphere was treated with triethylamine (0.25 mL) and diphenylphosphoryl azide (0.50 mL). After 1 hour, N-acetylcysteamine (0.25 mL) was added, and the reaction was allowed to proceed for 12 hours. The mixture was poured into water and extracted three times with equal volumes of ethyl acetate. The organic extracts were combined, washed sequentially with water, 1 N HCl, saturated CuSO4, and brine, and then dried over MgSO4, filtered, and concentrated under vacuum. Chromatography on SiO2 using ether followed by ethyl acetate yielded the pure product, which crystallized on standing.
C. Pyrrole-2-carboxylate N-acetylcysteamine thioester A solution of pyrrole-2-carboxylic acid (112 mg) in 2 mL of dry tetrahydrofuran under inert atmosphere was treated with triethylamine (0.25 mL) and diphenylphosphoryl azide (0.50 mL). After 1 hour, N-acetylcysteamine (0.25 mL) was added, and the reaction was allowed to proceed for 12 hours. The mixture was poured into water and extracted three times with equal volumes of ethyl acetate. The organic extracts were combined, washed sequentially with water, 1 N HCl, saturated CuSO4, and brine, and then dried over MgSO4, filtered, and concentrated under vacuum. Chromatography on SiO2 using ether followed by ethyl acetate yielded the pure product, which crystallized on standing.
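The three procedures above use slightly different masses of acid (128 mg, 112 mg, and 112 mg). A short calculation, sketched below, indicates that each corresponds to roughly 1 mmol, so the three preparations run on the same molar scale. The molecular formulas and rounded atomic masses are supplied here as assumptions for the check and do not appear in the text.

```python
# Sketch: millimoles of each heteroaromatic acid used in procedures A to C.
# Molecular formulas and rounded atomic masses are assumptions for this check.

ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# name: (elemental composition, mass used in mg as stated in the text)
ACIDS = {
    "thiophene-3-carboxylic acid": ({"C": 5, "H": 4, "O": 2, "S": 1}, 128),
    "furan-3-carboxylic acid": ({"C": 5, "H": 4, "O": 3}, 112),
    "pyrrole-2-carboxylic acid": ({"C": 5, "H": 5, "N": 1, "O": 2}, 112),
}


def molar_mass(formula: dict) -> float:
    """Molar mass in g/mol from an element-count dictionary."""
    return sum(ATOMIC_MASS[element] * count
               for element, count in formula.items())


def millimoles(mass_mg: float, formula: dict) -> float:
    """Amount in mmol for a mass given in mg (mg divided by g/mol)."""
    return mass_mg / molar_mass(formula)


if __name__ == "__main__":
    for name, (formula, mass_mg) in ACIDS.items():
        print(f"{name}: {molar_mass(formula):.2f} g/mol, "
              f"{millimoles(mass_mg, formula):.2f} mmol")
```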
D. 2-Methyl-3-(3-thienyl)acrylate N-acetylcysteamine thioester (1) Ethyl 2-methyl-3-(3-thienyl)acrylate: A mixture of thiophene-3-carboxaldehyde (1.12 g) and (carbethoxyethylidene)triphenylphosphorane (4.3 g) in dry tetrahydrofuran (20 mL) was heated at reflux for 16 hours. The mixture was cooled to room temperature and concentrated to dryness under vacuum. The solid residue was suspended in 1:1 ether/hexane and filtered to remove the triphenylphosphine oxide. The filtrate was filtered through a pad of SiO2 using 1:1 ether/hexane to provide the product (1.78 g, 91%) as a pale yellow oil. (2) 2-Methyl-3-(3-thienyl)acrylic acid: The ester from (1) was dissolved in a mixture of methanol (5 mL) and 8 N KOH (5 mL) and heated at reflux for 30 minutes. The mixture was cooled to room temperature, diluted with water, and washed twice with ether. The aqueous phase was acidified using 1 N HCl and then extracted 3 times with equal volumes of ether. The organic extracts were combined, dried over MgSO4, filtered, and concentrated to dryness under vacuum. Crystallization from 2:1 hexane/ether provided the product as colorless needles. (3) 2-Methyl-3-(3-thienyl)acrylate N-acetylcysteamine thioester: A solution of 2-methyl-3-(3-thienyl)acrylic acid (168 mg) in 2 mL of dry tetrahydrofuran under inert atmosphere was treated with triethylamine (0.56 mL) and diphenylphosphoryl azide (0.45 mL). After 15 minutes, N-acetylcysteamine (0.15 mL) was added, and the reaction was allowed to proceed for 4 hours. The mixture was poured into water and extracted three times with equal volumes of ethyl acetate. The organic extracts were combined, washed sequentially with water, 1 N HCl, saturated CuSO4, and brine, and then dried over MgSO4, filtered, and concentrated under vacuum. Chromatography on SiO2 using ethyl acetate yielded the pure product, which crystallized on standing.
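The 91% yield reported for step (1) can be reproduced from the stated masses. The sketch below assumes the molecular formulas shown (they are not given in the text) and rounded atomic masses; it indicates that the aldehyde is the limiting reagent, with the phosphorane in roughly 1.2-fold excess.

```python
# Sketch: yield check for the Wittig step (1) of procedure D.
# Molecular formulas and rounded atomic masses are assumptions for this check.

ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974, "S": 32.06}

ALDEHYDE = {"C": 5, "H": 4, "O": 1, "S": 1}   # thiophene-3-carboxaldehyde
YLIDE = {"C": 23, "H": 23, "O": 2, "P": 1}    # (carbethoxyethylidene)triphenylphosphorane
ESTER = {"C": 10, "H": 12, "O": 2, "S": 1}    # ethyl 2-methyl-3-(3-thienyl)acrylate


def molar_mass(formula: dict) -> float:
    """Molar mass in g/mol from an element-count dictionary."""
    return sum(ATOMIC_MASS[element] * count
               for element, count in formula.items())


def mmol(mass_g: float, formula: dict) -> float:
    """Amount in mmol for a mass given in grams."""
    return 1000 * mass_g / molar_mass(formula)


aldehyde_mmol = mmol(1.12, ALDEHYDE)
ylide_mmol = mmol(4.3, YLIDE)
ester_mmol = mmol(1.78, ESTER)

# Percent yield relative to whichever reagent is limiting.
yield_pct = 100 * ester_mmol / min(aldehyde_mmol, ylide_mmol)

if __name__ == "__main__":
    print(f"aldehyde {aldehyde_mmol:.2f} mmol, ylide {ylide_mmol:.2f} mmol")
    print(f"ester {ester_mmol:.2f} mmol, yield {yield_pct:.0f}%")
```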
The above compounds were added to the host cell cultures containing a recombinant epothilone PKS of the invention in which either NRPS or the KS domain of module 2 as appropriate has been inactivated by mutation to prepare the corresponding epothilone derivative of the invention.
EXAMPLE 7 Production of epothilones and epothilone derivatives in Sorangium cellulosum SMP44 The present invention provides a variety of Sorangium cellulosum host cells that produce less complex mixtures of epothilones than the naturally occurring epothilone producers, as well as host cells that produce epothilone derivatives. This example illustrates the construction of such strains by describing how to make a strain that produces only epothilones C and D, without epothilones A and B. To construct this strain, an inactivating mutation is made in epoK. Using plasmid pKOS35-83.5, which contains a NotI fragment harboring the epoK gene, the kanamycin and bleomycin resistance markers from Tn5 are ligated into the ScaI site of the epoK gene to construct pKOS90-55. The orientation of the resistance markers is such that transcription initiated at the kanamycin promoter drives expression of the genes immediately 3' of epoK. In other words, the mutation should be nonpolar. Next, the origin of conjugative transfer, oriT, from RP4 is ligated into pKOS90-55 to create pKOS90-63. This plasmid can be introduced into S17-1 and conjugated into SMP44. The transconjugants are selected on phleomycin plates as previously described. Alternatively, electroporation of the plasmid can be achieved using the conditions described above for Myxococcus xanthus. Because there are three generalized transducing phages for Myxococcus xanthus, DNA can also be transferred from M. xanthus to SMP44. First, the epoK mutation is constructed in M. xanthus by linearizing plasmid pKOS90-55 and electroporating it into M. xanthus. Kanamycin-resistant colonies are selected; these carry a gene replacement of epoK. These strains are infected with the phages Mx9, Mx8, and Mx4 ts18 hft to make phage lysates. The lysates are then used individually to infect SMP44, and phleomycin-resistant colonies are selected.
Once the strain is constructed, standard fermentation procedures, as described below, are employed to produce epothilones C and D. Prepare a fresh plate of Sorangium host cells (dispersed) on S42 medium. S42 medium contains tryptone, 0.5 g/L; MgSO4, 1.5 g/L; HEPES, 12 g/L; and agar, 12 g/L, in deionized water. The pH of the S42 medium is adjusted to 7.4 with KOH. To complete the S42 medium, after autoclaving at 121°C for at least 30 minutes, add the following ingredients (per liter): CaCl2, 1 g; K2HPO4, 0.06 g; iron citrate, 0.008 g; glucose, 3.5 g; ammonium sulfate, 0.5 g; and spent liquid medium, 35 mL. Kanamycin is added at 200 micrograms/mL to prevent contamination. The culture is incubated at 37°C for 4-7 days, or until orange sorangia appear on the surface. To prepare a seed culture for inoculating agar plates or a bioreactor, the following protocol is followed. Scrape an area of orange Sorangium cells from the agar (about 5 mm2) and transfer it to a 250 mL baffled flask with a 38 mm silicone foam closure containing 50 mL of soy meal medium, which contains potato starch, 8 g; defatted soy meal, 2 g; yeast extract, 2 g; iron(III) sodium salt EDTA, 0.008 g; MgSO4.7H2O, 1 g; CaCl2.2H2O, 1 g; glucose, 2 g; and HEPES buffer, 11.5 g. Use deionized water, and adjust the pH to 7.4 with 10% KOH. Add 2-3 drops of antifoam B to prevent foaming. Incubate in an enclosed shaker for 4-5 days at 30°C and 250 RPM. The culture should appear orange. This seed culture can be subcultured repeatedly for scale-up to inoculate the desired volume of production medium. The same preparation can be made with medium 1, which contains (per liter) CaCl2.H2O, 1 g; yeast extract, 2 g; Soytone, 2 g; FeEDTA, 0.008 g; MgSO4.7H2O; and HEPES, 11.5 g. Adjust the pH to 7.4 with 10% KOH, and autoclave at 121°C for 30 minutes. Add 8 mL of 40% glucose after sterilization. Instead of a baffled flask, use a 250 mL coiled-spring flask with an aluminum cover.
Include 2-3 drops of antifoam B, and incubate in an enclosed shaker for 7 days at 37°C and 250 RPM. Subculture the entire 50 mL into 500 mL of fresh medium in a baffled, narrow-necked Fernbach flask with a 38 mm silicone foam closure. Include 0.5 of antifoam in the culture. Incubate under the same conditions for 2-3 days. Use at least a 10% inoculum for a bioreactor fermentation. To grow on solid medium, the following protocol is used. Prepare agar plates containing (per liter of CNS medium) KNO3, 0.5 g; Na2HPO4, 0.25 g; MgSO4.7H2O, 1 g; FeCl2, 0.01 g; HEPES, 2.4 g; agar, 15 g; and sterile Whatman filter paper. Before the agar has completely solidified, place a sterile disc of filter paper on the surface. When the plate is dry, add just enough seed culture to cover the surface evenly (about 1 mL). Spread evenly with a sterile loop or applicator, and place in a 32°C incubator for 7 days. Harvest the plates. For production in a 5 L bioreactor, the following protocol is used. The fermentation can be conducted in a B. Braun Biostat MD-1 5 L bioreactor. Prepare 4 L of production medium (the same as the soy meal medium for the seed culture, without HEPES buffer). Add 2% (volume to volume) XAD-16 adsorption resin, unwashed and untreated; for example, add 1 mL of XAD per 50 mL of production medium. Use 2.5 N H2SO4 for the acid bottle, 10% KOH for the base bottle, and 50% antifoam B for the antifoam bottle. For the sample port, make sure that the tubing that will come into contact with the culture broth has a small opening, so that the XAD can pass through into the vial used to collect the daily samples. Stir the mixture thoroughly before autoclaving to distribute the components evenly. Calibrate the pH probe and test the dissolved oxygen probe to ensure proper functioning. Use a small antifoam probe, ~7.5 cm in length. For the bottles, use tubing that can be sterilely welded, but use silicone tubing for the sample port.
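The volume rules in this protocol (2% volume to volume of XAD-16 resin, and an inoculum of at least 10%) scale directly with the batch size. The following minimal sketch, with hypothetical function names, applies them to the 4 L batch of production medium and to the 50 mL example given for the resin.

```python
# Sketch: resin and inoculum volumes for a production batch.
# Function names are hypothetical; the percentages are those stated above.

XAD_PERCENT_V_V = 2
MIN_INOCULUM_PERCENT = 10


def xad_volume_mL(medium_mL: float) -> float:
    """Volume of XAD-16 resin to add at 2% (volume to volume)."""
    return medium_mL * XAD_PERCENT_V_V / 100


def min_inoculum_mL(medium_mL: float) -> float:
    """Smallest seed culture volume meeting the 10% inoculum guideline."""
    return medium_mL * MIN_INOCULUM_PERCENT / 100


if __name__ == "__main__":
    batch_mL = 4000
    print("XAD-16:", xad_volume_mL(batch_mL), "mL")
    print("minimum inoculum:", min_inoculum_mL(batch_mL), "mL")
    print("XAD-16 per 50 mL:", xad_volume_mL(50), "mL")
```

For the 4 L batch this gives 80 mL of resin and at least 400 mL of seed culture, which the 500 mL Fernbach subculture stage can supply.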
Make sure that all fittings are secure and that the tubing is clamped off, not too tightly, with C-clamps. Do not clamp the tubing to the exhaust condenser. Attach a 0.2 μm filter disc to any open tubing that is in contact with the air. Use larger ACRO 50 filter discs for larger tubing, such as that of the exhaust condenser and the air supply. Prepare a sterile empty bottle for the inoculum. Autoclave at 121°C with a sterilization time of 90 minutes. Once the reactor has been removed from the autoclave, connect the tubing to the acid, base, and antifoam bottles through their respective pump heads. Release the clamps to these bottles, making sure that the tubing is not closed off. Connect the temperature probe to the control unit. Allow the reactor to cool while sparging air through the air supply tubing at a low flow rate. After making sure that the pumps are working and that there is no problem with flow rate or clogging, connect the hoses from the water bath to the water jacket and to the exhaust condenser. Make sure the water jacket is nearly full. Set the temperature to 32°C. Connect the pH, D.O., and antifoam probes to the main control unit. Test the antifoam probe for proper operation. Adjust the pH set point of the culture to 7.4. Set the agitation to 400 RPM. Calibrate the D.O. probe using air and nitrogen gas. Adjust the air flow to the rate at which the fermentation will operate, for example 1 LPM (liter per minute). To control the dissolved oxygen level, adjust the parameters under the cascade setting so that agitation will compensate for low levels of air to maintain a D.O. value of 50%. Set the minimum and maximum agitation to 400 and 1000 RPM, respectively, based on the settings of the control unit. Adjust the settings if necessary. Check the seed culture for any contamination before inoculating the fermentor.
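The dissolved oxygen cascade described above, in which agitation rises to hold D.O. at 50% within limits of 400 and 1000 RPM, can be pictured as a clamped proportional adjustment. The sketch below is an illustration only: the control law, the gain value, and the function name are assumptions, and the actual behavior of the control unit is determined by its own firmware and settings.

```python
# Sketch: a clamped proportional step standing in for the D.O. cascade.
# The control law, gain, and function name are illustrative assumptions;
# only the set point and agitation limits come from the protocol.

DO_SET_POINT_PCT = 50.0
RPM_MIN = 400.0
RPM_MAX = 1000.0


def next_agitation(current_rpm: float, do_percent: float,
                   gain_rpm_per_pct: float = 10.0) -> float:
    """Return the agitation for the next control interval.

    Agitation is raised when D.O. is below the set point and lowered when
    it is above, then clamped to the configured minimum and maximum.
    """
    error = DO_SET_POINT_PCT - do_percent
    proposed = current_rpm + gain_rpm_per_pct * error
    return max(RPM_MIN, min(RPM_MAX, proposed))


if __name__ == "__main__":
    rpm = RPM_MIN
    for reading in [60.0, 48.0, 42.0, 35.0, 20.0, 10.0, 5.0, 55.0]:
        rpm = next_agitation(rpm, reading)
        print(f"D.O. {reading:5.1f}% -> agitation {rpm:6.1f} RPM")
```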
Sorangium cellulosum cells are rod-shaped, like a pill, with two large, distinct circular vacuoles at opposite ends of the cell. The length is approximately 5 times the width of the cell. Use a 10% inoculum volume, for example, 400 mL into 4 L of production medium. Take an initial sample from the vessel and check it against the reference pH. If the fermentor pH and the reference pH differ by > 0.1 units, perform a one-point recalibration. Adjust the scale to 0.1. Take daily 25 mL samples, noting the fermentor pH, reference pH, temperature, O.D., air flow, agitation, and acid, base, and antifoam levels. Adjust the pH if necessary. Allow the fermentor to run for seven days before harvesting. Extraction and analysis of the compounds is carried out substantially as described above in Example 4. In brief, the fermentation culture is extracted twice with ethyl acetate, and the ethyl acetate extract is concentrated to dryness and dissolved/suspended in ~500 μL of MeCN-H2O (1:1). The sample is loaded onto a 0.5 mL Bakerbond ODS SPE cartridge pre-equilibrated with MeCN-H2O (1:1). The cartridge is washed with 1 mL of the same solvent, followed by 2 mL of MeCN. The MeCN eluate is concentrated to dryness, and the residue is dissolved in 200 μL of MeCN. Samples (50 μL) are analyzed by HPLC/MS on a system comprising a Beckman System Gold HPLC and a Sciex API100LC single quadrupole MS-based detector equipped with an atmospheric pressure chemical ionization source. The ring and orifice voltages are set at 75 V and 300 V, respectively, and a dual-range mass scan from m/z 290-330 and 450-550 is used. HPLC conditions: Metachem 5 μ ODS-3 Inertsil (4.6 X 150 mm); 100% H2O for 1 minute, then to 100% MeCN over 10 minutes at 1 mL/min. Epothilone A elutes at 0.2 minutes under these conditions and gives characteristic ions at m/z 494 (M+H), 476 (M+H-H2O), 318, and 306.
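The dual-range mass scan is chosen so that the ions of interest fall inside one window or the other. The short sketch below, with hypothetical names, checks the four ions listed for epothilone A against the two scan ranges given above; it uses only values stated in the text.

```python
# Sketch: check which scan window covers each ion listed for epothilone A.
# Names are hypothetical; windows and m/z values are those stated above.

SCAN_WINDOWS = [(290, 330), (450, 550)]
EPOTHILONE_A_IONS = [494, 476, 318, 306]


def covering_window(mz: int):
    """Return the (low, high) window containing mz, or None if not scanned."""
    for low, high in SCAN_WINDOWS:
        if mz in range(low, high + 1):
            return (low, high)
    return None


if __name__ == "__main__":
    for mz in EPOTHILONE_A_IONS:
        print(mz, "->", covering_window(mz))
    # 494 - 476 = 18, the nominal mass of one water molecule.
    print("parent minus first fragment:", 494 - 476)
```

Ions between m/z 331 and 449 would not be recorded with these settings, which is acceptable here because none of the listed ions falls in that gap.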
EXAMPLE 8 Epothilone derivatives as anticancer agents The novel epothilone derivatives shown below by formula (1) proved to be potent anticancer agents and can be used for the treatment of patients with various forms of cancer, including but not limited to breast, ovarian, and lung cancers. The structure-activity relationships of epothilone are based on tubulin binding assays (see Nicolaou et al., 1997, Angew. Chem. Int. Ed. Engl. 36: 2097-2103, incorporated herein by reference) and are illustrated by the diagram below.
(A) (3S) configuration important; (B) 4,4-ethano group not tolerated; (C) (6R, 7S) configuration crucial; (D) (8S) configuration important, 8,8-dimethyl group not tolerated; (E) epoxide not essential for tubulin polymerization activity, but may be important for cytotoxicity; epoxide configuration may be important; R group important; both olefin geometries tolerated; (F) (15S) configuration important; (G) bulkier group reduces activity; (H) oxygen substitution tolerated; (I) substitution important; (J) heterocycle important. Thus, this SAR indicates that modification of the C1-C8 segment of the molecule can have strong effects on activity, whereas the remainder of the molecule is relatively tolerant of change. Variation of the stereochemistry of substituents within the C1-C8 segment, or removal of functionality, can lead to significant loss of activity. The epothilone derivative compounds A-H differ from epothilone by modifications in the less sensitive portion of the molecule and thus possess good biological activity and offer better pharmacokinetic characteristics, having improved lipophilic and steric profiles. These novel derivatives can be prepared by altering the genes involved in epothilone biosynthesis, optionally followed by chemical modification. The 9-hydroxy-epothilone derivatives prepared by genetic engineering can be used to generate the carbonate derivatives (compound D) by treatment with triphosgene or 1,1'-carbonyldiimidazole in the presence of a base. Similarly, the 9,11-dihydroxy-epothilone derivative, with appropriate protection of the C-7 hydroxyl group if present, yields the carbonate derivatives (compound F). Selective oximation of the 9-oxo-epothilone derivatives with hydroxylamine followed by reduction (Raney nickel in the presence of hydrogen, or sodium cyanoborohydride) yields the 9-amino analogs.
Reacting these 9-amino derivatives with p-nitrophenyl chloroformate in the presence of a base and subsequently with sodium hydride produces the carbamate derivatives (compound E). Similarly, the carbamate compound G can be prepared from the 9-amino-11-hydroxy-epothilone derivatives, after appropriate protection of the C-7 hydroxyl group if present. An illustrative synthesis is provided below.
Part A. Epothilone D-7,9-cyclic carbonate To a round-bottom flask is added a solution of 254 mg of epothilone D in 5 mL of methylene chloride. The solution is cooled in an ice bath, and 0.3 mL of triethylamine is then added. To this solution, 104 mg of triphosgene is added. The ice bath is removed, and the mixture is stirred under nitrogen for 5 hours. The solution is diluted with 20 mL of methylene chloride and washed with dilute sodium bicarbonate solution. The organic solution is dried over magnesium sulfate and filtered. Upon evaporation to dryness, the epothilone D-7,9-cyclic carbonate is isolated.
Part B. Epothilone D-7,9-cyclic carbamate (i) 9-Amino-epothilone D To a round-bottom flask is added a solution of 252 mg of 9-oxo-epothilone D in 5 mL of methanol. After the addition of 0.5 mL of 50% hydroxylamine in water and 0.1 mL of acetic acid, the mixture is stirred at room temperature overnight. The solvent is then removed under reduced pressure to yield the 9-oxime-epothilone D. To a solution of this 9-oxime compound in 5 mL of tetrahydrofuran (THF) in an ice bath is added 0.25 mL of a 1 M solution of sodium cyanoborohydride in THF. After the mixture has reacted for 1 hour, the ice bath is removed, and the solution is allowed to warm slowly to room temperature. One mL of acetic acid is added, and the solvent is then removed under reduced pressure. The residue is dissolved in 30 mL of methylene chloride and washed with dilute sodium bicarbonate solution. The organic solution is separated, dried over magnesium sulfate, and filtered. Evaporation of the solvent yields the 9-amino-epothilone D. (ii) Epothilone D-7,9-cyclic carbamate To a solution of 250 mg of 9-amino-epothilone D in 5 mL of methylene chloride is added 110 mg of 4-nitrophenyl chloroformate, followed by 1 mL of triethylamine. The solution is stirred at room temperature for 16 hours. It is diluted with 25 mL of methylene chloride. The solution is washed with saturated sodium chloride, and the organic layer is separated and dried over magnesium sulfate. After filtration, the solution is evaporated to dryness under reduced pressure. The residue is dissolved in 10 mL of dry THF. Sodium hydride, 40 mg (60% dispersion in mineral oil), is added to the solution in an ice bath. The ice bath is removed, and the mixture is stirred for 16 hours. Half a mL of acetic acid is added, and the solution is evaporated to dryness under reduced pressure. The residue is redissolved in 50 mL of methylene chloride and washed with saturated sodium chloride solution.
The organic layer is dried over magnesium sulfate and the solution is filtered and the organic solvent is evaporated to dry under reduced pressure. After purification on a column of silica gel, the epothilone D-7,9-carbamate is isolated. The invention has now been described by written description and examples, those skilled in the art will recognize that the invention can be practiced in a variety of embodiments and that the foregoing descriptions and examples are for purposes of illustration and are not limiting of the following claims.

Claims (66)

NOVELTY OF THE INVENTION CLAIMS
1. An isolated recombinant nucleic acid compound characterized in that it comprises a nucleotide sequence that encodes at least one domain of an epothilone polyketide synthase (PKS) protein and/or that encodes a functional region of an epothilone modification enzyme.
2. The nucleic acid according to claim 1, further characterized in that said domain is selected from the group consisting of a loading domain, a thioesterase domain, a non-ribosomal peptide synthase (NRPS), an acyltransferase (AT) domain, a ketosynthase (KS) domain, an acyl carrier protein (ACP) domain, a ketoreductase (KR) domain, a dehydratase (DH) domain, an enoylreductase (ER) domain, a methyltransferase domain, and a functional oxidase domain.
3. The nucleic acid according to claim 1 or 2, further characterized in that it comprises the coding sequence of an epoA gene, and/or the coding sequence of an epoB gene, and/or the coding sequence of an epoC gene, and/or the coding sequence of an epoD gene, and/or the coding sequence of an epoE gene, and/or the coding sequence of an epoF gene, and/or the coding sequence of an epoK gene, and/or the coding sequence of an epoL gene.
4. The nucleic acid according to claim 1, further characterized in that it comprises a promoter positioned to transcribe said coding sequence in host cells in which said promoter is operable.
5. The nucleic acid according to claim 4, further characterized in that said promoter is a promoter from a Sorangium gene, or from a Myxococcus gene, or from a Streptomyces gene, or from an epothilone PKS gene, or from a pilA gene, or from an actinorhodin PKS gene.
6. The nucleic acid according to any of claims 1-5, further characterized in that it is a recombinant DNA expression vector.
7. Host cells characterized in that they contain the nucleic acid as claimed in any of claims 4-6.
8. The cells according to claim 7, further characterized in that they are Sorangium cells, or Pseudomonas cells, or Streptomyces cells.
9. A method for producing a polyketide characterized in that it comprises culturing the cells as claimed in claim 7, under conditions wherein the coding nucleotide sequence is expressed to obtain a functional PKS.
10. A recombinant Sorangium cellulosum host cell that contains a mutated gene for an epothilone PKS protein or an epothilone modification enzyme, wherein said mutated gene was inserted in whole or in part into the genomic DNA of said cell by homologous recombination with a recombinant vector comprising all or part of an epothilone PKS gene or epothilone modification enzyme gene.
11. The recombinant host cell according to claim 10, further characterized in that it makes epothilone C or D but not A or B due to a mutation that inactivates or deletes the epoK gene, or makes epothilone A or C but not B or D due to a mutation in epoD that alters the specificity of the AT domain of module 4, or makes epothilone B or D but not A or C due to a mutation in epoD that alters the specificity of the AT domain of module 4, or makes epothilone D but not A, B or C due to a mutation in epoD that alters the specificity of the AT domain of module 4 and a mutation in epoK.
12. Recombinant Streptomyces or Myxococcus host cells that express an epothilone PKS gene or an epothilone modification enzyme gene, and that optionally comprise one or more of said epothilone PKS genes or modification enzyme genes within their chromosomal DNA and/or one or more of said epothilone PKS genes or modification enzyme genes on an extrachromosomal expression vector.
13. The host cells according to claim 12, further characterized in that they are S. coelicolor CH999.
14. A method for producing an epothilone or epothilone derivative comprising culturing the cells as claimed in claim 12 or 13.
15. A functional modified epothilone PKS wherein said modification comprises at least one of: replacement of at least one AT domain with an AT domain of different specificity; inactivation of the NRPS-like module 1 or of the KS2 catalytic domain; inactivation of at least one activity in at least one β-carbonyl modification domain; addition of at least one of KR, DH and ER activity in at least one β-carbonyl modification domain; and replacement of NRPS module 1 with an NRPS of different specificity.
16. The modified PKS according to claim 15, further characterized in that it is contained in a cell or contained in a cell-free system, wherein said cell or system contains additional enzymes for the modification of the product of said epothilone PKS.
17. The modified PKS according to claim 16, further characterized in that said modifying enzymes comprise at least one methyltransferase enzyme, an oxidase or a glycosylation enzyme.
18. A method for preparing an epothilone derivative, which method comprises providing substrates, including extender units, to the modified PKS as claimed in any of claims 15-17.
19. A functional modified epothilone PKS wherein said modification comprises the inactivation of module 1 of the NRPS or module 2 of the NRPS thereof.
20. A method for making an epothilone derivative, which method comprises contacting the modified PKS as claimed in claim 19 with a substrate of module 2 or a substrate of module 3 and extender units.
21. Recombinant host cells comprising the modified PKS as claimed in any of claims 15-17 or 19.
22. The cells according to claim 21, further characterized in that they produce an epothilone derivative selected from the group consisting of 16-desmethyl epothilones, 14-methyl epothilones, 11-hydroxyl epothilones, 10-methyl epothilones, 8,9-anhydro epothilones, 9-hydroxyl epothilones, 9-keto epothilones, 8-desmethyl epothilones, and 6-desmethyl epothilones.
23. A compound selected from the group consisting of 16-desmethyl epothilones, 14-methyl epothilones, 11-hydroxyl epothilones, 10-methyl epothilones, 8,9-anhydro epothilones, 9-hydroxyl epothilones, 9-keto epothilones, 8-desmethyl epothilones, and 6-desmethyl epothilones.
24. A recombinant PKS enzyme that comprises one or more domains, modules, or proteins of a non-epothilone PKS and one or more domains, modules, or proteins of an epothilone PKS, and/or that contains a loading domain comprising a KSQ domain.
25. The PKS enzyme according to claim 24, further characterized in that it comprises a loading domain of a deoxyerythronolide B synthase (DEBS) and 5 modules of DEBS and an NRPS of the epothilone PKS, wherein said PKS comprises an entire non-epothilone PKS with an MT domain of the epothilone PKS.
26. A compound of the formula (I), including the glycosylated forms thereof and stereoisomeric forms where the stereochemistry is not shown, wherein A is a substituted or unsubstituted straight-chain, branched or cyclic alkyl, alkenyl or alkynyl residue optionally containing 1-3 heteroatoms selected from O, S, and N, or wherein A comprises a substituted or unsubstituted aromatic residue; R2 represents H,H, or H,lower alkyl, or lower alkyl,lower alkyl; X5 represents =O or a derivative thereof, or H,OH or H,NR2 wherein R is H, alkyl or acyl, or H,OCOR or H,OCONR2 wherein R is H or alkyl, or is H,H; R6 represents H or lower alkyl, and the remaining substituent on the corresponding carbon is H; X7 represents OR or NR2, wherein R is H, alkyl or acyl, or is OCOR or OCONR2 wherein R is H or alkyl, or X7 taken together with X9 forms a cyclic carbonate or carbamate, and wherein the remaining substituent on the corresponding carbon is H; R8 represents H or lower alkyl, and the remaining substituent on the carbon is H; X9 represents =O or a derivative thereof, or H,OR or H,NR2 wherein R is H, alkyl or acyl, or is H,OCOR or H,OCONR2 wherein R is H or alkyl, or represents H,H, or wherein X9 together with X7 or X11 can form a cyclic carbonate or carbamate; R10 is H,H, or H,lower alkyl, or lower alkyl,lower alkyl; X11 is =O or a derivative thereof, or H,OR or H,NR2 wherein R is H, alkyl or acyl, or is H,OCOR or H,OCONR2 wherein R is H or alkyl, or is H,H, or wherein X11 in combination with X9 can form a cyclic carbonate or carbamate; R12 is H,H, or H,lower alkyl, or lower alkyl,lower alkyl; X13 is =O or a derivative thereof, or H,OR or H,NR2 wherein R is H, alkyl or acyl, or is H,OCOR or H,OCONR2 wherein R is H or alkyl; R14 is H,H, or H,lower alkyl, or lower alkyl,lower alkyl; R16 is H or lower alkyl; and wherein optionally H and another substituent can be removed from positions 12 and 13 and/or 8 and 9 to form a double bond, wherein said double bond can optionally be converted to an epoxide.
27. A compound of the formula I(a), wherein both Z are O or one Z is N and the other Z is O, and the remaining substituents are as defined in claim 26.
28. A recombinant vector selected from the group consisting of pKOS35-70.8A3, pKOS35-70.1A2, pKOS35-70.4, pKOS039-124R, and pKOS039-126R.
29. Isolated or recombinant host cells that contain a nucleic acid that encodes and expresses the epothilone polyketide synthase ("PKS") of Sorangium cellulosum but do not contain a functionally active product of an epoK gene of Sorangium cellulosum, wherein the host cells produce more epothilone C than epothilone A or more epothilone D than epothilone B.
30. The cells according to claim 29, further characterized in that they are Sorangium cells, or Myxococcus cells, or Pseudomonas cells, or Streptomyces cells.
31. A method for producing epothilone C or epothilone D characterized in that it comprises culturing the cells as claimed in claim 29 under conditions wherein the nucleic acid is expressed to obtain a functional PKS.
32. A recombinant Sorangium cellulosum host cell that contains a mutated epoK gene and produces more epothilone C than epothilone A or more epothilone D than epothilone B.
33. The recombinant host cell according to claim 32, further characterized in that it makes epothilone C but not epothilone A, B or D due to a mutation in epoD that alters the specificity of the acyltransferase ("AT") domain of extender module 4, or makes epothilone D but not epothilone A, B or C due to a mutation in epoD that alters the specificity of the AT domain of extender module 4.
34. Recombinant Streptomyces or Myxococcus host cells that express the epothilone polyketide synthase ("PKS") genes of Sorangium cellulosum and do not express a fully functional product of the epoK gene of Sorangium cellulosum, and that optionally comprise one or more of said epothilone PKS genes integrated within their chromosomal DNA or on an extrachromosomal expression vector, wherein said recombinant cells produce more epothilone C than epothilone A or more epothilone D than epothilone B.
35. A method for producing epothilone C or D comprising culturing the cells according to claim 33 or 34.
36. The cells according to claim 29, further characterized in that said cells do not contain an epoK gene of Sorangium cellulosum.
37. The cells according to claim 29, further characterized in that said cells contain an altered Sorangium cellulosum epoK gene that produces a gene product that does not convert epothilone C to epothilone A or epothilone D to epothilone B.
38. The cells according to claim 29, further characterized in that said cells produce both epothilone C and epothilone D.
39. The cells according to claim 29, further characterized in that said cells do not produce epothilone A or epothilone B.
40. The cells according to claim 29, further characterized in that said nucleic acid comprises the genes epoA, epoB, epoC, epoD, epoE and epoF.
41. The cells according to claim 29, further characterized in that said nucleic acid comprises an altered coding sequence of the acyltransferase ("AT") domain of extender module 4 of epoD.
42. The cells according to claim 41, further characterized in that said altered coding sequence of the AT domain results in increased production of epothilone C and decreased production of epothilone D compared to cells that contain a sequence encoding the unaltered AT domain.
43. The cells according to claim 41, further characterized in that said altered coding sequence of the AT domain results in increased production of epothilone D and decreased production of epothilone C compared to cells that contain a sequence encoding the unaltered AT domain.
44. The cells according to claim 30, further characterized in that they are Streptomyces cells.
45. The cells according to claim 36, further characterized in that they are Streptomyces cells.
46. The cells according to claim 38, further characterized in that they are Streptomyces cells.
47. The cells according to claim 39, further characterized in that they are Streptomyces cells.
48. The cells according to claim 30, further characterized in that they are Myxococcus cells.
49. The cells according to claim 36, further characterized in that they are Myxococcus cells.
50. The cells according to claim 38, further characterized in that they are Myxococcus cells.
51. The cells according to claim 39, further characterized in that they are Myxococcus cells.
52. The cell according to claim 32, further characterized in that said mutated epoK gene comprises an insertion or deletion relative to a non-mutated epoK gene.
53. The cell according to claim 32, further characterized in that said mutated epoK gene is produced by random mutagenesis.
54. The cell according to claim 32, further characterized in that said mutated epoK gene is produced by homologous recombination.
55. The cell according to claim 32, further characterized in that it does not produce epothilone A or epothilone B.
56. The cells according to claim 34, further characterized in that said cells do not contain an epoK gene of Sorangium cellulosum.
57. The cells according to claim 34, further characterized in that said cells contain an altered epoK gene of Sorangium cellulosum and said altered epoK gene produces a gene product that does not convert epothilone C to epothilone A or epothilone D to epothilone B.
58. The cells according to claim 34, further characterized in that said cells do not produce epothilone A or epothilone B.
59. The cells according to claim 34, further characterized in that they contain the PKS genes epoA, epoB, epoC, epoD, epoE and epoF.
60. The cells according to claim 59, further characterized in that said epoD gene comprises an altered coding sequence of the acyltransferase ("AT") domain of extender module 4.
61. The cells according to claim 60, further characterized in that said altered coding sequence of the AT domain results in increased production of epothilone C and decreased production of epothilone D compared to cells that contain a sequence encoding the unaltered AT domain.
62. The cells according to claim 60, further characterized in that said altered coding sequence of the AT domain results in increased production of epothilone D and decreased production of epothilone C compared to cells that contain a sequence encoding the unaltered AT domain.
63. The method according to claim 31, further characterized in that both epothilone C and epothilone D are produced.
64. The method according to claim 31, further characterized in that epothilone D is produced.
65. The method according to claim 35, further characterized in that epothilones C and D are produced.
66. The method according to claim 35, further characterized in that epothilone D is produced.
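The relationship recited in claim 11 between the state of the epoK gene, the specificity of the AT domain of module 4, and the epothilones produced can be summarized in a short sketch. The Python below is illustrative only and is not part of the claims. The function name, the argument names, and the encoding of AT specificity as the strings "malonyl" and "methylmalonyl" are assumptions introduced here. The sketch further assumes that a malonyl extender unit at module 4 gives the R = H compounds (epothilones A and C), that a methylmalonyl unit gives the R = CH3 compounds (epothilones B and D), and that the epoK product converts C to A and D to B. It models only which compounds can appear, not their relative amounts.

```python
from typing import FrozenSet, Set


def predicted_epothilones(at4_substrates: Set[str], epok_functional: bool) -> FrozenSet[str]:
    """Qualitative product set for a given module 4 AT specificity and epoK state.

    at4_substrates: extender units accepted by the module 4 AT domain,
                    drawn from {"malonyl", "methylmalonyl"} (hypothetical encoding).
    epok_functional: True if a functional epoK gene product is present.
    """
    # Desoxy compounds made directly by the PKS.
    desoxy = set()
    if "malonyl" in at4_substrates:
        desoxy.add("C")   # R = H
    if "methylmalonyl" in at4_substrates:
        desoxy.add("D")   # R = CH3

    products = set(desoxy)
    if epok_functional:
        # The epoK product epoxidizes C to A and D to B; some precursor may remain.
        epoxidized = {"C": "A", "D": "B"}
        products.update(epoxidized[name] for name in desoxy)
    return frozenset(products)


if __name__ == "__main__":
    both = {"malonyl", "methylmalonyl"}
    print(sorted(predicted_epothilones(both, epok_functional=True)))
    print(sorted(predicted_epothilones(both, epok_functional=False)))
    print(sorted(predicted_epothilones({"methylmalonyl"}, epok_functional=False)))
```

With both extender units accepted and epoK inactivated, the sketch returns C and D; with only methylmalonyl accepted and epoK inactivated, it returns D alone, which mirrors the last alternative of claim 11.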
MXPA/A/2001/005097A 1998-11-20 2001-05-21 Recombinant methods and materials for producing epothilone and epothilone derivatives MXPA01005097A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US60/109,401 1998-11-20
US60/119,386 1999-02-10
US60/122,620 1999-03-03
US60/130,560 1999-04-22

Publications (1)

Publication Number Publication Date
MXPA01005097A (en) 2003-11-07
