Genes encoding the synthetic pathway for the production of disorazole
The present invention relates to nucleic acid sequences and proteins derivable therefrom which are catalytically active or participate in the biosynthetic pathway of disorazoles. The catalytically active proteins, i.e. enzymes, are also known as polyketide synthases and nonribosomal peptide synthetases.
It is known that myxobacteria produce a large variety of biologically active compounds, also known as secondary metabolites. Among these secondary metabolites, the group of disorazoles has attracted attention as inhibitors of tubulin polymerisation, as inducers of apoptosis, and as agents that arrest the cell cycle or inhibit cell proliferation at concentrations as low as e.g. 3 pM. The present invention provides nucleic acid sequences which can be translated into catalytically active proteins or proteins participating in the biosynthesis of disorazoles. In cooperation, these translated proteins catalyze the formation of disorazoles in vivo and/or in vitro. Accordingly, the present invention also provides a production process using the nucleic acid sequences and/or proteins derivable therefrom for the production of disorazoles, for example using homologous or heterologous expression of proteins derivable from these nucleic acid sequences in microorganisms for fermentation, or using the proteins in an immobilized state to produce disorazoles from precursor compounds.
State of the art
WO 2004/053065 A2 describes nucleic acid sequences encoding disorazole polyketide synthases DszA, DszB, DszC and DszD obtained from Sorangium cellulosum So ce 12 using transposon generated cosmids. In very general terms, synthetic synthases are described which can be obtained by rearrangement of domains that can be identified in the wildtype disorazole synthase enzymes, namely a ketoreductase domain, a dehydratase domain, an enoylreductase domain, a ketosynthase domain, a nonribosomal peptide synthetase domain, a methyltransferase domain, an acyl carrier protein domain, a serine cyclization domain, a serine condensation domain, an adenylation domain, a peptidyl carrier protein domain, a thiolation domain, an oxidase domain, a thioesterase domain, and an acyl transferase domain from a total number of 8 domains in the disorazole synthetase. These domains are predicted from the DNA sequence obtained. However, specific synthetic rearrangements of these domains are not identified. The nucleotide sequence disclosed for the disorazole polyketide synthase and/or nonribosomal peptide synthetase comprises 77294 bp and allegedly includes the coding sequences for DszA, DszB, DszC, DszD and several other open reading frames which are located adjacent to one another.
The present invention relates to the group of disorazoles, namely disorazole A1 and derivatives thereof, for example disorazoles according to the following formulae 1-8 and specific embodiments of these as detailed below:
[Formulae 1 to 8: structural drawings not reproduced]
wherein
X represents an O, two vicinal OH, or a single bond and R1, R2, R3, R4 each independently represent H, OH or OCH3. Specific embodiments of general formulae 1-8 are:
Disorazole A1 - A7    Disorazole F1 - F3
Disorazole B1 - B4    Disorazole G1 - G3
Disorazole C1 - C2    Disorazole H
Disorazole D1 - D5    Disorazole I
Disorazole E1 - E3
(R. Janssen et al., Liebigs Ann. Chem. 1994, 759-773).
General description of the invention
The present invention provides the complete nucleic acid sequences encoding not only a gene cluster but also additional genetic elements which are necessary for correct biosynthesis of disorazoles. The entire biosynthetic gene cluster, which has high homology to DszA to DszD disclosed in WO 2004/053065 A2, is disclosed together with its functional analysis.
The core biosynthetic gene cluster for the biosynthetic pathway for disorazoles comprises genes disA through disD. The gene disA is preceded by a putative ribosomal binding site located 11 base pairs upstream of the designated start codon (GTG). DisB presumably starts with an ATG, and a putative ribosomal binding site could be localized 7 base pairs upstream of the start codon. Arranged in one transcriptional unit with disA and disB, which encode polyketide synthases, is disC, which encodes a mixed polyketide synthase / nonribosomal peptide synthetase. DisC most likely starts with an ATG, preceded by a putative ribosomal binding site located 8 base pairs upstream. An alternative start codon of disC could be found 36 base pairs downstream of the putative start codon. Downstream of this transcriptional unit of disA, disB and disC, a probable transcription terminator is located.
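The positional statements above (a ribosomal binding site a given number of base pairs upstream of a GTG or ATG start codon, and an alternative start codon 36 base pairs downstream) can be illustrated by a minimal Python sketch. The motif GGAGG, the function names and the toy sequence are assumptions made for this sketch only; they are not taken from Seq.-ID No. 7, and the spacing convention (number of bases between the end of the motif and the first base of the start codon) is likewise assumed.

```python
# Illustrative sketch only: motif, names and toy sequence are hypothetical
# and are NOT derived from the sequences of the invention.
SD_MOTIF = "GGAGG"              # assumed Shine-Dalgarno-like core motif
START_CODONS = {"ATG", "GTG"}   # start codons mentioned in the description


def rbs_spacing(seq, start, motif=SD_MOTIF, window=20):
    """Return the number of bases between the end of the nearest upstream
    motif and the start codon at index `start`, or None if no motif lies
    completely within `window` bases upstream of the start codon."""
    seq = seq.upper()
    if seq[start:start + 3] not in START_CODONS:
        raise ValueError("no ATG/GTG start codon at the given position")
    lower = max(0, start - window)
    pos = seq.rfind(motif, lower, start)   # nearest motif ending before start
    if pos == -1:
        return None
    return start - (pos + len(motif))


def in_frame(start, alternative):
    """True if an alternative start codon lies in the same reading frame."""
    return (alternative - start) % 3 == 0


# Toy sequence: motif, 11 spacer bases, GTG start, ATG 36 bases downstream.
toy = "CC" + "GGAGG" + "A" * 11 + "GTG" + "AAA" * 11 + "ATG" + "TAA"
print(rbs_spacing(toy, 18))   # spacing for the GTG start codon
print(in_frame(18, 54))       # is the downstream ATG in frame?
```

An offset of 36 base pairs is a multiple of three, so an alternative start codon at that distance lies in the same reading frame as the putative start codon.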
Following orf 9, located downstream of the transcriptional unit disA through disC, disD was identified, having its putative ribosomal binding site 7 base pairs upstream of its start codon. The translation product of disD shows significant similarity to the bifunctional proteins LnmG from the leinamycin biosynthetic gene cluster and MmpIII from the mupirocin biosynthetic gene cluster. The C-terminus of DisD has close sequence similarity to the oxidoreductase superfamily.
From a total of four transposon mutants, listed in Table 3 below, plasmids were recovered harbouring the hygromycin resistance gene and the λpir dependent origin of replication (ori) R6K together with parts of the chromosomal DNA of Sorangium cellulosum So ce 12 which originally flanked the transposition site. A computer assisted analysis of the chromosomal DNA portions using BLAST searches identified two of the proteins predicted from the recovered DNA portions as putative fragments of a polyketide synthase and a nonribosomal peptide synthetase. These two chromosomal DNA portions were used as probes for hybridization with a BAC library previously established for Sorangium cellulosum So ce 12. Sequencing of hybridizing BAC clones yielded orfs encoding proteins participating in the biosynthesis of disorazoles, which are summarised in Table 1 below.
Detailed description of the invention
To analyse the biosynthetic pathway for the production of disorazoles, the genomic DNA of Sorangium cellulosum So ce 12 was analysed to identify the genes whose translation products are necessary components of the synthetic pathway that finally produces disorazoles, including known variants or derivatives of disorazole A, e. g. according to formulae 1 - 8 above. The gene cluster encoding the enzymes catalyzing the biosynthesis of disorazoles comprises disA, disB, disC and disD. It is possible that the translation product of open reading frame (orf) orf 9, arranged between disC and disD, participates in or is beneficial to the biosynthesis of disorazoles.
In the following, reference is made to the figures, wherein
• Figure 1 is a schematic representation of the synthetic pathway for disorazoles,
• Figure 2 schematically shows the arrangement of genes adjacent to the insertion site of the transposon in the transposon mutant So ce 12 EXI IE-2 and sequenced from its plasmid pTn-Rec_IE-2, and
• Figure 3 lists nucleic acid and amino acid sequences relevant to the invention, namely the nucleic acid sequence of pTn-Rec_IE-2 (Seq.-ID No. 1), the amino acid sequences of orf 1-pTn-Rec_IE-2 (Seq.-ID No. 2), orf 2-pTn-Rec_IE-2 (Seq.-ID No. 3), orf 3-pTn-Rec_IE-2 (Seq.-ID No. 4), orf 4-pTn-Rec_IE-2 (Seq.-ID No. 5), orf 5-pTn-Rec_IE-2 (Seq.-ID No. 6), the nucleic acid sequence disA-disD (Seq.-ID No. 7) comprising genes disA, disB, disC, orf 9 and disD, and the amino acid sequences of DisA (Seq.-ID No. 8), DisB (Seq.-ID No. 9), DisC (Seq.-ID No. 10), orf 9 (Seq.-ID No. 11) and DisD (Seq.-ID No. 12).
The functions proposed in Table 1 above have been identified by a similarity search against known sequences; however, the actual functions of the gene products from the orfs of Table 1 within the biosynthetic gene cluster for disorazoles may differ from those proposed.
An analysis of the genomic DNA region encoding disA through disD has revealed several orfs in the vicinity of disA through disD, summarised in Table 1.
Figure 1 schematically depicts the arrangement of genes disA, disB, disC, orf 9, and disD, wherein the abbreviations refer to catalytic centers and domains as follows:
Dark shade: polyketide synthase (PKS), Light shade: nonribosomal peptide synthetase (NRPS), KS: ketosynthase, DH: β-hydroxydehydratase, KR: β-ketoacyl reductase, ACP: acyl carrier protein, MT: methyltransferase, HC: heterocyclization domain, A: adenylation domain, PCP: peptidyl carrier protein, Ox: oxidation domain, TE: thioesterase domain, AT: acyl transferase, Or: oxidoreductase, and arrow: site of insertion of transposon in different mutants.
The sites indicated by the arrows are designated as So12_EX_13-21 and So12_EX_2793, which are So ce 12 mutants from which the plasmids pTn-Rec13-21 and pTn-Rec2793, respectively, were recovered.
The arrangement of genes adjacent to the insertion site of the transposon mutant So ce 12 EXI IE-2 is schematically depicted in Figure 2.
For the gene products of disA through disD, functions can be proposed for individual protein domains by homology search. These proposed functions, including their relative positions in the individual nucleic acid sequences, are listed in Table 2 below.
Table 2: Disorazole biosynthetic genes disA, disB, disC and disD
Abbreviations are according to Figure 1.
However, when analysing the synthesis of disorazoles in microorganisms expressing the biosynthetic gene cluster consisting of the sequences encoding DisA, DisB, DisC and DisD only, homologous sequences of which have been described in WO 2004/053065 A2, it is considered impossible that the full range of derivative disorazoles could be produced with the translation products DisA, DisB, DisC and DisD only. The reason is that comparative analysis showed that DisA, DisB, DisC and DisD lack at least some functions, e.g. necessary for hydroxylation, epoxidation and methoxylation, that are assumed necessary for synthesis of at least some known derivatives of disorazole.
Further analysis of the genomic region adjacent to the genes disA through disD, for example the gene products of those orfs listed in Table 2 above, did not identify coding sequences for accessory functions to complement the biosynthetic pathway of DisA through DisD to allow production of disorazole or the range of known disorazole derivatives.
Analysis of the two additional disorazole negative mutants revealed further sequences obtainable from Sorangium cellulosum So ce 12, at least one of which encodes a translation product that is necessary for synthesis of disorazoles in combination with the translation products of disA, disB, disC and disD, preferably in combination with the translation product of orf 9. These additional nucleic acid sequences have been identified on recovered plasmids of disorazole negative So ce 12 mutants and are summarised in Table 3 below.
Table 3: Recovered plasmids and proposed function of the encoded proteins
The proposed functions have been identified by similarity searches with known proteins; the actual functions within the biosynthetic pathway may differ from the proposed functions indicated here.
Sequencing of pTn-Rec_IE-2 identified a total of 5 orfs and their putative functions, which are summarized in Table 4 below:
Table 4: Proteins encoded on the plasmid pTn-Rec_IE-2 and their putative function
In a first embodiment of the present invention, at least one of the translation products of Table 4 is used in combination with the translation products of disA through disD to provide the biosynthetic pathway for disorazoles. In a preferred embodiment, at least two, more preferably three or four, translation products of the sequences identified in Table 4 participate in the biosynthetic pathway for disorazoles in combination with disA through disD, preferably including the translation product of orf 9.
The DNA sequences of disA, disB, disC, disD and orf 1-pTn-Rec_IE-2, orf 2-pTn-Rec_IE-2, orf 3-pTn-Rec_IE-2, orf 4-pTn-Rec_IE-2, and orf 5-pTn-Rec_IE-2 as well as their translation products obtained from Sorangium cellulosum So ce 12 are listed in Figure 3. These specific sequences are preferred for performing the present invention, but other coding sequences, and peptides derivable therefrom, which provide the respective activity necessary in the disorazole synthetic pathway are also applicable in the present invention and can replace the sequences of Figure 3.
The present invention will now be described in greater detail by way of examples, which are not intended to limit the scope of the invention.
Example 1: Cloning and sequencing of nucleic acid sequences complementing the biosynthetic pathway enzymes for disorazoles
Nucleic acid sequences, the translation products of which participate in the biosynthetic pathway for disorazoles have been identified using a transposon recovery procedure from disorazole negative transposon mutants of Sorangium cellulosum strain So ce 12. Strain So ce 12 is available at NCIMB Aberdeen, UK, under accession No. NCIB 12134.
For transposon mutagenesis, a transposon termed pMiniHimarHyg, which is applicable to myxobacteria, was used; it comprises the hygromycin resistance gene but lacks the genes for conjugational DNA transfer. The transformation of Sorangium cellulosum was achieved by electroporation as described in European patent application EP 04 103 546.0, filed on 23 July 2004 with the European Patent Office.
Disorazole negative mutants were detected in a bioassay using an overlay with the disorazole sensitive yeast R. glutinis. In this bioassay, transposon mutants were plated on PM 12 agar plates without hygromycin and incubated at 32 °C until colonies became visible, then overlaid with R. glutinis and incubated overnight at 30 °C; growth inhibition zones were compared to those of wild type Sorangium cellulosum So ce 12.
Transposon recovery from disorazole negative transposon mutant colonies was essentially carried out as described in Kopp et al. (J. Biotech 107, 29 (2004)).
Example 2: Heterologous expression of biosynthetic pathway enzymes for the production of disorazole
The core biosynthetic gene cluster, and the respective translation products, sufficient for the biosynthesis of disorazoles were determined by heterologous gene expression experiments. As expected, the core genes disA, disB, disC and disD are regarded as necessary components of the biosynthetic pathway. An optional and preferably included component is orf 9.
The core cluster comprising disA, disB, disC as well as disD needs complementation with at least an expression cassette encoding orf 3-pTn-Rec_IE-2, optionally in combination with orf 1-pTn-Rec_IE-2, optionally in combination with orf 2-pTn-Rec_IE-2, optionally in combination with orf 4-pTn-Rec_IE-2, and optionally in combination with orf 5-pTn-Rec_IE-2.
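The complementation scheme described above (orf 3-pTn-Rec_IE-2 always present, each of the other four orfs independently optional) defines a small, enumerable set of gene combinations. The following Python sketch lists them; the function name and the tuple representation are hypothetical conveniences for illustration, not part of the described method.

```python
from itertools import combinations

# Gene names follow the description; the data structures are illustrative.
CORE = ("disA", "disB", "disC", "disD")
REQUIRED = ("orf 3-pTn-Rec_IE-2",)
OPTIONAL = (
    "orf 1-pTn-Rec_IE-2",
    "orf 2-pTn-Rec_IE-2",
    "orf 4-pTn-Rec_IE-2",
    "orf 5-pTn-Rec_IE-2",
)


def complementation_sets(include_orf9=False):
    """Enumerate every gene set consisting of the core cluster, the
    required orf 3 and any subset (including none) of the optional orfs."""
    base = CORE + (("orf 9",) if include_orf9 else ()) + REQUIRED
    sets = []
    for k in range(len(OPTIONAL) + 1):
        for extra in combinations(OPTIONAL, k):
            sets.append(base + extra)
    return sets


all_sets = complementation_sets()
print(len(all_sets))  # 2**4 = 16 combinations of the four optional orfs
```

Excluding the case in which none of the optional orfs is added leaves 15 combinations, each of which can additionally be built with or without orf 9.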
When sequences encoding at least one, preferably two, more preferably three or four and most preferably all of the group comprising orf 1-pTn-Rec_IE-2, orf 2-pTn-Rec_IE-2, orf 4-pTn-Rec_IE-2, and orf 5-pTn-Rec_IE-2 were expressed in combination with orf 3-pTn-Rec_IE-2 to supplement the expression cassettes encoding disA through disD, optionally with orf 9, production of disorazoles was found.
The number of derivative disorazoles varied according to the sequences selected among orf 1-pTn-Rec_IE-2, orf 2-pTn-Rec_IE-2, orf 4-pTn-Rec_IE-2, and orf 5-pTn-Rec_IE-2 for expression in combination with orf 3-pTn-Rec_IE-2 and disA through disD, optionally with orf 9. It is preferred that the coding sequences are contained intra-chromosomally in their natural arrangement.
For production of disorazoles, the identification of the set of genes or gene cluster according to the invention allows the modification of producer strains, for example by specifically targeted modification of regulatory elements, e.g. the introduction of stronger promoters for disA, disB, disC, orf 9, and/or disD, and/or for the complementing genes orf 1-pTn-Rec_IE-2, orf 2-pTn-Rec_IE-2, orf 3-pTn-Rec_IE-2, orf 4-pTn-Rec_IE-2, and/or orf 5-pTn-Rec_IE-2.
Alternatively, heterologous expression can be employed using microorganisms which are not natural producers of disorazole. For heterologous expression, Myxococcales, preferably Myxococcus xanthus, or Polyangium, also termed Sorangium, e. g. Sorangium cellulosum accessible as ATCC 25531 or ATCC 29479 (DSMZ 2044), Stigmatella aurantiaca, Angiococcus disciformis, and strains of the genus Pseudomonas, e.g. Pseudomonas putida, Pseudomonas stutzeri, and Pseudomonas syringae, can be used.
Alternatively, the expression products, i. e. proteins derivable from the aforementioned sets of genes for the synthetic pathway, can be used in an extracellular synthesis system, e. g. as catalysts like an immobilized enzyme system for synthesis of disorazoles.