WO2018015512A1 - Biosynthesis of 13R-manoyl oxide derivatives

Publication number
WO2018015512A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
polypeptide
amino acid
acid sequence
recombinant host
Prior art date
Application number
PCT/EP2017/068418
Other languages
French (fr)
Inventor
Victor FORMAN
Irini PATERAKI
Jane Dannow DYEKJAER
Niels Bjerg JENSEN
Original Assignee
Evolva SA
University of Copenhagen
Application filed by Evolva SA and University of Copenhagen
Publication of WO2018015512A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004 Oxidoreductases (1.)
    • C12N9/0071 Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07D HETEROCYCLIC COMPOUNDS
    • C07D311/00 Heterocyclic compounds containing six-membered rings having one oxygen atom as the only hetero atom, condensed with other rings
    • C07D311/02 Heterocyclic compounds containing six-membered rings having one oxygen atom as the only hetero atom, condensed with other rings ortho- or peri-condensed with carbocyclic rings or ring systems
    • C07D311/78 Ring systems having three or more relevant rings
    • C07D311/92 Naphthopyrans; Hydrogenated naphthopyrans
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/795 Porphyrin- or corrin-ring-containing peptides
    • C07K14/80 Cytochromes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241 Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8242 Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits
    • C12N15/8243 Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits involving biosynthetic or metabolic pathways, i.e. metabolic engineering, e.g. nicotine, caffeine
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004 Oxidoreductases (1.)
    • C12N9/0071 Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14)
    • C12N9/0073 Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14) with NADH or NADPH as one donor, and incorporation of one atom of oxygen (1.14.13)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P17/00 Preparation of heterocyclic carbon compounds with only O, N, S, Se or Te as ring hetero atoms
    • C12P17/02 Oxygen as only ring hetero atoms
    • C12P17/06 Oxygen as only ring hetero atoms containing a six-membered hetero ring, e.g. fluorescein
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P7/00 Preparation of oxygen-containing organic compounds
    • C12P7/02 Preparation of oxygen-containing organic compounds containing a hydroxy group
    • C12P7/22 Preparation of oxygen-containing organic compounds containing a hydroxy group aromatic
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y114/00 Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14)
    • C12Y114/14 Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14) with reduced flavin or flavoprotein as one donor, and incorporation of one atom of oxygen (1.14.14)
    • C12Y114/14001 Unspecific monooxygenase (1.14.14.1)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P5/00 Preparation of hydrocarbons or halogenated hydrocarbons
    • C12P5/007 Preparation of hydrocarbons or halogenated hydrocarbons containing one or more isoprene units, i.e. terpenes

Definitions

  • the present invention relates to the field of biosynthesis of substituted diterpenes. More specifically, the invention relates to methods for biosynthesis of 13R-manoyl oxide (13R-MO) and 13R-MO derivatives, including biosynthesis of forskolin.
  • 13R-MO: 13R-manoyl oxide
  • Forskolin is a complex functionalized derivative of 13R-MO requiring regio- and stereospecific oxidation of five carbon positions.
  • Forskolin is a diterpene naturally produced by Coleus forskohlii.
  • Forskolin, oxidized variants of forskolin, and/or acetylated variants of forskolin have been suggested as useful in treatment of a number of clinical conditions.
  • Forskolin has been shown to decrease intraocular pressure and can be used as an antiglaucoma agent in the form of eye drops. See Wagh et al., 2012, J Postgrad Med. 58(3): 199-202.
  • a water-soluble analogue of forskolin (NKH477), which has been shown to have vasodilatory effects when administered intravenously, has been approved for commercial use in Japan for treatment of acute heart failure and heart surgery complications. See Kikura et al., 2004, Pharmacol Res 49:275-81.
  • Forskolin, which also acts as a bronchodilator, can be used for asthma treatments. See Yousif & Thulesius, 1999, J Pharm Pharmacol. 51(2):181-6.
  • forskolin may help to treat obesity by contributing to higher rates of body fat burning and promoting lean body mass formation. See Godard et al., 2005, Obes Res. 13:1335-43.
  • the invention provides a recombinant host cell capable of producing ferruginol, 13R-manoyl oxide (13R-MO), and/or a 13R-MO derivative, comprising a recombinant gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, having at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:22, and further having at least one amino acid substitution corresponding to residues 99, 100, 104, 207, 235, 236, 362, 366, 473, 474, 476, and/or 478 of SEQ ID NO:22.
  • the polypeptide comprises an A99I, A100V, G104D, V207T, S235G, Y236F, G362V, L366F, L366E, D473E, D474L, F476T, L478M, L478A, and/or L478I substitution corresponding to SEQ ID NO:22.
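The substitution codes above follow the usual convention of wild-type residue, position, then replacement residue, so A99I replaces the alanine at position 99 with isoleucine. A minimal Python sketch of that convention is given below. The function names and the toy sequence are hypothetical and chosen for illustration only; the toy sequence is not SEQ ID NO:22.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str   # one-letter code of the original residue
    position: int    # 1-based residue position
    mutant: str      # one-letter code of the replacement residue


_CODE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code: str) -> Substitution:
    """Split a code such as 'A99I' into its three parts."""
    match = _CODE.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def apply_substitutions(sequence: str, codes: list[str]) -> str:
    """Return a copy of `sequence` with every substitution applied.

    Raises ValueError if a position is out of range or the residue found
    there does not match the wild-type letter in the code.
    """
    residues = list(sequence)
    for code in codes:
        sub = parse_substitution(code)
        index = sub.position - 1  # convert to 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if residues[index] != sub.wild_type:
            raise ValueError(
                f"{code}: expected {sub.wild_type}, found {residues[index]}"
            )
        residues[index] = sub.mutant
    return "".join(residues)


if __name__ == "__main__":
    print(parse_substitution("A99I"))
    # Hypothetical four-residue sequence, used only to exercise the code.
    print(apply_substitutions("MAGL", ["A2I", "L4M"]))
```

Checking the wild-type letter before substituting guards against applying a code to the wrong reference sequence or numbering scheme.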
  • the polypeptide comprises:
  • the invention also provides a recombinant host cell capable of producing ferruginol, 13R-manoyl oxide (13R-MO), and/or a 13R-MO derivative, comprising a recombinant gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, having at least one amino acid substitution corresponding to residues 93-116; 202-209; 233-240; 286-304; 359-369 or 473-480 of SEQ ID NO:22.
  • the invention also provides a recombinant host cell capable of producing ferruginol, 13R-manoyl oxide (13R-MO), and/or a 13R-MO derivative, comprising a recombinant gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, having at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, or SEQ ID NO:73.
  • the 13R-MO derivative is an oxidized 13R-MO derivative.
  • the 13R-MO derivative is 11-oxo-13R-MO and/or 11β-hydroxy-13R-MO.
  • the 13R-MO derivative is forskolin.
  • the recombinant host cell further comprises:
  • (a) the polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate comprises a polypeptide having at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:16 or SEQ ID NO:17, or at least 40% sequence identity to the amino acid sequence set forth in SEQ ID NO:18;
  • (b) the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-position comprises a polypeptide having at least 55% sequence identity to the amino acid sequence set forth in SEQ ID NO:19, or at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:20, SEQ ID NO:21, or SEQ ID NO:23;
  • (c) the polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO comprises a polypeptide having at least 40% sequence identity to SEQ ID NO:26;
  • (d) the polypeptide capable of synthesizing GGPP from FPP and IPP comprises a polypeptide having at least 70% sequence identity to the amino acid sequence set forth in SEQ ID NO:32 or SEQ ID NO:37;
  • (e) the polypeptide capable of synthesizing DXS from pyruvate and D-glyceraldehyde 3-phosphate comprises a polypeptide having at least 75% sequence identity to the amino acid sequence set forth in SEQ ID NO:30;
  • (f) the polypeptide capable of reducing cytochrome P450 complex comprises a polypeptide having at least 75% sequence identity to the amino acid sequence set forth in SEQ ID NO:34; and/or
  • (g) the anti-post transcriptional suppressor protein polypeptide comprises a polypeptide having at least 65% sequence identity to the amino acid sequence set forth in SEQ ID NO:68.
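The thresholds above ("at least 50% sequence identity", "at least 75% sequence identity", and so on) can be illustrated with a small calculation. The sketch below assumes the two sequences have already been aligned by some external tool and counts identical columns over all columns that are not gaps in both sequences. That denominator is only one of several conventions in use, and this excerpt does not state which one the application relies on, so treat it as an assumption. The function names and the short example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored. A column with a
    gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


def meets_threshold(aligned_a: str, aligned_b: str, minimum: float) -> bool:
    """True if the pair reaches at least `minimum` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= minimum


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    query, reference = "MKT-LV", "MKTALI"
    print(round(percent_identity(query, reference), 1))
    print(meets_threshold(query, reference, 50.0))
```

In practice the alignment step, the scoring matrix, and the choice of denominator all affect the reported figure, so a value near a claimed threshold should be recomputed with whatever method the full specification defines.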
  • the recombinant host cell comprises a plant cell, a mammalian cell, an insect cell, a fungal cell, an algal cell or a bacterial cell.
  • the bacterial cell comprises Escherichia cells, Lactobacillus cells, Lactococcus cells, Corynebacterium cells, Acetobacter cells, Acinetobacter cells, or Pseudomonas cells.
  • the fungal cell comprises a yeast cell.
  • the yeast cell is a cell from Saccharomyces cerevisiae, Schizosaccharomyces pombe, Yarrowia lipolytica, Candida glabrata, Ashbya gossypii, Cyberlindnera jadinii, Pichia pastoris, Kluyveromyces lactis, Hansenula polymorpha, Candida boidinii, Arxula adeninivorans, Xanthophyllomyces dendrorhous, or Candida albicans species.
  • the yeast cell is a Saccharomycete.
  • the yeast cell is a Saccharomyces cerevisiae cell.
  • the plant cell is a Nicotiana benthamiana cell.
  • the invention also provides a method of producing 13R-MO and/or a 13R-MO derivative in a cell culture, comprising growing the recombinant host cell disclosed herein in the cell culture, under conditions in which the genes are expressed, and wherein 13R-MO and/or the 13R-MO derivative is produced by the recombinant host cell.
  • the recombinant host cell is grown in a fermentor at a temperature for a period of time, wherein the temperature and period of time facilitate the production of 13R-MO and/or the 13R-MO derivative.
  • the method disclosed herein further comprises a step of isolating 13R-MO and/or the 13R-MO derivative.
  • the 13R-MO derivative is 11-oxo-13R-MO, 11β-hydroxy-13R-MO or forskolin.
  • the invention also provides a 13R-MO derivative composition produced by the recombinant host cell or the method disclosed herein.
  • the 13R-MO derivative composition comprises 11-oxo-13R-MO, 11β-hydroxy-13R-MO, and/or forskolin.
  • Figure 1 shows the structure of 13R-MO ((3R,4aR,10aS)-3,4a,7,7,10a-pentamethyl-3-vinyldodecahydro-1H-benzo[f]chromene) and formulas for 13R-MO derivatives.
  • Figure 2A shows a hypothetical biosynthetic route to forskolin in C. forskohlii proposed by Asada et al., Phytochemistry 79 (2012) 141-146.
  • Figure 2B shows a reaction capable of being catalyzed by a terpene synthase, such as terpene synthase 2 (TPS2; SEQ ID NO:16).
  • TPS2: terpene synthase 2
  • conversion of geranylgeranyl diphosphate to (5S,8R,9R,10R)-labda-8-ol diphosphate is capable of being catalyzed by TPS2 of SEQ ID NO:16.
  • Figure 2C shows a reaction capable of being catalyzed by a terpene synthase, such as terpene synthase 3 (TPS3; SEQ ID NO:17) or terpene synthase 4 (TPS4; SEQ ID NO:18).
  • TPS3: terpene synthase 3
  • TPS4: terpene synthase 4
  • Figure 3 shows 13R-MO-derived oxygenated products produced by cytochrome P450 (CYP) 76AH8 (CYP76AH8; SEQ ID NO:20), CYP76AH17 (SEQ ID NO:23), CYP76AH15 (SEQ ID NO:22), CYP76AH11 (SEQ ID NO:21), and CYP76AH16 (SEQ ID NO:19).
  • CYP: cytochrome P450
  • Figure 4 shows diterpene biosynthetic pathways from geranylgeranyl diphosphate (GGPP) towards forskolin and ferruginol, compounds present in the root cork cells of C. forskohlii.
  • GGPP: geranylgeranyl diphosphate
  • Figure 5A shows alignments of Rattus norvegicus CYP2A1 (SEQ ID NO:39), C. forskohlii CYP76AH8 (SEQ ID NO:20), C. forskohlii CYP76AH15 (SEQ ID NO:22), Hyoscyamus muticus CYP71D55 (SEQ ID NO:40), and Thapsia villosa CYP71AJ6 (SEQ ID NO:41) for substrate recognition site (SRS) identification. Highlighted areas indicate identified SRS regions for R. norvegicus CYP2A1 by Gotoh, 1997, J. Biol. Chem. 1 (5):83-90, H.
  • Figure 5B shows SRS regions for CYP76AH15 (SEQ ID NO:22), CYP76AH8 (SEQ ID NO:20), CYP76AH17 (SEQ ID NO:23), CYP76AH11 (SEQ ID NO:21), and CYP76AH16 (SEQ ID NO:19). See Example 1.
  • Figure 5C shows homology model structures of SRS1-6 from CYP76AH15 and identifies the A99, S235, Y236, L366, and G362 residues.
  • Figure 5D shows the residues of CYP76AH15 selected for mutagenesis (squares).
  • Figure 6A shows GC-MS chromatograms of hexane-extracted tobacco leaf discs comprising native CYP76AH15 or CYP76AH15 SRS5 variants.
  • Figure 6B shows GC-MS chromatograms analyzing products produced by CYP76AH14 SRS1 and CYP76AH14 SRS1+SRS5 variants.
  • Figure 6C shows the structures of 13R-MO, 11-oxo-13R-MO, and 11β-hydroxy-13R-MO. See Example 2.
  • Figure 7A shows GC-MS chromatograms analyzing a control N. benthamiana plant, an N. benthamiana plant comprising native CYP76AH15, and N. benthamiana plants comprising CYP76AH15 SRS5 variants.
  • Figure 8A shows GC-MS chromatograms and variants producing miltiradiene, abietatriene, and/or ferruginol products from control and N. benthamiana plants expressing CYP76AH15 variants.
  • Figure 8B shows mass spectra and structures of produced miltiradiene, abietatriene, and ferruginol. See Example 2.
  • Figure 9A shows GC-MS chromatograms analyzing a control S. cerevisiae strain comprising C. forskohlii POR (SEQ ID NO:34), strains comprising native CYP76AH15 (SEQ ID NO:22), or strains comprising a CYP76AH15 variant.
  • Figure 9B shows relative yields of 13R-MO, 11-oxo-13R-MO, and 11β-hydroxy-13R-MO (measured as total ion chromatogram (TIC) area of compound of interest normalized to a standard).
  • Figure 9C shows fold changes (compared to native CYP76AH15) of the CYP76AH15 variants.
  • Figures 9D and 9E show GC-MS chromatograms of a control strain comprising C. forskohlii POR (SEQ ID NO:34) or strains comprising a CYP76AH15 variant. See Example 3.
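The two quantities named in the captions for Figures 9B and 9C can be written out explicitly: a relative yield is the TIC peak area of the compound divided by the peak area of a standard, and a fold change is the variant's relative yield divided by that of native CYP76AH15. The Python sketch below only illustrates that arithmetic. The function names and all peak areas are hypothetical and are not data from the application.

```python
def relative_yield(compound_area: float, standard_area: float) -> float:
    """TIC peak area of the compound normalized to the standard's area."""
    if standard_area <= 0:
        raise ValueError("standard peak area must be positive")
    return compound_area / standard_area


def fold_change(variant_yield: float, native_yield: float) -> float:
    """Relative yield of a variant divided by that of the native enzyme."""
    if native_yield <= 0:
        raise ValueError("native yield must be positive")
    return variant_yield / native_yield


if __name__ == "__main__":
    # Hypothetical peak areas (arbitrary units), for illustration only.
    standard = 1.0e6
    native = relative_yield(2.0e6, standard)
    variant = relative_yield(6.0e6, standard)
    print(native, variant, fold_change(variant, native))
```

Normalizing to a standard before taking the ratio keeps run-to-run differences in injection volume or detector response from being read as differences between enzyme variants.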
  • Figures 10A and 10B show GC-MS chromatograms showing miltiradiene, abietatriene, and/or ferruginol produced by a control S. cerevisiae strain, an S. cerevisiae strain comprising CYP76AH15 or a CYP76AH15 variant, or an S. cerevisiae strain comprising CYP76AH8. See Example 3.
  • Figure 11 shows GC-MS chromatograms of 13R-MO derivatives produced by a control S. cerevisiae strain or an S. cerevisiae strain comprising i) CYP76AH15 (SEQ ID NO:22) or CYP76AH15 A99I (SEQ ID NO:42) and ii) CYP76AH11 (SEQ ID NO:21) or CYP76AH17 (SEQ ID NO:23).
  • Figure 12A shows a biosynthetic pathway towards ferruginol in Salvia fructicosa (Sf) species and Rosmarinus officinalis (Ro).
  • Class II+I diterpene synthases (diTPS) produce miltiradiene from GGPP, which can undergo spontaneous oxidation into abietatriene, which can further be converted to ferruginol and 11-hydroxy-ferruginol with a ferruginol synthase from S. fructicosa or R. officinalis and/or R. officinalis CYP76AH4.
  • Figure 12B shows diterpene products produced using CYP76AH enzymes.
  • Figure 13A shows GC-MS chromatograms analyzing products produced by 13R-MO-producing N. benthamiana further expressing C. forskohlii CYP76AH15 (SEQ ID NO:22), R. officinalis CYP76AH4, R. officinalis FS1, or S. fructicosa FS.
  • Figure 13B shows GC-MS chromatograms analyzing products produced by 13R-MO-producing N. benthamiana further expressing C. forskohlii CYP76AH8 (SEQ ID NO:20) or R. officinalis CYP76AH6.
  • Figure 14 shows the relative yields of abietatriene, miltiradiene, and ferruginol (shown from left to right, for each strain) produced by S. cerevisiae strains expressing CYP76AH15 or a CYP76AH15 variant. See Example 6.
  • Figure 15 shows the production titers of 13R-MO (diamonds), 11-oxo-13R-MO (dark squares), 11β-hydroxy-13R-MO (triangles), and C20H32O3 (light squares, WT CYP76AH15 only) of S. cerevisiae strains expressing CYP76AH15 or a CYP76AH15 variant, over 72 hours. See Example 8.
  • reference to "a nucleic acid" means one or more nucleic acids.
  • Methods well known to those skilled in the art can be used to construct genetic expression constructs and recombinant cells according to this invention. These methods include in vitro recombinant DNA techniques, synthetic techniques, in vivo recombination techniques, and polymerase chain reaction (PCR) techniques.
  • PCR: polymerase chain reaction
  • nucleic acid can be used interchangeably to refer to nucleic acid comprising DNA, RNA, derivatives thereof, or combinations thereof, in either single-stranded or double-stranded embodiments depending on context as understood by the skilled worker.
  • the terms "microorganism," "microorganism host," and "microorganism host cell" can be used interchangeably.
  • the terms "recombinant host" and "recombinant host cell" can be used interchangeably.
  • the person of ordinary skill in the art will appreciate that the terms "microorganism," "microorganism host," and "microorganism host cell," when used to describe a cell comprising a recombinant gene, may be taken to mean "recombinant host" or "recombinant host cell."
  • the term "recombinant host" is intended to refer to a host, the genome of which has been augmented by at least one DNA sequence.
  • DNA sequences include but are not limited to genes that are not naturally present, DNA sequences that are not normally transcribed into RNA or translated into a protein ("expressed"), and other genes or DNA sequences which one desires to introduce into a host. It will be appreciated that typically the genome of a recombinant host described herein is augmented through stable introduction of one or more recombinant genes. Generally, introduced DNA is not originally resident in the host that is the recipient of the DNA, but it is within the scope of this disclosure to isolate a DNA segment from a given host, and to subsequently introduce one or more additional copies of that DNA into the same host, e.g., to enhance production of the product of a gene or alter the expression pattern of a gene. In some instances, the introduced DNA will modify or even replace an endogenous gene or DNA sequence by, e.g., homologous recombination or site-directed mutagenesis. Suitable recombinant hosts include microorganisms.
  • the term "recombinant gene" refers to a gene or DNA sequence that is introduced into a recipient host, regardless of whether the same or a similar gene or DNA sequence may already be present in such a host. "Introduced" or "augmented" in this context is known in the art to mean introduced or augmented by the hand of man.
  • a recombinant gene can be a DNA sequence from another species or can be a DNA sequence that originated from or is present in the same species but has been incorporated into a host by recombinant methods to form a recombinant host.
  • a recombinant gene that is introduced into a host can be identical to a DNA sequence that is normally present in the host being transformed, and is introduced to provide one or more additional copies of the DNA to thereby permit overexpression or modified expression of the gene product of that DNA.
  • said recombinant genes are encoded by cDNA.
  • recombinant genes are synthetic and/or codon-optimized for expression in S. cerevisiae.
  • the term "engineered biosynthetic pathway" refers to a biosynthetic pathway that occurs in a recombinant host, as described herein. In some aspects, one or more steps of the biosynthetic pathway do not naturally occur in an unmodified host. In some embodiments, a heterologous version of a gene is introduced into a host that comprises an endogenous version of the gene.
  • the term "endogenous" gene refers to a gene that originates from and is produced or synthesized within a particular organism, tissue, or cell.
  • the endogenous gene is a yeast gene.
  • the gene is endogenous to S. cerevisiae, including, but not limited to S. cerevisiae strain S288C.
  • an endogenous yeast gene is overexpressed.
  • the term "overexpress" is used to refer to the expression of a gene in an organism at levels higher than the level of gene expression in a wild type organism. See, e.g., Prelich, 2012, Genetics 190:841-54.
  • deletion can be used interchangeably to refer to an endogenous gene that has been manipulated to no longer be expressed in an organism, including, but not limited to, S. cerevisiae.
  • As used herein, the terms "heterologous sequence," "heterologous coding sequence," and "heterologous gene" are used to describe a sequence derived from a species other than the recombinant host.
  • the recombinant host is an S. cerevisiae cell
  • a heterologous sequence is derived from an organism other than S. cerevisiae.
  • a heterologous coding sequence, for example, can be from a prokaryotic microorganism, a eukaryotic microorganism, a plant, an animal, an insect, or a fungus different than the recombinant host expressing the heterologous sequence.
  • a heterologous nucleic acid may be introduced into a host organism by recombinant methods.
  • the genome of the host organism can be augmented by at least one incorporated heterologous nucleic acid sequence.
  • a coding sequence is a sequence that is native to the host.
  • a "selectable marker” can be one of any number of genes that complement host cell auxotrophy, provide antibiotic resistance, or result in a color change.
  • Linearized DNA fragments of the gene replacement vector then are introduced into the cells using methods well known in the art (see below). Integration of the linear fragments into the genome and the disruption of the gene can be determined based on the selection marker and can be verified by, for example, PCR or Southern blot analysis. Subsequent to its use in selection, a selectable marker can be removed from the genome of the host cell by, e.g., Cre-LoxP systems (see, e.g., Gossen et al., 2002, Ann. Rev. Genetics 36:153-173 and U.S. 2006/0014264).
  • a gene replacement vector can be constructed in such a way as to include a portion of the gene to be disrupted, where the portion is devoid of any endogenous gene promoter sequence and encodes none, or an inactive fragment of, the coding sequence of the gene.
  • the terms "variant" and "mutant" are used to describe a protein sequence that has been modified at one or more amino acids, compared to the wild-type sequence of a particular protein.
  • a nucleic acid sequence encoding a polypeptide can include a tag sequence that encodes a "tag" designed to facilitate subsequent manipulation (e.g., to facilitate purification or detection), secretion, or localization of the encoded polypeptide.
  • Tag sequences can be inserted in the nucleic acid sequence encoding the polypeptide such that the encoded tag is located at either the carboxyl or amino terminus of the polypeptide.
  • Non-limiting examples of encoded tags include green fluorescent protein (GFP), human influenza hemagglutinin (HA), glutathione S transferase (GST), polyhistidine-tag (HIS tag), and Flag™ tag (Kodak, New Haven, CT).
  • GFP: green fluorescent protein
  • HA: human influenza hemagglutinin
  • GST: glutathione S transferase
  • HIS tag: polyhistidine-tag
  • Other examples of tags include a chloroplast transit peptide, a mitochondrial transit peptide, an amyloplast peptide, signal peptide, or a secretion tag.
  • a fusion protein is a protein altered by domain swapping.
  • the term "domain swapping" is used to describe the process of replacing a domain of a first protein with a domain of a second protein.
  • the domain of the first protein and the domain of the second protein are functionally identical or functionally similar.
  • the structure and/or sequence of the domain of the second protein differs from the structure and/or sequence of the domain of the first protein.
  • the term "inactive fragment" is a fragment of the gene that encodes a protein having, e.g., less than about 10% (e.g., less than about 9%, less than about 8%, less than about 7%, less than about 6%, less than about 5%, less than about 4%, less than about 3%, less than about 2%, less than about 1%, or 0%) of the activity of the protein produced from the full-length coding sequence of the gene.
  • Such a portion of a gene is inserted in a vector in such a way that no known promoter sequence is operably linked to the gene sequence, but that a stop codon and a transcription termination sequence are operably linked to the portion of the gene sequence.
  • This vector can be subsequently linearized in the portion of the gene sequence and transformed into a cell. By way of single homologous recombination, this linearized vector is then integrated in the endogenous counterpart of the gene with inactivation thereof.
  • the terms "detectable amount," "detectable concentration," "measurable amount," and "measurable concentration" refer to a level of 13R-MO and/or 13R-MO derivative measured in AUC, μM/OD600, mg/L, μM, or mM.
  • 13R-MO and/or 13R-MO derivatives (i.e., total, supernatant, and/or intracellular levels) can be detected and/or analyzed by techniques generally available to one skilled in the art, for example, but not limited to, liquid chromatography-mass spectrometry (LC-MS), thin layer chromatography (TLC), high-performance liquid chromatography (HPLC), ultraviolet-visible spectroscopy/spectrophotometry (UV-Vis), mass spectrometry (MS), and nuclear magnetic resonance spectroscopy (NMR).
  • LC-MS: liquid chromatography-mass spectrometry
  • TLC: thin layer chromatography
  • HPLC: high-performance liquid chromatography
  • UV-Vis: ultraviolet-visible spectroscopy/spectrophotometry
  • MS: mass spectrometry
  • NMR: nuclear magnetic resonance spectroscopy
  • the term "undetectable concentration” refers to a level of a compound that is too low to be measured and/or analyzed by techniques such as TLC, HPLC, UV-Vis, MS, or NMR. In some embodiments, a compound of an "undetectable concentration" is not present in a 13R-MO and/or 13R-MO derivative composition.
  • the term "contact” is used to refer to any physical interaction between two objects.
  • the term "contact" may refer to the interaction between an enzyme and a substrate.
  • the term “contact” may refer to the interaction between a liquid (e.g., a supernatant) and an adsorbent resin.
  • 13R-MO and/or 13R-MO derivatives can be isolated using a method described herein. For example, following fermentation, a culture broth can be centrifuged for 30 min at 7000 rpm at 4°C to remove cells, or cells can be removed by filtration. The cell-free lysate can be obtained, for example, by mechanical disruption or enzymatic disruption of the host cells and additional centrifugation to remove cell debris. Mechanical disruption of the dried broth materials can also be performed, such as by sonication. The dissolved or suspended broth materials can be filtered using a micron or sub-micron filter prior to further purification, such as by preparative chromatography.
  • the fermentation media or cell-free lysate can optionally be treated to remove low molecular weight compounds such as salt; and can optionally be dried prior to purification and re-dissolved in a mixture of water and solvent.
  • the supernatant or cell-free lysate can be purified as follows: a column can be filled with, for example, HP20 Diaion resin (aromatic type Synthetic Adsorbent; Supelco) or other suitable non-polar adsorbent or reverse phase chromatography resin, and an aliquot of supernatant or cell-free lysate can be loaded on to the column and washed with water to remove the hydrophilic components.
  • the 13R-MO and/or 13R-MO derivative product can be eluted by stepwise incremental increases in the solvent concentration in water or by a gradient.
  • the levels of 13R-MO and/or 13R-MO derivatives in each fraction, including the flow-through, can then be analyzed by LC-MS. Fractions can then be combined and reduced in volume using a vacuum evaporator. Additional purification steps can be utilized, if desired, such as additional chromatography steps and crystallization.
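The centrifugation step in the isolation procedure above is stated in rpm, which only fixes the applied force once a rotor radius is known. The following is a minimal sketch of the standard rpm-to-RCF conversion; the 10 cm rotor radius is a hypothetical value chosen for illustration and is not given in the text.

```python
def rcf_from_rpm(rpm: float, rotor_radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a rotor speed and radius.

    Uses the standard relation RCF = 1.118e-5 * r_cm * rpm**2.
    """
    return 1.118e-5 * rotor_radius_cm * rpm ** 2


# Hypothetical rotor radius; the source only specifies 7000 rpm.
ASSUMED_ROTOR_RADIUS_CM = 10.0

if __name__ == "__main__":
    rcf = rcf_from_rpm(7000, ASSUMED_ROTOR_RADIUS_CM)
    print(f"7000 rpm at {ASSUMED_ROTOR_RADIUS_CM} cm is about {rcf:.0f} x g")
```

With a different rotor the same 7000 rpm setting gives a proportionally different force, so the radius should be taken from the actual instrument.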
  • x, y, and/or z can refer to “x” alone, “y” alone, “z” alone, “x, y, and z,” “(x and y) or z,” “x or (y and z),” or “x or y or z.”
  • "and/or” is used to refer to the exogenous nucleic acids that a recombinant cell comprises, wherein a recombinant cell comprises one or more exogenous nucleic acids selected from a group.
  • “and/or” is used to refer to production of 13R-MO and/or 13R-MO derivatives. In some embodiments, “and/or” is used to refer to production of 13R-MO and/or 13R-MO derivatives, wherein 13R-MO and/or 13R-MO derivatives are produced through the following steps: culturing a recombinant microorganism, synthesizing 13R-MO and/or 13R-MO derivatives in a recombinant microorganism, and/or isolating 13R-MO and/or 13R-MO derivatives.
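One way to read the "x, y, and/or z" definition above is that any non-empty selection of the listed items is covered. The sketch below enumerates those selections under that assumption; the function name is illustrative only.

```python
from itertools import combinations


def and_or_selections(items):
    """Return every non-empty selection of items, smallest selections first."""
    items = list(items)
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


if __name__ == "__main__":
    for selection in and_or_selections(["x", "y", "z"]):
        print(" and ".join(selection))
```

For three items this yields seven selections, from each item alone up to all three together.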
  • a diterpene is used to refer to a compound derived or prepared from four isoprene units.
  • a diterpene according to the invention is a C20-molecule comprising 20 carbon atoms.
  • a diterpene typically comprises one or more ring structures, such as one or more monocyclic, bicyclic, tricyclic, or tetracyclic ring structure(s).
  • the diterpene can comprise one or more double bonds.
  • the diterpene can comprise up to three oxygen atoms, wherein the oxygen atom is generally present in the form of hydroxyl groups or part of a ring structure.
  • substituted with a moiety refers to hydrogen group(s) being substituted with the moiety.
  • Alkyl refers to a saturated, straight, or branched hydrocarbon chain. The hydrocarbon chain preferably comprises from one to eighteen carbon atoms (C1-18-alkyl), such as from one to six carbon atoms (C1-6-alkyl), including methyl, ethyl, propyl, isopropyl, butyl, isobutyl, secondary butyl, tertiary butyl, pentyl, isopentyl, neopentyl, tertiary pentyl, hexyl, and isohexyl.
  • alkyl represents a C-
  • hydroxyl as used herein refers to an "-OH” substituent.
  • acetylated refers to the presence of an acetyl (CH3CO) group.
  • 13R-MO refers to 13R-manoyl oxide, the structure of which is provided in Figure 1.
  • the structure also provides the numbering of the carbon atoms of the ring structure used herein.
  • oxidized 13R-MO includes "hydroxylated 13R-MO.”
  • acetylated 13R-MO refers to 13R-MO substituted with at least one acetyl group.
  • the term "derivative” is used to refer to a compound produced from or capable of being produced (e.g., derived) from a similar compound.
  • Non-limiting examples of 13R-MO derivatives include acetylated 13R-MO compounds, oxidized 13R-MO compounds, and acetylated oxidized 13R-MO compounds.
  • 13R-MO derivatives include 11-oxo-(13R)-MO, forskolin, iso-forskolin, forskolin B, forskolin D, 9-deoxyforskolin, 1,9-dideoxyforskolin, and coleoforskolin. Additional 13R-MO derivatives are shown in Figure 2.
  • forskolin is a complex functionalized derivative of 13R-MO requiring regio- and stereospecific oxidation of five carbon positions: one double-oxidation leading to a ketone and four single oxidation reactions yielding hydroxyl groups.
  • the results presented herein show identification of diterpene synthases, cytochrome P450 mono-oxygenases, and acetyltransferases, which when co-expressed, result in production of forskolin.
  • a host cell disclosed herein can comprise a diterpene synthase.
  • the diterpene synthase (diTPS or TPS) can be from class II or class I, and in particular, be capable of converting geranylgeranyl diphosphate (GGPP) to (5S,8R,9R,10R)-labda-8-ol diphosphate and/or be capable of converting (5S,8R,9R,10R)-labda-8-ol diphosphate to 13R-MO.
  • 13R-MO is capable of being produced in a host cell comprising a gene encoding a terpene synthase polypeptide.
  • a diTPS of class II is an enzyme capable of catalyzing protonation-initiated cationic cycloisomerization of GGPP to form a diterpene pyrophosphate intermediate.
  • the class II diTPS reaction can be terminated either by deprotonation or by water capture of the diphosphate carbocation.
  • the diTPS of class II may in particular comprise the following motif of four amino acids: D/E-X-D-D, wherein X can be any amino acid, such as any naturally occurring amino acids.
  • X can be an amino acid with a hydrophobic side chain, and thus, X can be A, I, L, M, F, W, Y, or V.
  • Even more preferably, X is an amino acid with a small hydrophobic side chain, and thus X can be A, I, L, or V.
  • the host organism comprises a gene encoding a TPS2 polypeptide.
  • TPS2 catalyzes the reaction shown in Figure 2B, wherein -OPP refers to diphosphate.
  • the TPS2 is TPS2 of C. forskohlii.
  • the TPS2 can be a polypeptide of SEQ ID NO:16 or a functional homolog thereof sharing at least 50% sequence identity therewith.
  • TPS2 of SEQ ID NO: 16 can be encoded by the nucleotide sequence set forth in SEQ ID NO:35. See Examples 2 and 3.
  • a diTPS of class I is an enzyme capable of catalyzing cleavage of the diphosphate group of the diterpene pyrophosphate intermediate and additionally preferably also is capable of catalyzing cyclization and/or rearrangement reactions on the resulting carbocation.
  • deprotonation or water capture may terminate the class I diTPS reaction leading to hydroxylation of the diterpene pyrophosphate intermediate.
  • a diTPS of class I may comprise the following motif of five amino acids: D-D-X-X- D/E, wherein X can be any amino acid, such as any naturally occurring amino acids.
  • X can be an amino acid with a hydrophobic side chain, and thus X can for example be A, I, L, M, F, W, Y, or V.
  • X is an amino acid with a small hydrophobic side chain, and thus X can be A, I, L, or V.
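The class II (D/E-X-D-D) and class I (D-D-X-X-D/E) motifs described above translate directly into regular expressions over one-letter amino acid codes. The sketch below is illustrative only: the two toy sequences are invented for the example and are not taken from any SEQ ID NO, and a motif hit alone is not evidence of enzymatic activity.

```python
import re

# X = any amino acid
CLASS_II_MOTIF = re.compile(r"[DE].DD")      # D/E-X-D-D
CLASS_I_MOTIF = re.compile(r"DD..[DE]")      # D-D-X-X-D/E

# Narrower variants where X is restricted to hydrophobic residues,
# following the residue lists given in the text.
HYDROPHOBIC = "AILMFWYV"
CLASS_II_HYDROPHOBIC = re.compile(rf"[DE][{HYDROPHOBIC}]DD")
CLASS_I_HYDROPHOBIC = re.compile(rf"DD[{HYDROPHOBIC}]{{2}}[DE]")


def classify_ditps(sequence):
    """Return the set of diTPS classes whose motif occurs in the sequence."""
    classes = set()
    if CLASS_II_MOTIF.search(sequence):
        classes.add("class II")
    if CLASS_I_MOTIF.search(sequence):
        classes.add("class I")
    return classes


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    print(classify_ditps("MKTDIDDTAM"))
    print(classify_ditps("MSLDDIVDKR"))
```

A bifunctional enzyme carrying both motifs would be reported with both classes.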
  • the host organism comprises a gene encoding a TPS3 polypeptide and/or a gene encoding a TPS4 polypeptide.
  • the TPS3 or TPS4 is an enzyme capable of catalyzing the reaction shown in Figure 2C.
  • the TPS3 is TPS3 of C. forskohlii.
  • the TPS3 can be a polypeptide of SEQ ID NO: 17 or a functional homolog thereof sharing at least 50% sequence identity therewith.
  • TPS3 of SEQ ID NO: 17 can be encoded by the nucleotide sequence set forth in SEQ ID NO:36.
  • the TPS4 is TPS4 of C. forskohlii.
  • the TPS4 can be a polypeptide of SEQ ID NO: 18 or a functional homolog thereof sharing at least 40% sequence identity therewith.
  • a host comprises a gene encoding a TPS1 polypeptide.
  • the TPS1 polypeptide can be a C. forskohlii TPS1 polypeptide, i.e. TPS1 of SEQ ID NO:65. See Example 2.
  • a host cell disclosed herein can comprise a nucleic acid encoding an enzyme capable of catalyzing oxidation of 13R-MO.
  • the enzyme capable of catalyzing oxidation of 13R-MO is a cytochrome P450 (CYP) polypeptide.
  • CYPs according to the present invention are enzymes capable of catalyzing oxidation reactions using NAD(P)H as electron donor.
  • Preferred CYPs according to the present invention are hemoproteins capable of catalyzing oxidation reactions that utilize NADPH and/or NADH to reductively cleave atmospheric dioxygen to produce a functionalized organic substrate and a molecule of water.
  • a host cell comprising a gene encoding a diterpene synthase polypeptide and genes encoding a CYP polypeptide is capable of producing oxidized 13R-MO.
  • CYPs are encoded by a gene superfamily, which is divided into families sharing at least 40% sequence identity. The families are divided into subfamilies sharing at least 55% sequence identity. The CYP families have a number, which generally is written after "CYP.” Thus, by way of example, CYPs of family 74 are named CYP74. The subfamilies are indicated by a capital letter after the family number. Thus by way of example a CYP of family 74 and subfamily A is named CYP74A. Additional description of CYPs, the structural characteristics and the nomenclature thereof may for example be found in Schuler et al., Annu Rev. Plant Biol., 54:629-67 (2003) and in Podust et al., Nat.
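The naming scheme above (family number, subfamily letters, optional member number) and the 40% and 55% identity thresholds can be sketched as follows. The parser and the grouping helper are illustrative names; real CYP assignments are curated and do not follow from an identity value alone.

```python
import re

CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)?$")


def parse_cyp_name(name):
    """Split a name such as 'CYP76AH15' into family, subfamily, and member."""
    match = CYP_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised CYP name: {name!r}")
    family, subfamily, member = match.groups()
    return {
        "family": int(family),
        "subfamily": subfamily,
        "member": int(member) if member else None,
    }


def expected_grouping(percent_identity):
    """Grouping suggested by the identity thresholds stated in the text."""
    if percent_identity >= 55:
        return "same subfamily"
    if percent_identity >= 40:
        return "same family"
    return "different family"


if __name__ == "__main__":
    print(parse_cyp_name("CYP76AH15"))
    print(parse_cyp_name("CYP74A"))
    print(expected_grouping(88))
```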
  • the CYP to be used with the present invention can be a CYP as described in Schuler et al or Podust et al.
  • the CYP may comprise the following motif of five amino acids: A/G-G-X-X-T/S, wherein X can be any amino acid, such as any naturally occurring amino acids.
  • one of the X amino acids can be an amino acid with a charged side chain, and in particular an acidic side chain, such as E.
  • A/G indicates that the amino acid can be A or G.
  • T/S indicates that the amino acid can be T or S.
  • the CYP can also comprise the following motif of 4 amino acids: E-X-X-R, wherein X can be any amino acid, such as any naturally occurring amino acids.
  • X can be an amino acid with an uncharged side chain, such as a hydrophobic side chain.
  • the CYP can comprise the following motif of 10 amino acids: F-X-X-G-X-X-X-C-X-G (SEQ ID NO:69), wherein X can be any amino acid, such as any naturally occurring amino acid.
  • the CYP can comprise the following motif of 3 amino acids: P-F-G.
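The four CYP motifs listed above can be scanned for with simple regular expressions. This is a minimal sketch; the toy sequence is invented so that each motif occurs once and does not correspond to any sequence in the sequence listing.

```python
import re

# Motif label (as written in the text) -> pattern; '.' stands for X (any residue).
CYP_MOTIFS = {
    "A/G-G-X-X-T/S": re.compile(r"[AG]G..[TS]"),
    "E-X-X-R": re.compile(r"E..R"),
    "F-X-X-G-X-X-X-C-X-G": re.compile(r"F..G...C.G"),
    "P-F-G": re.compile(r"PFG"),
}


def find_cyp_motifs(sequence):
    """Return 0-based start positions of non-overlapping hits for each motif."""
    return {
        label: [match.start() for match in pattern.finditer(sequence)]
        for label, pattern in CYP_MOTIFS.items()
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    toy = "AGGETMETLRPFGFGAGRRICPG"
    for label, starts in find_cyp_motifs(toy).items():
        print(label, starts)
```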
  • the CYP is an enzyme capable of catalyzing the following reactions: a) conversion of 13R-MO to hydroxyl-13R-MO; b) conversion of hydroxyl-13R-MO to dihydroxy-13R-MO; c) conversion of hydroxyl-13R-MO to 13R-MO ketone; and/or d) conversion of hydroxyl-13R-MO to 13R-MO aldehyde.
  • a host organism comprises a gene encoding an enzyme capable of catalyzing oxidation of 13R-MO and/or of oxidized 13R-MO.
  • the CYP may preferably be an enzyme capable of catalyzing oxidation of 13R-MO and/or of oxidized 13R-MO.
  • a host organism comprises: a) a gene encoding a CYP polypeptide capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 1 position; b) a gene encoding a CYP polypeptide capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 6 position; c) a gene encoding a CYP polypeptide capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 7 position; d) a gene encoding a CYP polypeptide capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 9 position; and/or e) a gene encoding a CYP polypeptide capable of catalyzing oxidation of 13R-MO and/or of oxidized 13R-MO at the 11 position to a ketone.
  • a host organism comprises a gene encoding CYP76AH16.
  • the CYP76AH16 may in particular be CYP76AH16 of SEQ ID NO:19 or a functional homolog thereof sharing at least 55% sequence identity therewith.
  • a functional homolog of CYP76AH16 is a polypeptide sharing above-mentioned sequence identity with CYP76AH16 and which also is capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 9 position. See Examples 1 and 5.
  • a host organism comprises a gene encoding CYP76AH8.
  • the CYP76AH8 may in particular be CYP76AH8 of SEQ ID NO:20 or a functional homolog thereof sharing at least 50% sequence identity therewith. See Examples 1, 3, and 5.
  • a host organism comprises a gene encoding CYP76AH15.
  • the CYP76AH15 may in particular be CYP76AH15 of SEQ ID NO:22 or a functional homolog thereof sharing at least 50% sequence identity therewith.
  • CYP76AH15 catalyzes conversion of 13R-MO to 11-oxo-13R-MO. See Examples 1-3.
  • a host organism comprises a gene encoding CYP76AH17.
  • the CYP76AH17 may in particular be CYP76AH17 of SEQ ID NO:23 or a functional homolog thereof sharing at least 50% sequence identity therewith. See Example 1.
  • a host organism comprises a gene encoding CYP76AH11.
  • the CYP76AH11 may in particular be CYP76AH11 of SEQ ID NO:21 or a functional homolog thereof sharing at least 50% sequence identity therewith. See Examples 1 and 4.
  • a host organism comprises a gene encoding R. officinalis CYP76AH4 (SEQ ID NO:71), R. officinalis FS1 (SEQ ID NO:70), S. fructicosa FS (SEQ ID NO:73), R. officinalis CYP76AH6 (SEQ ID NO:72), and/or a functional homolog thereof sharing at least 50% sequence identity therewith. See Example 5.
  • a functional homolog of CYP76AH8, CYP76AH15, CYP76AH17, or CYP76AH11 is a polypeptide sharing above-mentioned sequence identity with CYP76AH8, CYP76AH15, CYP76AH17, or CYP76AH11 and which also is capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 1, 6, or 7 position or oxidation of 13R-MO at the 11 position.
  • the CYP76AH enzymes carry out ketonation at C-11 (CYP76AH15) and hydroxylations at C-6, C-7, C-1 (CYP76AH11) and C-9 (CYP76AH16) to produce deacetylforskolin.
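The division of labour among the CYP76AH enzymes described above can be written down as a small lookup table, which makes it easy to see which positions remain unmodified for a given enzyme set. This is a bookkeeping sketch only; the data structure and function names are illustrative, and it says nothing about reaction order or efficiency.

```python
# Carbon position -> functional group introduced, per enzyme, as stated in the text.
CYP_ACTIVITIES = {
    "CYP76AH15": {11: "ketone"},
    "CYP76AH11": {1: "hydroxyl", 6: "hydroxyl", 7: "hydroxyl"},
    "CYP76AH16": {9: "hydroxyl"},
}

# Oxidation pattern of deacetylforskolin implied by the same sentence.
DEACETYLFORSKOLIN_PATTERN = {
    1: "hydroxyl",
    6: "hydroxyl",
    7: "hydroxyl",
    9: "hydroxyl",
    11: "ketone",
}


def combined_oxidations(enzymes):
    """Merge the position-to-group maps of the given enzymes."""
    pattern = {}
    for enzyme in enzymes:
        pattern.update(CYP_ACTIVITIES[enzyme])
    return pattern


def missing_positions(enzymes):
    """Positions of the deacetylforskolin pattern not covered by the enzyme set."""
    achieved = combined_oxidations(enzymes)
    return sorted(
        position
        for position, group in DEACETYLFORSKOLIN_PATTERN.items()
        if achieved.get(position) != group
    )


if __name__ == "__main__":
    print(missing_positions(["CYP76AH15", "CYP76AH11"]))
    print(missing_positions(["CYP76AH15", "CYP76AH11", "CYP76AH16"]))
```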
  • cytochrome P450 enzymes have at least six specific regions known as the substrate recognition sites (SRS, i.e. SRS1-6) that are important for the activity of CYPs.
  • alterations to SRS sites affect product production and/or substrate specificity.
  • one or more SRS sites are altered to increase in vivo formation of products in yeast.
  • Positions of SRS1-6 for CYP76AH15 are shown in Figure 5B.
  • the S235 and Y236 residues lie in a region potentially in contact with the ER membrane.
  • the A99 residue points towards SRS3.
  • SRS2 and SRS3 can be part of the substrate entrance.
  • the L366 and G362 residues can be essential for P450 function.
  • SRS regions were identified in the forskolin related CYP76AH enzymes (CYP76AH8, CYP76AH11, CYP76AH15, CYP76AH16 and CYP76AH17) by alignments and comparisons of reported SRS regions in the rat CYP2A1 (Gotoh, 1992), Hyoscyamus muticus CYP71D55 (Takahashi et al., 2007) and Thapsia villosa CYP71AJ6 (Dueholm et al., 2015). Comparative homology modeling was furthermore utilized to determine and visualize SRS regions (Figures 5C and D).
  • CYP76AH11, CYP76AH15 and CYP76AH16 were determined to contain a total of 78 residues in putative SRS regions whereas CYP76AH8 and CYP76AH17 contained 77 residues due to a deletion in the SRS6.
  • CYP76AH8, CYP76AH15 and CYP76AH17 carry out similar reactions but differ in their product patterns; their SRS regions were therefore compared to identify the similarities and differences in these regions (Table 3; Figures 5C and D).
  • CYP76AH8 and CYP76AH17 share similar product patterns with (13R)-manoyl oxide and a total sequence identity of 88%, whereas the sequence identity in the SRS regions was found to be 99% with a single conservative amino acid substitution in SRS1 (A117S in CYP76AH17), suggesting a high sequence conservation in the SRS regions.
  • Differences in the SRS1-6 regions between CYP76AH15 and CYP76AH8/CYP76AH17 were mainly in the SRS1, SRS3 and SRS6 whereas the SRS5 region was conserved between all three enzymes.
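The SRS comparison above rests on percent identity over aligned positions. The sketch below shows one common way to compute it, assuming identity is counted over positions where neither sequence has a gap (other conventions exist). The toy alignment is invented: it only mimics 77 aligned SRS residues with a single substitution and is not the real CYP76AH8/CYP76AH17 alignment.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned, gap-free positions of two equal-length strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free aligned positions")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Hypothetical stand-in for 77 aligned SRS residues differing at one position.
    srs_a = "A" + "L" * 76
    srs_b = "S" + "L" * 76
    print(f"{percent_identity(srs_a, srs_b):.1f}%")
```

Under this convention, one substitution in 77 residues gives roughly 98.7%, which rounds to the 99% figure quoted in the text.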
  • CYP76AH15 variants are expressed in an N. benthamiana host.
  • the N. benthamiana host can further comprise a terpene synthase, such as TPS2 (SEQ ID NO:16) or TPS3 (SEQ ID NO:17), anti-post transcriptional suppressor protein P19 (SEQ ID NO:68), a 1-deoxy-D-xylulose 5-phosphate synthase (DXS) such as C. forskohlii DXS (SEQ ID NO:30), and/or a geranylgeranyl diphosphate synthase (GGPPS) such as C. forskohlii GGPPS (SEQ ID NO:32).
  • expression of CYP76AH15 A99I (SEQ ID NO:42), CYP76AH15 L366E, or CYP76AH15 A99I L366F (SEQ ID NO:58) in N. benthamiana results in accumulation of 11β-hydroxy-13R-MO and 11-oxo-13R-MO.
  • expression of CYP76AH15 L366F (SEQ ID NO:50) in N. benthamiana results in accumulation of 11-oxo-13R-MO.
  • expression of CYP76AH15 G362V L366F (SEQ ID NO:51) in N. benthamiana results in accumulation of 11β-hydroxy-13R-MO.
  • CYP76AH15 variants are expressed in an S. cerevisiae host.
  • the S. cerevisiae host can further comprise a cytochrome P450 reductase such as C. forskohlii POR (SEQ ID NO:34), a terpene synthase such as C. forskohlii TPS2 (SEQ ID NO: 16) and/or C. forskohlii TPS3 (SEQ ID NO:17), and/or a GGPPS such as Synechococcus sp. GGPPS (SEQ ID NO:37). See Example 3 and Figure 9.
  • CYP76AH15 variants can increase in vivo accumulation of 11-oxo-13R-MO by several fold in S. cerevisiae.
  • mutating amino acids corresponding to SRS1 (i.e., A99I), SRS3 (i.e., S235G Y236F), and/or SRS5 (i.e., L366F, L366E) of SEQ ID NO:22 can increase accumulation of 11-oxo-13R-MO by over two-fold compared with native and codon-optimized CYP76AH15.
  • expression of CYP76AH15 A99I (SEQ ID NO:42) can result in accumulation of 5.6-fold higher levels of 11-oxo-13R-MO, compared to expression of native CYP76AH15 (SEQ ID NO:22). See Example 3, Figure 9, and Table 5.
  • mutations in SRS regions can be combined to further increase CYP76AH15 activity, specifically when combining SRS1+SRS3 and SRS1+SRS5.
  • Expression of CYP76AH15 A99I S235G Y236F (SEQ ID NO:62) can result in a 6.5-fold increase in 11-oxo-13R-MO accumulation, while CYP76AH15 A99I L366F (SEQ ID NO:58) can increase 11-oxo-13R-MO levels 6.2-fold. See Example 3, Figure 9, and Table 5.
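The variant results above are given as fold values, while increases elsewhere in this description are stated in percent. The conversion is simple but easy to get wrong by one unit, so a short sketch follows. It assumes each fold value is the ratio of variant level to native level (the reading used for the A99I variant); if a figure instead means "increased by N-fold on top of the native level", the percentage would be 100 points higher.

```python
def fold_to_percent_increase(fold):
    """Percent increase corresponding to a variant/native ratio."""
    return (fold - 1.0) * 100.0


# Fold values quoted in the text for 11-oxo-13R-MO accumulation in S. cerevisiae.
REPORTED_FOLD = {
    "CYP76AH15 A99I": 5.6,
    "CYP76AH15 A99I S235G Y236F": 6.5,
    "CYP76AH15 A99I L366F": 6.2,
}

if __name__ == "__main__":
    for variant, fold in REPORTED_FOLD.items():
        print(f"{variant}: {fold}-fold, about {fold_to_percent_increase(fold):.0f}% increase")
```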
  • SRS6 variants of CYP76AH15 can lead to a changed product profile towards a hydroxylated product of 11-oxo-13R-MO.
  • SRS5 variants of CYP76AH15 (i.e., CYP76AH15 G362V L366F of SEQ ID NO:51) result in production of 11-hydroxy-13R-MO. See Example 3 and Figure 9.
  • 13R-MO-producing S. cerevisiae strains comprising CYP76AH15, CYP76AH15 A99I, or CYP76AH11 form compounds with the formulas C20H32O3 and C20H32O4, corresponding to single hydroxylation and double hydroxylation of 11-oxo-13R-manoyl oxide, respectively.
  • 13R-MO-producing S. cerevisiae strains comprising CYP76AH15 (SEQ ID NO:22) and CYP76AH16 (SEQ ID NO:19) form a C20H32O3 compound corresponding to a single hydroxylation of 11-oxo-13R-manoyl oxide.
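The molecular formulas quoted above differ by one oxygen per hydroxylation, which is what an LC-MS analysis would see as a mass shift of about 15.995 Da. The sketch below computes neutral monoisotopic masses from a formula string. The parent formula C20H32O2 for 11-oxo-13R-manoyl oxide is inferred here by removing one oxygen from the singly hydroxylated product and is not stated in the text; observed m/z values would further depend on the adduct formed.

```python
import re

# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def parse_formula(formula):
    """Parse a simple formula such as 'C20H32O3' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a C/H/O formula."""
    return sum(
        MONOISOTOPIC_MASS[element] * count
        for element, count in parse_formula(formula).items()
    )


if __name__ == "__main__":
    for formula in ("C20H32O2", "C20H32O3", "C20H32O4"):
        print(formula, f"{monoisotopic_mass(formula):.4f}")
```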
  • CYP76AH15 or a CYP76AH15 variant is expressed in N. benthamiana or an S. cerevisiae host to produce miltiradiene, abietatriene, and/or ferruginol.
  • expression of CYP76AH8 in an S. cerevisiae host results in production of ferruginol.
  • the host can further comprise C. forskohlii TPS1 (SEQ ID NO:65), C. forskohlii TPS3 (SEQ ID NO: 17), and/or C. forskohlii POR (SEQ ID NO:34). See Examples 2, 3, and 5 and Figures 8, 10, and 12.
  • a host cell disclosed herein can comprise a nucleic acid encoding a diterpene acetyltransferase capable of catalyzing acetylation of 13R-MO and/or acetylation of oxidized 13R-MO.
  • a host cell comprising a gene encoding a diterpene synthase polypeptide, a gene encoding a CYP polypeptide, and a gene encoding an ACT polypeptide is capable of producing acetylated oxidized 13R-MO, such as forskolin.
  • a host cell disclosed herein comprises the diterpene acetyltransferase, ACT1-6.
  • ACT1-6 is derived from C. forskohlii.
  • the diterpene acetyltransferase can be ACT1-6 of SEQ ID NO:6 or a functional homolog thereof sharing at least 55% sequence identity therewith.
  • a functional homolog of ACT1-6 of SEQ ID NO:6 is a polypeptide sharing at least 90% sequence identity therewith.
  • ACT1-6 of SEQ ID NO:6 is encoded by the nucleic acid set forth in SEQ ID NO:1 or SEQ ID NO:11, wherein SEQ ID NO:11 is optimized for expression in S. cerevisiae.
  • a host cell disclosed herein comprises the diterpene acetyltransferase, ACT1-7.
  • ACT1-7 is derived from C. forskohlii.
  • the diterpene acetyltransferase can be ACT1-7 of SEQ ID NO:7 or a functional homolog thereof sharing at least 55% sequence identity therewith.
  • a functional homolog of ACT1-7 of SEQ ID NO:7 is a polypeptide sharing at least 90% sequence identity therewith.
  • ACT1-7 of SEQ ID NO:7 is encoded by the nucleic acid set forth in SEQ ID NO:2 or SEQ ID NO: 12, wherein SEQ ID NO: 12 is optimized for expression in S. cerevisiae.
  • a host cell disclosed herein comprises the diterpene acetyltransferase, ACT1-8.
  • ACT1-8 can be derived from any suitable source; however, in a preferred embodiment, ACT1-8 is derived from C. forskohlii.
  • the diterpene acetyltransferase can be ACT1-8 of SEQ ID NO:26 or a functional homolog thereof sharing at least 55% sequence identity therewith.
  • a functional homolog of ACT1-8 of SEQ ID NO:26 is a polypeptide sharing at least 90% sequence identity therewith.
  • ACT1-8 is encoded by the nucleic acid set forth in SEQ ID NO:27.
  • 13R-MO and/or 13R-MO derivatives are produced in vivo through expression of one or more enzymes involved in a diterpene biosynthetic pathway in a recombinant host.
  • a recombinant host expressing a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate, a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, and/or a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-
  • the polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO: 16 (which can be encoded by the nucleotide sequence set forth in SEQ ID NO:35), SEQ ID NO: 17 (encoded by the nucleotide sequence set forth in SEQ ID NO:36), SEQ ID NO: 18, or SEQ ID NO:65.
  • a recombinant host expressing a gene encoding a polypeptide having an amino acid sequence set forth in SEQ ID NO: 16 can produce (5S,8R,9R, 10R)-labda-8-ol diphosphate from GGPP in vivo.
  • a recombinant host expressing a gene encoding a polypeptide having an amino acid sequence set forth in SEQ ID NO: 17 or SEQ ID NO: 18 can produce 13R-MO in vivo.
  • the polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:6 (which can be encoded by the nucleotide sequence set forth in SEQ ID NO:1 or SEQ ID NO:11), SEQ ID NO:7 (encoded by the nucleotide sequence set forth in SEQ ID NO:2 or SEQ ID NO:12), or SEQ ID NO:26 (encoded by the nucleotide sequence set forth in SEQ ID NO:27).
  • the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:21, SEQ ID NO:71, SEQ ID NO:70, SEQ ID NO:73, or SEQ ID NO:72.
  • a recombinant host expressing a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions can produce hydroxyl-13R-MO, dihydroxy-13R-MO, 13R-MO ketone, and/or 13R-MO aldehyde.
  • a recombinant host expressing a gene encoding a polypeptide having an amino acid sequence set forth in SEQ ID NO: 19 can produce 13R-MO and/or oxidized 13R-MO hydroxylated at its 9-position in vivo.
  • a recombinant host expressing a gene encoding a polypeptide having an amino acid sequence set forth in SEQ ID NO:22 can produce 11-oxo-13R-MO in vivo.
  • the 13R-MO derivative is an oxidized 13R-MO derivative.
  • the 13R-MO derivative is 11-oxo-13R-MO and/or 11β-hydroxy-13R-MO.
  • the 13R-MO derivative is forskolin.
  • the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions is further capable of synthesizing ferruginol from abietatriene and/or miltiradiene.
  • a recombinant host expressing a gene encoding a polypeptide having an amino acid sequence set forth in SEQ ID NO:22 can produce ferruginol in vivo.
  • the polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate comprises a diterpene synthase (TPS) polypeptide as otherwise described herein, e.g., a TPS1, TPS2, TPS3, or TPS4 polypeptide.
  • the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions comprises a cytochrome P450 (CYP) polypeptide as otherwise described herein, e.g., a CYP76AH16, CYP76AH8, CYP76AH15, CYP76AH17, CYP76AH11, CYP76AH4, ferruginol synthase (FS), or FS1 polypeptide.
  • the polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO comprises a diterpene acetyltransferase (ACT) polypeptide as otherwise described herein, e.g., an ACT1-6, ACT1-7, or ACT1-8 polypeptide.
  • the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions comprises a functional homolog, or variant, of CYP76AH15 (SEQ ID NO:22), as otherwise described herein.
  • the CYP76AH15 variant is further capable of synthesizing ferruginol from abietatriene and/or miltiradiene.
  • the CYP76AH15 variant comprises one or more amino acid substitutions corresponding to residues 99, 100, 104, 207, 235, 236, 362, 366, 473, 474, 476, and/or 478 of SEQ ID NO:22.
  • Non-limiting examples of the CYP76AH15 variant include polypeptides comprising substitutions (with respect to SEQ ID NO:22) corresponding to residue 99 (e.g., an isoleucine corresponding to residue 99); 100 (e.g., a valine corresponding to residue 100); 104 (e.g., an aspartic acid corresponding to residue 104); 207 (e.g., a threonine corresponding to residue 207); 235 (e.g., a glycine corresponding to residue 235); 236 (e.g., a phenylalanine corresponding to residue 236); 362 (e.g., a valine corresponding to residue 362); 366 (e.g., a phenylalanine or a glutamic acid corresponding to residue 366); 473 (e.g.
  • the CYP76AH15 variant comprises an A99I substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 A99I; SEQ ID NO:42), an S235G and Y236F substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 S235G Y236F; SEQ ID NO:48), an L366F substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 L366F; SEQ ID NO:50), an L366E substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 L366E; SEQ ID NO:52), an A99I, S235G, and Y236F substitution corresponding to SEQ ID NO:22 (e.g.
  • CYP76AH15 A99I S235G Y236F L366E; SEQ ID NO:75), a G362V and L366F substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 G362V L366F; SEQ ID NO:51), a G362V substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 G362V; SEQ ID NO:49), or a D473E and D474L substitution, and a P475 deletion corresponding to SEQ ID NO:22 (e.g., CYP76AH15 D473E D474L + P475 deletion; SEQ ID NO:53).
  • a CYP76AH15 variant can have one or more substitutions corresponding to the following portions of SEQ ID NO:22: residue 93-116, residue 202-209, residue 233-240, residue 286-304, residue 359-369, and/or residue 473-480.
  • a CYP76AH15 variant can have one or more mutations corresponding to residues 99, 100, 104, 207, 235, 236, 362, 366, 473, 476, and/or 478 of SEQ ID NO:22.
  • a CYP76AH15 variant can comprise the following mutations: A99I, A100V, G104D, V207T, S235G, Y236F, G362V, L366F, L366E, F476T, L478M, L478A, and/or L478I. See SEQ ID NOs:42-64, Tables 4 and 5, and Figure 5D.
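The residue ranges and mutation labels above can be cross-checked with a few lines of code. The sketch below parses labels such as A99I and assigns each position to one of the six listed ranges. Naming the ranges SRS1 to SRS6 in sequence order is an assumption made here; it agrees with the assignments given elsewhere in the text for A99I (SRS1), S235G Y236F (SRS3), and L366F (SRS5), and the six ranges together span the 78 residues reported for CYP76AH15.

```python
import re

# Residue ranges of SEQ ID NO:22 from the text; SRS labels assigned in sequence
# order (an assumption, see the note above).
SRS_REGIONS = {
    "SRS1": (93, 116),
    "SRS2": (202, 209),
    "SRS3": (233, 240),
    "SRS4": (286, 304),
    "SRS5": (359, 369),
    "SRS6": (473, 480),
}

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split 'A99I' into (original residue, position, replacement residue)."""
    match = MUTATION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def srs_region_of(position):
    """Return the SRS label containing the position, or None if outside all ranges."""
    for label, (start, end) in SRS_REGIONS.items():
        if start <= position <= end:
            return label
    return None


def total_srs_residues():
    """Number of residues covered by all listed ranges (inclusive bounds)."""
    return sum(end - start + 1 for start, end in SRS_REGIONS.values())


if __name__ == "__main__":
    for label in ["A99I", "V207T", "S235G", "Y236F", "G362V", "L366F", "L478M"]:
        _, position, _ = parse_mutation(label)
        print(label, srs_region_of(position))
    print("total SRS residues:", total_srs_residues())
```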
  • the 13R-MO derivative is an oxidized 13R-MO derivative.
  • the 13R-MO derivative is 11-oxo-13R-MO and/or 11β-hydroxy-13R-MO.
  • the 13R-MO derivative is forskolin.
  • a recombinant host expressing a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions further expresses a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate (e.g., a terpene synthase (TPS) polypeptide comprising a terpene synthase 2 (TPS2) polypeptide, a terpene synthase 3 (TPS3) polypeptide, and/or a terpene synthase 4 (TPS4) polypeptide); a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions (e.g., a cytochrome P450 (CYP) polypeptide comprising a CYP76AH16 polypeptide variant, a CYP76AH8 polypeptide variant, a CYP76AH11 polypeptide variant, and/or a CYP76AH17 polypeptide variant); a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO (e.g., a diterpene acetyltransferase (ACT) polypeptide); a gene encoding a polypeptide capable of synthesizing GGPP from farnesyl diphosphate (FPP) and isopentenyl diphosphate (IPP) (e.g., a geranylgeranyl diphosphate synthase (GGPPS) polypeptide); a gene encoding a polypeptide capable of synthesizing 1-deoxy-D-xylulose 5-phosphate (DXP) from pyruvate and D-glyceraldehyde 3-phosphate (e.g., a 1-deoxy-D-xylulose-5-phosphate synthase (DXS) polypeptide); and/or a gene encoding a polypeptide capable of reducing cytochrome P450 complex (e.g., a cytochrome P450 reductase (CPR) polypeptide).
  • a recombinant host expressing a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions (e.g., a CYP76AH15 polypeptide variant)
  • a cytochrome P450 (CYP) polypeptide comprising a CYP76AH16 polypeptide variant having at least 55% sequence identity to an amino acid sequence set forth in SEQ ID NO:19, a CYP76AH8 polypeptide variant having at least 50% sequence identity to an amino acid sequence set forth in SEQ ID NO:20, a CYP76AH11 polypeptide variant having at least 50% sequence identity to an amino acid sequence set forth in SEQ ID NO:21, and/or a CYP76AH17 polypeptide variant having at least 50% sequence identity to an amino acid sequence set forth in SEQ ID NO:23); a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO (e.g., a diterpene acetyltransferase (ACT) polypeptide having at least 40% sequence identity to SEQ ID NO:26); a gene encoding a polypeptide capable of synthesizing GGPP from farnesyl
  • GGPPS geranylgeranyl diphosphate synthase
  • SEQ ID NO:32 or SEQ ID NO:37); a gene encoding a polypeptide capable of synthesizing 1-deoxy-D-xylulose 5-phosphate (DXP) from pyruvate and D-glyceraldehyde 3-phosphate (e.g., a 1-deoxy-D-xylulose-5-phosphate synthase (DXS) polypeptide having at least 75% sequence identity to SEQ ID NO:30); a gene encoding a polypeptide capable of reducing cytochrome P450 complex (e.g., a cytochrome P450 reductase (CPR) polypeptide having at least 75% sequence identity to SEQ ID NO:34); and/or a gene encoding an anti-post transcriptional suppressor protein polypeptide having at least 75% sequence identity to SEQ ID NO:
  • expression of a gene encoding a CYP76AH15 variant in a recombinant host further expressing a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate, a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, and/or a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO, results in increased production of a 13R-MO derivative (e.g., 11-oxo-13R-MO) relative to a corresponding host lacking the CYP76AH15 homolog, e.g., at least a 10% increase, or at least a 25% increase, or at least a 50% increase, or at least a 75% increase, or at least a 100% increase, or at least a 150% increase, or at least a 200% increase, or at least a 250% increase, or at least a 300% increase, or at least a 350% increase, or at least a 400% increase, or at least a 450% increase, or at least a 500% increase, or at least a 550% increase, or at least a 600% increase in production of a 13R-MO derivative (e.g., 11-oxo-13R-MO).
  • a recombinant host expressing a gene encoding a CYP76AH15 variant and a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate, a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, and/or a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO, and further expressing a gene encoding a polypeptide capable of synthesizing GGPP from farnesyl diphosphate (FPP)
  • the polypeptide capable of synthesizing GGPP from FPP and IPP comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:32 (which can be encoded by the nucleotide sequence set forth in SEQ ID NO:31) or SEQ ID NO:37 (encoded by the nucleotide sequence set forth in SEQ ID NO:38).
  • the polypeptide capable of synthesizing DXP from pyruvate and D-glyceraldehyde 3-phosphate comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:30 (encoded by the nucleotide sequence set forth in SEQ ID NO:29).
  • the polypeptide capable of reducing cytochrome P450 complex comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:34 (encoded by the nucleotide sequence set forth in SEQ ID NO:33).
  • the anti-post transcriptional suppressor protein polypeptide comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:68.
  • the polypeptide capable of synthesizing GGPP from FPP and IPP comprises a geranylgeranyl diphosphate synthase (GGPPS) polypeptide as otherwise described herein.
  • the polypeptide capable of synthesizing DXP from pyruvate and D-glyceraldehyde 3-phosphate comprises a 1-deoxy-D-xylulose-5-phosphate synthase (DXS) polypeptide as otherwise described herein.
  • the polypeptide capable of reducing cytochrome P450 complex comprises a cytochrome P450 reductase (CPR) polypeptide as otherwise described herein, e.g., a POR polypeptide.
  • the anti-post transcriptional suppressor protein polypeptide comprises a P19 polypeptide as otherwise described herein.
  • a functional homolog is a polypeptide that has sequence similarity to a reference polypeptide, and that carries out one or more of the biochemical or physiological function(s) of the reference polypeptide.
  • a functional homolog and the reference polypeptide can be naturally occurring polypeptides, and the sequence similarity can be due to convergent or divergent evolutionary events. As such, functional homologs are sometimes designated in the literature as homologs, or orthologs, or paralogs.
  • Variants of a naturally occurring functional homolog can themselves be functional homologs.
  • Functional homologs can also be created via site-directed mutagenesis of the coding sequence for a polypeptide, or by combining domains from the coding sequences for different naturally occurring polypeptides ("domain swapping").
  • Techniques for modifying genes encoding functional polypeptides described herein are known and include, inter alia, directed evolution techniques, site-directed mutagenesis techniques and random mutagenesis techniques, and can be useful to increase specific activity of a polypeptide, alter substrate specificity, alter expression levels, alter subcellular location, or modify polypeptide-polypeptide interactions in a desired manner. Such modified polypeptides are considered functional homologs.
  • the term "functional homolog” is sometimes applied to the nucleic acid that encodes a functionally homologous polypeptide.
  • Functional homologs can be identified by analysis of nucleotide and polypeptide sequence alignments. For example, performing a query on a database of nucleotide or polypeptide sequences can identify homologs of 13R-MO and/or 13R-MO derivative biosynthesis polypeptides. Sequence analysis can involve BLAST, Reciprocal BLAST, or PSI-BLAST analysis of non-redundant databases using an amino acid sequence as the reference sequence. Amino acid sequence is, in some instances, deduced from the nucleotide sequence. Those polypeptides in the database that have greater than 40% sequence identity are candidates for further evaluation for suitability as a 13R-MO and/or 13R-MO derivative biosynthesis polypeptide.
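The candidate-selection step described above can be sketched as a simple filter. This is a minimal illustration only: the `Hit` record, the accession names, and the identity values are hypothetical placeholders, not output of any real BLAST run, and the strict "greater than 40%" cutoff is taken from the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str          # hypothetical database accession
    percent_identity: float  # identity to the reference sequence, in percent


def candidate_homologs(hits, threshold=40.0):
    """Keep hits with identity strictly greater than the threshold, best first."""
    kept = [h for h in hits if h.percent_identity > threshold]
    return sorted(kept, key=lambda h: h.percent_identity, reverse=True)


# Illustrative values only; real hits would come from a database search.
hits = [Hit("seqA", 38.5), Hit("seqB", 40.0), Hit("seqC", 72.3), Hit("seqD", 55.0)]
selected = candidate_homologs(hits)
```

Hits that pass the filter are only candidates; the text still calls for further evaluation of their suitability.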
  • nucleic acids and polypeptides are identified from transcriptome data based on expression levels rather than by using BLAST analysis.
  • Conserved regions can be identified by locating a region within the primary amino acid sequence of a 13R-MO and/or 13R-MO derivative biosynthesis polypeptide that is a repeated sequence, forms some secondary structure (e.g., helices and beta sheets), establishes positively or negatively charged domains, or represents a protein motif or domain. See, e.g., the Pfam web site describing consensus sequences for a variety of protein motifs and domains on the World Wide Web at sanger.ac.uk/Software/Pfam/ and pfam.janelia.org/. The information included at the Pfam database is described in Sonnhammer et al., Nucl.
  • Conserved regions also can be determined by aligning sequences of the same or related polypeptides from closely related species. Closely related species preferably are from the same family. In some embodiments, alignment of sequences from two different species is adequate to identify such homologs.
  • polypeptides that exhibit at least about 40% amino acid sequence identity are useful to identify conserved regions.
  • conserved regions of related polypeptides exhibit at least 45% amino acid sequence identity (e.g., at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% amino acid sequence identity).
  • a conserved region exhibits at least 92%, 94%, 96%, 98%, or 99% amino acid sequence identity.
  • Methods to modify the substrate specificity of a polypeptide include without limitation site-directed/rational mutagenesis approaches, random directed evolution approaches and combinations in which random mutagenesis/saturation techniques are performed near the active site of the enzyme. For example see Osmani et al., 2009, Phytochemistry 70: 325-347.
  • a candidate sequence typically has a length that is from 80% to 200% of the length of the reference sequence, e.g., 82, 85, 87, 89, 90, 93, 95, 97, 99, 100, 105, 110, 115, 120, 130, 140, 150, 160, 170, 180, 190, or 200% of the length of the reference sequence.
  • a functional homolog polypeptide typically has a length that is from 95% to 105% of the length of the reference sequence, e.g., 90, 93, 95, 97, 99, 100, 105, 110, 115, or 120% of the length of the reference sequence, or any range between.
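As a rough illustration of the length criteria above, the check below compares a candidate's length to the reference length as a percentage. The function name and the example sequences are hypothetical; the default bounds (80% to 200%) and the narrower 95% to 105% window are the typical ranges stated in the text.

```python
def length_within_range(candidate, reference, low_pct=80, high_pct=200):
    """Return True if len(candidate) is between low_pct and high_pct percent
    of len(reference), inclusive. Integer arithmetic avoids float rounding."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    scaled = 100 * len(candidate)
    return low_pct * len(reference) <= scaled <= high_pct * len(reference)


# Placeholder sequences, not real polypeptides.
reference = "M" * 100
print(length_within_range("M" * 85, reference))           # typical candidate range
print(length_within_range("M" * 85, reference, 95, 105))  # typical functional-homolog range
```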
  • a % identity (or % sequence identity) for any candidate nucleic acid or polypeptide relative to a reference nucleic acid or polypeptide can be determined as follows.
  • a reference sequence (e.g., a nucleic acid sequence or an amino acid sequence described herein) is aligned to one or more candidate sequences using Clustal Omega (version 1.2.1, default parameters).
  • Clustal Omega calculates the best match between a reference and one or more candidate sequences, and aligns them so that identities, similarities and differences can be determined. Gaps of one or more residues can be inserted into a reference sequence, a candidate sequence, or both, to maximize sequence alignments.
  • word size: 2; window size: 4; scoring method: %age; number of top diagonals: 4; and gap penalty: 5.
  • gap opening penalty: 10.0; gap extension penalty: 5.0; and weight transitions: yes.
  • the Clustal Omega output is a sequence alignment that reflects the relationship between sequences.
  • Clustal Omega can be run, for example, at the Baylor College of Medicine Search Launcher site on the World Wide Web (searchlauncher.bcm.tmc.edu/multi-align/multi-align.html) and at the European Bioinformatics Institute site at http://www.ebi.ac.uk/Tools/msa/clustalo/.
  • % identity value can be rounded to the nearest tenth. For example, 78.11, 78.12, 78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16, 78.17, 78.18, and 78.19 are rounded up to 78.2.
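The calculation and rounding described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure as claimed: it assumes the two sequences have already been aligned (for example by Clustal Omega) into equal-length strings with `-` for gaps, and it assumes % identity is the number of identical aligned positions divided by the ungapped reference length, a convention this passage does not spell out. `Decimal` with `ROUND_HALF_UP` is used so that 78.15 rounds up to 78.2 as in the text.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent_identity(aligned_ref, aligned_cand):
    """Identical aligned positions as a percentage of the ungapped reference
    length (assumed convention). Inputs are pre-aligned, '-' marks a gap."""
    if len(aligned_ref) != len(aligned_cand):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for r, c in zip(aligned_ref, aligned_cand) if r == c and r != "-")
    ref_length = sum(1 for r in aligned_ref if r != "-")
    if ref_length == 0:
        raise ValueError("reference contains no residues")
    return Decimal(100 * matches) / Decimal(ref_length)


def round_to_tenth(value):
    """Round to the nearest tenth, with halves rounded up."""
    return Decimal(str(value)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Toy alignment with placeholder residues.
identity = round_to_tenth(percent_identity("MKT-LV", "MRTALV"))
```

Passing values as strings or `Decimal` (rather than binary floats) keeps the half-up rounding exact at boundaries such as 78.15.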
  • functional polypeptides disclosed herein can include additional amino acids that are not involved in the enzymatic activities carried out by the enzymes.
  • functional polypeptides disclosed herein are fusion proteins.
  • the terms “chimera,” “fusion polypeptide,” “fusion protein,” “fusion enzyme,” “fusion construct,” “chimeric protein,” “chimeric polypeptide,” “chimeric construct,” and “chimeric enzyme” can be used interchangeably herein to refer to proteins engineered through the joining of two or more genes that code for different proteins.
  • a chimeric enzyme is constructed by joining the C-terminal of a first polypeptide ProteinA to the N-terminal of a second polypeptide ProteinB through a linker "b," i.e., "ProteinA-b-ProteinB."
  • the linker of a chimeric enzyme may be the amino acid sequence "KLVK."
  • the linker of a chimeric enzyme may be the amino acid sequence "RASSTKLVK."
  • the linker of a chimeric enzyme may be the amino acid sequence "GGGGS."
  • the linker of a chimeric enzyme may be two repeats of the amino acid sequence "GGGGS" (i.e.
  • the linker of a chimeric enzyme may be three repeats of the amino acid sequence "GGGGS." In some aspects, the linker of a chimeric enzyme may be the amino acid sequence "EGKSSGSGSESKST." In some aspects, the linker of a chimeric enzyme is a direct bond between the C-terminal of a first polypeptide and the N-terminal of a second polypeptide.
  • a chimeric enzyme is constructed by joining the C-terminal of a first polypeptide ProteinA to the N-terminal of a second polypeptide ProteinB through a linker "b," i.e., "ProteinA-b-ProteinB," and by joining the C-terminal of the second polypeptide ProteinB to the N-terminal of a third polypeptide ProteinC through a second linker "d," i.e., "ProteinA-b-ProteinB-d-ProteinC."
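The "ProteinA-b-ProteinB-d-ProteinC" layout can be illustrated at the sequence level with a short sketch. The linker strings are the ones listed above; the function name and the polypeptide sequences are hypothetical placeholders, and an empty linker stands in for a direct bond. The sketch only concatenates amino acid strings and says nothing about how a given fusion folds or functions.

```python
# Linkers named in the text; "" represents a direct bond.
LINKERS = {
    "KLVK": "KLVK",
    "RASSTKLVK": "RASSTKLVK",
    "GGGGS": "GGGGS",
    "GGGGS_x3": "GGGGS" * 3,
    "EGKSSGSGSESKST": "EGKSSGSGSESKST",
    "direct": "",
}


def build_chimera(polypeptides, linkers):
    """Join polypeptides N-terminus to C-terminus, placing linkers[i]
    between polypeptides[i] and polypeptides[i + 1]."""
    if len(linkers) != len(polypeptides) - 1:
        raise ValueError("need exactly one linker between each adjacent pair")
    chimera = polypeptides[0]
    for linker, nxt in zip(linkers, polypeptides[1:]):
        chimera += linker + nxt
    return chimera


# Placeholder sequences standing in for ProteinA, ProteinB, and ProteinC.
protein_a, protein_b, protein_c = "MAAAA", "MBBBB", "MCCCC"
two_part = build_chimera([protein_a, protein_b], [LINKERS["GGGGS"]])
three_part = build_chimera(
    [protein_a, protein_b, protein_c],
    [LINKERS["KLVK"], LINKERS["direct"]],
)
```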
  • a fusion protein is a protein altered by domain swapping.
  • domain swapping is used to describe the process of replacing a domain of a first protein with a domain of a second protein.
  • the domain of the first protein and the domain of the second protein are functionally identical or functionally similar.
  • the structure and/or sequence of the domain of the second protein differs from the structure and/or sequence of the domain of the first protein.
  • a UGT polypeptide is altered by domain swapping.
  • a fusion protein is a protein altered by circular permutation, which consists of covalently attaching the ends of a protein and then opening it at a different position.
  • a targeted circular permutation can be produced, for example but not limited to, by designing a spacer to join the ends of the original protein. Once the spacer has been defined, there are several possibilities to generate permutations through generally accepted molecular biology techniques, for example but not limited to, by producing concatemers by means of PCR and subsequently amplifying specific permutations inside the concatemer, or by amplifying discrete fragments of the protein and joining them in a different order.
  • the step of generating permutations can be followed by creating a circular gene by binding the fragment ends and cutting back at random, thus forming collections of permutations from a unique construct.
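At the sequence level, a circular permutation can be sketched as joining the original termini with a spacer and reopening the chain at a new position. The function names, the spacer, and the example sequence below are hypothetical; this models only the resulting residue order, not the PCR or ligation steps used to build the construct.

```python
def circular_permutation(sequence, spacer, new_start):
    """Join the original C- and N-termini through `spacer`, then reopen the
    circle so that residue `new_start` (0-based, within the original
    sequence) becomes the new N-terminus."""
    if not 0 <= new_start < len(sequence):
        raise ValueError("new_start must index a residue of the original sequence")
    circle = sequence + spacer  # the spacer bridges the old C-terminus to the old N-terminus
    return circle[new_start:] + circle[:new_start]


def all_permutations(sequence, spacer):
    """Every permutant obtained by reopening at each internal position."""
    return [circular_permutation(sequence, spacer, i) for i in range(1, len(sequence))]


# Placeholder sequence; lowercase spacer residues are used only for visibility.
permutant = circular_permutation("ABCDEF", "gg", 3)
library = all_permutations("ABCDEF", "gg")
```

Enumerating every reopening position loosely mirrors the idea of forming a collection of permutations from a single construct.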
  • DAP1 polypeptide is altered by circular permutation.
  • 13R-MO and/or 13R-MO Derivative Biosynthesis Nucleic Acids
  • a recombinant gene encoding a polypeptide described herein comprises the coding sequence for that polypeptide, operably linked in sense orientation to one or more regulatory regions suitable for expressing the polypeptide. Because many microorganisms are capable of expressing multiple gene products from a polycistronic mRNA, multiple polypeptides can be expressed under the control of a single regulatory region for those microorganisms, if desired.
  • a coding sequence and a regulatory region are considered to be operably linked when the regulatory region and coding sequence are positioned so that the regulatory region is effective for regulating transcription or translation of the sequence.
  • the translation initiation site of the translational reading frame of the coding sequence is positioned between one and about fifty nucleotides downstream of the regulatory region for a monocistronic gene.
  • the coding sequence for a polypeptide described herein is identified in a species other than the recombinant host, i.e., is a heterologous gene.
  • the coding sequence can be from other prokaryotic or eukaryotic microorganisms, from plants or from animals.
  • the coding sequence is a sequence that is native to the host and is being reintroduced into that organism.
  • a native sequence can often be distinguished from the naturally occurring sequence by the presence of non-natural sequences linked to the exogenous nucleic acid, e.g.
  • regulatory region refers to a nucleic acid having nucleotide sequences that influence transcription or translation initiation and rate, and stability and/or mobility of a transcription or translation product. Regulatory regions include, without limitation, promoter sequences, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, 5' and 3' untranslated regions (UTRs), transcriptional start sites, termination sequences, polyadenylation sequences, introns, and combinations thereof.
  • a regulatory region typically comprises at least a core (basal) promoter.
  • a regulatory region also may include at least one control element, such as an enhancer sequence, an upstream element or an upstream activation region (UAR).
  • a regulatory region is operably linked to a coding sequence by positioning the regulatory region and the coding sequence so that the regulatory region is effective for regulating transcription or translation of the sequence.
  • the translation initiation site of the translational reading frame of the coding sequence is typically positioned between one and about fifty nucleotides downstream of the promoter.
  • a regulatory region can, however, be positioned as much as about 5,000 nucleotides upstream of the translation initiation site, or about 2,000 nucleotides upstream of the transcription start site.
  • The choice of regulatory regions to be included depends upon several factors, including, but not limited to, efficiency, selectability, inducibility, desired expression level, and preferential expression during certain culture stages. It is a routine matter for one of skill in the art to modulate the expression of a coding sequence by appropriately selecting and positioning regulatory regions relative to the coding sequence. It will be understood that more than one regulatory region may be present, e.g., introns, enhancers, upstream activation regions, transcription terminators, and inducible elements.
  • One or more genes can be combined in a recombinant nucleic acid construct in "modules" useful for a discrete aspect of 13R-MO and/or 13R-MO derivative production.
  • Combining a plurality of genes in a module, particularly a polycistronic module, facilitates the use of the module in a variety of species.
  • a 13R-MO and/or 13R-MO derivative gene cluster can be combined in a polycistronic module such that, after insertion of a suitable regulatory region, the module can be introduced into a wide variety of species.
  • a 13R-MO and/or 13R-MO derivative gene cluster can be combined such that each coding sequence is operably linked to a separate regulatory region, to form a module.
  • a recombinant construct typically also comprises an origin of replication, and one or more selectable markers for maintenance of the construct in appropriate species.
  • nucleic acids can encode a particular polypeptide; i.e., for many amino acids, there is more than one nucleotide triplet that serves as the codon for the amino acid.
  • codons in the coding sequence for a given polypeptide can be modified such that optimal expression in a particular host is obtained, using appropriate codon bias tables for that host (e.g., microorganism).
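One simple way to picture codon modification with a codon bias table is to back-translate a polypeptide using the most frequent codon for each amino acid. The sketch below makes several assumptions: the table covers only a few amino acids, its frequencies are made-up illustrative numbers rather than measured values for any host, and "always pick the most frequent codon" is just one of several optimization strategies.

```python
# Hypothetical codon usage fractions for an unnamed host (illustrative only).
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.58, "AAG": 0.42},
    "L": {"TTG": 0.29, "TTA": 0.28, "CTG": 0.11},
    "*": {"TAA": 0.47, "TGA": 0.30},
}


def preferred_codons(usage):
    """Map each amino acid to its most frequently used codon."""
    return {aa: max(codons, key=codons.get) for aa, codons in usage.items()}


def back_translate(protein, usage):
    """Build a coding sequence using the preferred codon for every residue."""
    best = preferred_codons(usage)
    missing = sorted({aa for aa in protein if aa not in best})
    if missing:
        raise ValueError(f"no codon usage data for: {', '.join(missing)}")
    return "".join(best[aa] for aa in protein)


coding_sequence = back_translate("MKL*", CODON_USAGE)
```

Because several codons encode most amino acids, the resulting nucleotide sequence differs from the source gene while encoding the same polypeptide.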
  • these modified sequences can exist as purified molecules and can be incorporated into a vector or a virus for use in constructing modules for recombinant nucleic acid constructs.
  • a nucleic acid that overexpresses the polypeptide or gene product may be included in a recombinant construct that is transformed into the strain.
  • mutagenesis can be used to generate mutants in genes for which it is desired to increase or enhance function.
  • Recombinant hosts can be used to express polypeptides for producing 13R-MO and/or 13R-MO derivatives, including mammalian, insect, plant, and algal cells.
  • a number of prokaryotes and eukaryotes are also suitable for use in constructing the recombinant microorganisms described herein, e.g., gram-negative bacteria, yeast, and fungi.
  • a species and strain selected for use as a 13R-MO and/or 13R-MO derivative production strain is first analyzed to determine which production genes are endogenous to the strain and which genes are not present. Genes for which an endogenous counterpart is not present in the strain are advantageously assembled in one or more recombinant constructs, which are then transformed into the strain in order to supply the missing function(s).
  • the recombinant microorganism is grown in a fermenter at a temperature(s) for a period of time, wherein the temperature and period of time facilitate the production of 13R-MO and/or 13R-MO derivatives.
  • the constructed and genetically engineered microorganisms provided by the invention can be cultivated using conventional fermentation processes, including, inter alia, chemostat, batch, fed-batch cultivations, semi-continuous fermentations such as draw and fill, continuous perfusion fermentation, and continuous perfusion cell culture.
  • other recombinant genes such as isopentenyl biosynthesis genes and terpene synthase and cyclase genes may also be present and expressed. Levels of substrates and intermediates can be determined by extracting samples from culture media for analysis according to published methods.
  • Carbon sources of use in the instant method include any molecule that can be metabolized by the recombinant host cell to facilitate growth and/or production of the 13R-MO and/or 13R-MO derivatives.
  • suitable carbon sources include, but are not limited to, sucrose (e.g., as found in molasses), fructose, xylose, ethanol, glycerol, glucose, cellulose, starch, cellobiose or other glucose-comprising polymer.
  • the carbon source can be provided to the host organism throughout the cultivation period or alternatively, the organism can be grown for a period of time in the presence of another energy source, e.g., protein, and then provided with a source of carbon only during the fed-batch phase.
  • a permeabilizing agent can be added to aid the feedstock entering into the host and product getting out.
  • a crude lysate of the cultured microorganism can be centrifuged to obtain a supernatant.
  • the resulting supernatant can then be applied to a chromatography column, e.g.
  • genes and modules discussed herein can be present in two or more recombinant hosts rather than a single host. When a plurality of recombinant hosts is used, they can be grown in a mixed culture to accumulate 13R-MO and/or 13R-MO derivatives.
  • the two or more hosts each can be grown in a separate culture medium and the product of the first culture medium can be introduced into second culture medium to be converted into a subsequent intermediate or into an end product. The product produced by the second or final host is then recovered. It will also be appreciated that in some embodiments, a recombinant host is grown using nutrient sources other than a culture medium and utilizing a system other than a fermenter.
  • suitable species can be in a genus such as Agaricus, Aspergillus, Bacillus, Candida, Corynebacterium, Eremothecium, Escherichia, Fusarium/Gibberella, Kluyveromyces, Laetiporus, Lentinus, Phaffia, Phanerochaete, Pichia (formerly known as Hansenula), Scheffersomyces, Physcomitrella, Rhodotorula, Saccharomyces, Schizosaccharomyces, Sphaceloma, Xanthophyllomyces, Humicola, Issatchenkia, Brettanomyces, Yamadazyma, Lachancea, Zygosaccharomyces, Komagataella, Kazachstania, Xanthophyllomyces, Geotrichum, Blakeslea, Dunaliella, Haematococcus, Chlorella, Und
  • Exemplary species from such genera include Lentinus tigrinus, Laetiporus sulphureus, Phanerochaete chrysosporium, Pichia pastoris, Pichia kudriavzevii, Cyberlindnera jadinii, Physcomitrella patens, Rhodotorula glutinis, Rhodotorula mucilaginosa, Phaffia rhodozyma, Xanthophyllomyces dendrorhous, Issatchenkia orientalis, Saccharomyces cerevisiae, Saccharomyces bayanus, Saccharomyces pastorianus, Saccharomyces carlsbergensis, Hansenula polymorpha, Brettanomyces anomalus, Yamadazyma philogaea, Fusarium fujikuroi/Gibberella fujikuroi, Candida utilis, Candida glabrata, Candida
  • a microorganism can be a prokaryote such as Escherichia bacteria cells, for example, Escherichia coli cells; Lactobacillus bacteria cells; Lactococcus bacteria cells; Corynebacterium bacteria cells; Acetobacter bacteria cells; Acinetobacter bacteria cells; or Pseudomonas bacterial cells.
  • a microorganism can be an algal cell such as Blakeslea trispora, Dunaliella salina, Haematococcus pluvialis, Chlorella sp. , Undaria pinnatifida, Sargassum, Laminaria japonica, Scenedesmus almeriensis species.
  • a microorganism can be a fungus from the genera including but not limited to Acremonium, Arxula, Agaricus, Aspergillus, Aureobasidium, Brettanomyces, Candida, Cryptococcus, Corynascus, Chrysosporium, Debaryomyces, Filobasidium, Fusarium, Gibberella, Humicola, Magnaporthe, Monascus, Mucor, Myceliophthora, Mortierella, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Piromyces, Phanerochaete, Podospora, Pycnoporus, Rhizopus, Schizophyllum, Schizosaccharomyces, Sordaria, Scheffersomyces, Talaromyces, Rhodotorula, Rhodosporidium, Rasamsonia, Zygosacchar
  • Fungal species include, but are not limited to, Aspergillus niger, Aspergillus oryzae, Aspergillus fumigatus, Penicillium chrysogenum, Penicillium citrinum, Acremonium chrysogenum, Trichoderma reesei, Rasamsonia emersonii (formerly known as Talaromyces emersonii), Aspergillus sojae, Chrysosporium lucknowense, Myceliophthora thermophila.
  • a microorganism can be an Ascomycete such as Gibberella fujikuroi, Kluyveromyces lactis, Schizosaccharomyces pombe, Geotrichum Aspergillus niger, Yarrowia lipolytica, Ashbya gossypii, Yamadazyma philogaea, Lachancea kluyveri, Kodamaea ohmeri, or S. cerevisiae.
  • Agaricus, Gibberella, and Phanerochaete spp. can be useful because they are known to produce large amounts of isoprenoids in culture.
  • the terpene precursors for producing large amounts of steviol glycosides are already produced by endogenous genes.
  • modules comprising recombinant genes for steviol glycoside biosynthesis polypeptides can be introduced into species from such genera without the necessity of introducing mevalonate or MEP pathway genes.
  • Arxula adeninivorans (Blastobotrys adeninivorans)
  • Arxula adeninivorans is a dimorphic yeast (it grows as a budding yeast, like baker's yeast, up to a temperature of 42°C; above this threshold it grows in a filamentous form) with unusual biochemical characteristics. It can grow on a wide range of substrates and can assimilate nitrate. It has successfully been applied to the generation of strains that can produce natural plastics or the development of a biosensor for estrogens in environmental samples.
  • Rhodotorula sp.
  • Rhodotorula is a unicellular, pigmented yeast.
  • the oleaginous red yeast, Rhodotorula glutinis, has been shown to produce lipids and carotenoids from crude glycerol (Saenge et al., 2011, Process Biochemistry 46(1):210-8).
  • Rhodotorula toruloides strains have been shown to be an efficient fed-batch fermentation system for improved biomass and lipid productivity (Li et al., 2007, Enzyme and Microbial Technology 41:312-7).
  • Schizosaccharomyces is a genus of fission yeasts. Similar to S. cerevisiae, Schizosaccharomyces is a model organism in the study of eukaryotic cell biology. It provides an evolutionarily distant comparison to S. cerevisiae. Species include but are not limited to S. cryophilus and S. pombe. (See Hoffman et al., 2015, Genetics. 201(2):403-23).
  • Humicola is a genus of filamentous fungi. Species include but are not limited to H. alopallonella and H. siamensis.
  • Brettanomyces is a non-spore forming genus of yeast. It is from the Saccharomycetaceae family and commonly used in the brewing and wine industries. Brettanomyces produces several sensory compounds that contribute to the complexity of wine, specifically red wine. Brettanomyces species include but are not limited to B. bruxellensis and B. claussenii. See, e.g., Fugelsang et al. , 1997, Wine Microbiology.
  • Trichosporon is a genus of the fungi family. Trichosporon species are yeast commonly isolated from the soil, but can also be found in the skin microbiota of humans and animals. Species include, for example but are not limited to, T. aquatile, T. beigelii, and T. dermatis.
  • Debaryomyces is a genus of the ascomycetous yeast family, in which species are characterized as salt-tolerant marine species. Species include but are not limited to D. hansenii and D. hansenius.
  • Physcomitrella spp., when grown in suspension culture, have characteristics similar to yeast or other fungal cultures. This genus can be used for producing plant secondary metabolites, which can be difficult to produce in other types of cells.
  • Saccharomyces is a widely used chassis organism in synthetic biology, and can be used as the recombinant microorganism platform. For example, there are libraries of mutants, plasmids, detailed computer models of metabolism and other information available for S. cerevisiae, allowing for rational design of various modules to enhance product yield. Methods are known for making recombinant microorganisms. Examples of Saccharomyces species include S. castellii, also known as Naumovozyma castellii.
  • Zygosaccharomyces is a genus of yeast. Originally classified under the Saccharomyces genus, it has since been reclassified. It is widely known in the food industry because several species are extremely resistant to commercially used food preservation techniques. Species include but are not limited to Z. bisporus and Z. cidri. (See Barnett et al., Yeasts: Characteristics and Identification, 1983).
  • Geotrichum is a fungus commonly found in soil, water and sewage worldwide. It is often identified in plants, cereal and dairy products. Species include, for example but are not limited to, G. candidum and G. klebahnii (see Carmichael et al., Mycologia, 1957, 49(6):820-830).
  • Kazachstania is a yeast genus in the family Saccharomycetaceae.
  • Torulaspora is a genus of yeasts and species include but are not limited to T. franciscae and T. globosa.
  • Aspergillus species such as A. oryzae, A. niger and A. sojae are widely used microorganisms in food production and can also be used as the recombinant microorganism platform.
  • Nucleotide sequences are available for genomes of A. nidulans, A. fumigatus, A. oryzae, A. clavatus, A. flavus, A. niger, and A. terreus, allowing rational design and modification of endogenous pathways to enhance flux and increase product yield.
  • Metabolic models have been developed for Aspergillus, as well as transcriptomic studies and proteomics studies.
  • A. niger is cultured for the industrial production of a number of food ingredients such as citric acid and gluconic acid, and thus species such as A. niger are generally suitable for producing steviol glycosides.
  • Yarrowia lipolytica is a dimorphic yeast (see Arxula adeninivorans) and belongs to the family Hemiascomycetes. The entire genome of Yarrowia lipolytica is known. Yarrowia species are aerobic and considered to be non-pathogenic. Yarrowia is efficient in using hydrophobic substrates (e.g., alkanes, fatty acids, and oils) and can grow on sugars. It has a high potential for industrial applications and is an oleaginous microorganism. Yarrowia lipolytica can accumulate lipid content to approximately 40% of its dry cell weight and is a model organism for lipid accumulation and remobilization.
  • Rhodosporidium toruloides is an oleaginous yeast and useful for engineering lipid-production pathways (see, e.g., Zhu et al., 2013, Nature Commun. 3:1112; Ageitos et al., 2011, Applied Microbiology and Biotechnology 90(4):1219-27).
  • Candida boidinii is a methylotrophic yeast (it can grow on methanol). Like other methylotrophic species such as Hansenula polymorpha and Pichia pastoris, it provides an excellent platform for producing heterologous proteins. Yields in a multigram range of a secreted foreign protein have been reported.
  • a computational method, IPRO, recently predicted mutations that experimentally switched the cofactor specificity of Candida boidinii xylose reductase from NADPH to NADH. See, e.g., Mattanovich et al., 2012, Methods Mol Biol. 824:329-58; Khoury et al., 2009, Protein Sci. 18(10):2125-38.
  • Hansenula polymorpha is a methylotrophic yeast (see Candida boidinii). It can furthermore grow on a wide range of other substrates; it is thermo-tolerant and can assimilate nitrate (see also, Kluyveromyces lactis). It has been applied to producing hepatitis B vaccines, insulin and interferon alpha-2a for the treatment of hepatitis C, and furthermore to a range of technical enzymes. See, e.g., Xu et al., 2014, Virol Sin. 29(6):403-9.
  • Candida krusei (Issatchenkia orientalis)
  • Candida krusei (scientific name Issatchenkia orientalis) is widely used in chocolate production. C. krusei is used to remove the bitter taste of and break down cacao beans. In addition to this species' involvement in chocolate production, C. krusei is commonly found in the immunocompromised as a fungal nosocomial pathogen (see Mastromarino et al., New Microbiologica, 36:229-238; 2013).
  • Kluyveromyces lactis is a yeast regularly applied to the production of kefir. It can grow on several sugars, most importantly on lactose, which is present in milk and whey. It has successfully been applied, among others, for producing chymosin (an enzyme that is usually present in the stomach of calves) for producing cheese. Production takes place in fermenters on a 40,000 L scale. See, e.g., van Ooyen et al., 2006, FEMS Yeast Res. 6(3):381-92.
  • Pichia pastoris is a methylotrophic yeast (see Candida boidinii and Hansenula polymorpha). It is also commonly referred to as Komagataella pastoris. It provides an efficient platform for producing foreign proteins. Platform elements are available as a kit and it is used worldwide in academia for producing proteins. Strains have been engineered that can produce complex human N-glycan (yeast glycans are similar but not identical to those found in humans). See, e.g., Piirainen et al., 2014, N Biotechnol. 31(6):532-7.
  • Scheffersomyces stipitis, also known as Pichia stipitis, is a homothallic yeast found in haploid form. It is commonly used instead of S. cerevisiae due to its enhanced respiratory capacity, which results from an alternative respiratory system. (See Papini et al., Microbial Cell Factories, 11:136 (2012)).
  • a microorganism can be an insect cell such as Drosophila, specifically, Drosophila melanogaster.
  • a microorganism can be an algal cell such as, for example but not limited to, Blakeslea trispora, Dunaliella salina, Haematococcus pluvialis, Chlorella sp.,
  • a microorganism can be a cyanobacterial cell such as, for example but not limited to, Blakeslea trispora, Dunaliella salina, Haematococcus pluvialis, Chlorella sp., Undaria pinnatifida, Sargassum, Laminaria japonica, and Scenedesmus almeriensis.
  • a microorganism can be a bacterial cell.
  • bacteria include, but are not limited to, the genera Bacillus (e.g., B. subtilis, B. amyloliquefaciens, B. licheniformis, B. puntis, B. megaterium, B. halodurans, B. pumilus), Acinetobacter, Nocardia, Xanthobacter, Escherichia (e.g., E. coli), Streptomyces, Erwinia, Klebsiella, Serratia (e.g., S. marcescens), Pseudomonas (e.g., P.
  • Bacillus e.g. , B. subtilis, B. amyloliquefaciens, B. licheniformis, B. puntis, B. megaterium, B. halodurans, B. pumilus
  • Acinetobacter Nocardia
  • Bacterial cells may also include, but are not limited to, photosynthetic bacteria (e.g., green non-sulfur bacteria (e.g., Chloroflexus bacteria (e.g., C. aurantiacus), Chloronema (e.g., C. gigateum), green sulfur bacteria (e.g., Chlorobium bacteria (e.g., C. limicola), Pelodictyon (e.g., P. luteolum), purple sulfur bacteria (e.g., Chromatium (e.g., C.
  • photosynthetic bacteria e.g., green non-sulfur bacteria (e.g., Chloroflexus bacteria (e.g., C. aurantiacus), Chloronema (e.g., C. gigateum), green sulfur bacteria (e.g., Chlorobium bacteria (e.g., C. limicola), Pelodictyon (e.g., P.
  • okenii e.g., Rhodospirillum (e.g., R. rubrum), Rhodobacter (e.g., R. sphaeroides, R. capsulatus), and Rhodomicrobium bacteria (e.g., R. vanellii)).
  • Rhodospirillum e.g., R. rubrum
  • Rhodobacter e.g., R. sphaeroides, R. capsulatus
  • Rhodomicrobium bacteria e.g., R. vanellii
  • E. coli another widely used platform organism in synthetic biology, can also be used as the recombinant microorganism platform. Similar to Saccharomyces, there are libraries of mutants, plasmids, detailed computer models of metabolism and other information available for E. coli, allowing for rational design of various modules to enhance product yield. Methods similar to those described above for Saccharomyces can be used to make recombinant E. coli microorganisms.
  • the recombinant host cell disclosed herein can comprise a plant cell, comprising a plant cell that is grown in a plant, a mammalian cell, an insect cell, a fungal cell from Aspergillus genus; a yeast cell from Saccharomyces (e.g., S. cerevisiae, S. bayanus, S. pastorianus, and S. carlsbergensis), Schizosaccharomyces (e.g., S. pombe), Yarrowia (e.g., Y. lipolytica), Candida (e.g., C. glabrata, C. albicans, C. krusei, C.
  • Saccharomyces e.g., S. cerevisiae, S. bayanus, S. pastorianus, and S. carlsbergensis
  • Schizosaccharomyces e.g. , S. pombe
  • Yarrowia e.g
  • T. franciscae and T. globosa e.g., T. franciscae and T. globosa
  • Geotrichum e.g., G. candidum and G. klebahni
  • Zygosaccharomyces e.g., Z. bisporus and Z. cidri
  • Yamadazyma e.g., Y. philogaea
  • Lachancea e.g., L. kluyveri
  • Kodamaea e.g., K. ohmeri
  • Brettanomyces e.g., B. anomalus
  • Trichosporon e.g., T. aquatile, T. beigelii, and T.
  • Debaryomyces e.g., D. hansenuis and D. hansenii
  • Scheffersomyces e.g., S. stipitis
  • Rhodosporidium e.g., R. toruloides
  • Pachysolen e.g., P.
  • Bacillus genus e.g., B. subtilis,
  • B. amyloliquefaciens, B. licheniformis, B. puntis, B. megaterium, B. halodurans, and B. pumilus) Acinetobacter, Nocardia, Xanthobacter genera, Escherichia (e.g., E. coli), Streptomyces, Erwinia, Klebsiella, Serratia (e.g., S. marcescens), Pseudomonas (e.g., P. aeruginosa), Salmonella (e.g., S. typhimurium and S. typhi), and further including, Chloroflexus bacteria (e.g.,
  • C. aurantiacus Chloronema (e.g., C. gigateum), green sulfur bacteria (e.g., Chlorobium bacteria (e.g., C. limicola), Pelodictyon (e.g., P. luteolum)), purple sulfur bacteria (e.g., Chromatium (e.g., C. okenii)), and purple non-sulfur bacteria (e.g., Rhodospirillum (e.g., R. rubrum), Rhodobacter (e.g., R. sphaeroides and R. capsulatus), and Rhodomicrobium bacteria (e.g., R. vanellii).
  • Chloronema e.g., C. gigateum
  • green sulfur bacteria e.g., Chlorobium bacteria (e.g., C. limicola)
  • Pelodictyon e.g., P. luteolum
  • purple sulfur bacteria e.g.,
  • the host organism is a plant.
  • a plant or plant cell can be transformed by having a heterologous gene integrated into its genome, i.e., it can be stably transformed. Stably transformed cells typically retain the introduced nucleic acid with each cell division.
  • a plant or plant cell can also be transiently transformed such that the recombinant gene is not integrated into its genome. Transiently transformed cells typically lose all or some portion of the introduced nucleic acid with each cell division such that the introduced nucleic acid cannot be detected in daughter cells after a certain number of cell divisions. Both transiently transformed and stably transformed transgenic plants and plant cells can be useful in the methods described herein.
  • Plant cells comprising a heterologous gene used in methods described herein can constitute part or all of a whole plant. Such plants can be grown in a manner suitable for the species under consideration, either in a growth chamber, a greenhouse, or in a field. Plants may also be progeny of an initial plant comprising a heterologous gene provided the progeny inherits the heterologous gene. Seeds produced by a transgenic plant can be grown and then selfed (or outcrossed and selfed) to obtain seeds homozygous for the nucleic acid construct. [00184]
  • the plants to be used with the invention can be grown in suspension culture, or tissue or organ culture. For the purposes of this invention, solid and/or liquid tissue culture techniques can be used.
  • plant cells When using solid medium, plant cells can be placed directly onto the medium or can be placed onto a filter that is then placed in contact with the medium.
  • transgenic plant cells When using liquid medium, transgenic plant cells can be placed onto a flotation device, e.g. , a porous membrane that contacts the liquid medium.
  • a reporter sequence encoding a reporter polypeptide having a reporter activity can be included in the transformation procedure and an assay for reporter activity or expression can be performed at a suitable time after transformation.
  • a suitable time for conducting the assay typically is about 1-21 days after transformation, e.g., about 1-14 days, about 1-7 days, or about 1-3 days.
  • the use of transient assays is particularly convenient for rapid analysis in different species, or to confirm expression of a heterologous polypeptide whose expression has not previously been confirmed in particular recipient cells.
  • nucleic acids into monocotyledonous and dicotyledonous plants are known in the art, and include, without limitation, Agrobacterium-mediated transformation, viral vector-mediated transformation, electroporation and particle gun transformation, U.S. Patent Nos. 5,538,880; 5,204,253; 6,329,571; and 6,013,863. If a cell or cultured tissue is used as the recipient tissue for transformation, plants can be regenerated from transformed cultures if desired, by techniques known to those skilled in the art.
  • the plant comprising a heterologous nucleic acid to be used with the present invention can for example be: corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), sunflower (Helianthus annuus), wheat (Triticum aestivum and other species), Triticale, Rye (Secale) soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp), coconut (Cocos nucifera), pineapple (Anana comos
  • plants of the present invention are crop plants (for example, cereals and pulses, maize, wheat, potatoes, tapioca, rice, sorghum, millet, cassava, barley, pea, sugar beets, sugar cane, soybean, oilseed rape, sunflower and other root, tuber or seed crops).
  • crop plants for example, cereals and pulses, maize, wheat, potatoes, tapioca, rice, sorghum, millet, cassava, barley, pea, sugar beets, sugar cane, soybean, oilseed rape, sunflower and other root, tuber or seed crops.
  • Other important plants may be fruit trees, crop trees, forest trees or plants grown for their use as spices or pharmaceutical products.
  • Mentha spp., clove, Artemisia spp., Thymus spp., Lavandula spp., Allium spp., Hypericum, Catharanthus spp., Vinca spp., Papaver spp., Digitalis spp., Rauwolfia spp., Vanilla spp., Petroselinum spp., Eucalyptus, tea tree, Picea spp., Pinus spp., Abies spp., Juniperus spp. Horticultural plants which can be used with the present invention may include lettuce, endive, and vegetable brassicas including cabbage, broccoli, and cauliflower, carrots, and carnations and geraniums.
  • the plant can also be tobacco, cucurbits, carrot, strawberry, sunflower, tomato, pepper, or Chrysanthemum.
  • the plant may also be a grain plant, for example an oil-seed plant or a leguminous plant.
  • Seeds of interest include grain seeds, such as corn, wheat, barley, sorghum, rye, etc.
  • Oil-seed plants include cotton, soybean, safflower, sunflower, Brassica, maize, alfalfa, palm, coconut, etc.
  • Leguminous plants include beans and peas. Beans include guar, locust bean, fenugreek, soybean, garden beans, cowpea, mung bean, lima bean, fava bean, lentils, chickpea.
  • the plant can be maize, rice, wheat, sugar beet, sugar cane, tobacco, oil seed rape, potato, soybean, or Arabidopsis thaliana. In some embodiments, the plant is not C. forskohlii.
  • SEQ ID NO:66 DNA sequence encoding CYP76AH15
  • SEQ ID NO:67 DNA sequence encoding CYP76AH15
  • Example 1 Identification and comparison of the putative SRS regions in the C. forskohlii CYP76AH
  • SRS1-6 The putative substrate recognition sites 1-6 (SRS1-6) of cytochrome P450s were selected for site-directed mutagenesis, as these areas comprise residues important for the catalytic activities of CYPs.
  • the SRS regions were identified in CYP76AH8 (SEQ ID NO:20), CYP76AH11 (SEQ ID NO:21), CYP76AH15 (SEQ ID NO:22), CYP76AH16 (SEQ ID NO:19), and CYP76AH17 (SEQ ID NO:23) by alignments and comparisons of reported SRS regions in the Rattus norvegicus CYP2A1 (SEQ ID NO:39), Hyoscyamus muticus CYP71D55 (SEQ ID NO:40), and Thapsia villosa CYP71AJ6 (SEQ ID NO:41).
  • SRS regions of R. norvegicus CYP2A1 SEQ ID NO:39
  • H. muticus CYP71D55 SEQ ID NO:40
  • T. villosa CYP71AJ6 SEQ ID NO:41
  • SRS regions of CYP76AH8 SEQ ID NO:20
  • CYP76AH11 SEQ ID NO:21
  • CYP76AH15 SEQ ID NO:22
  • CYP76AH16 SEQ ID NO:19
  • CYP76AH17 SEQ ID NO:23
  • Comparative homology modeling was furthermore utilized to determine and visualize SRS regions and to identify residues involved in regulating the activity of the CYP76AH15 enzyme. See Figure 5C. Homology modeling was carried out using UCSF Chimera version 1.10.2 (University of California) and Modeller 9.15 (University of California). BLAST searches in the PDB database were carried out using CYP76AH sequences as a query to find templates. Two templates were utilized for each model, using default settings in Modeller except for inclusion of the HEME heteroatom from the 3RUK template (Table 2).
  • CYP76AH15, CYP76AH8, and CYP76AH17 The SRS regions of CYP76AH enzymes that catalyze formation of 11-oxo-13R-MO (CYP76AH15, CYP76AH8, and CYP76AH17) were compared. See Table 3 and Figure 5D. CYP76AH8 and CYP76AH17 were found to share an overall sequence identity of 88%, whereas the total sequence identity in the SRS regions was found to be 99% with a single conservative amino acid substitution in SRS1 of A117S in CYP76AH17, suggesting a higher sequence conservation in these areas.
  • SRS1-6 Differences in SRS1-6 between CYP76AH15 and CYP76AH8/CYP76AH17 were mainly in SRS1, SRS3, and SRS6, with sequence identities below the overall sequence identities, suggesting that SRS1, SRS3, and SRS6 could be responsible for the catalytic differences between CYP76AH15 and either CYP76AH8 or CYP76AH17.
  • SRS regions of CYP76AH11 and CYP76AH16, which carry out distinct reactions in the forskolin pathway (CYP76AH11 is multifunctional, and CYP76AH16 is regio-specific towards C-9 hydroxylation), were also compared. See Table 3 and Figure 5D.
  • the SRS differences between CYP76AH11 and CYP76AH16 shared a similar pattern to the differences between CYP76AH8/CYP76AH17 and CYP76AH15; differences were found in SRS1 (54% identity), SRS3 (50% identity), and SRS6 (63% identity).
  • SRS5 was not conserved between CYP76AH11 and CYP76AH16.
  • SRS putative substrate recognition sites
  • CYP76AH15 variant enzymes were conducted using transient N. benthamiana (tobacco) in vivo expression.
  • CYP76AH15 variants were created using site-directed mutagenesis; variants were subsequently cloned for Agrobacterium-mediated transient tobacco expression together with TPS2 (SEQ ID NO:16), TPS3 (SEQ ID NO:17), anti-post transcriptional suppressor protein P19 (SEQ ID NO:68), DXS (SEQ ID NO:30), and C. forskohlii GGPPS (SEQ ID NO:32).
  • Agrobacterium suspensions normalized to an OD600 of 1 were mixed in equivalent volumes prior to infiltrations.
  • RT room temperature
  • the Shimadzu GCMS-QP2010 Ultra system was utilized for GC-MS analysis using an Agilent HP-5MS column (30 m x 0.25 mm i.d., 0.25 µm film thickness). Injection volume was set to 1 µL and the injection temperature to 250°C, with the following GC program: 50°C for 2 min, ramp at 4°C/min to 110°C, ramp at 8°C/min to 250°C, ramp at 10°C/min to 310°C, and hold for 5 min. The ion source temperature of the mass spectrometer was set to 250°C, and spectra were recorded from m/z 50 to m/z 350.
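As a rough check on the oven program described above, the total run time can be worked out from the holds and ramps. The sketch below is only an illustration: the helper name `total_run_time` and the step encoding are assumptions introduced here, not part of any instrument software, and the first ramp target is read as 110°C.

```python
# Minimal sketch: total GC oven run time from a list of holds and ramps.
# The step encoding and function name are hypothetical, chosen for illustration.

def total_run_time(start_temp_c, steps):
    """Return the total run time in minutes.

    Each step is either ("hold", minutes) or ("ramp", rate_c_per_min, target_c).
    """
    temp = start_temp_c
    minutes = 0.0
    for step in steps:
        if step[0] == "hold":
            minutes += step[1]
        elif step[0] == "ramp":
            _, rate_c_per_min, target_c = step
            # Time spent on a linear ramp is temperature change divided by rate.
            minutes += (target_c - temp) / rate_c_per_min
            temp = target_c
        else:
            raise ValueError(f"unknown step kind: {step[0]!r}")
    return minutes


# Oven program transcribed from the text: start at 50 C.
GC_PROGRAM = [
    ("hold", 2),        # 50 C for 2 min
    ("ramp", 4, 110),   # 4 C/min to 110 C
    ("ramp", 8, 250),   # 8 C/min to 250 C
    ("ramp", 10, 310),  # 10 C/min to 310 C
    ("hold", 5),        # final hold for 5 min
]

run_time = total_run_time(50, GC_PROGRAM)
print(f"Total run time: {run_time} min")
```

Under these assumptions the program amounts to 2 + 15 + 17.5 + 6 + 5 = 45.5 min per injection.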
  • the SRS1 variant A99I accumulated 11β-hydroxy-13R-MO and 11-oxo-13R-MO (Figure 6B).
  • the SRS5 variant L366F accumulated mainly 11-oxo-13R-MO (Figure 6A) as well as three putative single hydroxylated derivatives of the following formula: C20H32O2 (Figure 7A, compounds a/b/c).
  • An SRS5 variant, G362V was inactive when combined with L366F; the G362V L366F variant produced 11β-hydroxy-13R-MO but accumulated unused 13R-MO (Figure 6A).
  • Variant L366E displayed a similar pattern as A99I with mainly 11β-hydroxy-13R-MO and 11-oxo-13R-MO accumulating (Figure 6A) and further oxygenated derivatives (Figure 7A).
  • the SRS1+SRS5 variant A99I L366F demonstrated a profile similar to that of A99I, while the similar combination A99I L366E was inactive (Figure 6B).
  • CYP76AH15 In addition to CYP76AH15 producing 11-oxo-13R-MO from 13R-MO, it also produces ferruginol from abietatriene. CYP76AH15 variants were tested for their ability to produce ferruginol from abietatriene to further study effects from mutagenesis in SRS sites. CYP76AH15 and CYP76AH15 variants were transiently expressed together with C. forskohlii TPS1 (SEQ ID NO:65) and C. forskohlii TPS3 (SEQ ID NO:17) producing miltiradiene and abietatriene.
  • C. forskohlii TPS1 SEQ ID NO:65
  • C. forskohlii TPS3 SEQ ID NO:17
  • the CYP76AH15 variants were individually genomically integrated in an S. cerevisiae strain further comprising C. forskohlii POR (SEQ ID NO:34), C. forskohlii TPS2 (SEQ ID NO:16), C. forskohlii TPS3 (SEQ ID NO: 17), and Synechococcus sp. GGPPS (SEQ ID NO:37).
  • a control strain comprising C. forskohlii POR (SEQ ID NO:34) and no CYPs accumulated 13R-MO ( Figures 9A, 9B).
  • An S. cerevisiae strain producing miltiradiene and abietatriene was also constructed to determine the effects of CYP76AH15 and CYP76AH15 variants on formation of ferruginol in S. cerevisiae.
  • the strain further comprised C. forskohlii TPS1 (SEQ ID NO:65), C. forskohlii TPS3 (SEQ ID NO: 17), Synechococcus sp. GGPPS (SEQ ID NO:37), and C. forskohlii POR (SEQ ID NO:34).
  • the control strain comprising C.
  • forskohlii TPS1 SEQ ID NO:65
  • C. forskohlii TPS3 SEQ ID NO:17
  • Synechococcus sp. GGPPS SEQ ID NO:37
  • C. forskohlii POR SEQ ID NO:34
  • Example 4 Step-wise incorporation of CYP76AH15 and CYP76AH15 A99I variant with CYP76AH11 or CYP76AH16 in S. cerevisiae
  • CYP76AH enzymes accept multiple diterpene substrates and produce several products from one substrate. See also, Ignea et al., 2016, Microb. Cell Fact. 15:46 and Guo et al., 2016, New Phytol. 210:525-34.
  • CYP76AH subfamily members are involved in production of ferruginol and 11-hydroxy-ferruginol in rosemary and sage (Figure 12A).
  • the promiscuous nature of CYP76AH subfamily members from rosemary, sage, and C. forskohlii on 13R-MO was explored.
  • the enzymes were transiently expressed in N. benthamiana with diterpene synthases. As shown in Figures 13A and 13B, enzymes from rosemary and sage species, including R. officinalis CYP76AH4 (SEQ ID NO:71), R. officinalis FS1 (SEQ ID NO:70), S. fruticosa FS (SEQ ID NO:73), and R. officinalis CYP76AH6 (SEQ ID NO:72) were able to produce the forskolin precursors 11-oxo-13R-MO and 11β-hydroxy-13R-MO.
  • the strain further comprised C. forskohlii TPS1 (SEQ ID NO:65), C. forskohlii TPS3 (SEQ ID NO: 17), Synechococcus sp. GGPPS (SEQ ID NO:37), and C. forskohlii POR (SEQ ID NO:34).
  • the control strain comprising C. forskohlii POR (SEQ ID NO:34) and no CYPs produced miltiradiene and abietatriene ( Figure 10).
  • CYP76AH15 F476T (SEQ ID NO:54), CYP76AH15 L478M (SEQ ID NO:55), CYP76AH15 L478I (SEQ ID NO:56), or CYP76AH15 L478A (SEQ ID NO:57) resulted in increased production of ferruginol.
  • Example 7 Product Profiles of S. cerevisiae strains expressing CYP76AH15 variants
  • CYP76AH15 variants shown in Table 6, below, expressed in 13R-MO- or miltiradiene-producing S. cerevisiae strains, as described in Examples 3 and 6, were compared to the diterpene accumulation pattern of corresponding S. cerevisiae strains expressing CYP76AH15 (SEQ ID NO:22). Table 6. CYP76AH15 variants tested in N. benthamiana and characterization of diterpene product profile.
  • CYP76AH15 A99I resulted in a two-fold increase in the total amount of diterpenes accumulated, relative to expression of CYP76AH15 (SEQ ID NO:22), and a 3.7-fold increase in the amount of 11-oxo-13R-MO accumulated (accounting for 99% of the total amount of diterpenes produced).
  • Expression of CYP76AH15 A99I also resulted in a near-depletion of 13R-MO, an 18-fold decrease relative to expression of CYP76AH15.
  • CYP76AH15 S235G Y236F (SEQ ID NO:48), CYP76AH15 L366F (SEQ ID NO:50), CYP76AH15 L366E (SEQ ID NO:52), CYP76AH15 A99I S235G Y236F (SEQ ID NO:62), or CYP76AH15 A99I L366F (SEQ ID NO:58) also resulted in an amount of accumulated 11-oxo-13R-MO that accounted for more than 93% of the total amount of diterpenes produced. See Table 7 and Figure 15.
  • LAIIEGFLNE RIESRRTNPN APKKDDFLET LVDTLQTNDN KLKTDHLTHL MLDLFVGGSE 300
  • LGLIEGYLNE RIEFRKANPN APKKDDFLET LVDALDAKDY KLKTEHLTHL MLDLFVGGSE 300
  • HAMBERGER & BAK "Plant P450s as versatile drivers for evolution of species-specific chemical diversity," Philos Trans R Soc Lond B Biol Sci. 368(1612):20120426 (January 2013).

Abstract

The invention relates to recombinant microorganisms and methods for producing 13R-MO and/or 13R-MO derivatives, including forskolin.

Description

BIOSYNTHESIS OF 13R-MANOYL OXIDE DERIVATIVES
BACKGROUND OF THE INVENTION
Field of the Invention
[0001] The present invention relates to the field of biosynthesis of substituted diterpenes. More specifically, the invention relates to methods for biosynthesis of 13R-manoyl oxide (13R-MO) and 13R-MO derivatives, including biosynthesis of forskolin.
Description of Related Art
[0002] Forskolin is a complex functionalized derivative of 13R-MO requiring regio- and stereospecific oxidation of five carbon positions. Forskolin is a diterpene naturally produced by Coleus forskohlii. Forskolin, oxidized variants of forskolin, and/or acetylated variants of forskolin have been suggested as useful in treatment of a number of clinical conditions. Forskolin has been shown to decrease intraocular pressure and can be used as an antiglaucoma agent in the form of eye drops. See Wagh et al., 2012, J Postgrad Med. 58(3):199-202. Moreover, a water-soluble analogue of forskolin (NKH477), which has been shown to have vasodilatory effects when administered intravenously, has been approved for commercial use in Japan for treatment of acute heart failure and heart surgery complications. See Kikura et al., 2004, Pharmacol Res 49:275-81. Forskolin, which also acts as a bronchodilator, can be used for asthma treatments. See Yousif & Thulesius, 1999, J Pharm Pharmacol. 51(2):181-6. In addition, forskolin may help to treat obesity by contributing to higher rates of body fat burning and promoting lean body mass formation. See Godard et al., 2005, Obes Res. 13:1335-43.
[0003] Forskolin has previously been purified from C. forskohlii roots using environmentally unfriendly organic solvents or produced chemically by cost-ineffective procedures (Delpech et al., 1996, Tetrahedron Letters 37(7):1019-22). Therefore, there remains a need in the art for methods for biosynthesis of forskolin and forskolin precursors.
SUMMARY OF THE INVENTION
[0004] It is against the above background that the present invention provides certain advantages and advancements over the prior art.
[0005] Although this invention as disclosed herein is not limited to specific advantages or functionalities, the invention provides a recombinant host cell capable of producing ferruginol, 13R-manoyl oxide (13R-MO), and/or a 13R-MO derivative, comprising a recombinant gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, having at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:22, and further having at least one amino acid substitution corresponding to residues 99, 100, 104, 207, 235, 236, 362, 366, 473, 474, 476, and/or 478 of SEQ ID NO:22.
[0006] In one aspect of the recombinant host cell disclosed herein, the polypeptide comprises an A99I, A100V, G104D, V207T, S235G, Y236F, G362V, L366F, L366E, D473E, D474L, F476T, L478M, L478A, and/or L478I substitution corresponding to SEQ ID NO:22.
[0007] In one aspect of the recombinant host cell disclosed herein, the polypeptide comprises:
(a) an A99I substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(b) an S235G and Y236F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(c) an L366F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(d) an L366E substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(e) an A99I, S235G, and Y236F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(f) an A99I and L366F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(g) an S235G, Y236F, and L366E substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(h) an A99I, S235G, Y236F, and L366F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(i) a G362V and L366F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(k) a G362V substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22; or
(l) a D473E and D474L substitution, and a P475 deletion in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22.
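The substitution notation used in the list above (wild-type residue, 1-based position, replacement residue, as in A99I) can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `apply_substitutions` and the toy sequence are hypothetical, the toy sequence is not SEQ ID NO:22, and deletions such as the P475 deletion are not handled.

```python
import re

# Pattern for a single substitution such as "A99I":
# wild-type residue, 1-based position, replacement residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Return a copy of `sequence` with each substitution applied.

    Raises ValueError if a substitution is malformed, out of range, or if the
    wild-type residue named in the substitution does not match the sequence.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"not a substitution: {sub!r}")
        wild_type = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{sub}: expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Toy 10-residue sequence (hypothetical, for illustration only).
toy_sequence = "MASGLYAGLV"
toy_variant = apply_substitutions(toy_sequence, ["A2I", "L5F"])
print(toy_variant)
```

Checking the wild-type residue before substituting guards against numbering errors, which matters when positions are given as "corresponding to" a reference sequence rather than as positions in the variant itself.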
[0008] The invention also provides a recombinant host cell capable of producing ferruginol, 13R-manoyl oxide (13R-MO), and/or a 13R-MO derivative, comprising a recombinant gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, having at least one amino acid substitution corresponding to residues 93-116; 202-209; 233-240; 286-304; 359-369 or 473-480 of SEQ ID NO:22.
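The six residue windows recited above can be related to the individual positions listed in paragraph [0005] with a simple lookup. The sketch below assumes, for illustration, that the six windows correspond in order to SRS1 through SRS6 (consistent with A99 being described as an SRS1 residue and L366 as an SRS5 residue) and reads the first window as 93-116; the names `SRS_WINDOWS` and `srs_region` are hypothetical.

```python
# Residue windows of SEQ ID NO:22 as recited in the text.
# Labeling them SRS1..SRS6 in order is an assumption made for this sketch.
SRS_WINDOWS = {
    "SRS1": (93, 116),
    "SRS2": (202, 209),
    "SRS3": (233, 240),
    "SRS4": (286, 304),
    "SRS5": (359, 369),
    "SRS6": (473, 480),
}


def srs_region(position):
    """Return the label of the window containing `position`, or None."""
    for name, (start, end) in SRS_WINDOWS.items():
        if start <= position <= end:
            return name
    return None


# Positions named in paragraph [0005].
POSITIONS = [99, 100, 104, 207, 235, 236, 362, 366, 473, 474, 476, 478]

for position in POSITIONS:
    print(position, srs_region(position))
```

Under this labeling, every position listed in paragraph [0005] falls inside one of the six windows, and none falls in the 286-304 window.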
[0009] The invention also provides a recombinant host cell capable of producing ferruginol, 13R-manoyl oxide (13R-MO), and/or a 13R-MO derivative, comprising a recombinant gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, having at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, or SEQ ID NO:73.
[0010] In one aspect of the recombinant host cell disclosed herein, the 13R-MO derivative is an oxidized 13R-MO derivative.
[0011] In one aspect of the recombinant host cell disclosed herein, the 13R-MO derivative is 11-oxo-13R-MO and/or 11β-hydroxy-13R-MO.
[0012] In one aspect of the recombinant host cell disclosed herein, the 13R-MO derivative is forskolin.
[0013] In one aspect of the recombinant host cell disclosed herein, the recombinant host cell further comprises:
(a) a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate;
(b) a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions;
(c) a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO;
(d) a gene encoding a polypeptide capable of synthesizing GGPP from farnesyl diphosphate (FPP) and isopentenyl diphosphate (IPP);
(e) a gene encoding a polypeptide capable of synthesizing 1-deoxy-D-xylulose 5-phosphate (DXS) from pyruvate and D-glyceraldehyde 3-phosphate;
(f) a gene encoding a polypeptide capable of reducing cytochrome P450 complex; and/or
(g) a gene encoding an anti-post transcriptional suppressor protein polypeptide, wherein at least one of the genes is a recombinant gene.
[0014] In one aspect of the recombinant host cell disclosed herein:
(a) the polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate comprises a polypeptide having at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:16 or SEQ ID NO:17, or at least 40% sequence identity to the amino acid sequence set forth in SEQ ID NO:18;
(b) the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-position comprises a polypeptide having at least 55% sequence identity to the amino acid sequence set forth in SEQ ID NO:19, or at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:20, SEQ ID NO:21, or SEQ ID NO:23;
(c) the polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO comprises a polypeptide having at least 40% sequence identity to SEQ ID NO:26;
(d) the polypeptide capable of synthesizing GGPP from FPP and IPP comprises a polypeptide having at least 70% sequence identity to the amino acid sequence set forth in SEQ ID NO:32 or SEQ ID NO:37;
(e) the polypeptide capable of synthesizing DXS from pyruvate and D-glyceraldehyde 3-phosphate comprises a polypeptide having at least 75% sequence identity to the amino acid sequence set forth in SEQ ID NO:30;
(f) the polypeptide capable of reducing cytochrome P450 complex comprises a polypeptide having at least 75% sequence identity to the amino acid sequence set forth in SEQ ID NO:34; and/or
(g) the anti-post transcriptional suppressor protein polypeptide comprises a polypeptide having at least 65% sequence identity to the amino acid sequence set forth in SEQ ID NO:68.
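The sequence-identity thresholds recited above (for example, at least 50% or at least 75% identity to a reference sequence) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned, with "-" marking gaps, and counts identical residues over all aligned columns; other definitions of percent identity use a different denominator, and the alignment method itself is outside this sketch. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap ("-") are ignored; a residue
    aligned to a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_query, aligned_reference, threshold_percent):
    """Return True if identity is at least `threshold_percent`."""
    return percent_identity(aligned_query, aligned_reference) >= threshold_percent


# Toy aligned sequences (hypothetical, for illustration only).
query = "MKT-LLV"
reference = "MKTALIV"

print(round(percent_identity(query, reference), 1))
print(meets_threshold(query, reference, 50))
print(meets_threshold(query, reference, 75))
```

In the toy alignment, 5 of 7 columns are identical, so the pair would satisfy a 50% threshold but not a 75% threshold under this particular definition.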
[0015] In one aspect of the recombinant host cell disclosed herein, the recombinant host cell comprises a plant cell, a mammalian cell, an insect cell, a fungal cell, an algal cell or a bacterial cell.
[0016] In one aspect of the recombinant host cell disclosed herein, the bacterial cell comprises Escherichia cells, Lactobacillus cells, Lactococcus cells, Corynebacterium cells, Acetobacter cells, Acinetobacter cells, or Pseudomonas cells.
[0017] In one aspect of the recombinant host cell disclosed herein, the fungal cell comprises a yeast cell.
[0018] In one aspect of the recombinant host cell disclosed herein, the yeast cell is a cell from Saccharomyces cerevisiae, Schizosaccharomyces pombe, Yarrowia lipolytica, Candida glabrata, Ashbya gossypii, Cyberlindnera jadinii, Pichia pastoris, Kluyveromyces lactis, Hansenula polymorpha, Candida boidinii, Arxula adeninivorans, Xanthophyllomyces dendrorhous, or Candida albicans species.
[0019] In one aspect of the recombinant host cell disclosed herein, the yeast cell is a Saccharomycete.
[0020] In one aspect of the recombinant host cell disclosed herein, the yeast cell is a Saccharomyces cerevisiae cell.
[0021] In one aspect of the recombinant host cell disclosed herein, the plant cell is a Nicotiana benthamiana cell.
[0022] The invention also provides a method of producing 13R-MO and/or a 13R-MO derivative in a cell culture, comprising growing the recombinant host cell disclosed herein in the cell culture, under conditions in which the genes are expressed, and wherein 13R-MO and/or the 13R-MO derivative is produced by the recombinant host cell.
[0023] In one aspect of the method disclosed herein, the recombinant host cell is grown in a fermentor at a temperature for a period of time, wherein the temperature and period of time facilitate the production of 13R-MO and/or the 13R-MO derivative.
[0024] In one aspect, the method disclosed herein further comprises a step of isolating 13R-MO and/or the 13R-MO derivative.
[0025] In one aspect of the method disclosed herein, the 13R-MO derivative is 11-oxo-13R-MO, 11β-hydroxy-13R-MO or forskolin.
[0026] The invention also provides a 13R-MO derivative composition produced by the recombinant host cell or the method disclosed herein.
[0027] In one aspect of the 13R-MO derivative composition disclosed herein, the 13R-MO derivative composition comprises 11-oxo-13R-MO, 11β-hydroxy-13R-MO, and/or forskolin.
[0028] These and other features and advantages of the present invention will be more fully understood from the following detailed description taken together with the accompanying claims. It is noted that the scope of the claims is defined by the recitations therein and not by the specific discussion of features and advantages set forth in the present description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] The following detailed description of the embodiments of the present invention can be best understood when read in conjunction with the following drawings, where like structure is indicated with like reference numerals and in which:
[0030] Figure 1 shows the structure of 13R-MO ((3R,4aR,10aS)-3,4a,7,7,10a-pentamethyl-3-vinyldodecahydro-1H-benzo[f]chromene) and formulas for 13R-MO derivatives.
[0031] Figure 2A shows a hypothetical biosynthetic route to forskolin in C. forskohlii proposed by Asada et al., Phytochemistry 79 (2012) 141-146. Figure 2B shows a reaction capable of being catalyzed by a terpene synthase, such as terpene synthase 2 (TPS2; SEQ ID NO:16). For example, conversion of geranylgeranyl diphosphate to (5S,8R,9R,10R)-labda-8-ol diphosphate is capable of being catalyzed by TPS2 of SEQ ID NO:16. Figure 2C shows a reaction capable of being catalyzed by a terpene synthase, such as terpene synthase 3 (TPS3; SEQ ID NO:17) or terpene synthase 4 (TPS4; SEQ ID NO:18). For example, conversion of (5S,8R,9R,10R)-labda-8-ol diphosphate to 13R-MO is capable of being catalyzed by TPS3 (SEQ ID NO:17) or TPS4 (SEQ ID NO:18).
[0032] Figure 3 shows 13R-MO-derived oxygenated products produced by cytochrome P450 (CYP) 76AH8 (CYP76AH8 of SEQ ID NO:20), CYP76AH17 (SEQ ID NO:23), CYP76AH15 (SEQ ID NO:22), CYP76AH11 (SEQ ID NO:21), and CYP76AH16 (SEQ ID NO:19). The empirical formulas of the oxygenated products formed by the CYPs are shown. Each compound is marked by a letter and number; compounds identified by the same number are isomers of one another. The structures of 11-oxo-13R-manoyl oxide (compound 2), 9-hydroxy-13R-manoyl oxide (compound 3a), 1,11-dihydroxy-13R-manoyl oxide (compound 5d), 1,9-dideoxydeacetyl-forskolin (compound 7h), and 9-deoxydeacetyl-forskolin (compound 10b) are also shown in Figure 3.
[0033] Figure 4 shows diterpene biosynthetic pathways from geranylgeranyl diphosphate (GGPP) towards forskolin and ferruginol, compounds present in the root cork cells of C. forskohlii.
[0034] Figure 5A shows alignments of Rattus norvegicus CYP2A1 (SEQ ID NO:39), C. forskohlii CYP76AH8 (SEQ ID NO:20), C. forskohlii CYP76AH15 (SEQ ID NO:22), Hyoscyamus muticus CYP71D55 (SEQ ID NO:40), and Thapsia villosa CYP71AJ6 (SEQ ID NO:41) for substrate recognition site (SRS) identification. Highlighted areas indicate identified SRS regions for R. norvegicus CYP2A1 by Gotoh, 1997, J. Biol. Chem. 1 (5):83-90, H. muticus CYP71D55 by Takahashi et al., 2007, J. Biol. Chem. 282(43):31744-54 and T. villosa CYP71AJ6 by Dueholm et al., 2015, BMC Evolutionary Biology 15:122. Figure 5B shows SRS regions for CYP76AH15 (SEQ ID NO:22), CYP76AH8 (SEQ ID NO:20), CYP76AH17 (SEQ ID NO:23), CYP76AH11 (SEQ ID NO:21), and CYP76AH16 (SEQ ID NO:19). See Example 1. Figure 5C shows homology model structures of SRS1-6 from CYP76AH15 and identifies the A99, S235, Y236, L366, and G362 residues. Figure 5D shows the residues of CYP76AH15 selected for mutagenesis (squares).
[0035] Figure 6A shows GC-MS chromatograms of hexane extracted tobacco leaf discs comprising native CYP76AH15 or CYP76AH15 SRS5 variants. Figure 6B shows GC-MS chromatograms analyzing products produced by CYP76AH14 SRS1 and CYP76AH14 SRS1+SRS5 variants. Figure 6C shows the structures of 13R-MO, 11-oxo-13R-MO, and 11β-hydroxy-13R-MO. See Example 2.
[0036] Figure 7A shows GC-MS chromatograms analyzing a control N. benthamiana plant, an N. benthamiana plant comprising native CYP76AH15, and N. benthamiana plants comprising CYP76AH15 SRS5 variants. Figure 7B shows mass spectra of compounds with parent ions of m/z=320 indicating single hydroxylated 13R-MO derivatives. Figure 7C shows a mass spectrum of a putative double hydroxylated 13R-MO derivative with m/z=307. See Example 2.
[0037] Figure 8A shows GC-MS chromatograms and variants producing miltiradiene, abietatriene, and/or ferruginol products from control and N. benthamiana plants expressing CYP76AH15 variants. Figure 8B shows mass spectra and structures of produced miltiradiene, abietatriene, and ferruginol. See Example 2.
[0038] Figure 9A shows GC-MS chromatograms analyzing a control S. cerevisiae strain comprising C. forskohlii POR (SEQ ID NO:34), strains comprising native CYP76AH15 (SEQ ID NO:22), or strains comprising a CYP76AH15 variant. Figure 9B shows relative yields of 13R-MO, 11-oxo-13R-MO, and 11β-hydroxy-13R-MO (measured as total ion chromatogram (TIC) area of compound of interest normalized to a standard). Figure 9C shows fold changes (compared to native CYP76AH15) of the CYP76AH15 variants. Figures 9D and 9E show GC-MS chromatograms of a control strain comprising C. forskohlii POR (SEQ ID NO:34) or strains comprising a CYP76AH15 variant. See Example 3.
[0039] Figures 10A and 10B show GC-MS chromatograms showing miltiradiene, abietatriene, and/or ferruginol produced by a control S. cerevisiae strain, an S. cerevisiae strain comprising CYP76AH15 or a CYP76AH15 variant, or an S. cerevisiae strain comprising CYP76AH8. See Example 3.
[0040] Figure 11 shows GC-MS chromatograms of 13R-MO derivatives produced by a control S. cerevisiae strain or an S. cerevisiae strain comprising i) CYP76AH15 (SEQ ID NO:22) or CYP76AH15 A99I (SEQ ID NO:42) and ii) CYP76AH11 (SEQ ID NO:21) or CYP76AH17 (SEQ ID NO:23).
[0041] Figure 12A shows a biosynthetic pathway towards ferruginol in Salvia fructicosa (Sf) species and Rosmarinus officinalis (Ro). Class II+I diterpene synthases (diTPS) produce miltiradiene from GGPP, which can undergo spontaneous oxidation into abietatriene, which can further be converted to ferruginol and 11-hydroxy-ferruginol with a ferruginol synthase from S. fructicosa or R. officinalis and/or R. officinalis CYP76AH4. Figure 12B shows diterpene products produced using CYP76AH enzymes.
[0042] Figure 13A shows GC-MS chromatograms analyzing products produced by 13R-MO-producing N. benthamiana further expressing C. forskohlii CYP76AH15 (SEQ ID NO:22), R. officinalis CYP76AH4, R. officinalis FS1, or S. fructicosa FS. Figure 13B shows GC-MS chromatograms analyzing products produced by 13R-MO-producing N. benthamiana further expressing C. forskohlii CYP76AH8 (SEQ ID NO:20) or R. officinalis CYP76AH6.
[0043] Figure 14 shows the relative yields of abietatriene, miltiradiene, and ferruginol (shown from left to right, for each strain) produced by S. cerevisiae strains expressing CYP76AH15 or a CYP76AH15 variant. See Example 6.
[0044] Figure 15 shows the production titers of 13R-MO (diamonds), 11-oxo-13R-MO (dark squares), 11β-hydroxy-13R-MO (triangles), and C20H32O3 (light squares, WT CYP76AH15 only) of S. cerevisiae strains expressing CYP76AH15 or a CYP76AH15 variant, over 72 hours. See Example 8.
[0045] Skilled artisans will appreciate that elements in the Figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the Figures can be exaggerated relative to other elements to help improve understanding of the embodiment(s) of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0046] All publications, patents and patent applications cited herein are hereby expressly incorporated by reference for all purposes.
[0047] Before describing the present invention in detail, a number of terms will be defined. As used herein, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. For example, reference to a "nucleic acid" means one or more nucleic acids.
[0048] It is noted that terms like "preferably," "commonly," and "typically" are not utilized herein to limit the scope of the claimed invention or to imply that certain features are critical, essential, or even important to the structure or function of the claimed invention. Rather, these terms are merely intended to highlight alternative or additional features that can or cannot be utilized in a particular embodiment of the present invention.
[0049] For the purposes of describing and defining the present invention it is noted that the term "substantially" is utilized herein to represent the inherent degree of uncertainty that can be attributed to any quantitative comparison, value, measurement, or other representation. The term "substantially" is also utilized herein to represent the degree by which a quantitative representation can vary from a stated reference without resulting in a change in the basic function of the subject matter at issue.
[0050] Methods well known to those skilled in the art can be used to construct genetic expression constructs and recombinant cells according to this invention. These methods include in vitro recombinant DNA techniques, synthetic techniques, in vivo recombination techniques, and polymerase chain reaction (PCR) techniques. See, for example, techniques as described in Green & Sambrook, 2012, MOLECULAR CLONING: A LABORATORY MANUAL, Fourth Edition, Cold Spring Harbor Laboratory, New York; Ausubel et al., 1989, CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Greene Publishing Associates and Wiley Interscience, New York, and PCR Protocols: A Guide to Methods and Applications (Innis et al., 1990, Academic Press, San Diego, CA).
[0051] As used herein, the terms "polynucleotide," "nucleotide," "oligonucleotide," and "nucleic acid" can be used interchangeably to refer to nucleic acid comprising DNA, RNA, derivatives thereof, or combinations thereof, in either single-stranded or double-stranded embodiments depending on context as understood by the skilled worker.
[0052] As used herein, the terms "microorganism," "microorganism host," and "microorganism host cell" can be used interchangeably. As used herein, the terms "recombinant host" and "recombinant host cell" can be used interchangeably. The person of ordinary skill in the art will appreciate that the terms "microorganism," "microorganism host," and "microorganism host cell," when used to describe a cell comprising a recombinant gene, may be taken to mean "recombinant host" or "recombinant host cell." As used herein, the term "recombinant host" is intended to refer to a host, the genome of which has been augmented by at least one DNA sequence. Such DNA sequences include but are not limited to genes that are not naturally present, DNA sequences that are not normally transcribed into RNA or translated into a protein ("expressed"), and other genes or DNA sequences which one desires to introduce into a host. It will be appreciated that typically the genome of a recombinant host described herein is augmented through stable introduction of one or more recombinant genes. Generally, introduced DNA is not originally resident in the host that is the recipient of the DNA, but it is within the scope of this disclosure to isolate a DNA segment from a given host, and to subsequently introduce one or more additional copies of that DNA into the same host, e.g., to enhance production of the product of a gene or alter the expression pattern of a gene. In some instances, the introduced DNA will modify or even replace an endogenous gene or DNA sequence by, e.g., homologous recombination or site-directed mutagenesis. Suitable recombinant hosts include microorganisms.
[0053] As used herein, the term "recombinant gene" refers to a gene or DNA sequence that is introduced into a recipient host, regardless of whether the same or a similar gene or DNA sequence may already be present in such a host. "Introduced," or "augmented" in this context, is known in the art to mean introduced or augmented by the hand of man. Thus, a recombinant gene can be a DNA sequence from another species or can be a DNA sequence that originated from or is present in the same species but has been incorporated into a host by recombinant methods to form a recombinant host. It will be appreciated that a recombinant gene that is introduced into a host can be identical to a DNA sequence that is normally present in the host being transformed, and is introduced to provide one or more additional copies of the DNA to thereby permit overexpression or modified expression of the gene product of that DNA. In some aspects, said recombinant genes are encoded by cDNA. In other embodiments, recombinant genes are synthetic and/or codon-optimized for expression in S. cerevisiae.
[0054] As used herein, the term "engineered biosynthetic pathway" refers to a biosynthetic pathway that occurs in a recombinant host, as described herein. In some aspects, one or more steps of the biosynthetic pathway do not naturally occur in an unmodified host. In some embodiments, a heterologous version of a gene is introduced into a host that comprises an endogenous version of the gene.
[0055] As used herein, the term "endogenous" gene refers to a gene that originates from and is produced or synthesized within a particular organism, tissue, or cell. In some embodiments, the endogenous gene is a yeast gene. In some embodiments, the gene is endogenous to S. cerevisiae, including, but not limited to S. cerevisiae strain S288C. In some embodiments, an endogenous yeast gene is overexpressed. As used herein, the term "overexpress" is used to refer to the expression of a gene in an organism at levels higher than the level of gene expression in a wild type organism. See, e.g., Prelich, 2012, Genetics 190:841-54. See, e.g., Giaever & Nislow, 2014, Genetics 197(2):451-65. As used herein, the terms "deletion," "deleted," "knockout," and "knocked out" can be used interchangeably to refer to an endogenous gene that has been manipulated to no longer be expressed in an organism, including, but not limited to, S. cerevisiae.
[0056] As used herein, the terms "heterologous sequence," "heterologous coding sequence," and "heterologous gene" are used to describe a sequence derived from a species other than the recombinant host. In some embodiments, the recombinant host is an S. cerevisiae cell, and a heterologous sequence is derived from an organism other than S. cerevisiae. A heterologous coding sequence, for example, can be from a prokaryotic microorganism, a eukaryotic microorganism, a plant, an animal, an insect, or a fungus different than the recombinant host expressing the heterologous sequence. A heterologous nucleic acid may be introduced into a host organism by recombinant methods. Thus, the genome of the host organism can be augmented by at least one incorporated heterologous nucleic acid sequence. It will be appreciated that typically the genome of a recombinant host described herein is augmented through the stable introduction of one or more heterologous nucleic acids encoding one or more enzymes. In some embodiments, a coding sequence is a sequence that is native to the host.
[0057] A "selectable marker" can be one of any number of genes that complement host cell auxotrophy, provide antibiotic resistance, or result in a color change. Linearized DNA fragments of the gene replacement vector then are introduced into the cells using methods well known in the art (see below). Integration of the linear fragments into the genome and the disruption of the gene can be determined based on the selection marker and can be verified by, for example, PCR or Southern blot analysis. Subsequent to its use in selection, a selectable marker can be removed from the genome of the host cell by, e.g., Cre-LoxP systems (see, e.g., Gossen et al., 2002, Ann. Rev. Genetics 36:153-173 and U.S. 2006/0014264). Alternatively, a gene replacement vector can be constructed in such a way as to include a portion of the gene to be disrupted, where the portion is devoid of any endogenous gene promoter sequence and encodes none, or an inactive fragment of, the coding sequence of the gene.
[0058] As used herein, the terms "variant" and "mutant" are used to describe a protein sequence that has been modified at one or more amino acids, compared to the wild-type sequence of a particular protein.
[0059] The terms "chimera," "fusion polypeptide," "fusion protein," "fusion enzyme," "fusion construct," "chimeric protein," "chimeric polypeptide," "chimeric construct," and "chimeric enzyme" can be used interchangeably herein to refer to proteins engineered through the joining of two or more genes that code for different proteins. In some embodiments, a nucleic acid sequence encoding a polypeptide can include a tag sequence that encodes a "tag" designed to facilitate subsequent manipulation (e.g., to facilitate purification or detection), secretion, or localization of the encoded polypeptide. Tag sequences can be inserted in the nucleic acid sequence encoding the polypeptide such that the encoded tag is located at either the carboxyl or amino terminus of the polypeptide. Non-limiting examples of encoded tags include green fluorescent protein (GFP), human influenza hemagglutinin (HA), glutathione S transferase (GST), polyhistidine-tag (HIS tag), and Flag™ tag (Kodak, New Haven, CT). Other examples of tags include a chloroplast transit peptide, a mitochondrial transit peptide, an amyloplast peptide, signal peptide, or a secretion tag.
[0060] In some embodiments, a fusion protein is a protein altered by domain swapping. As used herein, the term "domain swapping" is used to describe the process of replacing a domain of a first protein with a domain of a second protein. In some embodiments, the domain of the first protein and the domain of the second protein are functionally identical or functionally similar. In some embodiments, the structure and/or sequence of the domain of the second protein differs from the structure and/or sequence of the domain of the first protein.
[0061] As used herein, the term "inactive fragment" is a fragment of the gene that encodes a protein having, e.g., less than about 10% (e.g., less than about 9%, less than about 8%, less than about 7%, less than about 6%, less than about 5%, less than about 4%, less than about 3%, less than about 2%, less than about 1%, or 0%) of the activity of the protein produced from the full-length coding sequence of the gene. Such a portion of a gene is inserted in a vector in such a way that no known promoter sequence is operably linked to the gene sequence, but that a stop codon and a transcription termination sequence are operably linked to the portion of the gene sequence. This vector can be subsequently linearized in the portion of the gene sequence and transformed into a cell. By way of single homologous recombination, this linearized vector is then integrated in the endogenous counterpart of the gene with inactivation thereof.
[0062] As used herein, the terms "detectable amount," "detectable concentration," "measurable amount," and "measurable concentration" refer to a level of 13R-MO and/or 13R-MO derivative measured in AUC, μM/OD600, mg/L, μM, or mM. 13R-MO and/or 13R-MO derivatives (i.e., total, supernatant, and/or intracellular levels) can be detected and/or analyzed by techniques generally available to one skilled in the art, for example, but not limited to, liquid chromatography-mass spectrometry (LC-MS), thin layer chromatography (TLC), high-performance liquid chromatography (HPLC), ultraviolet-visible spectroscopy/spectrophotometry (UV-Vis), mass spectrometry (MS), and nuclear magnetic resonance spectroscopy (NMR).
[0063] As used herein, the term "undetectable concentration" refers to a level of a compound that is too low to be measured and/or analyzed by techniques such as TLC, HPLC, UV-Vis, MS, or NMR. In some embodiments, a compound of an "undetectable concentration" is not present in a 13R-MO and/or 13R-MO derivative composition.
[0064] As used herein, the term "contact" is used to refer to any physical interaction between two objects. For example, the term "contact" may refer to the interaction between an enzyme and a substrate. In another example, the term "contact" may refer to the interaction between a liquid (e.g., a supernatant) and an adsorbent resin.
[0065] 13R-MO and/or 13R-MO derivatives can be isolated using a method described herein. For example, following fermentation, a culture broth can be centrifuged for 30 min at 7000 rpm at 4°C to remove cells, or cells can be removed by filtration. The cell-free lysate can be obtained, for example, by mechanical disruption or enzymatic disruption of the host cells and additional centrifugation to remove cell debris. Mechanical disruption of the dried broth materials can also be performed, such as by sonication. The dissolved or suspended broth materials can be filtered using a micron or sub-micron filter prior to further purification, such as by preparative chromatography. The fermentation media or cell-free lysate can optionally be treated to remove low molecular weight compounds such as salt, and can optionally be dried prior to purification and re-dissolved in a mixture of water and solvent. The supernatant or cell-free lysate can be purified as follows: a column can be filled with, for example, HP20 Diaion resin (aromatic type Synthetic Adsorbent; Supelco) or other suitable non-polar adsorbent or reverse phase chromatography resin, and an aliquot of supernatant or cell-free lysate can be loaded on to the column and washed with water to remove the hydrophilic components. The 13R-MO and/or 13R-MO derivative product can be eluted by stepwise incremental increases in the solvent concentration in water or by a gradient. The levels of 13R-MO and/or 13R-MO derivatives in each fraction, including the flow-through, can then be analyzed by LC-MS. Fractions can then be combined and reduced in volume using a vacuum evaporator. Additional purification steps can be utilized, if desired, such as additional chromatography steps and crystallization.
[0066] As used herein, the terms "or" and "and/or" are utilized to describe multiple components in combination or exclusive of one another. For example, "x, y, and/or z" can refer to "x" alone, "y" alone, "z" alone, "x, y, and z," "(x and y) or z," "x or (y and z)," or "x or y or z." In some embodiments, "and/or" is used to refer to the exogenous nucleic acids that a recombinant cell comprises, wherein a recombinant cell comprises one or more exogenous nucleic acids selected from a group. In some embodiments, "and/or" is used to refer to production of 13R-MO and/or 13R-MO derivatives. In some embodiments, "and/or" is used to refer to production of 13R-MO and/or 13R-MO derivatives, wherein 13R-MO and/or 13R-MO derivatives are produced through the following steps: culturing a recombinant microorganism, synthesizing 13R-MO and/or 13R-MO derivatives in a recombinant microorganism, and/or isolating 13R-MO and/or 13R-MO derivatives.
[0067] As used herein, the term "diterpene" is used to refer to a compound derived or prepared from four isoprene units. A diterpene according to the invention is a C20-molecule comprising 20 carbon atoms. A diterpene typically comprises one or more ring structures, such as one or more monocyclic, bicyclic, tricyclic, or tetracyclic ring structure(s). The diterpene can comprise one or more double bonds. The diterpene can comprise up to three oxygen atoms, wherein the oxygen atom is generally present in the form of hydroxyl groups or part of a ring structure.
[0068] The term "substituted with a moiety" as used herein in relation to chemical compounds refers to hydrogen group(s) being substituted with the moiety. "Alkyl" as used herein refers to a saturated, straight, or branched hydrocarbon chain. The hydrocarbon chain preferably comprises from one to eighteen carbon atoms (C1-18-alkyl), such as from one to six carbon atoms (C1-6-alkyl), including methyl, ethyl, propyl, isopropyl, butyl, isobutyl, secondary butyl, tertiary butyl, pentyl, isopentyl, neopentyl, tertiary pentyl, hexyl, and isohexyl. In some embodiments, alkyl represents a C1-3-alkyl group, which can in particular include methyl, ethyl, propyl, or isopropyl. The term "oxo" as used herein refers to a "=O" substituent. The term "keto" as used herein is used as a prefix to indicate presence of a carbonyl (C=O) group. The term "hydroxyl" as used herein refers to an "-OH" substituent. The term "acetylated" refers to presence of an acetyl (CH3CO) group.
[0069] The abbreviation "13R-MO" as used herein refers to 13R-manoyl oxide, the structure of which is provided in Figure 1. The structure also provides the numbering of the carbon atoms of the ring structure used herein. The term "oxidized 13R-MO" as used herein refers to 13R-MO substituted at one or more positions with an =O and/or -OH group. The person of ordinary skill in the art will appreciate that "oxidized 13R-MO" includes "hydroxylated 13R-MO." The term "acetylated 13R-MO" as used herein refers to 13R-MO substituted with at least one acetyl group. The term "acetylated oxidized 13R-MO" as used herein refers to 13R-MO substituted with at least one acetyl group and with one or more -OH and/or =O groups.
[0070] As used herein, the term "derivative" is used to refer to a compound produced from or capable of being produced (e.g., derived) from a similar compound. Non-limiting examples of 13R-MO derivatives include acetylated 13R-MO compounds, oxidized 13R-MO compounds, and acetylated oxidized 13R-MO compounds. For example, 13R-MO derivatives include 11-oxo-(13R)-MO, forskolin, iso-forskolin, forskolin B, forskolin D, 9-deoxyforskolin, 1,9-dideoxyforskolin, and coleoforskolin. Additional 13R-MO derivatives are shown in Figure 2.
[0071] As described herein, forskolin is a complex functionalized derivative of 13R-MO requiring regio- and stereospecific oxidation of five carbon positions: one double-oxidation leading to a ketone and four single oxidation reactions yielding hydroxyl groups. The results presented herein show identification of diterpene synthases, cytochrome P450 mono-oxygenases, and acetyltransferases, which when co-expressed, result in production of forskolin.
Diterpene Synthase (TPS)
[0072] In some embodiments, a host cell disclosed herein can comprise a diterpene synthase. The diterpene synthase (diTPS or TPS) can be from class II or class I, and in particular, be capable of converting geranylgeranyl diphosphate (GGPP) to (5S,8R,9R,10R)-labda-8-ol diphosphate and/or be capable of converting (5S,8R,9R,10R)-labda-8-ol diphosphate to 13R-MO. As described herein, 13R-MO is capable of being produced in a host cell comprising a gene encoding a terpene synthase polypeptide.
[0073] A diTPS of class II is an enzyme capable of catalyzing protonation-initiated cationic cycloisomerization of GGPP to form a diterpene pyrophosphate intermediate. The class II diTPS reaction can be terminated either by deprotonation or by water capture of the diphosphate carbocation. The diTPS of class II may in particular comprise the following motif of four amino acids: D/E-X-D-D, wherein X can be any amino acid, such as any naturally occurring amino acids. In particular, X can be an amino acid with a hydrophobic side chain, and thus, X can be A, I, L, M, F, W, Y, or V. Even more preferably, X is an amino acid with a small hydrophobic side chain, and thus X can be A, I, L, or V.
[0074] In embodiments of the invention relating to production of 13R-MO and/or 13R-MO derivatives, then it is preferred that the host organism comprises a gene encoding a TPS2 polypeptide. TPS2 catalyzes the reaction shown in Figure 2B, wherein -OPP refers to diphosphate. In particular, it is preferred that the TPS2 is TPS2 of C. forskohlii. In particular, the TPS2 can be a polypeptide of SEQ ID NO:16 or a functional homolog thereof sharing at least 50% sequence identity therewith. TPS2 of SEQ ID NO: 16 can be encoded by the nucleotide sequence set forth in SEQ ID NO:35. See Examples 2 and 3.
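By way of non-limiting illustration only, the sequence identity thresholds recited above for functional homologs (e.g., at least 50% identity to SEQ ID NO:16) can be checked with a short script. The following Python sketch assumes that the two polypeptide sequences have already been aligned by a separate tool and that gaps are written as "-". The function names, and the choice to compute identity over only those alignment columns in which both sequences have a residue, are assumptions made for this illustration; other conventions for the denominator exist and can give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption for this sketch: columns in which either sequence has a
    gap ("-") are ignored, so the denominator is the number of columns
    in which both sequences carry a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 50.0) -> bool:
    """True if the pair shares at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not taken from any SEQ ID NO).
    query = "ACDE-F"
    reference = "ACDQGF"
    print(percent_identity(query, reference))
    print(meets_identity_threshold(query, reference, threshold=50.0))
```

A sequence passing such a check would still need to retain the recited enzymatic activity to qualify as a functional homolog; the identity value alone does not establish function.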
[0075] A diTPS of class I is an enzyme capable of catalyzing cleavage of the diphosphate group of the diterpene pyrophosphate intermediate and additionally preferably also is capable of catalyzing cyclization and/or rearrangement reactions on the resulting carbocation. As with the class II diTPSs, deprotonation or water capture may terminate the class I diTPS reaction leading to hydroxylation of the diterpene pyrophosphate intermediate.
[0076] A diTPS of class I may comprise the following motif of five amino acids: D-D-X-X-D/E, wherein X can be any amino acid, such as any naturally occurring amino acid. In particular, X can be an amino acid with a hydrophobic side chain, and thus X can for example be A, I, L, M, F, W, Y, or V. Even more preferably, X is an amino acid with a small hydrophobic side chain, and thus X can be A, I, L, or V.
[0077] In embodiments of the invention relating to production of 13R-MO and/or 13R-MO derivatives, it is preferred that the host organism comprises a gene encoding a TPS3 polypeptide and/or a gene encoding a TPS4 polypeptide. Preferably the TPS3 or TPS4 is an enzyme capable of catalyzing the reaction shown in Figure 2C. In particular, it is preferred that the TPS3 is TPS3 of C. forskohlii. In particular, the TPS3 can be a polypeptide of SEQ ID NO:17 or a functional homolog thereof sharing at least 50% sequence identity therewith. TPS3 of SEQ ID NO:17 can be encoded by the nucleotide sequence set forth in SEQ ID NO:36. See Examples 2 and 3. In particular, it is preferred that the TPS4 is TPS4 of C. forskohlii. In particular, the TPS4 can be a polypeptide of SEQ ID NO:18 or a functional homolog thereof sharing at least 40% sequence identity therewith.
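By way of non-limiting illustration only, the class II motif (D/E-X-D-D) and the class I motif (D-D-X-X-D/E) described above can be written as regular expressions and located in a polypeptide sequence. The following Python sketch uses only the standard-library re module. The dictionary keys, the function name, and the toy sequence are hypothetical and introduced solely for this example; the three levels of stringency for X (any residue, hydrophobic, small hydrophobic) follow the residue lists given above.

```python
import re

HYDROPHOBIC = "AILMFWYV"   # X with a hydrophobic side chain
SMALL_HYDROPHOBIC = "AILV"  # X with a small hydrophobic side chain

# Class II diTPS motif: D/E-X-D-D
CLASS_II_MOTIFS = {
    "any": r"[DE][A-Z]DD",
    "hydrophobic": rf"[DE][{HYDROPHOBIC}]DD",
    "small_hydrophobic": rf"[DE][{SMALL_HYDROPHOBIC}]DD",
}

# Class I diTPS motif: D-D-X-X-D/E
CLASS_I_MOTIFS = {
    "any": r"DD[A-Z]{2}[DE]",
    "hydrophobic": rf"DD[{HYDROPHOBIC}]{{2}}[DE]",
    "small_hydrophobic": rf"DD[{SMALL_HYDROPHOBIC}]{{2}}[DE]",
}


def find_motif(sequence: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for every hit.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    pattern = re.compile(f"(?=({motif}))")
    return [(m.start() + 1, m.group(1))
            for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence (hypothetical, not taken from any SEQ ID NO).
    toy = "MKDIDDTAMLDDLVDQ"
    print("class II:", find_motif(toy, CLASS_II_MOTIFS["any"]))
    print("class I: ", find_motif(toy, CLASS_I_MOTIFS["any"]))
```

The presence of a motif is only suggestive of enzyme class; a match found in this way does not by itself demonstrate diTPS activity.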
[0078] In some embodiments, a host comprises a gene encoding a TPS1 polypeptide. The TPS1 polypeptide can be a C. forskohlii TPS1 polypeptide, i.e. TPS1 of SEQ ID NO:65. See Example 2.
Cytochrome P450 (CYP)
[0079] In some embodiments, a host cell disclosed herein can comprise a nucleic acid encoding an enzyme capable of catalyzing oxidation of 13R-MO. In some aspects, the enzyme capable of catalyzing oxidation of 13R-MO is a cytochrome P450 (CYP) polypeptide. CYPs according to the present invention are enzymes capable of catalyzing oxidation reactions using NAD(P)H as electron donor. Preferred CYPs according to the present invention are hemoproteins capable of catalyzing oxidation reactions that utilize NADPH and/or NADH to reductively cleave atmospheric dioxygen to produce a functionalized organic substrate and a molecule of water. As described herein, a host cell comprising a gene encoding a diterpene synthase polypeptide and genes encoding a CYP polypeptide is capable of producing oxidized 13R-MO.
[0080] CYPs are encoded by a gene superfamily, which is divided into families sharing at least 40% sequence identity. The families are divided into subfamilies sharing at least 55% sequence identity. The CYP families have a number, which generally is written after "CYP." Thus, by way of example, CYPs of family 74 are named CYP74. The subfamilies are indicated by a capital letter after the family number. Thus, by way of example, a CYP of family 74 and subfamily A is named CYP74A. Additional description of CYPs, the structural characteristics and the nomenclature thereof may for example be found in Schuler et al., Annu. Rev. Plant Biol., 54:629-67 (2003) and in Podust et al., Nat. Prod. Rep., 29:1251-1266 (2012). Thus, the CYP to be used with the present invention can be a CYP as described in Schuler et al. or Podust et al.
[0081] The CYP may comprise the following motif of five amino acids: A/G-G-X-X-T/S, wherein X can be any amino acid, such as any naturally occurring amino acid. In particular, one of the X amino acids can be an amino acid with a charged side chain, and in particular an acidic side chain, such as E. A/G indicates that the amino acid can be A or G. Similarly, T/S indicates that the amino acid can be T or S. The CYP can also comprise the following motif of 4 amino acids: E-X-X-R, wherein X can be any amino acid, such as any naturally occurring amino acid. In particular, X can be an amino acid with an uncharged side chain, such as a hydrophobic side chain. Furthermore, the CYP can comprise the following motif of 10 amino acids: F-X-X-G-X-X-X-C-X-G (SEQ ID NO:69), wherein X can be any amino acid, such as any naturally occurring amino acid. Furthermore, the CYP can comprise the following motif of 3 amino acids: P-F-G.
[0082] Preferably, the CYP is an enzyme capable of catalyzing the following reactions: a) conversion of 13R-MO to hydroxyl-13R-MO; b) conversion of hydroxyl-13R-MO to dihydroxy-13R-MO; c) conversion of hydroxyl-13R-MO to 13R-MO ketone; and/or d) conversion of hydroxyl-13R-MO to 13R-MO aldehyde.
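By way of non-limiting illustration only, the four CYP motifs described above (A/G-G-X-X-T/S, E-X-X-R, F-X-X-G-X-X-X-C-X-G, and P-F-G) can be screened for in a candidate polypeptide sequence. The following Python sketch uses only the standard-library re module and treats X as any single uppercase letter. The motif labels used as dictionary keys, the function name, and the toy sequence are hypothetical and are not taken from the sequence listing.

```python
import re

# Motifs as described in the text; X is modelled as any single residue.
CYP_MOTIFS = {
    "A/G-G-X-X-T/S": r"[AG]G[A-Z]{2}[TS]",
    "E-X-X-R": r"E[A-Z]{2}R",
    "F-X-X-G-X-X-X-C-X-G": r"F[A-Z]{2}G[A-Z]{3}C[A-Z]G",
    "P-F-G": r"PFG",
}


def scan_cyp_motifs(sequence: str) -> dict[str, list[int]]:
    """Map each motif label to the 1-based start positions of its hits.

    A lookahead is used so that overlapping occurrences are not missed.
    """
    seq = sequence.upper()
    hits = {}
    for label, motif in CYP_MOTIFS.items():
        pattern = re.compile(f"(?=({motif}))")
        hits[label] = [m.start() + 1 for m in pattern.finditer(seq)]
    return hits


def has_all_cyp_motifs(sequence: str) -> bool:
    """True if every motif occurs at least once in the sequence."""
    return all(scan_cyp_motifs(sequence).values())


if __name__ == "__main__":
    # Toy sequence (hypothetical) built to contain each motif once.
    toy = "MKAGDETLLELLRKPFGSFGAGRRMCPGA"
    for label, positions in scan_cyp_motifs(toy).items():
        print(label, positions)
    print("all motifs present:", has_all_cyp_motifs(toy))
```

Because the text states that a CYP "may" or "can" comprise each motif, a screen of this kind is best read as a heuristic filter for candidates rather than as a definition of the enzyme class.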
[0083] It is preferred that a host organism comprises a gene encoding an enzyme capable of catalyzing oxidation of 13R-MO and/or of oxidized 13R-MO. Thus, the CYP may preferably be an enzyme capable of catalyzing oxidation of 13R-MO and/or of oxidized 13R-MO.
[0084] In one embodiment, a host organism comprises: a) a gene encoding CYP polypeptide capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 1 position; b) a gene encoding CYP polypeptide capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 6 position; c) a gene encoding CYP polypeptide capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 7 position; d) a gene encoding CYP polypeptide capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 9 position; and/or e) a gene encoding CYP polypeptide capable of catalyzing oxidation of 13R-MO and/or of oxidized 13R-MO at the 11 position to a ketone.
[0085] In some embodiments, a host organism comprises a gene encoding CYP76AH16. The CYP76AH16 may in particular be CYP76AH16 of SEQ ID NO:19 or a functional homolog thereof sharing at least 55% sequence identity therewith. Preferably, a functional homolog of CYP76AH16 is a polypeptide sharing above-mentioned sequence identity with CYP76AH16 and which also is capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 9 position. See Examples 1 and 5.
[0086] In some embodiments, a host organism comprises a gene encoding CYP76AH8. The CYP76AH8 may in particular be CYP76AH8 of SEQ ID NO:20 or a functional homolog thereof sharing at least 50% sequence identity therewith. See Examples 1, 3, and 5.
[0087] In some embodiments, a host organism comprises a gene encoding CYP76AH15. The CYP76AH15 may in particular be CYP76AH15 of SEQ ID NO:22 or a functional homolog thereof sharing at least 50% sequence identity therewith. In some embodiments, CYP76AH15 catalyzes conversion of 13R-MO to 11-oxo-13R-MO. See Examples 1-3.
[0088] In some embodiments, a host organism comprises a gene encoding CYP76AH17. The CYP76AH17 may in particular be CYP76AH17 of SEQ ID NO:23 or a functional homolog thereof sharing at least 50% sequence identity therewith. See Example 1.
[0089] In some embodiments, a host organism comprises a gene encoding CYP76AH11. The CYP76AH11 may in particular be CYP76AH11 of SEQ ID NO:21 or a functional homolog thereof sharing at least 50% sequence identity therewith. See Examples 1 and 4.
[0090] In some embodiments, a host organism comprises a gene encoding R. officinalis CYP76AH4 (SEQ ID NO:71), R. officinalis FS1 (SEQ ID NO:70), S. fructicosa FS (SEQ ID NO:73), R. officinalis CYP76AH6 (SEQ ID NO:72), and/or a functional homolog thereof sharing at least 50% sequence identity therewith. See Example 5.
[0091] Preferably, a functional homolog of CYP76AH8, CYP76AH15, CYP76AH17, or CYP76AH11 is a polypeptide sharing above-mentioned sequence identity with CYP76AH8, CYP76AH15, CYP76AH17, or CYP76AH11 and which also is capable of catalyzing hydroxylation of 13R-MO and/or of oxidized 13R-MO at the 1, 6, or 7 position or oxidation of 13R-MO at the 11 position. In some embodiments, the CYP76AH enzymes carry out ketonation at C-11 (CYP76AH15) and hydroxylations at C-6, C-7, C-1 (CYP76AH11) and C-9 (CYP76AH16) to produce deacetylforskolin.
[0092] In some embodiments, cytochrome P450 enzymes have at least six specific regions known as the substrate recognition sites (SRS, i.e., SRS1-6) that are important for the activity of CYPs. In some embodiments, alterations to SRS sites affect product production and/or substrate specificity. In some embodiments, one or more SRS sites are altered to increase in vivo formation of products in yeast. Positions of SRS1-6 for CYP76AH15 are shown in Figure 5B. The S235 and Y236 residues lie in a region potentially in contact with the ER membrane. The A99 residue points towards SRS3. SRS2 and SRS3 can be part of the substrate entrance. The L366 and G362 residues can be essential for P450 function. See Figure 5C.
[0093] The SRS regions were identified in the forskolin-related CYP76AH enzymes (CYP76AH8, CYP76AH11, CYP76AH15, CYP76AH16 and CYP76AH17) by alignments and comparisons of reported SRS regions in the rat CYP2A1 (Gotoh, 1992), Hyoscyamus muticus CYP71D55 (Takahashi et al., 2007) and Thapsia villosa CYP71AJ6 (Dueholm et al., 2015). Comparative homology modeling was furthermore utilized to determine and visualize SRS regions (Figures 5C and D). CYP76AH11, CYP76AH15 and CYP76AH16 were determined to contain a total of 78 residues in putative SRS regions, whereas CYP76AH8 and CYP76AH17 contained 77 residues due to a deletion in SRS6.
[0094] CYP76AH8, CYP76AH15 and CYP76AH17 carry out similar reactions but with differences in product patterns, and the SRS regions were compared to identify the similarities and differences in these regions (Table 3; Figures 5C and D). CYP76AH8 and CYP76AH17 share similar product patterns with (13R)-manoyl oxide and a total sequence identity of 88%, whereas the sequence identity in the SRS regions was found to be 99%, with a single conservative amino acid substitution in SRS1 (A117S in CYP76AH17), suggesting high sequence conservation in the SRS regions. Differences in the SRS1-6 regions between CYP76AH15 and CYP76AH8/CYP76AH17 were mainly in SRS1, SRS3 and SRS6, whereas the SRS5 region was conserved between all three enzymes.
[0095] In some embodiments, CYP76AH15 variants are expressed in an N. benthamiana host. The N. benthamiana host can further comprise a terpene synthase, such as TPS2 (SEQ ID NO:16) or TPS3 (SEQ ID NO:17), anti-post transcriptional suppressor protein P19 (SEQ ID NO:68), a 1-deoxy-D-xylulose 5-phosphate synthase (DXS) such as C. forskohlii DXS (SEQ ID NO:30), and/or a geranylgeranyl diphosphate synthase (GGPPS) such as C. forskohlii GGPPS (SEQ ID NO:32). See Example 2.
[0096] In some embodiments, expression of CYP76AH15 A99I (SEQ ID NO:42), CYP76AH15 L366E, or CYP76AH15 A99I L366F (SEQ ID NO:58) in N. benthamiana results in accumulation of 11β-hydroxy-13R-MO and 11-oxo-13R-MO. In some embodiments, expression of CYP76AH15 L366F (SEQ ID NO:50) in N. benthamiana results in accumulation of 11-oxo-13R-MO. In some embodiments, expression of CYP76AH15 G362V L366F (SEQ ID NO:51) in N. benthamiana results in accumulation of 11β-hydroxy-13R-MO. See Example 2 and Figure 6. In some embodiments, expression of CYP76AH15 variants in an N. benthamiana host results in production of further oxygenated derivatives of 13R-MO. See Example 2 and Figure 7.
[0097] In some embodiments, CYP76AH15 variants are expressed in an S. cerevisiae host. The S. cerevisiae host can further comprise a cytochrome P450 reductase such as C. forskohlii POR (SEQ ID NO:34), a terpene synthase such as C. forskohlii TPS2 (SEQ ID NO:16) and/or C. forskohlii TPS3 (SEQ ID NO:17), and/or a GGPPS such as Synechococcus sp. GGPPS (SEQ ID NO:37). See Example 3 and Figure 9.
[0098] Surprisingly, CYP76AH15 variants can increase in vivo accumulation of 11-oxo-13R-MO by several fold in S. cerevisiae. Specifically, mutating amino acids corresponding to SRS1 (i.e., A99I), SRS3 (i.e., S235G Y236F), and/or SRS5 (i.e., L366F, L366E) of SEQ ID NO:22 can increase accumulation of 11-oxo-13R-MO by over two-fold compared with native and codon-optimized CYP76AH15. For example, CYP76AH15 A99I (SEQ ID NO:42) can result in accumulation of 5.6-fold higher levels of 11-oxo-13R-MO, compared to expression of native CYP76AH15 (SEQ ID NO:22). See Example 3, Figure 9, and Table 5.
[0099] In some embodiments, mutations in SRS regions can be combined to further increase CYP76AH15 activity, specifically when combining SRS1+SRS3 and SRS1+SRS5. Expression of CYP76AH15 A99I S235G Y236F (SEQ ID NO:62) can result in a 6.5-fold increase in 11-oxo-13R-MO accumulation, while CYP76AH15 A99I L366F (SEQ ID NO:58) can increase 11-oxo-13R-MO levels 6.2-fold. See Example 3, Figure 9, and Table 5.
[00100] In some embodiments, SRS6 variants of CYP76AH15 can lead to a changed product profile towards a hydroxylated product of 11-oxo-13R-MO. In some embodiments, SRS5 variants of CYP76AH15 (i.e., CYP76AH15 G362V L366F of SEQ ID NO:51) result in production of 11-hydroxy-13R-MO. See Example 3 and Figure 9.
[00101] In some embodiments, 13R-MO-producing S. cerevisiae strains comprising CYP76AH15, CYP76AH15 A99I, or CYP76AH11 result in formation of compounds with the formula C20H32O3 and C20H32O4, corresponding to single hydroxylation and double hydroxylation of 11-oxo-13R-manoyl oxide, respectively. In some embodiments, 13R-MO-producing S. cerevisiae strains comprising CYP76AH15 (SEQ ID NO:22) and CYP76AH16 (SEQ ID NO:19) result in formation of a C20H32O3 compound corresponding to a single hydroxylation of 11-oxo-13R-manoyl oxide. In some embodiments, 13R-MO-producing S. cerevisiae strains comprising CYP76AH15 A99I (SEQ ID NO:42) and CYP76AH16 (SEQ ID NO:21) result in formation of 11-oxo-13R-manoyl oxide as well as a C20H32O3 compound and a compound with the formula C20H32O4, which corresponds to a double hydroxylation of 11-oxo-13R-manoyl oxide. See Example 4 and Figure 11.
[00102] In some embodiments, expression of R. officinalis CYP76AH4 (SEQ ID NO:71), R. officinalis FS1 (SEQ ID NO:70), S. fructicosa FS (SEQ ID NO:73), or R. officinalis CYP76AH6 (SEQ ID NO:72) in N. benthamiana results in production of 11-oxo-13R-MO and 11β-hydroxy-13R-MO. In some embodiments, expression of C. forskohlii CYP76AH8 (SEQ ID NO:20) in N. benthamiana results in production of 11-oxo-13R-MO and 11β-hydroxy-13R-MO. See Example 5 and Figure 13.
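The molecular formulas cited in paragraph [00101] follow from simple atom bookkeeping: each hydroxylation replaces a C-H with a C-OH and therefore adds one oxygen atom. The sketch below illustrates this under the assumption, consistent with the formulas above and with the known formula of manoyl oxide (C20H34O), that 11-oxo-13R-manoyl oxide is C20H32O2. The helper names are hypothetical.

```python
from collections import Counter


def formula_string(atoms):
    """Format element counts in Hill order (C, H, then other elements alphabetically)."""
    order = [e for e in ("C", "H") if e in atoms]
    order += sorted(e for e in atoms if e not in ("C", "H"))
    return "".join(e + (str(atoms[e]) if atoms[e] != 1 else "") for e in order)


def hydroxylate(atoms, times=1):
    """Each hydroxylation (C-H to C-OH) adds one oxygen and leaves C and H unchanged."""
    result = Counter(atoms)
    result["O"] += times
    return result


# Assumed formula of 11-oxo-13R-manoyl oxide (manoyl oxide with one CH2 oxidized to C=O).
oxo_mo = Counter({"C": 20, "H": 32, "O": 2})

print(formula_string(hydroxylate(oxo_mo, 1)))  # single hydroxylation
print(formula_string(hydroxylate(oxo_mo, 2)))  # double hydroxylation
```

A matching formula indicates only the number of oxygens added; it does not identify which ring positions were hydroxylated.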
[00103] In some embodiments, CYP76AH15 or a CYP76AH15 variant is expressed in N. benthamiana or an S. cerevisiae host to produce miltiradiene, abietatriene, and/or ferruginol. In some embodiments, expression of CYP76AH8 in an S. cerevisiae host results in production of ferruginol. The host can further comprise C. forskohlii TPS1 (SEQ ID NO:65), C. forskohlii TPS3 (SEQ ID NO: 17), and/or C. forskohlii POR (SEQ ID NO:34). See Examples 2, 3, and 5 and Figures 8, 10, and 12.
Diterpene acetyltransferase (ACT)
[00104] In some embodiments, a host cell disclosed herein can comprise a nucleic acid encoding a diterpene acetyltransferase capable of catalyzing acetylation of 13R-MO and/or acetylation of oxidized 13R-MO. As described herein, a host cell comprising a gene encoding a diterpene synthase polypeptide, a gene encoding a CYP polypeptide, and a gene encoding an ACT polypeptide is capable of producing acetylated oxidized 13R-MO, such as forskolin.
[00105] In some embodiments, a host cell disclosed herein comprises the diterpene acetyltransferase, ACT1-6. In some aspects, ACT1-6 is derived from C. forskohlii. In particular, the diterpene acetyltransferase can be ACT1-6 of SEQ ID NO:6 or a functional homolog thereof sharing at least 55% sequence identity therewith. In some embodiments, a functional homolog of ACT1-6 of SEQ ID NO:6 is a polypeptide sharing at least 90% sequence identity therewith. In some aspects, ACT1-6 of SEQ ID NO:6 is encoded by the nucleic acid set forth in SEQ ID NO:1 or SEQ ID NO:11, wherein SEQ ID NO:11 is optimized for expression in S. cerevisiae.
[00106] In some embodiments, a host cell disclosed herein comprises the diterpene acetyltransferase, ACT1-7. In some aspects, ACT1-7 is derived from C. forskohlii. In particular, the diterpene acetyltransferase can be ACT1-7 of SEQ ID NO:7 or a functional homolog thereof sharing at least 55% sequence identity therewith. In some embodiments, a functional homolog of ACT1-7 of SEQ ID NO:7 is a polypeptide sharing at least 90% sequence identity therewith. In some aspects, ACT1-7 of SEQ ID NO:7 is encoded by the nucleic acid set forth in SEQ ID NO:2 or SEQ ID NO:12, wherein SEQ ID NO:12 is optimized for expression in S. cerevisiae.
[00107] In some embodiments, a host cell disclosed herein comprises the diterpene acetyltransferase, ACT1-8. ACT1-8 can be derived from any suitable source; however, in a preferred embodiment, ACT1-8 is derived from C. forskohlii. In particular, the diterpene acetyltransferase can be ACT1-8 of SEQ ID NO:26 or a functional homolog thereof sharing at least 55% sequence identity therewith. In some embodiments, a functional homolog of ACT1-8 of SEQ ID NO:26 is a polypeptide sharing at least 90% sequence identity therewith. In some embodiments, ACT1-8 is encoded by the nucleic acid set forth in SEQ ID NO:27.
[00108] In some embodiments, 13R-MO and/or 13R-MO derivatives are produced in vivo through expression of one or more enzymes involved in a diterpene biosynthetic pathway in a recombinant host. For example, a recombinant host expressing a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate, a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, and/or a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO can produce 13R-MO and/or 13R-MO derivatives in vivo. See, e.g., Figures 1-4. The skilled worker will appreciate that one or more of these genes can be endogenous to the host provided that at least one (and in some embodiments, all) of these genes is a recombinant gene introduced into the recombinant host.
[00109] In some aspects, the polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO: 16 (which can be encoded by the nucleotide sequence set forth in SEQ ID NO:35), SEQ ID NO: 17 (encoded by the nucleotide sequence set forth in SEQ ID NO:36), SEQ ID NO: 18, or SEQ ID NO:65. In some embodiments, a recombinant host expressing a gene encoding a polypeptide having an amino acid sequence set forth in SEQ ID NO: 16 can produce (5S,8R,9R, 10R)-labda-8-ol diphosphate from GGPP in vivo. In some embodiments, a recombinant host expressing a gene encoding a polypeptide having an amino acid sequence set forth in SEQ ID NO: 17 or SEQ ID NO: 18 can produce 13R-MO in vivo.
[00110] In some aspects, the polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:6 (which can be encoded by the nucleotide sequence set forth in SEQ ID NO:1 or SEQ ID NO:11), SEQ ID NO:7 (encoded by the nucleotide sequence set forth in SEQ ID NO:2 or SEQ ID NO:12), or SEQ ID NO:26 (encoded by the nucleotide sequence set forth in SEQ ID NO:27).
[00111] In some aspects, the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:21, SEQ ID NO:71, SEQ ID NO:70, SEQ ID NO:73, or SEQ ID NO:72. In some embodiments, a recombinant host expressing a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions can produce hydroxyl-13R-MO, dihydroxy-13R-MO, 13R-MO ketone, and/or 13R-MO aldehyde. In some embodiments, a recombinant host expressing a gene encoding a polypeptide having an amino acid sequence set forth in SEQ ID NO:19 can produce 13R-MO and/or oxidized 13R-MO hydroxylated at its 9-position in vivo. In some embodiments, a recombinant host expressing a gene encoding a polypeptide having an amino acid sequence set forth in SEQ ID NO:22 can produce 11-oxo-13R-MO in vivo. In some embodiments, the 13R-MO derivative is an oxidized 13R-MO derivative. In some embodiments, the 13R-MO derivative is 11-oxo-13R-MO and/or 11β-hydroxy-13R-MO. In some embodiments, the 13R-MO derivative is forskolin.
[00112] In some embodiments, the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions is further capable of synthesizing ferruginol from abietatriene and/or miltiradiene. In some embodiments, a recombinant host expressing a gene encoding a polypeptide having an amino acid sequence set forth in SEQ ID NO:22 can produce ferruginol in vivo.
[00113] In some embodiments, the polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate comprises a diterpene synthase (TPS) polypeptide as otherwise described herein, e.g., a TPS1, TPS2, TPS3, or TPS4 polypeptide. In some embodiments, the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions comprises a cytochrome P450 (CYP) polypeptide as otherwise described herein, e.g., a CYP76AH16, CYP76AH8, CYP76AH15, CYP76AH17, CYP76AH11, CYP76AH4, ferruginol synthase (FS), or FS1 polypeptide. In some embodiments, the polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO comprises a diterpene acetyltransferase (ACT) polypeptide as otherwise described herein, e.g., an ACT1-6, ACT1-7, or ACT1-8 polypeptide.
[00114] In some embodiments, the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions comprises a functional homolog, or variant, of CYP76AH15 (SEQ ID NO:22), as otherwise described herein. In some embodiments, the CYP76AH15 variant is further capable of synthesizing ferruginol from abietatriene and/or miltiradiene. In some embodiments, the CYP76AH15 variant comprises one or more amino acid substitutions corresponding to residues 99, 100, 104, 207, 235, 236, 362, 366, 473, 474, 476, and/or 478 of SEQ ID NO:22. Non-limiting examples of the CYP76AH15 variant include polypeptides comprising substitutions (with respect to SEQ ID NO:22) corresponding to residue 99 (e.g., an isoleucine corresponding to residue 99); 100 (e.g., a valine corresponding to residue 100); 104 (e.g., an aspartic acid corresponding to residue 104); 207 (e.g., a threonine corresponding to residue 207); 235 (e.g., a glycine corresponding to residue 235); 236 (e.g., a phenylalanine corresponding to residue 236); 362 (e.g., a valine corresponding to residue 362); 366 (e.g., a phenylalanine or a glutamic acid corresponding to residue 366); 473 (e.g., a glutamic acid corresponding to residue 473); 474 (e.g., a leucine corresponding to residue 474); 476 (e.g., a threonine corresponding to residue 476); and/or 478 (e.g., a methionine, an alanine, or an isoleucine corresponding to residue 478). In some embodiments, the CYP76AH15 variant comprises an A99I substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 A99I; SEQ ID NO:42), an S235G and Y236F substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 S235G Y236F; SEQ ID NO:48), an L366F substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 L366F; SEQ ID NO:50), an L366E substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 L366E; SEQ ID NO:52), an A99I, S235G, and Y236F substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 A99I S235G Y236F; SEQ ID NO:62), an A99I and L366F substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 A99I L366F; SEQ ID NO:58), an S235G, Y236F, and L366E substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 S235G Y236F L366E; SEQ ID NO:63), an A99I, S235G, Y236F, and L366F substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 A99I S235G Y236F L366F; SEQ ID NO:64), an S235G, Y236F, and L366F substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 S235G Y236F L366F; SEQ ID NO:74), an A99I, S235G, Y236F, and L366E substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 A99I S235G Y236F L366E; SEQ ID NO:75), a G362V and L366F substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 G362V L366F; SEQ ID NO:51), a G362V substitution corresponding to SEQ ID NO:22 (e.g., CYP76AH15 G362V; SEQ ID NO:49), or a D473E and D474L substitution, and a P475 deletion corresponding to SEQ ID NO:22 (e.g., CYP76AH15 D473E D474L + P475 deletion; SEQ ID NO:53). In some embodiments, a CYP76AH15 variant can have one or more substitutions corresponding to the following portions of SEQ ID NO:22: residue 93-116, residue 202-209, residue 233-240, residue 286-304, residue 359-369, and/or residue 473-480. For example, a CYP76AH15 variant can have one or more mutations corresponding to residues 99, 100, 104, 207, 235, 236, 362, 366, 473, 476, and/or 478 of SEQ ID NO:22. For example, a CYP76AH15 variant can comprise the following mutations: A99I, A100V, G104D, V207T, S235G, Y236F, G362V, L366F, L366E, F476T, L478M, L478A, and/or L478I. See SEQ ID NOs:42-64, Tables 4 and 5, and Figure 5D. In some embodiments, the 13R-MO derivative is an oxidized 13R-MO derivative. In some embodiments, the 13R-MO derivative is 11-oxo-13R-MO and/or 11β-hydroxy-13R-MO. In some embodiments, the 13R-MO derivative is forskolin.
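The substitution notation used in paragraph [00114] (wild-type residue, 1-based position, replacement residue, e.g., A99I) lends itself to a simple programmatic check. The following is a minimal sketch, offered only as an illustration, that applies such substitutions to a sequence string and verifies that the stated wild-type residue is actually present at each position. The function name and the five-residue toy sequence are hypothetical; the sequence of SEQ ID NO:22 is not reproduced here, and deletions (such as the P475 deletion) are not handled.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement (e.g. "A99I").
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Return a copy of `sequence` with each substitution applied.

    Raises ValueError if a substitution cannot be parsed, is out of range,
    or names a wild-type residue that does not match the input sequence.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        wild_type = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if position < 1 or position > len(sequence):
            raise ValueError(f"{sub}: position outside sequence of length {len(sequence)}")
        if sequence[position - 1] != wild_type:
            raise ValueError(f"{sub}: expected {wild_type}, found {sequence[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical toy sequence; positions are illustrative only.
print(apply_substitutions("MASYL", ["A2I", "S3G", "Y4F"]))
```

Checking the wild-type residue before substituting guards against numbering mismatches, for example when a homolog's residues only correspond to, rather than coincide with, the positions of the reference sequence.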
[00115] In some embodiments, a recombinant host expressing one or more CYP76AH15 variants, including but not limited to CYP76AH15 A99I (SEQ ID NO:42), CYP76AH15 S235G Y236F (SEQ ID NO:48), CYP76AH15 L366F (SEQ ID NO:50), CYP76AH15 G362V L366F (SEQ ID NO:51), CYP76AH15 L366E (SEQ ID NO:52), CYP76AH15 A99I S235G Y236F (SEQ ID NO:62), CYP76AH15 A99I L366F (SEQ ID NO:58), CYP76AH15 S235G Y236F L366E (SEQ ID NO:63), or CYP76AH15 A99I S235G Y236F L366F (SEQ ID NO:64), produces 11-oxo-13R-MO and/or 11β-hydroxy-13R-MO in vivo.
[00116] In some embodiments, a recombinant host expressing a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions (e.g., a CYP76AH15 polypeptide variant) further expresses a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate (e.g., a terpene synthase (TPS) polypeptide, comprising a terpene synthase 2 (TPS2) polypeptide, a terpene synthase 3 (TPS3) polypeptide, and/or a terpene synthase 4 (TPS4) polypeptide); a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions (e.g., a cytochrome P450 (CYP) polypeptide comprising a CYP76AH16 polypeptide variant, a CYP76AH8 polypeptide variant, a CYP76AH11 polypeptide variant, and/or a CYP76AH17 polypeptide variant); a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO (e.g., a diterpene acetyltransferase (ACT) polypeptide); a gene encoding a polypeptide capable of synthesizing GGPP from farnesyl diphosphate (FPP) and isopentenyl diphosphate (IPP) (e.g., a geranylgeranyl diphosphate synthase (GGPPS) polypeptide); a gene encoding a polypeptide capable of synthesizing 1-deoxy-D-xylulose 5-phosphate (DXP) from pyruvate and D-glyceraldehyde 3-phosphate (e.g., a 1-deoxy-D-xylulose-5-phosphate synthase (DXS) polypeptide); a gene encoding a polypeptide capable of reducing cytochrome P450 complex (e.g., a cytochrome P450 reductase (CPR) polypeptide); and/or a gene encoding an anti-post transcriptional suppressor protein polypeptide.
[00117] In some embodiments, a recombinant host expressing a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions (e.g., a CYP76AH15 polypeptide variant) having at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:22, and further having at least one amino acid substitution corresponding to residues 99, 100, 104, 207, 235, 236, 362, 366, 473, 474, 476, and/or 478 of SEQ ID NO:22 further expresses a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate (e.g., a terpene synthase (TPS) polypeptide, comprising a terpene synthase 2 (TPS2) polypeptide having at least 50% sequence identity to an amino acid sequence set forth in SEQ ID NO:16, a terpene synthase 3 (TPS3) polypeptide having at least 50% sequence identity to an amino acid sequence set forth in SEQ ID NO:17, and/or a terpene synthase 4 (TPS4) polypeptide having at least 40% sequence identity to an amino acid sequence set forth in SEQ ID NO:18); a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions (e.g., a cytochrome P450 (CYP) polypeptide comprising a CYP76AH16 polypeptide variant having at least 55% sequence identity to an amino acid sequence set forth in SEQ ID NO:19, a CYP76AH8 polypeptide variant having at least 50% sequence identity to an amino acid sequence set forth in SEQ ID NO:20, a CYP76AH11 polypeptide variant having at least 50% sequence identity to an amino acid sequence set forth in SEQ ID NO:21, and/or a CYP76AH17 polypeptide variant having at least 50% sequence identity to an amino acid sequence set forth in SEQ ID NO:23); a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO (e.g., a diterpene acetyltransferase (ACT) polypeptide having at least 40% sequence identity to SEQ ID NO:26); a gene encoding a polypeptide capable of synthesizing GGPP from farnesyl diphosphate (FPP) and isopentenyl diphosphate (IPP) (e.g., a geranylgeranyl diphosphate synthase (GGPPS) polypeptide having at least 70% sequence identity to SEQ ID NO:32 or SEQ ID NO:37); a gene encoding a polypeptide capable of synthesizing 1-deoxy-D-xylulose 5-phosphate (DXP) from pyruvate and D-glyceraldehyde 3-phosphate (e.g., a 1-deoxy-D-xylulose-5-phosphate synthase (DXS) polypeptide having at least 75% sequence identity to SEQ ID NO:30); a gene encoding a polypeptide capable of reducing cytochrome P450 complex (e.g., a cytochrome P450 reductase (CPR) polypeptide having at least 75% sequence identity to SEQ ID NO:34); and/or a gene encoding an anti-post transcriptional suppressor protein polypeptide having at least 75% sequence identity to SEQ ID NO:68.
[00118] In some embodiments, expression of a gene encoding a CYP76AH15 variant in a recombinant host further expressing a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate, a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, and/or a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO, results in increased production of a 13R-MO derivative (e.g., 11-oxo-13R-MO) relative to a corresponding host lacking the CYP76AH15 homolog, e.g., at least a 10% increase, or at least a 25% increase, or at least a 50% increase, or at least a 75% increase, or at least a 100% increase, or at least a 150% increase, or at least a 200% increase, or at least a 250% increase, or at least a 300% increase, or at least a 350% increase, or at least a 400% increase, or at least a 450% increase, or at least a 500% increase, or at least a 550% increase, or at least a 600% increase in production of a 13R-MO derivative (e.g., 11-oxo-13R-MO).
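The percentage thresholds of paragraph [00118] and the fold values quoted in paragraphs [0098] and [0099] describe the same kind of comparison in two notations. The sketch below relates them under the assumption, which is an interpretation and not stated in the text, that an "N-fold" level means N times the level of the reference host, so that the percent increase is (N - 1) x 100. The function names are hypothetical.

```python
def percent_increase(variant_level, reference_level):
    """Percent increase of a product level relative to a reference host."""
    if reference_level <= 0:
        raise ValueError("reference level must be positive")
    return (variant_level - reference_level) / reference_level * 100.0


def fold_change_to_percent_increase(fold_change):
    """Assumes an N-fold level is N times the reference: (N - 1) * 100 percent."""
    return (fold_change - 1.0) * 100.0


# Fold values quoted in paragraphs [0098] and [0099] for S. cerevisiae.
for label, fold in [("A99I", 5.6), ("A99I S235G Y236F", 6.5), ("A99I L366F", 6.2)]:
    print(label, round(fold_change_to_percent_increase(fold), 1))
```

If "N-fold higher" is instead read as N times more than the reference (that is, N + 1 times the reference level), each percentage would be 100 points larger.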
[00119] In some embodiments, expression of a gene encoding a CYP76AH15 variant in a recombinant host further expressing a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate, a gene encoding a polypeptide capable of synthesizing GGPP from FPP and isopentenyl diphosphate (IPP), and/or a gene encoding a polypeptide capable of reducing cytochrome P450 complex, results in increased production of ferruginol relative to a corresponding host lacking the CYP76AH15 homolog, e.g., at least a 10% increase, or at least a 25% increase, or at least a 50% increase, or at least a 75% increase, or at least a 100% increase, or at least a 150% increase, or at least a 200% increase in production of ferruginol.
[00120] In some embodiments, a recombinant host expressing a gene encoding a CYP76AH15 variant and a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate, a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions, and/or a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO, and further expressing a gene encoding a polypeptide capable of synthesizing GGPP from farnesyl diphosphate (FPP) and isopentenyl diphosphate (IPP), a gene encoding a polypeptide capable of synthesizing 1-deoxy-D-xylulose 5-phosphate (DXP) from pyruvate and D-glyceraldehyde 3-phosphate, a gene encoding a polypeptide capable of reducing cytochrome P450 complex, and/or a gene encoding an anti-post transcriptional suppressor protein polypeptide, can produce 13R-MO and/or 13R-MO derivatives in vivo.
[00121] In some embodiments, the polypeptide capable of synthesizing GGPP from FPP and IPP comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:32 (which can be encoded by the nucleotide sequence set forth in SEQ ID NO:31) or SEQ ID NO:37 (encoded by the nucleotide sequence set forth in SEQ ID NO:38). In some embodiments, the polypeptide capable of synthesizing DXP from pyruvate and D-glyceraldehyde 3-phosphate comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:30 (encoded by the nucleotide sequence set forth in SEQ ID NO:29). In some embodiments, the polypeptide capable of reducing cytochrome P450 complex comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:34 (encoded by the nucleotide sequence set forth in SEQ ID NO:33). In some embodiments, the anti-post transcriptional suppressor protein polypeptide comprises a polypeptide having an amino acid sequence set forth in SEQ ID NO:68.
[00122] In some embodiments, the polypeptide capable of synthesizing GGPP from FPP and IPP comprises a geranylgeranyl diphosphate synthase (GGPPS) polypeptide as otherwise described herein. In some embodiments, the polypeptide capable of synthesizing DXP from pyruvate and D-glyceraldehyde 3-phosphate comprises a 1-deoxy-D-xylulose-5-phosphate synthase (DXS) polypeptide as otherwise described herein. In some embodiments, the polypeptide capable of reducing cytochrome P450 complex comprises a cytochrome P450 reductase (CPR) polypeptide as otherwise described herein, e.g., a POR polypeptide. In some embodiments, the anti-post transcriptional suppressor protein polypeptide comprises a P19 polypeptide as otherwise described herein.
Functional Homologs
[00123] Functional homologs of the polypeptides described above are also suitable for use in producing 13R-MO and/or 13R-MO derivatives in a recombinant host. A functional homolog is a polypeptide that has sequence similarity to a reference polypeptide, and that carries out one or more of the biochemical or physiological function(s) of the reference polypeptide. A functional homolog and the reference polypeptide can be a natural occurring polypeptide, and the sequence similarity can be due to convergent or divergent evolutionary events. As such, functional homologs are sometimes designated in the literature as homologs, or orthologs, or paralogs. Variants of a naturally occurring functional homolog, such as polypeptides encoded by mutants of a wild type coding sequence, can themselves be functional homologs. Functional homologs can also be created via site-directed mutagenesis of the coding sequence for a polypeptide, or by combining domains from the coding sequences for different naturally- occurring polypeptides ("domain swapping"). Techniques for modifying genes encoding functional polypeptides described herein are known and include, inter alia, directed evolution techniques, site-directed mutagenesis techniques and random mutagenesis techniques, and can be useful to increase specific activity of a polypeptide, alter substrate specificity, alter expression levels, alter subcellular location, or modify polypeptide-polypeptide interactions in a desired manner. Such modified polypeptides are considered functional homologs. The term "functional homolog" is sometimes applied to the nucleic acid that encodes a functionally homologous polypeptide.
[00124] Functional homologs can be identified by analysis of nucleotide and polypeptide sequence alignments. For example, performing a query on a database of nucleotide or polypeptide sequences can identify homologs of 13R-MO and/or 13R-MO derivative biosynthesis polypeptides. Sequence analysis can involve BLAST, Reciprocal BLAST, or PSI- BLAST analysis of non-redundant databases using an amino acid sequence as the reference sequence. Amino acid sequence is, in some instances, deduced from the nucleotide sequence. Those polypeptides in the database that have greater than 40% sequence identity are candidates for further evaluation for suitability as a 13R-MO and/or 13R-MO derivative biosynthesis polypeptide. Amino acid sequence similarity allows for conservative amino acid substitutions, such as substitution of one hydrophobic residue for another or substitution of one polar residue for another. If desired, manual inspection of such candidates can be carried out in order to narrow the number of candidates to be further evaluated. Manual inspection can be performed by selecting those candidates that appear to have domains present in 13R-MO and/or 13R-MO derivative biosynthesis polypeptides, e.g. , conserved functional domains. In some embodiments, nucleic acids and polypeptides are identified from transcriptome data based on expression levels rather than by using BLAST analysis.
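As a rough illustration of the screening step in paragraph [00124], database hits with greater than 40% sequence identity can be pulled from BLAST tabular output. The sketch below assumes the standard 12-column tabular format produced by BLAST+ with `-outfmt 6` (query ID, subject ID, percent identity, and so on); the function name, the query label, and the three hit rows are hypothetical data invented for the example.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_TAB_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def candidates_above_identity(tabular_text, min_identity=40.0):
    """Return (subject ID, percent identity) for hits with identity > min_identity."""
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=BLAST_TAB_COLUMNS, delimiter="\t"
    )
    return [
        (row["sseqid"], float(row["pident"]))
        for row in reader
        if float(row["pident"]) > min_identity
    ]


# Hypothetical hits for illustration only.
rows = [
    ["query1", "hitA", "88.0", "490", "59", "0", "1", "490", "1", "490", "0.0", "900"],
    ["query1", "hitB", "40.0", "480", "288", "4", "5", "484", "3", "480", "1e-90", "300"],
    ["query1", "hitC", "35.5", "200", "129", "6", "10", "209", "12", "205", "1e-20", "95"],
]
sample = "\n".join("\t".join(r) for r in rows)
print(candidates_above_identity(sample))
```

Hits that pass this filter are, as the paragraph notes, only candidates for further evaluation, for example by manual inspection for conserved functional domains.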
[00125] Conserved regions can be identified by locating a region within the primary amino acid sequence of a 13R-MO and/or 13R-MO derivative biosynthesis polypeptide that is a repeated sequence, forms some secondary structure (e.g., helices and beta sheets), establishes positively or negatively charged domains, or represents a protein motif or domain. See, e.g., the Pfam web site describing consensus sequences for a variety of protein motifs and domains on the World Wide Web at sanger.ac.uk/Software/Pfam/ and pfam.janelia.org/. The information included at the Pfam database is described in Sonnhammer et al., Nucl. Acids Res., 26:320-322 (1998); Sonnhammer et al., Proteins, 28:405-420 (1997); and Bateman et al., Nucl. Acids Res., 27:260-262 (1999). Conserved regions also can be determined by aligning sequences of the same or related polypeptides from closely related species. Closely related species preferably are from the same family. In some embodiments, alignment of sequences from two different species is adequate to identify such homologs.
[00126] Typically, polypeptides that exhibit at least about 40% amino acid sequence identity are useful to identify conserved regions. Conserved regions of related polypeptides exhibit at least 45% amino acid sequence identity (e.g., at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% amino acid sequence identity). In some embodiments, a conserved region exhibits at least 92%, 94%, 96%, 98%, or 99% amino acid sequence identity.
[00127] Methods to modify the substrate specificity of a polypeptide are known to those skilled in the art, and include without limitation site-directed/rational mutagenesis approaches, random directed evolution approaches and combinations in which random mutagenesis/saturation techniques are performed near the active site of the enzyme. For example see Osmani et al., 2009, Phytochemistry 70: 325-347.
[00128] A candidate sequence typically has a length that is from 80% to 200% of the length of the reference sequence, e.g., 82, 85, 87, 89, 90, 93, 95, 97, 99, 100, 105, 110, 115, 120, 130, 140, 150, 160, 170, 180, 190, or 200% of the length of the reference sequence. A functional homolog polypeptide typically has a length that is from 95% to 105% of the length of the reference sequence, e.g., 90, 93, 95, 97, 99, 100, 105, 110, 115, or 120% of the length of the reference sequence, or any range between. A % identity (or % sequence identity) for any candidate nucleic acid or polypeptide relative to a reference nucleic acid or polypeptide can be determined as follows. A reference sequence (e.g., a nucleic acid sequence or an amino acid sequence described herein) is aligned to one or more candidate sequences using the computer program Clustal Omega (version 1.2.1, default parameters), which allows alignments of nucleic acid or polypeptide sequences to be carried out across their entire length (global alignment). Chenna et al., 2003, Nucleic Acids Res. 31(13):3497-500.
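By way of illustration only, the length criterion stated at the start of the preceding paragraph could be applied as a simple pre-screen before alignment. The following is a minimal sketch; the function names and the use of plain strings for sequences are assumptions made for this example and are not part of the disclosure. The default bounds are the 80% to 200% window given above for candidate sequences.

```python
def length_ratio_percent(candidate: str, reference: str) -> float:
    """Return the candidate length as a percentage of the reference length."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * len(candidate) / len(reference)


def within_length_window(candidate: str, reference: str,
                         low: float = 80.0, high: float = 200.0) -> bool:
    """True if the candidate length is from `low`% to `high`% of the reference length."""
    return low <= length_ratio_percent(candidate, reference) <= high
```

Passing `low=95.0, high=105.0` would apply the narrower window mentioned above for functional homolog polypeptides.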
[00129] Clustal Omega calculates the best match between a reference and one or more candidate sequences, and aligns them so that identities, similarities and differences can be determined. Gaps of one or more residues can be inserted into a reference sequence, a candidate sequence, or both, to maximize sequence alignments. For fast pairwise alignment of nucleic acid sequences, the following default parameters are used: word size: 2; window size: 4; scoring method: %age; number of top diagonals: 4; and gap penalty: 5. For multiple alignment of nucleic acid sequences, the following parameters are used: gap opening penalty: 10.0; gap extension penalty: 5.0; and weight transitions: yes. For fast pairwise alignment of protein sequences, the following parameters are used: word size: 1; window size: 5; scoring method: %age; number of top diagonals: 5; gap penalty: 3. For multiple alignment of protein sequences, the following parameters are used: weight matrix: blosum; gap opening penalty: 10.0; gap extension penalty: 0.05; hydrophilic gaps: on; hydrophilic residues: Gly, Pro, Ser, Asn, Asp, Gln, Glu, Arg, and Lys; residue-specific gap penalties: on. The Clustal Omega output is a sequence alignment that reflects the relationship between sequences. Clustal Omega can be run, for example, at the Baylor College of Medicine Search Launcher site on the World Wide Web (searchlauncher.bcm.tmc.edu/multi-align/multi-align.html) and at the European Bioinformatics Institute site at http://www.ebi.ac.uk/Tools/msa/clustalo/.
[00130] To determine a % identity of a candidate nucleic acid or amino acid sequence to a reference sequence, the sequences are aligned using Clustal Omega, the number of identical matches in the alignment is divided by the length of the reference sequence, and the result is multiplied by 100. It is noted that the % identity value can be rounded to the nearest tenth. For example, 78.11, 78.12, 78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16, 78.17, 78.18, and 78.19 are rounded up to 78.2.
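The calculation described in the preceding paragraph can be sketched as follows. This is a hedged illustration, not the alignment program itself: it assumes the two rows of an already computed pairwise alignment are supplied as equal-length strings with gaps written as "-", and the function names are hypothetical. Decimal arithmetic with half-up rounding is used so that a value such as 78.15 is rounded up to 78.2, as in the example above.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_tenth(value: Decimal) -> Decimal:
    """Round to the nearest tenth, with halves rounded up (78.15 -> 78.2)."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def percent_identity(aligned_reference: str, aligned_candidate: str) -> Decimal:
    """Identical matches divided by the reference length, multiplied by 100.

    Both arguments are rows of an existing alignment; gaps are written as '-'.
    """
    if len(aligned_reference) != len(aligned_candidate):
        raise ValueError("aligned sequences must have equal length")
    # Length of the reference sequence itself, i.e. excluding gap characters.
    reference_length = sum(1 for r in aligned_reference if r != "-")
    if reference_length == 0:
        raise ValueError("reference sequence must not be empty")
    # An identical match is a column where both rows carry the same residue.
    matches = sum(
        1
        for r, c in zip(aligned_reference, aligned_candidate)
        if r != "-" and r == c
    )
    return round_to_tenth(Decimal(matches) * 100 / Decimal(reference_length))
```

For instance, the aligned rows "MKT-LV" (reference) and "MKTALI" (candidate) share four identical positions over a reference length of five residues.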
[00131] In some embodiments, functional polypeptides disclosed herein can include additional amino acids that are not involved in the enzymatic activities carried out by the enzymes. In some embodiments, functional polypeptides disclosed herein are fusion proteins. The terms "chimera," "fusion polypeptide," "fusion protein," "fusion enzyme," "fusion construct," "chimeric protein," "chimeric polypeptide," "chimeric construct," and "chimeric enzyme" can be used interchangeably herein to refer to proteins engineered through the joining of two or more genes that code for different proteins.
[00132] In some embodiments, a chimeric enzyme is constructed by joining the C-terminal of a first polypeptide ProteinA to the N-terminal of a second polypeptide ProteinB through a linker "b," i.e., "ProteinA-b-ProteinB." In some aspects, the linker of a chimeric enzyme may be the amino acid sequence "KLVK." In some aspects, the linker of a chimeric enzyme may be the amino acid sequence "RASSTKLVK." In some aspects, the linker of a chimeric enzyme may be the amino acid sequence "GGGGS." In some aspects, the linker of a chimeric enzyme may be two repeats of the amino acid sequence "GGGGS" (i.e., "GGGGSGGGGS"). In some aspects, the linker of a chimeric enzyme may be three repeats of the amino acid sequence "GGGGS." In some aspects, the linker of a chimeric enzyme may be the amino acid sequence "EGKSSGSGSESKST." In some aspects, the linker of a chimeric enzyme is a direct bond between the C-terminal of a first polypeptide and the N-terminal of a second polypeptide. In some embodiments, a chimeric enzyme is constructed by joining the C-terminal of a first polypeptide ProteinA to the N-terminal of a second polypeptide ProteinB through a linker "b," i.e., "ProteinA-b-ProteinB" and by joining the C-terminal of the second polypeptide ProteinB to the N-terminal of a third polypeptide ProteinC through a second linker "d," i.e., "ProteinA-b-ProteinB-d-ProteinC."
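The "ProteinA-b-ProteinB" and "ProteinA-b-ProteinB-d-ProteinC" arrangements described above can be illustrated at the level of amino acid sequence strings. The sketch below is an assumption-laden example only: the dictionary labels, the function name, and the short placeholder sequences in the usage note are hypothetical, while the linker sequences are those recited in the preceding paragraph.

```python
# Linker sequences recited in the paragraph above; the dictionary keys are
# illustrative labels chosen for this example only.
LINKERS = {
    "KLVK": "KLVK",
    "RASSTKLVK": "RASSTKLVK",
    "G4S_x1": "GGGGS",
    "G4S_x2": "GGGGS" * 2,
    "G4S_x3": "GGGGS" * 3,
    "EGKSSGSGSESKST": "EGKSSGSGSESKST",
    "direct": "",  # direct bond between the two polypeptides
}


def build_chimera(parts: list[str], linkers: list[str]) -> str:
    """Join polypeptides N- to C-terminal: parts[0]-linkers[0]-parts[1]-..."""
    if len(linkers) != len(parts) - 1:
        raise ValueError("need exactly one linker between each pair of parts")
    chimera = parts[0]
    for linker, part in zip(linkers, parts[1:]):
        # The C-terminal of the growing chain is joined to the N-terminal
        # of the next polypeptide through the chosen linker.
        chimera += linker + part
    return chimera
```

With placeholder sequences, `build_chimera(["AAA", "BBB"], [LINKERS["KLVK"]])` yields a two-part chimera, and supplying three parts with two linkers yields the three-part arrangement.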
[00133] In some embodiments, a fusion protein is a protein altered by domain swapping. As used herein, the term "domain swapping" is used to describe the process of replacing a domain of a first protein with a domain of a second protein. In some embodiments, the domain of the first protein and the domain of the second protein are functionally identical or functionally similar. In some embodiments, the structure and/or sequence of the domain of the second protein differs from the structure and/or sequence of the domain of the first protein. In some embodiments, a UGT polypeptide is altered by domain swapping.
[00134] In some embodiments, a fusion protein is a protein altered by circular permutation, which consists in the covalent attachment of the ends of a protein that would be opened elsewhere afterwards. Thus, the order of the sequence is altered without causing changes in the amino acids of the protein. In some embodiments, a targeted circular permutation can be produced, for example but not limited to, by designing a spacer to join the ends of the original protein. Once the spacer has been defined, there are several possibilities to generate permutations through generally accepted molecular biology techniques, for example but not limited to, by producing concatemers by means of PCR and subsequent amplification of specific permutations inside the concatemer or by amplifying discrete fragments of the protein to exchange to join them in a different order. The step of generating permutations can be followed by creating a circular gene by binding the fragment ends and cutting back at random, thus forming collections of permutations from a unique construct. In some embodiments, DAP1 polypeptide is altered by circular permutation.
13R-MO and/or 13R-MO Derivative Biosynthesis Nucleic Acids
[00135] A recombinant gene encoding a polypeptide described herein comprises the coding sequence for that polypeptide, operably linked in sense orientation to one or more regulatory regions suitable for expressing the polypeptide. Because many microorganisms are capable of expressing multiple gene products from a polycistronic mRNA, multiple polypeptides can be expressed under the control of a single regulatory region for those microorganisms, if desired. A coding sequence and a regulatory region are considered to be operably linked when the regulatory region and coding sequence are positioned so that the regulatory region is effective for regulating transcription or translation of the sequence. Typically, the translation initiation site of the translational reading frame of the coding sequence is positioned between one and about fifty nucleotides downstream of the regulatory region for a monocistronic gene.
[00136] In many cases, the coding sequence for a polypeptide described herein is identified in a species other than the recombinant host, i.e., is a heterologous gene. Thus, if the recombinant host is a microorganism, the coding sequence can be from other prokaryotic or eukaryotic microorganisms, from plants or from animals. In some cases, however, the coding sequence is a sequence that is native to the host and is being reintroduced into that organism. A native sequence can often be distinguished from the naturally occurring sequence by the presence of non-natural sequences linked to the exogenous nucleic acid, e.g., non-native regulatory sequences flanking a native sequence in a recombinant nucleic acid construct. In addition, stably transformed exogenous nucleic acids typically are integrated at positions other than the position where the native sequence is found. "Regulatory region" refers to a nucleic acid having nucleotide sequences that influence transcription or translation initiation and rate, and stability and/or mobility of a transcription or translation product. Regulatory regions include, without limitation, promoter sequences, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, 5' and 3' untranslated regions (UTRs), transcriptional start sites, termination sequences, polyadenylation sequences, introns, and combinations thereof. A regulatory region typically comprises at least a core (basal) promoter. A regulatory region also may include at least one control element, such as an enhancer sequence, an upstream element or an upstream activation region (UAR). A regulatory region is operably linked to a coding sequence by positioning the regulatory region and the coding sequence so that the regulatory region is effective for regulating transcription or translation of the sequence.
For example, to operably link a coding sequence and a promoter sequence, the translation initiation site of the translational reading frame of the coding sequence is typically positioned between one and about fifty nucleotides downstream of the promoter. A regulatory region can, however, be positioned as much as about 5,000 nucleotides upstream of the translation initiation site, or about 2,000 nucleotides upstream of the transcription start site.
[00137] The choice of regulatory regions to be included depends upon several factors, including, but not limited to, efficiency, selectability, inducibility, desired expression level, and preferential expression during certain culture stages. It is a routine matter for one of skill in the art to modulate the expression of a coding sequence by appropriately selecting and positioning regulatory regions relative to the coding sequence. It will be understood that more than one regulatory region may be present, e.g., introns, enhancers, upstream activation regions, transcription terminators, and inducible elements.
[00138] One or more genes can be combined in a recombinant nucleic acid construct in "modules" useful for a discrete aspect of 13R-MO and/or 13R-MO derivative production. Combining a plurality of genes in a module, particularly a polycistronic module, facilitates the use of the module in a variety of species. For example, a 13R-MO and/or 13R-MO derivative gene cluster can be combined in a polycistronic module such that, after insertion of a suitable regulatory region, the module can be introduced into a wide variety of species. As another example, a 13R-MO and/or 13R-MO derivative gene cluster can be combined such that each coding sequence is operably linked to a separate regulatory region, to form a module. Such a module can be used in those species for which monocistronic expression is necessary or desirable. In addition to genes useful for 13R-MO and/or 13R-MO derivative production, a recombinant construct typically also comprises an origin of replication, and one or more selectable markers for maintenance of the construct in appropriate species.
[00139] It will be appreciated that because of the degeneracy of the genetic code, a number of nucleic acids can encode a particular polypeptide; i.e., for many amino acids, there is more than one nucleotide triplet that serves as the codon for the amino acid. Thus, codons in the coding sequence for a given polypeptide can be modified such that optimal expression in a particular host is obtained, using appropriate codon bias tables for that host (e.g., microorganism). As isolated nucleic acids, these modified sequences can exist as purified molecules and can be incorporated into a vector or a virus for use in constructing modules for recombinant nucleic acid constructs.
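As a hedged sketch of the codon selection described in the preceding paragraph, a polypeptide sequence can be back-translated by choosing one codon per residue from a host-specific preference table. The table below is an illustrative placeholder: each entry is a valid codon of the standard genetic code for the stated residue, but the preferences are not taken from any real codon usage table, and the names `PREFERRED_CODON` and `back_translate` are hypothetical.

```python
# Illustrative placeholder: one "preferred" codon per amino acid for a
# hypothetical host. These are valid codons of the standard genetic code,
# but the preferences are NOT taken from any real codon usage table.
PREFERRED_CODON = {
    "M": "ATG", "K": "AAG", "L": "TTG", "V": "GTT",
    "G": "GGT", "S": "TCT", "*": "TAA",
}


def back_translate(protein: str, preferred: dict[str, str]) -> str:
    """Encode a protein using one preferred codon per residue ('*' = stop)."""
    missing = sorted(set(protein) - set(preferred))
    if missing:
        raise KeyError(f"no codon listed for: {', '.join(missing)}")
    return "".join(preferred[aa] for aa in protein)
```

In practice the table would be populated for all twenty amino acids from a codon bias table for the chosen host; because of the degeneracy of the genetic code, different tables give different nucleic acids that encode the same polypeptide.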
[00140] In some cases, it is desirable to inhibit one or more functions of an endogenous polypeptide in order to divert metabolic intermediates towards 13R-MO and/or 13R-MO derivative biosynthesis. As another example, it may be desirable to inhibit degradative functions of certain endogenous gene products. In such cases, a nucleic acid that overexpresses the polypeptide or gene product may be included in a recombinant construct that is transformed into the strain. Alternatively, mutagenesis can be used to generate mutants in genes for which it is desired to increase or enhance function.
Host Microorganisms
[00141] Recombinant hosts can be used to express polypeptides for producing 13R-MO and/or 13R-MO derivatives, including mammalian, insect, plant, and algal cells. A number of prokaryotes and eukaryotes are also suitable for use in constructing the recombinant microorganisms described herein, e.g., gram-negative bacteria, yeast, and fungi. A species and strain selected for use as a 13R-MO and/or 13R-MO derivative production strain is first analyzed to determine which production genes are endogenous to the strain and which genes are not present. Genes for which an endogenous counterpart is not present in the strain are advantageously assembled in one or more recombinant constructs, which are then transformed into the strain in order to supply the missing function(s).
[00142] Typically, the recombinant microorganism is grown in a fermenter at a temperature(s) for a period of time, wherein the temperature and period of time facilitate the production of 13R-MO and/or 13R-MO derivatives. The constructed and genetically engineered microorganisms provided by the invention can be cultivated using conventional fermentation processes, including, inter alia, chemostat, batch, fed-batch cultivations, semi-continuous fermentations such as draw and fill, continuous perfusion fermentation, and continuous perfusion cell culture. Depending on the particular microorganism used in the method, other recombinant genes such as isopentenyl biosynthesis genes and terpene synthase and cyclase genes may also be present and expressed. Levels of substrates and intermediates can be determined by extracting samples from culture media for analysis according to published methods.
[00143] Carbon sources of use in the instant method include any molecule that can be metabolized by the recombinant host cell to facilitate growth and/or production of the 13R-MO and/or 13R-MO derivatives. Examples of suitable carbon sources include, but are not limited to, sucrose (e.g., as found in molasses), fructose, xylose, ethanol, glycerol, glucose, cellulose, starch, cellobiose or other glucose-comprising polymer. In embodiments employing yeast as a host, for example, carbon sources such as sucrose, fructose, xylose, ethanol, glycerol, and glucose are suitable. The carbon source can be provided to the host organism throughout the cultivation period or alternatively, the organism can be grown for a period of time in the presence of another energy source, e.g., protein, and then provided with a source of carbon only during the fed-batch phase.
[00144] After the recombinant microorganism has been grown in culture for the period of time, wherein the temperature and period of time facilitate the production of 13R-MO and/or 13R-MO derivatives, the 13R-MO and/or 13R-MO derivatives can then be recovered from the culture using various techniques known in the art. In some embodiments, a permeabilizing agent can be added to aid the feedstock entering into the host and product getting out. For example, a crude lysate of the cultured microorganism can be centrifuged to obtain a supernatant. The resulting supernatant can then be applied to a chromatography column, e.g., a C-18 column, and washed with water to remove hydrophilic compounds, followed by elution of the compound(s) of interest with a solvent such as methanol. The compound(s) can then be further purified by preparative HPLC. See also, WO 2009/140394.
[00145] It will be appreciated that the various genes and modules discussed herein can be present in two or more recombinant hosts rather than a single host. When a plurality of recombinant hosts is used, they can be grown in a mixed culture to accumulate 13R-MO and/or 13R-MO derivatives.
[00146] Alternatively, the two or more hosts each can be grown in a separate culture medium and the product of the first culture medium can be introduced into second culture medium to be converted into a subsequent intermediate or into an end product. The product produced by the second or final host is then recovered. It will also be appreciated that in some embodiments, a recombinant host is grown using nutrient sources other than a culture medium and utilizing a system other than a fermenter.
[00147] For example, suitable species can be in a genus such as Agaricus, Aspergillus, Bacillus, Candida, Corynebacterium, Eremothecium, Escherichia, Fusarium/Gibberella, Kluyveromyces, Laetiporus, Lentinus, Phaffia, Phanerochaete, Pichia (formerly known as Hansenula), Scheffersomyces, Physcomitrella, Rhodotorula, Saccharomyces, Schizosaccharomyces, Sphaceloma, Xanthophyllomyces, Humicola, Issatchenkia, Brettanomyces, Yamadazyma, Lachancea, Zygosaccharomyces, Komagataella, Kazachstania, Xanthophyllomyces, Geotrichum, Blakeslea, Dunaliella, Haematococcus, Chlorella, Undaria, Sargassum, Laminaria, Scenedesmus, Pachysolen, Trichosporon, Acremonium, Aureobasidium, Cryptococcus, Corynascus, Chrysosporium, Filibasidium, Fusarium, Magnaporthe, Monascus, Mucor, Myceliophthora, Mortierella, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Piromyces, Pachysolen, Phanerochaete, Podospora, Pycnoporus, Rhizopus, Schizophyllum, Sordaria, Talaromyces, Rasamsonia, Thermoascus, Thielavia, Tolypocladium, Kloeckera, Pachysolen, Schwanniomyces, Trametes, Trichoderma, Acinetobacter, Nocardia, Xanthobacter, Streptomyces, Erwinia, Klebsiella, Serratia, Pseudomonas, Salmonella, Chloroflexus, Chloronema, Chlorobium, Pelodictyon, Chromatium, Rhodospirillum, Rhodobacter, Rhodomicrobium, or Yarrowia.
[00148] Exemplary species from such genera include Lentinus tigrinus, Laetiporus sulphureus, Phanerochaete chrysosporium, Pichia pastoris, Pichia kudriavzevii, Cyberlindnera jadinii, Physcomitrella patens, Rhodotorula glutinis, Rhodotorula mucilaginosa, Phaffia rhodozyma, Xanthophyllomyces dendrorhous, Issatchenkia orientalis, Saccharomyces cerevisiae, Saccharomyces bayanus, Saccharomyces pastorianus, Saccharomyces carlsbergensis, Hansenula polymorpha, Brettanomyces anomalus, Yamadazyma philogaea, Fusarium fujikuroi/Gibberella fujikuroi, Candida utilis, Candida glabrata, Candida krusei, Candida revkaufi, Candida pulcherrima, Candida tropicalis, Aspergillus niger, Aspergillus oryzae, Aspergillus fumigatus, Penicillium chrysogenum, Penicillium citrinum, Acremonium chrysogenum, Trichoderma reesei, Rasamsonia emersonii (formerly known as Talaromyces emersonii), Aspergillus sojae, Chrysosporium lucknowense, Myceliophthora thermophila, Candida albicans, Bacillus subtilis, Bacillus amyloliquefaciens, Bacillus licheniformis, Bacillus puntis, Bacillus megaterium, Bacillus halodurans, Bacillus pumilus, Serratia marcescens, Pseudomonas aeruginosa, Salmonella typhimurium, Blakeslea trispora, Dunaliella salina, Haematococcus pluvialis, Chlorella sp., Undaria pinnatifida, Sargassum, Laminaria japonica, Scenedesmus almeriensis, Salmonella typhi, Chloroflexus aurantiacus, Chloronema giganteum, Chlorobium limicola, Pelodictyon luteolum, Chromatium okenii, Rhodospirillum rubrum, Rhodobacter sphaeroides, Rhodobacter capsulatus, Rhodomicrobium vannielii, Pachysolen tannophilus, Trichosporon beigelii, and Yarrowia lipolytica.
[00149] In some embodiments, a microorganism can be a prokaryote such as Escherichia bacteria cells, for example, Escherichia coli cells; Lactobacillus bacteria cells; Lactococcus bacteria cells; Corynebacterium bacteria cells; Acetobacter bacteria cells; Acinetobacter bacteria cells; or Pseudomonas bacterial cells.
[00150] In some embodiments, a microorganism can be an algal cell such as Blakeslea trispora, Dunaliella salina, Haematococcus pluvialis, Chlorella sp., Undaria pinnatifida, Sargassum, Laminaria japonica, Scenedesmus almeriensis species.
[00151] In some embodiments, a microorganism can be a fungus from a genus including, but not limited to, Acremonium, Arxula, Agaricus, Aspergillus, Aureobasidium, Brettanomyces, Candida, Cryptococcus, Corynascus, Chrysosporium, Debaryomyces, Filibasidium, Fusarium, Gibberella, Humicola, Magnaporthe, Monascus, Mucor, Myceliophthora, Mortierella, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Piromyces, Phanerochaete, Podospora, Pycnoporus, Rhizopus, Schizophyllum, Schizosaccharomyces, Sordaria, Scheffersomyces, Talaromyces, Rhodotorula, Rhodosporidium, Rasamsonia, Zygosaccharomyces, Thermoascus, Thielavia, Trichosporon, Tolypocladium, Trametes, and Trichoderma. Fungal species include, but are not limited to, Aspergillus niger, Aspergillus oryzae, Aspergillus fumigatus, Penicillium chrysogenum, Penicillium citrinum, Acremonium chrysogenum, Trichoderma reesei, Rasamsonia emersonii (formerly known as Talaromyces emersonii), Aspergillus sojae, Chrysosporium lucknowense, and Myceliophthora thermophila.
[00152] In some embodiments, a microorganism can be an Ascomycete such as Gibberella fujikuroi, Kluyveromyces lactis, Schizosaccharomyces pombe, Geotrichum, Aspergillus niger, Yarrowia lipolytica, Ashbya gossypii, Yamadazyma philogaea, Lachancea kluyveri, Kodamaea ohmeri, or S. cerevisiae.
Agaricus, Gibberella, and Phanerochaete spp.
[00153] Agaricus, Gibberella, and Phanerochaete spp. can be useful because they are known to produce large amounts of isoprenoids in culture. Thus, the terpene precursors for producing large amounts of steviol glycosides are already produced by endogenous genes. Thus, modules comprising recombinant genes for steviol glycoside biosynthesis polypeptides can be introduced into species from such genera without the necessity of introducing mevalonate or MEP pathway genes.
Arxula adeninivorans (Blastobotrys adeninivorans)
[00154] Arxula adeninivorans is a dimorphic yeast (it grows as a budding yeast like baker's yeast up to a temperature of 42°C; above this threshold it grows in a filamentous form) with unusual biochemical characteristics. It can grow on a wide range of substrates and can assimilate nitrate. It has successfully been applied to the generation of strains that can produce natural plastics or the development of a biosensor for estrogens in environmental samples.
Rhodotorula sp.
[00155] Rhodotorula is a unicellular, pigmented yeast. The oleaginous red yeast, Rhodotorula glutinis, has been shown to produce lipids and carotenoids from crude glycerol (Saenge et al., 2011, Process Biochemistry 46(1):210-8). Rhodotorula toruloides strains have been shown to be an efficient fed-batch fermentation system for improved biomass and lipid productivity (Li et al., 2007, Enzyme and Microbial Technology 41:312-7).
Schizosaccharomyces spp.
[00156] Schizosaccharomyces is a genus of fission yeasts. Similar to S. cerevisiae, Schizosaccharomyces is a model organism in the study of eukaryotic cell biology. It provides an evolutionarily distant comparison to S. cerevisiae. Species include but are not limited to S. cryophilus and S. pombe. (See Hoffman et al., 2015, Genetics. 201(2):403-23).
Humicola spp.
[00157] Humicola is a genus of filamentous fungi. Species include but are not limited to H. alopallonella and H. siamensis.
Brettanomyces spp.
[00158] Brettanomyces is a non-spore forming genus of yeast. It is from the Saccharomycetaceae family and is commonly used in the brewing and wine industries. Brettanomyces produces several sensory compounds that contribute to the complexity of wine, specifically red wine. Brettanomyces species include but are not limited to B. bruxellensis and B. claussenii. See, e.g., Fugelsang et al., 1997, Wine Microbiology.
Trichosporon spp.
[00159] Trichosporon is a genus of the fungi family. Trichosporon species are yeast commonly isolated from the soil, but can also be found in the skin microbiota of humans and animals. Species include, for example but are not limited to, T. aquatile, T. beigelii, and T. dermatis.
Debaryomyces spp.
[00160] Debaryomyces is a genus of the ascomycetous yeast family, in which species are characterized as salt-tolerant marine species. Species include but are not limited to D. hansenii and D. hansenius.
Physcomitrella spp.
[00161] Physcomitrella mosses, when grown in suspension culture, have characteristics similar to yeast or other fungal cultures. This genus can be used for producing plant secondary metabolites, which can be difficult to produce in other types of cells.
Saccharomyces spp.
[00162] Saccharomyces is a widely used chassis organism in synthetic biology, and can be used as the recombinant microorganism platform. For example, there are libraries of mutants, plasmids, detailed computer models of metabolism and other information available for S. cerevisiae, allowing for rational design of various modules to enhance product yield. Methods are known for making recombinant microorganisms. Examples of Saccharomyces species include S. castellii, also known as Naumovozyma castelli.
Zygosaccharomyces spp.
[00163] Zygosaccharomyces is a genus of yeast. Originally classified under the Saccharomyces genus, it has since been reclassified. It is widely known in the food industry because several species are extremely resistant to commercially used food preservation techniques. Species include but are not limited to Z. bisporus and Z. cidri. (See Barnett et al., Yeasts: Characteristics and Identification, 1983).
Geotrichum spp.
[00164] Geotrichum is a fungus commonly found in soil, water and sewage worldwide. It is often identified in plants, cereal and dairy products. Species include, for example but are not limited to, G. candidum and G. klebahnii (see Carmichael et al., Mycologia, 1957, 49(6):820-830).
Kazachstania spp.
[00165] Kazachstania is a yeast genus in the family Saccharomycetaceae.
Torulaspora spp.
[00166] Torulaspora is a genus of yeasts and species include but are not limited to T. franciscae and T. globosa.
Aspergillus spp.
[00167] Aspergillus species such as A. oryzae, A. niger and A. sojae are widely used microorganisms in food production and can also be used as the recombinant microorganism platform. Nucleotide sequences are available for genomes of A. nidulans, A. fumigatus, A. oryzae, A. clavatus, A. flavus, A. niger, and A. terreus, allowing rational design and modification of endogenous pathways to enhance flux and increase product yield. Metabolic models have been developed for Aspergillus, as well as transcriptomic studies and proteomics studies. A. niger is cultured for the industrial production of a number of food ingredients such as citric acid and gluconic acid, and thus species such as A. niger are generally suitable for producing steviol glycosides.
Yarrowia lipolytica
[00168] Yarrowia lipolytica is a dimorphic yeast (see Arxula adeninivorans) and belongs to the family Hemiascomycetes. The entire genome of Yarrowia lipolytica is known. Yarrowia species are aerobic and considered to be non-pathogenic. Yarrowia is efficient in using hydrophobic substrates (e.g., alkanes, fatty acids, and oils) and can grow on sugars. It has a high potential for industrial applications and is an oleaginous microorganism. Yarrowia lipolytica can accumulate lipid content to approximately 40% of its dry cell weight and is a model organism for lipid accumulation and remobilization. See, e.g., Nicaud, 2012, Yeast 29(10):409-18; Beopoulos et al., 2009, Biochimie 91(6):692-6; Bankar et al., 2009, Appl Microbiol Biotechnol. 84(5):847-65.
Rhodosporidium toruloides
[00169] Rhodosporidium toruloides is an oleaginous yeast and is useful for engineering lipid-production pathways (see, e.g., Zhu et al., 2013, Nature Commun. 3:1112; Ageitos et al., 2011, Applied Microbiology and Biotechnology 90(4):1219-27).
Candida boidinii
[00170] Candida boidinii is a methylotrophic yeast (it can grow on methanol). Like other methylotrophic species such as Hansenula polymorpha and Pichia pastoris, it provides an excellent platform for producing heterologous proteins. Yields in a multigram range of a secreted foreign protein have been reported. A computational method, IPRO, recently predicted mutations that experimentally switched the cofactor specificity of Candida boidinii xylose reductase from NADPH to NADH. See, e.g., Mattanovich et al., 2012, Methods Mol Biol. 824:329-58; Khoury et al., 2009, Protein Sci. 18(10):2125-38.
Hansenula polymorpha (Pichia angusta)
[00171] Hansenula polymorpha is methylotrophic yeast (see Candida boidinii). It can furthermore grow on a wide range of other substrates; it is thermo-tolerant and can assimilate nitrate (see also, Kluyveromyces lactis). It has been applied to producing hepatitis B vaccines, insulin and interferon alpha-2a for the treatment of hepatitis C, furthermore to a range of technical enzymes. See, e.g., Xu et al., 2014, Virol Sin. 29(6):403-9.
Candida krusei (Issatchenkia orientalis)
[00172] Candida krusei, scientific name Issatchenkia orientalis, is widely used in chocolate production. C. krusei is used to remove the bitter taste of and break down cacao beans. In addition to this species' involvement in chocolate production, C. krusei is commonly found in the immunocompromised as a fungal nosocomial pathogen (see Mastromarino et al., New Microbiologica, 36:229-238; 2013).
Kluyveromyces lactis
[00173] Kluyveromyces lactis is a yeast regularly applied to the production of kefir. It can grow on several sugars, most importantly on lactose, which is present in milk and whey. It has successfully been applied, among other uses, to producing chymosin (an enzyme that is usually present in the stomach of calves) for cheese production. Production takes place in fermenters on a 40,000 L scale. See, e.g., van Ooyen et al., 2006, FEMS Yeast Res. 6(3):381-92.
Pichia pastoris
[00174] Pichia pastoris is a methylotrophic yeast (see Candida boidinii and Hansenula polymorpha). It is also commonly referred to as Komagataella pastoris. It provides an efficient platform for producing foreign proteins. Platform elements are available as a kit, and it is used worldwide in academia for producing proteins. Strains have been engineered that can produce complex human N-glycans (yeast glycans are similar but not identical to those found in humans). See, e.g., Piirainen et al., 2014, N Biotechnol. 31(6):532-7.
Scheffersomyces stipitis
[00175] Scheffersomyces stipitis, also known as Pichia stipitis, is a homothallic yeast found in haploid form. It is commonly used instead of S. cerevisiae due to its enhanced respiratory capacity, which results from an alternative respiratory system (see Papini et al., Microbial Cell Factories, 11:136 (2012)).
[00176] In some embodiments, a microorganism can be an insect cell such as Drosophila, specifically, Drosophila melanogaster.
[00177] In some embodiments, a microorganism can be an algal cell such as, for example but not limited to, Blakeslea trispora, Dunaliella salina, Haematococcus pluvialis, and Chlorella sp.
[00178] In some embodiments, a microorganism can be a cyanobacterial cell such as, for example but not limited to, Blakeslea trispora, Dunaliella salina, Haematococcus pluvialis, Chlorella sp., Undaria pinnatifida, Sargassum, Laminaria japonica, and Scenedesmus almeriensis.
[00179] In some embodiments, a microorganism can be a bacterial cell. Examples of bacteria include, but are not limited to, the genera Bacillus (e.g., B. subtilis, B. amyloliquefaciens, B. licheniformis, B. puntis, B. megaterium, B. halodurans, B. pumilus), Acinetobacter, Nocardia, Xanthobacter, Escherichia (e.g., E. coli), Streptomyces, Erwinia, Klebsiella, Serratia (e.g., S. marcescens), Pseudomonas (e.g., P. aeruginosa), and Salmonella (e.g., S. typhimurium and S. typhi). Bacterial cells may also include, but are not limited to, photosynthetic bacteria (e.g., green non-sulfur bacteria (e.g., Chloroflexus bacteria (e.g., C. aurantiacus), Chloronema (e.g., C. giganteum)), green sulfur bacteria (e.g., Chlorobium bacteria (e.g., C. limicola), Pelodictyon (e.g., P. luteolum)), purple sulfur bacteria (e.g., Chromatium (e.g., C. okenii)), and purple non-sulfur bacteria (e.g., Rhodospirillum (e.g., R. rubrum), Rhodobacter (e.g., R. sphaeroides, R. capsulatus), and Rhodomicrobium bacteria (e.g., R. vannielii))).
E. coli
[00180] E. coli, another widely used platform organism in synthetic biology, can also be used as the recombinant microorganism platform. Similar to Saccharomyces, there are libraries of mutants, plasmids, detailed computer models of metabolism and other information available for E. coli, allowing for rational design of various modules to enhance product yield. Methods similar to those described above for Saccharomyces can be used to make recombinant E. coli microorganisms.
[00181] It can be appreciated that the recombinant host cell disclosed herein can comprise a plant cell, comprising a plant cell that is grown in a plant, a mammalian cell, an insect cell, a fungal cell from Aspergillus genus; a yeast cell from Saccharomyces (e.g., S. cerevisiae, S. bayanus, S. pastorianus, and S. carlsbergensis), Schizosaccharomyces (e.g., S. pombe), Yarrowia (e.g., Y. lipolytica), Candida (e.g., C. glabrata, C. albicans, C. krusei, C. revkaufi, C. pulcherrima, Candida tropicalis, C. utilis, and C. boidinii), Ashbya (e.g., A. gossypii), Cyberlindnera (e.g., C. jadinii), Pichia (e.g., P. pastoris and P. kudriavzevii), Kluyveromyces (e.g., K. lactis), Hansenula (e.g., H. polymorpha), Arxula (e.g., A. adeninivorans), Xanthophyllomyces (e.g., X. dendrorhous), Issatchenkia (e.g., I. orientalis), Torulaspora (e.g., T. franciscae and T. globosa), Geotrichum (e.g., G. candidum and G. klebahni), Zygosaccharomyces (e.g., Z. bisporus and Z. cidri), Yamadazyma (e.g., Y. philogaea), Lachancea (e.g., L. kluyveri), Kodamaea (e.g., K. ohmeri), Brettanomyces (e.g., B. anomalus), Trichosporon (e.g., T. aquatile, T. beigelii, and T. dermatis), Debaryomyces (e.g., D. hansenuis and D. hansenii), Scheffersomyces (e.g., S. stipitis), Rhodosporidium (e.g., R. toruloides), Pachysolen (e.g., P. tannophilus), and Physcomitrella, Rhodotorula, Kazachstania, Gibberella, Agaricus, and Phanerochaete genera; an insect cell including, but not limited to, Drosophila melanogaster; an algal cell including, but not limited to, Blakeslea trispora, Dunaliella salina, Haematococcus pluvialis, Chlorella sp., Undaria pinnatifida, Sargassum, Laminaria japonica, and Scenedesmus almeriensis species; or a bacterial cell from Bacillus genus (e.g., B. subtilis, B. amyloliquefaciens, B. licheniformis, B. puntis, B. megaterium, B. halodurans, and B. pumilus), Acinetobacter, Nocardia, Xanthobacter genera, Escherichia (e.g., E. coli), Streptomyces, Erwinia, Klebsiella, Serratia (e.g., S. marcescens), Pseudomonas (e.g., P. aeruginosa), Salmonella (e.g., S. typhimurium and S. typhi), and further including Chloroflexus bacteria (e.g., C. aurantiacus), Chloronema (e.g., C. giganteum), green sulfur bacteria (e.g., Chlorobium bacteria (e.g., C. limicola), Pelodictyon (e.g., P. luteolum)), purple sulfur bacteria (e.g., Chromatium (e.g., C. okenii)), and purple non-sulfur bacteria (e.g., Rhodospirillum (e.g., R. rubrum), Rhodobacter (e.g., R. sphaeroides and R. capsulatus), and Rhodomicrobium bacteria (e.g., R. vannielii)).
[00182] In some embodiments, the host organism is a plant. A plant or plant cell can be transformed by having a heterologous gene integrated into its genome, i.e., it can be stably transformed. Stably transformed cells typically retain the introduced nucleic acid with each cell division. A plant or plant cell can also be transiently transformed such that the recombinant gene is not integrated into its genome. Transiently transformed cells typically lose all or some portion of the introduced nucleic acid with each cell division such that the introduced nucleic acid cannot be detected in daughter cells after a certain number of cell divisions. Both transiently transformed and stably transformed transgenic plants and plant cells can be useful in the methods described herein.
[00183] Plant cells comprising a heterologous gene used in methods described herein can constitute part or all of a whole plant. Such plants can be grown in a manner suitable for the species under consideration, either in a growth chamber, a greenhouse, or in a field. Plants may also be progeny of an initial plant comprising a heterologous gene provided the progeny inherits the heterologous gene. Seeds produced by a transgenic plant can be grown and then selfed (or outcrossed and selfed) to obtain seeds homozygous for the nucleic acid construct.
[00184] The plants to be used with the invention can be grown in suspension culture, or tissue or organ culture. For the purposes of this invention, solid and/or liquid tissue culture techniques can be used. When using solid medium, plant cells can be placed directly onto the medium or can be placed onto a filter that is then placed in contact with the medium. When using liquid medium, transgenic plant cells can be placed onto a flotation device, e.g., a porous membrane that contacts the liquid medium.
[00185] When transiently transformed plant cells are used, a reporter sequence encoding a reporter polypeptide having a reporter activity can be included in the transformation procedure and an assay for reporter activity or expression can be performed at a suitable time after transformation. A suitable time for conducting the assay typically is about 1-21 days after transformation, e.g., about 1-14 days, about 1-7 days, or about 1-3 days. The use of transient assays is particularly convenient for rapid analysis in different species, or to confirm expression of a heterologous polypeptide whose expression has not previously been confirmed in particular recipient cells.
[00186] Techniques for introducing nucleic acids into monocotyledonous and dicotyledonous plants are known in the art, and include, without limitation, Agrobacterium-mediated transformation, viral vector-mediated transformation, electroporation, and particle gun transformation; see U.S. Patent Nos. 5,538,880; 5,204,253; 6,329,571; and 6,013,863. If a cell or cultured tissue is used as the recipient tissue for transformation, plants can be regenerated from transformed cultures if desired, by techniques known to those skilled in the art.
[00187] The plant comprising a heterologous nucleic acid to be used with the present invention can for example be: corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), sunflower (Helianthus annuus), wheat (Triticum aestivum and other species), Triticale, Rye (Secale), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), apple (Malus spp.), Pear (Pyrus spp.), plum and cherry tree (Prunus spp.), Ribes (currant etc.), Vitis, Jerusalem artichoke (Helianthemum spp.), non-cereal grasses (Grass family), sugar and fodder beets (Beta vulgaris), chicory, oats, barley, vegetables, or ornamentals.
[00188] For example, plants of the present invention are crop plants (for example, cereals and pulses, maize, wheat, potatoes, tapioca, rice, sorghum, millet, cassava, barley, pea, sugar beets, sugar cane, soybean, oilseed rape, sunflower, and other root, tuber, or seed crops). Other important plants may be fruit trees, crop trees, forest trees, or plants grown for their use as spices or pharmaceutical products (Mentha spp., clove, Artemisia spp., Thymus spp., Lavandula spp., Allium spp., Hypericum, Catharanthus spp., Vinca spp., Papaver spp., Digitalis spp., Rauwolfia spp., Vanilla spp., Petroselinum spp., Eucalyptus, tea tree, Picea spp., Pinus spp., Abies spp., Juniperus spp.). Horticultural plants which can be used with the present invention may include lettuce, endive, and vegetable brassicas including cabbage, broccoli, and cauliflower, carrots, and carnations and geraniums.
[00189] The plant can also be tobacco, cucurbits, carrot, strawberry, sunflower, tomato, pepper, or Chrysanthemum.
[00190] The plant may also be a grain plant, for example an oil-seed plant or a leguminous plant. Seeds of interest include grain seeds, such as corn, wheat, barley, sorghum, rye, etc. Oil-seed plants include cotton, soybean, safflower, sunflower, Brassica, maize, alfalfa, palm, coconut, etc. Leguminous plants include beans and peas. Beans include guar, locust bean, fenugreek, soybean, garden beans, cowpea, mung bean, lima bean, fava bean, lentils, and chickpea.
[00191] In a further embodiment of the invention the plant can be maize, rice, wheat, sugar beet, sugar cane, tobacco, oil seed rape, potato, soybean, or Arabidopsis thaliana. In some embodiments, the plant is not C. forskohlii.
Table 1. Sequence listing key.
Figure imgf000049_0001
SEQ ID NO:16 Amino acid sequence of TPS2
SEQ ID NO:17 Amino acid sequence of TPS3
SEQ ID NO:18 Amino acid sequence of TPS4
SEQ ID NO:19 Amino acid sequence of CYP76AH16
SEQ ID NO:20 Amino acid sequence of CYP76AH8
SEQ ID NO:21 Amino acid sequence of CYP76AH11
SEQ ID NO:22 Amino acid sequence of CYP76AH15
SEQ ID NO:23 Amino acid sequence of CYP76AH17
SEQ ID NO:26 Amino acid sequence of ACT1-8
SEQ ID NO:27 cDNA encoding ACT1-8
SEQ ID NO:28 DNA Sequence encoding ACT1-8
SEQ ID NO:29 cDNA encoding DXS
SEQ ID NO:30 Amino acid sequence of DXS
SEQ ID NO:31 cDNA encoding GGPPS
SEQ ID NO:32 Amino acid sequence of GGPPS
SEQ ID NO:33 cDNA encoding POR
SEQ ID NO:34 Amino acid sequence of POR
SEQ ID NO:35 DNA Sequence encoding TPS2
SEQ ID NO:36 DNA Sequence encoding TPS3
SEQ ID NO:37 Amino acid sequence of GGPPS
SEQ ID NO:38 DNA Sequence encoding GGPPS
SEQ ID NO:39 Amino acid sequence of CYP2A1
SEQ ID NO:40 Amino acid sequence of CYP71D55
SEQ ID NO:41 Amino acid sequence of CYP71AJ6
SEQ ID NO:42 Amino acid sequence of CYP76AH15 A99I
SEQ ID NO:43 Amino acid sequence of CYP76AH15 A100V
SEQ ID NO:44 Amino acid sequence of CYP76AH15 G104D
SEQ ID NO:45 Amino acid sequence of CYP76AH15 V207T
SEQ ID NO:46 Amino acid sequence of CYP76AH15 S235G
SEQ ID NO:47 Amino acid sequence of CYP76AH15 Y236F
SEQ ID NO:48 Amino acid sequence of CYP76AH15 S235G Y236F
SEQ ID NO:49 Amino acid sequence of CYP76AH15 G362V
SEQ ID NO:50 Amino acid sequence of CYP76AH15 L366F
SEQ ID NO:51 Amino acid sequence of CYP76AH15 G362V L366F
SEQ ID NO:52 Amino acid sequence of CYP76AH15 L366E
SEQ ID NO:53 Amino acid sequence of CYP76AH15 D473E D474L + P475 deletion
SEQ ID NO:54 Amino acid sequence of CYP76AH15 F476T
SEQ ID NO:55 Amino acid sequence of CYP76AH15 L478M
SEQ ID NO:56 Amino acid sequence of CYP76AH15 L478I
SEQ ID NO:57 Amino acid sequence of CYP76AH15 L478A
SEQ ID NO:58 Amino acid sequence of CYP76AH15 A99I L366F
SEQ ID NO:59 Amino acid sequence of CYP76AH15 A99I L366E
SEQ ID NO:60 Amino acid sequence of CYP76AH15 G104D S235G
SEQ ID NO:61 Amino acid sequence of CYP76AH15 S235G L366F
SEQ ID NO:62 Amino acid sequence of CYP76AH15 A99I S235G Y236F
SEQ ID NO:63 Amino acid sequence of CYP76AH15 S235G Y236F L366E
SEQ ID NO:64 Amino acid sequence of CYP76AH15 A99I S235G Y236F L366F
SEQ ID NO:65 Amino acid sequence of TPS1
SEQ ID NO:66 DNA sequence encoding CYP76AH15
SEQ ID NO:67 DNA sequence encoding CYP76AH15
SEQ ID NO:68 Amino acid sequence of Tomato bushy stunt virus P19
SEQ ID NO:70 Amino acid sequence of FS1
SEQ ID NO:71 Amino acid sequence of CYP76AH4
SEQ ID NO:72 Amino acid sequence of CYP76AH6
SEQ ID NO:73 Amino acid sequence of FS
SEQ ID NO:74 Amino acid sequence of CYP76AH15 S235G Y236F L366F
SEQ ID NO:75 Amino acid sequence of CYP76AH15 A99I S235G Y236F L366E
[00192] The invention will be further described in the following examples, which do not limit the scope of the invention described in the claims.
EXAMPLES
[00193] The Examples that follow are illustrative of specific embodiments of the invention, and various uses thereof. They are set forth for explanatory purposes only, and are not to be taken as limiting the invention.
Example 1: Identification and comparison of the putative SRS regions in the C. forskohlii CYP76AH
[00194] The putative substrate recognition sites 1-6 (SRS1-6) of cytochrome P450s were selected for site-directed mutagenesis, as these areas comprise residues important for the catalytic activities of CYPs. The SRS regions were identified in CYP76AH8 (SEQ ID NO:20), CYP76AH11 (SEQ ID NO:21), CYP76AH15 (SEQ ID NO:22), CYP76AH16 (SEQ ID NO:19), and CYP76AH17 (SEQ ID NO:23) by alignments and comparisons of reported SRS regions in the Rattus norvegicus CYP2A1 (SEQ ID NO:39), Hyoscyamus muticus CYP71D55 (SEQ ID NO:40), and Thapsia villosa CYP71AJ6 (SEQ ID NO:41). SRS regions of R. norvegicus CYP2A1 (SEQ ID NO:39), H. muticus CYP71D55 (SEQ ID NO:40), and T. villosa CYP71AJ6 (SEQ ID NO:41) are shown in Figure 5A, and SRS regions of CYP76AH8 (SEQ ID NO:20), CYP76AH11 (SEQ ID NO:21), CYP76AH15 (SEQ ID NO:22), CYP76AH16 (SEQ ID NO:19), and CYP76AH17 (SEQ ID NO:23) are shown in Figure 5B.
[00195] Comparative homology modeling was furthermore utilized to determine and visualize SRS regions to identify residues involved in regulating the activity of the CYP76AH15 enzyme. See Figure 5C. Homology modeling was carried out using UCSF Chimera version 1.10.2 (University of California) and Modeller 9.15 (University of California). BLAST searches in the PDB database were carried out using CYP76AH sequences as a query to find templates. Two templates were utilized for each modeling using default settings in Modeller except inclusion of the HEME heteroatom from the 3RUK template (Table 2).
Table 2. Templates used for modeling.
Figure imgf000053_0001
[00196] The SRS regions of CYP76AH enzymes that catalyze formation of 11-oxo-13R-MO (CYP76AH15, CYP76AH8, and CYP76AH17) were compared. See Table 3 and Figure 5D. CYP76AH8 and CYP76AH17 were found to share an overall sequence identity of 88%, whereas the total sequence identity in the SRS regions was found to be 99%, with a single conservative amino acid substitution in SRS1 (A117S in CYP76AH17), suggesting a higher sequence conservation in these areas. Differences in SRS1-6 between CYP76AH15 and CYP76AH8/CYP76AH17 were mainly in SRS1, SRS3, and SRS6, with sequence identities below the overall sequence identities, suggesting that SRS1, SRS3, and SRS6 could be responsible for the catalytic differences between CYP76AH15 and either CYP76AH8 or CYP76AH17. The SRS5 region was conserved between all three enzymes.
[00197] The SRS regions of CYP76AH11 and CYP76AH16, which carry out distinct reactions in the forskolin pathway (CYP76AH11 is multifunctional, and CYP76AH16 is region-specific towards C-9 hydroxylation), were also compared. See Table 3 and Figure 5D. The SRS differences between CYP76AH11 and CYP76AH16 shared a similar pattern to the differences between CYP76AH8/CYP76AH17 and CYP76AH15; differences were found in SRS1 (54% identity), SRS3 (50% identity), and SRS6 (63% identity). SRS5 was not conserved between CYP76AH11 and CYP76AH16.
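The percent-identity figures reported for the SRS regions can be illustrated with a minimal Python sketch. It assumes that identity is computed over pre-aligned segments of equal length, that a gap opposite a residue counts as a mismatch, and that columns where both sequences have a gap are ignored; the source does not state which convention was used. The function name and the example segments are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: '-' marks a gap; a gap opposite a residue is a mismatch,
    and columns where both sequences have a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" and res_b == "-":
            continue  # column carries no information
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical 10-residue segments differing at one position.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
print(f"{identity:.0f}% identity")
```

Applying the same function once to full-length aligned sequences and once to each SRS segment would give the kind of overall versus per-region comparison described above.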
[00198] Homology modeling and comparisons of a variety of putative substrate recognition sites (SRS) were used to produce CYP proteins having amino acid sequence variants of the amino acid sequence of CYP76AH15 with substitutions in the SRS1, SRS3, SRS5, and/or SRS6 regions. These variant sequences are set forth in SEQ ID NOs:42-64 and 74-75, and identified in Tables 4 and 5.
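The variant names used in the sequence listing key (e.g., A99I, or S235G Y236F) follow the usual convention of wild-type residue, 1-based position, and replacement residue. A minimal Python sketch of how such a label could be applied to a parent amino acid sequence is given below. The function name and the toy sequence are hypothetical; the sketch handles substitutions only and does not cover the deletion in the D473E D474L + P475 deletion variant.

```python
import re

# One substitution token: wild-type residue, 1-based position, new residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(parent: str, variant: str) -> str:
    """Return the parent sequence with substitutions such as 'A99I' applied.

    Several substitutions may be given separated by whitespace. Each one is
    checked against the parent so that a mislabelled position is detected.
    """
    residues = list(parent)
    for token in variant.split():
        match = _SUBSTITUTION.match(token)
        if match is None:
            raise ValueError(f"unsupported variant token: {token!r}")
        wild_type, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3))
        if position < 1 or position > len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if parent[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, "
                f"found {parent[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy six-residue parent, not a sequence from this document.
print(apply_substitutions("MKTAYL", "A4I L6F"))
```

Checking the wild-type residue before substituting is a design choice that guards against off-by-one numbering errors when labels are transcribed between tables.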
Table 3. Sequence comparison of SRS regions in CYP76AH8, CYP76AH11, CYP76AH15, CYP76AH16, and CYP76AH17, shown in percent identity.
Figure imgf000054_0001
Example 2: Functional characterization of CYP76AH15 variants in tobacco
[00199] Initial identification and characterization of CYP76AH15 variant enzymes were conducted using transient N. benthamiana (tobacco) in vivo expression. CYP76AH15 variants were created using site-directed mutagenesis; variants were subsequently cloned for Agrobacterium-mediated transient tobacco expression together with TPS2 (SEQ ID NO:16), TPS3 (SEQ ID NO:17), anti-post transcriptional suppressor protein P19 (SEQ ID NO:68), DXS (SEQ ID NO:30), and C. forskohlii GGPPS (SEQ ID NO:32). Agrobacterium suspensions normalized at an OD600 of 1 were mixed in equivalent volumes prior to infiltrations. Tobacco plants 4-6 weeks old were infiltrated and left at greenhouse conditions for 7 days before extracting metabolites. Two 2 cm diameter infiltrated leaf discs were collected from one plant from two separate tobacco leaves and extracted by adding 1 mL n-hexane (GC grade) spiked with 1 mg/L 1-eicosene as internal standard in 1.5 mL glass vials, vortexed, and left shaking for 1 h at room temperature (RT) at above 200 RPM. Each plant combination was carried out in biological triplicates. Plant material was pelleted by centrifugation and the supernatants were transferred to new GC vials before analysis.
[00200] The Shimadzu GCMS-QP2010 Ultra system was utilized for GC-MS analysis using an Agilent HP-5MS column (30 m x 0.25 mm i.d., 0.25 μm film thickness). Injection volume was set to 1 μL and the injection temperature at 250°C with the following GC program: 50°C for 2 min, ramp at rate 4°C/min to 110°C, ramp at rate 8°C/min to 250°C, ramp at rate 10°C/min to 310°C and hold for 5 min. The ion source temperature of the mass spectrometer was set to 250°C, and spectra were recorded from m/z 50 to m/z 350. Compound identification was carried out using authentic standards and comparison to reference spectra in databases (Wiley Registry of Mass Spectral Data, 8th Edition, July 2006, John Wiley & Sons, ISBN: 978-0-470-04785-9). Relative differences in diterpene yields were based on an average of the total ion chromatogram (TIC) peak area for the compound of interest normalized to the TIC area of internal standard (1-eicosene).
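The oven program and the internal-standard normalization described above can be sketched in Python as follows. The run-time calculation uses only the temperatures, ramp rates, and hold times stated in the text. The normalization assumes that each replicate's compound TIC area is divided by that replicate's 1-eicosene TIC area before averaging; the text does not say whether ratios or raw areas were averaged, so this is one possible reading. All function names and peak areas are hypothetical.

```python
from statistics import mean


def gc_run_time(start_temp, initial_hold, ramps, final_hold):
    """Total oven program time in minutes.

    ramps is a sequence of (rate in deg C per min, target temp in deg C).
    """
    total = initial_hold
    current = start_temp
    for rate, target in ramps:
        total += (target - current) / rate
        current = target
    return total + final_hold


def relative_yield(compound_areas, standard_areas):
    """Mean of per-replicate compound/internal-standard TIC area ratios."""
    if len(compound_areas) != len(standard_areas) or not compound_areas:
        raise ValueError("need matching, non-empty replicate lists")
    return mean(c / s for c, s in zip(compound_areas, standard_areas))


# Program from the text: 50 C for 2 min, 4 C/min to 110 C,
# 8 C/min to 250 C, 10 C/min to 310 C, hold 5 min.
minutes = gc_run_time(50.0, 2.0, [(4.0, 110.0), (8.0, 250.0), (10.0, 310.0)], 5.0)
print(f"run time: {minutes} min")

# Hypothetical TIC areas for three biological replicates.
print(relative_yield([200.0, 300.0, 400.0], [100.0, 100.0, 100.0]))
```

Normalizing within each replicate before averaging compensates for injection-to-injection variation in the amount of extract reaching the detector, which is the usual purpose of an internal standard.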
[00201] GC-MS analysis of hexane extracts from tobacco leaf discs revealed accumulation of 13R-MO in control plants with no P450s (Figure 6A). Plants comprising CYP76AH15 had reduced accumulation of 13R-MO, compared to the control, but showed accumulation of 11-oxo-13R-MO (Figure 6A) as well as further oxygenated derivatives (Figure 7A). CYP76AH15 variants shown in Table 4 were transiently expressed and compared to the diterpene accumulation pattern of the native CYP76AH15.
Table 4. CYP76AH15 variants tested in N. benthamiana and characterization of diterpene product profile.
Figure imgf000055_0001
[00202] The SRS1 variant A99I accumulated 11β-hydroxy-13R-MO and 11-oxo-13R-MO (Figure 6B). The SRS5 variant L366F accumulated mainly 11-oxo-13R-MO (Figure 6A) as well as three putative single hydroxylated derivatives of the following formula: C20H32O2 (Figure 7A, compounds a/b/c). An SRS5 variant, G362V, however, was inactive when combined with L366F; the G362V L366F variant produced 11β-hydroxy-13R-MO but accumulated unused 13R-MO (Figure 6A). Variant L366E displayed a similar pattern as A99I with mainly 11β-hydroxy-13R-MO and 11-oxo-13R-MO accumulating (Figure 6A) and further oxygenated derivatives (Figure 7A). The SRS1+SRS5 variant A99I L366F demonstrated a profile similar to that of A99I, while the similar combination A99I L366E was inactive (Figure 6B).
[00203] In addition to CYP76AH15 producing 11-oxo-13R-MO from 13R-MO, it also produces ferruginol from abietatriene. CYP76AH15 variants were tested for their ability to produce ferruginol from abietatriene to further study effects from mutagenesis in SRS sites. CYP76AH15 and CYP76AH15 variants were transiently expressed together with C. forskohlii TPS1 (SEQ ID NO:65) and C. forskohlii TPS3 (SEQ ID NO:17) producing miltiradiene and abietatriene. Miltiradiene and abietatriene were found in GC-MS analysis of hexane extracts from tobacco leaf discs in control plants with no P450s (Figure 8A). The addition of CYP76AH15 resulted in production of ferruginol (Figure 8A). SRS5 variants (G362V, G362V L366F, L366E) were inactive except for L366F, which demonstrated lowered accumulation of ferruginol and a change in miltiradiene and abietatriene ratios, compared to the native CYP76AH15 (Figure 8A). The SRS6 variant of SEQ ID NO:53 also demonstrated lower accumulation of ferruginol.
Example 3: Expression of CYP76AH15 variants in S. cerevisiae
[00204] The CYP76AH15 variants were individually genomically integrated in an S. cerevisiae strain further comprising C. forskohlii POR (SEQ ID NO:34), C. forskohlii TPS2 (SEQ ID NO:16), C. forskohlii TPS3 (SEQ ID NO:17), and Synechococcus sp. GGPPS (SEQ ID NO:37). A control strain comprising C. forskohlii POR (SEQ ID NO:34) and no CYPs accumulated 13R-MO (Figures 9A, 9B). Incorporation of native CYP76AH15 (SEQ ID NO:66) or codon-optimized CYP76AH15 (SEQ ID NO:67) led to accumulation of 11-oxo-13R-MO and 11β-hydroxy-13R-MO as well as a hydroxylated 11-oxo-13R-MO derivative with no significant (P value > 0.05) difference in relative diterpene yields.
[00205] Incorporation of the SRS1 (A99I), SRS3 (S235G Y236F), and SRS5 (L366F, L366E) variants led to above 2-fold accumulation of 11-oxo-13R-MO, while the 473DPP::EL SRS6 variant displayed a change in diterpene accumulation pattern (Figure 9C). Variant A99I displayed a 5.6-fold increase in accumulated 11-oxo-13R-MO, while levels of 13R-MO dropped significantly (P value < 0.05) compared to the non-altered and codon-optimized CYP76AH15 (Figure 9C). Variant G362V produced small amounts of 11β-hydroxy-13R-MO, while G362V L366F produced mainly 11β-hydroxy-13R-MO (Figure 9D).
[00206] The effect of combinatorial mutagenesis of the SRS1, SRS3, and SRS5 variants was further tested. As shown in Table 5, combining SRS1 and SRS3 variants (A99I S235G Y236F of SEQ ID NO:62) resulted in a variant with a 6.5-fold increase in 11-oxo-13R-MO formation, while combining SRS1 and SRS5 led to a 6.2-fold increase in 11-oxo-13R-MO (A99I L366F of SEQ ID NO:58) and an inactive enzyme (A99I L366E). Combining SRS3+SRS5 variants yielded an enzyme (S235G Y236F L366E of SEQ ID NO:63) with a 3.1-fold increase in 11-oxo-13R-MO (Table 5). Furthermore, the A99I S235G Y236F L366F variant of SEQ ID NO:64 resulted in a 31.4-fold increase in 11β-hydroxy-13R-MO.
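Fold changes of the kind reported above are ratios of a variant's relative yield to that of the native enzyme. A minimal Python sketch is shown below, assuming that each yield is already normalized to the internal standard and averaged over replicates. The function name, dictionary keys, and numeric values are hypothetical placeholders and do not reproduce data from Table 5.

```python
def fold_changes(yields, reference="native"):
    """Fold change of each entry relative to the reference entry.

    yields maps a strain label to a normalized relative yield.
    """
    reference_yield = yields[reference]
    if reference_yield <= 0:
        raise ValueError("reference yield must be positive")
    return {
        label: value / reference_yield
        for label, value in yields.items()
        if label != reference
    }


# Hypothetical normalized yields; the labels are illustrative only.
example = {"native": 2.0, "variant_a": 13.0, "variant_b": 1.0}
for label, change in fold_changes(example).items():
    print(f"{label}: {change:.1f}-fold")
```

A value above 1 indicates increased accumulation relative to the native enzyme, and a value below 1 indicates decreased accumulation.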
Table 5. Fold changes for CYP76AH15 variants (relative to native CYP76AH15) in S. cerevisiae.
Figure imgf000057_0001
[00207] An S. cerevisiae strain producing miltiradiene and abietatriene was also constructed to determine the effects of CYP76AH15 and CYP76AH15 variants on formation of ferruginol in S. cerevisiae. The strain further comprised C. forskohlii TPS1 (SEQ ID NO:65), C. forskohlii TPS3 (SEQ ID NO:17), Synechococcus sp. GGPPS (SEQ ID NO:37), and C. forskohlii POR (SEQ ID NO:34). The control strain comprising C. forskohlii POR (SEQ ID NO:34) and no CYPs produced miltiradiene and abietatriene (Figure 10). Genomic integration of the native CYP76AH15 and C. forskohlii POR (SEQ ID NO:34) resulted in production of ferruginol, as did expression of CYP76AH15 S235G Y236F (SEQ ID NO:48) or D473E D474L + P475 deletion (SEQ ID NO:53) (Figure 10A). In addition, as shown in Figure 10B, expression of CYP76AH8 (SEQ ID NO:20) in an S. cerevisiae strain further comprising C. forskohlii TPS1 (SEQ ID NO:65), C. forskohlii TPS3 (SEQ ID NO:17), Synechococcus sp. GGPPS (SEQ ID NO:37), and C. forskohlii POR (SEQ ID NO:34) resulted in production of ferruginol.
Example 4: Step-wise incorporation of CYP76AH15 and CYP76AH15 A99I variant with CYP76AH11 or CYP76AH16 in S. cerevisiae
[00208] The native CYP76AH15 (SEQ ID NO:22) or the CYP76AH15 A99I variant (SEQ ID NO:42) was combined with CYP76AH11 (SEQ ID NO:21) or CYP76AH16 (SEQ ID NO:19) in a 13R-MO-producing S. cerevisiae strain to observe effects on forskolin intermediate formation. Introduction of CYP76AH15 (SEQ ID NO:22) or CYP76AH11 (SEQ ID NO:21) led to formation of compounds with the formula C20H32O3 and C20H32O4, corresponding to single hydroxylation and double hydroxylation of 11-oxo-13R-manoyl oxide, respectively. See Figure 11. The combination of CYP76AH15 A99I (SEQ ID NO:42) with CYP76AH11 (SEQ ID NO:21) led to a similar product profile, however, with a slight increase in compounds with the formula C20H32O3 and C20H32O4 (Figure 11). Combining native CYP76AH15 (SEQ ID NO:22) with CYP76AH16 (SEQ ID NO:19) led to formation of mainly a single C20H32O3 compound corresponding to a single hydroxylation of 11-oxo-13R-manoyl oxide. Combining CYP76AH15 A99I (SEQ ID NO:42) and CYP76AH16 (SEQ ID NO:19) led to an increase in formation of 11-oxo-13R-manoyl oxide as well as a C20H32O3 compound and a compound with the formula C20H32O4, which corresponds to a double hydroxylation of 11-oxo-13R-manoyl oxide (Figure 11).
Example 5: Further characterization of CYP76AH enzymes
[00209] As shown in Figures 4 and herein, CYP76AH enzymes accept multiple diterpene substrates and produce several products from one substrate. See also, Ignea et al., 2016, Microb. Cell Fact. 15:46 and Guo et al., 2016, New Phytol. 210:525-34. CYP76AH subfamily members are involved in production of ferruginol and 11-hydroxy-ferruginol in rosemary and sage (Figure 12A). Here, the promiscuous nature of CYP76AH subfamily members from rosemary, sage, and C. forskohlii on 13R-MO was explored.
[00210] The enzymes were transiently expressed in N. benthamiana with diterpene synthases. As shown in Figures 13A and 13B, enzymes from rosemary and sage species, including R. officinalis CYP76AH4 (SEQ ID NO:71), R. officinalis FS1 (SEQ ID NO:70), S. fruticosa FS (SEQ ID NO:73), and R. officinalis CYP76AH6 (SEQ ID NO:72), were able to produce the forskolin precursors 11-oxo-13R-MO and 11β-hydroxy-13R-MO. The control comprised diterpene synthases but no CYPs and thus did not produce 11-oxo-13R-MO or 11β-hydroxy-13R-MO. Furthermore, expression of C. forskohlii CYP76AH8 (SEQ ID NO:20) in N. benthamiana resulted in production of 11-oxo-13R-MO and 11β-hydroxy-13R-MO (Figure 13B). Figure 12B summarizes reactions catalyzed by R. officinalis, S. fruticosa, and C. forskohlii CYP76AH enzymes.
Example 6: Further Expression of CYP76AH15 variants in S. cerevisiae
[00211] An S. cerevisiae strain producing miltiradiene and abietatriene, as described in Example 3, was also used to determine the effects of SRS6 variants on formation of ferruginol in S. cerevisiae. The strain further comprised C. forskohlii TPS1 (SEQ ID NO:65), C. forskohlii TPS3 (SEQ ID NO:17), Synechococcus sp. GGPPS (SEQ ID NO:37), and C. forskohlii POR (SEQ ID NO:34). The control strain comprising C. forskohlii POR (SEQ ID NO:34) and no CYPs produced miltiradiene and abietatriene (Figure 10). Expression of CYP76AH15 F476T (SEQ ID NO:54), CYP76AH15 L478M (SEQ ID NO:55), CYP76AH15 L478I (SEQ ID NO:56), or CYP76AH15 L478A (SEQ ID NO:57) resulted in increased production of ferruginol. Expression of CYP76AH15 F476T or CYP76AH15 L478A resulted in a 2.4-fold increase in ferruginol production, relative to the expression of CYP76AH15 (SEQ ID NO:22). See Figure 14.
Example 7: Product Profiles of S. cerevisiae strains expressing CYP76AH15 variants
[00212] Diterpene accumulation of CYP76AH15 variants, shown in Table 6, below, expressed in 13R-MO- or miltiradiene-producing S. cerevisiae strains, as described in Examples 3 and 6, was compared to the diterpene accumulation pattern of corresponding S. cerevisiae strains expressing CYP76AH15 (SEQ ID NO:22).
Table 6. CYP76AH15 variants tested in N. benthamiana and characterization of diterpene product profile.
Figure imgf000060_0001
Example 8: 13R-MO Turnover Characterization
[00213] The production of 13R-MO, 11-oxo-13R-MO, and 11β-hydroxy-13R-MO by S. cerevisiae strains, as described in Example 3, expressing CYP76AH15 or a CYP76AH15 variant, provided in Table 5, and by a corresponding control S. cerevisiae strain lacking CYP76AH15 or a CYP76AH15 variant, was monitored over 72 hours. Expression of CYP76AH15 A99I (SEQ ID NO:42) resulted in a two-fold increase in the total amount of diterpenes accumulated, relative to expression of CYP76AH15 (SEQ ID NO:22), and a 3.7-fold increase in the amount of 11-oxo-13R-MO accumulated (accounting for 99% of the total amount of diterpenes produced). Expression of CYP76AH15 A99I also resulted in a near-depletion of 13R-MO, an 18-fold decrease relative to expression of CYP76AH15. Expression of CYP76AH15 S235G Y236F (SEQ ID NO:48), CYP76AH15 L366F (SEQ ID NO:50), CYP76AH15 L366E (SEQ ID NO:52), CYP76AH15 A99I S235G Y236F (SEQ ID NO:62), or CYP76AH15 A99I L366F (SEQ ID NO:58) also resulted in an amount of accumulated 11-oxo-13R-MO that accounted for more than 93% of the total amount of diterpenes produced. See Table 7 and Figure 15.
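The statement that one product accounts for a given percentage of the total diterpenes can be expressed as a share-of-total calculation over the measured titers. The Python sketch below assumes that the total is the sum of the three monitored compounds; the function name and the titers are hypothetical and do not reproduce Table 7.

```python
def product_shares(titers):
    """Percentage of the summed titer contributed by each compound."""
    total = sum(titers.values())
    if total <= 0:
        raise ValueError("total titer must be positive")
    return {name: 100.0 * value / total for name, value in titers.items()}


# Hypothetical 72-hour titers in arbitrary but consistent units.
example = {
    "13R-MO": 1.0,
    "11-oxo-13R-MO": 18.0,
    "11beta-hydroxy-13R-MO": 1.0,
}
for name, share in product_shares(example).items():
    print(f"{name}: {share:.0f}% of total diterpenes")
```

Reporting shares alongside absolute titers separates a change in product selectivity from a change in overall pathway flux.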
Table 7. Production Titers at 72 Hours.
Figure imgf000061_0001
[00214] Having described the invention in detail and by reference to specific embodiments thereof, it will be apparent that modifications and variations are possible without departing from the scope of the invention defined in the appended claims. More specifically, although some aspects of the present invention are identified herein as particularly advantageous, it is contemplated that the present invention is not necessarily limited to these particular aspects of the invention.
Table 6. Sequences disclosed herein.
SEQ ID NO:1
Coleus forskohlii
atgaaggtgg aaagatttag caggaaactc ataaaaccgc acaccccaac ccccgaaaat 60 ctgaagaagt ataagctttc tcttttagac aaatgcttgg ggcacgataa ttttgctatt 120 gttctgtttt acgaatcgaa accaagaaac aagagtgagt tggaagaatc actggaaaaa 180 gtcctggtgg atttctaccc tcttgctgga agacacacca tgaatgatca catagttgat 240 tgcagtgatg tgggcgccgt gtttgtcgaa gcagaagctc tagacgtcga gctgacgatg 300 gatgagctcg ttaagaacat ggaggcccag actatccatc atctccttcc caatcaatat 360 ttctcagctg atgctccaaa tccactcttg tcaatccagg tgacacattt cccatctggt 420 ggcctagcaa ttggcatcgc cgtttctcac gctgtcttcg atggtttttc gctgggggtg 480 ttcgtcgccg cctggtcgaa ggccaccatg aacccagatc ggaagatcaa aatcacaccg 540 tctttcgatc ttccctcgtt gcttccttac aaggatgaca attttggatt aactgccgcc 600 gaaattgtca gccagagtga agatattgtt gttaagaggt ttattttcgg gaaggaggca 660 ataacgaggt tgagatcaaa gctaagccca aatcgcaatg ggaaaaagat ctctcgtgtt 720 cgagtggtgt gtgccgttat agtgaaagcc ttgatgggcc tggaacgcgc caaacatggc 780 aaaacaagag atttcttgat cactcaatcg attaacatgc gcgagagaac aaaagcacct 840 ctacagaagc atgcttgtgg gaatctggca gtcttgtcgt gcacgcgacg tgtagaagcc 900 gaggagatga tggagctgca gaacttggtt aatctgatcg gagattctac agagaaggac 960 atagctgatt ttgctgaatt gctatctcct gatcaagttg ggcgtgatat catcatcaag 1020 atgatgaagt ctttcatgca attcttggat aatgacattt attctgtatg cttcactgat 1080 tggagtaagt ttgaatttta cgaagctgat tttgggttcg ggaagcctgt ttggatggcc 1140 gctggaccgc aacgcccaat cattagcacc gctattctca tgagcgacag agagggtgat 1200 ggaattgagg catggctgca tctcaacaaa aatgacatgc ttatcttcga gcaagatgaa 1260 gaaatcaaat tatttactac ctaa 1284
SEQ ID NO:2
Coleus forskohlii
atgaaggtgg aaagatttag caggaaactc ataaaaccgc acaccccaac ccccgaaaat 60 ctcaagaatt ataagctttc tcttttagat aaatgcttgg tgcaggataa ttttgctgtt 120 gtgctgtttt acgaatcgaa accaagaaac aagagtgagt tggaagaatc actagaaaaa 180 gtccttgtcg atttctaccc tcttgctgga agatacacca tgaatgatca catagttgat 240 tgcagtgatg agggcgccgt gtttgtcgaa gctgaagctc tggatgctga gctgacgatg 300 gatcagctcg tcaagaatat ggaggcccag actatccatc atctccttcc ggatcaatat 360 ttcccagctg atgctccaat tccactcctc tcaatccagg tcacacattt cccatttggg 420 ggcctggcaa ttgccatcgt cgtttctcac gctgtattcg aaggtttttc actcggggtg 480 ttcgtcgccg cctggtcgaa ggccaccatt aatccagatg tgaagatcga aatcaccccg 540 tctttcgatc ttccctcatt gcttccatac aaggatgacg atttcggatt aactgactgt 600 gaaattatta acctgtgtga ggatattgtt gttaagaggt ttatgtttgg gaaggaggct 660 ataacgaggt tgagatcaag actaagccca aatcacaatg ggaaaacgat ctctcgtgtt 720 cgagtggtgt gtgccgttat agtgaaagcc ttgatgggcc tggaacgcgc caaacatggc 780 aaaacaagag atttcttgat cactcaatcg attaacatgc gcgagagaac aaaagcacct 840 ctacagaagc atgcttgtgg gaatctggca gtcttgtcgt gcacgcgacg tgtagaagcc 900 gaggagatga tggagctgca gaacttggtt aatctgatcg gagaatctac tgagaaggac 960 atagctcatt attctgaatt gctgtctcat aatcaatttg ggcgtgatat catcgtcaac 1020 gtgatgaaat ctctcatgca attcttggat cctgacattt attctgtatg cttcactgat 1080 tggagtaagt ttagattcta cgaagctgat tttgggttcg ggaagcctgt ttggacggcc 1140 gttggaccgc aacgcccaat cattaccacc gctattctca tgaacaacag agagggtgat 1200 ggaattgagg catggctgca tctcaacaaa aatgacatgc ttatcttcga gcaagatgaa 1260 gaaatcaaat tatttactac ctaa 1284
SEQ ID NO:6
Coleus forskohlii
MKVERFSRKL IKPHTPTPEN LKKYKLSLLD KCLGHDNFAI VLFYESKPRN KSELEESLEK 60
VLVDFYPLAG RHTMNDHIVD CSDVGAVFVE AEALDVELTM DELVKNMEAQ TIHHLLPNQY 120
FSADAPNPLL SIQVTHFPSG GLAIGIAVSH AVFDGFSLGV FVAAWSKATM NPDRKIKITP 180
SFDLPSLLPY KDDNFGLTAA EIVSQSEDIV VKRFIFGKEA ITRLRSKLSP NRNGKKISRV 240
RVVCAVIVKA LMGLERAKHG KTRDFLITQS INMRERTKAP LQKHACGNLA VLSCTRRVEA 300
EEMMELQNLV NLIGDSTEKD IADFAELLSP DQVGRDIIIK MMKSFMQFLD NDIYSVCFTD 360
WSKFEFYEAD FGFGKPVWMA AGPQRPIIST AILMSDREGD GIEAWLHLNK NDMLIFEQDE 420
EIKLFTT 427
SEQ ID NO:7
Coleus forskohlii
MKVERFSRKL IKPHTPTPEN LKNYKLSLLD KCLVQDNFAV VLFYESKPRN KSELEESLEK 60
VLVDFYPLAG RYTMNDHIVD CSDEGAVFVE AEALDAELTM DQLVKNMEAQ TIHHLLPDQY 120
FPADAPIPLL SIQVTHFPFG GLAIAIVVSH AVFEGFSLGV FVAAWSKATI NPDVKIEITP 180
SFDLPSLLPY KDDDFGLTDC EIINLCEDIV VKRFMFGKEA ITRLRSRLSP NHNGKTISRV 240
RVVCAVIVKA LMGLERAKHG KTRDFLITQS INMRERTKAP LQKHACGNLA VLSCTRRVEA 300
EEMMELQNLV NLIGESTEKD IAHYSELLSH NQFGRDIIVN VMKSLMQFLD PDIYSVCFTD 360
WSKFRFYEAD FGFGKPVWTA VGPQRPIITT AILMNNREGD GIEAWLHLNK NDMLIFEQDE 420
EIKLFTT 427
SEQ ID NO:11
Artificial sequence
atgaaggtcg aaagattctc cagaaagttg attaagccac atactccaac tccagaaaac 60 ttgaagaagt acaagttgtc cttgttggat aagtgcttgg gtcatgataa tttcgccatc 120 gttttgttct acgaatccaa gccaagaaac aagtccgaat tggaagaatc cttggaaaag 180 gttttggttg acttttatcc attggctggt agatacacca tgaacgatca tatagttgat 240 tgctctgatg ttggtgccgt ttttgttgaa gctgaagctt tggatgttga attgaccatg 300 gatgaattgg tcaagaacat ggaagctcaa accatccatc atttgttgcc aaatcaatac 360 ttctctgctg atgctccaaa tcctttgttg tctattcaag ttacccattt cccatctggt 420 ggtttggcta ttggtattgc tgtttctcat gctgttttcg acggtttttc tttgggtgtt 480 ttcgttgctg cttggtctaa agctactatg aatccagata gaaagatcaa gatcacccca 540 tcttttgact tgccatcttt gttaccatac aaggatgata acttcggttt gactgctgct 600 gaaatcgttt ctcaatctga agatatcgtc gtcaagagat tcatcttcgg taaagaagct 660 atcactagat tgagatccaa gttgtctcca aacagaaacg gtaagaagat ctccagagtt 720 agagttgttt gtgccgttat agttaaggct ttgatgggtt tggaaagagc taaacacggt 780 aagactagag atttcttgat cacccaatcc atcaacatga gagaaagaac aaaagcccca 840 ttgcaaaaac atgcttgtgg taatttggct gttttgtctt gtaccagaag agttgaagcc 900 gaagaaatga tggaattgca aaacttggtt aacttgatcg gtgactctac cgaaaaggat 960 attgctgatt tcgccgaatt attgtcccca gatcaagttg gtagagacat cattatcaag 1020 atgatgaagt ccttcatgca attcttggac aacgacatct actctgtttg tttcactgat 1080 tggtctaagt tcgaattcta cgaagccgat tttggttttg gtaaaccagt ttggatggct 1140 gctggtccac aaagaccaat tatttctact gccatcttga tgtccgatag agaaggtgat 1200 ggtattgaag cttggttgca tttgaacaag aacgacatgt tgatcttcga acaagacgaa 1260 gaaatcaagt tgttcaccac ctga 1284
SEQ ID NO:12
Artificial sequence
atgaaggtcg aaagattctc cagaaagttg attaagccac atactccaac tccagaaaac 60 ttgaagaact acaagttgtc cttgttggat aagtgcttgg tccaagataa tttcgccgtt 120 gttttgttct acgaatccaa gccaagaaac aagtccgaat tggaagaatc cttggaaaag 180 gttttggttg acttttatcc attggctggt agatacacca tgaacgatca tatagttgat 240 tgctctgatg aaggtgccgt ttttgttgaa gctgaagctt tggatgctga attgactatg 300 gatcaattgg tcaagaacat ggaagcccaa accattcatc atttgttgcc agatcaatac 360 tttccagctg atgctccaat tcctttgttg tctattcaag ttacccattt cccatttggt 420 ggtttggcta ttgctatcgt tgtttctcat gctgttttcg acggtttttc tttgggtgtt 480 ttcgttgctg cttggtctaa agctactatt aacccagatg tcaagatcga aattacccca 540 tcttttgact tgccatcctt gttgccatac aaggacgatg attttggttt gaccgattgc 600 gaaatcatca acttgtgtga agatatcgtc gtcaagagat tcatgttcgg taaagaagct 660 atcaccagat tgagatctag attgtctcca aaccataacg gtaagaccat ctctagagtt 720 agagttgttt gtgccgttat cgttaaggct ttgatgggtt tggaaagagc taaacacggt 780 aaaaccagag atttcttgat cacccaatcc atcaacatga gagaaagaac aaaagcccca 840 ttgcaaaaac atgcttgtgg taatttggct gttttgtctt gtaccagaag agttgaagcc 900 gaagaaatga tggaattgca aaacttggtt aacttgatcg gtgaatccac cgaaaaggat 960 attgctcact actccgaatt attgtcccac aatcaattcg gtagagacat catcgttaac 1020 gtcatgaagt ctttgatgca attcttggat ccagacatct actctgtttg tttcactgat 1080 tggtctaagt tcagattcta cgaagctgat ttcggttttg gtaaaccagt ttggactgct 1140 gttggtccac aaagaccaat tattactacc gccattttga tgaacaacag agaaggtgat 1200 ggtattgaag cttggttgca tttgaacaag aacgacatgt tgatcttcga acaagacgaa 1260 gaaatcaagt tgttcaccac ctga 1284
SEQ ID NO:16
Coleus forskohlii
MKMLMIKSQF RVHSIVSAWA NNSNKRQSLG HQIRRKQRSQ VTECRVASLD ALNGIQKVGP 60
ATIGTPEEEN KKIEDSIEYV KELLKTMGDG RISVSPYDTA IVALIKDLEG GDGPEFPSCL 120
EWIAQNQLAD GSWGDHFFCI YDRVVNTAAC VVALKSWNVH ADKIEKGAVY LKENVHKLKD 180
GKIEHMPAGF EFVVPATLER AKALGIKGLP YDDPFIREIY SAKQTRLTKI PKGMIYESPT 240
SLLYSLDGLE GLEWDKILKL QSADGSFITS VSSTAFVFMH TNDLKCHAFI KNALTNCNGG 300
VPHTYPVDIF ARLWAVDRLQ RLGISRFFEP EIKYLMDHIN NVWREKGVFS SRHSQFADID 360
DTSMGIRLLK MHGYNVNPNA LEHFKQKDGK FTCYADQHIE SPSPMYNLYR AAQLRFPGEE 420
ILQQALQFAY NFLHENLASN HFQEKWVISD HLIDEVRIGL KMPWYATLPR VEASYYLQHY 480
GGSSDVWIGK TLYRMPEISN DTYKILAQLD FNKCQAQHQL EWMSMKEWYQ SNNVKEFGIS 540
KKELLLAYFL AAATMFEPER TQERIMWAKT QVVSRMITSF LNKENTMSFD LKIALLTQPQ 600
HQINGSEMKN GLAQTLPAAF RQLLKEFDKY TRHQLRNTWN KWLMKLKQGD DNGGADAELL 660
ANTLNICAGH NEDILSHYEY TALSSLTNKI CQRLSQIQDK KMLEIEEGSI KDKEMELEIQ 720
TLVKLVLQET SGGIDRNIKQ TFLSVFKTFY YRAYHDAKTI DAHIFQVLFE PVV 773
SEQ ID NO:17
Coleus forskohlii
MSSLAGNLRV IPFSGNRVQT RTGILPVHQT PMITSKSSAA VKCSLTTPTD LMGKIKEVFN 60
REVDTSPAAM TTHSTDIPSN LCIIDTLQRL GIDQYFQSEI DAVLHDTYRL WQLKKKDIFS 120
DITTHAMAFR LLRVKGYEVA SDELAPYADQ ERINLQTIDV PTVVELYRAA QERLTEEDST 180
LEKLYVWTSA FLKQQLLTDA IPDKKLHKQV EYYLKNYHGI LDRMGVRRNL DLYDISHYKS 240
LKAAHRFYNL SNEDILAFAR QDFNISQAQH QKELQQLQRW YADCRLDTLK FGRDVVRIGN 300
FLTSAMIGDP ELSDLRLAFA KHIVLVTRID DFFDHGGPKE ESYEILELVK EWKEKPAGEY 360
VSEEVEILFT AVYNTVNELA EMAHIEQGRS VKDLLVKLWV EILSVFRIEL DTWTNDTALT 420
LEEYLSQSWV SIGCRICILI SMQFQGVKLS DEMLQSEECT DLCRYVSMVD RLLNDVQTFE 480
KERKENTGNS VSLLQAAHKD ERVINEEEAC IKVKELAEYN RRKLMQIVYK TGTIFPRKCK 540
DLFLKACRIG CYLYSSGDEF TSPQQMMEDM KSLVYEPLPI SPPEANNASG EKMSCVSN 598
SEQ ID NO:18
Coleus forskohlii
MSITINLRVI AFPGHGVQSR QGIFAVMEFP RNKNTFKSSF AVKCSLSTPT DLMGKIKEKL 60
SEKVDNSVAA MATDSADMPT NLCIVDSLQR LGVEKYFQSE IDTVLDDAYR LWQLKQKDIF 120
SDITTHAMAF RLLRVKGYDV SSEELAPYAD QEGMNLQTID LAAVIELYRA AQERVAEEDS 180
TLEKLYVWTS TFLKQQLLAG AIPDQKLHKQ VEYYLKNYHG ILDRMGVRKG LDLYDAGYYK 240
ALKAADRLVD LCNEDLLAFA RQDFNINQAQ HRKELEQLQR WYADCRLDKL EFGRDWRVS 300
NFLTSAILGD PELSEVRLVF AKHIVLVTRI DDFFDHGGPR EESHKILELI KEWKEKPAGE 360
YVSKEVEILY TAVYNTVNEL AERANVEQGR NVEPFLRTLW VQILSIFKIE LDTWSDDTAL 420
TLDDYLNNSW VSIGCRICIL MSMQFIGMKL PEEMLLSEEC VDLCRHVSMV DRLLNDVQTF 480
EKERKENTGN AVSLLLAAHK GERAFSEEEA IAKAKYLADC NRRSLMQIVY KTGTIFPRKC 540
KDMFLKVCRI GCYLYASGDE FTSPQQMMED MKSLVYEPLQ IHPPPAN 587
SEQ ID NO:19
Coleus forskohlii
MELVEVIVW VGAAALGWL WSHLKPEGRK LPPGPSPLPI FGNIFQLTGP NTCESFANLS 60
KKYGPVMSLR LGSLFTWIS SPEMAKEVLT NTDFLERPLM QAVHAHDHAQ FSIAFLPVTT 120
PKWKQLRRIC QEQMFASRIL EKSQPLRHQK LQELIDHVQK CCDAGRAVTI RDAAFATTLN 180
LMSVTMFSAD ATELDSSVTA ELRELMAGW TVLGTPNFAD FFPILKYLDP QGVRRKAHFH 240
YGKMFDHIKS RMAERVELKK ANPNHLKHDD FLEKILDISL RRDYELTIQD ITHLLVDLYV 300
AGSESTVMSI EWIMSELMLH PQSLAKLKAE LRSVMGERKM IQESEDISRL PFLNAVIKET 360
LRLHPPGPLL FPRQNTNDVE LNGYFIPKGT QILVNEWAIG RDPSVWPNPE SFVPERFLDK 420
NIDYKGQDPQ LVPFGSGRRI CLGIPIAHRM VHSTVAALIH NFEWKFAPDG SEYNRELFSG 480
PALRREVPLN LIPLNPSF 498
SEQ ID NO:20
Coleus forskohlii
METITLLLAL FFIALTYFIS SRRRRNLPPG PFPLPIIGNM LQLGSKPHQS FAQLSKKYGP 60
LMSIHLGSLY TVIVSSPEMA KEILQKHGQV FSGRTIAQAV HACDHDKISM GFLPVANTWR 120
DMRKICKEQM FSHHSLEASE ELRHQKLQQL LDYAQKCCEA GRAVDIREAS FITTLNLMSA 180
TMFSTQATEF DSEATKEFKE IIEGVATIVG VANFADYFPI LKPFDLQGIK RRADGYFGRL 240
LKLIEGYLNE RLESRRLNPD APRKKDFLET LVDIIEANEY KLTTEHLTHL MLDLFVGGSE 300
TNTTSLEWIM SELVINPDKM AKVKEELKSV VGDEKLVNES DMPRLPYLQA VIKEVLRIHP 360
PGPLLLPRKA ESDQWNGYL IPKGTQILFN AWAMGRDPTI WKDPESFEPE RFLNQSIDFK 420
GQDFELIPFG SGRRICPGMP LANRILHMTT ATLVHNFDWK LEEGTADADH KGELFGLAVR 480
RATPLRIIPL KP 492
SEQ ID NO:21
Coleus forskohlii
MELVQVIAW AWWLWSQL KRKGRKLPPG PSPLPIVGNI FQLSGKNINE SFAKLSKIYG 60
PVMSLRLGSL LTVIISSPEM AKEVLTSKDF ANRPLTEAAH AHGHSKFSVG FVPVSDPKWK 120
QMRRVCQEEM FASRILENSQ QRRHQKLQEL IDHVQESRDA GRAVTIRDPV FATTLNIMSL 180
TLFSADATEF SSSATAELRD IMAGWSVLG AANLADFFPI LKYFDPQGMR RKADLHYGRL 240
IDHIKSRMDK RSELKKANPN HPKHDDFLEK IIDITIQRNY DLTINEITHL LVDLYLAGSE 300
STVMTIEWTM AELMLRPESL AKLKAELRSV MGERKMIQES DDISRLPYLN GAIKEALRLH 360
PPGPLLFARK SEIDVELSGY FIPKGTQILV NEWGMGRDPS VWPNPECFQP ERFLDKNIDY 420
KGQDPQLIPF GAGRRICPGI PIAHRVVHSV VAALVHNFDW EFAPGGSQCN NEFFTGAALV 480
REVPLKLIPL NPPSI 495
SEQ ID NO:22
Coleus forskohlii
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL IPKGTQLLFN VWAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:23
Coleus forskohlii
MESMNALVVG LLLIALTILF SLRRRRNLAP GPYPFPIIGN MLQLGTKPHQ SFAQLSKKYG 60
PLMSIHLGSL YTVIVSSPEM AKEILQKHGQ VFSGRTIAQA VHACDHDKIS MGFLPVSNTW 120
RDMRKICKEQ MFSHHSLEGS QGLRQQKLLQ LLDYAQKCCE TGRAVDIREA SFITTLNLMS 180
ATMFSTQATE FESKSTQEFK EIIEGVATIV GVANFGDYFP ILKPFDLQGI KRKADGYFGR 240
LLKLIEGYLN ERLESRKSNP NAPRKNDFLE TVVDILEANE YKLSVDHLTH LMLDLFVGGS 300
ETNTTSLEWT MSELVNNPDK MAKLKQELKS VVGERKLVDE SEMPRLPYLQ AVIKESLRIH 360
PPGPLLLPRK AETDQEVNGY LIPKGTQILF NVWAMGRDPS IWKDPESFEP ERFLNQNIDF 420
KGQDFELIPF GSGRRICPGM PLANRILHMA TATMVHNFDW KLEQGTDEAD AKGELFGLAV 480
RRAVPLRIIP LQP 493
SEQ ID NO:26
Coleus forskohlii
MKVERISRKF IKPYTPTPQN LKKYKLSLLD KCMGHMDFAV VLFYESKPRN KNELEESLEK 60
VLVDFYPLAG RYTMNDHIVD CSDEGAVFVE AEAPNVELTV DQLVKNMEAQ TIHDFLPDQY 120
FPADAPNPLL SIQVTHFPCG GLAIGIVVSH AVFDGFSLGV FLAAWSKATM NPERKIEITP 180
SFDLPSLLPY KDESFGLNFS EIVKAENIVV KRLNFGKEAI TRLRSKLSPN QNGKTISRVR 240
VVCAVIVKAL MGLERAKTRD FMICQGINMR ERTKAPLQKH ACGNLAVSSY TRRVAAAEAE 300
ELQSLVNLIG DSIEKSIADY ADILSSDQDG RHIISTMMKS FMQFAAPDIK AISFTDWSKF 360
GFYQVDFGFG KPVWTGVRPE RPIFSAAILM SNREGDGIEA WLHLDKNDML IFEQDEEIKL 420
LITT 424
SEQ ID NO:27
Coleus forskohlii
atgaaggttg aaagaataag caggaaattc ataaaaccat acaccccaac cccccaaaac 60 ctcaagaaat ataagctttc tcttttagac aaatgcatgg ggcacatgga ttttgctgtt 120 gttctgtttt atgaatcgaa accaagaaac aagaatgaat tggaagaatc actagaaaaa 180 gtcctggtcg atttctaccc tcttgctgga agatacacca tgaatgatca catagttgat 240 tgcagtgatg agggcgccgt gtttgtcgaa gccgaagctc caaacgtcga gctgacggtg 300 gatcaactcg tcaagaacat ggaggcccag actatccatg atttccttcc cgatcaatat 360 ttcccagctg atgctccaaa tccactcctc tcgatccagg tcacgcattt cccatgtggt 420 ggcttggcga ttggcatcgt tgtttctcac gctgtcttcg atggtttttc gctgggggtg 480 ttccttgccg cctggtcgaa ggccaccatg aacccagaga ggaagatcga aatcaccccg 540 tctttcgatc ttccttcgtt gcttccatac aaggatgaaa gtttcggatt aaatttcagc 600 gaaattgtca aggctgaaaa tattgttgtg aagaggctta atttcgggaa ggaggctata 660 acgaggttga gatcaaaact aagcccaaat caaaatggga aaacgatctc tcgcgttcga 720 gttgtttgcg ccgttatagt gaaagccttg atgggcctgg aacgcgccaa aacaagagat 780 ttcatgatct gtcaagggat caacatgcgt gagagaacaa aggcacctct acagaagcat 840 gcttgtggga atctggcggt ctcgtcttac acgcgacgtg tagccgcagc agaagccgag 900 gagctgcaga gtttggtaaa tctaataggt gattctattg agaagagcat agctgattat 960 gctgatattc tttcttctga tcaagatggg cgtcatatca tcagcacgat gatgaaatct 1020 ttcatgcaat tcgcggctcc tgacataaaa gctatatcct tcactgattg gagtaagttt 1080 ggattctacc aagttgattt tgggttcggg aagcctgttt ggacgggtgt tcggcccgaa 1140 cgcccaatct ttagcgccgc tattctcatg agcaacagag agggtgatgg aattgaggca 1200 tggcttcatc ttgacaaaaa tgacatgctt atcttcgagc aagatgaaga aatcaaatta 1260 ttaattacta cctaa 1275
SEQ ID NO:28
Artificial Sequence
atgaaggtcg aaagaatctc cagaaagttc attaagccat acactccaac tccacaaaac 60 ttgaagaagt acaagttgtc cttgttggat aagtgcatgg gtcatatgga tttcgctgtt 120 gttttgttct acgaatccaa gccaagaaac aagaacgaat tggaagaatc cttggaaaag 180 gttttggttg acttttatcc attggctggt agatacacca tgaacgatca tatagttgat 240 tgctctgatg aaggtgccgt ttttgttgaa gctgaagctc caaatgttga attgaccgtt 300 gatcaattgg tcaagaacat ggaagctcaa accatccatg atttcttgcc agatcaatac 360 tttccagctg atgctcctaa tcctttgttg tctattcaag ttacccattt cccatgtggt 420 ggtttggcta ttggtatagt tgtttctcat gctgttttcg acggtttctc tttgggtgtt 480 tttttggctg cttggtctaa ggctactatg aatccagaaa gaaagatcga aatcacccca 540 tcttttgact tgccatcttt gttgccttac aaggatgaat ctttcggttt gaacttctcc 600 gaaatcgtta aggctgaaaa catcgttgtc aagagattga acttcggtaa agaagccatt 660 accagattga gatctaagtt gtccccaaat caaaacggta agaccatctc tagagttaga 720 gttgtttgtg ccgttattgt caaggctttg atgggtttgg aaagagctaa gactagagat 780 ttcatgatct gccaaggtat caacatgaga gaaagaacaa aagccccatt gcaaaaacat 840 gcttgtggta atttggccgt ttcttcatac actagaagag ttgctgctgc tgaagcagaa 900 gaattgcaat ctttggttaa cttgatcggt gactccatcg aaaagtctat tgctgattac 960 gccgatatct tgtcctctga tcaagatggt agacatatca tctccaccat gatgaagtct 1020 ttcatgcaat ttgctgcccc agatattaag gctatttctt tcactgactg gtccaagttt 1080 ggtttctacc aagttgattt tggtttcggt aaaccagttt ggactggtgt tagaccagaa 1140 agaccaattt tttccgctgc cattttgatg tctaacagag aaggtgatgg tattgaagct 1200 tggttgcatt tggataagaa cgacatgttg atcttcgaac aagacgaaga aatcaagttg 1260 ttgatcacca cctga 1275
SEQ ID NO:29
Coleus forskohlii
atggcgtctt gtggagctat cgggagtagt ttcttgccac tgctccattc cgacgagtca 60 agcttgttat ctcggcccac tgctgctctt cacatcaaga agcagaagtt ttctgtggga 120 gctgctctgt accaggataa cacgaacgat gtcgttccga gtggagaggg tctgacgagg 180 cagaaaccaa gaactctgag tttcacggga gagaagcctt caactccaat tttggatacc 240 atcaactatc caatccacat gaagaatctg tccgtggagg aactggagat attggccgat 300 gaactgaggg aggagatagt ttacacggtg tcgaaaacgg gagggcattt gagctcaagc 360 ttgggtgtat cagagctcac cgttgcactg catcatgtat tcaacacacc cgatgacaaa 420 atcatctggg atgttggaca tcaggcgtat ccacacaaaa tcttgacagg gaggaggtcc 480 agaatgcaca ccatccgaca gactttcggg cttgcagggt tccccaagag ggatgagagc 540 ccgcacgacg cgttcggagc tggtcacagc tccactagta tttcagctgg tctagggatg 600 gcggtgggga gggacttgct acagaagaac aaccacgtga tctcggtgat cggagacgga 660 gccatgacag cggggcaggc atacgaggcc atgaacaatg caggatttct tgattccaat 720 ctgatcatcg tgttgaacga caacaaacaa gtgtccctgc ctacagccac cgtcgacggc 780 cctgctcctc ccgtcggagc cttgagcaaa gccctcacca agctgcaagc aagcaggaag 840 ttccggcagc tacgagaagc agcaaaaggc atgactaagc agatgggaaa ccaagcacac 900 gaaattgcat ccaaggtaga cacttacgtt aaaggaatga tggggaaacc aggcgcctcc 960 ctcttcgagg agctcgggat ttattacatc ggccctgtag atggacataa catcgaagat 1020 cttgtctata ttttcaagaa agttaaggag atgcctgcgc ccggccctgt tcttattcac 1080 atcatcaccg agaagggcaa aggctaccct ccagctgaag ttgctgctga caaaatgcat 1140 ggtgtggtga agtttgatcc aacaacgggg aaacagatga aggtgaaaac gaagactcaa 1200 tcatacaccc aatacttcgc ggagtctctg gttgcagaag cagagcagga cgagaaagtg 1260 gtggcgatcc acgcggcgat gggaggcgga acggggctga acatcttcca gaaacggttt 1320 cccgaccgat gtttcgatgt cgggatagcc gagcagcatg cagtcacctt cgccgcgggt 1380 cttgcaacgg aaggcctcaa gcccttctgc acaatctact cttccttcct gcagcgaggt 1440 tatgatcagg tggtgcacga tgtggatctt cagaaactcc cggtgagatt catgatggac 1500 agagctggac ttgtgggagc tgacggccca acccattgcg gcgccttcga caccacctac 1560 atggcctgcc tgcccaacat ggtcgtcatg gctccctccg atgaggctga gctcatgcac 1620 atggtcgcca ctgccgctgt cattgatgat cgccctagct gcgttaggta ccctagagga 1680 aacggtatag gggtgcccct 
ccctccaaac aataaaggaa ttccattaga ggttgggaag 1740 ggaaggattt tgaaagaggg taaccgagtt gccattctag gcttcggaac tatcgtgcaa 1800 aactgtctag cagcagccca acttcttcaa gaacacggca tatccgtgag cgtagccgat 1860 gcgagattct gcaagcctct ggatggagat ctgatcaaga atcttgtgaa ggagcacgaa 1920 gttctcatca ctgtggaaga gggatccatt ggaggattca gtgcacatgt ctctcatttc 1980 ttgtccctca atggactcct cgacggcaat cttaagtgga ggcctatggt gctcccagat 2040 aggtacattg atcatggagc ataccctgat cagattgagg aagcagggct gagctcaaag 2100 catattgcag gaactgtttt gtcacttatt ggtggaggga aagacagtct tcatttgatc 2160 aacatgtaa 2169
SEQ ID NO:30
Coleus forskohlii
MASCGAIGSS FLPLLHSDES SLLSRPTAAL HIKKQKFSVG AALYQDNTND VVPSGEGLTR 60
QKPRTLSFTG EKPSTPILDT INYPIHMKNL SVEELEILAD ELREEIVYTV SKTGGHLSSS 120
LGVSELTVAL HHVFNTPDDK IIWDVGHQAY PHKILTGRRS RMHTIRQTFG LAGFPKRDES 180
PHDAFGAGHS STSISAGLGM AVGRDLLQKN NHVISVIGDG AMTAGQAYEA MNNAGFLDSN 240
LIIVLNDNKQ VSLPTATVDG PAPPVGALSK ALTKLQASRK FRQLREAAKG MTKQMGNQAH 300
EIASKVDTYV KGMMGKPGAS LFEELGIYYI GPVDGHNIED LVYIFKKVKE MPAPGPVLIH 360
IITEKGKGYP PAEVAADKMH GVVKFDPTTG KQMKVKTKTQ SYTQYFAESL VAEAEQDEKV 420
VAIHAAMGGG TGLNIFQKRF PDRCFDVGIA EQHAVTFAAG LATEGLKPFC TIYSSFLQRG 480
YDQVVHDVDL QKLPVRFMMD RAGLVGADGP THCGAFDTTY MACLPNMVVM APSDEAELMH 540
MVATAAVIDD RPSCVRYPRG NGIGVPLPPN NKGIPLEVGK GRILKEGNRV AILGFGTIVQ 600
NCLAAAQLLQ EHGISVSVAD ARFCKPLDGD LIKNLVKEHE VLITVEEGSI GGFSAHVSHF 660
LSLNGLLDGN LKWRPMVLPD RYIDHGAYPD QIEEAGLSSK HIAGTVLSLI GGGKDSLHLI 720
NM 722
SEQ ID NO:31
Coleus forskohlii
atgaggtcta tgaatctggt cgatgcttgg gttcaaaacc tccccatttt caagcaacca 60 cacccctcca aattcatcca ccatcccaga ttcgagcccg ctttcctcaa atcgcggagg 120 cccatttcct ccttcgccgt ctccgccgtc ctcaccggcg aggaagcaag aatcttcacc 180 cgaggagatg aagcgccctt caatttcaac gcctacgtcg tcgagaaagc cacccacgtg 240 aacaaggctc tcgacgacgc ggtggcggtg aagaaccctc cgatgatcca cgaggccatg 300 aggtactcct tgctcgccgg cggaaagagg gtccgcccca tgctctgcat cgccgcctgc 360 gaggtggtgg gcggccccca agcggcggcg atccccgccg cctgcgcggt ggagatgatc 420 cacaccatgt ctctcatcca cgatgatctt ccctgtatgg acaatgatga cctccgccgc 480 ggcaagccca ccaatcacaa agtcttcggc gagaacgtcg ccgtgctcgc cggtgatgct 540 ttattggcct tcgcgtttga attcatcgcc actgccacca cgggggtggc ccctgagagg 600 attcttgcgg cggtggcgga gttggcgaag gcgatcggga cggaggggct ggtggcgggg 660 caggtggtgg atttgcattg caccggcaat cccaatgtag gactggacac attggaattc 720 atacacatac acaaaactgc agcattgctt gaggcctctg tagttttggg ggccattttg 780 ggaggaggaa gcagtgatca agttgagaaa ctgagaactt ttgctagaaa aattgggctt 840 ctcttccaag tggtggatga cattttagat gtcacaaaat cctcggagga gttggggaag 900 acggccggca aagacttggc cgtcgacaag accacctacc caaagcttct gggattggag 960 aaagctatgg agtttgctga gaggctgaat gaggaggcca agcagcagct gctggatttt 1020 gacccccgga aggcggcgcc gctggtggcg ctggccgatt acattgctca caggcagaac 1080 tag 1083
SEQ ID NO:32
Coleus forskohlii
MRSMNLVDAW VQNLPIFKQP HPSKFIHHPR FEPAFLKSRR PISSFAVSAV LTGEEARIFT 60
RGDEAPFNFN AYVVEKATHV NKALDDAVAV KNPPMIHEAM RYSLLAGGKR VRPMLCIAAC 120
EVVGGPQAAA IPAACAVEMI HTMSLIHDDL PCMDNDDLRR GKPTNHKVFG ENVAVLAGDA 180
LLAFAFEFIA TATTGVAPER ILAAVAELAK AIGTEGLVAG QVVDLHCTGN PNVGLDTLEF 240
IHIHKTAALL EASVVLGAIL GGGSSDQVEK LRTFARKIGL LFQVVDDILD VTKSSEELGK 300
TAGKDLAVDK TTYPKLLGLE KAMEFAERLN EEAKQQLLDF DPRKAAPLVA LADYIAHRQN 360
SEQ ID NO:33
Coleus forskohlii
atggaatcga ctattgagaa gctttcgccc ttcgatttga tgactgcgat tctcaaagga 60 gtcaaacttg ataattcgaa cgggtctgct ggggtggagc atccggctgt gatcgcgatg 120 ctgatggaga acaaggatct cgtgatgatg ctcaccacct ccgtcgcggt gcttctagga 180 cttgctgtgt atctcgtgtg gcggcgcgga gccggatcgg cgaagagggt ggtggagccg 240 ccgaagctgg tgattcccaa gggcccggtg gatgcggagg aagaggatga tgggaagaag 300 aaggttacca tcttttttgg gacgcagact ggaactgctg aaggctttgc taaggcactt 360 gccgaagaag ctaaagcaag atatccgctg accaacttta aagtagttga cttggatgat 420 tatgctgccg atgatgaaga gtatgaagag aagatgaaga aggagacctt tgcattcttc 480 ttcttggcga catatggaga tggtgagcct accgacaatg ctgcgagatt ttacaagtgg 540 ttttccgagg ggaaagagag aggtgagata ttcaagaatc tcaactatgg tgtatttggt 600 cttggaaaca ggcagtatga gcatttcaac aagattgcta tagtggtgga tgacattctt 660 cttgagcaag gtggaaatcg gcttgtccct gtgggtcttg gagatgacga tcaatgtatc 720 gaagatgatt tctcagcatg gcgtgataat gtgtggcctg agctggataa gttgctccgt 780 gatgaggatg atgcaactgt tgcaactcca tatactgcag ccgttttgga gtatcgtgtt 840 gtgttccatg accagtcaga tgaactgcac tcggaaaaca acttagccaa tggtcatgca 900 aatggaaatg cttcttatga tgctcaacac ccctgcaaag tgaatgttgc tgtaaaaagg 960 gagctacata ctcctctatc cgatcgttct tgcactcact tggaattcga catatctggc 1020 actggattag agtatgaaac aggggaccac gttggtgttt actgtgagaa cttgattgaa 1080 actgtagagg aagcagaaag gcttcttggt ctttctccac aaacattctt ttcagttcac 1140 actgataaag cggacggcac accacttggt ggaagtgcct tgcctcctcc cttcccgccg 1200 tgcactttga ggacagcgct aagtcgatat gctgatcttt tgaatgctcc caaaaagtct 1260 gctttgactg cattggctgc ttatgcctct gaccctagtg aagctgatcg gctcaagcac 1320 cttgcttccc ctgatggaaa ggaggaatat gctcaatatg tggtttctgg tcagagaagc 1380 ctacttgagg tgatggctga cttcccatct gccaagcctc ctcttggtgt tttctttgct 1440 gcaattgctc ctcgcttgca gcctcgattt tattcaatct catcctcacc aaagattgca 1500 ccttcaagaa ttcacgtcac ttgtgcgttg gtgtatgaga aaatgcccac tggacgaatc 1560 cacaagggtg tctgctcaac atggatgaag aatgctgtgc cattggagga aagccccaac 1620 tgctcttcag caccagtttt tgtacggacc tcaaacttca gactccctgc tgatcctaaa 1680 gtaccagtta taatgattgg 
ccctggaacc ggtttggctc cattcagggg ttttcttcag 1740 gaaagattag ccctcaagga atctggagca gaacttggtc ctgctatatt attcttcggg 1800 tgcagaaaca gtaaaatgga tttcatttac caagatgaac tggataactt tgttaaagct 1860 ggagtggttt ctgagcttgt ccttgcgttt tcacgcgagg gtcctgctaa ggaatacgtg 1920 cagcataaga tggcacagaa ggcctcggat gtgtggaata tgatatcaga agggggctac 1980 gtttatgtat gtggtgatgc taagggcatg gcacgtgacg ttcaccggac tcttcacacc 2040 attgttcaag aacagggatc tctggacagc tcgaaaaccg agagcttcgt caagaatctg 2100 cagatgaccg gccggtacct gcgtgacgtg tggtga 2136
SEQ ID NO:34
Coleus forskohlii
MESTIEKLSP FDLMTAILKG VKLDNSNGSA GVEHPAVIAM LMENKDLVMM LTTSVAVLLG 60
LAVYLVWRRG AGSAKRVVEP PKLVIPKGPV DAEEEDDGKK KVTIFFGTQT GTAEGFAKAL 120
AEEAKARYPL TNFKVVDLDD YAADDEEYEE KMKKETFAFF FLATYGDGEP TDNAARFYKW 180
FSEGKERGEI FKNLNYGVFG LGNRQYEHFN KIAIVVDDIL LEQGGNRLVP VGLGDDDQCI 240
EDDFSAWRDN VWPELDKLLR DEDDATVATP YTAAVLEYRV VFHDQSDELH SENNLANGHA 300
NGNASYDAQH PCKVNVAVKR ELHTPLSDRS CTHLEFDISG TGLEYETGDH VGVYCENLIE 360
TVEEAERLLG LSPQTFFSVH TDKADGTPLG GSALPPPFPP CTLRTALSRY ADLLNAPKKS 420
ALTALAAYAS DPSEADRLKH LASPDGKEEY AQYVVSGQRS LLEVMADFPS AKPPLGVFFA 480
AIAPRLQPRF YSISSSPKIA PSRIHVTCAL VYEKMPTGRI HKGVCSTWMK NAVPLEESPN 540
CSSAPVFVRT SNFRLPADPK VPVIMIGPGT GLAPFRGFLQ ERLALKESGA ELGPAILFFG 600
CRNSKMDFIY QDELDNFVKA GVVSELVLAF SREGPAKEYV QHKMAQKASD VWNMISEGGY 660
VYVCGDAKGM ARDVHRTLHT IVQEQGSLDS SKTESFVKNL QMTGRYLRDV W 711
SEQ ID NO:35
Artificial Sequence
atgtccagag ttgcttcctt ggatgctttg aatggtattc aaaaagttgg tccagctacc 60 attggtactc cagaagaaga aaacaagaag atcgaagatt ccatcgaata cgtcaaagaa 120 ttattgaaaa ccatgggtga cggtagaatc tctgtttctc catatgatac tgctatcgtc 180 gccttgatta aggatttgga aggtggtgat ggtccagaat ttccatcttg tttggaatgg 240 attgcccaaa atcaattggc tgatggttct tggggtgatc attttttctg tatctacgat 300 agagttgtta acaccgctgc ttgtgttgtt gctttgaaat cttggaatgt tcacgccgat 360 aagattgaaa aaggtgccgt ttacttgaaa gaaaacgtcc acaaattgaa ggacggtaag 420 atagaacata tgccagctgg ttttgaattc gttgttccag caactttgga aagagctaaa 480 gctttgggta ttaagggttt gccatatgat gatccattca tcagagaaat ctactccgct 540 aagcaaacta gattgactaa gattccaaag ggtatgatct acgaatctcc aacctctttg 600 ttgtactctt tggatggttt agaaggtttg gaatgggata agatcttgaa gttgcaatca 660 gctgacggtt ctttcatcac ttctgtttct tctactgcct tcgttttcat gcataccaac 720 gatttgaagt gccatgcctt tattaagaac gctttgacta actgtaatgg tggtgttcca 780 catacttacc cagttgatat ttttgctaga ttgtgggccg ttgacagatt gcaaagattg 840 ggtatttcta gattcttcga accagaaatc aaatacttga tggaccacat caacaacgtt 900 tggagagaaa agggtgtttt ctcatccaga cattctcaat tcgccgatat tgatgatacc 960 tccatgggta tcagattatt gaagatgcat ggttacaacg ttaacccaaa cgctttggaa 1020 catttcaagc aaaaggatgg taaattcacc tgttacgccg atcaacatat tgaatctcca 1080 tctccaatgt ataacttgta cagagctgcc caattgagat ttccaggtga agaaatttta 1140 caacaagcct tgcaattcgc ctacaacttc ttgcacgaaa atttggcttc taaccacttc 1200 caagaaaagt gggttatctc cgatcatttg atcgatgaag ttagaatcgg tttgaaaatg 1260 ccatggtatg ctactttgcc aagagttgaa gcttcttact acttgcaaca ttacggtggt 1320 tcttccgatg tttggattgg taaaaccttg tatagaatgc cagaaatctc taacgacacc 1380 tacaagattt tggctcaatt ggatttcaac aagtgccaag ctcaacatca attagaatgg 1440 atgtctatga aggaatggta tcaatccaac aacgtaaaag aattcggtat ctccaagaaa 1500 gaattgttgt tggcttactt tttggctgct gctactatgt ttgaacctga aagaactcaa 1560 gaaagaatca tgtgggctaa gacccaagtt gtttctagaa tgattacctc attcttgaac 1620 aaagaaaaca ctatgtcctt cgacttgaag attgctttgt tgactcaacc acaacaccaa 1680 atcaatggtt ccgaaatgaa 
gaatggtttg gcacaaactt taccagctgc cttcagacaa 1740 ttattgaaag aattcgacaa gtacaccaga caccaattga gaaatacttg gaacaagtgg 1800 ttgatgaagt tgaagcaagg tgatgataac ggtggtgctg atgctgaatt attggctaac 1860 actttgaaca tttgcgccgg tcataacgaa gatattttgt cccattacga atacaccgcc 1920 ttgtcatctt tgaccaacaa gatttgtcaa agattgtccc aaatccaaga taagaagatg 1980 ttggaaatcg aagaaggttc catcaaggac aaagaaatgg aattggaaat tcaaaccttg 2040 gtcaagttgg tattgcaaga aacttctggt ggtatcgaca gaaacatcaa gcaaactttc 2100 ttgtccgttt tcaagacctt ctactacaga gcttaccatg atgctaagac cattgatgcc 2160 catatcttcc aagttttgtt cgaacctgtt gtttaa 2196
SEQ ID NO:36
Artificial Sequence
atgatcacct ccaaatcttc cgctgctgtt aagtgttctt tgactactcc aactgatttg 60 atgggtaaga tcaaagaagt tttcaacaga gaagttgata cctctccagc tgctatgact 120 actcattcta ctgatattcc atccaacttg tgcatcatcg ataccttgca aagattgggt 180 atcgaccaat acttccaatc cgaaattgat gctgtcttgc atgatactta cagattgtgg 240 caattgaaga agaaggacat cttctctgat attaccactc atgctatggc cttcagatta 300 ttgagagtta agggttacga agttgcctct gatgaattgg ctccatatgc tgatcaagaa 360 agaatcaact tgcaaaccat tgatgttcca accgtcgtcg aattatacag agctgcacaa 420 gaaagattga ccgaagaaga ttctaccttg gaaaagttgt acgtttggac ttctgctttc 480 ttgaagcaac aattattgac cgatgccatc ccagataaga agttgcataa gcaagtcgaa 540 tattacttga agaactacca cggtatcttg gatagaatgg gtgttagaag aaacttggac 600 ttgtacgata tctcccacta caaatctttg aaggctgctc atagattcta caacttgtct 660 aacgaagata ttttggcctt cgccagacaa gatttcaaca tttctcaagc ccaacaccaa 720 aaagaattgc aacaattgca aagatggtac gccgattgca gattggatac tttgaaattc 780 ggtagagatg tcgtcagaat cggtaacttt ttaacctctg ctatgatcgg tgatccagaa 840 ttgtctgatt tgagattggc ttttgctaag cacatcgttt tggttaccag aatcgatgat 900 ttcttcgatc atggtggtcc aaaagaagaa tcctacgaaa ttttggaatt ggtcaaagaa 960 tggaaagaaa agccagctgg tgaatacgtt tctgaagaag tcgaaatctt attcaccgct 1020 gtttacaaca ccgttaacga attggctgaa atggcccata ttgaacaagg tagatctgtt 1080 aaggatttgt tggttaagtt gtgggtcgaa atattgtccg ttttcagaat cgaattggat 1140 acctggacta acgatactgc tttgactttg gaagaatact tgtcccaatc ctgggtttct 1200 attggttgca gaatctgcat tttgatctcc atgcaattcc aaggtgttaa gttgagtgac 1260 gaaatgttgc aaagtgaaga atgtaccgat ttgtgcagat acgtttccat ggtcgataga 1320 ttattgaacg atgtccaaac cttcgaaaaa gaaagaaaag aaaacaccgg taactccgtt 1380 tctttgttgc aagctgctca caaagacgaa agagttatca acgaagaaga agcctgcatc 1440 aaggtaaaag aattagccga atacaataga agaaagttga tgcaaatcgt ctacaagacc 1500 ggtactattt tcccaagaaa atgcaaggac ttgttcttga aggcttgtag aattggttgc 1560 tacttgtact cttctggtga tgaattcact tccccacaac aaatgatgga agatatgaag 1620 tccttggtct atgaaccatt gccaatttct ccacctgaag ctaacaatgc atctggtgaa 1680 aaaatgtcct gcgtcagtaa ctga 
1704
SEQ ID NO:37
Synechococcus sp.
MVAQTFNLDT YLSQRQQQVE EALSAALVPA YPERIYEAMR YSLLAGGKRL RPILCLAACE 60
LAGGSVEQAM PTACALEMIH TMSLIHDDLP AMDNDDFRRG KPTNHKVFGE DIAILAGDAL 120
LAYAFEHIAS QTRGVPPQLV LQVIARIGHA VAATGLVGGQ VVDLESEGKA ISLETLEYIH 180
SHKTGALLEA SVVSGGILAG ADEELLARLS HYARDIGLAF QIVDDILDVT ATSEQLGKTA 240
GKDQAAAKAT YPSLLGLEAS RQKAEELIQS AKEALRPYGS QAEPLLALAD FITRRQH 297
SEQ ID NO:38
Artificial Sequence
atggtcgcac aaactttcaa cctggatacc tacttatccc aaagacaaca acaagttgaa 60 gaggccctaa gtgctgctct tgtgccagct tatcctgaga gaatatacga agctatgaga 120 tactccctcc tggcaggtgg caaaagatta agacctatct tatgtttagc tgcttgcgaa 180 ttggcaggtg gttctgttga acaagccatg ccaactgcgt gtgcacttga aatgatccat 240 acaatgtcac taattcatga tgacctgcca gccatggata acgatgattt cagaagagga 300 aagccaacta atcacaaggt gttcggggaa gatatagcca tcttagcggg tgatgcgctt 360 ttagcttacg cttttgaaca tattgcttct caaacaagag gagtaccacc tcaattggtg 420 ctacaagtta ttgctagaat cggacacgcc gttgctgcaa caggcctcgt tggaggccaa 480 gtcgtagacc ttgaatctga aggtaaagct atttccttag aaacattgga gtatattcac 540 tcacataaga ctggagcctt gctggaagca tcagttgtct caggcggtat tctcgcaggg 600 gcagatgaag agcttttggc cagattgtct cattacgcta gagatatagg cttggctttt 660 caaatcgtcg atgatatcct ggatgttact gctacatctg aacagttggg gaaaaccgct 720 ggtaaagacc aggcagccgc aaaggcaact tatccaagtc tattgggttt agaagcctct 780 agacagaaag cggaagagtt gattcaatct gctaaggaag ccttaagacc ttacggttca 840 caagcagagc cactcctagc gctggcagac ttcatcacac gtcgtcagca ttaa 894
SEQ ID NO:39
Rattus norvegicus
MLDTGLLLW ILASLSVMLL VSLWQQKIRG RLPPGPTPLP FIGNYLQLNT KDVYSSITQL 60
SERYGPVFTI HLGPRRVWL YGYDAVKEAL VDQAEEFSGR GEQATYNTLF KGYGVAFSSG 120
ERAKQLRRLS IATLRDFGVG KRGVEERILE EAGYLIKMLQ GTCGAPIDPT IYLSKTVSNV 180
ISSIVFGERF DYEDTEFLSL LQMMGQMNRF AASPTGQLYD MFHSVMKYLP GPQQQIIKVT 240
QKLEDFMIEK VRQNHSTLDP NSPRNFIDSF LIRMQEEKNG NSEFHMKNLV MTTLSLFFAG 300
SETVSSTLRY GFLLLMKHPD VEAKVHEEIE QVIGRNRQPQ YEDHMKMPYT QAVINEIQRF 360
SNLAPLGIPR RIIKNTTFRG FFLPKATDVF PILGSLMTDP KFFPSPKDFD PQNFLDDKGQ 420
LKKNAAFLPF STGKRFCLGD GLAKMELFLL LTTILQNFRF KFPMKLEDIN ESPKPLGFTR 480
IIPKYTMSFM PI 492
SEQ ID NO:40
Hyoscyamus muticus
MQFFSLVSIF LFLSFLFLLR KWKNSNSQSK KLPPGPWKLP LLGSMLHMVG GLPHHVLRDL 60
AKKYGPLMHL QLGEVSAWV TSPDMAKEVL KTHDIAFASR PKLLAPEIVC YNRSDIAFCP 120
YGDYWRQMRK ICVLEVLSAK NVRSFSSIRR DEVLRLVNFV RSSTSEPVNF TERLFLFTSS 180
MTCRSAFGKV FKEQETFIQL IKEVIGLAGG FDVADIFPSL KFLHVLTGME GKIMKAHHKV 240
DAIVEDVINE HKKNLAMGKT NGALGGEDLI DVLLRLMNDG GLQFPITNDN IKAIIFDMFA 300
AGTETSSSTL VWAMVQMMRN PTILAKAQAE VREAFKGKET FDENDVEELK YLKLVIKETL 360
RLHPPVPLLV PRECREETEI NGYTI PVKTK VMVNVWALGR DPKYWDDADN FKPERFEQCS 420
VDFIGNNFEY LPFGGGRRIC PGISFGLANV YLPLAQLLYH FDWKLPTGME PKDLDLTELV 480
GVTAARKSDL MLVATPYQPS RE 502
SEQ ID NO:41
Thapsia villosa
MMDQQTLFLS LCSMLSVLVF LYIWLSTSKT TGKNLPPSPQ KLPIIGNLHQ VNQDPHISLR 60
SLAKKYGPVM QLHFGSIPVL VVSSADAAKE IMKTHDLAFA NRPNSSIWDK IFYNGKDVVF 120
APYSEYWRQV KSICVLQLLS NKRVRSFQTV REEEVALLVE NIRESGSKTV NLSELFYTLL 180
SNWSRIALG RKYAITTEGG KENSFKELFQ NIAQLIGYVS VGDYIPWLFW IDSVNGLKGR 240
VEKAANEADL FLEGVIKDHS VALDNGASSD DLLYNLLEIQ KQDTNSTFSI DKDSIKGVIL 300
NMFFDGTDTT SAVLEWTMAA LIKHPDIMCK LKNEVREIGR GKSKICGDDL EKMHYLKAW 360
KESMRIYTPV PLLVAREAMQ DVKLMGYDVK SGTQVLINAW AIATDPALWD NPEEFIPERF 420
LNNPIDYKGL HFEFIPFGAG RRGCPGIQYA MAINELALAN LVHIFDFALP DGRRLEDLDM 480
TSETGMTLHK KSPLLVIATS RV 502
SEQ ID NO:42
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQIA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN VWAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:43
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAV FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN VWAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:44
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACDHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN VWAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:45
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVATIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQVVNGYL I PKGTQLLFN AMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:46
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANGYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQVVNGYL IPKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:47
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSFFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQVVNGYL I PKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:48
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANGFFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQVVNGYL I PKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:49
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PVPLLLPRKA GSDQWNGYL I PKGTQLLFN AMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:50
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PGPLLFPRKA GSDQWNGYL IPKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:51
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:52
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PGPLLEPRKA GSDQWNGYL I PKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:53
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGELFGLAIR 480
RATPLRIIPL KP 492
SEQ ID NO:54
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN AMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPTGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:55
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGMAI 480
RRATPLRIIP LKP 493
SEQ ID NO:56
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGIAI 480
RRATPLRIIP LKP 493
SEQ ID NO:57
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGAAI 480
RRATPLRIIP LKP 493
SEQ ID NO:58
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQIA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLFPRKA GSDQWNGYL I PKGTQLLFN AMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:59
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQIA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANSYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLEPRKA GSDQWNGYL I PKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:60
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACDHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANGYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:61
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANGYFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLFPRKA GSDQWNGYL IPKGTQLLFN WAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:62
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQIA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANGFFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLLPRKA GSDQWNGYL I PKGTQLLFN VWAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:63
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANGFFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLEPRKA GSDQWNGYL I PKGTQLLFN AMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:64
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQIA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANGFFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360
PGPLLFPRKA GSDQWNGYL I PKGTQLLFN VWAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:65
Coleus forskohlii
MGSLSTMNLN HSPMSYSGIL PSSSAKAKLL LPGCFSISAW MNNGKNLNCQ LTHKKISKVA 60
EIRVATVNAP PVHDQDDSTE NQCHDAVNNI EDPIEYIRTL LRTTGDGRIS VSPYDTAWVA 120
LIKDLQGRDA PEFPSSLEWI IQNQLADGSW GDAKFFCVYD RLVNTIACW ALRSWDVHAE 180
KVERGVRYIN ENVEKLRDGN EEHMTCGFEV VFPALLQRAK SLGIQDLPYD APVIQEIYHS 240
REQKSKRI PL EMMHKVPTSL LFSLEGLENL EWDKLLKLQS ADGSFLTSPS STAFAFMQTR 300
DPKCYQFIKN TIQTFNGGAP HTYPVDVFGR LWAIDRLQRL GISRFFESEI ADCIAHIHRF 360
WTEKGVFSGR ESEFCDIDDT SMGVRLMRMH GYDVDPNVLK NFKKDDKFSC YGGQMIESPS 420
PIYNLYRASQ LRFPGEQILE DANKFAYDFL QEKLAHNQIL DKWVISKHLP DEIKLGLEMP 480
WYATLPRVEA RYYIQYYAGS GDVWIGKTLY RMPEISNDTY HELAKTDFKR CQAQHQFEWI 540
YMQEWYESCN MEEFGISRKE LLVAYFLATA SIFELERANE RIAWAKSQII STIIASFFNN 600
QNTSPEDKLA FLTDFKNGNS TNMALVTLTQ FLEGFDRYTS HQLKNAWSVW LRKLQQGEGN 660
GGADAELLVN TLNICAGHIA FREEILAHND YKTLSNLTSK ICRQLSQIQN EKELETEGQK 720
TSIKNKELEE DMQRLVKLVL EKSRVGINRD MKKTFLAVVK TYYYKAYHSA QAIDNHMFKV 780
LFEPVA 786
SEQ ID NO:66
Coleus forskohlii
atggaaacca tgactcttct cctccctctt ttcttcatcg ctctgacata tttcctctcc 60 tggaggcgcc ggagaaacct tcctccgggg ccttttcctc ttccaatcat cggaaacttg 120 ctgcaaatcg gctccaaacc ccaccagtca ttcgcccaac tctcaaagaa atatgggcct 180 ctcatgtccg tccaactcgg gagtgtatac accgtgatag cctcctcccc ggaaatggcg 240 aaagagatac tgcaaaaaca cggccaagtg ttttccggga gaaccatcgc acaggcggcg 300 caagcgtgcg gccacgacca gatctccatc gggtttctgc cggtggcaac cacgtggcgt 360 gatatgcgta aaatatgcaa agaacagatg ttctcgcatc acagcctgga atccagcaag 420 gagctgaggc acgagaagct gcagaagctg ctggactacg cccagaaatg ctgcgaagcc 480 ggccgtgccg ttgatattcg tgaggccgcc ttcattacaa cgctcaacct catgtctgcc 540 acgttgttct cgactcaagc tactgagttc gactccgaag ctacaaaaga gtttaaggag 600 gtcatcgagg gggtggccgt cattgtgggt gagcctaatt tcgctgacta cttccccatc 660 ttgaagcctt tcgatcttca ggggatcaag cgtagagcta atagctactt tggaagactg 720 ctcaagttaa tggagagata tctgaatgag aggctggaat caagaaggtt gaacccagat 780 gcccccaaga agaatgactt tttggaaacc ctggtggata tcatccaggc tgatgaatac 840 aagctcacga ccgaccacgt cacgcacctc atgcttgact tatttgttgg aggatcggaa 900 acaagcgcga cctcactgga atggataatg tcggagttag tgagcaatcc gagtaaattg 960 gcgaaggtga aagcggagct caagagcgtt gtaggagaaa agaaagtggt gagcgaatca 1020 gaaatggcga ggctgccata cttgcaagca gtgatcaaag aagtgctccg acttcaccct 1080 cccggccctc ttctgcttcc tcgcaaggca gggagtgatc aagttgtgaa tggatacctg 1140 atcccaaagg gaactcaatt actcttcaat gtatgggcaa tgggcagaga ccccagtatc 1200 tggaagaatc ctgaatcttt cgagcccgag cgcttcctca atcaaaacat agactacaaa 1260 ggccaagatt tcgagctcat tccattcggg tccgggagaa gaatttgccc cggtatgccg 1320 ctggcggatc ggattatgca catgacgacg gccactctgg ttcacaactt cgattggaaa 1380 ctggaagacg gagcaggtga tgcggatcac aagggagacg accccttcgg cttggccatc 1440 cgccgtgcaa ctcctctcag gatcattcca cttaagccat ga 1482
SEQ ID NO:67
Artificial Sequence
atggaaacca tgaccttgtt gttgcctttg ttctttattg ccttgaccta cttcttgtct 60 tggcgtagaa gaagaaattt gccaccaggt ccatttccat tgccaattat tggtaacttg 120 ttgcaaatcg gttccaagcc acatcaatct tttgctcaat tgtccaaaaa gtacggtcca 180 ttgatgtctg ttcaattggg ttctgtctac accgttattg cttcatctcc agaaatggcc 240 aaagaaatct tgcaaaaaca cggtcaagtt ttctccggta gaactattgc tcaagcagct 300 caagcttgtg gtcatgatca aatttctatt ggtttcttgc cagttgctac cacttggaga 360 gatatgagaa agatctgcaa agaacaaatg ttctcccacc actctttgga atcctctaaa 420 gaattgagac acgaaaagtt gcaaaagttg ttggattacg ctcaaaagtg ttgtgaagct 480 ggtagagctg ttgatattag agaagctgct ttcattacca ccttgaactt aatgtctgct 540 accttgtttt ctacccaagc tactgaattt gattccgaag ctaccaaaga attcaaagaa 600 gttattgaag gtgttgccgt tatcgttggt gaacctaatt ttgctgatta cttcccaatc 660 ttgaagccat ttgacttgca aggtattaag agaagagcca actcttactt cggtagatta 720 ttgaagttga tggaaagata cttgaacgaa agattggaat ctagaagatt gaacccagat 780 gctccaaaga agaacgattt cttggaaacc ttggttgaca ttatccaagc cgatgaatac 840 aagttgacta ccgatcatgt tacccacttg atgttggatt tgtttgttgg tggttctgaa 900 acctctgcta cttcattgga atggatcatg tctgaattgg tcagtaaccc atctaagttg 960 gctaaagtta aggccgaatt gaagtctgtt gttggtgaaa agaaggttgt ttccgaatct 1020 gaaatggcta gattgccata cttgcaagcc gttatcaaag aagtcttgag attgcatcca 1080 cctggtcctt tgttgttacc aagaaaagct ggttctgatc aagttgtcaa cggttacttg 1140 attcctaagg gtactcaatt gttgttcaac gtttgggcta tgggtagaga tccatctatt 1200 tggaaaaacc cagaatcctt cgaaccagaa agatttttga atcaaaacat cgactacaag 1260 ggtcaagatt tcgaattgat tccattcggt tctggtagaa gaatttgtcc aggtatgcca 1320 ttggctgata gaattatgca tatgactacc gccactttgg ttcataattt cgattggaaa 1380 ttggaagatg gtgctggtga tgctgatcat aagggtgatg atccttttgg tttggctatt 1440 agaagagcta ccccattgag aattatccca ttgaagcctt ga 1482
SEQ ID NO:68
Tomato bushy stunt virus
MERAIQGNDA REQANSERWD GGSGGTTSPF KLPDESPSWT EWRLHNDETN SNQDNPLGFK 60
ESWGFGKVVF KRYLRYDRTE ASLHRVLGSW TGDSVNYAAS RFFGFDQIGC TYSIRFRGVS 120
ITVSGGSRTL QHLCEMAIRS KQELLQLAPI EVESNVSRGC PEGTETFEKE SE 172
SEQ ID NO:69
Artificial Sequence
FXXGXXXCXG
SEQ ID NO:70
Rosmarinus officinalis
MDSFPLLAAL FFILAATWFI SFRRPRNLPP GPFPYPIVGN MLQLGTQPHE TFAKLSKKYG 60
PLMSIHLGSL YTVIVSSPEM AKEIMHKYGQ VFSGRTVAQA VHACGHDKIS MGFLPVGGEW 120
RDMRKICKEQ MFSHQSMEDS QWLRKQKLQQ LLEYAQKCSE RGRAIDIREA AFITTLNLMS 180
ATLFSMQATE FDSKVTMEFK EIIEGVASIV GVPNFADYFP ILRPFDPQGV KRRADVYFGR 240
LLAIIEGFLN ERVESRRTNP NAPKKDDFLE TLVDTLQTND NKLKTDHLTH LMLDLFVGGS 300
ETSTTEIEWI MWELLANPEK MAKMKAELKS VMGEEKWDE SQMPRLPYLQ AVVKESMRLH 360
PPGPLLLPRK AESDQVVNGY LIPKGAQVLI NAWAIGRDHS IWKNPDSFEP ERFLDQKIDF 420
KGTDYELI PF GSGRRVCPGM PLANRILHTV TATLVHNFDW KLERPEASDA HRGVLFGFAV 480
RRAVPLKIVP FKV 493
SEQ ID NO:71
Rosmarinus officinalis
MDSFPLLVAL FFIAVTTFLS FRRPRNLPPG PFPLPIVGNM LQLGTQPHET FAKLSKKYGP 60
LMSIHLGSLY TVIVSSPEMA KEIMHKYGQV FSGRTVAQAV HACGHDKISM GFLPVGGEWR 120
DMRKICKEQM FSHQSMEDSQ WLRKQKLQQL LEYAQKCSER GRAIDIREAA FITTLNLMSA 180
TLFSMQATEF DSKVTMEFKE IIEGVATIVG VPNFADYFPI LRPFDPQGVK RRADVYFGRL 240
LAIIEGFLNE RIESRRTNPN APKKDDFLET LVDTLQTNDN KLKTDHLTHL MLDLFVGGSE 300
TSTTSIEWTM SELVM PEKM AKLKAELKSV AGDEKIVDES EIAKLPYLQA VIKEVMRIHP 360
PGPLLLPRKA ESDQEVNGYL I PKGTQVLIN AWAIGRDPSV WKNPDSFEPE RFLEQKIDFK 420
GQDFELLPFG SGRRVCPGLP LASRILHMTA ATLVHNFDWK LEDEATAEAD HAGELFGLAV 480
RRAVPLRIIP IVKS 494
SEQ ID NO:72
Rosmarinus officinalis
MEFSTLLIAL FSITLTYFLF KSGSKRRSGA KLPPGPYPLP IVGNIFQLGT KPHQSLAQLA 60
KTHGPLMSLR FGSIYTVIVT SPEMAREIFV RHDQDFLNRT WEAVHAHDH DSISMAFMDV 120
GAEWRALRRI CKEQIFSVQS LEASQGLRRE KLHQLREYVL RCCDAGRWD IREASFVTTL 180
NLMSASLFSI QATEFDSTAT EEFREIMEGV ASIVGDPNFA DYFPILKRFD PQGVKRKAEL 240
YFGKMLVLVQ DLLEKRQEER RRSPDYVKKN DLLEKLVDVL NEESEYKLTT KHITHLLLDL 300
FVGGSETTTT SVEWIMSELL INPEKLEKTK EELQRWGEK NQVQESDIPR LPYFEAVLKE 360
VFRLHPPGPL LLPRKAEREV QIGEYTI PKD TQILINAWAI GRDPSIWPNP EAFQPERFLA 420
QKKDYKGQDF ELIPFGSGRR MCPGLSFANR MLPMTVATLI HNFDWKLEVE ANAQDVHKGE 480
MFGIAVRRAV PLRAYPISH 499
SEQ ID NO:73
Salvia fructicosa
MDPFPLVAAA LFIAATWFIT FKRRRNLPPG PFPYPIVGNM LQLGSQPHET FAKLSKKYGP 60
LMSIHLGSLY TVIISSPEMA KEIMHKYGQV FSGRTIAQAV HACDHDKISM GFLPVGAEWR 120
DMRKICKEQM FSHQSMEDSQ NLRKQKLQQL LEYAQKCSEE GRGIDIREAA FITTLNLMSA 180
TLFSMQATEF DSKVTMEFKE IIEGVASIVG VPNFADYFPI LRPFDPQGVK RRADVYFGRL 240
LGLIEGYLNE RIEFRKANPN APKKDDFLET LVDALDAKDY KLKTEHLTHL MLDLFVGGSE 300
TSTTEIEWIM WELLASPEKM AKVKAELKSV MGGEKWDES MMPRLPYLQA WKESMRLHP 360
PGPLLLPRKA ESDQWNGYL I PKGAQVLIN AWAMGRDPSL WKNPDSFEPE RFLDQKIDFK 420
GTDYELIPFG SGRRVCPGMP LANRILHTVT ATLVHNFDWK LERPEASDAH KGVLFGFAVR 480
RAVPLKIVPI KA 492
SEQ ID NO:74
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQAA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANGFFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PGPLLEPRKA GSDQWNGYL I PKGTQLLFN VWAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
SEQ ID NO:75
Artificial Sequence
METMTLLLPL FFIALTYFLS WRRRRNLPPG PFPLPIIGNL LQIGSKPHQS FAQLSKKYGP 60
LMSVQLGSVY TVIASSPEMA KEILQKHGQV FSGRTIAQIA QACGHDQISI GFLPVATTWR 120
DMRKICKEQM FSHHSLESSK ELRHEKLQKL LDYAQKCCEA GRAVDIREAA FITTLNLMSA 180
TLFSTQATEF DSEATKEFKE VIEGVAVIVG EPNFADYFPI LKPFDLQGIK RRANGFFGRL 240
LKLMERYLNE RLESRRLNPD APKKNDFLET LVDIIQADEY KLTTDHVTHL MLDLFVGGSE 300
TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360
PGPLLFPRKA GSDQWNGYL IPKGTQLLFN VWAMGRDPSI WKNPESFEPE RFLNQNIDYK 420
GQDFELIPFG SGRRICPGMP LADRIMHMTT ATLVHNFDWK LEDGAGDADH KGDDPFGLAI 480
RRATPLRIIP LKP 493
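The listing above prints residues in space-separated blocks of ten, with a running position count at the end of each row. Some blocks in this copy are shorter than ten residues (for example `VGEKKWSES`, where other entries show `VGEKKVVSES`), which looks like text-extraction damage rather than a real difference. The following is a minimal Python sketch, not part of the application, that flags such blocks for manual checking against the original publication. The function name `flag_suspect_blocks` and its parameters are hypothetical, and the sketch assumes the ten-residue block layout holds for every row except the last row of a sequence.

```python
def flag_suspect_blocks(row, block_size=10, final_row=False):
    """Return the blocks in one listing row whose length is not block_size.

    row        -- one line of the listing, e.g. "METMTLLLPL FFIALTYFLS ... 60"
    final_row  -- set True for the last row of a sequence, whose final block
                  may legitimately be shorter than block_size
    """
    tokens = row.split()
    # Drop the trailing running position count, if present.
    if tokens and tokens[-1].isdigit():
        tokens = tokens[:-1]

    suspects = []
    for index, block in enumerate(tokens):
        is_last = index == len(tokens) - 1
        if final_row and is_last and 0 < len(block) <= block_size:
            continue  # a short final block is expected at the end of a sequence
        if len(block) != block_size:
            suspects.append(block)
    return suspects


damaged = "TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKWSES EMARLPYLQA VIKEVLRLHP 360"
intact = "TSATSLEWIM SELVSNPSKL AKVKAELKSV VGEKKVVSES EMARLPYLQA VIKEVLRLHP 360"
print(flag_suspect_blocks(damaged))
print(flag_suspect_blocks(intact))
```

A flagged block only marks a place to re-check; the sketch does not attempt to guess the correct residues.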
References:
EL-AWAAD et al., "Bifunctional CYP81AA proteins catalyse identical hydroxylations but alternative regioselective phenol couplings in plant xanthone biosynthesis," Nat Commun. 7:11472 (May 2016).
GOTOH et al., "Substrate Recognition Sites in Cytochrome P450 Family 2 (CYP2) Proteins Inferred from Comparative Analyses of Amino Acid and Coding Nucleotide Sequences," J Biol Chem. 267(1):83-90 (January 1992).
GRICMAN et al., "Identification of universal selectivity-determining positions in cytochrome P450 monooxygenases by systematic sequence-based literature mining," Proteins 83(9):1593-603 (September 2015).
HAMBERGER & BAK, "Plant P450s as versatile drivers for evolution of species-specific chemical diversity," Philos Trans R Soc Lond B Biol Sci. 368(1612):20120426 (January 2013).
IGNEA et al., "Reconstructing the chemical diversity of labdane-type diterpene biosynthesis in yeast," Metab Eng. 28:91-103 (March 2015).
NELSON & WERCK-REICHHART, "A P450-centric view of plant evolution," Plant J. 66(1):194-211 (April 2011).
SAWADA & AYABE, "Multiple mutagenesis of P450 isoflavonoid synthase reveals a key active-site residue," Biochem Biophys Res Commun. 330:907-13 (May 2005).
SCHALK & CROTEAU, "A single amino acid substitution (F363I) converts the regiochemistry of the spearmint (-)-limonene hydroxylase from a C6- to a C3-hydroxylase," Proc Natl Acad Sci U S A. 97(22):11948-53 (October 2000).
SEIFERT & PLEISS, "Identification of selectivity-determining residues in cytochrome P450 monooxygenases: a systematic analysis of the substrate recognition site 5," Proteins 74(4):1028-35 (March 2009).
TAKAHASHI et al., "Functional characterization of premnaspirodiene oxygenase, a cytochrome P450 catalyzing regio- and stereo-specific hydroxylations of diverse sesquiterpene substrates," J Biol Chem. 282(43):31744-54 (October 2007).
TOPORKOVA et al., "Determinants governing the CYP74 catalysis: conversion of allene oxide synthase into hydroperoxide lyase by site-directed mutagenesis," FEBS Lett. 582(23-24):3423-8 (October 2008).

Claims

WHAT IS CLAIMED IS:
1. A recombinant host cell capable of producing ferruginol, 13R-manoyl oxide (13R-MO), and/or a 13-R-MO derivative, comprising a recombinant gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11- positions, having at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:22, and further having at least one amino acid substitution corresponding to residues 99, 100, 104, 207, 235, 236, 362, 366, 473, 474, 476, and/or 478 of SEQ ID NO:22.
2. The recombinant host cell of claim 1, wherein the polypeptide comprises an A99I, A100V, G104D, V207T, S235G, Y236F, G362V, L366F, L366E, D473E, D474L, F476T, L478M, L478A, and/or L478I substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22.
3. The recombinant host cell of claim 1 or 2, wherein the polypeptide comprises:
(a) an A99I substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(b) an S235G and Y236F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(c) an L366F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(d) an L366E substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(e) an A99I, S235G, and Y236F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(f) an A99I and L366F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(g) an S235G, Y236F, and L366E substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(h) an A99I, S235G, Y236F, and L366F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(i) a G362V and L366F substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22;
(k) a G362V substitution in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22; or
(l) a D473E and D474L substitution, and a P475 deletion in the polypeptide having the amino acid sequence corresponding to SEQ ID NO:22.
4. A recombinant host cell capable of producing ferruginol, 13R-manoyl oxide (13R-MO), and/or a 13-R-MO derivative, comprising a recombinant gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11- positions, having at least one amino acid substitution corresponding to residues 93-116; 202-209; 233-240; 286-304; 359-369 or 473-480 of SEQ ID NO:22.
5. A recombinant host cell capable of producing ferruginol, 13R-manoyl oxide (13R-MO), and/or a 13-R-MO derivative, comprising a recombinant gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11- positions, having at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, or SEQ ID NO:73.
6. The recombinant host cell of any one of claims 1-5, wherein the 13R-MO derivative is an oxidized 13R-MO derivative.
7. The recombinant host cell of claim 6, wherein the 13R-MO derivative is 11-oxo-13R-MO and/or 11β-hydroxy-13R-MO.
8. The recombinant host cell of any one of claims 1-5, wherein the 13R-MO derivative is forskolin.
9. The recombinant host cell of any one of claims 1-8, wherein the recombinant host cell further comprises:
(a) a gene encoding a polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate;
(b) a gene encoding a polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-positions;
(c) a gene encoding a polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO;
(d) a gene encoding a polypeptide capable of synthesizing GGPP from farnesyl diphosphate (FPP) and isopentenyl diphosphate (IPP);
(e) a gene encoding a polypeptide capable of synthesizing 1-deoxy-D-xylulose 5-phosphate (DXS) from pyruvate and D-glyceraldehyde 3-phosphate;
(f) a gene encoding a polypeptide capable of reducing cytochrome P450 complex; and/or
(g) a gene encoding an anti-post transcriptional suppressor protein polypeptide;
wherein at least one of the genes is a recombinant gene.
10. The recombinant host cell of claim 9, wherein:
(a) the polypeptide capable of synthesizing (5S,8R,9R,10R)-labda-8-ol diphosphate from geranylgeranyl diphosphate (GGPP) and/or capable of synthesizing 13R-MO from (5S,8R,9R,10R)-labda-8-ol diphosphate comprises a polypeptide having at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:16 or SEQ ID NO:17, or at least 40% sequence identity to the amino acid sequence set forth in SEQ ID NO:18;
(b) the polypeptide capable of oxidizing 13R-MO and/or oxidized 13R-MO at its 1-, 6-, 7-, 9-, and/or 11-position comprises a polypeptide having at least 55% sequence identity to the amino acid sequence set forth in SEQ ID NO:19, or at least 50% sequence identity to the amino acid sequence set forth in SEQ ID NO:20, SEQ ID NO:21, or SEQ ID NO:23;
(c) the polypeptide capable of acetylating 13R-MO and/or oxidized 13R-MO comprises a polypeptide having at least 40% sequence identity to SEQ ID NO:26;
(d) the polypeptide capable of synthesizing GGPP from FPP and IPP comprises a polypeptide having at least 70% sequence identity to the amino acid sequence set forth in SEQ ID NO:32 or SEQ ID NO:37;
(e) the polypeptide capable of synthesizing DXS from pyruvate and D-glyceraldehyde 3-phosphate comprises a polypeptide having at least 75% sequence identity to the amino acid sequence set forth in SEQ ID NO:30;
(f) the polypeptide capable of reducing cytochrome P450 complex comprises a polypeptide having at least 75% sequence identity to the amino acid sequence set forth in SEQ ID NO:34; and/or
(g) the anti-post transcriptional suppressor protein polypeptide comprises a polypeptide having at least 65% sequence identity to the amino acid sequence set forth in SEQ ID NO:68.
11. The recombinant host cell of any one of claims 1-10, wherein the recombinant host cell comprises a plant cell, a mammalian cell, an insect cell, a fungal cell, an algal cell or a bacterial cell.
12. The recombinant host cell of claim 11, wherein the bacterial cell comprises Escherichia cells, Lactobacillus cells, Lactococcus cells, Corynebacterium cells, Acetobacter cells, Acinetobacter cells, or Pseudomonas cells.
13. The recombinant host cell of claim 12, wherein the fungal cell comprises a yeast cell.
14. The recombinant host cell of claim 13, wherein the yeast cell is a cell from Saccharomyces cerevisiae, Schizosaccharomyces pombe, Yarrowia lipolytica, Candida glabrata, Ashbya gossypii, Cyberlindnera jadinii, Pichia pastoris, Kluyveromyces lactis, Hansenula polymorpha, Candida boidinii, Arxula adeninivorans, Xanthophyllomyces dendrorhous, or Candida albicans species.
15. The recombinant host cell of claim 14, wherein the yeast cell is a Saccharomycete.
16. The recombinant host cell of claim 15, wherein the yeast cell is a Saccharomyces cerevisiae cell.
17. The recombinant host cell of claim 11, wherein the plant cell is a Nicotiana benthamiana cell.
18. A method of producing 13R-MO and/or a 13R-MO derivative in a cell culture, comprising growing the recombinant host cell of any one of claims 1-17 in the cell culture, under conditions in which the genes are expressed, and wherein 13R-MO and/or the 13R-MO derivative is produced by the recombinant host cell.
19. The method of claim 18, wherein the recombinant host cell is grown in a fermentor at a temperature for a period of time, wherein the temperature and period of time facilitate the production of 13R-MO and/or the 13R-MO derivative.
20. The method of claim 18 or 19, that further comprises a step of isolating 13R-MO and/or the 13R-MO derivative.
21. The method of any one of claims 18-20, wherein the 13R-MO derivative is 11-oxo-13R-MO, 11β-hydroxy-13R-MO or forskolin.
22. A 13R-MO derivative composition produced by the recombinant host cell of any one of claims 1-17.
23. A 13R-MO derivative composition produced by the method of any one of claims 18-21.
PCT/EP2017/068418 2016-07-20 2017-07-20 Biosynthesis of 13r-manoyl oxide derivatives WO2018015512A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662364615P 2016-07-20 2016-07-20
US62/364,615 2016-07-20

Publications (1)

Publication Number Publication Date
WO2018015512A1 true WO2018015512A1 (en) 2018-01-25

Family

ID=59569280

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/068418 WO2018015512A1 (en) 2016-07-20 2017-07-20 Biosynthesis of 13r-manoyl oxide derivatives

Country Status (1)

Country Link
WO (1) WO2018015512A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023242403A1 (en) * 2022-06-16 2023-12-21 Danmarks Tekniske Universitet Microbial cells and methods for production of hernandulcin

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5204253A (en) 1990-05-29 1993-04-20 E. I. Du Pont De Nemours And Company Method and apparatus for introducing biological substances into living cells
US5538880A (en) 1990-01-22 1996-07-23 Dekalb Genetics Corporation Method for preparing fertile transgenic corn plants
US6013863A (en) 1990-01-22 2000-01-11 Dekalb Genetics Corporation Fertile transgenic corn plants
US6329571B1 (en) 1996-10-22 2001-12-11 Japan Tobacco, Inc. Method for transforming indica rice
US20060014264A1 (en) 2004-07-13 2006-01-19 Stowers Institute For Medical Research Cre/lox system with lox sites having an extended spacer region
WO2009140394A1 (en) 2008-05-13 2009-11-19 Cargill, Incorporated Separation of rebaudioside a from stevia glycosides using chromatography
CN102676549A (en) * 2012-01-09 2012-09-19 中国中医科学院中药研究所 CYP450 (Cytochrome P450) gene participating in tanshinone biosynthesis and coded product as well as application thereof
CN103695441A (en) * 2013-10-24 2014-04-02 中国中医科学院中药研究所 Cytochrome P450 gene participated in anabolism of tanshinone compound as well as coding product and application thereof
WO2015113569A1 (en) * 2014-01-31 2015-08-06 University Of Copenhagen Biosynthesis of forskolin and related compounds
WO2015197075A1 (en) * 2014-06-23 2015-12-30 University Of Copenhagen Methods and materials for production of terpenoids
WO2016070885A1 (en) * 2014-11-07 2016-05-12 University Of Copenhagen Biosynthesis of oxidised 13r-mo and related compounds
CN106434704A (en) * 2016-03-24 2017-02-22 中国医学科学院药用植物研究所 Cytochrome P450 gene CYP76AH12 involved in tanshinone compound biosynthesis and coding product and application thereof

Non-Patent Citations (57)

* Cited by examiner, † Cited by third party
Title
"Wiley Registry of Mass Spectral Data", July 2006, JOHN WILEY & SONS
AGEITOS ET AL., APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, vol. 90, no. 4, 2011, pages 1219 - 1227
ASADA, PHYTOCHEMISTRY, vol. 79, 2012, pages 141 - 146
AUSUBEL ET AL.: "CURRENT PROTOCOLS IN MOLECULAR BIOLOGY", 1989, GREENE PUBLISHING ASSOCIATES AND WILEY INTERSCIENCE
BANKAR, APPL MICROBIOL BIOTECHNOL., vol. 84, no. 5, 2009, pages 847 - 865
BARNETT ET AL., YEASTS: CHARACTERISTICS AND IDENTIFICATION, 1983
BATEMAN ET AL., NUCL. ACIDS RES., vol. 27, 1999, pages 260 - 262
BEOPOULOS, BIOCHIMIE, vol. 91, no. 6, 2009, pages 692 - 696
CARMICHAEL ET AL., MYCOLOGIA, vol. 49, no. 6, 1957, pages 820 - 830
CHENNA ET AL., NUCLEIC ACIDS RES., vol. 31, no. 13, 2003, pages 3497 - 500
DELPECH ET AL., TETRAHEDRON LETTERS, vol. 37, no. 7, 1996, pages 1019 - 1022
DUEHOLM, BMC EVOLUTIONARY BIOLOGY, vol. 15, 2015, pages 122
EL-AWAAD ET AL.: "Bifunctional CYP81AA proteins catalyse identical hydroxylations but alternative regioselective phenol couplings in plant xanthone biosynthesis", NAT COMMUN., vol. 7, May 2016 (2016-05-01), pages 11472
FANG Y ET AL: "Generation of expressed sequence tags from a cDNA library of Coleus forskohlii for identification of genes involved in terpene biosynthesis", BIOLOGIA PLANTARUM, KLUWER ACADEMIC PUBLISHERS, DO, vol. 59, no. 3, 13 June 2015 (2015-06-13), pages 463 - 468, XP035521967, ISSN: 0006-3134, [retrieved on 20150613], DOI: 10.1007/S10535-015-0526-X *
FUGELSANG ET AL., WINE MICROBIOLOGY, 1997
GIAEVER; NISLOW, GENETICS, vol. 197, no. 2, 2014, pages 451 - 465
GODARD ET AL., OBES RES., vol. 13, 2005, pages 1335 - 1343
GOSSEN ET AL., ANN. REV. GENETICS, vol. 36, 2002, pages 153 - 173
GOTOH ET AL.: "Substrate Recognition Sites in Cytochrome P450 Family 2 (CYP2) Proteins Inferred from Comparative Analyses of Amino Acid and Coding Nucleotide Sequence", J BIOL CHEM., vol. 267, no. 1, January 1992 (1992-01-01), pages 83 - 90
GOTOH, J. BIOL. CHEM., vol. 1, no. 5, 1997, pages 83 - 90
GREEN; SAMBROOK: "MOLECULAR CLONING: A LABORATORY MANUAL", 2012, COLD SPRING HARBOR LABORATORY
GRICMAN ET AL.: "Identification of universal selectivity-determining positions in cytochrome P450 monooxygenases by systematic sequence-based literature mining", PROTEINS, vol. 83, no. 9, September 2015 (2015-09-01), pages 1593 - 1603
GUO ET AL., NEW PHYTOL., vol. 210, 2016, pages 525 - 534
HAMBERGER; BAK: "Plant P450s as versatile drivers for evolution of species-specific chemical diversity", PHILOS TRANS R SOC LOND B BIOL SCI., vol. 368, no. 1612, January 2013 (2013-01-01), pages 20120426
HOFFMAN ET AL., GENETICS., vol. 201, no. 2, 2015, pages 403 - 423
IGNEA ET AL., MICROB. CELL FACT., vol. 15, 2016, pages 46
IGNEA ET AL.: "Reconstructing the chemical diversity of labdane-type diterpene biosynthesis in yeast", METAB ENG., vol. 28, March 2015 (2015-03-01), pages 91 - 103
INNIS ET AL.: "PCR Protocols: A Guide to Methods and Applications", 1990, ACADEMIC PRESS
KHOURY ET AL., PROTEIN SCI., vol. 18, no. 10, 2009, pages 2125 - 2138
KIKURA ET AL., PHARMACOL RES, vol. 49, 2004, pages 275 - 281
LI ET AL., ENZYME AND MICROBIAL TECHNOLOGY, vol. 41, 2007, pages 312 - 317
MASTROMARINO, NEW MICROBIOLOGICA, vol. 36, 2013, pages 229 - 238
MATTANOVICH ET AL., METHODS MOL BIOL., vol. 824, 2012, pages 329 - 358
NELSON; WERCK-REICHHART: "A P450-centric view of plant evolution", PLANT J., vol. 66, no. 1, April 2011 (2011-04-01), pages 194 - 211
NICAUD, YEAST, vol. 29, no. 10, 2012, pages 409 - 418
OOYEN ET AL., FEMS YEAST RES., vol. 6, no. 3, 2006, pages 381 - 392
OSMANI ET AL., PHYTOCHEMISTRY, vol. 70, 2009, pages 325 - 347
PAPINI ET AL., MICROBIAL CELL FACTORIES, vol. 11, 2012, pages 136
PIIRAINEN ET AL., N BIOTECHNOL., vol. 31, no. 6, 2014, pages 532 - 7
PODUST, NAT. PROD. REP., vol. 29, 2012, pages 1251 - 1266
PRELICH, GENETICS, vol. 190, 2012, pages 841 - 854
SAENGE, PROCESS BIOCHEMISTRY, vol. 46, no. 1, 2011, pages 210 - 218
SAWADA Y ET AL: "Multiple mutagenesis of P450 isoflavonoid synthase reveals a key active-site residue", BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICAT, ELSEVIER, AMSTERDAM, NL, vol. 330, no. 3, 13 May 2005 (2005-05-13), pages 907 - 913, XP004833724, ISSN: 0006-291X, DOI: 10.1016/J.BBRC.2005.03.053 *
SAWADA; AYABE: "Multiple mutagenesis of P450 isoflavonoid synthase reveals a key active-site residue", BIOCHEM BIOPHYS RES COMMUN., vol. 330, May 2005 (2005-05-01), pages 907 - 13, XP004833724, DOI: doi:10.1016/j.bbrc.2005.03.053
SCHALK; CROTEAU: "A single amino acid substitution (F363I) converts the regiochemistry of the spearmint (-)-limonene hydroxylase from a C6- to a C3-hydroxylase", PROC NATL ACAD SCI U S A., vol. 97, no. 22, October 2000 (2000-10-01), pages 11948 - 11953, XP002317202, DOI: doi:10.1073/pnas.97.22.11948
SCHULER ET AL., ANNU REV. PLANT BIOL., vol. 54, 2003, pages 629 - 667
SEIFERT; PLEISS: "Identification of selectivity-determining residues in cytochrome P450 monooxygenases: a systematic analysis of the substrate recognition site 5", PROTEINS, vol. 74, no. 4, March 2009 (2009-03-01), pages 1028 - 1035
SONNHAMMER ET AL., NUCL. ACIDS RES., vol. 26, 1998, pages 320 - 322
SONNHAMMER ET AL., PROTEINS, vol. 28, 1997, pages 405 - 420
TAKAHASHI ET AL.: "Functional characterization of premnaspirodiene oxygenase, a cytochrome P450 catalyzing regio- and stereo-specific hydroxylations of diverse sesquiterpene substrates", J BIOL CHEM., vol. 282, no. 43, October 2007 (2007-10-01), pages 31744 - 31754, XP002729863, DOI: doi:10.1074/jbc.M703378200
TAKAHASHI, J. BIOL. CHEM., vol. 282, no. 43, 2007, pages 31744 - 31754
TOPORKOVA ET AL.: "Determinants governing the CYP74 catalysis: conversion of allene oxide synthase into hydroperoxide lyase by site-directed mutagenesis", FEBS LETT., vol. 582, no. 23-24, October 2008 (2008-10-01), pages 3423 - 3428, XP025609529, DOI: doi:10.1016/j.febslet.2008.09.005
TOPORKOVA Y Y ET AL: "Determinants governing the CYP74 catalysis: Conversion of allene oxide synthase into hydroperoxide lyase by site-directed mutagenesis", FEBS LETTERS, ELSEVIER, AMSTERDAM, NL, vol. 582, no. 23-24, 15 October 2008 (2008-10-15), pages 3423 - 3428, XP025609529, ISSN: 0014-5793, [retrieved on 20080918], DOI: 10.1016/J.FEBSLET.2008.09.005 *
WAGH ET AL., J POSTGRAD MED., vol. 58, no. 3, 2012, pages 199 - 202
XU ET AL., VIROL SIN., vol. 29, no. 6, 2014, pages 403 - 409
YOUSIF; THULESIUS, J PHARM PHARMACOL., vol. 51, no. 2, 1999, pages 181 - 186
ZHU, NATURE COMMUN., vol. 3, 2013, pages 1112

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17749626

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17749626

Country of ref document: EP

Kind code of ref document: A1