WO2023006699A1

WO2023006699A1 - Cells and method for producing isoprenoid molecules with canonical and non-canonical structures

Info

Publication number: WO2023006699A1
Application number: PCT/EP2022/070854
Authority: WO
Inventors: Sotirios KAMPRANIS; Mads ROSENFELDT; Lina Wang
Original assignee: Københavns Universitet
Priority date: 2021-07-30
Filing date: 2022-07-26
Publication date: 2023-02-02

Abstract

The invention concerns a genetically engineered eukaryotic cell and a method for the production of a terpene or terpenoid or isoprenoid, said cell comprising a first nucleic acid sequence encoding a first kinase such as AtFKI or a homologue or variant thereof having at least 75% identity thereto that phosphorylates a primary alcohol to a mono- or pyrophosphate terpenoid precursor; and optionally a second nucleic acid sequence encoding a phosphokinase that phosphorylates a monophosphate precursor to a terpenoid pyrophosphate precursor.

Description

CELLS AND METHOD FOR PRODUCING ISOPRENOID MOLECULES WITH CANONICAL AND NON-CANONICAL STRUCTURES

FIELD OF THE INVENTION

The present invention relates to the production of isoprenoids in eukaryotic cells.

BACKGROUND OF THE INVENTION

Isoprenoids are widely used as pharmaceuticals, cosmetics, nutraceuticals, flavours, fragrances, and pesticides, and have recently also found applications as drop-in jet fuels and biopolymers.

Extraction of these valuable compounds from natural sources can hardly meet the increasing demand, while organic chemical synthesis is often inefficient, particularly in the case of compounds having complex structures.

Eukaryotic microorganisms, such as yeasts, are considered good hosts for high-value compound production because of their capacity to synthesize complex chemical structures. However, production of isoprenoids in microorganisms currently depends on the feeding of sugar. This can frequently be a challenging and inefficient process, as microorganisms prefer to use the sugar for growth and other metabolic processes instead of producing the desired product.

Isoprenoids are synthesized by the successive addition of 5-carbon containing building blocks and, as a result, their structures mostly contain multiples of five carbon atoms. Canonically, the biosynthesis of isoprenoids and related compounds like cannabinoids, monoterpene indole alkaloids, and prenylated aromatic compounds relies on the MEP pathway and/or the MVA pathway, both leading to the formation of the 5-carbon precursors DMAPP and IPP that are condensed into diphosphate precursors with longer chains (e.g. GPP (C10), FPP (C15), GGPP (C20), GFPP (C25)). These precursors are then converted into a wide array of compounds by terpene synthases (TS) or appended to other non-isoprenoid skeletons by specific prenyltransferases (as in the case of cannabinoids or prenylated flavonoids). These pathways suffer from inherent limitations due to their length, complex regulation, and extensive cofactor requirements.

The isopentenol utilization pathway (IUP) can produce isopentenyl diphosphate or dimethylallyl diphosphate, the main precursors to isoprenoid synthesis, through sequential phosphorylation of the isopentenol isomers isoprenol or prenol (Chatzivasileiou et al., PNAS, 2019). This alternative pathway converts primary alcohols into the corresponding pyrophosphates. Conversion systems for the production of DMAPP- and IPP-derived compounds from prenol and isoprenol in bacteria have, however, only been demonstrated for obtaining canonical isoprenoids. Moreover, the enzymes used in these bacterial systems have not been demonstrated to be functional or efficient in yeast. One currently available system uses the combination of enzymes ScCKI (Cki1p) and AtIPK, which have been shown to convert prenol to isoprenoid precursors in E. coli. Lund et al., ACS Synth. Biol., 2019, disclose an artificial alcohol-dependent hemiterpene biosynthetic pathway coupled to several isoprenoid biosynthetic systems, affording lycopene and prenylated tryptophan in robust yields. These systems function very inefficiently (or not at all) when the key enzymes are expressed in yeast.

Another bioconversion system utilizes the enzymes SfPhoN and AtIPK. This system has also been shown to function in E. coli. However, the activity of this system is non-existent or very low when it is expressed in yeast (https://pubs.acs.org/doi/10.1021/acssynbio.8b00383).

Johnson et al., Angew Chem Int Ed Engl., 2020, present a use of a single non-canonical building block for isoprenoid synthesis in vitro; this system has not been shown to be functional in eukaryotic cells.

SUMMARY

Therefore, it is a first object of the invention to present a new system for the conversion of a primary alcohol to a mono- or pyrophosphate terpenoid precursor compound in a eukaryotic cell such as yeast. A further object of the invention is to present a new system for the two-step conversion of isoprenol, prenol and prenol-like alcohols to terpene precursor compounds in eukaryotic cells such as yeast. The activity of this new system is fulfilled by alcohol kinase enzymes such as Arabidopsis thaliana farnesol kinase (AtFKI) and optionally an isopentenyl phosphate kinase (AtIPK) or another prenyl phosphate kinase, its yield being considerably higher compared to the other systems described in the background section herein above.

Thereby, in a first aspect the invention relates to cells comprising nucleic acid sequences encoding kinases or parts thereof, e.g., polypeptides or polypeptide analogues with kinase activity as described herein. Specifically, the invention relates to a genetically engineered eukaryotic cell for production of an isoprenoid (or terpene or terpenoid) comprising a first nucleic acid sequence encoding a first kinase that converts a primary alcohol to a mono- or pyrophosphate isoprenoid precursor; and optionally a second nucleic acid sequence encoding a phosphokinase that converts a monophosphate precursor to an isoprenoid pyrophosphate precursor; wherein the first kinase comprises SEQ ID NO: 2 or a homolog or variant thereof having at least 75% identity thereto while exhibiting kinase activity. Hence, the present invention provides a genetically engineered eukaryotic cell for the production of a terpene or terpenoid comprising a first nucleic acid sequence encoding a first kinase that phosphorylates a primary alcohol to a mono- or pyrophosphate terpenoid precursor; and optionally a second nucleic acid sequence encoding a phosphokinase that phosphorylates a monophosphate precursor to a terpenoid pyrophosphate precursor wherein the first kinase comprises SEQ ID NO: 2 or a homolog or variant thereof having at least 75% identity thereto while exhibiting kinase activity.

Also provided herein are vectors comprising the above nucleic acids, as well as host cells comprising said vectors and/or said nucleic acids or polypeptides.

Also provided is the use of above polypeptides, nucleic acids, vectors or host cells for the production of mono- or pyrophosphate isoprenoid precursors, such as terpenoid precursors.

The nucleic acids may be comprised in a vector, e.g., a plasmid, cosmid, virus, or another vector used, e.g., conventionally in genetic engineering. The vector may comprise further sequences such as marker sequences, which allow for the selection of the vector in a suitable host cell and under suitable conditions. Furthermore, the vector may comprise expression control elements allowing proper expression of the coding regions in suitable hosts. Such control elements are known to the person skilled in the art and may include a promoter, a splice cassette, and a translation initiation codon, amongst others. As an alternative or in addition to using a vector, the nucleic acids may be integrated in a certain chromosomal locus in the employed cell, in combination with the expression control elements described above.

Preferably, the nucleic acid of the invention is operatively attached to expression control elements allowing expression in eukaryotic cells. Control elements ensuring expression in eukaryotic cells are well known to those skilled in the art and are further explained herein below.

Methods for construction of nucleic acid molecules, for construction of vectors comprising nucleic acid molecules, for introduction of vectors into appropriately chosen host cells, for insertion of DNA fragments into genomic loci of said cells, or for causing or achieving expression of nucleic acid molecules are well-known in the art. Further detail and exemplary methods are detailed herein below. The system advantageously enables the biocatalytic synthesis of isoprenoids in eukaryotic microorganisms by a method that by-passes the MEP pathway and/or the MVA pathway for the production of DMAPP and IPP.

In a first embodiment, said first nucleic acid sequence encodes a kinase that is capable of both alcohol phosphorylation and phosphate phosphorylation. Thereby, said nucleic acid sequence encodes a single kinase enzyme with bi-catalytic activity and capable of sequential phosphorylation of alcohol and monophosphate substrates.

In another embodiment, the first nucleic acid sequence encodes a kinase that is capable of phosphorylating a primary alcohol to a monophosphate terpenoid or isoprenoid precursor. Said kinase may comprise SEQ ID NO: 2 or a homolog or variant thereof having at least 75% identity thereto, such as at least 80% identity thereto, such as at least 85% identity thereto, such as at least 90% identity thereto, such as at least 95% identity thereto, such as at least 96% identity thereto, such as at least 97% identity thereto, such as at least 98% identity thereto, such as at least 99% identity thereto, while exhibiting kinase activity.
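The identity thresholds listed above (75% up to 99%) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length by some external tool, and it counts identical residues over the columns where neither sequence has a gap; this is only one of several conventions for the denominator and is not stated in the text. The function names and the toy sequences are hypothetical and are not taken from SEQ ID NO: 2.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: the inputs are pre-aligned and of equal length. Other
    conventions (e.g. dividing by the full alignment length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 75.0) -> bool:
    """True if the percent identity is at least the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example (hypothetical sequences): 6 of 8 aligned residues are identical.
reference = "MKTAYIAK"
variant = "MKTAYIGR"
is_within_claim_range = meets_identity_threshold(reference, variant, 75.0)
```

In practice, the alignment method and the exact definition of identity determine the value obtained, so the sketch should be read as an illustration of the threshold test only.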

In a preferred embodiment, said nucleic acid sequence encodes an alcohol kinase that is capable of phosphorylating a non-canonical, prenol-like primary alcohol to a non-canonical monophosphate isoprenoid precursor.

In another embodiment, the kinase may comprise SEQ ID NO: 1 or a homolog or variant thereof having at least 75% identity thereto, such as at least 80% identity thereto, such as at least 85% identity thereto, such as at least 90% identity thereto, such as at least 95% identity thereto, such as at least 96% identity thereto, such as at least 97% identity thereto, such as at least 98% identity thereto, such as at least 99% identity thereto, while exhibiting kinase activity.

In a further embodiment, the engineered cells can further include an exogenous nucleic acid sequence encoding a phosphate kinase, i.e., a phosphokinase, such as prenyl phosphate kinase (or isopentenyl phosphate kinase, IPK) that can phosphorylate a phosphate precursor, e.g., dimethylallyl phosphate (DMAP), to dimethylallyl pyrophosphate (DMAPP) or isopentenyl phosphate (IP) to isopentenyl pyrophosphate (IPP).
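The two-step conversion described above (alcohol to monophosphate by the first kinase, monophosphate to pyrophosphate by the phosphokinase) can be sketched as a pair of lookup tables. This is only a bookkeeping illustration for the canonical substrates prenol and isoprenol; the dictionary and function names are hypothetical, and no enzyme kinetics are modelled.

```python
from typing import List

# Step 1 (alcohol kinase, e.g. AtFKI): primary alcohol -> monophosphate.
ALCOHOL_KINASE_STEP = {
    "prenol": "DMAP",    # dimethylallyl phosphate
    "isoprenol": "IP",   # isopentenyl phosphate
}

# Step 2 (phosphokinase, e.g. AtIPK): monophosphate -> pyrophosphate.
PHOSPHOKINASE_STEP = {
    "DMAP": "DMAPP",     # dimethylallyl pyrophosphate
    "IP": "IPP",         # isopentenyl pyrophosphate
}


def two_step_conversion(alcohol: str) -> List[str]:
    """Return [alcohol, monophosphate, pyrophosphate] for a fed alcohol."""
    monophosphate = ALCOHOL_KINASE_STEP[alcohol]
    pyrophosphate = PHOSPHOKINASE_STEP[monophosphate]
    return [alcohol, monophosphate, pyrophosphate]


prenol_route = two_step_conversion("prenol")
isoprenol_route = two_step_conversion("isoprenol")
```

Non-canonical alcohols would add further entries to both tables in the same way.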

In a preferred embodiment, said phosphokinase comprises Arabidopsis thaliana IPK (AtIPK) SEQ ID NO: 3; or a homolog or variant thereof having at least 75% identity thereto, such as at least 80% identity thereto, such as at least 85% identity thereto, such as at least 90% identity thereto, such as at least 95% identity thereto, such as at least 96% identity thereto, such as at least 97% identity thereto, such as at least 98% identity thereto, such as at least 99% identity thereto, while exhibiting kinase activity.

In another embodiment, said phosphokinase comprises MtIPK, SEQ ID NO: 4; or a homolog or variant thereof having at least 75% identity thereto, such as at least 80% identity thereto, such as at least 85% identity thereto, such as at least 90% identity thereto, such as at least 95% identity thereto, such as at least 96% identity thereto, such as at least 97% identity thereto, such as at least 98% identity thereto, such as at least 99% identity thereto, while exhibiting kinase activity.

In another embodiment, said phosphokinase comprises TaIPK, SEQ ID NO: 5; or a homolog or variant thereof having at least 75% identity thereto, such as at least 80% identity thereto, such as at least 85% identity thereto, such as at least 90% identity thereto, such as at least 95% identity thereto, such as at least 96% identity thereto, such as at least 97% identity thereto, such as at least 98% identity thereto, such as at least 99% identity thereto, while exhibiting kinase activity.

In a preferred embodiment, said phosphokinase comprises TaIPK(204G), SEQ ID NO: 6.

In other embodiments, the phosphokinase comprises ScCK (ScCKI, Cki1p), ScMK, EcGK, EcHK, HvIPK, MjIPK, or TaIPK-3m.

In another embodiment, said phosphokinase is capable of phosphorylating a non-canonical monophosphate isoprenoid precursor, resulting in a non-canonical pyrophosphate isoprenoid precursor, such as 3-ethylpent-2-en-1-yl-diphosphate, 4-fluoro-3-methylbut-2-en-1-yl-diphosphate, 3-methyl-4-(methylthio)but-2-en-1-yl-diphosphate, 3-methylpent-2-en-1-yl-diphosphate, 3-methylhex-2-en-1-yl-diphosphate, or 3,4-dimethylpent-2-en-1-yl-diphosphate.

Contacting canonical or non-canonical monophosphate precursors with phosphokinases advantageously enables an alternative pathway for the production of canonical or non-canonical pyrophosphate isoprenoid precursors, which can be further converted to canonical and non-canonical isoprenoids by the action of, e.g., terpene synthases and / or prenyltransferases.

The combined action of one or more of these enzymes provides an isoprenoid biosynthetic pathway that allows de-coupling of isoprenoid biosynthesis from biomass production and enables channelling more substrate into product, thus providing a non-competitive system. Thereby, a 10- to 100-fold increase in production titers of canonical isoprenoid compounds can be achieved. Thus, it is a highly efficient method to avoid common bottlenecks in currently used methods for the production of isoprenoids in yeast and other eukaryotic cells.

In another embodiment, the kinase or kinases according to the invention are fused to one or more peptides or peptide analogues resulting in fusion proteins. Said fusion proteins may, depending on the functional characteristics of the said peptide or peptide analogue, advantageously confer additional functionality to the kinase or kinases of the invention. For example, such a peptide or peptide analogue may allow improved enzyme kinetics of the kinase domain or domains; an intracellular localisation peptide may increase the rate of localisation of the kinase or kinases according to the invention to sub-cellular organelles, such as chloroplasts, mitochondria or peroxisomes, via a chloroplastic, mitochondrial or peroxisomal targeting signal, respectively, thereby increasing the enzyme kinetics of the enzymes according to the invention. Likewise, stability-increasing and half-life-increasing peptides may contribute to a longer activity of the enzymes by reducing protein turnover, thus increasing the concentration of active enzymes and the total catalytic activity. Enzymatic promiscuity may be increased by the fusion of a kinase according to the invention to a peptide or peptide analogue comprising an additional domain, such as a kinase domain, such as a phosphokinase domain, and / or one or more peptide sequences improving enzyme kinetics. Moreover, fusion to specific domains or peptides can facilitate the correct folding of the kinase, or can improve the solubility of the kinase-containing polypeptide, resulting in higher overall intracellular activity.

In a preferred embodiment, the peptide or peptide analogue is maltose-binding protein, green fluorescent protein, thioredoxin, glutathione S-transferase, yeast farnesyl diphosphate synthase (Erg20p), ATP-synthase, CTP synthase, GTP synthase, UTP synthase, NusA, or small ubiquitin related modifier Smt3, or a fragment thereof.

In another embodiment, the peptide consists of naturally encoded amino acid residues, i.e., amino acids found in the genetic code.

In one embodiment, the primary alcohol is prenol, isoprenol or a prenol-like primary alcohol.

In another embodiment, the primary alcohol is an alcohol with the structure of formula 1, where formula 1 is:

wherein R₁ is hydrogen, an alkane-, an alkene-, an alkyne-, a benzene derivative-, a cyclic group, a branched group, a group containing a reactive nonmetal; a group containing a metalloid; a group containing a halogen, a group containing oxygen, a group containing nitrogen, a group containing sulphur, a group containing phosphorus and / or a group containing boron;

R₂ is hydrogen, an alkane-, an alkene-, an alkyne-, a benzene derivative-, a cyclic group, a branched group, a group containing a reactive nonmetal; a group containing a metalloid; a group containing a halogen, a group containing oxygen, a group containing nitrogen, a group containing sulphur, a group containing phosphorus and / or a group containing boron; and R₃ is hydrogen, methyl, fluorine, chlorine, bromine, iodine, sulphydryl, or hydroxyl.

In a preferred embodiment, the primary alcohol is 3-methylbut-2-en-1-ol, 4-fluoro-3-methylbut-2-en-1-ol, 3-methylpent-2-en-1-ol, 3,4-dimethylpent-2-en-1-ol, 3-ethylpent-2-en-1-ol, 3-methylhex-2-en-1-ol, 3-methylhexa-2,5-dien-1-ol, 3-methylbut-3-en-1-ol, 3-methylenepentan-1-ol, 2-methylprop-2-en-1-ol, 3-methyl-4-(methylthio)but-2-en-1-ol, or 5-chloro-3-methylpent-2-en-1-ol.

In one embodiment, the primary alcohol is a prenol-like primary alcohol that is not geraniol.

In another embodiment, the primary alcohol is a prenol-like primary alcohol that is not farnesol.

In another embodiment, the primary alcohol is a prenol-like primary alcohol that is not geranylgeraniol.

By using non-canonical, prenol-like alcohols, which comprise an incremental number of carbons (other than a multiple of 5 carbons) or different heteroatoms, referred to herein as non-canonical (or prenyl-like) alcohols, in combination with enzymes according to the invention, the production of novel isoprenoid building blocks with a carbon number other than five or with alternative structures, i.e. non-canonical building blocks, is advantageously enabled. Thus, the number of potential isoprenoids obtained is expanded, many of which could have improved properties over canonical isoprenoids or could provide new functions. Further, a 10- to 100-fold increase in production titers of canonical isoprenoid compounds can be achieved when compared to previously disclosed systems in yeast.
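The carbon arithmetic behind non-canonical building blocks can be illustrated with a short sketch. It assumes, for illustration only, that chain carbon counts simply add up when building blocks are condensed, and it judges "canonical" by carbon count alone (heteroatom-containing blocks, or combinations that happen to sum to a multiple of five, are not distinguished). The function names are hypothetical.

```python
from typing import Iterable


def chain_carbons(building_blocks: Iterable[int]) -> int:
    """Total carbon count of a chain assembled from the given building blocks."""
    return sum(building_blocks)


def has_canonical_carbon_count(n_carbons: int) -> bool:
    """True if the carbon count is a positive multiple of five."""
    return n_carbons > 0 and n_carbons % 5 == 0


# Three canonical C5 units give a C15 (sesquiterpene-sized) chain.
canonical_c15 = chain_carbons([5, 5, 5])

# Replacing one C5 unit by a C6 block (e.g. derived from 3-methylpent-2-en-1-ol)
# gives C16; a C7 block (e.g. derived from 3-ethylpent-2-en-1-ol) gives C17.
noncanonical_c16 = chain_carbons([6, 5, 5])
noncanonical_c17 = chain_carbons([7, 5, 5])
```

The C16 and C17 values correspond to the chain sizes referred to in the figure descriptions for non-canonical sesquiterpenes.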

In a further embodiment, the cell according to the invention comprises an exogenous nucleic acid sequence enabling expression or increased expression of an enzyme capable of catalysing the production of canonical and / or non-canonical terpenes, terpenoids, isoprenoids or structures containing isoprenoid groups.

In one embodiment, said nucleic acid sequence comprises an expression control sequence as described herein.

In another embodiment, said nucleic acid comprises a sequence coding for an enzyme. In one embodiment, such an enzyme is a terpene or terpenoid synthase (TS), such as a monoterpene synthase, a sesquiterpene synthase, a diterpene synthase, a sesterterpene synthase, or a triterpene synthase or a fragment thereof, which may convert the canonical or non-canonical pyrophosphate terpene precursors into any of a wide array of terpene and / or terpenoid compounds.

In another embodiment the exogenous nucleic acid sequence enabling increased expression of an enzyme capable of catalysing the production of canonical and / or non-canonical terpenes, terpenoids, isoprenoids or structures containing isoprenoid groups is a prenyl transferase, whereby the terpene compounds are produced by appending precursors of the invention to other non-terpenoid skeletons (as in the case of cannabinoids and prenylated flavonoids).

Examples of preferred terpene synthases and prenyl transferases include but are not limited to limonene and limonene synthase, myrcene and myrcene synthase, geraniol and geraniol synthase, linalool and linalool synthase, taxadiene and taxadiene synthase, amorphadiene and amorphadiene synthase, valencene and valencene synthase, santalol and santalol synthase.

Other enzymes and fragments of the above-described enzymes capable of catalysing the production of canonical and / or non-canonical isoprenoids (or terpenes, or terpenoids) or structures containing isoprenoid groups are also contemplated.

In a preferred embodiment, the terpene synthase or the prenyl transferase is capable of using non-canonical isoprenoid building blocks as substrate. Importantly, by using novel building blocks with carbon size different than five (or a multiple of five), or building blocks containing heteroatoms, as substrate, the number of potential isoprenoids produced by the system according to the invention is advantageously greatly expanded.

In another embodiment, the terpene synthase enzyme or isoprenoid synthase enzyme or other enzyme or a fragment thereof capable of catalysing the production of canonical isoprenoids (or terpenes, or terpenoids) or structures containing isoprenoid groups, comprises a change in the amino acid sequence that enables improved enzyme kinetics for utilisation of non-canonical isoprenoid building blocks. Such a change in the peptide sequence can, e.g., enhance the affinity of the enzyme for non-canonical substrates, or reduce the activation energy, which results in a greater reaction efficiency. Thus, the terpene and / or terpenoid yield is improved.

In a preferred embodiment, the host cell is a yeast cell. Any yeast species may be appropriate. In some embodiments, the genus of said yeast is selected from Saccharomyces, Pichia, Ogataea, Yarrowia, Kluyveromyces, Candida, Rhodotorula, Rhodosporidium, Cryptococcus, Schizosaccharomyces, Trichosporon and Lipomyces. In some preferred embodiments, the genus of said yeast is Saccharomyces, Pichia, Ogataea, Kluyveromyces or Yarrowia.

Preferred species include Saccharomyces spp., Saccharomyces cerevisiae, Schizosaccharomyces pombe, Scheffersomyces stipitis, Pichia pastoris, Ogataea polymorpha, Kluyveromyces marxianus, Kluyveromyces lactis, Yarrowia lipolytica, or Dekkera bruxellensis.

In another embodiment, the host cell of the invention is a filamentous fungus cell, such as a cell derived from the mycelia or germ bodies of a filamentous fungus. Preferred species include but are not limited to Aspergillus niger, Aspergillus oryzae, Aspergillus terreus, Neurospora crassa, or Trichoderma reesei.

In another embodiment, the host cell of the invention is an algal cell, such as a cell derived from a multicellular alga. In another embodiment, the algal cell is a microalga, such as a cell of the species Nannochloropsis gaditana, Nannochloropsis oceanica, Nannochloropsis salina, Chlamydomonas reinhardtii, Arthrospira, Chlorella vulgaris, Dunaliella salina, Haematococcus pluvialis, Phaeodactylum tricornutum, or Isochrysis galbana.

By the host cell of the invention being a eukaryotic cell, isoprenoid production titers and speed may be advantageously increased. Furthermore, eukaryotic systems comprising yeast incubators or microalgae photosynthetic production systems are adjusted for optimal industry-oriented production methods and are, in addition, readily up-scalable depending on demand. In addition, eukaryotic cells contain organelles and subcellular structures that may be beneficial for bioproduction, by facilitating metabolic channelling, avoiding metabolite crosstalk or inadvertent inhibition, and potentially improving the function and stability of enzymes by enabling membrane association or confinement in a cellular compartment/substructure.

A further advantage of the host cell of the invention being a eukaryotic cell is the ability to perform CYP-driven oxidations on the new structures in eukaryotic cells that have not been possible in bacterial cells. Indeed, eukaryotic cells provide, in general, a better environment to functionally express enzymes that belong to the group of cytochrome P450s (CYPs), particularly if these are originally found in another eukaryotic cell. This is due to the fact that said CYPs are membrane-bound enzymes and correct association with the membrane is essential for optimal activity. Eukaryotic cells, such as yeast, contain appropriate membrane structures (e.g. ER) for exogenous CYPs to function. Such membranes are lacking from bacteria and, therefore, functional expression of said CYPs in prokaryotic cells is far less optimal. Although there are kinases reported to phosphorylate prenol and isoprenol in bacterial cells, establishing a bacterial system that converts prenol and isoprenol into CYP-decorated isoprenoids would be challenging due to the limitation described above.

In another aspect, the invention relates to a method for the production of a terpene or an isoprenoid.

In a preferred embodiment, said method comprises the steps of providing an engineered eukaryotic cell comprising a DNA sequence coding for a primary alcohol kinase, and culturing said engineered cell in a medium containing a primary alcohol. In another embodiment, the cell provided further comprises an exogenous nucleic acid sequence coding for a phosphokinase.

In another embodiment, the primary alcohol is at an initial concentration within a range of 0.01% to 1% v/v, such as within a range of 0.05% to 0.6% v/v, such as within a range of 0.1% to 0.3% v/v.

In a preferred embodiment, the primary alcohol is at an initial concentration of 0.1% v/v.
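As a rough illustration of the concentration ranges above, the volume of alcohol to add for a given culture volume can be sketched as follows. The function and constant names are hypothetical, the 50 mL culture volume is an arbitrary example not taken from the text, and the small change in total volume upon addition is ignored.

```python
# Initial-concentration ranges named in the text, in % v/v (low, high).
DISCLOSED_RANGES_PERCENT_VV = {
    "broad": (0.01, 1.0),
    "narrower": (0.05, 0.6),
    "narrowest": (0.1, 0.3),
}


def alcohol_volume_ul(culture_volume_ml: float, percent_vv: float) -> float:
    """Microlitres of alcohol giving the requested % v/v in the culture.

    Approximation: the added alcohol volume is treated as negligible
    relative to the culture volume.
    """
    culture_volume_ul = culture_volume_ml * 1000.0  # 1 mL = 1000 uL
    return culture_volume_ul * percent_vv / 100.0


def in_range(percent_vv: float, range_name: str) -> bool:
    """True if the concentration lies within the named range (inclusive)."""
    low, high = DISCLOSED_RANGES_PERCENT_VV[range_name]
    return low <= percent_vv <= high


# Example: the preferred 0.1% v/v in a hypothetical 50 mL culture.
example_volume_ul = alcohol_volume_ul(50.0, 0.1)
```

For the example values, the result is approximately 50 µL of alcohol per 50 mL of culture.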

BRIEF DESCRIPTION OF THE DRAWINGS

In the following, example embodiments are described according to the invention:

Fig. 1 & Fig. 2: Identification and characterization of the efficient alcohol kinases. Ion count of the 93-ion from SPME-GCMS from headspace of yeast co-expressing alcohol kinase candidates and AtIPK in yeast EGY48 to identify an efficient conversion of prenol to isoprenoid building blocks.

Fig. 3: SPME-GC-MS: Comparison of linalool production with either AtFKI or Δ65AtFKI in presence of prenol.

Fig. 4: SPME-GC-MS: Comparison of AtFKI and Δ65AtFKI when co-expressed with the phosphokinase AtIPK in yeast in the presence of prenol. The production of linalool demonstrates that AtFKI performs well in combination with various phosphokinases.

Fig. 5a: Co-expression of the alcohol conversion pathway AtFKI-AtIPK and CIMyrS in yeast cells increases myrcene production (pink) compared with no alcohol feeding (black).

Fig. 5b: Co-expression of the alcohol conversion pathway AtFKI-AtIPK and CISabS in yeast cells increases sabinene production (pink) compared with no alcohol feeding (black).

Fig. 5c: Limonene titer when feeding different ratios of prenol:isoprenol in the strain containing the alcohol feeding pathway and limonene downstream building block.

Fig. 5d: Limonene yield when feeding different concentrations of alcohols in the strain containing the alcohol feeding pathway and limonene downstream building block.

Fig. 6: Comparison of production of CBGA with and without conversion of prenol and isoprenol.

Fig. 7: Alcohols found to be converted with AtFKI and AtIPK and utilized by CILimS.

Fig. 8: SPME-GCMS analysis of non-canonical terpenes produced with AtFKI, AtIPK and CILimS. Based on the alcohols that have been converted and the mass spectra of the novel compounds, suggested structures are shown for the peaks.

Fig. 9: Novel isoprenoid compounds.

Fig. 10: Suggested core alcohol structure.

Fig. 11: Comparison of total production of non-canonical limonene variants using AtFKI-AtIPK alcohol conversion combined with the limonene synthase CILimS, or the CILimS mutants L505G, F483A and I284G, and the alcohols 3,4-dimethylpent-2-en-1-ol (3,4-DMP) and 3-ethylpent-2-en-1-ol (3E2E).

Fig. 12: Synthesis of unnatural cannabigerolic acid analogues by using non-canonical prenyl diphosphate precursors. Noncanonical CBGA, no alcohols. Yeast cells expressing the Erg20p(127W)-CsPT4 alone, AtFKI-AtIPK alone. a: 3-methylpent-2-en-1-ol. b: 3,4-dimethylpent-2-en-1-ol. c: 3-methylhex-2-en-1-ol. d: 3-ethylpent-2-en-1-ol.

Fig. 13: shows the production of C16 and C17 sesquiterpenes using the alcohol conversion pathway and 3M2E.

Fig. 14: evaluates the efficiency of different prenyltransferases in supporting the production of C16 and C17 sesquiterpenes when feeding 3M2E.

Fig. 15: evaluates the efficiency of different prenyltransferases in supporting the production of C16 and C17 sesquiterpenes when feeding 3,4-DMP.

Fig. 16: shows the efficiency of different prenyltransferases in supporting the production of C16 and C17 sesquiterpenes when feeding 3-MP.

Fig. 17: evaluates the efficiency of different prenyltransferases in supporting the production of C17 sesquiterpenes when feeding 3E2E.

Fig. 18: shows the production of a C16 sesquiterpenoid by Salvia fruticosa caryophyllene synthase (Sf126) and 3-MP.

Fig. 19: shows the production of non-canonical squalene using 3MP.

Fig. 20: shows the production of a non-canonical triterpenoid by cucurbitadienol synthase and 3MP.

Fig. 21: shows the production of a non-canonical triterpenoid by BmeTC(373C) and 3MP.

DETAILED DESCRIPTION

In the following the invention is described in detail through embodiments hereof that should not be thought of as limiting to the scope of the invention.

It is known that the isopentenol utilization pathway (IUP) can produce isopentenyl diphosphate or dimethylallyl diphosphate, the main precursors to isoprenoid synthesis, through sequential phosphorylation of the isopentenol isomers isoprenol or prenol. This non-canonical, alternative pathway converts primary alcohols into the corresponding pyrophosphates. Conversion systems for the production of DMAPP- and IPP-derived compounds from prenol and isoprenol in bacteria have, however, only been demonstrated for obtaining canonical isoprenoids. Moreover, the enzymes used are not efficient in yeast.

The inventors have surprisingly found that the ectopic expression of Arabidopsis thaliana farnesol kinase (AtFKI) and isopentenyl phosphate kinase leads to a high yield of canonical and non-canonical pyrophosphorylated isoprenoid precursors in yeast.

Definitions

Isoprenoids are a naturally occurring group of chemical molecules displaying a wide structural diversity of carbon skeletons made up from basic isoprene units (typically C₅) and including compounds otherwise named as terpenes, terpenoids, or isoprenoids. Herein, the terms "terpene", "terpenoid", and "isoprenoid" are used interchangeably. According to the number of C₅ units, terpenes are classified as: hemiterpenes, C₅; monoterpenes, C₁₀; sesquiterpenes, C₁₅; diterpenes, C₂₀; sesterterpenes, C₂₅; triterpenes, C₃₀; and tetraterpenes, C₄₀.

Terpenes consist of compounds with the formula (C₅H₈)ₙ. They are further classified by the number of carbons: hemiterpenes (C5), monoterpenes (C10), sesquiterpenes (C15), diterpenes (C20), etc. A well-known monoterpene is alpha-pinene, a major component of turpentine.

Terpenoids are modified terpenes, wherein methyl groups have been moved or removed, or oxygen atoms added, or contain other decorations or modifications. The term "terpene" may also be used more broadly, to include the terpenoids. Just like terpenes, the terpenoids can be classified according to the number of isoprene units that comprise the parent terpene.

Hemiterpenes consist of a single isoprene unit. Isoprene itself is the only hemiterpene, but oxygen-containing derivatives such as prenol and isovaleric acid are hemiterpenoids.

Monoterpenes (or monoterpenoids) are molecules comprising a 10-carbon isoprenoid structure. Monoterpenoids may, in addition to the 10-carbon isoprenoid structure, also comprise moieties not having isoprenoid structure. Frequently, the biosynthesis of monoterpenoids involves several additional steps following the initial conversion of GPP to the basic monoterpene skeleton. These additional steps may be oxidations (e.g. catalysed by a cytochrome P450 enzyme), reductions, isomerizations, acetylations, methylations, etc.

Examples of sesquiterpenes and sesquiterpenoids include humulene, amorphadiene, farnesenes, farnesol, valencene, etc. (The sesqui- prefix means one and a half).

Diterpenes are composed of four isoprene units and have the molecular formula C20H32. They derive from geranylgeranyl pyrophosphate. Examples of diterpenes and diterpenoids are casbene, abietadiene, miltiradiene, ginkgolides, cafestol, kahweol, cembrene and taxadiene (precursor of taxol). Diterpenes also form the basis for compounds such as retinol, retinal, and phytol.

Sesterterpenes, terpenes having 25 carbons and five isoprene units, are rare relative to the other sizes (The sester- prefix means two and a half). An example of a sesterterpenoid is geranylfarnesol.

Triterpenes consist of six isoprene units and have the molecular formula C₃₀H₄₈. The linear triterpene squalene, the major constituent of shark liver oil, is derived from the reductive coupling of two molecules of farnesyl pyrophosphate. Squalene is then processed biosynthetically to generate either lanosterol or cycloartenol, the structural precursors to the steroids.

Sesquarterpenes are composed of seven isoprene units and have the molecular formula C₃₅H₅₆. Sesquarterpenes are typically microbial in their origin. Examples of sesquarterpenoids are ferrugicadiol and tetraprenylcurcumene.

Tetraterpenes contain eight isoprene units and have the molecular formula C₄₀H₆₄. These include the acyclic lycopene, the monocyclic gamma-carotene, and the bicyclic alpha- and beta-carotenes.

Polyterpenes consist of long chains of many isoprene units. Natural rubber consists of polyisoprene in which the double bonds are cis. Some plants produce a polyisoprene with trans double bonds, known as gutta-percha.
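The size classes defined above can be summarised in a short lookup. The following Python sketch is illustrative only: the function names and the table are hypothetical helpers, and treating every skeleton with more than eight C₅ units as a polyterpene is a simplifying assumption rather than a definition taken from this description.

```python
# Illustrative sketch: maps the number of C5 (isoprene) units to the class
# names used in the definitions above. All names here are hypothetical.
TERPENE_CLASSES = {
    1: "hemiterpene",
    2: "monoterpene",
    3: "sesquiterpene",
    4: "diterpene",
    5: "sesterterpene",
    6: "triterpene",
    7: "sesquarterpene",
    8: "tetraterpene",
}


def classify_terpene(carbons: int) -> str:
    """Return the size class for a skeleton built from whole C5 units."""
    if carbons <= 0 or carbons % 5 != 0:
        # Canonical skeletons have a carbon count that is a multiple of 5.
        return "not a canonical C5-multiple skeleton"
    units = carbons // 5
    # Assumption: anything longer than a tetraterpene is reported as polyterpene.
    return TERPENE_CLASSES.get(units, "polyterpene")


def terpene_formula(units: int) -> str:
    """Molecular formula (C5H8)n of the parent hydrocarbon with n units."""
    if units <= 0:
        raise ValueError("units must be a positive integer")
    return f"C{5 * units}H{8 * units}"
```

For instance, `classify_terpene(15)` gives the sesquiterpene class, and `terpene_formula(4)` reproduces the diterpene formula C20H32 stated above. A C13 norisoprenoid falls outside the multiples of five and is reported as such.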

Norisoprenoids, such as the C₁₃-norisoprenoid 3-oxo-α-ionol and 7,8-dihydroionone derivatives, such as megastigmane-3,9-diol and 3-oxo-7,8-dihydro-α-ionol, can be produced by fungal peroxidases or glycosidases. Many norisoprenoids are the product of oxidative cleavage of a larger isoprenoid molecule by specific enzymes.

Iridoids are a group of compounds found in plants and some animals, which are biosynthetically derived from 8-oxogeraniol.

Monoterpene indole alkaloids, as used herein, refer to a large and diverse group of plant chemical compounds derived from a unit of tryptamine and a 10-carbon or 9-carbon unit of terpenoid origin that is, in turn, derived from 8-oxo-geraniol.

Higher terpenes, as used herein, are intended to mean molecules comprising more than 10 carbon atoms of isoprenoid structure. Examples include sesquiterpenes, diterpenes and triterpenes. Higher terpenes may include moieties not having the isoprenoid structure in addition to the terpene structure.

Cannabinoids, as used herein, refers to a group of compounds, members of which were initially isolated from the plant Cannabis sativa. Many cannabinoids are biosynthesized by the addition of GPP to olivetolic acid.

Meroterpenoids, as used herein, refer to compounds that contain an isoprenoid moiety as part of a larger compound. Such compounds are, for example, the group of cannabinoids, the group of monoterpene indole alkaloids, or other prenylated aromatic compounds.

Canonical terpenes, as used herein, refer to terpenes synthesized using the canonical terpene precursors IPP, DMAPP, GPP, FPP, GGPP, etc, or their “cis-” counterparts, and which have a number of carbon atoms that is a multiple of 5, as their biosynthesis is based on 5-carbon precursors.

DMAPP and IPP: Dimethylallyl pyrophosphate (or dimethylallyl diphosphate; DMAPP) and isopentenyl pyrophosphate (or isopentenyl diphosphate; IPP) are 5-carbon precursors of isoprenoids.

GPP: Geranyl diphosphate (or geranyl pyrophosphate; GDP). GPP is formed by condensation of one DMAPP and one IPP molecule. GPP is a branch point molecule in isoprenoid synthesis, and it can, by addition of an IPP molecule, be converted into FPP, and thereby be directed into the biosynthesis of sesqui-, di- or tri-terpenes or sterol synthesis, or it can, by the action of a monoterpene synthase, be directed into the synthesis of monoterpenoids, iridoids, and monoterpene indole alkaloids. Other prenyltransferases can also direct GPP towards the production of cannabinoids, prenylated aromatic compounds, or meroterpenoids in general.

FPP: Farnesyl pyrophosphate (or farnesyl diphosphate; FDP) is formed by condensing GPP with an IPP molecule. FPP is the precursor for the synthesis of sesquiterpenes, diterpenes, triterpenes and sterols.

GGPP: Geranylgeranyl pyrophosphate (or geranylgeranyl diphosphate; GGDP). GGPP is formed by condensing an FPP with an IPP molecule. GGPP is a precursor for the synthesis of diterpenes.

GFPP: Geranylfarnesyl pyrophosphate (or geranylfarnesyl diphosphate; GFDP). GFPP is formed by condensing a GGPP with an IPP molecule. GFPP is a precursor for the synthesis of sesterterpenes.
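The stepwise elongation described in the definitions above (each condensation with one IPP molecule adds one C₅ unit) can be sketched as a simple series. This is a minimal illustration, not a model of enzyme behaviour; the list and function names are hypothetical.

```python
# Illustrative sketch of the canonical elongation series described above:
# each step condenses the current prenyl diphosphate with one IPP molecule.
# The names ELONGATION_SERIES, elongate and carbon_count are hypothetical.
ELONGATION_SERIES = ["DMAPP", "GPP", "FPP", "GGPP", "GFPP"]


def elongate(precursor: str) -> str:
    """Return the product of condensing `precursor` with one IPP molecule."""
    idx = ELONGATION_SERIES.index(precursor)  # raises ValueError if unknown
    if idx == len(ELONGATION_SERIES) - 1:
        raise ValueError(f"no longer precursor is listed after {precursor}")
    return ELONGATION_SERIES[idx + 1]


def carbon_count(precursor: str) -> int:
    """Carbon count of the prenyl chain: five carbons per C5 unit."""
    return 5 * (ELONGATION_SERIES.index(precursor) + 1)
```

Calling `elongate("GPP")` returns `"FPP"`, and `carbon_count("GFPP")` returns 25, matching the C25 sesterterpene precursors.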

Structural analogues: Also referred to as chemical analogues (or analogs), or simply analogues, these are compounds that possess structural similarity to a specific compound but differ from said compound in one or more respects.

Analogues can be, but are not limited to, compounds with one or more atoms added or substituted, with functional groups added, removed or substituted, and with substructures changed, isomerized, or modified.

Non-canonical isoprenoids (terpenes or terpenoids) are chemical analogues of canonical isoprenoids produced by removal of diphosphate groups from non-canonical isoprenoid building blocks. When the diphosphate group is removed, several different reactions can occur, including cyclization of the molecule, rearrangement or formation of bonds (including double and triple bonds), and reaction with water or oxygen to form functional groups.

Non-canonical meroterpenoids are analogues of canonical meroterpenoids (i.e. meroterpenoids with a canonical isoprenoid moiety).

Non-canonical cannabinoids are analogues of the compounds belonging to the cannabinoid group of compounds. These analogues are defined by the utilization of a non-canonical isoprenoid building block instead of GPP.

Non-canonical monoterpenes, sesquiterpenes, diterpenes, sesterterpenes, triterpenes, tetraterpenes, polyterpenes, sterols: Chemical analogues of monoterpenes, sesquiterpenes, diterpenes, sesterterpenes, triterpenes, tetraterpenes, polyterpenes, or sterols, or molecules with structural resemblance to them or produced by similar means. They can be produced either by direct conversion of a non-canonical isoprenoid building block, or by condensation of a non-canonical building block with another canonical or non-canonical building block. While not limited to this strict definition, they will usually contain between 7 and 100 carbons and/or other non-hydrogen atoms.

Diphosphate(s), also referred to as pyrophosphate(s), is any molecule with a diphosphate group. In this work, the term often refers to, but is not limited to, organic molecules with a diphosphate group and their analogues.

Prenol-like primary alcohol: is used here to describe an alcohol with a structure that is an analog of the alcohols prenol or isoprenol.

MEP pathway: The methylerythritol 4-phosphate (MEP) pathway forming IPP and DMAPP. The pathway is found e.g. in most bacteria, in algae and in the plastids of higher plants.

MVA pathway: The mevalonate pathway (MVA pathway) is an essential metabolic pathway present in eukaryotes and in some bacteria forming IPP and DMAPP starting from acetyl-CoA.

Alternative MVA pathway: The alternative MVA pathway is found in archaea and provides IPP and DMAPP, starting from acetyl-CoA but utilizing isopentenyl phosphate as intermediate.

Terpene synthases. The term includes any enzyme that is able to catalyse the rearrangement of DMAPP, IPP, GPP, FPP, GGPP, GFPP or other prenyl pyrophosphates or their non-canonical analogues. Terpene synthases typically synthesize multiple products, but the diversity of products varies among terpene synthases. Some terpene synthases have high product specificity, catalysing the synthesis of a limited number of products, and other terpene synthases have low product specificity, catalysing the synthesis of a large variety of different terpenes. Depending on the canonical substrate they primarily accept, terpene synthases are frequently classified as hemiterpene synthases (if they accept DMAPP or IPP), monoterpene synthases (if they accept GPP), sesquiterpene synthases (if they accept FPP), diterpene synthases (if they accept GGPP), sesterterpene synthases (if they accept GFPP), or triterpene synthases (if they accept oxidosqualene or squalene).

Prenyltransferases are enzymes that append a prenyl moiety to isoprenoid or non-isoprenoid skeletons. Many prenyltransferases that append a prenyl moiety to other isoprenoid chains are involved in the synthesis of the prenyl diphosphate precursors, such as GPP (GPP synthases), FPP (FPP synthases), GGPP (GGPP synthases) or geranylfarnesyl diphosphate (GFPP synthases). These enzymes typically add IPP units to extend DMAPP to larger prenyl diphosphates in the trans- configuration. For this reason, they are also called trans-polyprenyl synthases or trans-polyprenyltransferases. Several prenyltransferase enzymes exist that catalyse the cis- condensation and elongation of DMAPP with IPP. These enzymes, termed cis-prenyltransferases, cis-polyprenyl diphosphate synthases, or cis-polyprenyltransferases, are responsible for the synthesis of neryl diphosphate, cis,cis-farnesyl diphosphate, and nerylneryl diphosphate.

Furthermore, certain prenyltransferases have been reported to condense two DMAPP molecules to lavandulyl diphosphate or chrysanthemyl diphosphate.

Prenyltransferases that append a prenyl moiety to non-isoprenoid scaffolds add DMAPP, GPP, FPP or GGPP to non-isoprenoid compounds, including flavonoids, amino acid residues and peptides, aromatic compounds, and other chemical compounds in general. Such prenyltransferase enzymes are involved in the biosynthesis of many different natural products including, but not limited to, cannabinoids, prenylated flavonoids, or other meroterpenoids. In the case of cannabinoid synthesis, this enzyme is a geranyldiphosphate:olivetolate geranyltransferase.

The prenyltransferase may be part of separate polypeptides or fused into one polypeptide chain. The prenyltransferase may also be fused to another prenyltransferase (e.g. Erg20p; an FPP synthase), a terpene synthase, or another non-terpene synthesizing protein. The prenyltransferase may also be fused to an enzyme that naturally localizes to the peroxisome matrix or its membrane in yeasts or in another organism, or to a polypeptide chain that is itself fused to a peroxisomal targeting signal.

An aromatic prenyltransferase is selected among any enzyme with prenyltransferase activity, identified from any organism or engineered, that is able to transfer an isoprenoid moiety to another isoprenoid or non-isoprenoid compound.

Prenyl diphosphate synthase, as used herein, refers to any polypeptide with prenyl diphosphate synthesizing capacity that utilizes prenyl pyrophosphate compounds as substrate(s). In most cases, a prenyl diphosphate synthase is a prenyltransferase (see above).

The term “pyrophosphate” is used interchangeably herein with "diphosphate". Pyrophosphates are in this document an umbrella term for organic molecules that contain a pyrophosphate group.

Also, herein, the term “prenyl diphosphate” is used interchangeably with “prenyl pyrophosphate”, “isoprenyl diphosphate”, or “isoprenyl pyrophosphate”, and includes monoprenyl diphosphates containing a single prenyl or isoprenyl group (such as DMAPP or IPP), and polyprenyl diphosphates with two or more prenyl/isoprenyl groups (such as GPP, FPP, GGPP, etc.). Non-canonical pyrophosphates, as used herein, refers to structural analogues of compounds containing a single prenyl or isoprenyl group (such as DMAPP or IPP), as well as structural analogues of polyprenyl diphosphates with two or more prenyl/isoprenyl groups (such as GPP, FPP, GGPP, etc.).

Canonical isoprenoid building blocks, as used herein, refer to prenyl pyrophosphate compounds with a carbon number that is a multiple of 5, which serve as substrates in the biosynthesis of either larger prenyl pyrophosphate compounds or of canonical terpenes (terpenoids and/or isoprenoids) and meroterpenoids. While not limited to these processes, such building blocks can usually be utilized by prenyltransferases and functional analogues thereof, or by enzymes capable of removing the diphosphate group to catalyse a reaction, or be modified or folded by oxidoreductases. Squalene, dehydrosqualene, oxidosqualene, and phytoene are also considered herein as canonical isoprenoid building blocks, despite the fact that they do not carry a diphosphate group.

Non-canonical isoprenoid building blocks, as used herein, refers to pyrophosphate group-containing organic molecules that are structural analogues of the canonical isoprenoid building blocks and can serve as substrates in the biosynthesis of either larger pyrophosphate-containing compounds by the action of prenyltransferases or prenyl diphosphate synthase enzymes (also described as condensation and elongation in this work), or in the biosynthesis of non-canonical isoprenoids (terpenes or isoprenoids) by the action of terpene synthases, or of non-canonical meroterpenoids by the action of corresponding prenyltransferases. Without being limited to these processes, non-canonical isoprenoid building blocks can also be utilized by enzymes capable of removing the diphosphate group to catalyse a reaction, or be modified or folded by oxidoreductases, among other reactions.

The term “genetically engineered” as used herein refers to the genetic alteration of a cell resulting from the direct uptake of exogenous genetic material from its surroundings through the cell membrane(s), or by other means, such as viral transduction, whether or not said exogenous genetic material is incorporated into the cell’s genome, thus possibly leading to either stable or transient expression.

Cells comprising exogenous nucleic acids

In one embodiment, the host cell of the invention comprises at least a first nucleic acid comprising or consisting of a variant of SEQ ID NO: 15 encoding a first kinase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 15 and wherein said polypeptide is capable of phosphorylation in said host cell. In another embodiment the host cell of the invention comprises at least a first nucleic acid comprising or consisting of a variant of SEQ ID NO: 15.

In another embodiment, the host cell of the invention comprises at least a first nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 2 (corresponding to a first kinase polypeptide), wherein said variant polypeptide has at least 75%, but less than 100% sequence identity to SEQ ID NO: 2 and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment the host cell of the invention comprises a polypeptide comprising or consisting of a variant of SEQ ID NO: 2.

In another embodiment, the host cell of the invention comprises a polypeptide with between 1 and 5 amino acid substitutions as compared to SEQ ID NO: 2.

In one embodiment, said polypeptide has between 1 and 3 amino acid substitutions as compared to SEQ ID NO: 2.
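The variant criteria recited above (at least 75% but less than 100% sequence identity, or a small number of amino acid substitutions) can be illustrated with a minimal Python sketch. It assumes the two sequences are already aligned and of equal length, so that only substitutions are counted; this section does not specify an alignment method, and the toy sequences below are hypothetical and are not SEQ ID NO: 2.

```python
# Minimal sketch under stated assumptions: sequences are pre-aligned and of
# equal length (substitutions only, no gaps). All names are hypothetical.

def count_substitutions(reference: str, variant: str) -> int:
    """Number of positions at which the two aligned sequences differ."""
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(reference, variant))


def percent_identity(reference: str, variant: str) -> float:
    """Percentage of identical positions over the aligned length."""
    mismatches = count_substitutions(reference, variant)
    return 100.0 * (len(reference) - mismatches) / len(reference)


def is_variant(reference: str, variant: str, min_identity: float = 75.0) -> bool:
    """True if identity is at least `min_identity` but less than 100%."""
    identity = percent_identity(reference, variant)
    return min_identity <= identity < 100.0


# Toy 10-residue sequences for illustration only.
toy_reference = "MKTAYIAKQR"
toy_variant = "MKTAYIAKQK"  # one substitution at the last position
```

With these toy sequences, `count_substitutions` reports one substitution and `is_variant` returns `True`; an identical sequence returns `False` because its identity is not less than 100%.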

In another embodiment, the host cell of the invention comprises at least a first nucleic acid comprising or consisting of a variant of SEQ ID NO: 14 encoding a first kinase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 14 and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment the host cell of the invention comprises at least a first nucleic acid comprising or consisting of a variant of SEQ ID NO: 14.

In another embodiment, the host cell of the invention comprises at least a first nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 1 (corresponding to a first kinase polypeptide), wherein said variant polypeptide has at least 75%, but less than 100% sequence identity to SEQ ID NO: 1 and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment the host cell of the invention comprises a polypeptide comprising or consisting of a variant of SEQ ID NO: 1.

In another embodiment, the host cell of the invention comprises a polypeptide with between 1 and 5 amino acid substitutions as compared to SEQ ID NO: 1.

In one embodiment, said polypeptide has between 1 and 3 amino acid substitutions as compared to SEQ ID NO: 1.

In another embodiment, the host cell of the invention comprises a second nucleic acid comprising or consisting of a variant of SEQ ID NO: 17 encoding a second kinase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 17 and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment the host cell of the invention comprises a second nucleic acid comprising or consisting of a variant of SEQ ID NO: 17.

In another embodiment, the host cell of the invention comprises a second nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 3 (i.e., encoding a second kinase polypeptide), wherein said variant polypeptide has at least 75%, but less than 100% sequence identity to SEQ ID NO: 3 and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment the host cell of the invention comprises a polypeptide comprising or consisting of a variant of SEQ ID NO: 3.

In another embodiment, the host cell of the invention comprises a polypeptide with between 1 and 5 amino acid substitutions as compared to SEQ ID NO: 3.

In one embodiment, said polypeptide has between 1 and 3 amino acid substitutions as compared to SEQ ID NO: 3.

In another embodiment, the host cell of the invention comprises a second nucleic acid comprising or consisting of a variant of SEQ ID NO: 18 encoding a second kinase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 18 and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment the host cell of the invention comprises a second nucleic acid comprising or consisting of a variant of SEQ ID NO: 18.

In another embodiment, the host cell of the invention comprises a second nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 4 (i.e. encoding a second kinase polypeptide), wherein said variant polypeptide has at least 75%, but less than 100% sequence identity to SEQ ID NO: 4 and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment the host cell of the invention comprises a polypeptide comprising or consisting of a variant of SEQ ID NO: 4.

In another embodiment, the host cell of the invention comprises a polypeptide with between 1 and 5 amino acid substitutions as compared to SEQ ID NO: 4.

In one embodiment, said polypeptide has between 1 and 3 amino acid substitutions as compared to SEQ ID NO: 4.

In another embodiment, the host cell of the invention comprises a second nucleic acid comprising or consisting of a variant of SEQ ID NO: 19 encoding a second kinase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 19 and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment the host cell of the invention comprises a second nucleic acid comprising or consisting of a variant of SEQ ID NO: 19.

In another embodiment, the host cell of the invention comprises a second nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 5 (i.e. encoding a second kinase polypeptide), wherein said variant polypeptide has at least 75%, but less than 100% sequence identity to SEQ ID NO: 5 and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment the host cell of the invention comprises a polypeptide comprising or consisting of a variant of SEQ ID NO: 5.

In another embodiment, the host cell of the invention comprises a polypeptide with between 1 and 5 amino acid substitutions as compared to SEQ ID NO: 5.

In one embodiment, said polypeptide has between 1 and 3 amino acid substitutions as compared to SEQ ID NO: 5.

In another embodiment, the host cell of the invention comprises a second nucleic acid comprising or consisting of a variant of SEQ ID NO: 19 encoding a second kinase polypeptide wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 19 and comprising the amino acid substitution 204G and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment the host cell of the invention comprises a second nucleic acid comprising or consisting of a variant of SEQ ID NO: 19 and comprising the amino acid substitution 204G.

In another embodiment, the host cell of the invention comprises a second nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 6 (i.e., encoding a second kinase polypeptide), wherein said variant polypeptide has at least 75%, but less than 100% sequence identity to SEQ ID NO: 6 and comprising the amino acid substitution 204G and wherein said polypeptide is capable of phosphorylation in said host cell.

In another embodiment, the host cell of the invention comprises a polypeptide according to SEQ ID NO: 6.

In another embodiment, the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 20 encoding a prenyl diphosphate synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 20.

In another embodiment the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 20.

In another embodiment, the host cell of the invention comprises a nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 7 encoding a prenyl diphosphate synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 7.

In another embodiment, the host cell of the invention comprises a polypeptide according to SEQ ID NO: 7.

In another embodiment, the host cell of the invention comprises a nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 8 encoding a prenyl diphosphate synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 8 and comprises amino acid change N127W.

In another embodiment, the host cell of the invention comprises a polypeptide comprising SEQ ID NO: 8.

In another embodiment, the host cell of the invention comprises a polypeptide comprising SEQ ID NO: 8 and comprises amino acid change N127W.

In another embodiment, the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 21 encoding a (+)-limonene synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 21.

In another embodiment the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 21.

In another embodiment, the host cell of the invention comprises a nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 9 encoding a (+)-limonene synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 9.

In another embodiment, the host cell of the invention comprises a polypeptide comprising SEQ ID NO: 9.

In another embodiment, the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 22 encoding a Beta-myrcene synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 22.

In another embodiment the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 22.

In another embodiment, the host cell of the invention comprises a nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 10 encoding a Beta-myrcene synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 10.

In another embodiment, the host cell of the invention comprises a polypeptide comprising SEQ ID NO: 10.

In another embodiment, the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 23 encoding a Sabinene synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 23.

In another embodiment, the host cell of the invention comprises a nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 11 encoding a Sabinene synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 11.

In another embodiment, the host cell of the invention comprises a polypeptide comprising SEQ ID NO: 11.

In another embodiment, the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 24 encoding an α-pinene synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 24.

In another embodiment the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 22.

In another embodiment, the host cell of the invention comprises a nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 12 encoding an α-pinene synthase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 12.

In another embodiment, the host cell of the invention comprises a polypeptide comprising SEQ ID NO: 12.

In another embodiment, the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 25 encoding a geranyldiphosphate : olivetolate geranyltransferase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 25.

In another embodiment the host cell of the invention comprises a nucleic acid comprising or consisting of a variant of SEQ ID NO: 25.

In another embodiment, the host cell of the invention comprises a nucleic acid encoding for a polypeptide comprising or consisting of a variant of SEQ ID NO: 13 encoding a geranyldiphosphate : olivetolate geranyltransferase polypeptide, wherein said variant has at least 75%, but less than 100% sequence identity to SEQ ID NO: 13.

In another embodiment, the host cell of the invention comprises a polypeptide comprising SEQ ID NO: 13.

In another embodiment, the host cell of the invention comprises a polypeptide consisting of naturally encoded amino acid residues, i.e., amino acids found in the genetic code.

Although all combinations of the first kinase (SEQ ID NO: 2) that phosphorylates a primary alcohol to a mono- or pyrophosphate terpenoid precursor and a phosphokinase that phosphorylates a monophosphate precursor to a terpenoid pyrophosphate precursor herein may be useful for providing a genetically engineered eukaryotic cell, such as a yeast cell, specific combinations of the first kinase and phosphokinase may be of particular interest in the context of the present invention.

In some embodiments, the first kinase and the phosphokinase are: i) AtFKI and AtIPK; ii) AtFKI and TaIPK; iii) AtFKI and TaIPK(204G); or functional variants thereof having at least 70% homology thereto, such as at least 71%, such as at least 72%, such as at least 73%, such as at least 74%, such as at least 75%, such as at least 76%, such as at least 77%, such as at least 78%, such as at least 79%, such as at least 80%, such as at least 81%, such as at least 82%, such as at least 83%, such as at least 84%, such as at least 85%, such as at least 86%, such as at least 87%, such as at least 88%, such as at least 89%, such as at least 90%, such as at least 91%, such as at least 92%, such as at least 93%, such as at least 94%, such as at least 95%, such as at least 96%, such as at least 97%, such as at least 98%, such as at least 99% homology thereto.

Titer

The combined action of one or more of these enzymes provides an isoprenoid biosynthetic pathway that allows de-coupling of isoprenoid biosynthesis from biomass production and enables channelling more substrate into product, thus providing a non-competitive system. Thereby, a 10- to 100-fold increase in production titers of canonical isoprenoid compounds can be achieved. Thus, it is a highly efficient method to avoid common bottlenecks in currently used methods for the production of isoprenoids in yeast and other eukaryotic cells.

The present methods allow production of isoprenoids with a total titer of at least 10 mg/L, such as at least 30 mg/L, such as at least 100 mg/L, such as at least 300 mg/L, such as at least 1 g/L, such as at least 3 g/L, such as at least 10 g/L, such as at least 30 g/L or more, wherein the total titer is the sum of the intracellular isoprenoid titer and the extracellular isoprenoid titer. Indeed, the produced isoprenoids may be secreted from the cell (extracellular isoprenoids) or they may be retained in the cell (intracellular isoprenoids).
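As a worked illustration of the total titer defined above (intracellular titer plus extracellular titer), the following Python sketch sums the two fractions and reports the highest recited threshold that is met. The function names and the example values are hypothetical and do not represent measured results.

```python
from typing import Optional

# Thresholds recited above, expressed in mg/L (1 g/L = 1000 mg/L).
THRESHOLDS_MG_PER_L = [10, 30, 100, 300, 1_000, 3_000, 10_000, 30_000]


def total_titer(intracellular_mg_per_l: float, extracellular_mg_per_l: float) -> float:
    """Total titer as the sum of intracellular and extracellular titers."""
    if intracellular_mg_per_l < 0 or extracellular_mg_per_l < 0:
        raise ValueError("titers cannot be negative")
    return intracellular_mg_per_l + extracellular_mg_per_l


def highest_threshold_met(total_mg_per_l: float) -> Optional[int]:
    """Largest recited threshold (mg/L) not exceeding the total, else None."""
    met = [t for t in THRESHOLDS_MG_PER_L if total_mg_per_l >= t]
    return max(met) if met else None


# Hypothetical example values, for illustration only.
example_total = total_titer(40.0, 75.0)
```

Here `example_total` is 115.0 mg/L, for which `highest_threshold_met` returns 100, i.e. the "at least 100 mg/L" level.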

The method may also comprise a step of recovering the produced isoprenoids. This may involve a heating step to precipitate cell material and to release intracellular isoprenoids, a centrifugation or filtration step to remove the cell debris and precipitated materials, pH-adjusting and chromatographic steps optionally involving solvents to vary the solubility of the isoprenoids and to purify them from other components. In some embodiments recovery of isoprenoids involves the addition of a non-miscible solvent overlay in the yeast culture. Said solvent may be hexane, dodecane, isopropyl myristate, or a vegetable oil. In some embodiments the recovered isoprenoids may be used as a nutritional supplement with its naive or processed host cells directly.

Kinases

Kinases of the invention meet the definition of an enzyme that catalyses the transfer of phosphate groups from high-energy, phosphate-donating molecules such as CTP, ATP, GTP, UTP, NTP, CDP, ADP, GDP, UDP, NDP, or diphosphate, triphosphate or polyphosphate to specific substrates. This process is known as phosphorylation, where the substrate gains a phosphate group, and the high-energy, e.g., NTP molecule donates a phosphate group. This transesterification produces a phosphorylated substrate and NDP, as illustrated in the below schematic:

[Schematic: ATP + substrate → ADP + phosphorylated substrate]

Kinases as referred to herein encompass both alcohol kinases and phosphate kinases, i.e., phosphokinases according to the invention.

Farnesol kinase

Arabidopsis thaliana FKI (or FOLK) is a farnesol kinase belonging to the phosphatidate cytidylyltransferase family of enzymes (Brenda EC 2.7.1.216, UniProt Accession Q67ZM7) that can phosphorylate farnesol using an NTP donor. It has also been shown to phosphorylate geraniol and geranylgeraniol. Phosphorylation of farnesol proceeds according to the following reaction:

NTP + (2E,6E)-farnesol = NDP + (2E,6E)-farnesyl phosphate

Host cell

The present invention relates to a genetically engineered eukaryotic cell capable of producing mono- or pyrophosphate isoprenoid precursors, such as terpenoid precursors. The genetically engineered eukaryotic cell can be any appropriate cell.

In some embodiments, the genetically engineered eukaryotic cell is a yeast cell.

In some embodiments, the yeast cell is a cell from a GRAS (Generally Recognized As Safe) organism or a non-pathogenic organism or strain.

The cell according to the invention is a eukaryotic cell. Such a cell is described herein as a host cell insofar as it is the recipient of and / or comprises nucleic acids or polypeptides according to the invention. Such a eukaryotic cell may be a yeast cell. In a preferred embodiment, the host cell is a yeast cell. Any yeast species may be appropriate. In some embodiments, the genus of said yeast is selected from Saccharomyces, Pichia, Yarrowia, Kluyveromyces, Candida, Rhodotorula, Rhodosporidium, Cryptococcus, Schizosaccharomyces, Trichosporon and Lipomyces. In some preferred embodiments, the genus of said yeast is Saccharomyces, Pichia, Yarrowia, Ogataea or Kluyveromyces. The yeast cell may be selected from the group consisting of Saccharomyces spp., Saccharomyces cerevisiae, Schizosaccharomyces pombe, Scheffersomyces stipitis, Pichia pastoris, Hansenula polymorpha (syn. Ogataea parapolymorpha), Kluyveromyces marxianus, Yarrowia lipolytica, Kluyveromyces lactis, or Dekkera bruxellensis. It is understood that other cells of other genera and, in particular, other species or strains of the same genera are equally appropriate to use as host cells. The eukaryotic cell according to the invention may also be a cell derived from a filamentous fungus, such as a cell derived from the mycelia or germ bodies of a filamentous fungus. Non-limiting exemplary species of filamentous fungi according to the invention are Aspergillus niger, Aspergillus oryzae, Aspergillus terreus, Neurospora crassa, or Trichoderma reesei.

The eukaryotic cell according to the invention may also be an algal cell, such as a cell derived from a multicellular alga. The algal cell of the invention may also be a microalga, such as a cell of the species Nannochloropsis gaditana, Nannochloropsis oceanica, Nannochloropsis salina, Chlamydomonas reinhardtii, Arthrospira, Chlorella vulgaris, Dunaliella salina, Haematococcus pluvialis, Phaeodactylum tricornutum, or Isochrysis galbana. It is understood that other cells of other genera and, in particular, other species or strains of the same genera, are equally appropriate to use as host cells.

In some embodiments of the present invention, the genetically engineered eukaryotic cell is not a plant cell.

Methods of production of cells according to the invention

Cells of the invention can be produced, for example, by use of a combination of recombinant DNA techniques and gene transfection methods as are well known in the art (Morrison, S. (1985) Science 229: 1202).

For example, the nucleic acid sequence(s) of interest, e.g., a kinase coding sequence, can be amplified using primers and ligated into expression vectors, such as a eukaryotic expression plasmid as used in the expression system disclosed in examples 1 and 2, or other expression systems well known in the art. A purified plasmid with the cloned sequences can be introduced into yeast cells or other eukaryotic host cells such as filamentous fungi cells, algae cells, mammalian cells such as CHO cells, HEK293T cells or HeLa cells, or alternatively other eukaryotic cells such as plant-derived cells. The method used to introduce these genes can be any method described in the art including, but not limited to, electroporation; chemical transformation, such as PEG/lithium acetate-mediated transformation, calcium-phosphate precipitation or DEAE-dextran transfection; transfection, such as lipofectamine transfection; transduction; ultrasound transformation; and the like. For exogenous expression, i.e., expression of exogenous genetic material, in yeast or other eukaryotic cells, genes can be expressed in the cytosol, or can be targeted to the mitochondrion, peroxisome, vacuole, or other organelles by the addition of a suitable targeting sequence such as a chloroplastic, mitochondrial or peroxisomal targeting signal suitable for the host cells. Thus, it is understood that appropriate modifications to a nucleic acid sequence to remove or include a targeting sequence can be incorporated into an exogenous nucleic acid sequence to impart desirable properties.
Furthermore, genes can be subjected to codon optimization with techniques well known in the art to achieve optimized expression of the proteins, i.e., it may be preferred to modify said nucleic acids for the sake of optimization of codon usage, in particular if said nucleic acids, optionally fused to heterologous nucleic acids such as nucleic acids derived from other organisms as described herein, are to be expressed in cells from an organism different from the cell of origin. For example, the nucleic acid sequences encoding alcohol or phosphate kinases originating from, e.g., Arabidopsis thaliana according to the invention can be modified to include one or more, preferably at least 1, 2, 3, 4, 5, 10, 15, 20 and preferably up to 10, 15, 20, 25, 30, 50, 70 or 100 or more nucleotide replacements resulting in an optimized codon usage in, e.g., a preferred yeast genus. Such nucleotide replacements preferably relate to replacements of nucleotides not resulting in a change in the encoded amino acid sequence. Preferably, the degree of identity between a specific nucleic acid sequence and a nucleic acid sequence, which is modified with respect to, or which is a variant of said specific nucleic acid sequence, will be at least 70%, preferably at least 75%, more preferably at least 80%, even more preferably at least 90% or most preferably at least 95%, 96%, 97%, 98% or 99%.
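By way of illustration only, the kind of synonymous nucleotide replacement described above may be sketched in Python as follows. The sketch relies on the standard genetic code; the `PREFERRED` codon choices, the short example sequence and all function names are hypothetical placeholders chosen for the illustration and are not taken from the examples or from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical "preferred" codons for a target host (illustrative only).
PREFERRED = {"L": "TTG", "R": "AGA", "G": "GGT"}


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna: str) -> str:
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join(PREFERRED.get(CODON_TABLE[c], c) for c in codons)


def count_replacements(a: str, b: str) -> int:
    """Number of nucleotide positions that differ between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


original = "ATGCTCCGCGGATAA"   # toy sequence encoding M L R G stop
optimized = recode(original)

# The encoded amino acid sequence must be unchanged by the replacements.
assert translate(original) == translate(optimized)
replacements = count_replacements(original, optimized)
identity = 100.0 * (len(original) - replacements) / len(original)
print(optimized, replacements, round(identity, 1))
```

In this toy case the recoded sequence differs from the original at 5 of 15 nucleotide positions while encoding the same amino acids; real coding sequences are far longer, so a comparable number of replacements leaves the degree of identity much higher.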

Cells according to the invention may also be prepared using various site-directed mutagenesis methods, which for example can be designed based on the sequence of AtFKI, which is accessible under the UniProt entry Q67ZM7 and provided herein as SEQ ID NO:2. In one embodiment, the cell of the invention is prepared using any one of CRISPR, a TALEN, a zinc finger, meganuclease, and a DNA-cutting antibiotic as described in WO 2017/138986. In one embodiment, the cell is prepared using the CRISPR/Cas9 technique, e.g. using RNA-guided Cas9 nuclease. This may be done as described in Lawrenson et al., Genome Biology (2015) 16:258; DOI 10.1186/s13059-015-0826-7, except that the single guide RNA sequence is designed based on the gene sequences provided herein. In one embodiment, the host cell is prepared using a combination of both TALEN and CRISPR/Cas9 techniques, e.g., using RNA-guided Cas9 nuclease. This may be done as described in Holme et al., Plant Mol Biol (2017) 95:111-121; DOI: 10.1007/s11103-017-0640-6, except that the TALEN and single guide RNA sequence are designed based on the gene sequences provided herein.

In one embodiment, the cell of the invention is prepared using homology directed repair, a combination of a DNA cutting nuclease and a donor DNA fragment. This may be done as described in Sun et al., Molecular Plant (2016) 9:628-631; DOI: https://doi.org/10.1016/j.molp.2016.01.001, except that the DNA cutting nuclease and donor DNA fragment are designed based on the gene sequences provided herein.

After introduction of these genes in the host cells, cells expressing the kinase(s) can be identified and selected as would be known to the person skilled in the art according to the marker(s) used. These cells can then be amplified for their expression level and upscaled to produce canonical and non-canonical isoprenoids by use of the canonical and non-canonical isoprenoid precursors.

Nucleic acids

The term "nucleic acid", as used herein, is intended to include deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) and refers to a polynucleotide comprising a polymer of nucleotides. Nucleic acids comprise according to the invention genomic DNA, cDNA, mRNA, recombinantly produced and chemically synthesized molecules. According to the invention, a nucleic acid may be present as a single-stranded or double-stranded and linear or covalently circularly closed molecule. The nucleic acids may have been codon-optimized for expression in a yeast cell as is known in the art.

The nucleic acids described according to the invention have preferably been isolated. The term "isolated nucleic acid" means according to the invention that the nucleic acid was (i) amplified in vitro, for example by polymerase chain reaction (PCR), (ii) recombinantly produced by cloning, (iii) purified, for example by cleavage and gel-electrophoretic fractionation, or (iv) synthesized, for example by chemical synthesis. An isolated nucleic acid is a nucleic acid which is available for manipulation by recombinant DNA techniques.

Nucleic acids may, according to the invention, be present alone or in combination with other nucleic acids, which may be homologous or heterologous. In preferred embodiments, a nucleic acid is functionally linked to expression control sequences which may be homologous or heterologous with respect to said nucleic acid wherein the term "homologous" means that the nucleic acid is also functionally linked to the expression control sequence naturally and the term "heterologous" means that the nucleic acid is not functionally linked to the expression control sequence naturally. A nucleic acid, such as a nucleic acid expressing RNA and/or protein or peptide, and an expression control sequence are "functionally" linked to one another, if they are covalently linked to one another in such a way that expression or transcription of said nucleic acid is under the control or under the influence of said expression control sequence. Since the nucleic acid is to be translated into a functional protein, and where an expression control sequence is functionally linked to a coding sequence, induction of said expression control sequence results in transcription of said nucleic acid without causing a frame shift in the coding sequence or said coding sequence otherwise not being capable of being translated into the desired protein or peptide.

The term "expression control sequence" or "expression control element" comprises according to the invention promoters, ribosome binding sites, IRES, enhancers and other control elements which regulate transcription of a gene or translation of a mRNA. In particular embodiments of the invention, the expression control sequences can be regulated. The exact structure of expression control sequences may vary as a function of the species or cell type, but generally comprises 5'-untranscribed and 5'- and 3'-untranslated sequences which are involved in initiation of transcription and translation, respectively, such as TATA box, capping sequence, CAAT sequence, and the like. More specifically, 5'-untranscribed expression control sequences comprise a promoter region which includes a promoter sequence for transcriptional control of the functionally linked nucleic acid. Expression control sequences may also comprise enhancer sequences or upstream activator sequences.

According to the invention the term "promoter" or "promoter region" relates to a nucleic acid sequence which is located upstream (5') to the nucleic acid sequence being expressed and controls expression of the sequence by providing a recognition and binding site for RNA polymerase. The "promoter region" may include further recognition and binding sites for further factors which are involved in the regulation of transcription of a gene.

A promoter may be "inducible" by way of initiating transcription in response to an inducing agent or may be "constitutive" if transcription is not controlled by an inducing agent. A gene which is under the control of an inducible promoter is not expressed or only expressed to a small extent if an inducing agent is absent. In the presence of the inducing agent the gene is switched on or the level of transcription is increased. This is mediated, in general, by binding of a specific transcription factor.

Promoters which are preferred according to the invention include promoters useful for expression in a yeast host, including but not limited to promoters obtained from the genes for Saccharomyces cerevisiae enolase (ENO1), Saccharomyces cerevisiae galactokinase (GAL1), Saccharomyces cerevisiae UDP-glucose-4-epimerase (GAL10), Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase (TDH3), Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH1, ADH2/GAP), Saccharomyces cerevisiae triose phosphate isomerase (TPI), Saccharomyces cerevisiae metallothionein (CUP1), Saccharomyces cerevisiae 3-phosphoglycerate kinase (PGK), and Saccharomyces cerevisiae cell wall mannoprotein (CCW12). Other useful promoters for yeast host cells are described by Romanos et al., 1992, Yeast 8: 423-488.

Examples of suitable promoters for directing transcription of the nucleic acid constructs of the present invention in a filamentous fungal host cell are promoters obtained from the genes for Aspergillus nidulans acetamidase, Aspergillus niger neutral alpha-amylase, Aspergillus niger acid stable alpha-amylase, Aspergillus niger or Aspergillus awamori glucoamylase (glaA), Aspergillus oryzae TAKA amylase, Aspergillus oryzae alkaline protease, Aspergillus oryzae triose phosphate isomerase, Fusarium oxysporum trypsin-like protease (WO 96/00787), Fusarium venenatum amyloglucosidase (WO 00/56900), Fusarium venenatum Daria (WO 00/56900), Fusarium venenatum Quinn (WO 00/56900), Rhizomucor miehei lipase, Rhizomucor miehei aspartic proteinase, Trichoderma reesei beta-glucosidase, Trichoderma reesei cellobiohydrolase I, Trichoderma reesei cellobiohydrolase II, Trichoderma reesei endoglucanase I, Trichoderma reesei endoglucanase II, Trichoderma reesei endoglucanase III, Trichoderma reesei endoglucanase V, Trichoderma reesei xylanase I, Trichoderma reesei xylanase II, Trichoderma reesei xylanase III, Trichoderma reesei beta-xylosidase, and Trichoderma reesei translation elongation factor, as well as the NA2-tpi promoter (a modified promoter from an Aspergillus neutral alpha-amylase gene in which the untranslated leader has been replaced by an untranslated leader from an Aspergillus triose phosphate isomerase gene; non-limiting examples include modified promoters from an Aspergillus niger neutral alpha-amylase gene in which the untranslated leader has been replaced by an untranslated leader from an Aspergillus nidulans or Aspergillus oryzae triose phosphate isomerase gene); and mutant, truncated, and hybrid promoters thereof. Other promoters are described in U.S. Pat. No. 6,011,147.

The control sequence may also be a transcription terminator, which is recognized by a host cell to terminate transcription. The terminator is operably linked to the 3'-terminus of the polynucleotide encoding the polypeptide. Any terminator that is functional in the host cell may be used in the present invention.

Preferred terminators for yeast host cells are obtained from the genes for Saccharomyces cerevisiae enolase, Saccharomyces cerevisiae cytochrome C (CYC1), Saccharomyces cerevisiae alcohol dehydrogenase 1 (ADH1), and Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase. Other useful terminators for yeast host cells are described by Romanos et al., 1992, supra.

Preferred terminators for filamentous fungal host cells are obtained from the genes for Aspergillus nidulans acetamidase, Aspergillus nidulans anthranilate synthase, Aspergillus niger glucoamylase, Aspergillus niger alpha-glucosidase, Aspergillus oryzae TAKA amylase, Fusarium oxysporum trypsin-like protease, Trichoderma reesei beta-glucosidase, Trichoderma reesei cellobiohydrolase I, Trichoderma reesei cellobiohydrolase II, Trichoderma reesei endoglucanase I, Trichoderma reesei endoglucanase II, Trichoderma reesei endoglucanase III, Trichoderma reesei endoglucanase V, Trichoderma reesei xylanase I, Trichoderma reesei xylanase II, Trichoderma reesei xylanase III, Trichoderma reesei beta-xylosidase, and Trichoderma reesei translation elongation factor.

The control sequence may also be an mRNA stabilizer region downstream of a promoter and upstream of the coding sequence of a gene which increases expression of the gene.

The control sequence may also be a leader, a non-translated region of an mRNA that is important for translation by the host cell. The leader is operably linked to the 5'-terminus of the polynucleotide encoding the polypeptide. Any leader that is functional in the host cell may be used.

Suitable leaders for yeast host cells are obtained from the genes for Saccharomyces cerevisiae enolase (ENO1), Saccharomyces cerevisiae 3-phosphoglycerate kinase, Saccharomyces cerevisiae alpha-factor, and Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP).

Preferred leaders for filamentous fungal host cells are obtained from the genes for Aspergillus oryzae TAKA amylase and Aspergillus nidulans triose phosphate isomerase.

The control sequence may also be a polyadenylation sequence, a sequence operably linked to the 3'-terminus of the polynucleotide and, when transcribed, is recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any polyadenylation sequence that is functional in the host cell may be used.

Useful polyadenylation sequences for yeast host cells are described by Guo and Sherman, 1995, Mol. Cellular Biol. 15: 5983-5990.

Preferred polyadenylation sequences for filamentous fungal host cells are obtained from the genes for Aspergillus nidulans anthranilate synthase, Aspergillus niger glucoamylase, Aspergillus niger alpha-glucosidase, Aspergillus oryzae TAKA amylase, and Fusarium oxysporum trypsin-like protease.

It may also be desirable to add regulatory sequences that regulate expression of the polypeptide relative to the growth of the host cell. Examples of regulatory sequences are those that cause expression of the gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. In yeast, the ADH2 system or GAL1 system may be used. In filamentous fungi, the Aspergillus niger glucoamylase promoter, Aspergillus oryzae TAKA alpha-amylase promoter, Aspergillus oryzae glucoamylase promoter, Trichoderma reesei cellobiohydrolase I promoter, and Trichoderma reesei cellobiohydrolase promoter may be used. Other examples of regulatory sequences are those that allow for gene amplification. In eukaryotic systems, these regulatory sequences include the dihydrofolate reductase gene that is amplified in the presence of methotrexate, and the metallothionein genes that are amplified with heavy metals. In these cases, the polynucleotide encoding the polypeptide would be operably linked to the regulatory sequence.

According to the invention, the term "expression" is used in its most general meaning and comprises the production of RNA or of RNA and protein/peptide. It also comprises partial expression of nucleic acids. Furthermore, expression may be carried out transiently or stably. According to the invention, the term expression also includes an "increased expression" or "abnormal expression".

"Increased expression" or "abnormal expression" or “silenced” means according to the invention that expression is altered, increased or decreased, compared to a reference, preferably compared to the state in a normal cell in normal growing physical and chemical conditions, not undergoing above-normal respiration and / apoptosis, and in any case in the same conditions and state as the cell whose expression is being compared to. An increase in expression refers to an increase by at least 10%, in particular at least 20%, at least 50%, at least 100%, at least 200%, at least 500%, at least 1000%, at least 10000% or more. A decrease in expression or silencing refers to a decrease by at least 10%, in particular at least 20%, at least 50%, at least 100%, at least 200%, at least 500%, at least 1000%, at least 10000% or more.

In a preferred embodiment, a nucleic acid molecule is according to the invention present in a vector, where appropriate with a promoter, which controls expression of the nucleic acid.

The term "vector" is used here in its most general meaning and comprises any intermediary vehicle for a nucleic acid which enables said nucleic acid, for example, to be introduced into eukaryotic cells and preferably expressed and, where appropriate, to be replicated and / or integrated into a genome. Thus, the term “vector” as used herein generally relates to genetic material that is, at least at the time of introduction into the host cell, an extrachromosomal, usually circular DNA duplex. A vector containing foreign DNA is termed recombinant DNA. The term vector therefore comprises, but is not limited to, plasmids, viral vectors, cosmids, and artificial chromosomes. Common to most engineered vectors is an origin of replication, one or multiple cloning sites, and one or multiple selectable markers.

A vector for expression of one or more kinases according to the invention may either be of a vector type in which the first and the second kinases are present in different vectors or of a vector type in which both are present in the same vector.

The teaching given herein with respect to specific nucleic acid and amino acid sequences, e.g. those shown in the sequence listing, is to be construed so as to also relate to modifications of said specific sequences resulting in sequences which are functionally equivalent to said specific sequences, e.g. nucleic acid sequences encoding amino acid sequences exhibiting properties identical or similar to those of the amino acid sequences encoded by the specific nucleic acid sequences.

The term “ectopic”, particularly in relation to “ectopic expression” as used herein, relates to the occurrence of gene expression in a cell in which it is normally not expressed. Such ectopic expression can be caused by the introduction and expression of a nucleic acid in a vector as defined herein or by juxtaposition of novel enhancer elements to a gene. Such techniques are known to the person skilled in the art.

“Growth medium” or “culture medium” as used herein refers to a solid, liquid, or semi-solid medium designed to support the growth of a population of microorganisms or cells via the process of cell proliferation. Different types of media are used for growing different types of cells and are known to the person skilled in the art. Examples of media are YPD medium, YPG medium, YPAD medium, YPGal, synthetic minimal medium, synthetic complex medium, selective minimal medium and selective inducing minimal medium.

The term "peptide analogue" as used herein refers to a compound comprising a peptide, wherein the peptide may be modified with moieties that do not necessarily consist of proteinogenic amino acids and are thus non-proteinogenic amino acid residues. Non-proteinogenic amino acids are those not naturally encoded or found in the genetic code of any organism. These may be, e.g., intermediates in biosynthesis, or post-translationally formed in proteins.

As used herein the term “fusion protein” or “fusion” or “recombinant protein” refers to a single polypeptide chain having at least two polypeptide domains that are not normally present in a single, natural polypeptide. Such a fusion protein is typically obtained by the expression of recombinant DNA molecules. Recombinant DNA molecules are DNA molecules formed by genetic recombination (such as molecular cloning) that bring together genetic material from multiple sources, creating sequences that would not otherwise be found in the genome.

Sequence identity

The recitations "sequence identity" or, for example, comprising a "sequence 75% identical to," as used herein, refer to the extent that sequences are identical on a nucleotide-by-nucleotide basis or an amino acid-by-amino acid basis over a window of comparison. Thus, a "percentage of sequence identity" may be calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, I) or the identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, i.e., the window size, and multiplying the result by 100 to yield the percentage of sequence identity.
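Purely as an illustration, the calculation described above may be sketched in Python as follows. The sketch assumes that the two sequences have already been optimally aligned, have equal length over the comparison window, and mark gaps with "-"; the function name `percent_identity` is a hypothetical choice made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Matched positions divided by window size, multiplied by 100.

    Both inputs are rows of one alignment over the comparison window;
    a gap ('-') never counts as a matched position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window must not be empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Eight aligned nucleotide positions, seven of which are identical.
print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

A sequence meeting a threshold of, for instance, at least 75% identity would be one for which this value is 75 or higher over the chosen comparison window.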

Terms used to describe sequence relationships between two or more nucleic acid polymers or polypeptides include "reference sequence," "comparison window," "sequence identity," "percentage of sequence identity" and "substantial identity". A "reference sequence" is at least 12 but frequently 15 to 18 and often at least 25 monomer units, inclusive of nucleotides and amino acid residues, in length. Because two nucleic acid or polypeptide polymers may each comprise (1) a sequence (i.e., only a portion of a complete polymer) that is similar between the two polymers, and (2) a sequence that is divergent between the two polymers, sequence comparisons between two (or more) polymers are typically performed by comparing sequences of the two polymers over a "comparison window" to identify and compare local regions of sequence similarity. A "comparison window" refers to a conceptual segment of at least 6 contiguous positions, usually about 50 to about 100, more usually about 100 to about 150 in which a sequence is compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. The comparison window may comprise additions or deletions (i.e., gaps) of about 20% or less as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Optimal alignment of sequences for aligning a comparison window may be conducted by computerized implementations of algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Drive Madison, WI, USA) or by inspection and the best alignment (i.e., resulting in the highest percentage homology over the comparison window) generated by any of the various methods selected. Reference also may be made to the BLAST family of programs as for example disclosed by Altschul et al., 1997, Nucl. Acids Res. 25:3389.
A detailed discussion of sequence analysis can be found in Unit 19.3 of Ausubel et al., "Current Protocols in Molecular Biology," John Wiley & Sons Inc, 1994-1998, Chapter 15.

Calculations of sequence similarity or sequence identity between sequences (the terms are used interchangeably herein) can be performed as follows: To determine the percent identity of two nucleic acid sequences, or of two amino acid sequences, the sequences can be aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). In certain embodiments of the present disclosure, the length of a reference sequence aligned for comparison purposes is at least 30%, preferably at least 40%, more preferably at least 50%, 60%, and even more preferably at least 75%, 80%, 90%, 100% of the length of the reference sequence. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position.

The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.

The comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. In a preferred embodiment of the present disclosure, the percent identity between two amino acid sequences is determined using the Needleman and Wunsch (1970, J. Mol. Biol. 48: 444-453) algorithm which has been incorporated into the GAP program in the GCG software package, using either a BLOSUM 62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. In yet another preferred embodiment of the present disclosure, the percent identity between two nucleotide sequences is determined using the GAP program in the GCG software package, using a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. A particularly preferred set of parameters (and the one that should be used unless otherwise specified) are a BLOSUM 62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5. The percent identity between two amino acid or nucleotide sequences can also be determined using the algorithm of E. Meyers and W. Miller (1989, Cabios, 4: 11-17) which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4.
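As a non-limiting illustration of global alignment in the manner of Needleman and Wunsch, a minimal Python sketch is given below. It is a deliberate simplification and is not the GAP program: it assumes a flat match/mismatch score in place of a substitution matrix and a single linear gap penalty in place of separate gap and length weights. The function name and the default scores are hypothetical choices for the example.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


row_a, row_b, best = needleman_wunsch("GATTACA", "GATACA")
matched = sum(x == y for x, y in zip(row_a, row_b))
print(row_a, row_b, best, round(100.0 * matched / len(row_a), 1))
```

With these toy scores the two sequences align with six matched positions and one gap. In practice the percent identity obtained depends on the scoring matrix and gap parameters selected, which is why the preferred parameter sets are stated explicitly above.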

The nucleic acid and protein sequences described herein can be used as a "query sequence" to perform a search against public databases, for example, to identify other family members or related sequences. Such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul, et al., (1990, J. Mol. Biol, 215: 403-10). BLAST nucleotide searches can be performed with the NBLAST program, score = 100, wordlength = 12 to obtain nucleotide sequences homologous to nucleic acid molecules of the disclosure. BLAST protein searches can be performed with the XBLAST program, score = 50, wordlength = 3 to obtain amino acid sequences homologous to protein molecules of the disclosure. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al. (Nucleic Acids Res. 25:3389-3402, 1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used.

Alcohol substrates

The substrate according to the invention is a primary alcohol with three (3) up to thirty (30) carbon atoms, including, but not limited to: structures with side chains, branched side chains, structures with one (1) or more double bonds, one (1) or more triple bonds, functional groups, structures with the addition of or substitution with the elements Hydrogen, Nitrogen, Oxygen, Fluorine, Silicon, Phosphorus, Sulphur, Chlorine, Selenium, Boron, Iodine, Lithium, Sodium or Potassium.
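For illustration only, the carbon-number criterion stated above can be checked from a molecular formula with a short Python sketch. The helper names are hypothetical, the formula parser handles only simple formulae without brackets, and the check says nothing about whether the compound is in fact a primary alcohol. Farnesol (C15H26O) and geraniol (C10H18O) are used as example inputs.

```python
import re


def carbon_count(formula: str) -> int:
    """Count carbon atoms in a simple molecular formula such as 'C15H26O'."""
    total = 0
    # Each token is an element symbol followed by an optional count.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element == "C":
            total += int(count) if count else 1
    return total


def within_carbon_range(formula: str, low: int = 3, high: int = 30) -> bool:
    """True if the carbon count lies within the stated range (inclusive)."""
    return low <= carbon_count(formula) <= high


for name, formula in [("farnesol", "C15H26O"), ("geraniol", "C10H18O"),
                      ("methanol", "CH4O")]:
    print(name, carbon_count(formula), within_carbon_range(formula))
```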

Such a substrate may be summarised by the structure of formula 1, where formula 1 is:

[Formula 1: structural drawing not reproduced]

wherein R₁ is hydrogen, an alkane-, an alkene-, an alkyne-, a benzene derivative-, a cyclic group, a branched group, including, but not limited to methyl, ethyl, propyl, isopropyl, methoxy, ethoxy, hydroxyl, hydroxymethyl, hydroxyethyl, sulfhydryl, silyl; a group containing a reactive nonmetal; a group containing a metalloid; a group containing a halogen, comprising: fluoro-, chloro-, bromo-, or iodo- groups; a group containing oxygen, comprising: hydroxyl-, carbonyl-, aldehyde-, haloformyl-, carbonate ester-, carboxylate-, carboxyl-, carboalkoxy-, hydroperoxy-, peroxy-, ether-, hemiacetal-, hemiketal-, acetal-, ketal-, orthoester-, methylenedioxy-, orthocarbonate ester-, or carboxylic anhydride- groups; a group containing nitrogen, comprising: carboxamide-, primary amine-, secondary amine-, tertiary amine-, 4° ammonium ion-, primary ketimine-, secondary ketimine-, primary aldimine-, secondary aldimine-, imide-, azide-, azo-, cyanate-, isocyanate-, nitrate-, nitrile-, isonitrile-, nitrosooxy-, nitro-, nitroso-, oxime-, pyridyl-, or carbamate- groups; a group containing sulfur, comprising: sulfhydryl-, sulphide-, disulphide-, sulfinyl-, sulfonyl-, sulfino-, sulpho-, sulphonate ester-, thiocyanate-, isothiocyanate-, carbonothioyl-, thiocarboxylic acid-, carbothioic S-acid-, carbothioic O-acid-, thiolester-, thionoester-, carbodithioic acid-, or carbodithio- groups; a group containing phosphorus comprising: phosphino-, phosphono-, or phosphate- groups; and / or a group containing boron comprising: borono-, boronate-, borino-, or borinate- groups;

R₂ is hydrogen, an alkane-, an alkene-, an alkyne-, a benzene derivative-, a cyclic group, a branched group, including, but not limited to methyl, ethyl, propyl, isopropyl, methoxy, ethoxy, hydroxyl, hydroxymethyl, hydroxyethyl, sulfhydryl, silyl; a group containing a reactive nonmetal; a group containing a metalloid; a group containing a halogen, comprising: fluoro-, chloro-, bromo-, or iodo- groups; a group containing oxygen, comprising: hydroxyl-, carbonyl-, aldehyde-, haloformyl-, carbonate ester-, carboxylate-, carboxyl-, carboalkoxy-, hydroperoxy-, peroxy-, ether-, hemiacetal-, hemiketal-, acetal-, ketal-, orthoester-, methylenedioxy-, orthocarbonate ester-, or carboxylic anhydride- groups; a group containing nitrogen, comprising: carboxamide-, primary amine-, secondary amine-, tertiary amine-, quaternary ammonium ion-, primary ketimine-, secondary ketimine-, primary aldimine-, secondary aldimine-, imide-, azide-, azo-, cyanate-, isocyanate-, nitrate-, nitrile-, isonitrile-, nitrosooxy-, nitro-, nitroso-, oxime-, pyridyl-, or carbamate- groups; a group containing sulfur, comprising: sulfhydryl-, sulfide-, disulfide-, sulfinyl-, sulfonyl-, sulfino-, sulfo-, sulfonate ester-, thiocyanate-, isothiocyanate-, carbonothioyl-, thiocarboxylic acid-, carbothioic S-acid-, carbothioic O-acid-, thiolester-, thionoester-, carbodithioic acid-, or carbodithio- groups; a group containing phosphorus, comprising: phosphino-, phosphono-, or phosphate- groups; and/or a group containing boron, comprising: borono-, boronate-, borino-, or borinate- groups; and

R₃ is hydrogen, methyl, fluorine, chlorine, bromine, iodine, sulfhydryl, or hydroxyl.

In one embodiment, the primary alcohol of the invention is an alcohol comprising less than 5 carbon atoms, such as 4 carbon atoms, such as 3 carbon atoms.

In one embodiment, the primary alcohol of the invention is an alcohol comprising more than 5 carbon atoms, such as 6 carbon atoms, such as 7 carbon atoms, such as 8 carbon atoms, such as 9 carbon atoms.

In one embodiment, the primary alcohol of the invention is an alcohol comprising 5 carbon atoms.

Accordingly, the substrate of the invention is any selected from the group formed by, but not limited to:

3,4-dimethylpent-2-en-1-ol, 4-methyl-3-methylenepentan-1-ol, 3,4-dimethylpent-3-en-1-ol, propan-1-ol, prop-2-en-1-ol, prop-2-yn-1-ol, butan-1-ol, but-3-en-1-ol, but-2-en-1-ol, buta-2,3-dien-1-ol, but-3-yn-1-ol, 3-methylbut-3-en-1-ol, 3-methylbut-2-en-1-ol, 3-methylbutan-1-ol, but-2-yn-1-ol, 2-methylenebutan-1-ol, 2-methylbut-2-en-1-ol, 2-methylbut-3-en-1-ol, 2-methylbutan-1-ol, 3-ethylpent-4-en-1-ol, 3-methylpenta-2,4-dien-1-ol, 3-methylpentan-1-ol, 3-methylpent-2-en-1-ol, 3-methylenepentan-1-ol, 3-methylpent-3-en-1-ol, 3-ethylpentan-1-ol, 3-ethylpent-4-en-1-ol, 3-ethylpent-3-en-1-ol, 3-ethylpent-2-en-1-ol, 3-ethylpent-4-yn-1-ol, 3-methylenepent-4-en-1-ol, 3-methylpent-4-yn-1-ol, 3-methylenepent-4-yn-1-ol, 3-methylhexan-1-ol, 3-methylhex-2-en-1-ol, 3-methylenehexan-1-ol, 3-methylhex-3-en-1-ol, 3-methylhex-5-en-1-ol, 3-methylpent-2-en-4-yn-1-ol, 3-methylhex-4-en-1-ol, 3-methylhexa-2,4-dien-1-ol, 3-methylhexa-2,5-dien-1-ol, 3-methylheptan-1-ol, 3-methylhepta-2,4-dien-1-ol, 3-methyleneheptan-1-ol, 3-methylhept-3-en-1-ol, 3-methylhept-4-en-1-ol, 3-methylhept-5-en-1-ol, 3-methylhept-6-en-1-ol, 3-methylhept-2-en-1-ol, 3-methylenehept-4-en-1-ol, 3-methylhepta-3,4-dien-1-ol, 3-methylhept-4-en-1-ol, 3-methylhepta-5,6-dien-1-ol, 3-methylhepta-2,5-dien-1-ol, 3-methylhepta-2,6-dien-1-ol, 3-methylenehept-6-en-1-ol, 3-methylenehept-5-en-1-ol, 3-methylhepta-3,5-dien-1-ol, 3-methylhepta-4,5-dien-1-ol, 3-methylhepta-3,5,6-trien-1-ol, 3-methylhepta-4,6-dien-1-ol, 3-methylhepta-2,4,6-trien-1-ol, 3-methylhepta-4,6-dien-1-ol, 3-methyloctan-1-ol, 3-methyloct-2-en-1-ol, 3-methyleneoctan-1-ol, 3-methyloct-3-en-1-ol, 3-methyloct-4-en-1-ol, 3-methyloct-5-en-1-ol, 3-methyloct-7-en-1-ol, 3-methylocta-2,4-dien-1-ol, 3-methyleneocta-4,5-dien-1-ol, 3-methyleneoct-4-en-1-ol, 3-methylocta-3,4-dien-1-ol, 3-methyloct-6-en-1-ol, 3-methylocta-2,4,5-trien-1-ol, 3-methylocta-2,4,6-trien-1-ol, 3-methyleneocta-4,6-dien-1-ol, 3-methylocta-4,5-dien-1-ol, 3-methylocta-5,6-dien-1-ol, 3-methylocta-6,7-dien-1-ol, 3-methyleneocta-5,7-dien-1-ol, 3-methylocta-2,5,7-trien-1-ol, 3-methyleneocta-4,7-dien-1-ol, 3-methylocta-2,4,7-trien-1-ol, 3-methylocta-4,5,7-trien-1-ol, 3-methylocta-3,4,6-trien-1-ol, 3-methylocta-3,4,7-trien-1-ol, 3-fluorobut-2-en-1-ol, 3-chlorobut-2-en-1-ol, 3-bromobut-2-en-1-ol, 3-aminobut-2-en-1-ol, 3-phosphaneylbut-2-en-1-ol, 3-fluorobut-3-en-1-ol, 3-chlorobut-3-en-1-ol, 3-bromobut-3-en-1-ol, 3-aminobut-3-en-1-ol, 3-phosphaneylbut-3-en-1-ol, 4-chloro-3-methylbut-2-en-1-ol, 4-bromo-3-methylbut-2-en-1-ol, 4-hydroxy-2-methylbut-2-enal, 2-methylbut-2-ene-1,4-diol, 4-mercapto-3-methylbut-2-en-1-ol, 3-methylpent-2-en-1-ol, 4-amino-3-methylbut-2-en-1-ol, 3-(fluoromethyl)but-3-en-1-ol, 4-fluoro-3-methylbut-2-en-1-ol, 3-(chloromethyl)but-3-en-1-ol, 3-(bromomethyl)but-3-en-1-ol, 4-hydroxy-2-methylenebutanal, 2-methylenebutane-1,4-diol, 3-(mercaptomethyl)but-3-en-1-ol, 3-methylenepentan-1-ol, 3-(aminomethyl)but-3-en-1-ol, 3-(phosphaneylmethyl)but-3-en-1-ol, 3-methyl-4-phosphaneylbut-2-en-1-ol, 5-fluoro-3-methylpent-2-en-1-ol, 5-bromo-3-methylpent-2-en-1-ol, 5-chloro-3-methylpent-2-en-1-ol, 3-methylpent-2-ene-1,5-diol, 5-hydroxy-3-methylpent-3-enal, 5-iodo-3-methylpent-2-en-1-ol, 3-methyl-4-(methylthio)but-2-en-1-ol, 5-mercapto-3-methylpent-2-en-1-ol, 5-amino-3-methylpent-2-en-1-ol, 3-methyl-5-phosphaneylpent-2-en-1-ol, and analogues of these compounds including analogues with the elements Nitrogen, Oxygen, Fluorine, Silicon, Phosphorus, Sulfur, Chlorine, Selenium, Boron, Iodine, Lithium, Sodium or Potassium.

In a preferred embodiment, the primary alcohol is 3-methylbut-2-en-1-ol, 4-fluoro-3-methylbut-2-en-1-ol, 3-methylpent-2-en-1-ol, 3,4-dimethylpent-2-en-1-ol, 3-ethylpent-2-en-1-ol, 3-methylhex-2-en-1-ol, 3-methylhexa-2,5-dien-1-ol, 3-methylbut-3-en-1-ol, 3-methylenepentan-1-ol, 2-methylprop-2-en-1-ol, 3-methyl-4-(methylthio)but-2-en-1-ol, or 5-chloro-3-methylpent-2-en-1-ol.

Terpenes and terpenoid compounds

The invention relates to the production of terpenes and terpenoid compounds in eukaryotic cells, said compounds being canonical or non-canonical terpenes or terpenoid compounds. Accordingly, a cell according to the invention is capable of production of a terpene or terpenoid selected from a group comprising, but not limited to:

Limonene, myrcene, alpha-pinene, sabinene, beta-pinene, 1,8-cineole, tricyclene, alpha-thujene, alpha-fenchene, camphene, delta-2-carene, alpha-phellandrene, 3-carene, 1,4-cineole, alpha-terpinene, beta-phellandrene, (Z)-beta-ocimene, (E)-beta-ocimene, gamma-terpinene, terpinolene, linalool, linalool acetate, ethyl linalool acetate, perillene, allo-ocimene, cis-beta-terpineol, cis-terpine-1-ol, isoborneol, delta-terpineol, borneol, chrysanthemol, lavandulol, alpha-terpineol, nerol, geraniol, geranyl acetate, alpha-humulene, beta-caryophyllene, valencene, amorpha-4,11-diene, taxadiene, cannabigerolic acid, grifolic acid, daurichromenic acid, confluentin, rhododaurichromenic acids A and B, anthopogocyclolic acid, anthopogochromenic acid, cannabiorcichromenic acid, cannabiorcicyclolic acid, cis-perrottetinene, (-)-cis-perrottetinenic acid,

7-ethyl-3-methylenenona-1,6-diene, 7-methyl-3-methylenenona-1,6-diene, 7,8-dimethyl-3-methylenenona-1,6-diene, 7-methyl-3-methylenedeca-1,6-diene, 1-methyl-4-(3-methylbut-1-en-2-yl)cyclohex-1-ene, 4-(but-1-en-2-yl)-1-methylcyclohex-1-ene, 1-methyl-4-(pent-1-en-2-yl)cyclohex-1-ene, 1-methyl-4-(pent-2-en-3-yl)cyclohex-1-ene, 2,4-dihydroxy-3-(3-methylpent-2-en-1-yl)-6-pentylbenzoic acid, 2,4-dihydroxy-3-(3-methylhex-2-en-1-yl)-6-pentylbenzoic acid, 3-(4-fluoro-3-methylbut-2-en-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid, 3-(7-ethyl-3-methylnona-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid, 3-(3,7-dimethylnona-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid, 2,4-dihydroxy-6-pentyl-3-(3,4,7-trimethylocta-2,6-dien-1-yl)benzoic acid, 2,4-dihydroxy-6-pentyl-3-(3,4,7-trimethylnona-2,6-dien-1-yl)benzoic acid, 3-(3-ethylpent-2-en-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid, 2,4-dihydroxy-6-pentyl-3-(3,7,8-trimethylnona-2,6-dien-1-yl)benzoic acid, 3-(3,7-dimethyldeca-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid, and 3-(8-fluoro-3,7-dimethylocta-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid.

In one embodiment the terpene or terpenoid is selected from the group comprising hemiterpenes, hemiterpenoids, and monoterpenes.

In another embodiment the terpene or terpenoid is selected from the group comprising acyclic monoterpenes, monocyclic monoterpenes, cyclopropane monoterpenes, cyclobutane monoterpenes, cyclopentane monoterpenes, cyclohexane monoterpenes, cymenes, bicyclic monoterpenes, pinanes, camphanes, fenchanes, monoterpene indole alkaloids, cannabinoids and sesquiterpenes.

In another embodiment the terpene or terpenoid is selected from the group comprising farnesanes, monocyclic farnesane sesquiterpenes, cyclofarnesanes, bisabolanes, germacranes, elemanes, humulanes, polycyclic farnesane sesquiterpenes, caryophyllanes, eudesmanes, furanoeudesmanes, eremophilanes, furanoeremophilanes, valeranes, cadinanes, drimanes, guaianes, cycloguaianes, himachalanes, longipinanes, longifolanes, picrotoxanes, isodaucanes, daucanes, protoilludanes, illudanes, illudalanes, marasmanes, isolactaranes, lactaranes, sterpuranes, acoranes, chamigranes, cedranes, isocedranes, zizaanes, prezizaanes, campherenanes, santalanes, thujopsanes, hirsutanes, pinguisanes, presilphiperfolianes, silphiperfolianes, silphinanes, isocomanes, and diterpenes.

In another embodiment the terpene or terpenoid is selected from the group comprising phytanes, cyclophytanes, bicyclophytanes, labdanes, rearranged labdanes, tricyclophytanes, pimaranes, isopimaranes, cassanes, cleistanthanes, isocopalanes, abietanes, totaranes, tetracyclophytanes, beyeranes, kauranes, villanovanes, atisanes, gibberellanes, grayanatoxanes, cembranes, cyclocembranes, casbanes, lathyranes, jatrophanes, tiglianes, rhamnofolanes, daphnanes, eunicellanes, asbestinanes, briaranes, dolabellanes, dolastanes, fusicoccanes, verticillanes, taxanes, trinervitanes, kempanes, prenylsesquiterpenes, xenicanes, xeniaphyllanes, prenylgermacranes, lobanes, prenyleudesmanes, bifloranes, sacculatanes, prenyldrimanes, prenylguaianes, prenylaromadendranes, sphenolobanes, prenyldaucanes and ginkgolides.

In another embodiment the terpene or terpenoid is a sesterterpene.

In another embodiment the terpene or terpenoid is selected from the group comprising acyclic sesterterpenes, monocyclic sesterterpenes, bicyclic sesterterpenes, tricyclic sesterterpenes, tetracyclic sesterterpenes, pentacyclic sesterterpenes and polycyclic sesterterpenes.

In another embodiment the terpene or terpenoid is a triterpene.

In another embodiment the terpene or terpenoid is selected from the group comprising linear triterpenes, gonane type, tetracyclic triterpenes, protostanes, fusidanes, dammaranes, apotirucallanes, tirucallanes, euphanes, lanostanes, cycloartanes, cucurbitanes, baccharane type, pentacyclic triterpenes, baccharanes, lupanes, oleananes, taraxeranes, multifloranes, baueranes, glutinanes, friedelanes, pachysananes, taraxastanes, ursanes, pentacyclic triterpenes, hopane type, hopanes, neohopanes, fernanes, adiananes, filicanes, gammaceranes, stictanes, arboranes, onoceranes, serratanes, and iridals.

In yet another embodiment the terpene or terpenoid is a tetraterpene.

In another embodiment the terpene or terpenoid is selected from the group comprising carotenoids, apocarotenoids, diapocarotenoids, megastigmanes, polyterpenes and prenylquinones.

Examples of the products of monoterpene synthases include, but are not limited to, the following compounds: tricyclene, alpha-thujene, alpha-pinene, alpha-fenchene, camphene, sabinene, beta-pinene, myrcene, delta-2-carene, alpha-phellandrene, 3-carene, 1,4-cineole, alpha-terpinene, beta-phellandrene, 1,8-cineole, limonene, (Z)-beta-ocimene, (E)-beta-ocimene, gamma-terpinene, terpinolene, linalool, perillene, allo-ocimene, cis-beta-terpineol, cis-terpine-1-ol, isoborneol, delta-terpineol, borneol, chrysanthemol, lavandulol, alpha-terpineol, nerol, and geraniol. In addition to GPP, certain terpene synthases (or terpene synthase variants developed by protein engineering) have been reported to convert non-canonical prenyl diphosphate substrates, such as the 11-carbon substrate 2-methyl-GPP, to terpenes with non-canonical prenyl scaffolds (Ignea et al. 2018). In the context of this disclosure, the products of enzymes that are able to convert non-canonical prenyl diphosphate substrates with carbon lengths that differ from 10 into non-canonical terpenoids with 8, 9, 11, 12, 13 or 14 carbons and/or other non-hydrogen atoms, or that are in any way analogues of the canonical substrate (GPP), are also included in the definition of products of monoterpene synthases.

Examples of sesquiterpene synthase products include, but are not limited to, alpha-humulene, beta-caryophyllene, trans-alpha-bergamotene, cis-alpha-bergamotene, farnesene, alpha-santalene, santalol, beta-selinene, zingiberene, delta-cadinene, germacrene, etc. (see more comprehensive list of structures above).

In some embodiments, the engineered eukaryotic cell according to the present invention is capable of producing terpene scaffolds with 16, 17 or 31 carbon atoms.

In some embodiments, the engineered eukaryotic cell is capable of producing terpene scaffolds with 16, 17 or 31 carbon atoms when fed with an alcohol substrate selected from the group consisting of 3M2E, 3,4-DMP, 3-MP, 3E2E, prenol, and isoprenol.

In some embodiments, the engineered eukaryotic cell comprises nucleic acid sequences encoding AtFKI, AtIPK, Erg20p and CYC2 for the production of a terpene with 16, 17 or 31 carbon atoms, wherein AtFKI comprises SEQ ID NO: 2 or a homologue or variant thereof having at least 75% identity thereto. In some embodiments, the cell comprises a nucleic acid sequence encoding an AtIPK, wherein the nucleic acid sequence comprises or consists of SEQ ID NO: 17 or a homologue or variant thereof having at least 75% identity thereto.

In some embodiments, the engineered eukaryotic cell comprises nucleic acid sequences encoding AtFKI, AtIPK, and an enzyme selected from the group consisting of Erg20p(F96C), Synechococcus elongatus GGPPS, Erg20p, Taxus canadensis GGPPS, Lycopersicon esculentum NNPPS, and Solanum habrochaites zFPPS for the production of sesquiterpenes with 16 or 17 carbon atoms, wherein AtFKI comprises SEQ ID NO: 2 or a homologue or variant thereof having at least 75% identity thereto. In some embodiments, the cell comprises a nucleic acid sequence encoding an AtIPK, wherein the nucleic acid sequence comprises or consists of SEQ ID NO: 17 or a homologue or variant thereof having at least 75% identity thereto.

In some embodiments, the engineered eukaryotic cell comprises nucleic acid sequences encoding AtFKI, AtIPK, Erg20p(F96C) and Salvia fruticosa trans-β-caryophyllene synthase (Sf126) for the production of sesquiterpenes with 16 or 17 carbon atoms, wherein AtFKI comprises SEQ ID NO: 2 or a homologue or variant thereof having at least 75% identity thereto. In some embodiments, the cell comprises a nucleic acid sequence encoding an AtIPK, wherein the nucleic acid sequence comprises or consists of SEQ ID NO: 17 or a homologue or variant thereof having at least 75% identity thereto.

In some embodiments, the engineered eukaryotic cell comprises nucleic acid sequences encoding AtFKI, AtIPK, and CPQ for the production of a triterpene with 31 carbon atoms, wherein AtFKI comprises SEQ ID NO: 2 or a homologue or variant thereof having at least 75% identity thereto. In some embodiments, the cell comprises a nucleic acid sequence encoding an AtIPK, wherein the nucleic acid sequence comprises or consists of SEQ ID NO: 17 or a homologue or variant thereof having at least 75% identity thereto.

In some embodiments, the engineered eukaryotic cell comprises nucleic acid sequences encoding AtFKI, AtIPK, and BmeTC(373C) for the production of a triterpene with 31 carbon atoms, wherein AtFKI comprises SEQ ID NO: 2 or a homologue or variant thereof having at least 75% identity thereto. In some embodiments, the cell comprises a nucleic acid sequence encoding an AtIPK, wherein the nucleic acid sequence comprises or consists of SEQ ID NO: 17 or a homologue or variant thereof having at least 75% identity thereto.

In some embodiments, the genetically engineered eukaryotic cell for the production of a terpene or a terpenoid or an isoprenoid comprises a first nucleic acid sequence encoding a first kinase that phosphorylates a primary alcohol to a mono- or pyrophosphate terpenoid precursor; and optionally a second nucleic acid sequence encoding a phosphokinase that phosphorylates a monophosphate precursor to a terpenoid pyrophosphate precursor; wherein the first kinase comprises SEQ ID NO: 2 or a homologue or variant thereof having at least 75% identity thereto, wherein the cell further comprises at least one further exogenous nucleic acid encoding an enzyme selected from the group consisting of Erg20p, CYC2, Erg20p(F96C), SynGGPPS, Sf126, CPQ, and BmeTC(373C), wherein the enzyme is capable of catalysing the production of non-canonical isoprenoids or structures containing isoprenoid groups.
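The "at least 75% identity" criterion recurring in these embodiments can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length without gaps; a real comparison would first require a sequence alignment step, which is not shown. The function name `percent_identity` is hypothetical. The two fragments are residues 181 to 240 of SEQ ID NO: 5 and SEQ ID NO: 6 as listed in this document, which differ only at position 204.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions in two pre-aligned, equal-length sequences.

    Assumption: no gaps and no alignment step; this is only a position-by-position count.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Residues 181-240 of SEQ ID NO: 5 (wild type) and SEQ ID NO: 6 (204G variant).
FRAGMENT_SEQ5 = "LLRDIDTNITFDRVQNDVTGGIGKKFESMVKMKSSVKNGVYLINGNHPERIGDIGKESFI"
FRAGMENT_SEQ6 = "LLRDIDTNITFDRVQNDVTGGIGGKFESMVKMKSSVKNGVYLINGNHPERIGDIGKESFI"

identity = percent_identity(FRAGMENT_SEQ5, FRAGMENT_SEQ6)
meets_threshold = identity >= 75.0  # the 75% cut-off stated in the embodiments
```

Under these assumptions, a single substitution in a 60-residue window leaves the fragment far above the 75% threshold.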

Examples of diterpene synthase products include but are not limited to: taxadiene, casbene, cembrene, copalyl diphosphate, copal-8-ol diphosphate, etc. (see more comprehensive list of structures above).

Examples of triterpene synthase products include but are not limited to: friedelin, alpha-amyrin, beta-amyrin, lupeol, cucurbitadienol, etc. (see more comprehensive list of structures above).

By “improved enzyme kinetics” as used herein it is meant the result of any change to an enzyme, e.g., a kinase according to the invention or terpene synthase or prenyl transferase according to the invention, said change involving either a change to the nucleic acid sequence coding for said enzyme, or a change in the expressed peptide or protein, which result is measurable in terms of, e.g., the efficiency of said enzyme. Enzyme efficiency can be expressed in terms of kcat/Km, i.e., the specificity constant, wherein kcat is the turnover number and Km is the Michaelis constant, an inverse measure of the affinity an enzyme has for its substrate. Because the specificity constant reflects both affinity and catalytic ability, it is useful for comparing different enzymes against each other, or different variants of said enzyme according to the invention against each other, or the same enzyme with different substrates according to the invention. Thus, the term “improved enzyme kinetics” also encompasses the result of any change affecting the affinity of an enzyme according to the invention.
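As an illustration of how the specificity constant kcat/Km allows two enzyme variants to be compared, the following minimal sketch may be helpful. The function name and all numerical values are hypothetical and are not measurements from this disclosure.

```python
def specificity_constant(kcat_per_s: float, km_molar: float) -> float:
    """Return kcat/Km in M^-1 s^-1 for a turnover number (s^-1) and a Michaelis constant (M)."""
    if km_molar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_molar


# Hypothetical kinetic parameters for a reference enzyme and an engineered variant.
reference = specificity_constant(kcat_per_s=2.0, km_molar=1.0e-4)
variant = specificity_constant(kcat_per_s=3.0, km_molar=5.0e-5)

# A ratio above 1 would indicate improved efficiency of the variant on this substrate.
fold_improvement = variant / reference
```

In this sketch the variant gains from both a higher kcat and a lower Km, so the ratio of specificity constants exceeds the ratio of the turnover numbers alone.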

By “improved enzyme kinetics” it is also meant the result of any change affecting reaction rate. For a reaction between substances A and B, the reaction rate is often found to have the form $r = k(T)\,[A]^{m}[B]^{n}$, where k(T) is the reaction rate constant, which depends on temperature, and [A] and [B] are the molar concentrations of substances A and B in moles per unit volume of solution, assuming the reaction is taking place throughout the volume of the solution. (For a reaction taking place at a boundary, e.g., on a cell membrane, one would instead use moles of A or B per unit area.) The exponents m and n are called partial orders of reaction and are not generally equal to the stoichiometric coefficients a and b of A and B. Instead, they depend on the reaction mechanism, and the person skilled in the art would know how to determine them experimentally.
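One common way of estimating a partial order experimentally is the method of initial rates: the rate is measured at two concentrations of one substance while the other concentration and the temperature are held fixed, so that the ratio of rates depends only on the exponent of interest. The sketch below assumes the rate law given above; the function names and all values are hypothetical, and the "measurements" are synthetic data generated from chosen exponents.

```python
import math


def rate(k: float, conc_a: float, conc_b: float, m: float, n: float) -> float:
    """Rate law r = k * [A]^m * [B]^n at a fixed temperature."""
    return k * conc_a ** m * conc_b ** n


def partial_order(rate_low: float, rate_high: float,
                  conc_low: float, conc_high: float) -> float:
    """Estimate one exponent from two rates measured at two concentrations
    of a single substance, with everything else held constant."""
    if min(rate_low, rate_high, conc_low, conc_high) <= 0:
        raise ValueError("rates and concentrations must be positive")
    if conc_low == conc_high:
        raise ValueError("the two concentrations must differ")
    return math.log(rate_high / rate_low) / math.log(conc_high / conc_low)


# Synthetic data generated with hypothetical values k = 0.5, m = 1, n = 2.
r_base = rate(0.5, 0.10, 0.20, 1.0, 2.0)
r_more_a = rate(0.5, 0.20, 0.20, 1.0, 2.0)  # only [A] doubled
r_more_b = rate(0.5, 0.10, 0.40, 1.0, 2.0)  # only [B] doubled

m_estimate = partial_order(r_base, r_more_a, 0.10, 0.20)
n_estimate = partial_order(r_base, r_more_b, 0.20, 0.40)
```

With real measurements, several concentration pairs would normally be fitted together rather than relying on a single pair.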

The term “enzyme promiscuity” as used herein refers to the ability of an enzyme to catalyse a fortuitous side reaction in addition to its main reaction. Promiscuous activities are usually slow relative to the main activity. For example, the alcohol kinase of the invention may in some embodiments also be suited to catalyse a phosphate phosphorylation.

Various analytical techniques for measuring protein stability are available in the art and are reviewed in Peptide and Protein Drug Delivery, 247-301, Vincent Lee Ed., Marcel Dekker, Inc., New York, N.Y., Pubs. (1991) and Jones, A. Adv. Drug Delivery Rev. 10: 29-90 (1993). Stability can be measured at a selected temperature for a selected time period. For rapid screening, a formulation comprising the enzymes whose stability is to be compared can be kept at 40°C for 2 weeks to 1 month, at which time stability is measured. For example, the extent of aggregation during storage can be used as an indicator of protein stability.

Whether said genetically engineered eukaryotic cell, such as a yeast cell, is capable of phosphorylating primary alcohols into mono- or pyrophosphate terpenoid precursors may be determined in different manners.

In one embodiment, it is determined by a method comprising the steps of:

• providing an aqueous solution containing a predefined level of a primary alcohol;

• incubating the genetically engineered eukaryotic cell to be tested with said aqueous solution; and

• determining the level of the primary alcohol in the aqueous solution subsequent to said incubation,

wherein the reduction in the primary alcohol level is considered a measure of the phosphorylation of the primary alcohol into a mono- or pyrophosphate terpenoid precursor.

Accordingly it is preferred that when the genetically engineered eukaryotic cell according to the invention is incubated in said aqueous solution containing a predefined level of a primary alcohol, then the level of the primary alcohol subsequent to said incubation is at least 1% lower, such as at least 2%, such as at least 3%, such as at least 4%, such as at least 5%, such as at least 6%, such as at least 7%, such as at least 8%, such as at least 9%, such as at least 10%, such as at least 20%, such as at least 30%, such as at least 40%, such as at least 50% lower than the starting level.
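The comparison of the alcohol level before and after incubation amounts to a simple percentage calculation, sketched below. The function name and the concentrations (given here in mM) are hypothetical and serve only to illustrate the arithmetic against the lowest preferred threshold of 1%.

```python
def percent_reduction(start_level: float, end_level: float) -> float:
    """Percentage by which the primary alcohol level fell during incubation."""
    if start_level <= 0:
        raise ValueError("the starting level must be positive")
    return 100.0 * (start_level - end_level) / start_level


# Hypothetical assay readout: 10.0 mM alcohol at the start, 8.5 mM after incubation.
reduction = percent_reduction(start_level=10.0, end_level=8.5)
meets_minimum = reduction >= 1.0  # lowest preferred threshold stated above
```

A negative result would indicate that the measured level rose, for example through evaporation of water or measurement error, and would not count as evidence of phosphorylation.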

In one embodiment, whether said genetically engineered eukaryotic cell is capable of phosphorylating a primary alcohol present in a culture medium into a mono- or pyrophosphate terpenoid precursor is determined by the steps of:

• providing a culture medium containing a predefined level of primary alcohol of a known size;

• acidifying the culture medium to pH 5;

• incubating the genetically engineered eukaryotic cell to be tested in said medium; and

• determining the level of isoprenoid alcohols in the culture medium subsequent to said incubation,

wherein the appearance of isoprenoid alcohols with larger size than the added primary alcohol is considered a measure of the phosphorylation of the primary alcohol into the mono- or pyrophosphate terpenoid precursor and its further incorporation into larger prenyl diphosphate precursors.

Accordingly it is preferred that when the genetically engineered eukaryotic cell according to the invention is incubated in a culture medium containing a predefined level of a primary alcohol, then the sum of the levels of the different larger size alcohols determined is at least 1%, such as at least 2%, such as at least 3%, such as at least 4%, such as at least 5%, such as at least 6%, such as at least 7%, such as at least 8%, such as at least 9%, such as at least 10%, such as at least 20%, such as at least 30%, such as at least 40%, such as at least 50%, such as at least 60%, such as at least 70%, such as at least 80%, such as at least 90% of the starting level of the predefined primary alcohol having a known size.

In some embodiments, downstream terpene, terpenoid or isoprenoid compounds can be used as a surrogate measure of the ability of the AtFKI to phosphorylate a primary alcohol into a mono- or pyrophosphate terpenoid precursor. In some embodiments, downstream compounds such as linalool, nerolidol, geranyllinalool, or squalene are measured and used as a marker of mono- or pyrophosphate terpenoid precursor production.

Accordingly it is preferred that when the genetically engineered eukaryotic cell according to the invention is incubated in an aqueous solution containing a predefined level of linalool, nerolidol, geranyllinalool, or squalene, and a predefined level of a primary alcohol, then the molar level of linalool, nerolidol, geranyllinalool, or squalene after incubation is at least 25%, such as at least 30%, such as at least 40%, such as at least 50%, such as at least 60%, such as at least 70%, such as at least 80%, such as at least 90%, such as at least 95% higher than the predefined starting molar level of linalool, nerolidol, geranyllinalool, or squalene.

Regardless of whether the method of determining whether said genetically engineered eukaryotic cell is capable of converting the primary alcohol present in an aqueous solution into a mono- or pyrophosphate terpenoid precursor involves determining the levels of the primary alcohol or of the larger isoprenoid alcohols, the incubation in the aqueous solution may be performed in any suitable manner. In general, the incubation is made under conditions allowing growth and/or metabolic activity of said eukaryotic cell, such as a yeast cell. Thus, the incubation is performed at a temperature in the range of 5 to 35°C, such as in the range of 20 to 32°C. In addition to the primary alcohol, the aqueous solution should also comprise components promoting cellular growth, such as yeast growth, including a carbon source and a nitrogen source and optionally buffers and salts. The incubation may for example be done for 12 - 24 hours or 1 - 21 days.

OVERVIEW OF SEQUENCE LISTING

SEQ ID NO: 1 Amino acid sequence of Farnesol kinase of Arabidopsis thaliana UniProt accession Q67ZM7.

SEQ ID NO: 2 Truncated amino acid sequence of Farnesol kinase of Arabidopsis thaliana UniProt accession Q67ZM7, lacking the first 65 amino acids, termed delta65AtFKI.

SEQ ID NO: 3 Amino acid sequence of Isopentenyl phosphate kinase of Arabidopsis thaliana UniProt accession Q8H1F7.

SEQ ID NO: 4 Amino acid sequence of Isopentenyl phosphate kinase of Methanolobus tindarius strain DSM 2278 UniProt accession W9DTD1

SEQ ID NO: 5 Amino acid sequence of Isopentenyl phosphate kinase of Thermoplasma acidophilum (strain ATCC 25905 / DSM 1728 / JCM 9062 / NBRC 15155 / AMRC-C165) UniProt accession Q9HLX1

SEQ ID NO: 6 Amino acid sequence of Isopentenyl phosphate kinase of Thermoplasma acidophilum (strain ATCC 25905 / DSM 1728 / JCM 9062 / NBRC 15155 / AMRC-C165) UniProt accession Q9HLX1 comprising amino acid change (204G)

SEQ ID NO: 7 Amino acid sequence of Erg20p or Geranyl diphosphate synthase of Saccharomyces cerevisiae strain JAY291 UniProt accession C7GRZ5

SEQ ID NO: 8 Amino acid sequence of Erg20p or Geranyl diphosphate synthase of Saccharomyces cerevisiae strain JAY291 UniProt accession C7GRZ5 comprising amino acid change (N127W) and indicated as Erg20p^N127W

SEQ ID NO: 9 Amino acid sequence of (+)-limonene synthase of Citrus limon and encoded by the CILimS gene GenBank accession AAM53944.1.

SEQ ID NO: 10 Amino acid sequence of Beta-myrcene synthase of Ocimum basilicum and encoded by the ObMyrS (MYS) gene. UniProt accession Q5SBP1

SEQ ID NO: 11 Amino acid sequence of the Sabinene synthase of Salvia pomifera and encoded by the SpSabS gene. UniProt accession A6XH06.

SEQ ID NO: 12 Amino acid sequence of alpha-pinene synthase of Pinus taeda and encoded by the PtPinS gene. UniProt accession Q84KL3.

SEQ ID NO: 13 Amino acid sequence of geranyldiphosphate: olivetolate geranyltransferase of Cannabis sativa and encoded by the CsPT4 gene. Uniprot: A0A455ZJC3

SEQ ID NO: 14 Nucleotide coding sequence of the AtFKI gene of Arabidopsis thaliana NCBI RefSeq: NM_125242.4

SEQ ID NO: 15 Nucleotide coding sequence of delta65AtFKI of Arabidopsis thaliana

SEQ ID NO: 16 Nucleotide mRNA transcript of the AtFKI gene of Arabidopsis thaliana NCBI RefSeq: NM_125242.4

SEQ ID NO: 17 Nucleotide coding sequence of AtIPK of Arabidopsis thaliana NCBI RefSeq: NM_102426.6

SEQ ID NO: 18 Nucleotide coding sequence of Isopentenyl phosphate kinase of Methanolobus tindarius strain DSM 2278. Gene MettiDRAFT_2389 Uniprot: W9DTD1.

SEQ ID NO: 19 Nucleotide coding sequence of Isopentenyl phosphate kinase of Thermoplasma acidophilum (strain ATCC 25905 / DSM 1728 / JCM 9062 / NBRC 15155 / AMRC-C165). Uniprot: Q9HLX1.

SEQ ID NO: 20 Nucleotide coding sequence of Erg20p or farnesyl diphosphate synthase of Saccharomyces cerevisiae strain S288C. Uniprot: P08524.

SEQ ID NO: 21 Nucleotide coding sequence of (+)-limonene synthase of Citrus limon CILimS gene GenBank accession AF514287.1.

SEQ ID NO: 22 Nucleotide coding sequence of Beta-myrcene synthase of Ocimum basilicum GenBank accession AY693649.1

SEQ ID NO: 23 Nucleotide coding sequence of Sabinene synthase of Salvia pomifera GenBank accession DQ785794.1

SEQ ID NO: 24 Nucleotide coding sequence of alpha-pinene synthase of Pinus taeda GenBank accession AF543530.1

SEQ ID NO: 25 Nucleotide coding sequence of geranyldiphosphate: olivetolate geranyltransferase of Cannabis sativa GenBank accession BK010648

SEQ ID NO: 26 Amino acid sequence of Erg20p or farnesyl diphosphate synthase of Saccharomyces cerevisiae strain 288c. UniProt accession P08524 comprising amino acid change (F96C) and indicated as Erg20p^F96C

SEQ ID NO: 27 Amino acid sequence of terpentetriene synthase from Streptomyces griseolosporeus (CYC2). UniProt accession Q9AJE3.

SEQ ID NO: 28 Amino acid sequence of GGPP synthase of Synechococcus elongatus (SynGGPPS). UniProt accession Q2JX96.

SEQ ID NO: 29 Amino acid sequence of GGPP synthase of Taxus canadensis (TcaGGPPS). UniProt accession Q9ZPM3.

SEQ ID NO: 30 Amino acid sequence of nerylneryl diphosphate synthase of Lycopersicon esculentum (LycNNPPS). UniProt accession K7WQ45.

SEQ ID NO: 31 Amino acid sequence of z-FPP synthase of Solanum habrochaites (zFPPS). UniProt accession B8XA40.

SEQ ID NO: 32 Amino acid sequence of Salvia fruticosa trans-β-caryophyllene synthase (Sf126).

SEQ ID NO: 33 Amino acid sequence of D373C variant of terpene cyclase of Priestia megaterium or Bacillus megaterium (BmeTC(373C)). UniProt accession D5DR56 (wild type).

SEQ ID NO: 34 Amino acid sequence of cucurbitadienol synthase of Cucumis sativus (CPQ). UniProt accession A0A097IYL3.

SEQ ID NO: 35 Nucleotide coding sequence of variant Erg20p^F96C of Saccharomyces cerevisiae strain 288c.

SEQ ID NO: 36 Nucleotide coding sequence of terpentetriene synthase from Streptomyces griseolosporeus (CYC2).

SEQ ID NO: 37 Nucleotide coding sequence of GGPP synthase of Synechococcus elongatus (SynGGPPS)

SEQ ID NO: 38 Nucleotide coding sequence of GGPP synthase of Taxus canadensis (TcaGGPPS)

SEQ ID NO: 39 Nucleotide coding sequence of nerylneryl diphosphate synthase of Lycopersicon esculentum (LycNNPPS)

SEQ ID NO: 40 Nucleotide coding sequence of z-FPP synthase of Solanum habrochaites (zFPPS).

SEQ ID NO: 41 Nucleotide coding sequence of Salvia fruticosa trans-β-caryophyllene synthase (Sf126)

SEQ ID NO: 42 Nucleotide coding sequence of D373C variant of terpene cyclase of Priestia megaterium or Bacillus megaterium (BmeTC(373C)).

SEQ ID NO: 43 Nucleotide coding sequence of cucurbitadienol synthase of Cucumis sativus (CPQ)

SEQ ID NO: 44 Amino acid sequence of (2Z,6Z)-farnesyl diphosphate synthase from Solanum lycopersicum (Lycopersicon esculentum). UniProt code K7W9N9.

SEQUENCES

SEQ ID NO: 1

MATTSTTTKLSVLCCSFISSPLVDSPPSLAFFSPIPRFLTVRIATSFRSSSRFPATKIRK

SSLAAVMFPENSVLSDVCAFGVTSIVAFSCLGFWGEIGKRGIFDQKLIRKLVHINIGLVF

MLCWPLFSSGIQGALFASLVPGLNIVRMLLLGLGVYHDEGTIKSMSRHGDRRELLKGPLY

YVLSITSACIYYWKSSPIAIAVICNLCAGDGMADIVGRRFGTEKLPYNKNKSFAGSIGMA

TAGFLASVAYMYYFASFGYIEDSGGMILRFLVISIASALVESLPISTDIDDNLTISLTSA

LAGFLLF

SEQ ID NO: 2

MVMFPENSVLSDVCAFGVTSIVAFSCLGFWGEIGKRGIFDQKLIRKLVHINIGLVFMLCWPLFSSGIQGALFASLVPGLNIVRMLLLGLGVYHDEGTIKSMSRHGDRRELLKGPLYYVLSITSACIYYWKSSPIAIAVICNLCAGDGMADIVGRRFGTEKLPYNKNKSFAGSIGMATAGFLASVAYMYYFASFGYIEDSGGMILRFLVISIASALVESLPISTDIDDNLTISLTSALAGFLLF

SEQ ID NO: 3

MELNISESRSRSIRCIVKLGGAAITCKNELEKIHDENLEVVACQLRQAMLEGSAPSKVIG

MDWSKRPGSSEISCDVDDIGDQKSSEFSKFVVVHGAGSFGHFQASRSGVHKGGLEKPIVKAG

FVATRISVTNLNLEIVRALAREGIPTIGMSPFSCGWSTSKRDVASADLATVAKTIDSGFVPVLHG

DAVLDNILGCTILSGDVIIRHLADHLKPEYVVFLTDVLGVYDRPPSPSEPDAV

LLKEIAVGEDGSWKVVNPLLEHTDKKVDYSVAAHDTTGGMETKISEAAMIAKLGVDVYIV

KAATTHSQRALNGDLRDSVPEDWLGTIIRFSK

SEQ ID NO: 4

MDNNNITILKIGGSVITDKSADDGTARLSEIERIAAEISGFEGKLIIVHGAGSFGHPQVK

RFGLTGKFDHEGSIITHMSVRKLNTMVVETLNSAGINALPVHPMACAISSNSRIKSMFRE

QIEEMLANGFVPVLHGDMVMDTDLGTSVLSGDQIVPYLAIQMKASRIGIGSAEEGVLDDK

GGVIPLINNENFDEIKAYLSGSANTDVTGGMLGKVLELLELSEQSNSTSYIFNAGNTGNI

SDFLSGKNIGTAIGAGTI

SEQ ID NO: 5

MMILKIGGSVITDKSAYRTARTYAIRSIVKVLSGIEDLVCVVHGGGSFGHIKAMEFGLPG

PKNPRSSIGYSIVHRDMENLDLMVIDAMIEMGMRPISVPISALRYDGRFDYTPLIRYIDA

GFVPVSYGDVYIKDEHSYGIYSGDDIMADMAELLKPDVAVFLTDVDGIYSKDPKRNPDAV

LLRDIDTNITFDRVQNDVTGGIGKKFESMVKMKSSVKNGVYLINGNHPERIGDIGKESFI

GTVIR

SEQ ID NO: 6

MMILKIGGSVITDKSAYRTARTYAIRSIVKVLSGIEDLVCVVHGGGSFGHIKAMEFGLPG

PKNPRSSIGYSIVHRDMENLDLMVIDAMIEMGMRPISVPISALRYDGRFDYTPLIRYIDA

GFVPVSYGDVYIKDEHSYGIYSGDDIMADMAELLKPDVAVFLTDVDGIYSKDPKRNPDAV

LLRDIDTNITFDRVQNDVTGGIGGKFESMVKMKSSVKNGVYLINGNHPERIGDIGKESFIGTVIR

SEQ ID NO: 7

MASEKEIRRERFLNVFPKLVEELNASLLAYGMPKEACDWYAHSLNYNTPGGKLNRGLSVVDTY

AILSNKTVEQLGQEEYEKVAILGWCIELLQAYFLVADDMMDKSITRRGQPCWYKVPEVGEIAIND

AFMLEAAIYKLLKSHFRNEKYYIDITELFHEVTFQTELGQLMDLITAPEDKVDLSKFSLKKHSFIVT

FKTAYYSFYLPVALAMYVAGITDEKDLKQARDVLIPLGEYFQIQD

DYLDCFGTPEQIGKIGTDIQDNKCSWVINKALELASAEQRKTLDENYGKKDSVAEAKCKK

IFNDLKIEQLYHEYEESIAKDLKAKISQVDESRGFKADVLTAFLNKVYKRSK

SEQ ID NO: 8

MASEKEIRRERFLNVFPKLVEELNASLLAYGMPKEACDWYAHSLNYNTPGGKLNRGLSVVDTY

AILSNKTVEQLGQEEYEKVAILGWCIELLQAYFLVADDMMDKSITRRGQPCWYKVPEVGEIAIW

DAFMLEAAIYKLLKSHFRNEKYYIDITELFHEVTFQTELGQLMDLITAPEDKVDLSKFSLKKHSFIV

TFKTAYYSFYLPVALAMYVAGITDEKDLKQARDVLIPLGEYFQIQD

DYLDCFGTPEQIGKIGTDIQDNKCSWVINKALELASAEQRKTLDENYGKKDSVAEAKCKK

IFNDLKIEQLYHEYEESIAKDLKAKISQVDESRGFKADVLTAFLNKVYKRSK

SEQ ID NO: 9

MSSCINPSTLVTSVNAFKCLPLATNKAAIRIMAKYKPVQCLISAKYDNLTVDRRSANYQPSIWDH

DFLQSLNSNYTDEAYKRRAEELRGKVKIAIKDVIEPLDQLELIDNLQRLGLAHRFETEIRNILNNIY

NNNKDYNWRKENLYATSLEFRLLRQHGYPVSQEVFNGFKDDQGGFICDDFKGILSLHEASYYS

LEGESIMEEAWQFTSKHLKEVMISKNMEEDVFVAEQAKRALELPLHWKVPMLEARWFIHIYER

REDKNHLLLELAKMEFNTLQAIYQEELKEISGWWKDTGLGEKLSFARNRLVASFLWSMGIAFEP

QFAYCRRVLTISIALITVIDDIYDVYGTLDELEIFTDAVERWDINYALKHLPGYMKMCFLALYNFVN

EFAYYVLKQQDFDLLLSIKNAWLGLIQAYLVEAKWYHSKYTPKLEEYLENGLVSITGPLIITISYLS

GTNPIIKKELEFLESNPDIVHWSSKIFRLQDDLGTSSDEIQRGDVPKSIQCYMHETGASEEVARQ

HIKDMMRQMWKKVNAYTADKDSPLTGTTTEFLLNLVRMSHFMYLHGDGHGVQNQETIDVGFT

LLFQPIPLEDKHMAFTASPGTKG

SEQ ID NO: 10

MWSTISISMNVAILKKPLNFLHNSNNKASNPRCVSSTRRRPSCPLQLDVEPRRSGNYQPSAWD

FNYIQSLNNNHSKEERHLERKAKLIEEVKMLLEQEMAAVQQLELIEDLKNLGLSYLFQDEIKIILN

SIYNHHKCFHNNHEQCIHVNSDLYFVALGFRLFRQHGFKVSQEVFDCFKNEEGSDFSANLADD

TKGLLQLYEASYLVTEDEDTLEMARQFSTKILQKKVEEKMIEKENLLSWTLHSLELPLHWRIQRL

EAKWFLDAYASRPDMNPIIFELAKLEFNIAQALQQEELKDLSRWWNDTGIAEKLPFARDRIVESH

YWAIGTLEPYQYRYQRSLIAKIIALTTVVDDVYDVYGTLDELQLFTDAIRRWDIESINQLPSYMQL

CYLAIYNFVSELAYDIFRDKGFNSLPYLHKSWLDLVEAYFVEAKWFHDGYTPTLEEYLNNSKITII

CPAIVSEIYFAFANSIDKTEVESIYKYHDILYLSGMLARLPDDLGTSSFEMKRGDVAKAIQCYMKE

HNASEEEAREHIRFLMREAWKHMNTAAAADDCPFESDLVVGAASLGRVANFVYVEGDGFGVQ

HSKIHQQMAELLFYPYQ

SEQ ID NO: 11

MPLNSLHNLERKPSKAWSTSCTAPAARLQASFSLQQEEPRQIRRSGDYQPSLWDFNYIQSLNT

PYKEQRYVNRQAELIMQVRMLLKVKMEAIQQLELIDDLQYLGLSYFFPDEIKQILSSIHNEHRYF

HNNDLYLTALGFRILRQHGFNVSEDVFDCFKTEKCSDFNANLAQDTKGMLQLYEASFLLREGE

DTLELARRFSTRSLREKLDEDGDEIDEDLSSWIRHSLDLPLHWRIQGLEARWFLDAYARRPDM

NPLIFKLAKLNFNIVQATYQEELKDVSRWWNSSCLAEKLPFVRDRIVECFFWAIGAFEPHQYSY

QRKMAAIIITFVTIIDDVYDVYGTLEELELFTDMIRRWDNISISQLPYYMQVCYLALYNFVSERAYD

ILKDQHFNSIPYLQRSWVSLVEGYLKEAYWYYNGYKPSLEEYLNNAKISISAPTIISQLYFTLANS

TDETVIESLYEYHNILYLSGTILRLADDLGTSQHELERGDVPKAIQCYMKDTNASEREAVEHVKF

LIRETWKEMNTVTTASDCPFTDDLVAVATNLARAAQFIYLDGDGHGVQHSEIHQQMGGLLFQP

YV

SEQ ID NO: 12

MALVSAVPLNSKLCLRRTLFGFSHELKAIHSTVPNLGMCRGGKSIAPSMSMSSTTSVSNE

DGVPRRIAGHHSNLWDDDSIASLSTSYEAPSYRKRADKLIGEVKNIFDLMSVEDGVFTSP

LSDLHHRLWMVDSVERLGIDRHFKDEINSALDHVYSYWTEKGIGRGRESGVTDLNSTALG

LRTLRLHGYTVSSHVLDHFKNEKGQFTCSAIQTEGEIRDVLNLFRASLIAFPGEKIMEAA

EIFSTMYLKDALQKIPPSGLSQEIEYLLEFGWHTNLPRMETRMYIDVFGEDTTFETPYLI

REKLLELAKLEFNIFHSLVKRELQSLSRWWKDYGFPEITFSRHRHVEYYTLAACIANDPK

HSAFRLGFGKISHMITILDDIYDTFGTMEELKLLTAAFKRWDPSSIECLPDYMKGVYMAV

YDNINEMAREAQKIQGWDTVSYARKSWEAFIGAYIQEAKWISSGYLPTFDEYLENGKVSF

GSRITTLEPMLTLGFPLPPRILQEIDFPSKFNDLICAILRLKGDTQCYKADRARGEEASA

VSCYMKDHPGITEEDAVNQVNAMVDNLTKELNWELLRPDSGVPISYKKVAFDICRVFHYG

YKYRDGFSVASIEIKNLVTRTVVETVPL

SEQ ID NO: 13

MGLSLVCTFSFQTNYHTLLNPHNKNPKNSLLSYQHPKTPIIKSSYDNFPSKYCLTKNFHL

LGLNSHNRISSQSRSIRAGSDQIEGSPHHESDNSIATKILNFGHTCWKLQRPYVVKGMIS

IACGLFGRELFNNRHLFSWGLMWKAFFALVPILSFNFFAAIMNQIYDVDIDRINKPDLPL

VSGEMSIETAWILSIIVALTGLIVTIKLKSAPLFVFIYIFGIFAGFAYSVPPIRWKQYPF

TNFLITISSHVGLAFTSYSATTSALGLPFVWRPAFSFIIAFMTVMGMTIAFAKDISDIEG

DAKYGVSTVATKLGARNMTFWSGVLLLNYLVSISIGIIWPQVFKSNIMILSHAILAFCL

IFQTRELALANYASAPSRQFFEFIWLLYYAEYFVYVFI

SEQ ID NO: 14

AT GG CAACT ACT AGT ACTACT ACAAAG CTCTCCGTTCTCTG CTGCT CTTT CATTT CAT CTCC T CT CGTT G ACT CT COT CCTT CT CT CGCCTT CTT CT CT COG ATT CCACG ATT COT CACT GTCC GAATCGCGACTAGCTTTAGATCGAGCTCTAGGTTTCCGGCCACCAAAATCCGCAAGTCTTC ACT CGCCGCCGT GAT GTTT CCGGAAAATT CGGTTTTAT CAGAT GT CTGCGCGTTT GGAGT C ACTAGCAT CGTTGCGTT CT CGT GCCT CGGTTT CTGGGGAGAGATTGGCAAACGT GGCAT C TT CG ACCAG AAACT CAT CCG G AAG CTT GTG CAT AT AAAT ATTGG G CT AGTTTTT AT GCTTT G CTGGCCGCT GTT CAGTT CT GGAAT CCAAGGAGCACTTTT CGCAT CT CTT GT ACCTGGACT C AATATAGTAAGGATGCTATTGCTGGGGCTTGGAGTGTACCACGACGAAGGAACAATCAAGT CAAT GAGCAG ACAT GG AG AT CGCAGGGAACT ACTT AAGGGGCCGCTTT ACT AT GTACT GT C AAT CACAT CAGCCTGCAT CT ACT ATT GG AAAT CAT CCCCAAT CGCGATT GCGGT GAT ATGC AACCTTTGCGCAGGAGATGGTATGGCTGACATTGTGGGTCGGCGGTTTGGAACAGAGAAG CTT CCTT ACAACAAAAACAAAT CATTT GCTGGTAGCATT GGAAT GGCCACCGCCGGGTTT C TAGCATCTGTTGCGTATATGTACTACTTTGCTTCATTTGGTTACATCGAGGATAGCGGGGG AAT GATT CTT CGTTT CCT CGT CAT CT CT AT AGCAT CAGCT CTT GT GGAAT CACT CCCAAT AA GCACCG ACATT G ACG ACAAT CT CACCATTT CCTT AACCT CT GCCTTGGCCGG ATT CTT ACT CTTCTAA

SEQ ID NO: 15

ATGGTTATGTTCCCAGAAAACTCTGTTTTGTCTGATGTTTGTGCTTTCGGTGTTACTTCTATCGTTGCTTTCTCATGTTTAGGTTTCTGGGGTGAAATTGGTAAGAGAGGTATTTTCGACCAGAAGCTGATTAGAAAGTTGGTCCATATTAACATCGGCCTGGTTTTTATGTTGTGTTGGCCTTTGTTTTCCTCAGGTATTCAAGGTGCTTTGTTCGCTTCTTTGGTTCCAGGTTTGAATATCGTCAGAATGTTGTTGTTAGGTTTGGGTGTTTACCACGATGAAGGTACTATTAAGTCCATGTCAAGACACGGTGACAGAAGAGAATTATTGAAAGGTCCCTTGTACTACGTCTTGTCTATTACTTCTGCTTGCATCTACTACTGGAAGTCATCTCCAATTGCTATTGCTGTTATCTGTAACTTGTGTGCTGGTGATGGTATGGCTGATATAGTTGGTAGAAGATTCGGTACTGAAAAGCTGCCATACAACAAGAACAAATCTTTCGCTGGTTCTATTGGTATGGCAACTGCTGGTTTTTTGGCTTCTGTTGCTTATATGTACTACTTCGCCTCTTTCGGTTACATCGAAGATTCTGGTGGTATGATCTTGAGATTCCTGGTTATTTCTATTGCCTCCGCTTTGGTTGAATCCTTGCCAATTTCTACCGATATCGATGATAACCTGACCATCTCTTTGACATCTGCTTTGGCAGGTTTCTTGCTGTTCTAA

SEQ ID NO: 16

CATGAACCATTCTAGCCGAATCCAAATAGAACGAAACTAAAATTGCTTTAATCCATATGTGA CCAG AT AAAGT AAAACACT CCTTTT GG G G AAAT AAAT AT CC ACTTTT CACCAT CTTTTT G GTA AACAATAAAACAAACCAAATTTT GT GTGCTT GAGCAACAACAACT ACCAACCAGTT ACT GT C AAAAAAAAT AAAT GAAAACGT AAT CG AAG AAAAGGT CGT GTTTT CT AGT GTT GCAG AAAAT G GCAACT ACT AGT ACT ACT ACAAAGCT CT CCGTT CT CTGCT GOT CTTT CATTT CAT CT COT CT CGTT GACT CT CCT CCTT CT CT CGCCTT CTT CT CT CCGATT CCACGATT CCT CACT GT CCGAA T CGCGACTAGCTTTAGAT CGAGCT CTAGGTTT CCGGCCACCAAAAT CCGCAAGT CTT CACT CGCCGCCGT GAT GTTT CCGGAAAATT CGGTTTTAT CAGAT GT CT GCGCGTTTGGAGT CACT AGCATCGTTGCGTTCTCGTGCCTCGGTTTCTGGGGAGAGATTGGCAAACGTGGCATCTTC GACCAGGTTTT CGAT CATT CAATT CACTT AATT AAATT AT ACACGCGT CGATTT GTT GAAT AA CGTTTTT G G AAT GTTGCT CG ATT CT G CAATTT CTG GTT AAAG CAACTTT G GATT C AG AGTAA TGGTAGTAGTAAGCTTTATCGAGTAATGGATGGCATTAATGGCGAGGATAATAAATCTTATT AGTTCAAGAAATTATAGAATGTGAATGGATACCTATTTTGGGGTGGAAGAACACTGCGATAA ACAATGGGAAAGTAACCCCATATCTGGTAAATTAGTGGGAAAATTGCTCAGAACCCAATTT GGGGAT G AAATTTTTTT GAAATT CGTGCGGT AT AAT CACAAT CT CAT CGTAACTT G AAGATT CAACCAT ATTT AT GAT AT AT AT G AATTTT GT AG GTTT AAAT GTTG CAC AATTT CTTT GOT AT AC TTT G AAGG ATT ATTT CTT CGTAT G AGT CAGT CT AT ACACAT ATTGG ACAT GGCTT GTTTTTTT TT CCTTT CT CTT ATGT GTTT CT CCGCAGT GACTT GTTTTT CTT CAAGTAATTTGGTT AGG AGT AG AGG AATT AT CT AT GATT AG CAAAAATT CT ATT AAAC ATTTTT G GTTT GG G AAT ACCTGT AT GGCTT CAGCT ACTT GATTTT GTTT CT CT G ACCTT AT CGT GATT AAG AT AT CAT CT CAG AACC CATT G AATTT AT AAG AT CCACAGGCT GT CT GAT GT AGGACT GACT CCACTTT CAT GGTTTT A TT GAAT G AGGCATTGGTTT AACAT CT GTGGTT G ACATTGGT GTT ATTT CTT ATTTTT CAGAAA CTCATCCGGAAGCTTGTGCATATAAATATTGGGCTAGTTTTTATGCTTTGCTGGCCGCTGTT CAGGTAAT AAT ACGG AGT G AAAGG AT CGATTTT GAT GT ACCAGCTTGCTTTT CTGCTT CACC AGTTT ATTT CGAT AT CAATT GTTT CAGTT CT GG AAT CCAAGG AGCACTTTT CGCAT CT CTT GT ACCTGGACTCAATATAGTAAGGATGCTATTGCTGGGGCTTGGAGTGTACCACGACGAAGG AACAAT CAAGT CAAT G AGCAG ACAT GG AGAT CGCAGGTAGCT GAAT CGAACAAGT GTCTGT CACAAT AT CCAACAAT 
CT CTT AGCATT CTT GGTATCTAT GAT CAAAATT CGGTTT GT CT GTTT ATTATTG CTTTT GT CACTT CACAAAGT GTCTGT AGT CG CTCTCTG CAT CAG GG AACT ACTT A AGGGGCCGCTTTACTATGTACTGTCAATCACATCAGCCTGCATCTACTATTGGAAATCATCC CCAATCGCGATTGCGGTGATATGCAACCTTTGCGCAGGAGATGGTAAGTACAGTTAGCTCT GGTTT GAT CCAAT AACATT AGCAACTTTT GT CTT AGG AATTT GAAT CCATT GAAT GAT GATT C TAATACACACACTTGCTGTTAAAAGGTATGGCTGACATTGTGGGTCGGCGGTTTGGAACAG AG AAGCTT CCTT AC AACAAAAAC AAAT CATTT GCTG GT AG CATT GG AAT G G CCACCG CCG G GTTTCT AG CAT CTGTTGCGTATATGT ACT ACTTT G CTT CATTT GGTT ACAT CG AG GAT AG CG GGGGAAT GATT CTT CGTTT CCT CGT CAT CT CT AT AGCAT CAGCT CTT GT GG AAT CACT COCA ATAAGCACCGACATT GACGACAAT CT CACCATTT CCTT AACCT CT GCCTTGGCCGGATT CTT ACT CTT CT AAT AAT ACCCT CT CGTT GTT AT GT AT CAT CAAAT AAAGGGT CGAGCTT G ATTGC T GAT AT GG AGGTAAAACTGCATT CATT GTT CCCAT CTT CTT CTGTAT GTACGT ATT AGT GAA AC ATCT CAT ATTGTTGTTGT CC AC AAAT CTTATTTTTCAGCTG CAATT GCAGTTGGGT AC AAT GTT GTAAT GTT CT AT CCATT AGT GAG ACAT AT GAT GACG AACAGT G ACGCTT CT ACAATTT G T AACAG AT ACT CTTTT GTAACAGAT ATT CAAT ACAT GTTT GTTT GTT ATTT GGCCT ATGGCTA TGTGG

SEQ ID NO: 17

ATGGAGCTGAATATTTCCGAGAGTCGAAGCAGATCAATTCGTTGCATTGTGAAACTTGGAGGTGCGGCAATTACTTGCAAAAACGAGCTGGAGAAGATTCACGATGAAAATCTGGAGGTCGTGGCGTGTCAGTTACGTCAAGCTATGTTGGAGGGTTCAGCTCCAAGCAAGGTTATTGGCATGGATTGGAGCAAGAGACCTGGAAGCTCTGAGATTTCTTGTGATGTGGATGACATAGGGGATCAAAAGTCTTCTGAGTTTAGTAAATTTGTTGTGGTTCATGGCGCTGGTTCCTTTGGGCACT

TTCAGGCCAGTAGATCTGGGGTTCACAAAGGAGGACTTGAGAAACCTATTGTCAAAGCTGGTTTCGTTGCTACTCGTATATCTGTGACAAATCTTAATCTTGAAATTGTACGAGCACTAGCCCGAGAGGGCATTCCTACGATAGGCATGTCTCCATTTTCATGTGGTTGGTCAACCTCCAAAAGAGATGTGGCTTCTGCAGATCTAGCAACCGTAGCTAAAACCATAGACTCAGGATTTGTCCCTGTTCTCCATGGAGATGCAGTGCTGGACAATATACTGGGCTGCACCATATTGAGTGGTGATGTTATCATCCGTCATCTTGCAGATCATTTGAAGCCAGAATATGTTGTCTTTCTCACAGATGTACTAGGTGTCTACGATCGACCACCTTCACCTTCAGAGCCCGACGCTGTGCTCTTGAAAGAGATCGCTGTTGGAGAAGATGGAAGCTGGAAGGTTGTGAATCCACTGTTGGAGCACACAGACAAGAAAGTTGACTACTCTGTTGCGGCGCACGATACAACCGGTGGAATGGAAACGAAGATATCAGAAGCTGCTATGATTGCAAAACTTGGAGTCGATGTCTACATTGTGAAGGCTGCGACAACTCATTCACAGAGAGCACTAAACGGTGATTTGAGAGATAGTGTTCCTGAAGATTGGCTTGGTACTATCATCAGATTCTCAAAGTAG

SEQ ID NO: 18

TGGATAACAACAACATCACCATCCTGAAAATTGGTGGTAGCGTGATTACCGATAAAAGCGC AGATGATGGCACCGCACGTCTGAGCGAAATTGAACGTATTGCAGCAGAAATTAGCGGCTTT GAAGGCAAACT GATT ATT GTT CAT GGTGCAGGTAGCTTT GGT CAT CCGCAGGTT AAACGTT TTGGTCTGACCGGTAAATTTGATCATGAAGGCAGCATTATTACCCATATGAGCGTTCGTAAA CT G AAT ACCATGGTT GTT GAAACCCT G AAT AGCGCAGGT ATT AAT GCACT GCCGGTT CAT C CG AT GGCAT GTGCAATT AGCAGCAAT AGCCGTAT CAAAAGCAT GTTT CGT G AGCAG ATT G A AGAAATGCTGGCCAATGGTTTTGTTCCGGTTCTGCATGGTGATATGGTTATGGATACCGAT CTGGGCACCAGCGTTCTGAGCGGTGATCAGATTGTTCCGTATCTGGCAATTCAGATGAAAG CAAGCCGTATTGGTATTGGTAGTGCCGAAGAGGGTGTTCTGGATGATAAAGGTGGTGTTAT T CCGCT GATT AACAACG AG AACTT CG AT GAG ATT AAAGCAT AT CT GAGTGGT AGCGCAAAT ACCGAT GTT ACCGGTGGT AT GCTGGGTAAAGTT CT GG AACT GCT GG AATT AAGCG AACAG A GCAATAGCACCAGCTATATCTTTAATGCAGGTAACACCGGCAACATCAGCGATTTTCTGTC AG GTAAAAACATT GG CACCG CCATT G GTG CAGG CACCATTT AA

SEQ ID NO: 19

AT GAT G ATTTT G AAG AT CGGTGGTTCCGTTAT CACT GAT AAGT CT G CTT AT AG AACT G CT AG AACCT ACGCCATT AG AT CCATT GT CAAAGTTTT GT CCGGTAT CGAAGATTTGGTTT GCGTT G TTCATGGTGGTGGTTCTTTTGGTCATATTAAGGCTATGGAATTTGGTCTGCCAGGTCCAAAA AAT CCAAGAT CAT CT ATTGGTT ACT CCAT CGTT CACAG AG ACATGGAAAATTT GG ATTT GAT GGTTAT CGACGCCAT GAT CG AAAT GGGTATGCGT CCAATTT CT GTT CCAATTT CAGCTTT G A GAT ACG ACGGTAG ATTT GATT ACACCCC ATT GATT AG GT ACATT G ATG CT G GTTTT GTTCCA GTTT CTT ACGGT GAT GTTT ACAT CAAGG AT GAACACT CTT ACGGTAT CT ACT CCGGT GAT GA T ATT ATGGCT GAT ATGGCCGAATT ATT G AAGCCAGAT GTTGCT GTTTT CTT GACCGAT GTTG AT GGCAT CT ATT CCAAAGAT CCAAAG AGAAAT CCAG ACGCCGTTTT GTT GAG AG AT ATT GAT ACCAACAT CACCTT CGACAGAGTT CAAAACG AT GTT ACTGGTGGT AT CGGTAAAAAGTT CG AAT CCAT GGTT AAGAT G AAGT CCT CT GTT AAGAACGGT GTCT ACTT GATT AACGGTAACCAT CCAGAAAGAAT CGGT GAT ATT GGTAAAG AGT CCTT CAT CGGTACT GT CAT CAG AT GA

SEQ ID NO: 20

TGGCTTCAGAAAAAGAAATTAGGAGAGAGAGATTCTTGAACGTTTTCCCTAAATTAGTAGAGGAATTGAACGCATCGCTTTTGGCTTACGGTATGCCTAAGGAAGCATGTGACTGGTATGCCCACTCATTGAACTACAACACTCCAGGCGGTAAGCTAAATAGAGGTTTGTCCGTTGTGGACACGTATGCTATTCTCTCCAACAAGACCGTTGAACAATTGGGGCAAGAAGAATACGAAAAGGTTGCCATTCTAGGTTGGTGCATTGAGTTGTTGCAGGCTTACTTCTTGGTCGCCGATGATATGATGGACAAGTCCATTACCAGAAGAGGCCAACCATGTTGGTACAAGGTTCCTGAAGTTGGGGAAATTGCCATCAATGACGCATTCATGTTAGAGGCTGCTATCTACAAGCTTTTGAAATCTCACTTCAGAAACGAAAAATACTACATAGATATCACCGAATTGTTCCATGAGGTCACCTTCCAAACCGAATTGGGCCAATTGATGGACTTAATCACTGCACCTGAAGACAAAGTCGACTTGAGTAAG

TTCTCCCTAAAGAAGCACTCCTTCATAGTTACTTTCAAGACTGCTTACTATTCTTTCTACTTGCCTGTCGCATTGGCCATGTACGTTGCCGGTATCACGGATGAAAAGGATTTGAAACAAGCCAGAGATGTCTTGATTCCATTGGGTGAATACTTCCAAATTCAAGATGACTACTTAGACTGCTTCGGTACCCCAGAACAGATCGGTAAGATCGGTACAGATATCCAAGATAACAAATGTTCTTGGGTAATCAACAAGGCATTGGAACTTGCTTCCGCAGAACAAAGAAAGACTTTAGACGAAAATTACGGTAAGAAGGACTCAGTCGCAGAAGCCAAATGCAAAAAGATTTTCAATGACTTGAAAATTGAACAGCTATACCACGAATATGAAGAGTCTATTGCCAAGGATTTGAAGGCCAAAATTTCTCAGGTCGATGAGTCTCGTGGCTTCAAAGCTGATGTCTTAACTGCGTTCTTGAACAAAGTTTACAAGAGAAGCAAATAG

SEQ ID NO: 21

AT GT CTT CTT GCATT AAT CCCT CAACCTTGGTT ACCT CT GTAAATGCTTT CAAAT GT CTT CCT CTTG C AACAAAT AAAG C AG C CAT C AG AAT CAT G G C CAAAT AT AAG CC AG T CC AAT G CCTT AT CAGCG CC AAAT AT GAT AATTT G ACAGTT GAT AG GAG AT CAG CAAACT ACC AACCTT CAATTT GGGACCACGATTTTTTGCAGTCATTGAATAGCAACTATACGGATGAAGCATACAAAAGACG AGCAGAAGAGCTGAGGGGAAAAGTGAAGATAGCGATTAAGGATGTAATCGAGCCTCTGGA T CAGTTGGAGCT GATT GAT AACTT GCAAAGACTT GG ATTGGCT CAT CGTTTT GAG ACT GAG ATT AG G AAC AT ATT G AAT AAT ATCT ACAAC AAT AAT AAAG ATT AT AATT G G AG AAAAG AAAAT CT GTATGCAACCT CCCTT GAATT CAGACT ACTT AG ACAACATGGCT AT CCT GTTT CT CAAGA GGTTTT CAATGGTTTTAAAGACGACCAGGGAGGCTT CATTT GT GAT GATTT CAAGGGAATA CT G AGCTT GC AT G AAG CTT CGTATT ACAGCTT AG AAG GAG AAAG CAT CAT G GAG GAG G CCT GG CAATTT ACT AGTAAACAT CTT AAAG AAGT GAT GAT CAG CAAG AACAT G G AAG AG GAT GT ATTT GTAGCAGAACAAGCGAAGCGT GCACT GGAGCT CCCT CT GCATT GGAAAGT GCCAAT GTTAGAGGCAAGGTGGTTCATACACATTTATGAGAGAAGAGAGGACAAGAACCACCTTTTA CTTG AGCTCG CTAAG ATG G AGTTTAACACTTTG CAGG C AATTTACC AG G AAG AACTAAAAG AAATTTCAGGGTGGTGGAAGGATACAGGTCTTGGAGAGAAATTGAGCTTTGCGAGGAACA GGTTGGTAGCGTCCTTCTTATGGAGCATGGGGATCGCGTTTGAGCCTCAATTCGCCTACTG CAGG AGAGTGCT CACAAT CT CGAT AGCCCT AATT ACAGT GATT GAT GACATTT AT GAT GTCT AT GG AACATT GG AT G AACTT GAG AT ATT CACT GAT GCT GTT GAG AGGT GGGACAT CAATT A TGCTTT GAAGCACCTT CCGGGCT AT AT GAAAAT GT GTTTT CTT GCGCTTT ACAACTTT GTTA AT GAATTTGCTT ATTACGTT CT CAAACAACAGGATTTT GATTTG CTT CT GAGCATAAAAAAT G CATGGCTTGGCTTAATACAAGCCTACTTGGTGGAGGCGAAATGGTACCATAGCAAGTACAC ACCG AAACT G G AAG AAT ACTT G GAAAAT G GATT G GTAT CAAT AACGG G CCCTTT AATT AT AA CG ATTT CAT AT CTTT CTGGTACAAAT CCAAT CATT AAGAAGGAACTGG AATTT CT AGAAAGT AAT CCAG AT AT AGTT CACT G GT CAT CCAAG ATTTT CCGTCTG CAAG AT GATTT G G G AACTT C AT CGGACG AGAT ACAG AG AGGGG AT GTT CCG AAAT CAAT CCAGT GTT ACATGCAT G AAACT GGTGCCT CAG AGG AAGTT GCT CGT CAACACAT CAAGG AT AT GAT G AGACAG AT GT GG AAG AAG GT G AAT G CAT AC AC AG CCG AT AAAG ACT CT CCCTT G ACT GG AACAACT ACT G AGTT CC 
TCTTGAATCTTGTGAGAATGTCCCATTTTATGTATCTACATGGAGATGGGCATGGTGTTCAAAACCAAGAGACTATCGATGTCGGTTTTACATTGCTTTTTCAGCCCATTCCCTTGGAGGACAAACACATGGCTTTCACAGCATCTCCTGGCACCAAAGGCTGA

SEQ ID NO: 22

ATGTGGTCTACCATTAGCATTAGCATGAATGTGGCAATCCTGAAGAAGCCACTTAACTTCCTCCACAACTCAAACAACAAAGCTTCAAACCCTCGGTGCGTGTCGTCTACTCGCCGGCGCCCTTCTTGCCCCTTGCAGCTTGACGTTGAACCCCGACGCTCCGGAAACTACCAGCCTTCAGCTTGGGATTTCAACTACATTCAATCTCTCAATAATAATCACTCCAAGGAGGAGAGGCATTTGGAAAGGAAAGCTAAGCTGATTGAGGAAGTGAAGATGCTATTGGAGCAGGAAATGGCGGCAGTTCAACAGTTGGAGTTGATTGAAGACTTGAAAAATCTGGGATTGTCATACTTATTTCAAGATGAGATTAAAATAATTTTGAATTCCATATACAATCACCACAAATGCTTCCACAATAATCATGAACAATGCATACACGTAAATTCAGATTTGTATTTCGTCGCTCTCGGATTCAGACTCTTCCGGCAA

CATGGTTTTAAAGTCTCTCAAGAAGTATTTGACTGTTTTAAGAACGAAGAGGGCAGTGATTTCAGTGCAAACCTTGCTGACGATACAAAGGGGCTGCTACAACTTTACGAAGCGTCATATCTGGTGACAGAAGATGAAGATACACTGGAGATGGCGCGACAATTTTCCACCAAAATTCTGCAGAAAAAAGTGGAAGAAAAAATGATTGAGAAGGAGAATTTATTATCATGGACACTTCATTCTTTGGAGCTCCCACTTCATTGGCGGATTCAAAGGCTGGAGGCCAAATGGTTCTTAGATGCTTATGCTAGCAGACCAGATATGAATCCCATTATTTTTGAGTTGGCTAAATTGGAATTCAATATTGCTCAAGCATTACAACAGGAAGAACTCAAAGATCTCTCAAGGTGGTGGAATGATACTGGTATTGCCGAAAAACTCCCATTTGCGAGGGATCGAATAGTTGAATCCCACTATTGGGCAATTGGAACCCTTGAGCCTTATCAATATAGATATCAAAGAAGCCTCATCGCCAAGATTATTGCCCTAACTACAGTTGTTGATGATGTCTACGATGTGTACGGCACATTGGATGAACTCCAACTATTTACAGACGCAATTCGAAGATGGGATATTGAATCAATCAACCAACTTCCTAGTTACATGCAACTATGCTATTTAGCAATCTACAACTTTGTTTCTGAGCTGGCTTACGATATTTTCCGAGACAAGGGTTTCAACAGCCTCCCATATTTACACAAATCGTGGCTGGATTTGGTTGAAGCATATTTTGTTGAGGCAAAGTGGTTCCACGATGGATATACTCCAACTCTAGAAGAATATCTCAACAATTCGAAGATAACAATAATTTGTCCTGCAATAGTCTCAGAAATTTACTTCGCATTTGCAAACTCCATCGACAAAACAGAGGTCGAGAGCATATACAAATATCATGACATCCTTTACCTTTCCGGAATGCTTGCAAGGCTTCCCGATGATTTAGGAACATCATCGTTTGAGATGAAGAGAGGTGACGTGGCGAAAGCAATTCAGTGTTACATGAAGGAGCATAACGCCTCAGAGGAGGAGGCACGTGAGCACATCAGATTTCTTATGCGGGAGGCGTGGAAGCATATGAACACGGCGGCTGCGGCCGACGACTGTCCATTTGAGAGTGATTTAGTTGTGGGTGCAGCTAGTCTCGGAAGAGTGGCTAATTTTGTGTATGTGGAGGGAGATGGTTTTGGAGTGCAACACTCAAAAATACATCAACAAATGGCTGAATTACTGTTTTACCCATATCAGTGA

SEQ ID NO: 23

AT GCCACT GAATT CCCT CCACAACTTGGAG AGG AAACCTT CAAAAGCATGGT CT ACCT CTT GCACT GCACCCGCAGCT CGCCT CCAGGCAT CTTT CT CCTT ACAACAAGAAG AACCT CGT CA AAT CCG ACG CTCT G GG G ATT ACC AACCCT CT CTTT G G GATTT CAATT ACAT ACAGT CT CT CA ACACTCCGTATAAGGAGCAGAGATACGTTAATAGGCAAGCAGAGTTGATTATGCAAGTGAG GAT GTTGCTT AAG GT AAAG AT G G AG G CAATT CAACAGTT G GAGTT GATT GAG ACTT G CAAT ACCTGGG ACT GT CTT ATTT CTTT CCAGAT GAG ATT AAACAAAT CTT AAGTT CT AT ACACAAT G AGCACAG AT ATTT CCACAAT AAT GATTT GTATCT CACAGCT CTTGG ATT CAGAAT CCT CAGA CAACATGGTTTT AAT GTTT CCGAAG AT GT ATTT GATT GTTT CAAG ACT GAGAAGTGCAGT GA TTT CAAT G CAAACCTT G CT CAAG AT ACG AAG GG AAT GTT ACAACTTT AT G AAGCAT CTTT CC TTTT GAG AGAAGGT G AAG AT ACATTGGAGCT AG CAAG ACG ATTTT CCACCAG AT CT CT ACG AG AAAAACTT GAT G AAG AT GGT GAT G AAATT GAT G AAG AT CT AT CAT CGT GG ATT CGCCATT CCTTGG AT CTT CCT CTT CATT GG AGG AT CCAAGG ATT AGAGGCAAG ATGGTT CTT AGAT GC TT AT G CGAG GAG G CCGG ACAT GAAT CCACTT ATTTT CAAACT CGCC AAACT CAACTT CAAT A TT GTT CAGGCAACAT AT CAAG AAG AACT CAAAG AT GT CT CAAGGTGGT GG AAT AGTT CGT G CCTT GCT GAGAAACT CCCATTT GT GAGAGAT AGGATT GTGGAATGCTT CTTTT GGGCCAT C GGGGCTTTT G AGCCT CACCAAT AT AGTT AT CAGAG AAAAAT GGCCGCCATT ATT ATT ACTTT CGT AACAATT AT CG AT GAT GTTT AT GAT GT GTATGG AACATT AG AAG AACTGGAACT ATTT A CAG AT ATG ATT CG CAG AT G GG AT AAT AT AT CAAT AAGCCAACTT CCAT ATT AT AT G CAAGT G TG CTATTTG G CACTATACAACTTCGTTTCTG AG CG G GCTTACG AT ATTCTAAAAG ATCAACA TTT CAACAGCAT CCCATATTTACAGAGAT CGTGGGT AAGTTT GGTT GAAGGAT AT CTT AAGG AG G CATACTG GT ACT ACAATG G CT AT AAACCAAGCTT G G AAG AAT AT CT CAACAACG CCAA GATTT CAAT AT CGGCT CCT ACAAT CAT AT CCCAGCTTT ATTTT ACATT AGCAAACT CG ACT G A T G AAACAGTT AT CG AG AGCTT AT ACG AAT AT CAT AACAT ACTTT ACCTAT CAG G AACCAT ATT AAGGCTTGCT GACGAT CTTGGGACAT CACAACAT GAGCTGGAGAGAGGAGACGT CCCGAA AGCAATCCAGTGCTACATGAAGGACACAAATGCTTCGGAGAGAGAGGCGGTGGAACACGT GAAGTTTCTGATAAGGGAGACGTGGAAGGAGATGAACACGGTCACAACAGCCAGCGATTG 
TCCGTTTACGGATGATTTGGTTGCGGTCGCAACTAATCTTGCAAGGGCGGCTCAGTTTATA

TATCTCGACGGGGATGGGCATGGCGTGCAACACTCGGAAATACATCAACAGATGGGAGGCCTGCTATTCCAGCCTTATGTCTGA

SEQ ID NO: 24

AT GT CAT CT ACT ACAT CCGTTT CT AAT G AAG AT GGTGT CCCAAG AAGAATT GCTGGT CAT CA TT CT AATTT GT GGG AT GAT GATT CT AT CGCCT CTTT GT CT ACTT CTT AT G AAGCT CCAT CTT A CAG AAAGAG AGCCG AT AAGTT G ATTGGT G AAGT CAAG AACAT CTT CGACTT GAT GT CT GTT GAGGATGGT GTTTTT ACTT CT CCATT GTCT G ACTTGCAT CACAG ATT GT GG ATGGTT GATT C AGTT GAAAG ATTGGGT AT CGACAG ACATTT CAAGG ACG AAAT CAATT CCGCTTT GG AT CAC GTTT ATT CTT ACTGGACCG AAAAAGGTATT GGTAGGGGTAG AGAAT CTGGTGTT ACT G ATTT GAATT CT ACCGCTTT GGGTTT GAG AACCTT GAGATT GOAT GGTT ACACT GTTT CTT CCCACG TTTT GG AT CATTTT AAG AACG AAAAGGGT CAGTT CACCT GTT CT GCT ATT CAAACT G AAGGT GAAAT CAGGG AT GT CTT GAATTT GTT CAG AGCTT CCTT G ATTGCTTT CCCAGGT G AAAAG AT T ATGGAAGCT GCT G AAATTTT CT CCACCAT GTACTT GAAAGATGCCTT GCAAAAAATT CCAC CAT CCG GTTT GTCT CAAG AAAT CG AAT ACTT GTT G GAATT CG GTTG G CAT ACCAATTT G CCA AGAATGGAAACTAGAATGTACATCGACGTTTTCGGTGAAGATACCACTTTTGAAACCCCATA CTTGATCAGGGAAAAGTTGTTAGAATTGGCCAAGTTGGAGTTCAACATCTTCCATTCATTGG T CAAGAGGG AATT GCAGT CTTT AT CT AGGT GGTGG AAAGATT ACGGTTT CCCAG AAATT AC CTT CT CCAG ACAT AGACAT GT CGAGTATT AT ACTTT GGCT GCTT GCATTGCT AACGAT CCT A AACATT CT GCTTT CAG ATT AGGTTT CGGTAAG AT CT CCCAT AT GAT CACCATTTTGGAT GAT AT CT ACGACACCTT CGGTACT AT GG AAGAATT G AAGTT GTT GACT GCT GCTTT CAAAAGAT G GG AT CCAT CCT CT ATT G AATGCTT GCCAG ATT AT AT GAAGGGT GTTT ACAT GGCCGTTT ACG ACAACATTAACGAAATGGCTAGAGAAGCCCAAAAGATTCAAGGTTGGGATACAGTTTCTTA CGCT AG AAAAT CTT G G G AAG CTTT CATT GGTG CTT ACATT CAAG AG G CT AAGT GG ATTTCTT CTGGTTACTTGCCAACTTTCGATGAGTACTTGGAAAACGGTAAGGTTTCTTTCGGTTCTAGA ATT ACT ACCTT GG AACCT AT GTT G ACCTTGGGTTTT CCATT GCCACCAAG AAT ATTGCAAG A AATT G ACTT CCCCT CCAAATT CAACG ATTT G ATTTGCGCCATTTT G AGGTT GAAGGGT GAT A CT C AAT GTT AC AAAG CTGATAGAGCTAGAGGTGAAGAAGCTTCAGCTGTTTCTTGTT ACAT G AAGGAT CAT CCAGGTAT CACT GAAGAAGAT GCCGTTAAT CAAGTTAACGCCAT GGTT GATA ACCT G ACCAAAG AGTT GAATTGGG AATTGCT AAGACCAG ATT CAGGT GTT CCAAT CT CTT A CAAG AAG GTTG CTTT CG AT AT CTGCAGAGTTTTT CACT ACGGTT 
ACAAGTACAGAGATGGTTTCTCTGTTGCTTCCATCGAAATCAAGAACTTGGTTACTAGAACCGTTGTTGAAACCGTTCCATTGTGA

SEQ ID NO: 25

TGATCTTCGACGGCACAACCATGAGTATCGCCATTGGTTTGCTTAGCACCCTGGGAATAGGGGCAGAAGCGAATCCAAGAGAAAATTTCTTGAAGTGTTTTTCTCAGTATATCCCGAATAATGCGACGAACCTTAAGTTAGTATACACTCAGAACAACCCTCTATATATGAGCGTTCTAAATTCTACAATCCACAACCTAAGATTTACGTCCGACACGACTCCGAAACCCCTAGTTATAGTGACACCGTCACATGTTAGCCATATACAGGGCACCATACTATGTTCCAAAAAAGTTGGGTTACAAATACGTACCCGTAGCGGGGGACACGACAGTGAGGGGATGAGTTATATTAGTCAGGTGCCTTTCGTCATAGTGGATTTAAGAAATATGAGGTCAATTAAAATCGACGTTCACTCACAAACTGCCTGGGTTGAGGCGGGGGCCACATTGGGTGAAGTATATTACTGGGTCAATGAGAAGAACGAGAATCTTTCACTAGCAGCCGGTTATTGTCCCACAGTCTGCGCCGGCGGTCACTTTGGCGGCGGCGGATACGGTCCCTTAATGAGAAATTACGGGCTTGCCGCAGACAATATCATAGATGCTCACTTAGTTAATGTTCATGGAAAAGTGTTAGACCGTAAAAGCATGGGGGAGGATCTGTTTTGGGCGCTTAGAGGGGGAGGGGCAGAATCATTTGGAATAATAGTGGCATGGAAAATCAGGCTTGTGGCTGTTCCAAAGAGTACCATGTTCTCAGTAAAGAAAATAATGGAGATCCATGAGCTAGTTAAACTTGTGAATAAATGGCAAAACATAGCCTATAAATATGATAAGGACTTGCTGCTTATGACTCATTTCATAACCAGAAACATTACGGATAACCAAGGGAAGAACAAAACAGCCATCCATACCTACTTTAGCTCCGTTTTCTTGGGTGGTGTAGACAGCTTAGTTGACCTGATGAACAAGAGTTTTCCGGAACTAGGTATCAAGAAGACAGATTGTAGACAACTTTCCTGGATTGATACCATAATCTT

TTACAGCGGAGTCGTCAATTATGACACTGACAACTTCAACAAGGAAATTTTATTAGATAGGAGTGCGGGTCAAAATGGGGCCTTCAAGATCAAACTAGACTACGTTAAAAAACCCATTCCTGAAAGTGTTTTTGTTCAGATTCTGGAGAAGCTGTATGAAGAAGATATTGGCGCGGGGATGTACGCTCTTTATCCGTACGGCGGCATAATGGATGAGATTAGTGAAAGCGCCATCCCTTTCCCCCACAGAGCTGGTATCCTGTACGAGTTGTGGTATATCTGCTCCTGGGAGAAACAGGAGGATAACGAAAAGCACTTAAATTGGATTAGGAATATCTACAATTTCATGACGCCCTACGTTTCCAAGAACCCCAGGTTGGCCTATTTGAACTACAGGGATCTTGATATTGGAATCAACGACCCCAAAAACCCAAACAACTACACCCAGGCAAGGATTTGGGGAGAGAAGTACTTCGGGAAGAACTTCGACAGGCTAGTTAAGGTGAAAACGCTAGTTGATCCAAATAATTTTTTCAGAAACGAACAGAGTATCCCTCCCTTACCGCGTCATAGGCACTAA

SEQ ID NO: 26

MASEKEIRRERFLNVFPKLVEELNASLLAYGMPKEACDWYAHSLNYNTPGGKLNRGLSVVDTY

AILSNKTVEQLGQEEYEKVAILGWCIELLQAYCLVADDMMDKSITRRGQPCWYKVPEVGEIAIN

DAFMLEAAIYKLLKSHFRNEKYYIDITELFHEVTFQTELGQLMDLITAPEDKVDLSKFSLKKHSFIV

TFKTAYYSFYLPVALAMYVAGITDEKDLKQARDVLIPLGEYFQIQDDYLDCFGTPEQIGKIGTDIQ

DNKCSWVINKALELASAEQRKTLDENYGKKDSVAEAKCKKIFNDLKIEQLYHEYEESIAKDLKAK

ISQVDESRGFKADVLTAFLNKVYKRSK

SEQ ID NO: 27

MPDAIEFEHEGRRNPNSAEAESAYSSIIAALDLQESDYAVISGHSRIVGAAALVYPDADAETLLA

ASLWTACLIVNDDRWDYVQEDGGRLAPGEWFDGVTEVVDTWRTAGPRLPDPFFELVRTTMS

RLDAALGAEAADEIGHEIKRAITAMKWEGVWNEYTKKTSLATYLSFRRGYCTMDVQVVLDKWI

NGGRSFAALRDDPVRRAIDDVVVRFGCLSNDYYSWGREKKAVDKSNAVRILMDHAGYDESTA

LAHVRDDCVQAITDLDCIEESIKRSGHLGSHAQELLDYLACHRPLIYAAATWPTETNRYR

SEQ ID NO: 28

MVAQTFNLDTYLSQRQQQVEEALSAALVPAYPERIYEAMRYSLLAGGKRLRPILCLAACELAGG

SVEQAMPTACALEMIHTMSLIHDDLPAMDNDDFRRGKPTNHKVFGEDIAILAGDALLAYAFEHIA

SQTRGVPPQLVLQVIARIGHAVAATGLVGGQVVDLESEGKAISLETLEYIHSHKTGALLEASVVS

GGILAGADEELLARLSHYARDIGLAFQIVDDILDVTATSEQLGKTAGKDQAAAKATYPSLLGLEA

SRQKAEELIQSAKEALRPYGSQAEPLLALADFITRRQH

SEQ ID NO: 29

MAYTAMAAGTQSLQLRTVASYQECNSMRSCFKLTPFKSFHGVNFNVPSLGAANCEIMGHLKL

GSLPYKQCSVSSKSTKTMAQLVDLAETEKAEGKDIEFDFNEYMKSKAVAVDAALDKAIPLEYPE

KIHESMRYSLLAGGKRVRPALCIAACELVGGSQDLAMPTACAMEMIHTMSLIHDDLPCMDNDD

FRRGKPTNHKVFGEDTAVLAGDALLSFAFEHIAVATSKTVPSDRTLRVISELGKTIGSQGLVGG

QVVDITSEGDANVDLKTLEWIHIHKTAVLLECSVVSGGILGGATEDEIARIRRYARCVGLLFQVV

DDILDVTKSSEELGKTAGKDLLTDKATYPKLMGLEKAKEFAAELATRAKEELSSFDQIKAAPLLG

LADYIAFRQN

SEQ ID NO: 30

MNSSIVSQHFFISLKSSLDLQCWKSSSPSSISMGEFKGIHDKLQILKLPLTMSDRGLSKISCSLSL

QTEKLRYDNDDNDDLELHEELIPKHIALIMDGNRRWAKAKGLEVYEGHKLIIPKLKEICDISSKLGI

QVITAFAFSTENWKRSKEEVDFLMQLFEEFFNEFLRFGVRVSVIGCKSNLPMTLQKCIALTEETT

KGNKGLHLVIALNYGGYYDILQATKSIVNKAMNGLLDVEDINKNLFEQELESKCPNPDLLIRTGG

EQRVSNFLLWQLAYTEFYFTNTLFPDFGEKDLKKAILNFQQRHRRFGGHTY

SEQ ID NO: 31

MSSLVLQCWKLSSPSLILQQNTSISMGAFKGIHKLQIPNSPLTVSARGLNKISCSLSLQTEKLCY

EDNDNDLDEELMPKHIALIMDGNRRWAKDKGLDVSEGHKHLFPKLKEICDISSKLGIQVITAFAF

STENWKRAKGEVDFLMQMFEELYDEFSRSGVRVSIIGCKTDLPMTLQKCIALTEETTKGNKGLH

LVIALNYGGYYDILQATKSIVNKAMNGLLDVEDINKNLFDQELESKCPNPDLLIRTGGDQRVSNF

LLWQLAYTEFYFTKTLFPDFGEEDLKEAIINFQQRHRRFGGHTY

SEQ ID NO: 32

MDIPVIVTSVSAENWRRSVTYHPNIWGEFFLSYASQLTEITVAGKEEHERQKEEIRNLLLQSDS

TLKKLELVDSIQRLGVGYHFEKEIGETLRFIHDTNSTNNNDLHEVALCFRLLREKGLHVPCDVFS

KFVDEEGNFRESIRNDVEGILSLYEASNYAVHGEEIPEKAFEFCSSHLVSLITNINNSLSTRVKDA

LKIPIRKSLNRLGAKKFISMYEEDDSHNQKLLNFAKLDFNLVQKIHQKELSHLTRVWVKELDFANK

LSFARDRLVECYFWIVGVYFEPSYGIARKLLTKVIYVASVLDDIYDVYGTLDELTLFTSIVQRWDI

SAIDQLPPYMRIYFKALFDVYVEMEDEMGKLGKSYAVEYGKAEMIRMAKMYFKEAEWSFKGYK

PTMEEYTTVALLSSGYMMMTINSLAVINDPISKEEFDVWLSEPPMLRASLIITRLMDDLAGYGSE

EKLSAVHYYMHQHGVSEEEAFVELQKQVKNAWKDLNKEFLEPREASMPILTCVDNFTRVIIVLY

SDEDTYGNSKTKTKDMIKSVLVDPFMLDC

SEQ ID NO: 33

MIILLKEVQLEIQRRIAYLRPTQKNDGSFRYCFETGVMPDAFLIMLLRTFDLDKEGLIKQLTERIVS

LQNEDGLWTLFDDEEHNLSATIQAYTALLYSGYYQKNDRILRKAERYIIDSGGISRAHFLTRWML

SVNGLYEWPKLFYLPLSLLLVPTYVPLNFYELSTYARIHFVPMMVAGNKKFSLTSRHTPSLSHLD

VREQNQESEETTQESRASIFLVDHLKQLASLPSYIHKLGYQAAERYMLERIEKDGTLYSYATSTF

FMIYGLLALGYKKDSFVIQKAIDGICSLLSTCSGHVHVENSTSTVWDTALLSYALQEAGVPQQDP

MIKGSTRYLKKRQHTKLGDWQFHNPNTAPGGWGFSDINTNNPDLDCTSAAIRALSRRAQTDT

DYLESWQRGINWLLSMQNKDGGFAAFEKNTDSILFTYLPLENAKDAATDPATADLTGRVLECL

GNFAGMNKSHPSIKAAVKWLFDHQLDNGSWYGRWGVCYIYGTWAAITGLRAVGVSASDPRIIK

AINWLKSIQQEDGGFGESCYSASLKKYVPLSFSTPSQTAWALDALMTICPLKDRAVEKGIKFLLN

PNLTEQQTHYPTGIGLPGQFYIQYHSYNDIFPLLALAHYAKKHSS

SEQ ID NO: 34

MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCAENDDDDDDEAVIHVVANSSKHLLQQQR

RQSSFENARKQFRNNRFHRKQSSDLFLTIQYEKEIARNGAKNGGNTKVKEGEDVKKEAVNNTL

ERALSFYSAIQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNE

DGGWGLHIEGSSTMFGSALNYVALRLLGEDANGGECGAMTKARSWILERGGATAITSWGKLW

LSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITHMVLS

LRKELYTIPYHEIDWNRSRNTCAQEDLYYPHPKMQDILWGSIYHVYEPLFNGWPGRRLREKAM

KIAMEHIHYEDENSRYIYLGPVNKVLNMLCCVWEDPYSDAFKFHLQRIPDYLWLAEDGMRMQG

YNGSQLWDTAFSIQAILSTKLIDTFGSTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFS

TRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRS

YPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAALAKAANFLENMQRT

DGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACHFLLSKELPGGGWGESYLSCQ

NKVYTNLEGNRPHLVNTAVWLMALIEAGQGERDPAPLHRAARLLINSQLENGDFPQQEIMGVF

NKNCMITYAAYRNIFPIWALGEYSHRVLTE

SEQ ID NO: 35

ATGGCTTCAGAAAAAGAAATTAGGAGAGAGAGATTCTTGAACGTTTTCCCTAAATTAGTAGA GGAATTGAACGCATCGCTTTTGGCTTACGGTATGCCTAAGGAAGCATGTGACTGGTATGCC CACTCATTGAACTACAACACTCCAGGCGGTAAGCTAAATAGAGGTTTGTCCGTTGTGGACA CGTATGCTATTCTCTCCAACAAGACCGTTGAACAATTGGGGCAAGAAGAATACGAAAAGGT TGCCATTCTAGGTTGGTGCATTGAGTTGTTGCAGGCTTACTGTTTGGTCGCCGATGATATG AT GGACAAGT CCATTACCAGAAGAGGCCAACCAT GTTGGT ACAAGGTT COT GAAGTT GGG G AAATT G COAT CAAT G ACG CATT CAT GTT AG AGG CT G CT AT CT AC AAG CTTTT G AAAT CT CA CTT CAG AAACG AAAAAT ACT ACAT AG AT AT CACCG AATT GTT COAT G AGGT CACCTT CCAAA CCG AATTGGGCCAATT GATGGACTT AAT CACTGCACCT G AAG ACAAAGT CG ACTT G AGT AA GTTCTCCCT AAAG AAG C ACT CCTT CAT AGTT ACTTT CAAG ACT G CTT ACT ATT CTTT CT ACTT GCCTGTCGCATTGGCCATGTACGTTGCCGGTATCACGGATGAAAAGGATTTGAAACAAGC CAG AG AT GTCTTG ATT CCATT GG GT G AAT ACTT CCAAATT CAAG AT G ACT ACTT AG ACT G CT T CGGT ACCCCAGAACAG AT CGGTAAGAT CGGTACAG AT AT CCAAG AT AACAAAT GTT CTT G GGTAATCAACAAGGCATTAGAACTTGCTTCCGCAGAACAAAGAAAGACTTTAGACGAAAAT TACGGTAAGAAGGACTCAGTCGCAGAAGCCAAATGCAAAAAGATTTTCAATGACTTGAAAA TT G AACAGCT AT ACCACGAAT AT G AAG AGT CT ATT GCCAAGG ATTT G AAGGCT AAAATTT CT CAGGT CG AT GAGT CT CGT GGCTT CAAAGCT GAT GT CTT AACT GCGTTTTT G AACAAAGTTT A CAAG AG AT AA

SEQ ID NO: 36

AT GCCAGATGCT ATT G AATTT G AACAT G AAGGTAGAAGAAACCCAAACT CTGCT GAAGCT G AAT CTGCTT ACT CTT CT ATT ATT G CTG CTTT G G ACTT G CAAG AAT CCG ATT ACG CT GTT ATTT CTG GT CACT CT AG AAT AGTT G GTG CTGCTG CTTT AGTTT AT CCAG AT G CTG ATG CT G AAACT TT GTTGGCTGCTT CTTT GTGGACT GCTT GTTT GAT CGTTAAT GAT GAT AGATGGGACTACGT CCAAGAAGATGGTGGTAGATTGGCTCCAGGTGAATGGTTTGATGGTGTTACTGAAGTTGTT GATACTTG GAG AACT GCTGGT CCAAGATTGCCAGAT CCATTTTTT GAATT GGTT AGAACCAC CAT GT CCAGATTGG AT GOT GCATT GGGTGCT G AAGCAGCT G ACG AAATTGGT CACG AAAT C AAAAGAGCTATTACCGCTATGAAGTGGGAAGGTGTTTGGAATGAATACACCAAAAAGACAT CTTT GGCCACCT ACTT GT CTTTT AGAAG AGGTT ACT GTACCATGGAT GTT CAAGTT GTTTT G GACAAGTGG ATT AACGGTGGT AGAAGTTTT GOT GCCTT G AG AGAT GAT CCAGTT AGAAG AG CAATT GAT GAT GTTGTT GTT AG ATT CGGTTGCTT GT CCAACG ATT ATT ACT CTTGGGGTAGA GAAAAAAAGGCCGTT GAT AAGT CT AACGCCGT CAG AATTTT G ATGGAT CATGCT GGTTATG AT G AAT CT ACTG CTTT G G CT CAT GTT AG AG ATG ATTG CGTT CAAG CCATT ACT G ATTT GG AT T G CAT CG AAG AAT COAT CAAG AG AT CAGGT CATTT G G GTTCT CAT GCCC AAG AATT ATT G G ATT ACTT GGCTT GT CAT AGACCATT GAT AT ACGCT GCTGCT ACTTGGCCAACT G AAACT AAT AGAT ACAG AT AA

SEQ ID NO: 37

ATGGTTGCTCAAACTTTCAACTTGGATACCTACTTGTCCCAAAGACAACAACAAGTTGAAGAAGCTTTGTCTGCTGCTTTGGTTCCAGCTTATCCAGAAAGAATCTATGAAGCTATGAGGTACTCTTTGTTGGCTGGTGGTAAAAGATTAAGACCAATTTTGTGTTTGGCTGCTTGTGAATTAGCTGGTGGTTCTGTTGAACAAGCTATGCCAACTGCTTGTGCTTTGGAAATGATTCATACCATGTCCTTGATCCACGATGATTTGCCAGCTATGGATAATGATGATTTCAGAAGAGGTAAGCCTACCAACCATAAGGTTTTCGGTGAAGATATTGCTATTTTGGCCGGTGATGCTTTGTTAGCTTATGCCTTTGAACATATTGCCTCTCAAACTAGAGGTGTTCCACCACAATTGGTGTTGCAAGTTATTG

CTAGAATTGGTCATGCTGTTGCTGCTACTGGTTTGGTTGGTGGTCAAGTTGTTGATTTGGAATCTGAAGGTAAGGCCATTTCTTTGGAAACCTTGGAGTATATCCATTCTCATAAGACTGGTGCTTTGTTGGAAGCTTCTGTTGTTTCTGGTGGTATTCTAGCTGGTGCTGATGAAGAATTATTGGCCAGATTGTCTCATTACGCCAGAGATATTGGTTTGGCTTTCCAAATCGTTGATGATATCTTGGATGTTACCGCTACCTCTGAACAATTGGGTAAAACTGCTGGTAAAGATCAAGCTGCTGCTAAAGCTACTTACCCATCTTTGTTAGGTTTGGAAGCCTCTAGACAAAAGGCCGAAGAATTGATTCAATCCGCTAAAGAAGCCTTAAGACCATACGGTTCTCAAGCTGAACCATTATTGGCTTTGGCAGATTTCATTACCAGAAGGCAACATTAA

SEQ ID NO: 38

ATGGCCTACACCGCTATGGCTGCTGGCACTCAGTCTCTGCAGCTGAGGACCGTGGCTAGC

T ACCAAGAGT GCAACAGCAT G AGGT CCTGCTT CAAGCT GACCCCGTT CAAG AGCTT CCAC

GGCGTGAACTTCAACGTGCCATCTCTGGGCGCTGCCAACTGCGAGATCATGGGCCATCTT

AAGCTGGGCAGCCTGCCGTACAAGCAGTGCTCTGTGAGCAGCAAGAGCACCAAGACCATG

GCGCAGCTGGTGGACCTTGCCGAGACTGAGAAGGCTGAAGGCAAGGACATCGAGTTCGA

CTTCAACGAGTACATGAAGTCCAAGGCCGTGGCCGTGGACGCTGCTCTGGATAAGGCTAT

CCCACT CGAGT ACCCAGAGAAGAT CCACGAGT COAT GAGGT ACAGCCT GCT GGCT GGTGG

TAAGAGGGTTCGCCCAGCTCTGTGCATTGCTGCCTGCGAGCTTGTTGGCGGCTCTCAGGA

TCTG G CTAT GCCAACCGCTT G CG CTATG G AAAT GAT CC ACACCAT G AG CCT GAT CCACG AC

GACCTGCCGTGCATGGACAACGACGATTTCAGAAGGGGCAAGCCGACCAACCACAAGGT

GTTCGGCGAGGATACTGCTGTGCTTGCTGGCGACGCTCTGCTGAGCTTCGCCTTCGAGCA

TATCGCTGTGGCCACCAGCAAGACCGTGCCATCAGATAGGACCCTGAGGGTGATCAGCGA

GCTGGGCAAGACCATTGGCTCACAGGGCCTTGTTGGAGGCCAGGTGGTGGACATTACTAG

CGAGGGCGACGCT AACGT GGACCT CAAGACCCT CGAGTGGATT CACAT CCACAAGACCGC

CGTGCTGCTCGAGTGCTCAGTTGTTTCTGGCGGCATTCTTGGCGGCGCTACCGAGGATGA

GATCGCTAGGATTAGAAGGTACGCCCGCTGCGTGGGCCTGCTGTTCCAAGTGGTGGACGA

CATCCTGGACGTGACCAAGAGCAGCGAGGAACTCGGCAAGACCGCTGGCAAGGATCTGC

T GACCGACAAGGCCACCTAT CCGAAGCT GATGGGCCT CGAGAAGGCCAAAGAGTT CGCC

GCTGAACTGGCGACCAGGGCGAAAGAGGAACTGAGCAGCTTCGACCAGATCAAGGCCGC

T CCACTGCT GGGCCT CGCT GACTACATTGCGTT CAGGCAGAACT GA

SEQ ID NO: 39

AT GT CT GAT AG AGGTTT GTCT AAGATTT CCTGCT CCTT GT CATTGCAAACCGAAAAGTT GAG AT ACGAT AACGAT GAT AACGACG ACTTGGAATTGCACGAAGAATT GATT CCAAAACAT ATCG CCTTGATCATGGACGGTAATAGAAGATGGGCTAAAGCTAAAGGTTTGGAAGTTTACGAAGG T CACAAGTT GATT AT CCCCAAGTT G AAAG AAAT CTGCG ACAT CT CTT CT AAGTTGGGTATT C AAGTTATTACCGCTTT CGCTTT CT CTACCGAAAATTGGAAGAGGT CT AAAGAAGAGGTT GAC TT CTT GATGCAGTT GTT CG AAGAATTTTT CAACGAGTT CTT GAG ATT CGGT GTT AG AGTTT C TGTTAT CGGTTGCAAGT CT AATTT GCCAAT G ACCTT GCAAAAGTGCATTGCTTT G ACT GAAG AAACTACCAAAGGTAACAAAGGCTTGCATTTGGTTATT GCCTT GAATT ACGGTGGTT ACTAC GAT ATCTTG CAAG CT ACT AAGT COAT CGTT AACAAGG CT AT G AAT G GTTT GTTG G ACGT CG AAG AT AT CAACAAGAATTT GTT CG AGCAAGAGTTGG AAAGCAAGT GT CCAAAT CCAG ATTT GTT GATT AG AACCGGT GGT G AACAAAGAGT CT CT AATTT CTT GTT GTGGCAATT GGCTT ACA CCG AATT CT ACTT CACT AACACTTT GTT CCCAG ACTT CGGT G AAAAGG ATTT GAAGAAGGCT AT CTT G AACTT CCAG CAG AG ACAT AG AAG ATTT G GTG GT CAT ACTT ACT AA

SEQ ID NO: 40

ATGTCTG CT AG AG GTTT G AACAAAATTT CCTG CT CCTT GT CCTT G CAAACCG AAAAATT GTG TTACG AG G ATAACG ATAACG ACTTG G ACG AAG AATTG ATG CCAAAACATATTG CCTTG ATC AT GG ACGGT AAT AGAAG AT GGGCT AAAG ACAAAGGTTTGGAT GTTT CT G AAGGT CACAAAC ACTT GTT CCCCAAGTT GAAAG AAAT CT GCGAT AT CT CTT CCAAGTTGGGTATT CAAGTT ATT ACCGCTTT CGCTTT CT CT ACCGAAAATTGGAAAAG AGCT AAAGGCGAAGTT G ACTT CTT GAT GCAAAT GTT CGAAGAGTT GTACG ACG AATT CT CT AG AT CTGGT GTT AGAGTTT CCATT ATT G GTTGCAAG ACT G ATTTGCCAAT GACCTT GCAAAAGT GTATTGCTTT G ACT G AAG AAACCAC CAAAGGT AACAAGGGT CT GCATTTGGTT AT CGCTTT G AATT ATGGTGGTT ACT ACG AT AT CT TGCAAGCCACT AAGT CT AT CGTT AACAAGGCT AT G AAT GGTTT GTTGGACGT CGAAG AT AT CAACAAGAACTT GTT CG ACCAAGAGTTGGAAT CT AAGT GT CCAAAT CCAG ACTT GTT GATT A GAACTGGT GGT GAT CAAAG AGT CT CCAATTTTTT GTT GTGGCAATTGGCTT ACACCG AGTT CT ACTTT ACT AAG ACTTT GTT CCCAG ACTT CG GT G AAG AAG ATTT GAAAG AAG CCAT CAT CA ACTT CCAGCAGAG ACAT AG AAGATT CGGTGGT CAT ACTT ACT AA

SEQ ID NO: 41

AT GG AT ATT CCT GT GATT GTT ACTT CCGTTT CGGCT GAG AAT GT CGT CCGT CGAT CT GTAAC TT ACCAT CCAAAT ATTT GG G G AG AATTTTTT CTTT CAT AT G CTT CACAACTT ACG G AAAT CAC TGTTG CTG G AAAGG AAG AG CAT GAAAG ACAAAAG G AAG AG ATT AGG AATTT G CTT CTT CAA AGT GATT CAACCCT AAAAAAGCTT G AACT CGTT G ACT CAAT CCAACGCCTTGGAGT GGGCT ACCATTT CGAG AAAG AAATTGGCGAAACATT ACGATT CATT CAT GACACCAAT AGCACCAAT AACAACGAT CTT CACG AAGTT GCT CTTT GCTTT CGT CT GCTT AG AG AAAAAGGT CTT CAT GT T CCAT GT GAT GTTTTT AGCAAGTT CGTAGAT G AAG AAGG AAATTT CAGGG AGT CGAT AAG A AACG AT GTT G AAG G GAT ATT GAG CTT AT ATG AG G CAT CAAATT AT GC AGT G CAT G GAG AG G AAATT CCT GAAAAAGCATT CG AATTTT GCT CCT CT CAT CTT GTCT CTTT AAT CACCAACAT CA ACAATT CCCTTT CAACACGAGTT AAGG ATGCTTT GAAGAT CCCAATT CGAAAGAGT CT AAAC AG ATT G G GAG CAAAAAAGTT CAT CTCTATGTAT G AAG AAG AT G ACT CACACAAT CAAAAATT ACT CAATTTT GCCAAATTGG ACTT CAACTT AGT GCAG AAGAT ACACCAG AAAG AGCT AAGC CAT CTT ACAAGGTGGTGGAAGGAGTTAGACTTTGCAAATAAGCTAT CTTTTGCGAGAGATA GACTT GTGGAATGCTACTTTTGGAT AGTGGGAGTTTACTTT GAGCCAAGCT ATGGAATT GC AAG AAAGCT ACT AACCAAAGT CATTT AT GTGGCTT CT GT CCTT GAT G ACAT CT ACGACGT CT AT GGAACCTTAGACGAACT AACCCT CTT CACCAGCATT GT CCAAAGGTGGGACATTAGTGC CAT CGAT CAATTGCCACCAT ACAT GAG AAT AT ACTT CAAAGCCCTTTT CGAT GT AT ATGTTG AAATGGAAGACGAAAT GGGAAAACTAGGCAAAT CAT ATGCAGT CGAATAT GGAAAAGCT GA GATGATAAGGTTGGCCAAGATGTACTTTAAAGAGGCTGAATGGTCTTTTAAGGGGTACAAG CCT ACAAT G G AG G AAT ACACAAC AGT G GC ACTTTT GTCTTCGGGCT ACAT G ATG AT G ACAA TT AATT CATT AG CT GTT AT AAAT G ACCCAATT AG CAAG G AAG AGTTT GATT G G GTTTTG AGT G AACCACCT AT G CT AAG GG CAT CTTT GAT CATT ACT AG ACT CAT G G AT GACCTT GCCG G AT AT GGGAGT G AAG AG AAGCT CT CCGCAGTGCATT ACT ACATGCAT CAACAT GGTGTAT CAGA AG AGG AAGCTTTT GT AG AGCTT CAAAAACAAGT G AAG AAT GCATGG AAGGAT CT CAACAAG G AATTT CTT G AG CCAAG AG AGG CAT CCAT G CCAATT CT CACAT GTGTT GAT AATTT CAC ACG AGTTATAATCGTGTTGTATAGTGATGAAGATACATATGGTAACTCCAAAACTAAGACCAAAG AT AT GAT CAAGT CGGTGCT AGTT GACCCCTT CATGCTT 
GATTGCT AA

SEQ ID NO: 42

AT GAT CAT CTT GTT GAAAG AAGT CCAGTTGG AAAT CC AAAG AAG AATTG CCT ACCT AAGGC CAACT CAAAAG AAT G ATG GTT CTTT CAG AT ACT G CTT CG AAACT G GTGTTAT G CCAG AT GCT TT CCT GATT AT GTT GTT G AG AACTTT CG ACTT GG ACAAAG AGGT GTT GATT AAGCAATT G AC CG AAAGG AT CGT GT CCTTGCAAAAT GAAGAT GGTTT GTGG ACTTT GTT CGAT GACG AAG AA CAT AACTT GTCCG CT ACT ATT CAAG CTT ACACT GCTTT GTTGT ACT CCGGTTACT ACCAAAA 67

GAACG ACAG AATTTT G AG AAAGGCCGAAAGGT ACATT AT CG ATT CTGGTGGT ATTT CT AG A GCCCATTTTTT G ACT AG ATGG AT GTT GT CT GTT AACGGCTT GTAT G AAT GGCCAAAGTT GTT TT ACTTGCCCCT GT CTTT GTT GTTGGTT CCAACTT AT GTT CCCTT GAACTT CT ACG AATT GT C T ACTT ACG CT AG AAT CCACTT CGTTCCTATGATGGTTGCT G GTAACAAG AAGTT CT CTTT G A CTT CT AG ACAT ACG CCAT CCTT GTCT CATTT G GAT GTT AG AG AACAG AAG CAAG AAT CCG A AG AAACT ACCCAAG AAT CT AG AGCCT CT AT CTT CTTGGTT GAT CACTT G AAACAATTGGCCT CTTT G CCAT CCT ACATT CACAAATT G G GTT AT CAAG CT GCT GAG AG GTAT ATGTTG G AAAG A ATT G AAAAAG ACG GC ACCTT GTACT CTTATG CT ACTT CT ACTTT CTT CAT G ATCTACG GTTT GTT GGCTTTGGGTTACAAGAAGGATT CCTT CGTTATT CAAAAGGCCATT GACGGTAT CTGC AGTTT GTT AT CT ACTT GCT CTGGT CAT GTT CACGTT GAAAATT CT ACAT CT ACT GTTT GGG AT ACCGCCTT GTTGT CTT ATGCTTT ACAAGAGGCT GGTGTT CCACAACAAGAT CCAAT GATT AA GG GTACT ACCAG GT ACTT G AAG AAG AG ACAAC AT ACCAAATT AG GT G ACT G G CAATTT CAT AACCCAAAT ACT G CT CCAG GT G GTT GG G GTTTTT CT GAT ATT AAC ACT AACAACCC AG ATTT GG ATT G CACCT CTG CTG CT ATT AG AG CTTT AT CT AG AAG G G CT CAAACT GAT ACCG ATT ACT TGGAATCTTGGCAGAGAGGTATTAACTGGTTATTGTCCATGCAAAACAAGGACGGTGGTTT TG CTG CTTTT GAAAAG AAC ACCG ATT CCAT CTTGTT CACCT ATTT G CCATT G G AAAAT G CTA AGGATGCT GCT ACT GAT CCAGCT ACTGCT GATTT GACTGGTAGAGTTTTGGAAT GTTT GGG TAATTTCGCTGGCATGAACAAATCTCACCCATCTATTAAGGCTGCTGTTAAGTGGTTGTTCG AT CACCAATT GG AT AAT GGTT CTTGGT AT GGTAGAT GGGGT GTTT GTT AT AT CT ATGGT ACT TGGGCTGCAATTACTGGTTTGAGAGCTGTTGGTGTTTCTGCTTCAGATCCAAGAATTATCAA GG CTAT CAACT GGTT G AAGT CCAT CCAACAAG AG GAT GGCGGTTTTGGT G AAT CTT GTT AT T CTGCTT CT CT GAAGAAGT ACGT CCCATT GT CTTTTT CTACT CCAT CT CAAACTGCTT GGGC TTT AG AT GCTTT GAT G ACAATTT GT CCATT G AAGG AT AGGT CCGTT G AAAAGGGT ATT AAGT TT CT GTT G AACCCAAACTT G ACCG AACAACAAACT CATT ACCCAACTGGTATTGGTTTGCCA GGT CAATTTT ACAT CCAAT ACCACT CCT ACAACG ACAT CTTT CCATT ATTGGCTTT AGCT CA CTACG CTAAG AAG CACTCTTCATT AG GT AG AG GTAG AAG GTCT AAGTTGTAA

SEQ ID NO: 43

AT GTGGCGTTT G AAAGTTGGT AAAGAAT CCGT CGGT GAAAAAG AAG AAAAGT GG AT CAAGT CCAT CT CCAACC ATTT G G GT AG ACAAGTTT GG G AATTTT G CG CT G AAAAT GAT GAT G ACG A T G ACG ACG AAG CT GTT ATT CAT GTT GTTGCCAACT CCT CCAAACATTT GTT ACAACAACAAA GGCGT CAGT CCT CATTT G AAAATGCT AGAAAGCAGTT CCGTAACAACAG ATT CCAT AGG AA GCAAT CCT CT GATTT GTT CTT G ACCAT CCAGTACGAAAAAG AAATTGCCAG AAATGGT GCTA AGAATGGTGGTAACACAAAGGTCAAAGAAGGTGAGGACGTTAAGAAAGAAGCCGTTAACA ATACTTTGGAAAGGGCCTTGTCTTTCTACTCTGCTATTCAAACTTCTGATGGTAACTGGGCT TCTGACTTAGGTGGTCCAATGTTTTTGTTGCCAGGTTTGGTTATTGCCTTGTACGTTACTGG T GTTTT G AACT CCGTTTT GT CT AAGCACCAT AG ACAAGAAAT GT GCAGGTACAT CT ACAACC ACCAAAATGAAGATGGTGGTTGGGGTTTACACATTGAAGGTTCTTCTACTATGTTCGGTTCT GCCTT G AATT AT GTT GCTTT GAG ATT GCT AGGT GAAGAT GCT AATGGT GGT GAAT GTGGTG CTATG ACAAAAG CT AG AT CAT G G ATTTT G G AAAG AG GT GG CG CT ACT G CT ATT ACTT CTT G GGGTAAATT GTGGTTGTCT GTTTTGGGT GTTT AT GAGT GGTCT GGTAACAAT CCATTGCCA CCAGAATTTT GGTTGCT GCCAT ATT CTTT GCCATTT CAT CCAGGT AGG AT GTGGT GT CATT G CAGAAT GGTTT ATTT GCCCAT GT CTT ACTT GTACGGT AAG AGATTT GTT GGT CCAAT CACT C ACAT GGT CTT GT CTTT GAG AAAAGAGTT GTACACCATT CCAT ACCACG AAATT G ATTGGAAC AG AT CCAG AAACACTT GTGCT CAAGAGG ACTT GTATT ACCCACAT CCAAAGAT GCAAGAT A TTTT GTGGGGTT CCAT CTACCAT GTTTACGAACCTTT GTTTAAT GGTT GGCCAGGT AGAAGA TT G AG AG AAAAG G CT AT G AAG ATT GCCAT G G AACAT ATT CATT ACG AG G ACG AAAATT CCC GTT ACAT CT ATTTGGGT CCAGTT AACAAGGT CTT G AACAT GTTGTGTT GTT GGGTT GAAGAT CCAT ACT CT GATGCTTTT AAGTT CCACTTGCAAAG AAT CCCAG ATT ACTT GTGGTT GGCCGA AGATGGTATGAGAATGCAAGGTTATAATGGTTCCCAATTGTGGGATACCGCTTTCTCTATTC 68

AAG CT ATTTT GT CCACCAAGTT G ATCG AT ACTTT CG GTTCT ACTTT G AGG AAG G CACAT CAT TT CGT CAAGCACT CT CAAAT CCAAGAGGATT GT CCAGGT GAT CCT AAT GTTTGGTTT AGACA T ATT CACAAAGGT GCCTGGCCATT CT CT ACT AG AG AT CAT GGTTGGTT G ATTT CT G ATTGCA CTGCT GAAGGTTT GAAGGCTT CTTT GAT GTT GT CTAAGTTGCCAT CTAAGAT CGTTGGT GAA CCATTGG AAAAGAACAGATT GTGT GATGCCGTT AACGT CTT GTTGT CCTTGCAAAACG AAA ACGGTGGTTTTGCTT CTTACGAATT GACTAGAT CATACCCCTGGTTGGAATT GATTAACCCA GCT G AAACTTT CGGT GAT AT CGTT AT CG ATT ACT CCT ACGTT G AAT GT ACTT CTGCT ACT AT GG AAG CTTTGG CTTTGTTCAAAAAATTG CATCCAG GTCACAGG ACCAAAG AAATAG ATG CT GCTTTGGCT AAAG CTGCT AACTTCTTG G AAAAC AT G C AAAG AACT GATGGTTCTTGGTACG GTTGTTGGGGTGTTTGTTTTACTTATGCTGGTTGGTTTGGTATCAAAGGTTTAGTTGCTGCT GGTAG AACCT ACAACAATTGCGTTGCT ATT AG AAAGGCCT GT CACTT CTT GCT GT CT AAAG A ATT ACCAG GT GGTGGATGGGGT G AAT CTT ATTT GT CTT GT CAAAACAAG GTTT ACACC AACT TGGAAGGTAACAGACCACATTTGGTT AATACTGCTT GGGTTTT GATGGCTTT GATT GAAGCT GGT CAAGGT GAAAGAG AT CCAGCT CCATT GCAT AG AGCT GCT AG ATT ATT GAT CAACT CCC AATT G G AAAAT G GT G ACTT CCCACAACAAG AAAT CAT GG GTGTTTT C AACAAG AACT G CAT GATT ACTTACGCTGCCTACAGAAACATTTT CCCT ATTTGGGCTTTGGGT GAAT ACT CCCATA GAGTTTT GACT G AGT GA

SEQ ID NO: 44

MNSLFVGRPIVKSSYNVYTLPSSICGGHFFKVSNSLSLYDDHRRTRIEIIRNSELIPKHVAIIMDGN

RRWAKARGLPVQEGHKFLAPNLKNICNISSKLGIQVITAFAFSTENWNRSSEEVDFLMRLFEEF

FEEFMRLGVRVSLIGGKSKLPTKLQQVIELTEEVTKSNEGLHLMMALNYGGQYDMLQATKNIAS

KVKDGLIKLEDIDYTLFEQELTTKCAKFPKPDLLIRTGGEQRISNFLLWQLAYSELYFTNTLFPDF

GEEALMDAIFSFQRRHRRFGGHTY

EXAMPLES

The techniques and methods described herein are carried out in a manner known to the skilled person. Further details may be found, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. All methods including the use of kits and reagents are carried out according to the manufacturers' information unless specifically indicated.

Genes used:

Table 1 - List of genes used in the study. Accession numbers are for Uniprot.org. Where unavailable a reference is cited.

Gene | Origin | Function | Accession number/reference

ERG20^N127W | S. cerevisiae | Erg20p dominant negative mutant that confers geranyl pyrophosphate synthase activity to the homo/heterodimeric partner subunit | (Ignea et al 2014)

PtPinS | Pinus taeda | α-pinene synthase | Q84KL3

ObMyrS | Ocimum basilicum | Myrcene synthase | Q5SBP1

AtFKI | Arabidopsis thaliana | Farnesol kinase, chloroplastic | Q67ZM7

CsPT4 | Cannabis sativa | Prenyltransferase 4 | A0A455ZJC3

CsTHCAs | Cannabis sativa | Tetrahydrocannabinolic acid synthase | Q8GTB6

ScCKI | S. cerevisiae | Choline kinase | P20485

ScEKI | S. cerevisiae | Ethanolamine kinase | Q03764

EcThiM | Escherichia coli | Hydroxyethylthiazole kinase | P76423

SfPhoN | Shigella flexneri | Acid phosphatase / Alcohol kinase | O50542

Yeast strains

The yeast strains used in this application were based on the Saccharomyces cerevisiae strain EGY48, disclosed in Ignea et al. (2011), Thomas B.J. and R. Rothstein (1989) and Ellerstrom M. et al. (1992), and modified according to Table 2.

Table 2: Strain genotype

Construction of plasmids:

Plasmids were generated using standard genetic engineering methods known in the art. Detailed protocols for plasmid construction can be found in general handbooks on molecular cloning. Genes were amplified by PCR and placed under the control of the dual inducible promoters P_GAL1 and P_GAL10. Coding gene sequences were then ligated using USER cloning (Nour-Eldin et al (2010)) into the backbone of the pESC-URA, pESC-LEU, pESC-TRP and pESC-HIS vectors (Agilent Technologies) to construct the plasmids listed in Table 3.

Table 3 - List of plasmids used in the study

Table 4 - List of yeast strains used in the study

Yeast media and yeast cultivation conditions

The yeast cells were first cultured on selective complete minimal media with glucose at 30°C overnight. Selective complete minimal media consisted of 0.13% w/v dropout powder, 0.67% w/v yeast nitrogen base without amino acids with ammonium sulphate (YNB+AS) and 2% w/v glucose. Dropout powder was purchased to lack leucine, histidine, uracil and tryptophan. When required, these four nutrients were added at 0.01-0.02% w/v. Cells were then harvested by centrifugation to remove the medium and resuspended in selective minimal production media. This medium was used to induce galactose promoters, with additional raffinose as an alternative carbon source. Selective minimal production media composition: 0.13% w/v dropout powder, 0.64% w/v YNB+AS, 2% galactose, 1% w/v raffinose. When appropriate, the same four nutrients as above were added at 0.01-0.02% w/v. The cultures were grown at 30°C, 150 rpm, for the indicated time, and analyzed using GC-FID and/or SPME sampling and GC-MS. Isopropyl myristate (IPM) was added as an overlay corresponding to 10% of the culture volume in samples analyzed with GC-FID.
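The compositions above can be converted into weighing amounts for any batch volume. The following is a minimal sketch, assuming that 1% w/v corresponds to 1 g per 100 mL and that the galactose figure (given only as "2%") is also w/v; the dictionary and function names are illustrative and not part of the protocol.

```python
# Illustrative helper (not part of the original protocol): convert % w/v
# media components into grams for a chosen batch volume.
# Assumption: 1% w/v = 1 g per 100 mL.

# Compositions as stated in the text, in % w/v.
GROWTH_MEDIUM = {
    "dropout powder": 0.13,
    "YNB+AS": 0.67,
    "glucose": 2.0,
}
PRODUCTION_MEDIUM = {
    "dropout powder": 0.13,
    "YNB+AS": 0.64,
    "galactose": 2.0,  # the text gives "2%"; w/v is assumed here
    "raffinose": 1.0,
}


def grams_needed(recipe, volume_ml):
    """Return grams of each component needed for volume_ml of medium."""
    return {name: round(pct * volume_ml / 100.0, 4) for name, pct in recipe.items()}


def overlay_volume_ml(culture_ml, fraction=0.10):
    """Volume of isopropyl myristate overlay (10% of culture volume by default)."""
    return culture_ml * fraction


if __name__ == "__main__":
    for name, grams in grams_needed(PRODUCTION_MEDIUM, 500).items():
        print(f"{name}: {grams} g per 500 mL")
    print(f"IPM overlay for a 5 mL culture: {overlay_volume_ml(5)} mL")
```

Supplements added "when required" (0.01-0.02% w/v) are left out of the dictionaries because the exact value depends on the strain's auxotrophies.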

Cannabinoid extraction and analysis

500 µL of yeast culture was transferred to a 2 mL tube containing approx. 100 mg of 0.5 mm glass beads and 500 µL of ethyl acetate with 0.05% formic acid, vortexed for 3 minutes and then centrifuged for 1 minute at 20,000 g. The procedure was repeated three times; each time, the resulting supernatant was transferred to a separate 1.5 mL tube and 500 µL of ethyl acetate with 0.05% formic acid was added to the original tube. The resulting solution was evaporated using vacuum centrifugation and the resulting pellet was resuspended in 200 µL of HPLC-grade methanol. Insoluble compounds were removed using 0.22 µm Ultrafree centrifugal filters (Merck Millipore, Tullagreen, Ireland).

Qualitative LC-ESI-MS analysis was performed on the Dionex UltiMate® 3000 Quaternary Rapid Separation UHPLC focused system (Thermo Fisher Scientific, Germering, Germany) equipped with a Phenomenex Kinetex XB-C18 column (100 mm × 2.1 mm i.d., 1.7 µm particle size, 100 Å pore size) (Phenomenex, Inc., Torrance, CA, USA). The samples were analyzed according to Hansen, N.L., Miettinen, K., Zhao, Y. et al. Integrating pathway elucidation with yeast engineering to produce polpunonic acid the precursor of the anti-obesity agent celastrol. Microb Cell Fact 19, 15 (2020). https://doi.org/10.1186/s12934-020-1284-9

Example 1:

AtFKI is a superior kinase

To evaluate the performance in yeast of the proposed alcohol phosphorylation pathway, consisting of the Saccharomyces cerevisiae Cki1p and the A. thaliana IPK enzymes (Chatzivassileiou et al 2019), we over-expressed these two genes from plasmid vectors pUUS-GAL1 and pHUS-GAL1, respectively, in S. cerevisiae strain EGY48 (a W303 derivative). As a control, cells expressing no kinase, i.e. transformed with the empty pUUS and pHUS vectors, were used. A concentration of 0.1% (v/v) of the alcohols prenol or isoprenol was used. As an additional control, Cki1p and AtIPK were expressed together in the absence of alcohol.

Yeast cells were grown overnight in synthetic complete minimal glucose selective media (CM-GLU), washed twice and transferred to synthetic complete minimal galactose selective media (CM-GAL) to induce heterologous gene expression. An aliquot of 5 mL of the culture was transferred to a 35 mL vial, which was incubated for approximately 72 h, and the headspace above the culture was analyzed by SPME and GC-MS. We examined the yeast culture headspace for the presence of geraniol, linalool, farnesol or nerolidol, which would indicate a functional pathway. Only very low levels of linalool could be detected, marginally higher than the linalool levels in the control cells. These results suggested that Cki1p or AtIPK could not provide sufficient flux through this pathway, and we searched for better performing enzymes.

We compared the performance of a set of candidate kinases for the first step, which included the enzymes S. flexneri PhoN (SfPhoN), S. cerevisiae Eki1p (ScEKI), S. cerevisiae Cki1p (ScCKI), E. coli ThiM (EcThiM) and A. thaliana FOLK (AtFKI; SEQ ID NO: 2). The corresponding genes were cloned in plasmid vector pUUS and co-expressed with AtIPK from vector pHUS-AtIPK. The strains used were: EGY48, NCTY 7, NCTY 22, NCTY 23, NCTY 24, NCTY 25.

Under the same conditions as described above, when AtFKI and AtIPK were co-expressed, the production of linalool was considerably higher when either prenol or isoprenol was used, indicating that the AtFKI kinase was far more effective than the remaining candidates in increasing the intracellular concentration of the terpene precursor GPP. Thus, linalool production identified the co-expression of AtFKI and AtIPK in yeast as the most efficient combination; AtFKI is superior to other alcohol kinases in phosphorylating prenol and isoprenol in yeast.

Example 2:

A truncated form of AtFKI missing the first 65 amino acids is equally active as full-length AtFKI in yeast

AtFKI is a chloroplastic enzyme and therefore contains an N-terminal chloroplast targeting signal. Consequently, the active form of AtFKI in plant cells is a shorter form of the full-length enzyme. To determine the minimum domain required for AtFKI to be functional in yeast, we evaluated the activity of different truncations. AtFKI with the first 65 amino acids removed and replaced with a start codon (Δ65AtFKI; SEQ ID NO: 2) was expressed to compare the activity of the full native and the truncated version of the enzyme. The full and truncated enzymes were each expressed from a corresponding galactose-inducible pUUS plasmid in combination with AtIPK. Yeast strain EGY48 transformed with these plasmids was grown and analyzed as described in Example 1. The production of linalool was similar whether the full enzyme or the truncated version was expressed. These results demonstrate that the first 65 amino acids of AtFKI are not necessary for full function, and that the region 66-end is a shorter, equally functional form of AtFKI. Strains used: NCTY 7 and NCTY 8.
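The truncation described here (removal of the first 65 residues, with a start methionine added in their place) can be expressed as a simple string operation on the protein sequence. The following is a minimal sketch; the function name is illustrative and the sequence used is a made-up placeholder, not the AtFKI sequence.

```python
# Illustrative sketch: build an N-terminally truncated variant by removing the
# first n residues and adding a start methionine in their place.
# The sequence below is a made-up placeholder, NOT the AtFKI sequence.


def truncate_n_terminus(protein, n_removed):
    """Remove the first n_removed residues and prepend 'M' (start codon)."""
    if n_removed < 0 or n_removed >= len(protein):
        raise ValueError("n_removed must be between 0 and len(protein) - 1")
    return "M" + protein[n_removed:]


if __name__ == "__main__":
    # Hypothetical 72-residue protein: 65 residues of "signal" plus 7 more.
    placeholder = "M" + "A" * 64 + "KLVPRGS"
    truncated = truncate_n_terminus(placeholder, 65)
    print(truncated)       # MKLVPRGS
    print(len(truncated))  # 8
```

At the DNA level the same construct corresponds to dropping the first 65 codons and placing an ATG in front of codon 66.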

Example 3:

An alcohol phosphorylation pathway comprising AtFKI performs well in combination with various phosphokinases.

To evaluate the effect of different phosphokinases in complementing AtFKI to construct a functional isoprenoid precursor supply pathway, AtFKI was expressed together with AtIPK, MtIPK, TaIPK or the TaIPK(204G) mutant, and the headspace was analysed using GC-MS-SPME, as described in Examples 1 and 2. Strains used: NCTY 7, NCTY 9, NCTY 10, and NCTY 11.

The results shown in Fig. 4 (production of linalool) demonstrate that AtFKI works well in combination with various phosphokinases.

Example 4:

To evaluate the effect of the AtFKI alcohol bioconversion pathway on terpenoid production, we analyzed the production of the terpenes limonene, myrcene, and sabinene by co-expression of the corresponding terpene synthase, limonene synthase, myrcene synthase, or sabinene synthase, respectively. The strains used were NCTY 12, NCTY 13 and NCTY 14.

In Fig. 5 the production of limonene using AtFKI and AtIPK is compared to the production achieved in the absence of an alcohol. Further, an additional 80-fold increase in myrcene production was obtained using the alcohol feeding pathway, by co-expression of AtFKI-AtIPK and ObMyrS in yeast. Media with 0.05% prenol and 0.05% isoprenol was used.

Also, an additional 20-fold increase in sabinene production was obtained using the alcohol feeding pathway, by co-expression of AtFKI-AtIPK and SpSabS in yeast cells. Media with 0.05% prenol and 0.05% isoprenol was used.

Furthermore, in Fig. 5, evidence is presented that different ratios and concentrations of prenol and isoprenol can be used to further enhance or control terpene production. The data demonstrate that the concentration and ratio of prenol and isoprenol can be used to control the total production. Prenol:isoprenol ratios of 3:0, 2:1, 1:1 and 1:2 gave similar improvements, approximately 58-fold higher than without alcohol feeding (prenol:isoprenol = 0:0). The overall concentration of alcohols was 0.1%.

Further, limonene yield can be adjusted by feeding alcohols at different concentrations to the strain containing the alcohol feeding pathway and the limonene downstream building block. An alcohol concentration of 0.3% resulted in the highest limonene titer, approximately 40-fold higher than without alcohol feeding (alcohol concentration = 0%). The prenol:isoprenol ratio was 1:1.
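The individual alcohol concentrations behind each feeding condition follow directly from the total concentration and the prenol:isoprenol ratio. The following is a minimal sketch of that arithmetic; the function name is illustrative, and the ratios and the 0.1% total are the ones quoted above.

```python
# Illustrative sketch: split a total alcohol concentration (in %) into prenol
# and isoprenol concentrations according to a prenol:isoprenol ratio.


def split_by_ratio(total_percent, prenol_parts, isoprenol_parts):
    """Return (prenol %, isoprenol %) for a given total and part ratio."""
    total_parts = prenol_parts + isoprenol_parts
    if total_parts == 0:
        # Ratio 0:0 means no alcohol is fed at all.
        return 0.0, 0.0
    prenol = round(total_percent * prenol_parts / total_parts, 4)
    isoprenol = round(total_percent * isoprenol_parts / total_parts, 4)
    return prenol, isoprenol


if __name__ == "__main__":
    for ratio in [(3, 0), (2, 1), (1, 1), (1, 2), (0, 0)]:
        prenol, isoprenol = split_by_ratio(0.1, *ratio)
        print(f"{ratio[0]}:{ratio[1]} -> prenol {prenol}%, isoprenol {isoprenol}%")
```

For example, a 1:1 ratio at 0.1% total corresponds to 0.05% prenol and 0.05% isoprenol.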

In Fig. 5a and 5b, a similar increase in production of terpenes is demonstrated with AtFKI and AtIPK co-expressed with myrcene synthase or sabinene synthase, respectively.

When producing limonene, the production is increased over 25-fold when prenol and isoprenol are used, compared to a strain lacking AtFKI (state-of-the-art production through isoprenoid precursors synthesized from the mevalonate pathway). Similarly, a 20-fold increase is found for sabinene.

Example 5:

AtFKI boosts cannabinoid production from prenol and isoprenol.

Cannabigerolic acid (CBGA) is a central intermediate in the synthesis of natural cannabinoids. Co-expression of the alcohol feeding pathway AtFKI-AtIPK and the CBGA-synthesizing fusion Erg20p(N127W)-CsPT4 in yeast cells (strain NCTY 19) resulted in 7.5-fold increased CBGA production (pink) compared to the production titer without feeding alcohol (black). Samples were analyzed in triplicate (n = 3 biologically independent samples). Olivetolic acid was added to the cultures at 0.1 mM.
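A fold increase of this kind is the ratio of the mean titer with alcohol feeding to the mean titer without it, taken over the replicate cultures. The following is a minimal sketch of that calculation; the function name is illustrative and the numbers are arbitrary placeholders chosen for the example, not measured values from this study.

```python
# Illustrative sketch: fold change between two conditions measured in
# triplicate. All numbers below are arbitrary placeholders, NOT data from
# the study.
from statistics import mean, stdev


def fold_change(treated, control):
    """Ratio of the mean of the treated replicates to the mean of the controls."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean is zero; fold change is undefined")
    return mean(treated) / control_mean


if __name__ == "__main__":
    fed = [12.0, 10.0, 14.0]   # hypothetical titers with alcohol feeding
    unfed = [3.0, 2.0, 4.0]    # hypothetical titers without alcohol feeding
    print(f"fed:   {mean(fed):.1f} +/- {stdev(fed):.1f}")
    print(f"unfed: {mean(unfed):.1f} +/- {stdev(unfed):.1f}")
    print(f"fold change: {fold_change(fed, unfed):.1f}")
```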

Example 6:

Synthesis of non-canonical terpenoids using AtFKI.

We tested the ability of AtFKI to phosphorylate additional alcohols, beyond prenol or isoprenol, by co-expressing AtFKI with AtIPK and ClLimS in EGY48 cells (giving rise to strain NCTY 14).

We supplemented the media with different alcohols and found that several of them were converted into non-canonical terpene-like structures.

The data of Fig. 8 demonstrate that novel compounds are produced when prenol-like alcohols with additional methyl groups are converted by AtFKI + AtIPK + ClLimS.

Based on the alcohols that were converted and the mass spectra of the novel compounds, suggested structures are shown for the peaks.

In addition to the examples shown above, novel compounds have also been found when several other alcohols were tested with AtFKI and AtIPK (see Fig. 9).

Based on these observations, we propose that AtFKI can phosphorylate the core alcohol structure shown in Fig. 10.

Example 7:

Protein engineering of the terpene synthase facilitates the production of non-canonical terpenes.

Since several of the alcohols tested yielded only limited amounts of novel terpenoids, we examined whether this limitation in the production of non-canonical terpenoids is due to the inability of the terpene synthase, in this case limonene synthase, to accept the non-canonical substrate. To evaluate this, we subjected limonene synthase to site-directed mutagenesis, targeting specific residues whose substitution could expand the active site cavity and accommodate the larger substrate. The strains used in this study were: NCTY 26, NCTY 27 and NCTY 28.

The data in Fig. 11 demonstrate that terpene synthases can indeed be mutated to utilize the novel non-canonical building block instead of the canonical building block, i.e., the mutants favour the larger non-canonical substrates over the canonical GPP.

Example 8:

Synthesis of non-canonical cannabinoids.

We explored whether AtFKI can provide non-canonical precursors for the synthesis of cannabinoids with unprecedented structures. Fig. 12 demonstrates the capability of the system to convert a range of alcohols into the corresponding diphosphates, which are then attached to olivetolic acid to yield cannabigerolic acid (CBGA) analogues with novel non-canonical structures.

Four different alcohols were each fed to the yeast strain (NCTY 21) containing the alcohol feeding pathway AtFKI-AtIPK and the CBGA downstream building block CsPT4-Erg20p(N127W), producing a blend of non-canonical CBGA analogues (red), compared to the same strain without alcohol feeding (green). Yeast cells expressing Erg20p(N127W)-CsPT4 alone (NCTY 20; blue) and AtFKI-AtIPK alone (NCTY 21b; purple) are shown as control 1 and control 2, respectively.

a, 3-methylpent-2-en-1-ol was used as feed to produce compound 1 (2,4-dihydroxy-3-(3-methylpent-2-en-1-yl)-6-pentylbenzoic acid), compound 2 (3-((2E)-3,7-dimethylnona-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid) and compound 3 (2,4-dihydroxy-6-pentyl-3-(3,4,7-trimethylnona-2,6-dien-1-yl)benzoic acid).

b, 3,4-dimethylpent-2-en-1-ol was used as feed to produce compound 4 (3-(3,4-dimethylpent-2-en-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid) and compound 5 (2,4-dihydroxy-6-pentyl-3-((2E)-3,7,8-trimethylnona-2,6-dien-1-yl)benzoic acid).

c, 3-methylhex-2-en-1-ol was used as feed to produce compound 6 ((E)-2,4-dihydroxy-3-(3-methylhex-2-en-1-yl)-6-pentylbenzoic acid) and compound 7 (3-((2E,6E)-3,7-dimethyldeca-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid).

d, 3-ethylpent-2-en-1-ol was used as feed to produce compound 8 ((E)-3-(7-ethyl-3-methylnona-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid).

Example 9:

Production of non-canonical terpenes with 16 and 17 carbon atoms using AtFKI.

This example illustrates the biosynthesis of non-canonical sesquiterpene scaffolds with 16 and 17 carbon atoms. Fig. 13 shows the production of non-canonical sesquiterpene scaffolds with 16 and 17 carbons by co-expression of the alcohol conversion pathway AtFKI and AtIPK, together with the farnesyl pyrophosphate (FPP) synthase Erg20p and the terpene synthase CYC2 in S. cerevisiae strain EGY48. The cells were grown according to the conditions described in Example 6.

To improve the product yield of the non-canonical C16 and C17 sesquiterpene precursors, we selected and tested several wild-type and mutant FPP synthases and geranylgeranyl pyrophosphate (GGPP) synthases. Fig. 14 shows that Erg20p(F96C) is the most effective enzyme to increase the yield of C16 and C17 sesquiterpenes when feeding 3-methylbut-2-en-1-ol (3M2E). Similarly, Erg20p(F96C) is also the most efficient of the enzymes tested for improving the production of C16 and C17 sesquiterpenes when feeding 3,4-dimethylpent-2-en-1-ol (3,4-DMP) (Fig. 15) and 3-methylpent-2-en-1-ol (3-MP) (Fig. 16). However, SynGGPPS was found to be the most efficient enzyme tested to increase the yield of C17 sesquiterpenes when feeding 3-ethylpent-2-en-1-ol (3E2E) (Fig. 17).

The data in Example 9 demonstrate that a yeast cell expressing AtFKI is capable of phosphorylating the core alcohol structures shown in Figures 13-17 (3-methylbut-2-en-1-ol (3M2E), 3,4-dimethylpent-2-en-1-ol (3,4-DMP), 3-methylpent-2-en-1-ol (3-MP), 3-ethylpent-2-en-1-ol (3E2E)) to produce C16 and C17 (non-canonical) terpenes.

Example 10:

This example illustrates the biosynthesis of a non-canonical sesquiterpene by the action of the enzyme Salvia fruticosa caryophyllene synthase (Sf126) on C16 prenyl diphosphate substrates produced by the alcohol conversion pathway. The alcohol conversion pathway enzymes AtFKI and AtIPK were co-produced in yeast EGY48 cells with the FPP synthase Erg20p(F96C) and the terpene synthase Sf126. 3-methylpent-2-en-1-ol (3-MP) was supplied (Fig. 18).

This example demonstrates that AtFKI can phosphorylate the core alcohol structure 3-methylpent-2-en-1-ol (3-MP), leading to the production of a non-canonical C16 sesquiterpene.

Example 11:

This example illustrates the biosynthesis of non-canonical triterpene scaffolds. Fig. 19 shows the production of C31 (homo)squalene by co-expression of the alcohol conversion pathway AtFKI-AtIPK and the FPP synthase Erg20p(F96C) upon feeding 3-methylpent-2-en-1-ol (3-MP) and prenol. The yeast strain EGY48 was used and the cells were cultured under the conditions described in Example 6.
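The carbon count of such products follows from simple bookkeeping: canonical squalene (C30) is formed from two farnesyl units (C15), each assembled from three C5 units. The following is a rough sketch of that bookkeeping, assuming for illustration that 3-methylpent-2-en-1-ol is incorporated as an intact C6 unit in place of a C5 unit; the function names are illustrative, and the enumeration lists which carbon totals are arithmetically possible, not which products are actually formed.

```python
# Illustrative carbon bookkeeping for prenyl chains built from C5 (prenol or
# isoprenol derived) and C6 (assumed: 3-methylpent-2-en-1-ol derived) units.
from itertools import combinations_with_replacement


def chain_carbon_counts(unit_sizes, n_units):
    """All possible carbon totals for a chain made of n_units building blocks."""
    combos = combinations_with_replacement(unit_sizes, n_units)
    return sorted({sum(combo) for combo in combos})


def squalene_like_counts(farnesyl_counts):
    """Carbon totals obtained by joining two farnesyl-type (three-unit) chains."""
    return sorted({a + b for a in farnesyl_counts for b in farnesyl_counts})


if __name__ == "__main__":
    farnesyl = chain_carbon_counts((5, 6), 3)
    print("farnesyl-type chains:", farnesyl)
    print("squalene-type products:", squalene_like_counts(farnesyl))
```

Under these assumptions a C31 product corresponds to one C16 farnesyl-type unit (two C5 units plus one C6 unit) joined to one canonical C15 unit.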

Example 12:

Example 12 illustrates the biosynthesis of non-canonical triterpenes. Fig. 20 shows the production of a non-canonical triterpenoid product by co-expression of the alcohol conversion pathway AtFKI-AtIPK and cucurbitadienol synthase CPQ. Fig. 21 shows the production of another non-canonical triterpenoid product by co-expression of the alcohol conversion pathway AtFKI-AtIPK and (+)-ambrein synthase BmeTC(373C).

In conclusion, Examples 9-12 demonstrate that yeast cells expressing AtFKI are capable of phosphorylating a variety of primary alcohols and generating non-canonical triterpenoids.

ITEMS

Item 1. A genetically engineered eukaryotic cell for the production of a terpene or terpenoid or isoprenoid comprising a first nucleic acid sequence encoding a first kinase that phosphorylates a primary alcohol to a mono- or pyrophosphate terpenoid precursor; and optionally a second nucleic acid sequence encoding a phosphokinase that phosphorylates a monophosphate precursor to a terpenoid pyrophosphate precursor; wherein the first kinase comprises SEQ ID NO: 2 or a homologue or variant thereof having at least 75% identity thereto.

Item 2. The cell according to item 1 wherein the first kinase is an alcohol kinase capable of phosphorylating a primary alcohol to a monophosphate terpenoid precursor.

Item 3. The cell according to either item 1 or 2 wherein said first nucleic acid sequence encodes a kinase that is capable of phosphorylating a non-canonical, prenol-like primary alcohol to a non-canonical monophosphate terpenoid precursor.

Item 4. The cell according to any of the preceding items, wherein said first nucleic acid encodes a kinase that also has phosphokinase activity thus being capable of catalyzing the conversion of a primary alcohol to a terpenoid pyrophosphate precursor.

Item 5. The cell according to any one of the preceding items, wherein the alcohol kinase is Farnesol kinase of Arabidopsis thaliana or SEQ ID NO: 1 or a homolog or variant thereof having at least 75% identity thereto.

Item 6. The cell according to any one of the preceding items, wherein the phosphokinase is a prenyl phosphate kinase, such as an isopentenyl phosphate kinase.

Item 7. The cell according to item 6, wherein the phosphokinase is SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, or SEQ ID NO: 6 or a homologue or variant thereof having at least 75% identity thereto.

Item 8. The cell according to any one of the preceding items wherein the kinase or kinases are fused to one or more peptide or peptide analogues.

Item 9. The cell according to item 8, wherein said one or more peptide or peptide analogues confer additional functionality to the kinase or kinases, such as improved enzyme kinetics, or intracellular localisation functionality, or the functionality of increasing the stability, promiscuity and / or half-life of the kinase or kinases.

Item 10. The cell according to item 9, wherein the peptide or peptide analogue is maltose-binding protein, green fluorescent protein, thioredoxin, glutathione S-transferase, yeast farnesyl diphosphate synthase (Erg20p), NusA, small ubiquitin related modifier Smt3, or a fragment thereof.

Item 11. The cell according to any one of the preceding items, wherein the primary alcohol is an alcohol with the structure of formula 1:

Formula 1:

wherein R₁ is hydrogen, an alkane-, an alkene-, an alkyne-, a benzene derivative-, a cyclic group, a branched group, a group containing a reactive nonmetal; a group containing a metalloid; a group containing a halogen, a group containing oxygen, a group containing nitrogen, a group containing sulphur, a group containing phosphorus and / or a group containing boron;

R₂ is hydrogen, an alkane-, an alkene-, an alkyne-, a benzene derivative-, a cyclic group, a branched group, a group containing a reactive nonmetal; a group containing a metalloid; a group containing a halogen, a group containing oxygen, a group containing nitrogen, a group containing sulphur, a group containing phosphorus and / or a group containing boron; and

Item 12. The cell according to item 11, wherein the primary alcohol is 3-methylbut-2-en-1-ol, 4-fluoro-3-methylbut-2-en-1-ol, 3-methylpent-2-en-1-ol, 3,4-dimethylpent-2-en-1-ol, 3-ethylpent-2-en-1-ol, 3-methylhex-2-en-1-ol, 3-methylhexa-2,5-dien-1-ol, 3-methylbut-3-en-1-ol, 3-methylenepentan-1-ol, 2-methylprop-2-en-1-ol, 3-methyl-4-(methylthio)but-2-en-1-ol, 5-chloro-3-methylpent-2-en-1-ol, 3,4-dimethylpent-2-en-1-ol, 4-methyl-3-methylenepentan-1-ol, 3,4-dimethylpent-3-en-1-ol, propan-1-ol, prop-2-en-1-ol, prop-2-yn-1-ol, butan-1-ol, but-3-en-1-ol, but-2-en-1-ol, buta-2,3-dien-1-ol, but-3-yn-1-ol, 3-methylbut-3-en-1-ol, 3-methylbut-2-en-1-ol, 3-methylbutan-1-ol, but-2-yn-1-ol, 2-methylenebutan-1-ol, 2-methylbut-2-en-1-ol, 2-methylbut-3-en-1-ol, 2-methylbutan-1-ol, 3-ethylpent-4-en-1-ol, 3-methylpenta-2,4-dien-1-ol, 3-methylpentan-1-ol, 3-methylpent-2-en-1-ol, 3-methylenepentan-1-ol, 3-methylpent-3-en-1-ol, 3-ethylpentan-1-ol, 3-ethylpent-4-en-1-ol, 3-ethylpent-3-en-1-ol, 3-ethylpent-2-en-1-ol, 3-ethylpent-4-yn-1-ol, 3-methylenepent-4-en-1-ol, 3-methylpent-4-yn-1-ol, 3-methylenepent-4-yn-1-ol, 3-methylhexan-1-ol, 3-methylhex-2-en-1-ol, 3-methylenehexan-1-ol, 3-methylhex-3-en-1-ol, 3-methylhex-5-en-1-ol, 3-methylpent-2-en-4-yn-1-ol, 3-methylhex-4-en-1-ol, 3-methylhexa-2,4-dien-1-ol, 3-methylhexa-2,5-dien-1-ol, 3-methylheptan-1-ol, 3-methylhepta-2,4-dien-1-ol, 3-methyleneheptan-1-ol, 3-methylhept-3-en-1-ol, 3-methylhept-4-en-1-ol, 3-methylhept-5-en-1-ol, 3-methylhept-6-en-1-ol, 3-methylhept-2-en-1-ol, 3-methylenehept-4-en-1-ol, 3-methylhepta-3,4-dien-1-ol, 3-methylhept-4-en-1-ol, 3-methylhepta-5,6-dien-1-ol, 3-methylhepta-2,5-dien-1-ol, 3-methylhepta-2,6-dien-1-ol, 3-methylenehept-6-en-1-ol, 3-methylenehept-5-en-1-ol, 3-methylhepta-3,5-dien-1-ol, 3-methylhepta-4,5-dien-1-ol, 3-methylhepta-3,5,6-trien-1-ol, 3-methylhepta-4,6-dien-1-ol, 3-methylhepta-2,4,6-trien-1-ol, 3-methylhepta-4,6-dien-1-ol, 3-methyloctan-1-ol, 3-methyloct-2-en-1-ol, 3-methyleneoctan-1-ol, 3-methyloct-3-en-1-ol, 3-methyloct-4-en-1-ol, 3-methyloct-5-en-1-ol, 3-methyloct-7-en-1-ol, 3-methylocta-2,4-dien-1-ol, 3-methyleneocta-4,5-dien-1-ol, 3-methyleneoct-4-en-1-ol, 3-methylocta-3,4-dien-1-ol, 3-methyloct-6-en-1-ol, 3-methylocta-2,4,5-trien-1-ol, 3-methylocta-2,4,6-trien-1-ol, 3-methyleneocta-4,6-dien-1-ol, 3-methylocta-4,5-dien-1-ol, 3-methylocta-5,6-dien-1-ol, 3-methylocta-6,7-dien-1-ol, 3-methyleneocta-5,7-dien-1-ol, 3-methylocta-2,5,7-trien-1-ol, 3-methyleneocta-4,7-dien-1-ol, 3-methylocta-2,4,7-trien-1-ol, 3-methylocta-4,5,7-trien-1-ol, 3-methylocta-3,4,6-trien-1-ol, 3-methylocta-3,4,7-trien-1-ol, 3-fluorobut-2-en-1-ol, 3-chlorobut-2-en-1-ol, 3-bromobut-2-en-1-ol, 3-aminobut-2-en-1-ol, 3-phosphaneylbut-2-en-1-ol, 3-fluorobut-3-en-1-ol, 3-chlorobut-3-en-1-ol, 3-bromobut-3-en-1-ol, 3-aminobut-3-en-1-ol, 3-phosphaneylbut-3-en-1-ol, 4-chloro-3-methylbut-2-en-1-ol, 4-bromo-3-methylbut-2-en-1-ol, 4-hydroxy-2-methylbut-2-enal, 2-methylbut-2-ene-1,4-diol, 4-mercapto-3-methylbut-2-en-1-ol, 3-methylpent-2-en-1-ol, 4-amino-3-methylbut-2-en-1-ol, 3-(fluoromethyl)but-3-en-1-ol, 4-fluoro-3-methylbut-2-en-1-ol, 3-(chloromethyl)but-3-en-1-ol, 3-(bromomethyl)but-3-en-1-ol, 4-hydroxy-2-methylenebutanal, 2-methylenebutane-1,4-diol, 3-(mercaptomethyl)but-3-en-1-ol, 3-methylenepentan-1-ol, 3-(aminomethyl)but-3-en-1-ol, 3-(phosphaneylmethyl)but-3-en-1-ol, 3-methyl-4-phosphaneylbut-2-en-1-ol, 5-fluoro-3-methylpent-2-en-1-ol, 5-bromo-3-methylpent-2-en-1-ol, 5-chloro-3-methylpent-2-en-1-ol, 3-methylpent-2-ene-1,5-diol, 5-hydroxy-3-methylpent-3-enal, 5-iodo-3-methylpent-2-en-1-ol, 3-methyl-4-(methylthio)but-2-en-1-ol, 5-mercapto-3-methylpent-2-en-1-ol, 5-amino-3-methylpent-2-en-1-ol, or 3-methyl-5-phosphaneylpent-2-en-1-ol or an analogue of any of these compounds including analogues with the elements nitrogen, oxygen, fluorine, silicon, phosphorus, sulphur, chlorine, selenium, boron, iodine, lithium, sodium or potassium.

Item 13. The cell according to item 11, wherein the primary alcohol is prenol, isoprenol or a prenol-like alcohol.

Item 14. The cell according to any one of the preceding items, further comprising a further exogenous nucleic acid sequence enabling increased expression of an enzyme capable of catalysing the production of canonical and / or non-canonical terpenes, terpenoids, isoprenoids or structures containing isoprenoid groups.

Item 15. The cell according to item 14 wherein the exogenous nucleic acid sequence enabling increased expression encodes a terpene synthase enzyme such as a monoterpene synthase, a sesquiterpene synthase, a diterpene synthase, a sesterterpene synthase or a triterpene synthase or a fragment thereof; or a prenyltransferase enzyme or a fragment thereof; or other enzymes or a fragment thereof capable of catalysing the production of canonical or non-canonical terpenes, terpenoids, isoprenoids or structures containing isoprenoid groups.

Item 16. The cell according to item 15 wherein the terpene synthase or the prenyl transferase or other enzyme is capable of using non-canonical terpenoid or isoprenoid building blocks as substrate.

Item 17. The cell according to either item 15 or 16 wherein the terpene synthase enzyme or fragment thereof or a prenyl transferase enzyme or fragment thereof or other enzymes or a fragment thereof capable of catalysing the production of canonical terpenes, terpenoids, isoprenoids or structures containing isoprenoid groups, comprises a change in the amino acid sequence that enables improved enzyme kinetics for utilisation of non-canonical terpenoid or isoprenoid building blocks.

Item 18. The cell according to any one of the preceding items, said cell being capable of production of a terpene or terpenoid selected from the group comprising:

Limonene, myrcene, alpha-pinene, sabinene, beta-pinene, 1,8-cineole, tricyclene, alpha-thujene, alpha-fenchene, camphene, delta-2-carene, alpha-phellandrene, 3-carene, 1,4-cineole, alpha-terpinene, beta-phellandrene, (Z)-beta-ocimene, (E)-beta-ocimene, gamma-terpinene, terpinolene, linalool, perillene, allo-ocimene, cis-beta-terpineol, cis-terpine-1-ol, isoborneol, delta-terpineol, borneol, chrysanthemol, lavandulol, alpha-terpineol, nerol, geraniol, alpha-humulene, beta-caryophyllene, valencene, amorpha-4,11-diene, alpha-patchoulene, alpha-santalane, beta-santalene, valerenol, obtusadiene, laurencenone A, taxadiene, miltiradiene, sclareol, casbene, cannabigerolic acid, grifolic acid, daurichromenic acid, confluentin, rhododaurichromenic acids A and B, anthopogocyclolic acid, anthopogochromenic acid, cannabiorcichromenic acid, cannabiorcicyclolic acid, cis-perrottetinene, (-)-cis-perrottetinenic acid, 7-ethyl-3-methylenenona-1,6-diene, 7-methyl-3-methylenenona-1,6-diene, 7,8-dimethyl-3-methylenenona-1,6-diene, 7-methyl-3-methylenedeca-1,6-diene, 1-methyl-4-(3-methylbut-1-en-2-yl)cyclohex-1-ene, 4-(but-1-en-2-yl)-1-methylcyclohex-1-ene, 1-methyl-4-(pent-1-en-2-yl)cyclohex-1-ene, 1-methyl-4-(pent-2-en-3-yl)cyclohex-1-ene, 2,4-dihydroxy-3-(3-methylpent-2-en-1-yl)-6-pentylbenzoic acid, 2,4-dihydroxy-3-(3-methylhex-2-en-1-yl)-6-pentylbenzoic acid, 3-(4-fluoro-3-methylbut-2-en-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid, 3-(7-ethyl-3-methylnona-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid, 3-(3,7-dimethylnona-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid, 2,4-dihydroxy-6-pentyl-3-(3,4,7-trimethylocta-2,6-dien-1-yl)benzoic acid, 2,4-dihydroxy-6-pentyl-3-(3,4,7-trimethylnona-2,6-dien-1-yl)benzoic acid, 3-(3-ethylpent-2-en-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid, 2,4-dihydroxy-6-pentyl-3-(3,7,8-trimethylnona-2,6-dien-1-yl)benzoic acid, 3-(3,7-dimethyldeca-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid, and 3-(8-fluoro-3,7-dimethylocta-2,6-dien-1-yl)-2,4-dihydroxy-6-pentylbenzoic acid.

Item 19. The cell according to any one of the preceding items wherein the eukaryotic cell is a yeast cell, such as a Saccharomyces spp., Saccharomyces cerevisiae, Schizosaccharomyces pombe, Scheffersomyces stipitis, Pichia pastoris, Hansenula polymorpha (syn. Ogataea parapolymorpha), Kluyveromyces marxianus, Yarrowia lipolytica, Kluyveromyces lactis, or Dekkera bruxellensis cell.

Item 20. The cell according to any one of items 1 - 18 wherein the eukaryotic cell is a filamentous fungal cell, such as a cell derived from Aspergillus niger, Aspergillus oryzae, Aspergillus terreus, Neurospora crassa, or Trichoderma reesei.

Item 21. The cell according to any one of items 1 - 18, wherein the eukaryotic cell is an algal cell, such as a microalgal cell such as Nannochloropsis gaditana, Nannochloropsis oceanica, Nannochloropsis salina, Chlamydomonas reinhardtii, Arthrospira, Chlorella vulgaris, Dunaliella salina, Haematococcus pluvialis, Phaeodactylum tricornutum, or Isochrysis galbana.

Item 22. A method for production of a terpene, terpenoid or an isoprenoid, said method comprising the steps of:

- providing an engineered eukaryotic cell comprising an exogenous DNA sequence coding for a primary alcohol kinase, and

- culturing said engineered cell in a medium containing a primary alcohol.

Item 23. The method according to item 22 wherein said cell further comprises an exogenous nucleic acid sequence coding for a phosphokinase.

Item 24. The method according to either of items 22 or 23 wherein the exogenous DNA sequence coding for a primary alcohol kinase encodes an alcohol kinase comprising SEQ ID NO: 2 or a homolog or variant thereof having at least 75% identity thereto.

Item 25. The method according to either of items 22 or 23, wherein the exogenous DNA sequence coding for a primary alcohol kinase encodes an alcohol kinase comprising SEQ ID NO: 1 or a homolog or variant thereof having at least 75% identity thereto.

Item 26. The method according to any one of items 22 - 25 wherein the primary alcohol is an alcohol with a structure according to item 11.

Item 27. The method according to item 26 wherein the primary alcohol is an alcohol of item 12.

Item 28. The method according to any one of items 22 - 27 wherein the primary alcohol is at an initial concentration within a range of 0.01% to 1% v/v, such as within a range of 0.05% to 0.6% v/v, such as within a range of 0.1% to 0.3% v/v, such as 0.1% v/v.
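As an illustration of the initial concentrations recited in item 28, the following minimal Python sketch converts a target % v/v into the volume of neat primary alcohol to add for a given final culture volume, and reports the narrowest of the three recited ranges that a chosen concentration falls into. The function names, the range labels and the 50 mL culture volume used in the usage note are assumptions made for this example only; they do not appear in the source text.

```python
# Illustrative sketch only. Function names and labels are hypothetical;
# the numeric ranges are the ones recited in item 28 (% v/v).
RANGES_PERCENT_VV = [
    ("0.1% to 0.3% v/v", 0.1, 0.3),    # narrowest recited range
    ("0.05% to 0.6% v/v", 0.05, 0.6),  # intermediate recited range
    ("0.01% to 1% v/v", 0.01, 1.0),    # broadest recited range
]


def alcohol_volume_ml(final_volume_ml: float, percent_vv: float) -> float:
    """Return the volume of neat alcohol (mL) that makes up percent_vv %
    of the final culture volume (volume of alcohol / total volume)."""
    if final_volume_ml <= 0:
        raise ValueError("final_volume_ml must be positive")
    if percent_vv <= 0:
        raise ValueError("percent_vv must be positive")
    return final_volume_ml * percent_vv / 100.0


def narrowest_range(percent_vv: float):
    """Return the label of the narrowest recited range containing
    percent_vv, or None if it lies outside all recited ranges."""
    for label, low, high in RANGES_PERCENT_VV:
        if low <= percent_vv <= high:
            return label
    return None
```

For example, under these assumptions a 50 mL culture at 0.1% v/v corresponds to 0.05 mL (50 µL) of alcohol, and `narrowest_range(0.1)` selects the 0.1% to 0.3% v/v range.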

Claims


1. A genetically engineered eukaryotic cell for the production of a terpene or a terpenoid or an isoprenoid comprising a first nucleic acid sequence encoding a first kinase that phosphorylates a primary alcohol to a mono- or pyrophosphate terpenoid precursor; wherein the first kinase comprises SEQ ID NO: 2 or a homologue or variant thereof having at least 75% identity thereto.

2. The cell according to claim 1, wherein the cell further comprises a second nucleic acid sequence encoding a phosphokinase that phosphorylates a monophosphate precursor to a terpenoid pyrophosphate precursor.

3. The cell according to any one of the preceding claims wherein the first kinase is an alcohol kinase capable of phosphorylating a primary alcohol to a monophosphate terpenoid precursor.

4. The cell according to any one of the preceding claims, wherein said first nucleic acid sequence encodes a kinase that is capable of phosphorylating a non-canonical, prenol-like primary alcohol to a non-canonical monophosphate terpenoid precursor.

5. The cell according to any one of the preceding claims, wherein said first nucleic acid encodes a kinase that also has phosphokinase activity thus being capable of catalyzing the conversion of a primary alcohol to a terpenoid pyrophosphate precursor.

6. The cell according to any one of claims 2 to 5, wherein the phosphokinase is a prenyl phosphate kinase, such as an isopentenyl phosphate kinase.

7. The cell according to any one of claims 2 to 6, wherein the phosphokinase is SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, or SEQ ID NO: 6 or a homologue or variant thereof having at least 75% identity thereto.

8. The cell according to any one of the preceding claims, wherein the primary alcohol is an alcohol with the structure of formula 1:

Formula 1:

wherein R₁ is hydrogen, an alkane-, an alkene-, an alkyne-, a benzene derivative-, a cyclic group, a branched group, a group containing a reactive nonmetal; a group containing a metalloid; a group containing a halogen, a group containing oxygen, a group containing nitrogen, a group containing sulphur, a group containing phosphorus and / or a group containing boron;

R₂ is hydrogen, an alkane-, an alkene-, an alkyne-, a benzene derivative-, a cyclic group, a branched group, a group containing a reactive nonmetal; a group containing a metalloid; a group containing a halogen, a group containing oxygen, a group containing nitrogen, a group containing sulphur, a group containing phosphorus and / or a group containing boron; and

9. The cell according to any one of the preceding claims, wherein the primary alcohol is prenol, isoprenol or a prenol-like alcohol.

10. The cell according to any one of the preceding claims wherein the eukaryotic cell is a yeast cell, such as a Saccharomyces spp., Saccharomyces cerevisiae, Schizosaccharomyces pombe, Scheffersomyces stipitis, Pichia pastoris, Hansenula polymorpha (syn. Ogataea parapolymorpha), Kluyveromyces marxianus, Yarrowia lipolytica, Kluyveromyces lactis, or Dekkera bruxellensis cell.

11. The cell according to any one of claims 1 - 9 wherein the eukaryotic cell is a filamentous fungal cell, such as a cell derived from Aspergillus niger, Aspergillus oryzae, Aspergillus terreus, Neurospora crassa, or Trichoderma reesei.

12. The cell according to any one of claims 1 - 9, wherein the eukaryotic cell is an algal cell, such as a microalgal cell such as Nannochloropsis gaditana, Nannochloropsis oceanica, Nannochloropsis salina, Chlamydomonas reinhardtii, Arthrospira, Chlorella vulgaris, Dunaliella salina, Haematococcus pluvialis, Phaeodactylum tricornutum, or Isochrysis galbana.

13. A method for production of a terpene, a terpenoid or an isoprenoid, said method comprising the steps of:

- providing an engineered eukaryotic cell comprising an exogenous DNA sequence coding for a first kinase, wherein the first kinase comprises SEQ ID NO: 2 or a homologue or variant thereof having at least 75% identity thereto, and

- culturing said engineered cell in a medium containing a primary alcohol.

14. The method according to claim 13 wherein said cell further comprises an exogenous nucleic acid sequence coding for a phosphokinase.

15. The method according to any one of claims 13 - 14 wherein the primary alcohol is an alcohol with a structure according to claim 8.