US20230148463A9 - Method for producing albicanol and/or drimenol - Google Patents
Method for producing albicanol and/or drimenol
Info
- Publication number
- US20230148463A9 (application US17/673,465)
- Authority
- US
- United States
- Prior art keywords
- seq
- sequence
- polypeptide
- albicanol
- sesquiterpene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
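The entries below give SMILES strings for (+)-albicanol and drimenol. As a quick consistency check, the following Python sketch computes a molecular formula from such a string, including implicit hydrogens. It shows that both alcohols are C15H26O isomers, differing in where the double bond sits: an exocyclic methylene (`C(=C)`) in albicanol and a ring double bond (`C(C)=C`) in drimenol. This is a minimal sketch, not a general SMILES parser. It handles only the features used in these entries: organic-subset atoms C, N, O, P and S, bracket atoms with an explicit H count, branches, single-digit ring closures and bond symbols. Aromatic atoms, charges, isotopes and `%nn` ring closures are not supported. The helper names `molecular_formula` and `hill_formula` are hypothetical.

```python
from collections import Counter

# Default valences of the SMILES "organic subset" atoms handled by this sketch.
DEFAULT_VALENCE = {"C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5), "S": (2, 4, 6)}
# "/" and "\" only mark double-bond geometry; they are single bonds.
BOND_ORDER = {"-": 1, "/": 1, "\\": 1, "=": 2, "#": 3}


def molecular_formula(smiles: str) -> Counter:
    """Count atoms, including implicit hydrogens, in a simple SMILES string."""
    symbols, fixed_h, used = [], [], []  # per atom: element, bracket H count, bond-order sum
    branch_stack, rings = [], {}         # open branches; ring digit -> (atom, bond order)
    prev, pending = None, None           # last atom; bond order written before the next atom

    def add_atom(symbol, h):
        nonlocal prev, pending
        symbols.append(symbol)
        fixed_h.append(h)
        used.append(0)
        idx = len(symbols) - 1
        if prev is not None:  # bond the new atom to the previous one
            order = pending or 1
            used[prev] += order
            used[idx] += order
        prev, pending = idx, None

    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            end = smiles.index("]", i)
            body = smiles[i + 1:end]
            symbol = body[0] + (body[1] if len(body) > 1 and body[1].islower() else "")
            rest = body[len(symbol):]  # chirality marks and H count, e.g. "@@H"
            h = 0
            if "H" in rest:
                j = rest.index("H") + 1
                h = int(rest[j]) if j < len(rest) and rest[j].isdigit() else 1
            add_atom(symbol, h)
            i = end + 1
            continue
        if ch in DEFAULT_VALENCE:
            add_atom(ch, None)
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in rings:  # second occurrence of the digit closes the ring
                other, order = rings.pop(ch)
                order = pending or order or 1
                used[other] += order
                used[prev] += order
            else:            # first occurrence opens it
                rings[ch] = (prev, pending)
            pending = None
        else:
            raise ValueError(f"unsupported SMILES character {ch!r}")
        i += 1

    counts = Counter(symbols)
    hydrogens = 0
    for symbol, h, v in zip(symbols, fixed_h, used):
        if h is None:  # implicit H: fill up to the smallest default valence that fits
            target = next((val for val in DEFAULT_VALENCE[symbol] if val >= v), None)
            if target is None:
                raise ValueError(f"{symbol} exceeds its default valence")
            h = target - v
        hydrogens += h
    counts["H"] += hydrogens
    return counts


def hill_formula(counts: Counter) -> str:
    """Hill order: C, then H, then the remaining elements alphabetically."""
    order = ["C", "H"] + sorted(el for el in counts if el not in ("C", "H"))
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order if counts[el])


if __name__ == "__main__":
    albicanol = "OC[C@H]1C(=C)CC[C@H]2C(C)(C)CCC[C@@]21C"
    drimenol = "CC1(C)CCC[C@]2(C)[C@@H](CO)C(C)=CC[C@H]21"
    print(hill_formula(molecular_formula(albicanol)))  # C15H26O
    print(hill_formula(molecular_formula(drimenol)))   # C15H26O
```

Applied to the 2-trans,6-trans-farnesyl diphosphate entry (passed as a raw string because it contains `\` bond markers), the same function gives C15H28O7P2. For anything beyond these simple strings, a full cheminformatics toolkit such as RDKit is the more robust choice.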
Concepts
- ZPTSRWNMMWXEHX-KCQAQPDRSA-N (+)-albicanol Chemical compound OC[C@H]1C(=C)CC[C@H]2C(C)(C)CCC[C@@]21C ZPTSRWNMMWXEHX-KCQAQPDRSA-N 0.000 title claims abstract description 102
- ZPTSRWNMMWXEHX-UHFFFAOYSA-N Albicanol Natural products OCC1C(=C)CCC2C(C)(C)CCCC21C ZPTSRWNMMWXEHX-UHFFFAOYSA-N 0.000 title claims abstract description 99
- HMWSKUKBAWWOJL-KCQAQPDRSA-N drimenol Chemical compound CC1(C)CCC[C@]2(C)[C@@H](CO)C(C)=CC[C@H]21 HMWSKUKBAWWOJL-KCQAQPDRSA-N 0.000 title claims abstract description 71
- HMWSKUKBAWWOJL-UHFFFAOYSA-N Drimenol Natural products CC1(C)CCCC2(C)C(CO)C(C)=CCC21 HMWSKUKBAWWOJL-UHFFFAOYSA-N 0.000 title claims abstract description 66
- HMWSKUKBAWWOJL-YDHLFZDLSA-N drimentol Natural products OC[C@H]1C(C)=CC[C@H]2C(C)(C)CCC[C@@]12C HMWSKUKBAWWOJL-YDHLFZDLSA-N 0.000 title claims abstract description 62
- 238000004519 manufacturing process Methods 0.000 title claims description 19
- 108090000765 processed proteins & peptides Proteins 0.000 claims abstract description 156
- 229920001184 polypeptide Polymers 0.000 claims abstract description 149
- 102000004196 processed proteins & peptides Human genes 0.000 claims abstract description 149
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 114
- 108010087432 terpene synthase Proteins 0.000 claims abstract description 101
- 238000000034 method Methods 0.000 claims abstract description 87
- 230000001588 bifunctional effect Effects 0.000 claims abstract description 75
- 150000003483 drimane derivatives Chemical class 0.000 claims abstract description 72
- 102000039446 nucleic acids Human genes 0.000 claims abstract description 62
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 62
- 125000003275 alpha amino acid group Chemical group 0.000 claims abstract description 61
- VWFJDQUYCIWHTN-YFVJMOTDSA-N 2-trans,6-trans-farnesyl diphosphate Chemical compound CC(C)=CCC\C(C)=C\CC\C(C)=C\CO[P@](O)(=O)OP(O)(O)=O VWFJDQUYCIWHTN-YFVJMOTDSA-N 0.000 claims abstract description 58
- VWFJDQUYCIWHTN-FBXUGWQNSA-N Farnesyl diphosphate Natural products CC(C)=CCC\C(C)=C/CC\C(C)=C/COP(O)(=O)OP(O)(O)=O VWFJDQUYCIWHTN-FBXUGWQNSA-N 0.000 claims abstract description 58
- 230000000694 effects Effects 0.000 claims abstract description 56
- 108090000604 Hydrolases Proteins 0.000 claims abstract description 29
- 108010052386 2-haloacid dehalogenase Proteins 0.000 claims abstract description 14
- 210000004027 cell Anatomy 0.000 claims description 131
- 150000003505 terpenes Chemical class 0.000 claims description 57
- 235000007586 terpenes Nutrition 0.000 claims description 57
- 102000004190 Enzymes Human genes 0.000 claims description 40
- 108090000790 Enzymes Proteins 0.000 claims description 40
- 125000003729 nucleotide group Chemical group 0.000 claims description 40
- 239000002773 nucleotide Substances 0.000 claims description 39
- 241000588724 Escherichia coli Species 0.000 claims description 29
- 239000013598 vector Substances 0.000 claims description 29
- 239000000203 mixture Substances 0.000 claims description 25
- 240000004808 Saccharomyces cerevisiae Species 0.000 claims description 20
- 235000014680 Saccharomyces cerevisiae Nutrition 0.000 claims description 19
- 230000015572 biosynthetic process Effects 0.000 claims description 17
- 230000000295 complement effect Effects 0.000 claims description 17
- 230000002441 reversible effect Effects 0.000 claims description 15
- 239000013604 expression vector Substances 0.000 claims description 12
- 239000002243 precursor Substances 0.000 claims description 11
- 238000003786 synthesis reaction Methods 0.000 claims description 11
- 125000002015 acyclic group Chemical group 0.000 claims description 10
- 230000001580 bacterial effect Effects 0.000 claims description 10
- 210000003527 eukaryotic cell Anatomy 0.000 claims description 8
- 230000001131 transforming effect Effects 0.000 claims description 6
- 210000005253 yeast cell Anatomy 0.000 claims description 6
- 210000001236 prokaryotic cell Anatomy 0.000 claims description 5
- 239000013603 viral vector Substances 0.000 claims description 4
- 238000012258 culturing Methods 0.000 claims description 2
- 238000001727 in vivo Methods 0.000 abstract description 16
- 238000000338 in vitro Methods 0.000 abstract description 4
- 108090000623 proteins and genes Proteins 0.000 description 122
- 102000004169 proteins and genes Human genes 0.000 description 73
- 235000018102 proteins Nutrition 0.000 description 72
- 108091028043 Nucleic acid sequence Proteins 0.000 description 53
- 239000002299 complementary DNA Substances 0.000 description 49
- 230000014509 gene expression Effects 0.000 description 37
- 229930004725 sesquiterpene Natural products 0.000 description 35
- 150000004354 sesquiterpene derivatives Chemical class 0.000 description 34
- 241000196324 Embryophyta Species 0.000 description 32
- 239000012634 fragment Substances 0.000 description 32
- 108020004414 DNA Proteins 0.000 description 25
- 102000004157 Hydrolases Human genes 0.000 description 25
- 239000013615 primer Substances 0.000 description 20
- 239000002987 primer (paints) Substances 0.000 description 20
- 239000000047 product Substances 0.000 description 20
- 235000001014 amino acid Nutrition 0.000 description 19
- 150000001413 amino acids Chemical class 0.000 description 19
- 244000005700 microbiome Species 0.000 description 19
- 238000002290 gas chromatography-mass spectrometry Methods 0.000 description 16
- 230000006870 function Effects 0.000 description 15
- 230000001105 regulatory effect Effects 0.000 description 15
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 14
- 102000004160 Phosphoric Monoester Hydrolases Human genes 0.000 description 14
- 108090000608 Phosphoric Monoester Hydrolases Proteins 0.000 description 14
- XPPKVPWEQAFLFU-UHFFFAOYSA-N diphosphoric acid Chemical group OP(O)(=O)OP(O)(O)=O XPPKVPWEQAFLFU-UHFFFAOYSA-N 0.000 description 14
- 238000013518 transcription Methods 0.000 description 14
- 230000035897 transcription Effects 0.000 description 14
- 241000123291 Cryptoporus volvatus Species 0.000 description 13
- 101710093888 Pentalenene synthase Proteins 0.000 description 13
- 101710115850 Sesquiterpene synthase Proteins 0.000 description 13
- 229930001542 drimane Natural products 0.000 description 13
- 108091033319 polynucleotide Proteins 0.000 description 13
- 102000040430 polynucleotide Human genes 0.000 description 13
- 239000002157 polynucleotide Substances 0.000 description 13
- 239000000243 solution Substances 0.000 description 13
- 239000001177 diphosphate Substances 0.000 description 12
- 235000011180 diphosphates Nutrition 0.000 description 12
- CVRSZZJUWRLRDE-PWNZVWSESA-N drimane Chemical compound CC1(C)CCC[C@]2(C)[C@@H](C)[C@@H](C)CC[C@H]21 CVRSZZJUWRLRDE-PWNZVWSESA-N 0.000 description 12
- 108020004999 messenger RNA Proteins 0.000 description 12
- 238000007363 ring formation reaction Methods 0.000 description 12
- 238000006243 chemical reaction Methods 0.000 description 11
- 239000000284 extract Substances 0.000 description 11
- 238000009396 hybridization Methods 0.000 description 11
- 239000003550 marker Substances 0.000 description 11
- 241000233866 Fungi Species 0.000 description 10
- 230000002538 fungal effect Effects 0.000 description 10
- 239000013612 plasmid Substances 0.000 description 10
- 230000005588 protonation Effects 0.000 description 10
- 230000002255 enzymatic effect Effects 0.000 description 9
- 230000035772 mutation Effects 0.000 description 9
- 241000228212 Aspergillus Species 0.000 description 8
- 108091026890 Coding region Proteins 0.000 description 8
- 108020004705 Codon Proteins 0.000 description 8
- 241000408172 Fomitopsis officinalis Species 0.000 description 8
- 150000001875 compounds Chemical class 0.000 description 8
- 238000010276 construction Methods 0.000 description 8
- 239000013613 expression plasmid Substances 0.000 description 8
- 210000004897 n-terminal region Anatomy 0.000 description 8
- 239000000523 sample Substances 0.000 description 8
- KJTLQQUUPVSXIM-ZCFIWIBFSA-N (R)-mevalonic acid Chemical compound OCC[C@](O)(C)CC(O)=O KJTLQQUUPVSXIM-ZCFIWIBFSA-N 0.000 description 7
- 241000894006 Bacteria Species 0.000 description 7
- KJTLQQUUPVSXIM-UHFFFAOYSA-N DL-mevalonic acid Natural products OCCC(O)(C)CC(O)=O KJTLQQUUPVSXIM-UHFFFAOYSA-N 0.000 description 7
- 238000003556 assay Methods 0.000 description 7
- 230000003197 catalytic effect Effects 0.000 description 7
- 238000001819 mass spectrum Methods 0.000 description 7
- -1 sesquiterpene hydrocarbons Chemical class 0.000 description 7
- 241000894007 species Species 0.000 description 7
- IJGRMHOSHXDMSA-UHFFFAOYSA-N Atomic nitrogen Chemical compound N#N IJGRMHOSHXDMSA-UHFFFAOYSA-N 0.000 description 6
- 241000972773 Aulopiformes Species 0.000 description 6
- 241000123288 Cryptoporus Species 0.000 description 6
- 229920001917 Ficoll Polymers 0.000 description 6
- 208000033962 Fontaine progeroid syndrome Diseases 0.000 description 6
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 6
- 238000004458 analytical method Methods 0.000 description 6
- 238000012512 characterization method Methods 0.000 description 6
- 238000003776 cleavage reaction Methods 0.000 description 6
- 238000005755 formation reaction Methods 0.000 description 6
- 230000010076 replication Effects 0.000 description 6
- 235000019515 salmon Nutrition 0.000 description 6
- 230000007017 scission Effects 0.000 description 6
- QKNYBSVHEMOAJP-UHFFFAOYSA-N 2-amino-2-(hydroxymethyl)propane-1,3-diol;hydron;chloride Chemical compound Cl.OCC(N)(CO)CO QKNYBSVHEMOAJP-UHFFFAOYSA-N 0.000 description 5
- 108020004635 Complementary DNA Proteins 0.000 description 5
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 5
- 101150094690 GAL1 gene Proteins 0.000 description 5
- 102100028501 Galanin peptides Human genes 0.000 description 5
- 101100121078 Homo sapiens GAL gene Proteins 0.000 description 5
- 102000007056 Recombinant Fusion Proteins Human genes 0.000 description 5
- 108010008281 Recombinant Fusion Proteins Proteins 0.000 description 5
- 230000000692 anti-sense effect Effects 0.000 description 5
- 210000004899 c-terminal region Anatomy 0.000 description 5
- 238000005119 centrifugation Methods 0.000 description 5
- 239000001963 growth medium Substances 0.000 description 5
- 150000002500 ions Chemical class 0.000 description 5
- 239000002609 medium Substances 0.000 description 5
- 230000037361 pathway Effects 0.000 description 5
- CRDAMVZIKSXKFV-YFVJMOTDSA-N (2-trans,6-trans)-farnesol Chemical compound CC(C)=CCC\C(C)=C\CC\C(C)=C\CO CRDAMVZIKSXKFV-YFVJMOTDSA-N 0.000 description 4
- 102000002260 Alkaline Phosphatase Human genes 0.000 description 4
- 108020004774 Alkaline Phosphatase Proteins 0.000 description 4
- 101000924190 Aspergillus calidoustus Drimenol cyclase drtB Proteins 0.000 description 4
- ZHNUHDYFZUAESO-UHFFFAOYSA-N Formamide Chemical compound NC=O ZHNUHDYFZUAESO-UHFFFAOYSA-N 0.000 description 4
- 101150038242 GAL10 gene Proteins 0.000 description 4
- 102100024637 Galectin-10 Human genes 0.000 description 4
- 244000178870 Lavandula angustifolia Species 0.000 description 4
- BZLVMXJERCGZMT-UHFFFAOYSA-N Methyl tert-butyl ether Chemical compound COC(C)(C)C BZLVMXJERCGZMT-UHFFFAOYSA-N 0.000 description 4
- OFBQJSOFQDEBGM-UHFFFAOYSA-N Pentane Chemical compound CCCCC OFBQJSOFQDEBGM-UHFFFAOYSA-N 0.000 description 4
- SNRUBQQJIBEYMU-UHFFFAOYSA-N dodecane Chemical compound CCCCCCCCCCCC SNRUBQQJIBEYMU-UHFFFAOYSA-N 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 238000000605 extraction Methods 0.000 description 4
- 230000004927 fusion Effects 0.000 description 4
- 239000000543 intermediate Substances 0.000 description 4
- BPHPUYQFMNQIOC-NXRLNHOXSA-N isopropyl beta-D-thiogalactopyranoside Chemical compound CC(C)S[C@@H]1O[C@H](CO)[C@H](O)[C@H](O)[C@H]1O BPHPUYQFMNQIOC-NXRLNHOXSA-N 0.000 description 4
- 229910052757 nitrogen Inorganic materials 0.000 description 4
- 239000000758 substrate Substances 0.000 description 4
- CRDAMVZIKSXKFV-UHFFFAOYSA-N trans-Farnesol Natural products CC(C)=CCCC(C)=CCCC(C)=CCO CRDAMVZIKSXKFV-UHFFFAOYSA-N 0.000 description 4
- 230000009466 transformation Effects 0.000 description 4
- NUFBIAUZAMHTSP-UHFFFAOYSA-N 3-(n-morpholino)-2-hydroxypropanesulfonic acid Chemical compound OS(=O)(=O)CC(O)CN1CCOCC1 NUFBIAUZAMHTSP-UHFFFAOYSA-N 0.000 description 3
- 244000251953 Agaricus brunnescens Species 0.000 description 3
- 235000001674 Agaricus brunnescens Nutrition 0.000 description 3
- 241001465318 Aspergillus terreus Species 0.000 description 3
- 108091035707 Consensus sequence Proteins 0.000 description 3
- 235000019750 Crude protein Nutrition 0.000 description 3
- 241000189557 Dichomitus squalens Species 0.000 description 3
- 241000146398 Gelatoporia subvermispora Species 0.000 description 3
- 108700039691 Genetic Promoter Regions Proteins 0.000 description 3
- 108010076504 Protein Sorting Signals Proteins 0.000 description 3
- 108020004459 Small interfering RNA Proteins 0.000 description 3
- 239000002253 acid Substances 0.000 description 3
- 230000004075 alteration Effects 0.000 description 3
- CKLJMWTZIZZHCS-REOHCLBHSA-N aspartic acid group Chemical group N[C@@H](CC(=O)O)C(=O)O CKLJMWTZIZZHCS-REOHCLBHSA-N 0.000 description 3
- 238000002306 biochemical method Methods 0.000 description 3
- 238000001514 detection method Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000010353 genetic engineering Methods 0.000 description 3
- 229930195733 hydrocarbon Natural products 0.000 description 3
- 238000011534 incubation Methods 0.000 description 3
- HHNLQWOVCRXGMP-UHFFFAOYSA-N laricinolic acid Natural products C1C(O)C(=C)C(C(O)=O)C2(C)C1C(C)(C)CCC2 HHNLQWOVCRXGMP-UHFFFAOYSA-N 0.000 description 3
- 239000007788 liquid Substances 0.000 description 3
- 230000007246 mechanism Effects 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 229930003658 monoterpene Natural products 0.000 description 3
- 150000002773 monoterpene derivatives Chemical class 0.000 description 3
- 235000002577 monoterpenes Nutrition 0.000 description 3
- 230000002018 overexpression Effects 0.000 description 3
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 3
- 108020001580 protein domains Proteins 0.000 description 3
- 230000008707 rearrangement Effects 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- RXHIKAIVEMAPRU-JRIGQVHBSA-N sequiterpene Natural products C1=C(C)[C@@H](OC(C)=O)[C@H](O)[C@@]2(O)[C@H](C)CC[C@@H](C(C)=C)[C@H]21 RXHIKAIVEMAPRU-JRIGQVHBSA-N 0.000 description 3
- 239000000725 suspension Substances 0.000 description 3
- 210000001519 tissue Anatomy 0.000 description 3
- 238000013519 translation Methods 0.000 description 3
- PCFMECNNYYMDRS-VTRNTCPLSA-N (3s,4r)-4-[[(1s,4ar,5r,8as)-5-[[(2s,3r)-3-[[(1s,4ar,5r,8as)-5-(hydroxymethyl)-5,8a-dimethyl-2-methylidene-3,4,4a,6,7,8-hexahydro-1h-naphthalen-1-yl]methoxy]-2-(carboxymethyl)-4-methoxy-4-oxobutanoyl]oxymethyl]-5,8a-dimethyl-2-methylidene-3,4,4a,6,7,8-hexa Chemical compound OC[C@@](C)([C@@H]1CCC2=C)CCC[C@]1(C)[C@H]2CO[C@@H](C(=O)OC)[C@H](CC(O)=O)C(=O)OC[C@@]1(C)[C@@H]2CCC(=C)[C@H](CO[C@H]([C@H](CC(O)=O)C(=O)OC)C(=O)OC)[C@@]2(C)CCC1 PCFMECNNYYMDRS-VTRNTCPLSA-N 0.000 description 2
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 2
- 229920001817 Agar Polymers 0.000 description 2
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical group [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 description 2
- HEDRZPFGACZZDS-UHFFFAOYSA-N Chloroform Chemical compound ClC(Cl)Cl HEDRZPFGACZZDS-UHFFFAOYSA-N 0.000 description 2
- 101150051269 ERG10 gene Proteins 0.000 description 2
- 101150014913 ERG13 gene Proteins 0.000 description 2
- 101150084072 ERG20 gene Proteins 0.000 description 2
- 108700024394 Exon Proteins 0.000 description 2
- 241000410067 Gelatoporia Species 0.000 description 2
- 108050003363 Haloacid dehalogenase-like hydrolases Proteins 0.000 description 2
- 102000014348 Haloacid dehalogenase-like hydrolases Human genes 0.000 description 2
- 108091092195 Intron Proteins 0.000 description 2
- TWRXJAOTZQYOKJ-UHFFFAOYSA-L Magnesium chloride Chemical compound [Mg+2].[Cl-].[Cl-] TWRXJAOTZQYOKJ-UHFFFAOYSA-L 0.000 description 2
- 101100390535 Mus musculus Fdft1 gene Proteins 0.000 description 2
- 101100445407 Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) erg10B gene Proteins 0.000 description 2
- 101100390536 Neurospora crassa (strain ATCC 24698 / 74-OR23-1A / CBS 708.71 / DSM 1257 / FGSC 987) erg-6 gene Proteins 0.000 description 2
- 108010038807 Oligopeptides Proteins 0.000 description 2
- 102000015636 Oligopeptides Human genes 0.000 description 2
- 229910019142 PO4 Inorganic materials 0.000 description 2
- 238000002123 RNA extraction Methods 0.000 description 2
- 238000003559 RNA-seq method Methods 0.000 description 2
- 108020004511 Recombinant DNA Proteins 0.000 description 2
- 108091081021 Sense strand Proteins 0.000 description 2
- 108091023040 Transcription factor Proteins 0.000 description 2
- 102000040945 Transcription factor Human genes 0.000 description 2
- 108020004566 Transfer RNA Proteins 0.000 description 2
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 2
- 230000002378 acidificating effect Effects 0.000 description 2
- 150000007513 acids Chemical class 0.000 description 2
- 230000010933 acylation Effects 0.000 description 2
- 238000005917 acylation reaction Methods 0.000 description 2
- 239000008272 agar Substances 0.000 description 2
- 230000029936 alkylation Effects 0.000 description 2
- 238000005804 alkylation reaction Methods 0.000 description 2
- 125000000746 allylic group Chemical group 0.000 description 2
- 239000003242 anti bacterial agent Substances 0.000 description 2
- 229940088710 antibiotic agent Drugs 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 238000000376 autoradiography Methods 0.000 description 2
- 230000004071 biological effect Effects 0.000 description 2
- 230000008827 biological function Effects 0.000 description 2
- 230000006696 biosynthetic metabolic pathway Effects 0.000 description 2
- 239000007853 buffer solution Substances 0.000 description 2
- 238000004422 calculation algorithm Methods 0.000 description 2
- 239000013592 cell lysate Substances 0.000 description 2
- 238000004590 computer program Methods 0.000 description 2
- OXFMLGZWGZNFCM-UOXLEDAOSA-N cryptoporic acid d Chemical compound C1OC(=O)[C@@H](CC(O)=O)[C@H](C(=O)OC)OC[C@H]2C(=C)CC[C@@H]3[C@]2(C)CCC[C@@]3(C)COC(=O)[C@@H](CC(O)=O)[C@H](C(=O)OC)OC[C@H]2C(=C)CC[C@@H]3[C@]2(C)CCC[C@@]13C OXFMLGZWGZNFCM-UOXLEDAOSA-N 0.000 description 2
- 238000012217 deletion Methods 0.000 description 2
- 230000037430 deletion Effects 0.000 description 2
- 229960000633 dextran sulfate Drugs 0.000 description 2
- 238000004821 distillation Methods 0.000 description 2
- 239000003623 enhancer Substances 0.000 description 2
- 101150116391 erg9 gene Proteins 0.000 description 2
- 150000002170 ethers Chemical class 0.000 description 2
- 238000000769 gas chromatography-flame ionisation detection Methods 0.000 description 2
- 230000002068 genetic effect Effects 0.000 description 2
- 239000011521 glass Substances 0.000 description 2
- 150000002430 hydrocarbons Chemical class 0.000 description 2
- 230000007062 hydrolysis Effects 0.000 description 2
- 238000006460 hydrolysis reaction Methods 0.000 description 2
- 238000003780 insertion Methods 0.000 description 2
- 230000037431 insertion Effects 0.000 description 2
- 229930002697 labdane diterpene Natural products 0.000 description 2
- 230000014759 maintenance of location Effects 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 229910021645 metal ion Inorganic materials 0.000 description 2
- 108091070501 miRNA Proteins 0.000 description 2
- 238000010369 molecular cloning Methods 0.000 description 2
- 230000003647 oxidation Effects 0.000 description 2
- 238000007254 oxidation reaction Methods 0.000 description 2
- 239000012071 phase Substances 0.000 description 2
- 239000010452 phosphate Substances 0.000 description 2
- NBIIXXVUZAFLBC-UHFFFAOYSA-K phosphate Chemical compound [O-]P([O-])([O-])=O NBIIXXVUZAFLBC-UHFFFAOYSA-K 0.000 description 2
- 108010071062 pinene cyclase I Proteins 0.000 description 2
- 230000008488 polyadenylation Effects 0.000 description 2
- 238000003752 polymerase chain reaction Methods 0.000 description 2
- 102000054765 polymorphisms of proteins Human genes 0.000 description 2
- 210000001938 protoplast Anatomy 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 238000006722 reduction reaction Methods 0.000 description 2
- 229920002477 rna polymer Polymers 0.000 description 2
- 238000012163 sequencing technique Methods 0.000 description 2
- 239000002924 silencing RNA Substances 0.000 description 2
- 239000002904 solvent Substances 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 2
- 230000005030 transcription termination Effects 0.000 description 2
- 230000009261 transgenic effect Effects 0.000 description 2
- 150000003648 triterpenes Chemical class 0.000 description 2
- 238000011144 upstream manufacturing Methods 0.000 description 2
- 238000005406 washing Methods 0.000 description 2
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 2
- TUWKAEMGKORGLZ-NFAWXSAZSA-N (1r,4ar,8ar)-2,5,5,8a-tetramethyl-1,4,4a,6,7,8-hexahydronaphthalene-1-carbaldehyde Chemical compound CC1(C)CCC[C@@]2(C)[C@H](C=O)C(C)=CC[C@@H]21 TUWKAEMGKORGLZ-NFAWXSAZSA-N 0.000 description 1
- 239000000260 (2E,6E)-3,7,11-trimethyldodeca-2,6,10-trien-1-ol Substances 0.000 description 1
- 101710165761 (2E,6E)-farnesyl diphosphate synthase Proteins 0.000 description 1
- IAKOZHOLGAGEJT-UHFFFAOYSA-N 1,1,1-trichloro-2,2-bis(p-methoxyphenyl)-Ethane Chemical compound C1=CC(OC)=CC=C1C(C(Cl)(Cl)Cl)C1=CC=C(OC)C=C1 IAKOZHOLGAGEJT-UHFFFAOYSA-N 0.000 description 1
- 238000001644 13C nuclear magnetic resonance spectroscopy Methods 0.000 description 1
- FCYLLGSBJNTSAP-UHFFFAOYSA-N 15-Hydroxy-Cryptoporic acid A Natural products OCC1(C)CCCC2(C)C(COC(C(CC(O)=O)C(=O)OC)C(=O)OC)C(=C)CCC21 FCYLLGSBJNTSAP-UHFFFAOYSA-N 0.000 description 1
- 108020005345 3' Untranslated Regions Proteins 0.000 description 1
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 1
- QTBSBXVTEAMEQO-UHFFFAOYSA-M Acetate Chemical compound CC([O-])=O QTBSBXVTEAMEQO-UHFFFAOYSA-M 0.000 description 1
- 102000005345 Acetyl-CoA C-acetyltransferase Human genes 0.000 description 1
- 108010006229 Acetyl-CoA C-acetyltransferase Proteins 0.000 description 1
- 229920000936 Agarose Polymers 0.000 description 1
- 108020005544 Antisense RNA Proteins 0.000 description 1
- 241000123370 Antrodia Species 0.000 description 1
- 101100178203 Arabidopsis thaliana HMGB3 gene Proteins 0.000 description 1
- 241000228215 Aspergillus aculeatus Species 0.000 description 1
- 241000789823 Aspergillus calidoustus Species 0.000 description 1
- 240000006439 Aspergillus oryzae Species 0.000 description 1
- 235000002247 Aspergillus oryzae Nutrition 0.000 description 1
- 241000306560 Aspergillus udagawae Species 0.000 description 1
- 241000122818 Aspergillus ustus Species 0.000 description 1
- 241000203233 Aspergillus versicolor Species 0.000 description 1
- 108020004513 Bacterial RNA Proteins 0.000 description 1
- 108700010070 Codon Usage Proteins 0.000 description 1
- 101710118490 Copalyl diphosphate synthase Proteins 0.000 description 1
- 241000195493 Cryptophyta Species 0.000 description 1
- 101710095468 Cyclase Proteins 0.000 description 1
- XDTMQSROBMDMFD-UHFFFAOYSA-N Cyclohexane Chemical compound C1CCCCC1 XDTMQSROBMDMFD-UHFFFAOYSA-N 0.000 description 1
- SHZGCJCMOBCMKK-UHFFFAOYSA-N D-mannomethylose Natural products CC1OC(O)C(O)C(O)C1O SHZGCJCMOBCMKK-UHFFFAOYSA-N 0.000 description 1
- ODBLHEXUDAPZAU-ZAFYKAAXSA-N D-threo-isocitric acid Chemical compound OC(=O)[C@H](O)[C@@H](C(O)=O)CC(O)=O ODBLHEXUDAPZAU-ZAFYKAAXSA-N 0.000 description 1
- 108020001019 DNA Primers Proteins 0.000 description 1
- 238000007400 DNA extraction Methods 0.000 description 1
- 239000003155 DNA primer Substances 0.000 description 1
- 230000006820 DNA synthesis Effects 0.000 description 1
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 1
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 1
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 1
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 1
- 241000189559 Dichomitus Species 0.000 description 1
- 102000016680 Dioxygenases Human genes 0.000 description 1
- 108010028143 Dioxygenases Proteins 0.000 description 1
- 101150071502 ERG12 gene Proteins 0.000 description 1
- 101150045041 ERG8 gene Proteins 0.000 description 1
- 108091029865 Exogenous DNA Proteins 0.000 description 1
- 101710156207 Farnesyl diphosphate synthase Proteins 0.000 description 1
- 101710125754 Farnesyl pyrophosphate synthase Proteins 0.000 description 1
- 101710089428 Farnesyl pyrophosphate synthase erg20 Proteins 0.000 description 1
- 108020004460 Fungal RNA Proteins 0.000 description 1
- 102100039556 Galectin-4 Human genes 0.000 description 1
- 102100039555 Galectin-7 Human genes 0.000 description 1
- 108700007698 Genetic Terminator Regions Proteins 0.000 description 1
- 206010071602 Genetic polymorphism Diseases 0.000 description 1
- 108010026318 Geranyltranstransferase Proteins 0.000 description 1
- 102000013404 Geranyltranstransferase Human genes 0.000 description 1
- 101100025321 Gibberella zeae (strain ATCC MYA-4620 / CBS 123657 / FGSC 9075 / NRRL 31084 / PH-1) ERG19 gene Proteins 0.000 description 1
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 1
- 102000040726 HAD-like hydrolase family Human genes 0.000 description 1
- 108091071311 HAD-like hydrolase family Proteins 0.000 description 1
- 101150091750 HMG1 gene Proteins 0.000 description 1
- 108700010013 HMGB1 Proteins 0.000 description 1
- 101150021904 HMGB1 gene Proteins 0.000 description 1
- 101001009859 Herpetosiphon aurantiacus (strain ATCC 23779 / DSM 785 / 114-95) (+)-kolavenyl diphosphate synthase Proteins 0.000 description 1
- 241000735452 Heterobasidion Species 0.000 description 1
- 241000446569 Heterobasidion irregulare Species 0.000 description 1
- 102100037907 High mobility group protein B1 Human genes 0.000 description 1
- 101000608765 Homo sapiens Galectin-4 Proteins 0.000 description 1
- 101000608772 Homo sapiens Galectin-7 Proteins 0.000 description 1
- 101001081533 Homo sapiens Isopentenyl-diphosphate Delta-isomerase 1 Proteins 0.000 description 1
- 101000579123 Homo sapiens Phosphoglycerate kinase 1 Proteins 0.000 description 1
- ODBLHEXUDAPZAU-FONMRSAGSA-N Isocitric acid Natural products OC(=O)[C@@H](O)[C@H](C(O)=O)CC(O)=O ODBLHEXUDAPZAU-FONMRSAGSA-N 0.000 description 1
- RRHGJUQNOFWUDK-UHFFFAOYSA-N Isoprene Chemical group CC(=C)C=C RRHGJUQNOFWUDK-UHFFFAOYSA-N 0.000 description 1
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 description 1
- SHZGCJCMOBCMKK-JFNONXLTSA-N L-rhamnopyranose Chemical compound C[C@@H]1OC(O)[C@H](O)[C@H](O)[C@H]1O SHZGCJCMOBCMKK-JFNONXLTSA-N 0.000 description 1
- PNNNRSAQSRJVSB-UHFFFAOYSA-N L-rhamnose Natural products CC(O)C(O)C(O)C(O)C=O PNNNRSAQSRJVSB-UHFFFAOYSA-N 0.000 description 1
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 1
- 241001465754 Metazoa Species 0.000 description 1
- 102000008109 Mixed Function Oxygenases Human genes 0.000 description 1
- 108010074633 Mixed Function Oxygenases Proteins 0.000 description 1
- 230000004988 N-glycosylation Effects 0.000 description 1
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 1
- 230000004989 O-glycosylation Effects 0.000 description 1
- 241000018242 Obba <basidiomycete fungus> Species 0.000 description 1
- 241000018244 Obba rivulosa Species 0.000 description 1
- 108091034117 Oligonucleotide Proteins 0.000 description 1
- 102000004316 Oxidoreductases Human genes 0.000 description 1
- 108090000854 Oxidoreductases Proteins 0.000 description 1
- 108010033276 Peptide Fragments Proteins 0.000 description 1
- 102000007079 Peptide Fragments Human genes 0.000 description 1
- 108091036407 Polyadenylation Proteins 0.000 description 1
- 101710150389 Probable farnesyl diphosphate synthase Proteins 0.000 description 1
- 108091030071 RNAI Proteins 0.000 description 1
- 108020005091 Replication Origin Proteins 0.000 description 1
- 108700008625 Reporter Genes Proteins 0.000 description 1
- 108010034634 Repressor Proteins Proteins 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 101100329714 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) CTR3 gene Proteins 0.000 description 1
- 101100025327 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) MVD1 gene Proteins 0.000 description 1
- 241001311582 Schizopora Species 0.000 description 1
- 241001123565 Schizopora paradoxa Species 0.000 description 1
- 241001486992 Taiwanofungus camphoratus Species 0.000 description 1
- 108091036066 Three prime untranslated region Proteins 0.000 description 1
- 241000222354 Trametes Species 0.000 description 1
- 241000222355 Trametes versicolor Species 0.000 description 1
- 108700009124 Transcription Initiation Site Proteins 0.000 description 1
- 108090000992 Transferases Proteins 0.000 description 1
- 102000004357 Transferases Human genes 0.000 description 1
- 108700019146 Transgenes Proteins 0.000 description 1
- 101710174833 Tuberculosinyl adenosine transferase Proteins 0.000 description 1
- OXFMLGZWGZNFCM-UHFFFAOYSA-N UNPD143463 Natural products C1OC(=O)C(CC(O)=O)C(C(=O)OC)OCC2C(=C)CCC3C2(C)CCCC3(C)COC(=O)C(CC(O)=O)C(C(=O)OC)OCC2C(=C)CCC3C2(C)CCCC13C OXFMLGZWGZNFCM-UHFFFAOYSA-N 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- GQKPZEYFVCLAHT-YQQAZPJKSA-N [(1s,4as,8as)-5,5,8a-trimethyl-2-methylidene-3,4,4a,6,7,8-hexahydro-1h-naphthalen-1-yl]methyl acetate Chemical compound CC1(C)CCC[C@]2(C)[C@@H](COC(=O)C)C(=C)CC[C@H]21 GQKPZEYFVCLAHT-YQQAZPJKSA-N 0.000 description 1
- 150000001241 acetals Chemical class 0.000 description 1
- 108091006088 activator proteins Proteins 0.000 description 1
- 238000013019 agitation Methods 0.000 description 1
- GQKPZEYFVCLAHT-UHFFFAOYSA-N albicanyl acetate Natural products CC1(C)CCCC2(C)C(COC(=O)C)C(=C)CCC21 GQKPZEYFVCLAHT-UHFFFAOYSA-N 0.000 description 1
- 150000001298 alcohols Chemical class 0.000 description 1
- 150000001299 aldehydes Chemical class 0.000 description 1
- 150000001408 amides Chemical class 0.000 description 1
- 125000000539 amino acid group Chemical group 0.000 description 1
- 230000000845 anti-microbial effect Effects 0.000 description 1
- 239000008346 aqueous phase Substances 0.000 description 1
- 210000004507 artificial chromosome Anatomy 0.000 description 1
- 235000003704 aspartic acid Nutrition 0.000 description 1
- 150000001510 aspartic acids Chemical class 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 230000003115 biocidal effect Effects 0.000 description 1
- 238000006664 bond formation reaction Methods 0.000 description 1
- 239000000872 buffer Substances 0.000 description 1
- 238000010804 cDNA synthesis Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 125000004432 carbon atom Chemical group C* 0.000 description 1
- 125000003178 carboxy group Chemical group [H]OC(*)=O 0.000 description 1
- 239000012159 carrier gas Substances 0.000 description 1
- 108020001778 catalytic domains Proteins 0.000 description 1
- 150000001767 cationic compounds Chemical class 0.000 description 1
- 238000007385 chemical modification Methods 0.000 description 1
- WIIZWVCIJKGZOK-RKDXNWHRSA-N chloramphenicol Chemical compound ClC(Cl)C(=O)N[C@H](CO)[C@H](O)C1=CC=C([N+]([O-])=O)C=C1 WIIZWVCIJKGZOK-RKDXNWHRSA-N 0.000 description 1
- 229960005091 chloramphenicol Drugs 0.000 description 1
- 238000010367 cloning Methods 0.000 description 1
- 230000004186 co-expression Effects 0.000 description 1
- 239000003184 complementary RNA Substances 0.000 description 1
- 238000012790 confirmation Methods 0.000 description 1
- 239000002537 cosmetic Substances 0.000 description 1
- 229930195005 cryptoporic acid Natural products 0.000 description 1
- KKBSYSFIUSBXPW-UHFFFAOYSA-N cryptoporic acid A Natural products COC(=O)C(CC(=O)O)C(OC1C(=C)CCC2C(C)(C)CCCC12C)C(=O)OC KKBSYSFIUSBXPW-UHFFFAOYSA-N 0.000 description 1
- IYAYKNOVHBOSPH-HZPDHXFCSA-N cryptoporic acid D Natural products C([C@@]1(O)COC2=CC(O)=CC=C2[C@H]1O)C1=CC=C(O)C=C1 IYAYKNOVHBOSPH-HZPDHXFCSA-N 0.000 description 1
- PCFMECNNYYMDRS-UHFFFAOYSA-N cryptoporic acid E Natural products C=C1CCC2C(C)(CO)CCCC2(C)C1COC(C(=O)OC)C(CC(O)=O)C(=O)OCC1(C)C2CCC(=C)C(COC(C(CC(O)=O)C(=O)OC)C(=O)OC)C2(C)CCC1 PCFMECNNYYMDRS-UHFFFAOYSA-N 0.000 description 1
- 210000004748 cultured cell Anatomy 0.000 description 1
- 125000000151 cysteine group Chemical group N[C@@H](CS)C(=O)* 0.000 description 1
- 230000009089 cytolysis Effects 0.000 description 1
- 210000000805 cytoplasm Anatomy 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 230000005595 deprotonation Effects 0.000 description 1
- 238000010537 deprotonation reaction Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 150000002009 diols Chemical class 0.000 description 1
- XPPKVPWEQAFLFU-UHFFFAOYSA-J diphosphate(4-) Chemical compound [O-]P([O-])(=O)OP([O-])([O-])=O XPPKVPWEQAFLFU-UHFFFAOYSA-J 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 229930004069 diterpene Natural products 0.000 description 1
- 125000000567 diterpene group Chemical group 0.000 description 1
- VHJLVAABSRFDPM-QWWZWVQMSA-N dithiothreitol Chemical compound SC[C@@H](O)[C@H](O)CS VHJLVAABSRFDPM-QWWZWVQMSA-N 0.000 description 1
- TUWKAEMGKORGLZ-UHFFFAOYSA-N drimenal Natural products CC1(C)CCCC2(C)C(C=O)C(C)=CCC21 TUWKAEMGKORGLZ-UHFFFAOYSA-N 0.000 description 1
- SQNZJJAZBFDUTD-UHFFFAOYSA-N durene Chemical compound CC1=CC(C)=C(C)C=C1C SQNZJJAZBFDUTD-UHFFFAOYSA-N 0.000 description 1
- 230000005014 ectopic expression Effects 0.000 description 1
- 235000013399 edible fruits Nutrition 0.000 description 1
- 210000002257 embryonic structure Anatomy 0.000 description 1
- 230000007247 enzymatic mechanism Effects 0.000 description 1
- 238000006911 enzymatic reaction Methods 0.000 description 1
- 238000001952 enzyme assay Methods 0.000 description 1
- 150000002118 epoxides Chemical class 0.000 description 1
- 238000011067 equilibration Methods 0.000 description 1
- 150000002148 esters Chemical class 0.000 description 1
- 210000001723 extracellular space Anatomy 0.000 description 1
- 229930002886 farnesol Natural products 0.000 description 1
- 229940043259 farnesol Drugs 0.000 description 1
- 239000012467 final product Substances 0.000 description 1
- 238000003818 flash chromatography Methods 0.000 description 1
- 239000000796 flavoring agent Substances 0.000 description 1
- 235000019634 flavors Nutrition 0.000 description 1
- 230000004907 flux Effects 0.000 description 1
- 239000003205 fragrance Substances 0.000 description 1
- 230000037433 frameshift Effects 0.000 description 1
- 230000000855 fungicidal effect Effects 0.000 description 1
- 239000000417 fungicide Substances 0.000 description 1
- 239000007789 gas Substances 0.000 description 1
- 238000004817 gas chromatography Methods 0.000 description 1
- 230000004545 gene duplication Effects 0.000 description 1
- 238000003197 gene knockdown Methods 0.000 description 1
- 230000030279 gene silencing Effects 0.000 description 1
- 230000009368 gene silencing by RNA Effects 0.000 description 1
- 238000012226 gene silencing method Methods 0.000 description 1
- 239000008103 glucose Substances 0.000 description 1
- 229930182470 glycoside Natural products 0.000 description 1
- 150000002338 glycosides Chemical class 0.000 description 1
- 230000012010 growth Effects 0.000 description 1
- 230000002363 herbicidal effect Effects 0.000 description 1
- 239000004009 herbicide Substances 0.000 description 1
- 230000006801 homologous recombination Effects 0.000 description 1
- 238000002744 homologous recombination Methods 0.000 description 1
- 210000005260 human cell Anatomy 0.000 description 1
- 230000003301 hydrolyzing effect Effects 0.000 description 1
- 125000001165 hydrophobic group Chemical group 0.000 description 1
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 1
- 238000000099 in vitro assay Methods 0.000 description 1
- 230000002401 inhibitory effect Effects 0.000 description 1
- 230000000977 initiatory effect Effects 0.000 description 1
- 229910001411 inorganic cation Inorganic materials 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 210000005061 intracellular organelle Anatomy 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- SBUJHOSQTJFQJX-NOAMYHISSA-N kanamycin Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CN)O[C@@H]1O[C@H]1[C@H](O)[C@@H](O[C@@H]2[C@@H]([C@@H](N)[C@H](O)[C@@H](CO)O2)O)[C@H](N)C[C@@H]1N SBUJHOSQTJFQJX-NOAMYHISSA-N 0.000 description 1
- 229960000318 kanamycin Drugs 0.000 description 1
- 229930027917 kanamycin Natural products 0.000 description 1
- 229930182823 kanamycin A Natural products 0.000 description 1
- 238000011005 laboratory method Methods 0.000 description 1
- 239000003446 ligand Substances 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- XIXADJRWDQXREU-UHFFFAOYSA-M lithium acetate Chemical compound [Li+].CC([O-])=O XIXADJRWDQXREU-UHFFFAOYSA-M 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 230000028744 lysogeny Effects 0.000 description 1
- 229910001629 magnesium chloride Inorganic materials 0.000 description 1
- 238000004949 mass spectrometry Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 230000007721 medicinal effect Effects 0.000 description 1
- 239000002679 microRNA Substances 0.000 description 1
- 230000000813 microbial effect Effects 0.000 description 1
- 239000002480 mineral oil Substances 0.000 description 1
- 235000010446 mineral oil Nutrition 0.000 description 1
- 210000003470 mitochondria Anatomy 0.000 description 1
- 239000004570 mortar (masonry) Substances 0.000 description 1
- 239000002853 nucleic acid probe Substances 0.000 description 1
- 235000015097 nutrients Nutrition 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 210000000056 organ Anatomy 0.000 description 1
- 210000003463 organelle Anatomy 0.000 description 1
- 239000012074 organic phase Substances 0.000 description 1
- 239000003960 organic solvent Substances 0.000 description 1
- 230000001590 oxidative effect Effects 0.000 description 1
- 239000013500 performance material Substances 0.000 description 1
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 1
- 238000013081 phylogenetic analysis Methods 0.000 description 1
- 210000002706 plastid Anatomy 0.000 description 1
- 230000003234 polygenic effect Effects 0.000 description 1
- 238000006116 polymerization reaction Methods 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 230000017854 proteolysis Effects 0.000 description 1
- 230000002797 proteolythic effect Effects 0.000 description 1
- 230000006337 proteolytic cleavage Effects 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 238000010791 quenching Methods 0.000 description 1
- 230000000171 quenching effect Effects 0.000 description 1
- 238000003259 recombinant expression Methods 0.000 description 1
- 238000010188 recombinant method Methods 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 108020004418 ribosomal RNA Proteins 0.000 description 1
- 210000003705 ribosome Anatomy 0.000 description 1
- 150000003839 salts Chemical class 0.000 description 1
- 230000028327 secretion Effects 0.000 description 1
- 239000002689 soil Substances 0.000 description 1
- 238000000638 solvent extraction Methods 0.000 description 1
- 238000000527 sonication Methods 0.000 description 1
- 239000000126 substance Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 239000006228 supernatant Substances 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 238000010189 synthetic method Methods 0.000 description 1
- 230000008685 targeting Effects 0.000 description 1
- ODBLHEXUDAPZAU-UHFFFAOYSA-N threo-D-isocitric acid Natural products OC(=O)C(O)C(C(O)=O)CC(O)=O ODBLHEXUDAPZAU-UHFFFAOYSA-N 0.000 description 1
- 229940113082 thymine Drugs 0.000 description 1
- 230000002103 transcriptional effect Effects 0.000 description 1
- 238000000844 transformation Methods 0.000 description 1
- 230000001052 transient effect Effects 0.000 description 1
- 230000010474 transient expression Effects 0.000 description 1
- 230000014616 translation Effects 0.000 description 1
- 230000014621 translational initiation Effects 0.000 description 1
- 230000032258 transport Effects 0.000 description 1
- 241001515965 unidentified phage Species 0.000 description 1
- 229940035893 uracil Drugs 0.000 description 1
- 244000045561 useful plants Species 0.000 description 1
- 239000007218 ym medium Substances 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P7/00—Preparation of oxygen-containing organic compounds
- C12P7/02—Preparation of oxygen-containing organic compounds containing a hydroxy group
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/88—Lyases (4.)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y402/00—Carbon-oxygen lyases (4.2)
- C12Y402/03—Carbon-oxygen lyases (4.2) acting on phosphates (4.2.3)
Definitions
- biochemical methods of producing albicanol, drimenol and related compounds and derivatives, which methods comprise the use of novel polypeptides.
- Terpenes are found in most organisms (microorganisms, animals and plants). These compounds are made up of five carbon units called isoprene units and are classified by the number of these units present in their structure. Thus monoterpenes, sesquiterpenes and diterpenes are terpenes containing 10, 15 and 20 carbon atoms, respectively. Sesquiterpenes, for example, are widely found in the plant kingdom. Many sesquiterpene molecules are known for their flavor and fragrance properties and their cosmetic, medicinal and antimicrobial effects. Numerous sesquiterpene hydrocarbons and sesquiterpenoids have been identified. Chemical synthesis approaches have been developed but are still complex and not always cost-effective.
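As a rough illustration of the counting rule above, the Python sketch below maps a number of five-carbon isoprene units to a carbon count and to the class names used in the text. The helper names are hypothetical, only the three classes named here are covered, and only skeleton carbons are counted (substituents such as an acetyl group are ignored).

```python
# Sketch: relate the number of isoprene (C5) units in a terpene to its carbon count
# and to the class names used in the text (monoterpene, sesquiterpene, diterpene).

CARBONS_PER_ISOPRENE_UNIT = 5

# Only the three classes named in the text are covered here.
CLASS_BY_UNITS = {2: "monoterpene", 3: "sesquiterpene", 4: "diterpene"}


def carbon_count(isoprene_units: int) -> int:
    """Carbon atoms in a terpene skeleton built from `isoprene_units` C5 units."""
    if isoprene_units < 1:
        raise ValueError("a terpene contains at least one isoprene unit")
    return isoprene_units * CARBONS_PER_ISOPRENE_UNIT


def terpene_class(carbons: int) -> str:
    """Class name for a terpene with the given number of skeleton carbons."""
    if carbons % CARBONS_PER_ISOPRENE_UNIT:
        raise ValueError(f"{carbons} is not a multiple of {CARBONS_PER_ISOPRENE_UNIT}")
    units = carbons // CARBONS_PER_ISOPRENE_UNIT
    if units not in CLASS_BY_UNITS:
        raise ValueError(f"no class defined here for {units} isoprene units")
    return CLASS_BY_UNITS[units]


if __name__ == "__main__":
    for units in sorted(CLASS_BY_UNITS):
        c = carbon_count(units)
        print(f"{units} units -> C{c} -> {terpene_class(c)}")
```

For example, farnesyl diphosphate and the drimane products albicanol and drimenol have C15 skeletons, which this mapping places among the sesquiterpenes.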
- terpene synthases: There are numerous sesquiterpene synthases present in the plant kingdom, all using the same substrate (farnesyl diphosphate, FPP) but having different product profiles. Genes and cDNAs encoding sesquiterpene synthases have been cloned and the corresponding recombinant enzymes characterized.
- sesquiterpenes for example drimenol
- drimenol the main sources for sesquiterpenes
- these natural sources can be low.
- terpenes, terpene synthases and more cost-effective methods of producing sesquiterpenes such as albicanol and/or drimenol and derivatives therefrom.
- a method for producing a drimane sesquiterpene comprising:
- the drimane sesquiterpene comprises albicanol and/or drimenol.
- the polypeptide having bifunctional terpene synthase activity comprises
- the above method comprises contacting the drimane sesquiterpene with at least one enzyme to produce a drimane sesquiterpene derivative. In another embodiment, the above method comprises converting the drimane sesquiterpene to a drimane sesquiterpene derivative using chemical synthesis or biochemical synthesis.
- the above method comprises transforming a host cell or non-human host organism with a nucleic acid encoding the above polypeptide.
- the method further comprises culturing a non-human host organism or a host cell capable of producing FPP and transformed to express a polypeptide comprising a HAD-like hydrolase domain under conditions that allow for the production of the polypeptide, wherein the polypeptide
- the polypeptide comprises one or more conserved motifs as set forth in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
- the class I terpene synthase-like motif of the above method comprises SEQ ID NO: 54 (DD(K/Q/R)(L/I/T)(D/E)), the class II terpene synthase-like motif comprises SEQ ID NO: 57 (D(V/M/L)DTT), and the drimane sesquiterpene is albicanol.
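The motif notation above lists alternative residues in parentheses separated by slashes, e.g. (K/Q/R) means K, Q or R at that position. A minimal Python sketch, assuming that reading (and treating a lowercase x as any residue), translates such notation into a regular expression and scans a protein sequence for it. The function names and the toy sequence are hypothetical and do not come from the patent's sequence listing.

```python
import re


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Translate motif notation such as 'DD(K/Q/R)(L/I/T)(D/E)' into a compiled regex.

    '(A/B/C)' means any one of the listed residues at that position, 'x' means any
    residue, and every other letter is a literal residue.
    """
    parts = []
    i = 0
    while i < len(motif):
        if motif[i] == "(":
            end = motif.index(")", i)  # raises ValueError if the group is not closed
            alternatives = motif[i + 1:end].split("/")
            parts.append("[" + "".join(alternatives) + "]")
            i = end + 1
        else:
            parts.append("." if motif[i] == "x" else re.escape(motif[i]))
            i += 1
    return re.compile("".join(parts))


def find_motif(sequence: str, motif: str) -> list:
    """Return (1-based start, matched residues) for every occurrence, overlaps included."""
    # A zero-width lookahead lets overlapping occurrences be reported separately.
    lookahead = re.compile("(?=(" + motif_to_regex(motif).pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "MSTDDKLDAAGHDVDTTRE"  # hypothetical toy sequence, not SEQ ID NO: 1 or 5
    print(find_motif(toy, "DD(K/Q/R)(L/I/T)(D/E)"))  # SEQ ID NO: 54 notation
    print(find_motif(toy, "D(V/M/L)DTT"))            # SEQ ID NO: 57 notation
```

On the toy sequence this reports `[(4, 'DDKLD')]` for the SEQ ID NO: 54 pattern and `[(13, 'DVDTT')]` for the SEQ ID NO: 57 pattern.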
- polypeptide comprises
- polypeptide comprises
- polypeptide comprising a HAD-like hydrolase domain and having bifunctional terpene synthase activity comprising the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 5 or comprising
- the isolated polypeptide further comprises one or more conserved motifs as set forth in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
- nucleic acid molecule: Provided herein is an isolated nucleic acid molecule
- a vector comprising the above nucleic acid molecule or a nucleic acid encoding the above polypeptide.
- the vector is an expression vector.
- the vector is a prokaryotic vector, viral vector or a eukaryotic vector.
- a host cell or non-human organism comprising the above nucleic acid or above vector.
- the host cell or non-human organism is a prokaryotic cell or a eukaryotic cell or a microorganism or fungal cell.
- the prokaryotic cell is a bacterial cell.
- the bacterial cell is E. coli.
- the host cell or non-human organism is a eukaryotic cell.
- the eukaryotic cell is a yeast cell or plant cell.
- the yeast cell is Saccharomyces cerevisiae.
- the drimane sesquiterpene is albicanol. In another aspect, in the above use of the polypeptide, the drimane sesquiterpene is drimenol.
- FIG. 1 Structure of drimane, (+)-albicanol and (−)-drimenol.
- FIG. 2 Mechanism of cyclization of farnesyl-diphosphate by a class II terpene synthase and class I terpene synthase enzymatic activity.
- FIG. 3 GCMS analysis of the sesquiterpenes produced in vivo by the recombinant CvTps1 enzyme in bacterial cells modified to overproduce farnesyl-diphosphate.
- A Total ion chromatogram of an extract of E. coli cells expressing CvTps1 and the mevalonate pathway enzymes.
- B Total ion chromatogram of an authentic standard of albicanol.
- C Total ion chromatogram of an extract of E. coli cells expressing only the mevalonate pathway enzymes. 1, albicanol; 2, trans-farnesol (from hydrolysis of FPP by endogenous phosphatase enzymes).
- FIG. 4 Comparison of the mass spectra of the product of CvTps1 and of an authentic standard of albicanol.
- FIG. 5 GCMS analysis of the sesquiterpenes produced by the LoTps1 and CvTps1 recombinant protein.
- the peak labeled ‘1’ is (+)-albicanol.
- FIG. 6 A-C Amino acid sequence alignment of putative terpene synthases containing class I and class II motifs: CvTps1 (SEQ ID NO: 1), LoTps1 (SEQ ID NO: 5), OCH93767.1 (SEQ ID NO: 9), EMD37666.1 (SEQ ID NO: 12), EMD37666-B (SEQ ID NO: 15), XP_001217376.1 (SEQ ID NO: 17), OJJ98394.1 (SEQ ID NO: 20), GAO87501.1 (SEQ ID NO: 23), XP_008034151.1 (SEQ ID NO: 26), XP_007369631.1 (SEQ ID NO: 29), ACg006372 (SEQ ID NO: 32), KIA75676.1 (SEQ ID NO: 35), XP_001820867.2 (SEQ ID NO: 38), CEN60542.1 (SEQ ID NO: 41), XP_009547469.1 (SEQ ID NO: 44), K
- FIG. 7 GCMS chromatograms of the sesquiterpenes produced by the LoTps1, CvTps1, OCH93767.1, EMD37666.1, EMD37666-B and XP_001217376.1 recombinant proteins.
- the peak labeled ‘1’ is (+)-albicanol.
- FIG. 8 GCMS chromatograms of the sesquiterpenes produced by the OJJ98394.1, GAO87501.1, XP_008034151.1, XP_007369631.1, and ACg006372 recombinant proteins.
- the peak labeled ‘1’ is (+)-albicanol.
- FIG. 9 GCMS chromatograms of the sesquiterpenes produced by the KIA75676.1, XP_001820867.2, CEN60542.1, XP_009547469.1, KLO09124.1 and OJI95797.1 recombinant proteins.
- the peak labeled ‘1’ is (−)-drimenol and the peak labeled ‘2’ is farnesol.
- FIG. 10 GCMS chromatograms of the sesquiterpenes produced by CvTps1 and AstC expressed in E. coli cells with and without the AstI and AstK phosphatases.
- the major peak obtained with AstC is drim-8-ene-11-ol and the major peak obtained with CvTps1 is (+)-albicanol.
- FIG. 11 GCMS analysis of the sesquiterpenes produced in vivo by the recombinant XP_006461126.1 enzyme in bacterial cells modified to overproduce farnesyl-diphosphate.
- A Total ion chromatogram of an extract of E. coli cells expressing XP_006461126.1 and the mevalonate pathway enzymes.
- B Mass spectrum of the peak at 13.1 minutes, identified as drimenol.
- FIG. 12 GC-FID analysis of drimane sesquiterpenes produced using the modified S. cerevisiae strain YST045 expressing five different synthases: XP_007369631.1 from Dichomitus squalens, XP_006461126 from Agaricus bisporus, LoTps1 from Laricifomes officinalis, EMD37666.1 from Gelatoporia subvermispora and XP_001217376.1 from Aspergillus terreus.
- polypeptide means an amino acid sequence of consecutively polymerized amino acid residues, for instance, at least 15, at least 30, or at least 50 residues.
- a polypeptide comprises an amino acid sequence that is an enzyme, or a fragment, or a variant thereof.
- protein refers to an amino acid sequence of any length wherein amino acids are linked by covalent peptide bonds, and includes oligopeptide, peptide, polypeptide and full length protein whether naturally occurring or synthetic.
- isolated polypeptide refers to an amino acid sequence that is removed from its natural environment by any method or combination of methods known in the art and includes recombinant, biochemical and synthetic methods.
- bifunctional terpene synthase or “polypeptide having bifunctional terpene synthase activity” relate to a polypeptide that comprises class I and class II terpene synthase domains and has bifunctional terpene synthase activity, i.e., both protonation-initiated cyclization and ionization-initiated cyclization catalytic activities.
- a bifunctional terpene synthase as described herein comprises a HAD-like hydrolase domain which is characteristic of polypeptides belonging to the Haloacid dehalogenase (HAD)-like hydrolase superfamily (Interpro protein superfamily IPR023214, www.ebi.ac.uk/interpro/entry/IPR023214; Pfam protein superfamily PF13419, pfam.xfam.org/family/PF13419).
- HAD-like hydrolase domain is a portion of a polypeptide having amino acid sequence similarities with the members of the HAD-like hydrolase family and related function.
- a HAD-like hydrolase domain can be identified in a polypeptide by searching for amino acid motifs or signatures characteristic of this protein family.
- Proteins are generally composed of one or more functional regions or domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function.
- a polypeptide which comprises a HAD-like hydrolase domain and/or characteristic HAD-like hydrolase motifs functions in binding and cleavage of phosphate or diphosphate groups of a ligand.
- a bifunctional terpene synthase may also comprise one or more of conserved motifs A, B, C, and/or D as depicted in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
- drimane sesquiterpene relates to a terpene having a drimane-like carbon skeleton structure as depicted in FIG. 1.
- class I terpene synthase relates to a terpene synthase that catalyses ionization-initiated reactions, for example, monoterpene and sesquiterpene synthases.
- class I terpene synthase motif or “class I terpene synthase-like motif” relates to an active site of a terpene synthase that comprises the conserved DDxx(D/E) motif.
- the aspartic acid residues of this class I motif bind, for example, a divalent metal ion (most often Mg²⁺) involved in the binding of the diphosphate group and catalyze the ionization and cleavage of the allylic diphosphate bond of the substrate.
- class II terpene synthase relates to a terpene synthase that catalyses protonation-initiated cyclization reactions, such as those typically involved in the biosynthesis of triterpenes and labdane diterpenes.
- the protonation-initiated reaction may involve, for example, acidic amino acids donating a proton to the terminal double bond.
- class II terpene synthase motif or “class II terpene synthase-like motif” relates to an active site of a terpene synthase that comprises the conserved DxDD or DxD(T/S)T motif.
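A minimal sketch of a sequence-level check for the class I DDxx(D/E) and class II DxDD or DxD(T/S)T motifs defined above, written in Python with hypothetical helper names and toy sequences. Motif presence is only a hint: it does not demonstrate class I, class II or bifunctional terpene synthase activity, which the definitions tie to catalytic function.

```python
import re

# Class I terpene synthase-like motif DDxx(D/E): D, D, any two residues, then D or E.
CLASS_I_MOTIF = re.compile(r"DD..[DE]")
# Class II terpene synthase-like motifs DxDD or DxD(T/S)T.
CLASS_II_MOTIF = re.compile(r"D.DD|D.D[TS]T")


def motif_summary(sequence: str) -> dict:
    """Report whether a protein sequence contains the class I and/or class II motif.

    Finding both motifs is only a sequence-level hint of a possible bifunctional
    terpene synthase; enzyme activity still has to be shown experimentally.
    """
    seq = sequence.upper()
    has_class_i = CLASS_I_MOTIF.search(seq) is not None
    has_class_ii = CLASS_II_MOTIF.search(seq) is not None
    return {
        "class_I": has_class_i,
        "class_II": has_class_ii,
        "both": has_class_i and has_class_ii,
    }


if __name__ == "__main__":
    # Hypothetical toy sequences for illustration only.
    print(motif_summary("MSTDDKLDAAGHDVDTTRE"))  # contains DDKLD and DVDTT
    print(motif_summary("MDDAYEK"))              # contains only a class I motif
```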
- albicanol synthase or “polypeptide having albicanol synthase activity” or “albicanol synthase protein” relate to a polypeptide capable of catalyzing the synthesis of albicanol, in the form of any of its stereoisomers or a mixture thereof, starting from an acyclic terpene pyrophosphate, particularly farnesyl diphosphate (FPP).
- Albicanol may be the only product or may be part of a mixture of sesquiterpenes.
- polypeptide having drimenol synthase activity or “drimenol synthase protein” relate to a polypeptide capable of catalyzing the synthesis of drimenol, in the form of any of its stereoisomers or a mixture thereof, starting from an acyclic terpene pyrophosphate, particularly farnesyl diphosphate (FPP).
- Drimenol may be the only product or may be part of a mixture of sesquiterpenes.
- biological function refers to the ability of the bifunctional terpene synthase to catalyze the formation of albicanol and/or drimenol or a mixture of compounds comprising albicanol and/or drimenol and one or more terpenes.
- mixture of terpenes or “mixture of sesquiterpenes” refer to a mixture of terpenes or sesquiterpenes that comprises albicanol and/or drimenol, and may also comprise one or more additional terpenes or sesquiterpenes.
- nucleic acid sequence refers to a sequence of nucleotides.
- a nucleic acid sequence may be a single-stranded or double-stranded deoxyribonucleotide or ribonucleotide sequence of any length, and includes coding and non-coding sequences of a gene, exons, introns, sense and antisense complementary sequences, genomic DNA, cDNA, miRNA, siRNA, mRNA, rRNA, tRNA, recombinant nucleic acid sequences, isolated and purified naturally occurring DNA and/or RNA sequences, synthetic DNA and RNA sequences, fragments, primers and nucleic acid probes.
- nucleic acid sequences of RNA are identical to the corresponding DNA sequences, except that thymine (T) is replaced by uracil (U).
- nucleotide sequence should also be understood as comprising a polynucleotide molecule or an oligonucleotide molecule in the form of a separate fragment or as a component of a larger nucleic acid.
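The RNA/DNA correspondence described above can be sketched in Python as a simple base substitution. The function name is hypothetical, and accepting N (the standard code for an unknown base) is an assumption beyond the text.

```python
def dna_to_rna(dna: str) -> str:
    """Return the RNA form of a DNA sequence: identical except that T is replaced by U."""
    seq = dna.upper()
    unexpected = set(seq) - set("ACGTN")  # N = unknown base, passed through unchanged
    if unexpected:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(unexpected)}")
    return seq.replace("T", "U")


if __name__ == "__main__":
    print(dna_to_rna("atgGCTtaa"))  # -> AUGGCUUAA
```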
- isolated nucleic acid or “isolated nucleic acid sequence” relates to a nucleic acid or nucleic acid sequence that is in an environment different from that in which the nucleic acid or nucleic acid sequence naturally occurs and can include those that are substantially free from contaminating endogenous material.
- naturally-occurring as used herein as applied to a nucleic acid refers to a nucleic acid that is found in a cell of an organism in nature and which has not been intentionally modified by a human in the laboratory.
- Recombinant nucleic acid sequences are nucleic acid sequences that result from the use of laboratory methods (for example, molecular cloning) to bring together genetic material from more than one source, creating or modifying a nucleic acid sequence that does not occur naturally and would not otherwise be found in biological organisms.
- Recombinant DNA technology refers to molecular biology procedures to prepare a recombinant nucleic acid sequence as described, for instance, in Laboratory Manuals edited by Weigel and Glazebrook, 2002, Cold Spring Harbor Lab Press; and Sambrook et al, 1989, Cold Spring Harbor, N.Y., Cold Spring Harbor Laboratory Press.
- gene means a DNA sequence comprising a region which is transcribed into an RNA molecule, e.g., an mRNA in a cell, operably linked to suitable regulatory regions, e.g., a promoter.
- a gene may thus comprise several operably linked sequences, such as a promoter, a 5′ leader sequence comprising, e.g., sequences involved in translation initiation, a coding region of cDNA or genomic DNA, introns, exons, and/or a 3′ non-translated sequence comprising, e.g., transcription termination sites.
- a “chimeric gene” refers to any gene which is not normally found in nature in a species, in particular, a gene in which one or more parts of the nucleic acid sequence are present that are not associated with each other in nature.
- the promoter is not associated in nature with part or all of the transcribed region or with another regulatory region.
- the term “chimeric gene” is understood to include expression constructs in which a promoter or transcription regulatory sequence is operably linked to one or more coding sequences or to an antisense, i.e., reverse complement of the sense strand, or inverted repeat sequence (sense and antisense, whereby the RNA transcript forms double stranded RNA upon transcription).
- the term “chimeric gene” also includes genes obtained through the combination of portions of one or more coding sequences to produce a new gene.
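The antisense (reverse complement) and inverted repeat constructs mentioned in the chimeric gene definitions can be illustrated at the sequence level with a short Python sketch. The helper names and the optional `spacer` argument are hypothetical; real constructs also need the regulatory elements described elsewhere in this section.

```python
# Map each DNA base to its Watson-Crick complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Antisense strand: complement every base, then reverse so it reads 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def inverted_repeat(sense: str, spacer: str = "") -> str:
    """Sense sequence, optional spacer, then its reverse complement.

    Once transcribed, the sense and antisense halves of this RNA can pair with
    each other, folding the transcript back into a double-stranded stem.
    """
    return sense + spacer + reverse_complement(sense)


if __name__ == "__main__":
    print(reverse_complement("ATGGCT"))            # -> AGCCAT
    print(inverted_repeat("ATGC", spacer="TTTT"))  # -> ATGCTTTTGCAT
```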
- a “3′ UTR” or “3′ non-translated sequence” refers to the nucleic acid sequence found downstream of the coding sequence of a gene, which comprises, for example, a transcription termination site and (in most, but not all eukaryotic mRNAs) a polyadenylation signal such as AAUAAA or variants thereof. After termination of transcription, the mRNA transcript may be cleaved downstream of the polyadenylation signal and a poly(A) tail may be added, which is involved in the transport of the mRNA to the site of translation, e.g., cytoplasm.
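A minimal Python sketch for locating the AAUAAA polyadenylation signal in a 3′ UTR sequence. The function name and toy sequence are hypothetical, and any variant signals (for example AUUAAA) must be supplied by the caller; which variants are relevant depends on the organism and is an assumption here.

```python
def find_polya_signals(utr3: str, signals=("AAUAAA",)) -> list:
    """Return sorted (1-based position, signal) pairs for each signal found in a 3' UTR.

    DNA input is accepted: T is read as U before searching. `signals` defaults to the
    canonical AAUAAA; variants have to be passed explicitly.
    """
    seq = utr3.upper().replace("T", "U")
    hits = []
    for signal in signals:
        start = seq.find(signal)
        while start != -1:  # str.find returns -1 once no further match exists
            hits.append((start + 1, signal))
            start = seq.find(signal, start + 1)  # step by one so overlapping hits are found
    return sorted(hits)


if __name__ == "__main__":
    utr = "GCUAAUAAAGCAUUAAACC"  # hypothetical toy 3' UTR
    print(find_polya_signals(utr))
    print(find_polya_signals(utr, signals=("AAUAAA", "AUUAAA")))
```

On the toy sequence the default search returns `[(4, 'AAUAAA')]`; adding the AUUAAA variant also reports a hit at position 12.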
- “Expression of a gene” encompasses “heterologous expression” and “over-expression” and involves transcription of the gene and translation of the mRNA into a protein. Overexpression refers to the production of the gene product as measured by levels of mRNA, polypeptide and/or enzyme activity in transgenic cells or organisms that exceeds levels of production in non-transformed cells or organisms of a similar genetic background.
- “Expression vector” as used herein means a nucleic acid molecule engineered using molecular biology methods and recombinant DNA technology for delivery of foreign or exogenous DNA into a host cell.
- the expression vector typically includes sequences required for proper transcription of the nucleotide sequence.
- the coding region usually codes for a protein of interest but may also code for an RNA, e.g., an antisense RNA, siRNA and the like.
- an “expression vector” as used herein includes any linear or circular recombinant vector including but not limited to viral vectors, bacteriophages and plasmids. The skilled person is capable of selecting a suitable vector according to the expression system.
- the expression vector includes the nucleic acid of an embodiment herein operably linked to at least one regulatory sequence, which controls transcription, translation, initiation and termination, such as a transcriptional promoter, operator or enhancer, or an mRNA ribosomal binding site and, optionally, including at least one selection marker.
- Nucleotide sequences are “operably linked” when the regulatory sequence functionally relates to the nucleic acid of an embodiment herein.
- regulatory sequence refers to a nucleic acid sequence that determines expression level of the nucleic acid sequences of an embodiment herein and is capable of regulating the rate of transcription of the nucleic acid sequence operably linked to the regulatory sequence. Regulatory sequences comprise promoters, enhancers, transcription factors, promoter elements and the like.
- Promoter refers to a nucleic acid sequence that controls the expression of a coding sequence by providing a binding site for RNA polymerase and other factors required for proper transcription including without limitation transcription factor binding sites, repressor and activator protein binding sites.
- the meaning of the term promoter also includes the term “promoter regulatory sequence”.
- Promoter regulatory sequences may include upstream and downstream elements that may influence transcription, RNA processing or stability of the associated coding nucleic acid sequence. Promoters include naturally-derived and synthetic sequences.
- the coding nucleic acid sequence is usually located downstream of the promoter with respect to the direction of transcription, starting at the transcription initiation site.
- constitutive promoter refers to an unregulated promoter that allows for continual transcription of the nucleic acid sequence it is operably linked to.
- operably linked refers to a linkage of polynucleotide elements in a functional relationship.
- a nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence.
- a promoter, or rather a transcription regulatory sequence, is operably linked to a coding sequence if it affects the transcription of the coding sequence.
- Operably linked means that the DNA sequences being linked are typically contiguous.
- the nucleotide sequence associated with the promoter sequence may be of homologous or heterologous origin with respect to the plant to be transformed. The sequence also may be entirely or partially synthetic.
- nucleic acid sequence associated with the promoter sequence will be expressed or silenced in accordance with promoter properties to which it is linked after binding to the polypeptide of an embodiment herein.
- the associated nucleic acid may code for a protein that is desired to be expressed or suppressed throughout the organism at all times or, alternatively, at a specific time or in specific tissues, cells, or cell compartment.
- Such nucleotide sequences particularly encode proteins conferring desirable phenotypic traits to the host cells or organism altered or transformed therewith.
- the associated nucleotide sequence leads to the production of albicanol and/or drimenol, of a mixture comprising albicanol and/or drimenol, or of a mixture comprising albicanol and/or drimenol and one or more other terpenes in the cell or organism.
- the nucleotide sequence encodes a bifunctional terpene synthase.
- Target peptide refers to an amino acid sequence which targets a protein, or polypeptide to intracellular organelles, i.e., mitochondria, or plastids, or to the extracellular space (secretion signal peptide).
- a nucleic acid sequence encoding a target peptide may be fused to the nucleic acid sequence encoding the amino-terminal end, i.e., the N-terminal end, of the protein or polypeptide, or may be used to replace a native targeting polypeptide.
- primer refers to a short nucleic acid sequence that is hybridized to a template nucleic acid sequence and is used for polymerization of a nucleic acid sequence complementary to the template.
- the term “host cell” or “transformed cell” refers to a cell (or organism) altered to harbor at least one nucleic acid molecule, for instance, a recombinant gene encoding a desired protein, or a nucleic acid sequence which upon expression yields a bifunctional terpene synthase protein useful to produce albicanol and/or drimenol.
- the host cell is particularly a bacterial cell, a fungal cell or a plant cell.
- the host cell may contain a recombinant gene which has been integrated into the nuclear or organelle genomes of the host cell. Alternatively, the host may contain the recombinant gene extra-chromosomally.
- Homologous sequences include orthologous or paralogous sequences. Methods of identifying orthologs or paralogs including phylogenetic methods, sequence similarity and hybridization methods are known in the art and are described herein.
- Paralogs result from gene duplication that gives rise to two or more genes with similar sequences and similar functions. Paralogs typically cluster together and are formed by duplications of genes within related plant species. Paralogs are found in groups of similar genes using pair-wise BLAST analysis or during phylogenetic analysis of gene families using programs such as CLUSTAL. Within groups of paralogs, consensus sequences can be identified that are characteristic of the related genes and of their shared functions.
- Orthologs are sequences similar to each other because they are found in species that descended from a common ancestor. For instance, plant species that have common ancestors are known to contain many enzymes that have similar sequences and functions. The skilled artisan can identify orthologous sequences and predict the functions of the orthologs, for example, by constructing a phylogenetic tree for a gene family of one species using CLUSTAL or BLAST programs. One method for identifying or confirming similar functions among homologous sequences is to compare the transcript profiles in host cells or organisms, such as plants or microorganisms, overexpressing or lacking (in knockouts/knockdowns) related polypeptides.
- genes having similar transcript profiles with greater than 50% regulated transcripts in common, or with greater than 70% regulated transcripts in common, or greater than 90% regulated transcripts in common will have similar functions.
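The thresholds above do not state which denominator "in common" refers to. A minimal Python sketch, assuming overlap is measured as the number of regulated transcripts shared by both profiles divided by the number regulated in either profile (a Jaccard index), could look as follows. The function name and transcript identifiers are hypothetical, and choosing a different denominator (for example the smaller of the two sets) would give different percentages.

```python
def shared_regulated_percent(regulated_a: set[str], regulated_b: set[str]) -> float:
    """Percentage of regulated transcripts shared by two profiles.

    Assumption: shared / (regulated in either profile), i.e. a Jaccard index.
    """
    union = regulated_a | regulated_b
    if not union:
        raise ValueError("no regulated transcripts in either profile")
    return 100.0 * len(regulated_a & regulated_b) / len(union)


if __name__ == "__main__":
    # Hypothetical transcript IDs for two overexpressing lines.
    line_a = {"t1", "t2", "t3", "t4"}
    line_b = {"t2", "t3", "t4", "t5"}
    share = shared_regulated_percent(line_a, line_b)
    for threshold in (50, 70, 90):
        print(f"> {threshold}% in common: {share > threshold}")
```

With these placeholder sets, three of five regulated transcripts are shared, so the pair would pass the 50% threshold but not the 70% or 90% thresholds.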
- Homologs, paralogs, orthologs and any other variants of the sequences herein are expected to function in a similar manner, i.e., to make host cells or organisms, such as plants or microorganisms, produce bifunctional terpene synthase proteins.
- selectable marker refers to any gene which upon expression may be used to select a cell or cells that include the selectable marker. Examples of selectable markers are described below. The skilled artisan will know that different antibiotic, fungicide, auxotrophic or herbicide selectable markers are applicable to different target species.
- “Drimenol” for purposes of this application relates to (−)-drimenol (CAS: 468-68-8).
- organism refers to any non-human multicellular or unicellular organism such as a plant or a microorganism. Particularly, a micro-organism is a bacterium, a yeast, an alga or a fungus.
- plant is used interchangeably to include plant cells including plant protoplasts, plant tissues, plant cell tissue cultures giving rise to regenerated plants, or parts of plants, or plant organs such as roots, stems, leaves, flowers, pollen, ovules, embryos, fruits and the like. Any plant can be used to carry out the methods of an embodiment herein.
- a particular organism or cell is meant to be “capable of producing FPP” when it produces FPP naturally or when it does not produce FPP naturally but is transformed to produce FPP, either prior to the transformation with a nucleic acid as described herein or together with said nucleic acid.
- Organisms or cells transformed to produce a higher amount of FPP than the naturally occurring organism or cell are also encompassed by the “organisms or cells capable of producing FPP”.
- nucleic acid molecule comprising a nucleotide sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to the nucleotide sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68 or comprising the nucleotide sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 33,
- the nucleic acid molecule consists of the nucleotide sequence SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66
- the nucleic acid of an embodiment herein can be either present naturally in Cryptoporus or Laricifomes or in other fungal species, or be obtained by modifying SEQ ID NO: 3 or SEQ ID NO: 7 or the reverse complement thereof.
- nucleic acid is isolated or is derived from fungi of the genus Cryptoporus or Laricifomes. In a further embodiment the nucleic acid is isolated or derived from Cryptoporus volvatus or Laricifomes officinalis.
- nucleotide sequence obtained by modifying SEQ ID NO: 3 or SEQ ID NO: 7 or the reverse complement thereof which encompasses any sequence that has been obtained by modifying the sequence of SEQ ID NO: 3 or SEQ ID NO: 7, or of the reverse complement thereof using any method known in the art, for example, by introducing any type of mutations such as deletion, insertion and/or substitution mutations.
- nucleic acids comprising a sequence obtained by mutation of SEQ ID NO: 3 or SEQ ID NO: 7 or the reverse complement thereof are encompassed by an embodiment herein, provided that the sequences they comprise share at least the defined sequence identity with SEQ ID NO: 3 or SEQ ID NO: 7, as defined in any of the above embodiments, or with the reverse complement thereof, and provided that they encode a polypeptide comprising a HAD-like hydrolase domain and having a bifunctional terpene synthase activity to produce a drimane sesquiterpene, wherein the polypeptide comprises (1) a class I terpene synthase-like motif as set forth in SEQ ID NO: 53 (DDxx(D/E)) and (2) a class II terpene synthase-like motif as set forth in SEQ ID NO: 56 (DxD(T/S)T).
- the polypeptide having bifunctional terpene synthase activity may further comprise one or more conserved motif as set forth in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
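The two required motifs are written in a compact notation in which "x" stands for any amino acid and "(D/E)" or "(T/S)" for a choice between two residues. As a rough screening aid, a minimal Python sketch can translate that notation into regular expressions and report where each motif occurs in a candidate protein sequence. The helper names and the toy sequence are hypothetical. A motif hit alone does not establish bifunctional terpene synthase activity, since the embodiments also require the stated sequence identity, the HAD-like hydrolase domain and measured activity.

```python
import re

# Motif notation as used in the text: "x" = any residue, "(A/B)" = A or B.
MOTIFS = {
    "class I (SEQ ID NO: 53)": "DDxx(D/E)",
    "class II (SEQ ID NO: 56)": "DxD(T/S)T",
}


def motif_to_regex(motif: str) -> str:
    """Convert 'DDxx(D/E)'-style notation into a regular expression."""
    # Each "(A/B/...)" group becomes a character class "[AB...]".
    pattern = re.sub(
        r"\(([A-Z/]+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    # "x" matches any single amino acid.
    return pattern.replace("x", ".")


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets overlapping occurrences be reported individually.
    regex = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() + 1 for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "MKTLDIDTTAGRWDDLAEVKK"  # hypothetical, not one of the SEQ ID NOs
    for name, motif in MOTIFS.items():
        print(name, motif, find_motif(toy, motif))
```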
- Mutations may be any kind of mutations of these nucleic acids, for example, point mutations, deletion mutations, insertion mutations and/or frame shift mutations of one or more nucleotides of the DNA sequence of SEQ ID NO: 3 or SEQ ID NO: 7.
- the nucleic acid of an embodiment herein may be truncated provided that it encodes a polypeptide as described herein.
- a variant nucleic acid may be prepared in order to adapt its nucleotide sequence to a specific expression system.
- bacterial expression systems are known to more efficiently express polypeptides if amino acids are encoded by particular codons.
- nucleic acid sequences encoding the bifunctional terpene synthase may be optimized for increased expression in the host cell.
- nucleotides of an embodiment herein may be synthesized using codons particular to a host for improved expression.
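Codon optimization can be illustrated by a naive back-translation that picks one preferred codon per amino acid and then checks, with the standard genetic code, that the new DNA still encodes the same protein. In the Python sketch below, the PREFERRED table is an assumed, illustrative set of codons frequently used in E. coli, not a codon usage table taken from this document; practical optimization tools also weigh GC content, mRNA secondary structure and repeats.

```python
# Standard genetic code; codon bases enumerated in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed preferred codons (illustrative E. coli-like choices); "*" = stop.
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def translate(dna: str) -> str:
    """Translate a coding sequence with the standard genetic code."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def back_translate(protein: str) -> str:
    """Encode each residue with its (assumed) preferred codon."""
    return "".join(PREFERRED[aa] for aa in protein.upper())


if __name__ == "__main__":
    peptide = "MDDLAE*"  # hypothetical toy peptide, not a SEQ ID NO
    optimized = back_translate(peptide)
    # The synthetic sequence must still encode the same polypeptide.
    assert translate(optimized) == peptide
    print(optimized)
```

The round-trip check is the essential design point: whatever codon table is used, the optimized nucleotide sequence must translate back to the unchanged amino acid sequence.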
- the nucleic acid molecule comprises the nucleotide sequence of SEQ ID NO: 4, SEQ ID NO: 8, SEQ ID NO: 11, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 19, SEQ ID NO: 22, SEQ ID NO: 25, SEQ ID NO: 28, SEQ ID NO: 31, SEQ ID NO: 34, SEQ ID NO: 37, SEQ ID NO: 40, SEQ ID NO: 43, SEQ ID NO: 46, SEQ ID NO: 49, SEQ ID NO: 52, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, or SEQ ID NO: 70, or the reverse complement thereof.
- an isolated, recombinant or synthetic nucleic acid sequence comprising the nucleotide sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 64, SEQ ID NO:
- Provided herein are also cDNA, genomic DNA and RNA sequences. Any nucleic acid sequence encoding the bifunctional terpene synthase or variants thereof is referred to herein as a bifunctional terpene synthase encoding sequence.
- the nucleic acid of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68 is the coding sequence of a bifunctional terpene synthase gene encoding a bifunctional terpene synthase obtained as described in the Examples.
- a fragment of a polynucleotide of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68 refers to a stretch of contiguous nucleotides of the polynucleotide of an embodiment herein that is particularly at least 15 bp, at least 30 bp, at least 40 bp, at least 50 bp and/or at least 60 bp in length.
- the fragment of a polynucleotide comprises at least 25, more particularly at least 50, more particularly at least 75, more particularly at least 100, more particularly at least 150, more particularly at least 200, more particularly at least 300, more particularly at least 400, more particularly at least 500, more particularly at least 600, more particularly at least 700, more particularly at least 800, more particularly at least 900, more particularly at least 1000 contiguous nucleotides of the polynucleotide of an embodiment herein.
- fragment of the polynucleotides herein may be used as a PCR primer, and/or as a probe, or for anti-sense gene silencing or RNAi.
- genes, including the polynucleotides of an embodiment herein, can be cloned on the basis of the available nucleotide sequence information, such as that found in the attached sequence listing, by methods known in the art. These include, e.g., the design of DNA primers representing the flanking sequences of such a gene, one of which is generated in sense orientation and initiates synthesis of the sense strand, while the other is created in reverse complementary fashion and generates the antisense strand. Thermostable DNA polymerases such as those used in the polymerase chain reaction are commonly used to carry out such experiments. Alternatively, DNA sequences representing genes can be chemically synthesized and subsequently introduced into DNA vector molecules that can be propagated in compatible bacteria such as E. coli.
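The flanking-primer strategy described above (one primer in sense orientation at the 5′ end of the gene, the other taken in reverse complementary fashion from its 3′ end) can be sketched in Python. The primer length, helper names and toy template are assumptions for illustration. The melting temperature uses the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), which is only a rough estimate for short oligonucleotides; real primer design also checks self-complementarity, primer dimers and off-target binding.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Percentage of G and C bases."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def flanking_primers(template: str, length: int = 20) -> tuple[str, str]:
    """Forward primer: first `length` bases of the sense strand.
    Reverse primer: reverse complement of the last `length` bases."""
    if len(template) < 2 * length:
        raise ValueError("template is shorter than two primers")
    return template[:length], reverse_complement(template[-length:])


if __name__ == "__main__":
    template = "ATGGATGATCTGGCGGAATAA"  # hypothetical toy ORF, not a SEQ ID NO
    fwd, rev = flanking_primers(template, length=6)
    for name, primer in (("forward", fwd), ("reverse", rev)):
        print(name, primer, f"GC={gc_content(primer):.0f}%", f"Tm~{wallace_tm(primer)} C")
```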
- PCR primers and/or probes for detecting nucleic acid sequences encoding a polypeptide having bifunctional terpene synthase activity are provided.
- a detection kit for nucleic acid sequences encoding the bifunctional terpene synthase may include primers and/or probes specific for nucleic acid sequences encoding the bifunctional terpene synthase, and an associated protocol to use the primers and/or probes to detect nucleic acid sequences encoding the bifunctional terpene synthase in a sample.
- detection kits may be used to determine whether a plant, organism, microorganism or cell has been modified, i.e., transformed with a sequence encoding the bifunctional terpene synthase.
- the sequence of interest is operably linked to a selectable or screenable marker gene and expression of the reporter gene is tested in transient expression assays, for example, with microorganisms or with protoplasts or in stably transformed plants.
- DNA sequences capable of driving expression are built as modules. Accordingly, expression levels from shorter DNA fragments may differ from that of the longest fragment and from each other.
- nucleic acid sequences coding for the bifunctional terpene synthase proteins provided herein, i.e., nucleotide sequences that hybridize under stringent conditions to the nucleic acid sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68.
- hybridization or hybridizes under certain conditions is intended to describe conditions for hybridization and washes under which nucleotide sequences that are significantly identical or homologous to each other remain bound to each other.
- the conditions may be such that sequences, which are at least about 70%, such as at least about 80%, and such as at least about 85%, 90%, or 95% identical, remain bound to each other. Definitions of low stringency, moderate, and high stringency hybridization conditions are provided herein.
- defined conditions of low stringency are as follows. Filters containing DNA are pretreated for 6 h at 40° C. in a solution containing 35% formamide, 5× SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.1% PVP, 0.1% Ficoll, 1% BSA, and 500 µg/ml denatured salmon sperm DNA. Hybridizations are carried out in the same solution with the following modifications: 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 µg/ml salmon sperm DNA, 10% (wt/vol) dextran sulfate, and 5-20 × 10⁶ cpm of ³²P-labeled probe is used.
- Filters are incubated in hybridization mixture for 18-20 h at 40° C., and then washed for 1.5 h at 55° C. in a solution containing 2× SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS. The wash solution is replaced with fresh solution and incubated an additional 1.5 h at 60° C. Filters are blotted dry and exposed for autoradiography.
- defined conditions of moderate stringency are as follows. Filters containing DNA are pretreated for 7 h at 50° C. in a solution containing 35% formamide, 5× SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.1% PVP, 0.1% Ficoll, 1% BSA, and 500 µg/ml denatured salmon sperm DNA. Hybridizations are carried out in the same solution with the following modifications: 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 µg/ml salmon sperm DNA, 10% (wt/vol) dextran sulfate, and 5-20 × 10⁶ cpm of ³²P-labeled probe is used.
- Filters are incubated in hybridization mixture for 30 h at 50° C., and then washed for 1.5 h at 55° C. in a solution containing 2× SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS. The wash solution is replaced with fresh solution and incubated an additional 1.5 h at 60° C. Filters are blotted dry and exposed for autoradiography.
- defined conditions of high stringency are as follows. Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65° C. in buffer composed of 6× SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 µg/ml denatured salmon sperm DNA. Filters are hybridized for 48 h at 65° C. in the prehybridization mixture containing 100 µg/ml denatured salmon sperm DNA and 5-20 × 10⁶ cpm of ³²P-labeled probe. Washing of filters is done at 37° C.
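Why these regimes differ in stringency can be made concrete with an empirical estimate of the melting temperature of a DNA-DNA hybrid. One commonly cited approximation (Meinkoth and Wahl, 1984) is Tm = 81.5 + 16.6 log10(M) + 0.41 (%GC) − 0.61 (%formamide) − 500/L, where M is the molarity of monovalent cations, %GC the G+C percentage of the hybrid, %formamide its percentage in the solution and L the hybrid length in base pairs; Tm is commonly taken to drop by about 1 °C per 1% mismatch. This formula is not part of the conditions defined above, and the Python sketch below only evaluates it for caller-supplied inputs; the cation molarity of a given SSC dilution is deliberately left as an input rather than assumed.

```python
import math


def hybrid_tm(na_molar: float, gc_percent: float, formamide_percent: float,
              length_bp: int, mismatch_percent: float = 0.0) -> float:
    """Approximate Tm (deg C) of a DNA-DNA hybrid, Meinkoth-Wahl style,
    lowered by ~1 deg C for each 1% of mismatched base pairs."""
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("cation molarity and hybrid length must be positive")
    tm = (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
          - 0.61 * formamide_percent - 500.0 / length_bp)
    return tm - 1.0 * mismatch_percent


if __name__ == "__main__":
    # Hypothetical inputs: 1.0 M Na+, 50% GC, 1000 bp probe.
    for formamide in (0, 35):
        for mismatch in (0, 15):
            tm = hybrid_tm(1.0, 50, formamide, 1000, mismatch)
            print(f"formamide={formamide}% mismatch={mismatch}% Tm~{tm:.1f} C")
```

The comparison shows the two levers the conditions above use: formamide lowers Tm so hybridization can run at a lower temperature, and mismatches lower Tm so that, at a fixed wash temperature, less similar sequences dissociate first.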
- the percentage of identity between two peptide or nucleotide sequences is a function of the number of amino acids or nucleotide residues that are identical in the two sequences when an alignment of these two sequences has been generated. Identical residues are defined as residues that are the same in the two sequences in a given position of the alignment.
- the percentage of sequence identity is calculated from the optimal alignment by taking the number of residues identical between the two sequences, dividing it by the total number of residues in the shorter sequence and multiplying by 100.
- the optimal alignment is the alignment in which the percentage of identity is the highest possible. Gaps may be introduced into one or both sequences in one or more positions of the alignment to obtain the optimal alignment.
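Given an alignment, the identity calculation defined above (identical positions divided by the number of residues in the shorter sequence, times 100) takes only a few lines. The Python sketch below assumes the two sequences are already aligned and padded with "-" at gap positions; it does not search for the optimal alignment itself, which in practice would come from an alignment program. Other tools may report identity over a different denominator, such as the alignment length, so their figures can differ.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical aligned positions / residues in the shorter sequence * 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Gap positions never count as identical residues.
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    len_a = len(aligned_a.replace(gap, ""))
    len_b = len(aligned_b.replace(gap, ""))
    return 100.0 * identical / min(len_a, len_b)


if __name__ == "__main__":
    # Hypothetical aligned fragments with one gap in the first sequence.
    print(percent_identity("AC-GT", "ACAGA"))
```

For the example, three positions are identical and the shorter (ungapped) sequence has four residues, so the identity is 75%.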
- Alignment for the purpose of determining the percentage of amino acid or nucleic acid sequence identity can be achieved in various ways using computer programs, for instance publicly available programs on the world wide web.
- the BLAST program (Tatusova and Madden, FEMS Microbiol. Lett., 1999, 174:247-250), set to the default parameters and available from the National Center for Biotechnology Information (NCBI) website at ncbi.nlm.nih.gov/BLAST/b12seq/wblast2.cgi, can be used to obtain an optimal alignment of protein or nucleic acid sequences and to calculate the percentage of sequence identity.
- a related embodiment provided herein provides a nucleic acid sequence which is complementary to the nucleic acid sequence according to SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68 such as inhibitory RNAs, or nucleic acid sequence which hybridizes under stringent conditions to at least part of the nucleotide sequence according to SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68.
- An alternative embodiment herein provides a method to alter gene expression in a host cell. For instance, expression of the polynucleotide of an embodiment herein may be enhanced, overexpressed or induced in certain contexts (e.g. upon exposure to certain temperatures or culture conditions) in a host cell or host organism.
- Alteration of expression of a polynucleotide provided herein may also result in ectopic expression, i.e., an expression pattern in the altered organism that differs from that in a control or wild-type organism. Alteration of expression results from interactions of the polypeptide of an embodiment herein with exogenous or endogenous modulators, or from chemical modification of the polypeptide. The term also covers an expression pattern of the polynucleotide of an embodiment herein that is reduced below the detection level, or activity that is completely suppressed.
- provided herein is also an isolated, recombinant or synthetic polynucleotide encoding a polypeptide or variant polypeptide provided herein.
- an isolated nucleic acid molecule encoding a polypeptide comprising a domain of the HAD-like hydrolase superfamily having bifunctional terpene synthase activity and comprising an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63; and the sequence as set forth in SEQ ID NO: 53, SEQ ID NO: 54, or SEQ ID NO: 55; and the sequence as
- an isolated polypeptide comprising a HAD-like hydrolase domain having bifunctional terpene synthase activity and comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 5 or comprising the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 5.
- the polypeptide consists of the amino acid sequence of SEQ ID NO: 1 or 5.
- the polypeptide of an embodiment herein can be present naturally in Cryptoporus or Laricifomes fungi or in other fungi species, or comprises an amino acid sequence that is a variant of SEQ ID NO: 1 or SEQ ID NO: 5, either obtained by genetic engineering or found naturally in Cryptoporus or Laricifomes fungi or in other fungi species.
- the polypeptide is isolated or derived from fungi of the genus Cryptoporus or Laricifomes.
- the polypeptide is isolated or derived from Cryptoporus volvatus or Laricifomes officinalis.
- the at least one polypeptide having a bifunctional terpene synthase activity used in any of the herein-described embodiments or encoded by the nucleic acid used in any of the herein-described embodiments comprises an amino acid sequence that is a variant of SEQ ID NO: 1 or SEQ ID NO: 5, obtained by genetic engineering.
- the polypeptide comprises an amino acid sequence encoded by a nucleotide sequence that has been obtained by modifying SEQ ID NO: 3 or SEQ ID NO: 7 or the reverse complement thereof.
- Polypeptides are also meant to include variants and truncated polypeptides provided that they have bifunctional terpene synthase activity.
- the at least one polypeptide having a bifunctional terpene synthase activity used in any of the herein-described embodiments or encoded by the nucleic acid used in any of the herein-described embodiments comprises an amino acid sequence that is a variant of SEQ ID NO: 1 or SEQ ID NO: 5, obtained by genetic engineering, provided that said variant has bifunctional terpene synthase activity to produce a drimane sesquiterpene and has the required percentage of identity to SEQ ID NO: 1 or SEQ ID NO: 5 as described in any of the above embodiments and comprises (1) a class I terpene synthase-like motif as set forth in SEQ ID NO: 53 (DDxx(D/E)) and (2) a class II terpene synthase-like motif as set forth in SEQ ID NO: 56 (DxD(T/S)T) and comprises domains corresponding to Pfam domains PF13419.5 and PF13242.5.
- the polypeptide having bifunctional terpene synthase activity may further comprise one or more conserved motifs as set forth in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
- the at least one polypeptide having a bifunctional terpene synthase activity used in any of the herein-described embodiments or encoded by the nucleic acid used in any of the herein-described embodiments is a variant of SEQ ID NO: 1 or SEQ ID NO: 5 that can be found naturally in other organisms, such as other fungal species, provided that it has bifunctional terpene synthase activity and comprises domains corresponding to Pfam domains PF13419.5 and PF13242.5.
- the polypeptide includes a polypeptide or peptide fragment that encompasses the amino acid sequences identified herein, as well as truncated or variant polypeptides provided that they have bifunctional terpene synthase activity and that they share at least the defined percentage of identity with the corresponding fragment of SEQ ID NO: 1 or SEQ ID NO: 5 and comprise (1) a class I terpene synthase-like motif as set forth in SEQ ID NO: 53 (DDxx(D/E)) and (2) a class II terpene synthase-like motif as set forth in SEQ ID NO: 56 (DxD(T/S)T) and comprise domains corresponding to Pfam domains PF13419.5 and PF13242.5.
- variant polypeptides are naturally occurring proteins that result from alternate mRNA splicing events or from proteolytic cleavage of the polypeptides described herein. Variations attributable to proteolysis include, for example, differences in the N- or C-termini upon expression in different types of host cells, due to proteolytic removal of one or more terminal amino acids from the polypeptides of an embodiment herein. Polypeptides encoded by a nucleic acid obtained by natural or artificial mutation of a nucleic acid of an embodiment herein, as described hereinafter, are also encompassed by an embodiment herein.
- Polypeptide variants resulting from a fusion of additional peptide sequences at the amino and carboxyl terminal ends can also be used in the methods of an embodiment herein.
- a fusion can enhance expression of the polypeptides, be useful in the purification of the protein or improve the enzymatic activity of the polypeptide in a desired environment or expression system.
- additional peptide sequences may be signal peptides, for example.
- Another aspect encompasses methods using variant polypeptides, such as those obtained by fusion with other oligo- or polypeptides and/or those which are linked to signal peptides.
- Polypeptides resulting from a fusion with another functional protein, such as another protein from the terpene biosynthesis pathway can also be advantageously used in the methods of an embodiment herein.
- a variant may also differ from the polypeptide of an embodiment herein by attachment of modifying groups which are covalently or non-covalently linked to the polypeptide backbone.
- the variant also includes a polypeptide which differs from the polypeptide provided herein by introduced N-linked or O-linked glycosylation sites, and/or an addition of cysteine residues. The skilled artisan will recognize how to modify an amino acid sequence and preserve biological activity.
- DNA sequence polymorphisms may exist within a given population, which may lead to changes in the amino acid sequence of the polypeptides disclosed herein.
- Such genetic polymorphisms may exist in cells from different populations or within a population due to natural allelic variation. Allelic variants may also include functional equivalents.
- the nucleic acid encoding the polypeptide or variants thereof of an embodiment herein is a useful tool for modifying non-human host organisms, microorganisms or cells, including those intended to be used in the methods described herein.
- An embodiment provided herein provides amino acid sequences of bifunctional terpene synthase proteins including orthologs and paralogs as well as methods for identifying and isolating orthologs and paralogs of the bifunctional terpene synthases in other organisms.
- orthologs and paralogs of the bifunctional terpene synthase retain bifunctional terpene synthase activity, may be considered polypeptides of the HAD-like hydrolase superfamily (InterPro protein superfamily IPR023214 or Pfam protein superfamily PF13419), comprise a HAD-like hydrolase domain and are capable of producing a drimane sesquiterpene, such as albicanol and/or drimenol, starting from an acyclic terpene pyrophosphate precursor, e.g. FPP.
- the polypeptide to be contacted with an acyclic terpene pyrophosphate, e.g. FPP, in vitro can be obtained by extraction from any organism expressing it, using standard protein or enzyme extraction technologies. If the host organism is a unicellular organism or cell releasing the polypeptide of an embodiment herein into the culture medium, the polypeptide may simply be collected from the culture medium, for example by centrifugation, optionally followed by washing steps and re-suspension in suitable buffer solutions. If the organism or cell accumulates the polypeptide within its cells, the polypeptide may be obtained by disruption or lysis of the cells and optionally further extraction of the polypeptide from the cell lysate. The cell lysate or the extracted polypeptide can be used to contact the acyclic terpene pyrophosphate for production of a terpene or a mixture of terpenes.
- the polypeptide having a bifunctional terpene synthase activity may then be suspended in a buffer solution at optimal pH. If appropriate, salts, DTT, inorganic cations and other kinds of enzymatic co-factors may be added in order to optimize enzyme activity.
- the precursor FPP is added to the polypeptide suspension, which is then incubated at optimal temperature, for example between 15 and 40° C., particularly between 25 and 35° C., more particularly at 30° C.
- the drimane sesquiterpene, such as albicanol and/or drimenol, produced may be isolated from the incubated solution by standard isolation procedures, such as solvent extraction and distillation, optionally after removal of polypeptides from the solution.
- the at least one polypeptide having a bifunctional terpene synthase activity can be used for production of a drimane sesquiterpene comprising albicanol and/or drimenol or mixtures of terpenes comprising albicanol and/or drimenol.
- One particular tool to carry out the method of an embodiment herein is the polypeptide itself as described herein.
- the polypeptide is capable of producing a mixture of sesquiterpenes wherein albicanol and/or drimenol represents at least 20%, particularly at least 30%, particularly at least 35%, particularly at least 90%, particularly at least 95%, more particularly at least 98% of the sesquiterpenes produced.
- albicanol and/or drimenol is produced with greater than or equal to 95%, more particularly 98% selectivity.
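One way to express such proportions is as the share of albicanol and drimenol among all sesquiterpenes detected in the assay products. The Python sketch below assumes, for illustration only, that the products are quantified by gas chromatography and that relative peak areas are proportional to amounts (equal detector response for all sesquiterpenes). The function name, product names and peak areas are hypothetical placeholders, not data from the Examples.

```python
def target_share(peak_areas: dict[str, float], targets: set[str]) -> float:
    """Percentage of total sesquiterpene peak area attributable to `targets`.

    Assumes equal detector response factors for all listed products.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("no sesquiterpene signal")
    return 100.0 * sum(a for name, a in peak_areas.items() if name in targets) / total


if __name__ == "__main__":
    # Hypothetical placeholder peak areas, not measured values.
    areas = {"albicanol": 70.0, "drimenol": 25.0, "other sesquiterpene": 5.0}
    share = target_share(areas, {"albicanol", "drimenol"})
    print(f"albicanol + drimenol: {share:.1f}% of sesquiterpenes")
    print("meets >= 95% selectivity:", share >= 95.0)
```

If response factors differ between products, each area would first need to be converted to an amount with its own calibration before computing the share.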
- transient or stable overexpression in plant, bacterial or yeast cells can be used to test whether the protein has activity, i.e., produces albicanol and/or drimenol from FPP precursors.
- Bifunctional terpene synthase activity may be assessed in a microbial expression system, such as the assay described in Example 3 herein on the production of albicanol and/or drimenol, indicating functionality.
- a variant or derivative of a bifunctional terpene synthase polypeptide of an embodiment herein retains an ability to produce a drimane sesquiterpene such as albicanol and/or drimenol from FPP precursors.
- Amino acid sequence variants of the bifunctional terpene synthases provided herein may have additional desirable biological functions including, e.g., altered substrate utilization, reaction kinetics, product distribution or other alterations.
- At least one vector comprising the nucleic acid molecules described herein.
- Also provided herein is a vector selected from the group of a prokaryotic vector, viral vector and a eukaryotic vector.
- the vector is an expression vector.
- bifunctional terpene synthases encoding nucleic acid sequences are co-expressed in a single host, particularly under control of different promoters.
- several nucleic acid sequences encoding bifunctional terpene synthase proteins can be present on a single transformation vector or be co-transformed at the same time using separate vectors, selecting transformants that comprise both chimeric genes.
- one or more bifunctional terpene synthase encoding genes may be expressed in a single plant, cell, microorganism or organism together with other chimeric genes.
- the nucleic acid sequences of an embodiment herein encoding bifunctional terpene synthase proteins can be inserted in expression vectors and/or be contained in chimeric genes inserted in expression vectors, to produce bifunctional terpene synthase proteins in a host cell or non-human host organism.
- the vectors for inserting transgenes into the genome of host cells are well known in the art and include plasmids, viruses, cosmids and artificial chromosomes.
- Binary or co-integration vectors into which a chimeric gene is inserted can also be used for transforming host cells.
- An embodiment provided herein provides recombinant expression vectors comprising a nucleic acid sequence of a bifunctional terpene synthase gene, or a chimeric gene comprising a nucleic acid sequence of a bifunctional terpene synthase gene, operably linked to associated nucleic acid sequences such as, for instance, promoter sequences.
- a chimeric gene comprising a nucleic acid sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO
- the promoter sequence may already be present in a vector so that the nucleic acid sequence which is to be transcribed is inserted into the vector downstream of the promoter sequence.
- Vectors can be engineered to have an origin of replication, a multiple cloning site, and a selectable marker.
- an expression vector comprising a nucleic acid as described herein can be used as a tool for transforming non-human host organisms or host cells suitable to carry out the method of an embodiment herein in vivo.
- the expression vectors provided herein may be used in the methods for preparing a genetically transformed non-human host organism and/or host cell, in non-human host organisms and/or host cells harboring the nucleic acids of an embodiment herein and in the methods for making polypeptides having a bifunctional terpene synthase activity, as described herein.
- Recombinant non-human host organisms and host cells transformed to harbor at least one nucleic acid of an embodiment herein so that it heterologously expresses or over-expresses at least one polypeptide of an embodiment herein are also very useful tools to carry out the method of an embodiment herein. Such non-human host organisms and host cells are therefore provided herein.
- a host cell, microorganism or non-human host organism comprising at least one of the nucleic acid molecules described herein or comprising at least one vector comprising at least one of the nucleic acid molecules.
- a nucleic acid according to any of the above-described embodiments can be used to transform the non-human host organisms and cells and the expressed polypeptide can be any of the above-described polypeptides.
- the non-human host organism or host cell is a prokaryotic cell. In another embodiment, the non-human host organism or host cell is a bacterial cell. In a further embodiment, the non-human host organism or host cell is Escherichia coli.
- the non-human host organism or host cell is a eukaryotic cell. In another embodiment, the non-human host organism or host cell is a yeast cell. In a further embodiment, the non-human host organism or cell is Saccharomyces cerevisiae.
- the non-human organism or host cell is a plant cell or a fungal cell.
- the non-human host organism or host cell expresses a polypeptide, provided that the organism or cell is transformed to harbor a nucleic acid encoding said polypeptide, this nucleic acid is transcribed to mRNA and the polypeptide is found in the host organism or cell. Suitable methods to transform a non-human host organism or a host cell have been previously described and are also provided herein.
- the host organism or host cell is cultivated under conditions conducive to the production of a drimane sesquiterpene such as albicanol and/or drimenol.
- optimal growth conditions can be provided, such as optimal light, water and nutrient conditions, for example.
- conditions conducive to the production of a drimane sesquiterpene such as albicanol and/or drimenol may comprise addition of suitable cofactors to the culture medium of the host.
- a culture medium may be selected, so as to maximize drimane sesquiterpene, such as albicanol and/or drimenol, synthesis. Examples of optimal culture conditions are described in a more detailed manner in the Examples.
- Non-human host organisms suitable to carry out the method of an embodiment herein in vivo may be any non-human multicellular or unicellular organisms.
- the non-human host organism used to carry out an embodiment herein in vivo is a plant, a prokaryote or a fungus. Any plant, prokaryote or fungus can be used. Particularly useful plants are those that naturally produce high amounts of terpenes.
- the non-human host organism used to carry out the method of an embodiment herein in vivo is a microorganism. Any microorganism can be used; for example, the microorganism can be a bacterium or a yeast, such as E. coli or Saccharomyces cerevisiae.
- organisms or cells that do not produce an acyclic terpene pyrophosphate precursor, e.g. FPP, naturally are transformed to produce said precursor. They can be so transformed either before the modification with the nucleic acid described according to any of the above embodiments or simultaneously, as explained above.
- Methods to transform organisms, for example microorganisms, so that they produce an acyclic terpene pyrophosphate precursor, e.g. FPP are already known in the art.
- Isolated higher eukaryotic cells can also be used, instead of complete organisms, as hosts to carry out the method of an embodiment herein in vivo.
- Suitable eukaryotic cells may be any non-human cell, such as plant or fungal cells.
- a method of producing a drimane sesquiterpene comprising: contacting an acyclic terpene pyrophosphate, particularly farnesyl diphosphate (FPP) with a polypeptide which comprises a HAD-like hydrolase domain and having bifunctional terpene synthase activity to produce a drimane sesquiterpene, wherein the polypeptide comprises (1) a class I terpene synthase-like motif as set forth in SEQ ID NO: 53 (DDxx(D/E)); and (2) a class II terpene synthase-like motif as set forth in SEQ ID NO: 56 (DxD(T/S)T); and optionally isolating the drimane sesquiterpene.
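The two motifs named in the method above can be located in a candidate sequence with regular expressions. The sketch below is a minimal illustration, assuming a direct translation of SEQ ID NO: 53 (DDxx(D/E)) and SEQ ID NO: 56 (DxD(T/S)T) into patterns where x is any residue; the function names and the toy sequence are hypothetical, and a motif hit alone does not demonstrate bifunctional terpene synthase activity.

```python
import re

# Assumed regex forms of the claimed motifs (x = any residue):
#   class I terpene synthase-like motif  (SEQ ID NO: 53): DDxx(D/E)
#   class II terpene synthase-like motif (SEQ ID NO: 56): DxD(T/S)T
# The lookahead lets overlapping occurrences be reported.
CLASS_I_MOTIF = re.compile(r"(?=(DD..[DE]))")
CLASS_II_MOTIF = re.compile(r"(?=(D.D[TS]T))")


def find_motifs(sequence, pattern):
    """Return (start, end, matched_text) for each hit, using 1-based inclusive positions."""
    hits = []
    for match in pattern.finditer(sequence.upper()):
        start = match.start(1) + 1
        hits.append((start, start + len(match.group(1)) - 1, match.group(1)))
    return hits


def has_both_motifs(sequence):
    """True if the sequence carries at least one class I and one class II motif."""
    return bool(find_motifs(sequence, CLASS_I_MOTIF)) and bool(
        find_motifs(sequence, CLASS_II_MOTIF)
    )


# Toy sequence for illustration only (not one of the SEQ ID NOs).
toy = "MKDDKLDAAGDVDTTR"
print(find_motifs(toy, CLASS_I_MOTIF))   # [(3, 7, 'DDKLD')]
print(find_motifs(toy, CLASS_II_MOTIF))  # [(11, 15, 'DVDTT')]
```

The patterns only check local sequence; in practice the motifs are localized by alignment against known synthases, as described for CvTps1 and LoTps1.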
- FPP farnesyl diphosphate
- the drimane sesquiterpene comprises albicanol and/or drimenol.
- polypeptide comprises an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63 and (1) the sequence as set forth in SEQ ID NO: 53, SEQ ID NO: 54, or SEQ ID NO: 55; and (2) the sequence as set forth in SEQ ID NO: 56, SEQ ID NO: 57, or SEQ ID NO: 58 or comprising the amino acid sequence of SEQ ID NO:
- the drimane sesquiterpene is albicanol and/or drimenol. In another aspect, the drimane sesquiterpene is isolated.
- the albicanol and/or drimenol is produced with greater than or equal to 60%, 80%, 90% or even 95% selectivity.
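Selectivity here is the share of albicanol and/or drimenol among the sesquiterpenes produced. A minimal sketch, assuming it is estimated from GC peak areas with equal detector response for all sesquiterpenes (no response-factor correction); the function name and peak areas are hypothetical:

```python
def selectivity_percent(peak_areas, target_compounds):
    """Percent of total sesquiterpene peak area attributable to the target compounds.

    Assumes uncorrected GC peak areas are proportional to amount for every compound.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("no sesquiterpene signal")
    target = sum(peak_areas.get(name, 0.0) for name in target_compounds)
    return 100.0 * target / total


# Hypothetical peak areas from one GC-MS run.
areas = {"albicanol": 950.0, "drimenol": 30.0, "other sesquiterpenes": 20.0}
print(selectivity_percent(areas, ["albicanol", "drimenol"]))  # 98.0
print(selectivity_percent(areas, ["albicanol"]) >= 95)        # True
```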
- the drimane sesquiterpene is albicanol.
- a method comprising transforming a host cell, microorganism or a non-human host organism with a nucleic acid encoding a polypeptide comprising a HAD-like hydrolase domain having bifunctional terpene synthase activity and comprising an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63; and comprising (1) the sequence as set forth in SEQ ID NO:
- a method provided herein comprises cultivating a non-human host organism or a host cell capable of producing FPP and transformed to express a polypeptide wherein the polypeptide comprises a sequence of amino acids that has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 5 under conditions that allow for the production of the polypeptide.
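The identity thresholds above presuppose a pairwise alignment, and the text does not state how percent identity is computed. As a minimal sketch, assuming the two sequences have already been aligned (gaps written as '-') and that identity is counted over all columns in which at least one sequence has a residue, the calculation can look like this. Other conventions (for example, dividing by the length of the shorter sequence or of the reference SEQ ID NO) give different values, so the denominator is an assumption; the function names and toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length ('-' marks a gap).

    Assumed convention: identical residue pairs divided by the number of
    alignment columns that are not gaps in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Drop columns that are gaps in both sequences (e.g. from a multiple alignment).
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    # A column counts as identical only if both residues match (gap-gap columns are gone).
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold_percent):
    """Check an 'at least X% sequence identity' condition under the convention above."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Hypothetical toy alignment: 8 identical positions out of 10 columns.
print(percent_identity("ACDEFGHIKL", "ACDEYGHIK-"))  # 80.0
```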
- a method provided herein comprises contacting a sesquiterpene such as albicanol and/or drimenol with at least one enzyme to produce a sesquiterpene derivative.
- the sesquiterpene derivative can be obtained biochemically or chemically.
- a drimenol derivative is provided. Examples of such derivatives of drimenol include, but are not limited to, drimenyl acetate (CAS 40266-93-1), drimenal (CAS 105426-71-9) and drimenic acid (CAS 111319-84-7).
- an albicanol derivative is provided.
- examples of such derivatives of albicanol include cryptoporic acid E (CAS 120001-10-7), cryptoporic acid D (CAS 119979-95-2), cryptoporic acid B (CAS 113592-88-4), cryptoporic acid A (CAS 113592-87-3), laricinolic acid (CAS 302355-23-3), albicanyl acetate (CAS 83679-71-4).
- the albicanol and/or drimenol produced in any of the methods described herein can be converted to derivatives such as, but not limited to, hydrocarbons, esters, amides, glycosides, ethers, epoxides, aldehydes, ketones, alcohols, diols, acetals or ketals.
- the albicanol and/or drimenol derivatives can be obtained by a chemical method such as, but not limited to oxidation, reduction, alkylation, acylation and/or rearrangement.
- the albicanol and/or drimenol derivatives can be obtained using a biochemical method by contacting the albicanol and/or drimenol with an enzyme such as, but not limited to an oxidoreductase, a monooxygenase, a dioxygenase, a transferase.
- the biochemical conversion can be performed in-vitro using isolated enzymes, enzymes from lysed cells or in-vivo using whole cells.
- step a) comprises cultivating a non-human host organism or a host cell capable of producing FPP and transformed to express at least one polypeptide comprising an amino acid sequence comprising SEQ ID NO: 1 or SEQ ID NO: 5 or a functional variant thereof, which may be considered a polypeptide of the HAD-like hydrolase superfamily (InterPro protein superfamily IPR023214 or Pfam protein superfamily PF13419) and which comprises a HAD-like hydrolase domain and has bifunctional terpene synthase activity, under conditions conducive to the production of a drimane sesquiterpene, for example albicanol and/or drimenol.
- albicanol may be the only product or may be part of a mixture of sesquiterpenes.
- drimenol may be the only product or may be part of a mixture of sesquiterpenes.
- the method further comprises, prior to step a), transforming a non-human organism or cell capable of producing FPP with at least one nucleic acid encoding a polypeptide comprising an amino acid sequence comprising SEQ ID NO: 1 or SEQ ID NO: 5 or encoding a polypeptide having bifunctional terpene synthase activity and comprising an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50,
- An embodiment herein provides polypeptides of an embodiment herein to be used in a method to produce a drimane sesquiterpene such as albicanol and/or drimenol by contacting an FPP precursor with the polypeptides of an embodiment herein either in vitro or in vivo.
- a polypeptide as described herein for producing a drimane sesquiterpene for example, albicanol and/or drimenol.
- Drimane sesquiterpenoids are widespread in nature (Jansen and Groot, 2004, Nat. Prod. Rep., 21, 449-477).
- the compounds in the drimane sesquiterpenoid family contain the sesquiterpene structure with the drimane carbon skeleton depicted in FIG. 1.
- commonly found drimane sesquiterpenes are drimenol and albicanol (FIG. 1) and compounds derived from drimenol and albicanol by enzymatic reactions such as oxidation, reduction, acylation, alkylation or rearrangement.
- the drimane sesquiterpenoid family also contains compounds where the drimane sesquiterpene is bound to a molecule derived from another biosynthetic pathway (Jansen and Groot, 2004, Nat. Prod. Rep., 21, 449-477).
- Cryptoporic acids A-H are drimane sesquiterpenoid ethers of isocitric acid found in the fungus Cryptoporus volvatus (Hashimoto et al., 1987, Tetrahedron Lett. 28, 6303-6304; Asakawa et al., 1992, Phytochemistry 31(2), 579-592; Hirotani et al., 1991, Phytochemistry 30(5), 1555-1559).
- in cryptoporic acids, the sesquiterpene moiety has the structure of albicanol, and these compounds are thus putatively derived biosynthetically from albicanol.
- Laricinolic acid is a drimane type sesquiterpene which can be isolated from the wood-rotting fungus Laricifomes officinalis (Erb et al, 2000, J. Chem. Soc., Perkin Trans. 1, 2307-2309). Laricinolic acid is most likely derived from albicanol following several oxidative enzymatic steps.
- RNA extraction: 0.5 ml of culture was taken and the cells (approximately 100 mg) were recovered by centrifugation, frozen in liquid nitrogen and ground using a mortar and pestle. The total RNA pool was extracted using the ZR Fungal/Bacterial RNA MiniPrep™ from Zymo Research Corp (Irvine, Calif. 92614, U.S.A.).
- Genomic DNA was extracted using the NucleoSpin® Soil Kit from Macherey-Nagel (Düren, Germany). Cells were recovered from the culture by centrifugation and the genomic DNA was extracted following the manufacturer's protocol. From 500 mg of cells, 1.05 and 0.93 micrograms of genomic DNA were extracted from ATCC-12212 and ATCC-64430, respectively.
- the genomic DNA was sequenced using a paired read protocol (Illumina).
- the libraries were prepared to select insert sizes between 250 and 350 bp.
- the sequencing was performed on a HiSeq 2500 Illumina sequencer.
- the length of the reads was 125 bases.
- a total of 21.3 and 30.4 million paired reads (clusters) were sequenced for ATCC-12212 and ATCC-64430, respectively.
- the library was prepared from the total RNA using the TruSeq Stranded mRNA Library Preparation Kit (Illumina). An additional insert size selection step (160-240 bp) was performed. The libraries were sequenced as 2 × 125-base paired ends on a HiSeq 2500 Illumina sequencer. For ATCC-12212 and ATCC-64430, 19.9 million and 126 million reads were sequenced, respectively.
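The read counts above translate into sequencing yield by simple arithmetic. A minimal sketch, assuming every read has the full nominal length (no adapter or quality trimming) and that mean coverage is total bases divided by an assumed genome size; the function names are illustrative:

```python
def bases_sequenced(read_pairs, read_length, reads_per_pair=2):
    """Total bases produced by a paired-end run, assuming untrimmed reads of nominal length."""
    return read_pairs * reads_per_pair * read_length


def mean_coverage(total_bases, genome_size):
    """Average sequencing depth (total bases / genome size), ignoring duplicates and errors."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size


# 21.3 million and 30.4 million 2 x 125-base read pairs (genomic DNA runs above).
print(bases_sequenced(21_300_000, 125))  # 5325000000
print(bases_sequenced(30_400_000, 125))  # 7600000000
```

Dividing such a yield by an assembled genome size gives only a rough depth estimate, since trimming and duplicate reads reduce the usable bases.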
- the reads were first joined on their overlapping ends.
- the joined paired reads were then assembled using the Velvet V1.2.10 assembler (Zerbino D. R. and Birney E. 2008, Genome Res. 18(5), 821-829; www.ebi.ac.uk/~zerbino/velvet/) and the Oases software (Schulz M. H et al., 2012, Bioinformatics 28(8), 1086-1092; www.ebi.ac.uk/~zerbino/oases/).
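Joining overlapping read pairs, the step preceding assembly, can be illustrated as follows. The source does not name the tool used for this step; the sketch below is a generic illustration, not the Velvet/Oases workflow, and assumes the second read is reverse-complemented onto the first read's strand and that the longest overlap with few mismatches is the correct one. Base qualities, which real read-merging tools use, are ignored; all names and the toy fragment are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def join_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair whose insert is shorter than the two reads combined.

    read2 is reverse-complemented, then the longest suffix(read1)/prefix(rc(read2))
    overlap with an acceptable mismatch fraction is used. Returns None if no
    overlap of at least min_overlap bases qualifies. At mismatches the read1 base
    is kept, since no quality scores are considered.
    """
    r2 = reverse_complement(read2)
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        mismatches = sum(x != y for x, y in zip(read1[-olen:], r2[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            return read1 + r2[olen:]
    return None


fragment = "GATTACAGCTTGACCGTAAGCTTCGAAT"   # toy 28-base insert
read1 = fragment[:20]                       # forward read
read2 = reverse_complement(fragment[8:])    # reverse read, 12-base overlap with read1
print(join_pair(read1, read2, max_mismatch_frac=0.0) == fragment)  # True
```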
- a total of 25′866 contigs with an average length of 1,792 bases was obtained for the C. volvatus transcriptome.
- the C. volvatus genome was assembled using the Velvet V1.2.10 assembler (Zerbino D. R. and Birney E., 2008, Genome Res. 18(5), 821-829; www.ebi.ac.uk/~zerbino/velvet/).
- the genome could be assembled into 1′266 contigs with an average size of 20,000 bases and a total size of 25′320′421 bases.
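The assembly figures reported above (contig count, total size, average contig length) follow from the list of contig lengths. A minimal sketch that also computes N50, a common contiguity metric that is not reported in the text; the function name and the contig lengths in the example are hypothetical:

```python
def assembly_stats(contig_lengths):
    """Contig count, total size, mean length and N50 for a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("no contigs")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = lengths[-1]
    for length in lengths:
        running += length
        if running * 2 >= total:  # first contig at which half the assembly is covered
            n50 = length
            break
    return {"contigs": len(lengths), "total": total, "mean": total / len(lengths), "n50": n50}


print(assembly_stats([100, 200, 300, 400]))
# {'contigs': 4, 'total': 1000, 'mean': 250.0, 'n50': 300}

# Consistency check on the reported C. volvatus genome assembly:
print(round(25_320_421 / 1_266))  # 20000
```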
- An ab-initio gene prediction in the C. volvatus genomic contigs was performed by Progenus S.A. (Gembloux, Belgium) using the Augustus software (Stanke et al., Nucleic Acids Res. (2004) 32, W309-W312). A total of 7738 genes were predicted.
- the genome and transcriptome of L. officinalis were assembled using the CLC Genomic Workbench (Qiagen).
- the genome was assembled in 16′831 contigs for a total genome size of 90′591′190 bases.
- the transcriptome assembly provided 28′633 contigs with an average length of 1′962 bases.
- Drimane sesquiterpenes are presumably produced from farnesyl-diphosphate (FPP) by an enzymatic mechanism involving a protonation-initiated cyclization followed by an ionization-initiated reaction (Henquet et al., 2017, Plant J. Mar 4. doi: 10.1111/tpj.13527; Kwon, M. et al., 2014, FEBS Letters 588, 4597-4603) (FIG. 2).
- This implies that the drimane synthases are composed of two catalytic domains, a protonation-initiated cyclization catalytic domain and an ionization-initiated cyclization catalytic domain.
- Terpene synthases catalyzing protonation-initiated cyclization reaction are called class II (or type II) terpene synthases and are typically involved in the biosynthesis of triterpenes and labdane diterpenes.
- in class II terpene synthases, the protonation-initiated reaction involves acidic amino acid residues donating a proton to the terminal double bond. These residues, usually aspartic acids, are part of a conserved DxDD motif located in the active site of the enzyme.
- Terpene synthases catalyzing ionization-initiated reactions are called class I (or type I) terpene synthases, generally monoterpene and sesquiterpene synthases, and the catalytic center contains a conserved DDxxD (part of SEQ ID NO: 53) motif.
- the aspartic acid residues of this class I motif bind a divalent metal ion (most often Mg2+) involved in the binding of the diphosphate group and catalyze the ionization and cleavage of the allylic diphosphate bond of the substrate.
- the putative cyclization mechanism of a farnesyl-diphosphate to a drimane sesquiterpene starts with the protonation of the 10,11-double bond followed by the sequential rearrangements and carbon-bond formations.
- the carbocation intermediate of this first (class II) reaction can then undergo deprotonation at C15 or C4 (or eventually at C2) leading to an albicanyl-diphosphate or drimenyl-diphosphate intermediate.
- the class I catalytic domain catalyzes the ionization of the allylic diphosphate bond and the quenching of the carbocation intermediate by a water molecule, leading to a drimane sesquiterpene containing a primary hydroxyl group (FIG. 2).
- any traces of residual phosphorylated intermediates of the albicanol or drimenol synthesis, such as any albicanyl- or drimenyl-monophosphate and/or -diphosphate, may be chemically converted to the respective final product albicanol or drimenol.
- Certain corresponding methods are known and may comprise, for example, the hydrolytic cleavage of the phosphoric acid ester bond. Additionally, certain intermediates can also be converted enzymatically as shown in Examples 7 and 8.
- this enzyme does not have a class I terpene synthase activity and thus does not catalyze the ionization and cleavage of the allylic diphosphate group.
- Using the AstC amino acid sequence as the query, a Blastp search was first performed against the amino acid sequences deduced from the genes predicted in the C. volvatus genome; 5 sequences were retrieved with E values between 0.77 and 3e-89 (Altschul et al., 1990, J. Mol. Biol. 215, 403-410).
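Screening Blastp hits by E value, as in the search above, can be scripted over BLAST+ tabular output. A minimal sketch, assuming the search was run with `-outfmt 6` and its default 12 columns (the source does not say which output format was used); the function name, hit lines and gene identifiers in the example are hypothetical:

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def hits_below_evalue(tabular_text, max_evalue):
    """Parse -outfmt 6 text and return hits with E value <= max_evalue, best first."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        record = dict(zip(FIELDS, row))
        record["evalue"] = float(record["evalue"])
        if record["evalue"] <= max_evalue:
            hits.append(record)
    return sorted(hits, key=lambda r: r["evalue"])


example = (
    "AstC\tgene_0001\t38.0\t480\t290\t6\t5\t480\t12\t495\t3e-89\t290\n"
    "AstC\tgene_0002\t24.5\t110\t80\t3\t200\t305\t40\t148\t0.77\t31.2\n"
)
for hit in hits_below_evalue(example, 1e-5):
    print(hit["sseqid"], hit["evalue"])  # gene_0001 3e-89
```

A strict E-value cutoff is only a first filter; as described here, candidates were then examined for the class I and class II motifs before selection.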
- CvTps1 was selected as the most relevant candidate for a putative albicanol synthase.
- the amino acid sequence encoded by the CvTps1 gene shared 38% identity with the AstC amino acid sequence. Analysis of this sequence revealed the presence of a class II terpene synthase-like motif, DVDT, at position 275-279. This is a variant of the typical class II terpene synthase motif mentioned above, where the last Asp is replaced by a Thr.
- This DxDT class II motif is found in some class II diterpene synthases (Xu M. et al., 2014, J. Nat. Prod. 77, 2144-2147; Morrone D.
- CvTps1 was selected as putative candidate for a bi-functional albicanol synthase.
- Protein family databases such as Pfam and InterPro (European Bioinformatics Institute, EMBL-EBI) are databases of protein families including functional annotation, protein domains and protein domain signatures.
- the amino acid sequence of CvTps1 was searched for the occurrence of motifs characteristic of protein domains using the HMMER algorithm available on the HMMER website (Finn R. D., 2015, Nucleic Acids Research Web Server Issue 43:W30-W38; www.ebi.ac.uk/Tools/hmmer/). No domain associated with classical terpene synthases was found in the CvTps1 amino acid sequence.
- the query identified a domain characteristic of the Haloacid dehalogenase (HAD)-like hydrolase protein superfamily (PF13419.5) in the region between residues 115 and 187.
- a similar search using the InterPro protein family database (see the ebi.ac.uk/interpro/ web site) and the Conserved Domain Database (NCBI web site at ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) provided the same results: only the prediction of a domain of the HAD-like hydrolase superfamily in the N-terminal region (IPR023214 and CL21460, respectively).
- the HAD-like hydrolase superfamily contains a large number of proteins with various functions, including enzymes with phosphatase activity (Koonin and Tatusov, 1994, J. Mol. Biol. 244, 125-132; Kuznetsova et al., 2015, J. Biol. Chem. 290(30), 18678-18698).
- the class I terpene synthase-like motif identified above in the CvTps1 polypeptide contains one of the HAD-like hydrolase motif signatures, with conserved aspartic acid residues involved in the catalytic (phosphatase) activity. This analysis thus confirms that the N-terminal region of CvTps1 is involved in hydrolysis of the diphosphate group (class I terpene synthase activity).
- the CvTps1 amino acid sequence was used to search for homologous sequences in the L. officinalis genome and transcriptome. For this search the tBlastn algorithm was used (Altschul et al., 1990, J. Mol. Biol. 215, 403-410).
- LoTps1 showed sequence similarity with CvTps1: the length of the sequence (521 amino acids) was similar to the length of the CvTps1 amino acid sequence, the overall sequence identity between the two sequences was 71%, the N-terminal region contained a typical class I terpene synthase motif (DDKLD at position 162-166), a class II terpene synthase motif (DMDT) was found at position 267-270, and the N-terminal region contained a predicted HAD-like hydrolase domain.
- the CvTps1 and LoTps1 coding sequences were checked and the intron-exon junction predictions were refined using mappings of the RNA sequencing reads against the genomic contigs.
- the coding sequences of the resulting cDNAs were codon optimized and cloned in the pJ401 E. coli expression plasmid (pJ401, ATUM, Newark, Calif.).
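Codon optimization replaces each codon with one favored by the expression host. Practical optimization tools typically also weigh GC content, mRNA secondary structure, repeats and restriction sites, so the sketch below shows only the crudest form of the idea: naive back-translation with one preferred codon per amino acid. The codon table is an illustrative assumption (codons frequently used in E. coli), not the table used for the cDNAs described here, and the function name is hypothetical.

```python
# Illustrative one-codon-per-amino-acid table (assumed; codons commonly used in E. coli).
PREFERRED_CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def back_translate(protein, codon_table=PREFERRED_CODONS, stop=STOP_CODON):
    """Naive back-translation: one fixed codon per residue, followed by a stop codon."""
    codons = []
    for residue in protein.upper():
        if residue not in codon_table:
            raise ValueError(f"no codon defined for residue {residue!r}")
        codons.append(codon_table[residue])
    return "".join(codons) + stop


print(back_translate("MDVDT"))  # ATGGATGTGGATACCTAA
```

Using a single codon for every occurrence of an amino acid can create repeats and skew tRNA demand, which is one reason real tools distribute codons instead.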
- the enzymes were functionally characterized in E. coli cells engineered to overproduce farnesyl-diphosphate (FPP). Competent E. coli cells were transformed with the plasmid pACYC-29258-4506 (described in WO2013064411 or in Schalk et al., 2013, J. Am. Chem. Soc. 134, 18900-18903) and with the pJ401-CvTps1 or pJ401-LoTps1 expression plasmid.
- the pACYC-29258-4506 carries the cDNA encoding for a FPP synthase gene and the genes for a complete mevalonate pathway.
- E. coli cells (Promega) were used as a host. Transformed cells were selected on kanamycin (50 µg/ml) and chloramphenicol (34 µg/ml) LB-agarose plates. Single colonies were used to inoculate 5 mL liquid LB medium supplemented with the same antibiotics. The culture was incubated overnight at 37° C. The next day, 2 mL of TB medium supplemented with the same antibiotics were inoculated with 0.2 mL of the overnight culture. After 6 hours of incubation at 37° C., the culture was cooled down to 28° C. and 0.1 mM IPTG, 0.2% rhamnose and 10% by volume (0.2 ml) of dodecane were added to each tube. The cultures were incubated for 48 hours at 28° C. The cultures were then extracted twice with 2 volumes of tert-butyl methyl ether (MTBE), and the organic phases were concentrated to 500 µL and analyzed by GC-MS.
- MTBE: tert-butyl methyl ether
- the GC-MS analyses were performed using an Agilent 6890 Series GC system connected to an Agilent 5975 mass detector.
- the GC was equipped with 0.25 mm inner diameter by 30 m DB-1MS capillary column (Agilent).
- the carrier gas was He at a constant flow of 1 mL/min.
- the inlet temperature was set at 250° C.
- the initial oven temperature was 80° C. followed by a gradient of 10° C./min to 220° C. and a second gradient of 30° C./min to 280° C.
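The oven program above fixes the run length. A minimal sketch computing elapsed time from the start temperature and the ramps, assuming no initial or final hold times (none are stated in the text); the function name is illustrative:

```python
def oven_program_minutes(start_temp_c, ramps, initial_hold_min=0.0):
    """Total GC oven program time.

    ramps: sequence of (rate_c_per_min, target_temp_c, hold_min) tuples.
    """
    elapsed = initial_hold_min
    temp = start_temp_c
    for rate, target, hold in ramps:
        if rate <= 0 or target <= temp:
            raise ValueError("each ramp must heat at a positive rate to a higher temperature")
        elapsed += (target - temp) / rate + hold
        temp = target
    return elapsed


# 80 °C, 10 °C/min to 220 °C, then 30 °C/min to 280 °C; holds assumed to be zero.
print(oven_program_minutes(80, [(10, 220, 0.0), (30, 280, 0.0)]))  # 16.0
```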
- the identification of the products was based on the comparison of the mass spectra and retention indices with authentic standards and internal mass spectra databases.
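Retention indices for temperature-programmed runs such as this one are commonly calculated with the linear (van den Dool and Kratz) formula, interpolating between the n-alkanes that elute before and after the analyte. The text does not say which index or alkane standards were used, so this is an assumption; the function name and the retention times below are hypothetical.

```python
import bisect


def linear_retention_index(t_x, alkane_times):
    """Linear (van den Dool and Kratz) retention index.

    alkane_times maps n-alkane carbon number -> retention time (min), assumed to
    increase with carbon number. With n-alkanes C_n and C_N bracketing t_x:
        RI = 100 * (n + (N - n) * (t_x - t_n) / (t_N - t_n))
    """
    carbons = sorted(alkane_times)
    times = [alkane_times[c] for c in carbons]
    i = bisect.bisect_right(times, t_x) - 1  # index of the last alkane eluting at or before t_x
    if i < 0 or i >= len(carbons) - 1:
        raise ValueError("retention time is not bracketed by the alkane standards")
    n, big_n = carbons[i], carbons[i + 1]
    t_n, t_big_n = times[i], times[i + 1]
    return 100.0 * (n + (big_n - n) * (t_x - t_n) / (t_big_n - t_n))


# Hypothetical n-alkane retention times (min); the analyte elutes halfway between C15 and C16.
alkanes = {15: 10.0, 16: 11.0, 17: 12.0}
print(linear_retention_index(10.5, alkanes))  # 1550.0
```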
- albicanol was confirmed by 1H- and 13C-NMR analysis.
- the optical rotation was measured using a Bruker Avance 500 MHz spectrometer.
- the value of [α]D^20 = +3.8° (0.26%, CHCl3) confirmed the formation of (+)-albicanol (with the structure shown in FIG. 1) by the recombinant CvTps1 protein.
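The specific rotation relates the observed angle to path length and concentration. A minimal sketch, assuming the common convention [α] = 100·α/(l·c) with l in decimetres and c in g/100 mL, and reading "0.26%" as c = 0.26 g/100 mL; with a 1 dm cell (also an assumption), +3.8° corresponds to an observed rotation of about +0.0099°. The function names are illustrative.

```python
def specific_rotation(observed_deg, path_dm, conc_g_per_100ml):
    """[alpha] = 100 * alpha_obs / (l * c), with l in dm and c in g/100 mL."""
    return 100.0 * observed_deg / (path_dm * conc_g_per_100ml)


def observed_rotation(specific_deg, path_dm, conc_g_per_100ml):
    """Inverse relation: the angle a polarimeter would read for a given [alpha]."""
    return specific_deg * path_dm * conc_g_per_100ml / 100.0


alpha_obs = observed_rotation(3.8, 1.0, 0.26)
print(round(alpha_obs, 5))                                # 0.00988
print(round(specific_rotation(alpha_obs, 1.0, 0.26), 2))  # 3.8
```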
- The activity of LoTps1 was evaluated under the same conditions.
- the product profile was identical to the profile of CvTps1 with (+)-albicanol as the only detected product of the recombinant LoTps1 enzyme.
- NCBI accession OCH93767.1 from Obba rivulosa
- NCBI accession EMD37666.1 from Gelatoporia subvermispora
- NCBI accession XP_001217376.1
- NCBI accession OJJ98394.1 from Aspergillus aculeatus
- NCBI accession GAO87501.1 from Aspergillus udagawae
- NCBI accession XP_008034151.1 from Trametes versicolor
- NCBI accession XP_007369631.1 from Dichomitus squalens
- NCBI accession KIA75676.1 from Aspergillus ustus
- NCBI accession XP_001820867.2 from Aspergillus oryzae
- NCBI accession CEN60542.1 from Aspergillus calidoustus
- the sequence of EMD37666.1 was corrected by deleting amino acids 261 to 266, which are present in the published sequence and probably result from an incorrect splicing prediction (sequence EMD37666-B in table 1).
- Another sequence, ACg006372, was selected from the published annotated sequence of Antrodia cinnamomea (Lu et al., 2014, Proc. Natl. Acad. Sci. USA 111(44), E4743-52 (Dataset S1)).
- the amino acid sequences of the 15 putative terpene synthases contain a class II terpene synthase-like motif with the consensus sequence D(V/M/L/F)D(T/S) as well as a class I terpene synthase-like motif with the consensus sequence DD(K/N/Q/R/S)xD (where x is a hydrophobic residue L, I, G, T or P).
- the class I and class II motifs are easily localized using an alignment of the amino acid sequences with the sequences of CvTps1 and LoTps1 ( FIG. 6 ). Such alignment can be made using for example the program Clustal W (Thompson J. D. et al., 1994, Nucleic Acids Res. 22(22), 4673-80).
- the presence of a HAD-like hydrolase domain was identified in the N-terminal region of the 15 amino acid sequences (between positions 1 and 183 to 243 of the sequences) (Table 3).
- the cDNAs encoding for the 15 new putative synthases described in Example 5 were codon optimized and cloned in the pJ401 E. coli expression plasmid (pJ401, ATUM, Newark, California).
- the enzymes were functionally characterized in E. coli cells engineered to overproduce farnesyl-diphosphate (FPP) following the procedure described in Example 4.
- Drimenol is produced by a mechanism similar to the formation of albicanol and involving a class II followed by class I enzymatic activity.
- HAD-like hydrolase domain positions:
Enzyme | Length (aa) | Product | HAD-like hydrolase domain start | HAD-like hydrolase domain end
CvTps1 | 525 | Albicanol | 115 | 187
LoTps1 | 521 | Albicanol | 62 | 181
OCH93767.1 | 527 | Albicanol | 51 | 185
EMD37666.1 | 533 | Albicanol | 54 | 185
EMD37666-B | 528 | Albicanol | 54 | 185
XP_001217376.1 | 486 | Albicanol | 25 | 181
OJJ98394.1 | 483 | Albicanol | 25 | 181
GAO87501.1 | 485 | Albicanol | 34 | 186
XP_008034151.1 | 524 | Albicanol | 60 | 187
XP_007369631.1 | 527 | Albicanol | 120 | 187
ACg006372 | 496 | Albicanol | 60 | 198
KIA75676.1 | 543 | Drimenol | |
- Crude protein extracts containing the recombinant terpene synthases are prepared using KRX E. coli cells (Promega) or BL21 Star™ (DE3) E. coli (ThermoFisher). Single colonies of cells transformed with the expression plasmid are used to inoculate 5 ml LB medium. After 5 to 6 hours of incubation at 37° C., the cultures are transferred to a 25° C. incubator and left 1 hour for equilibration. Expression of the protein is then induced by the addition of 1 mM IPTG and the cultures are incubated overnight at 25° C.
- the cells are collected by centrifugation, resuspended in 0.1 volume of 50 mM MOPSO (3-morpholino-2-hydroxypropanesulfonic acid; Sigma-Aldrich) pH 7, 10% glycerol, and lysed by sonication.
- the extracts are cleared by centrifugation (30 min at 20,000 g) and the supernatants containing the soluble proteins are used for further experiments.
- the assays are performed in glass tubes in 2 mL of 50 mM MOPSO pH 7, 10% glycerol, 1 mM DTT, 15 mM MgCl2 in the presence of 80 µM farnesyl-diphosphate (FPP, Sigma) and 0.1 to 0.5 mg of crude protein.
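Setting up the 2 mL assay from concentrated stocks is a C1·V1 = C2·V2 calculation. A minimal sketch; the stock concentrations (1 M MOPSO, DTT and MgCl2, 50% glycerol, 1 mM FPP) and the function name are assumptions, not values from the text, and the remainder is the volume left for the crude protein extract and water. The extract itself is in MOPSO/glycerol buffer, which this sketch ignores.

```python
def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """V1 = C2 * V2 / C1, with both concentrations in the same unit."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_conc * final_volume_ul / stock_conc


ASSAY_VOLUME_UL = 2000.0
# component: (final concentration, assumed stock concentration), same unit within a row
RECIPE = {
    "MOPSO pH 7 (mM)": (50.0, 1000.0),
    "glycerol (% v/v)": (10.0, 50.0),
    "DTT (mM)": (1.0, 1000.0),
    "MgCl2 (mM)": (15.0, 1000.0),
    "FPP (uM)": (80.0, 1000.0),
}

volumes = {name: stock_volume_ul(final, stock, ASSAY_VOLUME_UL)
           for name, (final, stock) in RECIPE.items()}
remainder_ul = ASSAY_VOLUME_UL - sum(volumes.values())
for name, vol in volumes.items():
    print(f"{name}: {vol:.1f} uL")
print(f"crude protein extract + water: {remainder_ul:.1f} uL")  # 1308.0 uL
# FPP per assay: 80 uM x 2 mL = 160 nmol
```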
- the tubes are incubated 12 to 24 hours at 25° C. and extracted twice with one volume of pentane. After concentration under a nitrogen flux, the extracts are analyzed by GC-MS as described in Example 4 and compared to extracts from assays with control proteins.
- the aqueous phase is then treated by alkaline phosphatase (Sigma, 6 units/ml), followed by extraction with pentane and GC-MS analysis.
- the assays without alkaline phosphatase treatment allow detecting and identifying the sesquiterpene compounds (hydrocarbons and oxygenated sesquiterpenes) present in the assay and produced by the recombinant enzymes.
- Albicanyl-diphosphate or drimenyl-diphosphate compounds are not soluble in the organic solvent and are thus not detected in the GC-MS analysis.
- upon alkaline phosphatase treatment, allylic diphosphate bonds are cleaved and, when albicanyl-diphosphate or drimenyl-diphosphate compounds are present, the sesquiterpene moiety is released, extracted into the solvent phase and detected in the GC-MS analysis.
- This example allows enzymes having only class II terpene synthase activity (such as AstC, NCBI accession XP_001822013.2, Shinohara Y. et al., 2016, Sci Rep. 6, 32865) to be differentiated from enzymes having class II terpene synthase-like activity and class I (phosphatase) activity, such as CvTps1 and LoTps1.
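The readout logic of this assay can be summarized as a small decision rule. A minimal sketch, assuming each GC-MS result has already been reduced to a yes/no call for a drimane sesquiterpene alcohol relative to the control-protein extracts; the function and label names are hypothetical.

```python
BIFUNCTIONAL = "class II-like + class I (phosphatase) activity"
CLASS_II_ONLY = "class II activity only"
NO_ACTIVITY = "no drimane-forming activity detected"


def classify_synthase(product_without_phosphatase, product_after_phosphatase):
    """Interpret the two GC-MS readouts of the in vitro assay.

    product_without_phosphatase: drimane alcohol found in the pentane extract of the
        untreated assay, i.e. the enzyme itself removed the diphosphate group.
    product_after_phosphatase: drimane alcohol found after alkaline phosphatase
        treatment of the aqueous phase, i.e. released from a diphosphate
        intermediate that pentane does not extract.
    """
    if product_without_phosphatase:
        return BIFUNCTIONAL
    if product_after_phosphatase:
        return CLASS_II_ONLY
    return NO_ACTIVITY


print(classify_synthase(False, True))  # class II activity only
print(classify_synthase(True, True))   # class II-like + class I (phosphatase) activity
```

In the terms used above, the first call corresponds to an AstC-like enzyme and the second to a bifunctional enzyme such as CvTps1 or LoTps1.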
- Synthetic operons were designed to co-express the CvTps1 protein with the AstI and AstK proteins.
- the synthetic operon contains the optimized cDNA encoding for each of the 3 proteins separated by a ribosome binding sequence (RBS).
- RBS ribosome binding sequence
- a similar operon was designed to co-express AstC with AstI and AstK.
- the operons were synthesized and cloned in the pJ401 expression plasmid (ATUM, Newark, Calif.). E. coli cells were co-transformed with these expression plasmids and with the pACYC-29258-4506 plasmid (Example 4), and the cells were cultivated under conditions to produce sesquiterpenes as described in Example 4.
- the sesquiterpenes produced were analyzed by GC-MS as described in Example 4 and compared to the sesquiterpene profile of cells expressing only CvTps1 or AstC.
- For AstC, a significantly higher amount (78-fold increase) of sesquiterpene is produced when the enzyme is co-expressed with enzymes (AstI and AstK) having phosphatase activity.
- Typical concentrations of drimane sesquiterpene in the E. coli cultures were 2,600 mg/ml with cells expressing AstC, AstI and AstK and 34 mg/ml with cells expressing AstC alone.
- The protein with NCBI accession No. XP_006461126.1 from Agaricus bisporus was selected using the method described in Example 5.
- The XP_006461126.1 amino acid sequence (SEQ ID NO: 63) shares 48.9% and 48.1% identity with the CvTps1 and LoTps1 amino acid sequences, respectively.
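Percent identity values such as these depend on how the alignment was built and on which positions are counted. The program and counting convention used for these figures are not stated here; the sketch below assumes a pre-computed pairwise alignment and counts identities over columns where neither sequence has a gap. The helper `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two aligned sequences ('-' marks a gap).

    Columns with a gap in either sequence are skipped; this is one of several
    conventions and can give different values from other tools.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        identical += a == b  # bool counts as 0 or 1
    return 100.0 * identical / compared if compared else 0.0


print(round(percent_identity("MKT-AYD", "MKTLAFD"), 1))  # 83.3
```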
- The XP_006461126.1 sequence contains a class II terpene synthase-like motif (DLDT) (part of SEQ ID NO: 56) located between position 278 and 271 and a class I terpene synthase-like motif (DDKLE) (part of SEQ ID NO: 55) located at positions 167 to 171.
- The amino acid sequence also contains motifs characteristic of the Haloacid dehalogenase-like hydrolase superfamily in the N-terminal region.
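Motif positions such as 167 to 171 are read as 1-based, inclusive residue numbers, so a five-residue motif like DDKLE spans start to start + 4, and a four-residue motif like DLDT spans start to start + 3. A minimal sketch of locating a motif with this convention, using a toy sequence (the helper `find_motif` is hypothetical):

```python
def find_motif(seq: str, motif: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of each occurrence."""
    hits = []
    idx = seq.find(motif)
    while idx != -1:
        hits.append((idx + 1, idx + len(motif)))  # convert 0-based index to 1-based
        idx = seq.find(motif, idx + 1)            # allow overlapping matches
    return hits


# Toy sequence: 166 filler residues, so DDKLE starts at residue 167.
toy = "M" * 166 + "DDKLE" + "A" * 10
print(find_motif(toy, "DDKLE"))  # [(167, 171)]
```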
- The cDNA encoding XP_006461126.1 was codon optimized and cloned in the pJ401 E. coli expression plasmid (pJ401, ATUM, Newark, Calif.). The enzyme was functionally characterized in E. coli cells engineered to overproduce farnesyl-diphosphate (FPP) following the procedure described in Example 4. The results show that XP_006461126.1 is a bifunctional drimenol synthase producing drimenol as the major compound (FIG. 11).
- the codon usage of the cDNA encoding for the different synthases was modified for optimal expression in S. cerevisiae (SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70).
- For expression of the different genes in S. cerevisiae, a set of plasmids was constructed in vivo using yeast endogenous homologous recombination as previously described in Kuijpers et al., Microb Cell Fact., 2013, 12:47. Each plasmid is composed of five DNA fragments, which were used for S. cerevisiae co-transformation. The fragments were:
- GAL4 under the control of a mutated version of its own promoter, as described in Griggs and Johnston, Proc Natl Acad Sci USA, 1991, 88:8597-8601, was integrated upstream of the ERG9 promoter region.
- the endogenous promoter of ERG9 was replaced by the yeast promoter region of CTR3 generating the strain YST035.
- YST035 was mated with the strain CEN.PK2-1D (Euroscarf, Frankfurt, Germany) obtaining a diploid strain termed YST045.
- YST045 was transformed with the fragments required for in vivo plasmid assembly.
- Yeast transformations were performed with the lithium acetate protocol as described in Gietz and Woods, Methods Enzymol., 2002, 350:87-96. Transformation mixtures were plated on SmLeu-media containing 6.7 g/L of Yeast Nitrogen Base without amino acids (BD Difco, New Jersey, USA), 1.6 g/L Dropout supplement without leucine (Sigma Aldrich, Missouri, USA), 20 g/L glucose and 20 g/L agar. Plates were incubated for 3-4 days at 30° C.
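To prepare a batch of SmLeu plates of a different volume, the per-litre quantities above scale linearly. A small sketch, assuming a 500 mL batch for illustration; `scale_recipe` is a hypothetical helper:

```python
# Component concentrations (g/L) for SmLeu plates, as listed above.
SMLEU_G_PER_L = {
    "Yeast Nitrogen Base without amino acids": 6.7,
    "Dropout supplement without leucine": 1.6,
    "glucose": 20.0,
    "agar": 20.0,
}


def scale_recipe(volume_ml: float, recipe: dict[str, float] = SMLEU_G_PER_L) -> dict[str, float]:
    """Grams of each component for the requested final volume (linear scaling)."""
    litres = volume_ml / 1000.0
    return {name: round(conc * litres, 3) for name, conc in recipe.items()}


for component, grams in scale_recipe(500).items():
    print(f"{component}: {grams} g")
```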
- The table below shows the quantities of drimane sesquiterpene produced relative to the quantity obtained with the synthase XP_007369631.1 (under these experimental conditions, the concentration of drimane sesquiterpene produced by cells expressing XP_007369631.1 was 805 to 854 mg/L, the highest quantity produced).
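The relative values are ratios to the XP_007369631.1 titre. A minimal sketch of that normalization, using hypothetical titres (the reference value of 830 mg/L is simply a number inside the reported 805 to 854 mg/L range, and `synthase_B` is a made-up entry, not data from the table):

```python
def relative_titres(titres_mg_l: dict[str, float], reference: str) -> dict[str, float]:
    """Express each titre as a fraction of the reference synthase's titre."""
    ref = titres_mg_l[reference]
    if ref <= 0:
        raise ValueError("reference titre must be positive")
    return {name: value / ref for name, value in titres_mg_l.items()}


# Hypothetical values for illustration only; not data from the table.
example = {"XP_007369631.1": 830.0, "synthase_B": 415.0}
print(relative_titres(example, "XP_007369631.1"))
# {'XP_007369631.1': 1.0, 'synthase_B': 0.5}
```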
Abstract
Description
- This application is a divisional application of U.S. patent application Ser. No. 16/618,737, filed Dec. 2, 2019, which is a U.S. National Phase Application of International Patent Application No. PCT/EP2018/064344, filed May 31, 2018, which claims the benefit of priority to European Patent Application No. 17174399.0, filed Jun. 2, 2017, the entire contents of which are hereby incorporated by reference herein.
- The contents of the electronic sequence listing (10200PCT SequenceListingAsFiled.txt; Size: 180,406 bytes; and Date of Creation: Feb. 16, 2022) are herein incorporated by reference in their entirety.
- Provided herein are biochemical methods of producing albicanol, drimenol and related compounds and derivatives, which method comprises the use of novel polypeptides.
- Terpenes are found in most organisms (microorganisms, animals and plants). These compounds are made up of five carbon units called isoprene units and are classified by the number of these units present in their structure. Thus monoterpenes, sesquiterpenes and diterpenes are terpenes containing 10, 15 and 20 carbon atoms, respectively. Sesquiterpenes, for example, are widely found in the plant kingdom. Many sesquiterpene molecules are known for their flavor and fragrance properties and their cosmetic, medicinal and antimicrobial effects. Numerous sesquiterpene hydrocarbons and sesquiterpenoids have been identified. Chemical synthesis approaches have been developed but are still complex and not always cost-effective.
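Because the classes follow directly from the number of C5 isoprene units, the classification can be expressed as carbon count divided by five. A small sketch (the helper name is hypothetical; hemiterpenes, sesterterpenes and triterpenes are added for completeness and are not discussed in the text):

```python
# Number of C5 isoprene units -> terpene class.
TERPENE_CLASSES = {
    1: "hemiterpene",
    2: "monoterpene",
    3: "sesquiterpene",
    4: "diterpene",
    5: "sesterterpene",
    6: "triterpene",
}


def terpene_class(carbon_atoms: int) -> str:
    """Classify a regular terpene skeleton by its number of isoprene (C5) units."""
    units, remainder = divmod(carbon_atoms, 5)
    if remainder or units not in TERPENE_CLASSES:
        raise ValueError(f"{carbon_atoms} carbons is not a C5 multiple covered here")
    return TERPENE_CLASSES[units]


print(terpene_class(15))  # sesquiterpene (e.g., albicanol and drimenol, C15)
```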
- Biosynthetic production of terpenes involves enzymes called terpene synthases. There are numerous sesquiterpene synthases present in the plant kingdom, all using the same substrate (farnesyl diphosphate, FPP), but having different product profiles. Genes and cDNAs encoding sesquiterpene synthases have been cloned and the corresponding recombinant enzymes characterized.
- Many of the main sources for sesquiterpenes, for example drimenol, are plants naturally containing the sesquiterpene; however, the content of sesquiterpenes in these natural sources can be low. There still remains a need for the discovery of new terpenes, terpene synthases and more cost-effective methods of producing sesquiterpenes such as albicanol and/or drimenol and derivatives therefrom.
- Provided herein is a method for producing a drimane sesquiterpene comprising:
- a. contacting an acyclic farnesyl diphosphate (FPP) precursor with a polypeptide comprising a Haloacid dehalogenase (HAD)-like hydrolase domain and having bifunctional terpene synthase activity to produce a drimane sesquiterpene, wherein the polypeptide comprises
- i. a class I terpene synthase-like motif as set forth in SEQ ID NO: 53 (DDxx(D/E)); and
- ii. a class II terpene synthase-like motif as set forth in SEQ ID NO: 56 (DxD(T/S)T); and
- b. optionally isolating the drimane sesquiterpene or a mixture comprising the drimane sesquiterpene.
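The motif notation used above reads as: `x` is any amino acid and `(D/E)` is a choice between the listed residues, which is close to PROSITE-style patterns. A minimal sketch, assuming that reading, converts the notation to Python regular expressions and checks a toy sequence for both motifs; the helper names are hypothetical, and a motif match alone does not establish a HAD-like hydrolase domain or bifunctional activity.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate notation such as 'DDxx(D/E)' into a regex such as 'DD..[DE]'."""
    # '(D/E)' -> '[DE]': a choice between the listed residues.
    pattern = re.sub(r"\(([A-Z](?:/[A-Z])*)\)",
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     motif)
    # Lower-case 'x' stands for any amino acid.
    return pattern.replace("x", ".")


CLASS_I = re.compile(motif_to_regex("DDxx(D/E)"))   # SEQ ID NO: 53
CLASS_II = re.compile(motif_to_regex("DxD(T/S)T"))  # SEQ ID NO: 56


def has_both_motifs(protein: str) -> bool:
    """True if the sequence contains both the class I and class II motif."""
    return CLASS_I.search(protein) is not None and CLASS_II.search(protein) is not None


print(has_both_motifs("MADDKLEGGSDLDTTK"))  # True (toy sequence)
```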
- In one aspect, the drimane sesquiterpene comprises albicanol and/or drimenol.
- In a further aspect, in the above method, the polypeptide having bifunctional terpene synthase activity comprises
- a. an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63; and
- b. the sequence as set forth in SEQ ID NO: 53, SEQ ID NO: 54, or SEQ ID NO: 55; and
- c. the sequence as set forth in SEQ ID NO: 56, SEQ ID NO: 57, or SEQ ID NO: 58.
- In one embodiment, the above method comprises contacting the drimane sesquiterpene with at least one enzyme to produce a drimane sesquiterpene derivative. In another embodiment, the above method comprises converting the drimane sesquiterpene to a drimane sesquiterpene derivative using chemical synthesis or biochemical synthesis.
- In one aspect, the above method comprises transforming a host cell or non-human host organism with a nucleic acid encoding the above polypeptide.
- In one aspect, the method further comprises culturing a non-human host organism or a host cell capable of producing FPP and transformed to express a polypeptide comprising a HAD-like hydrolase domain under conditions that allow for the production of the polypeptide, wherein the polypeptide
- a. comprises the amino acid sequence of SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63; or
- b. comprises
- i. an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63; and
- ii. the sequence as set forth in SEQ ID NO: 53, SEQ ID NO: 54, or SEQ ID NO: 55; and
- iii. the sequence as set forth in SEQ ID NO: 56, SEQ ID NO: 57, or SEQ ID NO: 58.
- In a further aspect, in the above method, the polypeptide comprises one or more conserved motifs as set forth in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
- In one embodiment, the class I terpene synthase-like motif of the above method comprises SEQ ID NO: 54 (DD(K/Q/R)(L/I/T)(D/E)), the class II terpene synthase-like motif comprises SEQ ID NO: 57 (D(V/M/L)DTT), and the drimane sesquiterpene is albicanol.
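The albicanol-associated embodiment can be expressed as two regular expressions, using the forms of SEQ ID NO: 54 and SEQ ID NO: 57 exactly as written in this paragraph. This is a screening sketch only (the names are hypothetical): the embodiment also requires the polypeptide to be a HAD-like bifunctional terpene synthase, which a motif match cannot show.

```python
import re

# Motif forms as written in this embodiment.
ALBICANOL_CLASS_I = re.compile(r"DD[KQR][LIT][DE]")  # SEQ ID NO: 54
ALBICANOL_CLASS_II = re.compile(r"D[VML]DTT")         # SEQ ID NO: 57


def has_albicanol_motif_pair(protein: str) -> bool:
    """True if both albicanol-associated motif variants occur in the sequence."""
    return (ALBICANOL_CLASS_I.search(protein) is not None
            and ALBICANOL_CLASS_II.search(protein) is not None)


print(has_albicanol_motif_pair("MADDKLEGGSDLDTTK"))  # True (toy sequence)
print(has_albicanol_motif_pair("MADDKLEGGSDADTTK"))  # False: A is not V, M or L
```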
- In one embodiment, in the above method the polypeptide comprises
- a. an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, or SEQ ID NO: 32, and
- b. the sequence of SEQ ID NO: 54 (DD(K/Q/R)(L/I/T)(D/E)), and
- c. the sequence of SEQ ID NO: 57 (D(V/M/L/F)DTTS); and
- the drimane sesquiterpene is albicanol.
- In a further embodiment, in the above method the polypeptide comprises
- a. an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63, and
- b. the sequence of SEQ ID NO: 55, and
- c. the sequence of SEQ ID NO: 58; and
- the drimane sesquiterpene is drimenol.
- Also provided is an isolated polypeptide comprising a HAD-like hydrolase domain and having bifunctional terpene synthase activity comprising the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 5 or comprising
- a. an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 5; and
- b. the sequence as set forth in SEQ ID NO: 53, SEQ ID NO: 54, or SEQ ID NO: 55; and
- c. the sequence as set forth in SEQ ID NO: 56, SEQ ID NO: 57, or SEQ ID NO: 58.
- In one aspect, the isolated polypeptide further comprises one or more conserved motifs as set forth in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
- Provided herein is an isolated nucleic acid molecule
- a. comprising a nucleotide sequence encoding the polypeptide of claim 13 or 14; or
- b. comprising a nucleotide sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to the nucleotide sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68, or the reverse complement thereof; or
- c. comprising a nucleotide molecule that hybridizes under stringent conditions to SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68; or
- d. comprising the nucleotide sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, or SEQ ID NO: 70, or the reverse complement thereof.
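Several of the sequence options above also cover the reverse complement of the listed nucleotide sequences. A minimal sketch of computing a reverse complement for unambiguous DNA (A, C, G, T only; IUPAC ambiguity codes are deliberately rejected, and the helper name is hypothetical):

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    if set(dna.upper()) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported")
    # Complement each base, then reverse to read the other strand 5'->3'.
    return dna.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCGT"))  # ACGCAT
```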
- Also provided is a vector comprising the above nucleic acid molecule or a nucleic acid encoding the above polypeptide. In one aspect, the vector is an expression vector. In another aspect, the vector is a prokaryotic vector, viral vector or a eukaryotic vector.
- Further provided is a host cell or non-human organism comprising the above nucleic acid or above vector.
- In one aspect, the host cell or non-human organism is a prokaryotic cell or a eukaryotic cell or a microorganism or fungal cell.
- In one aspect, the prokaryotic cell is a bacterial cell. In a further aspect, the bacterial cell is E. coli.
- In another aspect, the host cell or non-human organism is a eukaryotic cell. In one aspect, the eukaryotic cell is a yeast cell or plant cell. In a further aspect, the yeast cell is Saccharomyces cerevisiae.
- Provided herein is the use of the above polypeptide for producing a drimane sesquiterpene or a mixture comprising a drimane sesquiterpene and one or more terpenes.
- In one aspect, in the above use of the polypeptide, the drimane sesquiterpene is albicanol. In another aspect, in the above use of the polypeptide, the drimane sesquiterpene is drimenol.
- FIG. 1: Structure of drimane, (+)-albicanol and (−)-drimenol.
- FIG. 2: Mechanism of cyclization of farnesyl-diphosphate by class II and class I terpene synthase enzymatic activities.
- FIG. 3: GCMS analysis of the sesquiterpenes produced in vivo by the recombinant CvTps1 enzyme in bacterial cells modified to overproduce farnesyl-diphosphate. A. Total ion chromatogram of an extract of E. coli cells expressing CvTps1 and the mevalonate pathway enzymes. B. Total ion chromatogram of an authentic standard of albicanol. C. Total ion chromatogram of an extract of E. coli cells expressing only the mevalonate pathway enzymes. 1, albicanol; 2, trans-farnesol (from hydrolysis of FPP by endogenous phosphatase enzymes).
- FIG. 4: Comparison of the mass spectra of the product of CvTps1 and of an authentic standard of albicanol. A. Mass spectrum of peak 1 in FIG. 3A (product of CvTps1). B. Mass spectrum of peak 1 in FIG. 3B (authentic standard of albicanol).
- FIG. 5: GCMS analysis of the sesquiterpenes produced by the LoTps1 and CvTps1 recombinant proteins. Total ion chromatograms of extracts of E. coli cells expressing LoTps1 (A) and CvTps1 (B). The peak labeled ‘1’ is (+)-albicanol.
- FIG. 6A-C: Amino acid sequence alignment of putative terpene synthases containing class I and class II motifs: CvTps1 (SEQ ID NO: 1), LoTps1 (SEQ ID NO: 5), OCH93767.1 (SEQ ID NO: 9), EMD37666.1 (SEQ ID NO: 12), EMD37666-B (SEQ ID NO: 15), XP_001217376.1 (SEQ ID NO: 17), OJJ98394.1 (SEQ ID NO: 20), GAO87501.1 (SEQ ID NO: 23), XP_008034151.1 (SEQ ID NO: 26), XP_007369631.1 (SEQ ID NO: 29), ACg006372 (SEQ ID NO: 32), KIA75676.1 (SEQ ID NO: 35), XP_001820867.2 (SEQ ID NO: 38), CEN60542.1 (SEQ ID NO: 41), XP_009547469.1 (SEQ ID NO: 44), KLO09124.1 (SEQ ID NO: 47), and OJI95797.1 (SEQ ID NO: 50).
- FIG. 7: GCMS chromatograms of the sesquiterpenes produced by the LoTps1, CvTps1, OCH93767.1, EMD37666.1, EMD37666-B, and XP_001217376.1 recombinant proteins. The peak labeled ‘1’ is (+)-albicanol.
- FIG. 8: GCMS chromatograms of the sesquiterpenes produced by the OJJ98394.1, GAO87501.1, XP_008034151.1, XP_007369631.1, and ACg006372 recombinant proteins. The peak labeled ‘1’ is (+)-albicanol.
- FIG. 9: GCMS chromatograms of the sesquiterpenes produced by the KIA75676.1, XP_001820867.2, CEN60542.1, XP_009547469.1, KLO09124.1 and OJI95797.1 recombinant proteins. The peak labeled ‘1’ is (−)-drimenol and the peak labeled ‘2’ is farnesol.
- FIG. 10: GCMS chromatograms of the sesquiterpenes produced by CvTps1 and AstC expressed in E. coli cells with and without the AstI and AstK phosphatases. The major peak obtained with AstC is drim-8-ene-11-ol and the major peak obtained with CvTps1 is (+)-albicanol.
- FIG. 11: GCMS analysis of the sesquiterpenes produced in vivo by the recombinant XP_006461126.1 enzyme in bacterial cells modified to overproduce farnesyl-diphosphate. A. Total ion chromatogram of an extract of E. coli cells expressing XP_006461126.1 and the mevalonate pathway enzymes. B. Mass spectrum of the peak at 13.1 minutes, identified as drimenol.
- FIG. 12: GC-FID analysis of drimane sesquiterpenes produced using the modified S. cerevisiae strain YST045 expressing five different synthases: XP_007369631.1 from Dichomitus squalens, XP_006461126 from Agaricus bisporus, LoTps1 from Laricifomes officinalis, EMD37666.1 from Gelatoporia subvermispora and XP_001217376.1 from Aspergillus terreus.
- bp base pair
- kb kilo base
- DNA deoxyribonucleic acid
- cDNA complementary DNA
- DTT dithiothreitol
- FPP farnesyl diphosphate
- GC gas chromatograph
- HAD Haloacid dehalogenase
- IPTG isopropyl β-D-1-thiogalactopyranoside
- LB lysogeny broth
- MS mass spectrometer/mass spectrometry
- MVA mevalonic acid
- PCR polymerase chain reaction
- RNA ribonucleic acid
- mRNA messenger ribonucleic acid
- miRNA micro RNA
- siRNA small interfering RNA
- rRNA ribosomal RNA
- tRNA transfer RNA
- The term “polypeptide” means an amino acid sequence of consecutively polymerized amino acid residues, for instance, at least 15 residues, at least 30 residues, at least 50 residues. In some embodiments herein, a polypeptide comprises an amino acid sequence that is an enzyme, or a fragment, or a variant thereof.
- The term “protein” refers to an amino acid sequence of any length wherein amino acids are linked by covalent peptide bonds, and includes oligopeptide, peptide, polypeptide and full length protein whether naturally occurring or synthetic.
- The term “isolated” polypeptide refers to an amino acid sequence that is removed from its natural environment by any method or combination of methods known in the art and includes recombinant, biochemical and synthetic methods.
- The terms “bifunctional terpene synthase” or “polypeptide having bifunctional terpene synthase activity” relate to a polypeptide that comprises class I and class II terpene synthase domains and has bifunctional terpene synthase activity of protonation-initiated cyclization and ionization-initiated cyclization catalytic activities. A bifunctional terpene synthase as described herein comprises a HAD-like hydrolase domain which is characteristic of polypeptides belonging to the Haloacid dehalogenase (HAD)-like hydrolase superfamily (Interpro protein superfamily IPR023214, www.ebi.ac.uk/interpro/entry/IPR023214; Pfam protein superfamily PF13419, pfam.xfam.org/family/PF13419). A HAD-like hydrolase domain is a portion of a polypeptide having amino acid sequence similarities with the members of the HAD-like hydrolase family and related function. A HAD-like hydrolase domain can be identified in a polypeptide by searching for amino acid motifs or signatures characteristic of this protein family. Tools for performing such searches are available at the following web sites: ebi.ac.uk/interpro/ or ebi.ac.uk/Tools/hmmer/. Proteins are generally composed of one or more functional regions or domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. A polypeptide which comprises a HAD-like hydrolase domain and/or characteristic HAD-like hydrolase motifs functions in binding and cleavage of phosphate or diphosphate groups of a ligand. A bifunctional terpene synthase may also comprise one or more of conserved motifs A, B, C, and/or D as depicted in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
- The term “drimane sesquiterpene” relates to a terpene having a drimane-like carbon skeleton structure as depicted in FIG. 1.
- The term “class I terpene synthase” relates to a terpene synthase that catalyses ionization-initiated reactions, for example, monoterpene and sesquiterpene synthases.
- The term “class I terpene synthase motif” or “class I terpene synthase-like motif” relates to an active site of a terpene synthase that comprises the conserved DDxx(D/E) motif. The aspartic acid residues of this class I motif bind, for example, a divalent metal ion (most often Mg2+) involved in the binding of the diphosphate group and catalyze the ionization and cleavage of the allylic diphosphate bond of the substrate.
- The term “class II terpene synthase” relates to a terpene synthase that catalyses protonation-initiated cyclization reactions, such as those typically involved in the biosynthesis of triterpenes and labdane diterpenes. In class II terpene synthases, the protonation-initiated reaction may involve, for example, acidic amino acids donating a proton to the terminal double bond.
- The term “class II terpene synthase motif” or “class II terpene synthase-like motif” relates to an active site of a terpene synthase that comprises the conserved DxDD or DxD(T/S)T motif.
- The terms “albicanol synthase” or “polypeptide having albicanol synthase activity” or “albicanol synthase protein” relate to a polypeptide capable of catalyzing the synthesis of albicanol, in the form of any of its stereoisomers or a mixture thereof, starting from an acyclic terpene pyrophosphate, particularly farnesyl diphosphate (FPP). Albicanol may be the only product or may be part of a mixture of sesquiterpenes.
- The terms “drimenol synthase” or “polypeptide having a drimenol synthase activity” or “drimenol synthase protein” relate to a polypeptide capable of catalyzing the synthesis of drimenol, in the form of any of its stereoisomers or a mixture thereof, starting from an acyclic terpene pyrophosphate, particularly farnesyl diphosphate (FPP). Drimenol may be the only product or may be part of a mixture of sesquiterpenes.
- The terms “biological function,” “function,” “biological activity” or “activity” refer to the ability of the bifunctional terpene synthase to catalyze the formation of albicanol and/or drimenol or a mixture of compounds comprising albicanol and/or drimenol and one or more terpenes.
- The terms “mixture of terpenes” or “mixture of sesquiterpenes” refer to a mixture of terpenes or sesquiterpenes that comprises albicanol and/or drimenol, and may also comprise one or more additional terpenes or sesquiterpenes.
- The terms “nucleic acid sequence,” “nucleic acid,” “nucleic acid molecule” and “polynucleotide” are used interchangeably to mean a sequence of nucleotides. A nucleic acid sequence may be a single-stranded or double-stranded deoxyribonucleotide or ribonucleotide of any length, and includes coding and non-coding sequences of a gene, exons, introns, sense and anti-sense complementary sequences, genomic DNA, cDNA, miRNA, siRNA, mRNA, rRNA, tRNA, recombinant nucleic acid sequences, isolated and purified naturally occurring DNA and/or RNA sequences, synthetic DNA and RNA sequences, fragments, primers and nucleic acid probes. The skilled artisan is aware that RNA sequences are identical to the corresponding DNA sequences except that thymine (T) is replaced by uracil (U). The term “nucleotide sequence” should also be understood as comprising a polynucleotide molecule or an oligonucleotide molecule in the form of a separate fragment or as a component of a larger nucleic acid.
- An “isolated nucleic acid” or “isolated nucleic acid sequence” relates to a nucleic acid or nucleic acid sequence that is in an environment different from that in which the nucleic acid or nucleic acid sequence naturally occurs and can include those that are substantially free from contaminating endogenous material. The term “naturally-occurring” as used herein as applied to a nucleic acid refers to a nucleic acid that is found in a cell of an organism in nature and which has not been intentionally modified by a human in the laboratory.
- “Recombinant nucleic acid sequences” are nucleic acid sequences that result from the use of laboratory methods (for example, molecular cloning) to bring together genetic material from more than one source, creating or modifying a nucleic acid sequence that does not occur naturally and would not be otherwise found in biological organisms.
- “Recombinant DNA technology” refers to molecular biology procedures to prepare a recombinant nucleic acid sequence as described, for instance, in Laboratory Manuals edited by Weigel and Glazebrook, 2002, Cold Spring Harbor Lab Press; and Sambrook et al, 1989, Cold Spring Harbor, N.Y., Cold Spring Harbor Laboratory Press.
- The term “gene” means a DNA sequence comprising a region which is transcribed into an RNA molecule, e.g., an mRNA in a cell, operably linked to suitable regulatory regions, e.g., a promoter. A gene may thus comprise several operably linked sequences, such as a promoter, a 5′ leader sequence comprising, e.g., sequences involved in translation initiation, a coding region of cDNA or genomic DNA, introns, exons, and/or a 3′ non-translated sequence comprising, e.g., transcription termination sites.
- A “chimeric gene” refers to any gene which is not normally found in nature in a species, in particular, a gene in which one or more parts of the nucleic acid sequence are present that are not associated with each other in nature. For example, the promoter is not associated in nature with part or all of the transcribed region or with another regulatory region. The term “chimeric gene” is understood to include expression constructs in which a promoter or transcription regulatory sequence is operably linked to one or more coding sequences or to an antisense, i.e., reverse complement of the sense strand, or inverted repeat sequence (sense and antisense, whereby the RNA transcript forms double stranded RNA upon transcription). The term “chimeric gene” also includes genes obtained through the combination of portions of one or more coding sequences to produce a new gene.
- A “3′ UTR” or “3′ non-translated sequence” (also referred to as “3′ untranslated region” or “3′ end”) refers to the nucleic acid sequence found downstream of the coding sequence of a gene, which comprises, for example, a transcription termination site and (in most, but not all, eukaryotic mRNAs) a polyadenylation signal such as AAUAAA or variants thereof. After termination of transcription, the mRNA transcript may be cleaved downstream of the polyadenylation signal and a poly(A) tail may be added, which is involved in the transport of the mRNA to the site of translation, e.g., the cytoplasm.
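A minimal sketch of scanning a 3′ UTR for the canonical AAUAAA polyadenylation signal. It assumes DNA input may be given (T is read as U), reports 0-based start positions, and searches only the exact hexamer; variants such as AUUAAA must be passed explicitly. The helper name and the toy UTRs are hypothetical.

```python
def find_polya_signals(utr: str, signal: str = "AAUAAA") -> list[int]:
    """0-based start positions of a polyadenylation signal in a 3' UTR.

    DNA input is accepted: T is read as U in both the UTR and the signal.
    """
    rna = utr.upper().replace("T", "U")
    signal = signal.upper().replace("T", "U")
    positions = []
    idx = rna.find(signal)
    while idx != -1:
        positions.append(idx)
        idx = rna.find(signal, idx + 1)  # continue after this hit
    return positions


print(find_polya_signals("GCAATAAAGC"))            # [2]
print(find_polya_signals("GCATTAAAGC", "AUUAAA"))  # [2]
```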
- “Expression of a gene” encompasses “heterologous expression” and “over-expression” and involves transcription of the gene and translation of the mRNA into a protein. Overexpression refers to the production of the gene product as measured by levels of mRNA, polypeptide and/or enzyme activity in transgenic cells or organisms that exceeds levels of production in non-transformed cells or organisms of a similar genetic background.
- “Expression vector” as used herein means a nucleic acid molecule engineered using molecular biology methods and recombinant DNA technology for delivery of foreign or exogenous DNA into a host cell. The expression vector typically includes sequences required for proper transcription of the nucleotide sequence. The coding region usually codes for a protein of interest but may also code for an RNA, e.g., an antisense RNA, siRNA and the like.
- An “expression vector” as used herein includes any linear or circular recombinant vector including but not limited to viral vectors, bacteriophages and plasmids. The skilled person is capable of selecting a suitable vector according to the expression system. In one embodiment, the expression vector includes the nucleic acid of an embodiment herein operably linked to at least one regulatory sequence, which controls transcription, translation, initiation and termination, such as a transcriptional promoter, operator or enhancer, or an mRNA ribosomal binding site and, optionally, including at least one selection marker. Nucleotide sequences are “operably linked” when the regulatory sequence functionally relates to the nucleic acid of an embodiment herein.
- “Regulatory sequence” refers to a nucleic acid sequence that determines expression level of the nucleic acid sequences of an embodiment herein and is capable of regulating the rate of transcription of the nucleic acid sequence operably linked to the regulatory sequence. Regulatory sequences comprise promoters, enhancers, transcription factors, promoter elements and the like.
- “Promoter” refers to a nucleic acid sequence that controls the expression of a coding sequence by providing a binding site for RNA polymerase and other factors required for proper transcription including without limitation transcription factor binding sites, repressor and activator protein binding sites. The meaning of the term promoter also includes the term “promoter regulatory sequence”. Promoter regulatory sequences may include upstream and downstream elements that may influence transcription, RNA processing or stability of the associated coding nucleic acid sequence. Promoters include naturally-derived and synthetic sequences. The coding nucleic acid sequence is usually located downstream of the promoter with respect to the direction of transcription, starting at the transcription initiation site.
- The term “constitutive promoter” refers to an unregulated promoter that allows for continual transcription of the nucleic acid sequence it is operably linked to.
- As used herein, the term “operably linked” refers to a linkage of polynucleotide elements in a functional relationship. A nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence. For instance, a promoter, or rather a transcription regulatory sequence, is operably linked to a coding sequence if it affects the transcription of the coding sequence. Operably linked means that the DNA sequences being linked are typically contiguous. The nucleotide sequence associated with the promoter sequence may be of homologous or heterologous origin with respect to the plant to be transformed. The sequence also may be entirely or partially synthetic. Regardless of the origin, the nucleic acid sequence associated with the promoter sequence will be expressed or silenced in accordance with promoter properties to which it is linked after binding to the polypeptide of an embodiment herein. The associated nucleic acid may code for a protein that is desired to be expressed or suppressed throughout the organism at all times or, alternatively, at a specific time or in specific tissues, cells, or cell compartment. Such nucleotide sequences particularly encode proteins conferring desirable phenotypic traits to the host cells or organism altered or transformed therewith. More particularly, the associated nucleotide sequence leads to the production of albicanol and/or drimenol or a mixture comprising albicanol and/or drimenol or a mixture comprising albicanol and/or drimenol and one or more terpenes in the cell or organism. Particularly, the nucleotide sequence encodes a bifunctional terpene synthase.
- “Target peptide” refers to an amino acid sequence which targets a protein, or polypeptide to intracellular organelles, i.e., mitochondria, or plastids, or to the extracellular space (secretion signal peptide). A nucleic acid sequence encoding a target peptide may be fused to the nucleic acid sequence encoding the amino terminal end, e.g., N-terminal end, of the protein or polypeptide, or may be used to replace a native targeting polypeptide.
- The term “primer” refers to a short nucleic acid sequence that is hybridized to a template nucleic acid sequence and is used for polymerization of a nucleic acid sequence complementary to the template.
- As used herein, the term “host cell” or “transformed cell” refers to a cell (or organism) altered to harbor at least one nucleic acid molecule, for instance, a recombinant gene encoding a desired protein or nucleic acid sequence which upon transcription yields a bifunctional terpene synthase protein useful to produce albicanol and/or drimenol. The host cell is particularly a bacterial cell, a fungal cell or a plant cell. The host cell may contain a recombinant gene which has been integrated into the nuclear or organelle genomes of the host cell. Alternatively, the host may contain the recombinant gene extra-chromosomally.
- Homologous sequences include orthologous or paralogous sequences. Methods of identifying orthologs or paralogs, including phylogenetic methods, sequence similarity and hybridization methods, are known in the art and are described herein.
- Paralogs result from gene duplication that gives rise to two or more genes with similar sequences and similar functions. Paralogs typically cluster together and are formed by duplications of genes within related plant species. Paralogs are found in groups of similar genes using pair-wise BLAST analysis or during phylogenetic analysis of gene families using programs such as CLUSTAL. Within groups of paralogs, consensus sequences can be identified that are characteristic of related genes having similar functions.
- Orthologs, or orthologous sequences, are sequences similar to each other because they are found in species that descended from a common ancestor. For instance, plant species that have common ancestors are known to contain many enzymes that have similar sequences and functions. The skilled artisan can identify orthologous sequences and predict the functions of the orthologs, for example, by constructing a phylogenetic tree for a gene family of one species using CLUSTAL or BLAST programs. One method for identifying or confirming similar functions among homologous sequences is to compare the transcript profiles in host cells or organisms, such as plants or microorganisms, overexpressing or lacking (in knockouts/knockdowns) related polypeptides.
- The skilled person will understand that genes having similar transcript profiles, with greater than 50%, greater than 70%, or greater than 90% of regulated transcripts in common, will have similar functions. Homologs, paralogs, orthologs and any other variants of the sequences herein are expected to function in a similar manner, i.e., to cause host cells or organisms, such as plants or microorganisms, to produce bifunctional terpene synthase proteins.
- The term “selectable marker” refers to any gene which upon expression may be used to select a cell or cells that include the selectable marker. Examples of selectable markers are described below. The skilled artisan will know that different antibiotic, fungicide, auxotrophic or herbicide selectable markers are applicable to different target species.
- “Drimenol” for purposes of this application relates to (−)-drimenol (CAS: 468-68-8).
- “Albicanol” for the purpose of this application relates to (+)-albicanol (CAS: 54632-04-1).
- The term “organism” refers to any non-human multicellular or unicellular organism such as a plant or a microorganism. Particularly, a microorganism is a bacterium, a yeast, an alga or a fungus.
- The term “plant” is used interchangeably to include plant cells including plant protoplasts, plant tissues, plant cell tissue cultures giving rise to regenerated plants, or parts of plants, or plant organs such as roots, stems, leaves, flowers, pollen, ovules, embryos, fruits and the like. Any plant can be used to carry out the methods of an embodiment herein.
- A particular organism or cell is meant to be “capable of producing FPP” when it produces FPP naturally or when it does not produce FPP naturally but is transformed to produce FPP, either prior to the transformation with a nucleic acid as described herein or together with said nucleic acid. Organisms or cells transformed to produce a higher amount of FPP than the naturally occurring organism or cell are also encompassed by the “organisms or cells capable of producing FPP”.
- For the descriptions herein and the appended claims, the use of “or” means “and/or” unless stated otherwise. Similarly, “comprise,” “comprises,” “comprising”, “include,” “includes,” and “including” are interchangeable and not intended to be limiting.
- It is to be further understood that where descriptions of various embodiments use the term “comprising,” those skilled in the art would understand that in some specific instances, an embodiment can be alternatively described using language “consisting essentially of” or “consisting of.”
- Provided herein is a nucleic acid molecule comprising a nucleotide sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to the nucleotide sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68 or comprising the nucleotide sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, or SEQ ID NO: 70, or the reverse complement thereof.
- According to one embodiment, the nucleic acid molecule consists of the nucleotide sequence SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, or SEQ ID NO: 70, or the reverse complement thereof.
- In one embodiment, the nucleic acid of an embodiment herein can be either present naturally in Cryptoporus or Laricifomes or in other fungal species, or be obtained by modifying SEQ ID NO: 3 or SEQ ID NO: 7 or the reverse complement thereof.
- In another embodiment, the nucleic acid is isolated or is derived from fungi of the genus Cryptoporus or Laricifomes. In a further embodiment the nucleic acid is isolated or derived from Cryptoporus volvatus or Laricifomes officinalis.
- Further provided is a nucleotide sequence obtained by modifying SEQ ID NO: 3 or SEQ ID NO: 7 or the reverse complement thereof, which encompasses any sequence that has been obtained by modifying the sequence of SEQ ID NO: 3 or SEQ ID NO: 7, or of the reverse complement thereof, using any method known in the art, for example, by introducing any type of mutation such as deletion, insertion and/or substitution mutations. Nucleic acids comprising a sequence obtained by mutation of SEQ ID NO: 3 or SEQ ID NO: 7 or the reverse complement thereof are encompassed by an embodiment herein, provided that the sequences they comprise share at least the defined sequence identity with SEQ ID NO: 3 or SEQ ID NO: 7, or the reverse complement thereof, as defined in any of the above embodiments, and provided that they encode a polypeptide comprising a HAD-like hydrolase domain and having a bifunctional terpene synthase activity to produce a drimane sesquiterpene, wherein the polypeptide comprises (1) a class I terpene synthase-like motif as set forth in SEQ ID NO: 53 (DDxx(D/E)) and (2) a class II terpene synthase-like motif as set forth in SEQ ID NO: 56 (DxD(T/S)T). The polypeptide having bifunctional terpene synthase activity may further comprise one or more conserved motifs as set forth in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62. Mutations may be any kind of mutation of these nucleic acids, for example, point mutations, deletion mutations, insertion mutations and/or frame shift mutations of one or more nucleotides of the DNA sequence of SEQ ID NO: 3 or SEQ ID NO: 7. In one embodiment, the nucleic acid of an embodiment herein may be truncated, provided that it encodes a polypeptide as described herein.
- A variant nucleic acid may be prepared in order to adapt its nucleotide sequence to a specific expression system. For example, bacterial expression systems are known to more efficiently express polypeptides if amino acids are encoded by particular codons.
- Due to the degeneracy of the genetic code, more than one codon may encode the same amino acid, so multiple nucleic acid sequences can code for the same protein or polypeptide; all these DNA sequences are encompassed by an embodiment herein. Where appropriate, the nucleic acid sequences encoding the bifunctional terpene synthase may be optimized for increased expression in the host cell. For example, nucleotides of an embodiment herein may be synthesized using codons particular to a host for improved expression. In one embodiment, the nucleic acid molecule comprises the nucleotide sequence of SEQ ID NO: 4, SEQ ID NO: 8, SEQ ID NO: 11, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 19, SEQ ID NO: 22, SEQ ID NO: 25, SEQ ID NO: 28, SEQ ID NO: 31, SEQ ID NO: 34, SEQ ID NO: 37, SEQ ID NO: 40, SEQ ID NO: 43, SEQ ID NO: 46, SEQ ID NO: 49, SEQ ID NO: 52, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, or SEQ ID NO: 70, or the reverse complement thereof.
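- As a minimal illustration of how codon choice can be adapted to an expression host, the Python sketch below back-translates a protein sequence using a single preferred codon per amino acid (the simple "one amino acid, one codon" approach). The table `PREFERRED_CODONS` is a hypothetical, illustrative table: every codon in it encodes the listed residue in the standard genetic code, but it is not a measured codon-usage table for any real host. Practical codon optimization also weighs codon frequencies, GC content, mRNA secondary structure and unwanted restriction sites.

```python
"""Naive one-codon-per-residue back-translation (illustrative sketch only)."""

# Hypothetical "preferred codon" table. Each codon encodes the listed residue
# in the standard genetic code, but the choice of codon is illustrative and
# is NOT a measured codon-usage table for any real host.
PREFERRED_CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon
}


def back_translate(protein: str, codon_table: dict = PREFERRED_CODONS,
                   add_stop: bool = True) -> str:
    """Return a DNA coding sequence for `protein`, one fixed codon per residue."""
    protein = protein.upper()
    unknown = set(protein) - set(codon_table)
    if unknown:
        raise ValueError(f"no codon defined for residue(s): {sorted(unknown)}")
    cds = "".join(codon_table[aa] for aa in protein)
    if add_stop and not protein.endswith("*"):
        cds += codon_table["*"]  # terminate the coding sequence
    return cds


def translate(cds: str, codon_table: dict = PREFERRED_CODONS) -> str:
    """Translate with the inverted table (covers only the codons chosen above)."""
    inverse = {codon: aa for aa, codon in codon_table.items()}
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(inverse[c] for c in codons)
```

For example, `back_translate("MDDLLE")` (an arbitrary peptide, not taken from the sequence listing) returns `"ATGGATGATCTGCTGGAATAA"`, and translating that sequence back gives `"MDDLLE*"`.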
- In one embodiment provided herein is an isolated, recombinant or synthetic nucleic acid sequence comprising the nucleotide sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, or SEQ ID NO: 70, encoding a bifunctional terpene synthase comprising the amino acid sequence of SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63, or functional fragments thereof that catalyze production of a drimane sesquiterpene in a cell from an FPP precursor. In a further embodiment, the drimane sesquiterpene comprises albicanol and/or drimenol.
- Provided herein are also cDNA, genomic DNA and RNA sequences. Any nucleic acid sequence encoding the bifunctional terpene synthase or variants thereof is referred to herein as a bifunctional terpene synthase encoding sequence.
- According to one embodiment, the nucleic acid of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68 is the coding sequence of a bifunctional terpene synthase gene encoding a bifunctional terpene synthase obtained as described in the Examples.
- A fragment of a polynucleotide of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68 refers to a stretch of contiguous nucleotides of the polynucleotide of an embodiment herein that is particularly at least 15 bp, at least 30 bp, at least 40 bp, at least 50 bp and/or at least 60 bp in length. Particularly the fragment of a polynucleotide comprises at least 25, more particularly at least 50, more particularly at least 75, more particularly at least 100, more particularly at least 150, more particularly at least 200, more particularly at least 300, more particularly at least 400, more particularly at least 500, more particularly at least 600, more particularly at least 700, more particularly at least 800, more particularly at least 900, more particularly at least 1000 contiguous nucleotides of the polynucleotide of an embodiment herein.
- Without being limited, the fragment of the polynucleotides herein may be used as a PCR primer, and/or as a probe, or for anti-sense gene silencing or RNAi.
- It is clear to the person skilled in the art that genes, including the polynucleotides of an embodiment herein, can be cloned on the basis of the available nucleotide sequence information, such as that found in the attached sequence listing, by methods known in the art. These include, e.g., the design of DNA primers representing the flanking sequences of such a gene, one of which is generated in sense orientation and initiates synthesis of the sense strand, while the other is created in reverse complementary fashion and generates the antisense strand. Thermostable DNA polymerases such as those used in the polymerase chain reaction are commonly used to carry out such experiments. Alternatively, DNA sequences representing genes can be chemically synthesized and subsequently introduced into DNA vector molecules that can be multiplied by, e.g., compatible bacteria such as E. coli.
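- The flanking-primer logic described above can be sketched in Python: the forward primer is taken from the 5′ end of the sense strand, and the reverse primer is the reverse complement of the 3′ end. The default primer length of 20 nt and the use of the Wallace rule (Tm ≈ 2 °C per A/T plus 4 °C per G/C) as a rough melting-temperature estimate are assumptions for illustration only; the Wallace rule is a crude approximation for short oligonucleotides, and real primer design also checks secondary structure, primer dimers and buffer conditions.

```python
"""Flanking-primer sketch for PCR amplification of a coding sequence."""

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def flanking_primers(cds: str, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers flanking `cds`.

    forward: first `length` nt of the sense strand (extension yields the sense strand)
    reverse: reverse complement of the last `length` nt (extension yields the antisense strand)
    """
    if len(cds) < 2 * length:
        raise ValueError("sequence too short for non-overlapping primers")
    cds = cds.upper()
    return cds[:length], reverse_complement(cds[-length:])
```

For instance, `reverse_complement("ATGC")` returns `"GCAT"` and `wallace_tm("ATGC")` returns `12`.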
- In a related embodiment provided herein, PCR primers and/or probes for detecting nucleic acid sequences encoding a polypeptide having bifunctional terpene synthase activity are provided.
- The skilled artisan will be aware of methods to synthesize degenerate or specific PCR primer pairs to amplify a nucleic acid sequence encoding the bifunctional terpene synthase or functional fragments thereof, based on SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68. A detection kit for nucleic acid sequences encoding the bifunctional terpene synthase may include primers and/or probes specific for nucleic acid sequences encoding the bifunctional terpene synthase, and an associated protocol to use the primers and/or probes to detect nucleic acid sequences encoding the bifunctional terpene synthase in a sample. Such detection kits may be used to determine whether a plant, organism, microorganism or cell has been modified, i.e., transformed with a sequence encoding the bifunctional terpene synthase.
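- Degenerate primers are usually written with IUPAC ambiguity codes. The short Python sketch below, which assumes only the standard IUPAC nucleotide codes, expands a degenerate primer into the specific sequences it represents and reports its degeneracy, i.e., the number of distinct sequences in the primer pool, a quantity commonly checked when designing such primers. The function names are illustrative.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str):
    """Yield every specific sequence represented by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer.upper())):
        yield "".join(combo)
```

For example, `degeneracy("ATGNNR")` is 4 × 4 × 2 = 32.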
- To test a function of variant DNA sequences according to an embodiment herein, the sequence of interest is operably linked to a selectable or screenable marker gene, and expression of the reporter gene is tested in transient expression assays, for example, with microorganisms or with protoplasts, or in stably transformed plants. The skilled artisan will recognize that DNA sequences capable of driving expression are built as modules. Accordingly, expression levels from shorter DNA fragments may differ from that of the longest fragment and from each other. Provided herein are also functional equivalents of the nucleic acid sequence coding for the bifunctional terpene synthase proteins provided herein, i.e., nucleotide sequences that hybridize under stringent conditions to the nucleic acid sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68.
- As used herein, the terms “hybridization” and “hybridizes under certain conditions” are intended to describe conditions for hybridization and washes under which nucleotide sequences that are significantly identical or homologous to each other remain bound to each other. The conditions may be such that sequences that are at least about 70%, such as at least about 80%, and such as at least about 85%, 90%, or 95% identical, remain bound to each other. Definitions of low stringency, moderate, and high stringency hybridization conditions are provided herein.
- Appropriate hybridization conditions can be selected by those skilled in the art with minimal experimentation as exemplified in Ausubel et al. (1995, Current Protocols in Molecular Biology, John Wiley & Sons, sections 2, 4, and 6). Additionally, stringency conditions are described in Sambrook et al. (1989, Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, chapters 7, 9, and 11). As used herein, defined conditions of low stringency are as follows. Filters containing DNA are pretreated for 6 h at 40° C. in a solution containing 35% formamide, 5× SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.1% PVP, 0.1% Ficoll, 1% BSA, and 500 μg/ml denatured salmon sperm DNA. Hybridizations are carried out in the same solution with the following modifications: 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 μg/ml salmon sperm DNA, 10% (wt/vol) dextran sulfate, and 5-20×10⁶ cpm of ³²P-labeled probe. Filters are incubated in the hybridization mixture for 18-20 h at 40° C., and then washed for 1.5 h at 55° C. in a solution containing 2× SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS. The wash solution is replaced with fresh solution and the filters are incubated for an additional 1.5 h at 60° C. Filters are blotted dry and exposed for autoradiography.
- As used herein, defined conditions of moderate stringency are as follows. Filters containing DNA are pretreated for 7 h at 50° C. in a solution containing 35% formamide, 5× SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.1% PVP, 0.1% Ficoll, 1% BSA, and 500 μg/ml denatured salmon sperm DNA. Hybridizations are carried out in the same solution with the following modifications: 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 μg/ml salmon sperm DNA, 10% (wt/vol) dextran sulfate, and 5-20×10⁶ cpm of ³²P-labeled probe. Filters are incubated in the hybridization mixture for 30 h at 50° C., and then washed for 1.5 h at 55° C. in a solution containing 2× SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS. The wash solution is replaced with fresh solution and the filters are incubated for an additional 1.5 h at 60° C. Filters are blotted dry and exposed for autoradiography.
- As used herein, defined conditions of high stringency are as follows. Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65° C. in buffer composed of 6× SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 μg/ml denatured salmon sperm DNA. Filters are hybridized for 48 h at 65° C. in the prehybridization mixture containing 100 μg/ml denatured salmon sperm DNA and 5-20×10⁶ cpm of ³²P-labeled probe. Washing of filters is done at 37° C. for 1 h in a solution containing 2× SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA. This is followed by a wash in 0.1× SSC at 50° C. for 45 minutes. Other conditions of low, moderate, and high stringency that are well known in the art may be used if the above conditions are inappropriate (e.g., as employed for cross-species hybridizations).
- The skilled artisan will be aware of methods to identify homologous sequences in other organisms and methods to determine the percentage of sequence identity between homologous sequences. Such newly identified DNA molecules then can be sequenced and the sequence can be compared with the nucleic acid sequence of SEQ ID NO: 3 or SEQ ID NO: 7.
- The percentage of identity between two peptide or nucleotide sequences is a function of the number of amino acids or nucleotide residues that are identical in the two sequences when an alignment of these two sequences has been generated. Identical residues are defined as residues that are the same in the two sequences in a given position of the alignment. The percentage of sequence identity, as used herein, is calculated from the optimal alignment by taking the number of residues identical between two sequences, dividing it by the total number of residues in the shorter sequence and multiplying by 100. The optimal alignment is the alignment in which the percentage of identity is the highest possible. Gaps may be introduced into one or both sequences in one or more positions of the alignment to obtain the optimal alignment. These gaps are then taken into account as non-identical residues for the calculation of the percentage of sequence identity. Alignment for the purpose of determining the percentage of amino acid or nucleic acid sequence identity can be achieved in various ways using computer programs, for instance publicly available computer programs on the world wide web. Preferably, the BLAST program (Tatusova and Madden, FEMS Microbiol Lett., 1999, 174:247-250) set to the default parameters, available from the National Center for Biotechnology Information (NCBI) website at ncbi.nlm.nih.gov/BLAST/bl2seq/wblast2.cgi, can be used to obtain an optimal alignment of protein or nucleic acid sequences and to calculate the percentage of sequence identity.
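- Under the definition above, gaps never contribute identical positions and the denominator is fixed by the shorter sequence, so the identity-maximizing alignment contains exactly as many identical positions as the longest common subsequence (LCS) of the two sequences. The Python sketch below computes the percentage of identity that way. It follows the definition in the preceding paragraph literally and is not the BLAST algorithm; BLAST uses substitution scores, gap penalties and local alignment, so its reported identities will generally differ. The function names are illustrative.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)  # extend a match diagonally
            else:
                curr.append(max(prev[j], curr[j - 1]))  # gap in a or in b
        prev = curr
    return prev[-1]


def percent_identity(a: str, b: str) -> float:
    """Identical residues in the identity-maximizing alignment, divided by the
    length of the shorter sequence, times 100 (gaps count as non-identical)."""
    a, b = a.upper(), b.upper()
    shorter = min(len(a), len(b))
    if shorter == 0:
        raise ValueError("sequences must be non-empty")
    return 100.0 * lcs_length(a, b) / shorter
```

For example, `percent_identity("ACGTACGT", "ACGAACGT")` gives 87.5, since 7 of the 8 positions can be aligned as identical.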
- A related embodiment provided herein provides a nucleic acid sequence that is complementary to the nucleic acid sequence according to SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68, such as inhibitory RNAs, or a nucleic acid sequence that hybridizes under stringent conditions to at least part of the nucleotide sequence according to SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 68. An alternative embodiment herein provides a method to alter gene expression in a host cell. For instance, expression of the polynucleotide of an embodiment herein may be enhanced, overexpressed or induced in certain contexts (e.g., upon exposure to certain temperatures or culture conditions) in a host cell or host organism.
- Alteration of expression of a polynucleotide provided herein may also result in ectopic expression, i.e., an expression pattern in the altered organism that differs from that in a control or wild-type organism. Alteration of expression occurs from interactions of the polypeptide of an embodiment herein with exogenous or endogenous modulators, or as a result of chemical modification of the polypeptide. The term also refers to an altered expression pattern of the polynucleotide of an embodiment herein in which expression is reduced below the detection level or activity is completely suppressed.
- In one embodiment, provided herein is also an isolated, recombinant or synthetic polynucleotide encoding a polypeptide or variant polypeptide provided herein.
- In one embodiment is provided an isolated nucleic acid molecule encoding a polypeptide comprising a domain of the HAD-like hydrolase superfamily having bifunctional terpene synthase activity and comprising an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63; and the sequence as set forth in SEQ ID NO: 53, SEQ ID NO: 54, or SEQ ID NO: 55; and the sequence as set forth in SEQ ID NO: 56, SEQ ID NO: 57, or SEQ ID NO: 58, or comprising the amino acid sequence of SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63.
- In one embodiment provided herein is an isolated polypeptide comprising a HAD-like hydrolase domain having bifunctional terpene synthase activity and comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 5 or comprising the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 5.
- According to one embodiment, the polypeptide consists of the amino acid sequence of SEQ ID NO: 1 or 5.
- In one embodiment, the polypeptide of an embodiment herein can be present naturally in Cryptoporus or Laricifomes fungi or in other fungal species, or comprises an amino acid sequence that is a variant of SEQ ID NO: 1 or SEQ ID NO: 5, either obtained by genetic engineering or found naturally in Cryptoporus or Laricifomes fungi or in other fungal species.
- According to another embodiment, the polypeptide is isolated or derived from fungi of the genus Cryptoporus or Laricifomes. In a further embodiment, the polypeptide is isolated or derived from Cryptoporus volvatus or Laricifomes officinalis.
- In one embodiment, the at least one polypeptide having a bifunctional terpene synthase activity used in any of the herein-described embodiments or encoded by the nucleic acid used in any of the herein-described embodiments comprises an amino acid sequence that is a variant of SEQ ID NO: 1 or SEQ ID NO: 5, obtained by genetic engineering. In one embodiment the polypeptide comprises an amino acid sequence encoded by a nucleotide sequence that has been obtained by modifying SEQ ID NO: 3 or SEQ ID NO: 7 or the reverse complement thereof.
- Polypeptides are also meant to include variants and truncated polypeptides provided that they have bifunctional terpene synthase activity.
- According to another embodiment, the at least one polypeptide having a bifunctional terpene synthase activity used in any of the herein-described embodiments or encoded by the nucleic acid used in any of the herein-described embodiments comprises an amino acid sequence that is a variant of SEQ ID NO: 1 or SEQ ID NO: 5, obtained by genetic engineering, provided that said variant has bifunctional terpene synthase activity to produce a drimane sesquiterpene and has the required percentage of identity to SEQ ID NO: 1 or SEQ ID NO: 5 as described in any of the above embodiments and comprises (1) a class I terpene synthase-like motif as set forth in SEQ ID NO: 53 (DDxx(D/E)) and (2) a class II terpene synthase-like motif as set forth in SEQ ID NO: 56 (DxD(T/S)T) and comprises domains corresponding to Pfam domains PF13419.5 and PF13242.5. The polypeptide having bifunctional terpene synthase activity may further comprise one or more conserved motifs as set forth in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
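- As an illustration of how candidate sequences might be pre-screened for the two motifs named above, the Python sketch below searches a protein sequence for the class I motif DDxx(D/E) (SEQ ID NO: 53) and the class II motif DxD(T/S)T (SEQ ID NO: 56) with regular expressions, treating x as any residue. This is an assumed, sequence-level filter only: the presence of the motifs does not establish bifunctional terpene synthase activity, which still has to be confirmed experimentally, and the sketch does not check the Pfam domains or the identity thresholds. The example peptide used below is arbitrary and is not taken from the sequence listing.

```python
import re

# x = any residue; lookaheads let overlapping occurrences be reported.
CLASS_I = re.compile(r"(?=(DD..[DE]))")   # DDxx(D/E), SEQ ID NO: 53
CLASS_II = re.compile(r"(?=(D.D[TS]T))")  # DxD(T/S)T, SEQ ID NO: 56


def motif_positions(protein: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based start, matched motif) for every occurrence of `pattern`."""
    seq = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


def has_both_motifs(protein: str) -> bool:
    """True if at least one class I and at least one class II motif are present."""
    return bool(motif_positions(protein, CLASS_I)) and bool(
        motif_positions(protein, CLASS_II)
    )
```

For the arbitrary peptide `"MKDDLLEAAADIDTTK"`, the class I motif `DDLLE` is found at position 3 and the class II motif `DIDTT` at position 11.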
- According to another embodiment, the at least one polypeptide having a bifunctional terpene synthase activity used in any of the herein-described embodiments or encoded by the nucleic acid used in any of the herein-described embodiments is a variant of SEQ ID NO: 1 or SEQ ID NO: 5 that can be found naturally in other organisms, such as other fungal species, provided that it has bifunctional terpene synthase activity and comprises domains corresponding to Pfam domains PF13419.5 and PF13242.5. As used herein, the polypeptide includes a polypeptide or peptide fragment that encompasses the amino acid sequences identified herein, as well as truncated or variant polypeptides, provided that they have bifunctional terpene synthase activity, that they share at least the defined percentage of identity with the corresponding fragment of SEQ ID NO: 1 or SEQ ID NO: 5, and that they comprise (1) a class I terpene synthase-like motif as set forth in SEQ ID NO: 53 (DDxx(D/E)) and (2) a class II terpene synthase-like motif as set forth in SEQ ID NO: 56 (DxD(T/S)T) and domains corresponding to Pfam domains PF13419.5 and PF13242.5.
- Examples of variant polypeptides are naturally occurring proteins that result from alternate mRNA splicing events or from proteolytic cleavage of the polypeptides described herein. Variations attributable to proteolysis include, for example, differences in the N- or C-termini upon expression in different types of host cells, due to proteolytic removal of one or more terminal amino acids from the polypeptides of an embodiment herein. Polypeptides encoded by a nucleic acid obtained by natural or artificial mutation of a nucleic acid of an embodiment herein, as described hereinafter, are also encompassed by an embodiment herein.
- Polypeptide variants resulting from a fusion of additional peptide sequences at the amino and carboxyl terminal ends can also be used in the methods of an embodiment herein. In particular such a fusion can enhance expression of the polypeptides, be useful in the purification of the protein or improve the enzymatic activity of the polypeptide in a desired environment or expression system. Such additional peptide sequences may be signal peptides, for example. Another aspect encompasses methods using variant polypeptides, such as those obtained by fusion with other oligo- or polypeptides and/or those which are linked to signal peptides. Polypeptides resulting from a fusion with another functional protein, such as another protein from the terpene biosynthesis pathway, can also be advantageously used in the methods of an embodiment herein.
- A variant may also differ from the polypeptide of an embodiment herein by attachment of modifying groups which are covalently or non-covalently linked to the polypeptide backbone. The variant also includes a polypeptide which differs from the polypeptide provided herein by introduced N-linked or O-linked glycosylation sites, and/or an addition of cysteine residues. The skilled artisan will recognize how to modify an amino acid sequence and preserve biological activity.
- In addition to the gene sequences shown in the sequences disclosed herein, it will be apparent to the person skilled in the art that DNA sequence polymorphisms may exist within a given population, which may lead to changes in the amino acid sequence of the polypeptides disclosed herein. Such genetic polymorphisms may exist in cells from different populations or within a population due to natural allelic variation. Allelic variants may also include functional equivalents.
- Further embodiments also relate to the molecules derived by such sequence polymorphisms from the concretely disclosed nucleic acids. These natural variations usually bring about a variance of about 1 to 5% in the nucleotide sequence of a gene or in the amino acid sequence of the polypeptides disclosed herein. As mentioned above, the nucleic acid encoding the polypeptide or variants thereof of an embodiment herein is a useful tool to modify non-human host organisms, microorganisms or cells, including those intended to be used in the methods described herein.
- An embodiment provided herein provides amino acid sequences of bifunctional terpene synthase proteins, including orthologs and paralogs, as well as methods for identifying and isolating orthologs and paralogs of the bifunctional terpene synthases in other organisms. Particularly, orthologs and paralogs of the bifunctional terpene synthase so identified retain bifunctional terpene synthase activity, may be considered polypeptides of the HAD-like hydrolase superfamily (InterPro protein superfamily IPR023214 or Pfam protein superfamily PF13419), comprise a HAD-like hydrolase domain, and are capable of producing a drimane sesquiterpene, such as albicanol and/or drimenol, starting from an acyclic terpene pyrophosphate precursor, e.g., FPP.
- The polypeptide to be contacted with an acyclic terpene pyrophosphate, e.g., FPP, in vitro can be obtained by extraction from any organism expressing it, using standard protein or enzyme extraction technologies. If the host organism is a unicellular organism or cell releasing the polypeptide of an embodiment herein into the culture medium, the polypeptide may simply be collected from the culture medium, for example by centrifugation, optionally followed by washing steps and re-suspension in suitable buffer solutions. If the organism or cell accumulates the polypeptide within its cells, the polypeptide may be obtained by disruption or lysis of the cells and optionally further extraction of the polypeptide from the cell lysate. The cell lysate or the extracted polypeptide can be used to contact the acyclic terpene pyrophosphate for production of a terpene or a mixture of terpenes.
- The polypeptide having a bifunctional terpene synthase activity, either in an isolated form or together with other proteins, for example in a crude protein extract obtained from cultured cells or microorganisms, may then be suspended in a buffer solution at optimal pH. If adequate, salts, DTT, inorganic cations and other kinds of enzymatic co-factors, may be added in order to optimize enzyme activity. The precursor FPP is added to the polypeptide suspension, which is then incubated at optimal temperature, for example between 15 and 40° C., particularly between 25 and 35° C., more particularly at 30° C. After incubation, the drimane sesquiterpene, such as albicanol and/or drimenol, produced may be isolated from the incubated solution by standard isolation procedures, such as solvent extraction and distillation, optionally after removal of polypeptides from the solution.
- According to another embodiment, the at least one polypeptide having a bifunctional terpene synthase activity can be used for production of a drimane sesquiterpene comprising albicanol and/or drimenol or mixtures of terpenes comprising albicanol and/or drimenol.
- One particular tool to carry out the method of an embodiment herein is the polypeptide itself as described herein.
- According to a particular embodiment, the polypeptide is capable of producing a mixture of sesquiterpenes wherein albicanol and/or drimenol represents at least 20%, particularly at least 30%, particularly at least 35%, particularly at least 90%, particularly at least 95%, more particularly at least 98% of the sesquiterpenes produced. In another aspect provided here, the albicanol and/or drimenol is produced with greater than or equal to 95%, more particularly 98% selectivity.
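- The percentages above express the share of albicanol and/or drimenol among all sesquiterpenes produced. A minimal sketch of that arithmetic, assuming GC peak areas are proportional to amounts (i.e. equal detector response factors, which is an assumption and not stated in the text); the function name `product_share` and the peak areas are hypothetical and serve only as an illustration.

```python
def product_share(peak_areas, targets):
    """Return the fraction of total sesquiterpene peak area attributable to `targets`.

    Assumes every compound has the same detector response factor, so peak area
    is used directly as a proxy for amount.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return sum(peak_areas.get(name, 0.0) for name in targets) / total


# Hypothetical GC peak areas (arbitrary units), for illustration only.
areas = {"albicanol": 960.0, "drimenol": 20.0, "other sesquiterpenes": 20.0}
share = product_share(areas, ["albicanol", "drimenol"])
print(f"{share:.1%}")  # 98.0%
```

With real data, response factors determined from authentic standards would normally be applied to each peak area before computing the share.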
- The functionality or activity of any bifunctional terpene synthase protein, variant or fragment, may be determined using various methods. For example, transient or stable overexpression in plant, bacterial or yeast cells can be used to test whether the protein has activity, i.e., produces albicanol and/or drimenol from FPP precursors. Bifunctional terpene synthase activity may be assessed in a microbial expression system, such as the assay described in Example 3 herein on the production of albicanol and/or drimenol, indicating functionality. A variant or derivative of a bifunctional terpene synthase polypeptide of an embodiment herein retains an ability to produce a drimane sesquiterpene such as albicanol and/or drimenol from FPP precursors. Amino acid sequence variants of the bifunctional terpene synthases provided herein may have additional desirable biological functions including, e.g., altered substrate utilization, reaction kinetics, product distribution or other alterations.
- The ability of a polypeptide to catalyze the synthesis of a particular sesquiterpene (for example albicanol and/or drimenol) can be simply confirmed, for example, by performing the enzyme assay as detailed in Examples 3, 4 and 6.
- Further provided is at least one vector comprising the nucleic acid molecules described herein.
- Also provided herein is a vector selected from the group of a prokaryotic vector, viral vector and a eukaryotic vector.
- Further provided here is a vector that is an expression vector.
- In one embodiment, several nucleic acid sequences encoding bifunctional terpene synthases are co-expressed in a single host, particularly under the control of different promoters. In another embodiment, several nucleic acid sequences encoding bifunctional terpene synthase proteins can be present on a single transformation vector or be co-transformed at the same time using separate vectors, selecting transformants comprising both chimeric genes. Similarly, one or more bifunctional terpene synthase encoding genes may be expressed in a single plant, cell, microorganism or organism together with other chimeric genes.
- The nucleic acid sequences of an embodiment herein encoding bifunctional terpene synthase proteins can be inserted in expression vectors and/or be contained in chimeric genes inserted in expression vectors, to produce bifunctional terpene synthase proteins in a host cell or non-human host organism. The vectors for inserting transgenes into the genome of host cells are well known in the art and include plasmids, viruses, cosmids and artificial chromosomes. Binary or co-integration vectors into which a chimeric gene is inserted can also be used for transforming host cells.
- An embodiment provided herein provides recombinant expression vectors comprising a nucleic acid sequence of a bifunctional terpene synthase gene, or a chimeric gene comprising a nucleic acid sequence of a bifunctional terpene synthase gene, operably linked to associated nucleic acid sequences such as, for instance, promoter sequences. For example, a chimeric gene comprising a nucleic acid sequence of SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, or SEQ ID NO: 70 or a variant thereof may be operably linked to a promoter sequence suitable for expression in plant cells, bacterial cells or fungal cells, optionally linked to a 3′ non-translated nucleic acid sequence.
- Alternatively, the promoter sequence may already be present in a vector so that the nucleic acid sequence which is to be transcribed is inserted into the vector downstream of the promoter sequence. Vectors can be engineered to have an origin of replication, a multiple cloning site, and a selectable marker.
- In one embodiment, an expression vector comprising a nucleic acid as described herein can be used as a tool for transforming non-human host organisms or host cells suitable to carry out the method of an embodiment herein in vivo.
- The expression vectors provided herein may be used in the methods for preparing a genetically transformed non-human host organism and/or host cell, in non-human host organisms and/or host cells harboring the nucleic acids of an embodiment herein and in the methods for making polypeptides having a bifunctional terpene synthase activity, as described herein.
- Recombinant non-human host organisms and host cells transformed to harbor at least one nucleic acid of an embodiment herein so that it heterologously expresses or over-expresses at least one polypeptide of an embodiment herein are also very useful tools to carry out the method of an embodiment herein. Such non-human host organisms and host cells are therefore provided herein.
- In one embodiment is provided a host cell, microorganism or non-human host organism comprising at least one of the nucleic acid molecules described herein or comprising at least one vector comprising at least one of the nucleic acid molecules.
- A nucleic acid according to any of the above-described embodiments can be used to transform the non-human host organisms and cells and the expressed polypeptide can be any of the above-described polypeptides.
- In one embodiment, the non-human host organism or host cell is a prokaryotic cell. In another embodiment, the non-human host organism or host cell is a bacterial cell. In a further embodiment, the non-human host organism or host cell is Escherichia coli.
- In one embodiment, the non-human host organism or host cell is a eukaryotic cell. In another embodiment, the non-human host organism or host cell is a yeast cell. In a further embodiment, the non-human host organism or cell is Saccharomyces cerevisiae.
- In a further embodiment, the non-human organism or host cell is a plant cell or a fungal cell.
- In one embodiment the non-human host organism or host cell expresses a polypeptide, provided that the organism or cell is transformed to harbor a nucleic acid encoding said polypeptide, this nucleic acid is transcribed to mRNA and the polypeptide is found in the host organism or cell. Suitable methods to transform a non-human host organism or a host cell have been previously described and are also provided herein.
- To carry out an embodiment herein in vivo, the host organism or host cell is cultivated under conditions conducive to the production of a drimane sesquiterpene such as albicanol and/or drimenol. Accordingly, if the host is a transgenic plant, optimal growth conditions can be provided, such as optimal light, water and nutrient conditions, for example. If the host is a unicellular organism, conditions conducive to the production of a drimane sesquiterpene such as albicanol and/or drimenol may comprise addition of suitable cofactors to the culture medium of the host. In addition, a culture medium may be selected, so as to maximize drimane sesquiterpene, such as albicanol and/or drimenol, synthesis. Examples of optimal culture conditions are described in a more detailed manner in the Examples.
- Non-human host organisms suitable to carry out the method of an embodiment herein in vivo may be any non-human multicellular or unicellular organisms. In one embodiment, the non-human host organism used to carry out an embodiment herein in vivo is a plant, a prokaryote or a fungus. Any plant, prokaryote or fungus can be used. Particularly useful plants are those that naturally produce high amounts of terpenes. In another embodiment the non-human host organism used to carry out the method of an embodiment herein in vivo is a microorganism. Any microorganism can be used; for example, the microorganism can be a bacterium or a yeast, such as E. coli or Saccharomyces cerevisiae.
- Some of these organisms do not produce FPP naturally. To be suitable to carry out the method of an embodiment herein, organisms or cells that do not produce an acyclic terpene pyrophosphate precursor, e.g. FPP, naturally are transformed to produce said precursor. They can be so transformed either before the modification with the nucleic acid described according to any of the above embodiments or simultaneously, as explained above. Methods to transform organisms, for example microorganisms, so that they produce an acyclic terpene pyrophosphate precursor, e.g. FPP, are already known in the art.
- Isolated higher eukaryotic cells can also be used, instead of complete organisms, as hosts to carry out the method of an embodiment herein in vivo. Suitable eukaryotic cells may be any non-human cell, such as plant or fungal cells.
- Further provided herein is a method of producing a drimane sesquiterpene comprising: contacting an acyclic terpene pyrophosphate, particularly farnesyl diphosphate (FPP) with a polypeptide which comprises a HAD-like hydrolase domain and having bifunctional terpene synthase activity to produce a drimane sesquiterpene, wherein the polypeptide comprises (1) a class I terpene synthase-like motif as set forth in SEQ ID NO: 53 (DDxx(D/E)); and (2) a class II terpene synthase-like motif as set forth in SEQ ID NO: 56 (DxD(T/S)T); and optionally isolating the drimane sesquiterpene.
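- The two motifs are written in a degenerate notation in which "x" stands for any amino acid and "(D/E)" or "(T/S)" for either of the two residues. A minimal sketch of screening a protein sequence for both motifs with Python regular expressions; the translation of the notation into regex syntax is our own, the helper names are hypothetical, and the toy sequence is made up (it is not one of the SEQ ID NOs).

```python
import re

# Degenerate motifs translated to regular expressions: "x" -> ".", "(D/E)" -> "[DE]".
CLASS_I = "DD..[DE]"   # DDxx(D/E), SEQ ID NO: 53
CLASS_II = "D.D[TS]T"  # DxD(T/S)T, SEQ ID NO: 56


def find_motif(seq, pattern):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]


def has_both_motifs(seq):
    """True if the sequence carries at least one class I and one class II motif."""
    return bool(find_motif(seq, CLASS_I)) and bool(find_motif(seq, CLASS_II))


# Toy sequence for illustration only.
toy = "MSTDDKLDAGHKLPDVDTTQWA"
print(find_motif(toy, CLASS_I), find_motif(toy, CLASS_II), has_both_motifs(toy))
# [4] [15] True
```

Such a pattern match only flags candidates; whether a hit actually has bifunctional activity still has to be tested experimentally, as described for the enzyme assays.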
- Also provided is the above method wherein the drimane sesquiterpene comprises albicanol and/or drimenol.
- Additionally provided is the above method, wherein the polypeptide comprises an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63 and (1) the sequence as set forth in SEQ ID NO: 53, SEQ ID NO: 54, or SEQ ID NO: 55; and (2) the sequence as set forth in SEQ ID NO: 56, SEQ ID NO: 57, or SEQ ID NO: 58, or comprising the amino acid sequence of SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63 to produce a drimane sesquiterpene; and optionally isolating the drimane sesquiterpene. In another aspect, the polypeptide further comprises one or more conserved motifs as set forth in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
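- The passage does not define how percent identity is computed, and alignment tools use different conventions (for example dividing by the number of alignment columns, or by the length of the shorter sequence). A minimal sketch of one common convention, assuming the two sequences have already been aligned (gaps written as "-"); the alignment itself and the helper name are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Convention assumed here: identical, non-gap columns divided by the total
    number of alignment columns (gap columns count as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Toy alignment for illustration only.
a = "MDDKLD-AGHK"
b = "MDDRLDQAG-K"
pid = percent_identity(a, b)
print(round(pid, 1), pid >= 40.0)  # 72.7 True
```

Because the value depends on the alignment algorithm and on the chosen denominator, identity figures are only comparable when computed the same way.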
- In one aspect, the drimane sesquiterpene is albicanol and/or drimenol. In another aspect, the drimane sesquiterpene is isolated.
- In another aspect provided here, the albicanol and/or drimenol is produced with greater than or equal to 60%, 80%, 90% or even 95% selectivity. In a further aspect the drimane sesquiterpene is albicanol.
- Further provided here is a method comprising transforming a host cell, microorganism or a non-human host organism with a nucleic acid encoding a polypeptide comprising a HAD-like hydrolase domain, having bifunctional terpene synthase activity and comprising an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63; and comprising (1) the sequence as set forth in SEQ ID NO: 53, SEQ ID NO: 54, or SEQ ID NO: 55; and (2) the sequence as set forth in SEQ ID NO: 56, SEQ ID NO: 57, or SEQ ID NO: 58, or comprising the amino acid sequence of SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63.
- In one embodiment, a method provided herein comprises cultivating a non-human host organism or a host cell capable of producing FPP and transformed to express a polypeptide wherein the polypeptide comprises a sequence of amino acids that has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 5 under conditions that allow for the production of the polypeptide.
- In another embodiment, a method provided herein comprises contacting a sesquiterpene such as albicanol and/or drimenol with at least one enzyme to produce a sesquiterpene derivative. In one embodiment, the sesquiterpene derivative can be obtained biochemically or chemically. In one embodiment, a drimenol derivative is provided. Examples of such derivatives of drimenol include, but are not limited to, drimenyl acetate (CAS 40266-93-1), drimenal (CAS 105426-71-9) and drimenic acid (CAS 111319-84-7).
- In one embodiment, an albicanol derivative is provided. Examples of such derivatives of albicanol include cryptoporic acid E (CAS 120001-10-7), cryptoporic acid D (CAS 119979-95-2), cryptoporic acid B (CAS 113592-88-4), cryptoporic acid A (CAS 113592-87-3), laricinolic acid (CAS 302355-23-3), albicanyl acetate (CAS 83679-71-4).
- The albicanol and/or drimenol produced in any of the methods described herein can be converted to derivatives such as, but not limited to, hydrocarbons, esters, amides, glycosides, ethers, epoxides, aldehydes, ketones, alcohols, diols, acetals or ketals.
- The albicanol and/or drimenol derivatives can be obtained by a chemical method such as, but not limited to oxidation, reduction, alkylation, acylation and/or rearrangement.
- Alternatively, the albicanol and/or drimenol derivatives can be obtained using a biochemical method by contacting the albicanol and/or drimenol with an enzyme such as, but not limited to, an oxidoreductase, a monooxygenase, a dioxygenase or a transferase. The biochemical conversion can be performed in vitro using isolated enzymes or enzymes from lysed cells, or in vivo using whole cells.
- According to another particular embodiment, the method of any of the above-described embodiments is carried out in vivo. In such a case, step a) comprises cultivating a non-human host organism or a host cell capable of producing FPP and transformed to express at least one polypeptide comprising an amino acid sequence comprising SEQ ID NO: 1 or SEQ ID NO: 5, or a functional variant thereof, which may be considered a polypeptide of the HAD-like hydrolase superfamily (Interpro protein superfamily IPR023214 or Pfam protein superfamily PF13419), which comprises a HAD-like hydrolase domain and which has bifunctional terpene synthase activity, under conditions conducive to the production of a drimane sesquiterpene, for example, albicanol and/or drimenol. In one embodiment, albicanol may be the only product or may be part of a mixture of sesquiterpenes. In another aspect, drimenol may be the only product or may be part of a mixture of sesquiterpenes.
- According to a further embodiment, the method further comprises, prior to step a), transforming a non-human organism or cell capable of producing FPP with at least one nucleic acid encoding a polypeptide comprising an amino acid sequence comprising SEQ ID NO: 1 or SEQ ID NO: 5, or encoding a polypeptide having bifunctional terpene synthase activity and comprising an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 44, SEQ ID NO: 47, SEQ ID NO: 50, or SEQ ID NO: 63; and comprising (1) the sequence as set forth in SEQ ID NO: 53, SEQ ID NO: 54, or SEQ ID NO: 55; and (2) the sequence as set forth in SEQ ID NO: 56, SEQ ID NO: 57, or SEQ ID NO: 58, so that said organism expresses said polypeptide. The polypeptide may further comprise one or more conserved motifs as set forth in SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, and/or SEQ ID NO: 62.
- These embodiments are particularly advantageous since it is possible to carry out the method in vivo without previously isolating the polypeptide. The reaction occurs directly within the organism or cell transformed to express said polypeptide.
- An embodiment herein provides polypeptides to be used in a method of producing a drimane sesquiterpene, such as albicanol and/or drimenol, by contacting an FPP precursor with the polypeptides either in vitro or in vivo.
- Further provided is the use of a polypeptide as described herein for producing a drimane sesquiterpene, for example, albicanol and/or drimenol.
- The following examples are illustrative only and are not intended to limit the scope of the claims and embodiments described herein.
- Drimane sesquiterpenoids are widespread in nature (Jansen and Groot, 2004, Nat. Prod. Rep., 21, 449-477). The compounds in the drimane sesquiterpenoid family contain the sesquiterpene structure with the drimane carbon skeleton depicted in FIG. 1. For example, commonly found drimane sesquiterpenes are drimenol and albicanol (FIG. 1) and compounds derived from drimenol and albicanol by enzymatic reactions such as oxidation, reduction, acylation, alkylation or rearrangement. The drimane sesquiterpenoid family also contains compounds where the drimane sesquiterpene is bound to a molecule derived from another biosynthetic pathway (Jansen and Groot, 2004, Nat. Prod. Rep., 21, 449-477).
- Cryptoporic acids A-H are drimane sesquiterpenoid ethers of isocitric acid found in the fungus Cryptoporus volvatus (Hashimoto et al., 1987, Tetrahedron Lett. 28, 6303-6304; Asakawa et al., 1992, Phytochemistry 31(2), 579-592; Hirotani et al., 1991, Phytochemistry 30(5), 1555-1559). In cryptoporic acids, the sesquiterpene moiety has the structure of albicanol, and these compounds are thus putatively derived biosynthetically from albicanol. Laricinolic acid is a drimane-type sesquiterpene which can be isolated from the wood-rotting fungus Laricifomes officinalis (Erb et al., 2000, J. Chem. Soc., Perkin Trans. 1, 2307-2309). Laricinolic acid is most likely derived from albicanol following several oxidative enzymatic steps.
- We undertook to characterize albicanol synthases and to identify nucleotide sequences encoding for albicanol synthases from Cryptoporus volvatus and Laricifomes officinalis. Strains of Laricifomes officinalis (ATCC® 64430™) and Cryptoporus volvatus (ATCC® 12212™) are conserved at the American Type Culture Collection (ATCC) under the collection numbers ATCC-64430 and ATCC-12212, respectively. The Laricifomes officinalis (ATCC® 64430™) and Cryptoporus volvatus (ATCC® 12212™) strains were purchased from LGC Standards GmbH (46485 Wesel, Germany). The cells were grown in Yeast Mold (YM) medium (Wickerham, 1939, J. Tropical Med. Hyg. 42, 176).
- For each of the two strains, genomic DNA and total RNA were extracted in order to sequence the full genome and a transcriptome. Cells propagated on YM-agar plates were used to inoculate 100 ml of liquid YM medium in glass tubes. The cultures were incubated for 6 days at 25° C. with 180 rpm agitation. For RNA extraction, 0.5 ml of culture was taken; the cells (approximately 100 mg) were recovered by centrifugation, frozen in liquid nitrogen and ground using a mortar and pestle. The total RNA pool was extracted using the ZR Fungal/Bacterial RNA MiniPrep™ from Zymo Research Corp (Irvine, Calif. 92614, U.S.A). From 100 mg of cells, 18 and 23 micrograms of total RNA were obtained for ATCC-12212 and ATCC-64430, respectively. Genomic DNA was extracted using the NucleoSpin® Soil Kit from Macherey-Nagel (Düren, Germany). Cells were recovered from the culture by centrifugation and the genomic DNA was extracted following the manufacturer's protocol. From 500 mg of cells, 1.05 and 0.93 micrograms of genomic DNA were extracted from ATCC-12212 and ATCC-64430, respectively.
- The genomic DNA was sequenced using a paired-read protocol (Illumina). The libraries were prepared to select insert sizes between 250 and 350 bp. The sequencing was performed on a HiSeq 2500 Illumina sequencer. The length of the reads was 125 bases. A total of 21.3 and 30.4 million paired reads (clusters) were sequenced for ATCC-12212 and ATCC-64430, respectively.
- For the transcriptomes, the libraries were prepared from the total RNA using the TruSeq Stranded mRNA Library Preparation Kit (Illumina). An additional insert size selection step (160-240 bp) was performed. The libraries were sequenced as 2×125-base paired ends on a HiSeq 2500 Illumina sequencer. For ATCC-12212 and ATCC-64430, 19.9 million and 126 million reads were sequenced, respectively.
- For assembly of the C. volvatus transcriptome, the reads were first joined on their overlapping ends. The joined paired reads were then assembled using the Velvet V1.2.10 assembler (Zerbino D. R. and Birney E. 2008, Genome Res. 18(5), 821-829; www.ebi.ac.uk/~zerbino/velvet/) and the Oases software (Schulz M. H. et al., 2012, Bioinformatics 28(8), 1086-1092; www.ebi.ac.uk/~zerbino/oases/). A total of 25′866 contigs with an average length of 1,792 bases was obtained for the C. volvatus transcriptome.
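- With 160-240 bp inserts and 2×125-base reads, the two mates of a pair usually overlap, so they can be joined into one longer fragment before assembly. The text does not name the tool used for this step; the following is a minimal exact-match sketch of the idea (hypothetical helper names, toy sequences), whereas dedicated read-merging tools also tolerate mismatches and use base qualities.

```python
# Complement map for DNA bases (N is kept as N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Join a read pair on its overlapping ends, or return None if no overlap.

    Read 2 is reverse-complemented, then the longest exact overlap between the
    end of read 1 and the start of the reverse-complemented read 2 is used.
    Mismatches and base qualities are ignored, and read-through (fragments
    shorter than one read) is not handled.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    r1 = read1.upper()
    r2 = revcomp(read2)
    for overlap in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        if r1[-overlap:] == r2[:overlap]:
            return r1 + r2[overlap:]
    return None


# Toy 30-bp fragment sequenced from both ends with 20-bp reads.
fragment = "AAAACCCCGGGGTTTTACGTACGTAGCTAG"
read1 = fragment[:20]
read2 = revcomp(fragment[-20:])
print(merge_pair(read1, read2) == fragment)  # True
```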
- The C. volvatus genome was assembled using the Velvet V1.2.10 assembler (Zerbino D. R. and Birney E., 2008, Genome Res. 18(5), 821-829; www.ebi.ac.uk/~zerbino/velvet/). The genome could be assembled into 1′266 contigs with an average size of 20,000 bases and a total size of 25′320′421 bases. An ab initio gene prediction in the C. volvatus genomic contigs was performed by Progenus S A (Gembloux, Belgium) using the Augustus software (Stanke et al., Nucleic Acids Res. (2004) 32, W309-W312). A total of 7738 genes were predicted. Functional annotation was performed by combining a Pfam domain search (Finn, R. D. et al., 2016, Nucleic Acids Research Database Issue 44:D279-D285) and a Blast search (Altschul et al., 1990, J. Mol. Biol. 215, 403-410).
- The genome and transcriptome of L. officinalis were assembled using the CLC Genomic Workbench (Qiagen). The genome was assembled in 16′831 contigs for a total genome size of 90′591′190 bases. The transcriptome assembly provided 28′633 contigs with an average length of 1′962 bases.
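- The read counts and assembly sizes reported above allow a rough estimate of the mean genomic sequencing depth. A minimal sketch, assuming the assembled size approximates the true genome size and counting both 125-base mates of every pair; trimming, duplicates and unmapped reads are ignored, so these are upper-bound, back-of-the-envelope figures rather than values reported by the authors.

```python
def mean_coverage(read_pairs, read_length, genome_size):
    """Approximate mean sequencing depth: total sequenced bases / genome size.

    Both mates of each pair are counted; no read trimming, duplicate removal
    or mapping rate is taken into account.
    """
    return read_pairs * 2 * read_length / genome_size


cv = mean_coverage(21.3e6, 125, 25_320_421)  # C. volvatus (ATCC-12212), assembled size
lo = mean_coverage(30.4e6, 125, 90_591_190)  # L. officinalis (ATCC-64430), assembled size
print(f"C. volvatus ~{cv:.0f}x, L. officinalis ~{lo:.0f}x")
# C. volvatus ~210x, L. officinalis ~84x
```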
- Using a tBlastn search (Altschul et al., 1990, J. Mol. Biol. 215, 403-410) with the amino acid sequences of known sesquiterpene synthases as query sequences, 6 and 10 putative sesquiterpene synthase sequences were identified in the C. volvatus genome and L. officinalis genome, respectively. The sequences were manually corrected, in particular for the locations of the intron-exon junctions, using a mapping of the RNA sequencing reads on the genomic contigs. The corresponding cDNAs were then codon-optimized for optimal E. coli expression, synthesized and cloned in an expression plasmid (pJ401, ATUM, Newark, Calif.). Functional expression in E. coli cells and enzyme characterization assays showed sesquiterpene synthase activities but did not reveal any formation of albicanol from FPP.
- Drimane sesquiterpenes are presumably produced from farnesyl-diphosphate (FPP) by an enzymatic mechanism involving a protonation-initiated cyclization followed by an ionization-initiated reaction (Henquet et al., 2017, Plant J. Mar 4. doi: 10.1111/tpj.13527; Kwon, M. et al., 2014, FEBS Letters 588, 4597-4603) (FIG. 2). This implies that the drimane synthases are composed of two catalytic domains: a protonation-initiated cyclization catalytic domain and an ionization-initiated cyclization catalytic domain.
- Terpene synthases catalyzing protonation-initiated cyclization reactions are called class II (or type II) terpene synthases and are typically involved in the biosynthesis of triterpenes and labdane diterpenes. In class II terpene synthases, the protonation-initiated reaction involves acidic amino acids donating a proton to the terminal double bond. These residues, usually aspartic acids, are part of a conserved DxDD motif located in the active site of the enzyme.
- Terpene synthases catalyzing ionization-initiated reactions are called class I (or type I) terpene synthases, generally monoterpene and sesquiterpene synthases, and the catalytic center contains a conserved DDxxD (part of SEQ ID NO: 53) motif. The aspartic acid residues of this class I motif bind a divalent metal ion (most often Mg2+) involved in the binding of the diphosphate group and catalyze the ionization and cleavage of the allylic diphosphate bond of the substrate.
- The putative cyclization mechanism of farnesyl-diphosphate to a drimane sesquiterpene (such as albicanol or drimenol) starts with the protonation of the 10,11-double bond, followed by sequential rearrangements and carbon-bond formations. The carbocation intermediate of this first (class II) reaction can then undergo deprotonation at C15 or C4 (or eventually at C2), leading to an albicanyl-diphosphate or drimenyl-diphosphate intermediate. Finally, the class I catalytic domain catalyzes the ionization of the allylic diphosphate bond, and quenching of the carbocation intermediate by a water molecule leads to a drimane sesquiterpene containing a primary hydroxyl group (FIG. 2). If necessary, any traces of residual phosphorylated intermediates of the albicanol or drimenol synthesis, such as albicanyl- or drimenyl-monophosphate and/or -diphosphate, may be chemically converted to the respective final product albicanol or drimenol. Certain corresponding methods are known and may comprise, for example, the hydrolytic cleavage of the phosphoric acid ester bond. Additionally, certain intermediates can also be converted enzymatically as shown in Examples 7 and 8.
- Based on the above considerations, we searched the C. volvatus and L. officinalis genome and transcriptome data for sequences encoding polypeptides containing both a class I and a class II terpene synthase motif. Recently, a drimanyl-diphosphate synthase (AstC) was identified in the fungus Aspergillus oryzae (Shinohara Y. et al., 2016, Sci Rep. 6, 32865). The enzyme contains a class II terpene synthase domain and catalyzes the protonation-initiated cyclization of farnesyl-diphosphate to drimanyl-diphosphate. However, this enzyme does not have class I terpene synthase activity and thus does not catalyze the ionization and cleavage of the allylic diphosphate group. Using the sequence of AstC, we first searched the amino acid sequences deduced from the genes predicted in the C. volvatus genome. Using a Blastp search against these deduced amino acid sequences, 5 sequences were retrieved with E values between 0.77 and 3e-089 (Altschul et al., 1990, J. Mol. Biol. 215, 403-410).
- Amongst these 5 sequences, CvTps1 was selected as the most relevant candidate for a putative albicanol synthase. The amino acid sequence encoded by the CvTps1 gene shared 38% identity with the AstC amino acid sequence. Analysis of this sequence revealed the presence of a class II terpene synthase-like motif, DVDT, at position 275-279. This is a variant of the typical class II terpene synthase motif mentioned above, where the last Asp is replaced by a Thr. This DxDT class II motif is found in some class II diterpene synthases (Xu M. et al., 2014, J. Nat. Prod. 77, 2144-2147; Morrone D. et al., 2009, FEBS Lett., 583, 475-480) and in AstC. Another interesting feature of the CvTps1 sequence is the presence of a typical class I motif in the N-terminal region (DDKLD at position 168-172). The presence of this class I motif, which is absent from AstC, suggests that CvTps1 can catalyze an ionization-initiated reaction in addition to the class II reaction. Another difference from AstC is the presence of a C-terminal extension: the CvTps1 peptide contains 46 additional amino acids at the C-terminal end. CvTps1 was thus selected as a putative candidate for a bifunctional albicanol synthase.
- Protein family databases such as Pfam and Interpro (European Bioinformatics Institute, EMBL-EBI) are databases of protein families including functional annotation, protein domains and protein domain signatures. The amino acid sequence of CvTps1 was searched for the occurrence of motifs characteristic of protein domains using the HMMER algorithm available on the HMMER website (Finn R. D., 2015, Nucleic Acids Research Web Server Issue 43:W30-W38; www.ebi.ac.uk/Tools/hmmer/). No domain associated with classical terpene synthases was found in the CvTps1 amino acid sequence. The query identified a domain characteristic of the haloacid dehalogenase (HAD)-like hydrolase protein superfamily (PF13419.5) in the region between residues 115 and 187. A similar search using the Interpro protein family database (see the ebi.ac.uk/interpro/ web site) and the Conserved Domain Database (NCBI web site at ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) provided the same results: only the prediction of a domain of the HAD-like hydrolase superfamily in the N-terminal region (IPR023214 and CL21460, respectively). The HAD-like hydrolase superfamily contains a large number of proteins with various functions, including enzymes with phosphatase activity (Koonin and Tatusov, 1994, J. Mol. Biol. 244, 125-132; Kuznetsova et al., 2015, J. Biol. Chem. 290(30), 18678-18698). The class I terpene synthase-like motif identified above in the CvTps1 polypeptide contains one of the HAD-like hydrolase motif signatures, which includes a conserved aspartic acid residue involved in the catalytic (phosphatase) activity. This analysis thus confirms that the N-terminal region of CvTps1 is involved in hydrolysis of the diphosphate group (class I terpene synthase activity).
- No significant domain prediction was obtained in the C-terminal portion of the polypeptide. Given the presence of a class II terpene synthase-like motif, the C-terminal part is likely involved in the protonation-initiated cyclization.
- The CvTps1 amino acid sequence was used to search for homologous sequences in the L. officinalis genome and transcriptome. For this search the tBlastn algorithm was used (Altschul et al., 1990, J. Mol. Biol. 215, 403-410). One transcript, LoTps1, showed sequence similarity with CvTps1: the length of the sequence (521 amino acids) was similar to the length of the CvTps1 amino acid sequence, the overall sequence identity between the two sequences was 71%, the N-terminal region contained a typical class I terpene synthase motif (DDKLD at position 162-166), a class II terpene synthase motif (DMDT) was found at position 267-270, and the N-terminal region contained a predicted HAD-like hydrolase domain.
- The CvTps1 and LoTps1 coding sequences were checked, and the intron-exon junction predictions were refined by mapping the RNA sequencing reads against the genomic contigs. The coding sequences of the resulting cDNAs were codon optimized and cloned in the pJ401 E. coli expression plasmid (pJ401, ATUM, Newark, Calif.).
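Codon optimization changes the nucleotide sequence but must leave the encoded protein unchanged. A simple way to check this is to translate both versions with the standard genetic code and compare. The sketch below does this for the first 54 nucleotides of the native CvTps1 cDNA (SEQ ID NO: 3) and of the E. coli-optimized cDNA (SEQ ID NO: 4), copied from the sequence listing; both give the first 18 residues of SEQ ID NO: 1. The code table and function names are illustrative, not taken from the source.

```python
# Standard genetic code (NCBI translation table 1), indexed in TCAG order:
# first base selects a block of 16, second base a block of 4, third base the entry.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; stop codons are shown as '*'."""
    seq = "".join(cds.split()).upper()  # drop whitespace, e.g. line-wrap spaces
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# First 54 nt of SEQ ID NO: 3 (native) and SEQ ID NO: 4 (codon optimized).
NATIVE_5P = "ATGACCACGATTCACCGTCGGCACACCACTCTCATCTTGGACCTCGGCGACGTC"
OPTIMIZED_5P = "ATGACTACGATCCACCGCCGCCATACTACGCTGATCCTGGACCTGGGTGATGTT"

print(translate(NATIVE_5P))     # MTTIHRRHTTLILDLGDV
print(translate(OPTIMIZED_5P))  # MTTIHRRHTTLILDLGDV
```

The same check applies to the yeast-optimized sequences described later; applied to full-length sequences, it should reproduce the corresponding protein sequence.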
- The enzymes were functionally characterized in E. coli cells engineered to overproduce farnesyl-diphosphate (FPP). Competent E. coli cells were transformed with the plasmid pACYC-29258-4506 (described in WO2013064411 or in Schalk et al., 2013, J. Am. Chem. Soc. 134, 18900-18903) and with the pJ401-CvTps1 or pJ401-LoTps1 expression plasmid. The pACYC-29258-4506 plasmid carries the cDNA encoding an FPP synthase and the genes for a complete mevalonate pathway. KRX E. coli cells (Promega) were used as the host. Transformed cells were selected on LB-agarose plates containing kanamycin (50 μg/ml) and chloramphenicol (34 μg/ml). Single colonies were used to inoculate 5 mL of liquid LB medium supplemented with the same antibiotics, and the cultures were incubated overnight at 37° C. The next day, 2 mL of TB medium supplemented with the same antibiotics were inoculated with 0.2 mL of the overnight culture. After 6 hours of incubation at 37° C., the cultures were cooled to 28° C., and 0.1 mM IPTG, 0.2% rhamnose and 10% by volume (0.2 ml) of dodecane were added to each tube. The cultures were incubated for 48 hours at 28° C. and then extracted twice with 2 volumes of tert-butyl methyl ether (MTBE); the organic phases were concentrated to 500 μL and analyzed by GC-MS.
- The GC-MS analyses were performed using an Agilent 6890 Series GC system connected to an Agilent 5975 mass detector. The GC was equipped with a 30 m × 0.25 mm inner diameter DB-1MS capillary column (Agilent). The carrier gas was He at a constant flow of 1 mL/min. The inlet temperature was set at 250° C. The initial oven temperature was 80° C., followed by a gradient of 10° C./min to 220° C. and a second gradient of 30° C./min to 280° C. Products were identified by comparing their mass spectra and retention indices with those of authentic standards and internal mass spectra databases.
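The oven program above is a sequence of linear ramps. The sketch below computes the ramp breakpoints and interpolates the oven temperature at a given time, which is useful when relating retention times to elution temperatures. It assumes there are no isothermal holds, because none are stated; if the actual method includes holds, the times shift accordingly. Under that assumption the ramps take (220 − 80)/10 = 14 min plus (280 − 220)/30 = 2 min, or 16 min in total.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (time in min, oven temperature in °C)


def oven_program(initial_c: float, ramps: List[Tuple[float, float]]) -> List[Point]:
    """Breakpoints of a linear-ramp oven program, assuming no isothermal holds.

    ramps: list of (rate in °C/min, target temperature in °C).
    """
    t, temp = 0.0, float(initial_c)
    points: List[Point] = [(t, temp)]
    for rate, target in ramps:
        t += (target - temp) / rate  # time needed to reach the target at this rate
        temp = float(target)
        points.append((t, temp))
    return points


def temperature_at(points: List[Point], t_min: float) -> float:
    """Linearly interpolate the oven temperature at time t_min (min)."""
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if t0 <= t_min <= t1:
            return c0 + (c1 - c0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time lies outside the programmed run")


program = oven_program(80, [(10, 220), (30, 280)])
print(program)  # [(0.0, 80.0), (14.0, 220.0), (16.0, 280.0)]
```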
- Under these conditions, formation of a single product was observed with the recombinant CrVo07609 protein. The final concentration of this enzyme product was 200 mg/l of culture medium. The gas chromatography retention time and the mass spectrum were in accordance with the GC-MS data of an authentic (+)-albicanol standard. For structure confirmation, the recombinant cells were cultivated in a larger volume (500 ml) under the conditions described above. The MTBE was distilled from the extract, and the resulting suspension in dodecane was subjected to flash chromatography. The product was eluted with a 1:5 mixture of MTBE and cyclohexane. The solvent was removed by distillation, providing the product with 98% purity. The structure of albicanol was confirmed by 1H- and 13C-NMR analysis. The optical rotation was measured using a Bruker Avance 500 MHz spectrometer. The value of [α]D20 = +3.8° (0.26%, CHCl3) confirmed the formation of (+)-albicanol (with the structure shown in FIG. 1) by the recombinant CvTps1 protein.
- The activity of LoTps1 was evaluated under the same conditions. The product profile was identical to that of CvTps1, with (+)-albicanol as the only detected product of the recombinant LoTps1 enzyme.
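The specific rotation reported above relates to the measured rotation through [α] = α / (l · c), with the path length l in dm and the concentration c in g/mL. The sketch below applies this relation in both directions. It assumes "0.26%" means c = 0.26 g/100 mL (the usual notation) and a 1 dm cell; the cell length is not stated in the text, so the resulting observed rotation (about +0.0099°) is illustrative only.

```python
def specific_rotation(observed_deg: float, path_dm: float, conc_g_per_100ml: float) -> float:
    """[alpha] = alpha_obs / (l * c), with l in dm and c converted to g/mL."""
    conc_g_per_ml = conc_g_per_100ml / 100.0
    return observed_deg / (path_dm * conc_g_per_ml)


def observed_rotation(specific_deg: float, path_dm: float, conc_g_per_100ml: float) -> float:
    """Inverse relation: alpha_obs = [alpha] * l * c (c in g/mL)."""
    return specific_deg * path_dm * conc_g_per_100ml / 100.0


# Assumed: 1 dm cell, c = 0.26 g/100 mL in CHCl3, [alpha] = +3.8 degrees.
alpha_obs = observed_rotation(3.8, path_dm=1.0, conc_g_per_100ml=0.26)
print(round(alpha_obs, 5))  # 0.00988
```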
- These experiments show that CvTps1 and LoTps1 are bifunctional enzymes with class II cyclase activity and class I phosphatase activity.
- The amino acid sequences of CvTps1 and LoTps1 were used to search public databases for homologous sequences from other organisms. A blastp search (Altschul et al., 1990, J. Mol. Biol. 215, 403-410) was first performed against the protein database of the National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov/) to find sequences showing homology with CvTps1 and LoTps1. The retrieved amino acid sequences were then analyzed for the presence of the CvTps1 and LoTps1 features described in Example 3. Fifteen sequences, all from fungal species, were selected for further analysis and enzymatic activity characterization: NCBI accession OCH93767.1 from Obba rivulosa, NCBI accession EMD37666.1 from Gelatoporia subvermispora, NCBI accession XP_001217376.1 from Aspergillus terreus, NCBI accession OJJ98394.1 from Aspergillus aculeatus, NCBI accession GAO87501.1 from Aspergillus udagawae, NCBI accession XP_008034151.1 from Trametes versicolor, NCBI accession XP_007369631.1 from Dichomitus squalens, NCBI accession KIA75676.1 from Aspergillus ustus, NCBI accession XP_001820867.2 from Aspergillus oryzae, NCBI accession CEN60542.1 from Aspergillus calidoustus, NCBI accession XP_009547469.1 from Heterobasidion irregulare, NCBI accession KLO09124.1 from Schizopora paradoxa, and NCBI accession OJI95797.1 from Aspergillus versicolor.
- The sequence of EMD37666.1 was corrected by deleting amino acids 261 to 266, which are present in the published sequence and probably result from an incorrect splicing prediction (sequence EMD37666-B in Table 1). Another sequence, ACg006372, was selected from the published annotated sequence of Antrodia cinnamomea (Lu et al., 2014, Proc. Natl. Acad. Sci. USA 111(44):E4743-52, Dataset S1).
- The amino acid sequences of the 15 putative terpene synthases contain a class II terpene synthase-like motif with the consensus sequence D(V/M/L/F)D(T/S) as well as a class I terpene synthase-like motif with the consensus sequence DD(K/N/Q/R/S)xD (where x is a hydrophobic residue L, I, G, T or P). The class I and class II motifs are easily located by aligning the amino acid sequences with the sequences of CvTps1 and LoTps1 (FIG. 6). Such an alignment can be made using, for example, the program Clustal W (Thompson J. D. et al., 1994, Nucleic Acids Res. 22(22), 4673-80). In addition, a HAD-like hydrolase domain was identified in the N-terminal region of the 15 amino acid sequences (between position 1 and positions 183 to 243 of the sequences) (Table 3).
- The features of the above sequences thus suggest that the proteins contain a phosphatase (class I terpene synthase) domain in the N-terminal region and a class II terpene synthase domain in the C-terminal region, and thus have bifunctional protonation-initiated cyclization and ionization-initiated catalytic activities. Alignment and pairwise comparisons (Table 2) of the above amino acid sequences showed a lowest sequence identity of 37% and a highest of 89% (not counting the comparison between the two EMD37666.1 variants). Compared to CvTps1 and LoTps1, the closest sequences shared 85% identity and the most distant sequence only 42% identity.
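The two consensus motifs can be written as regular expressions to screen a sequence quickly. The sketch below is not the alignment-based procedure described above; it is a simple pattern search that can return false positives elsewhere in a protein, so it complements rather than replaces the alignment. One deliberate deviation: although the consensus above ends the class I motif with D, several Table 1 entries end with E (for example DDKLE and DDSPE), so the sketch accepts D or E at that position. Positions are reported 1-based and inclusive, matching the tables.

```python
import re
from typing import List, Tuple

# Class I: DD(K/N/Q/R/S)x(D/E), x in {L, I, G, T, P}; D or E accepted at the last
# position because several Table 1 entries end in E.
CLASS_I = re.compile(r"DD[KNQRS][LIGTP][DE]")
# Class II: D(V/M/L/F)D(T/S)
CLASS_II = re.compile(r"D[VMLF]D[TS]")


def find_motifs(protein: str, pattern: "re.Pattern") -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for each non-overlapping match, 1-based inclusive."""
    seq = "".join(protein.split()).upper()  # tolerate line-wrap spaces in listings
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(seq)]


# Motif regions copied from Table 1 (XP_001217376.1 class I, KLO09124.1 class II).
print(find_motifs("MFIDDKLENVI", CLASS_I))  # [(4, 8, 'DDKLE')]
print(find_motifs("FPCDLDSTS", CLASS_II))   # [(4, 7, 'DLDS')]
```

Run on the first 322 residues of CvTps1 (SEQ ID NO: 1), this sketch finds a single class I match at 168-172 (DDKLD) and a single class II match at 273-276 (DVDT), consistent with Table 1.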
TABLE 1 List of selected sequences showing sequence homology with CvTps1 and LoTps1 and containing class I and class II motifs. The source (species) of each sequence, SEQ ID NO, sequence length, the sequence regions containing the class I and class II motifs, and the positions of the class I and class II motifs are listed. The residues of the class I and class II motifs are shown in square brackets.

| Name or NCBI accession number | Source | SEQ ID NO | Length (amino acids) | Putative function (database annotation) | Class I motif region | Class I motif position | Class II motif region | Class II motif position |
|---|---|---|---|---|---|---|---|---|
| CvTps1 | Cryptoporus volvatus | 1 | 525 | | VFV[DDKLD]NVA | 168-172 | FPD[DVDT]TS | 273-276 |
| LoTps1 | Laricifomes officinalis | 5 | 521 | | VFV[DDKLD]NVV | 162-166 | FPD[DMDT]TS | 267-270 |
| OCH93767.1 | Obba rivulosa | 9 | 527 | HAD-like protein | VFV[DDKID]NVL | 166-170 | FPD[DLDT]TS | 271-274 |
| EMD37666.1 | Gelatoporia subvermispora | 12 | 533 | hypothetical protein | VFV[DDKID]NVL | 166-170 | FPD[DLDT]TS | 277-280 |
| EMD37666-B | Gelatoporia subvermispora | 15 | 528 | hypothetical protein | VFV[DDKID]NVL | 166-170 | FPD[DLDT]TS | 271-274 |
| XP_001217376.1 | Aspergillus terreus | 17 | 486 | predicted protein | MFI[DDKLE]NVI | 161-165 | FPD[DMDT]TS | 267-270 |
| OJJ98394.1 | Aspergillus aculeatus | 20 | 483 | hypothetical protein | VFV[DDKTE]NVL | 162-166 | FPN[DLDT]TS | 268-271 |
| GAO87501.1 | Aspergillus udagawae | 23 | 485 | alpha-D-glucose-1-phosphate phosphatase YihX | IFI[DDQLE]NVV | 167-171 | FPD[DVDT]TS | 273-276 |
| XP_008034151.1 | Trametes versicolor | 26 | 524 | HAD-like protein | VFV[DDKLD]NVV | 168-172 | FPD[DVDT]TS | 273-276 |
| XP_007369631.1 | Dichomitus squalens | 29 | 527 | HAD-like protein | VFV[DDKLD]NVA | 168-172 | FPD[DVDT]TS | 273-276 |
| ACg006372 | Antrodia cinnamomea | 32 | 496 | HAD-like protein | VFV[DDRIE]NVV | 179-183 | YPD[DFDT]TS | 286-289 |
| KIA75676.1 | Aspergillus ustus | 35 | 543 | hypothetical protein | VFV[DDNLE]NVTS | 161-165 | FPD[DMDT]TS | 267-270 |
| XP_001820867.2 | Aspergillus oryzae | 38 | 477 | hypothetical protein | IFV[DDQLE]NVIS | 167-171 | FPD[DVDT]TS | 273-276 |
| CEN60542.1 | Aspergillus calidoustus | 41 | 528 | hypothetical protein | VFV[DDNLD]NVT | 161-165 | FPD[DLDT]TS | 267-270 |
| XP_009547469.1 | Heterobasidion irregulare | 44 | 531 | hypothetical protein | VFV[DDKGD]NVL | 166-170 | FPF[DLDT]TS | 272-275 |
| KLO09124.1 | Schizopora paradoxa | 47 | 518 | HAD-like protein | VFV[DDKLD]NVI | 209-213 | FPC[DLDS]TS | 315-318 |
| OJI95797.1 | Aspergillus versicolor | 50 | 507 | hypothetical protein | VFI[DDSPE]NIL | 163-167 | FPN[DLDT]TS | 269-272 |
TABLE 2 Pairwise sequence comparison of the selected putative bifunctional terpene synthases. The percentage of sequence identity is listed for each pairwise comparison.

| | CvTps1 | LoTps1 | OCH93767.1 | EMD37666.1 | EMD37666-B | XP_001217376.1 | OJJ98394.1 | GAO87501.1 | XP_008034151.1 | XP_007369631.1 | ACg006372 | KIA75676.1 | XP_001820867.2 | CEN60542.1 | XP_009547469.1 | KLO09124.1 | OJI95797.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CvTps1 | 100 | 71 | 60 | 60 | 60 | 42 | 47 | 46 | 72 | 84 | 45 | 44 | 45 | 44 | 55 | 52 | 45 |
| LoTps1 | 72 | 100 | 60 | 58 | 59 | 43 | 48 | 45 | 85 | 74 | 48 | 43 | 44 | 45 | 55 | 53 | 43 |
| OCH93767.1 | 61 | 60 | 100 | 88 | 89 | 43 | 47 | 47 | 62 | 62 | 48 | 46 | 44 | 45 | 54 | 53 | 45 |
| EMD37666.1 | 60 | 58 | 88 | 100 | 99 | 43 | 47 | 47 | 60 | 61 | 46 | 45 | 43 | 46 | 57 | 52 | 45 |
| EMD37666-B | 60 | 59 | 89 | 99 | 100 | 43 | 47 | 47 | 61 | 62 | 47 | 46 | 44 | 46 | 58 | 52 | 46 |
| XP_001217376.1 | 42 | 43 | 43 | 43 | 43 | 100 | 54 | 42 | 43 | 44 | 37 | 45 | 41 | 44 | 43 | 39 | 55 |
| OJJ98394.1 | 47 | 48 | 47 | 47 | 47 | 54 | 100 | 44 | 47 | 48 | 41 | 48 | 44 | 49 | 45 | 44 | 56 |
| GAO87501.1 | 46 | 45 | 47 | 47 | 47 | 42 | 44 | 100 | 46 | 47 | 45 | 46 | 69 | 43 | 47 | 45 | 43 |
| XP_008034151.1 | 73 | 85 | 62 | 60 | 61 | 43 | 47 | 46 | 100 | 77 | 49 | 44 | 45 | 45 | 55 | 52 | 44 |
| XP_007369631.1 | 84 | 74 | 61 | 60 | 61 | 44 | 48 | 48 | 77 | 100 | 48 | 45 | 46 | 46 | 56 | 52 | 46 |
| ACg006372 | 45 | 48 | 47 | 46 | 47 | 37 | 41 | 45 | 48 | 48 | 100 | 42 | 44 | 40 | 49 | 55 | 39 |
| KIA75676.1 | 44 | 43 | 46 | 45 | 46 | 45 | 48 | 46 | 44 | 43 | 42 | 100 | 46 | 72 | 44 | 45 | 45 |
| XP_001820867.2 | 45 | 44 | 44 | 43 | 44 | 41 | 44 | 69 | 45 | 46 | 44 | 47 | 100 | 47 | 46 | 42 | 41 |
| CEN60542.1 | 44 | 45 | 45 | 46 | 46 | 44 | 49 | 43 | 45 | 45 | 40 | 72 | 47 | 100 | 48 | 43 | 45 |
| XP_009547469.1 | 54 | 55 | 54 | 53 | 54 | 43 | 45 | 47 | 55 | 55 | 49 | 44 | 46 | 48 | 100 | 54 | 47 |
| KLO09124.1 | 51 | 53 | 53 | 51 | 52 | 39 | 44 | 45 | 51 | 51 | 55 | 45 | 42 | 44 | 54 | 100 | 44 |
| OJI95797.1 | 45 | 43 | 45 | 45 | 46 | 55 | 56 | 43 | 44 | 46 | 39 | 45 | 41 | 45 | 47 | 44 | 100 |

- The cDNAs encoding the 15 new putative synthases described in Example 5 were codon optimized and cloned in the pJ401 E.
coli expression plasmid (pJ401, ATUM, Newark, Calif.). The enzymes were functionally characterized in E. coli cells engineered to overproduce farnesyl-diphosphate (FPP) following the procedure described in Example 4. Of the 15 new recombinant enzymes, 9 produced (+)-albicanol as the major product: OCH93767.1, EMD37666.1, EMD37666-B, XP_001217376.1, OJJ98394.1, GAO87501.1, XP_008034151.1, XP_007369631.1 and ACg006372 (FIGS. 7 and 8). These results confirm that these enzymes have bifunctional albicanol synthase activity.
- The 6 other new synthases, KIA75676.1, XP_001820867.2, CEN60542.1, XP_009547469.1, KLO09124.1 and OJI95797.1, produced (−)-drimenol as the major product (FIG. 9). Drimenol is produced by a mechanism similar to that forming albicanol, involving a class II followed by a class I enzymatic activity.
- For XP_001820867.2, formation of a significant amount of trans-farnesol was detected (FIG. 9). This was likely due to a lower enzymatic activity of this synthase, so that a significant amount of the farnesyl-diphosphate produced in the bacterial cells was not converted to drimenol. This excess farnesyl-diphosphate was hydrolyzed by the endogenous alkaline phosphatase, and the resulting trans-farnesol was released into the growth medium.
- The two Pfam domains identified in CvTps1, i.e. PF13419.5 and PF13242.5 as described in Example 3, are also found in these new putative synthases, as shown in Table 3.
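The proposed domain organization (phosphatase/class I activity in the N-terminal HAD-like domain, class II cyclase downstream) can be cross-checked against the numbers in the tables. The sketch below takes the HAD-like domain boundaries from Table 3 and the motif start positions from Table 1, and tests two conditions for each enzyme: the 5-residue class I motif lies inside the HAD-like domain, and the class II motif starts after the end of that domain. This is only a consistency check on the tabulated positions, not a structural prediction.

```python
from typing import Dict, Tuple

# name: (HAD-like domain start, HAD-like domain end, class I motif start, class II motif start)
# Domain bounds from Table 3; motif start positions from Table 1.
POSITIONS: Dict[str, Tuple[int, int, int, int]] = {
    "CvTps1": (115, 187, 168, 273),
    "LoTps1": (62, 181, 162, 267),
    "OCH93767.1": (51, 185, 166, 271),
    "EMD37666.1": (54, 185, 166, 277),
    "EMD37666-B": (54, 185, 166, 271),
    "XP_001217376.1": (25, 181, 161, 267),
    "OJJ98394.1": (25, 181, 162, 268),
    "GAO87501.1": (34, 186, 167, 273),
    "XP_008034151.1": (60, 187, 168, 273),
    "XP_007369631.1": (120, 187, 168, 273),
    "ACg006372": (60, 198, 179, 286),
    "KIA75676.1": (43, 180, 161, 267),
    "XP_001820867.2": (12, 186, 167, 273),
    "CEN60542.1": (20, 180, 161, 267),
    "XP_009547469.1": (77, 185, 166, 272),
    "KLO09124.1": (119, 228, 209, 315),
    "OJI95797.1": (48, 180, 163, 269),
}


def check_layout(had_start: int, had_end: int, class_i_start: int, class_ii_start: int,
                 class_i_len: int = 5) -> Tuple[bool, bool]:
    """Return (class I motif inside the HAD-like domain, class II motif downstream of it)."""
    class_i_end = class_i_start + class_i_len - 1
    inside = had_start <= class_i_start and class_i_end <= had_end
    downstream = class_ii_start > had_end
    return inside, downstream


for name, pos in POSITIONS.items():
    inside, downstream = check_layout(*pos)
    print(f"{name:15s} class I in HAD domain: {inside}, class II downstream: {downstream}")
```

For all 17 entries, both conditions hold with the tabulated values.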
TABLE 3 Locations of the haloacid dehalogenase (HAD)-like hydrolase domain in each of the bifunctional synthases described herein.

| Enzyme | Length | Product | HAD-like hydrolase domain start | HAD-like hydrolase domain end |
|---|---|---|---|---|
| CvTps1 | 525 | Albicanol | 115 | 187 |
| LoTps1 | 521 | Albicanol | 62 | 181 |
| OCH93767.1 | 527 | Albicanol | 51 | 185 |
| EMD37666.1 | 533 | Albicanol | 54 | 185 |
| EMD37666-B | 528 | Albicanol | 54 | 185 |
| XP_001217376.1 | 486 | Albicanol | 25 | 181 |
| OJJ98394.1 | 483 | Albicanol | 25 | 181 |
| GAO87501.1 | 485 | Albicanol | 34 | 186 |
| XP_008034151.1 | 524 | Albicanol | 60 | 187 |
| XP_007369631.1 | 527 | Albicanol | 120 | 187 |
| ACg006372 | 496 | Albicanol | 60 | 198 |
| KIA75676.1 | 543 | Drimenol | 43 | 180 |
| XP_001820867.2 | 477 | Drimenol | 12 | 186 |
| CEN60542.1 | 528 | Drimenol | 20 | 180 |
| XP_009547469.1 | 531 | Drimenol | 77 | 185 |
| KLO09124.1 | 518 | Drimenol | 119 | 228 |
| OJI95797.1 | 507 | Drimenol | 48 | 180 |

- In-vitro assays.
- Crude protein extracts containing the recombinant terpene synthases are prepared using KRX E. coli cells (Promega) or BL21 Star™ (DE3) E. coli cells (ThermoFisher). Single colonies of cells transformed with the expression plasmid are used to inoculate 5 ml of LB medium. After 5 to 6 hours of incubation at 37° C., the cultures are transferred to a 25° C. incubator and left for 1 hour to equilibrate. Expression of the protein is then induced by adding 1 mM IPTG, and the cultures are incubated overnight at 25° C. The next day, the cells are collected by centrifugation, resuspended in 0.1 volume of 50 mM MOPSO (3-morpholino-2-hydroxypropanesulfonic acid, Sigma-Aldrich) pH 7 containing 10% glycerol, and lysed by sonication. The extracts are cleared by centrifugation (30 min at 20,000 g), and the supernatants containing the soluble proteins are used for further experiments.
- These crude E. coli protein extracts containing the recombinant protein are used to characterize the enzymatic activities. The assays are performed in glass tubes in 2 mL of 50 mM MOPSO pH 7, 10% glycerol, 1 mM DTT and 15 mM MgCl2, in the presence of 80 μM farnesyl-diphosphate (FPP, Sigma) and 0.1 to 0.5 mg of crude protein. The tubes are incubated for 12 to 24 hours at 25° C. and extracted twice with one volume of pentane. After concentration under a stream of nitrogen, the extracts are analyzed by GC-MS as described in Example 4 and compared to extracts from assays with control proteins. The aqueous phase is then treated with alkaline phosphatase (Sigma, 6 units/ml), followed by extraction with pentane and GC-MS analysis.
- The assays without alkaline phosphatase treatment allow detection and identification of the sesquiterpene compounds (hydrocarbons and oxygenated sesquiterpenes) produced by the recombinant enzymes and present in the assay. Albicanyl-diphosphate and drimenyl-diphosphate are not soluble in the organic solvent and are thus not detected in the GC-MS analysis. Alkaline phosphatase treatment cleaves the diphosphate ester bonds; when albicanyl-diphosphate or drimenyl-diphosphate is present, the sesquiterpene moiety is released, extracted into the solvent phase and detected in the GC-MS analysis. This assay thus makes it possible to distinguish enzymes having only class II terpene synthase activity (such as AstC, NCBI accession XP_001822013.2, Shinohara Y. et al., 2016, Sci Rep. 6, 32865) from enzymes having both class II terpene synthase-like activity and class I (phosphatase) activity, such as CvTps1 and LoTps1.
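The interpretation of the two GC-MS readouts can be written as a small decision rule. In the sketch below, the boolean inputs are hypothetical readouts, assumed to have already been compared against the control-protein assays as the text requires: a free drimane alcohol detected without phosphatase treatment indicates class II plus class I activity, while a product that appears only after alkaline phosphatase (AP) treatment indicates a class II enzyme releasing the diphosphate. The enum and function names are illustrative.

```python
from enum import Enum


class Activity(Enum):
    BIFUNCTIONAL = "class II cyclase + class I phosphatase"
    CLASS_II_ONLY = "class II cyclase only (diphosphate product)"
    NOT_DETECTED = "no drimane product detected"


def classify(free_alcohol_without_ap: bool, alcohol_after_ap: bool) -> Activity:
    """Interpret the in-vitro assay readouts (assumed already control-corrected).

    free_alcohol_without_ap: drimane alcohol (e.g. albicanol or drimenol) found in the
        pentane extract before alkaline phosphatase (AP) treatment.
    alcohol_after_ap: drimane alcohol released only after AP treatment of the aqueous phase.
    """
    if free_alcohol_without_ap:
        return Activity.BIFUNCTIONAL  # enzyme itself removed the diphosphate
    if alcohol_after_ap:
        return Activity.CLASS_II_ONLY  # diphosphate product needed AP to be released
    return Activity.NOT_DETECTED


print(classify(True, False).name)  # BIFUNCTIONAL
print(classify(False, True).name)  # CLASS_II_ONLY
```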
- Shinohara Y. et al., 2016, Sci Rep. 6, 32865 describes a drimane terpene synthase (AstC, NCBI accession XP_001822013.2). This synthase produces a drimane sesquiterpene bound to a diphosphate moiety. To produce a free drimane sesquiterpene, the AstC enzyme must be combined with enzymes having phosphatase activity. The publication also describes two phosphatases, AstI and AstK (XP_001822007.1 and XP_003189903.1), which catalyze the sequential cleavage of the phosphate groups of the drimane diphosphate produced by AstC.
- Synthetic operons were designed to co-express the CvTps1 protein with the AstI and AstK proteins. Each synthetic operon contains the optimized cDNAs encoding the 3 proteins, separated by ribosome binding sequences (RBS). A similar operon was designed to co-express AstC with AstI and AstK. The operons were synthesized and cloned in the pJ401 expression plasmid (ATUM, Newark, Calif.). E. coli cells were co-transformed with these expression plasmids and with the pACYC-29258-4506 plasmid (Example 4), and the cells were cultivated under conditions for sesquiterpene production as described in Example 4. The sesquiterpenes produced were analyzed by GC-MS as described in Example 4 and compared to the sesquiterpene profile of cells expressing only CvTps1 or AstC.
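The operon layout described above (coding sequences placed one after another, each preceded by a ribosome binding sequence) can be sketched as simple string assembly. The text does not give the RBS or spacer sequences, so `RBS` and `SPACER` below are hypothetical placeholders (AGGAGG is a common Shine-Dalgarno-like consensus), and placing an RBS before every coding sequence, including the first, is an assumption. The toy coding sequences are likewise placeholders, not the real cDNAs.

```python
from typing import List

# Hypothetical placeholders: the source does not state the RBS or spacer sequences used.
RBS = "AGGAGG"
SPACER = "AAACAT"  # hypothetical spacing between the RBS and the start codon

STOP_CODONS = ("TAA", "TAG", "TGA")


def build_operon(cds_list: List[str], rbs: str = RBS, spacer: str = SPACER) -> str:
    """Concatenate coding sequences in order, each preceded by an RBS and a spacer."""
    parts = []
    for cds in cds_list:
        cds = cds.upper()
        if not cds.startswith("ATG"):
            raise ValueError("each CDS is expected to start with ATG")
        if len(cds) % 3 or cds[-3:] not in STOP_CODONS:
            raise ValueError("each CDS is expected to be in frame and end with a stop codon")
        parts.append(rbs + spacer + cds)
    return "".join(parts)


# Toy stand-ins for the synthase, AstI and AstK coding sequences.
operon = build_operon(["ATGAAATAA", "ATGCCCTGA", "ATGGGGTAG"])
print(len(operon))  # 63
```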
- As shown in FIG. 10, with AstC a significantly higher amount (78-fold increase) of sesquiterpene is produced when the enzyme is co-expressed with enzymes having phosphatase activity (AstI and AstK). Typical concentrations of drimane sesquiterpene in the E. coli cultures were 2,600 mg/ml with cells expressing AstC, AstI and AstK, and 34 mg/ml with cells expressing AstC alone.
- In contrast, with CvTps1 no significant difference is observed in the amount of drimane sesquiterpene produced when the enzyme is expressed alone (1,000 mg/ml) or co-expressed with the phosphatases (1,200 mg/ml). This experiment confirms that the CvTps1 polypeptide, in contrast to the previously known AstC synthase, carries phosphatase activity in addition to the cyclase activity (i.e. class I and class II terpene synthase activity).
- The sequence with NCBI accession No XP_006461126.1 from Agaricus bisporus was selected using the method described in Example 5. The XP_006461126.1 amino acid sequence (SEQ ID NO: 63) shares 48.9% and 48.1% identity with the CvTps1 and LoTps1 amino acid sequences, respectively. XP_006461126.1 contains a class II terpene synthase-like motif (DLDT) (part of SEQ ID NO: 56) located between positions 278 and 271 and a class I terpene synthase-like motif (DDKLE) (part of SEQ ID NO: 55) located at positions 167 to 171. The amino acid sequence also contains motifs characteristic of the haloacid dehalogenase-like hydrolase superfamily in the N-terminal region.
- The cDNA encoding XP_006461126.1 was codon optimized and cloned in the pJ401 E. coli expression plasmid (pJ401, ATUM, Newark, Calif.). The enzyme was functionally characterized in E. coli cells engineered to overproduce farnesyl-diphosphate (FPP) following the procedure described in Example 4. The results show that XP_006461126.1 is a bifunctional drimenol synthase producing drimenol as the major compound (FIG. 11).
- In vivo drimane sesquiterpene production in Saccharomyces cerevisiae cells using fungal hydrolase-like bifunctional sesquiterpene synthases.
- Different hydrolase-like bifunctional sesquiterpene synthases were evaluated for the production of drimane sesquiterpenes in S. cerevisiae cells. The selected synthases were:
-
- XP_007369631.1, NCBI accession No XP_007369631.1, from Dichomitus squalens
- XP_006461126, NCBI accession No XP_006461126, from Agaricus bisporus
- LoTps1, SEQ ID NO: 5, from Laricifomes officinalis
- EMD37666.1, NCBI accession No EMD37666.1, from Gelatoporia subvermispora
- XP_001217376.1, NCBI accession No XP_001217376.1, from Aspergillus terreus
- The codon usage of the cDNA encoding for the different synthases was modified for optimal expression in S. cerevisiae (SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70).
- For expression of the different genes in S. cerevisiae, a set of plasmids was constructed in vivo using yeast endogenous homologous recombination as previously described in Kuijpers et al., Microb Cell Fact., 2013, 12:47. Each plasmid is composed of five DNA fragments, which were used for S. cerevisiae co-transformation. The fragments were:
-
- Fragment a: LEU2 yeast marker, constructed by PCR using the primers 5′-AGGTGCAGTTCGCGTGCAATTATAACGTCGTGGCAACTGTTATCAGTCGTACCGCGCCATTCGACTACGTCGTAAGGCC-3′ (SEQ ID NO: 71) and 5′-TCGTGGTCAAGGCGTGCAATTCTCAACACGAGAGTGATTCTTCGGCGTTGTTGCTGACCATCGACGGTCGAGGAGAACTT-3′ (SEQ ID NO: 72) with the plasmid pESC-LEU (Agilent Technologies, California, USA) as template;
- Fragment b: AmpR E. coli marker, constructed by PCR using the primers 5′-TGGTCAGCAACAACGCCGAAGAATCACTCTCGTGTTGAGAATTGCACGCCTTGACCACGACACGTTAAGGGATTTTGGTCATGAG-3′ (SEQ ID NO: 73) and 5′-AACGCGTACCCTAAGTACGGCACCACAGTGACTATGCAGTCCGCACTTTGCCAATGCCAAAAATGTGCGCGGAACCCCTA-3′ (SEQ ID NO: 74) with the plasmid pESC-URA as template;
- Fragment c: Yeast origin of replication, obtained by PCR using the primers 5′-TTGGCATTGGCAAAGTGCGGACTGCATAGTCACTGTGGTGCCGTACTTAGGGTACGCGTTCCTGAACGAAGCATCTGTGCTTCA-3′ (SEQ ID NO: 75) and 5′-CCGAGATGCCAAAGGATAGGTGCTATGTTGATGACTACGACACAGAACTGCGGGTGACATAATGATAGCATTGAAGGATGAGACT-3′ (SEQ ID NO: 76) with pESC-URA as template;
- Fragment d: E. coli replication origin, obtained by PCR using the primers 5′-ATGTCACCCGCAGTTCTGTGTCGTAGTCATCAACATAGCACCTATCCTTTGGCATCTCGGTGAGCAAAAGGCCAGCAAAAGG-3′ (SEQ ID NO: 77) and 5′-CTCAGATGTACGGTGATCGCCACCATGTGACGGAAGCTATCCTGACAGTGTAGCAAGTGCTGAGCGTCAGACCCCGTAGAA-3′ (SEQ ID NO: 78) with the plasmid pESC-URA as template and
- Fragment e: a fragment composed of the last 60 nucleotides of fragment “d”, 200 nucleotides downstream of the stop codon of the yeast gene PGK1, one of the tested hydrolase-like bifunctional sesquiterpene synthase coding sequences (codon optimized for expression in S. cerevisiae), the GAL1 promoter and 60 nucleotides corresponding to the beginning of fragment “a”. These fragments were obtained by DNA synthesis (ATUM, Newark, Calif.).
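In vivo assembly relies on neighboring fragments sharing homologous ends, and here those ends come from the 5′ tails of the PCR primers: the reverse primer of one fragment carries the reverse complement of the start of the forward primer of the next fragment. The sketch below checks this for the a/b, b/c and c/d junctions using the primer sequences given above (SEQ ID NOs 72 to 77, line-wrap spaces removed). The 60-nucleotide arm length is taken from the description of fragment e and is assumed to apply to the internal junctions too; with these primers, all three junctions match over 60 nucleotides.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primers from the fragment list above, keyed by SEQ ID NO.
PRIMERS: Dict[int, str] = {
    72: "TCGTGGTCAAGGCGTGCAATTCTCAACACGAGAGTGATTCTTCGGCGTTGTTGCTGACCATCGACGGTCGAGGAGAACTT",
    73: "TGGTCAGCAACAACGCCGAAGAATCACTCTCGTGTTGAGAATTGCACGCCTTGACCACGACACGTTAAGGGATTTTGGTCATGAG",
    74: "AACGCGTACCCTAAGTACGGCACCACAGTGACTATGCAGTCCGCACTTTGCCAATGCCAAAAATGTGCGCGGAACCCCTA",
    75: "TTGGCATTGGCAAAGTGCGGACTGCATAGTCACTGTGGTGCCGTACTTAGGGTACGCGTTCCTGAACGAAGCATCTGTGCTTCA",
    76: "CCGAGATGCCAAAGGATAGGTGCTATGTTGATGACTACGACACAGAACTGCGGGTGACATAATGATAGCATTGAAGGATGAGACT",
    77: "ATGTCACCCGCAGTTCTGTGTCGTAGTCATCAACATAGCACCTATCCTTTGGCATCTCGGTGAGCAAAAGGCCAGCAAAAGG",
}

# (junction, reverse primer of upstream fragment, forward primer of downstream fragment)
JUNCTIONS: List[Tuple[str, int, int]] = [("a/b", 72, 73), ("b/c", 74, 75), ("c/d", 76, 77)]


def arms_match(rev_primer: str, fwd_primer: str, arm_len: int = 60) -> bool:
    """True if the 5' tails of the two primers are reverse complements over arm_len nt,
    i.e. the end of the upstream PCR product equals the start of the downstream one."""
    return reverse_complement(rev_primer[:arm_len]) == fwd_primer[:arm_len].upper()


for name, rev_id, fwd_id in JUNCTIONS:
    print(name, arms_match(PRIMERS[rev_id], PRIMERS[fwd_id]))  # True for all three
```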
- To increase the endogenous farnesyl-diphosphate (FPP) pool in S. cerevisiae cells, an extra copy of each yeast endogenous gene involved in the mevalonate pathway, from ERG10 (coding for acetyl-CoA C-acetyltransferase) to ERG20 (coding for FPP synthetase), was integrated in the genome of the S. cerevisiae strain CEN.PK2-1C (Euroscarf, Frankfurt, Germany) under the control of galactose-inducible promoters, similarly to the approach described in Paddon et al., Nature, 2013, 496:528-532. Briefly, three cassettes were integrated at the LEU2, TRP1 and URA3 loci, respectively. The first cassette contained the genes ERG20 and a truncated HMG1 (tHMG1 as described in Donald et al., Proc Natl Acad Sci USA, 1997, 109:E111-8) under the control of the bidirectional GAL10/GAL1 promoter, and the genes ERG19 and ERG13, also under the control of a GAL10/GAL1 promoter; this cassette was flanked by two 100-nucleotide regions corresponding to the upstream and downstream sections of LEU2. In the second cassette, the genes IDI1 and tHMG1 were under the control of the GAL10/GAL1 promoter and the gene ERG13 under the control of the GAL7 promoter region; this cassette was flanked by two 100-nucleotide regions corresponding to the upstream and downstream sections of TRP1. The third cassette contained the genes ERG10, ERG12, tHMG1 and ERG8, all under the control of GAL10/GAL1 promoters, and was flanked by two 100-nucleotide regions corresponding to the upstream and downstream sections of URA3. All genes in the three cassettes included 200 nucleotides of their own terminator regions. In addition, an extra copy of GAL4 under the control of a mutated version of its own promoter, as described in Griggs and Johnston, Proc Natl Acad Sci USA, 1991, 88:8597-8601, was integrated upstream of the ERG9 promoter region. Finally, the endogenous promoter of ERG9 was replaced by the yeast promoter region of CTR3, generating the strain YST035.
Finally, YST035 was mated with the strain CEN.PK2-1D (Euroscarf, Frankfurt, Germany) obtaining a diploid strain termed YST045.
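The three integration cassettes can be summarized as a small data structure, which makes it easy to confirm that every mevalonate-pathway step is covered and to count the extra gene copies. The gene list in `PATHWAY` is the standard S. cerevisiae route from acetyl-CoA to FPP, with the HMG-CoA reductase step represented by tHMG1 as in the cassettes; the cassette contents are copied from the paragraph above. The data layout and function names are illustrative.

```python
from collections import Counter
from typing import Dict, List

# Integration locus -> genes carried by the cassette (from the text above).
CASSETTES: Dict[str, List[str]] = {
    "LEU2": ["ERG20", "tHMG1", "ERG19", "ERG13"],
    "TRP1": ["IDI1", "tHMG1", "ERG13"],
    "URA3": ["ERG10", "ERG12", "tHMG1", "ERG8"],
}

# Yeast mevalonate pathway, acetyl-CoA to FPP (HMG-CoA reductase supplied as tHMG1).
PATHWAY: List[str] = ["ERG10", "ERG13", "tHMG1", "ERG12", "ERG8", "ERG19", "IDI1", "ERG20"]


def copy_numbers(cassettes: Dict[str, List[str]]) -> Counter:
    """Number of extra copies of each gene across all cassettes."""
    return Counter(gene for genes in cassettes.values() for gene in genes)


def missing_steps(cassettes: Dict[str, List[str]], pathway: List[str] = PATHWAY) -> List[str]:
    """Pathway genes with no extra copy in any cassette."""
    counts = copy_numbers(cassettes)
    return [gene for gene in pathway if counts[gene] == 0]


print(missing_steps(CASSETTES))           # []
print(copy_numbers(CASSETTES)["tHMG1"])   # 3
```

The counts show that the design supplies three extra copies of tHMG1 and two of ERG13, and one extra copy of each of the other pathway genes.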
- YST045 was transformed with the fragments required for in vivo plasmid assembly. Yeast transformations were performed with the lithium acetate protocol as described in Gietz and Woods, Methods Enzymol., 2002, 350:87-96. Transformation mixtures were plated on SmLeu- medium containing 6.7 g/L Yeast Nitrogen Base without amino acids (BD Difco, New Jersey, USA), 1.6 g/L dropout supplement without leucine (Sigma Aldrich, Missouri, USA), 20 g/L glucose and 20 g/L agar. Plates were incubated for 3-4 days at 30° C. Individual colonies were used to produce drimane sesquiterpenes in tubes or shake flasks containing medium as described in Westfall et al., Proc Natl Acad Sci USA, 2012, 109:E111-118, with mineral oil (2705-01, J.T. Baker, Avantor Performance Materials, Inc., Center Valley, Pa., USA) as an organic overlay. Under these culture conditions, albicanol or drimenol was produced with all the hydrolase-like bifunctional sesquiterpene synthases tested. The drimane sesquiterpenes produced were identified by GC-MS analysis and quantified by GC-FID with an internal standard (see FIG. 12). The table below shows the quantities of drimane sesquiterpene produced relative to the quantity obtained with the synthase XP_007369631.1 (under these experimental conditions, cells expressing XP_007369631.1 produced 805 to 854 mg/L of drimane sesquiterpene, the highest quantity observed).
| Enzyme | Product | Relative quantity of drimane sesquiterpene produced |
|---|---|---|
| XP_007369631.1 | Albicanol | 100 |
| XP_006461126 | Drimenol | 39 |
| LoTps1 | Albicanol | 31 |
| EMD37666.1 | Albicanol | 23 |
| XP_001217376.1 | Albicanol | 3 |
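The table reports only relative values. Assuming they scale linearly against the reference range stated for XP_007369631.1 (805 to 854 mg/L, set to 100) and that this range applies to the same cultivation runs, approximate titers can be back-calculated as below. These are estimates derived from the two stated figures, not measured values reported in the text.

```python
from typing import Dict, Tuple

# Reference: XP_007369631.1 = 100, produced 805 to 854 mg/L (from the text).
REFERENCE_RANGE_MG_L: Tuple[float, float] = (805.0, 854.0)

# Relative quantities from the table above.
RELATIVE: Dict[str, int] = {
    "XP_007369631.1": 100,
    "XP_006461126": 39,
    "LoTps1": 31,
    "EMD37666.1": 23,
    "XP_001217376.1": 3,
}


def estimated_range(relative_pct: float,
                    ref: Tuple[float, float] = REFERENCE_RANGE_MG_L) -> Tuple[float, float]:
    """Approximate titer range in mg/L, assuming linear scaling to the reference range."""
    low, high = ref
    return low * relative_pct / 100.0, high * relative_pct / 100.0


for name, pct in RELATIVE.items():
    low, high = estimated_range(pct)
    print(f"{name}: about {low:.0f} to {high:.0f} mg/L")
```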
Sequence Listings CvTps1 - CvTps1 Protein SEQ ID NO: 1 MTTIHRRHTTLILDLGDVLFRWSPKTETAIPPRQLKEILTSVTWFEYERGQISQTECYERCAAEFKVDPLVIAEAFKQARES LRPNKAFIALIRELRHQMHGDLTVLALSNISLPDYEYIMSLSSDWATVFNRVFPSALVGERKPHLGCYRKVISEMSLEPQT TVFVDDKLDNVASARSLGMHGIVFDNEANVFRQLRNIFGNPVSRGQGYLRKHAGKLESSTDNGLTFEENFTQLIIYEVT QDRSLITLSECPRTWNFFRGQPLFSESFPDDVDTTSVALTVLQPDRALVDSILDQMLEYVDADGIMQTYFDSSRPRIDPF VCVNVLSLFYANGRGRELPHTLEWVYEVLLHRAYHGGSRYYLSPDCFLFFMSRLLKRANDSALQARFRPLFMERVKERV GAAGDSMDLAFRILAAATIGVHCPQDLERLAAAQCEDGGWDMCWFYAFGSTGIKAGNRGLTTALAVAAIRTALGRPP SPSPSNISSSSKLDAPNSFLGIPRPTSPIRFGELFRSWRKNKPTAKSQ - CvTps1 transcript (including non-coding sequences) SEQ ID NO: 2 CATCCCGCCTTTTGAGCATGGCACACAAACAGCCTTTAAGGAGCTCCTTGGTTGCCTAGTCATGCCTCCACCTGCCC CCTCCTCACTCATCCCCTCGCATCCTAAAACATGACCACGATTCACCGTCGGCACACCACTCTCATCTTGGACCTCG GCGACGTCCTCTTCCGCTGGTCACCAAAGACCGAGACCGCCATCCCCCCTCGGCAGCTTAAGGAGATACTTACCTC CGTCACCTGGTTCGAGTACGAACGAGGCCAGATATCCCAAACAGAATGTTACGAACGATGCGCTGCAGAATTCAA AGTCGACCCCTTAGTGATCGCTGAAGCCTTCAAGCAAGCTCGCGAGTCATTACGGCCCAACAAAGCGTTCATCGCC TTGATTCGCGAACTTCGCCATCAAATGCATGGAGACCTCACGGTCCTCGCCCTTTCCAACATTTCCCTCCCCGATTAC GAATATATCATGTCTCTGAGCTCGGATTGGGCAACCGTCTTCAATCGCGTATTCCCTTCTGCACTTGTTGGCGAGCG AAAACCCCATCTGGGGTGCTACCGCAAGGTCATTTCGGAGATGAGCTTGGAACCCCAGACAACCGTATTTGTCGAT GATAAGCTAGACAACGTCGCCTCTGCTCGCTCACTTGGCATGCACGGCATCGTATTCGACAACGAAGCCAATGTCT TCCGGCAACTGCGCAATATCTTCGGGAATCCGGTTAGCCGCGGTCAAGGCTATCTTCGCAAGCATGCCGGAAAGC TTGAGTCTTCTACCGACAATGGCTTGACCTTTGAGGAGAACTTCACCCAGCTCATCATCTACGAGGTGACACAAGA CAGGAGTCTCATCACGCTCTCAGAATGTCCCCGTACCTGGAATTTCTTTCGAGGTCAACCGCTCTTCTCGGAGTCTT TCCCGGATGATGTGGACACAACATCCGTGGCATTGACAGTACTACAACCCGATAGAGCGCTCGTTGATTCTATTCT AGACCAAATGCTTGAATATGTTGACGCCGACGGCATCATGCAGACATACTTCGACAGCTCGCGACCACGCATAGA CCCTTTTGTTTGCGTCAATGTGCTTTCTCTGTTCTACGCAAACGGCCGGGGTCGGGAGCTCCCTCACACACTGGAGT GGGTCTATGAAGTACTCCTGCATCGCGCCTACCATGGAGGCTCACGTTACTACCTATCACCGGACTGCTTTTTATTC TTCATGAGCCGCTTGCTCAAGCGCGCCAACGACTCGGCCCTCCAGGCTCGGTTCCGCCCACTGTTCATGGAGAGAG 
TGAAAGAACGAGTAGGGGCAGCCGGAGACTCAATGGACCTGGCCTTCCGCATCCTCGCCGCGGCTACCATTGGCG TCCATTGCCCCCAAGATCTAGAAAGATTGGCCGCCGCGCAATGCGAGGACGGTGGATGGGACATGTGCTGGTTCT ACGCGTTCGGGTCGACAGGTATCAAGGCGGGCAACCGCGGCCTCACCACGGCCCTTGCCGTCGCAGCTATACGAA CCGCCCTCGGGCGCCCCCCCTCTCCCAGCCCCTCCAACATCTCGTCGTCGTCGAAGCTCGACGCTCCCAACAGCTTC TTGGGCATCCCGCGCCCAACCAGCCCCATTCGCTTTGGCGAACTTTTCCGTTCCTGGCGAAAGAACAAACCGACCG CAAAATCTCAATGAATCTCAGGTTCTCGTGCTCTCGTGCTATCTTCGTACTTATGCTACTCGACATTACCCGTCGCTG TCTACAATGATACGGGTACTTTGATGAAACTGTAGATGTATTTGTATCATATTGACCTCCATCCATAGTCACCTAGC TACTGTTCGTGTTATCACCTGTTGCTGTTATATGATACAAGATGCCCAAACGAGAATGTAGAAATGTTCCGTACACT TGTGTACCTGTGATGAAGCTACATAGGCCTTCAATCGATCACTTGGTCC - CvTps1 cDNA SEQ ID NO: 3 ATGACCACGATTCACCGTCGGCACACCACTCTCATCTTGGACCTCGGCGACGTCCTCTTCCGCTGGTCACCAAAGAC CGAGACCGCCATCCCCCCTCGGCAGCTTAAGGAGATACTTACCTCCGTCACCTGGTTCGAGTACGAACGAGGCCA GATATCCCAAACAGAATGTTACGAACGATGCGCTGCAGAATTCAAAGTCGACCCCTTAGTGATCGCTGAAGCCTTC AAGCAAGCTCGCGAGTCATTACGGCCCAACAAAGCGTTCATCGCCTTGATTCGCGAACTTCGCCATCAAATGCATG GAGACCTCACGGTCCTCGCCCTTTCCAACATTTCCCTCCCCGATTACGAATATATCATGTCTCTGAGCTCGGATTGG GCAACCGTCTTCAATCGCGTATTCCCTTCTGCACTTGTTGGCGAGCGAAAACCCCATCTGGGGTGCTACCGCAAGG TCATTTCGGAGATGAGCTTGGAACCCCAGACAACCGTATTTGTCGATGATAAGCTAGACAACGTCGCCTCTGCTCG CTCACTTGGCATGCACGGCATCGTATTCGACAACGAAGCCAATGTCTTCCGGCAACTGCGCAATATCTTCGGGAAT CCGGTTAGCCGCGGTCAAGGCTATCTTCGCAAGCATGCCGGAAAGCTTGAGTCTTCTACCGACAATGGCTTGACCT TTGAGGAGAACTTCACCCAGCTCATCATCTACGAGGTGACACAAGACAGGAGTCTCATCACGCTCTCAGAATGTCC CCGTACCTGGAATTTCTTTCGAGGTCAACCGCTCTTCTCGGAGTCTTTCCCGGATGATGTGGACACAACATCCGTGG CATTGACAGTACTACAACCCGATAGAGCGCTCGTTGATTCTATTCTAGACCAAATGCTTGAATATGTTGACGCCGA CGGCATCATGCAGACATACTTCGACAGCTCGCGACCACGCATAGACCCTTTTGTTTGCGTCAATGTGCTTTCTCTGT TCTACGCAAACGGCCGGGGTCGGGAGCTCCCTCACACACTGGAGTGGGTCTATGAAGTACTCCTGCATCGCGCCT ACCATGGAGGCTCACGTTACTACCTATCACCGGACTGCTTTTTATTCTTCATGAGCCGCTTGCTCAAGCGCGCCAAC GACTCGGCCCTCCAGGCTCGGTTCCGCCCACTGTTCATGGAGAGAGTGAAAGAACGAGTAGGGGCAGCCGGAGA 
CTCAATGGACCTGGCCTTCCGCATCCTCGCCGCGGCTACCATTGGCGTCCATTGCCCCCAAGATCTAGAAAGATTG GCCGCCGCGCAATGCGAGGACGGTGGATGGGACATGTGCTGGTTCTACGCGTTCGGGTCGACAGGTATCAAGGC GGGCAACCGCGGCCTCACCACGGCCCTTGCCGTCGCAGCTATACGAACCGCCCTCGGGCGCCCCCCCTCTCCCAGC CCCTCCAACATCTCGTCGTCGTCGAAGCTCGACGCTCCCAACAGCTTCTTGGGCATCCCGCGCCCAACCAGCCCCAT TCGCTTTGGCGAACTTTTCCGTTCCTGGCGAAAGAACAAACCGACCGCAAAATCTCAATGA - CvTps1 optimized cDNA SEQ ID NO: 4 ATGACTACGATCCACCGCCGCCATACTACGCTGATCCTGGACCTGGGTGATGTTCTGTTCCGCTGGTCCCCGAAAA CCGAAACCGCAATTCCGCCTCGTCAGCTGAAAGAAATCTTGACCAGCGTTACCTGGTTCGAGTATGAGCGTGGCCA AATTAGCCAGACCGAATGCTACGAGCGTTGTGCTGCCGAGTTTAAAGTTGATCCGCTGGTTATTGCCGAAGCGTTT AAACAAGCGCGTGAAAGCCTGCGTCCGAACAAAGCGTTTATCGCGTTGATCCGTGAGTTGCGCCACCAGATGCAT GGTGACCTGACGGTCCTGGCACTGAGCAACATTAGCCTGCCTGATTATGAGTACATTATGTCGCTGAGCTCCGATT GGGCGACGGTCTTTAATCGCGTGTTTCCGAGCGCACTGGTGGGTGAGCGTAAGCCACACCTGGGTTGCTACCGCA AGGTCATCAGCGAGATGTCTCTGGAGCCGCAGACCACGGTTTTCGTCGATGACAAACTGGACAATGTCGCAAGCG CTCGTAGCCTGGGCATGCATGGCATCGTGTTCGACAACGAAGCGAACGTTTTTCGTCAGCTGCGTAATATCTTCGG TAACCCGGTTAGCCGCGGTCAAGGTTACTTGCGTAAACACGCCGGTAAACTGGAATCTAGCACGGATAATGGTCT GACCTTCGAAGAGAACTTCACTCAATTAATTATTTACGAAGTCACGCAAGACCGCAGCCTGATCACCCTGAGCGAG TGCCCGCGTACCTGGAACTTCTTCCGCGGTCAACCACTGTTTTCTGAGAGCTTTCCGGACGACGTGGACACCACCTC TGTGGCGTTGACCGTTCTGCAGCCGGATCGTGCGTTGGTGGATAGCATCCTGGACCAGATGTTGGAATATGTTGA CGCGGATGGTATTATGCAAACCTACTTTGATTCATCCCGTCCGCGCATTGACCCGTTCGTGTGCGTGAATGTCCTGA GCCTGTTCTACGCCAATGGCAGAGGCCGCGAGCTGCCACACACGCTGGAATGGGTCTATGAAGTTCTGCTGCACC GTGCGTACCACGGCGGTAGCCGTTATTACCTGAGCCCGGACTGTTTCCTGTTCTTTATGAGCCGTCTGCTGAAGCG CGCGAATGACTCGGCGCTGCAGGCCCGTTTTCGCCCGCTTTTCATGGAACGTGTGAAAGAGCGTGTGGGCGCAGC CGGCGATAGCATGGACCTGGCGTTCCGCATTCTGGCCGCTGCAACCATCGGCGTTCATTGCCCACAAGATCTGGA GCGTCTGGCAGCAGCGCAGTGCGAAGATGGTGGCTGGGATATGTGTTGGTTTTATGCGTTTGGCAGCACGGGTAT CAAGGCTGGCAACCGCGGTCTGACCACCGCGTTGGCTGTCGCCGCAATTCGTACCGCGCTGGGTCGTCCGCCTTCC CCGAGCCCGAGCAATATTTCTAGCTCCAGCAAACTGGACGCGCCGAACTCCTTCCTGGGCATCCCGCGTCCGACCA 
GCCCGATCCGTTTCGGTGAACTGTTTCGTAGCTGGCGTAAGAACAAGCCGACCGCGAAAAGCCAGTAA LoTps1 - LoTps1 protein SEQ ID NO: 5 MYTALILDLGDVLFSWSSTTNTTIPPRQLKEILSSPAWFEYERGRITQAECYERVSAEFSLDATAVAEAFRQARDSLRPND KFLTLIRELRQQSHGELTVLALSNISLPDYEFIMALDSKWTSVFDRVFPSALVGERKPHLGAFRQVLSEMNLDPHTTVFVD DKLDNVVSARSLGMHGVVFDSQDNVFRMLRNIFGDPIHRGRDYLRQHAGRLETSTDAGVVFEENFTQLIIYELTNDKSL ITTSNCARTWNFFRGKPLFSASFPDDMDTTSVALTVLRLDHALVNSVLDEMLKYVDADGIMQTYFDHTRPRMDPFVC VNVLSLFHEQGRGHELPNTLEWVHEVLLHRAYIGGSRYYLSADCFLFFMSRLLQRITDPSVLGRFRPLFIERVRERVGATG DSIDLAFRIIAASTVGIQCPRDLESLLAAQCEDGGWDLCWFYQYGSTGVKAGNRGLTTALAIKAIDSAIARPPSPALSVAS SSKSEIPKPIQRSLRPLSPRRFGGFLMPWRRSQRNGVAVSS - LoTps1 transcript (including non-coding sequence) SEQ ID NO: 6 GCGTCTGCTGCGGTCTCTCACCGCGCCGAGCGACGGGAAGCGGAGGCTTTTTGATGCAGCCAGCTCAGCGCCATC CTCTCACGCAGGGGGTTTGATCCAGATCTGATCGCCTCCGGGTTCTCATCTAGAACGCACGGCGGCTCCCAGGAA GTTCTATCGACCCTCTGCGCGCTGGTCGGCGGCACGATGTGGCTACACCAGTCCCAATCATATCTCACACCCAGCA CCATCATCTCGGGCCTCTTCGTCATGTAACCCTCCCAAGCCTATTTTTCAGGGCGTTCCCCCTCACCGGCGCGCTTCT TAAAGAATCCCGAAATGTATACGGCTCTTATCCTTGACCTCGGCGACGTTCTGTTCTCTTGGTCGTCGACGACCAAC ACGACTATTCCCCCTCGGCAGCTAAAGGAGATCCTCTCATCTCCTGCCTGGTTTGAGTACGAGCGTGGTCGCATAA CGCAAGCCGAATGCTACGAGCGTGTCAGCGCCGAGTTCAGCCTAGACGCCACCGCCGTCGCGGAAGCATTCCGGC AAGCTCGCGACTCCTTGCGCCCGAACGACAAGTTCCTCACGTTAATTCGCGAGCTTCGACAACAATCTCATGGGGA GCTCACGGTGCTTGCGCTGTCCAACATATCCCTTCCCGACTATGAATTCATCATGGCCCTCGACTCGAAGTGGACTT CTGTCTTTGACCGCGTCTTCCCTTCTGCCCTCGTGGGCGAACGGAAGCCACACCTTGGAGCGTTTCGCCAGGTTCT GTCCGAGATGAATCTTGACCCGCACACAACTGTGTTCGTCGATGACAAGCTGGACAATGTCGTCTCCGCACGGTCC CTCGGGATGCACGGCGTCGTGTTCGACTCCCAAGACAATGTCTTTCGGATGCTGAGAAACATCTTTGGCGATCCCA TTCATCGGGGACGTGACTATCTCCGACAGCACGCCGGACGTCTGGAGACCTCCACGGATGCCGGTGTGGTCTTCG AAGAGAATTTCACGCAACTCATCATCTACGAACTGACGAATGACAAGTCTCTCATCACGACATCAAACTGTGCTCG TACTTGGAATTTCTTTCGTGGGAAGCCTTTGTTCTCAGCATCGTTCCCTGACGACATGGACACGACCTCGGTTGCCT TGACTGTATTACGTTTAGACCACGCCCTCGTGAACTCGGTTTTGGACGAGATGCTAAAGTATGTCGACGCAGACGG 
CATCATGCAGACCTACTTCGACCATACACGCCCACGCATGGATCCATTTGTCTGCGTCAATGTGCTCTCGTTGTTTC ACGAACAAGGTCGTGGCCACGAGCTTCCGAACACCCTCGAATGGGTCCATGAGGTCCTCCTCCACCGCGCGTACA TCGGGGGCTCGCGGTACTACCTCTCCGCGGACTGCTTCCTCTTTTTCATGAGCCGCCTCCTGCAGCGCATCACCGAC CCGTCCGTCCTTGGCCGCTTCCGTCCACTATTCATAGAGCGCGTTCGGGAGCGTGTAGGTGCGACCGGGGACTCCA TCGATCTCGCATTCCGCATCATCGCCGCGTCCACAGTAGGCATCCAGTGTCCACGCGACTTGGAAAGTCTCCTCGC CGCACAGTGTGAAGACGGTGGCTGGGACCTGTGCTGGTTCTACCAGTACGGATCGACCGGTGTCAAGGCGGGCA ACCGCGGGCTCACCACCGCTCTGGCGATCAAAGCTATTGACTCCGCCATTGCGAGGCCACCTTCGCCTGCCCTCTC AGTCGCTTCGTCGTCCAAATCGGAGATACCGAAACCCATACAACGGTCCCTTAGGCCCCTTAGCCCCCGCCGGTTT GGCGGTTTCCTGATGCCGTGGCGCAGGTCACAGCGCAATGGCGTGGCGGTCTCTAGTTGAACACTTGACCCTTGA CACTTCGCTTTGCACTGCCTGCTCCCCTGCCAATCCTCCCCTACGATCGTATCATCCCTCTCTTGCCCTCGCCTCCCCC TCGTACCCCCTCTCATGGGGTGCCATTTGTAGATATGTACGTAGCGTGATGTAGCGGTACTCGGATCGTTCTCGTA CTCGTCTTGCTCTGCCGTCGCTTCCAGCCCGTGCTGTTCTCTCGTTCAGGCTATTCGTTGGTTACGCGTATATCGTAA TAGACCGCCCCGGTTCCTCGCCTACAGACACTCGCCCGTCTCGCCACGGACTCGGCTACGGATTCAGACTACATGA GTGGCAGTTATCACACGCAGATCCCTCCTTGGTCGTTCTGTAGTACCCACATATGTAATTGTACCAGTCCACTGTTG CAGATC - LoTps1 cDNA SEQ ID NO: 7 ATGTATACGGCTCTTATCCTTGACCTCGGCGACGTTCTGTTCTCTTGGTCGTCGACGACCAACACGACTATTCCCCCT CGGCAGCTAAAGGAGATCCTCTCATCTCCTGCCTGGTTTGAGTACGAGCGTGGTCGCATAACGCAAGCCGAATGC TACGAGCGTGTCAGCGCCGAGTTCAGCCTAGACGCCACCGCCGTCGCGGAAGCATTCCGGCAAGCTCGCGACTCC TTGCGCCCGAACGACAAGTTCCTCACGTTAATTCGCGAGCTTCGACAACAATCTCATGGGGAGCTCACGGTGCTTG CGCTGTCCAACATATCCCTTCCCGACTATGAATTCATCATGGCCCTCGACTCGAAGTGGACTTCTGTCTTTGACCGC GTCTTCCCTTCTGCCCTCGTGGGCGAACGGAAGCCACACCTTGGAGCGTTTCGCCAGGTTCTGTCCGAGATGAATC TTGACCCGCACACAACTGTGTTCGTCGATGACAAGCTGGACAATGTCGTCTCCGCACGGTCCCTCGGGATGCACGG CGTCGTGTTCGACTCCCAAGACAATGTCTTTCGGATGCTGAGAAACATCTTTGGCGATCCCATTCATCGGGGACGT GACTATCTCCGACAGCACGCCGGACGTCTGGAGACCTCCACGGATGCCGGTGTGGTCTTCGAAGAGAATTTCACG CAACTCATCATCTACGAACTGACGAATGACAAGTCTCTCATCACGACATCAAACTGTGCTCGTACTTGGAATTTCTT TCGTGGGAAGCCTTTGTTCTCAGCATCGTTCCCTGACGACATGGACACGACCTCGGTTGCCTTGACTGTATTACGTT 
TAGACCACGCCCTCGTGAACTCGGTTTTGGACGAGATGCTAAAGTATGTCGACGCAGACGGCATCATGCAGACCT ACTTCGACCATACACGCCCACGCATGGATCCATTTGTCTGCGTCAATGTGCTCTCGTTGTTTCACGAACAAGGTCGT GGCCACGAGCTTCCGAACACCCTCGAATGGGTCCATGAGGTCCTCCTCCACCGCGCGTACATCGGGGGCTCGCGG TACTACCTCTCCGCGGACTGCTTCCTCTTTTTCATGAGCCGCCTCCTGCAGCGCATCACCGACCCGTCCGTCCTTGGC CGCTTCCGTCCACTATTCATAGAGCGCGTTCGGGAGCGTGTAGGTGCGACCGGGGACTCCATCGATCTCGCATTCC GCATCATCGCCGCGTCCACAGTAGGCATCCAGTGTCCACGCGACTTGGAAAGTCTCCTCGCCGCACAGTGTGAAG ACGGTGGCTGGGACCTGTGCTGGTTCTACCAGTACGGATCGACCGGTGTCAAGGCGGGCAACCGCGGGCTCACC ACCGCTCTGGCGATCAAAGCTATTGACTCCGCCATTGCGAGGCCACCTTCGCCTGCCCTCTCAGTCGCTTCGTCGTC CAAATCGGAGATACCGAAACCCATACAACGGTCCCTTAGGCCCCTTAGCCCCCGCCGGTTTGGCGGTTTCCTGATG CCGTGGCGCAGGTCACAGCGCAATGGCGTGGCGGTCTCTAGTTGA - LoTps1 optimized cDNA SEQ ID NO: 8 ATGTACACGGCGCTGATTTTGGATTTGGGTGATGTTCTGTTTAGCTGGAGCTCAACGACTAACACCACCATTCCGC CGCGTCAGCTGAAAGAAATCTTGAGCTCCCCGGCGTGGTTCGAGTACGAGCGTGGCCGTATCACCCAGGCAGAGT GTTATGAGCGTGTCAGCGCAGAGTTTAGCCTGGATGCGACGGCCGTGGCTGAGGCTTTTCGTCAGGCACGTGATA GCCTGCGTCCGAACGACAAATTTCTGACCCTGATCCGTGAGCTGCGTCAACAGAGCCACGGTGAATTGACCGTTCT GGCCTTGTCTAACATCAGCCTGCCGGATTACGAATTTATTATGGCACTGGACTCGAAGTGGACCAGCGTGTTTGAT CGTGTGTTCCCGAGCGCCCTGGTGGGCGAACGCAAGCCGCACCTGGGCGCGTTCCGCCAAGTCCTGTCCGAGATG AATTTGGACCCGCATACCACCGTTTTTGTGGACGACAAACTGGACAATGTTGTCAGCGCACGCAGCCTGGGTATGC ACGGTGTCGTGTTCGACAGCCAAGACAATGTTTTTCGTATGCTGCGTAACATTTTCGGTGACCCAATTCACCGCGG TCGTGACTATCTGCGCCAGCACGCTGGTCGTCTTGAAACGTCCACCGATGCGGGCGTTGTGTTCGAAGAGAACTTC ACCCAACTGATCATTTACGAACTGACCAACGATAAGAGCCTGATCACCACCTCTAATTGCGCCCGCACCTGGAACTT CTTCCGCGGCAAACCTCTGTTCTCCGCGAGCTTTCCGGACGATATGGACACTACGTCGGTAGCGCTGACCGTGCTG CGTCTGGACCATGCGCTGGTGAATAGCGTTCTGGATGAAATGCTGAAATACGTCGATGCTGACGGTATTATGCAG ACCTACTTTGATCATACGCGTCCTCGTATGGACCCGTTCGTTTGCGTCAATGTGCTGAGCCTGTTTCACGAGCAAGG TCGCGGTCATGAACTGCCGAATACGCTGGAATGGGTGCATGAAGTCCTGCTGCACCGTGCGTATATCGGTGGCAG CCGCTATTATCTGAGCGCGGATTGTTTCCTGTTCTTTATGAGCCGTCTGTTGCAACGTATTACCGACCCGAGCGTTT 
TAGGTAGATTTCGCCCGCTGTTCATCGAGCGTGTTCGCGAGCGCGTTGGCGCGACTGGCGACAGCATCGACCTGG CATTCCGTATCATCGCGGCCAGCACGGTCGGCATTCAATGCCCGCGTGACCTGGAGTCTCTGCTGGCAGCACAGTG CGAAGATGGTGGCTGGGATCTGTGTTGGTTTTACCAGTACGGCAGCACGGGTGTTAAGGCCGGTAACCGTGGTCT GACCACGGCGTTGGCGATCAAAGCGATTGACAGCGCCATCGCGCGTCCGCCAAGCCCGGCCCTGTCCGTTGCAAG CTCCAGCAAGAGCGAGATTCCGAAGCCGATTCAGCGTAGCCTCCGCCCGTTGAGCCCGCGTCGCTTCGGTGGCTTC CTGATGCCGTGGCGTCGTAGCCAACGCAATGGTGTCGCGGTGAGCTCTTAA OCH93767.1 - OCH93767.1 protein SEQ ID NO: 9 MSAAVRYTTLILDLGDVLFTWSPKTKTSISPRILKEILNSATWYEYERGSITQHECYERVGVEFGIAPSEIHNAFKQARDSM ESNDELIALVRELKEQSDGELLVFALSNISLPDYEYVLTKPADWSIFDKVFPSALVGERKPHLGIYKHVIAETGVDPRTTVFV DDKIDNVLSARSLGMHGIVFDKHEDVMRALRNIFGDPVRRGREYLRRNARKLESITDHGVAFGENFTQLLILELTSDASL VTLPDRPRTWNFFRGKPLFSEAFPDDLDTTSLALTVLKRDAATVSSVMDEMLKYRDADGIMQTYFDNGRQRLDPFVN ANVLTLFYANGRGHELDQSLSWVREVLLYRAYLGGSRYYPSADCFLYFISRLFACTSDPVLHHQLKPLFVERVHERIGVQ GDALELAFRLLVCASFNISNQPDMRKLLEMQCQDGGWDGGNLYRFGTTGLKVTNRGLTTAAAVQAIEATQLRPPSPA FSVESPKSPVTPVTPMLEIPALGLSISRPSSPLLGYFKLPWKKSAEVH - OCH93767.1 cDNA SEQ ID NO: 10 ATGTCCGCAGCAGTTCGGTACACGACCCTCATCCTCGACCTTGGCGACGTCTTGTTCACTTGGTCACCGAAGACGA AGACCAGCATCTCGCCTCGTATTCTGAAGGAGATCCTGAATTCCGCGACCTGGTATGAGTACGAGCGCGGTAGTA TCACTCAGCACGAATGTTACGAACGCGTTGGCGTGGAGTTCGGTATTGCGCCGAGCGAGATCCACAACGCGTTCA AGCAGGCTCGGGACTCTATGGAGTCGAATGACGAGCTGATCGCCCTTGTTCGGGAACTGAAGGAGCAGTCAGAT GGAGAGCTTCTCGTCTTCGCATTATCGAACATCTCACTGCCGGACTACGAATACGTCCTGACGAAGCCCGCGGACT GGTCCATCTTCGACAAAGTCTTTCCTTCCGCTCTCGTCGGCGAGCGCAAGCCCCATCTCGGCATCTACAAACACGTC ATCGCAGAGACGGGCGTTGATCCGCGAACAACCGTCTTCGTGGACGACAAGATCGACAATGTGCTTTCGGCGCGG TCGCTCGGTATGCACGGCATTGTCTTCGACAAACACGAAGACGTAATGCGCGCTCTGCGAAACATTTTCGGTGACC CCGTGCGAAGAGGACGAGAATATTTGCGTCGAAATGCAAGGAAATTGGAATCCATCACAGATCACGGCGTCGCCT TCGGGGAGAACTTCACCCAGCTTCTGATCCTCGAACTTACTAGTGATGCGTCCCTCGTTACTCTCCCTGATCGTCCT CGGACATGGAATTTTTTCCGAGGGAAGCCGCTCTTTTCGGAGGCCTTCCCCGATGACCTTGATACTACTTCCTTGGC ACTCACTGTCCTGAAAAGAGATGCCGCCACTGTATCGTCCGTGATGGACGAGATGCTGAAATACAGGGACGCGGA 
CGGCATCATGCAGACATACTTCGACAACGGTCGGCAACGACTCGATCCGTTCGTCAACGCCAACGTTTTGACCCTC TTCTACGCCAACGGTCGCGGACACGAGCTGGATCAGAGCCTCAGCTGGGTTCGCGAAGTCTTGCTCTACCGCGCTT ACCTCGGCGGTTCCCGCTACTACCCCTCCGCCGACTGCTTCCTATATTTCATCAGCCGCCTCTTCGCCTGCACCAGCG ACCCGGTCCTCCATCATCAACTTAAGCCCCTCTTTGTTGAGCGTGTGCACGAGCGGATAGGAGTGCAGGGCGACG CGCTGGAGCTCGCCTTCCGCCTGCTTGTATGCGCGAGCTTCAACATCTCGAACCAGCCTGACATGCGCAAGCTGCT CGAGATGCAGTGCCAGGACGGAGGCTGGGATGGCGGAAACCTGTATCGTTTCGGCACCACGGGCCTCAAGGTCA CGAACCGGGGTCTGACCACCGCAGCAGCCGTGCAAGCCATCGAGGCGACGCAGCTGCGTCCACCATCACCGGCG TTCTCTGTCGAGTCGCCTAAGAGCCCGGTGACGCCGGTGACGCCCATGCTGGAGATTCCAGCGCTGGGTCTCAGC ATCTCGCGGCCCTCCAGTCCTCTGTTGGGGTATTTCAAGCTCCCGTGGAAGAAGTCAGCCGAGGTTCATTGA - OCH93767 optimized cDNA SEQ ID NO: 11 ATGTCTGCAGCTGTTCGTTATACTACTCTGATCCTGGATTTGGGCGATGTTCTGTTCACCTGGTCCCCGAAAACCAA GACCTCTATCAGCCCACGTATCCTGAAAGAAATCCTGAACAGCGCGACCTGGTACGAGTATGAGCGTGGCAGCAT CACCCAGCACGAGTGCTACGAGCGTGTTGGCGTCGAATTTGGTATTGCGCCGAGCGAGATTCACAACGCGTTCAA ACAAGCCCGCGACAGCATGGAATCCAACGACGAACTGATTGCTCTGGTGCGTGAGCTGAAAGAACAGAGCGATG GTGAGCTGCTGGTCTTTGCCCTGAGCAATATCTCTCTGCCGGATTACGAATACGTTCTGACCAAACCAGCGGACTG GTCAATCTTCGATAAAGTCTTTCCGAGCGCTTTGGTCGGTGAGCGTAAACCGCATCTGGGTATTTACAAACACGTT ATTGCGGAAACCGGTGTTGACCCGAGAACGACCGTTTTTGTTGACGATAAGATTGACAACGTCCTGAGCGCACGC AGCCTGGGTATGCATGGTATTGTCTTTGATAAACACGAAGATGTGATGCGTGCTCTGCGCAATATCTTTGGCGACC CGGTGCGTCGCGGTCGTGAGTATTTGCGCCGCAACGCGCGCAAATTGGAGTCCATTACCGATCATGGTGTCGCAT TTGGTGAGAATTTCACCCAGCTCCTGATTCTGGAACTGACCAGCGACGCGTCCCTGGTGACGCTGCCGGATCGTCC GCGTACGTGGAACTTCTTCCGCGGCAAGCCGCTGTTTAGCGAAGCGTTCCCGGATGACCTGGACACCACGAGCCT GGCACTGACGGTGCTGAAACGCGATGCAGCAACTGTGAGCTCCGTCATGGACGAAATGCTGAAGTACCGCGACG CGGATGGCATCATGCAGACGTATTTCGACAACGGTCGTCAGCGTCTGGACCCGTTTGTCAACGCCAATGTTCTGAC GCTGTTTTACGCGAATGGCCGTGGTCATGAACTGGACCAGAGCTTATCATGGGTGCGTGAAGTGCTGCTGTATCG CGCCTATCTGGGTGGCAGCCGCTACTATCCGAGCGCGGACTGTTTTCTGTACTTCATTAGCCGCTTGTTCGCCTGCA CCAGCGATCCGGTTCTGCATCACCAACTGAAGCCATTGTTCGTCGAGCGTGTGCACGAGCGTATTGGTGTTCAGGG 
CGACGCACTGGAACTGGCGTTCCGTCTGTTGGTGTGTGCGAGCTTCAACATTAGCAATCAGCCGGATATGCGTAA GCTGCTGGAAATGCAATGCCAAGATGGCGGCTGGGACGGTGGTAATCTGTACCGTTTTGGCACCACCGGTTTAAA AGTGACGAATCGTGGTTTGACCACCGCTGCGGCCGTTCAAGCAATTGAAGCAACGCAACTGCGTCCGCCGAGCCC AGCATTTAGCGTAGAGTCGCCTAAGAGCCCGGTTACGCCGGTGACGCCGATGCTGGAAATCCCGGCGCTGGGTCT GTCTATCAGCCGTCCGTCGAGCCCGCTGCTGGGCTATTTCAAGTTGCCGTGGAAGAAAAGCGCCGAAGTGCACTA A EMD37666.1 - EMD37666.1 protein SEQ ID NO: 12 MSAAAQYTTLILDLGDVLFTWSPKTKTSIPPRTLKEILNSATWYEYERGRISQDECYERVGTEFGIAPSEIDNAFKQARDS MESNDELIALVRELKTQLDGELLVFALSNISLPDYEYVLTKPADWSIFDKVFPSALVGERKPHLGVYKHVIAETGIDPRTTV FVDDKIDNVLSARSVGMHGIVFEKQEDVMRALRNIFGDPVRRGREYLRRNAMRLESVTDHGVAFGENFTQLLILELTN DPSLVTLPDRPRTWNFFRGNGGRPSKPLFSEAFPDDLDTTSLALTVLQRDPGVISSVMDEMLNYRDPDGIMQTYFDDG RQRLDPFVNVNVLTFFYTNGRGHELDQCLTWVREVLLYRAYLGGSRYYPSADCFLYFISRLFACTNDPVLHHQLKPLFVE RVQEQIGVEGDALELAFRLLVCASLDVQNAIDMRRLLEMQCEDGGWEGGNLYRFGTTGLKVTNRGLTTAAAVQAIEA SQRRPPSPSPSVESTKSPITPVTPMLEVPSLGLSISRPSSPLLGYFRLPWKKSAEVH - EMD37666.1 cDNA SEQ ID NO: 13 ATGTCCGCGGCAGCTCAATACACGACCCTCATTCTCGACCTTGGCGACGTCCTGTTCACCTGGTCACCGAAAACCA AGACGAGCATCCCCCCTCGGACTCTGAAGGAGATTCTCAATTCCGCGACATGGTATGAGTATGAGCGCGGCCGCA TCTCTCAGGACGAATGTTACGAACGCGTTGGCACGGAGTTCGGAATCGCGCCTAGCGAAATCGACAACGCGTTCA AGCAAGCTCGGGATTCCATGGAATCCAACGACGAACTGATCGCCCTTGTTCGGGAACTCAAGACGCAGTTGGACG GCGAACTCCTTGTCTTCGCACTCTCAAATATCTCGTTGCCTGACTACGAGTACGTCCTCACGAAACCGGCCGACTGG TCCATCTTCGACAAGGTCTTCCCTTCCGCCCTCGTGGGCGAGCGCAAGCCGCACCTCGGCGTTTACAAGCACGTCA TTGCAGAAACGGGCATTGATCCGCGAACCACCGTTTTCGTGGACGACAAGATCGACAACGTGCTCTCAGCGCGGT CTGTAGGTATGCATGGGATCGTTTTCGAGAAGCAGGAAGACGTAATGCGCGCTCTCCGAAACATCTTCGGAGACC CGGTTCGGCGAGGGCGCGAGTACTTGCGCCGTAATGCCATGAGGCTTGAATCGGTTACAGACCATGGTGTGGCGT TTGGCGAGAACTTCACACAACTCCTTATCCTCGAACTAACGAACGATCCCTCCCTCGTTACGCTCCCTGATCGTCCTC GAACATGGAATTTCTTCCGAGGTAACGGGGGACGACCAAGCAAACCATTATTCTCGGAGGCCTTCCCCGATGACTT GGACACTACTTCACTAGCGTTGACTGTCCTCCAAAGAGATCCCGGCGTCATCTCTTCTGTGATGGACGAAATGTTG AACTACAGGGATCCGGACGGCATTATGCAGACATACTTCGACGATGGTCGGCAAAGACTCGATCCATTTGTCAAT 
GTCAATGTCTTAACGTTCTTCTACACCAACGGACGTGGTCATGAACTGGACCAATGCCTTACATGGGTCCGCGAAG TTTTGCTCTATCGCGCCTATCTCGGCGGCTCACGTTATTACCCCTCCGCCGACTGCTTTCTCTACTTCATCAGCCGCC TTTTCGCATGCACGAATGACCCCGTGCTACACCACCAACTCAAACCGCTCTTCGTCGAGCGCGTGCAGGAGCAAAT CGGCGTGGAGGGCGATGCGCTCGAGTTGGCGTTCCGATTGCTCGTCTGTGCAAGCCTGGACGTCCAAAACGCGAT CGACATGCGCAGGCTGCTCGAGATGCAATGCGAAGATGGCGGCTGGGAGGGCGGGAACCTTTATAGGTTTGGCA CGACCGGGCTCAAGGTGACTAACCGGGGCCTGACGACTGCAGCGGCCGTACAGGCCATCGAGGCGTCCCAACGG CGCCCACCATCACCGTCCCCCTCCGTCGAATCTACAAAGAGCCCAATAACCCCTGTGACGCCCATGCTGGAGGTCC CCTCGCTCGGCCTGAGCATCTCGAGGCCGTCCAGCCCTTTACTCGGCTACTTCAGGCTCCCGTGGAAGAAGTCGGC CGAAGTACACTGA - EMD37666.1 optimized cDNA SEQ ID NO: 14 ATGTCTGCGGCGGCTCAATACACGACTTTGATTCTGGATCTGGGTGATGTTCTGTTCACTTGGTCCCCGAAAACCA AGACCAGCATCCCTCCGCGTACCCTGAAAGAAATCCTGAATAGCGCTACCTGGTATGAGTACGAGCGTGGTCGCA TTTCCCAAGACGAGTGTTACGAACGTGTGGGCACCGAGTTCGGCATTGCGCCGAGCGAGATTGACAACGCGTTCA AACAAGCGCGCGATTCGATGGAAAGCAATGATGAACTGATCGCACTGGTCCGTGAGCTGAAAACGCAGCTGGAC GGTGAGCTGCTGGTTTTCGCACTGTCCAATATTAGCCTGCCGGATTACGAATACGTCTTGACCAAACCGGCGGACT GGAGCATCTTTGACAAAGTGTTCCCTAGCGCCTTGGTGGGCGAGCGTAAGCCGCATCTGGGCGTTTATAAACACG TTATTGCGGAAACGGGCATTGATCCGCGCACGACGGTTTTCGTGGACGACAAGATTGACAATGTGTTAAGCGCAC GCAGCGTCGGTATGCATGGTATCGTGTTTGAGAAACAAGAAGATGTCATGCGTGCACTGCGTAACATCTTTGGTG ATCCGGTCCGTCGTGGTCGTGAGTATCTGCGTAGAAACGCAATGCGTCTGGAGTCCGTGACCGACCACGGCGTGG CGTTTGGTGAGAACTTTACCCAGTTGCTGATTCTGGAATTGACGAACGACCCGAGCCTGGTCACCCTGCCTGATCG TCCGCGTACCTGGAACTTTTTTCGCGGCAATGGTGGCCGCCCGAGCAAGCCGCTGTTCAGCGAAGCGTTCCCGGAT GATCTGGATACCACGAGCCTGGCGCTGACCGTGCTGCAGCGCGACCCGGGTGTTATCAGCAGCGTTATGGACGAA ATGCTGAATTACCGTGACCCGGACGGTATCATGCAGACTTATTTCGATGACGGTCGCCAACGCTTGGACCCATTTG TGAACGTCAATGTTCTGACCTTTTTCTATACGAACGGCCGTGGTCACGAACTGGACCAGTGTCTGACGTGGGTGCG TGAAGTCCTCTTGTATCGTGCGTACCTTGGTGGCTCACGCTACTACCCATCGGCGGATTGCTTCCTGTACTTCATCT CTCGTCTGTTTGCGTGTACCAATGACCCGGTGCTGCACCATCAGCTGAAGCCACTGTTTGTTGAGCGTGTCCAAGA GCAAATTGGTGTCGAGGGTGATGCACTGGAACTGGCTTTTCGTCTGCTGGTCTGCGCCAGCCTGGATGTCCAGAA 
TGCCATCGACATGCGCCGTCTGCTGGAAATGCAGTGCGAAGATGGCGGTTGGGAGGGTGGTAACCTCTACCGCTT CGGCACCACGGGCCTGAAAGTTACCAACCGCGGTCTGACGACCGCAGCCGCCGTTCAAGCGATCGAAGCGAGCC AACGCCGTCCGCCGAGCCCGAGCCCGTCTGTAGAGAGCACGAAAAGCCCGATTACCCCGGTGACCCCGATGCTGG AAGTTCCAAGCCTGGGCTTATCTATCAGCCGTCCGTCCAGCCCGCTGCTGGGTTATTTCCGTTTGCCGTGGAAGAA AAGCGCAGAAGTGCACTAA EMD37666-B - EMD37666-B protein SEQ ID NO: 15 MSAAAQYTTLILDLGDVLFTWSPKTKTSIPPRTLKEILNSATWYEYERGRISQDECYERVGTEFGIAPSEIDNAFKQARDS MESNDELIALVRELKTQLDGELLVFALSNISLPDYEYVLTKPADWSIFDKVFPSALVGERKPHLGVYKHVIAETGIDPRTTV FVDDKIDNVLSARSVGMHGIVFEKQEDVMRALRNIFGDPVRRGREYLRRNAMRLESVTDHGVAFGENFTQLLILELTN DPSLVTLPDRPRTWNFFRGKPLFSEAFPDDLDTTSLALTVLQRDPGVISSVMDEMLNYRDPDGIMQTYFDDGRQRLDP FVNVNVLTFFYTNGRGHELDQCLTWVREVLLYRAYLGGSRYYPSADCFLYFISRLFACTNDPVLHHQLKPLFVERVQEQI GVEGDALELAFRLLVCASLDVQNAIDMRRLLEMQCEDGGWEGGNLYRFGTTGLKVTNRGLTTAAAVQAIEASQRRPP SPSPSVESTKSPITPVTPMLEVPSLGLSISRPSSPLLGYFRLPWKKSAEVH - EMD37666-B optimized cDNA SEQ ID NO: 16 ATGTCTGCGGCTGCTCAATATACTACTTTGATTCTGGATCTGGGCGACGTTCTGTTCACGTGGAGCCCGAAAACCA AGACCAGCATTCCACCGCGTACCCTGAAGGAGATCCTCAATAGCGCGACTTGGTACGAGTATGAGCGTGGCCGCA TCAGCCAAGACGAGTGCTACGAACGCGTCGGTACGGAATTTGGCATTGCACCAAGCGAGATTGACAATGCGTTTA AACAAGCGCGTGACAGCATGGAAAGCAATGACGAACTGATCGCGCTGGTCCGTGAGCTGAAAACCCAGCTGGAT GGTGAGCTGTTGGTGTTTGCGCTGTCGAACATCTCTCTGCCGGACTACGAGTATGTTCTGACCAAACCGGCGGATT GGAGCATTTTTGATAAAGTGTTTCCGAGCGCGCTGGTTGGTGAGCGCAAGCCGCACCTGGGTGTGTACAAACACG TTATTGCAGAGACTGGCATCGACCCGCGTACGACGGTTTTCGTTGACGACAAGATCGATAACGTTCTGAGCGCAC GTAGCGTCGGTATGCACGGTATTGTTTTCGAAAAACAAGAAGATGTTATGCGCGCACTGCGTAATATCTTCGGCGA TCCGGTCAGACGTGGCCGTGAGTATCTGCGCCGCAATGCGATGCGTCTGGAATCGGTGACCGATCATGGTGTCGC CTTTGGCGAGAATTTCACCCAGCTGCTGATTTTAGAGCTGACCAATGATCCTAGCCTGGTGACGCTGCCGGATCGT CCGCGTACCTGGAACTTTTTCCGCGGCAAGCCGTTGTTCTCCGAAGCCTTCCCGGACGACCTGGACACGACCAGCC TGGCGCTGACCGTGCTGCAACGTGATCCGGGTGTGATCTCTTCCGTAATGGACGAAATGCTGAACTACCGTGACCC GGACGGTATCATGCAGACCTATTTTGACGACGGTCGTCAGCGTCTGGACCCGTTTGTGAACGTGAATGTCCTGACG 
TTCTTTTACACCAATGGTCGCGGTCACGAACTGGATCAGTGTCTGACCTGGGTCCGCGAAGTGCTGCTGTATCGTG CATACCTGGGTGGCAGCCGTTATTACCCGAGCGCCGATTGCTTTCTGTACTTTATCAGCCGTCTGTTCGCGTGCACG AACGATCCGGTTCTGCATCACCAGCTGAAGCCGTTATTTGTTGAGCGCGTTCAGGAACAAATTGGTGTCGAGGGT GATGCGCTGGAATTGGCATTCCGCCTGTTGGTCTGCGCCAGCCTTGATGTCCAGAACGCCATTGACATGCGTCGCT TGCTCGAAATGCAGTGTGAGGACGGCGGTTGGGAGGGTGGCAACCTGTACCGTTTCGGTACGACCGGCCTGAAA GTCACGAACCGTGGTCTGACGACGGCAGCTGCGGTGCAAGCAATTGAAGCCAGCCAACGTCGTCCGCCATCCCCG TCACCGAGCGTTGAGTCCACCAAGAGCCCGATTACCCCTGTGACCCCGATGCTTGAAGTTCCGAGCCTGGGTCTGA GCATCTCCCGTCCTAGCAGCCCGCTGTTGGGTTACTTCCGCCTGCCGTGGAAGAAAAGCGCTGAGGTGCATTAA XP_001217376.1 - XP_001217376.1 protein SEQ ID NO: 17 MAITKGPVKALILDFSNVLCSWKPPSNVAVPPQILKMIMSSDIWHDYECGRYSREDCYARVADRFHISAADMEDTLKQ ARKSLQVHHETLLFIQQVKKDAGGELMVCGMTNTPRPEQDVMHSINAEYPVFDRIYISGLMGMRKPSICFYQRVMEEI GLSGDAIMFIDDKLENVIAAQSVGIRGVLFQSQQDLRRVVLNFLGDPVHRGLQFLAANAKKMDSVTNTGDTIQDNFA QLLILELAQDRELVKLQAGKRTWNYFIGPPKLTTATFPDDMDTTSMALSVLPVAEDVVSSVLDEMLKFVTDDGIFMTYF DSSRPRVDPVVCINVLGVFCRHNRERDVLPTFHWIRDILINRAYLSGTRYYPSPDLFLFFLARLCLAVRNQSLREQLVLPLV DRLRERVGAPGEAVSLAARILACRSFGIDSARDMDSLRGKQCEDGGWPVEWVYRFASFGLNVGNRGLATAFAVRALE SPYGESAVKVMRRIV - XP_001217376.1 cDNA SEQ ID NO: 18 ATGGCTATCACCAAGGGTCCAGTTAAGGCGCTTATTCTTGACTTTTCCAATGTTCTCTGCTCGTGGAAGCCTCCCAG CAATGTTGCGGTGCCGCCCCAGATACTCAAAATGATCATGTCCTCTGACATATGGCATGACTACGAGTGCGGACG GTACTCGAGAGAGGACTGCTATGCCAGAGTGGCAGACCGTTTTCATATCAGCGCCGCGGACATGGAAGACACGCT GAAACAGGCGCGCAAGAGCCTGCAGGTTCACCATGAGACACTGTTGTTTATCCAGCAAGTCAAGAAGGATGCCGG GGGCGAGTTGATGGTGTGTGGGATGACCAACACGCCCCGGCCAGAGCAAGACGTAATGCATTCAATCAACGCGG AGTATCCTGTGTTTGATAGGATATATATATCCGGTCTCATGGGCATGAGGAAGCCGAGCATCTGCTTCTACCAGCG GGTGATGGAGGAGATTGGCCTATCAGGCGATGCGATCATGTTTATAGATGACAAGTTGGAGAATGTCATCGCCGC CCAGTCGGTAGGGATCCGAGGCGTTCTATTTCAGAGTCAGCAAGATCTCCGTCGGGTTGTATTAAATTTCTTGGGC GATCCGGTCCATCGCGGCCTGCAGTTCCTAGCGGCCAATGCGAAAAAGATGGATAGTGTGACCAACACCGGCGAT ACTATCCAAGATAATTTTGCTCAGCTCCTCATCTTGGAGCTGGCCCAGGACAGGGAATTGGTGAAGCTTCAGGCTG 
GAAAAAGGACTTGGAATTACTTCATAGGGCCTCCCAAGCTCACAACAGCCACGTTCCCCGATGACATGGACACCAC ATCTATGGCTCTCTCGGTCCTTCCTGTGGCCGAGGATGTGGTCTCTTCTGTCCTGGATGAGATGCTTAAATTCGTCA CCGATGACGGTATCTTTATGACTTACTTCGATTCCTCGCGCCCTCGAGTCGACCCAGTCGTATGTATCAACGTCTTG GGTGTTTTCTGCAGGCATAACCGAGAGCGAGACGTCCTTCCAACGTTCCATTGGATTCGAGACATCCTGATCAACC GGGCATATCTCTCGGGCACCCGATACTACCCATCGCCCGATTTGTTTTTGTTTTTCCTTGCACGCCTCTGCCTGGCA GTCCGGAATCAGAGCCTACGGGAACAACTTGTCTTGCCTCTGGTAGACCGACTGCGTGAGCGGGTGGGCGCACCT GGAGAAGCGGTCTCATTGGCAGCGCGGATCCTTGCCTGCCGTAGCTTTGGTATCGACAGTGCGAGAGACATGGAC AGCTTGAGGGGAAAACAATGCGAGGATGGCGGCTGGCCAGTGGAGTGGGTTTACCGGTTTGCCTCTTTCGGCCT GAACGTAGGCAATCGGGGTCTTGCTACTGCCTTCGCGGTCAGGGCGCTCGAAAGCCCCTATGGTGAGTCGGCGGT GAAGGTTATGAGACGCATCGTCTGA - XP_001217376.1 optimized cDNA SEQ ID NO: 19 ATGGCAATCACTAAGGGCCCAGTTAAAGCGCTGATTCTTGATTTTTCTAACGTTCTGTGTAGCTGGAAGCCGCCGA GCAATGTTGCGGTCCCGCCTCAAATTCTGAAGATGATTATGTCGAGCGACATCTGGCATGATTATGAGTGTGGCCG TTACAGCCGTGAGGACTGCTACGCCCGTGTTGCTGACCGTTTTCATATCAGCGCAGCGGACATGGAAGATACCCTG AAACAGGCACGTAAGTCCCTGCAAGTGCACCACGAAACGCTGCTGTTCATCCAACAGGTGAAGAAAGACGCGGGT GGTGAGCTGATGGTTTGCGGCATGACCAACACGCCGCGTCCGGAACAAGACGTGATGCATTCCATCAATGCTGAG TATCCGGTGTTCGACCGTATTTACATTAGCGGCCTGATGGGCATGCGTAAACCGAGCATTTGTTTCTACCAACGCG TAATGGAAGAGATTGGTCTGAGCGGTGACGCCATCATGTTCATTGACGATAAACTGGAAAATGTGATTGCCGCAC AGAGCGTGGGTATCCGCGGTGTGCTGTTCCAAAGCCAGCAAGATCTGCGTCGTGTCGTGCTGAACTTTCTGGGCG ATCCGGTCCACCGTGGTCTGCAGTTCTTGGCGGCGAACGCAAAGAAAATGGACAGCGTCACGAATACCGGCGACA CTATCCAAGACAATTTCGCACAGCTGTTGATCTTAGAGCTGGCGCAGGATCGCGAATTGGTGAAATTGCAGGCCG GTAAACGTACCTGGAACTACTTTATTGGTCCGCCGAAGCTGACCACGGCGACGTTTCCGGATGATATGGACACGA CCAGCATGGCGCTGTCGGTGCTGCCTGTCGCGGAAGATGTCGTGAGCTCTGTTCTGGACGAGATGCTGAAGTTCG TGACCGATGATGGTATCTTTATGACCTATTTCGACTCTAGCCGTCCGCGTGTCGATCCGGTTGTCTGCATTAATGTG TTGGGTGTTTTCTGCCGCCACAATCGTGAGCGCGACGTGTTGCCGACCTTTCACTGGATTCGTGATATTCTGATCAA CCGCGCATATCTGAGCGGCACGCGCTATTACCCGTCCCCGGATCTGTTTCTGTTTTTCCTGGCTCGTCTGTGCCTGG CCGTTCGCAACCAGAGCCTGCGCGAACAACTGGTTCTCCCGCTGGTTGATCGTCTGCGCGAGCGTGTTGGTGCTCC 
GGGTGAGGCTGTGAGCCTGGCGGCACGTATCCTGGCGTGCCGTAGCTTCGGTATCGACTCAGCCCGCGACATGGA CTCCTTGCGTGGCAAACAGTGTGAAGATGGTGGTTGGCCGGTCGAATGGGTCTATCGCTTCGCGAGCTTTGGTCT GAACGTTGGCAACCGTGGTTTGGCCACCGCGTTTGCGGTTAGAGCGCTGGAGTCCCCATACGGCGAGAGCGCAGT TAAGGTTATGCGCCGTATCGTGTAA OJJ98394.1 - OJJ98394.1 protein SEQ ID NO: 20 MPSVKALVLDFAGVLCSWTPPAESPLSPAQLKQLMSSEIWFEYERGRYSEEECYAKLVERFSISAADMASTMEQARQSL ELNHAVLQLVSEIRKRNPGLKVYGMTNTPHAEQDCVNRIVNSYPVFDHVYLSGLVGMRKPDLGFYRFVLAETGLRPDE VVFVDDKTENVLVAQSVGMHGVVFQNVTDFKQQIINVTGDPVSRGLRYLRSNAKSLLTVTSNNSVIHENFAQLLILELT GDRDLIELEPWDRTWNYFIGVPQSPTSTFPNDLDTTSIALSVLPIHKDVVADVMDEIMLLLDNDGIVPTYFDPTRPRVDP VVCVNVLSLFAQNGRESELLATFNWVLDVLRHRAYLQGTRYYISPDAFLYFLARLSVFLRMSPLRARLMPLLEERVYERIG AHGDAISLAMRIYTCKLLGMSNMLDERALRDMQCEDGGFPTSWVYRFGSTGVKIGNRGLTTALAIKAIEMPLASLWKS WGLTTDIR - OJJ98394.1 cDNA SEQ ID NO: 21 ATGCCCTCCGTCAAAGCACTGGTCCTGGACTTCGCCGGAGTTCTATGCTCATGGACCCCGCCAGCCGAGAGCCCGC TCTCCCCAGCCCAGCTCAAACAACTCATGTCCTCCGAGATATGGTTCGAATACGAGCGCGGGAGATATTCCGAAGA AGAATGTTATGCGAAGCTCGTCGAACGGTTCTCCATCAGCGCTGCGGACATGGCTTCCACCATGGAACAGGCCCG TCAGAGCCTGGAACTGAACCACGCCGTACTTCAGCTTGTCAGCGAGATAAGGAAGCGGAACCCCGGGCTCAAAGT TTATGGCATGACGAACACGCCCCATGCGGAACAGGATTGTGTGAATCGCATCGTGAACAGCTATCCTGTTTTCGAC CATGTGTATCTCTCCGGGCTCGTTGGGATGCGCAAACCAGATCTTGGATTCTATCGGTTTGTTCTCGCAGAGACCG GGTTGAGGCCTGACGAGGTCGTGTTCGTCGACGACAAAACGGAGAATGTGTTGGTCGCGCAGTCCGTGGGGATG CACGGCGTGGTGTTCCAGAACGTTACGGATTTCAAGCAGCAGATCATAAACGTGACGGGAGACCCTGTCTCTCGG GGCTTGAGGTATCTCCGCTCGAATGCAAAGAGCCTCCTCACTGTGACTAGCAATAACTCCGTGATCCACGAAAACT TTGCGCAGTTGCTGATTCTGGAGCTGACGGGCGACCGAGACTTGATCGAACTCGAGCCTTGGGATCGAACATGGA ACTACTTCATCGGGGTTCCTCAGTCGCCGACGAGCACCTTCCCCAACGACCTGGACACCACCTCTATCGCGCTCTCG GTCCTTCCCATTCATAAGGACGTCGTTGCCGATGTGATGGACGAGATTATGCTTCTCCTAGACAACGACGGGATAG TCCCAACATATTTTGATCCCACTCGCCCTCGAGTCGACCCAGTCGTGTGTGTGAATGTACTCAGCCTGTTTGCCCAA AACGGCCGAGAATCCGAGTTACTCGCCACCTTCAACTGGGTGCTGGACGTGCTGCGACATAGAGCCTACCTGCAG GGCACGAGATATTACATCAGTCCGGACGCCTTCTTGTACTTTCTAGCCAGACTCTCGGTCTTTCTGAGGATGAGTCC 
ACTCCGCGCTCGGCTAATGCCTCTCCTGGAAGAAAGAGTGTATGAGCGAATTGGTGCCCATGGCGACGCCATTTC GCTGGCTATGCGGATCTATACGTGTAAGCTGCTCGGGATGTCGAATATGCTCGATGAAAGAGCATTGCGGGACAT GCAGTGTGAGGATGGCGGCTTCCCTACAAGTTGGGTCTATAGATTTGGATCGACCGGAGTGAAGATTGGGAACA GGGGGTTGACTACTGCACTTGCAATAAAGGCCATTGAGATGCCTCTCGCTTCGCTTTGGAAGTCGTGGGGATTGA CGACTGACATTCGATAA - OJJ98394.1 optimized cDNA SEQ ID NO: 22 ATGCCGTCGGTTAAAGCGTTGGTTCTGGATTTTGCGGGTGTGTTGTGTTCTTGGACTCCACCGGCGGAAAGCCCGT TGTCCCCAGCGCAGCTGAAGCAGCTGATGAGCAGCGAGATCTGGTTTGAGTATGAGCGTGGCCGCTATAGCGAA GAAGAGTGTTATGCAAAATTGGTGGAGCGTTTCTCTATCTCGGCCGCAGATATGGCGAGCACGATGGAACAGGCC CGTCAATCGCTGGAGTTGAACCACGCCGTGCTGCAATTAGTTTCCGAGATTCGTAAACGTAATCCGGGCTTAAAGG TTTACGGTATGACTAATACCCCGCATGCAGAGCAAGATTGTGTGAACCGTATTGTCAATAGCTATCCGGTTTTTGAT CATGTCTACCTGAGCGGTCTGGTGGGTATGCGCAAACCGGATCTGGGCTTTTACCGTTTCGTTCTGGCAGAGACTG GTCTGCGCCCGGATGAAGTCGTGTTCGTTGACGACAAGACCGAAAATGTCCTGGTGGCTCAATCCGTTGGCATGC ATGGTGTGGTGTTCCAAAATGTAACCGACTTCAAACAACAGATTATCAATGTCACGGGTGATCCTGTCAGCCGTGG TTTGCGCTACTTGCGTTCCAACGCGAAGTCTCTGCTCACTGTTACCAGCAATAACAGCGTTATCCATGAGAATTTCG CGCAGCTGCTGATCCTGGAACTGACGGGCGACCGTGACCTGATTGAACTGGAACCGTGGGACCGTACGTGGAACT ACTTTATCGGCGTGCCGCAAAGCCCGACCAGCACCTTTCCGAACGACCTGGATACGACCAGCATTGCCCTGAGCGT TCTGCCGATTCACAAAGATGTGGTTGCGGACGTGATGGATGAGATTATGCTGCTGCTGGACAATGACGGTATTGT CCCGACCTACTTCGATCCAACCCGTCCGCGTGTTGATCCTGTTGTGTGCGTCAACGTTCTGAGCCTGTTCGCACAGA ACGGTCGCGAGTCCGAATTGCTGGCGACGTTCAACTGGGTTTTGGACGTTCTGAGACACCGTGCGTATTTGCAGG GTACGCGCTATTATATCAGCCCGGATGCCTTTCTGTATTTTCTGGCGCGCCTGTCTGTGTTTCTGCGTATGTCTCCGT TGCGCGCTCGTCTGATGCCGCTGCTGGAAGAACGCGTTTATGAGCGTATCGGCGCACACGGCGATGCTATTAGCC TGGCGATGCGCATTTACACCTGTAAGCTGCTGGGCATGAGCAATATGCTGGACGAGCGTGCACTGCGTGACATGC AGTGTGAAGATGGTGGTTTCCCAACCAGCTGGGTGTACCGTTTTGGTAGCACGGGCGTGAAAATTGGTAACCGTG GCTTGACGACCGCACTGGCCATTAAGGCCATCGAAATGCCGCTGGCCAGCCTTTGGAAAAGCTGGGGCCTGACCA CCGATATTCGCTAA GAO87501.1 - GAO87501.1 protein SEQ ID NO: 23 MTRQKSPQYKAIIFDLGDVFFTWDAPKDTAVLPNLFKKMLTSPTWSDYERGKLSEESCYERLAEQFDVDSSEIARSLRKA 
QQSLTTDAAIVSLISEIRALAGHIAIYAMSNISAPAYAAVLQTQPEMGIFDGVFPSGCYGTRKPELLFYKKVLQEIAVPPNQ IIFIDDQLENVVSAQSTGMHGIVYTGAGELSRQLRNLVLDPVQRGREFLRRNAGALYSICETGQVIRENFSQLLILEATGD RSLVNLEYQQRSWNFFQGGPPSTSETFPDDVDTTSIALMILPADDNTVNSVLGEISEVANDEGIVNTYFDQTRQRIDPA VCVNVLRLFYTYGRGATLPLTLQWVSDVLEHRAHLHGTRYYPSPEVFLYFVSQLCRFSKREPTLQLLETLLTDRLKERIQVK ADTLSLAMRILACLSVGISQVEVDVRELLALQCKDGSWEPGSFYRFGSSKMNVGNRGLTTALATRAVELYQGTRIRSKG TE - GAO87501.1 cDNA SEQ ID NO: 24 ATGACCCGACAGAAATCGCCTCAATACAAAGCAATCATCTTTGACCTAGGGGATGTCTTTTTCACCTGGGACGCCC CCAAAGACACTGCTGTCTTGCCCAACCTCTTCAAGAAAATGCTTACCTCGCCAACCTGGTCAGATTACGAGCGCGG CAAGTTGAGCGAAGAAAGCTGCTACGAGAGACTGGCCGAACAGTTTGACGTTGACTCGTCGGAAATCGCGCGCA GCTTAAGGAAAGCACAGCAGTCTCTTACCACAGACGCAGCAATCGTGAGCCTGATATCAGAGATCAGAGCGTTGG CCGGACATATTGCCATCTACGCCATGTCCAACATTTCCGCCCCAGCTTATGCAGCTGTGCTCCAGACTCAGCCCGAA ATGGGCATCTTTGACGGAGTGTTCCCGTCTGGATGCTATGGGACGAGGAAGCCGGAGCTGTTGTTCTATAAGAAA GTCTTGCAGGAGATTGCAGTGCCGCCAAATCAGATCATCTTTATTGATGATCAGCTAGAGAATGTAGTTTCTGCGC AGTCAACAGGTATGCACGGCATTGTCTACACCGGTGCGGGTGAGCTCAGTCGACAGCTCAGAAATCTGGTGTTGG ACCCTGTACAAAGGGGTCGAGAGTTTCTACGGCGCAATGCTGGGGCATTGTATAGTATCTGCGAGACTGGTCAAG TCATCCGGGAAAACTTCTCGCAGCTGCTCATCCTAGAGGCGACGGGTGATAGAAGCCTGGTCAACCTTGAATATCA GCAGCGGAGCTGGAATTTCTTTCAAGGAGGTCCCCCTTCTACGTCGGAAACATTCCCAGATGATGTCGACACAACA TCCATTGCCTTGATGATTCTCCCTGCCGATGATAACACAGTCAACTCGGTTCTCGGCGAGATTTCCGAGGTAGCTAA TGACGAGGGCATTGTAAATACGTACTTTGACCAGACCCGACAGCGAATCGACCCAGCAGTCTGCGTCAATGTCCTC CGTCTCTTTTATACCTACGGCCGGGGCGCCACTCTCCCATTGACCCTCCAGTGGGTGTCCGACGTTCTTGAGCATCG TGCGCACTTACATGGTACGCGATACTACCCCAGCCCGGAGGTTTTCCTCTACTTTGTCAGTCAACTCTGCCGGTTCT CCAAGAGGGAACCGACGCTGCAGCTGCTGGAGACGTTGCTCACGGATCGCCTCAAGGAGCGCATTCAGGTCAAG GCAGACACTCTGTCACTGGCTATGCGGATCCTGGCATGCTTGTCTGTGGGTATATCACAAGTTGAAGTGGATGTCC GAGAGCTGCTCGCCTTGCAATGCAAGGATGGATCGTGGGAACCCGGCTCGTTTTACCGGTTTGGGTCGTCCAAGA TGAACGTTGGTAATCGAGGTCTTACGACTGCGTTGGCGACTAGGGCGGTTGAGTTGTACCAGGGGACTAGAATAC GCTCTAAGGGCACCGAGTAG - GAO87501.1 optimized cDNA SEQ ID NO: 25 
ATGACTCGCCAAAAAAGCCCTCAATACAAAGCAATTATCTTCGATCTGGGTGACGTTTTCTTCACCTGGGATGCGC CGAAAGATACGGCCGTACTGCCGAACCTGTTCAAGAAAATGCTGACCTCGCCGACCTGGAGCGACTATGAGCGTG GTAAGCTGTCTGAGGAAAGCTGTTACGAACGCTTGGCCGAGCAATTTGACGTGGACAGCAGCGAGATCGCGCGT AGCCTCCGTAAAGCGCAGCAAAGCCTGACGACCGACGCAGCCATCGTGAGCCTGATCAGCGAGATCCGCGCATTG GCGGGTCACATTGCTATCTATGCTATGTCTAACATTTCTGCGCCAGCATACGCAGCGGTGTTACAGACCCAGCCGG AAATGGGTATCTTTGATGGTGTTTTTCCGAGCGGCTGCTATGGTACGCGTAAACCGGAACTGCTGTTTTACAAAAA AGTGCTTCAAGAAATTGCGGTTCCGCCGAATCAGATTATCTTCATTGACGATCAGCTGGAAAACGTCGTCAGCGCA CAGTCCACGGGCATGCATGGCATTGTTTACACCGGTGCCGGTGAGCTGAGCCGTCAACTGCGTAATCTGGTCCTG GACCCGGTGCAGCGTGGTCGTGAGTTCCTGCGCCGTAATGCTGGCGCCCTGTACAGCATTTGTGAGACTGGCCAA GTTATCCGTGAGAACTTCAGCCAGCTGCTGATTCTGGAAGCAACCGGCGATCGTTCGCTGGTGAACCTGGAGTATC AACAACGTTCCTGGAACTTCTTTCAGGGTGGCCCTCCATCCACGAGCGAAACTTTTCCGGATGATGTTGACACGAC CTCAATCGCGCTGATGATTTTACCGGCGGACGATAATACCGTCAATAGCGTCCTGGGTGAAATCAGCGAAGTCGC GAATGACGAGGGCATTGTGAATACCTATTTCGATCAGACCCGCCAACGTATCGATCCGGCCGTGTGTGTCAACGT GTTGCGCCTGTTTTACACCTATGGTCGTGGCGCTACGCTGCCGTTGACCCTGCAATGGGTTAGCGACGTGCTGGAG CACCGTGCGCATCTGCACGGCACCCGCTACTATCCGTCCCCAGAGGTTTTCCTGTACTTTGTCTCTCAGCTGTGCCG TTTTTCCAAGCGCGAACCGACCCTGCAGCTGCTGGAAACGCTGTTGACCGACAGACTGAAGGAACGCATCCAAGT TAAGGCAGATACGCTGAGCTTGGCAATGCGTATTTTGGCGTGCCTGAGCGTGGGCATCAGCCAGGTTGAGGTTGA CGTCCGCGAACTGCTGGCGCTGCAGTGCAAGGACGGTAGCTGGGAGCCGGGTAGCTTCTACCGTTTCGGTAGCA GCAAGATGAATGTCGGTAACCGCGGTCTGACGACCGCTTTGGCGACCCGTGCGGTTGAGCTGTACCAGGGTACGC GTATTCGTAGCAAGGGCACCGAGTAA XP_008034151.1 - XP_008034151.1 protein SEQ ID NO: 26 MASPHRRYTTLILDLGDVLFSWSSKTNTPIPPKKLKEILSSLTWFEYERGRISQAECYDRVSSEFSLDAATIAEAFQQARDS LRPNEEFLALIRELRQQTHGQLTVLALSNISLPDYEYIMALDSDWTSVFDRVFPSALVGERKPHLGAYRRVISEMHLDPET TVFVDDKLDNVVSARSLGMHGVVFDSQENVFQTLRNIFGDPIHRGRDYLRRHAGRLETSTDAGVVFEENFTQLIIYELT NDKSLITTSDCPRTWNFFRGKPLFSASFPDDVDTTSVALTVLRPPRTLVNSILDEMLEYVDADGIMQTYFDHSRPRMDP FVCVNVLSLFYEYGRGQDLPKTLEWVYEVLLHRAYIGGSRYYMSADCFLFFMSRLLQRITDPAVLNRLRPLFVERMHERV 
SAPGDSMELAFRILAGSSVGIQFPRDLEKLLAAQCADGGWDLCWFYQYGSTGVKAGNRGLTTALAIKAIESAIARPPSP ALSAVSSSKLEVPKPILQRPLSPRRLGDFLMPWRRAQREVAVSS - XP_008034151.1 - cDNA SEQ ID NO: 27 ATGGCTTCACCTCACCGCAGGTATACGACACTCATCCTAGACCTGGGCGACGTCCTCTTCTCTTGGTCATCCAAGAC CAACACACCTATCCCTCCCAAGAAGCTGAAGGAGATCCTCTCGTCCCTGACCTGGTTCGAGTACGAGCGCGGTCGG ATATCACAGGCCGAGTGCTATGACCGGGTCAGCTCCGAGTTCAGTCTTGACGCTGCCACCATCGCAGAAGCGTTCC AGCAGGCTCGCGACTCTCTGCGACCGAACGAAGAGTTCCTGGCGTTGATTCGCGAACTCCGCCAACAAACGCATG GTCAGCTTACCGTCCTCGCGCTCTCGAACATCTCACTCCCCGACTATGAATACATCATGGCTCTCGACTCGGACTGG ACGTCGGTCTTCGACCGCGTCTTCCCTTCTGCCCTCGTCGGCGAGCGCAAGCCACATCTGGGGGCGTACCGCCGTG TCATCTCTGAGATGCACCTAGACCCAGAAACGACCGTCTTTGTGGACGACAAGCTGGACAACGTGGTGTCCGCGC GATCGCTCGGGATGCACGGCGTGGTCTTCGACTCCCAGGAGAACGTCTTCCAGACGCTGAGGAATATCTTCGGCG ACCCGATACATCGCGGACGTGACTATCTCCGCAGGCATGCCGGTCGTCTGGAGACATCTACGGACGCCGGCGTTG TCTTCGAGGAAAACTTTACGCAGCTCATCATCTACGAACTAACAAATGACAAATCCCTCATCACGACATCAGACTGT CCCCGCACTTGGAACTTCTTCCGCGGGAAGCCCTTGTTCTCGGCCTCGTTTCCCGACGATGTGGACACGACGTCGG TTGCCCTGACAGTGTTGCGCCCACCCCGCACGCTTGTCAACTCGATCTTGGACGAGATGCTAGAGTATGTCGACGC CGACGGCATCATGCAGACCTACTTCGACCACTCGCGCCCGCGGATGGATCCGTTCGTCTGTGTCAACGTCCTGTCG CTGTTCTACGAGTACGGCCGGGGACAGGACCTCCCGAAGACCCTCGAATGGGTATACGAGGTTCTGCTGCACCGC GCCTACATCGGCGGCTCGCGGTACTACATGTCCGCGGACTGCTTCCTCTTCTTCATGAGCCGCCTTCTCCAACGTAT CACCGACCCAGCCGTCCTGAACCGCCTCCGCCCGTTGTTCGTCGAGCGCATGCACGAACGTGTCAGCGCACCGGG CGACTCCATGGAGCTCGCGTTCCGCATCCTCGCTGGCTCGTCCGTCGGCATCCAGTTCCCACGTGACCTGGAGAAG CTCCTCGCCGCGCAGTGCGCCGACGGCGGCTGGGACCTGTGCTGGTTCTACCAGTATGGGTCCACCGGCGTGAAG GCAGGCAACCGCGGGCTCACCACCGCGCTCGCCATCAAGGCTATCGAGAGCGCTATCGCGCGCCCTCCGTCCCCC GCTCTATCAGCTGTATCGTCGTCGAAACTGGAAGTGCCGAAACCAATTCTCCAGCGTCCCCTCAGCCCGCGCCGGC TTGGCGACTTCCTGATGCCCTGGAGGAGAGCACAGCGCGAGGTCGCGGTTTCCAGCTAG - XP_008034151 - optimized cDNA SEQ ID NO: 28 ATGGCTAGCCCGCACCGTCGCTATACTACTCTGATTCTGGATTTGGGTGATGTTTTGTTTAGCTGGAGCAGCAAAA CCAATACGCCTATTCCGCCGAAAAAGCTGAAAGAAATCCTGTCTAGCCTGACCTGGTTCGAGTACGAGCGCGGTC 
GCATTTCTCAAGCCGAGTGCTATGACCGTGTGAGCTCTGAGTTTAGCCTGGACGCAGCGACCATTGCAGAGGCATT CCAACAGGCTCGTGACTCGCTGCGCCCGAACGAAGAATTTCTGGCGTTGATTCGTGAGCTGCGCCAGCAGACCCA CGGCCAACTCACCGTTCTGGCACTGAGCAACATCTCCCTGCCGGATTACGAGTACATCATGGCTCTGGATAGCGAT TGGACCAGCGTCTTTGATAGAGTTTTCCCGAGCGCGCTGGTTGGTGAGCGTAAGCCGCATCTGGGTGCTTACCGTC GTGTCATTAGCGAGATGCATCTGGACCCGGAGACTACGGTGTTTGTGGACGACAAACTGGACAACGTTGTCTCCG CGCGCAGCCTGGGTATGCACGGCGTCGTTTTTGACTCACAAGAAAATGTTTTCCAGACGCTGCGTAACATTTTCGG TGACCCTATCCACCGTGGCCGCGACTATTTGCGTCGTCATGCCGGTCGTTTGGAAACCAGCACCGACGCGGGCGTT GTTTTTGAAGAAAACTTCACCCAGCTGATCATCTACGAACTGACGAATGACAAGAGCCTGATCACCACGAGCGATT GTCCGCGCACCTGGAACTTCTTCCGTGGTAAGCCGCTGTTTAGCGCGTCCTTCCCAGACGATGTCGATACGACTTC GGTGGCCCTGACCGTTCTGCGCCCACCGCGCACCCTGGTAAACAGCATCCTGGACGAAATGTTAGAATACGTCGA TGCGGATGGTATTATGCAGACCTATTTCGACCACAGCCGTCCGCGCATGGACCCGTTTGTGTGTGTGAATGTGTTG AGCCTGTTCTATGAGTACGGCCGTGGTCAAGATCTGCCAAAAACCCTGGAATGGGTCTACGAAGTCCTTCTGCATC GTGCCTACATCGGTGGCTCCCGTTATTACATGAGCGCAGATTGCTTTTTGTTCTTTATGTCTCGTCTGCTGCAGCGC ATCACGGACCCTGCCGTGCTGAATCGTCTGCGTCCGCTGTTCGTGGAGCGTATGCACGAGCGCGTGTCTGCCCCG GGTGACAGCATGGAACTGGCGTTCCGTATCCTGGCGGGCAGCAGCGTGGGTATTCAATTTCCGCGTGATTTGGAG AAACTGCTGGCTGCGCAGTGTGCGGACGGTGGCTGGGATCTGTGCTGGTTTTATCAATACGGTAGCACCGGCGTT AAGGCCGGCAATCGTGGCCTGACGACGGCACTGGCAATTAAGGCCATTGAGTCCGCGATTGCGCGTCCGCCGAG CCCGGCATTGAGCGCGGTCAGCAGCAGCAAACTGGAAGTGCCGAAGCCGATCTTGCAGCGTCCACTGAGCCCGC GTCGTCTGGGTGACTTCCTGATGCCGTGGCGCCGTGCGCAACGCGAAGTCGCGGTTAGCTCCTAA XP_007369631.1 - XP_007369631.1 protein SEQ ID NO: 29 MASIHRRYTTLILDLGDVLFRWSPKTETAIPPQQLKDILSSVTWFEYERGRLSQEACYERCAEEFKIEASVIAEAFKQARGS LRPNEEFIALIRDLRREMHGDLTVLALSNISLPDYEYIMSLSSDWTTVFDRVFPSALVGERKPHLGCYRKVISEMNLEPQT TVFVDDKLDNVASARSLGMHGIVFDNQANVFRQLRNIFGDPIRRGQEYLRGHAGKLESSTDNGLIFEENFTQLIIYELTQ DRTLISLSECPRTWNFFRGEPLFSETFPDDVDTTSVALTVLQPDRALVNSVLDEMLEYVDADGIMQTYFDRSRPRMDPF VCVNVLSLFYENGRGHELPRTLDWVYEVLLHRAYHGGSRYYLSPDCFLFFMSRLLKRADDPAVQARLRPLFVERVNERV GAAGDSMDLAFRILAAASVGVQCPRDLERLTAGQCDDGGWDLCWFYVFGSTGVKAGNRGLTTALAVTAIQTAIGRPP 
SPSPSAASSSFRPSSPYKFLGISRPASPIRFGDLLRPWRKMSRSNLKSQ - XP_007369631.1 cDNA SEQ ID NO: 30 ATGGCCTCAATCCACCGTCGATACACTACTCTCATCCTCGACCTCGGCGACGTACTCTTTCGTTGGTCTCCAAAGAC TGAGACCGCCATTCCACCTCAACAACTCAAGGATATCCTCTCCTCTGTCACCTGGTTTGAGTACGAACGCGGCAGA CTATCCCAGGAAGCATGCTACGAGCGCTGCGCCGAGGAGTTCAAGATAGAGGCCTCGGTCATTGCAGAAGCCTTT AAGCAGGCTCGCGGGTCACTGCGGCCCAACGAGGAGTTCATCGCCTTGATCCGTGACCTCCGCCGTGAGATGCAC GGTGACCTTACCGTTCTTGCCCTCTCCAACATCTCCCTCCCCGACTACGAATACATCATGTCGCTAAGCTCAGATTG GACGACCGTCTTCGATCGCGTATTCCCCTCTGCACTCGTTGGCGAGCGCAAGCCTCATCTGGGATGCTATCGCAAG GTCATCTCGGAGATGAACCTAGAACCTCAGACGACTGTGTTCGTGGATGACAAGCTTGACAACGTCGCGTCTGCTC GCTCACTTGGTATGCACGGCATCGTGTTTGACAACCAAGCCAACGTCTTCCGCCAACTCCGCAATATCTTCGGAGA CCCCATCCGCCGTGGCCAAGAGTATCTCCGTGGGCATGCTGGCAAACTCGAGTCTTCGACCGACAACGGGTTGAT CTTCGAGGAGAACTTCACACAGCTGATCATCTACGAGTTGACGCAAGACAGGACTCTCATCTCGCTTTCAGAATGT CCTCGTACTTGGAATTTCTTCCGAGGCGAACCGCTATTCTCGGAGACCTTCCCGGATGATGTCGACACAACATCTGT GGCGTTGACGGTATTGCAACCGGACAGAGCACTGGTCAACTCCGTTCTAGACGAGATGCTGGAGTATGTCGACGC CGATGGCATCATGCAGACATACTTCGATCGTTCACGACCACGCATGGACCCCTTCGTCTGCGTGAACGTACTCTCCC TGTTCTACGAGAACGGTCGTGGTCACGAGCTCCCTCGCACATTGGACTGGGTCTACGAGGTGCTCCTCCATCGCGC GTACCACGGCGGTTCGCGTTATTACCTGTCGCCCGACTGCTTTCTATTCTTCATGAGCCGCCTACTCAAGCGCGCAG ACGATCCAGCAGTCCAGGCTCGGCTCCGCCCGCTCTTCGTCGAGCGGGTGAACGAGCGAGTAGGCGCCGCTGGC GACTCGATGGACCTCGCCTTCCGCATCCTCGCCGCAGCGTCTGTTGGCGTCCAGTGCCCCCGCGATCTGGAAAGGT TGACTGCCGGGCAATGCGACGACGGTGGATGGGACCTCTGCTGGTTCTACGTGTTCGGCTCGACGGGCGTGAAG GCGGGCAACCGCGGCCTCACAACGGCCCTCGCTGTCACGGCCATACAGACGGCCATCGGACGCCCCCCTTCGCCC AGTCCCTCCGCGGCCTCCTCGTCTTTCAGACCTAGTTCCCCTTACAAATTCCTAGGCATTTCGCGCCCAGCTAGCCCC ATTCGCTTTGGCGACTTACTTCGCCCATGGCGGAAGATGAGCAGGTCGAACTTGAAGTCTCAATGA - XP_007369631.1 optimized cDNA SEQ ID NO: 31 ATGGCAAGCATTCATCGTCGCTATACTACGCTGATTCTGGACCTGGGTGATGTTTTGTTCCGCTGGAGCCCGAAAA CCGAGACTGCGATTCCTCCGCAACAACTGAAAGACATCCTGAGCAGCGTCACCTGGTTCGAGTACGAGCGTGGCC GTCTGAGCCAAGAGGCTTGCTACGAGCGTTGCGCCGAAGAGTTCAAGATTGAAGCCAGCGTGATTGCGGAAGCG 
TTCAAACAAGCGCGTGGTAGCCTGCGTCCGAACGAAGAATTTATCGCACTGATCCGTGATCTGCGTCGCGAGATG CATGGTGACCTGACCGTTCTGGCTCTGAGCAATATCTCGTTGCCGGATTACGAGTATATTATGTCTCTGAGCAGCG ACTGGACGACGGTCTTTGATCGTGTGTTCCCGTCAGCTCTGGTGGGCGAGCGTAAACCGCACTTGGGTTGCTATCG CAAGGTCATCAGCGAGATGAACCTGGAACCTCAGACCACGGTCTTTGTGGACGATAAACTGGATAATGTCGCAAG CGCGCGTAGCCTGGGTATGCACGGTATCGTGTTTGATAATCAAGCGAATGTGTTTCGCCAGCTGCGTAATATTTTC GGTGATCCAATCCGTCGCGGTCAAGAGTATCTGCGTGGCCATGCCGGTAAATTGGAGAGCAGCACGGACAATGG TTTGATCTTTGAAGAGAACTTCACCCAGCTGATCATTTATGAACTGACCCAGGACCGCACGTTGATCAGCCTGTCG GAGTGTCCGCGTACCTGGAACTTCTTCCGTGGCGAGCCGTTGTTTTCTGAAACCTTCCCGGACGACGTGGACACCA CGTCCGTTGCACTGACGGTTCTGCAACCGGATCGCGCACTGGTTAACAGCGTGCTGGACGAAATGCTGGAATATG TCGATGCGGATGGCATCATGCAGACGTATTTCGACCGCTCGCGTCCGCGTATGGACCCGTTTGTTTGCGTCAACGT ACTGAGCCTGTTTTACGAGAACGGTCGTGGTCACGAACTGCCGCGCACTCTGGATTGGGTGTACGAAGTCCTGCTC CACCGCGCCTACCACGGTGGTTCCCGTTACTACCTGAGCCCGGACTGTTTCTTGTTTTTTATGAGCCGTCTGCTGAA ACGTGCAGACGACCCAGCGGTTCAGGCGAGATTGCGTCCGCTGTTTGTGGAACGCGTTAACGAACGTGTTGGCGC GGCCGGTGATAGCATGGACCTGGCGTTTCGCATTCTGGCCGCAGCGAGCGTGGGTGTGCAGTGTCCGCGCGACCT GGAGCGTCTGACCGCTGGTCAATGCGATGATGGCGGCTGGGATCTGTGTTGGTTCTACGTTTTCGGCAGCACCGG CGTTAAGGCCGGTAATCGTGGTCTGACCACGGCGCTGGCAGTCACCGCGATCCAGACCGCCATCGGCCGTCCGCC TAGCCCGAGCCCGTCCGCGGCAAGCTCCAGCTTCCGCCCGAGCAGCCCGTACAAGTTTCTGGGTATTAGCCGTCCG GCGTCCCCAATTCGCTTCGGTGACCTTCTGCGTCCGTGGCGTAAAATGTCTCGCTCTAACCTGAAGTCCCAGTAA ACg006372 - ACg006372 protein SEQ ID NO: 32 MRRNVLNKATHSQSPLKPNITTLIFDLGDVLLTWSDSTPKSPLPPKIVKGILRSLTWFEYEKGNLTESQTYGQVAQEFGV DASEVKASFEAARDSLKSNPMLLQLIRSLKDSGHVIYAMSNISAPDWEFLKTRADLSDWALFDRVFPSAEAHDRKPNIG FYQHVINETGLNPSNTVFVDDRIENVVSARSAGMHGIVFDDINNVIRQLKNLCEDPIHRARSFLYANKKCLNTVSTDGTI VSENFSQLLILEAIGDESLVDFVRHEGRFNFFQGEAKLIMTNHYPDDFDTTSIGLTVVPYIDDKTRNRVMDEILAYQSEDG IVLVYFDHKRPRIDPVVCVNVLTLFYRYGRGHQLQKTLDWVEQVLINRACASGTFYYATEEQFLFFLSRLIQSSPDVRQRL EGVFKRRVVERFGADGDALAMAMRIHTAASVGLVDHVDLDKLFALQQNDGSWRDSAFYRFPSARQLASNDGLTTAIA IQAIQAAERLREDGNVL - ACg006372 cDNA SEQ ID NO: 33 
ATGAGGCGAAACGTACTCAACAAAGCAACACATTCTCAGTCACCATTGAAGCCCAACATCACGACGCTCATATTTG ACTTGGGCGACGTACTTCTCACGTGGTCCGACTCAACACCTAAATCTCCACTGCCCCCAAAAATTGTCAAGGGAAT ACTACGTTCACTGACCTGGTTTGAGTACGAGAAAGGGAACTTGACAGAGTCCCAGACCTACGGGCAAGTTGCTCA GGAATTTGGAGTGGATGCTTCCGAAGTCAAAGCTTCCTTCGAAGCAGCTCGCGACTCGCTCAAGAGCAACCCAAT GCTTCTCCAGTTGATCCGTAGCCTCAAAGACTCTGGCCACGTCATTTACGCAATGTCTAACATATCTGCTCCCGACT GGGAATTTTTGAAGACGCGGGCAGACCTCTCAGATTGGGCTCTTTTTGACAGAGTCTTCCCTTCTGCCGAAGCGCA TGACCGCAAGCCGAACATTGGTTTCTATCAGCACGTCATAAACGAGACTGGTCTGAACCCGTCCAACACTGTCTTT GTCGATGACAGGATCGAGAATGTTGTATCCGCACGCTCAGCAGGAATGCACGGGATCGTGTTTGACGACATAAAT AATGTGATCCGACAGTTGAAAAACCTCTGCGAGGATCCGATTCACCGCGCACGATCTTTTCTTTATGCAAATAAGA AGTGTTTGAATACGGTTAGCACAGATGGCACAATTGTGAGCGAGAACTTCTCGCAATTGTTGATCCTTGAGGCCAT TGGCGACGAAAGCCTAGTCGACTTTGTGAGGCATGAGGGCCGATTCAACTTCTTCCAGGGGGAGGCCAAACTCAT CATGACGAATCACTACCCCGATGATTTCGATACTACATCCATAGGTTTAACCGTTGTTCCATATATTGACGACAAGA CTAGAAATAGAGTTATGGATGAGATCCTGGCCTACCAAAGCGAAGACGGCATTGTGCTGGTATACTTTGACCACA AGCGCCCCAGGATTGATCCTGTTGTCTGTGTCAATGTCCTCACCCTCTTCTATAGGTATGGCCGTGGGCACCAGCTT CAAAAGACACTGGATTGGGTCGAACAGGTCCTGATCAACCGTGCGTGTGCGTCCGGCACGTTCTATTACGCAACA GAGGAACAATTCCTCTTTTTCCTCTCCCGCCTGATCCAAAGCTCTCCGGACGTACGACAGCGGTTGGAAGGGGTCT TTAAAAGAAGAGTAGTCGAGCGGTTTGGTGCAGACGGCGACGCTCTCGCTATGGCGATGCGCATTCACACCGCGG CGAGCGTGGGCCTCGTTGACCATGTCGATCTTGACAAGCTGTTCGCATTGCAGCAAAATGACGGTTCTTGGAGAG ACAGCGCTTTCTACAGATTTCCGTCGGCCAGGCAACTGGCTAGTAACGACGGCTTGACGACTGCAATCGCTATTCA GGCCATTCAAGCTGCGGAGAGGCTCAGGGAGGATGGGAACGTGCTTTGA - ACg006372 optimized cDNA SEQ ID NO: 34 ATGCGCCGTAATGTCCTGAACAAAGCAACCCATAGCCAGTCACCGTTGAAACCGAATATCACCACGCTGATTTTTG ACTTGGGCGATGTCCTGCTGACCTGGAGCGACAGCACTCCGAAATCTCCGTTGCCGCCGAAGATCGTCAAGGGCA TCCTGCGTAGCCTGACTTGGTTCGAGTACGAAAAGGGCAATTTGACCGAAAGCCAAACGTATGGTCAGGTCGCGC AAGAATTTGGTGTGGATGCCTCTGAAGTGAAGGCCAGCTTTGAGGCTGCGCGTGATAGCTTGAAATCGAATCCGA TGCTGCTGCAGCTGATTCGCAGCCTGAAAGATTCCGGTCACGTGATCTACGCCATGAGCAACATCAGCGCGCCTGA 
TTGGGAATTTCTGAAAACCCGCGCTGACCTGTCTGACTGGGCCCTGTTTGACCGCGTGTTCCCGTCTGCCGAGGCA CATGACCGCAAACCGAACATTGGCTTTTACCAACACGTGATCAATGAAACGGGTCTGAATCCATCCAATACCGTGT TCGTTGACGACCGTATTGAAAACGTTGTTAGCGCACGTAGCGCTGGTATGCACGGTATCGTTTTCGATGACATTAA CAACGTCATTCGCCAGCTGAAGAATCTGTGCGAGGACCCAATTCACCGTGCACGTTCCTTTTTGTATGCGAACAAA AAGTGCCTGAATACCGTGAGCACCGATGGTACGATCGTCAGCGAGAACTTTAGCCAGCTTCTGATTCTGGAAGCC ATTGGTGACGAGTCCCTGGTAGACTTCGTCCGCCATGAGGGCCGTTTTAACTTCTTCCAGGGTGAGGCAAAGCTGA TCATGACCAATCACTACCCGGACGATTTCGATACCACGAGCATTGGTCTGACCGTTGTCCCGTATATCGATGACAA AACGCGTAATCGTGTGATGGATGAAATCCTGGCGTATCAGTCCGAGGATGGTATCGTTCTGGTGTACTTCGATCAC AAGCGTCCGCGCATTGACCCGGTCGTTTGTGTGAACGTTCTGACGCTGTTCTACCGCTATGGTCGTGGCCATCAAC TGCAGAAAACCCTGGACTGGGTTGAGCAAGTCCTGATTAATCGTGCGTGTGCGAGCGGCACGTTCTACTACGCGA CCGAAGAACAGTTCCTGTTTTTCCTGAGCCGTCTGATTCAGTCGAGCCCTGACGTGCGCCAACGTCTGGAAGGCGT GTTCAAGCGTCGTGTCGTTGAGCGCTTTGGTGCGGACGGTGATGCCCTGGCAATGGCGATGCGTATCCATACCGC AGCGAGCGTTGGCCTGGTGGACCACGTGGATCTGGATAAGCTGTTCGCGCTGCAACAGAACGACGGTAGCTGGC GCGATAGCGCGTTTTATCGTTTTCCGAGCGCGCGTCAACTCGCGAGCAACGACGGCTTGACCACGGCAATTGCTAT TCAGGCCATCCAAGCGGCTGAGAGATTACGTGAGGATGGTAACGTTCTGTAA KIA75676.1 - KIA75676.1 protein SEQ ID NO: 35 MVRALILDLGDVLFNWDAPKSTPVSRKTLSQMLHSDIWGEYECGQLTEPESYKALASRYSCQAQDVADTFYLARESLRL DATFKTFLQDLKQRANGSLRVYGMSNISQPDYEVLLSKADDLSLFDKIFPSGHVGMRKPDLAFFRHVLREISTASEDIVFV DDNLENVTSARSLGMQGIVFRDKEDVQRQLRNLFGSPAERGREYLSINKTKLQSVTTTNIPILDNFGQLLILEATRDPDLV SMHPGQRTWNFFIGSPTLTTDAFPDDMDTTSLGLSIIPPSPEIAASVMDEIVTRLNKDGIVPTYFDSTRPRVDPIVCVNVL TLFAKYGREDELSGTIAWVRDVLYHRAYLAGTRYYASPEAFLFFFTRFTRNLRPGPRKQELTALLSQRLQERNKTPVDALA LSMRIIACLTLGIESPADDVATLTGMQCGDGGWPACVIYKYGAGGLGITNRGVSTAFAVKAITTTPLAVQPEVSVSAGA GGSSRPVGADAAAVSLRPRWRAVVQSLHPLSRVGGLVAVIFAALHFNLAWLYNVSLASRIV - KIA75676.1 cDNA SEQ ID NO: 36 ATGGTCCGCGCACTGATTCTCGATCTCGGCGACGTCCTCTTCAACTGGGACGCCCCAAAGTCAACCCCCGTTTCCCG CAAGACACTCAGCCAGATGCTGCATAGCGACATCTGGGGCGAATACGAATGTGGCCAACTGACAGAGCCGGAAA GCTACAAGGCGCTTGCCAGCCGCTATTCTTGCCAGGCTCAAGATGTTGCAGATACCTTCTATCTAGCCCGCGAATC 
GCTGAGGCTCGATGCGACCTTCAAGACCTTCCTGCAGGACTTGAAGCAGAGGGCCAACGGCTCACTTCGCGTATA TGGGATGTCCAACATCTCCCAGCCCGATTATGAGGTCCTGCTGTCCAAGGCGGATGACTTGAGCCTGTTTGACAAG ATCTTCCCATCCGGCCACGTCGGGATGCGTAAGCCTGACCTTGCGTTTTTTCGACATGTCCTGCGTGAGATCTCGAC GGCCAGCGAGGATATTGTGTTTGTTGACGACAACCTGGAGAACGTGACATCTGCCCGGTCTCTGGGCATGCAGGG GATTGTCTTTCGCGACAAGGAGGATGTACAGAGACAGCTGCGGAACCTCTTTGGCAGTCCTGCTGAACGTGGAAG GGAGTATTTGTCCATCAACAAGACAAAGCTCCAGAGCGTCACGACGACCAATATCCCCATTCTCGACAACTTTGGC CAGCTCCTTATCCTCGAAGCCACCAGAGACCCAGACCTGGTGTCCATGCATCCTGGACAGAGGACCTGGAACTTTT TCATCGGATCTCCAACTCTGACAACGGACGCCTTCCCAGACGATATGGACACCACCTCACTTGGCCTTTCTATTATA CCCCCAAGTCCCGAGATTGCAGCGTCCGTGATGGATGAGATTGTGACCCGCCTGAACAAGGACGGCATTGTCCCA ACATATTTTGACAGCACCAGACCCCGCGTCGACCCGATCGTCTGCGTCAACGTTCTCACCCTCTTCGCTAAATACGG CCGCGAAGACGAGCTGTCCGGGACCATAGCCTGGGTGCGCGATGTGCTGTATCACAGGGCCTACCTTGCAGGGA CCAGATACTACGCATCCCCAGAAGCATTCCTTTTCTTCTTCACGCGCTTCACCCGAAACCTGCGCCCGGGCCCGCGC AAGCAGGAGCTCACGGCGCTGCTGTCCCAGCGCCTGCAGGAGCGCAACAAGACGCCCGTTGACGCACTTGCGCTC TCGATGCGGATTATTGCGTGCCTCACGCTGGGTATTGAATCCCCCGCTGACGACGTGGCTACCCTCACGGGCATGC AGTGTGGGGATGGCGGGTGGCCGGCCTGTGTCATCTACAAGTACGGCGCCGGTGGGCTGGGGATCACGAACAG GGGGGTCTCGACCGCGTTTGCTGTCAAGGCAATCACTACTACTCCTTTGGCGGTGCAGCCTGAAGTTAGTGTCAGC GCAGGTGCAGGAGGCAGCAGTCGCCCTGTGGGTGCCGATGCTGCTGCAGTCTCGCTCCGCCCGAGATGGCGAGC TGTTGTGCAGAGTCTCCATCCGCTCTCTCGGGTTGGTGGGTTGGTGGCCGTCATTTTTGCTGCACTGCATTTCAACT TGGCCTGGCTTTATAATGTGTCCCTTGCTAGTAGGATCGTTTAG - KIA75676.1 optimized cDNA SEQ ID NO: 37 ATGGTTCGTGCATTGATTTTGGATTTGGGTGATGTGTTGTTTAACTGGGATGCGCCTAAGAGCACCCCGGTTTCCC GCAAGACTCTGAGCCAAATGCTGCACTCGGATATTTGGGGCGAGTACGAGTGTGGTCAACTGACTGAGCCGGAGT CCTATAAAGCCCTGGCGAGCCGCTATAGCTGCCAGGCGCAAGATGTCGCTGACACCTTTTACCTGGCGCGTGAGA GCCTGCGTCTGGACGCAACGTTTAAGACCTTCCTGCAAGATCTGAAGCAACGCGCCAACGGTTCTCTGCGTGTCTA TGGTATGAGCAATATCAGCCAGCCGGATTACGAAGTCCTGCTGAGCAAAGCTGACGATCTCAGCCTGTTTGACAA AATCTTTCCGTCGGGTCACGTTGGTATGAGAAAGCCTGACCTGGCGTTTTTCCGTCACGTTCTGCGTGAGATCAGC 
ACGGCTAGCGAAGATATTGTGTTTGTTGACGACAATTTGGAAAACGTCACGTCTGCACGCTCCCTGGGTATGCAAG GCATCGTCTTTCGTGATAAGGAAGATGTCCAGCGCCAGCTGCGCAATCTGTTCGGTTCCCCGGCAGAGCGCGGTC GTGAGTATCTGAGCATTAATAAGACCAAACTGCAGAGCGTGACCACCACCAATATCCCGATTCTGGACAACTTCGG TCAGTTGCTGATCCTGGAAGCTACCCGTGACCCGGATTTAGTCAGCATGCATCCAGGCCAACGTACGTGGAACTTC TTCATTGGCAGCCCGACCTTGACGACCGACGCGTTTCCGGACGATATGGACACGACTTCTCTGGGCCTGAGCATCA TCCCGCCGAGCCCGGAAATTGCAGCAAGCGTTATGGACGAAATCGTCACCCGTCTGAATAAAGATGGTATTGTGC CGACCTACTTCGACAGCACGCGTCCACGTGTGGACCCGATCGTCTGCGTTAACGTCCTGACCTTGTTTGCGAAATA TGGTCGTGAAGATGAACTGAGCGGCACGATTGCGTGGGTCCGCGACGTTCTGTATCATCGCGCATACCTGGCGGG CACGCGCTACTACGCGTCCCCAGAGGCCTTCCTGTTCTTCTTTACGCGTTTCACCCGCAATCTGCGTCCGGGTCCGC GTAAACAAGAACTTACGGCGCTGCTGAGCCAGCGTCTGCAGGAACGCAACAAGACGCCGGTTGACGCTCTGGCCC TGAGCATGCGTATCATCGCCTGTCTGACCCTGGGCATTGAGAGCCCGGCAGACGACGTGGCCACCCTGACCGGTA TGCAGTGTGGTGATGGTGGCTGGCCGGCGTGCGTGATCTACAAATATGGTGCGGGTGGCTTGGGTATCACGAAT CGTGGCGTTAGCACTGCCTTCGCGGTGAAAGCGATTACGACCACCCCGCTGGCAGTGCAGCCAGAAGTCAGCGTC AGCGCTGGTGCCGGCGGCTCCAGCCGCCCGGTTGGTGCGGATGCGGCAGCGGTTAGCTTGCGTCCGCGTTGGCG TGCGGTTGTGCAGAGCCTGCATCCGCTGAGCCGCGTGGGTGGCCTGGTTGCCGTGATCTTCGCGGCACTGCACTT TAACCTGGCGTGGCTGTACAACGTAAGCCTGGCTAGCCGTATTGTGTAA XP_001820867.2 - XP_001820867.2 protein SEQ ID NO: 38 MTRWKSSQYQAIIFDLGGVILTWDLPEDTVISAQIFKRMLTSQTWSDYERGNLSENGCYQRLAEDFGIDSADIAHTVRQ ARESLVTDTAIMNIISEIRAGANHIAIFAMSNISQPDYAALLLDHRGMCSFDRVFPSGCYGTRKPELSFYNKVLREIDTPPE NVIFVDDQLENVISAQSIGIHGIAYTNAAELGRQLRNLIFDPVERGREFLRRNAGEFHSITETDQIVRENFSQLLILEATGD KSLVSLEYHQKSWNFFQGNPILTTETFPDDVDTTSLALMTLPTDTKTANLLLDQILGLVNADEIVTTYFDQTRERIDPVVC VNVLRLFCTYGRGIALPLTLQWVYDVLAHRAYINGTRYYTSPESFLYFVGQLCRFSTGVLALRPLETLLIDRLKERLQVKAD PLSLAMRILTCLSVGVSQVEVDLRELLSMQCEDGSWEHCPFTRYGLSKVSIGNRGLTTAFVVKAVEMCRGS - XP_001820867.2 cDNA SEQ ID NO: 39 ATGACTCGATGGAAATCGTCCCAATACCAAGCAATTATCTTTGACCTAGGCGGTGTCATTTTAACATGGGACCTCCC GGAAGACACTGTGATATCGGCCCAGATCTTTAAGAGAATGCTCACATCGCAGACATGGTCAGATTATGAGCGCGG AAATCTCAGCGAAAATGGTTGCTACCAGAGGTTGGCCGAGGATTTTGGCATTGACTCTGCCGACATTGCACATACC 
GTTAGACAAGCACGGGAATCCCTTGTCACTGATACCGCTATCATGAACATTATATCTGAGATCAGAGCTGGGGCTA ACCATATTGCTATCTTCGCTATGTCGAACATCTCCCAACCAGATTATGCGGCTCTGCTCCTTGATCATCGCGGGATG TGCAGTTTTGACCGGGTGTTCCCATCTGGATGCTACGGGACAAGGAAACCAGAGCTCTCATTCTATAACAAAGTCT TGCGGGAGATTGACACGCCACCGGAAAACGTCATCTTTGTCGATGATCAGCTGGAAAATGTGATCTCTGCGCAGT CCATTGGCATACACGGGATTGCCTATACGAATGCTGCTGAACTCGGTCGACAGCTTAGGAACCTAATATTTGACCC TGTAGAGAGGGGTAGGGAATTCTTACGGCGCAATGCTGGAGAGTTCCATAGCATCACTGAAACCGATCAAATTGT TCGGGAAAATTTCTCACAGTTGCTCATTCTAGAAGCGACTGGTGATAAGAGTCTGGTATCTCTTGAATATCACCAG AAGAGCTGGAATTTCTTCCAAGGAAACCCTATTCTCACGACAGAGACATTCCCAGATGATGTTGACACAACATCTC TTGCCTTGATGACTCTACCTACAGACACAAAAACTGCAAATTTGTTACTCGACCAGATTTTGGGGCTAGTCAACGCT GATGAAATCGTAACAACATACTTTGACCAGACCCGAGAACGGATCGATCCAGTAGTCTGCGTCAATGTCCTTCGTC TCTTTTGCACCTACGGCCGGGGCATTGCGCTCCCTTTGACTCTTCAGTGGGTGTACGACGTCCTCGCTCATCGGGCA TATATAAACGGTACACGTTACTACACAAGTCCCGAAAGCTTCCTATACTTCGTCGGTCAACTTTGTCGATTCTCAAC AGGGGTACTGGCACTTCGGCCGCTGGAAACGTTGCTTATAGATCGTCTCAAGGAACGTCTTCAGGTCAAAGCAGA TCCTCTATCACTCGCTATGCGGATCTTGACCTGTTTGTCCGTTGGTGTGTCTCAAGTTGAAGTCGATCTCCGAGAGT TGCTCTCGATGCAGTGTGAAGATGGCTCGTGGGAACATTGTCCATTCACCCGGTATGGTTTGTCCAAAGTGAGCAT TGGCAATCGGGGCCTTACAACTGCTTTTGTGGTCAAGGCGGTTGAAATGTGTCGAGGCAGTTAG - XP_001820867.2 optimized cDNA SEQ ID NO: 40 ATGACTCGTTGGAAAAGCTCTCAATATCAGGCAATCATTTTCGATCTGGGCGGTGTTATTCTGACCTGGGACTTGC CGGAAGATACGGTTATCTCCGCGCAAATCTTTAAGCGTATGCTGACCAGCCAGACCTGGTCCGATTATGAGCGCG GTAATCTGAGCGAGAACGGCTGCTATCAACGTTTGGCGGAAGATTTCGGCATCGATAGCGCCGATATTGCCCACA CCGTCCGTCAGGCACGTGAGTCCCTGGTGACCGACACCGCCATCATGAATATCATCTCCGAGATCCGTGCAGGCGC GAACCACATCGCAATTTTCGCGATGAGCAACATCTCACAGCCGGATTACGCTGCGCTGCTGCTGGACCATCGCGGT ATGTGCAGCTTTGACCGCGTCTTTCCGAGCGGTTGTTACGGCACCCGTAAGCCTGAGCTGAGCTTCTACAATAAAG TGCTGCGTGAAATTGACACCCCGCCGGAAAATGTTATTTTCGTTGACGATCAATTGGAAAATGTGATTAGCGCGCA AAGCATTGGTATTCATGGCATTGCGTATACGAATGCCGCGGAACTGGGCCGCCAGCTGAGAAACCTGATCTTCGA TCCGGTGGAGCGCGGTCGTGAGTTCCTGCGTCGTAACGCTGGTGAGTTTCACTCTATTACGGAAACGGACCAGAT 
TGTGCGCGAGAACTTCAGCCAGCTGCTGATTCTGGAAGCGACCGGTGACAAAAGCCTGGTTAGCCTGGAATACCA CCAAAAGTCGTGGAACTTCTTCCAAGGTAACCCAATCCTGACGACGGAAACCTTCCCGGACGATGTTGACACTACT AGCCTGGCTCTGATGACGCTGCCGACGGACACCAAGACCGCGAATCTGTTGCTGGACCAGATTCTGGGTTTGGTT AATGCCGATGAAATTGTGACTACGTACTTCGACCAGACCCGTGAGCGTATCGATCCAGTGGTCTGTGTGAATGTCC TGCGCCTGTTCTGTACGTACGGCCGCGGCATCGCGCTGCCGCTGACCCTGCAATGGGTCTACGATGTGCTGGCGC ACCGCGCATACATTAACGGTACGCGTTATTACACCAGCCCGGAGAGCTTTCTGTATTTTGTCGGTCAGCTCTGTCGT TTTAGCACCGGTGTGCTGGCACTGCGTCCGCTGGAGACTCTGCTGATTGATCGTCTGAAAGAGCGCCTGCAAGTTA AAGCTGACCCGCTGAGCCTGGCAATGCGCATCCTTACGTGCTTATCTGTCGGTGTCAGCCAGGTTGAAGTGGACTT GCGTGAGTTGTTGAGCATGCAGTGCGAGGACGGTAGCTGGGAGCATTGCCCGTTCACCCGCTACGGCCTGAGCA AGGTTTCCATCGGTAACCGTGGCCTGACCACGGCGTTTGTGGTTAAAGCCGTCGAGATGTGCCGTGGCAGCTAA CEN60542.1 - CEN60542.1 protein SEQ ID NO: 41 MVRALILDLGDVLFNWDAPASTPISRKTLGQMLHSEIWGEYERGHLTEDEAYNALAKRYSCEAKDVAHTFVLARESLRL DTKFKTFLQTLKQNANGSLRVYGMSNISKPDFEVLLGKADDWTLFDKIFPSGHVGMRKPDLAFFRYVLKDISTPVEDVV FVDDNLDNVTSARSLGMRSVLFHKKDEVQRQLTNIFGSPAERGLEYLSANKTNLQSATTTDIPIQDNFGQLLILEATEDP SLVRMEPGKRTWNFFIGSPSLTTDTFPDDLDTTSLALSIVPTSPDVVNSVIDEIISRRDKDGIVPTYFDNTRPRVDPIVCVN VLSMFAKYGREHDLPATVAWVRDVLYHRAYLGGTRYYGSAEAFLFFFTRFVRNLRPGTLKQDLHALLSERVRERLNTPV DALALSMRIQACHALGFDAPADIATLITMQDEDGGWPAAVIYKYGAGGLGITNRGVSTAFAVKAITGSPVKTETNIGGD GARAVSAMSSLEARRLQPISSVGDWVRFIIASLHVHLAWLWNVLLLSKVV - CEN60542.1 cDNA SEQ ID NO: 42 ATGGTCCGCGCACTCATCCTCGATCTCGGCGATGTCCTCTTCAACTGGGACGCGCCTGCGTCCACCCCCATTTCACG CAAGACCCTCGGCCAGATGCTGCATAGTGAGATCTGGGGTGAGTATGAACGTGGCCATTTGACAGAAGACGAGG CATACAACGCACTCGCGAAGCGGTATTCCTGCGAGGCCAAGGATGTCGCACATACCTTTGTCCTGGCACGAGAAT CGCTGCGGCTCGACACGAAATTCAAAACGTTTCTGCAGACTCTAAAGCAGAATGCCAACGGCTCCCTTCGTGTCTA TGGCATGTCGAATATATCGAAACCGGATTTCGAAGTCCTGCTGGGCAAGGCCGATGACTGGACTCTGTTTGACAA GATCTTCCCCTCTGGCCATGTCGGTATGCGCAAGCCAGATCTTGCCTTCTTCCGCTATGTGCTCAAGGACATTTCAA CGCCTGTCGAGGATGTGGTGTTTGTTGACGATAACCTGGACAACGTGACGAGTGCTCGGTCTCTGGGCATGCGCA GCGTCCTCTTTCATAAGAAAGACGAGGTCCAGCGACAGCTCACCAACATCTTTGGCAGCCCTGCTGAGCGGGGCTT 
GGAGTATCTCTCCGCCAACAAGACGAATCTGCAGAGTGCTACCACGACAGATATCCCAATCCAGGATAACTTTGGC CAACTTCTGATTCTCGAGGCCACTGAAGACCCATCGCTGGTCCGCATGGAGCCCGGTAAGCGAACCTGGAATTTCT TCATCGGTTCTCCATCCCTCACAACCGACACCTTCCCCGACGATCTCGACACCACATCCCTTGCCCTCTCCATCGTAC CCACAAGCCCCGACGTCGTCAACTCGGTCATCGACGAGATTATCAGCCGTCGCGACAAGGACGGTATCGTCCCGA CTTACTTCGACAACACCCGCCCCCGCGTGGACCCAATCGTCTGCGTAAACGTCCTCTCCATGTTCGCAAAGTACGGC CGCGAGCACGACCTCCCCGCAACAGTTGCGTGGGTCCGCGACGTCTTGTATCATCGAGCATACCTCGGCGGAACA CGGTACTACGGGTCAGCTGAGGCCTTCCTCTTCTTCTTCACTCGCTTCGTTCGCAACCTCCGACCGGGAACTCTCAA GCAGGATCTACACGCATTGCTATCAGAGCGCGTGCGCGAGCGACTCAATACCCCCGTCGACGCACTCGCCCTGTCA ATGCGCATCCAGGCCTGTCATGCGCTGGGCTTTGACGCCCCCGCAGACATTGCGACGCTCATCACAATGCAGGAC GAGGACGGCGGGTGGCCGGCAGCCGTCATCTACAAGTACGGGGCCGGGGGGTTGGGGATCACGAACCGGGGTG TTTCGACTGCGTTTGCCGTAAAGGCGATTACAGGGTCGCCCGTGAAGACTGAAACCAACATAGGCGGCGATGGAG CTCGCGCTGTCTCGGCCATGTCCTCCTTGGAGGCGAGGAGGCTACAGCCGATCTCGTCGGTTGGGGACTGGGTGC GGTTTATCATTGCGTCGTTGCATGTCCATCTGGCTTGGCTTTGGAATGTTTTGCTTTTGAGCAAGGTTGTTTGA - CEN60542.1 optimized cDNA SEQ ID NO: 43 ATGGTTCGTGCGTTGATTTTGGATTTGGGTGATGTGTTGTTTAATTGGGACGCCCCTGCAAGCACTCCGATCAGCC GTAAGACCCTGGGCCAGATGCTGCATTCCGAGATTTGGGGTGAGTATGAGCGTGGTCACCTGACCGAAGATGAA GCGTACAACGCGCTGGCAAAGCGCTACAGCTGCGAGGCAAAAGACGTGGCGCATACTTTTGTTTTGGCGCGTGAA AGCCTGCGCCTGGATACCAAGTTTAAGACTTTTCTGCAGACCCTGAAACAGAACGCGAACGGCTCGCTGCGTGTTT ATGGTATGTCCAATATCAGCAAACCGGATTTTGAAGTGCTGCTGGGTAAAGCTGACGACTGGACCTTGTTCGACAA GATCTTCCCGAGCGGTCATGTCGGTATGCGCAAACCGGACCTGGCTTTCTTTCGTTACGTGCTGAAAGACATCAGC ACCCCGGTTGAGGATGTTGTGTTTGTTGACGATAACCTGGATAATGTGACGTCTGCCCGTTCCCTGGGTATGCGTA GCGTCCTGTTCCACAAAAAAGACGAAGTCCAACGTCAGCTGACCAACATTTTCGGTAGCCCTGCTGAGCGCGGTCT GGAGTATCTGTCCGCGAACAAGACCAATCTGCAAAGCGCAACCACCACCGACATCCCTATCCAAGACAACTTTGGT CAATTACTGATTCTGGAAGCCACCGAAGATCCGAGCCTGGTACGCATGGAACCGGGCAAGCGTACCTGGAATTTC TTCATTGGCTCTCCGAGCCTGACGACGGATACCTTCCCGGATGACCTGGACACGACGAGCCTCGCACTGTCCATCG TGCCGACCAGCCCAGATGTTGTTAATAGCGTGATCGATGAGATCATCAGCCGTCGCGACAAGGACGGTATTGTGC 
CGACGTACTTTGATAACACGCGCCCGCGTGTGGACCCGATTGTTTGTGTTAACGTTCTGTCTATGTTCGCGAAATAT GGCCGTGAGCACGATCTGCCGGCGACGGTCGCGTGGGTCCGCGACGTCCTCTATCATCGCGCATACCTGGGTGGC ACCAGATACTACGGTAGCGCGGAAGCCTTCCTTTTCTTCTTTACGCGCTTTGTGCGTAATCTGCGTCCGGGCACGCT GAAACAAGATCTGCACGCGTTGCTGAGCGAGCGTGTCCGTGAGCGCCTGAATACCCCGGTGGATGCGCTGGCGCT GAGCATGCGCATTCAGGCTTGCCACGCACTGGGCTTTGACGCCCCAGCTGACATCGCGACGCTGATTACCATGCAA GATGAAGATGGTGGCTGGCCGGCGGCAGTTATCTACAAATATGGTGCGGGTGGCCTGGGCATTACGAACCGTGG TGTGTCCACGGCATTCGCGGTGAAGGCAATCACGGGTAGCCCGGTTAAAACCGAAACCAACATCGGCGGCGACG GTGCCCGTGCAGTGTCGGCCATGAGCAGCCTGGAAGCCCGTCGTTTGCAGCCGATTTCTAGCGTCGGCGACTGGG TCCGTTTCATCATCGCATCACTGCACGTCCACCTGGCGTGGCTGTGGAATGTCCTGCTGCTGAGCAAAGTCGTTTAA XP_009547469.1 - XP_009547469.1 protein SEQ ID NO: 44 MSMIPRCSNLILDIGDVLFTWSPKTSTSISPRTMKSILSSTTWHQYETGHISQGDCYRLIGNQFSIDPQEVGLAFQQARD SLQPNVDFIHFIRALKAESHGTLRVFAMSNISQPDYAVLRTKDADWAVFDDIFTSADAGVRKPHLGFYKLVLGKIGADP NDTVFVDDKGDNVLSARSLGLHGIVFDSMDNVKRALRYLISDPIRRGREFLQARAGHLESETNTGIEIGDNFAQLLILEAT KDRTLVNYMDHPNKWNFFRDQPLLTTEEFPFDLDTTSIGTLATQRDDGTANLVMDEMLQYRDEDGIIQTYFDHERPRI DPIVCVNVLSLFYSRGRGSELAPTLEWVRGVLKHRAYLDGTRYYETGECFLFFLSRLLQSTKDAALHASLKSLFAERVKERI GAPGDALALAMRILACAAVGVRDEIDLRSLLPLQCEDGGWEAGWVYKYGSSGVKIGNRGLTTALALNAIEAVEGRRTR PKSGKISRVSRHSEVAAAPRSSTSSHRSNRSISRTFQAYFKASWTSMKQVAVA - XP_009547469.1 cDNA SEQ ID NO: 45 ATGTCCATGATACCCAGATGCTCGAATCTCATCCTCGACATCGGGGATGTTCTCTTCACATGGTCTCCGAAGACGTC CACTTCGATCTCCCCCCGCACCATGAAGAGCATACTGTCATCGACGACCTGGCACCAATACGAGACCGGGCACATT TCACAGGGCGACTGCTACCGCCTCATAGGCAACCAGTTCTCCATCGATCCTCAGGAAGTCGGACTTGCATTCCAAC AAGCTCGGGACTCATTGCAGCCTAATGTTGACTTCATTCACTTCATCCGCGCCCTCAAGGCGGAATCACACGGGAC GCTGCGCGTCTTCGCTATGTCCAACATCTCTCAGCCCGATTACGCAGTTCTTCGGACTAAGGACGCCGACTGGGCC GTTTTTGACGATATATTCACGTCTGCAGATGCTGGGGTTCGAAAGCCACACCTTGGGTTCTACAAGTTGGTACTCG GAAAGATCGGCGCCGATCCAAACGATACCGTCTTCGTCGATGACAAGGGGGACAATGTCCTCTCTGCACGGTCTC TCGGCCTTCATGGAATCGTCTTTGACAGTATGGACAACGTCAAGCGAGCCCTGCGCTACTTGATCAGCGACCCCAT 
ACGGCGAGGACGAGAGTTTCTCCAAGCGCGAGCCGGCCATTTGGAGTCGGAGACCAATACGGGCATCGAAATCG GTGATAATTTTGCCCAGCTCCTTATTCTCGAGGCCACGAAGGATAGGACACTCGTCAATTATATGGACCATCCGAA CAAATGGAATTTCTTCCGAGATCAACCGCTCCTCACAACGGAGGAGTTCCCTTTCGATCTCGATACGACATCTATTG GAACGCTTGCGACGCAGCGCGATGATGGGACTGCCAATCTAGTAATGGATGAGATGCTTCAGTACCGTGATGAG GATGGCATAATACAAACATATTTCGATCATGAACGACCGAGGATAGATCCCATCGTCTGTGTCAACGTCTTGAGCC TTTTCTACTCCCGGGGTCGTGGTTCGGAGCTAGCACCGACACTAGAGTGGGTGCGTGGTGTCCTCAAGCACCGCG CGTATCTCGATGGAACGCGATACTACGAGACAGGCGAATGCTTCCTTTTCTTCCTCAGCCGGCTCTTGCAATCAACC AAGGACGCCGCCTTGCACGCATCGTTGAAATCTTTGTTCGCCGAACGGGTCAAGGAGCGCATAGGGGCACCAGG GGACGCGCTGGCGCTGGCGATGCGTATACTGGCATGCGCAGCAGTGGGCGTGCGGGACGAGATCGATCTTCGAT CACTATTACCTCTGCAGTGCGAGGATGGGGGGTGGGAGGCAGGCTGGGTGTACAAGTATGGGTCTTCGGGAGTC AAGATCGGCAATCGTGGCCTCACGACTGCGCTTGCGCTCAATGCCATCGAGGCTGTGGAGGGACGTCGCACGAG GCCGAAGTCGGGTAAGATCAGCCGAGTCAGCCGTCATTCTGAGGTCGCAGCAGCGCCACGGTCTTCCACCAGCAG TCATCGTTCTAATCGCTCGATCTCAAGGACATTCCAGGCGTACTTCAAGGCGTCGTGGACATCGATGAAACAGGTG GCCGTGGCGTGA - XP_009547469.1 optimized cDNA SEQ ID NO: 46 ATGAGCATGATTCCACGTTGTAGCAATCTGATTCTCGACATCGGTGATGTGTTGTTTACGTGGAGCCCGAAAACCA GCACCAGCATTAGCCCGCGTACCATGAAATCTATCCTGAGCTCTACCACCTGGCATCAATATGAGACTGGCCACAT CAGCCAGGGTGATTGCTACCGCCTGATCGGTAATCAGTTCTCCATCGACCCGCAAGAGGTCGGTTTGGCCTTCCAG CAAGCCAGAGACAGCCTGCAACCGAATGTTGATTTCATCCATTTCATTCGTGCCCTGAAAGCTGAGTCGCACGGCA CCCTGCGCGTTTTTGCGATGAGCAATATCAGCCAACCTGACTATGCAGTCCTGCGTACGAAAGACGCGGACTGGG CTGTTTTTGATGATATCTTCACGAGCGCGGATGCTGGTGTTCGTAAACCGCACCTGGGTTTTTATAAACTGGTCTTA GGCAAGATTGGCGCGGACCCTAACGACACCGTTTTTGTGGATGATAAGGGTGACAACGTCCTCTCTGCACGTTCCC TGGGTCTGCACGGTATCGTTTTTGATTCAATGGACAACGTGAAGCGCGCACTGCGCTACCTGATTAGCGACCCGAT CCGCCGCGGCCGTGAATTTCTGCAGGCCCGTGCGGGTCACCTGGAGTCCGAAACGAACACGGGTATTGAGATTGG TGATAATTTCGCGCAATTGCTGATCCTGGAAGCGACCAAAGATCGTACTCTGGTGAACTACATGGACCACCCGAAC AAGTGGAACTTCTTCCGTGACCAGCCGCTGCTGACCACCGAAGAATTTCCGTTCGACCTGGACACGACCAGCATTG GCACGCTGGCCACCCAACGTGACGATGGTACGGCGAATCTGGTAATGGACGAAATGTTGCAGTATCGTGACGAA 
GATGGCATCATTCAGACCTATTTCGATCATGAGCGCCCGCGTATTGATCCGATTGTTTGTGTGAATGTGCTGTCTCT GTTCTACAGCCGTGGCCGTGGCTCTGAGTTGGCGCCGACGCTGGAATGGGTGCGCGGTGTGTTGAAACATCGTGC GTACCTGGATGGTACGCGTTATTACGAGACTGGTGAGTGTTTCCTGTTTTTCCTGAGCCGTCTGCTGCAGAGCACC AAAGACGCAGCCCTGCACGCGAGCCTGAAGTCCCTGTTTGCAGAGCGTGTTAAAGAGCGCATCGGTGCGCCGGG CGATGCTCTGGCGCTGGCTATGCGCATCCTGGCGTGCGCCGCTGTTGGTGTGCGCGATGAAATTGATTTGCGTAG CCTGCTGCCGCTGCAATGCGAAGATGGCGGCTGGGAAGCGGGCTGGGTCTACAAATACGGCAGCAGCGGTGTGA AGATTGGCAATCGCGGTCTTACCACGGCGCTGGCATTGAATGCTATCGAAGCCGTTGAGGGCCGTCGCACCCGCC CAAAGTCCGGTAAGATCAGCCGTGTTAGCCGTCATAGCGAAGTCGCAGCGGCACCGCGTTCCTCGACGAGCAGCC ACCGTAGCAACCGTAGCATTAGCCGCACCTTCCAGGCATATTTTAAAGCGAGCTGGACCAGCATGAAACAAGTCG CAGTGGCGTAA KLO09124.1 - KLO09124.1 protein SEQ ID NO: 47 MSIHGSSMSSYSSTVPSMTSSPASTSTPSSPASSIHEIGPVPEARRKGQCNALIFDLGDVLFTWSAETKTTISPKLLKKILNS LTWFEYEKGNIGEQEAYDAVAKEFGVPSSEVGAAFQCARDSLQSNPRLVSLIRELKSQYDLKVYAMSNISAPDWEVLRT KATPEEWAMFDRVFTSAAARERKPNLGFYRQVVEATGVDPARSVFVDDKLDNVISARSVGLNAIIFDSFENVARQLKN YVADPIGRAEAWLRDNAKKMLSITDAGVVVYENFGQMLILEATGDRSLVDYVEYPRLFNFFQGNGVFTTESFPCDLDST SIGLTVTNHVDEKTRHSVMDEMLTYKNEDGIIATYFDATRPRIDPVVCANVLTFFYKNGRGEELNETLDWVYDILLHRAY LDGTRYYFGSDTFLFFLSRLLSESPSVYARFAPVFQERVKERMGATGDAMSLAMRIIAAATVKIQDRVDCDALLQTQED DGGFPIGWMYKYGATGMLLGNKGLSTALAIQAIKAVESFP - KLO09124.1 cDNA SEQ ID NO: 48 ATGTCGATTCACGGTTCTTCTATGTCCTCCTATTCCTCGACTGTGCCGTCAATGACTTCCTCTCCCGCGTCCACTTCTA CTCCGTCGTCTCCTGCATCGTCGATCCATGAGATTGGTCCTGTCCCAGAAGCTCGACGAAAGGGACAGTGCAACGC GCTGATCTTCGACCTCGGAGACGTCCTCTTCACCTGGTCGGCAGAGACTAAGACCACCATTTCCCCGAAACTCCTG AAAAAGATCCTTAACTCCTTAACATGGTTCGAATACGAGAAGGGAAACATCGGGGAGCAGGAGGCGTATGACGC AGTCGCAAAGGAGTTTGGCGTCCCGTCGTCCGAGGTCGGGGCCGCTTTCCAGTGCGCGCGCGATTCGCTACAGAG CAATCCCCGCCTCGTCTCGCTCATCCGTGAGCTGAAGTCGCAATATGATCTCAAGGTGTACGCCATGTCCAACATCT CTGCGCCGGACTGGGAAGTCCTAAGGACGAAGGCGACCCCTGAGGAGTGGGCAATGTTTGACCGCGTCTTCACG AGCGCGGCCGCGCGCGAGCGTAAGCCAAACCTCGGATTCTACAGACAGGTTGTTGAGGCGACCGGCGTCGACCC CGCTCGCTCCGTGTTCGTCGACGATAAACTCGACAATGTCATCTCTGCGCGTTCAGTCGGATTAAATGCGATCATCT 
TCGACTCATTTGAGAACGTCGCCCGGCAGCTCAAAAACTATGTCGCTGATCCTATCGGACGGGCGGAGGCGTGGT TGCGCGATAACGCAAAGAAGATGTTGTCAATTACGGATGCCGGGGTGGTCGTATACGAGAATTTCGGCCAGATGC TGATCTTGGAGGCAACAGGCGATAGGTCGCTTGTGGACTACGTCGAGTACCCTCGTCTCTTCAACTTCTTCCAAGG CAATGGCGTCTTTACGACCGAGTCATTCCCTTGCGACCTTGATTCGACTTCCATCGGCTTAACCGTCACGAACCACG TCGATGAGAAAACAAGGCACAGCGTCATGGATGAGATGCTGACCTACAAAAATGAGGATGGTATCATTGCGACTT ACTTTGATGCCACGCGTCCCCGAATTGACCCCGTCGTCTGCGCCAATGTCTTGACGTTCTTCTACAAGAACGGCCGA GGGGAGGAGCTCAATGAAACACTTGACTGGGTCTACGACATCCTCCTTCATCGCGCGTACCTCGATGGCACACGCT ATTATTTCGGCTCAGACACCTTCCTCTTCTTCCTTTCTCGACTTCTCTCCGAATCGCCATCCGTTTACGCCCGTTTCGC TCCGGTGTTCCAGGAGAGAGTCAAGGAGCGCATGGGGGCGACGGGAGATGCGATGTCCCTTGCGATGCGCATCA TCGCGGCCGCAACTGTCAAGATCCAAGACCGAGTCGACTGCGACGCTCTGCTGCAGACGCAGGAAGACGACGGT GGATTCCCGATAGGTTGGATGTACAAGTACGGGGCGACCGGGATGCTTCTGGGTAACAAGGGCTTGTCGACAGC TCTGGCAATCCAAGCTATCAAAGCGGTCGAATCTTTCCCTTGA - KLO09124.1 optimized cDNA SEQ ID NO: 49 GGATCCAAGCTTAAGGAGGTAAAAAATGTCGATTCACGGTAGCAGCATGTCGTCTTATAGCAGCACGGTTCCATCT ATGACTAGCAGCCCGGCTTCCACGAGCACGCCGTCCAGCCCGGCCAGCAGCATCCACGAAATCGGCCCGGTCCCT GAGGCGCGTCGCAAGGGCCAATGCAATGCACTGATCTTCGACCTGGGTGATGTTCTGTTTACCTGGAGCGCAGAA ACCAAGACCACGATCAGCCCGAAGCTGCTGAAAAAGATTCTGAACAGCTTGACCTGGTTTGAGTATGAGAAAGGC AACATCGGTGAACAAGAAGCCTATGACGCCGTTGCGAAAGAGTTCGGTGTGCCGAGCTCTGAGGTTGGCGCTGC GTTTCAATGTGCGCGTGACTCCCTGCAAAGCAATCCGCGTTTGGTTAGCCTGATTCGTGAGCTGAAGTCCCAGTAC GACCTGAAAGTGTACGCTATGAGCAATATTAGCGCGCCAGACTGGGAAGTGCTGCGTACTAAAGCGACCCCGGAA GAGTGGGCAATGTTCGATCGTGTCTTTACTTCTGCGGCGGCGCGTGAGCGTAAGCCGAACTTGGGCTTTTACCGCC AAGTCGTGGAAGCAACCGGTGTCGATCCGGCGCGTAGCGTTTTCGTCGATGATAAACTGGACAATGTGATCAGCG CGCGCTCTGTCGGTCTGAACGCTATTATCTTCGACTCCTTCGAAAACGTCGCCCGTCAGCTGAAGAATTACGTCGCA GACCCGATTGGTCGCGCTGAGGCGTGGCTGCGCGACAACGCAAAGAAAATGCTGAGCATCACCGATGCGGGTGT TGTGGTTTACGAGAATTTTGGCCAGATGCTGATCCTGGAAGCTACCGGTGACCGTAGCCTGGTGGACTATGTGGA GTATCCGCGCCTCTTTAACTTCTTCCAGGGTAACGGCGTTTTTACGACCGAGAGCTTTCCATGCGATCTGGACAGCA CCAGCATCGGTCTGACTGTGACCAATCATGTGGACGAAAAGACTCGCCACAGCGTCATGGACGAAATGCTGACCT 
ACAAAAATGAAGATGGTATTATTGCGACGTACTTTGACGCGACGCGCCCGCGCATTGACCCTGTTGTCTGTGCCAA TGTTCTGACCTTCTTCTACAAAAACGGTCGTGGTGAAGAATTGAACGAAACCCTGGATTGGGTGTACGACATTCTG CTGCATCGCGCGTATCTGGACGGTACGCGTTATTATTTCGGCTCCGATACGTTCCTGTTTTTCCTGAGCCGTCTGCT GAGCGAGTCTCCGAGCGTTTACGCGCGTTTTGCCCCGGTGTTTCAAGAGCGCGTGAAAGAGCGTATGGGCGCGAC CGGTGATGCGATGAGCCTGGCCATGCGTATCATTGCAGCAGCAACCGTAAAGATCCAGGATCGTGTGGATTGCGA CGCACTGTTGCAGACCCAAGAAGATGATGGCGGTTTCCCGATTGGTTGGATGTACAAATATGGTGCGACCGGTAT GTTGCTGGGCAACAAAGGCCTGAGCACGGCCCTGGCGATCCAGGCAATTAAAGCCGTCGAGTCGTTCCCGTAAGG TACCATATATGAATTCATTAATCTCGAG OJI95797.1 - OJI95797.1 protein SEQ ID NO: 50 MGSTKALVVDFGNVLCTWTPPRELSIPPKKLKQIMSSDIWLDYERGIYKSEDECYLAVATRFGVSPSDLSSVMKKARESL QPNTATLNHLSHLKKTQPGLRIYGLTNTPLPEQSSVRSIAQEWPIFDHIYISGILGMRKPDIGCYRLVLRKIGLPAESVVFID DSPENILAAQSLGVHSILFQSHDQLSRQLGNVLGDPIQRGHNFLLSNAKQMNSTTDKGVIIRDNFAQLLIIELTQNPDLV ALETWDRTWNFFIGPPQLTTESFPNDLDTTSIALSVLPVDKEVVWSVMDEMLTFTNADGIFMTYFDRSRPRVDPVVCT NVLNLFCMHGRESEVAATFDWVLDVLRNSAYLSGSRYYSSPDCFLYFLSRLSCVVRDGTRRRELKSLLKQQVSQRIGAD GDSVSLATRLLASNILGITNGRDRSRLLALQETDGGWPAGWVYKFGSSGVQIGNRGLSTALALKSIERQKGPVEAISSEP EAWWPSLRLDRLLNVWPFIDWKGYSPS - OJI95797.1 cDNA SEQ ID NO: 51 ATGGGTTCCACCAAGGCTCTTGTTGTTGACTTTGGGAATGTTTTGTGTACCTGGACACCACCCAGGGAGTTATCCAT CCCGCCCAAGAAGCTGAAACAAATCATGTCTTCTGACATTTGGCTCGACTATGAACGGGGTATCTATAAGTCGGAG GACGAGTGCTACTTGGCGGTTGCAACTCGCTTCGGCGTCTCTCCCAGCGACCTCTCCTCGGTGATGAAAAAGGCCC GCGAGAGCCTGCAACCAAACACCGCAACCCTGAATCATCTGTCTCATCTCAAAAAGACCCAGCCTGGCCTCAGGAT ATACGGTTTGACCAACACCCCTCTCCCAGAACAAAGCAGTGTACGATCCATCGCCCAGGAATGGCCTATCTTCGAC CATATCTACATATCAGGCATCCTCGGAATGCGCAAGCCGGACATTGGCTGCTACAGGCTGGTGCTGCGAAAGATT GGGCTTCCAGCGGAGTCCGTGGTCTTCATTGATGATTCACCCGAGAACATCCTGGCCGCGCAGTCACTGGGAGTA CACAGCATACTGTTCCAAAGCCACGACCAGCTCTCTCGTCAGCTTGGCAATGTGCTGGGTGATCCAATCCAGCGGG GCCATAACTTCCTACTCTCGAACGCAAAGCAAATGAATAGTACGACCGACAAGGGAGTTATTATCCGGGACAACTT TGCGCAACTGCTGATCATCGAGCTGACGCAGAACCCAGACCTTGTGGCGTTAGAAACATGGGACCGTACCTGGAA
TTTTTTTATTGGACCTCCACAATTGACAACTGAAAGCTTTCCCAATGATCTTGACACTACCTCCATCGCTCTCTCGGT TCTTCCGGTTGACAAAGAAGTGGTATGGTCTGTGATGGACGAGATGCTAACGTTTACCAATGCGGATGGGATTTTT ATGACCTATTTCGACCGATCACGCCCTCGAGTTGATCCGGTAGTTTGCACCAATGTCCTGAATCTTTTCTGCATGCA TGGACGGGAAAGCGAAGTTGCAGCCACATTTGACTGGGTGCTGGACGTTCTTCGAAATTCGGCCTATTTATCAGG ATCCAGATACTATTCTTCGCCTGATTGCTTTCTATACTTTCTTTCACGGCTGAGCTGTGTGGTCCGAGACGGCACGC GACGCAGGGAGCTCAAGTCACTGTTGAAACAACAAGTGAGCCAGCGTATTGGCGCTGATGGTGATTCCGTCTCTC TCGCCACTAGGCTACTTGCATCGAACATTTTAGGAATCACAAATGGCCGTGATCGCTCCAGGCTTCTTGCTCTGCAG GAAACTGACGGTGGATGGCCTGCTGGGTGGGTTTATAAATTCGGAAGCTCGGGGGTACAGATTGGCAATCGGGG GCTCAGTACAGCCTTGGCGTTAAAATCAATTGAGCGTCAGAAGGGGCCTGTTGAGGCGATATCCAGTGAGCCAGA AGCGTGGTGGCCATCCCTCAGGCTTGACCGACTTCTCAACGTTTGGCCTTTCATCGACTGGAAGGGATATTCGCCG AGTTGA - OJI95797.1 optimized cDNA SEQ ID NO: 52 ATGGGTTCTACGAAAGCGTTGGTTGTTGATTTTGGTAATGTTCTGTGCACTTGGACGCCACCACGTGAATTGTCCA TCCCGCCGAAGAAACTGAAGCAAATCATGAGCAGCGACATTTGGCTGGACTATGAGCGTGGTATCTACAAATCGG AAGATGAGTGCTACCTGGCAGTTGCGACGCGCTTTGGTGTCAGCCCGTCCGACCTGAGCTCCGTTATGAAAAAAG CCCGTGAGAGCCTGCAGCCGAATACCGCAACGCTGAACCACTTGAGCCATCTGAAGAAAACCCAGCCTGGCCTTC GTATCTACGGCCTGACGAACACCCCGTTGCCGGAACAGAGCTCAGTCCGTAGCATTGCGCAGGAATGGCCGATTT TTGACCACATCTACATTAGCGGCATCTTGGGTATGCGCAAACCGGATATTGGTTGTTACCGTCTGGTTCTGCGTAA GATCGGTCTGCCAGCGGAGTCCGTCGTATTCATCGACGACAGCCCGGAGAACATTCTGGCAGCTCAATCGTTGGG TGTCCATAGCATCCTGTTCCAGTCCCACGATCAGCTGAGCCGTCAGCTGGGCAATGTGCTGGGTGATCCGATTCAG CGCGGTCACAACTTCCTCCTGTCCAACGCGAAGCAAATGAACAGCACCACCGATAAGGGTGTGATTATCCGCGAC AACTTCGCCCAGCTGCTGATTATTGAGCTGACCCAAAATCCGGATCTGGTTGCGCTGGAGACTTGGGACCGTACGT GGAATTTCTTTATTGGTCCGCCGCAACTGACCACCGAGAGCTTTCCGAACGACCTGGACACCACGAGCATTGCCCT GAGCGTGTTGCCGGTGGATAAAGAAGTCGTTTGGTCTGTGATGGATGAGATGCTGACCTTCACCAACGCAGACGG CATCTTCATGACCTATTTCGATCGTAGCCGTCCGCGTGTTGACCCGGTCGTTTGTACCAATGTCCTGAATCTGTTTTG CATGCATGGTCGCGAGAGCGAAGTGGCCGCGACGTTCGACTGGGTGCTGGACGTGCTGCGCAACAGCGCGTACC TGAGCGGTTCCCGTTATTACAGCAGCCCGGATTGTTTTCTGTATTTCCTGTCTCGTCTGAGCTGCGTCGTCCGTGAT
GGCACGCGTCGTCGTGAACTGAAAAGCCTGCTGAAGCAACAAGTTTCTCAACGTATCGGCGCTGACGGTGATTCC GTCAGCCTGGCCACCCGTTTGCTGGCGAGCAACATCCTGGGCATTACTAACGGTCGTGACCGCAGCCGTCTGCTG GCATTGCAAGAAACCGATGGTGGCTGGCCTGCAGGCTGGGTCTATAAGTTTGGTAGCAGCGGCGTGCAAATTGG CAATCGCGGTCTGAGCACCGCGCTGGCTCTGAAGTCTATCGAGCGCCAGAAAGGTCCGGTGGAAGCAATCAGCA GCGAGCCGGAAGCGTGGTGGCCTAGCTTACGCTTGGACCGCTTGCTGAATGTTTGGCCATTTATCGACTGGAAGG GCTACTCCCCGAGCTAA Class I terpene synthase-like motif SEQ ID NO: 53 DDxx(D/E), where x at position 3 is K, N, R, S, or Q and x at position 4 is L, I, G, P, or T Class I terpene synthase-like motif SEQ ID NO: 54 DD(K/Q/R)(L/I/T)(D/E)NV Class I terpene synthase-like motif SEQ ID NO: 55 DD(N/K/S/Q)(L/G/P)(D/E)N(V/I) Class II terpene synthase-like motif SEQ ID NO: 56 DxD(T/S)T, where x at position 2 is V, M, F or L Class II terpene synthase-like motif SEQ ID NO: 57 D(V/M/L/F)DTTS Class II terpene synthase-like motif SEQ ID NO: 58 D(V/M/L)D(T/S)TS Conserved motif A SEQ ID NO: 59 SxxWxxYExG, where x is any amino acid Conserved motif B SEQ ID NO: 60 NFxQx(I/L)IxE, where x is any amino acid Conserved motif C SEQ ID NO: 61 (D/E)(G/E)Ixx(T/V)YFDxxRxRxDPxVxxNVL Conserved motif D SEQ ID NO: 62 QxxDGx(W/F) XP_006461126.1 - XP_006461126.1 protein SEQ ID NO: 63 MAPPQRPFTAIVFDIGDVLFQWSATTKTSISPKTLRSILNCPTWFDYERGRLAENACYAAISQEFNVNPDEVRDAFSQAR DSLQANHDFISLIRELKAQANGRLRVYAMSNISLPDWEVLRMKPADWDIFDHVFTSGAVGERKPNLAFYRHVIAATDL QPHQTIFVDDKLENVLSARSLGFTGIVFDEPSEVKRALRNLIGDPVQRGGEFLVRNAGKLGSITRTTAKHESIPLDENFAQ LLILEITGNRALVNLVEHPQTWNFFQGKGQLTTEEFPFDLDTTSLGLTILKRSREIADSVMDEMLEYVDPDGIIQTYFDHR RPRFDPVVCVNALSLFYAYGRGEQLRSTLTWVHEVLLNRAYLDGTRYYETAECFLYFMSRLLATSGDPDLHSLLKPLLKER VQERIGADGDSLALAMRILACDFVGIRDEVDLRTLLTLQCEDGGWEVGWMYKYGSSGISIGNRGLATALAIKAVDTMF QPQIRFSESPTDTLVENAIHKRRPSFSEKFLGKRPRSGSFRKPLQWILQGSKLRKSVEIGS - XP_006461126.1 cDNA SEQ ID NO: 64 ATGGCTCCGCCTCAGCGACCCTTTACTGCGATTGTCTTTGACATCGGGGATGTTCTATTCCAATGGTCTGCAACCAC CAAAACCTCTATCTCACCAAAGACACTCCGCTCTATTCTCAACTGTCCGACATGGTTTGACTATGAACGTGGACGCC 
TGGCAGAAAACGCTTGTTATGCCGCTATCTCACAAGAATTCAACGTCAACCCAGACGAAGTTCGCGACGCTTTCAG CCAAGCGCGCGACTCTCTCCAAGCAAACCACGACTTCATCAGTCTCATCCGTGAGCTGAAGGCACAAGCAAATGGT CGTTTACGTGTGTACGCCATGTCGAACATATCTCTTCCTGATTGGGAAGTGCTGCGGATGAAACCTGCTGATTGGG ATATTTTCGACCACGTCTTCACATCCGGTGCGGTTGGGGAACGCAAGCCCAATCTCGCCTTTTATCGCCATGTTATC GCGGCCACCGATCTGCAGCCTCATCAGACAATATTTGTTGACGATAAGCTGGAGAATGTTCTCTCAGCACGTTCCC TCGGGTTCACAGGCATCGTGTTTGACGAGCCCTCCGAGGTCAAACGTGCGCTTCGTAACCTCATTGGGGATCCTGT TCAACGAGGAGGTGAATTCTTGGTTCGGAATGCCGGAAAGCTTGGCTCTATCACAAGGACTACTGCAAAGCACGA GTCAATCCCCCTCGACGAGAATTTTGCTCAGCTTCTTATTCTCGAGATAACGGGGAACAGGTGCGTTAGCTTCTTGT AGGGTCTTCTGTCGTAATACTAAATTTTTTCTGGTGTTTAGGGCTTTGGTCAACCTCGTTGAGCATCCTCAAACGTG GAATTTCTTCCAAGGTGCGCTGCTAAAATAAACATCCAGTTGCGTTTCGAAGCTCATTGTGGGCGTCCCGTCACAG GCAAGGGCCAGCTGACAACAGAAGAATTTCCATTCGATCTCGATACAACTTCTCTTGGTCTCACGATCCTCAAGCG AAGCAGGGAAATCGCCGATTCAGTCATGGATGAAATGCTGGAGTATGTCGATCCTGATGGTATCATTCAGGCAAG TTTCATTTATCGGCTTGAGAAAATAAAGACAAAAACGTTCTGATGGGGGGATGTTTCTAGACGTATTTCGATCATC GGAGACCACGTTTTGATCCAGTCGTGTGTGTCAATGCATTAAGCCTCTTCTATGCTTACGGCCGCGGGGAGCAACT GCGGTCGACTTTGACATGGGTACATGAAGTCCTTCTCAATCGAGCCTACTTGGATGGCACACGGTACTACGAAACA GCCGAATGCTTCCTCTATTTCATGAGCCGACTTCTCGCCACTTCAGGCGACCCTGACCTTCACTCCCTTCTTAAACCT CTTCTCAAAGAACGGGTGCAAGAACGCATTGGAGCTGATGGAGACTCTCTTGCACTCGCAATGCGTATTCTCGCCT GTGATTTCGTCGGAATCAGAGATGAAGTGGATTTACGCACACTTCTGACTTTGCAATGTGAAGATGGAGGTTGGG AAGTGGGTTGGATGTACAAGTATGGATCTTCCGGTATCAGTATCGGAAATCGTGGACTGGCCACCGCGCTCGCTA TCAAGGCCGTCGACACGATGTTTCAACCCCAAATTCGGTTCTCTGAATCACCCACAGATACTTTGGTTGAAAACGCT ATCCACAAACGCCGTCCCTCATTTTCCGAAAAATTCCTCGGCAAACGTCCTCGCAGCGGATCGTTCAGGAAACCTTT ACAGTGGATACTGCAAGGTTCCAAGCTTCGCAAATCTGTCGAAATAGGAAGCTAA - XP_006461126.1 optimized cDNA SEQ ID NO: 65 ATGGCACCACCGCAACGTCCGTTCACTGCAATTGTTTTCGATATTGGCGATGTTTTGTTCCAATGGTCTGCGACCAC GAAAACCAGCATTAGCCCGAAAACCCTGCGCAGCATTCTGAATTGTCCGACCTGGTTTGATTATGAGCGCGGCCGT CTGGCGGAAAATGCGTGTTACGCTGCGATCAGCCAAGAATTTAACGTCAACCCGGACGAAGTTCGCGACGCCTTC 
AGCCAAGCGCGCGACAGCCTGCAGGCGAATCACGACTTCATCAGCCTGATTCGTGAGCTGAAAGCTCAGGCGAAC GGTCGTCTGCGTGTCTACGCCATGTCTAATATCAGCCTGCCGGATTGGGAAGTCCTGCGTATGAAGCCAGCCGATT GGGACATCTTTGACCATGTATTTACCAGCGGTGCGGTGGGTGAGCGCAAGCCGAACCTGGCCTTTTATCGTCACGT CATCGCGGCCACGGATCTGCAGCCGCACCAGACGATCTTCGTGGATGACAAACTGGAAAACGTGCTGTCTGCGCG CTCGCTGGGCTTCACGGGTATCGTGTTCGACGAGCCAAGCGAAGTCAAACGTGCGCTGCGTAATCTGATCGGCGA CCCGGTGCAGCGTGGTGGCGAGTTCCTGGTTCGTAATGCTGGCAAACTGGGTTCTATCACCCGTACGACCGCAAA ACATGAGAGCATCCCGCTGGATGAGAATTTTGCACAACTGTTGATTCTGGAAATTACTGGTAACCGCGCACTGGTC AATCTGGTTGAGCACCCGCAGACGTGGAACTTCTTCCAGGGTAAGGGCCAGCTGACGACCGAAGAATTTCCTTTT GACCTGGATACGACGAGCCTGGGTCTGACGATCCTGAAGCGTAGCCGCGAGATTGCCGACTCCGTCATGGACGAA ATGTTGGAATACGTGGACCCTGACGGCATCATTCAGACCTACTTCGATCATCGTCGCCCGCGCTTTGACCCGGTTG TTTGCGTTAATGCCCTGAGCCTGTTCTATGCATACGGCCGTGGTGAGCAACTGCGTTCCACCTTGACCTGGGTGCA CGAAGTTCTGTTGAACCGTGCGTATTTGGATGGTACGCGTTACTATGAAACGGCCGAGTGCTTTCTGTATTTCATG TCCCGTCTGCTGGCAACCAGCGGTGACCCGGATCTGCATTCCCTGCTGAAGCCGTTGCTGAAGGAACGCGTGCAA GAGCGCATCGGCGCTGACGGTGACAGCCTGGCGCTGGCGATGCGCATTTTGGCATGTGATTTTGTTGGCATCCGT GATGAAGTGGATCTGCGTACCCTGCTGACCTTACAGTGCGAGGATGGCGGTTGGGAAGTGGGCTGGATGTACAA ATACGGTAGCAGCGGTATTAGCATTGGTAACCGTGGTCTGGCAACCGCATTGGCGATCAAAGCTGTTGACACCAT GTTTCAACCGCAAATCCGTTTCAGCGAGAGCCCGACCGACACTCTGGTGGAGAACGCGATTCACAAGCGCCGCCC GAGCTTTTCAGAGAAATTTTTAGGTAAGCGTCCGCGTTCCGGTTCGTTCCGTAAACCGCTGCAATGGATTCTGCAG GGCAGCAAGCTGCGCAAGAGCGTCGAGATCGGTAGCTAA XP_007369631.1 - XP_007369631.1 Optimized cDNA for S. 
cerevisiae expression SEQ ID NO: 66 ATGGCTTCTATCCACAGAAGATACACTACTTTGATCTTGGACTTGGGTGACGTTTTGTTCAGATGGTCTCCAAAGAC TGAAACTGCTATCCCACCACAACAATTGAAGGACATCTTGTCTTCTGTTACTTGGTTCGAATACGAAAGAGGTAGA TTGTCTCAAGAAGCTTGTTACGAAAGATGTGCTGAAGAATTCAAGATCGAAGCTTCTGTTATCGCTGAAGCTTTCA AGCAAGCTAGAGGTTCTTTGAGACCAAACGAAGAATTCATCGCTTTGATCAGAGACTTGAGAAGAGAAATGCACG GTGACTTGACTGTTTTGGCTTTGTCTAACATCTCTTTGCCAGACTACGAATACATCATGTCTTTGTCTTCTGACTGGA CTACTGTTTTCGACAGAGTTTTCCCATCTGCTTTGGTTGGTGAAAGAAAGCCACACTTGGGTTGTTACAGAAAGGTT ATCTCTGAAATGAACTTGGAACCACAAACTACTGTTTTCGTTGACGACAAGTTGGACAACGTTGCTTCTGCTAGATC TTTGGGTATGCACGGTATCGTTTTCGACAACCAAGCTAACGTTTTCAGACAATTGAGAAACATCTTCGGTGACCCA ATCAGAAGAGGTCAAGAATACTTGAGAGGTCACGCTGGTAAGTTGGAATCTTCTACTGACAACGGTTTGATCTTCG AAGAAAACTTCACTCAATTGATCATCTACGAATTGACTCAAGACAGAACTTTGATCTCTTTGTCTGAATGTCCAAGA ACTTGGAACTTCTTCAGAGGTGAACCATTGTTCTCTGAAACTTTCCCAGACGACGTTGACACTACTTCTGTTGCTTT GACTGTTTTGCAACCAGACAGAGCTTTGGTTAACTCTGTTTTGGACGAAATGTTGGAATACGTTGACGCTGACGGT ATCATGCAAACTTACTTCGACAGATCTAGACCAAGAATGGACCCATTCGTTTGTGTTAACGTTTTGTCTTTGTTCTAC GAAAACGGTAGAGGTCACGAATTGCCAAGAACTTTGGACTGGGTTTACGAAGTTTTGTTGCACAGAGCTTACCAC GGTGGTTCTAGATACTACTTGTCTCCAGACTGTTTCTTGTTCTTCATGTCTAGATTGTTGAAGAGAGCTGACGACCC AGCTGTTCAAGCTAGATTGAGACCATTGTTCGTTGAAAGAGTTAACGAAAGAGTTGGTGCTGCTGGTGACTCTATG GACTTGGCTTTCAGAATCTTGGCTGCTGCTTCTGTTGGTGTTCAATGTCCAAGAGACTTGGAAAGATTGACTGCTG GTCAATGTGACGACGGTGGTTGGGACTTGTGTTGGTTCTACGTTTTCGGTTCTACTGGTGTTAAGGCTGGTAACAG AGGTTTGACTACTGCTTTGGCTGTTACTGCTATCCAAACTGCTATCGGTAGACCACCATCTCCATCTCCATCTGCTGC TTCTTCTTCTTTCAGACCATCTTCTCCATACAAGTTCTTGGGTATCTCTAGACCAGCTTCTCCAATCAGATTCGGTGA CTTGTTGAGACCATGGAGAAAGATGTCTAGATCTAACTTGAAGTCTCAATAA XP_006461126 - XP_006461126 Optimized cDNA for S. 
cerevisiae expression SEQ ID NO: 67 ATGGCTCCACCACAAAGACCATTCACTGCTATCGTTTTCGACATCGGTGACGTTTTGTTCCAATGGTCTGCTACTAC TAAGACTTCTATCTCTCCAAAGACTTTGAGATCTATCTTGAACTGTCCAACTTGGTTCGACTACGAAAGAGGTAGAT TGGCTGAAAACGCTTGTTACGCTGCTATCTCTCAAGAATTCAACGTTAACCCAGACGAAGTTAGAGACGCTTTCTCT CAAGCTAGAGACTCTTTGCAAGCTAACCACGACTTCATCTCTTTGATCAGAGAATTGAAGGCTCAAGCTAACGGTA GATTGAGAGTTTACGCTATGTCTAACATCTCTTTGCCAGACTGGGAAGTTTTGAGAATGAAGCCAGCTGACTGGGA CATCTTCGACCACGTTTTCACTTCTGGTGCTGTTGGTGAAAGAAAGCCAAACTTGGCTTTCTACAGACACGTTATCG CTGCTACTGACTTGCAACCACACCAAACTATCTTCGTTGACGACAAGTTGGAAAACGTTTTGTCTGCTAGATCTTTG GGTTTCACTGGTATCGTTTTCGACGAACCATCTGAAGTTAAGAGAGCTTTGAGAAACTTGATCGGTGACCCAGTTC AAAGAGGTGGTGAATTCTTGGTTAGAAACGCTGGTAAGTTGGGTTCTATCACTAGAACTACTGCTAAGCACGAAT CTATCCCATTGGACGAAAACTTCGCTCAATTGTTGATCTTGGAAATCACTGGTAACAGAGCTTTGGTTAACTTGGTT GAACACCCACAAACTTGGAACTTCTTCCAAGGTAAGGGTCAATTGACTACTGAAGAATTCCCATTCGACTTGGACA CTACTTCTTTGGGTTTGACTATCTTGAAGAGATCTAGAGAAATCGCTGACTCTGTTATGGACGAAATGTTGGAATA CGTTGACCCAGACGGTATCATCCAAACTTACTTCGACCACAGAAGACCAAGATTCGACCCAGTTGTTTGTGTTAAC GCTTTGTCTTTGTTCTACGCTTACGGTAGAGGTGAACAATTGAGATCTACTTTGACTTGGGTTCACGAAGTTTTGTT GAACAGAGCTTACTTGGACGGTACTAGATACTACGAAACTGCTGAATGTTTCTTGTACTTCATGTCTAGATTGTTGG CTACTTCTGGTGACCCAGACTTGCACTCTTTGTTGAAGCCATTGTTGAAGGAAAGAGTTCAAGAAAGAATCGGTGC TGACGGTGACTCTTTGGCTTTGGCTATGAGAATCTTGGCTTGTGACTTCGTTGGTATCAGAGACGAAGTTGACTTG AGAACTTTGTTGACTTTGCAATGTGAAGACGGTGGTTGGGAAGTTGGTTGGATGTACAAGTACGGTTCTTCTGGTA TCTCTATCGGTAACAGAGGTTTGGCTACTGCTTTGGCTATCAAGGCTGTTGACACTATGTTCCAACCACAAATCAGA TTCTCTGAATCTCCAACTGACACTTTGGTTGAAAACGCTATCCACAAGAGAAGACCATCTTTCTCTGAAAAGTTCTT GGGTAAGAGACCAAGATCTGGTTCTTTCAGAAAGCCATTGCAATGGATCTTGCAAGGTTCTAAGTTGAGAAAGTC TGTTGAAATCGGTTCTTAA LoTps1 - LoTps1 Optimized cDNA for S. 
cerevisiae expression SEQ ID NO: 68 ATGTACACTGCTTTGATCTTGGACTTGGGTGACGTTTTGTTCTCTTGGTCTTCTACTACTAACACTACTATCCCACCA AGACAATTGAAGGAAATCTTGTCTTCTCCAGCTTGGTTCGAATACGAAAGAGGTAGAATCACTCAAGCTGAATGTT ACGAAAGAGTTTCTGCTGAATTCTCTTTGGACGCTACTGCTGTTGCTGAAGCTTTCAGACAAGCTAGAGACTCTTTG AGACCAAACGACAAGTTCTTGACTTTGATCAGAGAATTGAGACAACAATCTCACGGTGAATTGACTGTTTTGGCTT TGTCTAACATCTCTTTGCCAGACTACGAATTCATCATGGCTTTGGACTCTAAGTGGACTTCTGTTTTCGACAGAGTTT TCCCATCTGCTTTGGTTGGTGAAAGAAAGCCACACTTGGGTGCTTTCAGACAAGTTTTGTCTGAAATGAACTTGGA CCCACACACTACTGTTTTCGTTGACGACAAGTTGGACAACGTTGTTTCTGCTAGATCTTTGGGTATGCACGGTGTTG TTTTCGACTCTCAAGACAACGTTTTCAGAATGTTGAGAAACATCTTCGGTGACCCAATCCACAGAGGTAGAGACTA CTTGAGACAACACGCTGGTAGATTGGAAACTTCTACTGACGCTGGTGTTGTTTTCGAAGAAAACTTCACTCAATTG ATCATCTACGAATTGACTAACGACAAGTCTTTGATCACTACTTCTAACTGTGCTAGAACTTGGAACTTCTTCAGAGG TAAGCCATTGTTCTCTGCTTCTTTCCCAGACGACATGGACACTACTTCTGTTGCTTTGACTGTTTTGAGATTGGACCA CGCTTTGGTTAACTCTGTTTTGGACGAAATGTTGAAGTACGTTGACGCTGACGGTATCATGCAAACTTACTTCGACC ACACTAGACCAAGAATGGACCCATTCGTTTGTGTTAACGTTTTGTCTTTGTTCCACGAACAAGGTAGAGGTCACGA ATTGCCAAACACTTTGGAATGGGTTCACGAAGTTTTGTTGCACAGAGCTTACATCGGTGGTTCTAGATACTACTTGT CTGCTGACTGTTTCTTGTTCTTCATGTCTAGATTGTTGCAAAGAATCACTGACCCATCTGTTTTGGGTAGATTCAGAC CATTGTTCATCGAAAGAGTTAGAGAAAGAGTTGGTGCTACTGGTGACTCTATCGACTTGGCTTTCAGAATCATCGC TGCTTCTACTGTTGGTATCCAATGTCCAAGAGACTTGGAATCTTTGTTGGCTGCTCAATGTGAAGACGGTGGTTGG GACTTGTGTTGGTTCTACCAATACGGTTCTACTGGTGTTAAGGCTGGTAACAGAGGTTTGACTACTGCTTTGGCTAT CAAGGCTATCGACTCTGCTATCGCTAGACCACCATCTCCAGCTTTGTCTGTTGCTTCTTCTTCTAAGTCTGAAATCCC AAAGCCAATCCAAAGATCTTTGAGACCATTGTCTCCAAGAAGATTCGGTGGTTTCTTGATGCCATGGAGAAGATCT CAAAGAAACGGTGTTGCTGTTTCTTCTTAA EMD37666.1 - EMD37666.1 Optimized cDNA for S. 
cerevisiae expression SEQ ID NO: 69 ATGTCTGCTGCTGCTCAATACACTACTTTGATCTTGGACTTGGGTGACGTTTTGTTCACTTGGTCTCCAAAGACTAA GACTTCTATCCCACCAAGAACTTTGAAGGAAATCTTGAACTCTGCTACTTGGTACGAATACGAAAGAGGTAGAATC TCTCAAGACGAATGTTACGAAAGAGTTGGTACTGAATTCGGTATCGCTCCATCTGAAATCGACAACGCTTTCAAGC AAGCTAGAGACTCTATGGAATCTAACGACGAATTGATCGCTTTGGTTAGAGAATTGAAGACTCAATTGGACGGTG AATTGTTGGTTTTCGCTTTGTCTAACATCTCTTTGCCAGACTACGAATACGTTTTGACTAAGCCAGCTGACTGGTCTA TCTTCGACAAGGTTTTCCCATCTGCTTTGGTTGGTGAAAGAAAGCCACACTTGGGTGTTTACAAGCACGTTATCGCT GAAACTGGTATCGACCCAAGAACTACTGTTTTCGTTGACGACAAGATCGACAACGTTTTGTCTGCTAGATCTGTTG GTATGCACGGTATCGTTTTCGAAAAGCAAGAAGACGTTATGAGAGCTTTGAGAAACATCTTCGGTGACCCAGTTA GAAGAGGTAGAGAATACTTGAGAAGAAACGCTATGAGATTGGAATCTGTTACTGACCACGGTGTTGCTTTCGGTG AAAACTTCACTCAATTGTTGATCTTGGAATTGACTAACGACCCATCTTTGGTTACTTTGCCAGACAGACCAAGAACT TGGAACTTCTTCAGAGGTAACGGTGGTAGACCATCTAAGCCATTGTTCTCTGAAGCTTTCCCAGACGACTTGGACA CTACTTCTTTGGCTTTGACTGTTTTGCAAAGAGACCCAGGTGTTATCTCTTCTGTTATGGACGAAATGTTGAACTAC AGAGACCCAGACGGTATCATGCAAACTTACTTCGACGACGGTAGACAAAGATTGGACCCATTCGTTAACGTTAAC GTTTTGACTTTCTTCTACACTAACGGTAGAGGTCACGAATTGGACCAATGTTTGACTTGGGTTAGAGAAGTTTTGTT GTACAGAGCTTACTTGGGTGGTTCTAGATACTACCCATCTGCTGACTGTTTCTTGTACTTCATCTCTAGATTGTTCGC TTGTACTAACGACCCAGTTTTGCACCACCAATTGAAGCCATTGTTCGTTGAAAGAGTTCAAGAACAAATCGGTGTT GAAGGTGACGCTTTGGAATTGGCTTTCAGATTGTTGGTTTGTGCTTCTTTGGACGTTCAAAACGCTATCGACATGA GAAGATTGTTGGAAATGCAATGTGAAGACGGTGGTTGGGAAGGTGGTAACTTGTACAGATTCGGTACTACTGGTT TGAAGGTTACTAACAGAGGTTTGACTACTGCTGCTGCTGTTCAAGCTATCGAAGCTTCTCAAAGAAGACCACCATC TCCATCTCCATCTGTTGAATCTACTAAGTCTCCAATCACTCCAGTTACTCCAATGTTGGAAGTTCCATCTTTGGGTTT GTCTATCTCTAGACCATCTTCTCCATTGTTGGGTTACTTCAGATTGCCATGGAAGAAGTCTGCTGAAGTTCACTAA XP_001217376.1 - XP_001217376.1 Optimized cDNA for S. 
cerevisiae expression SEQ ID NO: 70 ATGGCTATCACTAAGGGTCCAGTTAAGGCTTTGATCTTGGACTTCTCTAACGTTTTGTGTTCTTGGAAGCCACCATC TAACGTTGCTGTTCCACCACAAATCTTGAAGATGATCATGTCTTCTGACATCTGGCACGACTACGAATGTGGTAGAT ACTCTAGAGAAGACTGTTACGCTAGAGTTGCTGACAGATTCCACATCTCTGCTGCTGACATGGAAGACACTTTGAA GCAAGCTAGAAAGTCTTTGCAAGTTCACCACGAAACTTTGTTGTTCATCCAACAAGTTAAGAAGGACGCTGGTGGT GAATTGATGGTTTGTGGTATGACTAACACTCCAAGACCAGAACAAGACGTTATGCACTCTATCAACGCTGAATACC CAGTTTTCGACAGAATCTACATCTCTGGTTTGATGGGTATGAGAAAGCCATCTATCTGTTTCTACCAAAGAGTTATG GAAGAAATCGGTTTGTCTGGTGACGCTATCATGTTCATCGACGACAAGTTGGAAAACGTTATCGCTGCTCAATCTG TTGGTATCAGAGGTGTTTTGTTCCAATCTCAACAAGACTTGAGAAGAGTTGTTTTGAACTTCTTGGGTGACCCAGTT CACAGAGGTTTGCAATTCTTGGCTGCTAACGCTAAGAAGATGGACTCTGTTACTAACACTGGTGACACTATCCAAG ACAACTTCGCTCAATTGTTGATCTTGGAATTGGCTCAAGACAGAGAATTGGTTAAGTTGCAAGCTGGTAAGAGAAC TTGGAACTACTTCATCGGTCCACCAAAGTTGACTACTGCTACTTTCCCAGACGACATGGACACTACTTCTATGGCTT TGTCTGTTTTGCCAGTTGCTGAAGACGTTGTTTCTTCTGTTTTGGACGAAATGTTGAAGTTCGTTACTGACGACGGT ATCTTCATGACTTACTTCGACTCTTCTAGACCAAGAGTTGACCCAGTTGTTTGTATCAACGTTTTGGGTGTTTTCTGT AGACACAACAGAGAAAGAGACGTTTTGCCAACTTTCCACTGGATCAGAGACATCTTGATCAACAGAGCTTACTTGT CTGGTACTAGATACTACCCATCTCCAGACTTGTTCTTGTTCTTCTTGGCTAGATTGTGTTTGGCTGTTAGAAACCAAT CTTTGAGAGAACAATTGGTTTTGCCATTGGTTGACAGATTGAGAGAAAGAGTTGGTGCTCCAGGTGAAGCTGTTTC TTTGGCTGCTAGAATCTTGGCTTGTAGATCTTTCGGTATCGACTCTGCTAGAGACATGGACTCTTTGAGAGGTAAG CAATGTGAAGACGGTGGTTGGCCAGTTGAATGGGTTTACAGATTCGCTTCTTTCGGTTTGAACGTTGGTAACAGAG GTTTGGCTACTGCTTTCGCTGTTAGAGCTTTGGAATCTCCATACGGTGAATCTGCTGTTAAGGTTATGAGAAGAATC GTTTAA Primers - Primer for construction of fragment “a” (LEU2 yeast marker) SEQ ID NO: 71 AGGTGCAGTTCGCGTGCAATTATAACGTCGTGGCAACTGTTATCAGTCGTACCGCGCCATTCGACTACGTCGTAAG GCC - Primer for construction of fragment “a” (LEU2 yeast marker) SEQ ID NO: 72 TCGTGGTCAAGGCGTGCAATTCTCAACACGAGAGTGATTCTTCGGCGTTGTTGCTGACCATCGACGGTCGAGGAG AACTT - Primer for construction of fragment “b” (AmpR E. 
coli marker) SEQ ID NO: 73 TGGTCAGCAACAACGCCGAAGAATCACTCTCGTGTTGAGAATTGCACGCCTTGACCACGACACGTTAAGGGATTTT GGTCATGAG - Primer for construction of fragment “b” (AmpR E. coli marker) SEQ ID NO: 74 AACGCGTACCCTAAGTACGGCACCACAGTGACTATGCAGTCCGCACTTTGCCAATGCCAAAAATGTGCGCGGAAC CCCTA - Primer for construction of fragment “c” (Yeast origin of replication) SEQ ID NO: 75 TTGGCATTGGCAAAGTGCGGACTGCATAGTCACTGTGGTGCCGTACTTAGGGTACGCGTTCCTGAACGAAGCATC TGTGCTTCA - Primer for construction of fragment “c” (Yeast origin of replication) SEQ ID NO: 76 CCGAGATGCCAAAGGATAGGTGCTATGTTGATGACTACGACACAGAACTGCGGGTGACATAATGATAGCATTGAA GGATGAGACT - Primer for construction of fragment “d” (E. coli origin of replication) SEQ ID NO: 77 ATGTCACCCGCAGTTCTGTGTCGTAGTCATCAACATAGCACCTATCCTTTGGCATCTCGGTGAGCAAAAGGCCAGC AAAAGG - Primer for construction of fragment “d” (E. coli origin of replication) SEQ ID NO: 78 CTCAGATGTACGGTGATCGCCACCATGTGACGGAAGCTATCCTGACAGTGTAGCAAGTGCTGAGCGTCAGACCCC GTAGAA
Claims (28)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/673,465 US11932894B2 (en) | 2017-06-02 | 2022-02-16 | Method for producing albicanol and/or drimenol |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP17174399 | 2017-06-02 | ||
EP17174399.0 | 2017-06-02 | ||
EP17174399 | 2017-06-02 | ||
PCT/EP2018/064344 WO2018220113A1 (en) | 2017-06-02 | 2018-05-31 | Method for producing albicanol and/or drimenol |
US201916618737A | 2019-12-02 | 2019-12-02 | |
US17/673,465 US11932894B2 (en) | 2017-06-02 | 2022-02-16 | Method for producing albicanol and/or drimenol |
Related Parent Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2018/064344 Division WO2018220113A1 (en) | 2017-06-02 | 2018-05-31 | Method for producing albicanol and/or drimenol |
US16/618,737 Division US11293037B2 (en) | 2017-06-02 | 2018-05-31 | Method for producing albicanol and/or drimenol |
Publications (3)
Publication Number | Publication Date |
---|---|
US20220186265A1 (en) | 2022-06-16 |
US20230148463A9 (en) | 2023-05-11 |
US11932894B2 (en) | 2024-03-19 |
Family ID: 59021376
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/618,737 Active 2038-11-07 US11293037B2 (en) | 2017-06-02 | 2018-05-31 | Method for producing albicanol and/or drimenol |
US17/673,465 Active US11932894B2 (en) | 2017-06-02 | 2022-02-16 | Method for producing albicanol and/or drimenol |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/618,737 Active 2038-11-07 US11293037B2 (en) | 2017-06-02 | 2018-05-31 | Method for producing albicanol and/or drimenol |
Country Status (6)
Country | Link |
---|---|
US (2) | US11293037B2 (en) |
EP (1) | EP3635123A1 (en) |
JP (1) | JP7191860B2 (en) |
CN (1) | CN110691850A (en) |
BR (1) | BR112019025376A2 (en) |
WO (1) | WO2018220113A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11345907B2 (en) * | 2018-05-29 | 2022-05-31 | Firmenich Sa | Method for producing albicanol compounds |
WO2021105236A2 (en) | 2019-11-27 | 2021-06-03 | Firmenich Sa | Novel polypeptides for producing albicanol and/or drimenol compounds |
EP4001417A1 (en) * | 2020-11-13 | 2022-05-25 | serYmun Yeast GmbH | Yeast platform for the production of vaccines |
WO2023156429A1 (en) | 2022-02-16 | 2023-08-24 | Firmenich Sa | Process for preparing octahydro-2(1h)-naphthalenone derivatives |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006134523A2 (en) * | 2005-06-17 | 2006-12-21 | Firmenich Sa | Novel sesquiterpene synthases and methods of their use |
CN104053770A (en) * | 2011-10-19 | 2014-09-17 | 凯金公司 | Methods for producing cinnamolide and/or drimendiol |
MX352318B (en) | 2011-11-01 | 2017-11-21 | Firmenich & Cie | Cytochrome p450 and use thereof for the enzymatic oxidation of terpenes. |
JP6882163B2 (en) | 2014-05-01 | 2021-06-02 | フイルメニツヒ ソシエテ アノニムFirmenich Sa | Cohesive flavor system |
WO2016094178A1 (en) * | 2014-12-09 | 2016-06-16 | Dsm Ip Assets B.V. | Methods for producing abienol |
US10849348B2 (en) | 2015-09-21 | 2020-12-01 | Firmenich Sa | Sucrose monoesters microemulsions |
WO2017077125A1 (en) * | 2015-11-05 | 2017-05-11 | Firmenich S.A | Drimenol synthases iii |
2018
- 2018-05-31: US US16/618,737 (US11293037B2), Active
- 2018-05-31: WO PCT/EP2018/064344 (WO2018220113A1), Application Filing
- 2018-05-31: EP EP18727313.1A (EP3635123A1), Pending
- 2018-05-31: JP JP2019566200A (JP7191860B2), Active
- 2018-05-31: BR BR112019025376-9A (BR112019025376A2), Search and Examination
- 2018-05-31: CN CN201880035852.0A (CN110691850A), Pending
2022
- 2022-02-16: US US17/673,465 (US11932894B2), Active
Also Published As
Publication number | Publication date |
---|---|
WO2018220113A1 (en) | 2018-12-06 |
CN110691850A (en) | 2020-01-14 |
US20200140898A1 (en) | 2020-05-07 |
US11293037B2 (en) | 2022-04-05 |
US11932894B2 (en) | 2024-03-19 |
EP3635123A1 (en) | 2020-04-15 |
JP7191860B2 (en) | 2022-12-19 |
JP2020523005A (en) | 2020-08-06 |
US20220186265A1 (en) | 2022-06-16 |
BR112019025376A2 (en) | 2020-06-30 |
Similar Documents
Publication | Title |
---|---|
US11932894B2 (en) | Method for producing albicanol and/or drimenol |
EP3140409B1 (en) | Drimenol synthase and method for producing drimenol |
US20100311134A1 (en) | Method for producing sclareol |
US11773414B2 (en) | Sesquiterpene synthases for production of drimenol and mixtures thereof |
US20210010035A1 (en) | Production of manool |
EP3140410B1 (en) | Drimenol synthases and method of producing drimenol |
US10337031B2 (en) | Production of fragrant compounds |
US10385361B2 (en) | Production of manool |
US20200308612A1 (en) | Vetiver |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: FIRMENICH SA, SWITZERLAND. ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SCHALK, MICHEL;ANZIANI, PAULINE;GOERNER, CHRISTIAN;AND OTHERS;REEL/FRAME:059141/0729. Effective date: 20191128 |
FEPP | Fee payment procedure | ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
FEPP | Fee payment procedure | PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PTGR); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
STCF | Information on status: patent grant | PATENTED CASE |