IL293855A - Microbial production of mogrol and mogrosides
Microbial production of mogrol and mogrosides
- Publication number
- IL293855A, IL29385522A
- Authority
- IL
- Israel
- Prior art keywords
- seq
- amino acid
- acid sequence
- ugt
- enzyme
- Prior art date
Links
- JLYBBRAAICDTIS-UHFFFAOYSA-N mogrol Natural products CC12C(O)CC3(C)C(C(CCC(O)C(C)(C)O)C)CCC3(C)C1CC=C1C2CCC(O)C1(C)C JLYBBRAAICDTIS-UHFFFAOYSA-N 0.000 title claims description 251
- JLYBBRAAICDTIS-AYEHCKLZSA-N mogrol Chemical compound C([C@H]1[C@]2(C)CC[C@@H]([C@]2(C[C@@H](O)[C@]11C)C)[C@@H](CC[C@@H](O)C(C)(C)O)C)C=C2[C@H]1CC[C@H](O)C2(C)C JLYBBRAAICDTIS-AYEHCKLZSA-N 0.000 title claims description 215
- 230000000813 microbial effect Effects 0.000 title claims description 142
- 238000004519 manufacturing process Methods 0.000 title claims description 71
- 229930189775 mogroside Natural products 0.000 title claims description 36
- 108090000790 Enzymes Proteins 0.000 claims description 438
- 102000004190 Enzymes Human genes 0.000 claims description 436
- 150000001413 amino acids Chemical group 0.000 claims description 372
- 235000001014 amino acid Nutrition 0.000 claims description 276
- 229940024606 amino acid Drugs 0.000 claims description 196
- 230000004048 modification Effects 0.000 claims description 161
- 238000012986 modification Methods 0.000 claims description 161
- 238000000034 method Methods 0.000 claims description 128
- 108020003891 Squalene monooxygenase Proteins 0.000 claims description 113
- 102000005782 Squalene Monooxygenase Human genes 0.000 claims description 112
- 238000006467 substitution reaction Methods 0.000 claims description 108
- 230000037361 pathway Effects 0.000 claims description 92
- 108010022535 Farnesyl-Diphosphate Farnesyltransferase Proteins 0.000 claims description 68
- 102100037997 Squalene synthase Human genes 0.000 claims description 68
- 238000012217 deletion Methods 0.000 claims description 60
- 230000037430 deletion Effects 0.000 claims description 60
- 238000003780 insertion Methods 0.000 claims description 59
- 230000037431 insertion Effects 0.000 claims description 59
- 238000006206 glycosylation reaction Methods 0.000 claims description 54
- 108020002908 Epoxide hydrolase Proteins 0.000 claims description 52
- 102000005486 Epoxide hydrolase Human genes 0.000 claims description 52
- 230000014509 gene expression Effects 0.000 claims description 47
- 230000013595 glycosylation Effects 0.000 claims description 47
- mogrol glycosides Chemical class 0.000 claims description 42
- 108090000623 proteins and genes Proteins 0.000 claims description 41
- 229930182470 glycoside Natural products 0.000 claims description 39
- WSPRAEIJBDUDRX-UHFFFAOYSA-N Euferol Natural products CC12CCC3(C)C(C(CCC=C(C)C)C)CCC3(C)C1CC=C1C2CCC(O)C1(C)C WSPRAEIJBDUDRX-UHFFFAOYSA-N 0.000 claims description 32
- WSPRAEIJBDUDRX-FBJXRMALSA-N cucurbitadienol Chemical compound C([C@H]1[C@]2(C)CC[C@@H]([C@]2(CC[C@]11C)C)[C@@H](CCC=C(C)C)C)C=C2[C@H]1CC[C@H](O)C2(C)C WSPRAEIJBDUDRX-FBJXRMALSA-N 0.000 claims description 32
- 230000000694 effects Effects 0.000 claims description 32
- 238000006243 chemical reaction Methods 0.000 claims description 31
- 108010045510 NADPH-Ferrihemoprotein Reductase Proteins 0.000 claims description 28
- 102000002004 Cytochrome P-450 Enzyme System Human genes 0.000 claims description 26
- 108010015742 Cytochrome P-450 Enzyme System Proteins 0.000 claims description 26
- 239000002253 acid Substances 0.000 claims description 24
- 101710095468 Cyclase Proteins 0.000 claims description 23
- 150000003648 triterpenes Chemical class 0.000 claims description 23
- NUHSROFQTUXZQQ-UHFFFAOYSA-N isopentenyl diphosphate Chemical compound CC(=C)CCO[P@](O)(=O)OP(O)(O)=O NUHSROFQTUXZQQ-UHFFFAOYSA-N 0.000 claims description 22
- QYIMSPSDBYKPPY-RSKUXYSASA-N (S)-2,3-epoxysqualene Chemical compound CC(C)=CCC\C(C)=C\CC\C(C)=C\CC\C=C(/C)CC\C=C(/C)CC[C@@H]1OC1(C)C QYIMSPSDBYKPPY-RSKUXYSASA-N 0.000 claims description 21
- 102000004316 Oxidoreductases Human genes 0.000 claims description 21
- 108090000854 Oxidoreductases Proteins 0.000 claims description 21
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 claims description 17
- QYIMSPSDBYKPPY-UHFFFAOYSA-N OS Natural products CC(C)=CCCC(C)=CCCC(C)=CCCC=C(C)CCC=C(C)CCC1OC1(C)C QYIMSPSDBYKPPY-UHFFFAOYSA-N 0.000 claims description 17
- 241000588724 Escherichia coli Species 0.000 claims description 16
- YYGNTYWPHWGJRM-UHFFFAOYSA-N (6E,10E,14E,18E)-2,6,10,15,19,23-hexamethyltetracosa-2,6,10,14,18,22-hexaene Chemical compound CC(C)=CCCC(C)=CCCC(C)=CCCC=C(C)CCC=C(C)CCC=C(C)C YYGNTYWPHWGJRM-UHFFFAOYSA-N 0.000 claims description 15
- BHEOSNUKNHRBNM-UHFFFAOYSA-N Tetramethylsqualene Natural products CC(=C)C(C)CCC(=C)C(C)CCC(C)=CCCC=C(C)CCC(C)C(=C)CCC(C)C(C)=C BHEOSNUKNHRBNM-UHFFFAOYSA-N 0.000 claims description 15
- PRAKJMSDJKAYCZ-UHFFFAOYSA-N dodecahydrosqualene Natural products CC(C)CCCC(C)CCCC(C)CCCCC(C)CCCC(C)CCCC(C)C PRAKJMSDJKAYCZ-UHFFFAOYSA-N 0.000 claims description 15
- 229930190082 siamenoside Natural products 0.000 claims description 15
- TUHBEKDERLKLEC-UHFFFAOYSA-N squalene Natural products CC(=CCCC(=CCCC(=CCCC=C(/C)CCC=C(/C)CC=C(C)C)C)C)C TUHBEKDERLKLEC-UHFFFAOYSA-N 0.000 claims description 15
- 229940031439 squalene Drugs 0.000 claims description 15
- KABSNIWLJXCBGG-TXLDAEQNSA-N 3-[(3e,7e,11e,15e)-18-(3,3-dimethyloxiran-2-yl)-3,7,12,16-tetramethyloctadeca-3,7,11,15-tetraenyl]-2,2-dimethyloxirane Chemical compound O1C(C)(C)C1CCC(/C)=C/CCC(/C)=C/CC\C=C(/C)CC\C=C(/C)CCC1OC1(C)C KABSNIWLJXCBGG-TXLDAEQNSA-N 0.000 claims description 14
- CBIDRCWHNCKSTO-UHFFFAOYSA-N prenyl diphosphate Chemical compound CC(C)=CCO[P@](O)(=O)OP(O)(O)=O CBIDRCWHNCKSTO-UHFFFAOYSA-N 0.000 claims description 14
- 102000004169 proteins and genes Human genes 0.000 claims description 14
- 108700023372 Glycosyltransferases Proteins 0.000 claims description 13
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 claims description 13
- 239000000203 mixture Substances 0.000 claims description 13
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 claims description 13
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 claims description 12
- 102000051366 Glycosyltransferases Human genes 0.000 claims description 12
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 claims description 12
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 claims description 12
- 240000004808 Saccharomyces cerevisiae Species 0.000 claims description 12
- 235000014680 Saccharomyces cerevisiae Nutrition 0.000 claims description 12
- 235000004279 alanine Nutrition 0.000 claims description 12
- 230000001580 bacterial effect Effects 0.000 claims description 12
- 235000018102 proteins Nutrition 0.000 claims description 12
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 claims description 11
- XCCTYIAWTASOJW-XVFCMESISA-N Uridine-5'-Diphosphate Chemical compound O[C@@H]1[C@H](O)[C@@H](COP(O)(=O)OP(O)(O)=O)O[C@H]1N1C(=O)NC(=O)C=C1 XCCTYIAWTASOJW-XVFCMESISA-N 0.000 claims description 11
- 101710192343 NADPH:adrenodoxin oxidoreductase, mitochondrial Proteins 0.000 claims description 10
- 102100036777 NADPH:adrenodoxin oxidoreductase, mitochondrial Human genes 0.000 claims description 10
- 101710104207 Probable NADPH:adrenodoxin oxidoreductase, mitochondrial Proteins 0.000 claims description 10
- 244000228451 Stevia rebaudiana Species 0.000 claims description 10
- 230000001419 dependent effect Effects 0.000 claims description 10
- 102220117512 rs886041928 Human genes 0.000 claims description 10
- 241000894006 Bacteria Species 0.000 claims description 9
- 102000018832 Cytochromes Human genes 0.000 claims description 9
- 108010052832 Cytochromes Proteins 0.000 claims description 9
- 235000011180 diphosphates Nutrition 0.000 claims description 9
- 235000013305 food Nutrition 0.000 claims description 9
- 235000003599 food sweetener Nutrition 0.000 claims description 9
- 125000002887 hydroxy group Chemical group [H]O* 0.000 claims description 9
- 239000004471 Glycine Substances 0.000 claims description 8
- KZSNJWFQEVHDMF-BYPYZUCNSA-N L-valine Chemical compound CC(C)[C@H](N)C(O)=O KZSNJWFQEVHDMF-BYPYZUCNSA-N 0.000 claims description 8
- KZSNJWFQEVHDMF-UHFFFAOYSA-N Valine Natural products CC(C)C(N)C(O)=O KZSNJWFQEVHDMF-UHFFFAOYSA-N 0.000 claims description 8
- 230000001965 increasing effect Effects 0.000 claims description 8
- AGPKZVBTJJNPAG-UHFFFAOYSA-N isoleucine Natural products CCC(C)C(N)C(O)=O AGPKZVBTJJNPAG-UHFFFAOYSA-N 0.000 claims description 8
- 229960000310 isoleucine Drugs 0.000 claims description 8
- 102220244484 rs373526599 Human genes 0.000 claims description 8
- 239000003765 sweetening agent Substances 0.000 claims description 8
- 239000004474 valine Substances 0.000 claims description 8
- 235000006092 Stevia rebaudiana Nutrition 0.000 claims description 7
- HSCJRCZFDFQWRP-JZMIEXBBSA-N UDP-alpha-D-glucose Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CO)O[C@@H]1OP(O)(=O)OP(O)(=O)OC[C@@H]1[C@@H](O)[C@@H](O)[C@H](N2C(NC(=O)C=C2)=O)O1 HSCJRCZFDFQWRP-JZMIEXBBSA-N 0.000 claims description 7
- HSCJRCZFDFQWRP-UHFFFAOYSA-N Uridine diphosphate glucose Natural products OC1C(O)C(O)C(CO)OC1OP(O)(=O)OP(O)(=O)OCC1C(O)C(O)C(N2C(NC(=O)C=C2)=O)O1 HSCJRCZFDFQWRP-UHFFFAOYSA-N 0.000 claims description 7
- 235000013361 beverage Nutrition 0.000 claims description 7
- 238000012239 gene modification Methods 0.000 claims description 7
- 230000005017 genetic modification Effects 0.000 claims description 7
- 235000013617 genetically modified food Nutrition 0.000 claims description 7
- 102220057405 rs730881771 Human genes 0.000 claims description 7
- 101710106940 Iron oxidase Proteins 0.000 claims description 6
- 239000001177 diphosphate Substances 0.000 claims description 6
- 102200027729 rs62642928 Human genes 0.000 claims description 6
- 102220501176 E3 ubiquitin-protein ligase UBR1_C127F_mutation Human genes 0.000 claims description 5
- COLNVLDHVKWLRT-QMMMGPOBSA-N L-phenylalanine Chemical compound OC(=O)[C@@H](N)CC1=CC=CC=C1 COLNVLDHVKWLRT-QMMMGPOBSA-N 0.000 claims description 5
- 102220611084 Steroid 21-hydroxylase_V281L_mutation Human genes 0.000 claims description 5
- 235000015218 chewing gum Nutrition 0.000 claims description 5
- XPPKVPWEQAFLFU-UHFFFAOYSA-J diphosphate(4-) Chemical compound [O-]P([O-])(=O)OP([O-])([O-])=O XPPKVPWEQAFLFU-UHFFFAOYSA-J 0.000 claims description 5
- 150000003278 haem Chemical class 0.000 claims description 5
- 230000002018 overexpression Effects 0.000 claims description 5
- 102200029262 rs12065961 Human genes 0.000 claims description 5
- 102220288039 rs1555914287 Human genes 0.000 claims description 5
- 102220235355 rs201630151 Human genes 0.000 claims description 5
- 241000894007 species Species 0.000 claims description 5
- MTCFGRXMJLQNBG-REOHCLBHSA-N (2S)-2-Amino-3-hydroxypropanoic acid Chemical compound OC[C@H](N)C(O)=O MTCFGRXMJLQNBG-REOHCLBHSA-N 0.000 claims description 4
- 241000193830 Bacillus <bacterium> Species 0.000 claims description 4
- 244000063299 Bacillus subtilis Species 0.000 claims description 4
- 235000014469 Bacillus subtilis Nutrition 0.000 claims description 4
- 241000588722 Escherichia Species 0.000 claims description 4
- AYFVYJQAPQTCCC-GBXIJSLDSA-N L-threonine Chemical compound C[C@@H](O)[C@H](N)C(O)=O AYFVYJQAPQTCCC-GBXIJSLDSA-N 0.000 claims description 4
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 claims description 4
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 claims description 4
- 241000235648 Pichia Species 0.000 claims description 4
- 241000589776 Pseudomonas putida Species 0.000 claims description 4
- 241000235070 Saccharomyces Species 0.000 claims description 4
- 239000004383 Steviol glycoside Substances 0.000 claims description 4
- 240000006365 Vitis vinifera Species 0.000 claims description 4
- 241000235015 Yarrowia lipolytica Species 0.000 claims description 4
- 229940112822 chewing gum Drugs 0.000 claims description 4
- 108010087911 flavodoxin NADPH oxidoreductase Proteins 0.000 claims description 4
- 230000027756 respiratory electron transport chain Effects 0.000 claims description 4
- 235000019411 steviol glycoside Nutrition 0.000 claims description 4
- 229930182488 steviol glycoside Natural products 0.000 claims description 4
- 150000008144 steviol glycosides Chemical class 0.000 claims description 4
- 235000019202 steviosides Nutrition 0.000 claims description 4
- 239000004475 Arginine Substances 0.000 claims description 3
- 240000007154 Coffea arabica Species 0.000 claims description 3
- 235000007460 Coffea arabica Nutrition 0.000 claims description 3
- 239000004593 Epoxy Substances 0.000 claims description 3
- 241000235058 Komagataella pastoris Species 0.000 claims description 3
- 241000191043 Rhodobacter sphaeroides Species 0.000 claims description 3
- 241000607365 Vibrio natriegens Species 0.000 claims description 3
- 241000235013 Yarrowia Species 0.000 claims description 3
- 241000588902 Zymomonas mobilis Species 0.000 claims description 3
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 claims description 3
- 239000002417 nutraceutical Substances 0.000 claims description 3
- 235000021436 nutraceutical agent Nutrition 0.000 claims description 3
- 239000008194 pharmaceutical composition Substances 0.000 claims description 3
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 claims description 3
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 claims description 3
- 108010011485 Aspartame Proteins 0.000 claims description 2
- 241000186018 Bifidobacterium adolescentis Species 0.000 claims description 2
- 239000004472 Lysine Substances 0.000 claims description 2
- 239000004384 Neotame Substances 0.000 claims description 2
- 241000191025 Rhodobacter Species 0.000 claims description 2
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 claims description 2
- AYFVYJQAPQTCCC-UHFFFAOYSA-N Threonine Natural products CC(O)C(N)C(O)=O AYFVYJQAPQTCCC-UHFFFAOYSA-N 0.000 claims description 2
- 239000004473 Threonine Substances 0.000 claims description 2
- 239000000605 aspartame Substances 0.000 claims description 2
- 235000010357 aspartame Nutrition 0.000 claims description 2
- IAOZJIPTCAWIRG-QWRGUYRKSA-N aspartame Chemical compound OC(=O)C[C@H](N)C(=O)N[C@H](C(=O)OC)CC1=CC=CC=C1 IAOZJIPTCAWIRG-QWRGUYRKSA-N 0.000 claims description 2
- 229960003438 aspartame Drugs 0.000 claims description 2
- 235000019412 neotame Nutrition 0.000 claims description 2
- HLIAVLHNDJUHFG-HOTGVXAUSA-N neotame Chemical compound CC(C)(C)CCN[C@@H](CC(O)=O)C(=O)N[C@H](C(=O)OC)CC1=CC=CC=C1 HLIAVLHNDJUHFG-HOTGVXAUSA-N 0.000 claims description 2
- 108010070257 neotame Proteins 0.000 claims description 2
- RPYRMTHVSUWHSV-CUZJHZIBSA-N rebaudioside D Chemical compound O([C@H]1[C@H](O)[C@@H](CO)O[C@H]([C@@H]1O[C@H]1[C@@H]([C@@H](O)[C@H](O)[C@@H](CO)O1)O)O[C@]12C(=C)C[C@@]3(C1)CC[C@@H]1[C@@](C)(CCC[C@]1([C@@H]3CC2)C)C(=O)O[C@H]1[C@@H]([C@@H](O)[C@H](O)[C@@H](CO)O1)O[C@H]1[C@@H]([C@@H](O)[C@H](O)[C@@H](CO)O1)O)[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O RPYRMTHVSUWHSV-CUZJHZIBSA-N 0.000 claims description 2
- 235000019505 tobacco product Nutrition 0.000 claims description 2
- 102220035924 rs515726219 Human genes 0.000 claims 3
- 102100023072 Neurolysin, mitochondrial Human genes 0.000 claims 2
- 101710135150 (+)-T-muurolol synthase ((2E,6E)-farnesyl diphosphate cyclizing) Proteins 0.000 claims 1
- 241000186226 Corynebacterium glutamicum Species 0.000 claims 1
- 108700016613 E coli galU Proteins 0.000 claims 1
- 101710119400 Geranylfarnesyl diphosphate synthase Proteins 0.000 claims 1
- 101710107752 Geranylgeranyl diphosphate synthase Proteins 0.000 claims 1
- 241000191023 Rhodobacter capsulatus Species 0.000 claims 1
- 125000003275 alpha amino acid group Chemical group 0.000 description 206
- 210000004027 cell Anatomy 0.000 description 89
- 239000000758 substrate Substances 0.000 description 35
- XMWHRVNVKDKBRG-CRCLSJGQSA-N [(2s,3r)-2,3,4-trihydroxy-3-methylbutyl] dihydrogen phosphate Chemical compound OC[C@](O)(C)[C@@H](O)COP(O)(O)=O XMWHRVNVKDKBRG-CRCLSJGQSA-N 0.000 description 20
- 238000002474 experimental method Methods 0.000 description 19
- XMWHRVNVKDKBRG-UHNVWZDZSA-N 2-C-Methyl-D-erythritol 4-phosphate Natural products OC[C@@](O)(C)[C@H](O)COP(O)(O)=O XMWHRVNVKDKBRG-UHNVWZDZSA-N 0.000 description 17
- 244000185386 Thladiantha grosvenorii Species 0.000 description 17
- 235000011171 Thladiantha grosvenorii Nutrition 0.000 description 17
- 238000000855 fermentation Methods 0.000 description 17
- 230000004151 fermentation Effects 0.000 description 17
- 150000003505 terpenes Chemical class 0.000 description 17
- 230000015572 biosynthetic process Effects 0.000 description 16
- 239000000543 intermediate Substances 0.000 description 15
- SIKJAQJRHWYJAI-UHFFFAOYSA-N Indole Chemical compound C1=CC=C2NC=CC2=C1 SIKJAQJRHWYJAI-UHFFFAOYSA-N 0.000 description 14
- 108030005251 Cucurbitadienol synthases Proteins 0.000 description 12
- 102100035111 Farnesyl pyrophosphate synthase Human genes 0.000 description 12
- 235000013399 edible fruits Nutrition 0.000 description 12
- 101710125754 Farnesyl pyrophosphate synthase Proteins 0.000 description 11
- 150000001875 compounds Chemical class 0.000 description 11
- GHBNZZJYBXQAHG-KUVSNLSMSA-N (2r,3r,4s,5s,6r)-2-[[(2r,3s,4s,5r,6r)-6-[[(3s,8s,9r,10r,11r,13r,14s,17r)-17-[(2r,5r)-5-[(2s,3r,4s,5s,6r)-4,5-dihydroxy-3-[(2r,3r,4s,5s,6r)-3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-6-[[(2r,3r,4s,5s,6r)-3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy Chemical compound C([C@H]1O[C@H]([C@@H]([C@@H](O)[C@@H]1O)O[C@H]1[C@@H]([C@@H](O)[C@H](O)[C@@H](CO)O1)O)O[C@H](CC[C@@H](C)[C@@H]1[C@]2(C[C@@H](O)[C@@]3(C)[C@H]4C(C([C@@H](O[C@H]5[C@@H]([C@@H](O)[C@H](O)[C@@H](CO[C@H]6[C@@H]([C@@H](O)[C@H](O)[C@@H](CO)O6)O)O5)O)CC4)(C)C)=CC[C@H]3[C@]2(C)CC1)C)C(C)(C)O)O[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O GHBNZZJYBXQAHG-KUVSNLSMSA-N 0.000 description 10
- VWFJDQUYCIWHTN-YFVJMOTDSA-N 2-trans,6-trans-farnesyl diphosphate Chemical compound CC(C)=CCC\C(C)=C\CC\C(C)=C\CO[P@](O)(=O)OP(O)(O)=O VWFJDQUYCIWHTN-YFVJMOTDSA-N 0.000 description 10
- 240000001980 Cucurbita pepo Species 0.000 description 10
- 235000009852 Cucurbita pepo Nutrition 0.000 description 10
- TVJXHJAWHUMLLG-UHFFFAOYSA-N mogroside V Natural products CC(CCC(OC1OC(COC2OC(CO)C(O)C(O)C2OC3OC(CO)C(O)C(O)C3O)C(O)C(O)C1O)C(C)(C)O)C4CCC5(C)C6CC=C7C(CCC(OC8OC(COC9OC(CO)C(O)C(O)C9O)C(O)C(O)C8O)C7(C)C)C6(C)C(O)CC45C TVJXHJAWHUMLLG-UHFFFAOYSA-N 0.000 description 10
- XBZYWSMVVKYHQN-MYPRUECHSA-N (4as,6as,6br,8ar,9r,10s,12ar,12br,14bs)-10-hydroxy-2,2,6a,6b,9,12a-hexamethyl-9-[(sulfooxy)methyl]-1,2,3,4,4a,5,6,6a,6b,7,8,8a,9,10,11,12,12a,12b,13,14b-icosahydropicene-4a-carboxylic acid Chemical compound C1C[C@H](O)[C@@](C)(COS(O)(=O)=O)[C@@H]2CC[C@@]3(C)[C@]4(C)CC[C@@]5(C(O)=O)CCC(C)(C)C[C@H]5C4=CC[C@@H]3[C@]21C XBZYWSMVVKYHQN-MYPRUECHSA-N 0.000 description 9
- 241000196324 Embryophyta Species 0.000 description 9
- 101001056878 Homo sapiens Squalene monooxygenase Proteins 0.000 description 9
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical compound [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 description 8
- 229910052799 carbon Inorganic materials 0.000 description 8
- 235000021096 natural sweeteners Nutrition 0.000 description 8
- 244000241257 Cucumis melo Species 0.000 description 7
- 235000009842 Cucumis melo Nutrition 0.000 description 7
- 235000009849 Cucumis sativus Nutrition 0.000 description 7
- 240000008067 Cucumis sativus Species 0.000 description 7
- VWFJDQUYCIWHTN-UHFFFAOYSA-N Farnesyl pyrophosphate Natural products CC(C)=CCCC(C)=CCCC(C)=CCOP(O)(=O)OP(O)(O)=O VWFJDQUYCIWHTN-UHFFFAOYSA-N 0.000 description 7
- 244000302512 Momordica charantia Species 0.000 description 7
- 235000009811 Momordica charantia Nutrition 0.000 description 7
- 101150053185 P450 gene Proteins 0.000 description 7
- 238000009825 accumulation Methods 0.000 description 7
- 230000009471 action Effects 0.000 description 7
- 238000006735 epoxidation reaction Methods 0.000 description 7
- PZOUSPYUWWUPPK-UHFFFAOYSA-N indole Natural products CC1=CC=CC2=C1C=CN2 PZOUSPYUWWUPPK-UHFFFAOYSA-N 0.000 description 7
- RKJUIXBNRJVNHR-UHFFFAOYSA-N indolenine Natural products C1=CC=C2CC=NC2=C1 RKJUIXBNRJVNHR-UHFFFAOYSA-N 0.000 description 7
- 238000001294 liquid chromatography-tandem mass spectrometry Methods 0.000 description 7
- 230000035772 mutation Effects 0.000 description 7
- 235000007586 terpenes Nutrition 0.000 description 7
- KJTLQQUUPVSXIM-ZCFIWIBFSA-M (R)-mevalonate Chemical compound OCC[C@](O)(C)CC([O-])=O KJTLQQUUPVSXIM-ZCFIWIBFSA-M 0.000 description 6
- 241000219195 Arabidopsis thaliana Species 0.000 description 6
- 235000009854 Cucurbita moschata Nutrition 0.000 description 6
- 240000004244 Cucurbita moschata Species 0.000 description 6
- KJTLQQUUPVSXIM-UHFFFAOYSA-N DL-mevalonic acid Natural products OCCC(O)(C)CC(O)=O KJTLQQUUPVSXIM-UHFFFAOYSA-N 0.000 description 6
- 101000878981 Homo sapiens Squalene synthase Proteins 0.000 description 6
- TWRXJAOTZQYOKJ-UHFFFAOYSA-L Magnesium chloride Chemical compound [Mg+2].[Cl-].[Cl-] TWRXJAOTZQYOKJ-UHFFFAOYSA-L 0.000 description 6
- ZSLZBFCDCINBPY-ZSJPKINUSA-N acetyl-CoA Chemical compound O[C@@H]1[C@H](OP(O)(O)=O)[C@@H](COP(O)(=O)OP(O)(=O)OCC(C)(C)[C@@H](O)C(=O)NCCC(=O)NCCSC(=O)C)O[C@H]1N1C2=NC=NC(N)=C2N=C1 ZSLZBFCDCINBPY-ZSJPKINUSA-N 0.000 description 6
- 238000000605 extraction Methods 0.000 description 6
- 230000004907 flux Effects 0.000 description 6
- 125000002791 glucosyl group Chemical group C1([C@H](O)[C@@H](O)[C@H](O)[C@H](O1)CO)* 0.000 description 6
- 101150011963 sohB gene Proteins 0.000 description 6
- 239000000126 substance Substances 0.000 description 6
- BXJMMHFVUAQJBV-WSZKGZBVSA-N 11alpha-hydroxycucurbitadienol Chemical compound C[C@H](CCC=C(C)C)[C@H]1CC[C@@]2(C)[C@@H]3CC=C4[C@@H](CC[C@H](O)C4(C)C)[C@]3(C)[C@H](O)C[C@]12C BXJMMHFVUAQJBV-WSZKGZBVSA-N 0.000 description 5
- 101100180240 Burkholderia pseudomallei (strain K96243) ispH2 gene Proteins 0.000 description 5
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 5
- 241001409321 Siraitia grosvenorii Species 0.000 description 5
- 101100126492 Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145) ispG1 gene Proteins 0.000 description 5
- 235000009508 confectionery Nutrition 0.000 description 5
- 239000013078 crystal Substances 0.000 description 5
- 101150118992 dxr gene Proteins 0.000 description 5
- 239000008103 glucose Substances 0.000 description 5
- 238000001727 in vivo Methods 0.000 description 5
- 101150068863 ispE gene Proteins 0.000 description 5
- 101150081094 ispG gene Proteins 0.000 description 5
- 101150017044 ispH gene Proteins 0.000 description 5
- 230000003647 oxidation Effects 0.000 description 5
- 238000007254 oxidation reaction Methods 0.000 description 5
- 239000013612 plasmid Substances 0.000 description 5
- 230000008569 process Effects 0.000 description 5
- GSGVXNMGMKBGQU-PHESRWQRSA-N rebaudioside M Chemical compound C[C@@]12CCC[C@](C)([C@H]1CC[C@@]13CC(=C)[C@@](C1)(CC[C@@H]23)O[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O[C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)[C@H]1O[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O)C(=O)O[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O[C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)[C@H]1O[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O GSGVXNMGMKBGQU-PHESRWQRSA-N 0.000 description 5
- 238000010183 spectrum analysis Methods 0.000 description 5
- 239000013598 vector Substances 0.000 description 5
- 101100152417 Bacillus spizizenii (strain ATCC 23059 / NRRL B-14472 / W23) tarI gene Proteins 0.000 description 4
- 241000283690 Bos taurus Species 0.000 description 4
- 101100453077 Botryococcus braunii HDR gene Proteins 0.000 description 4
- 102000057412 Diphosphomevalonate decarboxylases Human genes 0.000 description 4
- 240000005979 Hordeum vulgare Species 0.000 description 4
- 235000007340 Hordeum vulgare Nutrition 0.000 description 4
- 108700040132 Mevalonate kinases Proteins 0.000 description 4
- 101000958834 Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) Diphosphomevalonate decarboxylase mvd1 Proteins 0.000 description 4
- 101000958925 Panax ginseng Diphosphomevalonate decarboxylase 1 Proteins 0.000 description 4
- 102100024279 Phosphomevalonate kinase Human genes 0.000 description 4
- 101100397457 Plasmodium falciparum (isolate 3D7) LytB gene Proteins 0.000 description 4
- 101150031212 SQE2 gene Proteins 0.000 description 4
- 241001409305 Siraitia Species 0.000 description 4
- 238000007792 addition Methods 0.000 description 4
- 125000000539 amino acid group Chemical group 0.000 description 4
- 235000011869 dried fruits Nutrition 0.000 description 4
- 239000000284 extract Substances 0.000 description 4
- 239000000796 flavoring agent Substances 0.000 description 4
- 235000019634 flavors Nutrition 0.000 description 4
- 238000002290 gas chromatography-mass spectrometry Methods 0.000 description 4
- 230000006872 improvement Effects 0.000 description 4
- 238000000338 in vitro Methods 0.000 description 4
- 150000002500 ions Chemical class 0.000 description 4
- 101150014059 ispD gene Proteins 0.000 description 4
- 101150022203 ispDF gene Proteins 0.000 description 4
- 101150018742 ispF gene Proteins 0.000 description 4
- 102000002678 mevalonate kinase Human genes 0.000 description 4
- 235000013615 non-nutritive sweetener Nutrition 0.000 description 4
- 108091000116 phosphomevalonate kinase Proteins 0.000 description 4
- 230000001105 regulatory effect Effects 0.000 description 4
- 238000012216 screening Methods 0.000 description 4
- 108010006229 Acetyl-CoA C-acetyltransferase Proteins 0.000 description 3
- 102100037768 Acetyl-CoA acetyltransferase, mitochondrial Human genes 0.000 description 3
- 241000251468 Actinopterygii Species 0.000 description 3
- 235000011446 Amygdalus persica Nutrition 0.000 description 3
- 101100397224 Bacillus subtilis (strain 168) isp gene Proteins 0.000 description 3
- 235000015844 Citrullus colocynthis Nutrition 0.000 description 3
- 240000000885 Citrullus colocynthis Species 0.000 description 3
- 108020004414 DNA Proteins 0.000 description 3
- 101100286286 Dictyostelium discoideum ipi gene Proteins 0.000 description 3
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 3
- 244000043261 Hevea brasiliensis Species 0.000 description 3
- 101100509110 Leifsonia xyli subsp. xyli (strain CTCB07) ispDF gene Proteins 0.000 description 3
- 244000232885 Naematoloma sublateritium Species 0.000 description 3
- 235000016009 Naematoloma sublateritium Nutrition 0.000 description 3
- 240000007594 Oryza sativa Species 0.000 description 3
- 235000007164 Oryza sativa Nutrition 0.000 description 3
- 108090000608 Phosphoric Monoester Hydrolases Proteins 0.000 description 3
- 102000004160 Phosphoric Monoester Hydrolases Human genes 0.000 description 3
- 240000005809 Prunus persica Species 0.000 description 3
- 101150091539 SQE1 gene Proteins 0.000 description 3
- 101100052502 Shigella flexneri yciB gene Proteins 0.000 description 3
- 101100278777 Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145) dxs1 gene Proteins 0.000 description 3
- CZMRCDWAGMRECN-UGDNZRGBSA-N Sucrose Chemical compound O[C@H]1[C@H](O)[C@@H](CO)O[C@@]1(CO)O[C@@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO)O1 CZMRCDWAGMRECN-UGDNZRGBSA-N 0.000 description 3
- 229930006000 Sucrose Natural products 0.000 description 3
- 244000299461 Theobroma cacao Species 0.000 description 3
- 101000692921 Uncultured bacterium HF130_AEPn_1 2-amino-1-hydroxyethylphosphonate dioxygenase (glycine-forming) Proteins 0.000 description 3
- 241001247821 Ziziphus Species 0.000 description 3
- 238000013459 approach Methods 0.000 description 3
- 210000004436 artificial bacterial chromosome Anatomy 0.000 description 3
- 235000021311 artificial sweeteners Nutrition 0.000 description 3
- 230000006696 biosynthetic metabolic pathway Effects 0.000 description 3
- XPPKVPWEQAFLFU-UHFFFAOYSA-N diphosphoric acid Chemical compound OP(O)(=O)OP(O)(O)=O XPPKVPWEQAFLFU-UHFFFAOYSA-N 0.000 description 3
- 238000009510 drug design Methods 0.000 description 3
- 101150056470 dxs gene Proteins 0.000 description 3
- 101150014423 fni gene Proteins 0.000 description 3
- 235000021474 generally recognized As safe (food) Nutrition 0.000 description 3
- 235000021473 generally recognized as safe (food ingredients) Nutrition 0.000 description 3
- 230000007407 health benefit Effects 0.000 description 3
- 230000036571 hydration Effects 0.000 description 3
- 238000006703 hydration reaction Methods 0.000 description 3
- 230000033444 hydroxylation Effects 0.000 description 3
- 238000005805 hydroxylation reaction Methods 0.000 description 3
- 101150075592 idi gene Proteins 0.000 description 3
- 239000004615 ingredient Substances 0.000 description 3
- 230000003993 interaction Effects 0.000 description 3
- 101150064873 ispA gene Proteins 0.000 description 3
- 229910001629 magnesium chloride Inorganic materials 0.000 description 3
- 238000005457 optimization Methods 0.000 description 3
- 239000006072 paste Substances 0.000 description 3
- 239000005720 sucrose Substances 0.000 description 3
- 235000000346 sugar Nutrition 0.000 description 3
- DGVVWUTYPXICAM-UHFFFAOYSA-N β‐Mercaptoethanol Chemical compound OCCS DGVVWUTYPXICAM-UHFFFAOYSA-N 0.000 description 3
- OKZYCXHTTZZYSK-ZCFIWIBFSA-N (R)-5-phosphomevalonic acid Chemical compound OC(=O)C[C@@](O)(C)CCOP(O)(O)=O OKZYCXHTTZZYSK-ZCFIWIBFSA-N 0.000 description 2
- TWCMVXMQHSVIOJ-UHFFFAOYSA-N Aglycone of yadanzioside D Natural products COC(=O)C12OCC34C(CC5C(=CC(O)C(O)C5(C)C3C(O)C1O)C)OC(=O)C(OC(=O)C)C24 TWCMVXMQHSVIOJ-UHFFFAOYSA-N 0.000 description 2
- 235000001405 Artemisia annua Nutrition 0.000 description 2
- 240000000011 Artemisia annua Species 0.000 description 2
- PLMKQQMDOMTZGG-UHFFFAOYSA-N Astrantiagenin E-methylester Natural products CC12CCC(O)C(C)(CO)C1CCC1(C)C2CC=C2C3CC(C)(C)CCC3(C(=O)OC)CCC21C PLMKQQMDOMTZGG-UHFFFAOYSA-N 0.000 description 2
- 101100326920 Caenorhabditis elegans ctl-1 gene Proteins 0.000 description 2
- 241000723377 Coffea Species 0.000 description 2
- 235000009804 Cucurbita pepo subsp pepo Nutrition 0.000 description 2
- 241000219130 Cucurbita pepo subsp. pepo Species 0.000 description 2
- SRBFZHDQGSBBOR-IOVATXLUSA-N D-xylopyranose Chemical compound O[C@@H]1COC(O)[C@H](O)[C@H]1O SRBFZHDQGSBBOR-IOVATXLUSA-N 0.000 description 2
- 206010013911 Dysgeusia Diseases 0.000 description 2
- 102100037584 FAST kinase domain-containing protein 4 Human genes 0.000 description 2
- 235000000554 Glycyrrhiza uralensis Nutrition 0.000 description 2
- 240000008917 Glycyrrhiza uralensis Species 0.000 description 2
- 244000299507 Gossypium hirsutum Species 0.000 description 2
- 101001028251 Homo sapiens FAST kinase domain-containing protein 4 Proteins 0.000 description 2
- 101000736368 Homo sapiens PH and SEC7 domain-containing protein 4 Proteins 0.000 description 2
- 241001048891 Jatropha curcas Species 0.000 description 2
- 235000009496 Juglans regia Nutrition 0.000 description 2
- 240000007049 Juglans regia Species 0.000 description 2
- 241000219828 Medicago truncatula Species 0.000 description 2
- 241000409625 Morus notabilis Species 0.000 description 2
- 101100451971 Mus musculus Ephx2 gene Proteins 0.000 description 2
- 235000007171 Ononis arvensis Nutrition 0.000 description 2
- 240000002598 Ononis spinosa Species 0.000 description 2
- 235000004294 Ononis spinosa Nutrition 0.000 description 2
- 102100036232 PH and SEC7 domain-containing protein 4 Human genes 0.000 description 2
- 235000010627 Phaseolus vulgaris Nutrition 0.000 description 2
- 244000046052 Phaseolus vulgaris Species 0.000 description 2
- 101100048058 Stevia rebaudiana UGT85C1 gene Proteins 0.000 description 2
- 235000009470 Theobroma cacao Nutrition 0.000 description 2
- 239000007983 Tris buffer Substances 0.000 description 2
- 241000607598 Vibrio Species 0.000 description 2
- 235000014787 Vitis vinifera Nutrition 0.000 description 2
- 240000008042 Zea mays Species 0.000 description 2
- 235000007244 Zea mays Nutrition 0.000 description 2
- OJFDKHTZOUZBOS-CITAKDKDSA-N acetoacetyl-CoA Chemical compound O[C@@H]1[C@H](OP(O)(O)=O)[C@@H](COP(O)(=O)OP(O)(=O)OCC(C)(C)[C@@H](O)C(=O)NCCC(=O)NCCSC(=O)CC(=O)C)O[C@H]1N1C2=NC=NC(N)=C2N=C1 OJFDKHTZOUZBOS-CITAKDKDSA-N 0.000 description 2
- 239000003963 antioxidant agent Substances 0.000 description 2
- 235000006708 antioxidants Nutrition 0.000 description 2
- 239000008122 artificial sweetener Substances 0.000 description 2
- 239000012472 biological sample Substances 0.000 description 2
- 230000001851 biosynthetic effect Effects 0.000 description 2
- 238000011138 biotechnological process Methods 0.000 description 2
- 235000015895 biscuits Nutrition 0.000 description 2
- 239000013592 cell lysate Substances 0.000 description 2
- 238000004587 chromatography analysis Methods 0.000 description 2
- 230000004186 co-expression Effects 0.000 description 2
- 230000000295 complement effect Effects 0.000 description 2
- 239000002537 cosmetic Substances 0.000 description 2
- 238000012258 culturing Methods 0.000 description 2
- 235000005911 diet Nutrition 0.000 description 2
- 239000003814 drug Substances 0.000 description 2
- 230000002255 enzymatic effect Effects 0.000 description 2
- 238000004880 explosion Methods 0.000 description 2
- 235000019264 food flavour enhancer Nutrition 0.000 description 2
- 235000021022 fresh fruits Nutrition 0.000 description 2
- 235000011389 fruit/vegetable juice Nutrition 0.000 description 2
- 230000004545 gene duplication Effects 0.000 description 2
- 238000010362 genome editing Methods 0.000 description 2
- 239000001963 growth medium Substances 0.000 description 2
- 239000008123 high-intensity sweetener Substances 0.000 description 2
- PFOARMALXZGCHY-UHFFFAOYSA-N homoegonol Natural products C1=C(OC)C(OC)=CC=C1C1=CC2=CC(CCCO)=CC(OC)=C2O1 PFOARMALXZGCHY-UHFFFAOYSA-N 0.000 description 2
- 238000012405 in silico analysis Methods 0.000 description 2
- 230000010354 integration Effects 0.000 description 2
- 238000001819 mass spectrum Methods 0.000 description 2
- 235000013372 meat Nutrition 0.000 description 2
- 239000012528 membrane Substances 0.000 description 2
- 230000002503 metabolic effect Effects 0.000 description 2
- 239000002207 metabolite Substances 0.000 description 2
- 230000007483 microbial process Effects 0.000 description 2
- 238000009629 microbiological culture Methods 0.000 description 2
- 229930014626 natural product Natural products 0.000 description 2
- 102000039446 nucleic acids Human genes 0.000 description 2
- 108020004707 nucleic acids Proteins 0.000 description 2
- 150000007523 nucleic acids Chemical class 0.000 description 2
- 239000000419 plant extract Substances 0.000 description 2
- 230000001737 promoting effect Effects 0.000 description 2
- 238000000746 purification Methods 0.000 description 2
- 210000003705 ribosome Anatomy 0.000 description 2
- 238000007363 ring formation reaction Methods 0.000 description 2
- 102220270083 rs1555407429 Human genes 0.000 description 2
- 235000013580 sausages Nutrition 0.000 description 2
- 235000014102 seafood Nutrition 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 238000012358 sourcing Methods 0.000 description 2
- 235000013555 soy sauce Nutrition 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- LENZDBCJOHFCAS-UHFFFAOYSA-N tris Chemical compound OCC(N)(CO)CO LENZDBCJOHFCAS-UHFFFAOYSA-N 0.000 description 2
- 238000011144 upstream manufacturing Methods 0.000 description 2
- 235000013311 vegetables Nutrition 0.000 description 2
- 235000021419 vinegar Nutrition 0.000 description 2
- 239000000052 vinegar Substances 0.000 description 2
- HGVJFBSSLICXEM-UHNVWZDZSA-N (2s,3r)-2-methylbutane-1,2,3,4-tetrol Chemical compound OC[C@@](O)(C)[C@H](O)CO HGVJFBSSLICXEM-UHNVWZDZSA-N 0.000 description 1
- 238000011925 1,2-addition Methods 0.000 description 1
- XQFCCTPWINMCQJ-UHFFFAOYSA-N 1-(1H-indol-3-yl)-N,N-dimethylpropan-2-amine Chemical compound CC(N(C)C)CC1=CNC2=CC=CC=C12 XQFCCTPWINMCQJ-UHFFFAOYSA-N 0.000 description 1
- AJPADPZSRRUGHI-RFZPGFLSSA-N 1-deoxy-D-xylulose 5-phosphate Chemical compound CC(=O)[C@@H](O)[C@H](O)COP(O)(O)=O AJPADPZSRRUGHI-RFZPGFLSSA-N 0.000 description 1
- CNIGTNPXWSLRPH-UHFFFAOYSA-N 2,2,4-trimethyl-1-oxidoimidazol-1-ium Chemical compound CC1=NC(C)(C)[N+]([O-])=C1 CNIGTNPXWSLRPH-UHFFFAOYSA-N 0.000 description 1
- SFRQRNJMIIUYDI-UHNVWZDZSA-N 2-C-methyl-D-erythritol 2,4-cyclic diphosphate Chemical compound OC[C@]1(C)OP(O)(=O)OP(O)(=O)OC[C@H]1O SFRQRNJMIIUYDI-UHNVWZDZSA-N 0.000 description 1
- 101710184086 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase Proteins 0.000 description 1
- 101710201168 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase Proteins 0.000 description 1
- 101710195531 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase, chloroplastic Proteins 0.000 description 1
- QKNYBSVHEMOAJP-UHFFFAOYSA-N 2-amino-2-(hydroxymethyl)propane-1,3-diol;hydron;chloride Chemical compound Cl.OCC(N)(CO)CO QKNYBSVHEMOAJP-UHFFFAOYSA-N 0.000 description 1
- MIDXCONKKJTLDX-UHFFFAOYSA-N 3,5-dimethylcyclopentane-1,2-dione Chemical compound CC1CC(C)C(=O)C1=O MIDXCONKKJTLDX-UHFFFAOYSA-N 0.000 description 1
- JVVRCYWZTJLJSG-UHFFFAOYSA-N 4-dimethylaminophenol Chemical compound CN(C)C1=CC=C(O)C=C1 JVVRCYWZTJLJSG-UHFFFAOYSA-N 0.000 description 1
- 229960000549 4-dimethylaminophenol Drugs 0.000 description 1
- VHYFNPMBLIVWCW-UHFFFAOYSA-N 4-dimethylaminopyridine Substances CN(C)C1=CC=NC=C1 VHYFNPMBLIVWCW-UHFFFAOYSA-N 0.000 description 1
- 101710166309 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase Proteins 0.000 description 1
- 101710139854 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (ferredoxin) Proteins 0.000 description 1
- 101710088071 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (ferredoxin), chloroplastic Proteins 0.000 description 1
- 101710086072 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (flavodoxin) Proteins 0.000 description 1
- 235000007754 Achillea millefolium Nutrition 0.000 description 1
- 240000000073 Achillea millefolium Species 0.000 description 1
- 241000163216 Agapanthus praecox subsp. orientalis Species 0.000 description 1
- 241000589155 Agrobacterium tumefaciens Species 0.000 description 1
- 108010082126 Alanine transaminase Proteins 0.000 description 1
- 244000144725 Amygdalus communis Species 0.000 description 1
- 244000099147 Ananas comosus Species 0.000 description 1
- 235000007119 Ananas comosus Nutrition 0.000 description 1
- 241001610442 Arabidopsis lyrata subsp. lyrata Species 0.000 description 1
- 101100061270 Arabidopsis thaliana CPR1 gene Proteins 0.000 description 1
- 101100011894 Arabidopsis thaliana SQE3 gene Proteins 0.000 description 1
- 244000105624 Arachis hypogaea Species 0.000 description 1
- 235000010777 Arachis hypogaea Nutrition 0.000 description 1
- 241000905661 Bacteroidetes bacterium Species 0.000 description 1
- 241001628333 Bathymodiolus azoricus Species 0.000 description 1
- 239000002028 Biomass Substances 0.000 description 1
- 241001474374 Blennius Species 0.000 description 1
- 240000008100 Brassica rapa Species 0.000 description 1
- 235000011292 Brassica rapa Nutrition 0.000 description 1
- 235000004936 Bromus mango Nutrition 0.000 description 1
- 238000010453 CRISPR/Cas method Methods 0.000 description 1
- 241000759909 Camptotheca Species 0.000 description 1
- 241000769888 Canephora <angiosperm> Species 0.000 description 1
- 102220511321 Caprin-1_D95K_mutation Human genes 0.000 description 1
- 240000001829 Catharanthus roseus Species 0.000 description 1
- 240000006162 Chenopodium quinoa Species 0.000 description 1
- 235000015493 Chenopodium quinoa Nutrition 0.000 description 1
- 241001508469 Chrysosplenium americanum Species 0.000 description 1
- 244000223760 Cinnamomum zeylanicum Species 0.000 description 1
- 244000270200 Citrullus vulgaris Species 0.000 description 1
- 235000012840 Citrullus vulgaris Nutrition 0.000 description 1
- 235000008733 Citrus aurantifolia Nutrition 0.000 description 1
- 235000005979 Citrus limon Nutrition 0.000 description 1
- 244000131522 Citrus pyriformis Species 0.000 description 1
- ACTIUHUUMQJHFO-UHFFFAOYSA-N Coenzym Q10 Natural products COC1=C(OC)C(=O)C(CC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)C)=C(C)C1=O ACTIUHUUMQJHFO-UHFFFAOYSA-N 0.000 description 1
- 244000016593 Coffea robusta Species 0.000 description 1
- 235000002187 Coffea robusta Nutrition 0.000 description 1
- 235000016795 Cola Nutrition 0.000 description 1
- 241001634499 Cola Species 0.000 description 1
- 235000011824 Cola pachycarpa Nutrition 0.000 description 1
- 241000272205 Columba livia Species 0.000 description 1
- 235000010862 Corchorus capsularis Nutrition 0.000 description 1
- 240000004792 Corchorus capsularis Species 0.000 description 1
- 235000003363 Cornus mas Nutrition 0.000 description 1
- 240000006766 Cornus mas Species 0.000 description 1
- 229920000742 Cotton Polymers 0.000 description 1
- 241000219122 Cucurbita Species 0.000 description 1
- 241000219104 Cucurbitaceae Species 0.000 description 1
- 102100031515 D-ribitol-5-phosphate cytidylyltransferase Human genes 0.000 description 1
- 240000008853 Datura stramonium Species 0.000 description 1
- 241000859983 Dendrobium catenatum Species 0.000 description 1
- 102100035966 DnaJ homolog subfamily A member 2 Human genes 0.000 description 1
- 102100022840 DnaJ homolog subfamily C member 7 Human genes 0.000 description 1
- 102100029211 E3 ubiquitin-protein ligase TTC3 Human genes 0.000 description 1
- 241000220485 Fabaceae Species 0.000 description 1
- VWFJDQUYCIWHTN-FBXUGWQNSA-N Farnesyl diphosphate Natural products CC(C)=CCC\C(C)=C/CC\C(C)=C/COP(O)(=O)OP(O)(O)=O VWFJDQUYCIWHTN-FBXUGWQNSA-N 0.000 description 1
- 241001546314 Flavobacteriales bacterium Species 0.000 description 1
- 229930091371 Fructose Natural products 0.000 description 1
- RFSUNEUAIZKAJO-ARQDHWQXSA-N Fructose Chemical compound OC[C@H]1O[C@](O)(CO)[C@@H](O)[C@@H]1O RFSUNEUAIZKAJO-ARQDHWQXSA-N 0.000 description 1
- 239000005715 Fructose Substances 0.000 description 1
- 108010026318 Geranyltranstransferase Proteins 0.000 description 1
- 235000010469 Glycine max Nutrition 0.000 description 1
- 244000068988 Glycine max Species 0.000 description 1
- 235000009432 Gossypium hirsutum Nutrition 0.000 description 1
- 235000014718 Gossypium raimondii Nutrition 0.000 description 1
- 241001149081 Gossypium raimondii Species 0.000 description 1
- 241000282414 Homo sapiens Species 0.000 description 1
- 101000994204 Homo sapiens D-ribitol-5-phosphate cytidylyltransferase Proteins 0.000 description 1
- 101000931210 Homo sapiens DnaJ homolog subfamily A member 2 Proteins 0.000 description 1
- 101000903053 Homo sapiens DnaJ homolog subfamily C member 7 Proteins 0.000 description 1
- 101000633723 Homo sapiens E3 ubiquitin-protein ligase TTC3 Proteins 0.000 description 1
- 101001065660 Homo sapiens Lanosterol synthase Proteins 0.000 description 1
- 101000847024 Homo sapiens Tetratricopeptide repeat protein 1 Proteins 0.000 description 1
- 101000845188 Homo sapiens Tetratricopeptide repeat protein 4 Proteins 0.000 description 1
- 108090000769 Isomerases Proteins 0.000 description 1
- 102000004195 Isomerases Human genes 0.000 description 1
- HGVJFBSSLICXEM-UHFFFAOYSA-N L-2-methyl-erythritol Natural products OCC(O)(C)C(O)CO HGVJFBSSLICXEM-UHFFFAOYSA-N 0.000 description 1
- ONIBWKKTOPOVIA-BYPYZUCNSA-N L-Proline Chemical compound OC(=O)[C@@H]1CCCN1 ONIBWKKTOPOVIA-BYPYZUCNSA-N 0.000 description 1
- 125000000510 L-tryptophano group Chemical group [H]C1=C([H])C([H])=C2N([H])C([H])=C(C([H])([H])[C@@]([H])(C(O[H])=O)N([H])[*])C2=C1[H] 0.000 description 1
- 235000014826 Mangifera indica Nutrition 0.000 description 1
- 240000007228 Mangifera indica Species 0.000 description 1
- 240000003183 Manihot esculenta Species 0.000 description 1
- 235000004456 Manihot esculenta Nutrition 0.000 description 1
- 235000010624 Medicago sativa Nutrition 0.000 description 1
- 240000004658 Medicago sativa Species 0.000 description 1
- 241001465754 Metazoa Species 0.000 description 1
- 241000404878 Methylomonas lenta Species 0.000 description 1
- 229920000881 Modified starch Polymers 0.000 description 1
- WVXIMWMLKSCVTD-JLRHFDOOSA-N Mogroside II-E Chemical compound O([C@H](CC[C@@H](C)[C@@H]1[C@]2(C[C@@H](O)[C@@]3(C)[C@H]4C(C([C@@H](O[C@H]5[C@@H]([C@@H](O)[C@H](O)[C@@H](CO)O5)O)CC4)(C)C)=CC[C@H]3[C@]2(C)CC1)C)C(C)(C)O)[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O WVXIMWMLKSCVTD-JLRHFDOOSA-N 0.000 description 1
- 235000008708 Morus alba Nutrition 0.000 description 1
- 240000000249 Morus alba Species 0.000 description 1
- 240000005561 Musa balbisiana Species 0.000 description 1
- 235000018290 Musa x paradisiaca Nutrition 0.000 description 1
- 241000187479 Mycobacterium tuberculosis Species 0.000 description 1
- 241000588652 Neisseria gonorrhoeae Species 0.000 description 1
- 235000002637 Nicotiana tabacum Nutrition 0.000 description 1
- 244000061176 Nicotiana tabacum Species 0.000 description 1
- 235000018136 Olea europaea var sylvestris Nutrition 0.000 description 1
- 240000006267 Olea europaea var. sylvestris Species 0.000 description 1
- 102000004020 Oxygenases Human genes 0.000 description 1
- 108090000417 Oxygenases Proteins 0.000 description 1
- 229910019142 PO4 Inorganic materials 0.000 description 1
- 241000208343 Panax Species 0.000 description 1
- 235000002791 Panax Nutrition 0.000 description 1
- 240000004371 Panax ginseng Species 0.000 description 1
- 235000002789 Panax ginseng Nutrition 0.000 description 1
- 241001278112 Populus euphratica Species 0.000 description 1
- ONIBWKKTOPOVIA-UHFFFAOYSA-N Proline Natural products OC(=O)C1CCCN1 ONIBWKKTOPOVIA-UHFFFAOYSA-N 0.000 description 1
- 235000003893 Prunus dulcis var amara Nutrition 0.000 description 1
- 241000589516 Pseudomonas Species 0.000 description 1
- 244000294611 Punica granatum Species 0.000 description 1
- 235000014360 Punica granatum Nutrition 0.000 description 1
- 241000220324 Pyrus Species 0.000 description 1
- 235000014443 Pyrus communis Nutrition 0.000 description 1
- 241000290143 Pyrus x bretschneideri Species 0.000 description 1
- LCTONWCANYUPML-UHFFFAOYSA-M Pyruvate Chemical compound CC(=O)C([O-])=O LCTONWCANYUPML-UHFFFAOYSA-M 0.000 description 1
- 235000019057 Raphanus caudatus Nutrition 0.000 description 1
- 244000088415 Raphanus sativus Species 0.000 description 1
- 235000011380 Raphanus sativus Nutrition 0.000 description 1
- 101100451967 Rattus norvegicus Ephx1 gene Proteins 0.000 description 1
- 241000031819 Rhinolophus sinicus Species 0.000 description 1
- 101100501992 Rhizobium meliloti (strain 1021) exoM gene Proteins 0.000 description 1
- 240000000528 Ricinus communis Species 0.000 description 1
- 235000004443 Ricinus communis Nutrition 0.000 description 1
- 235000019095 Sechium edule Nutrition 0.000 description 1
- 240000007660 Sechium edule Species 0.000 description 1
- 101100165805 Siraitia grosvenorii CYP87D18 gene Proteins 0.000 description 1
- 240000003768 Solanum lycopersicum Species 0.000 description 1
- 235000002560 Solanum lycopersicum Nutrition 0.000 description 1
- 240000006394 Sorghum bicolor Species 0.000 description 1
- 235000007230 Sorghum bicolor Nutrition 0.000 description 1
- 108010073771 Soybean Proteins Proteins 0.000 description 1
- 235000009337 Spinacia oleracea Nutrition 0.000 description 1
- 244000300264 Spinacia oleracea Species 0.000 description 1
- 235000009184 Spondias indica Nutrition 0.000 description 1
- 108030001636 Squalene synthases Proteins 0.000 description 1
- 101100101355 Stevia rebaudiana UGT91D1 gene Proteins 0.000 description 1
- 101100101356 Stevia rebaudiana UGT91D2 gene Proteins 0.000 description 1
- 241000193985 Streptococcus agalactiae Species 0.000 description 1
- 241000193998 Streptococcus pneumoniae Species 0.000 description 1
- 238000010459 TALEN Methods 0.000 description 1
- 102100032841 Tetratricopeptide repeat protein 1 Human genes 0.000 description 1
- 102100031279 Tetratricopeptide repeat protein 4 Human genes 0.000 description 1
- 235000011941 Tilia x europaea Nutrition 0.000 description 1
- ATJFFYVFTNAWJD-UHFFFAOYSA-N Tin Chemical compound [Sn] ATJFFYVFTNAWJD-UHFFFAOYSA-N 0.000 description 1
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 1
- 244000186513 Trema orientalis Species 0.000 description 1
- 235000009754 Vitis X bourquina Nutrition 0.000 description 1
- 235000012333 Vitis X labruscana Nutrition 0.000 description 1
- 108010017070 Zinc Finger Nucleases Proteins 0.000 description 1
- 241000588901 Zymomonas Species 0.000 description 1
- YPXGTKHZRCDZTL-KSFOROOFSA-N [(2r,3s)-2,3,4-trihydroxypentyl] dihydrogen phosphate Chemical compound CC(O)[C@H](O)[C@H](O)COP(O)(O)=O YPXGTKHZRCDZTL-KSFOROOFSA-N 0.000 description 1
- 241000606834 [Haemophilus] ducreyi Species 0.000 description 1
- 230000002378 acidificating effect Effects 0.000 description 1
- 150000007513 acids Chemical class 0.000 description 1
- 239000008186 active pharmaceutical agent Substances 0.000 description 1
- 239000013543 active substance Substances 0.000 description 1
- 239000000654 additive Substances 0.000 description 1
- 230000000996 additive effect Effects 0.000 description 1
- 235000013334 alcoholic beverage Nutrition 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 230000003110 anti-inflammatory effect Effects 0.000 description 1
- 239000008346 aqueous phase Substances 0.000 description 1
- PYMYPHUHKUWMLA-UHFFFAOYSA-N arabinose Natural products OCC(O)C(O)C(O)C=O PYMYPHUHKUWMLA-UHFFFAOYSA-N 0.000 description 1
- 125000003118 aryl group Chemical group 0.000 description 1
- 235000013405 beer Nutrition 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- SRBFZHDQGSBBOR-UHFFFAOYSA-N beta-D-Pyranose-Lyxose Natural products OC1COC(O)C(O)C1O SRBFZHDQGSBBOR-UHFFFAOYSA-N 0.000 description 1
- ZVGODNZUEWDIPM-YXRLTKITSA-N beta-D-glucosyl crocetin Chemical compound OC(=O)C(/C)=C/C=C/C(/C)=C/C=C/C=C(\C)/C=C/C=C(\C)C(=O)O[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O ZVGODNZUEWDIPM-YXRLTKITSA-N 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 235000008429 bread Nutrition 0.000 description 1
- 239000000872 buffer Substances 0.000 description 1
- 239000004067 bulking agent Substances 0.000 description 1
- 102220352439 c.128G>C Human genes 0.000 description 1
- 235000013736 caramel Nutrition 0.000 description 1
- 235000012174 carbonated soft drink Nutrition 0.000 description 1
- 235000021466 carotenoid Nutrition 0.000 description 1
- 150000001747 carotenoids Chemical class 0.000 description 1
- 230000003197 catalytic effect Effects 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 230000010261 cell growth Effects 0.000 description 1
- 210000000170 cell membrane Anatomy 0.000 description 1
- 230000003833 cell viability Effects 0.000 description 1
- 235000013339 cereals Nutrition 0.000 description 1
- 239000013626 chemical specie Substances 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 238000002512 chemotherapy Methods 0.000 description 1
- 235000019219 chocolate Nutrition 0.000 description 1
- 230000002759 chromosomal effect Effects 0.000 description 1
- 210000000349 chromosome Anatomy 0.000 description 1
- 235000017803 cinnamon Nutrition 0.000 description 1
- 235000017471 coenzyme Q10 Nutrition 0.000 description 1
- ACTIUHUUMQJHFO-UPTCCGCDSA-N coenzyme Q10 Chemical compound COC1=C(OC)C(=O)C(C\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CCC=C(C)C)=C(C)C1=O ACTIUHUUMQJHFO-UPTCCGCDSA-N 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000003340 combinatorial analysis Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 238000006482 condensation reaction Methods 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 239000006071 cream Substances 0.000 description 1
- 235000013365 dairy product Nutrition 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 206010012601 diabetes mellitus Diseases 0.000 description 1
- 230000037213 diet Effects 0.000 description 1
- 230000000378 dietary effect Effects 0.000 description 1
- 235000014113 dietary fatty acids Nutrition 0.000 description 1
- 238000004090 dissolution Methods 0.000 description 1
- ZWIBGKZDAWNIFC-UHFFFAOYSA-N disuccinimidyl suberate Chemical compound O=C1CCC(=O)N1OC(=O)CCCCCCC(=O)ON1C(=O)CCC1=O ZWIBGKZDAWNIFC-UHFFFAOYSA-N 0.000 description 1
- 230000002526 effect on cardiovascular system Effects 0.000 description 1
- 235000013601 eggs Nutrition 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 239000003995 emulsifying agent Substances 0.000 description 1
- 108010017796 epoxidase Proteins 0.000 description 1
- 125000003700 epoxy group Chemical group 0.000 description 1
- 239000013604 expression vector Substances 0.000 description 1
- 229930195729 fatty acid Natural products 0.000 description 1
- 239000000194 fatty acid Substances 0.000 description 1
- 150000004665 fatty acids Chemical class 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 235000013332 fish product Nutrition 0.000 description 1
- 229930003935 flavonoid Natural products 0.000 description 1
- 150000002215 flavonoids Chemical class 0.000 description 1
- 235000017173 flavonoids Nutrition 0.000 description 1
- 235000012041 food component Nutrition 0.000 description 1
- 239000005417 food ingredient Substances 0.000 description 1
- 235000013611 frozen food Nutrition 0.000 description 1
- 235000012055 fruits and vegetables Nutrition 0.000 description 1
- 101150041954 galU gene Proteins 0.000 description 1
- 238000004817 gas chromatography Methods 0.000 description 1
- 238000000769 gas chromatography-flame ionisation detection Methods 0.000 description 1
- 239000003349 gelling agent Substances 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 235000008434 ginseng Nutrition 0.000 description 1
- 150000002338 glycosides Chemical class 0.000 description 1
- 230000001279 glycosylating effect Effects 0.000 description 1
- 108700014210 glycosyltransferase activity proteins Proteins 0.000 description 1
- 102000045442 glycosyltransferase activity proteins Human genes 0.000 description 1
- 235000002532 grape seed extract Nutrition 0.000 description 1
- 101150096208 gtaB gene Proteins 0.000 description 1
- 230000006801 homologous recombination Effects 0.000 description 1
- 238000002744 homologous recombination Methods 0.000 description 1
- 230000002209 hydrophobic effect Effects 0.000 description 1
- 230000000640 hydroxylating effect Effects 0.000 description 1
- 235000015243 ice cream Nutrition 0.000 description 1
- 238000000099 in vitro assay Methods 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 229910052500 inorganic mineral Inorganic materials 0.000 description 1
- 235000021539 instant coffee Nutrition 0.000 description 1
- 150000002540 isothiocyanates Chemical class 0.000 description 1
- 235000015110 jellies Nutrition 0.000 description 1
- 239000008274 jelly Substances 0.000 description 1
- 235000008960 ketchup Nutrition 0.000 description 1
- 238000004898 kneading Methods 0.000 description 1
- 235000021374 legumes Nutrition 0.000 description 1
- 239000004571 lime Substances 0.000 description 1
- 238000004895 liquid chromatography mass spectrometry Methods 0.000 description 1
- PXUQTDZNOHRWLI-OXUVVOBNSA-O malvidin 3-O-beta-D-glucoside Chemical compound COC1=C(O)C(OC)=CC(C=2C(=CC=3C(O)=CC(O)=CC=3[O+]=2)O[C@H]2[C@@H]([C@@H](O)[C@H](O)[C@@H](CO)O2)O)=C1 PXUQTDZNOHRWLI-OXUVVOBNSA-O 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 239000003550 marker Substances 0.000 description 1
- 235000010746 mayonnaise Nutrition 0.000 description 1
- 239000008268 mayonnaise Substances 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 229940126601 medicinal product Drugs 0.000 description 1
- 229930182817 methionine Natural products 0.000 description 1
- 125000001360 methionine group Chemical group N[C@@H](CCSC)C(=O)* 0.000 description 1
- 125000002496 methyl group Chemical group [H]C([H])([H])* 0.000 description 1
- 239000011707 mineral Substances 0.000 description 1
- 238000002156 mixing Methods 0.000 description 1
- 235000019426 modified starch Nutrition 0.000 description 1
- WVXIMWMLKSCVTD-UHFFFAOYSA-N mogroside II E Natural products C1CC2(C)C3CC=C(C(C(OC4C(C(O)C(O)C(CO)O4)O)CC4)(C)C)C4C3(C)C(O)CC2(C)C1C(C)CCC(C(C)(C)O)OC1OC(CO)C(O)C(O)C1O WVXIMWMLKSCVTD-UHFFFAOYSA-N 0.000 description 1
- 238000010369 molecular cloning Methods 0.000 description 1
- 210000000214 mouth Anatomy 0.000 description 1
- 230000007935 neutral effect Effects 0.000 description 1
- 239000002773 nucleotide Substances 0.000 description 1
- 125000003729 nucleotide group Chemical group 0.000 description 1
- 238000005325 percolation Methods 0.000 description 1
- 239000012071 phase Substances 0.000 description 1
- 150000002989 phenols Chemical class 0.000 description 1
- NBIIXXVUZAFLBC-UHFFFAOYSA-K phosphate Chemical compound [O-]P([O-])([O-])=O NBIIXXVUZAFLBC-UHFFFAOYSA-K 0.000 description 1
- 239000010452 phosphate Substances 0.000 description 1
- 230000000865 phosphorylative effect Effects 0.000 description 1
- 230000000704 physical effect Effects 0.000 description 1
- 239000003075 phytoestrogen Substances 0.000 description 1
- 229940068065 phytosterols Drugs 0.000 description 1
- 238000005554 pickling Methods 0.000 description 1
- 235000002378 plant sterols Nutrition 0.000 description 1
- 102000040430 polynucleotide Human genes 0.000 description 1
- 108091033319 polynucleotide Proteins 0.000 description 1
- 239000002157 polynucleotide Substances 0.000 description 1
- 229920005862 polyol Polymers 0.000 description 1
- 150000003077 polyols Chemical class 0.000 description 1
- 229920001184 polypeptide Polymers 0.000 description 1
- 235000008476 powdered milk Nutrition 0.000 description 1
- 239000003755 preservative agent Substances 0.000 description 1
- 239000006041 probiotic Substances 0.000 description 1
- 235000018291 probiotics Nutrition 0.000 description 1
- 102000004196 processed proteins & peptides Human genes 0.000 description 1
- 108090000765 processed proteins & peptides Proteins 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- VBUBYMVULIMEHR-UHFFFAOYSA-N propa-1,2-diene;prop-1-yne Chemical compound CC#C.C=C=C VBUBYMVULIMEHR-UHFFFAOYSA-N 0.000 description 1
- 230000005664 protein glycosylation in endoplasmic reticulum Effects 0.000 description 1
- 235000011962 puddings Nutrition 0.000 description 1
- HELXLJCILKEWJH-NCGAPWICSA-N rebaudioside A Chemical compound O([C@H]1[C@H](O)[C@@H](CO)O[C@H]([C@@H]1O[C@H]1[C@@H]([C@@H](O)[C@H](O)[C@@H](CO)O1)O)O[C@]12C(=C)C[C@@]3(C1)CC[C@@H]1[C@@](C)(CCC[C@]1([C@@H]3CC2)C)C(=O)O[C@H]1[C@@H]([C@@H](O)[C@H](O)[C@@H](CO)O1)O)[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O HELXLJCILKEWJH-NCGAPWICSA-N 0.000 description 1
- NPCOQXAVBJJZBQ-UHFFFAOYSA-N reduced coenzyme Q9 Natural products COC1=C(O)C(C)=C(CC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)C)C(O)=C1OC NPCOQXAVBJJZBQ-UHFFFAOYSA-N 0.000 description 1
- 230000014493 regulation of gene expression Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 235000009566 rice Nutrition 0.000 description 1
- 102220295344 rs1555894553 Human genes 0.000 description 1
- 102200002862 rs199474726 Human genes 0.000 description 1
- 102220059907 rs730881678 Human genes 0.000 description 1
- 102220333944 rs746295635 Human genes 0.000 description 1
- 102220263114 rs759778382 Human genes 0.000 description 1
- 102220105053 rs80357017 Human genes 0.000 description 1
- 102220331596 rs863224613 Human genes 0.000 description 1
- 201000005404 rubella Diseases 0.000 description 1
- 229930182490 saponin Natural products 0.000 description 1
- 150000007949 saponins Chemical class 0.000 description 1
- 235000017709 saponins Nutrition 0.000 description 1
- 235000015067 sauces Nutrition 0.000 description 1
- 229930000044 secondary metabolite Natural products 0.000 description 1
- 239000013049 sediment Substances 0.000 description 1
- 238000002864 sequence alignment Methods 0.000 description 1
- 235000014347 soups Nutrition 0.000 description 1
- 229940001941 soy protein Drugs 0.000 description 1
- 239000003381 stabilizer Substances 0.000 description 1
- 229940031000 streptococcus pneumoniae Drugs 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 235000020357 syrup Nutrition 0.000 description 1
- 239000006188 syrup Substances 0.000 description 1
- 238000004885 tandem mass spectrometry Methods 0.000 description 1
- 239000002562 thickening agent Substances 0.000 description 1
- 150000003568 thioethers Chemical class 0.000 description 1
- 150000003573 thiols Chemical class 0.000 description 1
- 239000000606 toothpaste Substances 0.000 description 1
- 230000005030 transcription termination Effects 0.000 description 1
- 230000002103 transcriptional effect Effects 0.000 description 1
- 229940035936 ubiquinone Drugs 0.000 description 1
- 239000008371 vanilla flavor Substances 0.000 description 1
- 229930003231 vitamin Natural products 0.000 description 1
- 239000011782 vitamin Substances 0.000 description 1
- 235000013343 vitamin Nutrition 0.000 description 1
- 229940088594 vitamin Drugs 0.000 description 1
- 235000013522 vodka Nutrition 0.000 description 1
- 235000014101 wine Nutrition 0.000 description 1
- 210000005253 yeast cell Anatomy 0.000 description 1
- 235000008924 yoghurt drink Nutrition 0.000 description 1
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/52—Genes encoding for enzymes or proenzymes
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/415—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/70—Vectors or expression systems specially adapted for E. coli
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/0004—Oxidoreductases (1.)
- C12N9/0071—Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/0004—Oxidoreductases (1.)
- C12N9/0071—Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14)
- C12N9/0077—Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14) with a reduced iron-sulfur protein as one donor (1.14.15)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/0004—Oxidoreductases (1.)
- C12N9/0071—Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14)
- C12N9/0083—Miscellaneous (1.14.99)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/1048—Glycosyltransferases (2.4)
- C12N9/1051—Hexosyltransferases (2.4.1)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/90—Isomerases (5.)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P17/00—Preparation of heterocyclic carbon compounds with only O, N, S, Se or Te as ring hetero atoms
- C12P17/02—Oxygen as only ring hetero atoms
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P19/00—Preparation of compounds containing saccharide radicals
- C12P19/18—Preparation of compounds containing saccharide radicals produced by the action of a glycosyl transferase, e.g. alpha-, beta- or gamma-cyclodextrins
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P19/00—Preparation of compounds containing saccharide radicals
- C12P19/26—Preparation of nitrogen-containing carbohydrates
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P19/00—Preparation of compounds containing saccharide radicals
- C12P19/44—Preparation of O-glycosides, e.g. glucosides
- C12P19/56—Preparation of O-glycosides, e.g. glucosides having an oxygen atom of the saccharide radical directly bound to a condensed ring system having three or more carbocyclic rings, e.g. daunomycin, adriamycin
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P33/00—Preparation of steroids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P5/00—Preparation of hydrocarbons or halogenated hydrocarbons
- C12P5/007—Preparation of hydrocarbons or halogenated hydrocarbons containing one or more isoprene units, i.e. terpenes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y114/00—Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14)
- C12Y114/99—Miscellaneous (1.14.99)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y114/00—Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14)
- C12Y114/99—Miscellaneous (1.14.99)
- C12Y114/99034—Monoprenyl isoflavone epoxidase (1.14.99.34)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y204/00—Glycosyltransferases (2.4)
- C12Y204/01—Hexosyltransferases (2.4.1)
- C12Y204/01017—Glucuronosyltransferase (2.4.1.17)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y205/00—Transferases transferring alkyl or aryl groups, other than methyl groups (2.5)
- C12Y205/01—Transferases transferring alkyl or aryl groups, other than methyl groups (2.5) transferring alkyl or aryl groups, other than methyl groups (2.5.1)
- C12Y205/01021—Squalene synthase (2.5.1.21), i.e. farnesyl-disphosphate farnesyltransferase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y303/00—Hydrolases acting on ether bonds (3.3)
- C12Y303/02—Ether hydrolases (3.3.2)
- C12Y303/0201—Soluble epoxide hydrolase (3.3.2.10)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y504/00—Intramolecular transferases (5.4)
- C12Y504/99—Intramolecular transferases (5.4) transferring other groups (5.4.99)
- C12Y504/99008—Cycloartenol synthase (5.4.99.8)
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Genetics & Genomics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Medicinal Chemistry (AREA)
- General Chemical & Material Sciences (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Biophysics (AREA)
- Plant Pathology (AREA)
- Physics & Mathematics (AREA)
- Gastroenterology & Hepatology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Botany (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
- Enzymes And Modification Thereof (AREA)
- Preparation Of Compounds By Using Micro-Organisms (AREA)
- Coloring Foods And Improving Nutritive Qualities (AREA)
- Medicinal Preparation (AREA)
- Cosmetics (AREA)
Description
WO 2021/126960 PCT/US2020/065285
MICROBIAL PRODUCTION OF MOGROL AND MOGROSIDES
CROSS-REFERENCE TO RELATED APPLICATIONS
This Application claims priority to, and the benefit of, U.S. Provisional Application No. 63/085,557 filed September 30, 2020, U.S. Provisional Application No. 63/075,631 filed September 8, 2020, and U.S. Provisional Application No. 62/948,6 filed December 16, 2019, the disclosures of which are hereby incorporated by reference in their entireties.
BACKGROUND
Mogrosides are triterpene-derived specialized secondary metabolites found in the fruit of the Cucurbitaceae family plant Siraitia grosvenorii (a/k/a monkfruit or Luo Han Guo). Their biosynthesis in fruit involves a number of consecutive glycosylations of the aglycone mogrol. The food industry is increasing its use of mogroside fruit extract as a natural non-sugar food sweetener. For example, mogroside V (Mog.V) has a sweetening capacity that is ~250 times that of sucrose (Kasai et al., Agric Biol Chem (1989)). Moreover, additional health benefits of mogrosides have been revealed in recent studies (Li et al., Chin J Nat Med (2014)). A variety of factors are promoting a surge in interest in research and commercialization of the mogrosides and monkfruit in general, including, for example, the explosion in popularity of and demand for natural sweeteners; the difficulties in scalable sourcing of other promising natural sweeteners such as rebaudioside M (RebM) from the Stevia plant; the superior taste performance of Mog.V relative to other natural and artificial sweetener products on the market; and the medicinal potential of the plant and fruit.
Purified Mog.V has been approved as a high-intensity sweetening agent in Japan (Jakinovich et al., Journal of Natural Products) and the extract has gained GRAS status in the USA as a non-nutritive sweetener and flavor enhancer (GRAS 522). Extraction of mogrosides from the fruit can yield a product of varying degrees of purity, often accompanied by undesirable aftertaste. In addition, yields of mogroside from cultivated fruit are limited due to low plant yields and particular cultivation requirements of the plant. Mogrosides are present at about 1% in the fresh fruit and about 4% in the dried fruit (Li HB, et al., 2006). Mog.V is the main component, with a content of 0.5% to 1.4% in the dried fruit. Moreover, purification difficulties limit purity for Mog.V, with commercial products from plant extracts being standardized to about 50% Mog.V. It is highly likely that a pure Mog.V product will achieve greater commercial success than the blend, since it is less likely to have off flavors, will be easier to formulate into products, and has good solubility potential. It is therefore advantageous to be able to produce sweet mogroside compounds via biotechnological processes.
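As a rough illustration of the content figures quoted above, the following Python sketch estimates how much Mog.V is contained in a given mass of dried fruit, using the stated 0.5% to 1.4% range. The function name and the 100 kg input are illustrative assumptions, not values from the disclosure, and the calculation ignores extraction and purification losses.

```python
def mogv_mass_range_kg(dried_fruit_kg, low_frac=0.005, high_frac=0.014):
    """Return the (minimum, maximum) mass of Mog.V, in kg, contained in
    the given mass of dried fruit.

    The default fractions are the 0.5% and 1.4% bounds quoted in the text;
    recovery losses are deliberately not modeled.
    """
    if dried_fruit_kg < 0:
        raise ValueError("mass must be non-negative")
    return dried_fruit_kg * low_frac, dried_fruit_kg * high_frac


# Hypothetical example: 100 kg of dried fruit.
low, high = mogv_mass_range_kg(100.0)
print(f"Mog.V content: {low:.2f} to {high:.2f} kg")
```

Even at the upper bound, only a small fraction of the dried fruit mass is Mog.V, which is consistent with the motivation for a fermentation route.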
SUMMARY
The present invention, in various aspects and embodiments, provides enzymes (including engineered enzymes), microbial strains, and methods for making mogrol and mogrol glycosides ("mogrosides") using recombinant microbial processes. In other aspects, the invention provides methods for making products, including foods, beverages, and sweeteners (among others), by incorporating the mogrol glycosides produced according to the present disclosure.
In various aspects, the invention provides microbial strains and methods for making mogrol or mogrol glycoside(s). The invention involves a recombinant microbial host cell expressing a heterologous enzyme pathway catalyzing the conversion of isopentenyl pyrophosphate (IPP) and/or dimethylallyl pyrophosphate (DMAPP) to mogrol or mogrol glycoside(s). The microbial host cell in various embodiments may be prokaryotic (e.g., E. coli) or eukaryotic (e.g., yeast).
In various embodiments, the heterologous enzyme pathway comprises a farnesyl diphosphate synthase (FPPS) and a squalene synthase (SQS), which are recombinantly expressed. In various embodiments, the SQS comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 2 to 16, 166, and 167. In some embodiments, the SQS comprises an amino acid sequence that is at least 70% identical to SQS (SEQ ID NO: 11), which has high activity in E. coli.
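The "at least 70% identical" language used throughout can be made concrete with a small calculation. The Python sketch below computes percent identity for two sequences that have already been aligned (gaps written as "-"), counting matches over all alignment columns. This is only one of several conventions for the denominator and is an assumption here; the passage above does not specify the alignment method or the exact definition. The short sequences are toy strings, not any SEQ ID NO from the disclosure.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; a column where
    only one sequence has a gap counts as a mismatch (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: 8 of 10 aligned positions match.
print(percent_identity("MKTAYIAKQR", "MKTAYLAKQK"))
print(meets_threshold("MKTAYIAKQR", "MKTAYLAKQK"))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only covers the final counting step.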
In some embodiments, the host cell expresses one or more enzymes that produce mogrol from squalene. For example, the host cell may express one or more squalene epoxidase (SQE) enzymes, one or more triterpenoid cyclases, an epoxide hydrolase (EPH), one or more cytochrome P450 oxidase enzymes (CYP450), a non-heme iron-dependent oxygenase, and a cytochrome P450 reductase (CPR). As shown in FIG. 2, the heterologous pathway can proceed through several routes to mogrol, which may involve one or two epoxidations of the core substrate.
In some embodiments, the heterologous enzyme pathway comprises two squalene epoxidase (SQE) enzymes. For example, the heterologous enzyme pathway may comprise an SQE that produces 2,3-oxidosqualene. In some embodiments, the SQE will produce 2,3;22,23-dioxidosqualene, and this conversion can be catalyzed by the same SQE enzyme, or an enzyme that differs in amino acid sequence by at least one amino acid modification. For example, the squalene epoxidase enzymes may include at least two SQE enzymes each comprising (independently) an amino acid sequence that is at least 70% identical to any one of SEQ ID NOS: 17 to 39, 168 to 170, and 177 to 183.
In some embodiments, at least one SQE comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 39.
In some embodiments, the host cell comprises two squalene epoxidase enzymes that each comprise an amino acid sequence that is at least 70% identical to squalene epoxidase (SEQ ID NO: 39). For example, one of the SQE enzymes may have one or more amino acid modifications that improve specificity or productivity for conversion of 2,3-oxidosqualene to 2,3;22,23-dioxidosqualene, as compared to the enzyme having the amino acid sequence of SEQ ID NO: 39. In some embodiments, the amino acid modifications comprise one or more modifications at positions corresponding to the following positions of SEQ ID NO: 39: 35, 133, 163, 254, 283, 380, and 395. For example, the amino acid at the position corresponding to position 35 of SEQ ID NO: may be arginine (e.g., H35R). The amino acid at the position corresponding to position 133 of SEQ ID NO: may be glycine (e.g., N133G). The amino acid at the position corresponding to position 163 of SEQ ID NO: 39 may be alanine (e.g., F163A). The amino acid at the position corresponding to position 254 of SEQ ID NO: 39 may be phenylalanine (e.g., Y254F). The amino acid at the position corresponding to position 283 of SEQ ID NO: may be leucine (e.g., M283L). The amino acid at the position corresponding to position 380 of SEQ ID NO: 39 may be leucine (e.g., V380L). The amino acid at the position corresponding to position 395 of SEQ ID NO: 39 may be tyrosine (e.g., F395Y).
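Substitutions such as H35R follow the common shorthand of wild-type residue, 1-based position, and replacement residue. The Python sketch below parses that shorthand and applies it to a sequence, checking that the wild-type residue is actually present at the stated position. The helper names and the five-residue toy sequence are hypothetical and used only for illustration; they are not the sequence of SEQ ID NO: 39.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code):
    """Split a code such as 'H35R' into ('H', 35, 'R')."""
    match = _SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence, codes):
    """Return a copy of `sequence` with each substitution applied.

    Raises ValueError if a position is out of range or the residue found
    there does not match the wild-type residue named in the code.
    """
    residues = list(sequence)
    for code in codes:
        wild_type, position, replacement = parse_substitution(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{code}: expected {wild_type} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


print(parse_substitution("H35R"))
# Toy sequence (hypothetical): H at position 3, F at position 5.
print(apply_substitutions("MAHKF", ["H3R", "F5Y"]))
```

The wild-type check is a cheap guard against off-by-one numbering errors when positions are defined relative to a reference sequence.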
In various embodiments, the heterologous enzyme pathway comprises a triterpene cyclase (TTC) enzyme. In some embodiments, where the microbial cell coexpresses FPPS, along with the SQS, SQE, and one or more triterpene cyclase enzymes, the microbial cell produces 2,3;22,23-dioxidosqualene. The 2,3;22,23-dioxidosqualene may be the substrate for downstream enzymes in the heterologous pathway. In some embodiments, the triterpene cyclase (TTC) comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 40 to 55 and 191 to 193. The TTC in various embodiments comprises an amino acid sequence that is at least 70% identical to the amino acid sequence of SEQ ID NO: 40.
In various embodiments, the heterologous enzyme pathway comprises at least two copies of a TTC enzyme gene, or comprises at least two enzymes having triterpene cyclase activity and converting 2,3;22,23-dioxidosqualene to 24,25-epoxycucurbitadienol. In such embodiments, product can be pulled to 24,25-epoxycucurbitadienol, with less production of cucurbitadienol. In some embodiments, the heterologous enzyme pathway comprises at least one TTC that comprises an amino acid sequence that is at least 70% identical to one of SEQ ID NO: 191, SEQ ID NO: 192, and SEQ ID NO: 193. For example, when co-expressed with SgCDS, these enzymes demonstrated improved production of 24,25-epoxycucurbitadienol compared to expression of SgCDS alone.
In some embodiments, the heterologous enzyme pathway comprises an epoxide hydrolase (EPH). The EPH may comprise an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 56 to 72, 184 to 190, and 212. In some embodiments, the EPH may employ as a substrate 24,25-epoxycucurbitadienol, for production of 24,25-dihydroxycucurbitadienol.
In some embodiments, the heterologous pathway comprises at least one EPH converting 24,25-epoxycucurbitadienol to 24,25-dihydroxycucurbitadienol, the at least one EPH comprising an amino acid sequence that is at least 70% identical to one of: SEQ ID NO: 189, SEQ ID NO: 58, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 190, and SEQ ID NO: 212.
In some embodiments, the heterologous pathway comprises one or more oxidases. The one or more oxidases may be active on cucurbitadienol or oxygenated products thereof as a substrate, adding (collectively) hydroxylations at C11, C24, and C25, thereby producing mogrol. Alternatively or in addition, the heterologous pathway may comprise one or more oxidases that oxidize C11 of C24,25 dihydroxycucurbitadienol to produce mogrol.
In some embodiments, at least one oxidase is a cytochrome P450 enzyme. Exemplary cytochrome P450 enzymes comprise an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 73 to 91, 171 to 176, and 194 to 200.
In some embodiments, the microbial host cell expresses a heterologous enzyme pathway comprising a P450 enzyme having activity for oxidation at C11 of C24,25 dihydroxycucurbitadienol, to thereby produce mogrol. For example, in some embodiments, the cytochrome P450 comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NO: 194 and SEQ ID NO: 171.
In various embodiments, the microbial host cell expresses one or more electron transfer proteins selected from a cytochrome P450 reductase (CPR), flavodoxin reductase (FPR) and ferredoxin reductase (FDXR) sufficient to regenerate the one or more oxidases. Exemplary CPR proteins are provided herein as SEQ ID NOS: 92 to and 201.
In some embodiments, the microbial host cell expresses SEQ ID NO: 194 or a derivative thereof, and SEQ ID NO: 98 or a derivative thereof. In some embodiments, the microbial host cell expresses SEQ ID NO: 171 or a derivative thereof, and SEQ ID NO: 201 or a derivative thereof.
In some embodiments, the heterologous enzyme pathway further comprises one or more uridine diphosphate-dependent glycosyltransferase (UGT) enzymes, thereby producing one or more mogrol glycosides. The mogrol glycoside may be pentaglycosylated, hexaglycosylated, or more, in some embodiments. In other embodiments, the mogrol glycoside has two, three, or four glucosylations. The one or more mogrol glycosides may be selected from Mog.II-E, Mog.III, Mog.III-A1, Mog.III-A2, Mog.III, Mog.IV, Mog.IV-A, siamenoside, Mog.V, and Mog.VI. In some embodiments, the host cell produces Mog.V or siamenoside.
In some embodiments, the host cell expresses a UGT enzyme that catalyzes the primary glycosylation of mogrol at C24 and/or C3 hydroxyl groups. In some embodiments, the UGT enzyme catalyzes a branching glycosylation, such as a beta 1,and/or beta 1,6 branching glycosylation at the primary C3 and C24 glucosyl groups.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 116 to 165, 202 to 210, 211, and 213 to 218.
For example, in some embodiments, the microbial cell expresses at least four UGT enzymes, resulting in glucosylation of mogrol at the C3 hydroxyl group, the C24 hydroxyl group, as well as a further 1,6 glucosylation at the C3 glucosyl group, and a further 1,6 glucosylation and a further 1,2 glucosylation at the C24 glucosyl group. The product of such glucosylation reactions is Mog.V.
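The glucosylation pattern described above can be written down as a small data structure, which makes it easy to confirm that it adds up to five glucose units. In the Python sketch below, the dictionary layout, the "primary" label, and the function name are illustrative assumptions; only the positions and linkage types are taken from the text.

```python
# Glucose units attached at each mogrol hydroxyl position for Mog.V, as
# described in the text: a primary glucosylation at C3 and at C24, one
# further 1,6 unit at C3, and further 1,6 and 1,2 units at C24.
MOG_V_PATTERN = {
    "C3": ["primary", "1,6"],
    "C24": ["primary", "1,6", "1,2"],
}


def glucose_count(pattern):
    """Total number of glucose units across all positions."""
    return sum(len(units) for units in pattern.values())


print(glucose_count(MOG_V_PATTERN))
```

The same structure could hold less glycosylated intermediates by listing fewer units per position.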
In some embodiments, at least one UGT enzyme comprises an amino acid sequence having at least 70% sequence identity to one of SEQ ID NO: 164, 165, 138, 204 to 211, and 213 to 218. In some embodiments, the UGT enzyme is engineered to have higher glycosyl transferase productivity as compared to the wild type enzyme.
In various embodiments, the microbial strain expresses one or more UGT enzymes capable of primary glycosylation at C24 and/or C3 of mogrol. Exemplary UGT enzymes include UGT enzymes comprising: an amino acid sequence that is at least 70% identical to SEQ ID NO: 165, an amino acid sequence that is at least 70% identical to SEQ ID NO: 146, an amino acid sequence that is at least 70% identical to SEQ ID NO: 202, an amino acid sequence that is at least 70% identical to SEQ ID NO: 202, an amino acid sequence that is at least 70% identical to SEQ ID NO: 129, an amino acid sequence that is at least 70% identical to SEQ ID NO: 116, an amino acid sequence that is at least 70% identical to SEQ ID NO: 218, and an amino acid sequence that is at least 70% identical to SEQ ID NO: 217.
In various embodiments, the microbial strain expresses one or more UGT enzymes capable of catalyzing a branching glycosylation of one or both primary glycosylations. Such UGT enzymes are summarized in Table 2.
In some embodiments, the microbial host cell has one or more genetic modifications that increase the production of UDP-glucose, the co-factor employed by UGT enzymes.
Mogrol glycosides can be recovered from the microbial culture. For example, mogrol glycosides may be recovered from microbial cells, or in some embodiments, are predominantly available in the extracellular media, where they may be recovered or sequestered.
Other aspects and embodiments of the invention will be apparent from the following detailed disclosure.
DESCRIPTION OF THE FIGURES
FIG. 1 shows the chemical structures of Mog.V, Mog.VI, Isomog.V, and siamenoside. The type of glycosylation reaction is shown within each glucose moiety (e.g., C3 or C24 core glycosylation and the 1-2, 1-4, or 1-6 glycosylation additions).
FIG. 2 shows routes to Mog.V production in vivo. The enzymatic transformation required for each step is indicated, along with the type of enzyme required. Numbers in parentheses correspond to the chemical structures in FIG. 3. Abbreviations: FPP, farnesyl pyrophosphate; SQS, squalene synthase; SQE, squalene epoxidase; TTC, triterpene cyclase; EPH, epoxide hydrolase; CYP450, cytochrome P450 with reductase partner; UGTs, uridine diphosphate glycosyltransferases.
FIG. 3 depicts chemical structures of metabolites involved in Mog.V biosynthesis: (1) farnesyl pyrophosphate; (2) squalene; (3) 2,3-oxidosqualene; (4) 2,3;22,23-dioxidosqualene; (5) 24,25-epoxycucurbitadienol; (6) 24,25-dihydroxycucurbitadienol; (7) mogrol; (8) mogroside V; (9) cucurbitadienol.
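One way to keep the enzyme classes and metabolites listed above straight is to encode a single route as an ordered list of steps. The Python sketch below does this for the route through 2,3;22,23-dioxidosqualene, which is only one of the several routes the disclosure mentions; the `Step` type and the `enzymes_needed` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Step(NamedTuple):
    substrate: str
    enzyme: str   # enzyme class abbreviation used in FIG. 2
    product: str


# One linear route, assembled from the metabolites of FIG. 3 and the enzyme
# classes of FIG. 2 (illustrative; other routes are described as possible).
ROUTE = [
    Step("farnesyl pyrophosphate", "SQS", "squalene"),
    Step("squalene", "SQE", "2,3-oxidosqualene"),
    Step("2,3-oxidosqualene", "SQE", "2,3;22,23-dioxidosqualene"),
    Step("2,3;22,23-dioxidosqualene", "TTC", "24,25-epoxycucurbitadienol"),
    Step("24,25-epoxycucurbitadienol", "EPH", "24,25-dihydroxycucurbitadienol"),
    Step("24,25-dihydroxycucurbitadienol", "CYP450", "mogrol"),
    Step("mogrol", "UGTs", "mogroside V"),
]


def enzymes_needed(route, start, end):
    """List the enzyme classes, in order, that convert `start` to `end`."""
    if start == end:
        return []
    enzymes = []
    current = start
    for step in route:
        if step.substrate == current:
            enzymes.append(step.enzyme)
            current = step.product
            if current == end:
                return enzymes
    raise ValueError(f"no path from {start!r} to {end!r} on this route")


print(" -> ".join(enzymes_needed(ROUTE, "squalene", "mogrol")))
```

Because the route is stored as data, an alternative route (for example, one passing through cucurbitadienol) could be added as a second list without changing the helper.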
FIG. 4 illustrates glycosylation routes to Mog.V. Bubble structures represent different mogrosides. White tetra-cyclic core represents mogrol. The numbers below each structure indicate the particular glycosylated mogroside. Black circles represent C3 or C24 glucosylations. Dark grey vertical circles represent 1,6-glucosylations. Light grey horizontal circles represent 1,2-glucosylations. Abbreviations: Mog, mogrol; sia, siamenoside.
FIG. 5 shows results for in vivo production of squalene in E. coli using different squalene synthases. The asterisk denotes a different plasmid construct and experiment run on a different day from the others shown. Legend: (1) SgSQS (SEQ ID NO: 2), (2) AaSQS (SEQ ID NO: 11), (3) EsSQS (SEQ ID NO: 16), (4) EiSQS (SEQ ID NO: 14), (5) FbSQS (SEQ ID NO: 166), (6) BbSQS (SEQ ID NO: 167).
FIG. 6 shows results for in vivo production of squalene, 2,3-oxidosqualene, and 2,3;22,23-dioxidosqualene using different squalene epoxidases. Legend: (A) SEQ ID NO: and SEQ ID NO: 168; (B) SEQ ID NO: 11 and SEQ ID NO: 168; (C) SEQ ID NO: and SEQ ID NO: 169; (D) SEQ ID NO: 11 and SEQ ID NO: 169; (E) SEQ ID NO: 2 and SEQ ID NO: 170; (F) SEQ ID NO: 2 and SEQ ID NO: 39; (G) SEQ ID NO: 11 and SEQ ID NO: 39.
FIG. 7 shows results for in vivo production of the cyclized triterpene product. Reactions involve an increasing number of enzymes expressed in an E. coli cell line having an overexpression of MEP pathway enzymes. The asterisks represent fermentation experiments incubated for a quarter of the time of the other experiments. As shown, co-expression of SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), and TTC (SEQ ID NO: 40) (lane G) resulted in high production of the triterpenoid product, cucurbitadienol. Legends: Product 1 is squalene; Product 2 is 2,3-oxidosqualene; Product 3 is cucurbitadienol; (A) expression of SEQ ID NO: 2; (B) expression of SEQ ID NO: 11; (C) coexpression of SEQ ID NO: 2 and SEQ ID NO: 17; (D) coexpression of SEQ ID NO: 2 and SEQ ID NO: 169; (E) coexpression of SEQ ID NO: and SEQ ID NO: 169; (F) coexpression of SEQ ID NO: 2, SEQ ID NO: 17, and SEQ ID NO: 40; (G) coexpression of SEQ ID NO: 11, SEQ ID NO: 39, and SEQ ID NO: 40.
FIG. 8 shows results for SQE engineering to produce high titers of 2,3;22,23-dioxidosqualene. Expression of SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), and TTC (SEQ ID NO: 40), whether on a bacterial artificial chromosome (BAC) or integrated, produces large amounts of cucurbitadienol. Point mutations in SQE (SEQ ID NO: 39) were screened to complement SQE to reduce levels of cucurbitadienol, with corresponding gain in titers of 2,3;22,23-dioxidosqualene. Two variants are shown in FIG. 8, SQE A4 (including H35R, F163A, M283L, V380L, and F395Y substitutions, SEQ ID NO: 203) and SQE C11 (including H35R, N133G, F163A, Y254F, V380L, and F395Y substitutions).
FIG. 9 shows production of 2,3;22,23-dioxidosqualene. Titers are plotted for each strain producing 2,3;22,23-dioxidosqualene. An engineered squalene epoxidase gene, SEQ ID NO: 203, was expressed in a strain producing 2,3-oxidosqualene via the squalene epoxidase of SEQ ID NO: 39. Strains were incubated for 48 hours before extraction. Lanes: (1) expression of SQE of SEQ ID NO: 39; (2) expression of SQE of SEQ ID NO: and SEQ ID NO: 203.
FIG. 10 shows the coexpression of SQS, SQE, and TTC enzymes. CDS of SEQ ID NO: 40, when coexpressed with SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), and SQE A4 (SEQ ID NO: 203) in E. coli, resulted in production of cucurbitadienol and 24,25-epoxycucurbitadienol. E. coli strains coexpressing SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), SQE A4 (SEQ ID NO: 203), and CDS (SEQ ID NO: 40), with an additional TTC, produced higher levels of 24,25-epoxycucurbitadienol. Legend: TTC1 is SEQ ID NO: 92, TTC2 is SEQ ID NO: 191, TTC3 is SEQ ID NO: 193, TTC4 is SEQ ID NO: 40.
FIG. 11 shows production of cucurbitadienol and 24,25-epoxycucurbitadienol. E. coli strains producing oxidosqualene and dioxidosqualene were complemented with CDS homologs and CAS genes engineered to produce cucurbitadienol. The ratio of 24,25-epoxycucurbitadienol to cucurbitadienol varies from 0.15 for Enzyme 1 (SEQ ID NO: 40) to 0.58 for Enzyme 2 (SEQ ID NO: 192), demonstrating improved substrate specificity toward the desired 24,25-epoxycucurbitadienol product for Enzyme 2. Enzyme 3 is SEQ ID NO: 219, and Enzyme 4 is SEQ ID NO: 220.
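The product ratio reported for FIG. 11 is a simple quotient of two titers, and it can be converted into the share of cyclized product that is the epoxy form. The Python sketch below shows both calculations. The titer values are hypothetical placeholders chosen so that the ratios reproduce the reported 0.15 and 0.58; the underlying titers are not given in this passage.

```python
def product_ratio(epoxy_titer, cucurbitadienol_titer):
    """Ratio of 24,25-epoxycucurbitadienol to cucurbitadienol."""
    if cucurbitadienol_titer <= 0:
        raise ValueError("cucurbitadienol titer must be positive")
    return epoxy_titer / cucurbitadienol_titer


def epoxy_fraction(ratio):
    """Fraction of the two products that is the epoxy form, given the ratio."""
    return ratio / (1.0 + ratio)


# Hypothetical titers (arbitrary units) matching the reported ratios.
for name, epoxy, cuc in [("Enzyme 1", 15.0, 100.0), ("Enzyme 2", 58.0, 100.0)]:
    r = product_ratio(epoxy, cuc)
    print(f"{name}: ratio {r:.2f}, epoxy fraction {epoxy_fraction(r):.2f}")
```

Expressed as a fraction, the difference between the two enzymes is easier to compare against other screens that report percent of total product.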
FIG. 12 shows the screening of EPH enzymes for hydration of 24,25-epoxycucurbitadienol to produce 24,25-dihydroxycucurbitadienol in E. coli strains coexpressing SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), SQE A4 (SEQ ID NO: 203), and TTC (SEQ ID NO: 40). These fermentation experiments were performed at 30° C for 72 hours in 96 well plates. Legend: EPH1 (SEQ ID NO: 186); EPH2 (SEQ ID NO: 212); EPH3 (SEQ ID NO: 190); EPH4 (SEQ ID NO: 187); EPH5 (SEQ ID NO: 184); EPH6 (SEQ ID NO: 185); EPH7 (SEQ ID NO: 188); EPH8 (SEQ ID NO: 189); and EPH9 (SEQ ID NO: 58).
FIG. 13(A-C) show the coexpression of SQS, SQE, TTC, EPH, and P450 enzymes to produce mogrol. An E. coli strain expressing SEQ ID NOS: 11, 39, 203 along with CDS, EPH, and P450 genes with a CPR resulted in production of mogrol and oxo-mogrol (FIG. 13A). These fermentation experiments were performed at 30° C for hours in 96 well plates. Mogrol production was validated by LC-QQQ mass spectrum analysis with a spiked authentic standard (FIG. 13B) and GC-FID chromatography versus an authentic standard (FIG. 13C). Legend: (1) coexpression of SEQ ID NO: 40, SEQ ID NO: 58, SEQ ID NO: 194, and SEQ ID NO: 98; (2) coexpression of SEQ ID NO: 40, SEQ ID NO: 58, SEQ ID NO: 197, and SEQ ID NO: 98; (3) SEQ ID NO: 40, SEQ ID NO: 58, SEQ ID NO: 171, and SEQ ID NO: 201.
FIG. 14 shows the screening of cytochrome P450s for oxidation at Cll of the 24,25-dihydroxycucurbitadienol-like molecule cucurbitadienol. Native anchor P4enzymes shown are: (I) SEQ ID NO: 194, (2) SEQ ID NO: 197, (3) SEQ ID NO: 171, (4) SEQ ID NO: 74); and (5) SEQ ID NO: SEQ ID NO: 75. In some cases, the native transmembrane domain was replaced with the transmembrane domain from £. coll sohB (Anchor 3), E coli zipA (Anchor 2), or bovine 17a (Anchor 1) to improve interaction with the E. coli membrane. Each P450 was coexpressed with either CPR SEQ ID NO: or CPR (SEQ ID NO: 201), resulting in production of 11-hydroxycucurbitadienol. These fermentation experiments were performed at 30° C for 72 hours in 96 well plates.
FIG. 15 shows production of products with oxidation at C11.
FIG. 16 shows Mog.V production using a combination of different enzymes. (A) Penta-glycosylated products are observed when UGTs of SEQ ID NO: 165, SEQ ID NO: 146, SEQ ID NO: 117, or SEQ ID NO: 164 are incubated together with mogrol as a substrate. Strains: (1) expresses SEQ ID NO: 165; (2) expresses SEQ ID NO: 146; (3) co-expresses SEQ ID NO: 165 and SEQ ID NO: 146; (4) co-expresses SEQ ID NO: 165, SEQ ID NO: 146, and SEQ ID NO: 117; (5) co-expresses SEQ ID NO: 165, SEQ ID NO: 146, SEQ ID NO: 117, and SEQ ID NO: 164. Mogroside substrates were incubated in Tris buffer containing magnesium chloride, beta-mercaptoethanol, UDP-glucose, single UGT, and a phosphatase. (B) Extracted ion chromatogram (EIC) for 1285.4 Da (mogroside V+H) of reactions containing SEQ ID NO: 165 and SEQ ID NO: 146, and either Enzyme 1 (SEQ ID NO: 117) or Enzyme 2 (SEQ ID NO: 164), when incubated with Mog.II-E. (C) Extracted ion chromatogram (EIC) for 1285.4 Da (mogroside V+H) of reactions containing SEQ ID NO: 165 and SEQ ID NO: 146, and either Enzyme 1 (SEQ ID NO: 117) or Enzyme 2 (SEQ ID NO: 164), when incubated with mogrol. Abbreviation: MogV, mogroside V.
FIG. 17 shows in vitro assays demonstrating the conversion of mogroside substrates to more glycosylated products. Mogroside substrates were incubated in Tris buffer containing magnesium chloride, beta-mercaptoethanol, UDP-glucose, single UGT, and a phosphatase. The panels correspond to the use of different substrates: (A) mogrol; (B) Mog.I-A; (C) Mog.I-E; (D) Mog.II-E; (E) Mog.III; (F) Mog.IV-A; (G) Mog.IV; (H) siamenoside. Enzyme 1 (SEQ ID NO: 165), Enzyme 2 (SEQ ID NO: 146), Enzyme 3 (SEQ ID NO: 116), Enzyme 4 (SEQ ID NO: 117), and Enzyme 5 (SEQ ID NO: 164).
FIG. 18 shows the bioconversion of mogrol into mogroside-IA or mogroside-IIE. In the experiment, engineered E. coli strains were inoculated with 0.2 mM mogrol at 37°C. Product formation was examined after 48 hours. The values are reported relative to the empty vector control (the values reported are the detected compound minus the background level detected in the empty vector control). Products were measured on LC/MS-QQQ with authentic standards. Only Enzyme 1 shows formation of mogroside-IIE. Enzymes 1 to 5 are SEQ ID NOS: 202, 116, 216, 217, and 218, respectively.
FIG. 19A and FIG. 19B show the bioconversion of Mog.IA (FIG. 19A) or Mog.IE (FIG. 19B) into Mog.IIE. Engineered E. coli strains (expressing either Enzyme 1, SEQ ID NO: 165; Enzyme 2, SEQ ID NO: 202; or Enzyme 3, SEQ ID NO: 116) were grown at 37°C in fermentation media containing 0.2 mM Mog.IA (FIG. 19A) or Mog.IE (FIG. 19B). Product formation was measured after 48 hours using LC-MS/MS with authentic standards. Reported values are those in excess of the empty vector control.
FIG. 20 shows the production of Mog.III or siamenoside from Mog.II-E by engineered E. coli strains expressing Enzyme 1 (SEQ ID NO: 204), Enzyme 2 (SEQ ID NO: 138), or Enzyme 3 (SEQ ID NO: 206). Strains were grown at 37°C in fermentation media containing 0.2 mM Mog.IA, and product formation was measured after 48 hours using LC-MS/MS with authentic standards.
FIG. 21 shows the in vitro production of Mog.IIA2 by cells expressing Enzyme (SEQ ID NO: 205). 0.1 mM Mog.I-E was added, and reactions were incubated at 37°C for 48 hr. Data were quantified by LC-MS/MS with authentic standards of each compound.
FIG. 22(A,B) shows production of Mog.V in E. coli. (A) Chromatogram indicating Mog.V production from engineered E. coli strains expressing SEQ ID NO: 11, SEQ ID NO: 39, SEQ ID NO: 203, SEQ ID NO: 40, SEQ ID NO: 189, SEQ ID NO: 199, SEQ ID NO: 202, SEQ ID NO: 165, and SEQ ID NO: 122. Strains were incubated at 30°C for 72 hours before extraction. Mog.V production was verified by LC-QQQ spectrum analysis versus an authentic standard. (B) Chromatogram indicating Mog.V production from a biological sample with a spiked Mog.V authentic standard.
FIG. 23 shows bioconversion of mogroside-IIE to further glycosylated products using an engineered version of the UGT enzyme of SEQ ID NO: 164.
FIG. 24 shows bioconversion of Mog.IA to Mog.IIE with an engineered version of the UGT enzyme of SEQ ID NO: 165.
FIG. 25 shows bioconversion of Mog.IE to Mog.IIE with an engineered version of the UGT enzyme of SEQ ID NO: 217.
FIG. 26 is an amino acid alignment of CaUGT 1,6 and SgUGT94 289 3 using Clustal Omega (Version CLUSTAL O (1.2.4)). These sequences share 54% amino acid identity.
FIG. 27 is an amino acid alignment of Homo sapiens squalene synthase (HsSQS) (NCBI accession NP_004453.3) and AaSQS (SEQ ID NO: 11) using Clustal Omega (Version CLUSTAL O (1.2.4)). HsSQS has a published crystal structure (PDB entry: IL/J■). These sequences share 42% amino acid identity.
FIG. 28 is an amino acid alignment of Homo sapiens squalene epoxidase (HsSQE) (NCBI accession XP_011515548) and MlSQE (SEQ ID NO: 39) using Clustal Omega (Version CLUSTAL O (1.2.4)). HsSQE has a published crystal structure (PDB entry: 6C6N). These sequences share 35% amino acid identity.
DETAILED DESCRIPTION OF THE INVENTION
The present invention, in various aspects and embodiments, provides microbial strains and methods for making mogrol and mogrol glycosides, using recombinant microbial processes. In other aspects, the invention provides methods for making products, including foods, beverages, and sweeteners (among others), by incorporating the mogrol glycosides produced according to the methods described herein. In still other aspects, the invention provides engineered UGT enzymes for glycosylating secondary metabolite substrates, such as mogrol or mogrosides.
As used herein, the terms "terpene" or "triterpene" are used interchangeably with the terms "terpenoid" or "triterpenoid," respectively.
In various aspects, the invention provides microbial strains and methods for making the triterpenoid compound mogrol, or glycoside products thereof. The invention provides a recombinant microbial host cell expressing a heterologous enzyme pathway catalyzing the conversion of isopentenyl pyrophosphate (IPP) and/or dimethylallyl pyrophosphate (DMAPP) to one or more of mogrol or mogroside(s).
The microbial host cell in various embodiments may be prokaryotic or eukaryotic. In some embodiments, the microbial host cell is a bacterium, which can be optionally selected from Escherichia spp., Bacillus spp., Corynebacterium spp., Rhodobacter spp., Zymomonas spp., Vibrio spp., and Pseudomonas spp. For example, in some embodiments, the bacterial host cell is a species selected from Escherichia coli, Bacillus subtilis, Corynebacterium glutamicum, Rhodobacter capsulatus, Rhodobacter sphaeroides, Zymomonas mobilis, Vibrio natriegens, or Pseudomonas putida. In some embodiments, the bacterial host cell is E. coli. Alternatively, the microbial cell may be a yeast cell, such as but not limited to a species of Saccharomyces, Pichia, or Yarrowia, including Saccharomyces cerevisiae, Pichia pastoris, and Yarrowia lipolytica.
The microbial cell will produce MEP or MVA products, which act as substrates for the heterologous enzyme pathway. The MEP (2-C-methyl-D-erythritol 4-phosphate) pathway, also called the MEP/DOXP (2-C-methyl-D-erythritol 4-phosphate/1-deoxy-D-xylulose 5-phosphate) pathway or the non-mevalonate pathway or the mevalonic acid-independent pathway, refers to the pathway that converts glyceraldehyde-3-phosphate and pyruvate to IPP and DMAPP. The pathway, which is present in bacteria, typically involves action of the following enzymes: 1-deoxy-D-xylulose-5-phosphate synthase (Dxs), 1-deoxy-D-xylulose-5-phosphate reductoisomerase (IspC), 4-diphosphocytidyl-2-C-methyl-D-erythritol synthase (IspD), 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (IspE), 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (IspF), 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (IspG), and isopentenyl diphosphate isomerase (IspH). The MEP pathway, and the genes and enzymes that make up the MEP pathway, are described in US 8,512,988, which is hereby incorporated by reference in its entirety. For example, genes that make up the MEP pathway include dxs, ispC, ispD, ispE, ispF, ispG, ispH, idi, and ispA. In some embodiments, the host cell expresses or overexpresses one or more of dxs, ispC, ispD, ispE, ispF, ispG, ispH, idi, ispA, or modified variants thereof, which results in the increased production of IPP and DMAPP. In some embodiments, the triterpenoid (e.g., squalene, mogrol, or other intermediate described herein) is produced at least in part by metabolic flux through an MEP pathway, and the host cell has at least one additional gene copy of one or more of dxs, ispC, ispD, ispE, ispF, ispG, ispH, idi, ispA, or modified variants thereof.
The MVA pathway refers to the biosynthetic pathway that converts acetyl-CoA to IPP. The mevalonate pathway, which will be present in yeast, typically comprises enzymes that catalyze the following steps: (a) condensing two molecules of acetyl-CoA to acetoacetyl-CoA (e.g., by action of acetoacetyl-CoA thiolase); (b) condensing acetoacetyl-CoA with acetyl-CoA to form hydroxymethylglutaryl-Coenzyme A (HMG-CoA) (e.g., by action of HMG-CoA synthase (HMGS)); (c) converting HMG-CoA to mevalonate (e.g., by action of HMG-CoA reductase (HMGR)); (d) phosphorylating mevalonate to mevalonate 5-phosphate (e.g., by action of mevalonate kinase (MK)); (e) converting mevalonate 5-phosphate to mevalonate 5-pyrophosphate (e.g., by action of phosphomevalonate kinase (PMK)); and (f) converting mevalonate 5-pyrophosphate to isopentenyl pyrophosphate (e.g., by action of mevalonate pyrophosphate decarboxylase (MPD)). The MVA pathway, and the genes and enzymes that make up the MVA pathway, are described in US 7,667,017, which is hereby incorporated by reference in its entirety. In some embodiments, the host cell expresses or overexpresses one or more of acetoacetyl-CoA thiolase, HMGS, HMGR, MK, PMK, and MPD or modified variants thereof, which results in the increased production of IPP and DMAPP. In some embodiments, the triterpenoid (e.g., mogrol or squalene) is produced at least in part by metabolic flux through an MVA pathway, and the host cell has at least one additional gene copy of one or more of acetoacetyl-CoA thiolase, HMGS, HMGR, MK, PMK, MPD, or modified variants thereof. In some embodiments, the host cell is a bacterial host cell engineered to increase production of IPP and DMAPP from glucose as described in US 10,480,015 and US ,662,442, the contents of which are hereby incorporated by reference in their entireties.
For example, in some embodiments the host cell overexpresses MEP pathway enzymes, with balanced expression to push/pull carbon flux to IPP and DMAPP. In some embodiments, the host cell is engineered to increase the availability or activity of Fe-S cluster proteins, so as to support higher activity of IspG and IspH, which are Fe-S enzymes. In some embodiments, the host cell is engineered to overexpress IspG and IspH, so as to provide increased carbon flux to the 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate (HMBPP) intermediate, but with balanced expression to prevent accumulation of HMBPP at an amount that reduces cell growth or viability, or at an amount that inhibits MEP pathway flux and/or terpenoid production. In some embodiments, the host cell exhibits higher activity of IspH relative to IspG. In some embodiments, the host cell is engineered to downregulate the ubiquinone biosynthesis pathway, e.g., by reducing the expression or activity of IspB, which uses IPP and FPP substrate.
In various embodiments, the heterologous enzyme pathway comprises a farnesyl diphosphate synthase (FPPS) and a squalene synthase (SQS), which are recombinantly expressed. In various embodiments, the SQS comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 2 to 16, 166, and 167.
By way of non-limiting example, the FPPS may be Saccharomyces cerevisiae farnesyl pyrophosphate synthase (ScFPPS) (SEQ ID NO: 1), or modified variants thereof. Modified variants may comprise an amino acid sequence that is at least 70% identical to SEQ ID NO: 1. For example, the FPPS may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 1. In some embodiments, the FPPS comprises an amino acid sequence having from 1 to 20 amino acid modifications or having from 1 to amino acid modifications with respect to SEQ ID NO: 1, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Numerous other FPPS enzymes are known in the art, and may be employed for conversion of IPP and/or DMAPP to farnesyl diphosphate in accordance with this aspect.
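The identity thresholds and modification counts recited above can be checked computationally once a variant has been aligned to its reference sequence. The following is a minimal sketch, assuming that percent identity is calculated as identical aligned positions divided by the length of the reference sequence (one of several conventions in use; the specification does not fix one) and that each gapped column counts as one insertion or deletion. The function name `alignment_stats` and the short sequences are hypothetical illustrations and do not correspond to any SEQ ID NO.

```python
def alignment_stats(aligned_ref: str, aligned_query: str) -> dict:
    """Summarize a gapped pairwise alignment in which '-' marks a gap.

    Assumed convention (not taken from the specification): percent identity
    is matches divided by the ungapped length of the reference sequence.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")

    matches = substitutions = insertions = deletions = 0
    for ref_res, query_res in zip(aligned_ref, aligned_query):
        if ref_res == "-" and query_res == "-":
            continue  # column carries no information
        if ref_res == "-":
            insertions += 1      # residue present only in the query
        elif query_res == "-":
            deletions += 1       # reference residue absent from the query
        elif ref_res == query_res:
            matches += 1
        else:
            substitutions += 1

    ref_length = matches + substitutions + deletions
    identity = 100.0 * matches / ref_length if ref_length else 0.0
    return {
        "percent_identity": identity,
        "modifications": substitutions + insertions + deletions,
    }


# Hypothetical toy alignment: one substitution (I -> L), no gaps.
stats = alignment_stats("MKTAYIAKQR", "MKTAYLAKQR")
meets_claim = stats["percent_identity"] >= 70.0 and 1 <= stats["modifications"] <= 20
```

In practice the aligned strings would come from an alignment tool such as the Clustal Omega program referenced for FIGS. 26 to 28, and the chosen identity convention should be stated alongside any reported value.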
In some embodiments, the SQS comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 11. For example, the SQS may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 11. In some embodiments, the SQS comprises an amino acid sequence having from 1 to 20 amino acid modifications or from 1 to 10 amino acid modifications with respect to SEQ ID NO: 11, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme. As shown in FIG. 5, AaSQS has high activity in E. coli.
In some embodiments, the SQS comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 2. For example, the SQS may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 2. In some embodiments, the SQS comprises an amino acid sequence having from 1 to 20 amino acid modifications or from 1 to 10 amino acid modifications with respect to SEQ ID NO: 2, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme. As shown in FIG. 5, SgSQS has high activity in E. coli.
In some embodiments, the SQS comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 14. For example, the SQS may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 14. In some embodiments, the SQS comprises an amino acid sequence having from 1 to 20 amino acid modifications or from 1 to 10 amino acid modifications with respect to SEQ ID NO: 14, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme. As shown in FIG. 5, ElSQS was active in E. coli.
In some embodiments, the SQS comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 16. For example, the SQS may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 16. In some embodiments, the SQS comprises an amino acid sequence having from 1 to 20 amino acid modifications or from 1 to 10 amino acid modifications with respect to SEQ ID NO: 16, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme. As shown in FIG. 5, EsSQS was active in E. coli.
In some embodiments, the SQS comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 166. For example, the SQS may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 166. In some embodiments, the SQS comprises an amino acid sequence having from 1 to 20 amino acid modifications or from 1 to 10 amino acid modifications with respect to SEQ ID NO: 166, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme. As shown in FIG. 5, FbSQS was active in E. coli.
In some embodiments, the SQS comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 167. For example, the SQS may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 167. In some embodiments, the SQS comprises an amino acid sequence having from 1 to 20 amino acid modifications or from 1 to 10 amino acid modifications with respect to SEQ ID NO: 167, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme. As shown in FIG. 5, BbSQS was active in E. coli.
Amino acid modifications to the SQS enzyme can be guided by available enzyme structures and homology models, including those described in Aminfar and Tohidfar, In silico analysis of squalene synthase in Fabaceae family using bioinformatics tools, J. Genetic Engineer. and Biotech. 16 (2018) 739-747. The publicly available crystal structure for HsSQE (PDB entry: 6C6N) may be used to inform amino acid modifications. An alignment between AaSQS and HsSQS is shown in FIG. 27. The enzymes have 42% amino acid identity.
In some embodiments, the host cell expresses one or more enzymes that produce mogrol from squalene. For example, the host cell may express one or more squalene epoxidase (SQE) enzymes, one or more triterpenoid cyclases, one or more epoxide hydrolase (EPH) enzymes, one or more cytochrome P450 oxidases (CYP450), optionally one or more non-heme iron-dependent oxygenases, and one or more cytochrome P450 reductases (CPR). As shown in FIG. 2, the heterologous pathway can proceed through several routes to mogrol, which may involve one or two epoxidations of the core substrate. In some embodiments, the pathway proceeds through cucurbitadienol, and in some embodiments, does not involve a further epoxidation step. In some embodiments, cucurbitadienol intermediate is converted to 24,25-epoxycucurbitadienol (5) by one or more epoxidase enzymes (such as that provided herein as SEQ ID NO: 221). In still other embodiments, the pathway largely proceeds through 2,3;24,25-dioxidosqualene, with only small or minimal production of cucurbitadienol intermediate. In some embodiments, one or more of SQE, CDS, EPH, CYP450, non-heme iron-dependent oxygenases, flavodoxin reductases (FPR), ferredoxin reductases (FDXR), and CPR enzymes are engineered to increase flux to mogrol.
In some embodiments, the heterologous enzyme pathway comprises two squalene epoxidase (SQE) enzymes. For example, the heterologous enzyme pathway may comprise an SQE that produces 2,3-oxidosqualene (intermediate (3) in FIG. 2). In some embodiments, the SQE will produce 2,3;22,23-dioxidosqualene (intermediate (4) in FIG. 2), and this conversion can be catalyzed by the same SQE enzyme, or an enzyme that differs in amino acid sequence by at least one amino acid modification. For example, the squalene epoxidase enzymes may include at least two SQE enzymes each comprising (independently) an amino acid sequence that is at least 70% identical to any one of SEQ ID NOS: 17 to 39, 168 to 170, and 177 to 183. By coexpression of an SQE enzyme engineered or screened for substrate specificity for 2,3-oxidosqualene, the di-epoxy intermediate can be produced, with low or minimal levels of cucurbitadienol. In these embodiments, P450 oxygenase enzymes hydroxylating C24 and C25 of the scaffold can be eliminated.
In some embodiments, the at least one SQE comprises an amino acid sequence that is at least 70%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 39. For example, the SQE enzyme may comprise an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 39, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
As shown in FIG. 6, MlSQE has high activity in E. coli, particularly when coexpressed with AaSQS, where high levels of the single epoxylated product (2,3-oxidosqualene) were observed. Accordingly, coexpression of AaSQS (or an engineered derivative) with multiple copies of MlSQE engineered as described above has good potential for bioengineering of the mogrol pathway. See FIG. 9. Amino acid modifications may be made to increase expression or stability of the SQE enzyme in the microbial cell, or to increase productivity of the enzyme.
In some embodiments, the host cell comprises two squalene epoxidase enzymes that each comprise an amino acid sequence that is at least 70% identical to Methylomonas lenta squalene epoxidase (SEQ ID NO: 39). For example, one of the SQE enzymes may have one or more amino acid modifications that improve specificity or productivity for conversion of 2,3-oxidosqualene to 2,3;22,23-dioxidosqualene, as compared to the enzyme having the amino acid sequence of SEQ ID NO: 39. In some embodiments, the amino acid modifications comprise one or more (or in some embodiments, 2, 3, 4, 5, 6, or 7) modifications at positions corresponding to the following positions of SEQ ID NO: 39: 35, 133, 163, 254, 283, 380, and 395. For example, the amino acid at the position corresponding to position 35 of SEQ ID NO: 39 may be arginine or lysine (e.g., H35R). The amino acid at the position corresponding to position 133 of SEQ ID NO: 39 may be glycine, alanine, leucine, isoleucine, or valine (e.g., N133G). The amino acid at the position corresponding to position 163 of SEQ ID NO: 39 may be glycine, alanine, leucine, isoleucine, or valine (e.g., F163A). The amino acid at the position corresponding to position 254 of SEQ ID NO: 39 may be phenylalanine, alanine, leucine, isoleucine, or valine (e.g., Y254F). The amino acid at the position corresponding to position 283 of SEQ ID NO: 39 may be alanine, leucine, isoleucine, or valine (e.g., M283L). The amino acid at the position corresponding to position 380 of SEQ ID NO: 39 may be alanine, leucine, or glycine (e.g., V380L). The amino acid at the position corresponding to position 395 of SEQ ID NO: 39 may be tyrosine, serine, or threonine (e.g., F395Y).
Exemplary SQE enzymes in these embodiments are at least 70%, or at least 80%, or at least 90%, or at least 95% identical to SEQ ID NO: 39, but comprise the following sets of amino acid substitutions: H35R, F163A, M283L, V380L, F395Y; or H35R, N133G, F163A, Y254F, V380L, and F395Y, in each case numbered according to SEQ ID NO: 39. For example, the host cell may express an SQE comprising the amino acid sequence of SEQ ID NO: 203 (referred to herein as MlSQE A4).
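Substitution codes of the form used above (for example H35R: original residue, 1-based position, replacement residue) can be applied to a parent sequence programmatically, with a check that the parent actually carries the stated original residue at each position. The following is a minimal sketch under that assumption; the function name `apply_substitutions` and the eight-residue parent sequence are hypothetical and are not SEQ ID NO: 39 or any other sequence of the specification.

```python
import re

# One-letter original residue, 1-based position, one-letter replacement.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Return a copy of `sequence` with each substitution code applied."""
    residues = list(sequence)
    for code in substitutions:
        match = SUBSTITUTION.match(code)
        if match is None:
            raise ValueError(f"unrecognized substitution code: {code!r}")
        original, replacement = match.group(1), match.group(3)
        index = int(match.group(2)) - 1  # codes are numbered from 1
        if not 0 <= index < len(residues):
            raise ValueError(f"{code}: position is outside the sequence")
        if residues[index] != original:
            raise ValueError(
                f"{code}: parent has {residues[index]} at this position, not {original}"
            )
        residues[index] = replacement
    return "".join(residues)


# Hypothetical toy parent sequence, for illustration only.
toy_parent = "MAHKFNYV"
toy_variant = apply_substitutions(toy_parent, ["H3R", "F5A"])
```

The residue check guards against numbering mistakes, which matters here because every substitution is numbered according to the reference sequence rather than the variant.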
In still other embodiments, the squalene epoxidase comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 168. For example, the SQE may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 168. In various embodiments, the SQE comprises an amino acid sequence having from 1 to 20 amino acid modifications or from 1 to 10 amino acid modifications with respect to SEQ ID NO: 168, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. As shown in FIG. 6, BaESQE had good activity in E. coli. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme.
In some embodiments, the squalene epoxidase comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 169. For example, the SQE may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 169. In various embodiments, the SQE comprises an amino acid sequence having from 1 to 20 amino acid modifications or from 1 to 10 amino acid modifications with respect to SEQ ID NO: 169, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. As shown in FIG. 6, MsSQE had good activity in E. coli. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme.
In some embodiments, the squalene epoxidase comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 170. For example, the SQE may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 170. In various embodiments, the SQE comprises an amino acid sequence having from 1 to 20 amino acid modifications or from 1 to 10 amino acid modifications with respect to SEQ ID NO: 170, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. As shown in FIG. 6, MbSQE had good activity in E. coli. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme.
Amino acid modifications can be guided by available enzyme structures and homology models, including those described in Padyana AK, et al., Structure and inhibition mechanism of the catalytic domain of human squalene epoxidase, Nat. Comm. (2019) Vol. 10(97): 1-10, or Ruckenstuhl et al., Structure-Function Correlations of Two Highly Conserved Motifs in Saccharomyces cerevisiae Squalene Epoxidase, Antimicrob. Agents and Chemo. (2008) Vol. 52(4): 1496-1499. FIG. 28 shows an alignment of HsSQE and MlSQE, which is useful for guiding engineering of the enzymes for expression, stability, and productivity in microbial host cells. The two enzymes have 35% identity.
In various embodiments, the heterologous enzyme pathway comprises a triterpene cyclase (TTC). In some embodiments, where the microbial cell coexpresses FPPS, along with the SQS, SQE, and triterpene cyclase enzymes, the microbial cell produces 2,3;22,23-dioxidosqualene. The 2,3;22,23-dioxidosqualene may be the substrate for downstream enzymes in the heterologous pathway. In some embodiments, the triterpene cyclase (TTC) comprises an amino acid sequence that is at least 70%, or at least 80%, or at least 90%, or at least 95% identical to an amino acid sequence selected from SEQ ID NOS: 40 to 55, 191 to 193, and 219 to 220. The TTC in various embodiments comprises an amino acid sequence that is at least 70% identical to the amino acid sequence of SEQ ID NO: 40. In some embodiments, the TTC comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 40. For example, the TTC may comprise an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 40, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, the TTC comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 192. For example, the TTC may comprise an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 192, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. The enzyme defined by SEQ ID NO: 192 shows improved specificity toward production of 24,25-epoxycucurbitadienol (FIG. 11).
In various embodiments, the heterologous enzyme pathway comprises at least two copies of a TTC enzyme gene, or comprises at least two enzymes having triterpene cyclase activity and converting 22,23-dioxidosqualene to 24,25-epoxycucurbitadienol. In such embodiments, product can be pulled to 24,25-epoxycucurbitadienol, with less production of cucurbitadienol.
In some embodiments, the heterologous enzyme pathway comprises at least one TTC that comprises an amino acid sequence that is at least 70% identical to one of SEQ ID NO: 191, SEQ ID NO: 192, and SEQ ID NO: 193. These enzymes may be optionally co-expressed with SgCDS. These enzymes exhibit high production of 24,25-epoxycucurbitadienol (FIG. 10). Thus, in some embodiments, at least one TTC comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 191, 192, and 193. In some embodiments, the TTC comprises an amino acid sequence having from 1 to amino acid modifications with respect to one of SEQ ID NOS: 191, 192, and 193, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme. Amino acid modifications can be guided by available enzyme structures and homology models, including those described in Itkin M., et al., The biosynthetic pathway of the nonsugar, high-intensity sweetener mogroside V from Siraitia grosvenorii, PNAS (2016) Vol. 113(47): E7619-E7628. For example, the CDS may be modeled using the structure of human lanosterol synthase (oxidosqualene cyclase) (PDB 1W6K).
In various embodiments, cucurbitadienol (intermediate 9 in FIG. 2) is converted to 24,25-epoxycucurbitadienol (5) by one or more enzymes expressed in the host cell. For example, the heterologous pathway may comprise an enzyme having at least about 70%, or at least about 80%, or at least about 85%, or at least about 90%, or at least about 95%, or at least about 97%, 98%, or 99% sequence identity with SEQ ID NO: 221.
In some embodiments, the heterologous enzyme pathway comprises at least one epoxide hydrolase (EPH). The EPH may comprise an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 56 to 72, 184 to 190, and 212. In some embodiments, the EPH may employ as a substrate 24,25-epoxycucurbitadienol (intermediate (5) of FIG. 2), for production of 24,25-dihydroxycucurbitadienol (intermediate (6) of FIG. 2). In some embodiments, the EPH comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 56 to 72, 184 to 190, and 212. Thus, in some embodiments, the EPH comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 56 to 72, 184 to 190, and 212, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, the heterologous pathway comprises at least one EPH enzyme converting 24,25-epoxycucurbitadienol to 24,25-dihydroxycucurbitadienol, the at least one EPH enzyme comprising an amino acid sequence that is at least 70% identical to one of: SEQ ID NO: 189, SEQ ID NO: 58, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 190, and SEQ ID NO: 212. See FIG. 12. In some embodiments, the EPH enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 189, 58, 184, 185, 187, 188, 190, and 212. For example, the EPH may comprise an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 189, 58, 184, 185, 187, 188, 190, and 212, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme.
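By way of non-limiting illustration, the sequence-identity thresholds and modification counts recited above can be checked computationally. The following Python sketch is an assumption made for illustration only: it presumes that a pairwise alignment of the reference and candidate sequences has already been produced by some external tool (gaps written as "-"), it counts each gapped column as one single-residue insertion or deletion, and it reports identity as matching columns divided by total alignment columns, which is only one of several conventions in use. The function name and the short example sequences are hypothetical and are not taken from any SEQ ID NO.

```python
def alignment_stats(ref_aln: str, var_aln: str) -> dict:
    """Summarize a precomputed pairwise alignment (gaps written as '-').

    Assumed convention: identity = matching columns / total alignment columns.
    """
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have equal length")
    matches = substitutions = insertions = deletions = 0
    for ref_res, var_res in zip(ref_aln, var_aln):
        if ref_res == "-" and var_res == "-":
            raise ValueError("column with gaps in both sequences")
        if ref_res == "-":
            insertions += 1      # residue present only in the variant
        elif var_res == "-":
            deletions += 1       # residue present only in the reference
        elif ref_res == var_res:
            matches += 1
        else:
            substitutions += 1
    modifications = substitutions + insertions + deletions
    return {
        "substitutions": substitutions,
        "insertions": insertions,
        "deletions": deletions,
        "modifications": modifications,
        "percent_identity": 100.0 * matches / len(ref_aln),
    }


def within_limits(stats: dict, min_identity: float, max_mods: int) -> bool:
    """True if the variant has 1..max_mods modifications and enough identity."""
    return (1 <= stats["modifications"] <= max_mods
            and stats["percent_identity"] >= min_identity)


# Hypothetical toy alignment: one substitution over ten aligned residues.
example = alignment_stats("MKTAYIAKQR", "MKTAYLAKQR")
```

A full-length enzyme would of course be aligned with a dedicated alignment program before a summary of this kind is computed; the sketch only tallies the result.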
In some embodiments, the heterologous pathway comprises one or more oxidases. The one or more oxidases may be active on cucurbitadienol or oxygenated products thereof as a substrate, adding (collectively) hydroxylations at C11, C24 and C25, thereby producing mogrol (see FIG. 2). Alternatively, the heterologous pathway may comprise one or more oxidases that oxidize C11 of C24,25-dihydroxycucurbitadienol to produce mogrol.
In some embodiments, at least one oxidase is a cytochrome P450 enzyme. Exemplary cytochrome P450 enzymes comprise an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 73 to 91, 171 to 176, and 194 to 200. In some embodiments, at least one P450 enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 73 to 91, 171 to 176, and 194 to 200. For example, at least one cytochrome P450 enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 73 to 91, 171 to 176, and 194 to 200, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, the microbial host cell expresses a heterologous enzyme pathway comprising a P450 enzyme having activity for oxidation at C11 of C24,25-dihydroxycucurbitadienol, to thereby produce mogrol. For example, in some embodiments, the cytochrome P450 comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NO: 194 and SEQ ID NO: 171. See FIGS. 13A-C, FIG. 14, and FIG. 15. In some embodiments, the microbial host cell expresses a cytochrome P450 enzyme that comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 194 and 171. In some embodiments, at least one cytochrome P450 enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 194 and 171, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, the cytochrome P450 enzyme has at least a portion of its transmembrane region substituted with a heterologous transmembrane region. For example, particularly in embodiments in which the microbial cell is a bacterium, the CYP450 and/or CPR is modified as described in US 2018/0251738, the contents of which are hereby incorporated by reference in their entireties. For example, in some embodiments, the CYP450 enzyme has a deletion of all or part of the wild type P450 N-terminal transmembrane region, and the addition of a transmembrane domain derived from an E. coli or bacterial inner membrane, cytoplasmic C-terminus protein. In some embodiments, the transmembrane domain is a single-pass transmembrane domain. In some embodiments, the transmembrane domain is a multi-pass (e.g., 2, 3, or more transmembrane helices) transmembrane domain. Exemplary transmembrane domains are derived from E. coli zipA or sohB. Alternatively, the P450 enzyme can employ its native transmembrane anchor, or the well-known bovine 17a anchor. See FIG. 14.
In some embodiments, the microbial host cell expresses a non-heme iron oxidase. Exemplary non-heme iron oxidases comprise an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 100 to 115. In some embodiments, the non-heme iron oxidase comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 100 to 115.
In various embodiments, the microbial host cell expresses one or more electron transfer proteins selected from a cytochrome P450 reductase (CPR), flavodoxin reductase (FPR) and ferredoxin reductase (FDXR) sufficient to regenerate the one or more oxidases. Exemplary CPR proteins are provided herein as SEQ ID NOS: 92 to 99 and 201.
In some embodiments, the microbial host cell expresses a cytochrome P450 reductase, which may comprise an amino acid sequence that is at least 70%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 92 to 99 and 201. For example, in some embodiments, the microbial host cell expresses SEQ ID NO: 194 or a derivative thereof (as described above), and SEQ ID NO: 98 or a derivative thereof (i.e., having at least 70%, at least 80%, or at least 90% sequence identity thereto). In some embodiments, the microbial host cell expresses SEQ ID NO: 171 or a derivative thereof (as described above), and SEQ ID NO: 201 or a derivative thereof (i.e., having at least 70%, at least 80%, or at least 90% sequence identity thereto).
In various embodiments, the heterologous enzyme pathway produces mogrol, which may be an intermediate for downstream enzymes in the heterologous pathway, or in some embodiments is recovered from the culture. Mogrol may be recovered from host cells in some embodiments, and/or can be recovered from the culture media.
In some embodiments, the heterologous enzyme pathway further comprises one or more uridine diphosphate-dependent glycosyltransferase (UGT) enzymes, thereby producing one or more mogrol glycosides (or "mogrosides"). The mogrol glycoside may be pentaglycosylated, hexaglycosylated, or more (e.g., 7, 8, or 9 glycosylations), in some embodiments. In other embodiments, the mogrol glycoside has two, three, or four glucosylations. The one or more mogrol glycosides may be selected from Mog.II-E, Mog.III, Mog.III-A1, Mog.III-A2, Mog.III, Mog.IV, Mog.IV-A, siamenoside, isomog.V, Mog.V, or Mog.VI. In some embodiments, the host cell produces Mog.V or siamenoside.
In some embodiments, the host cell expresses a UGT enzyme that catalyzes the primary glycosylation of mogrol at C24 and/or C3 hydroxyl groups. In some embodiments, the UGT enzyme catalyzes a branching glycosylation, such as a beta 1,2 and/or beta 1,6 branching glycosylation at the primary C3 and C24 glucosyl groups. UGT enzymes observed to catalyze primary glycosylation of C24 and/or C3 hydroxyl groups are summarized in Table 1. UGT enzymes observed to catalyze various branching glycosylation reactions are summarized in Table 2.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 116 to 165, 202 to 210, 211, and 213 to 218. For example, in some embodiments, the UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 116 to 165, 202 to 210, 211, and 213 to 218. Thus, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 116 to 165, 202 to 210, 211, and 213 to 218, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
For example, in some embodiments, the microbial cell expresses at least four UGT enzymes, resulting in glucosylation of mogrol at the C3 hydroxyl group, the C24 hydroxyl group, as well as a further 1,6 glucosylation at the C3 glucosyl group, and a further 1,6 glucosylation and a further 1,2 glucosylation at the C24 glucosyl group. The product of such glucosylation reactions is Mog.V.
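As an illustrative tally of the glucosylation pattern just described, the following Python sketch records the five glucosylation events (two primary, three branching) and counts the glucose units attached at each position. The class and function names are hypothetical conveniences, and the sketch makes no assumption about which enzyme performs which step or in what order the steps occur.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Glucosylation:
    site: str      # mogrol position carrying the glucosyl chain: "C3" or "C24"
    linkage: str   # "primary", "1,6", or "1,2"


# The five glucosylation events recited for Mog.V in the paragraph above.
MOG_V_STEPS = (
    Glucosylation("C3", "primary"),
    Glucosylation("C24", "primary"),
    Glucosylation("C3", "1,6"),
    Glucosylation("C24", "1,6"),
    Glucosylation("C24", "1,2"),
)


def glucose_units_per_site(steps) -> dict:
    """Count glucose units attached (directly or by branching) at each site."""
    return dict(Counter(step.site for step in steps))


def branches_have_primary(steps) -> bool:
    """Check that every branching glucosylation has a primary glucose at its site."""
    primary_sites = {s.site for s in steps if s.linkage == "primary"}
    return all(s.site in primary_sites for s in steps if s.linkage != "primary")


units = glucose_units_per_site(MOG_V_STEPS)
total_units = sum(units.values())
```

The tally gives two glucose units at C3 and three at C24, five in total, which is consistent with the pentaglycosylated product described.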
In some embodiments, at least one UGT enzyme comprises an amino acid sequence having at least 70% sequence identity to one of SEQ ID NOS: 164, 165, 138, 204 to 211, and 213 to 218.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to UGT85C1 (SEQ ID NO: 165). UGT85C1 exhibits primary glycosylation at the C3 and C24 hydroxyl groups. Thus, in some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 165. The at least one UGT enzyme may comprise an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 165, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Exemplary amino acid substitutions include substitutions at positions 41 (e.g., L41F or L41Y), 49 (e.g., D49E), and 127 (e.g., C127F or C127Y).
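The substitution notation used above (wild-type residue, 1-based position, replacement residue, e.g., L41F) can be applied to a sequence mechanically. The Python sketch below is a minimal illustration under stated assumptions: the helper name is hypothetical, and the 130-residue placeholder sequence is invented solely so that positions 41, 49, and 127 carry the residues named in the notation; it is not SEQ ID NO: 165 or any part of it.

```python
import re

# Matches single-residue substitutions such as "L41F".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions: list) -> str:
    """Apply substitutions written as <wild-type><1-based position><new residue>."""
    residues = list(sequence)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wild_type, new = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{sub}: expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Placeholder sequence (NOT a real enzyme): alanine everywhere except the
# three positions named in the example substitutions.
_toy = ["A"] * 130
_toy[40], _toy[48], _toy[126] = "L", "D", "C"
toy_sequence = "".join(_toy)

variant = apply_substitutions(toy_sequence, ["L41F", "D49E", "C127Y"])
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are common when a reference sequence is renumbered or truncated.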
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 164, which exhibits activity for adding branching glycosylations, both 1-2 and 1-6 branching glycosylations. In various embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 164. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 164, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Exemplary amino acid substitutions are shown in Table 3. Exemplary amino acid substitutions include substitutions at one or more positions selected from 150 (e.g., S150F, S150Y), 147 (e.g., T147L, T147V, T147I, and T147A), 207 (e.g., N207K or N207R), 270 (e.g., K270E or K270D), 281 (V281L or V281I), 354 (e.g., L354V or L354I), 13 (e.g., L13F or L13Y), 32 (T32A or T32G or T32L), and 101 (K101A or K101G), with respect to SEQ ID NO: 164. An exemplary engineered UGT enzyme comprises the amino acid substitutions T147L and N207K with respect to SEQ ID NO: 164.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 138, which exhibits an activity to catalyze 1-6 branching glycosylations. In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 138. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 138, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 204, which catalyzes 1-6 branching glycosylation, particularly at the C3 primary glucosylation. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 204. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 204, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 205, which catalyzes 1-6 branching glycosylation, including at both the C3 and C24 primary glucosylations. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 205. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 205, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 206, which catalyzes 1-2 and 1-6 branching glycosylations. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 206. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 206, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 207, which catalyzes 1-6 branching glycosylations of the primary glucosylations. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 207. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 207, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 208, which catalyzes 1-2 and 1-6 branching glycosylations. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 208. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 208, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 209, which catalyzes 1-6 branching glycosylations of the primary glucosylations. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 209. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 209, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 210, which catalyzes 1-branching glycosylations of the primary glucosylations. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 210. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 210, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 211, which catalyzes 1-2 branching glycosylation of the C24 primary glucosylation. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 211. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 211, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 213, which catalyzes 1-6 branching glycosylation of the primary glucosylation at C24. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 213. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 213, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 214, which catalyzes primary glucosylation at C24. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 214. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 214, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 215, which catalyzes 1-6 branching glucosylation at C24. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 215. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 215, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In still other embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 146, which provides for glucosylation of the C24 hydroxyl of mogrol or Mog.IE. In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 146. In some embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 or from 1 to 10 amino acid modifications with respect to SEQ ID NO: 146, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme for particular substrates.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 202, which catalyzes primary glycosylation at the C3 and C24 hydroxyl. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 202. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 202, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 218, which catalyzes primary glycosylation at the C24 hydroxyl. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 218. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 218, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 217, which catalyzes primary glycosylation at the C24 hydroxyl. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 217. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 217, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Exemplary amino acid substitutions include substitutions at one or more positions (with respect to SEQ ID NO: 217) selected from 74 (e.g., A74E or A74D), 91 (I91F or I91Y), 101 (e.g., H101P), 241 (e.g., Q241E or Q241D), and 436 (e.g., I436L or I436A). In some embodiments, the UGT enzyme comprises the following amino acid substitutions with respect to SEQ ID NO: 217: A74E, I91F, and H101P.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 216, which catalyzes primary glycosylation at the C24 hydroxyl. For example, at least one UGT enzyme may comprise an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 216. In exemplary embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 216, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 117, SEQ ID NO: 210, or SEQ ID NO: 122. For example, the enzyme defined by SEQ ID NO: 117 catalyzes branching glycosylations. In some embodiments, at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 117, SEQ ID NO: 210, or SEQ ID NO: 122. In some embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 117, 210, or 122, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
In some embodiments, the microbial cell expresses at least one UGT enzyme capable of catalyzing beta 1,2 addition of a glucose molecule to at least the C24 glucosyl group (e.g., of Mog.IVA). Exemplary UGT enzymes in accordance with these embodiments include SEQ ID NO: 117, SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, or SEQ ID NO: 163, or derivatives thereof. Derivatives include enzymes comprising amino acid sequences that are at least 70% identical to one or more of SEQ ID NO: 117, SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, and SEQ ID NO: 163. In some embodiments, the UGT enzyme catalyzing beta 1,2 addition of a glucose molecule to at least the C24 glucosyl group comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one or more of SEQ ID NO: 117, SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, and SEQ ID NO: 163. In some embodiments, at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 or having from 1 to 10 amino acid modifications with respect to SEQ ID NO: 117, SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, and SEQ ID NO: 163, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions. Amino acid modifications may be made to increase expression or stability of the enzyme in the microbial cell, or to increase productivity of the enzyme for particular substrates.
In some embodiments, at least one UGT enzyme is a circular permutant of a wild-type UGT enzyme, optionally having amino acid substitutions, deletions, and/or insertions with respect to the corresponding position of the wild-type enzyme. Circular permutants can provide novel and desirable substrate specificities, product profiles, and reaction kinetics over the wild-type enzymes. A circular permutant retains the same basic fold of the parent enzyme, but has a different position of the N-terminus (e.g., "cut-site"), with the original N- and C-termini connected, optionally by a linking sequence. For example, in the circular permutants, the N-terminal Methionine is positioned at a site in the protein other than the natural N-terminus. UGT circular permutants are described in US 2017/0332673, which is hereby incorporated by reference in its entirety. In some embodiments, at least one UGT enzyme is a circular permutant of a UGT enzyme described herein, such as but not limited to SEQ ID NO: 146, SEQ ID NO: 164, or SEQ ID NO: 165, SEQ ID NO: 117, SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, SEQ ID NO: 163, SEQ ID NO: 202, SEQ ID NO: 216, SEQ ID NO: 217, and SEQ ID NO: 218. In some embodiments, the circular permutant further has one or more amino acid modifications (e.g., amino acid substitutions, deletions, and/or insertions) with respect to the parent UGT enzyme. In these embodiments, the circular permutant will have at least about 70%, or at least about 80%, or at least about 90%, or at least about 95%, or at least about 98% identity to the parent enzyme, when the corresponding amino acid sequences are aligned (i.e., without regard to the new N-terminus of the circular permutant). An exemplary circular permutant for use according to some embodiments is SEQ ID NO: 206.
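The sequence rearrangement that defines a circular permutant can be illustrated at the string level. The following Python sketch is a simplified assumption, not the procedure of US 2017/0332673: it takes a parent sequence, a 1-based cut site that becomes the new N-terminal residue, and an optional linker joining the original C- and N-termini, and optionally prepends a methionine when the new first residue is not already one. The function name, the ten-residue parent, and the "GSG" linker are hypothetical placeholders.

```python
def circular_permutant(parent: str, cut_site: int, linker: str = "",
                       add_start_met: bool = True) -> str:
    """Build a circular permutant sequence.

    cut_site is the 1-based parent position that becomes the first residue
    of the permutant (before any added start methionine).
    """
    if not 1 < cut_site <= len(parent):
        raise ValueError("cut_site must lie inside the parent, after position 1")
    new_n_terminal_part = parent[cut_site - 1:]   # cut site .. original C-terminus
    new_c_terminal_part = parent[:cut_site - 1]   # original N-terminus .. before cut
    # The original C- and N-termini are joined, optionally through a linker.
    body = new_n_terminal_part + linker + new_c_terminal_part
    if add_start_met and not body.startswith("M"):
        body = "M" + body
    return body


# Hypothetical toy parent; a real UGT is several hundred residues long.
parent = "MKTAYIAKQR"
permutant = circular_permutant(parent, cut_site=5, linker="GSG")
unlinked = circular_permutant(parent, cut_site=5, add_start_met=False)
```

Without a linker or added methionine, the permutant is simply a rotation of the parent sequence, which is why identity to the parent is assessed on the corresponding aligned segments rather than from the new N-terminus.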
In some embodiments, the microbial host cell expresses at least three UGT enzymes: a first UGT enzyme catalyzing primary glycosylation at the C24 hydroxyl of mogrol, a second UGT enzyme catalyzing primary glycosylation at the C3 hydroxyl of mogrol, and a third UGT enzyme catalyzing one or more branching glycosylation reactions. In some embodiments, the microbial host cell expresses one or two UGT enzymes catalyzing beta 1,2 and/or beta 1,6 branching glycosylations of the C3 and/or C24 primary glycosylations. For example, the UGT enzymes may comprise three or four UGT enzymes selected from:
SEQ ID NO: 165 or a derivative thereof;
SEQ ID NO: 146 or a derivative thereof;
SEQ ID NO: 214 or a derivative thereof;
SEQ ID NO: 129 or a derivative thereof;
SEQ ID NO: 164 or a derivative thereof;
SEQ ID NO: 116 or a derivative thereof;
SEQ ID NO: 202 or a derivative thereof;
SEQ ID NO: 218 or a derivative thereof;
SEQ ID NO: 217 or a derivative thereof;
SEQ ID NO: 138 or a derivative thereof;
SEQ ID NO: 204 or a derivative thereof;
SEQ ID NO: 205 or a derivative thereof;
SEQ ID NO: 207 or a derivative thereof;
SEQ ID NO: 208 or a derivative thereof;
SEQ ID NO: 209 or a derivative thereof;
SEQ ID NO: 11 or a derivative thereof;
SEQ ID NO: 215 or a derivative thereof;
SEQ ID NO: 213 or a derivative thereof;
SEQ ID NO: 206 or a derivative thereof;
SEQ ID NO: 122 or a derivative thereof; and
SEQ ID NO: 210 or a derivative thereof.
Derivatives have sequence identity to the reference enzyme as described herein.
In some embodiments, the microbial host cell has one or more genetic modifications that increase the production of UDP-glucose, the co-factor employed by UGT enzymes. These genetic modifications may include one or more, or two or more (or all) of ΔgalE, ΔgalT, ΔgalK, ΔgalM, ΔushA, Δagp, Δpgm, duplication of E. coli galU, expression of Bacillus subtilis UGPA, and expression of Bifidobacterium adolescentis SPL.
Mogrol glycosides can be recovered from the microbial culture. For example, mogrol glycosides may be recovered from microbial cells, or in some embodiments, are predominately available in the extracellular media, where they may be recovered or sequestered.
In various embodiments, the reaction is performed in a microbial cell, and UGT enzymes are recombinantly expressed in the cell. In some embodiments, mogrol is produced in the cell by a heterologous mogrol synthesis pathway, as described herein. In other embodiments, mogrol or mogrol glycosides (such as a monkfruit extract) are fed to the cells for glycosylation. In still other embodiments, the reaction is performed in vitro using purified UGT enzyme, partially purified UGT enzyme, or recombinant cell lysates.
As described herein, the microbial host cell can be prokaryotic or eukaryotic, and is optionally a bacterium selected from Escherichia coli, Bacillus subtilis, Corynebacterium glutamicum, Rhodobacter capsulatus, Rhodobacter sphaeroides, Zymomonas mobilis, Vibrio natriegens, or Pseudomonas putida. In some embodiments, the microbial cell is a yeast selected from a species of Saccharomyces, Pichia, or Yarrowia, including Saccharomyces cerevisiae, Pichia pastoris, and Yarrowia lipolytica. In some embodiments, the microbial host cell is E. coli. The bacterial host cell is cultured to produce the triterpenoid product (e.g., mogroside). In some embodiments, carbon substrates such as C1, C2, C3, C4, C5, and/or C6 carbon substrates are employed for the production phase. In exemplary embodiments, the carbon source is glucose, sucrose, fructose, xylose, and/or glycerol. Culture conditions are generally selected from aerobic, microaerobic, and anaerobic.
In various embodiments, the bacterial host cell may be cultured at a temperature between 22° C and 37° C. While commercial biosynthesis in bacteria such as E. coli can be limited by the temperature at which overexpressed and/or foreign enzymes (e.g., enzymes derived from plants) are stable, recombinant enzymes may be engineered to allow for cultures to be maintained at higher temperatures, resulting in higher yields and higher overall productivity. In some embodiments, the culturing is conducted at about 22° C or greater, about 23° C or greater, about 24° C or greater, about 25° C or greater, about 26° C or greater, about 27° C or greater, about 28° C or greater, about 29° C or greater, about 30° C or greater, about 31° C or greater, about 32° C or greater, about 33° C or greater, about 34° C or greater, about 35° C or greater, about 36° C or greater, or about 37° C.
In some embodiments, the bacterial host cells are further suitable for commercial production, at commercial scale. In some embodiments, the size of the culture is at least about 100 L, at least about 200 L, at least about 500 L, at least about 1,000 L, or at least about 10,000 L, or at least about 100,000 L, or at least about 500,000 L, or at least about 600,000 L. In an embodiment, the culturing may be conducted in batch culture, continuous culture, or semi-continuous culture.
In various embodiments, methods further include recovering the product from the cell culture or from cell lysates. In some embodiments, the culture produces at least about 100 mg/L, or at least about 200 mg/L, or at least about 500 mg/L, or at least about 1 g/L, or at least about 2 g/L, or at least about 5 g/L, or at least about 10 g/L, or at least about g/L, or at least about 30 g/L, or at least about 40 g/L of the terpenoid or terpenoid glycoside product.
In some embodiments, the production of indole (including prenylated indole) is used as a surrogate marker for terpenoid production, and/or the accumulation of indole in the culture is controlled to increase production. For example, in various embodiments, accumulation of indole in the culture is controlled to below about 100 mg/L, or below about 75 mg/L, or below about 50 mg/L, or below about 25 mg/L, or below about mg/L. The accumulation of indole can be controlled by balancing protein expression and activity using the multivariate modular approach as described in U.S. Pat. No. 8,927,2(which is hereby incorporated by reference), and/or is controlled by chemical means.
Other markers for efficient production of terpenes and terpenoids include accumulation of DOX or ME in the culture media. Generally, the bacterial strains may be engineered to accumulate less of these chemical species, which accumulate in the culture at less than about 5 g/L, or less than about 4 g/L, or less than about 3 g/L, or less than about 2 g/L, or less than about 1 g/L, or less than about 500 mg/L, or less than about 100 mg/L.
The optimization of terpene or terpenoid production by manipulation of MEP pathway genes, as well as manipulation of the upstream and downstream pathways, is not expected to be a simple linear or additive process. Rather, through combinatorial analysis, optimization is achieved through balancing components of the MEP pathway, as well as upstream and downstream pathways. Indole (including prenylated indole) accumulation and MEP metabolite accumulation (e.g., DOX, ME, MEcPP, and/or farnesol) in the culture can be used as surrogate markers to guide this process.
For example, in some embodiments, the bacterial strain has at least one additional copy of dxs and idi expressed as an operon/module; or dxs, ispD, ispE, and idi expressed as an operon or module (either on a plasmid or integrated into the genome), with additional MEP pathway complementation described herein to improve MEP carbon. For example, the bacterial strain may have a further copy of dxr, and ispG and/or ispH, optionally with a further copy of ispE and/or idi, with expressions of these genes tuned to increase MEP carbon and/or improve terpene or terpenoid titer. In various embodiments, the bacterial strain has a further copy of at least dxr, ispE, ispG and ispH, optionally with a further copy of idi, with expressions of these genes tuned to increase MEP carbon and/or improve terpene or terpenoid titer.
Manipulation of the expression of genes and/or proteins, including gene modules, can be achieved through various methods. For example, expression of the genes or operons can be regulated through selection of promoters, such as inducible or constitutive promoters, with different strengths (e.g., strong, intermediate, or weak). Several non-limiting examples of promoters of different strengths include Trc, T5 and T7. Additionally, expression of genes or operons can be regulated through manipulation of the copy number of the gene or operon in the cell. In some embodiments, expression of genes or operons can be regulated through manipulating the order of the genes within a module, where the genes transcribed first are generally expressed at a higher level. In some embodiments, expression of genes or operons is regulated through integration of one or more genes or operons into the chromosome.
Optimization of protein expression can also be achieved through selection of appropriate promoters and ribosomal binding sites. In some embodiments, this may include the selection of high-copy number plasmids, or single-, low- or medium-copy number plasmids. The step of transcription termination can also be targeted for regulation of gene expression, through the introduction or elimination of structures such as stem-loops.
Expression vectors containing all the necessary elements for expression are commercially available and known to those skilled in the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989. Cells are genetically engineered by the introduction into the cells of heterologous DNA. The heterologous DNA is placed under operable control of transcriptional elements to permit the expression of the heterologous DNA in the host cell.
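The combinatorial character of this tuning can be illustrated with a small enumeration. The sketch below is only an illustration under stated assumptions: the promoter names come from the text, while the grouping of genes into two modules and the four copy-number tiers are hypothetical design levels chosen for the example, not a design disclosed in this specification.

```python
from itertools import product

# Labels are illustrative. Promoter names come from the text; the module
# grouping and the copy-number tiers are hypothetical design levels.
MODULES = ["dxs-idi", "dxr-ispG-ispH"]
PROMOTERS = ["Trc", "T5", "T7"]
COPY_LEVELS = ["chromosomal", "low-copy", "medium-copy", "high-copy"]


def enumerate_designs(modules, promoters, copy_levels):
    """Return every assignment of (promoter, copy level) to each module."""
    per_module = list(product(promoters, copy_levels))
    return [
        dict(zip(modules, choice))
        for choice in product(per_module, repeat=len(modules))
    ]


designs = enumerate_designs(MODULES, PROMOTERS, COPY_LEVELS)
print(len(designs))  # 144 = (3 promoters x 4 copy levels) ** 2 modules
```

Even with two modules the design space is already 144 variants, which is one reason surrogate markers such as indole or MEP metabolite accumulation are useful for guiding the search.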
In some embodiments, endogenous genes are edited, as opposed to gene complementation. Editing can modify endogenous promoters, ribosomal binding sequences, or other expression control sequences, and/or in some embodiments modifies trans-acting and/or cis-acting factors in gene regulation. Genome editing can take place using CRISPR/Cas genome editing techniques, or similar techniques employing zinc finger nucleases and TALENs. In some embodiments, the endogenous genes are replaced by homologous recombination.
In some embodiments, genes are overexpressed at least in part by controlling gene copy number. While gene copy number can be conveniently controlled using plasmids with varying copy number, gene duplication and chromosomal integration can also be employed. For example, a process for genetically stable tandem gene duplication is described in US 2011/0236927, which is hereby incorporated by reference in its entirety.
The terpene or terpenoid product can be recovered by any suitable process. For example, the aqueous phase can be recovered, and/or the whole cell biomass can be recovered, for further processing. The production of the desired product can be determined and/or quantified, for example, by gas chromatography (e.g., GC-MS). The desired product can be produced in batch or continuous bioreactor systems.
The similarity of nucleotide and amino acid sequences, i.e. the percentage of sequence identity, can be determined via sequence alignments. Such alignments can be carried out with several art-known algorithms, such as with the mathematical algorithm of Karlin and Altschul (Karlin & Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877), with hmmalign (HMMER package, http://hmmer.wustl.edu/) or with the CLUSTAL algorithm (Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-80). The grade of sequence identity (sequence matching) may be calculated using e.g. BLAST, BLAT or BlastZ (or BlastX). A similar algorithm is incorporated into the BLASTN and BLASTP programs of Altschul et al (1990) J. Mol. Biol. 215: 403-410. BLAST polynucleotide searches can be performed with the BLASTN program, score=100, word length=12.
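Whichever alignment tool is used, the percentage of sequence identity is ultimately a count of matching positions over compared positions. The following is a minimal sketch of that calculation for two sequences that have already been aligned. It assumes "-" marks a gap and that columns with a gap in only one sequence count toward the denominator; other conventions exist, and the function name is illustrative rather than part of any tool named above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: a column with a gap in one sequence counts as a mismatch;
    a column with gaps in both sequences is ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # nothing to compare in this column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy example: 4 identical columns out of 6 compared.
print(round(percent_identity("MKT-LV", "MKSALV"), 1))  # 66.7
```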
BLAST protein searches may be performed with the BLASTP program, score=50, word length=3. To obtain gapped alignments for comparative purposes, Gapped BLAST is utilized as described in Altschul et al (1997) Nucleic Acids Res. 25: 3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs are used. Sequence matching analysis may be supplemented by established homology mapping techniques like Shuffle-LAGAN (Brudno M., Bioinformatics 2003b, Suppl 1:154-162) or Markov random fields.
"Conservative substitutions" may be made, for instance, on the basis of similarity in polarity, charge, size, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the amino acid residues involved. The 20 naturally occurring amino acids can be grouped into the following six standard amino acid groups: (1) hydrophobic: Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; and (6) aromatic: Trp, Tyr, Phe.
As used herein, "conservative substitutions" are defined as exchanges of an amino acid by another amino acid listed within the same group of the six standard amino acid groups shown above. For example, the exchange of Asp by Glu retains one negative charge in the so modified polypeptide. In addition, glycine and proline may be substituted for one another based on their ability to disrupt α-helices. Some preferred conservative substitutions within the above six groups are exchanges within the following sub-groups: (i) Ala, Val, Leu and Ile; (ii) Ser and Thr; (iii) Asn and Gln; (iv) Lys and Arg; and (v) Tyr and Phe.
As used herein, "non-conservative substitutions" are defined as exchanges of an amino acid by another amino acid listed in a different group of the six standard amino acid groups (1) to (6) shown above.
Modifications of enzymes as described herein can include conservative and/or non-conservative mutations. In some embodiments, an Alanine is substituted or inserted at position 2, to increase stability.
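The two definitions above amount to a group-membership test. A minimal sketch follows, using the six groups exactly as listed in this section; the dictionary keys and function names are illustrative and not terms from the specification.

```python
# The six standard amino acid groups as listed in this section.
GROUPS = {
    "hydrophobic": {"Met", "Ala", "Val", "Leu", "Ile"},
    "neutral hydrophilic": {"Cys", "Ser", "Thr", "Asn", "Gln"},
    "acidic": {"Asp", "Glu"},
    "basic": {"His", "Lys", "Arg"},
    "chain orientation": {"Gly", "Pro"},
    "aromatic": {"Trp", "Tyr", "Phe"},
}


def group_of(residue: str) -> str:
    """Return the name of the group that contains the three-letter residue."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue}")


def classify_substitution(original: str, replacement: str) -> str:
    """Label a substitution as identical, conservative, or non-conservative."""
    if original == replacement:
        return "identical"
    same_group = group_of(original) == group_of(replacement)
    return "conservative" if same_group else "non-conservative"


print(classify_substitution("Asp", "Glu"))  # conservative
print(classify_substitution("Ala", "Asp"))  # non-conservative
```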
In some embodiments "rational design" is involved in constructing specific mutations in enzymes. Rational design refers to incorporating knowledge of the enzyme, or related enzymes, such as its reaction thermodynamics and kinetics, its three dimensional structure, its active site(s), its substrate(s) and/or the interaction between the enzyme and substrate, into the design of the specific mutation. Based on a rational design approach, mutations can be created in an enzyme which can then be screened for increased production of a terpene or terpenoid relative to control levels. In some embodiments, mutations can be rationally designed based on homology modeling. As used herein, "homology modeling" refers to the process of constructing an atomic resolution model of one protein from its amino acid sequence and a three-dimensional structure of a related homologous protein.
In other aspects, the invention provides a method for making a product comprising a mogrol glycoside. The method comprises producing a mogrol glycoside in accordance with this disclosure, and incorporating the mogrol glycoside into a product. In some embodiments, the mogrol glycoside is siamenoside, Mog.V, Mog.VI, or Isomog.V. In some embodiments, the product is a sweetener composition, flavoring composition, food, beverage, chewing gum, texturant, pharmaceutical composition, tobacco product, nutraceutical composition, or oral hygiene composition.
The product may be a sweetener composition comprising a blend of artificial and/or natural sweeteners. For example, the composition may further comprise one or more of a steviol glycoside, aspartame, and neotame. Exemplary steviol glycosides comprise one or more of RebM, RebB, RebD, RebA, RebE, and RebI.
Non-limiting examples of flavors for which the products can be used in combination include lime, lemon, orange, fruit, banana, grape, pear, pineapple, mango, bitter almond, cola, cinnamon, sugar, cotton candy and vanilla flavors. Non-limiting examples of other food ingredients include flavors, acidulants, amino acids, coloring agents, bulking agents, modified starches, gums, texturizers, preservatives, antioxidants, emulsifiers, stabilizers, thickeners and gelling agents.
Mogrol glycosides obtained according to this invention may be incorporated as a high intensity natural sweetener in foodstuffs, beverages, pharmaceutical compositions, cosmetics, chewing gums, table top products, cereals, dairy products, toothpastes and other oral cavity compositions, etc.
Mogrol glycosides obtained according to this invention can be used in combination with various physiologically active substances or functional ingredients. Functional ingredients generally are classified into categories such as carotenoids, dietary fiber, fatty acids, saponins, antioxidants, nutraceuticals, flavonoids, isothiocyanates, phenols, plant sterols and stanols (phytosterols and phytostanols), polyols, prebiotics, probiotics, phytoestrogens, soy protein, sulfides/thiols, amino acids, proteins, vitamins, and minerals. Functional ingredients also may be classified based on their health benefits, such as cardiovascular, cholesterol-reducing, and anti-inflammatory.
Mogrol glycosides obtained according to this invention may be applied as a high intensity sweetener to produce zero calorie, reduced calorie or diabetic beverages and food products with improved taste characteristics. It may also be used in drinks, foodstuffs, pharmaceuticals, and other products in which sugar cannot be used. In addition, highly purified target mogrol glycoside(s), particularly Mog.V, Mog.VI, or Isomog.V, can be used as a sweetener not only for drinks, foodstuffs, and other products dedicated for human consumption, but also in animal feed and fodder with improved characteristics.
Examples of products in which mogrol glycoside(s) may be used as a sweetening compound include, but are not limited to, alcoholic beverages such as vodka, wine, beer, liquor, and sake, etc.; natural juices; refreshing drinks; carbonated soft drinks; diet drinks; zero calorie drinks; reduced calorie drinks and foods; yogurt drinks; instant juices; instant coffee; powdered types of instant beverages; canned products; syrups; fermented soybean paste; soy sauce; vinegar; dressings; mayonnaise; ketchups; curry; soup; instant bouillon; powdered soy sauce; powdered vinegar; types of biscuits; rice biscuit; crackers; bread; chocolates; caramel; candy; chewing gum; jelly; pudding; preserved fruits and vegetables; fresh cream; jam; marmalade; flower paste; powdered milk; ice cream; sorbet; vegetables and fruits packed in bottles; canned and boiled beans; meat and foods boiled in sweetened sauce; agricultural vegetable food products; seafood; ham; sausage; fish ham; fish sausage; fish paste; deep fried fish products; dried seafood products; frozen food products; preserved seaweed; preserved meat; tobacco; medicinal products; and many others.
During the manufacturing of products such as foodstuffs, drinks, pharmaceuticals, cosmetics, table top products, and chewing gum, the conventional methods such as mixing, kneading, dissolution, pickling, permeation, percolation, sprinkling, atomizing, infusing and other methods may be used.
As used in this specification and the appended claims, the singular forms "a", "an" and "the" include plural referents unless the content clearly dictates otherwise. For example, reference to "a cell" includes a combination of two or more cells, and the like.
As used herein, the term "about" in reference to a number is generally taken to include numbers that fall within a range of 10% in either direction (greater than or less than) of the number.
EXAMPLES
The biosynthesis of mogrosides in fruit involves a number of consecutive glycosylations of the aglycone mogrol to the final sweet products, including mogroside V (Mog.V). Mog.V has a sweetening capacity that is about 250 times that of sucrose (Kasai et al., Agric Biol Chem (1989)). Mogrosides are reported to have health benefits as well (Li et al., Chin J Nat Med (2014)).
A variety of factors are promoting a surge in interest in mogrosides and monkfruit in general, including an explosion in demand for natural sweeteners, difficulties in scalable sourcing of the current lead natural sweetener, rebaudioside M (RebM) from the Stevia plant, the superior taste performance of Mog.V relative to other natural and artificial sweetener products on the market, and the medicinal potential of the plant and fruit.
Purified Mog.V has been approved as a high-intensity sweetening agent in Japan (Jakinovich et al., Journal of Natural Products (1990)) and the extract has gained GRAS status in the USA as a non-nutritive sweetener and flavor enhancer (GRAS 522). Extraction of mogrosides from the fruit can yield a product of varying degrees of purity, often accompanied by undesirable aftertaste. In addition, yields of mogroside from cultivated fruit are limited due to low plant yields and particular cultivation requirements of the plant. Mogrosides are present at ~1% in the fresh fruit and ~4% in the dried fruit. Mog.V is the main component, with a content of 0.5%-1.4% in the dried fruit. Moreover, purification difficulties limit purity for Mog.V, with commercial products from plant extracts being standardized to 50% Mog.V. A pure Mog.V product is desirable to avoid off flavors, and will be easier to formulate into products, since Mog.V has good solubility potential. It is therefore advantageous to produce sweet mogroside compounds, such as but not limited to Mog.V, via biotechnological processes.
FIG. 1 shows the chemical structures of Mog.V, Mog.VI, Isomog.V, and Siamenoside. Mog.V has five glucosylations with respect to the mogrol core, including glucosylations at the C3 and C24 hydroxyl groups, followed by 1-2, 1-4, and 1-6 glucosyl additions. These glucosylation reactions are catalyzed by uridine diphosphate-dependent glycosyltransferase enzymes (UGTs).
FIG. 2 shows routes to Mog.V production in vivo. The enzymatic transformation required for each step is indicated, along with the type of enzyme required. Numbers in parentheses correspond to the chemical structures in FIG. 3, namely: (1) farnesyl pyrophosphate; (2) squalene; (3) 2,3-oxidosqualene; (4) 2,3;22,23-dioxidosqualene; (5) 24,25-epoxycucurbitadienol; (6) 24,25-dihydroxycucurbitadienol; (7) mogrol; (8) mogroside V; (9) cucurbitadienol.
Mogrosides can be produced by biosynthetic fermentation processes, as illustrated in FIG. 2, using microbial strains that produce high levels of methylerythritol 4-phosphate (MEP) pathway products, along with heterologous expression of mogrol biosynthesis enzymes and UGT enzymes that direct glucosylation reactions to Mog.V, or other desired mogroside compound. For example, in bacteria such as E. coli, isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP) can be produced from glucose, and are converted to farnesyl diphosphate (FPP) (1) by recombinant farnesyl diphosphate synthase (FPPS). FPP is converted to squalene (2) by a condensation reaction catalyzed by squalene synthase (SQS). Squalene is converted to 2,3-oxidosqualene (3) by an epoxidation reaction catalyzed by a squalene epoxidase (SQE). The pathway can proceed to 2,3;22,23-dioxidosqualene (4) by further epoxidation followed by cyclization to 24,25-epoxycucurbitadienol (5) by a triterpene cyclase, and then hydration of the remaining epoxy group to 24,25-dihydroxycucurbitadienol (6) by an epoxide hydrolase. A further hydroxylation catalyzed by a P450 oxidase produces mogrol (7).
The pathway can alternatively proceed by cyclization of (3) to produce cucurbitadienol (9), followed by epoxidation to (5), or multiple hydroxylations of cucurbitadienol to 24,25-dihydroxycucurbitadienol (6), or to mogrol (7).
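The two routes from farnesyl diphosphate to mogrol described above can be written down as ordered lists of steps, which makes it easy to check that each step's product is the next step's substrate. This is a minimal sketch: compound numbers follow the FIG. 3 numbering given in the text, while the `Step` type, the route names, and the helper functions are illustrative. Where the text does not name an enzyme for a step, only the reaction type is recorded.

```python
from typing import NamedTuple

# Compound numbering as given for FIG. 3 in the text.
COMPOUNDS = {
    1: "farnesyl pyrophosphate",
    2: "squalene",
    3: "2,3-oxidosqualene",
    4: "2,3;22,23-dioxidosqualene",
    5: "24,25-epoxycucurbitadienol",
    6: "24,25-dihydroxycucurbitadienol",
    7: "mogrol",
    9: "cucurbitadienol",
}


class Step(NamedTuple):
    substrate: int
    product: int
    activity: str


MAIN_ROUTE = [
    Step(1, 2, "condensation (squalene synthase, SQS)"),
    Step(2, 3, "epoxidation (squalene epoxidase, SQE)"),
    Step(3, 4, "further epoxidation"),
    Step(4, 5, "cyclization (triterpene cyclase)"),
    Step(5, 6, "hydration (epoxide hydrolase)"),
    Step(6, 7, "hydroxylation (P450 oxidase)"),
]

# Alternative: cyclize (3) first, then epoxidize cucurbitadienol (9) to (5).
ALTERNATIVE_ROUTE = [
    Step(3, 9, "cyclization"),
    Step(9, 5, "epoxidation"),
    Step(5, 6, "hydration (epoxide hydrolase)"),
    Step(6, 7, "hydroxylation (P450 oxidase)"),
]


def is_contiguous(route):
    """True if every step starts from the product of the previous step."""
    return all(a.product == b.substrate for a, b in zip(route, route[1:]))


def describe(route):
    """Human-readable listing of a route."""
    return [
        f"{COMPOUNDS[s.substrate]} -> {COMPOUNDS[s.product]}: {s.activity}"
        for s in route
    ]


for line in describe(MAIN_ROUTE):
    print(line)
```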
FIG. 4 illustrates glucosylation routes to Mog.V. Glucosylation of the C3 hydroxyl produces Mog.I-E, or glucosylation of the C24 hydroxyl produces Mog.I-A1. Glucosylation of Mog.I-A1 at C3 or glucosylation of Mog.I-E1 at C24 produces Mog.II-E. Further 1-6 glucosylation of Mog.II-E at C3 produces Mog.III-A2. Further 1-glucosylation at C24 of Mog.II-E produces Mog.III. 1-2 glucosylation of Mog.III-A2 at C24 produces Mog.IV, and then to Mog.V with a further 1-6 glucosylation at C24. Alternatively, glucosylations may proceed through Mog.III, with a 1-6 glucosylation at C3 and a 1-2 glucosylation at C24, or through Siamenoside or Mog.IV with 1-glucosylations.
While biosynthetic enzymes from monkfruit (Siraitia grosvenorii) have been identified for production of mogrol (See, WO 2016/038617 and US 2015/0322473, which are hereby incorporated by reference in their entireties), many of these enzymes lack the productivity or physical properties desired for overexpression in microbial hosts, particularly for fermentation approaches that operate at higher temperatures than the natural climate of the plant. Accordingly, alternative or engineered enzymes are desired to improve production of mogrol using microbial fermentation, with mogrol acting as the substrate for glucosylation to produce Mog.V or other target mogroside.
Using an E. coli strain that produces high levels of the MEP pathway products IPP and DMAPP (see US 2018/0245103 and US 2018/0216137, which are hereby incorporated by reference), and with overexpression of ScFPPS, enzymes were screened for their ability to convert FPP to squalene (SQS activity), as well as epoxidation of squalene to produce 2,3-oxidosqualene (SQE activity). The 2,3-oxidosqualene intermediate can be cyclized by a triterpene cyclase, such as CDS from Siraitia grosvenorii. As demonstrated in FIG. 5, several enzymes were identified with good activity in E. coli. In particular, SEQ ID NO: 11 showed high activity in E. coli at 37° C culture conditions.
As shown in FIG. 6, co-expression of SQS (SEQ ID NO: 11) and SQE (SEQ ID NO: 39) in E. coli provided a substantial gain in titer of the 2,3-oxidosqualene intermediate. Other SQE enzymes were active in E. coli.
FIG. 7 shows coexpression of SQS, SQE, and TTC enzymes. CDS (or triterpene cyclase, or "TTC") (SEQ ID NO: 40), when coexpressed with SQS (SEQ ID NO: 11) and SQE (SEQ ID NO: 39), resulted in high production of the triterpenoid product, cucurbitadienol (Product 3). These fermentation experiments were performed at 37° C for 48 to 120 hours.
FIG. 8 shows results for SQE engineering to produce high titers of 2,3;22,23-dioxidosqualene. Expression of SQS, SQE, and TTC, whether on a bacterial artificial chromosome (BAC) or integrated, produces large amounts of cucurbitadienol. Point mutations in SQE (SEQ ID NO: 39) were screened to complement SQE (SEQ ID NO: 39) to reduce levels of cucurbitadienol, with corresponding gain in titers of 2,3;22,23-dioxidosqualene. Two SQE mutants are shown in FIG. 8, SQE A4 and SQE CH. By complementing SQE (SEQ ID NO: 39) with a second engineered version with higher specificity/activity for 2,3-oxidosqualene, titers can be pushed toward 2,3;22,23-dioxidosqualene, as opposed to cucurbitadienol. This concept is demonstrated further in FIG. 9. SQE A4 (SEQ ID NO: 203) was co-expressed with SQE (SEQ ID NO: 39), SQS (SEQ ID NO: 11), and TTC (SEQ ID NO: 40). These fermentation experiments were performed at 37° C for 48 hours in 96 well plates. Titers were plotted for each strain producing 2,3;22,23-dioxidosqualene. As shown in FIG. 9, the strain expressing SQE A4 (SEQ ID NO: 203) produced much more 2,3;22,23-dioxidosqualene.
FIG. 10 shows the coexpression of SQS, SQE, and TTC enzymes. TTC (SEQ ID NO: 40), when coexpressed with SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), and SQE A4 (SEQ ID NO: 203) in E. coli, resulted in production of cucurbitadienol and 24,25-epoxycucurbitadienol. Candidate enzymes for an additional or alternative TTC include SEQ ID NO: 40, SEQ ID NO: 191, SEQ ID NO: 192, and SEQ ID NO: 193. Each candidate TTC enzyme was expressed in this strain and screened for production of 24,25-epoxycucurbitadienol. These fermentation experiments were performed at 30° C for 72 hours in 96 well plates. 24,25-epoxycucurbitadienol production was verified by GC-MS spectrum analysis. Concentrations were plotted relative to production of 24,25-epoxycucurbitadienol from an E. coli strain expressing SEQ ID NO: 40 as the only cyclase. As shown in FIG. 10, E. coli strains coexpressing SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), SQE A4 (SEQ ID NO: 203), and TTC (SEQ ID NO: 40), with an additional TTC, produced higher levels of 24,25-epoxycucurbitadienol.
FIG. 11 shows substrate specificity for production of cucurbitadienol and 24,25-epoxycucurbitadienol with candidate TTC enzymes. Engineered E. coli strains producing oxidosqualene and dioxidosqualene were complemented with CDS homologs and CAS genes engineered for cucurbitadienol production. Strains were incubated at both 30°C for 72 hours before extraction. The ratio of 24,25-epoxycucurbitadienol to cucurbitadienol varies from 0.15 for Enzyme 1 (SEQ ID NO: 40) to 0.58 for Enzyme 2 (SEQ ID NO: 192), pointing to improved substrate specificity toward the desired 24,25-epoxycucurbitadienol product for Enzyme 2.
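To put the two ratios on a more intuitive scale, a product ratio r can be converted to the share of the epoxy product in the combined pool of the two cyclized products, r / (1 + r). The sketch below assumes that only these two products are counted, which is a simplification introduced here and not a statement from the specification; the function name is illustrative.

```python
def epoxy_fraction(ratio: float) -> float:
    """Share of 24,25-epoxycucurbitadienol when only the two cyclized
    products are counted (assumption made for this illustration)."""
    return ratio / (1.0 + ratio)


enzyme_1 = epoxy_fraction(0.15)  # ratio reported for SEQ ID NO: 40
enzyme_2 = epoxy_fraction(0.58)  # ratio reported for SEQ ID NO: 192

print(f"Enzyme 1: {enzyme_1:.1%} epoxy product")  # Enzyme 1: 13.0% epoxy product
print(f"Enzyme 2: {enzyme_2:.1%} epoxy product")  # Enzyme 2: 36.7% epoxy product
print(f"Ratio improvement: {0.58 / 0.15:.2f}x")   # Ratio improvement: 3.87x
```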
FIG. 12 shows the screening of EPH enzymes for hydration of epoxycucurbitadienol to produce 24,25-dihydroxycucurbitadienol in E. coli strains coexpressing SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), SQE A4 (SEQ ID NO: 203), and TTC (SEQ ID NO: 40). EPH homologs were expressed in a strain producing 24,25-epoxycucurbitadienol for production of 24,25-dihydroxycucurbitadienol. Candidate EPH enzymes for this reaction include SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186, SEQ ID NO: 212, SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 189, and SEQ ID NO: 190. These fermentation experiments were performed at 30° C for hours in 96 well plates. 24,25-dihydroxycucurbitadienol production was verified by GC-MS spectrum analysis. Titers were plotted for each strain producing 24,25-dihydroxycucurbitadienol. As shown in FIG. 12, the E. coli strains expressing the EPHs were able to produce 24,25-dihydroxycucurbitadienol. T0EPH and SgEPH3 in particular demonstrated high activity in E. coli.
FIG. 13A-C shows the coexpression of SQS, SQE, TTC, EPH, and P450 enzymes to produce mogrol. E. coli strains were constructed that express SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), SQE A4 (SEQ ID NO: 203), TTC (SEQ ID NO: 40), EPH (SEQ ID NO: 58), and a P450 selected from SEQ ID NO: 194, SEQ ID NO: 197, and SEQ ID NO: 171, together with a cytochrome P450 reductase (SEQ ID NO: 98 or SEQ ID NO: 201). These fermentation experiments were performed at 30° C for 72 hours in 96 well plates. Mogrol production was verified by LC-QQQ spectrum analysis. As shown in FIG. 13A, the expression of SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), SQE A4 (SEQ ID NO: 203), TTC (SEQ ID NO: 40), EPH (SEQ ID NO: 58), and the P450s SEQ ID NO: 194, SEQ ID NO: 197, and SEQ ID NO: 171 resulted in production of mogrol and oxo-mogrol. As shown in FIG. 13B and FIG. 13C, mogrol production was validated by LC-QQQ mass spectrum analysis using spiked authentic standard (FIG. 13B) and GC-FID chromatography versus an authentic standard (FIG. 13C), respectively.
FIG. 14 shows the screening of cytochrome P450s for oxidation at C11 of the 24,25-dihydroxycucurbitadienol-like molecule cucurbitadienol. In many cases, the native transmembrane domain was replaced with the transmembrane domain from E. coli sohB (SEQ ID NO: 195, SEQ ID NO: 198, and SEQ ID NO: 199), E. coli zipA (SEQ ID NO: 196), or bovine 17α (e.g. SEQ ID NO: 200) to improve interaction with the E. coli membrane. Each P450 was coexpressed with either SEQ ID NO: 201 or SEQ ID NO: 98, resulting in production of 11-hydroxycucurbitadienol. These fermentation experiments were performed at 30° C for 72 hours in 96 well plates. 11-hydroxycucurbitadienol production was verified by GC-MS. Concentrations were plotted for strains producing 11-hydroxycucurbitadienol. As shown in FIGS. 14 and 15, the strains disclosed herein were capable of production of 11-hydroxycucurbitadienol.
Mogrol was used as a substrate for in vitro glucosylation reactions with candidate UGT enzymes, to identify candidate enzymes that provide efficient glucosylation of mogrol to Mog.V. Reactions were carried out in 50 mM Tris-HCl buffer (pH 7.0) containing beta-mercaptoethanol (5 mM), magnesium chloride (400 uM), substrate (2uM), UDP-glucose (5 mM), and a phosphatase (1 U). Results are shown in FIG. 16A. Mog.V product is observed when the UGT enzymes of SEQ ID NO: 165, SEQ ID NO: 146, and SEQ ID NO: 117 are incubated together. A penta-glycosylated product is formed when the UGT enzymes of SEQ ID NO: 165, SEQ ID NO: 146, and SEQ ID NO: 164 are incubated together. FIG. 16B, Extracted ion chromatogram (EIC) for 1285.4 Da (mogroside V+H) of reactions containing enzymes of SEQ ID NO: 165 + SEQ ID NO: 146 and either SEQ ID NO: 117 (solid dark grey line) or SEQ ID NO: 164 (light grey line) when incubated with Mog.II-E. FIG. 16C, Extracted ion chromatogram (EIC) for 1285.4 Da (mogroside V+H) of reactions containing enzymes of SEQ ID NO: 165 + SEQ ID NO: 146 and either SEQ ID NO: 117 (solid dark grey line) or SEQ ID NO: 1(light grey line) when incubated with mogrol.
FIG. 4 and FIG. 17 show additional glycosyltransferase activities observed on particular substrates. Coexpression of UGT enzymes can be selected to move product to the desired mogroside product.
FIG. 18 shows the bioconversion of mogrol into mogroside intermediates. Engineered E. coli strains (see US 2020/0087692, which is hereby incorporated by reference in its entirety) expressing UGT enzymes were incubated in 96-well plates with 0.2 mM mogrol. Product formation was examined after 48 hours. Reported values are those in excess of the empty vector control. Products were measured on LC-MS/MS with authentic standards. Only Enzyme 1 shows formation of Mog.IIE. Enzymes 1 to 5 are SEQ ID NOS: 202, 116, 216, 217, and 218, respectively.
FIG. 19A and FIG. 19B show the bioconversion of Mog.IA (FIG. 19A) or Mog.IE (FIG. 19B) into Mog.IIE. In the experiment, engineered E. coli strains (as above) expressing UGT enzymes SEQ ID NO: 165, SEQ ID NO: 202, or SEQ ID NO: 116 were incubated in fermentation media containing 0.2 mM Mog.IA (FIG. 19A) or Mog.IE (FIG. 19B) in 96-well plates at 37° C. Product formation was examined after 48 hours.
Products were measured on LC-MS/MS with authentic standards. The values of Mog.IIE levels in excess of the empty vector control were calculated. As shown in FIG. 19A, SEQ ID NO: 165 and SEQ ID NO: 202 were able to catalyze bioconversion of Mog.IA into Mog.IIE. Similarly, as shown in FIG. 19B, SEQ ID NO: 165, SEQ ID NO: 202, and SEQ ID NO: 116 were able to catalyze the bioconversion of Mog.IE into Mog.IIE.
FIG. 20 shows the production of Mog.III or siamenoside from Mog.II-E. In the experiment, engineered E. coli strains expressing UGT enzymes SEQ ID NO: 204, SEQ ID NO: 138 or SEQ ID NO: 206 were grown in fermentation media containing 0.1 mM Mog.II-E at 37 °C for 48 hr. Products were quantified by LC-MS/MS with authentic standards of each compound. As shown in FIG. 20, all strains were able to catalyze bioconversion of Mog.IIE to Mog.III. In addition, MbUGT1,2.2 also showed production of substantial amounts of siamenoside.
FIG. 21 shows the production of Mog.II-A2. 0.1 mM Mog.I-E was fed in vitro. In the experiment, engineered E. coli strains expressing UGT enzyme SEQ ID NO: 205 were incubated at 37 °C for 48 hr. Products were quantified by LC-MS/MS with authentic standards of each compound. As shown in FIG. 21, SEQ ID NO: 205 is able to catalyze bioconversion of Mog.IE to Mog.II-A2.
A summary of observed primary glycosylation reactions at C3 and C24 hydroxyls of mogrol is provided in Table 1. Specifically, 0.2 mM mogrol was fed to cells expressing various UGT enzymes. Reactions were incubated at 37 °C for 48 hrs. Products were quantified by LC-MS/MS with authentic standards of each compound.
Table 1

UGT               C3 O-Glucosylation    C24 O-Glucosylation
SEQ ID NO: 165    Yes                   Yes
SEQ ID NO: 146    No                    Yes
SEQ ID NO: 214    No                    Yes
SEQ ID NO: 202    Yes                   Yes
SEQ ID NO: 129    Yes                   No
SEQ ID NO: 116    Yes                   Yes
SEQ ID NO: 218    No                    Yes
SEQ ID NO: 216    No                    Yes
SEQ ID NO: 217    No                    Yes

A summary of branched glycosylation reactions is provided in Table 2. 0.2 mM Mog.IIE or Mog.IE was fed to cells expressing various UGT enzymes. Reactions were incubated at 37 °C for 48 hr. Products were quantified by LC-MS/MS with authentic standards of each compound. "Indirect" evidence means that consumption of substrate was observed.
Table 2

Name              C3 1-2            C3 1-6            C24 1-2           C24 1-6
SEQ ID NO: 205    No                Yes               No                Yes
SEQ ID NO: 204    No                Yes               No                No
SEQ ID NO: 122    No                Yes               Yes               Yes
SEQ ID NO: 211    No                No                Yes               No
SEQ ID NO: 138    No                Yes               No                Yes
SEQ ID NO: 207    No                Yes               No                Yes
SEQ ID NO: 209    No                Yes               No                Yes
SEQ ID NO: 208    Yes (Indirect)    Yes               Yes (Indirect)    Yes (Indirect)
SEQ ID NO: 206    Yes (Indirect)    Yes (Indirect)    Yes               Yes
SEQ ID NO: 164    No                Yes               Yes               Yes
SEQ ID NO: 210    No                Yes               No                Yes
SEQ ID NO: 215    No                No                No                Yes
SEQ ID NO: 213    No                No                No                Yes

An exemplary E. coli strain producing Mog.V was created by expressing the following enzymes in an E. coli strain engineered to produce high levels of MEP pathway products: SQS (SEQ ID NO: 11), SQE (SEQ ID NO: 39), SQE A4 (SEQ ID NO: 203), TTC (SEQ ID NO: 40), EPH (SEQ ID NO: 189), sohB_CppCYP (SEQ ID NO: 199), AtUGT73C3 (SEQ ID NO: 202), UGT85C1 (SEQ ID NO: 165), and UGT94-289-(SEQ ID NO: 122). Production of Mog.V is demonstrated in FIG. 22A and FIG. 22B. Strains were incubated at 30 °C for 72 hours before extraction. Mog.V production was verified by LC-QQQ spectrum analysis versus an authentic standard (FIG. 22A). FIG. 22B shows a chromatogram indicating Mog.V production from a biological sample with a spiked Mog.V authentic standard.
Biosynthesis enzymes can be further engineered for expression and activity in microbial cells, using known structures and primary sequences.
FIG. 26 is an amino acid alignment of CaUGT_1,6 and SgUGT94_289_3 using Clustal Omega (Version CLUSTAL O (1.2.4)). These sequences share 54% amino acid identity. Coffea arabica UGT_1,6 is predicted to be a beta-D-glucosyl crocetin beta 1,6-glucosyltransferase-like (XP_027096357.1). Together with known UGT structures and primary sequences, CaUGT_1,6 can be further engineered for microbial expression and activity, including engineering of a circular permutant.
FIG. 27 is an amino acid alignment of Homo sapiens squalene synthase (HsSQS) (NCBI accession NP_004453.3) and AaSQS (SEQ ID NO: 11) using Clustal Omega (Version CLUSTAL O (1.2.4)). HsSQS has a published crystal structure (PDB entry: 1EZF). These sequences share 42% amino acid identity.
FIG. 28 is an amino acid alignment of Homo sapiens squalene epoxidase (HsSQE) (NCBI accession XP_011515548) and MlSQE (SEQ ID NO: 39) using Clustal Omega (Version CLUSTAL O (1.2.4)). HsSQE has a published crystal structure (PDB entry: 6C6N). These sequences share 35% amino acid identity.
The UGT enzyme of SEQ ID NO: 164 was engineered for improved glycosylation activity. Various amino acid substitutions were made to the enzyme, as informed by in silico analysis. The following amino acid substitutions in Table 3 were tested for further glycosylation of Mog.IIE.
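The percent identity figures quoted for these alignments can be illustrated with a short calculation over two already-aligned sequences. This is a sketch under stated assumptions: identity is taken as identical positions divided by the positions where both sequences have a residue, which is one of several conventions in use and is not necessarily the one applied to produce the values above. The toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Columns containing a gap in either row are skipped (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, used only to exercise the function.
identity = percent_identity("MKV-LA", "MRVGLA")
print(f"{identity:.1f}% identity")
```

In the toy example, five columns are compared and four match, so the function reports 80.0% identity.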
Table 3

Substitution    Fold Improvement in UDP-Glucose Transferred
G150F           13.2
T147L           13.0
N207K           10.9
K270E           10.0
V281L           9.1
L354V           8.6
L13F            7.5
T32A            5.6
K101A           N.j
C219E           4.9
V281Q           4.6
S43T            4.6
M394V           4.6
E74G            4.5
K270P           4.1
T256V           3.9
V175K           3.9
N283G           3.4
D285P           3.3
A377V           3.2
F217L           3.1
K204R           3.1
T303A           3.0
D95K            2.9
S14N            2.7
K270T           2.7
V281A           2.5
A166 del.       2.2
G205S           2.1
N333S           2.0
K270M           2.0
F132L           2.0
L40F            1.9
A166K           1.9
V281K           1.8
R185S           1.7
F8L             1.7
F258Y           1.7
N35G            1.7
N133G           1.7
A77P            1.6
N207Y           1.6
K386D           1.6
Y163F           1.5
N399R           1.5
H18Y            1.5
A166S           1.3
K101E           1.3
Q418K           1.3
I191V           1.3
R182S           1.2
K101Q           1.2
S142F           1.2
T46N            1.2
T159E           1.2
T55P            1.2
K160D           1.2
T7K             1.2
A166T           1.1

An engineered UGT enzyme based on SEQ ID NO: 164 was prepared having substitutions T147L and N207K. The bioconversion of Mog.IIE to further glycosylated products is shown in FIG. 23. In the experiment, engineered E. coli strains expressing the engineered CaUGT_1,6 were inoculated with Mog.IIE substrate at 37 °C. Product formation was examined after 48 hours. Products were measured on LC/MS-QQQ with authentic standards.
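The substitution notation used above (wild-type residue, 1-based position, replacement residue, as in T147L) can be applied to a parent sequence programmatically. The sketch below is illustrative only: the helper name and the four-residue toy sequence are hypothetical, positions are assumed to be numbered against the parent sequence, and deletions such as "A166 del." are deliberately not handled.

```python
import re

# Matches point substitutions such as "T147L": wild type, position, new residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Return a copy of `sequence` carrying the listed point substitutions.

    Raises ValueError if a label is malformed, out of range, or if the
    residue at the stated position is not the stated wild-type residue.
    """
    residues = list(sequence)
    for label in substitutions:
        match = _SUBSTITUTION.match(label)
        if match is None:
            raise ValueError(f"unsupported substitution label: {label!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: found {residues[position - 1]} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy parent sequence (hypothetical; not one of the sequences listed herein).
variant = apply_substitutions("MATK", ["A2G", "K4R"])
print(variant)
```

Checking the wild-type residue before substituting guards against numbering offsets, for example when a leader sequence or tag shifts positions relative to the reference sequence.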
The UGT enzyme of SEQ ID NO: 165 was engineered for improved glycosylation activity. The following amino acid substitutions were identified as improving bioconversion of Mog.IA to Mog.IIE (Table 4):

Table 4

Substitution    Fold Improvement in Mog.IA to Mog.IIE Bioconversion
CTL             1
L41F            1.29
D49E            1.36

An engineered UGT enzyme based on 85C1 was prepared, having substitutions L41F, D49E, and C127F. The bioconversion of Mog.IA to Mog.IIE is shown in FIG. 24. In the experiment, engineered E. coli strains expressing the engineered 85C1 were inoculated with Mog.IA substrate at 37 °C. Product formation was examined after 5 hours. Products were measured on LC/MS-QQQ with authentic standards. FIG. 24 shows the fold improvement of the engineered version compared to the control (85C1).
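Fold improvement values of the kind tabulated above are ratios of a variant's product level to that of the control enzyme measured under the same conditions. A minimal sketch follows; the raw levels are hypothetical numbers chosen only so that the ratios match the tabulated values for L41F and D49E, and they are not measurements from the source.

```python
def fold_improvement(variant_level, control_level):
    """Ratio of a variant's product level to the control (CTL) level."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return variant_level / control_level


# Hypothetical product levels in arbitrary units (not data from the source).
control = 2.0
variants = {"L41F": 2.58, "D49E": 2.72}

folds = {"CTL": 1.0}
folds.update({name: fold_improvement(level, control)
              for name, level in variants.items()})

for name, value in folds.items():
    print(f"{name}: {value:.2f}")
```

By construction the control has a fold improvement of 1, which corresponds to the CTL row of the tables.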
The UGT enzyme of SEQ ID NO: 217 (UGT73F24) was engineered for improved glycosylation activity. The following amino acid substitutions were identified as improving bioconversion of Mog.IE to Mog.IIE with UGT73F24 (Table 5):

Table 5

Substitution    Fold Improvement in Mog.IE to Mog.IIE Production
CTL             1
A74E            1.88
I91F            2.01
H101P           2.38
Q241E           1.31
I436L           1.09

An engineered UGT enzyme based on UGT73F24 was prepared having substitutions A74E, I91F, and H101P. The bioconversion of Mog.IE to Mog.IIE is shown in FIG. 25. In the experiment, engineered E. coli strains expressing the engineered UGT73F24 were inoculated with Mog.IE substrate at 37 °C. Product formation was examined after 48 hours. Products were measured on LC/MS-QQQ with authentic standards. FIG. 25 shows the fold improvement of the engineered version compared to the control (73F24).

SEQUENCES

Farnesyl Pyrophosphate Synthase (FPPS)

Saccharomyces cerevisiae FPPS (SEQ ID NO: 1)
MASEKEIRRERFLNVFPKLVEELNASLLAYGMPKEACDWYAHSLNYNTPGGKLNRGLSVVDTYAILSNKTVEQLGQEEYEKVAILGWCIELLQAYELVADDMMDKSiTRRGQPCWYKVPEVGEIAIND AFMLEAAIYKLLKSHFRNEKYYIDITELFHEVTFQTELGQLMDLITAPEDKVDLSKFSLKKHSFIVTFKTAYYSFYLPVALAMYVAGITDEKDLKQARDVLIPLGEYFQIQDDYLDCFGTPEQIGKIG TDIQDNKCSWVINKALELASAEQRKTLDENYGKKDSVAEAKCKKIFNDLKIEQLYHEYEESIAK DLKAKISQVDESRGFKADVLTAFLNKVYKRSK

Squalene Synthase (SQS)

Siraitia grosvenorii SQSa (SEQ ID NO: 2)
MGSLGAILRHPDDFYPLLKLKKAARHAEKQIPPEPHWGFCYTMLHKVSRSFALVIQQLAPELRN AICIFYLVLRALDTVEDDTSIQTDIKVPILKAFHCHIYNRDWHFSCGTKDYKVLMDQFHHVSTA FLELGKGYQEAIEDITKRMGAGMAKFlCKEVETVDDYDEYCHYVAGLVGLGLSKLFHASDLEDL APDSLSNSMGLLLQKTNIIRDYLEDINEIPKSRMFWPREIWGKYADKLEDFKYEENSVKAVQCL NDLVTNALNHVEDCLKYMSNLRDLSIFRFCAIPQIMAIGTLALCYNNVEVFRGWKMRRGLTAK VIDRTQTMADVYGAFFDFSVMLKAKVNSSDPNATKTLSRIEAIQKTCEQSGLLNKRKLYAVKSE PMFNPTLIVIL FS LLC11LAYL SAKRLPANQPV
Siraitia grosvenorii SQSb (SEQ ID NO: 3)
MGSLGAILRHPDDFYPLLKLKMAARHAEKQIPPEPHWGFCYTMLHKVSRSFALVIQQLAPELRN
AICIFYLVLRALDTVEDDTSIQTDIKVPILKAFHCHIYNRDWHFSCGTKDYKVLMDQFHHVSTA FLELGKGYOEAIEDITKRMGAGMAKFICKEVETVDDYDEYCHYVAGLVGLGLSKLFHASDLEDL APDSLSNSMGLLLQKTNIIRDYLEDINEIPKSRMFWPREIWGKYADKLEDFKYEENSVKAVQCL NDLVTNALNHVEDCLKYMSNLRDLSIFRTCAIPQIMAIGTLALCYNNVEVFRGVVKMRRGLTAK VIDRTQTMADVYGAFFDFSVMLKAKVNNSDPNATKTLSRIEAIQKTCEQSGLLNKRKLYAVKSE PMENPTLIVILFSLICIILAYLSAKRLPANQPV
Cucumis sativus (SEQ ID NO: 4)
MGSLGAILKHPDDFYPLLKLKIAARHAEKQIPPEPHWGFCYTMLHKVSRSFALVIQQLKPELRN AVCIFYLVLRALDTVEDDISIQTDIKVPILKAFHCHIYNRDWHFSCGTKDYKVLMDEFHHVSTA FLELGKGYQEAIEDITKBMGAGMAKFICKEVEYVDDYDEYCHYVAGLVGLGLSKLFHAAELEDL APDSLSNSMGLFLQKTMIIRDYLEDINEIPKSRMFWPREIWGKYADKLEDFKYEENSVKAVQCL NDLVTNALMHVEDCLKYMSNLRDLSIFRFCAIPQ1MAIGTLALCYNNVEVFRGVVKMRRGLTAK VIDRTKTMADVYGAFFDFSVMLKAKVNSNDPNASKTLSRIEAlQKTCKQSGILNRRKLYVVRSE PMFNPAVIVILFSLLCIILAYLSAKRLPANQSV
Cucumis melo (SEQ ID NO: 5)
MGSLGAILKHPDDFYPLLKLKMAARHAEKQIPPESHWGFCYTMLHKVSRSFALVIQQLKPELRN AVCIFYLVLRALDTVEDD'TSIQTDIKVPILKAFHCHIYNRDWHFSCGTKDYKVLMDEFHHVSTA FLELGKGYQEAIEDITKRMGAGMAKFICKEVETVDDYDEYCHYVAGLVGLGLSKLFHAAELEDL APDSLSNSMGLFLQKTNIIRDYLEDINEIPKSRMFWPREIWGKYADKLEDFKYEENSVKAVQCL NDLVTMALNHVEDCLKYMSNLRDLSIFRFCAIPQTMAIGTLALCYNMVEVFRGVrKMRRGLTAK VIDRTNTMADvYGAFFDFSvMLKAKVNSNDPNASKTLSRIEAiQQTCQQSGLMNKRKEYVvRSEPMYNPAVIVILFSLLCIILAYLSAKRLPANQSV
Cucumis melo (SEQ ID NO: 6)
MGSLGAILKHPDDFYPLLKLKMAARHAEKQIPPESHWGFCYTMLHKVSRSFALVIQQLKPELRN AVCIFYLVLRALDTVEDDTSIQTDIKVPILKAFHCHIYNRDWHFSCGTKDYKVLMDEFHHVSTA FLELGKGYQEAIEDITKRMGAGMAKFICKEVETVDDYDEYCHYVAGLVGLGLSKLFHAAELEDL APDSLSNSMGLFLQKTNIIRDYLEDINEIPKSRMFWPREIWGKYADKLEDFKYEENSVKAVQCL NDLVTNALNHVEDCPKYMSNLRDLSIFRFCAIPQIMAIGTLALCYNNVEVFRGWKMRRGLTAK VIDRTKTMADVYGAFFDFSVMLKAKVNSNDPNASKTLSRIEAIQQTCQQSGLMNKRKLYWRSE PMYN PAVIVIL F S LLC 11 LAY L S AKRL PANQ SV
Cucurbita moschata (SEQ ID NO: 7)
MGSLGAILRHPDDIYPLLKLKMAARHAEKQIPPESHWGFCYTMLHKVSRSFALVIQQLKPELRN AVCIFYLVLRALDTVEDDTSIQTDIKVPILKAFHCHIYNRDWHFSCGTKDYKVLMDEFHHVSTA FLELGRGYQEAIEDITKRMGAGMAKFICKEVETVEDYDEYCHYVAGLVGLGLSKLFHASKSENL
APDSLSNSMGLFLQKTNIIRDYLEDINEIPKSRMFWPREIWSKYADKLEDFKYEKNSVKAVQCL NDLVTNALTHVEDCLEYMSNLKDLSIFRFCAIPQIMAIGTLALCYNNVDVFRGVVKMRRGLTAK VIYRTKTMADVYGAFFDFSVMLKAKVNSSDPNASKTLTRIEATQKTCKQSGLLNKRELYAVRSE PMCNPAAIVVLFSLLCIILAYLSAKLLPANQPV
Sechium edule (SEQ ID NO: 8)
MGSLGAILSHPDDLYPLLKLKMAAKHAEKQIPPDPHWGFCFSMLHKVSRSFALVIQQLKPELRN AVC1FYLVLRALDTVEDDTGIHPDIKVPILQAFHCHTYNRDWHFSCGTKHYKVLMDEFHHVSTAFLELGKGYQEATEDVTERMGAGMAKFICKEVETVDDYDEYCHYVAGLVGLGLSKLFHAAELEDLAPDSLSNSMGLFLQKTMIIRDYLEDINEIPKSRMFWPREIWNKYADKLEDFKYEENSVKAVQCLNDLVTNALMHVEDCLKYMSNLKDLSTFRFCAIPQIMAIGTLALCYDNVEVFRGVVKMRRGLTAKIIDRTKKIADVYGAFFDF3VMLKAKVNSSDPNAAKTLSRIEAIEKTCKESGLLNKRKLYVIRSEPLFNPAVLVILFSLICILLAYLSAKRLPANQPV
Panax quinquefolius (SEQ ID NO: 9)
MGSLGAILKHPDDFYPLLKLKFAARHAEKQIPPEPHWAFCYSMLHKVSRSFGLVIQQLGPQLRD AVCI FYLVLRALDTVEDD'TSIPTEVKVPILMAFHRHIYDKDWHFSCGTKEYKVLMDEFHHVSNA FLELGSGYQEAIEDITMRMGAGMAKFICKEVETIDDYDEYCHYVAGLVGLGLSKLFHASGAEDL ATDSLSNSMGLFLQKTNIIRDYLEDINEIPKSRME’WPRQIWSKYVDKLEDLKYEENSAKAVQCL NDMVTDALVHAEDCLKYMSDLRDPAIFRFCAIPQIMAIGTLALCFNNirQVFRGVVKMRRGLTAK VIDRTKTMSDVYGAFFDFSCLLKSKVDNNDPNATKTLSRLEATQKTCKESGTLSKRKSYIIESE SGHNSALIAIIFIILAILYAYLSSNLLLNKQ
Malus domestica (SEQ ID NO: 10)
MGALSTMLKHPDDIYPLLKLKIASRQIEKQIPAEPHWAFCYTMLQKVSRSFALVIQQLGTELRN AVCLFYLVLRALDTVEDDTSvATDVKVPILLAFHRHIYDPDWHFACGTNNYKVLMDEFHHVSTAFLELGTGYQEAIEDITKRMGAGMAKFILKEVETIDDYDEYCHYVAGLVGLGLSKLFHAAGKEDLASDSLSNSMGLFLQKTNIIRDYLEDINEIPKSRMFWPRQIWSKYVNKLEDLKYEENSEKAVQCLNDMVTNALIHMEDCLKYMAALRDPAIFKFCAIPQIMAIGTLALCYNNIEVFRGWKMRRGLTAK VIDRTKSMDDVYGAFFDFSSILKSKVDKNDPNATKTLSRVEAVQKLCRDSGALSKRKSYIANRE QSYNSTLIVALFIILAIIYAYLSASPRI
[...]
MGSLGAILKHPDDFYPLLKLKFAARHAERQIPPEPHWAFCiSMEHKVSRSr'GLVlQQEDAQLRD AVCIFYLVLRALDTVEDDTSIPTEVKVPILMAFHRHIYDKDWHFSCGTKEYKVLMDEFHHVSNA FLELGSGFQEAIEDITMRMGAGMAKFICKEVETIDDYDEYCHYVAGLVGLGLSKLFHASGAEDL ATDSLSNSMGLFLQKTNIIRDYLEDIMEIPKSRMFWPRQIWSKYVDKLENLKYEENSAKAVQCL NDMVTN2LLHAEDCLKYMSNLRDPAIFRFCAIPQTMAIGTLALCFNNIQVFRGWKMRRGLTAK V1DRTKTMSDVYGAFFDFSCLLKSKVDNNDPMATKTLSRLEATQKTCKESGTLSKRKSYIIESK SAHNSALIAIIFIILAILYAYLSSNLENNQ
Flavobacteriales bacterium (SEQ ID NO: 166)
MLNNSLFSRLEEIPALLKLKLGSKDYYKNNNSETLTCDNLRYCFDTLNKVSRSFATVIKQLPNE LGNNVCVFYLILRALDSIEDDMNLPKELKIKLLREFHKKNYESGemiSGVGDKKEHVELLENYD KVIQSFLAIDQKNQLIITDICRKVGAGMANFVKAEIESVEDYNLYCHHVAGLVGIGLSRMFISS GLENDQFLNQDEISNSMGLFLQKTNIVRDYREDLDEGRMFWPKDIWHVYGSKINDFAINPTHDQ SVLCLNHIILNNALTHATDCLAYLKHLRNENIFKFCAIPQVMAMATLCKIYSNPDVFIKNVKIRK GLAAKLILNTTSMDEVIKVYKDMLLVIESKISSDNNPVSAETIQLLKQIREYFNDETLIVRKIA
Bacteroidetes bacterium (SEQ ID NO: 167)
MLNSSLFSRLEEIPALLKLKLGSINNYKNNNSENLTSKNLRYCFDTLNKVSRSFASVIKQLPNE LMVNvCLFYLILRALDSIEDDMNLPKDFKINLLREFLDKNYEPGWKISGVGDKKEYVELLENYD KVIQVFLDIDPKNQLIITD1CRKMGAGMAHFVEAEINSVKDYNLYCYHVAGLVGIGLSKMFLAS GLENCDYLNQEEISSSMGLFLQKTNIVRDYKEDMEENRIFWPKEIWRTYASKFSDFSIMPQHET SISCLNHMVNDALGHVIDCLEYLRHLRNENIFKFCAIPQVMAMATLCKVYNNPDVFIKTVKIRK GLAAKLILNTTSMDEVIKVYKGLLLDIENKIPLHNPTSDETLRLIKNIRSYCNNETMVVSKTA

Squalene Epoxidase

Siraitia grosvenorii SQE1 (SEQ ID NO: 17)
MVDQCALGWILASALGLVIALCFFVAPRRNHRGVDSKERDECVQSAATTKGECRFNDRDVDVIV VGAGVAGSALAHTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLIELGLQDCVEEIDAQRV YGYALFKDGKNTRLSYPLENFHSDV3GRSFHNGRFIQRMREKAASLPNVRLEQGTVTSLLEEKG T1KGVQYKSKNGEEKTAYAPLTIVCDGCFSNLRRSLCNPMVDVPSYFVGLVLENCELPFANHGH VILGDPSPILFYQISRTEIRCLVDVPGQKVPSIANGEMEKYLKTVVAPQVPPQIYDSFIAAIDK GNIRTMPNRSMPAAPHPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRNLLKPLKDLSDAST LCKYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGIFSNGPVSLLS
GLNPRPLSLVLHFFAVAIYGVGRLLLPFPSVKGIWIGARLIYSASGIIFPIIRAEGVRQMFFPA TVTPAYYRSPP־VFEvPIV
Siraitia grosvenorii SQE2 (SEQ ID NO: 18)
MVDQCALGWILASVLGAAALYFLFGRKNGGVSNERRHESIKNIATTNGEYKSSNSDGDT1IVGA GVAGSALAYTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLTELGLEDCVDDIDAQRVYGY ALEKDGKDTRLSYPLEKFHSDVAGRSFHNGRFIQRMREKAASLPKVSLEQGTVTSLLEENGIIK GVQYKTKTGQEMTAYAPLTIVCDGCFSNLRRSLCNPKVDVPSCFVGLVLENCDLPYANHGHVIL ADPSPILFYRISSTEIRCLVDVPGQKVPSISNGEMANYLKNVVAPQIPSQLYDSFVAAIDKGNI RTMPNRSMPADPYPTPGALLMGDAFNMRHPLTGGGMTVALSDVWLRDLLKPLRDLNDAPTLSK YLEAFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGIFSNGPVSLLSGLM PRPISLVLHFFAVAIYGVGRLLIPFPSPKRVWIGARIISGASAIIFPIIKAEGVRQMFFPATVA AYYRAPRVVKGR
Momordica charantia (SEQ ID NO: 19)
MVDECALGWILAAALGAVIALCLEVAPKTNNQDGGVDSKATPECVATTNGECRSDGDSDVIIVG AGVAGSALAHTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLIELGLADCVEEIDAQRVYG YALFKDGKNTRLSYFLEKFHSDVSGRSFHNGRFIQRMREKADSLPNVRLEQGTVTSLLEEKGTIKGVQYKSKDGKEKTAYAPLTIVCDGCFSNLRRSLCNPMVDVPSCFVGLVLENCQLPFANHGHW LGDPSPILFYPISSTEIRCLVDVPGQKVPSISNGEMEKYLKTVVAPQVPPQIYDAFIAAIDKGNIRTMPNRSMPAAPHPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRNlLKPLKDLHDAPTLCKYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGMFSNGPVSLLSGL NPRPLSLVLHFFAVAIYGVGRLLFPFPSPKGIWIGARLIYSASGIIFPIIKAEGVRQMFFPATV PAYYRSPPALKPVA
Cucurbita maxima (SEQ ID NO: 20)
MVDYCAFGWILAAVLGLAIALSFFVSPRRNRRGGADSTPRSEGVRSSSTTNGECRSVDGDADVI IVGAGVAGSALAHTLGKDGRLVHVIERDLTEPDR1VGELLQPGGYLKLIELGLQDCVEEIDAQK VYGYALFKDGKNTQLSYPLEKFQSDVSGRSFHNGRFIQRMREKAASLPNVRLEQGTVTSLLEEK GT IKGVQYKSKNGEEKTAYAPLT IVCDGCFSNLRRSLCKPMVDVPSCFVGLVLENCQLPEANHG HVVLGDPSPILFYPISSTEIRCLVDVPGQKIPSISNGEMEKYLKTIVAPQVPPQTHDAFIAAID KGNIRTMPNRSMPAAPQPTPGALLMGDAFNMRHPLTGGGMTVALSDIWLRNLLKPLKDLNDAP TLCKYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGIFSNGPVSLL SGLNPRPLSLVLHFFAVAIYGVGRLLLPFPSPKGIWIGARLVYSASGIIFPIIKAEGVRQMFFP ATVPAYYRSPPVHKSIA
Cucurbita moschata (SEQ ID NO: 21)
MVDYCAFGWILAAVLGLAIALSFFVSPRRNRRGGADSTPRSEGvRSSSTTNGECRSVDCDADVI
IVGAGVAGSALAHTLGKDGRLVHVIERDLTEPDRIVGELLQPGGYLKLIELGLQDCVEEIDAQK VYGYALFKDGKNTQLSYPLEKFQSDVSGRSFHNGRFIQRMREKAASLPNVRLEQGTVTSLLEBK GTIKGVQYKSKNGEEKTAHAPLTIVCDGCFSNLRRSLCKPMVDVPSCFVGLVLENCQLPFAMHG HWLGDPSPILFYPISSTEIRCLVDVPGQKVPSISNGEMEKYLKTIVAPQVPPQIHDAFIAAID KGNIRTMPNRSMPAAPQPTPGALLMGDAFNMRHPLTGGGMTVALSDIWLRNLLKPLKDLNDAP TLCKYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGIFSNGPVSLL SGLN PRPLSLVLHFFAVAIYGVGRLLLPFPSPKG IWIGARLVY SASGIIFP11KAEGVRQMFFP ATVPAYYRSPPVLKTIA
Cucurbita moschata (SEQ ID NO: 22)
MMVDHCAFAWILDVVLGLWAVTFFVAAPRRNRR.GGTDSTASKDCVISTAIANGECKPDDADAE VIIVGAGVAGSALAYTLGKDGRRVHVIERDLTEPDRIVGEFLQPGGYLKLIELGLGDCVEEIDA QKLYGYALFKDGKNTRVSYPLGNFHSDVSGRSFHNGRFIQRMREKAASLPNVRLEQGTVTSLLE TKGTIKGVQYKSKNGEEKTAYAPLTIVCDGCFSMLRRSLCKPMVDVPSCFVGLVLENCQLPFAN HGHWLGDPSPILFYPISSTEIRCLVDVPGQKVPSISNGDMEKYLKTWAPQVPPQIHDAFIAA lEKGNVRTMPNRSMPAAPHPTPGAALMGDAFNMRHPLTGGGMTVALSDIVVLRNLLKPLKDLNDASTLCKYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGVFSNGPIS LLSGLNPRPSSLVLHFFAVAIYGVGRLLLPFPSLKGIWIGARLIYSASGIILPIIKAEGVRQMFFPATVPAYYRSPPVHKPIT
Cucumis sativus (SEQ ID NO: 23)
MVDHCTFGWIFSAFLAFVIAFSFFLSPRKNRRGRGTNSTPRRDCLSSSATTNGECRSVDGDADVIIVGAGVAGSALAHTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLIELGLQDCVEEIDAQ KVYGYALFKDGKSTRLSYPLENFQSDVSGRSFHNGRFIQRMREKAAFLPNVRLEQGTVTSLLEE KGTITGVQYKSKNGEQKTAYAPLTIVCDGCFSNLRRSLCNPMVDVPSCFVGLVLENCQLPYANL GHVVLGDPSPILFYPISSTEIRCLVDVPGQKVPSISNGEMEKYLKTVVAPQVPPQIHDAFIAAI E KGNIRTMPNRSMP AAPQ PT PGALLMGDAFNMRH PLT GGGMT VAL S DIVVL RNLLKPLKDLNDA PTlCKYLES4YTLR.RPvASTIjMTLAGALYKVFUASSDQARKEMRQACFDYLSLGGIFSNGPVSL LSGLNPRPLSLVLHFFAVAIYGVGRLLLPFPSPKGIWIGARLVYSASGIIFPIIKAEGVRQMFF PATVPAYYRT PPVFNS
Cucumis melo (SEQ ID NO: 24)
MVDHCAFGWIFSALLAFPIALSLFLSPWRNRRVRGTDSTPRSASVSSSATTNGECRSVDGDADV VIVGAGVAGSALAHTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLIELGLQDCVEEIDAQ KVYGYALFKDGKNTRLSYPLENFHSDVSGRSFHNGRFIQRMREKAASLPNVRLEQGTVTSLLEE KGTITGVQYKSKNGEQKTAYAPLTIVCDGCFSNLRRSLCTPMVDVPSYFVGLVLENCQLPYANL
GHWLGDPSPILFYPISSTEIRCLVDVPGQKVPSISNGEMEKYLKTWAPQVPPQIHDAFIAAI EKCTfIRTMPNRSMPAAPQPTPGALLMGDAFNMRHPLTGGGMTVALSDIWLRNLLKPLKDLNDA PTLCKYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGIFSNGPVSL LSGLMPRPLSLVLHFFAVAIYGVGRLLLPFPSLKGIWIGARLVYSASGIIFPIIKAEGVRQMFF PAIVPAYYRTPPVLNS
Cucurbita maxima (SEQ ID NO: 25)
MMVEHCAYGWILAAVLGLWAVTFFVAVPRRNP.RGGTDSTASKDCVISPAIANGECEPEDADAD ADVIIVGAGVAGSALAHTLGKDGRRVHVIERDLTEPDRIVGEFLQPGGHLKLIELGLGDCVEEI DAQKLYGYALFKDGKNTRVSYPLGNFHSDVSGRSFHNGRFIQRMREKAASLPNVRLEQGTVTSL LEKKGTIKGVQYKSKNGEEKTAYAPLTTvCDGCFSNLRRSLCKPMVDVPSCFVGLVLENCRLPF ANHGHWLGDPSPILFYPISSTEIRCLVDVPGQKVPSIPNGDMEKYLKTWAPQVPPQIHDAFI AAIEKGNIRTMPNRSMPAAPHPTPGALLMGDAFNMRHPLTGGGMTVALSDIWLRNLLKPLKDL MDAPTLCKYLESYYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGVFSNGP ISLLSGLNPRPSCLVLHFFAVAIYGVGRLLLPFPSLKGIWIGARLIYSASGIILPIIKAEGVRQ MFFPATVPAYYRSPPVHKPIT
Ziziphus jujube (SEQ ID NO: 26)
MLDQCPLGWILASVLGLFVLCNLIVKIxTRNSKASLEKRSECVKSIATTNGECRSKSDDVDVIIVG AGVAGSALAHTLGKDGRRLHVIERDLTEPDRIVGELLQPGGYLKLIELGLQDCVEEIDAQRVFG YALFKDGKDTRLSYPLEKFHSDVSGRSFHNGRFIQRMREKSASLPNVRLEQGTVTSLLEEKGTI KGVQYKTKTGQELTAFAPLTIVCDGCFSNLRRSLCNPKVDVPSCFVGLVLENCELPYANHGHVI LADPSPILFYPTSSTEVRCLVDVPGQKVPSISMGEMAKYLKSVVAPQIPPQIYDAFIAAVDKGN IRTMPNRSMPASPFPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRDLLK.PLGDLNDAATLC KYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGIFSTGPVSLLSGL NPRPLSLVLHFFAVAIYGVGRLLLPFPSPKRIWIGARLISGASGIIFPIIKAEGVRQMFFPATV PAYYRAAPVE
Morus alba (SEQ ID NO: 27)
MADPYTMGWILASLLGLFALYYLFVNNKNHREASLQESGSECVKSVAPVKGECRSKNGDADVII VGAGVAGSALAHTLGKDGRRVHVIERDLAEPDRIVGELLQPGGYLKLIELGLQDCVEEIDSQRV YGYALFKDGKDTRLSYPLEKFHSDVSGRSFHNGRFIQRMREKAASLPNVQLEQGTVTSLLEENG TIKGVQYKTKTGQELTAYAPLTIVCDGCFSNLRRSICIPKVDVPSCFVGLVLENCNLPYANHGH VVLADPSPILFYPISSTEVRCLVDVPGQKVPSISNGEMAKYLKTVVASQIPPQIYDSFVAAVDK GNIRTMPMRSMPAAPHPTPGALLMGDAFMMRHPLTGGGMTVALSDIVVLRDLLKPLRDLNDSVT LCKYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMREACFDYLSLGGVFSEGPVSLLS GLNPRPLSLVCHFFAVAIYGVGRLLLPFPSPKRLWIGARLISGASGIIFPIIRAEGVRQMFFPA TIFAYYRAPRPN
Juglans regia (JrSQE1) (SEQ ID NO: 28)
[...]
CKYLESFYTlRKPIASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGVFSTGPiSLLSG LNPRPVSLVLHFFAVAIYGVGRLLLPFPSPKRIWIGARLISGASGIIFPIIKAEGVRQMFFPATVPAYYRAPPVE
Cucurbita moschata (SEQ ID NO: 33)
MMVDHCAFAWILDWLGLWAVTFFVAAPRRNRRGGTDSTASKDCVISTAIANGECKPDDADAE VIIVGAGVAGSALAYTLGKDGRRVHVIERDLTEPDRIVGEFLQPGGYLKLIELGLGDCVEEIDA QKLYGYALFKDGKNTRVSYPLGNFHSDVSGRSFHNGRFIQRMREKMSLPNVRLEQGTVTSLLETKGTIKGVQYKSKKGEEKTAYAPLTIVCDGCFSNLRRSLCKPMVDVPSCFVGLVLENCQLPFAN HGHVVLGDPSPILFYPISSTEIRCLVDVPGQKVPSISNGDMEKYLKTWAPQVPPQIHDAFIAAIEKGNVRTMPNRSMPAAPHPTPGALLMGDAFNMRHPLTGGGMTVALSDIWLRNLLKPLKDLND ASTLCKYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGVFSNGPIS LLSGLNPRPSSLVLHFFAVAIYGVGRLLLPFPSLKGIWIGARLIYSASGIILPIIKAEGVRQMF FPATVPAYY RS P PVHKPIT
Phaseolus vulgaris (SEQ ID NO: 34)
MLDTYVFGWIICAALSVFVIRNFVFAGKKCCASSETDASMCAENITTAAGECRSSMRDGEFDVL IVGAGVAGSALAYTLGKDGRQVLVIERDLSEPDRIVGELLQPGGYLKLIELGLEDCVDKIDAQQ VFGYALFKDGKHIRLSYPLEKFHSDVAGRSFHNGRFIQRMREKAASLPNVRLEQGTVTSLLEEK GVIKGVQYKTKDSQELSVCAPFTIVCDGCFSNLRRSLCDPKVDVPSCFVGLVLENCELPCANHG HVILGEPSPVLFYPISSTEIRCLVDVPGQKVPSISNGEMAKYLKTVIAPQVPHELHNAFIAAVD KGSIRTMPNRSMPAAPYPTPGALLMGDAFNMRHPLTGGGMTVALSDIWLRNLLRPLRDLNDAP SLCKYLESFYTLRKPVASTINTLAGALYKVFCASSDPARKEMRQACFDYLSLGGQFSEGPISLL SGLNPRPLTLVLHFFAVATYGVGRLLLPFPSPKRMWIGLRLISSASGIIMPIIKAEGVRQMFFP AT V PAY Y RN P PAA
Hevea brasiliensis (SEQ ID NO: 35)
MKMADHYLLGWILASVMGLFAFYYIVYLLVKPEEDNNRRSLPQPRSDFVKTMTATNGECRSDDD 8DVDVI1VGAGVAGAALAHTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLIElGLEDCVe EIDAQRVFGYALFKDGKHTQLAYPLEKFHSEVAGRSFHNGRFIQRMREKAASLPSVKLEQGTVT SLLEEKGTIKGVLYKTKTGEELTAFAPLTIVCDGCFSNLRRSLCNPKVDVPSCFVGLVLENCRL PYANNGHVILADPSPILFYPISSTEVRSLVDVPGQKVPSVSSGEMANYLEINVVAPQVPPEIYDS FVAAVDKGNIRTMPNRSMPASPYPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRDLLKPLR
DLHDAPTLCRYLESFYTLRKPVASTINTLAGALYKVFCASPDEARKEMRQACFDYLSLGGVFST GPVSLLSGLNPRPLSLVLHFFAVAIYGVGRLLLPFPSPHRIWVGARLISGASGIIFPIIKAEGV RQMFFPATVPAYYRAPPIKCN
Sorghum bicolor (SEQ ID NO: 36)
MAAAAAAASGVGFQLIGAAAATLLAAVLVAAVLGRRRRRARPQAPLVEAKPAPEGGCAVGDGRT DVIIVGAGVAGSALAYTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLIELGLEDCVEEID AQRVLGYAL FKDGRNT KLAYPLE KFHS DVAGRS FHNGRF IQRMRQKAASLPNVQI EQGTVT SLL EENGTVKGVQYKTKSGEELKAYAPLTtvcDGCFSNLRRALOSPKVDVPSCFVGLVLENCQLPHP NHGH VI LAN PSP IL FY PIS ST EVRCLVDVPGQKV2 SI AS GEMANY L KT VVAPQIP PE IY DS FIA AIDKGSIRTMPNRSMPAAPHPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRNLLKPLHNLH DASSLCKYLESFYTLRKPVASTINTLAGALYKVFSASPDQARNEMRQACFDYLSLGGVFSNGPI ALLSGLNPRPLSLVAHFFAVAIYGVGRLMLPLPSPKRMWIGARLISGACGIILPIIKAEGVRQM FFPATV PAYYRAAPMGE
Zea mays (SEQ ID NO: 37)
MRKl؛EEEAGCAVSDGGTDV1I VGAGVAGSALAYTEGKDGRRVHVIERDfTEPDRI VGELDQPGG׳ YLKLIELGLQDCVEEIDAQRVLGYALFKDGRNTKLAYPLEKFHSDVAGRSFHNGRFIQRMRQKA ASLPNVQLEQGTVTSLLEENGTVKGVQYKTKSGEELKAYAPLTIVCDGCFSNLRRALCSPKVDV PSCFVGLVLENCQLPHPNHGHVILANPSPILFYPISSTEVRCLVDVPGQKVPSIATGEMANYLK TVVAPQIPPEIYDSFIAAIDK^SIRTMPNRSMPAAPHPTPGALLMGDAFNMRHPLTGGGMTVAL SDIVVLRNLLKPLRNLHDASSLCKYLESFYTLRKPVASTINTLAGALYKVFSASPDQARNEMRQ ACFDYLSLGGVFSNGPIALLSGLNPRPLSLVAHFFAVAIYGVGRLMLPLPSPKRMWIGARLISG ACG11L P11KAE GVROMFFPATVPAYY RAAPTG E KA
Medicago sativa (SEQ ID NO: 38)
MDLYNIGWILSSVLSLFALYNLIFSGKRNYHDVNDKVKDSVTSTDAGDIQSEKLNGDADVIIVG AGIAGAALAHTLGKDGRRVHIIERDLSEPDRIVGELLQPGGYLKLVELGLQDCVDNIDAQRVFG YALFKDGKHTRLSYPLEKFHSDVSGRSFHNGRFIQRMREKAASLPNVNMEQGTVISLLEEKGTI KGVQYKNKDGQALTAYAPLTIVCDGCFSNLRRSLCNPKVDNPSCFVGLILENCELPCANHGHVI LGDPSPILFYPISSTEIRCLVDVPGTKVPSISNGDMTKYLKTTVAPQVPPELYDAFIAAVDKGN IRTMPNRSMPADPRPTPGAVLMGDAFNMRHPLTGGGMTVALSDIVVLRNLLKPMRDLNDAPTLC KYLESFYTLRKPVASTINTLAGALYKVFSASPDEARKEMRQACFDYLSLGGLFSEGPISLLSGL NPRPLSLVLHFFAVAVFGVGRLLLPFPSPKRVWIGARLLSGASGIILPIIKAEGIRQMFFPATVPAYYRAPPVNAF
Methylomonas lenta (SEQ ID NO: 39)
MKEEFDICIIGAGMAGATISAYLAPKGIKIALIDHCYKEKKRIVGELLQPGAVLSLEQMGLSHL
LDGFEAQTVKGYALLQGNEKTTIPYPSQHEGIGLHNGRFLQQTRASALENSSVTQIHGKALQLL ENERNEIIGVSYRESITSQIKSIYAPLTITSDGFFSNFRAHLSNNQKTVTSYFIGLILKDCEMP FPKHGHVFLSGPTPFICYPISDNEVRLLIDFPGEQLPRKNLLQEHLDTNVTPYIPECMRSSYAQ AIQEGGFKVMPNHYMAAKPIVRKGAVMLGDALNMRHPLTGGGLTAVFSDIQILSAHLLAMPDFK NT DLIH E KI E AY Y RD RKRANANLNILANAL Y AVMSNDL L KT AV FKYLQCGGANAQ E SIAVLAGL NRKHFSLIKQFCFLAVFGACNLLQQSISNIPKALK1LKDAFVIIKPLIKNELS
Bathymodiolus azoricus endosymbiont (SEQ ID NO: 168)
MHITSEHNDLFDICIVGAGMAGATIATYLAPRGIKIALIDRDYAEKRRIVGELLQPGAVQTLKKMGLEHLLEGFDAQPIYGYALFNKDCEFSIEYNQDKSTNYRGVGLHNGRFLQKIREDALKQPSITQIHGTVSELIEDENHVVTGVKYKEKYTRELimDLKKATITSDGFFSSFRKDLTNNVKTVTSFFV GIILKDCELPYPHHGHVFLSAPTPFICYPISSTESRLLIDFPGDQAPKKEAVKHHIENNVIPFL PKEFRLCLDQALRENDYKIMPNHYMPAKPVLKKGVVLLGDALNMRHPITGGGLTAVFNDVYLLSTHLLAMPDFNDTKLIHEKVNLYYNDRYHANTNVNIMANALYGVMSNDLLKQSVFEYLRKGGDNS GGPISLLAGLNRNPTILIKHFFSVALLCLRNLFKAHKMSLTNAFYVIKDAFCIIVPLA1NELRPSSFLKKNIHN
Methyloprofundus sedimenti (SEQ ID NO: 169)
MNTSPEHNDLFDICIVGVGMAGATIAAYLAPRGLKIALIDREYTEKRRIVGELLQPGAVQTLKK MGLEHLLEGFDAQPIYGYALFNNDKEFSISYNSDDSTEYHGVGLHNGRFLQKIREDVFKNETVT QTHGTVSELIEDKKGVVKGVTYREKHTREYKTVKAKLTVTSDGFFSNFRKDLSNNVKTVTSFFI GLVLNDCNLPFPNHGHVFLSAPTPFICYPISSTETRLLIDYPGDKAPKKDEIREHILNKVAPFL PEEFKECFANAYIEDDDFKVM'PNHYMPAKPVLKEGAVLLGDALNMRHPL'TGGGLTAVFNDVYLLS THLLAMPDFNDPKLLHEKLELYYQDRYHANTNVNIMANALYGVMSNDLLKQGVFEYLRKGGDNS GGPITLLAGLNRNPTLLIKHFFSVAFLCICNLSGNNKMNFTNVFRVMKDAFCIIKPLAVNELRP SSFYKKNIQL
Methylomicrobium buryatense (SEQ ID NO: 170)
MESNr'DICIIGAGMAGATIAAYLAPKGINIALIDHCYREKKRlVGELLQPGAVLSLEQLGLGHLLDGIDAQPVEGYALLQGNEQTTIPYPSPNHGMGLHNGRFLQQIRASALQNSSVTQIQGKALSLL ENEQNEIIGVNYRDSVSNEIKSIYAPLTITSDGFESNFRELLSNNEKTVTSYFIGLILKDCEIP VPKHGHVFLSGPTPFICYPISSNEVRLLTDFPGGQFPRKAFLQAHLETNVTPYIPEGMQTSYRH ALQEDRLKVMPNHYMAAKPICIRKGAVMLGDALNMRHPLTGGGLTAVFSDIEILSGHLLAMPDFNNNDLIYQKIEAYYRDRQYANANLMILANALYGVMSNELLKNSVFKYLQRGGVNAKESIAILAGLNKbJHYSLMKQFFFVALFGAYTLVRENITNLPKATKILSDALTIIKPLAKNELSLVGIFSDYFKR
Ononis spinosa SQE1 (SEQ ID NO: 177)
MVDPYAVGWIICSLTTIVALYNFVFYRQNRSDKTTPTTTENITTATGDCRSLNPNGDVDIVIVG AGVAGSALAYTLGKDGRRVLVIERDLMEPDRIVGELLQPGGYLKLIELGLEDCVEKIDAQQVFG YALFKDGKHTRLSYPLEKFHSDIAGRSFHNGRFIQRMREKAASLPNVQLVQGTVTSLLEENGTI KGVQYKTKDAQELSACAPLTIVCDGCESNLRRNLCNPKVEVPSCFVGLVLENCELPCANHGHVI LGDPSPVLFYPISSTEIRCLVDVPGQKVPSISNGEMAKYLKEWAPQVPPELHDAFIAAVDKGN IRTMPNRSMPAAPYPTPGALLMGDAFNMRHPLTGGGMTVALSDIWLRNLLKPLRDLNDAPSLC KYIESFYTLRKPVASTINTLAGALYKVFCASPDDARKEMRQACFDYLSLGGLFSEGPVSLLSGL NPRPLSLVLHFFAVAIYGVGRLLLPFPSPKRIWIGVRLIASASGIILPIIKAEGIRQMFFPATV PAYYRTPPAA
Ononis spinosa SQE2 (SEQ ID NO: 178)
MDLYLLGWILSSVLSLFALYCLVFDGNRSRANAEKQIQRGYSVTTDAGDVKSEKLNGDADVIIV GAGIAGAALAHTLGKDGRRVRVIERDLSEPDRIVGELLQPGGYLKLVELGLADCVDNIDAQKVF GYALFKDGKHTRLSYPLEKFHADVSGRSFHNGRFIQRMREKAASLLNVNLEQGTVTSLLEEKGT IKGVQYKNKDGQELTAYAPLTIVCDGCFSMLRRSLCNPKVDNPSCFVGLVLENCELPCANHGHV ILGDPSPILFYPISSTEIRCLVDVPGQKVPSISNGmTKYLKLTVAPQVPPELYDAFIAAVDKG NIRTMPNKSMPADPCPTPGAVLMGDAFNMRHPLTGGGMTVALSDIWLRNLLRPLRDLMDAPALC KY L E S FY I’ L RKPVAS TIN T LAGAL Y K V F S S S P DQ AR RE MRQAC F DY L S LGG L F S EG PIS L L S G LN PR PLSLVLH FFAVAVFGVGRLILPFPSPKRVWIGARLLSAASGI ILPIIKAEGIRQM FFPVT VPAYYRAPPTSQE
Medicago truncatula SQE1 (SEQ ID NO: 179)
MIDPYGFGWITCTLITLAALYNFLFSRKNHSDSTTTENITTATGECRSFNPNGDVDIIIVGAGV AGSALAYTLGKDGRRVLIIERDLNEPDRIVGELLQPGGYLKLTELGLDDCVEKIDAQKVFGYAL FKDGKHTRLSYPLEIKFHSDIAGRSFHNGRFILRMREKAASLPNVRLEQGTVTSLLEEMGTIKGV QYKTKDAQEFSACAPLTIVCDGCFSNLRRSLCNPKVEVPSCFVGLVLENCELPCADHGHVILGD PSP'v7LFYPISSTEIRCLVDvTGQKVPSTSNGEMAK.YLKTVVAPQRrPPELHAAFIAAVDKGHIRT MPNRSMPADPYPTPGALLMGDAFt׳JMRHPLTGGGMTVALSDIV7LRNLLKPLRDLMDASSLCKYLESFYTLRKPVASTINTLAGALYKVFCASPDPARKEMRQACFDYLSLGGLFSEGPVSLLSGLNPCPLSLVLHFFAVAIYGVGRLLLPFPSPKRLWIGIRLIASASGIILPIIKAEGIRQMFFPATVPAYYRAPPDA
Medicago truncatula SQE2 (SEQ ID NO: 180)
MDLYNIGWILSSVLSLFALYNLIFAGKKNYDI7NEKVNQREDSVTSTDAGEIKSDKLNGDADVII VGAGIAGAALAHTLGKDGRRVHIIERDLSEPDRIVGELLQPGGYLKLVELGLQDCVDNIDAQRV FGYALFKDGKHTRLSYPLEKFHSDVSGRSFHGRFIQRMREKAASLPNVNMEQGTVISLLEEKGT
IKGVQYKNKDGQALTAYAPLTIVCDGCFSNLRRSIjCNPKVDNPSCFVGLILENCELPCANHGHV ILGDPSPILFYPISSTEIRCLVDVPGTKVPSISNGDMTKYLKTTVAPQVPPELYDAFIAAVDKG NIRTMPNRSMPADPRPTPGAVLMGDAFNMRHPLTGGGMTVALSDIWLRNLLKPMRDLNDAPTL CKYIjESFYTLRKPVASTINTLAGALYKVFSASPDEARKEMRQACFDYLSLGGLFSEGPTSLLSG LNPRPLSLVrRh'FAA'AVFGvGRLLLPFPSPKRVNIGARLLSGASGIILPIIIAAGIRQMEFPAT VPAYYRAPPVNAE
Hypholoma sublateritium SQE (SEQ ID NO: 181)
MSKSRSNYDVIIVGAGIAGCA1AHGLSTLSRATPLRIAIVERSLAEPDRIVGELLQPGGVMALQ RLGMEGCLEGIDAVKVHGYCWENGTSVHIPYPGVHEGRSFHHGRFIMKLREAARAARGVELVE ATVTELIPREGGKGIAGVRVARKGKDGEEDTTEALGAALWVADGCFSNFRAAVMGGAAVKPET KSH FVGAILKDARLPIPN HGTVALVKG FGPVLLYQT S EH DTRMLVDVKAPL PADL KVCAH ILSN IVPQLPAALHLPIQRALDAERLRRMPNSFLPPVEQGATRGAVLVGDAWNMRHPLTGGGMTVALN DVWLRDLLGSVGDLGDWRQVASTVNILSVALYDLFGADGELQVLRTGCFKYFERGGDCIDGPV SLLSGIAPSPMLLAYHFFSVAFYSIYVIAVGAQNGSAKQVLAVPGALQYPALCVKGLRVFYTAC VVFGPLLWTELRW
Hypholoma sublateritium SQE2 (SEQ ID NO: 182)
MHPTHYDWIVGAGVAGSSLAHALATLPREKPLQIALIERSFEEPDRIVGELLQPGGVDALKTL KMTSSVEGIDAITVTGYILVESGDMVRIPYPKGKEGRSFHHGRFIMGLRRVALENPNVHPIEAT AADLIECPCTGQVIGVRATSKTAPAPSSIDAQQTPPAPFSVYGDLVIVADGCFSNFRNWMGKA ACKATTKSYFVGTILKDAVLPVAGHGTVILPQGSGPVLLYQISEHDTRMLIDIQHPLPSDLRAH ILTNILPQLPASIQGWSDAFTKDRIRRMPNSFLPSVQQGSPLSKKGVILLGDSWNMRHPLTGG GMTVALNDVVYLRSIFASIQNLDDWDE TRYALRHWHWGRKPLS STINILSGTLYGLFEKDDDDY RALRKGCFKYFQLGGKCIDDPVSLLSGLSPSPLLLSSHFFAVILYAIWVVFTHPRVGSSMSANP ADVKRVYDIPSADEYPQLTLKGIRMFSQACGVFLPVLWSEIRWWAPCESS
Hypholoma sublateritium SQE3 (SEQ ID NO: 183)
MSKSRSNYDVIIVGAGIAGCALAHGLSTLSRATPLRIAIVERSLAEPDRIVGELLQPGGVMALQ RLGMEGCLEGIDAVKVHGYCVVENGTSVHIPYPGVHEGRSFHHGRFIMKLREAARAARGVELVE ATVTELIPREGGKGIAGVRVARKGKDGEEDTTEALGAALVVVADGCFSNFRAAVMGGAAVKPET KSHFVGAILKDARLPIPNHGTVALVKGFGPVLLYQISEHDTRMLVDVKAPLPADLKAHILSNIV PQLPAALHLPIQBALDAERLRRMPNSFLPPVEQGATRGAVLVGDAWNMRHPLTGGGMTVALNDV WLRDLLGSVGDLGDWRQVRRALHRWHWDRKPLASTVNILSVALYDLFGADGEELQVLRTGCFK YFERGGDCIDGPVSLLSGIAPSPMLLAYHFFSVAFYSIYVMFAHPQPVAQSKAVGAQNGSAKQV
LAVPGALQYPALCVKGLRVFYTACVVFGPLLWTELRWWTAAEASRGRLLVMSLVPLLLLLGAAN YGIPGMGLLGVL
MlSQE A4 (SEQ ID NO: 203)
MAKEEFDICIIGAGMAGATISAYLAPKGIKIALIDRCYKEKKRIVGELLQPGAVLSLEQMGLSH LLDGFEAQTVKGYALLQGNEKTTIPYPSQHEGIGLHNGRFLQQIRASALENSSVTQIHGKALQL LENERNEIIGVSYRESITSQIKSIYAPLTITSDGFASNFRAHLSNNQKTVTSYFIGLILKDCEM P FPKHGHVFL SGPT P FIG Y P ISDNEVRLLID FPGEQL PRKNLLQE HLDTNVT p Y ! PECMRS SY A QAIQEGGFKVMPNHYMAAKPIVRKGAVLLGDAIjNMRHPLTGGGLTAVFSDIQILSAHLLAMPDF KNTDLIHEKIEAYYRDRKRANANLNILANALYAVMSNDLLKTAVFKYLQCGGANAQESIALLAG LNRKHFSLIKQYCFLAVFGAGNLLQQSISNIPKALKLLKDAFVIIKPLIKNELS

Cucurbitadienol Synthase (CDS), Triterpene Synthase (TTP)

Siraitia grosvenorii CDS (SEQ ID NO: 40)
MWRLKVGAESVGENDEKWLKSISNHLGRQVWEFCPDAGTQQQLLQVHKARKAFHDDRFHRKQSS DLFITIQYGKEVENGGKTAGVKLKEGEEVRKEAVESSLERALSFYSSIQTSDGNWASDLGGPMF LLPGLVlALYVTGVLNSvLSKiiHRQEMCRYvYNHQbiEDGGwGLHlEGPSTMFGSALNYvALRLLGEDANAGAMPKARAWILDHGGATGITSWGKLWLSVLGVYEWSGNNPLPPEFWLFPYFLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYAVPYHEIDWNKSRNTCAKEDLYYPHPKM QDILWGSLHHVYEPLFTRWPAKRLREKALQTAMQHTHYEDENTRYICLGPVNKVLNLLCCWVED PYSDAFKLHLQRVHDYLWVAEDGMKMQGYNGSQLWDTAFSIQAIVSTKLVDNYGPTLRKAHDFV KSSQIQQDCPGDPNWYRHIHKGAWPFSTRDHGWLISDCTAEGLKAALMLSKLPSETVGESLER NRLCDAVNVLLSLQNDNGGFASYELTRSYPWLEL1NPAETFGD1VIDYPYVECTSATMEALTLF KKLHPGHRTKE1DTAIVRAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCLA IRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPTPLH RAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE
Momordica charantia (SEQ ID NO: 41)
MWRLKVGAESVGENDEKWVKSISNHLGRQVWEFCPDAGTPQQLLQIEKARKAFQDNRFHRKQTS DLLVSIQCEKGTTMGARVPGTKLKEGEEVRKEAVKSTLERALSFYSSIQTSDGNWASDLGGPMFLLPGLVIALCVTGALNSVLSKHHRQEMCRYLYNHQNEDGGWGLHIESPSTMFGSALNYVALRLLGEDADGGEGRAMTKARAWILGHGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYFLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPWLSLRKELYTVPYHEIDWNKSRNTCAKEDLYYPHS KMQ DILWGSIH HMY E PL FT HWPAKRJ RE KALKTAMQ HIHY E D ENT RY I CL G PVN KVLNML COW VEFPYSEAFKLHLQRVHDYLWVAEDGMKMQGYNGSQLWDTAFSVQAIISTKLVDNYGPTLRKAH
DYVKNSQIQQDCPGEPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSETVGEP LERNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAET FGDIVIDYPYVECTSATMEALALEKKLHPGHRTKEIDTA.IARAADELENMQRTDGSWYGCWGVCETYAGWEGIKGLVAAGRAYSN CLA1RKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQGERDPA PLHRAARLLINSQLENGDFPQEEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE
Cucurbita maxima (SEQ ID NO: 42)
MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLERALGFYSAVQTSDGNWASDLGGP MFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC CWVEDPYSDAFKLHLQRVHDYLWVAEDO4RMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSTMVG EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE
Citrullus colocynthis (CcCDS1) (SEQ ID NO: 43)
MWRLKVGAESVGEKEEKWLKSISNHLGRQWEFCADQPTASPNHLQQIDNARKHFRNNRFHRKQSSDLFLAIQNEKEIANGTKGGGIKVKEEEDVRKETVKNTVERALSFYSAIQTNDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALMYVALRLLGEDADGGEGGAMTKARGWILDRGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYCLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNKSRNTCAKEDLYYPHPKMQDILWGSIYHLYEPLFTRWPGKRLREKALQMAMKHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRVPDYLWIAEDGMRMQGYNGSQLWDTAFSVQAIISTKLIDSFGTTLKKAHDFVKDSQIQQDFPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLE STCvAlRKACDh'LLSKELPGGGWGESYLSCQNKvYTNLEGNRPHLVNTAWVLMALIEAGQAERD PAPLHRAARLL1NSQLENGDFPQEEIMGVFNKNCMITYAAYRN1FPIWALGEYFHRVLTE
Citrullus colocynthis (CcCDS2) (SEQ ID NO: 44)
MWRLKVGAESVGEKEEKWLKSISNHLGRQVWEFCAHQPTASPNHLQQIDNARNHFRNNRFHRKQ
SSDLFLAIQNEKEIANVTKGGGIKVKEEEDVRKETVKNTVERALSFYSAIQTNDGNWASDLGGP MFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR LLGEDADGGEGGA1CTKARSWILDRGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYCLP FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYY PHPKMQDILWGSIYHLYEPLFTRWPGKRLREKALQMAMKHIHYEDENSRYICLGPVNKVLNMLC CWVEDPYSDAFKFHLQRVPDYLWVAEDGMRMQGYNGSQLWDTAFSVQAIISTKLIDSFGT’rLKK AHDFVKDSQIQQDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVG EPLEKSRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATME ALTLFKKLHPGHRTKEIDIAVARAA14FLENMQRTDGSWYGCWGVCFTYAGV/FGIKGLVAAGRTY NSCVAIRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERD PAFLHRAARLLINSQLENGDFPQEEINGVFNKNCMITYAAYRNIFPIWALGEYFHRVLTE Cucurbita moschata (SEQ ID NO: 45 > MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCADAAAAATPRQLLQIQNARNHFERNRFHRK QSSDLFLAIQYEKEIAEGGKGGAVKVKEEEEVGKEAVKSTLERALSFYSAVQTSDGNWASDLGG PMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVAL RLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSL PFHPGRMWCHCRMvYLPMSYLYGKRFVGPITPKVLSLRQELYTvPYHEIDNNKSRNTCAKEDLY YPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNML CCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSFAPTLR KAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSTMV GEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATM EALTLFKKLHPGHRTKEIDTAVGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRT YNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGER DPAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE Cucumis sativus (SEQ ID NO: 46) MWRLPDZGKESVGEKEEKWIKSISNHLGRQWEFCAENDDDDDDEAVIHVVANSSKHLLQQQRRQ SSFENARKQFRNNRFHRKQSSDLFL'riQYEKEIARNGAKNGGNTKVKEGEDVKKEAVNNILERA LSFYSAIQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQMEDGGW GLHIEGSSTMFGSALNYVALRLLGEDANGGECGAMTKARSWILERGGATAITSWGKLWLSVLGV YEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITHMVLSLRKELYTI PYHEIDWNRSRNTCAQEDLYYPHPKMQDILWGSIYHVYEPLFNGWPGRRLREKANKIAMEHIHY 
EDENSRYIYLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTA FSIQAILSTKLIDTFGSTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISD CTAEGLKASLMLSKLPSKIVGEPLEKMRLCDAVNVLLSLQNEMGGFASYELTRSYPWLELINPA ETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAALAKAANFLENMQRTDGSWYGCWG VCFTYAGWFGIKGLVAAGRTYNNCVAIRKACHFLLSKELPGGGWGESYLSCQNKVYTNLEGNRP HLVNTAWVLMALIEAGQGERDPAPLHRAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRN IFPIWALGEYSHRVLTE Cucumis melo (SEQ ID NO: 47) MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR NNRFHRKQS SDLFLAT QCE KE11RNGAKNEGTTKVKEGEDVKKEAVKNTLERALS FYSAVQTSD GNWASDLGGPMFLLPGLVIAIjYVTGVLNSVLSKHHRQEMORY1YNHQNEDGGWGLHIEGSSTMF 70 WO 2021/126960 PCT/US2020/065285 GSALNYVALRLDGEAADGGEHGAAiTKARSWILERGGATAITSWGRLWLSvDGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLORIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGC ؟A7GVCFTYAGWFGI kglvaagrtynncvairkacnfllskelpgggwgesylscqnkvytnlegnkphlvntawmma LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDM Citrullus lanatus subsp. 
vulgaris (SEQ ID NO: 48) DGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYLYNHQNEDGGWGLHIEGTSTM FGSALNYVALRLLGEDADGGEGGAMTKARSWILDRGGATAITSWGKLWLSVLGVYEWSGNNPLP PEFWLLPYCLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRS RNTCAKEDLYYPHPKMQDILKGSTIYHLYEPLFTRWPGKRLREKALOMAMKHIHYEDENSRYICI GPVNKVLNMLCCWVEDPYSDAFKFHLQRVPDYLWVAEDGMRMQGYNGSQLWDTAFSVQAIISTK LIDSFGTTLKKAHDFVKDSQIQQDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASL MLSKLPSEIVGEPLEKSRLCDAVMVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDY PYVECTSATMEALTLFKKLHPGRRTKEIDIAVARAANFLENMQRTDGSWYGCWGVCFTYAGWFG IKGLVAAGRTYNSCVAIRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVMTAWVLM ALIEAGQAERDPAPLHRAARLLINSQLENGDFPQEEiMGVFNKNCMITYAAYRNIFPIWALGEY FHRVLTE Theobroma cacao (SEQ ID NO: 49) MWRIjKIGKESVGDNGAWLRSSNDHVGRQVWEFCPESGTPEELSKVEMARQSFSTDRLLKKHSSD LLMRIQYAKENQFVTNFPQVKLKEFEDVKEEATLTTLRRALNFYSTIQADDGHWPGDYGGPMFL LPGLVITLSVTGALNAVLSKEHQYEMCRYLYNHQNRDGGWGLHIEGPSTMFGTVLNYVTLRLLG EGPEGGQGAVEKACEWILEHGSATAITSWGKMWLSVLGAYEWSGNNPLPPEVWLCPYFLPIHPG RMWCHCRMVYLPMSYLYGKRFVGPITPIILSLRKELYAVPYHEVDWNKARNTCAKEDLYYPHPL VQDILWASLHYLYEPIFTRWPCKSLREKALRTVMQHIHYEDENTRYICIGPVNKVLNMLSCWVE DPYSESFKLHLPRILDYLWIAEDGMKMQGYNGSQLWDTAFAVQAIISTGLADEYGPILRKAHDF IKYSQVLEDCPGDLNFWYRHISKGAWPFSTVDHGWPISDCTSEGLKAVLLLSTLPSESVGEPLH MMRLYDAVNVILSLQNVDGGFPTYELTRSYQWLELINPAETFGDIVIDYPYVECTSAAIQALIS FKKLFPEHRMEEIENCIGRAVEFIEKIQAADGSWYGSWGVCFTYAGWFGIKGLSAAGRTYNNSS NIRKACDFLLSKELATGGWGESYLSCQNKVYTNLEGARPHIVNTSWALLALIEAGQAERDPTPL HRAARILINSQMEDGDFPQEEIMGVFNKNCMISYSAYRNIFPIWALGEYTCRVLRAP Ziziphus jujube (SEQ ID NO: 50) MWKLKIGAETVGEGGSDGWLRSVNSHLGRQVWEFHPELGTPEELRQIQDARDAFFNHRFHKQHS SDLLMRIQFAKEbTPCVANPPQVKVKDTDEVTEESVTTTLRRAINFYSTIQAHDGHWAGDYGGPM FLLPGLVITLSVTGALNAVLSKEHQCEMCRYIYNHQNEDGGWGLHIEGPSTMFGTVLNYVSLRL LGEGAEDGLGTIENARKWILDHGGATAITSWGKMWLSVLGVYEWSGNNPLPPEVWLCPYTLPFH PGRMWCHCRMVYLPMSYLYGKRFVGPITPTIRSLRKELYTAPYHEIDWNRARNECAKEDLYYPH PLVQDVLWASLHYVYEPIFMRWPAKKLREKALSTVMQHIHYEDENTRYICIGPVNKVLNMLCCW VEDPNSEAFKLHLPRISDYLW1AEDGMKMQGYNGSQLWDTAFAVQAIVSTDLAEEYGPTIRKAH 
EYIKNSQVLEDCPGDLNFWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSQLSSETVGDS LDVKRLFNAVNVILSLQNGDGGFATYELTRSYQWLELINPAETFGDIVIDYPYVECTSAALEAL TLFKKSYPGHRREEVENCITNAAMFIENIQAKDGSWYGSWGVCFTYAGWEGIKGLVASGRTYEN CPSIRKACDFLLSKELPSGGWGESYLSCQNKVYTNLKDNKPHIVNTAWAMLALIVARQAERDPM PLHRAARILIKSQMHDGDFPQEEIMGVFNKNCMISYAAYRNIFPIWALGEYRLHVLRSL
[sequence listing illegible in source]
MAWKDKVAEGRDARLRTINGHVGRQIWEFDPDeGTDNERAEVEAA/REKfRNNRFEKKHSSDLLM RLQLAKENPVSSYLTQVKLEENEDITEEAVTMTLRRALNFHSSIQSFDGHWAGDLGGPMFLMPG LVISLYITGVLNTVLSSEHQREMCRYLYNHQNEDGGWGLHIEGPSTVFGSTLTYITLRLLGENV EDGDGAMEKGRKSILDHGGATYITSWGKMWLSVLGVFDWSGNNPLPPEMWLLPYFLPVHPGRMW CHCRMVYLPMSYLYGKRFVGKITPLVLSLRNEIYTVSYNQIDWNKARNLCAKEDLYYPHPMVQD ILWATLHKFVEPILMHWPGTLLREKALNTTMQHIHYEDESTRYICIGPVNKVLNMLCCWVDDPD SEAFKLHLPRISDYLWIAEDGMKCQGYNGSQLWDTAFAVQAYIATNLSDEFGPVLTKAHEYIKN SQVPDDCSGDLSFWYRHISKGAWPFSTGDHGWPISDCTAEGLKASLLLSRISPEVVGKPLNAKR FYDAVNVILSLMNSDGSFATYELTRSYTWLEMINPAETFGDIVIDYPYVECTSAAIQSLVAFTK LYPGHRREEIDECITKAAKFIESIQKKDGSWYGSWAVCFTYGLWFGIKGLIAAGKTYKMSSAIR KACEFLLSKQLASGGWGESYLSCQDKVYTNLEGNRAHAVNTGWAMLSLIDAGQAERDPSPLHRA ARVLINSQMGMGDFPQEEIMGVFNRNCMISYSAYRNIFPIWALGEYRCKVLASKGHE
Artemisia annua (AaCASmut) (SEQ ID NO: 219)
MAWKLKIAEGGDPWLRTTNDHIGRQIWEFDPTLGSVEELAEIEKLRKTFRDNRFEKKHSADLLM RSQFAKENSVSVFPPKVNIKDVEDITEDKVTNVLRRAIGFHSTLQADDGHWPGDLGGPMFLLPG LVITLSITGALNAVLSKEHKREMCRYLYNHQNIDGGWGLHIEGHSTMFGSALNYVTLRLLGEGA NDGEGAMEKGRKWILDHGGATAITSWGKFWLSVLGVFEWPGNNPLPPEMWLLPYFLPVHPGRMW CHCRMVYLPMSYLYGKRFVGPITSTVLALRKELFTVPYHDIDWNEARNLCAKEDLYYPHPLIQD
VLWATLDKFVEPVLMSWPGKKLREKALRTAMEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPN SEAFKLHLPRIQDYLWIAEDGMKMQGYNGSQLWDAAFTVQAIMSTNLIEEFGPTLKKGHIFIKK SQVLDNCYGDLDYWYRHISKGAWPFSTADHGWPISDCTAEGLKAALLLSKLPSEIVDEPLDAKR FYEAVNVILSLMNADGSFATYELTRSYSWLELINPAETFGDIvIDYPYVECTSAAIQALVAFKR LYFGHRRDEVQGCIDKAAAFLEKIQEADGSWYGSWAVCFTYGTWFGVKGLVAAGKNYSNCSSIR KACNFLLSKQLASGGWGESYLSCVDKVYTNLEGNRSHWNTGWAMLALIDAEQAKRDPTPLHRA ARVLINSQMENGEFPQQEIMGVFNRNCMITYAAYRNIFPIWALGEYRCRVLKVET Citrullus colocynthis (C0CDS2) (SEQ ID NO: 220) MAWRLKVGAESVGE KEEKWLKSISN HLGRQVWEFCAHQPTASPNHLQQIDNARNHFRNNRFHRK QSSDLFLAIQNEKEIANVTKGGGIKVKEEEDVRKETVKNTVERALSFYSAIQTNDGNWASDLGG PMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVAL RLLGEDADGGEGGAMTKARSWILDRGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYCL PFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLY YPBPKMQDILWGSIYHLYEPLFTRWPGKRLREKALQMAMKHIHYEDENSRYICLGPVNKVLNML CCWVEDPYSDAFKFHLQRVPDYLWVAEDGMR^GYNGSQLWDTAFSVQAIISTKLIDSFGTTLK KAHDFVKDSQIQQDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIV GEPLEKSRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATM E AI?)? 
LFKKLHPGHRTKEIDIAVARAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRT YNSCVAIRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAER DPAPLHRAARLLINSQLENGDFPQEEIMGVFNKNCMITYAAYRNIFPIWALGEYFHRVLTE
Epoxide Hydrolase
Siraitia grosvenorii EPH1 (SgEPH1) (SEQ ID NO: 56)
MEKIEHSTIATNGINMHVASAGSGPAVLFLHGFPELWYSWRHQLLYLSSLGYRAIAPDLRGFGD TDAPPSPSSYTAHHIVGDLVGLLDQLGVDQVFLVGDWGAMMAWYFCLFRPDRVKALVNLSVHFT PRNPAISPLDGFRLMLGDDFYVCKFQEPGVAEADFGSVDTATMFKKFLTMRDPRPPIIPNGFRS LATPEALPSWLTEEDIDYFAAKFAKTGFTGGFNYYRAIDLTWELTAPWSGSEIKVPTKFIVGDL DLVYHFPGVKEYIHGGGFKKDVPFLEEVWMEGAAHFINQEKADEINSLIYDFIKQF
Siraitia grosvenorii EPH2 (SgEPH2) (SEQ ID NO: 57)
MEKiEHTTISTNGINMHVASIGSGPAVEFLHGfPELWYSWRHQELFLSSMGIRAIAPDLRGFGD TDAPPSPSSYTAHHIVGDLVGLLDQLGIDQVFLVGHDWGAMMAWYFCLFRPDRVKALVNLSVHF LRRHPSIKFVDGFRALLGDDFYFCQFQEPGVAEADFGSVDVATMLKAFLTMRDPRPPMIPKEKG FRALETPDPLPAWLTEEDIDYFAGKFRKTGFTGGFNYYRAFNLTWELTAPWSGSEIKVAAKFIV GDLDLVYHFPGAKEYIHGGGFKKDVPLLEEVVVVDGAAHFINQERPAEISSLIYDFIKKF
Siraitia grosvenorii EPH3 (SgEPH3) (SEQ ID NO: 58)
MDQIBHITINTNGIKMHIASVGTGPWLLLHGFPBLWYSWRHQLLYLSSVGYRAIAPDLRGYGD TDSPASPTSYTALHIVGDLVGALDELGIBKVFLVGHDWGAIIAWYFCLFRPDRIKALVNLSVQF IPRNPAIPFIEGFRTAFGDDFYMCRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG FRAIPPPENLPSWLTEEDINYYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV GDSDLTYHFPGAKEYIHNGGFKKDVPLLEBWWKDACHFINQBRPQEINAHIHDFINKF
Momordica charantia (SEQ ID NO: 59)
MEKIEHSTIAANGITIHVASVGSGPAVLLLHGFPELWYSWRHQLLFLASKGYRAIAPDLRGFGD SDAPPSPSSYTPLHIVGDLvALLDHLGIDLVFLVGHDWGAMMAWHFCLLRPDRVKALVNLSVHF MPRNPAMSPLDGMRLLLGDDFYVCRFQEPGAAEADEGSVDTATMMKKFLTMRDPRPPIIPNGFR SLETPQALPPWLTEEDIDYFAAKFAKTGFTGGFNYYRAIGRTWELTAPWTGSKIKVPAKFIVGD LDMVYHLPDAKEYIHGGGFKEDVPLLEEVWIEGAAHFINQEKPDEISSLIYDFIKKF
Cucurbita moschata (SEQ ID NO: 60)
MEKIEHSTIATNGINMHVASIGSGPPVLFLHGFPELWYSWRHQLLFLASKGFRAIAPDLRGFGD SDVPPSPSSYTPFHIIGDLIGLLDHLGIEQVELVGHDWGAMMAWYFCLFRPDRVKALVNLSVHY NPRNPAISPLSRTRQFLGDDFYICKFQTPGVAEADFGSVDTATMMKKFLTIRDPSPPITPNGFK
TLKTPETLPSWLTEEDIDYFASKFTKTGFTGGFNYYRAIEQTWELTGPWSGAKIKVPTKYVVGD VDMVYHLPGAKQYIEGGGFKKDVPLLEEWVMEGAAHFINQEKADEISAHIYDFIIKF Cucurbita maxima (SEQ ID NO: 61) MENIBHTIVPTNGINMHIASIGSGPAVLFLHGFPBLWYSWRHQLLFLASNGFRAIAPDLRGFGD TDVPPSPSSYTAHHIVGDLIGLLDHLGIDRVFLVGHDWGAMMAWYFCLFRPDRVRALVNLSVHY LHRHPSIKFVDGFRAFLGDDFYFCQFQEPGVAEADFGSVDTATMLKKFLTMRDPRPPMIPKBKG FRALETPDPLPSWLTEEDVDYFASKFSKTGFTGGFNYYRAFDLSWELTAPWSGSQVKVPAKFIV GDLDLVYHFPGAKEYIHGGRFKEDVPFLEEWVIBGAAHFINQBRADEISSLIYEFINKF Prunus persica (SEO ID NO: 62) MEKTEHTTVSTNGINMHIASIGTGPFtVLFLHGFPELWYSWRHQLLSLSSLGYRCIAPDLRGFGD TDAPPSPASYSALHIVGDLIGLLDHLGIDQVFLVGHDWGAVIAWWFCLFRPDRVKALVNMSVAF SPRNPKRKPVDGFRALFGDDYYICRFQEPGEIEKEFAGYDTTSIMKKFLTGRSPKPPCLPKELG LRAWKTPETLPPWLSEEDLNYFASKFSKTGFVGGLNYYRALNLTWELTGPWTGLQVKVPVKFIV GDLDITYHIPGVKNYIHNGGFKRDVPFLQEVWIEDGAHFINQERPDEISRHVYDFIQKF Morus notabilis (SEQ ID NO: 63) MEKIEHSTVHTNGINMHVASVGTGPAILFLHGFPELWYSWRHQMISLSSLGYRCIAPDLRGYGD TDAPPSPTSYTSLHIVGDLVGLIDHLVIEKLFLVGHDWGAMIAT^YFCLFRPDRIKALVNLSVPF FPRNPKINFVDGFRAELGDDFYICRFQEPGESEADFSSDTVAVFRRILANRDPKPPLIPKEIGF RGVYEDPVALPSWLTBDDINHFANKFNETGFTGGLNYYRALNLTWELTAAWTGARVQVPTKFIM GDLDLVYYFPGMKEYILNGGFKRDVOLLQELVI IEGAAHFINQEKPDEISSHIHHFIQKF 75 WO 2021/126960 PCT/US2020/065285 Ricinus communis (SEQ ID MO: 64) MEKIEHTTVATNGINMHVAAIGTGPEILFLHGFPELWYSWRHQLLSLSSRGYRCIAPDLRGYGD TDAPESLTGYTALHIVGDLIGLLDSMGIEQVFLVGHDWGAMMAWYLCMFRPDRIKALVNTSVAY MSRNPQLKSLELFRTVYGDDYYVCRFQEPGGAEEDFAQVDTAKLIRSVFTSRDPNPPIVPKEIG FRSLPDPPSLPSWLSEEDVNYYADKFNKKGFTGGLNYYRNIDQNWELTAPWDGLQIKVPVKFVI GDLDLTYHFPGIKDYIHNGGFKQVVPLLQEVVVMEGVAHFINQEKPEEISEHIYDFIKKF D NO: 65) MEKIEHTTVGTNGINMHVASIGTGPWLFIHGFPELWYSWRNQLLYLSSRGYRAIAPDLRGYGD TDAPPSVTSYTALHLVGDLIGLLDKLGIHQVFLVGHDWGALIAWYFCLFRPDRVKALVNMSVPF PPRNPAVRPLNNFRAVYGDDYYICRFQEPGEIEEEFAQIDTARLMKKFLCLRIAKPLCIPKDTG LSTVPDPSALPSWLSEEDVNYYASKFNQKGFTGPVNYYRCSDLNWELMAPWTGVQLEVPVKFIV GDQDLVYNNKGMKEYIHNGGFKKYVPYLQEVWMEGVAH FINQEKAEEVGAHIYEFIKKF He7ea brasiliensis (SEQ ID NO: 66; 
MEKIEHITVFTNGINMHIASIGTGPEILFLHGFPELWYSWRHQLLSLSSLGYRCIAPDLRGYGD TDAPQSVNOYTVLHIVGDLvGLLDSLGIQQVFLVGHDWGAFIAWYFCIFRPDRIKALVNTSVAF MPRNPQVKPLDGLRSMFGDDYYICQFQKPGKAEEDFAQVNTAKLIKLLFTSRDPRPPHFLKEVG LKALQDPPSQQSWLTEEDVNFYAAKFNQKGFRGGLNYYQNINMNWELAAAWTGVQIKVPVKFII GDLDLTYHFPGIKEYIHNGGFKKDVPLLQDVWMEGVAHFLNQEKPEEVSKHIYDFIKKF Handroan.thus impet iginosus (SEQ ID NO: 67) MDKIQHKIIQTNGINIHVAEIGDGPAVLFLHGFPELWYSWRHQMLFLSSRGYRAIAPDLRGYGD SDAPPCAT S YTAFH11GDLVGLLDAMGLDRVFLVGHDWGAVMAWYFCLLRPDRIKALVNLSVVF QPRNPKRKPVESMRAKLGDDYYICRFQEPGEAEEEFARVDTARLIKKLLTTRNPAPPRLPKEVG FGCLPHKPITMPSWLSEEDVQYYAAKFNQKGFTGGLNYYRAMDLSWELAAPWTGVQIKVPVKFI VGDLDITYNTPGVKEYIHKGRFKQHVPFLQELVILEGVAHFLNQEKPDEINQHIYDFIHKF Can'eJi.ina sativa (SEQ ID NO: 68) MEKIEHTTVSTNGINMHVASIGSGPVILFLHGFPDLWYSWRHQLLSFAALGYRAIAPDLRGYGDSDAPPSPESYTILHIVGDLVGLLDSLGVDRVFLVGHDWGAIVAWWLCMIRPDRVKALVNTSWFNPRNPSVKPVDKFRDLFGDDYYVCRFQETGEIEEDFAQVDTKKLITRFFVSRNPRPPCIPKSVFRGLPDPPSLPAWLTEQDVSFYGDKFSQKGFTGGLNYYRAMNLSWELTAPWAGLQIKVPVKFIVGDLDITYNIPGTKEYIHGGGLKKHVPFLQEVWMEGVGHFLQQEKPDEVTDHIYGFFEKFRTRE Coffea canephora (SEQ ID NO: 69) MDKIQHRQVPVNGINLHVAEIGDGPAILFLHGFPELWYSWRHQLLSLSAKGYRALAPDLRGYGDSDAPPSPSNYT2YLHIVGDLVGLLDSLGLDRVFL VGHDWGAVMAWY FCLLRPDRIKZ^LVNMSWFTPRNPKRKPLEAMRARFGDDYYICRFQEPGEAEEEFARVDTARIIKKFLTSRRPGPLCVPKEVGFGGSPHNPIQLPSWLSEDDVNYFASKFSQKGFTGGLNYYRAMDLNWELTAPWTGLQIKVPVKFIVGDLDVTFTTPGvKEYIQKGGFKRDVPFLQELVVMEGVAHFVNQEKPEEVSAHIYDFIQKF Punica granatum (SEQ ID NO: 70) 76 WO 2021/126960 PCT/US2020/065285 MEKlQHTTVRTNGINMHvATAGSGPDSTLFvHGFPELWYTwRhQMVSLJiALGYRTIAPDLRGYG DTDAPPSHESYTAFHIVGDLVGLLDSMGIEKVFLVGHDWGAAIAWYFCLFRPDRIKALVNMSW FHPRNPNRKPVDGLRAILGDDYYICRFQAPGEIEEDFAPADTANIIKFFLVSRNPRPPQIPKEG FSCLANSR.QMDLPSWLSEED1NYYASKFSEKGFTGGLKYYRVMNLNWELTA.PFTGLQIKVPAKF MVGDLDITYNTPGTKEFIHNGGLKKHVPFLQEVVVMEGVAHFINQEKPEEVTAHIYDFIKKF Arabidopsis lyrata subsp. 
lyrata (SEQ NO: 71) MEKIEHTTVSTNGINMHVASIGSGPVILFLHGFPDLWYSWRHQLLSFAALGYRAIAPDLRGYGDSDAPPSRESYTILHIVGDLVGLLNSLGVDRVFLVGHDWGAIVAWWLCMIRPDRVNALVNTSWF NPRNPSVKPVDAFRALFGDDYYICRFQEPGEIEEDFAQVDTKKLITRFFISRNPRPPCIPKSVG FRGLPDPPSLPAWLTEEDVSFYGDKFSQKGFTGGLNYYRALNLSWELTAPWAGLQIKVPVKFIV GDLD1TYNIPGTKEYIHEGGLKKHVPFLQEVVVLEGVGHFLHQEKPDEITDHIYGFFKKFRTRE TASI Rhinolophus sinicus (SEQ ID NO: 72) MDKIEHTTVSTMGINMHVASIGSGPVILFLHGFPDLWYSWRHQLLSFAGLGYRAIAPDLRGYGD SDSPPSHESYTILHIVGDLVGLLDSLGVDRVELVGHDWGAWAWWLCMIRPDRVNALVNTSWF NPRNPSVKPVDAFKALFGEDYYVCRFQEPGEIEEDFAQVDTKKLINRFFTSRNPRPPCIPKTLG FRGLPDPPALPAWLTEQDVSFYADKFSQKGFTGGLNYYRAMNLSWELTAPWAGLQIKVPVKFIV GDLDITYNIPGTKEYIHEGGLKKHVPFLQEVWMEGVGHFLHQEKPDEVTDHIYGFFKKF Gossypium raimondii (GrEPH) (SEQ ID NO: 184) MAEKIEHTTVTTNGIKMHVASIGSGPIILFLHGFPELWYTWRHQLLSLSSLGYRCVAPDLRGYG DSDAPPSPESYTVFHIVGDLVGLLDALGVDKVFLVGHDWGAMIAWNFCLFRPDRIKALVNLSIP YHPRNPKVK'rVDGYRALFGDDFYICRFQvPGEAEAHFAQMDTAKVMKKFLTTRDPNPPCIPRET GLKALPDPPALPSWLSEDEINYFATKFSQKGFTGGLNYYRAMNLNWELMAPWTGLQIQVPVKFI vgdldityhipgvkeylqnggfkknvpflqelvvmegvahfinqekpqeismhiydfikkf Gossypium hirsutum (GhEPH) (SEQ ID NO: 185) MAEKIEHTTVTTNGIKMHVASIGSGPIILFLHGFPELWYTWRRQLLSLSSLGYRCVAPDLRGYG DSDAPPSPESYTVFHWGDLVGLLDALGVDKVFLVGHDWGAMIAWNFCLFRPDRIKALVNLSVP YHPRNPKVKTVDGYRALFGDDFYICRFQVPGEAEAHFAQMDTAKVLKKFLTTRDPNPPCIPKET GLKALPDPPALPSWLSEDEINYFATKFNQKGFTGGLKYYRAMNLNWELMAPWTGLQIQVPVKFI VGDLDITYHIPGVKEYLQNGGFK1KNVPFLQELWMEGVAHFINQEKPQEISMHIYDFVKKF Siraitia gT'osnevorj. 
1 (SgEPH4) (SEQ ID NO: 186) MAENIEHTTVQTNGIKMHVAAIGTGPPVLLLHGFPELWYSWRHQLLYLSSAGYRAIAPDLRGYG DTDAPPSPSSYTALHIVGDLVGLLDVLGIEKVFLIGHDWGAIIAWYFCLFRPDRIKALVNLSVQ ffprnpttpfvkgfravlgdqfymvrfqepgkaeeefasvdireffknvlsnrdpqapylpnev KFEGVPPP2YLAPWLTPEDIDVYADKFAETGFTGGLNYYRAFDRTWELTAPWTGARIGVPVKFIV GDLDLTYHFPGAQKYIHGEGFKKAVPGLEEVWMEDTSHFINQERPHEINSHIHDFFSKFC Cucumis melo (CmEPHl) (SEQ ID NO: 187) MADKIQHSTISTNGINIHFASIGSGPVVLFLHGFPELWYSWRHQLLFLASKGFRAIAPDLRGFG DSDAPPSPSSYTPHHIVGDLTGLLDHLGIDQVFLVGHDWGAMMAWYFCLFRPDRVKALVNLSVH YTPRNPAGSPLAVTRRYLGDDFYICKFQEPGVAEADFGSVDTATMMKKFLTMRDPRPAIIPNGF WO 2021/126960 PCT/US2020/065285 KTeLETPEIdPSNLTEEDIEYFASKfSKTGFTGGeNYYRALDiTWELTGPWSRAQIKVPTAFIV GDLDLVYNFPGAKEYIHGGGFKKDVPLLEDVWIEGAAHFINQEKPDEISSLIYDFITKF Cucumis melo (CmEPH2) (SE!Q ID NO: 188) MAEKIEHTTIPTNGINMHVASIGSGPAVLFLHGFPQLWYSWRHQLLFLASKGFRALAPDLRGFG DTDAPPSPSSYTFLHIIGDLIGLLDHLGLEKVFLVGHDWGAMIAWYFCLFRPDRVKALVNLSVY YIKRHPSISFVDGFRAVAGDNFYICOFQEAGVAEADFGRVDTATMMKKFMGMRDPEAPLIFTKE KGFSSMETPDPLPCWLTEEDIDFFATKFSKTGFTGGFNYYRALNLSWELTAAWNGSKIEVPVKF IVGDLDLVYHFPGAKQYIHGGEFKKDVPFLEEVWIKDAAHFIHQEKPHQINSLIYHFINKFST ST SPA Trema orientale (T0EPH) (SEQ ID NO: 189) MAEKIEHTTINTNGVNLHVASIGTGPAVLFLHGFPELWYSWRHQMLALSSLGYRAIAPDLRGYG DSDAPPSPESYSSLHIVGDLVGLIDQLGIDQIFLVGHDWGAVIAWQFCLFRPDRVKALVNMSVP FRPRHPTRKPIETFRALFGDDYYVCRFQAPGEVEEDFASDDTANLLKKFYGGRNPRPPCVPKEI GFKGLKAPELPSWLSEEDLNYFAEKFNQRGFTGGLNYYRALDLTWELTAAWTGVQVKVPTKFIV GDLDITYHIPGAKEYINEGGLKKDVPYLQEVWMEGVAHFVNQEKAEEVSAHIHDFIKKF Arachis hypogaea (AhEPH) (SEQ ID NO: 190) MAEKIEHTWVNTNGIKMHVASIGSGPAVLFLHGFPELWYSWRHQLLSLSAQGYRCIAPDLRGYG DTDAPPSPSSYSALHIVSDLVGLLDALRIDQVFLVGHDWGAAMAWYFCLFRPDRIKALVNMSW FRPRNPKWKPLQSLRAMLGDDYYICRFQKPGEAEEEFARAGTSRIIKTFLVSRDPRPPCVPKEI GFGGSPNLQLALPSWLTEEDVNYYASKFDQKGFTGGLNYYRAIDLTWELTAPWTGVQIKVPVKF IVGDLDVTYNTPGVKEYIHGGGFKKEVPFLQELVVMEGVAHFINQERPDEISAHIHDFIKKF Mycobacterium tuberculosis (MtEPH) (SEQ ID NO: 212) MASQVHRILNCRGTRIHAVADSPPDQQGPLWLLHGFPESWYSWRHQIPALAGAGYRWAIDQR 
GYGRSSKYRVQKAYRIKELVGDWGVLDSYGAEQAFWGHDWGAPVAWTFAWLHPDRCAGWGI SVPFAGRGVIGLPGSPFGERRPSDYHLELAGPGRVWYQDYFAVQDGIITEIEEDLRGWLLGLTY TVSGEGMMAATKAAVDAGVDLESMDPIDVIRAGPLCMAEGARLKDAFVYPETMPAWFTEADLDF YTGEFERSGFGGPLSFYHNIDNDWHDLADQQGKPLTPPALFIGGQYDVGTIWGAQAIERAHEVM PNYRGTHMIADVGHWIQQEAPEETNRLLLDFLGGLRP
Cytochrome P450
Siraitia grosvenorii CYP87D18 (SEQ ID NO: 73)
MWTVVLGLATLFVAYYIHWINKWRDSKFNGVLPPGTMGLPLIGETIQLSRPSDSLDVHPFIQKKVERYGPIFKTCLAGRPVWSADAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGLIHK YIRSITLNHFGAEALRERFLPFIEASSMEALHSWSTQPSVEVP^NASALMVFRTSVNKMFGEDAKKLSGNIPGKFTKLLGGFLSLPLNFPGTTYHKCLKDMKEIQKKLREWDDRLANVGPDVEDFLGQAFKDKESEKFISEEFIIQLLFSISFASFESISTTLTLILKLLDEHPEVVKELEVEHEAIRKARADPDGPITWEEYKSMTFTLQVINETLRLGSVTPALLRKTVKDLQVKGKIIPEGWTIMLVTASRHR DPKVYKDPHIFNPWRWKDLDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILCTKYRWTKLGGGTIARABILSFEDGLHVKFTPKE
Cucumis melo (SEQ ID NO: 74)
[sequence illegible in source]
Cucurbita maxima (SEQ ID NO: 75)
[sequence illegible in source]
Cucumis sativus (SEQ ID NO: 76)
[sequence illegible in source]
Cucurbita moschata (SEQ ID NO: 77)
[sequence illegible in source]
Prunus avium (SEQ ID NO: 78)
[sequence illegible in source]
Populus trichocarpa (SEQ ID NO: 79)
MWAIGLVVVALVVIIITHMIEKWRSPKIEGVLPPGSMGWPLIGETLQFISPGKSLDLHPEVKKRMEKYGPIFKTSLV IIVSTDYEMNKYILQHEGTLVELWYLDSFAKFFALEGETRVNAIGTVHKYLRSITLNHFGVESLKESLLPKIEDMLHTNLAKWASQGPVDVKQVISVMVFNFTANKIFGYDA ENSKEKLSENYTKILNSFISLPLNIPGTSFHKCMQDREKMLKMLKDTLMERLNDPSKRRGDFLD QAIDDMKTEK DFIPQLMFGILFASFESMSTTLTLTF?
, N P RWE ELRAEHEAI VKKRENPNSRLTWEEYRSMTFTOMVVNETLRISNIPPGLFRKALKDFQVKGYTVPAGWTVMLVTPATQ LNPDTFKDPVT FNPWRWQELDQVTISKNFMP FGGGTRQCAGAE YS KL VL ST FLHILVTNY S FT K IRGGDVS RTP11SFGDGIHIKFTARA Prunus persica (SEQ TO NO: 80) MWTLVGLSLVGLLVIYFTHWIIKWRNPKCNGVLPPGSMGLPFIGETLNLIIPSYSLDLHPFIKK RLQRYGPIFRTSLAGRQWVTADPEFNNYLFQQEGRMVELWYLDTFSKIFVHEGBSKTNAVGMV HKYVRSIFLNHFGAERLKEKLLPQIEE FVNKSLCAWSSKASVEVKHAGSVMVFNFSAKOMI SY D AEKSSDDLSEKYTKIIDGLMSFPLNIPGTAYYNCLKHQKNVTTMLRDMLKERQISPETRRGDFL DQISIDMEKEKFLSEDFSVQLVFGGLFATFESISAVLALAFSLLAEHPSWEELTAEHEAILKN RENLNSSLTWDEYKSMTFTLQVINEILRLGNVAPGLLRRALKDIPVKGFTIPEGWTIMVVTSAL QLSPNTPEOPLEFRPWRWKDLDSYAVSKNFMPFGGGMRQCAGAEYSRVFLATFLHVLVTKYRWT TIKA ARIARNPILGFGDGIHIKFEE KKT Populus euphratica (SEQ ID NO: 81) MWTFVLCWAVLWYYTHWINKWRNPTCNGVLPPGSMGLPIIGETLELIIPSYSLDLHPFIKKR IQRYGPIFRTNILGRPAWSADPEINSYIFQNEGKLVEMWYMDTFSKLFAQSGESRTNAFGIIH KYARSLTLTHFGSESLKERLLPQVENIVSKSLQMWSSDASVDVKPAVSIMvCDFTAKQLFGYDA ENSSDKISEKFTKVIDAFMSLPLNIPGTTYHKCLKDKDSTLSILRNTLKERMNSPAESRGGDFL DQIIADMDKEKFLTEDFTVNLIFGILFASFESISAALTLSLKLIGDHPSVLEELTVEHEAILKN RENPDSPLTWAEYNSMTFSLQVINETLRLGNVAPGIjLRRALQDMQVKGYTIPAGWVIMVVNSAL HLNPATFKDPLEFNPWRWKDFDSYAVSKNLMPFGGGRRQCAGSEFTKLFMAIFLHKLVTKYRWN 1IKQ GN IGRN PILGFG DG IHISFSP KD1 Juglans regia (SEQ ID NO: 82) MWKVGLCVVGVIVVWFTRWINKWRNPKCNGILPPGSMGPPLIGESLQLIIPSYSLDLHPFIKKR VQRYGFIFRTSVVGQPMVVSTDVEFNHYLAKQEGRLVHFWYLDSFAEIFNLEDENAISAVGLIH KYGRSIVLMHFGTDSLKKTLLSQIEEIVt?KTLQTWSSLPSVEVKHAASVPLAFDLTAKQCFGYDV ENSAVKMSEKFLYTLDSLISFPFMIPGTVYHKCLKDKKEVLNMLRNIVKERMNSPEKYRGDFLD QITADMNKESFLTQDFIVYLLYGLLFASFESISASLSLTLKLLAEHPAVLQQLTAEHEAILKNR DNPN SSLTWDEYRSMT FT h’QVINEALRLGNVAPGLLRRALKDIEFKGY TIPAGWTIMLANSAI Q LNPNTYEDPLAFNPWRWQDLDPQIVSKNFMPFGGGIRQCAGAEYSKTFLATFLHVLVTKYRWTK VKGGKMARNPILWFADGIHINFALKHN Pyrus x bretschneideri (SEQ ID NO: 83) MWDWGLSFVALLVIYLTYWITQWKNPKCNGVLPPGSMGLPLIGETLNLLIPSYSLDLHPFIRK RLERYGPIFRTSLAGKPVLVSADPEFNNYVLKQEGRMVEFWYLDTFSKIFMQEGGNGTNQIGVI HKYARSIFLNHFGAECIKEKLLTQIEGSINKHLRAWSNQESVEVKKAGSIMALNFCAEHMIGYD 
AETATENLGEIYHRVFQGLISFPLNVPGTAYHNCLKIHKKATTMLRAFGjRERRSSPEKRRGDFL DQIIDDLDQEKFLSEDFCIHLIFGGLFAIFESISTVLTLFFSLLADHPAVLQELTAEHEALLKN REDPNSALTWDEYKSMTFTLQVINETLRLVNTAPGLLRRALKDIPVKGYTIPAGWTILLVTPAL hltsntfkdhlefnpwrwkdldslvisknfmpfgsglrqcagaefsraylstflhvlvtkyrwt TIKGARIS RR. PMLT FGDG AH IKFS E KKN Morus notabilis (SEQ ID NO: 84) 80 WO 2021/126960 PCT/US2020/065285 MNNTICLSVVGLVVlNISNWIRRWRNPKCNGVLPPGSMGFPDlGETLFDlIPTYSLDDHPFIKN RLQRYGSIFRTSIVGRPWISADPEFNNFLFQQEGSLVELYYLDTFSKIFVHEGVSRTNEFGW HKYIRSIFLNHFGAERLKEKLLPEIEQMVNKTLSAWSTQASVEVKHAASVLVLDFSAKQIISYD AKKSSESLSETYTRIIQGFMSFPLNIPGTAYNQCVKDQKKIIAMLRDMLKERRASPETNRGDFL DQISKDMDKEKFLSEDFVVQLIFGGLFATFESVSAVLALGFMLLSEHPSVLEEMIAEHETILKN REHPNSLLAWGEYKSMTFTLQVINETLRLGNVAPGLLRKALKDIRVKGFTIPKGWAIMt-IVTSAL QLSPSTFKNPLEFNPWRWKDLDSLVISKNFMPFGRGMRQCAGAEYSRAFMATFFHVLLTKYRWT TIKVGNVSRNPILRFGNGIHIKFSKKN Jatropha curcas (JcP450.1) (SEQ ID NO: 85) MWII GLCFASLLVTYCTEF FYKWRNPKCKGVLP PGSMGLPIIGETLQLIIPSYSLDHHPFIQKR IQRYGPIFRTNLVGRPVIVSADPEVNQYIFQQEGNSVEMWYLDAYAKIFQLDGESRLSAVGRVH KYIRSITLNNFGIENLKENLLPQIQDLVNQSLQKWSNKASVDVKQAASVMVFNLTAKQMFSYGV EKNSSEEMTEKFTGIFNSLMSLPLNIPGTTYHKCLKDREAMLKMLRDTLKQRLSSPDTHRGDFL DQAIDDMDTEKFLTGDCIPQLIFGILLAGFETTATTLTLAFKFLAEHPLVLEELTAEHEKILSK RENLESPL/TWDEYKSMTFTHHVINETLRLANFLPGLLRKALKDIQVKNYTIPAGWTIMVVKSAM QLNPEIYKDPLAFNPWRWKDLDSYTVSKNFMPFGGGSRQCAGADYSKLFMTIFLHVLVTKYRWR KIKGGDIARNPILG FGDGLH1EVSAKN Hevea brasiliensis (SEQ ID NO: 86) MLTWLLLVGFFIIYYTYWISKWRNPNCNGVLPPGSMGFPLIGETLQLLIPSYSLDLHPFIKKR IHRYGPIFRSNLAGRPVIVSADPEFNYYILSQEGRSVEIWYLDTFSKLFRQQGESRTNVAGYVH KYLRGAFLSQIGSENLREKLLLHIQDMVNRTLCSWSNQESVEVKHSASLAVCDFTAKVLFGYDA EKSPDNLSETFTRFVEGLISFPLNIPRTAYRQCLQDRQKALSILKNVLTDRRNSVENYRGDVLD LLLNEMGKEKFLTEDFICLIMLGGLFASFESISTITTLLLKLFSAHPEWQELEAEHEKILVSR HGSDSLSITWDEYKSMTFTHQVINETLRLGNVAPGLLRRAIKDVQFKGYTIPSGWTIMMVTSAQ QVNPEVYKDPLVFNPWRWKDFDSITVSKNFTPFGGGTRQCVGAEYSRLTLSLFIHLLVTKYRWT KIKEGEIRRAPMLGFGDGIH FKFSE KE Jatropha curcas (J0P45O.2) (SEQ ID NO: 87) 
MKRAIYICLARITKQGLSLIEMLMTELLFGAFFIIFLTYWINRWRNPKCNGVLPPGSMGLPLLG ETLQLLI PRY SLDLHPFIRRRIQRYGF IFRSNVAGRPIVFTADPELMHY IFIQERRLVELWYMD TFSNLFVLDGESRPTGATGYIHKYMRGLFLTHFGAERLKDKLLHQIQELIHTTLQSWCKQPTIE VKHAASAVICDFSAKFLFGYEAE KS P FNMS E RFAKFAE 8 LVS F PLNIPGTAY HQ S LE DREKVMK LLKNVLRERRNSTKKSEEDVLKQILDDMEKENFITDDFIIQILFGALFAISESIPMTIALLVKF LSAQPSVVEELTAEHEEILKNKKEKGLDSSETWEDYKSMTFTLQVINETLRIANVAPGLLRRTL RDIHYKGYTIPAGWTIMVLTSSRHMNPETYKDPVEFNPWRWKDLDSQTISECNFTPFGGGTRQCA GAEYSRAFISMFLHVLVTKYRWKMVKEGKICRGPTLRIEDGIHIKLYEKH Chenopodium quinoa (SEQ ID NO: 88) MWPTMGLYVATIVAICFILLELKRRNSREKQWLPPGSKGFPLIGETLQLLVPSYSLDLPSFIR TRIQRYGPIFKTRLVGRP7VMSADPGFNRYIVQQEGKSVEMWYLDTFSKLFAQDGEARTTAAGL VHKYLRNLTLSHFGSESLRVNLLPHLESLVRNTLLGWSSKDTIDVKESALTMTIEFVAKQLFGY DSDKSKEKIGEKFGNISQGLFSLPLNIPGTTYHSCLKSQREVMDMMRTALKDRLTTPESYRGDF LDHALKDLSTEKFLSEEFILQIMFGLLFASSESTSMTLTLVLKLLSENPHVLKELEAEHERIIK NKESPDSPI/rWAEVKSMTFTLQVINESLRLGNVSLGILRRTLKDIEINGYTIPAGWTIMLVTSA CQYNSDIYKDPLTFNPWRWKEMQPDVIAKNFMPFGGGTRQCAGAEFAKVLMTIFLHNLVTNYRW E KIKGGEIVRTPILGFRNALRVKLTKKN Spinacia oleracea (SEQ ID NO: 89) 81 WO 2021/126960 PCT/US2020/065285 MvLLPGSKGi'Ph'lGETLQLLLPSYSLDLPSFIRTRlQRYGPIFQTRLVGRPVVVSADPGh'NRYI VQQEGKMVEMWYLDTFSKIFAQQGEGRTNAAGLVHKYLRNITFTHFGSQTLRDKLLPHLEILVR KTLHGWTSQESIDVKEAALTMTIEFVAKQLFGYDSDKSKERIGDKFANISQGLLSFPLNIPGTT YHSCLKSQREITYDMMRKTLKERLASPDTCQGDFLDHALKDLNTDKFLTEDFILQIAFGLLFASS ESTSITLTLILKFLSENPHVLEELEVEHERILKNRESPDSPLTWAEVKSMTFTLQVINESLRLG NVSLGLLRRTLKDIEINGYTIPAGWTIMLVTSACQYNSDVYKDPLTFNPWRWKEMQPDVIAKNF MPFGGGTRQCAGAEFAKVLMTIFLHVLVTTYRNEKIKGGEIIRTPILGFRNGLHVKLIBBKARLS Manihot esculenta (SEQ ID NO: 90) MEMWSVWLYIISLIIIIATHWIYRWRNPKCNGKLPPGSMGIPFIGETIQFLIPSKSLDVPNFIK KRMNKYGPLFRTMLVGRPVIVSSDPDFNYYLLQREGKLVERWYMDSFSKLLHHDVTQIIIKHGS IHKYLRNLVLGHFGPEPLKDKLLPQLESAISQRLQDWSKQPSIEAKSASSAMIFDFTAKILFSY EPEKSGENIGEIFSNFLQGLMSIPLNIPGTAFHRCLKMQKRAIQMITEILKERRSMPEIHKGDF LDQIVEDMKKDSFWTEEFAIYMMFGLLLASFETISSTLALAIIELTDNPPVVQKLTEEHEAILK ARENRDSGLSWKEYKSLSYTHQVVNESLRLASVAPGILRRAITDIQVDGYTIPKGWTIMWPAA 
VQLNPNTFEDPLVFNPSRWEDMGAVAI-LAKMFIAFGGGSRSCAGAEFSRVLMSVFVHVFVTNYRW TKIKGGDMVRSPALGFGNGFHIRVSEKQL
Olea europaea var. sylvestris (SEQ ID NO: 91)
MAALDLSTVGYLIVGLLTVYITHWIYKWRNPKCNGVLPPGSMGLPLIGETIQLVIPNASLDLPP FIKKRMKRYGPIFRTNVAGRPVIITADPEFNHFLLRQDGKLVDTWSMDTFAEVFDQASQSSRKY TRHLTLNHFGVEALREKLLPQMEDMVRTTLSNWSSQESVEVKSASVTMAIDYAARQIYSGNLEN APLKISDLFRDLVDGLMSFPINIPGTAHHRCLQTHKKVREMMKDIVKTRLEEPERQYGDMLDHM IEDMKKESFLDEDFIVQLMFGLFFVTSDSISTTLALAFKLLAEHPLVLEELTAEHBAILKKREK SESHLTWNDYKSMTFTLQVINEVLRLGNIAPGFFRRALQDIPVNGYTIPSGWVIMIATAGLHLN SNQFEDPLKFNPWRWKVCKVSSVTAKCEMPFGSGMKQCAGAEYSRVLLATFTHVLTTKYRWAIV KGGKIVRSPIIRFPDGFHYKIIEKTN
Cucurbita pepo subsp. pepo (SEQ ID NO: 171)
MWAIVVGLAELAVAYYIHWINKWKDSKFNGVLPPGTMGLPLVGETLQLARPSDSLDVHPFIKKK VKRYGPIFKTCLAGRPWVSTDAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGFIHK YIRSITLNHFGAESLRERFLPRIEESAKETLRYWATQPSVEVKDSAAVMVFRTSMVKMVSEDSS KLLTGGLTKKFTGLLGGFLTLPINVPGTTYNKCMKDMKEIQKKLREILEGRLASGGGSDEDFLG QAIKDZKGSQQFISDDFIIQLLFSISFASFESISTTLTLVLNYLADHPDVVKELEAEHEAIRNAR ADPDGPITWEEYKSMTFTLHVIFETLRLGSVTPALLRKTTKELQINGYTIPEGWTVMLVTASRH RDPAVYKDPHTFNPWRWKELDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILFTKYRWTK LKGGEOZARAHILSFEDGLHVKFTPRE
Capsella rubella CYP705A38 (SEQ ID NO: 172)
MATLMTIDLQNCFIFIILSLLCYYLLFKKQKGSRAGCVLPPSPPSLPIIGHLHLLLSNLTHKSL QNISTKFGSFLYLRWNLPIVLVSSPSVAYEIYKTHDVNVSSRVATSLGDSLFLGSSGFITAPY GDYWKFMKKMVATKLLRPQAIEQSRGGRAEELQMFYENLLDKAMKKESIEVSKEAMKLTNNTIC RMSMGRSCSDENGEAERVRELLVKSTALTKKIFFANMFPRIPLFKKEIMGVSSEFDDLLERLLV EHEERVEEHENKDMMDLLLEAYRDENAEYKISRKQIKSLFVEIFLGGTDTSAQTVQWILAELIN KPNILERIREEIDSVVGKSRLMKETDLPNLPYLQATVKEGLRMHPPSPLLVRTFQESCEVKGFY MPEKTMLVINVYALMRDPDTWEDPNEFKPERFLLSSRSRQEDEKEQGNMKYLPFGAGRRGCPGS NLAYLFVGIAVGVMVQCFDWKIKEDKVNMEETTAGMNLAMAHPFKCTPWRNDPLTLNLENPSS
Brassica rapa CYP705A37v2 (SEQ ID NO: 173)
MlVDh'QNCSIFTLLCFFTh’LUYSvFh’FbKKTNDLGPSPPSLPlIGHLHHFLSGLPHKAFQKlSTKYGPLLHLHIFSFPIVLVSSPTMAHEIFTTHDLNISSRNTPAIDESLLFGPSGFTVAPYGDYVKFIKKLLATKLLRPRAIEKSRGVRAEELKQFYLKVQDKALKKESIEIGKETMKFTNNMICRMSIGFSEENGEVETLRELIIKSFALSKQILFVNVLRRPLEMLGLMSLFKKDIMDVSRGFDELLERV LAEHEEKREEDQDMDMMDLLLEACRDENAEYKITRNQIKSLFVEIFLGGTDTSAHTTQWTMAEL VNNPNILGRLRDEIDLWGKERLIQETDLPNLPYLQAWKEGLRLHPPAPLLVRMFDKKCVIKDFFKVPEKTTLWNVYGVMRDPDSWEDPNEFKPERFLTSKQEEDKVLKYLPFAAGRRGCPATNVGYTFVGISIGMMVQCFDWSIKEKVSMEEVYAGMSLSt4AHPPTCTPVSRLSL Siraitia qrosvenorii (SEQ ID NO: 174) MDFFSAFLLLLLTVLILLQIRTRRRNLPPSPPSLPIIGHLHLLKRPIHRNFHKIAAEYGPIFSL RFGSRLAVIVSSLDIAEECFTKNDLIFANRPRLLISKHLGYNCTTMATSPYGDHWRNLRRLAAI EIFSTARLNSSLSTRKDEIQRLLLKLHSGSSGEFTKVELKTMFSELAFNALMRIVAGKRYYGDE VSDEEEAREFRGLMEEISLHGGASHWVDFMPILKWIGGGGFEKSLVRLTKRTDKFMQALIEERR NKKVLERKNSLLDRLLELQASEPEYYTDQIIKGLVLVLLRAGTOTSAVTLNWAMAQLLNNPELL AKAKAELDTKIGQDRPVDEPDLPNLSYLQAIVSETLRLHPAAPMLLSHYSSADCTVAGYDIPRG TTLLVNAWAIHRDPKLWDDPTSFRPERFLGAANELQSKKLIAFGLGRRSCPGDTMALRFVGLTL GLLIOCYQWKKCGDEKVDMGEGGGITIHKAKPLEAMCKARPAMYKLLLNALDKI Caiaelina sativa (SEQ ID NO: 175) MATMMIFDFQNCFIFIILCFVSLLCYTILFKKQESSRTGCVLPPSPPSLPIIGHLHLLLSSLTH KSLHNISSKFGPFLYLRWNLPIVLVSSASVAYEIYKTQDVNVSSRVATSLGDSLFLGSSGFIT APYGDYWKFMKKMVATKLLRPQAIEQSRGGRAEELQGLYENLLDKANKKESIEISKEAMKFTNN IICRMSMGRSCSDENGEAE IVRELLVKSTALTKKI FFANMFPRIPLFKKEIMGVSNQ FDELLE R LLVEHEERVEEHENKDMMDLLLEAFRDEHAEYKISRKQIKSLFVEIFLGGTDTSAQTVQWIMAE LINKPSIIEKIREEIDSWGKTRLIKETDLPKLPYLQVWKEGLRMHPPSPLWRTFQESCEVK GFYMPEKTMLVINVYALMRDPESWEDPNEFKPERFLPSSKSRQDEEKEQGLKYLPFGAGRRGCP GSNLAYLFVGLAVGVMVQCFDWKIKEDKVNMEETTAGMNLAMAHPFKCTPVVRIDPLTFNLKSP SP Raphanus sativus (SEQ ID MO: 176) MAPMTIDFQTCFIFILLSFFSFFCYFFFFKKTNDLGPSPPSLPIIGHLHHFLSVLPHKAFQQISTKYGPLLHLRIFSFPIVLVSSATMAYEIFTTHDLNISSRNAPAIDESLVFGSSGFIVSPYGDYVKFIKKLLATKLLRPRAIEKSRGVRAEELKQFYLKLHDKALKKESIEIGNETMKFTNNMICGMSM 
GRSCSEENGETETVRGLINKSFALSRKILFVNVLRRPLEKLGLLSLFKKDILDVSNRFDELLERILLEHEEKPEEEQDMDMMDLLLEASRDENAEYKITRNQIKALFVEIFMGGTDTSAHTTQWTMAELVNNPNSLEKLRDEIDMVVGKSRLIQETDLPNLPYLQAVVKEGLRLHPPAPLLVRMFEKKCVIKDFFNVPEKTTLWNLYGVMRDPDSWEDPNEFKPERFLTSKQEEEKTLKYLPFAAGRRGCPATNVAYIFVGISIGMMVQCFDWSIKDKVSMEEVYAGMSLSMAHPPKFTPVSRLSL Cucuniis sativus (C5CYP87D20) (SEQ ID MO: 194) MAWTILLGLATLAIAYYIHWVNKWKDSKFNG'lijPPGTMGLPLIGETIQLSRPSDSLDVHPFIQR KVKRYGPIFKTCLAGRPVVVSTDAEFNHYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGLIH KYIRSITLNHFGAESLRERFLPRIEESARETLHYWSTQTSVEVKESAAAMVFRTSIVKMFSEDS SKLLTEGLTKKFTGLLGGFLTLPLMLPGTTYHKCIKDMKQIQKKLKDILEERLAKGVK1DEDFL GQAIKDKESQQFISEEFIIQLLFSISFASFESISTTLTLILNFLADHPDVVKELEAEHEAIRKA RADPDGPITWEEYKSMNF'TLNVICE'rLRLGSV'rPALLRKTTKElQIKGY'riPEGWTVMLVTASR HREPEVYKDPDTFNPWRWKELDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILFTKYRWR KLKGGKIARAHILRFEDGLYVN FTPKE 83 WO 2021/126960 PCT/US2020/065285 Cucumis sativus (sohB C3CYP87D20) (SEQ ID NO: 195) MALLSEYGLFLAKIVTWLAIAAIAAIIHWVNKWKDSKFNGVLPPGTMGLPLIGETIQLSRPSD SLDVHPFIQRKVKRYGPIFKTCLAGRPVVVSTDAEFNHYIMLQEGRAVEMWYLDTLSKFFGLDT EWLKALGLIHKYIRSITLNHFGAESLRERFLPRIEESARETLHYWSTQTSVEVKESAAAMVFRT SIVKMFSEDSSKLLTEGLTKKFTGLLGGFLTLPLNLPGTTYHKCIKDMKQIQKKLKDILEERLA KGVKIDEDFLGQAIKDKESQQFISEEFTIQLLFSTSFASFESISTTLTLILNFLADHPDWKEL EAEHEAIRKARADPDGPITWEEYKSMNFTLNVICETLRLGSVTPALLRKTTKEIQIKGYTIPEG WTVMLVTASRHRDPEVYKDPDTFNPWRWKELDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFL HIL FTKYRWRKLKGGKIARAHILRFEDGLYVNFT PRE Cucumis sativus (zipA CsCYP87D20) (SEQ ID Nu196 ؛) MAQDLRLILIIVGAIAIIALLVHGFHWVNKWKDSKFNGVLPPGTMGLPLIGETIQLSRPSDSLD VHPFIQRKVKRYGPIFKTCLAGRPVWSTDAEFNHYIMLQEGRAVEMWYLDTLSKFFGLDTEWL KALGLIHKYIRSITLNHFGAESLRERFLPRIEESARETLHYWSTQTSVEVKESAAAMVFRTSIV KMFSEDSSKLLTEGLTKKFTGLLGGFLTLPLNLPGTTYHKCIKDMKQIQKKLKDILEERLAKGV KIDEDFLGQAIKDKESQQFISEEFIIQLLFSISFASFESISTTLTLILNFLADHPDWKELEAE HEAIRKARADPDGPITWEEYKSMNFTLNVICETLRLGSVTPALLRKTTKEIQIKGYTIPEGWTV MLVTASRHRDPEVYKDPDTFNPWRWKELDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHIL FTKYRWRKLKGGKIARAHILRFEDGLYVNFTPKE Cucumis sativus (CsCYP87D20_mut) 
(SEQ ID NO: 197) MAWTILLGLATLAIAYYIHWVNKWKDSKFNGVLPPGTMGLPLIGETIQFSRPSDSLDVHPFIQR KVKRYGPIFKTCIAGRPVVVSTDAEFNHYIMLQEGRAVEMWYLDTFSKFLGLDTEWLKALGLIH KYIRSITLNHFGAESLRERFLPRIEESARETLHYWSTQTSVEVKESAAAMVFRTSIVKMFSEDS SKLLTEGLTKKFTGLLGGFLTLPLNLPGTTYHKCIKDMKQIQKKLKDILEERLAKGVKIDEDFL GQAIKDKESQQFISE EFIIQLLFSISFASFASISTTLTLILN FLADH PDWKELEAEHEAIRKA RADPDGPITWEEYKSMNFTLNVICETLRLGSVTPALLRKTTKEIQIKGYTIPEGWTVMLVTASR HRDPEVYKDPDTFNPWRWKELDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILFTKYRWR KLKGGKIARALILRFEDGLYVNFTPKE Cucumis sativus (sohB CsCYP87D20_mut) (SEQ ID NO: 198) MALLSEYGLFLAKIVTWLAIAAIAAIIHWVMKWKDSKFNGVLPPGTMGLPLIGETIQFSRPSD SLDVHPFIQRKVKRYGPIFKTCIAGRPVVVSTDAEFNHYIMLQEGRAVEMWYLDTFSKFLGLDT EWLKALGLIHKYIRSITLNHFGAESLRERFLPRIEESARETLHYWSTQTSVEVKESAAAMVFRT SIVKMFSEDSSKLLTEGLIKKFTGLLGGFLTLPLNLPGTTYHKCIKDMKQIQKKLKDILEERLA KGVKIDEDFLGQAIKDKESQQFISEEFIIQLLFSISFASFASISTTLTLILNFLADHPDVVKEL EAEHEAIRKARADPDGPITWEEYKSMNFTLNVICETLRLGSVTPALLRKTTKEIQIKGYTIPEG WTVMLVTASRHRDPEVYKDPDTFNPWRWKELDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFL HILFTKYRWRKLKGGKIARALILRFEDGLYVNFTPKE Cucurbita pepo subsp. pepo (sohB CppCYP) (SEQ ID NO: 199) MALLSEYGLFLAKIVTWLAIAAIAAIIHWINKWKDSKFNGVLPPGTMGLPLVGETLQLARPSD SLDVHPFIKKKVKRYGPIFKTCLAGRPW7STDAEFNMYIMLQEGRAVEMWYLDTLSKFFGLDT EWLKALGFIHKYIRSITLNHFGAESLRERFLPRIEESAKETLRYWATQPSVEVKDSAAVMVFRT SMVKMVSEDSSKLLTGGLIKKFTGLLGGFLTLPINVPGTTYNKCMKDMKEIQKKLREILEGRLA SGGGSDEDFLGQAIKDKGSQQFISDDFI IQLLFS ISFAS FES ISTTLTLVLNYLADHPDWKEL EAEHEAIRNARADPDGPITWEEYKSMTFTLHVIFETLRLGSVTPALLRKTTKELQINGYTIPBG WTVMEVTASRRRDPAVYKDPHTFNPWRWNELDSITiQKNFMPFGGGLRHCAGAEYSKVILCTFLHIL ETKYRWT KLKGGKVARAH ILS FEDGLHVKFT PRE Cucurbita pepo subsp. 
pepo (17alpha CppCYP) (SEQ ID NO: 200) MALLLAVFHWINKWKDSKFNGVLPPGTMGLPLVGETLQLARPSDSLDVRPFIKKKVKRYGPIFK TCLAGRPVWSTDAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGFIHKYIRSITLNH FGAESLRERFLPRIEESAKETLRYWATQPSVEVKDSAAVMVFRTSMVKMVSEDSSKLLTGGLTK KFTGLLGGFLTLPINVPGTTYNKCMKDMKEIQKKLREILEGRLASGGGSDEDFLGQAIKDKGSQ QFISDDFIIQLLFSISFASFESISTTLTLVLNYLADHPDWKELEAEHEAIRNARADPDGPITW EEYKSMTFTLHVIFETLRLGSVTPALLRKTTKELQINGYTIPEGWTVMLVTASRHRDPAVYKDP HTFNPWRWKELDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILFTKYRWTKLKGGKVARA HILSFEDGLHVKFTPKE Siraitia grosvenorii (CYP1798) (SEQ ID NO: 221) MEMSSSVAATISIWMWVCIVGVGWRWNWVWLRPKKLEKRLREQGLAGNSYRLLFGDLKERAA MEEQANSKPINFSHDIGPRVFPSMYKTIQMYGKNSYMWLGPYPRVHIMDPQQLKTVFTLVYDIQ KPNLNPLIKELLDGIVTHEGEKWAKHRKIINPAFHLEKLKDMIPAFFHSCNEIVNEWERLISKE GSCELDVMPYLQNLAADAISRTAFGSSYEEGKMIFQLLKELTDLWKVAFGVYIPGWRFLPTKS NNKMKEINRKIKSLLLGIINKRQKAMEEGEAGQSDLLGILMESNSNEIQGEGNNKEDGMSIEDV IEECKVFYIGGQETTARLLIWTMILLSSHTEWQERARTEVLKVFGNKKPDFDGLSRLKWTMIL NEVLRLYPPASMLTRIIQKETRVGKLTLPAGVILIMPIILIHRDHDLWGEDANEFKPERFSKGV SKAAKVQPAFFPFGWGPRICMGQNFAMIEAKMALSLILQRFSFELSSSYVHAPTWFTTQPQHG AHIVLRKL Cytochrome P450 Reductase Stevia rebaudiana (SrCPRl) (SEQ ID NO: 92) MAQSDSVKVSPFDLVSAAMNGKAMEKLNASESEDPTTLPALKMLVENRELLTLFTTSFAVLIGC LVFLMWRRSSSKKLVQDPVPQVIWKKKEKESEVDDGKKKVSIFYGTQTGTAEGFAKALVEEAK VRYEKTSFKVIDLDDYAADDDEYEEKLKKESLAFFFLATYGDGEPTDNAANFYKWFTEGDDKGE WLKKLQYGVFGLGNRQYEHFNKIAIWDDKLTEMGAKRLVPVGLGDDDQCIEDDFTAWKELVWP ELDQLLRDEDDTSVTTPYTAAVLEYRVVYHDKPADSYAEDQTHTNGHWHDAQHPSRSNVAFKK ELRTSQSDRSCTHLEFDISHTGLSYETGDHVGVYSENLSEWDEALKLLGLSPDTYFSVHADKE DGTPIGGASLPPPFPPCTLRDALTRYADVLSSPKKVALLALAAHASDPSEADRLKFLASPAGKD EYAQKIVANQRSLLEVMQSFPSAKPPLGVFFAAVAPRLQPRYYSISSSPKMSPNRIHVTCALVY ETTPAGRIHRGLCSTWMKNAVPLTESPDCSQASIFVRTSNFRLPVDPKVPVIMIGPGTGLAPFR GFLQERLALKESGTELGSSIFFFGCRNRKvDFIYEDELNNFVETGALSELIVAFSREGTAKEYV QHKMSQKASDIWKLLSEGAYLYVCGDAKGMAKDVHRTLHTIVQEQGSLDSSKAELYVKNLQMSG RYLRDVW Arabidopsis thaliana CPR1 (AtCPRl) (SEQ ID NO: 93) MATSALYASDLFKQLKSIMGTDSLSDDVVLVIATTSLALVAGFVVLLWKKTTADRSGELKPLMI 
PKSLMAKDEDDDLDLGSGKTRVSIFFGTQTGTAEGFAKALSEEIKARYEKAAVKVIDLDDYAAD DDQYEEKLKKETLAFFCVATYGDGEPTDNAARFYKWFTEENERDIKLQQLAYGVFALGNRQYEH FNKIGIVLDEELCKKGAKRLIEVGLGDDDQSIEDDFNAWKESLWSELDKLLKDEDDKSVATPYT AVIPEYRWTHDPRFTTQKSMESNVANGNTTIDIHHPCRVDVAVQKELHTHESDRSCIHLEFDI SRTGITYETGDHVGVYAENHVEIVEEAGKLLGHSLDLVFSIHADKEDGSPLESAVPPPFPGPCT LGTGLARYADLLNPPRKSALVALMYATEPSEAEKLKHLTSPDGKDEYSQWIVASQRSLLEVMA AFPSAKPPLGVFFAAIAPRLQPRYYSISSSPRLAPSRVHVTSALVYGPTPTGRTHKGVCSTWMK NAVPAEKSHECSGAPIFIRASNFKLPSNPSTPIVMVGPGTGLAPFRGFLQERMALKEDGEELGS 85 WO 2021/126960 PCT/US2020/065285 SLLFFGCRNRQMDFlYEDELNbl FVDQGV1SELIMAFSREGAQKEYVQHKM4EKAAQVWDL1KEEGYLYVCGDAKGMARDVHRTLHTIVQEQEGVSSSEAEAIVKKLQTEGRYLRDVW Arabidopsis thaliana. CPR2 (AtCPR2) (SEQ ID NO: 94) MASSSSSSSTSMIDLMAAIIKGEPVIVSDPANASAYESVAAELSSMLIENRQFAMIVTTSIAVL IGCIVMLVWRRSGSGNSKRVEPLKPLVIKPREEEIDDGRKKVTIFFGTQTGTAEGFAKALGEEA KARYEKTRFKIVDLDDYAADDDEYEEKLKKEDVAFFFLATYGDGEPTDNAARFYKWFTEGNDRG EWLKNLKYGVFGLGNRQYEHFNKVAKVVDDILVEQGAQRLVQVGLGDDDQCIEDDFTAWREALW PELDTILREEGDTAVATPYITAAVLEYRVSIHDSEDAKFNDTNMANGNGYTVEDAQHPYKANVAV KRELHTPESDRSCIHLEFDIAGSGLTYETGDHVGVLCDNLSETVDEALRLLDMSPDTYFSLHAE KEDGTPISSSLPPPFPPCNLRTALTRYACLLSSPKKSALVALAAHASDPTEAERLKHLASPAGK DEYSKWWESQRSLLEVMAEFPSAKPPLGVFFAGVAPRLQPRFYSISSSPKIAETRIHVTCALV YEKMPTGRIHKGVCSTWMKNAVPYEKSENCSSAP1FVRQSNFKLPSDSKVPIIMIGPGTGLAPF RGELQERLALVESGVELGPSVLFFGCRNRRMDFIYEEELQRFVESGALAELSVAFSREGPTKEY VQHKMMDKASDIWNMISQGAYLYVCGDAKGMARDVHRSLHTIAQEQGSMDSTKAEGFVKNLQTS GRYLRDVW Arabidopsis thaliana (AtCPR3) !SEQ ID NO: 95) MASSSSSSSTSMIDLMAAIIKGEPVIVSDPANASAYESVAAELSSMLIENRQFAMIVTTSIAVL IGCIVMLVWRP.SGSGNSKRVEPLKPLVIKPREEEIDDGRKKVTIFFGTQTGTAEGFAKALGEEA KARYEKTRFKIVDLDDYAADDDEYEEKLKKEDVAFFFLATYGDGEPTDNAARFYKWFTEGNDRG EWLKMLKYGVFGLGNRQYEHFNKVAKWDDILVEQGAQRLVQVGLGDDDQC1EDDFTAWREALW PELDTILREEGDTAVATPYTAAVLEYRVSIHDSEDAKFNDITLANGNGYTVFDAQHPYKANVAV KRELHTPESDRSCIHLEFDIAGSGLTMKLGDHVGVLCDNLSETVDEALRLLDMSPDTYFSLHAE KEDGTPISSSLPPPFPPCNLRTALTRYACLLSSPKKSALVALAAHASDPTEAERLKHLASPAGK 
DEYSKWWESQRSLLEVMAEFPSAKPPLGVFFAGVAPRLQPRFYSISSSPKIAETRIHVTCALV YEKMPTGRIHKGVCSTWMKNAVPYEKSEKLFLGRPIFVRQSNFKLPSDSKVPIIMIGPGTGLAP FRGFLQERLALVESGVELGPSVLFFGCRNRRMDFIYEEELQRFVESGALAELSVAFSREGPTKE YVQHKMMDKASDIWNMISQGAYLYVCGDAKGMARDVHRSLHTIAQEQGSMDSTKAEGFVKNLQT SGRYLRDVI Stevia rebaudiana CPR2 (SrCPR2)(SEQ 1D NO: 96) MAQSESVEASTIDLMTAVLKDTVIDTANASDNGDSKMPPALAMMFEIRDLLLILTTSVAVLVGC FVVLWKRSSGKKSGKELEPPKIWPKRRLEQEVDDGKKKVTIFFGTQTGTAEGFAKALFEEAK ARYEKAAFKVIDLDDYAADLDEYAEKLKKETYAFFFLATYGDGEPTDNAAKFYKWFTEGDEKGV WLQKLQYGVFGLGNRQYEEFNKIGIVVDDGLTEQGAKRIVPVGLGDDDQSIEDDFSAWKELVWP ELDLLLRDEDDKAAATPYTAA.IPEYPkVVFHDKPDA*FSDDHTQTNGHAVHDA*QHPCRSNVA.vTKKE LHTPESDRSCTHLEFDISHTGLSYETGDHVGVYCENLIEWEEAGKLLGLSTDTYFSLHIDNED GSPLGGPSLQPPFPPCTLRKALTNYADLLSSPKKSTLLALAAHASDPTEADRLRFLASREGKDE YAEWWANQR S L L EVMEAFPSARPPLGVF FAAVAPRLQPRYYSIS S S PKMEPNRIHVTCALVY E KTPAGRIHKGICSTWMKNAVPLTESQDCSWAPIFVRTSNFRLPIDPKVPVIMIGPGTGLAPFRG FLQERLALKESGTELGSSILFFGCRNRKVDYIYENELNNFVENGALSELDVAFSRDGPTKEYVQ HKMTQKASEIWNMLSEGAYLYVCGDAKGMAKDVHRTLHTIVQEQGSLDSSKAELYVKNLQMSGR YLRDVW Stevia rebaudiana CPR3 (SrCPR3) (SEQ ID NO: 97) MAQSNSVKISPLDLVTALFSGKVLDTSNASESGESAMLPTIAMIMENRELLMILTTSVAVLIGC VWLVWRRSSTKKSALEPPVIWPKRVQEEEVDDGKKKVTVFFGTQTGTAEGFAKALVEEAKAR YEKAVFKVIDLDDYAADDDEYEEKLKKESLAFFFLATYGDGEPTDNAARFYKWFrrEGDAKGEWL NKLQYGVFGLGNRQYEHFNKIAKVVDDGLVEQGAKRLVPVGLGDDDQCIEDDFTAWKELTCPEL 86 WO 2021/126960 PCT/US2020/065285 DQlLRDEDDFTVATPYTAAvAEYRVVFHFKPDALSEDYSYTNGHAVHDAQHPCRSNVAvKKELH SPESDRSCTHLEFDISNTGLSYETGDHVGVYCENLSEWNDAERLVGLPPDTYFSIHTDSEDGS PLGGASLPPPFPPCTLRKALTCYADVLSSPKKSALLALAAHATDPSEADRLKFLASPAGKDEYS QWIVASQRSLLEVMEAFPSAKPSLGVFFASVAPRLQPRYYSISSSPKMAPDRIHVTCALVYEKT PAGRIHKGVCSTWMKNAVPMTESQDCSWAPIYVRTSNFRLPSDPKVPVIMIGPGTGLAPFRGFL QERLALKEAGTDLGLSILFFGCRNRKVDFIYENELNNFVETGALSELIVAFSREGPTKEYVQHK MSEPVISDIWNLLSEGAYLYVCGDAKGMAKDVHRTLHTIVQEQGSLDSSKAELYVKNLQMSGRYL RDVW Artemisia annua. 
OPR (AaCPR) (SEQ ID NO: 98) MAQSTTSVKLSPFDLMTALLNGKVS FDTSI4TSDTNIPLAVFMENRELLMILTTSVAVLIGCVW LVWRRS S SAAKKAAE 3 PVIWPKKVTE DE VDDGRKKVTV F FGT QT GTAE G FAKALVE EAKARY E KAVFKVIDLDDYAAEDDEYEEKLKKESLAFFFLATYGDGEPTDNAARFYKWFTEGEEKGEWLDK LQYAVFGLGNRQYEHFNKIAKWDEKLVEQGAKRLVPVGMGDDDQCIEDDFTAWKELVWPELDQ LLRDEDDTSVATPYTAAVAEYRWFHDKPETYDQDQLTNGHAVHDAQHPCRSNVAVKKELHSPL SDRSCTHLEFDISNTGLSYETGDHVGVYVENLSEVVDEAEKLIGLPPHTYFSVHADNEDGTPLG GASLPPPFPPCTLRKALASYADVLSSPKKSALLALAAHATDSTEADRLKFLASPAGKDEYAQWI VASHRSLLEVMEAFPSAKPPLGVFFASVAPRLQPRYYSISSSPRFAPNRIHVTCALVYEQTPSG RVHKGVCSTWMKNAVPMTESQDCSWAPIYVRTSNFRLPSDPKVPVIMIGPGTGLAPFRGFLQER LAQKEAGTELGTAILFFGCRNRKVDFIYEDELNNFVETGALSELVTAFSREGATKEYVQHKMTQ KASDIWNLLSEGAYLYVCGDAKGMAKDVHRTLHTIVQEQGSLDSSKAELYVKNLQMAGRYLRDV W CPR (PgCPR) (SEQ ID NO: 99) MAQSSSGSMSPFDFMTAIIKGKMEPSNASLGAAGEVTAMILDNRELVMILTTSIAVLIGCWVF IWRRSSSQTPTAVQPLKPLLAKETESEVDDGKQKVTIFFGTQTGTAEGFAKALADEAKARYDKV TFKVVDLDDYAADDEEYEEKLKKETLAFFFLATYGDGEPTDNAARFYKWFLEGKERGEWLQNLK FGVFGLGNRQYEHFNKIAIWDEILAEQGGKRLISVGLGDDDQCIEDDFTAWRESLWPELDQLL RDEDDTTVSTPYTAAVLEYRWFHDPADAPTLEKSYSNANGHSWDAQHPLRANVAVRRELHTP ASDRSCTHLEFDISGTGIAYETGDHVGVYCENLAETVEEALELLGLSPDTYFSVHADKEDGTPL SGSSLPPPFPPCTLRTALTLHADLLSSPKKSALLALAAHASDPTEADRLRHLASPAGKDEYAQW IVASQRSLLEVMAEFPSAKPPLGVFFASVAPRLQPRYYSISSSPRIAPSRIHVTCALVYEKTPT GRVHKGVCSTWMBOISVPSEKSDECSWAPIFVRQSNFKLPADAKVPIIMIGPGTGLAPFRGFLQE RLALKEAGTELGPSILFFGCRNSKMDYIYEDELDNFVQNGALSELVLAFSREGPTKEYVQHKMM EKASDIWNLISQGAYLYVCGDAKGMARDVHRTLHTIAQEQGSLDSSKAESMVKNLQMSGRYLRD VW Camptotheca acuminate CaCPR (SEQ ID NO: 201) MAQSSSVKVSTFDLMSAILRGRSMDQTNVSFESGESPALAMLIENRELVMILTTSVAVLIGCFV VLLWRRSSGKSGKVTEPPKPLMVKTEPEPEVDDGKKKVSIFYGTQTGTAEGFAKALAEEAKVRY EKASFKVIDLDDYAADDEEYEEKLKKETLTFFFLATYGDGEPTDNAARFYKWFI-4EGKERGDWLK NLHYGVFGLGNRQYEHFtlRIAKWDDTIAEQGGKRLIPVGLGDDDQCIEDDFAAWRELLWPELD QLLQDEDGTTVATPYTMVLEYRWFHDSPDASLLDKSFSKSNGHAVHDAQHPCRANVAVRREL HTPASDRSCTHLEFDISGTGLVYETGDHVGVYCENLIEWEEAEMLLGLSPDTFFSIHTDKEDG TPLSGSSLPPPFPPCTLRRALTQYADLLSSPKKSSLLALAAHCSDPSEADRLRHLASPSGKDEY 
AQWWASQRSLLEVMAEFPSAKPPIGAFFAGVAPRLQPRYYSISSSPRMAPSRIHVTCALVFEK TPVGRIHKGVCSTWMKNAVPLDESRDCSWAPIFVRQSNFKLPADTKVPVLMIGPGTGLAPFRGF LQ EREALKEAGAELGPAILFFGCRNRQMDYIYEDELNNFVETGALSELIVAFSREGPKKEYVQH KMMEKASDIWNMISQEGYIYVCGDAKGMARDVHRTLHTIVQEQGSLDSSKTESMVKNLQMNGRY LRDVW Non-heme iron oxidase Acetobacter pasteurianus subsp. ascendens (ApGA2ox) (SEQ ID NO: 100) MSVSKTTETFTSIPVIDISKLYSSDLAERKAVAEKLGDAARNIGFLYISGHNVSADLIEGVRKA ARDFFAEPFEKKMEYYIGTSATHKGFVPEGEEVYSAGRPDHKEAFDIGYEVPANHPLVQAGTPL LGPNNWPDIPGFRSAAEAYYRTVFDLGRTLFRGFALALGLNESYFDTVANFPPSKLRMIHYPYD ADAQDAPGIGAHTDYECFTILLADKPGLEVMNGNGDWIDAPPIPGAFWNIGDMLEVMTAGEFV ATAHRVRKVSEERYSFPLFYACDYHTQIRPLPAFAKKIDASYETITIGEHMWAQALQTYQYLVK KVE KGE L KL P KG ARKT AI1 FGH F KPN S AA.
Cucurbita maxima (CmGA2ox) (SEQ ID NO: 101) MAAASSFSAAFYSGIPLIDLSAPDAKQLIVKACEELGFFKWKHGVPMELISSLESESTKFFSL PLSEKQRAGPPSPFGYGNKQ1GRNGDVGWVEYLLLNTHLESNSDGFLSMFGQDPQKLRSAVNDY ISAVRNMAGEILELMAEGLKIQQRNVFSKLVMDEQSDSVFRVNHYPPCPDLQALKGTNMIGFGE HTDPQ1ISVLRSNNTSGFQ1SLADGNW1SVPPDHSSFFINVGDSLQVMTNGRFKSVKHRVLTNS SKSRVSMIYFGGPPLSEKIAPLASLMQGEERSLYKEFTWFEYKRSAYNSRLADNRLVPFERIAA Dendrobium catenatum (DcGA3ox) (SEQ ID NO: 102) MPSLSKEHFDLYSAFHVPETHAWSSSHLHDHPIAGDGATIPVIDISDPDAASMVGGACRSWGVF YATSHGIPADLLHQVESHARRLFSLPLHRKLQTAPRDGSLSGYGRPPISAFFPKLMWSEGFTLA GHDDHLAVTSQLSPFDSLSFCEVMEAYRKEMKKLAGRLFRLLILSLGLEEEEMGQVGPLKELSQ AADA1QLNSYPTCPEPERA1GMAAHTDSAFLTVLHQTDGAGGLQVLRDQDESGSARWVDVLPRP DCLVWZGDLLHILSNGRFKSVRHRAVVNBADHRISAAYFIGPPAHMKVGS1TKLVDMRTGPMY RPVTWPEYLGIRTRLFDKALDSVKFQEKELEKD Cucurbita maxima (CmGA3ox) (SEQ ID NO: 103) MATTIADVFKSFPVHIPAHKNLDFDSLHELPDSYAWIQPDSFPSPTHKHHNSILDSDSDSVPLI DLSLPNAAALIGNAFRSWGAFQVINHGVPISLLQSIESSADTLFSLPPSHKLKAARTPDGISGY GLVRISSFFPKRMWSEGFTIVGSPLDHFRQLWPHDYHKHCEIVEEYDREMRSLCGRLMWLGLGE LGITRDDMKWAGPDGDFKTSPAATQFNSYPVCPDPDRAMGLGPHTDTSLLTIVYQSNTRGLQVL REGKRWVTVEPVAGGLWQVGDLLHILTNGLYPSALHQAWNRTRKRLSVAYVFGPPESAEISP LKKLLGPTQPPLYRPVTWTEYLGKKAEHFNNALSTVRLCAPITGLLDVNDHSRVKVG Cucurbita maxima (CmGA20ox) (SEQ ID NO: 104) MHWTSTPEARHDGAPLVFDASVLRHQHNIPKQFIWPDEEKPAATCPELEVPLIDLSGFLSGEK DAAAEAVRLVGEACEKHGFFLVVNHGVDRKLIGEAHKYMDEFFELPLSQKQSAQRKAGEHCGYA SSFTGRFSSKLPWKETLSFRFAADESLNNLVLHYLNDKLGDQFAKFGRVYQDYCEAMSGLSLGI MELLGKSLGVEEQCFKNFFKDNDSIMRLNFYPPCQKPra-TLGTGPHCDPTSLTILHQDQVGGLQ VFVDNQWRLITPNFDAFVVNIGDTFMALSNGRYKSCLHRAVVNSERTRKSLAFFLCPRNDKVVR PPRELVDTQNPRRYPDFTWSMLLRFTGTHYRADMKTLEAFSAWLQQEQQEQQEQQFNI Agapanthus praecox subsp. 
orientalis (ApoGA20ox) (SEQ ID NO: 105) MVLQPFVFDAALLRDEHNIPTQFIWPEEDKPSPDASEELILPFIDLKAFLSGDPDSPFQVSKQV GEACESLGAFQVTNHGIDFDLLEEAHSCIQKFFSMPLCEKQRALRKAGESYGYASSFTGRFCSK WO 2021/126960 PCT/US2020/065285 LPWKETLSFRiSSSSSDIVQNYFVRTLGEEFRHFGEVYQKiCESMSKLSLMiMEVLGDSDGVGRMHFREFFEGNDSTMRLNYYPPCKKPDLTLGTGPHCDPTSLTILHQDDVSGLQVFTGGKWLTVRP KTDAFVWIGDTFTALSNGRYKSCLHRAVWSKTARKSLAFFLCPAMNKIVRPPRELVDIDHPRAYPDFTWSALLEFTQKHYRADMQTLNEFSKYILQAQGTLHK Arabidopsis thaliana (AtF3H) (SEQ ID NO: 106) MAPGTLTELAGESKLNSKFVRDEDERPKVAYNVFSDEIPVISLAGIDDVDGKRGEICRQIVEAC ENWGIFQWDHGVDTNLVADMTRLARDFFALPPEDKLRFDMSGGKKGGFIVSSHLQGEAVQDWR BIVTYFSYPVRNRDYSRWPDKPEGWVKVTEEYSERLMSLACKLLEVLSEAMGLEKESLTNACVD MDQKIWNYYPKCPQPDLTLGLKRHTDPGTITLLLQDQVGGLQATRDNGKTWITVQPVEGAFW NLGDHGHFLSNGRFKNADHQAWNSNSSRLSIATFQNPAPDATVYPLKVREGEKAILEEPITFA EMYEOIKMGRDLELARLKKLAKEERDHKEVDKPVDQIFA Chrysosplenium americanum (CaF6H) (SEQ ID NO: 107) OEKTLNSRFVARDEDSLERPKVSAIYNGSFDEIPvLISLAGIDMTGAGTDAAARRSEICRKIVEACEDWGIFGEIDDDHGKRAEICDKIVKACEDWGVFQPDEKLESVMSAAKKGDFWDHGVDAEVI SQWTTFAKPTSHTQFETETTRDFPNKPEGWKATTEQYSRTLMGLACKLLGVISEAMGLEKEALT KACVDMDQKWVNYYPKCPQPDLTLGLKRHTDPGTITLLLQDQVGGLQATRDGGKTWITVQPVK DNGWILLHIGDSNGHRHGHFLSNGRFKSHQAYRYRRPTRGSPTFGTKVSNYPPCPEQSLVRPPA GRPYGRALNALDAKKLASAKQQLESAAILLISELAVAYIILAILPSSEIIAEEGYL Datura stramonium (DsH6H) (SEQ ID NO: 108) MATFVSNWSTNNVSESFIAPLEKRAEKDVALGNDVPIIDLQODHLLIVQQITKACQDFGLFQVINHGVPEKLMVEAMEVYKEFFALPAEEKEKFQPKGEPAKFELPLEQKAKLYVEGERRCNEEFLYWKDTLAHGCYPLHEELLNSWPEKPPTYRDvIAKYSVEVRKLTMRILDYICEGLGLKLGYFDNELTQ1QMLLANYYPSCPDPSST1GSGGHYDGNLITLLQQDLVGLQQL1VKDDKW1AVEPIPTAFWNLGLTLKWISNEKFEGSIHRVVTHPTRNRISIGTLIGPDYSCTTEPIKELLSQENPPLYKPYPYA KFAEIYLSDKSDY DAGVK P Y KINQ F PN Arabidopsis thaliana (AtH6DH) (SEQ ID NO: 109) MENHTTMKVSSLNCIDLANDDLNHSWSLKQACLDCGFFYVINHGISEEFMDDVFEQSKKLFAL PLEEKMKVLRNEKHRGYTPVLDELLDPKNQINGDHKEGYYIGIEVPKDDPHWDKPFYGPNPWPD ADVLPGWRETMEKYHQEALRVSMAIARLLALALDLDVGYFDRTEMLGKPIATMRLLRYQGISDP SKGIYACGAHSDFGMMTLLATDGVMGLQICKDKNAMPQKWEYVPPIKGAFIVNLGIMLERWSNG 
FFKSTLHRVLGNGQERYSIPFFVEPNHDCLVECLPTCKSESELPKYPPIKCSTYLTQRYEETHA NLSIYHQQT Solanum lycopersicum (S1F35H) (SEQ 1D NO: 110) MALRINELFVAAIIYIIVHIIISKLITTVRERGRRLPLPPGPTGWPVIGALPLLGSMPHVALAK MAKKYGPIMYLKVGTCGMVVASTPNAAKAFLKTLDINFSNRPPNAGATHLAYNAQEMVFAPYGP RWKLLRKLSNLHMLGGKALENWANVRANELGHMLKSMFDASQDGECVVIADvLTFAMANMIGQV MLSKRVFVEKGVEVNEFKNMvVELMYVAGYFNIGDFIPKLAWMDIQGIEKGMKNLHKKFDDLLT KMFDEHEATSNERKENPDFLDVVMANRDNSEGERLSTTNIKALLLNLFTAGTDTSSSVIEWALA EMMKNPKIFEKAQQEMDQVIGKNRRLIESDIPNLPYLRAICKETFRKHPSTPLNLPRVSSEPCT VDGYYIPKNTRLSVN1WAIGRDPDVWENPLEFTPERFLSGKNAKIEPRGNDFELIPFGAGRRIC AGTRMGIVMVEYTLGTLVHSFDWKLPNNVIDINMEESFGLALQKAVPLEAMVTPRLSLDVYRC D4H (SEQ ID NO: 111) 89 WO 2021/126960 PCT/US2020/065285 MPKSWPI vISSHSFCFLPNSEQERK1؛־KDLNFHAATLSEEESlRELKAFDETKAGVKGIvDTGIT KIPRIFIDQPKNLDRISVCRGKSDIKIPVINLNGLSSNSEIRREIVEKIGEASEKYGFFQIVNH G1PQDVF5DKMVDGVRKFHEQDDQIKRQYYSRDRFNKNFLYSSNYVLIPGIACNWRDTMECI14NS NQFDPQEFPDVCRDILMKYSNYVRNLGLILFELLSEALGLKPNHLEEMDCAEGLILLGHYYPAC PQPELTFGTSKHSDSGFLTILMQDQIGGLQILLENQWIDVPFIPGALVINIADLLQLITNDKFK SVEHRVLAMKVGPRISVAVAFGIKTQTQEGVSPRLYGPIKELISEENPPIYKEVTVKDFITIRF AKR F DD S S S L S P FRLNN Catharanthus roseus (CrD4Hlike) (SEQ ZD NO: 112) MKELNNSEEELKAFDDTKAGVKALVDSGITEIPRIFLDHPTNLDQISSKDREPKFKKNIPVIDL DGISTNSEIRREIVEKIREASEKWGFFQIVNHGIPQEVMDDMIVGIRRFHEQDNEIKKQFYTRD RTKSFRYTSNFVLMPKIACNWRDTFECTMAPHQPNPQDLPDICRDIMMKYISYTRNLGLTLFEL LSEALGLKSNRLKDMHCDEGVELVGHYYPACPQPELTLGTSKHTDTGFLTMLQQDQIGGLQVLY ENHQWDVPFIPGALIINIGDFLQIISNDKFKSAPHRVLANKNGPRISTASVFMPNFLESAEVR LYGPlKELLSEENPPIYEQITAKDYVTVQFSRGLDGDSFLSPFMLNKDNMEK Zea mays (ZmBX6) (SEQ ID NO: 113) MAPTTATKDDSGYGDERRRELQAFDDTKLGVKGLVDSGVKSIPSIFHHPPEALSDIISPAPLPS SPPSGAAIPWDLSVTRREDLVEQVRHAAGTVGFFWLVNHGVAEELMGGMLRGVRQFNEGPVEA KQALYSRDLARNLRFASNFDLFKAAAADWRDTLFCEVAPNPPPREELPEPLRNVMLEYGAAVTK LARFVFELLSESLGMPSDHLYEMECMQNLNWCQYYPPCPEPHRTVGVKRHTDPGFFTILLQDG MGGLQVRLGNNGQSGGCWVDIAPRPGALMVNIGDLLQLVTNDRFRSVEHRVPANKSSDTARVSV ASFFNTDVRRSERMYGPIPDPSKPPLYRSVRARDFIAKFNTIGLDGRALDHFRL Hordeum vulgare subsp. 
vulgare (HvIDS2) (SEQ ID NO: 114) MAKVMNLTPVEASSIPDSFLLPADRLHPATTDVSLP11DMSRGRDEVRQAILDSGKEYGFIQVV NHGISEPMLHEMYAVCHEFFDMPAEDKAEFFSEDRSERNKLFCGSAFETLGEKYWIDVLELLYP LPSGDTKDWPHKPOMLREVVGNYTSLARGVAMEILRLLCEGLGLRPDFFVGDISGGRVVVDINY YPPSPNPSRTLGLPPHCDRDLMTVLLPGAVPGLEIAYKGGWIKVQPVPNSLVINFGLQLEVVTN GYLKAVEHRAATNFAEPRLSVASFIVPADDCWGPAEEFVSEDMPPRYRTLTVGEFKRKHNWN LDSSINQIININNNQKGI Hordeum vulgare subsp. vulgare (HvIDS3) (SEQ ID NO: 115) MENILHATPAPVSLPESFVFASDKVPPATKAWSLPIIDLSCGRDEVRRSILEAGKELGFFQW NHGVSKQVMRIMEGMCEQFFHLPAADKASLYSEERHKPNRLFSGATYDTGGEKYWRDCLRLACP FPVDDSINEWPDTPKGLRDVIEKFTSQTRDVGKELLRLLCEGMGIRADYFEGDLSGGNVILNIN HYPSCPNPDKALGQPPHCDRNLITLLLPGAVNGLEVSYKGDWIKVDPAPNAFWMFGQQLEWT NGLLKSIEHRAMTNSALARTSVATFIMPTQECLIGPAKEFLSKENPPCYRTTMFRDFMRIYNW KLGSSLNLTTNLKNVQKEI Uridine diphosphate dependent glycosyltransferase (UGT) Siraitia grosvenorii UGT720-269-1 (SEQ ID NO: 116) MEDRNAMDMSRIKYRPQPLRPASMVQPRVLLFPFPALGHVKPFLSLAELLSDAGIDWFLSTEY NHRRISNTEALASRFPTLHFETIPDGLPPNESRALADGPLYFSMREGTKPRFRQLIQSLNDGRW PITCIITDIMLSSPIEVAEEFGIPVIAFCPCSARYLSIHFFIPKLVEEGQIPYADDDPIGEIQG VPLFEGLLRRNHLPGSWSDKSADISFSHGLINQTLAAGRASALILNTFDELEAPFLTHLSSIFN KIYTIGPLHAZSKSRLGDSSSSASALSGFWKEDRACMSWLDCQPPRSWFVSFGSTMKMKADEL 90 WO 2021/126960 PCT/US2020/065285 REfWYGLVSSGKPFLCVLRSDWSGGEAAELIEQMAEEEGAGGKLGMVVEWAAQEKVASHPAVG GFLTHCGWNSTVESIAAGVPMMCWPILGDQPSNATWIDRVWKIGVERNNREWDRLTVEKMVRAL MEGQKRVEIQRSMEKLSKLANEKWRGINLHPTISLKKDTPTTSEHPRHEFENMRGMNYEMLVG NAIKSPTLIKK Siraitia grosvenorii UGT94-289-3 (SEQ ID NO: 117) MTIFFSVEILVLGIAEFAAIAMDAAQQGDTTTILMLPWLGYGHLSAFLELAKSLSRRNFHIYFC STSVNLDAIKPKLPSSFSDSIQFVELHLPSSPEFPPHLHTTNGLPPTLMPALHQAFSMAAQHFE SILQTLAPHLLIYDSLQPWAPRVASSLKIPAINFNTTGVFVISQGLHPIHYPHSKFPFSEFVLH NHWKAMYSTADGASTERTRKRGEAFLYCLHASCSVILINSFRELEGKYMDYLSVLLNKKWPVG PLXTYEPNQDGEDEGYSSIKNWLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVNFIWW RFPQGDNTSGIEDALPKGFLERAGERGMWKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFG VPIIGVPMHVDQPFNAGLVEEAGVGVEAKRDPDGKIQRDEVAKLIKEWVEKTREDVRKKARBM SEILRS KGEE KFDEMVAEISELL KI Siraitia grosvenorii 
UGT74-345-2 (SEQ ID NO: 118) MDETTVNGGRRASDVWFAFPRHGHMSPMLQFSKRLVSKGLRVTFLITTSATESLBJjNLPPSSS LDLQVlSDVPESNDIATLEGYLRSFKATVSKTLADFTDGIGNPPKFIVYDSVMPWVQEvARGRG LDAAPFFTQSSAVNHILNHVYGGSLSIPAPENTAVSLPSMPVLQAEDLPAFPDDPEVVMNFMTS QFSNFQDAKWIFFNTFDQLECKKQSQWNWMADRWPIKTVGPTIPSAYLDDGRLEDDRAFGLNL LKPEDGKNTRQWQWLDSKDTASVLYISFGSLAILQEEQVKELAYFLKDTNLSFLWVLRDSELQK LPHNFVQETSHRGLWNWCSQLQVLSHRAVSCFVTHCGWNSTLEALSLGVPMVAIPQWVDQTTN AKFVADVWRVGVRvKKKDERIVTKEELEASIRQVVQGEGRNEFKHNAIKWKKLAKEAVDEGGSS DKNIEEFVKTIA Siraitia grosvenorii UGT75-281-2 (SEQ ID NO: 119) MGDNGDGGEKKELKENVKKGKELGRQAIGEGYINPSLQLARRLISLGVNVTFATTVLAGRRMKN KTHQTATTPGLSFATFSDGFDDETLKPNGDLTHYFSELRRCGSESLTHLITSAANEGRPITFVI YSLLLSWAADIASTYDIPSALFFAQPATVLALYFYYFHGYGDTICSKLQDPSSYIELPGLPLLT SQDMPSFFSPSGPHAFILPPMREQAEFLGRQSQPKVLVNTFDALEADALRAIDKLKMLAIGPLI PSALLGGNDSSDASFCGDLFQVSSEDYIEWLNSKPDSSVVYISVGSICVLSDEQEDELVHALLN SGHTFLWVKRSKENNEGVKQETDEEKLKKLEEQGKMVSWCRQVEVLKHPALGCFLTHCGWNSTI ESLVSGLPWAFPQQIDQATNAKLIEDVWKTGVRVKZkNTEGIVEREEIRRCLDLVMGSRDGQKE E1ERNAKKWKELARQAIG EGGS SDSNLKT FL WE I OLE 1 Siraitia grosvenorii UGT720-269-4 (SEQ ID NO: 120) MAEQAHDLLHVLLFPFPAEGHIKPFLCLAELLCNAGFHVTFLNTDYNHRRLHNLHLLAARFPSL HFESISDGLPPDQPRDILDPKFFISICQVTKPLFRELLLSYKRISSVQTGRPPITCVITDVIFR FPIDVAEELDIPVFSFCTFSARFMFLYFWIPKLIEDGQLPYPNGNINQKLYGVAPEAEGLLRCK DLPGHWAFADELKDDQLNFVDQTTASSRSSGLILNTFDDLEAPFLGRLSTIFKKIYAVGPIHSL LMSHHCGLWKEDHSCLAWLDSRAAKSWFVSEGSLVKITSRQLMEFWHGLLNSGKSFLFVLRSD VVEGDDEKQVVKEIYETKAEGKWLVVGWAPQEKVLAHEAVGGFLTHSGWNSILESIAAGVPMIS CPKIGDQSSNCTWISKVWKTGLEMEDRYDRVSVETMVRSIMEQEGEKMQK'TIAELAKQAKYKVS KDGTSYQNLECLIQDIKKLNQIEGFINNPNFSDLLRV Siraitia grosvenorii UGT94-289-2 (SEQ ID NO: 121) MDAQQGHTTTILMLPWVGYGHLLPFLELAKSLSRRKLFHIYFCSTSVSLDAIKPKLPPSISSDD SIQLVELRLPSSPELPPHLHTTNGLPSHLMPALHQAFVMAAQHFQVILQTLAPHLLIYDILQPW 91 WO 2021/126960 PCT/US2020/065285 APQvASSLNIPAyNFSTTGASMLSRTLHPTHYPSSKFPISEFVEHNHWRAMITTADGALTEEGR KIEETLANCLHTSCGWLVNSFRELETKYIDYLSVLLNKKWPVGPLVYEPNQEGEDEGYSSIK NWLDKKEPSSTVFVSFGTEYFPSKEEMEEIAYGLELSEVNFIWVLRFPQGDSTSTIEDALPKGF 
LERAGERAMWKGWAPQAKILKHWSTGGLVSHCGWNSMMEGMMFGVPIIAVPMHLDQPFNAGLV EEAGVGA/EAKRDSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMDTKHGPTYFSRSKVSSFGR LYKINRPTTLTVGRFWSKQIEMKRE Siraitia grosvenorii UGT94-289-1 (SEQ ID NO: 122) MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA PQLASSLNI PAINFNTTGASVLTRMLHATHY PSSKFP ISE F’VLHDYWKAMY SAAGGAVTKKDH K IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKWPVGPLVYEPNQDGEDEGYSSIKN WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWWRFPQGDNTSAIEDALPKGFL ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE EAGVGVEAKRDPDGKIQRDEVAKLIKEVWEKTREDVRKKAREMSEILRSKGEEKMDEMVAAIS LFLKI Momordica charantia 1 (McUGT1) (SEQ ID NO: 123) MAQPQTQARVIVFPYPTVGHIKPFLSLAELLADGGLDVVFLSTEYNHRRIPNLEALASRFPTLH FDTIPDGLPIDKPRVIIGGELYTSMRDGVKQRLRQVLQSYNDGSSPITCVICDVMLSGPIEAAE ELGIPVVTFCPYSARYLCAHFVMPKLIEEGQIPFTDGNLAGEIQGVPLFGGLLRRDHLPGFWFV KSLSDEVWSHAFLNQTLAVGRTSALIINTLDELEAPFLAHLSSTFDKIYPIGPLDALSKSRLGD SSSSSTVLTAFWKEDQACMSWLDSQPPKSVIFVSFGSTMRMTADKLVEFWHGLVNSGTRFLCVL RSDIVEGGGAADLIKQVGETGNGIWEWAAQEKVLAHRAVGGFLTHCGWNSTMESIAAGVPMMC WQIYGDQMINATWIGKVWKIGIERDDKWDRSTVEKMIKELMEGEKGAEIQRSMEKFSKLANDKV VKGGTSFENLELIVEYLKKLKPSN Momordica charantia 2 (McUGT2) (SEQ ID NO: 124) MAOPRVLLFPFPAMGHVKPFLSLAELLSDAGVEVVFLSTEYNHRRIPDIGALAARFPTLHFETI PDGLPPDQPRVLADGHLYFSMLDGTKPRFRQLIQSLNGNPRPITCIINDVMLSSPIEVAEEFGI PVIAFCPCSARFLSVHFFMPNFIEEAQ1PYTDENPMGKIEEATVFEGLLRRKDLPGLWCAKSSN ISFSHRFINQTIAAGRASALILNTFDELESPFLNHLSSIFPKIYCTGPLNALSRSRLGKSSSSS SALAGFWKEDQAYMSWLESQPPRSVIEVSFGSTMKMEAVKLAEFWYGLVNSGSPFLEVTRPDCV INSGDAAEVMEGRGRGMVVEWASQEKVLAHPAVGGFLTHCGWNSTVESIVAGVPMMCCPIVADQ LSNATWIHKVWKIGIEGDEKWDRSTVEMMIKELMESQKGTEIRTSIEMLSKLANEKVVKGGTSL NN FE LLVE DIKTLRRPYI Momordica charantia 3 (McUGT3) (SEQ ID NO: 125) MEQSDSNSDDHQHHVLLFPFPAKGHIKPFLCLAQLLCGAGLQVTFLNTDHNHRRIDDRHRRLLA TQFPMLHFKSISDGLPPDHPRDLLDGKLIASMRRVTESLFRQLLLSYNGYGNGTNNVSNSGRRP PISCVITDVIFSFPVEVAEELGIPVFSFATFSARFLFLYFWIPKLIQEGQLPFPDGKTNQELYG 
VPGAEGIIRCKDLPGSWSVEAYAKNDPMNFVKQTLASSRSSGLILNTFEDLEAPFVTHLSNTFD KIYTIGPIHSLLGTSHCGLWKEDYACLAWLDARPRKSVVFVSFGSLVKTISRELMELWHGLVSS GKS FLLVLRS DVVEGEDE EOVVKEILE SNGEGKWLVVGWAPQE EVLAHEAI GGFLTH SGWNSTM ESIAAGVPMVCWPKIGDQPSNCTWVSRVWKVGLEMEERYDRSTVARjyiARSMMEQEGKEMERRIA ELAKRVKY RVGKDGE S Y RNLE S LI RD I KI'T KSSN Momordica charantia 4 (McUGT4) (SEQ ID NO: 126) MDAHQQAEHETTiLMLPWVGYGHLTAIEELAKALSRRNFHIYYCSTPVNIESIKPKLTiPCSSI QFVELHLPSSDDLPPNLHTTNGLPSHLMPTLHQAFSAAAPLFEEILQTLCPHLLIYDSLQPWAP KIASSLKIPALNFNTSGVSVIAQALHAIHHPDSKFPLSDFILHNYWKSTYTTADGGASEKTRRA REAFLYCLMSSGNAILIN1FRELEGEY1DYLSLLLNKKVIPIGPLVYEPNQDEDQDEEYRS1KN WLDKKEPCSTVFVSFGSEYFPSNEEMEEIAPGLEESGANFIWVVRFPKLENRNGIIEEGLLERA GERGMVIKEWAPQARILRHGSIGGFVSHCGWNSVMESIICGVPVIGVPMRVDQPYNAGLVEEAG VGVEAKRDPDGKIQRHEVSKLIKQVWEKTRDDVRKKVAQMSEILRRKGDEKIDEMVALISLLP KG Momordica charantia 5 (McUGT5) (SEQ ID NO: 127) MDARQQAEHTTTILMLPWVGYGHLSAYLELAKALSRRNFHIYYCSTPVNIESIKPKLTIPCSSI QFVELHLPFSDDLPPNLHTTNGLPSHLMPALHQAFSAAAPLFEAILQTLCPHLLIYDSLQPWAP QIASSLKIPALNFNTTGVSVIARALHTIHHPDSKFPLSEIVLHNYWKATHATADGANPEKFRRD LEALLCCLHSSCNAILINTFRELEGEYIDYLSLLLNKKVTPIGPLVYEPNQDEEQDEEYRSIKN WLDKKEPYSTIFVSFGSEYFPSNEEMEEIARGLEESGANFIWVVRFHKLENGNGITEEGLLERA GERGMVIQGWAPQARILRHGSIGGFVSHCGWNSVMESIICGVPVIGVFMGLDOPYNAGLVEEAG VGVEAKRDPDGKIQRHEVSKLIKQVWEKTRDDVRKKVAQMSEILRRKGDEKIDEMVALISLLL KG Cucumis sativus (SEQ ID NO: 128) MGLSPTDHVLLFPFPAKGHIKPFFCLAHLLCNAGLRVTFLSTEHHHQKLHNLTHLAAQIPSLHF QSISDGLSLDHPRNLLDGQLFKSMPQVTKPLFRQLLLSYKDGTSPITCVITDLILRFPMDVAQE LDIPVFCFSTFSARFLFLYFSIPKLLEDGQIPYPEGNSNQVLHGIPGAEGLLRCKDLPGYWSVE AVANYNPMNFVNQTIATSKSHGLILNTFDELEVPFITNLSKIYKKVYTIGPIHSLLKKSVQTQY EFWKEDHSCLAWLDSQPPRSVMFVSFGSIVKLKSSQLKEFWNGLVDSGKAFLLVLRSDALVEET GEEDEKQKELVIKEIMETKEEGRWIVNWAPQEKVLEHKAIGGFLTHSGWNSTLESVAVGVPMV SWPQIGDQPSNATWLSKVWKIGVEMEDSYDRSTVESKVRSIMEHEDKKMENAIVELAKRVDDRV S KEGT SY QN LQRLIE; DIE G FKLN Cucurbita maxima 1 (CmaUGT1) (SEQ ID NO: 129) MELSHTHHVLLFPFPAKGHIKPFFSLAQLLCNAGLRVTFLMTDHHHRRIHDLNRLAAQLPTLHF 
DSVSDGLPPDEPRNVFDGKLYESIRQVTSSLFRELLVSYNNGTSSGRPPITCVITDVMFRFPID IAEELGIPVFTFSTFSARFLFLIFWIPKLLEDGQLRYPEQELHGVPGAEGLIRWKDLPGFWSVE DVADWDPMNFVNQTLATSRSSGLILMTFDELEAPFLTSLSKIYKKIYSLGPINSLLKNFQSQPQ YNLWKEDHSCMAWLDSQPRKSWFVSFGSWKLTSRQLMEFWNGLVNSGMPFLLVLRSDVIEAG EEVVREIMERKAEGRWVIVSWAPQEEVLAHDAVGGFLTHSGWNSTLESLAAGVPMISWPQIGDQ TSNSTWISKVWRIGLQLEDGFDSSTIETMVRSIMDQTMEKT ־AAELAERAKNRASKNGTSYRNFQ TLIODITNIIETHI Cucurbita maxima 2 (CmaUGT2) (SEQ ID NO: 130) MDAQKAVDTPPTTVLMLPWIGYGHLSAYLELAKALSRRNFHVYFCSTPVNLDSIKPNLIPPPSS IQFVDLHLPSSPELPPHLHTTNGLPSHLKPTLHQAFSAAAQHFEAILQTLSPHLLIYDSLQPWA PRIASSLNIPAINFNTTAVSIIAHALHSVHYPDSKFPFSDFVLHDYWKAKYTTADGATSEKIRR GAEAFLYCLNASCDWLVNSFRELEGEYMDYLSVLLKKKWSVGPLVYEPSEGEEDEEYWRIKK WLDEKEALSTVLVSFGSEYFPSKEEMEEIAHGLEESEANFIWVVRFPKGEESCRGIEEALPKGF VEPAGERAMVVKKNAPQGKILKHGSIGGFVSHCGWNSVLESIRFGVPVIGVPMHLDQPYNAGLL EEAGlGVEAKRDADGKIQRDQVASLIKRvVVEKTREDIWKTVREMREVLRRRDDDMIDEMVAEI SVVLKI Cucurbita maxima 3 (CmaUGT3) (SEQ ID NO: 131) MSSbiLFLKISIPFGRLRDSALJMCSv B’HCKLHLAIAlAMDAQQAANKSPTATTIFMLPWAGYGHLSAYLELAKALSTRNFHIYFCSTPVSLASIKPRLIPSCSSIQFVELHLPSSDEFPPHLHTTNGLP SRLVPTFHQAFSEAAQTFEAFLQTLRPHLLIYDSLQPWAPRIASSLNIPAINFFTAGAFAVSHV LRAFHYPDSQFPSSDFVLHSRWKIKNTTAESPTQAKLPKIGEA1GYCLNASRGVILTNSFRELE GKYIDYLSVILKKRVFPIGPLVYQPMQDEEDEDYSRIKNWLDRKEASSTVLVSFGSEFFLSKEE TEAIAHGLEQSEANFIWGIRFPKGAKKNAIEEALPEGFLERAGGRAMVVEEWVPQGKILKHGSI GGFVSHCGWNSAMESIVCGVPIIGIPMQVDQPFNAGILEEAGVGVEAKRDSDGKIQRDEVAKLIKEWVERTREDIRNKLEKINEILRSRREEKLDELATEISLLSRN Cucurbita moschata 1 (CmoUGT1) (SEQ ID NO: 132) MELSPTHHLLLFPFPAKGHIKPFFSLAQLLCNAGARVTFLNTDHHHRRIHDLDRLAAQLPTLHF DSVSDGLPPDESRNVFDGKLYESIRQVTSSLFRELLVSYNNGTSSGRPPITCVITDCMFRFPID IAEELGIPVFTFSTFSARFLFLFFWIPKLLEDGQLRYPEQELHGVPGAEGLIRCKDLPGFLSDE DVAHWKPINFVNQILATSRSSGLILNTFDELEAPELTSLSKIYKKIYSLGPINSLLKNFQSQPQ YNLWKEDHSCMAWLDSQPPKSVVFVSFGSVVKLTMRQLVEFWNGLVNSGKPFLLVLRSDVIEAG EEVVRENMERKAEGRWMIVSWAPQEEVLAHDAVGGFLTHSGWNSTLESLAAGVPMISWTQIGDQ 
TSNSTWVSKVWRIGLQLEDGFDSFTIETMVRSVMDQTMEKTVAELAERAKNRASKNGTSYRNFQTLIQDITNIIETHI Cucurbita moschata 2 (CmoUGT2) (SEQ ID NO: 133) MDAQKAVDTPPTTVLMLPWIGYGHLSAYLELAKALSRRNFHVYFCSTPVNLDSIKPNLIPPPPS IQFVDLHLPSSPELPPHLHTTNGLPSHLKPTLHQAFSAAAQHFEAILQTLSPHLLIYDSLQPWA PRIASSLNIPAINFNTTAVSIIAHALHSVHYPDSKFPFSDFVLHDYWKAKYTTADGATSEKTRR GVEAFLYCLNASCDWLVNSFRELEGEYMDYLSVLLKKKWSVGPLVYEPSEGEEDEEYWRIKK WLDEKEALSTVLVSFGSEYFPPKEEMEEIAHGLEESEANFIWVVRFPKGEESSSRGIEEALPKG FVERAGERMWKKWAPQGKILKHGSIGGFVSHCGWNSVLESIRFGVPVIGAPMHLDQPYNAGL LEEAGIGVEAKRDADGKIQRDQVASLIKQVWEKTREDIWKKVREMREVLRRRDDDDMMIDEMV AVISVVLKI Cucurbita moschata 3 (CmoUGT3) (SEQ ID NO: 134) MDAQQAANKSPTASTIFMLPWVGYGHLSAYLELAKALSTRNFHVYFCSTPVSLASIKPRLIPSCSSIQFVELHLPSSDEFPPHLHTTMGLPAHLVPTIHQAFAAAAQTFEAFLQTLRPHLLIYDSLQPWAPRIASSLNIPAINFFTAGAFAATSHVLRAFHYPDSQFPSSDFVLHSRWKIKNTTAESPTQVKIPKIGEAIGYCLNASRGVILTNSFRELEGKYIDYLSVILKKRVLPIGPLVYQPNQDEEDEDYSRIKNWLDRKEASSTVLVSFGSEFFLSKEETEAIAHGLEQSEANFIWGIRFPKGAKKNAIEEALPEGFLERVGGRAMWEEWVPQGKILKHGMIGGFVSHCGWNSAI-IESIMCGVPVIGIPMQVDQPFNAGILEEAGVGVEAKRDSDGKIQRDEVAKLIBCEVVVE RTREDIRMKLEE1NEIlRTRREEKLDELA.trISLLCKN Prunus persica 
(SEQ ID NO: 135) MAMKQPHVIIFPFPLQGHMKPLLCLAELLCHAGLHVTYVNTHHNHQRLAMRQALSTHFPTLHFE SISDGLPEDDPRTLNSQLLlALKTSIRPHFRELLKTISLKAESNDTLVPPPSCIMTDGLVTFAF DVAEELGLPILSFNVPCPRYLWTCLCLPKLIENGQLPEQDDDMNVEITGVPGMEGLLHRQDLPG FCRVKQADHPSLQFAINETQTLKRASALILDTVYELDAPCISHMALMFPKIYTLGPLHALLNSQ IGDMSRGLASHGSLWKSDLNCMTWLDSQPSKSIIYVSFGTLVHLTRAQVIEFWYGLVNSGHPFL WVMRSDITSGDHQIPAELENGTKERGCIVDWVSQEEvLAHKSVGGFLTHSGWNSTLESIVAGLP MICWPKLGDHYIISSTVCRQWKIGLQLNENCDRSNIESMVQTLMGSKREEIQSSMDAISKLSRD SVAE GG S S HNNL EQLIE YIRNLQHQN Theobroma cacao (SEQ ID NO: 136) MRQPHVLvLPFPA.QGHIKPMLCLAELLCQAGLRvTFLNTHHShRRLNNLQDLSTRFPTLHFESV SDGLPEDHPRNLVHFMHLVHSIKMVTKPLLRDLLTSLSLKTDIPPVSCIIADGILSFAIDVAEE LQIKVIIFRTISSCCLWSYLCVPKLIQQGELQFSDSDMGQKVSSVPEMKGSLRLHDRPYSFGLK QLEDPNFQFFVSETQAI^RASAVIFNTFDSLEAPVLSQMIPLLPKVYTIGPLHALRKARLGDLS QHSSFNGNLREADHNCITWLDSQPLRSWYVSFGSHVVLTSEELLEFWHGLVNSGKRFLWVLRP DIIAGEKDHNQIIAREPDLGTKEKGLLVDWAPQEEVLAHPSVGGFLTHCGWNSTLESMVAGVPM LCWPKLPDQLVNSSCVSEWKIGLDLKDMCDRSTVEKMVRALMEDRREEV14RSVDGISKLARES VSHGGSSSSNLEMLIOELET Corchorus capsularis (SEQ ID NO: 137) MDSKQKKMSVLMFPWLAYGHISPFLELAKKLSKRNFHTFFFSTPINLNSIKSKLSPKYAQSIQF VELHLPSLPDLPPHYHTTNGLPPHLMNTLKKAFDMSSLQFSKILKTLNPDLLVYDFIQPWAPLL ALSNKIPAVEPLOTSAAMS S FSVHAFKKPCEDFPFPNIYVHGNFMNAKFNNMENCSSDDSISDQ DRVLQCFERSTKIILVKTFEELEGKFMDYLSVIJjNKKIVPTGPLTQDPNEDEGDDDERTKLLLE WLNKKSKSSTVFVSFGSEYFLSKEEREEIAYGLELSKVNFIWV1RFPLGENKTNLEEALPQGFL QRVSERGLVVENWAPOAKILQHSSIGG FVSHCGWSSVME SLKFGVPIIAIPMHLDQPLNARLW DVGVGLEVIRNHGSLEREEIAKLIKEVVLGNGNDGEIVRRKAREMSMHIKKKGEKDMDELVEEL MLICKMKPNSCHLS Ziziphus jujuba (SEQ ID NO: 138) MMERQRSIKVLMFPWLAHGHISPFLELAKRLTDRNFQIYFCSTPVNLTSVKPKLSQKYSSSIKL VELHLPSLPDLPPHYHTTNGLALNLIPTLKKAFDMSSSSFST1LSTIKPDLLIYDFLQPWAPQL ASCMNIPAVNFLSAGASMVSFVLHSIKYNGDDHDDEFLTTELHLSDSMEAKFAEMTESSPDEHI DRAVTCLERSNSLILIKSFRELEGKYLDYLSLSFAKKWPIGPLVAQDTNPEDDSMDIINWLDK KEKSSTVFVSFGSEYYLTNEEMEEIAYGLELSKVNFIWWRFPLGQKMAVEEALPKGFLERVGE KGMVVEDWAPQMKILGHSSIGGFVSHCGWSSLMESLKLGVPIIAMPMQLDQPINAKLVERSGVG 
LEVKRDKNGRIEREYLAKVIREIWEKARQDIEKKAREMSNIITEKGEEEIDNWEELAKLCGM Vitis vinifera (SEQ ID MO: 139) MDARQSDGISVLMFPWLAHGHISPFLQLAKKLSKRNFSIYFCSTPVNLDPIKGKLSESYSLSIQ LVKLHLPSLPELPPQYHT1NGLPPHLMPTLKMAFDMASPNFSN1LKTLHPDLLIYDFLQPWAPA AASSLNIPAVQFLSTGATLQSFLAHRHRKPGIEFPFQEIHLPDYEIGRLNRFLEPSAGRISDRD RANQCLEFJSSRFSLIKTFREIEAKYLDYVSDLTKKKMVTVGPLLQDPEDEDEATDIVEWLNKKC EASAVFVSFGSEYFVSKEEMEEIAHGLELSNVDFIWVVRFPMGEKIRLEDALPPGFLHRLGDRG MWEGWAPQRKILGHS SIGGFVS HC GW SSVMEGMKEGVP 11AMPMHLDQ PINAKLVE AVGVGRE VKRDENRKLEREEIAKVIKEWGEKNGENVRRKARELSETLRKKGDEEIDVWEELKQLCSY vJuglans regia (SEQ ID NO: 14 0) MDTARKRIRWMLPWLAHGHISPFLELSKKLAKRNFHIYFCSTPVNLSSIKPKLSGKYSRSIQL VELHLPSLPELPPQYHTTKGLPPHLNATLKRAFDMAGPHFSNILKTLSPDLLIYDFLQPWAPAI AASQNIPAINFLSTGAAMTSFVLHAMKKPGDEFPFPEIHLDECMKTRFVDLPEDHSPSDDHNHI sdkdralkcferssgfvmm:ktfeelegkyinflshlmqkkivpvgplvqnpvrgdhekaktlew LDKRKQSSAYFVSFGTEYFLSKEEMEEIAYGLELSNVNFIWVVRFPEGEKVKLEEALPEGFLQR VGEKGMVVEGWAPQAKILMHPSIGGFVSHCGWSSVMESIDFGVPIVAIPMQLDQPVNAKVVEQA GVGVEVKRDRDGKLEREEVATVIREVVMGNIGESVRKKEREMRDNIRKKGEEKMDGVAQELVQL YGNGIKNV Hevea brasiliensis (SEQ ID MO: 141) 95 (g^T :on QI 03s) 3058X99 BUBTpnEqaj ftas^s Cf ISSIKSAISGlS3ASS99MW3SA(TO10MlAHVN03IA3933(]ARAHHIVtra139H3M9N31AA9A)iTAaSHKHVNGdOa39aasa1HdA9a0AS33XSNM9SHlM3V9IV9HY3AaOOdA®>IAI>{SH3D3aDad3d3AMlS9MAa9dHAAM3as03SGA39WI3r1aa33aASSXS9aSAArIASSdd00arIM0aAXHaHarnSSSSVXGH3d3dI3؟SdYdI3HIAX333S33G3MaSNMIA9SSVMX0>lIM393I3>n10MNSAY OrraHrTNr]:sGVAs0va ־ A1 ؟ sAHYHaNar1:ssjJH ־[: i0d1 ؛ L ׳) L>1aaa:cr1z9r1a ؛ j333'a ؛ 5SY ؛ xda ؛ SM1aMA>1r1M3va1.130SAsaaaasv31'ma3aH>naavDHaN11d1Hpii9VGdDHl1dr1NS1H3aDdaNa31ai ׳ lN3J ؛ IdNIH903dAd331rWIHHAXX3X ؟ LANOT0 ؟ H3X3HdARSXadMN3NXH3IXIS39:HSA :ON QI 035) T59£.X9n BUBTpneqej eiAd^s >lYNM0asa0a^AAdAD3SIw3djigN3DDHJV1aDSAS3HVGAaG0>IOMYAT3E»I91I3IA3SrLN 3dTM933MHMIAr1aNAasa1TfaXISSA03d9H>:A3SSaVAAAAS33dHaa3MNRO3HHN®iA 3Na9Na^aaa3HMar1AWSd'1id91A>nNM1MHiM31A33a'1>uasNiaAMdY0a1wa09a3w0SMdS0103HN03I'1aA3MA03Ad3SaASAI3D3dASTAD3HAHAAr؛SNAAOV0AaaS33a1Daa Q£ 
lYAGGAMBlHSGATIYailXOSSOGNNlGaYGSiiSOAONGXaGASaOYSAiaOOSaOOaSTVOTSI SLLLLLNSHNILSNILHILLAILLLMABMSIILS5GIHdNIHbdHdITIAHGSMMIMOHFW (7[ ל T :ON ai 03s) T0HX9n pyreTpnBqaj ptab^s CNTSNXHNSOTSAiyaYAOaiSOagNHHIHaSWaHraSHAVagNMXAnAa^IAyVIHaHSdHgaMNd STXI3rI9A9AaSA3HVN3dOGIHHdWIIdA9dyWSSHHSSM90HSAa99ISSHS3IPWOdVM9SAA,I9H39IH3V39Nd r IVa33»A»39AdaHAAMI3NA^S ,I3r [9HV1a3H33XS r MX3S93SAaAJ.S0SdSa3MNII3IM3a3NGdA&Ar1dOV3AINNA33VSSAaAANV310؛da1>mMANSSd3rIO03AdaiNaAGONSVSHGGNISNAWaaNINISddddaSSdNNYaHGSdWIPaVSNAaGIlVdINGSSYASdVMdOGHaDIGGadNGONGTNGGsaSYHadVJLmANHGmdGONAdHXHddGadGSdGHGaA ’lOISSSJ^SSTOdniSNTNIdJ.SDaXIiaNHNJ.'raVTSUdSIHOHVIMdTWIAHISHra'KIW GANiL^HGDGOAia aAA9a13339HMH,IAaSHSmMHHAAA93H3AAASHIIMVW33H3rI39S0NHSA3I9A9W3AI HSNId0a3NRdIdfIIdA93MWS3MISSfflfDDHSAa99ISSH9G™־d0dVMD3AI39da013aAA9 C[G SA3AXXSH3HH31OIA3 ؛( 3A3S ؛ 3HYI33W33NSr ؛ 3rI :] ؛ ad3S33r1NIN3ONd.H'aAAM13NANSaHHHHHHaA13aaa3HI1La3ArId9AdAI3NNONASA.XaiA>iO3T:3H3J;NYrIlAHas0GIOHE)Vti SDaHlONSSSadAHaGHKSIdAaydXISdaGaHISHNMNGMGDdDSWYPiiSSDGaMAVdllxnSSYG VdVMdO'ISaXIrIr1adNlJMUNl3aSSSVHadVy31lEStnHddrI9NXXHXHSdrI3dSSdlHrI3AannMd،JWrIASIMHX0YXYW ؛ H9 ؟ 1adS ׳ MSrI3 ؟ 01S3HA3dSTSd^ISa'INAdXSaSXAH،HNHNVT :ON aI OSS) E^ueinose qoqruew OINIAGDG0GG3aAADaT333O}INATXaSM3HYNHSANMDaN3AAANHTXNYI3333G3DSNNHdA3I0A OVG3A1aSNAd0GGNWdNYlTdAO3NNS31A[ISSlVlODHSAdDOlSSHHGIN00dYjy1OaAINDHAH3GAaNdGS33ANIN30AdaGAAM1aHVNSG3G0AVT33W33ASGaA310aSA3AlSG33NNa C GMNTAOHHaNINaaaadldGOAGdDAdAlNNmMASGAaiXNyaGaHdAAYGINHdSDaWDHOYHZDGIGNSSSTIVBSAMWMWKGHAXISSJdJMVTONNIHTDJDSWURSLDTZNAVJINISSVI YdVMdOiaaxmaaNGDniNaasssvidadVNMGrMHGHddGDNXAHxmdGSdssdimaAi 01SaSA3VSGNdNTSaaNAdASDaXAHaNaMNGNNSG3dadSaHOHYGMaaiALGASIN ־ay0aA3N S8ZS90/0mSfl/13d 096971/1707 OM WO 2021/126960 PCT/US2020/065285 MDAlxLATTEKKPHvIFIPFPAJQSHIKAMLJKLAQLLHHKGLQITFVbiTDFIHNQFIjESSGPHCIjDGAPGFRFETIPDGVSHSPEASIPIRESLLRSIETNFLDRFIDLVTKLPDPPTCIISDGFLSVFTI DAAKKLGIPVMMYWTLAACGFMGFYHIHSLIEKGFAPLKDASYLTNGYLDTVIDWVPGMEGIRL KDFPLDWSTDLNDKVLMFTTEAPQRSHFeZSHHIFHTFDELEPSlIKTLSLRYNHIYTIGPLQLL 
LDQIPEEKKQTGITSLHGYSLVKEEPECFQWLQSKEPNSWYVNFGSTTVMSLEDMTEFGWGLA NSNHYFLWIIRSNLVIGENAVLPPELEEHIKKRGFIASWCSQEKVLKHPSVGGFLTHCGWGSTIESLSAGVPMICWPYSWDQLTNCRYICKEWEVGLEMGTKVKRDEVKRLVQELMGEGGHKMRNKAKDWKEKARIAIAPblGSSSLNIDKMVKEITVLARN Stevia, rebaudiana UGT91D1 (SEQ ID NO: 147) MYNVTYHQNSKMIATSDSIVDDRKQLHVATFPWLAEGHILPFLQLSKLIAEKGHKVSFLSTTRN IQRLSSHISPLINWQLTLPRVQELPEDAEATTDVHPEDIQYLKKAVDGLQPEVTRFLEQHSPD WIIYDFTHYWLPSIAASLGISRAYFCVITPWTIAYLAPSSDAMINDSDGRTTVEDLTTPPKWFP PPTKVCWRKHDLARMEPYEAPGISDGYRMGMVFKGSDCLLFKCYHEFGTQWLPLLETLHQVPW PVGLLPPEIPGDEKDETWVSIKKWLDGKQKGSWYVALGSEALVSQTEWELALGLELSGLPFV WAYRKPKGPAKSDSVELPDGFVERTRDRGLVWTSWAPQLRILSHESVCGFLTHCGSGSIVEGLM FGHPLIMLPIFCDQPLNARLLEDKQVGIEIPRNEEDGCLTKESVARSLRSVWENEGEIYKANA RALSKIYNDTKVEKE Y VSQFVDY LEKNARAVAIDHES tevia rebaudiana UGT91D2 (SEQ ID NO: 148) MATSDSIVDDRKQLHVATFPWLAFGHILPYLQLSKLIAEKGHKVSFLSTTRNIQRLSSHISPLINWQLTLPRVQELPEDAEATTDVHPEDIPYLKKASDGLQPEVTRFLEQHSPDWIIYDYTHYWLP SIAASLGISRAHFSVTTPWAIAYMGPSADAMINGSDGRTTVEDLTTPPKWFPFPTKVCWRKHDLARLVPYKAPGISDGYRMGLVLKGSDCLLSKCYHEFGTQWLPLLETLHQVPVVPVGLLPPEVPGDEKDETWVSIKKWLDGKQKGSWYVALGSEVLVSQTEWELALGLELSGLPFVWAYBKPKGPAKSDSVELPDGFVERTRDRGLVWTSWAPQLRILSHESVCGFLTHCGSGSIVEGLMFGHPLIMLPIFGDQPLNARLLEDKQVGIEIPRNEEDGCLTKESVARSLRSVWEKEGEIYKANARELSKIYNDTKVE KE Y VS Q FVDY I! 
E KNT RAVA IDH E■ S Stevia rebaudiana JGT91D2e (SEQ ID NO: 149) MATSDSIVDDRKQLHVATFPWLAFGHILPYLQLSKLIAEKGHKVSFLSTTRNIQRLSSHISPLI MVVQLTLPRVQELPEDAEATTDVHPEDTPYLKKASDGLQPEVTRFLEQHSPDWIIYDYTHYWLPSIAASLGISRAHFSVTTPWAIAYMGPSADAMINGSDGRTTVEDLTTPPKWFPFPTKVCWRKHDLARLVPYKAPGISDGYRMGLVLKGSDCLLSKCYHEFGTQWLPLLETLHQVPVVPVGLLPPEIPGD EKDETWVSIKKWLDGKQKGSWYVALGSEVLVSQTEWELALGLELSGLPFVWAYRKPKGPAKS DSVELPDGFVERTRDRGLVWTSWAPQLRILSHESVCGFLTHCGSGSIVEGLMFGHPLIMLPIFGDQPLNARLLEDKQVGIEIPRNEEDGCLTKESVARSLRSVV tvEKEGEIYIGaI,?ARELSKIYNDTKV EKEYVSQFVDYLEKNARAVAIDHES OsUGTl2״ (SEQ ID NO: 110) MDSGYSSSYAAAAGMHWICPWLAFGHLLPCLDLAQRLASRGHRVSFVSTPRNISRLPPVRPAL APLVAFVALPLPRVEGLPDGAESTNDVPHDRPDMVELHRRAFDGLAAPFSEFLGTACADWVIVDVFHHWAAAAALE HKVPCAMMLLGSAHMIASIADRRLE RAETES PAAAGQGRPAAAPTFEVARMK LlRTKGSSGMSLAERFSLALSRSSLVVGRSCVEFEPETVPLLSTLRGKPITFLGLMPPLHEGRREDGEDATVRWLDAQPAKSVVYVALGSEVPLGVEKVHELALGLELAGTRFLV?ALRKPTGVSDADLLPAGFEERTRGRGVVATRWVPQMSILAHAAVGAFLTHCGWNSTIEGLMFGHPLIMLPIFGDQGPNARL1EAKNAGLQVARNDGDGSFDREGVAAAIRAVAVEEESSKVFQAKAKKLQEIVADMACHERY1DGFIQQLRSYKD Arabiciopsis thaliana AAN72025.1 (SEQ ID NO: 97 WO 2021/126960 PCT/US2020/065285 MGSiSEMvFETCPSPNPIHvMLVSFQGQGHvNPLERLGKLIASKGLLvTFVTTELWGKPMRQANKIVDGELKPVGSGSIRFEFFDEEWAEDDDRRADFSLYIAHLESVG1REVSKLVRRYEEANEPVS CLINNPFIPWVCHVAEEFNIPCAVLWVQSCACFSAYYHYQDGSVSFPTETEPELDVKLPCVPVL KNDEIPSFLHPSSRFTGFRQAILGQFKNLSKSFCVLIDSFDSLEREVIDYMSSLCPVKTVGPLF KVARTVTSDVSGDICKSTDKCLEWLDSRPKSSVVYISFGTVAYLKQEQIEEIAHGVLKSGLSFL WVIRPPPHDLKVETHVLPQELKESSAKGKGMIVDWCPQEQVLSHPSVACFVTHCGWNSTMESLS SGVPVVCCPQWGDQVTDAVYLIDVFKTGVRLGRGATEERVVPREEVAEKLLEATVGEKAEELRK NALKWKAEAEAAVAPGGSSDKNFREFVEKLGAGVTKTKDNGY Arabidopsis thaliana AAF87256.1 (SEQ ID NO: 152) MGSHVAQKQHWCVPYPAQGHINPMMKVAKLLYAKGFHITFVNTVYNHNRLLRSRGPNAVDGLPS FREES IPDGLPETDVDVTQDIPTLCE STMKHCLAPEKELLRQ INARDDVPPVSCIVSDGCMS 
FTLFAAEELGVPEVLFWTTSACGFLAYLYYYRFIEKGLSPIKDESYLTKEHLDTKIDWIPSMKNLRLKDIPSFIRTTNPDDIMLNFIIREADRAKRASAIILNTFDDLEHDVIQSMKSIVPPVYSIGPLHLLEKQESGEYSEIGRTGSNLWREETECLDWLNTKARNSVVYVNFGSITVLSAKQLVEFANGLAatgkeflwvirpdlvagdeamvppefltatadrrmlaswcpqekvlshpaiggflthcgwnstl eslcggvpmvcnpffaeoqtnckfsrdewevgieiggdvkreeveavvrelmdeekgknmrekaEEWRRLANEATEHKHGSSKLNFEMLVNKVLLGE Columba livia C1UGT1 (SEQ ID NO: 153) MIHCGKKHICAFVTCILISASILMYSWKDPQLQNNITRKIFQATSALPASQLCRGKPAQNVITA LEDNRTFIISPYFDDRESKVTRVIGIVHHEDVKQLYCWFCCQPDGKIYVARAKIDVHSDRFGFP YGAADIVCLEPENCNPTHVSIHQSPHANIDQLPSFKIKNRKSETFSVDFTVCISAMFGNYNNVL QFIQSVEMYKILGVQKWIYKNNCSQLMEKVLKFYMEEGTVEIIPWPINSHLKVSTKWHFSMDA KDIGYYGQITALNDCIYRNMQRSKFWLNDADEIILPLKHLDWKAMMSSLQEQNPGAGI FLFEN HIfpktvstpvfnisswnrvpgvnilqhvhrepdrkbvfnpkkmiidprqvvqtsvhsvlrayg NSVNVPADVALVYHCRVPLQEELPRESLIRDTALWRYNSSLITNVNKVLHQTVL Haemophilus ducreyi LgtF Q9L875 (SEQ ID NO: 154) MPILTVAMIVKNEAQDLAECLKTVDGWVDEIVIVDSGSTDDTLKIATQFNAKVYVNSDWQGFGP QROFAQQYSDYVLWLDADERVTPELFASILOAVQHNQKNTVYKVSRLSEIFGKEIRYSGWYP DYWRLYPTYLAKYGDELVHEKVHYPADSRVEKLQGDLLHFTYKNIHHYLVKSASYAKAWAMQRAKAGKKASLLDGVTHAIACFLKMYLFKAGFLDGKQGFLLAVLSAHSTFVKYADLWDRTRS Neisseria gonorrhoeae Q5F735 (SEQ ID MO: 155) MKKVSVLIVAKNEANHIRECIESCRFDKEVIVIDDHSADNTAEIAEGLGAKVFRRHLNGDFGAQ KTFAIEQAGGEWVFLIDADERCTPELSDEISKIVRTGDYAAYFVERRNLFPNHPATHGAMRPDS VCRLMPKKGGSVQGKVHETVQTPYPERRLKHFMYHYTYDNWEQYFNKFNKYTSISAEKYREQGK PVSFVRDIILRPIWGFFKIYILNKGFLDGKMGWIMSVNHSYYTMIKYVKLYYLYKSGGKF Rhizobium meliloti (strain 1021) ExoM P33695 (SEQ 1D NO: 156) MPNETLHIDIGVCTYRRPELAETLRSLAAMNVPERARLRVIVADNDAEPSARALVEGLRPEMPFDILYVHCPHSNISIARNCCLDNSTGDFLAFLDDDETVSGDWLTRLLETARTTGAAAVLGPVRAHYGPTAPRWMRSGDFHSTLPVWAKGEIRTGYTCNALLRRDAASLLGRRFKLSLGKSGGEDTDFFTGMHCAGGTIAFSPEAWVHEPVPENRASLAWLAKRRFRSGQTHGRLLAEKAHGLRQAWNIALAGA KSGFCATAAVLCFPSAARRNRFALRAVLHAGVISGLLGLKEIEQYGAREVTSA Rhizobium radiobacter Q44418 (SEQ ID NO: 157) 98 WO 2021/126960 PCT/US2020/065285 MCRCGRAVRSRPvCRPGQLVVRRSPRPRSRNHSRCRPLRLSVFPRPHRRvRHHCQRDlRWEPGR 
WIAVRWKAARSHRRFRRCPFPRQLVWP'VRERHRDAGDRRNQRERRRRDAYHEISEPKFRTRKRT E S EWMNKAITVIVWLLVSLCVLAI ITMPVSLQT HLVATAI SLI FLAT IKS FNGQGAWRLVALGF GTAIVLRYVYWRTTSTLPPVNQLENFIPGFLLYLAEMYSVVMLGLSLVIVSMPLPSRKTRPGSP DYRPTVDVFVPSYNEDAELLANTLAAAKNMDYPADRFTVWLLDDGGSVQKRNAANIVEAQAAQR RHEELKKLCEDLDVEYLTRERNVHAKAGNLNNGLAHSTGELVTVFDADHAPARDFLLETVGYFD EDPRLFLVQTPHFFVNPDPIERNLRTFETMPSENEMFYGIIQRGLDKWNGAFFCGSAAVLRREA LQDSDGFSGVSITEDCETALALHSRGWNSVYVDKPLIAGLQPATFASFIGQRSRWAQGMMQILI FRQPLFKRGLSFTQRLCYMSSTLFWLFPFPRTIFLFAPLFYLFFDLQIFVASGGEFLAYTAAYM LVNLMMQNYLYGSFRWPWISELYEYVQTVHLLPAVVSVIFNPGKPTFKVTAKDESIAEARLSEI SRPFFVIFALLLVAMAFAVWRIYSEPYKADVTLVVGGWNLLNLIFAGCALGVVSERGDKSASRR ITVKRRCEVQLGGSDTWVPASIDNVSVHGLLINIFDSATNIEKGATAIVKVKPHSEGVPETMPL NWRTVRGEGFVSIGCTFSPQRAVDHRLIADLIFANSEQWSEFQRVRRKKPGLIRGTAIFLAIA LFQTQRGLYYLVRARRPAPKSAKPVGAVK Streptococcus agalactiae cpsl 087183 (SEQ ID NO: 158) MIKKIEKDLISVIVPIYNVEDYLVECIESLIVQTYRNIEILLINDGSTDNCATIAKEFSERDCR VIYIEKSNGGLSEARNYGIYHSKGKYLTTVDSDDKVSSDYIANLYNAIQKHDSSIAIGGYLEFY ERHNSIRNYEYLDKVIPVEEALLNMYDIKTYGSIFITAWGKLFHKSIFNDLEFALNKYHEDEFF NYKAYLKANSITYIDKPLYHYRIRVGSIMNNSDNVIIARKKLDVLSALDERIKLITSLRKYSVF LQKTEIFYVNQY FRTKKFLKQQSVMFKEDNYIDAYRMYGRLLRKVKLVDKLKLIKNRFF Streptococcus pneumoniae cps3S 054 611 (SEQ ID NO: 159) MYTFILMLLDFFQNHDFHFFMLFFVFILIRWAVIYFHAVRYKSYSCSVSDEKLFSSVIIPWDE PLNLFESVLNRISRHKPSEIIWINGPKNERLVKLCHDFNEKLENNMTPIQCYYTPVPGKRNAI RVGLEHVDSQSDITVLVDSDTVWTPRTLSELLKPFVCDKKIGGVTTRQKILDPERNLVTMFANL LEEIRAEGTMKAMSVTGKVGCLPGRTIAFRNIVERVYTKFIEETFMGFHKEVSDDRSLTNLTLK KGYKTVMQDTSVVYTDAPTSWKKFIRQQLRWAEGSQYNNLKMTPWMIRNAPLMFFIYFTDMILP MLLISFGVNIFLLKILNITTIVYTASWWEIILYVLLGMIFSFGGRNFKAMSRMKWYYVFLIPVF IIVLSIIMCPIRLLGLMRCSDDLGWGTRNLTE MbUGTc13 (SEQ ID MO: 160) MADAMATTEKKP HVIFIP FPAQSHIKAMLKLAQLLHHKGLQITFVNTDFIHNQFLESSGPHCLD GAPGFRFETIPDGVSHSPEASIPIRESLLRSIETNFLDRFIDLVTKLPDPPTCIISDGFLSVFT IDAAKKLGIPVMMYWTLAACGFMGFYH1HSLIEKGFAPLKDASYLTNGYLDTVIDWVPGMEGIR LKDFPLDWSFDLNDKVLMFTTEATQRSHKVSHHIFHTFDELEPSIIKTLSLRYNHIYIIGPLQL 
LLDQIPEEKKQTGITSLHGYSLVKEEPECFQVJLQSKEPNS?VYVNFGSTTVMSLEDMTEFGWGL ANSNHYFLWIIRSNLVIGENAVLPPELEEHIKKRGFIASWCSQEKVLKHPSVGGFLTHCGWGST IESLSAGVPMICWPYSWDQLTNCRYICKEWEVGLEMGTKVKRDEVKRLVQELMGEGGHKMRNKA KDWKEKARIAIAPNGSSSLNIDKMVKEITVLARN MbUGTcl9 (SEQ ID NO: 161) MANHHECMMWLDDKPKESVVYVAFGSLVKHGPEQVEEITRALIDSDVMFLWVIKHKEEGKLPEN LSEVIKTGKGLIVAWCKQLDVLAHESVGCFVTHCGFNSTLEAISLGVPVVM4PQFSDQTTNAKL LDEILGVGVRVKADENGIVRRGNLASCIKMIMEEERGVIIRKNAVKWKDLAKVAVHEGGSSDND IVEFVSELIKAGSGEQQKIKKSPHVLLIPFPLQGHINPFIQFGKRLISKGVKTTLVTIIHTLNS TLNHSNTTTTSIEIQAISDGCDEGGEHSAGESYLETFKQVGSKSLADLIKKLQSEGTTIDAIIY DSMTEWVLDVAIEFGIDGGSFFTQACVVNSLYYHVHKGLISLPLGETVSVPGFPVLQRWETPLI LQNHEQIQSPWSQMLFGQFANIDQARWVFTNSFYKLEEEVIEWTRKIWNLKVIGPTLPSMYLDK RLDDDKDNGFNLYKA 99 WO 2021/126960 PCT/US2020/065285 MDUGTI-3 (SEQ ID NO: 1621 MENKTETTVRRRRRIILFPVPFQGHINPILQLANVLYSKGFSITIFHTMFNKPKTSNYPHFTFR FILDNDPQDERISNLPTHGPLAGMRIPIINEHGADELRRELELLMLASEEDEEVSCLITDALWY FAQSVADSLNLRRLVLMTSSLFNFHAHVSLPQFDELGYLDPDDKTRLEEOASGFPMLKVKDIKS AYSNWQILKEILGKMIKQTKASSGVIWNSFKELEESELETVIREIPAPSFLIPLPKHLTASSSS LLDHDRTVFQWLDQQPPSSVLYVSFGSTSEVDEKDFLEIARGLVDSKQSFLWVRPGFVKGSTW VEPLPDGFLGERGRIVKWVPQQEVLAHGAIGAFWTHSGWNSTLESVCEGVPMIFSDFGLDQPLN ARYMSDVLE77GVYLENGWERGEIANAIRRWVDEEGEYIRQNARVLKQKADVSLMKGGSSYESL ESLVSYISSL MbUGTl-2 (SEQ ID NO: 163) MATKGSSGMSLAERFWLTLSRSSLWGRSCVEFEPETVPLLSTLRGKPITFLGLMPPLHEGRRE DGEDATVRWLDAQPAKSWYVALGSEVPLGVBKVHELALGLELAGTRFLWALRKPTGVSDADLL PAGFEERTRGRGWATRWVPQMSILAHAAVGAFLTHCGWNSTIEGLMFGHPLIMLPIFGDQGPN ARL LEAKNAGLQVARNDGDGSFDREGvAAAIRAVAGEEESSKVFQAKAKKLQEIVADMACHERY IDGFIQQLRSYKDDSGYSSSYAAAAGMHWICP’SLAFGHLLPCLDLAQRLASRGHRVSFVSTPR NiSRLPPVRPALAPLVAFVALPLPRVEGLPDGAESTNDVPHDRPDMVELHRRAFDGLAAPFSEF LGTACADWVIVDVFHHWAAAAALEHKVPCAMMLLGSAEMIASIADERLEHAETESPAAAGQGRP AAAPTFEVARMKLIR Coffea arabica (CaUGT_l,6) (SEQ ID NO: 164) MAENHATFNVLMLPWLAHGHVSPYLELAKKLTARNENVYLCSSPATLSSVRSKLTEKFSQSIHL VELHLPKLPELPAEYHTTNGLPPHLMPTLKDAFDMAKPNFCNVLKSLKPDLLIYDLLQPWAPEA ASAFNIPAWFISSSATMTSEGLHFFKNPGTKYPYGNAIFYRDYESVFVENLTRRDRDTYRVIN 
CMERSSKIILIKGFNEIEGKYFDYFSCLTGKKWPVGPLVQDPVLDDEDCRIMQWLNKKEKGST VFVSFGSEYFLSKKDMEEIAHGLEVSNVDFIWWRFPKGENIVIEETLPKGFFERVGERGLWN GWAPQAKILTHPNVGGFVSHCGWNSVMESMKFGLPIIAMPMHLDQPINARLIEEVGAGVEVLRD SKGKLHRERMAETINKVMKEASGESVRKKARELQEKLELKGDEEIDDWKELVQLCATKNKRNG LHYY Stevia rebaudiana JGT85C1 (SEQ ID NO: 165) MADQMAKIDEKKPHWFIPFPAQSHIKCMLKLARILHQKGLYITFINTDTNHERLVASGGTQWL ENAPGFWFKTVPDGFGSAKDDGVKPTDALRELMDYLKTNFFDLFLDLVLKLEVPATCIICDGCM TFANTIRAAEKLNIPVILFWTMAACGFMAFYQAKvLKEKEIVPVKDETYLTNGYLDMEIDWIPG MKRIRLRDLPEFILATKQNYFAFEFLFETAQLADKVSHMIIHTFEELEASLVSEIKSIFPNVYT IGPLQLLLNKITQKETNNDSYSLWKEEPECVEWLNSKEPNSVVYVNFGSLAVMSLQDLVEFGWG LVNSNHYFLWIIRANLIDGKPAVMPQELKEAMNEKGFVGSWCSQEEVLNHPAVGGFLTHCGWGS IIESLSAGVPMLGWPSIGDQRANCRQMCKEWEVGMEIGKNVKRDEVEKLVRMLMEGLEGERMRK KALEWKKSATLATCCNGSSSLDVEKLANEIKKLSRN Arabidopsis thaliana AtUGT73C3 (SEQ ID NO: 202) MATEKTHQFHPSLHFVLFPFMAQGHMIPMIDIARLLAQRGVTITIVTTP HNAARFKNVLNRAIE SGLAINILHVKFPYQEFGLPEGKENIDSLDSTELMVPFFKAVNLLEDPVMKLMEEMKPRPSCLI SDWCLPYTSIIAKMFNIPKIVFHGMGCFNLLCMHVLRRNLEILENVKSDEEYFLVPS FPDRVEF TKLQLPVKANASGDWKEIMDEMVKAEYTSYGVIVNTFQELEPPYVKDYKEM4DGKVWSIGPVSL CNKAGADKAERGSKAAIDQDECLQWLDSKEEGSVLYVCLGSICNLPLSQLKELGLGLEESRRSF IWVIRGSEKYKELFEWMLESGFEERIKERGLLIKGWAPQVLILSHPSVGGFLTHCGWNSTLEGI 100 WO 2021/126960 PCT/US2020/065285 TSGlPLITwPLEGDQFCNQKLvVQvLKAGVSAGvEEVMKwGEED;K.IGVLvDKEGVKKAvE£LMGDSDDAKERRRRVKELGELAHKAVEKGGSSHSNITLLLQDIMQLAQFKN Hordeum vulgare subsp. Vulgare HvUGT Bl (SEQ ID NO: 204) MAQAESERMRWMFPWLAHGHINPYLELAKRLIASASGDHHLDVWHLVSTPANLAPLAHHQTD RLRLVELHLPSLPDLPPALHTTKGLPARLMPVLKRACDLAAPRFGALLDELCPDILVYDFIQPWAPLEAEARGVPAFHFATCGAAATAFFIHCLKTDRPPSAFPFESISLGGVDEDAKYTALVTVREDSTALVAERDRLPLSLERSSGFVAVKSSADIERKYMEYLSQLLGKEIIPTGPLLVDSGGSEEQRD GGRIMRWLDGEEPGSWFVSFGSEYFMSEHQMAQMARGLELSGVPFLWWR FPNAEDDARGAAR SMPPGFEPELGLVVEGWAPQRRILSHPSCGAFLTHCGWSSVLESMMGVPMVALPLHIDQPLNA NLAVELGAAAARVKQERFGEFTAEEVARAVRAAVKGKEGEAARRRARELQEWARNNGNDGQIA TLLQRMARLCGKDQAVPN Hordeum vulgare subsp. 
Vulgare HvUGT 33 (SEQ 1D NO: 205) MAEANDGGKMHWMLPWLAFGHVLPFTEFAKRVARQGHRVTLLSAPRNTRRLIDIPPGLAGLIR WHVPLPRVDGLPEHAEATIDLPSDHLRPCLRRAFDAAFERELSRLLQEEAKPDWVLVDYASYW APTAAARHGVPCAFLSLFGAAALSFFGTPETLLGIGRHAKTEPAHLTWPEYVPFPTTVAYRGY EARELFEPGMVPDDSGVSEGYRFAKTIEGCQLVGIRSSSEFEPEWLRLLGELYRKPVIPVGLFP PAPQDDVAGHEATLRWLDGQAPSSWYAAFGSEVKLTGAQLQRIALGLEASGLPFIWAFRAPTS TETGAASGGLPEGFEERLAGRGWCRGWPQVKFLAHASVGGFLTHAGWMSIAEGLAHGVRLVL LPLVFEQGLNARNlVDKNIGVEVARDEQDGSFAAGDIAAALRRVMVEDEGEGFGAKVKELAKVF GDDEVNDQCVREFLMHLSDHSKKNQGQD MbUGTl,2.2 (SEQ ID NO: 206) MATKGSSGMSLAERFWLTLSRSSLWGRSCVEFEPETVPLLSTLRGKPITFLGLMPPLKEGRREDGEDATVRWLDAQPAKSVVYVALGSEVPLGVEKVHELALGLELAGTRFLWALRKPTGVSDADLL PAGFEERTRGRGWATRWVPQMSILAHAAVGAFLTHCGWNSTIEGLMFGHPLIMLPIFGDQGPNARLIEAKNAGLQVARNDGDGSFDREGVAAAIRAVAVEEESSKVFQAKAKKLQEIVADMACHERYIDGFIQQLRSYKDDSGYSSSYAAAAGMHVVICPWLAFGHLLPCLDLAQRLASRGHRVSFVSTPR N1SRLPPVRPALAPLVAFVALPLPRVEGLPDGAESTNDVPHDRPDMVELHRRAFDGLAAPFSEFLGTACADWVIVDVFHHWAAAAALEHKVPCANMLLGSAEMIASIADERLEHAETESPAAAGQGRP AAAPT FEVARMKLIR Coftea canephora (CcUGT_l,6) (207) MAENHATFNVLMLPWLAHGHVSPYLELAMKLTARNFNVYLCSSPATLSSVRSKLTEKFSQSIHL VELHLPKLPELPAEYHTTNGLPPHLMPTLKDAFDMAKPNFCNVLKSLKPDLLIYDLLQPWAPEAAS AFNT PAWFIS S S ATMT S FGL HF FKNPGT KY PYGNTIFYRDYE SVEVENLKKRDRDTY RWNCMERSSKIILIKGFKEIEGKYFDYFSCLTGKKWPVGPLVQDPVLDDEDCRIMQWLNKKEKGST VFVSFGSEYFLSKEDMEEIAHGLELSNVDFIWWRFPKGENIVIEETLPKGFFERVGERGLWN GWAPQAKILTHPNVGGFVSHCGWNSVMESMKFGLPIVAMPMHLDQPINARLIEEVGAGVEVLRD SKGKLHRERMAETINKVTKEASGEPARKKARELQEKLELKGDEEIDDWKELVQLCATKNKRNG LHCYN Coffea eugerdoides (CeUGT 1, 6) (208) MAENHATFNVLMLPWLAHGHVSPYLELAKKLTARNFNVYLCSSPATLSSVRSKLTEKFSQSIHL VELHLPKLPELPAEYHTTNGLPPHLMPTLKDAFDMAEPNFCNVLKSLKPDLLIYDLLQPWAPEAASAFNIPAWFISSSATMTSFGLHFFKNPGTKYPYGNTIFYRDYESVFVENLKRRDRDTYRWNCMERSSKIILIKGFKEIEGKYFDYFSCLTGKKWPVGPLVQDPVLDDEDCRIMQWLNKKEKGST 101 WO 2021/126960 PCT/US2020/065285 VFVSrGSEYFLSECEDMEEIAHGLELSNVDFIVJVVREPRGENIVlEETLPKGFFERVGERGLVVN GWAPQAKILTEPNVGGFVSHCGWNSVMESMKFGLPIIAMPMHLDQPINARLIEEVGAGVEVLRD 
SKGKLHRERMAETINKVTKEASGESVRKKARELQEKLELKGDEEIDDWKELVQLCATKNKRNG LHYN Coffea euqeniaides (CeUGT 1,6.2) (209) MAENHATFNVLMLPWLAHGHVSPYLELAKKLTARNmVYLCSSPATLSSVRSKLTEKFSQSIHL VELHLPKLPELPAEYHTTNGLPPHLMPTLKDAFDMAKPNFCNVLKSLKPDLLIYDLLQPWAPEA ASAFNIPAWFISSSATMTSFGLHFFKNPGTKYPYGNAIFYRDYESVFVENLTRRDRDTYRVIN CMERSSKIILIKGFNEIEGKYFDYFSCLTGKKWPVGPLVQDPVLDDEDCEIMQWLNKKEKVST VFVSFGSEYFLSKKDMEEIAHGLELSMVDFIWWR FPKGENIVIEETLPKGFFERVGERGLWN GWAPQAKILTHPNVGGFVSHCGWNSVMESMKFGLPIIAMPMHLDQPINARLIEEVGAGVEVLRD SKGKLHRERMAETTNKVMKEASGESVRKKARELQEKMDLKGDEEIDDVVKELvQLCATKNKRNG LHYY Siraitia grosvenorii (SgUGT94-289-3.2) !210) MADAAQQGDTTTILMLPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSFSDS IQFVELHLPSSPEFPPHLHTTNGLPPTLMPALHQAFSMAAQHFESILQTLAPHLLIYDSLQPWA PRVASSLKIPAINFNTTGVFVISQGLHPIHYPHSKFPFSEFVLHNHWKAMYSTADGASTERTRK RGEAFLYCLHASCSVILINSFRELEGKYMDYLSVLLNKKWPVGPLVYEPNQDGEDEGYSSIKN WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVNFIWWRFPQGDNTSGIEDALPKGFL ERAGERGMWKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHVDQPFNAGLVE EAGVGVEAKRDPDGKIQRDEVAKLIKEVVvEKTREDVRKKAREMSEILRSKGEEKFDEMVAEIS LLLKI Oryza sativa (OsJUGT 1,6) (SEQ ID NO: 211) MAQAERERLRVLMFPVLAEGEINPYLELATRLTTTSSSQIDVVVHLVSTPVNLAAVAHRRTDRI SLVELHLPELPGLPPALHTTKHLPPRLMPALKRACDLAAPAFGALLDELSPDVVLYDFIQPWAP LEAAARGVPAVHFSTCSAAATAFFLHFLDGGGGGGGRGAFPFEAISLGGAEEDARYTMLTCRDD GTALLFKGERLPLSFARSSEFVAVKTCVE1ESKYMDYLSKLVGKEHPCGPLLVDSGDVSAGSE ADGVMRWLDGQEPGSVVLVSFGSEYFMTEKQLAEMARGLELSGAAFVWVVRFPQQSPDGDEDDH GAAAARAMPPGFAPARGLVVEGWAPQRRVLSHRSCGAFLTHCGWSSVMESMSAGVPMVALPLHI DQPVGANLAAELGVAARVRQERFGEFEAEEVARAvBAYTMRGGEALRRRATELREVVARRDAECD EQIGALLHRMARLCGKGTGRAAQLG H Panax ginseng (PsUGT94_Bl) (SEQ ID NO: 213) MADNQNG RI SI ALL P F LA HGHIS P F FE LAKQLAKRNCNV FLC S TP INL S SI RD KD S S AS IKLVE LHLPSSPDLPPHYHTTNGLPSHLMLPLRNAFETAGPTFSEILKTLNPDLLIYDFNPSWAPEIAS SHNIPAVYFLTTAAASSSIGLHAFKNPGEKYPFPDFYDNSNITPEPPSADNMKLLHDFIACFER SCPlILIKSFRELEGKYIDLLSTLSDKTLVPR'GPLVQDPiviGHNEDPKTEQIlNWLDKRAESTVV FVCFGSEYFLSNEELEEVAIGLEISTVNFIVifAVRLIEGEKKGILPEGFVQRVGDRGLVVEGWAP 
QARILGHSSTGGFVSHCGWSSIAESMKFGVPVIAMARHLDQPLNGKLAAEVGVGMEVVRDENGK YKREG1AEVIRKVVVEKSGEVIRRKARELSEKMKEKGEQEIDRALEELVQ1CKKKKDEQ Stevia rebaudiana (SrUGT73El, with optional His tag) (SEQ ID NO: 214) MABHHHHHVGTGSNDDDDKSPDPNWASTSELVFIPSPGAGHLPPTVELAKLLLHRDQRLSVTII VMNLWLGPKHNTEARPCVPSLRFVDIPCDESTMALISPNTFISAFVEHHKPRVRDIVRGIIESD 102 WO 2021/126960 PCT/US2020/065285 SvRLAGFVLDMh'CMPFiSDVANEFGVPSYNYFTSGAATLGLMFHEQWKRDHEGYDATEEPCNSDTE LSVPSYVNPVPAKVLPEVVLDKEGGSKMFLDLAERIRESKGIIVNSCQAIERHALEYLSSNNNG IPPVFPVGPILNLENKKDDAKTDEIMRWLNEQPESSVVFLCFGSMGSFNEKQVKEIAVAIERSG HRFLWSLRRPTPKEKIEFPKEYENLEEVLPEGFLKRTSSIGKVIGWAPOMAVLSHPSVGGFVSH CGWNSTLESMWCGVPMAAWPLYAEQTLNAFLLVVELGLAAEIRMDYRTDTKAGYDGGMEVTVEE IEDGIRKLMS DGEIRNKVKDVKE KS RAAVVE GG S S YAS IGKFI EHVSNVTI Oryza sativa (OsUGTl-2) (SEQ ID NO: 215) MADSGYSSSYAAAAGNiHVVICPWLAFGHLLPCLDIjAQRLASRGHRVSFVSTPRNISRLPPVRPA LAPLVAFVALPLPRVEGLPDGAESTNDVPHDRPDMVELHRRAFDGLAAPFSEFLGTACADWVIV DVFHHWAAAAALEHKVPCAtMLLGSAHMIASIADRRLERAETESPAAAGQGRPAAAPTFEVARM KLIRTKGSSGMSLAERFSLTLSRSSLVVGRSCVEFEPETVPLLSTLRGKPITFLGLMPPLHEGR REDGEDATVRWLDAQPAKSVVYVALGSEVPLGVEKVHELALGLELAGTRFLWALRKPTGVSDAD LL PAG F E E RT RG RG V VAT RWVPQMSILAHAAVGA FLT HO GWN S TIE GLM EG H P LI ML PIFGDQG PNARL1EAKNAGLQVARNDGDGSFDREGVAAAIRAVAVEEESSKVFQAKAKKLQEIVADMACHE RYIDGFIQQLRSYKD Caraelina sativa (XP 010516905.1) (SEQ ID NO: 216) MASEKTLQVHPPLHFVLFPFMAQGHMIPMVDIARLLAQRGATVTIVTTRYNAGRFENVLSRAVE SGLPINIVHVKFPYEEVGLPKGKENIDSLDSMELMVPFFKAVNMLQDPVVKLMEEMESRPSCII SDLLLPYTSKIAKKFNIPKIVFHGISCFCLLCVHvLRRNLEILTNLKSDKEYFLVPSFPDRVEF TKPQVTVETNASGDWKEFLDEMVEAEDTSYGVIINTFEELEPAYVKDYKDARAGNVWSIGPVSL CNFAGVDKAERGNKATIDQDECLKWLDSKEEGSVLYVCLGSICNLPLVQLKELGLGLEESQRPF IWVIRGWEKYNELSEWMVESGFEERIRERGLLIRGWAPQVLILSHPSVGGFLTHCGWNSTVEGI TSGVPLITWPLFGDQFCNQTLVVQVLKAGVSVGVEEVMKWGEEEKIGVLVDKEGVKKAVEDLMG ESDDAKERTKRVKELGGLAHKAVEEGGSSHSNITLFLQDIRQVQSV Glycyrrhiza uralensis (UGT73F24) (SEQ ID NO: 217) MADVAEEQPLKIYFIPYLAAGHMIPLCDIATLFASRGHHVTIITTPSNAQTLRESHHFRVQTIQ 
FPSQEVGLPAGVQNLTAVTNLDDSYKIYHATMLLRKHIEDFVERDPPDCIVADFLFPWVDDVAT KLHIPRLVFNGFTLFTICAMESHKAHPLPVDAASGSFVIPDFPHHVTINSTPPKRTKEFVDPLL TEAFKSHGFLINSFVELDGEECVEHYERITGGHKAWHLGPAFLVHRTAQDRGEKSVVSTQECLS WLDSKB.DNSVLYICFGTICYFPDKQLYEIASAIEASGHEFIWVVPEKRGMADESEEEKEKWLPK GFEERNNGKKGMIIRGWAPQVAILGHPAVGGFLTHCGWNSTVEAVSAGVPMITWPVHSDQYFNE KLITQVRGIGVEVGAEEWIVTAFRETEKLVGRDRIERAVRRVMDGGDEAVQIRRRARELGEMAR QAVQEGGSSHTNLTALINDLKRWRDSKQLN Glycyrrhiza uralensis (UGT73C33) (SEQ ID NO: 218) MAVFQANQPHFVLFPLMAQGHIIPMIDIARLLAQRGAIVTIFTTPKNASRFTSVLSRAVSSGLQ ]:RLVHLHFPSKEAGLPEGCENLDMVASHDMICNTFQAIRMLQKQAEELFETLTPKPSCIISDFC IPWTTQVAEKHHIPRISFHGFSCFCLHCMLKIHTSKVLEGITSESEYETVPGIPDQIQVTKQQV PGPMIDEMKEFGEQMRDAEIRSYGVIINTFEELEKAYVNDYKKERNGKVWCIGPVSLCNKDGLD KAQRGNKASISEHHCLEWLDLQQPNSVIYVCLGSLCNLTPPQLMELALGLEATKRPFIWVIREG NKFEELEKtnSEEGFEERIKGRGLIIRGWAPQVLILSHPSIGGFLTHCGWNSTLEGVTAGVPMV TWPLFADQFLNEKLVTQVLRIGVSLGVDVPLKWGEEEKVGVQVKKEGIEKAICNVMDEGEESKE RRERAKELSEMAKRAVEKDGSSHLNMTMLIQDIMQQSSSKVET
Claims (171)
1. A method for making mogrol or mogroside, comprising:
providing a recombinant microbial host cell expressing a heterologous enzyme pathway catalyzing the conversion of isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP) to mogrol or mogroside, the pathway comprising at least one of:
(A) at least two squalene epoxidase enzymes (SQE) for converting squalene to 2,3;22,23-dioxidosqualene;
(B) at least one triterpene cyclase enzyme for converting 22,23-dioxidosqualene to 24,25-epoxycucurbitadienol, the triterpene cyclase enzyme comprising an amino acid sequence that is at least 70% identical to one of SEQ ID NO: 191, SEQ ID NO: 192, and SEQ ID NO: 193;
(C) at least one epoxide hydrolase converting 24,25-epoxycucurbitadienol to 24,25-dihydroxycucurbitadienol, the at least one epoxide hydrolase comprising an amino acid sequence that is at least 70% identical to any one of SEQ ID NOS: 189, 58, 184, 185, 187, 188, 190, and 212;
(D) a cytochrome P450 enzyme comprising an amino acid sequence having at least 70% sequence identity with an amino acid sequence selected from SEQ ID NO: 194 and SEQ ID NO: 171; and
(E) at least one uridine diphosphate dependent glycosyltransferase (UGT) enzyme comprising an amino acid sequence having at least 70% sequence identity to any one of SEQ ID NO: 164, 165, 138, 204 to 211, 213 to 218; and
culturing the host cell under conditions for producing the mogrol or mogroside.
2. The method of claim 1, wherein at least one squalene epoxidase comprises an amino acid sequence that is at least 70% identical to any one of SEQ ID NOS: 17 to 39, 168 to 170, and 177 to 183.
3. The method of claim 2, wherein at least one squalene epoxidase comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 39.
4. The method of claim 3, wherein the at least one SQE comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 39.
5. The method of claim 3, wherein the SQE comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 39, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
6. The method of claim 3, wherein the host cell comprises two squalene epoxidase enzymes that each comprise an amino acid sequence that is at least 70% identical to SEQ ID NO: 39.
7. The method of claim 6, wherein one of the SQE enzymes has one or more amino acid modifications that improve specificity or productivity for conversion of 2,3-oxidosqualene to 2,3;22,23-dioxidosqualene, as compared to the enzyme having the amino acid sequence of SEQ ID NO: 39.
8. The method of claim 6 or 7, wherein the amino acid modifications to the squalene epoxidase comprise one or more modifications at positions corresponding to the following positions of SEQ ID NO: 39: 35, 133, 163, 254, 283, 380, and 395.
9. The method of claim 8, wherein the amino acid modifications to the squalene epoxidase comprise two, three, four, five, or six amino acid modifications selected from substitutions at positions corresponding to positions 35, 133, 163, 254, 283, 380, and 395 of SEQ ID NO: 39.
10. The method of claim 8 or 9, wherein the amino acid modifications are selected from:
the amino acid at the position corresponding to position 35 of SEQ ID NO: 39 is arginine or lysine;
the amino acid at the position corresponding to position 133 of SEQ ID NO: 39 is glycine, alanine, leucine, isoleucine, or valine;
the amino acid at the position corresponding to position 163 of SEQ ID NO: 39 is glycine, alanine, leucine, isoleucine, or valine;
the amino acid at the position corresponding to position 254 of SEQ ID NO: 39 is phenylalanine, alanine, leucine, isoleucine, or valine;
the amino acid at the position corresponding to position 283 of SEQ ID NO: 39 is alanine, leucine, isoleucine, or valine;
the amino acid at the position corresponding to position 380 of SEQ ID NO: 39 is alanine, leucine, or glycine; and
the amino acid at the position corresponding to position 395 of SEQ ID NO: 39 is tyrosine, serine, or threonine.
11. The method of claim 10, wherein the squalene epoxidase comprises the amino acid substitutions: H35R, F163A, M283L, V380L, and F395Y, numbered according to SEQ ID NO: 39.
12. The method of claim 10, wherein the squalene epoxidase comprises the amino acid substitutions: H35R, N133G, F163A, Y254F, V380L, and F395Y, numbered according to SEQ ID NO: 39.
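The substitution notation used in claims 11 and 12 (wild-type residue, 1-based position, replacement residue, e.g. H35R) can be illustrated with a minimal Python sketch. The helper name `apply_substitutions` and the ten-residue toy sequence are assumptions made for illustration only; the toy sequence is not SEQ ID NO: 39, and the sketch is not part of the claimed method.

```python
import re

# Pattern for a single-residue substitution such as "H35R":
# wild-type letter, 1-based position, replacement letter.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Return a copy of `sequence` with each substitution applied.

    Hypothetical helper for illustration. Positions are 1-based and are
    numbered according to `sequence` itself. A ValueError is raised when
    the residue found at a position does not match the stated wild-type
    residue, which catches numbering mistakes early.
    """
    residues = list(sequence)
    for item in substitutions:
        match = _SUBSTITUTION.match(item)
        if match is None:
            raise ValueError(f"unrecognized substitution: {item!r}")
        wild_type, position, replacement = (
            match.group(1),
            int(match.group(2)),
            match.group(3),
        )
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Toy sequence (an assumption, NOT SEQ ID NO: 39): H at position 3, F at 6.
toy_reference = "MAHKLFGVNY"
toy_variant = apply_substitutions(toy_reference, ["H3R", "F6A"])
print(toy_variant)  # MARKLAGVNY
```

With a real reference sequence, the same call would take the substitution list of claim 11 or claim 12 as written, provided the numbering matches the reference.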
13. The method of any one of claims 1 to 12, wherein the heterologous enzyme pathway further comprises a squalene synthase (SQS).
14. The method of claim 13, wherein the SQS comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 2 to 16, 166, and 167.
15. The method of claim 14, wherein the SQS comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 11.
16. The method of claim 15, wherein the SQS comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 11.
17. The method of claim 15, wherein the SQS comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 11, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
18. The method of claim 14, wherein the SQS comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 2, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 166, or SEQ ID NO: 167.
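The percent-identity tiers recited in the preceding claims (70%, 80%, 85%, 90%, 95%, 98%, 99%) can be illustrated with a minimal Python sketch. The claims in this section do not state how identity is computed, so the sketch assumes a pairwise alignment has already been produced by some external tool and that identity is counted over alignment columns; other conventions (for example, dividing by the length of the shorter sequence) give different values. The function names and toy sequences are hypothetical.

```python
# Identity tiers recited in the claims, highest first.
THRESHOLDS = (99, 98, 95, 90, 85, 80, 70)


def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a precomputed alignment.

    Assumption: both inputs are rows of the same pairwise alignment,
    with '-' marking gaps. Columns where both rows are gaps are ignored;
    a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no residue columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def highest_threshold_met(identity):
    """Return the highest recited tier that `identity` satisfies, or None."""
    for threshold in THRESHOLDS:
        if identity >= threshold:
            return threshold
    return None


# Toy alignment (an assumption, not a sequence from the listing):
# 10 columns, 8 identical, one gap column, one mismatch.
identity = percent_identity("MKT-AYIAKQ", "MKTLAYLAKQ")
print(identity)                         # 80.0
print(highest_threshold_met(identity))  # 80
```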
19. The method of any one of claims 1 to 18, wherein the heterologous enzyme pathway comprises at least one triterpene cyclase (TTC).
20. The method of claim 19, wherein at least one TTC comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 40 to 55, 191 to 193, and 219 to 220.
21. The method of claim 20, wherein the heterologous enzyme pathway comprises at least two enzymes having triterpene cyclase activity and converting 22,23-dioxidosqualene to 24,25-epoxycucurbitadienol.
22. The method of claim 20 or 21, wherein the TTC comprises an amino acid sequence that is at least 70% identical to the amino acid sequence of SEQ ID NO: 40.
23. The method of claim 22, wherein the TTC comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 40.
24. The method of claim 23, wherein the TTC comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 40, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
25. The method of any one of claims 19 to 24, wherein the heterologous enzyme pathway comprises at least one TTC that comprises an amino acid sequence that is at least 70% identical to one of SEQ ID NO: 191, SEQ ID NO: 192, and SEQ ID NO: 193.
26. The method of claim 25, wherein at least one TTC comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 191, 192, and 193.
27. The method of claim 26, wherein the TTC comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 191, 192, and 193, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
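The recurring limitation "from 1 to 20 amino acid modifications ... independently selected from amino acid substitutions, deletions, and insertions" can be illustrated with a minimal Python sketch. It assumes one possible reading, namely that the number of modifications is the minimum number of single-residue substitutions, deletions, and insertions separating the variant from the reference (the Levenshtein edit distance). The claims do not prescribe this computation, and the function names and toy sequences are hypothetical.

```python
def edit_distance(reference, variant):
    """Minimum number of single-residue substitutions, deletions, and
    insertions turning `reference` into `variant` (Levenshtein distance),
    computed row by row with dynamic programming."""
    previous = list(range(len(variant) + 1))
    for i, ref_residue in enumerate(reference, start=1):
        current = [i]
        for j, var_residue in enumerate(variant, start=1):
            cost = 0 if ref_residue == var_residue else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion from the reference
                    current[j - 1] + 1,      # insertion into the reference
                    previous[j - 1] + cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def within_modification_range(reference, variant, low=1, high=20):
    """True when the edit count falls in the recited 1-to-20 range."""
    return low <= edit_distance(reference, variant) <= high


# Toy sequences (assumptions, not sequences from the listing).
print(edit_distance("MKTAYIAKQ", "MKTAYLAKQ"))  # 1 (one substitution)
print(edit_distance("MKTAYIAKQ", "MKTYIAKQG"))  # 2 (one deletion, one insertion)
print(within_modification_range("MKTAYIAKQ", "MKTAYIAKQ"))  # False (0 edits)
```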
28. The method of any one of claims 1 to 27, wherein the heterologous pathway comprises an enzyme that converts cucurbitadienol to 24,25-epoxycucurbitadienol.
29. The method of claim 28, wherein the enzyme converting cucurbitadienol to 24,25-epoxycucurbitadienol comprises an amino acid sequence having at least about 70% sequence identity to SEQ ID NO: 221.
30. The method of any one of claims 1 to 29, wherein the heterologous enzyme pathway comprises an epoxide hydrolase (EPH).
31. The method of claim 30, wherein the EPH comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 56 to 72, 184 to 190, and 212.
32. The method of claim 31, wherein the EPH comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 56 to 72, 184 to 190, and 212.
33. The method of claim 32, wherein the EPH comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 56 to 72, 184 to 190, and 212, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
34. The method of claim 31, wherein the heterologous pathway comprises at least one EPH converting 24,25-epoxycucurbitadienol to 24,25-dihydroxycucurbitadienol, the at least one EPH comprising an amino acid sequence that is at least 70% identical to one of: SEQ ID NOS: 189, 58, 184, 185, 187, 188, 190, and 212.
35. The method of claim 34, wherein the EPH comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 189, 58, 184, 185, 187, 188, 190, and 212.
36. The method of claim 35, wherein the EPH comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 189, 58, 184, 185, 187, 188, 190, and 212, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
37. The method of any one of claims 1 to 36, wherein the heterologous pathway comprises one or more oxidases that oxidize C11 of C24,25 dihydroxycucurbitadienol to produce mogrol.
38. The method of claim 37, wherein at least one oxidase is a cytochrome P450 enzyme.
39. The method of claim 38, wherein at least one cytochrome P450 enzyme comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 73 to 91, 171 to 176, and 194 to 200.
40. The method of claim 39, wherein at least one cytochrome P450 enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 73 to 91, 171 to 176, and 194 to 200.
41. The method of claim 40, wherein at least one cytochrome P450 enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 73 to 91, 171 to 176, and 194 to 200, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
42. The method of any one of claims 37 to 41, wherein the cytochrome P450 comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NO: 194 and SEQ ID NO: 171.
43. The method of claim 42, wherein the cytochrome P450 enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 194 and 171.
44. The method of claim 43, wherein at least one cytochrome P450 enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 194 and 171, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
45. The method of any one of claims 42 to 44, wherein the cytochrome P450 enzyme has at least a portion of its transmembrane region substituted with a heterologous transmembrane region.
46. The method of claim 37, wherein at least one oxidase is a non-heme iron oxidase.
47. The method of claim 46, wherein the non-heme iron oxidase comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 100 to 115.
48. The method of any one of claims 37 to 47, wherein the microbial host cell expresses one or more electron transfer proteins selected from a cytochrome P450 reductase (CPR), flavodoxin reductase (FPR) and ferredoxin reductase (FDXR) sufficient to regenerate the one or more oxidases.
49. The method of claim 48, wherein the microbial host cell expresses a cytochrome P450 reductase comprising an amino acid sequence that is at least 70% identical to one of SEQ ID NOS: 92 to 99 and 201.
50. The method of claim 49, wherein the cytochrome P450 reductase comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 92 to 99 and 201.
51. The method of claim 49, wherein the microbial host cell expresses SEQ ID NO: 194 or a derivative thereof, and SEQ ID NO: 98 or a derivative thereof.
52. The method of claim 49, wherein the microbial host cell expresses SEQ ID NO: 171 or a derivative thereof, and SEQ ID NO: 201 or a derivative thereof.
53. The method of any one of claims 1 to 52, wherein the heterologous enzyme pathway comprises one or more uridine diphosphate-dependent glycosyltransferase (UGT) enzymes, thereby producing one or more mogrol glycosides.
54. The method of claim 53, wherein the one or more mogrol glycosides are selected from Mog.II-E, Mog.III, Mog.III-A1, Mog.III-A2, Mog.IH, Mog.IV, Mog.IV-A, siamenoside, Mog.V, and Mog.VI.
55. The method of claim 53, wherein the one or more mogrol glycosides include Mog.VI, Isomog.V, and Mog.V.
56. The method of claim 53, wherein the host cell produces Mog.V or siamenoside.
57. The method of any one of claims 53 to 56, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NOS: 116 to 165, 202 to 210, 211, 213 to 218.
58. The method of claim 57, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 116 to 165, 202 to 210, 211, 213 to 218.
59. The method of claim 58, wherein at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to one of SEQ ID NOS: 116 to 165, 202 to 210, 211, 213 to 218, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
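The "from 1 to 20 amino acid modifications" language can be pictured as a bound on the number of substitutions, deletions, and insertions separating a variant from a reference sequence. The sketch below counts the minimum number of such single-residue edits (the Levenshtein distance). Treating the minimal edit count as the modification count is an assumption, since the claims do not specify how modifications are tallied; the function names and toy sequences are likewise hypothetical.

```python
def modification_count(reference: str, variant: str) -> int:
    """Minimum number of single-residue substitutions, deletions and
    insertions turning `reference` into `variant` (Levenshtein distance)."""
    prev = list(range(len(variant) + 1))
    for i, ref_res in enumerate(reference, start=1):
        curr = [i]
        for j, var_res in enumerate(variant, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion from reference
                curr[j - 1] + 1,                      # insertion into reference
                prev[j - 1] + (ref_res != var_res),   # substitution or match
            ))
        prev = curr
    return prev[-1]


def within_modification_range(reference: str, variant: str,
                              low: int = 1, high: int = 20) -> bool:
    """True if the variant differs by between `low` and `high` edits."""
    return low <= modification_count(reference, variant) <= high


# Hypothetical toy sequences: one substitution (A to G) and one deletion.
toy_reference = "MKTAYIAK"
toy_variant = "MKTGYIK"
n_mods = modification_count(toy_reference, toy_variant)
in_range = within_modification_range(toy_reference, toy_variant)
```

An identical sequence yields zero edits and therefore falls outside a 1 to 20 range under this counting convention.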
60. The method of any one of claims 53 to 59, wherein at least one uridine diphosphate dependent glycosyltransferase (UGT) enzyme comprises an amino acid sequence having at least 70% sequence identity to one of SEQ ID NO: 164, 165, 138, 204 to 211, and 213 to 218.
61. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 165.
62. The method of claim 61, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 165; or comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 165, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
63. The method of claim 62, wherein the at least one UGT enzyme comprises a substitution at one or more of positions 41, 49, and 127, with respect to SEQ ID NO: 165, wherein said one or more substitutions optionally include one or more of: L41F, D49E, and C127F.
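Substitution labels such as L41F follow the usual convention of original residue, 1-based position, replacement residue. The following sketch parses such labels and applies them to a sequence, checking that the expected original residue is present at each position. The toy sequence is a placeholder built only so that positions 41, 49 and 127 carry L, D and C; it is not SEQ ID NO: 165, and the function name is hypothetical.

```python
import re
from typing import Iterable

# Pattern for labels like "L41F": original residue, position, new residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions: Iterable[str]) -> str:
    """Apply point substitutions, numbered from 1, to a protein sequence."""
    residues = list(sequence)
    for label in substitutions:
        match = _SUBSTITUTION.match(label)
        if match is None:
            raise ValueError(f"unrecognised substitution label: {label!r}")
        original, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3))
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(
                f"{label}: expected {original} at position {position}, "
                f"found {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


# Placeholder sequence (NOT SEQ ID NO: 165): L at 41, D at 49, C at 127.
toy = "A" * 40 + "L" + "A" * 7 + "D" + "A" * 77 + "C" + "A" * 10
mutant = apply_substitutions(toy, ["L41F", "D49E", "C127F"])
```

The residue check guards against numbering mismatches, which matters because the claims number positions with respect to a specific reference sequence.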
64. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 164.
65. The method of claim 64, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 164; or comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 164, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
66. The method of claim 65, wherein the at least one UGT enzyme comprises one or more substitutions listed in Table 3, with respect to SEQ ID NO: 164, and optionally having one or more amino acid substitutions selected from S150F, T147L, N207K, K270E, V281L, L354V, L13F, T32A, and K101A with respect to SEQ ID NO: 164.
67. The method of claim 59, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 138.
68. The method of claim 67, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 138; or comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 138, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
69. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 204.
70. The method of claim 69, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 204; or comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 204, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
71. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 205.
72. The method of claim 71, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 205; or comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 205, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
73. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 206.
74. The method of claim 73, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 206; or comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 206, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
75. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 207; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 207.
76. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 208; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 208.
77. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 209; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 209.
78. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 210; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 210.
79. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 211; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 211.
80. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 213; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 213.
81. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 214; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 214.
82. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 215; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 215.
83. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 218; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 218.
84. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 217; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 217, and optionally having one or more amino acid substitutions selected from A74E, I91F, H101P, Q241E, and I436L.
85. The method of claim 60, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 216; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 216.
86. The method of any one of claims 60 to 85, wherein at least one UGT enzyme further comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 146.
87. The method of claim 86, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 146; or at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 146, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
88. The method of any one of claims 60 to 87, wherein at least one UGT enzyme further comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 202.
89. The method of claim 88, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 202; or at least one UGT enzyme comprises an amino acid sequence having from 1 to 20 amino acid modifications with respect to SEQ ID NO: 202, the amino acid modifications being independently selected from amino acid substitutions, deletions, and insertions.
90. The method of any one of claims 60 to 89, wherein at least one UGT enzyme is a circular permutant of a wild-type UGT enzyme, or a derivative thereof.
91. The method of any one of claims 60 to 90, wherein the microbial host cell expresses at least three UGT enzymes: a first UGT enzyme catalyzing primary glycosylation at the C24 hydroxyl of mogrol, a second UGT enzyme catalyzing primary glycosylation at the C3 hydroxyl of mogrol, and a third UGT enzyme catalyzing one or more branching glycosylation reactions.
92. The method of claim 91, wherein the microbial host cell expresses one or two UGT enzymes catalyzing beta 1,2 and/or beta 1,6 branching glycosylations of the C3 and/or C24 primary glycosylations.
93. The method of any one of claims 53 to 57, wherein the UGT enzymes comprise three or four UGT enzymes selected from:
SEQ ID NO: 165 or a derivative thereof;
SEQ ID NO: 146 or a derivative thereof;
SEQ ID NO: 214 or a derivative thereof;
SEQ ID NO: 129 or a derivative thereof;
SEQ ID NO: 164 or a derivative thereof;
SEQ ID NO: 116 or a derivative thereof;
SEQ ID NO: 202 or a derivative thereof;
SEQ ID NO: 218 or a derivative thereof;
SEQ ID NO: 217 or a derivative thereof;
SEQ ID NO: 138 or a derivative thereof;
SEQ ID NO: 204 or a derivative thereof;
SEQ ID NO: 205 or a derivative thereof;
SEQ ID NO: 207 or a derivative thereof;
SEQ ID NO: 208 or a derivative thereof;
SEQ ID NO: 209 or a derivative thereof;
SEQ ID NO: 11 or a derivative thereof;
SEQ ID NO: 215 or a derivative thereof;
SEQ ID NO: 213 or a derivative thereof;
SEQ ID NO: 206 or a derivative thereof;
SEQ ID NO: 122 or a derivative thereof; and
SEQ ID NO: 210 or a derivative thereof.
94. The method of any one of claims 1 to 93, wherein the microbial host cell is prokaryotic or eukaryotic, and is optionally a bacterium selected from Escherichia coli, Bacillus subtilis, Corynebacterium glutamicum, Rhodobacter capsulatus, Rhodobacter sphaeroides, Zymomonas mobilis, Vibrio natriegens, or Pseudomonas putida; or is a yeast selected from a species of Saccharomyces, Pichia, or Yarrowia, and which is optionally Saccharomyces cerevisiae, Pichia pastoris, and Yarrowia lipolytica.
95. The method of claim 94, wherein the microbial host cell is E. coli.
96. The method of claim 94 or 95, wherein the microbial host cell is a bacterium that produces increased MEP pathway products.
97. The method of any one of claims 1 to 96, wherein the heterologous enzyme pathway comprises a farnesyl diphosphate synthase (FPPS).
98. The method of any one of claims 1 to 97, wherein the microbial host cell has one or more genetic modifications that increase the production or availability of UDP-glucose.
99. The method of claim 98, wherein the microbial host cell is a bacterial cell having one or more genetic modifications selected from ΔgalE, ΔgalT, ΔgalK, ΔgalM, ΔushA, Δagp, Δpgm, duplication or overexpression of E. coli galU, expression of Bacillus subtilis UGPA, and expression of Bifidobacterium adolescentis SPL.
100. The method of any one of claims 1 to 99, wherein the mogrol glycoside products are recovered from the extracellular media.
101. A method for making a product comprising a mogrol glycoside, comprising: producing a mogrol glycoside in accordance with any one of claims 1 to 100, and incorporating the mogrol glycoside into a product.
102. The method of claim 101, wherein the product is a sweetener composition, flavoring composition, food, beverage, chewing gum, texturant, pharmaceutical composition, tobacco product, nutraceutical composition, or oral hygiene composition.
103. The method of claim 101 or 102, wherein the product further comprises one or more of a steviol glycoside, aspartame, and neotame.
104. The method of claim 103, wherein the steviol glycoside comprises one or more of RebM, RebB, RebD, RebA, RebE, and RebI.
105. A microbial host cell expressing a heterologous enzyme pathway catalyzing the conversion of isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP) to mogrol or mogroside, the pathway comprising at least one of:
(A) at least two squalene epoxidase enzymes (SQE) for converting squalene to 2,3;22,23 dioxidosqualene;
(B) at least one triterpene cyclase enzyme for converting 22,23-dioxidosqualene to 24,25-epoxycucurbitadienol, the triterpene cyclase enzyme comprising an amino acid sequence that is at least 70% identical to one of SEQ ID NO: 191, SEQ ID NO: 192, and SEQ ID NO: 193;
(C) at least one epoxide hydrolase converting 24,25-epoxycucurbitadienol to 24,25-dihydroxycucurbitadienol, the at least one epoxide hydrolase comprising an amino acid sequence that is at least 70% identical to any one of SEQ ID NOS: 189, 58, 184, 185, 187, 188, 190, and 212;
(D) a cytochrome P450 enzyme comprising an amino acid sequence having at least 70% sequence identity with an amino acid sequence selected from SEQ ID NO: 194 and SEQ ID NO: 171; and
(E) at least one uridine diphosphate dependent glycosyltransferase (UGT) enzyme comprising an amino acid sequence having at least 70% sequence identity to any one of SEQ ID NO: 164, 165, 138, 204 to 211, and 213 to 218.
106. The microbial host cell of claim 105, wherein at least one squalene epoxidase comprises an amino acid sequence that is at least 70% identical to any one of SEQ ID NOS: 17 to 39, 168 to 170, and 177 to 183.
107. The microbial host cell of claim 106, wherein at least one squalene epoxidase comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 39.
108. The microbial host cell of claim 107, wherein the at least one SQE comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 39.
109. The microbial host cell of claim 108, wherein the host cell comprises two squalene epoxidase enzymes that each comprise an amino acid sequence that is at least 70% identical to SEQ ID NO: 39.
110. The microbial host cell of claim 109, wherein one of the SQE enzymes has one or more amino acid modifications that improve specificity or productivity for conversion of 2,3-oxidosqualene to 2,3;22,23 dioxidosqualene, as compared to the enzyme having the amino acid sequence of SEQ ID NO: 39.
111. The microbial host cell of claim 110, wherein the amino acid modifications to the squalene epoxidase comprise one or more modifications at positions corresponding to the following positions of SEQ ID NO: 39: 35, 133, 163, 254, 283, 380, and 395.
112. The microbial host cell of claim 111, wherein the squalene epoxidase comprises the amino acid substitutions: H35R, F163A, M283L, V380L, and F395Y, numbered according to SEQ ID NO: 39; or comprises the amino acid substitutions: H35R, N133G, F163A, Y254F, V380L, and F395Y, numbered according to SEQ ID NO: 39.
113. The microbial host cell of any one of claims 105 to 112, wherein the heterologous enzyme pathway further comprises a squalene synthase (SQS).
114. The microbial host cell of claim 113, wherein the SQS comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 11; or the SQS comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 11.
115. The microbial host cell of any one of claims 105 to 114, wherein the heterologous enzyme pathway comprises at least one triterpene cyclase (TTC).
116. The microbial host cell of claim 115, wherein the heterologous enzyme pathway comprises at least two enzymes having triterpene cyclase activity and converting 22,23-dioxidosqualene to 24,25-epoxycucurbitadienol.
117. The microbial host cell of claim 115 or 116, wherein the TTC comprises an amino acid sequence that is at least 70% identical to the amino acid sequence of SEQ ID NO: 40; or the TTC comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 40.
118. The microbial host cell of any one of claims 115 to 117, wherein the heterologous enzyme pathway comprises at least one TTC that comprises an amino acid sequence that is at least 70% identical to one of SEQ ID NO: 191, SEQ ID NO: 192, and SEQ ID NO: 193; or at least one TTC comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 191, 192, and 193.
119. The microbial host cell of any one of claims 105 to 118, wherein the heterologous pathway comprises an enzyme that converts cucurbitadienol to 24,25-epoxycucurbitadienol.
120. The microbial host cell of claim 119, wherein the enzyme converting cucurbitadienol to 24,25-epoxycucurbitadienol comprises an amino acid sequence having at least about 70% sequence identity to SEQ ID NO: 221.
121. The microbial host cell of any one of claims 105 to 120, wherein the heterologous enzyme pathway comprises an epoxide hydrolase (EPH).
122. The microbial host cell of claim 121, wherein the heterologous pathway comprises at least one EPH converting 24,25-epoxycucurbitadienol to 24,25-dihydroxycucurbitadienol, the at least one EPH comprising an amino acid sequence that is at least 70% identical to one of: SEQ ID NO: 189, SEQ ID NO: 58, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 190, and SEQ ID NO: 212.
123. The microbial host cell of claim 122, wherein the EPH comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 189, 58, 184, 185, 187, 188, 190, and 212.
124. The microbial host cell of any one of claims 105 to 123, wherein the heterologous pathway comprises one or more oxidases that oxidize C11 of 24,25-dihydroxycucurbitadienol to produce mogrol.
125. The microbial host cell of claim 124, wherein at least one oxidase is a cytochrome P450 enzyme.
126. The microbial host cell of claim 124 or 125, wherein the cytochrome P450 enzyme comprises an amino acid sequence that is at least 70% identical to an amino acid sequence selected from SEQ ID NO: 194 and SEQ ID NO: 171; or the cytochrome P450 enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to one of SEQ ID NOS: 194 and 171.
127. The microbial host cell of claim 125 or 126, wherein the cytochrome P450 enzyme has at least a portion of its transmembrane region substituted with a heterologous transmembrane region.
128. The microbial host cell of any one of claims 124 to 127, wherein the microbial host cell expresses one or more electron transfer proteins selected from a cytochrome P450 reductase (CPR), flavodoxin reductase (FPR) and ferredoxin reductase (FDXR) sufficient to regenerate the one or more oxidases.
129. The microbial host cell of claim 128, wherein the microbial host cell expresses SEQ ID NO: 194 or a derivative thereof, and SEQ ID NO: 98 or a derivative thereof.
130. The microbial host cell of claim 129, wherein the microbial host cell expresses SEQ ID NO: 171 or a derivative thereof, and SEQ ID NO: 201 or a derivative thereof.
131. The microbial host cell of any one of claims 105 to 130, wherein the heterologous enzyme pathway comprises one or more uridine diphosphate-dependent glycosyltransferase (UGT) enzymes, thereby producing one or more mogrol glycosides.
132. The microbial host cell of claim 131, wherein the host cell produces one or more mogrol glycosides selected from Mog.II-E, Mog.III, Mog.III-A1, Mog.III-A2, Mog.III, Mog.IV, Mog.IV-A, siamenoside, and Mog.V.
133. The microbial host cell of claim 132, wherein the host cell produces Mog.V or siamenoside.
134. The microbial host cell of any one of claims 105 to 133, wherein at least one uridine diphosphate dependent glycosyltransferase (UGT) enzyme comprises an amino acid sequence having at least 70% sequence identity to one of SEQ ID NO: 164, 165, 138, 204 to 211, and 213 to 218.
135. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to Stevia rebaudiana UGT85C (SEQ ID NO: 165), or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 165.
136. The microbial host cell of claim 135, wherein the UGT enzyme has an amino acid substitution at one or more positions selected from 41, 49, and 127 with respect to SEQ ID NO: 165, optionally including one or more of L41F, D49E, and C127F.
137. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to Coffea arabica UGT (SEQ ID NO: 164); or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 164.
138. The microbial host cell of claim 137, wherein the UGT enzyme has one or more amino acid substitutions from Table 3 with respect to SEQ ID NO: 164, and which optionally include one or more of S150F, T147L, N207K, K270E, V281L, L354V, L13F, T32A, and K101A.
139. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 138; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 138.
140. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 204; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 204.
141. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 205; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 205.
142. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 206; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 206.
143. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 207; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 207.
144. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 208; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 208.
145. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 209; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 209.
146. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 210; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 210.
147. The microbial host cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 211; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 211.
148. The microbial cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 213, or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 213.
149. The microbial cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 214, or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 214.
150. The microbial cell of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 215; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 215.
151. The method of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 218; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 218.
152. The method of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 217; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 217, the UGT optionally having an amino acid substitution selected from A74E, I91F, H101P, Q241E, and I436L.
153. The method of claim 134, wherein at least one UGT enzyme comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 216; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 216.
154. The microbial host cell of any one of claims 134 to 153, wherein at least one UGT enzyme further comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 146; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 146.
155. The microbial cell of any one of claims 134 to 154, wherein at least one UGT enzyme further comprises an amino acid sequence that is at least 70% identical to SEQ ID NO: 202; or wherein at least one UGT enzyme comprises an amino acid sequence that is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% identical to SEQ ID NO: 202.
156. The microbial host cell of any one of claims 134 to 155, wherein the microbial host cell expresses at least three UGT enzymes: a first UGT enzyme catalyzing primary glycosylation at the C24 hydroxyl of mogrol, a second UGT enzyme catalyzing primary glycosylation at the C3 hydroxyl of mogrol, and a third UGT enzyme catalyzing one or more branching glycosylation reactions.
157. The microbial host cell of claim 156, wherein the microbial host cell expresses one or two UGT enzymes catalyzing beta 1,2 and/or beta 1,6 branching glycosylations of the C3 and/or C24 primary glycosylations.
158. The microbial host cell of claim 157, wherein the UGT enzymes comprise three or four UGT enzymes selected from:
SEQ ID NO: 165 or a derivative thereof;
SEQ ID NO: 146 or a derivative thereof;
SEQ ID NO: 214 or a derivative thereof;
SEQ ID NO: 129 or a derivative thereof;
SEQ ID NO: 164 or a derivative thereof;
SEQ ID NO: 116 or a derivative thereof;
SEQ ID NO: 202 or a derivative thereof;
SEQ ID NO: 218 or a derivative thereof;
SEQ ID NO: 217 or a derivative thereof;
SEQ ID NO: 138 or a derivative thereof;
SEQ ID NO: 204 or a derivative thereof;
SEQ ID NO: 205 or a derivative thereof;
SEQ ID NO: 207 or a derivative thereof;
SEQ ID NO: 208 or a derivative thereof;
SEQ ID NO: 209 or a derivative thereof;
SEQ ID NO: 11 or a derivative thereof;
SEQ ID NO: 215 or a derivative thereof;
SEQ ID NO: 213 or a derivative thereof;
SEQ ID NO: 206 or a derivative thereof;
SEQ ID NO: 122 or a derivative thereof; and
SEQ ID NO: 210 or a derivative thereof.
159. The microbial host cell of any one of claims 105 to 158, wherein the microbial host cell is prokaryotic or eukaryotic, and is optionally a bacterium selected from Escherichia coli, Bacillus subtilis, Corynebacterium glutamicum, Rhodobacter capsulatus, Rhodobacter sphaeroides, Zymomonas mobilis, Vibrio natriegens, or Pseudomonas putida; or is a yeast selected from a species of Saccharomyces, Pichia, or Yarrowia, and which is optionally Saccharomyces cerevisiae, Pichia pastoris, and Yarrowia lipolytica.
160. The microbial host cell of claim 159, wherein the microbial host cell is E. coli.
161. The microbial host cell of claim 159 or 160, wherein the microbial host cell is a bacterium that produces increased MEP pathway products.
162. The microbial host cell of any one of claims 105 to 161, wherein the heterologous enzyme pathway comprises a farnesyl diphosphate synthase (FPPS).
163. The microbial host cell of any one of claims 105 to 162, wherein the microbial host cell has one or more genetic modifications that increase the production or availability of UDP-glucose.
164. The method of claim 163, wherein the microbial host cell is a bacterial cell having one or more genetic modifications selected from ΔgalE, ΔgalT, ΔgalK, ΔgalM, ΔushA, Δagp, Δpgm, duplication or overexpression of E. coli galU, expression of Bacillus subtilis UGPA, and expression of Bifidobacterium adolescentis SPL.
165. A UGT enzyme or host cell expressing the UGT enzyme, the UGT enzyme comprising an amino acid sequence that has at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 97% sequence identity with SEQ ID NO: 165, and having one or more amino acid substitutions selected from L41F, D49E, and C127F with respect to SEQ ID NO: 165.
166. The UGT enzyme or host cell of claim 165, wherein the UGT enzyme comprises the amino acid substitutions L41F, D49E, and C127F, with respect to SEQ ID NO: 165.
167. A UGT enzyme or host cell expressing the UGT enzyme, the UGT enzyme comprising an amino acid sequence that has at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 97% sequence identity with SEQ ID NO: 164, and having one or more amino acid substitutions selected from Table 3.
168. The UGT enzyme or host cell of claim 167, wherein the UGT enzyme has one or more substitutions selected from S150F, T147L, N207K, K270E, V281L, L354V, L13F, T32A, and K101A, with respect to SEQ ID NO: 164.
169. The UGT enzyme of claim 168, comprising the amino acid substitutions T147L and N207K, with respect to SEQ ID NO: 164.
170. A UGT enzyme or host cell expressing the UGT enzyme, the UGT enzyme comprising an amino acid sequence that has at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 97% sequence identity with SEQ ID NO: 217, and having one or more amino acid substitutions selected from A74E, I91F, H101P, Q241E, and I436L, with respect to SEQ ID NO: 217.
171. The UGT enzyme or host cell of claim 170, comprising the amino acid substitutions A74E, I91F, and H101P with respect to SEQ ID NO: 217.
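
The substitution notation used in claims 165 to 171 (for example, L41F: wild-type residue L at position 41 replaced by F, numbered with respect to the reference sequence) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration only: the function names are hypothetical, the five-residue reference is a toy string rather than any SEQ ID NO from the application, and the identity calculation assumes two ungapped sequences of equal length, whereas sequence identity for real proteins is normally determined from an alignment.

```python
import re


def parse_substitution(code):
    """Split a code such as 'L41F' into (wild-type residue, 1-based position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(reference, codes):
    """Return a copy of `reference` with each substitution applied.

    Positions are 1-based and refer to the reference sequence. A code whose
    wild-type residue does not match the reference is rejected.
    """
    residues = list(reference)
    for code in codes:
        wild_type, position, replacement = parse_substitution(code)
        if position < 1 or position > len(residues):
            raise ValueError(f"{code}: position outside the reference")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{code}: reference has {residues[position - 1]} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal-length sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy reference (hypothetical, NOT a sequence from the application).
toy_reference = "MKLAD"
toy_variant = apply_substitutions(toy_reference, ["L3F", "D5E"])
print(toy_variant)                                   # MKFAE
print(percent_identity(toy_reference, toy_variant))  # 60.0
```

On a full-length enzyme, a few point substitutions change the identity figure only slightly, which is why the claims can combine named substitutions with an identity threshold relative to the same reference sequence.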
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962948657P | 2019-12-16 | 2019-12-16 | |
US202063075631P | 2020-09-08 | 2020-09-08 | |
US202063085557P | 2020-09-30 | 2020-09-30 | |
PCT/US2020/065285 WO2021126960A1 (en) | 2019-12-16 | 2020-12-16 | Microbial production of mogrol and mogrosides |
Publications (1)
Publication Number | Publication Date |
---|---|
IL293855A (en) | 2022-08-01 |
Family
ID=76478522
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL293855A (en) | 2019-12-16 | 2020-12-16 | Microbial production of mogrol and mogrosides |
Country Status (8)
Country | Link |
---|---|
US (1) | US20230042171A1 (en) |
EP (1) | EP4077649A4 (en) |
JP (1) | JP2023506242A (en) |
KR (1) | KR20220164469A (en) |
AU (1) | AU2020408684A1 (en) |
CA (1) | CA3164769A1 (en) |
IL (1) | IL293855A (en) |
WO (1) | WO2021126960A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118047841B (en) * | 2024-04-09 | 2024-06-11 | Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences | Fermented mulberry leaf antibacterial peptide Squ and application thereof |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9029636B2 (en) * | 2008-02-05 | 2015-05-12 | Monsanto Technology Llc | Isolated novel nucleic acid and protein molecules from soy and methods of using those molecules to generate transgenic plants with enhanced agronomic traits |
WO2014086842A1 (en) * | 2012-12-04 | 2014-06-12 | Evolva Sa | Methods and materials for biosynthesis of mogroside compounds |
JPWO2015016393A1 (en) * | 2013-08-02 | 2017-03-02 | Suntory Holdings Ltd. | How to use hexenol glucosylase |
IL287789B2 (en) * | 2014-09-11 | 2024-05-01 | The State Of Israel Ministry Of Agriculture & Rural Development Agricultural Res Organization Aro Vo | Methods of producing mogrosides and compositions comprising same and uses thereof |
CN107466320B (en) * | 2014-10-01 | 2021-11-05 | Evolva SA | Methods and materials for biosynthesizing mogroside compounds |
BR112017021066B1 (en) * | 2015-04-03 | 2022-02-08 | Dsm Ip Assets B.V. | STEVIOL GLYCOSIDES, METHOD FOR THE PRODUCTION OF A STEVIOL GLYCOSIDE, COMPOSITION, RELATED USES, FOOD, PET FOOD AND BEVERAGE |
CN110914445B (en) * | 2017-02-03 | 2024-08-27 | Tate & Lyle Solutions USA LLC | Engineered glycosyltransferases and methods for glucosylation of steviol glycosides |
EP3759230A4 (en) * | 2018-02-27 | 2022-05-25 | Manus Bio Inc. | Microbial production of triterpenoids including mogrosides |
- 2020
  - 2020-12-16 IL IL293855A patent/IL293855A/en unknown
  - 2020-12-16 JP JP2022536516A patent/JP2023506242A/en active Pending
  - 2020-12-16 AU AU2020408684A patent/AU2020408684A1/en active Pending
  - 2020-12-16 US US17/785,488 patent/US20230042171A1/en active Pending
  - 2020-12-16 CA CA3164769A patent/CA3164769A1/en active Pending
  - 2020-12-16 WO PCT/US2020/065285 patent/WO2021126960A1/en unknown
  - 2020-12-16 KR KR1020227024513A patent/KR20220164469A/en unknown
  - 2020-12-16 EP EP20902958.6A patent/EP4077649A4/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2021126960A1 (en) | 2021-06-24 |
CA3164769A1 (en) | 2021-06-24 |
EP4077649A1 (en) | 2022-10-26 |
KR20220164469A (en) | 2022-12-13 |
US20230042171A1 (en) | 2023-02-09 |
EP4077649A4 (en) | 2024-07-10 |
AU2020408684A1 (en) | 2022-07-14 |
JP2023506242A (en) | 2023-02-15 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP7382946B2 (en) | Microbial production of triterpenoids including mogrosides | |
US11819042B2 (en) | Uridine diphosphate-dependent glycosyltransferase circular permutants | |
JP7061145B2 (en) | Improved production method of rebaudioside D and rebaudioside M | |
US9284570B2 (en) | Microbial production of natural sweeteners, diterpenoid steviol glycosides | |
CN108473995B (en) | Production of steviol glycosides in recombinant hosts | |
BR112016000745B1 (en) | process for preparing rebaudioside m | |
EP2928321A1 (en) | Steviol glycoside compositions sensory properties | |
US20230042171A1 (en) | Microbial production of mogrol and mogrosides | |
BR112015018872B1 (en) | METHOD FOR PRODUCING A STEVIOL GLYCOSIDE COMPOSITION, RECOMBINANT HOST CELL, METHODS FOR PRODUCING REBAUDIOSIDE M, CELL CULTURE, CELL CULTURE LYSATE AND REACTION MIX |