CA2578028A1 - Method for redesign of microbial production systems - Google Patents
- Publication number
- CA2578028A1
- Authority
- CA
- Canada
- Legal status
- Abandoned
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
A computer-assisted method for identifying functionalities to add to an organism-specific metabolic network to enable a desired biotransformation in a production host includes accessing reactions from a universal database to provide stoichiometric balance; identifying at least one stoichiometrically balanced pathway, based at least partially on the reactions and a substrate, that minimizes the number of non-native functionalities in the production host; and incorporating the at least one stoichiometrically balanced pathway into the host to provide the desired biotransformation. A representation of the modified metabolic network can be stored.
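The core search named in the abstract (find a stoichiometrically balanced pathway from a substrate to a product while adding as few non-native reactions as possible) can be illustrated on a deliberately tiny network. This is a minimal sketch, not the patented procedure: the reaction names, metabolites and the integer flux bound `max_flux` are all hypothetical, and exhaustive enumeration stands in for the solver-based optimization (typically a mixed-integer linear program) that a genome-scale database would require.

```python
from itertools import product

# Hypothetical toy data. Each reaction maps metabolite -> stoichiometric
# coefficient (negative = consumed, positive = produced).
NATIVE = {
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
}
UNIVERSAL = {  # candidate non-native reactions from a "universal" database
    "U1": {"C": -1, "P": 1},
    "U2": {"B": -1, "D": 1},
    "U3": {"D": -1, "E": 1},
    "U4": {"E": -1, "P": 1},
}


def net_balance(reactions, fluxes):
    """Return the net production of every metabolite touched by `fluxes`."""
    totals = {}
    for name, v in fluxes.items():
        for met, coeff in reactions[name].items():
            totals[met] = totals.get(met, 0) + coeff * v
    return totals


def min_nonnative_pathway(native, universal, substrate, target, max_flux=2):
    """Try every integer flux 0..max_flux per reaction and keep the balanced
    pathway that uses the fewest non-native (universal) reactions."""
    reactions = {**native, **universal}
    names = sorted(reactions)
    best = None
    for levels in product(range(max_flux + 1), repeat=len(names)):
        fluxes = {n: v for n, v in zip(names, levels) if v > 0}
        totals = net_balance(reactions, fluxes)
        # Intermediates must neither accumulate nor be drained.
        balanced = all(amount == 0 for met, amount in totals.items()
                       if met not in (substrate, target))
        if not balanced:
            continue
        # The substrate must be consumed and the target must be produced.
        if totals.get(substrate, 0) >= 0 or totals.get(target, 0) <= 0:
            continue
        added = sorted(n for n in fluxes if n in universal)
        if best is None or len(added) < len(best[0]):
            best = (added, fluxes)
    return best


added, fluxes = min_nonnative_pathway(NATIVE, UNIVERSAL, "A", "P")
print(added, fluxes)  # ['U1'] {'R1': 1, 'R2': 1, 'U1': 1}
```

The route through U2, U3 and U4 is also balanced, but it needs three added reactions, so the search prefers the single addition U1. The cost of this enumeration grows exponentially with the number of reactions, which is why it only serves as an illustration.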
Description
TITLE: METHOD FOR REDESIGN OF MICROBIAL PRODUCTION SYSTEMS
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of U.S. Patent Application Serial No. 10/616,659, filed July 9, 2003, which is a conversion of U.S. Patent Application Serial No. 60/395,763, filed July 10, 2002; U.S. Patent Application Serial No. 60/417,511, filed October 9, 2002; and U.S. Patent Application Serial No. 60/444,933, filed February 3, 2003, each of which is herein incorporated by reference in its entirety.
GRANT REFERENCE
This work has been supported by the Department of Energy under Grant No. 58855 and by National Science Foundation Grant No. BES0120277. Accordingly, the U.S. government may have certain rights in the invention.
BACKGROUND OF THE INVENTION
The present invention relates to a computational framework that guides pathway modifications through reaction additions and deletions.
The generation of bioconversion pathways has attracted significant interest in recent years. The first systematic effort towards this end was made by Seressiotis and Bailey (Seressiotis & Bailey, 1988), who used concepts from Artificial Intelligence in developing their software. This was followed by a case study on the production of lysine from glucose and ammonia performed by Mavrovouniotis et al. (Mavrovouniotis et al., 1990), using an algorithm based on satisfying the stoichiometric constraints on reactions and metabolites in an iterative fashion. More recently, elegant graph theoretic concepts (e.g., P-graphs (Fan et al., 2002) and the k-shortest paths algorithm (Eppstein, 1994)) were pioneered to identify novel biotransformation pathways based on the tracing of atoms (Arita, 2000; Arita, 2004), enzyme function rules and thermodynamic feasibility constraints (Hatzimanikatis et al., 2003). Most of these approaches have been demonstrated on relatively small databases of reactions. Their performance is expected to suffer dramatically on genome-scale databases of metabolic reactions, such as the KEGG database, which consists of approximately 5000 reactions (Kanehisa et al., 2002).
Very recently, a heuristic approach based on determining the minimum pathway cost (based on any biochemical property) was proposed (McShan et al., 2003).
This approach is quite successful in delineating the pathways for conversion of one metabolite into another. However, like all other approaches discussed earlier, it fails to predict the yield of the product obtained by employing a specific pathway. Furthermore, these approaches mostly identify linear biotransformation pathways without ensuring the balanceability of all metabolites, especially the cofactors.
Therefore, it is a primary object, feature, or advantage of the present invention to provide an optimization-based procedure that addresses the complexity associated with genome-scale networks.
It is a further object, feature, or advantage of the present invention to provide a method for constructing stoichiometrically-balanced bioconversion pathways, both branched and linear, that are efficient in terms of yield and the number of non-native reactions required in a host for product formation.
Another object, feature, or advantage of the present invention is to provide a method that enables the evaluation of multiple substrate choices.
Yet another object, feature, or advantage of the present invention is to provide a method for computationally suggesting the manner in which to achieve bioengineering objectives, including increased production objectives.
A further object, feature or advantage of the present invention is to determine candidates for gene deletion or addition through use of a model of a network of bioconversion pathways.
Yet another object, feature or advantage of the present invention is to provide an optimized method for computationally achieving a bioengineering objective that is flexible and robust.
A still further object, feature, or advantage of the present invention is to provide a method for computationally achieving a bioengineering objective that can take into account not only central metabolic pathways, but also other pathways such as amino acid biosynthetic and degradation pathways.
Yet another object, feature, or advantage of the present invention is to provide a method for computationally achieving a bioengineering objective that can take into account transport rates, secretion pathways or other characteristics as optimization variables.
One or more of these and/or other objects, features and advantages of the present invention will become apparent after review of the following detailed description of the disclosed embodiments and the appended claims.
BRIEF SUMMARY OF THE INVENTION
The present invention provides a hierarchical computational framework, referred to as "OptStrain", aimed at guiding pathway modifications, through reaction additions and deletions, of microbial networks for the overproduction of targeted compounds. These compounds may range from electrons or hydrogen in biofuel cell and environmental applications to complex drug precursor molecules. A comprehensive database of biotransformations, referred to as the Universal database (with over 5,000 reactions), is compiled and regularly updated by downloading and curating reactions from multiple biopathway database sources. Combinatorial optimization is then employed to identify the set(s) of non-native functionalities, extracted from this Universal database, to add to the examined production host to enable the desired product formation.
Subsequently, competing functionalities that divert flux away from the targeted product are identified and removed to ensure higher product yields coupled with growth.
The present invention represents a significant advancement over earlier efforts by establishing an integrated computational framework capable of constructing stoichiometrically balanced pathways, imposing maximum product yield requirements, pinpointing the optimal substrate(s), and evaluating different microbial hosts.
The range and utility of OptStrain are demonstrated by addressing two very different product molecules. The hydrogen case study pinpoints reaction elimination strategies for improving hydrogen yields using two different substrates for three separate production hosts. In contrast, the vanillin study primarily shows which non-native pathways need to be added into Escherichia coli. In summary, OptStrain provides a useful tool to aid microbial strain design and, more importantly, establishes an integrated framework to accommodate future modeling developments.
The OptStrain process incorporates the OptKnock process, which has been previously described in U.S. Patent Application Serial No. 10/616,659, filed July 9, 2003; U.S. Patent Application Serial No. 60/395,763, filed July 10, 2002; U.S. Patent Application Serial No. 60/417,511, filed October 9, 2002; and U.S. Patent Application Serial No. 60/444,933, filed February 3, 2003, all of which have been previously incorporated by reference in their entirety. The OptKnock process provides for the systematic development of engineered microbial strains that optimize the production of chemicals or biochemicals, which is an overarching challenge in biotechnology. The advent of genome-scale models of metabolism has laid the foundation for computational procedures that suggest genetic manipulations leading to overproduction. This is accomplished by ensuring that a drain towards growth resources (i.e., carbon, redox potential, and energy) is accompanied, due to stoichiometry, by the production of a desired product.
Specifically, the computational framework identifies multiple gene deletion combinations that maximally couple a postulated cellular objective (e.g., biomass formation) with externally imposed chemical production targets. This nested structure gives rise to a bilevel optimization problem, which is solved using a transformation inspired by duality theory. By coupling biomass formation with chemical production, this framework suggests a growth selection/adaptation system for indirectly evolving overproducing mutants.
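Schematically, the nested structure described above can be written as the bilevel program below. The notation is an assumption introduced here for illustration:

- $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$.
- $y_j = 0$ marks reaction $j$ as deleted.
- $K$ caps the number of deletions.

Further constraints, such as a maintenance energy requirement or a minimum growth rate, could be appended to the inner problem.

```latex
\begin{aligned}
\max_{y}\quad & v_{\text{chemical}} \\
\text{s.t.}\quad & \max_{v}\; v_{\text{biomass}} \\
& \qquad \text{s.t.}\;\; \sum_{j} S_{ij}\, v_j = 0 \quad \forall i \\
& \qquad\phantom{\text{s.t.}}\;\; v_{\text{substrate}} = v_{\text{substrate}}^{\text{uptake}} \\
& \qquad\phantom{\text{s.t.}}\;\; v_j^{\min}\, y_j \le v_j \le v_j^{\max}\, y_j \quad \forall j \\
& \sum_{j} \left(1 - y_j\right) \le K, \qquad y_j \in \{0, 1\} \quad \forall j
\end{aligned}
```

The outer problem chooses which reactions to switch off. The inner problem then allocates fluxes as the cell is postulated to do, here by maximizing biomass formation.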
OptKnock can also incorporate strategies that include not only central metabolic network genes but also the amino acid biosynthetic and degradation pathways. In addition to gene deletions, the transport rates of carbon dioxide, ammonia and oxygen, as well as the secretion pathways for key metabolites, can be introduced as optimization variables in the framework. Thus, the present invention is both robust and flexible in addressing the complexity associated with genome-scale networks.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a pictorial representation of the OptStrain procedure. Step 1 involves the curation of database(s) of reactions to compile the Universal database, which comprises only elementally balanced reactions. Step 2 identifies a path enabling the desired biotransformation from a substrate (e.g., glucose, methanol, xylose) to a product (e.g., hydrogen, vanillin) without any consideration for the origin of the reactions. Note that both native reactions of the host and non-native reactions are present. Step 3 minimizes the reliance on non-native reactions, while Step 4 incorporates the non-native functionalities into the microbial host's stoichiometric model and applies the OptKnock procedure to identify and eliminate reactions that compete with formation of the targeted product. The (X)'s pinpoint the deleted reactions.
Figure 2 is a graph indicating maximum hydrogen yield on a weight basis for different substrates.
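Comparing substrates on a weight basis, as Figure 2 does, means converting a molar hydrogen yield into grams of H2 per gram of substrate. The sketch below computes a purely elemental ceiling for a substrate CcHhOo. It assumes complete oxidation, CcHhOo + (2c - o) H2O -> c CO2 + (2c + h/2 - o) H2. It ignores biomass formation, thermodynamics and the network model, so it is not the calculation behind Figure 2. The atomic weights are rounded standard values.

```python
# Rounded standard atomic weights (g/mol).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}
H2_MW = 2 * ATOMIC_WEIGHT["H"]

def max_h2_yield(c, h, o):
    """Elemental upper bound on H2 from CcHhOo, as (mol H2 / mol substrate, g H2 / g substrate).

    Assumes complete oxidation: CcHhOo + (2c - o) H2O -> c CO2 + (2c + h/2 - o) H2,
    i.e. half the substrate's degree of reduction (4c + h - 2o).
    """
    mol_per_mol = 2 * c + h / 2 - o
    substrate_mw = c * ATOMIC_WEIGHT["C"] + h * ATOMIC_WEIGHT["H"] + o * ATOMIC_WEIGHT["O"]
    return mol_per_mol, mol_per_mol * H2_MW / substrate_mw
```

The outputs for the substrates named in Figure 1 are:

- Glucose, `max_h2_yield(6, 12, 6)`: 12 mol H2/mol, about 0.134 g/g.
- Methanol, `max_h2_yield(1, 4, 1)`: 3 mol/mol, about 0.189 g/g.
- Xylose, `max_h2_yield(5, 10, 5)`: the same weight-basis ceiling as glucose, because both share the empirical formula CH2O.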
Figure 3 is a graph illustrating hydrogen production envelopes as a function of the biomass production rate for the wild-type E. coli network under aerobic and anaerobic conditions, as well as for the two-reaction and three-reaction deletion mutant networks. The basis glucose uptake rate is fixed at 10 mmol/gDW/hr. These curves are constructed by finding the maximum and minimum hydrogen production rates at different rates of biomass formation. Point A denotes the required theoretical hydrogen production rate at the maximum biomass formation rate of the wild-type network under anaerobic conditions. Points B and C identify the theoretical hydrogen production rates at maximum growth for the two mutant networks, respectively, after fixing the corresponding carbon dioxide transport rates at the values suggested by OptKnock.
Figure 4 is a graph illustrating the hydrogen formation limits of the wild-type (solid) and mutant (dotted) Clostridium acetobutylicum metabolic networks for a basis glucose uptake rate of 1 mmol/gDW/hr. Line AB denotes the alternative maximum biomass yield solutions available to the wild-type network. Point C pinpoints the hydrogen yield of the mutant network at maximum growth. This can be contrasted with the reported experimental hydrogen yield (2 mol/mol glucose) in C. acetobutylicum (45).
Figure 5 is a graph illustrating the vanillin production envelope of the augmented E. coli metabolic network for a basis glucose uptake rate of 10 mmol/gDW/hr. Points A, B and C denote the maximum growth points associated with the one-, two- and four-reaction deletion mutant networks, respectively. In contrast to the wild-type network, for which vanillin production is not guaranteed at any rate of biomass production, the mutant networks require significant vanillin yields to achieve high levels of biomass production. Note that an anaerobic mode of growth is suggested in all cases.
Figure 6 depicts the bilevel optimization structure of OptKnock. The inner problem performs the flux allocation based on the optimization of a particular cellular objective (e.g., maximization of biomass yield, MOMA). The outer problem then maximizes the bioengineering objective (e.g., chemical production) by restricting access to key reactions available to the optimization of the inner problem.
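The inner/outer structure of Figure 6 can be illustrated with a deliberately tiny, brute-force sketch. All of the following is hypothetical and chosen for illustration:

- The network is a toy with one catabolic step that yields a precursor C and a reductant N.
- N must be reoxidized either by product formation or by one of two byproduct routes.
- The inner linear program is replaced by exhaustive search over a small integer flux grid.
- The outer problem enumerates deletion sets directly instead of using the duality-based reformulation mentioned earlier.

As a conservative variant, deletion sets are ranked by the product flux that is guaranteed at maximum growth, meaning the minimum over tied biomass optima (the sense in which Figure 5 speaks of production being "guaranteed").

```python
from itertools import combinations, product

# Hypothetical toy network: metabolite -> coefficient (negative = consumed).
REACTIONS = {
    "up":   {"A": 1},                   # substrate uptake
    "gly":  {"A": -1, "C": 2, "N": 2},  # catabolism: precursor C plus reductant N
    "bio":  {"C": -1},                  # biomass drain
    "prod": {"C": -1, "N": -2},         # product formation, reoxidizes N
    "byp":  {"N": -1},                  # byproduct route 1, reoxidizes N
    "byp2": {"N": -1},                  # byproduct route 2, reoxidizes N
}
UPPER = {"up": 2, "gly": 2, "bio": 4, "prod": 4, "byp": 4, "byp2": 4}
METABOLITES = ("A", "C", "N")

def steady_state_fluxes(knockouts):
    """Yield every integer flux vector on the grid with S v = 0 (toy stand-in for an LP)."""
    names = list(REACTIONS)
    ranges = [range(0, 1 if r in knockouts else UPPER[r] + 1) for r in names]
    for values in product(*ranges):
        v = dict(zip(names, values))
        if all(sum(REACTIONS[r].get(m, 0) * v[r] for r in names) == 0
               for m in METABOLITES):
            yield v

def inner_problem(knockouts):
    """Maximize biomass; return (max biomass, min product, max product) over tied optima."""
    best_bio, prods = -1, []
    for v in steady_state_fluxes(knockouts):
        if v["bio"] > best_bio:
            best_bio, prods = v["bio"], [v["prod"]]
        elif v["bio"] == best_bio:
            prods.append(v["prod"])
    return best_bio, min(prods), max(prods)

def optknock_brute_force(candidates, max_deletions):
    """Outer problem: deletion set maximizing the product guaranteed at maximum growth."""
    results = []
    for k in range(max_deletions + 1):
        for ko in combinations(candidates, k):
            bio, pmin, _ = inner_problem(set(ko))
            results.append((pmin, bio, ko))
    return max(results)  # (guaranteed product, biomass, deletion set)
```

In the unmodified toy network, maximum growth (biomass flux 4) uses no product route. Deleting either byproduct route alone changes nothing, because the other route takes over. `optknock_brute_force(("byp", "byp2", "prod"), 2)` returns `(2, 2, ("byp", "byp2"))`: only the double deletion forces product formation, at the cost of reduced growth. This mirrors, in miniature, why multi-reaction deletion combinations are examined.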
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention provides methods and systems for guiding pathway modifications through reaction additions and deletions. Preferably, the methods are computer implemented, computer assisted or otherwise automated. The term "computer" as used herein should be construed broadly to include, but not be limited to, any number of electronic devices suitable for practicing the methodology described herein. It is further to be understood that, because the invention relates to computer-assisted modeling, the scope of the invention is broader than the specific embodiments provided herein, and that one skilled in the art would understand how to apply the present invention in different environments and contexts to address different problems, in part due to the predictability associated with computer implementations.
1. OptStrain
A fundamental goal in systems biology is to elucidate the complete "palette" of biotransformations accessible to nature in living systems. This goal parallels the continuing quest in biotechnology to construct microbial strains capable of accomplishing an ever-expanding array of desired biotransformations. These biotransformations target products ranging from simple precursor chemicals (Nakamura & Whited, 2003; Causey et al., 2004) and complex molecules such as carotenoids (Misawa et al., 1991) to electrons in biofuel cells (Liu et al., 2004) or batteries (Bond et al., 2002; Bond et al., 2003). They extend even to microbes capable of precipitating heavy metal complexes in bioremediation applications (Methe et al., 2003; Lovley, 2003; Finneran et al., 2002).
Recent developments in molecular biology and recombinant DNA technology have ushered in a new era in the ability to shape the gene content and expression levels of microbial production strains in a direct and targeted fashion (Bailey, 1991; Stephanopoulos & Sinskey, 1993). The astounding range and diversity of these newly acquired capabilities and the scope of biotechnological applications imply that now, more than ever, we need modeling and computational aids to identify a priori the optimal sets of genetic modifications for strain optimization projects.
The recent availability of genome-scale models of microbial organisms has provided the pathway reconstructions necessary for developing computational methods aimed at identifying strain engineering strategies (Bailey, 2001). These models are already available for H. pylori (Schilling et al., 2002), E. coli (Reed et al., 2003; Edwards & Palsson, 2000), S. cerevisiae (Forster et al., 2003) and other microorganisms (David et al., 2003; Van Dien & Lindstrom, 2002; Valdes et al., 2003), and they provide successively refined abstractions of microbial metabolic capabilities. An automated process to expedite the construction of stoichiometric models from annotated genomes (Segre et al., 2003) promises to further accelerate the metabolic reconstruction of several microbial organisms.

At the same time, individual reactions are deposited in databases such as KEGG, EMP, MetaCyc, UM-BBD, and many more (Overbeek et al., 2000; Selkov et al., 1998; Kanehisa et al., 2004; Krieger et al., 2004; Ellis et al., 2003; Karp et al., 2002). These form encompassing and growing collections of the biotransformations for which we have direct or indirect evidence of existence in different species. Many thousands of such reactions have already been deposited. However, unlike organism-specific metabolic reconstructions (Schilling et al., 2002; Reed et al., 2003; Edwards & Palsson, 2000; Forster et al., 2003), these compilations include reactions from many different species rather than a single one, in a largely uncurated fashion. This means that there currently exists an ever-expanding collection of microbial models and, at the same time, ever more encompassing compilations of non-native functionalities.

This newly acquired plethora of data has brought to the forefront a number of computational and modeling challenges, which form the scope of this work. Specifically, how can we systematically select, from the thousands of functionalities catalogued in various biological databases, the appropriate set of pathways/genes to recombine into existing production systems such as E. coli so as to endow them with the desired new functionalities? Subsequently, how can we identify which competing functionalities to eliminate to ensure high product yield as well as viability?
Existing strategies and methods for accomplishing this goal include database queries to explore all feasible bioconversion routes from a substrate to a target compound from a given list of biochemical transformations (Seressiotis & Bailey, 1988; Mavrovouniotis et al., 1990). More recently, elegant graph theoretic concepts (e.g., P-graphs (Fan et al., 2002) and the k-shortest paths algorithm (Eppstein, 1994)) were pioneered to identify novel biotransformation pathways based on the tracing of atoms (Arita, 2000; Arita, 2004), enzyme function rules and thermodynamic feasibility constraints (Hatzimanikatis et al., 2003). In addition, an interesting heuristic search approach was recently proposed (McShan et al., 2003). It uses the enzymatic biochemical reactions found in the KEGG database (Kanehisa et al., 2004) to construct a connected graph linking the substrate and product metabolites. Most of these approaches, however, generate linear paths that link substrates to final products. They do not ensure that the rest of the metabolic network is balanced, or that metabolic imperatives on cofactor usage/generation and energy balances are met.
The present invention provides a hierarchical optimization-based framework, OptStrain, to identify stoichiometrically balanced pathways to be generated upon recombination of non-native functionalities into a host organism to confer the desired phenotype. Candidate metabolic pathways are identified from an ever-expanding array of thousands (currently 5,734) of reactions pooled together from different stoichiometric models and publicly available databases such as KEGG (Kanehisa et al., 2004).
Note that the identified pathways satisfy maximum yield considerations while the choice of substrates can be treated as optimization variables. Important information pertaining to the cofactor/energy requirements associated with each pathway is deduced enabling the comparison of candidate pathways with respect to the aforementioned criteria.
Production host selection is examined by successively minimizing the reliance on heterologous genes while satisfying the performance targets identified above. A gene set that encodes all the enzymes needed to catalyze the identified non-native functionalities can then be constructed, accounting for isozymes and multi-subunit enzymes. Subsequently, gene deletions are identified (Burgard et al., 2003; Pharkya et al., 2003) in the augmented host networks to improve product yields by removing competing functionalities that decouple biochemical production and growth objectives. The breadth and scope of OptStrain are demonstrated by addressing in detail two different product molecules (i.e., hydrogen and vanillin), which lie at the two extremes in terms of product-molecule size.
Briefly, the computational results in some cases match existing strain designs and production practices, whereas in others they pinpoint novel engineering strategies.
1.1 Materials and Methods
The first challenge addressed is to develop a systematic computational framework to identify which functionalities to add to the organism-specific metabolic network (e.g., E. coli (Edwards & Palsson, 2000; Reed et al., 2003), S. cerevisiae (Forster et al., 2003), C. acetobutylicum (Desai et al., 1999; Papoutsakis, 1984), etc.) to enable the desired biotransformation. The present inventors have already contributed towards this objective at a much smaller scale (Burgard & Maranas, 2001). Due to the extremely large size of the compiled database and the presence of multiple and sometimes conflicting objectives that need to be satisfied simultaneously, we developed the OptStrain procedure illustrated in Figure 1. Each step introduces different computational challenges arising from the specific structure and size of the optimization problems that need to be solved.
Step 1. Automated downloading and curation of the reactions in our Universal database to ensure stoichiometric balance;
Step 2. Calculation of the maximum theoretical yield of the product given a substrate choice without restrictions on the reaction origin (i.e., native or non-native);
Step 3. Identification of a stoichiometrically-balanced pathway(s) that minimizes the number of non-native functionalities in the examined production host given the maximum theoretical yield and the optimum substrate(s) found in Step 2. Alternative pathways that meet both criteria of maximum yield and minimum number of heterologous genes are generated along with comparisons between different host choices. Information pertaining to the cofactor/energy usage associated with each pathway is also derived at this stage.
Finally, one or multiple gene sets can be derived at this stage that ensure the presence of the targeted biotransformations by encoding for the appropriate enzymes;
Step 4. Incorporation of the identified non-native biotransformations into the stoichiometric models, if available, of the examined microbial production hosts. The OptKnock framework (Burgard et al., 2003; Pharkya et al., 2003) is then applied to these augmented models to suggest gene deletions that make production of the desired product an obligatory byproduct of growth by "shaping" the connectivity of the metabolic network. The OptKnock framework is further described herein.
Curation of the database. The first step of the OptStrain procedure is the downloading and curation of reactions acquired from various sources into our Universal database. Specifically, because new reactions are incorporated into the KEGG database on a monthly basis, we have developed customized Perl scripts (Brown, 1999) to download all reactions in the database automatically on a regular basis. A different script then parses the number of atoms of each element in every compound. The number of atoms of each type among the reactants and products of every reaction is calculated, and reactions that are elementally unbalanced are excluded from consideration. In addition, compounds with an unspecified number of repeat units (e.g., trans-2-Enoyl-CoA, represented by C25H39N7O17P3S(CH2)n) or unspecified alkyl groups R in their chemical formulae are removed from the downloaded sets. This step enables the automated downloading of functionalities present in genomic databases and the subsequent verification of their elemental balance, forming large-scale sets of functionalities to be used as recombination targets.
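The Perl scripts themselves are not reproduced here. As an illustration only, the elemental balance test can be sketched in Python, assuming each compound formula is available as a simple parenthesis-free string; the function names and the dictionary layout are hypothetical and do not reflect the KEGG file format. Formulas containing an alkyl group R or a repeat unit such as (CH2)n are rejected, mirroring the exclusion rule above.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a parenthesis-free formula such as 'C6H12O6'.

    Returns None for formulas with an unspecified alkyl group 'R' or a
    repeat unit such as '(CH2)n', which are excluded from the database.
    """
    if "R" in re.findall(r"[A-Z][a-z]?", formula) or re.search(r"\)n", formula):
        return None
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def is_elementally_balanced(reaction, formulas, ignore=()):
    """reaction maps compound -> stoichiometric coefficient (negative = reactant)."""
    net = Counter()
    for compound, coeff in reaction.items():
        atoms = parse_formula(formulas[compound])
        if atoms is None:
            return False              # unspecified composition: exclude the reaction
        for element, n in atoms.items():
            net[element] += coeff * n
    # Every element (except any explicitly ignored ones) must cancel exactly.
    return all(v == 0 for e, v in net.items() if e not in ignore)

formulas = {"glucose": "C6H12O6", "O2": "O2", "CO2": "CO2", "H2O": "H2O"}
print(is_elementally_balanced({"glucose": -1, "O2": -6, "CO2": 6, "H2O": 6}, formulas))  # True
print(is_elementally_balanced({"glucose": -1, "O2": -1, "CO2": 1, "H2O": 1}, formulas))  # False
```

Passing ignore={"H"} would mimic the vanillin case study, where reactions balanced with respect to all elements but hydrogen were retained.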
The present invention contemplates that any number of particular methods can be used to automate the downloading and/or curation of reactions. These automated functions can be performed in any number of ways depending upon the resources available, the type of access to the resources, and other factors related to the specific environment or context in which the present invention is implemented.
Determination of the maximum yield. Once the reaction sets are determined, the second step determines the maximum theoretical yield of the target product from a range of substrate choices, without restrictions on the number or origin of the reactions used. The maximum theoretical product yield is obtained for a unit uptake of substrate by maximizing the sum of all reaction fluxes producing minus those consuming the target metabolite, weighted by the stoichiometric coefficient of the target metabolite in these reactions. Maximizing this yield subject to stoichiometric constraints and transport conditions gives rise to a Linear Programming (LP) problem (the mathematical formulation is given in Section 1.4) of the kind often encountered in Flux Balance Analysis frameworks (Varma & Palsson, 1994). Given the computational tractability of LP problems, even for many thousands of reactions, a large number of different substrate choices can be explored thoroughly here.
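A minimal sketch of the Step 2 LP, written with SciPy's linprog (HiGHS backend) rather than the CPLEX/GAMS setup used in this work. The three-reaction network, the molecular weights and the function name are hypothetical; the constraints follow the formulation in Section 1.4: non-substrate metabolites may only accumulate (be secreted), and exactly one gram of substrate is consumed.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: rows = metabolites, columns = reactions.
metabolites = ["S", "X", "Q", "P"]            # S = substrate, P = target product
S = np.array([
    [-1,  0,  0],   # S   r1: S -> 2 X
    [ 2, -1, -1],   # X   r2: X -> Q, r3: X -> P
    [ 0,  1,  0],   # Q   (byproduct)
    [ 0,  0,  1],   # P   (product)
], dtype=float)
MW = {"S": 180.0, "P": 90.0}                  # assumed molecular weights (g/mol)
substrates, product = {"S"}, "P"

def max_yield(S, metabolites, substrates, product, MW):
    """Maximize MW_P * sum_j S_Pj v_j, i.e. grams of product per gram of substrate."""
    c = -MW[product] * S[metabolites.index(product)]    # linprog minimizes, so negate
    # Constraint (1): sum_j S_ij v_j >= 0 for non-substrates, as -S_i v <= 0.
    rows = [i for i, m in enumerate(metabolites) if m not in substrates]
    A_ub, b_ub = -S[rows], np.zeros(len(rows))
    # Constraint (2): sum_{i in substrates} MW_i sum_j S_ij v_j = -1 (1 g consumed).
    A_eq = sum(MW[m] * S[metabolites.index(m)] for m in substrates).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[-1.0],
                  bounds=[(0, None)] * S.shape[1], method="highs")
    return -res.fun, res.x

yield_gg, v = max_yield(S, metabolites, substrates, product, MW)
print(f"max yield = {yield_gg:.3f} g P / g S")   # 1.000 for this toy network
```

Screening several substrates amounts to calling max_yield once per candidate substrate set, which stays cheap because each call is a single LP.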
Although the bioengineering objective in this specific embodiment is to maximize production, the present invention contemplates that other bioengineering objectives can be used. In such instances, instead of determining a maximum yield, a separate and appropriate objective or constraint can be used.
Identification of the minimum number of heterologous reactions for a host organism. The next step in OptStrain uses the knowledge of the maximum theoretical yield to determine the minimum number of non-native functionalities that need to be added to a specific host organism network. Mathematically, this is achieved by first introducing a set of binary variables $y_j$ that serve as switches to turn the associated reaction fluxes $v_j$ on or off:

$$v_j^{\min} \, y_j \le v_j \le v_j^{\max} \, y_j$$

Note that the binary variable $y_j$ assumes a value of one if reaction $j$ is active and a value of zero if it is inactive. This constraint is imposed only on reactions associated with genes heterologous to the specified production host. The parameters $v_j^{\min}$ and $v_j^{\max}$ are calculated by minimizing and maximizing every reaction flux $v_j$ subject to the stoichiometry of the metabolic network (Burgard & Maranas, 2001). This leads to a Mixed Integer Linear Programming (MILP) model for finding the minimum number of genes to be added to the host organism network while meeting the yield target for the desired product. This formulation, discussed in greater detail later herein, enables the exploration of tradeoffs between the required number of heterologous genes and the maximum theoretical product yield, as well as the iterative identification of all alternate optimal solutions. The end result of this step is a set of distinct pathways and corresponding gene complements that provide a ranked list of all alternatives for the efficient conversion of the substrate(s) into the desired product.
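The bound computation described above amounts to two LPs per reaction (a flux variability calculation). A minimal sketch with SciPy's linprog on a hypothetical three-reaction network (S -> 2 X, X -> Q, X -> P) with one gram of substrate consumed, assuming all reactions are irreversible; the helper name is illustrative:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: r1: S -> 2 X, r2: X -> Q, r3: X -> P
S = np.array([[-1, 0, 0], [2, -1, -1], [0, 1, 0], [0, 0, 1]], dtype=float)
MW_S = 180.0                       # assumed substrate molecular weight (g/mol)
n = S.shape[1]

# Non-substrate metabolites may only accumulate (sum_j S_ij v_j >= 0),
# and exactly 1 g of substrate is consumed.
A_ub, b_ub = -S[1:], np.zeros(S.shape[0] - 1)
A_eq, b_eq = (MW_S * S[0]).reshape(1, -1), [-1.0]
bounds = [(0, None)] * n           # all reactions treated as irreversible here

def flux_range(j):
    """Return (v_j^min, v_j^max) by minimizing and then maximizing v_j."""
    c = np.zeros(n)
    c[j] = 1.0
    lo = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    hi = linprog(-c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return lo.fun, -hi.fun

vmin, vmax = zip(*(flux_range(j) for j in range(n)))
for j in range(n):
    print(f"v{j + 1}: [{vmin[j]:.5f}, {vmax[j]:.5f}]")
```

Tight bounds obtained this way give a stronger MILP relaxation than arbitrary very low and very high values.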
Incorporating the non-native reactions into the host organism's stoichiometric model. Upon identification of the appropriate host organism, the analysis proceeds with an organism-specific stoichiometric model augmented by the set of identified non-native reactions. However, simply adding genes to a microbial production strain will not necessarily lead to the desired overproduction, because microbial metabolism is primed to be as responsive as possible to the imposed selection pressures (e.g., to outgrow its competition). These survival objectives are typically in direct competition with the overproduction of targeted biochemicals. To combat this, we use our previously developed bilevel computational framework, OptKnock (Burgard et al., 2003; Pharkya et al., 2003), to eliminate all those functionalities that uncouple the cellular fitness objective, typically exemplified as the biomass yield, from the maximum yield of the product of interest.
1.2 Results
Computational results for microbial strain optimization focused on the production of hydrogen and vanillin. One skilled in the art having the benefit of this disclosure would understand that the present invention is in no way limited to these particular bioengineering objectives, which are merely illustrative of the present invention. The hydrogen production case study underscores the importance of investigating multiple substrates and microbial hosts to pinpoint the optimal production environment, as well as the need to eliminate competing functionalities. In contrast, in the vanillin study, identifying the smallest number of non-native reactions is found to be the key challenge for strain design. A common database of reactions, as outlined in Step 1, was constructed for both examples by pooling together metabolic pathways from the methylotroph Methylobacterium extorquens AM1 (Van Dien & Lidstrom, 2002) and the KEGG database of reactions (Kanehisa et al., 2004).
1.2.1 Hydrogen Production Case Study
An efficient microbial hydrogen production strategy requires the selection of an optimal substrate and a microbial strain capable of forming hydrogen at high rates. First, we solved the maximum yield LP formulation (Step 2) using as recombination candidates all catalogued reactions that were balanced with respect to hydrogen, oxygen, nitrogen, sulfur, phosphorus and carbon (approximately 3,000 reactions). Note that OptStrain allowed for different substrate choices, such as pentose and hexose sugars as well as acetate, lactate, malate, glycerol, pyruvate, succinate and methanol. The highest hydrogen yield, 0.126 g/g substrate consumed, was obtained with methanol as the substrate. This is not surprising, given that methanol has the highest hydrogen-to-carbon ratio, at four to one. A comparison of the yields for some of the more efficient substrates is shown in Figure 2.
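The 0.126 g/g figure can be checked by hand. Under our assumption (not stated in the text) that the four hydrogen atoms of methanol are all recovered as H2, i.e. 2 mol H2 per mol CH3OH, the mass yield is 2 MW(H2) / MW(CH3OH); a quick arithmetic sketch using standard atomic weights:

```python
# Standard atomic weights (g/mol), rounded
H, C, O = 1.008, 12.011, 15.999
mw_h2 = 2 * H                       # 2.016 g/mol
mw_methanol = C + 4 * H + O         # CH3OH, about 32.04 g/mol

# Assumption: 2 mol H2 per mol methanol (all four H atoms of CH3OH end up in H2)
yield_g_per_g = 2 * mw_h2 / mw_methanol
print(round(yield_g_per_g, 3))      # 0.126
```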
We decided to explore methanol and glucose further, motivated by the high yield on methanol and the favorable costs associated with the use of glucose.
The next step in the OptStrain procedure entailed the determination of the minimum number of non-native functionalities for achieving the theoretical maximum yield in a host organism. We examined three different uptake scenarios: (i) glucose as the substrate in Escherichia coli (an established production system), (ii) glucose in Clostridium acetobutylicum (a known hydrogen producer), and (iii) methanol in Methylobacterium extorquens (a known methanol consumer).
1.2.1.1 Escherichia coli
The MILP framework (described in Step 3) correctly verified that, with glucose as the substrate, no non-native functionalities were required by E. coli for hydrogen production. Interestingly, hydrogen production was possible either through the ferredoxin hydrogenase reaction (E.C.# 1.12.7.2), which reduces protons to form hydrogen, or via the hydrogen dehydrogenase reaction (E.C.# 1.12.1.2), which converts NADH into NAD+ while forming hydrogen through proton association. Subsequently, the upper and lower limits of hydrogen formation were explored for the E. coli stoichiometric model (Reed et al., 2003) as a function of the biomass formation rate (i.e., growth rate) under both aerobic and anaerobic conditions for a basis glucose uptake rate of 10 mmol/gDW/hr (see Figure 3).
Notably, the maximum theoretical hydrogen yield is higher under aerobic conditions.
However, only under anaerobic conditions is hydrogen formed at maximum growth (see point A in Figure 3), leading to a growth-coupled production mode. Note that hydrogen production takes place through the formate hydrogen lyase reaction, which converts formate into hydrogen and carbon dioxide under anaerobic conditions, in agreement with current experimental observations (Nandi & Sengupta, 1998).
Moving to phenotype restriction to curtail byproduct formation (Step 4), we explored whether the production of hydrogen in the wild-type E. coli network (Reed et al., 2003) could be enhanced by removing functionalities from the network that were in direct or indirect competition with hydrogen production. To this end, we employed the OptKnock framework (Burgard et al., 2003; Pharkya et al., 2003) to pinpoint gene deletion strategies that couple hydrogen production with growth. Here we highlight two of the identified strategies. The first (double deletion) removes both enolase (E.C.# 4.2.1.11) and glucose 6-phosphate dehydrogenase (E.C.# 1.1.1.49). The removal of the enolase reaction strongly promotes hydrogen formation by directing the glycolytic flux at the 3-phosphoglycerate branch point into the serine biosynthesis pathway.
Serine then participates in a series of one-carbon metabolism reactions to form formyltetrahydrofolate, which is eventually converted to formate and tetrahydrofolate. The elimination of the dehydrogenase reaction prevents the shunting of any glucose 6-phosphate flux into the pentose phosphate pathway. The second strategy, a triple deletion, involves the removal of ATP synthase (E.C.# 3.6.3.14), alpha-ketoglutarate dehydrogenase, and acetate kinase (E.C.# 2.7.2.1). The removal of the first reaction enhances proton availability, whereas the other two deletions ensure that maximum carbon flux is directed towards pyruvate, which is then converted into formate through pyruvate formate lyase. Formate is catabolized into hydrogen and carbon dioxide through formate hydrogen lyase.
A comparison of the hydrogen production limits as a function of growth rate for both the wild-type and mutant networks is shown in Figure 3. The transport rates of carbon dioxide for the mutant networks were fixed at the values suggested by OptKnock, thus setting the operational imperatives (Pharkya et al., 2003). Note that while the two-reaction deletion mutant has a theoretical hydrogen production rate of 22.7 mmol/gDW/hr (0.025 g/g glucose) at the maximum growth rate (Point B), the three-reaction deletion mutant produces a maximum of 29.5 mmol/gDW/hr (0.033 g/g glucose) (Point C) at the expense of a reduced maximum growth rate. Interestingly, in both mutant networks, maximum hydrogen production requires the uptake of oxygen. This is in contrast to the wild-type case where the lack of oxygen was preferred for hydrogen formation. Notably, it has been reported (Nandi & Sengupta, 1996) that although formate hydrogen lyase can only be induced in the absence of oxygen, it can function in aerobic environments.
This will have to be accounted for in any experimental study conducted on the basis of these results.
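The mass yields quoted alongside these rates follow from the stated basis glucose uptake of 10 mmol/gDW/hr. A minimal sketch of that conversion, assuming standard molecular weights for H2 (2.016 g/mol) and glucose (180.156 g/mol); the helper name is ours:

```python
MW_GLUCOSE = 180.156   # g/mol
MW_H2 = 2.016          # g/mol

def rate_to_mass_yield(product_rate, substrate_rate, mw_product, mw_substrate=MW_GLUCOSE):
    """Convert a production rate and an uptake rate (both mmol/gDW/hr) into g product / g substrate."""
    return (product_rate * mw_product) / (substrate_rate * mw_substrate)

for label, rate in [("double deletion, point B", 22.7), ("triple deletion, point C", 29.5)]:
    print(label, round(rate_to_mass_yield(rate, 10.0, MW_H2), 3))   # 0.025 and 0.033
```

The same conversion reproduces the figures in the vanillin case study: for example, 6.79 mmol/gDW/hr with a vanillin molecular weight of 152.15 g/mol gives about 0.57 g/g glucose.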
1.2.1.2 Clostridium acetobutylicum
Ample literature evidence has identified species of the genus Clostridium as natural hydrogen production systems (Nandi & Sengupta, 1998; Kataoka et al., 1997; Chin et al., 2003; Das & Veziroglu, 2001). The reduction of protons into hydrogen through ferredoxin hydrogenase (E.C.# 1.12.7.2) is the key associated reaction. Not surprisingly, using OptStrain (Step 3), we verified that no non-native reactions were required for hydrogen production (Papoutsakis & Meyer, 1985) in Clostridium acetobutylicum with glucose as a substrate. We next explored, as in the E. coli case, whether hydrogen production could be enhanced by judiciously removing competing functionalities using the OptKnock framework. To this end, we used the stoichiometric model for Clostridium acetobutylicum developed by Papoutsakis and coworkers (Desai et al., 1999; Papoutsakis, 1984). OptKnock suggested the deletion of the acetate-forming and butyrate-transport reactions.
This deletion strategy is reasonable in hindsight when the energetics of the entire network are considered. Specifically, in the wild-type case the formation and secretion of each butyrate molecule requires the consumption of 2 NADH molecules, thus reducing the hydrogen production capacity of the network. However, if butyrate is not secreted but is instead recycled to form acetone and butyryl-CoA, then butyryl-CoA can again be converted to butyrate without any NADH consumption. The double deletion mutant has a theoretical hydrogen yield of 3.17 mol/mol glucose (0.036 g/g glucose) at the expense of a slightly lower growth rate (point C in Figure 4). Notably, in this case biomass formation and hydrogen production are tightly coupled, in contrast to the wild-type network, where a range of hydrogen formation rates (1.38-2.96 mmol/gDW/hr) is possible at the maximum growth rate (line AB in Figure 4). Experimental results (Nandi & Sengupta, 1998) indicate that only up to 2 mol of hydrogen can be produced per mol of glucose anaerobically in Clostridium. In fact, it has been reported that the direct inhibitory effects of butyrate on hydrogen production and the indirect effects of acetate through growth inhibition (Chin et al., 2003) are responsible for the observed low hydrogen yields.
Interestingly, the suggested reaction eliminations directly circumvent these inhibition bottlenecks.
1.2.1.3 Methylobacterium extorquens AM1
Moving from glucose to methanol as the substrate, we next investigated hydrogen production in Methylobacterium extorquens AM1, a facultative methylotroph capable of surviving solely on methanol as a carbon and energy source (Van Dien & Lidstrom, 2002).
The organism has been well-studied (Anthony, 1982; Chistoserdova et al., 2004;
Chistoserdova et al., 1998; Korotkova et al., 2002; Van Dien et al., 2003) and recently, a stoichiometric model of its central metabolism was published (Van Dien &
Lidstrom, 2002). Using Step 3 of OptStrain, we identified that only a single reaction needs to be introduced into the metabolic network of M. extorquens to enable hydrogen production.
Two such candidates are hydrogenase (E.C.# 1.12.7.2), which reduces protons to hydrogen, and N5,N10-methenyltetrahydromethanopterin hydrogenase, which catalyzes the following transformation:
E.C.# 1.12.98.2: 5,10-Methylenetetrahydromethanopterin ↔ 5,10-Methenyltetrahydromethanopterin + H2.
The need for an additional reaction is expected because the central metabolic pathways of the methylotroph, as abstracted in (Van Dien & Lidstrom, 2002), do not include any reactions that convert protons into hydrogen, such as the hydrogenases found in E. coli and in anaerobes of the genus Clostridium. Therefore, it is not surprising that, to the best of our knowledge, hydrogen production has not been achieved using methylotrophs such as Pseudomonas AM1 and P. methylica (Nandi & Sengupta, 1998). The identified reaction additions provide a plausible explanation for this outcome by pinpointing the lack of a mechanism to convert the generated protons to hydrogen.
1.2.2 Vanillin Production Case Study
Vanillin is an important flavor and aroma molecule. The low yields of vanillin from cured vanilla pods have motivated efforts toward its biotechnological production.
In this case study, we identify metabolic network redesign strategies for the de novo production of vanillin from glucose in E. coli. Using OptStrain, we first determined the maximum theoretical yield of vanillin from glucose to be 0.63 g/g glucose by solving the LP optimization over approximately 4,000 candidate reactions balanced with respect to all elements but hydrogen (Step 2). We next identified that the minimum number of non-native reactions that must be recombined into E. coli to endow it with the pathways necessary to achieve the maximum yield is three (Step 3). Numerous alternative pathways, differing only in their cofactor usage, were identified that satisfy both optimality criteria of yield and minimality of recombined reactions. For example, one such pathway uses the following three non-native reactions:
(i) E.C.# 1.2.1.46: Formate + NADH + H+ ↔ Formaldehyde + NAD+ + H2O,
(ii) E.C.# 1.2.3.12: 3,4-Dihydroxybenzoate (or protocatechuate) + NAD+ + H2O + Formaldehyde ↔ Vanillate + O2 + NADH, and
(iii) E.C.# 1.2.1.67: Vanillate + NADH + H+ ↔ Vanillin + NAD+ + H2O.
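To make the cofactor bookkeeping of such a pathway concrete, the three reactions can be summed with unit flux through each step. The dictionary encoding below is an illustrative assumption and takes each reaction in the left-to-right direction written above:

```python
from collections import Counter

# The three non-native reactions as compound -> coefficient maps
# (negative = consumed, positive = produced), read left to right as written.
reactions = {
    "EC 1.2.1.46": {"Formate": -1, "NADH": -1, "H+": -1, "Formaldehyde": 1, "NAD+": 1, "H2O": 1},
    "EC 1.2.3.12": {"3,4-Dihydroxybenzoate": -1, "NAD+": -1, "H2O": -1, "Formaldehyde": -1,
                    "Vanillate": 1, "O2": 1, "NADH": 1},
    "EC 1.2.1.67": {"Vanillate": -1, "NADH": -1, "H+": -1, "Vanillin": 1, "NAD+": 1, "H2O": 1},
}

def net_reaction(reactions, fluxes):
    """Sum flux-weighted stoichiometries and drop intermediates that cancel out."""
    net = Counter()
    for name, flux in fluxes.items():
        for compound, coeff in reactions[name].items():
            net[compound] += flux * coeff
    return {compound: coeff for compound, coeff in net.items() if coeff != 0}

print(net_reaction(reactions, {name: 1 for name in reactions}))
```

Formaldehyde and vanillate cancel as intermediates, leaving a net conversion of formate and protocatechuate into vanillin that consumes one NADH; alternative pathways with different cofactor usage differ precisely in these cofactor terms.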
Interestingly, these steps are essentially the same as those used in the experimental study by Li and Frost (1998) to convert glucose to vanillin in recombinant E. coli cells, demonstrating that the computational procedure can indeed uncover relevant engineering strategies. Note, however, that the reported experimental yield of 0.15 g/g glucose is far from the maximum theoretical yield of the network (i.e., 0.63 g/g glucose), indicating the potential for considerable improvement.
This motivates examining whether higher vanillin yields can be reached by systematically pruning the metabolic network using OptKnock (Step 4). Here the genome-scale model of E. coli metabolism, augmented with the three functionalities identified above, is integrated into the OptKnock framework to determine the set(s) of reactions whose deletion would force a strong coupling between growth and vanillin production. The highest vanillin-yielding single, double, and quadruple knockout strategies are discussed next for a basis glucose uptake rate of 10 mmol/gDW/hr. In all cases, anaerobic conditions are selected by OptKnock as the most favorable for vanillin production. It is worth emphasizing that, in general, the deletion strategies identified by OptStrain depend upon the specific gene addition strategy fed into Step 4 of OptStrain.
Accordingly, we tested whether alternative, and possibly better, deletion strategies would accompany some of the other candidate addition strategies alluded to above. For the vanillin case study, we found the deletion suggestions and anticipated vanillin yields at maximal growth to be quite similar regardless of the gene addition strategy employed.
The first deletion strategy identified by OptStrain suggests removing acetaldehyde dehydrogenase (E.C.# 1.2.1.10) to prevent the conversion of acetyl-CoA into ethanol.
Vanillin production in this network, at the maximum biomass production rate of 0.205 hr-1, is 3.9 mmol/gDW/hr, or 0.33 g/g glucose based on the assumed uptake rate of glucose. In this deletion strategy, flux is redirected through the vanillin precursor metabolites phosphoenolpyruvate (PEP) and erythrose-4-phosphate (E4P) by blocking the loss of carbon through ethanol secretion. The second (double) deletion strategy involves the additional removal of glucose-6-phosphate isomerase (E.C.# 5.3.1.9), essentially blocking the upper half of glycolysis. These deletions cause the network to rely heavily on the Entner-Doudoroff pathway to generate pyruvate and glyceraldehyde-3-phosphate (GAP), which undergoes further conversion into PEP in the lower half of glycolysis.
Fructose-6-phosphate (F6P), produced through the non-oxidative part of the pentose phosphate pathway, is subsequently converted to E4P. Vanillin production, at the expense of a reduced maximum growth rate of 0.06 hr-1, is increased to 4.78 mmol/gDW/hr, or 0.40 g/g glucose. A substantially higher level of vanillin production is predicted in the four-reaction deletion mutant network without imposing a high penalty on the growth rate. This strategy leads to the production of 6.79 mmol/gDW/hr of vanillin, or 0.57 g/g glucose, at the maximum growth rate of 0.052 hr-1. The OptKnock framework suggests the deletion of acetate kinase (E.C.# 2.7.2.1), pyruvate kinase (E.C.# 2.7.1.40), the PTS transport mechanism, and fructose 6-phosphate aldolase. The first three deletions prevent leakage of flux from PEP and redirect it instead to vanillin synthesis. The elimination of fructose 6-phosphate aldolase prevents the direct conversion of F6P into GAP and dihydroxyacetone (DHA). Note that both F6P and GAP are used to form E4P in the non-oxidative branch of the pentose phosphate pathway. DHA can be further reacted to form dihydroxyacetone phosphate (DHAP) with the consumption of a PEP molecule. Thus, elimination of fructose 6-phosphate aldolase prevents the diversion of both F6P and PEP, which are required for vanillin synthesis. Furthermore, a surprising network flux redistribution involves the employment of a group of reactions from one-carbon metabolism to form 10-formyltetrahydrofolate, which is subsequently converted to formaldehyde.
Figure 5 compares the vanillin production envelopes, obtained by maximizing and minimizing vanillin formation at different biomass production rates, for the wild-type and mutant networks. These deletions endow the network with high levels of vanillin production under any growth conditions.
1.3 Discussion
The OptStrain framework of the present invention is aimed at systematically reshaping whole genome-scale metabolic networks of microbial systems for the overproduction of not only small but also complex molecules. We have so far examined a number of different products (e.g., 1,3-propanediol, inositol, pyruvate, electron transfer, etc.) using a variety of hosts (i.e., E. coli, C. acetobutylicum, M. extorquens). The two case studies discussed earlier, hydrogen and vanillin, show that OptStrain can address the range of challenges associated with strain redesign. At the same time, it is important to emphasize that the validity and relevance of the results obtained with the OptStrain framework depend on the completeness and accuracy of the reaction databases and microbial metabolic models considered. We have identified numerous instances of unbalanced reactions, especially with respect to hydrogen atoms, and of ambiguous reaction directionality in the reaction databases that we mined.
Careful curation of the downloaded reactions preceded all of our case studies. Whenever the balance of a reaction with respect to carbon could not be restored, the reaction was removed from consideration. We expect that this step will become less time-consuming as automated tools for reaction database testing and verification (Segre et al., 2003) become available. The purely stoichiometric representation of metabolic pathways in microbial models can lead to unrealistic flux distributions because it does not account for kinetic barriers and regulatory interactions (e.g., allosteric regulation). To alleviate this, the present invention contemplates incorporating regulatory information in the form of Boolean constraints (Covert & Palsson, 2002) into the stoichiometric model of E. coli and using kinetic expressions on an as-needed basis (Castellanos et al., 2004; Tomita et al., 1999; Varner & Ramkrishna, 1999). Further, the present invention contemplates using OptKnock to account not only for reaction deletions but also for the up- or down-regulation of various key reaction steps. Despite these simplifications, OptStrain has already provided useful insight into microbial host redesign in many cases and, more importantly, has established for the first time an integrated framework open to future modeling improvements.
It should be understood that a computer is used in implementing the methodology of the present invention. The present invention contemplates that any number of computers, and any number of types of software or programming languages, can be used. It should further be understood that the present invention provides for storing a representation of the networks created. The representations of the networks can be stored in a memory, in a signal, or in a bioengineered organism.
1.4 Mathematical Formulation for OptStrain
The redesign of microbial metabolic networks to enable enhanced product yields using the OptStrain procedure requires the solution of multiple types of optimization problems. The first optimization task (Step 2) involves determining the maximum yield of the desired product in a metabolic network comprising a set $\mathcal{N} = \{1, \ldots, N\}$ of metabolites and a set $\mathcal{M} = \{1, \ldots, M\}$ of reactions. The Linear Programming (LP) problem for maximizing the yield on a weight basis of a particular product $P$ (in the set $\mathcal{N}$) from a set $\mathcal{R}$ of substrates is formulated as:

$$
\begin{aligned}
\max \quad & MW_P \sum_{j=1}^{M} S_{Pj} \, v_j \\
\text{subject to} \quad & \sum_{j=1}^{M} S_{ij} \, v_j \ge 0, \quad \forall\, i \in \mathcal{N},\ i \notin \mathcal{R} \qquad (1) \\
& \sum_{i \in \mathcal{R}} MW_i \sum_{j=1}^{M} S_{ij} \, v_j = -1 \qquad (2)
\end{aligned}
$$

where $MW_i$ is the molecular weight of metabolite $i$, $v_j$ is the molar flux of reaction $j$, and $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$. In our work, the metabolite set $\mathcal{N}$ comprised approximately 4,800 metabolites and the reaction set $\mathcal{M}$ consisted of more than 5,700 reactions. The inequality in constraint (1) allows only for secretion and prevents the uptake of all metabolites in the network other than the substrates in $\mathcal{R}$. Constraint (2) scales the results for a total substrate uptake of one gram. The reaction fluxes $v_j$ can be either irreversible (i.e., $v_j \ge 0$) or reversible, in which case they can assume either positive or negative values. Reactions that enable the uptake of essential-for-growth compounds such as oxygen, carbon dioxide, ammonia, sulfate and phosphate are also present.
In Step 3 of OptStrain, the minimum number of non-native reactions needed to meet the identified maximum yield from Step 2 is found. First the Universal database reactions which are absent in the examined microbial host's metabolic model are flagged as non-native. This gives rise to the following Mixed Integer Linear Programming (MILP) problem:
$$
\begin{aligned}
\min_{v_j,\, y_j} \quad & \sum_{j \in \mathcal{M}^{\text{non-native}}} y_j \\
\text{subject to} \quad & \sum_{j=1}^{M} S_{ij} \, v_j \ge 0, \quad \forall\, i \in \mathcal{N},\ i \notin \mathcal{R} \qquad (1) \\
& \sum_{i \in \mathcal{R}} MW_i \sum_{j=1}^{M} S_{ij} \, v_j = -1 \qquad (2) \\
& MW_P \sum_{j=1}^{M} S_{Pj} \, v_j \ge Yield^{\text{target}} \qquad (3) \\
& v_j \le v_j^{\max} \, y_j, \quad \forall\, j \in \mathcal{M}^{\text{non-native}} \qquad (4) \\
& v_j \ge v_j^{\min} \, y_j, \quad \forall\, j \in \mathcal{M}^{\text{non-native}} \qquad (5) \\
& y_j \in \{0, 1\}, \quad \forall\, j \in \mathcal{M}^{\text{non-native}} \qquad (6)
\end{aligned}
$$

The set $\mathcal{M}^{\text{non-native}}$ comprises the non-native reactions for the examined host and is a subset of the set $\mathcal{M}$. Constraints (1) and (2) are identical to those in the product yield maximization problem. Constraint (3) ensures that the product yield meets the maximum theoretical yield, $Yield^{\text{target}}$, calculated in Step 2. The binary variables $y_j$ in constraints (4) and (5) serve as switches to turn reactions on or off. A value of zero for $y_j$ forces the corresponding flux $v_j$ to be zero, while a value of one enables it to take on nonzero values.
The parameters $v_j^{\min}$ and $v_j^{\max}$ can either assume very low and very high values, respectively, or they can be calculated by minimizing and maximizing every reaction flux $v_j$ subject to constraints (1)-(3).
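A compact sketch of this MILP using scipy.optimize.milp (available in SciPy 1.9 and later) on a hypothetical five-reaction network in which r3, r4 and r5 are flagged as non-native. The molecular weights, the yield target of 1.0 g/g (the LP optimum for this toy network) and the loose bound v_max = 1 are assumptions for illustration. Variables are stacked as [v, y], and each constraint is tagged with its number from the formulation:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical toy "universal" network; r3, r4, r5 are non-native to the host.
reactions = ["r1", "r2", "r3", "r4", "r5"]
S = np.array([
    [-1,  0,  0,  0,  0],   # S (substrate)   r1: S -> 2 X
    [ 2, -1, -1, -1,  0],   # X               r2: X -> Q, r3: X -> P, r4: X -> Y
    [ 0,  1,  0,  0,  0],   # Q (byproduct)
    [ 0,  0,  0,  1, -1],   # Y               r5: Y -> P
    [ 0,  0,  1,  0,  1],   # P (product)
], dtype=float)
MW_S, MW_P = 180.0, 90.0        # assumed molecular weights (g/mol)
non_native = [2, 3, 4]          # column indices of r3, r4, r5
yield_target = 1.0              # Step 2 optimum for this toy network (g/g)
v_min, v_max = 0.0, 1.0         # assumed bounds: irreversible reactions, loose upper bound
n, k = S.shape[1], len(non_native)

def row(v_part, y_part=None):
    """One constraint row (2-D) over the stacked variable vector [v, y]."""
    y_part = np.zeros(k) if y_part is None else y_part
    return np.atleast_2d(np.concatenate([v_part, y_part]))

constraints = [
    LinearConstraint(np.vstack([row(S[i]) for i in range(1, 5)]), 0, np.inf),   # (1)
    LinearConstraint(row(MW_S * S[0]), -1, -1),                                 # (2)
    LinearConstraint(row(MW_P * S[4]), yield_target * (1 - 1e-6), np.inf),      # (3)
]
for t, j in enumerate(non_native):
    unit_v, unit_y = np.zeros(n), np.zeros(k)
    unit_v[j] = 1.0
    unit_y[t] = -v_max
    constraints.append(LinearConstraint(row(unit_v, unit_y), -np.inf, 0))      # (4)
    unit_y[t] = -v_min
    constraints.append(LinearConstraint(row(unit_v, unit_y), 0, np.inf))       # (5)

c = np.concatenate([np.zeros(n), np.ones(k)])               # minimize the sum of y_j
integrality = np.concatenate([np.zeros(n), np.ones(k)])     # (6): y_j integer in [0, 1]
bounds = Bounds(np.zeros(n + k), np.concatenate([np.full(n, np.inf), np.ones(k)]))
res = milp(c, integrality=integrality, bounds=bounds, constraints=constraints)
chosen = [reactions[j] for t, j in enumerate(non_native) if res.x[n + t] > 0.5]
print(res.status, chosen)   # expected: 0 ['r3']
```

Here a single heterologous reaction (r3) suffices, whereas the alternative route through r4 and r5 would require two; the small relative tolerance on constraint (3) guards against round-off when the target equals the LP optimum.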
Alternative pathways that satisfy both optimality criteria of maximum yield and minimum number of non-native reactions are obtained by the iterative solution of the MILP formulation upon the accumulation of additional constraints referred to as integer cuts.
Integer cut constraints exclude from consideration all sets of reactions previously identified. For example, if a previously identified pathway utilizes reactions 1, 2, and 3, then the following constraint prevents the same reactions from being simultaneously considered in subsequent solutions: $y_1 + y_2 + y_3 \le 2$. More details can be found in Burgard and Maranas (2001).
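The integer-cut loop can be sketched in the same style with scipy.optimize.milp. In this hypothetical network, r3 and r4 are interchangeable one-step routes (for instance, enzymes differing only in cofactor use) and r5 + r6 form a longer route; the loop stops once the next optimum needs more reactions than the first. This is an illustrative sketch under these assumptions, not the GAMS/CPLEX implementation used in this work:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical toy network: r3 and r4 are alternative non-native routes X -> P,
# while r5 + r6 form a longer non-native route X -> Y -> P.
reactions = ["r1", "r2", "r3", "r4", "r5", "r6"]
S = np.array([
    [-1,  0,  0,  0,  0,  0],   # S (substrate)
    [ 2, -1, -1, -1, -1,  0],   # X
    [ 0,  1,  0,  0,  0,  0],   # Q (byproduct)
    [ 0,  0,  0,  0,  1, -1],   # Y
    [ 0,  0,  1,  1,  0,  1],   # P (product)
], dtype=float)
MW_S, MW_P = 180.0, 90.0                 # assumed molecular weights (g/mol)
yield_target, v_max = 1.0, 1.0           # Step 2 optimum for this network; loose flux bound
non_native = [2, 3, 4, 5]
n, k = S.shape[1], len(non_native)

# Each row is (coefficients over [v, y], lower bound, upper bound).
rows = [(np.r_[S[i], np.zeros(k)], 0.0, np.inf) for i in range(1, 5)]               # (1)
rows.append((np.r_[MW_S * S[0], np.zeros(k)], -1.0, -1.0))                          # (2)
rows.append((np.r_[MW_P * S[4], np.zeros(k)], yield_target * (1 - 1e-6), np.inf))  # (3)
for t, j in enumerate(non_native):       # (4); (5) is implied since v_min = 0 and v >= 0
    a = np.zeros(n + k)
    a[j], a[n + t] = 1.0, -v_max
    rows.append((a, -np.inf, 0.0))

c = np.r_[np.zeros(n), np.ones(k)]
integrality = np.r_[np.zeros(n), np.ones(k)]
bounds = Bounds(np.zeros(n + k), np.r_[np.full(n, np.inf), np.ones(k)])

def enumerate_minimal_sets(rows):
    """Re-solve the MILP, adding one integer cut per solution, while the optimum stays minimal."""
    rows, found, best = list(rows), [], None
    while True:
        cons = LinearConstraint(np.array([r[0] for r in rows]),
                                [r[1] for r in rows], [r[2] for r in rows])
        res = milp(c, integrality=integrality, bounds=bounds, constraints=[cons])
        if res.status != 0:
            break                                   # infeasible: no further pathways
        active = [t for t in range(k) if res.x[n + t] > 0.5]
        best = len(active) if best is None else best
        if len(active) > best:
            break                                   # remaining solutions need more reactions
        found.append([reactions[non_native[t]] for t in active])
        cut = np.zeros(n + k)
        cut[[n + t for t in active]] = 1.0          # integer cut: sum of these y_t <= |set| - 1
        rows.append((cut, -np.inf, len(active) - 1.0))
    return found

print(enumerate_minimal_sets(rows))   # ['r3'] and ['r4'], in solver-dependent order
```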
Step 4 of OptStrain identifies which reactions to eliminate from the network augmented with the non-native functionalities, using the OptKnock framework developed previously (Burgard et al., 2003; Pharkya et al., 2003). The objective of this step is to constrain the phenotypic behavior of the network so that growth is coupled with the formation of the desired biochemical, thus curtailing byproduct formation. The envelope of allowable targeted product yields versus biomass yields is constructed by solving a series of linear optimization problems that maximize and then minimize biochemical production for various levels of biomass formation available to the network. More details on the optimization formulation can be found in Pharkya et al. (2003). All the optimization problems were solved in minutes to hours using CPLEX 7.0 (http://www.ilog.com/products/cplex/) accessed via the GAMS (Brooke et al., 1998) modeling environment on an IBM RS6000-270 workstation.
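To illustrate what the production envelope captures, the following Python sketch uses a hypothetical three-flux toy network rather than a genome-scale model. Because the toy has only one free flux once biomass B is fixed, each "maximize/minimize product" LP reduces to closed-form bounds, so no LP solver is needed. The stoichiometry (redox balance P + 2Q = 2B and carbon limit B + P + Q ≤ 10, with Q a byproduct) is invented for illustration.

```python
from fractions import Fraction

# Hypothetical toy network at a fixed glucose uptake of 10 units:
#   P + 2Q = 2B       growth NADH must be reoxidized by product P or byproduct Q
#   B + P + Q <= 10   carbon available from glucose
#   B, P, Q >= 0
# With B fixed, P = 2B - 2Q, so the min/max-product LPs become bounds on Q.

def product_envelope(b, byproduct_enabled=True):
    """(min P, max P) at biomass flux b, or None if b is not attainable."""
    b = Fraction(b)
    if b < 0:
        return None
    q_lo = max(Fraction(0), 3 * b - 10)             # carbon limit rewritten as 3B - Q <= 10
    q_hi = b if byproduct_enabled else Fraction(0)  # P >= 0 gives Q <= B; knockout forces Q = 0
    if q_lo > q_hi:
        return None
    return (2 * b - 2 * q_hi, 2 * b - 2 * q_lo)     # P falls as Q rises

for label, enabled in (("wild type", True), ("Q knocked out", False)):
    for b in (0, 2, Fraction(10, 3), 4, 5):
        env = product_envelope(b, enabled)
        shown = None if env is None else tuple(float(p) for p in env)
        print(f"{label:14s} B={float(b):.2f} P range={shown}")
```

In the wild-type toy the network reaches its maximum biomass (B = 5) with zero product, whereas removing the byproduct forces P = 2B at every attainable biomass level, which is the kind of growth coupling Step 4 looks for.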
2. OptKnock
The ability to investigate the metabolism of single-celled organisms at a genomic scale, and thus at a systemic level, motivates the need for novel computational methods aimed at identifying strain engineering strategies. The present invention includes a computational framework termed OptKnock for suggesting gene deletion strategies leading to the overproduction of specific chemical compounds in E. coli. This is accomplished by ensuring that the production of the desired chemical becomes an obligatory byproduct of growth by "shaping" the connectivity of the metabolic network. In other words, OptKnock identifies and subsequently removes metabolic reactions that are capable of uncoupling cellular growth from chemical production. The computational procedure is designed to identify not just straightforward but also non-intuitive knockout strategies by simultaneously considering the entire E. coli metabolic network as abstracted in the in silico E. coli model of Palsson and coworkers (Edwards & Palsson, 2000). The complexity and built-in redundancy of this network (e.g., the E. coli model encompasses 720 reactions) necessitate a systematic and efficient search approach to combat the combinatorial explosion of candidate gene knockout strategies.
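A quick count shows why exhaustive enumeration of knockout sets is impractical. The Python snippet below counts the distinct sets of up to four deletions among the 720 reactions of the E. coli model and, for comparison, among a pool of 100 candidate reactions (the size of the central-metabolism problems reported later). Under naive enumeration, each of these sets would need its own flux analysis.

```python
from math import comb

def knockout_combinations(n_candidates, max_knockouts):
    """Number of distinct knockout sets of each size 1..max_knockouts."""
    return {k: comb(n_candidates, k) for k in range(1, max_knockouts + 1)}

for n in (720, 100):
    for k, count in knockout_combinations(n, 4).items():
        print(f"{n} candidates, {k} knockouts: {count:,} sets")
```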
The nested optimization framework shown in Figure 6 is developed to identify multiple gene deletion combinations that maximally couple cellular growth objectives with externally imposed chemical production targets. This multi-layered optimization structure involving two competing optimal strategists (i.e., cellular objective and chemical production) is referred to as a bilevel optimization problem (Bard, 1998).
Problem formulation specifics, along with an elegant solution procedure drawing upon linear programming (LP) duality theory, are described in the Methods section. The OptKnock procedure is applied to succinate, lactate, and 1,3-propanediol (PDO) production in E. coli, with the maximization of the biomass yield for a fixed amount of glucose uptake employed as the cellular objective. The obtained results are also contrasted against using the minimization of metabolic adjustment (MOMA) (Segre et al., 2002) as the cellular objective. Based on the OptKnock framework, it is possible to identify the most promising gene knockout strategies and their corresponding allowable envelopes of chemical versus biomass production in the context of succinate, lactate, and PDO production in E. coli.
A preferred embodiment of this invention describes a computational framework, termed OptKnock, for suggesting gene deletion strategies that could lead to chemical production in E. coli by ensuring that the drain towards metabolites/compounds necessary for growth (i.e., carbon, redox potential, and energy resources) must be accompanied, due to stoichiometry, by the production of the desired chemical. Therefore, the production of the desired product becomes an obligatory byproduct of cellular growth.
Specifically, OptKnock pinpoints which reactions to remove from a metabolic network, which can be realized by deleting the gene(s) associated with the identified functionality.
The procedure was demonstrated on succinate, lactate, and PDO production in E. coli K-12. The obtained results exhibit good agreement with strains published in the literature. While some of the suggested gene deletions are quite straightforward, as they essentially prune reaction pathways competing with the desired one, many others are at first quite non-intuitive, reflecting the complexity and built-in redundancy of the metabolic network of E. coli. For the succinate case, OptKnock correctly suggested anaerobic fermentation and the removal of the phosphotransferase glucose uptake mechanism as a consequence of the competition between the cellular and chemical production objectives, and not as a direct input to the problem. In the lactate study, the glucokinase-based glucose uptake mechanism was shown to decouple lactate and biomass production for certain knockout strategies. For the PDO case, results show that the Entner-Doudoroff pathway is more advantageous than EMP glycolysis despite the fact that it is substantially less energetically efficient. In addition, the hitherto popular tpi knockout was clearly shown to reduce the maximum yields of PDO, while a complex network of 15 reactions was shown to be theoretically capable of "leaking" flux from the pentose phosphate pathway (PPP) to the TCA cycle and thus decoupling PDO production from biomass formation. The obtained results also appeared to be quite robust with respect to the choice of the cellular objective.
The present invention contemplates any number of cellular objectives, including but not limited to maximizing a growth rate, maximizing ATP production, minimizing metabolic adjustment, minimizing nutrient uptake, minimizing redox production, minimizing a Euclidean norm, and combinations of these and other cellular objectives.
It is important to note that the suggested gene deletion strategies must be interpreted carefully. For example, in many cases the deletion of a gene in one branch of a branched pathway is equivalent to the significant up-regulation of the other. In addition, inspection of the flux changes before and after the gene deletions provides insight as to which genes need to be up- or down-regulated. Lastly, the mapping of the set of reactions targeted for removal to the corresponding genes is not always uniquely specified. Therefore, the most economical gene set must be identified carefully, accounting for isozymes and multifunctional enzymes.
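The non-uniqueness of the reaction-to-gene mapping can be made concrete with a small Python sketch. The gene-protein-reaction rules below (genes `g1` to `g5`, reactions `R_A` to `R_D`) are hypothetical; isozymes are modeled as alternative gene sets (OR) and enzyme complexes as sets of required subunits (AND). The search returns every smallest gene set that disables the targeted reactions, together with any collateral reactions lost through multifunctional enzymes.

```python
from itertools import combinations

# Hypothetical gene-protein-reaction (GPR) rules: each reaction maps to a list of
# isozymes (alternatives, OR); each isozyme is the set of genes it needs (AND).
GPR = {
    "R_A": [{"g1"}, {"g2"}],      # two isozymes
    "R_B": [{"g3", "g4"}],        # a single two-subunit complex
    "R_C": [{"g2"}, {"g5"}],      # g2 is multifunctional (also an R_A isozyme)
    "R_D": [{"g5"}],              # g5 also catalyzes R_D
}
GENES = sorted({g for isozymes in GPR.values() for iso in isozymes for g in iso})

def disabled_reactions(deleted):
    """Reactions for which every isozyme has lost at least one required gene."""
    return {rxn for rxn, isozymes in GPR.items()
            if all(iso & deleted for iso in isozymes)}

def smallest_gene_deletions(targets):
    """All minimum-size gene sets disabling `targets`, with collateral losses."""
    for k in range(1, len(GENES) + 1):
        options = []
        for combo in combinations(GENES, k):
            off = disabled_reactions(set(combo))
            if targets <= off:
                options.append((set(combo), off - targets))
        if options:
            return options
    return []

for targets in ({"R_A"}, {"R_B"}, {"R_C"}):
    print(sorted(targets), smallest_gene_deletions(targets))
```

Here `R_B` can be removed by deleting either subunit gene, while removing `R_C` requires deleting both isozyme genes and also disables `R_D`, a side effect that the choice of gene set would have to weigh.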
Preferably, in the OptKnock framework, the substrate (i.e., glucose) uptake flux is assumed to be 10 mmol/gDW·hr. Therefore, all reported chemical production and biomass formation values are based upon this postulated, rather than predicted, uptake scenario. Thus, it is quite possible that the suggested deletion mutants may involve substantially lower uptake efficiencies. However, because OptKnock essentially suggests mutants with coupled growth and chemical production, one could envision a growth selection system that will successively evolve mutants with improved uptake efficiencies and thus enhanced production of the desired chemical.
Because the purely stoichiometric representation of the inner optimization problem that performs flux allocation lacks any regulatory or kinetic information, OptKnock identifies gene deletions as the sole mechanism for chemical overproduction. Clearly, the lack of any regulatory or kinetic information in the model is a simplification that may in some cases suggest unrealistic flux distributions. The incorporation of regulatory information will not only enhance the quality of the suggested gene deletions by more appropriately resolving flux allocation, but also allow us to suggest regulatory modifications along with gene deletions as mechanisms for strain improvement.
Alternate modeling approaches (e.g., cybernetic models (Kompala et al., 1984; Ramakrishna et al., 1996; Varner and Ramkrishna, 1999) and metabolic control analysis (Kacser and Burns, 1973; Heinrich and Rapoport, 1974; Hatzimanikatis et al., 1998)), if available, can be incorporated within the OptKnock framework to more accurately estimate the metabolic flux distributions of gene-deleted metabolic networks.
Nevertheless, even without such regulatory or kinetic information, OptKnock provides useful suggestions for strain improvement and, more importantly, establishes a systematic framework.
The present invention naturally contemplates future improvements in metabolic and regulatory modeling frameworks.
2.1 Methods
The maximization of a cellular objective, quantified as an aggregate reaction flux, for a steady-state metabolic network comprising a set $\mathcal{N} = \{1, \ldots, N\}$ of metabolites and a set $\mathcal{M} = \{1, \ldots, M\}$ of metabolic reactions fueled by a glucose substrate is expressed mathematically as follows:

$$
\begin{aligned}
\text{maximize} \quad & v_{\text{cellular objective}} & & \text{(Primal)} \\
\text{subject to} \quad & \sum_{j=1}^{M} S_{ij} v_j = 0, & & \forall i \in \mathcal{N} \\
& v_{\text{pts}} + v_{\text{glk}} = v_{\text{glc\_uptake}} & & \text{mmol/gDW}\cdot\text{hr} \\
& v_{\text{atp}} \ge v_{\text{atp\_main}} & & \text{mmol/gDW}\cdot\text{hr} \\
& v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}} & & \text{1/hr} \\
& v_j \ge 0, & & \forall j \in \mathcal{M}_{\text{irrev}} \\
& v_j \le 0, & & \forall j \in \mathcal{M}_{\text{secr\_only}} \\
& v_j \in \mathbb{R}, & & \forall j \in \mathcal{M}_{\text{rev}}
\end{aligned}
$$

where $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$, $v_j$ represents the flux of reaction $j$, $v_{\text{glc\_uptake}}$ is the basis glucose uptake scenario, $v_{\text{atp\_main}}$ is the non-growth-associated ATP maintenance requirement, and $v_{\text{biomass}}^{\text{target}}$ is a minimum level of biomass production. The vector $\mathbf{v}$ includes both internal and transport reactions. The forward (i.e., positive) direction of transport fluxes corresponds to the uptake of a particular metabolite, whereas the reverse (i.e., negative) direction corresponds to metabolite secretion. The uptake of glucose through the phosphotransferase system and glucokinase is denoted by $v_{\text{pts}}$ and $v_{\text{glk}}$, respectively. Transport fluxes for metabolites that can only be secreted from the network are members of $\mathcal{M}_{\text{secr\_only}}$. Note also that the complete set of reactions $\mathcal{M}$ is subdivided into reversible $\mathcal{M}_{\text{rev}}$ and irreversible $\mathcal{M}_{\text{irrev}}$ reactions. The cellular objective is often assumed to be a drain of biosynthetic precursors in the ratios required for biomass formation (Neidhardt and Curtiss, 1996). The fluxes are reported per 1 gDW·hr, such that biomass formation is expressed as g biomass produced/gDW·hr, or 1/hr.
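As a sanity check on the Primal constraints, the Python sketch below tests a candidate flux vector against steady-state mass balances, reaction directionality, the fixed glucose uptake, and the biomass floor. The two-metabolite network, its reaction names (`EX_glc`, `R1`, `BIOMASS`, `EX_byp`), and the numeric targets are hypothetical; the ATP maintenance constraint and the split of glucose uptake into phosphotransferase and glucokinase routes are omitted for brevity. Transport fluxes follow the sign convention above (positive for uptake, negative for secretion).

```python
# Hypothetical toy network (not the E. coli model). S[metabolite][reaction]
# holds stoichiometric coefficients; missing entries are zero.
S = {
    "G": {"EX_glc": 1, "R1": -1},                  # glucose taken up, then converted
    "P": {"R1": 1, "BIOMASS": -1, "EX_byp": 1},    # precursor drained by growth or secreted
}
IRREVERSIBLE = {"R1", "BIOMASS"}
SECRETION_ONLY = {"EX_byp"}   # transport: positive = uptake, negative = secretion
GLC_UPTAKE = 10.0             # mmol/gDW-hr, the postulated uptake basis
BIOMASS_TARGET = 1.0          # hypothetical minimum biomass flux

def primal_violations(v, tol=1e-9):
    """List the Primal constraints that flux vector v violates."""
    problems = []
    for met, row in S.items():
        net = sum(coef * v[rxn] for rxn, coef in row.items())
        if abs(net) > tol:
            problems.append(f"mass balance on {met}: net {net:+g}")
    problems += [f"{r} must be >= 0" for r in IRREVERSIBLE if v[r] < -tol]
    problems += [f"{r} is secretion-only (<= 0)" for r in SECRETION_ONLY if v[r] > tol]
    if abs(v["EX_glc"] - GLC_UPTAKE) > tol:
        problems.append("glucose uptake differs from the fixed basis")
    if v["BIOMASS"] < BIOMASS_TARGET - tol:
        problems.append("biomass below target")
    return problems

good = {"EX_glc": 10.0, "R1": 10.0, "BIOMASS": 7.0, "EX_byp": -3.0}
bad = dict(good, EX_byp=3.0)  # byproduct "taken up" instead of secreted
print(primal_violations(good))
print(primal_violations(bad))
```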
The modeling of gene deletions, and thus reaction elimination, first requires the incorporation of binary variables into the flux balance analysis framework (Burgard and Maranas, 2001; Burgard et al., 2001). These binary variables,

$$y_j = \begin{cases} 1 & \text{if reaction flux } v_j \text{ is active} \\ 0 & \text{if reaction flux } v_j \text{ is not active} \end{cases}, \quad \forall j \in \mathcal{M}$$

assume a value of one if reaction $j$ is active and a value of zero if it is inactive. The following constraint,

$$v_j^{\min} \cdot y_j \le v_j \le v_j^{\max} \cdot y_j, \quad \forall j \in \mathcal{M}$$

forces reaction flux $v_j$ to zero whenever variable $y_j$ is equal to zero. Alternatively, when $y_j$ is equal to one, $v_j$ is free to assume any value between a lower bound $v_j^{\min}$ and an upper bound $v_j^{\max}$. In this study, $v_j^{\min}$ and $v_j^{\max}$ are identified by minimizing and subsequently maximizing every reaction flux subject to the constraints from the Primal problem.
The identification of optimal gene/reaction knockouts requires the solution of a bilevel optimization problem that chooses the set of reactions that can be accessed ($y_j = 1$) so that the optimization of the cellular objective indirectly leads to the overproduction of the chemical or biochemical of interest (see also Figure 6). Using biomass formation as the cellular objective, this is expressed mathematically as the following bilevel mixed-integer optimization problem:
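$$
\begin{aligned}
\underset{y_j}{\text{maximize}} \quad & v_{\text{chemical}} & & \text{(OptKnock)} \\
\text{subject to} \quad & \underset{v_j}{\text{maximize}} \ \ v_{\text{biomass}} & & \text{(Primal)} \\
& \quad \text{subject to} \ \ \sum_{j=1}^{M} S_{ij} v_j = 0, & & \forall i \in \mathcal{N} \\
& \quad v_{\text{pts}} + v_{\text{glk}} = v_{\text{glc\_uptake}} & & \\
& \quad v_{\text{atp}} \ge v_{\text{atp\_main}} & & \\
& \quad v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}} & & \\
& \quad v_j^{\min} \cdot y_j \le v_j \le v_j^{\max} \cdot y_j, & & \forall j \in \mathcal{M} \\
& y_j \in \{0, 1\}, & & \forall j \in \mathcal{M} \\
& \sum_{j \in \mathcal{M}} (1 - y_j) \le K & &
\end{aligned}
$$

where $K$ is the number of allowable knockouts. The constraint $v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}}$ ensures that the resulting network meets a minimum biomass yield, $v_{\text{biomass}}^{\text{target}}$.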
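The role of the binary switches and the knockout budget $K$ can be illustrated with a small Python check of a candidate $(\mathbf{v}, \mathbf{y})$ pair. The reaction names and bounds are hypothetical placeholders, and the sketch only verifies the constraints; it does not solve the bilevel problem.

```python
def satisfies_knockout_constraints(v, y, v_min, v_max, max_knockouts, tol=1e-9):
    """Check v_min[j]*y[j] <= v[j] <= v_max[j]*y[j] for all j and sum(1 - y) <= K."""
    for j in v:
        if y[j] not in (0, 1):
            return False          # y must be binary
        if not (v_min[j] * y[j] - tol <= v[j] <= v_max[j] * y[j] + tol):
            return False          # flux outside its switched bounds
    return sum(1 - y[j] for j in y) <= max_knockouts

# Hypothetical reactions and flux bounds (e.g. from min/max flux calculations).
v_min = {"r1": 0.0, "r2": -20.0, "r3": 0.0}
v_max = {"r1": 20.0, "r2": 20.0, "r3": 20.0}
y = {"r1": 0, "r2": 0, "r3": 1}   # r1 and r2 knocked out
v = {"r1": 0.0, "r2": 0.0, "r3": 5.0}

print(satisfies_knockout_constraints(v, y, v_min, v_max, max_knockouts=2))
print(satisfies_knockout_constraints(v, y, v_min, v_max, max_knockouts=1))
```

A switched-off reaction must carry zero flux regardless of its original bounds, and the same pair of knockouts is rejected once the budget is tightened to one.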
The direct solution of this two-stage optimization problem is intractable given the high dimensionality of the flux space (i.e., over 700 reactions) and the presence of two nested optimization problems. To remedy this, we develop an efficient solution approach borrowing from LP duality theory, which shows that for every linear programming problem (primal) there exists a unique optimization problem (dual) whose optimal objective value is equal to that of the primal problem. A similar strategy was employed by Burgard and Maranas (2003) for identifying/testing metabolic objective functions from metabolic flux data. The dual problem (Ignizio and Cavalier, 1994) associated with the OptKnock inner problem is

$$
\begin{aligned}
\text{minimize} \quad & v_{\text{atp\_main}} \cdot \mu_{\text{atp}} + v_{\text{biomass}}^{\text{target}} \cdot \mu_{\text{biomass}} + v_{\text{glc\_uptake}} \cdot \lambda_{\text{glc}} & & \text{(Dual)} \\
\text{subject to} \quad & \sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{glk}} + \mu_{\text{glk}} + \lambda_{\text{glc}} = 0 & & \\
& \sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{pts}} + \mu_{\text{pts}} + \lambda_{\text{glc}} = 0 & & \\
& \sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{biomass}} + \mu_{\text{biomass}} = 1 & & \\
& \sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{ij} + \mu_j = 0, & & \forall j \in \mathcal{M},\ j \ne \text{glk}, \text{pts}, \text{biomass} \\
& \mu_j^{\min} (1 - y_j) \le \mu_j \le \mu_j^{\max} (1 - y_j), & & \forall j \in \mathcal{M}_{\text{rev}},\ j \notin \mathcal{M}_{\text{secr\_only}} \\
& \mu_j \ge \mu_j^{\min} (1 - y_j), & & \forall j \in \mathcal{M}_{\text{rev}} \cap \mathcal{M}_{\text{secr\_only}} \\
& \mu_j \le \mu_j^{\max} (1 - y_j), & & \forall j \in \mathcal{M}_{\text{irrev}},\ j \notin \mathcal{M}_{\text{secr\_only}} \\
& \mu_j \in \mathbb{R}, & & \forall j \in \mathcal{M}_{\text{irrev}} \cap \mathcal{M}_{\text{secr\_only}} \\
& \lambda_i^{\text{stoich}} \in \mathbb{R}, & & \forall i \in \mathcal{N} \\
& \lambda_{\text{glc}} \in \mathbb{R} & &
\end{aligned}
$$

where $\lambda_i^{\text{stoich}}$ is the dual variable associated with the stoichiometric constraints, $\lambda_{\text{glc}}$ is the dual variable associated with the glucose uptake constraint, and $\mu_j$ is the dual variable associated with any other restrictions on its corresponding flux $v_j$ in the Primal. Note that the dual variable $\mu_j$ acquires unrestricted sign if its corresponding flux in the OptKnock inner problem is set to zero by enforcing $y_j = 0$. The parameters $\mu_j^{\min}$ and $\mu_j^{\max}$ are identified by minimizing and subsequently maximizing their values subject to the constraints of the Dual problem.
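The property this reformulation relies on, namely equal primal and dual optimal values, can be checked numerically on a tiny LP. The Python sketch below solves a two-variable problem, $\max c \cdot x$ subject to $Ax \le b$, $x \ge 0$, and its dual, $\min b \cdot y$ subject to $A^\top y \ge c$, $y \ge 0$, by brute-force vertex enumeration with exact fractions. The numbers are arbitrary, and the method only scales to toy problems; it is not how the Dual above is solved.

```python
from fractions import Fraction
from itertools import combinations

def max_over_polygon(c, rows):
    """Maximize c.x over {x in R^2 : a.x <= b for each (a, b) in rows}.

    Brute-force vertex enumeration; assumes the optimum is finite, so it is
    attained at a vertex where two constraints are tight.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = Fraction(a1[0] * a2[1] - a1[1] * a2[0])
        if det == 0:
            continue  # parallel constraints do not meet in a single vertex
        x = ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * x[0] + a[1] * x[1] <= b for a, b in rows):
            value = c[0] * x[0] + c[1] * x[1]
            best = value if best is None else max(best, value)
    return best

def primal_value(A, b, c):
    """max c.x  s.t.  A x <= b, x >= 0   (A is 2x2)."""
    rows = [(A[i], b[i]) for i in range(2)] + [((-1, 0), 0), ((0, -1), 0)]
    return max_over_polygon(c, rows)

def dual_value(A, b, c):
    """min b.y  s.t.  A^T y >= c, y >= 0, computed as -max(-b.y)."""
    rows = [((-A[0][j], -A[1][j]), -c[j]) for j in range(2)]
    rows += [((-1, 0), 0), ((0, -1), 0)]
    return -max_over_polygon((-b[0], -b[1]), rows)

A = [(1, 1), (1, 3)]
b = [4, 6]
c = [3, 2]
print(primal_value(A, b, c), dual_value(A, b, c))
```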
If the optimal solutions to the Primal and Dual problems are bounded, their objective function values must be equal to one another at optimality. This means that every optimal solution to both problems can be characterized by setting their objectives equal to one another and accumulating their respective constraints. Thus, the bilevel formulation for OptKnock shown previously can be transformed into the following single-level MILP:
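$$
\begin{aligned}
\text{maximize} \quad & v_{\text{chemical}} & & \text{(OptKnock)} \\
\text{subject to} \quad & v_{\text{biomass}} = v_{\text{atp\_main}} \cdot \mu_{\text{atp}} + v_{\text{biomass}}^{\text{target}} \cdot \mu_{\text{biomass}} + v_{\text{glc\_uptake}} \cdot \lambda_{\text{glc}} & & \\
& \sum_{j=1}^{M} S_{ij} v_j = 0, & & \forall i \in \mathcal{N} \\
& v_{\text{pts}} + v_{\text{glk}} = v_{\text{glc\_uptake}} & & \text{mmol/gDW}\cdot\text{hr} \\
& v_{\text{atp}} \ge v_{\text{atp\_main}} & & \text{mmol/gDW}\cdot\text{hr} \\
& \sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{glk}} + \mu_{\text{glk}} + \lambda_{\text{glc}} = 0 & & \\
& \sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{pts}} + \mu_{\text{pts}} + \lambda_{\text{glc}} = 0 & & \\
& \sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{biomass}} + \mu_{\text{biomass}} = 1 & & \\
& \sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{ij} + \mu_j = 0, & & \forall j \in \mathcal{M},\ j \ne \text{glk}, \text{pts}, \text{biomass} \\
& \sum_{j \in \mathcal{M}} (1 - y_j) \le K & & \\
& v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}} & & \\
& \mu_j^{\min} (1 - y_j) \le \mu_j \le \mu_j^{\max} (1 - y_j), & & \forall j \in \mathcal{M}_{\text{rev}},\ j \notin \mathcal{M}_{\text{secr\_only}} \\
& \mu_j \ge \mu_j^{\min} (1 - y_j), & & \forall j \in \mathcal{M}_{\text{rev}} \cap \mathcal{M}_{\text{secr\_only}} \\
& \mu_j \le \mu_j^{\max} (1 - y_j), & & \forall j \in \mathcal{M}_{\text{irrev}},\ j \notin \mathcal{M}_{\text{secr\_only}} \\
& \mu_j \in \mathbb{R}, & & \forall j \in \mathcal{M}_{\text{irrev}} \cap \mathcal{M}_{\text{secr\_only}} \\
& v_j^{\min} \cdot y_j \le v_j \le v_j^{\max} \cdot y_j, & & \forall j \in \mathcal{M} \\
& \lambda_i^{\text{stoich}} \in \mathbb{R}, & & \forall i \in \mathcal{N} \\
& \lambda_{\text{glc}} \in \mathbb{R} & & \\
& y_j \in \{0, 1\}, & & \forall j \in \mathcal{M}
\end{aligned}
$$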
An important feature of the above formulation is that if the problem is feasible, the optimal solution will always be found. In this invention, the candidates for gene knockouts included, but are not limited to, all reactions of glycolysis, the TCA cycle, the pentose phosphate pathway, respiration, and all anaplerotic reactions. This is accomplished by limiting the number of reactions included in the summation (i.e., $\sum_{j \in \text{Central Metabolism}} (1 - y_j) \le K$). Problems containing as many as 100 binary variables were solved in minutes to hours using CPLEX 7.0 accessed via the GAMS modeling environment on an IBM RS6000-270 workstation. It should be understood, however, that the present invention is not dependent upon any particular type of computer or environment being used. Any type of computer or environment can be used to allow for inputting and outputting the information associated with the methodology of the present invention. Moreover, the steps of the methods of the present invention can be implemented in any number of types of software applications or languages, and the present invention is not limited in this respect. It will be appreciated that other embodiments and uses will be apparent to those skilled in the art and that the invention is not limited to these specific illustrative examples.
2.2 EXAMPLE 1 Succinate and Lactate Production
The analysis identified which reactions, if any, could be removed from the E. coli K-12 stoichiometric model (Edwards & Palsson, 2000) so that the remaining network produces succinate or lactate whenever biomass maximization is a good descriptor of flux allocation. A prespecified amount of glucose (10 mmol/gDW·hr), along with unconstrained uptake routes for inorganic phosphate, oxygen, sulfate, and ammonia, is provided to fuel the metabolic network. The optimization step could opt for or against the phosphotransferase system, glucokinase, or both mechanisms for the uptake of glucose.
Secretion routes for acetate, carbon dioxide, ethanol, formate, lactate, and succinate are also enabled. Note that because the glucose uptake rate is fixed, the biomass and product yields are essentially equivalent to the rates of biomass and product production, respectively. In all cases, the OptKnock procedure eliminated the oxygen uptake reaction, pointing to anaerobic growth conditions consistent with current succinate (Zeikus et al., 1999) and lactate (Datta et al., 1995) fermentative production strategies.
Table I summarizes three of the identified gene knockout strategies for succinate overproduction (i.e., mutants A, B, and C). The results for mutant A suggested that the removal of two reactions (i.e., pyruvate formate lyase and lactate dehydrogenase) from the network results in succinate production reaching 63% of its theoretical maximum at the maximum biomass yield. This knockout strategy is identical to the one employed by Stols and Donnelly (1997) in their succinate-overproducing E. coli strain. Next, the envelope of allowable succinate versus biomass production was explored for the wild-type E. coli network and the three mutants listed in Table I. The succinate production limits revealed that mutant A does not exhibit coupled succinate and biomass formation until the yield of biomass approaches 80% of the maximum. Mutant B, however, with the additional deletion of acetaldehyde dehydrogenase, resulted in a much earlier coupling of succinate with biomass yields.
A less intuitive strategy was identified for mutant C, which focused on inactivating two PEP-consuming reactions rather than eliminating competing byproduct (i.e., ethanol, formate, and lactate) production mechanisms. First, the phosphotransferase system was disabled, requiring the network to rely exclusively on glucokinase for the uptake of glucose.
Next, pyruvate kinase was removed, leaving PEP carboxykinase as the only central metabolic reaction capable of draining the significant amount of PEP supplied by glycolysis. This strategy, assuming that the maximum biomass yield could be attained, resulted in a succinate yield approaching 88% of the theoretical maximum. In addition, there was significant succinate production for every attainable biomass yield, while the maximum theoretical yield of succinate is the same as that for the wild-type strain.
The OptKnock framework was next applied to identify knockout strategies for coupling lactate and biomass production. Table I shows three of the identified gene knockout strategies (i.e., mutants A, B, and C). Mutant A redirects flux toward lactate at the maximum biomass yield by blocking acetate and ethanol production. This result is consistent with previous work demonstrating that an adh, pta mutant E. coli strain could grow anaerobically on glucose by producing lactate (Gupta & Clark, 1989).
Mutant B provides an alternate strategy involving the removal of an initial glycolysis reaction along with the acetate production mechanism. This results in a lactate yield of 90% of its theoretical limit at the maximum biomass yield. It is also noted, however, that the network could avoid producing lactate while maximizing biomass formation, because OptKnock does not explicitly account for the "worst-case" alternate solution.
It should be appreciated that upon the additional elimination of the glucokinase and ethanol production mechanisms, mutant C exhibited a tighter coupling between lactate and biomass production.
Table I - Biomass and chemical yields for various gene knockout strategies identified by OptKnock. The reactions and corresponding enzymes for each knockout strategy are listed.
The maximum biomass and corresponding chemical yields are provided on a basis of 10 mmol/hr glucose fed and 1 gDW of cells. The rightmost column provides the chemical yields for the same basis assuming a minimal redistribution of metabolic fluxes from the wild-type (undeleted) E. coli network (MOMA assumption). For the 1,3-propanediol case, glycerol secretion was disabled for both knockout strategies.
Succinate

| ID | Knockouts | Enzyme | Biomass at max v_biomass (1/hr) | Succinate at max v_biomass (mmol/hr) | Succinate at min Σ(v_wt − v)² (mmol/hr) |
|---|---|---|---|---|---|
| Wild | Complete network | n/a | 0.38 | 0.12 | 0 |
| A | 1. COA + PYR → ACCOA + FOR; 2. NADH + PYR ↔ LAC + NAD | 1. Pyruvate formate lyase; 2. Lactate dehydrogenase | 0.31 | 10.70 | 1.65 |
| B | 1. COA + PYR → ACCOA + FOR; 2. NADH + PYR ↔ LAC + NAD; 3. ACCOA + 2 NADH → COA + ETH + 2 NAD | 1. Pyruvate formate lyase; 2. Lactate dehydrogenase; 3. Acetaldehyde dehydrogenase | 0.31 | 10.70 | 4.79 |
| C | 1. ADP + PEP → ATP + PYR; 2. ACTP + ADP ↔ AC + ATP or ACCOA + Pi ↔ ACTP + COA; 3. GLC + PEP → G6P + PYR | 1. Pyruvate kinase; 2. Acetate kinase or phosphotransacetylase; 3. Phosphotransferase system | 0.16 | 15.15 | 6.21 |

Lactate

| ID | Knockouts | Enzyme | Biomass at max v_biomass (1/hr) | Lactate at max v_biomass (mmol/hr) | Lactate at min Σ(v_wt − v)² (mmol/hr) |
|---|---|---|---|---|---|
| Wild | Complete network | n/a | 0.38 | 0 | 0 |
| A | 1. ACTP + ADP ↔ AC + ATP or ACCOA + Pi ↔ ACTP + COA; 2. ACCOA + 2 NADH → COA + ETH + 2 NAD | 1. Acetate kinase or phosphotransacetylase; 2. Acetaldehyde dehydrogenase | 0.28 | 10.46 | 5.58 |
| B | 1. ACTP + ADP ↔ AC + ATP or ACCOA + Pi ↔ ACTP + COA; 2. ATP + F6P → ADP + FDP or FDP ↔ T3P1 + T3P2 | 1. Acetate kinase or phosphotransacetylase; 2. Phosphofructokinase or fructose-1,6-bisphosphate aldolase | 0.13 | 19.00 | 0.19 |
| C | 1. ACTP + ADP ↔ AC + ATP or ACCOA + Pi ↔ ACTP + COA; 2. ATP + F6P → ADP + FDP or FDP ↔ T3P1 + T3P2; 3. ACCOA + 2 NADH → COA + ETH + 2 NAD; 4. GLC + ATP → G6P + ADP | 1. Acetate kinase or phosphotransacetylase; 2. Phosphofructokinase or fructose-1,6-bisphosphate aldolase; 3. Acetaldehyde dehydrogenase; 4. Glucokinase | 0.12 | 18.13 | 10.53 |

1,3-Propanediol

| ID | Knockouts | Enzyme | Biomass at max v_biomass (1/hr) | 1,3-PD at max v_biomass (mmol/hr) | 1,3-PD at min Σ(v_wt − v)² (mmol/hr) |
|---|---|---|---|---|---|
| Wild | Complete network | n/a | 1.06 | 0 | 0 |
| A | 1. FDP → F6P + Pi or FDP ↔ T3P1 + T3P2; 2. 13PDG + ADP ↔ 3PG + ATP or NAD + Pi + T3P1 ↔ 13PDG + NADH; 3. GL + NAD ↔ GLAL + NADH | 1. Fructose-1,6-bisphosphatase or fructose-1,6-bisphosphate aldolase; 2. Phosphoglycerate kinase or glyceraldehyde-3-phosphate dehydrogenase; 3. Aldehyde dehydrogenase | 0.21 | 9.66 | 8.66 |
| B | 1. T3P1 ↔ T3P2; 2. G6P + NADP ↔ D6PGL + NADPH or D6PGL → D6PGC; 3. DR5P → ACAL + T3P1; 4. GL + NAD ↔ GLAL + NADH | 1. Triosephosphate isomerase; 2. Glucose-6-phosphate 1-dehydrogenase or 6-phosphogluconolactonase; 3. Deoxyribose-phosphate aldolase; 4. Aldehyde dehydrogenase | 0.29 | 9.67 | 9.54 |

2.2 EXAMPLE 2 1,3-Propanediol (PDO) Production
In addition to devising optimal gene knockout strategies, OptKnock was used to design strains in which gene additions were needed along with gene deletions, such as in PDO
production in E. coli. Although microbial 1,3-propanediol (PDO) production methods have been developed utilizing glycerol as the primary carbon source (Hartlep et al., 2002; Zhu et al., 2002), the production of 1,3-propanediol directly from glucose in a single microorganism has recently attracted considerable interest (Cameron et al., 1998; Biebl et al., 1999; Zeng & Biebl, 2002). Because wild-type E. coli lacks the pathway necessary for PDO production, the gene addition framework (Burgard and Maranas, 2001) was first employed to identify the additional reactions needed for producing PDO from glucose in E. coli. The gene addition framework identified a straightforward three-reaction pathway involving the conversion of glycerol-3-P to glycerol by glycerol phosphatase, followed by the conversion of glycerol to 1,3-propanediol by glycerol dehydratase and 1,3-propanediol oxidoreductase. These reactions were then added to the E. coli stoichiometric model, and the OptKnock procedure was subsequently applied.
OptKnock revealed that there was neither a single nor a double deletion mutant with coupled PDO and biomass production. However, one triple and multiple quadruple knockout strategies that can couple PDO production with biomass production were identified. Two of these knockout strategies are shown in Table I. The results suggested that the removal of certain key functionalities from the E. coli network resulted in PDO-overproducing mutants for growth on glucose. Specifically, Table I reveals that the removal of two glycolytic reactions along with an additional knockout preventing the degradation of glycerol yields a network capable of reaching 72% of the theoretical maximum yield of PDO at the maximum biomass yield. Note that the glyceraldehyde-3-phosphate dehydrogenase (gapA) knockout was used by DuPont in their PDO-overproducing E. coli strain (Nakamura, 2002). Mutant B revealed an alternative strategy, involving the removal of the triose phosphate isomerase (tpi) enzyme, that exhibits a similar PDO yield and a 38% higher biomass yield. Interestingly, a yeast strain deficient in triose phosphate isomerase activity was recently reported to produce glycerol, a key precursor to PDO, at 80-90% of its maximum theoretical yield (Compagno et al., 1996).
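The percentages quoted above can be reproduced from the Table I entries with a few lines of Python. The sketch assumes the 10 mmol/hr glucose basis of Table I and the 1.34 mmol PDO per mmol glucose theoretical yield for the glycerol route given later in this section.

```python
GLUCOSE_BASIS = 10.0   # mmol/hr glucose per gDW, as in Table I
PDO_MAX_YIELD = 1.34   # mmol PDO per mmol glucose via the glycerol route

pdo_max_rate = PDO_MAX_YIELD * GLUCOSE_BASIS   # theoretical PDO rate on this basis
mutant_a = {"biomass": 0.21, "pdo": 9.66}      # Table I, 1,3-PD mutant A
mutant_b = {"biomass": 0.29, "pdo": 9.67}      # Table I, 1,3-PD mutant B

fraction_of_max = mutant_a["pdo"] / pdo_max_rate
biomass_gain = mutant_b["biomass"] / mutant_a["biomass"] - 1
print(f"mutant A reaches {fraction_of_max:.0%} of the theoretical PDO maximum")
print(f"mutant B has a {biomass_gain:.0%} higher biomass yield than mutant A")
```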
Review of the flux distributions of the wild-type E. coli, mutant A, and mutant B networks that maximize the biomass yield indicates that, not surprisingly, further conversion of glycerol to glyceraldehyde was disrupted in both mutants A and B. For mutant A, the removal of two reactions from the top and bottom parts of glycolysis resulted in a nearly complete inactivation of the pentose phosphate and glycolysis (with the exception of triose phosphate isomerase) pathways. To compensate, the Entner-Doudoroff glycolysis pathway is activated to channel flux from glucose to pyruvate and glyceraldehyde-3-phosphate (GAP). GAP is then converted to glycerol, which is subsequently converted to PDO. Energetic demands lost with the decrease in glycolytic fluxes relative to the wild-type E. coli network are now met by an increase in the TCA cycle fluxes. The knockouts suggested for mutant B redirect flux toward the production of PDO by a distinctly different mechanism. The removal of the initial pentose phosphate pathway reaction results in the complete flow of metabolic flux through the first steps of glycolysis. At the fructose bisphosphate aldolase junction, the flow is split into the two product metabolites: dihydroxyacetone phosphate (DHAP), which is converted to PDO, and GAP, which continues through the second half of glycolysis. The removal of the triose phosphate isomerase reaction prevents any interconversion between DHAP and GAP.
Interestingly, a fourth knockout is predicted to retain the coupling between biomass formation and chemical production. This knockout prevents the "leaking" of flux through a complex pathway involving 15 reactions that together convert ribose-5-phosphate (R5P) to acetate and GAP, which would otherwise decouple growth from chemical production.
Next, the envelope of allowable PDO production versus biomass yield is explored for the two mutants listed in Table I. The production limits of the mutants, along with those of the original E. coli network, reveal that the wild-type E. coli network has no "incentive" to produce PDO if the biomass yield is to be maximized. On the other hand, both mutants A and B have to produce significant amounts of PDO if any amount of biomass is to be formed, given the reduced functionalities of the network following the gene removals.
Mutant A, by avoiding the tpi knockout that essentially sets the ratio of biomass to PDO production, is characterized by a higher maximum theoretical yield of PDO. The results described above hinge on the use of glycerol as a key intermediate to PDO.
Next, the possibility of utilizing an alternative to the glycerol conversion route for 1,3-propanediol production was explored.
Applicants identified a pathway in Chloroflexus aurantiacus involving a two-step NADPH-dependent reduction of malonyl-CoA to generate 3-hydroxypropionic acid (3-HPA) (Menendez et al., 1999; Hugler et al., 2002). 3-HPA could then be converted chemically to 1,3-propanediol, given that there is no biological functionality to achieve this transformation. This pathway offers a key advantage over PDO production through the glycerol route because its initial step (acetyl-CoA carboxylase) is a carbon-fixing reaction. Accordingly, the maximum theoretical yield of 3-HPA (1.79 mmol/mmol glucose) is considerably higher than that of PDO production through the glycerol conversion route (1.34 mmol/mmol glucose). The application of the OptKnock framework upon the addition of the 3-HPA production pathway revealed that many more knockouts are required before biomass formation is coupled with 3-HPA production. One of the most interesting strategies involves nine knockouts yielding 3-HPA production at 91% of its theoretical maximum at optimal growth. The first three knockouts were relatively straightforward, as they involved removal of the competing acetate, lactate, and ethanol production mechanisms.
In addition, the Entner-Doudoroff pathway (either phosphogluconate dehydratase or 2-keto-3-deoxy-6-phosphogluconate aldolase), four respiration reactions (i.e., NADH dehydrogenase I, NADH dehydrogenase II, glycerol-3-phosphate dehydrogenase, and the succinate dehydrogenase complex), and an initial glycolysis step (i.e., phosphoglucose isomerase) are disrupted. This strategy resulted in a 3-HPA yield that, assuming the maximum biomass yield, is 69% higher than that of the previously identified mutants utilizing the glycerol conversion route.
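The 69% figure follows from the yields quoted above. The Python arithmetic below assumes the same 10 mmol/hr glucose basis as Table I and compares against the PDO rate of 1,3-PD mutant A (9.66 mmol/hr).

```python
GLUCOSE_BASIS = 10.0   # mmol/hr glucose per gDW, as in Table I
HPA_MAX_YIELD = 1.79   # mmol 3-HPA per mmol glucose (malonyl-CoA route)
HPA_FRACTION = 0.91    # nine-knockout strategy at optimal growth
PDO_MUTANT_A = 9.66    # mmol/hr PDO, Table I mutant A (glycerol route)

hpa_rate = HPA_FRACTION * HPA_MAX_YIELD * GLUCOSE_BASIS
improvement = hpa_rate / PDO_MUTANT_A - 1
print(f"3-HPA: {hpa_rate:.2f} mmol/hr, {improvement:.0%} above PDO mutant A")
```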
2.3 EXAMPLE 3 Alternative Cellular Objective: Minimization of Metabolic Adjustment
All results described previously were obtained by invoking the maximization of biomass yield as the cellular objective that drives flux allocation. This hypothesis essentially assumes that the metabolic network could arbitrarily change and/or even rewire regulatory loops to maintain biomass yield maximality under changing environmental conditions (maximal response). Recent evidence suggests that this is sometimes achieved by the K-12 strain of E. coli after multiple cycles of growth selection (Ibarra et al., 2002).
In this section, a contrasting hypothesis was examined (i.e., minimization of metabolic adjustment (MOMA) (Segre et al., 2002)) that assumes a myopic (minimal) response by the metabolic network upon gene deletions. Specifically, the MOMA hypothesis suggests that the metabolic network will attempt to remain as close as possible to the original steady state of the system, which the gene deletion(s) rendered unreachable. This hypothesis has been shown to provide a more accurate description of flux allocation immediately after a gene deletion event (Segre et al., 2002). For this study, the MOMA objective was utilized to predict the flux distributions in the mutant strains identified by OptKnock.
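For intuition about the MOMA objective, the Python sketch below solves the MOMA quadratic program in closed form for a hypothetical toy: an uptake of 10 units split among three fluxes $v_2$, $v_3$, $v_4$, with $v_4$ knocked out. With a single equality constraint and no active bounds, minimizing $\sum_j (v_j^{wt} - v_j)^2$ reduces to projecting the wild-type fluxes onto a hyperplane. Genome-scale MOMA needs a QP solver and the full set of flux bounds; the numbers here are invented.

```python
def moma_hyperplane(w, a, b):
    """Closest point (Euclidean) to w on the hyperplane a.v = b.

    This is the MOMA program min sum_j (w_j - v_j)^2 for the special case of
    a single linear equality and no active flux bounds.
    """
    aw = sum(ai * wi for ai, wi in zip(a, w))
    aa = sum(ai * ai for ai in a)
    step = (b - aw) / aa
    return [wi + step * ai for ai, wi in zip(a, w)]

# Hypothetical toy: a fixed uptake of 10 units is split as v2 + v3 + v4.
wild_type = [5.0, 3.0, 2.0]   # v2, v3, v4 in the undeleted network
# Knock out v4 (fix it at zero); v2 + v3 must now carry all 10 units.
v2, v3 = moma_hyperplane(wild_type[:2], [1.0, 1.0], 10.0)
mutant = [v2, v3, 0.0]
distance_sq = sum((w - v) ** 2 for w, v in zip(wild_type, mutant))
print(mutant, distance_sq)
```

The two surviving fluxes each absorb half of the 2-unit shortfall, so the mutant stays as close as possible to the wild-type distribution rather than shifting flux to whichever branch a growth-maximizing objective would favor.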
The base case for the lactate and succinate simulations was assumed to be maximum biomass formation under anaerobic conditions, while the base case for the PDO simulations was maximum biomass formation under aerobic conditions. The results are shown in the last column of Table I. In all cases, the suggested multiple-gene knockout strategies yield only slightly lower chemical production for the MOMA case compared to the maximum biomass hypothesis. This implies that the OptKnock results are fairly robust with respect to the choice of cellular objective.
3.0 Alternative Embodiments
The publications and other material used herein to illuminate the background of the invention or provide additional details respecting the practice are herein incorporated by reference in their entirety. The present invention contemplates numerous variations, including variations in organisms, variations in cellular objectives, variations in bioengineering objectives, and variations in the types of optimization problems formed and solutions used. These and/or other variations, modifications, or alterations may be made therein without departing from the spirit and the scope of the invention as set forth in the appended claims.

REFERENCES
Anthony, C. (1982) The Biochemistry of Methylotrophs (Academic Press).
Arita, M. (2000) Simulation Practice and Theory 8, 109-125.
Arita, M. (2004) Proc Natl Acad Sci U S A 101, 1543-7.
Badarinarayana, V., Estep, P.W., 3rd, Shendure, J., Edwards, J., Tavazoie, S., Lam, F., Church, G.M. (2001) Nat Biotechnol 19(11): 1060-5.
Bailey, J. E. (1991) Science 252, 1668-75.
Bailey, J. E. (2001) Nat Biotechnol 19, 503-4.
Bard, J. F. (1998) Practical Bilevel Optimization: Algorithms and Applications (Kluwer Academic, Dordrecht; Boston).
Biebl, H., Menzel, K., Zeng, A. P., Deckwer, W. D. (1999) Appl Environ Microbiol 52: 289-297.
Bond, D. R. & Lovley, D. R. (2003) Appl Environ Microbiol 69, 1548-55.
Bond, D. R., Holmes, D. E., Tender, L. M. & Lovley, D. R. (2002) Science 295, 483-5.
Brown, M. (1999) Perl programmer's reference (Osborne/McGraw-Hill, Berkeley, Calif.).
Burgard, A. P. & Maranas, C. D. (2001) Biotechnol Bioeng 74, 364-375.
Burgard, A. P., Pharkya, P. & Maranas, C. D. (2003) Biotechnol Bioeng 84, 647-57.
Burgard, A. P., Maranas, C. D. (2003) Biotechnol Bioeng 82(6): 670-7.
Burgard, A. P., Vaidyaraman, S., Maranas, C. D. (2001) Biotechnol Prog 17: 791-797.
Cameron, D. C., Altaras, N. E., Hoffman, M. L., Shaw, A. J. (1998) Biotechnol Prog 14(1): 116-25.
Castellanos, M., Wilson, D. B. & Shuler, M. L. (2004) Proc Natl Acad Sci U S A 101, 6681-6.
Causey, T. B., Shanmugam, K. T., Yomano, L. P. & Ingram, L. O. (2004) Proc Natl Acad Sci U S A 101, 2235-40.
Chin, H. L., Chen, Z. S. & Chou, C. P. (2003) Biotechnol Prog 19, 383-8.
Chistoserdova, L., Laukel, M., Portais, J. C., Vorholt, J. A. & Lidstrom, M. E. (2004) J Bacteriol 186, 22-8.
Chistoserdova, L., Vorholt, J. A., Thauer, R. K. & Lidstrom, M. E. (1998) Science 281, 99-102.
Compagno, C., Boschi, F., Ranzi, B. M. (1996) Biotechnol Prog 12(5): 591-5.
Covert, M.W., Palsson, B.O. (2002) J Biol Chem 277(31): 28058-64.
Covert, M. W., Schilling, C. H., & Palsson, B. O. (2001) J Theor Biol 213(1): 73-88.
Das, D. & Veziroglu, T. N. (2001) International Journal of Hydrogen Energy 26, 13-28.
Datta, R., Tsai, S., Bonsignore, P., Moon, S., Frank, J. R. (1995) FEMS Microbiol Rev 16: 221-231.
David, H., Akesson, M. & Nielsen, J. (2003) Eur J Biochem 270, 4243-53.
Desai, R. P., Nielsen, L. K. & Papoutsakis, E. T. (1999) J Biotechnol 71, 191-205.
Edwards, J. S. & Palsson, B. O. (2000) Proc Natl Acad Sci U S A 97, 5528-33.
Edwards, J. S., Ibarra, R. U., Palsson, B. O. (2001) Nat Biotechnol 19(2): 125-30.
Ellis, L. B., Hou, B. K., Kang, W. & Wackett, L. P. (2003) Nucleic Acids Res 31, 262-5.
Eppstein, D. (1994) in 35th IEEE Symp. Foundations of Comp. Sci. (Santa Fe), pp. 154-165.
Fan, L. T., Bertok, B. & Friedler, F. (2002) Comput Chem 26, 265-92.
Finneran, K. T., Housewright, M. E. & Lovley, D. R. (2002) Environ Microbiol 4, 510-6.
Forster, J., Famili, I., Fu, P., Palsson, B. O. & Nielsen, J. (2003) Genome Res 13, 244-53.
Gupta, S., Clark, D. P. (1989) J Bacteriol 171(7): 3650-5.
Hartlep, M., Hussmann, W., Prayitno, N., Meynial-Salles, I., Zeng, A. P. (2002) Appl Microbiol Biotechnol 60(1-2): 60-6.
Hatzimanikatis, V., Emmerling, M., Sauer, U., Bailey, J. E. (1998) Biotechnol Bioeng 58(2-3): 154-61.
Hatzimanikatis, V., Li, C., Ionita, J. A. & Broadbelt, L. J. (2003) presented at Biochemical Engineering XIII Conference; Session 2, Boulder, CO.
Heinrich, R., Rapoport, T. A. (1974) Eur. J. Biochem. 41: 89-95.
Hugler, M., Menendez, C., Schagger, H., Fuchs, G. (2002) J Bacteriol 184(9): 2404-10.
Ibarra, R. U., Edwards, J. S., Palsson, B. O. (2002) Nature 420(6912): 186-9.
Ignizio, J. P., Cavalier, T. M. (1994) Linear Programming (Prentice Hall, Englewood Cliffs, N.J.).
Kacser, H., Burns, J. A. (1973) Symp. Soc. Exp. Biol. 27: 65-104.
Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y. & Hattori, M. (2004) Nucleic Acids Res 32 Database issue, D277-80.
Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Collado-Vides, J., Paley, S. M., Pellegrini-Toole, A., Bonavides, C. & Gama-Castro, S. (2002) Nucleic Acids Res 30, 56-8.
Kataoka, N., Miya, A. & Kiriyama, K. (1997) Wat. Sci. Tech. 36, 41-47.
Kompala, D. S., Ramkrishna, D., Tsao, G. T. (1984) Biotechnol Bioeng 26(11): 1281.
Korotkova, N., Chistoserdova, L. & Lidstrom, M. E. (2002) J Bacteriol 184, 6174-81.
Krieger, C. J., Zhang, P., Mueller, L. A., Wang, A., Paley, S., Arnaud, M., Pick, J., Rhee, S. Y. & Karp, P. D. (2004) Nucleic Acids Res 32 Database issue, D438-42.
Li, K. & Frost, J. W. (1998) Journal of American Chemical Society 120, 10545-10546.
Liu, H., Ramnarayanan, R. & Logan, B. E. (2004) Environmental Science and Technology 38, 2281-2285.
Lovley, D. R. (2003) Nat Rev Microbiol 1, 35-44.
Majewski, R. A., Domach, M. M. (1990) Biotechnol Bioeng 35: 732-738.
Mavrovouniotis, M., Stephanopoulos, G. & Stephanopoulos, G. (1990) Biotechnol Bioeng 36, 1119-1132.
McShan, D. C., Rao, S. & Shah, I. (2003) Bioinformatics 19, 1692-8.
Menendez, C., Bauer, Z., Huber, H., Gad'on, N., Stetter, K. O., Fuchs, G. (1999) J Bacteriol 181(4): 1088-98.
Methe, B. A., Nelson, K. E., Eisen, J. A., Paulsen, I. T., Nelson, W., Heidelberg, J. F., Wu, D., Wu, M., Ward, N., Beanan, M. J., et al. (2003) Science 302, 1967-9.
Misawa, N., Yamano, S. & Ikenaga, H. (1991) Appl Environ Microbiol 57, 1847-9.
Nakamura, C. E. & Whited, G. M. (2003) Curr Opin Biotechnol 14, 454-9.
Nakamura, C. E. (2002) Production of 1,3-Propanediol by E. coli. Presented at Metab Eng IV Conf, Tuscany, Italy.
Nandi, R. & Sengupta, S. (1996) Enzyme and Microbial Technology 19, 20-25.
Nandi, R. & Sengupta, S. (1998) Crit Rev Microbiol 24, 61-84.
Neidhardt, F. C., Curtiss, R. (1996) Escherichia coli and Salmonella: Cellular and Molecular Biology (ASM Press, Washington, D.C.).
Overbeek, R., Larsen, N., Pusch, G. D., D'Souza, M., Selkov, E., Jr., Kyrpides, N., Fonstein, M., Maltsev, N. & Selkov, E. (2000) Nucleic Acids Res 28, 123-5.
Papin, J. A., Price, N. D., Wiback, S. J., Fell, D. A., Palsson, B. (2003) Metabolic Pathways in the Post-Genome Era. Trends Biochem Sci, accepted.
Papoutsakis, E. & Meyer, C. (1985) Biotechnol Bioeng 27, 50-66.
Papoutsakis, E. (1984) Biotechnol Bioeng 26, 174-187.
Pharkya, P., Burgard, A. P. & Maranas, C. D. (2003) Biotechnol Bioeng 84, 887-99.
Price, N. D., Papin, J. A., Schilling, C. H., Palsson, B. (2003) Genome-scale Microbial In Silico Models: The Constraints-Based Approach. Trends Biotechnol, accepted.
Ramakrishna, R., Edwards, J. S., McCulloch, A., Palsson, B. O. (2001) Am J Physiol Regul Integr Comp Physiol 280(3): R695-704.
Ramakrishna, R., Ramakrishna, D., Konopka, A. E. (1996) Biotechnol Bioeng 52: 151.
Reed, J. L., Vo, T. D., Schilling, C. H. & Palsson, B. O. (2003) Genome Biol 4, R54.
Schilling, C. H., Covert, M. W., Famili, I., Church, G. M., Edwards, J. S. & Palsson, B. O. (2002) J Bacteriol 184, 4582-93.
Schilling, C. H., Palsson, B. O. (2000) J Theor Biol 203(3): 249-83.
Segre, D., Vitkup, D., Church, G. M. (2002) Proc Natl Acad Sci U S A 99(23): 15112-7.
Segre, D., Zucker, J., Katz, J., Lin, X., D'Haeseleer, P., Rindone, W. P., Kharchenko, P., Nguyen, D. H., Wright, M. A. & Church, G. M. (2003) Omics 7, 301-16.
Selkov, E., Jr., Grechkin, Y., Mikhailova, N. & Selkov, E. (1998) Nucleic Acids Res 26, 43-5.
Seressiotis, A. & Bailey, J. E. (1988) Biotechnol Bioeng 31, 587-602.
Stephanopoulos, G., Aristidou, A. A., Nielsen, J. (1998) Metabolic Engineering: Principles and Methodologies (Academic Press, San Diego).
Stephanopoulos, G. & Sinskey, A. J. (1993) Trends Biotechnol 11, 392-6.
Stols, L., Donnelly, M. I. (1997) Appl Environ Microbiol 63(7): 2695-701.
Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T. S., Matsuzaki, Y., Miyoshi, F., Saito, K., Tanida, S., Yugi, K., Venter, J. C., et al. (1999) Bioinformatics 15, 72-84.
Valdes, J., Veloso, F., Jedlicki, E. & Holmes, D. (2003) BMC Genomics 4, 51.
Van Dien, S. J. & Lidstrom, M. E. (2002) Biotechnol Bioeng 78, 296-312.
Van Dien, S. J., Strovas, T. & Lidstrom, M. E. (2003) Biotechnol Bioeng 84, 45-55.
Varma, A., Boesch, B. W., Palsson, B. O. (1993) Appl Environ Microbiol 59(8): 2465-73.
Varma, A., Palsson, B. O. (1993) J. Theor. Biol. 165: 503-522.
Varma, A. & Palsson, B. O. (1994) Bio/Technology 12, 994-998.
Varner, J., Ramkrishna, D. (1999) Biotechnol Prog 15(3): 407-25.
Varner, J. & Ramkrishna, D. (1999) Curr Opin Biotechnol 10, 146-150.
Zeikus, J. G., Jain, M. K., Elankovan, P. (1999) Appl Microbiol Biotechnol 51: 545-552.
Zeng, A. P., Biebl, H. (2002) Adv Biochem Eng Biotechnol 74: 239-59.
Zhu, M. M., Lawman, P. D., Cameron, D. C. (2002) Biotechnol Prog 18(4): 694-9.
The recent availability of genome-scale models of microbial organisms has provided the pathway reconstructions necessary for developing computational methods aimed at identifying strain engineering strategies (Bailey, 2001). These models, already available for H. pylori (Schilling et al., 2002), E. coli (Reed et al., 2003; Edwards & Palsson, 2000), S. cerevisiae (Forster et al., 2003) and other microorganisms (David et al., 2003; Van Dien & Lidstrom, 2002; Valdes et al., 2003), provide successively refined abstractions of microbial metabolic capabilities. An automated process to expedite the construction of stoichiometric models from annotated genomes (Segre et al., 2003) promises to further accelerate the metabolic reconstruction of several microbial organisms. At the same time, individual reactions are deposited in databases such as KEGG, EMP, MetaCyc, UM-BBD, and many more (Overbeek et al., 2000; Selkov et al., 1998; Kanehisa et al., 2004; Krieger et al., 2004; Ellis et al., 2003; Karp et al., 2002), forming large and growing collections of the biotransformations for which there is direct or indirect evidence of existence in different species. Many thousands of such reactions have already been deposited; however, unlike organism-specific metabolic reconstructions (Schilling et al., 2002; Reed et al., 2003; Edwards & Palsson, 2000; Forster et al., 2003), these compilations include reactions from many different species rather than a single one, in a largely uncurated fashion. Thus there currently exists an ever-expanding collection of microbial models and, at the same time, ever more encompassing compilations of non-native functionalities. This newly acquired plethora of data has brought to the forefront a number of computational and modeling challenges, which form the scope of this article. Specifically, how can we systematically select, from the thousands of functionalities catalogued in various biological databases, the appropriate set of pathways/genes to recombine into existing production systems such as E. coli so as to endow them with the desired new functionalities? Subsequently, how can we identify which competing functionalities to eliminate to ensure high product yield as well as viability?
Existing strategies and methods for accomplishing this goal include database queries to explore all feasible bioconversion routes from a substrate to a target compound from a given list of biochemical transformations (Seressiotis & Bailey, 1988; Mavrovouniotis et al., 1990). More recently, elegant graph-theoretic concepts (e.g., P-graphs (Fan et al., 2002) and the k-shortest paths algorithm (Eppstein, 1994)) were pioneered to identify novel biotransformation pathways based on the tracing of atoms (Arita, 2000; Arita, 2004), enzyme function rules, and thermodynamic feasibility constraints (Hatzimanikatis et al., 2003). An interesting heuristic search approach that uses the enzymatic biochemical reactions found in the KEGG database (Kanehisa et al., 2004) to construct a connected graph linking the substrate and product metabolites was also recently proposed (McShan et al., 2003). Most of these approaches, however, generate linear paths that link substrates to final products without ensuring that the rest of the metabolic network is balanced and that metabolic imperatives on cofactor usage/generation and energy balances are met.
The present invention provides a hierarchical optimization-based framework, OptStrain, to identify stoichiometrically balanced pathways to be generated upon recombination of non-native functionalities into a host organism to confer the desired phenotype. Candidate metabolic pathways are identified from an ever-expanding array of thousands (currently 5,734) of reactions pooled together from different stoichiometric models and publicly available databases such as KEGG (Kanehisa et al., 2004).
Note that the identified pathways satisfy maximum yield considerations, while the choice of substrates can be treated as optimization variables. Important information pertaining to the cofactor/energy requirements associated with each pathway is also deduced, enabling the comparison of candidate pathways with respect to the aforementioned criteria.
Production host selection is examined by successively minimizing the reliance on heterologous genes while satisfying the performance targets identified above. A gene set that encodes all the enzymes needed to catalyze the identified non-native functionalities can then be constructed, accounting for isozymes and multi-subunit enzymes. Subsequently, gene deletions are identified (Burgard et al., 2003; Pharkya et al., 2003) in the augmented host networks to improve product yields by removing competing functionalities that decouple biochemical production and growth objectives. The breadth and scope of OptStrain are demonstrated by addressing in detail two different product molecules (i.e., hydrogen and vanillin), which lie at the two extremes in terms of product-molecule size.
Briefly, the computational results in some cases match existing strain designs and production practices, whereas in others they pinpoint novel engineering strategies.
1.1 Materials and Methods
The first challenge addressed is to develop a systematic computational framework to identify which functionalities to add to the organism-specific metabolic network (e.g., E. coli (Reed et al., 2003; Edwards & Palsson, 2000), S. cerevisiae (Forster et al., 2003), C. acetobutylicum (Desai et al., 1999; Papoutsakis, 1984), etc.) to enable the desired biotransformation. The present inventors have previously contributed towards this objective at a much smaller scale (Burgard & Maranas, 2001). Because of the extremely large size of the compiled database and the presence of multiple, sometimes conflicting objectives that must be satisfied simultaneously, we developed the OptStrain procedure illustrated in Figure 1. Each step introduces different computational challenges arising from the specific structure and size of the optimization problems that need to be solved.
Step 1. Automated downloading and curation of the reactions in our Universal database to ensure stoichiometric balance;
Step 2. Calculation of the maximum theoretical yield of the product given a substrate choice without restrictions on the reaction origin (i.e., native or non-native);
Step 3. Identification of a stoichiometrically-balanced pathway(s) that minimizes the number of non-native functionalities in the examined production host given the maximum theoretical yield and the optimum substrate(s) found in Step 2. Alternative pathways that meet both criteria of maximum yield and minimum number of heterologous genes are generated along with comparisons between different host choices. Information pertaining to the cofactor/energy usage associated with each pathway is also derived at this stage.
Finally, one or multiple gene sets can be derived at this stage that ensure the presence of the targeted biotransformations by encoding for the appropriate enzymes;
Step 4. Incorporation of the identified non-native biotransformations into the stoichiometric models, if available, of the examined microbial production hosts. The OptKnock framework (Burgard et al., 2003; Pharkya et al., 2003) is then applied to these augmented models to suggest gene deletions that make production of the desired product an obligatory byproduct of growth by "shaping" the connectivity of the metabolic network. The OptKnock framework is further described herein.
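Step 3 notes that gene sets encoding the identified non-native reactions must account for isozymes and multi-subunit enzymes. One common way to represent this, assumed here, is an OR over isozymes where each isozyme is an AND over its subunit genes. The sketch below enumerates every gene set that covers a list of required reactions under that encoding and keeps the inclusion-minimal ones; the mapping and gene names are hypothetical placeholders, not data from this document.

```python
from itertools import product


def candidate_gene_sets(required, gpr):
    """Enumerate inclusion-minimal gene sets that encode every required reaction.

    gpr maps a reaction to a list of isozymes; each isozyme is a set of subunit
    genes that must all be present (a multi-subunit enzyme). Choosing one
    isozyme per reaction and taking the union gives one candidate gene set.
    """
    options = [gpr[r] for r in required]
    sets = {frozenset().union(*choice) for choice in product(*options)}
    # Drop any candidate that strictly contains another candidate.
    minimal = [s for s in sets if not any(other < s for other in sets)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


# Hypothetical gene-protein-reaction associations.
example_gpr = {
    "rxn1": [{"geneA"}, {"geneB"}],            # two isozymes
    "rxn2": [{"geneC", "geneD"}],              # one two-subunit enzyme
    "rxn3": [{"geneA", "geneE"}, {"geneF"}],   # isozymes of different sizes
}
for gene_set in candidate_gene_sets(["rxn1", "rxn2", "rxn3"], example_gpr):
    print(sorted(gene_set))
```

Brute-force enumeration grows multiplicatively with the number of isozymes per reaction, which is acceptable for the handful of non-native reactions typically identified in Step 3 but would need pruning for larger sets.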
Curation of the database. The first step of the OptStrain procedure begins with the downloading and curation of reactions acquired from various sources into our Universal database. Specifically, because new reactions are incorporated into the KEGG database on a monthly basis, we have developed customized scripts using Perl (Brown, 1999) to automatically download all reactions in the database on a regular basis. A different script is then used to parse the number of atoms of each element in every compound. The number of atoms of each type among the reactants and products of every reaction is calculated, and reactions that are elementally unbalanced are excluded from consideration. In addition, compounds with an unspecified number of repeat units (e.g., trans-2-Enoyl-CoA, represented by C25H39N7O17P3S(CH2)n) or unspecified alkyl groups R in their chemical formulae are removed from the downloaded sets. This step enables the automated downloading of functionalities present in genomic databases and the subsequent verification of their elemental balance, forming large-scale sets of functionalities to be used as recombination targets.
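The elemental balance check described above can be sketched as follows. This is a Python illustration, not the Perl scripts the document refers to; the regular expressions, function names, and the `ignore` parameter (useful for reaction sets balanced in all elements but hydrogen, as in the vanillin case study) are assumptions. Formulas containing parentheses or a bare R symbol are rejected, mirroring the exclusion of repeat units and unspecified alkyl groups.

```python
import re
from collections import Counter

# One element symbol followed by an optional count, e.g. "C6", "H", "Fe2".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
# A formula made only of such tokens (no parentheses, repeat units, or charges).
PLAIN_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(formula):
    """Return element counts, or None if the formula cannot be balanced as written."""
    if not PLAIN_FORMULA.fullmatch(formula):
        return None  # e.g. unspecified repeat units such as "(CH2)n"
    counts = Counter()
    for symbol, num in TOKEN.findall(formula):
        if symbol == "R":
            return None  # unspecified alkyl group
        counts[symbol] += int(num) if num else 1
    return counts


def side_totals(side):
    """Sum element counts over (coefficient, formula) pairs; None if any formula is unusable."""
    totals = Counter()
    for coeff, formula in side:
        atoms = parse_formula(formula)
        if atoms is None:
            return None
        for element, n in atoms.items():
            totals[element] += coeff * n
    return totals


def is_elementally_balanced(reactants, products, ignore=()):
    """True if every element not listed in `ignore` balances across the reaction."""
    left, right = side_totals(reactants), side_totals(products)
    if left is None or right is None:
        return False
    elements = (set(left) | set(right)) - set(ignore)
    return all(left[e] == right[e] for e in elements)


# Glucose -> 2 ethanol + 2 CO2 is elementally balanced.
print(is_elementally_balanced([(1, "C6H12O6")], [(2, "C2H6O"), (2, "CO2")]))  # True
```

Treating an unparseable formula as "unbalanced" means such reactions are dropped from the candidate set, which matches the exclusion policy described in the paragraph above.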
The present invention contemplates that any number of particular methods can be used to automate the downloading and/or curation of reactions. These automated functions can be performed in any number of ways depending upon the resources available, the type of access to the resources, and other factors related to the specific environment or context in which the present invention is implemented.
Determination of the maximum yield. Once the reaction sets are determined, the second step is geared towards determining the maximum theoretical yield of the target product from a range of substrate choices, without restrictions on the number or origin of the reactions used. The maximum theoretical product yield is obtained for a unit uptake rate of substrate by maximizing the sum of all reaction fluxes producing minus those consuming the target metabolite, weighted by the stoichiometric coefficient of the target metabolite in these reactions. Maximizing this yield subject to stoichiometric constraints and transport conditions gives a Linear Programming (LP) problem (see Supporting Information for the mathematical formulation) of the kind often encountered in Flux Balance Analysis frameworks (Varma & Palsson, 1994). Given the computational tractability of LP problems, even for many thousands of reactions, a large number of different substrate choices can be explored thoroughly here.
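For orientation, a generic flux-balance form of this maximum-yield LP is sketched below. It assumes that $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$, that $p$ indexes the target product, and that substrate uptake is fixed at one unit; the document's own formulation is in its Supporting Information and may differ in detail. The objective $\sum_j S_{pj} v_j$ is exactly the weighted sum of producing minus consuming fluxes described above, since $S_{pj}$ is positive for reactions that produce $p$ and negative for those that consume it.

```latex
\begin{aligned}
\max_{v}\quad & \sum_{j} S_{pj}\, v_j
  && \text{(net formation of target metabolite } p\text{)} \\
\text{s.t.}\quad & \sum_{j} S_{ij}\, v_j = 0
  && \forall\, i \neq p \\
& v_{\text{uptake}} = 1
  && \text{(unit substrate uptake)} \\
& v_j^{\min} \le v_j \le v_j^{\max}
  && \forall\, j
\end{aligned}
```

Re-solving with a different reaction fixed as $v_{\text{uptake}}$ is what allows many substrate choices to be screened cheaply.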
Although in this specific embodiment the bioengineering objective relates to maximizing production, the present invention contemplates that other bioengineering objectives can be used. In such instances, instead of determining or selecting a maximum yield, a separate and appropriate objective or constraint can be used.
Identification of the minimum number of heterologous reactions for a host organism. The next step in OptStrain uses the knowledge of the maximum theoretical yield to determine the minimum number of non-native functionalities that need to be added into a specific host organism network. Mathematically, this is achieved by first introducing a set of binary variables $y_j$ that serve as switches to turn the associated reaction fluxes $v_j$ on or off:

$$v_j^{\min}\, y_j \le v_j \le v_j^{\max}\, y_j$$

Note that the binary variable $y_j$ assumes a value of one if reaction $j$ is active and a value of zero if it is inactive. This constraint is imposed only on reactions associated with genes heterologous to the specified production host. The parameters $v_j^{\min}$ and $v_j^{\max}$ are calculated by minimizing and maximizing every reaction flux $v_j$ subject to the stoichiometry of the metabolic network (Burgard & Maranas, 2001). This leads to a Mixed Integer Linear Programming (MILP) model for finding the minimum number of genes to be added into the host organism network while meeting the yield target for the desired product. This formulation, discussed in greater detail later herein, enables the exploration of tradeoffs between the required number of heterologous genes and the maximum theoretical product yield, as well as the iterative identification of all alternate optimal solutions. The end result of this step is a set of distinct pathways and corresponding gene complements that provide a ranked list of all alternatives for the efficient conversion of the substrate(s) into the desired product.
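Combining the on/off switch constraint with the yield requirement gives an MILP of the general shape below. This is a sketch under assumed notation: $\mathcal{H}$ is the set of heterologous reactions, $S_{ij}$ and $p$ are the stoichiometric coefficients and target product, and $Y^{\text{target}}$ is the yield requirement carried over from Step 2. The integer cut that excludes a previously found solution $\mathcal{H}^{(k)}$ (the set of heterologous reactions active in that solution) is the standard device for enumerating alternate optima and is assumed here rather than quoted from the document.

```latex
\begin{aligned}
\min_{v,\,y}\quad & \sum_{j \in \mathcal{H}} y_j \\
\text{s.t.}\quad & \sum_{j} S_{ij}\, v_j = 0 && \forall\, i \neq p \\
& \sum_{j} S_{pj}\, v_j \ge Y^{\text{target}}, \qquad v_{\text{uptake}} = 1 \\
& v_j^{\min}\, y_j \le v_j \le v_j^{\max}\, y_j && \forall\, j \in \mathcal{H} \\
& v_j^{\min} \le v_j \le v_j^{\max} && \forall\, j \notin \mathcal{H} \\
& \sum_{j \in \mathcal{H}^{(k)}} y_j \le \lvert \mathcal{H}^{(k)} \rvert - 1 && \text{for each previous solution } k \\
& y_j \in \{0, 1\} && \forall\, j \in \mathcal{H}
\end{aligned}
```

Lowering $Y^{\text{target}}$ below the theoretical maximum and re-solving traces out the tradeoff between product yield and the number of heterologous genes.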
Incorporating the non-native reactions into the host organism's stoichiometric model. Upon identification of the appropriate host organism, the analysis proceeds with an organism-specific stoichiometric model augmented by the set of identified non-native reactions. However, simply adding genes to a microbial production strain will not necessarily lead to the desired overproduction, because microbial metabolism is primed to be as responsive as possible to the imposed selection pressures (e.g., to outgrow its competition). These survival objectives are typically in direct competition with the overproduction of targeted biochemicals. To combat this, we use our previously developed bilevel computational framework, OptKnock (Burgard et al., 2003; Pharkya et al., 2003), to eliminate all those functionalities that uncouple the cellular fitness objective, typically exemplified as the biomass yield, from the maximum yield of the product of interest.
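The bilevel structure of OptKnock, as published by Burgard et al. (2003), can be summarized as below. The notation is adapted for this sketch ($y_j$ are reaction switches, $K$ is the maximum number of deletions, and exchange reactions are included among the $v_j$ so that every metabolite is balanced), and details of the original formulation may differ.

```latex
\begin{aligned}
\max_{y}\quad & v_{\text{chemical}} \\
\text{s.t.}\quad & \left[
\begin{aligned}
\max_{v}\quad & v_{\text{biomass}} \\
\text{s.t.}\quad & \textstyle\sum_{j} S_{ij}\, v_j = 0 \quad \forall\, i \\
& v_{\text{substrate}} = v_{\text{substrate}}^{\text{uptake}} \\
& v_{\text{atp}} \ge v_{\text{atp}}^{\text{main}} \\
& v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}} \\
& v_j^{\min}\, y_j \le v_j \le v_j^{\max}\, y_j \quad \forall\, j
\end{aligned}
\right] \\
& \textstyle\sum_{j} (1 - y_j) \le K, \qquad y_j \in \{0, 1\} \quad \forall\, j
\end{aligned}
```

The outer problem chooses which reactions to switch off; the inner problem represents the cell's assumed growth-maximizing response, so product formation is credited only if it persists under that response, which is what couples production to growth.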
1.2 Results
Computational results for microbial strain optimization focused on the production of hydrogen and vanillin. One skilled in the art having the benefit of this disclosure would understand that the present invention is in no way limited to these particular bioengineering objectives, which are merely illustrative of the present invention. The hydrogen production case study underscores the importance of investigating multiple substrates and microbial hosts to pinpoint the optimal production environment, as well as the need to eliminate competing functionalities. In contrast, in the vanillin study, identifying the smallest number of non-native reactions is found to be the key challenge for strain design. A common database of reactions, as outlined in Step 1, was constructed for both examples by pooling together metabolic pathways from the methylotroph Methylobacterium extorquens AM1 (Van Dien & Lidstrom, 2002) and the KEGG database of reactions (Kanehisa et al., 2004).
1.2.1 Hydrogen Production Case Study
An efficient microbial hydrogen production strategy requires the selection of an optimal substrate and a microbial strain capable of forming hydrogen at high rates. First, we solved the maximum yield LP formulation (Step 2) using as recombination candidates all catalogued reactions that were balanced with respect to hydrogen, oxygen, nitrogen, sulfur, phosphorus, and carbon (approximately 3,000 reactions). Note that OptStrain allowed for different substrate choices, such as pentose and hexose sugars as well as acetate, lactate, malate, glycerol, pyruvate, succinate, and methanol. The highest hydrogen yield, 0.126 g/g substrate consumed, was obtained for a methanol substrate. This is not surprising given that methanol has the highest hydrogen-to-carbon ratio, at four to one. A comparison of the yields for some of the more efficient substrates is shown in Figure 2.
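The hydrogen-to-carbon argument above can be checked quickly across the substrates named in this paragraph. The molecular formulas are assumptions for illustration (neutral acid forms for acetate, lactate, malate, pyruvate, and succinate; xylose and glucose as representative pentose and hexose), and the ratio is only a heuristic, not a substitute for the LP yield calculation.

```python
import re
from fractions import Fraction

# Assumed representative formulas for the substrates listed in the text.
SUBSTRATES = {
    "xylose (pentose)": "C5H10O5",
    "glucose (hexose)": "C6H12O6",
    "acetic acid": "C2H4O2",
    "lactic acid": "C3H6O3",
    "malic acid": "C4H6O5",
    "glycerol": "C3H8O3",
    "pyruvic acid": "C3H4O3",
    "succinic acid": "C4H6O4",
    "methanol": "CH4O",
}


def atom_count(formula, element):
    """Count atoms of `element` in a simple formula such as 'C6H12O6'."""
    total = 0
    for symbol, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol == element:
            total += int(num) if num else 1
    return total


def h_to_c_ratio(formula):
    """Exact hydrogen-to-carbon atom ratio."""
    return Fraction(atom_count(formula, "H"), atom_count(formula, "C"))


ranking = sorted(SUBSTRATES, key=lambda name: h_to_c_ratio(SUBSTRATES[name]), reverse=True)
for name in ranking:
    print(f"{name:18s} H:C = {float(h_to_c_ratio(SUBSTRATES[name])):.2f}")
```

Methanol ranks first at 4.00, followed by glycerol at about 2.67, consistent with the statement in the text.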
We decided to explore methanol and glucose further, motivated by the high yield on methanol and the favorable costs associated with the use of glucose.
The next step in the OptStrain procedure entailed the determination of the minimum number of non-native functionalities for achieving the theoretical maximum yield in a host organism. We examined three different uptake scenarios: (i) glucose as the substrate in Escherichia coli (an established production system), (ii) glucose in Clostridium acetobutylicum (a known hydrogen producer), and (iii) methanol in Methylobacterium extorquens (a known methanol consumer).
1.2.1.1 Escherichia coli
The MILP framework (described in Step 3) correctly verified that, with glucose as the substrate, no non-native functionalities were required by E. coli for hydrogen production. Interestingly, hydrogen production was possible either through the ferredoxin hydrogenase reaction (E.C.# 1.12.7.2), which reduces protons to form hydrogen, or via the hydrogen dehydrogenase reaction (E.C.# 1.12.1.2), which converts NADH into NAD+ while forming hydrogen through proton association. Subsequently, the upper and lower limits of hydrogen formation were explored for the E. coli stoichiometric model (Reed et al., 2003) as a function of biomass formation rate (i.e., growth rate) under both aerobic and anaerobic conditions, for a basis glucose uptake rate of 10 mmol/gDW/hr (see Figure 3).
Notably, the maximum theoretical hydrogen yield is higher under aerobic conditions. However, only under anaerobic conditions is hydrogen formed at maximum growth (see point A in Fig. 3), leading to a growth-coupled production mode. Note that hydrogen production takes place through the formate hydrogen lyase reaction, which converts formate into hydrogen and carbon dioxide under anaerobic conditions, in agreement with current experimental observations (Nandi & Sengupta, 1998).
Moving to phenotype restriction to curtail byproduct formation (Step 4), we explored whether the production of hydrogen in the wild-type E. coli network (Reed et al., 2003) could be enhanced by removing functionalities from the network that were in direct or indirect competition with hydrogen production. To this end, we employed the OptKnock framework (Burgard et al., 2003; Pharkya et al., 2003) to pinpoint gene deletion strategies that couple hydrogen production with growth. Here we highlight two of the identified strategies. The first (a double deletion) removes both enolase (E.C.# 4.2.1.11) and glucose 6-phosphate dehydrogenase (E.C.# 1.1.1.49). The removal of the enolase reaction strongly promotes hydrogen formation by directing the glycolytic flux at the 3-phosphoglycerate branch point into the serine biosynthesis pathway.
Subsequently, serine participates in a series of one-carbon metabolism reactions to form formyltetrahydrofolate, which is eventually converted to formate and tetrahydrofolate. The elimination of the glucose 6-phosphate dehydrogenase reaction prevents the shunting of any glucose 6-phosphate flux into the pentose phosphate pathway. The second strategy, a three-reaction deletion, involves the removal of ATP synthase (E.C.# 3.6.3.14), alpha-ketoglutarate dehydrogenase, and acetate kinase (E.C.# 2.7.2.1). The removal of the first reaction enhances proton availability, whereas the other two deletions ensure that maximum carbon flux is directed towards pyruvate, which is then converted into formate through pyruvate formate lyase. Formate is catabolized into hydrogen and carbon dioxide through formate hydrogen lyase.
A comparison of the hydrogen production limits as a function of growth rate for both the wild-type and mutant networks is shown in Figure 3. The transport rates of carbon dioxide for the mutant networks were fixed at the values suggested by OptKnock, thus setting the operational imperatives (Pharkya et al., 2003). Note that while the two-reaction deletion mutant has a theoretical hydrogen production rate of 22.7 mmol/gDW/hr (0.025 g/g glucose) at the maximum growth rate (Point B), the three-reaction deletion mutant produces a maximum of 29.5 mmol/gDW/hr (0.033 g/g glucose) (Point C) at the expense of a reduced maximum growth rate. Interestingly, in both mutant networks, maximum hydrogen production requires the uptake of oxygen. This is in contrast to the wild-type case where the lack of oxygen was preferred for hydrogen formation. Notably, it has been reported (Nandi & Sengupta, 1996) that although formate hydrogen lyase can only be induced in the absence of oxygen, it can function in aerobic environments.
This will have to be accounted for in any experimental study conducted on the basis of these results.
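The g/g figures quoted alongside the specific rates above follow from dividing the hydrogen rate by the 10 mmol/gDW/hr glucose uptake basis and converting with molar masses. A minimal check, assuming molar masses of 2.016 g/mol for H2 and 180.156 g/mol for glucose (these constants and the function name are illustrative choices, not values stated in the document):

```python
MW_H2 = 2.016         # g/mol (assumed)
MW_GLUCOSE = 180.156  # g/mol (assumed)


def mass_yield(product_rate, substrate_rate, mw_product, mw_substrate):
    """Convert two specific rates (mmol/gDW/hr) into a mass yield in g product per g substrate."""
    molar_yield = product_rate / substrate_rate  # mol product per mol substrate
    return molar_yield * mw_product / mw_substrate


for label, h2_rate in [("double deletion (point B)", 22.7), ("triple deletion (point C)", 29.5)]:
    y = mass_yield(h2_rate, 10.0, MW_H2, MW_GLUCOSE)
    print(f"{label}: {h2_rate} mmol/gDW/hr -> {y:.3f} g H2/g glucose")
# prints 0.025 for point B and 0.033 for point C
```

Both values agree with those reported for points B and C in Figure 3.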
1.2.1.2 Clostridium acetobutylicum
Ample literature evidence has identified Clostridium species as natural hydrogen production systems (Nandi & Sengupta, 1998; Kataoka et al., 1997; Chin et al., 2003; Das & Veziroglu, 2001). The reduction of protons into hydrogen through ferredoxin hydrogenase (E.C.# 1.12.7.2) is the key associated reaction. Not surprisingly, using OptStrain (Step 3), we verified that no non-native reactions were required for hydrogen production (Papoutsakis & Meyer, 1985) in Clostridium acetobutylicum with glucose as a substrate. We next explored, as in the E. coli case, whether hydrogen production could be enhanced by judiciously removing competing functionalities using the OptKnock framework. To this end, we used the stoichiometric model for Clostridium acetobutylicum developed by Papoutsakis and coworkers (Desai et al., 1999; Papoutsakis, 1984). OptKnock suggested the deletion of the acetate-forming and butyrate-transport reactions.
This deletion strategy is reasonable in hindsight upon considering the energetics of the entire network. Specifically, in the wild-type case the formation and secretion of each butyrate molecule requires the consumption of 2 NADH molecules, thus reducing the hydrogen production capacity of the network. However, if butyrate is not secreted but is instead recycled to form acetone and butyryl CoA, then butyryl CoA can again be converted to butyrate without any NADH consumption. The double deletion mutant has a theoretical hydrogen yield of 3.17 mol/mol glucose (0.036 g/g glucose) at the expense of a slightly lower growth rate (point C in Figure 4). Notably, in this case biomass formation and hydrogen production are tightly coupled, in contrast to the wild-type network, where a range of hydrogen formation rates (1.38-2.96 mmol/gDW/hr) is possible at the maximum growth rate (line AB in Figure 4). Experimental results (Nandi & Sengupta, 1998) indicate that only up to 2 mol of hydrogen can be produced per mol of glucose anaerobically in Clostridium. In fact, it has been reported that the direct inhibitory effects of butyrate on hydrogen production and the indirect effects of acetate through growth inhibition (Chin et al., 2003) are responsible for the observed low hydrogen yields.
Interestingly, the suggested reaction eliminations directly circumvent these inhibition bottlenecks.
1.2.1.3 Methylobacterium extorquens AM1
Moving from glucose to methanol as the substrate, we next investigated hydrogen production in Methylobacterium extorquens AM1, a facultative methylotroph capable of surviving solely on methanol as a carbon and energy source (Van Dien & Lidstrom, 2002).
The organism has been well-studied (Anthony, 1982; Chistoserdova et al., 2004;
Chistoserdova et al., 1998; Korotkova et al., 2002; Van Dien et al., 2003) and recently, a stoichiometric model of its central metabolism was published (Van Dien &
Lidstrom, 2002). Using Step 3 of OptStrain, we identified that only a single reaction needs to be introduced into the metabolic network of M. extorquens to enable hydrogen production.
Two such candidates are hydrogenase (E.C.# 1.12.7.2), which reduces protons to hydrogen, or alternatively N5,N10-methenyltetrahydromethanopterin hydrogenase, which catalyzes the following transformation:
E.C.# 1.12.98.2: 5,10-Methylenetetrahydromethanopterin <-> 5,10-Methenyltetrahydromethanopterin + H2.
The need for an additional reaction is expected because the central metabolic pathways in the methylotroph, as abstracted in (Van Dien & Lidstrom, 2002), do not include any reactions that convert protons into hydrogen, such as the hydrogenases found in E. coli and in anaerobic Clostridium species. Therefore, it is not surprising that, to the best of our knowledge, no one has achieved hydrogen production using methylotrophs such as Pseudomonas AM1 and P. methylica (Nandi & Sengupta, 1998). The identified reaction additions provide a plausible explanation for this outcome by pinpointing the lack of a mechanism to convert the generated protons to hydrogen.
1.2.2 Vanillin Production Case Study
Vanillin is an important flavor and aroma molecule. The low yields of vanillin from cured vanilla pods have motivated efforts toward its biotechnological production.
In this case study, we identify metabolic network redesign strategies for the de novo production of vanillin from glucose in E. coli. Using OptStrain, we first determined the maximum theoretical yield of vanillin from glucose to be 0.63 g/g glucose by solving the LP optimization over approximately 4,000 candidate reactions balanced with respect to all elements but hydrogen (Step 2). We next identified that the minimum number of non-native reactions that must be recombined into E. coli to endow it with the pathways necessary to achieve the maximum yield is three (Step 3). Numerous alternative pathways, differing only in their cofactor usage, that satisfy both optimality criteria (maximum yield and a minimal number of recombined reactions) were identified. For example, one such pathway uses the following three non-native reactions:
(i) E.C.# 1.2.1.46: Formate + NADH + H+ <-> Formaldehyde + NAD+ + H2O,
(ii) E.C.# 1.2.3.12: 3,4-Dihydroxybenzoate (protocatechuate) + NAD+ + H2O + Formaldehyde <-> Vanillate + O2 + NADH, and
(iii) E.C.# 1.2.1.67: Vanillate + NADH + H+ <-> Vanillin + NAD+ + H2O.
Interestingly, these steps are essentially the same as those used in the experimental study by Li and Frost (1998) to convert glucose to vanillin in recombinant E. coli cells, demonstrating that the computational procedure can indeed uncover relevant engineering strategies. Note, however, that the reported experimental yield of 0.15 g/g glucose is far from the maximum theoretical yield of the network (0.63 g/g glucose), indicating the potential for considerable improvement.
This motivates examining whether it is possible to reach higher yields of vanillin by systematically pruning the metabolic network using OptKnock (Step 4). Here the genome-scale model of E. coli metabolism, augmented with the three functionalities identified above, is integrated into the OptKnock framework to determine the set(s) of reactions whose deletion would force a strong coupling between growth and vanillin production. The highest vanillin-yielding single, double, and quadruple knockout strategies are discussed next for a basis glucose uptake rate of 10 mmol/gDW/hr. In all cases, anaerobic conditions are selected by OptKnock as the most favorable for vanillin production. It is worth emphasizing that, in general, the deletion strategies identified by OptStrain depend on the specific gene addition strategy fed into Step 4 of OptStrain.
Accordingly, we tested whether alternative, and possibly better, deletion strategies would accompany some of the other candidate addition strategies mentioned above. For the vanillin case study, we found the deletion suggestions and anticipated vanillin yields at maximal growth to be quite similar regardless of the gene addition strategy employed.
The first deletion strategy identified by OptStrain suggests removing acetaldehyde dehydrogenase (E.C.# 1.2.1.10) to prevent the conversion of acetyl-CoA into ethanol.
Vanillin production in this network, at the maximum biomass production rate of 0.205 hr-1, is 3.9 mmol/gDW/hr, or 0.33 g/g glucose based on the assumed uptake rate of glucose. In this deletion strategy, flux is redirected through the vanillin precursor metabolites, phosphoenolpyruvate (PEP) and erythrose-4-phosphate (E4P), by blocking the loss of carbon through ethanol secretion. The second (double) deletion strategy involves the additional removal of glucose-6-phosphate isomerase (E.C.# 5.3.1.9), essentially blocking the upper half of glycolysis. These deletions force the network to rely heavily on the Entner-Doudoroff pathway to generate pyruvate and glyceraldehyde-3-phosphate (GAP), which is further converted into PEP in the lower half of glycolysis.
Fructose-6-phosphate (F6P), produced through the non-oxidative part of the pentose phosphate pathway, is subsequently converted to E4P. Vanillin production, at the expense of a reduced maximum growth rate of 0.06 hr-1, is increased to 4.78 mmol/gDW/hr, or 0.40 g/g glucose. A substantially higher level of vanillin production is predicted in the four-reaction deletion mutant network without imposing a high penalty on the growth rate. This strategy leads to the production of 6.79 mmol/gDW/hr of vanillin, or 0.57 g/g glucose, at the maximum growth rate of 0.052 hr-1. The OptKnock framework suggests the deletion of acetate kinase (E.C.# 2.7.2.1), pyruvate kinase (E.C.# 2.7.1.40), the PTS transport mechanism, and fructose 6-phosphate aldolase. The first three deletions prevent leakage of flux from PEP and redirect it instead to vanillin synthesis. The elimination of fructose 6-phosphate aldolase prevents the direct conversion of F6P into GAP and dihydroxyacetone (DHA). Note that both F6P and GAP are used to form E4P in the non-oxidative branch of the pentose phosphate pathway. DHA can be further converted to dihydroxyacetone phosphate (DHAP) with the consumption of a PEP molecule. Thus, eliminating fructose 6-phosphate aldolase prevents this route from consuming F6P and PEP, both of which are required for vanillin synthesis. Furthermore, a surprising network flux redistribution involves the use of a group of reactions from one-carbon metabolism to form 10-formyltetrahydrofolate, which is subsequently converted to formaldehyde.
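Because the glucose uptake rate is fixed at 10 mmol/gDW/hr, each predicted vanillin production rate translates directly into a mass yield. The sketch below reproduces the 0.33, 0.40 and 0.57 g/g figures from the rates quoted above and expresses each as a fraction of the 0.63 g/g theoretical maximum; the molecular weights (about 152.15 g/mol for vanillin and 180.156 g/mol for glucose) are standard values supplied here, not values taken from the model.

```python
# Convert specific production rates (mmol/gDW/hr) into mass yields (g/g glucose)
# at a fixed glucose uptake rate, and compare with the theoretical maximum.
MW_VANILLIN = 152.15    # g/mol, C8H8O3 (standard value, supplied for illustration)
MW_GLUCOSE = 180.156    # g/mol, C6H12O6
GLUCOSE_UPTAKE = 10.0   # mmol/gDW/hr, basis uptake rate of the case study
MAX_THEORETICAL = 0.63  # g/g glucose, from Step 2 of OptStrain


def mass_yield_from_rate(product_rate: float, uptake_rate: float = GLUCOSE_UPTAKE) -> float:
    """g vanillin per g glucose; the mmol/gDW/hr units cancel in the ratio."""
    return (product_rate * MW_VANILLIN) / (uptake_rate * MW_GLUCOSE)


strategies = {
    "single knockout": 3.9,
    "double knockout": 4.78,
    "quadruple knockout": 6.79,
}

for name, rate in strategies.items():
    y = mass_yield_from_rate(rate)
    print(f"{name}: {y:.2f} g/g ({100 * y / MAX_THEORETICAL:.0f}% of theoretical)")
```

The quadruple mutant thus reaches roughly nine-tenths of the theoretical maximum at its maximum growth rate.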
Figure 5 compares the vanillin production envelopes, obtained by maximizing and minimizing vanillin formation at different biomass production rates, for the wild-type and mutant networks. These deletions endow the network with high levels of vanillin production under any growth conditions.
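To illustrate what a production envelope shows, consider a deliberately tiny, hypothetical network (not the E. coli model) in which a fixed glucose uptake U is split among biomass (costing c units of glucose per unit of growth rate), the product, and a byproduct. For this toy network the envelope can be written down by hand: at growth rate mu the product ranges from 0 (all spare carbon goes to the byproduct) up to U - c*mu. Deleting the byproduct route collapses the lower bound onto the upper one, which is the growth-coupled behavior the deletions above aim for. The values of U and c are arbitrary; real envelopes require solving the max/min LPs over the full network.

```python
# Production envelope of a hypothetical one-node toy network:
#   U = c * mu + v_product + v_byproduct,  with all fluxes >= 0.
# With a single balance, the min/max product flux at each growth rate mu
# follow directly, so no LP solver is needed for this illustration.

def envelope(mu: float, uptake: float, cost: float, byproduct_allowed: bool):
    spare = uptake - cost * mu  # glucose left over after biomass formation
    if spare < 0:
        raise ValueError("growth rate not attainable with this uptake")
    max_product = spare  # all spare carbon goes to the product
    min_product = 0.0 if byproduct_allowed else spare  # byproduct can absorb all of it
    return min_product, max_product


UPTAKE = 10.0  # mmol/gDW/hr (hypothetical)
COST = 40.0    # mmol glucose per unit growth rate (hypothetical)

for mu in (0.0, 0.1, 0.2, 0.25):
    wild_type = envelope(mu, UPTAKE, COST, byproduct_allowed=True)
    knockout = envelope(mu, UPTAKE, COST, byproduct_allowed=False)
    print(f"mu={mu:.2f}  wild type {wild_type}  byproduct knockout {knockout}")
```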
1.3 Discussion
The OptStrain framework of the present invention is aimed at systematically reshaping whole genome-scale metabolic networks of microbial systems for the overproduction of not only small but also complex molecules. We have so far examined a number of different products (e.g., 1,3-propanediol, inositol, pyruvate, electron transfer, etc.) using a variety of hosts (i.e., E. coli, C. acetobutylicum, M. extorquens). The two case studies discussed earlier, hydrogen and vanillin, show that OptStrain can address the range of challenges associated with strain redesign. At the same time, it is important to emphasize that the validity and relevance of the results obtained with the OptStrain framework depend on the completeness and accuracy of the reaction databases and microbial metabolic models considered. We have identified numerous instances of unbalanced reactions, especially with respect to hydrogen atoms, and of ambiguous reaction directionality in the reaction databases that we mined.
Careful curation of the downloaded reactions preceded all of our case studies. Whenever the balanceability of a reaction with respect to carbon could not be restored, the reaction was removed from consideration. We expect that this step will become less time-consuming as automated tools for reaction database testing and verification (Segre et al., 2003) become available. The purely stoichiometric representation of metabolic pathways in microbial models can lead to unrealistic flux distributions because it does not account for kinetic barriers and regulatory interactions (e.g., allosteric regulation). To alleviate this, the present invention contemplates incorporating regulatory information in the form of Boolean constraints (Covert & Palsson, 2002) into the stoichiometric model of E. coli and using kinetic expressions on an as-needed basis (Castellanos et al., 2004; Tomita et al., 1999; Varner & Ramkrishna, 1999). Further, the present invention contemplates using OptKnock to account not only for reaction deletions but also for up- or down-regulation of various key reaction steps. Despite these simplifications, OptStrain has already provided useful insight into microbial host redesign in many cases and, more importantly, has established for the first time an integrated framework open to future modeling improvements.
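The curation step described above amounts to checking the element balance of each downloaded reaction. A minimal sketch, assuming each reaction is given as a map from metabolite name to stoichiometric coefficient and each metabolite has a neutral molecular formula (a simplification: real databases also carry charge and protonation states, which is one reason hydrogen balances are often off). The formula parser is a hypothetical helper that handles only simple formulas such as C6H12O6, without parentheses or charges.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C6", "H12", "O".
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple neutral formula such as 'C6H12O6'.

    Parentheses, hydrates and charges are not handled.
    """
    counts = Counter()
    for element, digits in FORMULA_TOKEN.findall(formula):
        counts[element] += int(digits) if digits else 1
    return counts


def imbalance(reaction: dict, formulas: dict, ignore=()) -> dict:
    """Net atoms (produced minus consumed) per element; {} means balanced.

    `reaction` maps metabolite -> stoichiometric coefficient (negative if consumed).
    Elements listed in `ignore` (e.g. "H") are left out of the report.
    """
    net = Counter()
    for met, coeff in reaction.items():
        for element, n in parse_formula(formulas[met]).items():
            net[element] += coeff * n
    return {el: n for el, n in net.items() if n != 0 and el not in ignore}


# Hypothetical example data: ethanol fermentation of glucose.
FORMULAS = {"glucose": "C6H12O6", "ethanol": "C2H6O", "CO2": "CO2"}

print(imbalance({"glucose": -1, "ethanol": 2, "CO2": 2}, FORMULAS))  # {}
print(imbalance({"glucose": -1, "ethanol": 2, "CO2": 1}, FORMULAS))  # {'C': -1, 'O': -2}

# Carbon-only screen, in the spirit of the curation rule described above.
carbon_ok = "C" not in imbalance({"glucose": -1, "ethanol": 2, "CO2": 1}, FORMULAS)
print(carbon_ok)  # False
```

Passing `ignore=("H",)` mirrors a screen that tolerates hydrogen imbalances while still flagging other elements.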
It should be understood that a computer is used in implementing the methodology of the present invention. The present invention contemplates that any number of computers can be used, and any number of types of software or programming languages can be used. It should further be understood that the present invention provides for storing a representation of the networks created. The representations of the networks can be stored in a memory, in a signal, or in a bioengineered organism.
1.4 Mathematical Formulation for OptStrain
The redesign of microbial metabolic networks to enable enhanced product yields by employing the OptStrain procedure requires the solution of multiple types of optimization problems. The first optimization task (Step 2) involves determining the maximum yield of the desired product in a metabolic network comprising a set $\mathcal{N} = \{1, \ldots, N\}$ of metabolites and a set $\mathcal{M} = \{1, \ldots, M\}$ of reactions. The Linear Programming (LP) problem for maximizing the yield, on a weight basis, of a particular product P (in the set $\mathcal{N}$) from a set $\mathcal{R}$ of substrates is formulated as:
$$\max \; MW_P \sum_{j=1}^{M} S_{Pj}\, v_j$$
subject to
$$\sum_{j=1}^{M} S_{ij}\, v_j \ge 0, \quad \forall i \in \mathcal{N},\; i \notin \mathcal{R} \qquad (1)$$
$$\sum_{i \in \mathcal{R}} \Big( MW_i \sum_{j=1}^{M} S_{ij}\, v_j \Big) = -1 \qquad (2)$$
where $MW_i$ is the molecular weight of metabolite i, $v_j$ is the molar flux of reaction j, and $S_{ij}$ is the stoichiometric coefficient of metabolite i in reaction j. In our work, the metabolite set $\mathcal{N}$ comprised approximately 4,800 metabolites and the reaction set $\mathcal{M}$ consisted of more than 5,700 reactions. The inequality in constraint (1) allows only for secretion and prevents the uptake of all metabolites in the network other than the substrates in $\mathcal{R}$.
Constraint (2) scales the results to a total substrate uptake of one gram. The reaction fluxes $v_j$ can either be irreversible (i.e., $v_j \ge 0$) or reversible, in which case they can assume either positive or negative values. Reactions that enable the uptake of essential-for-growth compounds such as oxygen, carbon dioxide, ammonia, sulfate and phosphate are also present.
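For a given flux vector, the objective and constraints (1)-(2) of the Step 2 LP are straightforward to evaluate. The sketch below does only that: it rescales a candidate flux vector so that constraint (2) holds, checks constraint (1), and reports the weight-basis yield; it is not an LP solver. It uses a hypothetical one-reaction network, glucose -> 2 ethanol + 2 CO2, with standard molecular weights supplied here for illustration. The reaction ID `r1` and the dictionary layout of S are assumptions.

```python
# Evaluate the Step 2 objective MW_P * sum_j S_Pj v_j for a candidate flux vector,
# after rescaling it so that constraint (2) holds (one gram of substrate consumed).
# Hypothetical toy network; S maps metabolite -> {reaction: coefficient}.
S = {
    "glucose": {"r1": -1.0},
    "ethanol": {"r1": 2.0},
    "CO2": {"r1": 2.0},
}
MW = {"glucose": 180.156, "ethanol": 46.069, "CO2": 44.009}  # g/mol


def net_production(met, v):
    """sum_j S_ij v_j for metabolite `met`."""
    return sum(coeff * v.get(rxn, 0.0) for rxn, coeff in S[met].items())


def weight_yield(v, product, substrates):
    # Constraint (2): sum_{i in R} MW_i * net_i = -1, enforced by rescaling v.
    grams_net = sum(MW[i] * net_production(i, v) for i in substrates)
    if grams_net >= 0:
        raise ValueError("flux vector does not consume the substrate")
    scaled = {rxn: flux / -grams_net for rxn, flux in v.items()}
    # Constraint (1): no net uptake of any metabolite outside the substrate set.
    for met in S:
        if met not in substrates and net_production(met, scaled) < -1e-12:
            raise ValueError(f"net uptake of non-substrate {met}")
    return MW[product] * net_production(product, scaled)


print(round(weight_yield({"r1": 1.0}, "ethanol", {"glucose"}), 3))  # 0.511
```

The result is the familiar 0.511 g/g theoretical ethanol yield from glucose, and it is independent of the flux scale supplied.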
In Step 3 of OptStrain, the minimum number of non-native reactions needed to meet the identified maximum yield from Step 2 is found. First the Universal database reactions which are absent in the examined microbial host's metabolic model are flagged as non-native. This gives rise to the following Mixed Integer Linear Programming (MILP) problem:
$$\min \sum_{j \in \mathcal{M}_{\text{non-native}}} y_j$$
subject to
$$\sum_{j=1}^{M} S_{ij}\, v_j \ge 0, \quad \forall i \in \mathcal{N},\; i \notin \mathcal{R} \qquad (1)$$
$$\sum_{i \in \mathcal{R}} \Big( MW_i \sum_{j=1}^{M} S_{ij}\, v_j \Big) = -1 \qquad (2)$$
$$MW_P \sum_{j=1}^{M} S_{Pj}\, v_j \ge \text{Yield}^{\text{target}} \qquad (3)$$
$$v_j \le v_j^{\max}\, y_j, \quad \forall j \in \mathcal{M}_{\text{non-native}} \qquad (4)$$
$$v_j \ge v_j^{\min}\, y_j, \quad \forall j \in \mathcal{M}_{\text{non-native}} \qquad (5)$$
$$y_j \in \{0, 1\}, \quad \forall j \in \mathcal{M}_{\text{non-native}} \qquad (6)$$
The set $\mathcal{M}_{\text{non-native}}$ comprises the non-native reactions for the examined host and is a subset of the set $\mathcal{M}$. Constraints (1) and (2) are identical to those in the product yield maximization problem. Constraint (3) ensures that the product yield meets the maximum theoretical yield, $\text{Yield}^{\text{target}}$, calculated in Step 2. The binary variables $y_j$ in constraints (4) and (5) serve as switches to turn reactions on or off. A value of zero for $y_j$ forces the corresponding flux $v_j$ to be zero, while a value of one enables it to take on nonzero values.
The parameters $v_j^{\min}$ and $v_j^{\max}$ can either assume very low and very high values, respectively, or they can be calculated by minimizing and maximizing every reaction flux $v_j$ subject to constraints (1)-(3).
Alternative pathways that satisfy both optimality criteria of maximum yield and minimum number of non-native reactions are obtained by the iterative solution of the MILP formulation upon the accumulation of additional constraints referred to as integer cuts. Integer cut constraints exclude from consideration all sets of reactions previously identified. For example, if a previously identified pathway uses reactions 1, 2, and 3, then the following constraint prevents the same reactions from being simultaneously considered in subsequent solutions: $y_1 + y_2 + y_3 \le 2$. More details can be found in Burgard and Maranas (2001).
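The logic of the integer cuts can be illustrated without a MILP solver by brute force on a toy problem. In the sketch below, `achieves_target` is a hypothetical stand-in for the feasibility of constraints (1)-(3) with only the chosen non-native reactions switched on; here it simply checks whether any of three made-up pathways is fully present. Subsets are examined in order of increasing size, mimicking the minimization of the number of non-native reactions, and each solution found contributes a cut of the form sum over j in S of y_j <= |S| - 1. Exhaustive enumeration is only practical for a handful of candidates; the procedure described above instead re-solves the MILP after each new cut.

```python
from itertools import combinations


def violates_cut(candidate: frozenset, cut: frozenset) -> bool:
    """The integer cut sum_{j in cut} y_j <= |cut| - 1 is violated when every
    reaction of a previously found solution is switched on again."""
    return sum(1 for j in cut if j in candidate) > len(cut) - 1


def minimal_reaction_sets(non_native, achieves_target):
    """All minimum-cardinality sets of non-native reactions meeting the target."""
    cuts = []
    for size in range(len(non_native) + 1):
        for combo in combinations(sorted(non_native), size):
            candidate = frozenset(combo)
            if any(violates_cut(candidate, cut) for cut in cuts):
                continue
            if achieves_target(candidate):
                cuts.append(candidate)  # record the solution as a new integer cut
        if cuts:  # stop at the minimum number of added reactions
            return cuts
    return cuts


# Hypothetical pathways, each requiring a specific set of non-native reactions.
PATHWAYS = [{"r1", "r2", "r3"}, {"r1", "r2", "r4"}, {"r5", "r6", "r7", "r8"}]


def achieves_target(active):
    return any(p <= active for p in PATHWAYS)


solutions = minimal_reaction_sets({f"r{i}" for i in range(1, 9)}, achieves_target)
print(sorted(sorted(s) for s in solutions))
# [['r1', 'r2', 'r3'], ['r1', 'r2', 'r4']]
```

The four-reaction pathway is never reported because alternatives with three reactions exist, matching the requirement that all reported pathways share the minimal reaction count.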
Step 4 of OptStrain identifies which reactions to eliminate from the network augmented with the non-native functionalities, using the OptKnock framework developed previously (Burgard et al., 2003; Pharkya et al., 2003). The objective of this step is to constrain the phenotypic behavior of the network so that growth is coupled with the formation of the desired biochemical, thus curtailing byproduct formation. The envelope of allowable targeted product yields versus biomass yields is constructed by solving a series of linear optimization problems that maximize and then minimize biochemical production for various levels of biomass formation rate available to the network. More details on the optimization formulation can be found in Pharkya et al. (2003). All the optimization problems were solved in minutes to hours using CPLEX 7.0 (http://www.ilog.com/products/cplex/) accessed via the GAMS (Brooke et al., 1998) modeling environment on an IBM RS6000-270 workstation.
2. OptKnock
The ability to investigate the metabolism of single-celled organisms at a genomic scale, and thus at a systemic level, motivates the need for novel computational methods aimed at identifying strain engineering strategies. The present invention includes a computational framework termed OptKnock for suggesting gene deletion strategies leading to the overproduction of specific chemical compounds in E. coli. This is accomplished by ensuring that the production of the desired chemical becomes an obligatory byproduct of growth by "shaping" the connectivity of the metabolic network. In other words, OptKnock identifies and subsequently removes metabolic reactions that are capable of uncoupling cellular growth from chemical production. The computational procedure is designed to identify not just straightforward but also non-intuitive knockout strategies by simultaneously considering the entire E. coli metabolic network as abstracted in the in silico E. coli model of Palsson and coworkers (Edwards & Palsson, 2000). The complexity and built-in redundancy of this network (e.g., the E. coli model encompasses 720 reactions) necessitate a systematic and efficient search approach to combat the combinatorial explosion of candidate gene knockout strategies.
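To give a sense of the combinatorial explosion mentioned above, the number of distinct ways to delete up to k reactions from n candidates is a sum of binomial coefficients. A short calculation using the 720-reaction figure quoted for the E. coli model and, for comparison, the 100 binary variables mentioned later for central-metabolism problems:

```python
from math import comb


def deletion_sets(n_candidates: int, max_knockouts: int) -> int:
    """Number of deletion sets containing 1..max_knockouts reactions out of n_candidates."""
    return sum(comb(n_candidates, k) for k in range(1, max_knockouts + 1))


for n in (100, 720):
    for k in (1, 2, 3, 4):
        print(f"n={n}, up to {k} deletions: {deletion_sets(n, k):,} candidate sets")
```

Since each candidate set would need its own flux balance calculation to evaluate, exhaustive enumeration quickly becomes impractical, which motivates the optimization-based search described next.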
The nested optimization framework shown in Figure 6 is developed to identify multiple gene deletion combinations that maximally couple cellular growth objectives with externally imposed chemical production targets. This multi-layered optimization structure involving two competing optimal strategists (i.e., cellular objective and chemical production) is referred to as a bilevel optimization problem (Bard, 1998).
Problem formulation specifics, along with an elegant solution procedure drawing upon linear programming (LP) duality theory, are described in the Methods section. The OptKnock procedure is applied to succinate, lactate, and 1,3-propanediol (PDO) production in E. coli, with the maximization of the biomass yield for a fixed amount of glucose taken up employed as the cellular objective. The obtained results are also contrasted against using the minimization of metabolic adjustment (MOMA) (Segre et al., 2002) as the cellular objective. Based on the OptKnock framework, it is possible to identify the most promising gene knockout strategies and their corresponding allowable envelopes of chemical versus biomass production in the context of succinate, lactate, and PDO production in E. coli.
A preferred embodiment of this invention describes a computational framework, termed OptKnock, for suggesting gene deletion strategies that could lead to chemical production in E. coli by ensuring that the drain of resources necessary for growth (i.e., carbon, redox potential, and energy) must be accompanied, due to stoichiometry, by the production of the desired chemical. Therefore, the production of the desired product becomes an obligatory byproduct of cellular growth.
Specifically, OptKnock pinpoints which reactions to remove from a metabolic network, which can be realized by deleting the gene(s) associated with the identified functionality.
The procedure was demonstrated based on succinate, lactate, and PDO production in E. coli K-12. The obtained results exhibit good agreement with strains published in the literature. While some of the suggested gene deletions are quite straightforward, as they essentially prune reaction pathways competing with the desired one, many others are at first quite non-intuitive, reflecting the complexity and built-in redundancy of the metabolic network of E. coli. For the succinate case, OptKnock correctly suggested anaerobic fermentation and the removal of the phosphotransferase glucose uptake mechanism as a consequence of the competition between the cellular and chemical production objectives, and not as a direct input to the problem. In the lactate study, the glucokinase-based glucose uptake mechanism was shown to decouple lactate and biomass production for certain knockout strategies. For the PDO case, results show that the Entner-Doudoroff pathway is more advantageous than EMP glycolysis despite the fact that it is substantially less energetically efficient. In addition, the so far popular tpi knockout was clearly shown to reduce the maximum yields of PDO, while a complex network of 15 reactions was shown to be theoretically capable of "leaking" flux from the pentose phosphate pathway (PPP) to the TCA cycle and thus decoupling PDO production from biomass formation. The obtained results also appeared to be quite robust with respect to the choice of cellular objective.
The present invention contemplates any number of cellular objectives, including but not limited to maximizing a growth rate, maximizing ATP production, minimizing metabolic adjustment, minimizing nutrient uptake, minimizing redox production, minimizing a Euclidean norm, and combinations of these and other cellular objectives.
It is important to note that the suggested gene deletion strategies must be interpreted carefully. For example, in many cases the deletion of a gene in one branch of a branched pathway is equivalent to the significant up-regulation of the other. In addition, inspection of the flux changes before and after the gene deletions provides insight as to which genes need to be up- or down-regulated. Lastly, the mapping of the set of reactions identified for removal to the corresponding genes is not always uniquely specified. Therefore, the most economical gene set must be identified carefully, accounting for isozymes and multifunctional enzymes.
Preferably, in the OptKnock framework, the substrate uptake flux (i.e., glucose) is assumed to be 10 mmol/gDW·hr. Therefore, all reported chemical production and biomass formation values are based upon this postulated, not predicted, uptake scenario. Thus, it is quite possible that the suggested deletion mutants may involve substantially lower uptake efficiencies. However, because OptKnock essentially suggests mutants with coupled growth and chemical production, one could envision a growth selection system that will successively evolve mutants with improved uptake efficiencies and thus enhanced desired chemical production characteristics.
In the absence of any regulatory or kinetic information within the purely stoichiometric representation of the inner optimization problem that performs flux allocation, OptKnock is used to identify gene deletions as the sole mechanism for chemical overproduction. Clearly, the lack of any regulatory or kinetic information in the model is a simplification that may in some cases suggest unrealistic flux distributions. The incorporation of regulatory information will not only enhance the quality of the suggested gene deletions by more appropriately resolving flux allocation, but also allow us to suggest regulatory modifications along with gene deletions as mechanisms for strain improvement.
Alternate modeling approaches (e.g., cybernetic models (Kompala et al., 1984; Ramakrishna et al., 1996; Varner and Ramkrishna, 1999) and metabolic control analysis (Kacser and Burns, 1973; Heinrich and Rapoport, 1974; Hatzimanikatis et al., 1998)), if available, can be incorporated within the OptKnock framework to more accurately estimate the metabolic flux distributions of gene-deleted metabolic networks.
Nevertheless, even without such regulatory or kinetic information, OptKnock provides useful suggestions for strain improvement and, more importantly, establishes a systematic framework.
The present invention naturally contemplates future improvements in metabolic and regulatory modeling frameworks.
2.1 Methods
The maximization of a cellular objective, quantified as an aggregate reaction flux, for a steady-state metabolic network comprising a set $\mathcal{N} = \{1, \ldots, N\}$ of metabolites and a set $\mathcal{M} = \{1, \ldots, M\}$ of metabolic reactions fueled by a glucose substrate is expressed mathematically as follows:
$$\max \; v_{\text{cellular objective}} \qquad \text{(Primal)}$$
subject to
$$\sum_{j=1}^{M} S_{ij}\, v_j = 0, \quad \forall i \in \mathcal{N}$$
$$v_{\text{pts}} + v_{\text{glk}} = v_{\text{glc\_uptake}} \quad \text{mmol/gDW}\cdot\text{hr}$$
$$v_{\text{atp}} \ge v_{\text{atp\_main}} \quad \text{mmol/gDW}\cdot\text{hr}$$
$$v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}} \quad \text{1/hr}$$
$$v_j \ge 0, \quad \forall j \in \mathcal{M}_{\text{irrev}}$$
$$v_j \le 0, \quad \forall j \in \mathcal{M}_{\text{secr\_only}}$$
$$v_j \in \mathbb{R}, \quad \forall j \in \mathcal{M}_{\text{rev}}$$
where $S_{ij}$ is the stoichiometric coefficient of metabolite i in reaction j, $v_j$ represents the flux of reaction j, $v_{\text{glc\_uptake}}$ is the basis glucose uptake scenario, $v_{\text{atp\_main}}$ is the non-growth-associated ATP maintenance requirement, and $v_{\text{biomass}}^{\text{target}}$ is a minimum level of biomass production. The vector v includes both internal and transport reactions. The forward (i.e., positive) direction of transport fluxes corresponds to the uptake of a particular metabolite, whereas the reverse (i.e., negative) direction corresponds to metabolite secretion. The uptake of glucose through the phosphotransferase system and glucokinase is denoted by $v_{\text{pts}}$ and $v_{\text{glk}}$, respectively. Transport fluxes for metabolites that can only be secreted from the network are members of $\mathcal{M}_{\text{secr\_only}}$. Note also that the complete set of reactions $\mathcal{M}$ is subdivided into reversible ($\mathcal{M}_{\text{rev}}$) and irreversible ($\mathcal{M}_{\text{irrev}}$) reactions. The cellular objective is often assumed to be a drain of biosynthetic precursors in the ratios required for biomass formation (Neidhardt and Curtiss, 1996). The fluxes are reported per 1 gDW·hr such that biomass formation is expressed as g biomass produced/gDW·hr, or 1/hr.
The modeling of gene deletions, and thus reaction elimination, first requires the incorporation of binary variables into the flux balance analysis framework (Burgard and Maranas, 2001; Burgard et al., 2001). These binary variables,
$$y_j = \begin{cases} 1 & \text{if reaction flux } v_j \text{ is active} \\ 0 & \text{if reaction flux } v_j \text{ is not active} \end{cases} \qquad \forall j \in \mathcal{M}$$
assume a value of one if reaction j is active and a value of zero if it is inactive. The following constraint,
$$v_j^{\min}\, y_j \le v_j \le v_j^{\max}\, y_j, \quad \forall j \in \mathcal{M}$$
ensures that reaction flux $v_j$ is forced to zero whenever variable $y_j$ is equal to zero. Alternatively, when $y_j$ is equal to one, $v_j$ is free to assume any value between a lower bound $v_j^{\min}$ and an upper bound $v_j^{\max}$. In this study, $v_j^{\min}$ and $v_j^{\max}$ are identified by minimizing and subsequently maximizing every reaction flux subject to the constraints from the Primal problem.
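The on/off constraint can be checked mechanically for a candidate solution. A minimal sketch, assuming fluxes, binary variables and bounds are held in plain dictionaries keyed by reaction name; the reaction IDs `pfl` and `ldh` and their bounds are hypothetical values chosen for illustration.

```python
def switch_violations(v, y, v_min, v_max, tol=1e-9):
    """Reactions whose flux breaks v_min[j]*y[j] <= v[j] <= v_max[j]*y[j]."""
    bad = []
    for j, flux in v.items():
        lower, upper = v_min[j] * y[j], v_max[j] * y[j]
        if flux < lower - tol or flux > upper + tol:
            bad.append(j)
    return bad


v_min = {"pfl": 0.0, "ldh": -20.0}  # hypothetical bounds (mmol/gDW.hr)
v_max = {"pfl": 20.0, "ldh": 20.0}

# ldh switched off (y = 0) and carrying no flux: consistent.
print(switch_violations({"pfl": 5.0, "ldh": 0.0}, {"pfl": 1, "ldh": 0}, v_min, v_max))  # []
# ldh switched off but carrying flux: flagged.
print(switch_violations({"pfl": 5.0, "ldh": 3.0}, {"pfl": 1, "ldh": 0}, v_min, v_max))  # ['ldh']
```

With y_j = 0 both bounds collapse to zero, so any nonzero flux is flagged; with y_j = 1 the original bounds apply unchanged.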
The identification of optimal gene/reaction knockouts requires the solution of a bilevel optimization problem that chooses the set of reactions that can be accessed ($y_j = 1$) such that the optimization of the cellular objective indirectly leads to the overproduction of the chemical or biochemical of interest (see also Figure 6). Using biomass formation as the cellular objective, this is expressed mathematically as the following bilevel mixed-integer optimization problem:
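$$\max_{y_j} \; v_{\text{chemical}} \qquad \text{(OptKnock)}$$
subject to
$$\max_{v_j} \; v_{\text{biomass}} \qquad \text{(Primal)}$$
subject to
$$\sum_{j=1}^{M} S_{ij}\, v_j = 0, \quad \forall i \in \mathcal{N}$$
$$v_{\text{pts}} + v_{\text{glk}} = v_{\text{glc\_uptake}}$$
$$v_{\text{atp}} \ge v_{\text{atp\_main}}$$
$$v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}}$$
$$v_j^{\min}\, y_j \le v_j \le v_j^{\max}\, y_j, \quad \forall j \in \mathcal{M}$$
$$y_j \in \{0, 1\}, \quad \forall j \in \mathcal{M}$$
$$\sum_{j \in \mathcal{M}} (1 - y_j) \le K$$
where K is the number of allowable knockouts. The constraint $v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}}$ ensures that the resulting network meets a minimum biomass yield.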
The direct solution of this two-stage optimization problem is intractable given the high dimensionality of the flux space (i.e., over 700 reactions) and the presence of two nested optimization problems. To remedy this, we develop an efficient solution approach borrowing from LP duality theory, which shows that for every linear programming problem (primal) there exists a unique optimization problem (dual) whose optimal objective value is equal to that of the primal problem whenever the primal has a finite optimum. A similar strategy was employed by Burgard and Maranas (2003) for identifying and testing metabolic objective functions from metabolic flux data. The dual problem (Ignizio and Cavalier, 1994) associated with the OptKnock inner problem is:
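$$\min \; v_{\text{atp\_main}}\, \mu_{\text{atp}} + v_{\text{biomass}}^{\text{target}}\, \mu_{\text{biomass}} + v_{\text{glc\_uptake}}\, \lambda_{\text{glc}} \qquad \text{(Dual)}$$
subject to
$$\sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{glk}} + \mu_{\text{glk}} + \lambda_{\text{glc}} = 0$$
$$\sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{pts}} + \mu_{\text{pts}} + \lambda_{\text{glc}} = 0$$
$$\sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{biomass}} + \mu_{\text{biomass}} = 1$$
$$\sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{ij} + \mu_j = 0, \quad \forall j \in \mathcal{M},\; j \ne \text{glk, pts, biomass}$$
$$\mu_j^{\min}(1 - y_j) \le \mu_j \le \mu_j^{\max}(1 - y_j), \quad \forall j \in \mathcal{M}_{\text{rev}},\; j \notin \mathcal{M}_{\text{secr\_only}}$$
$$\mu_j \ge \mu_j^{\min}(1 - y_j), \quad \forall j \in \mathcal{M}_{\text{rev}} \cap \mathcal{M}_{\text{secr\_only}}$$
$$\mu_j \le \mu_j^{\max}(1 - y_j), \quad \forall j \in \mathcal{M}_{\text{irrev}},\; j \notin \mathcal{M}_{\text{secr\_only}}$$
$$\mu_j \in \mathbb{R}, \quad \forall j \in \mathcal{M}_{\text{irrev}} \cap \mathcal{M}_{\text{secr\_only}}$$
$$\lambda_i^{\text{stoich}} \in \mathbb{R}, \quad \forall i \in \mathcal{N}$$
$$\lambda_{\text{glc}} \in \mathbb{R}$$
where $\lambda_i^{\text{stoich}}$ is the dual variable associated with the stoichiometric constraints, $\lambda_{\text{glc}}$ is the dual variable associated with the glucose uptake constraint, and $\mu_j$ is the dual variable associated with any other restrictions on its corresponding flux $v_j$ in the Primal. Note that the dual variable $\mu_j$ acquires unrestricted sign if its corresponding flux in the OptKnock inner problem is set to zero by enforcing $y_j = 0$. The parameters $\mu_j^{\min}$ and $\mu_j^{\max}$ are identified by minimizing and subsequently maximizing their values subject to the constraints of the Dual problem.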
If the optimal solutions to the Primal and Dual problems are bounded, their objective function values must be equal to one another at optimality. This means that every optimal solution to both problems can be characterized by setting their objectives equal to one another and accumulating their respective constraints. Thus the bilevel formulation for OptKnock shown previously can be transformed into the following single-level MILP:
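$$\max \; v_{\text{chemical}} \qquad \text{(OptKnock)}$$
subject to
$$v_{\text{biomass}} = v_{\text{atp\_main}}\, \mu_{\text{atp}} + v_{\text{biomass}}^{\text{target}}\, \mu_{\text{biomass}} + v_{\text{glc\_uptake}}\, \lambda_{\text{glc}}$$
$$\sum_{j=1}^{M} S_{ij}\, v_j = 0, \quad \forall i \in \mathcal{N}$$
$$v_{\text{pts}} + v_{\text{glk}} = v_{\text{glc\_uptake}} \quad \text{mmol/gDW}\cdot\text{hr}$$
$$v_{\text{atp}} \ge v_{\text{atp\_main}} \quad \text{mmol/gDW}\cdot\text{hr}$$
$$\sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{glk}} + \mu_{\text{glk}} + \lambda_{\text{glc}} = 0$$
$$\sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{pts}} + \mu_{\text{pts}} + \lambda_{\text{glc}} = 0$$
$$\sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{i,\text{biomass}} + \mu_{\text{biomass}} = 1$$
$$\sum_{i=1}^{N} \lambda_i^{\text{stoich}} S_{ij} + \mu_j = 0, \quad \forall j \in \mathcal{M},\; j \ne \text{glk, pts, biomass}$$
$$\sum_{j \in \mathcal{M}} (1 - y_j) \le K$$
$$v_{\text{biomass}} \ge v_{\text{biomass}}^{\text{target}}$$
$$\mu_j^{\min}(1 - y_j) \le \mu_j \le \mu_j^{\max}(1 - y_j), \quad \forall j \in \mathcal{M}_{\text{rev}},\; j \notin \mathcal{M}_{\text{secr\_only}}$$
$$\mu_j \ge \mu_j^{\min}(1 - y_j), \quad \forall j \in \mathcal{M}_{\text{rev}} \cap \mathcal{M}_{\text{secr\_only}}$$
$$\mu_j \le \mu_j^{\max}(1 - y_j), \quad \forall j \in \mathcal{M}_{\text{irrev}},\; j \notin \mathcal{M}_{\text{secr\_only}}$$
$$\mu_j \in \mathbb{R}, \quad \forall j \in \mathcal{M}_{\text{irrev}} \cap \mathcal{M}_{\text{secr\_only}}$$
$$v_j^{\min}\, y_j \le v_j \le v_j^{\max}\, y_j, \quad \forall j \in \mathcal{M}$$
$$\lambda_i^{\text{stoich}} \in \mathbb{R}, \quad \forall i \in \mathcal{N}$$
$$\lambda_{\text{glc}} \in \mathbb{R}$$
$$y_j \in \{0, 1\}, \quad \forall j \in \mathcal{M}$$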
An important feature of the above formulation is that, if the problem is feasible, the optimal solution will always be found. In this invention, the candidates for gene knockouts included, but are not limited to, all reactions of glycolysis, the TCA cycle, the pentose phosphate pathway, respiration, and all anaplerotic reactions. This is accomplished by limiting the reactions included in the knockout summation to central metabolism:
$$\sum_{j \in \text{Central Metabolism}} (1 - y_j) \le K$$
Problems containing as many as 100 binary variables were solved in minutes to hours using CPLEX 7.0 accessed via the GAMS modeling environment on an IBM RS6000-270 workstation. It should be understood, however, that the present invention is not dependent upon any particular type of computer or environment being used. Any such type can be used to allow for inputting and outputting the information associated with the methodology of the present invention. Moreover, the steps of the methods of the present invention can be implemented in any number of types of software applications or languages, and the present invention is not limited in this respect. It will be appreciated that other embodiments and uses will be apparent to those skilled in the art and that the invention is not limited to these specific illustrative examples.
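The single-level reformulation above relies on the fact that, for a linear program with a finite optimum, a primal-feasible point and a dual-feasible point with equal objective values are both optimal. A small check of this property on a textbook-style two-variable LP (unrelated to the metabolic model; the numbers are chosen for illustration), using exact rational arithmetic so that the equality test is not affected by rounding:

```python
from fractions import Fraction as F

# Primal:  max c.x  s.t.  A x <= b, x >= 0
# Dual:    min b.y  s.t.  A^T y >= c, y >= 0
A = [[F(1), F(2)], [F(3), F(1)]]
b = [F(4), F(6)]
c = [F(1), F(1)]


def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))


def primal_feasible(x):
    return all(xi >= 0 for xi in x) and all(dot(row, x) <= bi for row, bi in zip(A, b))


def dual_feasible(y):
    columns = list(zip(*A))  # rows of A^T
    return all(yi >= 0 for yi in y) and all(dot(col, y) >= ci for col, ci in zip(columns, c))


x = [F(8, 5), F(6, 5)]  # candidate primal solution
y = [F(2, 5), F(1, 5)]  # candidate dual solution

assert primal_feasible(x) and dual_feasible(y)
# Weak duality gives c.x <= b.y for any feasible pair; equality certifies optimality.
print(dot(c, x), dot(b, y))  # 14/5 14/5
```

The single-level MILP applies the same idea symbolically: rather than checking numbers after the fact, it imposes the inner problem's primal constraints, its dual constraints, and the equality of the two objectives simultaneously.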
2.2 EXAMPLE 1: Succinate and Lactate Production
We identified which reactions, if any, could be removed from the E. coli K-12 stoichiometric model (Edwards & Palsson, 2000) so that the remaining network produces succinate or lactate whenever biomass maximization is a good descriptor of flux allocation. A prespecified amount of glucose (10 mmol/gDW·hr), along with unconstrained uptake routes for inorganic phosphate, oxygen, sulfate, and ammonia, is provided to fuel the metabolic network. The optimization step could opt for or against the phosphotransferase system, glucokinase, or both mechanisms for the uptake of glucose.
Secretion routes for acetate, carbon dioxide, ethanol, formate, lactate and succinate are also enabled. Note that because the glucose uptake rate is fixed, the biomass and product yields are essentially equivalent to the rates of biomass and product production, respectively. In all cases, the OptKnock procedure eliminated the oxygen uptake reaction, pointing to anaerobic growth conditions consistent with current succinate (Zeikus et al., 1999) and lactate (Datta et al., 1995) fermentative production strategies.
Table I summarizes three of the identified gene knockout strategies for succinate overproduction (i.e., mutants A, B, and C). The results for mutant A suggested that the removal of two reactions (i.e., pyruvate formate lyase and lactate dehydrogenase) from the network results in succinate production reaching 63% of its theoretical maximum at the maximum biomass yield. This knockout strategy is identical to the one employed by Stols and Donnelly (1997) in their succinate-overproducing E. coli strain. Next, the envelope of allowable succinate versus biomass production was explored for the wild-type E. coli network and the three mutants listed in Table I. The succinate production limits revealed that mutant A does not exhibit coupled succinate and biomass formation until the yield of biomass approaches 80% of the maximum. Mutant B, however, with the additional deletion of acetaldehyde dehydrogenase, resulted in a much earlier coupling of succinate with biomass yields.
A less intuitive strategy was identified for mutant C, which focuses on inactivating two PEP-consuming reactions rather than eliminating competing byproduct (i.e., ethanol, formate, and lactate) production mechanisms. First, the phosphotransferase system was disabled, requiring the network to rely exclusively on glucokinase for the uptake of glucose.
Next, pyruvate kinase was removed, leaving PEP carboxykinase as the only central metabolic reaction capable of draining the significant amount of PEP supplied by glycolysis. This strategy, assuming that the maximum biomass yield could be attained, resulted in a succinate yield approaching 88% of the theoretical maximum. In addition, there was significant succinate production for every attainable biomass yield, while the maximum theoretical yield of succinate is the same as that for the wild-type strain.
The OptKnock framework was next applied to identify knockout strategies for coupling lactate and biomass production. Table I shows three of the identified gene knockout strategies (i.e., mutants A, B, and C). Mutant A redirects flux toward lactate at the maximum biomass yield by blocking acetate and ethanol production. This result is consistent with previous work demonstrating that an adh, pta mutant E. coli strain could grow anaerobically on glucose by producing lactate (Gupta & Clark, 1989).
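OptKnock nests two objectives: an outer one (maximize the product) and an inner one (the cell maximizes biomass). A minimal sketch of that nested structure, assuming only a tiny candidate set, enumerates knockout sets exhaustively. For each set it solves the inner LP and then scores the best product rate attainable at that biomass optimum. Exhaustive enumeration is a swapped-in technique for illustration; it scales combinatorially and is not OptKnock's own bilevel solution method. The toy network and reaction names are hypothetical.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (NOT the E. coli model); lactate is the target product.
METS = ["glc", "pyr", "atp", "lac", "ac", "suc"]
RXNS = {  # name: (stoichiometry, lower bound, upper bound)
    "EX_glc": ({"glc": 1}, 10.0, 10.0),
    "GLYC": ({"glc": -1, "pyr": 2, "atp": 2}, 0.0, 1000.0),
    "BIOMASS": ({"pyr": -1, "atp": -2}, 0.0, 1000.0),
    "ATPM": ({"atp": -1}, 0.0, 1000.0),
    "R_lac": ({"pyr": -1, "lac": 1}, 0.0, 1000.0),
    "R_ac": ({"pyr": -1, "ac": 1, "atp": 1}, 0.0, 1000.0),
    "R_suc": ({"pyr": -1, "suc": 1}, 0.0, 1000.0),
    "EX_lac": ({"lac": -1}, 0.0, 1000.0),
    "EX_ac": ({"ac": -1}, 0.0, 1000.0),
    "EX_suc": ({"suc": -1}, 0.0, 1000.0),
}
NAMES = list(RXNS)
S = np.array([[RXNS[r][0].get(m, 0.0) for r in NAMES] for m in METS])


def solve(objective, maximize=True, knockouts=(), min_biomass=None):
    """Max or min one flux with deleted reactions bounded to zero and,
    if given, a lower bound on the biomass flux."""
    bounds = []
    for r in NAMES:
        lb, ub = RXNS[r][1], RXNS[r][2]
        if r in knockouts:
            lb, ub = 0.0, 0.0
        if r == "BIOMASS" and min_biomass is not None:
            lb = min_biomass
        bounds.append((lb, ub))
    c = np.zeros(len(NAMES))
    c[NAMES.index(objective)] = -1.0 if maximize else 1.0
    res = linprog(c, A_eq=S, b_eq=np.zeros(len(METS)), bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun if maximize else res.fun


def best_knockouts(product, candidates, max_deletions, tol=1e-6):
    """Exhaustive stand-in for the bilevel search: the inner LP maximizes biomass,
    the outer score is the largest product rate attainable at that optimum."""
    best_set, best_rate = None, -np.inf
    for k in range(max_deletions + 1):
        for ko in itertools.combinations(candidates, k):
            b_max = solve("BIOMASS", knockouts=ko)
            rate = solve(product, knockouts=ko, min_biomass=b_max * (1 - 1e-9))
            if rate > best_rate + tol:  # ties keep the earlier, smaller set
                best_set, best_rate = ko, rate
    return best_set, best_rate


ko, rate = best_knockouts("EX_lac", candidates=["R_ac", "R_suc"], max_deletions=2)
print(f"best deletion set {ko}: lactate {rate:.3f} at maximum biomass")
```

Because the score takes the most favourable flux vector among all biomass-maximizing ones, the single deletion `R_ac` ties with the double deletion and is returned as the smaller set.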
Mutant B provides an alternative strategy involving the removal of an initial glycolysis reaction along with the acetate production mechanism. This results in a lactate yield of 90% of its theoretical limit at the maximum biomass yield. Note, however, that this network could also reach the maximum biomass yield without producing any lactate, because OptKnock does not explicitly account for the "worst-case" alternate optimal solution.
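The caveat above can be checked with a flux-variability-style calculation: hold biomass at its maximum, then both minimize and maximize the product flux. The minimum is the "worst-case" alternate optimum that an optimistic score does not see. This is a self-contained sketch on the same hypothetical toy network, not on the E. coli model.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (NOT the E. coli model); lactate is the target product.
METS = ["glc", "pyr", "atp", "lac", "ac", "suc"]
RXNS = {  # name: (stoichiometry, lower bound, upper bound)
    "EX_glc": ({"glc": 1}, 10.0, 10.0),
    "GLYC": ({"glc": -1, "pyr": 2, "atp": 2}, 0.0, 1000.0),
    "BIOMASS": ({"pyr": -1, "atp": -2}, 0.0, 1000.0),
    "ATPM": ({"atp": -1}, 0.0, 1000.0),
    "R_lac": ({"pyr": -1, "lac": 1}, 0.0, 1000.0),
    "R_ac": ({"pyr": -1, "ac": 1, "atp": 1}, 0.0, 1000.0),
    "R_suc": ({"pyr": -1, "suc": 1}, 0.0, 1000.0),
    "EX_lac": ({"lac": -1}, 0.0, 1000.0),
    "EX_ac": ({"ac": -1}, 0.0, 1000.0),
    "EX_suc": ({"suc": -1}, 0.0, 1000.0),
}
NAMES = list(RXNS)
S = np.array([[RXNS[r][0].get(m, 0.0) for r in NAMES] for m in METS])


def solve(objective, maximize=True, knockouts=(), min_biomass=None):
    """Max or min one flux with deleted reactions bounded to zero and,
    if given, a lower bound on the biomass flux."""
    bounds = []
    for r in NAMES:
        lb, ub = RXNS[r][1], RXNS[r][2]
        if r in knockouts:
            lb, ub = 0.0, 0.0
        if r == "BIOMASS" and min_biomass is not None:
            lb = min_biomass
        bounds.append((lb, ub))
    c = np.zeros(len(NAMES))
    c[NAMES.index(objective)] = -1.0 if maximize else 1.0
    res = linprog(c, A_eq=S, b_eq=np.zeros(len(METS)), bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun if maximize else res.fun


def product_range_at_max_growth(product, knockouts):
    """(worst case, best case) product flux over all biomass-maximizing flux vectors."""
    floor = solve("BIOMASS", knockouts=knockouts) * (1 - 1e-9)
    lo = solve(product, maximize=False, knockouts=knockouts, min_biomass=floor)
    hi = solve(product, maximize=True, knockouts=knockouts, min_biomass=floor)
    return lo, hi


for ko in [("R_ac",), ("R_ac", "R_suc")]:
    lo, hi = product_range_at_max_growth("EX_lac", ko)
    print(f"{ko}: lactate at max biomass between {lo:.3f} (worst) and {hi:.3f} (best)")
```

Deleting `R_ac` alone permits anywhere from 0 to 10 units of lactate at maximal growth, whereas also deleting the competing `R_suc` outlet forces 10 units. This is the toy analogue of the tighter coupling obtained by removing additional competing mechanisms.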
With the additional elimination of the glucokinase and ethanol production mechanisms, mutant C exhibits a tighter coupling between lactate and biomass production.
Table I - Biomass and chemical yields for various gene knockout strategies identified by OptKnock. The reactions and corresponding enzymes for each knockout strategy are listed.
The maximum biomass and corresponding chemical yields are provided on a basis of 10 mmol/hr glucose fed and 1 gDW of cells. The rightmost column provides the chemical yields for the same basis assuming a minimal redistribution of metabolic fluxes from the wild-type (undeleted) E. coli network (MOMA assumption). For the 1,3-propanediol case, glycerol secretion was disabled for both knockout strategies.
Succinate

ID | Knockouts | Enzyme | Biomass, max v_biomass (1/hr) | Succinate, max v_biomass (mmol/hr) | Succinate, min Σ(v_wt - v)² (mmol/hr) |
---|---|---|---|---|---|
Wild | "Complete network" | | 0.38 | 0.12 | 0 |
A | (1) COA + PYR → ACCOA + FOR; (2) NADH + PYR ↔ LAC + NAD | (1) Pyruvate formate lyase; (2) Lactate dehydrogenase | 0.31 | 10.70 | 1.65 |
B | (1) COA + PYR → ACCOA + FOR; (2) NADH + PYR ↔ LAC + NAD; (3) ACCOA + 2 NADH ↔ COA + ETH + 2 NAD | (1) Pyruvate formate lyase; (2) Lactate dehydrogenase; (3) Acetaldehyde dehydrogenase | 0.31 | 10.70 | 4.79 |
C | (1) ADP + PEP → ATP + PYR; (2) ACTP + ADP ↔ AC + ATP or ACCOA + Pi ↔ ACTP + COA; (3) GLC + PEP → G6P + PYR | (1) Pyruvate kinase; (2) Acetate kinase or Phosphotransacetylase; (3) Phosphotransferase system | 0.16 | 15.15 | 6.21 |

Lactate

ID | Knockouts | Enzyme | Biomass, max v_biomass (1/hr) | Lactate, max v_biomass (mmol/hr) | Lactate, min Σ(v_wt - v)² (mmol/hr) |
---|---|---|---|---|---|
Wild | "Complete network" | | 0.38 | 0 | 0 |
A | (1) ACTP + ADP ↔ AC + ATP or ACCOA + Pi ↔ ACTP + COA; (2) ACCOA + 2 NADH ↔ COA + ETH + 2 NAD | (1) Acetate kinase or Phosphotransacetylase; (2) Acetaldehyde dehydrogenase | 0.28 | 10.46 | 5.58 |
B | (1) ACTP + ADP ↔ AC + ATP or ACCOA + Pi ↔ ACTP + COA; (2) ATP + F6P → ADP + FDP or FDP ↔ T3P1 + T3P2 | (1) Acetate kinase or Phosphotransacetylase; (2) Phosphofructokinase or Fructose-1,6-bisphosphate aldolase | 0.13 | 19.00 | 0.19 |
C | (1) ACTP + ADP ↔ AC + ATP or ACCOA + Pi ↔ ACTP + COA; (2) ATP + F6P → ADP + FDP or FDP ↔ T3P1 + T3P2; (3) ACCOA + 2 NADH ↔ COA + ETH + 2 NAD; (4) GLC + ATP → G6P + ADP | (1) Acetate kinase or Phosphotransacetylase; (2) Phosphofructokinase or Fructose-1,6-bisphosphate aldolase; (3) Acetaldehyde dehydrogenase; (4) Glucokinase | 0.12 | 18.13 | 10.53 |

1,3-Propanediol

ID | Knockouts | Enzyme | Biomass, max v_biomass (1/hr) | 1,3-PD, max v_biomass (mmol/hr) | 1,3-PD, min Σ(v_wt - v)² (mmol/hr) |
---|---|---|---|---|---|
Wild | "Complete network" | | 1.06 | 0 | 0 |
A | (1) FDP → F6P + Pi or FDP ↔ T3P1 + T3P2; (2) 13PDG + ADP ↔ 3PG + ATP or NAD + Pi + T3P1 ↔ 13PDG + NADH; (3) GL + NAD ↔ GLAL + NADH | (1) Fructose-1,6-bisphosphatase or Fructose-1,6-bisphosphate aldolase; (2) Phosphoglycerate kinase or Glyceraldehyde-3-phosphate dehydrogenase; (3) Aldehyde dehydrogenase | 0.21 | 9.66 | 8.66 |
B | (1) T3P1 ↔ T3P2; (2) G6P + NADP ↔ D6PGL + NADPH or D6PGL → D6PGC; (3) DR5P → ACAL + T3P1; (4) GL + NAD ↔ GLAL + NADH | (1) Triosephosphate isomerase; (2) Glucose-6-phosphate-1-dehydrogenase or 6-Phosphogluconolactonase; (3) Deoxyribose-phosphate aldolase; (4) Aldehyde dehydrogenase | 0.29 | 9.67 | 9.54 |

2.2 EXAMPLE 2 1,3-Propanediol (PDO) Production
In addition to devising optimal gene knockout strategies, OptKnock was used to design strains in which gene additions are needed along with gene deletions, such as for PDO
production in E. coli. Although microbial 1,3-propanediol (PDO) production methods have been developed that use glycerol as the primary carbon source (Hartlep et al., 2002; Zhu et al., 2002), the production of PDO directly from glucose in a single microorganism has recently attracted considerable interest (Cameron et al., 1998; Biebl et al., 1999; Zeng & Biebl, 2002). Because wild-type E. coli lacks the pathway necessary for PDO production, the gene addition framework (Burgard and Maranas, 2001) was first employed to identify the additional reactions needed to produce PDO from glucose in E. coli.
The gene addition framework identified a straightforward three-reaction pathway: glycerol phosphatase converts glycerol-3-P to glycerol, which glycerol dehydratase and 1,3-propanediol oxidoreductase then convert to 1,3-propanediol. These reactions were then added to the E. coli stoichiometric model, and the OptKnock procedure was subsequently applied.
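The gene addition step can be illustrated by searching for the smallest set of non-native reactions that lets the host secrete the product. The sketch below assumes a hypothetical host with a lumped glucose-to-glycerol-3-P step and a small "universal" set: the three reactions named above plus one decoy. Water and cofactors are omitted, and `hpald` stands for the 3-hydroxypropionaldehyde intermediate. It uses an exhaustive subset search ordered by size, as a stand-in for the optimization-based search, which the claims note may be posed as a mixed-integer problem.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy host and "universal" reaction set (water and cofactors omitted).
METS = ["glc", "g3p", "glyc", "hpald", "pdo"]
NATIVE = {  # name: (stoichiometry, lower bound, upper bound)
    "EX_glc": ({"glc": 1}, 10.0, 10.0),  # glucose uptake fixed at 10
    "GLYC": ({"glc": -1, "g3p": 2}, 0.0, 1000.0),  # lumped toy step to glycerol-3-P
    "OTHER": ({"g3p": -1}, 0.0, 1000.0),  # lumped native use of glycerol-3-P
    "EX_pdo": ({"pdo": -1}, 0.0, 1000.0),  # PDO secretion (no native route to pdo)
}
UNIVERSAL = {  # candidate non-native reactions
    "GPP": ({"g3p": -1, "glyc": 1}, 0.0, 1000.0),  # glycerol phosphatase
    "GLDHT": ({"glyc": -1, "hpald": 1}, 0.0, 1000.0),  # glycerol dehydratase
    "PDOR": ({"hpald": -1, "pdo": 1}, 0.0, 1000.0),  # 1,3-propanediol oxidoreductase
    "DECOY": ({"glyc": -1, "g3p": 1}, 0.0, 1000.0),  # hypothetical, no help for PDO
}


def max_product(added):
    """Maximum PDO secretion when the reactions in `added` join the host network."""
    rxns = {**NATIVE, **{r: UNIVERSAL[r] for r in added}}
    names = list(rxns)
    S = np.array([[rxns[r][0].get(m, 0.0) for r in names] for m in METS])
    S = S[np.any(S != 0, axis=1)]  # drop metabolites that no active reaction touches
    c = np.zeros(len(names))
    c[names.index("EX_pdo")] = -1.0  # linprog minimizes, so negate
    bounds = [(rxns[r][1], rxns[r][2]) for r in names]
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    return -res.fun if res.status == 0 else 0.0


def minimal_additions(tol=1e-6):
    """Smallest set of non-native reactions enabling PDO secretion (exhaustive search)."""
    for k in range(len(UNIVERSAL) + 1):
        for added in itertools.combinations(UNIVERSAL, k):
            rate = max_product(added)
            if rate > tol:
                return added, rate
    return None, 0.0


added, rate = minimal_additions()
print(f"non-native reactions needed: {added}; max PDO rate: {rate:.1f}")
```

No set of one or two candidates reaches the product, so the search returns the three-reaction pathway and skips the decoy.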
OptKnock revealed that no single or double deletion mutant couples PDO production with biomass production. However, one triple and multiple quadruple knockout strategies that can couple PDO production with biomass production were identified. Two of these knockout strategies are shown in Table I. The results suggest that the removal of certain key functionalities from the E. coli network yields PDO-overproducing mutants for growth on glucose. Specifically, Table I reveals that the removal of two glycolytic reactions, along with an additional knockout preventing the degradation of glycerol, yields a network capable of reaching 72% of the theoretical maximum yield of PDO at the maximum biomass yield. Note that the glyceraldehyde-3-phosphate dehydrogenase (gapA) knockout was used by DuPont in their PDO-overproducing E. coli strain (Nakamura, 2002). Mutant B reveals an alternative strategy involving the removal of the triose phosphate isomerase (tpi) enzyme, which exhibits a similar PDO yield and a 38% higher biomass yield. Interestingly, a yeast strain deficient in triose phosphate isomerase activity was recently reported to produce glycerol, a key precursor to PDO, at 80-90% of its maximum theoretical yield (Compagno et al., 1996).
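The percentages in this paragraph follow from the Table I values and the glycerol-route theoretical maximum of 1.34 mmol PDO per mmol glucose stated later in this example, scaled by the 10 mmol/hr glucose basis. A short arithmetic check:

```python
# Values transcribed from Table I (1,3-propanediol, max-biomass columns) and the
# glycerol-route theoretical maximum quoted in Example 2.
GLUCOSE_UPTAKE = 10.0  # mmol/hr, fixed in all simulations
PDO_MAX_YIELD = 1.34  # mmol PDO per mmol glucose (glycerol route)
mutant_a = {"biomass": 0.21, "pdo": 9.66}
mutant_b = {"biomass": 0.29, "pdo": 9.67}

pdo_max_rate = PDO_MAX_YIELD * GLUCOSE_UPTAKE  # theoretical maximum, mmol/hr
frac_a = mutant_a["pdo"] / pdo_max_rate
frac_b = mutant_b["pdo"] / pdo_max_rate
biomass_gain = mutant_b["biomass"] / mutant_a["biomass"] - 1

print(f"mutant A PDO: {frac_a:.0%} of theoretical maximum")
print(f"mutant B PDO: {frac_b:.0%} of theoretical maximum")
print(f"mutant B biomass vs. mutant A: +{biomass_gain:.0%}")
```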
Review of the flux distributions of the wild-type E. coli, mutant A, and mutant B networks that maximize the biomass yield indicates that, not surprisingly, further conversion of glycerol to glyceraldehyde is disrupted in both mutants A and B. For mutant A, the removal of two reactions from the top and bottom parts of glycolysis results in a nearly complete inactivation of the pentose phosphate and glycolysis pathways (with the exception of triose phosphate isomerase). To compensate, the Entner-Doudoroff pathway is activated to channel flux from glucose to pyruvate and glyceraldehyde-3-phosphate (GAP). GAP is then converted to glycerol, which is subsequently converted to PDO. The energy generation lost with the decrease in glycolytic fluxes relative to the wild-type E. coli network is now supplied by an increase in TCA cycle fluxes.
The knockouts suggested for mutant B redirect flux toward the production of PDO by a distinctly different mechanism. The removal of the initial pentose phosphate pathway reaction routes all of the metabolic flux through the first steps of glycolysis. At the fructose bisphosphate aldolase junction, the flow is split into the two product metabolites: dihydroxyacetone phosphate (DHAP), which is converted to PDO, and GAP, which continues through the second half of glycolysis. The removal of the triose phosphate isomerase reaction prevents any interconversion between DHAP and GAP.
Interestingly, a fourth knockout is predicted to retain the coupling between biomass formation and chemical production. This knockout prevents flux from "leaking" through a complex pathway of 15 reactions that together convert ribose-5-phosphate (R5P) to acetate and GAP, a route that would otherwise decouple growth from chemical production.
Next, the envelope of allowable PDO production versus biomass yield is explored for the two mutants listed in Table I. The production limits of the mutants and of the original E. coli network reveal that the wild-type E. coli network has no "incentive" to produce PDO if the biomass yield is to be maximized. On the other hand, given the reduced functionalities of the network following the gene removals, both mutants A and B must produce significant amounts of PDO if any biomass is to be formed.
Mutant A, by avoiding the tpi knockout that essentially fixes the ratio of biomass to PDO production, has a higher maximum theoretical yield of PDO. The above-described results hinge on the use of glycerol as a key intermediate to PDO.
Next, the possibility of utilizing an alternative to the glycerol conversion route for 1,3-propanediol production was explored.
Applicants identified a pathway in Chloroflexus aurantiacus involving a two-step NADPH-dependent reduction of malonyl-CoA to generate 3-hydroxypropionic acid (3-HPA) (Menendez et al., 1999; Hugler et al., 2002). 3-HPA could then be converted chemically to 1,3-propanediol, given that there is no biological functionality to achieve this transformation. This pathway offers a key advantage over PDO production through the glycerol route because its initial step (acetyl-CoA carboxylase) is a carbon-fixing reaction. Accordingly, the maximum theoretical yield of 3-HPA (1.79 mmol/mmol glucose) is considerably higher than that of PDO production through the glycerol conversion route (1.34 mmol/mmol glucose). The application of the OptKnock framework upon the addition of the 3-HPA production pathway revealed that many more knockouts are required before biomass formation is coupled with 3-HPA production. One of the most interesting strategies involves nine knockouts yielding 3-HPA production at 91% of its theoretical maximum at optimal growth. The first three knockouts were relatively straightforward, as they involved removal of the competing acetate, lactate, and ethanol production mechanisms.
In addition, the Entner-Doudoroff pathway (either phosphogluconate dehydratase or 2-keto-3-deoxy-6-phosphogluconate aldolase), four respiration reactions (i.e., NADH dehydrogenase I, NADH dehydrogenase II, glycerol-3-phosphate dehydrogenase, and the succinate dehydrogenase complex), and an initial glycolysis step (i.e., phosphoglucose isomerase) are disrupted. This strategy results in a 3-HPA yield that, assuming the maximum biomass yield, is 69% higher than that of the previously identified mutants using the glycerol conversion route.
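The 69% figure can be checked by combining the 3-HPA theoretical maximum (1.79 mmol/mmol glucose), the 91% fraction reached at optimal growth, and the Table I PDO rates of the two glycerol-route mutants. Because 91% is a rounded value, the check is approximate (about 68-69%).

```python
GLUCOSE_UPTAKE = 10.0  # mmol/hr
HPA_MAX_YIELD = 1.79  # mmol 3-HPA per mmol glucose (malonyl-CoA route)
PDO_MAX_YIELD = 1.34  # mmol PDO per mmol glucose (glycerol route)
HPA_FRACTION = 0.91  # nine-knockout mutant at optimal growth (rounded)
glycerol_route = {"A": 9.66, "B": 9.67}  # Table I PDO rates, mmol/hr

hpa_rate = HPA_FRACTION * HPA_MAX_YIELD * GLUCOSE_UPTAKE  # mmol/hr at max biomass
print(f"theoretical ceiling, 3-HPA vs. PDO: +{HPA_MAX_YIELD / PDO_MAX_YIELD - 1:.0%}")
for name, pdo_rate in glycerol_route.items():
    print(f"3-HPA mutant vs. glycerol-route mutant {name}: +{hpa_rate / pdo_rate - 1:.0%}")
```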
2.3 EXAMPLE 3 Alternative Cellular Objective: Minimization of Metabolic Adjustment
All results described previously were obtained by invoking the maximization of biomass yield as the cellular objective that drives flux allocation. This hypothesis essentially assumes that the metabolic network can arbitrarily change and/or even rewire regulatory loops to maintain maximal biomass yield under changing environmental conditions (maximal response). Recent evidence suggests that this is sometimes achieved by the K-12 strain of E. coli after multiple cycles of growth selection (Ibarra et al., 2002).
In this section, a contrasting hypothesis (i.e., minimization of metabolic adjustment (MOMA) (Segre et al., 2002)) was examined, which assumes a myopic (minimal) response by the metabolic network to gene deletions. Specifically, the MOMA hypothesis suggests that the metabolic network will attempt to remain as close as possible to the original steady state of the system, which the gene deletion(s) rendered unreachable. This hypothesis has been shown to provide a more accurate description of flux allocation immediately after a gene deletion event (Segre et al., 2002). For this study, the MOMA objective was used to predict the flux distributions in the mutant strains identified by OptKnock.
The base case for the lactate and succinate simulations was assumed to be maximum biomass formation under anaerobic conditions, while the base case for the PDO simulations was maximum biomass formation under aerobic conditions. The results are shown in the last column of Table I. In all cases, the suggested multiple gene knockout strategies give only slightly lower chemical production yields for the MOMA case than under the maximum biomass hypothesis. This implies that the OptKnock results are fairly robust with respect to the choice of cellular objective.
3.0 Alternative Embodiments
The publications and other material used herein to illuminate the background of the invention or provide additional details respecting the practice are herein incorporated by reference in their entirety. The present invention contemplates numerous variations, including variations in organisms, cellular objectives, bioengineering objectives, types of optimization problems formed, and solutions used. These and/or other variations, modifications or alterations may be made therein without departing from the spirit and the scope of the invention as set forth in the appended claims.
REFERENCES
Anthony, C. (1982) The Biochemistry of Methylotrophs (Academic Press).
Arita, M. (2000) Simulation Practice and Theory 8, 109-125.
Arita, M. (2004) Proc Natl Acad Sci U S A 101, 1543-7.
Badarinarayana, V., Estep, P.W., 3rd, Shendure, J., Edwards, J., Tavazoie, S., Lam, F., Church, G.M. (2001) Nat Biotechnol 19(11): 1060-5.
Bailey, J. E. (1991) Science 252, 1668-75.
Bailey, J. E. (2001) Nat Biotechnol 19, 503-4.
Bard, J. F. (1998) Practical Bilevel Optimization: Algorithms and Applications (Kluwer Academic, Dordrecht; Boston).
Biebl, H., Menzel, K., Zeng, A. P. & Deckwer, W. D. (1999) Appl Environ Microbiol 52, 289-297.
Bond, D. R. & Lovley, D. R. (2003) Appl Environ Microbiol 69, 1548-55.
Bond, D. R., Holmes, D. E., Tender, L. M. & Lovley, D. R. (2002) Science 295, 483-5.
Brown, M. (1999) Perl programmer's reference (Osborne/McGraw-Hill, Berkeley, Calif.).
Burgard, A. P. & Maranas, C. D. (2001) Biotechnol Bioeng 74, 364-375.
Burgard, A. P., Pharkya, P. & Maranas, C. D. (2003) Biotechnol Bioeng 84, 647-57.
Burgard, A. P., Maranas, C. D. (2003) Biotechnol Bioeng 82(6): 670-7.
Burgard, A. P., Vaidyaraman, S., Maranas, C. D. (2001) Biotechnol Prog 17: 791-797.
Cameron, D. C., Altaras, N. E., Hoffman, M. L. & Shaw, A. J. (1998) Biotechnol Prog 14(1), 116-25.
Castellanos, M., Wilson, D. B. & Shuler, M. L. (2004) Proc Natl Acad Sci U S A 101, 6681-6.
Causey, T. B., Shanmugam, K. T., Yomano, L. P. & Ingram, L. O. (2004) Proc Natl Acad Sci U S A 101, 2235-40.
Chin, H. L., Chen, Z. S. & Chou, C. P. (2003) Biotechnol Prog 19, 383-8.
Chistoserdova, L., Laukel, M., Portais, J. C., Vorholt, J. A. & Lidstrom, M. E. (2004) J Bacteriol 186, 22-8.
Chistoserdova, L., Vorholt, J. A., Thauer, R. K. & Lidstrom, M. E. (1998) Science 281, 99-102.
Compagno, C., Boschi, F., Ranzi, B. M. (1996) Biotechnol Prog 12(5): 591-5.
Covert, M.W., Palsson, B.O. (2002) J Biol Chem 277(31): 28058-64.
Covert, M. W., Schilling, C. H. & Palsson, B. O. (2001) J Theor Biol 213(1), 73-88.
Das, D. & Veziroglu, T. N. (2001) International Journal of Hydrogen Energy 26, 13-28.
Datta, R., Tsai, S., Bonsignore, P., Moon, S. & Frank, J. R. (1995) FEMS Microbiol Rev 16, 221-231.
David, H., Akesson, M. & Nielsen, J. (2003) Eur J Biochem 270, 4243-53.
Desai, R. P., Nielsen, L. K. & Papoutsakis, E. T. (1999) J Biotechnol 71, 191-205.
Edwards, J. S. & Palsson, B. O. (2000) Proc Natl Acad Sci U S A 97(10), 5528-33.
Edwards, J. S., Ibarra, R. U. & Palsson, B. O. (2001) Nat Biotechnol 19(2), 125-30.
Ellis, L. B., Hou, B. K., Kang, W. & Wackett, L. P. (2003) Nucleic Acids Res 31, 262-5.
Eppstein, D. (1994) in 35th IEEE Symp. Foundations of Comp. Sci. (Santa Fe), pp. 154-165.
Fan, L. T., Bertok, B. & Friedler, F. (2002) Comput Chem 26, 265-92.
Finneran, K. T., Housewright, M. E. & Lovley, D. R. (2002) Environ Microbiol 4, 510-6.
Forster, J., Famili, I., Fu, P., Palsson, B. O. & Nielsen, J. (2003) Genome Res 13(2), 244-53.
Gupta, S., Clark, D. P. (1989) J Bacteriol 171(7): 3650-5.
Hartlep, M., Hussmann, W., Prayitno, N., Meynial-Salles, I. & Zeng, A. P. (2002) Appl Microbiol Biotechnol 60(1-2), 60-6.
Hatzimanikatis, V., Emmerling, M., Sauer, U., Bailey, J. E. (1998) Biotechnol Bioeng 58(2-3): 154-61.
Hatzimanikatis, V., Li, C., Ionita, J. A. & Broadbelt, L. J. (2003) presented at Biochemical Engineering XIII Conference; Session 2, Boulder, CO.
Heinrich, R. & Rapoport, T. A. (1974) Eur J Biochem 41, 89-95.
Hugler, M., Menendez, C., Schagger, H. & Fuchs, G. (2002) J Bacteriol 184(9), 2404-10.
Ibarra, R. U., Edwards, J. S., Palsson, B. O. (2002) Nature 420(6912): 186-9.
Ignizio, J.P., Cavalier, T.M. 1994. Linear programming. Englewood Cliffs, N.J., Prentice Hall.
Kacser, H. & Burns, J. A. (1973) Symp Soc Exp Biol 27, 65-104.
Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y. & Hattori, M. (2004) Nucleic Acids Res 32 Database issue, D277-80.
Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Collado-Vides, J., Paley, S. M., Pellegrini-Toole, A., Bonavides, C. & Gama-Castro, S. (2002) Nucleic Acids Res 30, 56-8.
Kataoka, N., Miya, A. & Kiriyama, K. (1997) Wat. Sci. Tech. 36, 41-47.
Kompala, D. S., Ramkrishna, D. & Tsao, G. T. (1984) Biotechnol Bioeng 26(11), 1281.
Korotkova, N., Chistoserdova, L. & Lidstrom, M. E. (2002) J Bacteriol 184, 6174-81.
Krieger, C. J., Zhang, P., Mueller, L. A., Wang, A., Paley, S., Arnaud, M., Pick, J., Rhee, S. Y. & Karp, P. D. (2004) Nucleic Acids Res 32 Database issue, D438-42.
Li, K. & Frost, J. W. (1998) Journal of American Chemical Society 120, 10545-10546.
Liu, H., Ramnarayanan, R. & Logan, B. E. (2004) Environmental Science and Technology 38, 2281-2285.
Lovley, D. R. (2003) Nat Rev Microbiol 1, 35-44.
Majewski, R. A. & Domach, M. M. (1990) Biotechnol Bioeng 35, 732-738.
Mavrovouniotis, M., Stephanopoulos, G. & Stephanopoulos, G. (1990) Biotechnol Bioeng 36, 1119-1132.
McShan, D. C., Rao, S. & Shah, I. (2003) Bioinformatics 19, 1692-8.
Menendez, C., Bauer, Z., Huber, H., Gad'on, N., Stetter, K. O. & Fuchs, G. (1999) J Bacteriol 181(4), 1088-98.
Methe, B. A., Nelson, K. E., Eisen, J. A., Paulsen, I. T., Nelson, W., Heidelberg, J. F., Wu, D., Wu, M., Ward, N., Beanan, M. J., et al. (2003) Science 302, 1967-9.
Misawa, N., Yamano, S. & Ikenaga, H. (1991) Appl Environ Microbiol 57, 1847-9.
Nakamura, C. E. & Whited, G. M. (2003) Curr Opin Biotechnol 14, 454-9.
Nakamura, C. E. (2002) Production of 1,3-Propanediol by E. coli. Presented at Metab Eng IV Conf, Tuscany, Italy.
Nandi, R. & Sengupta, S. (1996) Enzyme and Microbial Technology 19, 20-25.
Nandi, R. & Sengupta, S. (1998) Crit Rev Microbiol 24, 61-84.
Neidhardt, F.C., Curtiss, R. 1996. Escherichia coli and Salmonella : cellular and molecular biology. Washington, D.C., ASM Press.
Overbeek, R., Larsen, N., Pusch, G. D., D'Souza, M., Selkov, E., Jr., Kyrpides, N., Fonstein, M., Maltsev, N. & Selkov, E. (2000) Nucleic Acids Res 28, 123-5.
Papin, J. A., Price, N. D., Wiback, S. J., Fell, D. A. & Palsson, B. (2003) Metabolic Pathways in the Post-Genome Era. Trends Biochem Sci, accepted.
Papoutsakis, E. & Meyer, C. (1985) Biotechnol Bioeng 27, 50-66.
Papoutsakis, E. (1984) Biotechnol Bioeng 26, 174-187.
Pharkya, P., Burgard, A. P. & Maranas, C. D. (2003) Biotechnol Bioeng 84, 887-99.
Price, N. D., Papin, J. A., Schilling, C. H. & Palsson, B. (2003) Genome-scale Microbial In Silico Models: The Constraints-Based Approach. Trends Biotechnol, accepted.
Ramakrishna, R., Edwards, J. S., McCulloch, A. & Palsson, B. O. (2001) Am J Physiol Regul Integr Comp Physiol 280(3), R695-704.
Ramakrishna, R., Ramakrishna, D. & Konopka, A. E. (1996) Biotechnol Bioeng 52, 151.
Reed, J. L., Vo, T. D., Schilling, C. H. & Palsson, B. O. (2003) Genome Biol 4, R54.
Schilling, C. H., Covert, M. W., Famili, I., Church, G. M., Edwards, J. S. & Palsson, B. O. (2002) J Bacteriol 184(16), 4582-93.
Schilling, C. H., Palsson, B. O. (2000) J Theor Biol 203(3): 249-83.
Segre, D., Vitkup, D. & Church, G. M. (2002) Proc Natl Acad Sci U S A 99(23), 15112-7.
Segre, D., Zucker, J., Katz, J., Lin, X., D'Haeseleer, P., Rindone, W. P., Kharchenko, P., Nguyen, D. H., Wright, M. A. & Church, G. M. (2003) Omics 7, 301-16.
Selkov, E., Jr., Grechkin, Y., Mikhailova, N. & Selkov, E. (1998) Nucleic Acids Res 26, 43-5.
Seressiotis, A. & Bailey, J. E. (1988) Biotechnol Bioeng 31, 587-602.
Stephanopoulos, G., Aristidou, A. A., Nielsen, J. 1998. Metabolic engineering : principles and methodologies. San Diego, Academic Press.
Stephanopoulos, G. & Sinskey, A. J. (1993) Trends Biotechnol 11, 392-6.
Stols, L. & Donnelly, M. I. (1997) Appl Environ Microbiol 63(7), 2695-701.
Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T. S., Matsuzaki, Y., Miyoshi, F., Saito, K., Tanida, S., Yugi, K., Venter, J. C., et al. (1999) Bioinformatics 15, 72-84.
Valdes, J., Veloso, F., Jedlicki, E. & Holmes, D. (2003) BMC Genomics 4, 51.
Van Dien, S. J. & Lidstrom, M. E. (2002) Biotechnol Bioeng 78, 296-312.
Van Dien, S. J., Strovas, T. & Lidstrom, M. E. (2003) Biotechnol Bioeng 84, 45-55.
Varma, A., Boesch, B. W. & Palsson, B. O. (1993) Appl Environ Microbiol 59(8), 2465-73.
Varma, A. & Palsson, B. O. (1993) J Theor Biol 165, 503-522.
Varma, A. & Palsson, B. O. (1994) Bio/Technology 12, 994-998.
Varner, J., Ramkrishna, D. (1999) Biotechnol Prog 15(3): 407-25.
Varner, J. & Ramkrishna, D. (1999) Curr Opin Biotechnol 10, 146-150.
Zeikus, J. G., Jain, M. K. & Elankovan, P. (1999) Appl Microbiol Biotechnol 51, 545-552.
Zeng, A. P., Biebl, H. (2002) Adv Biochem Eng Biotechnol 74: 239-59.
Zhu, M. M., Lawman, P. D., Cameron, D. C. (2002) Biotechnol Prog 18(4): 694-9.
Claims (14)
1. A computer-assisted method for identifying functionalities to add to an organism-specific metabolic network to enable a desired biotransformation in a host, comprising:
accessing reactions from a universal database to provide stoichiometric balance;
identifying at least one stoichiometrically balanced pathway at least partially based on the reactions and a substrate to minimize a number of non-native functionalities in the production host; and incorporating the at least one stoichiometrically balanced pathway into the host to provide the desired biotransformation.
2. The computer-assisted method of claim 1 wherein the step of identifying the at least one stoichiometrically balanced pathway includes solving an optimization problem.
3. The computer-assisted method of claim 2 wherein the optimization problem is formed by coupling at least one cellular objective with a bioengineering objective.
4. A computer-assisted method for identifying functionalities to add to an organism-specific metabolic network to enable a desired biotransformation in a production host, comprising:
accessing reactions from a universal database, having them stoichiometrically balanced;
calculating a maximum theoretical yield of a product associated with a substrate;
identifying at least one stoichiometrically balanced pathway based on the reactions, the substrate, and the maximum theoretical yield of the product to minimize a number of non-native functionalities in the production host; and incorporating the at least one stoichiometrically balanced pathway into the production host to provide the desired biotransformation.
5. The computer-assisted method of claim 4 wherein the step of identifying at least one stoichiometrically balanced pathway includes solving an optimization problem.
6. The computer-assisted method of claim 5 wherein the optimization problem is a linear programming problem.
7. The computer-assisted method of claim 5 wherein the optimization problem is a mixed-integer optimization problem.
8. The computer-assisted method of claim 5 wherein the optimization problem is a bi-level optimization problem.
9. The computer-assisted method of claim 5 wherein the optimization problem couples at least one cellular objective with a bioengineering objective.
10. The computer-assisted method of claim 1 further comprising storing the organism-specific metabolic network as modified with the desired biotransformation.
11. A stored representation of a modified metabolic network based on an organism-specific metabolic network with added functionalities to enable a desired biotransformation of a production host, the stored representation comprising a plurality of metabolic pathways which include at least one stoichiometrically balanced pathway formed by (a) accessing reactions from a universal database to provide stoichiometric balance; (b) calculating a maximum theoretical yield of a product associated with a substrate; (c) identifying the at least one stoichiometrically balanced pathway based on the reactions, a substrate, and the maximum theoretical yield of the product to minimize a number of non-native functionalities in the production host; and (d) incorporating the at least one stoichiometrically balanced pathway into the production host to provide the desired biotransformation.
12. The stored representation of claim 11 wherein the stored representation is stored in a memory.
13. The stored representation of claim 11 wherein the stored representation is stored in a signal.
14. The stored representation of claim 11 wherein the stored representation is stored in a bioengineered organism.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2004/027614 WO2006025817A1 (en) | 2004-08-26 | 2004-08-26 | Method for redesign of microbial production systems |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2578028A1 (en) | 2006-03-09 |
Family
ID=36000353
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002578028A Abandoned CA2578028A1 (en) | 2004-08-26 | 2004-08-26 | Method for redesign of microbial production systems |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1782061A4 (en) |
JP (1) | JP2008510485A (en) |
AU (1) | AU2004322970B2 (en) |
CA (1) | CA2578028A1 (en) |
WO (1) | WO2006025817A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20190006103A (en) | 2010-05-05 | 2019-01-16 | 게노마티카 인코포레이티드 | Microorganisms and methods for the biosynthsis of butadiene |
JP2014150747A (en) * | 2013-02-06 | 2014-08-25 | Sekisui Chem Co Ltd | Mutant microorganisms and methods for producing succinic acid |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2002239855B2 (en) * | 2001-01-10 | 2006-11-23 | The Penn State Research Foundation | Method and system for modeling cellular metabolism |
US7127379B2 (en) * | 2001-01-31 | 2006-10-24 | The Regents Of The University Of California | Method for the evolutionary design of biochemical reaction networks |
CA2439260C (en) * | 2001-03-01 | 2012-10-23 | The Regents Of The University Of California | Models and methods for determining systemic properties of regulated reaction networks |
EP1552472A4 (en) * | 2002-10-15 | 2008-02-20 | Univ California | Methods and systems to identify operational reaction pathways |
JP4742528B2 (en) * | 2003-06-30 | 2011-08-10 | 味の素株式会社 | Analysis method of intracellular metabolic flux using isotope-labeled substrate |
JP4477389B2 (en) * | 2004-03-24 | 2010-06-09 | 株式会社エヌ・ティ・ティ・データ | Information relationship analysis apparatus, information relationship analysis method, program, and recording medium |
-
2004
- 2004-08-26 WO PCT/US2004/027614 patent/WO2006025817A1/en active Application Filing
- 2004-08-26 EP EP04782168A patent/EP1782061A4/en not_active Withdrawn
- 2004-08-26 JP JP2007529791A patent/JP2008510485A/en active Pending
- 2004-08-26 AU AU2004322970A patent/AU2004322970B2/en not_active Ceased
- 2004-08-26 CA CA002578028A patent/CA2578028A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
EP1782061A4 (en) | 2009-08-12 |
WO2006025817A1 (en) | 2006-03-09 |
JP2008510485A (en) | 2008-04-10 |
AU2004322970A1 (en) | 2006-03-09 |
AU2004322970B2 (en) | 2010-04-01 |
EP1782061A1 (en) | 2007-05-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| EEER | Examination request | |
| FZDE | Discontinued | Effective date: 20130827 |