US20230217957A1 - Compositions and methods for glycated consumables
- Publication number
- US20230217957A1 (application US 18/095,351)
- Authority
- US
- United States
- Prior art keywords
- protein
- proteins
- feature
- values
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Concepts
- 239000000203 mixture Substances 0.000 title claims description 99
- 238000000034 method Methods 0.000 title abstract description 85
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 758
- 102000004169 proteins and genes Human genes 0.000 claims abstract description 757
- 235000018102 proteins Nutrition 0.000 claims description 716
- 241000196324 Embryophyta Species 0.000 claims description 29
- 235000013351 cheese Nutrition 0.000 claims description 20
- 235000000346 sugar Nutrition 0.000 claims description 20
- 239000000470 constituent Substances 0.000 claims description 17
- 239000003925 fat Substances 0.000 claims description 13
- HEMHJVSKTPXQMS-UHFFFAOYSA-M Sodium hydroxide Chemical compound [OH-].[Na+] HEMHJVSKTPXQMS-UHFFFAOYSA-M 0.000 claims description 12
- 235000021120 animal protein Nutrition 0.000 claims description 9
- 235000021118 plant-derived protein Nutrition 0.000 claims description 8
- 235000010469 Glycine max Nutrition 0.000 claims description 7
- 235000013336 milk Nutrition 0.000 claims description 7
- 239000008267 milk Substances 0.000 claims description 7
- 210000004080 milk Anatomy 0.000 claims description 7
- 235000014571 nuts Nutrition 0.000 claims description 6
- 239000007787 solid Substances 0.000 claims description 6
- 102000006395 Globulins Human genes 0.000 claims description 5
- 108010044091 Globulins Proteins 0.000 claims description 5
- 235000009854 Cucurbita moschata Nutrition 0.000 claims description 4
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 claims description 4
- 108010068370 Glutens Proteins 0.000 claims description 4
- 241000228143 Penicillium Species 0.000 claims description 4
- 239000008103 glucose Substances 0.000 claims description 4
- 108091005996 glycated proteins Proteins 0.000 claims description 4
- 235000020245 plant milk Nutrition 0.000 claims description 4
- 235000000832 Ayote Nutrition 0.000 claims description 3
- 235000009025 Carya illinoensis Nutrition 0.000 claims description 3
- 244000068645 Carya illinoensis Species 0.000 claims description 3
- 235000014036 Castanea Nutrition 0.000 claims description 3
- 241001070941 Castanea Species 0.000 claims description 3
- 244000241235 Citrullus lanatus Species 0.000 claims description 3
- 235000012828 Citrullus lanatus var citroides Nutrition 0.000 claims description 3
- 235000013162 Cocos nucifera Nutrition 0.000 claims description 3
- 244000060011 Cocos nucifera Species 0.000 claims description 3
- 235000007466 Corylus avellana Nutrition 0.000 claims description 3
- 240000004244 Cucurbita moschata Species 0.000 claims description 3
- 235000009804 Cucurbita pepo subsp pepo Nutrition 0.000 claims description 3
- 244000068988 Glycine max Species 0.000 claims description 3
- 235000003434 Sesamum indicum Nutrition 0.000 claims description 3
- 239000000839 emulsion Substances 0.000 claims description 3
- 235000021374 legumes Nutrition 0.000 claims description 3
- 239000003921 oil Substances 0.000 claims description 3
- 235000015136 pumpkin Nutrition 0.000 claims description 3
- 235000020234 walnut Nutrition 0.000 claims description 3
- 102000009027 Albumins Human genes 0.000 claims description 2
- 108010088751 Albumins Proteins 0.000 claims description 2
- 244000144725 Amygdalus communis Species 0.000 claims description 2
- 241000208223 Anacardiaceae Species 0.000 claims description 2
- 241000186146 Brevibacterium Species 0.000 claims description 2
- 241000222120 Candida <Saccharomycetales> Species 0.000 claims description 2
- 244000025254 Cannabis sativa Species 0.000 claims description 2
- 235000012766 Cannabis sativa ssp. sativa var. sativa Nutrition 0.000 claims description 2
- 235000012765 Cannabis sativa ssp. sativa var. spontanea Nutrition 0.000 claims description 2
- 241001266001 Cordyceps confragosa Species 0.000 claims description 2
- 241000235646 Cyberlindnera jadinii Species 0.000 claims description 2
- 241000235036 Debaryomyces hansenii Species 0.000 claims description 2
- 244000168141 Geotrichum candidum Species 0.000 claims description 2
- 235000017388 Geotrichum candidum Nutrition 0.000 claims description 2
- 108010061711 Gliadin Proteins 0.000 claims description 2
- 241000206596 Halomonas Species 0.000 claims description 2
- 241001138401 Kluyveromyces lactis Species 0.000 claims description 2
- 241000186660 Lactobacillus Species 0.000 claims description 2
- 241000194036 Lactococcus Species 0.000 claims description 2
- 241001609976 Leuconostocaceae Species 0.000 claims description 2
- 241000208467 Macadamia Species 0.000 claims description 2
- 241000192041 Micrococcus Species 0.000 claims description 2
- 241000192001 Pediococcus Species 0.000 claims description 2
- 244000271379 Penicillium camembertii Species 0.000 claims description 2
- 235000002245 Penicillium camembertii Nutrition 0.000 claims description 2
- 240000000064 Penicillium roqueforti Species 0.000 claims description 2
- 235000002233 Penicillium roqueforti Nutrition 0.000 claims description 2
- 240000006711 Pistacia vera Species 0.000 claims description 2
- 241000588671 Psychrobacter Species 0.000 claims description 2
- 240000004808 Saccharomyces cerevisiae Species 0.000 claims description 2
- 235000014680 Saccharomyces cerevisiae Nutrition 0.000 claims description 2
- 108010016634 Seed Storage Proteins Proteins 0.000 claims description 2
- 102000005686 Serum Globulins Human genes 0.000 claims description 2
- 108010045362 Serum Globulins Proteins 0.000 claims description 2
- 241000191940 Staphylococcus Species 0.000 claims description 2
- 241000194020 Streptococcus thermophilus Species 0.000 claims description 2
- 108050001277 Vegetative storage proteins Proteins 0.000 claims description 2
- 229920002494 Zein Polymers 0.000 claims description 2
- 235000020224 almond Nutrition 0.000 claims description 2
- 235000020113 brazil nut Nutrition 0.000 claims description 2
- 235000009120 camo Nutrition 0.000 claims description 2
- 235000020226 cashew nut Nutrition 0.000 claims description 2
- 235000005607 chanvre indien Nutrition 0.000 claims description 2
- 108091005896 globular proteins Proteins 0.000 claims description 2
- 102000034238 globular proteins Human genes 0.000 claims description 2
- 235000021312 gluten Nutrition 0.000 claims description 2
- 239000011487 hemp Substances 0.000 claims description 2
- 229940039696 lactobacillus Drugs 0.000 claims description 2
- 235000020233 pistachio Nutrition 0.000 claims description 2
- 108060006613 prolamin Proteins 0.000 claims description 2
- 235000008983 soft cheese Nutrition 0.000 claims description 2
- 239000005019 zein Substances 0.000 claims description 2
- 229940093612 zein Drugs 0.000 claims description 2
- 125000002791 glucosyl group Chemical group C1([C@H](O)[C@@H](O)[C@H](O)[C@H](O1)CO)* 0.000 claims 3
- 239000007858 starting material Substances 0.000 claims 3
- 244000205479 Bertholletia excelsa Species 0.000 claims 2
- 125000004042 4-aminobutyl group Chemical group [H]C([*])([H])C([H])([H])C([H])([H])C([H])([H])N([H])[H] 0.000 claims 1
- 235000012284 Bertholletia excelsa Nutrition 0.000 claims 1
- 241000723382 Corylus Species 0.000 claims 1
- 241000758791 Juglandaceae Species 0.000 claims 1
- 235000018330 Macadamia integrifolia Nutrition 0.000 claims 1
- 240000000912 Macadamia tetraphylla Species 0.000 claims 1
- 235000003800 Macadamia tetraphylla Nutrition 0.000 claims 1
- 244000000231 Sesamum indicum Species 0.000 claims 1
- 235000020194 almond milk Nutrition 0.000 claims 1
- 235000020258 cashew milk Nutrition 0.000 claims 1
- 235000020197 coconut milk Nutrition 0.000 claims 1
- 235000021105 fermented cheese Nutrition 0.000 claims 1
- 235000020259 hazelnut milk Nutrition 0.000 claims 1
- 235000020196 hemp milk Nutrition 0.000 claims 1
- 235000020260 pistachio milk Nutrition 0.000 claims 1
- 235000020271 pumpkin milk Nutrition 0.000 claims 1
- 235000020274 sesame milk Nutrition 0.000 claims 1
- 235000020261 walnut milk Nutrition 0.000 claims 1
- 238000012549 training Methods 0.000 abstract description 35
- 239000013598 vector Substances 0.000 description 41
- 239000000047 product Substances 0.000 description 31
- 230000006870 function Effects 0.000 description 28
- 238000003556 assay Methods 0.000 description 26
- 150000001413 amino acids Chemical group 0.000 description 22
- 238000000605 extraction Methods 0.000 description 22
- 230000002776 aggregation Effects 0.000 description 20
- 238000004220 aggregation Methods 0.000 description 20
- 239000004615 ingredient Substances 0.000 description 20
- 230000003993 interaction Effects 0.000 description 20
- 238000004519 manufacturing process Methods 0.000 description 20
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 19
- 238000004458 analytical method Methods 0.000 description 17
- 235000013305 food Nutrition 0.000 description 17
- 230000000694 effects Effects 0.000 description 12
- 238000011282 treatment Methods 0.000 description 12
- 230000008569 process Effects 0.000 description 11
- 239000000243 solution Substances 0.000 description 11
- 239000003153 chemical reaction reagent Substances 0.000 description 10
- 235000019197 fats Nutrition 0.000 description 10
- 230000036252 glycation Effects 0.000 description 10
- 230000013595 glycosylation Effects 0.000 description 10
- 238000006206 glycosylation reaction Methods 0.000 description 10
- 125000003275 alpha amino acid group Chemical group 0.000 description 9
- 238000012512 characterization method Methods 0.000 description 9
- 238000006243 chemical reaction Methods 0.000 description 9
- 230000002596 correlated effect Effects 0.000 description 9
- 230000003247 decreasing effect Effects 0.000 description 9
- 230000026731 phosphorylation Effects 0.000 description 9
- 238000006366 phosphorylation reaction Methods 0.000 description 9
- 150000008163 sugars Chemical class 0.000 description 9
- 238000013459 approach Methods 0.000 description 8
- 230000027455 binding Effects 0.000 description 8
- 239000003085 diluting agent Substances 0.000 description 8
- 238000005457 optimization Methods 0.000 description 8
- 102000011632 Caseins Human genes 0.000 description 7
- 108010076119 Caseins Proteins 0.000 description 7
- 230000006399 behavior Effects 0.000 description 7
- 235000013365 dairy product Nutrition 0.000 description 7
- 238000004925 denaturation Methods 0.000 description 7
- 230000036425 denaturation Effects 0.000 description 7
- 238000010801 machine learning Methods 0.000 description 7
- 238000005259 measurement Methods 0.000 description 7
- 238000012986 modification Methods 0.000 description 7
- 238000012545 processing Methods 0.000 description 7
- 239000000126 substance Substances 0.000 description 7
- 241001465754 Metazoa Species 0.000 description 6
- 238000013528 artificial neural network Methods 0.000 description 6
- 238000005119 centrifugation Methods 0.000 description 6
- 238000005516 engineering process Methods 0.000 description 6
- 235000013312 flour Nutrition 0.000 description 6
- 239000000499 gel Substances 0.000 description 6
- 230000004048 modification Effects 0.000 description 6
- 108091005981 phosphorylated proteins Proteins 0.000 description 6
- 230000009145 protein modification Effects 0.000 description 6
- 239000012460 protein solution Substances 0.000 description 6
- 240000007594 Oryza sativa Species 0.000 description 5
- 235000007164 Oryza sativa Nutrition 0.000 description 5
- 239000002253 acid Substances 0.000 description 5
- 150000007513 acids Chemical class 0.000 description 5
- 230000004931 aggregating effect Effects 0.000 description 5
- 230000000875 corresponding effect Effects 0.000 description 5
- 150000002632 lipids Chemical class 0.000 description 5
- 235000013372 meat Nutrition 0.000 description 5
- 239000000843 powder Substances 0.000 description 5
- 230000035484 reaction time Effects 0.000 description 5
- 230000009467 reduction Effects 0.000 description 5
- 235000009566 rice Nutrition 0.000 description 5
- 150000003839 salts Chemical class 0.000 description 5
- 238000004088 simulation Methods 0.000 description 5
- 244000105624 Arachis hypogaea Species 0.000 description 4
- OYPRJOBELJOOCE-UHFFFAOYSA-N Calcium Chemical compound [Ca] OYPRJOBELJOOCE-UHFFFAOYSA-N 0.000 description 4
- 235000010582 Pisum sativum Nutrition 0.000 description 4
- 240000004713 Pisum sativum Species 0.000 description 4
- 108010064851 Plant Proteins Proteins 0.000 description 4
- 101710159648 Uncharacterized protein Proteins 0.000 description 4
- 239000007864 aqueous solution Substances 0.000 description 4
- 230000008901 benefit Effects 0.000 description 4
- 235000014121 butter Nutrition 0.000 description 4
- 239000011575 calcium Substances 0.000 description 4
- 229910052791 calcium Inorganic materials 0.000 description 4
- 235000019519 canola oil Nutrition 0.000 description 4
- 239000000828 canola oil Substances 0.000 description 4
- 239000003054 catalyst Substances 0.000 description 4
- 238000013527 convolutional neural network Methods 0.000 description 4
- 238000009826 distribution Methods 0.000 description 4
- 239000000796 flavoring agent Substances 0.000 description 4
- 230000002068 genetic effect Effects 0.000 description 4
- 239000000693 micelle Substances 0.000 description 4
- 230000000269 nucleophilic effect Effects 0.000 description 4
- 239000002245 particle Substances 0.000 description 4
- 238000010187 selection method Methods 0.000 description 4
- 235000004977 Brassica sinapistrum Nutrition 0.000 description 3
- 241000207199 Citrus Species 0.000 description 3
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 description 3
- 240000003183 Manihot esculenta Species 0.000 description 3
- 235000016735 Manihot esculenta subsp esculenta Nutrition 0.000 description 3
- 235000009754 Vitis X bourquina Nutrition 0.000 description 3
- 235000012333 Vitis X labruscana Nutrition 0.000 description 3
- 240000006365 Vitis vinifera Species 0.000 description 3
- 235000014787 Vitis vinifera Nutrition 0.000 description 3
- 230000010933 acylation Effects 0.000 description 3
- 238000005917 acylation reaction Methods 0.000 description 3
- 239000000654 additive Substances 0.000 description 3
- 230000000996 additive effect Effects 0.000 description 3
- 239000013566 allergen Substances 0.000 description 3
- 239000006227 byproduct Substances 0.000 description 3
- BECPQYXYKAMYBN-UHFFFAOYSA-N casein, tech. Chemical compound NCCCCC(C(O)=O)N=C(O)C(CC(O)=O)N=C(O)C(CCC(O)=N)N=C(O)C(CC(C)C)N=C(O)C(CCC(O)=O)N=C(O)C(CC(O)=O)N=C(O)C(CCC(O)=O)N=C(O)C(C(C)O)N=C(O)C(CCC(O)=N)N=C(O)C(CCC(O)=N)N=C(O)C(CCC(O)=N)N=C(O)C(CCC(O)=O)N=C(O)C(CCC(O)=O)N=C(O)C(COP(O)(O)=O)N=C(O)C(CCC(O)=N)N=C(O)C(N)CC1=CC=CC=C1 BECPQYXYKAMYBN-UHFFFAOYSA-N 0.000 description 3
- 235000021240 caseins Nutrition 0.000 description 3
- 239000003795 chemical substances by application Substances 0.000 description 3
- 235000020971 citrus fruits Nutrition 0.000 description 3
- 150000001875 compounds Chemical class 0.000 description 3
- 239000006071 cream Substances 0.000 description 3
- 235000014113 dietary fatty acids Nutrition 0.000 description 3
- 238000007865 diluting Methods 0.000 description 3
- 239000000194 fatty acid Substances 0.000 description 3
- 229930195729 fatty acid Natural products 0.000 description 3
- 150000004665 fatty acids Chemical class 0.000 description 3
- 238000001914 filtration Methods 0.000 description 3
- 235000019634 flavors Nutrition 0.000 description 3
- 238000001879 gelation Methods 0.000 description 3
- 238000004811 liquid chromatography Methods 0.000 description 3
- 235000021073 macronutrients Nutrition 0.000 description 3
- 238000013507 mapping Methods 0.000 description 3
- 235000020232 peanut Nutrition 0.000 description 3
- 238000001556 precipitation Methods 0.000 description 3
- 230000012846 protein folding Effects 0.000 description 3
- 238000011002 quantification Methods 0.000 description 3
- UGTZMIPZNRIWHX-UHFFFAOYSA-K sodium trimetaphosphate Chemical compound [Na+].[Na+].[Na+].[O-]P1(=O)OP([O-])(=O)OP([O-])(=O)O1 UGTZMIPZNRIWHX-UHFFFAOYSA-K 0.000 description 3
- 238000007619 statistical method Methods 0.000 description 3
- 239000000758 substrate Substances 0.000 description 3
- 238000012360 testing method Methods 0.000 description 3
- HRPVXLWXLXDGHG-UHFFFAOYSA-N Acrylamide Chemical compound NC(=O)C=C HRPVXLWXLXDGHG-UHFFFAOYSA-N 0.000 description 2
- 241000251468 Actinopterygii Species 0.000 description 2
- 235000017060 Arachis glabrata Nutrition 0.000 description 2
- 235000010777 Arachis hypogaea Nutrition 0.000 description 2
- 235000018262 Arachis monticola Nutrition 0.000 description 2
- IJGRMHOSHXDMSA-UHFFFAOYSA-N Atomic nitrogen Chemical compound N#N IJGRMHOSHXDMSA-UHFFFAOYSA-N 0.000 description 2
- 241000219198 Brassica Species 0.000 description 2
- 235000014698 Brassica juncea var multisecta Nutrition 0.000 description 2
- 235000006008 Brassica napus var napus Nutrition 0.000 description 2
- 235000006618 Brassica rapa subsp oleifera Nutrition 0.000 description 2
- 244000188595 Brassica sinapistrum Species 0.000 description 2
- 244000045232 Canavalia ensiformis Species 0.000 description 2
- 240000009226 Corylus americana Species 0.000 description 2
- 235000001543 Corylus americana Nutrition 0.000 description 2
- 239000004971 Cross linker Substances 0.000 description 2
- SRBFZHDQGSBBOR-IOVATXLUSA-N D-xylopyranose Chemical compound O[C@@H]1COC(O)[C@H](O)[C@H]1O SRBFZHDQGSBBOR-IOVATXLUSA-N 0.000 description 2
- 102000004190 Enzymes Human genes 0.000 description 2
- 108090000790 Enzymes Proteins 0.000 description 2
- 244000020551 Helianthus annuus Species 0.000 description 2
- 235000003222 Helianthus annuus Nutrition 0.000 description 2
- 241000238631 Hexapoda Species 0.000 description 2
- VEXZGXHMUGYJMC-UHFFFAOYSA-N Hydrochloric acid Chemical compound Cl VEXZGXHMUGYJMC-UHFFFAOYSA-N 0.000 description 2
- 240000007049 Juglans regia Species 0.000 description 2
- 235000009496 Juglans regia Nutrition 0.000 description 2
- 108010070551 Meat Proteins Proteins 0.000 description 2
- 108091028043 Nucleic acid sequence Proteins 0.000 description 2
- 235000010617 Phaseolus lunatus Nutrition 0.000 description 2
- 244000040738 Sesamum orientale Species 0.000 description 2
- 229920002472 Starch Polymers 0.000 description 2
- 108060008539 Transglutaminase Proteins 0.000 description 2
- 235000021307 Triticum Nutrition 0.000 description 2
- 244000098338 Triticum aestivum Species 0.000 description 2
- 235000010749 Vicia faba Nutrition 0.000 description 2
- 240000006677 Vicia faba Species 0.000 description 2
- 235000002098 Vicia faba var. major Nutrition 0.000 description 2
- PYMYPHUHKUWMLA-UHFFFAOYSA-N arabinose Natural products OCC(O)C(O)C(O)C=O PYMYPHUHKUWMLA-UHFFFAOYSA-N 0.000 description 2
- SRBFZHDQGSBBOR-UHFFFAOYSA-N beta-D-Pyranose-Lyxose Natural products OC1COC(O)C(O)C1O SRBFZHDQGSBBOR-UHFFFAOYSA-N 0.000 description 2
- 159000000007 calcium salts Chemical class 0.000 description 2
- 238000005251 capillar electrophoresis Methods 0.000 description 2
- 239000005018 casein Substances 0.000 description 2
- 238000006555 catalytic reaction Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 230000006835 compression Effects 0.000 description 2
- 238000007906 compression Methods 0.000 description 2
- 238000005094 computer simulation Methods 0.000 description 2
- 235000012343 cottonseed oil Nutrition 0.000 description 2
- 235000020247 cow milk Nutrition 0.000 description 2
- 238000003066 decision tree Methods 0.000 description 2
- 238000013136 deep learning model Methods 0.000 description 2
- 239000008367 deionised water Substances 0.000 description 2
- 229910021641 deionized water Inorganic materials 0.000 description 2
- XPPKVPWEQAFLFU-UHFFFAOYSA-J diphosphate(4-) Chemical compound [O-]P([O-])(=O)OP([O-])([O-])=O XPPKVPWEQAFLFU-UHFFFAOYSA-J 0.000 description 2
- 235000011180 diphosphates Nutrition 0.000 description 2
- 150000002016 disaccharides Chemical class 0.000 description 2
- 235000013399 edible fruits Nutrition 0.000 description 2
- 235000013601 eggs Nutrition 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 108091005632 fatty acylated proteins Proteins 0.000 description 2
- 239000000835 fiber Substances 0.000 description 2
- 238000004817 gas chromatography Methods 0.000 description 2
- 238000002290 gas chromatography-mass spectrometry Methods 0.000 description 2
- 238000001502 gel electrophoresis Methods 0.000 description 2
- 150000004676 glycans Chemical class 0.000 description 2
- 150000002402 hexoses Chemical class 0.000 description 2
- 239000007788 liquid Substances 0.000 description 2
- 238000004895 liquid chromatography mass spectrometry Methods 0.000 description 2
- 125000003588 lysine group Chemical group [H]N([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])(N([H])[H])C(*)=O 0.000 description 2
- 238000004949 mass spectrometry Methods 0.000 description 2
- 235000013622 meat product Nutrition 0.000 description 2
- 239000011785 micronutrient Substances 0.000 description 2
- 235000013369 micronutrients Nutrition 0.000 description 2
- 238000002156 mixing Methods 0.000 description 2
- 108091005573 modified proteins Proteins 0.000 description 2
- 102000035118 modified proteins Human genes 0.000 description 2
- 231100000252 nontoxic Toxicity 0.000 description 2
- 230000003000 nontoxic effect Effects 0.000 description 2
- 235000016709 nutrition Nutrition 0.000 description 2
- 235000019198 oils Nutrition 0.000 description 2
- 229920001542 oligosaccharide Polymers 0.000 description 2
- 150000002482 oligosaccharides Chemical class 0.000 description 2
- 150000002972 pentoses Chemical class 0.000 description 2
- 230000000865 phosphorylative effect Effects 0.000 description 2
- 229920001282 polysaccharide Polymers 0.000 description 2
- 239000005017 polysaccharide Substances 0.000 description 2
- 238000000513 principal component analysis Methods 0.000 description 2
- 230000004845 protein aggregation Effects 0.000 description 2
- 235000004252 protein component Nutrition 0.000 description 2
- 238000000164 protein isolation Methods 0.000 description 2
- 238000000455 protein structure prediction Methods 0.000 description 2
- 230000002797 proteolythic effect Effects 0.000 description 2
- 238000010791 quenching Methods 0.000 description 2
- 230000000171 quenching effect Effects 0.000 description 2
- 239000013049 sediment Substances 0.000 description 2
- 241000894007 species Species 0.000 description 2
- 238000004611 spectroscopical analysis Methods 0.000 description 2
- 235000019698 starch Nutrition 0.000 description 2
- 239000008107 starch Substances 0.000 description 2
- 230000000707 stereoselective effect Effects 0.000 description 2
- 238000003756 stirring Methods 0.000 description 2
- 231100000419 toxicity Toxicity 0.000 description 2
- 230000001988 toxicity Effects 0.000 description 2
- 102000003601 transglutaminase Human genes 0.000 description 2
- 238000005406 washing Methods 0.000 description 2
- WRIDQFICGBMAFQ-UHFFFAOYSA-N (E)-8-Octadecenoic acid Natural products CCCCCCCCCC=CCCCCCCC(O)=O WRIDQFICGBMAFQ-UHFFFAOYSA-N 0.000 description 1
- TWJNQYPJQDRXPH-UHFFFAOYSA-N 2-cyanobenzohydrazide Chemical compound NNC(=O)C1=CC=CC=C1C#N TWJNQYPJQDRXPH-UHFFFAOYSA-N 0.000 description 1
- LQJBNNIYVWPHFW-UHFFFAOYSA-N 20:1omega9c fatty acid Natural products CCCCCCCCCCC=CCCCCCCCC(O)=O LQJBNNIYVWPHFW-UHFFFAOYSA-N 0.000 description 1
- QSBYPNXLFMSGKH-UHFFFAOYSA-N 9-Heptadecensaeure Natural products CCCCCCCC=CCCCCCCCC(O)=O QSBYPNXLFMSGKH-UHFFFAOYSA-N 0.000 description 1
- 241000238818 Acheta domesticus Species 0.000 description 1
- 235000009434 Actinidia chinensis Nutrition 0.000 description 1
- 244000298697 Actinidia deliciosa Species 0.000 description 1
- 235000009436 Actinidia deliciosa Nutrition 0.000 description 1
- 108010005094 Advanced Glycation End Products Proteins 0.000 description 1
- ATRRKUHOCOJYRX-UHFFFAOYSA-N Ammonium bicarbonate Chemical compound [NH4+].OC([O-])=O ATRRKUHOCOJYRX-UHFFFAOYSA-N 0.000 description 1
- 229910000013 Ammonium bicarbonate Inorganic materials 0.000 description 1
- 235000011437 Amygdalus communis Nutrition 0.000 description 1
- 241000693997 Anacardium Species 0.000 description 1
- 235000001271 Anacardium Nutrition 0.000 description 1
- 244000226021 Anacardium occidentale Species 0.000 description 1
- 235000003911 Arachis Nutrition 0.000 description 1
- 241000208838 Asteraceae Species 0.000 description 1
- 235000007319 Avena orientalis Nutrition 0.000 description 1
- 241000209763 Avena sativa Species 0.000 description 1
- 235000007558 Avena sp Nutrition 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 241000283690 Bos taurus Species 0.000 description 1
- 238000009010 Bradford assay Methods 0.000 description 1
- 235000011331 Brassica Nutrition 0.000 description 1
- 235000003351 Brassica cretica Nutrition 0.000 description 1
- 240000002791 Brassica napus Species 0.000 description 1
- 235000003343 Brassica rupestris Nutrition 0.000 description 1
- 239000007848 Bronsted acid Substances 0.000 description 1
- UXVMQQNJUSDDNG-UHFFFAOYSA-L Calcium chloride Chemical compound [Cl-].[Cl-].[Ca+2] UXVMQQNJUSDDNG-UHFFFAOYSA-L 0.000 description 1
- 102000005701 Calcium-Binding Proteins Human genes 0.000 description 1
- 108010045403 Calcium-Binding Proteins Proteins 0.000 description 1
- 108700000434 Cannabis sativa edestin Proteins 0.000 description 1
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical compound [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 description 1
- 240000006162 Chenopodium quinoa Species 0.000 description 1
- 235000010523 Cicer arietinum Nutrition 0.000 description 1
- 244000045195 Cicer arietinum Species 0.000 description 1
- 108020004705 Codon Proteins 0.000 description 1
- 102000008186 Collagen Human genes 0.000 description 1
- 108010035532 Collagen Proteins 0.000 description 1
- VPAXJOUATWLOPR-UHFFFAOYSA-N Conferone Chemical compound C1=CC(=O)OC2=CC(OCC3C4(C)CCC(=O)C(C)(C)C4CC=C3C)=CC=C21 VPAXJOUATWLOPR-UHFFFAOYSA-N 0.000 description 1
- 101710086453 Conglutin Proteins 0.000 description 1
- 101710190853 Cruciferin Proteins 0.000 description 1
- 241000195493 Cryptophyta Species 0.000 description 1
- 241000219122 Cucurbita Species 0.000 description 1
- 240000001980 Cucurbita pepo Species 0.000 description 1
- 235000009852 Cucurbita pepo Nutrition 0.000 description 1
- HMFHBZSHGGEWLO-SOOFDHNKSA-N D-ribofuranose Chemical compound OC[C@H]1OC(O)[C@H](O)[C@@H]1O HMFHBZSHGGEWLO-SOOFDHNKSA-N 0.000 description 1
- 101100053236 Escherichia coli (strain K12) ymcE gene Proteins 0.000 description 1
- 241000220485 Fabaceae Species 0.000 description 1
- 229930091371 Fructose Natural products 0.000 description 1
- 239000005715 Fructose Substances 0.000 description 1
- RFSUNEUAIZKAJO-ARQDHWQXSA-N Fructose Chemical compound OC[C@H]1O[C@](O)(CO)[C@@H](O)[C@@H]1O RFSUNEUAIZKAJO-ARQDHWQXSA-N 0.000 description 1
- 108010010803 Gelatin Proteins 0.000 description 1
- 108010060231 Insect Proteins Proteins 0.000 description 1
- 241000207923 Lamiaceae Species 0.000 description 1
- 101710094902 Legumin Proteins 0.000 description 1
- 235000006439 Lemna minor Nutrition 0.000 description 1
- 244000207740 Lemna minor Species 0.000 description 1
- 241000408747 Lepomis gibbosus Species 0.000 description 1
- 239000002841 Lewis acid Substances 0.000 description 1
- OYHQOLUKZRVURQ-HZJYTTRNSA-N Linoleic acid Chemical compound CCCCC\C=C/C\C=C/CCCCCCCC(O)=O OYHQOLUKZRVURQ-HZJYTTRNSA-N 0.000 description 1
- 235000004431 Linum usitatissimum Nutrition 0.000 description 1
- 240000006240 Linum usitatissimum Species 0.000 description 1
- 102000004895 Lipoproteins Human genes 0.000 description 1
- 108090001030 Lipoproteins Proteins 0.000 description 1
- 240000000894 Lupinus albus Species 0.000 description 1
- 235000010649 Lupinus albus Nutrition 0.000 description 1
- 102000001621 Mucoproteins Human genes 0.000 description 1
- 108010093825 Mucoproteins Proteins 0.000 description 1
- 102000003505 Myosin Human genes 0.000 description 1
- 108060008487 Myosin Proteins 0.000 description 1
- 235000021360 Myristic acid Nutrition 0.000 description 1
- TUNFSRHWOTWDNC-UHFFFAOYSA-N Myristic acid Natural products CCCCCCCCCCCCCC(O)=O TUNFSRHWOTWDNC-UHFFFAOYSA-N 0.000 description 1
- 101710202365 Napin Proteins 0.000 description 1
- 241000207836 Olea <angiosperm> Species 0.000 description 1
- 239000005642 Oleic acid Substances 0.000 description 1
- ZQPPMHVWECSIRJ-UHFFFAOYSA-N Oleic acid Natural products CCCCCCCCC=CCCCCCCCC(O)=O ZQPPMHVWECSIRJ-UHFFFAOYSA-N 0.000 description 1
- 235000008753 Papaver somniferum Nutrition 0.000 description 1
- 240000001090 Papaver somniferum Species 0.000 description 1
- 244000046052 Phaseolus vulgaris Species 0.000 description 1
- 235000010627 Phaseolus vulgaris Nutrition 0.000 description 1
- 235000008331 Pinus X rigitaeda Nutrition 0.000 description 1
- 241000018646 Pinus brutia Species 0.000 description 1
- 235000011613 Pinus brutia Nutrition 0.000 description 1
- 235000003447 Pistacia vera Nutrition 0.000 description 1
- 229920000388 Polyphosphate Chemical class 0.000 description 1
- 235000001855 Portulaca oleracea Nutrition 0.000 description 1
- PYMYPHUHKUWMLA-LMVFSUKVSA-N Ribose Natural products OC[C@@H](O)[C@@H](O)[C@@H](O)C=O PYMYPHUHKUWMLA-LMVFSUKVSA-N 0.000 description 1
- 235000004443 Ricinus communis Nutrition 0.000 description 1
- 235000004789 Rosa xanthina Nutrition 0.000 description 1
- 241000220222 Rosaceae Species 0.000 description 1
- 235000017276 Salvia Nutrition 0.000 description 1
- 235000012377 Salvia columbariae var. columbariae Nutrition 0.000 description 1
- 240000005481 Salvia hispanica Species 0.000 description 1
- 235000001498 Salvia hispanica Nutrition 0.000 description 1
- 240000007164 Salvia officinalis Species 0.000 description 1
- 241000657513 Senna surattensis Species 0.000 description 1
- UIIMBOGNXHQVGW-DEQYMQKBSA-M Sodium bicarbonate-14C Chemical compound [Na+].O[14C]([O-])=O UIIMBOGNXHQVGW-DEQYMQKBSA-M 0.000 description 1
- 235000002595 Solanum tuberosum Nutrition 0.000 description 1
- 244000061456 Solanum tuberosum Species 0.000 description 1
- 235000021355 Stearic acid Nutrition 0.000 description 1
- 244000299461 Theobroma cacao Species 0.000 description 1
- 235000009470 Theobroma cacao Nutrition 0.000 description 1
- 235000001484 Trigonella foenum graecum Nutrition 0.000 description 1
- 244000250129 Trigonella foenum graecum Species 0.000 description 1
- 241001489212 Tuber Species 0.000 description 1
- 101710196023 Vicilin Proteins 0.000 description 1
- 240000004922 Vigna radiata Species 0.000 description 1
- 235000010721 Vigna radiata var radiata Nutrition 0.000 description 1
- 235000011469 Vigna radiata var sublobata Nutrition 0.000 description 1
- 235000009932 Zanthoxylum simulans Nutrition 0.000 description 1
- 244000089698 Zanthoxylum simulans Species 0.000 description 1
- 238000002835 absorbance Methods 0.000 description 1
- 238000007171 acid catalysis Methods 0.000 description 1
- 239000003377 acid catalyst Substances 0.000 description 1
- 230000002378 acidificating effect Effects 0.000 description 1
- 125000002015 acyclic group Chemical group 0.000 description 1
- 150000001263 acyl chlorides Chemical class 0.000 description 1
- 238000013019 agitation Methods 0.000 description 1
- 150000001299 aldehydes Chemical class 0.000 description 1
- HMFHBZSHGGEWLO-UHFFFAOYSA-N alpha-D-Furanose-Ribose Natural products OCC1OC(O)C(O)C1O HMFHBZSHGGEWLO-UHFFFAOYSA-N 0.000 description 1
- WQZGKKKJIJFFOK-PHYPRBDBSA-N alpha-D-galactose Chemical compound OC[C@H]1O[C@H](O)[C@H](O)[C@@H](O)[C@H]1O WQZGKKKJIJFFOK-PHYPRBDBSA-N 0.000 description 1
- 108010000011 amandin Proteins 0.000 description 1
- 150000001408 amides Chemical class 0.000 description 1
- 235000012538 ammonium bicarbonate Nutrition 0.000 description 1
- 239000001099 ammonium carbonate Substances 0.000 description 1
- PYMYPHUHKUWMLA-WDCZJNDASA-N arabinose Chemical compound OC[C@@H](O)[C@@H](O)[C@H](O)C=O PYMYPHUHKUWMLA-WDCZJNDASA-N 0.000 description 1
- 235000019606 astringent taste Nutrition 0.000 description 1
- QVGXLLKOCUKJST-UHFFFAOYSA-N atomic oxygen Chemical compound [O] QVGXLLKOCUKJST-UHFFFAOYSA-N 0.000 description 1
- 238000005815 base catalysis Methods 0.000 description 1
- 238000013398 bayesian method Methods 0.000 description 1
- WQZGKKKJIJFFOK-VFUOTHLCSA-N beta-D-glucose Chemical compound OC[C@H]1O[C@@H](O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-VFUOTHLCSA-N 0.000 description 1
- QKSKPIVNLNLAAV-UHFFFAOYSA-N bis(2-chloroethyl) sulfide Chemical compound ClCCSCCCl QKSKPIVNLNLAAV-UHFFFAOYSA-N 0.000 description 1
- OHJMTUPIZMNBFR-UHFFFAOYSA-N biuret Chemical compound NC(=O)NC(N)=O OHJMTUPIZMNBFR-UHFFFAOYSA-N 0.000 description 1
- 210000000988 bone and bone Anatomy 0.000 description 1
- 238000009937 brining Methods 0.000 description 1
- 235000021329 brown rice Nutrition 0.000 description 1
- 239000001110 calcium chloride Substances 0.000 description 1
- 235000011148 calcium chloride Nutrition 0.000 description 1
- 229910001628 calcium chloride Inorganic materials 0.000 description 1
- 150000001720 carbohydrates Chemical class 0.000 description 1
- 235000014633 carbohydrates Nutrition 0.000 description 1
- 229910052799 carbon Inorganic materials 0.000 description 1
- 150000001732 carboxylic acid derivatives Chemical class 0.000 description 1
- 150000001735 carboxylic acids Chemical class 0.000 description 1
- 239000000679 carrageenan Substances 0.000 description 1
- 235000010418 carrageenan Nutrition 0.000 description 1
- 229920001525 carrageenan Polymers 0.000 description 1
- 229940113118 carrageenan Drugs 0.000 description 1
- 229940021722 caseins Drugs 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 238000012993 chemical processing Methods 0.000 description 1
- 235000014167 chia Nutrition 0.000 description 1
- 238000002983 circular dichroism Methods 0.000 description 1
- 238000010367 cloning Methods 0.000 description 1
- 238000005345 coagulation Methods 0.000 description 1
- 230000015271 coagulation Effects 0.000 description 1
- 229920001436 collagen Polymers 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000004040 coloring Methods 0.000 description 1
- JECGPMYZUFFYJW-UHFFFAOYSA-N conferone Natural products CC1=CCC2C(C)(C)C(=O)CCC2(C)C1COc3cccc4C=CC(=O)Oc34 JECGPMYZUFFYJW-UHFFFAOYSA-N 0.000 description 1
- 230000021615 conjugation Effects 0.000 description 1
- 238000010411 cooking Methods 0.000 description 1
- 238000001816 cooling Methods 0.000 description 1
- AZSFNUJOCKMOGB-UHFFFAOYSA-K cyclotriphosphate(3-) Chemical class [O-]P1(=O)OP([O-])(=O)OP([O-])(=O)O1 AZSFNUJOCKMOGB-UHFFFAOYSA-K 0.000 description 1
- 230000018044 dehydration Effects 0.000 description 1
- 238000006297 dehydration reaction Methods 0.000 description 1
- 235000013367 dietary fats Nutrition 0.000 description 1
- 238000010790 dilution Methods 0.000 description 1
- 239000012895 dilution Substances 0.000 description 1
- NEKNNCABDXGBEN-UHFFFAOYSA-L disodium;4-(4-chloro-2-methylphenoxy)butanoate;4-(2,4-dichlorophenoxy)butanoate Chemical compound [Na+].[Na+].CC1=CC(Cl)=CC=C1OCCCC([O-])=O.[O-]C(=O)CCCOC1=CC=C(Cl)C=C1Cl NEKNNCABDXGBEN-UHFFFAOYSA-L 0.000 description 1
- 238000002296 dynamic light scattering Methods 0.000 description 1
- 239000003995 emulsifying agent Substances 0.000 description 1
- 150000002148 esters Chemical class 0.000 description 1
- 238000001506 fluorescence spectroscopy Methods 0.000 description 1
- 238000004108 freeze drying Methods 0.000 description 1
- 125000000524 functional group Chemical group 0.000 description 1
- 229930182830 galactose Natural products 0.000 description 1
- 239000007789 gas Substances 0.000 description 1
- 239000008273 gelatin Substances 0.000 description 1
- 229920000159 gelatin Polymers 0.000 description 1
- 235000019322 gelatine Nutrition 0.000 description 1
- 235000011852 gelatine desserts Nutrition 0.000 description 1
- 238000012252 genetic analysis Methods 0.000 description 1
- 239000010520 ghee Substances 0.000 description 1
- 235000020251 goat milk Nutrition 0.000 description 1
- 235000020993 ground meat Nutrition 0.000 description 1
- 235000011617 hard cheese Nutrition 0.000 description 1
- 238000010438 heat treatment Methods 0.000 description 1
- 229940005740 hexametaphosphate Drugs 0.000 description 1
- 235000020256 human milk Nutrition 0.000 description 1
- 210000004251 human milk Anatomy 0.000 description 1
- 239000000416 hydrocolloid Substances 0.000 description 1
- 230000007062 hydrolysis Effects 0.000 description 1
- 238000006460 hydrolysis reaction Methods 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-M hydroxide Chemical compound [OH-] XLYOFNOQVPJJNP-UHFFFAOYSA-M 0.000 description 1
- 235000015243 ice cream Nutrition 0.000 description 1
- 238000004255 ion exchange chromatography Methods 0.000 description 1
- 150000002500 ions Chemical class 0.000 description 1
- QXJSBBXBKPUZAA-UHFFFAOYSA-N isooleic acid Natural products CCCCCCCC=CCCCCCCCCC(O)=O QXJSBBXBKPUZAA-UHFFFAOYSA-N 0.000 description 1
- 238000003064 k means clustering Methods 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 239000010985 leather Substances 0.000 description 1
- 150000007517 lewis acids Chemical class 0.000 description 1
- 238000012417 linear regression Methods 0.000 description 1
- 235000020778 linoleic acid Nutrition 0.000 description 1
- OYHQOLUKZRVURQ-IXWMQOLASA-N linoleic acid Natural products CCCCC\C=C/C\C=C\CCCCCCCC(O)=O OYHQOLUKZRVURQ-IXWMQOLASA-N 0.000 description 1
- 230000004576 lipid-binding Effects 0.000 description 1
- 238000007477 logistic regression Methods 0.000 description 1
- 230000003050 macronutrient Effects 0.000 description 1
- 240000004308 marijuana Species 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 229910052751 metal Inorganic materials 0.000 description 1
- 239000002184 metal Chemical class 0.000 description 1
- 230000000813 microbial effect Effects 0.000 description 1
- 238000009629 microbiological culture Methods 0.000 description 1
- 239000000178 monomer Substances 0.000 description 1
- 150000002772 monosaccharides Chemical class 0.000 description 1
- 210000003205 muscle Anatomy 0.000 description 1
- 235000010460 mustard Nutrition 0.000 description 1
- 229910052757 nitrogen Inorganic materials 0.000 description 1
- 239000002777 nucleoside Substances 0.000 description 1
- 230000031787 nutrient reservoir activity Effects 0.000 description 1
- 235000015097 nutrients Nutrition 0.000 description 1
- 230000035764 nutrition Effects 0.000 description 1
- QIQXTHQIDYTFRH-UHFFFAOYSA-N octadecanoic acid Chemical compound CCCCCCCCCCCCCCCCCC(O)=O QIQXTHQIDYTFRH-UHFFFAOYSA-N 0.000 description 1
- OQCDKBAXFALNLD-UHFFFAOYSA-N octadecanoic acid Natural products CCCCCCCC(C)CCCCCCCCC(O)=O OQCDKBAXFALNLD-UHFFFAOYSA-N 0.000 description 1
- RAFYDKXYXRZODZ-UHFFFAOYSA-N octanoyl octanoate Chemical compound CCCCCCCC(=O)OC(=O)CCCCCCC RAFYDKXYXRZODZ-UHFFFAOYSA-N 0.000 description 1
- ZQPPMHVWECSIRJ-KTKRTIGZSA-N oleic acid Chemical compound CCCCCCCC\C=C/CCCCCCCC(O)=O ZQPPMHVWECSIRJ-KTKRTIGZSA-N 0.000 description 1
- 210000000956 olfactory bulb Anatomy 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- 235000016046 other dairy product Nutrition 0.000 description 1
- 239000001301 oxygen Substances 0.000 description 1
- 229910052760 oxygen Inorganic materials 0.000 description 1
- 238000002264 polyacrylamide gel electrophoresis Methods 0.000 description 1
- 239000001205 polyphosphate Chemical class 0.000 description 1
- 235000011176 polyphosphates Nutrition 0.000 description 1
- 230000004481 post-translational protein modification Effects 0.000 description 1
- 239000011736 potassium bicarbonate Substances 0.000 description 1
- 235000015497 potassium bicarbonate Nutrition 0.000 description 1
- 229910000028 potassium bicarbonate Inorganic materials 0.000 description 1
- TYJJADVDDVDEDZ-UHFFFAOYSA-M potassium hydrogencarbonate Chemical compound [K+].OC([O-])=O TYJJADVDDVDEDZ-UHFFFAOYSA-M 0.000 description 1
- 235000008476 powdered milk Nutrition 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 239000003755 preservative agent Substances 0.000 description 1
- XXRYFVCIMARHRS-UHFFFAOYSA-N propan-2-yl n-dimethoxyphosphorylcarbamate Chemical compound COP(=O)(OC)NC(=O)OC(C)C XXRYFVCIMARHRS-UHFFFAOYSA-N 0.000 description 1
- 230000006916 protein interaction Effects 0.000 description 1
- 230000033998 protein modification process Effects 0.000 description 1
- 238000000734 protein sequencing Methods 0.000 description 1
- 230000004850 protein–protein interaction Effects 0.000 description 1
- 235000020236 pumpkin seed Nutrition 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 239000000376 reactant Substances 0.000 description 1
- 230000009257 reactivity Effects 0.000 description 1
- 230000002787 reinforcement Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000000284 resting effect Effects 0.000 description 1
- 238000009938 salting Methods 0.000 description 1
- 238000005185 salting out Methods 0.000 description 1
- 239000000523 sample Substances 0.000 description 1
- 229920006395 saturated elastomer Polymers 0.000 description 1
- 235000021003 saturated fats Nutrition 0.000 description 1
- 235000013580 sausages Nutrition 0.000 description 1
- 238000004579 scanning voltage microscopy Methods 0.000 description 1
- 238000004062 sedimentation Methods 0.000 description 1
- 230000001953 sensory effect Effects 0.000 description 1
- 238000002864 sequence alignment Methods 0.000 description 1
- 235000020254 sheep milk Nutrition 0.000 description 1
- 238000004904 shortening Methods 0.000 description 1
- 238000011524 similarity measure Methods 0.000 description 1
- 239000000779 smoke Substances 0.000 description 1
- 239000011734 sodium Substances 0.000 description 1
- 235000011121 sodium hydroxide Nutrition 0.000 description 1
- 239000002904 solvent Substances 0.000 description 1
- 238000001179 sorption measurement Methods 0.000 description 1
- 238000009987 spinning Methods 0.000 description 1
- 235000020354 squash Nutrition 0.000 description 1
- 238000010186 staining Methods 0.000 description 1
- 239000008117 stearic acid Substances 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 235000020238 sunflower seed Nutrition 0.000 description 1
- 239000006228 supernatant Substances 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
- 235000019640 taste Nutrition 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 238000002849 thermal shift Methods 0.000 description 1
- 238000002411 thermogravimetry Methods 0.000 description 1
- 150000007970 thio esters Chemical class 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
- 235000001019 trigonella foenum-graecum Nutrition 0.000 description 1
- 239000001226 triphosphate Substances 0.000 description 1
- 235000011178 triphosphate Nutrition 0.000 description 1
- UNXRWKVEANCORM-UHFFFAOYSA-I triphosphate(5-) Chemical class [O-]P([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O UNXRWKVEANCORM-UHFFFAOYSA-I 0.000 description 1
- 235000021081 unsaturated fats Nutrition 0.000 description 1
- 239000000052 vinegar Substances 0.000 description 1
- 235000021419 vinegar Nutrition 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
- 239000011782 vitamin Substances 0.000 description 1
- 235000013343 vitamin Nutrition 0.000 description 1
- 229940088594 vitamin Drugs 0.000 description 1
- 229930003231 vitamin Natural products 0.000 description 1
- 239000008256 whipped cream Substances 0.000 description 1
- 235000013618 yogurt Nutrition 0.000 description 1
- UHVMMEOXYDMDKI-JKYCWFKZSA-L zinc;1-(5-cyanopyridin-2-yl)-3-[(1s,2s)-2-(6-fluoro-2-hydroxy-3-propanoylphenyl)cyclopropyl]urea;diacetate Chemical compound [Zn+2].CC([O-])=O.CC([O-])=O.CCC(=O)C1=CC=C(F)C([C@H]2[C@H](C2)NC(=O)NC=2N=CC(=CC=2)C#N)=C1O UHVMMEOXYDMDKI-JKYCWFKZSA-L 0.000 description 1
Classifications
-
- A—HUMAN NECESSITIES
- A23—FOODS OR FOODSTUFFS; TREATMENT THEREOF, NOT COVERED BY OTHER CLASSES
- A23C—DAIRY PRODUCTS, e.g. MILK, BUTTER OR CHEESE; MILK OR CHEESE SUBSTITUTES; MAKING THEREOF
- A23C11/00—Milk substitutes, e.g. coffee whitener compositions
- A23C11/02—Milk substitutes, e.g. coffee whitener compositions containing at least one non-milk component as source of fats or proteins
- A23C11/10—Milk substitutes, e.g. coffee whitener compositions containing at least one non-milk component as source of fats or proteins containing or not lactose but no other milk components as source of fats, carbohydrates or proteins
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- A—HUMAN NECESSITIES
- A23—FOODS OR FOODSTUFFS; TREATMENT THEREOF, NOT COVERED BY OTHER CLASSES
- A23C—DAIRY PRODUCTS, e.g. MILK, BUTTER OR CHEESE; MILK OR CHEESE SUBSTITUTES; MAKING THEREOF
- A23C20/00—Cheese substitutes
-
- A—HUMAN NECESSITIES
- A23—FOODS OR FOODSTUFFS; TREATMENT THEREOF, NOT COVERED BY OTHER CLASSES
- A23C—DAIRY PRODUCTS, e.g. MILK, BUTTER OR CHEESE; MILK OR CHEESE SUBSTITUTES; MAKING THEREOF
- A23C20/00—Cheese substitutes
- A23C20/02—Cheese substitutes containing neither milk components, nor caseinate, nor lactose, as sources of fats, proteins or carbohydrates
-
- A—HUMAN NECESSITIES
- A23—FOODS OR FOODSTUFFS; TREATMENT THEREOF, NOT COVERED BY OTHER CLASSES
- A23J—PROTEIN COMPOSITIONS FOR FOODSTUFFS; WORKING-UP PROTEINS FOR FOODSTUFFS; PHOSPHATIDE COMPOSITIONS FOR FOODSTUFFS
- A23J3/00—Working-up of proteins for foodstuffs
- A23J3/14—Vegetable proteins
-
- A—HUMAN NECESSITIES
- A23—FOODS OR FOODSTUFFS; TREATMENT THEREOF, NOT COVERED BY OTHER CLASSES
- A23J—PROTEIN COMPOSITIONS FOR FOODSTUFFS; WORKING-UP PROTEINS FOR FOODSTUFFS; PHOSPHATIDE COMPOSITIONS FOR FOODSTUFFS
- A23J3/00—Working-up of proteins for foodstuffs
- A23J3/22—Working-up of proteins for foodstuffs by texturising
- A23J3/225—Texturised simulated foods with high protein content
-
- A—HUMAN NECESSITIES
- A23—FOODS OR FOODSTUFFS; TREATMENT THEREOF, NOT COVERED BY OTHER CLASSES
- A23J—PROTEIN COMPOSITIONS FOR FOODSTUFFS; WORKING-UP PROTEINS FOR FOODSTUFFS; PHOSPHATIDE COMPOSITIONS FOR FOODSTUFFS
- A23J3/00—Working-up of proteins for foodstuffs
- A23J3/22—Working-up of proteins for foodstuffs by texturising
- A23J3/225—Texturised simulated foods with high protein content
- A23J3/227—Meat-like textured foods
-
- A—HUMAN NECESSITIES
- A23—FOODS OR FOODSTUFFS; TREATMENT THEREOF, NOT COVERED BY OTHER CLASSES
- A23L—FOODS, FOODSTUFFS, OR NON-ALCOHOLIC BEVERAGES, NOT COVERED BY SUBCLASSES A21D OR A23B-A23J; THEIR PREPARATION OR TREATMENT, e.g. COOKING, MODIFICATION OF NUTRITIVE QUALITIES, PHYSICAL TREATMENT; PRESERVATION OF FOODS OR FOODSTUFFS, IN GENERAL
- A23L33/00—Modifying nutritive qualities of foods; Dietetic products; Preparation or treatment thereof
- A23L33/10—Modifying nutritive qualities of foods; Dietetic products; Preparation or treatment thereof using additives
- A23L33/125—Modifying nutritive qualities of foods; Dietetic products; Preparation or treatment thereof using additives containing carbohydrate syrups; containing sugars; containing sugar alcohols; containing starch hydrolysates
-
- A—HUMAN NECESSITIES
- A23—FOODS OR FOODSTUFFS; TREATMENT THEREOF, NOT COVERED BY OTHER CLASSES
- A23L—FOODS, FOODSTUFFS, OR NON-ALCOHOLIC BEVERAGES, NOT COVERED BY SUBCLASSES A21D OR A23B-A23J; THEIR PREPARATION OR TREATMENT, e.g. COOKING, MODIFICATION OF NUTRITIVE QUALITIES, PHYSICAL TREATMENT; PRESERVATION OF FOODS OR FOODSTUFFS, IN GENERAL
- A23L33/00—Modifying nutritive qualities of foods; Dietetic products; Preparation or treatment thereof
- A23L33/10—Modifying nutritive qualities of foods; Dietetic products; Preparation or treatment thereof using additives
- A23L33/135—Bacteria or derivatives thereof, e.g. probiotics
-
- A—HUMAN NECESSITIES
- A23—FOODS OR FOODSTUFFS; TREATMENT THEREOF, NOT COVERED BY OTHER CLASSES
- A23L—FOODS, FOODSTUFFS, OR NON-ALCOHOLIC BEVERAGES, NOT COVERED BY SUBCLASSES A21D OR A23B-A23J; THEIR PREPARATION OR TREATMENT, e.g. COOKING, MODIFICATION OF NUTRITIVE QUALITIES, PHYSICAL TREATMENT; PRESERVATION OF FOODS OR FOODSTUFFS, IN GENERAL
- A23L33/00—Modifying nutritive qualities of foods; Dietetic products; Preparation or treatment thereof
- A23L33/10—Modifying nutritive qualities of foods; Dietetic products; Preparation or treatment thereof using additives
- A23L33/17—Amino acids, peptides or proteins
- A23L33/185—Vegetable proteins
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
Definitions
- This invention relates generally to the food science field, and more specifically to a new and useful system and method in the food science field.
- FIG. 1 is a schematic representation of a variant of the method.
- FIG. 2 is a schematic representation of a variant of the system.
- FIG. 3 depicts an illustrative example of a database.
- FIG. 4 depicts an illustrative example of functional property value sets associated with different source components and constituent proteins.
- FIGS. 5A and 5B depict illustrative examples of aggregating feature values for a protein set.
- FIG. 6 depicts an embodiment of training a prediction model.
- FIG. 7A depicts a first example of training a prediction model to predict functional property values.
- FIG. 7B depicts a second example of training a prediction model to predict functional property values.
- FIG. 8 depicts an example of determining a candidate protein set.
- FIG. 9 depicts an illustrative example of determining a candidate protein set.
- FIG. 10 depicts an embodiment of target determination.
- FIG. 11 depicts another embodiment of target determination.
- FIG. 12 depicts an example of predicting the functional properties for a protein set and optionally predicting a protein set or protein source set.
- FIG. 13 depicts an example of predicting the functional properties for a protein set.
- FIG. 14 depicts example functional property values for samples produced using phosphorylated proteins.
- the method can include: characterizing a protein set S100, training a prediction model S300, determining target characteristic values S400, determining a candidate protein set based on the target characteristic values S500, and/or any other suitable steps.
- the method can function to determine a candidate protein set with a desired set of functional property values (e.g., wherein the candidate protein set can be used in a replacement for a target food product).
- the candidate protein set can be selected to replicate target functional property values of and/or replace: caseins, leather proteins (e.g., collagen, gelatin, etc.), meat proteins (e.g., myosin), and/or any other protein set.
- the method can optionally determine protein source sets that contain the candidate protein set.
- the method can include: predicting functional property values given a protein set and optionally a context (e.g., as shown in FIG. 14).
- the method can include: extracting feature values from the amino acid sequences for each of a set of protein sets, measuring functional property values for the set of protein sets, and training a prediction model to predict functional property values for a protein set based on feature values for the respective protein set.
- the prediction model can predict functional property values for the protein set based on aggregated feature values across individual proteins in the protein set.
- a protein set can optionally be associated with a composition (e.g., a relative and/or absolute concentration for each protein in the training protein set) and/or a context (e.g., manufacturing process parameters, protein modifications, etc.), wherein the composition and/or context can be inputs to the prediction model (e.g., separate vectors, concatenated to the protein set feature vector, used to weight the protein set feature vector, etc.).
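As a minimal sketch of the aggregation described above, the following assumes that each protein already has a numeric feature vector and that the composition is used to weight those vectors into a single protein-set vector. The function name, protein names, and feature values are hypothetical, and concentration-weighted averaging is only one of the options the text lists (the composition could instead be concatenated or passed as a separate input).

```python
from typing import Dict, List


def aggregate_features(
    features: Dict[str, List[float]],
    composition: Dict[str, float],
) -> List[float]:
    """Return the concentration-weighted mean of per-protein feature vectors."""
    total = sum(composition[name] for name in features)
    if total <= 0:
        raise ValueError("composition must contain positive concentrations")
    n_features = len(next(iter(features.values())))
    aggregated = [0.0] * n_features
    for name, vector in features.items():
        # Normalize so that absolute concentrations also work as weights.
        weight = composition[name] / total
        for i, value in enumerate(vector):
            aggregated[i] += weight * value
    return aggregated


# Hypothetical two-protein set with two features per protein.
features = {"protein_a": [1.0, 0.0], "protein_b": [3.0, 2.0]}
composition = {"protein_a": 0.75, "protein_b": 0.25}
protein_set_vector = aggregate_features(features, composition)
```

The resulting vector has the same length as a single protein's feature vector, so protein sets of different sizes can be fed to the same prediction model.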
- measuring functional property values for a protein set can include manufacturing a sample matching the protein set composition using the process parameters and/or other context information, wherein the functional property values for the protein set are measured using assays.
- the target functional property values can be directly measured for a target product (e.g., a target food product).
- the prediction model can be used to predict functional property values for each protein set in a candidate group, wherein the candidate group includes uncharacterized protein sets (e.g., without measured functional property data).
- a candidate protein set (e.g., including an associated composition and/or context) and/or a protein source with a high probability of producing the candidate protein set can then be selected from the candidate group based on a similarity between the predicted functional property values and target functional property values.
- a candidate protein set can be extracted from the prediction model (e.g., using an acquisition function), be predicted by a second model (e.g., a decoder), and/or otherwise determined.
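The similarity-based selection described above can be sketched as a ranking of candidate protein sets by the distance between their predicted functional property values and the target values. This is an illustration under stated assumptions: Euclidean distance is used as the similarity measure (the text does not name one), the predicted values are taken as given rather than produced by a trained model, and all identifiers and numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rank_candidates(
    predicted: Dict[str, List[float]],
    target: List[float],
) -> List[Tuple[str, float]]:
    """Rank candidate protein sets by distance to the target property values."""
    ranked = [(name, math.dist(values, target)) for name, values in predicted.items()]
    # Smaller distance means the prediction is closer to the target.
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical predicted values for two functional properties per candidate.
target_values = [0.8, 0.3]
predicted_values = {
    "set_1": [0.2, 0.9],
    "set_2": [0.7, 0.35],
    "set_3": [0.5, 0.5],
}
ranking = rank_candidates(predicted_values, target_values)
best_candidate = ranking[0][0]
```

In practice the functional properties would likely need to be scaled to comparable ranges before computing distances, since otherwise a property with large numeric values dominates the ranking.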
- Variants of the technology can confer one or more advantages over conventional technologies.
- variants of the method can identify protein replacements based on the similarities between the amino acid sequence features (AA sequence features) of the candidate proteins and the target proteins (proteins to be replaced), and/or based on similarities between the predicted functional properties of the candidate proteins and the functional properties of the target product (e.g., food).
- variants of the technology can use a subset of features (e.g., subset of amino acid sequence features) which are likely to be important in influencing functional behavior.
- the functional property values are experimentally determined for protein sets (e.g., gelled mixtures of proteins) to capture important protein-protein interactions influencing function, and correlated with the feature values for the constituent proteins, wherein predictive features are selected for subsequent analysis based on the correlation.
- lift analysis can be used (e.g., during and/or after training a prediction model) to select a subset of features with high lift. This feature selection can reduce computational complexity and/or enable human-interpretable annotation of the features.
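The correlation-based feature selection mentioned above can be illustrated with a short sketch. It assumes that each feature has one aggregated value per protein set and that one functional property was measured for the same protein sets; features are then ranked by the absolute Pearson correlation with the measurements. The feature names, the measured values, and the choice of Pearson correlation are assumptions for illustration, and this is not an implementation of lift analysis.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def select_features(
    feature_columns: Dict[str, List[float]],
    measured: List[float],
    k: int,
) -> List[str]:
    """Keep the k features most strongly correlated with the measured property."""
    scores = {name: abs(pearson(col, measured)) for name, col in feature_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical feature values for four protein sets and one measured property.
feature_columns = {
    "hydrophobic_fraction": [0.1, 0.2, 0.3, 0.4],
    "net_charge": [1.0, -1.0, 1.0, -1.0],
    "sequence_length": [200.0, 210.0, 190.0, 205.0],
}
measured_property = [1.0, 2.1, 2.9, 4.0]
selected = select_features(feature_columns, measured_property, k=2)
```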
- variants of the technology can reduce the need for experimental analysis of proteins to determine their candidacy potential.
- a large domain of available protein sets can be computationally analyzed (e.g., using featurization of their amino acid sequences) rather than experimentally analyzed to evaluate their potential to replicate functional properties of a target set of proteins.
- This analysis methodology can enable a much larger group of candidates to be considered than if experimental analysis of each protein set were required.
- variants of the technology can reduce the need for experimental analysis of potential protein sources by predicting whether a protein source (e.g., plant, plant component, etc.) will include sufficient amounts of a given protein or protein set, such as by using genetic analyses and/or evolutionary tree analyses.
- Variants of the system can include a database and a set of models.
- the system functions to determine the functional properties for protein sets, determine which protein sets can produce a set of target functional properties, determine which protein sources can produce a target protein set, and/or be otherwise used.
- the database can include proteins, protein sets (e.g., protein set identifiers), protein set compositions (e.g., identification of proteins in the set, relative and/or absolute concentrations of proteins in the set, etc.), sequences, features, feature values, functional properties, functional property values, protein sources and/or source components, evolutionary relationships, contexts (e.g., process parameters, protein modifications, sample environment, etc.), and/or any other elements.
- the system can optionally include and/or interface with one or more third-party databases (e.g., a sequence database, a protein database, amino acid composition database, etc.).
- elements stored in the system database can be retrieved from a third-party database.
- the system database can be a third-party database.
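One way to picture a database entry is as a record that groups the elements listed above for a single protein set. The sketch below is a hypothetical layout, not the schema used by the system: the class name, field names, and example values (including the placeholder sequences) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProteinSetRecord:
    """Hypothetical database record for one protein set."""

    set_id: str
    composition: Dict[str, float]  # protein name -> relative concentration
    sequences: Dict[str, str]  # protein name -> amino acid sequence
    feature_values: Dict[str, float] = field(default_factory=dict)
    functional_property_values: Dict[str, float] = field(default_factory=dict)
    source: Optional[str] = None
    source_component: Optional[str] = None
    context: Dict[str, str] = field(default_factory=dict)  # e.g., process parameters

    def is_characterized(self) -> bool:
        """True once at least one functional property has been measured."""
        return bool(self.functional_property_values)


# Placeholder sequences only; these are not real protein sequences.
record = ProteinSetRecord(
    set_id="example_set",
    composition={"protein_a": 0.6, "protein_b": 0.4},
    sequences={"protein_a": "MKTA", "protein_b": "MSLG"},
    source="example plant",
    source_component="seed",
)
uncharacterized_before = not record.is_characterized()
record.functional_property_values["gel_firmness"] = 1.2
```

A record without measured functional property values corresponds to an uncharacterized protein set, which is the kind of candidate the prediction model is meant to evaluate.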
- a protein set can be an individual protein (e.g., a set of one, an individual protein within a larger set, etc.), multiple proteins (e.g., a mixture of proteins, proteins within a source and/or source component; within a gel, sample, product, solution, combination, and/or other mixture; within a food product; within a consumer product; etc.), a set of protein sets, and/or be otherwise defined.
- the protein set can be from one or more protein sources (e.g., combination of protein sources), from one or more components of protein sources, be manually specified, and/or be otherwise determined.
- the protein source can be plant matter (e.g., processed and/or unprocessed plant matter), animal matter (e.g., milk such as cow milk, insects such as Acheta domesticus, meat, etc.), bacterium (e.g., naturally occurring, genetically modified, etc.), any organism (e.g., identified by a species name, a common name, etc.), a food product, a naturally-occurring protein source, a synthetic protein source, and/or any other entity and/or component (e.g., protein source component) thereof.
- the protein source component (e.g., the part of the source from which the protein set can be derived) can be a nut, fruit, seed, legume, stem, leaf, root, flower, stamen, muscle, carapace, and/or any other component of the associated source.
- the protein source can optionally be labeled (e.g., in the database) with one or more classifications (e.g., dairy, meat, non-dairy, non-meat, etc.).
- the protein source, source component, and/or the protein set can optionally be associated with an abundance metric (e.g., where the metric can assess the ease of accessing large quantities of the protein set for scaled use).
- the abundance metric can be: experimentally determined (e.g., measured), predicted (e.g., based on the abundance metrics for related protein sources), and/or otherwise determined.
- the abundance metric is preferably representative of a single protein's abundance within a protein source, but can alternatively be representative of a protein set's abundance within a protein source, be representative of the protein source's abundance, and/or represent other information.
- the protein set can include all or a subset of proteins in the protein source and/or protein source component.
- the protein set can include proteins above a concentration threshold in the protein source and/or source component (e.g., wherein the concentration threshold by weight can be 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 15%, 20%, 25%, 50%, etc.).
- the protein set can include the most abundant (e.g., highest concentration) proteins in a protein source and/or a component of the protein source.
- the protein set can include a predetermined number of the most abundant proteins (e.g., wherein the predetermined number can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 50, etc.).
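The two selection rules above (a concentration threshold and a predetermined number of most abundant proteins) can be sketched as a single helper. This is an illustrative sketch assuming that concentrations are available as weight fractions per protein; the function name, protein names, and values are hypothetical.

```python
from typing import Dict, List, Optional


def select_protein_set(
    concentrations: Dict[str, float],
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> List[str]:
    """Select proteins above a concentration threshold and/or the top-N most abundant."""
    # Order proteins from most to least abundant.
    ranked = sorted(concentrations, key=concentrations.get, reverse=True)
    if threshold is not None:
        ranked = [name for name in ranked if concentrations[name] > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


# Hypothetical weight fractions of proteins in a source component.
concentrations = {
    "globulin_a": 0.40,
    "globulin_b": 0.25,
    "albumin_c": 0.04,
    "minor_d": 0.005,
}
above_one_percent = select_protein_set(concentrations, threshold=0.01)
two_most_abundant = select_protein_set(concentrations, top_n=2)
```

Both rules can be combined in one call, in which case the threshold is applied first and the result is then truncated to the requested number of proteins.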
- Plant matter can include: peas (e.g., pea flour, pea starch, etc.), rice (e.g., rice flour, glutinous rice flour, white rice flour, brown rice flour, etc.), fruits (e.g., citrus fiber), cassava (e.g., cassava flour), potato, cocoa beans, truffles, olives, coconut flesh, grape pomace, pumpkin (e.g., pumpkin seed), cottonseed, canola, sunflower, hazelnut, pistachio, almond, walnut, crude walnut, cashew, brazil nuts, hazelnut, macadamia nuts, pecan, peanut, hemp, oat, rice, poppy, watermelon (e.g., watermelon seed), chestnut, chia, flax, quinoa, soybean, split mung beans, aquafaba, lupini, fenugreek, kiwi, Sichuan pepper, mustard, sesame, sunflower seeds, algae, duckweeds (e.g.,
- the plant matter may include major production oilseeds (e.g., soybean, rapeseed, sunflower, sesame, niger, castor, canola, cottonseed, etc.), minor production oilseeds (coconut, palm seed, pumpkin, etc.), and/or other crops or plant matter.
- the plant matter may exclude allergens (e.g., wheat, soy, peanut, etc.).
- the plant matter may include a single variety of plant matter, a mixture of various plant matter, include animal matter (e.g., insect matter, mammalian products, etc.), and/or include matter from any other source.
- the protein source can be processed (e.g., lipid-removed, comminuted, separated into a solid and liquid component, mechanical processing, chemical processing, a protein powder derived from the plant matter, an extract from the plant matter, fermented, protein modifications, etc.) and/or unprocessed.
- the protein source can include a plant milk, powdered whole plant component (e.g., plant matter) in an aqueous solution, isolated plant protein (e.g., powder), and/or any other suitable source of protein.
- One or more proteins can be derived (e.g., extracted) from the protein source.
- the proteins can include protein isolates (e.g., solubilized protein isolates) extracted from the protein source.
- Protein isolates can include: proteins isolated using isoelectric precipitation (e.g., salting in, salting out, etc.), by collecting and optionally diluting a protein-rich solution (e.g., the supernatant obtained by spinning down a whole plant ingredient, such as a seed powder; residual obtained by removing at least a threshold proportion of insoluble solids from a plant milk, such as 50%, 75%, 80%, 90%, etc.; etc.), and/or otherwise obtained.
- the protein ingredient obtained from the plant matter can be substantially pure (e.g., wherein a single monomeric or multimeric protein represents at least 50%, 60%, 70%, 80%, 90%, and/or more than 90% of the overall protein content in the protein ingredient and/or the product), but can alternatively be impure (e.g., include more than 10%, 20%, 30%, 40%, 50%, 60% other proteins, etc.).
- the proteins can include structured protein isolates (SPIs) produced using protein isolates.
- SPIs can be produced by: obtaining a protein isolate mixture (e.g., a protein isolate solution) from a protein source; diluting the protein isolate mixture using a diluent; optionally separating the diluted protein isolate mixture (e.g., allowing sedimentation to occur, centrifuging, filtering, etc.); and collecting SPIs (e.g., an SPI mixture) from the diluted protein isolate mixture (e.g., collecting the sediment, collecting all or part of a homogenous diluted lipid protein isolate mixture, etc.).
- the diluent can include water (e.g., deionized water), an aqueous solution (e.g., water, a mixture of water and other ingredients, etc.), an aqueous solution mixed (e.g., emulsified) with other ingredients, and/or any other diluent.
- the SPI mixture can include an aqueous component, a protein component, and/or other ingredients.
- the protein component can include protein isolates, SPIs, aggregates of SPIs, a combination thereof, and/or any other proteins.
- the protein concentration (by weight) in the SPI mixture can be between 0.01%-95% or any range or value therebetween (e.g., 1%-15%, 30%-50%, 44%, 40%, etc.), but can alternatively be less than 0.01% or greater than 95%.
- the proteins can include: globulins (e.g., 2S globulins, 1S globulins, 7S globulins, conglutin, napin, sfa, edestin, amandin, concanavalin, vicilin, legumin, cruciferin, helianthinin, etc.), pseudoglobulins, globular proteins, prolamins, albumins, gluten, gliadin, conglycinin, hordein, phaseolin, zein, oleosin, caloleosin, steroleosin, conjugated proteins (e.g., lipoprotein, mucoprotein, etc.), other storage proteins (e.g., seed storage proteins, vegetative storage protein, etc.), animal proteins (e.g., casein, insect proteins, etc.), and/or any other suitable protein or combination thereof.
- Proteins can optionally be modified (e.g., transglutaminase modifications, proteolytic modifications, glycosylation, glycation, phosphorylation, acylation, etc.) pre- or post-extraction from the protein source.
- the proteins e.g., modified or unmodified
- the SPI structure can be a sphere (e.g., a shell of protein isolate units, a shell or micelle with hydrophilic regions along the exterior and hydrophobic regions along the interior, etc.), an amorphous structure, and/or any other structure.
- the proteins can optionally include an aggregate of SPIs, wherein constituent SPIs can be arranged in: agglomerates, aggregates, micelles, stacks, and/or any other suitable higher-order arrangement.
- the proteins can include casein proteins, non-casein proteins, mammalian proteins, non-mammalian proteins, plant proteins, animal proteins, non-animal proteins, and/or any other proteins.
- proteins in target protein sets can include casein proteins, mammalian proteins, and/or animal proteins
- proteins in candidate protein sets can substantially exclude casein proteins, mammalian proteins, allergen proteins (e.g., proteins from allergens, such as peanuts, soy, wheat, etc.), and/or animal proteins, and/or include plant proteins (e.g., exclusively include plant proteins).
- proteins in candidate protein sets can include casein, mammalian, and/or animal proteins below a threshold amount, wherein the threshold amount can be between 0.1%-10% or any range or value therebetween (e.g., 10%, 5%, 3%, 2%, 1%, 0.1%, etc.), but can alternatively be greater than 10% or less than 0.1%.
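The threshold rule for candidate protein sets can be sketched as a simple screening check. The class labels, the function names, and the example composition below are hypothetical; the source does not prescribe how the check is implemented.

```python
from typing import Dict, Set

# Classes that candidate protein sets substantially exclude, per the text.
EXCLUDED_CLASSES: Set[str] = {"casein", "mammalian", "animal"}


def excluded_fraction_pct(
    composition: Dict[str, float], labels: Dict[str, Set[str]]
) -> float:
    """Percentage of the protein set that belongs to an excluded class.

    `composition` maps protein name -> amount (any consistent unit).
    `labels` maps protein name -> set of class labels (hypothetical schema).
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain a positive total amount")
    excluded = sum(
        amount
        for name, amount in composition.items()
        if labels.get(name, set()) & EXCLUDED_CLASSES
    )
    return 100.0 * excluded / total


def is_valid_candidate(
    composition: Dict[str, float],
    labels: Dict[str, Set[str]],
    threshold_pct: float = 1.0,
) -> bool:
    """True when excluded-class proteins fall below the threshold amount."""
    return excluded_fraction_pct(composition, labels) < threshold_pct


# Hypothetical candidate set, for illustration only.
candidate = {"legumin": 60.0, "vicilin": 39.5, "casein": 0.5}
candidate_labels = {
    "legumin": {"plant"},
    "vicilin": {"plant"},
    "casein": {"casein", "mammalian", "animal"},
}
print(is_valid_candidate(candidate, candidate_labels, threshold_pct=1.0))
```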
- the protein set can be associated with a protein set composition and/or a total protein quantity (e.g., wherein the total protein quantity is an overall concentration or amount of proteins within a protein source and/or source component, an overall concentration or amount of proteins within a product, etc.).
- the protein set composition can include an identification of each protein in the set (e.g., a name or other identifier for each protein) and/or a concentration of each protein in the set.
- the concentration of a protein in the protein set can be an absolute concentration or a concentration relative to other proteins in the protein set.
- the concentration can be a percentage (e.g., by weight, by mass, by moles, etc.), a ratio, a proportion, an abundance, an amount (e.g., weight, mass, moles, etc.), a ranking (e.g., wherein each protein in the set is ranked relative to the other proteins based on concentration), and/or any other concentration metric.
- the composition of a first protein set can include a first protein (P1) at a concentration C1, and a second protein (P2) at a concentration C2; the composition of a second protein set can include the same proteins (P1 and P2) at different concentrations C3 and C4, respectively.
- the protein set composition and/or the total protein quantity can be measured (e.g., using an assay), predetermined (e.g., manually specified), predicted (e.g., based on evolutionary relationships, using a prediction model, based on an amino acid composition, using a database, etc.), and/or otherwise determined.
- a first protein source is associated with a first protein set with a known composition and a second protein source is associated with a second protein set with an unknown composition, wherein an evolutionary relationship (e.g., based on an evolutionary tree) between the first and second protein sources is used to predict the composition of the second protein set (e.g., using the assumption that certain proteins and/or protein concentrations would be similar between the first and second protein sets when the protein sources are evolutionarily close).
- an overall composition of amino acids in a protein set is determined using an assay (e.g., LC/MS), and the composition of amino acids in each constituent protein is predicted based on the amino acid sequence for the respective constituent protein.
- an overall composition of amino acids in a protein set and a composition of amino acids in each constituent protein are retrieved from an amino acid composition database (e.g., a third-party PseAAC database).
- a model (e.g., a regression) can be used to determine the concentration of each constituent protein within the protein set based on the overall amino acid composition (e.g., of the mixture) and the amino acid compositions for the constituent proteins.
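One way to read the regression described above is as a linear unmixing problem: the overall amino acid composition is modeled as a weighted sum of the constituent proteins' amino acid compositions, and the weights are the relative concentrations. The sketch below uses ordinary least squares (normal equations solved by Gaussian elimination), then clips negative weights and renormalizes. That particular choice, the function names, and the example compositions are assumptions made for illustration; the source only says "a regression".

```python
from typing import Dict, List


def solve_linear_system(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("constituent compositions are not linearly independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def estimate_fractions(
    overall: Dict[str, float], constituents: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Estimate the relative concentration of each constituent protein."""
    names = list(constituents)
    amino_acids = sorted(overall)
    columns = [[constituents[n].get(aa, 0.0) for aa in amino_acids] for n in names]
    target = [overall[aa] for aa in amino_acids]
    # Normal equations: (A^T A) x = A^T b.
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    moment = [sum(a * b for a, b in zip(ci, target)) for ci in columns]
    raw = solve_linear_system(gram, moment)
    # Concentrations cannot be negative; clip, then renormalize to sum to 1.
    clipped = [max(0.0, value) for value in raw]
    total = sum(clipped)
    if total == 0:
        raise ValueError("no non-negative solution found")
    return {name: value / total for name, value in zip(names, clipped)}


# Hypothetical amino acid fractions, for illustration only.
constituent_profiles = {
    "P1": {"K": 0.10, "L": 0.05, "S": 0.02},
    "P2": {"K": 0.02, "L": 0.12, "S": 0.08},
}
mixture_profile = {"K": 0.076, "L": 0.071, "S": 0.038}
print(estimate_fractions(mixture_profile, constituent_profiles))
```

A constrained solver (for example, non-negative least squares) would be a more principled choice when many constituents are involved; the clip-and-renormalize step here is only a simple stand-in.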
- the protein set can be associated with one or more sequences (e.g., one sequence for each individual protein in the set).
- Sequences can include amino acid sequences, genetic sequences (e.g., DNA sequence, RNA sequence, gene sequence, etc.), any molecular sequence, any protein sequence, and/or other genetic information. Sequences can be measured (e.g., using an assay), predetermined (e.g., manually specified), predicted (e.g., based on an evolutionary tree, using a prediction model, etc.), and/or otherwise determined.
- the protein set can be associated with a context.
- the context can include: process parameters, protein modifications, sample environment, and/or any other information associated with the protein set and/or a sample (e.g., a food product, a gel, and/or any other product) containing the protein set.
- the context can be measured (e.g., using an assay), predetermined (e.g., manually specified), predicted, and/or otherwise determined.
- the protein set can be associated with one or more protein structures (e.g., one structure for each protein, one structure for each protein-context combination, etc.).
- the protein structures can be measured, predicted (e.g., using protein structure prediction models), and/or otherwise determined.
- Process parameters are preferably specifications prescribing the manufacturing of a sample containing the protein set (e.g., extracting the protein set from one or more protein sources, manufacturing the sample using the protein set, etc.), but can be otherwise defined.
- Process parameters can define: manufacturing specifications; the amounts thereof (e.g., ratios, volume, concentration, mass, etc.); temporal parameters thereof (e.g., when the input should be applied, duration of input application, etc.); and/or any other suitable manufacturing parameter.
- Manufacturing specifications can include: ingredients, treatments, and/or any other sample manufacturing input, wherein the process parameters can include parameters for each specification.
- ingredients can include: plant matter, proteins, lipids (e.g., fats, oils, etc.; isolated from plant sources; etc.), water, preservatives, acids and/or bases, macronutrients (e.g., protein, fat, starch, sugar, etc.), nutrients, micronutrients, carbohydrates, gums, vitamins, enzymes, emulsifiers, hydrocolloids, salts, chemical crosslinkers and/or non-crosslinkers, coloring, flavoring compounds, vinegar, mold powders, microbial cultures (e.g., cheese cultures such as Penicillium camemberti, Penicillium candidum, Geotrichum candidum, Penicillium roqueforti, Penicillium nalgiovense, Verticillium lecanii, Kluyveromyces lactis, Saccharomyces cerevisiae, Candida utilis, Debaryomyces hansenii, Rhodosporidium infirmominiatum, Candida jefer, Corynebacteria, Micrococcus sps., Lactobacillus sps., Lactococcus, Staphylococcus, Halomonas, Brevibacterium, Psychrobacter, Leuconostocaceae, Streptococcus thermophilus, Pediococcus sps., Propionibacteria culture, combinations thereof, etc.), carbon sources, any combination thereof, and/or any other ingredient.
- Examples of treatments can include: adjusting temperature, adjusting salt level, adjusting pH level, diluting, pressurizing, depressurizing, humidifying, dehumidifying, agitating, resting, adding ingredients, removing components (e.g., filtering, draining, centrifugation, etc.), adjusting oxygen level, brining, comminuting, fermenting, mixing (e.g., homogenizing), reactions (e.g., acylation, glycation, phosphorylation, etc.), structural adjustments (e.g., micellization, etc.) and/or other treatments.
- treatment parameters can include: treatment type, treatment duration, treatment rate (e.g., flow rate, agitation rate, cooling rate, etc.), treatment temperature, time (e.g., when a treatment is applied, when the sample is characterized, etc.), and/or any other parameters.
- Protein modifications can include transglutaminase modifications, proteolytic modifications, glycosylation, glycation, phosphorylation, acylation, hydrolysis, and/or any other protein treatments.
- the modified proteins can be used as ingredients for a downstream product (e.g., dairy replicate), be used as a product (e.g., be sold as-is, be fermented using a cheese culture post-modification, etc.), and/or be otherwise used.
- proteins (e.g., proteins containing nucleophilic residues, such as Lys, Ser, Thr, Cys, etc.; SPIs; etc.) can be acylated using fatty acyl anhydrides (e.g., caprylic anhydride; myristic acid; stearic acid; oleic acid; linoleic acid; etc.), yielding a fatty acylated protein (e.g., via an amide linkage, such as from Lys; ester linkage, such as from Ser; thioester linkage, such as from Cys; etc.).
- the ratio between proteins and acyl anhydrides can be between 1:1-1:4, but can alternatively be greater than 1:1 or less than 1:4.
- Unreacted fatty acyl anhydride can be quenched (e.g., with hydroxide and water, a base, a salt, etc.), yielding the corresponding fatty acid.
- the resultant fatty acylated protein and/or a sample therefrom can have increased lipid binding; increased hydrophobicity; increased gel strength; increased flow at elevated temperatures (i.e., melt); increased stretchiness; and/or other changed functional property values (e.g., values for texture, nutrition, etc.) relative to the unacylated protein or a sample therefrom.
- other carboxylic acid conjugation reagents (e.g., acyl chlorides, activated carboxylic acids, metal catalysts, etc.) can additionally or alternatively be used.
- proteins (e.g., protein residues; surface-accessible nucleophilic residues such as Ser, Thr, Lys, etc.; SPIs; etc.) can be phosphorylated (e.g., using sodium trimetaphosphate (STMP)).
- Examples of other phosphorylation reagents that can be used include: other trimetaphosphate salts; hexametaphosphate salts; tripolyphosphate salts; polyphosphate salts; nucleoside triphosphates, and/or other phosphorylation agents.
- phosphorylation can be performed using non-toxic (e.g., at relevant concentrations) catalysts, reagents, byproducts, and/or other substances.
- the resultant phosphorylated protein and/or a sample therefrom can have increased calcium binding (e.g., an increased calcium concentration in the sample); increased stretchiness; increased flow at elevated temperatures (i.e., melt); increased solubility; increased hydrophobicity and/or hydrophilicity; decreased toxicity; decreased hydrophobicity and/or hydrophilicity; and/or other changed functional property values relative to an unphosphorylated protein and/or a sample therefrom.
- the proteins (e.g., protein isolates, SPIs, dissolved and resuspended protein source substrate, etc.) can be prepared as a protein solution at a target protein concentration.
- the target protein concentration in the protein solution and/or the target protein concentration in a final mixture (e.g., including protein, acids/bases, phosphorylation reagent, calcium, etc.) is between 3%-50% or any range or value therebetween (e.g., 4-10%, 6%, 9%, greater than 6%, greater than 9%, 10-20%, 15%, etc.), but can alternatively be less than 3% or greater than 50%.
- the proteins are diluted to achieve the target concentration.
- the diluent can be water (e.g., deionized water), an aqueous solution (e.g., water, a mixture of water and other ingredients, etc.), and/or any other diluent.
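The dilution step above reduces to a mass balance when concentrations are expressed by weight: the protein mass is unchanged, so the diluent to add is the stock mass times (stock concentration / target concentration - 1). A minimal sketch, with a hypothetical function name and example values:

```python
def diluent_mass_g(stock_mass_g: float, stock_pct: float, target_pct: float) -> float:
    """Mass of diluent (g) to add so a protein stock reaches a target % by weight.

    Assumes the diluent contains no protein and that concentrations are
    weight/weight percentages.
    """
    if stock_mass_g <= 0:
        raise ValueError("stock mass must be positive")
    if not 0 < target_pct <= stock_pct:
        raise ValueError("target must be positive and no higher than the stock")
    # Protein mass is conserved: stock_mass * stock_pct = final_mass * target_pct.
    return stock_mass_g * (stock_pct / target_pct - 1.0)


# Illustrative values only: dilute 100 g of an 18% protein stock to 9%.
print(diluent_mass_g(100.0, 18.0, 9.0))
```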
- the protein solution can optionally be homogenized (e.g., for 30 s-10 min, 1 min, 2 min, 3 min, 5 min, any other time, etc.).
- the pH of the protein solution can be adjusted to a target pH, wherein the target pH is between 3-12 or any range or value therebetween (e.g., 3-5, 4, 5-7, 6, above 6, above 7, below 7, 10-11, 10, 10.5, 11, etc.), but can alternatively be less than 3 or greater than 12.
- a solution including a phosphorylation reagent (e.g., Na3(PO3)3) can be added to the protein solution to achieve a target concentration in a final mixture (e.g., 20 mM-1000 mM, 100 mM-500 mM, 300 mM-400 mM, 80 mM, 150 mM, 250 mM, 350 mM, greater than 150 mM, greater than 80 mM, less than 80 mM, etc.).
- the resulting (intermediate) mixture can optionally be homogenized (e.g., for 30 s-10 min, 1 min, 2 min, 3 min, 5 min, any other time, etc.).
- the resulting mixture can be stirred for between 15 min-10 hrs or any range or value therebetween (e.g., 30 min-2 hrs, 1 hr, etc.), but can alternatively be stirred for less than 15 min or greater than 10 hrs.
- the stir rate can be between 100 rpm-10,000 rpm or any range or value therebetween (e.g., 300 rpm-1,000 rpm), but can alternatively be less than 100 rpm or greater than 10,000 rpm.
- the temperature while stirring can be between 10° C.-50° C. or any range or value therebetween (e.g., 20° C.-30° C., room temperature, etc.), but can alternatively be less than 10° C. or greater than 50° C.
- Calcium can optionally be added to the mixture (e.g., to bind calcium to the phosphorylated proteins, to enable the reaction to proceed forward, etc.) before or after phosphorylating agent addition.
- a solution of calcium salts (e.g., CaCl2) can be added to achieve a target concentration in a final mixture (e.g., 5 mM-40 mM, 20 mM-10000 mM, 20 mM-1000 mM, 80 mM, 100 mM, 140 mM, 240 mM, 300 mM, 400 mM, greater than 140 mM, less than 400 mM, etc.).
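The target millimolar concentrations for the phosphorylation reagent and calcium salt can be converted into reagent masses for a given final volume. The sketch below assumes anhydrous sodium trimetaphosphate (Na3P3O9, about 305.89 g/mol) and anhydrous calcium chloride (about 110.98 g/mol); hydrated forms have different molar masses, and the function name and the 0.5 L batch volume are hypothetical.

```python
# Approximate molar masses (g/mol) of the anhydrous salts. Hydrates differ,
# so the value must match the reagent actually weighed out.
MOLAR_MASS_G_PER_MOL = {
    "STMP": 305.89,   # sodium trimetaphosphate, Na3P3O9
    "CaCl2": 110.98,  # calcium chloride
}


def reagent_mass_g(target_mm: float, final_volume_l: float, molar_mass: float) -> float:
    """Mass (g) of reagent giving `target_mm` (mmol/L) in `final_volume_l` litres."""
    if target_mm < 0 or final_volume_l <= 0 or molar_mass <= 0:
        raise ValueError("inputs must be non-negative (volume and molar mass positive)")
    moles = (target_mm / 1000.0) * final_volume_l
    return moles * molar_mass


# Illustrative batch: 0.5 L final mixture at 350 mM STMP and 140 mM CaCl2.
stmp_g = reagent_mass_g(350.0, 0.5, MOLAR_MASS_G_PER_MOL["STMP"])
cacl2_g = reagent_mass_g(140.0, 0.5, MOLAR_MASS_G_PER_MOL["CaCl2"])
print(round(stmp_g, 2), round(cacl2_g, 2))
```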
- the process parameters in this example can optionally achieve a sticky (e.g., increased adhesion, decreased hardness, etc.) texture in a sample produced using the phosphorylated proteins.
- An example is shown in FIG. 14 .
- the texture of the sample can be hardened by increasing the amount of phosphorylating agent, decreasing the amount of calcium salts, decreasing the amount of protein in the starting protein solution, and/or decreasing the pH.
- the phosphorylated proteins can optionally be collected, such as via centrifugation (e.g., collecting the sediment after centrifugation), filtration, precipitation, and/or other protein isolation methods, wherein the proteins can be used in all or parts of the method.
- the centrifugation speed can be between 500 rpm-20,000 rpm or any range or value therebetween (e.g., 1,000 rpm-10,000 rpm, 5,000 rpm, etc.), but can alternatively be less than 500 rpm or greater than 20,000 rpm.
- the centrifugation time can be between 30 s-1 hr or any range or value therebetween (e.g., 5 min-30 min, 10 min, 20 min, etc.), but can alternatively be less than 30 s or greater than 1 hr.
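Centrifugation speeds given in rpm only translate between instruments once the rotor radius is known. The standard conversion is RCF = 1.118e-5 x r x rpm^2, with r the rotor radius in centimetres. The sketch below applies it; the 10 cm radius is an assumed example value, since the source does not state a rotor geometry.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rpm_to_rcf(rpm: float, rotor_radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a given speed and radius."""
    if rpm < 0 or rotor_radius_cm <= 0:
        raise ValueError("rpm must be non-negative and radius positive")
    return RCF_CONSTANT * rotor_radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, rotor_radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given RCF on a rotor of the given radius."""
    if rcf < 0 or rotor_radius_cm <= 0:
        raise ValueError("rcf must be non-negative and radius positive")
    return math.sqrt(rcf / (RCF_CONSTANT * rotor_radius_cm))


# Illustrative only: 5,000 rpm on a rotor with an assumed 10 cm radius.
print(rpm_to_rcf(5000.0, 10.0))
```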
- the proteins can optionally be resuspended after collection (e.g., washed) in a diluent (e.g., water).
- the ratio (by volume) between the collected protein and the diluent can be between 1:10-10:1 (e.g., 1:3, 1:2, 1:1, 2:1, 3:1, etc.), but can alternatively be less than 1:10 or greater than 10:1.
- proteins can be otherwise phosphorylated.
- proteins (e.g., protein residues, surface-accessible lysine residues, etc.) can be glycated.
- lysine residues and/or other residues can covalently bond to sugars (e.g., via nucleophilic attack of an acyclic sugar's aldehyde), resulting in a glycated protein.
- glycation can be performed using non-toxic (e.g., at relevant concentrations) catalysts, reagents, byproducts, and/or other substances.
- the resultant glycated protein and/or a sample therefrom can have increased flow at elevated temperatures (e.g., melt); increased solubility; increased hydrophobicity and/or hydrophilicity; decreased toxicity; decreased hydrophobicity and/or hydrophilicity; and/or other changed functional property values relative to an unglycated protein and/or a sample therefrom.
- Maillard glycation is conventionally achieved at high temperatures, which may result in protein denaturation and accelerate later-stage reactions, including those resulting in advanced Maillard products (AMPs).
- AMPs can give rise to off-flavours and/or off-colours in a sample.
- the method can include catalyzing an initial glycation event (e.g., via base catalysis and/or acid catalysis), which can reduce or remove the need for high temperatures (e.g., in the initial and/or later stages).
- glycating proteins can include: combining proteins and sugars in a solution; adjusting a pH of the solution; and adjusting a temperature of the solution.
- the proteins (e.g., protein isolates, SPIs, dissolved and resuspended protein source substrate, etc.) and sugars can be combined in the solution (e.g., dissolved in a diluent such as water) at a target protein concentration and a target sugar concentration.
- the target protein concentration e.g., by weight
- the target sugar concentration can be 5%-70% or any range or value therebetween (e.g., 20%-40%, 20%-30%, 30%-40%, etc.), but can alternatively be less than 5% or greater than 70%.
- examples of sugars include: monosaccharides such as pentoses and hexoses (e.g., ribose, arabinose, xylose, glucose, galactose, fructose, etc.); disaccharides; oligosaccharides; polysaccharides; and/or any other sugars.
- the sugars can be plant-based, synthesized, and/or otherwise obtained.
- the sugar used can be selected based on its reactivity. For example, pentoses can be preferred to hexoses, which can be preferred to disaccharides, which can be preferred to oligosaccharides, which can be preferred to polysaccharides. However, the sugars can be otherwise selected.
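The reactivity preference stated above (pentoses, then hexoses, then disaccharides, then oligosaccharides, then polysaccharides) can be sketched as a simple ranking. The class table covers the monosaccharides named in the text; lactose is not named there and is included only to illustrate the disaccharide tier. The function name is hypothetical.

```python
from typing import Dict, List

# Preference order from the text, most preferred first.
REACTIVITY_ORDER: List[str] = [
    "pentose",
    "hexose",
    "disaccharide",
    "oligosaccharide",
    "polysaccharide",
]

SUGAR_CLASS: Dict[str, str] = {
    "ribose": "pentose",
    "arabinose": "pentose",
    "xylose": "pentose",
    "glucose": "hexose",
    "galactose": "hexose",
    "fructose": "hexose",
    "lactose": "disaccharide",  # illustrative addition, not named in the text
}


def rank_sugars(available: List[str]) -> List[str]:
    """Order available sugars from most to least preferred by sugar class."""
    unknown = [sugar for sugar in available if sugar not in SUGAR_CLASS]
    if unknown:
        raise KeyError(f"no class recorded for: {unknown}")
    # sorted() is stable, so sugars in the same class keep their input order.
    return sorted(available, key=lambda s: REACTIVITY_ORDER.index(SUGAR_CLASS[s]))


print(rank_sugars(["glucose", "lactose", "xylose"]))
```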
- the pH of the solution during glycation can be adjusted to a target pH.
- all or parts of the glycation reaction can be performed at an acidic pH (e.g., an acid-catalyzed reaction).
- the target pH can be between 2-7 or any range or value therebetween (e.g., 3-6.5, 4-6, less than 6, etc.), but can alternatively be less than 2 or greater than 7.
- acid catalysts that can be used to adjust the pH can include: hydrochloric acid, Brønsted acids, Lewis acids, and/or other acids.
- all or parts of the glycation reaction can be performed at a basic pH (e.g., a base-catalyzed reaction).
- the target pH can be between 7-11.5 or any range or value therebetween (e.g., 8-11, 9-10.5, 9-10, 10-10.5, greater than 8, greater than 9, etc.), but can alternatively be less than 7 or greater than 11.5.
- base catalysts that can be used to adjust the pH can include: sodium hydroxide, sodium bicarbonate, potassium bicarbonate, ammonium bicarbonate, and/or other bases.
- the acids and/or bases are preferably food safe, but can alternatively be not food safe.
- the temperature of the solution can be adjusted to a target temperature for a target reaction time (e.g., wherein the temperature is maintained throughout the reaction time, wherein the temperature is adjusted during the reaction time, etc.).
- the target temperature can be between 10° C.-200° C. or any range or value therebetween (e.g., at or above 45° C., 40° C.-80° C., at or above 50° C., 55° C.-70° C., below 55° C., at room temperature, above room temperature, etc.), but can alternatively be less than 10° C. or greater than 200° C.
- the target temperature is preferably below the protein's denaturation point, but can alternatively be at or above the denaturation point.
- the target reaction time can be between 1 hour-1 week or any range or value therebetween (e.g., 5 hrs-10 hrs, 8 hrs, 24 hrs-48 hrs, 12 hrs-24 hrs), but can alternatively be less than 1 hour or greater than 1 week.
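The glycation conditions above (sugar concentration, pH, temperature, reaction time) can be captured in a small record and compared against the stated ranges. Because the text allows values outside each range, this sketch only reports deviations rather than rejecting them. The class name, field names, and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Ranges quoted in the text; values outside them are allowed there as
# alternatives, so they are flagged here rather than treated as errors.
ACID_PH_RANGE = (2.0, 7.0)
BASE_PH_RANGE = (7.0, 11.5)
TEMPERATURE_C_RANGE = (10.0, 200.0)
REACTION_HOURS_RANGE = (1.0, 7 * 24.0)  # 1 hour to 1 week
SUGAR_PCT_RANGE = (5.0, 70.0)


@dataclass
class GlycationConditions:
    sugar_pct: float
    ph: float
    temperature_c: float
    reaction_hours: float
    base_catalyzed: bool = True
    denaturation_c: Optional[float] = None  # protein denaturation point, if known


def out_of_range_notes(c: GlycationConditions) -> List[str]:
    """List the parameters that fall outside the ranges stated in the text."""
    ph_low, ph_high = BASE_PH_RANGE if c.base_catalyzed else ACID_PH_RANGE
    checks = [
        ("sugar %", c.sugar_pct, *SUGAR_PCT_RANGE),
        ("pH", c.ph, ph_low, ph_high),
        ("temperature (C)", c.temperature_c, *TEMPERATURE_C_RANGE),
        ("reaction time (h)", c.reaction_hours, *REACTION_HOURS_RANGE),
    ]
    notes = [
        f"{label}={value} is outside {low}-{high}"
        for label, value, low, high in checks
        if not low <= value <= high
    ]
    # The text prefers a target temperature below the denaturation point.
    if c.denaturation_c is not None and c.temperature_c >= c.denaturation_c:
        notes.append("temperature is at or above the denaturation point")
    return notes


# Illustrative base-catalyzed run: 30% sugar, pH 10, 60 C, 8 hours.
print(out_of_range_notes(GlycationConditions(30.0, 10.0, 60.0, 8.0)))
```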
- proteins can be otherwise glycated.
- modified proteins and/or a sample therefrom can have changed functional property values.
- the change in functional property value can be determined relative to a protein source, an unmodified protein, a reaction intermediary, a sample therefrom, and/or relative to any other compound or substance. Examples of changes can include: 5%, 10%, 30%, 50%, 80%, a range therebetween, over 80%, and/or any other increased or decreased proportion.
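The proportional change in a functional property value relative to a reference (e.g., an unmodified protein) can be computed as a signed percentage. A minimal sketch, with hypothetical names and example values:

```python
def percent_change(modified_value: float, reference_value: float) -> float:
    """Signed percent change of a functional property value versus a reference.

    Positive results indicate an increase, negative results a decrease.
    """
    if reference_value == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * (modified_value - reference_value) / abs(reference_value)


# Illustrative only: a property rising from 1.0 to 1.5 (arbitrary units).
print(percent_change(1.5, 1.0))
```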
- one or more protein modification process variables can be selected, controlled, adjusted, and/or otherwise manipulated to achieve a target functional property value (e.g., target texture).
- variables that can be controlled include: the protein source; protein preprocessing methods (e.g., protein isolation techniques, etc.); protein configuration (e.g., protein isolates, structured or unstructured arrangement of protein isolates, etc.); reagents; protein and/or reagent concentrations; stoichiometric ratio between protein and reagents; reaction scale (i.e., mass of initial protein substrate, volume of solvent); reaction time; reaction temperature; reaction pH; quenching or not quenching; washing (e.g., removal of unreacted reactants and byproducts such as pyrophosphate, unreacted sugars, AMPs, etc.) or not washing; concentration (e.g., presence vs absence of acids, bases, and/or other ingredients); and/or other variables.
- the sample environment can include: a composition of the sample (e.g., other macronutrients and their respective concentrations), sample structure information (e.g., sample matrix type; sample porosity; sample phase such as solid, liquid, and/or gaseous; etc.), pH level, temperature (e.g., temperature at which the functional properties for the sample would be measured), pressure, isoelectric point, and/or any other sample parameters.
- the protein set can be associated with values for one or more characteristics and/or can be uncharacterized (e.g., lack values for one or more characteristics).
- Characteristics can include: features, functional properties (e.g., an example of functional properties is shown in FIG. 4 ), functionalities (e.g., storage functionalities, breaking down sugar and/or any other molecule, enzyme functionalities, etc.), and/or any other characteristics.
- Features are preferably sequence features (e.g., extracted from one or more amino acid sequences), but can alternatively be other protein characteristics (e.g., molecular features, physicochemical features, protein structure features, context features, etc.).
- Features can be human-interpretable (e.g., semantic features, where features represent specific properties, where the influence of a feature on functional properties is understood, etc.) or not human-interpretable (e.g., nonsemantic).
- features can be annotated to provide human-interpretable context (e.g., by using an explainability or interpretability method applied to one or more models, etc.).
- a feature set can include: all possible features, a subset of features (e.g., selected using dimensionality reduction, selected using a feature selection model, selected features based on correlation with specific functional properties, etc.), a user-defined set of features, weighted features, aggregated features, and/or any other suitable set of features.
- the features within the feature set can be: learned (e.g., using an autoencoder, using a deep learning model, etc.), handcrafted, and/or otherwise determined.
- Each protein set can be associated with one feature value set (e.g., an aggregate feature value set), multiple feature value sets (e.g., one feature value set for each constituent protein, different feature value sets corresponding to different folding configurations, different feature value sets corresponding to different contexts, etc.), not have a feature value set, and/or be associated with any other feature value set.
- a feature value set is a feature value vector, wherein each element is a feature value for a feature in a feature set (e.g., a feature vector).
- each protein in the protein set is associated with a feature value set, wherein an aggregate feature value set (e.g., a representative feature value set) is determined for the protein set based on the feature values of the constituent proteins using a feature aggregation model (e.g., examples shown in FIG. 5 A and FIG. 5 B ).
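One simple form a feature aggregation model could take is a concentration-weighted mean of the constituent proteins' feature value vectors. This is an assumption made for illustration: the figures referenced above are not reproduced here, and the source does not state that the aggregation is a weighted mean. All names and values below are hypothetical.

```python
from typing import Dict, List


def aggregate_features(
    feature_vectors: Dict[str, List[float]],
    concentrations: Dict[str, float],
) -> List[float]:
    """Concentration-weighted mean of per-protein feature value vectors."""
    if not feature_vectors:
        raise ValueError("at least one constituent protein is required")
    total = sum(concentrations[name] for name in feature_vectors)
    if total <= 0:
        raise ValueError("total concentration must be positive")
    length = len(next(iter(feature_vectors.values())))
    aggregate = [0.0] * length
    for name, vector in feature_vectors.items():
        if len(vector) != length:
            raise ValueError("all feature vectors must have the same length")
        weight = concentrations[name] / total
        for index, value in enumerate(vector):
            aggregate[index] += weight * value
    return aggregate


# Hypothetical two-protein set with two features each.
vectors = {"P1": [1.0, 0.0], "P2": [0.0, 2.0]}
amounts = {"P1": 3.0, "P2": 1.0}
print(aggregate_features(vectors, amounts))
```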
- a feature value set for the protein set can be directly determined (e.g., using a feature extraction model, using a machine learning model, etc.).
- Feature values can include and/or be extracted (e.g., using a feature extraction model) from: sequences (e.g., amino acid sequences, genetic sequences, etc.), measurements and/or other data, structures (e.g., primary, secondary, or tertiary structures that are known, measured, computer-generated, etc.), context, other feature values, and/or any other information.
- features can include: amino acid composition-based features, autocorrelation-based features, profile-based features, pseudo amino acid composition, sequence features (e.g., AA groups, active sites, binding sites, PTM sites, repeats, etc.), domain features, physicochemical features, domains, and/or any other feature.
- features can include and/or be based on: k-mers; pseudo structure status composition (PseSSC); pseudo amino acid composition (PseAAC); composition, transition, and distribution (CTD); grand average of hydropathicity index (GRAVY); autocovariance; auto-cross covariance; top-n-gram; overall amino acid count; count and/or percentage of a specific amino acid; amino acid structure (e.g., amino acid subsequence organization within the amino acid sequence); charge (e.g., overall charge, charge distribution, charge at a given pH, etc.); acidity; hydrophilicity/hydrophobicity; functional groups; flexibility; instability; aromaticity; length; molecular weight; binding affinity; active sites (e.g., count, structure, location, etc.); physicochemical and/or molecular features of amino acids; and/or any other feature.
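As an illustration of the sequence-derived features listed above, the following is a minimal sketch that computes length, GRAVY, amino acid fractions, and k-mer counts from an amino acid sequence. The function names are hypothetical and not taken from the specification; the hydropathy values are the standard Kyte-Doolittle scale, and the sketch assumes sequences contain only the 20 standard residues.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (standard published values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def amino_acid_fractions(sequence):
    """Fraction of the sequence made up by each standard amino acid."""
    counts = Counter(sequence)
    return {aa: counts.get(aa, 0) / len(sequence) for aa in KYTE_DOOLITTLE}


def kmer_counts(sequence, k=2):
    """Counts of every overlapping subsequence of length k."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def extract_features(sequence):
    """Hypothetical helper: collect a few features into one flat dict."""
    features = {"length": len(sequence), "gravy": gravy(sequence)}
    for aa, fraction in amino_acid_fractions(sequence).items():
        features[f"frac_{aa}"] = fraction
    return features
```

A flat dictionary like this can be converted to a feature value vector by fixing an ordering of the keys.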
- Functional properties can include macro functional properties, micro functional properties, nano functional properties, a combination thereof, other characteristics, and/or any other functional properties.
- the set of functional property values for a protein set functions to define how the protein set and/or proteins in the protein set: behaves during sample preparation or cooking, influences the finished sample (e.g., in look, feel, taste, etc.), interacts with other molecules (e.g., secondary interactions, tertiary interactions, quaternary interactions, etc.), denatures (e.g., the denaturization point), folds, aggregates, other target functionalities, and/or any other behavior at the nano, micro, and/or macro scale (e.g., behaviors between the protein as a whole and the context or other proteins, etc.).
- Functional properties can include: nutritional profile (e.g., macronutrient profile, micronutrient profile, etc.), texture (e.g., texture profile, firmness, toughness, puncture, stretch, compression response, mouthfeel, viscosity, graininess, relaxation, stickiness, chalkiness, flouriness, astringency, crumbliness, stretchiness, tearability, mouth melt, etc.), solubility, melt profile, smoke profile, gelation point, flavor, appearance (e.g., color, sheen, etc.), aroma, precipitation, stability (e.g., room temperature stability), emulsion stability, ion binding capacity, heat capacity, solid fat content, chemical properties (e.g., pH, affinity, surface charge, isoelectric point, hydrophobicity/hydrophilicity, chain lengths, chemical composition, nitrogen levels, chirality, stereospecific position, etc.), physicochemical properties, compound concentration (e.g., in the solid sample fraction, vial headspace, olfactory bulb, post-gustation, etc.), denaturation
- a functional property set can include: all possible functional property values, a subset of functional properties (e.g., selected using dimensionality reduction, selected using a functional property selection model, etc.), a user-defined set of functional properties, weighted functional properties, and/or any other suitable set of functional properties.
- Functional property value sets can be associated with an individual protein, the entire set of proteins (e.g., a protein mixture; where each protein in the set is assigned the same functional property values, where functional property values are assigned to each protein based on individual protein concentrations within the set, etc.), a subset of the protein set (e.g., one or more proteins with the highest concentrations within the set), and/or be unassociated with the protein set (e.g., manually defined target functional properties).
- Each protein set can be associated with one functional property value set (e.g., wherein the functional property value set includes a value for each functional property in a functional property set), multiple functional property value sets (e.g., a protein set can be associated with different functional property value sets corresponding to different contexts), not have a functional property value set (e.g., uncharacterized), or be associated with any other functional property value set.
- a given protein can be associated with multiple functional property value sets, wherein each functional property value set corresponds to different protein sets that include the given protein.
- Functional property values can optionally include an uncertainty parameter (e.g., measurement uncertainty, determined using statistical analysis, etc.).
- the functional property values can be determined experimentally (e.g., using an assay tool), determined via computer simulations, predicted (e.g., using a prediction model, based on the sample context, other functional properties, other inputs, etc.), and/or be otherwise determined.
- the functional property values can be: directly measured, analyzed and/or transformed data, features extracted from data (e.g., a data time series), and/or be otherwise determined.
- the system can optionally leverage one or more assays.
- Properties determined using an assay tool can optionally be and/or be used to determine any functional property value and/or feature value.
- assays and/or assay tools include: a differential scanning calorimeter (e.g., to determine properties related to melt, gelation point, denaturation point, etc.), Schreiber Test, an oven (e.g., for the Schreiber Test), a water bath, a texture analyzer, a rheometer, spectrophotometer (e.g., to determine properties related to color), centrifuge (e.g., to determine properties related to water binding capacity), moisture analyzer (e.g., to determine properties related to water availability), light microscope (e.g., to determine properties related to microstructure), atomic force microscope (e.g., to determine properties related to microstructure), confocal microscope (e.g., to determine protein association with fat/water), staining (e.g., paired with computer vision models), laser dif
- a sample made using the protein set can be stained (e.g., for lipids and proteins), imaged, and analyzed (e.g., using the image) to determine the sample's lipid and protein structure (e.g., treated as a functional property).
- the sample can optionally be measured using GC-MS to determine its chemical composition.
- the method can be used with one or more targets, wherein one or more candidate protein sets (e.g., analogous protein sets) can be determined based on the target (e.g., to replace a target protein set, to manufacture an analog for a target product, to identify a protein set with target characteristic values, etc.).
- candidate protein sets can include: proteins found in a predetermined set of protein sources, proteins expressed by a predetermined set of species, genus, family, and/or other set of organisms, and/or other proteins.
- candidate protein sets can include proteins found in plant-based sources (e.g., substantially excluding animal-based sources), naturally-occurring sources, genetically modified sources, synthetic sources, and/or any other suitable source.
- Target protein sets can include: a protein set to be replaced or replicated, or any other protein set.
- target protein sets can include proteins found in animal-based sources (e.g., dairy sources).
- the target can include one or more: target characteristics (e.g., features, functional properties, etc.), target characteristic values, target protein sets (e.g., a single protein set, a composition of protein sets, etc.), target sources, target products (e.g., target food products), and/or other targets.
- target food products include: dairy fats (e.g., ghee, other bovine milk fats, etc.), milk (e.g., cow milk, sheep milk, goat milk, human milk, etc.), cheese (e.g., hard cheese, soft cheese, semi-hard cheese, semi-soft cheese), yogurt, cream cheese, dried milk powder, cream, whipped cream, ice cream, coffee cream, other dairy products, egg products (e.g., scrambled eggs), additive ingredients, mammalian meat products (e.g., ground meat, steaks, chops, bones, deli meats, sausages, etc.), fish meat products (e.g., fish steaks, filets, etc.), any animal product, and/or any other suitable food product.
- the target food product includes mozzarella, burrata, feta, brie, ricotta, camembert, chevre, cottage cheese, cheddar, parmigiano, pecorino, gruyere, edam, gouda, jarlsberg, and/or any other cheese.
- Target characteristic values can optionally be characteristic values for a target product and/or for a target protein set (e.g., associated with a target product).
- Target characteristic values can include a single value and/or ranges.
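Because a target characteristic value can be either a single value or a range, a comparison routine needs to handle both. The following is a minimal sketch under stated assumptions: ranges are represented as `(low, high)` tuples, single values are plain numbers, and the names `matches_target` and `unmet_targets` are hypothetical.

```python
def matches_target(value, target, tolerance=0.0):
    """Return True if value satisfies a target.

    target is either a (low, high) tuple (inclusive range) or a single
    number, in which case value must lie within tolerance of it.
    """
    if isinstance(target, tuple):
        low, high = target
        return low <= value <= high
    return abs(value - target) <= tolerance


def unmet_targets(candidate, targets, tolerance=0.0):
    """List the characteristics for which the candidate misses its target."""
    return [
        name
        for name, target in targets.items()
        if name not in candidate
        or not matches_target(candidate[name], target, tolerance)
    ]
```

For example, with hypothetical targets `{"pH": (5.0, 5.5), "firmness": 3.0}`, a candidate with a pH of 6.0 would be reported as missing only the pH target.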
- a target can be: a single target (e.g., a single target characteristic value set for a given protein set) or aggregated targets (e.g., a vectorized set of feature values and/or functional property values aggregated across multiple protein sets, etc.).
- a target can be: a positive target (e.g., where positive target features are positively correlated with target functional properties; where desired characteristics are positive targets; etc.), or a negative target (e.g., where negative target features are negatively correlated with target functional properties; where undesired characteristics are negative targets; etc.); an example is shown in FIG. 10 .
- the target characteristic values include desired feature values; an example is shown in FIG. 11 .
- the target characteristic values include desired functional property values (e.g., associated with a target protein set, manually specified, etc.); examples shown in FIG. 10 and FIG. 12 .
- the target can be otherwise defined.
- the system can include one or more models, including feature extraction models, correlation models, feature selection models, functional property selection models, prediction models, protein set determination models, feature aggregation models, similarity models, structure prediction models, and/or any other model.
- Any model can include: regression, classification, neural networks (e.g., CNNs, DNNs, etc.), rules, heuristics, equations (e.g., weighted equations, etc.), selection (e.g., from a library), instance-based methods (e.g., nearest neighbor), regularization methods (e.g., ridge regression), decision trees, models used in Bayesian methods (e.g., Naïve Bayes, Markov), optimization methods, kernel methods, probability, deterministics, genetic programs, support vectors, and/or any other suitable method.
- the models can include classical machine learning models (e.g., linear regression, logistic regression, decision tree, SVM, nearest neighbor, PCA, SVC, LDA, LSA, t-SNE, naïve Bayes, k-means clustering, clustering, association rules, dimensionality reduction, etc.), neural networks (e.g., CNN, CAN, LSTM, RNN, autoencoders, deep learning models, etc.), ensemble methods, heuristics, and/or any other suitable model.
- the models can be scoring models, numerical value predictors (e.g., regressions), classifiers (e.g., binary classifiers, multiclass classifiers, etc.), and/or provide other outputs.
- the models can be trained and/or learned, fit, predetermined, and/or can be otherwise determined.
- the models can be learned using: supervised learning, unsupervised learning, reinforcement learning, Bayesian optimization, positive-unlabeled learning, and/or otherwise learned.
- models can be trained using multiple-instance learning (MIL), learning to aggregate (LTA), and/or any other training approach.
- the models can be learned or trained on: labeled data (e.g., data labeled with the target label), unlabeled data, positive training sets (e.g., a set of data with true positive labels), negative training sets (e.g., a set of data with true negative labels), and/or any other suitable set of data.
- the models can be specific to: functional properties, a protein set, a context, a target, and/or otherwise specific, or be generic.
- the feature extraction model can function to extract values for features for a protein set (e.g., for each protein in the set, for the protein set as a whole, etc.).
- the feature extraction model can output feature values based on molecular information inputs (e.g., sequences, measurements, data, structure, protein set composition, etc.), context, and/or other information.
- the feature extraction model can use: folding analysis, classifiers, the reduced alphabet approach, Markov models, statistical methods, n-gram analysis, autocovariance, auto-cross covariance, protein descriptor methods (e.g., PseSSC, PseAAC, CTD, GRAVY, etc.), any protein analysis methods, encoders (e.g., trained to encode the sequence to a shared latent space), and/or any other feature extraction technique.
- the feature extraction model extracts handpicked features (e.g., wherein the feature extraction model is trained on a predetermined training value for the feature).
- the feature extraction model can be adopted from another domain (e.g., be a linguistic feature model).
- the feature extraction model can be a subset of the layers from a model trained end-to-end to predict another attribute (e.g., wherein the features can be learned features).
- the feature extraction model can be a subset of layers (e.g., the first several layers, feature extraction layers, intermediary layers, etc.) of a prediction model trained to predict functional property values from protein sequences, context, and/or other inputs (e.g., example shown in FIG. 14 ).
- the feature extraction model can be otherwise configured.
- the extracted features for the protein set can be represented as one or more feature vectors, wherein each vector position can represent a different feature.
- a feature vector is determined for each protein within the set, wherein the feature value is determined based on the protein's sequence and optionally the protein's abundance or concentration within the protein set. Alternatively, the protein's abundance or concentration can be represented by a separate vector.
- a feature vector is determined for each protein set, wherein each feature's value is representative of the feature value for the protein set as a whole.
- the protein set feature vector is determined based on the feature's values for each protein in the protein set (e.g., wherein the different values for a given functional feature are aggregated, predicted, etc.), and optionally determined based on the respective protein's abundance within the protein set (e.g., weighted based on the respective protein's abundance within the set, etc.).
- the extracted features can be otherwise represented.
- the optional feature aggregation model can function to aggregate feature values across proteins in a protein set.
- the feature aggregation model inputs can include: a feature value set (e.g., a feature value vector) for each protein in the protein set, a feature value set for each protein in a subset of the protein set, a protein set composition, context, and/or any other protein set information.
- the feature aggregation model outputs can include an aggregate feature value set (e.g., an aggregate feature value vector) for the protein set.
- the feature aggregation model can optionally interface with and/or be part of the prediction model (e.g., wherein the prediction model aggregates feature values).
- the feature aggregation model can leverage classical or traditional approaches (e.g., heuristics, equations, etc.), leverage machine learning approaches (e.g., have learned parameters/weights, use multiple-instance learning (MIL), use learning to aggregate (LTA), etc.), and/or be otherwise constructed.
- the feature aggregation model is a traditional or classical model.
- the feature aggregation model can include a weighted combination (e.g., weighted average, etc.) of the feature value sets for individual proteins in the protein set, wherein the weights can be based on protein type, protein set composition (e.g., protein concentration, protein abundance in the protein set, etc.), and/or any other protein information.
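The weighted combination described above can be sketched as a concentration-weighted average of per-protein feature vectors. This is only one possible reading: the function name is hypothetical, and the weights are assumed to be non-negative concentrations or abundances that are normalized by their sum.

```python
def aggregate_features(feature_vectors, concentrations):
    """Concentration-weighted average of per-protein feature vectors.

    feature_vectors: list of equal-length lists, one per protein.
    concentrations: one non-negative weight per protein (any unit).
    """
    if len(feature_vectors) != len(concentrations):
        raise ValueError("need exactly one concentration per protein")
    total = sum(concentrations)
    if total <= 0:
        raise ValueError("concentrations must sum to a positive value")
    n_features = len(feature_vectors[0])
    return [
        sum(w * vec[i] for vec, w in zip(feature_vectors, concentrations)) / total
        for i in range(n_features)
    ]
```

With two proteins at a 1:3 ratio, `aggregate_features([[1.0, 0.0], [3.0, 4.0]], [1.0, 3.0])` weights the second protein three times as heavily as the first.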
- the feature aggregation model is a neural network.
- the feature aggregation model includes a weighted combination of feature value sets for individual proteins in the protein set with optional interaction terms, wherein the weights and/or the interaction terms are learned parameters.
- the feature aggregation model is the prediction model trained using MIL, wherein each instance is an individual protein with a respective concentration, each bag is a protein set, and bag labels are functional property values.
- the feature aggregation model can be otherwise configured.
- the prediction model can function to predict functional property values for a protein set.
- the prediction model can incorporate a correlation model, feature selection model, functional property selection model, feature aggregation model, and/or any other model.
- the prediction model inputs can include: a feature value set for each protein in the protein set (e.g., a feature value vector), a feature value set for the protein set (e.g., a feature value vector for the protein set as a whole, an aggregate feature value vector, etc.), protein set composition, context (e.g., parametrized into a context vector), correlation information (e.g., outputs from the correlation model), and/or any other protein set information.
- the prediction model outputs can include: a functional property value set and/or any other protein set information.
- the prediction model can include a single model and/or multiple models.
- the models can be arranged in series, in parallel, as distinct models, and/or otherwise arranged.
- the models can be trained separately (e.g., using distinct training data sets), trained together (e.g., using the same training data set, using different subsets of the same training data set, etc.), and/or otherwise trained.
- the prediction model outputs functional property values based on feature values associated with the protein set (e.g., feature values for individual proteins in the protein set and/or for the protein set as a whole).
- An example is shown in FIG. 8.
- the model can optionally predict the functional property value based on the context; an example is shown in FIG. 13 .
- the context can be parametrized into a context vector, wherein the context vector can be appended to the protein set feature vector or provided as another input into the model.
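One way to picture the context vector is as a fixed-order list of process parameters appended to the protein set feature vector. The sketch below is illustrative only: the context keys (`temperature_c`, `ph`, `salt_pct`) are hypothetical examples, and filling missing parameters with 0.0 is an assumption made for simplicity rather than something the specification prescribes.

```python
# Hypothetical context parameters, in a fixed order so that every
# context vector has the same layout.
CONTEXT_KEYS = ("temperature_c", "ph", "salt_pct")


def encode_context(context, keys=CONTEXT_KEYS):
    """Parametrize a context dict into a fixed-length numeric vector."""
    return [float(context.get(key, 0.0)) for key in keys]


def build_model_input(protein_set_features, context, keys=CONTEXT_KEYS):
    """Append the context vector to the protein set feature vector."""
    return list(protein_set_features) + encode_context(context, keys)
```

The alternative mentioned above, providing the context as a separate model input, would simply keep the two lists apart instead of concatenating them.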
- the model can predict a value for a single functional property (e.g., be a regression, classifier trained on a single functional property, etc.), values for multiple functional properties (e.g., be a multiclass classifier), and/or values for any other suitable set of functional properties.
- the prediction model predicts functional property values based on protein sequences for the protein set.
- the prediction model can output a vector, wherein each vector position can represent a different functional property and the vector value can represent the predicted value for said functional property.
- the prediction model predicts a functional property similarity score, indicative of the protein set's functional property similarity to a target sample's functional property, wherein the model can be analyzed (e.g., using an acquisition function) to determine which protein set (and/or feature vector) can produce a sample with functional properties that are closer to the target sample (e.g., using a Bayesian optimization technique).
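The specification does not name a particular acquisition function. As one common choice in Bayesian optimization, the sketch below uses expected improvement, assuming the prediction model supplies a predicted mean and standard deviation of the similarity score for each candidate protein set and that higher scores mean closer to the target. The function names are hypothetical.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement of a candidate over the best score seen so far.

    Assumes a Gaussian predictive distribution and that higher is better.
    """
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best_so_far) * cdf + std * pdf


def pick_next(candidates, best_so_far):
    """Pick the candidate with the highest expected improvement.

    candidates: dict mapping a protein set name to (mean, std).
    """
    return max(
        candidates,
        key=lambda name: expected_improvement(*candidates[name], best_so_far),
    )
```

Expected improvement trades off candidates that are predicted to be good against candidates whose prediction is uncertain, so an uncertain candidate can be chosen over one with a slightly higher but certain score.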
- the prediction model predicts the protein set that can produce the target functional property values, target feature values, and/or other target.
- the prediction model (and/or another model) can optionally predict the context (e.g., process parameters) needed to produce the target functional property values.
- the prediction model can predict: which proteins should be included in the protein set, the amount of each protein in the protein set, and/or other aspects of the protein set.
- the prediction model predicts a vector, wherein each vector position represents a different protein, and each value represents an amount of the respective protein.
- the prediction model predicts a protein inclusion vector (e.g., which proteins should be in the set) and a protein amount vector (e.g., how much of the included proteins should be in the set).
- the two vectors can be predicted serially (e.g., protein inclusion vector first, then protein amount vector), at the same time, by the same model, by different models, and/or otherwise predicted.
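The inclusion and amount vectors can be combined into a single composition. The following is a minimal sketch, assuming the inclusion vector holds 0/1 flags, the amount vector holds non-negative quantities, and the result is expressed as fractions of the included total; the function name is hypothetical.

```python
def compose_protein_set(names, inclusion, amounts):
    """Combine an inclusion vector and an amount vector into fractions.

    names, inclusion, and amounts are parallel sequences, one entry per
    protein. Excluded proteins and zero amounts are dropped.
    """
    kept = {
        name: amount
        for name, included, amount in zip(names, inclusion, amounts)
        if included and amount > 0
    }
    total = sum(kept.values())
    if total == 0:
        return {}
    return {name: amount / total for name, amount in kept.items()}
```

Here an amount predicted for an excluded protein is ignored, which corresponds to the serial arrangement in which the inclusion vector gates the amount vector.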
- the prediction model can be otherwise configured.
- the optional protein set determination model can function to determine a candidate protein set with characteristic values that closely match target characteristic values (e.g., the best/closest match, a match below a threshold, etc.).
- the protein set determination model inputs can include target characteristic values (e.g., target functional property values, target feature values, etc.), constraints (e.g., context constraints), the database, predicted characteristic values (e.g., predicted functional property values for each of a set of candidate protein sets), and/or any other information.
- the protein set determination model outputs can include: the candidate protein set (e.g., a candidate protein set selected from the database), the composition of the candidate protein set (e.g., the concentration for each protein in the set), the context for the candidate protein set, an ingredient (e.g., from which the candidate protein set can be derived; for use in product manufacture or target analog manufacture; etc.), and/or any other protein set information.
- the protein set determination model can use: comparison methods (e.g., matching, distance metrics, etc.), thresholds, optimization methods, regression, selection methods, classification, neural networks (e.g., CNNs, DNNs, etc.), clustering methods, rules, heuristics, equations (e.g., weighted equations, etc.), and/or any other methods.
- the protein set determination model can search the database for a candidate protein set and/or determine a new protein set based on the target characteristics.
- the protein set determination model can optionally interface with and/or be part of the prediction model, the similarity model, and/or any other model.
- the protein set determination model can interface with and/or include the prediction model, wherein functional property values are predicted for each of a set of protein sets (e.g., uncharacterized protein sets) using the prediction model.
- the protein set determination model can then select a candidate protein set based on a comparison between the predicted functional property values and target functional property values (e.g., using the similarity model).
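The selection step can be sketched as a nearest-match search over predicted functional property vectors. This is a simplified illustration: Euclidean distance is an assumed comparison method, the property values are assumed to be on comparable scales, and the protein set names used in the usage note are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_candidate(predicted, target, max_distance=None):
    """Return (name, distance) of the protein set closest to the target.

    predicted: dict mapping protein set name to its predicted
    functional property vector. If max_distance is given and even the
    closest candidate exceeds it, the name is None.
    """
    best_name, best_distance = None, math.inf
    for name, values in predicted.items():
        d = euclidean(values, target)
        if d < best_distance:
            best_name, best_distance = name, d
    if max_distance is not None and best_distance > max_distance:
        return None, best_distance
    return best_name, best_distance
```

For example, with predictions `{"pea_blend": [0.8, 0.3], "soy_blend": [0.5, 0.6]}` and target `[0.5, 0.5]`, the second set is selected because its predicted values are closer to the target.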
- the protein set determination model can determine the target feature values for a target protein set and identify a candidate protein set based on a comparison (e.g., the similarity) between the respective feature values (for the candidate protein set) and the target feature values, and/or a comparison (e.g., dissimilarity) between the respective feature values and a set of negative target feature values (e.g., feature values from protein sets to avoid).
- the protein set determination model can be otherwise configured.
- the optional correlation model can function to determine the correlation, interaction, and/or any other association between features and functional properties.
- a correlation model can determine correlations between features and functional properties.
- the correlation model can determine correlations between any first set of features and/or functional properties and any second set of features and/or functional properties.
- the correlation model inputs can include features (e.g., specifying a subset of features for correlation), feature values (e.g., individual protein feature values and/or aggregate feature values), sequences, functional properties (e.g., specifying a subset of functional properties for correlation), functional property values (e.g., where the feature values and/or functional property values are associated via common protein sets in the database), context, protein set compositions, the database, and/or any other information.
- the correlation model outputs can include a mapping between features (e.g., features, feature values, ranges of values, etc.) and functional properties (e.g., functional properties, functional property values, ranges of values, etc.), wherein the mapping can include: correlation coefficients (e.g., negative and/or positive), interaction effects (e.g., negative and/or positive, where a positive interaction effect can represent an increased significance effect of feature A on a functional property when in the presence of feature B), an association, and/or other correlation metric.
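As a simple instance of such a mapping, the sketch below computes Pearson correlation coefficients between every feature and every functional property across a set of characterized protein sets. Pearson correlation is an assumed choice of metric, interaction effects are not covered, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_map(feature_columns, property_columns):
    """Map each (feature, functional property) pair to its coefficient.

    Both arguments are dicts of name -> list of values, where position i
    in every list refers to the same protein set.
    """
    return {
        (feature, prop): pearson(f_values, p_values)
        for feature, f_values in feature_columns.items()
        for prop, p_values in property_columns.items()
    }
```

Positive coefficients indicate that a feature and a functional property tend to increase together across protein sets, and negative coefficients indicate the opposite.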
- the correlation model can use: classifiers, SVMs, ANNs, RF, conditional random field (CRF), K-nearest neighbors, statistical methods, and/or any other method.
- the mapping between features and functional properties can be an association between features and functional properties (e.g., an autocorrelation feature is correlated with stretchability), feature values and/or ranges thereof with functional properties (e.g., a first range of autocorrelation values is correlated with stretchability, while a second range of autocorrelation values is correlated with spreadability, etc.), features with functional property values and/or ranges thereof, feature values and/or ranges thereof with functional property values and/or ranges thereof (e.g., autocorrelation values are correlated with spreadability values), combinations of features with combinations of functional properties (e.g., including interaction effects between features), combinations of feature values with combinations of functional property values, and/or any other association.
- the correlation model can optionally be trained on a set of characterized protein sets (e.g., characterized with feature values, functional property values, etc.).
- the correlation model can identify similar and/or divergent feature values (e.g., calculating an implicit and/or explicit similarity measure) between protein sets and correlate those features to functional properties.
- features with differing values e.g., across protein sets
- the functional properties e.g., across the same protein sets.
- a first feature is mapped to meltability when the feature values for two protein sets are substantially similar (e.g., within a threshold) except for the first feature's values, and the functional property values for the two protein sets are substantially similar except for the meltability values.
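The pairwise rule in the meltability example can be sketched as follows: for each pair of protein sets, find the features and the functional properties whose values differ by more than a threshold, and record a mapping when exactly one of each differs. The thresholds, the data layout, and the function names are assumptions, and values are assumed to be normalized so that one threshold is meaningful across characteristics.

```python
from itertools import combinations


def differing_keys(a, b, threshold):
    """Keys whose values differ by more than threshold between two dicts."""
    return [key for key in a if abs(a[key] - b[key]) > threshold]


def map_single_differences(protein_sets, feature_threshold=0.05,
                           property_threshold=0.05):
    """Map a feature to a functional property when both are the only
    differences between two protein sets.

    protein_sets: dict of name -> (feature dict, property dict), where
    all sets share the same feature keys and the same property keys.
    Returns a list of (feature, property, set_a, set_b) tuples.
    """
    mappings = []
    for (name_a, (feat_a, prop_a)), (name_b, (feat_b, prop_b)) in combinations(
        protein_sets.items(), 2
    ):
        diff_features = differing_keys(feat_a, feat_b, feature_threshold)
        diff_props = differing_keys(prop_a, prop_b, property_threshold)
        if len(diff_features) == 1 and len(diff_props) == 1:
            mappings.append((diff_features[0], diff_props[0], name_a, name_b))
    return mappings
```

A mapping found this way is only a hypothesis about which feature drives which property; it would still need to be checked against further protein sets.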
- feature value differences e.g., sequence differences determined using a sequence alignment method, a classifier, etc.
- related proteins e.g., where a relation is determined using an evolutionary tree
- the correlation model can be otherwise configured.
- the optional feature selection model can function to select a subset of features (e.g., to reduce feature dimensions, to select features likely influencing functional properties, etc.).
- the feature selection model inputs can include: features, feature values, functional properties, functional property values, target characteristic values, correlation information (e.g., outputs from the correlation model, correlation coefficients, interaction effects, etc.), the database, and/or any other protein set information.
- the feature selection model outputs can include: a feature subset, target features (e.g., positive and/or negative targets), and/or any other features.
- the feature selection model can use: supervised selection (e.g., wrapper, filter, intrinsic, etc.), unsupervised selection, recursive feature selection, lift analysis (e.g., based on a feature's lift), any explainability and/or interpretability method (e.g., SHAP values), and/or any other selection method.
- the feature selection model can be a correlation model (and/or vice versa), can include a correlation model (and/or vice versa), can take correlation model outputs as inputs (and/or vice versa), be otherwise related to a correlation model, and/or be unrelated to a correlation model.
- the feature selection model can optionally be trained to select relevant features for functional property value prediction.
- the training target can be a subset of features with high (positive and/or negative) interaction effects and/or correlation with functional properties (e.g., a correlation coefficient for a feature and/or feature set given a target functional property, interaction coefficients for features, whether an expected correlation and/or interaction was validated and/or invalidated in S 600 , etc.).
- the feature selection model can be otherwise trained.
- the feature selection model can be otherwise configured.
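A filter-style feature selection that consumes correlation model outputs can be sketched as ranking features by the absolute value of their correlation with a target functional property. The function name, the `top_k` and `min_abs` parameters, and the feature names in the usage note are hypothetical.

```python
def select_features(correlations, top_k=None, min_abs=0.0):
    """Rank features by |correlation| and keep the strongest ones.

    correlations: dict mapping feature name to its correlation
    coefficient with the target functional property. Strong negative
    correlations are kept as well as strong positive ones.
    """
    ranked = sorted(
        correlations, key=lambda name: abs(correlations[name]), reverse=True
    )
    kept = [name for name in ranked if abs(correlations[name]) >= min_abs]
    return kept if top_k is None else kept[:top_k]
```

For example, given `{"gravy": -0.8, "length": 0.1, "charge": 0.5}` and `top_k=2`, the sketch keeps `gravy` and `charge` and drops `length`.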
- the optional functional property selection model can function to select a subset of functional properties (e.g., to reduce dimensions, etc.).
- the functional property selection model inputs can include: functional properties, functional property values, target characteristic values, correlation information (e.g., outputs from the correlation model, correlation coefficients, interaction effects, etc.), the database, and/or any other protein set information.
- the functional property selection model outputs can include: a functional property subset, target functional properties (e.g., positive and/or negative targets), and/or any other functional properties.
- the functional property selection model can use: supervised selection (e.g., wrapper, filter, intrinsic, etc.), unsupervised selection, recursive feature selection, lift analysis (e.g., based on a functional property's lift), any explainability and/or interpretability method (e.g., SHAP values), and/or any other selection method.
- the functional property selection model can be a correlation model (and/or vice versa), can include a correlation model (and/or vice versa), can take correlation model outputs as inputs (and/or vice versa), be otherwise related to a correlation model, and/or be unrelated to a correlation model.
- the functional property selection model can be otherwise configured.
- the optional similarity model can function to compare two sets of characteristic values.
- the similarity model inputs can include candidate protein set characteristic values, target characteristic values, and/or any other information.
- the similarity model outputs can include a comparison metric.
- the similarity model can use: comparison methods (e.g., matching, distance metrics, etc.), thresholds, optimization methods, regression, selection methods, classification, neural networks (e.g., CNNs, DNNs, etc.), clustering methods, rules, heuristics, equations (e.g., weighted equations, etc.), and/or any other methods.
- the comparison metric can be qualitative, quantitative, relative, discrete, continuous, a classification, numeric, binary, and/or be otherwise characterized.
- the comparison metric can be or include a distance, difference (e.g., vector of differences between values for each characteristic, vector of squared differences between values for each characteristic), ratio, regression, residuals, clustering metric (e.g., wherein multiple samples of the candidate and/or target protein sets are evaluated, wherein multiple candidate and/or target protein sets are evaluated, etc.), a statistical measure, and/or any other comparison measure.
- the comparison metric is a distance in feature space (e.g., wherein a characteristic value set is an embedding in the feature space).
- the comparison metric is low (e.g., the candidate protein set is similar to the target product/protein set) when the candidate protein set characteristic values are near (in feature space) positive target characteristic values and/or far from negative target characteristic values.
- the similarity model can be otherwise configured.
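As a minimal sketch of the distance-based comparison metric described above, assuming characteristic value sets are plain numeric vectors of equal length: the function names, the Euclidean distance, and the `negative_weight` parameter are illustrative assumptions and are not specified by the source.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length characteristic value vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def comparison_metric(candidate, positive_targets, negative_targets=(), negative_weight=1.0):
    """Hypothetical metric: low when the candidate is near a positive target
    and far from every negative target (lower means more similar)."""
    if not positive_targets:
        raise ValueError("at least one positive target is required")
    nearest_positive = min(euclidean(candidate, t) for t in positive_targets)
    if not negative_targets:
        return nearest_positive
    nearest_negative = min(euclidean(candidate, t) for t in negative_targets)
    # Subtracting the distance to the nearest negative target rewards
    # candidates that sit far away from undesired characteristic values.
    return nearest_positive - negative_weight * nearest_negative


# Illustrative values only.
print(comparison_metric([1.0, 0.0], [[1.0, 0.0]], [[4.0, 4.0]]))
```

Other combinations (e.g., ratios, thresholds, learned similarity) are equally consistent with the description; this form is chosen only because it is simple to inspect.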
- the optional structure prediction model functions to predict the protein folding structure, given the context.
- the resultant structure can be parametrized and used to determine the protein set feature values, used to determine the functional property values, or otherwise used.
- structure prediction models that can be used include: AlphaFold, I-TASSER, HHpred, and/or any other suitable protein structure prediction model.
- the system can optionally include an evolutionary tree (e.g., representing evolutionary relationships or distances between protein sources, protein sets, etc.).
- the evolutionary tree and/or evolutionary distances based on the evolutionary tree can be predetermined (e.g., where the evolutionary tree is stored in the system database and/or a third-party database), be retrieved (e.g., for each source in the database), and/or be otherwise determined.
- the evolutionary tree can be used to identify features, facilitate protein and/or protein set selection, discover a protein source component for a given protein set, and/or be otherwise used.
- the evolutionary tree can be traversed to identify candidate protein sources and/or protein source components (e.g., source components that are more commercially feasible) that might have similar protein sets to a given protein source.
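The traversal described above can be sketched as follows, assuming the evolutionary tree is stored as a child-to-parent mapping and that evolutionary distance is approximated by the number of edges between two sources. The tree contents, the function names, and the edge-count distance are hypothetical placeholders, not an actual phylogeny or the method of the source.

```python
def path_to_root(parent, node):
    """List of nodes from `node` up to the root of the tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, a, b):
    """Number of edges between two nodes via their lowest common ancestor."""
    depth_from_a = {n: i for i, n in enumerate(path_to_root(parent, a))}
    for steps_from_b, n in enumerate(path_to_root(parent, b)):
        if n in depth_from_a:
            return depth_from_a[n] + steps_from_b
    raise ValueError("nodes do not share a common ancestor")


def related_sources(parent, target, sources, max_distance):
    """Candidate sources within `max_distance` edges of the target, nearest first."""
    scored = [(tree_distance(parent, target, s), s) for s in sources if s != target]
    return [s for d, s in sorted(scored) if d <= max_distance]


# Toy tree for illustration only (assumed labels, not real taxonomy data).
toy_tree = {"soy": "legumes", "pea": "legumes", "legumes": "plants",
            "almond": "nuts", "nuts": "plants"}
print(related_sources(toy_tree, "soy", ["pea", "almond", "soy"], max_distance=2))
```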
- the method can include: characterizing a protein set S 100 , training a prediction model S 300 , determining target characteristic values S 400 , determining a candidate protein set based on the target characteristic values S 500 , and/or any other suitable steps.
- the method can optionally include selecting a feature subset S 200 , selecting a functional property subset S 250 , evaluating the candidate protein set S 600 , and/or any other suitable steps.
- the method can be performed once (e.g., for a given target), iteratively (e.g., to train one or more models, to iteratively improve determination of a candidate protein set, etc.), concurrently with data generation (e.g., where a database of characterized and/or uncharacterized sources is iteratively updated while one or more protein set determination events are occurring), and/or at any other suitable frequency. All or portions of the method can be performed in real time (e.g., responsive to a request), iteratively, asynchronously, periodically, and/or at any other suitable time. All or portions of the method can be performed automatically, manually, semi-automatically, and/or otherwise performed. All or portions of the method can be performed during training and/or inference (e.g., prediction).
- All or portions of the method can be performed by one or more components of the system, by a user, by a computing system, and/or by any other suitable system.
- the computing system can include one or more: CPUs, GPUs, custom FPGAs/ASICs, microprocessors, servers, cloud computing, and/or any other suitable components.
- the computing system can be local, remote, distributed, or otherwise arranged relative to any other system or module.
- Characterizing a protein set S 100 functions to determine abstracted characterizations (e.g., feature values, functional property values, etc.) of the protein set, wherein the characterizations can be used to train the prediction model and/or any other model (e.g., to generate training data), to determine correlations between features and functional properties, to expand the database, and/or for any other downstream functionality.
- S 100 can be performed before S 400 and/or at any other time.
- the protein set is characterized as a whole (e.g., where characteristic values are determined for and/or associated with the protein set as a unit).
- the protein set is characterized based on the characteristic values (e.g., feature values, functional property values, etc.) of the constituent proteins.
- each constituent protein is individually characterized, and the mixture characterization is determined based on the individual characterizations (e.g., a set including the individual characteristic values, aggregated individual characteristic values, characteristic values weighted based on the concentration of the constituents in the protein mixture, characteristic values weighted based on a relative importance for a constituent protein in influencing functional properties, etc.).
- the protein set characterization can be determined using a model (e.g., the feature aggregation model, a machine learning model, etc.) that determines protein set characteristic values based on the individual characterizations of the constituent proteins.
- a subset of proteins in the mixture are assigned characteristic values (e.g., only the highest concentrated protein(s) are assigned feature values and/or functional property values, proteins having a concentration percent value higher than a threshold, etc.).
- Characterizing a protein set can include: optionally determining a composition of the protein set (e.g., S 120 ), determining sequences for the protein set (e.g., S 140 ), determining feature values for the protein set (e.g., S 160 ), determining functional property values for the protein set (e.g., S 180 ), and/or optionally determining a functionality (e.g., impact on functional properties, interaction with other molecules, structural functions, etc.) of the protein set (e.g., using machine learning annotation, using a correlation model, using explainability and/or interpretability methods, etc.).
- S 120 , S 140 , S 160 , and S 180 are performed for training protein sets, while only S 120 , S 140 , and S 160 are performed for candidate protein sets. However, S 100 can be otherwise performed.
- Characterizing the protein set can optionally include manufacturing a sample using the protein set (e.g., wherein the manufacturing process is defined based on a context associated with the protein set), wherein all or parts of S 100 are performed for the sample.
- the sample can optionally be processed prior to, during, or after, performing any assay (e.g., using dilution, centrifugation, dehydration, lyophilization, reconstitution, concentration methods, etc.).
- Determining a composition of the protein set S 120 functions to identify each protein and/or the concentration of each protein in the set (e.g., a concentration for each protein within the protein set, a concentration of each protein within a sample containing the protein set, etc.).
- the composition can be manually or automatically specified (e.g., for a candidate protein set).
- the composition of the protein set can be measured (e.g., using mass spectrometry proteomics, a Bradford assay, capillary electrophoresis SDS, and/or any other assay).
- a sample can be manufactured using the protein set, wherein the protein set composition in the sample is measured using one or more assays and/or assay tools.
- a total protein quantification and individual protein abundances can be measured for the sample, wherein the concentration for each protein in the sample is based on the total protein quantification and individual protein abundances.
- the composition can be inferred using bioinformatics (e.g., machine learning techniques applied to codons), genomics, transcriptomics, and/or other protein expression prediction techniques. However, the composition can be otherwise determined.
- the protein concentrations can be used to identify the most abundant proteins in the set, to weight variables (e.g., features), to determine (e.g., in downstream analyses) proteins that have a disproportionate effect on functional properties relative to their concentration, and/or otherwise used.
- a protein set and/or data associated with a protein set can be adjusted based on the protein composition.
- a subset of the protein set is determined (e.g., to represent the complete protein set), wherein the subset includes the highest prevalence proteins in the set.
- proteins that occupy a proportion of the protein set above a threshold percentage are selected as the subset, wherein the threshold percentage can be between 0.5%-50% or any range or value therebetween (e.g., 1%, 2%, 5%, 10%, 15%, 20%, 25%, 50%, etc.), but can alternatively be less than 0.5% or greater than 50%.
- proteins with an overall concentration in the sample above a threshold percentage are selected as the subset, wherein the threshold percentage can be between 0.05%-20% or any range or value therebetween (e.g., 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 15%, etc.), but can alternatively be less than 0.05% or greater than 20%.
- a threshold number of the highest prevalence proteins are selected for the subset, wherein the threshold number can be between 1-100 or any range or value therebetween (e.g., 2-10, 5, 10, 15, etc.), but can alternatively be greater than 100.
- proteins having a certain set of characteristics (e.g., binding affinity, adsorption affinity, etc.) can be selected as the subset.
- data associated with a protein set can be weighted based on the proportions of each constituent protein.
- the protein set composition can be otherwise determined.
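The composition handling described above (per-protein concentration from a total protein quantification and individual abundances, then subset selection by threshold percentage or by a threshold number) can be sketched as below. The function names, the units, and the example abundances are assumptions for illustration only.

```python
def concentrations(total_protein, abundances):
    """Per-protein concentration = total protein quantity x relative abundance."""
    total_abundance = sum(abundances.values())
    if total_abundance <= 0:
        raise ValueError("abundances must sum to a positive value")
    return {name: total_protein * a / total_abundance for name, a in abundances.items()}


def select_by_threshold(abundances, threshold_percent):
    """Proteins whose share of the protein set is above the threshold percentage."""
    total = sum(abundances.values())
    return [n for n, a in abundances.items() if 100.0 * a / total > threshold_percent]


def select_top_n(abundances, n):
    """The n highest-prevalence proteins, most abundant first."""
    return sorted(abundances, key=abundances.get, reverse=True)[:n]


# Hypothetical measured abundances (arbitrary units) and total protein (e.g., mg/mL).
measured = {"A": 60.0, "B": 30.0, "C": 9.0, "D": 1.0}
print(concentrations(20.0, measured))
print(select_by_threshold(measured, threshold_percent=5.0))
print(select_top_n(measured, n=2))
```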
- Determining sequences for the protein set S 140 functions to determine information for feature extraction (e.g., for sequence-based features) and/or to directly determine feature values (e.g., where the feature values are sequences). Sequences can be measured (e.g., using an assay), retrieved (e.g., from a third-party database), and/or otherwise determined. Determining sequences can optionally include determining secondary information associated with the sequences (e.g., protein structure information, metadata, etc.).
- a sequence is preferably determined for each individual protein in the protein set (e.g., retrieved from a database, determined using protein sequencing, etc.), but can alternatively be determined for a subset of proteins in the protein set, be determined directly for the protein set as a whole, and/or be otherwise determined. However, sequences for the protein set can be otherwise determined.
- Determining feature values for the protein set S 160 functions to computationally identify characterization values (e.g., molecular property values) of the protein set.
- S 160 can be performed one or more times for each protein in a protein set, one or more times for each protein in a subset of the protein set (e.g., for the highest prevalence proteins in the protein set), one or more times for each protein set (e.g., iterating through a database), after S 200 (e.g., where feature values for a protein set are determined for the features selected in S 200 ), and/or at any other suitable time.
- the feature values can optionally be a feature value vector (e.g., wherein each element of the vector is a feature value for a feature in a feature set).
- Feature values are preferably determined using the feature extraction model, but can alternatively be otherwise determined.
- feature values can be computationally determined.
- feature values can be extracted from sequences (e.g., amino acid sequences).
- feature values can be based on a computationally-determined protein charge and/or charge distribution.
- feature values can be determined based on a modeled folding pattern (e.g., a likely protein folding pattern).
- context feature values can be determined based on context information (e.g., extracted from ingredient lists, treatments, protein modifications, etc.) and optionally the protein sequences.
- feature values can be measured and/or extracted from measurements (e.g., experimentally determined using assays).
- feature values can be determined using a simulation (e.g., protein folding simulation, protein functionality simulation, protein interaction simulation, etc.).
- feature values can be retrieved from a database (e.g., a third-party database, the system database, etc.).
- a first subset of feature values can be determined using a first feature extraction model while the remaining feature values are determined using a second feature extraction model (e.g., using values from the first feature subset, using other information, etc.).
- Feature values can be determined using one or more of the variants.
- feature values can be computationally determined and subsequently validated and/or updated using measurements (e.g., values for water binding capacity can be estimated based on computationally determined charge distribution and/or folding pattern, then subsequently tested using centrifugal compression).
- the amino acid sequence for each protein of the protein set (e.g., for a subset of the protein set including the highest prevalence proteins)
- context feature values can be determined based on context information retrieved from the database, and sequence feature values can be determined using a feature extraction model.
- S 160 can optionally include aggregating feature values across individual proteins in the protein set (e.g., all proteins in the set, a subset of the protein set, etc.).
- an aggregated feature value set (e.g., aggregated feature value vector)
- the feature values are preferably aggregated using the feature aggregation model, but can alternatively be otherwise aggregated.
- aggregating feature values includes summing the values for each feature across the proteins of the protein set (e.g., optionally weighted by concentration or abundance).
- aggregating feature values includes predicting an aggregated feature vector based on the feature value set for each protein of the protein set and optionally the respective protein concentration or abundance (e.g., wherein the feature value sets can be concatenated, fed to different input heads, etc.).
- aggregating feature values can include predicting the aggregated feature vector based on the protein sequences of the proteins within the protein set (e.g., wherein the protein sequences can be concatenated, fed to different input heads, etc.).
- feature values can be otherwise determined.
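One of the aggregation variants above, summing each feature across the proteins of the set with optional weighting by concentration or abundance, can be sketched as follows. The function name and the example feature vectors are assumptions; a learned feature aggregation model would replace this simple weighted sum.

```python
def aggregate_features(feature_vectors, weights=None):
    """Sum each feature across proteins, optionally weighted per protein.

    feature_vectors: mapping of protein name -> list of feature values.
    weights: optional mapping of protein name -> concentration or abundance.
    """
    if not feature_vectors:
        raise ValueError("at least one protein is required")
    names = list(feature_vectors)
    if weights is None:
        weights = {name: 1.0 for name in names}  # plain (unweighted) sum
    length = len(feature_vectors[names[0]])
    aggregated = [0.0] * length
    for name in names:
        vector = feature_vectors[name]
        if len(vector) != length:
            raise ValueError("all feature vectors must have the same length")
        for i, value in enumerate(vector):
            aggregated[i] += weights[name] * value
    return aggregated


# Hypothetical per-protein feature values and concentration fractions.
features = {"Protein 1": [1.0, 2.0], "Protein 2": [3.0, 4.0]}
print(aggregate_features(features, {"Protein 1": 0.7, "Protein 2": 0.3}))
```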
- Determining functional property values for the protein set S 180 functions to determine behavior of the protein set. S 180 can be performed before S 160 , after S 160 , during S 600 , iteratively, and/or at any other suitable time.
- the functional property values are preferably measured and/or otherwise directly determined values for a set of functional properties, but can alternatively be manually assigned, inferred, predicted, or otherwise determined.
- functional property values are measured and/or extracted from measurements (e.g., measurements determined using any assay and/or assay tool).
- a sample can be manufactured using the protein set, wherein the functional property values for the protein set are measured using one or more assays and/or assay tools.
- the functional property values are determined for a protein and lipid gel (e.g., wherein the gel manufacturing is prescribed by a context associated with the protein set).
- the functional property values can be determined using one or more experimental environments, treatments, and/or any other variable (e.g., where the set of functional property values determined in an environment are associated with that variable).
- the functional property values can be retrieved from a database (e.g., a third-party database).
- the functional property values can be computationally determined.
- the functional property values can be determined based on simulations (e.g., computer simulations of protein dynamics).
- the functional property values can be predicted using the prediction model (e.g., based on the protein set feature values, etc.).
- the method can optionally include selecting a feature subset S 200 , which functions to select features which most likely influence (e.g., have a measurable effect on, a significant effect on, a disproportionate effect relative to their concentration, etc.) one or more functional properties and/or to reduce feature space dimensions (e.g., to reduce computational load).
- S 200 can be performed after S 100 , before S 400 , during and/or after S 300 , and/or at any other suitable time.
- the feature subset can be selected using a feature selection model, using a correlation model, randomly, with human input, and/or be otherwise determined.
- the feature subset can be features (e.g., target features) that influence functional properties.
- the feature selection model uses lift analysis (e.g., applied to a prediction model trained to output functional property values based on the feature values) to select the subset of features with lift above a threshold.
- features with prediction model weights above a threshold value are selected as the feature subset, wherein the model weights can be determined during and/or after prediction model training.
- a correlation model can be used to determine features positively and/or negatively correlated to one or more functional properties (e.g., absolute value of correlation coefficient above a threshold, a confidence score above a threshold, etc.).
- the subset of features can be determined using any dimensionality reduction technique (e.g., principal component analysis, linear discriminant analysis, etc.).
- the subset of features can be determined based on a comparison between a target (e.g., a target protein set and/or target product) and a candidate protein set (e.g., a prototype protein set), wherein the subset of features (e.g., used to predict functional properties for a second candidate protein set) can be selected based on the similarities and/or differences between the respective functional property values.
- a difference between functional property values associated with the target and candidate protein set can be determined (e.g., where values for one or more functional properties differ significantly between the target and candidate).
- the differing functional property values can define a functional property subset (e.g., target functional properties).
- target functional properties can then be used to determine a feature subset (e.g., target features), wherein the feature subset can be the feature(s) most likely to influence the functional property subset (e.g., based on a correlation model output).
- a difference between feature values associated with the target and candidate protein set can be determined (e.g., where one or more feature values differ between the two sets).
- the features associated with the differing feature values can define the feature subset (e.g., target features).
- the feature subset can be otherwise selected.
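The correlation-based variant above (selecting features whose correlation with a functional property has an absolute value above a threshold) can be sketched as below. The use of the Pearson coefficient, the function names, and the example feature names (`charge`, `length`, `noise`) are assumptions; the source does not fix a particular correlation measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def select_features(feature_columns, property_values, threshold):
    """Keep features whose |correlation| with the functional property exceeds the threshold."""
    selected = {}
    for name, column in feature_columns.items():
        r = pearson(column, property_values)
        if abs(r) > threshold:
            selected[name] = r  # sign indicates positive or negative correlation
    return selected


# Hypothetical feature values for four protein sets and one measured property.
columns = {"charge": [1, 2, 3, 4], "length": [4, 3, 2, 1], "noise": [1, -1, -1, 1]}
print(select_features(columns, [2, 4, 6, 8], threshold=0.5))
```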
- the method can optionally include selecting a functional property subset S 250 , which functions to reduce functional property space dimensions (e.g., to reduce computational load).
- S 250 can be performed after S 100 , before S 400 , during and/or after S 300 , and/or at any other suitable time.
- the functional property subset can be selected using a feature selection model, using a correlation model, randomly, with human input, and/or be otherwise determined.
- the subset of functional properties can be determined using any dimensionality reduction technique (e.g., principal component analysis, linear discriminant analysis, etc.).
- the subset of functional properties can be determined based on a comparison between a target (e.g., a target protein set and/or target product) and a first candidate protein set (e.g., a prototype protein set), wherein the subset of functional properties can be selected based on the similarities and/or differences between the functional property values for the target and the first protein set.
- a difference between functional property values associated with the target and candidate protein set can be determined (e.g., where values for one or more functional properties differ significantly between the two sets).
- the differing functional property values can define a functional property subset (e.g., target functional properties).
- the functional property subset can be otherwise selected.
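The comparison-based selection above, in which the functional properties that differ significantly between the target and a candidate define the functional property subset, can be sketched as follows. The relative-tolerance test, the function name, and the property names are illustrative assumptions; the source does not define what counts as a significant difference.

```python
def differing_properties(target, candidate, relative_tolerance=0.1):
    """Functional properties whose values differ by more than the relative tolerance.

    Only properties measured for both the target and the candidate are compared.
    """
    differing = []
    for name in sorted(target.keys() & candidate.keys()):
        # Fall back to an absolute difference when the target value is zero.
        scale = abs(target[name]) or 1.0
        if abs(target[name] - candidate[name]) / scale > relative_tolerance:
            differing.append(name)
    return differing


# Hypothetical functional property values (arbitrary units).
target_values = {"gel_strength": 10.0, "melt": 0.8, "stretch": 5.0}
prototype_values = {"gel_strength": 6.0, "melt": 0.75, "stretch": 5.1}
print(differing_properties(target_values, prototype_values))
```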
- Training a prediction model S 300 functions to improve functional property value prediction, candidate protein set determination (using the prediction model), and/or any other part of the method.
- S 300 can be performed after S 100 and/or at any other time.
- training the prediction model includes determining training data including feature values (e.g., determined via S 160 ) and corresponding functional property values (e.g., determined via S 180 ) for one or more protein sets (e.g., a set of protein sets).
- the functional property values in the training data are preferably measured, but can alternatively be otherwise determined (e.g., using any other method in S 180 ).
- the prediction model is then trained using the training data to predict the functional property values for a protein set based on the feature values for the protein set. Examples are shown in FIG. 6 , FIG. 7 A , and FIG. 7 B .
- the training data can include positive samples (e.g., with no negative samples), wherein the prediction model is trained using positive-unlabeled learning.
- the training data can include negative samples, wherein the prediction model can be trained to distance the prediction from the negative samples.
- one or more prediction models can be otherwise trained.
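As a minimal sketch of the training step above, assuming a single functional property and a linear model fitted by batch gradient descent on mean squared error: the linear form, the hyperparameters, and the toy data are assumptions swapped in for illustration, since the source does not restrict the prediction model to any particular architecture.

```python
def train_linear_model(features, targets, learning_rate=0.1, epochs=2000):
    """Fit weights and bias mapping feature vectors to one functional property value."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [bias + sum(w * x for w, x in zip(weights, row)) - y
                  for row, y in zip(features, targets)]
        # Gradient of the mean squared error with respect to each parameter.
        for j in range(d):
            gradient = 2.0 / n * sum(e * row[j] for e, row in zip(errors, features))
            weights[j] -= learning_rate * gradient
        bias -= learning_rate * 2.0 / n * sum(errors)
    return weights, bias


def predict(weights, bias, feature_values):
    """Predicted functional property value for one protein set."""
    return bias + sum(w * x for w, x in zip(weights, feature_values))


# Toy training data: feature vectors (as in S 160) and measured values (as in S 180).
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_y = [1.0, 3.0, 4.0, 6.0]
w, b = train_linear_model(train_x, train_y)
print(predict(w, b, [0.5, 0.5]))
```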
- Determining target characteristic values S 400 functions to specify one or more criteria for candidate protein set determination.
- the candidate protein set can be selected to manufacture an analog for a target product, to replace a target protein set (e.g., a protein set to be replicated and/or replaced, a protein set to be replicated with specified modifications, etc.), to meet a desired set of characteristic values, and/or otherwise used.
- S 400 can be performed after S 100 (e.g., after a target protein set has been characterized) and/or at any other time.
- the target characteristic values are preferably associated with a characterized protein set (e.g., a characterized target protein set), but alternatively can be associated with an uncharacterized protein set, be associated with a source and/or source component, be associated with a target product (e.g., target food product), be otherwise associated with protein set information, and/or not be associated with a protein set and/or source.
- the target characteristic values can be all or a subset of: the functional property values, the feature values, the amino acid sequences, and/or any other characteristic value associated with a target: product, source, source component, and/or protein set.
- the target characteristic values can be determined manually, automatically, predetermined, with a model (e.g., target features selected using a feature selection model, target functional properties selected using a functional property selection model, etc.), based on a target product and/or target protein set, based on a use case (e.g., the use case for the candidate protein set, for the associated target protein set, etc.), retrieved from a database (e.g., where target functional property values are those associated with a target protein set in the database), measured, and/or be otherwise determined.
- the target characteristic values include target feature values.
- the target feature values can be determined for a target protein set using S 160 methods.
- a subset of feature values of the target protein set can be used as the target characteristic values, where the subset can correspond to the feature subset determined in S 200 .
- target functional property values are used to determine target feature values.
- a correlation model is used to identify feature values associated with the target functional property values.
- the target characteristic values include target functional property values.
- the target functional property values can be determined for a target product and/or protein set using S 180 methods.
- the target functional property values can be manually specified (e.g., desired or optimal functional property values for a product, a desired change in functional property values relative to functional property values for a protein set, etc.).
- the target characteristic values can include target feature values and target functional property values (e.g., a combination of the first and second variants).
- target characteristic values can be otherwise determined.
- Determining a candidate protein set based on the target characteristic values S 500 functions to determine a protein set that satisfies target criteria (e.g., has desired characteristic values, mimics a target product/protein set, etc.). Additionally or alternatively, S 500 functions to determine a candidate protein set for evaluation in S 600 (e.g., wherein characterization of the candidate protein set can train the prediction model). S 500 can be performed after S 400 , after S 300 , during S 300 (e.g., as part of training), and/or at any other suitable time.
- Determining the candidate protein set can optionally include determining the composition of the candidate protein set (e.g., determining each protein in the set and/or determining the concentration of each protein in the set) and/or selecting a context for the candidate protein set.
- determining each protein in the candidate protein set includes individually selecting each individual protein in the candidate protein set from proteins in a candidate group of protein sets. In a second variant, determining each protein in the candidate protein set includes selecting the candidate protein set as a whole from the candidate group of protein sets.
- the candidate group of protein sets can include uncharacterized protein sets, partially characterized protein sets (e.g., with feature values but not functional property values), fully characterized protein sets (e.g., with both feature values and functional property values), known or estimated abundant protein sets (e.g., determined based on functional protein labelling), and/or any other set of protein sets.
- the candidate group can optionally be a subset of the system database (e.g., to reduce the computational resources, to reduce the search space, to constrain all or parts of the selection, etc.).
- the candidate group can include a subset of protein sources (e.g., candidate protein sources, wherein all or parts of the protein sets associated with each candidate protein source are included), a subset of protein sets, and/or any other subset.
- an evolutionary tree is used to identify protein sources evolutionarily related to a target protein source, wherein the candidate group includes protein sets associated with the identified protein sources.
- the candidate group includes a set of protein sets with target feature values (e.g., within a threshold similarity to target feature values).
- Each protein set in the candidate group can optionally be associated with one or more concentrations and/or contexts.
- each protein set can be associated with a predetermined set of possible values for each concentration and context parameter (e.g., a protein set can be associated with each unique combination of possible compositions and context values).
- a protein set in the candidate group includes [Protein 1, Protein 2]; the possible compositions for the protein set include: [70%, 30%], [30%, 70%], and [50%, 50%]; the possible contexts for the protein set include: [combine with canola oil, heat to 65° C., glycosylation of Protein 1], [combine with kokum butter, heat to 65° C., glycosylation of Protein 1], [combine with canola oil, heat to 72° C., glycosylation of Protein 1], [combine with kokum butter, heat to 72° C., glycosylation of Protein 1], [combine with canola oil, heat to 65° C., no glycosylation of Protein 1], [combine with kokum butter, heat to 65° C., no glycosylation of Protein 1], [combine with canola oil, heat to 72° C., no glycosylation of Protein 1], and [combine with kokum butter, heat to 72
- the candidate protein set can be determined based on: the target and candidate protein set's characteristic values (e.g., functional property values, feature values, etc.), estimated abundance and/or ease of extraction (e.g., determined based on the protein set's functionality, the protein source, the source component, etc.), the database, and/or any other factor.
- Any candidate protein set determination method can optionally be supplemented based on protein source and/or source component information (e.g., where the probability of selecting a protein set as the candidate protein set increases if the protein set is likely to be abundant within the protein source and/or the protein source itself is likely to be abundant relative to a threshold).
- the candidate protein set is determined using optimization approaches (e.g., Bayesian optimization, machine learning recommender systems, etc.).
- the candidate protein set can be selected as a training protein set for characterization (e.g., to expand the training data for use in S 300 ), wherein optimization approaches can be used to reduce (e.g., minimize) the number of additional training protein sets that are needed to train the prediction model and/or to identify a candidate protein set that satisfies the target criteria.
- the candidate protein set can be determined by comparing (e.g., matching) one or more characteristic values using a similarity model to generate a comparison metric (e.g., example shown in FIG. 9 ).
- characteristic values for each protein set in the candidate group (e.g., for each unique protein set composition and context pair) can be predicted (e.g., using the prediction model), wherein the predicted characteristic values are compared to the target characteristic values to generate the comparison metric.
- the candidate protein set (e.g., with associated composition and context) can then be determined based on the comparison metric (e.g., selecting the protein set with the minimum or maximum comparison metric, selecting a protein set with a comparison metric above or below a threshold, selecting the protein set using a protein set determination model, etc.).
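- The comparison-based selection described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: `predict_values` is a hypothetical stand-in for the trained prediction model, the two functional properties and all numeric values are invented, and Euclidean distance is only one possible comparison metric.

```python
import itertools
import math

# Hypothetical candidate group: one protein set [Protein 1, Protein 2]
# with several compositions and contexts (all values are illustrative).
compositions = [(0.7, 0.3), (0.3, 0.7), (0.5, 0.5)]
fats = ["canola oil", "kokum butter"]
temperatures_c = [65, 72]


def predict_values(composition, fat, temperature_c):
    """Stand-in for the prediction model (an assumption, not the real model).

    Returns an invented (firmness, meltability) pair.
    """
    p1, _p2 = composition
    firmness = 10.0 * p1 + 0.1 * (temperature_c - 65)
    meltability = 5.0 * (1.0 - p1) + (1.0 if fat == "kokum butter" else 0.0)
    return (firmness, meltability)


def comparison_metric(predicted, target):
    """Euclidean distance between predicted and target values (lower is closer)."""
    return math.dist(predicted, target)


def select_candidate(target):
    """Score every composition and context pair and return the closest one."""
    scored = []
    for comp, fat, temp in itertools.product(compositions, fats, temperatures_c):
        metric = comparison_metric(predict_values(comp, fat, temp), target)
        scored.append((metric, comp, fat, temp))
    return min(scored)  # the minimum comparison metric wins


best_metric, best_comp, best_fat, best_temp = select_candidate(target=(5.0, 3.5))
```

- Selecting the minimum is one of the options named above; a threshold test or a separate protein set determination model could replace the final `min` call.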
- the candidate protein set's predicted functional property values can be compared to target functional property values (e.g., for an analogous set of functional properties).
- the candidate protein set's functional property values are predicted using the prediction model (e.g., based on feature values, based on context, etc.).
- the candidate protein set's feature values can be compared to target feature values. For example, a match between positive target feature values and candidate protein set feature values can increase the probability of selection of the candidate protein set, whereas a match between negative target feature values and candidate protein set feature values can decrease the probability of selection.
- a candidate protein source and/or candidate protein set can be selected based on an evolutionary tree.
- the candidate protein source is selected by identifying a protein source based on a close evolutionary relationship with a target protein source and/or protein source containing a matching candidate protein set.
- the candidate protein set can be selected by identifying close evolutionary relationships between proteins in the candidate protein set and proteins in a target protein set.
- additional candidate protein set(s) can be selected after a first selection event of a first candidate protein set by identifying additional protein set(s) based on close evolutionary relationships to the first candidate protein set.
- the candidate protein set can optionally be used to manufacture an analog for a target food product (e.g., dairy analog, meat analog, egg analog, any animal product analog, etc.) and/or any other sample (e.g., product).
- a protein source associated with the candidate protein set can be selected as an ingredient for manufacturing a product.
- proteins in the candidate protein set can be extracted and/or isolated from one or more sources, wherein a sample is manufactured (e.g., based on a context associated with the candidate protein set) using the proteins to have the determined candidate protein set composition.
- the candidate protein set can be otherwise determined.
- the method can optionally include evaluating the candidate protein set S 600 , which functions to determine whether the candidate protein set can be used in an analog for a target product, whether the candidate protein set can be used as a replacement for a target protein set, whether the candidate protein set has the desired (e.g., target) characteristic values, to determine feedback for a model (e.g., for training the prediction model, the protein set determination model, and/or any other model), and/or to compare the functional property values of the candidate protein set to one or more other functional property values.
- S 600 can be performed after S 500 , after S 180 (e.g., after the candidate protein set is characterized with functional property values), iteratively (e.g., until a stop condition is met, such as substantial similarity to the target), and/or at any other time.
- a search for a protein set can be continued (e.g., iteratively performing S 500 and S 600 ) until a candidate protein set satisfies a set of target criteria (e.g., stopping when the evaluation indicates that the candidate protein set characteristic values fall within target ranges), until a comparison metric is below or above a threshold, for a predetermined number of iterations, and/or until any other stop condition is met.
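- The iterative search with stop conditions can be sketched as below. The `propose` and `evaluate` callables are hypothetical placeholders for candidate determination (S 500) and evaluation (S 600); the firmness values and target range are invented for illustration.

```python
def within_target_ranges(values, target_ranges):
    """True when every characteristic value falls inside its (low, high) range."""
    return all(low <= values[name] <= high
               for name, (low, high) in target_ranges.items())


def search(propose, evaluate, target_ranges, max_iterations=10):
    """Alternate proposing and evaluating candidates until a stop condition is met.

    Stops when the target criteria are satisfied or after max_iterations.
    Returns (candidate, iterations_used); candidate is None if nothing matched.
    """
    history = []
    for iteration in range(1, max_iterations + 1):
        candidate = propose(history)      # placeholder for S 500
        values = evaluate(candidate)      # placeholder for S 600
        history.append((candidate, values))
        if within_target_ranges(values, target_ranges):
            return candidate, iteration
    return None, max_iterations


# Toy stand-ins: the candidate is the percentage of Protein 1 in the set.
def propose(history):
    return 10 * (len(history) + 1)


def evaluate(percent_protein_1):
    return {"firmness": percent_protein_1 / 10}


result = search(propose, evaluate, target_ranges={"firmness": (3.5, 4.5)})
no_match = search(propose, evaluate, target_ranges={"firmness": (100.0, 200.0)})
```

- In this toy run the fourth proposal (40% Protein 1) is the first whose firmness falls within the target range, so the loop stops there; the second call exhausts its iteration budget instead.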
- the target criteria include one or more ranges of characteristic values based on target characteristic values (e.g., predetermined ranges around the target characteristic values).
- S 600 can include: determining functional property values for the candidate protein set (e.g., S 180 performed for the candidate protein set), and determining a comparison metric based on the resultant functional property values (e.g., using the similarity model).
- determining functional property values for the candidate protein set includes manufacturing a sample containing the candidate protein set (e.g., at a protein composition determined in S 500 and/or using a context determined in S 500 ), wherein the sample is subjected to assays to measure functional property values.
- the sample (e.g., target food replica) can be manufactured by mixing the protein set with a set of other ingredients (e.g., plant-derived ingredients, such as fats, oils, sugars, etc.) and processing the mixture (e.g., by heating, reacting, inoculating, fermenting, etc.).
- the sample can be manufactured by gelling the protein, then using the gel as an ingredient.
- the manufactured samples can be entirely or mostly plant-derived (e.g., more than 70%, 80%, 90%, 99%, etc. plant-derived components by weight or volume).
- the comparison metric can be based on a comparison between the candidate protein set's measured functional property values and predicted functional property values (e.g., predicted functional property values for the candidate protein set determined using the prediction model).
- the comparison metric can be based on a comparison between the candidate protein set's measured functional property values and target functional property values (e.g., the functional property values of a target protein set).
- a comparison metric above or below a threshold corresponds to negative feedback in model training (e.g., S 400 and/or any other model training).
- the candidate protein set can be otherwise evaluated.
- the method can optionally include determining interpretability and/or explainability of the trained prediction model, which can be used to select features, select functional properties, identify errors in the data, identify ways of improving the prediction model, increase computational efficiency, determine influential features and/or values thereof, determine influential functional properties and/or values thereof, and/or otherwise used.
- Interpretability and/or explainability methods can include: local interpretable model-agnostic explanations (LIME), SHapley Additive exPlanations (SHAP), Anchors, DeepLIFT, Layer-Wise Relevance Propagation, contrastive explanations method (CEM), counterfactual explanation, ProtoDash, permutation importance (PIMP), L2X, partial dependence plots (PDPs), individual conditional expectation (ICE) plots, accumulated local effect (ALE) plots, Local Interpretable Visual Explanations (LIVE), breakDown, ProfWeight, Supersparse Linear Integer Models (SLIM), generalized additive models with pairwise interactions (GA2Ms), Boolean Rule Column Generation, Generalized Linear Rule Models, Teaching Explanations for Decisions (TED), and/or any other suitable method and/or approach.
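- As one illustration of the listed approaches, a basic permutation importance calculation can be sketched as follows. The model, features, and data are hypothetical, and this is a plain shuffle-and-rescore version rather than any particular library's implementation.

```python
import random


def mean_squared_error(model, rows, targets):
    """Average squared difference between model predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=5, seed=0):
    """Mean increase in error when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical model that uses only the first feature."""
    return 2.0 * row[0]


rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 9.0], [6.0, 2.0]]
targets = [2.0 * r[0] for r in rows]
importances = permutation_importance(toy_model, rows, targets)
```

- Because the toy model ignores the second feature, shuffling that column leaves the error unchanged and its importance is zero, which is the kind of signal that could be used to drop uninformative features.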
- FIG. 1 A block diagram illustrating an exemplary computing environment in accordance with the present disclosure.
- the computer-executable component can include a computing system and/or processing system (e.g., including one or more collocated or distributed, remote or local processors) connected to the non-transitory computer-readable medium, such as CPUs, GPUs, TPUs, microprocessors, or ASICs, but the instructions can alternatively or additionally be executed by any suitable dedicated hardware device.
- Embodiments of the system and/or method can include every combination and permutation of the various system components and the various method processes, wherein one or more instances of the method and/or processes described herein can be performed asynchronously (e.g., sequentially), contemporaneously (e.g., concurrently, in parallel, etc.), or in any other suitable order by and/or using one or more instances of the systems, elements, and/or entities described herein.
- Components and/or processes of the following system and/or method can be used with, in addition to, in lieu of, or otherwise integrated with all or a portion of the systems and/or methods disclosed in the applications mentioned above, each of which is incorporated in its entirety by this reference.
Abstract
Description
- This application claims the benefit of U.S. Provisional Application No. 63/297,966 filed 10 Jan. 2022, US Provisional Application Ser. No. 63/298,920 filed 12 Jan. 2022, US Provisional Application Ser. No. 63/298,927 filed 12 Jan. 2022, and US Provisional Application Ser. No. 63/298,930 filed 12 Jan. 2022, each of which is incorporated in its entirety by this reference.
- This invention relates generally to the food science field, and more specifically to a new and useful system and method in the food science field.
- FIG. 1 is a schematic representation of a variant of the method.
- FIG. 2 is a schematic representation of a variant of the system.
- FIG. 3 depicts an illustrative example of a database.
- FIG. 4 depicts an illustrative example of functional property value sets associated with different source components and constituent proteins.
- FIGS. 5A and 5B depict illustrative examples of aggregating feature values for a protein set.
- FIG. 6 depicts an embodiment of training a prediction model.
- FIG. 7A depicts a first example of training a prediction model to predict functional property values.
- FIG. 7B depicts a second example of training a prediction model to predict functional property values.
- FIG. 8 depicts an example of determining a candidate protein set.
- FIG. 9 depicts an illustrative example of determining a candidate protein set.
- FIG. 10 depicts an embodiment of target determination.
- FIG. 11 depicts another embodiment of target determination.
- FIG. 12 depicts an example of predicting the functional properties for a protein set and optionally predicting a protein set or protein source set.
- FIG. 13 depicts an example of predicting the functional properties for a protein set.
- FIG. 14 depicts example functional property values for samples produced using phosphorylated proteins.
- The following description of the embodiments of the invention is not intended to limit the invention to these embodiments, but rather to enable any person skilled in the art to make and use this invention.
- As shown in FIG. 1, the method can include: characterizing a protein set S100, training a prediction model S300, determining target characteristic values S400, determining a candidate protein set based on the target characteristic values S500, and/or any other suitable steps.
- In variants, the method can function to determine a candidate protein set with a desired set of functional property values (e.g., wherein the candidate protein set can be used in a replacement for a target food product). For example, the candidate protein set can be selected to replicate target functional property values of and/or replace: caseins, leather proteins (e.g., collagen, gelatin, etc.), meat proteins (e.g., myosin), and/or any other protein set. In variants, the method can optionally determine protein source sets that contain the candidate protein set.
- In an example, the method can include: predicting functional property values given a protein set and optionally a context (e.g., example shown in FIG. 14). In an illustrative example, the method can include: extracting feature values from the amino acid sequences for each of a set of protein sets, measuring functional property values for the set of protein sets, and training a prediction model to predict functional property values for a protein set based on feature values for the respective protein set. In a specific example, the prediction model can predict functional property values for the protein set based on aggregated feature values across individual proteins in the protein set. A protein set can optionally be associated with a composition (e.g., a relative and/or absolute concentration for each protein in the training protein set) and/or a context (e.g., manufacturing process parameters, protein modifications, etc.), wherein the composition and/or context can be inputs to the prediction model (e.g., separate vectors, concatenated to the protein set feature vector, used to weight the protein set feature vector, etc.).
- In variants, measuring functional property values for a protein set can include manufacturing a sample matching the protein set composition using the process parameters and/or other context information, wherein the functional property values for the protein set are measured using assays. In a specific example, the target functional property values can be directly measured for a target product (e.g., a target food product).
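- The aggregation of per-protein feature values into a single protein set feature vector can be sketched as a concentration-weighted mean, with the context concatenated afterwards. This is one possible reading of the options listed above; the feature values, concentrations, and context encoding below are hypothetical.

```python
def aggregate_features(feature_vectors, concentrations):
    """Concentration-weighted mean of per-protein feature vectors.

    feature_vectors: one list of feature values per protein in the set.
    concentrations: one relative or absolute concentration per protein.
    """
    total = sum(concentrations)
    if total <= 0:
        raise ValueError("concentrations must sum to a positive value")
    n_features = len(feature_vectors[0])
    return [
        sum(c * fv[i] for fv, c in zip(feature_vectors, concentrations)) / total
        for i in range(n_features)
    ]


# Hypothetical features for Protein 1 and Protein 2 (e.g., derived from sequence).
protein_features = [[2.0, 0.0], [4.0, 1.0]]
composition = [70, 30]  # percent of each protein in the set

set_features = aggregate_features(protein_features, composition)

# Hypothetical context encoding: [heating temperature in deg C, glycosylated flag].
context_vector = [65.0, 1.0]
model_input = set_features + context_vector  # concatenated model input
```

- Weighting by concentration means a [70%, 30%] and a [30%, 70%] mixture of the same two proteins yield different inputs, which lets a single model distinguish compositions of the same protein set.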
- In variants, the prediction model can be used to predict functional property values for each protein set in a candidate group, wherein the candidate group includes uncharacterized protein sets (e.g., without measured functional property data). A candidate protein set (e.g., including an associated composition and/or context) and/or a protein source with a high probability of producing the candidate protein set can then be selected from the candidate group based on a similarity between the predicted functional property values and target functional property values. Additionally or alternatively, a candidate protein set can be extracted from the prediction model (e.g., using an acquisition function), be predicted by a second model (e.g., a decoder), and/or otherwise determined.
- Variants of the technology can confer one or more advantages over conventional technologies.
- First, previous protein selection methodologies (e.g., to identify replacements for dairy and/or meat proteins) relied heavily on domain knowledge, previously researched protein alternatives, and laborious manual testing. Variants of the technology can utilize a computational approach to explore the extremely large and under-investigated protein space to identify candidate proteins that would not have otherwise been identified. For example, variants of the method can identify protein replacements based on the similarities between the amino acid sequence features (AA sequence features) of the candidate proteins and the target proteins (proteins to be replaced), and/or based on similarities between the predicted functional properties of the candidate proteins and the functional properties of the target product (e.g., food).
- Second, variants of the technology can use a subset of features (e.g., subset of amino acid sequence features) which are likely to be important in influencing functional behavior. In a specific example, the functional property values are experimentally determined for protein sets (e.g., gelled mixtures of proteins) to capture important protein-protein interactions influencing function, and correlated with the feature values for the constituent proteins, wherein predictive features are selected for subsequent analysis based on the correlation. In a second specific example, lift analysis can be used (e.g., during and/or after training a prediction model) to select a subset of features with high lift. This feature selection can reduce computational complexity and/or enable human-interpretable annotation of the features.
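- Correlation-based feature selection of the kind described above can be sketched as follows. The feature names, measurements, and the 0.8 cutoff are hypothetical, and Pearson correlation is an assumed choice of correlation measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(feature_table, property_values, min_abs_r=0.8):
    """Keep features whose absolute correlation with the property meets the cutoff."""
    return [
        name
        for name, values in feature_table.items()
        if abs(pearson(values, property_values)) >= min_abs_r
    ]


# Hypothetical feature values for four protein sets, one list per feature.
feature_table = {
    "mean_hydrophobicity": [1.0, 2.0, 3.0, 4.0],
    "net_charge": [2.0, 1.0, 2.0, 1.0],
    "constant_feature": [1.0, 1.0, 1.0, 1.0],
}
# Hypothetical measured functional property for the same four protein sets.
gel_firmness = [2.1, 3.9, 6.2, 7.8]

selected = select_features(feature_table, gel_firmness)
```

- Only the strongly correlated feature survives in this toy data, which mirrors how a reduced feature subset could lower computational cost for later analysis.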
- Third, variants of the technology can reduce the need for experimental analysis of proteins to determine their candidacy potential. In an example, a large domain of available protein sets can be computationally analyzed (e.g., using featurization of their amino acid sequences) rather than experimentally analyzed to evaluate their potential to replicate functional properties of a target set of proteins. This analysis methodology can enable a much larger group of candidates to be considered than if experimental analysis of each protein set were required.
- Fourth, variants of the technology can reduce the need for experimental analysis of potential protein sources by predicting whether a protein source (e.g., plant, plant component, etc.) will include sufficient amounts of a given protein or protein set, such as by using genetic analyses and/or evolutionary tree analyses.
- However, further advantages can be provided by the system and method disclosed herein.
- Variants of the system can include a database and a set of models. The system functions to determine the functional properties for protein sets, determine which protein sets can produce a set of target functional properties, determine which protein sources can produce a target protein set, and/or be otherwise used.
- An example of the system, including a database, is shown in FIG. 2. An example of the database is shown in FIG. 3. The database can include proteins, protein sets (e.g., protein set identifiers), protein set compositions (e.g., identification of proteins in the set, relative and/or absolute concentrations of proteins in the set, etc.), sequences, features, feature values, functional properties, functional property values, protein sources and/or source components, evolutionary relationships, contexts (e.g., process parameters, protein modifications, sample environment, etc.), and/or any other elements. The system can optionally include and/or interface with one or more third-party databases (e.g., a sequence database, a protein database, amino acid composition database, etc.). In a first example, elements stored in the system database can be retrieved from a third-party database. In a second example, the system database can be a third-party database.
- A protein set can be an individual protein (e.g., a set of one, an individual protein within a larger set, etc.), multiple proteins (e.g., a mixture of proteins, proteins within a source and/or source component; within a gel, sample, product, solution, combination, and/or other mixture; within a food product; within a consumer product; etc.), a set of protein sets, and/or be otherwise defined.
- The protein set can be from one or more protein sources (e.g., combination of protein sources), from one or more components of protein sources, be manually specified, and/or be otherwise determined. The protein source can be plant matter (e.g., processed and/or unprocessed plant matter), animal matter (e.g., milk such as cow milk, insects such as Acheta domesticus, meat, etc.), bacterium (e.g., naturally occurring, genetically modified, etc.), any organism (e.g., identified by a species name, a common name, etc.), a food product, a naturally-occurring protein source, a synthetic protein source, and/or any other entity and/or component (e.g., protein source component) thereof. The protein source component (e.g., the part of the source where the protein set can be derived) can be a nut, fruit, seed, legumes, stem, leaves, root, flower, stamen, muscle, carapace, and/or any other component of the associated source. The protein source can optionally be labeled (e.g., in the database) with one or more classifications (e.g., dairy, meat, non-dairy, non-meat, etc.). The protein source, source component, and/or the protein set can optionally be associated with an abundance metric (e.g., where the metric can assess the ease of accessing large quantities of the protein set for scaled use). The abundance metric can be: experimentally determined (e.g., measured), predicted (e.g., based on the abundance metrics for related protein sources), and/or otherwise determined. The abundance metric is preferably representative of a single protein's abundance within a protein source, but can alternatively be representative of a protein set's abundance within a protein source, be representative of the protein source's abundance, and/or represent other information.
- The protein set can include all or a subset of proteins in the protein source and/or protein source component. In a first example, the protein set can include proteins above a concentration threshold in the protein source and/or source component (e.g., wherein the concentration threshold by weight can be 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 15%, 20%, 25%, 50%, etc.). In a second example, the protein set can include the most abundant (e.g., highest concentration) proteins in a protein source and/or a component of the protein source. In a specific example, the protein set can include a predetermined number of the most abundant proteins (e.g., wherein the predetermined number can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 50, etc.).
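- The two ways of defining a protein set from a source, by concentration threshold or by a fixed number of most abundant proteins, can be sketched as below. The protein names and concentrations are placeholders, not measured data.

```python
def proteins_above_threshold(concentrations, threshold):
    """Proteins whose concentration (percent by weight) exceeds the threshold."""
    return {name for name, value in concentrations.items() if value > threshold}


def most_abundant(concentrations, n):
    """The n proteins with the highest concentration, in descending order."""
    ranked = sorted(concentrations, key=concentrations.get, reverse=True)
    return ranked[:n]


# Hypothetical composition of a source component, in percent by weight.
source_composition = {
    "protein_a": 55.0,
    "protein_b": 30.0,
    "protein_c": 14.5,
    "protein_d": 0.5,
}

set_by_threshold = proteins_above_threshold(source_composition, threshold=1.0)
set_by_rank = most_abundant(source_composition, n=2)
```

- The threshold variant keeps every protein that clears the cutoff, while the rank variant fixes the set size regardless of how concentrations are distributed.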
- Plant matter can include: peas (e.g., pea flour, pea starch, etc.), rice (e.g., rice flour, glutinous rice flour, white rice flour, brown rice flour, etc.), fruits (e.g., citrus fiber), cassava (e.g., cassava flour), potato, cocoa beans, truffles, olives, coconut flesh, grape pomace, pumpkin (e.g., pumpkin seed), cottonseed, canola, sunflower, hazelnut, pistachio, almond, walnut, crude walnut, cashew, brazil nuts, hazelnut, macadamia nuts, pecan, peanut, hemp, oat, rice, poppy, watermelon (e.g., watermelon seed), chestnut, chia, flax, quinoa, soybean, split mung beans, aquafaba, lupini, fenugreek, kiwi, Sichuan pepper, mustard, sesame, sunflower seeds, algae, duckweeds (e.g., lemna), squash, chickpeas, pine nuts, peas, cassava, citrus (e.g., citrus fiber), fava bean (e.g., fava bean flour), grape (e.g., grape pomace), lima bean (e.g., lima bean paste), carrageenan; plants selected from the cucurbita, anacardium, cannabis, salvia, arachis, brassica, sesamum, legume, and/or other genera; plants selected from the Anacardiaceae, Asteraceae, Leguminosae, Cucurbits, Rosaceae, Lamiaceae, and/or other families; a combination thereof, and/or any other plant matter. The plant matter may include major production oilseeds (e.g., soybean, rapeseed, sunflower, sesame, niger, castor, canola, cottonseed, etc.), minor production oilseeds (e.g., coconut, palm seed, pumpkin, etc.), and/or other crops or plant matter. The plant matter may exclude allergens (e.g., wheat, soy, peanut, etc.). The plant matter may include a single variety of plant matter, a mixture of various plant matter, include animal matter (e.g., insect matter, mammalian products, etc.), and/or include matter from any other source.
- The protein source can be processed (e.g., lipid-removed, comminuted, separated into a solid and liquid component, mechanical processing, chemical processing, a protein powder derived from the plant matter, an extract from the plant matter, fermented, protein modifications, etc.) and/or unprocessed.
- For example, the protein source can include a plant milk, powdered whole plant component (e.g., plant matter) in an aqueous solution, isolated plant protein (e.g., powder), and/or any other suitable source of protein.
- One or more proteins can be derived (e.g., extracted) from the protein source. The proteins can include protein isolates (e.g., solubilized protein isolates) extracted from the protein source. Protein isolates can include: proteins isolated using isoelectric precipitation (e.g., salting in, salting out, etc.), by collecting and optionally diluting a protein-rich solution (e.g., the supernatant obtained by spinning down a whole plant ingredient, such as a seed powder; residual obtained by removing at least a threshold proportion of insoluble solids from a plant milk, such as 50%, 75%, 80%, 90%, etc.; etc.), and/or otherwise obtained. The protein ingredient obtained from the plant matter can be substantially pure (e.g., wherein a single monomeric or multimeric protein represents at least 50%, 60%, 70%, 80%, 90%, and/or more than 90% of the overall protein content in the protein ingredient and/or the product), but can alternatively be impure (e.g., include more than 10%, 20%, 30%, 40%, 50%, 60% other proteins, etc.).
- The proteins can include structured protein isolates (SPIs) produced using protein isolates. In a first example, SPIs can be produced by: obtaining a protein isolate mixture (e.g., a protein isolate solution) from a protein source; diluting the protein isolate mixture using a diluent; optionally separating the diluted protein isolate mixture (e.g., allowing sedimentation to occur, centrifuging, filtering, etc.); and collecting SPIs (e.g., an SPI mixture) from the diluted protein isolate mixture (e.g., collecting the sediment, collecting all or part of a homogeneous diluted lipid protein isolate mixture, etc.). The diluent can include water (e.g., deionized water), an aqueous solution (e.g., water, a mixture of water and other ingredients, etc.), an aqueous solution mixed (e.g., emulsified) with other ingredients, and/or any other diluent. The SPI mixture can include an aqueous component, a protein component, and/or other ingredients. The protein component can include protein isolates, SPIs, aggregates of SPIs, a combination thereof, and/or any other proteins. The protein concentration (by weight) in the SPI mixture can be between 0.01%-95% or any range or value therebetween (e.g., 1%-15%, 30%-50%, 44%, 40%, etc.), but can alternatively be less than 0.01% or greater than 95%.
- The proteins can include: globulins (e.g., 2S globulins, 11S globulins, 7S globulins, conglutin, napin, sfa, edestin, amandin, concanavalin, vicilin, legumin, cruciferin, helianthinin, etc.), pseudoglobulins, globular proteins, prolamins, albumins, gluten, gliadin, conglycinin, hordein, phaseolin, zein, oleosin, caleosin, steroleosin, conjugated proteins (e.g., lipoprotein, mucoprotein, etc.), other storage proteins (e.g., seed storage proteins, vegetative storage protein, etc.), animal proteins (e.g., casein, insect proteins, etc.), and/or any other suitable protein or combination thereof. Proteins can optionally be modified (e.g., transglutaminase modifications, proteolytic modifications, glycosylation, glycation, phosphorylation, acylation, etc.) pre- or post-extraction from the protein source. The proteins (e.g., modified or unmodified) can optionally include SPIs, wherein protein isolate units (e.g., protein monomers arranged in an oligomeric complex such as a hexamer) can be arranged in: agglomerates, aggregates, micelles, stacks, and/or any other suitable higher-order arrangement (e.g., quaternary structure or higher). The SPI structure can be a sphere (e.g., a shell of protein isolate units, a shell or micelle with hydrophilic regions along the exterior and hydrophobic regions along the interior, etc.), an amorphous structure, and/or any other structure. The proteins can optionally include an aggregate of SPIs, wherein constituent SPIs can be arranged in: agglomerates, aggregates, micelles, stacks, and/or any other suitable higher-order arrangement. The proteins can include casein proteins, non-casein proteins, mammalian proteins, non-mammalian proteins, plant proteins, animal proteins, non-animal proteins, and/or any other proteins.
For example, proteins in target protein sets can include casein proteins, mammalian proteins, and/or animal proteins, while proteins in candidate protein sets can substantially exclude casein proteins, mammalian proteins, allergen proteins (e.g., proteins from allergens, such as peanuts, soy, wheat, etc.), and/or animal proteins, and/or include plant proteins (e.g., exclusively include plant proteins). In a specific example, proteins in candidate protein sets can include casein, mammalian, and/or animal proteins below a threshold amount, wherein the threshold amount can be between 0.1%-10% or any range or value therebetween (e.g., 10%, 5%, 3%, 2%, 1%, 0.1%, etc.), but can alternatively be greater than 10% or less than 0.1%.
- The protein set can be associated with a protein set composition and/or a total protein quantity (e.g., wherein the total protein quantity is an overall concentration or amount of proteins within a protein source and/or source component, an overall concentration or amount of proteins within a product, etc.). The protein set composition can include an identification of each protein in the set (e.g., a name or other identifier for each protein) and/or a concentration of each protein in the set. The concentration of a protein in the protein set can be an absolute concentration or a concentration relative to other proteins in the protein set. In examples, the concentration can be a percentage (e.g., by weight, by mass, by moles, etc.), a ratio, a proportion, an abundance, an amount (e.g., weight, mass, moles, etc.), a ranking (e.g., wherein each protein in the set is ranked relative to the other proteins based on concentration), and/or any other concentration metric. In an illustrative example, the composition of a first protein set can include a first protein (P1) at a concentration C1, and a second protein (P2) at a concentration C2; the composition of a second protein set can include the same proteins (P1 and P2) at different concentrations C3 and C4, respectively. The protein set composition and/or the total protein quantity can be measured (e.g., using an assay), predetermined (e.g., manually specified), predicted (e.g., based on evolutionary relationships, using a prediction model, based on an amino acid composition, using a database, etc.), and/or otherwise determined.
In a first specific example, a first protein source is associated with a first protein set with a known composition and a second protein source is associated with a second protein set with an unknown composition, wherein an evolutionary relationship (e.g., based on an evolutionary tree) between the first and second protein sources is used to predict the composition of the second protein set (e.g., using the assumption that certain proteins and/or protein concentrations would be similar between the first and second protein sets when the protein sources are evolutionarily close). In a second specific example, an overall composition of amino acids in a protein set is determined using an assay (e.g., LC/MS), and the composition of amino acids in each constituent protein is predicted based on the amino acid sequence for the respective constituent protein. In a third specific example, an overall composition of amino acids in a protein set and a composition of amino acids in each constituent protein are retrieved from an amino acid composition database (e.g., a third-party PseAAC database). A model (e.g., a regression) can be used to determine the concentration of each constituent protein within the protein set based on the overall amino acid composition (e.g., of the mixture) and the amino acid compositions for the constituent proteins.
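- The regression mentioned above can be illustrated for the simplest case of two constituent proteins, where the least-squares estimate of the mixing fraction has a closed form. This is a sketch under stated assumptions: the mixture is modeled as a linear blend of the constituents' amino acid compositions, only three amino acid categories are used, and all numbers are invented.

```python
def estimate_fraction(mixture, protein_a, protein_b):
    """Least-squares fraction f of protein A in a two-protein mixture.

    Assumes mixture ~= f * protein_a + (1 - f) * protein_b, where each argument
    is a list of amino acid fractions in the same order. The closed-form
    solution is f = ((m - b) . (a - b)) / |a - b|^2, clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(protein_a, protein_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("constituent compositions are identical")
    numer = sum((m - b) * d for m, b, d in zip(mixture, protein_b, diff))
    return min(1.0, max(0.0, numer / denom))


# Hypothetical amino acid compositions over three amino acid categories.
composition_p1 = [0.5, 0.3, 0.2]
composition_p2 = [0.1, 0.4, 0.5]
# Hypothetical measured mixture composition (e.g., from an assay).
mixture_composition = [0.38, 0.33, 0.29]

fraction_p1 = estimate_fraction(mixture_composition, composition_p1, composition_p2)
fraction_p2 = 1.0 - fraction_p1
```

- With more than two constituents, the same linear model would call for a general constrained least-squares solver rather than this closed form.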
- The protein set can be associated with one or more sequences (e.g., one sequence for each individual protein in the set). Sequences can include amino acid sequences, genetic sequences (e.g., DNA sequence, RNA sequence, gene sequence, etc.), any molecular sequence, any protein sequence, and/or other genetic information. Sequences can be measured (e.g., using an assay), predetermined (e.g., manually specified), predicted (e.g., based on an evolutionary tree, using a prediction model, etc.), and/or otherwise determined.
- The protein set can be associated with a context. The context can include: process parameters, protein modifications, sample environment, and/or any other information associated with the protein set and/or a sample (e.g., a food product, a gel, and/or any other product) containing the protein set. The context can be measured (e.g., using an assay), predetermined (e.g., manually specified), predicted, and/or otherwise determined.
- The protein set can be associated with one or more protein structures (e.g., one structure for each protein, one structure for each protein-context combination, etc.). The protein structures can be measured, predicted (e.g., using protein structure prediction models), and/or otherwise determined.
- Process parameters are preferably specifications prescribing the manufacturing of a sample containing the protein set (e.g., extracting the protein set from one or more protein sources, manufacturing the sample using the protein set, etc.), but can be otherwise defined. Process parameters can define: manufacturing specifications; the amounts thereof (e.g., ratios, volume, concentration, mass, etc.); temporal parameters thereof (e.g., when the input should be applied, duration of input application, etc.); and/or any other suitable manufacturing parameter. Manufacturing specifications can include: ingredients, treatments, and/or any other sample manufacturing input, wherein the process parameters can include parameters for each specification. Examples of ingredients can include: plant matter, proteins, lipids (e.g., fats, oils, etc.; isolated from plant sources; etc.), water, preservatives, acids and/or bases, macronutrients (e.g., protein, fat, starch, sugar, etc.), nutrients, micronutrients, carbohydrates, gums, vitamins, enzymes, emulsifiers, hydrocolloids, salts, chemical crosslinkers and/or non-crosslinkers, coloring, flavoring compounds, vinegar, mold powders, microbial cultures (e.g., cheese cultures, such as Penicillium camemberti, Penicillium candidum, Geotrichum candidum, Penicillium roqueforti, Penicillium nalgiovensis, Verticillium lecanii, Kluyveromyces lactis, Saccharomyces cerevisiae, Candida utilis, Debaryomyces hansenii, Rhodosporidium infirmominiatum, Candida jefer, Corynebacteria, Micrococcus sps., Lactobacillus sps., Lactococcus, Staphylococcus, Halomonas, Brevibacterium, Psychrobacter, Leuconostocaceae, Streptococcus thermophilus, Pediococcus sps., Propionibacteria culture, combinations thereof, etc.), carbon sources, any combination thereof, and/or any other ingredient.
Examples of treatments can include: adjusting temperature, adjusting salt level, adjusting pH level, diluting, pressurizing, depressurizing, humidifying, dehumidifying, agitating, resting, adding ingredients, removing components (e.g., filtering, draining, centrifugation, etc.), adjusting oxygen level, brining, comminuting, fermenting, mixing (e.g., homogenizing), reactions (e.g., acylation, glycation, phosphorylation, etc.), structural adjustments (e.g., micellization, etc.) and/or other treatments. Examples of treatment parameters can include: treatment type, treatment duration, treatment rate (e.g., flow rate, agitation rate, cooling rate, etc.), treatment temperature, time (e.g., when a treatment is applied, when the sample is characterized, etc.), and/or any other parameters.
- Protein modifications can include transglutaminase modifications, proteolytic modifications, glycosylation, glycation, phosphorylation, acylation, hydrolysis, and/or any other protein treatments. The modified proteins can be used as ingredients for a downstream product (e.g., dairy replicate), be used as a product (e.g., be sold as-is, be fermented using a cheese culture post-modification, etc.), and/or be otherwise used.
- In a first embodiment, proteins (e.g., proteins containing nucleophilic residues, such as Lys, Ser, Thr, Cys, etc.; SPIs; etc.) can be acylated using fatty acyl anhydrides (e.g., caprylic anhydride; myristic acid; stearic acid; oleic acid; linoleic acid; etc.), yielding a fatty acylated protein (e.g., via an amide linkage, such as from Lys; ester linkage, such as from Ser; thioester linkage, such as from Cys; etc.) and a fatty acid. For example, the ratio between proteins and acyl anhydrides (e.g., by weight, by mass, by moles, etc.) can be between 1:1-1:4, but can alternatively be greater than 1:1 or less than 1:4. Unreacted fatty acyl anhydride can be quenched (e.g., with hydroxide and water, a base, a salt, etc.), yielding the corresponding fatty acid. The resultant fatty acylated protein and/or a sample therefrom can have increased lipid binding; increased hydrophobicity; increased gel strength; increased flow at elevated temperatures (i.e., melt); increased stretchiness; and/or other changed functional property values (e.g., values for texture, nutrition, etc.) relative to the unacylated protein or a sample therefrom. In variants, other carboxylic acid conjugation reagents (e.g., acyl chlorides, activated carboxylic acids, metal catalysts, etc.) can additionally or alternatively be used.
- In a second embodiment, proteins (e.g., protein residues, surface-accessible nucleophilic residues, etc., SPIs, etc.) can be phosphorylated (e.g., using sodium trimetaphosphate). For example, nucleophilic residues (e.g., Ser, Thr, Lys) of the protein may attack sodium trimetaphosphate (STMP) and/or other reagents, resulting in a triphosphorylated protein which hydrolyses, releasing pyrophosphate to yield the phosphorylated protein. Examples of other phosphorylation reagents that can be used include: other trimetaphosphate salts; hexametaphosphate salts; tripolyphosphate salts; polyphosphate salts; nucleoside triphosphates, and/or other phosphorylation agents. In variants, phosphorylation can be performed using non-toxic (e.g., at relevant concentrations) catalysts, reagents, byproducts, and/or other substances. The resultant phosphorylated protein and/or a sample therefrom can have increased calcium binding (e.g., an increased calcium concentration in the sample); increased stretchiness; increased flow at elevated temperatures (i.e., melt); increased solubility; increased hydrophobicity and/or hydrophilicity; decreased toxicity; decreased hydrophobicity and/or hydrophilicity; and/or other changed functional property values relative to an unphosphorylated protein and/or a sample therefrom.
- In an example, the proteins (e.g., protein isolates, SPIs, dissolved and resuspended protein source substrate, etc.) can be suspended in a protein solution at a target protein concentration. The target protein concentration in the protein solution and/or the target protein concentration in a final mixture (e.g., including protein, acids/bases, phosphorylation reagent, calcium, etc.) is between 3%-50% or any range or value therebetween (e.g., 4-10%, 6%, 9%, greater than 6%, greater than 9%, 10-20%, 15%, etc.), but can alternatively be less than 5% or greater than 50%. In a specific example, the proteins are diluted to achieve the target concentration. The diluent can be water (e.g., deionized water), an aqueous solution (e.g., water, a mixture of water and other ingredients, etc.), and/or any other diluent. The protein solution can optionally be homogenized (e.g., for 30s-10 min, 1 min, 2 min, 3 min, 5 min, any other time, etc.). The pH of the protein solution can be adjusted to a target pH, wherein the target pH is between 3-12 or any range or value therebetween (e.g., 3-5, 4, 5-7, 6, above 6, above 7, below 7, 10-11, 10, 10.5, 11, etc.), but can alternatively be less than 3 or greater than 12. A solution including a phosphorylation reagent (e.g., Na3(PO3)3) can be added to the protein solution to achieve a target concentration in a final mixture (e.g., 20 mM-1000 mM, 100 mM-500 mM, 300 mM-400 mM, 80 mM, 150 mM, 250 mM, 350 mM, greater than 150 mM, greater than 80 mM, less than 80 mM, etc.). The resulting (intermediate) mixture can optionally be homogenized (e.g., for 30 s-10 min, 1 min, 2 min, 3 min, 5 min, any other time, etc.). The resulting mixture can be stirred for between 15 min-10 hrs or any range or value therebetween (e.g., 30 min-2 hrs, 1 hr, etc.), but can alternatively be stirred for less than 15 min or greater than 10 hrs. 
The stir rate can be between 100 rpm-10,000 rpm or any range or value therebetween (e.g., 300 rpm-1,000 rpm), but can alternatively be less than 100 rpm or greater than 10,000 rpm. The temperature while stirring can be between 10° C.-50° C. or any range or value therebetween (e.g., 20° C.-30° C., room temperature, etc.), but can alternatively be less than 10° C. or greater than 50° C. Calcium can optionally be added to the mixture (e.g., to bind calcium to the phosphorylated proteins, to enable the reaction to proceed forward, etc.) before or after phosphorylating agent addition. For example, a solution of calcium salts (e.g., CaCl2) can be added to the mixture to achieve a target concentration in a final mixture (e.g., 5 mM-40 mM, 20 mM-10000 mM, 20 mM-1000 mM, 80 mM, 100 mM, 140 mM, 240 mM, 300 mM, 400 mM, greater than 140 mM, less than 400 mM, etc.). The process parameters in this example can optionally achieve a sticky (e.g., increased adhesion, decreased hardness, etc.) texture in a sample produced using the phosphorylated proteins. An example is shown in FIG. 14. Additionally or alternatively, the texture of the sample can be hardened by increasing the amount of phosphorylating agent, decreasing the amount of calcium salts, decreasing the amount of protein in the starting protein solution, and/or decreasing the pH.
- In examples, the phosphorylated proteins can optionally be collected, such as via centrifugation (e.g., collecting the sediment after centrifugation), filtration, precipitation, and/or other protein isolation methods, wherein the proteins can be used in all or parts of the method. The centrifugation speed can be between 500 rpm-20,000 rpm or any range or value therebetween (e.g., 1,000 rpm-10,000 rpm, 5,000 rpm, etc.), but can alternatively be less than 500 rpm or greater than 20,000 rpm. The centrifugation time can be between 30 s-1 hr or any range or value therebetween (e.g., 5 min-30 min, 10 min, 20 min, etc.), but can alternatively be less than 30 s or greater than 1 hr. The proteins can optionally be resuspended after collection (e.g., washed) in a diluent (e.g., water). The ratio (by volume) between the collected protein and the diluent can be between 1:10-10:1 (e.g., 1:3, 1:2, 1:1, 2:1, 3:1, etc.), but can alternatively be less than 1:10 or greater than 10:1.
- However, proteins can be otherwise phosphorylated.
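- The target concentrations in the final mixture described for the phosphorylation example can be translated into stock solution volumes with the standard dilution relation C_stock x V_stock = C_final x V_final. The sketch below assumes hypothetical stock concentrations (1000 mM phosphorylation reagent, 2000 mM calcium salt) and a hypothetical 100 mL final volume; only the 150 mM and 80 mM targets are drawn from the example values listed above.

```python
def stock_volume_ml(target_mM, final_volume_ml, stock_mM):
    """Volume of stock solution needed to reach target_mM in the final mixture."""
    if stock_mM <= 0 or final_volume_ml <= 0 or target_mM < 0:
        raise ValueError("concentrations and volumes must be positive")
    if target_mM > stock_mM:
        raise ValueError("target concentration exceeds the stock concentration")
    return target_mM * final_volume_ml / stock_mM


FINAL_VOLUME_ML = 100.0  # hypothetical batch size

# 150 mM phosphorylation reagent from an assumed 1000 mM stock.
reagent_ml = stock_volume_ml(150.0, FINAL_VOLUME_ML, 1000.0)
# 80 mM calcium salt from an assumed 2000 mM stock.
calcium_ml = stock_volume_ml(80.0, FINAL_VOLUME_ML, 2000.0)
# Remaining volume available for the pH-adjusted protein solution.
protein_solution_ml = FINAL_VOLUME_ML - reagent_ml - calcium_ml
```

- Because the added stock volumes dilute the protein solution, the starting protein concentration would need to be correspondingly higher than the target protein concentration in the final mixture.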
- In a third embodiment, proteins (e.g., protein residues, surface-accessible lysine residues, etc.) can be glycated. For example, lysine residues and/or other residues can covalently bond to sugars (e.g., via nucleophilic attack of an acyclic sugar's aldehyde), resulting in a glycated protein. In variants, glycation can be performed using non-toxic (e.g., at relevant concentrations) catalysts, reagents, byproducts, and/or other substances. The resultant glycated protein and/or a sample therefrom can have increased flow at elevated temperatures (e.g., melt); increased solubility; increased hydrophobicity and/or hydrophilicity; decreased toxicity; decreased hydrophobicity and/or hydrophilicity; and/or other changed functional property values relative to an unglycated protein and/or a sample therefrom.
- Maillard glycation is conventionally achieved at high temperatures, which may result in protein denaturation and accelerate later-stage reactions, including those resulting in advanced Maillard products (AMPs). AMPs can give rise to off-flavours and/or off-colours in a sample. In variants, the method can include catalyzing an initial glycation event (e.g., via base catalysis and/or acid catalysis), which can reduce or remove the need for high temperatures (e.g., in the initial and/or later stages).
- In an example, glycating proteins can include: combining proteins and sugars in a solution; adjusting a pH of the solution; and adjusting a temperature of the solution.
- The proteins (e.g., protein isolates, SPIs, dissolved and resuspended protein source substrate, etc.) and sugars can be combined in the solution (e.g., dissolved in a diluent such as water) at a target protein concentration and a target sugar concentration. The target protein concentration (e.g., by weight) can be between 5%-60% or any range or value therebetween (e.g., 10%-40%, 15%-35%, 15%-25%, 25%-35%, etc.), but can alternatively be less than 5% or greater than 60%. The target sugar concentration can be 5%-70% or any range or value therebetween (e.g., 20%-40%, 20%-30%, 30%-40%, etc.), but can alternatively be less than 5% or greater than 70%. Examples of sugars that can be used include: monosaccharides such as pentoses and hexoses (e.g., ribose, arabinose, xylose, glucose, galactose, fructose, etc.); disaccharides; oligosaccharides; polysaccharides; and/or any other sugars. The sugars can be plant-based, synthesized, and/or otherwise obtained. In variants, the sugar used can be selected based on its reactivity. For example, pentoses can be preferred to hexoses, which can be preferred to disaccharides, which can be preferred to oligosaccharides, which can be preferred to polysaccharides. However, the sugars can be otherwise selected.
- The pH of the solution during glycation can be adjusted to a target pH. In a first specific example, all or parts of the glycation reaction can be performed at an acidic pH (e.g., an acid-catalyzed reaction). The target pH can be between 2-7 or any range or value therebetween (e.g., 3-6.5, 4-6, less than 6, etc.), but can alternatively be less than 2 or greater than 7. Examples of acid catalysts that can be used to adjust the pH can include: hydrochloric acid, Bronsted acids, Lewis acids, and/or other acids. In a second example, all or parts of the glycation reaction can be performed at a basic pH (e.g., a base-catalyzed reaction). The target pH can be between 7-11.5 or any range or value therebetween (e.g., 8-11, 9-10.5, 9-10, 10-10.5, greater than 8, greater than 9, etc.), but can alternatively be less than 7 or greater than 11.5. Examples of base catalysts that can be used to adjust the pH can include: sodium hydroxide, sodium bicarbonate, potassium bicarbonate, ammonium bicarbonate, and/or other bases. The acids and/or bases are preferably food safe, but can alternatively be not food safe.
- The temperature of the solution can be adjusted to a target temperature for a target reaction time (e.g., wherein the temperature is maintained throughout the reaction time, wherein the temperature is adjusted during the reaction time, etc.). The target temperature can be between 10° C.-200° C. or any range or value therebetween (e.g., at or above 45° C., 40° C.-80° C., at or above 50° C., 55° C.-70° C., below 55° C., at room temperature, above room temperature, etc.), but can alternatively be less than 10° C. or greater than 200° C. The target temperature is preferably below the protein's denaturation point, but can alternatively be at or above the denaturation point. The target reaction time can be between 1 hour-1 week or any range or value therebetween (e.g., 5 hrs-10 hrs, 8 hrs, 24 hrs-48 hrs, 12 hrs-24 hrs), but can alternatively be less than 1 hour or greater than 1 week.
- However, the proteins can be otherwise glycated.
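- The glycation parameters above (protein and sugar concentration, pH, temperature, and reaction time) can be captured in a small record and checked against the broad ranges stated in the text. This is a sketch only: the class name, field names, and the choice to report out-of-range values rather than reject them (since the text allows values outside each range) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class GlycationConditions:
    protein_pct: float    # target protein concentration, % by weight
    sugar_pct: float      # target sugar concentration, %
    ph: float             # target pH during glycation
    temperature_c: float  # target temperature, degrees C
    time_h: float         # target reaction time, hours


# Broad ranges stated in the description (1 week = 168 hours).
STATED_RANGES = {
    "protein_pct": (5.0, 60.0),
    "sugar_pct": (5.0, 70.0),
    "ph": (2.0, 11.5),
    "temperature_c": (10.0, 200.0),
    "time_h": (1.0, 168.0),
}


def outside_stated_ranges(conditions):
    """Return the names of parameters falling outside the stated ranges."""
    return [
        name
        for name, (low, high) in STATED_RANGES.items()
        if not low <= getattr(conditions, name) <= high
    ]


def catalysis_type(conditions):
    """Classify the reaction as acid- or base-catalyzed from the target pH."""
    if conditions.ph < 7:
        return "acid"
    if conditions.ph > 7:
        return "base"
    return "neutral"


# Hypothetical run: 20% protein, 30% sugar, pH 10, 55 degrees C, 8 hours.
run = GlycationConditions(20.0, 30.0, 10.0, 55.0, 8.0)
flags = outside_stated_ranges(run)
kind = catalysis_type(run)
```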
- In examples, modified proteins and/or a sample therefrom can have changed functional property values. The change in functional property value can be determined relative to a protein source, an unmodified protein, a reaction intermediary, a sample therefrom, and/or relative to any other compound or substance. Examples of changes can include: 5%, 10%, 30%, 50%, 80%, a range therebetween, over 80%, and/or any other increased or decreased proportion. In variants, one or more protein modification process variables can be selected, controlled, adjusted, and/or otherwise manipulated to achieve a target functional property value (e.g., target texture). Examples of variables that can be controlled include: the protein source; protein preprocessing methods (e.g., protein isolation techniques, etc.); protein configuration (e.g., protein isolates, structured or unstructured arrangement of protein isolates, etc.); reagents; protein and/or reagent concentrations; stoichiometric ratio between protein and reagents; reaction scale (i.e., mass of initial protein substrate, volume of solvent); reaction time; reaction temperature; reaction pH; quenching or not quenching; washing (e.g., removal of unreacted reactants and byproducts such as pyrophosphate, unreacted sugars, AMPs, etc.) or not washing; concentration (e.g., presence vs absence of acids, bases, and/or other ingredients); and/or other variables.
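- One way to express a change in a functional property value relative to a reference (e.g., an unmodified protein) is as a signed percentage. The function name and the example measurements below are hypothetical.

```python
def percent_change(modified_value, reference_value):
    """Signed percent change of a functional property relative to a reference."""
    if reference_value == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * (modified_value - reference_value) / abs(reference_value)


# Hypothetical melt measurements (arbitrary units) before and after modification.
melt_change = percent_change(13.0, 10.0)
```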
- The sample environment can include: a composition of the sample (e.g., other macronutrients and their respective concentrations), sample structure information (e.g., sample matrix type; sample porosity; sample phase such as solid, liquid, and/or gaseous; etc.), pH level, temperature (e.g., temperature at which the functional properties for the sample would be measured), pressure, isoelectric point, and/or any other sample parameters.
- The protein set can be associated with values for one or more characteristics and/or can be uncharacterized (e.g., lack values for one or more characteristics). Characteristics can include: features, functional properties (e.g., an example of functional properties is shown in FIG. 4), functionalities (e.g., storage functionalities, breaking down sugar and/or any other molecule, enzyme functionalities, etc.), and/or any other characteristics.
- Features are preferably sequence features (e.g., extracted from one or more amino acid sequences), but can alternatively be other protein characteristics (e.g., molecular features, physicochemical features, protein structure features, context features, etc.). Features can be human-interpretable (e.g., semantic features, where features represent specific properties, where the influence of a feature on functional properties is understood, etc.) or not human-interpretable (e.g., nonsemantic). Optionally, features can be annotated to provide human-interpretable context (e.g., by using an explainability or interpretability method applied to one or more models, etc.).
- A feature set can include: all possible features, a subset of features (e.g., selected using dimensionality reduction, selected using a feature selection model, selected features based on correlation with specific functional properties, etc.), a user-defined set of features, weighted features, aggregated features, and/or any other suitable set of features. The features within the feature set can be: learned (e.g., using an autoencoder, using a deep learning model, etc.), handcrafted, and/or otherwise determined.
- Each protein set can be associated with one feature value set (e.g., an aggregate feature value set), multiple feature value sets (e.g., one feature value set for each constituent protein, different feature value sets corresponding to different folding configurations, different feature value sets corresponding to different contexts, etc.), not have a feature value set, and/or be associated with any other feature value set. In an illustrative example, a feature value set is a feature value vector, wherein each element is a feature value for a feature in a feature set (e.g., a feature vector). In a first embodiment, each protein in the protein set is associated with a feature value set, wherein an aggregate feature value set (e.g., a representative feature value set) is determined for the protein set based on the feature values of the constituent proteins using a feature aggregation model (e.g., examples shown in FIG. 5A and FIG. 5B). In a second embodiment, a feature value set for the protein set can be directly determined (e.g., using a feature extraction model, using a machine learning model, etc.).
- Feature values can include and/or be extracted (e.g., using a feature extraction model) from: sequences (e.g., amino acid sequences, genetic sequences, etc.), measurements and/or other data, structures (e.g., primary, secondary, or tertiary structures that are known, measured, computer-generated, etc.), context, other feature values, and/or any other information. Examples of features can include: amino acid composition-based features, autocorrelation-based features, profile-based features, pseudo amino acid composition, sequence features (e.g., AA groups, active sites, binding sites, PTM sites, repeats, etc.), domain features, physicochemical features, domains, and/or any other feature. For example, features can include and/or be based on: k-mers; pseudo structure status composition (PseSSC); pseudo amino acid composition (PseAAC); composition, transition, and distribution (CTD); grand average of hydropathicity index (GRAVY); autocovariance; auto-cross covariance; top-n-gram; overall amino acid count; count and/or percentage of a specific amino acid; amino acid structure (e.g., amino acid subsequence organization within the amino acid sequence); charge (e.g., overall charge, charge distribution, charge at a given pH, etc.); acidity; hydrophilicity/hydrophobicity; functional groups; flexibility; instability; aromaticity; length; molecular weight; binding affinity; active sites (e.g., count, structure, location, etc.); physicochemical and/or molecular features of amino acids; and/or any other feature.
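- A few of the sequence features listed above (amino acid composition, GRAVY, k-mers, and length) can be computed directly from an amino acid sequence. The sketch below uses the standard Kyte-Doolittle hydropathy scale for GRAVY; the function names and the toy sequence are illustrative assumptions and do not represent the feature extraction model of the disclosure.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in the sequence."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    counts = Counter(sequence)
    return {aa: counts[aa] / len(sequence) for aa in KYTE_DOOLITTLE}


def gravy(sequence):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def kmer_counts(sequence, k):
    """Counts of every overlapping subsequence of length k."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


toy_sequence = "AAKK"  # toy sequence, not a real protein
features = {
    "length": len(toy_sequence),
    "gravy": gravy(toy_sequence),
    "fraction_K": amino_acid_composition(toy_sequence)["K"],
    "2mers": kmer_counts(toy_sequence, 2),
}
```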
- However, features and/or feature values can be otherwise defined.
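- As one possible feature aggregation model for the first embodiment above, an aggregate feature value set can be computed as a concentration-weighted mean of the constituent proteins' feature vectors. Weighting by concentration is an assumption made for illustration; the disclosure does not limit aggregation to this form, and the names and values below are hypothetical.

```python
def aggregate_features(feature_vectors, concentrations):
    """Concentration-weighted mean of per-protein feature vectors.

    feature_vectors: dict mapping protein name -> list of feature values.
    concentrations: dict mapping protein name -> concentration (any consistent unit).
    """
    total = sum(concentrations[name] for name in feature_vectors)
    if total <= 0:
        raise ValueError("total concentration must be positive")
    length = len(next(iter(feature_vectors.values())))
    aggregate = [0.0] * length
    for name, vector in feature_vectors.items():
        if len(vector) != length:
            raise ValueError("all feature vectors must have the same length")
        weight = concentrations[name] / total
        for i, value in enumerate(vector):
            aggregate[i] += weight * value
    return aggregate


# Hypothetical two-protein set with two features each.
vectors = {"P1": [1.0, 0.0], "P2": [0.0, 2.0]}
amounts = {"P1": 0.75, "P2": 0.25}
aggregate = aggregate_features(vectors, amounts)
```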
- Functional properties can include macro functional properties, micro functional properties, nano functional properties, a combination thereof, other characteristics, and/or any other functional properties.
- The set of functional property values for a protein set functions to define how the protein set and/or proteins in the protein set: behaves during sample preparation or cooking, influences the finished sample (e.g., in look, feel, taste, etc.), interacts with other molecules (e.g., secondary interactions, tertiary interactions, quaternary interactions, etc.), denatures (e.g., the denaturation point), folds, aggregates, other target functionalities, and/or any other behavior at the nano, micro, and/or macro scale (e.g., behaviors between the protein as a whole and the context or other proteins, etc.). Functional properties can include: nutritional profile (e.g., macronutrient profile, micronutrient profile, etc.), texture (e.g., texture profile, firmness, toughness, puncture, stretch, compression response, mouthfeel, viscosity, graininess, relaxation, stickiness, chalkiness, flouriness, astringency, crumbliness, stretchiness, tearability, mouth melt, etc.), solubility, melt profile, smoke profile, gelation point, flavor, appearance (e.g., color, sheen, etc.), aroma, precipitation, stability (e.g., room temperature stability), emulsion stability, ion binding capacity, heat capacity, solid fat content, chemical properties (e.g., pH, affinity, surface charge, isoelectric point, hydrophobicity/hydrophilicity, chain lengths, chemical composition, nitrogen levels, chirality, stereospecific position, etc.), physicochemical properties, compound concentration (e.g., in the solid sample fraction, vial headspace, olfactory bulb, post-gustation, etc.), denaturation point, denaturation behavior, aggregation point, aggregation behavior (e.g., micellization capability, micelle stability, etc.), particle size, structure (e.g., microstructure, macrostructure, fat crystalline structure, etc.), folding state, folding kinetics, interactions with other molecules (e.g., dextrinization, caramelization, coagulation, shortening, interactions between fat and protein, interactions with water, aggregation, micellization, etc.), fat leakage, water holding and/or binding capacity, fat holding and/or binding capacity, fatty acid composition (e.g., percent saturated/unsaturated fats), moisture level, turbidity, interactions within the protein set (e.g., protein aggregation), properties determined using an assay tool, and/or any other properties. In examples, functional properties can include physicochemical and/or biochemical properties of amino acids and/or clusters of amino acids in each protein.
- A functional property set can include: all possible functional property values, a subset of functional properties (e.g., selected using dimensionality reduction, selected using a functional property selection model, etc.), a user-defined set of functional properties, weighted functional properties, and/or any other suitable set of functional properties.
- Functional property value sets can be associated with an individual protein, the entire set of proteins (e.g., a protein mixture; where each protein in the set is assigned the same functional property values, where functional property values are assigned to each protein based on individual protein concentrations within the set, etc.), a subset of the protein set (e.g., one or more proteins with the highest concentrations within the set), and/or be unassociated with the protein set (e.g., manually defined target functional properties). Each protein set can be associated with one functional property value set (e.g., wherein the functional property value set includes a value for each functional property in a functional property set), multiple functional property value sets (e.g., a protein set can be associated with different functional property value sets corresponding to different contexts), not have a functional property value set (e.g., uncharacterized), or be associated with any other functional property value set. In a specific example, a given protein can be associated with multiple functional property value sets, wherein each functional property value set corresponds to different protein sets that include the given protein. Functional property values can optionally include an uncertainty parameter (e.g., measurement uncertainty, determined using statistical analysis, etc.).
- The functional property values can be determined experimentally (e.g., using an assay tool), determined via computer simulations, predicted (e.g., using a prediction model, based on the sample context, other functional properties, other inputs, etc.), and/or be otherwise determined. The functional property values can be: directly measured, analyzed and/or transformed data, features extracted from data (e.g., a data time series), and/or be otherwise determined.
- However, functional properties and/or functional property values can be otherwise defined.
- The system can optionally leverage one or more assays. Properties determined using an assay tool can optionally be and/or be used to determine any functional property value and/or feature value. Examples of assays and/or assay tools that can be used include: a differential scanning calorimeter (e.g., to determine properties related to melt, gelation point, denaturation point, etc.), Schreiber Test, an oven (e.g., for the Schreiber Test), a water bath, a texture analyzer, a rheometer, spectrophotometer (e.g., determine properties related to color), centrifuge (e.g., to determine properties related to water binding capacity), moisture analyzer (e.g., to determine properties related to water availability), light microscope (e.g., to determine properties related to microstructure), atomic force microscope (e.g., to determine properties related to microstructure), confocal microscope (e.g., to determine protein association with fat/water), staining (e.g., paired with computer vision models), laser diffraction particle size analyzer (e.g., to determine properties related to emulsion stability), polyacrylamide gel electrophoresis system (e.g., to determine properties related to protein composition), phos-tag acrylamide gel electrophoresis (e.g., to determine extent of phosphorylation), acrylamide gel electrophoresis (e.g., to determine extent of glycation), mass spectrometry (MS), gas chromatography (GC) (e.g., gas chromatography-olfactometry, GC-MS, etc.; to determine properties related to aroma/flavor, to determine properties related to protein composition, etc.), liquid chromatography (LC), LC-MS, fast protein LC (e.g., to determine properties related to protein composition), protein concentration assay systems, thermal gravimetric analysis system, thermal shift (e.g., to determine protein denaturation and/or aggregation behavior), ion chromatography, dynamic light scattering system (e.g., to determine properties related to particle size, to determine protein aggregation, etc.), Zetasizer (e.g., to determine properties related to surface charge), protein concentration assays (e.g., Qubit, Bradford, Biuret, LECO, etc.), particle size analyzer, sensory panels (e.g., to determine properties related to texture, flavor, appearance, aroma, etc.), capillary electrophoresis SDS (e.g., to determine protein concentration), spectroscopy (e.g., fluorescence spectroscopy, circular dichroism, etc.; to determine folding state, folding kinetics, denaturation temperature, etc.), absorbance spectroscopy (e.g., to determine protein hydrophobicity), CE-IEF (e.g., to determine protein isoelectric point/charge), total protein quantification, high temperature gelation, microbial cloning, Turbiscan, stereospecific analysis, and/or any other assay and/or assay tool. In an illustrative example, a sample made using the protein set can be stained (e.g., for lipids and proteins), imaged, and analyzed (e.g., using the image) to determine the sample's lipid and protein structure (e.g., treated as a functional property). The sample can optionally be measured using GC-MS to determine the chemical composition of the sample.
- The method can be used with one or more targets, wherein one or more candidate protein sets (e.g., analogous protein sets) can be determined based on the target (e.g., to replace a target protein set, to manufacture an analog for a target product, to identify a protein set with target characteristic values, etc.). Candidate protein sets can include: proteins found in a predetermined set of protein sources, proteins expressed by a predetermined set of species, genus, family, and/or other set of organisms, and/or other proteins. For example, candidate protein sets can include proteins found in plant-based sources (e.g., substantially excluding animal-based sources), naturally-occurring sources, genetically modified sources, synthetic sources, and/or any other suitable source. Target protein sets can include: a protein set to be replaced or replicated, or any other protein set. For example, target protein sets can include proteins found in animal-based sources (e.g., dairy sources).
- The target can include one or more: target characteristics (e.g., features, functional properties, etc.), target characteristic values, target protein sets (e.g., a single protein set, a composition of protein sets, etc.), target sources, target products (e.g., target food products), and/or other targets. Examples of target food products include: dairy fats (e.g., ghee, other bovine milk fats, etc.), milk (e.g., cow milk, sheep milk, goat milk, human milk, etc.), cheese (e.g., hard cheese, soft cheese, semi-hard cheese, semi-soft cheese), yogurt, cream cheese, dried milk powder, cream, whipped cream, ice cream, coffee cream, other dairy products, egg products (e.g., scrambled eggs), additive ingredients, mammalian meat products (e.g., ground meat, steaks, chops, bones, deli meats, sausages, etc.), fish meat products (e.g., fish steaks, filets, etc.), any animal product, and/or any other suitable food product. In specific examples, the target food product includes mozzarella, burrata, feta, brie, ricotta, camembert, chevre, cottage cheese, cheddar, parmigiano, pecorino, gruyere, edam, gouda, jarlsberg, and/or any other cheese.
- Target characteristic values can optionally be characteristic values for a target product and/or for a target protein set (e.g., associated with a target product). Target characteristic values can include a single value and/or ranges. A target can be: a single target (e.g., a single target characteristic value set for a given protein set) or aggregated targets (e.g., a vectorized set of feature values and/or functional property values aggregated across multiple protein sets, etc.). A target can be: a positive target (e.g., where positive target features are positively correlated with target functional properties; where desired characteristics are positive targets; etc.), or a negative target (e.g., where negative target features are negatively correlated with target functional properties; where undesired characteristics are negative targets; etc.); an example is shown in FIG. 10. In a first variant, the target characteristic values include desired feature values; an example is shown in FIG. 11. In a second variant, the target characteristic values include desired functional property values (e.g., associated with a target protein set, manually specified, etc.); examples are shown in FIG. 10 and FIG. 12.
- However, the target can be otherwise defined.
- The system can include one or more models, including feature extraction models, correlation models, feature selection models, functional property selection models, prediction models, protein set determination models, feature aggregation models, similarity models, structure prediction models, and/or any other model. Any model can include: regression, classification, neural networks (e.g., CNNs, DNNs, etc.), rules, heuristics, equations (e.g., weighted equations, etc.), selection (e.g., from a library), instance-based methods (e.g., nearest neighbor), regularization methods (e.g., ridge regression), decision trees, models used in Bayesian methods (e.g., Naïve Bayes, Markov), optimization methods, kernel methods, probability, deterministics, genetic programs, support vectors, and/or any other suitable method.
- The models can include classical machine learning models (e.g., linear regression, logistic regression, decision tree, SVM, nearest neighbor, PCA, SVC, LDA, LSA, t-SNE, naïve bayes, k-means clustering, clustering, association rules, dimensionality reduction, etc.), neural networks (e.g., CNN, CAN, LSTM, RNN, autoencoders, deep learning models, etc.), ensemble methods, heuristics, and/or any other suitable model. The models can be scoring models, numerical value predictors (e.g., regressions), classifiers (e.g., binary classifiers, multiclass classifiers, etc.), and/or provide other outputs.
- The models can be trained and/or learned, fit, predetermined, and/or can be otherwise determined. The models can be learned using: supervised learning, unsupervised learning, reinforcement learning, Bayesian optimization, positive-unlabeled learning, and/or otherwise learned. In specific examples, models can be trained using multiple-instance learning (MIL), learning to aggregate (LTA), and/or any other training approach. The models can be learned or trained on: labeled data (e.g., data labeled with the target label), unlabeled data, positive training sets (e.g., a set of data with true positive labels), negative training sets (e.g., a set of data with true negative labels), and/or any other suitable set of data.
- The models can be specific to: functional properties, a protein set, a context, a target, and/or otherwise specific, or be generic. The feature extraction model can function to extract values for features for a protein set (e.g., for each protein in the set, for the protein set as a whole, etc.). The feature extraction model can output feature values based on molecular information inputs (e.g., sequences, measurements, data, structure, protein set composition, etc.), context, and/or other information. The feature extraction model can use: folding analysis, classifiers, the reduced alphabet approach, Markov models, statistical methods, n-gram analysis, autocovariance, auto-cross covariance, protein descriptor methods (e.g., PseSSC, PseAAC, CTD, GRAVY, etc.), any protein analysis methods, encoders (e.g., trained to encode the sequence to a shared latent space), and/or any other feature extraction technique. In a first example, the feature extraction model extracts handpicked features (e.g., wherein the feature extraction model is trained on a predetermined training value for the feature). In a second example, the feature extraction model can be adopted from another domain (e.g., be a linguistic feature model). In a third example, the feature extraction model can be a subset of the layers from a model trained end-to-end to predict another attribute (e.g., wherein the features can be learned features). In an illustrative example, the feature extraction model can be a subset of layers (e.g., the first several layers, feature extraction layers, intermediary layers, etc.) of a prediction model trained to predict functional property values from protein sequences, context, and/or other inputs (e.g., example shown in
FIG. 14). However, the feature extraction model can be otherwise configured.
- The extracted features for the protein set can be represented as one or more feature vectors, wherein each vector position can represent a different feature. In a first variant, a feature vector is determined for each protein within the set, wherein the feature value is determined based on the protein's sequence and optionally the protein's abundance or concentration within the protein set. Alternatively, the protein's abundance or concentration can be represented by a separate vector. In a second variant, a feature vector is determined for each protein set, wherein each feature's value is representative of the feature value for the protein set as a whole. In an example, the protein set feature vector is determined based on the feature's values for each protein in the protein set (e.g., wherein the different values for a given functional feature are aggregated, predicted, etc.), and optionally determined based on the respective protein's abundance within the protein set (e.g., weighted based on the respective protein's abundance within the set, etc.). However, the extracted features can be otherwise represented.
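- As a non-limiting illustration of sequence-based feature extraction, the following minimal sketch combines a reduced alphabet with n-gram analysis to produce a fixed-order feature vector for one protein. The residue grouping, the class symbols, and the function names are assumptions made for this sketch only; they are not specified by the description above.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet: each of the 20 standard amino acids is
# mapped to a coarse class. This grouping is an illustrative assumption.
REDUCED_ALPHABET = {
    **dict.fromkeys("AVLIMFWC", "h"),  # hydrophobic
    **dict.fromkeys("STNQGPY", "p"),   # polar or small
    **dict.fromkeys("KRH", "+"),       # positively charged
    **dict.fromkeys("DE", "-"),        # negatively charged
}
CLASSES = "hp+-"


def reduce_sequence(sequence):
    """Map an amino acid sequence onto the reduced alphabet.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(REDUCED_ALPHABET[residue] for residue in sequence.upper())


def ngram_feature_vector(sequence, n=2):
    """Return normalized n-gram frequencies over the reduced alphabet.

    The n-grams are enumerated in a fixed order so that each vector
    position always represents the same feature.
    """
    reduced = reduce_sequence(sequence)
    windows = [reduced[i:i + n] for i in range(len(reduced) - n + 1)]
    counts = Counter(windows)
    total = len(windows) or 1  # avoid division by zero for short inputs
    return [counts["".join(gram)] / total
            for gram in product(CLASSES, repeat=n)]
```

With four classes and n=2 the vector has 16 positions; a learned encoder or a protein descriptor method could be substituted without changing the vector-per-protein representation.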
- The optional feature aggregation model can function to aggregate feature values across proteins in a protein set. The feature aggregation model inputs can include: a feature value set (e.g., a feature value vector) for each protein in the protein set, a feature value set for each protein in a subset of the protein set, a protein set composition, context, and/or any other protein set information. The feature aggregation model outputs can include an aggregate feature value set (e.g., an aggregate feature value vector) for the protein set. The feature aggregation model can optionally interface with and/or be part of the prediction model (e.g., wherein the prediction model aggregates feature values).
- The feature aggregation model can leverage classical or traditional approaches (e.g., heuristics, equations, etc.), leverage machine learning approaches (e.g., have learned parameters/weights, use MIL (multiple instance learning), use LTA learning, etc.), and/or be otherwise constructed. In a first embodiment, the feature aggregation model is a traditional or classical model. For example, the feature aggregation model can include a weighted combination (e.g., weighted average, etc.) of the feature value sets for individual proteins in the protein set, wherein the weights can be based on protein type, protein set composition (e.g., protein concentration, protein abundance in the protein set, etc.), and/or any other protein information. In a second embodiment, the feature aggregation model is a neural network. In a first example, the feature aggregation model includes a weighted combination of feature value sets for individual proteins in the protein set with optional interaction terms, wherein the weights and/or the interaction terms are learned parameters. In a second example, the feature aggregation model is the prediction model trained using MIL, wherein each instance is an individual protein with a respective concentration, each bag is a protein set, and bag labels are functional property values.
- However, the feature aggregation model can be otherwise configured.
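- As a non-limiting illustration of the first embodiment, the following minimal sketch aggregates per-protein feature vectors into one protein set feature vector using an abundance-weighted average. The function name and the choice to normalize abundances to sum to one are assumptions of this sketch; learned weights or interaction terms are not shown.

```python
def aggregate_features(feature_vectors, abundances):
    """Abundance-weighted average of per-protein feature vectors.

    feature_vectors: one equal-length list of feature values per protein.
    abundances: one non-negative abundance (or concentration) per protein.
    """
    if not feature_vectors or len(feature_vectors) != len(abundances):
        raise ValueError("need one abundance per feature vector")
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    # Normalize so the weights sum to one.
    weights = [a / total for a in abundances]
    n_features = len(feature_vectors[0])
    return [
        sum(w * vector[j] for w, vector in zip(weights, feature_vectors))
        for j in range(n_features)
    ]
```

For example, two proteins with feature vectors [1.0, 0.0] and [3.0, 2.0] at a 3:1 abundance ratio aggregate to [1.5, 0.5].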
- The prediction model can function to predict functional property values for a protein set. The prediction model can incorporate a correlation model, feature selection model, functional property selection model, feature aggregation model, and/or any other model. The prediction model inputs can include: a feature value set for each protein in the protein set (e.g., a feature value vector), a feature value set for the protein set (e.g., a feature value vector for the protein set as a whole, an aggregate feature value vector, etc.), protein set composition, context (e.g., parametrized into a context vector), correlation information (e.g., outputs from the correlation model), and/or any other protein set information. The prediction model outputs can include: a functional property value set and/or any other protein set information. The prediction model can include a single model and/or multiple models. When the prediction model includes multiple models, the models can be arranged in series, in parallel, as distinct models, and/or otherwise arranged. When the prediction model includes multiple models, the models can be trained separately (e.g., using distinct training data sets), trained together (e.g., using the same training data set, using different subsets of the same training data set, etc.), and/or otherwise trained.
- In a first variant, the prediction model outputs functional property values based on feature values associated with the protein set (e.g., feature values for individual proteins in the protein set and/or for the protein set as a whole); an example is shown in FIG. 8. The model can optionally predict the functional property value based on the context; an example is shown in FIG. 13. For example, the context can be parametrized into a context vector, wherein the context vector can be appended to the protein set feature vector or provided as another input into the model. The model can predict a value for a single functional property (e.g., be a regression, classifier trained on a single functional property, etc.), values for multiple functional properties (e.g., be a multiclass classifier), and/or values for any other suitable set of functional properties.
- In a second variant, the prediction model predicts functional property values based on protein sequences for the protein set. In an example, the prediction model can output a vector, wherein each vector position can represent a different functional property and the vector value can represent the predicted value for said functional property.
- In a third variant, the prediction model predicts a functional property similarity score, indicative of the protein set's functional property similarity to a target sample's functional property, wherein the model can be analyzed (e.g., using an acquisition function) to determine which protein set (and/or feature vector) can produce a sample with functional properties that are closer to the target sample (e.g., using a Bayesian optimization technique).
- In a fourth variant, the prediction model predicts the protein set that can produce the target functional property values, target feature values, and/or other target. The prediction model (and/or another model) can optionally predict the context (e.g., process parameters) needed to produce the target functional property values. The prediction model can predict: which proteins should be included in the protein set, the amount of each protein in the protein set, and/or other aspects of the protein set. In an example, the prediction model predicts a vector, wherein each vector position represents a different protein, and each value represents an amount of the respective protein. In a second example, the prediction model predicts a protein inclusion vector (e.g., which proteins should be in the set) and a protein amount vector (e.g., how much of the included proteins should be in the set). The two vectors can be predicted serially (e.g., protein inclusion vector first, then protein amount vector), at the same time, by the same model, by different models, and/or otherwise predicted.
- However, the prediction model can be otherwise configured.
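- As a non-limiting illustration of the fourth variant, the following minimal sketch converts a predicted protein inclusion vector and a predicted protein amount vector into a normalized protein set composition. The inclusion threshold of 0.5, the normalization to fractions, and the function name are assumptions of this sketch.

```python
def decode_protein_set(protein_names, inclusion, amounts,
                       inclusion_threshold=0.5):
    """Combine an inclusion vector and an amount vector into a composition.

    Each vector position corresponds to the protein at the same position in
    protein_names. A protein is kept when its inclusion score meets the
    (assumed) threshold and its predicted amount is positive. The returned
    amounts are normalized to fractions of the included total.
    """
    included = {
        name: amount
        for name, score, amount in zip(protein_names, inclusion, amounts)
        if score >= inclusion_threshold and amount > 0
    }
    total = sum(included.values())
    if total == 0:
        return {}
    return {name: amount / total for name, amount in included.items()}
```

The two vectors could come from one model or from two models run serially; this sketch only covers how the outputs might be combined.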
- The optional protein set determination model (e.g., selection model) can function to determine a candidate protein set with characteristic values that closely match target characteristic values (e.g., the best/closest match, a match below a threshold, etc.). The protein set determination model inputs can include target characteristic values (e.g., target functional property values, target feature values, etc.), constraints (e.g., context constraints), the database, predicted characteristic values (e.g., predicted functional property values for each of a set of candidate protein sets), and/or any other information. The protein set determination model outputs can include: the candidate protein set (e.g., a candidate protein set selected from the database), the composition of the candidate protein set (e.g., the concentration for each protein in the set), the context for the candidate protein set, an ingredient (e.g., from which the candidate protein set can be derived; for use in product manufacture or target analog manufacture; etc.), and/or any other protein set information. The protein set determination model can use: comparison methods (e.g., matching, distance metrics, etc.), thresholds, optimization methods, regression, selection methods, classification, neural networks (e.g., CNNs, DNNs, etc.), clustering methods, rules, heuristics, equations (e.g., weighted equations, etc.), and/or any other methods. For example, the protein set determination model can search the database for a candidate protein set and/or determine a new protein set based on the target characteristics. The protein set determination model can optionally interface with and/or be part of the prediction model, the similarity model, and/or any other model. 
In a specific example, the protein set determination model can interface with and/or include the prediction model, wherein functional property values are predicted for each of a set of protein sets (e.g., uncharacterized protein sets) using the prediction model. The protein set determination model can then select a candidate protein set based on a comparison between the predicted functional property values and target functional property values (e.g., using the similarity model). In another example, the protein set determination model can determine the target feature values for a target protein set and identify a candidate protein set based on a comparison (e.g., the similarity) between the respective feature values (for the candidate protein set) and the target feature values, and/or a comparison (e.g., dissimilarity) between the respective feature values and a set of negative target feature values (e.g., feature values from protein sets to avoid).
- However, the protein set determination model can be otherwise configured.
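- As a non-limiting illustration of selecting a candidate protein set by comparison, the following minimal sketch picks the candidate whose predicted functional property values are closest to the target values and optionally rejects matches beyond a threshold. Euclidean distance, the dictionary layout, and the candidate names in the usage note are assumptions of this sketch.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length value vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_candidate(predicted, target, max_distance=None):
    """Return (name, distance) for the closest candidate, or None.

    predicted: dict mapping a candidate protein set name to its predicted
        functional property values.
    target: target functional property values, in the same order.
    max_distance: optional threshold; None is returned when even the best
        match lies farther away than this.
    """
    if not predicted:
        return None
    name, dist = min(
        ((n, euclidean(values, target)) for n, values in predicted.items()),
        key=lambda pair: pair[1],
    )
    if max_distance is not None and dist > max_distance:
        return None
    return name, dist
```

For hypothetical candidates {"pea_blend": [0.8, 0.3], "soy_blend": [0.5, 0.9]} and target [0.75, 0.35], the sketch selects "pea_blend".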
- The optional correlation model can function to determine the correlation, interaction, and/or any other association between features and functional properties. For example, a correlation model can determine correlations between features and functional properties. However, the correlation model can determine correlations between any first set of features and/or functional properties and any second set of features and/or functional properties.
- The correlation model inputs can include features (e.g., specifying a subset of features for correlation), feature values (e.g., individual protein feature values and/or aggregate feature values), sequences, functional properties (e.g., specifying a subset of functional properties for correlation), functional property values (e.g., where the feature values and/or functional property values are associated via common protein sets in the database), context, protein set compositions, the database, and/or any other information. The correlation model outputs can include a mapping between features (e.g., features, feature values, ranges of values, etc.) and functional properties (e.g., functional properties, functional property values, ranges of values, etc.), wherein the mapping can include: correlation coefficients (e.g., negative and/or positive), interaction effects (e.g., negative and/or positive, where a positive interaction effect can represent an increased significance effect of feature A on a functional property when in the presence of feature B), an association, and/or other correlation metric. The correlation model can use: classifiers, SVMs, ANNs, RF, conditional random field (CRF), K-nearest neighbors, statistical methods, and/or any other method.
- In variants, the mapping between features and functional properties can be an association between features and functional properties (e.g., an autocorrelation feature is correlated with stretchability), feature values and/or ranges thereof with functional properties (e.g., a first range of autocorrelation values is correlated with stretchability, while a second range of autocorrelation values is correlated with spreadability, etc.), features with functional property values and/or ranges thereof, feature values and/or ranges thereof with functional property values and/or ranges thereof (e.g., autocorrelation values are correlated with spreadability values), combinations of features with combinations of functional properties (e.g., including interaction effects between features), combinations of feature values with combinations of functional property values, and/or any other association.
- The correlation model can optionally be trained on a set of characterized protein sets (e.g., characterized with feature values, functional property values, etc.). In variants, the correlation model can identify similar and/or divergent feature values (e.g., calculating an implicit and/or explicit similarity measure) between protein sets and correlate those features to functional properties. For example, features with differing values (e.g., across protein sets) can be mapped to the functional properties with differing values (e.g., across the same protein sets). In a first specific example, a first feature is mapped to meltability when the feature values for two protein sets are substantially similar (e.g., within a threshold) except for the first feature's values, and the functional property values for the two protein sets are substantially similar except for the meltability values. In a second specific example, feature value differences (e.g., sequence differences determined using a sequence alignment method, a classifier, etc.) between related proteins (e.g., where a relation is determined using an evolutionary tree) can be correlated with differences in the respective functional property values.
- However, the correlation model can be otherwise configured.
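- As a non-limiting illustration of a statistical correlation model, the following minimal sketch computes Pearson correlation coefficients between each feature and each functional property across a set of characterized protein sets. The table layout, the function names, and the feature and property names in the usage note are assumptions of this sketch; interaction effects are not computed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def correlation_map(feature_table, property_table):
    """Map each (feature, functional property) pair to its correlation.

    Both tables map a name to a list of values, one value per protein set,
    with the protein sets in the same order in every list.
    """
    return {
        (feature, prop): pearson(feature_values, property_values)
        for feature, feature_values in feature_table.items()
        for prop, property_values in property_table.items()
    }
```

For example, a hypothetical "hydrophobicity" feature with values [1, 2, 3, 4] maps to a coefficient of 1.0 against a property with values [2, 4, 6, 8] and -1.0 against one with values [8, 6, 4, 2].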
- The optional feature selection model can function to select a subset of features (e.g., to reduce feature dimensions, to select features likely influencing functional properties, etc.). The feature selection model inputs can include: features, feature values, functional properties, functional property values, target characteristic values, correlation information (e.g., outputs from the correlation model, correlation coefficients, interaction effects, etc.), the database, and/or any other protein set information. The feature selection model outputs can include: a feature subset, target features (e.g., positive and/or negative targets), and/or any other features. The feature selection model can use: supervised selection (e.g., wrapper, filter, intrinsic, etc.), unsupervised selection, recursive feature selection, lift analysis (e.g., based on a feature's lift), any explainability and/or interpretability method (e.g., SHAP values), and/or any other selection method. The feature selection model can be a correlation model (and/or vice versa), can include a correlation model (and/or vice versa), can take correlation model outputs as inputs (and/or vice versa), be otherwise related to a correlation model, and/or be unrelated to a correlation model.
- The feature selection model can optionally be trained to select relevant features for functional property value prediction. For example, the training target can be a subset of features with high (positive and/or negative) interaction effects and/or correlation with functional properties (e.g., a correlation coefficient for a feature and/or feature set given a target functional property, interaction coefficients for features, whether an expected correlation and/or interaction was validated and/or invalidated in S600, etc.). However, the feature selection model can be otherwise trained.
- However, the feature selection model can be otherwise configured.
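- As a non-limiting illustration of filter-type supervised selection, the following minimal sketch keeps features whose correlation with a target functional property has a large magnitude, whether positive or negative. The threshold value, the optional cap on the number of features, and the function name are assumptions of this sketch.

```python
def select_features(correlations, threshold=0.5, max_features=None):
    """Select features by the magnitude of their correlation coefficient.

    correlations: dict mapping a feature name to its correlation
        coefficient with a target functional property.
    threshold: minimum absolute correlation for a feature to be kept.
    max_features: optional cap on how many features are returned.

    Returns feature names ordered from strongest to weakest correlation.
    """
    kept = [name for name, r in correlations.items() if abs(r) >= threshold]
    ranked = sorted(kept, key=lambda name: abs(correlations[name]),
                    reverse=True)
    if max_features is not None:
        ranked = ranked[:max_features]
    return ranked
```

Using the absolute value keeps strongly negative correlations, which can serve as negative target features.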
- The optional functional property selection model can function to select a subset of functional properties (e.g., to reduce dimensions, etc.). The functional property selection model inputs can include: functional properties, functional property values, target characteristic values, correlation information (e.g., outputs from the correlation model, correlation coefficients, interaction effects, etc.), the database, and/or any other protein set information. The functional property selection model outputs can include: a functional property subset, target functional properties (e.g., positive and/or negative targets), and/or any other functional properties. The functional property selection model can use: supervised selection (e.g., wrapper, filter, intrinsic, etc.), unsupervised selection, recursive feature selection, lift analysis (e.g., based on a functional property's lift), any explainability and/or interpretability method (e.g., SHAP values), and/or any other selection method. The functional property selection model can be a correlation model (and/or vice versa), can include a correlation model (and/or vice versa), can take correlation model outputs as inputs (and/or vice versa), be otherwise related to a correlation model, and/or be unrelated to a correlation model.
- However, the functional property selection model can be otherwise configured.
- The optional similarity model can function to compare two sets of characteristic values. The similarity model inputs can include candidate protein set characteristic values, target characteristic values, and/or any other information. The similarity model outputs can include a comparison metric. The similarity model can use: comparison methods (e.g., matching, distance metrics, etc.), thresholds, optimization methods, regression, selection methods, classification, neural networks (e.g., CNNs, DNNs, etc.), clustering methods, rules, heuristics, equations (e.g., weighted equations, etc.), and/or any other methods. The comparison metric can be qualitative, quantitative, relative, discrete, continuous, a classification, numeric, binary, and/or be otherwise characterized. The comparison metric can be or include a distance, difference (e.g., vector of differences between values for each characteristic, vector of squared differences between values for each characteristic), ratio, regression, residuals, clustering metric (e.g., wherein multiple samples of the candidate and/or target protein sets are evaluated, wherein multiple candidate and/or target protein sets are evaluated, etc.), a statistical measure, and/or any other comparison measure. In an example, the comparison metric is a distance in feature space (e.g., wherein a characteristic value set is an embedding in the feature space). In a specific example, the comparison metric is low (e.g., the candidate protein set is similar to the target product/protein set) when the candidate protein set characteristic values are near (in feature space) positive target characteristic values and/or far from negative target characteristic values. However, the similarity model can be otherwise configured.
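- As a non-limiting illustration of the specific example above, the following minimal sketch computes a comparison metric that is low when a candidate is near a positive target and far from a negative target in feature space. Subtracting a weighted negative-target distance is one possible formulation assumed for this sketch; the function names and the weight parameter are likewise assumptions.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length value vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def comparison_metric(candidate, positive_target, negative_target=None,
                      negative_weight=1.0):
    """Lower values indicate a better candidate.

    The metric is the distance to the positive target minus a weighted
    distance to the negative target (when one is given), so that being far
    from undesired characteristic values lowers the score.
    """
    score = distance(candidate, positive_target)
    if negative_target is not None:
        score -= negative_weight * distance(candidate, negative_target)
    return score
```

Because the negative term is subtracted, the metric can be negative; only the ordering of candidates is meaningful in this formulation.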
- The optional structure prediction model functions to predict the protein folding structure, given the context. The resultant structure can be parametrized and used to determine the protein set feature values, used to determine the functional property values, or otherwise used. Examples of structure prediction models that can be used include: AlphaFold, I-TASSER, HHpred, and/or any other suitable protein structure prediction model.
- However, the models can be otherwise defined.
- The system can optionally include an evolutionary tree (e.g., representing evolutionary relationships or distances between protein sources, protein sets, etc.). The evolutionary tree and/or evolutionary distances based on the evolutionary tree can be predetermined (e.g., where the evolutionary tree is stored in the system database and/or a third-party database), be retrieved (e.g., for each source in the database), and/or be otherwise determined. The evolutionary tree can be used to identify features, facilitate protein and/or protein set selection, discover a protein source component for a given protein set, and/or be otherwise used. In an example, the evolutionary tree can be traversed to identify candidate protein sources and/or protein source components (e.g., source components that are more commercially feasible) that might have similar protein sets to a given protein source.
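- As a non-limiting illustration of traversing an evolutionary tree, the following minimal sketch ranks candidate protein sources by their edge-count distance to a given source through the nearest common ancestor. The parent-map representation, the use of edge counts as the evolutionary distance, and the toy node names in the usage note are assumptions of this sketch and do not represent a real phylogeny.

```python
def path_to_root(node, parent):
    """Return the list of nodes from node up to the root.

    parent maps each non-root node to its parent; it is assumed acyclic.
    """
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges between a and b via their nearest common ancestor."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a, parent))}
    for steps_from_b, node in enumerate(path_to_root(b, parent)):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    raise ValueError("nodes do not share a common ancestor")


def nearest_sources(query, candidates, parent, k=3):
    """Return up to k candidates closest to the query source in the tree."""
    others = [c for c in candidates if c != query]
    return sorted(others, key=lambda c: tree_distance(query, c, parent))[:k]
```

With a toy parent map {"soy": "legumes", "pea": "legumes", "legumes": "plants", "oat": "grasses", "grasses": "plants"}, the source nearest to "soy" is "pea" (distance 2), while "oat" is at distance 4.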
- As shown in FIG. 1, the method can include: characterizing a protein set S100, training a prediction model S300, determining target characteristic values S400, determining a candidate protein set based on the target characteristic values S500, and/or any other suitable steps. The method can optionally include selecting a feature subset S200, selecting a functional property subset S250, evaluating the candidate protein set S600, and/or any other suitable steps.
- The method can be performed once (e.g., for a given target), iteratively (e.g., to train one or more models, to iteratively improve determination of a candidate protein set, etc.), concurrently with data generation (e.g., where a database of characterized and/or uncharacterized sources is iteratively updated while one or more protein set determination events are occurring), and/or at any other suitable frequency. All or portions of the method can be performed in real time (e.g., responsive to a request), iteratively, asynchronously, periodically, and/or at any other suitable time. All or portions of the method can be performed automatically, manually, semi-automatically, and/or otherwise performed. All or portions of the method can be performed during training and/or inference (e.g., prediction).
- All or portions of the method can be performed by one or more components of the system, by a user, by a computing system, and/or by any other suitable system. The computing system can include one or more: CPUs, GPUs, custom FPGA/ASICS, microprocessors, servers, cloud computing, and/or any other suitable components. The computing system can be local, remote, distributed, or otherwise arranged relative to any other system or module.
- Characterizing a protein set S100 functions to determine abstracted characterizations (e.g., feature values, functional property values, etc.) of the protein set, wherein the characterizations can be used to train the prediction model and/or any other model (e.g., to generate training data), to determine correlations between features and functional properties, to expand the database, and/or for any other downstream functionality. S100 can be performed before S400 and/or at any other time.
- In a first variant, the protein set is characterized as a whole (e.g., where characteristic values are determined for and/or associated with the protein set as a unit). In a second variant, the protein set is characterized based on the characteristic values (e.g., feature values, functional property values, etc.) of the constituent proteins. In a first example, each constituent protein is individually characterized, and the mixture characterization is determined based on the individual characterizations (e.g., a set including the individual characteristic values, aggregated individual characteristic values, characteristic values weighted based on the concentration of the constituents in the protein mixture, characteristic values weighted based on a relative importance for a constituent protein in influencing functional properties, etc.). In a specific example, the protein set characterization can be determined using a model (e.g., the feature aggregation model, a machine learning model, etc.) that determines protein set characteristic values based on the individual characterizations of the constituent proteins. In a second example, a subset of proteins in the mixture are assigned characteristic values (e.g., only the highest concentrated protein(s) are assigned feature values and/or functional property values, proteins having a concentration percent value higher than a threshold, etc.).
- Characterizing a protein set can include: optionally determining a composition of the protein set (e.g., S120), determining sequences for the protein set (e.g., S140), determining feature values for the protein set (e.g., S160), determining functional property values for the protein set (e.g., S180), and/or optionally determining a functionality (e.g., impact on functional properties, interaction with other molecules, structural functions, etc.) of the protein set (e.g., using machine learning annotation, using a correlation model, using explainability and/or interpretability methods, etc.). In variants, S120, S140, S160, and S180 are performed for training protein sets, while only S120, S140, and S160 are performed for candidate protein sets. However, S100 can be otherwise performed.
- Characterizing the protein set can optionally include manufacturing a sample using the protein set (e.g., wherein the manufacturing process is defined based on a context associated with the protein set), wherein all or parts of S100 are performed for the sample. The sample can optionally be processed prior to, during, or after, performing any assay (e.g., using dilution, centrifugation, dehydration, lyophilization, reconstitution, concentration methods, etc.).
- Determining a composition of the protein set S120 functions to identify each protein and/or the concentration of each protein in the set (e.g., a concentration for each protein within the protein set, a concentration of each protein within a sample containing the protein set, etc.). In a first variant, the composition can be manually or automatically specified (e.g., for a candidate protein set). In a second variant, the composition of the protein set can be measured (e.g., using mass spectrometry proteomics, a Bradford assay, capillary electrophoresis SDS, and/or any other assay). For example, a sample can be manufactured using the protein set, wherein the protein set composition in the sample is measured using one or more assays and/or assay tools. In a specific example, a total protein quantification and individual protein abundances can be measured for the sample, wherein the concentration for each protein in the sample is based on the total protein quantification and individual protein abundances. In a third variant, the composition can be inferred using bioinformatics (e.g., machine learning techniques applied to codons), genomics, transcriptomics, and/or other protein expression prediction techniques. However, the composition can be otherwise determined.
- The protein concentrations (e.g., mol %, wt %) can be used to identify the most abundant proteins in the set, to weight variables (e.g., features), in downstream analyses that determine proteins that have a disproportionate effect on functional properties relative to their concentration, and/or otherwise used. For any part of the method, a protein set and/or data associated with a protein set can be adjusted based on the protein composition. In a first example, a subset of the protein set is determined (e.g., to represent the complete protein set), wherein the subset includes the highest prevalence proteins in the set. In a first specific example, proteins that occupy a proportion of the protein set above a threshold percentage are selected as the subset, wherein the threshold percentage can be between 0.5%-50% or any range or value therebetween (e.g., 1%, 2%, 5%, 10%, 15%, 20%, 25%, 50%, etc.), but can alternatively be less than 0.5% or greater than 50%. In a second specific example, proteins with an overall concentration in the sample above a threshold percentage (e.g., mol %, wt %) are selected as the subset, wherein the threshold percentage can be between 0.05%-20% or any range or value therebetween (e.g., 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 15%, etc.), but can alternatively be less than 0.05% or greater than 20%. In a third specific example, a threshold number of the highest prevalence proteins are selected for the subset, wherein the threshold number can be between 1-100 or any range or value therebetween (e.g., 2-10, 5, 10, 15, etc.), but can alternatively be greater than 100. In a fourth specific example, proteins having a certain set of characteristics (e.g., binding affinity, adsorption affinity, etc.) that enable easy extraction and/or purification can be selected as the subset. In a second example, data associated with a protein set can be weighted based on the proportions of each constituent protein.
- However, the protein set composition can be otherwise determined.
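- As a minimal, non-limiting sketch of the composition handling described above, the following Python shows how per-protein concentrations might be derived from a total protein quantification and relative abundances, and how a subset might be selected by a proportion threshold or by a threshold number. The function names, the protein labels, and the numeric values are hypothetical and are not taken from the specification.

```python
def concentrations(total_protein, relative_abundance):
    """Scale relative abundances (fractions summing to 1) by the total protein quantification."""
    return {name: total_protein * frac for name, frac in relative_abundance.items()}


def subset_by_proportion(relative_abundance, threshold):
    """Keep proteins whose share of the protein set is above the threshold fraction."""
    return {name: frac for name, frac in relative_abundance.items() if frac > threshold}


def subset_top_n(relative_abundance, n):
    """Keep the n highest-prevalence proteins, ordered from most to least abundant."""
    ranked = sorted(relative_abundance.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


# Hypothetical relative abundances for a four-protein set.
abundance = {"P1": 0.55, "P2": 0.30, "P3": 0.12, "P4": 0.03}

per_protein = concentrations(10.0, abundance)      # e.g., total of 10.0 mg/mL (assumed unit)
above_5_percent = subset_by_proportion(abundance, 0.05)
top_two = subset_top_n(abundance, 2)
```

The threshold of 5% and the threshold number of 2 fall within the ranges recited above, but any other value in those ranges could be substituted.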
- Determining sequences for the protein set S140 functions to determine information for feature extraction (e.g., for sequence-based features) and/or to directly determine feature values (e.g., where the feature values are sequences). Sequences can be measured (e.g., using an assay), retrieved (e.g., from a third-party database), and/or otherwise determined. Determining sequences can optionally include determining secondary information associated with the sequences (e.g., protein structure information, metadata, etc.). In a first variant, a sequence is preferably determined for each individual protein in the protein set (e.g., retrieved from a database, determined using protein sequencing, etc.), but can alternatively be determined for a subset of proteins in the protein set, be determined directly for the protein set as a whole, and/or be otherwise determined. However, sequences for the protein set can be otherwise determined.
- Determining feature values for the protein set S160 functions to computationally identify characterization values (e.g., molecular property values) of the protein set. S160 can be performed one or more times for each protein in a protein set, one or more times for each protein in a subset of the protein set (e.g., for the highest prevalence proteins in the protein set), one or more times for each protein set (e.g., iterating through a database), after S200 (e.g., where feature values for a protein set are determined for the features selected in S200), and/or at any other suitable time. The feature values can optionally be a feature value vector (e.g., wherein each element of the vector is a feature value for a feature in a feature set).
- Feature values are preferably determined using the feature extraction model, but can alternatively be otherwise determined. In a first variant, feature values can be computationally determined. In a first example, feature values can be extracted from sequences (e.g., amino acid sequences). In a second example, feature values can be based on a computationally-determined protein charge and/or charge distribution. In a third example, feature values can be determined based on a modeled folding pattern (e.g., a likely protein folding pattern). In a fourth example, context feature values can be determined based on context information (e.g., extracted from ingredient lists, treatments, protein modifications, etc.) and optionally the protein sequences. In a second variant, feature values can be measured and/or extracted from measurements (e.g., experimentally determined using assays). In a third variant, feature values can be determined using a simulation (e.g., protein folding simulation, protein functionality simulation, protein interaction simulation, etc.). In a fourth variant, feature values can be retrieved from a database (e.g., a third-party database, the system database, etc.). In a fifth variant, a first subset of feature values can be determined using a first feature extraction model while the remaining feature values are determined using a second feature extraction model (e.g., using values from the first feature subset, using other information, etc.).
- Feature values can be determined using one or more of the variants. In a first example, feature values can be computationally determined and subsequently validated and/or updated using measurements (e.g., values for water binding capacity can be estimated based on computationally determined charge distribution and/or folding pattern, then subsequently tested using centrifugal compression). In a second example, the amino acid sequence for each protein of the protein set (e.g., for a subset of the protein set including the highest prevalence proteins) can be retrieved from a third-party database, then feature values can be subsequently extracted based on the retrieved sequences. In a third example, context feature values can be determined based on context information retrieved from the database, and sequence feature values can be determined using a feature extraction model.
- S160 can optionally include aggregating feature values across individual proteins in the protein set (e.g., all proteins in the set, a subset of the protein set, etc.). For example, an aggregated feature value set (e.g., aggregated feature value vector) can be determined for the protein set based on feature value sets (e.g., feature value vectors) for one or more proteins in the protein set. The feature values are preferably aggregated using the feature aggregation model, but can alternatively be otherwise aggregated. In a first example, aggregating feature values includes summing the values for each feature across the proteins of the protein set (e.g., optionally weighted by concentration or abundance). In a second example, aggregating feature values includes predicting an aggregated feature vector based on the feature value set for each protein of the protein set and optionally the respective protein concentration or abundance (e.g., wherein the feature value sets can be concatenated, fed to different input heads, etc.). In a third example, aggregating feature values can include predicting the aggregated feature vector based on the protein sequences of the proteins within the protein set (e.g., wherein the protein sequences can be concatenated, fed to different input heads, etc.).
- However, feature values can be otherwise determined.
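- As an illustrative sketch of extracting feature values from amino acid sequences and aggregating them across a protein set, the following Python computes three simple per-protein features and a concentration-weighted aggregate. The chosen features, the residue groupings, the example sequences, and the weights are assumptions made for illustration; the weights are normalized so that the aggregate is a weighted average, which is one possible form of the weighted sum described above.

```python
# Illustrative residue groupings (assumed, not specified in the source).
HYDROPHOBIC = set("AVILMFWY")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def sequence_features(seq):
    """Return [length, hydrophobic fraction, net charge per residue] for one sequence."""
    n = len(seq)
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq)
    net_charge = sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
    return [float(n), hydrophobic / n, net_charge / n]


def aggregate(feature_vectors, weights):
    """Sum each feature across proteins, weighted by normalized concentration or abundance."""
    total = sum(weights)
    dim = len(feature_vectors[0])
    return [
        sum(w * fv[i] for fv, w in zip(feature_vectors, weights)) / total
        for i in range(dim)
    ]


# Hypothetical two-protein set with toy sequences and relative abundances.
sequences = ["AKDE", "AAKK"]
abundances = [0.75, 0.25]

per_protein = [sequence_features(s) for s in sequences]
aggregated = aggregate(per_protein, abundances)
```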
- Determining functional property values for the protein set S180 functions to determine behavior of the protein set. S180 can be performed before S160, after S160, during S600, iteratively, and/or at any other suitable time.
- The functional property values are preferably measured and/or otherwise directly determined values for a set of functional properties, but can alternatively be manually assigned, inferred, predicted, or otherwise determined. In a first variant, functional property values are measured and/or extracted from measurements (e.g., measurements determined using any assay and/or assay tool). For example, a sample can be manufactured using the protein set, wherein the functional property values for the protein set are measured using one or more assays and/or assay tools. In an illustrative example, the functional property values are determined for a protein and lipid gel (e.g., wherein the gel manufacturing is prescribed by a context associated with the protein set). The functional property values can be determined using one or more experimental environments, treatments, and/or any other variable (e.g., where the set of functional property values determined in an environment are associated with that variable). In a second variant, the functional property values can be retrieved from a database (e.g., a third-party database). In a third variant, the functional property values can be computationally determined. In a first example of the third variant, the functional property values can be determined based on simulations (e.g., computer simulations of protein dynamics). In a second example of the third variant, the functional property values can be predicted using a prediction model (e.g., based on the protein set feature values, etc.).
- However, functional property values can be otherwise determined.
- The method can optionally include selecting a feature subset S200, which functions to select features which most likely influence (e.g., have a measurable effect on, a significant effect on, a disproportionate effect relative to their concentration, etc.) one or more functional properties and/or to reduce feature space dimensions (e.g., to reduce computational load). S200 can be performed after S100, before S400, during and/or after S300, and/or at any other suitable time. The feature subset can be selected using a feature selection model, using a correlation model, randomly, with human input, and/or be otherwise determined.
- In a first variant, the feature subset can be features (e.g., target features) that influence functional properties. In a first embodiment, the feature selection model uses lift analysis (e.g., applied to a prediction model trained to output functional property values based on the feature values) to select the subset of features with lift above a threshold. In a second embodiment, features with prediction model weights above a threshold value are selected as the feature subset, wherein the model weights can be determined during and/or after prediction model training. In a third embodiment, a correlation model can be used to determine features positively and/or negatively correlated to one or more functional properties (e.g., absolute value of correlation coefficient above a threshold, a confidence score above a threshold, etc.).
- In a second variant, the subset of features can be determined using any dimensionality reduction technique (e.g., principal component analysis, linear discriminant analysis, etc.).
- In a third variant, the subset of features can be determined based on a comparison between a target (e.g., a target protein set and/or target product) and a candidate protein set (e.g., a prototype protein set), wherein the subset of features (e.g., used to predict functional properties for a second candidate protein set) can be selected based on the similarities and/or differences between the respective functional property values. In a first example, a difference between functional property values associated with the target and candidate protein set can be determined (e.g., where values for one or more functional properties differ significantly between the target and candidate). The differing functional property values can define a functional property subset (e.g., target functional properties). These target functional properties can then be used to determine a feature subset (e.g., target features), wherein the feature subset can be the feature(s) most likely to influence the functional property subset (e.g., based on a correlation model output). In a second example, a difference between feature values associated with the target and candidate protein set can be determined (e.g., where one or more functional property values differ between the two sets). The features associated with the differing feature values can define the feature subset (e.g., target features).
- However, the feature subset can be otherwise selected.
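- As one hedged sketch of the correlation-based embodiment of feature selection, the following Python selects features whose Pearson correlation coefficient with a functional property has an absolute value at or above a threshold. The use of Pearson correlation, the feature names, the data, and the threshold of 0.8 are assumptions for illustration; the correlation model is not limited to this form.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(feature_columns, property_values, threshold):
    """Keep features that are strongly positively or negatively correlated with the property."""
    return [
        name
        for name, column in feature_columns.items()
        if abs(pearson(column, property_values)) >= threshold
    ]


# Hypothetical feature values for four protein sets and one measured functional property.
feature_columns = {
    "hydrophobicity": [0.1, 0.2, 0.3, 0.4],
    "net_charge": [4.0, 3.0, 2.0, 1.0],
    "length": [5.0, 1.0, 1.0, 5.0],
}
gel_strength = [1.0, 2.0, 3.0, 4.0]

selected = select_features(feature_columns, gel_strength, threshold=0.8)
```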
- The method can optionally include selecting a functional property subset S250, which functions to reduce functional property space dimensions (e.g., to reduce computational load). S250 can be performed after S100, before S400, during and/or after S300, and/or at any other suitable time. The functional property subset can be selected using a feature selection model, using a correlation model, randomly, with human input, and/or be otherwise determined.
- In a first variant, the subset of functional properties can be determined using any dimensionality reduction technique (e.g., principal component analysis, linear discriminant analysis, etc.).
- In a second variant, the subset of functional properties can be determined based on a comparison between a target (e.g., a target protein set and/or target product) and a first candidate protein set (e.g., a prototype protein set), wherein the subset of functional properties can be selected based on the similarities and/or differences between the functional property values for the target and the first protein set. In an example, a difference between functional property values associated with the target and candidate protein set can be determined (e.g., where values for one or more functional properties differ significantly between the two sets). The differing functional property values can define a functional property subset (e.g., target functional properties).
- However, the functional property subset can be otherwise selected.
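- As a minimal sketch of selecting a functional property subset from the differences between a target and a candidate protein set, the following Python flags properties whose values differ by more than a relative tolerance. Treating a relative difference above 10% as "significant" is an assumption, as are the property names and values.

```python
def differing_properties(target, candidate, rel_tol):
    """Return the properties whose candidate value differs from the target by more than rel_tol."""
    subset = []
    for name, target_value in target.items():
        scale = max(abs(target_value), 1e-12)  # guard against division by zero
        if abs(candidate[name] - target_value) / scale > rel_tol:
            subset.append(name)
    return subset


# Hypothetical functional property values for a target and a prototype protein set.
target_values = {"gel_strength": 10.0, "melt_temperature": 60.0, "water_binding": 2.0}
prototype_values = {"gel_strength": 6.0, "melt_temperature": 61.0, "water_binding": 2.1}

target_properties = differing_properties(target_values, prototype_values, rel_tol=0.1)
```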
- Training a prediction model S300 functions to improve functional property value prediction, candidate protein set determination (using the prediction model), and/or any other part of the method. S300 can be performed after S100 and/or at any other time.
- In variants, training the prediction model includes determining training data including feature values (e.g., determined via S160) and corresponding functional property values (e.g., determined via S180) for one or more protein sets (e.g., a set of protein sets). The functional property values in the training data are preferably measured, but can alternatively be otherwise determined (e.g., using any other method in S180). The prediction model is then trained using the training data to predict the functional property values for a protein set based on the feature values for the protein set. Examples are shown in FIG. 6, FIG. 7A, and FIG. 7B.
- In any variant, the training data can include positive samples (e.g., with no negative samples), wherein the prediction model is trained using positive-unlabeled learning. Alternatively or additionally, the training data can include negative samples, wherein the prediction model can be trained to distance the prediction from the negative samples.
- However, one or more prediction models can be otherwise trained.
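- As an illustrative sketch of training a prediction model on pairs of feature values and functional property values, the following Python fits a linear model by gradient descent on the mean squared error. The linear model form, the learning rate, the epoch count, and the toy training data are assumptions chosen for brevity; the prediction model can be any suitable model.

```python
def train_linear_model(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias so that sum(w * x) + b approximates y (mean squared error)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi for row, yi in zip(X, y)]
        grad_w = [2.0 / n * sum(e * row[j] for e, row in zip(errors, X)) for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, x):
    """Predict one functional property value from a feature value vector."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical training data: two aggregated feature values per protein set and one
# measured functional property value per protein set.
train_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_property = [0.5, 2.5, 1.5, 3.5]

model = train_linear_model(train_features, train_property)
predicted = predict(model, [0.5, 0.5])
```

Because the toy data are noise-free and linear, the fitted model reproduces the training values closely; measured data would generally call for held-out validation.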
- Determining target characteristic values S400 functions to specify one or more criteria for candidate protein set determination. For example, the candidate protein set can be selected to manufacture an analog for a target product, to replace a target protein set (e.g., a protein set to be replicated and/or replaced, a protein set to be replicated with specified modifications, etc.), to meet a desired set of characteristic values, and/or otherwise used. S400 can be performed after S100 (e.g., after a target protein set has been characterized) and/or at any other time.
- The target characteristic values are preferably associated with a characterized protein set (e.g., a characterized target protein set), but alternatively can be associated with an uncharacterized protein set, be associated with a source and/or source component, be associated with a target product (e.g., target food product), be otherwise associated with protein set information, and/or not be associated with a protein set and/or source. The target characteristic values can be all or a subset of: the functional property values, the feature values, the amino acid sequences, and/or any other characteristic value associated with a target: product, source, source component, and/or protein set.
- The target characteristic values (e.g., a target characteristic value vector) can be determined manually, automatically, predetermined, with a model (e.g., target features selected using a feature selection model, target functional properties selected using a functional property selection model, etc.), based on a target product and/or target protein set, based on a use case (e.g., the use case for the candidate protein set, for the associated target protein set, etc.), retrieved from a database (e.g., where target functional property values are those associated with a target protein set in the database), measured, and/or be otherwise determined.
- In a first variant, the target characteristic values include target feature values. In a first embodiment, the target feature values can be determined for a target protein set using S160 methods. In a specific example, a subset of feature values of the target protein set can be used as the target characteristic values, where the subset can correspond to the feature subset determined in S200. In a second embodiment, target functional property values are used to determine target feature values. In a specific example, a correlation model is used to identify feature values associated with the target functional property values.
- In a second variant, the target characteristic values include target functional property values. In a first embodiment, the target functional property values can be determined for a target product and/or protein set using S180 methods. In a second embodiment, the target functional property values can be manually specified (e.g., desired or optimal functional property values for a product, a desired change in functional property values relative to functional property values for a protein set, etc.).
- In a third variant, the target characteristic values can include target feature values and target functional property values (e.g., a combination of the first and second variants).
- However, target characteristic values can be otherwise determined.
- Determining a candidate protein set based on the target characteristic values S500 functions to determine a protein set that satisfies target criteria (e.g., has desired characteristic values, mimics a target product/protein set, etc.). Additionally or alternatively, S500 functions to determine a candidate protein set for evaluation in S600 (e.g., wherein characterization of the candidate protein set can train the prediction model). S500 can be performed after S400, after S300, during S300 (e.g., as part of training), and/or at any other suitable time.
- Determining the candidate protein set can optionally include determining the composition of the candidate protein set (e.g., determining each protein in the set and/or determining the concentration of each protein in the set) and/or selecting a context for the candidate protein set.
- In a first variant, determining each protein in the candidate protein set includes individually selecting each individual protein in the candidate protein set from proteins in a candidate group of protein sets. In a second variant, determining each protein in the candidate protein set includes selecting the candidate protein set as a whole from the candidate group of protein sets.
- The candidate group of protein sets can include uncharacterized protein sets, partially characterized protein sets (e.g., with feature values but not functional property values), fully characterized protein sets (e.g., with both feature values and functional property values), known or estimated abundant protein sets (e.g., determined based on functional protein labelling), and/or any other set of protein sets. The candidate group can optionally be a subset of the system database (e.g., to reduce the computational resources, to reduce the search space, to constrain all or parts of the selection, etc.). For example, the candidate group can include a subset of protein sources (e.g., candidate protein sources, wherein all or parts of the protein sets associated with each candidate protein source are included), a subset of protein sets, and/or any other subset. In a first specific example, an evolutionary tree is used to identify protein sources evolutionarily related to a target protein source, wherein the candidate group includes protein sets associated with the identified protein sources. In a second specific example, the candidate group includes a set of protein sets with target feature values (e.g., within a threshold similarity to target feature values).
- Each protein set in the candidate group can optionally be associated with one or more concentrations and/or contexts. For example, each protein set can be associated with a predetermined set of possible values for each concentration and context parameter (e.g., a protein set can be associated with each unique combination of possible compositions and context values). In an illustrative example, a protein set in the candidate group includes [Protein 1, Protein 2]; the possible compositions for the protein set include: [70%, 30%], [30%, 70%], and [50%, 50%]; the possible contexts for the protein set include: [combine with canola oil, heat to 65° C., glycosylation of Protein 1], [combine with kokum butter, heat to 65° C., glycosylation of Protein 1], [combine with canola oil, heat to 72° C., glycosylation of Protein 1], [combine with kokum butter, heat to 72° C., glycosylation of Protein 1], [combine with canola oil, heat to 65° C., no glycosylation of Protein 1], [combine with kokum butter, heat to 65° C., no glycosylation of Protein 1], [combine with canola oil, heat to 72° C., no glycosylation of Protein 1], and [combine with kokum butter, heat to 72° C., no glycosylation of Protein 1].
- The candidate protein set can be determined based on: the target and candidate protein set's characteristic values (e.g., functional property values, feature values, etc.), estimated abundance and/or ease of extraction (e.g., determined based on the protein set's functionality, the protein source, the source component, etc.), the database, and/or any other factor. Any candidate protein set determination method can optionally be supplemented based on protein source and/or source component information (e.g., where the probability of selecting a protein set as the candidate protein set increases if the protein set is likely to be abundant within the protein source and/or the protein source itself is likely to be abundant relative to a threshold).
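- The illustrative example of compositions and contexts for [Protein 1, Protein 2] can be sketched in Python as a Cartesian product of the possible parameter values. The dictionary keys and variable names are hypothetical; the compositions, fats, temperatures, and glycosylation options are those listed in the example.

```python
from itertools import product

proteins = ("Protein 1", "Protein 2")
compositions = [(0.70, 0.30), (0.30, 0.70), (0.50, 0.50)]

# Context parameters from the illustrative example.
fats = ["canola oil", "kokum butter"]
temperatures_c = [65, 72]
glycosylate_protein_1 = [True, False]

# Each context is one combination of fat, heating temperature, and glycosylation choice.
contexts = list(product(fats, temperatures_c, glycosylate_protein_1))

# Each variant pairs one composition with one context.
variants = [
    {
        "composition": dict(zip(proteins, composition)),
        "fat": fat,
        "temperature_c": temperature,
        "glycosylate_protein_1": glycosylate,
    }
    for composition, (fat, temperature, glycosylate) in product(compositions, contexts)
]
```

The two fats, two temperatures, and two glycosylation options yield the eight contexts listed above, and pairing them with the three compositions yields 24 unique composition and context combinations for the protein set.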
- In a first variant, the candidate protein set is determined using optimization approaches (e.g., Bayesian optimization, machine learning recommender systems, etc.). For example, the candidate protein set can be selected as a training protein set for characterization (e.g., to expand the training data for use in S300), wherein optimization approaches can be used to reduce (e.g., minimize) the number of additional training protein sets that are needed to train the prediction model and/or to identify a candidate protein set that satisfies the target criteria.
- In a second variant, the candidate protein set can be determined by comparing (e.g., matching) one or more characteristic values using a similarity model to generate a comparison metric (e.g., example shown in FIG. 9). For example, characteristic values for each protein set in the candidate group (e.g., for each unique protein set composition and context pair) can be predicted (e.g., using the prediction model), wherein the predicted characteristic values are compared to the target characteristic values to generate the comparison metric. The candidate protein set (e.g., with associated composition and context) can then be determined based on the comparison metric (e.g., selecting the protein set with the minimum or maximum comparison metric, selecting a protein set with a comparison metric above or below a threshold, selecting the protein set using a protein set determination model, etc.).
- In a first embodiment, the candidate protein set's predicted functional property values can be compared to target functional property values (e.g., for an analogous set of functional properties). For example, the candidate protein set's functional property values are predicted using the prediction model (e.g., based on feature values, based on context, etc.).
- In a second embodiment, the candidate protein set's feature values can be compared to target feature values. For example, a match between positive target feature values and candidate protein set feature values can increase the probability of selection of the candidate protein set, whereas a match between negative target feature values and candidate protein set feature values can decrease the probability of selection.
- In a third embodiment, a candidate protein source and/or candidate protein set can be selected based on an evolutionary tree. In a first example, the candidate protein source is selected by identifying a protein source based on a close evolutionary relationship with a target protein source and/or protein source containing a matching candidate protein set. In a second example, the candidate protein set can be selected by identifying close evolutionary relationships between proteins in the candidate protein set and proteins in a target protein set. In a third example, additional candidate protein set(s) can be selected after a first selection event of a first candidate protein set by identifying additional protein set(s) based on close evolutionary relationships to the first candidate protein set.
- The candidate protein set can optionally be used to manufacture an analog for a target food product (e.g., dairy analog, meat analog, egg analog, any animal product analog, etc.) and/or any other sample (e.g., product). For example, a protein source associated with the candidate protein set can be selected as an ingredient for manufacturing a product. In a specific example, proteins in the candidate protein set can be extracted and/or isolated from one or more sources, wherein a sample is manufactured (e.g., based on a context associated with the candidate protein set) using the proteins to have the determined candidate protein set composition.
- However, the candidate protein set can be otherwise determined.
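- As a hedged sketch of the similarity-based variant of candidate determination, the following Python predicts functional property values for each protein set in a candidate group, computes a Euclidean distance to the target values as the comparison metric, and selects the protein set with the minimum metric. The distance measure, the stand-in predictor, and all names and values are assumptions; any similarity model and prediction model can be used instead.

```python
from math import sqrt


def euclidean_distance(a, b):
    """Comparison metric: Euclidean distance between two value vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_candidate(candidate_group, predict_properties, target_values):
    """Return the name and metric of the protein set predicted to be closest to the target."""
    best_name, best_metric = None, float("inf")
    for name, feature_values in candidate_group.items():
        metric = euclidean_distance(predict_properties(feature_values), target_values)
        if metric < best_metric:
            best_name, best_metric = name, metric
    return best_name, best_metric


def toy_prediction_model(feature_values):
    """Hypothetical stand-in for a trained prediction model."""
    return [2.0 * feature_values[0], feature_values[1] + 1.0]


# Hypothetical candidate group keyed by protein set name, with aggregated feature values.
candidate_group = {
    "set_A": [1.0, 0.0],
    "set_B": [2.0, 1.0],
    "set_C": [3.0, 3.0],
}
target_values = [4.0, 2.5]

chosen, metric = select_candidate(candidate_group, toy_prediction_model, target_values)
```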
- The method can optionally include evaluating the candidate protein set S600, which functions to determine whether the candidate protein set can be used in an analog for a target product, whether the candidate protein set can be used as a replacement for a target protein set, whether the candidate protein set has the desired (e.g., target) characteristic values, to determine feedback for a model (e.g., for training the prediction model, the protein set determination model, and/or any other model), and/or to compare the functional property values of the candidate protein set to one or more other functional property values.
- S600 can be performed after S500, after S180 (e.g., after the candidate protein set is characterized with functional property values), iteratively (e.g., until a stop condition is met, such as substantial similarity to the target), and/or at any other time. A search for a protein set can be continued (e.g., iteratively performing S500 and S600) until a candidate protein set satisfies a set of target criteria (e.g., stopping when the evaluation indicates that the candidate protein set characteristic values fall within target ranges), until a comparison metric is below or above a threshold, for a predetermined number of iterations, and/or until any other stop condition is met. In an example, the target criteria include one or more ranges of characteristic values based on target characteristic values (e.g., predetermined ranges around the target characteristic values).
- S600 can include: determining functional property values for the candidate protein set (e.g., S180 performed for the candidate protein set), and determining a comparison metric based on the resultant functional property values (e.g., using the similarity model). In an example, determining functional property values for the candidate protein set includes manufacturing a sample containing the candidate protein set (e.g., at a protein composition determined in S500 and/or using a context determined in S500), wherein the sample is subjected to assays to measure functional property values.
- The sample (e.g., target food replica) can be manufactured by mixing the protein set with a set of other ingredients (e.g., plant-derived ingredients, such as fats, oils, sugars, etc.) and processing the mixture (e.g., by heating, reacting, inoculating, fermenting, etc.). Alternatively, the sample can be manufactured by gelling the protein, then using the gel as an ingredient. The manufactured samples can be entirely or mostly plant-derived (e.g., more than 70%, 80%, 90%, 99%, etc. plant-derived components by weight or volume).
- In a first embodiment, the comparison metric can be based on a comparison between the candidate protein set's measured functional property values and predicted functional property values (e.g., predicted functional property values for the candidate protein set determined using the prediction model). In a second embodiment, the comparison metric can be based on a comparison between the candidate protein set's measured functional property values and target functional property values (e.g., the functional property values of a target protein set). In variants, a comparison metric above or below a threshold (e.g., a significant difference between the actual and target and/or predicted functional property values) corresponds to negative feedback in model training (e.g., S300 and/or any other model training).
- However, the candidate protein set can be otherwise evaluated.
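- As a minimal sketch of iteratively performing S500 and S600 until a stop condition is met, the following Python proposes a candidate, evaluates it, and stops when the evaluated values fall within target ranges or when a predetermined number of iterations is reached. The callables `propose` and `evaluate`, the queue of candidates, and the measured values are hypothetical stand-ins for candidate determination and assay-based characterization.

```python
def within_ranges(values, target_ranges):
    """True when every targeted characteristic value lies inside its (low, high) range."""
    return all(low <= values[name] <= high for name, (low, high) in target_ranges.items())


def search(propose, evaluate, target_ranges, max_iterations):
    """Alternate candidate determination and evaluation until a stop condition is met."""
    history = []
    for _ in range(max_iterations):
        candidate = propose(history)           # stand-in for S500
        values = evaluate(candidate)           # stand-in for S600 (e.g., assay measurements)
        history.append((candidate, values))
        if within_ranges(values, target_ranges):
            return candidate, history
    return None, history


# Hypothetical candidates and measured functional property values.
queue = ["set_A", "set_B", "set_C"]
measured = {
    "set_A": {"gel_strength": 6.0},
    "set_B": {"gel_strength": 10.2},
    "set_C": {"gel_strength": 9.5},
}
target_ranges = {"gel_strength": (9.0, 11.0)}  # predetermined range around a target of 10.0

accepted, history = search(
    propose=lambda past: queue[len(past)],
    evaluate=lambda name: measured[name],
    target_ranges=target_ranges,
    max_iterations=len(queue),
)
```

In practice the history of evaluated candidates could also be fed back as additional training data for the prediction model.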
- The method can optionally include determining interpretability and/or explainability of the trained prediction model, which can be used to select features, select functional properties, identify errors in the data, identify ways of improving the prediction model, increase computational efficiency, determine influential features and/or values thereof, determine influential functional properties and/or values thereof, and/or otherwise used. Interpretability and/or explainability methods can include: local interpretable model-agnostic explanations (LIME), SHapley Additive exPlanations (SHAP), Anchors, DeepLift, Layer-Wise Relevance Propagation, contrastive explanations method (CEM), counterfactual explanation, Protodash, Permutation importance (PIMP), L2X, partial dependence plots (PDPs), individual conditional expectation (ICE) plots, accumulated local effect (ALE) plots, Local Interpretable Visual Explanations (LIVE), breakDown, ProfWeight, Supersparse Linear Integer Models (SLIM), generalized additive models with pairwise interactions (GA2Ms), Boolean Rule Column Generation, Generalized Linear Rule Models, Teaching Explanations for Decisions (TED), and/or any other suitable method and/or approach.
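- As one simplified sketch of a permutation-style importance measure, the following Python permutes a single feature column and reports the resulting increase in mean squared error. Permutation importance is ordinarily computed with random shuffles averaged over repeats; a deterministic cyclic shift is substituted here as an assumption so that the result is reproducible. The stand-in model and the data are hypothetical.

```python
def mean_squared_error(model, X, y):
    """Average squared difference between model predictions and observed values."""
    return sum((model(row) - yi) ** 2 for row, yi in zip(X, y)) / len(X)


def permutation_importance(model, X, y, feature_index):
    """Increase in error after permuting one feature column (cyclic shift by one row)."""
    baseline = mean_squared_error(model, X, y)
    column = [row[feature_index] for row in X]
    shifted = column[1:] + column[:1]
    X_permuted = [
        row[:feature_index] + [value] + row[feature_index + 1:]
        for row, value in zip(X, shifted)
    ]
    return mean_squared_error(model, X_permuted, y) - baseline


def toy_model(row):
    """Hypothetical trained model that depends only on the first feature."""
    return 3.0 * row[0]


feature_rows = [[1.0, 5.0], [2.0, 7.0], [3.0, 6.0], [4.0, 8.0]]
observed = [3.0, 6.0, 9.0, 12.0]

importance_first = permutation_importance(toy_model, feature_rows, observed, 0)
importance_second = permutation_importance(toy_model, feature_rows, observed, 1)
```

A feature the model does not use has zero importance under this measure, whereas permuting an influential feature increases the error.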
- Alternative embodiments implement the above methods and/or processing modules in non-transitory computer-readable media, storing computer-readable instructions that, when executed by a processing system, cause the processing system to perform the method(s) discussed herein. The instructions can be executed by computer-executable components integrated with the computer-readable medium and/or processing system. The computer-readable medium may include any suitable computer readable media such as RAMs, ROMs, flash memory, EEPROMs, optical devices (CD or DVD), hard drives, floppy drives, non-transitory computer readable media, or any suitable device. The computer-executable component can include a computing system and/or processing system (e.g., including one or more collocated or distributed, remote or local processors) connected to the non-transitory computer-readable medium, such as CPUs, GPUs, TPUs, microprocessors, or ASICs, but the instructions can alternatively or additionally be executed by any suitable dedicated hardware device.
- Embodiments of the system and/or method can include every combination and permutation of the various system components and the various method processes, wherein one or more instances of the method and/or processes described herein can be performed asynchronously (e.g., sequentially), contemporaneously (e.g., concurrently, in parallel, etc.), or in any other suitable order by and/or using one or more instances of the systems, elements, and/or entities described herein. Components and/or processes of the following system and/or method can be used with, in addition to, in lieu of, or otherwise integrated with all or a portion of the systems and/or methods disclosed in the applications mentioned above, each of which is incorporated in its entirety by this reference.
- As a person skilled in the art will recognize from the previous detailed description and from the figures and claims, modifications and changes can be made to the preferred embodiments of the invention without departing from the scope of this invention defined in the following claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/095,351 US20230217957A1 (en) | 2022-01-10 | 2023-01-10 | Compositions and methods for glycated consumables |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263297966P | 2022-01-10 | 2022-01-10 | |
US202263298920P | 2022-01-12 | 2022-01-12 | |
US202263298927P | 2022-01-12 | 2022-01-12 | |
US202263298930P | 2022-01-12 | 2022-01-12 | |
US18/095,351 US20230217957A1 (en) | 2022-01-10 | 2023-01-10 | Compositions and methods for glycated consumables |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230217957A1 (en) | 2023-07-13 |
Family
ID=87069957
Family Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/095,339 Active US11837327B2 (en) | 2022-01-10 | 2023-01-10 | System and method for protein selection |
US18/095,357 Pending US20230223101A1 (en) | 2022-01-10 | 2023-01-10 | Compositions and methods for acylated consumables |
US18/095,351 Pending US20230217957A1 (en) | 2022-01-10 | 2023-01-10 | Compositions and methods for glycated consumables |
US18/095,345 Pending US20230217956A1 (en) | 2022-01-10 | 2023-01-10 | Compositions and methods for phosphorylated consumables |
US18/382,945 Pending US20240071567A1 (en) | 2022-01-10 | 2023-10-23 | System and method for protein selection |
Family Applications Before (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/095,339 Active US11837327B2 (en) | 2022-01-10 | 2023-01-10 | System and method for protein selection |
US18/095,357 Pending US20230223101A1 (en) | 2022-01-10 | 2023-01-10 | Compositions and methods for acylated consumables |
Family Applications After (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/095,345 Pending US20230217956A1 (en) | 2022-01-10 | 2023-01-10 | Compositions and methods for phosphorylated consumables |
US18/382,945 Pending US20240071567A1 (en) | 2022-01-10 | 2023-10-23 | System and method for protein selection |
Country Status (2)
Country | Link |
---|---|
US (5) | US11837327B2 (en) |
WO (1) | WO2023133352A2 (en) |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001186845A (en) * | 1999-12-28 | 2001-07-10 | Masahiko Nakazoe | Edible oil-and-fat composition having function to suppress body fat accumulation |
US7747391B2 (en) | 2002-03-01 | 2010-06-29 | Maxygen, Inc. | Methods, systems, and software for identifying functional biomolecules |
GB0507123D0 (en) * | 2005-04-08 | 2005-05-11 | Isis Innovation | Method |
CA2841470C (en) | 2011-07-12 | 2020-11-10 | Maraxi, Inc. | Methods and compositions for consumables |
KR20150105979A (en) * | 2013-01-11 | 2015-09-18 | 임파서블 푸즈 인크. | Non-dairy cheese replica comprising a coacervate |
SG11201802582XA (en) | 2015-09-30 | 2018-05-30 | Hampton Creek Inc | Systems and methods for identifying entities that have a target property |
EP3759714A1 (en) * | 2018-02-26 | 2021-01-06 | Just Biotherapeutics, Inc. | Determining protein structure and properties based on sequence |
CN112638165A (en) * | 2018-04-24 | 2021-04-09 | 斯佩罗食品公司 | Methods and compositions for oilseed material |
WO2019234769A1 (en) * | 2018-06-04 | 2019-12-12 | Sabina Baglio | Process of production of vegetable cheeses without animal ingredients |
US11164478B2 (en) | 2019-05-17 | 2021-11-02 | NotCo Delaware, LLC | Systems and methods to mimic target food items using artificial intelligence |
WO2021119261A1 (en) | 2019-12-10 | 2021-06-17 | Homodeus, Inc. | Generative machine learning models for predicting functional protein sequences |
US10957424B1 (en) | 2020-08-10 | 2021-03-23 | NotCo Delaware, LLC | Neural network method of generating food formulas |
US10962473B1 (en) | 2020-11-05 | 2021-03-30 | NotCo Delaware, LLC | Protein secondary structure prediction |
US11439159B2 (en) * | 2021-03-22 | 2022-09-13 | Shiru, Inc. | System for identifying and developing individual naturally-occurring proteins as food ingredients by machine learning and database mining combined with empirical testing for a target food function |
US11205101B1 (en) | 2021-05-11 | 2021-12-21 | NotCo Delaware, LLC | Formula and recipe generation with feedback loop |
US11348664B1 (en) | 2021-06-17 | 2022-05-31 | NotCo Delaware, LLC | Machine learning driven chemical compound replacement technology |
US11373107B1 (en) | 2021-11-04 | 2022-06-28 | NotCo Delaware, LLC | Systems and methods to suggest source ingredients using artificial intelligence |
US11404144B1 (en) | 2021-11-04 | 2022-08-02 | NotCo Delaware, LLC | Systems and methods to suggest chemical compounds using artificial intelligence |
2023
- 2023-01-10 US US18/095,339 patent/US11837327B2/en active Active
- 2023-01-10 WO PCT/US2023/010492 patent/WO2023133352A2/en unknown
- 2023-01-10 US US18/095,357 patent/US20230223101A1/en active Pending
- 2023-01-10 US US18/095,351 patent/US20230217957A1/en active Pending
- 2023-01-10 US US18/095,345 patent/US20230217956A1/en active Pending
- 2023-10-23 US US18/382,945 patent/US20240071567A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US11837327B2 (en) | 2023-12-05 |
US20240071567A1 (en) | 2024-02-29 |
US20230217956A1 (en) | 2023-07-13 |
US20230223101A1 (en) | 2023-07-13 |
US20230223109A1 (en) | 2023-07-13 |
WO2023133352A2 (en) | 2023-07-13 |
WO2023133352A3 (en) | 2023-10-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11439159B2 (en) | System for identifying and developing individual naturally-occurring proteins as food ingredients by machine learning and database mining combined with empirical testing for a target food function | |
US11823070B2 (en) | System and method for sample evaluation to mimic target properties | |
Wani et al. | Effect of temperature, alkali concentration, mixing time and meal/solvent ratio on the extraction of watermelon seed proteins—a response surface approach | |
CN107228924B (en) | A kind of adequate proteins processing peanut raw material quality determination and its evaluation method | |
Bose et al. | Protein extraction protocols for optimal proteome measurement and arginine kinase quantitation from cricket Acheta domesticus for food safety assessment | |
Boggess et al. | The need for agriculture phenotyping: “Moving from genotype to phenotype” | |
Liu et al. | Effect of soy milk characteristics and cooking conditions on coagulant requirements for making filled tofu | |
US11837327B2 (en) | System and method for protein selection | |
CN109444313B (en) | Method for analyzing digestibility of protein-polysaccharide complex based on liquid chromatography-mass spectrometry technology | |
Al-Saedi et al. | Study on the correlation between the protein profile of lupin milk and its cheese production compared with cow’s milk | |
US11941538B2 (en) | System and method for sample evaluation to mimic target properties | |
Tarapoulouzi et al. | Discrimination of Cheddar and Kefalotyri cheese samples: Analysis by chemometrics of proton-NMR and FTIR spectra | |
US20240078447A1 (en) | System and method for determining a sample recommendation | |
Castell-Palou et al. | Application of multivariate statistical analysis to chemical, physical and sensory characteristics of Majorcan cheese | |
US20240016179A1 (en) | Selecting food ingredients from vector representations of individual proteins using cluster analysis and precision fermentation | |
CN113812598A (en) | Seafood characteristic food seasoning, preparation method and application | |
Chawla et al. | Omics approaches and applications in dairy and food processing technology | |
Megha et al. | Effect of heat on the functional properties of pea flour and pea protein concentrate | |
Ma et al. | Functional Performance of Plant Proteins. Foods 2022, 11, 594 | |
US20230409975A1 (en) | System and method for sample characterization | |
WO2021039681A1 (en) | Method for fractionating soybean proteins | |
JP2024518021A (en) | A system for identifying and developing food ingredients from natural sources using machine learning and database mining combined with empirical testing for target functionality | |
Nepolean et al. | ISOLATION AND IDENTIFICATION OF CASEIN FROM VARIOUS SOURCES OF MILK | |
Möller | Functional properties of legumes important for food applications | |
Zervos et al. | Chemometrics: the use of multivariate methods for the determination and characterization of off-flavors | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: CLIMAX FOODS INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NAVI, TAYLOR;WESTCOTT, DANIEL;SIGNING DATES FROM 20230202 TO 20230203;REEL/FRAME:062587/0017 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | AS | Assignment | Owner name: CLIMAX FOODS INC., CALIFORNIA Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNOR EXECUTION DATE ON THE ASSIGNMENT PREVIOUSLY RECORDED ON REEL 062587 FRAME 0017. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNORS:NAVI, TAYLOR;WESTCOTT, DANIEL;SIGNING DATES FROM 20230203 TO 20230627;REEL/FRAME:064148/0250 |