WO2023129653A2 - Systems and methods for accelerate speed to market for improved plant-based products - Google Patents
Systems and methods for accelerate speed to market for improved plant-based products
- Publication number
- WO2023129653A2 PCT/US2022/054252
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- plant
- machine learning
- learning model
- progeny
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 119
- 230000001976 improved effect Effects 0.000 title claims abstract description 26
- 238000010801 machine learning Methods 0.000 claims abstract description 118
- 230000001488 breeding effect Effects 0.000 claims abstract description 69
- 238000009395 breeding Methods 0.000 claims abstract description 62
- 238000012549 training Methods 0.000 claims abstract description 38
- 230000009471 action Effects 0.000 claims abstract description 11
- 108090000623 proteins and genes Proteins 0.000 claims description 79
- 238000010362 genome editing Methods 0.000 claims description 23
- 238000004088 simulation Methods 0.000 claims description 16
- 238000000126 in silico method Methods 0.000 claims description 15
- 230000006872 improvement Effects 0.000 claims description 7
- 238000011161 development Methods 0.000 abstract description 10
- 241000196324 Embryophyta Species 0.000 description 216
- 244000068988 Glycine max Species 0.000 description 87
- 235000010469 Glycine max Nutrition 0.000 description 83
- 239000000047 product Substances 0.000 description 38
- 102000004169 proteins and genes Human genes 0.000 description 37
- 235000018102 proteins Nutrition 0.000 description 35
- 230000014509 gene expression Effects 0.000 description 33
- 230000015654 memory Effects 0.000 description 26
- 210000004027 cell Anatomy 0.000 description 23
- 238000012360 testing method Methods 0.000 description 22
- 150000007523 nucleic acids Chemical group 0.000 description 21
- 230000008569 process Effects 0.000 description 21
- 235000013305 food Nutrition 0.000 description 19
- 230000001965 increasing effect Effects 0.000 description 18
- 210000001519 tissue Anatomy 0.000 description 18
- 230000002068 genetic effect Effects 0.000 description 17
- 239000003795 chemical substances by application Substances 0.000 description 16
- 108020004707 nucleic acids Proteins 0.000 description 16
- 102000039446 nucleic acids Human genes 0.000 description 16
- 238000011282 treatment Methods 0.000 description 16
- 239000004615 ingredient Substances 0.000 description 15
- 239000003921 oil Substances 0.000 description 15
- 235000019198 oils Nutrition 0.000 description 15
- 238000004519 manufacturing process Methods 0.000 description 14
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 14
- 230000035772 mutation Effects 0.000 description 13
- 238000012545 processing Methods 0.000 description 13
- 238000013459 approach Methods 0.000 description 11
- 230000006798 recombination Effects 0.000 description 11
- 238000005215 recombination Methods 0.000 description 11
- 230000018109 developmental process Effects 0.000 description 10
- 239000000796 flavoring agent Substances 0.000 description 10
- 239000002689 soil Substances 0.000 description 10
- 241000233866 Fungi Species 0.000 description 9
- 210000000349 chromosome Anatomy 0.000 description 9
- 201000010099 disease Diseases 0.000 description 9
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 9
- 235000019634 flavors Nutrition 0.000 description 9
- IJGRMHOSHXDMSA-UHFFFAOYSA-N Atomic nitrogen Chemical compound N#N IJGRMHOSHXDMSA-UHFFFAOYSA-N 0.000 description 8
- 241000607479 Yersinia pestis Species 0.000 description 7
- 230000008859 change Effects 0.000 description 7
- 238000010586 diagram Methods 0.000 description 7
- 230000000694 effects Effects 0.000 description 7
- 238000005516 engineering process Methods 0.000 description 7
- 230000004720 fertilization Effects 0.000 description 7
- 238000012546 transfer Methods 0.000 description 7
- 239000013598 vector Substances 0.000 description 7
- 108020004414 DNA Proteins 0.000 description 6
- 241000219730 Lathyrus aphaca Species 0.000 description 6
- 240000004713 Pisum sativum Species 0.000 description 6
- 230000003247 decreasing effect Effects 0.000 description 6
- 230000007613 environmental effect Effects 0.000 description 6
- 230000006870 function Effects 0.000 description 6
- 230000012010 growth Effects 0.000 description 6
- 239000003550 marker Substances 0.000 description 6
- 238000003976 plant breeding Methods 0.000 description 6
- 241000894007 species Species 0.000 description 6
- 241000589158 Agrobacterium Species 0.000 description 5
- 244000105624 Arachis hypogaea Species 0.000 description 5
- 235000010582 Pisum sativum Nutrition 0.000 description 5
- 238000004458 analytical method Methods 0.000 description 5
- 230000003466 anti-cipated effect Effects 0.000 description 5
- 238000013528 artificial neural network Methods 0.000 description 5
- 230000008901 benefit Effects 0.000 description 5
- 230000036541 health Effects 0.000 description 5
- 239000002773 nucleotide Substances 0.000 description 5
- 235000015097 nutrients Nutrition 0.000 description 5
- 229920001542 oligosaccharide Polymers 0.000 description 5
- 150000002482 oligosaccharides Chemical class 0.000 description 5
- 238000013031 physical testing Methods 0.000 description 5
- 239000000126 substance Substances 0.000 description 5
- 235000019640 taste Nutrition 0.000 description 5
- 238000012070 whole genome sequencing analysis Methods 0.000 description 5
- 241000238631 Hexapoda Species 0.000 description 4
- 240000004322 Lens culinaris Species 0.000 description 4
- 241001465754 Metazoa Species 0.000 description 4
- 108010064851 Plant Proteins Proteins 0.000 description 4
- MUPFEKGTMRGPLJ-RMMQSMQOSA-N Raffinose Natural products O(C[C@H]1[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O[C@@]2(CO)[C@H](O)[C@@H](O)[C@@H](CO)O2)O1)[C@@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@@H](CO)O1 MUPFEKGTMRGPLJ-RMMQSMQOSA-N 0.000 description 4
- UQZIYBXSHAGNOE-USOSMYMVSA-N Stachyose Natural products O(C[C@H]1[C@@H](O)[C@H](O)[C@H](O)[C@@H](O[C@@]2(CO)[C@H](O)[C@@H](O)[C@@H](CO)O2)O1)[C@@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@H](CO[C@@H]2[C@@H](O)[C@@H](O)[C@@H](O)[C@H](CO)O2)O1 UQZIYBXSHAGNOE-USOSMYMVSA-N 0.000 description 4
- MUPFEKGTMRGPLJ-UHFFFAOYSA-N UNPD196149 Natural products OC1C(O)C(CO)OC1(CO)OC1C(O)C(O)C(O)C(COC2C(C(O)C(O)C(CO)O2)O)O1 MUPFEKGTMRGPLJ-UHFFFAOYSA-N 0.000 description 4
- 230000009418 agronomic effect Effects 0.000 description 4
- 238000004422 calculation algorithm Methods 0.000 description 4
- 239000012141 concentrate Substances 0.000 description 4
- 230000000875 corresponding effect Effects 0.000 description 4
- 235000013312 flour Nutrition 0.000 description 4
- 102000054766 genetic haplotypes Human genes 0.000 description 4
- 230000003993 interaction Effects 0.000 description 4
- 239000007788 liquid Substances 0.000 description 4
- 239000000203 mixture Substances 0.000 description 4
- 229910052757 nitrogen Inorganic materials 0.000 description 4
- 235000021118 plant-derived protein Nutrition 0.000 description 4
- MUPFEKGTMRGPLJ-ZQSKZDJDSA-N raffinose Chemical compound O[C@H]1[C@H](O)[C@@H](CO)O[C@@]1(CO)O[C@@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO[C@@H]2[C@@H]([C@@H](O)[C@@H](O)[C@@H](CO)O2)O)O1 MUPFEKGTMRGPLJ-ZQSKZDJDSA-N 0.000 description 4
- 230000001850 reproductive effect Effects 0.000 description 4
- 238000012163 sequencing technique Methods 0.000 description 4
- UQZIYBXSHAGNOE-XNSRJBNMSA-N stachyose Chemical compound O[C@H]1[C@H](O)[C@@H](CO)O[C@@]1(CO)O[C@@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO[C@@H]2[C@@H]([C@@H](O)[C@@H](O)[C@@H](CO[C@@H]3[C@@H]([C@@H](O)[C@@H](O)[C@@H](CO)O3)O)O2)O)O1 UQZIYBXSHAGNOE-XNSRJBNMSA-N 0.000 description 4
- 238000003860 storage Methods 0.000 description 4
- 108091033409 CRISPR Proteins 0.000 description 3
- 238000010354 CRISPR gene editing Methods 0.000 description 3
- 235000014647 Lens culinaris subsp culinaris Nutrition 0.000 description 3
- 108091028043 Nucleic acid sequence Proteins 0.000 description 3
- 238000003559 RNA-seq method Methods 0.000 description 3
- 240000006394 Sorghum bicolor Species 0.000 description 3
- 108010073771 Soybean Proteins Proteins 0.000 description 3
- 150000001413 amino acids Chemical group 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 3
- 239000007795 chemical reaction product Substances 0.000 description 3
- 239000013065 commercial product Substances 0.000 description 3
- 230000002596 correlated effect Effects 0.000 description 3
- 230000007423 decrease Effects 0.000 description 3
- 235000005911 diet Nutrition 0.000 description 3
- 230000037213 diet Effects 0.000 description 3
- 239000000835 fiber Substances 0.000 description 3
- 238000010353 genetic engineering Methods 0.000 description 3
- 238000003205 genotyping method Methods 0.000 description 3
- 238000003306 harvesting Methods 0.000 description 3
- 239000004009 herbicide Substances 0.000 description 3
- 208000015181 infectious disease Diseases 0.000 description 3
- 239000002917 insecticide Substances 0.000 description 3
- 230000000670 limiting effect Effects 0.000 description 3
- 235000013372 meat Nutrition 0.000 description 3
- 244000005700 microbiome Species 0.000 description 3
- VLKZOEOYAKHREP-UHFFFAOYSA-N n-Hexane Chemical compound CCCCCC VLKZOEOYAKHREP-UHFFFAOYSA-N 0.000 description 3
- 125000003729 nucleotide group Chemical group 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 210000000056 organ Anatomy 0.000 description 3
- 235000020232 peanut Nutrition 0.000 description 3
- 230000010152 pollination Effects 0.000 description 3
- 238000007637 random forest analysis Methods 0.000 description 3
- 230000004044 response Effects 0.000 description 3
- 238000009394 selective breeding Methods 0.000 description 3
- 239000004065 semiconductor Substances 0.000 description 3
- 230000001953 sensory effect Effects 0.000 description 3
- 229940001941 soy protein Drugs 0.000 description 3
- 230000017260 vegetative to reproductive phase transition of meristem Effects 0.000 description 3
- WRIDQFICGBMAFQ-UHFFFAOYSA-N (E)-8-Octadecenoic acid Natural products CCCCCCCCCC=CCCCCCCC(O)=O WRIDQFICGBMAFQ-UHFFFAOYSA-N 0.000 description 2
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 2
- LQJBNNIYVWPHFW-UHFFFAOYSA-N 20:1omega9c fatty acid Natural products CCCCCCCCCCC=CCCCCCCCC(O)=O LQJBNNIYVWPHFW-UHFFFAOYSA-N 0.000 description 2
- QSBYPNXLFMSGKH-UHFFFAOYSA-N 9-Heptadecensaeure Natural products CCCCCCCC=CCCCCCCCC(O)=O QSBYPNXLFMSGKH-UHFFFAOYSA-N 0.000 description 2
- 244000144725 Amygdalus communis Species 0.000 description 2
- 235000011437 Amygdalus communis Nutrition 0.000 description 2
- 244000099147 Ananas comosus Species 0.000 description 2
- 235000007119 Ananas comosus Nutrition 0.000 description 2
- 235000010777 Arachis hypogaea Nutrition 0.000 description 2
- 244000075850 Avena orientalis Species 0.000 description 2
- 235000007319 Avena orientalis Nutrition 0.000 description 2
- 241000283690 Bos taurus Species 0.000 description 2
- 244000197813 Camelina sativa Species 0.000 description 2
- CURLTUGMZLYLDI-UHFFFAOYSA-N Carbon dioxide Chemical compound O=C=O CURLTUGMZLYLDI-UHFFFAOYSA-N 0.000 description 2
- 235000009467 Carica papaya Nutrition 0.000 description 2
- 240000006432 Carica papaya Species 0.000 description 2
- 235000003255 Carthamus tinctorius Nutrition 0.000 description 2
- 244000020518 Carthamus tinctorius Species 0.000 description 2
- 108090000994 Catalytic RNA Proteins 0.000 description 2
- 102000053642 Catalytic RNA Human genes 0.000 description 2
- 235000013912 Ceratonia siliqua Nutrition 0.000 description 2
- 240000008886 Ceratonia siliqua Species 0.000 description 2
- 240000006162 Chenopodium quinoa Species 0.000 description 2
- 235000010523 Cicer arietinum Nutrition 0.000 description 2
- 244000045195 Cicer arietinum Species 0.000 description 2
- 235000007542 Cichorium intybus Nutrition 0.000 description 2
- 244000298479 Cichorium intybus Species 0.000 description 2
- 241000207199 Citrus Species 0.000 description 2
- 235000013162 Cocos nucifera Nutrition 0.000 description 2
- 244000060011 Cocos nucifera Species 0.000 description 2
- 108091026890 Coding region Proteins 0.000 description 2
- 241000723377 Coffea Species 0.000 description 2
- 235000001950 Elaeis guineensis Nutrition 0.000 description 2
- 244000078127 Eleusine coracana Species 0.000 description 2
- 240000008620 Fagopyrum esculentum Species 0.000 description 2
- 235000009419 Fagopyrum esculentum Nutrition 0.000 description 2
- 244000299507 Gossypium hirsutum Species 0.000 description 2
- 244000020551 Helianthus annuus Species 0.000 description 2
- 235000003222 Helianthus annuus Nutrition 0.000 description 2
- 240000005979 Hordeum vulgare Species 0.000 description 2
- 235000007340 Hordeum vulgare Nutrition 0.000 description 2
- 235000003228 Lactuca sativa Nutrition 0.000 description 2
- 240000008415 Lactuca sativa Species 0.000 description 2
- 235000010666 Lens esculenta Nutrition 0.000 description 2
- 235000004431 Linum usitatissimum Nutrition 0.000 description 2
- 240000006240 Linum usitatissimum Species 0.000 description 2
- 241000219745 Lupinus Species 0.000 description 2
- 235000014826 Mangifera indica Nutrition 0.000 description 2
- 240000007228 Mangifera indica Species 0.000 description 2
- 240000003183 Manihot esculenta Species 0.000 description 2
- 241000219823 Medicago Species 0.000 description 2
- 235000002637 Nicotiana tabacum Nutrition 0.000 description 2
- 244000061176 Nicotiana tabacum Species 0.000 description 2
- GQPLMRYTRLFLPF-UHFFFAOYSA-N Nitrous Oxide Chemical compound [O-][N+]#N GQPLMRYTRLFLPF-UHFFFAOYSA-N 0.000 description 2
- 208000008589 Obesity Diseases 0.000 description 2
- 240000007817 Olea europaea Species 0.000 description 2
- 239000005642 Oleic acid Substances 0.000 description 2
- ZQPPMHVWECSIRJ-UHFFFAOYSA-N Oleic acid Natural products CCCCCCCCC=CCCCCCCCC(O)=O ZQPPMHVWECSIRJ-UHFFFAOYSA-N 0.000 description 2
- 241000209094 Oryza Species 0.000 description 2
- 235000007199 Panicum miliaceum Nutrition 0.000 description 2
- 108010084695 Pea Proteins Proteins 0.000 description 2
- 235000007195 Pennisetum typhoides Nutrition 0.000 description 2
- 244000025272 Persea americana Species 0.000 description 2
- 235000008673 Persea americana Nutrition 0.000 description 2
- 108700001094 Plant Genes Proteins 0.000 description 2
- 241000219000 Populus Species 0.000 description 2
- -1 RNA) of the plant Chemical class 0.000 description 2
- 235000007238 Secale cereale Nutrition 0.000 description 2
- 244000082988 Secale cereale Species 0.000 description 2
- PXIPVTKHYLBLMZ-UHFFFAOYSA-N Sodium azide Chemical compound [Na+].[N-]=[N+]=[N-] PXIPVTKHYLBLMZ-UHFFFAOYSA-N 0.000 description 2
- 240000003768 Solanum lycopersicum Species 0.000 description 2
- 235000002595 Solanum tuberosum Nutrition 0.000 description 2
- 244000061456 Solanum tuberosum Species 0.000 description 2
- 235000011684 Sorghum saccharatum Nutrition 0.000 description 2
- 235000019764 Soybean Meal Nutrition 0.000 description 2
- 244000269722 Thea sinensis Species 0.000 description 2
- 244000299461 Theobroma cacao Species 0.000 description 2
- 235000009470 Theobroma cacao Nutrition 0.000 description 2
- 241000219793 Trifolium Species 0.000 description 2
- 244000098338 Triticum aestivum Species 0.000 description 2
- 240000008042 Zea mays Species 0.000 description 2
- 230000000844 anti-bacterial effect Effects 0.000 description 2
- 230000000433 anti-nutritional effect Effects 0.000 description 2
- 230000010165 autogamy Effects 0.000 description 2
- 239000003899 bactericide agent Substances 0.000 description 2
- 244000022203 blackseeded proso millet Species 0.000 description 2
- 235000020971 citrus fruits Nutrition 0.000 description 2
- 230000000295 complement effect Effects 0.000 description 2
- 150000001875 compounds Chemical class 0.000 description 2
- 235000013365 dairy product Nutrition 0.000 description 2
- 238000013480 data collection Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 239000000839 emulsion Substances 0.000 description 2
- 230000006353 environmental stress Effects 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 239000003337 fertilizer Substances 0.000 description 2
- 239000000417 fungicide Substances 0.000 description 2
- 238000001879 gelation Methods 0.000 description 2
- 230000007614 genetic variation Effects 0.000 description 2
- 230000035784 germination Effects 0.000 description 2
- ZSIAUFGUXNUGDI-UHFFFAOYSA-N hexan-1-ol Chemical compound CCCCCCO ZSIAUFGUXNUGDI-UHFFFAOYSA-N 0.000 description 2
- JARKCYVAAOWBJS-UHFFFAOYSA-N hexanal Chemical compound CCCCCC=O JARKCYVAAOWBJS-UHFFFAOYSA-N 0.000 description 2
- 239000002054 inoculum Substances 0.000 description 2
- 238000003780 insertion Methods 0.000 description 2
- 230000037431 insertion Effects 0.000 description 2
- QXJSBBXBKPUZAA-UHFFFAOYSA-N isooleic acid Natural products CCCCCCCC=CCCCCCCCCC(O)=O QXJSBBXBKPUZAA-UHFFFAOYSA-N 0.000 description 2
- 235000012054 meals Nutrition 0.000 description 2
- 235000013622 meat product Nutrition 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 229910044991 metal oxide Inorganic materials 0.000 description 2
- 150000004706 metal oxides Chemical class 0.000 description 2
- 239000002923 metal particle Substances 0.000 description 2
- VNWKTOKETHGBQD-UHFFFAOYSA-N methane Chemical compound C VNWKTOKETHGBQD-UHFFFAOYSA-N 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 210000004940 nucleus Anatomy 0.000 description 2
- 235000020824 obesity Nutrition 0.000 description 2
- ZQPPMHVWECSIRJ-KTKRTIGZSA-N oleic acid Chemical compound CCCCCCCC\C=C/CCCCCCCC(O)=O ZQPPMHVWECSIRJ-KTKRTIGZSA-N 0.000 description 2
- 210000003463 organelle Anatomy 0.000 description 2
- 230000036961 partial effect Effects 0.000 description 2
- 244000052769 pathogen Species 0.000 description 2
- 235000019702 pea protein Nutrition 0.000 description 2
- 239000000575 pesticide Substances 0.000 description 2
- 244000000003 plant pathogen Species 0.000 description 2
- 239000013612 plasmid Substances 0.000 description 2
- 230000002829 reductive effect Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 108091092562 ribozyme Proteins 0.000 description 2
- 210000003296 saliva Anatomy 0.000 description 2
- 230000010153 self-pollination Effects 0.000 description 2
- 239000002904 solvent Substances 0.000 description 2
- 239000004455 soybean meal Substances 0.000 description 2
- 238000006467 substitution reaction Methods 0.000 description 2
- 238000012706 support-vector machine Methods 0.000 description 2
- 239000004094 surface-active agent Substances 0.000 description 2
- 230000009261 transgenic effect Effects 0.000 description 2
- 235000015112 vegetable and seed oil Nutrition 0.000 description 2
- 239000005631 2,4-Dichlorophenoxyacetic acid Substances 0.000 description 1
- JLIDBLDQVAYHNE-YKALOCIXSA-N Abscisic acid Natural products OC(=O)/C=C(/C)\C=C\[C@@]1(O)C(C)=CC(=O)CC1(C)C JLIDBLDQVAYHNE-YKALOCIXSA-N 0.000 description 1
- 241000251468 Actinopterygii Species 0.000 description 1
- 108700028369 Alleles Proteins 0.000 description 1
- 235000001271 Anacardium Nutrition 0.000 description 1
- 241000693997 Anacardium Species 0.000 description 1
- 244000226021 Anacardium occidentale Species 0.000 description 1
- 241000203069 Archaea Species 0.000 description 1
- 229930192334 Auxin Natural products 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 235000021533 Beta vulgaris Nutrition 0.000 description 1
- 241000335053 Beta vulgaris Species 0.000 description 1
- 241000219310 Beta vulgaris subsp. vulgaris Species 0.000 description 1
- 235000018185 Betula X alpestris Nutrition 0.000 description 1
- 235000018212 Betula X uliginosa Nutrition 0.000 description 1
- 239000002028 Biomass Substances 0.000 description 1
- 241000219198 Brassica Species 0.000 description 1
- 235000011331 Brassica Nutrition 0.000 description 1
- 244000178993 Brassica juncea Species 0.000 description 1
- 240000002791 Brassica napus Species 0.000 description 1
- 240000008100 Brassica rapa Species 0.000 description 1
- 241000220243 Brassica sp. Species 0.000 description 1
- 235000004936 Bromus mango Nutrition 0.000 description 1
- 235000016401 Camelina Nutrition 0.000 description 1
- 235000014595 Camelina sativa Nutrition 0.000 description 1
- 235000015493 Chenopodium quinoa Nutrition 0.000 description 1
- 108010077544 Chromatin Proteins 0.000 description 1
- 235000010521 Cicer Nutrition 0.000 description 1
- 241000220455 Cicer Species 0.000 description 1
- 108700010070 Codon Usage Proteins 0.000 description 1
- 241000218631 Coniferophyta Species 0.000 description 1
- 229920000742 Cotton Polymers 0.000 description 1
- 241000195493 Cryptophyta Species 0.000 description 1
- ZAKOWWREFLAJOT-CEFNRUSXSA-N D-alpha-tocopherylacetate Chemical compound CC(=O)OC1=C(C)C(C)=C2O[C@@](CCC[C@H](C)CCC[C@H](C)CCCC(C)C)(C)CCC2=C1C ZAKOWWREFLAJOT-CEFNRUSXSA-N 0.000 description 1
- 238000001712 DNA sequencing Methods 0.000 description 1
- 239000005504 Dicamba Substances 0.000 description 1
- 240000003133 Elaeis guineensis Species 0.000 description 1
- 244000127993 Elaeis melanococca Species 0.000 description 1
- 235000007349 Eleusine coracana Nutrition 0.000 description 1
- 235000013499 Eleusine coracana subsp coracana Nutrition 0.000 description 1
- 241000093679 Ensifer sp. Species 0.000 description 1
- 244000166124 Eucalyptus globulus Species 0.000 description 1
- 244000004281 Eucalyptus maculata Species 0.000 description 1
- 241000220485 Fabaceae Species 0.000 description 1
- 241000218218 Ficus <angiosperm> Species 0.000 description 1
- 229930191978 Gibberellin Natural products 0.000 description 1
- 239000005562 Glyphosate Substances 0.000 description 1
- 240000000047 Gossypium barbadense Species 0.000 description 1
- 235000009429 Gossypium barbadense Nutrition 0.000 description 1
- 235000009432 Gossypium hirsutum Nutrition 0.000 description 1
- 235000017367 Guainella Nutrition 0.000 description 1
- 241000257303 Hymenoptera Species 0.000 description 1
- 206010021929 Infertility male Diseases 0.000 description 1
- 235000021506 Ipomoea Nutrition 0.000 description 1
- 241000207783 Ipomoea Species 0.000 description 1
- 244000017020 Ipomoea batatas Species 0.000 description 1
- 235000002678 Ipomoea batatas Nutrition 0.000 description 1
- 206010022971 Iron Deficiencies Diseases 0.000 description 1
- 241000209510 Liliopsida Species 0.000 description 1
- 235000007688 Lycopersicon esculentum Nutrition 0.000 description 1
- 241000208467 Macadamia Species 0.000 description 1
- 235000018330 Macadamia integrifolia Nutrition 0.000 description 1
- 240000007575 Macadamia integrifolia Species 0.000 description 1
- 208000007466 Male Infertility Diseases 0.000 description 1
- 235000004456 Manihot esculenta Nutrition 0.000 description 1
- 235000016735 Manihot esculenta subsp esculenta Nutrition 0.000 description 1
- 235000017587 Medicago sativa ssp. sativa Nutrition 0.000 description 1
- AFVFQIVMOAPDHO-UHFFFAOYSA-N Methanesulfonic acid Chemical compound CS(O)(=O)=O AFVFQIVMOAPDHO-UHFFFAOYSA-N 0.000 description 1
- 108700011259 MicroRNAs Proteins 0.000 description 1
- 241000234295 Musa Species 0.000 description 1
- 240000005561 Musa balbisiana Species 0.000 description 1
- 235000018290 Musa x paradisiaca Nutrition 0.000 description 1
- 206010028980 Neoplasm Diseases 0.000 description 1
- 108020004485 Nonsense Codon Proteins 0.000 description 1
- 108091093105 Nuclear DNA Proteins 0.000 description 1
- 241000143294 Ochrobactrum sp. Species 0.000 description 1
- 235000002725 Olea europaea Nutrition 0.000 description 1
- 108700026244 Open Reading Frames Proteins 0.000 description 1
- 235000007164 Oryza sativa Nutrition 0.000 description 1
- 206010033307 Overweight Diseases 0.000 description 1
- 244000038248 Pennisetum spicatum Species 0.000 description 1
- 244000115721 Pennisetum typhoides Species 0.000 description 1
- 235000010627 Phaseolus vulgaris Nutrition 0.000 description 1
- 244000046052 Phaseolus vulgaris Species 0.000 description 1
- 102000029797 Prion Human genes 0.000 description 1
- 108091000054 Prion Proteins 0.000 description 1
- 238000012356 Product development Methods 0.000 description 1
- 241001494501 Prosopis <angiosperm> Species 0.000 description 1
- 235000001560 Prosopis chilensis Nutrition 0.000 description 1
- 235000014460 Prosopis juliflora var juliflora Nutrition 0.000 description 1
- 241000508269 Psidium Species 0.000 description 1
- 240000001679 Psidium guajava Species 0.000 description 1
- 235000013929 Psidium pyriferum Nutrition 0.000 description 1
- 238000012228 RNA interference-mediated gene silencing Methods 0.000 description 1
- 102000018120 Recombinases Human genes 0.000 description 1
- 108010091086 Recombinases Proteins 0.000 description 1
- 241000589187 Rhizobium sp. Species 0.000 description 1
- 241000700141 Rotifera Species 0.000 description 1
- 241000209051 Saccharum Species 0.000 description 1
- 240000000111 Saccharum officinarum Species 0.000 description 1
- 235000007201 Saccharum officinarum Nutrition 0.000 description 1
- 108010016634 Seed Storage Proteins Proteins 0.000 description 1
- 235000005775 Setaria Nutrition 0.000 description 1
- 241000232088 Setaria <nematode> Species 0.000 description 1
- 235000008515 Setaria glauca Nutrition 0.000 description 1
- 240000005498 Setaria italica Species 0.000 description 1
- XUIMIQQOPSSXEZ-UHFFFAOYSA-N Silicon Chemical compound [Si] XUIMIQQOPSSXEZ-UHFFFAOYSA-N 0.000 description 1
- 235000002560 Solanum lycopersicum Nutrition 0.000 description 1
- 235000007230 Sorghum bicolor Nutrition 0.000 description 1
- 244000062793 Sorghum vulgare Species 0.000 description 1
- 235000009184 Spondias indica Nutrition 0.000 description 1
- 235000021536 Sugar beet Nutrition 0.000 description 1
- 241000282898 Sus scrofa Species 0.000 description 1
- 238000010459 TALEN Methods 0.000 description 1
- 240000004584 Tamarindus indica Species 0.000 description 1
- 235000004298 Tamarindus indica Nutrition 0.000 description 1
- 235000006468 Thea sinensis Nutrition 0.000 description 1
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 1
- 108091023040 Transcription factor Proteins 0.000 description 1
- 102000040945 Transcription factor Human genes 0.000 description 1
- 108020004566 Transfer RNA Proteins 0.000 description 1
- 235000021307 Triticum Nutrition 0.000 description 1
- 241000219977 Vigna Species 0.000 description 1
- 208000036142 Viral infection Diseases 0.000 description 1
- 241000726445 Viroids Species 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 235000007244 Zea mays Nutrition 0.000 description 1
- 235000005824 Zea mays ssp. parviglumis Nutrition 0.000 description 1
- 235000002017 Zea mays subsp mays Nutrition 0.000 description 1
- 108010017070 Zinc Finger Nucleases Proteins 0.000 description 1
- 150000003529 abscisic acid derivatives Chemical class 0.000 description 1
- 239000000654 additive Substances 0.000 description 1
- 230000002776 aggregation Effects 0.000 description 1
- 238000004220 aggregation Methods 0.000 description 1
- 235000020224 almond Nutrition 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 150000005018 aminopurines Chemical class 0.000 description 1
- 239000003242 anti bacterial agent Substances 0.000 description 1
- 238000009360 aquaculture Methods 0.000 description 1
- 244000144974 aquaculture Species 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- MXWJVTOOROXGIU-UHFFFAOYSA-N atrazine Chemical compound CCNC1=NC(Cl)=NC(NC(C)C)=N1 MXWJVTOOROXGIU-UHFFFAOYSA-N 0.000 description 1
- 239000002363 auxin Substances 0.000 description 1
- 230000001580 bacterial effect Effects 0.000 description 1
- 235000013527 bean curd Nutrition 0.000 description 1
- 230000003542 behavioural effect Effects 0.000 description 1
- 235000013361 beverage Nutrition 0.000 description 1
- 230000003115 biocidal effect Effects 0.000 description 1
- 239000003124 biologic agent Substances 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 235000019658 bitter taste Nutrition 0.000 description 1
- 239000002775 capsule Substances 0.000 description 1
- 230000023852 carbohydrate metabolic process Effects 0.000 description 1
- 235000021256 carbohydrate metabolism Nutrition 0.000 description 1
- 150000001720 carbohydrates Chemical class 0.000 description 1
- 235000014633 carbohydrates Nutrition 0.000 description 1
- 239000001569 carbon dioxide Substances 0.000 description 1
- 229910002092 carbon dioxide Inorganic materials 0.000 description 1
- 235000020226 cashew nut Nutrition 0.000 description 1
- 239000002962 chemical mutagen Substances 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 210000003483 chromatin Anatomy 0.000 description 1
- 230000004186 co-expression Effects 0.000 description 1
- 239000011248 coating agent Substances 0.000 description 1
- 238000000576 coating method Methods 0.000 description 1
- 238000001246 colloidal dispersion Methods 0.000 description 1
- 229920000547 conjugated polymer Polymers 0.000 description 1
- 238000010411 cooking Methods 0.000 description 1
- 235000005822 corn Nutrition 0.000 description 1
- 244000038559 crop plants Species 0.000 description 1
- 230000010154 cross-pollination Effects 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 239000004062 cytokinin Substances 0.000 description 1
- UQHKFADEQIVWID-UHFFFAOYSA-N cytokinin Natural products C1=NC=2C(NCC=C(CO)C)=NC=NC=2N1C1CC(O)C(CO)O1 UQHKFADEQIVWID-UHFFFAOYSA-N 0.000 description 1
- 238000003066 decision tree Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000002939 deleterious effect Effects 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011982 device technology Methods 0.000 description 1
- 206010012601 diabetes mellitus Diseases 0.000 description 1
- IWEDIXLBFLAXBO-UHFFFAOYSA-N dicamba Chemical compound COC1=C(Cl)C=CC(Cl)=C1C(O)=O IWEDIXLBFLAXBO-UHFFFAOYSA-N 0.000 description 1
- 235000019621 digestibility Nutrition 0.000 description 1
- 235000006694 eating habits Nutrition 0.000 description 1
- 230000001516 effect on protein Effects 0.000 description 1
- 239000003623 enhancer Substances 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 241001233957 eudicotyledons Species 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000004129 fatty acid metabolism Effects 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 230000005669 field effect Effects 0.000 description 1
- 235000004426 flaxseed Nutrition 0.000 description 1
- 238000005187 foaming Methods 0.000 description 1
- 235000012041 food component Nutrition 0.000 description 1
- 239000005417 food ingredient Substances 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 231100000221 frame shift mutation induction Toxicity 0.000 description 1
- 230000037433 frameshift Effects 0.000 description 1
- 235000012055 fruits and vegetables Nutrition 0.000 description 1
- 238000010230 functional analysis Methods 0.000 description 1
- 238000012239 gene modification Methods 0.000 description 1
- 230000009368 gene silencing by RNA Effects 0.000 description 1
- 230000005017 genetic modification Effects 0.000 description 1
- 235000013617 genetically modified food Nutrition 0.000 description 1
- 210000004602 germ cell Anatomy 0.000 description 1
- 239000003448 gibberellin Substances 0.000 description 1
- IXORZMNAPKEEDV-OBDJNFEBSA-N gibberellin A3 Chemical class C([C@@]1(O)C(=C)C[C@@]2(C1)[C@H]1C(O)=O)C[C@H]2[C@]2(C=C[C@@H]3O)[C@H]1[C@]3(C)C(=O)O2 IXORZMNAPKEEDV-OBDJNFEBSA-N 0.000 description 1
- XDDAORKBJWWYJS-UHFFFAOYSA-N glyphosate Chemical compound OC(=O)CNCP(O)(O)=O XDDAORKBJWWYJS-UHFFFAOYSA-N 0.000 description 1
- 229940097068 glyphosate Drugs 0.000 description 1
- 239000005431 greenhouse gas Substances 0.000 description 1
- 239000003630 growth substance Substances 0.000 description 1
- 235000015220 hamburgers Nutrition 0.000 description 1
- 239000001307 helium Substances 0.000 description 1
- 229910052734 helium Inorganic materials 0.000 description 1
- SWQJXJOGLNCZEY-UHFFFAOYSA-N helium atom Chemical compound [He] SWQJXJOGLNCZEY-UHFFFAOYSA-N 0.000 description 1
- 230000002363 herbicidal effect Effects 0.000 description 1
- 244000038280 herbivores Species 0.000 description 1
- 239000005556 hormone Substances 0.000 description 1
- 229940088597 hormone Drugs 0.000 description 1
- 208000006278 hypochromic anemia Diseases 0.000 description 1
- 230000003116 impacting effect Effects 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 239000011261 inert gas Substances 0.000 description 1
- 238000011081 inoculation Methods 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 230000005865 ionizing radiation Effects 0.000 description 1
- 238000003973 irrigation Methods 0.000 description 1
- 230000002262 irrigation Effects 0.000 description 1
- CJWQYWQDLBZGPD-UHFFFAOYSA-N isoflavone Natural products C1=C(OC)C(OC)=CC(OC)=C1C1=COC2=C(C=CC(C)(C)O3)C3=C(OC)C=C2C1=O CJWQYWQDLBZGPD-UHFFFAOYSA-N 0.000 description 1
- 150000002515 isoflavone derivatives Chemical class 0.000 description 1
- 235000008696 isoflavones Nutrition 0.000 description 1
- 235000021374 legumes Nutrition 0.000 description 1
- 230000003902 lesion Effects 0.000 description 1
- 238000012417 linear regression Methods 0.000 description 1
- 239000012669 liquid formulation Substances 0.000 description 1
- 244000144972 livestock Species 0.000 description 1
- 238000007477 logistic regression Methods 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 238000012423 maintenance Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000035800 maturation Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 108020004999 messenger RNA Proteins 0.000 description 1
- 230000004060 metabolic process Effects 0.000 description 1
- 239000002207 metabolite Substances 0.000 description 1
- 229910052751 metal Inorganic materials 0.000 description 1
- 239000002184 metal Substances 0.000 description 1
- 239000011785 micronutrient Substances 0.000 description 1
- 235000013369 micronutrients Nutrition 0.000 description 1
- 235000019713 millet Nutrition 0.000 description 1
- 210000003470 mitochondria Anatomy 0.000 description 1
- 239000003147 molecular marker Substances 0.000 description 1
- 238000002703 mutagenesis Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- 239000002105 nanoparticle Substances 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 239000001272 nitrous oxide Substances 0.000 description 1
- 238000005312 nonlinear dynamic Methods 0.000 description 1
- 230000037434 nonsense mutation Effects 0.000 description 1
- 239000003960 organic solvent Substances 0.000 description 1
- 235000002252 panizo Nutrition 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 230000001717 pathogenic effect Effects 0.000 description 1
- 230000029553 photosynthesis Effects 0.000 description 1
- 238000010672 photosynthesis Methods 0.000 description 1
- 230000008121 plant development Effects 0.000 description 1
- 230000008635 plant growth Effects 0.000 description 1
- 235000021135 plant-based food Nutrition 0.000 description 1
- 210000002706 plastid Anatomy 0.000 description 1
- 229920000642 polymer Polymers 0.000 description 1
- 102000054765 polymorphisms of proteins Human genes 0.000 description 1
- 229920001184 polypeptide Polymers 0.000 description 1
- 235000020777 polyunsaturated fatty acids Nutrition 0.000 description 1
- 244000144977 poultry Species 0.000 description 1
- 230000002265 prevention Effects 0.000 description 1
- 238000000513 principal component analysis Methods 0.000 description 1
- 102000004196 processed proteins & peptides Human genes 0.000 description 1
- 108090000765 processed proteins & peptides Proteins 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 230000001902 propagating effect Effects 0.000 description 1
- 210000001938 protoplast Anatomy 0.000 description 1
- 238000004549 pulsed laser deposition Methods 0.000 description 1
- 235000021251 pulses Nutrition 0.000 description 1
- 150000003254 radicals Chemical class 0.000 description 1
- 230000001105 regulatory effect Effects 0.000 description 1
- 230000033458 reproduction Effects 0.000 description 1
- 235000009566 rice Nutrition 0.000 description 1
- 230000021749 root development Effects 0.000 description 1
- 235000019600 saltiness Nutrition 0.000 description 1
- 150000003839 salts Chemical class 0.000 description 1
- 229930182490 saponin Natural products 0.000 description 1
- 150000007949 saponins Chemical class 0.000 description 1
- 235000017709 saponins Nutrition 0.000 description 1
- 235000021003 saturated fats Nutrition 0.000 description 1
- 230000028327 secretion Effects 0.000 description 1
- 238000005204 segregation Methods 0.000 description 1
- 230000014639 sexual reproduction Effects 0.000 description 1
- 230000011664 signaling Effects 0.000 description 1
- 230000037432 silent mutation Effects 0.000 description 1
- 229910052710 silicon Inorganic materials 0.000 description 1
- 239000010703 silicon Substances 0.000 description 1
- 239000002002 slurry Substances 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 239000000243 solution Substances 0.000 description 1
- 235000013322 soy milk Nutrition 0.000 description 1
- 235000012424 soybean oil Nutrition 0.000 description 1
- 239000003549 soybean oil Substances 0.000 description 1
- 235000019710 soybean protein Nutrition 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 230000002269 spontaneous effect Effects 0.000 description 1
- 238000002948 stochastic simulation Methods 0.000 description 1
- 230000035882 stress Effects 0.000 description 1
- 239000000725 suspension Substances 0.000 description 1
- 230000008685 targeting Effects 0.000 description 1
- 238000012090 tissue culture technique Methods 0.000 description 1
- 239000003053 toxin Substances 0.000 description 1
- 231100000765 toxin Toxicity 0.000 description 1
- 108700012359 toxins Proteins 0.000 description 1
- 238000011426 transformation method Methods 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 230000005945 translocation Effects 0.000 description 1
- 230000001960 triggered effect Effects 0.000 description 1
- 235000019583 umami taste Nutrition 0.000 description 1
- 235000013311 vegetables Nutrition 0.000 description 1
- 230000008511 vegetative development Effects 0.000 description 1
- 230000009105 vegetative growth Effects 0.000 description 1
- 230000009385 viral infection Effects 0.000 description 1
- 239000002699 waste material Substances 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
Definitions
- Genomics has been used for decades to develop crops for our food system, but most agricultural companies have focused almost exclusively on increasing the yield of a few crops, resulting in commodity ingredients and a food system based on the quantity of calories available. While focus on quantity is important, that focus resulted in lower nutrient density and changed flavors. Minimal diversity in ingredient options also led food manufacturers to add costly water- and energy-intensive processing steps, and additives like sugar and salt to make up for attributes that were muted in crops over time.
- The largest commercial source of plant protein today is the soybean plant.
- Other plant-based protein crops include chickpeas, edamame, lentils, peanuts, and peas.
- Soybeans Generally
- Soybeans are believed to have originated on the Asian continent (Glycine soja) and to have first been domesticated in China (Glycine max). Abstract, Hymowitz and Newell, Taxonomy of the genus Glycine, domestication and uses of soybeans. Econ Bot 35, 272-288 (1981). Soybeans are a common field crop, with the largest producing countries including the United States, Brazil, Argentina, China, India, Paraguay, and Canada. In the United States in 2020, soybeans were primarily produced in the Western Corn Belt (48.7%), Eastern Corn Belt (32.7%), and the Midsouth (11.9%), with Illinois and Iowa being the largest producing states.
- Soybean plants produce seed-bearing pods, each generally having 2-4 seeds. The seeds are harvested and either kept for future planting (i.e., to produce additional soybean plants) or processed into dozens of products (e.g., bean curd, feed for livestock, flour, meal, and oil (cooking and industrial)). Soy flours include flour concentrates and isolates, which are the primary protein products of soy.
- Soybean seeds are usually planted in rows in soil. According to the 2012 Illinois Soybean Production Guide, soybeans require 55-60°F soil temperature, an air temperature of at least 68°F, about 25 inches of water, sufficient nitrogen and five months from germination to harvest.
- the radicle (or root) is the first structure to emerge from a germinating soybean seed.
- the hypocotyl is the seedling structure that emerges from the soil surface. As the hypocotyl emerges it forms a crook as it pulls the cotyledons (i.e., the plant’s first leaves) from the soil. Then, the cotyledons can unfold and begin the process of photosynthesis. Once the cotyledons have emerged from the soil surface the plant is said to be at the VE stage of vegetative development.
- the VC (cotyledon) development stage occurs once two unifoliate (or single blade) leaves emerge from opposite sides of the main stem and no longer touch the cotyledons.
- the V1 (vegetative) development stage occurs once the unifoliate leaves are fully expanded, establishing the first node.
- MG maturity group
- Soybeans are short-day plants (i.e., the soybean plant is triggered to flower as the day length decreases below some critical value, which differs among MGs). See, e.g., Purcell, Salmeron and Ashlock, “Chapter 2: Soybean Growth and Development” Arkansas Soybean Production Handbook (University of Arkansas Division of Agricultural Research & Extension, 2014 Update). Soybeans planted in Arkansas tend to be MG 3 through MG 6. Id.
- MG 5 to MG 8 soybeans tend to be determinate (i.e., they cease vegetative growth when the main stem terminates in a cluster of mature pods) and MG 0 to MG 4.9 tend to be indeterminate (i.e. they develop leaves and flowers simultaneously after flowering begins).
- Each soybean plant can produce many flowers. The flowers are small and hidden underneath the leaves of the plant. The number of flowers produced depends upon the number of nodes on the main stem and branches with flower-bearing nodes. Not all flowers produce pods. For those flowers that do produce pods, whether the resulting pod produces a full complement of seeds depends on ample nitrogen, sugar, other nutrients, and favorable environmental conditions.
- Once a soybean plant begins to flower, it is referred to as being in its reproductive (R) growth stage.
- Soybeans are normally a self-pollinating crop; in fact, they have a perfect flower structure for self-pollination. Still, bees have been known to be attracted to soybean flowers and to cross-pollinate plants. Where cross-pollination is desired, breeders need to intervene to prevent self-pollination. Because the pistil of a soybean plant can become mature and the anthers can begin to shed pollen before the soybean flowers even bloom, breeders seeking to cross-pollinate need to be proactive.
- Soybean plants have eight reproductive stages: R1 (beginning flowering/bloom (i.e., at least one flower)), R2 (full flowering/bloom (i.e., an open flower at one of the two uppermost nodes)), R3 (beginning pod (i.e., a pod measuring 3/16 inch at one of the four uppermost nodes)), R4 (full pod (i.e., a pod measuring 3/4 inch at one of the four uppermost nodes)), R5 (beginning seed (i.e., a seed measuring 1/8 inch long in the pod at one of the four uppermost nodes)), R6 (full seed (i.e., a pod containing a green seed that fills the pod at one of the four uppermost nodes)), R7 (beginning maturity (i.e., one normal pod has reached mature pod color)), and R8 (full maturity (i.e., at least 95% of pods have reached full mature color)).
- As the days get shorter and the temperatures get cooler, the leaves on soybean plants begin to turn yellow; they subsequently turn brown and fall off, exposing the matured pods of soybeans.
- the soybeans are now ready to be harvested using combines.
- the header on the front of the combine cuts and collects the soybean plants.
- the combine separates the soybeans from their pods and stems, and collects them into some container.
- After harvesting, the soybeans are processed.
- the soybeans are cleaned, heat dried, crushed and then flaked. Thereafter, the flake is further processed.
- the primary method for further processing is referred to as the extraction or solvent process, as it uses organic solvents (e.g. hexane) to recover the soybean oil and protein from the flake. Aside from its substantial use of solvents, this process consumes significant amounts of energy.
- Soybeans Seed Varieties, Breeding, and Genetic Modification
- phenotype is not necessarily correlated because that phenotype may result from homozygous dominant, heterozygous, or homozygous recessive alleles. Where the phenotype is dominant, it will be exhibited by either of the first two zygosities. Whereas a recessive phenotype can only be exhibited by the third, homozygous recessive example.
- Homozygous genotypes breed true from generation to generation, while heterozygous genotypes do not. Thus, after finding a desirable phenotype, plant breeders work to develop homozygosity in the population, and then release the resulting pure line as a new variety. For example, hybrid varieties are the result of crossing two homozygous, but unrelated, pure lines of a species. The resulting F1 plants of the cross are all heterozygous. However, by F2, 50% of the plants are homozygous (either dominant or recessive), and by F3 heterozygosity is reduced to 25%. Once a desired trait is found in homozygous plants, commercial quantities are produced by replanting the resulting seeds over several generations.
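The F2 and F3 figures above follow from the expected halving of heterozygosity at a single locus with each generation of self-pollination. A minimal sketch of that arithmetic, assuming one locus, strict selfing, and no selection (the function name is illustrative and not from the disclosure):

```python
from fractions import Fraction


def heterozygous_fraction(generation: int) -> Fraction:
    """Expected heterozygous fraction at one locus in generation F_n.

    Assumes the F1 is fully heterozygous and every later generation is
    produced by self-pollination, which halves heterozygosity each time.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1 (F1 is the hybrid generation)")
    return Fraction(1, 2) ** (generation - 1)


# Print the expected split for the first few generations.
for n in range(1, 6):
    het = heterozygous_fraction(n)
    print(f"F{n}: heterozygous {float(het):.2%}, homozygous {float(1 - het):.2%}")
```

Under these assumptions the heterozygous fraction keeps halving, which is why several generations of replanting are needed before a line breeds true.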
- Plants may also be genetically modified. Genetic engineering allows for the introduction of a new trait or even just better control over an existing trait. In 2002, for instance, the majority of the soybean plants grown in the United States were genetically modified for herbicide-tolerance. Sleper and Shannon, “Role of Public and Private Soybean Breeding Programs in the Development of Soybean Varieties Using Biotechnology,” AgBioForum, 6(1&2): 27-32 (2003). There are two predominant approaches to genetic engineering in plants: the gene gun and the Agrobacterium method.
- the desired gene is coated onto small metal particles and shot within a vacuum chamber using a short, high-velocity pulse of a high-pressure, inert gas (e.g., helium) toward plants covered by a fine mesh baffle that catches the small metal particles while allowing the gene to continue into the target cell.
- the tumor inducing region is removed from transfer DNA and replaced with the desired gene and a marker, which are inserted into the tissue of an organism usually by direct inoculation with a culture of transformed Agrobacterium.
- An antibiotic medium is subsequently introduced to kill the Agrobacterium and remove the marker. Only tissues expressing the marker will survive and possess the gene of interest. These tissues are then grown using tissue culture techniques until a plant is grown and produces seeds. Neither of these methods is particularly easy.
- DNA sequencing is generally desirable to confirm that the host cell now contains the new gene and where the gene inserted.
- soybean yield is increased yield and increased tolerance to various potential environmental stressors (e.g., insects, drought).
- While soybean yields have significantly increased in the United States over the last thirty years, the amount of protein contained in those soybeans has substantially declined over the same time period.
- Machine learning and other forms of artificial intelligence are already being used to improve certain outcomes in agriculture.
- One key to successful machine learning is identifying the right types of data to gather and then using that data to train the right type of model.
- Another key may include identifying the wrong, unnecessary, or cumbersome data, the inclusion of which is either unhelpful in developing the model or slows down or otherwise makes the training process unnecessarily expensive without sufficient improvement of the model.
- the present disclosure is directed to systems and methods for training a machine learning model and subsequently applying that machine learning model to accelerate speed to market for improved plant-based products.
- these potential improvements may comprise increased protein content, decreased oligosaccharides (e.g., raffinose and stachyose), maintaining and/or even improving crop yield, improved consumer experience (e.g., taste, texture, smell), and combinations of the foregoing.
- the present disclosure teaches a method for training a machine-learning model and subsequently applying that machine learning model to accelerate speed to market for an improved plant-based product.
- the method comprising: (a) collecting into a database, with a processor, seed data including at least labelled parentage information that includes genetics information; (b) training, with the processor, a first machine-learning model based on the data collected for each data type for each of the plurality of seed varieties within the germplasm; (c) establishing, via the processor, a functional specification for the improved plant-based product; (d) extracting, with the processor, one or more plant traits needed to at least meet the functional specification; (e) inputting, via the processor, the one or more plant traits needed to at least meet the functional specification into the trained first machine learning model to generate a first predictive breeding crosses list ranked based on aggregate probability that a progeny of the cross will substantially conform to one or more of the one or more plant traits needed to meet the functional specification; (f) collecting data, by the processor, from the progeny of
- the method may calculate potential crosses for advancement in the breeding pipeline to obtain progeny having desired characteristics or combinations of characteristics and/or traits such as yield, protein, oil, height, and maturity based on simulated and/or historical data. Moreover, the method may also estimate population parameters (e.g. population usefulness, transgressive segregation ratio, parent mean, protein, yield, oil, maturity, height) based on the simulated population phenotypes and may then test different selection algorithms.
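The population parameters named above can be illustrated with a small simulation. The sketch below is not the disclosed method: it assumes progeny phenotypes are drawn from a normal distribution centered on the parent mean, takes "population usefulness" to be the mean of the top selected fraction, and takes the "transgressive segregation ratio" to be the share of progeny exceeding the better parent. All names and numeric values are hypothetical.

```python
import random
import statistics


def simulate_progeny(parent_a, parent_b, n, sd, rng):
    """Draw n simulated progeny phenotypes around the parent mean (assumed model)."""
    mid = (parent_a + parent_b) / 2
    return [rng.gauss(mid, sd) for _ in range(n)]


def population_parameters(parent_a, parent_b, progeny, selected_fraction=0.1):
    """Summarize one simulated cross using the assumed definitions above."""
    n_selected = max(1, int(len(progeny) * selected_fraction))
    top = sorted(progeny, reverse=True)[:n_selected]
    best_parent = max(parent_a, parent_b)
    return {
        "parent_mean": (parent_a + parent_b) / 2,
        "usefulness": statistics.mean(top),
        "transgressive_ratio": sum(p > best_parent for p in progeny) / len(progeny),
    }


# Hypothetical seed protein values (percent) for two candidate crosses.
rng = random.Random(0)
crosses = {"P1 x P2": (40.0, 44.0), "P1 x P3": (40.0, 38.0)}
summary = {
    name: population_parameters(a, b, simulate_progeny(a, b, 200, 1.5, rng))
    for name, (a, b) in crosses.items()
}

# Rank candidate crosses by simulated usefulness, best first.
for name in sorted(summary, key=lambda k: summary[k]["usefulness"], reverse=True):
    print(name, summary[name])
```

Different selection algorithms could then be compared by swapping the ranking key, for example ranking on transgressive ratio instead of usefulness.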
- the method may implement a machine learning model to select progeny for field testing based entirely on genotypic data before collecting any phenotypes on the plant lines.
- the machine learning model may be a neural network with historical data based on genomic predictions, maturity rating, and market class.
- the first machine learning model could be selected from the group comprising supervised learning models, unsupervised learning models, and combinations thereof and different from the first machine learning model.
- the model may predict, for example, the likelihood that a progeny will advance to the next phase of the breeding pipeline (e.g., “Phase 2”) if it were tested in the previous phase (e.g., “Phase 1”) based on the historical data of the previous phase (e.g., Phase 1).
- the model may predict breeding advancement for a progeny without using any field data or observed phenotypes for the target progeny. As disclosed herein, it may be possible to predict progeny success in the breeding pipeline using only genomic predictions, product class, and estimated maturity rating based on the parents.
- the method may be directed to only selecting a plant progeny line for advancement within a pipeline.
- Such a method would comprise: (A) using a first machine learning model trained using simulated training data to identify a first set of candidate progeny lines from a plurality of candidate progeny lines to advance to a testing phase; and (B) using a second machine learning model trained using historical data to identify a second set of candidate progeny lines from the first set of candidate progeny lines to advance to a phase subsequent to the testing phase.
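Steps (A) and (B) form a two-stage filter: the second model only scores lines that the first model has already advanced. A minimal sketch of that control flow, using plain scoring functions as stand-ins for the two trained models (the field names `simulated_score` and `historical_score` and the cut-off sizes are hypothetical):

```python
from typing import Any, Callable, Dict, List, Tuple

Line = Dict[str, Any]
Model = Callable[[Line], float]


def select_top(lines: List[Line], model: Model, keep: int) -> List[Line]:
    """Return the `keep` highest-scoring lines according to `model`."""
    return sorted(lines, key=model, reverse=True)[:keep]


def two_stage_selection(
    candidates: List[Line],
    first_model: Model,
    second_model: Model,
    keep_first: int,
    keep_second: int,
) -> Tuple[List[Line], List[Line]]:
    """(A) pick lines for the testing phase, then (B) pick from that set only."""
    first_set = select_top(candidates, first_model, keep_first)
    second_set = select_top(first_set, second_model, keep_second)
    return first_set, second_set


# Stand-in scorers; a real system would call trained models here.
def first_model(line: Line) -> float:
    return line["simulated_score"]


def second_model(line: Line) -> float:
    return line["historical_score"]


candidates = [
    {"id": "L1", "simulated_score": 0.9, "historical_score": 0.2},
    {"id": "L2", "simulated_score": 0.8, "historical_score": 0.7},
    {"id": "L3", "simulated_score": 0.4, "historical_score": 0.9},
    {"id": "L4", "simulated_score": 0.7, "historical_score": 0.5},
]
first_set, second_set = two_stage_selection(candidates, first_model, second_model, 3, 1)
print([line["id"] for line in first_set], [line["id"] for line in second_set])
```

Note that L3 has the highest second-stage score but is never considered in stage (B), because it did not pass stage (A).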
- the method may further comprise selecting a training data set, which may comprise a genomic marker set, to train the first and/or second machine learning model.
- the method may also provide that using the second machine learning model trained to identify the second set of candidate progeny lines to advance to the phase subsequent to the testing phase comprises generating data indicative of which of the first set of candidate progeny lines to advance to commercial use.
- the method may additionally comprise receiving information about a population of plants, wherein the first set and the second set are progenies of the population, and using the first machine learning model comprises automatically using the first machine learning model in response to receiving the information about the population of plants; and using the second machine learning model comprises automatically using the second machine learning model in response to the first machine learning model identifying the first set.
- the method may further include generating a first list of potential gene editing targets based on a probability that editing a particular gene will result in a plant that will substantially conform to one or more of the one or more plant traits needed to at least meet the functional specification.
- the method may further comprise: (h) selecting, with the processor, a second machine learning model based on the data type of each data element of the training data selected to train the second machine learning model (“second training data”), the second machine learning model selected from the group comprising supervised learning models, unsupervised learning models, and combinations thereof and different from the first machine learning model; (i) training, with the processor, the second machine learning model using the second training data from the database; (j) inputting, via the processor, the one or more plant traits needed to at least meet the functional specification into the trained second machine learning model to generate a second predictive breeding crosses list ranked based on aggregate probability that a progeny of the cross will substantially conform to one or more of the one or more plant traits needed to meet the functional specification and a second list of potential gene editing targets based on a probability that editing a particular gene will result in a plant that will substantially conform to one or more of the one or more plant traits needed to at least meet the functional specification; (k) collecting data, by the processor, from the
- the method may still further comprise: (m) mediating between the first machine learning model and the second machine learning model to establish an aggregated predictive breeding crosses list based on the first and second predictive breeding crosses lists; (n) collecting data from the progeny of crosses planted based on the aggregated predictive breeding crosses list; (o) comparing the collected progeny data to corresponding predictions made by both the first and the second machine learning models toward determining next action recommended by the first and second machine learning model; and (p) mediating between the first machine learning model and the second machine learning model to determine the best next action recommendation.
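The disclosure does not specify here how the mediation in step (m) is performed. One simple possibility, shown purely as an assumption, is to merge the two ranked cross lists by mean rank position, a Borda-style aggregation. The cross names are hypothetical.

```python
from typing import List


def aggregate_rankings(first_list: List[str], second_list: List[str]) -> List[str]:
    """Merge two ranked lists of crosses by mean rank (assumed mediation rule).

    A cross missing from one list is given the rank just past the end of
    that list. Ties are broken alphabetically so the result is deterministic.
    """

    def rank(cross: str, ranking: List[str]) -> int:
        return ranking.index(cross) if cross in ranking else len(ranking)

    crosses = set(first_list) | set(second_list)
    mean_rank = {
        cross: (rank(cross, first_list) + rank(cross, second_list)) / 2
        for cross in crosses
    }
    return sorted(crosses, key=lambda cross: (mean_rank[cross], cross))


first = ["AxB", "CxD", "ExF"]   # ranked output of the first model
second = ["CxD", "AxB", "GxH"]  # ranked output of the second model
print(aggregate_rankings(first, second))
```

Other mediation rules, such as weighting each model by its past prediction accuracy against the collected progeny data, would fit the same interface.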
- the first machine learning model may be paired with an in silico simulation model.
- the method may also comprise automated processes to consume data, such as historical genomic and phenotypic data, select optimized genomic marker sets, select optimized model training sets, select optimal genomic selection models, and provide breeding advancement recommendations.
- the method may process historical genomic and phenotypic data of a soybean. The method may be automated to process and run analysis on the historical data to get summarized phenotypes of all soybean traits. The method may be automated to then use custom markers for obtaining genomic data from the soybean. The method may be automated to then process and link phenotypes with genotypes as well as germplasm metadata information.
- the method may be automated to determine the best training model based on genomic distance, select the best training model for one or more given soybean traits, train the model for one or more soybean traits, and calculate predictions for phenotypes for a germplasm.
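One way to read "based on genomic distance" is that lines genetically closest to the target are preferred when assembling a training set. The sketch below assumes marker genotypes coded 0/1/2 and uses the fraction of mismatched marker calls as the distance. Both the coding and the metric are assumptions, not details from the disclosure, and the variety names are hypothetical.

```python
from typing import Dict, List, Sequence


def marker_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of marker positions at which two genotypes differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must use the same marker set")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_training_set(
    target: Sequence[int], panel: Dict[str, Sequence[int]], k: int
) -> List[str]:
    """Return the names of the k panel lines closest to the target genotype."""
    ranked = sorted(
        panel.items(),
        key=lambda item: (marker_distance(target, item[1]), item[0]),
    )
    return [name for name, _ in ranked[:k]]


target = [0, 1, 2, 0]
panel = {
    "V1": [0, 1, 2, 0],
    "V2": [0, 1, 2, 2],
    "V3": [2, 0, 0, 1],
    "V4": [0, 1, 0, 2],
}
print(nearest_training_set(target, panel, k=2))
```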
- the disclosure further teaches various systems that implement the various methods described herein.
- Figure 1 is a diagram of a system and associated methods for accelerating the speed to market for improved plant-based products.
- Figure 1A is a diagram of plant-based production development program (150) shown in Figure 1.
- Figure 1B is a diagram illustrating the types of data gathered and maintained by the system for each seed associated with the system.
- Figure 1C is an illustration of the basic concept behind the various models used in system 100.
- Figure 1D is a diagram showing the probabilities determined for a particular seed object under a particular set of circumstances.
- Figure 2 is a diagram of features that may be used to train one embodiment of the predictive crossing, predictive recombination, predictive advancement, and predictive deployment models used in the plant-based production development program, which may include one or more types of machine learning models depending upon the type of feature data used.
- Figure 3 is a diagram illustrating the process of potential changes to one or more of the machine-learning models based on live data collection.
- Figure 4 is a block diagram illustrating one potential system within which one or more of the inventive concepts disclosed in the present specification may be implemented.
- the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having,” or any other variation thereof, are intended to cover a non-exclusive inclusion.
- a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherently present therein.
- “A, B, C, and combinations thereof” refers to all permutations or combinations of the listed items preceding the term.
- “A, B, C, and combinations thereof” is intended to include at least one of: A, B, C, AB, AC, BC, or ABC, and if order is important in a particular context, also BA, CA, CB, CBA, BCA, ACB, BAC, or CAB.
- expressly included are combinations that contain repeats of one or more item or term, such as BB, AAA, AAB, BBC, AAABCCCC, CBBAAA, CABABB, and so forth.
- a person of ordinary skill in the art will understand that typically there is no limit on the number of items or terms in any combination, unless otherwise apparent from the context.
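The enumeration in this definition can be reproduced mechanically. The sketch below lists the unordered combinations of A, B, and C, and then the additional orderings that arise when order matters. It does not enumerate combinations with repeated items (BB, AAB, and so forth), which the definition also includes and which are unbounded in number.

```python
from itertools import combinations, permutations

items = ["A", "B", "C"]
sizes = range(1, len(items) + 1)

# Order does not matter: A, B, C, AB, AC, BC, ABC.
unordered = ["".join(c) for r in sizes for c in combinations(items, r)]

# Order matters: every arrangement of every non-empty subset.
ordered = ["".join(p) for r in sizes for p in permutations(items, r)]

# Arrangements that appear only when order is significant.
order_only = sorted(set(ordered) - set(unordered))

print(unordered)
print(order_only)
```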
- “At least one” and “one or more” will be understood to include one as well as any quantity more than one, including, but not limited to, each of 2, 3, 4, 5, 10, 15, 20, 30, 40, 50, 100, and all integers and fractions, if applicable, therebetween.
- the terms “at least one” and “one or more” may extend up to 100 or 1000 or more, depending on the term to which it is attached; in addition, the quantities of 100/1000 are not to be considered limiting, as higher limits may also produce satisfactory results.
- any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment.
- the appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
- qualifiers such as “about,” “approximately,” and “substantially” are intended to signify that the item being qualified is not limited to the exact value specified, but includes some slight variations or deviations therefrom, caused by measuring error, manufacturing tolerances, stress exerted on various parts, wear and tear, and combinations thereof, for example.
- components may be analog or digital components that perform one or more functions.
- the term “component” may include hardware, such as a processor (e.g., microprocessor), a combination of hardware and software, and/or the like.
- Software may include one or more computer executable instructions that, when executed by one or more components, cause the component to perform a specified function. It should be understood that any and all algorithms described herein may be stored on one or more non-transitory memories. Exemplary non-transitory memory may include random access memory, read only memory, flash memory, and/or the like. Such non-transitory memory may be electrically based, optically based, and/or the like.
- a “mutation” is any change in a nucleic acid sequence.
- Nonlimiting examples comprise insertions, deletions, duplications, substitutions, inversions, and translocations of any nucleic acid sequence, regardless of how the mutation is brought about and regardless of how or whether the mutation alters the functions or interactions of the nucleic acid.
- a mutation may produce altered enzymatic activity of a ribozyme, altered base pairing between nucleic acids (e.g. RNA interference interactions, DNA-RNA binding, etc.), altered mRNA folding stability, and/or how a nucleic acid interacts with polypeptides (e.g.
- a mutation might result in the production of proteins with altered amino acid sequences (e.g. missense mutations, nonsense mutations, frameshift mutations, etc.) and/or the production of proteins with the same amino acid sequence (e.g. silent mutations).
- Certain synonymous mutations may create no observed change in the plant while others that encode for an identical protein sequence nevertheless result in an altered plant phenotype (e.g. due to codon usage bias, altered secondary protein structures, etc.).
- Mutations may occur within coding regions (e.g., open reading frames) or outside of coding regions (e.g., within promoters, terminators, untranslated elements, or enhancers), and may affect, for example and without limitation, gene expression levels, gene expression profiles, protein sequences, and/or sequences encoding RNA elements such as tRNAs, ribozymes, ribosome components, and microRNAs.
- Methods disclosed herein are not limited to mutations made in the genomic DNA of the plant nucleus.
- a mutation is created in the genomic DNA of an organelle (e.g. a plastid and/or a mitochondrion).
- a mutation is created in extrachromosomal nucleic acids (including RNA) of the plant, cell, or organelle of a plant.
- Nonlimiting examples include creating mutations in supernumerary chromosomes (e.g. B chromosomes), plasmids, and/or vector constructs used to deliver nucleic acids to a plant. It is anticipated that new nucleic acid forms will be developed and yet fall within the scope of the claimed invention when used with the teachings described herein.
- Methods disclosed herein are not limited to certain techniques of mutagenesis. Any method of creating a change in a nucleic acid of a plant can be used in conjunction with the disclosed invention, including the use of chemical mutagens (e.g. methanesulfonate, sodium azide, aminopurine, etc.), genome/gene editing techniques (e.g. CRISPR-like technologies, TALENs, zinc finger nucleases, and meganucleases), ionizing radiation (e.g. ultraviolet and/or gamma rays) temperature alterations, long-term seed storage, tissue culture conditions, targeting induced local lesions in a genome, sequence-targeted and/or random recombinases, etc.
- It is anticipated that new methods of creating a mutation in a nucleic acid of a plant will be developed and yet fall within the scope of the claimed invention when used with the teachings described herein.
- embodiments disclosed herein are not limited to certain methods of introducing nucleic acids into a plant and are not limited to certain forms or structures that the introduced nucleic acids take. Any method of transforming a cell of a plant described herein with nucleic acids are also incorporated into the teachings of this innovation, and one of ordinary skill in the art will realize that the use of particle bombardment (e.g.
- nucleic acid sequences into a plant described herein can be used to deliver nucleic acid sequences into a plant described herein.
- Methods disclosed herein are not limited to any size of nucleic acid sequences that are introduced, and thus one could introduce a nucleic acid comprising a single nucleotide (e.g. an insertion) into a nucleic acid of the plant and still be within the teachings described herein.
- Nucleic acids introduced in substantially any useful form, for example, on supernumerary chromosomes (e.g. B chromosomes), plasmids, vector constructs, additional genomic chromosomes (e.g. substitution lines), and other forms is also anticipated. It is envisioned that new methods of introducing nucleic acids into plants and new forms or structures of nucleic acids will be discovered and yet fall within the scope of the claimed invention when used with the teachings described herein.
- Methods disclosed herein include conferring desired traits to plants, for example, by mutating sequences of a plant, introducing nucleic acids into plants, using plant breeding techniques and various crossing schemes, etc. These methods are not limited as to certain mechanisms of how the plant exhibits and/or expresses the desired trait.
- the trait is conferred to the plant by introducing a nucleotide sequence (e.g. using plant transformation methods) that encodes production of a certain protein by the plant.
- the desired trait is conferred to a plant by causing a null mutation in the plant’s genome (e.g. when the desired trait is reduced expression or no expression of a certain trait).
- the desired trait is conferred to a plant by crossing two plants to create offspring that express the desired trait. It is expected that users of these teachings will employ a broad range of techniques and mechanisms known to bring about the expression of a desired trait in a plant. Thus, as used herein, conferring a desired trait to a plant is meant to include any process that causes a plant to exhibit a desired trait, regardless of the specific techniques employed.
- As used herein, “fertilization” and/or “crossing” broadly includes bringing the genomes of gametes together to form zygotes but also broadly may include pollination, syngamy, fecundation and other processes related to sexual reproduction.
- a cross and/or fertilization occurs after pollen is transferred from one flower to another, but those of ordinary skill in the art will understand that plant breeders can leverage their understanding of fertilization and the overlapping steps of crossing, pollination, syngamy, and fecundation to circumvent certain steps of the plant life cycle and yet achieve equivalent outcomes, for example, a plant or cell of a soybean cultivar described herein.
- a user of this innovation can generate a plant of the claimed invention by removing a genome from its host gamete cell before syngamy and inserting it into the nucleus of another cell.
- the process falls within the definition of fertilization and/or crossing as used herein when performed in conjunction with these teachings.
- the gametes are not different cell types (i.e. egg vs. sperm), but rather the same type and techniques are used to effect the combination of their genomes into a regenerable cell.
- Other embodiments of fertilization and/or crossing include circumstances where the gametes originate from the same parent plant, i.e. a “self” or “self-fertilization”.
- compositions taught herein are not limited to certain techniques or steps that must be performed to create a plant or an offspring plant of the claimed invention, but rather include broadly any method that is substantially the same and/or results in compositions of the claimed invention.
- a “plant” refers to a whole plant, any part thereof, or a cell or tissue culture derived from a plant, comprising any of: whole plants, plant components or organs (e.g., leaves, stems, roots, etc.), plant tissues, seeds, plant cells, protoplasts and/or progeny of the same.
- a plant cell is a biological cell of a plant, taken from a plant or derived through culture of a cell taken from a plant.
- a “population” means a set comprising any number, including one, of individuals, objects, or data from which samples are taken for evaluation, e.g. estimating QTL effects and/or disease tolerance. Most commonly, the terms relate to a breeding population of plants from which members are selected and crossed to produce progeny in a breeding program.
- a “population of plants” can include the progeny of a single breeding cross or a plurality of breeding crosses and can be either actual plants or plant derived material, or in silico representations of plants. The member of a population need not be identical to the population members selected for use in subsequent cycles of analyses nor does it need to be identical to those population members ultimately selected to obtain a final progeny of plants.
- a “plant population” is derived from a single biparental cross but can also derive from two or more crosses between the same or different parents.
- While a population of plants can comprise any number of individuals, those of skill in the art will recognize that plant breeders commonly use population sizes ranging from one or two hundred individuals to several thousand, and that the highest performing 5-20% of a population is commonly selected for use in subsequent crosses in order to improve the performance of subsequent generations of the population in a plant breeding program.
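The selection of the highest performing fraction of a population can be illustrated with a minimal sketch. The function name `select_top_fraction`, the line identifiers, and the performance scores are hypothetical and chosen only for illustration; the 10% fraction is one value inside the 5-20% range mentioned above.

```python
def select_top_fraction(scores, fraction=0.10):
    """Return the IDs of the highest-scoring individuals (truncation selection).

    scores:   {individual_id: performance score}, higher is better
    fraction: share of the population to keep, e.g. 0.05 to 0.20
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical population of 200 lines with made-up performance scores.
population = {f"line_{i:03d}": float(i % 50) + i / 1000 for i in range(200)}
selected = select_top_fraction(population, fraction=0.10)
```

With 200 individuals and a fraction of 0.10, the sketch keeps 20 lines for use as parents in the next cycle.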
- “Crop performance” is used synonymously with “plant performance” and refers to how well a plant grows under a set of environmental conditions and cultivation practices. Crop performance can be measured by any metric a user associates with a crop's productivity (e.g. yield), appearance and/or robustness (e.g. color, morphology, height, biomass, maturation rate), product quality (e.g. fiber lint percent, fiber quality, seed protein content, seed carbohydrate content, etc.), cost of goods sold (e.g. the cost of creating a seed, plant, or plant product in a commercial, research, or industrial setting) and/or a plant's tolerance to disease (e.g.
- Crop performance can also be measured by determining a crop's commercial value and/or by determining the likelihood that a particular inbred, hybrid, or variety will become a commercial product, and/or by determining the likelihood that the offspring of an inbred, hybrid, or variety will become a commercial product.
- Crop performance can be a quantity (e.g. the volume or weight of seed or other plant product measured in liters or grams) or some other metric assigned to some aspect of a plant that can be represented on a scale (e.g. assigning a 1-10 value to a plant based on its disease tolerance).
- a “microbe” will be understood to be a microorganism, i.e. a microscopic organism, which can be single celled or multicellular. Microorganisms are very diverse and include all the bacteria, archaea, protozoa, fungi, and algae, especially cells of plant pathogens and/or plant symbionts. Certain animals are also considered microbes, e.g. rotifers. In various embodiments, a microbe can be any of several different microscopic stages of a plant or animal. Microbes also include viruses, viroids, and prions, especially those which are pathogens or symbionts to crop plants.
- a “fungus” includes any cell or tissue derived from a fungus, for example whole fungus, fungus components, organs, spores, hyphae, mycelium, and/or progeny of the same.
- a fungus cell is a biological cell of a fungus, taken from a fungus or derived through culture of a cell taken from a fungus.
- a “pest” is any organism that can affect the performance of a plant in an undesirable way. Common pests include microbes, animals (e.g. insects and other herbivores), and/or plants (e.g. weeds). Thus, a “pesticide” is any substance that reduces the survivability and/or reproduction of a pest, e.g. fungicides, bactericides, insecticides, herbicides, and other toxins.
- Tolerance or improved tolerance in a plant to disease conditions (e.g. growing in the presence of a pest) will be understood to mean an indication that the plant is less affected by the presence of pests and/or disease conditions with respect to yield, survivability and/or other relevant agronomic measures, compared to a less tolerant, more "susceptible" plant. Tolerance is a relative term, indicating that a "tolerant" plant survives and/or performs better in the presence of pests and/or disease conditions compared to other (less tolerant) plants (e.g., a different soybean cultivar) grown in similar circumstances.
- tolerance is sometimes used interchangeably with “resistance”, although resistance is sometimes used to indicate that a plant appears maximally tolerant to, or unaffected by, the presence of disease conditions. Plant breeders of ordinary skill in the art will appreciate that plant tolerance levels vary widely, often representing a spectrum of more-tolerant or less-tolerant phenotypes, and are thus trained to determine the relative tolerance of different plants, plant lines or plant families and recognize the phenotypic gradations of tolerance.
- a plant, or its environment, can be contacted with a wide variety of “agriculture treatment agents.”
- an “agriculture treatment agent”, or “treatment agent”, or “agent” can refer to any exogenously provided compound that can be brought into contact with a plant tissue (e.g. a seed) or its environment that affects a plant's growth, development and/or performance, including agents that affect other organisms in the plant's environment when those effects subsequently alter a plant's performance, growth, and/or development (e.g. an insecticide that kills plant pathogens in the plant's environment, thereby improving the ability of the plant to tolerate the insect's presence).
- Agriculture treatment agents also include a broad range of chemicals and/or biological substances that are applied to seeds, in which case they are commonly referred to as “seed treatments” and/or seed dressings. Seed treatments are commonly applied as either a dry formulation or a wet slurry or liquid formulation prior to planting and, as used herein, generally include any agriculture treatment agent including growth regulators, micronutrients, nitrogen-fixing microbes, and/or inoculants. Agriculture treatment agents include pesticides (e.g. fungicides, insecticides, bactericides, etc.), hormones (abscisic acids, auxins, cytokinins, gibberellins, etc.), herbicides (e.g.
- the agriculture treatment agent acts extracellularly within the plant tissue, such as interacting with receptors on the outer cell surface.
- the agriculture treatment agent enters cells within the plant tissue.
- the agriculture treatment agent remains on the surface of the plant and/or the soil near the plant.
- the agriculture treatment agent is contained within a liquid.
- liquids include, but are not limited to, solutions, suspensions, emulsions, and colloidal dispersions.
- liquids described herein will be of an aqueous nature.
- aqueous liquids that comprise water can also comprise water insoluble components, can comprise an insoluble component that is made soluble in water by addition of a surfactant, or can comprise any combination of soluble components and surfactants.
- the application of the agriculture treatment agent is controlled by encapsulating the agent within a coating, or capsule (e.g. microencapsulation).
- the agriculture treatment agent comprises a nanoparticle and/or the application of the agriculture treatment agent comprises the use of nanotechnology.
- plants disclosed herein can be modified to exhibit at least one “desired trait”, and/or combinations thereof.
- the disclosed innovations are not limited to any set of traits that can be considered desirable, but nonlimiting examples include male sterility, herbicide tolerance, pest tolerance, disease tolerance, modified fatty acid metabolism, modified carbohydrate metabolism, modified seed yield, modified seed oil, modified seed protein, modified lodging resistance, modified shattering, modified iron-deficiency chlorosis, modified water use efficiency, and/or combinations thereof.
- Desired traits can also include traits that are deleterious to plant performance, for example, when a researcher desires that a plant exhibits such a trait in order to study its effects on plant performance.
- a user can combine the teachings herein with high-density molecular marker profiles spanning substantially the entire soybean genome to estimate the value of selecting certain candidates in a breeding program in a process commonly known as “genomic selection”.
- machine learning generally refers to computer algorithms that may learn from pre-existing data and then make predictions about new data.
- machine-learning tools operate by building a model from example training data, which, for example, can be used to model an environment based on that training data and then make decisions or predictions without explicit instructions.
- Deep learning or deep structured learning is a type of machine learning that can use artificial neural networks (e.g., inspired by biological systems) with representation learning.
- Representation learning is a set of techniques that allows a system to automatically discover representations needed to detect features in future sets of data.
- In “supervised learning,” a “teacher” presents the computer with the desired outputs given a set of example inputs. This is generally thought to involve classification and regression, which can be accomplished using one or more approaches including, but not limited to, decision trees, ensembles (e.g. Random Forest), nearest neighbors algorithm, linear regression, gBLUP (genomic best linear unbiased prediction), lasso (least absolute shrinkage and selection operator), lasso LARS, Ridge regression, Elastic Net, Naive Bayes, Artificial neural networks (ANN or NN), logistic regression, perceptron, Relevance vector machine (RVM), and Support vector machine (SVM).
- the approach to supervised learning used depends on the data set; among the issues involved in this choice are the amount of training data available, the dimensionality and heterogeneity of that data, redundancy in that data, the interrelations between data elements, and the amount of noise present in the output.
- In “unsupervised learning,” the computer is left to find any naturally occurring patterns within the training data. This can be accomplished by using one or more approaches including, but not limited to, clustering (i.e., automatically grouping the training examples into categories with similar features), anomaly detection, principal component analysis (i.e., automatically identifying features that are most useful for discriminating between different training examples and then discarding the rest), self-organizing feature maps, and latent variable models.
- Clustering methods include hierarchical clustering, k-means, mixture models (i.e., a probabilistic model that represents the presence of subpopulations within an overall population), DBSCAN (density-based spatial clustering of applications with noise), expectation-maximization, BIRCH, and CURE.
- one or more of the foregoing supervised and unsupervised machine learning approaches may be used by the present system and methods in parallel or seriatim using the same training data or subsets thereof. Where subsets are used the scope of any such subset may be selected for use with the particularly selected training data within that subset with reference to the pluses and minuses of one or more of the particular approaches to machine learning. Where multiple machine learning approaches are used in parallel (i.e., stacked) a decision-making model is preferably introduced to mediate between the probability assessments provided by the multiple machine learning models toward providing a single list of recommended actions (e.g., desirable plant crosses, gene editing targets, crop management techniques).
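One simple way a decision-making model could mediate between several models' probability assessments is a weighted average, sketched below. This is only an illustrative assumption: the disclosure does not specify the mediation rule, and the function name `mediate`, the model names, the weights, and the probabilities are all hypothetical.

```python
def mediate(predictions, weights):
    """Combine per-model probabilities into one ranked list of actions.

    predictions: {model_name: {action: probability}}
    weights:     {model_name: non-negative weight}, e.g. from validation accuracy
    A model that gives no probability for an action is skipped for that action.
    """
    totals, weight_sums = {}, {}
    for model, probs in predictions.items():
        w = weights[model]
        for action, p in probs.items():
            totals[action] = totals.get(action, 0.0) + w * p
            weight_sums[action] = weight_sums.get(action, 0.0) + w
    combined = {a: totals[a] / weight_sums[a] for a in totals if weight_sums[a] > 0}
    # Highest combined probability first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical outputs of two models run in parallel on the same candidates.
predictions = {
    "random_forest": {"cross A x B": 0.80, "edit gene G1": 0.40, "cross C x D": 0.55},
    "elastic_net": {"cross A x B": 0.60, "edit gene G1": 0.70},
}
weights = {"random_forest": 2.0, "elastic_net": 1.0}
ranked_actions = mediate(predictions, weights)
```

A weighted average is the simplest mediator; a trained meta-model over the individual outputs would be another option consistent with the stacked arrangement described above.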
- Training machine learning models requires the selection of features and collection of data associated with relevant features in order to appropriately train the machine learning model.
- the present disclosure identifies various categories of data that the inventors believe may play a substantive role in training useful models.
- the potentially useful data is saved to a seed object (or seed vector) 200 that describes each unique seed contained within the germplasm 105.
- the seed object 200 is preferably identified by one or more of its germplasm ID, its parentage, genotypic, phenotypic, or other genetic data. Seed object 200 may be virtual in the sense that it may contain nothing more than the germplasm ID, parentage and basic genetic data.
- a “virtual” seed object 200 may also include genomic forecasted probabilities for the seed such as protein content, yield, oil content, and maturity group, all of which may be represented as their mean values and may have an associated standard deviation.
- physical testing data may be collected and may be further processed based on directions from the machine learning system. Processing may be performed on the directly observed physical data (e.g., genotype, phenotype, genetic sequencing (partial or WGS), ingredient processing data, and consumer sensory data) or on one or more derivative data sets (e.g., GWAS or TWAS) based on the observed physical data.
- the directly observed data may be collected during speed breeding, field testing, and commercialization and/or from the results of such speed breeding, field testing, and commercialization by obtaining tissue samples from the various steps in the process (as illustrated in Figure 1A).
- tissue samples may be obtained from seeds generated during speed breeding, which may be subjected to genotyping, sequencing (partial or WGS), and/or predictive phenotyping.
- tissue samples taken from seeds resulting from the growth of an F4 generation may be subjected to both food testing protocols as well as genotyping/sequencing/predictive phenotyping, whereas seeds resulting from commercialization may only be subjected to food testing protocols.
- information is recorded to a seed object 200 associated with a particular seed.
- the data saved to seed object 200 may also include measured data for a seed (really a population of seeds sharing a common pedigree). As illustrated in Figure 1B, for soybeans this measured seed data 250 may include protein, yield, oil, Maturity Group, and food testing protocol data for each instance that seed is grown. The protein and oil data may be further measured and recorded as to type of protein/oil. Field data 255 for each instance the seed has been grown and observed may also preferably be associated with this collected data. Field data 255 may include location (e.g.
- the field data 255 collected with respect to any particular growing event may not produce instructive data with respect to all of these variables (e.g., the location of an indoor growing event), or, even where all of the variables could have been collected, the data may not have been recorded or entered into the dataset, or may have been removed from the dataset for various reasons.
- the models selected for use with the overall dataset contemplate the potential absence of data points from the overall dataset.
- the record may also include the number of actual data points collected with respect to each separate data type, as well as the mean and standard deviation for that data, and various correlations, such as the correlation of observed protein to observed yield. It should be understood by those of ordinary skill in the art having the present disclosure before them that other correlations may be calculated and included in a seed object data record 200, such as correlations, if any, between protein and oil, protein and maturity group, protein and food testing data, yield and oil, yield and maturity group, yield and food testing data, oil and maturity group, oil and food testing data, and maturity group and food testing data. It may further be possible using the collected data to identify opportunities to use growing data from one or more prior growing seasons in predicting future performance of the seed. Thus, for example, the probabilities with respect to future protein and yield of a seed are significantly improved when combining genomic prediction with prior year field data (e.g. use the measured results of Phase 1 field testing to predict Phase 2 results).
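A seed object record of this kind could be sketched as a small data structure that stores one entry per growing event and derives the count, mean, standard deviation, and pairwise correlations while tolerating missing measurements. The class layout, field names, trait keys, and sample values below are hypothetical; the disclosure does not prescribe a concrete schema.

```python
from dataclasses import dataclass, field
from math import sqrt
from statistics import mean, stdev


@dataclass
class SeedObject:
    germplasm_id: str
    parentage: tuple
    # One dict per growing event; a missing measurement is stored as None.
    observations: list = field(default_factory=list)

    def values(self, trait):
        """Recorded (non-missing) values of one trait across growing events."""
        return [o[trait] for o in self.observations if o.get(trait) is not None]

    def summary(self, trait):
        """Number of data points, mean, and sample standard deviation."""
        xs = self.values(trait)
        return {
            "n": len(xs),
            "mean": mean(xs) if xs else None,
            "sd": stdev(xs) if len(xs) > 1 else None,
        }

    def correlation(self, trait_a, trait_b):
        """Pearson correlation over events where both traits were recorded."""
        pairs = [(o[trait_a], o[trait_b]) for o in self.observations
                 if o.get(trait_a) is not None and o.get(trait_b) is not None]
        if len(pairs) < 2:
            return None
        ma = mean(a for a, _ in pairs)
        mb = mean(b for _, b in pairs)
        cov = sum((a - ma) * (b - mb) for a, b in pairs)
        va = sum((a - ma) ** 2 for a, _ in pairs)
        vb = sum((b - mb) ** 2 for _, b in pairs)
        if va == 0 or vb == 0:
            return None
        return cov / sqrt(va * vb)


# Hypothetical record: protein in percent, yield in bushels per acre.
seed = SeedObject("GP-0001", ("parent_A", "parent_B"), [
    {"protein": 42.0, "yield": 55.0},
    {"protein": 44.0, "yield": 52.0},
    {"protein": None, "yield": 58.0},  # protein not recorded for this event
    {"protein": 43.0, "yield": 54.0},
])
protein_summary = seed.summary("protein")
protein_yield_r = seed.correlation("protein", "yield")
```

Skipping missing values per trait, rather than discarding the whole growing event, keeps the yield observation from the third event usable even though its protein measurement is absent.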
- the genotypic data may include, but is not limited to, ATAC-Seq, gene annotation, gene expression, genes essential to development and maintenance, GO (Gene Ontology) Terms, GWAS (genome wide association study) data, known QTL (quantitative trait locus) data, known eQTL (expression quantitative trait locus) data, expression data, co-expression data, metabolites data, promoters, RNA-sequencing data (preferably collected at R4 and R5), structural variant (SV) data, transcriptome data, TWAS (transcriptome-wide association study) data, and WGS (whole genome sequencing) data.
- the matched transcriptome and WGS data may comprise the entirety (or nearly the entirety) of the DNA sequence of an organism’s genome.
- genotypes some of which may be “haplotypes” at loci that are clustered together on the same chromosome, as well as collections of genotypes from across a single chromosome, and/or collections of genotypes corresponding to loci distributed on different chromosomes may be measured, saved, and used in one or more of the various models operating within the present system.
- ATAC-Seq is a technique to assess genome-wide chromatin accessibility. Gene expression links to tissues and times when a particular gene is active, allowing for a direct link of gene level changes to phenotypic changes, at scale. Gene Ontology is a representation of detectable observations in genes and relationships between those observations, which allows scientists to publish specific observations about genes, opening up literature as a source of training data.
- GWAS is a method of studying associations between a genome-wide set of single-nucleotide polymorphisms (SNPs) and a desired phenotypic trait, such as increased protein content.
- QTL is the location within a genome that correlates with a variation in a quantitative phenotype of the organism.
- While expression data is high-value correlative data, particularly with respect to protein content in soybeans, it is expensive to generate. Assuming a scenario where genotype data for 5,000 samples costs approximately $135,000, expression data for just four replicates and two tissues would cost approximately $9,000,000. In such instances it would be ideal to find a proxy for such data.
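The cost gap in this scenario can be made concrete with a short calculation. It assumes, as the text implies but does not state outright, that the $9,000,000 expression figure covers the same 5,000 samples; the variable names are illustrative.

```python
# Figures quoted in the scenario above.
genotype_total_usd = 135_000
n_samples = 5_000
expression_total_usd = 9_000_000
replicates, tissues = 4, 2

# Assumption: expression data would be generated for the same 5,000 samples.
genotype_per_sample = genotype_total_usd / n_samples        # 27.0 USD
libraries = n_samples * replicates * tissues                 # 40000 libraries
expression_per_library = expression_total_usd / libraries   # 225.0 USD
cost_ratio = expression_total_usd / genotype_total_usd       # about 66.7
```

Under that assumption, measured expression data costs roughly 67 times as much as genotyping the same material, which is the motivation for a predicted proxy.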
- expression values can be predicted using already collected expression data correlated to other genotypic data. Using predicted expression data allows the system to dramatically increase sample numbers and the power of the machine learning model. In particular, by using the predicted expression for more than 6,300 genes across 1800+ soybean lines along with protein measurements for those same 1800+ soybean lines as training data for a random forest regression machine-learning model, high predictive accuracy has been obtained.
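The idea of predicting expression from genotypic data can be sketched with a deliberately simplified stand-in: an ordinary least-squares line relating allele dosage at a single marker to measured expression, then applied to lines that were genotyped but never expression-profiled. This is not the random forest model described above, and the marker, line names, and values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


# Lines with both genotype (allele dosage 0, 1, or 2) and measured expression.
dosage = [0, 0, 1, 1, 2, 2]
expression = [5.1, 4.9, 7.0, 7.2, 9.0, 8.8]
slope, intercept = fit_line(dosage, expression)

# Lines that were genotyped only: expression is predicted, not measured.
genotyped_only = {"line_X": 0, "line_Y": 2}
predicted_expression = {
    line: intercept + slope * d for line, d in genotyped_only.items()
}
```

In practice many markers and a more flexible model would be used; the point of the sketch is only that inexpensive genotype data can extend a small set of measured expression values to a much larger set of lines.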
- the phenotypic data may include various desirable and undesirable traits associated with a particular plant.
- phenotypic data may include the protein content in seeds of the plant (measured both in the field using NIR and in a wet lab), the density of other nutrients in the seeds of the plant, the oil content in seeds of the plant, the oleic acid content in seeds of the plant, the fiber content in the seeds of the plant, the oligosaccharides content (e.g., raffinose and stachyose) in seeds of the plant, the saponins/isoflavones/PUFA content in the seeds of the plant, the content of other off-flavor contributing chemicals (e.g., Hexanal and Hexanol) in the seeds of the plant, the moisture content in seeds of the plant (water holding capacity), plant height, the yield history for the plant, the maturity group (MG) of the plant, and environmental stress resistance of the plant.
- the answer to meaningful substantive improvement of plant-based products may result from the aggregation of smaller improvements in those products.
- the disclosed systems and methods can consider billions of data points in millions of pipeline configurations to identify the starting parental plant breeding combinations, predict gene targets, and analyze optimal farm management and environmental conditions to guide eventual placement of improved varieties in the field. This result may more easily be attained by assessing the seeds in germplasm 105 using the machine learning techniques disclosed herein alongside in silico simulation and perhaps also gene editing. In fact, using just machine learning and in silico simulation has already facilitated the rapid identification and development of plant-based (soy) products with ultra-high protein (UHP).
- Such in silico simulation may be enabled, in part, by one or more of the same RNA-sequencing data, structural variant data, whole genome sequence data, phenotype data, and genotype data used to power the machine learning models.
- the machine learning model may, among other things, predict potential successful breeding crosses and potential QTLs and/or eQTLs that may provide promising targets to pursue using gene editing and/or breeding techniques based on the knowledge provided to the machine learning program regarding plants and their gene functions. This in turn can provide one or more paths to unlock and/or restore lost or muted genetic variation that is within the natural diversity of the plant and/or knock out genes that result in undesirable traits.
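An in silico simulation of a candidate breeding cross can be sketched as follows. The sketch rests on strong simplifying assumptions that are not taken from the disclosure: both parents are fully inbred, loci are unlinked, progeny are inbred lines that fix either parental allele with probability 0.5, and trait effects are purely additive. The function names, marker effects, and baseline are hypothetical.

```python
import random


def simulate_inbred_progeny(parent_a, parent_b, n_progeny, rng):
    """Simulate inbred progeny of a biparental cross.

    Each locus is inherited from either parent with probability 0.5
    (loci are treated as unlinked, which ignores any linkage map).
    """
    return [[rng.choice(pair) for pair in zip(parent_a, parent_b)]
            for _ in range(n_progeny)]


def predicted_value(genotype, effects, baseline=0.0):
    """Additive trait prediction: baseline plus the effects of favorable alleles."""
    return baseline + sum(g * e for g, e in zip(genotype, effects))


# 1 = homozygous for the favorable allele, 0 = homozygous for the alternative.
parent_a = [1, 1, 0, 0, 1]
parent_b = [0, 1, 1, 0, 0]
# Hypothetical additive effects, in percentage points of seed protein.
effects = [0.8, 0.5, 1.2, 0.3, 0.6]

rng = random.Random(42)  # fixed seed so the simulation is repeatable
progeny = simulate_inbred_progeny(parent_a, parent_b, 500, rng)
values = [predicted_value(g, effects, baseline=40.0) for g in progeny]

parent_values = (predicted_value(parent_a, effects, 40.0),
                 predicted_value(parent_b, effects, 40.0))
share_better = sum(v > max(parent_values) for v in values) / len(values)
```

The share of simulated progeny that outperform both parents gives a rough indication of whether a cross is worth making before any seed is planted.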
- product specifications could include increased protein content, increased water holding capacity, improved flavor, and decreased total oil.
- the specification could also require that ingredient processing be as energy-efficient as possible to meet growing consumer preferences.
- desired specifications for a soybean-based white beverage (e.g., soy milk) could include increased protein content, increased solubility, improved flavor, improved color, and a differentiated saturated fat profile.
- for a soybean-based egg replacement, the specification may include increased emulsion/foaming, increased gelation, increased water holding capacity, and decreased total oil. Based on each particular specification, the necessary traits for the ultimately desired commercial soybean for that specification would be established. Then, the work of breeding and gene editing to achieve those desired traits in a commercial soybean plant begins.
- the desired traits e.g., maximized protein content, minimized oligosaccharides, increased water holding capacity
- the desired traits may be assessed against the genetic information and phenotypes of plants within an available germplasm as well as available gene editing targets within that germplasm to predict and potentially rank the most efficient (e.g., quickest, most cost-effective, most environmentally friendly, and combinations thereof) paths that have the highest probabilities of achieving the desired specification.
- some traits will be easier to integrate through gene editing than breeding.
- gene targets believed to result in the desired traits may be yet unknown, too difficult to edit/modify successfully, provide insufficient improvement of the desired trait, or may otherwise prove undesirable.
- a combination of breeding crosses and genetic editing will provide the most efficient path to the desired end product specification.
- breeding, genetic editing, planting location and crop management techniques will provide the most efficient path with the highest probability of producing an end product that meets (or exceeds) the specification.
- one or more machine learning models are trained (102) using training data collected (101) from one or more of the following: the germplasm 105 (e.g., phenotypic, genotypic), any existing breeding program data (e.g., phenotypic, genomic), any existing gene editing program, as well as publicly available literature and information regarding the plant species underlying the resulting product.
- a specification is established for the improved plant-based product (103) and the plant traits needed to meet the specification (e.g., protein content, decreased/muted chemical expression) are extracted from the specification (104).
- the extracted specifications are input into the trained machine learning model(s) and in silico simulation(s) 190.
- a list of desirable predicted breeding crosses (preferably by maturity group) (115) and a list of potential gene editing candidates (120), both having been ranked by probabilities determined by the machine learning model(s), may be produced.
- the predictive crossing plan 115 is based on the calculated probability of the progeny meeting product thresholds and maximizing genetic value with respect to one or more traits (e.g., protein content).
- This general concept is illustrated in Figure 1C with respect to the predicted performance of a single trait for just two of the millions of potential crosses that are actually calculated and assessed by the predictive crossing plan (in addition to the calculations made in the predictive recombination, predictive advancement, and predictive deployment models, as a result of each predicted cross).
- GEBV genomic estimated breeding values
- Figure 1D further illustrates the results of the calculation with two traits, protein and yield, for one particular soybean (i.e., the progeny of one particular potential breeding decision) that has been assigned a particular GermplasmID and has a probable maturity group in the middle of Group III (i.e., 36).
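The single-trait comparison of two candidate crosses can be sketched as follows. This is a minimal illustration, not the disclosed model: it assumes progeny trait values are normally distributed around the mid-parent value, and the function name, parental values, standard deviation, and threshold are all hypothetical.

```python
from statistics import NormalDist


def prob_progeny_meets_threshold(gebv_a, gebv_b, progeny_sd, threshold):
    """Probability that a progeny's trait value is at or above the threshold.

    Assumption: progeny values are normally distributed around the
    mid-parent value with a caller-supplied standard deviation.
    """
    mid_parent = (gebv_a + gebv_b) / 2.0
    return 1.0 - NormalDist(mu=mid_parent, sigma=progeny_sd).cdf(threshold)


# Hypothetical parental protein values (% protein) for two candidate crosses.
crosses = {
    "cross_1": (44.0, 48.0),
    "cross_2": (41.0, 45.0),
}
threshold = 46.0  # hypothetical product threshold (% protein)

# Rank the crosses by the probability that their progeny meet the threshold.
ranked = sorted(
    (
        (name, prob_progeny_meets_threshold(a, b, 2.0, threshold))
        for name, (a, b) in crosses.items()
    ),
    key=lambda item: item[1],
    reverse=True,
)

for name, p in ranked:
    print(f"{name}: P(progeny >= {threshold}) = {p:.3f}")
```

In practice the same calculation would be repeated across many crosses and traits; the normal approximation here is only a stand-in for whatever distribution the trained models produce.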
- yield may be measured in terms of the protein recovered per acre as opposed to the more traditional method of measuring yield, i.e. pounds of dry seed obtained per acre.
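The difference between the two yield measures is simple arithmetic, sketched below. The seed yields are hypothetical; the 36% and 49% protein fractions are the approximate figures given in Example 1, and the sketch assumes protein recovered per acre is seed mass multiplied by protein fraction.

```python
def protein_yield_per_acre(seed_lb_per_acre, protein_fraction):
    """Pounds of protein per acre, assuming protein = seed mass x fraction."""
    return seed_lb_per_acre * protein_fraction


# Hypothetical seed yields: the high-protein line yields less seed per acre.
conventional = protein_yield_per_acre(3000.0, 0.36)
high_protein = protein_yield_per_acre(2700.0, 0.49)

print(f"conventional: {conventional:.0f} lb protein/acre")
print(f"high protein: {high_protein:.0f} lb protein/acre")
```

Under these assumed numbers, a line with lower seed yield can still rank higher when yield is measured as protein per acre.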
- the machine learning model may be trained to merely predict advancement of a plant line out of a testing phase.
- Such a method may use training data to train the machine learning model such that the machine learning model takes as input genotype information about a plurality of candidate plant lines selected for the testing phase without taking as input information about phenotypes of the plurality of candidate plant lines and outputs data indicative of which of the plurality of candidate plant lines should advance out of the testing phase.
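A genotype-only advancement classifier of this kind can be sketched with a small logistic regression. The source does not name the model type, so logistic regression is an assumption made for illustration; the marker dosages (0/1/2), the labels, and the function names are hypothetical.

```python
import math


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent (illustrative only)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_advance(w, b, genotype, cutoff=0.5):
    """Return True if the line is predicted to advance out of testing."""
    z = b + sum(wj * xj for wj, xj in zip(w, genotype))
    return 1.0 / (1.0 + math.exp(-z)) >= cutoff


# Hypothetical training data: marker dosages only, no phenotype inputs.
X_train = [
    [2, 0, 1], [2, 1, 0], [2, 2, 2], [1, 0, 2], [1, 2, 0],
    [0, 1, 2], [0, 0, 1], [0, 2, 0],
]
y_train = [1, 1, 1, 1, 1, 0, 0, 0]  # 1 = line advanced historically

w, b = train_logistic(X_train, y_train)
print(predict_advance(w, b, [2, 1, 1]))
print(predict_advance(w, b, [0, 1, 1]))
```

The point of the sketch is the input contract: the model sees genotype information for each candidate line and returns an advance or hold decision without any phenotype features.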
- the plant-based product development program 150 may include speed breeding 155.
- This speed breeding 155 is likely to be conducted within an indoor facility that provides controlled growing conditions (e.g., temperature, daily photoperiod, humidity) year-round without unintentional stressors (e.g., insects, drought). Even though speed breeding and even F3 may be conducted within an indoor facility, it is contemplated that F4 may still be grown outdoors. In speed breeding 155, the daily photoperiod is longer, resulting in expedited growth in the plants.
- Speed breeding 155 may include two selection processes: crossing and selfing. Whether any line is advanced, crossed, self-crossed, or back-crossed from one generation to the next may depend upon data gathered from the resulting plants that comprise the line. In this regard, as shown in Figure 1A, tissue samples may be obtained from the seeds of plants grown in speed breeding 155.
- tissue samples may be collected from the plants within the speed breeding program 155. These tissue samples may be subjected to a variety of physical tests 170, such as genotyping, sequencing, and predictive phenotyping.
- one type of physical data gathered from the plants may comprise certain NIR data.
- This NIR data may be correlated to predict protein content in soybeans.
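One simple way to correlate NIR readings with protein content is a least-squares calibration line, sketched below. Real NIR calibrations typically use many wavelengths and multivariate methods; the single-wavelength absorbance values, the lab-measured protein values, and the function names here are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration set: NIR absorbance at one wavelength versus
# protein content (%) measured by a reference lab method.
absorbance = [0.40, 0.45, 0.50, 0.55, 0.60]
protein_pct = [34.1, 36.4, 39.2, 41.3, 44.0]

slope, intercept = fit_line(absorbance, protein_pct)


def predict_protein(nir_absorbance):
    """Predict protein content (%) for a new bean sample from its NIR reading."""
    return intercept + slope * nir_absorbance


print(f"predicted protein at 0.52: {predict_protein(0.52):.2f}%")
```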
- the NIR data may be obtained by applying NIR light directly to soybeans, soybean pods, or even soy plants, but most preferably the NIR light is applied directly to the beans.
- other physical testing may be done, as may be appropriate, given the specification and the particular stage in the pipeline (e.g., speed breeding, F3, F4, Yield & Increase, and Commercialization) as illustrated by Figure 1A.
- genotype data may also be collected between generations. The collection of this genomic data allows for assessment of the model and better future predictions. Where genomic data of a line significantly deviates from the genomic predictions of the model (especially if that deviation suggests negative future performance), that line may not be further advanced through breeding.
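The check for lines whose genomic data deviates from the model's predictions can be sketched as a per-line mismatch rate compared with a tolerance. The marker coding, the 20% tolerance, and the function names are hypothetical; the source does not specify how "significantly deviates" is measured.

```python
def mismatch_fraction(predicted, observed):
    """Fraction of markers where the observed genotype differs from prediction."""
    if len(predicted) != len(observed):
        raise ValueError("marker lists must have the same length")
    diffs = sum(1 for p, o in zip(predicted, observed) if p != o)
    return diffs / len(predicted)


def lines_to_hold_back(predictions, observations, tolerance=0.2):
    """Names of lines whose mismatch fraction exceeds the tolerance."""
    return sorted(
        line
        for line in predictions
        if mismatch_fraction(predictions[line], observations[line]) > tolerance
    )


# Hypothetical marker dosages (0/1/2) predicted by the model and observed
# after genotyping the next generation.
predictions = {"line_A": [2, 2, 0, 1, 2], "line_B": [2, 0, 0, 1, 2]}
observations = {"line_A": [2, 2, 0, 1, 2], "line_B": [0, 0, 2, 1, 0]}

held_back = lines_to_hold_back(predictions, observations)
print(held_back)
```

The same observed-versus-predicted comparison also supplies the feedback used to assess the model between generations.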
- Predictive recombination model 175 may receive input from the results of physical testing 170, the output of in silico simulation 190, or both.
- the results produced by in silico simulation 190 may also be based on the output of one or more components of the physical testing 170, food testing 171, and/or historical genetic or phenotypic data of other seeds in the germplasm 105 (see, e.g., seed object 200 (Figure 1B)).
- This historical seed data may, itself, be real physical testing observations 170 (which may have been obtained from speed breeding 155 or actually in-field growth), calculated from real physical observations, predicted data, the result of in silico simulation 190, or a combination of one, some or all of the foregoing.
- the model may adjust based on the source of the data (e.g., real physical observation data versus simulated data versus predicted data).
- the predictive recombination model 175 is a machine learning model that directs that particular plants within speed breeding 155 are crossed and/or selfed.
- the predictive recombination model 175 is preferably trained (and potentially optimized) to achieve a few outcomes: (1) improve overall genetic diversity in the germplasm 105; (2) provide germlines for potential future products; and (3) provide a product focused on meeting the specifications for a particularly desired improved plant-based product.
- the predictive recombination model 175 may assess hundreds upon hundreds or even thousands upon thousands of potential breeding options to determine which one(s) of the options have higher probabilities of leading to one or more of the desired outcomes. For example, where predictive recombination model 175 recommends a selfing out of F2, it has assessed that such selfing has a significant probability of meeting the desired product specification in the future.
- Genome data may also be collected between generations. The collection of this genomic data allows for assessment of the model and better future predictions. Where genomic data of a line significantly deviates from the genomic predictions of the model (especially if that deviation suggests negative future performance), that line may not be further advanced through breeding. As further illustrated with respect to the F3 and F4 generations, plants may be crossed with gene-edited plants and resulting crosses may be gene edited. It should be understood that the same could be true of plants in the F1 and/or F2 generations.
- predictions may be further governed by predictive advancement model 180.
- Predictive advancement model 180 uses the same database as the predictive crossing and predictive recombination models, but assesses the available data differently.
- advancement decisions made by the predictive advancement model 180 are based on expected future performance and the ability to quickly achieve commercialization for each variety, at least in the portion of the pipeline illustrated in association with predictive advancement 180 in Figure 1A.
- the expected performance considerations considered by the system and methods shift more toward commercialization considerations/metrics.
- the predictive deployment model 185 is applied to make decisions about when, how, and where in the ground to plant each particular seed type in the pipeline and how to subsequently manage those plantings, including when to harvest.
- the predictive deployment model 185 assesses the probabilities of meeting the product specification using a particular type of seed (based on information in the seed object record 200) in a particular location, at a particular time, using particular management techniques.
- the predictive deployment model 185 assesses each of the potential options and ranks them. The seeds are subsequently planted for yield & increase and commercialization based on the recommendations provided by the model.
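The assess-and-rank step can be sketched by enumerating deployment options and sorting them by predicted probability. The scoring function below is a hypothetical stand-in for the trained deployment model, and the locations, planting weeks, management options, and probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical deployment choices for one seed type.
locations = ["field_north", "field_south"]
planting_weeks = [18, 20]
managements = ["irrigated", "rainfed"]


def predicted_probability(location, week, management):
    """Hypothetical stand-in for the deployment model's predicted probability
    of meeting the product specification."""
    base = {"field_north": 0.55, "field_south": 0.40}[location]
    week_adj = {18: 0.05, 20: -0.05}[week]
    mgmt_adj = {"irrigated": 0.10, "rainfed": 0.0}[management]
    return base + week_adj + mgmt_adj


# Enumerate every (location, week, management) option and rank it.
options = sorted(
    product(locations, planting_weeks, managements),
    key=lambda option: predicted_probability(*option),
    reverse=True,
)
best = options[0]

for option in options[:3]:
    print(option, round(predicted_probability(*option), 2))
```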
- in silico simulations 190 allow the system to test alternatives that cannot be readily tested in the real world because, among other things, there are just too many possibilities to test. By picking seed objects that are believed to have a higher chance of success and modeling their progeny using in silico simulations 190 and the various machine learning options, the probability of hitting the desired improved plant-based product increases.
- the general framework of in silico (stochastic) simulation 190 for plant breeding programs is well known. See, e.g., Faux AM, Gorjanc G, Gaynor RC, Battagin M, Edwards SM, Wilson DL, Hearne SJ, Gonen S, Hickey JM. AlphaSim: Software for Breeding Program Simulation. Plant Genome. (AlphaSim simulates breeding programs in a series of steps: (i) simulate haplotype sequences and pedigree; (ii) drop haplotypes into the base generation of the pedigree and select single-nucleotide polymorphism (SNP) and quantitative trait nucleotide (QTN); (iii) assign QTN effects, calculate genetic values, and simulate phenotypes; (iv) drop haplotypes into the burn-in generations; and (v) perform selection and simulate new generations.); Mackay I, Ober E, Hickey J. GplusE: beyond genomic selection. Food Energy Secur.
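The simulation steps listed in the citation above can be sketched in a few lines. This is a toy illustration and not AlphaSim itself: it omits the pedigree and burn-in steps, uses biallelic loci with random founder haplotypes, and every parameter value and function name is hypothetical.

```python
import random

rng = random.Random(42)
N_LOCI, POP_SIZE, N_SELECTED, N_GENERATIONS = 20, 30, 10, 5


def random_individual():
    """Steps (i)-(ii): a founder with two random 0/1 haplotypes."""
    return [[rng.randint(0, 1) for _ in range(N_LOCI)] for _ in range(2)]


# Step (iii): assign an additive effect to every locus (all treated as QTN).
effects = [rng.gauss(0.0, 1.0) for _ in range(N_LOCI)]


def genetic_value(ind):
    """Sum of QTN effects weighted by allele dosage."""
    return sum(e * (ind[0][i] + ind[1][i]) for i, e in enumerate(effects))


def phenotype(ind, noise_sd=1.0):
    """Genetic value plus random environmental noise."""
    return genetic_value(ind) + rng.gauss(0.0, noise_sd)


def gamete(ind, crossover_rate=0.1):
    """Build one gamete by walking the loci and switching strands at random."""
    strand = rng.randint(0, 1)
    out = []
    for i in range(N_LOCI):
        if i > 0 and rng.random() < crossover_rate:
            strand = 1 - strand
        out.append(ind[strand][i])
    return out


def cross(parent_1, parent_2):
    """Progeny receives one gamete from each parent."""
    return [gamete(parent_1), gamete(parent_2)]


population = [random_individual() for _ in range(POP_SIZE)]
mean_values = []
for _ in range(N_GENERATIONS):
    mean_values.append(sum(genetic_value(i) for i in population) / POP_SIZE)
    # Step (v): select the top phenotypes and mate them at random.
    parents = sorted(population, key=phenotype, reverse=True)[:N_SELECTED]
    population = [cross(*rng.sample(parents, 2)) for _ in range(POP_SIZE)]

print([round(v, 2) for v in mean_values])
```

Because the simulation is stochastic, a real study would repeat it many times and compare distributions of outcomes rather than a single run.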
- candidate lists 115 and 120 may have elements that are based on one another.
- the ranked list of potential crosses 115 may include a cross involving the progeny of a gene edited plant as recommended in list 120.
- the list of gene editing targets 120 may rely upon the progeny of a potential cross recommended in list 115.
- Portions of ranked lists 115 and 120 are used as the basis for a selective breeding program (150) and a gene editing program (160), respectively.
- Genetic editing 160 is different from the transgenic, or “GMO,” approach in that it advances natural genetic variation that could be achieved using traditional breeding approaches rather than introducing genes foreign to the species, as is the case in GMO technology.
- GMO transgenic
- One method for gene editing that may be used to achieve this non-transgenic approach is called CRISPR.
- CRISPR technology is well known. Generally speaking, the CRISPR nuclease scans the genome for the target site within the existing genome of the plant and makes a precise cut in the DNA. The DNA reattaches at the target site with the intended edit, leveraging the native genetic code.
- the machine learning model predicts a probability-ranked list of potential gene editing candidates 120 using genotypic and phenotypic data, including data regarding an orthologous species.
- “An ‘ortholog’ is a gene in a different species that has evolved through speciation events only.” Getting Started in Gene Orthology and Functional Analysis (2010) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2845645/). Identifying orthologs helps to identify phenotypic information regarding genes with similar functionality. The same advantages may be seen with orthologous promoters.
- potentially advancing lines are at least partially (if not wholly) sequenced and the resulting genome for each potentially advancing line is analyzed by the one or more machine learning models (180) to determine the probability of whether the desired specification will be met by commercial production of that line on the farm in a field.
- Those lines for which the probability meets pre-determined criteria are advanced to farm field trials (190).
- during farm field trials 190, phenotypic data is gathered for analysis by the ML system. Genomic data may also be gathered from certain plants during the farm field trials (190). If the data meets the pre-determined threshold(s), the plant products are advanced for ingredient processing (195). Data is collected on the processed ingredients, which is considered by the ML model(s) to determine whether or not the ingredients sufficiently meet the specifications. This may include phenotypic, genomic, and sensory panel data.
- the method may determine expression change for plant genes.
- the method may use transcriptomic data (RNA-seq expression matrix) in combination with genotype data to build the machine learning Expression Predictive Model.
- the machine learning model may employ ElasticNet implementation in Python, allowing parallelization and hyperparameter tuning across multiple parameters.
- the method may use separate gene models, built for each gene, to predict gene expression for one or more genes of a plant genome.
- the method may use the predicted expression and Random Forest model to predict phenotype.
- the method may report the predictive accuracy for predicted phenotypes.
- the method may report feature importance and Shapley values for the contribution of each gene.
- the method provides directionality for the effect of a given edit on the desired phenotype and ranks candidate gene edits based on the predicted effect.
- the method may provide single or combinatorial gene targets. All of the features can be implemented using, for example, the system architecture described herein.
- the machine learning models may be operated toward recommending the selection of one or more candidate genomic edits and prediction of the cumulative effect of the recommended edits on given agronomic traits.
- the machine learning model may determine candidate genes and directionality of expression change.
- the system may implement a method for determining expression change for plant genes comprising: (A) predicting gene expressions for one or more genes of a plant genome using a first machine learning model that takes as input genotype information; (B) determining functional relationships between features of the gene expressions and a plurality of phenotypes using a second machine learning model that takes as input data indicative of the gene expressions; and (C) generating data indicative of directionality for at least one of the gene expressions based on the functional relationships.
- the method may use a high-throughput transcriptomic and genotype dataset to build a first machine learning model that predicts genetically regulated expression using genotype information.
- the method may feed expression data into a second machine learning model, which can account for non-linear dynamics and interactivity between genes, providing high global predictive accuracy.
- the method may employ the functional relationships between the gene expression features and phenotypes derived by the model to advise recommendations for gene editing strategy.
- Gene editing recommendations may comprise single editing targets, as well as multiple editing strategies, that involve balancing genes with interactive expression patterns.
- the method may provide directionality for how edits will affect the desired phenotype.
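The two-stage flow described above (genotype to predicted expression, predicted expression to phenotype, then directionality and ranking) can be sketched as follows. To stay self-contained, simple least-squares lines stand in for both the ElasticNet expression models and the Random Forest phenotype model named in the source, so this shows the data flow only; the gene names, dosages, expression values, and protein values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data for six plants: marker dosage near each gene, measured
# expression of each gene, and the phenotype of interest (% protein).
genotypes = {"gene_A": [0, 0, 1, 1, 2, 2], "gene_B": [2, 1, 2, 0, 1, 0]}
expression = {
    "gene_A": [1.0, 1.2, 2.1, 1.9, 3.0, 3.1],
    "gene_B": [4.1, 3.0, 3.9, 2.0, 3.1, 1.9],
}
protein = [38.0, 38.5, 40.2, 40.0, 42.1, 41.9]

# Stage 1: one expression model per gene, using genotype as the only input.
predicted_expr = {}
for gene, dosage in genotypes.items():
    slope, intercept = fit_line(dosage, expression[gene])
    predicted_expr[gene] = [intercept + slope * d for d in dosage]

# Stage 2: relate predicted expression to phenotype and size the effect as
# the phenotype change across the observed range of predicted expression.
effects = {}
for gene, expr in predicted_expr.items():
    slope, _ = fit_line(expr, protein)
    effects[gene] = slope * (max(expr) - min(expr))

# Directionality: which way to shift expression to raise the phenotype.
directions = {g: ("increase" if e > 0 else "decrease") for g, e in effects.items()}
ranked = sorted(effects, key=lambda g: abs(effects[g]), reverse=True)

for gene in ranked:
    print(gene, directions[gene], round(effects[gene], 2))
```

A non-linear second-stage model would replace the per-gene slope with a model-derived attribution (for example feature importance or Shapley values), but the ranking and directionality outputs would take the same shape.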
- Example 1 Soy, specifically soy protein concentrate (SPC), is the number one protein ingredient used in plant-based meat applications.
- SPC has a protein content of approximately 65%.
- SPC is primarily made by processing of defatted soy flour (approximately 47% protein content) produced from soybeans with an average protein content of approximately 36%.
- the processing required to increase the protein content is costly, water-intensive, and energy-intensive. It is believed that an ultra-high protein soybean could make this process less expensive, less water-intensive, and more energy-efficient.
- By leveraging the soybean plant’s genetic diversity, its protein content may be increased to a sufficiently high level (at least 49%) that it would effectively disintermediate one or more processing steps necessary to arrive at the protein level suitable for plant-based meat applications.
- the closer the protein content of the soybeans in the field is driven toward 65%, the less waste and processing would be required to produce Soy Protein Concentrate.
- Example 2 Through machine learning it is anticipated that better soybean genetics can be found in a germplasm which includes, among other varieties, the wild ancestor of the present-day commercial soybean, Glycine soja (previously G. ussuriensis), or created using lessons from that broader germplasm and/or orthologs, that will (a) facilitate easier, cheaper, more environmentally friendly production of soy-based ingredients, potentially alleviating supply constraints; (b) allow for the production of completely new ingredients (e.g., de-flavored, high-water holding capacity soybeans for enhanced flavor and texture in final plant-based meat products; healthy oils (due to higher oleic acid); stable gelation); (c) new food products; and/or (d) improved end user satisfaction (e.g., better taste, texture, color).
- Glycine soja previously G. ussuriensis
- Example 3 Soybean meal is an ideal protein source for swine, poultry, and fish due to its availability, cost, high protein content, and balanced amino acid profile. In fact, currently over 90% of the soybeans produced in the United States are fed to animals. However, its use has been restricted because, like many plant proteins, soybean meal has a high concentration of anti-nutritional compounds (ANCs), including oligosaccharides such as raffinose and stachyose that can have a negative effect on protein digestibility, leading to low energy values, poor metabolism, and excessive secretion impacting water quality in aquaculture systems.
- ANCs anti-nutritional compounds
- Apart from antinutritional factors, the steady decline in protein content of soy (an unintended consequence of breeding primarily for yield and other agronomic traits) has rendered soy meal a continually less valuable feed ingredient. Through machine learning it is anticipated that the expression of oligosaccharides such as raffinose and stachyose can be significantly decreased.
- Example 4 The yellow pea is another significant source of plant-protein.
- PPC pea protein concentrate
- PPI pea protein isolate
- the flavor and color of PPC are not preferred by consumers. While PPI has better flavor, the cost of processing is much higher.
- machine learning models will help identify the gene(s) that result in the undesirable flavor and color of the yellow pea and provide gene editing actions to mute/lessen the undesirable flavor and color to provide greater consumer interest in yellow-pea based food ingredients.
- This will (a) facilitate easier, cheaper, more environmentally friendly production of yellow-pea-based ingredients, alleviating plant-protein supply constraints; (b) allow for the production of completely new ingredients (e.g., de-flavored, high-water holding capacity yellow peas for enhanced flavor and texture in final plant-based meat products); (c) new food products; and/or (d) improved end user satisfaction (e.g., better taste, texture, color).
- machine learning models, data collection, various logic and/or functions disclosed herein may be enabled using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics.
- Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media) and carrier waves that may be used to transfer such formatted data and/or instructions through wireless, optical, or wired signaling media or any combination thereof.
- Examples of transfers of such formatted data and/or instructions by carrier waves include, but are not limited to, transfers (uploads, downloads, e-mail, etc.) over the Internet and/or other computer networks via one or more data transfer protocols (e.g., HTTP, FTP, SMTP, and so on).
- PLDs programmable logic devices
- FPGAs field programmable gate arrays
- PAL programmable array logic
- aspects include: memory devices, microcontrollers with memory (such as EEPROM), embedded microprocessors, firmware, software, etc.
- aspects may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types.
- MOSFET metal-oxide semiconductor field-effect transistor
- CMOS complementary metal-oxide semiconductor
- ECL emitter-coupled logic
- polymer technologies e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures
- mixed analog and digital and so on.
- aspects of the methods and systems disclosed herein may be embodied and/or executed by the logic of the processes described herein, which may also be embodied in the form of software instructions and/or firmware that may be executed on any appropriate hardware.
- logic embodied in the form of software instructions and/or firmware may be executed on a dedicated system or systems, on a personal computer system, on a distributed processing computer system, and/or the like.
- logic may be implemented in a stand-alone environment operating on a single computer system and/or logic may be implemented in a networked environment such as a distributed system using multiple computers and/or processors, for example.
- system 400 may comprise user devices 410a-n, server 460, and network 450.
- the user device 410 of the system 400 may include various components including, but not limited to, one or more input devices 411, one or more output devices 412, one or more processors 420, a network interface device 425 capable of interfacing with the network 450, one or more non-transitory memories 430 storing processor executable code and/or software application(s), for example including, a web browser capable of accessing a website and/or communicating information and/or data over the network, and/or the like.
- the memory 430 may also store an application (not shown) that, when executed by the processor 420, causes the user device 410 to provide the functionality of the various systems and methods described in the present specification, as would be understood by those of ordinary skill in the art having the present specification, drawings, and claims before them.
- the input device 411 may be capable of receiving information input from the user and/or processor 420, and transmitting such information to other components of the user device 410 and/or the network 450.
- implementations of the input device 411 may include, but are not limited to, a keyboard, touchscreen, mouse, trackball, microphone, remote control, and combinations thereof, for example.
- the output device 412 may be capable of outputting information in a form perceivable by the user and/or processor 420.
- implementations of the output device 412 may include, but are not limited to, a computer monitor, a screen, a touchscreen, an audio speaker, a website, and combinations thereof, for example.
- the input device 411 and the output device 412 may be implemented as a single device, such as, for example, a computer touchscreen.
- the term “user” is not limited to a human being, and may comprise, a computer, a server, a website, a processor, a network interface, a user terminal, and combinations thereof, for example.
- the server 460 of the system 400 may include various components including, but not limited to, one or more input devices 461, one or more output devices 462, one or more processors 470, a network interface device 475 capable of interfacing with the network 450, and one or more non-transitory memories 480 for storing data structures/tables (including those of database 485) that may be used by the system 400 and particularly server 460 to perform the functions and procedures set forth herein.
- the memory 480 may also store an application/program store 481 that, when executed by the processor 470 causes the server 460 to provide the functionality of the systems and methods disclosed in the present application.
- the server 460 may include a single processor or multiple processors working together or independently to execute the program logic 481 stored in the memory 480 as described herein. It is to be understood that, in certain embodiments using more than one processor 470, the processors 470 may be located remotely from one another, located in the same location, or comprise a unitary multi-core processor. The processors 470 may be capable of reading and/or executing processor executable code and/or capable of creating, manipulating, retrieving, altering, and/or storing data structures and data tables (including those of database 485) into the memory 480.
- Exemplary embodiments of the processor 470 may include, but are not limited to, a digital signal processor (DSP), a central processing unit (CPU), a field programmable gate array (FPGA), a microprocessor, a multi-core processor, combinations thereof, and/or the like, for example.
- the processor 470 may be capable of communicating with the memory 480 via a path (e.g., data bus).
- the processor 470 may be capable of communicating with the input device 461 and/or the output device 462.
- the input device 461 of the server 460 may be capable of receiving information input from the user and/or processor 470, and transmitting such information to other components of the server 460 and/or the network 450.
- implementations of the input device 461 may include, but are not limited to, a keyboard, touchscreen, mouse, trackball, microphone, remote control, and/or the like and combinations thereof, for example.
- the input device 461 may be located in the same physical location as the processor 470, or located remotely and/or partially or completely network-based.
- the output device 462 of the server 460 may be capable of outputting information in a form perceivable by the user and/or processor 470.
- implementations of the output device 462 may include, but are not limited to, a computer monitor, a screen, a touchscreen, an audio speaker, a website, a computer, and/or the like and combinations thereof, for example.
- the output device 462 may be located with the processor 470, or located remotely and/or partially or completely network-based.
- the memory 480 stores applications or program logic 481 as well as data structures (including those of database 485) that may be used by the system 400 and particularly server 460.
- the memory 480 may be implemented as a conventional non-transitory memory, such as for example, random access memory (RAM), CD-ROM, a hard drive, a solid state drive, a flash drive, a memory card, a DVD-ROM, a disk, an optical drive, combinations thereof, and/or the like, for example.
- the memory 480 may be located in the same physical location as the server 460, and/or one or more memory 480 may be located remotely from the server 460.
- the memory 480 may be located remotely from the server 460 and communicate with the processor 470 via the network 450.
- a first memory 480a may be located in the same physical location as the processor 470, and additional memory 480n may be located in a location physically remote from the processor 470.
- the memory 480 may be implemented as a “cloud” non-transitory computer readable storage memory (i.e., one or more memory 480 may be partially or completely based on or accessed using the network 450).
- Each element of the server 460 may be partially or completely network-based or cloud-based, and may or may not be located in a single physical location.
- the terms “network-based,” “cloud-based,” and any variations thereof, are intended to include the provision of configurable computational resources on demand via interfacing with a computer and/or computer network, with software and/or data at least partially located on a computer and/or computer network.
- the server 460 may or may not be located in a single physical location.
- multiple servers 460 may or may not necessarily be located in a single physical location.
- Database 485 may comprise one or more data structures and/or data tables stored on non-transitory computer readable storage memory 480 accessible by the processor 470 of the server 460.
- the database 485 can be a relational database or a non-relational database. Examples of such databases include, but are not limited to: DB2®, Microsoft® Access, Microsoft® SQL Server, Oracle®, mySQL, PostgreSQL, MongoDB, Apache Cassandra, and the like. It should be understood that these examples have been provided for the purposes of illustration only and should not be construed as limiting the presently disclosed inventive concepts.
- the database 485 can be centralized or distributed across multiple systems.
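A relational layout for seed records can be sketched with Python's built-in SQLite module, used here only because it needs no server; it is not one of the databases listed above. The table name, columns, and rows are hypothetical and do not reproduce the actual structure of seed object 200 or database 485.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE seed_object (
        germplasm_id  TEXT PRIMARY KEY,
        protein_gebv  REAL,
        data_source   TEXT CHECK (data_source IN
                          ('observed', 'simulated', 'predicted'))
    )
    """
)

# Hypothetical records; data_source tracks where each value came from so
# that downstream models can treat observed and simulated data differently.
conn.executemany(
    "INSERT INTO seed_object VALUES (?, ?, ?)",
    [
        ("G001", 44.5, "observed"),
        ("G002", 47.0, "observed"),
        ("G003", 49.0, "simulated"),
    ],
)

# Query only physically observed records, best protein value first.
rows = conn.execute(
    "SELECT germplasm_id FROM seed_object "
    "WHERE data_source = 'observed' ORDER BY protein_gebv DESC"
).fetchall()
print(rows)
conn.close()
```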
- the teachings herein are not limited to certain plant species, and it is envisioned that they can be modified to be useful for monocots, dicots, and/or substantially any crop and/or valuable plant type, including plants that can reproduce by self-fertilization and/or cross fertilization, hybrids, inbreds, varieties, and/or cultivars thereof.
- Some example plant species include soybeans (Glycine max), peas (Pisum sativum and other members of the Fabaceae like Cajanus and Vigna species), chickpeas (Cicer arietinum), peanuts (Arachis hypogaea), lentils (Lens culinaris or Lens esculenta), lupins (various Lupinus species), mesquite (various Prosopis species), clover (various Trifolium species), carob (Ceratonia siliqua), tamarind, corn (Zea mays), Brassica sp. (e.g., B. napus, B. rapa, B. juncea), particularly those Brassica species useful as sources of seed oil, alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), camelina (Camelina sativa), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), quinoa (Chenopodium quinoa), chicory (Cichorium intybus), tomato (Solanum lycopersicum), lettuce (Lactuca sativa), safflower (Carthamus tinctorius), wheat (Triticum aestivum), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogae
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22917354.7A EP4456714A2 (en) | 2021-12-31 | 2022-12-29 | Systems and methods for accelerate speed to market for improved plant-based products |
Applications Claiming Priority (13)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163295826P | 2021-12-31 | 2021-12-31 | |
US202163295798P | 2021-12-31 | 2021-12-31 | |
US202163295664P | 2021-12-31 | 2021-12-31 | |
US202163295822P | 2021-12-31 | 2021-12-31 | |
US202163295823P | 2021-12-31 | 2021-12-31 | |
US63/295,823 | 2021-12-31 | ||
US63/295,295 | 2021-12-31 | ||
US63/295,826 | 2021-12-31 | ||
US63/295,822 | 2021-12-31 | ||
US63/295,798 | 2021-12-31 | ||
US63/295,664 | 2021-12-31 | ||
US202263326745P | 2022-04-01 | 2022-04-01 | |
US63/326,745 | 2022-04-01 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2023129653A2 (en) | 2023-07-06 |
WO2023129653A3 (en) | 2023-08-10 |
Family
ID=87002541
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2022/054252 WO2023129653A2 (en) | 2021-12-31 | 2022-12-29 | Systems and methods for accelerate speed to market for improved plant-based products |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP4456714A2 (en) |
WO (1) | WO2023129653A2 (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8049081B2 (en) * | 2008-05-13 | 2011-11-01 | Monsanto Technology Llc | Plants and seeds of hybrid corn variety CH872467 |
US11526601B2 (en) * | 2017-07-12 | 2022-12-13 | The Regents Of The University Of California | Detection and prevention of adversarial deep learning |
EP3871160A4 (en) * | 2018-10-24 | 2022-07-27 | Climate LLC | Leveraging genetics and feature engineering to boost placement predictability for seed product selection and recommendation by field |
EP3924808A4 (en) * | 2019-02-14 | 2022-10-26 | Fluence Bioengineering, Inc. | Controlled agricultural systems and methods of managing agricultural systems |
-
2022
- 2022-12-29 WO PCT/US2022/054252 patent/WO2023129653A2/en active Application Filing
- 2022-12-29 EP EP22917354.7A patent/EP4456714A2/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2023129653A3 (en) | 2023-08-10 |
EP4456714A2 (en) | 2024-11-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22917354; Country of ref document: EP; Kind code of ref document: A2 |
 | WWE | Wipo information: entry into national phase | Ref document number: 3242687; Country of ref document: CA |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | ENP | Entry into the national phase | Ref document number: 2022917354; Country of ref document: EP; Effective date: 20240731 |