WO2022203734A1 - Machine learning for predicting the properties of chemical formulations
- Publication number: WO2022203734A1
- Application number: PCT/US2021/063436
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- mixture
- property
- molecule
- predictions
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/20—Identification of molecular entities, parts thereof or of chemical compositions
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
Definitions
- the present disclosure relates generally to predicting the properties of chemical formulations using machine learning. More particularly, the present disclosure relates to property prediction using properties of molecules, concentrations, composition, and interactions.
- Mixture models in the art focus on perceptual similarity of mixtures for predictions while ignoring other factors.
- certain existing approaches focus on storing and providing human-acquired data on properties of mixtures, such as human-tasted mixtures.
- the stored data relies on human-acquired data, which can lead to subjective bias, including varying scales depending on who acquired the data.
- the method can include obtaining, by a computing system comprising one or more computing devices, respective molecule data for each of a plurality of molecules and mixture data associated with a mixture of the plurality of molecules.
- the method can include respectively processing, by the computing system, the respective molecule data for each of the plurality of molecules with a machine-learned embedding model to generate a respective embedding for each molecule.
- the method can include processing, by the computing system, the embeddings and the mixture data with a prediction model to generate one or more property predictions for the mixture of the plurality of molecules.
- the one or more property predictions can be based at least in part on the embeddings and the mixture data.
- the method can include storing, by the computing system, the one or more property predictions.
- the mixture data can describe a respective concentration of each molecule in the mixture.
- the mixture data can describe a composition of the mixture.
- the prediction model can include a deep neural network.
- the machine-learned embedding model can include a machine-learned graph neural network.
- the prediction model can include a characteristic-specific model configured to generate predictions relative to a specific characteristic.
- the one or more property predictions can be based at least in part on a binding energy of one or more molecules of the plurality of molecules.
- the one or more property predictions can include one or more sensory property predictions.
- the one or more property predictions can include an olfactory prediction.
- the one or more property predictions can include a catalytic property prediction.
- the one or more property predictions can include an energetic property prediction.
- the one or more property predictions can include a surfactant between target property prediction.
- the one or more property predictions can include a pharmaceutical property prediction.
- the one or more property predictions can include a thermal property prediction.
- the prediction model can include a weighting model configured to weight and pool the embeddings based on the mixture data, and the mixture data can include concentration data related to the plurality of molecules of the mixture.
- the method can include obtaining, by the computing system, a request from a requesting computing device for a chemical mixture with a requested property, determining, by the computing system, the one or more property predictions satisfy the requested property, and providing, by the computing system, the mixture data to the requesting computing device.
- the one or more property predictions can be based at least in part on a molecule interaction property. In some implementations, the one or more property predictions can be based at least in part on receptor activation data.
- the computing system can include one or more processors and one or more non-transitory computer readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations.
- the operations can include obtaining respective molecule data for a plurality of molecules and mixture data associated with a mixture of the plurality of molecules.
- the mixture data can include concentrations for each respective molecule of the plurality of the molecules.
- the operations can include respectively processing the respective molecule data with an embedding model for each of the plurality of molecules to generate respective embeddings for each molecule.
- the operations can include processing the embeddings and the mixture data with a machine-learned prediction model to generate one or more property predictions.
- the one or more property predictions can be based at least in part on the embeddings and the mixture data.
- the operations can include storing the one or more property predictions.
- Another example aspect of the present disclosure is directed to one or more non-transitory computer readable media that collectively store instructions that, when executed by one or more processors, cause a computing system to perform operations.
- the operations can include obtaining respective molecule data for a plurality of molecules and mixture data associated with a mixture of the plurality of molecules.
- the operations can include respectively processing the respective molecule data with an embedding model for each of the plurality of molecules to generate respective embeddings for each molecule.
- the operations can include processing the embeddings and the mixture data with a machine-learned prediction model to generate one or more property predictions.
- the one or more property predictions can be based at least in part on the embeddings and the mixture data.
- the operations can include storing the one or more property predictions.
- Figure 1A depicts a block diagram of an example computing system that performs mixture property prediction according to example embodiments of the present disclosure.
- Figure 1B depicts a block diagram of an example computing device that performs mixture property prediction according to example embodiments of the present disclosure.
- Figure 1C depicts a block diagram of an example computing device that performs mixture property prediction according to example embodiments of the present disclosure.
- Figure 2 depicts a block diagram of an example machine-learned prediction model according to example embodiments of the present disclosure.
- Figure 3 depicts a block diagram of an example property prediction model system according to example embodiments of the present disclosure.
- Figure 4 depicts a block diagram of an example property request system according to example embodiments of the present disclosure.
- Figure 5 depicts a block diagram of an example mixture property profile according to example embodiments of the present disclosure.
- Figure 6 depicts a flow chart diagram of an example method to perform mixture property prediction according to example embodiments of the present disclosure.
- Figure 7 depicts a flow chart diagram of an example method to perform property prediction and retrieval according to example embodiments of the present disclosure.
- Figure 8 depicts a flow chart diagram of an example method to perform property prediction database generation according to example embodiments of the present disclosure.
- Figure 9A depicts a block diagram of an example evolutionary approach according to example embodiments of the present disclosure.
- Figure 9B depicts a block diagram of an example reinforcement learning approach according to example embodiments of the present disclosure.
- the present disclosure is directed to systems and methods for using machine learning to predict one or more properties of a mixture of multiple chemical molecules.
- the systems and methods can leverage known properties for individual molecules, compositions, and interactions to predict properties for mixtures before the mixture is tested.
- machine-learned models can apply artificial intelligence techniques to quickly and efficiently predict the properties of the mixtures.
- the systems and methods can include obtaining molecule data for one or more molecules and mixture data associated with a mixture of the one or more molecules.
- the molecule data can include respective molecule data for each molecule of a plurality of molecules that make up a mixture.
- the mixture data can include data related to the concentration of each molecule in the mixture along with the overall composition of the mixture.
- the mixture data can describe the chemical formulation of the mixture.
- the molecule data can be processed with an embedding model to generate a plurality of embeddings. Each respective molecule data for each respective molecule may be processed with the embedding model to generate a respective embedding for each respective molecule in the mixture.
- the embeddings can include data descriptive of individual molecule properties for the embedded data.
- the embeddings can be vectors of numbers.
- the embeddings may represent graphs or molecular property descriptions.
- the embeddings and the mixture data can be processed by a prediction model to generate one or more property predictions.
- the one or more property predictions can be based at least in part on the one or more embeddings and the mixture data.
- the property predictions can include various predictions on the taste, smell, coloration, etc. of the mixture.
- the systems and methods can include storing the one or more property predictions.
- one or both of the models can include a machine-learned model.
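- as a minimal illustrative sketch of this pipeline (all names are hypothetical stand-ins, not the disclosure's implementation; the pseudo-embedding merely substitutes for a trained embedding model):

```python
import numpy as np

def embed_molecule(smiles: str) -> np.ndarray:
    # Stand-in for the machine-learned embedding model (e.g., a GNN over the
    # molecular graph): a deterministic pseudo-embedding for illustration only.
    rng = np.random.default_rng(abs(hash(smiles)) % (2**32))
    return rng.standard_normal(16)

def predict_mixture_properties(embeddings, mixture_data) -> dict:
    # Stand-in for the prediction model: pools the per-molecule embeddings
    # using the mixture's concentrations and maps the result to a property score.
    conc = np.asarray(mixture_data["concentrations"], dtype=float)
    pooled = (conc / conc.sum()) @ np.stack(embeddings)
    return {"odor_intensity": float(np.tanh(pooled).mean())}

# SMILES for ethanol and isoamyl acetate, mixed at a 2:1 concentration ratio.
mixture_data = {"molecules": ["CCO", "CC(C)CCOC(C)=O"], "concentrations": [2.0, 1.0]}
embeddings = [embed_molecule(m) for m in mixture_data["molecules"]]
print(predict_mixture_properties(embeddings, mixture_data))
```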
- Obtaining molecule data and mixture data can include receiving a request for property predictions for a mixture including the one or more molecules of the plurality of molecules.
- the request can further include concentrations for each of the one or more molecules.
- the request can include characteristic specific properties (e.g., sensory properties) or mixture properties in general.
- obtaining molecule data and mixture data can include a form of sampling, such as random sampling or category specific sampling. For example, random sampling of molecule mixtures may be implemented to catalog predictions of various mixtures.
- category specific sampling can include taking molecules in a category with known properties and sampling with molecules in another category with other known properties.
- the molecule data can be processed with an embedding model to generate a plurality of embeddings.
- Each molecule of the plurality of molecules may receive one or more respective embeddings.
- the embeddings may be property feature embeddings, which can include embedded data related to individual molecule properties.
- an embedding for a first molecule may include embedded information descriptive of the olfactory properties of that molecule.
- the embedding model can include a graph neural network that generates one or more embeddings for each respective molecule.
- the embeddings can be vectors, and the vectors can be based on processed graphs, in which the graphs describe one or more molecules.
- the one or more embeddings can be processed with the mixture data by a prediction model to generate one or more property predictions.
- the prediction model can include weighting the one or more embeddings based on the concentration of the molecule the embedding is associated with. For example, a mixture including a first molecule and a second molecule with a two to one concentration ratio may include a heavier weighting for the embedding of the first molecule as the first molecule has a higher concentration in the mixture.
- the machine-learned prediction model can include a weighting model configured to weight and pool the embeddings based on the mixture data, in which the mixture data can include concentration data related to the plurality of molecules of the mixture.
- the prediction model can be a machine-learned prediction model, and the machine-learned prediction model can include a characteristic specific model (e.g., sensory property prediction model, energetic property prediction model, thermal property prediction model, etc.).
- the one or more property predictions can be stored.
- the predictions can be stored in a database of property predictions and may be stored on a centralized server. In some implementations, the predictions may be provided to a computing device after being generated.
- the stored predictions may be organized into a mixture property prediction profile, which can include the mixture and its respective property predictions in a digestible format.
- the stored predictions may be received upon request. In some implementations, the stored predictions can be readily searchable. For example, the system can receive a request for a particular property in the form of a property search query. The system can determine if the requested property is one of the properties in the property predictions for the mixture. If the requested property is in the property predictions, the mixture information may be provided to the requestor.
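- a minimal sketch of that lookup, assuming a simple in-memory store and exact matching (the schema, property names, and matching rule are illustrative assumptions):

```python
# Each entry pairs mixture data with its stored property predictions.
prediction_db = [
    {"mixture": {"CCO": 2.0, "CC(C)CCOC(C)=O": 1.0},
     "predictions": {"odor": "banana", "intensity": "potent"}},
    {"mixture": {"CCO": 1.0},
     "predictions": {"odor": "solvent", "intensity": "mild"}},
]

def search(requested_property: str, requested_value: str) -> list:
    # Return the mixture data for every stored mixture whose predictions
    # satisfy the requested property from the property search query.
    return [entry["mixture"] for entry in prediction_db
            if entry["predictions"].get(requested_property) == requested_value]

print(search("odor", "banana"))  # -> [{'CCO': 2.0, 'CC(C)CCOC(C)=O': 1.0}]
```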
- property predictions can be based on one or more initial predictions, including, but not limited to: predicting a single molecule’s properties as a function of concentration, predicting a mixture’s properties as a function of mixture composition, and predicting a mixture’s properties when components of the mixture interact (e.g., synergistically or competitively).
- Each prediction may be generated by a separate model or by a singular model.
- the systems and methods may rely on an algorithm that is fully differentiable.
- the systems and methods may use knowledge of strong chemical inductive biases and nonconvex optimization for training their predictive models.
- the machine-learned models can be trained using gradient descent and a dataset of mixture data.
- the machine-learned prediction model may be trained with a training dataset with labeled pairings.
- the training data can include known receptor activation data.
- the systems and methods can predict the perceptual or physical properties of mixtures.
- the methods and systems can involve explicitly modeling chemically realistic equilibrium and competitive binding dynamics, where the entire algorithm can be fully differentiable.
- This implementation can allow both the use of strong chemical inductive biases, and also the full toolkit of nonconvex optimization from the field of neural networks and machine learning.
- the machine-learned prediction model can be trained for concentration dependence and modeling mixtures, which can include mixtures with competitive inhibition and mixtures with noncompetitive inhibition.
- Concentration dependence can include understanding the properties of individual molecules and factoring in and weighting the properties of individual molecules based on the concentration of each molecule in the mixture.
- Mixtures with competitive inhibition can include mixtures in which the various molecules of the mixture are competing for activating a receptor (e.g., molecules competing to activate an odor receptor). Moreover, the systems and methods can factor in that molecules with higher normalized binding energy can be more likely to trigger receptors before lower normalized binding energy molecules.
- the mixtures with competitive inhibition can be considered by the system by adding a second head to the model. One head can model the net binding energy, the other head can model the “proper substrate or competitive inhibitor” propensity score, and the two heads can be elementwise multiplied.
- the systems and methods can include an attention mechanism.
- the two headed model can factor in which molecule activates a receptor.
- Mixtures with noncompetitive inhibition can include cumulative inhibition based on a proper activation binding mode and a noncompetitively inhibiting binding mode.
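- a sketch of one way to realize the two-headed formulation above (layer shapes, the sigmoid on the propensity head, and per-receptor outputs are illustrative assumptions, not the disclosure's architecture):

```python
import torch
import torch.nn as nn

class TwoHeadReceptorModel(nn.Module):
    def __init__(self, embed_dim: int, n_receptors: int):
        super().__init__()
        # Head 1: net binding energy per receptor.
        self.binding_head = nn.Linear(embed_dim, n_receptors)
        # Head 2: "proper substrate or competitive inhibitor" propensity score.
        self.substrate_head = nn.Sequential(nn.Linear(embed_dim, n_receptors), nn.Sigmoid())

    def forward(self, molecule_embedding: torch.Tensor) -> torch.Tensor:
        # Elementwise multiply the two heads: a molecule only contributes
        # activation where it both binds strongly and acts as a proper substrate.
        return self.binding_head(molecule_embedding) * self.substrate_head(molecule_embedding)

model = TwoHeadReceptorModel(embed_dim=16, n_receptors=4)
print(model(torch.randn(2, 16)).shape)  # 2 molecules x 4 receptors
```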
- the weighting of the embeddings based on concentration can be a weighted average.
- the weighting can generate a single fixed dimensional embedding.
- the concentration can be passed through a nonlinearity.
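- a minimal sketch of that weighting, with log1p as an assumed nonlinearity (the disclosure does not fix a particular one):

```python
import numpy as np

def mixture_embedding(embeddings: np.ndarray, concentrations: np.ndarray) -> np.ndarray:
    w = np.log1p(concentrations)  # pass each concentration through a nonlinearity
    w = w / w.sum()               # normalize into weighted-average weights
    return w @ embeddings         # (n,) @ (n, d) -> a single fixed d-dim embedding

embeddings = np.random.randn(3, 16)  # three molecules, 16-dim embedding each
print(mixture_embedding(embeddings, np.array([2.0, 1.0, 0.5])).shape)  # (16,)
```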
- a weighting model can generate a weighted set of graphs.
- the graph structures of the molecules in a mixture may be passed in as a weighted set to a neural network model, and a machine learning method to handle variable-sized set input may be used to digest each molecule. For instance, methods such as set2vec may be combined with graph neural network methods.
- the graph structures of the molecules in a mixture may be embedded in a “graph of graphs,” where each node represents a molecule in the mixture.
- the edges may be constructed in an all-to-all fashion (e.g., hypothesizing that all molecule types may interact with each other) or using chemical prior knowledge to prune down the interactions between molecules that are more or less likely to occur.
- the edges may be weighted according to the likelihood of interaction.
- standard graph neural network methods may be used to pass messages both within the atoms of molecules, and between entire molecules, in an alternating fashion.
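- the sketch below illustrates the inter-molecule half of that alternating message passing on a "graph of graphs" with all-to-all edges; the single weighted-average update is an illustrative stand-in for a full graph neural network layer:

```python
import numpy as np

def inter_molecule_step(node_states: np.ndarray, edge_weights: np.ndarray) -> np.ndarray:
    # One message-passing round between whole molecules: each molecule node
    # aggregates its neighbors' states in proportion to the edge weights.
    norm = edge_weights / edge_weights.sum(axis=1, keepdims=True)
    return np.tanh(node_states + norm @ node_states)

n_molecules, dim = 3, 16
states = np.random.randn(n_molecules, dim)   # per-molecule embeddings from the intra-molecule GNN
edges = np.ones((n_molecules, n_molecules))  # all-to-all interaction hypothesis
np.fill_diagonal(edges, 0.0)                 # no self-edges; prune/reweight with chemical priors if known
print(inter_molecule_step(states, edges).shape)  # (3, 16)
```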
- the systems and methods can include a nearest neighbor interpolation.
- a nearest neighbor interpolation can include enumerating a set of N ingredients and can include representing each mixture as an N-dimensional vector. The vector can represent the proportion of each ingredient.
- a prediction for a novel mixture can involve a nearest-neighbor lookup according to some distance metric, followed by an averaging of the perceptual properties for the nearest neighbors. The averaged perceptual properties can be the predictions.
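- a minimal sketch of that baseline (Euclidean distance, k = 2, and the toy data are illustrative assumptions):

```python
import numpy as np

known_mixtures = np.array([[0.7, 0.3, 0.0],   # proportion vectors over N = 3 ingredients
                           [0.2, 0.5, 0.3],
                           [0.0, 0.1, 0.9]])
known_properties = np.array([[0.8, 0.1],      # e.g., [sweetness, intensity] per known mixture
                             [0.4, 0.6],
                             [0.1, 0.9]])

def predict_by_neighbors(novel: np.ndarray, k: int = 2) -> np.ndarray:
    dists = np.linalg.norm(known_mixtures - novel, axis=1)  # distance metric
    nearest = np.argsort(dists)[:k]                         # nearest-neighbor lookup
    return known_properties[nearest].mean(axis=0)           # average the neighbors' properties

print(predict_by_neighbors(np.array([0.6, 0.3, 0.1])))
```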
- the systems and methods can include direct molecular dynamics simulation, through a quantum mechanics based or molecular force field based approach.
- each molecule's interaction with a putative odor receptor or taste receptor can be directly modeled using specialized computers for molecular simulation, and the strength of the interaction can be measured by the simulation.
- the perceptual properties of a mixture may be modeled based on the combined interactions of all components.
- the property predictions can include sensory property predictions (e.g., olfactory properties, taste properties, color properties, etc.). Additionally and/or alternatively, the property predictions can include catalytic property predictions, energetic property predictions, surfactant between target property predictions, pharmaceutical property predictions, odor quality predictions, odor intensity predictions, color predictions, viscosity predictions, lubricant property predictions, boiling point predictions, adhesion property predictions, coloration property predictions, stability predictions, and thermal property predictions.
- the property predictions can include predictions related to properties that can be beneficial to battery design, such as how long the mixture holds a charge, how much charge the mixture can hold, discharge rate, degradation rate, stability, and overall quality.
- the systems and methods disclosed herein can be applied to generate property predictions for a variety of uses including but not limited to consumer packaged goods, flavor and fragrance, and industrial applications such as dyes, paints, lubricants, and energy applications such as battery design.
- the systems and methods described herein can be implemented by one or more computing devices. The computing device(s) can include one or more processors and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computing device to perform operations.
- the operations can include steps of various methods described herein.
- the systems and methods disclosed herein can be used for a closed-loop development process. For example, a human practitioner can utilize the systems and methods disclosed herein to predict the properties of mixtures before physically creating the mixture. In some implementations, the systems and methods can be used to generate a database of theoretical mixtures with predicted properties. A human practitioner can utilize the generated database to enable computer-aided mixture design for a desired effect.
- the database may be a searchable database that can be used to screen through all possible mixtures to identify mixtures with desired perceptual and physical properties.
- a human practitioner may be attempting to make a new, potent flowery fragrance.
- the human practitioner may provide theoretical mixture suggestions to the embedding model and machine-learned prediction model to output predicted properties of the theoretical mixtures.
- the human practitioner can use the predictions to determine whether to actually produce the mixture or continue formulating other mixtures for testing.
- in response to determining one or more mixtures are predicted to have the desired properties, the system may send instructions to a manufacturing system or a user computing system to manufacture the one or more mixtures for physical testing.
- the human practitioner may search or screen through mixtures that have already been processed by the machine-learned model(s) to generate property predictions.
- the mixtures and their respective property predictions can be stored in a database to provide ease in screening through or searching the data.
- a human practitioner can screen through the plurality of mixtures to find mixtures with property predictions that match a desired property. For example, the human practitioner attempting to make a new, potent flowery fragrance may screen through the database for a mixture predicted to have a potent smell with flowery notes.
- utilizing the systems and methods disclosed herein in a closed-loop development process can save time and reduce the cost of producing and physically testing mixtures.
- Human practitioners can screen through data with the machine-learned models to quickly eliminate a large amount of possible mixtures from the pool of possible candidates.
- the machine-learned models may predict properties that indicate candidate mixtures that may be overlooked by human practitioners due to the candidate mixtures having surprising cumulative properties.
- the systems and methods for using machine learning to predict one or more properties of a mixture of multiple chemical molecules may be used to control machinery and/or provide an alert.
- the systems and methods can be used to control manufacturing machinery to provide a safer work environment or to change the composition of a mixture to provide a desired output.
- the property prediction can be processed to determine if an alert needs to be provided.
- the property predictions may include olfactory property predictions for the scent of a vehicle used for transportation services.
- the systems and methods may output scent profile predictions, potency predictions, and scent lifetime predictions for an air freshener, a fragrance, or a candle alternative.
- the predictions can then be processed to determine when a new product should be placed in the transportation device (e.g., an autonomous vehicle) and/or whether the transportation device should undergo a cleaning routine.
- the determined new product time may then be sent as an alert to a user computing device or may be used to set up an automated purchase.
- an alert can be provided if a property prediction generated by the machine learning model indicates an unsafe environment for animals or persons present within a space. For example, an audio alert can sound in a building if a prediction of a lack of safety is generated for a mixture of chemical molecules sensed to be in the building.
- the system may intake sensor data to be input into the embedding model and prediction model to generate property predictions of the environment.
- the system may utilize one or more sensors for intaking data associated with the presence and/or concentration of molecules in the environment.
- the system can process the sensor data to generate input data for the embedding model and the prediction model to generate property predictions for the environment, which can include one or more predictions on the smell of the environment or other properties of the environment. If the predictions include a determined unpleasant odor, the system may send an alert to a user computing device to have a cleaning service completed. In some implementations, the system may bypass an alert and send an appointment request to a cleaning service upon determination of the unpleasant odor.
- Another example implementation can involve background processing and/or active monitoring for safety precautions.
- the system can document manufacturing steps completed by a user or a machine to track the predicted property of created mixtures to ensure the manufacturer is aware of any dangers.
- the new potential mixture may be processed by the embedding model and prediction model to determine the property predictions of the new mixture.
- the property predictions can include whether the new mixture is flammable, poisonous, unstable, or dangerous in any way. If the new mixture is determined to be dangerous in any way, an alert may be sent.
- the system may control one or more machines to stop and/or contain the process to protect from any potential present or future danger.
- the systems and methods can be applied to other manufacturing, industrial, or commercial systems to provide automated alerts or automated actions in response to property predictions. These applications can include new mixture creations, adjustments to recipes, counteracting measures, or real-time alerts on changes in predicted properties.
- the systems and methods of the present disclosure provide a number of technical effects and benefits.
- the systems and methods can provide property predictions for mixtures without having to individually and physically test various mixtures of molecules.
- the systems and methods can further be used to generate a database of mixtures with predicted properties that can be readily searchable for finding mixtures with certain properties to be implemented in fragrances, foods, lubricants, and so forth based on their predicted properties.
- the systems and methods can enable more accurate predictions due to consideration of both individual molecule properties and interaction properties.
- another benefit is improving the ability of a computer to perform a task (e.g., a mixture fragrance prediction).
- Another technical benefit of the systems and methods of the present disclosure is the ability to quickly and efficiently predict mixture properties, which can circumvent the need for testing mixtures with human taste tests and other physical testing applications.
- Figure 1A depicts a block diagram of an example computing system 100 that performs property predictions according to example embodiments of the present disclosure.
- the system 100 includes a user computing device 102, a server computing system 130, and a training computing system 150 that are communicatively coupled over a network 180.
- the user computing device 102 can be any type of computing device, such as, for example, a personal computing device (e.g., laptop or desktop), a mobile computing device (e.g., smartphone or tablet), a gaming console or controller, a wearable computing device, an embedded computing device, or any other type of computing device.
- the user computing device 102 includes one or more processors 112 and a memory 114.
- the one or more processors 112 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
- the memory 114 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
- the memory 114 can store data 116 and instructions 118 which are executed by the processor 112 to cause the user computing device 102 to perform operations.
- the user computing device 102 can store or include one or more prediction models 120.
- the prediction models 120 can be or can otherwise include various machine-learned models such as neural networks (e.g., deep neural networks) or other types of machine-learned models, including non-linear models and/or linear models.
- Neural networks can include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks or other forms of neural networks.
- Example prediction models 120 are discussed with reference to Figures 2, 3, and 6 - 8.
- the one or more prediction models 120 can be received from the server computing system 130 over network 180, stored in the user computing device memory 114, and then used or otherwise implemented by the one or more processors 112.
- the user computing device 102 can implement multiple parallel instances of a single prediction model 120 (e.g., to perform parallel mixture property predictions across multiple instances of mixture composition).
- the machine-learned prediction model can be trained to intake molecule data and mixture data and output property predictions for the mixture described by the mixture data.
- the molecule data may be embedded with an embedding model before being processed by a prediction model.
- one or more prediction models 140 can be included in or otherwise stored and implemented by the server computing system 130 that communicates with the user computing device 102 according to a client-server relationship.
- the prediction models 140 can be implemented by the server computing system 130 as a portion of a web service (e.g., a mixture property prediction service).
- one or more models 120 can be stored and implemented at the user computing device 102 and/or one or more models 140 can be stored and implemented at the server computing system 130.
- the user computing device 102 can also include one or more user input components 122 that receive user input.
- the user input component 122 can be a touch-sensitive component (e.g., a touch-sensitive display screen or a touch pad) that is sensitive to the touch of a user input object (e.g., a finger or a stylus).
- the touch-sensitive component can serve to implement a virtual keyboard.
- Other example user input components include a microphone, a traditional keyboard, or other means by which a user can provide user input.
- the server computing system 130 includes one or more processors 132 and a memory 134.
- the one or more processors 132 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
- the memory 134 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
- the memory 134 can store data 136 and instructions 138 which are executed by the processor 132 to cause the server computing system 130 to perform operations.
- the server computing system 130 includes or is otherwise implemented by one or more server computing devices. In instances in which the server computing system 130 includes plural server computing devices, such server computing devices can operate according to sequential computing architectures, parallel computing architectures, or some combination thereof.
- the server computing system 130 can store or otherwise include one or more machine-learned prediction models 140.
- the models 140 can be or can otherwise include various machine-learned models.
- Example machine-learned models include neural networks or other multi-layer non-linear models.
- Example neural networks include feed forward neural networks, deep neural networks, recurrent neural networks, and convolutional neural networks.
- Example models 140 are discussed with reference to Figures 2, 3, and 6 - 8.
- the user computing device 102 and/or the server computing system 130 can train the models 120 and/or 140 via interaction with the training computing system 150 that is communicatively coupled over the network 180.
- the training computing system 150 can be separate from the server computing system 130 or can be a portion of the server computing system 130.
- the training computing system 150 includes one or more processors 152 and a memory 154.
- the one or more processors 152 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
- the memory 154 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
- the memory 154 can store data 156 and instructions 158 which are executed by the processor 152 to cause the training computing system 150 to perform operations.
- the training computing system 150 includes or is otherwise implemented by one or more server computing devices.
- the training computing system 150 can include a model trainer 160 that trains the machine-learned models 120 and/or 140 stored at the user computing device 102 and/or the server computing system 130 using various training or learning techniques, such as, for example, backwards propagation of errors.
- a loss function can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function).
- Various loss functions can be used such as mean squared error, likelihood loss, cross entropy loss, hinge loss, and/or various other loss functions.
- Gradient descent techniques can be used to iteratively update the parameters over a number of training iterations.
- performing backwards propagation of errors can include performing truncated backpropagation through time.
- the model trainer 160 can perform a number of generalization techniques (e.g., weight decays, dropouts, etc.) to improve the generalization capability of the models being trained.
- the model trainer 160 can train the prediction models 120 and/or 140 based on a set of training data 162.
- the training data 162 can include, for example, labeled training data, such as molecule data with known molecule property labels, mixture data with known composition property labels, and mixture data with known interaction property labels.
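- a minimal sketch of such a training loop (the MSE loss, plain SGD, and the small feed-forward model are illustrative choices; the disclosure also contemplates other losses and architectures):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

mixture_embeddings = torch.randn(64, 16)  # stand-in labeled training inputs
property_labels = torch.randn(64, 1)      # stand-in known property labels

for step in range(100):                   # iterate gradient descent over training steps
    optimizer.zero_grad()
    loss = loss_fn(model(mixture_embeddings), property_labels)
    loss.backward()                       # backwards propagation of errors
    optimizer.step()                      # update parameters from the gradient
```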
- the training examples can be provided by the user computing device 102.
- the model 120 provided to the user computing device 102 can be trained by the training computing system 150 on user-specific data received from the user computing device 102. In some instances, this process can be referred to as personalizing the model.
- the model trainer 160 includes computer logic utilized to provide desired functionality.
- the model trainer 160 can be implemented in hardware, firmware, and/or software controlling a general purpose processor.
- the model trainer 160 includes program files stored on a storage device, loaded into a memory and executed by one or more processors.
- the model trainer 160 includes one or more sets of computer-executable instructions that are stored in a tangible computer-readable storage medium such as RAM, a hard disk, or optical or magnetic media.
- the network 180 can be any type of communications network, such as a local area network (e.g., intranet), wide area network (e.g., Internet), or some combination thereof and can include any number of wired or wireless links.
- communication over the network 180 can be carried via any type of wired and/or wireless connection, using a wide variety of communication protocols (e.g., TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g., HTML, XML), and/or protection schemes (e.g., VPN, secure HTTP, SSL).
- the input to the machine-learned model(s) of the present disclosure can be image data.
- the machine-learned model(s) can process the image data to generate an output.
- the machine-learned model(s) can process the image data to generate an image recognition output (e.g., a recognition of the image data, a latent embedding of the image data, an encoded representation of the image data, a hash of the image data, etc.).
- the machine-learned model(s) can process the image data to generate a molecular graph output, which can then be processed by the embedding model and the prediction model to generate property predictions.
- the input to the machine-learned model(s) of the present disclosure can be text or natural language data.
- the machine-learned model(s) can process the text or natural language data to generate an output.
- the machine-learned model(s) can process the natural language data to generate a search query output.
- the search query output can be processed by a search model to search for a mixture with a particular property and output one or more mixtures with that specific property.
- the machine-learned model(s) can process the text or natural language data to generate a classification output.
- the classification output can be descriptive of a mixture having one or more predicted properties.
- the machine-learned model(s) can process the text or natural language data to generate a prediction output.
- the input to the machine-learned model(s) of the present disclosure can be latent encoding data (e.g., a latent space representation of an input, etc.).
- the machine-learned model(s) can process the latent encoding data to generate an output.
- the machine-learned model(s) can process the latent encoding data to generate a recognition output.
- the machine-learned model(s) can process the latent encoding data to generate a reconstruction output.
- the machine-learned model(s) can process the latent encoding data to generate a search output.
- the machine-learned model(s) can process the latent encoding data to generate a reclustering output.
- the machine-learned model(s) can process the latent encoding data to generate a prediction output.
- the input to the machine-learned model(s) of the present disclosure can be statistical data.
- the machine-learned model(s) can process the statistical data to generate an output.
- the machine-learned model(s) can process the statistical data to generate a recognition output.
- the machine-learned model(s) can process the statistical data to generate a prediction output.
- the machine-learned model(s) can process the statistical data to generate a classification output.
- the machine-learned model(s) can process the statistical data to generate a segmentation output.
- the machine-learned model(s) can process the statistical data to generate a visualization output.
- the machine-learned model(s) can process the statistical data to generate a diagnostic output.
- the input to the machine-learned model(s) of the present disclosure can be sensor data.
- the machine-learned model(s) can process the sensor data to generate an output.
- the machine-learned model(s) can process the sensor data to generate a recognition output.
- the machine-learned model(s) can process the sensor data to generate a prediction output.
- the machine-learned model(s) can process the sensor data to generate a classification output.
- the machine-learned model(s) can process the sensor data to generate a segmentation output.
- the machine-learned model(s) can process the sensor data to generate a visualization output.
- the machine-learned model(s) can process the sensor data to generate a diagnostic output.
- in some cases, the input includes visual data and the task is a computer vision task.
- in some cases, the input includes pixel data for one or more images and the task is an image processing task.
- the image processing task can be image classification, where the output is a set of scores, each score corresponding to a different object class and representing the likelihood that the one or more images depict an object belonging to the object class.
- the image processing task may be object detection, where the image processing output identifies one or more regions in the one or more images and, for each region, a likelihood that region depicts an object of interest.
- the image processing task can be image segmentation, where the image processing output defines, for each pixel in the one or more images, a respective likelihood for each category in a predetermined set of categories.
- the set of categories can be object classes.
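- as a small illustration of such a segmentation output (the array shapes are assumptions), a per-pixel likelihood over the category set can be produced with a softmax:

```python
import numpy as np

h, w, n_categories = 4, 4, 3
logits = np.random.randn(h, w, n_categories)
# Softmax per pixel: a respective likelihood for each category at each pixel.
likelihoods = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(likelihoods.shape)  # (4, 4, 3)
```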
- Figure 1A illustrates one example computing system that can be used to implement the present disclosure.
- the user computing device 102 can include the model trainer 160 and the training dataset 162.
- the models 120 can be both trained and used locally at the user computing device 102.
- the user computing device 102 can implement the model trainer 160 to personalize the models 120 based on user-specific data.
- Figure 1B depicts a block diagram of an example computing device 10 that performs according to example embodiments of the present disclosure.
- the computing device 10 can be a user computing device or a server computing device.
- the computing device 10 includes a number of applications (e.g., applications 1 through N). Each application contains its own machine learning library and machine-learned model(s). For example, each application can include a machine-learned model.
- Example applications include a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc.
- each application can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components.
- each application can communicate with each device component using an API (e.g., a public API).
- the API used by each application is specific to that application.
- Figure 1C depicts a block diagram of an example computing device 50 that performs according to example embodiments of the present disclosure.
- the computing device 50 can be a user computing device or a server computing device.
- the computing device 50 includes a number of applications (e.g., applications 1 through N). Each application is in communication with a central intelligence layer.
- Example applications include a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc.
- each application can communicate with the central intelligence layer (and model(s) stored therein) using an API (e.g., a common API across all applications).
- the central intelligence layer includes a number of machine-learned models. For example, as illustrated in Figure 1C, a respective machine-learned model (e.g., a model) can be provided for each application and managed by the central intelligence layer. In other implementations, two or more applications can share a single machine-learned model. For example, in some implementations, the central intelligence layer can provide a single model (e.g., a single model) for all of the applications. In some implementations, the central intelligence layer is included within or otherwise implemented by an operating system of the computing device 50.
- the central intelligence layer can communicate with a central device data layer.
- the central device data layer can be a centralized repository of data for the computing device 50. As illustrated in Figure 1C, the central device data layer can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components. In some implementations, the central device data layer can communicate with each device component using an API (e.g., a private API).
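- purely as an illustrative sketch (not the disclosed implementation), a central intelligence layer exposing a common API to multiple applications might be organized as follows; all class and method names here are hypothetical:

```python
# Hypothetical sketch of a central intelligence layer that manages
# machine-learned models on behalf of multiple applications. Names and
# structure are illustrative assumptions, not the disclosed design.
from typing import Any, Callable, Dict, Optional


class CentralIntelligenceLayer:
    """Maps applications to managed machine-learned models via a common API."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[Any], Any]] = {}
        self._shared_model: Optional[Callable[[Any], Any]] = None

    def register_model(self, app_id: str, model: Callable[[Any], Any]) -> None:
        # A respective model can be provided for each application.
        self._models[app_id] = model

    def set_shared_model(self, model: Callable[[Any], Any]) -> None:
        # Alternatively, a single shared model can serve all applications.
        self._shared_model = model

    def predict(self, app_id: str, inputs: Any) -> Any:
        # Common API across applications: prefer an app-specific model,
        # fall back to the shared model when none is registered.
        model = self._models.get(app_id) or self._shared_model
        if model is None:
            raise KeyError(f"no model available for application {app_id!r}")
        return model(inputs)
```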
- the systems and methods can include graph neural networks (GNN) and deep neural networks (DNN) for processing data.
- the systems and methods can factor in the normalized binding energy (NBE) and concentration of the molecules in the mixture to better understand the mixture and how the mixture may act.
- Graph neural networks (GNN), deep neural networks (DNN), and normalized binding energy (NBE) may be denoted by their respective acronyms, and the concentration of X may be denoted as [X].
- the system can include factoring in concentration dependence into the prediction followed by modeling the mixture as a whole.
- for example, a molecule embedding can be generated by a GNN: molecule_embedding = GNN(molecule)
- the perceptual odor response can then be predicted by a DNN: perceptual_odor_response = DNN(molecule_embedding)
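- as a non-authoritative sketch, the two functions above can be composed into a two-stage model; the architectures, layer sizes, and class names below are assumptions for exposition, not the disclosed implementation:

```python
# Minimal sketch of the two-stage pipeline: a GNN embeds each molecule,
# then a DNN maps the embedding to a perceptual odor response.
# Dimensions and architectures are illustrative assumptions.
import torch
import torch.nn as nn


class SimpleGNNEmbedder(nn.Module):
    """Toy stand-in for GNN(molecule): message passing over an adjacency matrix."""

    def __init__(self, atom_dim: int = 16, embed_dim: int = 64, steps: int = 3):
        super().__init__()
        self.update = nn.Linear(atom_dim, atom_dim)
        self.readout = nn.Linear(atom_dim, embed_dim)
        self.steps = steps

    def forward(self, atom_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = atom_feats                            # (num_atoms, atom_dim)
        for _ in range(self.steps):
            h = torch.relu(self.update(adj @ h))  # aggregate neighbor features
        return self.readout(h.mean(dim=0))        # pool atoms -> molecule embedding


class OdorDNN(nn.Module):
    """Toy stand-in for DNN(molecule_embedding) -> perceptual odor response."""

    def __init__(self, embed_dim: int = 64, num_labels: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, num_labels)
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(embedding))  # per-label odor probabilities
```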
- the systems and methods can determine a proper substrate score and/or generate feature vectors to aid in modeling mixtures and generating property predictions.
- Inhibition of molecules can be factored into the predictions, in some implementations.
- the systems and methods can determine inhibition data related to the normalized binding energy through a similar process to determining the normalized binding energy of a molecule.
- each perceptual odor response function and models may be factored into the overall property predictions for the mixtures. For example, concentration dependence, mixtures with competitive inhibition, and mixtures with noncompetitive inhibition may be factored into the overall machine-learned prediction model using various functions, architectures, and models.
- the systems and methods may include a specialized framework for processing the molecules individually to determine the individual properties of the molecules with the embedding model, or a first machine-learned model.
- These systems and methods may include or otherwise leverage machine-learned models (e.g., graph neural networks) in conjunction with molecule chemical structure data to predict one or more perceptual (e.g., olfactory, gustatory, tactile, etc.) properties of a molecule.
- the systems and methods can predict the olfactory properties (e.g., humanly-perceived odor expressed using labels such as “sweet,” “piney,” “pear,” “rotten,” etc.) of a single molecule based on the chemical structure of the molecule.
- a machine-learned graph neural network can be trained and used to process a graph that graphically describes the chemical structure of a molecule to predict olfactory properties of the molecule.
- the graph neural network can operate directly upon the graph representation of the chemical structure of the molecule (e.g., perform convolutions within the graph space) to predict the olfactory properties of the molecule.
- the graph can include nodes that correspond to atoms and edges that correspond to chemical bonds between the atoms.
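- a minimal sketch of constructing such an atom/bond graph, assuming the RDKit library is available (the node featurization here is deliberately simplistic and illustrative):

```python
# Sketch: build a node/edge graph (atoms as nodes, bonds as edges) from a
# SMILES string using RDKit. Feature choices here are illustrative only.
from rdkit import Chem
import numpy as np


def smiles_to_graph(smiles: str):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Node features: atomic number of each atom (toy featurization).
    nodes = np.array([atom.GetAtomicNum() for atom in mol.GetAtoms()], dtype=float)
    # Edges: one entry per chemical bond, as (begin_atom, end_atom) pairs.
    edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
    # Symmetric adjacency matrix over atoms.
    adj = np.zeros((mol.GetNumAtoms(), mol.GetNumAtoms()))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return nodes, adj


nodes, adj = smiles_to_graph("CCO")  # ethanol: 3 heavy atoms, 2 bonds
```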
- the individual-molecule machine-learned models can be trained, for example, using training data that includes descriptions of molecules (e.g., structural descriptions of molecules, graph-based descriptions of chemical structures of molecules, etc.) that have been labeled (e.g., manually by an expert) with descriptions of olfactory properties (e.g., textual descriptions of odor categories such as “sweet,” “piney,” “pear,” “rotten,” etc.) that have been assessed for the molecules.
- the first machine-learned model may use graph neural networks for quantitative structure-odor relationship (QSOR) modeling. Learned embeddings from graph neural networks capture a meaningful odor space representation of the underlying relationship between structure and odor.
- the relationship between a molecule’s structure and its olfactory perceptual properties is complex, and, to date, generally little is known about such relationships.
- the systems and methods of the present disclosure provide for the use of deep learning and underutilized data sources to obtain predictions of olfactory perceptual properties of unseen molecules, thus allowing for improvements in the identification and development of molecules having desired perceptual properties, for example, allowing for development of new compounds useful in commercial flavor, fragrance, or cosmetics products, improving expertise in prediction of drug psychoactive effects from single molecules, and/or the like.
- machine-learned models, such as graph neural network models, can be used to predict perceptual properties (e.g., olfactory properties, gustatory properties, tactile properties, etc.) of molecules.
- a machine-learned model may be provided with an input graph structure of a molecule’s chemical structure, for example, based on a standardized description of a molecule’s chemical structure (e.g., a simplified molecular-input line-entry system (SMILES) string, etc.).
- the machine-learned model may provide output comprising a description of predicted perceptual properties of the molecule, such as, for example, a list of olfactory perceptual properties descriptive of what the molecule would smell like to a human.
- the systems and methods, in response to receipt of a SMILES string or other description of chemical structure, can convert the string to a graph structure that graphically describes the two-dimensional structure of a molecule and can provide the graph structure to a machine-learned model (e.g., a trained graph convolutional neural network and/or other type of machine-learned model) that can predict one or more perceptual properties of the molecule.
- systems and methods could provide for creating a three-dimensional graph representation of the molecule, for example using quantum chemical calculations, for input to a machine-learned model.
- the prediction can indicate whether or not the molecule has a particular desired olfactory perceptual quality (e.g., a target scent perception, etc.).
- the prediction data can include one or more types of information associated with a predicted olfactory property of a molecule.
- prediction data for a molecule can provide for classifying the molecule into one olfactory property class and/or into multiple olfactory property classes.
- the classes can include human-provided (e.g., experts) textual labels (e.g., sour, cherry, piney, etc.).
- the classes can include non-textual representations of scent/odor, such as a location on a scent continuum or the like.
- prediction data for molecules can include intensity values that describe the intensity of the predicted scent/odor.
- prediction data can include confidence values associated with the predicted olfactory perceptual property.
- prediction data can include a numerical embedding that allows for similarity search, clustering, or other comparisons between two or more molecules based on a measure of distance between two or more embeddings.
- the machine-learned model can be trained to output embeddings that can be used to measure similarity by training the machine-learned model using a triplet training scheme where the model is trained to output embeddings that are closer in the embedding space for a pair of similar chemical structures (e.g., an anchor example and a positive example) and to output embeddings that are more distant in the embedding space for a pair of dissimilar chemical structures (e.g., the anchor and a negative example).
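- a minimal sketch of one such triplet training step is shown below; `embed_model` is assumed to map a molecule graph (node features and adjacency matrix) to an embedding, and the margin value is an illustrative choice:

```python
# Sketch of the triplet training scheme described above: embeddings for an
# anchor and a positive (similar structure) are pulled together, while the
# anchor and a negative (dissimilar structure) are pushed apart.
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0)


def triplet_step(embed_model, optimizer, anchor_graph, positive_graph, negative_graph):
    optimizer.zero_grad()
    a = embed_model(*anchor_graph)    # embedding of the anchor molecule
    p = embed_model(*positive_graph)  # embedding of a similar molecule
    n = embed_model(*negative_graph)  # embedding of a dissimilar molecule
    # Add a batch dimension of 1 for the loss function.
    loss = triplet_loss(a.unsqueeze(0), p.unsqueeze(0), n.unsqueeze(0))
    loss.backward()
    optimizer.step()
    return loss.item()
```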
- the outputs of these models may be configured to be processed by a second machine-learned model for predicting the properties of a mixture of various molecules.
- the systems and methods of the present disclosure may not necessitate the generation of feature vectors descriptive of the molecule for input to a machine-learned model. Rather, the machine-learned model can be provided directly with the input of a graph-valued form of the original chemical structure, thus reducing the resources required to make olfactory property predictions.
- by using the graph structure of molecules as input to the machine-learned model, new molecule structures can be conceptualized and evaluated without requiring the experimental production of such molecule structures to determine perceptual properties, thereby greatly accelerating the ability to evaluate new molecular structures and saving significant resources.
- training data including a plurality of known molecules can be obtained to provide for training one or more machine-learned models (e.g., a graph convolutional neural network, other type of machine-learned model) to provide predictions of olfactory properties of molecules.
- the machine-learned models can be trained using one or more datasets of molecules, where the dataset can include the chemical structure and a textual description of the perceptual properties (e.g., descriptions of the smell of the molecule provided by human experts, etc.) for each molecule.
- the training data can be derived from industry lists such as, for example, publicly available perfume industry lists of chemical structures and their corresponding odors.
- steps can be taken to balance out common perceptual properties and rare perceptual properties when training the machine-learned model(s).
- the systems and methods may provide for indications of how changes to a molecule structure could affect the predicted perceptual properties. These changes may be later processed by the second machine-learned model to generate an interaction property prediction, which can be used to generate an overall mixture property prediction. For example, the systems and methods could provide indications of how changes to the molecule structure may affect the intensity of a particular perceptual property, how catastrophic a change in the molecule’s structure would be to desired perceptual qualities, and/or the like.
- the systems and methods may provide for adding and/or removing one or more atoms and/or groups of atoms from a molecule’s structure to determine the effect of such addition/removal on one or more desired perceptual properties. For example, iterative and different changes to the chemical structure can be performed and then the result can be evaluated to understand how such change would affect the perceptual properties of the molecule.
- a gradient of the classification function of the machine-learned model can be evaluated (e.g., with respect to a particular label) at each node and/or edge of the input graph (e.g., via backpropagation through the machine-learned model) to generate a sensitivity map (e.g., that indicates how important each node and/or edge of the input graph was for output of such particular label).
- a graph of interest can be obtained, similar graphs can be sampled by adding noise to the graph, and then the average of the resulting sensitivity maps for each sampled graph can be taken as the sensitivity map for the graph of interest. Similar techniques can be performed to determine perceptual differences between different molecule structures.
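- a sketch of this noise-averaged gradient procedure (similar in spirit to SmoothGrad) is shown below; `model` is assumed to map graph inputs to a vector of label scores, and the sample count and noise scale are illustrative:

```python
# Sketch of the noise-averaged sensitivity map described above: sample noisy
# copies of the input node features, take the gradient of one output label
# with respect to those features, and average the absolute gradients.
import torch


def sensitivity_map(model, atom_feats, adj, label_idx, n_samples=25, sigma=0.1):
    grads = torch.zeros_like(atom_feats)
    for _ in range(n_samples):
        # Perturb the graph's node features with Gaussian noise.
        noisy = (atom_feats + sigma * torch.randn_like(atom_feats)).requires_grad_(True)
        score = model(noisy, adj)[label_idx]  # scalar score for the label of interest
        score.backward()                      # backpropagate to the input graph
        grads += noisy.grad.abs()             # importance of each node feature
    return grads / n_samples                  # averaged sensitivity map
```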
- the systems and methods can provide for interpreting and/or visualizing which aspects of a molecule’s structure most contribute to its predicted odor quality.
- a heat map can be generated to overlay the molecule structure that provides indications of which portions of a molecule’s structure are most important to the perceptual properties of the molecule and/or which portions of a molecule’s structure are less important to the perceptual properties of the molecule.
- data indicative of how changes to a molecule structure would impact olfactory perception can be used to generate visualizations of how the structure contributes to a predicted olfactory quality.
- for example, iterative changes to the molecule’s structure (e.g., a knock-down technique, etc.) can be used to generate such data.
- a gradient technique can be used to generate a sensitivity map for the chemical structure, which can then be used to produce the visualization (e.g., in the form of a heat map).
- machine-learned model(s) may be trained to produce predictions of a molecule chemical structure that would provide one or more desired perceptual properties (e.g., generate a molecule chemical structure that would produce a particular scent quality, etc.).
- an iterative search can be performed to identify proposed molecule(s) that are predicted to exhibit one or more desired perceptual properties (e.g., targeted scent quality, intensity, etc.).
- an iterative search can propose a number of candidate molecule chemical structures that can be evaluated by the machine-learned model(s).
- candidate molecule structures can be generated through an evolutionary or genetic process.
- candidate molecule structures can be generated by a reinforcement learning agent (e.g., recurrent neural network) that seeks to learn a policy that maximizes a reward that is a function of whether the generated candidate molecule structures exhibit the one or more desired perceptual properties.
- a plurality of candidate molecule graph structures that describe the chemical structure of each candidate molecule can be generated (e.g., iteratively generated) for use as input to a machine-learned model.
- the graph structure for each candidate molecule can be input to the machine-learned model to be evaluated.
- the machine-learned model can produce prediction data for each candidate molecule or a group of molecules that describes one or more perceptual properties of the one or more candidate molecules.
- the candidate molecule prediction data can then be compared to the one or more desired perceptual properties to determine if the candidate molecule(s) would exhibit desired perceptual properties (e.g., a viable molecule candidate, etc.).
- the comparison can be performed to generate a reward (e.g., in a reinforcement learning scheme) or to determine whether to retain or discard the candidate molecule (e.g., in an evolutionary learning scheme).
- Brute force search approaches may also be employed.
- the search for candidate molecules that exhibit the one or more desired perceptual properties can be structured as a multi-parameter optimization problem with a constraint on the optimization defined for each desired property.
- the systems and methods may provide for predicting, identifying, and/or optimizing other properties associated with a molecule structure along with desired olfactory properties.
- the machine-learned model(s) may predict or identify properties of molecule structures such as optical properties (e.g., clarity, reflectiveness, color, etc.), gustatory properties (e.g., tastes like “banana,” “sour,” “spicy,” etc.), shelf-stability, stability at particular pH levels, biodegradability, toxicity, industrial applicability, and/or the like.
- the machine-learned models described herein can be used in active learning techniques to narrow a wide field of candidates to a smaller set of molecules or mixtures that are then manually evaluated.
- systems and methods can allow for synthesis of molecules, and/or mixtures, with particular properties in an iterative design-test-refine process. For example, based on prediction data from the machine-learned models, molecules or mixtures can be proposed for development. The molecules or mixtures can then be synthesized, and then can be subjected to specialized testing. Feedback from the testing can then be provided back to the design phase to refine the molecules to better achieve desired properties, etc.
- some property predictions may be determined based on a first determined property prediction.
- the secondary determined property predictions can be determined by utilizing known transfer properties and a non-learned general purpose descriptor (e.g., SMILES string, Morgan fingerprint, Dragon descriptor, etc.). These descriptors are generally intended to “featurize” a molecule, rather than convey complicated structural interrelations. For instance, some existing approaches featurize or represent the molecule with general purpose heuristic features, such as Morgan fingerprints or Dragon descriptors. However, general purpose featurization strategies often do not highlight the important information related to specific tasks, such as predicting the olfactory or other sensory properties of molecules in a given species.
- Morgan fingerprints are generally designed for “lookup” of similar molecules. Morgan fingerprints generally do not include spatial arrangement of a molecule. While this information can nonetheless be useful, it may be insufficient alone, in some design cases, such as olfactory cases which may benefit from spatial understanding. Despite this, a scratch-trained model with a low amount of available training data is unlikely to beat a Morgan fingerprint model.
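- for concreteness, a Morgan fingerprint baseline of the kind described above might be computed with RDKit as follows; the radius and bit width are common defaults, not values prescribed by this disclosure:

```python
# Sketch: computing Morgan fingerprints for similarity "lookup", the
# general-purpose baseline discussed above.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

mol_a = Chem.MolFromSmiles("CCO")   # ethanol
mol_b = Chem.MolFromSmiles("CCCO")  # 1-propanol
fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, 2, nBits=2048)
fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, 2, nBits=2048)
# Tanimoto similarity supports nearest-neighbor lookup, but the bit vector
# carries no explicit spatial arrangement of the molecule.
print(DataStructs.TanimotoSimilarity(fp_a, fp_b))
```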
- Another existing approach is physics-based modeling of sensory properties. For instance, physics-based modeling can include computational modeling of sensory (e.g., olfactory) receptors or sensory-related (e.g., olfactory-related) proteins.
- a base (such as ethanol, plastic, shampoo, soap, fabric, etc.) in which scented chemicals are carried can affect the perceived smell of the chemical.
- the same chemical may be perceived differently in an ethanol base compared to, for example, a soap base.
- a machine-learned sensory prediction model may be trained on a first sensory prediction task and used to output predictions associated with a second sensory prediction task.
- the first sensory prediction task may be a broader sensory prediction task than the second sensory prediction task.
- the model may be trained on a broad task and transferred to a narrow task.
- the first task may be a broad property task
- the second task may be a specific property task (e.g., olfactory).
- the first sensory prediction task may be a task for which a larger amount of training data is available than for the second sensory prediction task.
- the first sensory prediction task may be associated with a first species and the second sensory prediction task may be associated with a second species.
- the first sensory prediction task may be a human olfactory task.
- the second sensory prediction task may be a pest control task, such as a mosquito repellent task.
- a sensory embedding model can be trained to produce a sensory embedding for the first sensory prediction task.
- the sensory embedding can be learned from the first sensory prediction task, such as from a larger available dataset, such that the sensory embedding is specific to the first prediction task (e.g., a broader task).
- this sensory embedding can capture useful information for other (e.g., narrower) sensory prediction tasks.
- this sensory embedding can be transferred, fine-tuned, or otherwise modified to produce accurate predictions in another domain for the second sensory prediction task that has less available data than the first sensory prediction task, such as a task where machine learning or accurate prediction would otherwise be difficult and/or impossible.
- a sensory embedding model can be trained in tandem with a first prediction task model.
- the sensory embedding model and the first prediction task model can be trained using (e.g., labeled) first prediction task training data for the first prediction task.
- the sensory embedding model can be trained to produce sensory embeddings with respect to the first prediction task. These sensory embeddings can capture information that is useful in the second prediction task.
- the sensory embedding model can be used with a second prediction task model to output predictions associated with the second prediction task.
- the sensory embedding model can further be refined, fine-tuned, or otherwise continually trained on second prediction task training data associated with the second prediction task.
- the model may be trained at a lower training rate for the second prediction task than for the first prediction task, to prevent un-learning the information learned from the first prediction task.
- an amount of second prediction task training data may be less than an amount of first prediction task training data, such as if there is less available data for the second prediction task than for the first prediction task.
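- a sketch of this two-stage transfer recipe is shown below; all dimensions, label counts, and learning rates are illustrative assumptions:

```python
# Sketch of the transfer recipe above: pretrain on the broad first task,
# then fine-tune on the narrow second task with a lower learning rate for
# the shared embedder so first-task knowledge is not unlearned.
import torch
import torch.nn as nn

embed_dim = 64
embedder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, embed_dim))
head_task1 = nn.Linear(embed_dim, 138)  # broad task head (e.g., human odor labels)
head_task2 = nn.Linear(embed_dim, 1)    # narrow task head (e.g., repellency score)

# Stage 1: train embedder + first-task head on the larger dataset.
opt_stage1 = torch.optim.Adam(
    list(embedder.parameters()) + list(head_task1.parameters()), lr=1e-3
)

# Stage 2: fine-tune on the smaller second-task dataset; the shared
# embedder receives a much lower learning rate than the fresh task head.
opt_stage2 = torch.optim.Adam([
    {"params": embedder.parameters(), "lr": 1e-5},
    {"params": head_task2.parameters(), "lr": 1e-3},
])
```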
- the machine-learned models can be trained, for example, using training data that includes descriptions of molecules and/or mixtures (e.g., structural descriptions of molecules, graph-based descriptions of chemical structures of molecules, etc.) for a first sensory prediction task, such as molecules that have been labeled (e.g., manually by an expert) with descriptions of sensory properties (e.g., olfactory properties) (e.g., textual descriptions of odor categories such as “sweet,” “piney,” “pear,” “rotten,” etc.) that have been assessed for the molecules.
- these descriptions of olfactory molecules may relate to, for example, human perception.
- These models can then be used for a second sensory prediction task that is different from the first sensory prediction task.
- the second sensory prediction task may relate to non-human perception.
- the model is transferred across different species’ perceptual properties of molecules.
- the sensory embeddings can provide a significant boost to prediction quality when transfer learning across species for sensory (e.g., olfactory) prediction tasks. Beyond even in-domain transfer learning, these sensory embeddings can provide improved performance for even more disparate qualities, such as cross-species perception. This is especially unexpected in the chemical domain.
- the sensory embeddings may be taken directly as input at a second prediction task model. The sensory embedding model may then be fine-tuned and trained on the second sensory prediction task.
- the second sensory prediction task and the first sensory prediction task need not be overly similar. For instance, prediction tasks having sufficient distinction (e.g., cross-species, cross domain, etc.) may nonetheless find benefit according to example aspects of the present disclosure.
- some example aspects of the present disclosure are directed to proposing the use of neural networks, such as graph neural networks, for olfactory, gustatory, and/or other sensory modeling across distinct domains, such as quantitative structure-odor relationship (QSOR) modeling.
- Graph neural networks can represent spatial information, which can be important for olfactory and/or other sensory modeling.
- Example implementations of the systems and methods described herein significantly outperform prior methods on a novel data set labeled by olfactory experts.
- the learned sensory embeddings from graph neural networks capture a meaningful odor space representation of the underlying relationship between structure and odor.
- These learned sensory embeddings can unexpectedly be applied to domains other than the domain for which the model used to generate the sensory embedding is learned.
- a model trained on human sensory perception data may unexpectedly achieve desirable results outside of the human sensory perception domain, such as other species’ perception and/or other domains.
- the use of graph neural networks can provide spatial understanding to the model that is beneficial for sensory modeling applications.
- prediction for a first prediction task and/or the second prediction task can indicate whether or not the molecule has a particular desired sensory quality (e.g., a target scent perception, etc.).
- the prediction data can include one or more types of information associated with a predicted sensory property (e.g., olfactory property) of a molecule.
- prediction data for a molecule can provide for classifying the molecule into one sensory property (e.g., olfactory property) class and/or into multiple sensory property (e.g., olfactory property) classes.
- the classes can include human-provided (e.g., experts) textual labels (e.g., sour, cherry, piney, etc.).
- the classes can include non-textual representations of scent/odor, such as a location on a scent continuum or the like.
- prediction data for molecules can include intensity values that describe the intensity of the predicted scent/odor.
- prediction data can include confidence values associated with the predicted olfactory perceptual property.
- the prediction data may be descriptive of how well the molecule will perform at a particular task (e.g., a pest control task).
- prediction data can include a numerical sensory embedding that allows for similarity search, clustering, or other comparisons between two or more molecules based on a measure of distance between two or more sensory embeddings.
- the machine-learned model can be trained to output sensory embeddings that can be used to measure similarity by training the machine-learned model using a triplet training scheme where the model is trained to output sensory embeddings that are closer in the sensory embedding space for a pair of similar chemical structures (e.g., an anchor example and a positive example) and to output sensory embeddings that are more distant in the sensory embedding space for a pair of dissimilar chemical structures (e.g., the anchor and a negative example).
- these output sensory embeddings may be used even in dissimilar tasks such as cross-species tasks.
- training data including a plurality of known molecules can be obtained to provide for training one or more machine-learned models (e.g., a graph convolutional neural network, other type of machine-learned model) to provide predictions of sensory properties (e.g., olfactory properties) of molecules.
- the machine-learned models can be trained using one or more datasets of molecules, where the dataset includes the chemical structure and a textual description of the perceptual properties (e.g., descriptions of the smell of the molecule provided by human experts, etc.) for each molecule.
- the training data can be derived from publicly available data such as, for example, publicly available lists of chemical structures and their corresponding odors.
- the training data may be provided for a first sensory prediction task, where the training data is more widely available than for a second sensory prediction task that is an overall objective of the model.
- the model may then be retrained for the second sensory prediction task on a (limited) amount of training data for the second sensory prediction task and/or used as-is for the second sensory prediction task without further training.
- the systems and methods may provide for indications of how changes to a molecule structure could affect the predicted perceptual properties (e.g., for the second prediction task).
- the systems and methods could provide indications of how changes to the molecule structure may affect the intensity of a particular perceptual property, how catastrophic a change in the molecule’s structure would be to desired perceptual qualities, and/or the like.
- the systems and methods may provide for adding and/or removing one or more atoms and/or groups of atoms from a molecule’s structure to determine the effect of such addition/removal on one or more desired perceptual properties.
- a gradient of the classification function of the machine-learned model can be evaluated (e.g., with respect to a particular label) at each node and/or edge of the input graph (e.g., via backpropagation through the machine-learned model) to generate a sensitivity map (e.g., that indicates how important each node and/or edge of the input graph was for output of such particular label).
- a graph of interest can be obtained, similar graphs can be sampled by adding noise to the graph, and then the average of the resulting sensitivity maps for each sampled graph can be taken as the sensitivity map for the graph of interest. Similar techniques can be performed to determine perceptual differences between different molecular structures.
- the systems and methods of the present disclosure can provide for interpreting and/or visualizing which aspects of a molecule’s structure most contribute to a predicted sensory quality (e.g., for the second prediction task).
- a heat map could be generated to overlay the molecule structure that provides indications of which portions of a molecule’s structure are most important to the perceptual properties of the molecule and/or which portions of a molecule’s structure are less important to the perceptual properties of the molecule.
- data indicative of how changes to a molecule structure would impact olfactory perception can be used to generate visualizations of how the structure contributes to a predicted olfactory quality.
- for example, iterative changes to the molecule’s structure (e.g., a knock-down technique, etc.) can be used to generate such data.
- a gradient technique can be used to generate a sensitivity map for the chemical structure, which can then be used to produce the visualization (e.g., in the form of a heat map).
- the machine-learned model(s) may be trained to produce predictions of a molecule chemical structure or a mixture chemical formulation that would provide one or more desired perceptual properties (e.g., generate a molecule chemical structure that would produce a particular scent quality, etc.).
- an iterative search can be performed to identify proposed molecule(s) or mixtures that are predicted to exhibit one or more desired perceptual properties (e.g., targeted scent quality, intensity, etc.).
- an iterative search can propose a number of candidate molecule chemical structures or mixture chemical formulations that can be evaluated by the machine-learned model(s).
- candidate molecule structures can be generated through an evolutionary or genetic process.
- candidate molecule structures can be generated by a reinforcement learning agent (e.g., recurrent neural network) that seeks to learn a policy that maximizes a reward that is a function of whether the generated candidate molecule structures exhibit the one or more desired perceptual properties.
- this perceptual property analysis can be related to a second sensory prediction task that is different from the first sensory prediction task.
- the systems and methods may provide for predicting, identifying, and/or optimizing other properties associated with a molecule structure along with desired sensory properties (e.g., olfactory properties).
- the machine-learned model(s) may predict or identify properties of molecule structures such as optical properties (e.g., clarity, reflectiveness, color, etc.), olfactory properties (e.g., scents reminiscent of fruits, flowers, etc.), gustatory properties (e.g., tastes like “banana,” “sour,” “spicy,” etc.), shelf-stability, stability at particular pH levels, biodegradability, toxicity, industrial applicability, and/or the like for a second sensory prediction task that is different from a first sensory prediction task on which the model(s) were earlier trained.
- the machine-learned models can be used in active learning techniques to narrow a wide field of candidates to a smaller set of molecules or mixtures that are then manually evaluated.
- the systems and methods can allow for synthesis of molecules or mixtures with particular properties in an iterative design-test-refine process. For example, based on prediction data from the machine-learned models, mixtures can be proposed for development. The mixtures can then be formulated, and then can be subjected to specialized testing. Feedback from the testing can then be provided back to the design phase to refine the mixtures to better achieve desired properties, etc. For example, results from the testing can be used as training data to re-train the machine-learned model.
- predictions from the model can then again be used to identify certain molecules or mixtures for testing.
- an iterative pipeline can be evaluated where a model is used to select candidates and then testing results for the candidates can be used to re-train the model, and so on.
- a model is trained using a large amount of human perceptual data, which may be readily available as training data.
- the model is then transferred to an at least somewhat related chemical problem, such as predicting whether a molecule or mixture will be a good mosquito repellent, discovering a new flavor molecule, etc.
- the model (e.g., a neural network) can also be packaged into a standalone molecule embedding tool for generating representations that focus on olfactory-related problems. These representations can be used to search for odors that smell similar or trigger similar behavior in animals.
- the embedding space described herein can additionally be useful as a codec for designing electronic scent perception systems (e.g., “electronic noses”).
- certain sensory properties can be desirable for animal attractant and/or repellent tasks.
- the first sensory prediction task can be a human sensory task, such as human olfactory task, a human gustatory task, etc., based on chemical structure of a molecule or mixture.
- the first sensory property can be human perception properties, such as human olfactory perceptual properties and/or human gustatory perceptual properties.
- the second sensory prediction task can be a nonhuman sensory task, such as a related sensory task for another species.
- the second sensory prediction task can additionally and/or alternatively be or include performance of the molecule as an attractant and/or repellent for a certain species.
- the properties may indicate performance of the molecule at attracting a desired species (e.g., for incorporation into animal food, etc.), or repelling undesired species (e.g., an insect repellent).
- this can include pest control applications, such as mosquito repellent, insecticides, etc.
- mosquito repellent may serve to repel mosquitoes and prevent bites contributing to transmission of viruses and diseases.
- services or technologies that relate to human and/or animal olfactory systems could potentially benefit from the systems and methods described herein in various implementations.
- Example implementations can include, for example, approaches for finding suitable odors for insect repellent or other pest control, such as repellent for mosquitoes, pests that affect crop health, livestock health, personal health, building/infrastructure health, and/or other suitable pests.
- systems and methods described herein may be useful for designing a repellent, insecticide, attractant, etc.
- the first sensory prediction task can be a sensory prediction task related to a human sense, such as a human olfactory task of predicting human olfactory perception labels based on molecular structure data.
- the second sensory prediction task may include predicting performance of molecules at repelling another species, such as mosquitoes.
- Figure 2 depicts a block diagram of an example property prediction system 200 according to example embodiments of the present disclosure.
- the property prediction system 200 is trained to receive a set of input data 202, 204, 206, and 208 descriptive of molecules in a mixture and, as a result of receipt of the input data 202, 204, 206, and 208, generate one or more property predictions 216 descriptive of one or more properties of the mixture.
- the property prediction system 200 can include one or more embedding model(s) 212 that are operable to generate molecule embeddings, and a machine-learned prediction model 214 that is operable to generate one or more property predictions 216.
- the property prediction system 200 can include two-stage processing of input data to generate one or more property predictions 216.
- the input data can include molecule data with respective molecule data 202, 204, 206, and 208 for each molecule in a mixture, in which the molecule data can be descriptive of an N number of molecules, and mixture data 210 descriptive of the composition of a mixture of the N number of molecules.
- the system 200 can process the molecule data with one or more embedding model(s) 212 to generate one or more embeddings to be processed by the machine-learned prediction model 214.
- the embedding model 212 can include a graph neural network (GNN) to generate one or more graphs.
- the molecule data can be processed such that the respective molecule data related to each individual molecule can be processed separately such that each embedding can represent a singular molecule.
- the embeddings and the mixture data 210 can be processed by the machine-learned prediction model 214 to generate one or more property predictions 216.
- the machine-learned prediction model 214 can include a deep neural network and/or various other architectures.
- the property predictions 216 can include various predictions related to various properties associated with the mixture.
- the property predictions 216 may include sensory property predictions, such as an olfactory property prediction to later be used for creating a fragrance.
- the first molecule 202, the second molecule 204, the third molecule 206, ... , and the nth molecule 208 can be of the same or different concentrations in the theorized mixture.
- the system may weight the one or more embeddings based on concentration of the molecules. The weighting can be completed by the embedding model 212, the machine-learned prediction model 214, and/or a third separate weighting model.
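- one plausible form of such concentration-based weighting is sketched below; the normalization scheme is an assumption, not a prescribed method:

```python
# Sketch of concentration-based weighting: each molecule embedding is
# scaled by its concentration [X] in the mixture before pooling into a
# single mixture representation.
import torch


def pool_mixture(embeddings: torch.Tensor, concentrations: torch.Tensor) -> torch.Tensor:
    """embeddings: (num_molecules, embed_dim); concentrations: (num_molecules,)."""
    weights = concentrations / concentrations.sum()         # normalize to fractions
    return (weights.unsqueeze(1) * embeddings).sum(dim=0)   # weighted sum pooling


# Example: three molecule embeddings pooled by their mixture fractions.
mix_embedding = pool_mixture(torch.randn(3, 64), torch.tensor([0.5, 0.3, 0.2]))
```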
- Figure 3 depicts a block diagram of an example property prediction system 300 according to example embodiments of the present disclosure.
- the property prediction system 300 is similar to property prediction system 200 of Figure 2 except that property prediction system 300 further includes three initial predictions.
- the depicted system 300 includes three initial predictions being made before the overall property predictions 330 are generated.
- the system 300 can make individual molecule predictions 310, mixture composition property predictions 322, and mixture interaction property predictions 324, which can all be factored into the overall property predictions 330.
- the system 300 can begin with obtaining input data, which can include molecule data and mixture data descriptive of a mixture with a set of molecules.
- the input data can be processed by a first model to generate molecule specific predictions 310, and in some implementations, the predictions 310 can be concentration specific predictions.
- the concentration predictions 310 may be weighted based on the concentration level and the predictions of the various molecules may be pooled.
- the output of the first model can then be processed by a second model 320, which can include two sub-models.
- the first sub-model can process the data and output composition specific property predictions 322 associated with the overall composition of the mixture.
- the second sub-model can process the data and output interaction specific property predictions 324 associated with predicted interactions in the mixture and/or predicted extrinsic interactions.
- the three initial predictions can be processed to generate an overall property prediction 330 based on each of the initial predictions to allow for a better understanding of the mixture. For example, each individual molecule may have its own respective odor properties, while certain compositions may lead to some molecule properties being more prevalent. Moreover, interaction properties of various molecules and molecule sets may alter, enhance, or dilute certain odor properties. Therefore, each initial prediction can provide insight into how the overall mixture may smell, taste, etc.
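- as an illustrative sketch of this fusion step, the three initial predictions 310, 322, and 324 could be concatenated and mapped to the overall prediction 330 by a small network; the fusion architecture below is an assumption, as the disclosure does not fix a particular scheme:

```python
# Sketch: fuse molecule-level, composition-level, and interaction-level
# predictions into an overall mixture property prediction.
import torch
import torch.nn as nn


class OverallPropertyHead(nn.Module):
    def __init__(self, dim: int, num_properties: int):
        super().__init__()
        self.combine = nn.Sequential(
            nn.Linear(3 * dim, 128), nn.ReLU(), nn.Linear(128, num_properties)
        )

    def forward(self, molecule_preds, composition_preds, interaction_preds):
        # Each input is a (dim,) vector of initial predictions (310, 322, 324).
        fused = torch.cat([molecule_preds, composition_preds, interaction_preds])
        return self.combine(fused)  # overall property predictions (330)
```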
- Figure 4 depicts a block diagram of an example property prediction request system 400 according to example embodiments of the present disclosure.
- the property prediction request system 400 is trained to receive a set of training data 442 & 444 descriptive of known properties of individual molecules and known properties of mixture interactions and, as a result of receipt of the training data 442 & 444, determine and store property predictions for one or more mixtures.
- the property prediction request system 400 can include a prediction computing system 410 that is operable to predict and store mixture properties.
- the property prediction request system 400 depicted in Figure 4 includes a prediction computing system 410, a requesting computing system 430, and a training computing system 440 that can communicate with one another to make up the overall system 400.
- the property prediction request system can rely on a trained prediction computing system 410 that can predict and store properties of mixtures to later produce upon request.
- Training the prediction computing system 410 can include the use of a training computing system 440 that can provide training data for training the machine-learned models 412 & 414 of the prediction computing system 410.
- the training computing system 440 may have training molecule data 442 for training a first machine-learned model (e.g., an embedding model) 412 and training mixture data 444 for training a second machine-learned model (e.g., a deep neural network model) 414.
- the training data can include known properties for various molecules, compositions, and interactions, and the training data, once received, may be stored in the prediction computing system for later reference.
- the training data can include labeled training datasets, which can include known properties of certain mixtures to complete ground truth training of the machine-learned models.
- the prediction computing system 410 may store molecule data 416 and mixture data 418 for reference, for retraining, or for centralization of data.
- the molecule data 416 may be sampled to generate a database of mixture property predictions. The sampling may be random or may be influenced by known molecule properties, molecule categories, and/or molecule abundance.
- the molecule data 416 and the mixture data 418 may be processed by the first machine-learned model 412 and the second machine-learned model 414 to generate property predictions for mixtures to be stored 420 by the prediction system.
- the stored data 420 may then be searchable or accessible via communication between the prediction computing system and the requesting computing system 430.
- the requesting computing system 430 can include a user interface 434 for a user to input a search query or a request related to a certain mixture or a certain property.
- the requesting computing system 430 can generate a request 432, which can be sent to the prediction computing system 410 to search or screen through the stored data to retrieve and provide one or more results.
- the one or more results can then be provided back to the requesting computing system, which may display the one or more results for the user via the user interface.
- the results may be one or more mixtures with a property prediction associated with or matching the search query/request.
- the results may be provided as mixture property profiles with the mixture and their respective property predictions.
- Figure 5 depicts a block diagram of an example mixture property profile 500 according to example embodiments of the present disclosure.
- the mixture property profile 500 is configured to receive and store property predictions with their respective mixture for property screening or searching.
- the mixture property profile 500 can include various property predictions descriptive of predicted properties of a mixture.
- the example mixture property profile 500 in Figure 5 includes a grid of various property categories, which can be filled with property predictions, known properties, or a mix of known and predicted properties.
- the mixture property profiles 500 may include the mixture, the predicted properties, a graphical depiction of the mixture or molecules in the mixture, and/or reasons for the property predictions including initial predictions associated with the molecules in the mixture, the composition of the mixture, and/or the interactions in the mixture.
- Some example properties displayed in a mixture property profile 500 can include odor properties 504, taste properties 506, color properties 508, viscosity properties 510, lubricant properties 512, thermal properties 514, energy properties 516, pharmaceutical properties 518, stability properties 520, catalytic properties 522, adhesion properties 524, and other miscellaneous properties 526.
- Each property can be searchable for retrieving a mixture with a desired property upon request or query. Moreover, each property may provide a desired insight for use in a variety of different fields including consumer facing, industrial facing, etc.
- odor properties 504 can include odor quality properties and odor intensity properties, which can be utilized in order to make fragrances, perfumes, candles, and so forth.
- Taste properties 506 can be utilized to make artificial flavors for candy, vitamins, or other consumables.
- the property predictions can be based at least in part on predicted receptor interactions and activations.
- Other properties can be used for product marketing, such as color properties 508, which can be used to predict the mixture’s color or may include coloration properties. The coloration properties can be predicted to determine if the mixture could color other products.
- the viscosity properties 510 can be another property predicted and stored.
- Other property predictions can be related to industrial applications, such as lubricant properties 512 for machinery, and energy properties 516 for producing better batteries. Pharmaceuticals may also be improved by or formulated based on knowledge obtained from these property predictions.
- Figure 9A depicts an example evolutionary approach 900, which can be used for generating a database of new mixtures with predicted properties.
- the proposed mixtures can have molecule data and mixture data 902 for each respective proposed mixture.
- the molecule data and mixture data 902 can be processed by the machine-learned property prediction system 904 to generate predicted properties 906 for the proposed mixture.
- the predicted properties 906 can then be processed by an objective function 908 to decide whether the proposed mixture should be added to the corpus of top performers 910 or discarded. A random mutation can then be made, and the process can begin again.
- the evolutionary approach 900 can aid in generating a large database of useful mixtures to be available for screening by a human practitioner for use in a variety of products and industries.
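- a compact sketch of this evolutionary loop is shown below; `predict_properties`, `objective`, and `mutate` are assumed stand-ins for the components shown in Figure 9A:

```python
# Sketch of the evolutionary loop of Figure 9A: score a proposed mixture,
# retain it if it beats the weakest current top performer, then randomly
# mutate a retained mixture and repeat.
import heapq
import random


def evolve(seed_mixture, predict_properties, objective, mutate,
           steps=1000, corpus_size=50):
    top_performers = []                      # min-heap of (score, step, mixture)
    mixture = seed_mixture
    for step in range(steps):
        score = objective(predict_properties(mixture))
        if len(top_performers) < corpus_size:
            heapq.heappush(top_performers, (score, step, mixture))
        elif score > top_performers[0][0]:
            # Beats the weakest retained mixture: replace it.
            heapq.heapreplace(top_performers, (score, step, mixture))
        # Otherwise the proposal is discarded; either way, a random
        # mutation of a retained mixture starts the next iteration.
        mixture = mutate(random.choice(top_performers)[2])
    return sorted(top_performers, reverse=True)
```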
- Figure 9B depicts an example reinforcement learning approach 950, which can be used for model optimization. Similar to the evolutionary approach 900, the reinforcement learning approach 950 can begin with molecule data and mixture data 902 of a proposed mixture being processed by a machine-learned property prediction system to generate predicted properties 906. The predicted properties 906 can then be processed by an objective function 912 to provide an output to a machine-learning controller 914 to provide a proposal to the system.
- the machine-learning controller can include a recurrent neural network.
- the reinforcement learning approach 950 can aid in refining the parameters of the machine-learned models disclosed herein.
- Figure 6 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 6 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 600 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
- a computing system can obtain molecule data and mixture data.
- the molecule data can be data descriptive of one or more molecules of a mixture, and the mixture data can be descriptive of the mixture.
- the molecule data can include respective molecule data for each of a plurality of molecules, and the mixture data can describe the chemical formulation of the mixture.
- the data may be obtained via manually input data or automatically sampled data.
- the molecule data and the mixture data may be retrieved from a server.
- the mixture data can include concentrations for each of the molecules in the mixture.
- the computing system can process the molecule data with an embedding model to generate one or more embeddings.
- the respective molecule data for each of the plurality of molecules can be processed with an embedding model to generate a respective embedding for each molecule.
- the embedding model can include a graph neural network to generate one or more graph embeddings.
- the embeddings can include embedded data descriptive of individual molecule properties.
- the computing system can process the embeddings and the mixture data with a machine-learned prediction model.
- the machine-learned prediction model can include a deep neural network and may include a weighting model that can weight and pool the embeddings based on the respective molecule concentrations.
- the computing system can generate one or more property predictions.
- the one or more property predictions can be based at least in part on the one or more embeddings and the mixture data. Moreover, the predictions can be based on individual molecule properties, concentration of molecules in the mixture, the composition of the mixture, and interaction properties of the mixture. In some implementations, the predictions can be sensory predictions, energy predictions, stability predictions, and/or thermal predictions.
- the computing system can store the one or more property predictions.
- the property predictions may be stored in a searchable database for easy look-up of mixtures and properties.
- Figure 7 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 7 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 700 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
- a computing system can obtain molecule data and mixture data.
- the molecule data can be descriptive of a plurality of molecules in a mixture, and the mixture data can be descriptive of the mixture.
- the molecule data and mixture data may be obtained separately or at the same time.
- the computing system can process the molecule data with an embedding model to generate embeddings.
- the embedding model can be a graph embedding model, in which the embeddings can be graph embeddings.
- the graph embeddings may be weighted and pooled to generate a graph of graphs.
- respective molecule data for each of the plurality of molecules can be processed as molecule specific sets with an embedding model to generate a respective embedding for each molecule.
- the computing system can process the embeddings and the mixture data with a machine-learned prediction model to generate one or more property predictions.
- the property predictions can include predictions on a variety of mixture properties and can be used in a variety of fields and industries.
- the computing system can store the one or more property predictions.
- the property predictions may be stored in a searchable database to provide easy access to the information.
- the computing system can obtain a request for a mixture with a requested property and determine that the one or more property predictions comprise the requested property.
- the request may be a formal request or may be a search query input into a user interface.
- the determination can include determining if a predicted property matches the requested property or is associated with the search query.
- the computing system can provide the mixture data to the requesting computing device.
- the requesting computing device may receive the mixture data in a variety of forms including text data, graph data, etc.
- the mixture data may be provided with a mixture property profile that indicates the property predictions for the respective mixture.
- Figure 8 depicts a flow chart diagram of an example method to perform according to example embodiments of the present disclosure. Although Figure 8 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 800 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
- a computing system can obtain molecule data and mixture data.
- the computing system can process the molecule data with a first model to generate molecule property predictions.
- the molecule property predictions can be embedded before being processed by a second model.
- the computing system can process the molecule property predictions and the mixture data with a second model to generate mixture property predictions.
- the mixture property predictions can be based at least in part on the molecule property predictions and concentrations of the one or more molecules.
- the computing system can generate a predicted property profile for the mixture.
- the property profile can be organized data including the mixture, the mixture property predictions, and other data needed for application of the mixture in a desired field.
- the computing system can store the predicted property profile in a searchable database.
- the searchable database can be enabled by other applications or may be a standalone searchable database with a dedicated interface.
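- a minimal sketch of such a searchable profile store is shown below; the profile fields and the matching rule are illustrative assumptions, and a production system might instead use an indexed database:

```python
# Sketch of a searchable store of mixture property profiles (Figure 5 /
# method 800). Field names and the matching rule are illustrative only.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MixturePropertyProfile:
    mixture_id: str
    molecules: List[str]                    # e.g., SMILES strings
    concentrations: List[float]
    predicted_properties: Dict[str, float]  # e.g., {"odor_intensity": 0.8}


class ProfileDatabase:
    def __init__(self) -> None:
        self._profiles: List[MixturePropertyProfile] = []

    def store(self, profile: MixturePropertyProfile) -> None:
        self._profiles.append(profile)

    def search(self, prop: str, min_value: float = 0.0) -> List[MixturePropertyProfile]:
        # Return mixtures whose predicted property matches the request.
        return [
            p for p in self._profiles
            if p.predicted_properties.get(prop, 0.0) >= min_value
        ]
```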
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Chemical & Material Sciences (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Bioethics (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Public Health (AREA)
- Epidemiology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023558451A JP2024512565A (en) | 2021-03-25 | 2021-12-15 | Machine learning to predict properties of chemical formulations |
EP21841117.1A EP4311406A1 (en) | 2021-03-25 | 2021-12-15 | Machine learning for predicting the properties of chemical formulations |
IL307152A IL307152A (en) | 2021-03-25 | 2021-12-15 | Machine learning for predicting the properties of chemical formulations |
KR1020237036503A KR20240004344A (en) | 2021-03-25 | 2021-12-15 | Machine learning to predict properties of chemical agents |
CN202180097570.5A CN117223061A (en) | 2021-03-25 | 2021-12-15 | Machine learning for predicting properties of chemical agents |
US18/370,711 US20240013866A1 (en) | 2021-03-25 | 2023-09-20 | Machine learning for predicting the properties of chemical formulations |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163165781P | 2021-03-25 | 2021-03-25 | |
US63/165,781 | 2021-03-25 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/370,711 Continuation US20240013866A1 (en) | 2021-03-25 | 2023-09-20 | Machine learning for predicting the properties of chemical formulations |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022203734A1 true WO2022203734A1 (en) | 2022-09-29 |
Family
ID=79425491
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2021/063436 WO2022203734A1 (en) | 2021-03-25 | 2021-12-15 | Machine learning for predicting the properties of chemical formulations |
Country Status (7)
Country | Link |
---|---|
US (1) | US20240013866A1 (en) |
EP (1) | EP4311406A1 (en) |
JP (1) | JP2024512565A (en) |
KR (1) | KR20240004344A (en) |
CN (1) | CN117223061A (en) |
IL (1) | IL307152A (en) |
WO (1) | WO2022203734A1 (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10037411B2 (en) * | 2015-12-30 | 2018-07-31 | Cerner Innovation, Inc. | Intelligent alert suppression |
2021
- 2021-12-15 JP JP2023558451A patent/JP2024512565A/en active Pending
- 2021-12-15 CN CN202180097570.5A patent/CN117223061A/en active Pending
- 2021-12-15 KR KR1020237036503A patent/KR20240004344A/en unknown
- 2021-12-15 IL IL307152A patent/IL307152A/en unknown
- 2021-12-15 WO PCT/US2021/063436 patent/WO2022203734A1/en active Application Filing
- 2021-12-15 EP EP21841117.1A patent/EP4311406A1/en active Pending

2023
- 2023-09-20 US US18/370,711 patent/US20240013866A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180107803A1 (en) * | 2016-10-18 | 2018-04-19 | International Business Machines Corporation | Correlating olfactory perception with molecular structure |
US20190156224A1 (en) * | 2017-11-21 | 2019-05-23 | International Business Machines Corporation | Prediction of olfactory and taste perception through semantic encoding |
US20200072808A1 (en) * | 2018-09-04 | 2020-03-05 | International Business Machines Corporation | Predicting human discriminability of odor mixtures |
WO2020163860A1 (en) * | 2019-02-08 | 2020-08-13 | Google Llc | Systems and methods for predicting the olfactory properties of molecules using machine learning |
CN111564186A (en) * | 2020-03-25 | 2020-08-21 | 湖南大学 | Method and system for predicting interaction of graph-volume drug pairs based on knowledge graph |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4386766A1 (en) * | 2022-12-16 | 2024-06-19 | Firmenich SA | Method and system for predicting a stability value for a determined fragrance in a determined fragrance base |
WO2024126681A1 (en) * | 2022-12-16 | 2024-06-20 | Firmenich Sa | Method and system for predicting a stability value for a determined fragrance in a determined fragrance base |
WO2024150018A1 (en) * | 2023-01-12 | 2024-07-18 | Octagon I/O Limited | Methods and systems for design, selection and modelling of materials |
WO2024182178A1 (en) * | 2023-02-27 | 2024-09-06 | Dow Global Technologies Llc | Formulation graph for machine learning of chemical products |
Also Published As
Publication number | Publication date |
---|---|
JP2024512565A (en) | 2024-03-19 |
KR20240004344A (en) | 2024-01-11 |
CN117223061A (en) | 2023-12-12 |
EP4311406A1 (en) | 2024-01-31 |
IL307152A (en) | 2023-11-01 |
US20240013866A1 (en) | 2024-01-11 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP7457721B2 (en) | Systems and methods for predicting olfactory properties of molecules using machine learning | |
US20240013866A1 (en) | Machine learning for predicting the properties of chemical formulations | |
Mohanty et al. | Web-services classification using intelligent techniques | |
Santana et al. | Optimal fragrances formulation using a deep learning neural network architecture: A novel systematic approach | |
Bhattacharya | Machine learning for bioclimatic modelling | |
Rittig et al. | Graph Neural Networks for the Prediction of Molecular Structure–Property Relationships | |
Makridis et al. | Enhanced food safety through deep learning for food recalls prediction | |
Liu et al. | In silico prediction of fragrance retention grades for monomer flavors using QSPR models | |
Agyemang et al. | Deep inverse reinforcement learning for structural evolution of small molecules | |
US20240021275A1 (en) | Machine-learned models for sensory property prediction | |
Achebouche et al. | Application of artificial intelligence to decode the relationships between smell, olfactory receptors and small molecules | |
Zagatti et al. | MetaPrep: Data preparation pipelines recommendation via meta-learning | |
Bhola et al. | Comparative study of machine learning techniques for chronic disease prognosis | |
JP2009508246A (en) | Support vector induction logic programming | |
Sushma et al. | Machine learning based unique perfume flavour creation using quantitative structure-activity relationship (QSAR) | |
WO2022258652A1 (en) | System for training an ensemble neural network device to assess predictive uncertainty | |
KR102451270B1 (en) | Electronic device, method, and computer readable medium for marketing cosmetic | |
Liu et al. | Integrated machine learning framework for computer-aided chemical product design | |
Lindfors | Demand Forecasting in Retail: A Comparison of Time Series Analysis and Machine Learning Models | |
Agarwal | Intelligent refrigerator | |
Vaithianathan et al. | Smart Agriculture-Based Food Quality Analysis with Healthcare Security System Using Cloud Machine Learning Model | |
Sravya et al. | Meal Magic: An Image-Based Recipe-Generation System | |
Guan | Risk Identification for Plant Health via Text Mining of E-Commerce Data | |
KR20230077921A (en) | Comestics Marketing System | |
WO2024180112A1 (en) | Fragrance and flavour generation |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21841117; Country of ref document: EP; Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase | Ref document number: 306123; Country of ref document: IL |
WWE | Wipo information: entry into national phase | Ref document number: 307152; Country of ref document: IL |
WWE | Wipo information: entry into national phase | Ref document number: 2023558451; Country of ref document: JP |
WWE | Wipo information: entry into national phase | Ref document number: 1020237036503; Country of ref document: KR |
WWE | Wipo information: entry into national phase | Ref document number: 2021841117; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | Wipo information: entry into national phase | Ref document number: 202180097570.5; Country of ref document: CN |
ENP | Entry into the national phase | Ref document number: 2021841117; Country of ref document: EP; Effective date: 20231025 |