WO2022113945A1 - Information processing system, information processing method, and information processing program - Google Patents
- Publication number
- WO2022113945A1 (application PCT/JP2021/042833)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information processing
- regression
- machine learning
- component objects
- processing system
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C60/00—Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
Definitions
- One aspect of this disclosure relates to information processing systems, information processing methods, and information processing programs.
- Patent Document 1 describes a method for predicting the binding property between the three-dimensional structure of a biopolymer and the three-dimensional structure of a compound.
- The method includes a step of generating a predicted three-dimensional structure of a complex of the biopolymer and the compound based on the three-dimensional structure of the biopolymer and the three-dimensional structure of the compound, a step of converting the predicted three-dimensional structure into a predicted three-dimensional structure vector representing the result of collating it with an interaction pattern, and a step of predicting the binding property between the three-dimensional structure of the biopolymer and the three-dimensional structure of the compound by classifying the predicted three-dimensional structure vector using a machine learning algorithm.
- When the component objects are diverse or numerous, a sufficient amount of data may not be available for them, and as a result the accuracy of analysis of the composite object may not reach the expected level. Therefore, a mechanism for improving the accuracy of analysis of a composite object is desired even when a sufficient amount of data cannot be prepared for the component objects.
- The information processing system includes at least one processor. The at least one processor acquires a numerical representation and a composite ratio for each of a plurality of component objects, executes machine learning based on the plurality of numerical representations to calculate a plurality of regression parameters corresponding to the plurality of component objects, and applies the plurality of composite ratios to a regression model defined by the plurality of regression parameters to calculate a predicted value indicating a characteristic of the composite object obtained by combining the plurality of component objects.
- The information processing method is executed by an information processing system including at least one processor.
- This information processing method includes a step of acquiring a numerical representation and a composite ratio for each of a plurality of component objects, a step of executing machine learning based on the plurality of numerical representations to calculate a plurality of regression parameters corresponding to the plurality of component objects, and a step of applying the plurality of composite ratios to a regression model defined by the plurality of regression parameters to calculate a predicted value indicating a characteristic of the composite object obtained by combining the plurality of component objects.
- The information processing program causes a computer to execute a step of acquiring a numerical representation and a composite ratio for each of a plurality of component objects, a step of executing machine learning based on the plurality of numerical representations to calculate a plurality of regression parameters corresponding to the plurality of component objects, and a step of applying the plurality of composite ratios to a regression model defined by the plurality of regression parameters to calculate a predicted value indicating a characteristic of the composite object obtained by combining the plurality of component objects.
- In these aspects, machine learning is executed based on the data of each component object, and a plurality of regression parameters corresponding to the plurality of component objects are calculated. The composite ratios are then applied to the regression model defined by the regression parameters, and the characteristics of the composite object are predicted.
- The accuracy of analysis of composite objects can be improved even when a sufficient amount of data cannot be prepared for the component objects.
- the information processing system 10 is a computer system that executes analysis on a composite object obtained by combining a plurality of component objects at a given composite ratio.
- a component object is a tangible or intangible object used to create a composite object.
- Composite objects can be tangible or intangible.
- An example of a tangible object is any substance or object.
- Data and information are examples of intangibles.
- "Combining a plurality of component objects" means a process of converting a plurality of component objects into one object, that is, a composite object.
- The method of combining is not limited, and may be, for example, compounding, synthesis, binding, mixing, merging, combination, or coalescence, or another method.
- The analysis of a composite object is a process for obtaining data showing some characteristic of the composite object.
- Multiple component objects may be any multiple types of materials, in which case the composite object is a multi-component substance produced by those materials.
- a material is any component used to produce a multi-component substance.
- the plurality of materials may be any plurality of types of molecules or atoms, and in this case, the composite object is a multi-component substance obtained by combining those molecules or atoms by any method.
- The material may be a polymer or a monomer, and correspondingly, the multi-component substance may be a polymer alloy.
- the material may be a monomer, and correspondingly, the multi-component substance may be a polymer.
- the material may be a drug, i.e. a chemical substance having a pharmacological action, and correspondingly, the multi-component substance may be a drug.
- The information processing system 10 executes machine learning for the analysis of composite objects.
- Machine learning is a method of learning based on given information and autonomously finding a law or rule.
- the specific method of machine learning is not limited.
- the information processing system 10 may execute machine learning using a machine learning model which is a calculation model including a neural network.
- A neural network is an information processing model that imitates the mechanism of the human brain and nervous system.
- The information processing system 10 may execute machine learning using at least one of a graph neural network (GNN), a convolutional neural network (CNN), a recurrent neural network (RNN), an attention RNN (Attention RNN), and multi-head attention (Multi-Head Attention).
- the information processing system 10 is composed of one or more computers. When a plurality of computers are used, one information processing system 10 is logically constructed by connecting these computers via a communication network such as the Internet or an intranet.
- FIG. 1 is a diagram showing an example of a general hardware configuration of a computer 100 constituting an information processing system 10.
- The computer 100 includes a processor 101 such as a CPU that executes an operating system, application programs, and the like; a main storage unit 102 composed of a ROM and a RAM; an auxiliary storage unit 103 composed of a hard disk, a flash memory, or the like; a communication control unit 104 composed of a network card or a wireless communication module; an input device 105 such as a keyboard and a mouse; and an output device 106 such as a monitor.
- Each functional element of the information processing system 10 is realized by loading a predetermined program onto the processor 101 or the main storage unit 102 and causing the processor 101 to execute the program.
- the processor 101 operates the communication control unit 104, the input device 105, or the output device 106 according to the program, and reads and writes data in the main storage unit 102 or the auxiliary storage unit 103.
- the data or database required for processing is stored in the main storage unit 102 or the auxiliary storage unit 103.
- FIG. 2 is a diagram showing an example of the functional configuration of the information processing system 10.
- the information processing system 10 includes an acquisition unit 11, a calculation unit 12, and a prediction unit 13 as functional elements.
- The acquisition unit 11 is a functional element that acquires data related to a plurality of component objects. Specifically, the acquisition unit 11 acquires a numerical representation and a composite ratio for each of the plurality of component objects.
- The numerical representation of a component object is data that expresses an arbitrary attribute of the component object using a plurality of numerical values.
- The attributes of a component object are the properties or characteristics of the component object. A numerical representation may be visualized in various forms, for example, numbers, letters, text, molecular graphs, vectors, images, or time-series data, or in a combination of any two or more of these forms.
- The individual numerical values constituting the numerical representation may be expressed in decimal notation, or in other notations such as binary or hexadecimal notation.
- The composite ratio of component objects is the ratio between the plurality of component objects.
- The specific type, unit, and expression method of the composite ratio are not limited, and may be determined arbitrarily depending on the component objects or the composite object.
- The composite ratio may be represented by a ratio such as a percentage, by a histogram, or by the absolute quantity of each component object.
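- As an illustration of the last point, absolute quantities can be converted into fractional ratios before they are applied to a regression model. The following is a minimal sketch, not part of the disclosure; the function name `to_composite_ratio` and the quantities are hypothetical.

```python
def to_composite_ratio(amounts):
    """Convert absolute quantities of component objects into fractions that sum to 1.

    Hypothetical helper for illustration; the source does not prescribe a conversion.
    """
    total = sum(amounts)
    if total <= 0:
        raise ValueError("total amount must be positive")
    return [a / total for a in amounts]


# Hypothetical quantities: 70 units of one component object and 30 of another.
ratio = to_composite_ratio([70.0, 30.0])
print(ratio)
```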
- the calculation unit 12 is a functional element for calculating the regression parameters of the regression model for predicting the characteristics of the composite object. Specifically, the calculation unit 12 executes machine learning based on a plurality of numerical representations corresponding to a plurality of component objects to calculate regression parameters.
- the regression model is an expression for obtaining the value of one or more objective variables y when the value of one or more explanatory variables x is given.
- the regression model may be a linear regression model or a non-linear regression model.
- An example of a regression model is the Scheffé polynomial. However, the regression model may be another parametric model. Regression parameters are the numerical values (coefficients) included in the regression model.
- the prediction unit 13 is a functional element that predicts the characteristics of the composite object and outputs the predicted value.
- The characteristics of a composite object are the properties specific to that composite object.
- The prediction unit 13 applies the composite ratios to the regression model defined by the calculated regression parameters to calculate the predicted value.
- That is, the prediction unit 13 substitutes the plurality of composite ratios into the regression model to calculate the predicted value.
- The combination of the calculation unit 12 and the prediction unit 13 may be realized by one machine learning model.
- Alternatively, the calculation unit 12 may be realized by a machine learning model, and the prediction unit 13 may be realized by an algorithm that does not use a machine learning model.
- each of at least one machine learning model used in this embodiment is a trained model expected to have the highest estimation accuracy, and therefore can be called the "best machine learning model".
- The trained model is generated when a given computer processes training data (teacher data) containing many combinations of input vectors and labels.
- a given computer inputs an input vector into a machine learning model, calculates an output value, and finds the error between the output value and the label shown in the teacher data.
- the output value is, for example, a predicted value. It can be said that the error between the output value and the label is the difference between the estimation result and the correct answer.
- the computer updates a given parameter in the machine learning model based on that error.
- the computer generates a trained model by repeating such learning.
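- The learning procedure described above (compute an output, measure the error against the label, update a parameter, repeat) can be sketched as follows. This is a minimal sketch under stated assumptions: the model is a plain linear model and the update is a squared-error gradient step, neither of which is specified in the source; `train`, `teacher_data`, the learning rate, and the data values are all hypothetical.

```python
def train(samples, lr=0.05, epochs=1000):
    """Fit a linear model to (input_vector, label) pairs by repeated error-driven updates.

    Assumption: linear model with a squared-error gradient step (not stated in the source).
    """
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, label in samples:
            # Input the vector into the model and calculate the output value.
            output = sum(wi * xi for wi, xi in zip(w, x)) + b
            # Error between the output value and the label in the teacher data.
            error = output - label
            # Update the model parameters based on that error.
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b


# Hypothetical teacher data generated from label = 2*x0 + 1*x1.
teacher_data = [
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 2.0),
    ([1.0, 1.0], 3.0),
    ([0.0, 0.0], 0.0),
]
w, b = train(teacher_data)
print(w, b)
```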
- the computer that generates the trained model is not limited, and may be, for example, the information processing system 10 or another computer system.
- the process of generating a trained model can be called the learning phase, and the process of using the trained model can be called the operation phase.
- the entire machine learning model used in this embodiment may be described by a function that does not depend on the input order. With this mechanism, it is possible to eliminate the influence of the order of multiple vectors in machine learning.
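- One common way to obtain a function that does not depend on the input order is to aggregate per-component terms with a sum. The sketch below only illustrates that property; the function `score` and the data are hypothetical, and the source does not state which order-independent function is used.

```python
import math
from itertools import permutations


def score(components):
    """Order-independent readout: a ratio-weighted sum over component objects.

    `components` is a list of (feature_vector, ratio) pairs (hypothetical layout).
    """
    return sum(ratio * sum(vec) for vec, ratio in components)


components = [([1.0, 2.0], 0.5), ([0.5, 0.5], 0.3), ([3.0, 1.0], 0.2)]
reference = score(components)

# Every ordering of the same component objects gives the same result.
order_independent = all(
    math.isclose(score(list(p)), reference) for p in permutations(components)
)
print(reference, order_independent)
```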
- FIG. 3 is a flowchart showing an example of the operation of the information processing system 10 as a processing flow S1.
- the processing flow S1 corresponds to the operation phase.
- In step S11, the acquisition unit 11 acquires the numerical representation and the composite ratio for each of the plurality of component objects.
- The acquisition unit 11 may acquire, for example, the numerical representation {1,1,2,3,4,3,3,5,6,7,5,4} of the component object Ea, the numerical representation {1,1,5,6,3,3,5,1,7,0,0} of the component object Eb, and the composite ratio {0.7, 0.3} of the component objects Ea and Eb.
- In this example, each numerical representation is shown as a vector.
- The composite ratio {0.7, 0.3} means that the component objects Ea and Eb are used in a ratio of 7:3 to obtain the composite object.
- the acquisition unit 11 may acquire the data of each of the plurality of component objects by any method.
- The acquisition unit 11 may access a given database to read data, may receive data from another computer or computer system, or may accept data input by a user of the information processing system 10.
- the acquisition unit 11 may acquire data by any two or more of these methods.
- In step S12, the calculation unit 12 calculates a feature vector for each of the plurality of component objects based on its numerical representation.
- the feature vector is a vector showing the features of the component object.
- the characteristics of a component object are any elements that make the component object different from other objects.
- a vector is an n-dimensional quantity having n numerical values, and can be expressed as a one-dimensional array.
- In step S13, the calculation unit 12 calculates a plurality of regression parameters corresponding to the plurality of component objects based on the calculated feature vectors.
- In step S14, the prediction unit 13 calculates a predicted value indicating the characteristics of the composite object by using the regression model defined by the calculated regression parameters.
- The regression model defined by the regression parameters is, in short, a regression model in which specific numerical values have been set as the regression parameters.
- The prediction unit 13 applies the plurality of composite ratios to the regression model to calculate the predicted value.
- In step S15, the prediction unit 13 outputs the predicted value.
- the method of outputting the predicted value is not limited.
- the prediction unit 13 may store the predicted value in a given database, send it to another computer or computer system, or display it on a display device.
- the prediction unit 13 may output the predicted value to another functional element for subsequent processing in the information processing system 10.
- FIGS. 4 and 5 are both diagrams showing an example of a procedure for calculating regression parameters.
- In these examples, the component objects are three materials (polymers): polystyrene, polyacrylic acid, and poly(butyl methacrylate). Any form of numerical representation may be provided for each of these materials.
- In step S121, which is part of step S12, the calculation unit 12 calculates a feature vector Z from the numerical representation for each of the plurality of component objects, using a machine learning model for an embedding function that calculates the features of a vector.
- This machine learning model is a trained model.
- the input vector and the output vector have a one-to-one relationship.
- the input vector is a numerical representation and the output vector is the feature vector Z.
- The calculation unit 12 inputs the plurality of numerical representations corresponding to the plurality of component objects into the model for the embedding function, and calculates the feature vector Z of each of the plurality of component objects.
- For each of the plurality of component objects, the calculation unit 12 inputs the numerical representation corresponding to that component object into the model for the embedding function, and calculates the feature vector Z of that component object.
- The model for the embedding function may generate a feature vector Z, which is a fixed-length vector, from a numerical representation that is atypical data. Atypical data refers to data that is not represented by a fixed-length vector.
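- To make the fixed-length idea concrete, the sketch below maps numerical representations of different lengths to vectors of the same length. It is only an illustration: the source uses a trained model (for example a GNN, CNN, or RNN), whereas the `embed` function here is a hypothetical, untrained stand-in based on summary statistics. The two input lists follow the example values given for the component objects Ea and Eb.

```python
def embed(values):
    """Hypothetical, untrained stand-in for the embedding model.

    Maps a variable-length list of numbers to a fixed-length vector
    [mean, minimum, maximum]. A real system would use a trained model instead.
    """
    n = len(values)
    if n == 0:
        raise ValueError("numerical representation must not be empty")
    return [sum(values) / n, float(min(values)), float(max(values))]


ea = [1, 1, 2, 3, 4, 3, 3, 5, 6, 7, 5, 4]  # 12 values
eb = [1, 1, 5, 6, 3, 3, 5, 1, 7, 0, 0]     # 11 values

z_a = embed(ea)
z_b = embed(eb)
print(len(z_a), len(z_b))  # both vectors have the same length
```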
- The calculation unit 12 calculates the feature vector Z1 corresponding to polystyrene, the feature vector Z2 corresponding to polyacrylic acid, and the feature vector Z3 corresponding to poly(butyl methacrylate).
- The machine learning model for the embedding function is not limited, and may be chosen by any policy in consideration of factors such as the types of component objects and composite objects.
- the calculation unit 12 may execute the embedding function using a graph neural network (GNN), a convolutional neural network (CNN), or a recurrent neural network (RNN).
- In step S122, which is part of step S12, the calculation unit 12 calculates another feature vector M from the feature vector Z for each of the plurality of component objects, using a machine learning model for an interaction function that makes the plurality of vectors interact.
- This machine learning model is a trained model.
- the input vector and the output vector have a one-to-one relationship.
- the input vector is the feature vector Z and the output vector is the feature vector M.
- The calculation unit 12 inputs the set of feature vectors Z corresponding to the plurality of component objects into the model for the interaction function, and calculates the feature vector M for each of the plurality of component objects.
- The calculation unit 12 calculates the feature vector M1 corresponding to polystyrene, the feature vector M2 corresponding to polyacrylic acid, and the feature vector M3 corresponding to poly(butyl methacrylate).
- The machine learning model for the interaction function is not limited, and may be chosen by any policy in consideration of factors such as the types of component objects and composite objects.
- the calculation unit 12 may execute machine learning for an interaction function using an attention RNN (Attention RNN) or a multi-head attention (Multi-Head Attention).
- the calculation unit 12 may calculate the feature vector M by an interaction function that does not include learning parameters.
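- An interaction function without learning parameters could, for instance, append to each feature vector Z the mean of the other feature vectors. The sketch below assumes that particular rule, which the source does not specify; the name `interact` and the values are hypothetical.

```python
def interact(z_list):
    """Parameter-free interaction (assumed rule): M_i = Z_i followed by the mean of all other Z_j.

    Requires at least two feature vectors of equal length.
    """
    n = len(z_list)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    dim = len(z_list[0])
    totals = [sum(z[k] for z in z_list) for k in range(dim)]
    m_list = []
    for z in z_list:
        others_mean = [(totals[k] - z[k]) / (n - 1) for k in range(dim)]
        m_list.append(list(z) + others_mean)
    return m_list


# Hypothetical feature vectors Z1, Z2, Z3.
z = [[1.0, 0.0], [0.0, 2.0], [2.0, 4.0]]
m = interact(z)
print(m)
```

- Each resulting vector M depends on its own Z and on the other component objects, while the rule itself has nothing to train.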
- the calculation unit 12 calculates the regression parameter a of the linear regression model from the feature vector M for each of the plurality of component objects.
- the calculation unit 12 calculates the regression parameters by the machine learning model.
- This machine learning model is a trained model.
- the input vector and the output value have a one-to-one relationship.
- the input vector is the feature vector M and the output value is the regression parameter a.
- the calculation unit 12 inputs a set of a plurality of feature vectors M corresponding to the plurality of component objects into the machine learning model, and calculates the regression parameter a for each of the plurality of component objects.
- The calculation unit 12 calculates the regression parameter a1 corresponding to polystyrene, the regression parameter a2 corresponding to polyacrylic acid, and the regression parameter a3 corresponding to poly(butyl methacrylate).
- The machine learning model for calculating the regression parameters is not limited, and may be chosen by any policy in consideration of factors such as the types of component objects and composite objects.
- The calculation unit 12 may calculate the regression parameters using a fully connected neural network (FCNN).
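- As a rough picture of how a fully connected network could turn one feature vector M into one regression parameter, the sketch below runs a forward pass through a single hidden layer. The layer sizes, the ReLU activation, and all weight values are hypothetical placeholders; in the described system the weights would come from training.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def regression_parameter(m, w1, b1, w2, b2):
    """Map a feature vector M to a single regression parameter (assumed architecture)."""
    hidden = relu(dense(m, w1, b1))
    return dense(hidden, w2, b2)[0]


# Hypothetical weights; a trained model would supply these.
w1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
w2 = [[2.0, 1.0]]
b2 = [0.5]

a1 = regression_parameter([1.0, 3.0], w1, b1, w2, b2)
print(a1)
```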
- The prediction unit 13 calculates the predicted value E by the following Scheffé polynomial (1) defined by the three regression parameters a1, a2, and a3.
  $$E = a_1 r_1 + a_2 r_2 + a_3 r_3 \tag{1}$$
- The regression parameter a is the regression coefficient of the linear term of equation (1).
- The predicted value E indicates the characteristics of the multi-component substance (polymer alloy) obtained from polystyrene, polyacrylic acid, and poly(butyl methacrylate).
- The variable r in equation (1) denotes a composite ratio.
- The composite ratios of polystyrene, polyacrylic acid, and poly(butyl methacrylate) are represented as r1, r2, and r3, respectively.
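- Because the regression parameters a are the coefficients of the linear terms and r are the composite ratios, the linear Scheffé polynomial takes the form E = a1·r1 + a2·r2 + a3·r3. A minimal sketch of that calculation follows; the function name and the numerical values of the parameters and ratios are hypothetical.

```python
def scheffe_linear(a, r):
    """Linear Scheffé polynomial: E = sum_i a_i * r_i."""
    if len(a) != len(r):
        raise ValueError("need one regression parameter per composite ratio")
    return sum(ai * ri for ai, ri in zip(a, r))


a = [10.0, 20.0, 40.0]  # hypothetical a1, a2, a3
r = [0.5, 0.25, 0.25]   # hypothetical r1, r2, r3 (sum to 1)

e = scheffe_linear(a, r)
print(e)
```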
- Step S12, including steps S121 and S122, is the same as in the example of FIG. 4, whereas steps S13 and S14 differ from that example.
- The calculation unit 12 calculates the regression parameters of the linear regression model from the feature vector M for each of the plurality of component objects. Specifically, the calculation unit 12 calculates the regression parameter a of the first-order term and the regression parameter b of the second-order term. In one example, the calculation unit 12 calculates the regression parameters by machine learning such as an FCNN. A machine learning model is prepared for each of the first-order and second-order terms of the linear regression model.
- the input vector and the output value have a one-to-one relationship.
- the input vector is the feature vector M and the output value is the regression parameter a.
- The calculation unit 12 inputs the set of feature vectors M corresponding to the plurality of component objects into the machine learning model, and calculates the regression parameter a for each of the plurality of component objects. In the example of FIG. 5 as well, the calculation unit 12 calculates the regression parameter a1 corresponding to polystyrene, the regression parameter a2 corresponding to polyacrylic acid, and the regression parameter a3 corresponding to poly(butyl methacrylate).
- For the second-order term, each input vector is obtained by combining two feature vectors M.
- The function for this term calculates one regression parameter from two vectors.
- The calculation unit 12 combines the two feature vectors M1 and M2 to generate a first input vector, combines the two feature vectors M1 and M3 to generate a second input vector, and combines the two feature vectors M2 and M3 to generate a third input vector.
- The first input vector corresponds to polystyrene and polyacrylic acid, the second input vector corresponds to polystyrene and poly(butyl methacrylate), and the third input vector corresponds to polyacrylic acid and poly(butyl methacrylate).
- the input vector and the output value have a one-to-one relationship.
- the input vector is a composite of two feature vectors M and the output value is the regression parameter b.
- The calculation unit 12 inputs the input vectors for all combinations into the machine learning model and calculates the regression parameter b for each combination.
- In this example, the calculation unit 12 calculates the regression parameter b12 corresponding to the combination of polystyrene and polyacrylic acid, the regression parameter b13 corresponding to the combination of polystyrene and poly(butyl methacrylate), and the regression parameter b23 corresponding to the combination of polyacrylic acid and poly(butyl methacrylate).
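- The enumeration of all pairs of feature vectors can be sketched with `itertools.combinations`. How two feature vectors are combined into one input vector is not specified in the source; the sketch below assumes an element-wise sum (so that the result does not depend on the order within a pair), and the names and values are hypothetical.

```python
from itertools import combinations


def pair_inputs(m_by_name):
    """Build one input vector per unordered pair of feature vectors.

    Assumption: two feature vectors are combined by element-wise addition.
    """
    return {
        (i, j): [x + y for x, y in zip(m_by_name[i], m_by_name[j])]
        for i, j in combinations(m_by_name, 2)
    }


# Hypothetical feature vectors M1, M2, M3.
m = {"M1": [1.0, 0.0], "M2": [0.0, 1.0], "M3": [2.0, 2.0]}
pairs = pair_inputs(m)
print(list(pairs))
```

- Three component objects yield three pairs, matching the three regression parameters b12, b13, and b23; each pair's input vector would then be passed to the model for the second-order term.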
- The prediction unit 13 calculates the predicted value E by the following Scheffé polynomial (2) defined by the six regression parameters a1, a2, a3, b12, b13, and b23.
  $$E = a_1 r_1 + a_2 r_2 + a_3 r_3 + b_{12} r_1 r_2 + b_{13} r_1 r_3 + b_{23} r_2 r_3 \tag{2}$$
- The regression parameter a is the regression coefficient of the first-order term, and the regression parameter b is the regression coefficient of the second-order term.
- As in equation (1), the variable r in equation (2) denotes a composite ratio.
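- With first-order coefficients a and second-order coefficients b, the quadratic Scheffé polynomial adds one cross term b_ij·r_i·r_j per pair of component objects to the linear terms. A minimal sketch follows; the function name, the zero-based index convention for `b`, and all numerical values are hypothetical.

```python
from itertools import combinations


def scheffe_quadratic(a, b, r):
    """Quadratic Scheffé polynomial: E = sum_i a_i r_i + sum_{i<j} b_ij r_i r_j.

    `b` is a dict keyed by zero-based index pairs (i, j) with i < j.
    """
    if len(a) != len(r):
        raise ValueError("need one first-order parameter per composite ratio")
    linear = sum(ai * ri for ai, ri in zip(a, r))
    cross = sum(b[(i, j)] * r[i] * r[j] for i, j in combinations(range(len(r)), 2))
    return linear + cross


a = [10.0, 20.0, 40.0]                           # hypothetical a1, a2, a3
b = {(0, 1): 8.0, (0, 2): -16.0, (1, 2): 32.0}   # hypothetical b12, b13, b23
r = [0.5, 0.25, 0.25]                            # hypothetical r1, r2, r3

e = scheffe_quadratic(a, b, r)
print(e)
```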
- the information processing system 10 may output individual regression parameters based on the feature vectors of all the related component objects for the regression model including the terms of the third order or higher or other parameters.
- the information processing system 10 may output one regression parameter based on the feature vectors of all component objects.
- In the above examples, the calculation unit 12 executes both the embedding function and the interaction function, but one of these two functions may be omitted.
- For example, the calculation unit 12 may calculate the regression parameters from the feature vector Z obtained by the machine learning model for the embedding function. In any case, the calculation unit 12 executes machine learning to calculate the regression parameters.
- The machine learning model for the embedding function, the machine learning model for the interaction function, the machine learning model for the regression parameters, and the regression model may be constructed as one neural network or as a set of a plurality of neural networks.
- Alternatively, the machine learning model for the embedding function, the machine learning model for the interaction function, and the machine learning model for the regression parameters may be constructed as one neural network or as a set of a plurality of neural networks.
- The information processing program for causing a computer or computer system to function as the information processing system 10 includes program code for causing the computer system to function as the acquisition unit 11, the calculation unit 12, and the prediction unit 13.
- This information processing program may be provided after being non-transitorily recorded on a tangible recording medium such as a CD-ROM, a DVD-ROM, or a semiconductor memory. Alternatively, the information processing program may be provided via a communication network as a data signal superimposed on a carrier wave.
- The provided information processing program is stored in, for example, the auxiliary storage unit 103.
- Each of the above functional elements is realized by the processor 101 reading the information processing program from the auxiliary storage unit 103 and executing it.
- The information processing system includes at least one processor. The at least one processor acquires a numerical representation and a composite ratio for each of a plurality of component objects, executes machine learning based on the plurality of numerical representations to calculate a plurality of regression parameters corresponding to the plurality of component objects, and applies the plurality of composite ratios to a regression model defined by the plurality of regression parameters to calculate a predicted value indicating a characteristic of the composite object obtained by combining the plurality of component objects.
- The information processing method is executed by an information processing system including at least one processor.
- This information processing method includes: a step of acquiring a numerical representation and a composite ratio for each of a plurality of component objects; a step of executing machine learning based on the plurality of numerical representations to calculate a plurality of regression parameters corresponding to the plurality of component objects; and a step of applying the plurality of composite ratios to a regression model defined by the plurality of regression parameters to calculate a predicted value indicating a characteristic of the composite object obtained by combining the plurality of component objects.
- The information processing program causes a computer to execute: a step of acquiring a numerical representation and a composite ratio for each of a plurality of component objects; a step of executing machine learning based on the plurality of numerical representations to calculate a plurality of regression parameters corresponding to the plurality of component objects; and a step of applying the plurality of composite ratios to a regression model defined by the plurality of regression parameters to calculate a predicted value indicating a characteristic of the composite object obtained by combining the plurality of component objects.
- Machine learning is executed based on the data of each component object, and a plurality of regression parameters corresponding to the plurality of component objects are calculated. The composite ratios are then applied to the regression model defined by the regression parameters, and the characteristics of the composite object are predicted.
- The composite ratio can be changed and the characteristics of the composite object can be instantly recalculated by the regression model. That is, the calculated regression parameters can be reused.
- The at least one processor may input the plurality of numerical representations into a first machine learning model to calculate a plurality of feature vectors corresponding to the plurality of component objects, and may input the plurality of feature vectors into a second machine learning model to calculate the plurality of regression parameters.
- The first machine learning model may include a machine learning model for an embedding function and a machine learning model for an interaction function.
- The at least one processor may input the plurality of numerical representations into the machine learning model for the embedding function to calculate a plurality of first feature vectors corresponding to the plurality of component objects, input the plurality of first feature vectors into the machine learning model for the interaction function to calculate a plurality of second feature vectors corresponding to the plurality of component objects, and input the plurality of second feature vectors into the second machine learning model to calculate the plurality of regression parameters.
- The machine learning model for the embedding function may be a machine learning model that generates the first feature vector, which is a fixed-length vector, from the numerical representation, which is unstructured data.
- With this model, a feature vector can be obtained even from a numerical representation that cannot itself be represented as a fixed-length vector.
- The regression model may be a Scheffé polynomial.
- The at least one processor may calculate, as the plurality of regression parameters, a plurality of regression coefficients of the first-order terms of the Scheffé polynomial.
- By using the Scheffé polynomial, which is often used in blending problems, a composite object obtained by combining a plurality of component objects can be analyzed accurately.
- The regression coefficients of the first-order terms can be used to calculate a predicted value that takes into account the influence of each component object on its own.
- The at least one processor may further calculate, as the plurality of regression parameters, a plurality of regression coefficients of the second-order terms of the Scheffé polynomial.
- The regression coefficients of the second-order terms can be used to calculate a predicted value that further takes into account the influence of combining two component objects.
- The component object may be a material, and the composite object may be a multi-component substance.
- The material may be a polymer or a monomer, and the multi-component substance may be a polymer alloy.
- Polymers or monomers are very diverse and correspondingly there are a huge variety of polymer alloys. For such polymers, monomers, and polymer alloys, in general, only some of the possible combinations can be tested, and therefore sufficient data are often not available. According to this aspect, it is possible to analyze the polymer alloy with high accuracy even when the data is insufficient in this way.
- The processing procedure of the information processing method executed by the at least one processor is not limited to the example in the above embodiment. For example, some of the steps or processes described above may be omitted, or the steps may be performed in a different order. Further, any two or more of the above steps may be combined, or part of a step may be modified or deleted. Alternatively, other steps may be performed in addition to the above steps.
- The expression "at least one processor executes a first process, executes a second process, ..., and executes an n-th process," or any corresponding expression, covers the case where the processor that executes the n processes from the first process to the n-th process changes partway through. That is, this expression covers both the case where all n processes are executed by the same processor and the case where the processor changes among the n processes according to an arbitrary policy.
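The second-order Scheffé polynomial described above for three component objects can be sketched in Python as follows. This is only an illustration, not the implementation from the publication: the function name `scheffe_quadratic`, the dictionary layout for the second-order coefficients, the check that the composite ratios sum to 1, and all numeric values are assumptions made for this example. It evaluates E as the sum of a_i r_i over the components plus the sum of b_ij r_i r_j over component pairs.

```python
from itertools import combinations


def scheffe_quadratic(ratios, linear, pairwise):
    """Evaluate E = sum_i a_i * r_i + sum_{i<j} b_ij * r_i * r_j."""
    if len(ratios) != len(linear):
        raise ValueError("ratios and linear coefficients must have equal length")
    # Assumption of this sketch: composite ratios are normalized to sum to 1.
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("composite ratios are assumed to sum to 1")
    # First-order terms: the contribution of each component on its own.
    value = sum(a * r for a, r in zip(linear, ratios))
    # Second-order terms: the contribution of each pair of components.
    for i, j in combinations(range(len(ratios)), 2):
        value += pairwise[(i, j)] * ratios[i] * ratios[j]
    return value


# Hypothetical regression parameters for three component objects.
a = [1.0, 2.0, 3.0]                           # a_1, a_2, a_3
b = {(0, 1): 0.5, (0, 2): -1.0, (1, 2): 2.0}  # b_12, b_13, b_23

# The same parameters are reused for two different composite ratios.
e_first = scheffe_quadratic([0.5, 0.25, 0.25], a, b)
e_second = scheffe_quadratic([0.25, 0.25, 0.5], a, b)
```

Because the regression parameters are fixed once they have been calculated, changing the composite ratio only requires re-evaluating this polynomial, which is the reuse property the text mentions.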
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Crystallography & Structural Chemistry (AREA)
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
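The information processing system 10 according to the embodiment is a computer system that executes analysis of a composite object obtained by combining a plurality of component objects at a given composite ratio. A component object is a tangible or intangible thing used to generate the composite object. The composite object can be tangible or intangible. Examples of tangible things include any substance or object. Examples of intangible things include data and information. "Combining a plurality of component objects" means a process of turning the plurality of component objects into one object, that is, the composite object. The combining technique is not limited and may be, for example, blending, compounding, synthesis, bonding, mixing, merging, combination, chemical combination, or uniting, or any other technique. Analysis of a composite object means a process for obtaining data indicating some characteristic of the composite object.
The information processing system 10 is composed of one or more computers. When multiple computers are used, they are connected via a communication network such as the Internet or an intranet, so that one information processing system 10 is logically constructed.
The operation of the information processing system 10 and the information processing method according to this embodiment are described with reference to FIG. 3. FIG. 3 is a flowchart showing an example of the operation of the information processing system 10 as a processing flow S1. The processing flow S1 corresponds to the operation phase.
An information processing program for causing a computer or computer system to function as the information processing system 10 includes program code for causing the computer system to function as the acquisition unit 11, the calculation unit 12, and the prediction unit 13. This information processing program may be provided after being non-transitorily recorded on a tangible recording medium such as a CD-ROM, a DVD-ROM, or a semiconductor memory. Alternatively, the information processing program may be provided via a communication network as a data signal superimposed on a carrier wave. The provided information processing program is stored in, for example, the auxiliary storage unit 103. Each of the above functional elements is realized by the processor 101 reading the information processing program from the auxiliary storage unit 103 and executing it.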
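As described above, an information processing system according to one aspect of the present disclosure includes at least one processor. The at least one processor acquires a numerical representation and a composite ratio for each of a plurality of component objects, executes machine learning based on the plurality of numerical representations to calculate a plurality of regression parameters corresponding to the plurality of component objects, and applies the plurality of composite ratios to a regression model defined by the plurality of regression parameters to calculate a predicted value indicating a characteristic of the composite object obtained by combining the plurality of component objects.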
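The flow summarized above (acquire numerical representations and composite ratios, derive regression parameters, then apply the ratios to the regression model) can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the method of the publication: the trained machine learning models are replaced by hand-written stand-ins, and the names `embed`, `regression_parameter`, `predict`, the weights, and all input values are hypothetical.

```python
def embed(representation):
    """Stand-in embedding function: variable-length input -> fixed-length vector."""
    # Assumption: two simple statistics replace a trained embedding model.
    mean = sum(representation) / len(representation)
    return [mean, max(representation)]


def regression_parameter(feature, weights):
    """Stand-in second model: feature vector -> one first-order coefficient."""
    return sum(w * f for w, f in zip(weights, feature))


def predict(parameters, ratios):
    """First-order regression model: E = sum_i a_i * r_i."""
    return sum(a * r for a, r in zip(parameters, ratios))


# Hypothetical numerical representations of different lengths.
representations = [[1.0, 3.0], [2.0, 2.0, 2.0], [4.0]]
weights = [0.5, 0.25]  # hypothetical weights standing in for a trained model

features = [embed(rep) for rep in representations]
params = [regression_parameter(f, weights) for f in features]

# The parameters are computed once and reused for each composite ratio.
e_a = predict(params, [0.5, 0.25, 0.25])
e_b = predict(params, [0.25, 0.25, 0.5])
```

The sketch keeps the two stages separate on purpose: the regression parameters depend only on the component objects, so a new composite ratio needs only a call to `predict`.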
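The present invention has been described above in detail based on its embodiment. However, the present invention is not limited to the above embodiment. The present invention can be modified in various ways without departing from its gist.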
Claims (10)
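- An information processing system comprising at least one processor,
wherein the at least one processor:
acquires a numerical representation and a composite ratio for each of a plurality of component objects;
executes machine learning based on the plurality of numerical representations to calculate a plurality of regression parameters corresponding to the plurality of component objects; and
applies the plurality of composite ratios to a regression model defined by the plurality of regression parameters to calculate a predicted value indicating a characteristic of a composite object obtained by combining the plurality of component objects.
- The information processing system according to claim 1, wherein the at least one processor:
inputs the plurality of numerical representations into a first machine learning model to calculate a plurality of feature vectors corresponding to the plurality of component objects; and
inputs the plurality of feature vectors into a second machine learning model to calculate the plurality of regression parameters.
- The information processing system according to claim 2, wherein the first machine learning model includes a machine learning model for an embedding function and a machine learning model for an interaction function, and
the at least one processor:
inputs the plurality of numerical representations into the machine learning model for the embedding function to calculate a plurality of first feature vectors corresponding to the plurality of component objects;
inputs the plurality of first feature vectors into the machine learning model for the interaction function to calculate a plurality of second feature vectors corresponding to the plurality of component objects; and
inputs the plurality of second feature vectors into the second machine learning model to calculate the plurality of regression parameters.
- The information processing system according to claim 3, wherein the machine learning model for the embedding function is a machine learning model that generates the first feature vector, which is a fixed-length vector, from the numerical representation, which is unstructured data.
- The information processing system according to any one of claims 1 to 4, wherein the regression model is a Scheffé polynomial, and
the at least one processor calculates, as the plurality of regression parameters, a plurality of regression coefficients of first-order terms of the Scheffé polynomial.
- The information processing system according to claim 5, wherein the at least one processor further calculates, as the plurality of regression parameters, a plurality of regression coefficients of second-order terms of the Scheffé polynomial.
- The information processing system according to any one of claims 1 to 6, wherein the component object is a material and the composite object is a multi-component substance.
- The information processing system according to claim 7, wherein the material is a polymer or a monomer and the multi-component substance is a polymer alloy.
- An information processing method executed by an information processing system comprising at least one processor, the method comprising:
a step of acquiring a numerical representation and a composite ratio for each of a plurality of component objects;
a step of executing machine learning based on the plurality of numerical representations to calculate a plurality of regression parameters corresponding to the plurality of component objects; and
a step of applying the plurality of composite ratios to a regression model defined by the plurality of regression parameters to calculate a predicted value indicating a characteristic of a composite object obtained by combining the plurality of component objects.
- An information processing program causing a computer to execute:
a step of acquiring a numerical representation and a composite ratio for each of a plurality of component objects;
a step of executing machine learning based on the plurality of numerical representations to calculate a plurality of regression parameters corresponding to the plurality of component objects; and
a step of applying the plurality of composite ratios to a regression model defined by the plurality of regression parameters to calculate a predicted value indicating a characteristic of a composite object obtained by combining the plurality of component objects.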
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/254,384 US20240047018A1 (en) | 2020-11-27 | 2021-11-22 | Information processing system, information processing method, and storage medium |
EP21897918.5A EP4243026A4 (en) | 2020-11-27 | 2021-11-22 | INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM |
JP2022565331A JPWO2022113945A1 (ja) | 2020-11-27 | 2021-11-22 | |
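CN202180089147.0A CN116745850A (zh) | 2020-11-27 | 2021-11-22 | Information processing system, information processing method, and information processing program |
KR1020237021006A KR20230110584A (ko) | 2020-11-27 | 2021-11-22 | Information processing system, information processing method, and information processing program |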
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020197046 | 2020-11-27 | ||
JP2020-197046 | 2020-11-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022113945A1 (ja) | 2022-06-02 |
Family
ID=81754598
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/042833 WO2022113945A1 (ja) | 2020-11-27 | 2021-11-22 | Information processing system, information processing method, and information processing program |
Country Status (6)
Country | Link |
---|---|
US (1) | US20240047018A1 (ja) |
EP (1) | EP4243026A4 (ja) |
JP (1) | JPWO2022113945A1 (ja) |
KR (1) | KR20230110584A (ja) |
CN (1) | CN116745850A (ja) |
WO (1) | WO2022113945A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102457159B1 (ko) * | 2021-01-28 | 2022-10-20 | 전남대학교 산학협력단 | Deep learning-based method for predicting the medicinal effect of compounds |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140163937A1 (en) * | 2012-12-12 | 2014-06-12 | Hyundai Motor Company | Method for predicting physical properties of a composite blend of polypropylene and low density polypropylene |
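JP2019028879A (ja) | 2017-08-02 | 2019-02-21 | 学校法人立命館 | Binding prediction method, device, program, recording medium, and method for producing machine learning algorithm |
JP2020030638A (ja) * | 2018-08-23 | 2020-02-27 | パナソニックIpマネジメント株式会社 | Material information output method, material information output device, material information output system, and program |
JP2020038493A (ja) * | 2018-09-04 | 2020-03-12 | 横浜ゴム株式会社 | Physical property data prediction method and physical property data prediction device |
WO2020090805A1 (ja) * | 2018-10-31 | 2020-05-07 | 昭和電工株式会社 | Material search device, method, and program |
JP2020161044A (ja) * | 2019-03-28 | 2020-10-01 | 日立化成株式会社 | Data management system, data management method, and data management program |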
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7218519B2 (ja) * | 2018-09-04 | 2023-02-07 | 横浜ゴム株式会社 | Physical property data prediction method and physical property data prediction device |
-
2021
- 2021-11-22 US US18/254,384 patent/US20240047018A1/en active Pending
- 2021-11-22 JP JP2022565331A patent/JPWO2022113945A1/ja active Pending
- 2021-11-22 WO PCT/JP2021/042833 patent/WO2022113945A1/ja active Application Filing
- 2021-11-22 KR KR1020237021006A patent/KR20230110584A/ko unknown
- 2021-11-22 CN CN202180089147.0A patent/CN116745850A/zh active Pending
- 2021-11-22 EP EP21897918.5A patent/EP4243026A4/en active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4243026A4 |
Also Published As
Publication number | Publication date |
---|---|
CN116745850A (zh) | 2023-09-12 |
EP4243026A1 (en) | 2023-09-13 |
KR20230110584A (ko) | 2023-07-24 |
US20240047018A1 (en) | 2024-02-08 |
EP4243026A4 (en) | 2024-05-15 |
JPWO2022113945A1 (ja) | 2022-06-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21897918; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2022565331; Country of ref document: JP; Kind code of ref document: A |
 | WWE | WIPO information: entry into national phase | Ref document number: 18254384; Country of ref document: US |
 | ENP | Entry into the national phase | Ref document number: 2021897918; Country of ref document: EP; Effective date: 20230609 |
 | ENP | Entry into the national phase | Ref document number: 20237021006; Country of ref document: KR; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | WWE | WIPO information: entry into national phase | Ref document number: 202180089147.0; Country of ref document: CN |