CN110491453A - A kind of yield prediction method of chemical reaction - Google Patents
- Publication number
- CN110491453A
- Authority
- CN
- China
- Prior art keywords
- yield
- vector
- dimension
- chemical reaction
- value
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a yield prediction method for chemical reactions, with the aim of finding a more efficient way to predict the yield of a chemical reaction. The method comprises the following steps: segmenting a reactant or product so that it is represented as a series of K elements; vectorizing the K segmented elements into a vector of dimension (K, N); segmenting the reaction conditions, vectorizing them into a vector of dimension (M, N), and concatenating the (M, N) vector with the (K, N) vector to obtain a vector of dimension (V, N); inputting the (V, N) vector into a neural network to obtain a predicted yield value; and obtaining the true yield value and adjusting the system parameters so that the predicted yield value continually approaches the true yield value. By segmenting and vectorizing the reactants, products and reaction conditions in a chemical equation and then predicting the yield with a neural network, the invention improves the efficiency of yield prediction and is convenient for engineering application.
Description
Technical Field
The invention relates to the interdisciplinary field of computer science and organic chemical synthesis, and in particular to a method for predicting the yield of a chemical reaction.
Background
Machine learning uncovers internal associations by learning the deep information hidden in data, and then makes predictions and judgments. Because it offers comparatively effective insight and high efficiency, it can even surpass human ability in specialized (vertical) domains.
Traditionally, predicting the yield of a chemical reaction requires a chemist to recall many organic reaction mechanisms and, after consulting a large body of literature, to arrive at a plausible retrosynthetic analysis that establishes a relationship between the target molecule and the available raw materials. This approach is cumbersome and inconvenient for engineering application.
Disclosure of Invention
To find a more effective way to predict the yield of a chemical reaction, the invention draws on machine learning, which has been applied in fields such as biopharmaceuticals and medical diagnosis, where it has changed traditional research methods, improved research efficiency and driven change across industries. Accordingly, the invention provides a more effective method for predicting the yield of a chemical reaction.
To achieve the above object, the present invention provides a method for predicting the yield of a chemical reaction, comprising the following steps:
segmenting a reactant or a product of a chemical equation according to a preset segmentation rule, so that the reactant or the product is represented as a series of K elements, wherein K is a natural number greater than 1;
vectorizing the K segmented elements into a vector of dimension (K, N), wherein N is a natural number greater than 1;
segmenting the reaction conditions in the chemical equation according to the preset segmentation rule, vectorizing the reaction conditions into a vector of dimension (M, N), and concatenating the (M, N) vector with the (K, N) vector to obtain a vector of dimension (V, N), wherein M is a natural number greater than 1 and V is the sum of K and M;
inputting the (V, N) vector into a neural network to obtain a predicted yield value of the chemical equation;
and obtaining a true yield value, and adjusting system parameters in the neural network according to the error between the true yield value and the predicted yield value, so that the predicted yield value continually approaches the true yield value.
Preferably, the preset segmentation rule includes segmentation according to the periodic table of elements, segmentation according to compounds, or both.
Preferably, the neural network is a convolutional neural network or a recurrent neural network.
Preferably, adjusting the system parameters in the neural network according to the error between the true yield value and the predicted yield value comprises the following steps:
calculating the difference between the predicted yield value and the true yield value using a regression loss function;
and adjusting the system parameters in the neural network according to the difference between the predicted yield value and the true yield value.
Preferably, the loss function is a squared loss function.
Preferably, vectorizing the K segmented elements into a vector of dimension (K, N) comprises the following steps:
establishing an element table based on the K elements;
and mapping each of the K elements in the element table into an N-dimensional vector space by a word embedding method, so that each of the K elements is represented by a vector.
Compared with the prior art, the yield prediction method for a chemical reaction has the following beneficial effects:
the reactants, products and reaction conditions in the chemical reaction equation are segmented and vectorized, and the yield is then predicted by a neural network. This avoids cumbersome manual retrosynthetic analysis, improves the efficiency of yield prediction, and makes the method convenient for engineering application.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart of a method for predicting yield of a chemical reaction according to an embodiment of the present invention;
FIG. 2 is an exemplary schematic diagram of step S105 in the yield prediction method of a chemical reaction according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
Referring to FIGS. 1-2, a method for predicting the yield of a chemical reaction according to an embodiment of the present invention includes the following steps:
Step S101: segmenting the reactants or products of a chemical equation according to a preset segmentation rule, so that the reactants or products are represented as a series of K elements, wherein K is a natural number greater than 1.
Preferably, the preset segmentation rule includes segmentation according to the periodic table of elements, segmentation according to compounds, or both.
Illustratively, take the original chemical expression:
[C:1]([C:3]1[CH:8]=[CH:7][CH:6]=[CH:5][C:4]=1[OH:9])#[N:2].[CH2:10]([CH:12]1[O:14][CH2:13]1)Cl
Segmented according to the periodic table of elements, it becomes: Cl C1C O, N # C C1C C C C1O.
In some embodiments, to segment the elements automatically, a segmentation model may be trained in advance on manually labeled segmentation data.
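As a minimal sketch of step S101, assuming the input is a SMILES string: the patent does not publish its segmentation rule, so the regular expression below (a pattern commonly used for atom-level SMILES tokenization) stands in for segmentation according to the periodic table, and splitting on "." stands in for segmentation by compound. The names SMILES_TOKEN_RE, split_compounds and split_elements are hypothetical.

```python
import re

# Assumed element-level rule: a regex commonly used for atom-level SMILES
# tokenization. Bracket atoms such as [CH2:10] stay whole, and the two-letter
# elements Br and Cl are tried before single letters.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def split_compounds(smiles: str) -> list[str]:
    """Compound-level segmentation: '.' separates molecules in SMILES."""
    return [part for part in smiles.split(".") if part]


def split_elements(smiles: str) -> list[str]:
    """Element-level segmentation of one SMILES string into K tokens."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    # findall silently skips characters it cannot match, so verify coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


if __name__ == "__main__":
    print(split_compounds("C1=COCCC1.OCCCCCCO"))
    # ['C1=COCCC1', 'OCCCCCCO']
    print(split_elements("[CH2:10]([CH:12]1[O:14][CH2:13]1)Cl"))
    # ['[CH2:10]', '(', '[CH:12]', '1', '[O:14]', '[CH2:13]', '1', ')', 'Cl']
```

The coverage check matters because a regex tokenizer that silently drops characters would yield a shorter K than the molecule warrants.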
Step S103: vectorizing the K segmented elements into a vector of dimension (K, N), wherein N is a natural number greater than 1.
Specifically, vectorizing the K segmented elements into a vector of dimension (K, N) comprises the following steps:
establishing an element table based on the K elements;
and mapping each of the K elements in the element table into an N-dimensional vector space by a word embedding method, so that each of the K elements is represented by a vector.
Illustratively, in the example above the expression is segmented according to the periodic table of elements into Cl C1C O, N # C C1C C C C1O. Assuming N = 100, the element Cl is represented by the word embedding method as the 100-dimensional vector:
[0.418 0.24968 -0.41242 0.1217 0.34527 -0.044457 …]
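Step S103 can be sketched as follows, under stated assumptions: build_element_table, init_embeddings and embed are hypothetical names, and the random initialization only stands in for embedding weights that would in practice be learned (for example jointly with the network). Plain Python lists keep the sketch dependency-free; a K x N matrix is a list of K rows of length N.

```python
import random


def build_element_table(token_lists: list[list[str]]) -> dict[str, int]:
    """Element table: give each distinct element an index, in first-seen order."""
    table: dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            table.setdefault(tok, len(table))
    return table


def init_embeddings(table: dict[str, int], n_dim: int, seed: int = 0) -> list[list[float]]:
    """One N-dimensional vector per table entry.

    Random values are a placeholder for learned word-embedding weights (assumption).
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_dim)] for _ in table]


def embed(tokens: list[str], table: dict[str, int],
          embeddings: list[list[float]]) -> list[list[float]]:
    """Look up each of the K tokens, giving a K x N matrix."""
    return [list(embeddings[table[tok]]) for tok in tokens]


if __name__ == "__main__":
    tokens = ["C", "1", "=", "C", "O", "C", "C", "C", "1"]  # C1=COCCC1, segmented
    table = build_element_table([tokens])  # {'C': 0, '1': 1, '=': 2, 'O': 3}
    x = embed(tokens, table, init_embeddings(table, n_dim=100))
    print(len(x), len(x[0]))  # 9 100
```

Repeated elements map to identical rows, which is what lets the network share what it learns about an element across positions.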
Step S105: segmenting the reaction conditions in the chemical equation according to the preset segmentation rule, vectorizing the reaction conditions into a vector of dimension (M, N), and concatenating the (M, N) vector with the (K, N) vector to obtain a vector of dimension (V, N), wherein M is a natural number greater than 1 and V is the sum of K and M.
It will be understood that the processing in step S105 is substantially the same as in steps S101 and S103, except that it operates on the reaction conditions.
Illustratively, as shown in FIG. 2, consider the chemical reaction equation:
C1=COCCC1.OCCCCCCO>Cl.ClC(Cl)Cl>OCCCCCCOC1CCCCO1
The reactants of this equation are C1=COCCC1.OCCCCCCO, the product is OCCCCCCOC1CCCCO1, and the reaction condition is Cl.ClC(Cl)Cl. Step S105 then concatenates the segmented and vectorized reactants with the segmented and vectorized reaction condition, or the segmented and vectorized product with the segmented and vectorized reaction condition.
It should be noted that the above equation is written in SMILES (Simplified Molecular-Input Line-Entry System), a specification that explicitly describes the structure of a molecule using ASCII character strings.
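A minimal sketch of the field splitting and concatenation in step S105, assuming the reaction is written as reaction SMILES of the form reactants>conditions>products, as in the example above. split_reaction and concat_rows are hypothetical helpers; in the usage lines the condition field is left empty and the vectors are stand-in values, purely for brevity.

```python
def split_reaction(rxn_smiles: str) -> tuple[str, str, str]:
    """Split reaction SMILES 'reactants>conditions>products' into three fields."""
    reactants, conditions, products = rxn_smiles.split(">")
    return reactants, conditions, products


def concat_rows(k_part: list[list[float]], m_part: list[list[float]]) -> list[list[float]]:
    """Stack a (K, N) and an (M, N) matrix row-wise into a (V, N) matrix, V = K + M."""
    widths = {len(row) for row in k_part + m_part}
    if len(widths) > 1:
        raise ValueError("all rows must share the same N")
    return [list(row) for row in k_part] + [list(row) for row in m_part]


if __name__ == "__main__":
    reactants, conditions, products = split_reaction("C1=COCCC1.OCCCCCCO>>OCCCCCCOC1CCCCO1")
    k_part = [[0.1, 0.2] for _ in range(3)]  # stand-in (K=3, N=2) reactant vectors
    m_part = [[0.3, 0.4] for _ in range(2)]  # stand-in (M=2, N=2) condition vectors
    v = concat_rows(k_part, m_part)
    print(len(v), len(v[0]))  # 5 2
```

Concatenating along the first axis keeps N fixed, so the network sees one sequence of V vectors regardless of how many elements came from the molecule and how many from the conditions.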
Step S107: inputting the (V, N) vector into a neural network to obtain a predicted yield value of the chemical equation.
In some embodiments, the neural network is a convolutional neural network or a recurrent neural network. Step S107 is explained below using a long short-term memory (LSTM) network, a commonly used type of recurrent neural network, as an example. In the LSTM network:
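$$i_t = \sigma(W_i x_t + U_i s_{t-1} + b_i)$$
$$f_t = \sigma(W_f x_t + U_f s_{t-1} + b_f)$$
$$o_t = \sigma(W_o x_t + U_o s_{t-1} + b_o)$$
$$\tilde{C}_t = \tanh(W_c x_t + U_c s_{t-1} + b_c)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
$$h_t = o_t * \tanh(C_t)$$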
wherein
$i_t$, $f_t$ and $o_t$ are the input gate, the forget gate and the output gate, respectively;
$\tilde{C}_t$ is the candidate state value calculated from the current input and the previous hidden state, and $C_t$ is the cell state;
$x_t$ is the t-th vector among the input vectors of dimension (V, N), and $s_{t-1}$ is the hidden state at the previous time step;
W is the weight matrix applied to the current input;
U is the weight matrix applied to the previous hidden state;
U, W and b are the parameters of the LSTM network, where b is the bias term, and $\sigma$ is the sigmoid function;
$h_t$ denotes the output of the LSTM network.
The predicted yield value is finally obtained through the above LSTM computation. For convenience of illustration, the output of the LSTM network needs only one dimension to represent the reaction yield, i.e., the yield y lies in the interval 0-1, which represents the yield range of the chemical reaction. To map the predicted yield value into the interval 0-1, a sigmoid function can be applied after h is obtained:
$$y = \sigma(h) = \frac{1}{1 + e^{-h}}$$
It should be noted that other functions mapping into the interval 0-1 may also be used to express the predicted yield value; the embodiments of the present invention do not limit this.
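The LSTM computation of step S107 can be sketched in plain Python as below. This is an illustrative sketch, not the patent's implementation: the hidden size, the random initialization and the linear read-out (w_y, b_y) that maps the last hidden state to a scalar before the sigmoid are assumptions, and init_params, preact and lstm_yield are hypothetical names. With a one-dimensional hidden state, w_y = [1.0] and b_y = 0.0, the read-out reduces to y = sigmoid(h) as described above.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in: int, n_hidden: int, seed: int = 0) -> dict:
    """Random (W, U, b) for gates i, f, o and candidate c (hypothetical init)."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> list[list[float]]:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {g: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for g in "ifoc"}


def preact(gate: tuple, x: list[float], s: list[float]) -> list[float]:
    """W x_t + U s_{t-1} + b for one gate, row by row."""
    W, U, b = gate
    return [sum(w * xi for w, xi in zip(W[r], x))
            + sum(u * si for u, si in zip(U[r], s)) + b[r]
            for r in range(len(b))]


def lstm_yield(xs: list[list[float]], params: dict, w_y: list[float], b_y: float) -> float:
    """Run the LSTM over the V input vectors and map the last h to (0, 1)."""
    n_hidden = len(params["i"][2])
    s = [0.0] * n_hidden  # previous hidden state s_{t-1}
    c = [0.0] * n_hidden  # previous cell state C_{t-1}
    for x in xs:
        i = [sigmoid(z) for z in preact(params["i"], x, s)]
        f = [sigmoid(z) for z in preact(params["f"], x, s)]
        o = [sigmoid(z) for z in preact(params["o"], x, s)]
        c_tilde = [math.tanh(z) for z in preact(params["c"], x, s)]
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, c_tilde)]
        s = [ot * math.tanh(ct) for ot, ct in zip(o, c)]  # h_t becomes s_t
    # Hypothetical linear read-out, followed by the sigmoid from the text.
    return sigmoid(sum(w * h for w, h in zip(w_y, s)) + b_y)


if __name__ == "__main__":
    xs = [[0.1, -0.2, 0.3, 0.0] for _ in range(6)]  # stand-in (V=6, N=4) input
    y = lstm_yield(xs, init_params(n_in=4, n_hidden=3), w_y=[0.5, 0.5, 0.5], b_y=0.0)
    print(0.0 < y < 1.0)  # True
```

Because h_t = o_t * tanh(C_t) is confined to (-1, 1), applying the sigmoid to h directly can only reach roughly 0.27 to 0.73; a learned read-out like the assumed w_y, b_y is one way to cover the full 0-1 range.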
Step S109: obtaining the true yield value of the chemical equation, and adjusting system parameters in the neural network according to the error between the true yield value and the predicted yield value, so that the predicted yield value continually approaches the true yield value.
Specifically, adjusting the system parameters in the neural network according to the error between the true yield value and the predicted yield value comprises the following steps:
calculating the difference between the predicted yield value and the true yield value using a regression loss function;
and adjusting the system parameters in the neural network according to the difference between the predicted yield value and the true yield value.
Preferably, the loss function is a squared loss function, such as the mean squared error.
Illustratively, the system parameters adjusted in the LSTM network include U, W and b.
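The parameter adjustment in step S109 amounts to minimizing the squared loss, commonly by gradient descent. Backpropagating through the full LSTM is too long to show here, so the sketch below deliberately swaps in a simplified model: a single sigmoid unit y_hat = sigmoid(w.x + b) on a fixed feature vector (for example, a pooled (V, N) embedding; this pooling is an assumption). The data, the learning rate lr and the names predict, mse and train_step are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: list[float], b: float, xs: list[list[float]]) -> list[float]:
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in xs]


def mse(preds: list[float], targets: list[float]) -> float:
    """Squared loss averaged over the batch (mean squared error)."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def train_step(w: list[float], b: float, xs: list[list[float]],
               ys: list[float], lr: float) -> tuple[list[float], float]:
    """One gradient-descent update of (w, b) on the MSE of y_hat = sigmoid(w.x + b)."""
    n = len(xs)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y, y_hat in zip(xs, ys, predict(w, b, xs)):
        # d(MSE)/dz for this sample, using sigmoid'(z) = y_hat * (1 - y_hat)
        dz = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat) / n
        grad_w = [g + dz * xi for g, xi in zip(grad_w, x)]
        grad_b += dz
    return [wi - lr * g for wi, g in zip(w, grad_w)], b - lr * grad_b


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical features of two reactions
    ys = [0.9, 0.2]                # hypothetical true yields
    w, b = [0.0, 0.0], 0.0
    print(mse(predict(w, b, xs), ys))  # starts at about 0.125
    for _ in range(2000):
        w, b = train_step(w, b, xs, ys, lr=1.0)
    print(mse(predict(w, b, xs), ys))  # far smaller after training
```

In a full implementation the same loss would be backpropagated through the LSTM to update U, W and b; an automatic-differentiation framework is the usual way to do that rather than hand-written gradients.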
It should be noted that, when a product is obtained through several chemical reaction equations (i.e., a multi-step synthesis), the yield can be predicted equation by equation, or the intermediate steps can be treated as reaction conditions; the embodiments of the present invention do not limit this.
Compared with the prior art, the yield prediction method for the chemical reaction has the following beneficial effects:
according to the yield prediction method for a chemical reaction provided by the embodiments of the invention, the reactants, products and reaction conditions in the chemical reaction equation are segmented and vectorized, and the yield is then predicted by a neural network. This avoids cumbersome manual retrosynthetic analysis, improves the efficiency of yield prediction, and makes the method convenient for engineering application.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A method for predicting the yield of a chemical reaction, comprising the following steps:
segmenting a reactant or a product of a chemical equation according to a preset segmentation rule, so that the reactant or the product is represented as a series of K elements, wherein K is a natural number greater than 1;
vectorizing the K segmented elements into a vector of dimension (K, N), wherein N is a natural number greater than 1;
segmenting the reaction conditions in the chemical equation according to the preset segmentation rule, vectorizing the reaction conditions into a vector of dimension (M, N), and concatenating the (M, N) vector with the (K, N) vector to obtain a vector of dimension (V, N), wherein M is a natural number greater than 1 and V is the sum of K and M;
inputting the (V, N) vector into a neural network to obtain a predicted yield value of the chemical equation;
and obtaining a true yield value, and adjusting system parameters in the neural network according to the error between the true yield value and the predicted yield value, so that the predicted yield value continually approaches the true yield value.
2. The method for predicting the yield of a chemical reaction according to claim 1, wherein the preset segmentation rule comprises segmentation according to the periodic table of elements, segmentation according to compounds, or both.
3. A method for predicting the yield of a chemical reaction according to claim 1, wherein: the neural network is a convolutional neural network or a recurrent neural network.
4. The method for predicting the yield of a chemical reaction according to claim 1, wherein adjusting the system parameters in the neural network according to the error between the true yield value and the predicted yield value comprises the following steps:
calculating the difference between the predicted yield value and the true yield value using a regression loss function;
and adjusting the system parameters in the neural network according to the difference between the predicted yield value and the true yield value.
5. The method for predicting the yield of a chemical reaction according to claim 4, wherein: the loss function is a squared loss function.
6. The method for predicting the yield of a chemical reaction according to claim 1, wherein vectorizing the K segmented elements into a vector of dimension (K, N) comprises the following steps:
establishing an element table based on the K elements;
and mapping each of the K elements in the element table into an N-dimensional vector space by a word embedding method, so that each of the K elements is represented by a vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810392135.5A CN110491453A (en) | 2018-04-27 | 2018-04-27 | A kind of yield prediction method of chemical reaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810392135.5A CN110491453A (en) | 2018-04-27 | 2018-04-27 | A kind of yield prediction method of chemical reaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110491453A | 2019-11-22 |
Family
ID=68543722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810392135.5A Pending CN110491453A (en) | 2018-04-27 | 2018-04-27 | A kind of yield prediction method of chemical reaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110491453A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080243310A1 (en) * | 2007-03-28 | 2008-10-02 | Esposito William R | Production control utilizing real time optimization |
CN105893354A (en) * | 2016-05-03 | 2016-08-24 | 成都数联铭品科技有限公司 | Word segmentation method based on bidirectional recurrent neural network |
CN106484682A (en) * | 2015-08-25 | 2017-03-08 | 阿里巴巴集团控股有限公司 | Statistics-based machine translation method, device and electronic equipment |
CN107168955A (en) * | 2017-05-23 | 2017-09-15 | 南京大学 | Chinese word segmentation method using word embedding and a neural network based on word context |
US20170351948A1 (en) * | 2016-06-01 | 2017-12-07 | Seoul National University R&Db Foundation | Apparatus and method for generating prediction model based on artificial neural network |
Non-Patent Citations (4)
Title |
---|
CONNOR W. COLEY et al.: "Prediction of Organic Reaction Outcomes Using Machine Learning", 2017 American Chemical Society * |
J. N. WEI et al.: "Neural networks for the prediction of organic chemistry reactions", ACS Central Science * |
P. RACCUGLIA et al.: "Machine-learning-assisted materials discovery using failed experiments", Nature * |
ZHANG Chunmei et al.: "Prediction and analysis of product yields from pyrolysis of corn stalks" (玉米秸秆热裂解产物产率预测分析), Transactions of the Chinese Society for Agricultural Machinery (农业机械学报) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113517033A (en) * | 2021-03-23 | 2021-10-19 | 河南大学 | XGboost-based chemical reaction yield intelligent prediction and analysis method in small sample environment |
CN113517033B (en) * | 2021-03-23 | 2022-08-12 | 河南大学 | XGboost-based chemical reaction yield intelligent prediction and analysis method in small sample environment |
CN113380346A (en) * | 2021-06-08 | 2021-09-10 | 河南大学 | Coupling reaction yield intelligent prediction method based on attention convolution neural network |
JP7302075B1 (en) | 2022-07-06 | 2023-07-03 | 日本曹達株式会社 | Method/device for generating estimation model for estimating reaction conditions, method/device for providing reaction conditions, and program |
JP2024007916A (en) * | 2022-07-06 | 2024-01-19 | 日本曹達株式会社 | Generation method/generation device of estimation model for estimating reaction condition, provision method/provision device of reaction condition and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20191122 |