CN110491453A - A kind of yield prediction method of chemical reaction - Google Patents

Publication number
CN110491453A
Authority
CN
China
Prior art keywords
yield
vector
dimension
chemical reaction
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810392135.5A
Other languages
Chinese (zh)
Inventor
张倬胜
赵海
姜舒
李江彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201810392135.5A priority Critical patent/CN110491453A/en
Publication of CN110491453A publication Critical patent/CN110491453A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a yield prediction method for a chemical reaction, which aims to provide a more effective way of predicting chemical reaction yield. The method comprises the following steps: splitting a reactant or a product into a series of K elements; vectorizing the K split elements into a vector of dimension (K, N); splitting the reaction conditions and vectorizing them into a vector of dimension (M, N), then splicing the (M, N) vector with the (K, N) vector to obtain a vector of dimension (V, N); inputting the (V, N) vector into a neural network to obtain a predicted yield value; and obtaining the true yield value and adjusting system parameters so that the predicted yield value continually approaches the true yield value. By splitting and vectorizing the reactants, products and reaction conditions of a chemical equation and then predicting the yield with a neural network, the invention improves the efficiency of yield prediction and facilitates engineering application.

Description

Yield prediction method for chemical reaction
Technical Field
The invention relates to the interdisciplinary field of computer science and organic chemical synthesis, and in particular to a yield prediction method for a chemical reaction.
Background
Machine learning uncovers internal associations by learning the deep information hidden in data and then makes predictions and judgments from them. It offers effective insight and high efficiency, and in some vertical fields it even surpasses human ability.
Traditionally, predicting the yield of a chemical reaction relies on the user recalling numerous organic reaction mechanisms and, after consulting a large body of literature, working out a plausible retrosynthetic analysis that relates the target molecule to the available raw materials. This approach is cumbersome and inconvenient for engineering application.
Disclosure of Invention
Machine learning has been applied in many fields, such as biopharmaceuticals and medical diagnosis, where it has changed traditional research methods, improved research efficiency and driven change across industries. In view of this, and in order to find a more effective implementation scheme for predicting the yield of a chemical reaction, the invention provides a more effective yield prediction method for chemical reactions.
In order to achieve the above object, the present invention provides a method for predicting yield of a chemical reaction, comprising the steps of:
splitting a reactant or a product of a chemical equation according to a preset splitting rule, so that the reactant or the product is represented as a series of K elements, wherein K is a natural number greater than 1;
vectorizing the K segmented elements into a vector of (K, N) dimension, wherein N is a natural number greater than 1;
segmenting the reaction conditions in the chemical equation according to the preset segmentation rule, vectorizing the segmented reaction conditions into a vector of (M, N) dimension, and splicing the vector of (M, N) dimension with the vector of (K, N) dimension to obtain a vector of (V, N) dimension, wherein M is a natural number greater than 1, and V is the sum of K and M;
inputting the vector of the (V, N) dimension into a neural network to obtain a yield predicted value of the chemical equation;
and acquiring a true yield value, and adjusting system parameters in the neural network according to the error between the true yield value and the predicted yield value, so that the predicted yield value continually approaches the true yield value.
Preferably, the preset segmentation rule includes one or both of segmentation according to the periodic table of elements and segmentation according to compounds.
Preferably, the neural network is a convolutional neural network or a recurrent neural network.
Preferably, adjusting the system parameters in the neural network according to the error between the true yield value and the predicted yield value comprises the following steps:
calculating the difference between the predicted yield value and the true yield value by regression fitting with a loss function;
and adjusting system parameters in the neural network according to the difference between the predicted yield value and the true yield value.
Preferably, the loss function is a squared loss function.
Preferably, vectorizing the K segmented elements into a vector of (K, N) dimension comprises the following steps:
establishing an element table based on the K elements;
and mapping each of the K elements in the element table to an N-dimensional vector space by a word embedding method, so that each of the K elements is represented by a vector.
Compared with the prior art, the yield prediction method for the chemical reaction has the following beneficial effects:
According to the yield prediction method for the chemical reaction, the reactants, the products and the reaction conditions in the chemical reaction equation are segmented and vectorized, and the yield is then predicted by the neural network. This avoids cumbersome manual retrosynthetic analysis, improves the efficiency of yield prediction and facilitates engineering application.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart of a method for predicting yield of a chemical reaction according to an embodiment of the present invention;
FIG. 2 is an exemplary schematic diagram of step S105 in the yield prediction method of a chemical reaction according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
Referring to FIGS. 1 and 2, a method for predicting yield of a chemical reaction according to an embodiment of the present invention includes the following steps:
Step S101: splitting reactants or products of a chemical equation according to a preset segmentation rule, so that the reactants or the products are represented as a series of K elements, wherein K is a natural number greater than 1.
Preferably, the preset segmentation rule includes one or both of segmentation according to the periodic table of elements and segmentation according to compounds.
Illustratively, the original chemical expression is:
[C:1]([C:3]1[CH:8]=[CH:7][CH:6]=[CH:5][C:4]=1[OH:9])#[N:2].[CH2:10]([CH:12]1[O:14][CH2:13]1)Cl
It is then split, according to the periodic table of elements, into Cl C1C O, N # C C1C C C C1O.
In some embodiments, in order to achieve automatic segmentation of elements, a segmentation model may be trained in advance to segment the elements using manually labeled segmentation data.
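As a rough illustration of element-level splitting, the sketch below uses a regular expression rather than the trained segmentation model mentioned above. The function name `split_by_element`, the token pattern and the unmapped input string are assumptions made for this example and do not come from the patent.

```python
import re

# Hypothetical token pattern (not from the patent): bracket atoms are kept
# whole, the two-letter elements Cl and Br are single tokens, and every other
# character (atom, bond symbol, ring digit, dot) is its own token.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")

def split_by_element(smiles: str) -> list[str]:
    """Split a SMILES-like string into a list of K element-level tokens."""
    return TOKEN_PATTERN.findall(smiles)

# Assumed, unmapped input string used only to exercise the tokenizer.
tokens = split_by_element("ClCC1CO1.N#Cc1ccccc1O")
print(len(tokens), tokens)
```

The important design point is that "Cl" is matched before the single-character fallback, so chlorine is not split into "C" and "l".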
Step S103: vectorizing the K segmented elements into a vector of (K, N) dimension, wherein N is a natural number greater than 1.
Specifically, vectorizing the K segmented elements into a vector of (K, N) dimension includes the following steps:
establishing an element table based on the K elements;
and mapping each of the K elements in the element table to an N-dimensional vector space by a word embedding method, so that each of the K elements is represented by a vector.
Illustratively, continuing the example above, in which the expression is split into Cl C1C O. N # C C1C C C C1O according to the periodic table of elements, if N is assumed to be 100, the element Cl is represented by the word embedding method as a 100-dimensional vector such as:
[0.418 0.24968 -0.41242 0.1217 0.34527 -0.044457 …].
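The element table and embedding lookup can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `build_element_table`, `init_embeddings` and `embed` are hypothetical, and the vectors are randomly initialised here, whereas in practice they would be learned.

```python
import random

def build_element_table(tokens):
    """Map each distinct token to an integer index, in order of first appearance."""
    table = {}
    for tok in tokens:
        table.setdefault(tok, len(table))
    return table

def init_embeddings(table, n_dim, seed=0):
    """One randomly initialised N-dimensional vector per table entry (assumed init)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_dim)] for _ in table]

def embed(tokens, table, embeddings):
    """Look up each of the K tokens, giving a K x N list of rows."""
    return [embeddings[table[tok]] for tok in tokens]

tokens = ["Cl", "C", "C", "1", "C", "O", "1"]   # K = 7
table = build_element_table(tokens)
embeddings = init_embeddings(table, n_dim=100)   # N = 100, as in the example
matrix = embed(tokens, table, embeddings)
print(len(matrix), len(matrix[0]))               # K and N
```

Repeated tokens share the same row of the embedding table, which is what allows the same element to be represented consistently across different reactions.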
Step S105: segmenting the reaction conditions in the chemical equation according to the preset segmentation rule, vectorizing the segmented reaction conditions into a vector of (M, N) dimension, and splicing the vector of (M, N) dimension with the vector of (K, N) dimension to obtain a vector of (V, N) dimension, wherein M is a natural number greater than 1, and V is the sum of K and M.
It is understood that the processing in step S105 is substantially the same as in steps S101 and S103, except that it is applied to the reaction conditions.
Illustratively, as shown in FIG. 2, consider the chemical reaction equation:
C1=COCCC1.OCCCCCCO>C1.C1C(C1)C1>OCCCCCCOC1CCCCO1
The reactant corresponding to this equation is C1=COCCC1.OCCCCCCO, the product is OCCCCCCOC1CCCCO1, and the reaction condition is C1.C1C(C1)C1. Step S105 then splices the segmented and vectorized reactant with the segmented and vectorized reaction condition, or splices the segmented and vectorized product with the segmented and vectorized reaction condition.
It should be noted that the above equation is written in the Simplified Molecular Input Line Entry Specification (SMILES), a specification that explicitly describes the structure of a molecule using a string of ASCII (American Standard Code for Information Interchange) characters.
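The splitting of the reaction string and the splicing of the two vectors can be sketched as below. The function names `split_reaction` and `splice`, the toy vectors and the choice to place the condition rows before the molecule rows are assumptions for illustration; the text does not fix the splicing order.

```python
def split_reaction(reaction: str):
    """Split 'reactants>conditions>products' into its three parts."""
    reactants, conditions, products = reaction.split(">")
    return reactants, conditions, products

def splice(condition_vectors, molecule_vectors):
    """Concatenate an (M, N) and a (K, N) list of rows into (V, N), with V = M + K."""
    if condition_vectors and molecule_vectors:
        assert len(condition_vectors[0]) == len(molecule_vectors[0]), "N must match"
    return condition_vectors + molecule_vectors

reactants, conditions, products = split_reaction(
    "C1=COCCC1.OCCCCCCO>C1.C1C(C1)C1>OCCCCCCOC1CCCCO1"
)
# Toy vectors with N = 3; real rows would come from the word embedding step.
cond = [[0.0, 0.0, 0.0] for _ in range(2)]   # M = 2
mol = [[1.0, 1.0, 1.0] for _ in range(4)]    # K = 4
spliced = splice(cond, mol)
print(reactants, conditions, products, len(spliced))
```

Because both inputs share the same N, the result can be fed to a sequence model row by row without any further reshaping.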
Step S107: inputting the vector of the (V, N) dimension into a neural network to obtain a yield prediction value of the chemical equation.
In some embodiments, the neural network is a convolutional neural network or a recurrent neural network. Step S107 is explained below taking the long short-term memory (LSTM) network, a commonly used recurrent neural network, as an example. In the LSTM network,
$i_t = \sigma(W_i x_t + U_i s_{t-1} + b_i)$
$f_t = \sigma(W_f x_t + U_f s_{t-1} + b_f)$
$o_t = \sigma(W_o x_t + U_o s_{t-1} + b_o)$
$h_t = o_t * \tanh(C_t)$
wherein,
$i_t$, $f_t$ and $o_t$ are the input gate, the forget gate and the output gate, respectively;
$C_t$ is a state value calculated based on the current input and the previous hidden state;
$x_t$ is the t-th vector among the input vectors of (V, N) dimensions;
$W$ is the weight matrix applied to the current input $x_t$;
$U$ is the weight matrix applied to the hidden state $s_{t-1}$ of the previous moment;
$U$, $W$ and $b$ are the parameters of the LSTM network, and $b$ is the bias term;
$h$ represents the output of the LSTM network.
Through the above calculation of the LSTM network, the yield prediction value can finally be obtained. For convenience of illustration, the output of the LSTM network needs only one dimension to represent the reaction yield; that is, the output yield y lies in the interval 0-1, which represents the yield range of the chemical reaction. To map the predicted yield value into the interval 0-1, a sigmoid function can be applied after h is obtained:
$y = \mathrm{sigmoid}(h) = \frac{1}{1 + e^{-h}}$
It should be noted that other functions may also be used to map the predicted yield value into the interval 0-1, which is not limited in the embodiments of the present invention.
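The forward pass described above can be sketched in plain Python as follows. Several parts are assumptions, not taken from the text: the hidden size is fixed at 1 so that the output has one dimension, the candidate state and the cell-state update follow the standard LSTM formulation (the text does not show them), the previous hidden state $s_{t-1}$ is taken to be the previous output $h_{t-1}$, and the names `init_params` and `predict_yield` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def init_params(n_dim, seed=0):
    """Random W (length-N row), U (scalar) and b (scalar) for gates i, f, o and candidate c."""
    rng = random.Random(seed)
    return {
        g: {"W": [rng.uniform(-0.1, 0.1) for _ in range(n_dim)],
            "U": rng.uniform(-0.1, 0.1),
            "b": 0.0}
        for g in ("i", "f", "o", "c")
    }

def predict_yield(inputs, params):
    """Run a one-unit LSTM over the V rows of `inputs`, then squash the final h into (0, 1)."""
    s_prev, c_prev = 0.0, 0.0   # previous hidden state and cell state
    for x_t in inputs:
        pre = {g: dot(p["W"], x_t) + p["U"] * s_prev + p["b"] for g, p in params.items()}
        i_t, f_t, o_t = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        c_tilde = math.tanh(pre["c"])        # assumed standard candidate state
        c_t = f_t * c_prev + i_t * c_tilde   # assumed standard cell-state update
        h_t = o_t * math.tanh(c_t)           # h_t = o_t * tanh(C_t), as in the text
        s_prev, c_prev = h_t, c_t
    return sigmoid(s_prev)

rng = random.Random(1)
inputs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # V = 5, N = 4
params = init_params(n_dim=4)
y = predict_yield(inputs, params)
print(y)
```

With a hidden size of 1 the gate weights reduce to a row vector and two scalars per gate, which keeps the example short; a practical model would use a larger hidden state followed by a projection to one output.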
Step S109: acquiring a true yield value of the chemical equation, and adjusting system parameters in the neural network according to the error between the true yield value and the predicted yield value, so that the predicted yield value continually approaches the true yield value.
Specifically, adjusting the system parameters in the neural network according to the error between the true yield value and the predicted yield value comprises the following steps:
calculating the difference between the predicted yield value and the true yield value by regression fitting with a loss function;
and adjusting system parameters in the neural network according to the difference between the predicted yield value and the true yield value.
Preferably, the loss function is a squared loss function, such as a mean square error.
Illustratively, the system parameters adjusted in the long-short term memory network include U, W and b.
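The parameter adjustment can be illustrated with a deliberately simplified stand-in for the full network: a single sigmoid unit trained by gradient descent on the squared loss. Backpropagation through the LSTM is omitted, and the function name `train_step`, the learning rate, the feature vector and the target yield are all made-up values for this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_loss(pred, target):
    return (pred - target) ** 2

def train_step(w, b, x, target, lr=0.5):
    """One gradient-descent update of w and b on the squared loss."""
    pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # Chain rule: dL/dz = 2 * (pred - target) * pred * (1 - pred)
    grad_z = 2.0 * (pred - target) * pred * (1.0 - pred)
    new_w = [wi - lr * grad_z * xi for wi, xi in zip(w, x)]
    new_b = b - lr * grad_z
    return new_w, new_b, squared_loss(pred, target)

w, b = [0.0, 0.0, 0.0], 0.0
x, true_yield = [0.2, -0.1, 0.4], 0.85   # made-up features and true yield
losses = []
for _ in range(200):
    w, b, loss = train_step(w, b, x, true_yield)
    losses.append(loss)
print(losses[0], losses[-1])
```

Each update moves the prediction toward the true yield, so the recorded loss shrinks over the iterations; the same principle applies when the adjusted parameters are U, W and b of the LSTM.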
It should be noted that, when a product is obtained through several chemical reaction equations, the yield may be predicted equation by equation, or the intermediate processes may be treated as reaction conditions, which is not limited in the embodiments of the present invention.
Compared with the prior art, the yield prediction method for the chemical reaction has the following beneficial effects:
According to the yield prediction method for the chemical reaction provided by the embodiment of the invention, the reactants, the products and the reaction conditions in the chemical reaction equation are segmented and vectorized, and the yield is then predicted by a neural network. This avoids cumbersome manual retrosynthetic analysis, improves the efficiency of yield prediction and facilitates engineering application.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A method for predicting the yield of a chemical reaction, characterized in that the method comprises the following steps:
splitting a reactant or a product of a chemical equation according to a preset segmentation rule, so that the reactant or the product is represented as a series of K elements, wherein K is a natural number greater than 1;
vectorizing the K segmented elements into a vector of (K, N) dimension, wherein N is a natural number greater than 1;
segmenting the reaction conditions in the chemical equation according to the preset segmentation rule, vectorizing the segmented reaction conditions into a vector of (M, N) dimension, and splicing the vector of (M, N) dimension with the vector of (K, N) dimension to obtain a vector of (V, N) dimension, wherein M is a natural number greater than 1, and V is the sum of K and M;
inputting the vector of the (V, N) dimension into a neural network to obtain a predicted yield value of the chemical equation;
and acquiring a true yield value, and adjusting system parameters in the neural network according to the error between the true yield value and the predicted yield value, so that the predicted yield value continually approaches the true yield value.
2. The method for predicting the yield of a chemical reaction according to claim 1, wherein the preset segmentation rule comprises one or both of segmentation according to the periodic table of elements and segmentation according to compounds.
3. The method for predicting the yield of a chemical reaction according to claim 1, wherein the neural network is a convolutional neural network or a recurrent neural network.
4. The method for predicting the yield of a chemical reaction according to claim 1, wherein adjusting the system parameters in the neural network according to the error between the true yield value and the predicted yield value comprises the following steps:
calculating the difference between the predicted yield value and the true yield value by regression fitting with a loss function;
and adjusting system parameters in the neural network according to the difference between the predicted yield value and the true yield value.
5. The method for predicting the yield of a chemical reaction according to claim 4, wherein the loss function is a squared loss function.
6. The method for predicting the yield of a chemical reaction according to claim 1, wherein vectorizing the K segmented elements into a vector of (K, N) dimension comprises the following steps:
establishing an element table based on the K elements;
and mapping each of the K elements in the element table to an N-dimensional vector space by a word embedding method, so that each of the K elements is represented by a vector.
CN201810392135.5A 2018-04-27 2018-04-27 A kind of yield prediction method of chemical reaction Pending CN110491453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810392135.5A CN110491453A (en) 2018-04-27 2018-04-27 A kind of yield prediction method of chemical reaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810392135.5A CN110491453A (en) 2018-04-27 2018-04-27 A kind of yield prediction method of chemical reaction

Publications (1)

Publication Number Publication Date
CN110491453A true CN110491453A (en) 2019-11-22

Family

ID=68543722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810392135.5A Pending CN110491453A (en) 2018-04-27 2018-04-27 A kind of yield prediction method of chemical reaction

Country Status (1)

Country Link
CN (1) CN110491453A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243310A1 (en) * 2007-03-28 2008-10-02 Esposito William R Production control utilizing real time optimization
CN105893354A (en) * 2016-05-03 2016-08-24 成都数联铭品科技有限公司 Word segmentation method based on bidirectional recurrent neural network
CN106484682A (en) * 2015-08-25 2017-03-08 阿里巴巴集团控股有限公司 Based on the machine translation method of statistics, device and electronic equipment
CN107168955A (en) * 2017-05-23 2017-09-15 南京大学 Word insertion and the Chinese word cutting method of neutral net using word-based context
US20170351948A1 (en) * 2016-06-01 2017-12-07 Seoul National University R&Db Foundation Apparatus and method for generating prediction model based on artificial neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CONNOR W. COLEY等: "Prediction of Organic Reaction Outcomes Using Machine Learning", 《2017 AMERICAN CHEMICAL SOCIETY》 *
J.N.WEI等: "Neural networks for the prediction of organic chemistry reactions", 《ACS CENTRAL SCI》 *
P.RACCUGLIA等: "Machine-learning-assisted materials discovery using failed experiments", 《NATURE》 *
张春梅等: "玉米秸秆热裂解产物产率预测分析", 《农业机械学报》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113517033A (en) * 2021-03-23 2021-10-19 河南大学 XGboost-based chemical reaction yield intelligent prediction and analysis method in small sample environment
CN113517033B (en) * 2021-03-23 2022-08-12 河南大学 XGboost-based chemical reaction yield intelligent prediction and analysis method in small sample environment
CN113380346A (en) * 2021-06-08 2021-09-10 河南大学 Coupling reaction yield intelligent prediction method based on attention convolution neural network
JP7302075B1 (en) 2022-07-06 2023-07-03 日本曹達株式会社 Method/device for generating estimation model for estimating reaction conditions, method/device for providing reaction conditions, and program
JP2024007916A (en) * 2022-07-06 2024-01-19 日本曹達株式会社 Generation method/generation device of estimation model for estimating reaction condition, provision method/provision device of reaction condition and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191122
