CN110491453A

CN110491453A - Yield prediction method for chemical reaction

Info

Publication number: CN110491453A
Application number: CN201810392135.5A
Authority: CN
Inventors: 张倬胜; 赵海; 姜舒; 李江彤
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2018-04-27
Filing date: 2018-04-27
Publication date: 2019-11-22

Abstract

The present invention discloses a yield prediction method for a chemical reaction, which aims to provide a more efficient way of predicting the yield of a chemical reaction and comprises the following steps: segmenting a reactant or a product so that it is expressed as a series of K elements; vectorizing the K segmented elements into a vector of (K, N) dimensions; segmenting the reaction conditions and vectorizing them into a vector of (M, N) dimensions, and splicing the vector of (M, N) dimensions with the vector of (K, N) dimensions to obtain a vector of (V, N) dimensions; inputting the vector of (V, N) dimensions into a neural network to obtain a yield predicted value; and obtaining a yield true value and adjusting system parameters so that the yield predicted value continuously approaches the yield true value. By segmenting and vectorizing the reactants, products and reaction conditions of the chemical equation and then predicting the yield with a neural network, the invention improves the efficiency of yield prediction and is convenient for engineering application.

Description

Yield prediction method for chemical reaction

Technical Field

The invention relates to the cross field of computer science and chemical organic synthesis, in particular to a yield prediction method for chemical reaction.

Background

Machine learning mines internal associations by learning the deep information hidden in data and then makes predictions and judgments. Because it offers comparatively effective insight and high efficiency, it can even surpass human ability in vertical fields.

Traditional prediction of chemical reaction yield relies on the user recalling many organic reaction mechanisms and, after consulting a large body of literature, arriving at a plausible retrosynthetic analysis that relates the target molecule to the available raw materials. This approach to yield prediction is relatively cumbersome and inconvenient for engineering application.

Disclosure of Invention

In order to find a more effective scheme for predicting the yield of a chemical reaction, and considering that the application of machine learning in fields such as biopharmaceuticals and medical diagnosis has changed traditional research methods, improved research efficiency and promoted change across industries, the invention provides a more effective method for predicting the yield of a chemical reaction.

In order to achieve the above object, the present invention provides a method for predicting yield of a chemical reaction, comprising the steps of:

segmenting a reactant or a product of a chemical equation according to a preset segmentation rule, so that the reactant or the product is represented as a series of K elements, wherein K is a natural number greater than 1;

vectorizing the K segmented elements into a vector of (K, N) dimensions, wherein N is a natural number greater than 1;

segmenting the reaction conditions in the chemical equation according to the preset segmentation rule, vectorizing the segmented reaction conditions into a vector of (M, N) dimensions, and splicing the vector of (M, N) dimensions and the vector of (K, N) dimensions to obtain a vector of (V, N) dimensions, wherein M is a natural number greater than 1, and V is the sum of K and M;

inputting the vector of (V, N) dimensions into a neural network to obtain a yield predicted value of the chemical equation;

and acquiring a yield true value, and adjusting system parameters in the neural network according to the error between the yield true value and the yield predicted value, so that the yield predicted value continuously approaches the yield true value.

Preferably, the preset segmentation rule includes one or both of segmentation according to the periodic table of elements and segmentation according to compounds.

Preferably, the neural network is a convolutional neural network or a recurrent neural network.

Preferably, adjusting the system parameters in the neural network according to the error between the yield true value and the yield predicted value comprises the following steps:

calculating the difference between the yield predicted value and the yield true value through a loss function used for regression fitting;

and adjusting the system parameters in the neural network according to the difference between the yield predicted value and the yield true value.

Preferably, the loss function is a squared loss function.

Preferably, vectorizing the K segmented elements into a vector of (K, N) dimensions comprises the following steps:

establishing an element table based on the K elements;

and mapping each of the K elements in the element table to an N-dimensional vector space by a word embedding method, so that each of the K elements is represented by a vector.

Compared with the prior art, the yield prediction method for the chemical reaction has the following beneficial effects:

According to the yield prediction method for the chemical reaction, the reactants, products and reaction conditions in the chemical reaction equation are segmented and vectorized, and the yield is then predicted by a neural network. This avoids cumbersome manual retrosynthetic analysis, improves the efficiency of yield prediction, and is convenient for engineering application.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flow chart of a method for predicting yield of a chemical reaction according to an embodiment of the present invention;

FIG. 2 is an exemplary schematic diagram of step S105 in the yield prediction method for a chemical reaction according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.

Referring to fig. 1-2, a method for predicting yield of a chemical reaction according to an embodiment of the present invention includes the following steps:

Step S101: segmenting reactants or products of a chemical equation according to a preset segmentation rule, so that the reactants or the products are represented as a series of K elements, wherein K is a natural number greater than 1.

Preferably, the preset segmentation rule includes one or both of segmentation according to the periodic table of elements and segmentation according to compounds.

Illustratively, the original chemical expression

[C:1]([C:3]1[CH:8]=[CH:7][CH:6]=[CH:5][C:4]=1[OH:9])#[N:2].[CH2:10]([CH:12]1[O:14][CH2:13]1)Cl

is segmented according to the periodic table of elements into Cl C1C O, N # C C1C C C C1O.

In some embodiments, in order to achieve automatic segmentation of elements, a segmentation model may be trained in advance to segment the elements using manually labeled segmentation data.
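As a rough illustration of rule-based segmentation (as opposed to the trained segmentation model mentioned above), the following sketch splits a SMILES-style string into element-level tokens with a regular expression. The pattern `TOKEN_PATTERN` and the function `segment` are hypothetical names chosen for this example, and the tokenization rule is an assumption; it is not claimed to reproduce the exact segmentation shown in the example above.

```python
import re

# Assumed rule: bracket atoms, two-letter halogens, one-letter atoms,
# ring-closure labels, and bond/branch symbols each become one token.
TOKEN_PATTERN = re.compile(
    r"\[[^\]]+\]"            # bracket atom such as [CH2:10] or [OH:9]
    r"|Br|Cl"                # two-letter elements, tried before single letters
    r"|[BCNOPSFI]"           # one-letter aliphatic elements
    r"|[bcnops]"             # aromatic atoms
    r"|%\d{2}|\d"            # ring-closure labels
    r"|[=#\-+/\\().:~@*$]"   # bonds, branches and the '.' separator
)

def segment(smiles: str) -> list[str]:
    """Split a SMILES-style string into a list of K tokens."""
    tokens = TOKEN_PATTERN.findall(smiles)
    # Guard against characters the assumed rule does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens

epoxide_tokens = segment("ClCC1CO1")
nitrile_tokens = segment("N#Cc1ccccc1O")
```

Trying `Cl` before the single-letter alternatives keeps chlorine as one token instead of splitting it into `C` and `l`; the length of the returned list plays the role of K.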

Step S103: vectorizing the K segmented elements into a vector of (K, N) dimensions, wherein N is a natural number greater than 1.

Specifically, vectorizing the K segmented elements into a vector of (K, N) dimensions includes the following steps:

establishing an element table based on the K elements;

and mapping each of the K elements in the element table to an N-dimensional vector space by a word embedding method, so that each of the K elements is represented by a vector.

Illustratively, for the segmentation result above (Cl C1C O, N # C C1C C C C1O), if N is assumed to be 100, the element Cl is represented by the word embedding method as a 100-dimensional vector:

[0.418 0.24968 -0.41242 0.1217 0.34527 -0.044457 …]
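The element table and the embedding lookup can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names (`build_element_table`, `init_embeddings`, `vectorize`) are hypothetical, and the embedding values are randomly initialized here, whereas a word embedding method would normally learn them from data.

```python
import random

def build_element_table(tokens):
    """Assign each distinct token an integer index, in order of first appearance."""
    table = {}
    for tok in tokens:
        table.setdefault(tok, len(table))
    return table

def init_embeddings(table_size, n_dims, seed=0):
    """Create one N-dimensional vector per table entry (random, i.e. untrained)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_dims)]
            for _ in range(table_size)]

def vectorize(tokens, table, embeddings):
    """Map K tokens to a (K, N) list of rows by table lookup."""
    return [embeddings[table[tok]] for tok in tokens]

tokens = ["Cl", "C", "C", "1", "C", "O", "1"]          # K = 7 tokens
table = build_element_table(tokens)
embeddings = init_embeddings(len(table), n_dims=100)   # N = 100, as in the text
matrix = vectorize(tokens, table, embeddings)          # shape (K, N)
```

Repeated tokens share one row of the embedding table, so the table has one entry per distinct element while the output matrix has one row per occurrence.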

Step S105: segmenting the reaction conditions in the chemical equation according to the preset segmentation rule, vectorizing the segmented reaction conditions into a vector of (M, N) dimensions, and splicing the vector of (M, N) dimensions and the vector of (K, N) dimensions to obtain a vector of (V, N) dimensions, wherein M is a natural number greater than 1, and V is the sum of K and M.

It is understood that the processing in step S105 is substantially the same as in steps S101 and S103, except that it acts on the reaction conditions.

Illustratively, as shown in FIG. 2, for the chemical reaction equation

C1=COCCC1.OCCCCCCO>C1.C1C(C1)C1>OCCCCCCOC1CCCCO1

the reactants are C1=COCCC1.OCCCCCCO, the product is OCCCCCCOC1CCCCO1, and the reaction conditions are C1.C1C(C1)C1. Step S105 then splices the segmented and vectorized reactants with the segmented and vectorized reaction conditions, or splices the segmented and vectorized product with the segmented and vectorized reaction conditions.

It should be noted that the above equation is written in the Simplified Molecular Input Line Entry System (SMILES), a specification for explicitly describing molecular structure with ASCII strings.
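The splitting and splicing described for this step can be sketched as below, using the example equation with ASCII `=` signs. The function names are hypothetical, `toy_vectorize` is only a stand-in for the real segmentation and embedding, and placing the condition rows before the molecule rows is an assumption based on the order in which the step lists them.

```python
def split_reaction(reaction):
    """Split 'reactants>conditions>products' into its three parts."""
    parts = reaction.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    return tuple(parts)

def toy_vectorize(text, n_dims=4):
    """Hypothetical stand-in for segmentation plus embedding: one row per character."""
    return [[float(ord(ch) % 7)] * n_dims for ch in text]

def splice(condition_rows, molecule_rows):
    """Concatenate an (M, N) and a (K, N) row list into a (V, N) row list, V = M + K."""
    widths = {len(row) for row in condition_rows + molecule_rows}
    if len(widths) != 1:
        raise ValueError("all rows must share the same N")
    return condition_rows + molecule_rows

reaction = "C1=COCCC1.OCCCCCCO>C1.C1C(C1)C1>OCCCCCCOC1CCCCO1"
reactants, conditions, products = split_reaction(reaction)
cond_rows = toy_vectorize(conditions)    # (M, N)
mol_rows = toy_vectorize(reactants)      # (K, N)
spliced = splice(cond_rows, mol_rows)    # (V, N) with V = M + K
```

Splicing along the first axis only works because both inputs share the same N, which is why the sketch checks the row widths before concatenating.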

Step S107: inputting the vector of (V, N) dimensions into a neural network to obtain a yield predicted value of the chemical equation.

In some embodiments, the neural network is a convolutional neural network or a recurrent neural network. Step S107 is explained below by taking the long short-term memory (LSTM) network, a commonly used recurrent neural network, as an example. In the LSTM network,

i_t = \sigma(W_i x_t + U_i s_{t-1} + b_i)

f_t = \sigma(W_f x_t + U_f s_{t-1} + b_f)

o_t = \sigma(W_o x_t + U_o s_{t-1} + b_o)

h_t = o_t * \tanh(C_t)

wherein,

i_t, f_t and o_t are the input gate, the forget gate and the output gate, respectively;

C_t is a state value calculated based on the current input and the previous hidden state;

x_t is the t-th vector of the (V, N)-dimensional input;

s_{t-1} is the hidden state at the previous moment;

W is the weight matrix applied to the current input;

U is the weight matrix applied to the state at the previous moment;

U, W and b are the parameters of the LSTM network, and b is the bias term;

h represents the output of the LSTM network.
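The gate equations can be traced in a small pure-Python sketch. Two points are assumptions rather than statements from the text: the cell state C_t is updated with the standard LSTM rule (a tanh candidate state combined through the forget and input gates), and the hidden state s_{t-1} fed to the next step is taken to be the previous output h_{t-1}. All function names and the parameter layout are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(p, x_t, s_prev):
    """Compute W x_t + U s_{t-1} + b for one parameter triple p = (W, U, b)."""
    W, U, b = p
    return [
        sum(w * x for w, x in zip(W[r], x_t))
        + sum(u * s for u, s in zip(U[r], s_prev))
        + b[r]
        for r in range(len(b))
    ]

def lstm_step(x_t, s_prev, c_prev, params):
    i_t = [sigmoid(z) for z in gate(params["i"], x_t, s_prev)]   # input gate
    f_t = [sigmoid(z) for z in gate(params["f"], x_t, s_prev)]   # forget gate
    o_t = [sigmoid(z) for z in gate(params["o"], x_t, s_prev)]   # output gate
    # Assumption: standard candidate state and cell update, not given in the text.
    c_cand = [math.tanh(z) for z in gate(params["c"], x_t, s_prev)]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]           # h_t = o_t * tanh(C_t)
    return h_t, c_t

def init_params(n_dims, hidden, seed=0):
    """Random (W, U, b) triples for the gates i, f, o and the candidate state c."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {k: (mat(hidden, n_dims), mat(hidden, hidden), [0.0] * hidden)
            for k in "ifoc"}

def run_lstm(inputs, params, hidden):
    """Feed the V rows of a (V, N) input one at a time; return the final output h."""
    s, c = [0.0] * hidden, [0.0] * hidden
    for x_t in inputs:
        s, c = lstm_step(x_t, s, c, params)   # assumption: s_t is the output h_t
    return s

inputs = [[0.1 * (t + 1)] * 4 for t in range(3)]   # toy (V, N) = (3, 4) input
params = init_params(n_dims=4, hidden=5)
h = run_lstm(inputs, params, hidden=5)
```

Because each output component is a sigmoid gate multiplied by a tanh, every entry of `h` lies strictly between -1 and 1, which is why a further mapping is needed to express a yield in the interval 0-1.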

The yield predicted value can finally be obtained through the above calculation of the long short-term memory network. For convenience of illustration, the output of the long short-term memory network needs only one dimension to represent the reaction yield: the output yield y lies in the interval 0-1, which represents the yield range of the chemical reaction. To map the predicted yield value into the interval 0-1, a sigmoid function can be applied after h is obtained.

It should be noted that other functions may also be used to express the predicted yield value in the interval 0-1, which is not limited in the embodiments of the present invention.
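One possible form of the sigmoid mapping is sketched below. The text does not give the exact expression, so the linear projection of h onto a scalar through an output weight vector `w` and bias `b` is an assumption, as is the function name `predict_yield`.

```python
import math

def predict_yield(h, w, b):
    """Map the network output h to a yield in (0, 1) via a sigmoid.

    w and b are hypothetical output-layer parameters, not named in the text.
    """
    z = sum(w_j * h_j for w_j, h_j in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))

y_mid = predict_yield([0.0, 0.0], [1.0, 1.0], 0.0)
y_other = predict_yield([0.3, -0.2], [0.5, 1.5], 0.1)
```

A zero pre-activation gives a predicted yield of exactly 0.5, and any finite input stays strictly inside the interval 0-1.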

Step S109: acquiring a yield true value of the chemical equation, and adjusting system parameters in the neural network according to the error between the yield true value and the yield predicted value, so that the yield predicted value continuously approaches the yield true value.

Specifically, adjusting the system parameters in the neural network according to the error between the yield true value and the yield predicted value comprises the following steps:

calculating the difference between the yield predicted value and the yield true value through a loss function used for regression fitting;

and adjusting the system parameters in the neural network according to the difference between the yield predicted value and the yield true value.

Preferably, the loss function is a squared loss function, such as the mean squared error.

Illustratively, the system parameters adjusted in the long short-term memory network include U, W and b.
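The idea of adjusting parameters against a squared loss can be illustrated with plain gradient descent on a single training example. To stay short, this sketch updates only a hypothetical sigmoid output layer (`w`, `b`) applied to a fixed feature vector `h`; updating the LSTM parameters U, W and b would follow the same principle but requires propagating the gradient through the network. The learning rate, the toy values and the choice of gradient descent are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_loss(y_pred, y_true):
    return (y_pred - y_true) ** 2

def train_step(w, b, h, y_true, lr=0.5):
    """One gradient-descent step on the squared loss for a sigmoid output layer."""
    z = sum(w_j * h_j for w_j, h_j in zip(w, h)) + b
    y_pred = sigmoid(z)
    # Chain rule: dL/dz = 2 (y_pred - y_true) * y_pred * (1 - y_pred)
    grad_z = 2.0 * (y_pred - y_true) * y_pred * (1.0 - y_pred)
    new_w = [w_j - lr * grad_z * h_j for w_j, h_j in zip(w, h)]
    new_b = b - lr * grad_z
    return new_w, new_b, squared_loss(y_pred, y_true)

w, b = [0.0, 0.0], 0.0
h, y_true = [0.3, -0.2], 0.85    # toy feature vector and yield true value
losses = []
for _ in range(200):
    w, b, loss = train_step(w, b, h, y_true)
    losses.append(loss)
```

Each step moves the parameters against the gradient of the squared error, so the recorded loss shrinks as the predicted yield approaches the true yield.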

It should be noted that, when a product is obtained through multiple chemical reaction equations, the yield can be predicted equation by equation, or the intermediate process can be treated as a reaction condition, which is not limited in the embodiments of the present invention.

According to the yield prediction method for the chemical reaction provided by the embodiment of the invention, the reactants, products and reaction conditions in the chemical reaction equation are segmented and vectorized, and the yield is then predicted by a neural network. This avoids cumbersome manual retrosynthetic analysis, improves the efficiency of yield prediction, and is convenient for engineering application.

The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for predicting the yield of a chemical reaction, wherein the method comprises the following steps:

2. A method for predicting the yield of a chemical reaction according to claim 1, wherein: the preset segmentation rule comprises one or both of segmentation according to the periodic table of elements and segmentation according to compounds.

3. A method for predicting the yield of a chemical reaction according to claim 1, wherein: the neural network is a convolutional neural network or a recurrent neural network.

4. A method for predicting the yield of a chemical reaction according to claim 1, wherein: adjusting the system parameters in the neural network according to the error between the yield true value and the yield predicted value comprises the following steps:

5. The method for predicting the yield of a chemical reaction according to claim 4, wherein: the loss function is a squared loss function.

6. A method for predicting the yield of a chemical reaction according to claim 1, wherein: vectorizing the K segmented elements into a vector of (K, N) dimensions comprises the following steps:

establishing an element table based on the K elements;