CN116227485A - Reaction condition prediction model training method and device - Google Patents

Reaction condition prediction model training method and device

Info

Publication number
CN116227485A
Authority
CN
China
Prior art keywords
sequence
reaction condition
reaction
layer
tensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211579416.4A
Other languages
Chinese (zh)
Inventor
王晓瑞
康玉
侯廷军
谢昌谕
曹东升
邓亚峰
施慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Carbon Silicon Smart Technology Development Co ltd
Original Assignee
Hangzhou Carbon Silicon Smart Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Carbon Silicon Smart Technology Development Co ltd filed Critical Hangzhou Carbon Silicon Smart Technology Development Co ltd
Priority to CN202211579416.4A priority Critical patent/CN116227485A/en
Publication of CN116227485A publication Critical patent/CN116227485A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a reaction condition prediction model training method and device. The method includes: optimizing a language representation network layer; obtaining a model training sample of a reaction condition prediction model; inputting the model training sample into a reaction condition prediction model to be trained; calling the language representation network layer to process the chemical reaction SMILES to obtain hidden tensors that correspond to each word in the chemical reaction SMILES, carry attention weight information, and characterize the chemical reaction sequence; calling a masking multi-head attention network in a reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence features, and obtaining a predicted reaction condition sequence based on the hidden tensors and the condition sequence feature tensor; calculating a loss value of the reaction condition prediction model to be trained based on the reaction condition sequence and the predicted reaction condition sequence; and, when the loss value is within a preset range, obtaining the final reaction condition prediction model. The prediction accuracy is significantly improved on a large-scale reaction condition prediction dataset.

Description

Reaction condition prediction model training method and device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a reaction condition prediction model training method and device.
Background
Successful synthesis planning depends on selecting accurate reaction conditions, and a reliable reaction condition prediction algorithm can help researchers optimize chemical reactions more efficiently and thus obtain target molecules more quickly. Although researchers have proposed some reaction condition recommendation algorithms, most of them are single-reaction or single-condition prediction schemes and cannot be applied directly to the synthesis planning workflow.
Currently, some machine learning algorithms have been applied to the problem of reaction condition prediction. However, most condition recommendation algorithms that use advanced molecular characterization make recommendations for a single reaction only, cannot be trained on a large-scale general chemical reaction dataset, and are difficult to apply in synthesis planning algorithms. A commonly used model adopts an RNN-like idea: it takes molecular fingerprints as the input of a neural network, first predicts the most important catalyst, uses each predicted condition as part of the input for predicting the next condition, and sequentially predicts the catalyst, two solvents, two reagents, and the temperature. Such models have poor interpretability and low prediction accuracy.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present application is to provide a method and an apparatus for training a reaction condition prediction model, so as to improve prediction accuracy on a large-scale reaction condition prediction data set.
In a first aspect, embodiments of the present application provide a method for training a reaction condition prediction model, where the method includes:
obtaining a pre-training sample of a language representation network layer based on a reaction center masking policy, the pre-training sample comprising: a SMILES sequence of a chemical reaction expression;
identifying a chemical reaction center in the SMILES sequence of the chemical reaction expression, applying a masking probability of 0.5 to the corresponding vocabulary and a masking probability of 0.15 to the remaining vocabulary, calling the language representation network layer to predict the masked vocabulary, calculating a cross entropy loss, and optimizing the language representation network layer;
obtaining a model training sample of a reaction condition prediction model, the model training sample comprising: a SMILES sequence of a chemical reaction expression and a sequence of reaction conditions;
inputting the model training sample into a reaction condition prediction model to be trained, wherein the reaction condition prediction model to be trained comprises: the pre-trained language characterizes a network layer and a reaction condition decoder;
Calling the language characterization network layer to process the SMILES sequence of the chemical reaction expression to obtain hidden tensors which are corresponding to each word in the SMILES sequence of the chemical reaction expression and are provided with attention weight information and characterize the chemical reaction sequence;
calling a masking multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence characteristic tensor;
calculating to obtain a loss value of the reaction condition prediction model to be trained based on the reaction condition sequence and the predicted reaction condition sequence;
and under the condition that the loss value is in a preset range, taking the trained reaction condition prediction model to be trained as a final reaction condition prediction model.
Optionally, the reaction condition prediction model to be trained further includes: the feature extraction network layer and the vector conversion layer,
before the language representation network layer is called to process the SMILES sequence of the chemical reaction expression to obtain hidden tensors which are corresponding to words in the SMILES sequence of the chemical reaction expression and are provided with attention weight information and represent the chemical reaction sequence, the method further comprises the steps of:
Invoking the feature extraction network layer to extract each word in the SMILES sequence of the chemical reaction expression and the position feature of each word in the SMILES sequence of the chemical reaction expression;
and calling the vector conversion layer to respectively perform vector conversion processing on each word and the position feature to obtain a word vector corresponding to each word and a position vector corresponding to the position feature.
Optionally, the language characterization network layer includes: multi-headed attention mechanism layer, layer normalization and Dropout layer, feed-forward neural network and language characterization encoder,
the calling the language representation network layer to process the SMILES sequence of the chemical reaction expression to obtain hidden tensors which are corresponding to each word in the SMILES sequence of the chemical reaction expression and are provided with attention weight information and represent the chemical reaction sequence, wherein the hidden tensors comprise:
invoking the multi-head attention mechanism layer to process the addition normalization result of the word vector and the position vector to obtain the association relation between each word;
invoking the layer normalization and Dropout layer to process the word vector and the position vector according to the association relation to obtain a hidden tensor with weights among words;
The hidden tensor with the weight among each word is passed through a feedforward neural network to obtain a feedforward neural network hidden tensor;
invoking the language representation output layer to perform fusion processing on the weights among the words and the forward processing result to obtain and output a hidden tensor of the language representation encoder;
when tensors flow through the language characterization encoder, there are multiple residual connections, namely a residual connection from the input embedding layer to the encoder self-output layer and a residual connection from that output layer to the language characterization output layer.
Optionally, the reaction condition decoder comprises: masking a multi-head attention network, a multi-head attention mechanism layer and a feedforward neural network,
the method comprises the steps of calling a multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence characteristic tensor, wherein the method comprises the following steps:
invoking the masking multi-head attention network to learn the attention weight of the reaction condition sequence to obtain mask condition sequence characteristics;
invoking the multi-head attention mechanism layer to perform attention learning on the mask condition sequence features based on the attention weight to obtain initial prediction condition sequence features;
And calling the feedforward neural network to forward learn the initial predicted condition sequence characteristics to obtain the predicted reaction condition sequence.
Optionally, the calculating, based on the reaction condition sequence and the predicted reaction condition sequence, a loss value of the reaction condition prediction model to be trained includes:
and calculating a cross entropy loss value based on the reaction condition sequence and the predicted reaction condition sequence, and taking the cross entropy loss value as the loss value of the reaction condition prediction model to be trained.
Optionally, the reaction condition prediction model to be trained further includes: a temperature decoder,
masking a multi-head attention network in the calling reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence characteristic tensor, wherein the method further comprises the following steps:
and calling the temperature decoder to process the predicted reaction condition sequence according to the attention weight to obtain the predicted reaction temperature corresponding to the SMILES sequence of the chemical reaction expression.
Optionally, the calculating, based on the reaction condition sequence and the predicted reaction condition sequence, a loss value of the reaction condition prediction model to be trained includes:
Calculating to obtain a cross entropy loss value based on the reaction condition sequence and the predicted reaction condition sequence;
calculating to obtain a mean square error loss value based on the predicted reaction temperature;
and calculating the loss value of the reaction condition prediction model to be trained based on the cross entropy loss value, the mean square error loss value and the balance coefficient.
In a second aspect, embodiments of the present application provide a reaction condition prediction model training apparatus, the apparatus including:
a pre-training sample acquisition module, configured to acquire a pre-training sample of a language representation network layer based on a reaction center masking policy, where the pre-training sample includes: a SMILES sequence of a chemical reaction expression;
the language representation network layer optimizing module is used for identifying a chemical reaction center in the SMILES sequence of the chemical reaction expression, applying a masking probability of 0.5 to the corresponding vocabulary and a masking probability of 0.15 to the remaining vocabulary, calling the language representation network layer to predict the masked vocabulary, calculating a cross entropy loss, and optimizing the language representation network layer;
the model training sample acquisition module is used for acquiring a model training sample of the reaction condition prediction model, and the model training sample comprises: a SMILES sequence of a chemical reaction expression and a sequence of reaction conditions;
The model training sample input module is used for inputting the model training sample into a reaction condition prediction model to be trained, and the reaction condition prediction model to be trained comprises: the pre-trained language characterizes a network layer and a reaction condition decoder;
the hidden tensor acquisition module is used for calling the language characterization network layer to process the SMILES sequence of the chemical reaction expression so as to obtain hidden tensors which are corresponding to each word in the SMILES sequence of the chemical reaction expression and are used for characterizing the chemical reaction sequence and are provided with attention weight information;
the predicted reaction condition sequence acquisition module is used for calling a masking multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence features, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence feature tensor;
the loss value calculation module is used for calculating the loss value of the reaction condition prediction model to be trained based on the reaction condition sequence and the predicted reaction condition sequence;
and the reaction condition prediction model acquisition module is used for taking the trained reaction condition prediction model to be trained as a final reaction condition prediction model under the condition that the loss value is in a preset range.
Optionally, the reaction condition prediction model to be trained further includes: the feature extraction network layer and the vector conversion layer,
the apparatus further comprises:
the position feature acquisition module is used for calling the feature extraction network layer to extract each word in the SMILES sequence of the chemical reaction expression and the position feature of each word in the SMILES sequence of the chemical reaction expression;
and the position vector acquisition module is used for calling the vector conversion layer to respectively perform vector conversion processing on each word and the position feature to obtain a word vector corresponding to each word and a position vector corresponding to the position feature.
Optionally, the language characterization network layer includes: multi-headed attention mechanism layer, layer normalization and Dropout layer, feed-forward neural network and language characterization encoder,
the hidden tensor acquisition module includes:
the association relation acquisition unit is used for calling the multi-head attention mechanism layer to process the addition normalization result of the word vector and the position vector so as to obtain the association relation between each word;
the hidden tensor acquisition unit is used for calling the layer normalization and Dropout layer to process the word vector and the position vector according to the association relation so as to obtain a hidden tensor with weights among the words;
The feedforward network tensor acquisition unit is used for obtaining the feedforward neural network hidden tensor by passing the hidden tensor with the weight among each word through the feedforward neural network;
the encoder tensor acquisition unit is used for calling the language characterization output layer to perform fusion processing on the weights among the words and the forward processing result to obtain and output the hidden tensor of the language characterization encoder;
when tensors flow through the language characterization encoder, there are multiple residual connections, namely a residual connection from the input embedding layer to the encoder self-output layer and a residual connection from that output layer to the language characterization output layer.
Optionally, the reaction condition decoder comprises: masking a multi-head attention network, a multi-head attention mechanism layer and a feedforward neural network,
the predicted reaction condition sequence acquisition module comprises:
the mask sequence feature acquisition unit is used for calling the masking multi-head attention network to learn the attention weight of the reaction condition sequence to obtain mask condition sequence features;
an initial sequence feature acquisition unit, configured to invoke the multi-head attention mechanism layer to perform attention learning on the mask condition sequence feature based on the attention weight, so as to obtain an initial prediction condition sequence feature;
The predicted sequence feature acquisition unit is used for calling the feedforward neural network to forward learn the initial predicted condition sequence feature so as to obtain the predicted reaction condition sequence.
Optionally, the loss value calculation module includes:
and the first loss value calculation unit is used for calculating a cross entropy loss value based on the reaction condition sequence and the predicted reaction condition sequence, and taking the cross entropy loss value as the loss value of the reaction condition prediction model to be trained.
Optionally, the reaction condition prediction model to be trained further includes: a temperature decoder,
the apparatus further comprises:
and the predicted reaction temperature acquisition module is used for calling the temperature decoder to process the predicted reaction condition sequence according to the attention weight so as to obtain the predicted reaction temperature corresponding to the SMILES sequence of the chemical reaction expression.
Optionally, the loss value calculation module includes:
the cross entropy loss value calculation unit is used for calculating a cross entropy loss value based on the reaction condition sequence and the predicted reaction condition sequence;
the mean square error loss calculation unit is used for calculating a mean square error loss value based on the predicted reaction temperature;
And the second loss value calculation unit is used for calculating the loss value of the reaction condition prediction model to be trained based on the cross entropy loss value, the mean square error loss value and the balance coefficient.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor implementing the reaction condition prediction model training method of any one of the above when executing the program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, where instructions in the storage medium, when executed by a processor of an electronic device, cause the electronic device to perform the reaction condition prediction model training method of any one of the above.
Compared with the prior art, the embodiment of the application has the following advantages:
in the embodiment of the application, the pre-training sample of the language representation network layer based on the reaction center masking strategy is obtained, and the pre-training sample comprises the following components: a SMILES sequence of a chemical reaction expression; identifying a chemical reaction center in an SMILES sequence of the chemical reaction expression, taking a masking rate of 0.5 for the corresponding vocabulary, taking a masking probability of 0.15 for the rest vocabulary, calling a language representation network layer to predict the masked vocabulary, calculating cross entropy loss, and optimizing the language representation network layer; obtaining a model training sample of a reaction condition prediction model, the model training sample comprising: a SMILES sequence of a chemical reaction expression and a sequence of reaction conditions; inputting the model training sample into a reaction condition prediction model to be trained, wherein the reaction condition prediction model to be trained comprises: the pre-trained language characterizes a network layer and a reaction condition decoder; calling the language characterization network layer to process the SMILES sequence of the chemical reaction expression to obtain hidden tensors which are corresponding to each word in the SMILES sequence of the chemical reaction expression and are provided with attention weight information and characterize the chemical reaction sequence; calling a masking multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence characteristic tensor; calculating to obtain a loss value of the reaction condition prediction model to be trained based on the reaction condition sequence and the predicted reaction condition sequence; and under the condition that the loss value is in a preset range, taking the trained reaction condition prediction model to be trained as a final reaction condition prediction model. According to the embodiment of the application, the prediction of the reaction background is regarded as a task of translating the sequence (namely, a reaction condition sequence formed by the catalyst, the solvent 1, the solvent 2, the reagent 1 and the reagent 2) into the sequence, and the prediction model based on the Attention is adopted to use the sequence to represent the chemical reaction so as to respectively predict the catalyst, the solvent, the reagent and the temperature, so that the prediction precision can be obviously improved on a large-scale reaction condition prediction data set.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
FIG. 1 is a flowchart illustrating steps of a method for training a reaction condition prediction model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a prediction flow of a reaction condition prediction model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a pre-training procedure for modeling a masking reaction center according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a training device for a reaction condition prediction model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal comprising the element.
The embodiments of the present application aim to solve the problems of poor interpretability and low prediction accuracy when prior-art model algorithms predict reaction conditions. The present embodiments provide an interpretable, pre-training-based reaction condition predictor (Parrot). The model uses Bert as an encoder to extract reaction features from the reaction SMILES and uses a transformer decoder with fewer parameters to compute a hidden-layer representation of the reaction context. Finally, a classifier is used to predict the reaction conditions, and a regression layer (temperature decoder) is used to compute the temperature. The structure of the model provided in the embodiment of the present application may be as shown in fig. 2. The prediction of the chemical context (catalyst, solvent 1, solvent 2, reagent 1, reagent 2) can be regarded as a sequence-to-sequence translation task, in which conditions predicted later also take into account the conditions already predicted, and the target sequence has a fixed length (length 6). The information contained in the memory tensor of the encoder and the decoder output tensor is used to predict the temperature corresponding to the five reaction conditions. Each of these tensors is transformed by a feed-forward neural network, and after the tensors are concatenated, the result is fed into a third feed-forward neural network to compute the temperature.
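For illustration only, the following is a minimal PyTorch-style sketch of the architecture just described (a Bert-style encoder, a small condition decoder, a condition classifier, and a temperature regression head). All class names, dimensions, and pooling choices are assumptions made for the sketch and do not reproduce the actual implementation of the embodiment.

```python
import torch
import torch.nn as nn

class ParrotSketch(nn.Module):
    """Sketch: Bert-style encoder + condition decoder + condition classifier + temperature head."""
    def __init__(self, vocab_size, cond_vocab_size, d_model=256, n_heads=8,
                 n_enc_layers=6, n_dec_layers=2, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)            # token embedding
        self.pos_emb = nn.Embedding(max_len, d_model)                # positional embedding
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_enc_layers)    # "Bert Layer"
        self.cond_emb = nn.Embedding(cond_vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_dec_layers)    # "Condition Decoder"
        self.cond_head = nn.Linear(d_model, cond_vocab_size)             # condition classifier
        # temperature head: transform memory and decoder tensors, concatenate, then regress
        self.ff_mem = nn.Linear(d_model, d_model)
        self.ff_dec = nn.Linear(d_model, d_model)
        self.ff_temp = nn.Linear(2 * d_model, 1)

    def forward(self, rxn_tokens, cond_tokens):
        pos = torch.arange(rxn_tokens.size(1), device=rxn_tokens.device)
        memory = self.encoder(self.tok_emb(rxn_tokens) + self.pos_emb(pos))  # hidden tensor of reaction SMILES
        L = cond_tokens.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf"),
                                       device=rxn_tokens.device), diagonal=1)
        dec_out = self.decoder(self.cond_emb(cond_tokens), memory, tgt_mask=causal)
        cond_logits = self.cond_head(dec_out)                                # predicted condition sequence
        fused = torch.cat([self.ff_mem(memory.mean(dim=1)),
                           self.ff_dec(dec_out.mean(dim=1))], dim=-1)
        temperature = self.ff_temp(fused).squeeze(-1)                        # predicted temperature
        return cond_logits, temperature
```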
The technical solutions of the embodiments of the present application are described in detail below in conjunction with specific embodiments.
Referring to fig. 1, a step flowchart of a reaction condition prediction model training method provided in an embodiment of the present application is shown, and as shown in fig. 1, the reaction condition prediction model training method may include the following steps:
step 101: obtaining a pre-training sample of a language representation network layer based on a reaction center masking policy, the pre-training sample comprising: SMILES sequences of the chemical reaction expression.
The embodiments of the present application can be applied to an Attention-based Transformer variant model that uses a sequence to represent the chemical reaction, so that the catalyst, solvents, reagents, and temperature are predicted respectively.
Two pre-training strategies were designed in the embodiments of this application: masked language modeling (Masked LM) and masked reaction center modeling (Masked RCM), the latter incorporating knowledge of the chemical reaction domain. The reaction dataset used in both pre-training strategies consists of about 1.3 million reaction SMILES obtained by cleaning the USPTO 1976-2016 Sep data. All reaction conditions have been removed from these data, which contain only reactants and products (reactants >> products) and therefore match the input format and content of the reaction condition prediction task. A Masked RCM training strategy with domain knowledge was designed; a schematic diagram of its implementation is shown in fig. 3. In the Masked RCM strategy, the masking probability of reaction center tokens can be increased to 0.5 instead of 0.15 in order to strengthen the model's understanding of the reaction center. Masked RCM makes the model focus more on the prediction and embedding of the vocabulary representing the reaction center.
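A minimal sketch of the Masked RCM masking step is given below, assuming the reaction-center token positions have already been identified (for example by comparing atom-mapped reactants and products); the 0.5 and 0.15 probabilities follow the strategy above, while the mask symbol and helper names are hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical mask symbol

def mask_reaction_smiles(tokens, center_positions, p_center=0.5, p_other=0.15):
    """Mask reaction-center tokens with probability 0.5 and the remaining tokens with 0.15."""
    masked, labels = [], []
    for i, tok in enumerate(tokens):
        p = p_center if i in center_positions else p_other
        if random.random() < p:
            masked.append(MASK_TOKEN)
            labels.append(tok)       # the model is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)      # ignored in the cross-entropy loss
    return masked, labels

# Usage (hypothetical): tokens = list("CC(=O)O.OCC>>CC(=O)OCC"); centers = {2, 3, 9}
# masked_tokens, labels = mask_reaction_smiles(tokens, centers)
```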
In this embodiment, a pre-training sample of the language-characterization network layer based on the reaction center masking policy may be first obtained to optimize the language-characterization network layer of the reaction condition prediction model. Wherein the pre-training samples may include: SMILES sequences of the chemical reaction expression.
After a pre-training sample of the language representation network layer based on the reaction center masking policy is obtained, step 102 is performed.
Step 102: identifying a chemical reaction center in the SMILES sequence of the chemical reaction expression, applying a masking probability of 0.5 to the corresponding vocabulary and a masking probability of 0.15 to the remaining vocabulary, calling the language representation network layer to predict the masked vocabulary, calculating the cross entropy loss, and optimizing the language representation network layer.
After the pre-training sample of the language representation network layer based on the reaction center masking strategy is obtained, the chemical reaction center in the SMILES sequence of the chemical reaction expression can be identified, a masking probability of 0.5 is applied to the corresponding vocabulary and a masking probability of 0.15 to the remaining vocabulary, the language representation network layer is called to predict the masked vocabulary, the cross entropy loss is calculated, and the language representation network layer is optimized.
After optimizing the language representation network layer, step 103 is performed.
Step 103: obtaining a model training sample of a reaction condition prediction model, the model training sample comprising: the SMILES sequence of the chemical reaction expression and the sequence of the reaction conditions.
A model training sample refers to a sample used to train the reaction condition prediction model to be trained. In this example, the model training sample may include: a SMILES sequence of a chemical reaction expression and a reaction condition sequence, where the reaction condition sequence may be a sequence formed, in order, by the catalyst, the solvents, and the reagents (e.g., catalyst, solvent 1, solvent 2, reagent 1, reagent 2).
When the reaction condition prediction model to be trained is trained, a model training sample can be obtained. In a specific implementation, the model training sample may be a SMILES sequence of a chemical reaction expression obtained by cleaning the USPTO 1976-2016 Sep data.
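For concreteness, one model training sample can be viewed as a reaction SMILES paired with a fixed-length condition sequence and a temperature; the values below are hypothetical and only illustrate the assumed format.

```python
# Hypothetical example of one training sample: a reaction SMILES plus a length-6
# condition sequence (catalyst, solvent 1, solvent 2, reagent 1, reagent 2, end marker).
sample = {
    "reaction_smiles": "CC(=O)O.OCC>>CC(=O)OCC",                   # reactants >> product
    "condition_sequence": ["OS(=O)(=O)O", "CCO", "<none>", "<none>", "<none>", "<END>"],
    "temperature": 78.0,                                            # degrees Celsius
}
```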
After the model training samples of the reaction condition prediction model are obtained, step 104 is performed.
Step 104: inputting the model training sample into a reaction condition prediction model to be trained, wherein the reaction condition prediction model to be trained comprises: the pre-trained language characterizes the network layer and the reaction condition decoder.
After the model training sample is obtained, the model training sample may be input to the reaction condition prediction model to be trained, which in this example may include: the pre-trained language representation network layer and the reaction condition decoder. As shown in fig. 2, the language representation network layer is the Bert Layer, and the reaction condition decoder is the Condition Decoder.
After inputting the model training samples into the reaction condition predictive model to be trained, step 105 is performed.
Step 105: and calling the language characterization network layer to process the SMILES sequence of the chemical reaction expression to obtain hidden tensors which are corresponding to each word in the SMILES sequence of the chemical reaction expression and are provided with attention weight information and characterize the chemical reaction sequence.
After the model training sample is input into the reaction condition prediction model to be trained, a language characterization network layer can be called to process the SMILES sequence of the chemical reaction expression so as to obtain hidden tensors which are corresponding to each word in the SMILES sequence of the chemical reaction expression and are used for characterizing the chemical reaction sequence and are provided with attention weight information.
In this embodiment, the reaction condition prediction model to be trained may further include: a feature extraction network layer and a vector conversion layer, wherein the feature extraction network layer can extract each word in the SMILES sequence of the chemical reaction expression and the position feature of each word in the SMILES sequence of the chemical reaction expression. The implementation may be described in detail in connection with the following specific implementations.
In a specific implementation manner of the present application, before the step 105, the method may further include:
Step A1: and calling the feature extraction network layer to extract each word in the SMILES sequence of the chemical reaction expression and the position feature of each word in the SMILES sequence of the chemical reaction expression.
In this embodiment, after the model training sample is input to the reaction condition prediction model to be trained, the feature extraction network layer may be invoked to extract each word in the SMILES sequence of the chemical reaction expression and the positional feature of each word in the SMILES sequence of the chemical reaction expression.
Step A2 is performed after each word in the SMILES sequence of the chemical reaction expression is extracted and the positional feature of each word in the SMILES sequence of the chemical reaction expression is called up by the feature extraction network layer.
Step A2: and calling the vector conversion layer to respectively perform vector conversion processing on each word and the position feature to obtain a word vector corresponding to each word and a position vector corresponding to the position feature.
After each word in the SMILES sequence of the chemical reaction expression and the positional feature of each word in the SMILES sequence of the chemical reaction expression are extracted by the feature extraction network layer, the vector conversion layer can perform vector conversion processing on each word and each positional feature respectively, so as to obtain a word vector corresponding to each word and a position vector corresponding to each positional feature. As shown in fig. 2, the SMILES sequence of the chemical reaction expression may be input to the Input component of the model structure in the left part, which includes a Token Embedding component (i.e., the word feature extraction network layer and the vector conversion layer) to extract each word in the SMILES sequence of the chemical reaction expression and perform vector conversion to obtain the word vector corresponding to each word. The model structure of the left part may further include: Positional Embedding (i.e., the position feature extraction layer and the vector conversion layer) to extract the positional feature of each word in the SMILES sequence of the chemical reaction expression and perform vector conversion to obtain the position vector corresponding to each positional feature.
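The Input part (word embedding plus positional embedding followed by normalization) can be sketched as follows; the vocabulary size, dimensions, and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Sketch of the input component: word (token) vectors plus position vectors."""
    def __init__(self, vocab_size=300, max_len=512, d_model=256, dropout=0.1):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, d_model)     # word vector for each token
        self.position_embedding = nn.Embedding(max_len, d_model)     # position vector for each index
        self.layer_norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, token_ids):                                     # token_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_embedding(token_ids) + self.position_embedding(positions)
        return self.dropout(self.layer_norm(x))                       # summed and normalized embedding
```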
After the addition of the position vector and the word vector is obtained, a subsequent model processing process may be performed. The call language characterization network layer processes the addition of the word vector and the position vector, and the processing process can be described in detail in the following specific implementation manner.
In another specific implementation of the present application, the step 105 may include:
substep B1: and calling the multi-head attention mechanism layer to process the summation normalization result of the word vector and the position vector to obtain the association relation between each word.
In this embodiment, after obtaining the position vector and the word vector, the multi-attention mechanism layer may be invoked to process the addition normalization result of the word vector and the position vector, so as to obtain the association relationship between each word. As shown in fig. 2, input encoding may output a word vector and a position vector, and may further call Bert Self Attention to process the addition normalization result of the word vector and the position vector, so as to output an association relationship between each word.
And after calling the multi-head attention mechanism layer to process the addition normalization result of the word vector and the position vector to obtain the association relation between each word, executing the substep B2.
Substep B2: and calling the layer normalization and Dropout layer to process the word vector and the position vector according to the association relation to obtain a hidden tensor with weights among the words.
After the multi-head attention mechanism layer is called to process the addition normalization result of the word vector and the position vector to obtain the association relations between the words, the layer normalization and Dropout layer can process the word vector and the position vector according to these association relations to obtain a hidden tensor with weights among the words. As shown in fig. 2, Bert Self Attention outputs the association relations between the words, and Bert Self Output may process the word vectors and position vectors output by Input encoding according to these association relations to obtain a hidden tensor with weights among the words. For example, if there are 10 words in the SMILES sequence of the chemical reaction expression, the hidden tensor with weights among the 10 words can be obtained through Bert Self Output.
It will be appreciated that the above examples are only examples listed for better understanding of the technical solutions of the embodiments of the present application, and are not to be construed as the only limitation of the present embodiments.
And B3, after the call layer normalization and the Dropout layer process the word vector and the position vector according to the association relation to obtain a hidden tensor with weights among the words, executing a sub-step.
Substep B3: and passing the hidden tensor with the weights among the words through a feedforward neural network to obtain the feedforward neural network hidden tensor.
After the layer normalization and Dropout layer are called to process the word vector and the position vector according to the association relations to obtain the hidden tensor with weights among the words, the hidden tensor with weights among the words can be passed through the feedforward neural network to obtain the feedforward neural network hidden tensor. As shown in fig. 2, the output of Bert Self Output may be used as the input of Bert Intermediate; passing the weighted hidden tensor output by Bert Self Output through the feedforward neural network yields the feedforward neural network hidden tensor.
After passing the hidden tensor with weights between the words through the feedforward neural network to obtain the feedforward neural network hidden tensor, sub-step B4 is performed.
Substep B4: and calling the language representation output layer to perform fusion processing on the weights among the words and the forward processing result to obtain and output the hidden tensor of the language representation encoder.
After the hidden tensor with the weights among the words is passed through the feedforward neural network, the language characterization output layer can be called to perform fusion processing on the weights among the words and the forward processing result, so as to obtain and output the hidden tensor of the language characterization encoder. As shown in fig. 2, the output of Bert Intermediate and the output of Bert Self Output may be used as the input of Bert Output; invoking Bert Output can fuse the weights among the words output by Bert Self Output with the forward processing result output by Bert Intermediate, so as to obtain and output the hidden tensor of the language characterization encoder.
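Taken together, sub-steps B1 to B4 correspond to one Bert-style encoder layer. The following sketch, with hypothetical dimensions, shows how the self-attention result, the layer normalization and Dropout with residual connection, the feed-forward network, and the output fusion relate to one another.

```python
import torch.nn as nn

class BertStyleEncoderLayer(nn.Module):
    """Sketch of one encoder layer: Self Attention -> Self Output -> Intermediate -> Output."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024, dropout=0.1):
        super().__init__()
        self.self_attention = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                                    batch_first=True)           # B1: relations between words
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn_dropout = nn.Dropout(dropout)                                  # B2: LayerNorm + Dropout
        self.intermediate = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU())   # B3: feed-forward network
        self.output = nn.Linear(d_ff, d_model)                                   # B4: fusion / output
        self.out_norm = nn.LayerNorm(d_model)
        self.out_dropout = nn.Dropout(dropout)

    def forward(self, x):
        attn_out, attn_weights = self.self_attention(x, x, x)      # association relations between words
        hidden = self.attn_norm(x + self.attn_dropout(attn_out))   # weighted hidden tensor (residual connection)
        ff_hidden = self.intermediate(hidden)                       # feed-forward hidden tensor
        out = self.out_norm(hidden + self.out_dropout(self.output(ff_hidden)))  # fused encoder hidden tensor
        return out, attn_weights
```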
After the language characterization output layer is called to perform fusion processing on the weights among the words and the forward processing results, the hidden tensor of the language characterization encoder is obtained and output, and then step 106 is executed.
Step 106: and calling a masking multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence characteristic tensor.
The predicted reaction condition sequence refers to the reaction conditions, predicted by the reaction condition prediction model to be trained, that correspond to the reaction expression; that is, the predicted reaction condition sequence may be a sequence of the catalyst, the solvents, and the reagents.
After the model training sample is input into the reaction condition prediction model to be trained, a masking multi-head attention network in a reaction condition decoder can be called to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and a predicted reaction condition sequence is obtained based on a hidden tensor and a condition sequence characteristic tensor. The implementation may be described in detail in connection with the following specific implementations.
In a specific implementation of the present application, the step 106 may include:
substep C1: and calling the masking multi-head attention network to learn the attention weights of the reaction condition sequences to obtain mask condition sequence characteristics.
In this embodiment, the reaction condition decoder includes: a masking multi-head attention network, a multi-head attention mechanism layer, and a feedforward neural network, as shown in the right-hand model structure of fig. 2. The reaction condition decoder is the Condition Decoder, which may include: a Masked Multi-Head Attention network, a Multi-Head Attention layer, and Feed Forward.
After the reaction condition sequence is input into the reaction condition prediction model to be trained, the masking multi-head attention network can be called to learn the attention weights of the reaction condition sequence, so as to obtain the mask condition sequence features. As shown in fig. 2, the Masked Multi-Head Attention can be invoked to perform attention learning on the reaction condition sequence, so that the mask condition sequence features and the like may be obtained.
After invoking the masked multi-headed attention network to perform attention weight learning on the reaction condition sequence to obtain the mask condition sequence feature, a sub-step C2 is performed.
Substep C2: and calling the multi-head attention mechanism layer to perform attention learning on the mask condition sequence features based on the attention weight to obtain initial prediction condition sequence features.
After the masking multi-head attention network is called to perform attention weight learning on the reaction condition sequence to obtain the mask condition sequence features, the multi-head attention mechanism layer can be called to perform attention learning on the mask condition sequence features based on the attention weights, so as to obtain the initial predicted condition sequence features. As shown in fig. 2, the output of the Masked Multi-Head Attention and the output of the Bert Layer may be used as the input of the Multi-Head Attention; invoking the Multi-Head Attention can perform attention learning on the mask condition sequence features output by the Masked Multi-Head Attention according to the attention weights output by the Bert Layer, so as to obtain the initial predicted condition sequence features, and so on.
In this embodiment, for the interpretability analysis part of the Parrot reaction condition prediction model, the attention weight $a_e$ of the model can be used to associate an atom $w$ with a chemical reaction condition $C$. The attention weight is computed from the tensors between the encoder and the decoder of the model. Each decoder layer contains a plurality of heads, each of which learns an attention matrix $\mathrm{Attention} \in \mathbb{R}^{N \times M}$. This attention weight relates the embedding tensor $X_i$ of each token in the input reaction sequence $X$ of length $N$ to the embedding tensor $Y_j$ of each vocabulary item in the reaction condition sequence $Y$ of output length $M$. Thus, each element $\mathrm{Attention}_{ij}$ is the attention weight connecting $X_i$ to $Y_j$.
Each head in the multi-head attention layer first converts each vocabulary embedding $X_i$ or $Y_j$ into key ($K$), query ($Q$), and value ($V$) vectors using the following operations.
$K_i = W_k X_i,\quad Q_j = W_q Y_j,\quad V_i = W_v X_i \qquad (1)$
where $W_k$, $W_q$, and $W_v$ are learnable parameters. $A_i$ can be regarded as the correlation probability vector of $X_i$ with respect to $Y$, and is calculated according to the following equation:
$A_i = \mathrm{softmax}\!\left(\dfrac{Q K_i}{\sqrt{d_k}}\right) \qquad (2)$
In this embodiment, the attention weights of the vocabulary can be converted into atomic attention weights, and the average of the attention weights over the heads is taken as the attention weight $a$ used for analysis.
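A minimal sketch of this interpretability step is given below, assuming the per-head encoder-decoder attention weights (shape heads × M × N) have already been extracted from the cross-attention module; the token-to-atom mapping is a hypothetical helper input.

```python
import torch

def atomic_attention(attention, token_to_atom):
    """attention: (n_heads, M, N) cross-attention weights from condition tokens to reaction tokens.
    token_to_atom: list of length N mapping each reaction token index to an atom index (-1 for non-atom tokens)."""
    avg = attention.mean(dim=0)                        # average over heads -> (M, N), the analysis weight a
    n_atoms = max(a for a in token_to_atom if a >= 0) + 1
    atom_attn = torch.zeros(avg.size(0), n_atoms)
    for tok_idx, atom_idx in enumerate(token_to_atom):
        if atom_idx >= 0:                              # pool vocabulary attention into atomic attention
            atom_attn[:, atom_idx] += avg[:, tok_idx]
    return atom_attn                                   # (M, n_atoms): condition-to-atom attention weights
```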
After invoking the multi-head attention mechanism layer to learn the mask condition sequence features based on the attention weights to obtain the initial predicted condition sequence features, sub-step C3 is performed.
Substep C3: and calling the feedforward neural network to forward learn the initial predicted condition sequence characteristics to obtain the predicted reaction condition sequence.
After the multi-head attention mechanism layer is called to perform attention learning on the mask condition sequence features based on the attention weights to obtain initial prediction condition sequence features, a feedforward neural network can be called to perform forward learning on the initial prediction condition sequence features to obtain a prediction reaction condition sequence. As shown in FIG. 2, the output of the Multi-Head Attention can be used as an input to Feed Forward, which can Forward learn the initial predicted condition sequence characteristics of the Multi-Head Attention output, thereby obtaining a predicted reaction condition sequence.
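Sub-steps C1 to C3 together correspond to one decoder layer. The sketch below, with hypothetical dimensions and layer-normalization placement, shows the masked self-attention over the condition sequence, the cross-attention against the encoder hidden tensor, and the feed-forward projection to condition logits.

```python
import torch
import torch.nn as nn

class ConditionDecoderLayer(nn.Module):
    """Sketch: Masked Multi-Head Attention -> Multi-Head Attention over encoder memory -> Feed Forward."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024, cond_vocab_size=1000):
        super().__init__()
        self.masked_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # C1
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)   # C2
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))                              # C3
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.classifier = nn.Linear(d_model, cond_vocab_size)

    def forward(self, cond_emb, memory):
        L = cond_emb.size(1)
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool, device=cond_emb.device), diagonal=1)
        masked, _ = self.masked_attn(cond_emb, cond_emb, cond_emb, attn_mask=causal)  # mask condition features
        h = self.norm1(cond_emb + masked)
        cross, attn_w = self.cross_attn(h, memory, memory)      # attend to the encoder hidden tensor
        h = self.norm2(h + cross)
        h = self.norm3(h + self.ff(h))                          # initial predicted condition features
        return self.classifier(h), attn_w                       # predicted reaction condition logits
```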
After the predicted sequence of reaction conditions is obtained, step 107 is performed.
Step 107: and calculating to obtain the loss value of the reaction condition prediction model to be trained based on the reaction condition sequence and the predicted reaction condition sequence.
After the predicted reaction condition sequence is obtained, a loss value of the reaction condition prediction model to be trained can be calculated based on the reaction condition sequence and the predicted reaction condition sequence.
In this embodiment, if the reaction condition prediction model to be trained does not involve a temperature prediction function, the loss function only includes a classification loss function. Specifically, the cross entropy loss value can be calculated based on the reaction condition sequence and the predicted reaction condition sequence, and the cross entropy loss value is used as the loss value of the reaction condition prediction model to be trained.
When the reaction condition prediction model to be trained involves a temperature prediction function, the information contained in the memory tensor of the encoder and the decoder output tensor may be used to predict the temperature corresponding to five reaction conditions, each of which is deformed by one feedforward neural network and sent to a third feedforward neural network after the tensors are connected in series to calculate the temperature. The implementation may be described in detail in connection with the following specific implementations.
In a specific implementation manner of the present application, after the step 106, the method may further include:
step D1: and calling the temperature decoder to process the predicted response condition sequence according to the attention weight to obtain the predicted reaction temperature corresponding to the SMILES sequence of the chemical reaction expression.
In this embodiment, as shown in fig. 2, the temperature decoder is the Temperature Decoder. The output of the Bert Layer and the output of the Condition Decoder can be used as the input of the Temperature Decoder, and invoking the Temperature Decoder can process the predicted reaction condition sequence according to the attention weight, so as to obtain the predicted reaction temperature corresponding to the SMILES sequence of the chemical reaction expression.
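A sketch of the temperature decoder described above is given below; the pooling over sequence positions and the hidden dimensions are assumptions, but the structure follows the description: two feed-forward transforms, concatenation, and a third feed-forward regression network.

```python
import torch
import torch.nn as nn

class TemperatureDecoder(nn.Module):
    """Sketch: two feed-forward transforms, concatenation, and a third feed-forward regression head."""
    def __init__(self, d_model=256, d_hidden=128):
        super().__init__()
        self.ff_memory = nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU())   # transform encoder memory tensor
        self.ff_decoder = nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU())  # transform decoder output tensor
        self.ff_out = nn.Sequential(nn.Linear(2 * d_hidden, d_hidden), nn.ReLU(),
                                    nn.Linear(d_hidden, 1))                        # third feed-forward network

    def forward(self, memory, decoder_out):
        m = self.ff_memory(memory.mean(dim=1))        # pool over reaction tokens
        d = self.ff_decoder(decoder_out.mean(dim=1))  # pool over condition tokens
        return self.ff_out(torch.cat([m, d], dim=-1)).squeeze(-1)   # predicted temperature
```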
After the predicted reaction temperature is obtained, a loss value of the reaction condition prediction model to be trained can be calculated according to the predicted reaction condition sequence and the predicted reaction temperature. The implementation may be described in detail in connection with the following specific implementations.
In another specific implementation of the present application, the step 107 may include:
substep E1: and calculating to obtain a cross entropy loss value based on the reaction condition sequence and the predicted reaction condition sequence.
In this embodiment, after the predicted reaction condition sequence is obtained, the cross entropy loss value may be calculated based on the reaction condition sequence and the predicted reaction condition sequence.
Substep E2: and calculating to obtain a mean square error loss value based on the predicted reaction temperature.
After the predicted reaction temperature is obtained, a mean square error loss value may be calculated based on the predicted reaction temperature.
Substep E3: and calculating the loss value of the reaction condition prediction model to be trained based on the cross entropy loss value, the mean square error loss value and the balance coefficient.
After the cross entropy loss value and the mean square error loss value are obtained, the loss value of the reaction condition prediction model to be trained can be calculated based on the cross entropy loss value, the mean square error loss value and the balance coefficient. Specifically, the calculation formula is as follows:
$\mathcal{L} = \sum_{i=1}^{I} \mathrm{CE}\left(c_i, \hat{c}_i\right) + \lambda \cdot \mathrm{MSE}\left(t, \hat{t}\right) \qquad (3)$

In the above formula (3), $I$ is the number of chemical context conditions, $c_i$ is the predicted label of the $i$-th condition, $\hat{c}_i$ is the true label of the $i$-th condition, $t$ is the predicted temperature, $\hat{t}$ is the true temperature, and $\lambda$ is the balance coefficient. In this example, $I = 6$ (including 5 chemical context conditions and an end marker).
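A sketch of the loss in formula (3), assuming mean-reduced cross entropy over the condition positions and that the balance coefficient is a hyperparameter:

```python
import torch
import torch.nn.functional as F

def parrot_loss(cond_logits, cond_targets, temp_pred, temp_true, balance=1.0):
    """cond_logits: (batch, I, vocab) predicted logits for the I = 6 condition positions.
    cond_targets: (batch, I) ground-truth condition indices."""
    ce = F.cross_entropy(cond_logits.transpose(1, 2), cond_targets)  # cross entropy over conditions (mean reduction)
    mse = F.mse_loss(temp_pred, temp_true)                            # temperature mean squared error
    return ce + balance * mse                                         # formula (3) with balance coefficient
```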
Step 108: and under the condition that the loss value is in a preset range, taking the trained reaction condition prediction model to be trained as a final reaction condition prediction model.
After the loss value of the reaction condition prediction model to be trained is calculated, whether the loss value is in a preset range or not can be judged.
If the loss value is not within the preset range, the reaction condition prediction model to be trained has not converged. At this point, the model parameters of the reaction condition prediction model to be trained can be updated according to the calculated loss value, and training of the reaction condition prediction model to be trained continues until the model converges.
If the loss value is within the preset range, the reaction condition prediction model to be trained has converged, and the trained reaction condition prediction model can be used as the final reaction condition prediction model for subsequent reaction condition prediction scenarios.
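Finally, a sketch of the training loop with the convergence check of steps 107 and 108; the optimizer, learning rate, epoch count, and loss threshold are assumptions.

```python
import torch

def train(model, dataloader, loss_fn, epochs=50, lr=1e-4, loss_threshold=0.05):
    """Update parameters by backpropagation; stop when the loss value falls within the preset range."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for rxn_tokens, cond_tokens, cond_targets, temp_true in dataloader:
            cond_logits, temp_pred = model(rxn_tokens, cond_tokens)
            loss = loss_fn(cond_logits, cond_targets, temp_pred, temp_true)
            optimizer.zero_grad()
            loss.backward()        # backpropagate and update model parameters
            optimizer.step()
        if loss.item() < loss_threshold:   # loss value within the preset range: model has converged
            return model                    # final reaction condition prediction model
    return model
```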
According to the reaction condition prediction model training method provided by the embodiment of the application, the pre-training sample of the language representation network layer based on the reaction center masking strategy is obtained, and the pre-training sample comprises the following steps: a SMILES sequence of a chemical reaction expression; identifying a chemical reaction center in an SMILES sequence of the chemical reaction expression, taking a masking rate of 0.5 for the corresponding vocabulary, taking a masking probability of 0.15 for the rest vocabulary, calling a language representation network layer to predict the masked vocabulary, calculating cross entropy loss, and optimizing the language representation network layer; obtaining a model training sample of a reaction condition prediction model, the model training sample comprising: a SMILES sequence of a chemical reaction expression and a sequence of reaction conditions; inputting the model training sample into a reaction condition prediction model to be trained, wherein the reaction condition prediction model to be trained comprises: the pre-trained language characterizes a network layer and a reaction condition decoder; calling the language characterization network layer to process the SMILES sequence of the chemical reaction expression to obtain hidden tensors which are corresponding to each word in the SMILES sequence of the chemical reaction expression and are provided with attention weight information and characterize the chemical reaction sequence; calling a masking multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence characteristic tensor; calculating to obtain a loss value of the reaction condition prediction model to be trained based on the reaction condition sequence and the predicted reaction condition sequence; and under the condition that the loss value is in a preset range, taking the trained reaction condition prediction model to be trained as a final reaction condition prediction model. According to the embodiment of the application, the prediction of the reaction background is regarded as a task of translating the sequence (namely, a reaction condition sequence formed by the catalyst, the solvent 1, the solvent 2, the reagent 1 and the reagent 2) into the sequence, and the prediction model based on the Attention is adopted to use the sequence to represent the chemical reaction so as to respectively predict the catalyst, the solvent, the reagent and the temperature, so that the prediction precision can be obviously improved on a large-scale reaction condition prediction data set.
Referring to Fig. 4, a schematic structural diagram of a reaction condition prediction model training device provided in an embodiment of the present application is shown. As shown in Fig. 4, the reaction condition prediction model training device 400 may include the following modules:
a pre-training sample obtaining module 410, configured to obtain a pre-training sample of a language representation network layer based on a reaction center masking policy, where the pre-training sample includes: a SMILES sequence of a chemical reaction expression;
the language characterization network layer optimizing module 420 is configured to identify a chemical reaction center in the SMILES sequence of the chemical reaction expression, mask the corresponding vocabulary at a rate of 0.5 and the remaining vocabulary at a probability of 0.15, call the language characterization network layer to predict the masked vocabulary, calculate the cross entropy loss, and optimize the language characterization network layer;
a model training sample acquisition module 430, configured to acquire a model training sample of a reaction condition prediction model, where the model training sample includes: a SMILES sequence of a chemical reaction expression and a sequence of reaction conditions;
a model training sample input module 440, configured to input the model training sample into a reaction condition prediction model to be trained, where the reaction condition prediction model to be trained includes: a pre-trained language characterization network layer and a reaction condition decoder;
The hidden tensor obtaining module 450 is configured to invoke the language characterization network layer to process the SMILES sequence of the chemical reaction expression, so as to obtain a hidden tensor, corresponding to each word in the SMILES sequence of the chemical reaction expression and carrying attention weight information, that characterizes the chemical reaction sequence;
a predicted reaction condition sequence obtaining module 460, configured to invoke a masking multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain a corresponding condition sequence feature, and obtain a predicted reaction condition sequence based on the hidden tensor and the condition sequence feature tensor;
the loss value calculation module 470 is configured to calculate, based on the reaction condition sequence and the predicted reaction condition sequence, a loss value of the reaction condition prediction model to be trained;
and a reaction condition prediction model obtaining module 480, configured to take the trained reaction condition prediction model to be trained as a final reaction condition prediction model when the loss value is within a preset range.
Optionally, the reaction condition prediction model to be trained further includes: the feature extraction network layer and the vector conversion layer,
The apparatus further comprises:
the position feature acquisition module is used for calling the feature extraction network layer to extract each word in the SMILES sequence of the chemical reaction expression and the position feature of each word in the SMILES sequence of the chemical reaction expression;
and the position vector acquisition module is used for calling the vector conversion layer to respectively perform vector conversion processing on each word and the position feature to obtain a word vector corresponding to each word and a position vector corresponding to the position feature.
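A minimal sketch of the feature extraction and vector conversion steps is given below, assuming PyTorch embedding layers for the SMILES words and their positions; the class name and dimensions are illustrative only.

```python
import torch
import torch.nn as nn

class SmilesEmbedding(nn.Module):
    """Convert each word of the SMILES sequence and its position into vectors."""

    def __init__(self, vocab_size, max_len, d_model=256):
        super().__init__()
        self.word_embedding = nn.Embedding(vocab_size, d_model)      # word vectors
        self.position_embedding = nn.Embedding(max_len, d_model)     # position vectors

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer ids of the SMILES words
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        positions = positions.unsqueeze(0).expand_as(token_ids)      # position features
        return self.word_embedding(token_ids) + self.position_embedding(positions)
```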
Optionally, the language characterization network layer includes: multi-headed attention mechanism layer, layer normalization and Dropout layer, feed-forward neural network and language characterization encoder,
the hidden tensor acquisition module includes:
the association relation acquisition unit is used for calling the multi-head attention mechanism layer to process the addition normalization result of the word vector and the position vector so as to obtain the association relation between each word;
the hidden tensor acquisition unit is used for calling the layer normalization and Dropout layer to process the word vector and the position vector according to the association relation so as to obtain a hidden tensor with weights among the words;
The feedforward network tensor acquisition unit is used for obtaining the feedforward neural network hidden tensor by passing the hidden tensor with the weight among each word through the feedforward neural network;
the encoder tensor acquisition unit is used for calling the language characterization output layer to perform fusion processing on the weights among the words and the forward processing result to obtain and output the hidden tensor of the language characterization encoder;
when the tensor flows through the language characterization encoder, a plurality of residual connections are included, namely a residual connection from the input embedding layer to the encoder's own output layer and a residual connection from that output layer to the language characterization output layer.
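The flow described above (multi-head attention, layer normalization and Dropout, feed-forward neural network, residual connections) can be illustrated with a single encoder layer sketched in PyTorch; the hyper-parameters are illustrative and the exact residual wiring of the patented model may differ.

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One language-characterization encoder layer with residual connections."""

    def __init__(self, d_model=256, n_heads=8, d_ff=1024, dropout=0.1):
        super().__init__()
        self.self_attention = nn.MultiheadAttention(d_model, n_heads,
                                                    dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x, padding_mask=None):
        # multi-head attention over the (word vector + position vector) inputs
        attn_out, _ = self.self_attention(x, x, x, key_padding_mask=padding_mask)
        x = self.norm1(x + self.dropout(attn_out))      # residual + LayerNorm + Dropout
        ff_out = self.feed_forward(x)                   # feed-forward neural network
        return self.norm2(x + self.dropout(ff_out))     # second residual connection
```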
Optionally, the reaction condition decoder comprises: masking a multi-head attention network, a multi-head attention mechanism layer and a feedforward neural network,
the predicted reaction condition sequence acquisition module comprises:
the mask sequence feature acquisition unit is used for calling the masking multi-head attention network to learn the attention weight of the reaction condition sequence to obtain mask condition sequence features;
an initial sequence feature acquisition unit, configured to invoke the multi-head attention mechanism layer to perform attention learning on the mask condition sequence feature based on the attention weight, so as to obtain an initial prediction condition sequence feature;
The predicted sequence feature acquisition unit is used for calling the feedforward neural network to forward learn the initial predicted condition sequence feature so as to obtain the predicted reaction condition sequence.
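A simplified sketch of one reaction condition decoder layer is given below, assuming PyTorch: a causal mask plays the role of the masking multi-head attention network, and a second attention block attends to the hidden tensor output by the language characterization network layer. All names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ConditionDecoderLayer(nn.Module):
    """Masked self-attention over the condition sequence plus cross-attention
    to the hidden tensor produced by the language characterization encoder."""

    def __init__(self, d_model=256, n_heads=8, d_ff=1024, dropout=0.1):
        super().__init__()
        self.masked_attention = nn.MultiheadAttention(d_model, n_heads,
                                                      dropout=dropout, batch_first=True)
        self.cross_attention = nn.MultiheadAttention(d_model, n_heads,
                                                     dropout=dropout, batch_first=True)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(3)])

    def forward(self, condition_emb, encoder_hidden):
        seq_len = condition_emb.size(1)
        causal_mask = torch.triu(                       # masking multi-head attention
            torch.ones(seq_len, seq_len, dtype=torch.bool,
                       device=condition_emb.device), diagonal=1)
        self_out, _ = self.masked_attention(condition_emb, condition_emb,
                                            condition_emb, attn_mask=causal_mask)
        x = self.norms[0](condition_emb + self_out)
        cross_out, _ = self.cross_attention(x, encoder_hidden, encoder_hidden)
        x = self.norms[1](x + cross_out)                # attend to the hidden tensor
        return self.norms[2](x + self.feed_forward(x))  # predicted condition features
```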
Optionally, the loss value calculation module includes:
and the first loss value calculation unit is used for calculating a cross entropy loss value based on the reaction condition sequence and the predicted reaction condition sequence, and taking the cross entropy loss value as the loss value of the reaction condition prediction model to be trained.
Optionally, the reaction condition prediction model to be trained further includes: a temperature decoder,
the apparatus further comprises:
and the predicted reaction temperature acquisition module is used for calling the temperature decoder to process the predicted reaction condition sequence according to the attention weight so as to obtain the predicted reaction temperature corresponding to the SMILES sequence of the chemical reaction expression.
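One plausible form of the temperature decoder is a small regression head over the predicted condition features, as sketched below; the pooling and layer sizes are assumptions rather than details specified by the patent.

```python
import torch.nn as nn

class TemperatureDecoder(nn.Module):
    """Regress the reaction temperature from the predicted condition features."""

    def __init__(self, d_model=256):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                  nn.Linear(d_model, 1))

    def forward(self, condition_features):
        # condition_features: (batch, seq_len, d_model); pool over the sequence
        pooled = condition_features.mean(dim=1)
        return self.head(pooled).squeeze(-1)   # predicted reaction temperature
```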
Optionally, the loss value calculation module includes:
the cross entropy loss value calculation unit is used for calculating a cross entropy loss value based on the reaction condition sequence and the predicted reaction condition sequence;
the mean square error loss calculation unit is used for calculating a mean square error loss value based on the predicted reaction temperature;
And the second loss value calculation unit is used for calculating the loss value of the reaction condition prediction model to be trained based on the cross entropy loss value, the mean square error loss value and the balance coefficient.
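The combined loss of the second loss value calculation unit can be sketched as the cross entropy loss over the reaction condition sequence plus the balance coefficient times the mean square error of the predicted reaction temperature; the default coefficient below is illustrative, not a value given by the patent.

```python
import torch.nn as nn

cross_entropy = nn.CrossEntropyLoss()
mse = nn.MSELoss()

def combined_loss(condition_logits, condition_targets,
                  predicted_temperature, true_temperature, balance=1.0):
    """Loss = cross entropy (conditions) + balance coefficient * MSE (temperature)."""
    # condition_logits: (batch, seq_len, vocab) -> flatten for CrossEntropyLoss
    ce = cross_entropy(condition_logits.reshape(-1, condition_logits.size(-1)),
                       condition_targets.reshape(-1))
    reg = mse(predicted_temperature, true_temperature)
    return ce + balance * reg
```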
According to the reaction condition prediction model training device provided by the embodiment of the application, a pre-training sample of the language characterization network layer based on a reaction center masking strategy is obtained, the pre-training sample including: a SMILES sequence of a chemical reaction expression; a chemical reaction center in the SMILES sequence of the chemical reaction expression is identified, the corresponding vocabulary is masked at a rate of 0.5 and the remaining vocabulary at a probability of 0.15, the language characterization network layer is called to predict the masked vocabulary, the cross entropy loss is calculated, and the language characterization network layer is optimized; a model training sample of the reaction condition prediction model is obtained, the model training sample including: a SMILES sequence of a chemical reaction expression and a reaction condition sequence; the model training sample is input into a reaction condition prediction model to be trained, the reaction condition prediction model to be trained including: the pre-trained language characterization network layer and a reaction condition decoder; the language characterization network layer is called to process the SMILES sequence of the chemical reaction expression to obtain a hidden tensor, corresponding to each word in the SMILES sequence of the chemical reaction expression and carrying attention weight information, that characterizes the chemical reaction sequence; the masking multi-head attention network in the reaction condition decoder is called to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence features, and a predicted reaction condition sequence is obtained based on the hidden tensor and the condition sequence feature tensor; a loss value of the reaction condition prediction model to be trained is calculated based on the reaction condition sequence and the predicted reaction condition sequence; and, when the loss value is within the preset range, the trained reaction condition prediction model to be trained is taken as the final reaction condition prediction model. In the embodiment of the application, the prediction of the reaction conditions is treated as a sequence-to-sequence translation task (the target being a reaction condition sequence formed by the catalyst, solvent 1, solvent 2, reagent 1 and reagent 2), and an Attention-based prediction model uses sequences to represent the chemical reaction so as to predict the catalyst, solvent, reagent and temperature respectively, whereby the prediction precision can be significantly improved on a large-scale reaction condition prediction data set.
Example III
The embodiment of the application provides an electronic device, comprising: a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the computer program, when executed by the processor, implements the reaction condition prediction model training method described above.
Fig. 5 shows a schematic structural diagram of an electronic device 500 according to an embodiment of the present application. As shown in Fig. 5, the electronic device 500 includes a Central Processing Unit (CPU) 501 that can perform various suitable actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 502 or loaded from a storage unit 508 into a Random Access Memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the electronic device 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in electronic device 500 are connected to I/O interface 505, including: an input unit 506 such as a keyboard, mouse, microphone, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The various processes and processing steps described above may be performed by the processing unit 501. For example, the methods of any of the embodiments described above may be implemented as a computer software program tangibly embodied on a computer-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the CPU 501, one or more actions of the methods described above may be performed.
Example IV
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described reaction condition prediction model training method.
In this specification, each embodiment is described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminals (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications to those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the appended claims be interpreted as covering the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the present application.
The reaction condition prediction model training method, apparatus, electronic device and computer-readable storage medium provided herein have been described in detail above. Specific examples have been used in this specification to illustrate the principles and implementations of the present application; the above examples are provided only to assist in understanding the method and core ideas of the present application. Meanwhile, those skilled in the art may make modifications to the specific implementations and application scope in accordance with the ideas of the present application. In view of the foregoing, the contents of this specification should not be construed as limiting the present application.

Claims (10)

1. A method for training a reaction condition predictive model, the method comprising:
obtaining a pre-training sample of a language representation network layer based on a reaction center masking policy, the pre-training sample comprising: a SMILES sequence of a chemical reaction expression;
identifying a chemical reaction center in the SMILES sequence of the chemical reaction expression, masking the corresponding vocabulary at a rate of 0.5 and the remaining vocabulary at a probability of 0.15, calling a language representation network layer to predict the masked vocabulary, calculating a cross entropy loss, and optimizing the language representation network layer;
obtaining a model training sample of a reaction condition prediction model, the model training sample comprising: a SMILES sequence of a chemical reaction expression and a sequence of reaction conditions;
inputting the model training sample into a reaction condition prediction model to be trained, wherein the reaction condition prediction model to be trained comprises: a pre-trained language characterization network layer and a reaction condition decoder;
calling the language characterization network layer to process the SMILES sequence of the chemical reaction expression to obtain hidden tensors which are corresponding to each word in the SMILES sequence of the chemical reaction expression and are provided with attention weight information and characterize the chemical reaction sequence;
Calling a masking multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence characteristic tensor;
calculating to obtain a loss value of the reaction condition prediction model to be trained based on the reaction condition sequence and the predicted reaction condition sequence;
and under the condition that the loss value is in a preset range, taking the trained reaction condition prediction model to be trained as a final reaction condition prediction model.
2. The method of claim 1, wherein the reaction condition predictive model to be trained further comprises: the feature extraction network layer and the vector conversion layer,
before the language representation network layer is called to process the SMILES sequence of the chemical reaction expression to obtain hidden tensors which are corresponding to words in the SMILES sequence of the chemical reaction expression and are provided with attention weight information and represent the chemical reaction sequence, the method further comprises the steps of:
invoking the feature extraction network layer to extract each word in the SMILES sequence of the chemical reaction expression and the position feature of each word in the SMILES sequence of the chemical reaction expression;
And calling the vector conversion layer to respectively perform vector conversion processing on each word and the position feature to obtain a word vector corresponding to each word and a position vector corresponding to the position feature.
3. The method of claim 2, wherein the language characterization network layer comprises: multi-headed attention mechanism layer, layer normalization and Dropout layer, feed-forward neural network and language characterization encoder,
the calling the language representation network layer to process the SMILES sequence of the chemical reaction expression to obtain hidden tensors which are corresponding to each word in the SMILES sequence of the chemical reaction expression and are provided with attention weight information and represent the chemical reaction sequence, wherein the hidden tensors comprise:
invoking the multi-head attention mechanism layer to process the addition normalization result of the word vector and the position vector to obtain the association relation between each word;
invoking the layer normalization and Dropout layer to process the word vector and the position vector according to the association relation to obtain a hidden tensor with weights among words;
the hidden tensor with the weight among each word is passed through a feedforward neural network to obtain a feedforward neural network hidden tensor;
Invoking the language representation output layer to perform fusion processing on the weights among the words and the forward processing result to obtain and output a hidden tensor of the language representation encoder;
when the tensor flows through the language characterization encoder, a plurality of residual connections are included, namely a residual connection from the input embedding layer to the encoder's own output layer and a residual connection from that output layer to the language characterization output layer.
4. The method of claim 1, wherein the reaction condition decoder comprises: masking a multi-head attention network, a multi-head attention mechanism layer and a feedforward neural network,
wherein the calling the masking multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence characteristic tensor, comprises the following steps:
invoking the masking multi-head attention network to learn the attention weight of the reaction condition sequence to obtain mask condition sequence characteristics;
invoking the multi-head attention mechanism layer to perform attention learning on the mask condition sequence features based on the attention weight to obtain initial prediction condition sequence features;
And calling the feedforward neural network to forward learn the initial predicted condition sequence characteristics to obtain the predicted reaction condition sequence.
5. The method according to claim 1, wherein the calculating, based on the reaction condition sequence and the predicted reaction condition sequence, a loss value of the reaction condition prediction model to be trained comprises:
and calculating a cross entropy loss value based on the reaction condition sequence and the predicted reaction condition sequence, and taking the cross entropy loss value as the loss value of the reaction condition prediction model to be trained.
6. The method of claim 1, wherein the reaction condition predictive model to be trained further comprises: a temperature decoder,
wherein, after the calling the masking multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence characteristic tensor, the method further comprises the following steps:
and calling the temperature decoder to process the predicted reaction condition sequence according to the attention weight to obtain the predicted reaction temperature corresponding to the SMILES sequence of the chemical reaction expression.
7. The method according to claim 6, wherein the calculating a loss value of the reaction condition prediction model to be trained based on the reaction condition sequence and the predicted reaction condition sequence comprises:
calculating to obtain a cross entropy loss value based on the reaction condition sequence and the predicted reaction condition sequence;
calculating to obtain a mean square error loss value based on the predicted reaction temperature;
and calculating the loss value of the reaction condition prediction model to be trained based on the cross entropy loss value, the mean square error loss value and the balance coefficient.
8. A reaction condition predictive model training apparatus, the apparatus comprising:
a pre-training sample acquisition module, configured to acquire a pre-training sample of a language representation network layer based on a reaction center masking policy, where the pre-training sample includes: a SMILES sequence of a chemical reaction expression;
the language representation network layer optimizing module is used for identifying a chemical reaction center in the SMILES sequence of the chemical reaction expression, masking the corresponding vocabulary at a rate of 0.5 and the remaining vocabulary at a probability of 0.15, calling the language representation network layer to predict the masked vocabulary, calculating a cross entropy loss, and optimizing the language representation network layer;
The model training sample acquisition module is used for acquiring a model training sample of the reaction condition prediction model, and the model training sample comprises: a SMILES sequence of a chemical reaction expression and a sequence of reaction conditions;
the model training sample input module is used for inputting the model training sample into a reaction condition prediction model to be trained, and the reaction condition prediction model to be trained comprises: a pre-trained language characterization network layer and a reaction condition decoder;
the hidden tensor acquisition module is used for calling the language characterization network layer to process the SMILES sequence of the chemical reaction expression so as to obtain hidden tensors which are corresponding to each word in the SMILES sequence of the chemical reaction expression and are used for characterizing the chemical reaction sequence and are provided with attention weight information;
the predicted reaction condition sequence acquisition module is used for calling the masking multi-head attention network in the reaction condition decoder to perform attention characterization on the reaction condition sequence to obtain corresponding condition sequence characteristics, and obtaining a predicted reaction condition sequence based on the hidden tensor and the condition sequence characteristic tensor;
the loss value calculation module is used for calculating the loss value of the reaction condition prediction model to be trained based on the reaction condition sequence and the predicted reaction condition sequence;
And the reaction condition prediction model acquisition module is used for taking the trained reaction condition prediction model to be trained as a final reaction condition prediction model under the condition that the loss value is in a preset range.
9. The apparatus of claim 8, wherein the reaction condition predictive model to be trained further comprises: the feature extraction network layer and the vector conversion layer,
the apparatus further comprises:
the position feature acquisition module is used for calling the feature extraction network layer to extract each word in the SMILES sequence of the chemical reaction expression and the position feature of each word in the SMILES sequence of the chemical reaction expression;
and the position vector acquisition module is used for calling the vector conversion layer to respectively perform vector conversion processing on each word and the position feature to obtain a word vector corresponding to each word and a position vector corresponding to the position feature.
10. The apparatus of claim 9, wherein the language characterization network layer comprises: multi-headed attention mechanism layer, layer normalization and Dropout layer, feed-forward neural network and language characterization encoder,
the hidden tensor acquisition module includes:
the association relation acquisition unit is used for calling the multi-head attention mechanism layer to process the addition normalization result of the word vector and the position vector so as to obtain the association relation between each word;
The hidden tensor acquisition unit is used for calling the layer normalization and Dropout layer to process the word vector and the position vector according to the association relation so as to obtain a hidden tensor with weights among the words;
the feedforward network tensor acquisition unit is used for obtaining the feedforward neural network hidden tensor by passing the hidden tensor with the weight among each word through the feedforward neural network;
the encoder tensor acquisition unit is used for calling the language characterization output layer to perform fusion processing on the weights among the words and the forward processing result to obtain and output the hidden tensor of the language characterization encoder;
when the tensor flows through the language characterization encoder, a plurality of residual connections are included, namely a residual connection from the input embedding layer to the encoder's own output layer and a residual connection from that output layer to the language characterization output layer.
CN202211579416.4A 2022-12-08 2022-12-08 Reaction condition prediction model training method and device Pending CN116227485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211579416.4A CN116227485A (en) 2022-12-08 2022-12-08 Reaction condition prediction model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211579416.4A CN116227485A (en) 2022-12-08 2022-12-08 Reaction condition prediction model training method and device

Publications (1)

Publication Number Publication Date
CN116227485A true CN116227485A (en) 2023-06-06

Family

ID=86572034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211579416.4A Pending CN116227485A (en) 2022-12-08 2022-12-08 Reaction condition prediction model training method and device

Country Status (1)

Country Link
CN (1) CN116227485A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591076A (en) * 2024-01-19 2024-02-23 北京壁仞科技开发有限公司 Method, computing device and storage medium for generating binary mask tensor

Similar Documents

Publication Publication Date Title
CN110609891A (en) Visual dialog generation method based on context awareness graph neural network
CN113610173B (en) Knowledge distillation-based multi-span domain few-sample classification method
CN116415654A (en) Data processing method and related equipment
US20200167659A1 (en) Device and method for training neural network
CN114169330A (en) Chinese named entity identification method fusing time sequence convolution and Transformer encoder
CN112397155B (en) Single-step reverse synthesis method and system
CN110909144A (en) Question-answer dialogue method and device, electronic equipment and computer readable storage medium
CN111353029A (en) Semantic matching-based multi-turn spoken language understanding method
EP3916641A1 (en) Continuous time self attention for improved computational predictions
CN111223532A (en) Method, apparatus, device, medium for determining a reactant of a target compound
CN113705238B (en) Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN115145551A (en) Intelligent auxiliary system for machine learning application low-code development
CN116227485A (en) Reaction condition prediction model training method and device
CN114782775A (en) Method and device for constructing classification model, computer equipment and storage medium
CN112634992A (en) Molecular property prediction method, training method of model thereof, and related device and equipment
CN114490950A (en) Training method and storage medium of encoder model, and similarity prediction method and system
CN115810351A (en) Controller voice recognition method and device based on audio-visual fusion
CN114881173A (en) Resume classification method and device based on self-attention mechanism
CN116680392A (en) Relation triplet extraction method and device
US20210019611A1 (en) Deep learning system
CN116386733A (en) Protein function prediction method based on multi-view multi-scale multi-attention mechanism
CN113849641B (en) Knowledge distillation method and system for cross-domain hierarchical relationship
CN114822726A (en) Construction method, analysis method, device, storage medium and computer equipment
Chien et al. Stochastic convolutional recurrent networks
CN115687910A (en) Data processing method and device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination