CN113782109A

CN113782109A - Reactant derivation method and reverse synthesis derivation method based on Monte Carlo tree

Info

Publication number: CN113782109A
Application number: CN202111066691.1A
Authority: CN
Inventors: 柳彦宏; 戴开洋; 却立勇
Original assignee: Yantai Guogong Intelligent Technology Co ltd
Current assignee: Yantai Guogong Intelligent Technology Co ltd
Priority date: 2021-09-13
Filing date: 2021-09-13
Publication date: 2021-12-10

Abstract

The invention discloses a reactant derivation method and a reverse synthesis derivation method based on a Monte Carlo tree, belongs to the technical field of reverse synthesis, and aims to solve the technical problem of improving the prediction accuracy of reverse synthesis analysis. The reverse synthesis derivation method comprises the following steps: extracting reverse reaction templates from an initial data set by the RDChiral method to form an initial template library; cleaning the initial template library, obtaining a training set, a verification set and a test set, and constructing a reverse reaction template library from the non-repeating reverse reaction templates; constructing a single-hidden-layer fully connected neural network model based on the Keras deep learning framework to serve as a template prediction model; training the template prediction model; and constructing a Monte Carlo tree with the SMILES expression of the target compound as the root node, predicting, based on the Monte Carlo tree search method, the reverse reaction template corresponding to the molecules in each node of the tree with the template prediction model, and obtaining the preceding-stage reactants corresponding to the reverse reaction template.

Description

Reactant derivation method and reverse synthesis derivation method based on Monte Carlo tree

Technical Field

The invention relates to the technical field of reverse synthesis, in particular to a reactant derivation method and a reverse synthesis derivation method based on a Monte Carlo tree.

Background

Retrosynthetic analysis is a method of planning the synthesis of a given compound, usually carried out by a chemist or a computer, in which the target is broken down step by step into intermediates or simpler reactants until commercially available building blocks are reached. Reverse synthesis analysis has traditionally been realized by expert systems based on manually coded rules, whose application range is limited and whose accuracy is low.

Based on the analysis, how to improve the prediction accuracy of the reverse synthesis analysis is a technical problem to be solved.

Disclosure of Invention

The technical task of the invention is to provide a reactant derivation method and a reverse synthesis derivation method based on Monte Carlo trees aiming at the defects so as to solve the problem of how to improve the prediction accuracy of reverse synthesis analysis.

According to the reactant derivation method based on the Monte Carlo tree, a Monte Carlo tree is constructed with the SMILES expression of the target compound as the root node; based on the Monte Carlo tree search method, a template prediction model is used to predict the reverse reaction template corresponding to the molecules in each node of the Monte Carlo tree, and the preceding-stage reactants corresponding to the reverse reaction template are obtained at the same time, wherein the template prediction model is a neural network model that takes a product in a chemical equation as input and the reverse reaction template as output; the method comprises the following steps:

in the selection stage, for the current node, each iteration starts from the root node of the tree, the UCB score of each node is calculated, and the leaf node with the highest UCB score is selected as the leaf node to be expanded;

in the expansion stage, for each molecule of the leaf node to be expanded, predicting the corresponding reverse reaction templates through the template prediction model, obtaining the precursor molecules corresponding to each reverse reaction template based on RDKit, and creating leaf nodes;

in the simulation stage, selection and expansion are carried out repeatedly, starting from a leaf node that has not been visited, until a stop condition is met and a termination node is reached, wherein the stop conditions comprise: all generated precursor molecules are present in a commercially available compound library; the maximum depth of the tree is reached; or the reverse reaction template is invalid;

in the backtracking stage, updating the Q value and the N value of each node on the backtracking path from bottom to top from the leaf node to be expanded until the root node is reached;

wherein, the calculation formula of the UCB score is as follows:

wherein N_{-1} represents the number of times the parent node of the current node has been traversed, and C represents a hyper-parameter for balancing exploration and exploitation;

in the process of calculating the UCB score of each node, Q represents the sum of the previous step values, and N represents the number of times that the current child node is traversed.

Preferably, the same template prediction model is used for predicting the corresponding reverse reaction template in the simulation stage and the expansion stage;

the template prediction model is obtained through the following steps:

acquiring a reaction equation to construct an initial data set, wherein the reaction equation comprises a reactant SMILES expression and a product SMILES expression;

extracting reverse reaction templates from the initial data set by the RDChiral method, hash-coding the reaction equation and the reverse reaction template respectively, and constructing an initial template library from the reaction equation hash code, the reactant SMILES expression, the product SMILES expression, the reverse reaction template and the reverse reaction template hash code;

after data cleaning is carried out on the initial template library, converting the reverse reaction template hash code into a label vector and the product SMILES expression into a fingerprint vector, and dividing the data set on the basis of the fingerprint vectors and label vectors to obtain a training set, a verification set and a test set, each of which comprises label vectors and fingerprint vectors in one-to-one correspondence; and constructing a reverse reaction template library from the non-repeating reverse reaction templates;

constructing a single-hidden-layer fully connected neural network model based on the Keras deep learning framework to serve as the template prediction model, wherein the template prediction model takes a product as input and predicts and outputs a reverse reaction template;

and training the template prediction model by taking the training set as input to optimize parameters of the template prediction model, monitoring the training process of the template prediction model by taking the verification set as input to prevent overfitting to obtain a trained template prediction model, and testing the trained template prediction model based on the test set to obtain a final template prediction model.

Preferably, the data cleaning is performed on the initial template library, and the method comprises the following steps:

removing sample data of which the product quantity is more than 1 in the product SMILES expression in the initial template library;

applying the reverse reaction template to the corresponding product based on RDChiral, and removing the corresponding sample data if the application is invalid;

removing sample data corresponding to reverse reaction templates with the occurrence frequency less than a threshold value in the initial template library;

and removing repeated sample data in the initial template library according to the Hash code of the reaction equation.

Preferably, obtaining the precursor molecules corresponding to each reverse reaction template based on RDKit and creating leaf nodes comprises the following steps:

reserving a predetermined number of reverse reaction templates for all the reverse reaction templates obtained from each molecule;

storing the reverse reaction templates of all molecules in the parent node, and initializing the associated Q values and N values respectively.

Preferably, the same template prediction model is used to predict the corresponding reverse reaction template in the simulation stage and the expansion stage.

Preferably, the updating of the Q value and the N value of each node on the backtracking path from the leaf node to be expanded to the root node from bottom to top includes the following steps:

obtaining a value evaluation value of the termination node according to the value updating function;

accumulating the value evaluation value once for the Q value of each node on the backtracking path, and adding 1 to the N value;

the calculation formula of the value updating function is as follows:

wherein Reward is the value evaluation value, N_{in_stock} is the number of commercially available compounds, N is the number of compounds in the termination node, and transformations is the number of transformations separating each compound from the target compound.

In a second aspect, the present invention provides a method for deriving a reverse synthesis based on a monte carlo tree, comprising the following steps:

training the template prediction model by taking a training set as input to optimize parameters of the template prediction model, monitoring the training process of the template prediction model by taking a verification set as input to prevent overfitting to obtain a trained template prediction model, and testing the trained template prediction model based on a test set to obtain a final template prediction model;

according to the reactant derivation method based on the Monte Carlo tree described above, constructing a Monte Carlo tree with the SMILES expression of the target compound as the root node, predicting, based on the Monte Carlo tree search method, the reverse reaction template corresponding to each molecule in each node of the Monte Carlo tree with the template prediction model, and obtaining at the same time the preceding-stage reactants corresponding to the reverse reaction template.

Preferably, before reverse reaction templates are extracted from the initial data set by the RDChiral method, atom mapping is performed on the reaction equation by RXNMapper, so that the atoms of the reactants and products are labeled with atom-map numbers.

Preferably, the reverse reaction template is converted into a label vector through the LabelBinarizer label binarization method of the scikit-learn library; the product SMILES expression is converted into a fingerprint vector by the Morgan algorithm of RDKit.

Preferably, the template prediction model includes:

an input layer, the number of neurons of which is consistent with the length of the fingerprint vector;

a hidden layer configured with an activation function ELU;

and an output layer, the number of neurons of which is consistent with the number of non-repeating reverse reaction templates, the output layer being configured with the activation function Softmax.

The reactant derivation method and the reverse synthesis derivation method based on the Monte Carlo tree have the following advantages:

1. a Monte Carlo tree is constructed with the molecule in the SMILES expression of the target compound as the root node; the reverse reaction template corresponding to the molecules in each node is predicted from the product by the template prediction model, and the preceding-stage reactants corresponding to the reverse reaction template are obtained based on the Monte Carlo tree search method;

2. reverse reaction templates are obtained by the RDChiral method, and an initial template library is constructed from the reaction equation hash codes, reactant SMILES expressions, product SMILES expressions, reverse reaction templates and reverse reaction template hash codes; after cleaning, the initial template library is divided into a training set, a verification set and a test set, and a reverse reaction template library is constructed from the non-repeating reverse reaction templates; a single-hidden-layer fully connected neural network model is constructed as the template prediction model based on the Keras deep learning framework and is trained on the training set, verification set and test set to obtain the final template prediction model; the template prediction model predicts the reverse reaction template with the product as input, so that reverse reaction templates are obtained quickly and efficiently and, combined with the Monte Carlo tree search, reverse synthesis derivation is realized quickly, efficiently and accurately;

3. before the template prediction model is trained, data cleaning is carried out on the initial template base, invalid and repeated sample data are removed, the effectiveness of the sample is improved, the accuracy of the template prediction model is improved, and the operation efficiency is improved;

4. the product SMILES expression is converted into a high-dimensional fingerprint vector, which helps the neural network model learn the chemical structure features in the SMILES expression and construct a mapping between the reaction-center features in the product SMILES expression and the corresponding reverse reaction template;

5. a simple single-hidden-layer fully connected neural network is adopted as the structure of the template prediction model, which reduces overfitting to rarely occurring reverse reaction templates during training and improves the generalization ability of the template prediction model;

6. the reverse synthesis derivation method is capable of continuous learning: after the reaction equation data set is updated, the template prediction model can be retrained to learn new chemical reaction knowledge, and the reverse synthesis paths are updated accordingly.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.

The invention is further described below with reference to the accompanying drawings.

FIG. 1 is a block flow diagram of the reactant derivation method based on Monte Carlo tree of example 1;

fig. 2 is a flow chart of a reverse synthesis derivation method based on a monte carlo tree in embodiment 2.

Detailed Description

The present invention is further described in the following with reference to the drawings and the specific embodiments so that those skilled in the art can better understand the present invention and can implement the present invention, but the embodiments are not to be construed as limiting the present invention, and the embodiments and the technical features of the embodiments can be combined with each other without conflict.

The embodiment of the invention provides a reactant derivation method and a reverse synthesis derivation method based on a Monte Carlo tree, which are used for solving the technical problem of how to improve the prediction accuracy of reverse synthesis analysis.

Example 1:

the invention relates to a reactant derivation method based on a Monte Carlo tree, which is characterized in that a target compound SMILES expression is used as a root node to construct the Monte Carlo tree, a template prediction model is adopted to predict an inverse reaction template corresponding to molecules in each node of the Monte Carlo tree based on a Monte Carlo tree search method, a preceding stage reactant corresponding to the inverse reaction template is obtained at the same time, and the template prediction model is a neural network model which takes a product in a chemical equation as input and the inverse reaction template as output.

In this embodiment, the method includes the steps of:

S100, in the selection stage, for the current node, each iteration starts from the root node of the tree, the UCB score of each node is calculated, and the leaf node with the highest UCB score is selected as the leaf node to be expanded;

S200, in the expansion stage, for each molecule of the leaf node to be expanded, the corresponding reverse reaction templates are predicted through the template prediction model, the precursor molecules corresponding to each reverse reaction template are obtained based on RDKit, and leaf nodes are created;

S300, in the simulation stage, selection and expansion are carried out repeatedly, starting from a leaf node that has not been visited, until a stop condition is met and a termination node is reached, wherein the stop conditions comprise: all generated precursor molecules are present in a commercially available compound library; the maximum depth of the tree is reached; or the reverse reaction template is invalid;

S400, in the backtracking stage, the Q value and the N value of each node on the backtracking path are updated from bottom to top, starting from the leaf node to be expanded, until the root node is reached, wherein Q represents the sum of the previous step values, and N represents the number of times the current child node has been traversed.

The template prediction model in the embodiment is obtained through the following steps:

(1) acquiring a reaction equation to construct an initial data set, wherein the reaction equation comprises a reactant SMILES expression and a product SMILES expression;

(2) extracting reverse reaction templates from the initial data set by the RDChiral method, hash-coding the reaction equation and the reverse reaction template respectively, and constructing an initial template library from the reaction equation hash code, the reactant SMILES expression, the product SMILES expression, the reverse reaction template and the reverse reaction template hash code;

(3) after data cleaning is carried out on the initial template library, converting the reverse reaction template hash code into a label vector and the product SMILES expression into a fingerprint vector, and dividing the data set on the basis of the fingerprint vectors and label vectors to obtain a training set, a verification set and a test set, each of which comprises label vectors and fingerprint vectors in one-to-one correspondence; and constructing a reverse reaction template library from the non-repeating reverse reaction templates;

(4) constructing a single-hidden-layer fully connected neural network model based on the Keras deep learning framework to serve as the template prediction model, wherein the template prediction model takes a product as input and predicts and outputs a reverse reaction template;

(5) and training the template prediction model by taking the training set as input to optimize parameters of the template prediction model, monitoring the training process of the template prediction model by taking the verification set as input to prevent overfitting to obtain a trained template prediction model, and testing the trained template prediction model based on the test set to obtain a final template prediction model.

Before the reverse reaction templates are extracted from the initial data set by the RDChiral method in step (2), atom mapping is performed on the reaction equation by RXNMapper, and the atoms of the reactants and products are labeled with atom-map numbers so that the reverse reaction templates can then be extracted.

After the reverse reaction template of a reaction equation has been extracted based on RDChiral, the reaction equation and the reverse reaction template are each hash-coded, yielding two new fields: the reaction equation hash code and the reverse reaction template hash code. The reaction equation hash code is used to eliminate repeated sample data in the initial template library. The reverse reaction template hash code is used, on the one hand, to eliminate sample data whose reverse reaction template occurs too few times in the initial template library and, on the other hand, is converted into a label vector that serves as the training target.
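As a minimal sketch of the hash-coding step, the snippet below derives a fixed-length key from a reaction equation and from a template string. The text does not name a hash function; SHA-256 from the Python standard library, the helper name `hash_code`, the field names of `record`, and the example reaction and template strings are all assumptions made for illustration.

```python
import hashlib


def hash_code(text: str) -> str:
    """Return a hexadecimal SHA-256 digest of the string (assumed hash)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical reaction equation written as "reactants>>product".
reaction = "CC(=O)O.OCC>>CC(=O)OCC"
# Hypothetical reverse template string (product pattern >> reactant patterns).
template = "[C:1](=[O:2])[O:3][C:4]>>[C:1](=[O:2])O.[O:3][C:4]"

reactants, product = reaction.split(">>")

# One row of the initial template library with its five fields.
record = {
    "reaction_hash": hash_code(reaction),
    "reactants": reactants,
    "product": product,
    "template": template,
    "template_hash": hash_code(template),
}
```

Identical strings always yield identical digests, which is what makes the two hash fields usable as keys for duplicate removal and for counting template occurrences.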

Step (3) firstly, data cleaning is carried out on the initial template library to obtain the initial template library without abnormal sample data, and the method specifically comprises the following steps:

(3-1) removing sample data with the product quantity larger than 1 in the SMILES field of the product of the initial template library;

(3-2) applying the reverse reaction template to the corresponding product based on RDChiral, and rejecting the corresponding sample data if the application is invalid;

and (3-3) obtaining an initial template base without abnormal sample data.

Then, further carrying out data preprocessing on the initial template library without abnormal sample data, wherein the data preprocessing comprises the following steps:

(3-4) removing sample data corresponding to reverse reaction templates with too few occurrences in the initial template library (in order to ensure the generalization capability of the training model, a threshold value is generally set to be 3);

and (3-5) removing repeated sample data in the initial template library according to the Hash coding of the reaction equation.
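The cleaning and preprocessing rules above can be sketched as a single filter over a list of records. This is an illustrative sketch, not the patented implementation: the record field names are hypothetical, and the RDChiral validity check of step (3-2) is replaced by a caller-supplied function `template_is_valid` so that the example runs with the standard library alone.

```python
from collections import Counter


def clean_library(records, template_is_valid, min_count=3):
    """Apply the cleaning rules (3-1), (3-2), (3-4) and (3-5) in order.

    `template_is_valid(record)` stands in for applying the template to the
    product with RDChiral and checking that precursors are produced.
    """
    # (3-1) keep records whose product field holds a single molecule;
    # in SMILES, "." separates distinct molecules.
    kept = [r for r in records if "." not in r["product"]]
    # (3-2) drop records whose template cannot be applied to the product.
    kept = [r for r in kept if template_is_valid(r)]
    # (3-4) drop records whose template occurs fewer than `min_count` times.
    counts = Counter(r["template_hash"] for r in kept)
    kept = [r for r in kept if counts[r["template_hash"]] >= min_count]
    # (3-5) drop duplicates, keeping the first record per reaction hash.
    seen, unique = set(), []
    for r in kept:
        if r["reaction_hash"] not in seen:
            seen.add(r["reaction_hash"])
            unique.append(r)
    return unique


records = [
    {"reaction_hash": "r1", "product": "CCO", "template_hash": "t1"},
    {"reaction_hash": "r2", "product": "CCN", "template_hash": "t1"},
    {"reaction_hash": "r3", "product": "CCC", "template_hash": "t1"},
    {"reaction_hash": "r3", "product": "CCC", "template_hash": "t1"},   # duplicate
    {"reaction_hash": "r4", "product": "CC.O", "template_hash": "t1"},  # two products
    {"reaction_hash": "r5", "product": "CCCl", "template_hash": "t2"},  # rare template
]
cleaned = clean_library(records, template_is_valid=lambda r: True)
```

The sketch follows the order in which the steps are listed, so duplicates still count toward a template's frequency in step (3-4); deduplicating first would be an equally plausible reading.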

After this further data preprocessing of the initial template library without abnormal sample data, the reverse reaction template hash codes are converted into label vectors through the LabelBinarizer label binarization method of the scikit-learn library, and the label vectors form a label vector sample set. The product SMILES are converted by the Morgan algorithm of RDKit into ECFP fingerprints with a radius of 2 and a length of 2048, and the fingerprint vectors form a fingerprint vector sample set. The label vector sample set and the fingerprint vector sample set are combined into a sample data set; because the random seed is fixed, the label vectors and fingerprint vectors in the sample data set remain in one-to-one correspondence. The sample data set is then divided into a training set, a verification set and a test set in the proportions 90%, 5% and 5%. A reverse reaction template library is constructed from the non-repeating reverse reaction templates.
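The label encoding and the 90/5/5 division can be illustrated with the standard library alone. In the sketch below, `one_hot_labels` is a simplified stand-in for scikit-learn's LabelBinarizer, the fingerprints are placeholder lists rather than 2048-bit RDKit vectors, and the function names and the seed value 42 are assumptions. Pairing is preserved here by zipping the two sample sets before a seeded shuffle.

```python
import random


def one_hot_labels(template_hashes):
    """Map each template hash to a one-hot vector over the sorted unique hashes."""
    classes = sorted(set(template_hashes))
    index = {h: i for i, h in enumerate(classes)}
    vectors = []
    for h in template_hashes:
        vec = [0] * len(classes)
        vec[index[h]] = 1
        vectors.append(vec)
    return classes, vectors


def split_dataset(fingerprints, labels, seed=42, percents=(90, 5, 5)):
    """Shuffle paired samples with a fixed seed and split them by percentage."""
    pairs = list(zip(fingerprints, labels))
    random.Random(seed).shuffle(pairs)
    n_train = len(pairs) * percents[0] // 100
    n_valid = len(pairs) * percents[1] // 100
    train = pairs[:n_train]
    valid = pairs[n_train:n_train + n_valid]
    test = pairs[n_train + n_valid:]
    return train, valid, test


hashes = ["t%d" % (i % 4) for i in range(100)]
fingerprints = [[i] for i in range(100)]  # placeholders for 2048-bit vectors
classes, labels = one_hot_labels(hashes)
train, valid, test = split_dataset(fingerprints, labels)
```

The list `classes` doubles as the non-repeating template library index: position k of a label vector corresponds to `classes[k]`.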

In this embodiment, the template prediction model is a single-hidden-layer fully connected neural network model constructed based on the Keras deep learning framework. The number of neurons in the input layer is set to the length of the fingerprint vector, and the number of neurons in the output layer is set to the number of non-repeating reverse reaction templates in the training set. The activation functions of the hidden layer and the output layer are set to ELU and Softmax respectively, and the loss function is set to the cross-entropy loss. The template prediction model is then trained to obtain the final template prediction model. During model training, Dropout and L2 regularization are used to prevent overfitting, the number of neurons in the hidden layer is set to 512, the Adam optimizer is used, and the initial learning rate is set to 0.001. The neural network model is trained on the training set with these hyper-parameter settings. The verification set is used to monitor the training of the model and prevent overfitting. The test set is used to verify the generalization ability of the model after training is completed.
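To make the layer structure concrete, the following sketch computes one forward pass (fingerprint, ELU hidden layer, Softmax output) in plain Python. It illustrates inference only: the weights are random rather than trained, the layer sizes are shrunk so the example runs instantly, and training with cross-entropy, Dropout, L2 regularization and Adam is omitted. All function and variable names are hypothetical; an actual model would be built with Keras as described.

```python
import math
import random


def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(fingerprint, w_hidden, b_hidden, w_out, b_out):
    """Fingerprint -> ELU hidden layer -> softmax over templates."""
    hidden = [elu(h) for h in dense(fingerprint, w_hidden, b_hidden)]
    return softmax(dense(hidden, w_out, b_out))


# Toy sizes; the embodiment uses 2048 inputs and 512 hidden neurons.
rng = random.Random(0)
n_in, n_hidden, n_templates = 16, 8, 5
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_templates)]
b_out = [0.0] * n_templates
fingerprint = [rng.randint(0, 1) for _ in range(n_in)]
probabilities = forward(fingerprint, w_hidden, b_hidden, w_out, b_out)
```

Each entry of `probabilities` is the predicted probability of one template in the reverse reaction template library, which is why the output layer has as many neurons as there are non-repeating templates.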

In step S100, the selection stage selects, as the leaf node to be expanded, the leaf node that is most promising for further development. Each iteration starts from the root node of the tree, the UCB score of each node is calculated, and the leaf node with the highest UCB score is selected for further expansion. The calculation formula of the UCB score is as follows:

wherein Q represents the sum of the previous step values, N represents the number of times the current child node has been traversed, N_{-1} represents the number of times the parent node of the current node has been traversed, and C represents a hyper-parameter for balancing exploration and exploitation, with a default value of 1.4.
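The formula itself is not reproduced in this text. As a hedged illustration, the sketch below assumes the standard UCB1 form, Q/N + C * sqrt(ln(N_{-1}) / N), which is consistent with the symbol definitions above; the original formula may differ, for example by a constant factor under the square root. The function names and the example statistics are hypothetical.

```python
import math


def ucb_score(q, n, n_parent, c=1.4):
    """Assumed UCB1 form: mean value plus an exploration bonus."""
    return q / n + c * math.sqrt(math.log(n_parent) / n)


def select_child(children, n_parent, c=1.4):
    """Return the index of the child with the highest UCB score."""
    scores = [ucb_score(ch["Q"], ch["N"], n_parent, c) for ch in children]
    return max(range(len(children)), key=scores.__getitem__)


children = [
    {"Q": 4.0, "N": 8},  # well explored, mean value 0.5
    {"Q": 0.5, "N": 1},  # freshly initialized (Q = 0.5, N = 1)
    {"Q": 1.8, "N": 2},  # few visits, mean value 0.9
]
best = select_child(children, n_parent=11)
greedy = select_child(children, n_parent=11, c=0.0)
```

With C = 1.4 the barely visited child wins because its exploration bonus dominates, whereas with C = 0 the choice falls back to the highest mean value; this is the balance that C is meant to control.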

In step S200, in the expansion stage, for each molecule of the selected leaf node to be expanded, precursor molecules are generated through the reaction templates given by the template prediction model, and leaf nodes are created. Specifically, the following operations are performed:

(1) obtaining a series of reverse reaction templates for each molecule of the leaf node to be expanded according to the template prediction model;

(2) obtaining the precursor molecules corresponding to each reverse reaction template based on RDKit and creating leaf nodes.

Specifically, of all the templates obtained for each molecule, only the 50 highest-scoring templates are retained, or templates are retained until their cumulative probability reaches 0.995. The reaction templates of all molecules are stored in the parent node, and the associated Q and N values are initialized to 0.5 and 1, respectively.
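One way to read the retention rule is that templates are taken in order of decreasing predicted probability until either 50 have been kept or their cumulative probability reaches 0.995, whichever happens first. The sketch below implements that reading; the function name and the example probabilities are hypothetical.

```python
def filter_templates(probabilities, max_templates=50, cumulative_cutoff=0.995):
    """Return indices of the templates to keep, most probable first."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        # Stop once either limit has been reached.
        if len(kept) >= max_templates or cumulative >= cumulative_cutoff:
            break
        kept.append(i)
        cumulative += probabilities[i]
    return kept


probs = [0.01, 0.90, 0.004, 0.08, 0.006]
kept = filter_templates(probs)
top_two = filter_templates(probs, max_templates=2)
```

In the example, the first four templates by probability already cover 0.996 of the probability mass, so the fifth (index 2) is discarded.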

In step S300, selection and expansion are carried out repeatedly, starting from a leaf node that has not been visited, until a stop condition is met and a termination node is reached; in these operations, the same template prediction model is used to predict the corresponding reverse reaction template in the simulation stage and the expansion stage.
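The stop conditions can be expressed as a small predicate. The sketch assumes that any one of the three conditions ends the simulation, that the commercially available compound library is a set of SMILES strings compared by exact match (in practice the strings would need to be canonicalized first), and that the maximum depth is a user-chosen value; the function and parameter names are hypothetical.

```python
def is_terminal(molecules, stock, depth, max_depth, template_applied):
    """Return True if any of the three stop conditions holds.

    molecules        -- SMILES strings held by the node
    stock            -- set of SMILES of commercially available compounds
    depth            -- depth of the node in the tree (root = 0)
    max_depth        -- maximum allowed tree depth (assumed setting)
    template_applied -- False if the last template produced no precursors
    """
    all_in_stock = all(m in stock for m in molecules)
    too_deep = depth >= max_depth
    return all_in_stock or too_deep or not template_applied


stock = {"CC(=O)O", "CCO"}
solved = is_terminal(["CC(=O)O", "CCO"], stock, 1, 6, True)
unsolved = is_terminal(["CC(=O)OCC"], stock, 0, 6, True)
```

A node whose molecules are all purchasable corresponds to a completed route, while the other two conditions end a branch that cannot or should not be explored further.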

In step S400, in the backtracking stage, the Q value and the N value of each node on the backtracking path are updated from the leaf node to be expanded from bottom to top until the root node is reached, including the following steps:

(1) obtaining a value evaluation value of the termination node according to the value updating function;

(2) and accumulating the value evaluation value once for the Q value of each node on the backtracking path, and adding 1 to the N value.
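The two update operations can be sketched with a minimal node class. The initial statistics Q = 0.5 and N = 1 follow the values given for the expansion stage; the class name, the function name and the reward value 0.8 are placeholders, since the reward would come from the value updating function.

```python
class Node:
    """Minimal tree node holding the statistics used by the search."""

    def __init__(self, parent=None, q=0.5, n=1):
        self.parent = parent
        self.Q = q  # accumulated value
        self.N = n  # visit count


def backpropagate(node, reward):
    """Add `reward` to Q and 1 to N for `node` and each of its ancestors."""
    while node is not None:
        node.Q += reward
        node.N += 1
        node = node.parent


root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backpropagate(leaf, reward=0.8)
```

Because every node on the path receives the same reward, routes that repeatedly end in favorable termination nodes accumulate a higher Q/N ratio and are preferred in later selection stages.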

The calculation formula of the value updating function is as follows:

In a specific implementation, the template prediction model may instead be constructed and trained according to the prior art, taking products as input and predicting and outputting the reverse reaction template. The construction and training of the template prediction model described in this embodiment is one option.

Example 2:

the invention relates to a reverse synthesis derivation method based on a Monte Carlo tree, which comprises the following steps:

S100, acquiring reaction equations to construct an initial data set, wherein each reaction equation comprises a reactant SMILES expression and a product SMILES expression;

S200, extracting reverse reaction templates from the initial data set by the RDChiral method, hash-coding the reaction equation and the reverse reaction template respectively, and constructing an initial template library from the reaction equation hash code, the reactant SMILES expression, the product SMILES expression, the reverse reaction template and the reverse reaction template hash code;

S300, after data cleaning is carried out on the initial template library, converting the reverse reaction template hash code into a label vector and the product SMILES expression into a fingerprint vector, and dividing the data set on the basis of the fingerprint vectors and label vectors to obtain a training set, a verification set and a test set, each of which comprises label vectors and fingerprint vectors in one-to-one correspondence; and constructing a reverse reaction template library from the non-repeating reverse reaction templates;

S400, constructing a single-hidden-layer fully connected neural network model based on the Keras deep learning framework to serve as the template prediction model, wherein the template prediction model takes a product as input and predicts and outputs a reverse reaction template;

S500, training the template prediction model with the training set as input to optimize the parameters of the template prediction model, monitoring the training process with the verification set as input to prevent overfitting and obtain a trained template prediction model, and testing the trained template prediction model on the test set to obtain the final template prediction model;

S600, by the reactant derivation method based on the Monte Carlo tree disclosed in embodiment 1, constructing a Monte Carlo tree with the SMILES expression of the target compound as the root node, predicting, based on the Monte Carlo tree search method, the reverse reaction template corresponding to the molecules in each node of the Monte Carlo tree with the template prediction model, and obtaining the preceding-stage reactants corresponding to the reverse reaction template.

In step S200, before reverse reaction templates are extracted from the initial data set by the RDChiral method, atom mapping is performed on the reaction equation by RXNMapper, and the atoms of the reactants and products are labeled with atom-map numbers so that the reverse reaction templates can then be extracted.

Finally, an initial template library consisting of five fields is obtained: reaction equation hash code, reactant SMILES, product SMILES, reverse reaction template and reverse reaction template hash code.

In step S300, firstly, data cleaning is performed on the initial template library to obtain an initial template library without abnormal sample data, which specifically includes:

(1) removing sample data with the product quantity larger than 1 in the SMILES field of the initial template library product;

(2) applying the reverse reaction template to the corresponding product based on RDChiral, and removing the corresponding sample data if the application is invalid;

(3) obtaining an initial template library without abnormal sample data.

Then, data preprocessing is further carried out on the initial template library without abnormal sample data, comprising the following steps:

(4) removing sample data corresponding to reverse reaction templates with too few occurrences in the initial template library (in order to ensure the generalization capability of the trained model, the threshold value is generally set to 3);

(5) removing repeated sample data in the initial template library according to the reaction equation hash code.

After this further data preprocessing of the initial template library without abnormal sample data, the reverse reaction template is converted into a label vector through the LabelBinarizer label binarization method of the scikit-learn library. The product SMILES are converted by the Morgan algorithm of RDKit into ECFP fingerprints with a radius of 2 and a length of 2048, and the fingerprint vectors form a fingerprint vector sample set. The label vectors and the fingerprint vector sample set are combined into a sample data set; because the random seed is fixed, the label vectors and fingerprint vectors in the sample data set remain in one-to-one correspondence. The sample data set is then divided into a training set, a verification set and a test set in the proportions 90%, 5% and 5%. A reverse reaction template library is constructed from the non-repeating reverse reaction templates.

In step S400, a single-hidden-layer fully connected neural network model is constructed based on the Keras deep learning framework. In this embodiment, the number of neurons in the input layer is set to the fingerprint vector length, and the number of neurons in the output layer is set to the number of non-repeating reverse reaction templates in the training set. The activation functions of the hidden layer and the output layer are set to ELU and Softmax respectively, and the loss function is the cross-entropy loss function. The template prediction model is then trained in step S500 to obtain the final template prediction model.
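To make the network structure concrete, the forward pass of such a model can be written out in plain Python. This is only an illustrative sketch, not the Keras model of the embodiment: the layer sizes are shrunk (4 inputs standing in for the fingerprint length, 3 hidden neurons, 2 templates) and all weights are made-up values.

```python
import math
from typing import List


def elu(x: float, alpha: float = 1.0) -> float:
    # ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def softmax(z: List[float]) -> List[float]:
    m = max(z)                        # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    # W holds one row of weights per output neuron.
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def forward(x, W1, b1, W2, b2) -> List[float]:
    hidden = [elu(v) for v in dense(x, W1, b1)]   # hidden layer with ELU
    return softmax(dense(hidden, W2, b2))          # one probability per template


def cross_entropy(probs: List[float], one_hot: List[int]) -> float:
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


x = [1, 0, 1, 0]                                   # toy fingerprint bits
W1 = [[0.5, -0.2, 0.1, 0.0], [-0.3, 0.8, -0.5, 0.2], [0.1, 0.1, 0.1, 0.1]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, -1.0, 0.5], [-0.5, 0.3, 0.2]]
b2 = [0.0, 0.0]

probs = forward(x, W1, b1, W2, b2)
loss = cross_entropy(probs, [1, 0])                # true template is index 0
```

The output is a probability distribution over the template classes, which is what allows the highest-scoring reverse reaction templates to be picked for a given product.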

In the template prediction model training of step S500, Dropout and L2 regularization are used to prevent overfitting, the number of hidden layer neurons is set to 512, and the Adam optimizer is used with an initial learning rate of 0.001. The neural network model is trained on the training set with these hyperparameter settings. The verification set is used to monitor the training of the model and prevent overfitting. The test set is used to verify the generalization ability of the model after training is completed.
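As an illustration of how the Adam optimizer and L2 regularization act on a single weight, one update step can be sketched as follows. Only the learning rate of 0.001 comes from the text; the values of `beta1`, `beta2` and `eps` are commonly used defaults assumed here, and the L2 coefficient `l2` is a hypothetical value.

```python
import math
from typing import Tuple


def adam_step(
    w: float, grad: float, m: float, v: float, t: int,
    lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
    eps: float = 1e-7, l2: float = 0.0,
) -> Tuple[float, float, float]:
    """One Adam update for a single weight with an L2 penalty l2 * w**2."""
    g = grad + 2.0 * l2 * w                 # gradient of loss plus L2 term
    m = beta1 * m + (1.0 - beta1) * g       # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)          # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


w0 = 1.0
w1, m1, v1 = adam_step(w0, grad=0.5, m=0.0, v=0.0, t=1, l2=1e-4)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly the learning rate, which is why the initial learning rate directly sets the early step size.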

After the final template prediction model is obtained, step S600 is executed: the target product is taken as input, the corresponding reverse reaction template is predicted and output by the final template prediction model, a Monte Carlo tree is constructed based on the product SMILES expression, and the preceding-stage reactant corresponding to the reverse reaction template is obtained through Monte Carlo tree search.
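The tree search of step S600 can be sketched with a small node structure that stores the Q and N values. The UCB expression used here is the standard UCB1 form with an assumed exploration constant `c`, and the value update is a plain additive one; both are assumptions, since the exact formulas of the method are not reproduced in this passage. The precursor candidates, which in the method come from the predicted reverse reaction templates, are passed in directly.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    smiles: str                                   # molecule(s) held by this node
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    Q: float = 0.0                                # sum of backed-up values
    N: int = 0                                    # number of visits


def ucb(child: Node, c: float = 1.4) -> float:
    # Standard UCB1 (an assumption): unvisited children are tried first.
    if child.N == 0:
        return math.inf
    return child.Q / child.N + c * math.sqrt(math.log(child.parent.N) / child.N)


def select(node: Node) -> Node:
    # Selection: descend to a leaf, taking the best-scoring child each time.
    while node.children:
        node = max(node.children, key=ucb)
    return node


def expand(node: Node, precursor_sets: List[str]) -> None:
    # Expansion: one child per candidate set of preceding-stage reactants.
    for smi in precursor_sets:
        node.children.append(Node(smi, parent=node))


def backpropagate(node: Optional[Node], value: float) -> None:
    # Backtracking: update Q and N from the leaf up to the root.
    while node is not None:
        node.N += 1
        node.Q += value
        node = node.parent


root = Node("CCOC(C)=O")                          # target compound as root node
expand(root, ["CCO.CC(=O)O", "CCO.CC(=O)Cl"])    # hypothetical candidates
a, b = root.children
backpropagate(a, 1.0)                             # made-up simulation values
backpropagate(b, 0.0)
best = select(root)
```

With both children visited once, the exploration terms are equal, so the child with the larger accumulated value is selected.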

By using a deep learning algorithm and the Monte Carlo tree search algorithm to carry out reverse synthesis analysis, the method can effectively improve prediction accuracy and broaden the field of application.

While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed. It will be apparent to those skilled in the art that the means in the various embodiments described above may be combined to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims

1. The reactant derivation method based on the Monte Carlo tree is characterized in that a target compound SMILES expression is used as a root node to construct the Monte Carlo tree, a template prediction model is used for predicting an inverse reaction template corresponding to molecules in each node of the Monte Carlo tree based on a Monte Carlo tree search method, and a preceding stage reactant corresponding to the inverse reaction template is obtained at the same time, wherein the template prediction model is a neural network model which takes a product in a chemical equation as input and the inverse reaction template as output;

the method comprises the following steps:

wherein, the calculation formula of the UCB score is as follows:

where Q represents the sum of the values of the previous steps, and N represents the number of times the current child node has been traversed.

2. The method of Monte Carlo tree-based reactant derivation according to claim 1, wherein the same template prediction model is used to predict the corresponding inverse reaction template during the simulation phase and the expansion phase;

the template prediction model is obtained through the following steps:

3. The method of Monte Carlo tree-based reactant derivation according to claim 2, wherein the initial template library is data-cleaned comprising the steps of:

4. The method for reactant derivation according to any of claims 1-3, wherein obtaining the corresponding precursor molecule for each reverse reaction template and creating leaf nodes based on RDKit comprises the steps of:

5. The method of any one of claims 1-3, wherein the Q and N values of each node in the backtracking path are updated from the leaf node to be expanded from bottom to top until the root node is reached, comprising the steps of:

the calculation formula of the value updating function is as follows:

6. The reverse synthesis derivation method based on the Monte Carlo tree is characterized by comprising the following steps:

by the Monte Carlo tree-based reactant derivation method as claimed in any one of claims 1-5, constructing the Monte Carlo tree with the target compound SMILES expression as the root node, and predicting the reverse reaction template corresponding to the molecule in each node of the Monte Carlo tree by using the template prediction model based on the Monte Carlo tree search method, and obtaining the previous stage reactant corresponding to the reverse reaction template.

7. The method of claim 6, wherein, before reverse reaction template extraction is performed on the initial data set by the RDChiral method, atom mapping is performed on the reaction equations by RXNMapper so that the atoms of the reactants and products are labeled with atom-map numbers.

8. The method of claim 6, wherein the initial template library is data-cleaned, comprising the steps of:

9. The method of claim 6, wherein the reverse reaction template hash code is converted into a label vector by the LabelBinarizer label binarization method of the scikit-learn library; the product SMILES expression is converted into a fingerprint vector by the Morgan algorithm of RDKit.

10. The method of Monte Carlo tree based inverse synthetic derivation according to claims 6, 7, 8 or 9, wherein the template prediction model comprises:

a hidden layer configured with an activation function ELU;