CN116578934B

CN116578934B - Inverse synthetic analysis method and device based on Monte Carlo tree search

Info

Publication number: CN116578934B
Application number: CN202310854663.9A
Authority: CN
Inventors: 却立勇; 李中伟; 柳彦宏
Original assignee: Yantai Guogong Intelligent Technology Co ltd
Current assignee: Yantai Guogong Intelligent Technology Co ltd
Priority date: 2023-07-13
Filing date: 2023-07-13
Publication date: 2023-09-19
Anticipated expiration: 2043-07-13
Also published as: CN116578934A

Abstract

The application discloses an inverse synthetic analysis method and device based on Monte Carlo tree search. The method comprises: obtaining a reverse reaction template library comprising a plurality of fields; training a plurality of required models based on the reverse reaction template library; for a single preferential reverse reaction template, obtaining a similarity score for its reverse reaction template SMARTS from the similarity between the product SMILES and reactant SMILES corresponding to that template and the product SMILES to be decomposed; and, in a Monte Carlo tree search, selecting a leaf node from the root node according to the similarity score, expanding the selected leaf node, simulating until a stop condition is met and a termination node is reached, and backtracking to the root node. By improving the Monte Carlo tree search algorithm in this way, the probability of generating a route that conforms to the user's synthetic preferences is increased, and the method can be applied to inverse synthetic analysis of a wider range of compounds.

Description

Inverse synthetic analysis method and device based on Monte Carlo tree search

Technical Field

The application relates to the field of computers, in particular to an inverse synthetic analysis method and device based on Monte Carlo tree search.

Background

Inverse synthetic analysis (retrosynthetic analysis) is a method for planning the synthesis of a given compound by decomposing the target compound step by step into intermediates or simpler reactants until commercially available building blocks are reached.

Inverse synthetic analysis has traditionally been implemented by expert systems based on manually coded rules, which results in a narrow range of application and low accuracy.

Disclosure of Invention

In order to solve the above problems, the present application provides an inverse synthetic analysis method based on Monte Carlo tree search, comprising:

acquiring a reverse reaction template library, wherein the reverse reaction template library comprises a plurality of fields, the fields comprising at least reactant SMILES, product SMILES, a reverse reaction template SMARTS, and a flag indicating whether the template is preferential;

training a plurality of required models based on the reverse reaction template library, wherein the required models comprise a template correlation model, a template applicability model and a reaction rationality model;

for a single preferential reverse reaction template, obtaining a similarity score for the reverse reaction template SMARTS based on the similarity between the product SMILES and reactant SMILES corresponding to that template and the product SMILES to be decomposed;

and, in a Monte Carlo tree search, selecting a leaf node from the root node according to the similarity score, expanding the selected leaf node, simulating until a stop condition is met and a termination node is reached, and backtracking to the root node, to obtain an inverse synthetic analysis path that conforms to the user's preferences.

In another aspect, the application also provides an inverse synthetic analysis device based on Monte Carlo tree search, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the inverse synthetic analysis method described above.

In another aspect, the present application also provides a non-volatile computer storage medium storing computer-executable instructions configured to perform the inverse synthetic analysis method described above.

The inverse synthetic analysis method based on Monte Carlo tree search provided by the application can bring the following beneficial effects:

Through the plurality of required models, the obtained similarity scores and the improved Monte Carlo tree search algorithm, the probability of generating a route that conforms to the user's synthetic preferences is effectively increased, and the method can be applied to inverse synthetic analysis of a wider range of compounds.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a flow chart of an inverse synthetic analysis method based on Monte Carlo tree search in an embodiment of the application;

FIG. 2 is a schematic diagram of a template correlation model in an embodiment of the present application;

FIG. 3 is a schematic diagram of a template applicability model in an embodiment of the present application;

FIG. 4 is a schematic diagram of a reaction rationality model in an embodiment of the present application;

FIG. 5 is a schematic diagram of an inverse synthetic analysis device based on Monte Carlo tree search in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.

As shown in FIG. 1, an embodiment of the present application provides an inverse synthetic analysis method based on Monte Carlo tree search, including:

S101: Obtain a reverse reaction template library, wherein the reverse reaction template library comprises a plurality of fields, the fields comprising at least reactant SMILES, product SMILES, a reverse reaction template SMARTS, and a flag indicating whether the template is preferential.

Generating inverse synthetic routes with AI algorithms such as Monte Carlo tree search has attracted attention. Although such algorithms can generate a synthetic route for a specified target compound quickly and accurately, in industrial deployment they have difficulty producing routes that satisfy the user and conform to the user's synthetic preferences.

SMILES is a linear (line-notation) representation of chemical structure, and SMARTS is a SMILES-based pattern notation that is used here to express reaction templates. Whether a template is preferential can be labeled manually in advance; a reverse reaction template labeled "yes" is called a preferential reverse reaction template.
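The four fields can be pictured as one record per known reaction. The following is a minimal sketch of such a record; the class name `TemplateRecord`, its field names and the two amide-disconnection entries are illustrative assumptions and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateRecord:
    """One row of a reverse reaction template library (hypothetical layout)."""
    reactant_smiles: str  # reactants, separated by '.'
    product_smiles: str   # product of the recorded reaction
    retro_smarts: str     # reverse reaction template, written product >> reactants
    preferential: bool    # manually assigned "yes"/"no" flag


# Two illustrative records that share a product but use different templates.
library = [
    TemplateRecord(
        reactant_smiles="CC(=O)O.NC",
        product_smiles="CC(=O)NC",
        retro_smarts="[C:1](=[O:2])[N:3]>>[C:1](=[O:2])O.[N:3]",
        preferential=True,
    ),
    TemplateRecord(
        reactant_smiles="CC(=O)Cl.NC",
        product_smiles="CC(=O)NC",
        retro_smarts="[C:1](=[O:2])[N:3]>>[C:1](=[O:2])Cl.[N:3]",
        preferential=False,
    ),
]

# The unique templates define the output classes of the models;
# the preferential subset is the one that later receives a similarity score.
unique_templates = sorted({r.retro_smarts for r in library})
preferential_templates = {r.retro_smarts for r in library if r.preferential}
```

Keeping the preference flag on the record, rather than in a separate list, makes it straightforward to group all recorded reactions that belong to one preferential template.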

S102: training and generating a plurality of required models based on the reverse reaction template library, wherein the required models comprise a template correlation model, a template applicability model and a reaction rationality model.

Specifically, for the template correlation model, a first neural network model is built (for example, a fully connected multi-class classification neural network with multiple hidden layers).

The first neural network model is trained on the reverse reaction template library to obtain the template correlation model. The input of the template correlation model is a molecular vector representation of the product SMILES, and the output is a probability distribution over the unique (non-repeated) reverse reaction template SMARTS in the reverse reaction template library. For example, the molecular vector representation may be a Morgan fingerprint of length 2048 and radius 2 computed with RDKit, and the first neural network model may be built with the Keras deep learning framework. As shown in FIG. 2, the input layer has 2048 neurons; it is followed by a Dense layer with 512 neurons and ReLU activation and a Dropout layer with its parameter set to 0.4; this Dense-plus-Dropout pair is repeated once; a Dense layer with 128 neurons and ELU activation follows; and a final Dense output layer has as many neurons as there are unique reverse reaction templates. The loss function is the cross-entropy loss, computed on the output after a Softmax function.

For the template applicability model, corresponding label vectors are generated first. For each product SMILES, every reverse reaction template SMARTS is applied to it: if precursor reactants can be generated, the label for that template is 1; otherwise it is 0. This yields a label vector consisting of 0s and 1s.
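The labelling rule can be sketched as follows. The function `applicability_labels` and the callable `apply_template` are hypothetical names; in practice the callable would wrap a cheminformatics toolkit, whereas the toy stand-in below only checks for a substring so that the example runs without external libraries.

```python
from typing import Callable, List, Sequence


def applicability_labels(
    product_smiles: str,
    templates: Sequence[str],
    apply_template: Callable[[str, str], List[str]],
) -> List[int]:
    """Return one label per template: 1 if applying it yields precursors, else 0."""
    return [1 if apply_template(product_smiles, t) else 0 for t in templates]


def toy_apply(product: str, template: str) -> List[str]:
    """Toy stand-in (assumption): a 'template' matches if it occurs as a substring."""
    if template in product:
        return [product.replace(template, "", 1)]
    return []


# One label vector for one product, aligned with the template order.
labels = applicability_labels("CC(=O)NC", ["C(=O)N", "Br", "CC"], toy_apply)
```

Each product contributes one such vector, so the training target of the second model is a matrix with one row per product and one column per unique template.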

A second neural network model is built (for example, a fully connected multi-label classification neural network with multiple hidden layers).

The second neural network model is trained in a supervised manner on the reverse reaction template library using the label vectors to obtain the template applicability model. The input of the template applicability model is a molecular vector representation of the product SMILES, and the output is the probability that each unique reverse reaction template SMARTS in the reverse reaction template library is applicable. For example, the molecular vector representation of the product SMILES may be a Morgan fingerprint of length 2048 and radius 2 computed with RDKit, and the second neural network model may be built with the Keras deep learning framework. As shown in FIG. 3, the input layer has 2048 neurons; it is followed by a Dense layer with 512 neurons and ReLU activation, a Dropout layer with its parameter set to 0.4, a Dense layer with 128 neurons and ELU activation, and a final Dense output layer with as many neurons as there are unique reverse reaction templates. The loss function is the binary cross-entropy loss, computed on the output after a Sigmoid function.
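The template correlation model and the template applicability model share an input and an output size but differ in their loss: Softmax with cross-entropy picks one template per product, while Sigmoid with binary cross-entropy scores every template independently. The sketch below writes both losses in plain Python for a single example; the function names are illustrative, and deep learning frameworks ship their own implementations, which are not reproduced here.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], target_index: int) -> float:
    """Multi-class loss: -log softmax(logits)[target_index], computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def sigmoid_binary_cross_entropy(
    logits: Sequence[float], labels: Sequence[int]
) -> float:
    """Multi-label loss: mean over templates of the per-template binary loss."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Stable form of -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

With all logits at zero, both losses reduce to log 2 per decision, which is a convenient sanity check before training.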

For the reaction rationality model, corresponding labels likewise need to be generated. The reverse reaction template SMARTS in the reverse reaction template library are first converted into forward reaction template SMARTS. Each reactant SMILES is then applied to a randomly selected forward reaction template SMARTS. If the generated product SMILES differs from the product SMILES recorded for that reactant SMILES, the generated product SMILES and the reactant SMILES form an unreasonable reaction, with label 0. The reactions recorded in the reverse reaction template library are reasonable reactions, with label 1 (that is, in contrast to the unreasonable case, a generated product SMILES identical to the recorded product SMILES is a reasonable reaction). This yields a label vector consisting of 0s and 1s.

A third neural network model is built (for example, a fully connected binary classification neural network with multiple hidden layers).

The third neural network model is trained in a supervised manner on the reverse reaction template library using the label vector to obtain the reaction rationality model. The input of the reaction rationality model is a molecular vector representation of the product SMILES and of the reaction SMILES, and the output is the probability that the reaction is reasonable. For example, the product SMILES may be represented by a Morgan fingerprint of length 2048 and radius 2 computed with RDKit, and the reaction SMILES by the difference between the product Morgan fingerprint and the sum of the reactant Morgan fingerprints. The third neural network model may be built with the Keras deep learning framework. As shown in FIG. 4, the input layer has two heads, one receiving the product Morgan fingerprint (product Morgan FP) and one receiving the reaction Morgan fingerprint (reaction Morgan FP), each with 2048 neurons. On the product-fingerprint side, a Dense layer with 512 neurons and ReLU activation is followed by a Dropout layer; on the reaction-fingerprint side, a Dense layer with 512 neurons and ReLU activation is used. The two branch outputs are fed to a cosine similarity layer (cosine), followed by a Dense output layer with 1 neuron. The loss function is the binary cross-entropy loss, computed on the output after a Sigmoid function.
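The two numerical ingredients of this model, the reaction fingerprint and the cosine similarity, can be sketched in plain Python. The sketch assumes the reaction fingerprint is the product fingerprint minus the sum of the reactant fingerprints, uses 4-bit toy vectors instead of 2048-bit Morgan fingerprints, and applies the cosine function directly to example vectors, whereas in the network it acts on the 512-dimensional outputs of the two branches. All names are illustrative.

```python
import math
from typing import List, Sequence


def reaction_fingerprint(
    product_fp: Sequence[int], reactant_fps: Sequence[Sequence[int]]
) -> List[int]:
    """Product fingerprint minus the element-wise sum of reactant fingerprints."""
    summed = [sum(bits) for bits in zip(*reactant_fps)]
    return [p - r for p, r in zip(product_fp, summed)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy 4-bit fingerprints standing in for 2048-bit Morgan fingerprints.
product_fp = [1, 1, 0, 1]
reactant_fps = [[1, 0, 0, 0], [0, 0, 1, 1]]
rxn_fp = reaction_fingerprint(product_fp, reactant_fps)
```

Positive entries of the difference mark features that appear only in the product and negative entries mark features that disappear, so the vector summarizes what the reaction changes.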

S103: For a single preferential reverse reaction template, obtain a similarity score for the reverse reaction template SMARTS based on the similarity between the product SMILES and reactant SMILES corresponding to that template and the product SMILES to be decomposed.

Specifically, the product SMILES to be decomposed is converted into a molecular vector representation, giving the to-be-decomposed product vector. Unless otherwise stated, molecular vectorization in the embodiments of the present application uses a Morgan fingerprint of length 2048 and radius 2 computed with RDKit.

In the reverse reaction template library, a preferential reverse reaction template may correspond to several product SMILES. For a given product SMILES of the template and its corresponding reactant SMILES, molecular vectorization gives a product vector and a reactant vector; when the template corresponds to several product SMILES, a product vector and a reactant vector are obtained for each. The preferential reverse reaction template is also applied to the product SMILES to be decomposed to generate reactant SMILES, whose molecular vectorization gives the generated-reactant vector.

The product similarity is the similarity between the to-be-decomposed product vector and the product vector. The reactant similarity is the similarity between the generated-reactant vector and the reactant vector.

The total similarity is obtained by multiplying the product similarity by the reactant similarity. Because a single preferential reverse reaction template may correspond to several product SMILES, several total similarities may be obtained; the similarity score of the preferential reverse reaction template is the maximum of all its total similarities. The similarity may be the Tanimoto similarity as implemented in RDKit.
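The scoring rule of S103 can be sketched as follows, assuming fingerprints are held as sets of on-bit indices so that the Tanimoto similarity is the size of the intersection divided by the size of the union. The function names, the `precedents` argument and the toy bit sets are illustrative assumptions; a real implementation would compute fingerprints and similarities with a cheminformatics toolkit.

```python
from typing import FrozenSet, Iterable, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two bit sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def template_similarity_score(
    target_product_fp: Fingerprint,
    generated_reactant_fp: Fingerprint,
    precedents: Iterable[Tuple[Fingerprint, Fingerprint]],
) -> float:
    """Maximum over recorded (product, reactant) pairs of
    product similarity multiplied by reactant similarity."""
    return max(
        (
            tanimoto(target_product_fp, product_fp)
            * tanimoto(generated_reactant_fp, reactant_fp)
            for product_fp, reactant_fp in precedents
        ),
        default=0.0,
    )


# Toy example: one preferential template with two recorded reactions.
target = frozenset({1, 2, 3, 4})
generated = frozenset({5, 6})
precedents = [
    (frozenset({1, 2, 3}), frozenset({5, 6})),     # 0.75 * 1.0
    (frozenset({1, 2, 3, 4}), frozenset({5, 7})),  # 1.0 * (1/3)
]
score = template_similarity_score(target, generated, precedents)
```

Multiplying the two similarities means a recorded reaction only scores highly when both the product and the generated reactants resemble it, and taking the maximum lets one close precedent be enough.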

S104: In a Monte Carlo tree search, select a leaf node from the root node according to the similarity score, expand the selected leaf node, simulate until a stop condition is met and a termination node is reached, and backtrack to the root node, to obtain an inverse synthetic analysis path that conforms to the user's preferences.

Specifically, the Monte Carlo tree search comprises four phases: selection, expansion, simulation, and backtracking.

In the selection phase, starting from the root node, the child node with the highest UCB score is selected recursively until a leaf node is reached. The UCB score is computed by fusing in the similarity score; the formula itself is not reproduced in this text. In that formula, UCB is the UCB score; Q is the sum of the values from previous steps; N is the number of times the current child node has been traversed; N_{-1} is the number of times the parent of the current node has been traversed; C is a hyperparameter balancing exploration and exploitation, with a default value of 1.4; U indicates whether the template is a preferential reverse reaction template (1 if so, 0 otherwise); and S is the similarity score of the preferential reverse reaction template.
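Since the formula is not available in this text, the following sketch shows only one plausible way to combine the listed symbols: the standard UCB1 expression Q/N + C·sqrt(ln(N_{-1})/N) with an additive bonus U·S. The additive form of the bonus is an assumption made for illustration, not the formula of the application, and the function and parameter names are hypothetical.

```python
import math


def ucb_score(
    q: float,
    n: int,
    n_parent: int,
    is_preferential: bool,
    similarity: float,
    c: float = 1.4,
) -> float:
    """Assumed form: Q/N + C*sqrt(ln(N_parent)/N) + U*S.

    The additive U*S term is an illustrative guess at how the similarity
    score might be fused into the UCB score.
    """
    if n == 0:
        return math.inf  # unvisited children are tried first
    exploitation = q / n
    exploration = c * math.sqrt(math.log(n_parent) / n)
    bonus = similarity if is_preferential else 0.0  # U * S with U in {0, 1}
    return exploitation + exploration + bonus


plain = ucb_score(q=2.0, n=4, n_parent=16, is_preferential=False, similarity=0.8)
preferred = ucb_score(q=2.0, n=4, n_parent=16, is_preferential=True, similarity=0.8)
```

Under this assumed form, two children with identical visit statistics differ only by the similarity score of the preferential template, which is what steers selection towards preferred chemistry.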

In the expansion phase, for each molecule of the selected leaf node, precursor molecules are generated by applying the reverse reaction templates obtained by combining the template correlation model and the template applicability model, and leaf nodes are created after filtering with the reaction rationality model.

For each molecule of the selected leaf node, the molecule is input to the template correlation model and to the template applicability model, giving a first output (from the template correlation model) and a second output (from the template applicability model). The second output is filtered by a threshold and multiplied by the first output to obtain a plurality of reverse reaction templates. Each of these templates is applied to generate precursor molecules, which are filtered by the reaction rationality model before leaf nodes are created.

Of all the templates predicted for a molecule by combining the template correlation model and the template applicability model, only the 50 templates with the highest probability are retained, or templates are retained only until their cumulative probability reaches 0.995. The filtering thresholds of the template applicability model and the reaction rationality model are 0.1 and 0.05, respectively.
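A minimal sketch of this template selection is given below. It assumes that thresholding the applicability output produces a 0/1 mask that multiplies the correlation probabilities, and that the cumulative probability is accumulated over the retained correlation probabilities; both are interpretations of the text rather than confirmed details. The function name `select_templates` is hypothetical.

```python
from typing import List, Sequence


def select_templates(
    relevance: Sequence[float],
    applicability: Sequence[float],
    threshold: float = 0.1,
    max_templates: int = 50,
    cumulative_cutoff: float = 0.995,
) -> List[int]:
    """Return indices of retained templates, most probable first."""
    # Mask out templates whose applicability is below the threshold.
    scored = [
        (index, prob)
        for index, (prob, app) in enumerate(zip(relevance, applicability))
        if app >= threshold
    ]
    scored.sort(key=lambda item: item[1], reverse=True)

    kept: List[int] = []
    cumulative = 0.0
    for index, prob in scored:
        # Stop at 50 templates or once the retained mass reaches the cutoff.
        if len(kept) >= max_templates or cumulative >= cumulative_cutoff:
            break
        kept.append(index)
        cumulative += prob
    return kept


kept = select_templates(
    relevance=[0.6, 0.3, 0.0999, 0.0001],
    applicability=[0.9, 0.05, 0.8, 0.7],
)
```

In the example, the second template is dropped because its applicability of 0.05 is below the 0.1 threshold even though its correlation probability is high.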

In the simulation phase, for leaf nodes that have not been visited, selection and expansion are repeated until a stop condition is met and a termination node is reached, where expansion comprises generating precursor molecules and creating leaf nodes.

The template correlation model, the template applicability model and the reaction rationality model are kept unchanged while selection and expansion are repeated for unvisited leaf nodes until a stop condition is met and a termination node is reached. The stop conditions include: the generated precursor molecules exist in a preset general available compound library or preferential available compound library; the maximum depth of the Monte Carlo tree is reached; and the reverse reaction template is invalid.
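The stop test can be sketched as a single predicate. The sketch treats the three listed conditions as alternatives (any one ends the simulation) and requires every molecule of the node to be available before the first condition counts as met; both readings are assumptions, as are the function and parameter names.

```python
from typing import Collection, Sequence


def is_terminal(
    molecules: Sequence[str],
    depth: int,
    has_valid_template: bool,
    general_stock: Collection[str],
    priority_stock: Collection[str],
    max_depth: int,
) -> bool:
    """True if any stop condition holds for the current node."""
    all_available = all(
        m in general_stock or m in priority_stock for m in molecules
    )
    return all_available or depth >= max_depth or not has_valid_template


general = {"CC(=O)O", "NC"}
priority = {"CC(=O)Cl"}
```

For instance, a node holding `CC(=O)O` and `NC` is terminal at any depth because both molecules are in the general library, while a node holding an unlisted molecule keeps expanding until the depth limit is hit or no template applies.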

In the backtracking phase, starting from the current leaf node, the Q value and N value of each node on the backtracking path are updated from bottom to top until the root node is reached. Here the Q value is the sum of the previous values and the N value is the number of times the node has been traversed. As described above, the UCB score is computed from the ratio of the Q value to the N value, and the preference-related parameters used in the UCB calculation (U and S) emphasize the preferential reverse reaction templates; selecting preferential reverse reaction templates is the key to generating a path that conforms to the user's preferences.

The value estimate of the termination node is obtained from a value update function, calculated as $\mathrm{Reward} = 0.9 \times \frac{N_{\mathrm{in\_generalstock}}}{N} + 0.1 \times \frac{N_{\mathrm{in\_priority}}}{N}$, where Reward is the value estimate, $N_{\mathrm{in\_generalstock}}$ is the number of general purchasable compounds, $N_{\mathrm{in\_priority}}$ is the number of priority purchasable compounds, and $N$ is the number of compounds in the termination node. The general available compounds and the preferential available compounds are set by the user based on actual requirements.
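A small sketch of the value estimate follows. The source text preserves only the weights 0.9 and 0.1 and the three counts, so the assignment of 0.9 to the general-library fraction and 0.1 to the priority-library fraction is an assumption based on the order in which the symbols are defined. The function name is hypothetical.

```python
from typing import Collection, Sequence


def terminal_reward(
    molecules: Sequence[str],
    general_stock: Collection[str],
    priority_stock: Collection[str],
) -> float:
    """Assumed form: 0.9 * (general fraction) + 0.1 * (priority fraction)."""
    n = len(molecules)
    if n == 0:
        return 0.0
    n_general = sum(1 for m in molecules if m in general_stock)
    n_priority = sum(1 for m in molecules if m in priority_stock)
    return 0.9 * n_general / n + 0.1 * n_priority / n


# Both molecules are generally purchasable; one is also a priority compound.
value = terminal_reward(
    ["A", "B"], general_stock={"A", "B"}, priority_stock={"A"}
)
```

Under this reading a route ending entirely in purchasable compounds already scores 0.9, and the remaining 0.1 rewards routes whose building blocks come from the user's preferred list.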

Starting from the current leaf node, the update proceeds from bottom to top: for each node on the backtracking path, the value estimate is added once to the Q value and the N value is incremented by 1, until the root node is reached, giving the final Q value and N value.
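The update can be sketched with a minimal node type that stores only a parent link, Q and N. The class `Node` and the function `backpropagate` are illustrative names; a full search tree would also store molecules, children and the template used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    parent: Optional["Node"] = None
    q: float = 0.0  # accumulated value
    n: int = 0      # visit count


def backpropagate(leaf: Node, value: float) -> None:
    """Walk from the leaf to the root, adding the value to Q and 1 to N."""
    node: Optional[Node] = leaf
    while node is not None:
        node.q += value
        node.n += 1
        node = node.parent


root = Node()
child = Node(parent=root)
leaf = Node(parent=child)

backpropagate(leaf, 0.95)   # a simulation that ended below `child`
backpropagate(child, 0.50)  # a later simulation that ended at `child`
```

After the two updates the root and `child` have each been visited twice with Q = 1.45, while `leaf` has been visited once with Q = 0.95, so Q/N gives the average value used in the next selection phase.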

As shown in FIG. 5, an embodiment of the present application further provides an inverse synthetic analysis device based on Monte Carlo tree search, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor.

An embodiment of the present application also provides a non-volatile computer storage medium storing computer-executable instructions configured to perform the inverse synthetic analysis method described above.

The embodiments of the present application are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device and medium embodiments are described relatively briefly because they are substantially similar to the method embodiments, and reference may be made to the relevant parts of the method embodiments.

The devices and media provided in the embodiments of the present application are in one-to-one correspondence with the methods, so that the devices and media also have similar beneficial technical effects as the corresponding methods, and since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the devices and media are not repeated here.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims

1. An inverse synthetic analysis method based on Monte Carlo tree search, comprising:

obtaining a reverse reaction template library, wherein the reverse reaction template library comprises a plurality of fields, and the fields comprise a reactant SMILES, a product SMILES, a reverse reaction template SMARTS and whether the template is preferential;

based on a Monte Carlo tree searching mode, selecting leaf nodes from a root node according to the similarity score, and performing node expansion according to the selected leaf nodes until the simulation meets a stop condition to reach a termination node, and backtracking to the root node to obtain an inverse synthetic analysis path which accords with user preference;

training to generate a plurality of required models based on the reverse reaction template library, wherein the training comprises the following steps:

building a first neural network model;

training the first neural network model through the reverse reaction template library to obtain a template correlation model, wherein the input of the template correlation model is a product SMILES, and the output is a probability distribution over the corresponding non-repeated reverse reaction template SMARTS in the reverse reaction template library;

for each product SMILES, when each reverse reaction template SMARTS is applied to it, if a precursor reactant can be generated, the label of the corresponding reverse reaction template SMARTS is 1, otherwise the label is 0, so as to generate a label vector consisting of 0 or 1;

building a second neural network model;

performing supervised training on the second neural network model through the reverse reaction template library by using the label vector to obtain a template applicability model, wherein the input of the template applicability model is a product SMILES, and the output of the template applicability model is the applicable probability of the corresponding non-repeated reverse reaction template SMARTS in the reverse reaction template library;

converting the reverse reaction template SMARTS of the reverse reaction template library into a forward reaction template SMARTS;

for each reactant SMILES, when it is applied to a randomly selected forward reaction template SMARTS, if the generated product SMILES is different from the product SMILES corresponding to the reactant SMILES, the generated product SMILES and the corresponding reactant SMILES form an unreasonable reaction with label 0, and the reactions in the reverse reaction template library are reasonable reactions with label 1, so that a label vector consisting of 0 or 1 is generated;

building a third neural network model;

performing supervised training on the third neural network model through the reverse reaction template library by using the label vector to obtain a reaction rationality model, wherein the input of the reaction rationality model is a product SMILES and a reaction SMILES, and the output is the reasonable probability of the reaction;

based on a Monte Carlo tree searching mode, selecting leaf nodes from root nodes according to the scores, and performing node expansion according to the selected leaf nodes until simulation meets a stop condition to reach a termination node, and backtracking to the root nodes, wherein the method specifically comprises the following steps:

recursively selecting a child node with the highest UCB score from a root node until a leaf node is reached, wherein the UCB score is obtained by fusion calculation based on the similarity score;

for each molecule of the selected leaf node, generating a pre-molecule through a reverse reaction template obtained by combining the template correlation model and the template applicability model, and creating the leaf node after filtering through the reaction rationality model;

repeating selection and expansion for unvisited leaf nodes until a stop condition is met to reach a termination node, wherein the expansion comprises generating a pre-molecule and creating a leaf node;

starting from the current leaf node, updating the Q value and the N value of each node on the trace-back path from bottom to top until the root node is reached.
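
The selection step of the search described in this claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claim says only that the UCB score is obtained by fusion calculation based on the similarity score, so the PUCT-style formula below, the weights `c` and `w`, and the `Node` fields `prior` and `similarity` are hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    prior: float = 1.0        # template probability from the policy models (assumed field)
    similarity: float = 0.0   # similarity score of the template that produced this node
    Q: float = 0.0            # accumulated value
    N: int = 0                # visit count
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def ucb_score(node: Node, c: float = 1.4, w: float = 1.0) -> float:
    """Assumed fusion: mean value plus an exploration term whose prior is boosted by similarity."""
    parent_visits = node.parent.N if node.parent is not None else 0
    exploit = node.Q / node.N if node.N else 0.0
    fused_prior = node.prior * (1.0 + w * node.similarity)
    explore = c * fused_prior * math.sqrt(parent_visits + 1) / (1 + node.N)
    return exploit + explore


def select_leaf(root: Node) -> Node:
    """Recursively follow the child with the highest UCB score until a leaf is reached."""
    node = root
    while node.children:
        node = max(node.children, key=ucb_score)
    return node
```

With two unvisited children of equal prior, the child created by a template with the higher similarity score is selected first, which is the intended effect of steering the search toward the user's preferred chemistry.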

2. The method according to claim 1, wherein, for a single preferential reverse reaction template, obtaining the similarity score corresponding to the reverse reaction template SMARTS based on the similarity with the product SMILES to be decomposed specifically comprises:

carrying out molecular vectorization representation on a product SMILES to be decomposed to obtain molecular vectorization representation of the product to be decomposed;

carrying out molecular vectorization representation on a product SMILES and a reactant SMILES corresponding to the preferential reverse reaction template to obtain corresponding product molecular vectorization representation and reactant molecular vectorization representation;

applying the preferential reverse reaction template to the product SMILES to be decomposed to generate reactant SMILES, and carrying out molecular vectorization representation on the generated reactant SMILES to obtain the generated reactant molecular vectorization representation;

obtaining product similarity according to the similarity between the molecular vectorization representation of the product to be decomposed and the product molecular vectorization representation;

obtaining reactant similarity according to the similarity between the generated reactant molecular vectorization representation and the reactant molecular vectorization representation;

obtaining total similarity according to the product similarity and the reactant similarity;

and, for a single preferential reverse reaction template, taking the maximum of all the total similarities of that template to obtain the similarity score corresponding to the preferential reverse reaction template.
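
The similarity score of this claim can be sketched as follows. This is a minimal illustration under stated assumptions: the molecular vectorization representation is modelled as a set of "on" fingerprint bits, the similarity measure is Tanimoto, and the total similarity is taken as the product of the two similarities. The claim fixes none of these choices, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits in a molecular fingerprint (assumed representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two bit sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def template_similarity_score(
    target_product_fp: Fingerprint,
    generated_reactant_fp: Fingerprint,
    precedents: Iterable[Tuple[Fingerprint, Fingerprint]],
) -> float:
    """Maximum total similarity over all (product, reactant) precedents of one template."""
    best = 0.0
    for product_fp, reactant_fp in precedents:
        product_sim = tanimoto(target_product_fp, product_fp)
        reactant_sim = tanimoto(generated_reactant_fp, reactant_fp)
        # Assumed combination rule: total similarity = product similarity * reactant similarity.
        best = max(best, product_sim * reactant_sim)
    return best
```

Here `precedents` stands for the product and reactant representations of every reaction in the library that corresponds to the single preferential template being scored.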

3. The method according to claim 1, wherein, for each molecule of the selected leaf node, generating a pre-molecule through a reverse reaction template obtained by combining the template correlation model and the template applicability model, and creating the leaf node after filtering through the reaction rationality model, specifically comprises:

inputting each molecule of the selected leaf node into the template correlation model and the template applicability model respectively to obtain a corresponding first output result and a corresponding second output result;

filtering the second output result through a threshold value, and multiplying the filtered second output result by the first output result to obtain a plurality of reverse reaction templates;

and generating a pre-molecule with each obtained reverse reaction template, and creating leaf nodes after filtering through the reaction rationality model.
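
The combination of the two model outputs in this claim can be sketched as follows. This is a minimal illustration under stated assumptions: both outputs are per-template probability lists of equal length, and the `threshold` and `top_k` values are hypothetical parameters that the claim does not specify.

```python
from typing import List, Tuple


def rank_templates(
    relevance: List[float],      # first output: template correlation model probabilities
    applicability: List[float],  # second output: template applicability model probabilities
    threshold: float = 0.5,      # assumed cut-off for the applicability probability
    top_k: int = 3,              # assumed number of templates to keep
) -> List[Tuple[int, float]]:
    """Return (template index, combined score) pairs, best first."""
    scored = []
    for idx, (rel, app) in enumerate(zip(relevance, applicability)):
        # Drop templates the applicability model considers unlikely to apply.
        if app < threshold:
            continue
        # Combine the surviving applicability probability with the relevance probability.
        scored.append((idx, rel * app))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Each surviving template index would then be applied to the molecule to generate pre-molecules, which the reaction rationality model filters before leaf nodes are created.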

4. The method according to claim 1, wherein repeating selection and expansion for unvisited leaf nodes until a stop condition is met to reach a termination node specifically comprises:

keeping the template correlation model, the template applicability model, and the reaction rationality model unchanged while repeating selection and expansion for unvisited leaf nodes until a stop condition is met to reach a termination node, the stop condition comprising: the generated pre-molecule exists in a preset general available compound library or a preferential available compound library; the maximum depth of the Monte Carlo tree is reached; and the reverse reaction template is invalid.
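
The stop conditions of this claim can be sketched as a single check. This is a minimal illustration under stated assumptions: the compound libraries are modelled as sets of SMILES strings, and the function name, parameter names, and returned reason strings are hypothetical.

```python
from typing import Optional, Set


def stop_reason(
    smiles: str,
    depth: int,
    general_stock: Set[str],    # preset general available compound library
    preferred_stock: Set[str],  # preferential available compound library
    max_depth: int,
    template_valid: bool,
) -> Optional[str]:
    """Return why the search stops at this pre-molecule, or None to keep expanding."""
    if smiles in general_stock or smiles in preferred_stock:
        return "in_stock"
    if depth >= max_depth:
        return "max_depth"
    if not template_valid:
        return "invalid_template"
    return None
```

Returning the reason rather than a bare boolean lets a value function distinguish a solved branch (a purchasable pre-molecule) from a dead end.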

5. The method according to claim 1, wherein updating the Q value and the N value of each node on the backtracking path from bottom to top, starting from the current leaf node until the root node is reached, specifically comprises:

obtaining a value estimate of the termination node according to the value update function;

starting from the current leaf node, backtracking from bottom to top and, for each node on the backtracking path, accumulating the value estimate into the Q value once and adding 1 to the N value, until the root node is reached.
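
The backtracking update of this claim can be sketched as follows. This is a minimal illustration under stated assumptions: the value estimate of the termination node is supplied by the caller, since the value update function is not defined in this claim, and the `TreeNode` class is a hypothetical stand-in for the search tree's node type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    Q: float = 0.0                       # accumulated value
    N: int = 0                           # visit count
    parent: Optional["TreeNode"] = None


def backpropagate(leaf: TreeNode, value: float) -> None:
    """Walk from the leaf to the root, adding `value` to Q and 1 to N at every node."""
    node: Optional[TreeNode] = leaf
    while node is not None:
        node.Q += value
        node.N += 1
        node = node.parent
```

After several simulations, `Q / N` at each node is the mean value of the routes that passed through it, which is the exploitation term a UCB score would draw on.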

6. An inverse synthetic analysis device based on Monte Carlo tree search, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

building a first neural network model;

building a second neural network model;

building a third neural network model;