CN111695702A - Training method, device, equipment and storage medium of molecular generation model - Google Patents
- Publication number
- CN111695702A (application number CN202010546027.6A)
- Authority
- CN
- China
- Prior art keywords
- node
- graph
- molecular
- tree
- molecule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/007—Molecular computers, i.e. using inorganic molecules
Abstract
The application provides a training method for a molecular generation model. The method comprises: obtaining a base molecule and a target molecule; encoding the base molecule through an encoding layer to obtain first graph node characteristics and first tree node characteristics, and encoding the target molecule to obtain second graph node characteristics and second tree node characteristics; matching the first graph node characteristics with the second graph node characteristics through an alignment layer to obtain first similarity characteristics, and matching the first tree node characteristics with the second tree node characteristics to obtain second similarity characteristics; generating graph node characteristics of a predicted molecule from the first similarity characteristics and the first graph node characteristics through a generation layer, and generating tree node characteristics from the second similarity characteristics and the first tree node characteristics; decoding the graph node characteristics and the tree node characteristics respectively through a decoding layer to obtain the predicted molecule; and updating model parameters based on the difference between the predicted molecule and the target molecule. The trained model can generate high-property molecules that retain part of the structure of the base molecule.
Description
Technical Field
The present application relates to artificial intelligence technology, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for training a molecular generation model.
Background
Artificial Intelligence (AI) is a comprehensive discipline within computer science that studies the design principles and implementation methods of intelligent machines, so that machines can perceive, reason and make decisions. It is a broad field involving, for example, natural language processing and machine learning/deep learning; as the technology develops, artificial intelligence will be applied in ever more fields and deliver increasing value.
Applied to molecule generation, artificial intelligence can use a machine's reasoning and decision-making capabilities to generate high-property molecules from low-property base molecules, reducing the time cost of discovering high-property molecules manually. However, the training process of molecular generation models in the related art is tedious and time-consuming, and because the structural similarity between molecules is not discovered, the generated molecules cannot retain the structural information of the original molecules, which limits their rationality.
Disclosure of Invention
The embodiments of the application provide a method, an apparatus, a device and a storage medium for training a molecular generation model, which can improve the training efficiency of the molecular generation model, so that the trained model generates predicted molecules that retain structural information of the base molecule.
The technical scheme of the embodiment of the application is realized as follows:
The embodiment of the application provides a training method for a molecular generation model, wherein the molecular generation model comprises an encoding layer, an alignment layer, a generation layer and a decoding layer, and the method comprises the following steps:
obtaining a molecule pair sample containing basic molecules and target molecules, wherein the molecule pair sample is a biomolecule pair sample or a drug molecule pair sample;
wherein the molecular property of the target molecule is higher than the molecular property of the base molecule, the base molecule being represented by a first molecular graph and a first molecular tree, the target molecule being represented by a second molecular graph and a second molecular tree;
encoding the base molecule in the molecule pair sample through the encoding layer to obtain first graph node characteristics of the first molecular graph and first tree node characteristics of the first molecular tree, and encoding the target molecule in the molecule pair sample to obtain second graph node characteristics of the second molecular graph and second tree node characteristics of the second molecular tree;
matching the first graph node characteristics with the second graph node characteristics through the alignment layer to obtain first similarity characteristics of the first molecular graph and the second molecular graph, and matching the first tree node characteristics with the second tree node characteristics to obtain second similarity characteristics of the first molecular tree and the second molecular tree;
generating graph node characteristics of a predicted molecule according to the first similarity characteristics and the first graph node characteristics through the generation layer, and generating tree node characteristics of the predicted molecule according to the second similarity characteristics and the first tree node characteristics;
decoding the graph node characteristics and the tree node characteristics respectively through the decoding layer to obtain the predicted molecules represented by a molecular graph and a molecular tree;
obtaining a difference between the predicted molecule and the target molecule, and updating model parameters of the molecule generation model based on the difference.
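The encode → align → generate → decode pipeline above can be illustrated with a minimal Python sketch. Everything here — the hash-based embedding, the dot-product alignment, and the toy node lists — is a hypothetical stand-in for the learned networks described in the claims, not the patent's implementation.

```python
def encode(molecule):
    # Hypothetical stand-in for the encoding layer: map each node of a
    # molecular graph/tree to a fixed-size feature vector.
    return [[float(hash((node, d)) % 100) / 100 for d in range(4)]
            for node in molecule]

def align(feats_a, feats_b):
    # Dot-product similarity between every base-molecule node and every
    # target-molecule node (a toy version of the alignment layer).
    return [[sum(x * y for x, y in zip(fa, fb)) for fb in feats_b]
            for fa in feats_a]

base = ["C", "O", "N"]          # toy base-molecule nodes
target = ["C", "O", "N", "C"]   # toy target-molecule nodes

sim = align(encode(base), encode(target))
print(len(sim), len(sim[0]))    # prints 3 4: one row per base node
```

The generation and decoding steps would then consume `sim` together with the base-molecule encodings; they are sketched separately below the corresponding claim text.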
A molecular generation method based on a molecular generation model, wherein the molecular generation model comprises an encoding layer, a generation layer and a decoding layer, and the method comprises the following steps:
obtaining a base molecule, wherein the base molecule is a biomolecule or a drug molecule and is represented by a first molecular graph and a first molecular tree;
encoding the base molecule through the encoding layer to obtain first graph node characteristics of the first molecular graph and first tree node characteristics of the first molecular tree;
generating, through the generation layer, graph node characteristics of a predicted molecule based on the first graph node characteristics, and generating tree node characteristics of the predicted molecule according to a standard Gaussian distribution and the first tree node characteristics;
decoding the graph node characteristics and the tree node characteristics respectively through the decoding layer to obtain the prediction molecules;
the molecular generation model is obtained by training based on the training method of the molecular generation model provided by the embodiment of the application.
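At inference time there is no target molecule to align against, so the method above replaces the alignment-derived distribution with a sample from the standard Gaussian. A minimal sketch of that substitution follows; the dimension, seed, and splicing rule are illustrative assumptions, not the patent's code.

```python
import random

def sample_standard_gaussian(dim, seed=0):
    # With no target molecule available, draw the latent vector from the
    # standard Gaussian N(0, 1) instead of the alignment-derived posterior.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def generate_node_features(base_feature, dim=4):
    z = sample_standard_gaussian(dim)
    # Splice the sampled latent onto the base-molecule feature, mirroring
    # the concatenation described for the generation layer.
    return base_feature + z

feat = generate_node_features([0.1, 0.2, 0.3, 0.4])
print(len(feat))  # prints 8: base feature (4 dims) + latent sample (4 dims)
```

Because the seed is fixed here, repeated calls are deterministic; a real sampler would of course draw fresh noise per molecule.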
The embodiment of the application provides a training device of a molecule generation model, wherein the molecule generation model comprises an encoding layer, an alignment layer, a generation layer and a decoding layer, and the device comprises:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a molecule pair sample containing basic molecules and target molecules, and the molecule pair sample is a biomolecule pair sample or a drug molecule pair sample;
wherein the molecular property of the target molecule is higher than the molecular property of the base molecule, the base molecule being represented by a first molecular graph and a first molecular tree, the target molecule being represented by a second molecular graph and a second molecular tree;
a first encoding module, configured to encode, through the encoding layer, the base molecule in the molecule pair sample to obtain first graph node characteristics of the first molecular graph and first tree node characteristics of the first molecular tree, and to encode the target molecule in the molecule pair sample to obtain second graph node characteristics and second tree node characteristics;
the alignment module is used for matching the first graph node characteristics with the second graph node characteristics through the alignment layer to obtain first similarity characteristics of the first molecular graph and the second molecular graph, and for matching the first tree node characteristics with the second tree node characteristics to obtain second similarity characteristics of the first molecular tree and the second molecular tree;
a first generating module, configured to generate, through the generating layer, a graph node feature of a predicted molecule according to the first similarity feature and the first graph node feature, and generate a tree node feature of the predicted molecule according to the second similarity feature and the first tree node feature;
the first decoding module is used for respectively decoding the graph node characteristics and the tree node characteristics through the decoding layer to obtain the predicted molecules represented by a molecular graph and a molecular tree;
and the updating module is used for acquiring the difference between the predicted molecule and the target molecule and updating the model parameters of the molecule generation model based on the difference.
In the above scheme, the first encoding module is further configured to encode at least two first nodes in the first graph through a graph encoding network in an encoding layer to obtain encoding characteristics of the at least two first nodes, and use the encoding characteristics of the at least two first nodes as the first graph node characteristics; wherein the first molecular diagram corresponds to the molecular structural topology of the base molecule, the at least two first nodes corresponding to the constituent elements that make up the base molecule;
encoding at least two second nodes in the first molecular tree through a tree encoding network in the encoding layer to obtain encoding characteristics of the at least two second nodes, and taking the encoding characteristics of the at least two second nodes as the first tree node characteristics; wherein the first molecular tree is constructed, based on the molecular structure of the base molecule, with the constituent elements of the base molecule as second nodes.
In the foregoing solution, the first encoding module is further configured to, for each first node in the first graph, perform the following operations:
when at least two edges connecting the first node exist, acquiring edge coding characteristics of the at least two edges connecting the first node;
summing the edge coding features of the at least two edges to obtain a first edge aggregation feature;
and generating node coding characteristics of the nodes based on the attribute characteristics of the first nodes and the first edge aggregation characteristics.
In the foregoing scheme, the first encoding module is further configured to, for each of at least two edges connecting the first node, perform the following operations:
when the edge is the edge connecting the first node and the neighbor node and at least two edges connecting the neighbor node exist, acquiring attribute characteristics of the at least two edges connecting the neighbor node;
summing the attribute characteristics of at least two edges connecting the neighbor nodes to obtain a second edge aggregation characteristic;
and generating edge coding characteristics of the edge based on the attribute characteristics of the first node, the attribute characteristics of the neighbor nodes and the second edge aggregation characteristics.
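The edge-aggregation encoding described above — sum the features of the edges incident to a node, then combine that aggregate with the node's own attributes — can be sketched on a toy graph. The adjacency structure, attribute values, and the concatenation used as the combine step are all illustrative assumptions standing in for the learned graph encoding network.

```python
# Toy molecular graph: per-edge and per-node attribute vectors.
edges = {  # (u, v) -> edge attribute vector
    ("a", "b"): [1.0, 0.0],
    ("b", "c"): [0.0, 1.0],
}
node_attr = {"a": [0.5], "b": [0.3], "c": [0.9]}

def incident(node):
    # All edges touching `node`.
    return [e for e in edges if node in e]

def edge_aggregation(node):
    # Sum the attribute vectors of all edges connecting `node`
    # (the "first edge aggregation characteristic" of the scheme above).
    vecs = [edges[e] for e in incident(node)]
    return [sum(col) for col in zip(*vecs)]

def node_encoding(node):
    # Node encoding = node attributes combined with the aggregated edge
    # features; concatenation here stands in for a learned combination.
    return node_attr[node] + edge_aggregation(node)

print(node_encoding("b"))  # prints [0.3, 1.0, 1.0]
```

Node "b" touches both edges, so its aggregation sums `[1.0, 0.0]` and `[0.0, 1.0]` element-wise before the splice.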
In the above solution, the first graph node characteristics include coding characteristics of at least two first nodes in the first graph, and the second graph node characteristics include coding characteristics of at least two third nodes in the second graph;
the alignment module is further configured to obtain, based on the first graph node feature and the second graph node feature, a first similarity from each first node in the first graph to at least two third nodes in the second graph and a second similarity from each third node in the second graph to at least two first nodes in the first graph;
and according to the first similarity and the second similarity, aggregating the coding features of at least two first nodes in the first graph node feature and the second graph node feature to obtain a first similarity feature of the first graph and the second graph.
In the foregoing solution, the alignment module is further configured to aggregate, according to the first similarity, coding features of at least two first nodes in the first graph node feature and at least two third nodes in the second graph node feature, so as to obtain a similarity feature from the first graph to the second graph;
according to the second similarity, the coding features of at least two first nodes in the first graph node features and at least two third nodes in the second graph node features are aggregated to obtain the similarity feature from the second graph to the first graph;
and splicing the similarity features from the first molecular diagram to the second molecular diagram and the similarity features from the second molecular diagram to the first molecular diagram to obtain the first similarity features of the first molecular diagram and the second molecular diagram.
In the foregoing solution, the alignment module is further configured to perform weighted summation on the coding characteristics of the at least two third nodes in the second molecular graph, according to first similarities from each first node in the first molecular graph to the at least two third nodes in the second molecular graph, to obtain first aggregation characteristics corresponding to each first node in the first molecular graph;
respectively splicing the coding features of each first node in the first graph with corresponding first aggregation features to obtain first splicing features corresponding to each first node in the first graph;
and summing the first splicing characteristics corresponding to each first node in the first molecular graph to obtain the similarity characteristics from the first molecular graph to the second molecular graph.
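The bidirectional aggregation above — per-node attention weights over the other graph's nodes, a weighted sum of their encodings, a splice with the node's own encoding, and a final sum over nodes — can be sketched as follows. The softmax scoring and the toy encodings are assumptions; the patent's alignment layer would use learned parameters.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def directed_similarity(src, dst):
    # For each source node: attention weights over destination nodes,
    # weighted sum of destination encodings, splice with the source
    # encoding, then sum the spliced vectors over all source nodes.
    pooled = None
    for fs in src:
        scores = softmax([sum(a * b for a, b in zip(fs, fd)) for fd in dst])
        agg = [sum(w * fd[i] for w, fd in zip(scores, dst))
               for i in range(len(dst[0]))]
        spliced = fs + agg
        pooled = spliced if pooled is None else [p + s for p, s in zip(pooled, spliced)]
    return pooled

A = [[1.0, 0.0], [0.0, 1.0]]   # toy first-graph node encodings
B = [[1.0, 1.0]]               # toy second-graph node encodings

# Bidirectional: splice the A->B and B->A pooled features together.
first_similarity = directed_similarity(A, B) + directed_similarity(B, A)
print(len(first_similarity))   # prints 8
```

With a single destination node, the softmax weight is trivially 1.0, so the A→B direction reduces to splicing each A node with B's encoding and summing.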
In the foregoing solution, the first generating module is further configured to obtain a mean and a variance corresponding to the first similarity feature based on the first similarity feature;
acquiring Gaussian distribution corresponding to a first similarity characteristic based on a mean value and a variance corresponding to the first similarity characteristic;
sampling from the Gaussian distribution to obtain sampling characteristics;
and splicing the sampling characteristics and the first graph node characteristics to obtain the graph node characteristics of the predicted molecules.
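The mean/variance projection, Gaussian sampling, and splicing described above follow the shape of the VAE reparameterization trick: z = μ + σ·ε with ε ~ N(0, 1). A minimal sketch, assuming the mean and log-variance are already given rather than produced by a learned projection of the first similarity characteristic:

```python
import math, random

def reparameterized_sample(mean, log_var, seed=0):
    # z = mu + sigma * eps, eps ~ N(0, 1). Sampling this way keeps the
    # mean/variance differentiable in a real training setup.
    rng = random.Random(seed)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

mu = [0.0, 1.0]
log_var = [0.0, 0.0]            # log-variance 0 -> unit variance
z = reparameterized_sample(mu, log_var)

graph_node_feature = [0.2, 0.4]          # toy first-graph node characteristic
predicted = graph_node_feature + z       # splice sample with base feature
print(len(predicted))                    # prints 4
```

The splice at the end mirrors the claim: the sampled characteristics are concatenated with the first graph node characteristics to form the predicted molecule's graph node characteristics.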
In the foregoing scheme, the updating module is further configured to obtain the probability that the predicted molecule is the same as the target molecule;
based on the difference, acquiring information divergence between posterior probability distribution and standard Gaussian distribution of the sampling features;
determining a value of a variation loss function based on the probability and the information divergence;
obtaining the central representation of a basic molecule, the central representation of a predicted molecule and the central representation of a target molecule;
determining a value of a concealment loss function based on the central representation of the base molecule, the central representation of the predicted molecule, and the central representation of the target molecule;
summing the value of the variation loss function and the value of the hidden loss function to obtain the value of the loss function of the molecular generation model;
updating model parameters of the molecular generative model based on values of a loss function of the molecular generative model.
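The variational part of the loss above combines a reconstruction term (the probability that the predicted molecule matches the target) with the information divergence between the posterior of the sampling characteristics and the standard Gaussian. For a diagonal Gaussian posterior that divergence has the familiar closed form, sketched here; treating the reconstruction term as a given probability is an illustrative simplification.

```python
import math

def kl_to_standard_gaussian(mean, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ) summed over dimensions:
    # 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1)
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mean, log_var))

def variational_loss(recon_prob, mean, log_var):
    # Negative log-likelihood of reproducing the target molecule,
    # plus the information-divergence regularizer.
    return -math.log(recon_prob) + kl_to_standard_gaussian(mean, log_var)

loss = variational_loss(recon_prob=0.8, mean=[0.0, 0.0], log_var=[0.0, 0.0])
print(round(loss, 4))  # prints 0.2231: KL term is 0 here, so loss = -ln(0.8)
```

The hidden-representation loss built from the three central representations would be added on top of this value to form the model's total loss.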
In the above solution, the first graph node characteristics include coding characteristics of at least two first nodes in the first graph, and the first tree node characteristics include coding characteristics of at least two second nodes in the first tree;
the updating module is further configured to obtain a first average value of the coding characteristics of the at least two first nodes in the first molecular graph and a second average value of the coding characteristics of the at least two second nodes in the first molecular tree;
and splicing the first average value and the second average value to obtain the central representation of the basic molecules.
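The central representation described above is simply the average of the graph-node encodings spliced with the average of the tree-node encodings; a small sketch with toy feature vectors:

```python
def center_representation(graph_feats, tree_feats):
    # Average the graph-node and tree-node encodings separately, then
    # concatenate the two averages (the "splice" in the scheme above).
    def mean(feats):
        n = len(feats)
        return [sum(col) / n for col in zip(*feats)]
    return mean(graph_feats) + mean(tree_feats)

g = [[1.0, 3.0], [3.0, 5.0]]   # toy graph-node encodings
t = [[0.0, 2.0]]               # toy tree-node encodings
print(center_representation(g, t))  # prints [2.0, 4.0, 0.0, 2.0]
```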
In the above scheme, the first decoding module is further configured to process the graph node characteristics through a gated recurrent network to obtain information vectors transmitted among the nodes;
for any decoded node, obtain the probability of adding a new node based on the information vectors transmitted between the node and other nodes; and
when the probability is determined to be higher than a probability threshold, determine the type of the new node according to the information vectors transmitted between the node and other nodes, and add a new node of that type at the node.
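A single decoding decision of this kind — score "add a node?" from the propagated message, compare against the threshold, and pick a node type — can be sketched as follows. The scoring rules and the node-type alphabet are toy stand-ins for the gated recurrent network and classifiers the claim describes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_step(message_vec, threshold=0.5, node_types=("C", "N", "O")):
    # From the information vector propagated to the current node, compute
    # the probability of adding a new node; if it clears the threshold,
    # choose the type with the highest (toy) score, else stop here.
    add_prob = sigmoid(sum(message_vec))
    if add_prob <= threshold:
        return None
    scores = [sum(m * (i + 1) for m in message_vec)
              for i in range(len(node_types))]
    return node_types[scores.index(max(scores))]

print(decode_step([0.5, 0.7]))    # positive message -> a node is added
print(decode_step([-2.0, -1.0]))  # probability below threshold -> None
```

Repeating this decision while updating the message vectors with the gated recurrent network would grow the decoded molecular graph node by node.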
A molecular generation apparatus based on a molecular generation model, the molecular generation model comprising: an encoding layer, a generating layer, and a decoding layer, the apparatus comprising:
a second obtaining module, configured to obtain a base molecule, where the base molecule is a biomolecule or a drug molecule and is represented by a first molecular graph and a first molecular tree;
the second encoding module is used for encoding the base molecule through the encoding layer to obtain first graph node characteristics of the first molecular graph and first tree node characteristics of the first molecular tree;
the second generation module is used for generating, through the generation layer, graph node characteristics of a predicted molecule based on the first graph node characteristics, and for generating tree node characteristics of the predicted molecule according to a standard Gaussian distribution and the first tree node characteristics;
the second decoding module is used for respectively decoding the graph node characteristics and the tree node characteristics through the decoding layer to obtain the predicted molecules;
the molecular generation model is obtained by training based on the training method of the molecular generation model provided by the embodiment of the application.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the training method of the molecular generation model provided by the embodiment of the application when the executable instructions stored in the memory are executed.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the molecular generation method based on the molecular generation model provided by the embodiment of the application when executing the executable instructions stored in the memory.
Embodiments of the present application provide a computer-readable storage medium, which stores executable instructions for causing a processor to execute the method for training a molecular generative model provided in the embodiments of the present application.
Embodiments of the present application provide a computer-readable storage medium, which stores executable instructions for causing a processor to execute the method for generating molecules based on a molecular generation model provided in embodiments of the present application.
The embodiment of the application has the following beneficial effects:
1) A molecule pair sample containing a base molecule and a target molecule is obtained, and the molecular generation model is trained on such samples; because the molecular properties of the target molecule are higher than those of the base molecule, the trained model can generate, from a base molecule, predicted molecules whose properties are higher than those of the base molecule, thereby optimizing molecular properties;
2) The alignment layer matches the first graph node characteristics with the second graph node characteristics to obtain first similarity characteristics of the first molecular graph and the second molecular graph, and matches the first tree node characteristics with the second tree node characteristics to obtain second similarity characteristics of the first molecular tree and the second molecular tree; the generation layer then generates graph node characteristics of the predicted molecule from the first similarity characteristics and the first graph node characteristics, and tree node characteristics from the second similarity characteristics and the first tree node characteristics. In this way, the structural similarity between the base molecule and the target molecule is discovered and combined with the base molecule to generate the predicted molecule, so that predicted molecules generated by the trained model retain part of the structural information of the base molecule.
Drawings
FIG. 1 is a schematic diagram of an implementation scenario of a training method for a molecular generative model provided in an embodiment of the present application;
fig. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram illustrating a method for training a molecular generative model provided in an embodiment of the present application;
FIG. 4A is a schematic representation of the molecular structure of a base molecule provided in an embodiment of the present application;
FIG. 4B is a schematic diagram of the molecular structure of a target molecule provided in an embodiment of the present application;
fig. 5A is a schematic diagram of a first molecular graph provided in an embodiment of the present application;
FIG. 5B is a schematic diagram of a first molecular tree provided in an embodiment of the present application;
FIG. 6 is a schematic flow chart of a molecular generation method based on a molecular generation model provided in an embodiment of the present application;
FIG. 7 is a schematic flow chart diagram illustrating a method for training a molecular generative model provided in an embodiment of the present application;
FIG. 8 is a schematic flow chart of a molecular generation method based on a molecular generation model provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a training apparatus for a molecular generative model according to an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second" and "third" are used only to distinguish similar objects and do not denote a particular order; where permitted, the specific order or sequence may be interchanged so that the embodiments of the application described herein can be practiced in an order other than that shown or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further describing the embodiments of the present application in detail, the terms used in the embodiments are explained as follows.
1) A molecule is a whole formed by combining atoms according to a certain bonding sequence and spatial arrangement, and the bonding sequence and the spatial arrangement are called as a molecular structure.
2) Molecular properties include the physical and chemical properties of a molecule. For example, drug molecules with polar groups have a strong affinity for water: they attract water molecules or dissolve in water, and the surface of a solid formed by such molecules is easily wetted by water, i.e., the molecules are hydrophilic. Some drug molecules cause direct or indirect damage after contacting or entering a living organism; this property is biological toxicity (biological harmfulness). Molecular properties depend not only on the type and number of constituent atoms but also on the structure of the molecule.
3) Gaussian distribution, i.e., normal distribution: if a random variable X obeys a probability distribution with location parameter μ and scale parameter σ, its probability density function is f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)). Such a random variable is called a Gaussian random variable (normal random variable), and the distribution it obeys is called a Gaussian distribution (normal distribution), denoted X ~ N(μ, σ²).
4) Standard Gaussian distribution: when μ = 0 and σ = 1, the Gaussian distribution becomes the standard Gaussian distribution N(0, 1).
5) Information divergence, also known as Kullback-Leibler divergence or relative entropy, is an asymmetric measure of the difference between two probability distributions, and in information theory, information divergence is equivalent to the difference in information entropy of two probability distributions.
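For two one-dimensional Gaussians the information divergence has a closed form, KL(N(μ₁, σ₁²) ‖ N(μ₂, σ₂²)) = ln(σ₂/σ₁) + (σ₁² + (μ₁ − μ₂)²)/(2σ₂²) − 1/2, which also makes its asymmetry easy to check numerically:

```python
import math

def kl_gaussians(mu1, s1, mu2, s2):
    # Closed-form information divergence KL( N(mu1, s1^2) || N(mu2, s2^2) ).
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

print(kl_gaussians(0.0, 1.0, 0.0, 1.0))            # identical distributions -> 0.0
print(round(kl_gaussians(1.0, 1.0, 0.0, 1.0), 2))  # mean shifted by 1 -> 0.5
```

Swapping the arguments generally gives a different value, illustrating that the divergence is an asymmetric measure as stated above.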
Based on the above explanations of the terms involved in the embodiments of the present application, an implementation scenario of the training method of the molecular generation model provided in the embodiments of the present application is described next. Referring to fig. 1, fig. 1 is a schematic diagram of an implementation scenario of the training method of the molecular generation model provided in the embodiments of the present application. In order to support an exemplary application, the terminals include a terminal 200-1 and a terminal 200-2, where the terminal 200-1 is located on the developer side and is used to control the training of the molecular generation model, and the terminal 200-2 is located on the user side and is used to request generation of a predicted molecule corresponding to a base molecule. The terminals are connected to the server 100 through a network 300; the network 300 may be a wide area network or a local area network, or a combination of the two, and uses a wireless or wired link to realize data transmission.
A terminal 200-1 for sending a training instruction for the molecular generative model to the server 100;
a server 100 for obtaining a molecule pair sample comprising a base molecule and a target molecule, wherein the molecular property of the target molecule is higher than the molecular property of the base molecule, the base molecule being represented by a first molecular graph and a first molecular tree, and the target molecule being represented by a second molecular graph and a second molecular tree; encoding, through the encoding layer, the base molecule in the molecule pair sample to obtain a first graph node feature of the first molecular graph and a first tree node feature of the first molecular tree, and encoding the target molecule in the molecule pair sample to obtain a second graph node feature and a second tree node feature; matching, through the alignment layer, the first graph node feature with the second graph node feature to obtain a first similarity feature of the first molecular graph and the second molecular graph, and matching the first tree node feature with the second tree node feature to obtain a second similarity feature of the first molecular tree and the second molecular tree; generating, through the generation layer, a graph node feature of a predicted molecule according to the first similarity feature and the first graph node feature, and generating a tree node feature of the predicted molecule according to the second similarity feature and the first tree node feature; decoding, through the decoding layer, the graph node feature and the tree node feature respectively to obtain the predicted molecule represented by a molecular graph and a molecular tree; and obtaining a difference between the predicted molecule and the target molecule, and updating model parameters of the molecular generation model based on the difference.
After the molecular generation model completes training, the terminal 200-2 is configured to send a molecular generation instruction for instructing generation of a predicted molecule corresponding to the base molecule;
the basic molecule is a molecule having a low molecular property, and a predicted molecule having a higher molecular property than the basic molecule is generated by transmitting a predicted molecule corresponding to the basic molecule.
The server 100 is configured to, in response to the molecule generation instruction, generate a predicted molecule from the base molecule through the trained molecular generation model, and return the predicted molecule to the terminal 200-2.
In some embodiments, the server 100 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform. The terminal (e.g., terminal 200-1) may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present application.
In practical application, the molecular generation model provided by the embodiment of the application can be applied to the fields of structural biology and medicine, and drug discovery, molecular optimization, molecular generation and the like can be realized through the molecular generation model.
Taking the application to drug discovery as an example, the base molecules and the target molecules in the molecule pair samples used for training the molecular generation model are drug molecules, and each target molecule is superior to the corresponding base molecule with respect to the required drug property; the molecular generation model is then trained based on these molecule pair samples.
After training, the molecular generation model is obtained, and drug discovery can then be performed through the molecular generation model. Illustratively, a drug discovery application client is provided on the terminal, through which a user can input a base molecule; the terminal sends a molecule generation instruction for the predicted molecule corresponding to the base molecule to the server 100 through the network 300. After receiving the instruction, the server 100 extracts the base molecule in the instruction and generates a predicted molecule from the base molecule, where the input base molecule is a drug molecule.
The hardware structure of the electronic device implementing the training method of the molecular generation model provided in the embodiments of the present application is described in detail below; the electronic device includes, but is not limited to, a server or a terminal. Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device shown in fig. 2 includes: at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430. The various components in the electronic device are coupled together by a bus system 440. It is understood that the bus system 440 is used to enable communications among these components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 440 in fig. 2.
The Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.
The memory 450 includes either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 450 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452 for communicating with other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: Bluetooth, Wireless Fidelity (WiFi), and Universal Serial Bus (USB), etc.;
a presentation module 453 for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., display screens, speakers, etc.) associated with user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the training apparatus for molecular generative model provided in the embodiments of the present application may be implemented in software, and fig. 2 illustrates the training apparatus 455 for molecular generative model stored in the memory 450, which may be software in the form of programs and plug-ins, and includes the following software modules: a first obtaining module 4551, a first encoding module 4552, an alignment module 4553, a first generating module 4554, a first decoding module 4555 and an updating module 4556, which are logical and thus may be arbitrarily combined or further divided according to the functions implemented.
The functions of the respective modules will be explained below.
In other embodiments, the training Device of the molecular generation model provided in the embodiments of the present Application may be implemented in hardware, and as an example, the Device provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to perform the training method of the molecular generation model provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
Based on the above description of the implementation scenario of the molecular generative model training method and the electronic device according to the embodiments of the present application, the following description will explain the molecular generative model training method according to the embodiments of the present application.
The molecular generation model provided in the embodiment of the present application includes an encoding layer, an alignment layer, a generation layer, and a decoding layer, see fig. 3, where fig. 3 is a schematic flow chart of a training method of the molecular generation model provided in the embodiment of the present application; in some embodiments, the training method of the molecular generative model may be implemented by a server or a terminal alone, or implemented by a server and a terminal in a cooperative manner, taking the server as an example, the training method of the molecular generative model provided in this embodiment of the present application includes:
step 301: the server obtains a sample of molecule pairs comprising a base molecule and a target molecule.
Here, in practical applications, the molecule pair sample may be a biomolecule pair sample or a drug molecule pair sample, that is, the base molecule and the target molecule may be biomolecules (such as protein molecules) or drug molecules (such as drug molecules with polar groups).
In some embodiments, the molecular property of the target molecule is higher than the molecular property of the base molecule, the base molecule being represented by a first molecular graph and a first molecular tree, and the target molecule being represented by a second molecular graph and a second molecular tree.
In practical implementation, the molecular generation model provided in the embodiments of the present application is used for optimizing molecular properties, that is, a base molecule with a lower molecular property is input into the molecular generation model, the base molecule is correspondingly processed, and a predicted molecule with a molecular property higher than that of the base molecule is output. Therefore, for the molecule pair samples used to train the molecular generation model, the molecular property of the target molecule should be higher than that of the base molecule. Here, a higher molecular property means that the target molecule has a significant improvement over the base molecule with respect to a desired property; for example, if the desired property is hydrophilicity, the hydrophilicity of the target molecule should be better than that of the base molecule.
It should be noted that the base molecule and the target molecule should be similar in structure, so that the target molecule can be generated based on the base molecule, that is, the structural similarity between the base molecule and the target molecule should reach a similarity threshold, for example, fig. 4A is a schematic molecular structure diagram of the base molecule provided in the embodiment of the present application, fig. 4B is a schematic molecular structure diagram of the target molecule provided in the embodiment of the present application, see fig. 4A and 4B, and a part of the structure in the dashed line box in fig. 4A is the same as a part of the structure in the dashed line box in fig. 4B.
In order to accurately describe the molecular structures of the base molecule and the target molecule, the embodiments of the present application use a graph structure and a tree structure to represent the base molecule and the target molecule, that is, the base molecule is represented by a first molecular graph and a first molecular tree, and the target molecule is represented by a second molecular graph and a second molecular tree.
Here, the graph structure and the tree structure are both composed of nodes and edges connecting the nodes, where in the graph structure, the relationship between the nodes may be arbitrary, and any two data elements in the graph may be related to each other; in a tree structure, there is a distinct hierarchical relationship between data elements, and data elements on each level may be related to multiple elements in the next level (i.e., their child nodes), but only to one element in the previous level (i.e., its parent node).
In practical implementation, the graph structure corresponds to a molecular structure topology, nodes in the graph structure correspond to constituent elements (e.g., atoms) of a molecule, and edges correspond to bonds between the constituent elements, for example, taking the molecular structure in fig. 4A as an example, fig. 5A is a schematic diagram of a first molecular diagram provided in this embodiment of the present application, see fig. 4A and 5A, where the molecular structure in fig. 4A corresponds to the first molecular diagram in fig. 5A, and the constituent elements in fig. 4A correspond to nodes in the first molecular diagram in fig. 5A.
After the graph structure of the molecule is obtained, some nodes in the graph structure can be contracted into a single node according to chemical concepts to generate the corresponding tree structure; for example, in the graph structure, one benzene ring includes six nodes, and these six nodes can be contracted into a single node.
Illustratively, taking the molecular structure in fig. 4A as an example, fig. 5B is a schematic diagram of a first molecular tree provided in the embodiments of the present application. Referring to fig. 5A and 5B, some nodes in the first molecular graph in fig. 5A are contracted into single nodes to obtain the first molecular tree in fig. 5B; for example, the five-membered ring in the first molecular graph is contracted into one node to obtain the node 501 in fig. 5B, and the six-membered ring in the first molecular graph is likewise contracted into one node to obtain the node 502 in fig. 5B.
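The ring-contraction step that turns a molecular graph into a molecular tree can be sketched on a plain adjacency-list graph. The following Python is a minimal illustration under simplifying assumptions: the function `contract_ring` and the node labels are hypothetical, and a real implementation would operate on chemical ring-perception output rather than a hand-written adjacency dict.

```python
def contract_ring(adj, ring):
    """Contract the node set `ring` of a graph into a single new node.

    adj:  dict mapping node -> set of neighbour nodes (undirected graph).
    ring: set of nodes forming a ring (e.g. the six atoms of a benzene ring).
    Returns a new adjacency dict in which the whole ring is one node.
    """
    new_node = "ring(" + ",".join(sorted(ring)) + ")"
    new_adj = {}
    for u, nbrs in adj.items():
        if u in ring:
            continue                      # ring members disappear
        kept = {v for v in nbrs if v not in ring}
        if nbrs & ring:                   # u was bonded to an atom of the ring
            kept.add(new_node)
        new_adj[u] = kept
    # neighbours of the contracted node = all outside nodes touching the ring
    new_adj[new_node] = {u for u in adj if u not in ring and adj[u] & ring}
    return new_adj

# A six-membered ring a-b-c-d-e-f with one substituent x attached to atom a.
adj = {
    "a": {"b", "f", "x"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d", "f"}, "f": {"e", "a"}, "x": {"a"},
}
tree = contract_ring(adj, {"a", "b", "c", "d", "e", "f"})
# The seven-node graph collapses to two tree nodes: the ring node and x.
assert len(tree) == 2
```

This mirrors how the six-membered ring in fig. 5A becomes the single node 502 in fig. 5B.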
Step 302: and encoding, through the encoding layer, the base molecule in the molecule pair sample to obtain a first graph node feature of the first molecular graph and a first tree node feature of the first molecular tree, and encoding the target molecule in the molecule pair sample to obtain a second graph node feature of the second molecular graph and a second tree node feature of the second molecular tree.
Because the base molecule is represented by the first molecular graph and the first molecular tree, in practical implementation, the first molecular graph and the first molecular tree are input into the coding layer, and the coding layer encodes them to obtain the first graph node feature of the first molecular graph and the first tree node feature of the first molecular tree; similarly, the second molecular graph and the second molecular tree are input into the coding layer, and the coding layer encodes them to obtain the second graph node feature of the second molecular graph and the second tree node feature of the second molecular tree.
In some embodiments, the base molecule in the molecule pair sample may be encoded in the following manner: coding at least two first nodes in the first molecular graph through a graph coding network in the coding layer to obtain coding features of the at least two first nodes, and taking the coding features of the at least two first nodes as the first graph node feature, wherein the first molecular graph corresponds to the molecular structure topology of the base molecule, and the at least two first nodes correspond to the constituent elements constituting the base molecule; and coding at least two second nodes in the first molecular tree through a tree coding network in the coding layer to obtain coding features of the at least two second nodes, and taking the coding features of the at least two second nodes as the first tree node feature, wherein the first molecular tree is constructed based on the molecular structure of the base molecule with the constituent elements of the base molecule as the second nodes.
Here, the coding layer includes two coding networks, namely a graph coding network and a tree coding network, where the graph coding network is used to process data of the graph structure, namely the first molecular graph, and the tree coding network is used to process data of the tree structure, namely the first molecular tree.
In actual implementation, the graph coding network generates one coding feature for each first node in the first molecular graph, and the tree coding network generates one coding feature for each second node in the first molecular tree. The graph coding network and the tree coding network may use the same type of network to implement the coding processing; for example, Message Passing Neural Networks (MPNNs), Graph Convolutional Networks (GCNs), GraphSAGE (Graph SAmple and aggreGatE), and the like may be used.
In some embodiments, at least two first nodes in the first graph may be encoded by:
for each first node in the first graph, performing the following: when at least two edges connecting the first node exist, acquiring edge coding characteristics of the at least two edges connecting the first node; summing edge coding features of at least two edges to obtain a first edge aggregation feature; and generating the node coding feature of the first node based on the attribute feature and the first edge aggregation feature of the first node.
In practical implementation, all the edges connected with the first node may be encoded to obtain edge encoding features of all the edges connected with the first node, and then the edge encoding features are aggregated into one feature, that is, a first edge aggregation feature, and then the first edge aggregation feature and the attribute feature of the first node are combined to generate the node encoding feature of the first node.
Here, the attribute feature of the first node may be determined according to the element category corresponding to the first node, and first nodes corresponding to different elements have different attribute features.
In some embodiments, the edge connecting the first node may be encoded by:
for each of at least two edges connecting the first node, performing the following operations: when the edge is the edge connecting the first node and the neighbor node and at least two edges connecting the neighbor node exist, acquiring attribute characteristics of at least two edges connecting the neighbor node; summing the attribute characteristics of at least two edges connecting the neighbor nodes to obtain a second edge aggregation characteristic; and generating edge coding characteristics of the edges based on the attribute characteristics of the first node, the attribute characteristics of the neighbor nodes and the second edge aggregation characteristics.
Here, the attribute characteristics of an edge may be determined according to the elements corresponding to the two nodes connected by the edge. In actual implementation, for an edge between a first node and a neighboring node, attribute features of all edges connecting the neighboring node are acquired, and then the attribute features are aggregated into one feature, namely a second edge aggregation feature, and an edge coding feature of the edge is generated by combining the attribute feature of the first node, the attribute feature of the neighboring node, and the second edge aggregation feature.
In some embodiments, when the number of layers of the graph coding network is at least two, the coding feature of the first node output by the t-th layer network in the graph coding network may be determined as follows:
when t is 1, according to formula (1), obtaining an edge coding feature of an edge connecting a first node i and a first node j obtained by a t-th layer network:
wherein the content of the first and second substances,the method comprises the steps that edge coding characteristics of edges connecting a first node i and a first node j are obtained through a t-layer network, and the first node j is a neighbor node of the first node i; f. of1Representing a neural network; f. ofiAttribute features of the first node i; f. ofjAttribute features of the first node i; f. ofjkThe attribute characteristics of an edge between a first node j and a first node k are shown, and the first node k is a neighbor node of the first node j;
then, according to the formula (2), acquiring coding characteristics of the first node connected through the t-layer network;
wherein f is2Representing a neural network;the coding characteristics of a first node i representing the output of a t-th network;
when t >1, according to formula (3), obtaining an edge coding feature of an edge connecting the first node i and the first node j obtained through a t-th layer network:
wherein the content of the first and second substances,the method comprises the steps that edge coding characteristics of edges connecting a first node i and a first node j are obtained through a t-layer network, and the first node j is a neighbor node of the first node i; f. of1Representing a neural network;the coding characteristics of the first node i are obtained through a t-1 layer network;the coding characteristics of the first node j are obtained through a t-1 layer network;the coding characteristics of the edge between the first node j and the first node k are obtained through a t-1 layer network, and the first node k is a neighbor node of the first node j;
according to the formula (4), acquiring the coding characteristics of the first node connected through the t-th layer network:
wherein f is2Representing a neural network;indicating the coding characteristics of the first node i of the t-th network output,the coding characteristics of the first node i obtained through the t-1 layer network.
Here, the coding feature of the first node i obtained after processing by the last layer (the T-th layer) of the network can be expressed as $x_i^G = h_i^{(T)}$, and the first graph node feature is $X^G = \{x_1^G, x_2^G, \dots, x_{n_G}^G\}$, where $n_G$ denotes the number of first nodes in the first molecular graph.
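The layered message passing of equations (1)-(4) can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions: the learned networks f1 and f2 are replaced by simple stand-in functions, and the two-node toy molecule, the function name `mpnn_encode`, and all variable names are hypothetical, not from the patent.

```python
import numpy as np

def mpnn_encode(node_feat, edge_feat, neighbours, T, f1, f2):
    """Layered encoder sketch following equations (1)-(4).

    node_feat:  dict i -> attribute vector f_i
    edge_feat:  dict (i, j) -> attribute vector f_ij (both directions present)
    neighbours: dict i -> list of neighbour node indices
    f1, f2:     stand-ins for the learned networks
    """
    d = len(next(iter(node_feat.values())))
    mu = {}                 # mu[(i, j)]: coding feature of directed edge i -> j
    h = dict(node_feat)     # h[i]: node coding feature, initialised with f_i
    for t in range(1, T + 1):
        new_mu = {}
        for (i, j) in edge_feat:
            if t == 1:      # equation (1): aggregate raw edge attributes f_jk
                agg = sum((edge_feat[(j, k)] for k in neighbours[j] if k != i),
                          np.zeros(d))
                new_mu[(i, j)] = f1(node_feat[i], node_feat[j], agg)
            else:           # equation (3): aggregate previous-layer mu_jk
                agg = sum((mu[(j, k)] for k in neighbours[j] if k != i),
                          np.zeros(d))
                new_mu[(i, j)] = f1(h[i], h[j], agg)
        mu = new_mu
        # equations (2) and (4): node coding from incoming edge codings mu_ji
        h = {i: f2(h[i], sum((mu[(j, i)] for j in neighbours[i]), np.zeros(d)))
             for i in neighbours}
    return h                # h[i] is x_i^G after the last layer

# Toy two-node molecule with simple stand-in networks.
f1 = lambda a, b, c: np.tanh(a + b + c)
f2 = lambda a, b: np.tanh(a + b)
nodes = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
edges = {(0, 1): np.array([0.5, 0.5]), (1, 0): np.array([0.5, 0.5])}
nbrs = {0: [1], 1: [0]}
x = mpnn_encode(nodes, edges, nbrs, T=2, f1=f1, f2=f2)
assert x[0].shape == (2,)
```

Note how the t = 1 branch reads raw attribute features while later layers read the previous layer's codings, exactly as equations (1) and (3) differ.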
In practical implementation, the same encoding method as that used for the first molecular graph may be adopted to encode the first molecular tree, the second molecular graph, and the second molecular tree, so as to obtain the first tree node feature, the second graph node feature, and the second tree node feature, respectively.
Step 303: and matching the first graph node characteristics with the second graph node characteristics through the alignment layer to obtain first similarity characteristics of the first graph and the second graph, and matching the first tree node characteristics with the second tree node characteristics to obtain second similarity characteristics of the first sub-tree and the second sub-tree.
Here, the structural differences between different molecules mainly depend on the types of atoms in the molecules and the connection manner between them, and by matching the first graph node features with the second graph node features and matching the first tree node features with the second tree node features, the structural similarity between the base molecule and the target molecule is obtained, so as to better explore the structural information of the molecules.
In some embodiments, when the first graph node feature comprises the coding features of at least two first nodes in the first molecular graph and the second graph node feature comprises the coding features of at least two third nodes in the second molecular graph, the first similarity feature of the first molecular graph and the second molecular graph can be obtained as follows: based on the first graph node feature and the second graph node feature, obtaining first similarities from each first node in the first molecular graph to the at least two third nodes in the second molecular graph, and second similarities from each third node in the second molecular graph to the at least two first nodes in the first molecular graph; and according to the first similarities and the second similarities, aggregating the coding features of the at least two first nodes in the first graph node feature and the coding features of the at least two third nodes in the second graph node feature to obtain the first similarity feature of the first molecular graph and the second molecular graph.
In practical implementation, for any first node in the first graph and any third node in the second graph, the bidirectional similarity between the two nodes needs to be calculated.
Exemplarily, for the i-th first node in the first molecular graph, its coding feature is denoted as $x_i^G$; for the q-th third node in the second molecular graph, its coding feature is denoted as $y_q^G$. First, the first similarity from the i-th first node in the first molecular graph to the q-th third node in the second molecular graph is obtained according to equation (5):

$$w_{iq} = \frac{\exp\big(-\lVert x_i^G - y_q^G \rVert^2 / \sigma^2\big)}{\sum_{q'} \exp\big(-\lVert x_i^G - y_{q'}^G \rVert^2 / \sigma^2\big)} \qquad (5)$$

where $w_{iq}$ is the first similarity from the i-th first node in the first molecular graph to the q-th third node in the second molecular graph, and σ represents the standard deviation;

then, the second similarity from the q-th third node in the second molecular graph to the i-th first node in the first molecular graph is obtained according to equation (6):

$$w_{qi} = \frac{\exp\big(-\lVert y_q^G - x_i^G \rVert^2 / \sigma^2\big)}{\sum_{i'} \exp\big(-\lVert y_q^G - x_{i'}^G \rVert^2 / \sigma^2\big)} \qquad (6)$$

where $w_{qi}$ is the second similarity from the q-th third node in the second molecular graph to the i-th first node in the first molecular graph, and σ represents the standard deviation.
By the above method, the bidirectional similarity between each first node in the first molecular graph and the at least two third nodes in the second molecular graph can be obtained; after the bidirectional similarities are obtained, the coding features of the first nodes in the first molecular graph and the third nodes in the second molecular graph are aggregated together according to the bidirectional similarities, so as to obtain the first similarity feature of the first molecular graph and the second molecular graph.
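The bidirectional similarity computation can be sketched in numpy as follows. This is an illustrative sketch under assumptions: the patent's equations are not reproduced as images here, so a row-normalized Gaussian kernel (with σ as the kernel width) is assumed, and the function name `bidirectional_similarity` and example arrays are hypothetical.

```python
import numpy as np

def bidirectional_similarity(X, Y, sigma=1.0):
    """Gaussian-kernel node similarities in both directions (a sketch).

    X: (n, d) coding features of the first nodes in the first molecular graph
    Y: (m, d) coding features of the third nodes in the second molecular graph
    Returns w_xy (n, m) with rows normalised (first -> second) and
            w_yx (m, n) with rows normalised (second -> first).
    """
    # squared Euclidean distance between every pair of nodes
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / sigma ** 2)
    w_xy = k / k.sum(axis=1, keepdims=True)       # each row: one first node
    w_yx = k.T / k.T.sum(axis=1, keepdims=True)   # each row: one third node
    return w_xy, w_yx

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # 3 first nodes
Y = np.array([[0.0, 0.0], [2.0, 2.0]])              # 2 third nodes
w_xy, w_yx = bidirectional_similarity(X, Y)
# Each direction yields a proper weighting over the other graph's nodes.
assert np.allclose(w_xy.sum(axis=1), 1.0)
assert np.allclose(w_yx.sum(axis=1), 1.0)
```

Because the two directions are normalized over different axes, $w_{iq}$ and $w_{qi}$ generally differ, which is what makes the similarity bidirectional.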
Correspondingly, when the first tree node feature comprises the coding features of at least two second nodes in the first molecular tree and the second tree node feature comprises the coding features of at least two fourth nodes in the second molecular tree, the second similarity feature of the first molecular tree and the second molecular tree can be obtained as follows: obtaining, based on the first tree node feature and the second tree node feature, third similarities from each second node in the first molecular tree to the at least two fourth nodes in the second molecular tree, and fourth similarities from each fourth node in the second molecular tree to the at least two second nodes in the first molecular tree; and according to the third similarities and the fourth similarities, aggregating the coding features of the at least two second nodes in the first tree node feature and the coding features of the at least two fourth nodes in the second tree node feature to obtain the second similarity feature of the first molecular tree and the second molecular tree.
In practical implementation, for any second node in the first molecular tree and any fourth node in the second molecular tree, the bidirectional similarity between the two nodes needs to be calculated. Here, the bidirectional similarity between a node in the first molecular tree and a node in the second molecular tree may be calculated in the same way as the bidirectional similarity between a node in the first molecular graph and a node in the second molecular graph, so as to obtain the second similarity feature of the first molecular tree and the second molecular tree.
In some embodiments, the first similarity feature of the first graph and the second graph may be obtained by:
according to the first similarity, the coding features of at least two first nodes in the first graph node features and the coding features of at least two third nodes in the second graph node features are aggregated to obtain similarity features from the first graph to the second graph; according to the second similarity, the coding features of at least two first nodes in the first graph node features and the coding features of at least two third nodes in the second graph node features are aggregated to obtain the similarity feature from the second graph to the first graph; and splicing the similarity features from the first molecular diagram to the second molecular diagram and the similarity features from the second molecular diagram to the first molecular diagram to obtain the first similarity features of the first molecular diagram and the second molecular diagram.
Here, the aggregation refers to aggregating the coding features of at least two first nodes in the first graph node feature and the coding features of at least two third nodes in the second graph node feature into one feature, and in actual implementation, the aggregation of the features may be implemented by using a mode such as summation and averaging.
In practical implementation, assuming that the obtained similarity feature from the first molecular graph to the second molecular graph is denoted as $s^{G \to G'}$ and the similarity feature from the second molecular graph to the first molecular graph is denoted as $s^{G' \to G}$, the resulting first similarity feature may then be denoted as $s^G = \big[\, s^{G \to G'};\ s^{G' \to G} \,\big]$, i.e., the splice of the two similarity features.
Accordingly, the second similarity feature of the first molecular tree and the second molecular tree can be obtained as follows:

According to the third similarities, the coding features of the at least two second nodes in the first tree node feature and the coding features of the at least two fourth nodes in the second tree node feature are aggregated to obtain the similarity feature from the first molecular tree to the second molecular tree; according to the fourth similarities, the coding features of the at least two second nodes in the first tree node feature and the coding features of the at least two fourth nodes in the second tree node feature are aggregated to obtain the similarity feature from the second molecular tree to the first molecular tree; and the similarity feature from the first molecular tree to the second molecular tree and the similarity feature from the second molecular tree to the first molecular tree are spliced to obtain the second similarity feature of the first molecular tree and the second molecular tree.
In practical implementation, assume the obtained similarity feature from the first molecular tree to the second molecular tree is m_{Tx→Ty} and the similarity feature from the second molecular tree to the first molecular tree is m_{Ty→Tx}; the resulting second similarity feature may then be expressed as m_T = [m_{Tx→Ty}; m_{Ty→Tx}].
In some embodiments, the similarity features from the first graph to the second graph may be obtained by:
respectively carrying out weighted summation on the coding features of at least two third nodes in the second graph according to the first similarity from each first node in the first graph to at least two third nodes in the second graph to obtain a first aggregation feature corresponding to each first node in the first graph; respectively splicing the coding features of each first node in the first graph with corresponding first aggregation features to obtain first splicing features corresponding to each first node in the first graph; and summing the first splicing characteristics corresponding to each first node in the first molecular graph to obtain the similarity characteristics from the first molecular graph to the second molecular graph.
In practical implementation, the following operation may be performed for each first node in the first molecular graph: the coding features of the third nodes in the second molecular graph are weighted and summed according to the first similarities from this first node to each third node in the second molecular graph, where the weight corresponding to the coding feature of each third node in the second molecular graph is the first similarity from the first node in the first molecular graph to that third node in the second molecular graph.
Illustratively, when the first similarity from the i-th first node in the first molecular graph to the q-th third node in the second molecular graph is w_iq, the weight corresponding to the coding feature of the q-th third node in the second molecular graph is w_iq. Further, the first aggregation feature corresponding to the i-th first node in the first molecular graph can be obtained as a_i = Σ_q w_iq·h_q^y, where h_q^y is the coding feature of the q-th third node; that is, the coding features of all the third nodes in the second molecular graph are weighted and summed according to the corresponding weights.
The above operation is executed for each first node in the first molecular graph, so the first aggregation feature corresponding to each first node in the first molecular graph can be obtained; after the first aggregation features corresponding to all the first nodes in the first molecular graph are obtained, the coding feature of each first node in the first molecular graph is spliced with its corresponding first aggregation feature. For example, for the i-th first node in the first molecular graph, its coding feature h_i^x and the corresponding first aggregation feature a_i are spliced to obtain the first splicing feature [h_i^x; a_i].
That is, a corresponding first splicing feature is obtained for each first node in the first molecular graph, and all the obtained first splicing features are then aggregated by summation to obtain the similarity feature from the first molecular graph to the second molecular graph.
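The aggregation just described (similarity-weighted sum over the other graph's nodes, splicing with each node's own coding feature, then summation over nodes) can be sketched as follows; the feature values and similarity weights below are illustrative stand-ins, not outputs of the real encoder.

```python
def aggregate(h_first, h_second, w):
    """h_first: node coding features of one graph (list of d-dim lists).
    h_second: node coding features of the other graph.
    w: w[i][q] = similarity from node i of the first graph to node q of the second.
    Returns the first-to-second similarity feature (a 2d-dim list)."""
    d = len(h_first[0])
    m = [0.0] * (2 * d)
    for i, h_i in enumerate(h_first):
        # first aggregation feature: similarity-weighted sum over the other graph
        a_i = [sum(w[i][q] * h_second[q][k] for q in range(len(h_second)))
               for k in range(d)]
        # splice (concatenate) the node's own feature with its aggregation feature
        spliced = h_i + a_i
        # sum the splicing features over all nodes
        m = [m_k + s_k for m_k, s_k in zip(m, spliced)]
    return m

h_x = [[1.0, 0.0], [0.0, 1.0]]            # first molecular graph: two nodes
h_y = [[0.5, 0.5]]                        # second molecular graph: one node
w = [[1.0], [0.0]]                        # first similarities w[i][q]
m_xy = aggregate(h_x, h_y, w)             # first-to-second similarity feature
m_yx = aggregate(h_y, h_x, [[1.0, 0.0]])  # second-to-first, with reverse similarities
m_G = m_xy + m_yx                         # splicing yields the first similarity feature
```

The same routine applies unchanged to the tree node features to produce the second similarity feature.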
In practical implementation, the similarity feature from the second molecular graph to the first molecular graph, the similarity feature from the first molecular tree to the second molecular tree, and the similarity feature from the second molecular tree to the first molecular tree can be obtained in a similar manner.
In some embodiments, the similarity feature from the first molecular graph to the second molecular graph can be obtained through formula (7):

m_{Gx→Gy} = g({[h_i^x; Σ_q w_iq·h_q^y]}_i)    (7)

wherein g represents the aggregation processing (e.g. summation), h_i^x is the coding feature of the i-th first node in the first molecular graph, w_iq is the first similarity from the i-th first node in the first molecular graph to the q-th third node in the second molecular graph, and h_q^y is the coding feature of the q-th third node in the second molecular graph.
Correspondingly, the similarity feature from the second molecular graph to the first molecular graph is obtained through formula (8):

m_{Gy→Gx} = g({[h_q^y; Σ_i w_qi·h_i^x]}_q)    (8)

wherein w_qi is the second similarity from the q-th third node in the second molecular graph to the i-th first node in the first molecular graph.
Similarly, the similarity feature from the first molecular tree to the second molecular tree is obtained through formula (9):

m_{Tx→Ty} = g({[u_i^x; Σ_q v_iq·u_q^y]}_i)    (9)

wherein u_i^x is the coding feature of the i-th second node in the first molecular tree, v_iq is the third similarity from the i-th second node in the first molecular tree to the q-th fourth node in the second molecular tree, and u_q^y is the coding feature of the q-th fourth node in the second molecular tree.
Correspondingly, the similarity feature from the second molecular tree to the first molecular tree is obtained through formula (10):

m_{Ty→Tx} = g({[u_q^y; Σ_i v_qi·u_i^x]}_q)    (10)

wherein v_qi is the fourth similarity from the q-th fourth node in the second molecular tree to the i-th second node in the first molecular tree.
Step 304: by generating the layer, a graph node feature of the predicted molecule is generated according to the first similarity feature and the first graph node feature, and a tree node feature of the predicted molecule is generated according to the second similarity feature and the first tree node feature.
In actual implementation, in order to retain the structural similarity between the predicted molecule and the base molecule, the first similarity feature and the first graph node features are combined at the graph structure level to generate the graph node features of the predicted molecule; and the second similarity feature and the first tree node features are combined at the tree structure level to generate the tree node features of the predicted molecule.
In some embodiments, the graph node features of the predicted molecules may be generated by: acquiring a mean value and a variance corresponding to the first similarity characteristic based on the first similarity characteristic; obtaining a Gaussian distribution corresponding to the first similarity characteristic based on the mean value and the variance corresponding to the first similarity characteristic; sampling from the Gaussian distribution to obtain sampling characteristics; and splicing the sampling characteristics and the first graph node characteristics to obtain the graph node characteristics of the predicted molecules.
In practical implementation, the re-sampling (reparameterization) trick of the variational auto-encoder may be adopted. That is, it is assumed that there is a distribution dedicated to the first similarity feature, and it is further assumed that this distribution is Gaussian. A Gaussian has two sets of parameters, a mean and a variance, so in order to determine the distribution specific to the first similarity feature, the corresponding mean and variance need to be obtained; they can be obtained by fitting through a neural network.
Here, assume that the sampling feature is z_G; the sampling feature then follows a Gaussian distribution, i.e. z_G ~ N(μ(m_G), σ²(m_G)), where m_G represents the first similarity feature.
After the Gaussian distribution specific to the first similarity feature is obtained, the sampling feature is sampled from it; the sampling feature comprises a plurality of feature items corresponding respectively to each first node in the first molecular graph. When the sampling feature is spliced with the first graph node features, specifically, each feature item in the sampling feature is spliced with the coding feature of the corresponding first node in the first graph node features, to obtain the graph node features of the predicted molecule. For example, when the first graph node features are expressed as {h_1^x, ..., h_{n_G}^x} and the sampling feature as {z_1, ..., z_{n_G}}, the graph node features of the predicted molecule may be represented as {[h_1^x; z_1], ..., [h_{n_G}^x; z_{n_G}]}.
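A minimal sketch of this re-sampling step, assuming the mean and log-variance networks are simple placeholder maps (the real model fits them with neural networks); for brevity a single shared sampling vector is spliced onto one node's coding feature.

```python
import math
import random

def sample_feature(m_g, rng):
    """Sample z_G ~ N(mu(m_G), sigma^2(m_G)) via z = mu + sigma * eps."""
    mu = [0.5 * v for v in m_g]        # placeholder "learned" mean network
    log_var = [0.0 for _ in m_g]       # placeholder "learned" log-variance network
    eps = [rng.gauss(0.0, 1.0) for _ in m_g]
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]

rng = random.Random(0)
m_G = [0.2, -0.4, 1.0]                 # first similarity feature (illustrative)
z_G = sample_feature(m_G, rng)         # sampling feature
h_i = [1.0, 2.0]                       # coding feature of one first node
node_feature = h_i + z_G               # spliced graph node feature of the predicted molecule
```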
Similarly, the tree node characteristics of the predicted molecule may be generated in the following manner: obtaining a Gaussian distribution corresponding to the second similarity characteristic based on the mean value and the variance corresponding to the second similarity characteristic; sampling from the Gaussian distribution to obtain sampling characteristics; and splicing the sampling characteristics and the first tree node characteristics to obtain the tree node characteristics of the predicted molecules.
Here, the manner of obtaining the Gaussian distribution corresponding to the second similarity feature is similar to the manner of obtaining the Gaussian distribution corresponding to the first similarity feature; and the manner of splicing the sampling feature with the first tree node features is similar to the manner of splicing the sampling feature with the first graph node features.
Here, assume that the sampling feature is z_T; the sampling feature then follows a Gaussian distribution, i.e. z_T ~ N(μ(m_T), σ²(m_T)), where m_T represents the second similarity feature. When the first tree node features are expressed as {u_1^x, ..., u_{n_T}^x} and the sampling feature as {z'_1, ..., z'_{n_T}}, the tree node features of the predicted molecule can be expressed as {[u_1^x; z'_1], ..., [u_{n_T}^x; z'_{n_T}]}.
Step 305: and respectively decoding the graph node characteristics and the tree node characteristics through a decoding layer to obtain predicted molecules represented by the molecular graph and the molecular tree.
In practical implementation, the graph node feature and the tree node feature may be decoded in a depth-first manner.
In some embodiments, the graph node features may be decoded by: processing the graph node features through a gated recurrent network to obtain the information vectors transferred between nodes; for any decoded node, obtaining the probability of adding a new node based on the information vectors transferred between this node and other nodes, and, when the probability is determined to be higher than a probability threshold, determining the type of the new node according to those information vectors, so as to add the new node at this node according to its type.
In practical implementation, after the graph node features of the predicted molecule are obtained, they are input into the gated recurrent network, and the information vectors transferred between any node and the other nodes can be obtained iteratively through the gated recurrent network.
For example, for the u-th node in the molecular graph of the predicted molecule, the information vector transferred between the u-th node and the v-th node in the molecular graph can be obtained through formula (11), that is:

h_uv = GRU(h_u^*, {h_vw : w ∈ N(v), w ≠ u})    (11)

wherein h_uv is the information vector transferred between the u-th node and the v-th node in the molecular graph, h_u^* represents the coding feature of the u-th node in the graph node features, h_vw represents the information vector transferred between the v-th node and the w-th node in the molecular graph, and the w-th node is a node other than the u-th node among the neighbor nodes of the v-th node.
When the t-th iteration reaches the u-th node, the node is recorded as u_t; then, the probability of adding a new node is first obtained according to formula (12) and formula (13):

e_{u_t} = τ(W1·h_{u_t} + W2·e_{u_{t-1}} + W3·Σ_v h_{v,u_t})    (12)
p_{u_t} = s(W4·e_{u_t})    (13)

wherein τ(·) represents the ReLU function, s(·) represents the sigmoid function, W1–W4 are weights in the network, e_{u_t} represents the output of the ReLU layer when the t-th iteration reaches the u-th node, e_{u_{t-1}} represents the output of the ReLU layer when the (t-1)-th iteration reaches its node, h_{v,u_t} represents the information vector transferred between the v-th node and the u-th node when the t-th iteration reaches the u-th node, and p_{u_t} denotes the probability of adding a new node when the t-th iteration reaches the u-th node.
Here, when the probability of adding a new node reaches the probability threshold, it is determined to add the new node; otherwise, the decoding ends. The probability threshold is set in advance, for example to 0.5.
When determining to add a new node, it is necessary to determine the type of the new node, i.e., the element corresponding to the new node. In actual implementation, the type of the new node can be determined by formula (14):

r_v = argmax(softmax(W5·h_uv))    (14)

wherein r_v indicates the type of the new node, W5 is a weight in the network, and h_uv is the information vector transferred between the u-th node and the v-th node in the molecular graph.
After the new node is determined, the current node can be expanded, that is, the new node is added as a neighbor node of the current node.
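The expand-or-stop decision above can be sketched as follows; the sum-pooling, single ReLU layer, and weight vector here are illustrative stand-ins for the gated recurrent network and the weights in formulas (12) and (13), not the patent's exact architecture.

```python
import math

def relu(xs):
    return [max(0.0, x) for x in xs]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def expand_probability(messages, w):
    """messages: information vectors h_{uv} arriving at the current node;
    w: an illustrative weight vector. Returns the add-new-node probability."""
    pooled = [sum(m[k] for m in messages) for k in range(len(w))]  # sum incoming vectors
    hidden = relu(pooled)                                          # tau(.) in the text
    return sigmoid(sum(wk * hk for wk, hk in zip(w, hidden)))      # s(.) in the text

messages = [[0.4, 1.2], [0.1, 0.3]]   # h_{uv} for two incident edges (illustrative)
w = [1.0, 1.0]                        # illustrative weights
p = expand_probability(messages, w)
add_new_node = p > 0.5                # probability threshold from the text
```

When `add_new_node` is false, decoding at this node ends, matching the stopping rule described above.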
Step 306: differences between the predicted molecules and the target molecules are obtained, and model parameters of the molecule generation model are updated based on the differences.
Here, the aim of training is to make the predicted molecule as similar as possible to the target molecule. In actual implementation, the value of the loss function may be determined according to the difference between the predicted molecule and the target molecule; whether the value of the loss function exceeds a preset threshold is judged, and when it does, an error signal of the molecule generation model is determined based on the loss function, the error information is propagated backwards through the model, and the model parameters of each layer are updated during the propagation.
To describe the back propagation: if there is an error between the predicted molecule output by the molecule generation model and the target molecule, the error between the predicted molecule and the target molecule is calculated and propagated backwards from the output layer through the hidden layers until it reaches the input layer; during the back propagation, the values of the model parameters are adjusted according to the error, and this process is iterated until convergence.
In some embodiments, the model parameters of the molecular generative model may be updated by: obtaining the probability that the predicted molecule is the same as the target molecule; acquiring information divergence between posterior probability distribution and standard Gaussian distribution of sampling features; determining the value of a variation loss function based on the probability and the information divergence; obtaining the central representation of a basic molecule, the central representation of a predicted molecule and the central representation of a target molecule; determining a value of a concealment loss function based on the central representation of the base molecule, the central representation of the predicted molecule, and the central representation of the target molecule; summing the value of the variation loss function and the value of the hidden loss function to obtain the value of the loss function of the molecular generation model; model parameters of the molecular generative model are updated based on values of a loss function of the molecular generative model.
Here, the loss function is divided into two parts, a variational loss function and a hidden loss function. The variational loss function is built from the probability that the predicted molecule is the same as the target molecule and the information divergence between the posterior probability distribution of the sampling feature and the standard Gaussian distribution; that is, it makes the probability that the predicted molecule is identical to the target molecule as high as possible, and makes the posterior distribution of the sampling feature as close to the standard Gaussian distribution as possible.
It should be noted that, when the posterior probability distribution of the sampling feature is consistent with the standard Gaussian distribution, the sampling feature can be sampled directly from the standard Gaussian distribution at application (inference) time.
In some embodiments, a variational loss function as shown in formula (15) may be employed:

L(θ, φ) = -E_{qφ(z|x,y)}[log pθ(y|x,z)] + KL(qφ(z|x,y) || p(z|x))    (15)

wherein L(θ, φ) represents the variational loss function and E represents the mathematical expectation; z comprises the sampling feature for the first similarity feature and the sampling feature for the second similarity feature, x represents the base molecule and y represents the target molecule; pθ(y|x,z) represents the probability that the generated predicted molecule is identical to the target molecule under the condition of x and z; qφ(z|x,y) represents the posterior probability distribution of z, where the posterior probability of z means the probability of obtaining z under the condition of x and y; p(z|x) represents the standard Gaussian distribution; and KL denotes the information divergence.
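Formula (15) can be sketched numerically as below, using the closed-form KL divergence between a diagonal Gaussian and the standard Gaussian; the inputs are illustrative scalars rather than outputs of the real model.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, log_var))

def variational_loss(log_p_y, mu, log_var):
    """-E[log p_theta(y | x, z)] + KL(q_phi(z | x, y) || p(z | x))."""
    return -log_p_y + kl_to_standard_normal(mu, log_var)

# With mu = 0 and unit variance the KL term vanishes, leaving only the
# reconstruction term.
loss = variational_loss(log_p_y=-1.2, mu=[0.0, 0.0], log_var=[0.0, 0.0])
```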
In practical implementation, the hidden loss function is used to constrain the central distances among the base molecule, the target molecule and the predicted molecule.
In some embodiments, a distance loss function as shown in formula (16) may be employed:

L_dist = max(0, ||G_x − G_ŷ|| − ||G_x − G_y||) + max(0, ||G_x − G_y|| − γ) + max(0, β − ||G_x − G_y'||)    (16)

wherein x is the base molecule, y is the target molecule, ŷ is the predicted molecule, and y' is a randomly selected molecule with lower similarity to the molecular structure of x; correspondingly, G_x is the central representation of x, G_ŷ is the central representation of ŷ, G_y is the central representation of y, G_y' is the central representation of y', and γ, β are hyper-parameters.
By the formula (16), the central distance between the basic molecule and the predicted molecule can be made smaller than the central distance between the basic molecule and the target molecule; the central distance between the basic molecule and the target molecule is less than gamma; and the center distance of the basic molecule from the less similar molecule is greater than beta.
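One hedged reading of the distance constraints in formula (16) is a sum of hinge terms, as sketched below; the exact functional form in the patent may differ, but these three terms encode the three orderings just listed. The center vectors are illustrative.

```python
def dist(a, b):
    """Euclidean distance between two center representations."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def center_loss(g_x, g_pred, g_y, g_neg, gamma, beta):
    term1 = max(0.0, dist(g_x, g_pred) - dist(g_x, g_y))  # predicted closer than target
    term2 = max(0.0, dist(g_x, g_y) - gamma)              # target within gamma of base
    term3 = max(0.0, beta - dist(g_x, g_neg))             # dissimilar molecule beyond beta
    return term1 + term2 + term3

g_x, g_pred, g_y, g_neg = [0.0, 0.0], [0.1, 0.0], [0.3, 0.0], [2.0, 0.0]
loss = center_loss(g_x, g_pred, g_y, g_neg, gamma=0.5, beta=1.0)
# All three constraints hold for these centers, so each hinge term is zero.
```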
In some embodiments, when the first graph node feature comprises coding features of at least two first nodes in the first graph and the first tree node feature comprises coding features of at least two second nodes in the first molecular tree, the central representation of the base molecule may be obtained by:
acquiring a first average value of the coding features of at least two first nodes in a first molecular graph and a second average value of the coding features of at least two second nodes in a first molecular tree; and splicing the first average value and the second average value to obtain the central representation of the basic molecules.
In practical implementation, when the first graph node features are expressed as {h_1^x, ..., h_{n_G}^x} and the first tree node features as {u_1^x, ..., u_{n_T}^x}, the central representation of the base molecule can be obtained by formula (17):

G_x = [(1/n_G)·Σ_i h_i^x ; (1/n_T)·Σ_j u_j^x]    (17)
Accordingly, when the second graph node features are expressed as {h_1^y, ..., h_{n'_G}^y} and the second tree node features as {u_1^y, ..., u_{n'_T}^y}, the central representation of the target molecule can be obtained by formula (18):

G_y = [(1/n'_G)·Σ_q h_q^y ; (1/n'_T)·Σ_q u_q^y]    (18)
The central representation of the predicted molecule and the central representation of the molecule having lower structural similarity to the base molecule can be obtained in the same manner as described above.
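The central representation just described (averaging the graph node features and the tree node features separately, then splicing the two means) can be sketched as follows, with illustrative feature values.

```python
def central_representation(graph_feats, tree_feats):
    """Splice the mean of the graph node features with the mean of the tree
    node features, as in formulas (17) and (18)."""
    def mean(feats):
        n = len(feats)
        return [sum(f[k] for f in feats) / n for k in range(len(feats[0]))]
    return mean(graph_feats) + mean(tree_feats)

graph_feats = [[1.0, 3.0], [3.0, 1.0]]   # coding features of the first nodes
tree_feats = [[0.0, 2.0]]                # coding features of the second nodes
G_x = central_representation(graph_feats, tree_feats)
```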
After the molecule generation model is obtained through training, molecules can be generated based on the trained model. Fig. 6 is a schematic flow chart of a molecular generation method based on a molecular generation model provided in an embodiment of the present application; referring to Fig. 6, the molecular generation method provided in the embodiment of the present application includes:
step 601: the server obtains the base molecule.
Here, the base molecule is represented by a first molecular graph and a first molecular tree. In actual implementation, a user can input the base molecule through the terminal; after the base molecule is input, the terminal automatically acquires it and sends a molecule generation request corresponding to the base molecule to the server, so that the server obtains the base molecule and the corresponding molecule generation request.
In practical applications, the molecular generation method based on the molecular generation model provided by the embodiment of the present application can be applied to drug discovery, molecular optimization, molecular generation and the like in the fields of structural biology and medicine. For example, a user may input a known base drug molecule to generate a new drug molecule via the molecular generation model, where the new drug molecule has better molecular properties than the base drug molecule.
Step 602: and coding the basic molecules in the molecule pair sample through the coding layer to obtain a first graph node characteristic of the first graph and a first tree node characteristic of the first molecular tree.
In practical implementation, the first molecular graph and the first molecular tree are input into the encoding layer, and the encoding layer encodes them to obtain the first graph node features of the first molecular graph and the first tree node features of the first molecular tree.
In some embodiments, the at least two first nodes in the first molecular graph are encoded through the graph encoding network in the encoding layer to obtain the coding features of the at least two first nodes, which are used as the first graph node features; and the at least two second nodes in the first molecular tree are encoded through the tree encoding network in the encoding layer to obtain the coding features of the at least two second nodes, which are used as the first tree node features.
Step 603: generating graph node features of the predicted numerator based on the first graph node features by generating a layer, and generating tree node features of the predicted numerator according to the first tree node features.
In actual implementation, sampling is carried out from the standard Gaussian distribution to obtain sampling characteristics corresponding to the first graph node characteristics and sampling characteristics corresponding to the first tree node characteristics; splicing the sampling features corresponding to the first graph node features with the first graph node features to obtain graph node features of the predicted molecules; and splicing the sampling characteristics corresponding to the first tree node characteristics with the first tree node characteristics to obtain the tree node characteristics of the predicted molecules.
Step 604: and respectively decoding the graph node characteristics and the tree node characteristics through a decoding layer to obtain the predicted molecules.
The molecular generation model is obtained by training based on the training method of the molecular generation model.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
Fig. 7 is a schematic flowchart of a training method of a molecular generation model provided in an embodiment of the present application; referring to Fig. 7, the training method of the molecular generation model provided in the embodiment of the present application includes:
step 701: a sample of molecule pairs is constructed.
Here, a molecular pair sample is screened from the dataset, comprising a base molecule x and a target molecule y, wherein the molecular property of the base molecule x is lower than the molecular property of the target molecule y, and the structural similarity between the base molecule x and the target molecule y is greater than a similarity threshold. Here, the molecular property of the base molecule x is lower than that of the target molecule y, meaning that the target molecule has a significant improvement over the base molecule for the desired property.
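As a toy illustration of this screening step, the sketch below keeps a pair (x, y) when y's property score is higher and the structural similarity, here a Tanimoto coefficient over made-up fingerprint bit-sets, exceeds a threshold; all names, fingerprints, and scores are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprint bit-sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def screen_pairs(molecules, sim_threshold):
    """molecules: list of (name, fingerprint_set, property_score).
    Returns (base molecule, target molecule) pairs passing both criteria."""
    pairs = []
    for name_x, fp_x, prop_x in molecules:
        for name_y, fp_y, prop_y in molecules:
            if prop_y > prop_x and tanimoto(fp_x, fp_y) > sim_threshold:
                pairs.append((name_x, name_y))
    return pairs

mols = [("x", {1, 2, 3, 4}, 0.3), ("y", {1, 2, 3, 5}, 0.8), ("z", {9}, 0.9)]
pairs = screen_pairs(mols, sim_threshold=0.5)
# Only ("x", "y") survives: z is more potent but structurally dissimilar.
```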
In practical implementation, the base molecule x and the target molecule y are represented by a graph structure and a tree structure, i.e., the base molecule is represented by a first molecular graph and a first molecular tree, and the target molecule is represented by a second molecular graph and a second molecular tree.
In practical application, for any first node i in the first molecular graph, the attribute feature of the first node i can be represented as f_i; for any two adjacent first nodes i and j in the first molecular graph, the attribute feature of the edge connecting nodes i and j can be represented as f_ij.
Correspondingly, the nodes and edges in the first molecular tree, the second molecular graph and the second molecular tree can also be represented in a similar manner.
Step 702: the molecular pair sample is encoded.
In practical implementation, the base molecule and the target molecule are respectively input into corresponding coding networks, wherein each coding network comprises two sub-coding networks: graph coding networks and tree coding networks. Here, the graph coding network is used to process data of the graph structure; the tree coding network is used for processing the data of the tree structure.
Here, the encoding of the first molecular graph by the graph encoding network is taken as an example. In practical applications, the first molecular graph includes at least two first nodes, and the number of layers of the graph encoding network is at least two, so the coding features of the first nodes output by the t-th layer of the graph encoding network can be determined as follows:
When t = 1, the edge coding feature of the edge connecting the first node i and the first node j obtained through the t-th layer network is acquired according to the formula

h_ij^(1) = f1(f_i, f_j, {f_jk : k ∈ N(j), k ≠ i})

wherein h_ij^(1) is the edge coding feature of the edge connecting the first node i and the first node j obtained through the first-layer network, and the first node j is a neighbor node of the first node i; f1 represents a neural network; f_i is the attribute feature of the first node i; f_j is the attribute feature of the first node j; f_jk is the attribute feature of the edge between the first node j and the first node k, and the first node k is a neighbor node of the first node j.
Then, the coding feature of the first node obtained through the t-th layer network is acquired according to the formula

h_i^(t) = f2(f_i, {h_ij^(t) : j ∈ N(i)})

wherein f2 represents a neural network and h_i^(t) indicates the coding feature of the first node i output by the t-th layer network.
When t > 1, the edge coding feature of the edge connecting the first node i and the first node j obtained through the t-th layer network is acquired according to the formula

h_ij^(t) = f1(h_i^(t-1), h_j^(t-1), {h_jk^(t-1) : k ∈ N(j), k ≠ i})

wherein h_ij^(t) is the edge coding feature of the edge connecting the first node i and the first node j obtained through the t-th layer network, and the first node j is a neighbor node of the first node i; f1 represents a neural network; h_i^(t-1) is the coding feature of the first node i obtained through the (t-1)-th layer network; h_j^(t-1) is the coding feature of the first node j obtained through the (t-1)-th layer network; h_jk^(t-1) is the coding feature of the edge between the first node j and the first node k obtained through the (t-1)-th layer network, the first node k being a neighbor node of the first node j.
The coding feature of the first node obtained through the t-th layer network is acquired according to the formula

h_i^(t) = f2(h_i^(t-1), {h_ij^(t) : j ∈ N(i)})

wherein f2 represents a neural network, h_i^(t) indicates the coding feature of the first node i output by the t-th layer network, and h_i^(t-1) is the coding feature of the first node i obtained through the (t-1)-th layer network.
Here, the coding feature of the first node i obtained through the last layer of network processing can be expressed as h_i^x, and the first graph node features as {h_1^x, ..., h_{n_G}^x}, where n_G represents the number of first nodes in the first molecular graph.
In practical implementation, similar methods may be used to encode the first molecular tree, the second molecular graph and the second molecular tree; correspondingly, the first tree node features {u_1^x, ..., u_{n_T}^x}, the second graph node features {h_1^y, ..., h_{n'_G}^y} and the second tree node features {u_1^y, ..., u_{n'_T}^y} can be obtained.
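The two-step encoding above (edge features via f1, then node features via f2, iterated over layers) can be sketched as follows, with scalar features and simple sums standing in for the neural networks f1 and f2; the toy graph and values are illustrative.

```python
def encode_graph(node_attrs, edge_attrs, neighbors, T):
    """node_attrs[i]: attribute feature (a scalar here for brevity);
    edge_attrs[(i, j)]: attribute feature of the directed edge i->j;
    neighbors[i]: list of neighbour node indices; T: number of layers."""
    h_node = dict(node_attrs)          # layer-0 node features
    h_edge = dict(edge_attrs)          # layer-0 edge features
    for _ in range(T):
        new_edge = {}
        for (i, j) in edge_attrs:
            # f1 stand-in: endpoint features plus j's other incident edges
            incoming = sum(h_edge[(j, k)] for k in neighbors[j] if k != i)
            new_edge[(i, j)] = h_node[i] + h_node[j] + incoming
        h_edge = new_edge
        # f2 stand-in: previous node feature plus incident edge features
        h_node = {i: h_node[i] + sum(h_edge[(i, j)] for j in neighbors[i])
                  for i in neighbors}
    return h_node                      # final-layer node coding features

# A 2-node toy molecular graph (edges stored in both directions).
node_attrs = {0: 1.0, 1: 2.0}
edge_attrs = {(0, 1): 0.5, (1, 0): 0.5}
neighbors = {0: [1], 1: [0]}
h = encode_graph(node_attrs, edge_attrs, neighbors, T=2)
```

The same loop structure applies to the tree encoding network, with tree nodes and tree edges in place of graph ones.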
Step 703: and acquiring the similarity characteristic between two molecules in the molecule pair sample through a node alignment strategy.
First, the molecule centers are aligned.
Here, the central representation G_x of the base molecule x is obtained according to formula (17), i.e. G_x = [(1/n_G)·Σ_i h_i^x ; (1/n_T)·Σ_j u_j^x]; the central representation G_y of the target molecule y is obtained according to formula (18).
To constrain the center of the generated predicted molecule, a molecule y' with low structural similarity to the base molecule is selected from the data set. Here, the central representations of the predicted molecule and of the molecule y' can be obtained in the same manner as described above.
A contrastive strategy is used to constrain the central distances among the base molecule x, the target molecule y, the predicted molecule ŷ and the molecule y':

L_dist = max(0, ||G_x − G_ŷ|| − ||G_x − G_y||) + max(0, ||G_x − G_y|| − γ) + max(0, β − ||G_x − G_y'||)

wherein G_x is the central representation of x, G_ŷ is the central representation of ŷ, G_y is the central representation of y, G_y' is the central representation of y', and γ, β are hyper-parameters.
And then, carrying out node alignment on the graph structure level and the tree structure level.
Here, for any first node in the first graph and any third node in the second graph, the bidirectional similarity between the two nodes needs to be calculated.
Exemplarily, for the i-th first node in the first molecular graph and the q-th third node in the second molecular graph, the first similarity w_iq from the i-th first node in the first molecular graph to the q-th third node in the second molecular graph is obtained; and the second similarity w_qi from the q-th third node in the second molecular graph to the i-th first node in the first molecular graph is obtained.
By the above method, the bidirectional similarities between each first node in the first molecular graph and the at least two third nodes in the second molecular graph can be obtained; after the bidirectional similarities are obtained, the coding features of the first nodes in the first molecular graph and of the third nodes in the second molecular graph are aggregated according to the bidirectional similarities, so as to obtain the first similarity feature of the first molecular graph and the second molecular graph.
Specifically, the similarity feature m_{Gx→Gy} from the first molecular graph to the second molecular graph is obtained according to formula (7), and the similarity feature m_{Gy→Gx} from the second molecular graph to the first molecular graph is obtained according to formula (8); m_{Gx→Gy} and m_{Gy→Gx} are then spliced to obtain the first similarity feature m_G of the first molecular graph and the second molecular graph.
Accordingly, in the same manner as described above, the similarity feature m_{Tx→Ty} from the first molecular tree to the second molecular tree is obtained according to formula (9), and the similarity feature m_{Ty→Tx} from the second molecular tree to the first molecular tree is obtained according to formula (10); m_{Tx→Ty} and m_{Ty→Tx} are then spliced to obtain the second similarity feature m_T.
Step 704: and sampling the similarity characteristics to obtain sampling characteristics, and generating graph node characteristics and tree node characteristics of the predicted molecules.
Here, the re-sampling technique of the variational auto-encoder is adopted to sample the similarity features in the hidden space and obtain the sampling features, that is, z_G ~ N(μ(m_G), σ²(m_G)) and z_T ~ N(μ(m_T), σ²(m_T)). Then, the sampling feature z_G is spliced with the first graph node features to obtain the graph node features of the predicted molecule, and the sampling feature z_T is spliced with the first tree node features to obtain the tree node features of the predicted molecule.
When the sampling features are spliced with the first graph node features, specifically, each feature item in the sampling features is respectively spliced with the coding features of each first node in the first graph node features.
Step 705: and decoding the graph node characteristics and the tree node characteristics of the predicted molecules to obtain the predicted molecules.
Here, decoding of the graph node feature will be described as an example.
For the u-th node in the molecular diagram of the predicted molecule, the molecular diagram can be obtained byAnd acquiring an information vector transmitted between the information vector and the v-th node in the molecular graph.
where h_uv is the information vector passed between the u-th node and the v-th node in the molecular graph; the coding feature of the r-th node in the molecular graph is also used; and h_vw denotes the information vector passed between the v-th node and the w-th node in the molecular graph, the w-th node being a node other than the u-th node among the neighbor nodes of the v-th node.
When the t-th iteration reaches the u-th node, the node is denoted as u_t; the probability of adding a new node is then acquired accordingly.
Wherein τ (-) represents a ReLU function, s (-) represents a sigmoid function,for the purpose of the weights in the network,denotes the t-th timeWhen the iteration reaches the u node, the output of the ReLU layer;representing the output of the ReLU layer when the t-1 th iteration reaches the u-th node; h isvtwRepresenting information vectors transmitted between the u-th node and the v-th node when the t-th iteration reaches the u-th node;and the probability of adding a new node when the nth iteration reaches the u-th node is shown.
Here, when the probability of adding a new node reaches the probability threshold, it is determined to add the new node; otherwise, decoding ends. The probability threshold is set in advance, for example, to 0.5.
When it is determined to add a new node, the type of the new node needs to be determined; specifically, the type of the new node is determined accordingly.
where r_v indicates the type of the new node, a weight in the network is used, and h_uv is the information vector passed between the u-th node and the v-th node in the molecular graph.
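The add-node decision and node-type selection described above can be sketched as follows. The aggregation of incoming information vectors, the ReLU-then-sigmoid wiring, and all weight shapes are illustrative assumptions, not the patent's exact network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def add_node_probability(incoming_msgs, w1, w2):
    """Probability of adding a new node at the current decoding step,
    computed from aggregated incoming information vectors via a ReLU
    layer followed by a sigmoid (wiring assumed for illustration)."""
    hidden = np.maximum(incoming_msgs.sum(axis=0) @ w1, 0.0)  # ReLU layer output
    return float(sigmoid(hidden @ w2))

def new_node_type(h_uv, type_weights):
    """Score each candidate node type from the information vector h_uv
    and return the highest-scoring type index."""
    return int(np.argmax(h_uv @ type_weights))

rng = np.random.default_rng(1)
msgs = rng.standard_normal((3, 6))               # information vectors into the node
w1, w2 = rng.standard_normal((6, 6)), rng.standard_normal(6)
p = add_node_probability(msgs, w1, w2)
if p >= 0.5:                                     # preset probability threshold
    node_type = new_node_type(msgs[0], rng.standard_normal((6, 4)))
```

When the probability falls below the threshold, decoding at this node ends, mirroring the termination rule above.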
Step 706: based on the predicted molecules and the target molecules, values of the loss function are calculated, and model parameters are updated according to the values of the loss function.
After the predicted molecule is obtained, it is expected to be as similar as possible to the target molecule in the original molecule pair; the variational loss function can be obtained by adopting the evidence lower bound of the conditional variational autoencoder.
where L(θ, φ) denotes the variational loss function and E denotes the mathematical expectation; z denotes the sampling feature, x the base molecule, and y the target molecule; p_θ(y | x, z) denotes the probability that the generated predicted molecule is identical to the target molecule given x and z; q_φ(z | x, y) denotes the posterior probability distribution of z; and p(z | x) denotes a standard Gaussian distribution.
In practical applications, the loss function of the entire molecular generation model is L = L(θ, φ) + L_latent. The value of the loss function can then be calculated from the predicted molecule and the target molecule, and the model parameters are updated according to the value of the loss function until convergence, so as to train the molecular generation model.
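A minimal sketch of this combined loss, assuming a diagonal-Gaussian posterior (so the KL term has its standard closed form) and treating the latent loss L_latent as an externally supplied term; the function names are hypothetical:

```python
import numpy as np

def kl_standard_normal(mu, log_var):
    """Closed-form KL(q || N(0, I)) for a diagonal Gaussian q."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def variational_loss(recon_log_prob, mu, log_var):
    """Negative ELBO: -E[log p(y|x,z)] + KL(q(z|x,y) || p(z|x))."""
    return -recon_log_prob + kl_standard_normal(mu, log_var)

def total_loss(recon_log_prob, mu, log_var, latent_loss):
    """L = L(theta, phi) + L_latent, with L_latent supplied externally."""
    return variational_loss(recon_log_prob, mu, log_var) + latent_loss
```

For example, with a zero-mean unit-variance posterior the KL term vanishes and the total loss reduces to the reconstruction term plus L_latent.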
After the molecule generation model is obtained through training, molecules can be generated based on the trained model. Fig. 8 is a schematic flow chart of a molecule generation method based on a molecule generation model provided in an embodiment of the present application; referring to fig. 8, the molecule generation method provided in the embodiment of the present application includes:
step 801: obtaining a basic molecule.
In actual implementation, a user can input a base molecule through the terminal; after the base molecule is input, the terminal automatically acquires it and sends a molecule generation request corresponding to the base molecule to the server, so that the server obtains the base molecule and the corresponding molecule generation request.
Here, the base molecule is represented by a first molecular graph and a first molecular tree.
Step 802: the base molecule is encoded.
In practical implementation, the process of encoding the base molecule is the same as the encoding process in training, yielding the first graph node features and the first tree node features.
Step 803: sampling characteristics are sampled from the standard Gaussian distribution, and graph node characteristics and tree node characteristics of the predicted molecules are generated.
Here, two sampling features are sampled from two standard Gaussian distributions respectively; one sampling feature is spliced with the first graph node features to obtain the graph node features of the predicted molecule, and the other sampling feature is spliced with the first tree node features to obtain the tree node features of the predicted molecule.
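At generation time the sampling feature thus comes from the standard Gaussian prior N(0, I) rather than from the training-time posterior; a sketch with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(42)

def generate_features(node_feats, latent_dim, rng):
    """Sample z from the standard Gaussian prior N(0, I) and splice it
    onto every node's coding feature."""
    z = rng.standard_normal(latent_dim)
    return np.concatenate([node_feats, np.tile(z, (node_feats.shape[0], 1))], axis=1)

graph_feats = rng.standard_normal((6, 8))            # first graph node features
tree_feats = rng.standard_normal((4, 8))             # first tree node features
pred_graph = generate_features(graph_feats, 4, rng)  # one prior sample for the graph
pred_tree = generate_features(tree_feats, 4, rng)    # an independent sample for the tree
```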
Step 804: and decoding the graph node characteristics and the tree node characteristics of the predicted molecules to obtain the predicted molecules.
The embodiment of the application has the following beneficial effects:
1. The defect that it is difficult for traditional molecule generation methods to maintain the structural similarity between a new molecule and the original molecule is overcome;
2. The method has broad application prospects: it can generate new drug molecules with better molecular properties from existing drug molecules, and improves molecular properties to a greater extent than molecular generation models in the related art.
Continuing with the exemplary structure of the training apparatus 455 for the molecular generation model provided in the embodiment of this application implemented as software modules, fig. 9 is a schematic structural diagram of the training apparatus for the molecular generation model provided in the embodiment of this application. Referring to fig. 9, the molecular generation model includes an encoding layer, an alignment layer, a generation layer, and a decoding layer, and the software modules of the training apparatus 455 for the molecular generation model may include:
a first obtaining module 4551, configured to obtain a molecule pair sample including a base molecule and a target molecule;
wherein the molecular property of the target molecule is higher than the molecular property of the base molecule, the base molecule being represented by a first molecular graph and a first molecular tree, the target molecule being represented by a second molecular graph and a second molecular tree;
a first encoding module 4552, configured to encode, through the encoding layer, the base molecule in the molecule pair sample to obtain a first graph node feature of the first molecular graph and a first tree node feature of the first molecular tree, and encode the target molecule in the molecule pair sample to obtain a second graph node feature of the second molecular graph and a second tree node feature of the second molecular tree;
an alignment module 4553, configured to match, through the alignment layer, the first graph node feature with the second graph node feature to obtain a first similarity feature of the first molecular graph and the second molecular graph, and match the first tree node feature with the second tree node feature to obtain a second similarity feature of the first molecular tree and the second molecular tree;
a first generating module 4554, configured to generate, through the generation layer, a graph node feature of a predicted molecule according to the first similarity feature and the first graph node feature, and generate a tree node feature of the predicted molecule according to the second similarity feature and the first tree node feature;
a first decoding module 4555, configured to decode, through the decoding layer, the graph node feature and the tree node feature respectively to obtain the predicted molecule represented by a molecular graph and a molecular tree;
an updating module 4556 configured to obtain a difference between the predicted molecule and the target molecule, and update a model parameter of the molecule generation model based on the difference.
In some embodiments, the first encoding module 4552 is further configured to encode at least two first nodes in the first graph through a graph coding network in a coding layer, obtain coding features of the at least two first nodes, and use the coding features of the at least two first nodes as the first graph node features; wherein the first molecular diagram corresponds to the molecular structural topology of the base molecule, the at least two first nodes corresponding to the constituent elements that make up the base molecule;
coding at least two second nodes in the first molecular tree through a tree coding network in the coding layer to obtain coding features of the at least two second nodes, and taking the coding features of the at least two second nodes as the first tree node features; wherein the first molecular tree is constructed with the constituent elements of the base molecule as second nodes based on the molecular structure of the base molecule.
In some embodiments, the first encoding module 4552 is further configured to, for each first node in the first graph:
when at least two edges connecting the first node exist, acquiring edge coding characteristics of the at least two edges connecting the first node;
summing the edge coding features of the at least two edges to obtain a first edge aggregation feature;
and generating the node coding feature of the first node based on the attribute feature of the first node and the first edge aggregation feature.
In some embodiments, the first encoding module 4552 is further configured to, for each of at least two edges connecting the first node:
when the edge is the edge connecting the first node and the neighbor node, acquiring attribute characteristics of at least two edges connecting the neighbor node;
summing the attribute characteristics of at least two edges connecting the neighbor nodes to obtain a second edge aggregation characteristic;
and generating edge coding characteristics of the edge based on the attribute characteristics of the first node, the attribute characteristics of the neighbor nodes and the second edge aggregation characteristics.
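These two aggregation steps can be sketched as follows; the ReLU nonlinearity, the concatenation ordering, and the weight matrices are illustrative assumptions rather than the patent's exact encoder:

```python
import numpy as np

rng = np.random.default_rng(7)

def edge_coding_feature(attr_u, attr_v, other_edge_attrs_of_v, W):
    """Edge coding feature for edge (u, v): combine u's and v's attribute
    features with the summed attributes of v's other incident edges
    (the second edge aggregation feature)."""
    second_agg = other_edge_attrs_of_v.sum(axis=0)
    x = np.concatenate([attr_u, attr_v, second_agg])
    return np.maximum(x @ W, 0.0)                 # ReLU is an assumed nonlinearity

def node_coding_feature(attr_u, edge_feats_of_u, V):
    """Node coding feature: node attributes plus the summed coding features
    of the edges incident to u (the first edge aggregation feature)."""
    first_agg = edge_feats_of_u.sum(axis=0)
    x = np.concatenate([attr_u, first_agg])
    return np.maximum(x @ V, 0.0)

attr_u, attr_v = rng.standard_normal(4), rng.standard_normal(4)
W = rng.standard_normal((12, 6))                  # 4 + 4 + 4 inputs -> 6 outputs
e_uv = edge_coding_feature(attr_u, attr_v, rng.standard_normal((3, 4)), W)
V = rng.standard_normal((10, 6))                  # 4 + 6 inputs -> 6 outputs
h_u = node_coding_feature(attr_u, np.stack([e_uv, e_uv]), V)
```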
In some embodiments, the first graph node characteristics comprise coding characteristics of at least two first nodes in a first graph, and the second graph node characteristics comprise coding characteristics of at least two third nodes in a second graph;
the alignment module 4553 is further configured to obtain, based on the first graph node feature and the second graph node feature, a first similarity from each first node in the first graph to at least two third nodes in the second graph, and a second similarity from each third node in the second graph to at least two first nodes in the first graph;
and according to the first similarity and the second similarity, aggregating the coding features of at least two first nodes in the first graph node feature and the second graph node feature to obtain a first similarity feature of the first graph and the second graph.
In some embodiments, the aligning module 4553 is further configured to aggregate the coding features of at least two first nodes in the first graph node features and the coding features of at least two third nodes in the second graph node features according to the first similarity, so as to obtain similarity features from the first molecular graph to the second molecular graph;
according to the second similarity, the coding features of at least two first nodes in the first graph node features and the coding features of at least two third nodes in the second graph node features are aggregated to obtain similarity features from the second graph to the first graph;
and splicing the similarity features from the first molecular diagram to the second molecular diagram and the similarity features from the second molecular diagram to the first molecular diagram to obtain the first similarity features of the first molecular diagram and the second molecular diagram.
In some embodiments, the aligning module 4553 is further configured to perform weighted summation on the coding features of at least two third nodes in the second graph according to first similarities from the first nodes in the first graph to the at least two third nodes in the second graph, respectively, to obtain first aggregation features corresponding to the first nodes in the first graph;
respectively splicing the coding features of each first node in the first graph with corresponding first aggregation features to obtain first splicing features corresponding to each first node in the first graph;
and summing the first splicing characteristics corresponding to each first node in the first molecular graph to obtain the similarity characteristics from the first molecular graph to the second molecular graph.
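One way to realize this weighted-sum, splice, and sum pipeline in each direction is sketched below; the use of dot-product similarity with a per-node softmax normalization is an assumption for illustration, since the patent does not fix the similarity measure here:

```python
import numpy as np

def directed_similarity_feature(src, dst):
    """One direction of the alignment: weight dst node features by a
    per-src-node softmax over similarity scores, splice each src feature
    with its aggregation feature, then sum over src nodes."""
    scores = src @ dst.T                               # pairwise similarity (assumed dot product)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # normalized similarities
    agg = weights @ dst                                # first aggregation features
    spliced = np.concatenate([src, agg], axis=1)       # first splicing features
    return spliced.sum(axis=0)

def first_similarity_feature(first_graph, second_graph):
    """Splice the graph-to-graph similarity features in both directions."""
    return np.concatenate([directed_similarity_feature(first_graph, second_graph),
                           directed_similarity_feature(second_graph, first_graph)])

rng = np.random.default_rng(3)
feat = first_similarity_feature(rng.standard_normal((5, 8)),   # first graph node features
                                rng.standard_normal((7, 8)))   # second graph node features
```

The same construction applies to the tree node features to yield the second similarity feature.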
In some embodiments, the first generating module 4554 is further configured to obtain a mean and a variance corresponding to the first similarity feature based on the first similarity feature;
obtaining a Gaussian distribution corresponding to a first similarity characteristic based on a mean value and a variance corresponding to the first similarity characteristic;
sampling from the Gaussian distribution to obtain sampling characteristics;
and splicing the sampling characteristics and the first graph node characteristics to obtain the graph node characteristics of the predicted molecules.
In some embodiments, the updating module 4556 is further configured to obtain a probability that the predicted molecule is the same as the target molecule;
acquiring the information divergence between the posterior probability distribution of the sampling feature and the standard Gaussian distribution;
determining a value of a variational loss function based on the probability and the information divergence;
obtaining the central representation of a basic molecule, the central representation of a predicted molecule and the central representation of a target molecule;
determining a value of a latent loss function based on the central representation of the base molecule, the central representation of the predicted molecule, and the central representation of the target molecule;
summing the value of the variational loss function and the value of the latent loss function to obtain the value of the loss function of the molecular generation model;
updating model parameters of the molecular generative model based on values of a loss function of the molecular generative model.
In some embodiments, the first graph node features comprise coding features of at least two first nodes in the first molecular graph, and the first tree node features comprise coding features of at least two second nodes in the first molecular tree;
the updating module 4556 is further configured to obtain a first average value of the coding features of the at least two first nodes in the first molecular graph and a second average value of the coding features of the at least two second nodes in the first molecular tree;
and splicing the first average value and the second average value to obtain the central representation of the basic molecules.
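This central representation reduces to a mean-then-splice operation, for example:

```python
import numpy as np

def central_representation(graph_node_feats, tree_node_feats):
    """Splice the mean graph node feature (first average value) with the
    mean tree node feature (second average value)."""
    return np.concatenate([graph_node_feats.mean(axis=0),
                           tree_node_feats.mean(axis=0)])

rng = np.random.default_rng(5)
center = central_representation(rng.standard_normal((5, 8)),   # graph node features
                                rng.standard_normal((3, 8)))   # tree node features
```

The same function applied to the predicted and target molecules yields the centers used by the latent loss.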
In some embodiments, the first decoding module 4555 is further configured to process the graph node features through a gated recurrent network to obtain information vectors passed between nodes;
for any decoded node, obtaining the probability of adding new node based on the information vector transmitted between the node and other nodes, and
and when the probability is determined to be higher than the probability threshold value, determining the type of the new node according to the information vector transmitted between the node and other nodes, and adding the new node at the node according to the type of the new node.
A molecular generation apparatus based on a molecular generation model, the molecular generation model comprising: an encoding layer, a generating layer, and a decoding layer, the apparatus comprising:
a second obtaining module for obtaining a base molecule, the base molecule being represented by the first molecular graph and the first molecular tree;
a second encoding module, configured to encode the base molecule through the encoding layer to obtain a first graph node feature of the first molecular graph and a first tree node feature of the first molecular tree;
a second generation module, configured to generate, through the generation layer, graph node features of the predicted molecule according to the standard Gaussian distribution and the first graph node features, and tree node features of the predicted molecule according to the standard Gaussian distribution and the first tree node features;
the second decoding module is used for respectively decoding the graph node characteristics and the tree node characteristics through the decoding layer to obtain the predicted molecules;
the molecular generation model is obtained by training based on the training method of the molecular generation model provided by the embodiment of the application.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a method provided by embodiments of the present application, for example, the method as illustrated in fig. 3.
In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.
Claims (15)
1. A method for training a molecular generative model, wherein the molecular generative model comprises an encoding layer, an alignment layer, a generation layer, and a decoding layer, the method comprising:
obtaining a molecule pair sample containing basic molecules and target molecules, wherein the molecule pair sample is a biomolecule pair sample or a drug molecule pair sample;
wherein the molecular property of the target molecule is higher than the molecular property of the base molecule, the base molecule being represented by a first molecular graph and a first molecular tree, the target molecule being represented by a second molecular graph and a second molecular tree;
encoding the basic molecules in the molecule pair sample through the encoding layer to obtain a first graph node characteristic of a first graph and a first tree node characteristic of a first molecular tree, and encoding the target molecules in the molecule pair sample to obtain a second graph node characteristic of a second graph and a second tree node characteristic of a second molecular tree;
matching the first graph node features with the second graph node features through the alignment layer to obtain a first similarity feature of the first molecular graph and the second molecular graph, and matching the first tree node features with the second tree node features to obtain a second similarity feature of the first molecular tree and the second molecular tree;
generating graph node characteristics of a predicted molecule according to the first similarity characteristics and the first graph node characteristics through the generation layer, and generating tree node characteristics of the predicted molecule according to the second similarity characteristics and the first tree node characteristics;
decoding the graph node characteristics and the tree node characteristics respectively through the decoding layer to obtain the predicted molecules represented by a molecular graph and a molecular tree;
obtaining a difference between the predicted molecule and the target molecule, and updating model parameters of the molecule generation model based on the difference.
2. The method of claim 1, wherein said encoding the base molecule in the sample of the molecular pair to obtain a first graph node feature of a first graph and a first tree node feature of a first molecular tree comprises:
coding at least two first nodes in the first graph through a graph coding network in a coding layer to obtain coding characteristics of the at least two first nodes, and taking the coding characteristics of the at least two first nodes as the first graph node characteristics; wherein the first molecular diagram corresponds to the molecular structural topology of the base molecule, the at least two first nodes corresponding to the constituent elements that make up the base molecule;
coding at least two second nodes in the first molecular tree through a tree coding network in the coding layer to obtain coding features of the at least two second nodes, and taking the coding features of the at least two second nodes as the first tree node features; wherein the first molecular tree is constructed with the constituent elements of the base molecule as second nodes based on the molecular structure of the base molecule.
3. The method of claim 2, wherein the encoding at least two first nodes in the first graph comprises:
for each first node in the first graph, performing the following:
when at least two edges connecting the first node exist, acquiring edge coding characteristics of the at least two edges connecting the first node;
summing the edge coding features of the at least two edges to obtain a first edge aggregation feature;
and generating a node coding feature of the first node based on the attribute feature of the first node and the first edge aggregation feature.
4. The method of claim 3, wherein the obtaining edge coding features for at least two edges connecting the first node comprises:
for each of at least two edges connecting the first node, performing the following operations:
when the edge is the edge connecting the first node and the neighbor node and at least two edges connecting the neighbor node exist, acquiring attribute characteristics of the at least two edges connecting the neighbor node;
summing the attribute characteristics of at least two edges connecting the neighbor nodes to obtain a second edge aggregation characteristic;
and generating edge coding characteristics of the edge based on the attribute characteristics of the first node, the attribute characteristics of the neighbor nodes and the second edge aggregation characteristics.
5. The method of claim 1,
the first graph node characteristics comprise coding characteristics of at least two first nodes in a first graph, and the second graph node characteristics comprise coding characteristics of at least two third nodes in a second graph;
the matching the first graph node feature with the second graph node feature to obtain a first similarity feature of the first molecular graph and the second molecular graph comprises:
based on the first graph node characteristics and the second graph node characteristics, acquiring first similarity from each first node in the first graph to at least two third nodes in the second graph and second similarity from each third node in the second graph to at least two first nodes in the first graph;
and according to the first similarity and the second similarity, aggregating the coding features of at least two first nodes in the first graph node feature and at least two third nodes in the second graph node feature to obtain a first similarity feature of the first graph and the second graph.
6. The method of claim 5, wherein the aggregating the coding features of at least two first nodes in the first graph node features and at least two third nodes in the second graph node features according to the first similarity and the second similarity to obtain the first similarity feature of the first molecular graph and the second molecular graph comprises:
according to the first similarity, the coding features of at least two first nodes in the first graph node feature and at least two third nodes in the second graph node feature are aggregated to obtain the similarity feature from the first graph to the second graph;
according to the second similarity, the coding features of at least two first nodes in the first graph node features and at least two third nodes in the second graph node features are aggregated to obtain the similarity feature from the second graph to the first graph;
and splicing the similarity features from the first molecular diagram to the second molecular diagram and the similarity features from the second molecular diagram to the first molecular diagram to obtain the first similarity features of the first molecular diagram and the second molecular diagram.
7. The method of claim 6, wherein said aggregating the coding features of at least two first nodes in a first graph node feature and at least two third nodes in a second graph node feature according to the first similarity comprises:
respectively carrying out weighted summation on the coding features of at least two third nodes in a second graph according to the first similarity from each first node in the first graph to at least two third nodes in the second graph to obtain a first aggregation feature corresponding to each first node in the first graph;
respectively splicing the coding features of each first node in the first graph with corresponding first aggregation features to obtain first splicing features corresponding to each first node in the first graph;
and summing the first splicing characteristics corresponding to each first node in the first molecular graph to obtain the similarity characteristics from the first molecular graph to the second molecular graph.
8. The method of claim 1, wherein generating graph node features of a predicted molecule based on the first similarity feature and the first graph node features comprises:
acquiring a mean value and a variance corresponding to the first similarity characteristic based on the first similarity characteristic;
acquiring Gaussian distribution corresponding to a first similarity characteristic based on a mean value and a variance corresponding to the first similarity characteristic;
sampling from the Gaussian distribution to obtain sampling characteristics;
and splicing the sampling characteristics and the first graph node characteristics to obtain the graph node characteristics of the predicted molecules.
9. The method of claim 8, wherein updating model parameters of the molecular generative model based on the difference comprises:
obtaining a probability that the predicted molecule is the same as the target molecule based on the difference;
acquiring the information divergence between the posterior probability distribution of the sampling feature and the standard Gaussian distribution;
determining a value of a variational loss function based on the probability and the information divergence;
obtaining the central representation of a basic molecule, the central representation of a predicted molecule and the central representation of a target molecule;
determining a value of a latent loss function based on the central representation of the base molecule, the central representation of the predicted molecule, and the central representation of the target molecule;
summing the value of the variational loss function and the value of the latent loss function to obtain the value of the loss function of the molecular generation model;
updating model parameters of the molecular generative model based on values of a loss function of the molecular generative model.
10. The method of claim 9,
the first graph node features comprise coding features of at least two first nodes in the first molecular graph, and the first tree node features comprise coding features of at least two second nodes in the first molecular tree;
the obtaining a central representation of a base molecule, comprising:
acquiring a first average value of the coding features of at least two first nodes in a first molecular graph and a second average value of the coding features of at least two second nodes in a first molecular tree;
and splicing the first average value and the second average value to obtain the central representation of the basic molecules.
11. The method of claim 1, wherein said decoding the graph node characteristic comprises:
processing the graph node features through a gated recurrent network to obtain information vectors passed among the nodes;
for any decoded node, obtaining the probability of adding new node based on the information vector transmitted between the node and other nodes, and
and when the probability is determined to be higher than the probability threshold value, determining the type of the new node according to the information vector transmitted between the node and other nodes, and adding the new node at the node according to the type of the new node.
12. A molecular generation method based on a molecular generation model, wherein the molecular generation model comprises an encoding layer, a generation layer, and a decoding layer, the method comprising:
obtaining a base molecule, wherein the base molecule is a biomolecule or a drug molecule and is represented by a first molecular diagram and a first molecular tree;
encoding the base molecule through the encoding layer to obtain a first graph node feature of the first molecular graph and a first tree node feature of the first molecular tree;
generating, by the generation layer, graph node features of the predicted molecule based on the first graph node features, and tree node features of the predicted molecule according to the first tree node features;
decoding the graph node features and the tree node features respectively through the decoding layer to obtain the predicted molecule;
wherein the molecular generative model is trained based on the training method of any one of claims 1 to 11.
13. An apparatus for training a molecular generation model, wherein the molecular generation model comprises an encoding layer, an alignment layer, a generation layer, and a decoding layer, the apparatus comprising:
a first acquisition module, configured to acquire a molecule pair sample containing a base molecule and a target molecule, wherein the molecule pair sample is a biomolecule pair sample or a drug molecule pair sample;
wherein the molecular property of the target molecule is higher than that of the base molecule, the base molecule being represented by a first molecular graph and a first molecular tree, and the target molecule being represented by a second molecular graph and a second molecular tree;
a first encoding module, configured to encode, through the encoding layer, the base molecule in the molecule pair sample to obtain a first graph node feature of the first molecular graph and a first tree node feature of the first molecular tree, and to encode the target molecule in the molecule pair sample to obtain a second graph node feature of the second molecular graph and a second tree node feature of the second molecular tree;
an alignment module, configured to match, through the alignment layer, the first graph node feature with the second graph node feature to obtain a first similarity feature of the first molecular graph and the second molecular graph, and to match the first tree node feature with the second tree node feature to obtain a second similarity feature of the first molecular tree and the second molecular tree;
a first generating module, configured to generate, through the generation layer, a graph node feature of a predicted molecule according to the first similarity feature and the first graph node feature, and a tree node feature of the predicted molecule according to the second similarity feature and the first tree node feature;
a first decoding module, configured to decode, through the decoding layer, the graph node feature and the tree node feature respectively to obtain the predicted molecule represented by a molecular graph and a molecular tree;
and an updating module, configured to acquire a difference between the predicted molecule and the target molecule and to update model parameters of the molecular generation model based on the difference.
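The alignment layer in claim 13 matches node features of the base molecule against those of the target molecule to produce per-node similarity features. Softmax attention is one common way to realize such matching; the sketch below assumes that form, and the function name and shapes are illustrative rather than drawn from the patent.

```python
import numpy as np

def align(h1, h2):
    """Match base-molecule node features (h1) against target-molecule
    node features (h2) and return one similarity feature per base node.

    A softmax-attention sketch of an alignment layer; the patent does
    not prescribe this exact form.
    """
    scores = h1 @ h2.T                           # pairwise node similarity
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1
    return attn @ h2                             # target-aware feature per node
```

Each base node's output is a convex combination of target node features, weighted by how similar the nodes are; the same operation applies unchanged to tree node features.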
14. An electronic device, comprising:
a memory for storing executable instructions;
a processor, configured to implement the method of training a molecular generation model according to any one of claims 1 to 11 when executing the executable instructions stored in the memory.
15. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the method of training a molecular generation model according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010546027.6A CN111695702B (en) | 2020-06-16 | 2020-06-16 | Training method, device, equipment and storage medium of molecular generation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111695702A (en) | 2020-09-22 |
CN111695702B CN111695702B (en) | 2023-11-03 |
Family
ID=72481135
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010546027.6A Active CN111695702B (en) | 2020-06-16 | 2020-06-16 | Training method, device, equipment and storage medium of molecular generation model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111695702B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019157228A1 (en) * | 2018-02-09 | 2019-08-15 | D-Wave Systems Inc. | Systems and methods for training generative machine learning models |
CN110459274A (en) * | 2019-08-01 | 2019-11-15 | 南京邮电大学 | A kind of small-molecule drug virtual screening method and its application based on depth migration study |
CN110634539A (en) * | 2019-09-12 | 2019-12-31 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based drug molecule processing method and device and storage medium |
US20200050737A1 (en) * | 2018-08-10 | 2020-02-13 | International Business Machines Corporation | Molecular representation |
CN110993037A (en) * | 2019-10-28 | 2020-04-10 | 浙江工业大学 | Protein activity prediction device based on multi-view classification model |
CN111063398A (en) * | 2019-12-20 | 2020-04-24 | 吉林大学 | Molecular discovery method based on graph Bayesian optimization |
CN111243658A (en) * | 2020-01-07 | 2020-06-05 | 西南大学 | Biomolecular network construction and optimization method based on deep learning |
Non-Patent Citations (1)
Title |
---|
Wang Xiting et al.: "Machine learning-based screening of anti-fibrotic compounds from traditional Chinese medicine", Journal of Beijing University of Chinese Medicine (北京中医药大学学报), vol. 42, no. 1 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112509644A (en) * | 2020-12-18 | 2021-03-16 | 深圳先进技术研究院 | Molecular optimization method, system, terminal equipment and readable storage medium |
CN112530516A (en) * | 2020-12-18 | 2021-03-19 | 深圳先进技术研究院 | Metabolic pathway prediction method, system, terminal equipment and readable storage medium |
CN112735540A (en) * | 2020-12-18 | 2021-04-30 | 深圳先进技术研究院 | Molecular optimization method, system, terminal equipment and readable storage medium |
CN112735540B (en) * | 2020-12-18 | 2024-01-05 | 深圳先进技术研究院 | Molecular optimization method, system, terminal equipment and readable storage medium |
CN112530516B (en) * | 2020-12-18 | 2023-12-26 | 深圳先进技术研究院 | Metabolic pathway prediction method, system, terminal equipment and readable storage medium |
WO2022127688A1 (en) * | 2020-12-18 | 2022-06-23 | 深圳先进技术研究院 | Molecular optimization method and system, and terminal device and readable storage medium |
CN112580789A (en) * | 2021-02-22 | 2021-03-30 | 支付宝(杭州)信息技术有限公司 | Training graph coding network, and method and device for predicting interaction event |
CN113241114A (en) * | 2021-03-24 | 2021-08-10 | 辽宁大学 | LncRNA-protein interaction prediction method based on graph convolution neural network |
CN113300968B (en) * | 2021-04-14 | 2022-07-15 | 浙江工业大学 | Method for determining node decision threshold in bidirectional molecular communication network based on network coding |
CN113300968A (en) * | 2021-04-14 | 2021-08-24 | 浙江工业大学 | Method for determining node decision threshold in bidirectional molecular communication network based on network coding |
CN113838541B (en) * | 2021-09-29 | 2023-10-10 | 脸萌有限公司 | Method and apparatus for designing ligand molecules |
CN113838541A (en) * | 2021-09-29 | 2021-12-24 | 脸萌有限公司 | Method and apparatus for designing ligand molecules |
CN114937478A (en) * | 2022-05-18 | 2022-08-23 | 北京百度网讯科技有限公司 | Method for training a model, method and apparatus for generating molecules |
CN114937478B (en) * | 2022-05-18 | 2023-03-10 | 北京百度网讯科技有限公司 | Method for training a model, method and apparatus for generating molecules |
CN115171807A (en) * | 2022-09-07 | 2022-10-11 | 合肥机数量子科技有限公司 | Molecular coding model training method, molecular coding method and molecular coding system |
CN115171807B (en) * | 2022-09-07 | 2022-12-06 | 合肥机数量子科技有限公司 | Molecular coding model training method, molecular coding method and molecular coding system |
Also Published As
Publication number | Publication date |
---|---|
CN111695702B (en) | 2023-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111695702A (en) | Training method, device, equipment and storage medium of molecular generation model | |
Gad | Pygad: An intuitive genetic algorithm python library | |
Zhu et al. | Training of quantum circuits on a hybrid quantum computer | |
Gao et al. | A quantum machine learning algorithm based on generative models | |
Zitnik et al. | Predicting multicellular function through multi-layer tissue networks | |
Contaldi et al. | Bayesian network hybrid learning using an elite-guided genetic algorithm | |
KR102308002B1 (en) | Method and apparatus for generating information | |
JP2021505993A (en) | Robust gradient weight compression scheme for deep learning applications | |
Kannan et al. | MPI-FAUN: An MPI-based framework for alternating-updating nonnegative matrix factorization | |
JP2023542837A (en) | Clinical omics data processing method, device, electronic device, and computer program based on graph neural network | |
US20230359899A1 (en) | Transfer learning based on cross-domain homophily influences | |
US20220051146A1 (en) | Non-iterative federated learning | |
CN116664719B (en) | Image redrawing model training method, image redrawing method and device | |
Nodehi et al. | Estimation of parameters in multivariate wrapped models for data on ap-torus | |
JP2024500244A (en) | Processing method for molecular skeleton hopping, its apparatus, medium, electronic equipment, and computer program | |
Guo et al. | Graph neural networks: Graph transformation | |
Chen et al. | Predicting drug-target interaction via self-supervised learning | |
Maksymov et al. | Optimal calibration of gates in trapped-ion quantum computers | |
Cinaglia et al. | Multilayer network alignment based on topological assessment via embeddings | |
US11947503B2 (en) | Autoregressive graph generation machine learning models | |
Monnin et al. | Knowledge reconciliation with graph convolutional networks: Preliminary results | |
Huang et al. | Scalable latent tree model and its application to health analytics | |
US20230138367A1 (en) | Generation of graphical user interface prototypes | |
Sunitha et al. | Political optimizer-based automated machine learning for skin lesion data | |
Ghavasieh et al. | Multiscale information propagation in emergent functional networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40028896; Country of ref document: HK |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |