CN116453615A - Prediction method and device, readable storage medium and electronic equipment - Google Patents

Prediction method and device, readable storage medium and electronic equipment

Info

Publication number
CN116453615A
CN116453615A (application CN202310420710.9A)
Authority
CN
China
Prior art keywords
molecular
property
molecule
predicted
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310420710.9A
Other languages
Chinese (zh)
Inventor
陈红阳
李瑞凤
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310420710.9A
Publication of CN116453615A


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

This specification discloses a prediction method and apparatus, a readable storage medium, and an electronic device. Based on the molecular structure of a molecule to be predicted, a graph neural network model determines designated subgraphs corresponding to substructures of the molecule. The designated property of each designated subgraph is then determined from the designated feature of the subgraph and pre-stored characterization features of the designated properties, and the molecular property of the molecule to be predicted is predicted from these results. The molecule to be predicted thus has its corresponding molecular property because it contains substructures with the designated properties. The prediction method therefore provides interpretability for why the molecule has its predicted molecular property, and improves the credibility of the prediction result.

Description

Prediction method and device, readable storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of chemistry, and in particular, to a prediction method, a prediction apparatus, a readable storage medium, and an electronic device.
Background
With the development of computer technology and the need for deeper integration with business, predicting molecular properties with a model and screening compounds based on the predicted properties has become one of the common deep-learning application scenarios in the medical and chemical fields.
At present, predicting molecular properties generally requires obtaining the molecular structure of the molecule to be predicted and inputting that structure into a pre-trained prediction model; the molecular property output by the prediction model is taken as the predicted molecular property of the molecule.
However, such a prediction model can in general only determine the molecular property of the molecule to be predicted; it cannot give the reason why the molecule has that property, which makes the predicted molecular property less reliable. The present specification therefore provides a prediction method.
Disclosure of Invention
The present disclosure provides a prediction method, apparatus, readable storage medium, and electronic device, so as to partially solve the foregoing problems of the prior art.
The technical scheme adopted in the specification is as follows:
the present specification provides a prediction method comprising:
establishing, according to the atoms contained in the obtained molecule to be predicted and the chemical bonds between the atoms, a molecular graph with the atoms as nodes and the chemical bonds as edges;
inputting the molecular graph into a pre-trained graph neural network model to obtain a number of designated subgraphs of the molecular graph output by the graph neural network model, wherein the designated subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted;
determining a designated feature of each designated subgraph, and determining a fusion feature according to the designated feature and pre-stored characterization features respectively corresponding to the designated properties;
inputting the fusion feature into a classification model to obtain a target classification result output by the classification model, wherein the target classification result is used to represent the designated property of the substructure corresponding to the designated subgraph;
and predicting the molecular property of the molecule to be predicted according to the target classification results respectively corresponding to the designated subgraphs.
Optionally, inputting the molecular graph into a pre-trained graph neural network model to obtain a number of designated subgraphs of the molecular graph output by the graph neural network model specifically includes:
inputting the molecular graph into the pre-trained graph neural network model to obtain confidences, output by the graph neural network model, respectively corresponding to the chemical bonds in the molecule to be predicted, wherein each confidence is used to represent the probability that the edge corresponding to the chemical bond belongs to a designated subgraph;
and determining, according to the confidences, the designated edges belonging to a designated subgraph, and determining the designated subgraph according to the designated edges and the nodes connecting the designated edges.
Optionally, inputting the molecular graph into a pre-trained graph neural network model to obtain the confidences, output by the graph neural network model, respectively corresponding to the chemical bonds in the molecule to be predicted specifically includes:
determining a node feature corresponding to each node in the molecular graph;
determining, for each chemical bond contained in the molecule to be predicted, a bond feature of the chemical bond according to the node features of the nodes connected by the edge corresponding to the chemical bond;
and inputting the bond features into the pre-trained graph neural network model to obtain the confidences of the chemical bonds output by the graph neural network model.
Optionally, determining a node feature corresponding to each node in the molecular graph specifically includes:
extracting features of the nodes and the edges contained in the molecular graph, and determining an initial feature corresponding to each node and an initial feature corresponding to each edge;
and determining, for each node, the neighbor nodes of the node in the molecular graph, and determining the node feature of the node according to the initial feature of the node, the initial features of the neighbor nodes, and the initial features of the edges between the neighbor nodes and the node.
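The node-feature step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: plain element-wise summation over neighbors is an assumption, since the patent does not fix the aggregation function, and all names and feature dimensions are illustrative.

```python
# Each node's feature is formed from its own initial feature plus the initial
# features of its neighbor nodes and of the connecting edges (summation assumed).
def node_features(init_node, init_edge, adjacency):
    """init_node: {node: [floats]}; init_edge: {(i, j): [floats]} with i < j;
    adjacency: {node: [neighbor nodes]}."""
    feats = {}
    for n, base in init_node.items():
        agg = list(base)
        for m in adjacency[n]:
            edge_feat = init_edge[tuple(sorted((n, m)))]
            # Accumulate neighbor node feature and edge feature element-wise.
            agg = [a + b + c for a, b, c in zip(agg, init_node[m], edge_feat)]
        feats[n] = agg
    return feats

# Two bonded atoms with 1-dimensional initial features.
init_node = {0: [1.0], 1: [2.0]}
init_edge = {(0, 1): [0.5]}
adjacency = {0: [1], 1: [0]}
print(node_features(init_node, init_edge, adjacency))  # {0: [3.5], 1: [3.5]}
```

In a real graph neural network this aggregation would be a learned, multi-layer operation; the sketch only shows the data flow the claim describes.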
Optionally, determining the fusion feature according to the designated feature and the pre-stored characterization features respectively corresponding to the designated properties specifically includes:
for each designated property, determining an enhancement feature of the designated property corresponding to the designated subgraph according to the similarity between the designated feature and the characterization feature of the designated property;
and fusing the designated feature with the enhancement features of the designated properties respectively corresponding to the designated subgraph to obtain the fusion feature.
Optionally, predicting the molecular property of the molecule to be predicted according to the target classification results respectively corresponding to the designated subgraphs specifically includes:
determining, according to the molecular graph and the designated subgraphs, the substructures in the molecular structure of the molecule to be predicted other than the substructures corresponding to the designated subgraphs as specific substructures;
determining specific subgraphs corresponding to the specific substructures, and determining specific features of the specific subgraphs;
inputting the specific features into the classification model to obtain specific classification results output by the classification model;
and predicting the molecular property of the molecule to be predicted according to the specific classification results and the target classification results respectively corresponding to the designated subgraphs.
Optionally, the graph neural network model and the classification model are trained as follows:
for each sample molecule annotated with designated properties, establishing, according to the atoms contained in the sample molecule and the chemical bonds between the atoms, a sample molecular graph with the atoms as nodes and the chemical bonds as edges as a training sample, and using the designated properties as the label of the training sample;
inputting each training sample into the graph neural network model to be trained to obtain the sample designated subgraphs, output by the graph neural network model, respectively corresponding to the training samples;
determining the sample features respectively corresponding to the sample designated subgraphs, and determining the characterization feature of each designated property according to the sample features and the labels respectively corresponding to the training samples;
determining the fusion feature corresponding to each training sample according to the sample features and the characterization features of the designated properties;
inputting each fusion feature into the classification model to be trained to obtain the sample classification results output by the classification model;
determining the sample property corresponding to each training sample according to the sample classification results of the sample designated subgraphs corresponding to the training sample;
and training the graph neural network model and the classification model according to the sample properties and labels respectively corresponding to the training samples.
The present specification provides a prediction apparatus, including:
a first determining module, configured to establish, according to the atoms contained in the obtained molecule to be predicted and the chemical bonds between the atoms, a molecular graph with the atoms as nodes and the chemical bonds as edges;
a second determining module, configured to input the molecular graph into a pre-trained graph neural network model to obtain a number of designated subgraphs of the molecular graph output by the graph neural network model, wherein the designated subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted;
a fusion module, configured to determine a designated feature of each designated subgraph, and determine a fusion feature according to the designated feature and pre-stored characterization features respectively corresponding to the designated properties;
a classification module, configured to input the fusion feature into a classification model to obtain a target classification result output by the classification model, wherein the target classification result is used to represent the designated property of the substructure corresponding to the designated subgraph;
and a prediction module, configured to predict the molecular property of the molecule to be predicted according to the target classification results respectively corresponding to the designated subgraphs.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the above-described prediction method.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above prediction method when executing the program.
At least one of the technical solutions adopted in this specification can achieve the following beneficial effects:
based on the molecular structure of the molecule to be predicted, a graph neural network model determines the designated subgraphs corresponding to the substructures of the molecule; the designated property of each designated subgraph is then determined based on the designated feature corresponding to the designated subgraph and the preset characterization features of the designated properties, and the molecular property of the molecule to be predicted is predicted accordingly.
It can be seen that the molecule to be predicted has its corresponding molecular property because it contains substructures with the designated properties. The prediction method thus provides interpretability for why the molecule to be predicted has its corresponding molecular property, and improves the credibility of the prediction result.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate the exemplary embodiments of the present specification and, together with their description, serve to explain the specification without unduly limiting it. In the drawings:
FIG. 1 is a flow chart of a prediction method provided in the present specification;
FIG. 2 is a schematic representation of a molecular diagram provided herein;
FIG. 3 is a flow chart of the prediction method provided in the present specification;
FIG. 4 is a schematic diagram of a prediction apparatus provided in the present specification;
FIG. 5 is a schematic diagram of the electronic device corresponding to FIG. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a prediction method provided in the present specification, specifically including the following steps:
s100: and establishing a molecular graph taking the atoms as nodes and taking the chemical bonds as edges according to the obtained atoms contained in the molecules to be predicted and the chemical bonds among the atoms.
The embodiments of the present specification provide a prediction method, in which the graph neural network model and the classification model involved may be pre-trained. The prediction method may be executed by an electronic device such as a server, which identifies the molecular property of the molecule to be predicted. The electronic device that trains the graph neural network model and the classification model may be the same as or different from the electronic device that executes the prediction method, which is not limited in this specification.
In the prior art, the molecular structure of the molecule to be predicted is directly input into a prediction model, and the prediction model outputs the molecular property of the molecule; however, the prediction model cannot explain that output. That is, the prediction model cannot give the reason why the molecule to be predicted has its corresponding molecular property, which results in the low reliability of molecular properties currently obtained from such models.
In contrast, the prediction method provided in this application determines, through a graph neural network model and based on the molecular structure of the molecule to be predicted, the designated subgraphs corresponding to the substructures of the molecule, and then predicts the molecular property of the molecule based on the designated properties of the designated subgraphs. In other words, the molecular property of the molecule to be predicted is predicted from the properties of the substructures contained in its molecular structure: the molecule has its corresponding molecular property because it contains substructures with the designated properties. The prediction method thus provides interpretability for why the molecule to be predicted has its corresponding molecular property, and improves the credibility of the prediction result.
As briefly described above, the prediction method in this specification determines the designated subgraphs through a graph neural network model. In general, a graph neural network model processes graph structures, and a graph structure can accurately represent the molecular structure of the molecule to be predicted. Based on this, the server can determine a molecular graph based on the molecular structure of the molecule to be predicted.
Specifically, each molecule contains atoms, and chemical bonds exist between the atoms; chemical bonds include ionic bonds and covalent bonds.
The server may then determine the molecule to be predicted. The molecule to be predicted may be carried in a prediction request received by the server, or in a prediction task generated by the server according to a preset prediction condition. The server may parse the received prediction request or the generated prediction task to determine the molecule to be predicted carried in it.
Then, the server can determine each atom contained in the molecule to be predicted and the chemical bond between each atom according to the molecular structure of the molecule to be predicted.
Finally, the server can determine the nodes in the molecular graph according to the determined atoms, and then determine the edges between the nodes according to the nodes connected by each chemical bond, thereby constructing a molecular graph with the atoms as nodes and the chemical bonds as edges. The constructed molecular graph includes a node corresponding to each atom and the edges between the nodes, where an edge between two nodes represents the chemical bond between the atoms corresponding to those nodes. Take FIG. 2 as an example.
FIG. 2 is a schematic diagram of a molecular graph provided in this specification, using a methanol molecule as an example. The server may determine the atoms the methanol molecule contains, namely C, H, and O, and determine the chemical bonds between those atoms. Then, for each atom, the server determines a corresponding node, and for each chemical bond, determines an edge in the molecular graph according to the nodes connected by that chemical bond.
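The graph construction of step S100 can be sketched as below. This is a minimal illustration, not the patent's implementation; the function name and data layout are illustrative, and the methanol connectivity is written out by hand.

```python
# Build a molecular graph with atoms as nodes and chemical bonds as edges.
def build_molecular_graph(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j) atom-index pairs."""
    nodes = {i: symbol for i, symbol in enumerate(atoms)}
    edges = [tuple(sorted(b)) for b in bonds]
    adjacency = {i: [] for i in nodes}
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)
    return {"nodes": nodes, "edges": edges, "adjacency": adjacency}

# Methanol, CH3OH: the C atom is bonded to three H atoms and the O atom,
# and the O atom is bonded to one more H atom.
atoms = ["C", "H", "H", "H", "O", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)]
graph = build_molecular_graph(atoms, bonds)
print(len(graph["nodes"]), len(graph["edges"]))  # prints: 6 5
```

In practice a cheminformatics toolkit would derive atoms and bonds from the molecular structure automatically; the sketch only shows the node-and-edge representation the description calls for.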
S102: inputting the molecular graph into a pre-trained graph neural network model to obtain a number of designated subgraphs of the molecular graph output by the graph neural network model, wherein the designated subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted.
In one or more embodiments provided in this specification, as previously described, the server may determine, through the graph neural network model, the designated subgraphs corresponding to the substructures contained in the molecular structure of the molecule to be predicted, so as to predict the molecular property of the molecule based on the designated properties of those designated subgraphs.
Specifically, the server is provided with a pre-trained graph neural network model, which is used to determine the designated subgraphs contained in the molecular graph corresponding to the molecule to be predicted. Each designated subgraph corresponds to a substructure contained in the molecular structure of the molecule to be predicted, namely a substructure used to characterize the molecular property of the molecule.
Taking an NaOH molecule as an example, the substructure corresponding to a designated subgraph of the NaOH molecule may be OH, and the properties of the NaOH molecule may be predicted from the designated properties of OH. OH can thus serve as a substructure characterizing the molecular property of the NaOH molecule, and the designated subgraph corresponding to OH characterizes that molecular property.
The server may then input the molecular graph determined in step S100 into the pre-trained graph neural network model to obtain the designated subgraphs contained in the molecular graph output by the model, where each designated subgraph corresponds to a substructure contained in the molecular structure of the molecule to be predicted.
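Combined with the optional claims, the subgraph selection can be sketched as follows: the graph neural network scores each edge with a confidence, edges above a threshold are kept, and each connected group of kept edges forms one designated subgraph. This is an illustrative sketch, not the patent's implementation; the threshold value and the union-find grouping are assumptions.

```python
# Extract designated subgraphs from per-edge confidences.
def extract_subgraphs(edges, confidences, threshold=0.5):
    kept = [e for e, c in zip(edges, confidences) if c >= threshold]
    parent = {}  # union-find over the endpoints of kept edges

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for i, j in kept:
        union(i, j)
    # Group kept edges by the root of their component.
    components = {}
    for i, j in kept:
        components.setdefault(find(i), []).append((i, j))
    return list(components.values())

edges = [(0, 1), (1, 2), (3, 4)]
confs = [0.9, 0.2, 0.8]  # model outputs; hand-picked here for illustration
print(extract_subgraphs(edges, confs))  # [[(0, 1)], [(3, 4)]]
```

The designated subgraph is then the kept edges of a component together with the nodes they connect, matching the claim wording.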
S104: determining a designated feature of each designated subgraph, and determining a fusion feature according to the designated feature and the pre-stored characterization features respectively corresponding to the designated properties.
In one or more embodiments provided in this specification, after determining the designated subgraphs, the server may predict the molecular property of the molecule to be predicted based on the designated properties of the substructures corresponding to the designated subgraphs. If the designated feature of a designated subgraph were classified directly to determine the designated property of its substructure, it might be impossible to explain why that substructure has the property. Therefore, the server fuses the pre-stored characterization features corresponding to the designated properties with the designated feature of the designated subgraph, and determines the designated property of the substructure corresponding to the designated subgraph based on the fusion result.
Specifically, the server stores in advance a characterization vector corresponding to each designated property. A designated property may be, for example, toxic or non-toxic, or water-soluble, slightly water-soluble, or water-insoluble. The specific types of designated properties can be set as needed, and this specification does not limit them.
For each designated property, the characterization vector corresponding to the designated property is used to characterize the substructures having that property; that is, the more similar the designated feature of a designated subgraph is to the characterization vector of a designated property, the higher the probability that the substructure corresponding to the designated subgraph has that property. The designated feature corresponding to a designated subgraph is determined from the atoms contained in the designated subgraph and the chemical bonds between those atoms.
Finally, the server can splice the pre-stored characterization vector corresponding to each designated property with the designated feature corresponding to the designated subgraph, and take the spliced result as the fusion feature.
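The fusion step can be sketched as below, combining the splicing described here with the similarity-based enhancement from the optional claims. This is an illustrative sketch only: the use of cosine similarity, the feature dimensions, and the weighting scheme are assumptions not fixed by the patent.

```python
import math

# Cosine similarity between two equal-length vectors.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fuse(subgraph_feat, property_vectors):
    """Concatenate the subgraph feature with one similarity-weighted
    enhancement feature per designated property."""
    fused = list(subgraph_feat)
    for vec in property_vectors:          # one characterization vector per property
        w = cosine(subgraph_feat, vec)    # similarity acts as the enhancement weight
        fused.extend(w * x for x in vec)  # append the weighted enhancement feature
    return fused

feat = [1.0, 0.0]                          # designated feature of one subgraph
props = [[1.0, 0.0], [0.0, 1.0]]           # stored characterization vectors
print(fuse(feat, props))  # [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

The fused vector then goes to the classification model in step S106; a subgraph similar to a property's characterization vector contributes a strong enhancement term for that property.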
S106: inputting the fusion feature into a classification model to obtain a target classification result output by the classification model, wherein the target classification result is used to represent the designated property of the substructure corresponding to the designated subgraph.
In one or more embodiments provided herein, after determining the fusion feature, the server may classify the fusion feature through a classification model to determine a specified property of a substructure corresponding to the specified sub-graph.
Specifically, a classification model is preset in the server and is used to determine the designated property of the substructure corresponding to a designated subgraph.
The server can input the fusion feature into the pre-trained classification model to obtain the target classification result, output by the classification model, corresponding to the designated subgraph. The target classification result is the designated property of the substructure corresponding to the designated subgraph. Taking toxic and non-toxic as the designated properties, the target classification result may be that the substructure corresponding to the designated subgraph is toxic, or that it is non-toxic.
Of course, the target classification result may also be the probability that the substructure corresponding to the designated subgraph has each designated property. Again taking toxic and non-toxic as the designated properties, the target classification result may be, for example, that the substructure is toxic with a probability of 20% and non-toxic with a probability of 80%. The specific form of the target classification result can be set as needed, and this specification does not limit it.
S108: and predicting the molecular property of the molecule to be predicted according to the target classification results corresponding to the designated subgraphs.
In one or more embodiments provided in this specification, the prediction method predicts the molecular property of the molecule to be predicted based on the designated properties of the designated subgraphs. The molecular property may be a designated property of the molecule, or the probability that the molecule has a designated property.
Specifically, for each designated subgraph, the server may determine the target classification result corresponding to the designated subgraph.
Then, for each designated property, the server may count the designated subgraphs having that property and determine whether this count exceeds half the number of designated subgraphs contained in the molecule to be predicted.
If so, the server may determine that the molecule to be predicted has the specified property.
If not, the server may determine that the molecule to be predicted does not have the specified property.
Further, if a molecule contains a toxic substructure, the molecule itself is toxic with high probability. Yet if the molecule contains three designated subgraphs of which only one is toxic, the molecular property determined under the majority rule above would not include the designated property "toxic". To avoid this, the server may instead determine, for each designated property, whether any designated subgraph contained in the molecule to be predicted has that property.
If so, the server may determine that the molecule to be predicted has the specified property.
If not, the server may determine that the molecule to be predicted does not have the specified property.
Of course, the server may also determine, for each specified property, the probability that the molecule to be predicted has that property according to the probability that each designated subgraph contained in the molecule to be predicted has that property.
The server may then determine the molecular property of the molecule to be predicted from the specified properties it has been determined to have, or from the probabilities that it has each specified property. The server may directly use the probability of each specified property as the molecular property of the molecule to be predicted, or may take as the molecular property those specified properties whose probability exceeds a preset threshold. How exactly the molecular property is determined from the target classification results of the designated subgraphs can be set as required, and this specification does not limit it.
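One possible probability-based aggregation, a sketch only, with the function name, the averaging rule, and the per-subgraph probability dictionaries assumed for illustration:

```python
def predict_from_probabilities(subgraph_probs, threshold=0.5):
    """Average, over all designated subgraphs, the probability of each
    specified property, then keep the properties whose averaged
    probability exceeds the preset threshold."""
    properties = subgraph_probs[0].keys()
    averaged = {p: sum(s[p] for s in subgraph_probs) / len(subgraph_probs)
                for p in properties}
    return averaged, {p for p, v in averaged.items() if v > threshold}
```

The first return value can serve directly as the molecular property (probabilities), the second as the thresholded set of specified properties.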
Based on the prediction method shown in fig. 1, the designated subgraphs corresponding to substructures of the molecule to be predicted are determined by a graph neural network model from the molecular structure of the molecule to be predicted, and the molecular property of the molecule is then predicted from the specified properties of those designated subgraphs. The molecule to be predicted thus has its molecular property because it contains substructures with the corresponding specified properties. The prediction method therefore provides interpretability for why the molecule to be predicted has the corresponding molecular property, and ensures the credibility of the prediction result.
Further, the chemical bonds a molecule contains generally affect the molecule's physical properties. The prediction method provided by this application determines designated subgraphs that characterize the molecular property of the molecule to be predicted, and predicts that property through the specified properties of the designated subgraphs. Based on the same idea, if the chemical bonds that characterize the molecular property are determined first, and the designated subgraphs are then determined from those chemical bonds, the molecular property of the molecule to be predicted can likewise be determined from the target classification results of the substructures corresponding to the designated subgraphs.
Specifically, the server may input the molecular graph determined in step S100 into a pre-trained graph neural network model, so as to obtain the confidence corresponding to each chemical bond in the molecule to be predicted, as output by the graph neural network model. The graph neural network model is used to determine the designated subgraphs contained in the molecular graph corresponding to the molecule to be predicted. For each chemical bond, the confidence corresponding to that bond characterizes the probability that the edge corresponding to that bond belongs to a designated subgraph.
The server may then determine the designated edges belonging to the designated subgraph based on the determined confidence levels.
Finally, the server may determine the designated subgraph based on the determined designated edges and the nodes connecting the designated edges.
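As a sketch of these last two steps, with the threshold value and the edge representation as node-index pairs assumed for illustration, the designated edges can be selected by confidence and the subgraph assembled from them:

```python
def extract_designated_subgraph(edges, confidences, threshold=0.5):
    """Keep the edges whose confidence (the probability that the edge
    belongs to a designated subgraph) exceeds the threshold, together
    with the nodes those designated edges connect."""
    designated_edges = [e for e, c in zip(edges, confidences) if c > threshold]
    designated_nodes = {n for edge in designated_edges for n in edge}
    return designated_edges, designated_nodes
```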
Further, for each chemical bond, the nature of the bond may be characterized by the properties of the atoms it connects. Then, when determining the confidences corresponding to the chemical bonds, the server may also determine, for each chemical bond, a bond feature corresponding to that bond, and determine the confidence of the bond based on the bond feature.
Specifically, the server may determine node characteristics corresponding to each node in the molecular diagram.
Then, for each chemical bond contained in the molecule to be predicted, the server may determine the two nodes connected by the edge corresponding to that bond, perform feature extraction on the two nodes, and determine the node features of the two nodes.
The server may then splice the node features of the two nodes to obtain the bond feature of the chemical bond.
Finally, the server can input the bond feature into the pre-trained graph neural network model to obtain the confidence of the chemical bond output by the graph neural network model.
Of course, the server may also perform feature extraction on the chemical bond itself to determine an initial feature of the bond, fuse that initial feature with the node features of the two nodes, and use the fusion result as the bond feature of the chemical bond.
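The splicing of node features described above, with the bond's own initial feature optionally fused in, might look as follows. Modeling "fusion" as a further concatenation is a simplifying assumption of this sketch; the specification leaves the fusion operation open:

```python
def bond_feature(node_feat_u, node_feat_v, bond_initial=None):
    """Splice (concatenate) the node features of the two connected nodes;
    optionally fuse in the bond's own initial feature as well."""
    feature = list(node_feat_u) + list(node_feat_v)
    if bond_initial is not None:
        feature += list(bond_initial)
    return feature
```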
In this specification, for each atom in the molecular graph, the property of that atom is affected not only by the atom itself but also by the other atoms connected to it. Therefore, when determining the node feature of the node corresponding to each atom, it may also be determined based on the features of that node's neighbor nodes.
Specifically, for each node and each edge included in the molecular graph, the server may perform feature extraction on the node and the edge, and determine initial features of the node and initial features of the edge.
The server may then determine, for each node in the molecular graph, the neighbor nodes of that node.
Finally, the server may determine node characteristics of the node based on the initial characteristics of the node, the initial characteristics of the neighbor node, and the initial characteristics of the edge between the neighbor node and the node.
Of course, it should be noted that the server may also take the node feature just determined for a node as that node's new initial feature, and determine the node feature again from the re-determined initial features of the node and of its neighbors. Repeating this propagates the property of each atom in the molecular graph along the chemical bonds, and further ensures the accuracy of the bond confidences determined from the node features.
Further, in the present application, the purpose of determining the fusion feature is to enhance the specified feature based on the similarity between the specified feature and the pre-stored characterization feature corresponding to each specified property. For each specified property, if the specified feature is enhanced according to the similarity between that property's characterization feature and the specified feature, the target classification result determined from the enhanced result is more accurate. Based on this, the server may enhance the specified feature using the characterization features of the specified properties.
In particular, the server may determine, for each specified property, the enhancement feature of that property for the designated subgraph, based on the similarity between the property's characterization feature and the subgraph's specified feature, together with the characterization feature itself.
After determining the enhancement feature of each specified property for the designated subgraph, the server may fuse the specified feature with each of these enhancement features to obtain the fusion feature. The target classification result of the designated subgraph can subsequently be obtained from the fusion feature.
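A minimal sketch of this enhancement-and-fusion step, assuming cosine similarity as the similarity measure, similarity-scaled characterization features as the enhancement features, and concatenation as the fusion (all illustrative choices, not fixed by the specification):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def fusion_feature(specified_feature, characterization_features):
    """For each specified property, scale its characterization feature by
    its similarity to the subgraph's specified feature (the enhancement
    feature), then concatenate the specified feature with all enhancement
    features to obtain the fusion feature."""
    fused = list(specified_feature)
    for prop, char in characterization_features.items():
        similarity = cosine_similarity(specified_feature, char)
        fused += [similarity * c for c in char]
    return fused
```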
Wherein, for each specified property, the characterization feature of that specified property can be determined in the following manner:
For each specified property, input the sample molecular graph corresponding to each sample molecule having that property into the pre-trained graph neural network model to obtain the designated subgraph corresponding to each such sample molecule, as output by the model, and determine the characterization feature of the specified property based on the specified features corresponding to those designated subgraphs.
Of course, the server may also select any sample molecule from the sample molecules corresponding to the specified property, input the molecular diagram of the sample molecule into a pre-trained graph neural network model, obtain a specified sub-graph corresponding to the sample molecule output by the graph neural network model, and then use the specified feature of the specified sub-graph as the characterization feature of the specified property.
How to determine the characterization features corresponding to the specified properties respectively can be set according to the needs, and the specification is not limited to this.
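One concrete instance of the first option, assuming (as an illustration only) that the characterization feature is the dimension-wise mean of the subgraph features:

```python
def characterization_feature(sample_subgraph_features):
    """One way to determine the characterization feature of a specified
    property: average, dimension by dimension, the specified features of
    the designated subgraphs extracted from the sample molecules labeled
    with that property."""
    n = len(sample_subgraph_features)
    dim = len(sample_subgraph_features[0])
    return [sum(f[i] for f in sample_subgraph_features) / n for i in range(dim)]
```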
Furthermore, besides the substructures corresponding to the determined designated subgraphs, the molecular structure of the molecule to be predicted also contains other substructures, and the properties of those other substructures may likewise affect the property of the molecule. Therefore, the server may also treat these other substructures as specific substructures and determine the property of the molecule to be predicted based on them.
Specifically, the server may determine, from the molecular graph and the designated subgraphs, the substructures in the molecular structure of the molecule to be predicted other than those corresponding to the designated subgraphs, as the specific substructures.
The server may then determine the specific subgraph corresponding to each specific substructure and determine the specific feature of that specific subgraph.
Then, the server can input the specific feature into a classification model to obtain a specific classification result output by the classification model.
Finally, the server can predict the molecular property of the molecule to be predicted according to the specific classification result and the target classification result corresponding to each designated subgraph, as shown in fig. 3.
Fig. 3 is a schematic flow chart of a prediction method provided in this specification. The server may input the molecular graph of the molecule to be predicted into the graph neural network model to obtain the designated subgraphs and the specific subgraph output by the model. The server can fuse the specified feature of each designated subgraph with the preset characterization features of the specified properties to obtain fusion features, and then input the fusion features and the specific feature of the specific subgraph into the classification model to obtain the target classification result corresponding to each fusion feature and the specific classification result corresponding to the specific subgraph. Finally, the server may determine the prediction result, i.e. the molecular property of the molecule to be predicted, from the specific classification result and the target classification results.
In addition, the graph neural network model and the classification model in the specification can be trained by the following ways:
in particular, the server may obtain a number of sample molecules labeled with specified properties. For each labeled sample molecule, a sample molecular graph with atoms as nodes and chemical bonds as edges is established as a training sample, according to the atoms the sample molecule contains and the chemical bonds between them. Meanwhile, the server may use the labeled specified property as the label of the training sample.
And secondly, the server can respectively input each training sample into a graph neural network model to be trained to obtain sample designated subgraphs respectively corresponding to each training sample output by the graph neural network.
Then, the server can determine sample characteristics corresponding to each sample designated subgraph respectively, and determine characterization characteristics of each designated property according to each sample characteristic and labels corresponding to each training sample respectively.
Then, the server can determine fusion characteristics corresponding to each training sample according to the characteristics of each sample and the characterization characteristics of each appointed property.
And then, the server can input each fusion characteristic into a classification model to be trained to obtain a sample classification result output by the classification model.
The server can then determine the sample property corresponding to each training sample according to the sample classification results of the sample designated subgraphs corresponding to that training sample.
And finally, the server can determine loss according to sample properties and labels respectively corresponding to the training samples, and train the graph neural network model and the classification model by taking the minimum loss as a target.
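The training objective in the last step might, for example, be a cross-entropy-style loss; the specification does not name the loss function, so the negative log-likelihood below is an assumption of this sketch:

```python
import math

def classification_loss(predicted_probs, labels):
    """Average negative log-likelihood of each training sample's labeled
    specified property; both models are trained by minimizing this loss."""
    total = 0.0
    for probs, label in zip(predicted_probs, labels):
        total -= math.log(probs[label])
    return total / len(labels)
```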
Further, the graph neural network in the present specification can be trained by the following ways:
in particular, the server may obtain a number of sample molecules labeled with specified properties. For each labeled sample molecule, a sample molecular graph with atoms as nodes and chemical bonds as edges is established as a training sample, according to the atoms the sample molecule contains and the chemical bonds between them.
Second, the server may determine, for each training sample, a specified subgraph that the training sample contains as a target sample subgraph for the training sample.
Then, the server can respectively input each training sample into a graph neural network model to be trained to obtain sample designated subgraphs respectively corresponding to each training sample output by the graph neural network.
And finally, the server can determine the loss of the graph neural network model according to the sample designated subgraph and the target sample subgraph respectively corresponding to each training sample, and adjust the model parameters of the graph neural network model by taking the minimum loss as an optimization target.
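The loss between a predicted sample designated subgraph and its target sample subgraph is likewise left open by the specification; one possible choice, assumed here for illustration, is the fraction of disagreeing edges, averaged over the training samples:

```python
def subgraph_loss(predicted_edge_sets, target_edge_sets):
    """Disagreement between each sample designated subgraph and its target
    sample subgraph: the fraction of edges (within their union) on which
    the two edge sets differ, averaged over training samples."""
    total = 0.0
    for predicted, target in zip(predicted_edge_sets, target_edge_sets):
        union = predicted | target
        total += len(predicted ^ target) / len(union) if union else 0.0
    return total / len(predicted_edge_sets)
```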
Based on the same thought, the present specification also provides a prediction device, as shown in fig. 4.
Fig. 4 is a schematic diagram of a prediction apparatus provided in this specification, wherein:
the first determining module 200 is configured to establish a molecular graph with atoms as nodes and chemical bonds as edges according to the obtained atoms included in the molecule to be predicted and the chemical bonds between the atoms.
The second determining module 202 is configured to input the molecular map into a pre-trained graph neural network model, and obtain a plurality of designated subgraphs of the molecular map output by the graph neural network model, where the designated subgraphs correspond to substructures included in a molecular structure of the molecule to be predicted.
And the fusion module 204 is configured to determine, for each specified sub-graph, a specified feature of the specified sub-graph, and determine a fusion feature according to the specified feature and a characterization feature corresponding to each pre-stored specified property.
And the classification module 206 is configured to input the fusion feature into a classification model, and obtain a target classification result output by the classification model, where the target classification result is used to characterize a specified property of a substructure corresponding to the specified sub-graph.
And the prediction module 208 is used for predicting the molecular property of the molecule to be predicted according to the target classification result corresponding to each designated subgraph.
Optionally, the first determining module 200 is configured to input the molecular graph into a pre-trained graph neural network model to obtain the confidence corresponding to each chemical bond in the molecule to be predicted, as output by the graph neural network model, where each confidence characterizes the probability that the edge corresponding to that bond belongs to a designated subgraph; to determine the designated edges belonging to the designated subgraph according to the confidences; and to determine the designated subgraph according to the designated edges and the nodes connecting them.
Optionally, the first determining module 200 is configured to determine the node features corresponding to each node in the molecular graph; to determine, for each chemical bond contained in the molecule to be predicted, the bond feature of that bond according to the node features of the nodes connected by the edge corresponding to the bond; and to input the bond feature into the pre-trained graph neural network model to obtain the confidence of the chemical bond output by the graph neural network model.
Optionally, the first determining module 200 is configured to perform feature extraction on each node and each edge included in the molecular graph, determine initial features corresponding to each node and initial features corresponding to each edge, determine, for each node in the molecular graph, a neighboring node of the node, and determine node features of the node according to the initial features of the node, the initial features of the neighboring node, and the initial features of edges between the neighboring node and the node.
Optionally, the fusion module 204 is configured to determine, for each specified property, an enhancement feature of the specified property corresponding to the specified sub-graph according to a similarity between the specified feature and a feature of the specified property, and fuse the specified feature and each specified property respectively corresponding to the enhancement feature of the specified sub-graph to obtain a fused feature.
Optionally, the prediction module 208 is configured to determine, according to the molecular graph and each designated sub-graph, other sub-structures in the molecular structure of the molecule to be predicted, except for each sub-structure corresponding to each designated sub-graph, as a specific sub-structure, determine a specific sub-graph corresponding to the specific sub-structure, determine specific features of the specific sub-graph, input the specific features into the classification model, obtain a specific classification result output by the classification model, and predict molecular properties of the molecule to be predicted according to the specific classification result and a target classification result corresponding to each designated sub-graph.
The apparatus further comprises:
the training module 210 is configured to train the graph neural network model and the classification model in the following manner: for each sample molecule labeled with a specified property, establish, according to the atoms contained in the sample molecule and the chemical bonds between them, a sample molecular graph with atoms as nodes and chemical bonds as edges as a training sample, and use the specified property as the label of the training sample; input each training sample into the graph neural network model to be trained to obtain the sample designated subgraph corresponding to each training sample, as output by the graph neural network model; determine the sample feature corresponding to each sample designated subgraph; determine the characterization feature of each specified property according to the sample features and the labels corresponding to the training samples; determine the fusion feature corresponding to each training sample according to the sample features and the characterization features of the specified properties; input each fusion feature into the classification model to be trained to obtain the sample classification result output by the classification model; determine the sample property corresponding to each training sample according to the sample classification results of its sample designated subgraphs; and train the graph neural network model and the classification model according to the sample properties corresponding to the training samples and their labels.
The present specification also provides a computer readable storage medium storing a computer program operable to perform the predictive method provided in fig. 1 above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 5. At the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, as illustrated in fig. 5, although other hardware required by other services may be included. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs to implement the prediction method described above with respect to fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements of method flows today can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code before compiling must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in purely computer-readable program code, it is entirely possible to implement the same functionality by logically programming the method steps such that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included within it for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing describes merely embodiments of the present specification and is not intended to limit it. Various modifications and alterations will be apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present specification are intended to fall within the scope of its claims.

Claims (10)

1. A prediction method, the method comprising:
establishing, according to the atoms contained in an obtained molecule to be predicted and the chemical bonds among the atoms, a molecular graph with the atoms as nodes and the chemical bonds as edges;
inputting the molecular graph into a pre-trained graph neural network model to obtain a plurality of specified subgraphs of the molecular graph output by the graph neural network model, wherein the specified subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted;
determining a specified feature of each specified subgraph, and determining fusion features according to the specified features and pre-stored characterization features respectively corresponding to specified properties;
inputting the fusion features into a classification model to obtain target classification results output by the classification model, wherein a target classification result characterizes the specified property of the substructure corresponding to a specified subgraph;
and predicting the molecular property of the molecule to be predicted according to the target classification results respectively corresponding to the specified subgraphs.
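The first step of claim 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the input format (a list of atom symbols plus bond index pairs) and the `MolecularGraph` container are hypothetical simplifications, since the claim does not specify a concrete encoding.

```python
from dataclasses import dataclass, field

@dataclass
class MolecularGraph:
    nodes: list                                 # atom symbols, indexed by position
    edges: list = field(default_factory=list)   # (i, j) index pairs for chemical bonds

def build_molecular_graph(atoms, bonds):
    """Build a graph with atoms as nodes and chemical bonds as edges."""
    graph = MolecularGraph(nodes=list(atoms))
    for i, j in bonds:
        graph.edges.append((i, j))
    return graph

# Ethanol (CH3-CH2-OH), heavy atoms only: C-C-O
g = build_molecular_graph(["C", "C", "O"], [(0, 1), (1, 2)])
print(len(g.nodes), len(g.edges))  # 3 2
```

The resulting graph is what would be fed to the pre-trained graph neural network model in the subsequent steps.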
2. The method according to claim 1, wherein inputting the molecular graph into a pre-trained graph neural network model to obtain a plurality of specified subgraphs of the molecular graph output by the graph neural network model specifically comprises:
inputting the molecular graph into the pre-trained graph neural network model to obtain confidence levels, output by the graph neural network, respectively corresponding to the chemical bonds in the molecule to be predicted, wherein a confidence level characterizes the probability that the edge corresponding to a chemical bond belongs to a specified subgraph;
and determining, according to the confidence levels, the specified edges belonging to a specified subgraph, and determining the specified subgraph according to the specified edges and the nodes connected by the specified edges.
3. The method according to claim 2, wherein inputting the molecular graph into the pre-trained graph neural network model to obtain the confidence levels, output by the graph neural network, respectively corresponding to the chemical bonds in the molecule to be predicted specifically comprises:
determining the node feature corresponding to each node in the molecular graph;
determining, for each chemical bond contained in the molecule to be predicted, the bond feature of the chemical bond according to the node features of the nodes connected by the edge corresponding to the chemical bond;
and inputting the bond features into the pre-trained graph neural network model to obtain the confidence levels of the chemical bonds output by the graph neural network.
4. The method according to claim 3, wherein determining the node feature corresponding to each node in the molecular graph specifically comprises:
performing feature extraction on each node and each edge contained in the molecular graph to determine the initial feature corresponding to each node and the initial feature corresponding to each edge;
and determining, for each node, the neighbor nodes of the node in the molecular graph, and determining the node feature of the node according to the initial feature of the node, the initial features of its neighbor nodes, and the initial features of the edges between the neighbor nodes and the node.
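Claims 3 and 4 can be illustrated with the following sketch: each node feature aggregates the node's initial feature with those of its neighbors and connecting edges (claim 4), a bond feature is built from its two endpoint node features, and a scoring head turns it into a per-bond confidence (claim 3). The sum aggregation and the sigmoid scoring head are illustrative stand-ins; the claims only require a pre-trained graph neural network, without fixing these operations.

```python
import math

def node_features(init_node, init_edge, edges):
    """Claim 4: combine each node's initial feature with its neighbors'
    initial features and the initial features of the connecting edges."""
    feats = {}
    for v, x in init_node.items():
        acc = list(x)
        for (i, j) in edges:
            if v in (i, j):
                u = j if v == i else i                 # the neighbor node
                e = init_edge[(i, j)]                  # the connecting edge
                acc = [a + n + w for a, n, w in zip(acc, init_node[u], e)]
        feats[v] = acc
    return feats

def bond_confidences(feats, edges, weight):
    """Claim 3: build each bond feature from its endpoint node features,
    then map it to the probability (confidence) that the corresponding
    edge belongs to a specified subgraph."""
    conf = {}
    for (i, j) in edges:
        bond_feat = [a + b for a, b in zip(feats[i], feats[j])]
        score = sum(w * f for w, f in zip(weight, bond_feat))
        conf[(i, j)] = 1.0 / (1.0 + math.exp(-score))  # sigmoid head
    return conf
```

Edges whose confidence exceeds a chosen threshold would then become the specified edges of claim 2, from which the specified subgraphs are assembled.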
5. The method according to claim 1, wherein determining the fusion features according to the specified features and the pre-stored characterization features respectively corresponding to the specified properties specifically comprises:
determining, for each specified property, an enhancement feature of the specified property corresponding to a specified subgraph according to the similarity between the specified feature and the characterization feature of the specified property;
and fusing the specified feature with the enhancement features of the specified properties corresponding to the specified subgraph to obtain a fusion feature.
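A minimal sketch of claim 5: for each specified property, an enhancement feature is derived by weighting that property's stored characterization feature with its similarity to the subgraph's specified feature, and the specified feature is fused with the enhancement features. Cosine similarity as the similarity measure and concatenation as the fusion operation are illustrative assumptions, not mandated by the claim.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def fusion_feature(specified_feat, characterization_feats):
    """Concatenate the specified feature with one similarity-gated
    enhancement feature per specified property."""
    fused = list(specified_feat)
    for prop_feat in characterization_feats:       # one per specified property
        sim = cosine(specified_feat, prop_feat)    # similarity gate
        fused.extend(sim * x for x in prop_feat)   # enhancement feature
    return fused
```

The gating means a characterization feature contributes strongly only when the subgraph's specified feature already resembles it, which is one plausible reading of "determining an enhancement feature according to the similarity".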
6. The method according to claim 1, wherein predicting the molecular property of the molecule to be predicted according to the target classification results respectively corresponding to the specified subgraphs specifically comprises:
determining, according to the molecular graph and the specified subgraphs, the substructures of the molecular structure of the molecule to be predicted other than the substructures corresponding to the specified subgraphs as specific substructures;
determining the specific subgraphs corresponding to the specific substructures, and determining the specific features of the specific subgraphs;
inputting the specific features into the classification model to obtain specific classification results output by the classification model;
and predicting the molecular property of the molecule to be predicted according to the specific classification results and the target classification results respectively corresponding to the specified subgraphs.
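The first step of claim 6 amounts to taking the complement: the "specific" substructure is what remains of the molecular graph after removing the edges covered by the specified subgraphs. Representing subgraphs as edge sets with a consistent (i, j) ordering is an assumption made for this sketch.

```python
def specific_substructure(all_edges, specified_subgraphs):
    """Return the edges of the molecular graph not covered by any
    specified subgraph (edge pairs assumed consistently ordered)."""
    covered = set()
    for sub in specified_subgraphs:
        covered.update(sub)
    return [e for e in all_edges if e not in covered]

edges = [(0, 1), (1, 2), (2, 3)]
subs = [[(0, 1)], [(2, 3)]]
print(specific_substructure(edges, subs))  # [(1, 2)]
```

The complement's features would then be classified alongside the specified subgraphs' features, so that every part of the molecule contributes to the final property prediction.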
7. The method according to claim 1, wherein the graph neural network model and the classification model are trained by:
establishing, for each sample molecule labeled with specified properties, a sample molecular graph with the atoms contained in the sample molecule as nodes and the chemical bonds among the atoms as edges, taking the sample molecular graph as a training sample and the specified properties as the label of the training sample;
inputting each training sample into the graph neural network model to be trained to obtain the sample specified subgraphs, output by the graph neural network, respectively corresponding to the training samples;
determining the sample features respectively corresponding to the sample specified subgraphs, and determining the characterization feature of each specified property according to the sample features and the labels respectively corresponding to the training samples;
determining the fusion features corresponding to the training samples according to the sample features and the characterization features of the specified properties;
inputting the fusion features into the classification model to be trained to obtain the sample classification results output by the classification model;
determining the sample property corresponding to each training sample according to the sample classification results of the sample specified subgraphs corresponding to the training sample;
and training the graph neural network model and the classification model according to the sample properties and labels respectively corresponding to the training samples.
8. A prediction device, the device comprising:
a first determining module, configured to establish, according to the atoms contained in an obtained molecule to be predicted and the chemical bonds among the atoms, a molecular graph with the atoms as nodes and the chemical bonds as edges;
a second determining module, configured to input the molecular graph into a pre-trained graph neural network model to obtain a plurality of specified subgraphs of the molecular graph output by the graph neural network model, wherein the specified subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted;
a fusion module, configured to determine the specified feature of each specified subgraph and determine fusion features according to the specified features and pre-stored characterization features respectively corresponding to specified properties;
a classification module, configured to input the fusion features into a classification model to obtain target classification results output by the classification model, wherein a target classification result characterizes the specified property of the substructure corresponding to a specified subgraph;
and a prediction module, configured to predict the molecular property of the molecule to be predicted according to the target classification results respectively corresponding to the specified subgraphs.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method of any one of claims 1 to 7.
CN202310420710.9A 2023-04-14 2023-04-14 Prediction method and device, readable storage medium and electronic equipment Pending CN116453615A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310420710.9A CN116453615A (en) 2023-04-14 2023-04-14 Prediction method and device, readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116453615A true CN116453615A (en) 2023-07-18

Family

ID=87127013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310420710.9A Pending CN116453615A (en) 2023-04-14 2023-04-14 Prediction method and device, readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116453615A (en)

Similar Documents

Publication Publication Date Title
CN109034183B (en) Target detection method, device and equipment
CN116663618B (en) Operator optimization method and device, storage medium and electronic equipment
CN115618964B (en) Model training method and device, storage medium and electronic equipment
CN114943307A (en) Model training method and device, storage medium and electronic equipment
CN116049761A (en) Data processing method, device and equipment
CN116757278B (en) Training method and device of prediction model, storage medium and electronic equipment
CN116860259B (en) Method, device and equipment for model training and automatic optimization of compiler
CN116186330B (en) Video deduplication method and device based on multi-mode learning
CN117093862A (en) Model training method and device, electronic equipment and storage medium
CN116453615A (en) Prediction method and device, readable storage medium and electronic equipment
CN115600090A (en) Ownership verification method and device for model, storage medium and electronic equipment
CN113344197A (en) Training method of recognition model, service execution method and device
CN112307371A (en) Applet sub-service identification method, device, equipment and storage medium
CN111598092A (en) Method for determining target area in image, method and device for identifying target
CN115017915B (en) Model training and task execution method and device
CN116434787B (en) Voice emotion recognition method and device, storage medium and electronic equipment
CN116578877B (en) Method and device for model training and risk identification of secondary optimization marking
CN115862675B (en) Emotion recognition method, device, equipment and storage medium
CN117668543A (en) Model training method and device, storage medium and electronic equipment
CN116563387A (en) Training method and device of calibration model, storage medium and electronic equipment
CN117079274A (en) Training method and device for recognition model, storage medium and electronic equipment
CN117313739A (en) Training method, device, equipment and storage medium of language model
CN117454247A (en) Model training method and device, storage medium and electronic equipment
CN117351946A (en) Voice recognition method and device, storage medium and electronic equipment
CN117593002A (en) Risk identification method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination