CN112185480B - Graph feature extraction and lipid water distribution coefficient prediction method and graph feature extraction model - Google Patents

Publication number
CN112185480B
Authority
CN
China
Prior art keywords
distribution coefficient
lipid
layer
feature
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011159909.3A
Other languages
Chinese (zh)
Other versions
CN112185480A (en)
Inventor
周文彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wangshi Intelligent Technology Co ltd
Original Assignee
Beijing Wangshi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wangshi Intelligent Technology Co ltd filed Critical Beijing Wangshi Intelligent Technology Co ltd
Priority to CN202011159909.3A priority Critical patent/CN112185480B/en
Publication of CN112185480A publication Critical patent/CN112185480A/en
Application granted granted Critical
Publication of CN112185480B publication Critical patent/CN112185480B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30: Prediction of properties of chemical compounds, compositions or mixtures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a graph feature extraction method, a lipid-water distribution coefficient prediction method, and a graph feature extraction model. The graph feature extraction method comprises: acquiring a feature graph to be extracted, which consists of a plurality of nodes and edges connecting the nodes that have association relationships; inputting the feature graph to be extracted into a graph feature extraction model to extract the features of each node, wherein the graph feature extraction model comprises a plurality of convolution layers and GRU network layers arranged alternately, and the features of associated nodes are fused through the GRU network layers; and inputting the features of each node output by the last convolution layer into a merging layer for feature fusion, thereby obtaining the features of the feature graph to be extracted. Because the GRU network layer fuses the feature information of associated nodes at every convolution operation, the network has better expressive power, is better suited to expressing interactions among the nodes of a graph, and reduces the up-front feature extraction workload.

Description

Graph feature extraction and lipid water distribution coefficient prediction method and graph feature extraction model
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a graph feature extraction method, a lipid-water distribution coefficient prediction method, and a graph feature extraction model.
Background
The lipid-water distribution coefficient logP, the logarithm of the ratio of a substance's concentration in octanol to its concentration in water, is an important reference quantity in drug design because it affects how a drug is absorbed in the body. Although this index can be measured by simple experiments, experimentally measuring a large number of candidate small molecules is impractical in the early virtual-screening stage of drug design, so pharmaceutical experts often rely on software-calculated logP values for coarse screening.
In the related art, a machine learning model is generally used to predict the logP of small molecules, but extracting the features requires extensive preliminary work and a great deal of expertise and experience, and is labor-intensive. A graph feature extraction method is therefore needed that learns reasonable and sufficient characterization information of small molecules, expresses molecular features more accurately, and reduces the workload.
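The definition above can be illustrated with a short calculation. This is only a sketch: the function name `log_p` and the example concentrations are hypothetical, and the base-10 logarithm is assumed because that is the usual convention for logP.

```python
import math

def log_p(conc_octanol: float, conc_water: float) -> float:
    """Base-10 log of the octanol/water concentration ratio (assumed convention)."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)

# Hypothetical concentrations in mol/L: 100 times more substance in octanol.
value = log_p(2.0e-3, 2.0e-5)
print(round(value, 6))
```

A positive value means the substance prefers the octanol (lipid) phase; a negative value means it prefers water.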
Disclosure of Invention
The technical problem to be solved by the invention is therefore to overcome the drawback of the prior art that feature extraction requires extensive preliminary work and a heavy workload, and to provide a graph feature extraction method, a lipid-water distribution coefficient prediction method, and a graph feature extraction model.
According to a first aspect, an embodiment of the invention discloses a graph feature extraction method, comprising: acquiring a feature graph to be extracted, wherein the feature graph to be extracted consists of a plurality of nodes and edges connecting the nodes that have association relationships; inputting the feature graph to be extracted into a graph feature extraction model to extract the features of each node, wherein the graph feature extraction model comprises a plurality of convolution layers and GRU network layers arranged alternately, and the features of associated nodes are fused through the GRU network layers; and inputting the features of each node output by the last convolution layer into a merging layer for feature fusion, thereby obtaining the features of the feature graph to be extracted.
Optionally, fusing the features of associated nodes through the GRU network layer comprises:
$w' = \mathrm{GRU}(w, \mathrm{around}_w)$
wherein $\mathrm{around}_w$ represents the total influence on node w of all other nodes v connected to node w in the graph; NN is an MLP neural network, with different types of edges corresponding to different network parameters; and w′ represents the updated feature vector of node w.
Optionally, inputting the features of each node output by the last convolution layer into the merging layer for feature fusion comprises: inputting the features of each node output by the last convolution layer into the merging layer, mapping the features of each node into a simulated fingerprint through the merging layer, and then performing feature fusion.
Optionally, mapping the features of each node into a simulated fingerprint through the merging layer and then performing feature fusion comprises:
wherein $w^{(n)}$ is the output feature vector of node w after the n-th graph convolution layer; dim indicates that every node feature vector is mapped into a dim-dimensional vector space; the mapping value is that of the n-th layer convolution output feature vector of node w; and softbmap denotes the output of the merging layer.
According to a second aspect, an embodiment of the invention further discloses a lipid-water distribution coefficient prediction method, comprising: extracting the features of a biological small molecule using the graph feature extraction method of the first aspect or any optional implementation of the first aspect; and predicting the lipid-water distribution coefficient from the extracted features of the biological small molecule using a pre-trained lipid-water distribution coefficient prediction model.
Optionally, the lipid-water distribution coefficient prediction model is trained as follows: acquiring first lipid-water distribution coefficient training data and training data related to the first lipid-water distribution coefficient, the related training data comprising at least one of solubility, melting point, dissociation constant, and the lipid-water distribution coefficient measured at pH = 7.4; and inputting the first lipid-water distribution coefficient training data and the related training data into a machine learning model for pre-training to obtain the lipid-water distribution coefficient prediction model.
Optionally, the method further comprises: acquiring second lipid-water distribution coefficient training data and training data related to the second lipid-water distribution coefficient, wherein the second related training data are of the same kinds as the first related training data, and the second lipid-water distribution coefficient training data and their related training data are more accurate than the first lipid-water distribution coefficient training data and their related training data; and inputting the second lipid-water distribution coefficient training data and the related training data into the lipid-water distribution coefficient prediction model for further training to obtain a target lipid-water distribution coefficient prediction model.
According to a third aspect, an embodiment of the invention further discloses a graph feature extraction model, comprising: an input layer for acquiring a feature graph to be extracted, wherein the feature graph to be extracted consists of a plurality of nodes and edges connecting the nodes that have association relationships; a plurality of convolution layers and GRU network layers, wherein each convolution layer inputs the features of the associated nodes in the feature graph to a GRU network layer for feature fusion and passes the result to the next convolution layer, this step being repeated until the last convolution layer; and a merging layer for fusing the features of each node output by the last convolution layer and outputting the fusion result through an output layer.
According to a fourth aspect, an embodiment of the present invention also discloses a computer device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform steps of the graph feature extraction method as described in the first aspect or any optional implementation of the first aspect or steps of the lipid water distribution coefficient prediction method as described in the second aspect or any optional implementation of the second aspect.
According to a fifth aspect, an embodiment of the invention further discloses a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the steps of the graph feature extraction method according to the first aspect or any optional implementation of the first aspect, or the steps of the lipid-water distribution coefficient prediction method according to the second aspect or any optional implementation of the second aspect.
The technical scheme of the invention has the following advantages:
1. In the graph feature extraction method provided by the invention, a feature graph to be extracted is acquired, consisting of a plurality of nodes and edges connecting the nodes that have association relationships; the feature graph is input into a graph feature extraction model to extract the features of each node, the model comprising a plurality of convolution layers and GRU network layers arranged alternately, with the features of associated nodes fused through the GRU network layers; and the features of each node output by the last convolution layer are input into a merging layer for feature fusion, yielding the features of the feature graph to be extracted. The GRU network layer fuses the feature information of associated nodes at every convolution operation. Because a GRU is sensitive to different input information to different degrees, its internal gating sub-networks retain useful information and discard useless information, so the network has better expressive power, is better suited to expressing interactions among the nodes of a graph, reduces the up-front feature extraction workload, and improves efficiency.
2. In the lipid-water distribution coefficient prediction method provided by the invention, the features of a biological small molecule are extracted using the graph feature extraction method, and the lipid-water distribution coefficient is predicted from the extracted features using a pre-trained lipid-water distribution coefficient prediction model. Extracting the features with the graph feature extraction method learns reasonable and sufficient characterization information of the small molecule and expresses its features more accurately, and predicting from these features with the pre-trained model makes the lipid-water distribution coefficient prediction more accurate while reducing the workload.
3. The graph feature extraction model provided by the invention comprises: an input layer for acquiring a feature graph to be extracted, consisting of a plurality of nodes and edges connecting the nodes that have association relationships; a plurality of convolution layers and GRU network layers, wherein each convolution layer inputs the features of the associated nodes in the feature graph to a GRU network layer for feature fusion and passes the result to the next convolution layer, this step being repeated until the last convolution layer; and a merging layer for fusing the features of each node output by the last convolution layer and outputting the fusion result through an output layer. The added GRU network layer fuses the feature information of associated nodes during the convolution operation. Because a GRU is sensitive to different input information to different degrees, its internal gating sub-networks retain useful information and discard useless information, so the network has better expressive power and is better suited to expressing interactions among the nodes of a graph.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a specific example of a graph feature extraction method in an embodiment of the invention;
FIG. 2 is a flowchart showing a specific example of a method for predicting a lipid distribution coefficient in an embodiment of the present invention;
FIG. 3 is a diagram showing a specific example of a training method of the lipid distribution coefficient prediction model according to the embodiment of the present invention;
FIG. 4 is a diagram showing another specific example of a training method of the lipid distribution coefficient prediction model according to the embodiment of the present invention;
FIG. 5 is a graph of the predicted scatter of XLogP3 and logP according to an embodiment of the present invention;
FIG. 6 is a diagram showing a specific example of the feature extraction model in accordance with an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a specific example of a feature extraction device in accordance with an embodiment of the invention;
FIG. 8 is a schematic block diagram showing a specific example of a lipid-water distribution coefficient prediction apparatus in the embodiment of the present invention;
FIG. 9 is a diagram showing a specific example of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediate medium, or internal between two components; and it may be wireless or wired. The specific meaning of the above terms in the present invention will be understood by those of ordinary skill in the art according to the specific case.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
An embodiment of the invention discloses a graph feature extraction method which, as shown in FIG. 1, comprises the following steps:
S11: acquiring a feature graph to be extracted, wherein the feature graph to be extracted consists of a plurality of nodes and edges connecting the nodes that have association relationships.
The embodiment of the invention does not specifically limit the feature graph to be extracted; those skilled in the art can choose it according to the actual situation. The feature graph to be extracted consists of a plurality of nodes and edges connecting the nodes that have association relationships. For a molecular graph, for example, the nodes are the atoms that make up the molecule, whose attribute information includes formal charge, partial charge, hybridization type (e.g., sp2, sp3), and the like; the edges are the covalent bonds between atoms, including single bonds, double bonds, triple bonds, and pi bonds.
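A molecular graph of this kind could be held in a small data structure such as the one below. This is a minimal sketch, not a structure prescribed by the text: the class names `Atom` and `MolGraph`, the bond-type labels, and the partial-charge values in the example are all hypothetical.

```python
from dataclasses import dataclass, field

# Edge types named in the description; the string labels are hypothetical.
BOND_TYPES = {"single", "double", "triple", "pi"}

@dataclass
class Atom:
    symbol: str
    formal_charge: int = 0
    partial_charge: float = 0.0
    hybridization: str = "sp3"

@dataclass
class MolGraph:
    atoms: list = field(default_factory=list)
    bonds: list = field(default_factory=list)  # (i, j, bond_type) tuples

    def add_atom(self, atom):
        """Append a node and return its index."""
        self.atoms.append(atom)
        return len(self.atoms) - 1

    def add_bond(self, i, j, bond_type):
        """Add an undirected, typed edge between atoms i and j."""
        if bond_type not in BOND_TYPES:
            raise ValueError(f"unknown bond type: {bond_type}")
        self.bonds.append((i, j, bond_type))

    def neighbors(self, w):
        """Return (neighbor index, bond type) for every atom bonded to w."""
        out = []
        for i, j, bond_type in self.bonds:
            if i == w:
                out.append((j, bond_type))
            elif j == w:
                out.append((i, bond_type))
        return out

# Heavy atoms of a C-C=O fragment with made-up partial charges.
g = MolGraph()
c1 = g.add_atom(Atom("C", hybridization="sp3"))
c2 = g.add_atom(Atom("C", partial_charge=0.4, hybridization="sp2"))
o = g.add_atom(Atom("O", partial_charge=-0.4, hybridization="sp2"))
g.add_bond(c1, c2, "single")
g.add_bond(c2, o, "double")
```

Keeping the bond type on each edge matters later, because the fusion step uses a different network for each edge type.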
The feature graph to be extracted may be uploaded to a server manually or retrieved from a database. The embodiment of the invention does not specifically limit how the feature graph to be extracted is acquired; those skilled in the art can choose a method according to the actual situation.
S12: inputting the feature graph to be extracted into a graph feature extraction model to extract the features of each node, wherein the graph feature extraction model comprises a plurality of convolution layers and GRU network layers arranged alternately, and the features of associated nodes are fused through the GRU network layers.
The GRU network layer is a special kind of neural network built on the gated recurrent unit. Through its internal gating sub-networks it retains the important information in the input feature graph and passes it to the output feature vector while ignoring useless information. With reference to the information of the target node, the GRU network layer accumulates the information of the target node's neighbor nodes and then fuses it with the feature information of the target node.
The specific fusion method is as follows. The information of any node w is updated from the other nodes v directly connected to it: formula (1) computes the summary value $\mathrm{around}_w$ over all neighbor nodes v of w, which is the total influence of those neighbors on w, and formula (2) then updates w. The update uses the GRU, a basic unit of recurrent neural networks, which allows effective updating and forgetting.
$w' = \mathrm{GRU}(w, \mathrm{around}_w)$ (2)
wherein $\mathrm{around}_w$ represents the total influence on node w of all other nodes v connected to node w in the graph; NN is an MLP neural network, with different types of edges corresponding to different network parameters. For example, in a molecular graph, a single bond between w and v corresponds to $NN_{\mathrm{single\_bond}}$ and a double bond corresponds to $NN_{\mathrm{double\_bond}}$. w′ represents the updated feature vector of node w after the convolution operation.
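One convolution step with the GRU update might be sketched as follows. This is a minimal NumPy illustration, not the patented implementation. Since formula (1) is not reproduced in the text, the sketch assumes that around_w is the sum, over the neighbors v of w, of the bond-type-specific MLP applied to the features of v. It also assumes that w acts as the GRU hidden state and around_w as the GRU input, and it omits bias terms. The feature width `D`, the random weights, the helper names (`make_mlp`, `gru`, `conv_layer`), and the three-atom example graph are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # hypothetical node feature width

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_mlp(d):
    """Two-layer MLP with random weights, standing in for a trained NN."""
    w1 = rng.normal(0.0, 0.5, (d, d))
    w2 = rng.normal(0.0, 0.5, (d, d))
    return lambda x: np.tanh(x @ w1) @ w2

# One MLP per edge type, mirroring NN_single_bond and NN_double_bond.
EDGE_NN = {"single": make_mlp(D), "double": make_mlp(D)}

# Standard GRU cell weights (untrained, biases omitted for brevity).
GRU_W = {name: rng.normal(0.0, 0.5, (D, D))
         for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}

def gru(h, x):
    """Standard GRU cell: h is the state (node features), x the input (around)."""
    z = sigmoid(x @ GRU_W["Wz"] + h @ GRU_W["Uz"])           # update gate
    r = sigmoid(x @ GRU_W["Wr"] + h @ GRU_W["Ur"])           # reset gate
    cand = np.tanh(x @ GRU_W["Wh"] + (r * h) @ GRU_W["Uh"])  # candidate state
    return (1.0 - z) * h + z * cand

def conv_layer(feats, edges):
    """Aggregate typed neighbor messages, then update every node with the GRU."""
    around = np.zeros_like(feats)
    for i, j, bond in edges:  # edges are undirected, so messages go both ways
        around[i] += EDGE_NN[bond](feats[j])
        around[j] += EDGE_NN[bond](feats[i])
    return gru(feats, around)

feats = rng.normal(size=(3, D))               # three atoms
edges = [(0, 1, "single"), (1, 2, "double")]  # C-C=O style connectivity
updated = conv_layer(feats, edges)
print(updated.shape)
```

Calling `conv_layer` again on `updated` gives the second-layer features, and so on for deeper layers.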
Second-layer convolution operation:
$w'' = \mathrm{GRU}(w', \mathrm{around}_{w'})$ (4)
wherein $\mathrm{around}_{w'}$ represents the total influence on node w of all other nodes v connected to node w in the graph; w″ represents the feature vector of node w after the second convolution update.
Third-layer convolution operation:
$w''' = \mathrm{GRU}(w'', \mathrm{around}_{w''})$ (6)
wherein $\mathrm{around}_{w''}$ represents the total influence on node w of all other nodes v connected to node w in the graph; w‴ represents the feature vector of node w after the third convolution update.
By analogy, the feature information of the fourth-layer convolution operation, ..., and the n-th-layer convolution operation is obtained.
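The layer-by-layer updates follow a single pattern, which can be written compactly. The superscript notation $w^{(k)}$, with $w^{(0)} = w$, $w^{(1)} = w'$, and $w^{(2)} = w''$, is introduced here for convenience. The explicit sum for the neighbor summary is an assumption, since the odd-numbered formulas are not reproduced in the text.

```latex
\begin{aligned}
\mathrm{around}_{w^{(k-1)}} &= \sum_{v \in N(w)} NN_{e(w,v)}\bigl(v^{(k-1)}\bigr)
  && \text{(assumed form)} \\
w^{(k)} &= \mathrm{GRU}\bigl(w^{(k-1)},\ \mathrm{around}_{w^{(k-1)}}\bigr),
  && k = 1, \dots, n
\end{aligned}
```

Here $N(w)$ is the set of nodes connected to w, and $e(w,v)$ is the type of the edge between w and v, which selects the MLP parameters.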
S13: inputting the features of each node output by the last convolution layer into a merging layer for feature fusion, thereby obtaining the features of the feature graph to be extracted.
The merging layer may fuse the features of the nodes output by the last convolution layer in either of two ways: by directly summing the features of all nodes in the graph to obtain the features of the feature graph to be extracted, or by mapping the features of each node into a simulated fingerprint and then fusing them.
In the graph feature extraction method provided by the invention, a feature graph to be extracted is acquired, consisting of a plurality of nodes and edges connecting the nodes that have association relationships; the feature graph is input into a graph feature extraction model to extract the features of each node, the model comprising a plurality of convolution layers and GRU network layers arranged alternately, with the features of associated nodes fused through the GRU network layers; and the features of each node output by the last convolution layer are input into a merging layer for feature fusion, yielding the features of the feature graph to be extracted. The GRU network layer fuses the feature information of associated nodes at every convolution operation. Because a GRU is sensitive to different input information to different degrees, its internal gating sub-networks retain some information and discard useless information, so the network has better expressive power, is better suited to expressing interactions among the nodes of a graph, reduces the up-front feature extraction workload, and improves efficiency.
As an optional implementation of the embodiment of the present invention, step S13 comprises:
inputting the features of each node output by the last convolution layer into a merging layer, mapping the features of each node into a simulated fingerprint through the merging layer, and then performing feature fusion.
Illustratively, this is done as follows: the features of each node output by the last convolution layer are input into the merging layer, which maps all node information into an x-dimensional bitmap to construct an x-dimensional molecular fingerprint, so that the feature information is mapped to different positions of the bitmap, and the features of the nodes are then fused. In an embodiment of the present invention, the bitmap may have 2048 dimensions. Compared with directly summing the feature information of all nodes to obtain the features of the feature graph to be extracted, this approach has better expressive power, avoids confusion and mutual overwriting of feature information, and better expresses both the local and the global features of the feature graph to be extracted.
Specifically, feature extraction of the feature graph to be extracted can be performed by the following formula:
wherein $w^{(n)}$ is the output feature vector of node w after the n-th graph convolution layer; dim indicates that every node feature vector is mapped into a dim-dimensional vector space; the mapping value is that of the n-th layer convolution output feature vector of node w; and softbmap denotes the output of the merging layer.
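The idea of mapping node features into a simulated fingerprint can be sketched as below. The merging-layer formula is not reproduced in the text, so this sketch rests on assumptions: each final node vector is projected linearly into a dim-slot space, a softmax turns the projection into a soft position in the bitmap, and the per-node results are summed. The function name `soft_fingerprint`, the projection matrix, and the input sizes are hypothetical. Only the 2048-slot width comes from the description.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_fingerprint(node_feats, proj):
    """Map each node to a soft position in a dim-slot bitmap, then sum over nodes."""
    per_node = softmax(node_feats @ proj)  # shape (num_nodes, dim), rows sum to 1
    return per_node.sum(axis=0)            # shape (dim,)

rng = np.random.default_rng(0)
node_feats = rng.normal(size=(5, 8))  # 5 nodes with 8 features each (made up)
proj = rng.normal(size=(8, 2048))     # untrained projection into 2048 slots
fp = soft_fingerprint(node_feats, proj)
print(fp.shape)
```

Because each node contributes a distribution over slots rather than a raw vector, different nodes tend to land in different positions instead of being added on top of one another, which is the advantage the description claims over plain summation.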
An embodiment of the invention also discloses a lipid-water distribution coefficient prediction method which, as shown in FIG. 2, comprises the following steps:
S21: extracting the features of a biological small molecule using the graph feature extraction method described in the graph feature extraction method embodiment.
Specifically, the graph of the biological small molecule is input directly into the graph feature extraction model, and the features of the biological small molecule are obtained using the graph feature extraction method described in the graph feature extraction method embodiment.
S22: predicting the lipid-water distribution coefficient from the extracted features of the biological small molecule using a pre-trained lipid-water distribution coefficient prediction model.
Illustratively, the extracted features of the biological small molecule are input directly into the pre-trained lipid-water distribution coefficient prediction model to obtain a predicted lipid-water distribution coefficient. The prediction model may be an existing lipid-water distribution coefficient prediction model or may be trained in advance as required; it is not specifically limited here, and those skilled in the art can choose according to the actual situation.
As shown in FIG. 3, the lipid-water distribution coefficient prediction model can be trained by pre-training followed by fine-tuning: starting from initialized model parameters, the model is pre-trained on a large amount of software-calculated data and then fine-tuned on more accurate experimental and patent data.
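The pre-train-then-fine-tune scheme can be illustrated on a toy problem. This is a sketch under stated assumptions, not the model of the embodiment: a plain linear regressor stands in for the prediction model, synthetic data stand in for both data sources, and the sizes, noise levels, learning rates, and the systematic bias added to the "software-calculated" labels are all invented for illustration.

```python
import numpy as np

def train(weights, x, y, lr, epochs):
    """Full-batch gradient descent on mean squared error, starting from weights."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

def mse(w, x, y):
    return float(np.mean((x @ w - y) ** 2))

rng = np.random.default_rng(0)
true_w = np.array([1.5, -0.7, 0.3])  # made-up ground truth

# Large, cheap, slightly biased labels (stand-in for software-calculated values).
x_big = rng.normal(size=(5000, 3))
y_big = x_big @ (true_w + np.array([0.2, 0.1, -0.1])) + rng.normal(0, 0.3, 5000)

# Small, accurate labels (stand-in for experimental data).
x_small = rng.normal(size=(60, 3))
y_small = x_small @ true_w + rng.normal(0, 0.05, 60)

w_pre = train(np.zeros(3), x_big, y_big, lr=0.05, epochs=200)   # pre-training
w_fine = train(w_pre, x_small, y_small, lr=0.02, epochs=200)    # fine-tuning

print(mse(w_pre, x_small, y_small), mse(w_fine, x_small, y_small))
```

Fine-tuning starts from the pre-trained weights rather than from scratch, so the large dataset supplies a good initial estimate and the small accurate dataset corrects its systematic error.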
According to the lipid distribution coefficient prediction method provided by the invention, the characteristic extraction is carried out on the biological micromolecules by utilizing a graph characteristic extraction method, and the lipid distribution coefficient prediction is carried out on the extracted biological micromolecule characteristics by utilizing a pre-trained lipid distribution coefficient prediction model. According to the embodiment of the invention, the characteristics of the biological micromolecules are extracted by using the graph characteristic extraction method, so that reasonable and sufficient characterization information of the micromolecules is learned, the characteristics of the molecules are more accurately expressed, and the extracted characteristics of the biological micromolecules are predicted by using the pre-trained lipid distribution coefficient prediction model, so that the lipid distribution coefficient prediction is more accurate, and the workload is reduced.
As an optional implementation of this embodiment, the lipid-water distribution coefficient prediction model is trained as follows:
Acquire first lipid-water distribution coefficient training data and training data related to the first lipid-water distribution coefficient, the related training data comprising at least one of: solubility (logS), melting point (mp), dissociation coefficient (pKa), and the lipid-water distribution coefficient measured at pH = 7.4 (logD).
The first lipid-water distribution coefficient training data and the related training data may be obtained from a chemical database containing 1.8 million small-molecule compounds; all lipid-water distribution coefficient training data and related training data in this database are calculated by software. The related training data comprise at least one of: solubility, melting point, dissociation coefficient, and the lipid-water distribution coefficient measured at pH = 7.4.
Input the first lipid-water distribution coefficient training data and the related training data into a machine learning model for pre-training to obtain the lipid-water distribution coefficient prediction model.
Illustratively, as shown in Fig. 4, the first lipid-water distribution coefficient training data and the related training data are both input into a machine learning model for strongly supervised or weakly supervised multi-task pre-training to obtain the lipid-water distribution coefficient prediction model, and the final loss is the weighted sum of the per-task losses. Specifically, each quantity may be given a weight according to its correlation with the lipid-water distribution coefficient, for example: lipid-water distribution coefficient, 1; lipid-water distribution coefficient measured at pH = 7.4, 1; dissociation coefficient, 0.1; solubility, 1; melting point, 0.1. The embodiment does not limit how the weights are set; a person skilled in the art may set them according to the actual situation.
This embodiment uses multi-task learning: several quantities related to the lipid-water distribution coefficient are learned simultaneously, which strengthens the robustness of the model through synergy among the data and mitigates the data sparsity of this field.
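The weighted multi-task loss can be illustrated with a minimal sketch. The weights are the example values given in the text; everything else is an assumption made for illustration: the task names, the use of mean squared error as the per-task loss, and the convention that `None` marks a missing label (one simple way to cope with sparse labels).

```python
# Example weights from the text; the task names are hypothetical labels.
TASK_WEIGHTS = {
    "logP": 1.0,           # lipid-water distribution coefficient
    "logD_pH7.4": 1.0,     # distribution coefficient measured at pH = 7.4
    "pKa": 0.1,            # dissociation coefficient
    "solubility": 1.0,
    "melting_point": 0.1,
}


def mse(pred, target):
    """Mean squared error over labelled samples; None marks a missing label (assumption)."""
    pairs = [(p, t) for p, t in zip(pred, target) if t is not None]
    if not pairs:
        return 0.0  # a task with no labels in this batch contributes nothing
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


def multitask_loss(preds, targets, weights=TASK_WEIGHTS):
    """Final loss = weighted sum of the per-task losses."""
    return sum(weights[task] * mse(preds[task], targets[task]) for task in weights)


# Two-task toy batch: the second pKa label is missing.
toy_loss = multitask_loss(
    {"logP": [1.0, 3.0], "pKa": [2.0, 0.0]},
    {"logP": [0.0, 3.0], "pKa": [4.0, None]},
    weights={"logP": 1.0, "pKa": 0.1},
)
```

With these weights an error in pKa or melting point moves the total loss one tenth as much as the same error in the lipid-water distribution coefficient, which matches the stated intent of weighting by correlation with the target quantity.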
As an optional implementation of this embodiment, the lipid-water distribution coefficient prediction method further comprises:
Acquire second lipid-water distribution coefficient training data and training data related to the second lipid-water distribution coefficient, where the second related training data cover the same quantities as the first related training data, and the second training data and its related training data are more accurate than the first training data and its related training data.
The second lipid-water distribution coefficient training data and the related training data may be obtained from chemical databases and patent literature and are experimental data, so they are more accurate than the first training data and its related training data, which are obtained by calculation. Fine-tuning the lipid-water distribution coefficient prediction model with these more accurate data yields a more robust model.
Input the second lipid-water distribution coefficient training data and the related training data into the lipid-water distribution coefficient prediction model for training to obtain the target lipid-water distribution coefficient prediction model. For the specific training method, refer to the description of training the lipid-water distribution coefficient prediction model above; it is not repeated here.
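The two-stage schedule (pre-train on plentiful calculated data, then continue training the same parameters on scarcer but more accurate data) can be sketched as follows. This is only a toy illustration: the linear model, the synthetic data sets, the learning rate, and the names `train` and `mean_sq_error` are all assumptions and are not taken from the patent.

```python
def predict(weights, x):
    """Linear model y = w . x, standing in for the real prediction model."""
    return sum(wi * xi for wi, xi in zip(weights, x))


def train(weights, data, lr, epochs):
    """Plain stochastic gradient descent on squared error; returns new weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def mean_sq_error(weights, data):
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


# Stage 1 data: many samples with systematically biased "calculated" labels.
computed = [([1.0, k / 10], 0.5 + 2.2 * k / 10) for k in range(50)]
# Stage 2 data: few samples with accurate "experimental" labels.
measured = [([1.0, k / 2], 0.3 + 2.0 * k / 2) for k in range(5)]

initial = [0.0, 0.0]
pretrained = train(initial, computed, lr=0.01, epochs=200)    # pre-training
finetuned = train(pretrained, measured, lr=0.01, epochs=200)  # fine-tuning
```

Fine-tuning starts from `pretrained` rather than from `initial`, so the small accurate data set only has to correct the bias inherited from the calculated labels instead of learning the relationship from scratch.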
To test the effect of the method, it was compared with XLogP3, which is currently widely used and performs well. As shown in Fig. 5, the same 1800 drug-like small molecules were input to both models; the Pearson correlation coefficient obtained by the scheme of this embodiment is 0.83, versus 0.76 for XLogP3, indicating that the lipid-water distribution coefficient prediction of the invention is more accurate.
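For reference, the Pearson correlation coefficient used in this comparison can be computed as in the sketch below. The function name and the toy inputs are illustrative only; the 1800-molecule evaluation set is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between predicted and reference values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Perfectly linear toy data gives a coefficient of 1 (or -1 when inverted).
r_up = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A value closer to 1 means the predictions track the reference values more closely in a linear sense, which is the basis on which 0.83 is read as better than 0.76.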
An embodiment of the invention also discloses a graph feature extraction model which, as shown in Fig. 6, comprises:
An input layer 31, configured to acquire a feature graph to be extracted, the feature graph consisting of a plurality of nodes and edges connecting nodes that have an association relationship. For the specific implementation, refer to the description of step S11 in the above embodiment; it is not repeated here.
A plurality of convolution layers 32 and GRU network layers 33, arranged alternately. Each convolution layer 32 passes the features it extracts for the associated nodes of the feature graph to a GRU network layer 33 for feature fusion, and the fused features are input to the next convolution layer 32; this is repeated until the last convolution layer 32. For the specific implementation, refer to the description of step S12 in the above embodiment; it is not repeated here.
A merging layer 34, configured to fuse the features of each node output by the last convolution layer 32 and to output the fusion result through an output layer 35. For the specific implementation, refer to the description of step S13 in the above embodiment; it is not repeated here.
The graph feature extraction model provided by the invention comprises: an input layer for acquiring the feature graph to be extracted; a plurality of convolution layers and GRU network layers arranged alternately, in which each convolution layer passes the features it extracts for the associated nodes of the feature graph to a GRU network layer for feature fusion and the fused features are input to the next convolution layer, repeated until the last convolution layer; and a merging layer for fusing the features of each node output by the last convolution layer and outputting the fusion result through the output layer. By adding GRU network layers, the embodiment fuses the feature information of associated nodes at each convolution operation. Because a GRU is sensitive to different inputs to different degrees, its internal gating sub-networks retain some information and discard useless information, so the network has better expressive power and is better suited to expressing interactions among the nodes of a graph.
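The layer stack described above can be sketched in plain Python. This is a hedged illustration, not the patented implementation. In particular, the following are assumptions: the neighbour influence is taken as a sum of per-edge-type transforms (a single tanh layer standing in for the MLP), the GRU uses the standard update equations, the merging layer is a softmax projection summed over nodes, and all names, sizes, and the toy molecule are hypothetical.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell: state h is the node feature w, input x is the neighbour influence."""

    def __init__(self, dim, rng):
        # One input matrix and one state matrix per gate: update z, reset r, candidate n.
        self.W = {g: rand_matrix(dim, dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(dim, dim, rng) for g in "zrn"}

    def __call__(self, h, x):
        z = sigmoid(add(matvec(self.W["z"], x), matvec(self.U["z"], h)))
        r = sigmoid(add(matvec(self.W["r"], x), matvec(self.U["r"], h)))
        gated_h = [a * b for a, b in zip(r, h)]
        n = [math.tanh(a) for a in add(matvec(self.W["n"], x), matvec(self.U["n"], gated_h))]
        # Gate z decides how much of the old state is kept versus replaced.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def conv_gru_layer(features, edges, edge_maps, gru):
    """One graph convolution followed by GRU fusion of each node with its neighbours."""
    dim = len(next(iter(features.values())))
    around = {w: [0.0] * dim for w in features}
    for v, w, edge_type in edges:  # undirected edge: influence flows both ways
        m = edge_maps[edge_type]   # parameters depend on the edge type
        around[w] = add(around[w], [math.tanh(a) for a in matvec(m, features[v])])
        around[v] = add(around[v], [math.tanh(a) for a in matvec(m, features[w])])
    return {w: gru(features[w], around[w]) for w in features}


def merge_layer(features, proj):
    """Map each node to a softmax 'simulated fingerprint' and sum over all nodes."""
    total = [0.0] * len(proj)
    for vec in features.values():
        logits = matvec(proj, vec)
        peak = max(logits)
        exps = [math.exp(a - peak) for a in logits]
        norm = sum(exps)
        total = add(total, [e / norm for e in exps])
    return total


rng = random.Random(0)
dim, fp_dim, n_layers = 4, 6, 3
# Toy three-atom chain; feature values and edge types are made up.
features = {"C1": [1.0, 0.0, 0.0, 0.0], "C2": [1.0, 0.0, 0.0, 0.0], "O": [0.0, 1.0, 0.0, 0.0]}
edges = [("C1", "C2", "single"), ("C2", "O", "double")]
edge_maps = {t: rand_matrix(dim, dim, rng) for t in ("single", "double")}
for gru in [GRUCell(dim, rng) for _ in range(n_layers)]:
    features = conv_gru_layer(features, edges, edge_maps, gru)
fingerprint = merge_layer(features, rand_matrix(fp_dim, dim, rng))
```

Summing per-node softmax vectors gives a fixed-length graph feature regardless of how many nodes the graph has, which is what allows the downstream prediction model to accept molecules of different sizes.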
An embodiment of the invention discloses a graph feature extraction device which, as shown in Fig. 7, comprises:
An obtaining module 41, configured to acquire a feature graph to be extracted, the feature graph consisting of a plurality of nodes and edges connecting nodes that have an association relationship. For the specific implementation, refer to the description of step S11 in the above embodiment; it is not repeated here.
A first extraction module 42, configured to input the feature graph to be extracted into a graph feature extraction model for feature extraction to obtain the features of each node, where the graph feature extraction model comprises a plurality of convolution layers and GRU network layers arranged alternately, and the features of nodes having an association relationship are fused by the GRU network layers. For the specific implementation, refer to the description of step S12 in the above embodiment; it is not repeated here.
A fusion module 43, configured to input the features of each node output by the last convolution layer into the merging layer for feature fusion to obtain the features of the feature graph to be extracted. For the specific implementation, refer to the description of step S13 in the above embodiment; it is not repeated here.
The graph feature extraction device provided by the invention acquires a feature graph to be extracted, consisting of a plurality of nodes and edges connecting nodes that have an association relationship; inputs the feature graph into a graph feature extraction model for feature extraction to obtain the features of each node, the model comprising a plurality of convolution layers and GRU network layers arranged alternately, with the features of associated nodes fused by the GRU network layers; and inputs the features of each node output by the last convolution layer into a merging layer for feature fusion to obtain the features of the feature graph. The GRU network layers fuse the feature information of associated nodes at each convolution operation. Because a GRU is sensitive to different inputs to different degrees, its internal gating sub-networks retain some information and discard useless information, so the network has better expressive power and is better suited to expressing interactions among the nodes of a graph.
An embodiment of the invention also discloses a device for predicting the lipid-water distribution coefficient which, as shown in Fig. 8, comprises:
A second extraction module 51, configured to extract features of a biological small molecule using the graph feature extraction method of the above graph feature extraction method embodiment. For the specific implementation, refer to the description of step S21 in the above embodiment; it is not repeated here.
A prediction module 52, configured to predict the lipid-water distribution coefficient from the extracted biological small-molecule features using a pre-trained lipid-water distribution coefficient prediction model. For the specific implementation, refer to the description of step S22 in the above embodiment; it is not repeated here.
In the lipid-water distribution coefficient prediction device provided by the invention, features of the biological small molecule are extracted by the graph feature extraction method, and the lipid-water distribution coefficient is predicted from the extracted features by a pre-trained lipid-water distribution coefficient prediction model. Because the graph feature extraction method learns reasonable and sufficient characterization information of the small molecule, the features of the molecule are expressed more accurately; predicting from these features with the pre-trained model therefore makes the lipid-water distribution coefficient prediction more accurate and reduces the workload.
An embodiment of the invention further provides a computer device which, as shown in Fig. 9, may include a processor 61 and a memory 62. The processor 61 and the memory 62 may be connected by a bus or by other means; Fig. 9 takes connection by a bus as an example.
The processor 61 may be a central processing unit (CPU). The processor 61 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination of the above.
As a non-transitory computer-readable storage medium, the memory 62 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the graph feature extraction method or the lipid-water distribution coefficient prediction method in the embodiments of the invention (e.g., the obtaining module 41, the first extraction module 42, and the fusion module 43 shown in Fig. 7, or the second extraction module 51 and the prediction module 52 shown in Fig. 8). By running the non-transitory software programs, instructions, and modules stored in the memory 62, the processor 61 executes its various functional applications and data processing, that is, implements the graph feature extraction method or the lipid-water distribution coefficient prediction method of the above method embodiments.
The memory 62 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created by the processor 61, and the like. In addition, the memory 62 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 62 may optionally include memory located remotely from the processor 61, which may be connected to the processor 61 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 62 and, when executed by the processor 61, perform the graph feature extraction method of the embodiment shown in Fig. 1 or the lipid-water distribution coefficient prediction method of the embodiment shown in Fig. 2.
For details of the above computer device, refer to the corresponding descriptions and effects in the embodiments shown in Figs. 1 to 2; they are not repeated here.
A person skilled in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims (5)

1. A method for predicting a lipid-water distribution coefficient, characterized by comprising the following steps:
extracting features of a biological small molecule using a preset graph feature extraction method;
predicting the lipid-water distribution coefficient from the extracted biological small-molecule features using a pre-trained lipid-water distribution coefficient prediction model;
wherein the preset graph feature extraction method comprises:
acquiring a feature graph to be extracted, the feature graph consisting of a plurality of nodes and edges connecting nodes that have an association relationship;
inputting the feature graph to be extracted into a graph feature extraction model for feature extraction to obtain the features of each node, wherein the graph feature extraction model comprises a plurality of convolution layers and GRU network layers arranged alternately, and the features of nodes having an association relationship are fused by the GRU network layers;
inputting the features of each node output by the last convolution layer into a merging layer for feature fusion to obtain the features of the feature graph to be extracted;
wherein fusing the features of nodes having an association relationship by the GRU network layers comprises:
$w' = \mathrm{GRU}(w, \mathrm{around}_w)$
where $\mathrm{around}_w$ represents the total influence on node w of all other nodes v connected to node w in the graph feature extraction model; […] is an MLP neural network, with different types of edges corresponding to different network parameters; and w' represents the updated feature vector of node w;
wherein inputting the features of each node output by the last convolution layer into the merging layer for feature fusion comprises:
inputting the features of each node output by the last convolution layer into the merging layer, mapping the features of each node into a simulated fingerprint through the merging layer, and then performing feature fusion;
wherein mapping the features of each node into a simulated fingerprint through the merging layer and then performing feature fusion comprises:
where $w^{(n)}$ is the output feature vector of node w after the n-th graph convolution layer; dim indicates that each node feature vector is mapped into a dim-dimensional vector space; […] denotes the mapped value of the output feature vector of node w from the n-th convolution layer; and softbmap denotes the output of the merging layer.
2. The method according to claim 1, wherein the lipid-water distribution coefficient prediction model is trained by:
acquiring first lipid-water distribution coefficient training data and training data related to the first lipid-water distribution coefficient, the related training data comprising at least one of: solubility, melting point, dissociation coefficient, and the lipid-water distribution coefficient measured at pH = 7.4;
and inputting the first lipid-water distribution coefficient training data and the training data related to the first lipid-water distribution coefficient into a machine learning model for pre-training to obtain the lipid-water distribution coefficient prediction model.
3. The method according to claim 2, wherein the method further comprises:
acquiring second lipid-water distribution coefficient training data and training data related to the second lipid-water distribution coefficient, wherein the second related training data cover the same quantities as the first related training data, and the second lipid-water distribution coefficient training data and its related training data are more accurate than the first lipid-water distribution coefficient training data and its related training data;
and inputting the second lipid-water distribution coefficient training data and the training data related to the second lipid-water distribution coefficient into the lipid-water distribution coefficient prediction model for training to obtain a target lipid-water distribution coefficient prediction model.
4. A computer device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the lipid-water distribution coefficient prediction method of any one of claims 1-3.
5. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the lipid-water distribution coefficient prediction method according to any one of claims 1 to 3.
CN202011159909.3A 2020-10-26 2020-10-26 Graph feature extraction and lipid water distribution coefficient prediction method and graph feature extraction model Active CN112185480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011159909.3A CN112185480B (en) 2020-10-26 2020-10-26 Graph feature extraction and lipid water distribution coefficient prediction method and graph feature extraction model

Publications (2)

Publication Number Publication Date
CN112185480A CN112185480A (en) 2021-01-05
CN112185480B true CN112185480B (en) 2024-01-26

Family

ID=73923371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011159909.3A Active CN112185480B (en) 2020-10-26 2020-10-26 Graph feature extraction and lipid water distribution coefficient prediction method and graph feature extraction model

Country Status (1)

Country Link
CN (1) CN112185480B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012051242A1 (en) * 2010-10-13 2012-04-19 Aspen Technology, Inc. Extension of cosmo-sac solvation model for electrolytes
CN106599198A (en) * 2016-12-14 2017-04-26 广东顺德中山大学卡内基梅隆大学国际联合研究院 Image description method for multi-stage connection recurrent neural network
CN108205613A (en) * 2017-12-11 2018-06-26 华南理工大学 The computational methods of similarity and system and their application between a kind of compound molecule
WO2019238680A1 (en) * 2018-06-11 2019-12-19 Givaudan Sa Method related to organic compositions
CN110957012A (en) * 2019-11-28 2020-04-03 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for analyzing properties of compound
CN111694917A (en) * 2020-06-10 2020-09-22 北京嘀嘀无限科技发展有限公司 Vehicle abnormal track detection and model training method and device
CN111783442A (en) * 2019-12-19 2020-10-16 国网江西省电力有限公司电力科学研究院 Intrusion detection method, device, server and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Novel Leakage Detection by Ensemble CNN-SVM and Graph-Based Localization in Water Distribution Systems;Jiheon Kang;《 IEEE Transactions on Industrial Electronics》;第65卷(第5期);全文 *
基于注意力机制的CNN-GRU短期电力负荷预测方法;赵兵;《电网技术》;第43卷(第12期);全文 *

Also Published As

Publication number Publication date
CN112185480A (en) 2021-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant