CN114822718A - Human oral bioavailability prediction method based on graph neural network - Google Patents
- Publication number
- CN114822718A (application number CN202210306054.5A)
- Authority
- CN
- China
- Prior art keywords
- atoms
- neural network
- information
- atomic
- chemical bond
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Crystallography & Structural Chemistry (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Pharmacology & Pharmacy (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- Medicinal Chemistry (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a human oral bioavailability prediction method based on a graph neural network, in the technical field of molecular chemical property prediction. The method comprises an initial atom and chemical bond feature extraction module and a graph neural network module. In the feature extraction module, molecular structure information is converted into a molecular graph, the initial features of atoms and chemical bonds are defined for the graph neural network to use, and the atomic structure information is used to construct an atom adjacency matrix representing the topological structure of the molecule. In the graph neural network module, forward propagation comprises two steps, message passing and readout, and message passing is performed multiple times to generate good hidden representations of atoms and chemical bonds. Using the graph neural network avoids the extraction of molecular descriptors and reduces workload; the chemical bond message absorption mechanism lets chemical bonds help the model learn a better molecular representation and improves the interpretability of the graph neural network.
Description
Technical Field
The invention relates to the technical field of molecular chemical property prediction, in particular to a human oral bioavailability prediction method based on a graph neural network.
Background
Human oral bioavailability is one of the most important pharmacokinetic properties in the development of oral drugs. Excluding candidate drugs with low human oral bioavailability in the early stage of oral drug discovery and development reduces resource consumption. At present, the human oral bioavailability of candidate drugs is usually predicted by combining molecular descriptors, either computed by a specific calculation method or defined by experts, with a machine learning algorithm. Predefined molecular descriptors increase workload and, because they are based on previous drug development experience, they bring no new insights to new drug development and carry an unavoidable experience bias. With the development of deep learning, graph neural networks have been widely applied to molecular property prediction tasks. A graph neural network needs only simple atom and chemical bond features to be defined; it then learns a hidden representation of the molecule automatically, without extracted molecular descriptors, and completes the property prediction. Constructing a human oral bioavailability prediction model with a graph neural network is therefore of great practical significance for assisting the research and development of new drugs and for promoting the application of artificial intelligence in drug discovery.
Because predicting human oral bioavailability has high theoretical and practical value and can significantly reduce the resources wasted on candidate drugs whose human oral bioavailability is too low, researchers at home and abroad have continually proposed new methods for predicting this property. Falcón-Cano et al. [1] integrated several machine learning models and extracted a variety of 0D-2D molecular descriptors to construct a human oral bioavailability prediction model. Predicting human oral bioavailability with a graph neural network belongs to the field of molecular property prediction. Gilmer et al. [3] proposed the message passing neural network model, which constructs the graph convolution operation from atomic message passing and greatly outperformed traditional methods in quantum chemical property prediction.
The prior art has the following disadvantages:
(1) Human oral bioavailability prediction models
Previous models for predicting human oral bioavailability represent molecules by molecular descriptors, which can be divided into predefined molecular descriptors and molecular descriptors based on a specific calculation. Predefined molecular descriptors are developed by pharmacologists from previous drug development experience. However, the compounds synthesized so far occupy only a small part of chemical space, so relying on previous experience inevitably leads to problems such as experience bias and misjudgment. For descriptors based on a specific computational method, the researcher usually does not know how relevant the descriptor is to the task, which limits prediction performance for certain properties, such as human oral bioavailability. Using a graph neural network to automatically extract a molecular representation that is highly correlated with human oral bioavailability may help predict this property more accurately.
(2) Molecular property prediction model based on graph neural network
At present, the forward propagation of graph neural networks for molecular property prediction does not take into account the essential nature of a chemical bond, which represents the electron cloud around a pair of atoms. When the state of an atom changes, the state of its chemical bonds should change as well. However, most models do not update chemical bonds during message passing, and even when chemical bonds are updated, the interaction between atoms and chemical bonds is insufficient. Improving this interaction and updating chemical bonds in a manner consistent with chemical knowledge may help improve the molecular property prediction performance of graph neural networks.
Based on the above, the invention designs a human oral bioavailability prediction method based on a graph neural network to solve the problems.
Disclosure of Invention
The present invention aims to provide a method for predicting human oral bioavailability based on a graph neural network, so as to solve the problems proposed in the background technology.
To achieve this purpose, the invention provides the following technical solution: a human oral bioavailability prediction method based on a graph neural network, comprising an initial atom and chemical bond feature extraction module and a graph neural network module.
In the initial atom and chemical bond feature extraction module, molecular structure information is converted into a molecular graph for the graph neural network, the initial features of atoms and chemical bonds are defined for the graph neural network to use, and an atom adjacency matrix representing the topological structure of the molecule is constructed from the atomic structure information.
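As a minimal sketch of the adjacency matrix step, assume the molecule is already available as a list of atoms and a list of bonded atom index pairs. The function name `build_adjacency` and the ethanol heavy-atom example are illustrative choices and do not come from the patent.

```python
import numpy as np

def build_adjacency(num_atoms, bonds):
    """Return a symmetric 0/1 atom adjacency matrix for an undirected molecular graph."""
    adj = np.zeros((num_atoms, num_atoms), dtype=np.int64)
    for i, j in bonds:
        adj[i, j] = 1
        adj[j, i] = 1  # chemical bonds are undirected
    return adj

# Hypothetical example: the heavy atoms of ethanol, C-C-O.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
adjacency = build_adjacency(len(atoms), bonds)
print(adjacency)
```

Row i of the matrix marks the neighbors of atom i, which is the information the message passing steps need to decide which atoms exchange messages.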
The forward propagation of the graph neural network comprises two steps, message passing and readout. Message passing is performed multiple times to generate good hidden representations of atoms and chemical bonds; the readout operation turns the hidden representations of atoms and chemical bonds into a hidden representation of the molecule, which a fully connected network then uses to produce the prediction result.
S1: Message passing, which comprises three stages: atomic message passing, chemical bond message absorption, and self-attention scaling.
In the atomic message passing stage, each atom in the molecular graph absorbs information from the atoms and chemical bonds connected to it, according to:
where the weight matrices are all learnable; $d_t$ and $c_t$ are the dimensions of the atom state vector and the chemical bond state vector in the t-th update, respectively; $d_{t+1}$ is the dimension of the atom state vector in the (t+1)-th update; and $\sigma(\cdot)$ is the ReLU nonlinear activation function. In this process, the information of the central atom i is updated with the information of its neighboring atoms and of the chemical bonds connecting them.
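The following NumPy sketch shows one plausible form of such an atom update. The patent's equations are not legible in this text, so the exact combination used here (embed neighbor atom and bond states into a hidden space of size h, sum over neighbors, reduce, and add a self term) is an assumption, as are all names such as `W_v`, `W_e`, `W_red` and `W_self`.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def atom_message_passing(V, E, bonds, W_v, W_e, W_red, W_self):
    """One assumed atom update step.

    V: (n_atoms, d_t) atom states; E: (n_bonds, c_t) bond states.
    bonds: list of (i, j) atom index pairs, one per row of E.
    W_v: (d_t, h) and W_e: (c_t, h) embed into the hidden space.
    W_red: (h, d_next) reduces the aggregated message.
    W_self: (d_t, d_next) carries the atom's own information.
    """
    h = W_v.shape[1]
    messages = np.zeros((V.shape[0], h))
    for k, (i, j) in enumerate(bonds):
        # Each bond delivers a message in both directions.
        messages[i] += relu(V[j] @ W_v + E[k] @ W_e)
        messages[j] += relu(V[i] @ W_v + E[k] @ W_e)
    return relu(V @ W_self + messages @ W_red)

rng = np.random.default_rng(0)
n_atoms, d_t, c_t, h, d_next = 3, 4, 2, 8, 5
bonds = [(0, 1), (1, 2)]
V = rng.normal(size=(n_atoms, d_t))
E = rng.normal(size=(len(bonds), c_t))
V_next = atom_message_passing(
    V, E, bonds,
    rng.normal(size=(d_t, h)), rng.normal(size=(c_t, h)),
    rng.normal(size=(h, d_next)), rng.normal(size=(d_t, d_next)),
)
print(V_next.shape)  # (3, 5)
```

The random weights stand in for learned parameters; only the shapes and the flow of information from neighbors and bonds into each atom are meant to be illustrative.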
In the chemical bond message absorption stage, each chemical bond absorbs the information of the two atoms it connects in order to update itself, according to:
where the weight matrices are all learnable, and the state vectors of the two atoms connected by $e_{ij}$ are concatenated.
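A bond update of this kind might look as follows. This is a sketch under assumptions, not the patent's formula: the concatenated atom pair and the bond's own state are each embedded into a hidden space of size h, added, activated, and then reduced to the next bond dimension. The names `W_pair`, `W_self` and `W_red` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def bond_message_absorption(V, E, bonds, W_pair, W_self, W_red):
    """One assumed bond update step.

    V: (n_atoms, d) atom states; E: (n_bonds, c) bond states.
    W_pair: (2 * d, h) embeds the concatenated atom pair.
    W_self: (c, h) embeds the bond's own state.
    W_red: (h, c_next) reduces to the next bond dimension.
    """
    idx = np.asarray(bonds)
    # Concatenate the state vectors of the two atoms joined by each bond.
    pair = np.concatenate([V[idx[:, 0]], V[idx[:, 1]]], axis=1)
    hidden = relu(pair @ W_pair + E @ W_self)
    return relu(hidden @ W_red)

rng = np.random.default_rng(0)
n_atoms, d, c, h, c_next = 3, 4, 2, 8, 6
bonds = [(0, 1), (1, 2)]
V = rng.normal(size=(n_atoms, d))
E = rng.normal(size=(len(bonds), c))
E_next = bond_message_absorption(
    V, E, bonds,
    rng.normal(size=(2 * d, h)), rng.normal(size=(c, h)),
    rng.normal(size=(h, c_next)),
)
print(E_next.shape)  # (2, 6)
```

Note that plain concatenation depends on the order of the two atoms in each pair; a symmetric variant would need an extra step that this sketch leaves out.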
Through atomic message passing and chemical bond message absorption, the information of each atom flows to the atoms and chemical bonds connected to it, and each chemical bond also absorbs the information of its surrounding atoms. After multiple updates, molecular information has flowed through all atoms and chemical bonds, so that every atom and chemical bond carries the topological information of its neighborhood.
In the self-attention scaling stage, the model focuses on the important atom and chemical bond features, according to:
where $V^{t+1}$ and $E^{t+1}$ are the state matrices of atoms and chemical bonds after atomic message passing and chemical bond message absorption have been completed in the t-th update; the weight matrices are all learnable; and the product used is the Hadamard (element-wise) product of matrices. $W_{va1}$ embeds the information in the atom state matrix into a high-dimensional space; after activation, $W_{va2}$ extracts the information, and the SoftMax(·) function converts the values into attention weights, giving the atomic attention weight vector. If this attention weight vector were multiplied directly with the atom state matrix by Hadamard product, the values of all features, even the important ones, would be reduced, by an amount that depends on $d_{t+1}$: the larger $d_{t+1}$, the more the feature values shrink. The attention weight vector is therefore enlarged by a factor of $d_{t+1}$, so that its average is scaled to 1 regardless of the feature vector length $d_{t+1}$, which makes the model easier to train.
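The effect of the rescaling can be checked numerically. The sketch below assumes that SoftMax is taken over the feature dimension of each atom and that the activation between the two matrices is ReLU; neither detail is confirmed by the text. `W_a1` and `W_a2` are stand-ins for the two attention matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_self_attention(V, W_a1, W_a2):
    """Return the reweighted states and the rescaled attention weights.

    V: (n_atoms, d) states; W_a1: (d, k) embeds; W_a2: (k, d) extracts.
    """
    d = V.shape[1]
    hidden = np.maximum(V @ W_a1, 0.0)
    weights = softmax(hidden @ W_a2, axis=1)  # each row sums to 1, so its mean is 1/d
    scaled = d * weights                      # each row now has mean 1
    return scaled * V, scaled                 # element-wise (Hadamard) product

rng = np.random.default_rng(0)
n_atoms, d, k = 3, 4, 16
V = rng.normal(size=(n_atoms, d))
V_att, weights = scaled_self_attention(
    V, rng.normal(size=(d, k)), rng.normal(size=(k, d))
)
print(weights.mean(axis=1))  # each entry is 1 up to rounding
```

Without the factor d, every feature would be multiplied by a weight averaging 1/d, so wider state vectors would be shrunk more strongly; with it, features with above-average attention are amplified and the rest are damped.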
S2: Readout. In the readout stage, atoms and chemical bonds are processed simultaneously with several readout functions to obtain a better hidden representation of the molecule, according to:
$$v_{all} = \mathrm{Set2Set}(V^{T}) \,\Vert\, \mathrm{Mean}(V^{T}) \,\Vert\, \mathrm{Max}(V^{T}) \tag{8}$$
$$e_{all} = \mathrm{Set2Set}(E^{T}) \,\Vert\, \mathrm{Mean}(E^{T}) \,\Vert\, \mathrm{Max}(E^{T}) \tag{9}$$
$$z = v_{all} \,\Vert\, e_{all} \tag{10}$$
where Mean(·) and Max(·) are global average pooling and global max pooling, respectively, and $\Vert$ denotes concatenation.
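Equations (8) to (10) can be sketched in a few lines of NumPy. Set2Set is a learned, recurrent readout and is too long to reproduce here, so this sketch substitutes plain sum pooling for it; that substitution is an assumption made only to keep the example short. The input values are made up.

```python
import numpy as np

def readout(states):
    """Concatenate several global poolings of an (n_items, dim) state matrix.

    Sum pooling is a simple stand-in for Set2Set, not an implementation of it.
    """
    return np.concatenate([
        states.sum(axis=0),   # stand-in for Set2Set
        states.mean(axis=0),  # global average pooling
        states.max(axis=0),   # global max pooling
    ])

# Hypothetical final atom and bond states after T updates.
V_T = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]])
E_T = np.array([[1.0, -1.0], [3.0, 1.0]])

v_all = readout(V_T)
e_all = readout(E_T)
z = np.concatenate([v_all, e_all])  # hidden representation of the molecule
print(z)
```

Because every pooling collapses the atom or bond axis, z has the same length for molecules of any size, which is what allows a fixed fully connected layer to consume it.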
Preferably, the extracted initial atom features, including atom type, atomic number, aromaticity and hybridization mode, serve as the atom representation; the extracted initial chemical bond features, including bond type, whether the bond is covalent, and stereoisomerism type, serve as the chemical bond representation.
Preferably, in S1, the embedding matrices embed the information of atoms and chemical bonds, respectively, into a hidden space of dimension h; the dimension-reduction matrix converts the information in the hidden space into the dimension required by the next graph neural network layer; and a further matrix collects the information of atom i itself.
Preferably, in S1, an embedding matrix embeds the information of the two atoms into a hidden space of dimension h; a further matrix collects the information of the chemical bond $e_{ij}$ itself and embeds it into the hidden space; and the dimension-reduction matrix converts the information in the hidden space into the dimension required for chemical bonds by the next graph neural network layer.
Preferably, in S1, the chemical bond state matrix is processed in the same manner as the atom state matrix.
Preferably, in S2, the results of the several readout functions are concatenated, so that the overall atom representation $v_{all}$ and the overall chemical bond representation $e_{all}$ better represent the overall state of the molecule.
Preferably, in S2, $v_{all}$ and $e_{all}$ are concatenated to obtain the hidden representation z of the molecule, and the fully connected layer f(·) then produces the prediction result.
Compared with the prior art, the invention has the beneficial effects that:
1. The method provides chemical bond message absorption, so that the graph neural network can adaptively fuse features from important layers according to molecular structure information while filtering out noise, which improves the molecular representation capability. It provides a self-attention scaling mechanism, so that the model can focus on features strongly related to human oral bioavailability while preventing these features from being excessively reduced, which also improves the molecular representation capability. The method is highly interpretable: it can identify the molecular substructures highly related to human oral bioavailability and offers insight, from an artificial intelligence perspective beyond the human point of view, for the research and development of new drugs.
2. Using the graph neural network avoids the extraction of molecular descriptors and reduces workload; the chemical bond message absorption mechanism lets chemical bonds help the model learn a better molecular representation and improves the interpretability of the graph neural network.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of a neural network module of the present invention;
FIG. 3 is a schematic diagram of atomic messaging and chemical bond message absorption in accordance with the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to FIG. 1 to FIG. 3, the present invention provides a technical solution for a human oral bioavailability prediction method based on a graph neural network: the method comprises an initial atom and chemical bond feature extraction module and a graph neural network module.
In the initial atom and chemical bond feature extraction module, molecular structure information is converted into a molecular graph for the graph neural network, and the initial features of atoms and chemical bonds are defined for the graph neural network to use. The extracted initial atom features, including atom type, atomic number, aromaticity and hybridization mode, serve as the atom representation; the extracted initial chemical bond features, including bond type, whether the bond is covalent, and stereoisomerism type, serve as the chemical bond representation. An atom adjacency matrix representing the topological structure of the molecule is constructed from the atomic structure information.
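One common way to turn such properties into an initial atom feature vector is one-hot encoding. The patent does not state the encoding or the allowed values, so the vocabularies and the function names below are hypothetical; bond features could be encoded in the same way.

```python
import numpy as np

# Hypothetical vocabularies; the patent does not list the allowed values.
ATOM_TYPES = ["C", "N", "O", "S", "other"]
HYBRIDIZATIONS = ["SP", "SP2", "SP3", "other"]

def one_hot(value, vocabulary):
    """One-hot encode value, mapping unknown values to the last slot."""
    vec = np.zeros(len(vocabulary))
    index = vocabulary.index(value) if value in vocabulary else len(vocabulary) - 1
    vec[index] = 1.0
    return vec

def atom_features(symbol, atomic_number, is_aromatic, hybridization):
    """Concatenate atom type, atomic number, aromaticity and hybridization."""
    return np.concatenate([
        one_hot(symbol, ATOM_TYPES),
        [float(atomic_number)],
        [float(is_aromatic)],
        one_hot(hybridization, HYBRIDIZATIONS),
    ])

# Example: an sp3 oxygen atom that is not aromatic.
features = atom_features("O", 8, False, "SP3")
print(features)
```

Stacking one such vector per atom gives the initial atom state matrix that the first message passing update would consume.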
The forward propagation of the graph neural network comprises two steps, message passing and readout. Message passing is performed multiple times to generate good hidden representations of atoms and chemical bonds; the readout operation turns the hidden representations of atoms and chemical bonds into a hidden representation of the molecule, which a fully connected network then uses to produce the prediction result.
S1: Message passing, which comprises three stages: atomic message passing, chemical bond message absorption, and self-attention scaling.
In the atomic message passing stage, each atom in the molecular graph absorbs information from the atoms and chemical bonds connected to it, according to:
where the weight matrices are all learnable; $d_t$ and $c_t$ are the dimensions of the atom state vector and the chemical bond state vector in the t-th update, respectively; $d_{t+1}$ is the dimension of the atom state vector in the (t+1)-th update; and $\sigma(\cdot)$ is the ReLU nonlinear activation function. The embedding matrices embed the information of atoms and chemical bonds, respectively, into a hidden space of dimension h; the dimension-reduction matrix converts the information in the hidden space into the dimension required by the next graph neural network layer; and a further matrix collects the information of atom i itself. This process updates the information of the central atom i with the information of its neighboring atoms and of the chemical bonds connecting them. FIG. 3(a) shows the atomic message passing process.
In the chemical bond message absorption stage, each chemical bond absorbs the information of the two atoms it connects in order to update itself, according to:
where the weight matrices are all learnable, and the state vectors of the two atoms connected by $e_{ij}$ are concatenated. An embedding matrix embeds the information of the two atoms into a hidden space of dimension h; a further matrix collects the information of the chemical bond $e_{ij}$ itself and embeds it into the hidden space; and the dimension-reduction matrix converts the information in the hidden space into the dimension required for chemical bonds by the next graph neural network layer. FIG. 3(b) shows the chemical bond message absorption process.
Through atomic message passing and chemical bond message absorption, the information of each atom flows to the atoms and chemical bonds connected to it, and each chemical bond also absorbs the information of its surrounding atoms. After multiple updates, molecular information has flowed through all atoms and chemical bonds, so that every atom and chemical bond carries the topological information of its neighborhood.
In the self-attention scaling stage, the model focuses on the important atom and chemical bond features, according to:
where $V^{t+1}$ and $E^{t+1}$ are the state matrices of atoms and chemical bonds after atomic message passing and chemical bond message absorption have been completed in the t-th update; the weight matrices are all learnable; and the product used is the Hadamard (element-wise) product of matrices. $W_{va1}$ embeds the information in the atom state matrix into a high-dimensional space; after activation, $W_{va2}$ extracts the information, and the SoftMax(·) function converts the values into attention weights, giving the atomic attention weight vector. At this point the average value of the attention weight vector is $1/d_{t+1}$. If the attention weight vector were multiplied directly with the atom state matrix by Hadamard product, the values of all features, even the important ones, would be reduced, by an amount that depends on $d_{t+1}$: the larger $d_{t+1}$, the more the feature values shrink. The attention weight vector is therefore enlarged by a factor of $d_{t+1}$, so that its average is scaled to 1 regardless of the feature vector length $d_{t+1}$. This avoids an excessive reduction of feature values when attention is used and makes the model easier to train. The chemical bond state matrix is processed in the same way.
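The averaging argument can be written out explicitly. The symbol $a$ for the SoftMax output is introduced here for illustration only, and the derivation assumes that SoftMax is taken over the $d_{t+1}$ feature positions, so that the weights sum to 1:

```latex
\sum_{k=1}^{d_{t+1}} a_k = 1
\;\Longrightarrow\;
\frac{1}{d_{t+1}} \sum_{k=1}^{d_{t+1}} a_k = \frac{1}{d_{t+1}},
\qquad
\frac{1}{d_{t+1}} \sum_{k=1}^{d_{t+1}} \left( d_{t+1}\, a_k \right) = 1 .
```

For example, with $d_{t+1} = 64$ the unscaled weights average $1/64$, so a typical feature would be reduced to under 2% of its value; after rescaling, the weights average 1.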
S2: Readout. In the readout stage, atoms and chemical bonds are processed simultaneously with several readout functions to obtain a better hidden representation of the molecule, according to:
$$v_{all} = \mathrm{Set2Set}(V^{T}) \,\Vert\, \mathrm{Mean}(V^{T}) \,\Vert\, \mathrm{Max}(V^{T}) \tag{8}$$
$$e_{all} = \mathrm{Set2Set}(E^{T}) \,\Vert\, \mathrm{Mean}(E^{T}) \,\Vert\, \mathrm{Max}(E^{T}) \tag{9}$$
$$z = v_{all} \,\Vert\, e_{all} \tag{10}$$
where Mean(·) and Max(·) are global average pooling and global max pooling, respectively. The results of the several readout functions are concatenated, so that the overall atom representation $v_{all}$ and the overall chemical bond representation $e_{all}$ better represent the overall state of the molecule. $v_{all}$ and $e_{all}$ are then concatenated to obtain the hidden representation z of the molecule, and the fully connected layer f(·) produces the prediction result.
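The final prediction step can be illustrated with a small fully connected network. The text does not give the depth, width or output activation of f(·), so the two-layer form, the ReLU hidden layer and the hand-picked weights below are assumptions chosen only so that the arithmetic is easy to follow.

```python
import numpy as np

def fully_connected_head(z, W1, b1, W2, b2):
    """Assumed two-layer fully connected network f(z) returning one raw score."""
    hidden = np.maximum(z @ W1 + b1, 0.0)  # ReLU hidden layer
    return (hidden @ W2 + b2).item()

# Hypothetical molecule representation and weights.
z = np.array([1.0, -2.0, 0.5])
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
b1 = np.zeros(2)
W2 = np.array([[0.5], [1.0]])
b2 = np.array([0.1])

score = fully_connected_head(z, W1, b1, W2, b2)
print(score)
```

Whether the raw score is used directly as a regression output or passed through a sigmoid for a high/low bioavailability label is not stated in the text and would depend on how the training data are labeled.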
The method provides chemical bond message absorption, so that the graph neural network can adaptively fuse features from important layers according to molecular structure information while filtering out noise, which improves the molecular representation capability. It provides a self-attention scaling mechanism, so that the model can focus on features strongly related to human oral bioavailability while preventing these features from being excessively reduced, which also improves the molecular representation capability. The method is highly interpretable: it can identify the molecular substructures highly related to human oral bioavailability and offers insight, from an artificial intelligence perspective beyond the human point of view, for the research and development of new drugs. Using the graph neural network avoids the extraction of molecular descriptors and reduces workload; the chemical bond message absorption mechanism lets chemical bonds help the model learn a better molecular representation and improves the interpretability of the graph neural network.
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (8)
1. The human body oral bioavailability prediction method based on the graph neural network is characterized by comprising an initial atom and chemical bond feature extraction module and a graph neural network module;
in the initial atom and chemical bond feature extraction module, the graph neural network needs to convert molecular structure information into a molecular graph, the initial features of atoms and chemical bonds need to be defined for the graph neural network to use, and an atom adjacency matrix is constructed to represent a topological structure of molecules by utilizing the atom structure information;
the forward propagation of the graph neural network comprises two steps, namely message transmission and reading, wherein the message transmission needs to be carried out for multiple times to generate good hidden representations of atoms and chemical bonds, the reading operation enables the hidden representations of the atoms and the chemical bonds to generate hidden representations of molecules, and then the prediction is carried out by using a full-connection network to obtain a prediction result;
s1: message transmission, wherein the message transmission comprises three stages of atomic message transmission, chemical bond message absorption and self-attention zooming;
during the atomic messaging phase, each atom in the molecular graph will absorb information about the atoms and chemical bonds to which it is attachedAccording to the following steps:
wherein the content of the first and second substances,andare all learning matrices, d t And c t The dimensionality of the atomic state vector and the chemical bond state vector in the t-th update respectively; d t+1 Is the dimension of the primitive state vector in the t +1 th update; σ (-) is the ReLU nonlinear activation function; in the process, the information of the central atom i is updated by the information of the peripheral neighbor atoms and the chemical bonds connected with the peripheral neighbor atoms;
in the chemical bond message absorption stage, each chemical bond absorbs the information of the two atoms connected to it in order to update itself, according to: […]
wherein the matrices […] are all learnable matrices, and […] is the splicing (concatenation) of the state vectors of the two atoms connected by e_ij;
through atomic message transmission and chemical bond message absorption, the information of each atom flows to the atoms and chemical bonds connected to it, and the chemical bonds likewise absorb the information of the surrounding atoms; after multiple updates, the molecular information has flowed through all atoms and chemical bonds, so that the atoms and chemical bonds carry the topological information of their neighborhoods;
in the self-attention scaling stage, the model attends to the atomic and chemical bond features, according to: […]
wherein V^{t+1} and E^{t+1} are the state matrices of atoms and chemical bonds, respectively, after atomic message transmission and chemical bond message absorption have been completed in the t-th update; the matrices […] are all learnable matrices; […] is the Hadamard product of matrices; W_va1 embeds the information in the atomic state matrix into a high-dimensional space; after activation, W_va2 extracts the information, and the SoftMax(·) function converts the values into attention weights to obtain the atomic attention weight vector; taking the Hadamard product of the attention weight vector and the atomic state matrix directly would reduce the values of all features, even important ones, with a reduction amplitude related to d_{t+1}: the larger d_{t+1}, the more the feature values are reduced; the attention weight vector is therefore enlarged by a factor of d_{t+1}, so that the mean of the attention weight vector is scaled to 1 regardless of the feature vector length d_{t+1}, which makes the model easier to train;
S2: readout, wherein in the readout stage, atoms and chemical bonds are processed simultaneously using a plurality of readout functions to obtain a better hidden representation of the molecule, according to:
v_all = Set2Set(V^T) || Mean(V^T) || Max(V^T)    (8)
e_all = Set2Set(E^T) || Mean(E^T) || Max(E^T)    (9)
z = v_all || e_all    (10)
wherein Mean(·) and Max(·) are global average pooling and global maximum pooling, respectively.
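The atom adjacency matrix named in claim 1 can be illustrated with a minimal sketch. The function name `build_adjacency` and the bond-list input format are assumptions made for illustration; the claim does not specify how the matrix is built in practice.

```python
def build_adjacency(num_atoms, bonds):
    """Return a symmetric 0/1 atom adjacency matrix.

    `bonds` is a list of (i, j) atom index pairs (hypothetical input format).
    """
    adj = [[0] * num_atoms for _ in range(num_atoms)]
    for i, j in bonds:
        adj[i][j] = 1
        adj[j][i] = 1  # chemical bonds are undirected
    return adj


# Heavy atoms of ethanol, C0-C1-O2 (hydrogens omitted for brevity).
ethanol_bonds = [(0, 1), (1, 2)]
adjacency = build_adjacency(3, ethanol_bonds)
```

Row i of `adjacency` marks the neighbors of atom i, which is the neighborhood that the atomic message transmission stage iterates over.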
2. The method for predicting human oral bioavailability based on a graph neural network of claim 1, wherein: the extracted initial atomic features comprise atom type, atomic number, aromaticity and hybridization mode, which together serve as the atomic representation; and the extracted initial chemical bond features comprise bond type, whether the bond is a covalent bond, and stereoisomerism type, which together serve as the chemical bond representation.
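As a rough illustration of how the atomic features listed in this claim could be turned into an initial feature vector, the sketch below concatenates one-hot encodings with scalar values. The vocabularies `ATOM_TYPES` and `HYBRIDIZATIONS`, the helper names, and the choice of one-hot encoding are all assumptions; the claim only names the features.

```python
# Hypothetical vocabularies; the claim does not list the allowed values.
ATOM_TYPES = ["C", "N", "O", "S", "other"]
HYBRIDIZATIONS = ["SP", "SP2", "SP3", "other"]


def one_hot(value, choices):
    """One-hot encode `value`; unknown values map to the last slot ("other")."""
    idx = choices.index(value) if value in choices else len(choices) - 1
    return [1.0 if k == idx else 0.0 for k in range(len(choices))]


def atom_features(symbol, atomic_number, is_aromatic, hybridization):
    """Concatenate atom type, atomic number, aromaticity and hybridization."""
    return (
        one_hot(symbol, ATOM_TYPES)
        + [float(atomic_number)]
        + [1.0 if is_aromatic else 0.0]
        + one_hot(hybridization, HYBRIDIZATIONS)
    )


benzene_carbon = atom_features("C", 6, True, "SP2")
```

A chemical bond feature vector could be assembled the same way from the bond features named in the claim.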
3. The method for predicting human oral bioavailability based on a graph neural network of claim 1, wherein: in S1, the embedding matrices […] and […] respectively embed the information of atoms and chemical bonds into a hidden space of dimension h; the dimension reduction matrix […] converts the information in the hidden space into the dimension required by the next graph neural network layer; and […] collects the information of atom i itself.
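The roles of the four matrices in this claim can be sketched for a single atom. The update formula itself is not reproduced in this text, so the form used here is an assumption: the message is the sum over neighbors j of ReLU(W_atom v_j + W_bond e_ij), and the new state is ReLU(W_down message + W_self v_i). The names `W_atom`, `W_bond`, `W_down` and `W_self` are hypothetical labels for the two embedding matrices, the dimension reduction matrix, and the self-information matrix.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, xi) for xi in x]


def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]


def atom_update(i, atoms, bonds, neighbors, W_atom, W_bond, W_down, W_self):
    """Assumed form of one atomic message transmission step for atom i."""
    message = [0.0] * len(W_atom)  # lives in the hidden space of dimension h
    for j in neighbors[i]:
        e_ij = bonds[(min(i, j), max(i, j))]
        embedded = add(matvec(W_atom, atoms[j]), matvec(W_bond, e_ij))
        message = add(message, relu(embedded))
    # Reduce from h to d_{t+1} and add the atom's own information.
    return relu(add(matvec(W_down, message), matvec(W_self, atoms[i])))


# Toy molecule with three atoms in a chain: d_t = 2, c_t = 1, h = 2, d_{t+1} = 2.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bonds = {(0, 1): [1.0], (1, 2): [2.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]
new_v1 = atom_update(1, atoms, bonds, neighbors,
                     W_atom=identity, W_bond=[[1.0], [0.0]],
                     W_down=identity, W_self=identity)
```

With these toy weights the central atom 1 receives [2, 0] from atom 0 and [3, 1] from atom 2, then adds its own state [0, 1].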
4. The method for predicting human oral bioavailability based on a graph neural network of claim 1, wherein: in S1, the embedding matrix […] embeds the information of the two atoms into a hidden space of dimension h; […] collects the information of the chemical bond e_ij itself and embeds it into the hidden space; and the dimension reduction matrix […] converts the information in the hidden space into the dimension required for chemical bonds by the next graph neural network layer.
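The chemical bond message absorption described in this claim can be sketched in the same spirit. The formula is not reproduced in this text, so the form below is an assumption: the two atomic state vectors are concatenated, embedded together with the bond's own state into the hidden space, passed through ReLU, and reduced to the output dimension. The names `W_pair`, `W_self` and `W_down` are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, xi) for xi in x]


def bond_update(v_i, v_j, e_ij, W_pair, W_self, W_down):
    """Assumed form of one chemical bond message absorption step."""
    pair = v_i + v_j  # list concatenation, i.e. splicing the two atom states
    hidden = relu([a + b for a, b in
                   zip(matvec(W_pair, pair), matvec(W_self, e_ij))])
    return relu(matvec(W_down, hidden))


v_i, v_j, e_ij = [1.0, 0.0], [0.0, 2.0], [3.0]
W_pair = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # h x 2*d = 2 x 4
W_self = [[1.0], [-5.0]]                                # h x c_t = 2 x 1
W_down = [[0.5, 0.5]]                                   # c_{t+1} x h = 1 x 2
new_e = bond_update(v_i, v_j, e_ij, W_pair, W_self, W_down)
```

Note that plain concatenation depends on the order of the two atoms; how the original method handles this is not stated in the text.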
6. The method for predicting human oral bioavailability based on a graph neural network of claim 1, wherein: in S1, the chemical bond state vector matrix is processed in the same manner as described above.
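Assuming that "the same manner" refers to the self-attention scaling stage of S1, the scaling can be sketched for a single state vector, whether it belongs to an atom or to a chemical bond. The sketch assumes ReLU as the unnamed activation and applies SoftMax over the features of one state vector; the function name and the toy weights are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(xi - m) for xi in x]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_attention(state, W_va1, W_va2):
    """Return (rescaled state, scaled attention weights) for one state vector."""
    hidden = [max(0.0, s) for s in matvec(W_va1, state)]  # ReLU is an assumption
    weights = softmax(matvec(W_va2, hidden))              # sums to 1
    d = len(state)
    scaled = [d * w for w in weights]                     # mean becomes 1
    return [s * x for s, x in zip(scaled, state)], scaled


state = [1.0, 2.0, 3.0]
W_va1 = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5], [0.1, 0.1, 0.1]]
W_va2 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
new_state, scaled_weights = scaled_attention(state, W_va1, W_va2)
```

Without the factor d, every weight would be at most 1 and would average 1/d, so longer feature vectors would be shrunk more. Multiplying by d keeps the average weight at 1 for any feature length.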
7. The method for predicting human oral bioavailability based on a graph neural network of claim 1, wherein: in S2, the results obtained by the various readout functions are spliced, so that the resulting overall atomic representation v_all and overall chemical bond representation e_all better represent the overall state.
8. The method for predicting human oral bioavailability based on a graph neural network of claim 1, wherein: in S2, v_all and e_all are spliced to obtain the hidden representation z of the molecule, and a fully connected layer f(·) then performs prediction to obtain the prediction result.
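The splicing in equations (8) to (10) and the final prediction can be sketched as follows. For brevity this sketch keeps only the Mean and Max readouts and omits Set2Set, which needs a recurrent network, so it is a simplification and not the full readout of the claims. The single linear layer standing in for f(·) and its weights are hypothetical.

```python
def mean_pool(M):
    """Global average pooling over the rows (atoms or bonds) of M."""
    return [sum(col) / len(M) for col in zip(*M)]


def max_pool(M):
    """Global maximum pooling over the rows of M."""
    return [max(col) for col in zip(*M)]


def readout(M):
    # Simplified: Mean || Max only; the Set2Set term of (8) and (9) is omitted.
    return mean_pool(M) + max_pool(M)


V_T = [[1.0, 4.0], [3.0, 2.0]]  # final atomic states, one row per atom
E_T = [[0.5, 1.5]]              # final bond states, one row per bond

v_all = readout(V_T)
e_all = readout(E_T)
z = v_all + e_all               # equation (10): z = v_all || e_all

# Hypothetical stand-in for f(.): a single linear layer with fixed weights.
weights, bias = [0.1] * len(z), 0.0
prediction = sum(w * zi for w, zi in zip(weights, z)) + bias
```

Because each readout summarizes a variable number of rows into a fixed-length vector, z has the same length for molecules of any size.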
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210306054.5A CN114822718B (en) | 2022-03-25 | 2022-03-25 | Human oral bioavailability prediction method based on graph neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114822718A (en) | 2022-07-29 |
CN114822718B (en) | 2024-04-09 |
Family
ID=82531176
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202210306054.5A | Active | CN114822718B (en) | 2022-03-25 | 2022-03-25 | Human oral bioavailability prediction method based on graph neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114822718B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113140267A (en) * | 2021-03-25 | 2021-07-20 | 北京化工大学 | Directional molecule generation method based on graph neural network |
CN113241128A (en) * | 2021-04-29 | 2021-08-10 | 天津大学 | Molecular property prediction method based on molecular space position coding attention neural network model |
CN113299354A (en) * | 2021-05-14 | 2021-08-24 | 中山大学 | Small molecule representation learning method based on Transformer and enhanced interactive MPNN neural network |
WO2022022173A1 (en) * | 2020-07-30 | 2022-02-03 | 腾讯科技(深圳)有限公司 | Drug molecular property determining method and device, and storage medium |
Non-Patent Citations (1)
Title |
---|
张志扬; 张凤荔; 陈学勤; 王瑞锦: "基于分层注意力的信息级联预测模型" [Information cascade prediction model based on hierarchical attention], 计算机科学 [Computer Science], no. 06, 15 June 2020 (2020-06-15) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115966266A (en) * | 2023-01-06 | 2023-04-14 | 东南大学 | Anti-tumor molecule strengthening method based on graph neural network |
CN115966266B (en) * | 2023-01-06 | 2023-11-17 | 东南大学 | Anti-tumor molecule strengthening method based on graph neural network |
CN117935971A (en) * | 2024-03-22 | 2024-04-26 | 中国石油大学(华东) | Deep drilling fluid treatment agent performance prediction evaluation method based on graphic neural network |
Also Published As
Publication number | Publication date |
---|---|
CN114822718B (en) | 2024-04-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |