CN114822718A - Human oral bioavailability prediction method based on graph neural network - Google Patents


Info

Publication number
CN114822718A
Authority
CN
China
Prior art keywords
atoms
neural network
information
atomic
chemical bond
Legal status
Granted
Application number
CN202210306054.5A
Other languages
Chinese (zh)
Other versions
CN114822718B (en)
Inventor
杨云 (Yang Yun)
于明浩 (Yu Minghao)
Current Assignee
Yunnan University YNU
Original Assignee
Yunnan University YNU
Application filed by Yunnan University YNU
Priority to CN202210306054.5A
Publication of CN114822718A
Application granted
Publication of CN114822718B
Legal status: Active


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G16C20/70 Machine learning, data mining or chemometrics
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions


Abstract

The invention discloses a method for predicting human oral bioavailability based on a graph neural network, in the technical field of molecular property prediction. The method comprises an initial atom and chemical bond feature extraction module and a graph neural network module. The graph neural network requires the molecular structure to be converted into a molecular graph: initial features of atoms and chemical bonds are defined for the network to use, and an atom adjacency matrix representing the topology of the molecule is constructed from the atomic structure information. In the graph neural network module, forward propagation comprises two steps, message passing and readout, where message passing is repeated several times to produce good hidden representations of atoms and chemical bonds. Using a graph neural network avoids the extraction of molecular descriptors and reduces workload, and a chemical bond message absorption mechanism lets the chemical bonds help the model learn a better molecular representation, improving the interpretability of the graph neural network.

Description

Human oral bioavailability prediction method based on graph neural network
Technical Field
The invention relates to the technical field of molecular property prediction, and in particular to a method for predicting human oral bioavailability based on a graph neural network.
Background
Human oral bioavailability is one of the most important pharmacokinetic properties in the development of oral drugs. Excluding candidates with low human oral bioavailability early in oral drug discovery and development reduces resource consumption. At present, the human oral bioavailability of drug candidates is usually predicted by combining molecular descriptors, either computed by specific calculation methods or defined by experts, with machine learning algorithms. Predefined descriptors increase the workload and, because they are based on previous drug development experience, offer no new insights for new drug development and carry an unavoidable experience bias. With the development of deep learning, graph neural networks have been widely applied to molecular property prediction. A graph neural network automatically learns a hidden representation of a molecule from simple atom and chemical bond features, without extracting molecular descriptors, and uses it to predict the property. Building a human oral bioavailability prediction model with a graph neural network therefore has great practical significance: it can assist new drug research and development and promote the application of artificial intelligence in drug discovery.
Because predicting human oral bioavailability has high theoretical and practical value, and can markedly reduce the resources wasted on candidates whose human oral bioavailability is too low, many researchers in China and abroad have proposed new methods for predicting this property. Falcón-Cano et al. [1] extracted a variety of 0D to 2D molecular descriptors and built a human oral bioavailability prediction model from an ensemble of machine learning models. Predicting human oral bioavailability with a graph neural network belongs to the field of molecular property prediction: Gilmer et al. [3] proposed the message passing neural network, which builds the graph convolution operation on atomic message passing and greatly outperformed traditional methods in quantum chemical property prediction;
The prior art has the following disadvantages:
(1) Human oral bioavailability prediction models
Previous models for predicting human oral bioavailability represent molecules with molecular descriptors, which fall into predefined descriptors and descriptors based on specific calculation methods. Predefined descriptors are developed by pharmacologists from previous drug development experience; since the compounds synthesized so far occupy only a small part of chemical space, descriptors built on that experience inevitably suffer from experience bias and misjudgment. For descriptors based on specific calculation methods, researchers usually do not know how relevant a descriptor is to the task, which limits prediction performance for certain properties such as human oral bioavailability. Using a graph neural network to automatically extract a molecular representation that is highly correlated with human oral bioavailability may help predict this property more accurately.
(2) Molecular property prediction models based on graph neural networks
At present, the forward propagation of graph neural networks for molecular property prediction does not take into account the essential nature of chemical bonds, which represent the electron cloud around a pair of atoms: when the state of the atoms changes, the state of the bond should change as well. However, most models do not update chemical bonds during message passing, and even those that do provide insufficient interaction between atoms and bonds. Strengthening the interaction between atoms and bonds, and updating bonds in a way consistent with chemical knowledge, may help improve the molecular property prediction performance of graph neural networks.
Based on the above, the invention designs a human oral bioavailability prediction method based on a graph neural network to solve the problems.
Disclosure of Invention
The present invention aims to provide a method for predicting human oral bioavailability based on a graph neural network, so as to solve the problems described in the background.
To achieve this purpose, the invention provides the following technical solution: a method for predicting human oral bioavailability based on a graph neural network, comprising an initial atom and chemical bond feature extraction module and a graph neural network module;
in the initial atom and chemical bond feature extraction module, the molecular structure is converted into a molecular graph for the graph neural network: the initial features of atoms and chemical bonds are defined for the network to use, and an atom adjacency matrix representing the topology of the molecule is constructed from the atomic structure information;
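The adjacency-matrix construction described above can be sketched in plain Python as follows. This is an illustrative sketch, not the patent's implementation: the function name `adjacency_matrix`, the bond-list input format, and the ethanol example are assumptions, and in practice the bond list would come from a cheminformatics toolkit.

```python
from typing import List, Tuple


def adjacency_matrix(num_atoms: int, bonds: List[Tuple[int, int]]) -> List[List[int]]:
    """Return a symmetric 0/1 atom adjacency matrix for an undirected bond list."""
    adj = [[0] * num_atoms for _ in range(num_atoms)]
    for i, j in bonds:
        # A chemical bond connects both atoms, so mark both directions.
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


# Hypothetical example: heavy-atom graph of ethanol (C-C-O), hydrogens implicit.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
A = adjacency_matrix(len(atoms), bonds)
print(A)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

A dense matrix is the simplest form to read; for large batches of molecules, an edge list or sparse format is the more common choice.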
forward propagation of the graph neural network comprises two steps, message passing and readout. Message passing is performed multiple times to produce good hidden representations of atoms and chemical bonds; the readout operation then generates a hidden representation of the molecule from the hidden representations of the atoms and bonds, and a fully connected network uses it to produce the prediction;
S1: message passing, which comprises three stages: atomic message passing, chemical bond message absorption, and self-attention scaling;
in the atomic message passing stage, each atom in the molecular graph absorbs information from the atoms bonded to it and from the connecting chemical bonds (its neighborhood; the symbol is shown only as an image in the original), according to:
[Equations (1) and (2): shown only as images in the original]
where the weight matrices (symbols shown only as images in the original) are all learnable; $d_t$ and $c_t$ are the dimensions of the atom state vector and the chemical bond state vector at the t-th update, respectively; $d_{t+1}$ is the dimension of the atom state vector at the (t+1)-th update; and $\sigma(\cdot)$ is the ReLU nonlinear activation function. In this process, the information of the central atom i is updated with the information of its neighboring atoms and of the chemical bonds connecting them;
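Because equations (1) and (2) survive only as images, the sketch below implements one plausible reading of the atomic message passing step, assuming the form $m_i = \sum_{j \in N(i)} \sigma(W_v v_j + W_e e_{ij})$ followed by $v_i' = \sigma(W_s v_i + W_r m_i)$. The names `W_v`, `W_e`, `W_r`, `W_s` and the placement of the ReLU calls are hypothetical, chosen only to match the roles the text assigns to the embedding matrices (hidden dimension h), the dimension-reduction matrix, and the matrix that collects atom i's own information.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output dimension


def matvec(M: Matrix, x: Vector) -> Vector:
    """Matrix-vector product M @ x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def vadd(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def bond_key(i: int, j: int) -> Tuple[int, int]:
    """Bonds are undirected; each one is stored under (smaller, larger) index."""
    return (min(i, j), max(i, j))


def atom_message_passing(
    V: List[Vector],                   # atom states, each of length d_t
    E: Dict[Tuple[int, int], Vector],  # bond states, each of length c_t
    neighbors: List[List[int]],        # adjacency list of the molecular graph
    W_v: Matrix,  # h x d_t      : embeds a neighbor atom into the hidden space (assumed)
    W_e: Matrix,  # h x c_t      : embeds the connecting bond into the hidden space (assumed)
    W_r: Matrix,  # d_{t+1} x h  : reduces the hidden message to the next dimension (assumed)
    W_s: Matrix,  # d_{t+1} x d_t: carries atom i's own information (assumed)
) -> List[Vector]:
    h = len(W_v)
    new_V = []
    for i, v_i in enumerate(V):
        # Sum the ReLU-embedded (neighbor atom + connecting bond) messages in hidden space.
        m_i = [0.0] * h
        for j in neighbors[i]:
            msg = vadd(matvec(W_v, V[j]), matvec(W_e, E[bond_key(i, j)]))
            m_i = vadd(m_i, relu(msg))
        # Combine atom i's own state with the reduced neighborhood message.
        new_V.append(relu(vadd(matvec(W_s, v_i), matvec(W_r, m_i))))
    return new_V


# Toy example: C-C-O chain with d_t = 2, c_t = 1, h = 2, d_{t+1} = 2.
V = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
E = {(0, 1): [1.0], (1, 2): [2.0]}
neighbors = [[1], [0, 2], [1]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
W_e = [[1.0], [0.0]]
print(atom_message_passing(V, E, neighbors, I2, W_e, I2, I2))
# [[3.0, 0.0], [5.0, 1.0], [3.0, 1.0]]
```

The middle atom, which has two neighbors, accumulates the largest message, which is the neighborhood aggregation the text describes.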
in the chemical bond message absorption stage, each chemical bond absorbs the information of the two atoms it connects to update itself, according to:
[Equation (3): shown only as an image in the original]
where the weight matrices (symbols shown only as images in the original) are all learnable, and the concatenated vector (also shown as an image) joins the state vectors of the two atoms connected by bond $e_{ij}$;
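Equation (3) is likewise available only as an image. The following sketch assumes the bond update $e_{ij}' = \sigma(W_c\,\sigma(W_a [v_i \Vert v_j] + W_b e_{ij}))$, where `W_a`, `W_b`, `W_c` are hypothetical names for the two-atom embedding matrix, the matrix that collects the bond's own information, and the dimension-reduction matrix. Whether a bond reads the atom states from before or after this step's atom update is also an assumption; here the freshly updated states are passed in.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output dimension


def matvec(M: Matrix, x: Vector) -> Vector:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def vadd(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def bond_message_absorption(
    V: List[Vector],                   # atom states (here: already updated this step)
    E: Dict[Tuple[int, int], Vector],  # bond states keyed by (i, j) with i < j
    W_a: Matrix,  # h x (2 * atom_dim): embeds [v_i || v_j] into the hidden space (assumed)
    W_b: Matrix,  # h x c_t           : embeds the bond's own state (assumed)
    W_c: Matrix,  # c_{t+1} x h       : reduces back to the next bond dimension (assumed)
) -> Dict[Tuple[int, int], Vector]:
    new_E = {}
    for (i, j), e_ij in E.items():
        pair = V[i] + V[j]  # list concatenation implements [v_i || v_j]
        hidden = relu(vadd(matvec(W_a, pair), matvec(W_b, e_ij)))
        new_E[(i, j)] = relu(matvec(W_c, hidden))
    return new_E


V = [[3.0, 0.0], [5.0, 1.0], [3.0, 1.0]]
E = {(0, 1): [1.0], (1, 2): [2.0]}
W_a = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # adds the two atoms feature-wise
W_b = [[0.0], [1.0]]
W_c = [[1.0, 1.0]]
print(bond_message_absorption(V, E, W_a, W_b, W_c))  # {(0, 1): [10.0], (1, 2): [12.0]}
```

The toy `W_a` adds the two atoms feature-wise, so the result does not depend on which atom is listed first. A general `W_a` applied to $[v_i \Vert v_j]$ is order-dependent, so an implementation would need a convention, such as ordering by atom index or averaging both orientations.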
through atomic message passing and chemical bond message absorption, the information of each atom flows to the atoms and chemical bonds connected to it, and each chemical bond in turn absorbs the information of its surrounding atoms; after multiple updates, molecular information has flowed through all atoms and chemical bonds, so that each atom and bond carries topological information about its neighborhood;
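The statement that information spreads through the molecule after several updates can be made concrete with a simplified model that tracks only which atoms' initial states can influence each atom. Assuming each round lets an atom absorb what its bonded neighbors hold, T rounds reach atoms up to T bonds away, so molecules with longer bond paths need more rounds before every atom has seen the whole molecule. `receptive_fields` is a hypothetical helper for illustration only.

```python
from typing import List, Set


def receptive_fields(neighbors: List[List[int]], T: int) -> List[Set[int]]:
    """Atoms whose initial state can reach each atom after T message-passing rounds."""
    fields = [{i} for i in range(len(neighbors))]
    for _ in range(T):
        # Each round, an atom absorbs everything its direct neighbors had absorbed.
        fields = [fields[i].union(*(fields[j] for j in neighbors[i]))
                  for i in range(len(neighbors))]
    return fields


# Carbon skeleton of n-pentane: 0-1-2-3-4.
chain = [[1], [0, 2], [1, 3], [2, 4], [3]]
for T in (1, 2, 4):
    print(T, sorted(receptive_fields(chain, T)[0]))
# 1 [0, 1]
# 2 [0, 1, 2]
# 4 [0, 1, 2, 3, 4]
```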
in the self-attention scaling stage, the model focuses on the important atom and chemical bond features according to:
[Equations (4) to (7): shown only as images in the original]
where $V^{t+1}$ and $E^{t+1}$ are the state matrices of the atoms and chemical bonds, respectively, after atomic message passing and chemical bond message absorption in the t-th update; the weight matrices (symbols shown only as images in the original) are all learnable; and ⊙ denotes the Hadamard (element-wise) product of matrices. $W_{va1}$ embeds the information in the atom state matrix into a high-dimensional space; after activation, $W_{va2}$ extracts the information, and the SoftMax(·) function converts the values into attention weights, giving an atomic attention weight vector. Taking the Hadamard product of this attention weight vector with the atom state matrix directly would shrink the values of all features, including the important ones, and the shrinkage grows with $d_{t+1}$: the larger $d_{t+1}$, the more the feature values are reduced. The attention weight vector is therefore multiplied by $d_{t+1}$, which scales its mean to 1 regardless of the feature vector length $d_{t+1}$ and makes the model easier to train;
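Equations (4) to (7) are also images, so the following sketch implements self-attention scaling as the prose describes it, under stated assumptions: `W_va1` lifts each row to a higher dimension k, a ReLU is applied (the text says only "after activation"), `W_va2` maps back to one score per feature, SoftMax is taken over the features of each row, and the weights are multiplied by $d_{t+1}$ before the Hadamard product. The same function would be applied to the bond state matrix; the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def row_times_matrix(x: Vector, W: Matrix) -> Vector:
    """Row vector x (length r) times matrix W (r x k), giving a vector of length k."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) for c in range(len(W[0]))]


def self_attention_scaling(
    V: List[Vector],  # state matrix: one row per atom (or bond), d_{t+1} columns
    W_va1: Matrix,    # d_{t+1} x k, k > d_{t+1}: lifts features to a higher dimension
    W_va2: Matrix,    # k x d_{t+1}: maps back to one score per feature
    rescale: bool = True,
) -> Tuple[List[Vector], List[Vector]]:
    scaled_rows, weight_rows = [], []
    for v in V:
        d = len(v)
        hidden = [max(0.0, u) for u in row_times_matrix(v, W_va1)]  # ReLU (assumed)
        weights = softmax(row_times_matrix(hidden, W_va2))  # sums to 1, mean 1/d
        if rescale:
            weights = [d * w for w in weights]  # mean becomes 1 (up to rounding)
        weight_rows.append(weights)
        scaled_rows.append([w * x for w, x in zip(weights, v)])  # Hadamard product
    return scaled_rows, weight_rows


V = [[1.0, 2.0, 3.0, 4.0]]
W_va1 = [[0.1 * (r + c) for c in range(8)] for r in range(4)]
zeros = [[0.0] * 4 for _ in range(8)]  # uniform scores isolate the rescaling effect
print(self_attention_scaling(V, W_va1, zeros)[0])                 # [[1.0, 2.0, 3.0, 4.0]]
print(self_attention_scaling(V, W_va1, zeros, rescale=False)[0])  # [[0.25, 0.5, 0.75, 1.0]]
```

With an all-zero `W_va2` every weight is uniform, which shows the point of the rescaling: the scaled weights leave the features unchanged, while the unscaled SoftMax weights divide every feature by $d_{t+1}$ (here 4).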
S2: readout. In the readout stage, several readout functions process the atoms and chemical bonds simultaneously to obtain a better hidden representation of the molecule, according to:
$$v_{all} = \mathrm{Set2Set}(V^{T}) \Vert \mathrm{Mean}(V^{T}) \Vert \mathrm{Max}(V^{T}) \quad (8)$$
$$e_{all} = \mathrm{Set2Set}(E^{T}) \Vert \mathrm{Mean}(E^{T}) \Vert \mathrm{Max}(E^{T}) \quad (9)$$
$$z = v_{all} \Vert e_{all} \quad (10)$$
[Equation (11): shown only as an image in the original]
where Mean(·) and Max(·) are global average pooling and global max pooling, respectively, and ∥ denotes concatenation.
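The Mean and Max parts of equations (8) to (10) can be sketched as column-wise pooling followed by concatenation. Set2Set is left out because it requires an LSTM-driven attention loop, so this is a reduced version of the readout, not the full one; the helper names and toy values are hypothetical. The resulting vector z is what the fully connected layer f(·) would consume.

```python
from typing import List

Vector = List[float]


def mean_pool(X: List[Vector]) -> Vector:
    """Global average pooling: column-wise mean over all atoms (or bonds)."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def max_pool(X: List[Vector]) -> Vector:
    """Global max pooling: column-wise maximum over all atoms (or bonds)."""
    return [max(col) for col in zip(*X)]


def readout(V: List[Vector], E: List[Vector]) -> Vector:
    # Set2Set is omitted in this sketch; only Mean and Max are concatenated.
    v_all = mean_pool(V) + max_pool(V)  # list "+" is concatenation (||)
    e_all = mean_pool(E) + max_pool(E)
    return v_all + e_all  # z = v_all || e_all


V_T = [[1.0, 0.0], [3.0, 2.0], [2.0, 1.0]]  # three atoms, 2 features each
E_T = [[10.0], [12.0]]                      # two bonds, 1 feature each
print(readout(V_T, E_T))  # [2.0, 1.0, 3.0, 2.0, 11.0, 12.0]
```

Both pooling functions ignore the order in which atoms or bonds are listed, which is what lets a readout turn a variable-size molecular graph into a fixed-length vector.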
Preferably, the extracted initial atom features, used as the atom representation, comprise atom type, atomic number, aromaticity and hybridization; the extracted initial chemical bond features, used as the bond representation, comprise bond type, whether the bond is covalent, and stereoisomerism type.
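The following sketch shows one way the listed initial features could be encoded as numeric vectors. The vocabularies, their ordering, the catch-all `"other"` slot, and the use of one-hot encoding are assumptions for illustration; the source lists only the feature categories, including the covalent-bond flag, which is reproduced here as given.

```python
from typing import List, Sequence

# Hypothetical vocabularies; each ends with "other" as a catch-all bucket.
ELEMENTS = ["C", "N", "O", "S", "F", "Cl", "other"]
HYBRIDIZATIONS = ["SP", "SP2", "SP3", "other"]
BOND_TYPES = ["SINGLE", "DOUBLE", "TRIPLE", "AROMATIC", "other"]
STEREO_TYPES = ["NONE", "Z", "E", "other"]


def one_hot(value: str, vocab: Sequence[str]) -> List[float]:
    """One-hot encode value; unknown values map to the final 'other' slot."""
    idx = vocab.index(value) if value in vocab else len(vocab) - 1
    return [1.0 if k == idx else 0.0 for k in range(len(vocab))]


def atom_features(element: str, atomic_number: int, aromatic: bool,
                  hybridization: str) -> List[float]:
    return (one_hot(element, ELEMENTS)
            + [float(atomic_number), 1.0 if aromatic else 0.0]
            + one_hot(hybridization, HYBRIDIZATIONS))


def bond_features(bond_type: str, is_covalent: bool, stereo: str) -> List[float]:
    return (one_hot(bond_type, BOND_TYPES)
            + [1.0 if is_covalent else 0.0]
            + one_hot(stereo, STEREO_TYPES))


# An aromatic sp2 carbon and an aromatic bond, as in benzene.
x = atom_features("C", 6, True, "SP2")
b = bond_features("AROMATIC", True, "NONE")
print(len(x), len(b))  # 13 10
```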
Preferably, in S1, two embedding matrices (symbols shown only as images in the original) embed the information of atoms and of chemical bonds, respectively, into a hidden space of dimension h; a dimension-reduction matrix (shown as an image) converts the information in the hidden space into the dimension required by the next graph neural network layer; and a further matrix (shown as an image) collects the information of atom i itself.
Preferably, in S1, an embedding matrix (shown only as an image in the original) embeds the information of the two atoms into a hidden space of dimension h; a second matrix (shown as an image) embeds the information of chemical bond $e_{ij}$ itself into the hidden space; and a dimension-reduction matrix (shown as an image) converts the information in the hidden space into the dimension required by the chemical bonds of the next graph neural network layer.
Preferably, in S1, the mean of the attention weight vector before rescaling is $1/d_{t+1}$.
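The mean of the unscaled attention weight vector can be derived directly, assuming (as the surrounding text implies) that SoftMax is taken over the $d_{t+1}$ features of each row; $s_k$ below is a symbol introduced here for the pre-SoftMax score of feature k:

```latex
\alpha_k = \frac{\exp(s_k)}{\sum_{l=1}^{d_{t+1}} \exp(s_l)}
\;\Longrightarrow\;
\sum_{k=1}^{d_{t+1}} \alpha_k = 1
\;\Longrightarrow\;
\bar{\alpha} = \frac{1}{d_{t+1}} \sum_{k=1}^{d_{t+1}} \alpha_k = \frac{1}{d_{t+1}},
\qquad
\tilde{\alpha} = d_{t+1}\,\alpha
\;\Longrightarrow\;
\bar{\tilde{\alpha}} = 1 .
```

For example, with $d_{t+1} = 64$ and near-uniform weights, the unscaled Hadamard product shrinks every feature to about 1/64 of its value, whereas the rescaled weights leave it roughly unchanged.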
Preferably, in S1, the chemical bond state matrix is processed in the same way as the atom state matrix.
Preferably, in S2, the results of the various readout functions are concatenated, so that the overall atom representation $v_{all}$ and the overall chemical bond representation $e_{all}$ better represent their overall states.
Preferably, in S2, $v_{all}$ and $e_{all}$ are concatenated to obtain the hidden representation z of the molecule, which the fully connected layer f(·) then uses to produce the prediction.
Compared with the prior art, the invention has the following beneficial effects:
1. The method introduces chemical bond message absorption, which lets the graph neural network adaptively fuse features from the important layers according to the molecular structure while filtering out noise, improving the molecular representation. It also introduces a self-attention scaling mechanism, which lets the model focus on features strongly related to human oral bioavailability without excessively shrinking those features, further improving the molecular representation. The method is highly interpretable: it can identify molecular substructures highly related to human oral bioavailability, offering new drug research and development insights from artificial intelligence that go beyond the human perspective;
2. Using a graph neural network avoids extracting molecular descriptors and reduces workload, and the chemical bond message absorption mechanism lets the chemical bonds help the model learn a better molecular representation, improving the interpretability of the graph neural network.
Of course, a product implementing the invention need not achieve all of the above advantages at the same time.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of the graph neural network module of the invention;
FIG. 3 is a schematic diagram of atomic message passing and chemical bond message absorption according to the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.
Example 1
Referring to FIGS. 1 to 3, the invention provides the following technical solution: a method for predicting human oral bioavailability based on a graph neural network, comprising an initial atom and chemical bond feature extraction module and a graph neural network module;
in the initial atom and chemical bond feature extraction module, the molecular structure is converted into a molecular graph for the graph neural network, and the initial features of atoms and chemical bonds are defined for the network to use. The extracted initial atom features, used as the atom representation, comprise atom type, atomic number, aromaticity and hybridization; the extracted initial chemical bond features, used as the bond representation, comprise bond type, whether the bond is covalent, and stereoisomerism type. An atom adjacency matrix representing the topology of the molecule is constructed from the atomic structure information;
forward propagation of the graph neural network comprises two steps, message passing and readout. Message passing is performed multiple times to produce good hidden representations of atoms and chemical bonds; the readout operation then generates a hidden representation of the molecule from the hidden representations of the atoms and bonds, and a fully connected network uses it to produce the prediction;
S1: message passing, which comprises three stages: atomic message passing, chemical bond message absorption, and self-attention scaling;
in the atomic message passing stage, each atom in the molecular graph absorbs information from the atoms bonded to it and from the connecting chemical bonds (its neighborhood; the symbol is shown only as an image in the original), according to:
[Equations (1) and (2): shown only as images in the original]
where the weight matrices (symbols shown only as images in the original) are all learnable; $d_t$ and $c_t$ are the dimensions of the atom state vector and the chemical bond state vector at the t-th update, respectively; $d_{t+1}$ is the dimension of the atom state vector at the (t+1)-th update; and $\sigma(\cdot)$ is the ReLU nonlinear activation function. Two embedding matrices embed the information of atoms and of chemical bonds, respectively, into a hidden space of dimension h; a dimension-reduction matrix converts the information in the hidden space into the dimension required by the next graph neural network layer; and a further matrix collects the information of atom i itself. This process updates the central atom i with the information of its neighboring atoms and of the chemical bonds connecting them. FIG. 3(a) shows the atomic message passing process;
in the chemical bond message absorption stage, each chemical bond absorbs the information of the two atoms it connects to update itself, according to:
[Equation (3): shown only as an image in the original]
where the weight matrices (symbols shown only as images in the original) are all learnable, and the concatenated vector (also shown as an image) joins the state vectors of the two atoms connected by bond $e_{ij}$. An embedding matrix embeds the information of the two atoms into a hidden space of dimension h; a second matrix embeds the information of bond $e_{ij}$ itself into the hidden space; and a dimension-reduction matrix converts the information in the hidden space into the dimension required by the chemical bonds of the next graph neural network layer. FIG. 3(b) shows the chemical bond message absorption process;
through atomic message passing and chemical bond message absorption, the information of each atom flows to the atoms and chemical bonds connected to it, and each chemical bond in turn absorbs the information of its surrounding atoms; after multiple updates, molecular information has flowed through all atoms and chemical bonds, so that each atom and bond carries topological information about its neighborhood;
in the self-attention scaling stage, the model focuses on the important atom and chemical bond features according to:
[Equations (4) to (7): shown only as images in the original]
where $V^{t+1}$ and $E^{t+1}$ are the state matrices of the atoms and chemical bonds, respectively, after atomic message passing and chemical bond message absorption in the t-th update; the weight matrices (symbols shown only as images in the original) are all learnable; and ⊙ denotes the Hadamard (element-wise) product of matrices. $W_{va1}$ embeds the information in the atom state matrix into a high-dimensional space; after activation, $W_{va2}$ extracts the information, and the SoftMax(·) function converts the values into attention weights, giving an atomic attention weight vector whose mean at this point is $1/d_{t+1}$. Taking the Hadamard product of this attention weight vector with the atom state matrix directly would shrink the values of all features, including the important ones, and the shrinkage grows with $d_{t+1}$: the larger $d_{t+1}$, the more the feature values are reduced. The attention weight vector is therefore multiplied by $d_{t+1}$, which scales its mean to 1 regardless of the feature vector length $d_{t+1}$, prevents attention from shrinking the feature values excessively, and makes the model easier to train. The chemical bond state matrix is processed in the same way as the atom state matrix;
S2: readout. In the readout stage, several readout functions process the atoms and chemical bonds simultaneously to obtain a better hidden representation of the molecule, according to:
$$v_{all} = \mathrm{Set2Set}(V^{T}) \Vert \mathrm{Mean}(V^{T}) \Vert \mathrm{Max}(V^{T}) \quad (8)$$
$$e_{all} = \mathrm{Set2Set}(E^{T}) \Vert \mathrm{Mean}(E^{T}) \Vert \mathrm{Max}(E^{T}) \quad (9)$$
$$z = v_{all} \Vert e_{all} \quad (10)$$
[Equation (11): shown only as an image in the original]
where Mean(·) and Max(·) are global average pooling and global max pooling, respectively. The results of the various readout functions are concatenated (∥), so that the overall atom representation $v_{all}$ and the overall chemical bond representation $e_{all}$ better represent their overall states; $v_{all}$ and $e_{all}$ are then concatenated to obtain the hidden representation z of the molecule, which the fully connected layer f(·) uses to produce the prediction.
The method introduces chemical bond message absorption, which lets the graph neural network adaptively fuse features from the important layers according to the molecular structure while filtering out noise, improving the molecular representation. It also introduces a self-attention scaling mechanism, which lets the model focus on features strongly related to human oral bioavailability without excessively shrinking those features, further improving the molecular representation. The method is highly interpretable: it can identify molecular substructures highly related to human oral bioavailability, offering new drug research and development insights from artificial intelligence that go beyond the human perspective. Using a graph neural network avoids extracting molecular descriptors and reduces workload, and the chemical bond message absorption mechanism lets the chemical bonds help the model learn a better molecular representation, improving the interpretability of the graph neural network.
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (8)

1. A method for predicting human oral bioavailability based on a graph neural network, characterized by comprising an initial atom and chemical bond feature extraction module and a graph neural network module;
in the initial atom and chemical bond feature extraction module, the molecular structure is converted into a molecular graph for the graph neural network: the initial features of atoms and chemical bonds are defined for the network to use, and an atom adjacency matrix representing the topology of the molecule is constructed from the atomic structure information;
forward propagation of the graph neural network comprises two steps, message passing and readout. Message passing is performed multiple times to produce good hidden representations of atoms and chemical bonds; the readout operation then generates a hidden representation of the molecule from the hidden representations of the atoms and bonds, and a fully connected network uses it to produce the prediction;
S1: message passing, which comprises three stages: atomic message passing, chemical bond message absorption, and self-attention scaling;
in the atomic message passing stage, each atom in the molecular graph absorbs information from the atoms bonded to it and from the connecting chemical bonds (its neighborhood; the symbol is shown only as an image in the original), according to:
[Equations (1) and (2): shown only as images in the original]
where the weight matrices (symbols shown only as images in the original) are all learnable; $d_t$ and $c_t$ are the dimensions of the atom state vector and the chemical bond state vector at the t-th update, respectively; $d_{t+1}$ is the dimension of the atom state vector at the (t+1)-th update; and $\sigma(\cdot)$ is the ReLU nonlinear activation function. In this process, the information of the central atom i is updated with the information of its neighboring atoms and of the chemical bonds connecting them;
in the chemical bond message absorption stage, each chemical bond absorbs the information of the two atoms it connects in order to update itself, according to:
[Equation (3): image not reproduced in the source text]
wherein the two weight matrices in Equation (3) (symbols not reproduced in the source text) are learnable, and the concatenation operation (symbol not reproduced in the source text) concatenates the state vectors of the two atoms connected by $e_{ij}$;
through atomic message passing and chemical bond message absorption, atom information flows to the atoms and chemical bonds connected to each atom, and each chemical bond also absorbs information from the atoms around it; after multiple updates, molecular information has propagated through all atoms and chemical bonds, so that every atom and chemical bond carries the topological information of its neighborhood;
in the self-attention scaling stage, the model focuses attention on the atom and chemical bond features according to:
[Equations (4) to (7): images not reproduced in the source text]
wherein $V^{t+1}$ and $E^{t+1}$ are the state matrices of the atoms and chemical bonds, respectively, after atomic message passing and chemical bond message absorption have been completed in the t-th update; the weight matrices in the attention equations above (symbols not reproduced in the source text) are learnable; and the element-wise operator (symbol not reproduced in the source text) denotes the Hadamard product of matrices. $W_{va1}$ embeds the information in the atom state matrix into a higher-dimensional space; after activation, $W_{va2}$ extracts information from it, and the SoftMax(·) function converts the resulting values into attention weights, giving the atom attention weight vector. Taking the Hadamard product of this weight vector directly with the atom state matrix would reduce the values of all features, including the important ones, and the reduction grows with $d_{t+1}$: the larger $d_{t+1}$, the more the feature values shrink. The attention weight vector is therefore multiplied by $d_{t+1}$, so that its mean is scaled to 1 regardless of the feature vector length $d_{t+1}$, which makes the model easier to train;
S2: readout. In the readout stage, multiple readout functions are applied to the atoms and chemical bonds simultaneously to obtain a better hidden representation of the molecule, according to:
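$$v_{\mathrm{all}} = \mathrm{Set2Set}(V^{T}) \,\|\, \mathrm{Mean}(V^{T}) \,\|\, \mathrm{Max}(V^{T}) \qquad (8)$$
$$e_{\mathrm{all}} = \mathrm{Set2Set}(E^{T}) \,\|\, \mathrm{Mean}(E^{T}) \,\|\, \mathrm{Max}(E^{T}) \qquad (9)$$
$$z = v_{\mathrm{all}} \,\|\, e_{\mathrm{all}} \qquad (10)$$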
$$\hat{y} = f(z) \qquad (11)$$
wherein Mean(·) and Max(·) denote global average pooling and global max pooling, respectively, Set2Set(·) denotes the Set2Set readout function, and $\|$ denotes concatenation.
2. The human oral bioavailability prediction method based on a graph neural network according to claim 1, wherein: the extracted initial atom features, used as the atom representation, comprise the atom type, atomic number, aromaticity, and hybridization; the extracted initial chemical bond features, used as the chemical bond representation, comprise the bond type, whether the bond is a covalent bond, and the stereoisomer type.
3. The human oral bioavailability prediction method based on a graph neural network according to claim 1, wherein: in S1, two embedding matrices (symbols not reproduced in the source text) embed the information of the atoms and of the chemical bonds, respectively, into a hidden space of dimension h; a dimension-reduction matrix (symbol not reproduced in the source text) converts the information in the hidden space into the dimension required by the next graph neural network layer; and a further matrix (symbol not reproduced in the source text) collects the information of atom i itself.
4. The human oral bioavailability prediction method based on a graph neural network according to claim 1, wherein: in S1, an embedding matrix (symbol not reproduced in the source text) embeds the information of the two atoms into a hidden space of dimension h; a further matrix (symbol not reproduced in the source text) collects the information of chemical bond $e_{ij}$ itself and embeds it into the hidden space; and a dimension-reduction matrix (symbol not reproduced in the source text) converts the information in the hidden space into the dimension required for the chemical bonds of the next graph neural network layer.
5. The human oral bioavailability prediction method based on a graph neural network according to claim 1, wherein: in S1, the mean of the attention weight vector is [formula not reproduced in the source text].
6. The human oral bioavailability prediction method based on a graph neural network according to claim 1, wherein: in S1, the chemical bond state matrix is processed in the same manner as described above for the atom state matrix.
7. The human oral bioavailability prediction method based on a graph neural network according to claim 1, wherein: in S2, the results obtained by the multiple readout functions are concatenated, so that the resulting overall atom representation $v_{\mathrm{all}}$ and overall chemical bond representation $e_{\mathrm{all}}$ better represent their overall state.
8. The human oral bioavailability prediction method based on a graph neural network according to claim 1, wherein: in S2, $v_{\mathrm{all}}$ and $e_{\mathrm{all}}$ are concatenated to obtain the hidden representation z of the molecule, and the fully connected layer f(·) is then applied to z to obtain the prediction result.
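The feature extraction module of claim 1 and the features listed in claim 2 can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the patented implementation: the vocabularies (`ELEMENTS`, `HYBRIDIZATIONS`, `BOND_TYPES`, `STEREO_TYPES`), the `Atom` and `Bond` classes, and the hand-written ethanol example are all hypothetical; in practice a cheminformatics toolkit such as RDKit would usually supply these attributes.

```python
from dataclasses import dataclass

# Hypothetical feature vocabularies; the patent does not list them.
# The last entry of each list catches unseen values.
ELEMENTS = ["C", "N", "O", "F", "S", "Cl", "other"]
HYBRIDIZATIONS = ["SP", "SP2", "SP3", "other"]
BOND_TYPES = ["SINGLE", "DOUBLE", "TRIPLE", "AROMATIC", "other"]
STEREO_TYPES = ["NONE", "E", "Z", "other"]


def one_hot(value, vocab):
    """One-hot encode value; anything not in vocab maps to the last slot."""
    idx = vocab.index(value) if value in vocab else len(vocab) - 1
    return [1.0 if i == idx else 0.0 for i in range(len(vocab))]


@dataclass
class Atom:
    symbol: str
    atomic_number: int
    aromatic: bool
    hybridization: str


@dataclass
class Bond:
    begin: int
    end: int
    bond_type: str
    covalent: bool  # the boolean bond flag as worded in claim 2
    stereo: str


def atom_features(a: Atom) -> list:
    # atom type | atomic number | aromaticity | hybridization
    return (one_hot(a.symbol, ELEMENTS) + [float(a.atomic_number)]
            + [1.0 if a.aromatic else 0.0] + one_hot(a.hybridization, HYBRIDIZATIONS))


def bond_features(b: Bond) -> list:
    # bond type | boolean flag | stereo type
    return (one_hot(b.bond_type, BOND_TYPES) + [1.0 if b.covalent else 0.0]
            + one_hot(b.stereo, STEREO_TYPES))


def adjacency_matrix(n_atoms: int, bonds: list) -> list:
    """Symmetric 0/1 matrix with A[i][j] = 1 when atoms i and j share a bond."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for b in bonds:
        adj[b.begin][b.end] = 1
        adj[b.end][b.begin] = 1
    return adj


# Heavy atoms of ethanol (CCO), written out by hand rather than parsed from SMILES.
atoms = [Atom("C", 6, False, "SP3"), Atom("C", 6, False, "SP3"), Atom("O", 8, False, "SP3")]
bonds = [Bond(0, 1, "SINGLE", True, "NONE"), Bond(1, 2, "SINGLE", True, "NONE")]
V0 = [atom_features(a) for a in atoms]  # initial atom features, 13 values per atom
E0 = [bond_features(b) for b in bonds]  # initial bond features, 10 values per bond
A = adjacency_matrix(len(atoms), bonds)
print(A)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

The adjacency matrix is symmetric because chemical bonds are undirected; each bond's feature vector is stored once and shared by both of its atoms.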
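The exact form of Equations (1) and (2) in the atomic message passing stage of claim 1 is not recoverable from the source text. The sketch below assumes one plausible form consistent with the roles claim 3 gives the matrices: each neighbor atom and its connecting bond are embedded into a hidden space of size h and summed, a dimension-reduction matrix maps the sum to $d_{t+1}$, and a separate matrix carries atom i's own state. All names (`atom_message_passing`, `W_atom`, `W_bond`, `W_reduce`, `W_self`) are hypothetical.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vadd(a, b):
    return [x + y for x, y in zip(a, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def atom_message_passing(V, E, bond_index, W_atom, W_bond, W_reduce, W_self):
    """One atomic update in an assumed form (not the patent's exact equations).

    V: atom states (length d_t); E: bond states (length c_t), aligned with
    bond_index, a list of (i, j) atom pairs. W_atom (h x d_t) and W_bond (h x c_t)
    embed a neighbor atom and the connecting bond into the hidden space; W_reduce
    (d_next x h) maps the summed message to the next atom dimension; W_self
    (d_next x d_t) carries atom i's own state.
    """
    h = len(W_atom)
    messages = [[0.0] * h for _ in V]
    for k, (i, j) in enumerate(bond_index):
        # Each endpoint receives the other atom's state plus the shared bond's state.
        for center, neighbor in ((i, j), (j, i)):
            msg = relu(vadd(matvec(W_atom, V[neighbor]), matvec(W_bond, E[k])))
            messages[center] = vadd(messages[center], msg)
    return [relu(vadd(matvec(W_reduce, m), matvec(W_self, v))) for m, v in zip(messages, V)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_t, c_t, h, d_next = 4, 3, 8, 5
V = [[rng.random() for _ in range(d_t)] for _ in range(4)]  # atom 3 is left unbonded
bond_index = [(0, 1), (1, 2)]
E = [[rng.random() for _ in range(c_t)] for _ in bond_index]
W_atom, W_bond = random_matrix(h, d_t, rng), random_matrix(h, c_t, rng)
W_reduce, W_self = random_matrix(d_next, h, rng), random_matrix(d_next, d_t, rng)
V_next = atom_message_passing(V, E, bond_index, W_atom, W_bond, W_reduce, W_self)
```

The unbonded atom 3 is included only to show the self term in isolation: it receives no messages, so its new state depends on `W_self` alone.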
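Equation (3) of the chemical bond message absorption stage is likewise missing from the source text. Following the roles described in claim 4, the sketch assumes that the concatenated endpoint states $[v_i \| v_j]$ and the bond's own state are each embedded into a hidden space, summed, activated, and then reduced to the next bond dimension. The function `bond_message_absorption` and the matrices `W_pair`, `W_self`, `W_reduce` are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vadd(a, b):
    return [x + y for x, y in zip(a, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def bond_message_absorption(V, E, bond_index, W_pair, W_self, W_reduce):
    """Update each bond e_ij from [v_i || v_j] and its own state (assumed form).

    W_pair: h x 2d (embeds the two atoms), W_self: h x c_t (embeds the bond itself),
    W_reduce: c_next x h (maps the hidden vector to the next bond dimension).
    """
    E_next = []
    for (i, j), e in zip(bond_index, E):
        pair = V[i] + V[j]  # list concatenation implements [v_i || v_j]
        hidden = relu(vadd(matvec(W_pair, pair), matvec(W_self, e)))
        E_next.append(relu(matvec(W_reduce, hidden)))
    return E_next


# Tiny hand-checkable example: two atoms with 2-d states, one bond with a 1-d state.
V = [[1.0, 0.0], [0.0, 1.0]]
E = [[2.0]]
bond_index = [(0, 1)]
W_pair = [[1.0, 1.0, 1.0, 1.0]]  # h = 1
W_self = [[1.0]]
W_reduce = [[2.0]]
print(bond_message_absorption(V, E, bond_index, W_pair, W_self, W_reduce))  # [[8.0]]
```

Because concatenation is ordered, $[v_i \| v_j]$ and $[v_j \| v_i]$ generally give different results unless `W_pair` treats both halves alike; an order-invariant combination such as a sum would avoid this, at the cost of departing further from the claim's wording.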
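Claim 1 states that after multiple updates the molecular information flows through all atoms and chemical bonds. Assuming each update draws only on an atom's directly bonded neighbors and the bonds joining them, this can be made concrete: after T updates an atom's state can depend on atoms at most T bonds away, so every atom sees the whole molecule only once T reaches the molecule's graph diameter. The helper `atoms_within_hops` below is hypothetical and simply runs a breadth-first search on the adjacency matrix.

```python
from collections import deque


def atoms_within_hops(adj, start, hops):
    """Atoms at most `hops` bonds from `start` (breadth-first search on the adjacency matrix)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand past the hop limit
        for nbr, connected in enumerate(adj[node]):
            if connected and nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return sorted(dist)


# Carbon chain of pentane: 0-1-2-3-4 (diameter 4).
n = 5
adj = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
for t in range(1, 5):
    print(t, atoms_within_hops(adj, 0, t))
# 1 [0, 1]
# 2 [0, 1, 2]
# 3 [0, 1, 2, 3]
# 4 [0, 1, 2, 3, 4]
```

For the terminal carbon of pentane, four updates are needed before its state can reflect the atom at the other end of the chain.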
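Equations (4) to (7) of the self-attention scaling stage are not reproduced in the source text, so this sketch only encodes what the prose of claim 1 states: a first matrix lifts the state to a wider space, ReLU follows, a second matrix maps back, SoftMax yields per-feature weights, the weights are multiplied by $d_{t+1}$, and the result multiplies the state element-wise. Computing one weight vector per atom (row by row) is an assumption; the patent may instead compute a single vector for the whole matrix. `W1` and `W2` stand in for $W_{va1}$ and $W_{va2}$; per claim 6, the same function would serve the bond state matrix with its own weights.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_attention_weights(v, W1, W2):
    """SoftMax over the d features of one state vector, multiplied by d so the
    weights average to 1 rather than 1/d. W1 (k x d) lifts v; W2 (d x k) maps back."""
    d = len(v)
    a = softmax(matvec(W2, relu(matvec(W1, v))))
    return [d * w for w in a]


def self_attention_scaling(states, W1, W2):
    """Hadamard product of each row of a state matrix with its rescaled weights."""
    out = []
    for v in states:
        w = scaled_attention_weights(v, W1, W2)
        out.append([wi * vi for wi, vi in zip(w, v)])
    return out


rng = random.Random(1)
d, k = 6, 12
V = [[rng.uniform(0.0, 1.0) for _ in range(d)] for _ in range(3)]
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(k)]
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(d)]
V_scaled = self_attention_scaling(V, W1, W2)
w = scaled_attention_weights(V[0], W1, W2)
print(sum(w) / d)  # close to 1.0, whatever the weights
```

If `W2` is all zeros, SoftMax is uniform, every rescaled weight equals 1, and the states pass through unchanged; learned weights then enlarge some features and shrink others while keeping the average factor at 1.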
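The rescaling rule in the self-attention scaling stage of claim 1 follows from a basic property of SoftMax: its outputs sum to 1. Writing $d = d_{t+1}$ for the feature width, $s_1, \dots, s_d$ for the scores fed to SoftMax, and $a_k = e^{s_k} / \sum_{j=1}^{d} e^{s_j}$ for the resulting weights, the mean weight before and after multiplying by $d$ is:

$$\bar{a} = \frac{1}{d}\sum_{k=1}^{d} a_k = \frac{1}{d}\cdot\frac{\sum_{k=1}^{d} e^{s_k}}{\sum_{j=1}^{d} e^{s_j}} = \frac{1}{d}, \qquad \frac{1}{d}\sum_{k=1}^{d} d\,a_k = d\,\bar{a} = 1$$

Without the rescaling, each feature would therefore be multiplied by $1/d$ on average, a shrinkage that grows with the feature width; multiplying by $d$ restores an average factor of 1 for any width.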
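Equations (8) to (10) of the readout step S2, together with claim 7, concatenate several readouts of the final atom and bond state matrices. The sketch implements the Mean and Max pooling terms column by column and accepts an optional callable in place of Set2Set, which requires a trained LSTM-based module and is omitted here. The `readout` function, its `set2set` parameter, and the toy state values are hypothetical.

```python
def mean_pool(rows):
    """Global average pooling: column-wise mean over all atoms (or bonds)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def max_pool(rows):
    """Global max pooling: column-wise maximum."""
    return [max(col) for col in zip(*rows)]


def readout(states, set2set=None):
    """Concatenate [Set2Set ||] Mean || Max, in the order of Equations (8) and (9)."""
    parts = [] if set2set is None else [set2set(states)]
    parts += [mean_pool(states), max_pool(states)]
    return [x for part in parts for x in part]


V_T = [[1.0, 2.0], [3.0, 0.0]]  # toy atom states: 2 atoms, width 2
E_T = [[0.0, 4.0]]              # toy bond states: 1 bond, width 2
v_all = readout(V_T)
e_all = readout(E_T)
z = v_all + e_all               # Equation (10): list concatenation
print(z)  # [2.0, 1.0, 3.0, 2.0, 0.0, 4.0, 0.0, 4.0]
```

Mean and Max summarize the same set in complementary ways: the mean reflects typical atoms, while the max preserves strong signals from a single atom or bond that averaging would dilute.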
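Claim 8 specifies only that f(·) is fully connected, and the source does not say whether bioavailability is predicted as a continuous value or as a class. The sketch below is a generic stack of dense layers with ReLU between them and an optional sigmoid for the binary case; `predict`, the `sigmoid_output` flag, and the layer sizes are assumptions.

```python
import math


def dense(W, b, x):
    """Fully connected layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def predict(z, layers, sigmoid_output=False):
    """Apply a stack of fully connected layers to z, with ReLU between layers.
    sigmoid_output=True turns the final score into a probability for a binary label."""
    h = z
    for idx, (W, b) in enumerate(layers):
        h = dense(W, b, h)
        if idx < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    if sigmoid_output:
        h = [1.0 / (1.0 + math.exp(-v)) for v in h]
    return h


z = [1.0, 2.0]
layers = [([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),  # hidden layer: 2 -> 2
          ([[2.0, 3.0]], [0.5])]                    # output layer: 2 -> 1
print(predict(z, layers))  # [6.5]
```

Here the hidden layer yields [3.0, -1.0], ReLU zeroes the negative entry, and the output layer gives 2 × 3.0 + 0.5 = 6.5.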

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210306054.5A CN114822718B (en) 2022-03-25 2022-03-25 Human oral bioavailability prediction method based on graph neural network

Publications (2)

Publication Number Publication Date
CN114822718A true CN114822718A (en) 2022-07-29
CN114822718B CN114822718B (en) 2024-04-09

Family

ID=82531176

Country Status (1)

Country Link
CN (1) CN114822718B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113140267A (en) * 2021-03-25 2021-07-20 Beijing University of Chemical Technology Directional molecule generation method based on graph neural network
CN113241128A (en) * 2021-04-29 2021-08-10 Tianjin University Molecular property prediction method based on molecular space position coding attention neural network model
CN113299354A (en) * 2021-05-14 2021-08-24 Sun Yat-sen University Small molecule representation learning method based on Transformer and enhanced interactive MPNN neural network
WO2022022173A1 (en) * 2020-07-30 2022-02-03 Tencent Technology (Shenzhen) Co., Ltd. Drug molecular property determining method and device, and storage medium

Non-Patent Citations (1)

Title
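Zhang Zhiyang; Zhang Fengli; Chen Xueqin; Wang Ruijin: "Information cascade prediction model based on hierarchical attention", Computer Science (计算机科学), no. 06, 15 June 2020 (2020-06-15) *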

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN115966266A (en) * 2023-01-06 2023-04-14 Southeast University Anti-tumor molecule strengthening method based on graph neural network
CN115966266B (en) * 2023-01-06 2023-11-17 Southeast University Anti-tumor molecule strengthening method based on graph neural network
CN117935971A (en) * 2024-03-22 2024-04-26 China University of Petroleum (East China) Deep drilling fluid treatment agent performance prediction evaluation method based on graphic neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant