CN112863693A - Drug target interaction prediction method based on multi-channel graph convolution network - Google Patents

Publication number
CN112863693A
Authority
CN
China
Legal status
Granted
Application number
CN202110154690.6A
Other languages
Chinese (zh)
Other versions
CN112863693B (en)
Inventor
Wang Guohua (汪国华)
Li Yang (李洋)
Qiao Guanyu (乔冠宇)
Current Assignee
Northeast Forestry University
Original Assignee
Northeast Forestry University
Application filed by Northeast Forestry University
Priority to CN202110154690.6A
Publication of CN112863693A
Application granted
Publication of CN112863693B
Legal status: Active

Classifications

    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

A drug-target interaction prediction method based on a multichannel graph convolution network belongs to the technical field of drug-target relationship prediction. It addresses the poor accuracy of drug-target interaction prediction caused by the inaccurate, manually extracted features of existing methods. The method constructs a drug-protein pair network from the obtained drug feature matrix and protein feature matrix; uses a multichannel graph convolution network to extract features from the topological relations between drug-protein pairs in the network and from the proximity relations between drug-protein pair features, yielding a topological relation embedding and a feature proximity embedding; processes these two embeddings to obtain a common embedding; fuses the topological relation embedding, the feature proximity embedding and the common embedding with an attention mechanism; and feeds the fused result into a multilayer perceptron to predict the drug-target relationship. The method can be applied to predicting relationships between drugs and targets.

Description

Drug target interaction prediction method based on multi-channel graph convolution network
Technical Field
The invention belongs to the technical field of drug-target relationship prediction, and particularly relates to a drug-target interaction prediction method based on a multichannel graph convolution network.
Background
Drug targets are molecules that can bind to drugs and exert specific effects inside cells, and proteins are the main molecular targets of drugs.
Finding safe and effective drugs requires testing thousands of compounds, so drug discovery is a time-consuming and laborious process with a high risk of failure. Computing the probability that a drug interacts with a target can reduce the costly losses incurred during drug discovery.
To this end, more and more researchers are exploring other methods to predict the relationship between drugs and targets. Predicting drug-target relationships not only reduces losses in drug discovery but also guides drug repositioning, polypharmacology, drug resistance prediction, side effect prediction and related tasks.
Traditional approaches to predicting new targets for known drugs are based on small molecules, protein targets or phenotypic characteristics. Existing drug-protein relationship prediction methods include machine learning-based methods, bipartite model-based methods, structure-based methods and deep learning-based methods.
Structure-based prediction methods offer little benefit for proteins whose structures are unknown, and the structures of many proteins remain unknown.
In recent years, methods based on deep learning and machine learning have exploited the characteristics of drugs and targets to predict their interactions. Although a growing body of research shows that deep learning can predict drug-target relationships, existing methods rely on manual feature extraction, which is inevitably influenced by subjective human factors. The extracted features are therefore inaccurate, which in turn limits the accuracy of drug-target interaction prediction.
Disclosure of Invention
The invention aims to solve the poor accuracy of drug-target interaction prediction caused by existing methods' reliance on inaccurate, manually extracted features, and provides a drug-target interaction prediction method based on a multichannel graph convolution network.
The technical solution adopted by the invention is as follows: a drug-target interaction prediction method based on a multichannel graph convolution network, specifically comprising the following steps:
step one, extracting drug information, protein information, disease information and drug side effect information from databases, and constructing a heterogeneous network from the extracted information;
processing the constructed heterogeneous network with a Jaccard similarity method and a random walk with restart method to obtain a drug diffusion state matrix and a protein diffusion state matrix;
step two, denoising and reducing the dimensionality of the drug diffusion state matrix and the protein diffusion state matrix, respectively, to obtain a drug feature matrix and a protein feature matrix;
step three, concatenating the drug feature matrix and the protein feature matrix obtained in step two; among the resulting drug-protein pairs, a pair formed by a drug and a protein known to be related is correct, and all remaining pairs are incorrect;
randomly selecting some of the correct drug-protein pairs as training set positive examples, and randomly selecting some of the remaining correct pairs as test set positive examples;
randomly selecting, from the incorrect drug-protein pairs, as many pairs as there are training set positive examples to serve as training set negative examples, and randomly selecting, from the remaining incorrect pairs, as many pairs as there are test set positive examples to serve as test set negative examples;
step four, regarding two drug-protein pairs as related if they share a drug or share a protein, and as unrelated otherwise; constructing a first drug-protein pair network from the training set positive and negative examples, and a second drug-protein pair network from the test set positive and negative examples;
step five, training the multichannel graph convolution network with the first drug-protein pair network, as follows:
using graph convolution networks to extract features from the topological relations between the drug-protein pairs in the first drug-protein pair network and from the proximity relations between the drug-protein pair features, respectively, to obtain a topological relation embedding Z_t and a feature proximity embedding Z_f;
processing Z_t and Z_f to obtain a common embedding Z_c;
processing Z_t, Z_f and Z_c with an attention mechanism to obtain a feature Z;
inputting the feature Z into a multilayer perceptron for binary classification, the multilayer perceptron outputting a prediction of the relationship between drug and protein;
testing the multichannel graph convolution network with the second drug-protein pair network, and stopping training once the multilayer perceptron's predictions of the drug-protein relationships in the second network meet the accuracy requirement, thereby obtaining the trained multichannel graph convolution network;
step six, repeating steps one to three for the drug-protein pairs related to the drug-protein pair to be predicted, randomly selecting some of the drug-protein pairs obtained in step three, and constructing a third drug-protein pair network from the related drug-protein pairs and the randomly selected pairs;
processing the third drug-protein pair network with the trained multichannel graph convolution network and the attention mechanism, and inputting the result into the multilayer perceptron to obtain the relationship prediction for the drug-protein pair to be predicted.
Beneficial effects of the invention: the invention provides a drug-target interaction prediction method based on a multichannel graph convolution network. The method first obtains a drug feature matrix and a protein feature matrix and then constructs a drug-protein pair network from them. A multichannel graph convolution network extracts features from the topological relations between drug-protein pairs in this network and from the proximity relations between drug-protein pair features, yielding a topological relation embedding and a feature proximity embedding, which are then processed to obtain a common embedding. Finally, an attention mechanism fuses the topological relation embedding, the feature proximity embedding and the common embedding, and the fused result is fed into a multilayer perceptron to predict the drug-target relationship.
The method removes the reliance of existing methods on manual feature extraction, making the extracted features more accurate. Experiments show that the method achieves an area under the ROC curve (AUROC) of 0.9616 and an area under the PR curve (AUPR) of 0.9612, clearly higher than those of existing methods, thereby improving the accuracy of drug-target interaction prediction.
Drawings
FIG. 1 is an overall flow chart of the drug target interaction prediction method based on a multi-channel graph convolution network according to the present invention;
In the figure, G_t = (A_t, X) is the topology graph; Z_t^{(1)} and Z_t^{(2)} are the outputs of the first and second layers of the graph convolution network that extracts features of the topological relation; Z_f^{(1)} and Z_f^{(2)} are the outputs of the first and second layers of the graph convolution network that extracts features of the proximity relation between features.
Detailed Description
First embodiment: this embodiment is described with reference to FIG. 1. The drug-target interaction prediction method based on a multichannel graph convolution network of this embodiment comprises the following steps:
step one, extracting drug information, protein information, disease information and drug side effect information from databases, and constructing a heterogeneous network from the extracted information;
processing the constructed heterogeneous network with a Jaccard similarity method and a random walk with restart method to obtain a drug diffusion state matrix and a protein diffusion state matrix;
step two, denoising and reducing the dimensionality of the drug diffusion state matrix and the protein diffusion state matrix, respectively, to obtain a drug feature matrix and a protein feature matrix;
step three, concatenating the drug feature matrix and the protein feature matrix obtained in step two; among the resulting drug-protein pairs, a pair formed by a drug and a protein known to be related is correct, and all remaining pairs are incorrect;
randomly selecting some of the correct drug-protein pairs as training set positive examples, and randomly selecting some of the remaining correct pairs as test set positive examples;
randomly selecting, from the incorrect drug-protein pairs, as many pairs as there are training set positive examples to serve as training set negative examples, and randomly selecting, from the remaining incorrect pairs, as many pairs as there are test set positive examples to serve as test set negative examples;
step four, regarding two drug-protein pairs as related if they share a drug or share a protein, and as unrelated otherwise; constructing a first drug-protein pair network from the training set positive and negative examples, and a second drug-protein pair network from the test set positive and negative examples;
step five, training the multichannel graph convolution network with the first drug-protein pair network, as follows:
using graph convolution networks to extract features from the topological relations between the drug-protein pairs in the first drug-protein pair network and from the proximity relations between the drug-protein pair features, respectively, to obtain a topological relation embedding Z_t and a feature proximity embedding Z_f;
processing Z_t and Z_f to obtain a common embedding Z_c;
processing Z_t, Z_f and Z_c with an attention mechanism to obtain a feature Z;
inputting the feature Z into a multilayer perceptron for binary classification, the multilayer perceptron outputting a prediction of the relationship between drug and protein;
testing the multichannel graph convolution network with the second drug-protein pair network, and stopping training once the multilayer perceptron's predictions of the drug-protein relationships in the second network meet the accuracy requirement, thereby obtaining the trained multichannel graph convolution network;
step six, repeating steps one to three for the drug-protein pairs related to the drug-protein pair to be predicted, randomly selecting some of the drug-protein pairs obtained in step three, and constructing a third drug-protein pair network from the related drug-protein pairs and the randomly selected pairs;
processing the third drug-protein pair network with the trained multichannel graph convolution network and the attention mechanism, and inputting the result into the multilayer perceptron to obtain the relationship prediction for the drug-protein pair to be predicted (that is, a prediction of whether a relationship exists between the drug and the protein).
The multichannel graph convolution network of this embodiment comprises three graph convolution networks: one for extracting topological relation features between drug-protein pairs, one for extracting proximity relation features between drug-protein pair features, and one for processing Z_t and Z_f.
Proximity relations between drug-protein pair features
Information in the feature space is extracted by constructing a k-nearest-neighbor graph, with cosine similarity used to measure the similarity between features. For the feature matrix X of the drug-protein pairs (DPPs), if x_i and x_j are the feature vectors of DPP_i and DPP_j, their cosine similarity S_ij is:
$$S_{ij} = \frac{x_i \cdot x_j}{\lVert x_i \rVert \, \lVert x_j \rVert}$$
We select the two nodes most similar to each target node (target DPP) to construct a neighborhood graph, giving the feature graph G_f = (A_f, X).
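The feature-graph construction above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function name `knn_feature_graph` is hypothetical, and symmetrising the neighbour relation (keeping an edge if either node selects the other) is an assumption the text does not specify.

```python
import numpy as np

def knn_feature_graph(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Build a symmetric k-nearest-neighbour adjacency A_f from cosine similarity.

    X: (n, d) feature matrix, one row per drug-protein pair (DPP).
    k: neighbours kept per node (the text selects two).
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)       # row-normalise, avoiding division by zero
    S = Xn @ Xn.T                              # S[i, j] = cosine similarity of DPP_i and DPP_j
    np.fill_diagonal(S, -np.inf)               # a node is never its own neighbour
    nbrs = np.argsort(-S, axis=1)[:, :k]       # indices of the k most similar nodes
    n = X.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)                  # assumed: make the graph undirected
```

For example, `A_f = knn_feature_graph(X, k=2)` would give the adjacency matrix of G_f for a DPP feature matrix `X`.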
Second embodiment: in this embodiment, step one (extracting drug information, protein information, disease information and drug side effect information from databases, and constructing a heterogeneous network from the extracted information) specifically comprises:
extracting drug information from the DrugBank database, including drug-drug interaction information and known drug-target interaction information;
extracting protein information, namely protein-protein interaction information, from the HPRD database;
extracting disease information from the Comparative Toxicogenomics Database, including disease-drug relationship information and disease-protein relationship information;
extracting drug side effect information, namely drug-side effect relationship information, from the SIDER database;
obtaining M drugs, N proteins, O side effects and W diseases from the extracted information, and constructing a heterogeneous network from the information extracted from each database;
the heterogeneous network comprises a drug-drug relationship network, a drug-disease relationship network, a drug-side effect relationship network, a drug-protein relationship network, a protein-protein relationship network, a protein-disease relationship network, a drug chemical similarity network and a protein gene sequence similarity network.
The drug-protein relationship network is used in step three to determine whether the formed drug-protein pairs are correct.
Third embodiment: this embodiment differs from the second embodiment in that, in step one, the constructed heterogeneous network is processed with the Jaccard similarity method and the random walk with restart method to obtain the drug diffusion state matrix and the protein diffusion state matrix, specifically as follows:
the drug-side effect relationship network is represented as a matrix C:
$$C = (c_{i'j'})_{M \times O}$$
where c_{i'j'} = 0 or 1; c_{i'j'} = 1 indicates that the i'-th drug is associated with the j'-th side effect, and c_{i'j'} = 0 indicates no such association, with i' = 1, 2, …, M and j' = 1, 2, …, O;
the Jaccard similarity method is used to calculate the similarity between the i-th row and the j-th row of matrix C, where i = 1, 2, …, M and j = 1, 2, …, M; this similarity is taken as the element in the i-th row and j-th column of the similarity matrix H, and traversing every pair of rows of C yields H;
H is processed with the random walk with restart method to obtain the diffusion state matrix corresponding to the drug-side effect relationship network;
similarly, diffusion state matrices are obtained for the drug-drug relationship network, the drug-disease relationship network, the protein-protein relationship network, the protein-disease relationship network, the drug chemical similarity network and the protein gene sequence similarity network;
for the drug chemical similarity network, the corresponding entry of the matrix (analogous to C) is 1 if two drugs have similar chemical properties and 0 otherwise;
the diffusion state matrices of the drug-side effect relationship network, the drug-drug relationship network, the drug-disease relationship network and the drug chemical similarity network are concatenated into a feature matrix D, which serves as the drug diffusion state matrix;
the diffusion state matrices of the protein-protein relationship network, the protein-disease relationship network and the protein gene sequence similarity network are concatenated into a feature matrix P, which serves as the protein diffusion state matrix.
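A minimal NumPy sketch of the random walk with restart and the concatenation into D follows. It assumes a row-normalised transition matrix T and a restart probability r = 0.5; both are hypothetical choices, since the text gives neither. Each row of the result is the stationary diffusion state of one node, satisfying s_i = (1 - r) s_i T + r e_i. The function names are illustrative only.

```python
import numpy as np

def rwr_diffusion(H, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on similarity matrix H; row i is the diffusion state of node i."""
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    row_sum = H.sum(axis=1, keepdims=True)
    # Row-normalised transition matrix; isolated nodes get a self-loop so rows still sum to 1.
    T = np.divide(H, row_sum, out=np.zeros_like(H), where=row_sum > 0)
    isolated = np.flatnonzero(row_sum.ravel() == 0)
    T[isolated, isolated] = 1.0
    I = np.eye(n)
    S = I.copy()
    for _ in range(max_iter):
        # s_i <- (1 - r) s_i T + r e_i, applied to all rows at once
        S_next = (1 - restart) * S @ T + restart * I
        if np.abs(S_next - S).max() < tol:
            return S_next
        S = S_next
    return S

def concat_diffusion(similarity_matrices, restart=0.5):
    """Concatenate the diffusion states of k networks over the same n nodes into an n x kn matrix."""
    return np.hstack([rwr_diffusion(H, restart) for H in similarity_matrices])
```

With four 708 × 708 drug networks this yields the 708 × 2832 matrix D of the example; three 1512 × 1512 protein networks yield the 1512 × 4536 matrix P.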
Fourth embodiment: this embodiment differs from the third embodiment in that the similarity between the i-th row and the j-th row of matrix C is calculated as:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$
where J(A, B) is the similarity between the i-th row and the j-th row of matrix C, A is the i-th row of C and B is the j-th row of C, each row being treated as the set of columns whose entry is 1.
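Because each row of C is binary, the Jaccard similarity of every pair of rows can be computed at once: |A ∩ B| is an entry of C Cᵀ, and |A ∪ B| follows from the row sums. A minimal NumPy sketch (the function name is hypothetical, and returning 0 for two all-zero rows is an assumed convention, since J is undefined there):

```python
import numpy as np

def jaccard_similarity_matrix(C):
    """H[i, j] = Jaccard similarity between rows i and j of a binary matrix C."""
    C = (np.asarray(C) > 0).astype(float)
    inter = C @ C.T                                   # |A ∩ B| for every pair of rows
    size = C.sum(axis=1)                              # |A| for every row
    union = size[:, None] + size[None, :] - inter     # |A ∪ B| = |A| + |B| - |A ∩ B|
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
```

Applied to the M × O drug-side effect matrix C, this returns the symmetric M × M similarity matrix H.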
Fifth embodiment: this embodiment differs from the fourth embodiment in that, in step two, a denoising autoencoder (DAE) is used to denoise and reduce the dimensionality of the drug diffusion state matrix and the protein diffusion state matrix.
Sixth embodiment: this embodiment differs from the fifth embodiment in that, when the drug-protein relationship network is represented as a matrix, an element equal to 1 indicates a known relationship between the drug and the protein.
Seventh embodiment: this embodiment differs from the sixth embodiment in that Z_t and Z_f are processed to obtain the common embedding Z_c as follows:
a graph convolution network with shared weights processes Z_t and Z_f to obtain the embeddings Z_ct and Z_cf, which are summed and averaged to obtain the common embedding Z_c;
Z_cf is the result of processing Z_f, and Z_ct is the result of processing Z_t.
Eighth embodiment: this embodiment differs from the seventh embodiment in that Z_t, Z_f and Z_c are processed with the attention mechanism to obtain the feature Z as follows:
$$Z = \alpha_1 Z_c + \alpha_2 Z_f + \alpha_3 Z_t$$
where α_1, α_2 and α_3 are the weights of the respective embeddings.
Examples
The method of the present invention is further described below with reference to an example.
First, data preparation: feature embedding of the acquired drugs and targets.
Drug information, including drug-drug interactions and known drug-target interactions, is extracted from the DrugBank database. Protein-protein interactions come from the HPRD database. Disease information, including disease-drug and disease-protein relationships, is obtained from the Comparative Toxicogenomics Database, and drug side effect information from the SIDER database. In total, 708 drugs, 1512 proteins, 4912 side effects and 5603 diseases are obtained, forming a heterogeneous network with eight types of relationships:
drug-drug relationships, drug-disease relationships, drug-side effect relationships, drug-protein relationships, protein-protein relationships, protein-disease relationships, drug chemical similarity and protein gene sequence similarity.
The Jaccard similarity coefficient is first used to compare the similarity and difference between finite sample sets. For example, to compute the similarity matrix H of the drug-side effect network, let A and B denote the i-th and j-th rows of the matrix and J(A, B) the similarity between them; H is a symmetric matrix, and for a drug-related network $H = (J_{ij})_{708 \times 708}$. The Jaccard similarity is defined as:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$
Next, the random walk with restart algorithm is applied to each similarity matrix to obtain a diffusion state matrix, and the matrices belonging to drugs are concatenated into the feature matrix $D = (d_{ij})_{708 \times 2832}$; the proteins are processed in the same way to obtain the protein feature matrix $P = (p_{ij})_{1512 \times 4536}$. This yields high-dimensional, high-noise diffusion state matrices for drugs and proteins.
Finally, the diffusion state matrices are denoised and reduced in dimensionality with a denoising autoencoder (DAE), giving a 100-dimensional drug feature matrix and a 400-dimensional protein feature matrix, i.e., $D_{DAE} = (d_{ij})_{708 \times 100}$ and $P_{DAE} = (p_{ij})_{1512 \times 400}$.
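The text does not specify the DAE architecture. As a rough illustration only, the NumPy sketch below trains a one-hidden-layer denoising autoencoder with masking noise, a sigmoid encoder, a linear decoder, a squared-error loss against the clean input and full-batch gradient descent (all assumed choices), and returns the encoder output as the reduced features. The name `dae_features` and every hyperparameter are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_features(X, hidden, noise=0.2, lr=0.1, epochs=300, seed=0):
    """Train a one-hidden-layer denoising autoencoder; return (codes, loss history)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) > noise)   # masking noise: zero out ~20% of entries
        H = sigmoid(X_noisy @ W1 + b1)                # encoder
        X_hat = H @ W2 + b2                           # linear decoder
        err = X_hat - X                               # reconstruct the CLEAN input
        losses.append(0.5 * np.sum(err ** 2) / n)
        g_out = err / n                               # dLoss/dX_hat
        g_hid = (g_out @ W2.T) * H * (1.0 - H)        # back-propagate through the sigmoid
        W2 -= lr * H.T @ g_out
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * X_noisy.T @ g_hid
        b1 -= lr * g_hid.sum(axis=0)
    return sigmoid(X @ W1 + b1), losses               # encode the clean input
```

For instance, `dae_features(D, hidden=100)` would map the 708 × 2832 drug matrix to 708 × 100 features, matching the dimensions reported above; on real data the learning rate and number of epochs would need tuning.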
Secondly, constructing a drug protein pair network:
The drug and protein feature matrices obtained in the first part are concatenated: a concatenated drug-protein pair whose drug and protein are known to be related is regarded as a correct drug-protein pair, and all other concatenated pairs are regarded as incorrect. The feature of a drug-protein pair is the concatenation of the corresponding drug feature and protein feature. This yields 1332 correct drug-protein pairs, so 1332 incorrect pairs are randomly selected as negative examples.
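The pair construction and 1:1 negative sampling described above can be sketched as follows. This is an illustrative NumPy version under stated assumptions: `Y` is a hypothetical binary drug-protein matrix (1 = known relationship), negatives are drawn uniformly without replacement, and the train/test split of the disclosure is omitted for brevity.

```python
import numpy as np

def build_pair_dataset(D_feat, P_feat, Y, seed=0):
    """Return (pairs, X, labels) for a balanced set of drug-protein pairs.

    D_feat: (M, dd) drug features; P_feat: (N, dp) protein features;
    Y: (M, N) binary matrix, 1 where a drug-protein relationship is known.
    """
    rng = np.random.default_rng(seed)
    pos = np.argwhere(Y == 1)                  # correct pairs: known relationships
    neg_pool = np.argwhere(Y == 0)             # every other pair counts as incorrect
    pick = rng.choice(len(neg_pool), size=len(pos), replace=False)
    neg = neg_pool[pick]                       # as many negatives as positives
    pairs = np.vstack([pos, neg])              # each row: (drug index, protein index)
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    # DPP feature = drug feature concatenated with protein feature
    X = np.hstack([D_feat[pairs[:, 0]], P_feat[pairs[:, 1]]])
    return pairs, X, labels
```

With the 100-dimensional drug features and 400-dimensional protein features above, each DPP feature vector would have 100 + 400 = 500 entries.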
If two concatenated drug-protein pairs share a drug or share a protein, they are considered related, and the drug-protein pair network is constructed from these relationships.
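Given the (drug, protein) index of each pair, the adjacency matrix of the drug-protein pair network follows directly from the shared-drug-or-shared-protein rule. A NumPy sketch (the function name is hypothetical; self-loops are left out on the assumption that the graph convolution adds the identity matrix itself):

```python
import numpy as np

def pair_network(pairs):
    """Adjacency A_t: DPP_i and DPP_j are linked if they share a drug or a protein."""
    pairs = np.asarray(pairs)
    drugs, proteins = pairs[:, 0], pairs[:, 1]
    same_drug = drugs[:, None] == drugs[None, :]
    same_protein = proteins[:, None] == proteins[None, :]
    A = (same_drug | same_protein).astype(float)
    np.fill_diagonal(A, 0.0)   # no self-loops
    return A
```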
Third, drug target interaction prediction using a multichannel graph convolutional network:
Considering both the topological relations between drug-protein pairs and the proximity relations between their features, graph convolution is applied separately to the topology network and the feature proximity network to obtain the topological relation embedding Z_t and the feature proximity embedding Z_f.
The graph convolution network of each channel has two hidden layers, and the l-th layer can be expressed as:
$$Z^{(l)} = \mathrm{ReLU}\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} Z^{(l-1)} W^{(l)}\right), \quad Z^{(0)} = X$$
where $\tilde{A} = A + I$, A is the adjacency matrix of the graph, I is the identity matrix, $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$, and $W^{(l)}$ is the weight matrix of the l-th layer.
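A forward pass of this two-layer graph convolution can be written in a few lines of NumPy. The sketch is illustrative only: the weights are random rather than trained, the ReLU on every layer follows the formula above, and the function names are hypothetical.

```python
import numpy as np

def normalize_adj(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))   # degrees are >= 1 thanks to the self-loop
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A, X, weights):
    """Apply Z^(l) = ReLU(A_hat Z^(l-1) W^(l)) for each weight matrix, starting from Z^(0) = X."""
    A_hat = normalize_adj(A)
    Z = X
    for W in weights:
        Z = np.maximum(A_hat @ Z @ W, 0.0)
    return Z
```

Calling `gcn_forward(A_t, X, [W1, W2])` on the pair network gives Z_t, and the same call on the k-nearest-neighbor adjacency A_f with its own weights gives Z_f.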
Meanwhile, the topology network and the feature proximity network are related to each other, and the aim is to capture what they have in common. Therefore, after the two networks are combined, a parameter-sharing strategy is used in the convolution module: the graph convolutional network produces the embeddings $Z_{cf}$ and $Z_{ct}$, and their average gives the common embedding $Z_c$. $Z_{cf}$ and $Z_{ct}$ can be expressed respectively as:
$$Z_{cf}^{(l)} = \mathrm{ReLU}\left(\tilde{D}_f^{-\frac{1}{2}} \tilde{A}_f \tilde{D}_f^{-\frac{1}{2}} Z_{cf}^{(l-1)} W_c^{(l)}\right)$$
$$Z_{ct}^{(l)} = \mathrm{ReLU}\left(\tilde{D}_t^{-\frac{1}{2}} \tilde{A}_t \tilde{D}_t^{-\frac{1}{2}} Z_{ct}^{(l-1)} W_c^{(l)}\right)$$
$$Z_c = \frac{Z_{cf} + Z_{ct}}{2}$$
where $\tilde{A}_f = A_f + I$ and $\tilde{A}_t = A_t + I$, $A_f$ and $A_t$ are the adjacency matrices of the feature proximity graph and the topology graph, $I$ is the identity matrix, $\tilde{D}_f$ and $\tilde{D}_t$ are the diagonal degree matrices of $\tilde{A}_f$ and $\tilde{A}_t$, and $W_c^{(l)}$ is the shared weight matrix of the $l$-th layer.
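The shared-parameter common channel can be sketched by running the same weight matrices over both graphs and averaging the two outputs. A minimal sketch, assuming ReLU activations, dense matrices and the same input feature matrix `x` for both graphs (as in AM-GCN; claim 7 instead phrases the inputs as $Z_t$ and $Z_f$); all names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """D~^(-1/2) (A + I) D~^(-1/2)."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn(adj, x, weights):
    """Stack of ReLU(A_hat @ Z @ W) layers, one per weight matrix."""
    a_hat = normalize_adjacency(adj)
    z = x
    for w in weights:
        z = [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, z), w)]
    return z

def common_embedding(adj_t, adj_f, x, shared_weights):
    """Run the SAME weights over both graphs, then average: Z_c = (Z_ct + Z_cf) / 2."""
    z_ct = gcn(adj_t, x, shared_weights)
    z_cf = gcn(adj_f, x, shared_weights)
    z_c = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(z_ct, z_cf)]
    return z_ct, z_cf, z_c

z_ct, z_cf, z_c = common_embedding([[0, 1], [1, 0]], [[0, 0], [0, 0]], [[1.0], [3.0]], [[[1.0]]])
```

Sharing `shared_weights` between the two calls is what forces both branches to learn a common transformation.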
Then an attention mechanism is applied to the three embeddings to obtain the feature $Z$, so that more important embeddings receive larger weights:
$$Z = \alpha_1 Z_c + \alpha_2 Z_f + \alpha_3 Z_t$$
where $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the weights of the respective embeddings;
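The source does not say how $\alpha_1$, $\alpha_2$ and $\alpha_3$ are computed. The sketch below assumes the per-node scheme used in AM-GCN: each embedding row is scored as $q^\top \tanh(W z + b)$, the three scores are normalized with a softmax, and the weighted sum gives the fused row. `w`, `b` and `q` stand in for learned parameters and are hypothetical here.

```python
import math

def attention_fuse(embeddings, w, b, q):
    """Fuse channel embeddings row by row with softmax attention weights.

    embeddings: [Z_c, Z_f, Z_t], each an n x h list-of-lists matrix.
    w (h' x h), b (h'), q (h'): stand-ins for learned parameters.
    """
    n, h = len(embeddings[0]), len(embeddings[0][0])
    fused, alphas = [], []
    for i in range(n):
        scores = []
        for z in embeddings:
            hidden = [math.tanh(sum(wk * zk for wk, zk in zip(w_row, z[i])) + bk)
                      for w_row, bk in zip(w, b)]
            scores.append(sum(qk * hk for qk, hk in zip(q, hidden)))
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        alpha = [e / sum(exps) for e in exps]
        alphas.append(alpha)
        fused.append([sum(a * z[i][k] for a, z in zip(alpha, embeddings)) for k in range(h)])
    return fused, alphas

z_c, z_f, z_t = [[3.0]], [[6.0]], [[9.0]]
fused, alphas = attention_fuse([z_c, z_f, z_t], w=[[0.0]], b=[0.0], q=[1.0])
```

With all-zero `w` and `b` every score is equal, so each $\alpha$ is 1/3 and the fused row is the plain mean.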
finally, the feature $Z$ is input into a multilayer perceptron for binary classification, predicting whether the drug and the protein are related.
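At prediction time, the multilayer perceptron reduces to a forward pass ending in a sigmoid. The sketch below assumes a single ReLU hidden layer, a sigmoid output and a 0.5 decision threshold, none of which are specified in the source; the weights would come from training, and the values used here are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def mlp_predict(z_row, w1, b1, w2, b2, threshold=0.5):
    """One ReLU hidden layer, sigmoid output; returns (probability, predicted label)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z_row)) + b) for row, b in zip(w1, b1)]
    prob = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return prob, prob >= threshold

prob, label = mlp_predict([1.0, 2.0], w1=[[1.0, 1.0]], b1=[0.0], w2=[1.0], b2=-3.0)
```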
Performance was evaluated with the AUROC (area under the ROC curve) and AUPR (area under the precision-recall curve) scores; the results are shown in Table 1:
Table 1: AUROC and AUPR scores (table image not reproduced in the text)
The experiments show that the method clearly outperforms the existing NRLMF, DTINet and DTI-CNN methods.
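For reference, the two metrics can be computed from binary labels and predicted scores as follows. AUROC is computed as the probability that a randomly chosen positive outranks a randomly chosen negative (ties count one half). AUPR is estimated here by average precision, which is one common estimator and an assumption, since the source does not say which estimator was used; tied scores are ordered arbitrarily in this sketch.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def aupr(labels, scores):
    """Average precision: mean of the precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            ap += tp / rank
    return ap / sum(labels)

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
```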
The calculation examples above only explain the calculation model and calculation flow of the present invention in detail and are not intended to limit its embodiments. Those skilled in the art can make other variations and modifications on the basis of the above description. The examples are not exhaustive and do not limit the invention to the precise form disclosed, and all such modifications and variations fall within the scope of the invention.

Claims (8)

1. A drug-target interaction prediction method based on a multi-channel graph convolutional network, characterized by comprising the following steps:
step one: extract drug information, protein information, disease information and drug side-effect information from databases, and construct a heterogeneous network from the extracted information;
process the constructed heterogeneous network with the Jaccard similarity method and the random walk with restart (RWR) method to obtain a drug diffusion state matrix and a protein diffusion state matrix;
step two: denoise and reduce the dimensionality of the drug diffusion state matrix and the protein diffusion state matrix, respectively, to obtain a drug feature matrix and a protein feature matrix;
step three: concatenate the drug feature matrix and the protein feature matrix obtained in step two; among the resulting drug-protein pairs, a pair formed by a drug and a protein that are known to be related is correct, and all remaining pairs are incorrect;
randomly select some of the correct drug-protein pairs as training-set positive examples, and randomly select some of the remaining correct pairs as test-set positive examples;
from the incorrect drug-protein pairs, randomly select as many pairs as there are training-set positive examples to serve as training-set negative examples, and from the remaining incorrect pairs, randomly select as many pairs as there are test-set positive examples to serve as test-set negative examples;
two drug-protein pairs are considered related if they share the drug or share the protein, and unrelated otherwise; a first drug-protein pair network is constructed from the training-set positive and negative examples, and a second drug-protein pair network is constructed from the test-set positive and negative examples;
step five: train the multi-channel graph convolutional network with the first drug-protein pair network, as follows:
use graph convolutional networks to extract features separately from the topological relations between drug-protein pairs in the first drug-protein pair network and from the proximity relations between the drug-protein pair features, obtaining the topology embedding $Z_t$ and the feature proximity embedding $Z_f$;
process $Z_t$ and $Z_f$ to obtain the common embedding $Z_c$;
apply an attention mechanism to $Z_t$, $Z_f$ and $Z_c$ to obtain the feature $Z$;
input the feature $Z$ into a multilayer perceptron for binary classification, the multilayer perceptron outputting the predicted relationship between the drug and the protein;
test the multi-channel graph convolutional network with the second drug-protein pair network, and stop training once the multilayer perceptron's predictions of the drug-protein relationships in the second network meet the accuracy requirement, yielding the trained multi-channel graph convolutional network;
step six: repeat steps one to three for the drug-protein pairs related to the drug-protein pair to be predicted, randomly select some of the drug-protein pairs obtained in step three, and construct a third drug-protein pair network from the pairs related to the pair to be predicted together with the randomly selected pairs;
process the third drug-protein pair network with the trained multi-channel graph convolutional network and the attention mechanism, and input the result into the multilayer perceptron to obtain the predicted relationship for the drug-protein pair to be predicted.
2. The drug-target interaction prediction method based on a multi-channel graph convolutional network according to claim 1, wherein in step one, extracting drug information, protein information, disease information and drug side-effect information from databases and constructing a heterogeneous network from the extracted information specifically comprises:
extracting drug information from the DrugBank database, the drug information comprising drug-drug interaction information and known drug-target interaction information;
extracting protein information from the HPRD database, the protein information being protein-protein interaction information;
extracting disease information from a toxicogenomics database, the disease information comprising disease-drug association information and disease-protein association information;
extracting drug side-effect information from the SIDER database, the drug side-effect information being drug-side-effect association information;
obtaining M drugs, N proteins, O side effects and W diseases from the extracted information, and constructing a heterogeneous network from the information extracted from each database;
the heterogeneous network comprises a drug-drug relationship network, a drug-disease relationship network, a drug-side-effect relationship network, a drug-protein relationship network, a protein-disease relationship network, a drug chemical similarity network and a protein gene sequence similarity network.
3. The drug-target interaction prediction method based on a multi-channel graph convolutional network according to claim 2, wherein in step one, processing the constructed heterogeneous network with the Jaccard similarity method and the random walk with restart method to obtain the drug diffusion state matrix and the protein diffusion state matrix specifically comprises:
representing the drug-side-effect relationship network as a matrix $C$:
$$C = \left(c_{i'j'}\right)_{M \times O} = \begin{pmatrix} c_{11} & \cdots & c_{1O} \\ \vdots & \ddots & \vdots \\ c_{M1} & \cdots & c_{MO} \end{pmatrix}$$
where $c_{i'j'} \in \{0, 1\}$; $c_{i'j'} = 1$ indicates that the $i'$-th drug is associated with the $j'$-th side effect, and $c_{i'j'} = 0$ indicates that they are not associated; $i' = 1, 2, \ldots, M$ and $j' = 1, 2, \ldots, O$;
calculating the similarity between the $i$-th row and the $j$-th row of $C$ with the Jaccard similarity method, $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, M$, and taking it as the element in the $i$-th row and $j$-th column of the similarity matrix $H$; traversing every pair of rows of $C$ yields $H$;
processing $H$ with the random walk with restart method to obtain the diffusion state matrix of the drug-side-effect relationship network;
similarly, obtaining the diffusion state matrices of the drug-drug relationship network, the drug-disease relationship network, the protein-protein relationship network, the protein-disease relationship network, the drug chemical similarity network and the protein gene sequence similarity network;
concatenating the diffusion state matrices of the drug-side-effect relationship network, the drug-drug relationship network, the drug-disease relationship network and the drug chemical similarity network into a feature matrix $D$, which is taken as the drug diffusion state matrix;
concatenating the diffusion state matrices of the protein-protein relationship network, the protein-disease relationship network and the protein gene sequence similarity network into a feature matrix $P$, which is taken as the protein diffusion state matrix.
4. The drug-target interaction prediction method based on a multi-channel graph convolutional network according to claim 3, wherein the similarity between the $i$-th row and the $j$-th row of matrix $C$ is calculated as:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where $J(A, B)$ is the similarity between the $i$-th row and the $j$-th row of $C$, $A$ is the $i$-th row of $C$, and $B$ is the $j$-th row of $C$.
5. The drug-target interaction prediction method based on a multi-channel graph convolutional network according to claim 4, wherein in step two, the drug diffusion state matrix and the protein diffusion state matrix are denoised and reduced in dimensionality with a denoising autoencoder.
6. The drug-target interaction prediction method based on a multi-channel graph convolutional network according to claim 5, wherein, when the drug-protein relationship network is expressed in matrix form, an element equal to 1 indicates a known relationship between the drug and the protein.
7. The drug-target interaction prediction method based on a multi-channel graph convolutional network according to claim 6, wherein processing $Z_t$ and $Z_f$ to obtain the common embedding $Z_c$ specifically comprises:
processing $Z_t$ and $Z_f$ with a weight-sharing graph convolutional network to obtain the embeddings $Z_{ct}$ and $Z_{cf}$, and averaging $Z_{cf}$ and $Z_{ct}$ to obtain the common embedding $Z_c$.
8. The drug-target interaction prediction method based on a multi-channel graph convolutional network according to claim 7, wherein applying the attention mechanism to $Z_t$, $Z_f$ and $Z_c$ to obtain the feature $Z$ specifically comprises:
$$Z = \alpha_1 Z_c + \alpha_2 Z_f + \alpha_3 Z_t$$
where $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the weights of the respective embeddings.
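The Jaccard and random-walk-with-restart steps of claims 3 and 4 can be sketched as below. Assumptions not stated in the claims: two all-zero rows have similarity 0, the transition matrix is the row-normalized similarity matrix $H$, the restart probability is 0.5, and iteration stops when the L1 change falls below a tolerance. `concat_columns` illustrates splicing several diffusion state matrices into $D$ or $P$. All names are hypothetical.

```python
def jaccard_matrix(c):
    """H[i][j] = |row_i AND row_j| / |row_i OR row_j| for the binary rows of C."""
    m = len(c)
    h = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            inter = sum(1 for a, b in zip(c[i], c[j]) if a and b)
            union = sum(1 for a, b in zip(c[i], c[j]) if a or b)
            h[i][j] = inter / union if union else 0.0  # assumption: 0/0 -> 0
    return h

def rwr(h, restart=0.5, tol=1e-10, max_iter=1000):
    """Diffusion states: row i is the RWR distribution started (and restarted) at node i."""
    m = len(h)
    trans = []
    for row in h:
        s = sum(row)
        trans.append([x / s for x in row] if s else [0.0] * m)  # row-normalized H
    states = []
    for i in range(m):
        e = [1.0 if k == i else 0.0 for k in range(m)]
        s = e[:]
        for _ in range(max_iter):
            # s_next = (1 - r) * s @ T + r * e
            nxt = [(1 - restart) * sum(s[k] * trans[k][j] for k in range(m)) + restart * e[j]
                   for j in range(m)]
            done = sum(abs(a - b) for a, b in zip(nxt, s)) < tol
            s = nxt
            if done:
                break
        states.append(s)
    return states

def concat_columns(*mats):
    """Splice matrices side by side (same number of rows), e.g. into D or P."""
    return [sum(rows, []) for rows in zip(*mats)]

h = jaccard_matrix([[1, 1, 0], [1, 0, 0], [0, 0, 1]])
states = rwr(h)
d = concat_columns(states, states)
```

In this toy example drug 0 and drug 1 share one side effect, so their Jaccard similarity is 0.5, and the walk started at drug 0 settles at roughly [0.8, 0.2, 0.0].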
CN202110154690.6A 2021-02-04 2021-02-04 Drug target interaction prediction method based on multi-channel graph convolution network Active CN112863693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110154690.6A CN112863693B (en) 2021-02-04 2021-02-04 Drug target interaction prediction method based on multi-channel graph convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110154690.6A CN112863693B (en) 2021-02-04 2021-02-04 Drug target interaction prediction method based on multi-channel graph convolution network

Publications (2)

Publication Number Publication Date
CN112863693A true CN112863693A (en) 2021-05-28
CN112863693B CN112863693B (en) 2021-09-28

Family

ID=75986612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110154690.6A Active CN112863693B (en) 2021-02-04 2021-02-04 Drug target interaction prediction method based on multi-channel graph convolution network

Country Status (1)

Country Link
CN (1) CN112863693B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960131A (en) * 2017-05-05 2017-07-18 华东师范大学 A kind of drug side-effect Forecasting Methodology based on multi-feature fusion
CN109887540A (en) * 2019-01-15 2019-06-14 中南大学 A kind of drug targets interaction prediction method based on heterogeneous network insertion
CN110880354A (en) * 2019-10-24 2020-03-13 广东药科大学 Medicine-target interaction prediction method based on group intelligence
CN111477344A (en) * 2020-04-10 2020-07-31 电子科技大学 Drug side effect identification method based on self-weighted multi-core learning
CN111524546A (en) * 2020-04-14 2020-08-11 湖南大学 Drug-target interaction prediction method based on heterogeneous information
CN111785320A (en) * 2020-06-28 2020-10-16 西安电子科技大学 Drug target interaction prediction method based on multilayer network representation learning
CN112309505A (en) * 2020-11-05 2021-02-02 湖南大学 Anti-neocoronal inflammation drug discovery method based on network characterization

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113345535A (en) * 2021-06-04 2021-09-03 南开大学 Drug target prediction method and system for keeping chemical property and function consistency of drug
CN114023464A (en) * 2021-11-08 2022-02-08 东北林业大学 Drug-target interaction prediction method based on supervised synergy map contrast learning
CN114023464B (en) * 2021-11-08 2022-08-09 东北林业大学 Drug-target interaction prediction method based on supervised synergy map contrast learning
CN114496303A (en) * 2022-01-06 2022-05-13 湖南大学 Anticancer drug screening method based on multichannel neural network
CN114496303B (en) * 2022-01-06 2024-06-04 湖南大学 Anti-cancer drug screening method based on multichannel neural network
CN114582429A (en) * 2022-03-03 2022-06-03 四川大学 Method and device for predicting drug resistance of mycobacterium tuberculosis based on hierarchical attention neural network
CN116504331A (en) * 2023-04-28 2023-07-28 东北林业大学 Frequency score prediction method for drug side effects based on multiple modes and multiple tasks

Also Published As

Publication number Publication date
CN112863693B (en) 2021-09-28

Similar Documents

Publication Publication Date Title
CN112863693B (en) Drug target interaction prediction method based on multi-channel graph convolution network
Jin et al. Application of deep learning methods in biological networks
Su et al. Attention-based knowledge graph representation learning for predicting drug-drug interactions
CN111785320A (en) Drug target interaction prediction method based on multilayer network representation learning
CN115171779B (en) Cancer driving gene prediction device based on graph attention network and multiple groups of chemical fusion
CN108733976B (en) Key protein identification method based on fusion biology and topological characteristics
AU2019276730A1 (en) Methods and apparatus for multi-modal prediction using a trained statistical model
US20190370616A1 (en) Methods and apparatus for multi-modal prediction using a trained statistical model
CN114255886B (en) Multi-group similarity guide-based drug sensitivity prediction method and device
CN109637579B (en) Tensor random walk-based key protein identification method
CN110246550B (en) Drug combination prediction method based on drug similarity network data
CN113488104B (en) Cancer driving gene prediction method and system based on local and global network centrality analysis
CN111951886A (en) Drug relocation prediction method based on Bayesian inductive matrix completion
CN114334038A (en) Disease drug prediction method based on heterogeneous network embedded model
CN112652355A (en) Medicine-target relation prediction method based on deep forest and PU learning
CN113539372A (en) Efficient prediction method for LncRNA and disease association relation
CN115376704A (en) Medicine-disease interaction prediction method fusing multi-neighborhood correlation information
CN114842927A (en) Medicine and pathway association prediction method of knowledge graph attention network
CN110618987A (en) Treatment pathway key node information processing method based on lung cancer medical big data
Sun et al. Robust structured heterogeneity analysis approach for high‐dimensional data
Wang et al. LDS-CNN: A deep learning framework for drug-target interactions prediction based on large-scale drug screening
CN116741408A (en) Method for multi-view self-attention prediction of drug to disease association
CN116798509A (en) Method for convolutionally predicting microbial and drug associations based on double attention force diagram
CN114023464B (en) Drug-target interaction prediction method based on supervised synergy map contrast learning
CN113192562B (en) Pathogenic gene identification method and system fusing multi-scale module structure information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant