CN113421658A - Medicine-target interaction prediction method based on neighbor attention network - Google Patents
- Publication number
- CN113421658A (CN202110759813.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a 'drug-target' interaction (DTI) prediction method based on a neighbor attention network. The prediction model is the neighbor attention network (NNAttNet), which addresses the shortcomings of existing methods by constructing embedded representations of drug-target pairs (DTPs) from their neighbors. The method makes predicted drug-protein interactions interpretable, reduces the impact of missing DTI entries, and provides a unified representation for transductive and inductive prediction. In addition, NNAttNet uses attention to select key features so that DTIs are predicted more accurately, and an evaluation on benchmark datasets shows that NNAttNet achieves better DTI prediction performance.
Description
Technical Field
The invention belongs to the technical field of computer-aided drug research and development, and particularly relates to a 'drug-target' interaction prediction method based on a neighbor attention network.
Background
The shift in drug discovery from the "one drug, one target" paradigm to "multiple drugs, multiple targets" has revealed the links between drugs and targets. The new paradigm facilitates the discovery of potential "drug-target" interactions (DTIs), which is a fundamental task in drug development. However, determining DTIs by biological experiments is time-consuming and laborious.
In recent years, a growing volume of DTI data has accumulated in various databases, which has prompted the application of computational methods, particularly machine-learning-based methods, that show good predictive performance in finding potential DTIs. Despite considerable progress in DTI prediction, significant challenges remain in practice, mainly in the following respects:
1) the embedded representations used for DTI prediction lack interpretability, so they describe the prediction mechanism insufficiently;
2) the prediction models are very sensitive to missing labels;
3) the prediction methods have difficulty predicting interactions for new compound molecules or proteins.
In view of this, there is a need to develop a new approach to predict "drug-target" interactions.
Disclosure of Invention
The invention aims to solve the defects of the prior art and provides a 'drug-target' interaction prediction method based on a neighbor attention network.
To this end, the invention first analyzes the problems in the prior art in depth and finds that:
1. The embedded representations used for DTI prediction lack interpretability. The drug/target embeddings learned by existing deep learning (DL) or matrix factorization (MF) methods are difficult to interpret; the resulting latent space offers no tractable way to indicate how its dimensions affect the interaction, and this black-box nature hinders direct guidance of drug design.
2. The prediction models are very sensitive to missing labels. In practice the collected "drug-target" labels are incomplete, yet existing methods rarely take the missing interaction labels between "drug-target" pairs into account and do not consider whether the missing interactions affect DTI prediction.
3. The prediction methods have difficulty predicting interactions for new compound molecules or proteins. There are currently two main prediction modes: transductive prediction and inductive prediction.
The task of transductive prediction is to construct a function mapping $F: D \times T \to [0,1]$ that infers potential interactions between unlabeled "drug-target" pairs; the features or similarities of the drugs and targets are used to learn $F$. Inductive prediction corresponds to the well-known cold-start problem in recommender systems. Its task is likewise to learn a function mapping $F: D \times T \to [0,1]$, but the learned mapping can infer potential interactions between new drug molecules $D_x$ and new target proteins $T_y$, or between $D$ and $T_y$, where $D_x$ and $T_y$ are not available when $F$ is learned. However, almost all current similarity-based DTI prediction methods are transductive: they extract topological embedding features from the DTI network or the similarity matrices, and the training phase uses labeled training samples and unlabeled test samples at the same time. When new samples need to be labeled in practice, the model must therefore be trained again, which cannot meet the needs of current drug development.
Therefore, in order to achieve the above object, the technical solution provided by the present invention is:
the 'drug-target' interaction prediction method based on the neighbor attention network is characterized by comprising the following steps of:
1) construction of a model for predicting drug-target pair interactions
The drug-target pair interaction prediction model consists of a neighbor attention module and a deep neural network module;
2) collecting sample data, and training the 'drug-target' pair interaction prediction model constructed in the step 1) to obtain a trained 'drug-target' pair interaction prediction model;
the sample data comprises relevant data of the drug and the target and real interaction of the drug and the target; the specific training process is as follows:
2.1) calculating the similarity between every two of all the drug molecules and the similarity between every two of all the target proteins by using the related data, and constructing an interaction relation matrix A of the drug molecules and the target proteins;
wherein the related data comprises the structural information of the drug molecules, the sequence information of the target protein and the interaction relation information of the drug molecules and the target protein;
2.2) constructing a TsDNA module from the interaction relation matrix A of the drug molecules and the target proteins obtained in step 2.1) and the similarity data between the drug molecules, and extracting the embedded representation of a target protein with respect to all the drug molecules, namely a feature vector indicating whether an edge exists;
and/or
constructing a DsTNA module from the interaction relation matrix A obtained in step 2.1) and the similarity data between the target proteins, and extracting the embedded representation of a drug molecule with respect to all the target proteins, namely a feature vector indicating whether an edge exists;
wherein the TsDNA module extracts the embedded representation of a drug $d_x$ with respect to a target $t_p$ as follows:
a1. sorting all drugs by their similarity to the drug $d_x$ from high to low to obtain the keys $K_1, K_2, \dots, K_m$;
a2. obtaining the interactions between all drugs and the target $t_p$, and eliminating the drugs that do not interact with $t_p$;
a3. obtaining the one-way representation $a(d_x, t_p)$ of the drug $d_x$ and the target $t_p$, which is expressed as follows:
where each assigned key is a drug that interacts with $t_p$, the assigned keys form a series, $v_i$ is the value of the $i$-th assigned key, and the similarity term is the similarity between $d_x$ and the drug assigned to that key;
the DsTNA module extracts the embedded representation of a target $t_p$ with respect to a drug $d_x$ as follows:
b1. sorting all targets by their similarity to the target $t_p$ from high to low to obtain the keys $H_1, H_2, \dots, H_m$;
b2. obtaining the interactions between all targets and the drug $d_x$, and eliminating the targets that do not interact with $d_x$;
b3. obtaining the one-way representation $a(t_p, d_x)$ of the target $t_p$ and the drug $d_x$, which is expressed as follows:
where each assigned key is a target that interacts with $d_x$, the assigned keys form a series, $u_i$ is the value of the $i$-th assigned key, and the similarity term is the similarity between $t_p$ and the target assigned to that key;
the embedded representation of the drug $d_x$ and the target $t_p$ is generated by concatenating the two one-way representations:
$e(d_x, t_p) = [a(d_x, t_p) \,\|\, a(t_p, d_x)]$;
when extracting the embedded representation of a new drug, DsTNA cannot be constructed for the new data, so only TsDNA is constructed for both the test set and the training set in order to keep the data balanced;
when extracting the embedded representation of a new target, TsDNA cannot be constructed for the new data, so only DsTNA is constructed for both the test set and the training set in order to keep the data balanced;
that is, $e(d_x, t_p) = [a(d_x, t_p)]$ or $e(d_x, t_p) = [a(t_p, d_x)]$;
2.3) processing the embedded representations obtained in step 2.2) with the feature importance network:
S1. performing step 2.2) for all drugs and targets, and stacking the resulting embedded representations of the drug-target pairs to obtain a matrix E;
S2. constructing a mapping attention matrix M from the matrix E obtained in step S1 through a deep neural network;
S3. constructing an attention-enhanced representation matrix from the matrix M obtained in step S2;
2.4) feeding the attention-enhanced representation matrix obtained in step 2.3) into the input layer of a deep neural network model to obtain the predicted interaction of the drug and the target;
2.5) comparing the predicted drug-target interaction obtained in step 2.4) with the real interaction of the drug and the target, and updating the weights of the model through back propagation to obtain the trained 'drug-target' pair interaction prediction model.
That is, the prediction model is NNAttNet, an interpretable deep-learning model comprising three modules: a neighbor attention module, a feature importance network, and a multi-layer deep neural network. For "drug-target" pairs, the first module generates interpretable embedded representations that are more robust to missing labels in the training data and are applicable in both transductive and inductive prediction scenarios. In addition, the algorithm accepts feature inputs as well as similarity inputs. The second module, the feature importance network, represents the importance of each dimension of the embedded features and thereby provides interpretable feature selection; it is one of the steps in building the neighbor attention module. The last module decides whether a "drug-target" pair is a potential DTI.
3) predicting interactions with the 'drug-target' pair interaction prediction model trained in step 2).
Further, in the step 2.1), the similarity between every two of all the drug molecules is calculated by using the acquired structural information of the drug molecules and adopting a SIMCOMP method;
and calculating the similarity between every two target proteins by using the sequence information of the collected target proteins and adopting a Smith-Waterman algorithm.
Further, the SIMCOMP method is as follows:
SIMCOMP provides a global similarity score based on the size of the common substructure between two compounds, using a graph alignment algorithm; the similarity $s(c, c')$ of compounds $c$ and $c'$ is calculated as follows:
$$s(c, c') = \frac{|c \cap c'|}{|c \cup c'|}$$
further, the Smith-Waterman algorithm is specifically as follows:
the two target sequences to be aligned are defined as $A = a_1 a_2 a_3 \dots a_n$ and $B = b_1 b_2 b_3 \dots b_m$, where $n$ and $m$ are the lengths of sequences $A$ and $B$, respectively;
determining the parameters:
$s(a, b)$ is the score for aligning sequence elements $a$ and $b$;
$W_k$ is the gap penalty for a gap of length $k$;
creating a score matrix $H$ of size $(n+1) \times (m+1)$ and initializing its first row and first column;
scoring from left to right and top to bottom to fill the remainder of the score matrix $H$, where:
$$H_{ij} = \max\left\{ H_{i-1,j-1} + s(a_i, b_j),\ \max_{k \ge 1}\left(H_{i-k,j} - W_k\right),\ \max_{l \ge 1}\left(H_{i,j-l} - W_l\right),\ 0 \right\}$$
and selecting the highest score in the score matrix $H$, which is the matching score of sequence $A$ and sequence $B$, denoted $SW(A, B)$.
The similarity between sequence $A$ and sequence $B$ is:
$$s(A, B) = \frac{SW(A, B)}{\sqrt{SW(A, A)}\,\sqrt{SW(B, B)}}$$
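The local-alignment scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the gap penalty is taken to be linear ($W_k = k \cdot \text{gap}$), the match/mismatch scores are placeholder values (a real protein alignment would normally use a substitution matrix), and the normalization by the self-alignment scores is an assumed convention. The function names are hypothetical.

```python
import math


def smith_waterman(a, b, match=2, mismatch=-1, gap=1):
    """Return SW(a, b), the best local-alignment score of sequences a and b.

    Assumes a linear gap penalty W_k = k * gap, so the inner maxima over gap
    lengths reduce to single-step moves from the cell above or to the left.
    """
    n, m = len(a), len(b)
    # Score matrix H of size (n+1) x (m+1); first row and column stay at 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align a_i with b_j
                H[i - 1][j] - gap,    # gap in b
                H[i][j - 1] - gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best


def normalized_similarity(a, b, **scores):
    """Assumed normalization: SW(a, b) / sqrt(SW(a, a) * SW(b, b))."""
    denom = math.sqrt(smith_waterman(a, a, **scores) * smith_waterman(b, b, **scores))
    return smith_waterman(a, b, **scores) / denom if denom > 0 else 0.0
```

With these placeholder scores, two identical sequences get a normalized similarity of 1.0 and two sequences with no common element get 0.0, which keeps the target similarity matrix on a common scale.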
further, in step S2), the mapping attention matrix M is constructed as follows:
$M(:, i) = \mathrm{DNN}_i(E)$.
further, in step S3), the attention-enhanced representation matrix is constructed as follows:
further, in step 2.4), the deep neural network model includes an input layer, a hidden layer using ReLU as the activation function, and an output layer of two neurons using Sigmoid as the activation function; the model acts as a binary predictor whose output layer produces a probability representing the likelihood that the drug-target pair interacts. The entire NNAttNet network, with its neighbor attention weights, feature importance terms, and DNN weights, can be jointly optimized by a binary cross-entropy loss function, as follows:
where $Y$ is the true label of the drug-target pair; $f(\cdot)$ is the DNN; $\theta$ denotes the weight parameters of the entire network; $R(\cdot)$ is the L2-norm; and $\lambda$ is the coefficient of the regularization term.
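For orientation, one form of a regularized binary cross-entropy loss that is consistent with the symbols defined above can be written as follows. It is a sketch rather than the patent's exact expression: the number of training pairs $N$, the per-pair index $i$, and the symbol $\hat{e}_i$ for the attention-enhanced representation of the $i$-th pair are assumptions introduced here for illustration.

$$\mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \Big[ Y_i \log f(\hat{e}_i; \theta) + (1 - Y_i) \log\big(1 - f(\hat{e}_i; \theta)\big) \Big] + \lambda R(\theta)$$

The first term penalizes predicted probabilities that disagree with the true labels, and the second term shrinks the weights of the whole network through the L2-norm.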
The present invention also provides a computer-readable storage medium having stored thereon a computer program characterized in that: which when executed by a processor implements the steps of the above-described method.
An electronic device, characterized in that: including a processor and a computer-readable storage medium;
the computer-readable storage medium has stored thereon a computer program which, when being executed by the processor, performs the steps of the above-mentioned method.
The invention has the advantages that:
The invention provides a prediction method based on deep learning. The prediction model is the neighbor attention network (NNAttNet), which addresses the problems described above by constructing embedded representations of drug-target pairs (DTPs) from their neighbors. The method makes predicted drug-protein interactions interpretable, reduces the impact of missing DTI entries, and provides a unified representation for transductive and inductive prediction. In addition, NNAttNet uses attention to select key features so that DTIs are predicted more accurately, and an evaluation on benchmark datasets shows that NNAttNet achieves better DTI prediction performance.
Drawings
FIG. 1 is the general architecture of the proposed method NNAttNet;
FIG. 2 is a diagram of the basic structure of a TsDNA module;
FIG. 3 is a schematic diagram of the construction of the feature importance matrix;
FIG. 4 is the distribution of the embedding features over the drug keys;
FIG. 5 is the distribution of the importance of the drug features;
FIG. 6 is a graph of the prediction performance of the top-k features;
FIG. 7 is the distribution of feature importance under different missing rates of DTIs.
Detailed Description
The invention is described in further detail below with reference to the figures and a specific example.
An embodiment of the "drug-target" interaction prediction method based on the neighbor attention network proposed by the invention is as follows.
The embodiment includes three parts:
In the first part, when the test set is constructed, some of the edges in the network are deleted, and these edges are then predicted.
In the second part, some drugs and all of their edges in the network are deleted in order to simulate the prediction scenario for new drugs.
In the third part, some targets and all of their edges in the network are deleted in order to simulate the prediction scenario for new targets.
When the results are reported, the performance measures of the overall results are shown; results for individual drugs (targets) are not shown separately.
This example uses a benchmark dataset in the prediction performance comparison experiments. Based on the protein family of the targets, the dataset is divided into 4 subsets: Enzymes (En), Ion Channels (IC), G-protein coupled receptors (GPCR), and Nuclear Receptors (NR). Each subset includes the known "drug-target" interactions, the pairwise similarities between drugs, and the pairwise similarities between targets. The pairwise similarities between drugs are calculated with the SIMCOMP algorithm, and the pairwise similarities between targets with the Smith-Waterman algorithm. The details of this dataset are shown in Table 1.
Table 1. Details of the benchmark dataset
For each data subset, a dictionary K of virtual keys is set up and assigned values using the acquired data; the TsDNA module and the DsTNA module between drug molecules and target proteins are constructed, and the embedded representations of all drug-target pairs are finally obtained.
TsDNA (see FIG. 2) consists of a dictionary K of virtual keys and the values v describing these virtual keys. In the dictionary, the virtual keys are sorted by semantic adjacency: the first key is the nearest neighbor, the second key is the second nearest neighbor, and the last key is the farthest neighbor. Notably, this ordering is a contribution that can help explain the classification distinction between known DTIs and unknown DTIs.
When considering whether a drug $d_x$ interacts with a target protein $t_p$, the empty keys are assigned by the other known drugs that interact with the target $t_p$. In contrast, no operation is performed for the drugs that do not interact with $t_p$. The attention of $d_x$ on $t_p$ can then be defined as:
Here, each assigned key is a drug that interacts with $t_p$, the assigned keys form a series, $v_i$ is the value of the $i$-th assigned key, and the similarity term is the similarity between $d_x$ and the drug assigned to that key. Note that a virtual key is featureless; it acquires a feature only after it has been assigned. This attention is the one-way representation $d_x \to t_p$.
To enhance the interpretability of the TsDNA module, $V = [v_1, v_2, \dots, v_{|K|}]$ is set to be a diagonal matrix in which each $v_i$ is a one-hot-like vector: the $i$-th element of $v_i$ is non-zero and all other elements are 0. In this way, the attention vector of $d_x \to t_p$ is sparse. Given the well-accepted hypothesis that similar drugs tend to interact with related target proteins, it is assumed that if the drug $d_x$ and the target $t_p$ interact, their representation has more non-zero values in its first few feature dimensions than that of a drug and a target that do not interact. In other words, when $d_x$ and $t_p$ interact, the other drugs that interact with $t_p$ are usually among the first few neighbors of $d_x$. This sparse attention embedding provides the evidence for the later interpretability of TsDNA.
Because the nodes of the two networks play symmetric roles, a DsTNA module can be constructed in the same way. It outputs the other one-way representation $t_p \to d_x$, denoted $a(t_p, d_x)$, and the representation of the pair $d_x$ and $t_p$ is finally generated by concatenating the two one-way representations:
$e(d_x, t_p) = [a(d_x, t_p) \,\|\, a(t_p, d_x)]$
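The neighbor-attention embedding described above can be illustrated with a small Python sketch. Because the defining formula is not reproduced in the text, the sketch rests on explicit assumptions: the $i$-th entry of $a(d_x, t_p)$ is taken to be the similarity between $d_x$ and its $i$-th nearest neighbor if that neighbor is known to interact with $t_p$, and 0 otherwise; the query drug itself is excluded from its own neighbor list; and the matrix A is binary. The function names `tsdna_embedding` and `pair_embedding` are hypothetical.

```python
def tsdna_embedding(x, p, sim, A):
    """One-way representation a(d_x, t_p) under the assumptions stated above.

    sim[i][j]: similarity between drugs i and j
    A[i][p]:   1 if drug i is known to interact with target p, else 0
    """
    # Rank all other drugs by similarity to d_x, highest first (the sorted keys).
    neighbours = sorted(
        (i for i in range(len(sim)) if i != x),
        key=lambda i: sim[x][i],
        reverse=True,
    )
    # A key is assigned only if its drug interacts with t_p; unassigned keys
    # stay at 0, which is what makes the vector sparse.
    return [sim[x][i] if A[i][p] == 1 else 0.0 for i in neighbours]


def pair_embedding(x, p, drug_sim, target_sim, A):
    """e(d_x, t_p) = [a(d_x, t_p) || a(t_p, d_x)] by concatenation."""
    A_T = [list(col) for col in zip(*A)]  # targets x drugs
    a_dt = tsdna_embedding(x, p, drug_sim, A)
    # DsTNA is the same construction with the roles of drugs and targets swapped.
    a_td = tsdna_embedding(p, x, target_sim, A_T)
    return a_dt + a_td


# Toy data: 3 drugs, 2 targets (values are illustrative only).
drug_sim = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.5], [0.3, 0.5, 1.0]]
target_sim = [[1.0, 0.6], [0.6, 1.0]]
A = [[0, 1], [1, 0], [0, 1]]
e = pair_embedding(0, 0, drug_sim, target_sim, A)
```

In this toy case the nearest neighbor of drug 0 interacts with target 0, so the first key carries a non-zero value, matching the intuition that interacting pairs concentrate non-zero entries in the first few dimensions.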
The embedded representations of all "drug-target" pairs are stacked together to form the matrix E.
The matrix E is then modeled by a deep neural network through $M(:, i) = \mathrm{DNN}_i(E)$ to obtain the mapping attention matrix M.
A generic DNN is used as a binary predictor to predict whether a "drug-target" pair interacts. The binary predictor comprises an input layer, which receives the embedded representation of a drug-target pair, a hidden layer using ReLU as the activation function, and an output layer of two neurons using Sigmoid as the activation function. The output layer produces a probability representing the likelihood that the drug-target pair interacts. The entire NNAttNet network, with its neighbor attention weights, feature importance terms, and DNN weights, can be jointly optimized by a binary cross-entropy loss function, as follows:
where $Y$ is the true label of the drug-target pair; $f(\cdot)$ is the DNN; $\theta$ denotes the weight parameters of the entire network; $R(\cdot)$ is the L2-norm; and $\lambda$ is the coefficient of the regularization term.
In this example, the performance of each method is evaluated by 10-fold cross-validation (CV), with AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve) as the measures of DTI prediction performance.
In 10-fold cross-validation, the AUROC/AUPRC scores of each prediction method are calculated, and the final AUROC/AUPRC scores are obtained by averaging the scores over 10 repetitions.
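As a reminder of what the AUROC measure computes, the following sketch evaluates it directly from its rank interpretation: the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. It is an illustrative, quadratic-time version with hypothetical function names; in practice a library routine would normally be used, and AUPRC is not covered here.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_score(per_round_scores):
    """Final score as the average over cross-validation rounds."""
    return sum(per_round_scores) / len(per_round_scores)
```

For example, `auroc([1, 1, 0, 0], [0.9, 0.3, 0.5, 0.1])` compares four positive-negative score pairs, three of which are ordered correctly.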
To evaluate each method comprehensively, the CV tests consider the following three scenarios.
Under CVS1, 90% of the drug-target pairs (DTPs) are used for training and the remaining 10% for testing in each round.
Under CVS2 (or CVS3), the interactions of 90% of the drugs (or targets) are used for training, and the interactions of the remaining 10% of the drugs (or targets) are used for testing.
CVS2 (CVS3) is a cold-start DTI prediction task because there is no overlap between the training drugs (targets) and the test drugs (targets).
Notably, CVS1 is a transductive prediction task.
CVS2/CVS3 may be a transductive or an inductive prediction task, depending on the nature of the prediction method. The experimental results of NNAttNet are shown in Tables 2, 3, and 4.
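The difference between the three scenarios lies in what is held out in each round: individual pairs (CVS1), whole drugs (CVS2), or whole targets (CVS3). The sketch below makes this concrete. It assumes pairs are given as `(drug, target)` tuples and uses a simple shuffled round-robin fold assignment; the function names and the seed handling are illustrative choices, not taken from the patent.

```python
import random


def k_fold(items, k=10, seed=0):
    """Shuffle items and deal them into k folds."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]


def cv_splits(pairs, scenario, k=10, seed=0):
    """Yield (train_pairs, test_pairs) for CVS1, CVS2 or CVS3."""
    if scenario == "CVS1":    # hold out individual pairs
        units, key = pairs, (lambda pair: pair)
    elif scenario == "CVS2":  # hold out drugs with all their pairs
        units, key = sorted({d for d, _ in pairs}), (lambda pair: pair[0])
    elif scenario == "CVS3":  # hold out targets with all their pairs
        units, key = sorted({t for _, t in pairs}), (lambda pair: pair[1])
    else:
        raise ValueError("scenario must be CVS1, CVS2 or CVS3")
    for fold in k_fold(units, k, seed):
        held_out = set(fold)
        test = [p for p in pairs if key(p) in held_out]
        train = [p for p in pairs if key(p) not in held_out]
        yield train, test


# Toy example: 20 drugs x 5 targets gives 100 candidate pairs.
pairs = [(d, t) for d in range(20) for t in range(5)]
cvs2 = list(cv_splits(pairs, "CVS2"))
```

Under CVS2 every test drug is absent from the training pairs of that round, which is exactly the cold-start condition described above.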
Table 2. DTI prediction performance on the 4 datasets under CVS1
Note: ROC and PR are abbreviations for AUROC and AUPRC.
Table 3. DTI prediction performance on the 4 datasets under CVS2
Note: ROC and PR are abbreviations for AUROC and AUPRC.
Table 4. DTI prediction performance on the 4 datasets under CVS3
Note: ROC and PR are abbreviations for AUROC and AUPRC.
The interpretability of the prediction method in the present invention is explained below based on the experimental results of the present example.
Taking the GPCR dataset as an example, the distribution over the drug keys K1 to K100 of the dictionary was obtained by calculating the mean embedding vectors of the known DTIs and of the unlabeled DTPs (see FIG. 4). The significantly higher embedding values in the first n nearest neighbors indicate that a drug interacting with a particular target tends to find its top n nearest neighbors among the drugs interacting with the same target. This observation suggests that a drug is likely to interact with the target if it has more non-zero entries in the first n feature dimensions (keys) than a drug that does not interact.
This embodiment also indicates which embedded features in the matrix M drive the predicted interaction. Since larger values in M indicate more important feature dimensions, the importance of each feature $f_i$ can be measured by the average of the values in the $i$-th column $M(:, i)$ of M (see FIG. 5). The distribution of the importance of the keys in dictionary K shows that the more important features are typically located in the top n nearest neighbors. This agrees well with the observation above: over the first 10 keys, the Spearman correlation is high (r = 0.8182).
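The two quantities used in this analysis, the per-feature importance as a column mean of M and the Spearman correlation between two importance profiles, can be sketched in plain Python as follows. M is assumed to be given as a list of rows; the helper names are hypothetical, and the Spearman coefficient is computed as the Pearson correlation of average ranks.

```python
import math


def column_importance(M):
    """Importance of feature i = mean of the values in column i of M."""
    n_rows = len(M)
    return [sum(row[i] for row in M) / n_rows for i in range(len(M[0]))]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Comparing `column_importance(M)` with the mean embedding profile of the known DTIs through `spearman` is one way to quantify the kind of agreement reported above.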
This example also investigates the prediction performance of the top-k features (see FIG. 6), with k taking values in {1, 5, 10, 15, …, 220}. The prediction performance increases sharply as k increases to 50. As k continues to increase, performance improves slowly and even decreases for larger k.
One reason NNAttNet still performs well under missing labels is that it uses an embedding vector composed of neighboring nodes. The distribution of feature importance was examined at different missing rates of DTIs (FIG. 7). The figure reveals that the distribution over the feature keys shows a similar trend at different missing rates. Moreover, the feature importance vectors under the 9 missing rates are highly correlated: the Spearman correlation coefficients between the feature importance vector at the 10% missing rate and those at the other missing rates (20% to 90%) are 0.9996, 0.9993, 0.9989, 0.9979, 0.9969, 0.9943, 0.9919, and 0.9770, respectively. This high correlation indicates that the feature importance network can still identify the critical features when data are missing. Thus, when labels are missing, even if only a few drugs interacting with the target are found among the top n nearest neighbors of the queried drug, the sorted key dictionary in the neighbor attention module can still ensure that the queried drug is recognized as interacting with the target.
The above examples demonstrate the feasibility of NNAttNet: interpretability of drug-protein interactions, robust performance when DTI labels are missing, consistent representation for direct and generalized DTI prediction, and attention-based selection of important features for more accurate DTI prediction.
Well-known implementations and features of the above-described arrangements are not described in detail herein. It should be noted that those skilled in the art can make various modifications without departing from the invention; such modifications also fall within the scope of protection of the invention and affect neither its effect nor the practicability of the patent. The scope of protection claimed in the present application shall be defined by the claims, and the detailed description and other parts of the specification may be used to interpret the claims.
Claims (9)
1. A drug-target interaction prediction method based on a neighbor attention network, characterized by comprising the following steps:
1) constructing a drug-target pair interaction prediction model;
the drug-target pair interaction prediction model consists of a neighbor attention module and a deep neural network module;
2) collecting sample data, and training the drug-target pair interaction prediction model constructed in step 1) to obtain a trained drug-target pair interaction prediction model;
the sample data comprises data related to the drugs and the targets and the real interactions between the drugs and the targets; the specific training process is as follows:
2.1) calculating the pairwise similarity between all drug molecules and the pairwise similarity between all target proteins by using the related data, and constructing an interaction relation matrix A of the drug molecules and the target proteins;
wherein the related data comprises the structural information of the drug molecules, the sequence information of the target proteins, and the interaction relation information between the drug molecules and the target proteins;
2.2) constructing a TsDNA module by using the interaction relation matrix A obtained in step 2.1) and the similarity data between the drug molecules, and extracting the embedded representation of a target protein and all the drug molecules;
constructing a DsTNA module by using the interaction relation matrix A obtained in step 2.1) and the similarity data between the target proteins, and extracting the embedded representation of a drug molecule and all the target proteins;
wherein the TsDNA module extracts the embedded representation of a drug $d_x$ and a target $t_p$ as follows:
a1. sorting all drugs by their similarity to the drug $d_x$, from high to low, to obtain $K_1, K_2, \dots, K_m$;
a2. obtaining the interactions between all drugs and the target $t_p$, and eliminating the drugs that do not interact with it;
a3. obtaining the embedded representation of the drug $d_x$ and the target $t_p$, expressed as follows:
wherein $K_i$ is an assigned key, which is a drug; $K_1, K_2, \dots, K_m$ is the series of assigned keys; and $v_i$ is the value of $K_i$, namely the similarity between $d_x$ and $K_i$;
the DsTNA module extracts the embedded representation of a target $t_p$ and a drug $d_x$ as follows:
b1. sorting all targets by their similarity to the target $t_p$, from high to low, to obtain $H_1, H_2, \dots, H_m$;
b2. obtaining the interactions between all targets and the drug $d_x$, and eliminating the targets that do not interact with it;
b3. obtaining the embedded representation of the target $t_p$ and the drug $d_x$, expressed as follows:
wherein $H_i$ is an assigned key, which is a target; $H_1, H_2, \dots, H_m$ is the series of assigned keys; and $u_i$ is the value of $H_i$, namely the similarity between $t_p$ and $H_i$;
the embedded representation of the drug $d_x$ and the target $t_p$ is generated by concatenating the two directional representations:
$e(d_x, t_p) = [\,a(d_x, t_p) \,\|\, a(t_p, d_x)\,]$;
2.3) processing the embedded representation obtained in step 2.2) with the feature importance network:
S1. performing step 2.2) on all drugs and targets, and stacking the resulting embedded representations of the drug-target pairs to obtain a matrix E;
S2. constructing a mapping attention matrix M for the matrix E obtained in step S1 through a deep neural network;
S3. constructing an attention-enhanced representation matrix from the matrix M obtained in step S2
2.4) inputting the attention-enhanced representation matrix obtained in step S3 into a deep neural network model as the input layer to obtain the predicted drug-target interaction;
2.5) comparing the predicted drug-target interaction obtained in step 2.4) with the real interaction between the drug and the target, and updating the weights of the model through back-propagation to obtain the trained drug-target pair interaction prediction model;
3) predicting the interaction of the drug-target by using the trained drug-target interaction prediction model in the step 2).
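For illustration only, steps a1 to a3, b1 to b3 and the concatenation in step 2.2) might be sketched as below. The expressions referenced in steps a3 and b3 are not reproduced in this text, so the sketch rests on two labeled assumptions: the value attached to each key is the similarity to the query when the key interacts with the partner and 0 otherwise, and the query entity itself is left out of its own neighbor list. The function names and the toy matrices are hypothetical.

```python
def neighbor_embedding(sim_row, interacts_with_partner, query_index):
    """One directional representation for a query entity and its partner.

    sim_row[i]                similarity between the query and entity i
    interacts_with_partner[i] truthy if entity i interacts with the partner
    """
    # Keys: all other entities, most similar first (steps a1 / b1).
    # ASSUMPTION: the query itself is excluded from the keys.
    keys = sorted(
        (i for i in range(len(sim_row)) if i != query_index),
        key=lambda i: sim_row[i],
        reverse=True,
    )
    # Values (steps a2-a3 / b2-b3).
    # ASSUMPTION: similarity for interacting keys, 0 for the others.
    return [sim_row[i] if interacts_with_partner[i] else 0.0 for i in keys]


def pair_embedding(x, p, drug_sim, target_sim, A):
    """Concatenate the drug-side and target-side representations of pair (x, p).

    A[d][t] is 1 if drug d interacts with target t, else 0.
    """
    drug_side = neighbor_embedding(drug_sim[x], [row[p] for row in A], x)
    target_side = neighbor_embedding(target_sim[p], A[x], p)
    return drug_side + target_side


# Toy data (hypothetical): 3 drugs, 2 targets.
drug_sim = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.5], [0.3, 0.5, 1.0]]
target_sim = [[1.0, 0.6], [0.6, 1.0]]
A = [[0, 1], [1, 0], [1, 1]]

e = pair_embedding(0, 0, drug_sim, target_sim, A)  # [0.8, 0.3, 0.6]
```

Stacking `pair_embedding` over all drug-target pairs would give a matrix playing the role of E in step S1.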
2. The method for predicting a drug-target interaction according to claim 1, wherein:
in step 2.1), the pairwise similarity between drug molecules is calculated from the collected structural information of the drug molecules by the SIMCOMP method;
and the pairwise similarity between target proteins is calculated from the collected sequence information of the target proteins by the Smith-Waterman algorithm.
3. The method for predicting a drug-target interaction according to claim 2, wherein:
the SIMCOMP method is as follows:
SIMCOMP provides a global similarity score based on the size of the common substructure between two drug compounds using a graph alignment algorithm, wherein the similarity s(c, c') of compounds c and c' is calculated as follows:
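The expression itself is not reproduced in this text. As a point of reference only, the SIMCOMP score is commonly written in the drug-target interaction literature as a ratio of the common substructure to the combined structure; whether the patent uses exactly this form is an assumption:

```latex
s(c, c') = \frac{\lvert c \cap c' \rvert}{\lvert c \cup c' \rvert}
```

Here $\lvert c \cap c' \rvert$ denotes the size of the common substructure of the two compounds and $\lvert c \cup c' \rvert$ the size of their union, so the score lies between 0 and 1.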
4. The method for predicting a drug-target interaction according to claim 2, wherein:
the Smith-Waterman algorithm is specifically as follows:
the two target sequences to be aligned are defined as $A = a_1 a_2 a_3 \dots a_n$ and $B = b_1 b_2 b_3 \dots b_m$, wherein n and m are the lengths of sequences A and B, respectively;
determining the parameters:
$s(a, b)$ is the similarity score between the elements that make up the sequences;
$W_k$ is the penalty for a gap of length k;
creating a score matrix H of size (n + 1) × (m + 1), and initializing its first row and first column;
scoring from left to right and from top to bottom, filling in the remainder of the score matrix H, where:
selecting the highest score in the score matrix H as the matching score of sequence A and sequence B, denoted SW(A, B);
the similarity between sequence A and sequence B is:
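The recurrence and the final similarity expression are not reproduced in this text. The sketch below follows the standard Smith-Waterman local alignment recurrence and, as an assumption, normalizes the score by the geometric mean of the two self-alignment scores, a form commonly used for target similarity. The match, mismatch and gap values, the linear gap penalty $W_k = k \cdot \text{gap}$, and the function names are all illustrative choices.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=1):
    """Best local alignment score of sequences a and b.

    ASSUMPTION: linear gap penalty W_k = k * gap, for which the one-step
    recurrence below is equivalent to maximizing over all gap lengths k.
    """
    n, m = len(a), len(b)
    # (n + 1) x (m + 1) score matrix; first row and first column stay 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] - gap,    # gap in b
                H[i][j - 1] - gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best


def normalized_similarity(a, b):
    """ASSUMED normalization: SW(A, B) / sqrt(SW(A, A) * SW(B, B)).

    Requires non-empty sequences so that the self-scores are positive.
    """
    return smith_waterman(a, b) / (smith_waterman(a, a) * smith_waterman(b, b)) ** 0.5


score = smith_waterman("ACGT", "ACGT")           # 8 with the default scores
similarity = normalized_similarity("ACGT", "ACGT")
```

In practice a substitution matrix for amino acids would replace the simple match and mismatch scores used here.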
5. The method for predicting a drug-target interaction according to any one of claims 1 to 4, wherein:
in step S2, the mapping attention matrix M is constructed as follows:
$M(:, i) = \mathrm{DNN}_i(E)$.
7. The method for predicting a drug-target interaction according to claim 6, wherein:
in step 2.4), the deep neural network model comprises an input layer, a hidden layer with ReLU as the activation function, and an output layer of two neurons with Sigmoid as the activation function.
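A forward pass through a network of this shape can be sketched with the standard library alone. The layer sizes, weights and function names below are hypothetical, and the meaning assigned to each of the two outputs is not stated in this text, so the sketch simply returns both values.

```python
import math


def dense(x, W, b):
    """Affine layer: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def predict(x, W1, b1, W2, b2):
    """Input layer -> ReLU hidden layer -> two-neuron Sigmoid output layer."""
    hidden = relu(dense(x, W1, b1))
    return sigmoid(dense(hidden, W2, b2))


# Hypothetical weights: 3 input features, 2 hidden units, 2 output neurons.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [-1.0, 1.0]]
b2 = [0.0, 0.0]

outputs = predict([0.4, 0.0, 0.3], W1, b1, W2, b2)
```

Training, as described in step 2.5), would adjust `W1`, `b1`, `W2` and `b2` by back-propagation against the real interactions; that part is omitted here.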
8. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.
9. An electronic device, characterized in that: it comprises a processor and a computer-readable storage medium;
the computer-readable storage medium has stored thereon a computer program which, when being executed by the processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110759813.9A CN113421658B (en) | 2021-07-06 | 2021-07-06 | Drug-target interaction prediction method based on neighbor attention network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113421658A (en) | 2021-09-21 |
CN113421658B (en) | 2023-06-16 |
Family
ID=77721466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110759813.9A Active CN113421658B (en) | 2021-07-06 | 2021-07-06 | Drug-target interaction prediction method based on neighbor attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113421658B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114765060A (en) * | 2021-01-13 | 2022-07-19 | 四川大学 | Multi-attention method for predicting drug target interaction |
CN116246697A (en) * | 2023-05-11 | 2023-06-09 | 上海微观纪元数字科技有限公司 | Target protein prediction method and device for medicines, equipment and storage medium |
CN117912591A (en) * | 2024-03-19 | 2024-04-19 | 鲁东大学 | Kinase-drug interaction prediction method based on deep contrast learning |
CN117912591B (en) * | 2024-03-19 | 2024-05-31 | 鲁东大学 | Kinase-drug interaction prediction method based on deep contrast learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB0321708D0 (en) * | 2003-09-16 | 2003-10-15 | Pfizer Ltd | System and method for the computer-assisted identification of drugs and indications |
US20150324693A1 (en) * | 2014-05-06 | 2015-11-12 | International Business Machines Corporation | Predicting drug-drug interactions based on clinical side effects |
CN111243682A (en) * | 2020-01-10 | 2020-06-05 | 京东方科技集团股份有限公司 | Method, device, medium and apparatus for predicting toxicity of drug |
CN111785320A (en) * | 2020-06-28 | 2020-10-16 | 西安电子科技大学 | Drug target interaction prediction method based on multilayer network representation learning |
CN111916145A (en) * | 2020-07-24 | 2020-11-10 | 湖南大学 | Novel coronavirus target prediction and drug discovery method based on graph representation learning |
US20200365270A1 (en) * | 2019-05-15 | 2020-11-19 | International Business Machines Corporation | Drug efficacy prediction for treatment of genetic disease |
US20210142173A1 (en) * | 2019-11-12 | 2021-05-13 | The Cleveland Clinic Foundation | Network-based deep learning technology for target identification and drug repurposing |
Non-Patent Citations (3)
Title |
---|
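丁棋梁; 石泽智; 李建华: "Drug-target relationship prediction based on grouped Bayesian ranking", Computer Engineering and Applications *
吴纯伟; 路丽; 梁生旺; 陈超; 王淑美: "Application of drug target prediction technology in network pharmacology of traditional Chinese medicine", China Journal of Chinese Materia Medica *
赵其昌: "Drug-protein relationship prediction based on a deep attention model", China Master's Theses Full-text Database *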
Also Published As
Publication number | Publication date |
---|---|
CN113421658B (en) | 2023-06-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20240304; Address after: Room 21501, Unit 2, Building 1, Oak Tree Constellation, Keji Fifth Road, High tech Zone, Xi'an City, Shaanxi Province, 710065; Patentee after: Shaanxi Exquisite Technology Development Co.,Ltd.; Country or region after: China; Address before: 710072 No. 127 Youyi West Road, Shaanxi, Xi'an; Patentee before: Northwestern Polytechnical University; Country or region before: China |