CN113421658B - Drug-target interaction prediction method based on neighbor attention network - Google Patents
- Publication number
- CN113421658B (CN202110759813.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a drug-target interaction prediction method based on a neighbor attention network. The prediction model is a neighbor attention network (NNAttNet), which addresses the problems of existing methods by constructing neighbor-based embedded representations of drug-target pairs (DTPs). In addition, NNAttNet provides attention-based selection of key features so as to predict DTIs accurately, and its evaluation on benchmark datasets shows that it achieves better DTI prediction performance.
Description
Technical Field
The invention belongs to the technical field of computer-aided drug research and development, and particularly relates to a drug-target interaction prediction method based on a neighbor attention network.
Background
The shift in the drug discovery paradigm from "one drug, one target" to "multiple drugs, multiple targets" reveals the links between drugs and targets and facilitates the discovery of potential "drug-target" interactions (DTIs), which is a fundamental task in drug development. However, determining DTIs through biological experiments is time-consuming and laborious.
In recent years, as more and more DTI data have been generated, various databases have been developed, and this accumulation has prompted the application of computational methods, particularly machine-learning-based methods, which show good predictive performance in finding potential DTIs. However, although researchers have made great efforts and significant progress in DTI prediction, challenges remain in practice, mainly the following:
1) Lack of interpretability: the embedded representations give an insufficient account of the DTI prediction mechanism;
2) The predictive model is very sensitive to missing labels;
3) Prediction of interactions for new compound molecules/proteins is difficult.
In view of this, there is a need to develop a new approach to predicting "drug-target" interactions.
Disclosure of Invention
The invention aims to solve the defects existing in the prior art and provides a drug-target interaction prediction method based on a neighbor attention network.
To this end, the present invention first studied and analyzed in depth the problems existing in the prior art, and found that:
1. Lack of interpretability: the embedded representations give an insufficient account of the DTI prediction mechanism. This is mainly because the drug/target embedded representations learned by existing Deep Learning (DL) or Matrix Factorization (MF) methods are difficult to interpret; the resulting latent space offers no easy way to indicate how its properties affect interactions, and its black-box nature prevents direct guidance of drug design.
2. The predictive model is very sensitive to missing labels. This is mainly because, in practice, the labels collected for "drug-target" pairs are incomplete; existing methods rarely take the missing interaction labels between "drug-target" pairs into account and pay no attention to whether the missing interactions would help DTI prediction.
3. Prediction of interactions for new compound molecules/proteins is difficult. At present, two prediction modes are mainly adopted, namely transductive prediction and inductive prediction.
The task of transductive prediction is to construct a function mapping $F: D \times T \to [0, 1]$ that infers potential interactions between unlabeled "drug-target" pairs; the features or similarities of the drugs and targets are used to learn the function F. Inductive learning corresponds to the well-known cold-start problem in recommender systems, and the task of inductive prediction is likewise to learn a function mapping $F: D \times T \to [0, 1]$. However, it can infer potential interactions between new drug molecules $D_x$ and new target proteins $T_y$, or between D and $T_y$, with the features or similarities of $D_x$ and $T_y$ learned in F. Almost all current similarity-based approaches to DTI prediction are transductive: they extract topological embedded features from DTI networks or similarity matrices, and the training phase uses both labeled training samples and unlabeled test samples. Consequently, the model must be retrained whenever the labels of new samples are to be determined in practice, which cannot meet the current requirements of drug development.
Therefore, in order to achieve the above object, the technical solution provided by the present invention is:
The drug-target interaction prediction method based on the neighbor attention network is characterized by comprising the following steps:
1) Construct the "drug-target" pair interaction prediction model.
The "drug-target" pair interaction prediction model consists of a neighbor attention module and a deep neural network module;
2) Collect sample data, and train the "drug-target" pair interaction prediction model constructed in step 1) to obtain a trained "drug-target" pair interaction prediction model;
The sample data include the related data of the drugs and targets and the actual interactions between the drugs and targets; the specific training process is as follows:
2.1) Calculate the pairwise similarity between all drug molecules and the pairwise similarity between all target proteins using the related data, and construct an interaction matrix A of the drug molecules and target proteins;
wherein the related data comprise structural information of the drug molecules, sequence information of the target proteins, and interaction information between the drug molecules and the target proteins;
2.2) Construct a TsDNA module using the interaction matrix A of the drug molecules and target proteins obtained in step 2.1) and the similarity data between drug molecules, and extract the embedded representation of each target protein with respect to all drug molecules, i.e. a feature vector representing whether they are connected;
and/or
Construct a DsTNA module using the interaction matrix A of the drug molecules and target proteins obtained in step 2.1) and the similarity data between target proteins, and extract the embedded representation of each drug molecule with respect to all target proteins, i.e. a feature vector representing whether they are connected;
wherein a drug d is extracted by a TsDNA module x And a target t p The extraction process is as follows:
a1. according to all drugs and said drug d x Sequencing all medicines in the sequence from high similarity to low similarity to obtain K 1 、K 2 、…K m ;
a2. Obtaining all drugs and targets t p Removing non-interacting drugs;
a3. obtaining the drug d x And the target t p The formula is as follows:
wherein ,is assigned key, which is aMedicine->Is a series of assigned keys, v i Is->S (·, ·) is d x and />Similarity of (2);
extraction of a target t by DsTNA Module p With a drug d x The extraction process is as follows:
b1. according to all targets and said target t p Sequencing all targets in the sequence of similarity from high to low to obtain H 1 、H 2 、…H m ;
b2. Obtaining all targets and drug d x Removing non-interacting targets;
b3. acquiring the target t p With the drug d x Is embedded in the representation of (1) as follows
wherein ,is assigned bond, which is a drug +.>Is a series of assigned keys, u i Is->S (·, ·) is t p and />Similarity of (2);
drug d x And target t p Is generated by concatenating the bi-directional representations:
e(d x ,t p )=[a(d x ,t p )||a(t p ,d x )];
for extraction of new drug embedded representation, since it cannot construct dsTNA (new data), only TsDNA is constructed when constructing test set and training set in order to maintain data balance;
for extraction of new target embedded representation, as it cannot construct TsDNA (new data), in order to maintain data balance, only DsTNA is constructed when constructing test set and training set;
that is, e (d x ,d p )=[a(d x ,t p )]Or e (d) x ,t p )=[a(t p ,d x )];
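The extraction steps a1 to a3 and b1 to b3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the formula for the embedding is not reproduced in this text, so the sketch assumes that the i-th entry equals the similarity between the query and its i-th nearest key when that key interacts with the other side of the pair, and 0 otherwise. The names `neighbor_attention` and `pair_embedding`, and the choice to exclude the query from its own key list, are hypothetical.

```python
from typing import List


def neighbor_attention(query: int, other: int,
                       sim: List[List[float]],
                       interacts: List[List[int]]) -> List[float]:
    """Unidirectional embedding of `query` towards `other`.

    sim[query][k]       : similarity between the query and candidate key k
    interacts[k][other] : 1 if key k is known to interact with `other`, else 0
    """
    # Step a1/b1: rank all candidate keys by similarity to the query,
    # from high to low (the query itself is skipped, an assumption).
    keys = sorted((k for k in range(len(sim)) if k != query),
                  key=lambda k: sim[query][k], reverse=True)
    # Steps a2-a3/b2-b3 (assumed form): an interacting key contributes its
    # similarity at its rank position; a non-interacting key leaves a 0.
    return [sim[query][k] if interacts[k][other] else 0.0 for k in keys]


def pair_embedding(d: int, t: int,
                   drug_sim: List[List[float]],
                   target_sim: List[List[float]],
                   A: List[List[int]]) -> List[float]:
    """Concatenate the TsDNA and DsTNA representations of the pair (d, t).

    A is the drug-by-target interaction matrix (rows: drugs, columns: targets).
    """
    A_T = [list(col) for col in zip(*A)]              # target-by-drug view
    a_dt = neighbor_attention(d, t, drug_sim, A)      # TsDNA: d_x -> t_p
    a_td = neighbor_attention(t, d, target_sim, A_T)  # DsTNA: t_p -> d_x
    return a_dt + a_td
```

For a new drug or a new target, only one of the two calls would be made, matching the single-sided representations described above.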
2.3) Process the embedded representations obtained in step 2.2) with a feature importance network:
S1. Carry out step 2.2) for all drugs and targets, and stack the resulting embedded representations of the "drug-target" pairs together to obtain a matrix E;
S2. Construct a mapping attention matrix M from the matrix E obtained in step S1 through a deep neural network;
S3. Construct an attention-enhanced representation matrix from the matrix M obtained in step S2, to facilitate identification;
2.4) Feed the attention-enhanced representation matrix obtained in step S3 into a deep neural network model as the input layer to obtain the predicted "drug-target" interactions;
2.5) Compare the predicted "drug-target" interactions obtained in step 2.4) with the actual interactions of the drugs and targets, and update the weights in the model through back-propagation to obtain the trained "drug-target" pair interaction prediction model.
That is, the prediction model is an interpretable deep-learning-based model, namely NNAttNet, which comprises three modules: a neighbor attention module, a feature importance network, and a multi-layer deep neural network model. For "drug-target" pairs, the first module generates interpretable embedded representations that are more robust to missing labels in the training data and are applicable in both transductive and inductive prediction scenarios. In addition, the algorithm adapts not only to feature inputs but also to similarity inputs. The second module, the feature importance network, is a step within the construction of the neighbor attention module; it indicates the importance of each dimension of the embedded features and thereby provides interpretable feature selection. The last module determines whether a "drug-target" pair is a potential DTI.
3) Predict "drug-target" interactions using the prediction model trained in step 2).
Further, in step 2.1), the similarity between every two of all the drug molecules is calculated by using the SIMCOMP method by utilizing the acquired structural information of the drug molecules;
and calculating the similarity between every two target proteins by using the collected sequence information of the target proteins and adopting a Smith-Waterman algorithm.
Further, the SIMCOMP method is specifically as follows:
SIMCOMP provides a global similarity score based on the common substructure size between two drug compounds using a graph alignment algorithm, where the similarity s (c, c ') of compounds c and c' is calculated as follows:
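The formula itself is not reproduced in this text. In the SIMCOMP literature the global score is commonly written as a Jaccard-style ratio of substructure sizes; it is given here as an assumption, not as the patent's own equation:

```latex
s(c, c') = \frac{|c \cap c'|}{|c \cup c'|}
```

Under this reading, $|c \cap c'|$ would be the size of the common substructure found by the graph alignment and $|c \cup c'|$ the size of the union of the two compound graphs, so that $s(c, c')$ lies in $[0, 1]$.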
Further, the Smith-Waterman algorithm is specifically as follows:
The two target sequences to be aligned are defined as $A = a_1 a_2 a_3 \ldots a_n$ and $B = b_1 b_2 b_3 \ldots b_m$, where n and m are the lengths of sequences A and B, respectively;
Determine the parameters:
$s(a, b)$ is the similarity score between the elements that make up the sequences;
$W_k$ is the penalty for a gap of length k;
Create a scoring matrix H of size $(n+1) \times (m+1)$ and initialize its first row and first column;
Score from left to right and top to bottom to fill the remainder of the scoring matrix H, wherein:
The highest score in the scoring matrix H is selected as the matching score of sequence A and sequence B, denoted SW(A, B).
Similarity of sequence A and sequence B:
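The fill rule and the similarity formula are not reproduced in this text. The sketch below implements the standard Smith-Waterman recurrence under two assumptions that are not taken from the source: a linear gap penalty ($W_k = k \cdot \text{gap}$), and a similarity normalized as $SW(A,B)/\sqrt{SW(A,A)\,SW(B,B)}$, which is a common choice for target similarity matrices. The scoring values and function names are illustrative.

```python
import math


def smith_waterman(a: str, b: str, match: float = 2.0,
                   mismatch: float = -1.0, gap: float = 1.0) -> float:
    """Best local alignment score SW(a, b) with a linear gap penalty."""
    n, m = len(a), len(b)
    # (n+1) x (m+1) scoring matrix; first row and column initialized to 0.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):          # top to bottom
        for j in range(1, m + 1):      # left to right
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0.0,                  # start a new local alignment
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] - gap,    # gap in b
                          H[i][j - 1] - gap)    # gap in a
            best = max(best, H[i][j])
    return best


def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalization: SW(a, b) / sqrt(SW(a, a) * SW(b, b))."""
    denom = math.sqrt(smith_waterman(a, a) * smith_waterman(b, b))
    return smith_waterman(a, b) / denom if denom > 0 else 0.0
```

With a linear penalty, checking only the single-step gap moves is equivalent to maximizing over all gap lengths k, which keeps the fill at O(nm) time.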
Further, in step S2), the mapping attention matrix M is constructed as follows:
$M(:, i) = \mathrm{DNN}_i(E)$.
Further, in step S3), the attention-enhanced representation matrix is constructed as follows:
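The formula for the attention-enhanced matrix is not reproduced in this text. The following Python sketch illustrates steps S2 and S3 under two assumptions: each $\mathrm{DNN}_i$ is replaced by a hypothetical stand-in (one linear layer followed by a sigmoid), and the attention-enhanced matrix is taken to be the element-wise product of E and M. Neither choice is confirmed by the source.

```python
import math
import random
from typing import Callable, List

Row = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_column_net(dim: int, rng: random.Random) -> Callable[[Row], float]:
    """Hypothetical stand-in for DNN_i: one linear layer plus a sigmoid."""
    w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    b = 0.0
    return lambda row: sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


def attention_matrix(E: List[Row], nets: List[Callable[[Row], float]]) -> List[Row]:
    """M(:, i) = DNN_i(E): net i maps every row of E to one entry of column i."""
    return [[net(row) for net in nets] for row in E]


def enhance(E: List[Row], M: List[Row]) -> List[Row]:
    """Assumed form of step S3: element-wise product of E and M."""
    return [[e * m for e, m in zip(e_row, m_row)] for e_row, m_row in zip(E, M)]
```

With an element-wise product, zero entries of the sparse embedding stay zero, so the gating reweights the neighbors that are present without inventing new ones.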
Further, in step 2.4), the deep neural network model includes an input layer, a hidden layer using ReLU as the activation function, and an output layer of two neurons using Sigmoid as the activation function; the deep neural network model acts as a binary predictor, with the output layer producing probabilities that represent the likelihood of a "drug-target" pair interacting. The entire NNAttNet network, including the neighbor attention weights, feature importance terms, and DNN weights, can be jointly optimized by a binary cross-entropy loss function as follows:
wherein Y is the true label of the drug-target pair; f(·) is the DNN; θ is the weight parameter of the entire network; R(·) is the L2-norm; and λ is the coefficient of the regularization term.
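The loss formula itself is not reproduced in this text. A form consistent with the symbol definitions above, written here as an assumption, is the binary cross-entropy with an L2 penalty; the sample count $N$, the per-pair label $y_j$, the per-pair input $e_j$, and the $1/N$ normalization are notation introduced for this sketch:

```latex
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{j=1}^{N}\Big[\, y_j \log f(e_j;\theta) + (1 - y_j)\log\big(1 - f(e_j;\theta)\big) \Big] + \lambda\, R(\theta)
```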
The present invention also provides a computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program realizes the steps of the above method when being executed by a processor.
An electronic device is characterized in that: including a processor and a computer-readable storage medium;
the computer readable storage medium has stored thereon a computer program which, when executed by the processor, performs the steps of the above method.
The invention has the advantages that:
The invention provides a deep-learning-based prediction method whose prediction model is a neighbor attention network (NNAttNet), which addresses the above problems by constructing neighbor-based embedded representations of drug-target pairs (DTPs). In addition, NNAttNet provides attention-based selection of key features so as to predict DTIs accurately, and its evaluation on benchmark datasets shows that it achieves better DTI prediction performance.
Drawings
FIG. 1 is a general architecture of the proposed method NNAttNet;
FIG. 2 is a basic block diagram of a TsDNA module;
FIG. 3 is a schematic construction diagram of a feature importance matrix;
FIG. 4 is an arrangement and distribution of drug key embedding features;
FIG. 5 is an arrangement and distribution of the importance of drug features;
FIG. 6 is a graph of the predictive performance of the top-k features;
FIG. 7 is the distribution of feature importance at different DTI missing rates.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and specific examples:
One embodiment of the "drug-target" interaction prediction method based on a neighbor attention network according to the present invention is as follows:
This embodiment includes three parts:
First part: when constructing the test set, a portion of the edges in the network is deleted, and those edges are then predicted.
Second part: some drugs and all of their edges in the network are deleted, to simulate the scenario of new drug prediction.
Third part: some targets and all of their edges in the network are deleted, to simulate the scenario of new target prediction.
In the statistics and presentation of results, we report performance metrics for the overall results rather than for any specific drug (target).
This example uses a benchmark dataset in the predictive performance comparison experiments, in which the target proteins are divided into 4 subsets according to protein family: enzymes (En), ion channels (IC), G protein-coupled receptors (GPCR), and nuclear receptors (NR). Each subset includes known "drug-target" interactions, pairwise similarities between drugs, and pairwise similarities between targets. The pairwise similarity between drugs is calculated by the SIMCOMP algorithm, and the pairwise similarity between targets is calculated by the Smith-Waterman algorithm. The details of this dataset are shown in Table 1.
TABLE 1 details of reference data sets
A dictionary K of virtual keys is set up for each data subset and populated with the collected data, and the TsDNA and DsTNA modules between the drug molecules and target proteins are constructed to finally obtain the embedded representations of all "drug-target" pairs.
TsDNA (FIG. 2) consists of a dictionary K of virtual keys and a set of values v describing these virtual keys. In the dictionary, the virtual keys are ordered by semantic proximity: the first key is the nearest neighbor, the second key is the second nearest neighbor, and the last key is the farthest neighbor. Notably, this ordering is what makes it possible to explain the classification distinction between known DTIs and unknown DTIs.
When considering whether a drug $d_x$ interacts with a target protein $t_p$, these empty keys are assigned the other drugs that are known to interact with the target $t_p$. Conversely, no operation is performed for the drugs that do not interact with $t_p$. The attention of $d_x$ to $t_p$ can be defined as:
Here, $K_i$ is an assigned key, which is a drug; $\{K_i\}$ is the series of assigned keys; $v_i$ is the value of $K_i$; and $s(\cdot, \cdot)$ is the similarity between $d_x$ and $K_i$. Note that virtual keys are featureless and acquire features only after being assigned. This attention gives the unidirectional representation $d_x \to t_p$.
To enhance the interpretability of the TsDNA module, we set $V = [v_1, v_2, \ldots, v_{|K|}]$ to be a diagonal matrix, in which $v_i$ is a one-hot-like vector, that is, the i-th element of $v_i$ is nonzero and all other elements are 0. In this way, the representation $d_x \to t_p$ is sparse. Considering the widely accepted assumption that similar drugs tend to interact with the target protein of interest, we hypothesize that if drug $d_x$ and target $t_p$ interact, their representation has more non-zero values in its first few feature dimensions than that of drugs and targets that do not interact. In other words, if $d_x$ interacts with $t_p$, the other drugs that interact with $t_p$ are generally among the nearest neighbors of $d_x$. This sparse attention embedding provides the basis for the later interpretability of TsDNA.
Owing to the symmetric roles of the nodes in the two networks, we can similarly construct a DsTNA module that outputs the other unidirectional representation $t_p \to d_x$, denoted $a(t_p, d_x)$. Thus, the final representation of the pair $d_x$ and $t_p$ is generated by concatenating the two unidirectional representations:
$e(d_x, t_p) = [a(d_x, t_p) \,\|\, a(t_p, d_x)]$
The embedded representations of all "drug-target" pairs are stacked together and denoted as the matrix E.
The matrix E is modeled by the formula $M(:, i) = \mathrm{DNN}_i(E)$ to obtain the mapping attention matrix M.
A generic DNN is used as a binary predictor to predict whether a "drug-target" pair interacts. The binary predictor comprises an input layer, i.e. the embedded representation of a "drug-target" pair, a hidden layer with ReLU as the activation function, and an output layer of two neurons with Sigmoid as the activation function. The output layer produces a probability that indicates the likelihood of a "drug-target" pair interacting. The entire NNAttNet network, including the neighbor attention weights, feature importance terms, and DNN weights, can be jointly optimized by a binary cross-entropy loss function as follows:
wherein Y is the true label of the drug-target pair; f(·) is the DNN; θ is the weight parameter of the entire network; R(·) is the L2-norm; and λ is the coefficient of the regularization term.
In this example, we evaluate the performance of the various methods by 10-fold cross-validation (CV) and use AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve) as indicators of DTI predictive performance.
In 10-fold cross-validation, we calculate the AUROC/AUPRC score of each prediction method in each fold and obtain the final AUROC/AUPRC score by averaging the scores over the 10 folds.
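The averaging step can be illustrated with a small, dependency-free Python sketch. It computes AUROC as the fraction of positive/negative pairs that are ranked correctly (ties count one half) and then averages over folds; the function names are hypothetical, and AUPRC is omitted for brevity.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auroc(folds: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average the per-fold AUROC scores, one (labels, scores) pair per fold."""
    per_fold = [auroc(y, s) for y, s in folds]
    return sum(per_fold) / len(per_fold)
```

The pairwise form is quadratic in the fold size, which is acceptable for a sketch; a rank-based formulation would be preferable for large test sets.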
In order to comprehensively evaluate the performance of the various methods, the present embodiment considers the following three scenarios for performing CV experiments.
Under CVS1, 90% of the DTPs ("drug-target" pairs) are used for training and the remaining 10% for testing in each round.
Under CVS2 (or CVS3), the interactions of 90% of the drugs (or targets) are used for training, and the interactions of the remaining 10% of the drugs (or targets) are used for testing.
CVS2 (CVS3) is a cold-start DTI prediction task because there is no overlap between the training drugs (targets) and the test drugs (targets).
Notably, CVS1 is a transductive prediction task.
CVS2/CVS3 may be a transductive or an inductive prediction task, depending on the nature of the prediction method. The experimental results of NNAttNet are shown in Tables 2, 3, and 4.
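The three scenarios differ only in what is held out: pairs, drugs, or targets. A minimal Python sketch of such a splitter is shown below; the function name `cvs_folds`, the round-robin fold assignment, and the assumption that each (drug, target) pair appears once are illustrative choices, not details from the source.

```python
import random
from typing import Iterator, List, Tuple

Pair = Tuple[int, int]  # (drug index, target index)


def cvs_folds(pairs: List[Pair], scenario: str, n_folds: int = 10,
              seed: int = 0) -> Iterator[Tuple[List[Pair], List[Pair]]]:
    """Yield (train, test) splits for CVS1 (pairs), CVS2 (drugs), CVS3 (targets)."""
    if scenario == "CVS1":
        units = list(pairs)                        # hold out individual pairs
        key = lambda p: p
    elif scenario == "CVS2":
        units = sorted({d for d, _ in pairs})      # hold out whole drugs
        key = lambda p: p[0]
    elif scenario == "CVS3":
        units = sorted({t for _, t in pairs})      # hold out whole targets
        key = lambda p: p[1]
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    random.Random(seed).shuffle(units)
    for f in range(n_folds):
        held_out = set(units[f::n_folds])          # every n_folds-th unit
        test = [p for p in pairs if key(p) in held_out]
        train = [p for p in pairs if key(p) not in held_out]
        yield train, test
```

Under CVS2 and CVS3, every pair of a held-out drug or target lands in the test set, which is what makes these scenarios cold-start settings.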
Table 2. DTI prediction performance under CVS1 on the 4 datasets
Note: ROC and PR are abbreviations for AUROC and AUPRC.
Table 3. DTI prediction performance under CVS2 on the 4 datasets
Note: ROC and PR are abbreviations for AUROC and AUPRC.
Table 4. DTI prediction performance under CVS3 on the 4 datasets
Note: ROC and PR are abbreviations for AUROC and AUPRC.
The interpretability of the prediction method of the present invention is described below on the basis of the experimental results of this embodiment.
Taking the GPCR dataset as an example, the distribution over the dictionary of drug keys from K1 to K100 was obtained by computing the two average embedded vectors of the known DTIs and the unlabeled DTPs (see FIG. 4). The markedly high embedded feature values that occur at the first n nearest neighbors indicate that drugs that interact with a particular target always find their first n nearest neighbors among the drugs that interact with the same target. This observation shows that a drug may interact with a target if it has more non-zero entries in the first n feature dimensions (keys) than non-interacting pairs do.
This embodiment also indicates which embedded features, according to the matrix M, drive the predicted interactions. Since entries with larger values in M mark important feature dimensions, the importance of each feature $f_i$ can be measured by the average of the values in column i of M, i.e. M(:, i) (see FIG. 5). The importance distribution of the keys in dictionary K shows that features of relatively high importance are typically located at the first n nearest neighbors. This observation is consistent with the visual observation described above: over the first 10 keys, the Spearman correlation is large (r = 0.8182).
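The two computations in this paragraph, column-mean importance and Spearman correlation, can be sketched in plain Python as follows. The function names are hypothetical, and Spearman's coefficient is computed here as the Pearson correlation of average ranks, which is the standard definition when ties are present.

```python
import math
from typing import List, Sequence


def column_means(M: List[List[float]]) -> List[float]:
    """Importance of feature f_i as the mean of column i of M."""
    n_rows = len(M)
    return [sum(col) / n_rows for col in zip(*M)]


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Comparing `column_means(M)` with the average embedded vector of known DTIs over the first 10 keys would reproduce the kind of check described above, although the value r = 0.8182 depends on the patent's data and cannot be recomputed here.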
This example also investigated the predictive performance of the top-k features (see FIG. 6), with k taking values in {1, 5, 10, 15, …, 220}. As k increases to 50, the predictive performance improves sharply. As k increases further, performance improves slowly and even decreases for larger k.
One reason that NNAttNet still performs well with missing labels is that it uses embedded vectors built from neighboring nodes. We studied the distribution of feature importance at different DTI missing rates (FIG. 7). The figure reveals that the distribution over feature keys shows a similar trend at the different missing rates. Moreover, the feature importance vectors at the 9 missing rates are highly correlated: the Spearman correlation coefficients between the feature importance vector at a 10% missing rate and those at the other missing rates (20%-90%) are 0.9996, 0.9993, 0.9989, 0.9979, 0.9969, 0.9943, 0.9919 and 0.9770, respectively. This high correlation indicates that, when data are missing, the feature importance network can still indicate the critical features. Thus, with missing labels, even if only a few drugs are found among the first n nearest neighbors of the drug for the target, the ordered key dictionary in the neighbor attention module can still guarantee that the queried drug is identified as interacting with the target.
The above example demonstrates the feasibility of NNAttNet: interpretability of drug-protein interactions, greater robustness when predicting with missing DTI labels, a consistent representation for transductive and inductive DTI prediction, and attention-based selection of important features for more accurate DTI prediction.
The implementation methods and common general knowledge that are well known in the above-described schemes are not described here too much. It should be noted that modifications can be made to the invention by those skilled in the art without departing from the scope of the invention, which is also to be considered as the scope of the invention, and which does not affect the practice of the invention or the utility of the patent. The protection scope of the present application shall be subject to the content of the claims, and the detailed description and the like in the specification are recited for explaining the content of the claims.
Claims (9)
1. A method for predicting drug-target interactions based on a neighbor attention network, comprising the steps of:
1) Constructing a drug-target pair interaction prediction model;
the drug-target pair interaction prediction model consists of a neighbor attention module and a deep neural network module;
2) Collecting sample data, and training the drug-target pair interaction prediction model constructed in step 1) to obtain a trained drug-target pair interaction prediction model;
the sample data includes relevant data of the drugs and the targets and the actual interactions of the drugs and the targets; the specific training process is as follows:
2.1) Using the relevant data, calculating the pairwise similarity between all drug molecules and the pairwise similarity between all target proteins, and constructing an interaction matrix A of the drug molecules and the target proteins;
wherein the relevant data comprises structural information of the drug molecules, sequence information of the target proteins, and interaction information of the drug molecules and the target proteins;
2.2) Constructing a TsDNA module from the interaction matrix A of the drug molecules and the target proteins obtained in step 2.1) and the similarity data between the drug molecules, and extracting the embedded representation of the target protein and all the drug molecules;
constructing a DsTNA module from the interaction matrix A of the drug molecules and the target proteins obtained in step 2.1) and the similarity data between the target proteins, and extracting the embedded representation of the drug molecule and all the target proteins;
wherein the embedded representation of a drug d_x and a target t_p is extracted by the TsDNA module as follows:
a1. sorting all drugs by their similarity to the drug d_x, from high to low, to obtain K_1, K_2, …, K_m;
a2. obtaining the interactions between all drugs and the target t_p, and removing the drugs that do not interact with t_p;
a3. obtaining the embedded representation a(d_x, t_p) of the drug d_x and the target t_p by the following formula:
[formula image not reproduced in the source text]
wherein K_i is an assigned key, which is a drug; K_1, K_2, …, K_m is the series of assigned keys; v_i is [definition not reproduced in the source text]; and S(·,·) is the similarity between d_x and K_i;
the embedded representation of a target t_p and a drug d_x is extracted by the DsTNA module as follows:
b1. sorting all targets by their similarity to the target t_p, from high to low, to obtain H_1, H_2, …, H_m;
b2. obtaining the interactions between all targets and the drug d_x, and removing the targets that do not interact with d_x;
b3. obtaining the embedded representation a(t_p, d_x) of the target t_p and the drug d_x by the following formula:
[formula image not reproduced in the source text]
wherein H_i is an assigned key, which is a target; H_1, H_2, …, H_m is the series of assigned keys; u_i is [definition not reproduced in the source text]; and S(·,·) is the similarity between t_p and H_i;
the embedded representation of the drug d_x and the target t_p is generated by concatenating the two directional representations:
e(d_x, t_p) = [a(d_x, t_p) || a(t_p, d_x)];
2.3) Processing the embedded representations obtained in step 2.2) with a feature importance network:
S1. performing step 2.2) for all drugs and targets, and stacking the resulting embedded representations of the drug-target pairs to obtain a matrix E;
S2. constructing a mapping attention matrix M from the matrix E obtained in step S1 through a deep neural network;
S3. constructing an attention-enhanced representation matrix from the matrix M obtained in step S2;
2.4) Inputting the attention-enhanced representation matrix obtained in step S3 into a deep neural network model as the input layer to obtain the predicted drug-target interactions;
2.5) Comparing the predicted drug-target interactions obtained in step 2.4) with the actual interactions of the drugs and the targets, and updating the weights of the model through back propagation to obtain the trained drug-target pair interaction prediction model;
3) Predicting drug-target interactions using the drug-target pair interaction prediction model trained in step 2).
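Steps a1 to a3 and b1 to b3 of claim 1 can be illustrated with a small sketch. The formula for the embedded representation is not reproduced in this text, so the sketch assumes, purely for illustration, that the embedding consists of the similarity scores of the first n remaining neighbors, zero-padded to length n. The function name `neighbor_embedding`, the parameter `n`, the exclusion of the queried entity from its own neighbor list, and the toy matrices are likewise assumptions and not the patented implementation.

```python
def neighbor_embedding(query, partner, sim, interacts, n):
    """One direction of the neighbor attention embedding (illustrative).

    sim[query][k]          similarity between the query and candidate key k
    interacts[k][partner]  1 if key k is known to interact with the partner
    """
    # a1/b1: sort candidate keys by similarity to the query, high to low
    # (assumption: the query itself is excluded from its neighbor list).
    keys = sorted(
        (k for k in range(len(sim)) if k != query),
        key=lambda k: sim[query][k],
        reverse=True,
    )
    # a2/b2: remove keys that do not interact with the partner.
    keys = [k for k in keys if interacts[k][partner]]
    # a3/b3 (assumed form): similarities of the first n keys, zero-padded.
    values = [sim[query][k] for k in keys[:n]]
    return values + [0.0] * (n - len(values))


# Toy data: 3 drugs, 2 targets (hypothetical values).
drug_sim = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.5], [0.3, 0.5, 1.0]]
target_sim = [[1.0, 0.6], [0.6, 1.0]]
A = [[0, 1], [1, 0], [1, 1]]          # rows: drugs, columns: targets
A_T = [list(col) for col in zip(*A)]  # rows: targets, columns: drugs

d_x, t_p, n = 0, 0, 2
a_dt = neighbor_embedding(d_x, t_p, drug_sim, A, n)      # drug-side neighbors
a_td = neighbor_embedding(t_p, d_x, target_sim, A_T, n)  # target-side neighbors
e = a_dt + a_td  # e(d_x, t_p) = [a(d_x, t_p) || a(t_p, d_x)]
```

The fixed length n keeps every pair embedding the same size, so the embeddings of all pairs can be stacked row by row into the matrix E of step S1.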
2. The method of predicting drug-target interactions of claim 1, wherein:
in step 2.1), the pairwise similarity between all drug molecules is calculated from the collected structural information of the drug molecules using the SIMCOMP method;
and the pairwise similarity between all target proteins is calculated from the collected sequence information of the target proteins using the Smith-Waterman algorithm.
3. The method of predicting drug-target interactions of claim 2, wherein:
the SIMCOMP method is as follows:
SIMCOMP uses a graph alignment algorithm to provide a global similarity score based on the size of the common substructure between two drug compounds, wherein the similarity s(c, c') of compounds c and c' is calculated as follows:
s(c, c') = |c ∩ c'| / |c ∪ c'|
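The kind of score described in claim 3 can be illustrated with a set-based ratio. This sketch assumes the score takes the commonly cited form of common-substructure size divided by union size, and it represents each compound as a hypothetical set of atom labels. It does not implement the graph alignment step that SIMCOMP uses to find the common substructure, and the function name `substructure_similarity` is illustrative.

```python
def substructure_similarity(c, c_prime):
    """Ratio of shared elements to all elements of two compounds given as sets."""
    union = c | c_prime
    if not union:
        return 0.0  # two empty compounds: define the similarity as zero
    return len(c & c_prime) / len(union)


# Hypothetical atom-label sets standing in for aligned compound graphs.
compound_a = {"C1", "C2", "O3", "N4"}
compound_b = {"C1", "C2", "O3", "S5"}
score = substructure_similarity(compound_a, compound_b)  # 3 shared of 5 total
```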
4. The method of predicting drug-target interactions of claim 2, wherein:
the Smith-Waterman algorithm is as follows:
the two target sequences to be aligned are defined as A = a_1 a_2 a_3 … a_n and B = b_1 b_2 b_3 … b_m, wherein n and m are the lengths of sequences A and B, respectively;
determining the parameters:
s(a, b) is the similarity score between the elements a and b that make up the sequences;
W_k is the penalty for a gap of length k;
creating a scoring matrix H of size (n+1)×(m+1) and initializing its first row and first column;
filling the remainder of the scoring matrix H from left to right and top to bottom, wherein:
H_{i,j} = max{ H_{i-1,j-1} + s(a_i, b_j), max_{k≥1}(H_{i-k,j} - W_k), max_{l≥1}(H_{i,j-l} - W_l), 0 }, 1 ≤ i ≤ n, 1 ≤ j ≤ m;
selecting the highest score in the scoring matrix H as the matching score of sequence A and sequence B, denoted SW(A, B);
the similarity of sequence A and sequence B is:
[formula image not reproduced in the source text]
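The scoring procedure of claim 4 can be sketched as follows. The match and mismatch scores, the linear gap penalty W_k = 2k, and the function names are illustrative assumptions. The final normalization SW(A, B) / sqrt(SW(A, A) · SW(B, B)) is a form often used for sequence similarity in drug-target studies and is assumed here, because the similarity formula is not reproduced in this text.

```python
from math import sqrt


def smith_waterman(a, b, match=3, mismatch=-3, gap=2):
    """Best local alignment score SW(a, b) with gap penalty W_k = gap * k."""
    n, m = len(a), len(b)
    # Scoring matrix of size (n+1) x (m+1); first row and column stay zero.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            diag = H[i - 1][j - 1] + s
            # Gap of length k ending at (i, j), vertical or horizontal.
            up = max(H[i - k][j] - gap * k for k in range(1, i + 1))
            left = max(H[i][j - l] - gap * l for l in range(1, j + 1))
            H[i][j] = max(diag, up, left, 0)
            best = max(best, H[i][j])
    return best


def normalized_similarity(a, b):
    """Assumed normalization: SW(a, b) / sqrt(SW(a, a) * SW(b, b))."""
    denom = sqrt(smith_waterman(a, a) * smith_waterman(b, b))
    return smith_waterman(a, b) / denom if denom else 0.0


# Five matches (5 * 3) and one gap of length 1 (penalty 2).
score = smith_waterman("GTTAC", "GTTGAC")
similarity = normalized_similarity("GTTAC", "GTTGAC")
```

Normalizing by the self-alignment scores maps every sequence pair into the interval from 0 to 1, which makes target similarities comparable across proteins of different lengths.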
5. The method of predicting drug-target interactions of any one of claims 1-4, wherein:
in step S2, the mapping attention matrix M is constructed as follows:
M(:, i) = DNN_i(E).
7. The method of predicting drug-target interactions of claim 6, wherein:
in step 2.4), the deep neural network model comprises an input layer, a hidden layer using ReLU as the activation function, and an output layer of two neurons using Sigmoid as the activation function.
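The layer structure named in claim 7 can be sketched as a forward pass followed by a loss value of the kind that back propagation in step 2.5) would minimize. The hidden-layer width, the weight initialization, the binary cross-entropy loss, and the two-element target encoding are assumptions for illustration; the claim itself only fixes the activation functions and the two output neurons.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation):
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(ws, x)) + b)
            for ws, b in zip(weights, biases)]


def forward(x, hidden_layer, output_layer):
    """Input layer -> ReLU hidden layer -> two sigmoid output neurons."""
    return dense(dense(x, hidden_layer, relu), output_layer, sigmoid)


def binary_cross_entropy(pred, target):
    eps = 1e-12  # guards against log(0)
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)


rng = random.Random(0)
x = [0.8, 0.3, 0.6, 0.0]                  # one attention-enhanced pair embedding
hidden_layer = init_layer(len(x), 8, rng)  # hidden width 8 is hypothetical
output_layer = init_layer(8, 2, rng)
prediction = forward(x, hidden_layer, output_layer)
# Hypothetical target encoding: [no interaction, interaction].
loss = binary_cross_entropy(prediction, [0.0, 1.0])
```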
8. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
9. An electronic device, characterized by comprising a processor and a computer-readable storage medium;
the computer-readable storage medium has a computer program stored thereon which, when executed by the processor, performs the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110759813.9A CN113421658B (en) | 2021-07-06 | 2021-07-06 | Drug-target interaction prediction method based on neighbor attention network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113421658A (en) | 2021-09-21 |
CN113421658B (en) | 2023-06-16 |
Family
ID=77721466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110759813.9A Active CN113421658B (en) | 2021-07-06 | 2021-07-06 | Drug-target interaction prediction method based on neighbor attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113421658B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114765060B (en) * | 2021-01-13 | 2023-12-08 | 四川大学 | Multi-attention method for predicting drug target interactions |
CN116246697B (en) * | 2023-05-11 | 2023-08-01 | 上海微观纪元数字科技有限公司 | Target protein prediction method and device for medicines, equipment and storage medium |
CN117912591B (en) * | 2024-03-19 | 2024-05-31 | 鲁东大学 | Kinase-drug interaction prediction method based on deep contrast learning |
CN118335201B (en) * | 2024-06-12 | 2024-10-01 | 安徽农业大学 | Prediction method based on deformable convolutional neural network and convergence similarity principle |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB0321708D0 (en) * | 2003-09-16 | 2003-10-15 | Pfizer Ltd | System and method for the computer-assisted identification of drugs and indications |
CN111243682A (en) * | 2020-01-10 | 2020-06-05 | 京东方科技集团股份有限公司 | Method, device, medium and apparatus for predicting toxicity of drug |
CN111916145A (en) * | 2020-07-24 | 2020-11-10 | 湖南大学 | Novel coronavirus target prediction and drug discovery method based on graph representation learning |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10803144B2 (en) * | 2014-05-06 | 2020-10-13 | International Business Machines Corporation | Predicting drug-drug interactions based on clinical side effects |
US11651860B2 (en) * | 2019-05-15 | 2023-05-16 | International Business Machines Corporation | Drug efficacy prediction for treatment of genetic disease |
US20210142173A1 (en) * | 2019-11-12 | 2021-05-13 | The Cleveland Clinic Foundation | Network-based deep learning technology for target identification and drug repurposing |
CN111785320B (en) * | 2020-06-28 | 2024-02-06 | 西安电子科技大学 | Drug target interaction prediction method based on multi-layer network representation learning |
Non-Patent Citations (3)
Title |
---|
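Drug-target relationship prediction based on grouped Bayesian ranking; Ding Qiliang; Shi Zezhi; Li Jianhua; Computer Engineering and Applications (15); full text * |
Drug-protein relationship prediction based on a deep attention model; Zhao Qichang; China Master's Theses Full-text Database (No. 2); full text * |
Application of drug target prediction techniques in network pharmacology of traditional Chinese medicine; Wu Chunwei; Lu Li; Liang Shengwang; Chen Chao; Wang Shumei; China Journal of Chinese Materia Medica (03); full text * |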
Also Published As
Publication number | Publication date |
---|---|
CN113421658A (en) | 2021-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113421658B (en) | Drug-target interaction prediction method based on neighbor attention network | |
Cao et al. | Deep learning-based remote and social sensing data fusion for urban region function recognition | |
Lanchantin et al. | Deep motif dashboard: visualizing and understanding genomic sequences using deep neural networks | |
Su et al. | A visualized bibliometric analysis of mapping research trends of machine learning in engineering (MLE) | |
Zhou et al. | CNNH_PSS: protein 8-class secondary structure prediction by convolutional neural network with highway | |
Zhang et al. | Computational prediction of conformational B-cell epitopes from antigen primary structures by ensemble learning | |
Chu et al. | Hierarchical graph representation learning for the prediction of drug-target binding affinity | |
Jiang et al. | Predicting protein function by multi-label correlated semi-supervised learning | |
Shoombuatong et al. | Towards the revival of interpretable QSAR models | |
Lin et al. | Clustering methods in protein-protein interaction network | |
Srihari et al. | Computational prediction of protein complexes from protein interaction networks | |
CN116206688A (en) | Multi-mode information fusion model and method for DTA prediction | |
Qiu et al. | BOW-GBDT: a GBDT classifier combining with artificial neural network for identifying GPCR–drug interaction based on wordbook learning from sequences | |
CN112652355A (en) | Medicine-target relation prediction method based on deep forest and PU learning | |
Yan et al. | Image retrieval for structure-from-motion via graph convolutional network | |
Wang et al. | A novel matrix of sequence descriptors for predicting protein-protein interactions from amino acid sequences | |
CN118038995B (en) | Method and system for predicting small open reading window coding polypeptide capacity in non-coding RNA | |
Yang et al. | Graph Contrastive Learning for Clustering of Multi-layer Networks | |
Tian et al. | GTAMP-DTA: Graph transformer combined with attention mechanism for drug-target binding affinity prediction | |
Enireddy et al. | OneHotEncoding and LSTM-based deep learning models for protein secondary structure prediction | |
Halsana et al. | DensePPI: A Novel Image-Based Deep Learning Method for Prediction of Protein–Protein Interactions | |
Zhang et al. | MLLBC: A machine learning toolbox for modeling the loss rate of the lining bearing capacity | |
Yu et al. | A supervised approach to detect protein complex by combining biological and topological properties | |
Thareja et al. | Applications of deep learning models in bioinformatics | |
Pap et al. | Depthwise Convolutions using Physicochemical Features of DNA for Transcription Factor Binding Site Classification: Physicochemical Features for DNA-Protein Classification with Depthwise Convolutions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2024-03-04; Address after: Room 21501, Unit 2, Building 1, Oak Tree Constellation, Keji Fifth Road, High tech Zone, Xi'an City, Shaanxi Province, 710065; Patentee after: Shaanxi Exquisite Technology Development Co.,Ltd.; Country or region after: China; Address before: 710072 No. 127 Youyi West Road, Shaanxi, Xi'an; Patentee before: Northwestern Polytechnical University; Country or region before: China |