CN113421658B - Drug-target interaction prediction method based on neighbor attention network - Google Patents

Drug-target interaction prediction method based on neighbor attention network

Info

Publication number
CN113421658B
Authority
CN
China
Prior art keywords
drug
target
matrix
similarity
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110759813.9A
Other languages
Chinese (zh)
Other versions
CN113421658A (en)
Inventor
施建宇
赵鹏程
徐意
朱蓓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Exquisite Technology Development Co ltd
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202110759813.9A
Publication of CN113421658A
Application granted
Publication of CN113421658B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Toxicology (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a drug-target interaction (DTI) prediction method based on a neighbor attention network. The prediction model is a neighbor attention network (NNAttNet), which addresses the shortcomings of existing methods by constructing neighbor-based embedded representations of drug-target pairs (DTPs). In addition, NNAttNet provides attention-based selection of key features so as to predict DTIs accurately, and its evaluation on benchmark datasets shows that NNAttNet achieves better DTI prediction performance.

Description

Drug-target interaction prediction method based on neighbor attention network
Technical Field
The invention belongs to the technical field of computer-aided drug research and development, and particularly relates to a drug-target interaction prediction method based on a neighbor attention network.
Background
The shift in the drug discovery paradigm from "one drug, one target" to "multiple drugs, multiple targets" reveals the links between drugs and targets and facilitates the discovery of potential drug-target interactions (DTIs), which is a fundamental task in drug development. However, determining DTIs by biological experiments is time-consuming and laborious.
In recent years, as more and more DTI data have been generated, various databases have been developed. This accumulation has prompted the application of computational methods, particularly machine-learning-based methods, which show good predictive performance in finding potential DTIs. However, although researchers have made great efforts and significant progress in DTI prediction, challenges remain in practice, mainly the following:
1) Lack of interpretability: the embedded representations give an insufficient description of the DTI prediction mechanism;
2) The predictive model is very sensitive to missing labels;
3) Interactions involving new compound molecules or new proteins are difficult to predict.
In view of this, there is a need to develop a new approach to predicting drug-target interactions.
Disclosure of Invention
The invention aims to remedy the deficiencies of the prior art and provides a drug-target interaction prediction method based on a neighbor attention network.
To this end, the present invention first studied and analyzed in depth the problems existing in the prior art, and found that:
1. Lack of interpretability: the embedded representations give an insufficient description of the DTI prediction mechanism. This is mainly because the drug/target embedded representations learned by existing deep learning (DL) or matrix factorization (MF) methods are difficult to interpret. The resulting latent space offers no easy way to indicate how these properties affect interactions, and its black-box nature prevents direct guidance of drug design.
2. The predictive model is very sensitive to missing labels. This is mainly because, in practice, the labels collected for drug-target pairs are incomplete; existing methods rarely take the missing interaction labels between drug-target pairs into account and do not consider whether the missing interactions would help DTI prediction.
3. It is difficult to predict interactions involving new compound molecules or proteins. At present, two prediction modes are mainly adopted, namely transductive prediction and inductive prediction.
The task of transductive prediction is to construct a function mapping $F: D \times T \to [0,1]$ to infer potential interactions between unlabeled drug-target pairs, where the features or similarities of the drugs and targets are used to learn $F$. Inductive learning corresponds to the well-known cold-start problem in recommender systems, and the task of inductive prediction is likewise to learn a function mapping $F: D \times T \to [0,1]$. However, it can infer potential interactions between a new drug molecule $d_x$ and a new target protein $t_y$, or between $D$ and $t_y$, and the features or similarities of $d_x$ and $t_y$ are learned in $F$. Almost all current similarity-based DTI prediction approaches, however, are transductive: they extract topological embedded features from DTI networks or similarity matrices, and the training phase uses both labeled training samples and unlabeled test samples. Consequently, when the labels of new samples are determined in practice, the model must be trained again, which cannot meet the current requirements of drug development.
Therefore, in order to achieve the above object, the technical solution provided by the present invention is:
The drug-target interaction prediction method based on the neighbor attention network is characterized by comprising the following steps:
1) Construct a drug-target pair interaction prediction model
The drug-target pair interaction prediction model consists of a neighbor attention module and a deep neural network module;
2) Collect sample data, and train the drug-target pair interaction prediction model constructed in step 1) to obtain a trained drug-target pair interaction prediction model;
The sample data include the relevant data of the drugs and targets and the actual interactions between the drugs and targets; the specific training process is as follows:
2.1) Using the relevant data, calculate the pairwise similarity between all drug molecules and the pairwise similarity between all target proteins, and construct the interaction matrix A of the drug molecules and the target proteins;
where the relevant data comprise the structural information of the drug molecules, the sequence information of the target proteins, and the interaction information between the drug molecules and the target proteins;
2.2) Construct a TsDNA module using the interaction matrix A obtained in step 2.1) and the similarity data between drug molecules, and extract the embedded representation of each target protein with respect to all drug molecules, i.e. a feature vector representing whether they are connected;
and/or
Construct a DsTNA module using the interaction matrix A obtained in step 2.1) and the similarity data between target proteins, and extract the embedded representation of each drug molecule with respect to all target proteins, i.e. a feature vector representing whether they are connected;
where the embedded representation of a drug $d_x$ with respect to a target $t_p$ is extracted by the TsDNA module as follows:
a1. Sort all drugs in descending order of their similarity to the drug $d_x$ to obtain the keys $K_1, K_2, \ldots, K_m$;
a2. Obtain the interactions between all drugs and the target $t_p$, and remove the drugs that do not interact with $t_p$;
a3. Obtain the embedded representation $a(d_x, t_p)$ of the drug $d_x$ with respect to the target $t_p$ by the following formula:
Figure GDA0004219997810000041
where each assigned key is a drug; the assigned keys form a series; $v_i$ is the value of the $i$-th assigned key; and $S(\cdot,\cdot)$ is the similarity between $d_x$ and the $i$-th assigned key;
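Steps a1 to a3 can be sketched in a few lines of Python. The formula itself is an image in the source, so this is only one reading that is consistent with the definitions given: each assigned key contributes the similarity $S(d_x, K_i)$ in its own slot, roughly $a(d_x, t_p) = \sum_i S(d_x, K_i)\, v_i$ over the assigned keys, with $v_i$ one-hot-like. The function name `tsdna_embedding`, the argument names, and the choice to exclude $d_x$ from its own neighbor list are assumptions made for illustration.

```python
def tsdna_embedding(x, p, drug_sim, A):
    """Sketch of a(d_x, t_p): one slot per neighbor key of drug x.

    drug_sim[i][j]: similarity between drugs i and j.
    A[i][j]: 1 if drug i is known to interact with target j, else 0.
    """
    # a1: rank the other drugs by similarity to d_x, most similar first.
    # (Leaving d_x out of its own neighbor list is an assumption.)
    neighbors = [i for i in range(len(drug_sim)) if i != x]
    neighbors.sort(key=lambda i: drug_sim[x][i], reverse=True)
    # a2 + a3: a key is "assigned" only if its drug interacts with t_p.
    # Assigned slots carry S(d_x, key); unassigned slots stay 0 (sparse vector).
    return [drug_sim[x][i] if A[i][p] == 1 else 0.0 for i in neighbors]


# Toy data: 4 drugs, 1 target; drugs 1 and 2 are known to hit target 0.
drug_sim = [
    [1.0, 0.9, 0.2, 0.6],
    [0.9, 1.0, 0.3, 0.5],
    [0.2, 0.3, 1.0, 0.4],
    [0.6, 0.5, 0.4, 1.0],
]
A = [[0], [1], [1], [0]]
print(tsdna_embedding(0, 0, drug_sim, A))  # [0.9, 0.0, 0.2]
```

In the toy output, the first slot is non-zero because the nearest neighbor of drug 0 interacts with the target, which is the pattern the method later relies on for interpretation.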
The embedded representation of a target $t_p$ with respect to a drug $d_x$ is extracted by the DsTNA module as follows:
b1. Sort all targets in descending order of their similarity to the target $t_p$ to obtain the keys $H_1, H_2, \ldots, H_m$;
b2. Obtain the interactions between all targets and the drug $d_x$, and remove the targets that do not interact with $d_x$;
b3. Obtain the embedded representation $a(t_p, d_x)$ of the target $t_p$ with respect to the drug $d_x$ by the following formula:
Figure GDA0004219997810000046
where each assigned key is a target; the assigned keys form a series; $u_i$ is the value of the $i$-th assigned key; and $S(\cdot,\cdot)$ is the similarity between $t_p$ and the $i$-th assigned key;
The representation of the pair of drug $d_x$ and target $t_p$ is generated by concatenating the two unidirectional representations:
$e(d_x, t_p) = [a(d_x, t_p) \,\|\, a(t_p, d_x)]$;
When extracting the embedded representation of a new drug, DsTNA cannot be constructed for it (it is new data); therefore, to keep the data balanced, only TsDNA is constructed when building both the test set and the training set;
When extracting the embedded representation of a new target, TsDNA cannot be constructed for it (it is new data); therefore, to keep the data balanced, only DsTNA is constructed when building both the test set and the training set;
that is, $e(d_x, t_p) = [a(d_x, t_p)]$ or $e(d_x, t_p) = [a(t_p, d_x)]$;
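The concatenation and its two cold-start variants can be written as one small helper. This is a minimal sketch: `pair_embedding` and its argument names are hypothetical, and the two input vectors are assumed to have been produced beforehand by the TsDNA and DsTNA modules.

```python
def pair_embedding(a_dt=None, a_td=None):
    """Concatenate the available unidirectional representations of one pair.

    a_dt: a(d_x, t_p) from TsDNA, or None when only DsTNA is built (new target).
    a_td: a(t_p, d_x) from DsTNA, or None when only TsDNA is built (new drug).
    """
    parts = [v for v in (a_dt, a_td) if v is not None]
    if not parts:
        raise ValueError("at least one unidirectional representation is required")
    return [value for part in parts for value in part]


both = pair_embedding([0.9, 0.0], [0.0, 0.7])   # known drug and known target
new_drug = pair_embedding(a_dt=[0.9, 0.0])      # new drug: TsDNA only
print(both, new_drug)
```

Whichever variant is chosen has to be used for both the training set and the test set, as stated above, so that all rows of the stacked matrix have the same width.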
2.3) Process the embedded representations obtained in step 2.2) with a feature importance network:
S1. Perform step 2.2) for all drugs and targets, and stack the resulting embedded representations of the drug-target pairs to obtain a matrix $E$;
S2. Construct a mapping attention matrix $M$ from the matrix $E$ obtained in step S1 through a deep neural network;
S3. Construct an attention-enhanced representation matrix from the matrix $M$ obtained in step S2
Figure GDA0004219997810000053
to facilitate identification;
2.4) Input the attention-enhanced representation matrix obtained in step 2.3)
Figure GDA0004219997810000054
into a deep neural network model as the input layer to obtain the predicted drug-target interactions;
2.5) Compare the predicted drug-target interactions obtained in step 2.4) with the actual interactions between the drugs and targets, and update the weights of the model through back propagation to obtain the trained drug-target pair interaction prediction model.
That is, the prediction model is an interpretable deep-learning-based model, NNAttNet, which comprises three modules: a neighbor attention module, a feature importance network, and a multi-layer deep neural network. For drug-target pairs, the first module generates interpretable embedded representations that are more robust to missing labels in the training data and are usable in both transductive and inductive prediction scenarios. In addition, the algorithm accepts not only feature inputs but also similarity inputs. The second module, the feature importance network, operates on the embeddings built by the neighbor attention module and indicates the importance of each dimension of the embedded features, providing interpretable feature selection. The last module determines whether a drug-target pair is a potential DTI.
3) Predict interactions using the drug-target pair interaction prediction model trained in step 2).
Further, in step 2.1), the pairwise similarity between all drug molecules is calculated from the collected structural information of the drug molecules using the SIMCOMP method;
and the pairwise similarity between all target proteins is calculated from the collected sequence information of the target proteins using the Smith-Waterman algorithm.
Further, the SIMCOMP method is specifically as follows:
SIMCOMP provides a global similarity score based on the common substructure size between two drug compounds using a graph alignment algorithm, where the similarity s (c, c ') of compounds c and c' is calculated as follows:
Figure GDA0004219997810000061
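The formula above survives only as an image placeholder. The form in which the SIMCOMP score is commonly cited in the DTI literature, the size of the common substructure divided by the size of the union, is given here as an assumption about what the figure shows:

```latex
s(c, c') = \frac{|c \cap c'|}{|c \cup c'|}
```

Under this form the score lies in $[0, 1]$, and $s(c, c) = 1$ for identical compounds.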
Further, the Smith-Waterman algorithm is specifically as follows:
The two target sequences to be aligned are defined as $A = a_1 a_2 a_3 \ldots a_n$ and $B = b_1 b_2 b_3 \ldots b_m$, where $n$ and $m$ are the lengths of sequences $A$ and $B$, respectively;
Determine the parameters:
$s$ is the similarity score between the elements that make up the sequences;
$W_k$ is the penalty for a gap of length $k$;
Create a scoring matrix $H$ of size $(n+1) \times (m+1)$ and initialize its first row and first column;
Fill the remainder of the scoring matrix $H$ by scoring from left to right and from top to bottom, where:
Figure GDA0004219997810000071
The highest score in the scoring matrix $H$ is the matching score of sequence $A$ and sequence $B$, denoted $SW(A, B)$.
Similarity of sequence $A$ and sequence $B$:
Figure GDA0004219997810000072
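The scoring procedure can be illustrated with a short Python sketch. Both formulas are images in the source, so two things here are assumptions: the gap penalty is taken to be linear ($W_k = k \cdot \text{gap}$), which lets the recurrence look only at the three adjacent cells, and the final similarity uses the normalization $SW(A,B) / \sqrt{SW(A,A)\,SW(B,B)}$ that is common for this kind of benchmark. The match, mismatch, and gap values are illustrative only; protein alignment would normally use a substitution matrix.

```python
import math


def smith_waterman(a, b, match=2, mismatch=-1, gap=1):
    """Best local-alignment score with a linear gap penalty W_k = gap * k."""
    n, m = len(a), len(b)
    # (n+1) x (m+1) scoring matrix; first row and first column are set to 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):          # top to bottom
        for j in range(1, m + 1):      # left to right
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                     # start a new local alignment
                H[i - 1][j - 1] + s,   # align a_i with b_j
                H[i - 1][j] - gap,     # gap in B
                H[i][j - 1] - gap,     # gap in A
            )
            best = max(best, H[i][j])
    return best


def normalized_similarity(a, b, **kwargs):
    """SW(A,B) / sqrt(SW(A,A) * SW(B,B)); the normalization is an assumption."""
    return smith_waterman(a, b, **kwargs) / math.sqrt(
        smith_waterman(a, a, **kwargs) * smith_waterman(b, b, **kwargs)
    )


print(smith_waterman("ACGT", "AGT"))          # 5
print(normalized_similarity("ACGT", "ACGT"))  # 1.0
```

The normalization makes the self-similarity of every sequence equal to 1, so scores of long and short proteins become comparable.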
Further, in step S2), the mapping attention matrix $M$ is as follows:
$M(:, i) = \mathrm{DNN}_i(E)$.
Further, in step S3), the attention-enhanced representation matrix
Figure GDA0004219997810000073
is as follows:
Figure GDA0004219997810000074
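The following sketch shows one way steps S2 and S3 could fit together. It rests on two assumptions that the source does not confirm: each $\mathrm{DNN}_i$ is replaced by a single sigmoid unit that reads a whole row of $E$ and outputs the entry of column $i$ of $M$, and the attention-enhanced matrix (whose formula is an image in the source) is taken to be the element-wise product of $M$ and $E$. All function names are hypothetical.

```python
import math
import random


def make_feature_importance_net(n_features, seed=0):
    """One tiny stand-in for each DNN_i: a single sigmoid unit over the whole row."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
               for _ in range(n_features)]
    biases = [0.0] * n_features
    return weights, biases


def attention_matrix(E, net):
    """M[n][i] = DNN_i(E[n]); every entry lies strictly between 0 and 1."""
    weights, biases = net
    M = []
    for row in E:
        m_row = []
        for w_i, b_i in zip(weights, biases):
            z = sum(w * x for w, x in zip(w_i, row)) + b_i
            m_row.append(1.0 / (1.0 + math.exp(-z)))
        M.append(m_row)
    return M


def enhance(E, M):
    """Assumed form of the attention-enhanced matrix: element-wise product of M and E."""
    return [[m * e for m, e in zip(m_row, e_row)] for m_row, e_row in zip(M, E)]


E = [[0.9, 0.0, 0.2], [0.0, 0.0, 0.5]]   # two stacked pair embeddings
net = make_feature_importance_net(3)
M = attention_matrix(E, net)
E_enhanced = enhance(E, M)
```

With an element-wise product, zero entries of $E$ stay zero, so the sparsity pattern produced by the neighbor attention module is preserved while the non-zero entries are rescaled by their learned importance.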
Further, in step 2.4), the deep neural network model includes an input layer, a hidden layer using ReLU as the activation function, and a two-neuron output layer using Sigmoid as the activation function. The deep neural network model acts as a binary predictor, with the output layer producing a probability that represents the likelihood of a drug-target pair interaction. The entire NNAttNet network, with its neighbor attention weights, feature importance terms, and DNN weights, can be jointly optimized with a binary cross-entropy loss function as follows:
Figure GDA0004219997810000081
where $Y$ is the true label of the drug-target pair; $f(\cdot)$ is the DNN; $\theta$ is the weight parameter of the entire network; $r(\cdot)$ is the L2 norm; and $\lambda$ is the coefficient of the regularization term.
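The loss described by these symbols can be sketched as a binary cross-entropy term plus an L2 penalty, roughly $-\sum [Y \log f + (1 - Y)\log(1 - f)] + \lambda\, r(\theta)$. The exact formula is an image in the source, so summing (rather than averaging) over pairs and using the squared L2 norm are assumptions, as are the function and argument names.

```python
import math


def bce_with_l2(y_true, y_prob, params, lam=1e-3, eps=1e-12):
    """Binary cross-entropy summed over pairs plus lam * squared L2 norm of params.

    y_true: true labels Y (0 or 1); y_prob: predicted probabilities f(.);
    params: flat list of network weights theta; lam: regularization coefficient.
    """
    bce = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        bce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    l2 = sum(w * w for w in params)
    return bce + lam * l2


loss = bce_with_l2([1, 0, 1], [0.9, 0.2, 0.6], params=[0.5, -0.3], lam=0.01)
print(round(loss, 4))
```

In a real training loop this quantity would be minimized by back propagation over all three modules at once, which is what "jointly optimized" refers to.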
The present invention also provides a computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the above method.
The present invention also provides an electronic device, characterized by comprising a processor and a computer-readable storage medium;
the computer-readable storage medium stores a computer program which, when executed by the processor, performs the steps of the above method.
The advantages of the invention are as follows:
The invention provides a prediction method based on deep learning in which the prediction model is a neighbor attention network (NNAttNet), which addresses the problems identified above by constructing neighbor-based embedded representations of drug-target pairs (DTPs). In addition, NNAttNet provides attention-based selection of key features so as to predict DTIs accurately, and its evaluation on benchmark datasets shows that NNAttNet achieves better DTI prediction performance.
Drawings
FIG. 1 is a general architecture of the proposed method NNAttNet;
FIG. 2 is a basic block diagram of a TsDNA module;
FIG. 3 is a schematic construction diagram of a feature importance matrix;
FIG. 4 is the arrangement and distribution of the drug key embedding features;
FIG. 5 is the arrangement and distribution of the importance of the drug features;
FIG. 6 is a graph of the prediction performance of the top-k features;
FIG. 7 is the distribution of feature importance at different DTI missing rates.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and specific examples:
One embodiment of the drug-target interaction prediction method based on a neighbor attention network according to the present invention is as follows:
This embodiment includes three parts:
In the first part, when the test set is constructed, a portion of the edges in the network are deleted and then predicted.
In the second part, some drugs and all of their edges are deleted from the network to simulate the scenario of new-drug prediction.
In the third part, some targets and all of their edges are deleted from the network to simulate the scenario of new-target prediction.
When collecting and presenting the results, we report the performance metrics of the overall results, without presenting results for any specific drug (target).
This example uses a benchmark dataset in the prediction performance comparison experiments. The targets are divided into 4 subsets according to protein family: enzymes (En), ion channels (IC), G-protein-coupled receptors (GPCR), and nuclear receptors (NR). Each subset includes the known drug-target interactions, the pairwise similarities between drugs, and the pairwise similarities between targets. The pairwise similarities between drugs are calculated by the SIMCOMP algorithm, and the pairwise similarities between targets are calculated by the Smith-Waterman algorithm. The details of this dataset are shown in Table 1.
TABLE 1 details of reference data sets
Figure GDA0004219997810000101
For each data subset, a dictionary K of virtual keys is set up and assigned values using the collected data, and a TsDNA module and a DsTNA module are constructed between the drug molecules and the target proteins, finally yielding the embedded representations of all drug-target pairs.
TsDNA (FIG. 2) consists of a dictionary K of virtual keys and values v describing these virtual keys. In the dictionary, the virtual keys are ordered by semantic neighborhood: the first key is the nearest neighbor, the second key is the second nearest neighbor, and the last key is the farthest neighbor. Notably, this ordering is what makes it possible to explain the classification difference between known DTIs and unknown DTIs.
When considering whether a drug $d_x$ interacts with a target protein $t_p$, the empty keys are assigned with the other drugs known to interact with the target $t_p$. Conversely, no operation is performed for the drugs that do not interact with $t_p$. The attention of $d_x$ to $t_p$ can be defined as:
Figure GDA0004219997810000111
Here, each assigned key is a drug; the assigned keys form a series; $v_i$ is the value of the $i$-th assigned key; and $S(\cdot,\cdot)$ is the similarity between $d_x$ and the $i$-th assigned key. Note that virtual keys are featureless and acquire features only after being assigned. This attention gives the unidirectional representation $d_x \to t_p$.
To enhance the interpretability of the TsDNA module, we set $V = [v_1, v_2, \ldots, v_{|K|}]$ to be a diagonal matrix, in which $v_i$ is a one-hot-like vector, that is, the $i$-th element of $v_i$ is non-zero and all other elements are 0. In this way, the representation $d_x \to t_p$ is sparse. Considering the widely accepted assumption that similar drugs tend to interact with the same target proteins, it is hypothesized that if drug $d_x$ and target $t_p$ interact, their representation has more non-zero values in its first several feature dimensions than that of a drug and target that do not interact. In other words, when $d_x$ and $t_p$ interact, the other drugs that interact with $t_p$ are generally among the nearest neighbors of $d_x$. This sparse attention embedding is the basis for the interpretability of TsDNA discussed later.
Because the nodes of the two networks play symmetric roles, we can similarly construct a DsTNA module that outputs the other unidirectional representation $t_p \to d_x$, expressed as $a(t_p, d_x)$. The final representation of the pair $d_x$ and $t_p$ is then generated by concatenating the two unidirectional representations:
e(d x ,t p )=[a(d x ,t p )||a(t p ,d x )]
The embedded representations of all drug-target pairs are stacked together to form the matrix $E$.
The mapping attention matrix $M$ is obtained from the matrix $E$ by the formula $M(:, i) = \mathrm{DNN}_i(E)$.
By the formula
Figure GDA0004219997810000121
the attention-enhanced representation matrix is constructed:
Figure GDA0004219997810000122
A generic DNN is used as a binary predictor to predict whether a drug-target pair interacts. The binary predictor comprises an input layer, which receives the embedded representation of a drug-target pair, a hidden layer with ReLU as the activation function, and a two-neuron output layer with Sigmoid as the activation function. The output layer produces a probability that indicates the likelihood of a drug-target pair interaction. The entire NNAttNet network, with its neighbor attention weights, feature importance terms, and DNN weights, can be jointly optimized with a binary cross-entropy loss function as follows:
Figure GDA0004219997810000123
where $Y$ is the true label of the drug-target pair; $f(\cdot)$ is the DNN; $\theta$ is the weight parameter of the entire network; $r(\cdot)$ is the L2 norm; and $\lambda$ is the coefficient of the regularization term.
In this example we evaluate the performance of the various methods by 10-fold cross-validation (CV) and use AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve) as the indicators of DTI prediction performance.
In 10-fold cross-validation, we calculate the AUROC/AUPRC score of each prediction method in each round and obtain the final AUROC/AUPRC score as the average over the 10 rounds.
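For readers unfamiliar with the metric, AUROC can be computed directly from its rank interpretation: the probability that a randomly chosen interacting pair receives a higher score than a randomly chosen non-interacting pair. The sketch below uses that definition and then averages over folds; the function names and the toy numbers are illustrative and are not results from the patent.

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_over_folds(fold_scores):
    """Final score: plain average of the per-fold scores."""
    return sum(fold_scores) / len(fold_scores)


print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a production evaluation would normally use a sorted-rank implementation from an established library.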
In order to evaluate the performance of the various methods comprehensively, this embodiment considers the following three scenarios for the CV experiments.
Under CVS1, 90% of the drug-target pairs (DTPs) are used for training and the remaining 10% are used for testing in each round.
Under CVS2 (or CVS3), the interactions of 90% of the drugs (or targets) are used for training, and the interactions of the remaining 10% of the drugs (or targets) are used for testing.
CVS2 (CVS3) is a cold-start DTI prediction because there is no overlap between the training drugs (targets) and the test drugs (targets).
Notably, CVS1 is a transductive prediction task.
CVS2/CVS3 may be a transductive or an inductive prediction task, depending on the nature of the prediction method. The experimental results of NNAttNet are shown in Tables 2, 3, and 4.
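The difference between the three scenarios is only in what gets held out: individual pairs (CVS1), whole drugs (CVS2), or whole targets (CVS3). A minimal sketch of such splits follows; the function names, the `by` parameter, and the toy sizes are assumptions for illustration.

```python
import random


def cv_folds(items, n_folds=10, seed=0):
    """Shuffle the items and deal them into n_folds disjoint test folds."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return [items[k::n_folds] for k in range(n_folds)]


def split_pairs(pairs, test_fold, by="pair"):
    """by='pair' -> CVS1, by='drug' -> CVS2, by='target' -> CVS3."""
    held_out = set(test_fold)
    if by == "pair":
        in_test = lambda d, t: (d, t) in held_out
    elif by == "drug":
        in_test = lambda d, t: d in held_out
    elif by == "target":
        in_test = lambda d, t: t in held_out
    else:
        raise ValueError("by must be 'pair', 'drug' or 'target'")
    train = [(d, t) for d, t in pairs if not in_test(d, t)]
    test = [(d, t) for d, t in pairs if in_test(d, t)]
    return train, test


# Toy setting: 20 drugs x 5 targets; hold out 10% of the drugs (CVS2).
pairs = [(d, t) for d in range(20) for t in range(5)]
drug_folds = cv_folds(range(20))
train, test = split_pairs(pairs, drug_folds[0], by="drug")
print(len(train), len(test))  # 90 10
```

Under `by="drug"`, no drug in the test split appears in the training split, which is exactly the cold-start property that distinguishes CVS2 from CVS1.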
Table 2. DTI prediction performance under CVS1 on the 4 datasets
Figure GDA0004219997810000131
Note: ROC and PR are abbreviations for AUROC and AUPRC.
Table 3. DTI prediction performance under CVS2 on the 4 datasets
Figure GDA0004219997810000132
Note: ROC and PR are abbreviations for AUROC and AUPRC.
Table 4. DTI prediction performance under CVS3 on the 4 datasets
Figure GDA0004219997810000141
Note: ROC and PR are abbreviations for AUROC and AUPRC.
The interpretability of the prediction method of the present invention is described below on the basis of the experimental results of this embodiment.
Taking the GPCR dataset as an example, the distribution over the drug-key dictionary from $K_1$ to $K_{100}$ was obtained by computing the two average embedded vectors of the known DTIs and of the unlabeled DTPs (see FIG. 4). The markedly high embedded feature values that occur in the first n nearest neighbors indicate that a drug interacting with a particular target always finds its first n nearest neighbors among the drugs that interact with the same target. This observation shows that a drug is likely to interact with a target if its representation has more non-zero entries in the first n feature dimensions (keys) than that of a non-interacting pair.
This embodiment also indicates which embedded features, according to the matrix $M$, cause interactions to occur. Since entries of $M$ with larger values mark important feature dimensions, the importance of each feature $f_i$ can be measured by the average of the values in column $i$ of $M$, i.e. of $M(:, i)$ (see FIG. 5). The importance distribution of the keys in dictionary K shows that features of relatively high importance are typically located in the first n nearest neighbors. This observation is consistent with the visual observation described above, with a large Spearman correlation over the first 10 keys (r = 0.8182).
This example investigated the prediction performance of the top-k features (see FIG. 6), with k taking values in {1, 5, 10, 15, …, 220}. As k increases to 50, the prediction performance improves sharply. As k increases further, performance improves only slowly and even declines for larger k.
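Selecting the top-k features amounts to ranking the feature dimensions by importance and keeping only the corresponding columns of the embedding matrix before retraining and re-scoring the predictor. A minimal sketch, with hypothetical function names and toy values:

```python
def top_k_features(importance, k):
    """Indices of the k most important features, most important first."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return order[:k]


def select_columns(E, cols):
    """Keep only the chosen feature columns of the embedding matrix E."""
    return [[row[c] for c in cols] for row in E]


importance = [0.1, 0.7, 0.4]
cols = top_k_features(importance, 2)
print(cols)                                           # [1, 2]
print(select_columns([[1, 2, 3], [4, 5, 6]], cols))   # [[2, 3], [5, 6]]
```

Repeating this for each k in the grid and recording the cross-validated score for each reduced matrix would give a curve of the kind shown in FIG. 6.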
One reason NNAttNet still performs well with missing labels is that it uses embedded vectors built from neighboring nodes. We studied the distribution of feature importance at different DTI missing rates (FIG. 7). The figure reveals that the distribution over the feature keys shows a similar trend at the different missing rates. Moreover, the feature importance vectors at the 9 missing rates are highly correlated: the Spearman correlation coefficients between the feature importance vector at a 10% missing rate and those at the other missing rates (20%-90%) are 0.9996, 0.9993, 0.9989, 0.9979, 0.9969, 0.9943, 0.9919 and 0.9770, respectively. This high correlation indicates that the feature importance network can still identify the critical features when data are missing. Thus, when labels are missing, even if only a few drugs that interact with the target are found among the first n nearest neighbors of the queried drug, the ordered key dictionary in the neighbor attention module can still support the prediction that the queried drug interacts with the target.
The above embodiment demonstrates the feasibility of NNAttNet: it offers interpretability of drug-protein interactions, greater robustness when predicting with missing DTI labels, a consistent representation for transductive and inductive DTI prediction, and attention-based selection of important features for more accurate DTI prediction.
Implementation methods and common general knowledge that are well known in the above schemes are not described in detail here. It should be noted that those skilled in the art can make modifications to the invention without departing from its scope; such modifications are also to be considered within the scope of protection of the invention and do not affect the practice of the invention or the utility of the patent. The scope of protection of the present application shall be determined by the content of the claims, and the detailed description and other parts of the specification serve to explain the content of the claims.

Claims (9)

1. A method for predicting drug-target interactions based on a neighbor attention network, comprising the following steps:
1) Construct a drug-target pair interaction prediction model
The drug-target pair interaction prediction model consists of a neighbor attention module and a deep neural network module;
2) Collect sample data, and train the drug-target pair interaction prediction model constructed in step 1) to obtain a trained drug-target pair interaction prediction model;
The sample data include the relevant data of the drugs and targets and the actual interactions between the drugs and targets; the specific training process is as follows:
2.1) Using the relevant data, calculate the pairwise similarity between all drug molecules and the pairwise similarity between all target proteins, and construct the interaction matrix A of the drug molecules and the target proteins;
where the relevant data comprise the structural information of the drug molecules, the sequence information of the target proteins, and the interaction information between the drug molecules and the target proteins;
2.2) Construct a TsDNA module using the interaction matrix A obtained in step 2.1) and the similarity data between drug molecules, and extract the embedded representation of each target protein with respect to all drug molecules;
Construct a DsTNA module using the interaction matrix A obtained in step 2.1) and the similarity data between target proteins, and extract the embedded representation of each drug molecule with respect to all target proteins;
wherein a drug d is extracted by a TsDNA module x And a target t p The extraction process is as follows:
a1. according to all drugs and said drug d x Sequencing all medicines in the sequence from high similarity to low similarity to obtain K 1 、K 2 、…K m
a2. Obtaining all drugs and targets t p Removing non-interacting drugs;
a3. obtaining the drug d x And the target t p The formula is as follows:
Figure FDA0003149130970000021
wherein ,
Figure FDA0003149130970000022
is assigned bond, which is a drug +.>
Figure FDA0003149130970000023
Is a series of assigned keys, v i Is that
Figure FDA0003149130970000024
S (·, ·) is d x and />
Figure FDA0003149130970000025
Similarity of (2);
the DsTNA module extracts the embedded representation of a target t_p and a drug d_x as follows:
b1. sorting all targets in descending order of their similarity to the target t_p to obtain H_1, H_2, …, H_m;
b2. obtaining the interactions between all targets and the drug d_x, and removing the non-interacting targets;
b3. obtaining the embedded representation a(t_p, d_x) of the target t_p and the drug d_x by the following formula:
[formula image FDA0003149130970000026 not reproduced]
wherein [symbol image FDA0003149130970000027 not reproduced] is the given key, which is a drug; [symbol image FDA0003149130970000028 not reproduced] is a series of given keys; u_i is [symbol image FDA0003149130970000029 not reproduced]; and S(·,·) is the similarity of t_p and [symbol image FDA00031491309700000210 not reproduced];
the embedded representation of the pair of drug d_x and target t_p is generated by concatenating the two directional representations:
e(d_x, t_p) = [a(d_x, t_p) || a(t_p, d_x)];
2.3) Processing the embedded representations obtained in step 2.2) with a feature importance network:
S1. performing step 2.2) for all drugs and targets, and stacking the resulting embedded representations of the drug-target pairs to obtain a matrix E;
S2. constructing a mapping attention matrix M from the matrix E obtained in step S1 by means of deep neural networks;
S3. constructing an attention-enhanced representation matrix [symbol image FDA0003149130970000031 not reproduced] from the matrix M obtained in step S2;
2.4) Feeding the representation matrix [symbol image FDA0003149130970000032 not reproduced] obtained in step S3 into the input layer of a deep neural network model to obtain the predicted drug-target interactions;
2.5) Comparing the predicted drug-target interactions obtained in step 2.4) with the actual interactions between the drugs and the targets, and obtaining the weights of the model through back propagation, so as to obtain the trained drug-target pair interaction prediction model;
3) Predicting drug-target interactions using the trained drug-target pair interaction prediction model of step 2).
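Steps a1 to a3, b1 to b3 and the concatenation in claim 1 can be illustrated with a minimal sketch. The formulas for a(d_x, t_p) and a(t_p, d_x) are only available as images in this copy, so the weighting used here (the similarities of the k most similar interacting neighbours, zero-padded) is an assumption and not the claimed formula. The names `neighbor_attention`, `pair_embedding` and the parameter `k` are hypothetical.

```python
from typing import List, Sequence


def neighbor_attention(sim_row: Sequence[float],
                       interacts: Sequence[int],
                       k: int) -> List[float]:
    """Assumed stand-in for a(., .); the claimed formula is not reproduced."""
    # a1 / b1: rank all candidates by similarity to the query, high to low.
    order = sorted(range(len(sim_row)), key=lambda i: sim_row[i], reverse=True)
    # a2 / b2: remove candidates that do not interact with the partner entity.
    kept = [i for i in order if interacts[i] == 1]
    # a3 / b3 (assumption): keep the k largest similarities, zero-padded.
    weights = [sim_row[i] for i in kept[:k]]
    return weights + [0.0] * (k - len(weights))


def pair_embedding(x: int, p: int,
                   drug_sim: Sequence[Sequence[float]],
                   target_sim: Sequence[Sequence[float]],
                   A: Sequence[Sequence[int]],
                   k: int) -> List[float]:
    """e(d_x, t_p) = [a(d_x, t_p) || a(t_p, d_x)] as a plain list."""
    column_p = [row[p] for row in A]           # which drugs interact with t_p
    a_dt = neighbor_attention(drug_sim[x], column_p, k)
    a_td = neighbor_attention(target_sim[p], A[x], k)  # targets hit by d_x
    return a_dt + a_td                         # list concatenation


# Toy data: 3 drugs, 2 targets (values are illustrative only).
drug_sim = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.5], [0.3, 0.5, 1.0]]
target_sim = [[1.0, 0.6], [0.6, 1.0]]
A = [[1, 0], [1, 1], [0, 1]]
embedding = pair_embedding(0, 1, drug_sim, target_sim, A, k=2)
print(embedding)
```

The sketch follows the claim wording literally and ranks all drugs (or targets), including the query itself; whether the query should be excluded from its own neighbour list is not stated in the claim.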
2. The method of predicting drug-target interactions of claim 1, wherein:
in step 2.1), the pairwise similarity between all drug molecules is calculated from the collected structural information of the drug molecules using the SIMCOMP method;
and the pairwise similarity between all target proteins is calculated from the collected sequence information of the target proteins using the Smith-Waterman algorithm.
3. The method of predicting drug-target interactions of claim 2, wherein:
the SIMCOMP method is as follows:
SIMCOMP uses a graph alignment algorithm to provide a global similarity score based on the size of the common substructure between two drug compounds, wherein the similarity s(c, c') of compounds c and c' is calculated as follows:
[formula image FDA0003149130970000041 not reproduced]
4. The method of predicting drug-target interactions of claim 2, wherein:
the Smith-Waterman algorithm is specifically as follows:
the two target sequences to be aligned are defined as A = a_1 a_2 a_3 … a_n and B = b_1 b_2 b_3 … b_m, wherein n and m are the lengths of sequences A and B, respectively;
determining the parameters:
s is the similarity score between the elements that make up the sequences;
W_k is the penalty for a gap of length k;
creating a scoring matrix H of size (n+1)×(m+1) and initializing its first row and first column;
scoring from left to right and from top to bottom to fill the remainder of the scoring matrix H, wherein:
[formula image FDA0003149130970000042 not reproduced]
selecting the highest score in the scoring matrix H as the matching score of sequence A and sequence B, denoted SW(A, B);
the similarity of sequence A and sequence B is:
[formula image FDA0003149130970000051 not reproduced]
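The scoring procedure of claim 4 can be sketched as follows. Both formulas are images in this copy, so the sketch uses the textbook Smith-Waterman recurrence with a linear gap penalty W_k = k * gap, and the normalization SW(A, B) / sqrt(SW(A, A) * SW(B, B)), which is a common choice for target similarity; both are assumptions, as are the `match`, `mismatch` and `gap` values. Sequences are assumed to be non-empty.

```python
from math import sqrt


def smith_waterman(a: str, b: str, match: float = 2.0,
                   mismatch: float = -1.0, gap: float = 1.0) -> float:
    """Best local alignment score SW(a, b) with linear gap penalty k * gap."""
    n, m = len(a), len(b)
    # Scoring matrix H of size (n+1) x (m+1); first row and column stay 0.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):          # top to bottom
        for j in range(1, m + 1):      # left to right
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + s,   # align a_i with b_j
                          H[i - 1][j] - gap,     # extend a gap in b
                          H[i][j - 1] - gap)     # extend a gap in a
            best = max(best, H[i][j])
    return best                        # highest entry of H


def normalized_similarity(a: str, b: str, **kwargs: float) -> float:
    """Assumed normalization: SW(a, b) / sqrt(SW(a, a) * SW(b, b))."""
    return smith_waterman(a, b, **kwargs) / sqrt(
        smith_waterman(a, a, **kwargs) * smith_waterman(b, b, **kwargs))


print(smith_waterman("ACGT", "ACGT"))          # identical sequences
print(normalized_similarity("ACGT", "ACG"))    # partial overlap
```

With a linear penalty the one-step gap terms are equivalent to maximizing over all gap lengths k, which keeps the fill step at O(n·m); a general W_k would require the inner maximization over k.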
5. The method of predicting drug-target interactions of any one of claims 1-4, wherein:
in step S2, the mapping attention matrix M is constructed as follows:
M(:, i) = DNN_i(E).
6. The method of predicting drug-target interactions of claim 5, wherein:
in step S3, the attention-enhanced representation matrix [symbol image FDA0003149130970000052 not reproduced] is:
[formula image FDA0003149130970000053 not reproduced]
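A minimal sketch of the feature importance network of claims 5 and 6 is given below. The relation M(:, i) = DNN_i(E) is taken from claim 5; the formula for the attention-enhanced matrix is an image in this copy, so the element-wise product of M and E used here is an assumption. The network size, the random weights and the names `make_dnn` and `attention_enhance` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def make_dnn(n_in: int, n_hidden: int, rng: random.Random) -> Callable[[Row], float]:
    """Tiny untrained network: one ReLU hidden layer, one sigmoid output."""
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def dnn(row: Row) -> float:
        hidden = [max(0.0, sum(w * x for w, x in zip(ws, row))) for ws in W1]
        z = sum(w * h for w, h in zip(w2, hidden))
        return 1.0 / (1.0 + math.exp(-z))

    return dnn


def attention_enhance(E: Sequence[Row],
                      dnns: Sequence[Callable[[Row], float]]
                      ) -> Tuple[List[List[float]], List[List[float]]]:
    # Column i of M is produced by the i-th network: M(:, i) = DNN_i(E).
    M = [[dnn(row) for dnn in dnns] for row in E]
    # Assumption: the enhanced matrix re-weights E element-wise by M.
    E_hat = [[m * e for m, e in zip(m_row, e_row)] for m_row, e_row in zip(M, E)]
    return M, E_hat


rng = random.Random(0)
E = [[0.8, 0.3, 0.6, 0.0], [0.5, 0.0, 1.0, 0.6]]   # two stacked pair embeddings
dnns = [make_dnn(n_in=4, n_hidden=3, rng=rng) for _ in range(4)]
M, E_hat = attention_enhance(E, dnns)
print(len(M), len(M[0]))
```

One network per feature column keeps the attention weights for each feature independent of the others, which is one reading of the subscript in DNN_i.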
7. The method of predicting drug-target interactions of claim 6, wherein:
in step 2.4), the deep neural network model includes an input layer, a hidden layer using ReLU as the activation function, and an output layer of two neurons using Sigmoid as the activation function.
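The layer structure named in claim 7 can be sketched as a forward pass. Only the layer types come from the claim; the layer widths, the random untrained weights and the function names are hypothetical, and reading the two outputs as scores for "interacting" and "non-interacting" is an assumption.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]   # (weight rows, biases)


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Layer:
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def dense(layer: Layer, x: Sequence[float]) -> List[float]:
    W, b = layer
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def predict(x: Sequence[float], hidden: Layer, output: Layer) -> List[float]:
    h = [max(0.0, v) for v in dense(hidden, x)]                       # ReLU hidden layer
    return [1.0 / (1.0 + math.exp(-v)) for v in dense(output, h)]    # Sigmoid outputs


rng = random.Random(1)
hidden_layer = init_layer(n_out=8, n_in=4, rng=rng)   # widths are illustrative
output_layer = init_layer(n_out=2, n_in=8, rng=rng)   # two output neurons
scores = predict([0.4, 0.1, 0.3, 0.0], hidden_layer, output_layer)
print(scores)
```

In training, these scores would be compared with the actual interactions and the weights updated by back propagation, as described in step 2.5); that part is omitted here.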
8. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
9. An electronic device, characterized by comprising a processor and a computer-readable storage medium;
the computer-readable storage medium has stored thereon a computer program which, when executed by the processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202110759813.9A 2021-07-06 2021-07-06 Drug-target interaction prediction method based on neighbor attention network Active CN113421658B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110759813.9A CN113421658B (en) 2021-07-06 2021-07-06 Drug-target interaction prediction method based on neighbor attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110759813.9A CN113421658B (en) 2021-07-06 2021-07-06 Drug-target interaction prediction method based on neighbor attention network

Publications (2)

Publication Number Publication Date
CN113421658A CN113421658A (en) 2021-09-21
CN113421658B true CN113421658B (en) 2023-06-16

Family

ID=77721466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110759813.9A Active CN113421658B (en) 2021-07-06 2021-07-06 Drug-target interaction prediction method based on neighbor attention network

Country Status (1)

Country Link
CN (1) CN113421658B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114765060B (en) * 2021-01-13 2023-12-08 四川大学 Multi-attention method for predicting drug target interactions
CN116246697B (en) * 2023-05-11 2023-08-01 上海微观纪元数字科技有限公司 Target protein prediction method and device for medicines, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0321708D0 (en) * 2003-09-16 2003-10-15 Pfizer Ltd System and method for the computer-assisted identification of drugs and indications
CN111243682A (en) * 2020-01-10 2020-06-05 京东方科技集团股份有限公司 Method, device, medium and apparatus for predicting toxicity of drug
CN111916145A (en) * 2020-07-24 2020-11-10 湖南大学 Novel coronavirus target prediction and drug discovery method based on graph representation learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10803144B2 (en) * 2014-05-06 2020-10-13 International Business Machines Corporation Predicting drug-drug interactions based on clinical side effects
US11651860B2 (en) * 2019-05-15 2023-05-16 International Business Machines Corporation Drug efficacy prediction for treatment of genetic disease
US20210142173A1 (en) * 2019-11-12 2021-05-13 The Cleveland Clinic Foundation Network-based deep learning technology for target identification and drug repurposing
CN111785320B (en) * 2020-06-28 2024-02-06 西安电子科技大学 Drug target interaction prediction method based on multi-layer network representation learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
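Drug-target relationship prediction based on grouped Bayesian ranking; Ding Qiliang; Shi Zezhi; Li Jianhua; Computer Engineering and Applications (15); full text *
Drug-protein relationship prediction based on a deep attention model; Zhao Qichang; China Excellent Master's Theses Full-text Database (No. 2); full text *
Application of drug target prediction technology in network pharmacology of traditional Chinese medicine; Wu Chunwei; Lu Li; Liang Shengwang; Chen Chao; Wang Shumei; China Journal of Chinese Materia Medica (03); full text *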

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240304

Address after: Room 21501, Unit 2, Building 1, Oak Tree Constellation, Keji Fifth Road, High tech Zone, Xi'an City, Shaanxi Province, 710065

Patentee after: Shaanxi Exquisite Technology Development Co.,Ltd.

Country or region after: China

Address before: 710072 No. 127 Youyi West Road, Shaanxi, Xi'an

Patentee before: Northwestern Polytechnical University

Country or region before: China