CN113421658A - Medicine-target interaction prediction method based on neighbor attention network - Google Patents

Publication number
CN113421658A
Authority
CN
China
Prior art keywords
target
drug
interaction
matrix
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110759813.9A
Other languages
Chinese (zh)
Other versions
CN113421658B (en)
Inventor
施建宇
赵鹏程
徐意
朱蓓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Exquisite Technology Development Co ltd
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202110759813.9A priority Critical patent/CN113421658B/en
Publication of CN113421658A publication Critical patent/CN113421658A/en
Application granted granted Critical
Publication of CN113421658B publication Critical patent/CN113421658B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Toxicology (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a 'drug-target' interaction prediction method based on a neighbor attention network. The prediction model adopted is the neighbor attention network (NNAttNet), which addresses the problems of existing methods by constructing neighbor-based embedded representations of drug-target pairs (DTPs). The method makes drug-protein interactions interpretable, reduces the influence of missing DTI entries, and provides a unified representation for transductive and inductive prediction. In addition, NNAttNet provides attention-based selection of key features to predict DTIs more accurately, and evaluation on a benchmark dataset shows that NNAttNet achieves better DTI prediction performance.

Description

Medicine-target interaction prediction method based on neighbor attention network
Technical Field
The invention belongs to the technical field of computer-aided drug research and development, and particularly relates to a 'drug-target' interaction prediction method based on a neighbor attention network.
Background
The shift in drug discovery paradigms from "one drug, one target" to "multiple drugs, multiple targets" reveals the links between drugs and targets. This new paradigm facilitates the discovery of potential "drug-target" interactions (DTIs), which is a fundamental task in drug development. However, determining DTIs by biological experiments is time-consuming and laborious.
In recent years, various databases have emerged as DTI data accumulate, and this accumulation has prompted the application of computational methods, particularly machine learning-based methods, which show good predictive performance in finding potential DTIs. However, despite considerable progress in DTI prediction, significant challenges remain in practice, mainly in the following aspects:
1) the embedded representations lack interpretability, so the DTI prediction mechanism is insufficiently described;
2) the prediction model is very sensitive to missing labels;
3) it is difficult to predict interactions for new compound molecules/proteins.
In view of this, there is a need to develop a new approach to predict "drug-target" interactions.
Disclosure of Invention
The invention aims to solve the defects of the prior art and provides a 'drug-target' interaction prediction method based on a neighbor attention network.
To this end, the invention first carried out an in-depth analysis of the problems in the prior art and found that:
1. The embedded representations lack interpretability, so the DTI prediction mechanism is insufficiently described. The drug/target embedded representations learned by existing deep learning (DL) or matrix factorization (MF) methods are difficult to interpret: the resulting latent space offers no tractable way to indicate how these properties affect the interaction, and their black-box nature hinders direct guidance of drug design.
2. The prediction model is very sensitive to missing labels. In practice the collected "drug-target" labels are incomplete, yet existing methods rarely account for missing interaction labels between "drug-target" pairs or consider whether the missing interactions contribute to DTI prediction.
3. It is difficult to predict the interactions of new compound molecules/proteins. At present there are two main prediction modes, namely transductive prediction and inductive prediction.
The task of transductive prediction is to construct a function mapping $F: D \times T \to [0,1]$ to infer potential interactions between unlabeled "drug-target" pairs, where the features or similarities of the drugs and targets are used to learn $F$. Inductive prediction corresponds to the well-known cold-start problem in recommendation systems, and its task is likewise to learn a function mapping $F: D \times T \to [0,1]$. However, it infers potential interactions between new drug molecules $D_x$ and new target proteins $T_y$, or between $D$ and $T_y$, where $D_x$ and $T_y$ are not seen when $F$ is learned. Almost all current similarity-based DTI prediction methods are transductive: they extract topological embedding features from a DTI network or a similarity matrix, and the training phase uses labeled training samples and unlabeled test samples at the same time. Consequently, whenever new samples need to be labeled in practice, the model must be retrained, which cannot meet the needs of current drug development.
Therefore, in order to achieve the above object, the technical solution provided by the present invention is:
the 'drug-target' interaction prediction method based on the neighbor attention network is characterized by comprising the following steps of:
1) construction of a model for predicting drug-target pair interactions
The drug-target pair interaction prediction model consists of a neighbor attention module and a deep neural network module;
2) collecting sample data, and training the 'drug-target' pair interaction prediction model constructed in the step 1) to obtain a trained 'drug-target' pair interaction prediction model;
the sample data comprises relevant data of the drug and the target and real interaction of the drug and the target; the specific training process is as follows:
2.1) calculating the similarity between every two of all the drug molecules and the similarity between every two of all the target proteins by using the related data, and constructing an interaction relation matrix A of the drug molecules and the target proteins;
wherein the related data comprises the structural information of the drug molecules, the sequence information of the target protein and the interaction relation information of the drug molecules and the target protein;
2.2) constructing a TsDNA module using the interaction relation matrix A of the drug molecules and the target proteins obtained in step 2.1) together with the similarity data between drug molecules, and extracting the embedded representation, namely the feature vector, indicating whether edges exist between a target protein and all drug molecules;
and/or
constructing a DsTNA module using the interaction relation matrix A of the drug molecules and the target proteins obtained in step 2.1) together with the similarity data between target proteins, and extracting the embedded representation, namely the feature vector, indicating whether edges exist between a drug molecule and all target proteins;
wherein the embedded representation of a drug $d_x$ and a target $t_p$ is extracted by the TsDNA module as follows:
a1. sorting all drugs by their similarity to the drug $d_x$ from high to low to obtain the keys $K_1, K_2, \ldots, K_m$;
a2. obtaining the interactions between all drugs and the target $t_p$, and eliminating the non-interacting drugs;
a3. obtaining the embedded representation $a(d_x, t_p)$ of the drug $d_x$ and the target $t_p$, expressed as follows:
[formula image BDA0003149130980000041 in the original publication]
wherein [symbol image] is an assigned key, which is a drug [symbol image]; [symbol image] is the series of assigned keys; $v_i$ is the value of the $i$-th key, [symbol image], based on the similarity of $d_x$ and [symbol image];
the embedded representation of a target $t_p$ and a drug $d_x$ is extracted by the DsTNA module as follows:
b1. sorting all targets by their similarity to the target $t_p$ from high to low to obtain the keys $H_1, H_2, \ldots, H_m$;
b2. obtaining the interactions between all targets and the drug $d_x$, and eliminating the targets without interaction;
b3. obtaining the embedded representation $a(t_p, d_x)$ of the target $t_p$ and the drug $d_x$, expressed as follows:
[formula image BDA0003149130980000047 in the original publication]
wherein [symbol image] is an assigned key, which is a target [symbol image]; [symbol image] is the series of assigned keys; $u_i$ is the value of the $i$-th key, [symbol image], based on the similarity of $t_p$ and [symbol image];
the embedded representation of the drug $d_x$ and the target $t_p$ is generated by concatenating the two one-directional representations:
$$e(d_x, t_p) = [\,a(d_x, t_p) \,\|\, a(t_p, d_x)\,];$$
for extracting the embedded representation of a new drug, DsTNA cannot be constructed (new data), so only TsDNA is constructed for both the test set and the training set in order to keep the data balanced;
for extracting the embedded representation of a new target, TsDNA cannot be constructed (new data), so only DsTNA is constructed for both the test set and the training set in order to keep the data balanced;
that is, $e(d_x, t_p) = [\,a(d_x, t_p)\,]$ or $e(d_x, t_p) = [\,a(t_p, d_x)\,]$;
2.3) processing the embedded representation obtained in step 2.2) using the feature importance network:
S1. performing step 2.2) for all drugs and targets, and stacking the resulting embedded representations of the drug-target pairs to obtain a matrix E;
S2. constructing the mapped attention matrix M from the matrix E obtained in step S1 through a deep neural network;
S3. constructing the attention-enhanced representation matrix [symbol image] from the matrix M obtained in step S2, to facilitate identification;
2.4) inputting the attention-enhanced representation matrix obtained in step 2.3) into the input layer of a deep neural network model to obtain the predicted drug-target interactions;
2.5) comparing the predicted drug-target interactions obtained in step 2.4) with the real interactions of the drugs and targets, and updating the weights in the model through back propagation to obtain the trained drug-target pair interaction prediction model.
That is, the prediction model is trained as an interpretable deep learning model, namely NNAttNet, which comprises three modules: a neighbor attention module, a feature importance network, and a multi-layer deep neural network model. For "drug-target" pairs, the first module generates interpretable embedded representations that are more robust to missing labels in the training data and are applicable in both transductive and inductive prediction scenarios. In addition, the algorithm accepts not only feature inputs but also similarity inputs. The second module, the feature importance network, represents the importance of each dimension of the embedded features, provides interpretable feature selection, and is one of the steps in building the neighbor attention module. The last module determines whether a "drug-target" pair is a potential DTI.
3) predicting interactions using the 'drug-target' pair interaction prediction model trained in step 2).
Further, in step 2.1), the pairwise similarity between all drug molecules is calculated from the collected structural information of the drug molecules using the SIMCOMP method;
and the pairwise similarity between all target proteins is calculated from the collected sequence information of the target proteins using the Smith-Waterman algorithm.
Further, the SIMCOMP method is as follows:
SIMCOMP uses a graph alignment algorithm to provide a global similarity score based on the size of the common substructure between two pharmaceutical compounds, wherein the similarity $s(c, c')$ of compounds $c$ and $c'$ is calculated as follows:
$$s(c, c') = \frac{|c \cap c'|}{|c \cup c'|}$$
Further, the Smith-Waterman algorithm is specifically as follows:
the two target sequences to be aligned are defined as $A = a_1 a_2 a_3 \ldots a_n$ and $B = b_1 b_2 b_3 \ldots b_m$, where n and m are the lengths of sequences A and B, respectively;
determining the parameters:
$s(a_i, b_j)$ is the score for aligning elements $a_i$ and $b_j$, i.e. the match score when the two elements are identical;
$W_k$ is the gap penalty for a gap of length k;
creating a score matrix H of size (n+1) × (m+1) and initializing its first row and first column to 0;
scoring from left to right and top to bottom to fill the remainder of the score matrix H, where:
$$H_{ij} = \max\left\{ H_{i-1,j-1} + s(a_i, b_j),\ \max_{k \ge 1}\{H_{i-k,j} - W_k\},\ \max_{l \ge 1}\{H_{i,j-l} - W_l\},\ 0 \right\}$$
selecting the highest score in the score matrix H as the matching score of sequence A and sequence B, denoted SW(A, B).
The similarity between sequence A and sequence B is:
$$s(A, B) = \frac{SW(A, B)}{\sqrt{SW(A, A)}\,\sqrt{SW(B, B)}}$$
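The scoring and normalization steps of the Smith-Waterman procedure can be sketched in Python. This is only a minimal illustration under stated assumptions, not the implementation used in the invention: it assumes a linear gap penalty ($W_k = k \cdot$ `gap`), a simple match/mismatch score instead of a substitution matrix, and a normalization of SW(A, B) by the square roots of the two self-alignment scores. The function names and default parameter values are hypothetical.

```python
import math


def smith_waterman(a: str, b: str, match: float = 2.0,
                   mismatch: float = -1.0, gap: float = 1.0) -> float:
    """Return the best local alignment score SW(a, b).

    Assumption: linear gap penalty (W_k = k * gap), so only single-step
    gap moves need to be examined when filling the score matrix.
    """
    n, m = len(a), len(b)
    # (n + 1) x (m + 1) score matrix; the first row and column stay 0.
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0.0,
                          h[i - 1][j - 1] + s,  # align a_i with b_j
                          h[i - 1][j] - gap,    # gap in sequence b
                          h[i][j - 1] - gap)    # gap in sequence a
            best = max(best, h[i][j])
    return best


def normalized_similarity(a: str, b: str) -> float:
    """Scale SW(a, b) by the self-alignment scores of a and b."""
    denom = math.sqrt(smith_waterman(a, a) * smith_waterman(b, b))
    return smith_waterman(a, b) / denom if denom > 0 else 0.0
```

With these assumed scores, identical sequences have similarity 1, and sequences without any common element have similarity 0.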
Further, in step S2), the mapped attention matrix M is obtained as follows:
$$M(:, i) = \mathrm{DNN}_i(E).$$
Further, in step S3), the attention-enhanced representation matrix is obtained as follows:
[formula image BDA0003149130980000074 in the original publication]
Further, in step 2.4), the deep neural network model includes an input layer, a hidden layer using ReLU as the activation function, and a two-neuron output layer using Sigmoid as the activation function. The deep neural network model acts as a binary predictor, with the output layer producing a probability that represents the likelihood of a drug-target pair interacting. The entire NNAttNet network, with its neighbor attention weights, feature importance terms, and DNN weights, can be jointly optimized with a binary cross-entropy loss function, as follows:
[formula image BDA0003149130980000081 in the original publication]
wherein Y is the true label of the drug-target pair; f(·) is the DNN; θ denotes the weight parameters of the entire network; R(·) is the L2-norm; and λ is the coefficient of the regularization term.
The present invention also provides a computer-readable storage medium having stored thereon a computer program characterized in that: which when executed by a processor implements the steps of the above-described method.
An electronic device, characterized in that: including a processor and a computer-readable storage medium;
the computer-readable storage medium has stored thereon a computer program which, when being executed by the processor, performs the steps of the above-mentioned method.
The invention has the advantages that:
the invention provides a prediction method based on deep learning. The prediction model adopted is the neighbor attention network (NNAttNet), which addresses the above problems by constructing neighbor-based embedded representations of drug-target pairs (DTPs). The method makes drug-protein interactions interpretable, reduces the influence of missing DTI entries, and provides a unified representation for transductive and inductive prediction. In addition, NNAttNet provides attention-based selection of key features to predict DTIs more accurately, and evaluation on a benchmark dataset shows that NNAttNet achieves better DTI prediction performance.
Drawings
Fig. 1 is the general architecture of the method NNAttNet proposed by the present invention;
FIG. 2 is a diagram of the basic structure of a TsDNA module;
FIG. 3 is a schematic diagram of the construction of a feature importance matrix;
FIG. 4 is the distribution of the embedding features over the drug keys;
FIG. 5 is the distribution of feature importance over the drug keys;
FIG. 6 is a graph of the prediction performance of the top-k features;
FIG. 7 is the distribution of feature importance under different missing rates of DTIs.
Detailed Description
The invention is described in further detail below with reference to the following figures and specific examples:
an embodiment of the "drug-target" interaction prediction method based on the neighbor attention network proposed by the present invention is specifically as follows:
the present embodiment includes three parts:
in the first part, when the test set is constructed, some edges in the network are deleted and then predicted.
In the second part, some drugs and all their edges in the network are deleted in order to simulate the scenario of new drug prediction.
In the third part, some targets and all their edges in the network are deleted in order to simulate the scenario of new target prediction.
When reporting the results, overall performance metrics are shown rather than results for any specific drug (target).
This example uses a benchmark dataset in the prediction performance comparison experiments. Based on protein family, the targets are divided into 4 subsets: enzymes (En), ion channels (IC), G-protein coupled receptors (GPCR), and nuclear receptors (NR). Each subset includes the known "drug-target" interactions, the pairwise similarities between drugs, and the pairwise similarities between targets. The pairwise similarities between drugs are computed with the SIMCOMP algorithm, and those between targets with the Smith-Waterman algorithm. The details of this dataset are shown in Table 1.
TABLE 1 details of the reference data set
[Table 1 is provided as image BDA0003149130980000101 in the original publication.]
For each data subset, a dictionary K of virtual keys is set up and assigned values using the acquired data; the TsDNA and DsTNA modules between drug molecules and target proteins are constructed; and the embedded representations of all drug-target pairs are finally obtained.
TsDNA (see FIG. 2) consists of a dictionary K of virtual keys and values v describing these virtual keys. In the dictionary, the virtual keys are sorted by semantic adjacency: the first key is the nearest neighbor, the second key is the second-nearest neighbor, and the last key is the farthest neighbor. Notably, this ordering helps explain the classification distinction between known DTIs and unknown DTIs.
When considering whether a drug $d_x$ interacts with a target protein $t_p$, these virtual keys are assigned by the other drugs known to interact with the target $t_p$. In contrast, nothing is done for the drugs that do not interact with $t_p$. The attention of $d_x$ to $t_p$ can then be defined as:
[formula image BDA0003149130980000111 in the original publication]
Here, [symbol image] is an assigned key, which is a drug [symbol image]; [symbol image] is the series of assigned keys; $v_i$ is the value of the $i$-th key, [symbol image], based on the similarity of $d_x$ and [symbol image]. Note that a virtual key is featureless; it acquires a feature only after being assigned. This attention is the one-directional representation $d_x \to t_p$.
To enhance the interpretability of the TsDNA module, we set $V = [v_1, v_2, \ldots, v_{|K|}]$ to be a diagonal matrix, where $v_i$ is a one-hot-like vector: only its $i$-th element is non-zero, and all other elements are 0. In this way, the attention vector of $d_x \to t_p$ is sparse. Following the well-accepted hypothesis that similar drugs tend to interact with similar target proteins, it is hypothesized that if drug $d_x$ and target $t_p$ interact, their representation has more non-zero values in its first few feature dimensions than that of a drug and target that do not interact. In other words, for an interacting pair $d_x$ and $t_p$, the other drugs that interact with $t_p$ are usually among the first few neighbors of $d_x$. This sparse attention embedding provides the basis for the interpretability of TsDNA discussed later.
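A one-directional neighbor attention vector of this kind can be sketched as follows. This is a minimal illustration, not the formula of the invention: it assumes that the non-zero entry of each assigned key equals the similarity between the query and the corresponding neighbor, and that the query itself is excluded from its own neighbor list. The function name and argument names are hypothetical.

```python
import numpy as np


def neighbor_attention(sim: np.ndarray, known: np.ndarray, x: int) -> np.ndarray:
    """One-directional representation of a query entity against one partner.

    sim   : (n, n) similarity matrix among entities of the query's type
    known : (n,) 0/1 vector, 1 where entity i is known to interact with the partner
    x     : index of the query entity
    """
    others = np.array([i for i in range(sim.shape[0]) if i != x])
    # Keys ordered from the nearest to the farthest neighbor of the query.
    order = others[np.argsort(-sim[x, others], kind="stable")]
    # A key is assigned only if its neighbor interacts with the partner
    # (assumed value: the similarity); unassigned keys stay 0, so the
    # resulting vector is sparse.
    return np.where(known[order] == 1, sim[x, order], 0.0)
```

Under these assumptions, the TsDNA direction for drug x and target p would be `neighbor_attention(S_d, A[:, p], x)` with a drug similarity matrix `S_d` and interaction matrix `A`; by symmetry, the same function applied to a target similarity matrix and a row of `A` gives the other direction.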
Because the two types of nodes in the network play symmetric roles, a DsTNA module can be constructed similarly; it outputs the other one-directional representation $t_p \to d_x$, denoted $a(t_p, d_x)$. The final representation of the pair $d_x$ and $t_p$ is generated by concatenating the two one-directional representations:
$$e(d_x, t_p) = [\,a(d_x, t_p) \,\|\, a(t_p, d_x)\,]$$
The embedded representations of all "drug-target" pairs are stacked together to form the matrix E.
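The concatenation and stacking can be sketched as below. The helper names are hypothetical, and the handling of a missing direction reflects the new-drug and new-target cases described in the disclosure, where only one module can be constructed; it is an illustrative sketch rather than the implementation of the invention.

```python
import numpy as np


def pair_embedding(a_dt=None, a_td=None) -> np.ndarray:
    """Concatenate the available one-directional representations.

    a_dt : representation d_x -> t_p (TsDNA), or None if unavailable
    a_td : representation t_p -> d_x (DsTNA), or None if unavailable
    """
    parts = [np.asarray(v, dtype=float) for v in (a_dt, a_td) if v is not None]
    if not parts:
        raise ValueError("at least one direction is required")
    return np.concatenate(parts)


def stack_embeddings(embeddings) -> np.ndarray:
    """Stack per-pair embeddings row by row into the matrix E."""
    return np.vstack(embeddings)
```

All rows passed to `stack_embeddings` must have the same length, which is why the same set of directions has to be used for every pair in a given training and test set.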
The mapped attention matrix M is obtained from the matrix E by the formula $M(:, i) = \mathrm{DNN}_i(E)$, and the attention-enhanced representation matrix is then constructed from M by the following formula:
[formula image BDA0003149130980000121 in the original publication]
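The column-wise mapping $M(:, i) = \mathrm{DNN}_i(E)$ can be sketched with NumPy. The sketch makes several assumptions that are not taken from the source: each $\mathrm{DNN}_i$ is a small network with one ReLU hidden layer and a sigmoid output, the weights are random rather than trained, and the attention-enhanced matrix is formed as the element-wise product of E and M. All names are hypothetical.

```python
import numpy as np


def make_dnn(n_in: int, n_hidden: int, rng: np.random.Generator) -> dict:
    """Randomly initialised one-hidden-layer network (illustration only)."""
    return {
        "w1": rng.normal(scale=0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(scale=0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def dnn_forward(params: dict, e: np.ndarray) -> np.ndarray:
    """Map every row of E to one value in (0, 1)."""
    hidden = np.maximum(0.0, e @ params["w1"] + params["b1"])  # ReLU
    logits = hidden @ params["w2"] + params["b2"]
    return (1.0 / (1.0 + np.exp(-logits))).ravel()             # sigmoid


def attention_enhanced(e: np.ndarray, dnns: list):
    """Build M column by column, then weight E with it.

    Assumption: the enhanced matrix is the element-wise product E * M.
    """
    m = np.column_stack([dnn_forward(p, e) for p in dnns])  # M(:, i) = DNN_i(E)
    return m, e * m
```

One network per feature dimension is needed, so `dnns` must contain as many entries as E has columns.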
A generic DNN was used as a binary predictor to predict whether a "drug-target" pair interacts. The binary predictor comprises an input layer that takes the embedded representation of a drug-target pair, a hidden layer using ReLU as the activation function, and a two-neuron output layer using Sigmoid as the activation function. The output layer generates a probability that represents the likelihood of a drug-target pair interacting. The entire NNAttNet network, with its neighbor attention weights, feature importance terms, and DNN weights, can be jointly optimized with a binary cross-entropy loss function, as follows:
[formula image BDA0003149130980000123 in the original publication]
wherein Y is the true label of the drug-target pair; f(·) is the DNN; θ denotes the weight parameters of the entire network; R(·) is the L2-norm; and λ is the coefficient of the regularization term.
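A binary cross-entropy loss with an L2 regularization term, using the quantities named above (labels Y, predicted probabilities from f, weights θ, coefficient λ), can be sketched as follows. The exact form used in the invention is not reproduced in this text, so the averaging over samples, the use of the squared L2-norm, and the default value of `lam` are assumptions, and the function name is hypothetical.

```python
import numpy as np


def bce_with_l2(y_true, y_prob, weights, lam: float = 1e-3,
                eps: float = 1e-12) -> float:
    """Mean binary cross-entropy plus lam * (squared L2-norm of the weights).

    y_true  : array of 0/1 labels Y
    y_prob  : array of predicted interaction probabilities f(.)
    weights : iterable of weight arrays making up theta
    """
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log() never receives 0.
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    cross_entropy = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    reg = sum(float(np.sum(np.square(w))) for w in weights)
    return float(cross_entropy + lam * reg)
```

In training, this scalar would be minimized with respect to all weights by back propagation; the sketch only evaluates it.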
In this example, the performance of each method was evaluated by 10-fold cross-validation (CV), with AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve) as the metrics of DTI prediction performance.
In 10-fold cross-validation, the AUROC/AUPRC scores of each prediction method were calculated, and the final AUROC/AUPRC scores were obtained by averaging the scores over 10 replicates.
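The two metrics and the final averaging can be sketched with NumPy. This is an illustrative sketch: AUROC is computed from all positive-negative score pairs, and AUPRC is approximated by average precision, which is one common estimator and not necessarily the one used in the experiments. The function names are hypothetical.

```python
import numpy as np


def auroc(y_true, scores) -> float:
    """Probability that a random positive is scored above a random negative."""
    y, s = np.asarray(y_true), np.asarray(scores, dtype=float)
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    # Ties between a positive and a negative count as half a win.
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def average_precision(y_true, scores) -> float:
    """Mean of the precision values at the rank of each positive."""
    y, s = np.asarray(y_true), np.asarray(scores, dtype=float)
    hits = y[np.argsort(-s, kind="stable")]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(np.sum(precision * hits) / np.sum(hits))


def mean_score(fold_scores) -> float:
    """Final score: the average of the per-fold (or per-replicate) scores."""
    return float(np.mean(fold_scores))
```

Both metric functions expect at least one positive and one negative label in `y_true`.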
In order to comprehensively evaluate the performance of each method, the CV test was performed in consideration of the following three scenarios.
Under CVS1, 90% of the DTPs (drug-target pairs) were used for training, and the remaining 10% were used for testing in each round.
Under CVS2 (or CVS3), the interactions of 90% of the drugs (or targets) were used for training, and the interactions of the remaining 10% of the drugs (or targets) were used for testing.
CVS2 (CVS3) is a cold-start DTI prediction task because there is no overlap between the training drugs (targets) and the test drugs (targets).
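The three splitting scenarios differ only in what is held out: individual pairs (CVS1), whole drugs (CVS2), or whole targets (CVS3). A minimal sketch, with hypothetical function names and a random fold assignment that is an assumption rather than the procedure of the experiments:

```python
import numpy as np


def make_folds(n_items: int, n_folds: int = 10, seed: int = 0):
    """Randomly partition the indices 0..n_items-1 into n_folds folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_items), n_folds)


def held_out_mask(shape, scenario: str, fold: np.ndarray) -> np.ndarray:
    """Boolean (n_drugs, n_targets) mask of the pairs held out for testing."""
    mask = np.zeros(shape, dtype=bool)
    if scenario == "CVS1":    # fold holds flat indices of drug-target pairs
        mask.flat[fold] = True
    elif scenario == "CVS2":  # fold holds drug (row) indices
        mask[fold, :] = True
    elif scenario == "CVS3":  # fold holds target (column) indices
        mask[:, fold] = True
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return mask
```

For CVS2 and CVS3 every pair of a held-out drug or target lands in the test mask, which is what makes these settings cold-start: the test entities contribute no labels to training.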
Notably, CVS1 is a transductive prediction task.
CVS2/CVS3 may be a transductive or an inductive prediction task, depending on the nature of the prediction method. The experimental results of NNAttNet are shown in Tables 2, 3 and 4.
TABLE 2 DTI prediction performance on the 4 datasets under CVS1
[Table 2 is provided as image BDA0003149130980000131 in the original publication.]
Note: ROC and PR are abbreviations for AUROC and AUPRC.
TABLE 3 DTI prediction performance on the 4 datasets under CVS2
[Table 3 is provided as image BDA0003149130980000132 in the original publication.]
Note: ROC and PR are abbreviations for AUROC and AUPRC.
TABLE 4 DTI prediction performance on the 4 datasets under CVS3
[Table 4 is provided as image BDA0003149130980000141 in the original publication.]
Note: ROC and PR are abbreviations for AUROC and AUPRC.
The interpretability of the prediction method in the present invention is explained below based on the experimental results of the present example.
Taking the GPCR dataset as an example, the mean embedding vectors of the known DTIs and of the unlabeled DTPs were calculated, giving the distribution over the dictionary of drug keys from K1 to K100 (see FIG. 4). The significantly higher embedding feature values among the first n nearest neighbors indicate that, for a drug interacting with a particular target, its top n nearest neighbors are typically found among the drugs interacting with the same target. This observation indicates that a drug with more non-zero entries in the first n feature dimensions (keys) is more likely to interact with the target than one with fewer.
In this embodiment the invention also indicates which embedded features in the matrix M drive the interaction. Since larger entries of M correspond to more important feature dimensions, the importance of each feature $f_i$ can be measured by the average of the values in the $i$-th column $M(:, i)$ (see FIG. 5). The distribution of importance over the keys in dictionary K shows that the more important features are typically located among the top n nearest neighbors. This is in good agreement with the observation above, with a large Spearman correlation on the first 10 keys (r = 0.8182).
This example investigates the prediction performance of the top-k features (see FIG. 6), where k takes the values {1, 5, 10, 15, …, 220}. The prediction performance increases sharply as k increases to 50. As k continues to increase, performance improves slowly and even decreases when k is large.
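Selecting the top-k features by importance can be sketched as below. The grid of k values follows the set given in the text; the function name and the use of a stable descending sort are assumptions for illustration.

```python
import numpy as np

# k grid from the text: 1, then 5, 10, 15, ..., 220.
K_VALUES = [1] + list(range(5, 221, 5))


def top_k_features(e: np.ndarray, importance: np.ndarray, k: int):
    """Keep the k columns of E with the highest importance.

    Returns the reduced matrix and the selected column indices,
    ordered from most to least important.
    """
    idx = np.argsort(-np.asarray(importance, dtype=float), kind="stable")[:k]
    return e[:, idx], idx
```

Retraining or re-evaluating the predictor on each reduced matrix would then trace out a performance curve over k.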
One reason NNAttNet still performs well with missing labels is that it uses an embedding vector composed of neighboring nodes. We investigated the distribution of feature importance at different missing rates of DTIs (FIG. 7). The figure shows that the distribution over the feature keys follows a similar trend at different missing rates, and the feature importance vectors under the 9 missing rates are highly correlated: the Spearman correlation coefficients between the feature importance vector at the 10% missing rate and those at the other missing rates (20%-90%) are 0.9996, 0.9993, 0.9989, 0.9979, 0.9969, 0.9943, 0.9919 and 0.9770, respectively. This high correlation indicates that the feature importance network can still identify the critical features when data are missing. Thus, with missing labels, even if only a few interacting drugs are found among the top n nearest neighbors of the queried drug, the sorted key dictionary in the neighbor attention module can still indicate that the queried drug interacts with the target.
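A missing-label experiment of this kind needs a way to hide a given fraction of the known interactions. A minimal sketch, assuming the interactions are stored in a 0/1 matrix and that the hidden entries are chosen uniformly at random (the selection procedure of the experiments is not described in the text; the function name is hypothetical):

```python
import numpy as np


def drop_interactions(a: np.ndarray, rate: float, seed: int = 0) -> np.ndarray:
    """Return a copy of the 0/1 interaction matrix with a fraction of its
    known interactions (entries equal to 1) reset to 0."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(a)
    n_drop = int(round(rate * len(rows)))
    picked = rng.choice(len(rows), size=n_drop, replace=False)
    out = a.copy()
    out[rows[picked], cols[picked]] = 0
    return out
```

Calling it with rates from 0.1 to 0.9 yields the nine degraded matrices on which feature importance vectors could be recomputed and compared.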
The above examples demonstrate the feasibility of NNAttNet: interpretability of drug-protein interactions, robustness to missing DTI labels, a unified representation for transductive and inductive DTI prediction, and attention-based selection of important features for more accurate DTI prediction.
Well-known implementations and features of the above-described arrangements are not described in great detail herein. It should be noted that, for those skilled in the art, various modifications can be made without departing from the invention, and these should also be construed as the scope of the invention, which does not affect the effect of the invention and the practicability of the patent. The scope of protection claimed in the present application shall be defined by the claims, and the detailed description and the like in the specification shall be used for explaining the contents of the claims.

Claims (9)

1. The 'drug-target' interaction prediction method based on the neighbor attention network is characterized by comprising the following steps of:
1) construction of a model for predicting drug-target pair interactions
The drug-target interaction prediction model consists of a neighbor attention module and a deep neural network module;
2) collecting sample data, and training the 'drug-target' pair interaction prediction model constructed in the step 1) to obtain a trained 'drug-target' pair interaction prediction model;
the sample data comprises relevant data of the drug and the target and real interaction of the drug and the target; the specific training process is as follows:
2.1) calculating the similarity between every two of all the drug molecules and the similarity between every two of all the target proteins by using the related data, and constructing an interaction relation matrix A of the drug molecules and the target proteins;
wherein the related data comprises the structural information of the drug molecules, the sequence information of the target protein and the interaction relation information of the drug molecules and the target protein;
2.2) constructing a TsDNA module by using the interaction relation matrix A of the drug molecules and the target proteins obtained in step 2.1) and the similarity data between the drug molecules, and extracting the embedded representation of the target protein and all the drug molecules;
constructing a DsTNA module by using the interaction relation matrix A of the drug molecules and the target proteins obtained in step 2.1) and the similarity data between the target proteins, and extracting the embedded representation of the drug molecule and all the target proteins;
wherein the embedded representation of a drug d_x and a target t_p is extracted by the TsDNA module as follows:
a1. sorting all drugs by their similarity to the drug d_x from high to low to obtain K_1, K_2, …, K_m;
a2. obtaining the interactions between all drugs and the target t_p, and eliminating the drugs that do not interact with it;
a3. obtaining the embedded representation of the drug d_x and the target t_p, expressed as follows:
Figure FDA0003149130970000021
wherein [Figure FDA0003149130970000022] is an assigned key, which is a drug; [Figure FDA0003149130970000023] is the series of assigned keys; and v_i is the value of [Figure FDA0003149130970000024], namely the similarity between d_x and [Figure FDA0003149130970000025];
the embedded representation of a target t_p and a drug d_x is extracted by the DsTNA module as follows:
b1. sorting all targets by their similarity to the target t_p from high to low to obtain H_1, H_2, …, H_m;
b2. obtaining the interactions between all targets and the drug d_x, and eliminating the targets that do not interact with it;
b3. obtaining the embedded representation of the target t_p and the drug d_x, expressed as follows:
Figure FDA0003149130970000026
wherein [Figure FDA0003149130970000027] is an assigned key, which is a target; [Figure FDA0003149130970000028] is the series of assigned keys; and u_i is the value of [Figure FDA0003149130970000029], namely the similarity between t_p and [Figure FDA00031491309700000210];
the embedded representation of the drug d_x and the target t_p is generated by concatenating the two directional representations:
e(d_x, t_p) = [a(d_x, t_p) || a(t_p, d_x)];
2.3) processing the embedded representation obtained in step 2.2) by using the feature importance network:
S1. performing step 2.2) on all drugs and targets, and stacking the resulting embedded representations of the drug-target pairs to obtain a matrix E;
S2. constructing a mapping attention matrix M for the matrix E obtained in step S1 through a deep neural network;
S3. constructing an attention-enhanced representation matrix [Figure FDA0003149130970000031] from the matrix M obtained in step S2;
2.4) inputting the representation matrix [Figure FDA0003149130970000032] obtained in step S3 into a deep neural network model as the input layer to obtain the predicted drug-target interaction;
2.5) comparing the predicted drug-target interaction obtained in step 2.4) with the real interaction between the drug and the target, and updating the weights of the model through back propagation to obtain the trained drug-target pair interaction prediction model;
3) predicting drug-target interactions by using the drug-target pair interaction prediction model trained in step 2).
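Steps a1 to a3, b1 to b3 and the concatenation in step 2.2) can be sketched as follows. The formula images for a(d_x, t_p) and a(t_p, d_x) are not reproduced in this text, so the form used here is an assumption: each directional representation is taken to be the similarity values of the first n neighbors that remain after filtering, zero-padded to length n. Excluding the query itself from its own neighbor list is a further assumption, and all function names, matrices and the parameter `n` are hypothetical.

```python
def directional_embedding(query, partner, sim, interacts, n):
    """Sketch of one direction of the neighbor attention embedding.

    sim[query][k]         -- similarity between `query` and entity k of the same kind
    interacts(k, partner) -- True if neighbor k is known to interact with `partner`
    """
    # a1 / b1: rank all other entities by similarity to the query, high to low
    neighbors = sorted(
        (k for k in range(len(sim)) if k != query),
        key=lambda k: sim[query][k],
        reverse=True,
    )
    # a2 / b2: eliminate neighbors with no known interaction with the partner
    kept = [k for k in neighbors if interacts(k, partner)]
    # a3 / b3 (assumed form): similarities of the first n remaining keys
    values = [sim[query][k] for k in kept[:n]]
    return values + [0.0] * (n - len(values))


def pair_embedding(x, p, drug_sim, target_sim, A, n):
    """e(d_x, t_p) = [a(d_x, t_p) || a(t_p, d_x)], with A[d][t] in {0, 1}."""
    a_dt = directional_embedding(x, p, drug_sim, lambda k, t: A[k][t] == 1, n)
    a_td = directional_embedding(p, x, target_sim, lambda k, d: A[d][k] == 1, n)
    return a_dt + a_td


# Toy data: 3 drugs, 2 targets (values are made up).
drug_sim = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.5], [0.3, 0.5, 1.0]]
target_sim = [[1.0, 0.6], [0.6, 1.0]]
A = [[0, 1], [1, 0], [1, 1]]
e = pair_embedding(0, 0, drug_sim, target_sim, A, n=2)
```

In this toy case, drugs 1 and 2 both interact with target 0, so the drug-side half of `e` holds their similarities to drug 0; the target-side half holds the similarity of target 1 to target 0 followed by one padding zero.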
2. The method for predicting a drug-target interaction according to claim 1, wherein:
in step 2.1), the similarity between every two drug molecules is calculated from the collected structure information of the drug molecules by the SIMCOMP method;
and the similarity between every two target proteins is calculated from the collected sequence information of the target proteins by the Smith-Waterman algorithm.
3. The method for predicting a drug-target interaction according to claim 2, wherein:
the SIMCOMP method is as follows:
SIMCOMP provides a global similarity score based on the size of the common substructure between two pharmaceutical compounds using a graph alignment algorithm, wherein the similarity s (c, c ') of compounds c and c' is calculated as follows:
Figure FDA0003149130970000041
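The formula image for claim 3 is not reproduced in this text. For orientation only, the SIMCOMP global score as it is commonly stated in the drug-target interaction literature is a Jaccard-style ratio over the common substructure found by graph alignment; whether the claim's image uses exactly this notation is an assumption.

```latex
s(c, c') = \frac{\lvert c \cap c' \rvert}{\lvert c \cup c' \rvert}
```

Here \lvert c \cap c' \rvert denotes the size of the common substructure of compounds c and c', and \lvert c \cup c' \rvert the size of their union, so the score lies between 0 and 1.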
4. The method for predicting a drug-target interaction according to claim 2, wherein:
the Smith-Waterman algorithm is specifically as follows:
two target sequences to be aligned are defined as A = a_1 a_2 a_3 … a_n and B = b_1 b_2 b_3 … b_m, wherein n and m are the lengths of sequences A and B, respectively;
determining parameters:
s is the score when the elements constituting the sequences are identical;
W_k is the gap penalty for a gap of length k;
creating a score matrix H of size (n + 1) × (m + 1) and initializing its first row and first column;
scoring from left to right and top to bottom, filling the remainder of the score matrix H, where:
Figure FDA0003149130970000042
selecting the item with the highest score in the score matrix H, namely the matching score of the sequence A and the sequence B, and marking as SW (A, B);
the similarity between sequence a and sequence B is:
Figure FDA0003149130970000051
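The two formula images in claim 4 are not reproduced in this text. The sketch below therefore uses the textbook Smith-Waterman recurrence with simplifying assumptions that are not taken from the claim: a linear gap penalty W_k = k * gap, a fixed match and mismatch score instead of a substitution matrix, and the commonly used normalization SW(A, B) / sqrt(SW(A, A) * SW(B, B)) for the final similarity. Function and parameter names are hypothetical.

```python
import math


def smith_waterman(a, b, match=2, mismatch=-1, gap=1):
    """Best local alignment score SW(a, b) under a linear gap penalty."""
    n, m = len(a), len(b)
    # Score matrix of size (n + 1) x (m + 1); first row and column stay 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):          # top to bottom
        for j in range(1, m + 1):      # left to right
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + s,    # align a_i with b_j
                H[i - 1][j] - gap,      # gap in b
                H[i][j - 1] - gap,      # gap in a
            )
            best = max(best, H[i][j])
    return best


def normalized_similarity(a, b, **scoring):
    """Assumed normalization; requires non-empty sequences."""
    self_a = smith_waterman(a, a, **scoring)
    self_b = smith_waterman(b, b, **scoring)
    return smith_waterman(a, b, **scoring) / math.sqrt(self_a * self_b)


score = smith_waterman("ACGT", "CG")
similarity = normalized_similarity("ACGT", "CG")
```

With a linear penalty the maximum over all gap lengths k collapses to the two single-step gap terms above. For real protein sequences a substitution matrix would normally replace the fixed match and mismatch scores.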
5. The method for predicting a drug-target interaction according to any one of claims 1 to 4, wherein:
in step S2, the mapping attention matrix M is constructed as follows:
M(:, i) = DNN_i(E).
6. The method for predicting a drug-target interaction according to claim 5, wherein:
in step S3, the attention-enhanced representation matrix [Figure FDA0003149130970000052] is constructed as follows:
Figure FDA0003149130970000053
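Claims 5 and 6 can be illustrated with a small sketch. Claim 5 gives M(:, i) = DNN_i(E), i.e. column i of M is produced by its own network applied to E. The formula image in claim 6 is not reproduced here, so the enhancement step below assumes an element-wise product of M and E, which is one common way to apply an attention mask. Each DNN_i is replaced by a hypothetical stand-in (one linear layer with a sigmoid) with random, untrained weights.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_dnn(n_features, rng):
    """Hypothetical stand-in for DNN_i: one linear layer plus sigmoid."""
    w = [rng.uniform(-1, 1) for _ in range(n_features)]
    b = rng.uniform(-1, 1)

    def dnn(E):
        # one attention score per row (drug-target pair) of E
        return [sigmoid(sum(wk * ek for wk, ek in zip(w, row)) + b) for row in E]

    return dnn


def attention_enhance(E, dnns):
    # Claim 5: M(:, i) = DNN_i(E), one column per feature
    columns = [dnn(E) for dnn in dnns]
    M = [list(row) for row in zip(*columns)]
    # Assumed form of claim 6: element-wise product of M and E
    E_hat = [[m * e for m, e in zip(m_row, e_row)] for m_row, e_row in zip(M, E)]
    return M, E_hat


rng = random.Random(0)
E = [[0.8, 0.3, 0.6, 0.0], [0.5, 0.0, 0.6, 0.2]]   # made-up pair embeddings
dnns = [make_dnn(4, rng) for _ in range(4)]
M, E_hat = attention_enhance(E, dnns)
```

Averaging each column of M over all pairs would give one importance value per feature, which is the kind of vector compared across deletion rates in the description.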
7. The method for predicting a drug-target interaction according to claim 6, wherein:
in step 2.4), the deep neural network model comprises an input layer, a hidden layer with ReLU as the activation function, and an output layer of two neurons with Sigmoid as the activation function.
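The layer structure named in claim 7 can be sketched as a forward pass. This is only an illustration under assumptions: the hidden layer width, the weight initialization and the helper names are hypothetical, the weights are untrained, and the text does not state what each of the two output neurons encodes.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, W, b, activation):
    """Fully connected layer: activation(W x + b), one row of W per neuron."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def init_layer(n_in, n_out, rng):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


def predict(x, params):
    """Input layer -> ReLU hidden layer -> two Sigmoid output neurons."""
    (W1, b1), (W2, b2) = params
    hidden = dense(x, W1, b1, relu)
    return dense(hidden, W2, b2, sigmoid)


rng = random.Random(0)
n_features, n_hidden = 4, 8                      # hypothetical sizes
params = (init_layer(n_features, n_hidden, rng), init_layer(n_hidden, 2, rng))
output = predict([0.6, 0.1, 0.3, 0.0], params)   # one row of the enhanced matrix
```

Training as in step 2.5) would compare `output` with the known interaction label and adjust `params` by back propagation; that part is omitted here.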
8. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.
9. An electronic device, characterized in that: including a processor and a computer-readable storage medium;
the computer-readable storage medium has stored thereon a computer program which, when being executed by the processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110759813.9A 2021-07-06 2021-07-06 Drug-target interaction prediction method based on neighbor attention network Active CN113421658B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110759813.9A CN113421658B (en) 2021-07-06 2021-07-06 Drug-target interaction prediction method based on neighbor attention network

Publications (2)

Publication Number Publication Date
CN113421658A true CN113421658A (en) 2021-09-21
CN113421658B CN113421658B (en) 2023-06-16

Family

ID=77721466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110759813.9A Active CN113421658B (en) 2021-07-06 2021-07-06 Drug-target interaction prediction method based on neighbor attention network

Country Status (1)

Country Link
CN (1) CN113421658B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114765060A (en) * 2021-01-13 2022-07-19 四川大学 Multi-attention method for predicting drug target interaction
CN116246697A (en) * 2023-05-11 2023-06-09 上海微观纪元数字科技有限公司 Target protein prediction method and device for medicines, equipment and storage medium
CN117912591A (en) * 2024-03-19 2024-04-19 鲁东大学 Kinase-drug interaction prediction method based on deep contrast learning
CN117912591B (en) * 2024-03-19 2024-05-31 鲁东大学 Kinase-drug interaction prediction method based on deep contrast learning

Also Published As

Publication number Publication date
CN113421658B (en) 2023-06-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20240304
Address after: Room 21501, Unit 2, Building 1, Oak Tree Constellation, Keji Fifth Road, High tech Zone, Xi'an City, Shaanxi Province, 710065
Patentee after: Shaanxi Exquisite Technology Development Co.,Ltd.
Country or region after: China
Address before: 710072 No. 127 Youyi West Road, Shaanxi, Xi'an
Patentee before: Northwestern Polytechnical University
Country or region before: China