CN111243659A - Drug interaction prediction method based on drug multidimensional similarity - Google Patents


Info

Publication number
CN111243659A
Authority
CN
China
Prior art keywords
drug
similarity
target
pair
calculating
Legal status
Pending
Application number
CN201811441665.0A
Other languages
Chinese (zh)
Inventor
陈迪
朴海龙
Current Assignee
Dalian Institute of Chemical Physics of CAS
Original Assignee
Dalian Institute of Chemical Physics of CAS
Application filed by Dalian Institute of Chemical Physics of CAS
Priority to CN201811441665.0A
Publication of CN111243659A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention discloses a drug interaction prediction method based on drug multidimensional similarity, comprising the following steps: calculating the multidimensional similarity between two drugs from their features in several aspects, namely compound molecular descriptors, two-dimensional structures, drug targets, ATC codes, the pathways in which the targets are located, and the neighbor nodes of the targets in the protein interaction network; and constructing, from these multidimensional drug similarity features, an SVM classifier based on the Mahalanobis distance to distinguish synergistic, antagonistic and independent drug interactions.

Description

Drug interaction prediction method based on drug multidimensional similarity
Technical Field
The present invention relates to the field of bioinformatics, and in particular to the prediction of drug interactions by computer.
Background
Traditional drug development mainly targets single compounds acting on a single target. Complex diseases, however, often involve complex biological processes, and a one-to-one drug-target mode of action fails to achieve a marked therapeutic effect for many of them. Clinical experience shows that a rational drug combination can not only improve efficacy but also reduce toxic and side effects, offering a new approach to treating complex diseases: combination drugs. A combination drug consists of two or more active drug components; the components usually act on different targets and regulate different pathological processes, achieving higher efficacy at lower doses while reducing toxic and side effects. Combination drugs are increasingly used to treat complex diseases such as cancer, AIDS, hypertension and pulmonary tuberculosis.
Unlike randomly combined drugs, the components of a combination drug interact synergistically. A synergistic combination enhances efficacy and reduces toxicity: the effect of the combination exceeds the sum of the effects of the individual drugs used alone. The opposite type of drug interaction is antagonism, in which the combination reduces efficacy and increases toxic and side effects. Whether the interaction is antagonistic or synergistic, the drugs involved are related pharmacologically and pharmacokinetically. Rationally designing combination drugs for complex diseases, by exploiting synergistic combinations while avoiding antagonistic ones, can greatly improve treatment outcomes.
Existing combination drugs come mainly from clinical experience. Because the number of possible combinations grows exponentially with the number of drugs, this combinatorial explosion makes exhaustive clinical or experimental study practically impossible. Developing effective computational prediction methods has therefore become a necessary direction in combination-drug discovery: predicting the type of a drug combination computationally can guide experimental research and advance the development of combination drugs.
Researchers have proposed a variety of computational methods for analyzing drug combinations or drug interactions. A common approach is to build classification models on various similarity measures between drugs. For example, Zou et al. analyzed the topological properties of the neighborhood communities of drug targets in the protein-protein interaction (PPI) network and the semantic similarity of the associated Gene Ontology (GO) terms, built an SVM classifier, and predicted whether a drug combination could serve as an effective combination drug. However, analyzing drug combinations solely through the properties of their targets ignores much drug-related information. Gottlieb et al. considered more comprehensive drug features, integrating seven aspects (ATC codes, distances between drug targets in the PPI network, GO terms of the targets, target sequences, compound structures, drug side effects and ligand structures) to build a drug interaction classification model. Existing classification models nevertheless have limitations. First, they do not consider how different drugs are associated through pathways, although effective combination drugs usually achieve synergy by perturbing associated pathways. Second, they mainly classify whether an interaction exists or whether a pair can serve as a combination drug, and none of them distinguishes well between synergistic and antagonistic interactions.
When building a classifier, data sets with high-dimensional features usually require dimensionality reduction before classification. In 2008, Shiming et al. proposed a heuristic method for learning a Mahalanobis distance matrix; using the learned matrix in classification and clustering effectively improves the accuracy of existing methods. This dimensionality-reduction approach offers a useful reference for predicting the category of a drug interaction.
Disclosure of Invention
(I) Technical problem to be solved
The invention aims to provide a drug interaction prediction method based on drug multidimensional similarity, which integrates drug features from multiple aspects to calculate drug similarity and constructs a Mahalanobis-distance-based SVM (support vector machine) classifier to classify drug interactions into three types: synergistic, antagonistic and independent.
(II) Technical solution
In order to solve the above technical problems, the present invention provides a prediction method of drug interaction based on multidimensional drug similarity, comprising:
First, constructing an SVM classifier based on the Mahalanobis distance from the multidimensional drug similarity features:
step 1: obtaining drug pairs of known drug interaction types;
step 2: calculating the multidimensional drug similarity for each drug pair;
step 3: learning a Mahalanobis distance transformation matrix from the multidimensional drug similarity measures and the known drug interaction classes;
step 4: constructing an SVM classifier for the three classes of drug interaction from the Mahalanobis-transformed data.
Second, predicting the interaction between every two drugs among any plurality of candidate drugs using the constructed SVM classifier:
for any two of the candidate drugs whose targets, two-dimensional structures and ATC codes are recorded in the DrugBank database, calculating the multidimensional similarity between the two drugs by the same method as in step 2; transforming this multidimensional similarity with the Mahalanobis distance matrix obtained in step 3 and inputting the result into the classifier of step 4 to obtain the probability that the two drugs belong to each drug interaction type, wherein the probabilities of the three classes sum to 1 and the class with the largest probability is taken as the predicted drug interaction class; that is, the two input drugs are assigned to one of the synergistic, antagonistic and independent interaction types.
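The final decision rule above can be sketched as follows. This is a minimal, hypothetical helper, assuming the classifier returns one probability per class in a fixed order; in scikit-learn such probabilities would typically come from `SVC(probability=True).predict_proba`, with the column order given by the fitted model's `classes_` attribute. The class names and the tolerance are illustrative choices, not taken from the patent.

```python
# Hypothetical class order; a fitted sklearn model reports its own order in classes_.
CLASSES = ("synergistic", "antagonistic", "independent")


def predict_interaction(probabilities, tol=1e-6):
    """Return (class, probability) for the most probable interaction type.

    probabilities: three floats, one per class in CLASSES order.
    """
    if len(probabilities) != len(CLASSES):
        raise ValueError("expected one probability per interaction class")
    if abs(sum(probabilities) - 1.0) > tol:
        raise ValueError("class probabilities must sum to 1")
    # Index of the largest probability decides the predicted class.
    best = max(range(len(CLASSES)), key=lambda k: probabilities[k])
    return CLASSES[best], probabilities[best]


print(predict_interaction([0.15, 0.80, 0.05]))  # ('antagonistic', 0.8)
```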
The step 2 comprises the following steps:
step 21: calculating drug similarity based on the molecular descriptors of the drugs;
step 22: calculating drug similarity based on the two-dimensional structure of the drug;
step 23: calculating drug similarity based on the drug target;
step 24: calculating drug similarity based on drug ATC codes;
step 25: calculating drug similarity based on the pathway in which the drug target is located;
step 26: calculating drug similarity based on neighbor nodes of drug targets in a protein interaction network;
step 27: multi-dimensional drug similarity is obtained by integrating drug similarity measurement results based on different characteristics.
The step 3 comprises the following steps:
step 31: constructing the must-link matrix L_s(i,j):
L_s(i,j) = \begin{cases} 1, & (pair_i, pair_j) \in S \\ 0, & \text{otherwise} \end{cases}
where pair_i and pair_j denote two different drug interaction pairs, and (pair_i, pair_j) ∈ S means that the two drug pairs have the same drug interaction type. That is, when pair_i and pair_j belong to the same drug interaction type, the entry in row i, column j of the matrix is 1; otherwise it is 0;
constructing the cannot-link matrix L_d(i,j):
L_d(i,j) = \begin{cases} 1, & (pair_i, pair_j) \in D \\ 0, & \text{otherwise} \end{cases}
where (pair_i, pair_j) ∈ D means that the two drug pairs belong to different interaction types. That is, when pair_i and pair_j belong to the same drug interaction type, the entry in row i, column j of the matrix is 0; otherwise it is 1;
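step 32: computing the covariance matrices S_s and S_d from L_s and L_d respectively:
S_s = X\left(\mathrm{diag}(L_s\mathbf{1}) - L_s\right)X^T, \quad S_d = X\left(\mathrm{diag}(L_d\mathbf{1}) - L_d\right)X^T
where X is the drug interaction pair feature matrix, each column of which corresponds to one drug interaction pair and each row to one similarity measure, \mathbf{1} is the all-ones vector, and diag(·) forms a diagonal matrix from a vector;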
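step 33: learning the Mahalanobis distance transformation matrix with the Mahalanobis distance learning method:
W^* = \arg\max_{W^T W = I} \frac{\mathrm{tr}(W^T S_d W)}{\mathrm{tr}(W^T S_s W)}
where S_s and S_d are the covariance matrices obtained from L_s and L_d in step 32, and tr denotes the trace of a matrix.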
The step 4 comprises the following steps:
step 41: applying the Mahalanobis distance transformation to the original data: X' = W^{*T} X;
Step 42: and constructing an SVM classifier aiming at drug interaction classification based on the data set after the Mahalanobis distance transformation.
(III) advantageous effects
The invention provides a drug interaction prediction method based on drug multidimensional similarity. By integrating similarity measures of drugs from several aspects (molecular descriptors, two-dimensional structures, targets, ATC codes, pathways, and the neighbor nodes of targets in the protein interaction network), it comprehensively describes the associations between drugs and provides a basis for classifying drug interactions and explaining their mechanisms. The Mahalanobis-distance-based SVM classifier improves classification accuracy and reliably predicts the probability that a drug pair belongs to each interaction type.
Drawings
FIG. 1 is a schematic diagram of a method for predicting drug interaction based on multi-dimensional similarity of drugs according to the present invention;
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
First, constructing an SVM classifier based on the Mahalanobis distance from the multidimensional drug similarity features.
FIG. 1 is a schematic diagram of the method for predicting drug interaction based on multidimensional drug similarity. As shown in FIG. 1, the invention predicts three types of drug interaction: synergistic, antagonistic and independent, where an independent interaction means that the two drugs interact neither synergistically nor antagonistically. FIG. 1 shows four steps, from top to bottom:
Step 1: acquiring drug pairs of different interaction types;
this step provides training and test samples for the subsequent classification model. Equal numbers of synergistic, antagonistic and independent drug interaction pairs are collected to keep the samples balanced and the computation tractable, and the two-dimensional structures, targets and ATC codes of all drugs are required to be known.
Step 2: calculating the multidimensional drug similarity;
this step calculates the similarity between drugs from several aspects, namely molecular descriptors, two-dimensional structures, targets, ATC codes, pathways, and the neighbor nodes of targets in the protein interaction network, comprehensively describing the associations between drugs and providing a basis for classifying drug interactions and explaining their mechanisms.
Step 3: learning the Mahalanobis distance matrix;
to improve classification performance, a Mahalanobis distance transformation matrix is learned from the sample data such that, in the Mahalanobis distance space, drug pairs of the same interaction type are as close as possible and drug pairs of different types are as far apart as possible.
Step 4: constructing the SVM classifier based on the Mahalanobis distance;
the original data are transformed with the Mahalanobis distance transformation matrix, and the SVM classifier is trained and tested on the transformed data, which improves classification accuracy.
Second, predicting the interaction between every two drugs among any plurality of candidate drugs using the constructed SVM classifier:
for any two of the candidate drugs whose targets, two-dimensional structures and ATC codes are recorded in the DrugBank database, calculating the multidimensional similarity between the two drugs by the same method as in step 2; transforming this multidimensional similarity with the Mahalanobis distance matrix obtained in step 3 and inputting the result into the classifier of step 4 to obtain the probability that the two drugs belong to each drug interaction type, wherein the probabilities of the three classes sum to 1 and the class with the largest probability is taken as the predicted drug interaction class; that is, the two input drugs are assigned to one of the synergistic, antagonistic and independent interaction types.
The individual steps involved are described in detail below.
Step 1: acquiring drug pairs of different interaction types;
the step 1 comprises the following steps:
step 11: obtaining from the DCDB database 179 drug pairs whose drugs have known two-dimensional structures, targets and ATC codes, as the set of synergistic drug interaction pairs, e.g., amitriptyline and lissamine, lapatinib and topotecan;
step 12: randomly selecting, from the drug interactions recorded in the DrugBank database, 179 drug pairs with known antagonistic effects whose drugs have known two-dimensional structures, targets and ATC codes, as the set of antagonistic drug interaction pairs, e.g., tacrolimus and imatinib, vorinostat and chlorpromazine;
step 13: randomly drawing 179 drug pairs that are recorded neither in the DCDB database nor among the DrugBank drug interactions, ensuring that the two-dimensional structure, targets and ATC codes of each drug are available from the DrugBank database, as the set of independent drug interaction pairs, e.g., amprenavir and clockie, amobarbital and rasagiline.
All collected drugs are recorded in the DrugBank and DCDB databases, and the two-dimensional structure, targets and ATC codes of each drug are known. Two-dimensional structures are described by the DrugBank sdf file; target sets come from the DrugBank and DCDB databases and are uniformly labeled by the Entrez Gene IDs of the drug targets; ATC codes come from the DrugBank database.
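Step 13 draws independent pairs at random while excluding every pair already recorded as synergistic or interacting. A minimal sketch, assuming drugs are identified by hypothetical string IDs and that `known_pairs` is the union of the DCDB combinations and the DrugBank interaction records; the function name and seed are illustrative.

```python
import itertools
import random


def sample_independent_pairs(drugs, known_pairs, n_pairs, seed=0):
    """Randomly draw n_pairs unordered drug pairs absent from known_pairs.

    drugs: drug identifiers whose structure, targets and ATC codes are known.
    known_pairs: 2-tuples recorded in DCDB or as DrugBank interactions;
        the order of the two drugs within a pair is ignored.
    """
    known = {frozenset(pair) for pair in known_pairs}
    # Enumerate every unordered pair once and keep only unrecorded ones.
    candidates = [pair for pair in itertools.combinations(sorted(set(drugs)), 2)
                  if frozenset(pair) not in known]
    if n_pairs > len(candidates):
        raise ValueError("not enough unrecorded pairs to sample from")
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return rng.sample(candidates, n_pairs)


pairs = sample_independent_pairs(["a", "b", "c", "d"], [("b", "a"), ("c", "d")], 2)
print(pairs)
```

Enumerating all pairs is quadratic in the number of drugs; for a database the size of DrugBank, rejection sampling of random pairs would be a cheaper alternative.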
The step 2 comprises the following steps:
step 21: calculating drug similarity based on the molecular descriptors of the drugs;
molecular descriptors describe the chemical properties of small-molecule compounds from many angles. Taking the two-dimensional structure of each drug as input, the molecular descriptors of each small-molecule drug are calculated with the cdk.qsar.descriptors.molecular package of the Chemistry Development Kit (CDK); descriptors that are 0 or cannot be calculated for more than 90% of the molecules are removed, leaving 112 descriptors in total. For each drug pair in the three sets of drug interaction pairs, the similarity MDsS between the molecular descriptors of the two drugs is calculated as follows:
[formula image not reproduced]
where d_1 and d_2 denote the two drugs of a drug pair, and MD_1^d to MD_n^d denote the n molecular descriptors of drug d.
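The descriptor filter in step 21 can be sketched in plain Python. This assumes descriptor values are held in a dictionary of per-molecule lists, with `None` or NaN marking values that could not be calculated, and reads "0 or cannot be calculated" as a single combined count; both are assumptions, and the descriptor names are hypothetical. The MDsS formula itself is not recoverable from the text, so the sketch stops at filtering.

```python
import math


def is_unusable(value):
    """True if a descriptor value is missing, NaN or exactly zero."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == 0


def filter_descriptors(table, max_bad_fraction=0.9):
    """Drop descriptors that are unusable for more than max_bad_fraction of molecules.

    table: dict mapping descriptor name -> list of values, one per molecule.
    """
    kept = {}
    for name, values in table.items():
        bad = sum(1 for v in values if is_unusable(v))
        if bad / len(values) <= max_bad_fraction:
            kept[name] = values
    return kept


demo = {
    "informative": [float(i) for i in range(1, 11)],  # never zero -> kept
    "allZero": [0.0] * 10,                            # zero everywhere -> dropped
    "mostlyMissing": [None] * 10,                     # never computed -> dropped
    "borderline": [0.0] * 9 + [1.5],                  # exactly 90% zero -> kept
}
print(sorted(filter_descriptors(demo)))  # ['borderline', 'informative']
```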
Step 22: calculating drug similarity based on the two-dimensional structure of the drug;
drugs with similar compound structures generally have similar pharmacological and pharmacokinetic properties. Compound fingerprints are calculated from the two-dimensional structures of the drugs with the cdk.fingerprint module of the CDK toolkit, and the fingerprint similarity between two drugs is calculated as the Tanimoto coefficient using CDK.
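For binary fingerprints the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number set in either. The sketch below shows only that arithmetic, assuming fingerprints are given as sets of on-bit indices; in the method itself CDK computes both the fingerprints and the coefficient, and the toy bit indices are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two binary fingerprints.

    Each fingerprint is given as the set of indices of its 'on' bits.
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as dissimilar
    return len(a & b) / union


print(tanimoto({1, 4, 7, 9}, {1, 4, 8}))  # 0.4
```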
Step 23: calculating drug similarity based on the drug target;
the step 23 specifically includes:
step 231: obtaining the target set of each drug from the DrugBank database;
step 232: calculating the drug similarity geneS1 as the ratio of the number of targets shared by the two drugs to the size of the union of their targets:
geneS1(d_1, d_2) = \frac{|T_{d_1} \cap T_{d_2}|}{|T_{d_1} \cup T_{d_2}|}
where d_1 and d_2 denote any two drugs and T_d denotes the target set of drug d;
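geneS1 is the Jaccard index of the two target sets; the same function serves neiS1 in step 262 once neighbor-node sets are substituted. A minimal sketch; the gene identifiers below are placeholders rather than real Entrez Gene IDs, and returning 0 when both sets are empty is an assumption.

```python
def jaccard(set_a, set_b):
    """|A ∩ B| / |A ∪ B|, used for geneS1 (targets) and neiS1 (neighbor nodes)."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0  # assumption: no targets at all gives similarity 0
    return len(a & b) / len(union)


targets_d1 = {"gene_1", "gene_2", "gene_3"}   # hypothetical target set of d1
targets_d2 = {"gene_2", "gene_3", "gene_4"}   # hypothetical target set of d2
print(jaccard(targets_d1, targets_d2))  # 0.5
```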
step 233: calculating drug similarity from the Gene Ontology (GO) semantic similarity of the drug targets;
the GO term sets corresponding to the target sets of the two drugs in each pair are obtained with the pyGS2 package for Python, and the average semantic similarity between the two GO sets, computed with the GS2 function of pyGS2, is recorded as geneS2;
step 234: calculating the drug similarity geneS3 from the shortest path lengths between targets in the protein interaction network:
[formula image not reproduced]
where T_{d_1} and T_{d_2} denote the target sets of drugs d_1 and d_2, t_1 and t_2 are any members of the respective target sets, and d(t_1, t_2) denotes the shortest path length between targets t_1 and t_2 in the PPI network. The PPI network is downloaded from the HPRD database, and shortest path lengths are computed with the shortest_path_length function of the networkx package in Python.
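The exact geneS3 formula is lost with the equation image, so the sketch below only illustrates the ingredients. Breadth-first search gives the unweighted shortest path length d(t1, t2), which is what networkx's `shortest_path_length` returns on an unweighted graph; the aggregation over target pairs, the mean of exp(-d), is purely an assumption for illustration. The toy PPI adjacency list is hypothetical.

```python
import math
from collections import deque


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source via breadth-first search.

    graph: dict mapping protein -> list of interacting proteins (both directions listed).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def target_path_similarity(graph, targets_1, targets_2):
    """ASSUMED aggregation: mean of exp(-d(t1, t2)) over all target pairs.

    Target pairs with no connecting path contribute 0.
    """
    scores = []
    for t1 in targets_1:
        dist = bfs_distances(graph, t1)
        for t2 in targets_2:
            d = dist.get(t2)
            scores.append(math.exp(-d) if d is not None else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(bfs_distances(ppi, "A"))                        # {'A': 0, 'B': 1, 'C': 2}
print(target_path_similarity(ppi, {"A"}, {"C"}))      # exp(-2)
```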
Step 24: calculating drug similarity based on drug ATC codes;
the Anatomical Therapeutic Chemical (ATC) classification system is the official drug classification system of the World Health Organization. It has five levels and classifies and codes drugs according to the organ or system on which they act and their therapeutic and chemical properties. The similarity atcS_k between the ATC codes of two drugs at the k-th level is defined as:
atcS_k(d_1, d_2) = \frac{|atc_k(d_1) \cap atc_k(d_2)|}{|atc_k(d_1) \cup atc_k(d_2)|}
where atc_k(d) denotes all ATC codes of drug d at the k-th level of the ATC system. Since each drug has a five-level code, the ATC code similarity of two drugs is defined as:
[formula image not reproduced]
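The level-k component of an ATC code is a fixed-length prefix: 1 character for level 1, 3 for level 2, 4 for level 3, 5 for level 4 and the full 7-character code for level 5. The sketch computes atcS_k as defined above and then combines the five levels; because the combining formula is not legible in the source, the unweighted mean used here is an assumption. The example codes are illustrative (B01AC06 and N02BA01 are listed for acetylsalicylic acid, B01AC04 for clopidogrel).

```python
# Length of the code prefix that identifies each ATC level (levels 1..5).
ATC_PREFIX_LENGTHS = (1, 3, 4, 5, 7)


def atc_level_codes(codes, k):
    """atc_k(d): the set of level-k prefixes of all ATC codes of a drug."""
    return {code[:ATC_PREFIX_LENGTHS[k - 1]] for code in codes}


def atc_level_similarity(codes_1, codes_2, k):
    """atcS_k: Jaccard index of the level-k code sets of two drugs."""
    a, b = atc_level_codes(codes_1, k), atc_level_codes(codes_2, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def atc_similarity(codes_1, codes_2):
    """ASSUMPTION: combine the five levels with an unweighted mean."""
    return sum(atc_level_similarity(codes_1, codes_2, k) for k in range(1, 6)) / 5


aspirin = {"B01AC06", "N02BA01"}
clopidogrel = {"B01AC04"}
print([atc_level_similarity(aspirin, clopidogrel, k) for k in range(1, 6)])
# [0.5, 0.5, 0.5, 0.5, 0.0]
print(atc_similarity(aspirin, clopidogrel))  # 0.4
```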
step 25: calculating drug similarity based on the pathway in which the drug target is located;
the similarity between drugs is measured by the correlation between the pathways in which the drug targets are located. The pathway effect of a drug is described by the influence of its targets on all known pathways, and pathway similarity is obtained by comparing how similar the pathway effects of different drugs are.
The step 25 specifically includes:
step 251: obtaining the information on all human pathways from the KEGG database and, on this basis, defining a pathway profile for each drug:
P_d = (p_d^1, p_d^2, \ldots, p_d^N)
where N is the number of pathways in the KEGG database, and p_d^i is the score of the target set of drug d on the i-th pathway, with i an integer from 1 to N;
step 252: calculating the first form of p_d^i from the overlap ratio between the drug's targets and the proteins contained in the pathway:
[formula image not reproduced]
where T_d denotes the targets of drug d and Ps_i denotes the proteins contained in the i-th pathway; a pathway profile is calculated for each drug on this basis;
step 253: calculating the second form of p_d^i from the GO similarity between the GO terms of the drug's targets and the GO terms of the pathway: the GO term set of the target set of drug d and the GO term set of all proteins in pathway i are obtained with the pyGS2 package, and their average semantic similarity gives p_d^i; a pathway profile is calculated for each drug on this basis;
step 254: after the pathway profile of each drug is calculated, the similarity between two drugs is expressed as the Pearson correlation coefficient between their pathway profiles. The two ways of calculating p_d^i yield two pathway-based similarity measures.
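Steps 251 to 254 can be sketched as follows. The first-form score is written here as the fraction of a pathway's proteins that are targets of the drug; the source formula is not legible, so this choice of denominator is an assumption. The Pearson correlation between two profiles then gives the pathway similarity. The pathway memberships are hypothetical, not KEGG data.

```python
import math


def overlap_profile(targets, pathways):
    """ASSUMED first-form profile: share of each pathway's proteins hit by the targets.

    pathways: list of protein lists, one per pathway (N entries).
    """
    t = set(targets)
    return [len(t & set(proteins)) / len(proteins) if proteins else 0.0
            for proteins in pathways]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / (sx * sy)


pathways = [["g1", "g2"], ["g3", "g4"], ["g5", "g6", "g7", "g8"]]  # hypothetical
p1 = overlap_profile({"g1", "g3"}, pathways)          # [0.5, 0.5, 0.0]
p2 = overlap_profile({"g1", "g4", "g5"}, pathways)    # [0.5, 0.5, 0.25]
print(p1, p2, round(pearson(p1, p2), 6))
```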
Step 26: calculating drug similarity based on neighbor nodes of drug targets in the protein interaction network;
studies have shown that the mechanism of action of combination drugs is closely related to the neighbor nodes of the drug targets in the PPI network. The method therefore obtains, from the PPI network, the neighbor nodes of each drug's target set and calculates similarities based on these neighbor nodes.
Preferably, said step 26 comprises:
step 261: determining the neighbor nodes of the drug targets in the PPI network from the PPIs recorded in the HPRD database;
step 262: replacing the target sets T_{d_1} and T_{d_2} in step 232 with the neighbor node sets Nei_{d_1} and Nei_{d_2}, and calculating the drug similarity from the shared ratio of target neighbor nodes, recorded as neiS1:
neiS1(d_1, d_2) = \frac{|Nei_{d_1} \cap Nei_{d_2}|}{|Nei_{d_1} \cup Nei_{d_2}|}
step 263: obtaining the GO sets corresponding to the neighbor node sets Nei_{d_1} and Nei_{d_2} with the pyGS2 package for Python, and calculating the average semantic similarity between the two GO sets with the GS2 function of pyGS2 to obtain the drug similarity based on the GO similarity of target neighbor nodes, recorded as neiS2;
step 264: replacing the target sets T_{d_1} and T_{d_2} in step 234 with the neighbor node sets Nei_{d_1} and Nei_{d_2}, and applying the calculation of step 234 to obtain a drug similarity measure based on the shortest path lengths between target neighbor nodes, recorded as neiS3:
[formula image not reproduced]
where t_1 and t_2 are any members of the respective neighbor node sets and d(t_1, t_2) denotes their shortest path length in the PPI network.
step 265: replacing the target set T_d in step 25 with the neighbor node set Nei_d, and applying the two calculation forms of step 25 to obtain two further drug similarity measures based on the pathway associations of target neighbor nodes, recorded as neiS4 and neiS5 respectively.
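Steps 261 and 262 amount to replacing each target set with the union of its targets' PPI neighbors before reusing the step 232 ratio. A minimal sketch on a hypothetical four-protein network stored as an adjacency list (standing in for HPRD); whether the targets themselves belong to Nei_d is not stated in the text, so that detail is an assumption.

```python
def neighbor_set(graph, targets):
    """Nei_d: union of the PPI neighbors of all targets of a drug.

    ASSUMPTION: targets are not removed from the result, so a target that
    neighbors another target of the same drug is kept.
    """
    neighbors = set()
    for target in targets:
        neighbors.update(graph.get(target, ()))
    return neighbors


def jaccard(set_a, set_b):
    """|A ∩ B| / |A ∪ B|; 0 when both sets are empty (assumption)."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


ppi = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
nei_1 = neighbor_set(ppi, {"A"})   # {"B", "C"}
nei_2 = neighbor_set(ppi, {"D"})   # {"C"}
print(jaccard(nei_1, nei_2))  # 0.5, i.e. neiS1
```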
Step 27: integrating the drug similarity measures of steps 21 to 26 into a 124-dimensional multidimensional drug similarity;
the similarity comprises 112 dimensions of molecular descriptor similarity, 1 dimension of two-dimensional structure similarity, 3 dimensions of target similarity, 1 dimension of ATC similarity, 2 dimensions of drug pathway similarity and 5 dimensions of target neighbor node similarity, 124 dimensions in total.
Step 3: learning a Mahalanobis distance transformation matrix from the multidimensional drug similarity measures and the known drug interaction classes;
the step 3 comprises the following steps:
step 31: constructing the must-link matrix L_s(i,j):
L_s(i,j) = \begin{cases} 1, & (pair_i, pair_j) \in S \\ 0, & \text{otherwise} \end{cases}
where pair_i and pair_j denote two different drug interaction pairs, and (pair_i, pair_j) ∈ S means that the two drug pairs have the same drug interaction type (synergistic, antagonistic or independent). That is, when pair_i and pair_j belong to the same drug interaction type, the entry in row i, column j of the matrix is 1; otherwise it is 0;
constructing the cannot-link matrix L_d(i,j):
L_d(i,j) = \begin{cases} 1, & (pair_i, pair_j) \in D \\ 0, & \text{otherwise} \end{cases}
where (pair_i, pair_j) ∈ D means that the two drug pairs belong to different interaction types. That is, when pair_i and pair_j belong to the same drug interaction type, the entry in row i, column j of the matrix is 0; otherwise it is 1;
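step 32: computing the covariance matrices S_s and S_d from L_s and L_d respectively:
S_s = X\left(\mathrm{diag}(L_s\mathbf{1}) - L_s\right)X^T
S_d = X\left(\mathrm{diag}(L_d\mathbf{1}) - L_d\right)X^T
where X is the drug interaction pair feature matrix, each column of which corresponds to one drug interaction pair and each row to one similarity measure, X^T denotes the transpose of X, \mathbf{1} is the all-ones vector, and diag(·) forms a diagonal matrix from a vector. Each matrix equals one half of \sum_{i,j} L(i,j)(x_i - x_j)(x_i - x_j)^T, where x_i is the i-th column of X;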
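The must-link and cannot-link matrices, and the covariance (scatter) matrices built from them, can be sketched with NumPy. This assumes the Laplacian-style form X(diag(L·1) − L)X^T for step 32; the toy labels and random features are hypothetical. The check at the end confirms that this form equals half of the pairwise sum of L(i,j)(x_i − x_j)(x_i − x_j)^T, which is why it measures the spread of linked pairs.

```python
import numpy as np


def link_matrices(labels):
    """Return (L_s, L_d) for a vector of interaction-type labels, one per drug pair."""
    y = np.asarray(labels)
    same = (y[:, None] == y[None, :]).astype(float)
    must_link = same.copy()
    np.fill_diagonal(must_link, 0.0)  # a pair is not linked to itself; no effect below
    cannot_link = 1.0 - same          # diagonal is already 0
    return must_link, cannot_link


def link_scatter(X, L):
    """X (diag(L 1) - L) X^T, with X of shape (n_features, n_pairs)."""
    laplacian = np.diag(L.sum(axis=1)) - L
    return X @ laplacian @ X.T


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                # 3 similarity features x 5 drug pairs
L_s, L_d = link_matrices([0, 0, 1, 1, 2])  # hypothetical class labels
S_s, S_d = link_scatter(X, L_s), link_scatter(X, L_d)

# Sanity check: equals half the sum of L(i,j)(x_i - x_j)(x_i - x_j)^T.
pairwise = sum(L_s[i, j] * np.outer(X[:, i] - X[:, j], X[:, i] - X[:, j])
               for i in range(5) for j in range(5))
print(np.allclose(S_s, pairwise / 2))  # True
```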
step 33: learning the Mahalanobis distance transformation matrix with the Mahalanobis distance learning method:
W^* = \arg\max_{W^T W = I} \frac{\mathrm{tr}(W^T S_d W)}{\mathrm{tr}(W^T S_s W)}
where S_s and S_d are the covariance matrices obtained from L_s and L_d in step 32, tr denotes the trace of a matrix, and W^* is the matrix W that maximizes \mathrm{tr}(W^T S_d W) / \mathrm{tr}(W^T S_s W) subject to the product of the transpose of W (W^T) and W being the identity matrix (W^T W = I).
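Step 33 is a trace-ratio problem. The patent does not spell out its solver, so the sketch below swaps in a standard iterative trace-ratio scheme: with the current ratio λ, take W as the top-k eigenvectors of S_d − λS_s, then recompute λ, repeating until it stops changing. The output dimension k, the iteration limits and the random test matrices are assumptions; S_s should be positive definite (in practice a small ridge term can be added) so that the denominator never vanishes.

```python
import numpy as np


def trace_ratio(S_d, S_s, k, n_iter=100, tol=1e-10):
    """Maximize tr(W^T S_d W) / tr(W^T S_s W) over W with W^T W = I (k columns)."""
    def ratio(W):
        return np.trace(W.T @ S_d @ W) / np.trace(W.T @ S_s @ W)

    W = np.eye(S_d.shape[0])[:, :k]          # any orthonormal start works
    lam = ratio(W)
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(S_d - lam * S_s)   # symmetric eigendecomposition
        W = vecs[:, np.argsort(vals)[::-1][:k]]        # top-k eigenvectors
        new_lam = ratio(W)
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return W, lam


def random_spd(n, rng):
    """Random symmetric positive definite matrix for testing."""
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)


rng = np.random.default_rng(1)
S_d, S_s = random_spd(6, rng), random_spd(6, rng)
W_star, lam = trace_ratio(S_d, S_s, k=2)
print(np.allclose(W_star.T @ W_star, np.eye(2)), round(lam, 4))
```

Because the returned ratio is the global maximum, no other orthonormal W of the same width should score higher, which is easy to spot-check with random orthonormal matrices.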
And 4, step 4: constructing an SVM classifier of three-class drug interaction based on data of Mahalanobis distance transformation;
the step 4 comprises the following steps:
step 41: applying the Mahalanobis distance transformation to the original data, X' = W^{*T} X, and using the transformed matrix in place of the original data for subsequent analysis;
step 42: constructing an SVM classifier for the three drug interaction classes with the sklearn (scikit-learn) package in Python, based on X' and the interaction class of the drug pair in each column of X'.
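Step 41 is a single matrix product. One practical detail is orientation: X has one column per drug pair, whereas scikit-learn estimators such as `sklearn.svm.SVC` expect one row per sample, so the transformed matrix is transposed before fitting. A minimal sketch; the function name and toy matrices are hypothetical, and the SVC call appears only as a comment.

```python
import numpy as np


def transform_pairs(W_star, X):
    """Apply X' = W*^T X and return it as (n_pairs, n_components) for sklearn."""
    X_prime = W_star.T @ X      # shape: (k, n_pairs)
    return X_prime.T            # shape: (n_pairs, k), one row per drug pair


W_star = np.eye(4)[:, :2]               # toy 4 -> 2 projection with W^T W = I
X = np.arange(12.0).reshape(4, 3)       # 4 similarity features x 3 drug pairs
samples = transform_pairs(W_star, X)
print(samples.shape)  # (3, 2)
# With scikit-learn installed, the classifier could then be fitted as:
#   from sklearn.svm import SVC
#   clf = SVC(probability=True).fit(samples, labels)  # labels: one class per pair
```

SVC handles the three interaction classes natively, and `probability=True` enables the per-class probabilities used in the prediction step.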
Examples
The inventors applied the drug interaction prediction method based on drug multidimensional similarity to analyze interactions among stroke-related drugs. Eight drugs used for stroke (dipyridamole, aspirin, argatroban, clopidogrel, dabigatran etexilate, ticlopidine, warfarin and cilostazol) and 41 stroke-related drugs (including atenolol, cilazapril and cycloserine) were selected, giving 8×(8−1)/2 + 8×41 = 356 drug pairs that contain at least one stroke drug. The results were as follows:
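The pair count in the example combines pairs within the 8 stroke drugs with pairs of one stroke drug and one of the 41 related drugs. A quick check of that arithmetic, assuming the two drug lists do not overlap; the function name is illustrative.

```python
from math import comb


def pairs_with_core_drug(n_core, n_related):
    """Unordered pairs containing at least one of the n_core drugs."""
    within_core = comb(n_core, 2)            # 8 * 7 / 2 = 28
    core_with_related = n_core * n_related   # 8 * 41 = 328
    return within_core + core_with_related


print(pairs_with_core_drug(8, 41))  # 356
```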
Using the Mahalanobis-distance-based SVM classifier constructed by the invention, the probability that each stroke-related drug pair belongs to each interaction class was predicted, and the pairs were ranked by probability within each class. Both drug combinations with established synergistic interaction (aspirin and clopidogrel; dipyridamole and aspirin) rank within the top 11, and two further drug pairs in the top 11 have been reported as synergistic in the literature (Table 1). Among the top 10 predicted antagonistic interactions, the antagonism of 7 combinations (dipyridamole and timolol, clopidogrel and escitalopram, escitalopram and ticlopidine, carvedilol and ticlopidine, clopidogrel and ropinirole, dipyridamole and metoprolol, dipyridamole and escitalopram) is recorded in databases or on drug-information websites (Table 2). This example demonstrates the effectiveness of the proposed drug interaction prediction method based on drug multidimensional similarity, namely its ability to identify known synergistic and antagonistic interactions.
TABLE 1 Ranking of stroke-related synergistic drug interactions
[table image not reproduced]
TABLE 2 Ranking of stroke-related antagonistic drug interactions
Rank | Drug 1 | Drug 2 | Probability of antagonistic interaction
1 | Dipyridamole | Timolol | 0.963587695
2 | Clopidogrel | Escitalopram | 0.956255907
3 | Escitalopram | Ticlopidine | 0.948348315
4 | Carvedilol | Ticlopidine | 0.944879595
5 | Clopidogrel | Ropinirole | 0.902465643
6 | Dipyridamole | Metoprolol | 0.885249271
7 | Dipyridamole | Ropinirole | 0.883119422
8 | Ropinirole | Ticlopidine | 0.877290588
9 | Dipyridamole | Escitalopram | 0.85500597
10 | Argatroban | Quinapril | 0.841249238
TABLE 3 Ranking of stroke-related independent drug interactions
Rank | Drug 1 | Drug 2 | Probability of independent interaction
1 | Dabigatran etexilate | Gemfibrozil | 0.59
2 | Dabigatran etexilate | Simvastatin | 0.56
3 | Cycloserine | Dabigatran etexilate | 0.56
4 | Dabigatran etexilate | Lisinopril | 0.55
5 | Cilostazol | Cycloserine | 0.54
6 | Dabigatran etexilate | Fosinopril | 0.54
7 | Dabigatran etexilate | Ropinirole | 0.54
8 | Carvedilol | Dabigatran etexilate | 0.53
9 | Dipyridamole | Minocycline | 0.53
10 | Dabigatran etexilate | Lovastatin | 0.53
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for predicting drug interaction based on multidimensional similarity of drugs, the method comprising:
first, constructing a Mahalanobis-distance-based SVM classifier from multidimensional drug similarity features:
step 1: obtaining drug pairs of known drug interaction type;
step 2: calculating the multidimensional similarity between the two drugs in each drug pair;
step 3: learning a Mahalanobis distance transformation matrix from the multidimensional drug similarity measures and the known drug interaction classes;
step 4: constructing a three-class drug interaction SVM classifier from the Mahalanobis-transformed data;
second, using the constructed SVM classifier to predict the interaction between every pair of drugs among any number (more than two) of candidate drugs:
for any two candidate drugs that have target, two-dimensional structure and ATC code information in the DrugBank database, calculating the multidimensional similarity between the two drugs by the same method as in step 2; transforming this multidimensional similarity with the Mahalanobis distance matrix obtained in step 3 and inputting the result into the classifier of step 4 to obtain the probability that the two drugs belong to each drug interaction type, where the probabilities of the three interaction classes sum to 1; and taking the class with the largest probability as the predicted drug interaction class, so that the two input drugs are assigned one of the three interaction types: synergy, antagonism or independence.
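The final decision rule in claim 1 selects the class with the highest probability among three class probabilities that sum to 1. The following minimal Python sketch covers only that rule. It assumes the classifier output is available as a dict. The class labels and the function name `predict_interaction` are illustrative and do not come from the patent.

```python
import math

# Illustrative labels for the three interaction classes named in claim 1.
CLASSES = ("synergy", "antagonism", "independence")


def predict_interaction(probabilities):
    """Return the interaction class with the largest predicted probability.

    `probabilities` maps each of the three classes to the probability the
    classifier assigned to it; the claim requires these to sum to 1.
    """
    if set(probabilities) != set(CLASSES):
        raise ValueError("expected exactly one probability per interaction class")
    total = sum(probabilities.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"class probabilities sum to {total}, not 1")
    return max(probabilities, key=probabilities.get)
```

For example, `predict_interaction({"synergy": 0.2, "antagonism": 0.7, "independence": 0.1})` returns `"antagonism"`.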
2. The prediction method of claim 1, wherein:
the drug pairs with known drug interactions are derived from the following databases:
- at least 100 synergistic drug pairs from the DCDB database whose drug targets are known;
- at least 100 antagonistic drug pairs randomly selected from the drug interactions recorded in the DrugBank database;
- at least 100 independent drug pairs randomly generated from DrugBank drugs, excluding pairs that appear in the DCDB database or among the DrugBank drug interactions;
- the three interaction classes contain equal numbers of drug pairs.
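The following minimal Python sketch shows how a balanced three-class training set of the kind described in claim 2 could be assembled. It assumes the DCDB synergistic pairs, the DrugBank antagonistic pairs and the DrugBank drug list are already loaded as plain Python collections. The helper name and the fixed random seed are hypothetical. Candidate independent pairs are enumerated exhaustively, which is only practical for modest drug lists.

```python
import itertools
import random


def build_balanced_dataset(synergy_pairs, antagonism_pairs, drugs, n_per_class, seed=0):
    """Hypothetical helper: sample the same number of pairs for each class.

    synergy_pairs:    (drug, drug) tuples taken from DCDB
    antagonism_pairs: (drug, drug) tuples taken from DrugBank interactions
    drugs:            DrugBank drug identifiers used to generate independent pairs
    Independent pairs are random drug pairs that appear in neither input set.
    """
    rng = random.Random(seed)
    known = {frozenset(p) for p in itertools.chain(synergy_pairs, antagonism_pairs)}
    synergy = rng.sample(sorted(synergy_pairs), n_per_class)
    antagonism = rng.sample(sorted(antagonism_pairs), n_per_class)
    # Every unordered drug pair that is not already a known interaction.
    candidates = [p for p in itertools.combinations(sorted(drugs), 2)
                  if frozenset(p) not in known]
    independent = rng.sample(candidates, n_per_class)
    return ([(p, "synergy") for p in synergy]
            + [(p, "antagonism") for p in antagonism]
            + [(p, "independence") for p in independent])
```

`random.sample` raises `ValueError` if any class has fewer than `n_per_class` candidates. This surfaces an unbalanced source instead of silently producing unequal classes.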
3. The prediction method of claim 1, wherein the step 2 multidimensional drug similarity calculation comprises the steps of:
step 21: calculating drug similarity based on the molecular descriptors of the drugs;
step 22: calculating drug similarity based on the two-dimensional structure of the drug;
step 23: calculating drug similarity based on the drug target;
step 24: calculating drug similarity based on drug ATC codes;
step 25: calculating drug similarity based on the pathway in which the drug target is located;
step 26: calculating drug similarity based on neighbor nodes of drug targets in a protein interaction network;
step 27: integrating the drug similarity measures based on the different features into a 124-dimensional multidimensional drug similarity.
4. The prediction method of claim 3, wherein the similarity of the molecular descriptors in step 21 is calculated as follows:
[Formula provided as an image in the original publication.]
where $d_1$ and $d_2$ denote any two drugs, and $MD_1^d$ to $MD_n^d$ denote the $n$ different molecular descriptors of drug $d$.
5. The prediction method according to claim 3, wherein the step 23 comprises the steps of:
step 231: obtaining the target set of each drug from the DrugBank database;
step 232: calculating the drug similarity geneS1 as the ratio of the targets shared by the two drugs to the union of their targets:
$$geneS1(d_1, d_2) = \frac{|T_{d_1} \cap T_{d_2}|}{|T_{d_1} \cup T_{d_2}|}$$
where $d_1$ and $d_2$ denote any two drugs and $T_d$ denotes the target set of drug $d$;
step 233: calculating a drug similarity, recorded as geneS2, from the Gene Ontology (GO) semantic similarity of the drug targets, where the GO term sets corresponding to the drug target sets are obtained with the Python package pyGS2, and their semantic similarity is also computed with pyGS2;
step 234: calculating the drug similarity geneS3 from the shortest path lengths between targets in the protein-protein interaction (PPI) network:
[Formula provided as an image in the original publication.]
where $T_{d_1}$ and $T_{d_2}$ denote the target sets of drugs $d_1$ and $d_2$ respectively, $t_1$ and $t_2$ are any members of these two target sets respectively, and $d(t_1, t_2)$ denotes the shortest path length between targets $t_1$ and $t_2$ in the PPI network.
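The following minimal Python sketch covers two parts of claim 5. The first part is the shared-targets-over-union ratio of step 232, i.e. a Jaccard index on target sets. The second part is the shortest path lengths $d(t_1, t_2)$ that step 234 takes as input. The geneS3 aggregation formula itself appears only as an image and is not reproduced here. The sketch assumes the PPI network is unweighted and stored as a `{protein: set_of_neighbors}` dict, so a breadth-first search gives hop counts. All function names and target identifiers are hypothetical.

```python
from collections import deque


def target_jaccard(targets_1, targets_2):
    """geneS1 per step 232: shared targets divided by the union of targets."""
    t1, t2 = set(targets_1), set(targets_2)
    union = t1 | t2
    if not union:
        return 0.0  # assumption: similarity is 0 when neither drug has targets
    return len(t1 & t2) / len(union)


def hop_distances(ppi, source):
    """Breadth-first search: hop count from `source` to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in ppi.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def target_pair_distances(ppi, targets_1, targets_2):
    """d(t1, t2) for every t1 in T_d1 and t2 in T_d2 (None if unreachable)."""
    result = {}
    for t1 in targets_1:
        dist = hop_distances(ppi, t1)
        for t2 in targets_2:
            result[(t1, t2)] = dist.get(t2)
    return result
```

For example, `target_jaccard({"P1", "P2", "P3"}, {"P2", "P3", "P4"})` is 2/4 = 0.5. How unreachable pairs should enter geneS3 depends on the patent's formula and is left open.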
6. The prediction method according to claim 3, wherein said step 25 comprises the steps of:
step 251: obtaining information on all human pathways from the KEGG database, and on this basis defining a pathway profile for each drug:
$$P_d = (P_d^1, P_d^2, \ldots, P_d^N)$$
where $N$ is the number of pathways in the KEGG database and $P_d^i$ is the score of the target set of drug $d$ on the $i$-th pathway;
step 252: computing the first form of $P_d^i$ from the overlap ratio between the targets of the drug and the proteins contained in the pathway:
[Formula provided as an image in the original publication.]
where $T_d$ denotes the targets of drug $d$ and $Ps_i$ denotes the proteins contained in the $i$-th pathway, and computing a pathway profile for each drug on this basis;
step 253: computing the second form of $P_d^i$ from the GO similarity between the GO terms of the drug's targets and the GO terms of the pathway, and computing a pathway profile for each drug on this basis;
step 254: for each of the two forms of pathway profile obtained above, calculating the similarity between the profiles of two drugs with the Pearson correlation coefficient.
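Step 254 compares two drugs' pathway profiles with the Pearson correlation coefficient. The following minimal stdlib-only Python sketch assumes each profile is a list of $N$ pathway scores in the same pathway order. Returning 0 for a constant profile is an assumption, because the correlation is undefined in that case.

```python
import math


def pearson(profile_1, profile_2):
    """Pearson correlation coefficient between two equal-length pathway profiles."""
    if len(profile_1) != len(profile_2) or len(profile_1) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(profile_1)
    mean_1 = sum(profile_1) / n
    mean_2 = sum(profile_2) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(profile_1, profile_2))
    var_1 = sum((a - mean_1) ** 2 for a in profile_1)
    var_2 = sum((b - mean_2) ** 2 for b in profile_2)
    if var_1 == 0 or var_2 == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / math.sqrt(var_1 * var_2)
```

The function is applied once to the overlap-based profiles and once to the GO-based profiles, giving the two pathway similarity dimensions. On Python 3.10 or later, `statistics.correlation` computes the same coefficient but raises an error instead of returning 0 for constant input.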
7. The prediction method according to claim 3, wherein said step 26 comprises the steps of:
step 261: determining the neighbor nodes of the drug targets in the PPI network from the PPIs recorded in the HPRD database;
step 262: replacing the target sets $T_{d_1}$ and $T_{d_2}$ in step 232 with the neighbor node sets $Nei_{d_1}$ and $Nei_{d_2}$, and calculating a drug similarity based on the ratio of shared target neighbor nodes;
step 263: obtaining the GO sets corresponding to the neighbor node sets $Nei_{d_1}$ and $Nei_{d_2}$ with the pyGS2 package, and calculating the average semantic similarity between the two GO sets to obtain a drug similarity based on the GO similarity of target neighbor nodes;
step 264: replacing the target sets $T_{d_1}$ and $T_{d_2}$ in step 234 with the neighbor node sets $Nei_{d_1}$ and $Nei_{d_2}$, and applying the calculation of step 234 to obtain a drug similarity based on the shortest path lengths between target neighbor nodes;
step 265: replacing the target set $T_d$ in step 25 with the neighbor node set $Nei_d$, and applying each of the two calculation methods of step 25 to obtain two further drug similarity measures based on the pathway association of target neighbor nodes.
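Steps 261 and 262 can be sketched in Python as building each drug's neighbor set $Nei_d$ and reusing the shared-over-union ratio of step 232. This minimal sketch assumes the HPRD interactions are already loaded as an undirected `{protein: set_of_neighbors}` dict. It also assumes a drug's own targets are not added to $Nei_d$ unless they neighbor another target, because the claim does not say whether they are included. The function names and protein identifiers are hypothetical.

```python
def neighbor_set(ppi, targets):
    """Nei_d: union of the PPI neighbors of all targets of drug d."""
    nei = set()
    for target in targets:
        nei |= ppi.get(target, set())
    return nei


def neighbor_jaccard(ppi, targets_1, targets_2):
    """Step 262: shared neighbor nodes divided by the union of neighbor nodes."""
    nei_1 = neighbor_set(ppi, targets_1)
    nei_2 = neighbor_set(ppi, targets_2)
    union = nei_1 | nei_2
    if not union:
        return 0.0  # assumption: no neighbors on either side gives similarity 0
    return len(nei_1 & nei_2) / len(union)
```

With `ppi = {"T1": {"A", "B"}, "T2": {"B", "C"}}`, the drugs targeting `T1` and `T2` share neighbor `B` out of `{A, B, C}`, so `neighbor_jaccard(ppi, ["T1"], ["T2"])` is 1/3.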
8. The prediction method of claim 3, wherein the 124-dimensional multi-dimensional similarity of step 27 comprises:
112-dimensional molecular descriptor similarity, 1-dimensional two-dimensional structure similarity, 3-dimensional target similarity, 1-dimensional ATC code similarity, 2-dimensional drug pathway similarity and 5-dimensional target neighbor node similarity.
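The group sizes in claim 8 add up to the 124 dimensions of step 27 (112 + 1 + 3 + 1 + 2 + 5 = 124). The following minimal Python sketch concatenates the per-group similarities in a fixed order and checks each group's length. The group key names are hypothetical labels for the feature families listed in the claim.

```python
# Dimensions per feature family as listed in claim 8 (key names are illustrative).
FEATURE_DIMS = {
    "molecular_descriptor": 112,
    "structure_2d": 1,
    "target": 3,
    "atc": 1,
    "pathway": 2,
    "target_neighbor": 5,
}

assert sum(FEATURE_DIMS.values()) == 124


def assemble_feature_vector(groups):
    """Concatenate per-family similarity lists, in FEATURE_DIMS order, into one vector."""
    vector = []
    for name, dim in FEATURE_DIMS.items():
        values = groups[name]
        if len(values) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(values)}")
        vector.extend(values)
    return vector
```

A fixed ordering matters because each row of the feature matrix $X$ in claim 9 must refer to the same similarity measure for every drug pair.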
9. The prediction method of claim 1, wherein the step 3 comprises the steps of:
step 31: constructing the must-link matrix $L_s$:
$$L_s(i,j) = \begin{cases} 1, & (pair_i, pair_j) \in S \\ 0, & \text{otherwise} \end{cases}$$
where $pair_i$ and $pair_j$ denote two different drug interaction pairs, and $(pair_i, pair_j) \in S$ means that both drug pairs belong to the same drug interaction type; that is, the entry in row $i$, column $j$ of the matrix is 1 when $pair_i$ and $pair_j$ belong to the same drug interaction type, and 0 otherwise;
constructing the cannot-link matrix $L_d$:
$$L_d(i,j) = \begin{cases} 1, & (pair_i, pair_j) \in D \\ 0, & \text{otherwise} \end{cases}$$
where $(pair_i, pair_j) \in D$ means that the two drug pairs belong to different drug interaction types; that is, the entry in row $i$, column $j$ is 0 when $pair_i$ and $pair_j$ belong to the same drug interaction type, and 1 otherwise;
step 32: computing a covariance matrix from each of $L_s$ and $L_d$:
[Formula provided as an image in the original publication.]
where $X$ is the feature matrix of drug interaction pairs, in which each column corresponds to a drug interaction pair and each row to a similarity measure;
step 33: learning the Mahalanobis distance transformation matrix with a Mahalanobis distance learning method:
[Formula provided as an image in the original publication.]
where tr denotes the trace of a matrix, and $W^*$ is the matrix $W$ that maximizes the objective (also given as an image in the original publication) subject to $W^T W = I$, i.e. the product of the transpose of $W$ and $W$ is the identity matrix.
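The following minimal pure-Python sketch of steps 31 and 32 uses nested lists as matrices. Building $L_s$ and $L_d$ follows directly from step 31. Setting the diagonal of $L_s$ to 1 is an assumption, because the claim does not say how $i = j$ is treated. The step 32 formula appears only as an image, so `pairwise_scatter` implements an assumed, commonly used constraint-based form: $\sum_{i,j} L(i,j)\,(x_i - x_j)(x_i - x_j)^T$ over the columns $x_i$ of $X$. It should not be read as the patent's exact expression. All function names are hypothetical.

```python
def link_matrices(labels):
    """Must-link (L_s) and cannot-link (L_d) matrices from drug-pair labels.

    labels[i] is the interaction type of drug pair i. L_s[i][j] is 1 when pairs
    i and j share a type and 0 otherwise; L_d is its complement. Diagonal
    entries of L_s are 1 here (assumption).
    """
    n = len(labels)
    must = [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]
    cannot = [[1 - must[i][j] for j in range(n)] for i in range(n)]
    return must, cannot


def pairwise_scatter(X, L):
    """ASSUMED step-32 form: sum of L[i][j] * (x_i - x_j)(x_i - x_j)^T.

    X has m rows (similarity measures) and n columns (drug pairs), as in step 32.
    Cost is O(n^2 * m^2), so this is only suitable for small illustrations.
    """
    m, n = len(X), len(X[0])
    S = [[0.0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            if not L[i][j]:
                continue
            diff = [X[r][i] - X[r][j] for r in range(m)]
            for a in range(m):
                for b in range(m):
                    S[a][b] += L[i][j] * diff[a] * diff[b]
    return S
```

Under this assumed form, the matrix built from $L_s$ measures spread within interaction classes and the one built from $L_d$ measures spread across classes. A trace-based objective of the kind named in step 33 would then favor projections $W$ that enlarge the second relative to the first.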
10. The method of claim 1, wherein said step 4 comprises the steps of:
step 41: applying the Mahalanobis distance transformation to the original data: $X' = W^{*T} X$;
step 42: constructing an SVM classifier for the three drug interaction categories from the transformed data set, i.e. constructing the Mahalanobis-distance-based SVM classifier.
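Step 41 is a single matrix product, $X' = W^{*T} X$. The following minimal pure-Python sketch assumes $W^*$ is an $m \times k$ matrix and $X$ is $m \times n$ with one column per drug pair, both stored as nested lists of rows. The function name is hypothetical.

```python
def mahalanobis_transform(W, X):
    """Compute X' = W^T X.

    W: m x k projection matrix learned in step 33 (list of m rows, k columns).
    X: m x n feature matrix, one column per drug pair.
    Returns the k x n transformed matrix.
    """
    m, k = len(W), len(W[0])
    if len(X) != m:
        raise ValueError("W and X must have the same number of rows")
    n = len(X[0])
    return [[sum(W[r][c] * X[r][col] for r in range(m)) for col in range(n)]
            for c in range(k)]
```

For step 42, the columns of the result would be passed as samples, i.e. after transposing to one row per drug pair, to a three-class SVM that outputs class probabilities. Using scikit-learn's `SVC(probability=True)` for that purpose is one possible choice and an assumption, not something the patent specifies.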
CN201811441665.0A 2018-11-29 2018-11-29 Drug interaction prediction method based on drug multidimensional similarity Pending CN111243659A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811441665.0A CN111243659A (en) 2018-11-29 2018-11-29 Drug interaction prediction method based on drug multidimensional similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811441665.0A CN111243659A (en) 2018-11-29 2018-11-29 Drug interaction prediction method based on drug multidimensional similarity

Publications (1)

Publication Number Publication Date
CN111243659A true CN111243659A (en) 2020-06-05

Family

ID=70874207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811441665.0A Pending CN111243659A (en) 2018-11-29 2018-11-29 Drug interaction prediction method based on drug multidimensional similarity

Country Status (1)

Country Link
CN (1) CN111243659A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112133367A (en) * 2020-08-17 2020-12-25 中南大学 Method and device for predicting interaction relation between medicine and target spot
CN112927766A (en) * 2021-03-29 2021-06-08 天士力国际基因网络药物创新中心有限公司 Method for screening disease combination drug

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065066A (en) * 2013-01-22 2013-04-24 四川大学 Drug combination network based drug combined action predicting method
CN103902848A (en) * 2012-12-28 2014-07-02 深圳先进技术研究院 System and method for identifying drug targets based on drug interaction similarities
US20150242752A1 (en) * 2012-10-01 2015-08-27 Japan Science And Technology Agency Approval prediction apparatus, approval prediction method, and computer program product

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112133367A (en) * 2020-08-17 2020-12-25 中南大学 Method and device for predicting interaction relation between medicine and target spot
CN112927766A (en) * 2021-03-29 2021-06-08 天士力国际基因网络药物创新中心有限公司 Method for screening disease combination drug
CN112927766B (en) * 2021-03-29 2022-11-01 天士力国际基因网络药物创新中心有限公司 Method for screening disease combination drug

Similar Documents

Publication Publication Date Title
Lee et al. Novel deep learning model for more accurate prediction of drug-drug interaction effects
Staszak et al. Machine learning in drug design: Use of artificial intelligence to explore the chemical structure–biological activity relationship
Hung Gene set/pathway enrichment analysis
CN112053742A (en) Method and device for screening molecular target protein, computer equipment and storage medium
CN108899086A (en) A kind of system that osteoarthritis hypotype is diagnosed by blood sample based on machine learning
CN111243659A (en) Drug interaction prediction method based on drug multidimensional similarity
Montero-Torres et al. Non-stochastic quadratic fingerprints and LDA-based QSAR models in hit and lead generation through virtual screening: theoretical and experimental assessment of a promising method for the discovery of new antimalarial compounds
Chen et al. Gene selection with multiple ordering criteria
Athar et al. First protein drug target’s appraisal of lead-likeness descriptors to unfold the intervening chemical space
US20190214136A1 (en) Predictive biomarkers of drug response in malignancies
McGarry et al. RESKO: repositioning drugs by using side effects and knowledge from ontologies
Wang et al. Crosstalk analysis of dysregulated pathways in preeclampsia
Zhang et al. Network motif-based identification of breast cancer susceptibility genes
Cope et al. MergeMaid: R tools for merging and cross-study validation of gene expression data
CN115691751A (en) Traditional Chinese medicine prescription screening method and system based on diagnosis and treatment experience and intelligent learning
Akhter et al. mrelief: A reward penalty based feature subset selection considering data overlapping problem
Chen et al. Knowledge-guided multi-scale independent component analysis for biomarker identification
CN112071439B (en) Drug side effect relationship prediction method, system, computer device, and storage medium
CN111477287B (en) Drug target prediction method, device, equipment and medium
CN112259175B (en) Virtual screening method of IRAK1 kinase inhibitor
Tian et al. A longitudinal feature selection method identifies relevant genes to distinguish complicated injury and uncomplicated injury over time
Numcharoenpinij et al. Predicting synergistic drug interaction with dnn and gat
Kuo et al. Functional relationships between gene pairs in oral squamous cell carcinoma
Guo et al. Recent advances in differential expression analysis for single-cell RNA-seq and spatially resolved transcriptomic studies
Lee et al. Constructing a cancer patient-specific network based on second-order partial correlations of gene expression and dna methylation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination