CN111243659A

CN111243659A - Drug interaction prediction method based on drug multidimensional similarity

Info

Publication number: CN111243659A
Application number: CN201811441665.0A
Authority: CN
Inventors: 陈迪; 朴海龙
Original assignee: Dalian Institute of Chemical Physics of CAS
Current assignee: Dalian Institute of Chemical Physics of CAS
Priority date: 2018-11-29
Filing date: 2018-11-29
Publication date: 2020-06-05

Abstract

The invention discloses a drug interaction prediction method based on multidimensional drug similarity, comprising the following steps: calculating the multidimensional similarity between two drugs from several kinds of drug features, namely compound molecular descriptors, drug targets, ATC codes, pathways and the neighbors of the targets in the protein interaction network; and, based on these multidimensional drug similarity features, constructing a Mahalanobis-distance-based SVM classifier that distinguishes synergistic, antagonistic and independent drug interactions.

Description

Drug interaction prediction method based on drug multidimensional similarity

Technical Field

The present invention relates to the field of bioinformatics, and in particular to the field of predicting drug interactions using computer technology.

Background

Traditional drug development mainly targets single compounds acting on a single target. Complex diseases, however, usually involve complex biological processes, and a one-to-one drug-target mode of action fails to achieve a clear curative effect in many diseases. Clinical findings show that a rational drug combination can both improve efficacy and reduce toxic side effects, which offers a new route for treating complex diseases: combination drugs. A combination drug consists of two or more active pharmaceutical ingredients; the different ingredients usually act on different targets and regulate different pathological processes, achieving higher efficacy at lower doses while reducing toxic side effects. Combination drugs are now increasingly used in the treatment of complex diseases such as cancer, AIDS, hypertension and pulmonary tuberculosis.

Unlike randomly combined drugs, the component drugs of a combination drug interact synergistically. A synergistic drug combination both enhances efficacy and reduces toxicity; that is, the effect produced by the combination is greater than the sum of the effects produced by the individual drugs used alone. The opposite type of drug interaction is antagonistic interaction: an antagonistic drug combination reduces efficacy and increases toxic side effects. Whether the interaction is antagonistic or synergistic, the drugs involved are related in their pharmacology and pharmacokinetics. For complex diseases, rationally designing combination drugs so that synergistic combinations are exploited and antagonistic combinations are avoided can greatly improve the therapeutic effect.

Existing combination drugs mainly derive from clinical experience. Because the number of possible drug combinations grows exponentially with the number of drugs, this combinatorial explosion makes exhaustive clinical or experimental study practically impossible. Developing effective computational prediction methods has therefore become an inevitable trend in combination drug discovery: predicting the type of a drug combination by computation can guide experimental research and advance the development of combination drugs.

Researchers have proposed a variety of computational methods for analyzing drug combinations or drug interactions. One common approach is to construct classification models from various similarity measures between drugs. For example, Zou et al. analyzed the topological properties of the neighborhood communities of drug targets in the protein-protein interaction (PPI) network together with the semantic similarity of the associated Gene Ontology (GO) terms, and constructed an SVM classifier to predict whether a drug combination can serve as an effective combination drug. However, analyzing drug combinations solely through the properties of their targets ignores much other drug-related information. Gottlieb et al. considered more comprehensive drug features and built a drug interaction classification model from 7 different aspects: drug ATC codes, the distance between drug targets on the PPI network, the GO terms of the targets, target sequences, compound structures, drug side effects and ligand structures. The existing classification models nevertheless have limitations. First, they do not consider the relationships between different drugs at the pathway level, although effective combination drugs usually achieve synergy by interfering with related pathways. Second, they mainly classify whether an interaction exists or whether a pair can serve as a combination drug, and none of them distinguishes well between synergistic and antagonistic interactions.

As for constructing a classifier, data sets with high-dimensional features generally need feature dimensionality reduction before classification. In 2008, Shiming et al. proposed a heuristic method for learning a Mahalanobis distance matrix; using the learned Mahalanobis distance matrix in classification and clustering methods can effectively improve their accuracy. This dimensionality reduction method provides a useful reference for predicting the category of drug interactions.

Disclosure of Invention

(I) Technical problem to be solved

The invention aims to provide a drug interaction prediction method based on multidimensional drug similarity, which integrates several kinds of drug features to calculate drug similarity and constructs a Mahalanobis-distance-based SVM (support vector machine) classifier to classify three types of drug interaction: synergistic, antagonistic and independent.

(II) Technical solution

In order to solve the above technical problem, the present invention provides a drug interaction prediction method based on multidimensional drug similarity, comprising:

first, constructing a Mahalanobis-distance-based SVM classifier from the multidimensional drug similarity features:

step 1: obtaining drug pairs of known drug interaction type;

step 2: calculating the multidimensional drug similarity between the two drugs of each drug pair;

step 3: learning a Mahalanobis distance transformation matrix based on the multidimensional drug similarity measures and the known drug interaction classes;

step 4: constructing an SVM classifier for the three classes of drug interaction based on the data after Mahalanobis distance transformation;

second, predicting the interaction between every two drugs among any plurality of candidate drugs based on the constructed SVM classifier:

for any two of the candidate drugs whose targets, two-dimensional structures and ATC codes are available in the DrugBank database, calculating the multidimensional similarity between the two drugs by the same method as in step 2; transforming the multidimensional similarity with the Mahalanobis distance transformation matrix obtained in step 3 and feeding the result to the classifier of step 4 to obtain the probability that the two drugs belong to each drug interaction type, wherein the probabilities of the three classes of drug interaction sum to 1 and the class with the highest probability is taken as the predicted drug interaction class; that is, the two input drugs are assigned to one of the synergistic, antagonistic and independent drug interaction types.

The step 2 comprises the following steps:

step 21: calculating drug similarity based on the molecular descriptors of the drugs;

step 22: calculating drug similarity based on the two-dimensional structure of the drug;

step 23: calculating drug similarity based on the drug target;

step 24: calculating drug similarity based on drug ATC codes;

step 25: calculating drug similarity based on the pathway in which the drug target is located;

step 26: calculating drug similarity based on neighbor nodes of drug targets in a protein interaction network;

step 27: multi-dimensional drug similarity is obtained by integrating drug similarity measurement results based on different characteristics.

The step 3 comprises the following steps:

step 31: constructing the must-link matrix L_s:

L_s(i,j) = 1 if (pair_i, pair_j) ∈ S, and L_s(i,j) = 0 otherwise,

where pair_i and pair_j denote two different drug interaction pairs and (pair_i, pair_j) ∈ S means that the two drug pairs belong to the same drug interaction type. That is, the entry in row i, column j of the matrix is 1 when pair_i and pair_j belong to the same drug interaction type, and 0 otherwise;

constructing the cannot-link matrix L_d:

L_d(i,j) = 1 if (pair_i, pair_j) ∈ D, and L_d(i,j) = 0 otherwise,

where (pair_i, pair_j) ∈ D means that the two drug pairs belong to different drug interaction types. That is, the entry in row i, column j of the matrix is 0 when pair_i and pair_j belong to the same drug interaction type, and 1 otherwise;

step 32: computing the covariance matrices separately from L_s and L_d: [formula not reproduced]

where X is the feature matrix of the drug interaction pairs, in which each column corresponds to one drug interaction pair and each row corresponds to one similarity measure;

step 33: learning the Mahalanobis distance transformation matrix W^* with the Mahalanobis distance learning method.

The step 4 comprises the following steps:

step 41: applying the Mahalanobis distance transformation to the original data: X' = (W^*)^T X;

step 42: constructing an SVM classifier for drug interaction classification based on the data set after the Mahalanobis distance transformation.

(III) Advantageous effects

The invention provides a drug interaction prediction method based on multidimensional drug similarity. By integrating drug similarity measures over molecular descriptors, two-dimensional structures, targets, ATC codes, pathways and the neighbor nodes of the targets in the protein interaction network, it describes the relationships between drugs comprehensively and provides a basis for classifying drug interactions and explaining their mechanisms; the Mahalanobis-distance-based SVM classifier improves classification accuracy and reliably predicts the probability that a drug pair belongs to each type of drug interaction.

Drawings

FIG. 1 is a schematic diagram of the drug interaction prediction method based on multidimensional drug similarity according to the present invention.

Detailed Description

In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.

FIG. 1 is a schematic diagram of the drug interaction prediction method based on multidimensional drug similarity. As shown in FIG. 1, the present invention provides a method for predicting three different types of drug interaction: synergistic, antagonistic and independent, where an independent drug interaction means that the two drugs interact neither synergistically nor antagonistically. FIG. 1 shows four steps in order from top to bottom, as follows:

Step 1: acquiring drug interactions of different types;

Step 1 mainly provides training and test samples for the subsequent classification model. To keep the samples balanced and computable, equal numbers of synergistic, antagonistic and independent drug interaction pairs are collected, and the two-dimensional structure data, targets and ATC codes of all the drugs are required to be known.

Step 2: calculating the multidimensional drug similarity;

This step calculates the similarity between drugs from several aspects, namely molecular descriptors, two-dimensional structures, targets, ATC codes, pathways and the neighbor nodes of the targets in the protein interaction network, so as to describe the relationships between drugs comprehensively and provide a basis for classifying drug interactions and explaining their mechanisms.

Step 3: learning the Mahalanobis distance matrix;

To improve classification performance, a Mahalanobis distance transformation matrix is learned from the sample data so that drug pairs of the same drug interaction type are as close as possible in the Mahalanobis distance space and drug pairs of different types are as far apart as possible.

Step 4: constructing the Mahalanobis-distance-based SVM classifier;

The original data are transformed with the Mahalanobis distance transformation matrix, and the SVM classifier is trained and tested on the transformed data, which improves classification accuracy.

The individual steps involved are described in detail below.

Step 1: acquiring drug interactions of different types;

Step 1 comprises the following steps:

step 11: obtaining from the DCDB database 179 drug pairs whose drugs have known two-dimensional structures, targets and ATC codes, and using them as the set of synergistic drug interaction pairs, for example: amitriptyline and lissamine, lapatinib and topotecan, and the like;

step 12: randomly selecting, from the drug interactions in the DrugBank database, 179 drug pairs with known antagonistic effect whose drugs have known two-dimensional structures, targets and ATC codes, as the set of antagonistic drug interaction pairs, for example: tacrolimus and imatinib, vorinostat and chlorpromazine, and the like;

step 13: randomly drawing 179 pairs of DrugBank drugs that appear in neither the DCDB database nor the DrugBank drug interaction records, while ensuring that the two-dimensional structure, targets and ATC code of each drug can be obtained from the DrugBank database, as the set of independent drug interaction pairs, for example: amprenavir and clockie, amobarbital and rasagiline.

All collected drugs are recorded in the DrugBank and DCDB databases, and the two-dimensional structure, targets and ATC code of each drug are known. The two-dimensional structures are described by the sdf files of the DrugBank database; the target sets come from the DrugBank and DCDB databases and are labeled uniformly by the Entrez Gene IDs of the drug targets; the ATC codes come from the DrugBank database.
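The balanced sampling of step 1 could be sketched as follows. This is only an illustration under stated assumptions: the `build_balanced_dataset` helper, the label strings and the toy candidate pairs are hypothetical, and real candidates would come from DCDB and DrugBank records.

```python
import random


def build_balanced_dataset(candidates_by_class, n_per_class, seed=0):
    """Draw the same number of drug pairs from every interaction class.

    candidates_by_class maps a class label to a list of (drug_a, drug_b)
    tuples. All names used here are hypothetical placeholders.
    """
    rng = random.Random(seed)
    pairs, labels = [], []
    for label, candidates in sorted(candidates_by_class.items()):
        if len(candidates) < n_per_class:
            raise ValueError(f"not enough candidate pairs for class {label!r}")
        for pair in rng.sample(candidates, n_per_class):
            pairs.append(pair)
            labels.append(label)
    return pairs, labels


# Toy candidates only; the text above uses 179 pairs per class.
toy_candidates = {
    "synergistic": [(f"s{i}", f"t{i}") for i in range(5)],
    "antagonistic": [(f"a{i}", f"b{i}") for i in range(6)],
    "independent": [(f"x{i}", f"y{i}") for i in range(7)],
}
pairs, labels = build_balanced_dataset(toy_candidates, n_per_class=3)
```

Fixing the random seed keeps the sampled set reproducible, which matters when the same pairs are later split into training and test samples.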

Step 2 comprises the following steps:

Step 21: calculating drug similarity based on the molecular descriptors of the drugs;

Molecular descriptors describe the chemical properties of small-molecule compounds from many angles. Taking the two-dimensional structure of each drug as input, the molecular descriptors of each small-molecule drug are calculated with the cdk.qsar.descriptors.molecular module of the Chemistry Development Kit (CDK); descriptors that are 0 or cannot be calculated for more than 90% of the molecules are removed, leaving 112 molecular descriptors in total. For each drug pair in the three sets of drug interaction pairs, the similarity MDsS between the molecular descriptors of the two drugs is calculated as follows: [formula not reproduced]

where d₁ and d₂ denote the two drugs of any drug pair, and MD₁^d to MD_n^d denote the n different molecular descriptors of drug d.
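The descriptor filtering rule described above (drop descriptors that are 0 or cannot be calculated for more than 90% of the molecules) might look like the following sketch. The table layout, the use of `None` for a descriptor that could not be calculated, and the `filter_descriptors` name are assumptions made for illustration; the descriptor values themselves would come from CDK.

```python
def filter_descriptors(table, max_bad_fraction=0.9):
    """Return the indices of descriptor columns that are kept.

    table is a list of rows, one row per molecule; None marks a descriptor
    that could not be calculated (an assumed convention). A column is
    dropped when more than max_bad_fraction of its values are 0 or None.
    """
    n_rows = len(table)
    n_cols = len(table[0])
    kept = []
    for col in range(n_cols):
        bad = sum(1 for row in table if row[col] is None or row[col] == 0)
        if bad / n_rows <= max_bad_fraction:
            kept.append(col)
    return kept


# Toy table: 10 molecules, 3 descriptors.
# Column 0 is always usable, column 1 is always 0, column 2 is missing half the time.
toy_table = [[1.5, 0, (2.0 if i % 2 == 0 else None)] for i in range(10)]
kept_columns = filter_descriptors(toy_table)
```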

Step 22: calculating drug similarity based on the two-dimensional structure of the drug;

Drugs with similar compound structures generally have similar pharmacological and pharmacokinetic properties. The fingerprint of each compound is calculated from the two-dimensional structure of the drug with the cdk.fingerprint module of the CDK toolkit, and the fingerprint similarity between the two drugs is calculated with CDK as the Tanimoto coefficient.
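For reference, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. A minimal sketch, assuming each fingerprint is given as a set of on-bit indices (the fingerprints themselves would be produced by CDK, which is not called here):

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen in this sketch for two empty fingerprints
    return len(a & b) / len(union)


# Two toy fingerprints sharing 2 of 4 distinct on-bits.
toy_similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```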

Step 23: calculating drug similarity based on the drug target;

the step 23 specifically includes:

step 231: obtaining the target set of each drug from the DrugBank database;

step 232: calculating the drug similarity geneS1 as the ratio of the targets shared by the two drugs to the union of their targets:

geneS1(d₁,d₂) = |T_d1∩T_d2|/|T_d1∪T_d2|

where d₁ and d₂ denote any two drugs and T_d denotes the target set of drug d;

step 233: calculating drug similarity based on the semantic similarity of the Gene Ontology (GO) terms of the drug targets;

for each drug pair, the GO term sets corresponding to the target sets of the two drugs are obtained with the pyGS2 package for Python, and the average semantic similarity between the two GO term sets is calculated with the GS2 function of the pyGS2 package and denoted geneS2;

step 234: calculating the drug similarity geneS3 based on the shortest path lengths between targets in the protein interaction network: [formula not reproduced]

where T_d1 and T_d2 denote the target sets of drugs d₁ and d₂ respectively, t₁ and t₂ are any members of the respective target sets, and d(t₁,t₂) denotes the shortest path length between targets t₁ and t₂ in the PPI network. The PPI network is downloaded from the HPRD database, and the shortest path lengths are calculated with the shortest_path_length function of the networkx package in Python.
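The target-based measures can be illustrated with a small sketch. The shared-target ratio follows the description of step 232 directly. For step 234 the sketch only computes the pairwise distances d(t₁,t₂); how they are aggregated into geneS3 is not shown, and a plain breadth-first search stands in for a graph library so that the example has no dependencies. The toy PPI graph and all function names are hypothetical.

```python
from collections import deque


def jaccard(set_a, set_b):
    """Shared members divided by the union of the two sets."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shortest_path_length(adjacency, source, target):
    """Breadth-first search on an unweighted graph; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def pairwise_target_distances(adjacency, targets_1, targets_2):
    """d(t1, t2) for every t1 in the first target set and t2 in the second."""
    return {
        (t1, t2): shortest_path_length(adjacency, t1, t2)
        for t1 in sorted(targets_1)
        for t2 in sorted(targets_2)
    }


# Toy undirected PPI graph: A - B - C - D, with E isolated.
toy_ppi = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}
targets_d1, targets_d2 = {"A", "B"}, {"B", "D"}

gene_s1 = jaccard(targets_d1, targets_d2)
distances = pairwise_target_distances(toy_ppi, targets_d1, targets_d2)
```

Returning `None` for unreachable targets leaves the choice of how to treat disconnected proteins to the caller, since the text does not state it.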

Step 24: calculating drug similarity based on drug ATC codes;

The Anatomical Therapeutic Chemical (ATC) classification system is the official drug classification system of the World Health Organization. It comprises 5 levels, and drugs are classified and coded according to the organ or system on which they act and their therapeutic and chemical characteristics. The similarity atcS_k of the ATC codes of two drugs at the k-th level is defined as follows:

atcS_k(d₁,d₂)＝|atc_k(d₁)∩atc_k(d₂)|/|atc_k(d₁)∪atc_k(d₂)|

where atc_k(d) denotes all ATC codes of drug d at the k-th level of the ATC system. Since each drug corresponds to a 5-level code, the ATC code similarity of two drugs is defined as follows: [formula not reproduced]
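As a hedged illustration of the per-level measure atcS_k only, the sketch below truncates full ATC codes to each level and applies the set ratio given above. The prefix lengths (1, 3, 4, 5 and 7 characters for levels 1 to 5) follow the standard layout of ATC codes; the function names and the example codes are illustrative, and the way the five levels are combined into a single value is deliberately left out.

```python
# Number of leading characters that identify each ATC level (levels 1 to 5).
ATC_PREFIX_LENGTHS = (1, 3, 4, 5, 7)


def atc_level_codes(codes, level):
    """All codes of one drug truncated to the given ATC level (1 to 5)."""
    n = ATC_PREFIX_LENGTHS[level - 1]
    return {code[:n] for code in codes if len(code) >= n}


def atc_level_similarity(codes_1, codes_2, level):
    """atcS_k: shared level-k codes divided by the union of level-k codes."""
    a = atc_level_codes(codes_1, level)
    b = atc_level_codes(codes_2, level)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Illustrative code sets for two drugs (the first drug has three ATC codes).
codes_d1 = {"A01AD05", "B01AC06", "N02BA01"}
codes_d2 = {"B01AC04"}
per_level = [atc_level_similarity(codes_d1, codes_d2, k) for k in range(1, 6)]
```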

Step 25: calculating drug similarity based on the pathways in which the drug targets are located;

The similarity between drugs is measured by the correlation between the pathways in which their targets are located. The pathway effect of a drug is described by the effect of its targets on all known pathways, and pathway similarity is obtained by comparing how similar the pathway effects of different drugs are.

Step 25 specifically comprises:

step 251: obtaining all human pathway information from the KEGG database and, on that basis, defining a pathway profile for each drug:

P_d = (p_d^1, p_d^2, …, p_d^N)

where N is the number of pathways in the KEGG database and p_d^i is the score of the target set of drug d on the i-th pathway, i being an integer between 1 and N;

step 252: calculating the first form of p_d^i based on the overlap ratio between the targets of the drug and the proteins contained in the pathway: [formula not reproduced]

where T_d denotes the targets of drug d and Ps_i denotes the proteins contained in the i-th pathway, and calculating a pathway profile for each drug on this basis;

step 253: calculating the second form of p_d^i based on the GO similarity between the GO terms of the drug's targets and the GO terms of the pathway: the GO term set corresponding to the target set of drug d and the GO term set corresponding to all proteins in pathway i are obtained with the pyGS2 package, the average semantic similarity of the two GO term sets is calculated to give p_d^i, and a pathway profile is calculated for each drug on this basis;

step 254: after the pathway profile of each drug has been calculated, the similarity between two drugs is expressed as the Pearson correlation coefficient between their pathway profiles. The two different ways of calculating p_d^i yield two pathway-based similarity measures.
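The pathway-based comparison of step 25 can be sketched as below. The Pearson correlation follows its standard definition. The overlap score is an assumption: the text only says it is an overlap ratio between the drug's targets and the pathway's proteins, so this sketch normalizes by pathway size, which may differ from the original formula. The toy pathways and all names are hypothetical.

```python
import math


def overlap_profile(targets, pathways):
    """Score each pathway by the fraction of its proteins hit by the drug.

    Normalizing by pathway size is an assumption of this sketch.
    """
    targets = set(targets)
    return [len(targets & set(p)) / len(p) if p else 0.0 for p in pathways]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for a constant profile; 0.0 is chosen here
    return cov / (norm_u * norm_v)


# Three toy pathways, each given as a set of proteins.
toy_pathways = [{"A", "B"}, {"B", "C", "D", "E"}, {"F", "G"}]
profile_d1 = overlap_profile({"A", "B"}, toy_pathways)
profile_d2 = overlap_profile({"B", "C"}, toy_pathways)
pathway_similarity = pearson(profile_d1, profile_d2)
```

Swapping `overlap_profile` for a GO-based scoring function would give the second pathway measure, with the Pearson step unchanged.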

Step 26: calculating drug similarity based on neighbor nodes of drug targets in the protein interaction network;

Studies show that the mechanism of action of combination drugs is closely related to the neighbor nodes of the drug targets in the PPI network. The method obtains the neighbor nodes of each drug's target set in the PPI network and calculates similarity based on these neighbor nodes.

Preferably, said step 26 comprises:

step 261: determining the neighbor nodes of the drug targets in the PPI network based on the PPIs recorded in the HPRD database;

step 262: replacing the target sets T_d1 and T_d2 in step 232 with the neighbor node sets Nei_d1 and Nei_d2 and calculating the drug similarity based on the ratio of shared target neighbor nodes, denoted neiS1:

neiS1(d₁,d₂) = |Nei_d1∩Nei_d2|/|Nei_d1∪Nei_d2|

step 263: obtaining the GO term sets corresponding to the neighbor node sets Nei_d1 and Nei_d2 with the pyGS2 package for Python, and calculating the average semantic similarity between the two GO term sets with the GS2 function of the pyGS2 package, giving the drug similarity based on the GO similarity of the target neighbor nodes, denoted neiS2;

step 264: replacing the target sets T_d1 and T_d2 in step 234 with the neighbor node sets Nei_d1 and Nei_d2 and applying the same calculation as in step 234 to obtain the drug similarity measure based on the shortest path lengths between target neighbor nodes, denoted neiS3: [formula not reproduced]

where t₁ and t₂ are any members of the respective neighbor node sets and d(t₁,t₂) denotes the shortest path length between t₁ and t₂ in the PPI network;

step 265: replacing the target set T_d in step 25 with the neighbor node set Nei_d and applying the two calculation modes of step 25 to obtain two further drug similarity measures based on the pathway association of target neighbor nodes, denoted neiS4 and neiS5 respectively.
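A possible sketch of steps 261 and 262: collect the direct neighbors of a drug's targets from an undirected PPI edge list, then apply the shared-node ratio to the two neighbor sets. Whether a drug's own targets count as members of its neighbor set is not stated in the text; in this sketch a target is included only when it is adjacent to another target of the same drug. The edge list and function names are hypothetical.

```python
def neighbor_set(edges, targets):
    """Proteins directly linked to any target in an undirected PPI edge list."""
    targets = set(targets)
    neighbors = set()
    for a, b in edges:
        if a in targets:
            neighbors.add(b)
        if b in targets:
            neighbors.add(a)
    return neighbors


def shared_ratio(set_a, set_b):
    """Shared members divided by the union of the two sets."""
    union = set(set_a) | set(set_b)
    return len(set(set_a) & set(set_b)) / len(union) if union else 0.0


# Toy PPI chain: A - B - C - D - E.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
nei_d1 = neighbor_set(toy_edges, {"A"})
nei_d2 = neighbor_set(toy_edges, {"C"})
nei_s1 = shared_ratio(nei_d1, nei_d2)
```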

Step 27: integrating the drug similarity measurement results of steps 21 to 26 to obtain the 124-dimensional multidimensional drug similarity;

The similarity comprises 112 dimensions of molecular descriptor similarity, 1 dimension of two-dimensional structure similarity, 3 dimensions of target similarity, 1 dimension of ATC similarity, 2 dimensions of drug pathway similarity and 5 dimensions of target neighbor node similarity, 124 dimensions in total.
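The dimension bookkeeping of step 27 can be checked with a short sketch that concatenates the per-feature similarity blocks into one vector. The block names, their order and the `assemble_features` helper are assumptions made for illustration; only the block sizes come from the text.

```python
# (block name, number of dimensions); names and order are illustrative.
FEATURE_BLOCKS = [
    ("molecular_descriptors", 112),
    ("structure", 1),
    ("target", 3),
    ("atc", 1),
    ("pathway", 2),
    ("neighbor", 5),
]


def assemble_features(blocks):
    """Concatenate per-feature similarity values into one flat vector."""
    vector = []
    for name, size in FEATURE_BLOCKS:
        values = list(blocks[name])
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(values)
    return vector


total_dimensions = sum(size for _, size in FEATURE_BLOCKS)

# Placeholder values only, to show the shape of one drug pair's feature vector.
toy_blocks = {name: [0.0] * size for name, size in FEATURE_BLOCKS}
toy_vector = assemble_features(toy_blocks)
```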

Step 3 comprises the following steps:

step 31: constructing the must-link matrix L_s:

L_s(i,j) = 1 if (pair_i, pair_j) ∈ S, and L_s(i,j) = 0 otherwise,

where pair_i and pair_j denote two different drug interaction pairs and (pair_i, pair_j) ∈ S means that the two drug pairs belong to the same drug interaction type (synergistic, antagonistic or independent). That is, the entry in row i, column j of the matrix is 1 when pair_i and pair_j belong to the same drug interaction type, and 0 otherwise;

constructing the cannot-link matrix L_d:

L_d(i,j) = 1 if (pair_i, pair_j) ∈ D, and L_d(i,j) = 0 otherwise,

where (pair_i, pair_j) ∈ D means that the two drug pairs pair_i and pair_j belong to different drug interaction types. That is, the entry in row i, column j of the matrix is 0 when pair_i and pair_j belong to the same drug interaction type, and 1 otherwise;

step 32: computing the covariance matrices separately from L_s and L_d: [formula not reproduced]

where X is the feature matrix of the drug interaction pairs, in which each column corresponds to one drug interaction pair and each row corresponds to one similarity measure, and X^T denotes the transpose of the matrix X;

step 33: learning the Mahalanobis distance transformation matrix W^* with the Mahalanobis distance learning method: [formula not reproduced]

where tr denotes the trace of a matrix, and W^* is the matrix W that maximizes the objective [expression not reproduced] subject to the constraint that the product of the transpose of W (W^T) with W is the identity matrix (W^T W = I).
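Step 3 can be illustrated with the following sketch, which rests on several assumptions that are not confirmed by the text. The must-link and cannot-link matrices follow the definitions above. The scatter matrices are assumed to take the graph-Laplacian form X (D - L) X^T, where D is the diagonal matrix of row sums of L. The objective is assumed to be the trace ratio tr(W^T S_b W) / tr(W^T S_w W) under W^T W = I, and it is solved here with a generic iterative trace-ratio scheme, which is a swapped-in technique and not necessarily the heuristic learning method referred to in the background. All function names are hypothetical.

```python
import numpy as np


def link_matrices(labels):
    """Must-link (L_s) and cannot-link (L_d) indicator matrices from class labels."""
    y = np.asarray(labels)
    same = y[:, None] == y[None, :]
    L_s = same.astype(float)
    np.fill_diagonal(L_s, 0.0)  # only two different pairs are linked
    L_d = (~same).astype(float)
    return L_s, L_d


def scatter(X, L):
    """Assumed form X (D - L) X^T, with D the diagonal matrix of row sums of L.

    For symmetric L this equals 0.5 * sum_ij L[i, j] (x_i - x_j)(x_i - x_j)^T,
    where x_i is column i of X.
    """
    laplacian = np.diag(L.sum(axis=1)) - L
    return X @ laplacian @ X.T


def trace_ratio(W, S_b, S_w, eps=1e-12):
    """Between-class trace divided by within-class trace (eps avoids division by zero)."""
    return np.trace(W.T @ S_b @ W) / (np.trace(W.T @ S_w @ W) + eps)


def learn_transform(X, labels, dim, n_iter=20):
    """Iterative trace-ratio maximization; the returned W has orthonormal columns."""
    L_s, L_d = link_matrices(labels)
    S_w, S_b = scatter(X, L_s), scatter(X, L_d)
    W = np.eye(X.shape[0])[:, :dim]  # arbitrary orthonormal starting point
    for _ in range(n_iter):
        lam = trace_ratio(W, S_b, S_w)
        # eigh returns ascending eigenvalues with orthonormal eigenvectors
        _, vecs = np.linalg.eigh(S_b - lam * S_w)
        W = vecs[:, -dim:]
    return W


# Toy data: 5 similarity measures (rows) for 12 drug pairs (columns), 3 classes.
rng = np.random.default_rng(0)
toy_labels = [0] * 4 + [1] * 4 + [2] * 4
toy_X = rng.normal(size=(5, 12))
toy_W = learn_transform(toy_X, toy_labels, dim=2)
```

Taking the eigenvectors of a symmetric matrix guarantees W^T W = I by construction, so the constraint never has to be enforced separately.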

Step 4: constructing an SVM classifier for the three classes of drug interaction based on the data after Mahalanobis distance transformation;

Step 4 comprises the following steps:

step 41: applying the Mahalanobis distance transformation to the original data: X' = (W^*)^T X, and replacing the original data with the transformed matrix X' in the subsequent analysis;

step 42: constructing an SVM classifier for the three drug interaction classes with the sklearn package in Python, based on X' and the drug interaction class to which each drug pair belongs.
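Steps 41 and 42, together with the prediction stage, might be sketched with scikit-learn as follows. The kernel choice, the random seed, the toy data and the stand-in transformation matrix are assumptions; the text only states that an SVM is built with sklearn on the transformed data and that the class with the highest probability is taken as the prediction. Because X stores one drug pair per column, the transformed matrix is transposed before being passed to scikit-learn, which expects one sample per row.

```python
import numpy as np
from sklearn.svm import SVC


def fit_classifier(X, labels, W):
    """Transform X (features x pairs) with W and fit a probabilistic SVM."""
    X_prime = W.T @ X  # step 41: X' = W^T X
    clf = SVC(kernel="rbf", probability=True, random_state=0)  # kernel is assumed
    clf.fit(X_prime.T, labels)  # scikit-learn expects one sample per row
    return clf


def predict_interaction(clf, W, X_new):
    """Class probabilities and the most probable class for new drug pairs."""
    proba = clf.predict_proba((W.T @ X_new).T)
    predicted = clf.classes_[np.argmax(proba, axis=1)]
    return proba, predicted


# Toy data: 4 features, 20 drug pairs per class, well-separated class centers.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0, 0.0], [3.0, 3.0, 0.0, 0.0], [-3.0, 3.0, 0.0, 0.0]])
toy_X = np.hstack([rng.normal(c[:, None], 0.5, size=(4, 20)) for c in centers])
toy_y = np.repeat(["synergistic", "antagonistic", "independent"], 20)
toy_W = np.eye(4)[:, :2]  # stand-in for a learned transformation matrix

clf = fit_classifier(toy_X, toy_y, toy_W)
proba, predicted = predict_interaction(clf, toy_W, centers.T)
```

The columns of `proba` follow the order of `clf.classes_`, so the class names should always be read from that attribute rather than assumed.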

Examples

The inventors applied the drug interaction prediction method based on multidimensional drug similarity to the analysis of interactions among stroke-related drugs. Eight drugs used to treat stroke (dipyridamole, aspirin, argatroban, clopidogrel, dabigatran etexilate, ticlopidine, warfarin and cilostazol) and 41 stroke-related drugs (including atenolol, cilazapril, cycloserine and the like) were selected, giving 8 × (8 - 1)/2 + 8 × 41 = 356 drug pairs that contain at least one stroke treatment drug. The results are as follows:
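The pair count above can be verified by enumerating every pair that contains at least one of the 8 stroke treatment drugs. In this sketch the 41 related drugs are represented by placeholder names, since the text lists only a few of them.

```python
from itertools import combinations


def pairs_with_core_drug(core, related):
    """All unordered pairs that contain at least one drug from the core list."""
    core = list(core)
    core_set = set(core)
    everything = core + list(related)
    return [
        (a, b)
        for a, b in combinations(everything, 2)
        if a in core_set or b in core_set
    ]


stroke_drugs = [
    "dipyridamole", "aspirin", "argatroban", "clopidogrel",
    "dabigatran etexilate", "ticlopidine", "warfarin", "cilostazol",
]
# Placeholder names for the 41 stroke-related drugs.
related_drugs = [f"related_{i:02d}" for i in range(1, 42)]
candidate_pairs = pairs_with_core_drug(stroke_drugs, related_drugs)
```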

The Mahalanobis-distance-based SVM classifier constructed by the invention was used to predict the probability that each stroke-related drug pair belongs to each interaction category, and the results for each category were ranked by probability. The results show that both drug pairs with confirmed synergistic interaction (aspirin and clopidogrel, dipyridamole and aspirin) rank within the top 11, and that two further drug pairs within the top 11 have synergistic effects reported in the relevant literature (Table 1). Among the top 10 ranked antagonistic interactions, the antagonism of 7 drug pairs (dipyridamole and timolol, clopidogrel and escitalopram, escitalopram and ticlopidine, carvedilol and ticlopidine, clopidogrel and ropinirole, dipyridamole and metoprolol, dipyridamole and escitalopram) is recorded in databases or on drug-related websites (Table 2). This example demonstrates the effectiveness of the proposed drug interaction prediction method based on multidimensional drug similarity, namely its ability to identify known synergistic and antagonistic interactions.

TABLE 1 Ranking of synergistic drug interactions associated with stroke

[table not reproduced]

TABLE 2 Ranking of antagonistic drug interactions associated with stroke

Rank	Drug 1	Drug 2	Probability of antagonistic interaction
1	Dipyridamole	Timolol	0.963587695
2	Clopidogrel	Escitalopram	0.956255907
3	Escitalopram	Ticlopidine	0.948348315
4	Carvedilol	Ticlopidine	0.944879595
5	Clopidogrel	Ropinirole	0.902465643
6	Dipyridamole	Metoprolol	0.885249271
7	Dipyridamole	Ropinirole	0.883119422
8	Ropinirole	Ticlopidine	0.877290588
9	Dipyridamole	Escitalopram	0.85500597
10	Argatroban	Quinapril	0.841249238

TABLE 3 Ranking of independent drug interactions associated with stroke

Rank	Drug 1	Drug 2	Probability of independent interaction
1	Dabigatran etexilate	Gemfibrozil	0.59
2	Dabigatran etexilate	Simvastatin	0.56
3	Cycloserine	Dabigatran etexilate	0.56
4	Dabigatran etexilate	Lisinopril	0.55
5	Cilostazol	Cycloserine	0.54
6	Dabigatran etexilate	Fosinopril	0.54
7	Dabigatran etexilate	Ropinirole	0.54
8	Carvedilol	Dabigatran etexilate	0.53
9	Dipyridamole	Minocycline	0.53
10	Dabigatran etexilate	Lovastatin	0.53

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for predicting drug interaction based on multidimensional similarity of drugs, the method comprising:

step 1: obtaining drug pairs of known drug interaction type;

step 2: calculating the multidimensional similarity between the two drugs of each drug pair;

step 3: learning a Mahalanobis distance transformation matrix based on the multidimensional drug similarity measures and the known drug interaction classes;

step 4: constructing an SVM classifier for the three classes of drug interaction based on the data after Mahalanobis distance transformation;

predicting the interaction between every two drugs among any plurality (more than 2) of candidate drugs based on the constructed SVM classifier;

for any two of the candidate drugs whose targets, two-dimensional structures and ATC codes are available in the DrugBank database, calculating the multidimensional similarity between the two drugs by the same method as in step 2; transforming the multidimensional similarity with the Mahalanobis distance transformation matrix obtained in step 3 and feeding the result to the classifier of step 4 to obtain the probability that the two drugs belong to each drug interaction type, wherein the probabilities of the three classes of drug interaction sum to 1 and the class with the highest probability is taken as the predicted drug interaction class; that is, the two input drugs are assigned to one of the synergistic, antagonistic and independent drug interaction types.

2. The prediction method of claim 1, wherein:

the drug pairs of known drug interaction type are derived from the following databases:

- at least 100 synergistic drug pairs from the DCDB database whose drug targets are known;

- at least 100 antagonistic drug pairs randomly selected from the drug interactions in the DrugBank database;

- at least 100 independent drug pairs randomly generated from DrugBank drugs, the pairs belonging to neither the DCDB database nor the DrugBank drug interactions;

- the numbers of drug interactions in the three classes being equal.

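3. The prediction method of claim 1, wherein the multidimensional drug similarity calculation of step 2 comprises the following steps:

step 21: calculating drug similarity based on the molecular descriptors of the drugs;

step 22: calculating drug similarity based on the two-dimensional structure of the drug;

step 23: calculating drug similarity based on the drug targets;

step 24: calculating drug similarity based on the drug ATC codes;

step 25: calculating drug similarity based on the pathways in which the drug targets are located;

step 26: calculating drug similarity based on the neighbor nodes of the drug targets in the protein interaction network;

step 27: integrating the drug similarity measurement results based on the different features to obtain the 124-dimensional multidimensional drug similarity.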

4. The prediction method of claim 3, wherein the similarity of the molecular descriptors in step 21 is calculated as follows: [formula not reproduced]

where d₁ and d₂ denote any two drugs, and MD₁^d to MD_n^d denote the n different molecular descriptors of drug d.

5. The prediction method according to claim 3, wherein the step 23 comprises the steps of:

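step 231: obtaining the target set of each drug from the DrugBank database;

step 232: calculating the drug similarity geneS1 as the ratio of the targets shared by the two drugs to the union of their targets: geneS1(d₁,d₂) = |T_d1∩T_d2|/|T_d1∪T_d2|;

step 233: calculating drug similarity based on the semantic similarity of the Gene Ontology (GO) terms of the drug targets, wherein the GO term sets corresponding to the drug target sets are obtained with the pyGS2 package for Python, and the GO semantic similarity is calculated with the pyGS2 package and denoted geneS2;

step 234: calculating the drug similarity geneS3 based on the shortest path lengths between targets in the protein interaction network: [formula not reproduced]

where T_d1 and T_d2 denote the target sets of drugs d₁ and d₂ respectively, t₁ and t₂ are any members of the respective target sets, and d(t₁,t₂) denotes the shortest path length between targets t₁ and t₂ in the PPI network.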

6. The prediction method according to claim 3, wherein said step 25 comprises the steps of:

step 251: obtaining all human pathway information from the KEGG database and, on that basis, defining a pathway profile for each drug:

P_d = (p_d^1, p_d^2, …, p_d^N)

where N is the number of pathways in the KEGG database and p_d^i is the score of the target set of drug d on the i-th pathway;

step 252: calculating the first form of p_d^i based on the overlap ratio between the targets of the drug and the proteins contained in the pathway: [formula not reproduced]

step 253: calculating the second form of p_d^i based on the GO similarity between the GO terms of the drug's targets and the GO terms of the pathway, and calculating a pathway profile for each drug on this basis;

step 254: for the two different forms of pathway profile thus obtained, calculating the similarity between the pathway profiles of two drugs as the Pearson correlation coefficient.

7. The prediction method according to claim 3, wherein said step 26 comprises the steps of:

step 262: replacing the target sets T_d1 and T_d2 in step 232 with the neighbor node sets Nei_d1 and Nei_d2 and calculating the drug similarity based on the ratio of shared target neighbor nodes;

step 263: obtaining the GO term sets corresponding to the neighbor node sets Nei_d1 and Nei_d2 with the pyGS2 package, and calculating the average semantic similarity between the two GO term sets to obtain the drug similarity based on the GO similarity of the target neighbor nodes;

step 264: replacing the target sets T_d1 and T_d2 in step 234 with the neighbor node sets Nei_d1 and Nei_d2 and applying the same calculation as in step 234 to obtain the drug similarity measure based on the shortest path lengths between target neighbor nodes;

step 265: replacing the target set T_d in step 25 with the neighbor node set Nei_d and applying the two calculation modes of step 25 to obtain two further drug similarity measures based on the pathway association of target neighbor nodes.

8. The prediction method of claim 3, wherein the 124-dimensional multi-dimensional similarity of step 27 comprises:

112-dimensional molecular descriptor similarity, 1-dimensional two-dimensional structure similarity, 3-dimensional target similarity, 1-dimensional ATC similarity, 2-dimensional drug path similarity and 5-dimensional target neighbor node similarity.

9. The prediction method of claim 1, wherein the step 3 comprises the steps of:

step 31: constructing the must-link matrix L_s:

L_s(i,j) = 1 if (pair_i, pair_j) ∈ S, and L_s(i,j) = 0 otherwise,

where pair_i and pair_j denote two different drug interaction pairs and (pair_i, pair_j) ∈ S means that the two drug pairs belong to the same drug interaction type. That is, the entry in row i, column j of the matrix is 1 when pair_i and pair_j belong to the same drug interaction type, and 0 otherwise;

constructing the cannot-link matrix L_d:

L_d(i,j) = 1 if (pair_i, pair_j) ∈ D, and L_d(i,j) = 0 otherwise, where (pair_i, pair_j) ∈ D means that the two drug pairs belong to different drug interaction types;

step 32: computing the covariance matrices separately from L_s and L_d: [formula not reproduced]

step 33: learning the Mahalanobis distance transformation matrix W^*: [formula not reproduced]

where tr denotes the trace of a matrix and W^* is the matrix W that maximizes the objective.

10. The method of claim 1, wherein said step 4 comprises the steps of:

step 41: applying the Mahalanobis distance transformation to the original data: X' = (W^*)^T X;

step 42: constructing an SVM classifier for the three drug interaction classes from the transformed data set, that is, constructing the Mahalanobis-distance-based SVM classifier.