CN114520060B

CN114520060B - Medicine path prediction method based on network reasoning

Info

Publication number: CN114520060B
Application number: CN202011312575.9A
Authority: CN
Inventors: 唐贇; 吴曾睿; 王吉烨; 刘桂霞; 李卫华
Original assignee: East China University of Science and Technology
Current assignee: East China University of Science and Technology
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2024-03-29
Anticipated expiration: 2040-11-20
Also published as: CN114520060A

Abstract

The invention provides a drug-pathway prediction method and prediction model based on network reasoning, constructed by the following steps: first, a known drug-pathway network is constructed from large-scale pharmacogenomic data; a substructure-drug network is constructed by computing chemical substructure information; the substructure-drug network and the drug-pathway network are integrated into a substructure-drug-pathway heterogeneous network; a network reasoning algorithm is then applied to the heterogeneous network to build a drug-pathway prediction model. The model takes new chemical entity molecules as input and outputs their potential pathway relationships; combined with a pathway-based drug repositioning strategy, it can be used to discover therapeutic effects of marketed drugs on other diseases. The method is simple and effective and shows good prediction performance in drug-pathway prediction and pathway-based drug repositioning; it also yields results faster and more efficiently than determining drug-pathway relationships experimentally.

Description

Medicine path prediction method based on network reasoning

Technical Field

The invention relates to the fields of computer-aided drug design and drug informatics, in particular to a drug path prediction method based on network reasoning.

Background

Traditional new drug development methods generally follow the principle of "one drug, corresponding to one target, corresponding to one disease", and mainly include target-based drug design, structure-based drug design, property-based drug design, and the like. Although these methods have designed some drug candidates that perform well, there are certain limitations in terms of off-target effects that can occur during clinical use. In recent years, more and more studies have shown that many drugs act therapeutically by modulating multiple targets or disease-associated signaling pathways, rather than just a single target. For example: sorafenib inhibits tumor growth and angiogenesis by inhibiting the activity of RAF/MEK/ERK signaling pathways and receptor tyrosine kinases; co-inhibiting PI3K-Akt signaling pathway and HER kinase activity helps to increase the therapeutic index of tumor patients. Therefore, a new drug design method needs to be developed on the basis of the traditional drug design method so as to improve the therapeutic effect and safety of the drug, reduce higher cost in the new drug research and development process and improve the success rate of new drug research and development.

To speed up new drug development, pathway-based drug discovery (or pathway-based drug repositioning) strategies open up a new perspective for researchers. The main research objective of this strategy is to discover new drug-pathway relationships. During disease development, the phenotype is mainly reflected at the level of molecular pathways in the cell, for example in gene expression levels. Drugs may exert therapeutic effects by altering gene expression levels in disease-related pathways; this action of a drug on a disease-related pathway is what is meant by a drug-pathway relationship. However, determining such relationships experimentally is very time-consuming and labor-intensive. With the development of multi-omics technology, pharmacogenomics can capture the genome-wide changes that occur at the intracellular pathway level under different drug treatments, providing a solid data foundation for studying drug-pathway relationships computationally.

In previous studies, methods based on matrix decomposition, association rules, machine learning, etc. were applied to predict drug-pathway relationships. Although these methods have proven to be somewhat reliable, the amount and variety of data associated with drugs and pathways are growing rapidly, and the capacity of current methods to represent and analyze these data falls far short of what is needed. In addition, current methods cannot broadly predict potential pathway relationships for older drugs, clinically failed drugs, and new chemical entity small molecules. In recent years, a series of network reasoning algorithms (NBI, SDTNBI, bSDTNBI) developed by our research group have received a great deal of attention in the field of drug development. They have been applied to the prediction of multiple kinds of network relationships, including drug-target, drug-adverse drug event, drug-microRNA, and drug-ATC code relationships, and have the significant advantage of not relying on negative samples. Therefore, it is necessary to design a drug-pathway prediction method based on network reasoning that predicts potential pathway relationships for old drugs, clinically failed drugs and new chemical entity small molecules, and to use the method in repositioning studies for drug discovery.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a drug-pathway prediction method based on network reasoning and applications thereof. Compared with traditional drug-pathway prediction methods based on matrix decomposition, association rules, machine learning and the like, the method can predict potential pathway relationships for new chemical entity small molecules and does not depend on negative samples. By building on a network reasoning algorithm, the invention makes full use of the information in the substructure-drug-pathway heterogeneous network, improves model prediction performance, and is simple, effective, and easy to implement.

In a first aspect of the present invention, a method for predicting a drug pathway based on network reasoning is provided, comprising the steps of:

1. Construction of a substructure-drug-pathway heterogeneous network: drug-induced gene signatures are calculated from published pharmacogenomic data, and a known drug-pathway network is constructed by pathway enrichment analysis; a substructure-drug network is constructed by computing various kinds of chemical substructure information for the drugs in the drug-pathway network; finally, the drug-pathway network and the substructure-drug network are integrated into a substructure-drug-pathway heterogeneous network;

2. Extending a network reasoning algorithm to the heterogeneous network: based on the known drug-pathway network and substructure-drug network, for any drug, the pathway nodes and substructure nodes connected to that drug are together allocated one unit of initial resource, and the initial resource matrix of the network reasoning algorithm is constructed; then, in each step of resource diffusion, every node holding resources distributes them among the neighbor nodes connected to it, and the transfer matrix of the network reasoning algorithm is constructed according to the number of diffusion steps;

3. Predicting new drug-pathway relationships: for a given drug or new chemical entity small molecule, the initial resource matrix is propagated through the resource transfer matrix; for any pathway node in the network, the amount of resource it finally holds indicates the strength of the association between the drug and that pathway. The more resource a pathway node holds, the higher its score and the more likely the drug is associated with the pathway.

The method of the present invention will be described in detail below.

1. Construction of substructure-drug-pathway heterogeneous networks

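1.1 Using published pharmacogenomic data, drug-induced gene signatures are calculated and a known drug-pathway network is constructed by pathway enrichment analysis. In this network, the drug set $D = \{D_1, D_2, \ldots, D_{N_D}\}$ represents $N_D$ drugs and the pathway set $P = \{P_1, P_2, \ldots, P_{N_P}\}$ represents $N_P$ pathways. Thus, the matrix representation of the drug-pathway network is:

$$A_{DP}(i,j) = \begin{cases} 1, & \text{if drug } D_i \text{ and pathway } P_j \text{ have a known relationship} \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where $i \in (0, N_D]$ and $j \in (0, N_P]$ are positive integers.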

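1.2 The substructures of each drug are calculated from its chemical structure information, and a substructure-drug network is constructed. In this network, the drug set $D = \{D_1, D_2, \ldots, D_{N_D}\}$ represents $N_D$ drugs and the substructure set $S = \{S_1, S_2, \ldots, S_{N_S}\}$ represents $N_S$ substructures. Thus, the matrix representation of the substructure-drug network is:

$$A_{DS}(i,j) = \begin{cases} 1, & \text{if drug } D_i \text{ contains substructure } S_j \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where $i \in (0, N_D]$ and $j \in (0, N_S]$ are positive integers.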

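1.3 The two networks are integrated to construct the substructure-drug-pathway network. With nodes ordered as drugs, substructures, pathways, and with $A_{DS}$ and $A_{DP}$ denoting the matrices of the substructure-drug network and the drug-pathway network defined above, the matrix representation of the substructure-drug-pathway network is:

$$A = \begin{bmatrix} 0 & A_{DS} & A_{DP} \\ A_{DS}^{T} & 0 & 0 \\ A_{DP}^{T} & 0 & 0 \end{bmatrix} \tag{3}$$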
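The integration in step 1.3 can be illustrated with a small sketch. It assumes the node order drugs, substructures, pathways (which matches the later indexing $F(i, N_D + N_S + j)$); the function name `build_heterogeneous_adjacency` and the toy matrices are hypothetical and are not taken from the patent.

```python
def build_heterogeneous_adjacency(a_ds, a_dp):
    """Assemble a square adjacency matrix over drugs, substructures, pathways.

    a_ds: N_D x N_S 0/1 matrix (drug i contains substructure j)
    a_dp: N_D x N_P 0/1 matrix (drug i has a known relation to pathway j)
    Assumed node order: drugs first, then substructures, then pathways.
    """
    n_d, n_s, n_p = len(a_ds), len(a_ds[0]), len(a_dp[0])
    n = n_d + n_s + n_p
    a = [[0] * n for _ in range(n)]
    for i in range(n_d):
        for j in range(n_s):
            if a_ds[i][j]:
                a[i][n_d + j] = 1          # drug -> substructure
                a[n_d + j][i] = 1          # substructure -> drug
        for j in range(n_p):
            if a_dp[i][j]:
                a[i][n_d + n_s + j] = 1    # drug -> pathway
                a[n_d + n_s + j][i] = 1    # pathway -> drug
    return a


# Toy data (made up): 2 drugs, 3 substructures, 2 pathways.
# The second drug has no known pathway, like a new chemical entity.
a_ds = [[1, 1, 0], [0, 1, 1]]
a_dp = [[1, 0], [0, 0]]
A = build_heterogeneous_adjacency(a_ds, a_dp)
```

Because the second drug shares a substructure with the first, it stays connected to the pathway layer even though it has no known pathway edge, which is the bridging role the text assigns to substructures.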

2. extending network-based reasoning algorithms to the heterogeneous network

2.1 Constructing the initial resource matrix: following the network reasoning algorithm, an adjustable parameter $\alpha \in [0,1]$ is introduced to tune the initial resource allocated to different node types. In the initial allocation, one unit of resource is distributed over all neighbor nodes of each drug node in the substructure-drug-pathway network: the substructure nodes connected to the drug equally share a total of $\alpha$, and the pathway nodes connected to the drug equally share a total of $1-\alpha$. Adjusting $\alpha$ shows which node type should dominate the initial allocation, which helps improve the performance of the prediction model. Thus, the matrix representations of the drug-pathway network and the substructure-drug network are, respectively:

where $i \in (0, N_D]$ and $j \in (0, N_P]$ are positive integers.

where $i \in (0, N_D]$ and $j \in (0, N_S]$ are positive integers.

Thus, the initial resource matrix representation of the substructure-drug-pathway heterogeneous network is:
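One way to read the allocation rule of step 2.1 is sketched below for a single drug. It is an interpretation, not the patent's formula: the function name `initial_resource_row` is hypothetical, and the handling of a drug with no known pathways (no resource is placed on pathway nodes) is an assumption.

```python
def initial_resource_row(a_ds_row, a_dp_row, n_drugs, alpha):
    """Initial resources of one drug over all nodes (drugs, substructures, pathways).

    Substructure neighbours equally share a total of alpha;
    pathway neighbours equally share a total of 1 - alpha.
    Assumption: a node type with no neighbours simply receives nothing.
    """
    k_s = sum(a_ds_row)                      # number of substructure neighbours
    k_p = sum(a_dp_row)                      # number of pathway neighbours
    row = [0.0] * n_drugs                    # drug nodes start with no resource
    row += [alpha * x / k_s if k_s else 0.0 for x in a_ds_row]
    row += [(1.0 - alpha) * x / k_p if k_p else 0.0 for x in a_dp_row]
    return row


# Toy drug (made up): 2 of 3 substructures, 1 of 2 pathways, alpha = 0.1
row = initial_resource_row([1, 1, 0], [1, 0], n_drugs=2, alpha=0.1)
```

With $\alpha = 0.1$ the single known pathway receives 0.9 of the unit resource and each of the two substructures receives 0.05, which matches the document's remark that a small $\alpha$ favors pathway nodes.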

2.2 Constructing the resource transfer matrix: before constructing the resource transfer matrix, the matrix representations of the substructure-drug network and the drug-pathway network used during resource diffusion are redefined, in order to exclude the influence of entirely new chemical entity small molecules that have no known pathway nodes, as follows:

where $i \in (0, N_D]$ and $j \in (0, N_P]$ are positive integers.

where $i \in (0, N_D]$ and $j \in (0, N_S]$ are positive integers.

Following the network reasoning algorithm, two further adjustable parameters, $\beta \in [0,1]$ and $\gamma \in (-\infty, +\infty)$, are introduced in the resource diffusion process. The parameter $\beta$ adjusts the influence of different edge types on the model during resource diffusion, and the parameter $\gamma$ adjusts the influence of hub nodes on the model during resource diffusion. Thus, the transfer matrix of each resource diffusion step is:

where $i, j \in (0, N_D + N_S + N_P]$ are positive integers.
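The roles of $\beta$ and $\gamma$ can be made concrete with a hedged sketch. The exact transfer formula is not reproduced here, so the following is an assumption in the spirit of the description: substructure-drug edges get weight $\beta$, drug-pathway edges get weight $1-\beta$, and each receiving node is scaled by its weighted degree raised to the power $\gamma$ before row normalization. The name `transfer_matrix` is hypothetical, and the redefinition that excludes new chemical entities is not covered.

```python
def transfer_matrix(a_ds, a_dp, beta, gamma):
    """Row-normalised transfer matrix over drugs, substructures, pathways.

    Assumed form (not taken verbatim from the patent):
      edge weight = beta for substructure-drug edges, 1 - beta for drug-pathway edges;
      W[i][j] is proportional to weight(i, j) * degree(j) ** gamma.
    """
    n_d, n_s, n_p = len(a_ds), len(a_ds[0]), len(a_dp[0])
    n = n_d + n_s + n_p
    b = [[0.0] * n for _ in range(n)]        # weighted adjacency
    for i in range(n_d):
        for j in range(n_s):
            if a_ds[i][j]:
                b[i][n_d + j] = b[n_d + j][i] = beta
        for j in range(n_p):
            if a_dp[i][j]:
                b[i][n_d + n_s + j] = b[n_d + n_s + j][i] = 1.0 - beta
    degree = [sum(r) for r in b]             # weighted degree of every node
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        raw = [b[i][j] * degree[j] ** gamma if b[i][j] > 0 else 0.0
               for j in range(n)]
        total = sum(raw)
        if total > 0:                        # isolated nodes keep an all-zero row
            w[i] = [x / total for x in raw]
    return w


# Toy data (made up): 2 drugs, 3 substructures, 2 pathways
W = transfer_matrix([[1, 1, 0], [0, 1, 1]], [[1, 0], [0, 0]], beta=0.1, gamma=0.0)
```

In this sketch a negative $\gamma$ shrinks the share sent to high-degree nodes, which is one way to realize the attenuation of hub nodes that the text attributes to $\gamma = -0.8$.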

3. Implementing reasoning on the heterogeneous network, predicting new drug-pathway relationships

Iteratively reasoning in the heterogeneous network based on the initial resource matrix A' of the constructed substructure-drug-pathway heterogeneous network and the transfer matrix W in each step of resource diffusion process; assuming that the number of resource diffusion times is k, the calculation formula of the final resource diffusion transfer matrix is as follows:

$$F = A' \times W^{k} \tag{12}$$

In this matrix, the value of $F(i, N_D + N_S + j)$ is the score of the drug-pathway interaction between drug $i$ and pathway $j$ predicted by the network reasoning algorithm, where $i \in (0, N_D]$ and $j \in (0, N_P]$ are positive integers.
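As a worked illustration of $F = A' \times W^{k}$, the sketch below propagates the initial resource row of one drug through $k$ diffusion steps and reads off the pathway block. The toy network (2 drugs, 1 substructure, 2 pathways), the unweighted transfer matrix, and the function names are assumptions made for this example only.

```python
def diffuse(initial_row, w, k):
    """Return initial_row multiplied by W^k, computed as k vector-matrix products."""
    row = list(initial_row)
    n = len(w)
    for _ in range(k):
        row = [sum(row[i] * w[i][j] for i in range(n)) for j in range(n)]
    return row


def pathway_scores(final_row, n_drugs, n_subs):
    """Slice out the pathway block, i.e. columns N_D + N_S + j."""
    return final_row[n_drugs + n_subs:]


# Node order: D0, D1, S0, P0, P1. Edges: D0-S0, D1-S0, D0-P0, D1-P1.
W = [
    [0.0, 0.0, 0.5, 0.5, 0.0],   # D0 splits evenly between S0 and P0
    [0.0, 0.0, 0.5, 0.0, 0.5],   # D1 splits evenly between S0 and P1
    [0.5, 0.5, 0.0, 0.0, 0.0],   # S0 splits evenly between D0 and D1
    [1.0, 0.0, 0.0, 0.0, 0.0],   # P0 sends everything to D0
    [0.0, 1.0, 0.0, 0.0, 0.0],   # P1 sends everything to D1
]
initial_d0 = [0.0, 0.0, 0.1, 0.9, 0.0]   # alpha = 0.1: S0 gets 0.1, P0 gets 0.9
scores = pathway_scores(diffuse(initial_d0, W, k=2), n_drugs=2, n_subs=1)
# scores is approximately [0.475, 0.025]
```

Pathway P1 is not a known pathway of D0, yet it receives a non-zero score through the shared substructure S0; ranking such scores is how new drug-pathway relationships are proposed.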

To determine the values of the three adjustable parameters and the number of diffusion steps, the invention trains the model with different substructure information types under different parameters α, β, γ and different diffusion numbers k. First, with the number of diffusion steps k = 2 and γ = 0, the optimal parameters α and β are searched; second, with the optimal α and β fixed, the optimal parameter γ and the optimal number of diffusion steps k are determined; finally, the trained optimal model is evaluated and used for prediction. After optimization, α = 0.1, β = 0.1, γ = -0.8, and k = 2.
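The staged search described above can be sketched as follows. The callable `evaluate` stands in for a cross-validated AUC and is hypothetical, as are the function name and the candidate grids; only the order of the stages (α and β first at k = 2 and γ = 0, then γ, then k) follows the text.

```python
def staged_parameter_search(evaluate, alphas, betas, gammas, ks):
    """Staged search: (alpha, beta) at k=2, gamma=0; then gamma; then k.

    evaluate(alpha, beta, gamma, k) must return a score to maximise,
    for example a cross-validated AUC (hypothetical callable).
    """
    best_alpha, best_beta = max(
        ((a, b) for a in alphas for b in betas),
        key=lambda ab: evaluate(ab[0], ab[1], 0.0, 2),
    )
    best_gamma = max(gammas, key=lambda g: evaluate(best_alpha, best_beta, g, 2))
    best_k = max(ks, key=lambda k: evaluate(best_alpha, best_beta, best_gamma, k))
    return best_alpha, best_beta, best_gamma, best_k


def toy_score(alpha, beta, gamma, k):
    """Made-up smooth score that peaks at the values reported in the text."""
    return -((alpha - 0.1) ** 2 + (beta - 0.1) ** 2 + (gamma + 0.8) ** 2 + (k - 2) ** 2)


best = staged_parameter_search(
    toy_score,
    alphas=[0.1, 0.3, 0.5],
    betas=[0.1, 0.3, 0.5],
    gammas=[-1.0, -0.8, 0.0],
    ks=[2, 4, 6],
)
```

A staged search evaluates far fewer combinations than a full grid over all four parameters, at the cost of possibly missing interactions between the parameters.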

The invention provides a drug path prediction model constructed based on the prediction method, which at least comprises an input display module, a drug path relation calculation module and a storage module. The input display module is used for inputting a disease-pathway network, chemical entities related to the disease and a pharmaceutical composition to be analyzed, and displaying analysis results; the drug path relation calculation module calculates the relevance between the drug and the path according to a drug path prediction method, screens to obtain a potential path, and scores the interaction between the drug composition to be analyzed and the path based on the path; the storage module records and stores the result of the medicine path relation calculation module in real time.

Preferably, the potential pathway is a pathway in which chemical entities associated with the largest number of diseases are involved.

Compared with the prior art, the invention has the beneficial effects that:

(1) The invention provides a drug-pathway prediction method based on network reasoning, applying a network reasoning algorithm to the field of drug-pathway prediction for the first time. Tests on a standard data set and external validation sets show that the invention has good prediction performance in drug-pathway prediction and pathway-based drug repositioning.

(2) Most existing drug-pathway prediction methods rely on conventional machine learning algorithms and require a set of putative negative samples to be created before a model is built. The invention is a drug-pathway prediction framework built on a network reasoning algorithm and has the significant advantage of not depending on negative samples.

(3) Existing drug-pathway prediction methods cannot predict potential drug-pathway relationships for entirely new chemical entity small molecules. The invention constructs the substructure-drug-pathway heterogeneous network by integrating the substructure-drug network and the drug-pathway network, and for the first time uses substructures as a bridge to predict potential drug-pathway relationships for drugs outside the network or entirely new chemical entity small molecules. It can therefore better support researchers in large-scale drug-pathway prediction and provides an effective tool for pathway-based drug discovery or repositioning research.

Drawings

In order to illustrate the technical solutions and embodiments of the present invention more clearly, the drawings used in their description are briefly introduced below.

FIG. 1 is a flow chart of the present invention.

FIG. 2 shows the model performance for different molecular fingerprints under different values of the parameters α and β, evaluated by ten-fold cross-validation, when predicting candidate drugs for known pathways.

FIG. 3 shows the model performance for different molecular fingerprints under different values of the parameter γ, evaluated by ten-fold cross-validation, when predicting candidate drugs for known pathways.

FIG. 4 shows the model performance for different molecular fingerprints under different numbers of resource diffusion steps k, evaluated by ten-fold cross-validation, when predicting candidate drugs for known pathways.

Fig. 5 is a network diagram of tumor-associated pathways in an embodiment of the present invention. Rectangular nodes of different colors represent different types of tumors; circular nodes of different sizes represent the degree values of different paths.

FIG. 6 shows the tumor-associated PI3K-Akt signaling pathway as an important pathway feature for pathway-based anti-tumor drug repositioning in an embodiment of the present invention. LN(IC₅₀) is the natural logarithm (base e) of the IC₅₀ value.

Detailed Description

The following is illustrative of the invention in connection with specific preferred embodiments, but is not intended to limit the scope of the invention. Modifications and substitutions to the methods, steps and conditions of the present invention are within the scope of the present invention without departing from the spirit and nature of the invention. The technical means used in the examples are conventional means well known to those skilled in the art unless specifically stated.

The drug path prediction method based on network reasoning builds a model, the construction flow is shown in figure 1, and the specific implementation process is as follows:

1. construction of substructure-drug-pathway heterogeneous networks

The data set applied by the method comprises a pathway set, a medicine set, a substructure set, medicine-pathway relation data, substructure-medicine association data and a medicine genomics data set.

First, a drug-pathway network is constructed: drug-induced gene signatures are built from the public LINCS and CMap pharmacogenomic datasets. The LINCS pharmacogenomic dataset is divided at an 8:2 ratio into a Pan-cancer dataset and external validation set 1; the CMap pharmacogenomic dataset is used as external validation set 2.

Pathway enrichment analysis was performed on each drug-induced gene signature using the R package clusterProfiler to obtain drug-pathway relationship data, from which the drug-pathway network was constructed (as shown in Table 1). The constructed drug-pathway network was filtered, retaining only the drugs, pathways and relationship entries that meet the following criteria: (1) molecular weight between 200 and 800 inclusive; (2) more than 3 carbon atoms; (3) a cell treatment time of 6 h and a treatment concentration of 10 μM; (4) the gene signatures induced by the same drug in different cells are merged by taking their union; (5) excluding the disease pathways in the KEGG Pathway database, 208 pathways related to metabolism, environmental information processing, cellular processes, organismal systems, etc. are selected as the pathway set; (6) in the pathway enrichment analysis of drug-induced gene signatures, a corrected P < 0.01 is taken to indicate a drug-pathway relationship.

TABLE 1 drug-pathway network profiles in different datasets

Note: in the table, $N_{Drug}$ is the number of drugs in the network; $N_{Pathway}$ is the number of pathways in the network; $N_{DPI}$ is the number of drug-pathway interactions in the network; sparsity equals $N_{DPI} / (N_{Drug} \times N_{Pathway})$.

Second, a substructure-drug network is constructed: the present invention uses molecular fingerprints (FPs) to represent chemical substructure information. The chemical structures of all drugs and small molecules involved in the present invention were converted to canonical SMILES and standardized using Pipeline Pilot software. Four types of molecular fingerprints were generated using PaDEL-Descriptor software: Klekota-Roth FP, MACCS FP, PubChem FP and Substructure FP. Morgan FP was generated using the Python package RDKit with different atomic radii (radius = 1, 2) and lengths (1024 and 2048 bits). Thus, five types of molecular fingerprints characterizing chemical substructure information are used to build substructure-drug networks (as shown in Table 2).

TABLE 2 substructures in different data sets-drug network profiles

Note: in the table, $N_{Drug}$ is the number of drugs in the network; $N_{S}$ is the number of substructures in the network; $N_{DSI}$ is the number of substructure-drug associations in the network; sparsity equals $N_{DSI} / (N_{Drug} \times N_{S})$.

Finally, a substructure-drug-pathway heterogeneous network is constructed: the present invention integrates the above substructure-drug network and drug-pathway network into a substructure-drug-pathway heterogeneous network used for training and evaluating models.

2. Extending network-based reasoning algorithms to the heterogeneous network

The invention constructs a medicine path prediction method based on a network reasoning algorithm. The method uses different substructure information types to train a model under different parameters alpha, beta and gamma and different diffusion times k. Firstly, under the conditions of diffusion times k=2 and gamma=0, searching optimal parameters alpha and beta; secondly, after finding out the optimal parameters alpha and beta, further finding out the optimal parameters gamma and the optimal diffusion times k; and finally, evaluating and predicting the trained optimal model.

3. Implementing reasoning in the heterogeneous network, predicting new drug path relationships

For a given drug, its potential pathway relationships are predicted using the constructed substructure-drug-pathway heterogeneous network and the trained optimal model. In the final resource diffusion matrix, the value of $F(i, N_D + N_S + j)$ is the score of the drug-pathway interaction predicted by the network reasoning algorithm, where $i \in (0, N_D]$ and $j \in (0, N_P]$ are positive integers. The higher the F score, the higher the ranking, indicating that the drug is more likely to act on the pathway. The present invention predicts new drug-pathway relationships for a given drug based on F.

The constructed model runs on a carrier such as a computer and at least comprises an input display module, a medicine path relation calculation module and a storage module. The input display module is used for inputting a disease-pathway network, chemical entities related to the disease and a pharmaceutical composition to be analyzed, and displaying analysis results; the drug path relation calculation module calculates the relevance between the drug and the path according to a drug path prediction method, screens to obtain a potential path, and scores the interaction between the drug composition to be analyzed and the path based on the path; the storage module records and stores the result of the medicine path relation calculation module in real time.

4. Experiment verification

4.1 evaluation index

The present invention uses the area under the receiver operating characteristic curve (AUC) as the evaluation index. For a given drug, the F values predicted for all pathways are sorted in descending order. Given a threshold L, the predicted drug-pathway relationships ranked in the top L are considered positive and the rest negative. By comparing the predicted positive and negative drug-pathway relationships of the drug with its known drug-pathway relationships, the numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) can be counted, from which the true positive rate (TPR) and the false positive rate (FPR) are calculated. A series of thresholds L therefore yields a series of TPR and FPR values. The receiver operating characteristic (ROC) curve is obtained by plotting TPR against FPR, and the AUC is the area under this curve. Beyond the AUC value, the top-ranked drug-pathway relationships in the prediction results of the network reasoning algorithm matter most in practical applications.
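The threshold sweep described above can be sketched for a single drug as follows. The function name `roc_auc_by_rank` and the toy scores are hypothetical; tied scores are not treated specially in this sketch, which is an assumption.

```python
def roc_auc_by_rank(scores, known):
    """AUC for one drug from its pathway scores and its set of known pathway indices.

    The threshold L is swept from 0 to the number of pathways; at each L the
    top-L pathways count as predicted positives. Ties are not handled specially.
    """
    n_pos = len(known)
    n_neg = len(scores) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one known and one unknown pathway")
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]                      # (FPR, TPR) at L = 0
    for j in order:                            # L grows by one per iteration
        if j in known:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    # Trapezoidal area under the (FPR, TPR) curve
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example (made up): pathways 0 and 3 are known for this drug
auc = roc_auc_by_rank([0.9, 0.1, 0.8, 0.3], known={0, 3})
# auc == 0.75: three of the four (known, unknown) pairs are ranked correctly
```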

The invention adopts a ten-fold cross validation method to train and evaluate the model. This is a model evaluation method widely used in network-based method research. In each ten-fold cross-validation process, the relationship (edge) of the drug-pathway in the substructure-drug-pathway heterogeneous network is randomly divided into ten equal parts; one (10%) of them was then used as the test set, and the remaining nine (90%) were used as the training set. Thus, ten different pairs of "training sets-test sets" are obtained. A set of evaluation indices can be calculated for each pair of "training set-test set" and ten-fold cross-validation is repeated 10 times in order to reduce the effect of random factors. The evaluation index is expressed using "mean ± standard deviation".
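A minimal sketch of the edge-level split follows. The function name `ten_fold_splits`, the seed handling and the toy edge list are assumptions; the text only specifies that drug-pathway edges are randomly divided into ten equal parts and that the procedure is repeated ten times.

```python
import random


def ten_fold_splits(edges, n_folds=10, seed=0):
    """Yield (train_edges, test_edges) pairs over randomly shuffled drug-pathway edges."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]   # near-equal parts
    for i in range(n_folds):
        test = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, test


# Toy edge list (made up): 10 drugs x 5 pathways = 50 drug-pathway edges
edges = [(d, p) for d in range(10) for p in range(5)]
splits = list(ten_fold_splits(edges, seed=0))

# Repeating the procedure 10 times can be done by varying the seed:
repeats = [list(ten_fold_splits(edges, seed=s)) for s in range(10)]
```

Only drug-pathway edges are held out; the substructure-drug edges stay in the network for every fold, so a drug whose pathway edges all fall in the test fold can still be scored through its substructures.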

4.2 model evaluation

The invention trains on the drug-pathway network of the Pan-cancer dataset together with substructure-drug networks built from different molecular fingerprint types, and uses external validation sets 1 and 2 to evaluate and validate the trained optimal model. As shown in FIG. 2, with the number of resource diffusion steps k = 2 and the parameter γ = 0, models were trained on the drug-pathway network of the Pan-cancer dataset and the substructure-drug networks characterized by different molecular fingerprint types; every model performed best at α = 0.1 and β = 0.1. A small value of α means that, in the initial resource allocation step, each drug node allocates more initial resource to the pathway nodes connected to it than to the substructure nodes; a small value of β means that drug-pathway edges carry a larger weight than substructure-drug edges.

After fixing α = 0.1 and β = 0.1, the model parameter γ was optimized. As shown in FIG. 3, with k = 2, α = 0.1 and β = 0.1, models were trained on the drug-pathway network of the Pan-cancer dataset and the substructure-drug networks characterized by different molecular fingerprint types; most models performed best at γ = -0.8. Among them, the model using the Morgan (1, 1024) molecular fingerprint performed best (AUC = 0.9358 ± 0.0015). This suggests that a small value of γ appropriately attenuates the influence of hub nodes in the substructure-drug-pathway heterogeneous network and thereby improves model performance.

Finally, after fixing α = 0.1, β = 0.1 and γ = -0.8, the number of resource diffusion steps k was evaluated. As shown in FIG. 4, the performance of every model gradually decreases as the number of resource diffusion steps in the substructure-drug-pathway heterogeneous network increases, regardless of the substructure type characterized by the molecular fingerprint. Therefore, k = 2 is a suitable value. Based on these results, the optimal model uses the Morgan (1, 1024) molecular fingerprint to characterize substructures, with k = 2, α = 0.1, β = 0.1 and γ = -0.8.

Next, the model was evaluated on the two external validation sets (as shown in Table 3). The model performed well on both, with AUCs of 0.8519 and 0.7494, respectively. The tests on the external validation sets therefore show that the invention has good prediction performance in drug-pathway prediction, can better support researchers in large-scale drug-pathway prediction, and provides an effective tool for pathway-based drug discovery or repositioning.

TABLE 3 model evaluation of external validation set

5. Case analysis: pathway-based drug repositioning studies

Using the optimal model built from the drug-pathway network and the substructure-drug network of the Pan-cancer dataset, a pathway-based drug repositioning study was carried out. Specifically, non-tumor drugs with potential anti-tumor effects were identified in the following three steps:

5.1 Analysis of tumor-associated signaling pathways: transcriptomic data for 8,628 tumor samples covering 21 tumor types were downloaded from the TCGA database. A tumor-pathway network was then constructed by differential gene expression analysis and pathway enrichment analysis using the R packages DESeq2 and clusterProfiler, respectively. As shown in FIG. 5, the network includes 21 tumor types, 110 pathways and 289 relationship entries. For example, breast cancer (BRCA) is closely related to the PI3K-Akt pathway (hsa04151), the MAPK pathway (hsa04010) and other pathways. Analysis of this network with the NetworkAnalyzer plugin of the Cytoscape software showed that some pathways have very high degree values in the network, for example the Focal adhesion pathway (hsa04510, degree = 14), the Cell cycle pathway (hsa04110, degree = 12) and the PI3K-Akt pathway (hsa04151, degree = 10). These signaling pathways, which are closely related to tumors, thus provide a basis for pathway-based anti-tumor drug discovery or repositioning studies.
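The degree value used above is simply the number of tumor types linked to a pathway. A small sketch, with a hypothetical function name and a toy edge list in which only the two BRCA links come from the text (the `TUMOR_X` and `TUMOR_Y` entries are placeholders):

```python
from collections import Counter


def pathway_degrees(tumor_pathway_edges):
    """Count, for every pathway, how many distinct tumor types are linked to it."""
    return Counter(pathway for _tumor, pathway in set(tumor_pathway_edges))


# Toy tumor-pathway edges; TUMOR_X and TUMOR_Y are made-up placeholders
toy_edges = [
    ("BRCA", "hsa04151"),
    ("BRCA", "hsa04010"),
    ("TUMOR_X", "hsa04151"),
    ("TUMOR_Y", "hsa04151"),
    ("TUMOR_Y", "hsa04151"),   # duplicate entry, counted once
]
degrees = pathway_degrees(toy_edges)
ranked = degrees.most_common()   # pathways sorted by decreasing degree
```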

5.2 combining the analysis of the tumor-pathway network, determining the pathway characteristics of the anti-tumor drugs according to the optimal model constructed by us. We collected from the literature 46 new chemical entity small molecules in A549 antitumor activity data (IC ₅₀ ) The potential pathway relationship of these small molecules was predicted (prediction length l=6) using the optimal model we constructed, and the predicted pathway was analyzed by the method of rank sum test (Wilcoxon rank sum test). Based on the predicted relationship of small molecules to pathways, alternative assumptions are defined as: small molecules linked to pathway a, their anti-tumor activity data (IC ₅₀ ) Antitumor activity data (IC) smaller than small molecules not linked to pathway a ₅₀ ) Then pathway a is considered a critical pathway feature of antitumor drugs. As shown in FIG. 6, we can find the anti-tumor activity data (IC) of small molecules linked to the PI3K-Akt pathway ₅₀ ) Smaller than those small molecules not associated with the PI3K-Akt pathway (Wilcoxon test: P value=0.0013). And the antitumor activity data (IC) of these small molecules ₅₀ ) The pearson correlation coefficient between the PI3K-Akt pathway is r ² = -0.6489 (P value=1.08E-06), whereas other pathways connected to small molecules do not show significant differences. In addition, by predictive analysis of 24 antitumor drugs in the CCLE database, 19 antitumor drugs were found to have potential to the PI3K-Akt pathwayIs a relationship of (3). Therefore, in the path-based anti-tumor drug repositioning strategy, we prioritize the path characteristics of the PI3K-Akt path as an anti-tumor drug, and discover the potential anti-tumor effect of non-tumor drugs in the marketed drugs according to the characteristics.
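The one-sided comparison in step 5.2 can be sketched with a rank sum test. This is an illustrative implementation using the normal approximation without tie or continuity correction, which is an assumption; the function name and the sample values are made up and are not the patent's data.

```python
import math


def rank_sum_test_less(x, y):
    """One-sided Wilcoxon rank sum test; alternative: values in x tend to be smaller than in y.

    Returns (U statistic of x, approximate p-value). Uses average ranks for ties
    and a plain normal approximation (no tie or continuity correction).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1          # average 1-based rank of the tie group
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2))      # P(Z <= z)
    return u, p


# Made-up LN(IC50) values: molecules predicted to link to a pathway vs. the rest
linked = [1.0, 2.0, 3.0]
not_linked = [4.0, 5.0, 6.0, 7.0]
u_stat, p_value = rank_sum_test_less(linked, not_linked)
```

For small samples or many ties an exact or tie-corrected test from a statistics library would be preferable to this approximation.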

5.3 Discovery of potential anti-tumor drugs among non-tumor drugs based on the PI3K-Akt pathway. Using the drug pathway prediction method developed in the present invention, predictive analysis (prediction length L = 6) of marketed anti-tumor drugs in the DrugBank database predicted that nearly 80% of them interact with the PI3K-Akt pathway. Predictive analysis of marketed non-tumor drugs in the DrugBank database showed that some non-tumor drugs predicted to interact with the PI3K-Akt pathway exhibit a certain anti-tumor effect (see Table 4), for example the lipid-lowering drug lovastatin. Preclinical and clinical studies have also demonstrated that pravastatin and simvastatin can significantly inhibit the growth of tumor cells. In addition, non-tumor drugs such as spironolactone, sitagliptin, acarbose, dexamethasone and pentoxifylline have been shown in related studies to have a certain anti-tumor activity. The case study therefore shows that the predictions of the invention can guide biological experiments and provide a powerful tool for anti-tumor drug repositioning or network-based virtual screening.

Table 4. Anti-tumor effects of non-tumor drugs based on the PI3K-Akt pathway

Note: anti-tumor activity data (IC₅₀) are derived from the PRISM drug repurposing resource.

Claims

1. A medicine path prediction method based on network reasoning is characterized by comprising the following steps:

1) Construction of a substructure-drug-pathway heterogeneous network: calculating drug-induced gene signatures using published pharmacogenomic data and constructing a known drug-pathway network by pathway enrichment analysis; constructing a substructure-drug network by calculating the chemical substructure information of the drugs in the drug-pathway network; and finally integrating the substructure-drug network and the drug-pathway network to construct a substructure-drug-pathway heterogeneous network A:

wherein A_DS is the substructure-drug network and A_DP is the known drug-pathway network;

2) Extending the network-based reasoning algorithm to the heterogeneous network: according to the constructed substructure-drug-pathway heterogeneous network, for any drug, an initial resource of one unit is allocated to each pathway node and each substructure node connected to the drug, and an initial resource matrix A′ is constructed based on the network reasoning algorithm; in each resource diffusion step, the substructure nodes and pathway nodes that hold resource evenly distribute their resource to the neighbor nodes connected to them, and a transfer matrix W is then constructed based on the network reasoning algorithm according to the number of resource diffusion steps,

wherein C(i, j) is the initial resource matrix with the adjustable parameters added, N_D is the drug set, N_S is the substructure set, and N_P is the pathway set;

3) Predicting new drug-pathway relationships: based on the initial resource matrix A′ of the constructed substructure-drug-pathway heterogeneous network and the transfer matrix W of each resource diffusion step, reasoning is performed iteratively in the heterogeneous network, and the final resource diffusion matrix F is calculated as follows:

F = A′ × W^k

the value of F(i, N_D + N_S + j) is the score of the drug-pathway interaction predicted by the algorithm based on network reasoning, where i ∈ (0, N_D] and j ∈ (0, N_P] are positive integers; k is the number of resource diffusion steps,

for a given drug, the amount of resource held by any pathway node in the network indicates the strength of the association between the drug and that pathway; that is, the more resource a pathway node holds, the higher its score and the greater the likelihood that the drug is associated with the pathway.
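The three steps of claim 1 can be sketched in Python on a toy network. This is a minimal illustration under stated assumptions, not the claimed implementation: nodes are assumed to be ordered as drugs, then substructures, then pathways; the transfer matrix is taken as a plain even split among neighbors (no adjustable parameters); the initial resource places one unit on every neighbor of a drug; and the function names and the toy matrices are hypothetical.

```python
def matmul(x, y):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def build_adjacency(a_ds, a_dp):
    """Assemble a symmetric adjacency matrix (order: drugs, substructures, pathways)."""
    n_d, n_s, n_p = len(a_ds), len(a_ds[0]), len(a_dp[0])
    n = n_d + n_s + n_p
    a = [[0.0] * n for _ in range(n)]
    for i in range(n_d):
        for j in range(n_s):
            a[i][n_d + j] = a[n_d + j][i] = float(a_ds[i][j])
        for j in range(n_p):
            a[i][n_d + n_s + j] = a[n_d + n_s + j][i] = float(a_dp[i][j])
    return a


def uniform_transfer(a):
    """Row-normalise A so that every node splits its resource evenly among neighbours."""
    w = []
    for row in a:
        degree = sum(row)
        w.append([v / degree if degree else 0.0 for v in row])
    return w


def predict_scores(a_ds, a_dp, k=2):
    """Return the drug-by-pathway block F(i, N_D + N_S + j) of F = A' x W^k."""
    n_d, n_s = len(a_ds), len(a_ds[0])
    a = build_adjacency(a_ds, a_dp)
    w = uniform_transfer(a)
    # Simplified A': one unit on every neighbour of each drug; other rows are zero.
    f = [a[i][:] if i < n_d else [0.0] * len(a) for i in range(len(a))]
    for _ in range(k):
        f = matmul(f, w)
    return [row[n_d + n_s:] for row in f[:n_d]]


# Toy data: 3 drugs, 2 substructures, 2 pathways; drug 2 has no known pathway.
a_ds = [[1, 0], [1, 1], [0, 1]]
a_dp = [[1, 0], [0, 1], [0, 0]]
scores = predict_scores(a_ds, a_dp, k=2)
```

In this toy case the third drug has no known pathway, yet it receives a non-zero score for the second pathway because it shares a substructure with the second drug, which is the mechanism that allows new chemical entities to be scored.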

2. The method for predicting drug pathways based on network reasoning as recited in claim 1, wherein,

wherein, in step 1), the process of constructing the substructure-drug-pathway heterogeneous network a comprises the following steps:

1.1) Using published pharmacogenomic data, calculating drug-induced gene signatures and constructing a known drug-pathway network by pathway enrichment analysis:

in the network, the drug set represents N_D drugs and the pathway set represents N_P pathways; i ∈ (0, N_D] and j ∈ (0, N_P] are positive integers;

1.2) Based on the chemical structure information of the drugs, calculating the substructures of the drugs and constructing the substructure-drug network:

wherein i ∈ (0, N_D] and j ∈ (0, N_S] are positive integers; in the network, the drug set represents N_D drugs and the substructure set represents N_S substructures;

1.3) Integrating the two networks to construct the substructure-drug-pathway heterogeneous network, expressed as follows:
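The expressions of steps 1.1) to 1.3) are not reproduced in this text. As a hedged reconstruction, assuming binary edge indicators and the node ordering drugs, substructures, pathways implied by the index N_D + N_S + j used for the prediction scores, one form consistent with the surrounding definitions would be the following; the zero blocks and the transposes are assumptions.

```latex
A_{DP}(i,j) =
\begin{cases}
1, & \text{if drug } i \text{ is linked to pathway } j \\
0, & \text{otherwise}
\end{cases}
\qquad
A_{DS}(i,j) =
\begin{cases}
1, & \text{if drug } i \text{ contains substructure } j \\
0, & \text{otherwise}
\end{cases}

A =
\begin{bmatrix}
0 & A_{DS} & A_{DP} \\
A_{DS}^{\mathsf{T}} & 0 & 0 \\
A_{DP}^{\mathsf{T}} & 0 & 0
\end{bmatrix}
```

Here A_DS has size N_D × N_S and A_DP has size N_D × N_P, so that A is a square matrix of order N_D + N_S + N_P.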

3. The network reasoning-based drug pathway prediction method of claim 1, wherein:

wherein step 2) comprises the steps of:

2.1) Constructing an initial resource matrix: according to the algorithm based on network reasoning, an adjustable parameter α ∈ [0, 1] is introduced in the resource diffusion process to adjust the initial resource allocation weights of different node types; in the initial resource allocation process, the neighbor nodes of any drug node in the substructure-drug-pathway network are allocated one unit of initial resource in total, wherein the substructure nodes equally share an initial resource of total amount α and the pathway nodes equally share an initial resource of total amount 1 - α,

the matrix representation of the drug-pathway network and the substructure-drug network with initial resource allocation weights are respectively:

wherein i ∈ (0, N_D] and j ∈ (0, N_P] are positive integers;

wherein i ∈ (0, N_D] and j ∈ (0, N_S] are positive integers;

the initial resource matrix representation of the substructure-drug-pathway heterogeneous network is:

2.2) Constructing a resource transfer matrix:

first, the matrix representations of the substructure-drug network and the drug-pathway network used in the resource diffusion process are redefined so as to exclude the influence of entirely new chemical entity small molecules that have no known pathway nodes, specifically as follows:

wherein i ∈ (0, N_D] and j ∈ (0, N_P] are positive integers;

wherein i ∈ (0, N_D] and j ∈ (0, N_S] are positive integers;

according to the algorithm based on network reasoning, two further adjustable parameters β ∈ [0, 1] and γ ∈ (-∞, +∞) are introduced in the resource diffusion process, wherein the parameter β adjusts the influence of different edge types on the model and the parameter γ adjusts the influence of hub nodes on the model; thus, the transfer matrix of each resource diffusion step is expressed as:

wherein i, j ∈ (0, N_D + N_S + N_P] are positive integers.
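The roles of the three adjustable parameters can be illustrated with a short Python sketch. The α weighting follows the description in step 2.1). The transfer matrix, however, is only one plausible form, since the claimed expression is not reproduced in this text: it assumes that substructure-drug edges are weighted by β, drug-pathway edges by 1 - β, and that each neighbor j is further weighted by its degree raised to the power γ before row normalization. The redefinition that excludes new chemical entities is not modeled, and the function names and toy matrices are hypothetical.

```python
def initial_resource(a_ds, a_dp, alpha=0.1):
    """Per-drug initial resource over substructures (first) and pathways (second).

    The substructures of a drug share a total of alpha and its pathways share
    1 - alpha. Assumption: a node type with no neighbours receives nothing.
    """
    rows = []
    for sub_row, path_row in zip(a_ds, a_dp):
        n_sub, n_path = sum(sub_row), sum(path_row)
        sub_part = [alpha * v / n_sub if n_sub else 0.0 for v in sub_row]
        path_part = [(1.0 - alpha) * v / n_path if n_path else 0.0 for v in path_row]
        rows.append(sub_part + path_part)
    return rows


def transfer_matrix(a_ds, a_dp, beta=0.1, gamma=-0.8):
    """Assumed form: W(i, j) = C(i, j) * deg(j)**gamma / sum_l C(i, l) * deg(l)**gamma."""
    n_d, n_s, n_p = len(a_ds), len(a_ds[0]), len(a_dp[0])
    n = n_d + n_s + n_p
    c = [[0.0] * n for _ in range(n)]  # edge-type weighted adjacency
    degree = [0] * n                   # unweighted node degrees
    for i in range(n_d):
        for j in range(n_s):
            if a_ds[i][j]:
                s = n_d + j
                c[i][s] = c[s][i] = beta
                degree[i] += 1
                degree[s] += 1
        for j in range(n_p):
            if a_dp[i][j]:
                p = n_d + n_s + j
                c[i][p] = c[p][i] = 1.0 - beta
                degree[i] += 1
                degree[p] += 1
    w = []
    for row in c:
        # A negative gamma down-weights high-degree (hub) neighbours.
        weighted = [v * degree[j] ** gamma if v > 0 else 0.0 for j, v in enumerate(row)]
        total = sum(weighted)
        w.append([v / total if total else 0.0 for v in weighted])
    return w


# Toy data: 3 drugs, 2 substructures, 2 pathways; drug 2 has no known pathway.
a_ds = [[1, 0], [1, 1], [0, 1]]
a_dp = [[1, 0], [0, 1], [0, 0]]
init = initial_resource(a_ds, a_dp, alpha=0.1)
w = transfer_matrix(a_ds, a_dp, beta=0.1, gamma=-0.8)
```

With γ = -0.8, the second substructure passes more of its resource to the third drug (degree 1) than to the second drug (degree 3), which shows how a negative γ suppresses hub nodes.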

4. A method of predicting drug pathways based on network reasoning as recited in claim 3 wherein:

wherein α is 0.1, β is 0.1, γ is -0.8, and k is 2.

5. A drug pathway prediction model constructed based on the prediction method according to any one of claims 1 to 4, comprising:

an input display module, a drug path relation calculation module and a storage module,

the input display module is used for inputting a disease-pathway network, chemical entities related to the disease and a pharmaceutical composition to be analyzed, and displaying analysis results;

the drug pathway relation calculation module calculates the relevance between the drug and the pathway according to the drug pathway prediction method, screens to obtain a potential pathway, scores the interaction between the drug composition to be analyzed and the pathway based on the pathway,

and the storage module records and stores the result of the medicine path relation calculation module in real time.

6. The drug pathway prediction model of claim 5, wherein:

wherein the potential pathway is the pathway that has relationships with the largest number of disease-associated chemical entities.