WO2023077854A1 - Drug repurposing system and method based on heterogeneous association network deep learning

Info

Publication number: WO2023077854A1
Application number: PCT/CN2022/104668
Authority: WO
Inventors: 于琦; 贺培风; 张升校; 刘格良; 师高翔; 王琪; 高启超
Original assignee: 山西医科大学
Priority date: 2021-11-03
Filing date: 2022-07-08
Publication date: 2023-05-11
Also published as: CN114038574A

Abstract

The present invention belongs to the technical field of biomedicine. Disclosed are a drug repurposing system and method based on heterogeneous association network deep learning. The system comprises a prediction tool module, an experimental verification module and an external service module. In the prediction tool module, which is the core of the system, electronic medical record (EMR) data are used to update a "drug-disease" association matrix; drug information (EMR data, chemical structures, target protein sequences, side effects and protein interactions) is fused to generate a drug similarity matrix, and disease information (EMR data, ontology and phenotype) is fused to generate a disease similarity matrix; finally, the three matrices are combined to generate a "drug-disease" heterogeneous association network.

Description

A drug repositioning system and method based on heterogeneous association network deep learning

Technical field

The invention belongs to the technical field of biomedicine, and specifically relates to a drug repositioning system and method based on heterogeneous association network deep learning.

Background

Deep learning learns the internal regularities and levels of representation of sample data, and the information obtained in the learning process is of great help in interpreting data such as text, images and sound. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize text, images, sound and other data. Deep learning is a complex class of machine learning algorithms that has achieved many results in search technology, data mining, machine translation, natural language processing, multimedia learning and other related fields. It enables machines to imitate human activities such as seeing, hearing and thinking, and has solved many complex pattern recognition problems, driving great progress in artificial intelligence technologies.

Drug repositioning (also known as drug repurposing, or finding new uses for old drugs) is a strategy that uses deep learning and other technical methods to rescreen, combine or modify existing drugs in order to discover previously unknown uses. This strategy has several advantages: first, the risk of failure is low, because the safety of the drugs used for repositioning has already been demonstrated in preclinical models and in humans; second, the development cycle is short, because preclinical experiments, safety evaluation and even formulation screening have already been completed; third, the required investment is small, because much of the cost of the preclinical stage is saved. Therefore, the present invention studies a drug repositioning system and method based on heterogeneous association network deep learning.

Summary of the invention

The applicant of the present invention previously established an association model for drug repositioning based on the hypothesis that "the gene expression profile of a disease should be inversely correlated with the gene expression profile of a drug that can treat the disease"; that is, the degree of association between drug r and disease d is expressed as:

The value of $Assoc_{r,d}$ lies in the interval [0,1]; the closer it is to 1, the more likely the drug is to treat the disease. For a specific disease, by calculating in turn the degree of association Assoc between it and every drug and setting a threshold, a list $L_r$ of potential therapeutic drugs for the disease can be screened out. By then removing the "drug-disease" association pairs $l_r$ already reported in the medical literature, the list $L'_r$ of drugs with no confirmed association can be determined.
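The screening step described above can be sketched as follows. This is only an illustration: the association scores are taken as given (the Assoc formula itself is not reproduced here), and the function name `screen_candidates`, the threshold of 0.8 and the drug and disease names are assumptions made for the example.

```python
def screen_candidates(assoc, known_pairs, disease, threshold=0.8):
    """Return drugs scoring >= threshold for `disease`, minus documented pairs."""
    # L_r: drugs whose association score with the disease reaches the threshold.
    hits = {drug: score for (drug, d), score in assoc.items()
            if d == disease and score >= threshold}
    # L'_r: remove pairs already reported in the literature (l_r).
    novel = {drug: score for drug, score in hits.items()
             if (drug, disease) not in known_pairs}
    # Highest association score first.
    return sorted(novel, key=novel.get, reverse=True)


# Hypothetical association scores in [0, 1]; all names are placeholders.
assoc = {("drugA", "d1"): 0.91, ("drugB", "d1"): 0.85,
         ("drugC", "d1"): 0.40, ("drugA", "d2"): 0.95}
known = {("drugA", "d1")}
candidates = screen_candidates(assoc, known, "d1")
print(candidates)
```

Keeping the literature filter separate from the threshold filter mirrors the two lists in the text: the thresholded list first, then the list with confirmed pairs removed.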

However, during the application process, the model was found to have the following limitations:

1. The model establishes the relationship between drugs and diseases solely on the principle of expression profile reversal, ignoring two other types of information that can be derived from gene expression profiles: the relationships between drugs and the relationships between diseases.

2. The model does not fully integrate gene expression profile data and medical literature data. The former is only used for association prediction, while the latter is only used for association screening.

In view of the above problems, the present invention provides a drug repositioning system and method based on deep learning of heterogeneous association networks.

In order to achieve the above object, the present invention adopts the following technical solutions:

The present invention provides a drug repositioning system based on heterogeneous association network deep learning, comprising a prediction tool module, an experimental verification module and an external service module, wherein:

The prediction tool module mainly uses the Python programming language to connect to and operate on the EMR database. Specifically, on the basis of known "drug-disease" associations, drug similarity and disease similarity information is incorporated to establish a "drug-disease" heterogeneous network, and the deep neural network algorithm in deep learning or the corrected geometric mean is used to predict potential "drug-disease" associations, thereby realizing drug repositioning;

The experimental verification module is connected to the prediction tool module. It integrates in vivo and in vitro animal experiments, clinical pharmacology test hardware and research protocols into a standardized test process for drug repositioning results, which can support general morphology, molecular biology, behavioral and multi-omics research;

The external service module mainly comprises a data processing and analysis sub-module, a code and solution presentation sub-module, and a training and communication sub-module. The data processing and analysis sub-module provides a solution based on the raw data and analysis targets uploaded by registered users and gives timely feedback to the user; the code and solution presentation sub-module discloses part of the code and solutions to users; the training and communication sub-module carries out training and exchange with peers;

The experimental verification module can also apply specific treatment factors to the experimental subjects according to the research purpose, control the influence of non-treatment factors, observe and evaluate the experimental effect, answer the research hypothesis, and verify the results screened by the prediction tool module.

The present invention also provides a drug repositioning method based on heterogeneous association network deep learning, comprising the following steps:

Step 1, the construction of drug similarity matrix;

Step 2, construction of disease similarity matrix;

Step 3, construction of "drug-disease" heterogeneous association network;

Step 4, potential prediction of "drug-disease" association, i.e. drug repositioning.

Further, the specific process of constructing the drug similarity matrix in step 1 is as follows: according to the completeness and availability of data, four types of attribute information are selected: the chemical structure, target protein sequence, interactions and side effects of the drug. A drug similarity matrix is established for each attribute, namely a drug similarity matrix based on chemical structure, a drug similarity matrix based on target protein sequence, an interaction-based drug similarity matrix and a drug similarity matrix based on side effects; the matrices thus established are then fused with the EMR-based drug similarity matrix to form the drug similarity matrix;

The specific process of constructing the disease similarity matrix in step 2 is as follows: according to the completeness and availability of data, two types of information are selected: the ontology and phenotype of the disease. An ontology-based disease similarity matrix and a phenotype-based disease similarity matrix are established respectively; these are then fused with the EMR-based disease similarity matrix to form the disease similarity matrix;

The specific process of constructing the "drug-disease" heterogeneous association network in step 3 is as follows: using the "drug-disease" adjacency matrix as a bridge, the drug similarity matrix constructed in step 1 and the disease similarity matrix constructed in step 2 are combined to form a "drug-disease" heterogeneous association network:

$$H_{r,d} = \{\{R, D\},\ \{E_r, E_d, E_{r,d}\},\ \{W_r, W_d, W_{r,d}\}\}$$

In the formula, R represents the drug vertex set and D represents the disease vertex set; $E_r$, $E_d$ and $E_{r,d}$ represent the "drug-drug", "disease-disease" and "drug-disease" edges, respectively; $W_r$, $W_d$ and $W_{r,d}$ represent the "drug-drug" similarity value, the "disease-disease" similarity value and whether a therapeutic relationship exists between a drug and a disease, respectively;
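One possible way to hold such a network in memory is a single weighted block matrix over all drug and disease vertices. The sketch below assumes this layout (drug vertices first, then disease vertices); the function name `build_hetero_network` and the small matrices are illustrative only, and plain Python lists are used to keep the example dependency-free.

```python
def build_hetero_network(S_r, S_d, A):
    """Assemble the block matrix [[S_r, A], [A^T, S_d]] over drug + disease vertices."""
    n, m = len(S_r), len(S_d)
    assert len(A) == n and all(len(row) == m for row in A)
    H = []
    for i in range(n):
        # Drug row: drug-drug similarities, then drug-disease associations.
        H.append(list(S_r[i]) + list(A[i]))
    for j in range(m):
        # Disease row: disease-drug associations, then disease-disease similarities.
        H.append([A[i][j] for i in range(n)] + list(S_d[j]))
    return H


# Hypothetical data: 2 drugs, 3 diseases.
S_r = [[1.0, 0.3], [0.3, 1.0]]
S_d = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.2], [0.1, 0.2, 1.0]]
A = [[1, 0, 0], [0, 1, 1]]
H = build_hetero_network(S_r, S_d, A)
for row in H:
    print(row)
```

In this layout the diagonal blocks carry the similarity weights and the off-diagonal blocks carry the 0/1 therapeutic relationships.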

The specific process of predicting potential "drug-disease" associations in step 4, i.e. drug repositioning, is as follows:

4.1 Feature extraction of "drug-disease" association:

4.1.1 The drug-centric feature vector, expressed as:

$$F_{r_i} = \left(A_{i,:},\ (S_r)_{i,:}\right)$$

In the formula, $A_{i,:}$ is the i-th row of the "drug-disease" adjacency matrix A, which represents the set of diseases associated with drug $r_i$; $(S_r)_{i,:}$ is the i-th row of the drug similarity matrix $S_r$, indicating the similarity between drug $r_i$ and the other drugs;

4.1.2 The disease-centric feature vector, expressed as:

$$F_{d_j} = \left(A_{:,j},\ (S_d)_{:,j}\right)$$

In the formula, $A_{:,j}$ is the j-th column of the adjacency matrix A, which represents the set of drugs associated with disease $d_j$; $(S_d)_{:,j}$ is the j-th column of the disease similarity matrix $S_d$, indicating the similarity between disease $d_j$ and the other diseases;

4.1.3 The feature vector of the "drug-disease" association $(r_i, d_j)$ is formed by combining the feature vector $F_{r_i}$ centered on drug $r_i$ and the feature vector $F_{d_j}$ centered on disease $d_j$, expressed as:

$$F_{ij} = \left(F_{r_i},\ F_{d_j}\right)$$
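The feature extraction of step 4.1 can be sketched as simple row and column concatenation. The concatenation order, the function names and the toy matrices below are assumptions for illustration; the only property taken from the text is that each centric vector has length n+m and the pair vector has length 2*(n+m).

```python
def drug_features(A, S_r, i):
    """Row i of A (disease associations) followed by row i of S_r (drug similarities)."""
    return list(A[i]) + list(S_r[i])


def disease_features(A, S_d, j):
    """Column j of A (drug associations) followed by column j of S_d (disease similarities)."""
    return [row[j] for row in A] + [row[j] for row in S_d]


def pair_features(A, S_r, S_d, i, j):
    """Feature vector of the pair (r_i, d_j): drug-centric part, then disease-centric part."""
    return drug_features(A, S_r, i) + disease_features(A, S_d, j)


# Hypothetical data: 2 drugs, 3 diseases.
A = [[1, 0, 0], [0, 1, 1]]
S_r = [[1.0, 0.3], [0.3, 1.0]]
S_d = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.2], [0.1, 0.2, 1.0]]
F_01 = pair_features(A, S_r, S_d, 0, 1)
print(len(F_01), F_01)
```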

4.2 Training of deep neural network model

The deep neural network model adopts a fully connected neural network, that is, every neuron in the current layer is connected to every neuron in the previous layer, and more abstract high-level representations of attribute categories or features are formed by combining low-level features. The "drug-disease" association prediction is set up as a binary classification problem, and the fully connected neural network is built with the classic tower structure: the input layer receives the feature vector $F_{ij}$ generated in step 4.1, and the output layer contains two neurons, which give the probabilities that the test sample is "true" and "false", respectively;

For the set F of "drug-disease" association feature vectors, a "negative" sample set is randomly generated at a ratio of 1:1, and a training set, a test set and a validation set are generated at a ratio of 6:2:2;
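A minimal sketch of this sampling and splitting step is given below. It assumes that "negative" samples are drawn uniformly from pairs with no known association and that the data are shuffled before the 6:2:2 split; the function name `build_dataset`, the seed and the toy adjacency matrix are illustrative.

```python
import random


def build_dataset(A, seed=0):
    """Return (train, test, validation) lists of ((i, j), label) in a 6:2:2 ratio."""
    rng = random.Random(seed)
    positives = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v == 0]
    # 1:1 negative sampling; raises ValueError if there are fewer unknown pairs than positives.
    negatives = rng.sample(unknown, len(positives))
    samples = [(p, 1) for p in positives] + [(q, 0) for q in negatives]
    rng.shuffle(samples)
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = len(samples) * 6 // 10
    n_test = len(samples) * 2 // 10
    return (samples[:n_train],
            samples[n_train:n_train + n_test],
            samples[n_train + n_test:])


# Hypothetical adjacency matrix: 4 drugs, 5 diseases, 5 known associations.
A = [[1, 0, 0, 0, 1],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0]]
train, test, val = build_dataset(A)
print(len(train), len(test), len(val))
```

Sampled "negative" pairs are only unlabelled pairs, so some of them may be true but undiscovered associations.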

The weight between the i-th neuron in layer l and the j-th neuron in layer l-1 is denoted $w^{l}_{ij}$.

The optimal weight set $w = \{w^{l}\}_{l=1}^{L}$ of the L-layer neural network is found using the training set, such that the cross-entropy is minimized;

Parameters are learned with stochastic gradient descent, the Mini-Batch method is used to speed up learning, and the Dropout method is used to avoid overfitting; ten-fold cross-validation is used to evaluate model performance, and the number of hidden layers is tuned according to the evaluation indicators to optimize the model;
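The building blocks named above can be illustrated with a small dependency-free sketch: a tower of layer widths, a forward pass with dropout, the cross-entropy for one sample, and mini-batch iteration. Several details are assumptions not stated in the text: halving the width at each hidden layer, ReLU activations, the absence of bias terms, and the choice of output index 0 for "true". The gradient update itself is omitted, so this is not a complete training procedure.

```python
import math
import random


def tower_sizes(input_dim, n_hidden):
    """Layer widths that halve at each hidden layer and end in 2 output neurons."""
    sizes = [input_dim]
    for _ in range(n_hidden):
        sizes.append(max(2, sizes[-1] // 2))
    return sizes + [2]


def init_weights(sizes, rng):
    # weights[l][i][j]: weight between unit i of layer l+1 and unit j of layer l.
    return [[[rng.uniform(-0.1, 0.1) for _ in range(sizes[l])]
             for _ in range(sizes[l + 1])] for l in range(len(sizes) - 1)]


def forward(x, weights, rng=None, drop_p=0.0):
    """Return [P(true), P(false)]; dropout is applied only if rng and drop_p are given."""
    a = list(x)
    for l, W in enumerate(weights):
        z = [sum(w * v for w, v in zip(row, a)) for row in W]
        if l < len(weights) - 1:
            a = [max(0.0, v) for v in z]              # ReLU (assumed activation)
            if rng is not None and drop_p > 0:
                # Inverted dropout: zero units at random, rescale the survivors.
                a = [0.0 if rng.random() < drop_p else v / (1 - drop_p) for v in a]
        else:
            top = max(z)                               # numerically stable softmax
            e = [math.exp(v - top) for v in z]
            total = sum(e)
            a = [v / total for v in e]
    return a


def cross_entropy(probs, label):
    """Loss for one sample: label 1 uses the 'true' neuron, label 0 the 'false' neuron."""
    return -math.log(probs[0] if label == 1 else probs[1])


def minibatches(samples, batch_size, rng):
    """Yield shuffled mini-batches covering every sample once."""
    order = list(samples)
    rng.shuffle(order)
    for k in range(0, len(order), batch_size):
        yield order[k:k + batch_size]


rng = random.Random(0)
sizes = tower_sizes(10, 2)
weights = init_weights(sizes, rng)
probs = forward([1.0] * 10, weights)
print(sizes, probs, cross_entropy(probs, 1))
```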

4.3 The model optimized in step 4.2 is tested and validated with the test set and the validation set, and the predictions of the tested and validated fully connected neural network model are intersected with the predictions of the existing model to obtain the final predicted drugs.
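The intersection in step 4.3 could look like the sketch below. The two cut-off values, the function name `final_predictions` and the example scores are assumptions; the text only states that the two sets of predictions are intersected.

```python
def final_predictions(dnn_probs, assoc_scores, p_min=0.5, assoc_min=0.8):
    """Keep only the pairs predicted by both the neural network and the existing model."""
    dnn_hits = {pair for pair, p in dnn_probs.items() if p >= p_min}
    assoc_hits = {pair for pair, s in assoc_scores.items() if s >= assoc_min}
    return sorted(dnn_hits & assoc_hits)


# Hypothetical outputs of the two models for the same candidate pairs.
dnn_probs = {("drugA", "d1"): 0.93, ("drugB", "d1"): 0.71, ("drugC", "d1"): 0.20}
assoc_scores = {("drugA", "d1"): 0.88, ("drugB", "d1"): 0.35, ("drugC", "d1"): 0.90}
final = final_predictions(dnn_probs, assoc_scores)
print(final)
```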

Furthermore, the specific process of fusing the drug similarity matrices based on the various attributes with the EMR-based drug similarity matrix in step 1 is as follows: quantile standardization is applied separately to the similarity values of the chemical structure-based, target protein sequence-based, interaction-based, side effect-based and EMR-based drug similarity matrices, and the standardized matrices are then averaged to form the drug similarity matrix.

The specific process of fusing the ontology-based and phenotype-based disease similarity matrices with the EMR-based disease similarity matrix in step 2 is as follows: quantile standardization is applied separately to the similarity values of the ontology-based, phenotype-based and EMR-based disease similarity matrices, and the standardized matrices are then averaged to form the disease similarity matrix.

Merging multiple similarity matrices into a single matrix makes effective use of the different kinds of similarity information and also reduces the complexity of later calculations. However, because the mean values of the elements differ between similarity matrices, fusing the raw similarity values directly would let the matrix with the higher mean dominate the final result; even if the matrices were normalized, the fusion would still be affected by differences in their similarity distributions. Therefore, quantile standardization is used to standardize the similarity values of each similarity matrix before the average is taken.
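As an illustration of this fusion, the sketch below quantile-normalizes the off-diagonal values of several equally sized similarity matrices to a shared distribution and then averages them. The handling of ties (tied values share the mean of the reference values they span), the decision to fix the diagonal at 1, and the names `quantile_normalize` and `fuse_similarity` are assumptions made for the example.

```python
def quantile_normalize(vectors):
    """Map equally long value lists onto a shared distribution, preserving ranks."""
    k = len(vectors[0])
    sorted_vals = [sorted(v) for v in vectors]
    # Reference distribution: mean of the r-th smallest value across all inputs.
    ref = [sum(s[r] for s in sorted_vals) / len(vectors) for r in range(k)]
    out = []
    for v in vectors:
        order = sorted(range(k), key=lambda idx: v[idx])
        normed = [0.0] * k
        r = 0
        while r < k:
            end = r
            while end + 1 < k and v[order[end + 1]] == v[order[r]]:
                end += 1
            # Tied values share the mean of the reference values they span.
            shared = sum(ref[r:end + 1]) / (end - r + 1)
            for idx in order[r:end + 1]:
                normed[idx] = shared
            r = end + 1
        out.append(normed)
    return out


def fuse_similarity(matrices):
    """Quantile-normalize the upper triangles, average them, and rebuild a symmetric matrix."""
    n = len(matrices[0])
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    vectors = [[M[i][j] for i, j in pairs] for M in matrices]
    normed = quantile_normalize(vectors)
    fused = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for pos, (i, j) in enumerate(pairs):
        fused[i][j] = fused[j][i] = sum(v[pos] for v in normed) / len(normed)
    return fused


# Two hypothetical 3x3 similarity matrices with different value distributions.
M1 = [[1.0, 0.2, 0.4], [0.2, 1.0, 0.9], [0.4, 0.9, 1.0]]
M2 = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 1.0]]
fused = fuse_similarity([M1, M2])
for row in fused:
    print(row)
```

After normalization every matrix has the same set of values and differs only in which pair holds which value, so no single matrix dominates the average.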

The similarity of each drug similarity matrix is calculated as follows:

(1) The similarity of the drug similarity matrix based on chemical structure is expressed by the Tanimoto coefficient:

$$\mathrm{Tanimoto}(r, r') = \frac{|C_r \cap C_{r'}|}{|C_r| + |C_{r'}| - |C_r \cap C_{r'}|}$$

In the formula, $|C_r|$ and $|C_{r'}|$ represent the number of chemical substructures in drug r and drug r′, respectively, and $|C_r \cap C_{r'}|$ represents the number of chemical substructures shared by drug r and drug r′;

(2) The similarity of the drug similarity matrix based on the target protein sequence:

In the formula, se(...,...) represents the Smith–Waterman sequence alignment score;
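For illustration, the sketch below computes a Smith–Waterman local alignment score and normalizes it by the geometric mean of the two self-alignment scores, which is one common reading of the geometric-mean standardization mentioned in the detailed description. The scoring scheme (match +2, mismatch -1, linear gap -1) is a simplification chosen for the example; protein alignments normally use a substitution matrix and affine gap penalties, and the text does not state how multiple targets per drug are aggregated.

```python
import math


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def normalized_target_similarity(p, q):
    """Alignment score divided by the geometric mean of the two self-alignment scores."""
    denom = math.sqrt(smith_waterman(p, p) * smith_waterman(q, q))
    return smith_waterman(p, q) / denom if denom else 0.0


# Hypothetical short sequences; real target proteins are much longer.
sim = normalized_target_similarity("MKTAYIAK", "MKTAYLAK")
print(sim)
```

The normalization makes a sequence's similarity with itself equal to 1, so scores of long and short proteins become comparable.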

(3) The similarity of the drug similarity matrix based on interactions is expressed by the Jaccard coefficient:

$$\mathrm{Jaccard}(I_r, I_{r'}) = \frac{|I_r \cap I_{r'}|}{|I_r \cup I_{r'}|}$$

In the formula, $I_r$ and $I_{r'}$ represent the sets of drugs that interact with drug r and drug r', respectively;

(4) The similarity of the drug similarity matrix based on side effects is expressed by the Jaccard coefficient:

$$\mathrm{Jaccard}(E_r, E_{r'}) = \frac{|E_r \cap E_{r'}|}{|E_r \cup E_{r'}|}$$

In the formula, $E_r$ and $E_{r'}$ represent the side effect sets of drug r and drug r', respectively;
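The set-based coefficients used for chemical structure, interactions and side effects can be sketched as follows. Representing chemical substructures as fingerprint bit indices is an assumption for the example, as are the function names and the sample sets.

```python
def jaccard(a, b):
    """|a ∩ b| / |a ∪ b| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tanimoto(c_r, c_r2):
    """Shared substructures divided by (|C_r| + |C_r'| - shared)."""
    c_r, c_r2 = set(c_r), set(c_r2)
    shared = len(c_r & c_r2)
    denom = len(c_r) + len(c_r2) - shared
    return shared / denom if denom else 0.0


# Hypothetical inputs: substructure bit indices and interacting-drug names.
chem_sim = tanimoto({1, 2, 3, 4}, {3, 4, 5})
interaction_sim = jaccard({"drugX", "drugY"}, {"drugY", "drugZ"})
print(chem_sim, interaction_sim)
```

For plain sets the two coefficients give the same number; they are kept separate here only to follow the wording of the text.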

(5) The calculation formula of the similarity of the drug similarity matrix based on EMR is as follows:

$$Sim_{d,pk} = \max(Q_{d,pk}) - \min(Q_{d,pk}), \quad \text{where } |Q_{d,pk}| \ge 2$$

In the formula, $Q_{d,pk}$ represents the type-k laboratory test results in inpatient medical record p after drug d is taken, and $Sim_{d,pk}$ is the largest difference among the $Q_{d,pk}$ values;
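The max-minus-min calculation can be sketched as a grouping step over laboratory results. The record layout (drug, medical record, test type, value), the function name `lab_change` and the sample values are assumptions; no real EMR schema is implied.

```python
from collections import defaultdict


def lab_change(records):
    """Largest difference per (drug, record, test type), only where >= 2 results exist."""
    grouped = defaultdict(list)
    for drug, record_id, test_type, value in records:
        grouped[(drug, record_id, test_type)].append(value)
    return {key: max(vals) - min(vals)
            for key, vals in grouped.items() if len(vals) >= 2}


# Hypothetical post-administration laboratory results.
records = [("d1", "p1", "ALT", 30.0), ("d1", "p1", "ALT", 42.0),
           ("d1", "p1", "K", 4.1),
           ("d2", "p7", "ALT", 25.0), ("d2", "p7", "ALT", 25.0)]
changes = lab_change(records)
print(changes)
```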

The similarity of each disease similarity matrix is calculated as follows:

(1) The calculation formula of the similarity of ontology-based disease similarity matrix is as follows:

In the formula, c(d, d') represents the number of common parent nodes of diseases d and d'; $p_x$ represents the probability of disease x, that is, the ratio of the number of occurrences of disease name x or its child nodes to the number of all disease names;

(2) The similarity of the phenotype-based disease similarity matrix is expressed by the cosine coefficient:

$$\cos(d, d') = \frac{\sum_i w_{d,i}\, w_{d',i}}{\sqrt{\sum_i w_{d,i}^2}\ \sqrt{\sum_i w_{d',i}^2}}$$

In the formula, $w_{d,i}$ and $w_{d',i}$ respectively represent the frequency of occurrence of the i-th MeSH term in the medical description information of diseases d and d';
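A cosine coefficient over MeSH term frequencies can be sketched as below. The input format (a list of MeSH terms extracted from each disease description), the function name and the example terms are assumptions for illustration.

```python
import math
from collections import Counter


def cosine_similarity(terms_d, terms_d2):
    """Cosine of the angle between two MeSH term-frequency vectors."""
    f1, f2 = Counter(terms_d), Counter(terms_d2)
    # Only terms present in both descriptions contribute to the dot product.
    dot = sum(f1[t] * f2[t] for t in f1.keys() & f2.keys())
    norm = (math.sqrt(sum(v * v for v in f1.values()))
            * math.sqrt(sum(v * v for v in f2.values())))
    return dot / norm if norm else 0.0


# Hypothetical MeSH terms found in two disease descriptions.
d_terms = ["Fever", "Cough", "Fever", "Dyspnea"]
d2_terms = ["Fever", "Cough", "Headache"]
pheno_sim = cosine_similarity(d_terms, d2_terms)
print(pheno_sim)
```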

(3) The calculation formula of the similarity of the disease similarity matrix based on EMR is as follows:

In the formula, $G_d$ and $G_{d'}$ represent the feature sets of diseases d and d', respectively.

The evaluation indicators in step 4.2 include Precision, Recall, F1-measure and AUC. The existing model described in step 4.3 is:

The value of $Assoc_{r,d}$ lies in the interval [0,1]; the closer it is to 1, the more likely the drug is to treat the disease.

Compared with the prior art, the present invention has the following advantages:

The drug repositioning system and method based on a heterogeneous association network provided by the present invention establish a "drug-disease" heterogeneous network, and then use the deep neural network algorithm in deep learning or the modified geometric mean to mine the heterogeneous association network, so as to predict potential "drug-disease" associations and thereby realize drug repositioning. The system and method not only integrate "drug-disease" association information, "drug-drug" similarity information and "disease-disease" similarity information, but also fully integrate gene expression profile data and medical literature data (the "drug-disease" adjacency matrix). This can greatly improve the efficiency, accuracy and specificity of drug research and development, provide scientific support for discovering new drug indications and managing the drug development cycle, and provide guidance and support for improving clinical diagnosis and treatment. It enables deep mining of clinical and multi-omics data in the service of biomedicine; it can also promote the generation of scientific hypotheses in the clinical field, accelerate research into new diagnosis and treatment programs, and promote the development of clinical pharmacy, molecular biology and related disciplines; and it can advance the industrialization of drug development, creating considerable market value and promoting economic development.

Description of drawings

Fig. 1 is a schematic diagram of obtaining drug information and generating a drug similarity matrix in the present invention.

Fig. 2 is a schematic diagram of the acquisition of disease information and the generation of disease similarity matrix in the present invention.

Fig. 3 is a schematic diagram of the construction of the "drug-disease" heterogeneous association network of the present invention.

Fig. 4 shows the feature extraction of "drug-disease" associations in the present invention, in which (a) is the "drug-disease" association network, (b) the drug feature matrix, (c) the disease feature matrix, and (d) the "drug-disease" feature extraction.

Fig. 5 is the deep neural network model used in the present invention.

Figure 6 shows the results of patch clamp experiments.

Figure 7 shows the results of the open field experiment, in which A is the statistical histogram of the total distance moved by the mice in each group in the open field, B is the percentage of time the mice in each group were active in the central area of the open field, and C shows representative trajectories of the mice in each group in the open field test; *P<0.05, ***P<0.001.

Figure 8 shows the results of the novel object recognition experiment, in which A is a schematic diagram of the novel object recognition experiment, B is the histogram of the time the mice in each group spent exploring each of two identical objects during the familiarization period, with no statistical difference (P>0.05), and C is the statistical histogram of the NOI of the mice in each group during the test period; *P<0.05, **P<0.01, ***P<0.001.

Figure 9 shows the results of the Morris water maze experiment, in which A is the line graph of the average swimming speed of the mice during the Morris water maze experiment; B is the line graph of the escape latency of the mice in each group during the 5-day place navigation test, where * denotes APP/PS1+Vehicle vs WT+Vehicle and # denotes APP/PS1+TSA vs APP/PS1+Vehicle; C is the histogram of the percentage of time the mice in each group spent in the target quadrant in the spatial probe test; D is the statistical histogram of the number of times the mice in each group crossed the platform location in the spatial probe test; E shows representative swimming trajectories of the mice in each group in the spatial probe test; F is the statistical histogram of the time taken by the mice in each group to reach the platform in the visible platform test; G is the statistical histogram of the average swimming speed of the mice in each group in the visible platform test; *P<0.05, **P<0.01, ***P<0.001.

Detailed description

The technical solutions in the embodiments of the present invention will be described in detail below in combination with the embodiments of the present invention and the accompanying drawings. It should be pointed out that those skilled in the art can make several modifications and improvements without departing from the principle of the present invention, and these should also be regarded as belonging to the protection scope of the present invention.

Example 1

A drug repositioning system based on deep learning of heterogeneous association networks, including a prediction tool module, an experimental verification module and an external service module;

Among them, the prediction tool module mainly uses the Python programming language to connect to and operate on the EMR and other databases, and uses deep learning and other technical methods to re-screen existing drugs. Specifically, on the basis of databases of drug attribute information and disease information that have been acquired and constructed, a drug similarity matrix and a disease similarity matrix are established through similarity calculation, matrix fusion and other methods; the known "drug-disease" association matrix is used as a bridge to integrate drug similarity and disease similarity information into a "drug-disease" heterogeneous network; for a given disease, the deep neural network algorithm in deep learning or the revised geometric mean is used to predict potential "drug-disease" associations, and candidate drugs or compounds that meet the algorithm's requirements are re-screened;

The experimental verification module serves the prediction tool module. It integrates in vivo and in vitro animal experiments, clinical pharmacology test hardware and research protocols into a standardized test process for drug repositioning results, and can meet the requirements of general morphology, molecular biology, behavioral and multi-omics research. The experimental verification module can also apply specific treatment factors to the experimental subjects according to the research purpose, control the influence of non-treatment factors, observe and evaluate the experimental effect, answer the research hypothesis, and verify the results screened by the prediction tool module;

The external service module mainly provides researchers with specialized data processing and analysis services. Registered users can upload raw data and analysis targets to the platform, which provides a solution and gives timely feedback to the user; in addition, the platform discloses part of its code and problem solutions to users, and carries out training and exchange with peers.

Example 2

A drug repositioning method based on heterogeneous association network deep learning, including the following steps:

Step 1, construction of the drug similarity matrix: according to the completeness and availability of data, four types of attribute information are selected: the chemical structure, target protein sequence, interactions and side effects of the drug. A drug similarity matrix is established for each attribute, namely a drug similarity matrix based on chemical structure, a drug similarity matrix based on target protein sequence, an interaction-based drug similarity matrix and a drug similarity matrix based on side effects. Quantile standardization is then used to standardize the similarity values of these attribute-based drug similarity matrices and of the EMR-based drug similarity matrix, which are averaged and fused to form the drug similarity matrix $S_r$, as shown in Figure 1.

(2) Similarity of the drug similarity matrix based on the target protein sequence: the Smith–Waterman sequence alignment score between drug r and drug r' is calculated and standardized by the geometric mean; the target similarity is:

(5) Similarity of EMR-based drug similarity matrix:

The EMR database contains records of drug prescriptions, including time points of administration and various laboratory test results during the patient's hospital stay. We tracked any changes in dosing records and laboratory test results, and described the physiological changes in each test result after drug treatment by calculating the maximum difference. The calculation formula is as follows:

$$Sim_{d,pk} = \max(Q_{d,pk}) - \min(Q_{d,pk}), \quad \text{where } |Q_{d,pk}| \ge 2$$

In the formula, $Q_{d,pk}$ represents the type-k laboratory test results in inpatient medical record p after drug d is taken, and the drug-induced change in type-k laboratory test results is calculated as the largest difference $Sim_{d,pk}$ among the $Q_{d,pk}$ values. For each type of laboratory test, the present invention calculates the similarity between the drug-induced physiological profiles of the two drugs in a drug pair as the P-value of a rank sum test. Finally, the normalized ranking of the P-values of all drug pairs is used as the measure of "drug-drug" similarity, in order to reduce the heterogeneity of the P-value distributions across different laboratory tests. The present invention assumes that different laboratory tests may be related to specific physiological properties of different diseases or drugs; therefore, the similarity of disease or drug pairs is calculated separately for each test type. Because laboratory test results are sparse, only the main laboratory test types are used here, namely those with high coverage (≥0.3) of patients prescribed the drug and with at least two test results ($|Q_{d,pk}| \ge 2$) during the dosing period.
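The rank-sum comparison and the normalized ranking of P-values can be sketched as follows for a single laboratory test type. Several points are assumptions: the two-sided P-value uses the normal approximation without tie correction, a larger P-value is read as "more similar", and the normalized rank is taken as rank divided by the number of drug pairs. The function names and the sample profiles are illustrative.

```python
import math
from itertools import combinations


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum P-value (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    k = 0
    while k < len(pooled):
        end = k
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[k][0]:
            end += 1
        avg_rank = (k + end) / 2 + 1      # mean of the 1-based ranks k+1 .. end+1
        rank_sum_x += avg_rank * sum(1 for _, g in pooled[k:end + 1] if g == 0)
        k = end + 1
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def pvalues_to_similarity(pvals):
    """Normalized rank of each pair's P-value (assumption: larger P means more similar)."""
    ordered = sorted(pvals, key=pvals.get)
    return {pair: (rank + 1) / len(ordered) for rank, pair in enumerate(ordered)}


# Hypothetical drug-induced changes in one laboratory test type.
profiles = {"drugA": [1.0, 2.0, 3.0, 4.0, 5.0],
            "drugB": [1.5, 2.5, 3.5, 4.5, 5.5],
            "drugC": [10.0, 11.0, 12.0, 13.0, 14.0]}
pvals = {(a, b): rank_sum_p(profiles[a], profiles[b])
         for a, b in combinations(sorted(profiles), 2)}
similarity = pvalues_to_similarity(pvals)
print(similarity)
```

Using ranks rather than raw P-values puts every test type on the same scale, which matches the stated aim of reducing heterogeneity between laboratory tests.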

Step 2, construction of the disease similarity matrix: according to the completeness and availability of data, two types of information are selected: the ontology and phenotype of the disease. An ontology-based disease similarity matrix and a phenotype-based disease similarity matrix are established respectively. Quantile standardization is then used to standardize the similarity values of the ontology-based disease similarity matrix, the phenotype-based disease similarity matrix and the EMR-based disease similarity matrix, which are averaged and fused to form the disease similarity matrix $S_d$, as shown in Figure 2.

In the formula, c(d, d′) represents the number of common parent nodes of disease d and d′; p _x represents the probability of disease x, that is, the ratio of the number of disease name x or its child nodes to the number of all disease names;

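In the formula for the phenotype-based similarity, the two frequency terms respectively represent the frequency of occurrence of the i-th MeSH term in the medical description information of diseases d and d';

(3) With reference to the similarity calculation of the EMR-based drug similarity matrix, the calculation formula of the similarity of the EMR-based disease similarity matrix is as follows: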

Step 3, construction of the "drug-disease" heterogeneous association network: using the known "drug-disease" adjacency matrix A as a bridge, the drug similarity matrix $S_r$ constructed in step 1 and the disease similarity matrix $S_d$ constructed in step 2 are combined to build the "drug-disease" heterogeneous association network $H_{r,d}$ (Figure 3):

$$H_{r,d} = \{\{R, D\},\ \{E_r, E_d, E_{r,d}\},\ \{W_r, W_d, W_{r,d}\}\}$$

In the formula, R represents the drug vertex set and D represents the disease vertex set; $E_r$, $E_d$ and $E_{r,d}$ represent the "drug-drug", "disease-disease" and "drug-disease" edges, respectively; $W_r$, $W_d$ and $W_{r,d}$ represent the "drug-drug" similarity value, the "disease-disease" similarity value and whether a therapeutic relationship exists between a drug and a disease (1 or 0), respectively;

Step 4, potential prediction of "drug-disease" association, i.e. drug repositioning:

(1) Feature extraction of "drug-disease" associations: for each "drug-disease" association, its topological feature vector is extracted from the "drug-disease" heterogeneous association network and used as input for training the parameters of the deep neural network model.

Drug-centric feature vector: for drug $r_i$, one feature vector corresponds to its known associations with all diseases in the disease set D, and the other corresponds to its similarities with all drugs in the drug set R. These two vectors are combined to form the feature vector centered on drug $r_i$, expressed as:

$$F_{r_i} = \left(A_{i,:},\ (S_r)_{i,:}\right)$$

In the formula, $A_{i,:}$ is the i-th row of the adjacency matrix A, which represents the set of diseases associated with drug $r_i$; $(S_r)_{i,:}$ is the i-th row of the drug similarity matrix, indicating the similarity between drug $r_i$ and the other drugs; the length of $F_{r_i}$ is n+m.

Disease-centric feature vector: similarly, for disease $d_j$, one feature vector corresponds to its known associations with all drugs in the drug set R, and the other corresponds to its similarities with all diseases in the disease set D. These two vectors are combined to form the feature vector centered on disease $d_j$, expressed as:

$$F_{d_j} = \left(A_{:,j},\ (S_d)_{:,j}\right)$$

In the formula, $A_{:,j}$ is the j-th column of the adjacency matrix A, which represents the set of drugs associated with disease $d_j$; $(S_d)_{:,j}$ is the j-th column of the disease similarity matrix, indicating the similarity between disease $d_j$ and the other diseases; the length of $F_{d_j}$ is n+m.

Therefore, the feature vector of the "drug-disease" association $(r_i, d_j)$ is formed by combining the feature vector $F_{r_i}$ centered on drug $r_i$ and the feature vector $F_{d_j}$ centered on disease $d_j$, expressed as:

$$F_{ij} = \left(F_{r_i},\ F_{d_j}\right)$$

(2) Training of deep neural network model

The layers of a deep neural network can be divided into three parts: the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The deep neural network model of the present invention adopts a fully connected neural network (Fig. 5), that is, every neuron in the current layer is connected to every neuron in the previous layer, and more abstract high-level representations of attribute categories or features are formed by combining low-level features. The present invention sets up the prediction of "drug-disease" associations as a binary classification problem and builds the fully connected neural network with the classic tower structure. The input layer receives the feature vector $F_{ij}$ generated above and has 2*(n+m) neurons; the output layer contains two neurons, which give the probabilities that the test sample is "true" and "false", respectively;

The optimal weight set $w = \{w^{l}\}_{l=1}^{L}$ of the L-layer neural network is found using the training set, so as to minimize the cross-entropy:

Parameters are learned with stochastic gradient descent, the Mini-Batch method is used to speed up learning, and the Dropout method is used to avoid overfitting. The prediction results of the method of the present invention are compared with those of three classifiers (Logistic Regression, Support Vector Machine and Random Forest); ten-fold cross-validation is used to evaluate model performance, and the number of hidden layers is tuned according to the evaluation indicators (Precision, Recall, F1-measure and AUC) to optimize the model;
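The four evaluation indicators can be computed from true labels and predicted probabilities as in the sketch below. The decision threshold of 0.5, the pairwise formulation of AUC (the fraction of positive-negative pairs ranked correctly, with ties counted as one half), the function name and the sample values are assumptions for illustration.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Precision, Recall, F1-measure and AUC for binary labels and predicted scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # AUC: probability that a random positive outscores a random negative.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"precision": precision, "recall": recall, "f1": f1, "auc": auc}


# Hypothetical labels and predicted probabilities of the "true" class.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.6, 0.7, 0.2, 0.4, 0.1]
metrics = classification_metrics(y_true, y_score)
print(metrics)
```

In ten-fold cross-validation these indicators would be computed on each held-out fold and averaged before comparing different numbers of hidden layers.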

(3) Use the test set and validation set to test and verify the model optimized in step (2), and intersect the prediction results of the tested and verified fully connected neural network model with the prediction results of the existing drug repositioning model to obtain the final predicted drugs.

The correlation degree between drug r and disease d in the existing drug repositioning model is expressed as:

The model is established on the assumption that "the gene expression profile of a disease should be inversely correlated with the gene expression profile of a drug that can treat the disease". The closer Assoc is to 1, the more likely the drug is to treat the disease. For a specific disease, the degree of association Assoc between it and every drug is calculated in turn, and a threshold is set to screen out the list L _r of potential therapeutic drugs for the disease.
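The threshold screening step can be sketched as follows, assuming the Assoc values have already been calculated for one disease. The computation of Assoc itself is not shown because its formula is not reproduced in this text; the function name, the drug identifiers and the threshold of 0.7 are hypothetical.

```python
def screen_candidates(assoc_scores, threshold):
    """Return drugs whose Assoc value reaches the threshold, highest score first.

    assoc_scores : dict mapping a drug identifier to its Assoc value for one disease
    """
    hits = [(drug, score) for drug, score in assoc_scores.items() if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [drug for drug, _ in hits]


# Hypothetical precomputed scores for a single disease.
scores = {"drug_x": 0.91, "drug_y": 0.35, "drug_z": 0.78}
L_r = screen_candidates(scores, threshold=0.7)
```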

Example 3

Based on heterogeneous association analysis between the database of drugs routinely used in the clinical treatment of ventricular arrhythmia and the database of ventricular arrhythmia diseases, a heterogeneous association network deep learning model was established to discover the therapeutic effect on ventricular arrhythmia of drug B, a drug used to treat other diseases, as follows:

(1) Establish the similarity matrix of antiarrhythmic drugs

On the basis of establishing similarity between drugs based on transcriptome data, four types of associations are added: chemical structure similarity, interaction similarity, protein target similarity and side effect similarity.

A collection of drug similarity matrices: according to the calculation method of each similarity in Example 2, construct the drug transcriptome similarity matrix, the drug chemical structure similarity matrix, the drug interaction similarity matrix, the drug protein target similarity matrix, and the drug side effect similarity matrix.

Fusion of drug similarity matrices: merging multiple similarity matrices into a single matrix, effectively utilizing information of various similarities, and reducing the complexity of later calculations. Since the mean values of the elements of the five drug similarity matrices are different, the quantile standardization was selected to standardize the five types of matrices, and then the average value was taken to obtain the final drug similarity matrix S _r .
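One way to realise this fusion step is sketched below: the matrices are quantile-normalized so that they share a common value distribution and are then averaged element by element. The text does not specify the exact variant of quantile standardization, so the rank-based procedure here is an assumption, as are the function names. Ties are broken by position, which can slightly break the symmetry of a similarity matrix; a production version might average tied ranks instead.

```python
def quantile_normalize(matrices):
    """Quantile-normalize equally sized matrices to one shared value distribution."""
    flat = [[v for row in M for v in row] for M in matrices]
    size = len(flat[0])
    # For every matrix, the positions of its entries in ascending order of value.
    orders = [sorted(range(size), key=lambda k: values[k]) for values in flat]
    # Reference distribution: mean of the k-th smallest values across the matrices.
    reference = [
        sum(values[order[k]] for values, order in zip(flat, orders)) / len(flat)
        for k in range(size)
    ]
    n_cols = len(matrices[0][0])
    normalized = []
    for order in orders:
        out = [0.0] * size
        for rank, position in enumerate(order):
            out[position] = reference[rank]       # replace each value by its rank's reference
        normalized.append([out[r:r + n_cols] for r in range(0, size, n_cols)])
    return normalized


def fuse(matrices):
    """Average the quantile-normalized matrices element by element."""
    normalized = quantile_normalize(matrices)
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(M[a][b] for M in normalized) / len(normalized) for b in range(n_cols)]
        for a in range(n_rows)
    ]


# Two toy matrices on very different scales (hypothetical values).
M1 = [[4.0, 1.0], [2.0, 3.0]]
M2 = [[0.4, 0.3], [0.1, 0.2]]
N1, N2 = quantile_normalize([M1, M2])
S_fused = fuse([M1, M2])
```

After normalization both matrices contain the same set of values, so neither scale dominates the average.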

(2) Establish a ventricular arrhythmia disease similarity matrix

On the basis of establishing the similarity of various ventricular arrhythmia diseases based on transcriptome data, two types of associations, phenotype similarity and ontology similarity, are added.

A collection of disease similarity matrices: according to the calculation method of each similarity in Example 2, construct the disease transcriptome similarity matrix, the disease phenotype similarity matrix, and the disease ontology similarity matrix.

Fusion of disease similarity matrices: similarly, quantile standardization is selected to standardize the three types of matrices, and then the average value is taken to obtain the final disease similarity matrix S _d .

(3) Establish a heterogeneous association network of "anti-ventricular arrhythmia drugs-ventricular arrhythmia diseases", and mine a drug repositioning model based on the heterogeneous association network.

Extract the drug-centric feature vectors and the disease-centric feature vectors to form the feature vector F _ij of the "drug-disease" association (r _i , d _j ), and treat the prediction of the "anti-ventricular arrhythmia drug-ventricular arrhythmia disease" association as a binary classification problem. The input layer of the fully connected neural network receives the feature vector F _ij generated in the previous stage, with a total of 2*(n+m) neurons; the output layer contains two neurons, which respectively represent the probabilities that the test sample belongs to "true" and "false". The stochastic gradient descent method is used to learn the parameters, the Mini-Batch technique is used to speed up learning, and the Dropout technique is used to avoid model overfitting.

The present invention randomly generates a "negative" sample set for the feature vector set F at a ratio of 1:1. Generate training set, test set and validation set according to the ratio of 6:2:2. The prediction results of the model of the present invention are compared with the three classifiers of Logistic Regression, Support Vector Machine and Random Forest, and ten-fold cross-validation is used to evaluate the performance of the model. The list of anti-ventricular arrhythmia drugs predicted by the original drug repositioning model is marked as L _old , and the list of anti-ventricular arrhythmia drugs predicted by the drug repositioning model of the present invention is marked as L _new . The intersection of the two sets L=L _old ∩L _new is determined as the final predicted drug (lead compound).
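The sample preparation and the final intersection can be illustrated with the sketch below. It assumes that "negative" samples are drawn at random from pairs without a known association, which the text does not state explicitly; the function names, the toy adjacency matrix and the drug identifiers are hypothetical.

```python
import random


def sample_negatives(A, n_negatives, rng):
    """Randomly pick (i, j) pairs with no known association (entry 0 in A)."""
    unknown = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v == 0]
    return rng.sample(unknown, n_negatives)


def split_622(samples, rng):
    """Shuffle and split samples into training, test and validation sets (6:2:2)."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_test = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


rng = random.Random(42)
A = [[1, 0, 0, 1, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 1]]
positives = [(i, j, 1) for i, row in enumerate(A) for j, v in enumerate(row) if v == 1]
# 1:1 ratio: as many negatives as positives.
negatives = [(i, j, 0) for i, j in sample_negatives(A, len(positives), rng)]
train, test, val = split_622(positives + negatives, rng)

# Final predicted drugs: intersection of the two models' candidate lists.
L_old = {"drug_a", "drug_b", "drug_c"}
L_new = {"drug_b", "drug_d"}
L_final = L_old & L_new
```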

(4) Animal experiments on the predicted drugs: using the patch clamp technique, the changes in the resting potential (RP) and in the amplitude and time course of the action potential (AP) of isolated rat ventricular myocytes before and after drug action are measured to evaluate whether the drug has an anti-ventricular arrhythmia effect and thereby validate the machine learning prediction results.

The specific process is as follows: the isolated rat heart is suspended on the Langendorff perfusion device through the aorta, the heart is perfused with calcium-free Tyrode's solution and enzyme solution in turn, and single ventricular myocytes are isolated after the ventricular muscle tissue becomes larger. Using the patch clamp technique, RP and AP are recorded in current clamp mode. The animal experiment data are expressed as (x̄±s) and analyzed with SPSS statistical software; P<0.05 means that the difference is statistically significant, and drugs showing statistically significant differences can be used as backup drugs for the experiment.

As shown in Figure 6, drug A can increase the resting potential of isolated ventricular myocytes in a dose-dependent manner and shorten the terminal repolarization time of the action potential. The results of this experiment showed that the effects of drug B on the action potential were consistent with those of an IK1 agonist. Moderately enhancing IK1, and thereby increasing or restoring the resting potential, counteracts ischemic arrhythmia through the following mechanisms: ① increasing the negative value of the resting potential reverses the membrane depolarization caused by pathological factors and reduces the excitability of cells; ② increasing the membrane conductance reduces the abnormal fluctuation of membrane potential caused by changes in membrane current and increases the electrical stability of the membrane; ③ appropriately shortening the action potential duration (APD) helps to prevent early and late afterdepolarizations (EAD) and thus triggered arrhythmias.

(5) Clinical trials of alternative drugs: with the approval of the Medical Ethics Committee and the informed consent of the patients, randomized controlled clinical trials will be carried out. Whether the new drug is effective is evaluated on the basis of changes in indicators such as total effective rate, ambulatory electrocardiogram improvement, blood pressure, heart rate, electrocardiogram, blood lipids, blood sugar, and adverse reactions (the results of the animal experiments in this example are positive, and clinical trials have not yet been carried out).

Example 4

Based on the clinical drug database of Alzheimer's disease (AD), the multi-omics database of AD, the text database of clinical medical records, etc., heterogeneous association analysis was carried out and a deep learning model of the heterogeneous association network was established, and the therapeutic effect of a new compound C on AD was discovered, as follows:

(1)～(3) Same as "Example 3": by establishing the similarity matrix of AD treatment drugs, establishing the similarity matrix of AD multi-omics data and clinical medical record text, and establishing the "AD treatment drugs-AD disease" heterogeneous association network, a drug repositioning model is mined on the basis of the heterogeneous association network. After seven final predicted drugs (lead compounds) were determined, compound C was selected as the research object through literature research, pharmacophore analysis and other means, and experimental verification was carried out.

(4) Experimental verification of mouse behavior

Eight-month-old APP/PS1 mice and their littermate wild-type mice (wild-type, WT) were used in the experiment. According to the different types of mice and drug treatment, they were randomly divided into four groups: wild-type control group (WT+Vehicle), wild-type administration group (WT+C), APP/PS1 control group (APP/PS1+Vehicle) and APP/PS1 administration group (APP/PS1+C).

Animals in each group received intraperitoneal injection of compound C (2 mg/kg) or an equal volume of vehicle (Vehicle) from 30 days before the behavioral experiment, and the drug injection continued until the end of the behavioral experiment. On the 31st day of drug treatment, three behavioral tests were performed: open field test, novel object recognition test, and Morris water maze test.

Experimental results:

①Compared with mice in WT+Vehicle group, the total movement distance of mice in APP/PS1+Vehicle group was significantly increased in the open field (P<0.05), and the percentage of time in the central area tended to increase. The total movement distance of APP/PS1 mice after C treatment was significantly reduced (P<0.001), and the percentage of time in the central area also showed a downward trend. It shows that compound C can significantly reduce the hyperactive state of APP/PS1 mice in the open field, as shown in FIG. 7 .

②In the novel object recognition experiment, there was no significant difference among the four groups of mice in the exploration time for two identical objects during the familiarization period (P>0.05). The novel object recognition index of the mice in the APP/PS1+Vehicle group was significantly decreased (P<0.01), and the novel object recognition index of the APP/PS1+C group mice was significantly improved after treatment with compound C (P<0.05). In addition, compound C treatment also significantly improved the novel object recognition index of wild-type mice (P<0.001). This shows that compound C can improve the short-term recognition memory of mice, see Figure 8.

③The water maze navigation test showed that the escape latency of mice in the APP/PS1+Vehicle group on the 4th day (P<0.001) and the 5th day (P<0.05) was significantly longer than that of the WT+Vehicle group mice, and was significantly shortened after compound C treatment (P<0.05). The results of the space exploration experiment showed that, compared with the WT+Vehicle group mice, the percentage of residence time in the target quadrant (P<0.05) and the number of platform crossings (P<0.01) of the APP/PS1+Vehicle group mice were significantly decreased, while after compound C treatment both increased significantly (P<0.05). During the water maze test, there was no statistical difference in the swimming speed of the mice in each group (P>0.05). This shows that compound C can enhance the spatial learning and reference memory of APP/PS1 mice, as shown in FIG. 9 .

Example 5

Based on the clinical drug database of viral hepatitis B (hepatitis B for short), the multi-omics database of hepatitis B disease, and the text database of clinical medical records, heterogeneous association analysis was carried out and a heterogeneous association network deep learning model was established, and the therapeutic effect on hepatitis B of ellipticine, a compound naturally present in the leaves of Ochrosia elliptica (a plant of the oleander family), was discovered, as follows:

(1)～(3) Same as "Example 3": by establishing the similarity matrix of hepatitis B treatment drugs, establishing the similarity matrix of hepatitis B multi-omics data and clinical medical records, and establishing the "hepatitis B treatment drugs-hepatitis B disease" heterogeneous association network, a drug repositioning model is mined on the basis of the heterogeneous association network. After more than ten final predicted drugs (lead compounds) were determined, ellipticine and camptothecin were selected as research objects through literature research, pharmacophore analysis and other means, and experimental verification was carried out (for the camptothecin-related prediction and verification process, see Example 6).

(4) Culture of HepG2.2.15 cells

① Recovery of HepG2.2.15 cells

Add 7 ml of DMEM culture medium containing 10% fetal bovine serum and penicillin-streptomycin (PS) to a 10 ml glass centrifuge tube in advance. Take the frozen cells out of the liquid nitrogen tank and quickly place them in a 37.0°C constant-temperature water bath, shaking continuously for 2-3 minutes so that they thaw quickly. Once the cells are completely thawed, take out the cryopreservation tube, disinfect it with a 75% alcohol cotton ball, open it, draw out the cell suspension with a pipette, and transfer it into the culture medium in the prepared centrifuge tube. After mixing, centrifuge at low speed (800 rpm) for 5 min and discard the supernatant; add 7 ml of the above culture medium, resuspend the cells by pipetting, centrifuge again at 800 rpm for 6 min, and discard the supernatant. Add the above cell culture medium to the centrifuge tube (the culture flask used is 25 cm², so 5 ml of culture medium is added), resuspend all the cells by pipetting, transfer the cell-containing medium into the cell culture flask, and pipette repeatedly to form a single-cell suspension. Place the flask in a 5% CO2 incubator at 37.0°C for culture; the next day, observe the growth of the cells, replace the cell culture medium once, and continue culturing at 37.0°C in the 5% CO2 incubator.

② Subculture of HepG 2.2.15 cells

Discard the old cell culture medium in the cell culture flask and wash the cells twice with PBS buffer solution at pH 7.2. Add 1 ml of 0.25% trypsin to the flask and shake it gently so that the trypsin flows over the entire cell surface and is in full contact with the cells. Put the flask into a 37.0°C incubator and digest for 2 to 3 minutes. Observe the flask under an inverted microscope; when the cytoplasm is seen to retract and the intercellular space to widen, terminate the digestion immediately by adding 5-10 ml of DMEM cell culture medium containing 10% FBS and PS to the flask. Draw up the culture medium in the flask with a pipette and repeatedly pipette it over the cells on the wall of the flask so that all the cells are suspended and form a single-cell suspension. Count the cells on a cell counting plate and inoculate them into new cell culture flasks at 5×10⁵ cells per 12.5 cm²; culture at 37.0°C in a 5% CO2 incubator.

③Cryopreservation of cells

Select one flask of cells in good growth state and in the logarithmic growth phase. Digest the monolayer-grown cells with 0.25% trypsin (rinse with 1 ml of trypsin and discard it, then add 1 ml of trypsin and digest at 37.0°C for 2-3 minutes). Add 5 ml of DMEM cell culture medium containing PS and 10% FBS to the flask, suspend the cells and pipette them into single cells, transfer them to a centrifuge tube, and centrifuge at 800 rpm for 5 min. Discard the supernatant, add the prepared DMEM containing 10% DMSO and 20% FBS to the centrifuge tube, and pipette several times to make the cell suspension uniform. Divide the cell suspension into cryopreservation tubes, 1.5 ml per tube, and label each tube with the cell name, storage time and medium used. Place the tubes at 4°C for 30 minutes, at -20°C for 1.5-2 h, and at -70°C for 12 h, and then transfer them to a liquid nitrogen tank for storage.

④ Seeding cells into cell culture plates

Take a flask of HepG 2.2.15 cells in good growth state and in the logarithmic growth phase, add 1 ml of 0.25% trypsin to the flask, shake it gently, and pour off the trypsin; then add another 1-2 ml of 0.25% trypsin, digest at 37.0°C for 2-3 minutes, and gently pour off the trypsin. Add 3 ml of DMEM cell culture medium without PS and 10% FBS and pipette the cells repeatedly to make a single-cell suspension. After confirming that a single-cell suspension has formed, add an appropriate amount of the above DMEM culture medium to the flask, count the cells, and adjust the cell density to 2×10⁵/ml. Take out a 24-well cell culture plate and add 500 μl of the cell suspension to each well (1×10⁵ cells per well); place the plate in a 5% CO2 incubator at 37.0°C, culture overnight, and use it the next day.

(5) CCK-8 detects the cytotoxicity of drugs

① Ellipticine drug treatment

Dissolve ellipticine in a small amount of DMSO, then use maintenance medium to dilute the stock solution to the highest concentration to be tested in the experiment, 0.01 μmol/L; filter-sterilize with a 0.2 μm disposable syringe filter and aliquot for use. Serial dilution method: dilute in EP tubes, with the first tube containing the drug solution at the highest concentration. Starting from the second tube, add a fixed volume of maintenance medium to each tube, then transfer the same volume of drug solution from the first tube to the second tube. After mixing by pipetting, discard the tip and replace it with a new one, transfer the same volume of drug solution to the third tube, and so on up to the penultimate tube; the last tube contains maintenance medium without drug. In this way, the drug is diluted into four concentrations, A, B, C and D;

② Add the drug-containing medium to the cell culture well plate for cytotoxicity test

HepG 2.2.15 cells were inoculated in a 96-well cell culture plate at 8×10⁴/ml, 0.1 ml per well. After the cells had grown into a monolayer, the culture medium was removed, and ellipticine at the four concentrations A, B, C and D was added to the corresponding cell wells for culture, with three replicate wells for each concentration. A blank control group (medium only, no cells) and a normal cell control group (no drug, normal medium only) were set up. After the drug was added, the plate was placed in a 5% CO2 incubator and cultured at 37.0°C for 72 hours. After 72 hours of culture, the plate was taken out, 10 μl of CCK-8 solution was added to each well, and culture was continued in the incubator for 2 h. After 2 h, the absorbance at 450 nm was measured with a microplate reader and the results were printed, followed by data processing, analysis and calculation.

Cell survival rate = (A value of experimental group - A value of blank control group) / (A value of normal cell control group - A value of blank control group) × 100%. Half cytotoxic concentration (CC50) = Antilog[log B + (50 - inhibition percentage of A) / (inhibition percentage of A - inhibition percentage of B) × C], where A is the drug concentration giving ≥50% inhibition, B is the drug concentration giving ≤50% inhibition, and C is the log of the dilution factor. Computer software is used to calculate the maximum non-toxic concentration of ellipticine.
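The cell survival rate calculation can be illustrated with the short sketch below, which averages replicate absorbance readings before applying the formula. The function names and the absorbance values are hypothetical and are not experimental data from this example; the CC50 calculation is not sketched here.

```python
def mean(values):
    """Arithmetic mean of replicate absorbance readings."""
    return sum(values) / len(values)


def cell_survival_rate(a_experimental, a_normal_control, a_blank):
    """Cell survival rate (%) from absorbance (A) values measured at 450 nm."""
    return (a_experimental - a_blank) / (a_normal_control - a_blank) * 100.0


# Hypothetical triplicate readings, not data from the experiment.
a_exp = mean([0.88, 0.90, 0.92])      # drug-treated wells
a_ctrl = mean([1.08, 1.10, 1.12])     # normal cell control wells (no drug)
a_blank = mean([0.10, 0.10, 0.10])    # blank wells (medium only, no cells)

survival = cell_survival_rate(a_exp, a_ctrl, a_blank)
```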

(6) Inhibition experiment of ellipticine on HBsAg and HBeAg

① Treatment with ellipticine at non-cytotoxic doses

According to the above CCK-8 method, the non-cytotoxic concentration of ellipticine is 0.01 μmol/L. The 0.01 μmol/L drug solution is serially diluted to 0.005 μmol/L, 0.0025 μmol/L, and so on, giving a series of concentrations that includes 0.01 μmol/L. Add the drug-containing media of different concentrations to the same 24-well cell culture plate, set up three replicate wells for each concentration, and set up a blank control group and a normal cell control group, so as to observe the inhibition rate of the same drug on HBsAg and HBeAg at different concentrations and the dose-effect relationship. After the drug-containing medium is added and the cells are cultured in an incubator for 72 hours, the cell supernatant is collected and stored at -20°C for testing.
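The concentrations named above (0.01, 0.005 and 0.0025 μmol/L) correspond to a two-fold serial dilution. A minimal sketch of such a series follows; the function name and the choice of four levels are assumptions for illustration, since the text gives only the first three values.

```python
def twofold_series(top_concentration, n_levels):
    """Concentrations of a two-fold serial dilution, highest first."""
    return [top_concentration / 2 ** k for k in range(n_levels)]


# Concentrations in μmol/L, starting from the non-cytotoxic concentration.
concentrations = twofold_series(0.01, 4)
```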

②Detection of HBsAg and HBeAg by ELISA method (just follow the instructions of the ELISA kit)

Take the ELISA kit out of the 4°C refrigerator, take out the cell culture supernatant to be tested at the same time, and let both equilibrate at room temperature for 1 hour. Prepare the washing solution: take 25 ml of washing solution, add double-distilled water to 500 ml, mix well, and set aside. Add 50 μl of the sample to be tested, negative control, and positive control to the corresponding wells (two wells each for the negative and positive controls) and mix well; apply the sealing film and incubate at 37.0°C for 30 minutes. Pour off the contents of the plate, pat it dry on filter paper, fill the wells with washing solution, let stand for 10-20 seconds, shake off the washing solution, and pat dry; wash 5 times in total. Color development: add 50 μl of chromogenic reagent A to each well in sequence, then add 50 μl of chromogenic reagent B, apply the sealing film, transfer to 37.0°C as soon as possible, and develop color for 15 minutes in the dark. Termination: add 50 μl of stop solution to each well. Interpretation: complete the reading within 10 minutes after termination; zero the microplate reader with the blank control and measure the absorbance (A) at a wavelength of 450 nm. The results are expressed as the ratio of the positive well A value to the negative well A value (P/N), with the OD value read at 450 nm.

The formula for calculating the inhibitory rate of ellipticine on the expression of HBsAg and HBeAg:

Inhibition rate = (experimental well P/N value - control well P/N value) / (control well P/N value - 2.1) × 100%

(7) Inhibition experiment of ellipticine on HBV DNA (operated strictly in accordance with the instructions of the DNA detection kit)

The drug treatment was the same as the HBsAg and HBeAg inhibition tests; the cells were added with different concentrations of drugs, and after 72 hours of culture, the HBV DNA load in the cell supernatant was measured by real-time fluorescent quantitative PCR; the inhibitory effect of different concentrations of drugs on HBV DNA was observed.

(8) The data obtained were statistically analyzed using SPSS 22.0 system, and the data were tested by analysis of variance, and P<0.05 was considered statistically significant.

(9) Experimental results

After HepG 2.2.15 cells were cultured for 72 hours by adding non-toxic dose of drug-containing serum, the contents of HBsAg, HBeAg and HBVDNA in the cell supernatant were detected, and the results were as follows:

Table 1 Inhibitory effect of ellipticine on HBsAg and HBeAg secreted by HepG 2.2.15 cells (n=3, x̄±s)

Compared with the cell control group, *P<0.05

Table 2 Inhibitory effect of ellipticine on HBV DNA (1×10⁶ IU/ml) in culture supernatant of HepG 2.2.15 cells (n=3, x̄±s)

Compared with the cell control group, *P<0.05

It can be seen from the above data that ellipticine has a significant inhibitory effect on HBsAg and HBeAg at the maximum non-toxic dose, and the inhibitory effect becomes more pronounced as the concentration increases, showing a concentration-dependent relationship. Ellipticine also has an inhibitory effect on HBV DNA, and this effect likewise becomes more pronounced as the drug concentration increases, showing a concentration-dependent relationship.

Example 6

Based on the hepatitis B clinical drug database, the hepatitis B multi-omics database, the clinical medical record text database, etc., heterogeneous association analysis was carried out and a heterogeneous association network deep learning model was established, and the therapeutic effect on hepatitis B of camptothecin, a compound naturally present in Camptotheca, was discovered, as follows:

(1)～(3) Same as "Example 3": by establishing the similarity matrix of hepatitis B treatment drugs, establishing the similarity matrix of hepatitis B multi-omics data and clinical medical records, and establishing the "hepatitis B treatment drugs-hepatitis B disease" heterogeneous association network, a drug repositioning model is mined on the basis of the heterogeneous association network. After more than ten final predicted drugs (lead compounds) were determined, camptothecin was selected as the research object through literature research, pharmacophore analysis and other means, and experimental verification was carried out.

(4) The cultivation of HepG 2.2.15 cells (same as "Example 5")

(5) CCK-8 detects the cytotoxicity of medicine (same as "Example 5")

(6) Inhibition experiment of camptothecin to HBsAg and HBeAg (same as "Example 5")

(7) Inhibition experiment of camptothecin on HBV DNA (same as "Example 5")

(8) Statistical processing (same as "Example 5")

(9) Experimental results

After adding non-toxic dose of drug-containing serum to culture HepG 2.2.15 cells for 72 hours, the contents of HBsAg, HBeAg and HBV DNA in the cell supernatant were detected, and the results were as follows:

Table 3 Inhibitory effect of camptothecin on HBsAg and HBeAg secreted by HepG 2.2.15 cells (n=3, x̄±s)

Compared with the cell control group, *P<0.05

Table 4 Inhibitory effect of camptothecin on HBV DNA (1×10⁶ IU/ml) in culture supernatant of HepG 2.2.15 cells (n=3, x̄±s)

Compared with the cell control group, *P<0.05

It can be seen from the above data that camptothecin has a significant inhibitory effect on HBsAg and HBeAg at the maximum non-toxic dose, and the inhibitory effect becomes more pronounced as the concentration increases, showing a concentration-dependent relationship. Camptothecin also has an inhibitory effect on HBV DNA, and this effect likewise becomes more pronounced as the drug concentration increases, showing a concentration-dependent relationship.

The experimental results of Examples 3-6 show that the system and method of the present invention can greatly improve the efficiency, accuracy and pertinence of drug research and development, provide scientific support for the discovery of new drug indications and for the management of the drug research and development cycle, and provide guidance and support for improving the level of clinical diagnosis and treatment. They realize in-depth mining of clinical data and multi-omics data so that these data can serve the field of biomedicine; they can also promote the generation of scientific hypotheses in the clinical field, accelerate research on new diagnosis and treatment programs, and promote the development of related disciplines such as clinical pharmacy and molecular biology; and they can rapidly promote the industrialization of drug development, thereby creating considerable market value and promoting the rapid development of the national economy.

Claims

A drug repositioning system based on heterogeneous association network deep learning, characterized in that it includes a prediction tool module, an experimental verification module and an external service module; wherein the prediction tool module mainly uses the Python programming language to establish a connection with the EMR database and perform operations, specifically: on the basis of the known "drug-disease" associations, drug similarity and disease similarity information is incorporated, a "drug-disease" heterogeneous network is established, and a deep neural network algorithm in deep learning or a modified geometric mean is used to predict potential "drug-disease" associations so as to realize drug repositioning; the experimental verification module is connected with the prediction tool module and, mainly by integrating the hardware equipment and research protocols for animal in vivo or in vitro experiments and clinical pharmacology tests, forms standardized test procedures for drug repositioning results, which can support general morphology, molecular biology, behavior and multi-omics research; the external service module mainly includes a data processing and analysis sub-module, a code and solution presentation sub-module, and a training and communication sub-module, wherein the data processing and analysis sub-module provides a solution based on the original data and analysis target uploaded by a registered user and gives timely feedback to the user, the code and solution presentation sub-module discloses part of the code and solutions to users, and the training and communication sub-module can carry out training and communication work for peers;

The experimental verification module can also give the experimental subjects specific treatment factors according to different research purposes, and control the influence of non-treatment factors, observe and evaluate the experimental effect, answer the research hypothesis, and verify the results of the prediction tool module screening.
A drug repositioning method based on heterogeneous association network deep learning, characterized in that: comprising the following steps:

Step 1, the construction of drug similarity matrix;

Step 2, construction of disease similarity matrix;

Step 3, construction of "drug-disease" heterogeneous association network;

Step 4, potential prediction of "drug-disease" association, i.e. drug repositioning.
A drug repositioning method based on heterogeneous association network deep learning according to claim 2, characterized in that the specific process of constructing the drug similarity matrix in the step 1 is: according to the completeness and availability of data, select four types of attribute feature information of the drug, namely chemical structure, target protein sequence, interaction and side effects; respectively establish a drug similarity matrix based on each attribute feature, that is, a drug similarity matrix based on chemical structure, a drug similarity matrix based on target protein sequence, a drug similarity matrix based on interaction and a drug similarity matrix based on side effects; then fuse the drug similarity matrices based on the various attribute features established above with the drug similarity matrix established based on EMR to form the drug similarity matrix; the specific process of constructing the disease similarity matrix in the step 2 is: according to the completeness and availability of data, select two types of information, disease ontology and phenotype; respectively establish an ontology-based disease similarity matrix and a phenotype-based disease similarity matrix; then fuse the ontology-based disease similarity matrix and the phenotype-based disease similarity matrix established above with the disease similarity matrix established based on EMR to form the disease similarity matrix.
A drug repositioning method based on heterogeneous association network deep learning according to claim 2, characterized in that the specific process of constructing the "drug-disease" heterogeneous association network in said step 3 is: taking the "drug-disease" adjacency matrix as a bridge and combining the drug similarity matrix constructed in step 1 with the disease similarity matrix constructed in step 2, a "drug-disease" heterogeneous association network is formed:

$$H_{r,d} = \{\{R, D\},\ \{E_r, E_d, E_{r,d}\},\ \{W_r, W_d, W_{r,d}\}\}$$

In the formula, R represents the drug vertex set and D represents the disease vertex set; $E_r$, $E_d$ and $E_{r,d}$ represent the "drug-drug", "disease-disease" and "drug-disease" connections, respectively; $W_r$ and $W_d$ represent the "drug-drug" and "disease-disease" similarity values, respectively, and $W_{r,d}$ represents whether there is a therapeutic relationship between "drug-disease".
A drug repositioning method based on heterogeneous association network deep learning according to claim 2, characterized in that the potential prediction of the "drug-disease" association in the step 4, that is, the specific process of drug repositioning is:

4.1 Feature extraction of "drug-disease" association:

4.1.1 The drug-centric feature vector, expressed as:

In the formula, $A_{i,1}$ is the i-th row of the "drug-disease" adjacency matrix $A$, representing the set of diseases associated with drug $r_i$; the similarity term is the i-th row of the drug similarity matrix, indicating the similarity between drug $r_i$ and the other drugs;

4.1.2 Disease-centric feature vector, expressed as:

In the formula, $A_{1,j}$ is the j-th column of the adjacency matrix $A$, representing the set of drugs associated with disease $d_j$; the similarity term is the j-th column of the disease similarity matrix, indicating the similarity between disease $d_j$ and the other diseases;

4.1.3 The feature vector of the "drug-disease" association $(r_i, d_j)$ is formed by combining the feature vector centered on drug $r_i$ with the feature vector centered on disease $d_j$, expressed as:
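The feature construction of step 4.1 can be sketched as plain concatenation. This is an illustration under assumptions: the names `drug_feature`, `disease_feature` and `pair_feature`, the order of concatenation and the toy matrices are not given in the claims.

```python
import numpy as np

def drug_feature(A, S_r, i):
    # Row i of A (diseases linked to drug i) followed by row i of S_r.
    return np.concatenate([A[i, :], S_r[i, :]])

def disease_feature(A, S_d, j):
    # Column j of A (drugs linked to disease j) followed by column j of S_d.
    return np.concatenate([A[:, j], S_d[:, j]])

def pair_feature(A, S_r, S_d, i, j):
    # Drug-centric part first, disease-centric part second (ordering assumed).
    return np.concatenate([drug_feature(A, S_r, i), disease_feature(A, S_d, j)])

# Toy data: 2 drugs, 3 diseases (values are made up for illustration).
A = np.array([[1, 0, 0], [0, 1, 1]])
S_r = np.array([[1.0, 0.4], [0.4, 1.0]])
S_d = np.array([[1.0, 0.2, 0.0], [0.2, 1.0, 0.5], [0.0, 0.5, 1.0]])
F_12 = pair_feature(A, S_r, S_d, i=1, j=2)
```

For m drugs and n diseases the resulting vector has length 2(m + n), which fixes the width of the network's input layer.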

4.2 Training of deep neural network model

The deep neural network model adopts a fully connected neural network, that is, every neuron in the current layer is connected to every neuron in the previous layer, and low-level features are combined into more abstract high-level representations of attribute categories or features. "Drug-disease" association prediction is set up as a binary classification problem, and a fully connected neural network is built using the classic tower structure: the input layer is the feature vector $F_{ij}$ generated in step 4.1, and the output layer contains two neurons, which respectively give the probabilities that the test sample is "true" and "false";
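A forward pass through such a tower network might look like the following sketch. The hidden widths (64, 32, 16), the ReLU activation, the He-style initialization and the softmax output are assumptions chosen for illustration; the claim fixes only the fully connected tower shape and the two output neurons.

```python
import numpy as np

def init_tower(input_dim, hidden_sizes=(64, 32, 16), seed=0):
    """Create (weights, bias) pairs for layers that narrow toward the output."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim, *hidden_sizes, 2]          # two output neurons
    return [
        (rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)),
         np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, X):
    """Return per-sample class probabilities, shape (n_samples, 2)."""
    h = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)             # ReLU (assumed activation)
    W, b = params[-1]
    logits = h @ W + b
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)    # softmax over the two classes

params = init_tower(input_dim=10)
X = np.random.default_rng(1).random((4, 10))       # four random feature vectors
probs = forward(params, X)
```

Each row of `probs` sums to 1, so either column can be read as the probability of the corresponding class.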

For the "drug-disease" association feature vector set $F$, randomly generate a "negative" sample set at a 1:1 ratio to the positive samples, and divide the samples into a training set, a test set and a verification set at a ratio of 6:2:2;
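The sampling and splitting step can be sketched as below. It assumes that negatives are drawn from pairs with no known association in the adjacency matrix, which the claim does not spell out; `sample_negatives`, `split_622` and the toy matrix are hypothetical names and data.

```python
import numpy as np

def sample_negatives(A, rng):
    """Draw as many unlabeled (drug, disease) pairs as there are known ones."""
    A = np.asarray(A)
    positives = np.argwhere(A == 1)
    candidates = np.argwhere(A == 0)               # assumed source of negatives
    if len(candidates) < len(positives):
        raise ValueError("not enough unlabeled pairs for 1:1 sampling")
    picked = rng.choice(len(candidates), size=len(positives), replace=False)
    return positives, candidates[picked]

def split_622(n_samples, rng):
    """Shuffle sample indices and cut them 6:2:2."""
    order = rng.permutation(n_samples)
    n_train = int(round(0.6 * n_samples))
    n_test = int(round(0.2 * n_samples))
    return (order[:n_train],
            order[n_train:n_train + n_test],
            order[n_train + n_test:])

# Toy adjacency matrix: 4 drugs, 5 diseases, 5 known associations.
A = np.array([[1, 0, 0, 1, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 1]])
rng = np.random.default_rng(0)
positives, negatives = sample_negatives(A, rng)
pairs = np.vstack([positives, negatives])
labels = np.array([1] * len(positives) + [0] * len(negatives))
train_idx, test_idx, verify_idx = split_622(len(pairs), rng)
```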

The weight between the i-th neuron in layer $l$ and the j-th neuron in layer $l-1$ is denoted $w^{l}_{ij}$.
The training set is used to find the optimal weight set $w := \{w^{l}\}_{l=1}^{L}$ of the L-layer neural network that minimizes the cross-entropy;
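For a two-neuron output, the cross-entropy being minimized can be computed as in this minimal sketch. The function name, the clipping constant `eps` and the convention that column 1 holds the "true" class are assumptions.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the correct class."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)         # 1 = "true", 0 = "false" (assumed)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

# Two samples: the first is class 0 with probability 0.9, the second class 1 with 0.8.
loss = cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1])
```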

Learn the parameters with stochastic gradient descent, use the Mini-Batch method to accelerate learning, use the Dropout method to prevent overfitting, evaluate model performance with ten-fold cross-validation, and optimize the number of hidden layers according to the evaluation indicators, thereby optimizing the model;
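The training loop can be illustrated on a deliberately small network. The sketch below uses one hidden layer, ReLU, inverted dropout and fixed hyperparameters, all of which are assumptions; ten-fold cross-validation and the search over the number of hidden layers are left out for brevity.

```python
import numpy as np

def train_sgd(X, y, hidden=8, lr=0.1, epochs=200, batch_size=4,
              keep_prob=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, np.sqrt(2.0 / d), (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, np.sqrt(2.0 / hidden), (hidden, 2)); b2 = np.zeros(2)
    Y = np.eye(2)[y]                                   # one-hot targets
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):          # Mini-Batch updates
            idx = order[start:start + batch_size]
            xb, yb = X[idx], Y[idx]
            z1 = xb @ W1 + b1
            h = np.maximum(z1, 0.0)
            # Inverted Dropout: drop units, rescale the survivors.
            mask = (rng.random(h.shape) < keep_prob) / keep_prob
            hd = h * mask
            logits = hd @ W2 + b2
            logits -= logits.max(axis=1, keepdims=True)
            p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)
            g_logits = (p - yb) / len(idx)             # gradient of mean cross-entropy
            gW2 = hd.T @ g_logits; gb2 = g_logits.sum(axis=0)
            g_z1 = (g_logits @ W2.T) * mask * (z1 > 0)
            gW1 = xb.T @ g_z1; gb1 = g_z1.sum(axis=0)
            W1 -= lr * gW1; b1 -= lr * gb1
            W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def predict_proba(params, X):
    W1, b1, W2, b2 = params
    h = np.maximum(X @ W1 + b1, 0.0)                   # no dropout at inference
    logits = h @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy, linearly separable data (made up for illustration).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [1.0, 1.3],
              [-1.0, -1.0], [-1.2, -0.8], [-0.8, -1.2], [-1.0, -1.3]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
params = train_sgd(X, y)
proba = predict_proba(params, X)
```

Dropout is applied only during training; at prediction time every hidden unit is kept, which is why the mask is rescaled by `keep_prob` while training.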

4.3 Use the test set and verification set to test and verify the model optimized in step 4.2, then take the intersection of the predictions of the tested and verified fully connected neural network model and the predictions of the existing model to obtain the final predicted drugs.
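The intersection in step 4.3 amounts to a set operation over predicted pairs. In this sketch the identifiers such as `r1` and `d2` are hypothetical, and how each model turns scores into a list of predicted pairs is not specified by the claim.

```python
def intersect_predictions(dnn_pairs, existing_pairs):
    """Keep only (drug, disease) pairs predicted by both models."""
    return sorted(set(dnn_pairs) & set(existing_pairs))

# Hypothetical predictions from the two models.
dnn_pairs = [("r1", "d2"), ("r2", "d1"), ("r3", "d4")]
existing_pairs = [("r2", "d1"), ("r3", "d5"), ("r1", "d2")]
final_pairs = intersect_predictions(dnn_pairs, existing_pairs)
```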
A drug repositioning method based on heterogeneous association network deep learning according to claim 3, characterized in that: in step 1, the specific process of fusing the attribute-based drug similarity matrices with the EMR-based drug similarity matrix is: quantile standardization is applied to the similarity values of the chemical structure-based, target protein sequence-based, interaction-based, side effect-based and EMR-based drug similarity matrices, and the standardized values are then averaged to form the drug similarity matrix; in step 2, the specific process of fusing the ontology-based and phenotype-based disease similarity matrices with the EMR-based disease similarity matrix is: quantile standardization is applied to the similarity values of the ontology-based, phenotype-based and EMR-based disease similarity matrices, and the standardized values are then averaged to form the disease similarity matrix.
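The fusion step can be sketched as quantile normalization followed by an element-wise mean. The sketch assumes symmetric matrices, normalizes each unordered pair once (upper triangle), breaks ties by position, and fixes the diagonal at 1; none of these details is stated in the claim, and the function names are illustrative.

```python
import numpy as np

def quantile_normalize(matrices):
    """Give every matrix the same distribution of off-diagonal values."""
    mats = [np.asarray(m, dtype=float) for m in matrices]
    n = mats[0].shape[0]
    iu = np.triu_indices(n, k=1)                       # each unordered pair once
    cols = np.stack([m[iu] for m in mats], axis=1)     # shape (pairs, n_matrices)
    reference = np.sort(cols, axis=0).mean(axis=1)     # mean value at each rank
    out = []
    for k in range(cols.shape[1]):
        ranks = np.argsort(np.argsort(cols[:, k], kind="stable"), kind="stable")
        q = np.eye(n)
        q[iu] = reference[ranks]                       # replace value by rank mean
        out.append(q + q.T - np.eye(n))                # mirror, keep diagonal at 1
    return out

def fuse(matrices):
    """Average the quantile-normalized matrices element-wise."""
    return np.mean(quantile_normalize(matrices), axis=0)

# Two toy 3 x 3 similarity matrices (values are made up for illustration).
M1 = np.array([[1.0, 0.2, 0.8], [0.2, 1.0, 0.5], [0.8, 0.5, 1.0]])
M2 = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.3], [0.1, 0.3, 1.0]])
fused = fuse([M1, M2])
```

After normalization both matrices contain the same set of off-diagonal values (0.15, 0.4, 0.85 here), so no single data source dominates the average because of its scale.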
A drug repositioning method based on heterogeneous association network deep learning according to claim 3, characterized in that: the similarity of each drug similarity matrix is calculated as follows:

The similarity of the chemical structure-based drug similarity matrix is expressed by the Tanimoto coefficient:

$$Sim(r, r') = \frac{|C_r \cap C_{r'}|}{|C_r| + |C_{r'}| - |C_r \cap C_{r'}|}$$

In the formula, $|C_r|$ and $|C_{r'}|$ represent the numbers of chemical substructures in drug $r$ and drug $r'$, respectively, and $|C_r \cap C_{r'}|$ represents the number of chemical substructures shared by drug $r$ and drug $r'$;
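As a worked illustration, the Tanimoto coefficient over substructure sets can be computed as follows. The substructure identifiers are hypothetical, and returning 0.0 for two empty sets is an assumed convention.

```python
def tanimoto(substructures_r, substructures_r2):
    """Shared substructures divided by the total number of distinct ones."""
    c_r, c_r2 = set(substructures_r), set(substructures_r2)
    shared = len(c_r & c_r2)
    denom = len(c_r) + len(c_r2) - shared
    return shared / denom if denom else 0.0

# Hypothetical substructure IDs: 2 shared, 4 + 3 - 2 = 5 distinct.
sim_chem = tanimoto({1, 2, 3, 4}, {3, 4, 5})
```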

The similarity of the target protein sequence-based drug similarity matrix is calculated as follows:

In the formula, sw(...,...) represents the Smith–Waterman sequence alignment score;
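A Smith–Waterman score can be sketched with a simple dynamic program. The match, mismatch and gap scores below are placeholders (protein alignment normally uses a substitution matrix), and the normalization in `normalized_sw` is one common choice assumed here, not necessarily the formula used in the claim.

```python
import math

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def normalized_sw(a, b):
    """Scale the score by the geometric mean of the self-alignment scores (assumed)."""
    denom = math.sqrt(smith_waterman(a, a) * smith_waterman(b, b))
    return smith_waterman(a, b) / denom if denom else 0.0

score = smith_waterman("TTACGTT", "GGACGGG")   # only "ACG" aligns
```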

The similarity of the interaction-based drug similarity matrix is expressed by the Jaccard coefficient:

$$Sim(r, r') = \frac{|I_r \cap I_{r'}|}{|I_r \cup I_{r'}|}$$

In the formula, $I_r$ and $I_{r'}$ represent the sets of drugs that interact with drug $r$ and drug $r'$, respectively;

The similarity of the side effect-based drug similarity matrix is expressed by the Jaccard coefficient:

$$Sim(r, r') = \frac{|E_r \cap E_{r'}|}{|E_r \cup E_{r'}|}$$

In the formula, $E_r$ and $E_{r'}$ represent the side effect sets of drug $r$ and drug $r'$, respectively;
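Both Jaccard-based similarities, for interaction sets and for side effect sets, reduce to the same set computation. The side effect names below are hypothetical, and returning 0.0 when both sets are empty is an assumed convention.

```python
def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical side effect sets: 2 shared out of 4 distinct.
sim_side_effect = jaccard({"nausea", "headache", "rash"},
                          {"nausea", "rash", "dizziness"})
```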

The formula for calculating the similarity of the EMR-based drug similarity matrix is as follows:

$$Sim_{d,p_k} = \max(Q_{d,p_k}) - \min(Q_{d,p_k}), \quad \text{where } |Q_{d,p_k}| \ge 2$$

In the formula, $Q_{d,p_k}$ represents the laboratory test results of type $k$ in inpatient medical record $p$ after drug $d$ is taken, and $Sim_{d,p_k}$ is the largest difference among the $Q_{d,p_k}$ values;
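The per-record quantity in this formula is a simple range. The sketch below computes it for one drug, one record and one test type; the readings are made up, returning `None` for fewer than two results is an assumed way to express the size condition, and the claim does not say how these values are aggregated into a drug-to-drug similarity.

```python
def emr_range(results):
    """Max minus min of one lab-test type in one record; None if fewer than 2 values."""
    values = list(results)
    if len(values) < 2:
        return None
    return max(values) - min(values)

# Hypothetical repeated readings of one test type after taking a drug.
sim_d_pk = emr_range([110, 95, 102])
```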
A drug repositioning method based on heterogeneous association network deep learning according to claim 3, characterized in that: the similarity of each disease similarity matrix is calculated as follows:

The calculation formula of the similarity of ontology-based disease similarity matrix is as follows:

In the formula, $c(d, d')$ represents the number of common parent nodes of diseases $d$ and $d'$; $p_x$ represents the probability of disease $x$, that is, the ratio of the number of occurrences of disease name $x$ or its child nodes to the number of all disease names;

The similarity of the phenotype-based disease similarity matrix is expressed by the Cosine coefficient:

In the formula, the two frequency terms indicate how often the i-th MeSH term occurs in the medical description information of diseases $d$ and $d'$, respectively;
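With each disease represented as a vector of MeSH term frequencies, the cosine coefficient can be computed as in this sketch. The frequency vectors are made up, and returning 0.0 for an all-zero vector is an assumed convention.

```python
import numpy as np

def cosine_similarity(freq_d, freq_d2):
    """Cosine of the angle between two term-frequency vectors."""
    u = np.asarray(freq_d, dtype=float)
    v = np.asarray(freq_d2, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

# Hypothetical MeSH term counts over a shared 3-term vocabulary.
sim_phenotype = cosine_similarity([1, 2, 0], [2, 4, 0])
```

Because the measure depends only on direction, two descriptions with the same term proportions score 1 even when one is twice as long.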

The calculation formula of the similarity of the disease similarity matrix based on EMR is as follows:

In the formula, $G_d$ and $G_{d'}$ represent the feature sets of diseases $d$ and $d'$, respectively.
The drug repositioning method based on heterogeneous association network deep learning according to claim 5, characterized in that: the evaluation indicators in the step 4.2 include Precision, Recall, F1-measure and AUC.
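The four indicators can be computed directly from labels and scores, as in the sketch below. The 0.5 decision threshold and the toy scores are assumptions; AUC is computed as the fraction of positive-negative pairs ranked correctly, with ties counted as one half.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]                 # all positive-negative pairs
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))

# Toy labels and predicted probabilities of the "true" class.
y_true = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.6, 0.3, 0.2])
y_pred = (scores >= 0.5).astype(int)                   # assumed threshold
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc_value = auc(y_true, scores)
```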
A drug repositioning method based on heterogeneous association network deep learning according to claim 5, characterized in that the existing model described in step 4.3 is:

The value of Assoc for drug r and disease d lies in the interval [0,1]; the closer it is to 1, the more likely the drug is to treat the disease.