WO2023077854A1 - Drug repurposing system and method based on heterogeneous association network deep learning - Google Patents

Publication number
WO2023077854A1
WO2023077854A1 PCT/CN2022/104668 CN2022104668W WO2023077854A1 WO 2023077854 A1 WO2023077854 A1 WO 2023077854A1 CN 2022104668 W CN2022104668 W CN 2022104668W WO 2023077854 A1 WO2023077854 A1 WO 2023077854A1
Authority
WO
WIPO (PCT)
Prior art keywords
drug
disease
similarity matrix
similarity
matrix
Prior art date
Application number
PCT/CN2022/104668
Other languages
French (fr)
Chinese (zh)
Inventor
于琦
贺培风
张升校
刘格良
师高翔
王琪
高启超
Original Assignee
山西医科大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山西医科大学 filed Critical 山西医科大学
Publication of WO2023077854A1 publication Critical patent/WO2023077854A1/en

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • The invention belongs to the technical field of biomedicine and specifically relates to a drug repositioning system and method based on heterogeneous association network deep learning.
  • Deep learning learns the internal regularities and hierarchical representations of sample data, and the information obtained during learning greatly helps in interpreting data such as text, images and sounds. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize such data. Deep learning is a family of complex machine learning algorithms that has achieved many results in search technology, data mining, machine translation, natural language processing, multimedia learning and related fields. It enables machines to imitate human activities such as seeing, hearing and thinking, and it has solved many complex pattern recognition problems, driving considerable progress in artificial intelligence.
  • Drug repurposing is also known as drug repositioning or finding new uses for old drugs.
  • This strategy has several advantages. First, the risk of failure is low, because the repositioned drugs have already been shown to be safe in preclinical models and in humans. Second, the development cycle is short, because preclinical experiments, safety evaluation and even formulation screening have already been completed. Third, the required investment is small, since much of the cost of the preclinical stage is saved. The present invention therefore studies a drug repositioning system and method based on heterogeneous association network deep learning.
  • The applicant of the present invention established a drug repositioning association model based on the hypothesis that "the gene expression profile of a disease should be inversely correlated with the gene expression profile of a drug that can treat the disease".
  • In this model, the degree of association between drug r and disease d is expressed as $Assoc_{r,d}$.
  • The value of $Assoc_{r,d}$ lies in the interval [0,1]; the closer it is to 1, the more likely the drug is to treat the disease.
  • For a specific disease, calculating its degree of association with every drug in turn and applying a threshold screens out the list $L_r$ of potential therapeutic drugs for the disease. Removing the "drug-disease" association pairs $l_r$ already reported in the medical literature then yields the drug list $L'_r$ with no confirmed association.
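The screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `screen_candidates`, the dictionary layout of the association scores, the drug and disease names, and the default threshold of 0.8 are all hypothetical, since the text specifies neither a threshold value nor a data format.

```python
def screen_candidates(assoc, known_pairs, disease, threshold=0.8):
    """Return (L_r, L_r_prime) for one disease.

    assoc:       dict mapping (drug, disease) -> association score in [0, 1]
    known_pairs: set of (drug, disease) pairs already reported in the literature
    threshold:   hypothetical cut-off; the source does not give a value
    """
    # L_r: drugs whose association with the disease reaches the threshold,
    # ordered from the strongest to the weakest association.
    scored = [(drug, score) for (drug, d), score in assoc.items()
              if d == disease and score >= threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    l_r = [drug for drug, _ in scored]
    # L'_r: candidates with no confirmed association in the literature.
    l_r_prime = [drug for drug in l_r if (drug, disease) not in known_pairs]
    return l_r, l_r_prime


# Toy data with made-up names and scores.
assoc = {("drugA", "dis1"): 0.91, ("drugB", "dis1"): 0.85,
         ("drugC", "dis1"): 0.40, ("drugA", "dis2"): 0.95}
known = {("drugA", "dis1")}
l_r, l_r_prime = screen_candidates(assoc, known, "dis1")
```

In this toy run, `l_r` contains the two drugs above the threshold and `l_r_prime` drops the pair that is already documented.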
  • This model relates drugs to diseases only through the principle of expression profile reversal. It ignores two other kinds of information that can be derived from gene expression profiles: the relationships among drugs and the relationships among diseases.
  • The model also does not fully integrate gene expression profile data with medical literature data: the former is used only for association prediction, and the latter only for association screening.
  • The present invention provides a drug repositioning system and method based on heterogeneous association network deep learning.
  • The present invention provides a drug repositioning system based on heterogeneous association network deep learning, comprising a prediction tool module, an experimental verification module and an external service module. The prediction tool module mainly uses the Python programming language to connect to and operate on the electronic medical record (EMR) database. Specifically, on the basis of the known "drug-disease" associations, it incorporates drug similarity and disease similarity information to establish a "drug-disease" heterogeneous network, and then uses a deep neural network algorithm or the corrected geometric mean to predict potential "drug-disease" associations.
  • The experimental verification module is connected to the prediction tool module. By integrating the hardware and research protocols for in vivo or in vitro animal experiments and clinical pharmacology tests, it forms a standardized test process for drug repositioning results that can support general morphology, molecular biology, behavioral and multi-omics research.
  • The external service module mainly includes a data processing and analysis sub-module, a code and solution presentation sub-module, and a training and communication sub-module. The data processing and analysis sub-module takes the raw data and analysis targets uploaded by registered users, produces a solution and gives users timely feedback. The code and solution presentation sub-module discloses part of the code and solutions to users. The training and communication sub-module supports training and exchange with peers.
  • Depending on the research purpose, the experimental verification module can also apply specific treatment factors to the experimental subjects while controlling the influence of non-treatment factors, observe and evaluate the experimental effect, answer the research hypothesis, and verify the screening results of the prediction tool module.
  • The present invention also provides a drug repositioning method based on heterogeneous association network deep learning, comprising the following steps:
  • Step 1: construction of the drug similarity matrix;
  • Step 2: construction of the disease similarity matrix;
  • Step 3: construction of the "drug-disease" heterogeneous association network;
  • Step 4: potential prediction of "drug-disease" associations, i.e. drug repositioning.
  • The drug similarity matrix in step 1 is constructed as follows. According to data completeness and availability, four types of drug attribute information are selected: chemical structure, target protein sequence, interactions and side effects. A drug similarity matrix is established for each attribute, namely a chemical structure-based, a target protein sequence-based, an interaction-based and a side effect-based drug similarity matrix. These attribute-based matrices are then fused with the EMR-based drug similarity matrix to form the drug similarity matrix;
  • The disease similarity matrix in step 2 is constructed as follows. According to data completeness and availability, two types of disease information are selected: ontology and phenotype. An ontology-based disease similarity matrix and a phenotype-based disease similarity matrix are established, and both are then fused with the EMR-based disease similarity matrix to form the disease similarity matrix;
  • The "drug-disease" heterogeneous association network in step 3 is constructed as follows. Using the "drug-disease" adjacency matrix as a bridge, the drug similarity matrix constructed in step 1 and the disease similarity matrix constructed in step 2 are combined to form the "drug-disease" heterogeneous association network:
  • $H_{r,d} = (\{R, D\}, \{E_r, E_d, E_{r,d}\}, \{W_r, W_d, W_{r,d}\})$
  • R represents the drug vertex set;
  • D represents the disease vertex set;
  • $E_r$, $E_d$ and $E_{r,d}$ represent the "drug-drug", "disease-disease" and "drug-disease" connections, respectively;
  • $W_r$, $W_d$ and $W_{r,d}$ represent the "drug-drug" similarity value, the "disease-disease" similarity value and whether a therapeutic relationship exists between drug and disease, respectively;
  • The potential prediction of "drug-disease" associations in step 4, i.e. drug repositioning, proceeds as follows:
  • $a_i$ is the i-th row of the "drug-disease" adjacency matrix A and represents the set of diseases associated with drug $r_i$; $s^r_i$ is the i-th row of the drug similarity matrix and gives the similarity between drug $r_i$ and the other drugs;
  • $a^j$ is the j-th column of the adjacency matrix A and represents the set of drugs associated with disease $d_j$; $s^d_j$ is the j-th column of the disease similarity matrix and gives the similarity between disease $d_j$ and the other diseases;
  • the feature vector of the "drug-disease" association $(r_i, d_j)$ is obtained by combining the feature vector centered on drug $r_i$ with the feature vector centered on disease $d_j$, expressed as: $F_{ij} = (a_i, s^r_i, a^j, s^d_j)$
  • The deep neural network model is a fully connected neural network: every neuron in the current layer is connected to every neuron in the previous layer, and low-level features are combined into more abstract high-level representations of attribute categories or features.
  • The "drug-disease" association prediction is set up as a binary classification problem, and a fully connected neural network is built using the classic tower structure. The input layer is the feature vector $F_{ij}$ generated in step 4.1, and the output layer contains two neurons that give the probabilities that the test sample belongs to "true" and "false", respectively;
  • Step 4.3: use the test set and validation set to test and validate the model optimized in step 4.2, and intersect the predictions of the tested and validated fully connected neural network model with the predictions of the existing model to obtain the final predicted drugs.
  • In step 1, the attribute-based drug similarity matrices are fused with the EMR-based drug similarity matrix as follows: quantile standardization is applied to the similarity values of the chemical structure-based, target protein sequence-based, interaction-based, side effect-based and EMR-based drug similarity matrices, and the standardized matrices are then averaged to form the drug similarity matrix.
  • In step 2, the ontology-based and phenotype-based disease similarity matrices are fused with the EMR-based disease similarity matrix as follows: quantile standardization is applied to the similarity values of the ontology-based, phenotype-based and EMR-based disease similarity matrices, and the standardized matrices are then averaged to form the disease similarity matrix.
  • Merging multiple similarity matrices into a single matrix makes effective use of the different kinds of similarity information and also reduces the complexity of later calculations.
  • Because the element means of the different similarity matrices differ, fusing the matrices directly on the raw similarity values would let a matrix with a higher mean similarity dominate the final result; even if the matrices are normalized first, the fusion is still affected by differences in the similarity distributions. Quantile standardization is therefore used to standardize the similarity values of each matrix before averaging.
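A minimal pure-Python sketch of this fusion step is given below. It assumes the usual form of quantile standardization (every matrix is mapped onto a common reference distribution, here the mean of the sorted values across matrices), applies it only to the off-diagonal upper triangle of each symmetric matrix, and fixes the diagonal at 1; these choices and the function names are illustrative assumptions, not details given in the text.

```python
def upper_triangle(matrix):
    """Flatten the off-diagonal upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def fuse_similarity(matrices):
    """Quantile-standardize symmetric similarity matrices, then average them."""
    n = len(matrices[0])
    vectors = [upper_triangle(m) for m in matrices]
    size = len(vectors[0])
    # Reference distribution: mean of the k-th smallest value of every matrix.
    sorted_vectors = [sorted(v) for v in vectors]
    reference = [sum(s[k] for s in sorted_vectors) / len(vectors)
                 for k in range(size)]
    fused = [0.0] * size
    for v in vectors:
        # Rank the entries (ties are broken by position) and replace each
        # entry with the reference value at its rank, accumulating the mean.
        order = sorted(range(size), key=lambda idx: v[idx])
        for rank, idx in enumerate(order):
            fused[idx] += reference[rank] / len(vectors)
    # Rebuild a symmetric matrix with self-similarity fixed at 1.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            result[i][j] = result[j][i] = fused[k]
            k += 1
    return result


# Two toy 3 x 3 similarity matrices with different value ranges.
m1 = [[1.0, 0.2, 0.4], [0.2, 1.0, 0.6], [0.4, 0.6, 1.0]]
m2 = [[1.0, 0.9, 0.5], [0.9, 1.0, 0.7], [0.5, 0.7, 1.0]]
fused = fuse_similarity([m1, m2])
```

After standardization both matrices share the same set of values, so neither dominates the average merely because its raw similarities are larger.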
  • se(·,·) represents the Smith–Waterman sequence alignment score;
  • $I_r$ and $I_{r'}$ represent the sets of drugs that interact with drug r and drug r', respectively;
  • $E_r$ and $E_{r'}$ represent the side effect sets of drug r and drug r', respectively;
  • $Q_{d,pk}$ represents the type-k laboratory test results in hospitalized medical record p after drug d is taken, and $Sim_{d,pk}$ is the largest difference in $Q_{d,pk}$;
  • c(d, d') represents the number of common parent nodes of diseases d and d';
  • $p_x$ represents the probability of disease x, that is, the ratio of the number of occurrences of disease name x or its child nodes to the number of all disease names;
  • $G_d$ and $G_{d'}$ represent the feature sets of diseases d and d', respectively.
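The similarity formulas for the set-valued attributes (interaction sets, side effect sets, disease feature sets) are not reproduced in this text. As a purely illustrative sketch, the Jaccard index is a common way to compare two such sets; its use here is an assumption, and the function name and example values are hypothetical.

```python
def jaccard(set_a, set_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of items.

    Assumed here as a stand-in for the set-based similarity formulas,
    which are missing from the text. Two empty sets are given similarity 0.
    """
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical side effect sets E_r and E_r' of two drugs.
e_r = {"nausea", "headache"}
e_r_prime = {"headache", "dizziness"}
similarity = jaccard(e_r, e_r_prime)
```

The same function could be applied to $I_r$, $I_{r'}$ or to $G_d$, $G_{d'}$ under the same assumption.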
  • The evaluation indicators in step 4.2 are precision, recall, F1-measure and AUC.
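For reference, the four indicators can be computed from binary labels and predicted scores as in the sketch below. It uses the standard definitions (AUC as the probability that a positive sample is scored above a negative one, with ties counted as one half); the function names and the toy data are illustrative only.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = "true", 0 = "false")."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up labels and scores.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
area = auc(y_true, scores)
```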
  • The existing model described in step 4.3 expresses the degree of association between drug r and disease d as $Assoc_{r,d}$.
  • The value of $Assoc_{r,d}$ lies in the interval [0,1]; the closer it is to 1, the more likely the drug is to treat the disease.
  • The present invention has the following advantages:
  • The drug repositioning system and method based on the heterogeneous association network provided by the present invention establish a "drug-disease" heterogeneous network and then use a deep neural network algorithm or the corrected geometric mean to mine and learn from the heterogeneous association network, thereby predicting potential "drug-disease" associations and realizing drug repositioning.
  • The system and method integrate "drug-disease" association information, "drug-drug" similarity information and "disease-disease" similarity information, and also fully integrate gene expression profile data with medical literature data (the "drug-disease" adjacency matrix). This can improve the efficiency, accuracy and specificity of drug R&D, provide scientific support for discovering new drug indications and managing the drug R&D cycle, and provide guidance for improving clinical diagnosis and treatment. It enables deep mining of clinical data and multi-omics data for use in biomedicine. It can also promote the generation of scientific hypotheses in the clinical field, accelerate research on new diagnosis and treatment programs, and promote the development of clinical pharmacy, molecular biology and related disciplines, as well as the industrialization of drug development, thereby creating considerable market value and contributing to economic development.
  • Fig. 1 is a schematic diagram of obtaining drug information and generating a drug similarity matrix in the present invention.
  • Fig. 2 is a schematic diagram of the acquisition of disease information and the generation of disease similarity matrix in the present invention.
  • Fig. 3 is a schematic diagram of the construction of the "drug-disease" heterogeneous association network of the present invention.
  • Fig. 4 shows the feature extraction of the "drug-disease" association in the present invention: (a) "drug-disease" association network, (b) drug feature matrix, (c) disease feature matrix, (d) "drug-disease" feature extraction.
  • Fig. 5 shows the deep neural network model used in the present invention.
  • Fig. 6 shows the results of the patch clamp experiments.
  • Fig. 7 shows the results of the open field experiment: A is the statistical histogram of the total movement distance of the mice in each group in the open field, B is the percentage of time that the mice in each group were active in the central area of the open field, and C shows representative trajectories of the mice in each group in the open field test; *P < 0.05, ***P < 0.001.
  • Fig. 8 shows the results of the new object recognition experiment: A is a schematic diagram of the experiment, B is the histogram of the time the mice in each group spent exploring each of two identical objects during the familiarization period, with no statistical difference (P > 0.05), and C is the NOI statistical histogram of each group of mice during the test period; *P < 0.05, **P < 0.01, ***P < 0.001.
  • Fig. 9 shows the results of the Morris water maze experiment: A is the line graph of the change in average swimming speed of the mice during the experiment; B is the line graph of the change in escape latency of the mice in each group during the 5-day place navigation experiment, where * denotes APP/PS1+Vehicle vs WT+Vehicle and # denotes APP/PS1+TSA vs APP/PS1+Vehicle; C is the histogram of the percentage of time the mice in each group spent in the target quadrant in the space exploration experiment; D is the statistical histogram of the number of platform crossings by the mice in each group in the space exploration experiment; E is a schematic diagram of representative swimming trajectories of the mice in each group in the space exploration experiment; F is the statistical histogram of the time taken by the mice in each group to reach the platform in the visible platform experiment; G is the statistical histogram of the average swimming speed of the mice in each group in the visible platform test; *P < 0.05, **P
  • A drug repositioning system based on heterogeneous association network deep learning comprises a prediction tool module, an experimental verification module and an external service module.
  • The prediction tool module mainly uses the Python programming language to connect to and operate on the EMR and other databases, and uses deep learning and other techniques to re-screen existing drugs. Specifically, after the various kinds of drug attribute information and disease information are acquired, the drug similarity matrix and the disease similarity matrix are established through similarity calculation, matrix fusion and other methods. The known "drug-disease" association matrix is then used as a bridge to integrate the drug similarity and disease similarity information into a "drug-disease" heterogeneous network. For a given disease, a deep neural network algorithm or the corrected geometric mean is used to predict potential "drug-disease" associations and to re-screen candidate drugs or compounds that meet the algorithm's requirements.
  • The experimental verification module serves the prediction tool module. By integrating the hardware and research protocols for in vivo or in vitro animal experiments and clinical pharmacology tests, it forms a standardized test process for drug repositioning results that can meet the requirements of general morphology, molecular biology, behavioral and multi-omics research. Depending on the research purpose, it can also apply specific treatment factors to the experimental subjects while controlling the influence of non-treatment factors, observe and evaluate the experimental effect, answer the research hypothesis, and verify the screening results of the prediction tool module.
  • The external service module mainly provides researchers with specialized data processing and analysis services. Registered users can upload raw data and analysis targets to the platform, which produces a solution and gives users timely feedback. In addition, the platform discloses part of the code and problem solutions to users and conducts training and exchange with peers.
  • A drug repositioning method based on heterogeneous association network deep learning comprises the following steps:
  • Step 1, construction of the drug similarity matrix: according to data completeness and availability, four types of drug attribute information are selected: chemical structure, target protein sequence, interactions and side effects. A drug similarity matrix is established for each attribute, namely a chemical structure-based, a target protein sequence-based, an interaction-based and a side effect-based drug similarity matrix. Quantile standardization is then applied to the similarity values of these attribute-based matrices and of the EMR-based drug similarity matrix, and the standardized values are averaged and fused to form the drug similarity matrix $S_r$, as shown in Figure 1.
  • se(·,·) represents the Smith–Waterman sequence alignment score;
  • $I_r$ and $I_{r'}$ represent the sets of drugs that interact with drug r and drug r', respectively;
  • $E_r$ and $E_{r'}$ represent the side effect sets of drug r and drug r', respectively;
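The Smith–Waterman score mentioned above is a local alignment score computed by dynamic programming. The sketch below is a simplified illustration: it uses a fixed match/mismatch/linear gap scheme rather than a protein substitution matrix, and it normalizes by the geometric mean of the two self-alignment scores, which is a common convention but an assumption here, since the target sequence similarity formula is not reproduced in this text.

```python
import math


def smith_waterman(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of two sequences (linear gap penalty).

    The scoring values are illustrative; real protein comparisons usually
    rely on a substitution matrix such as BLOSUM62.
    """
    previous = [0] * (len(seq_b) + 1)
    best = 0
    for i in range(1, len(seq_a) + 1):
        current = [0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            current[j] = max(0,                      # start a new alignment
                             previous[j - 1] + step,  # align the two residues
                             previous[j] + gap,       # gap in seq_b
                             current[j - 1] + gap)    # gap in seq_a
            best = max(best, current[j])
        previous = current
    return best


def normalized_similarity(seq_a, seq_b):
    """Alignment score scaled to [0, 1] by the two self-alignment scores."""
    denominator = math.sqrt(smith_waterman(seq_a, seq_a)
                            * smith_waterman(seq_b, seq_b))
    return smith_waterman(seq_a, seq_b) / denominator if denominator else 0.0


# Short made-up sequences, for illustration only.
score_identical = smith_waterman("ACGT", "ACGT")
score_partial = smith_waterman("ACGT", "CG")
```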
  • The EMR database contains drug prescription records, including the time points of administration and the various laboratory test results obtained during the patient's hospital stay. Changes in the dosing records and laboratory test results are tracked, and the physiological change in each test result after drug treatment is described by calculating the maximum difference.
  • Here $Q_{d,pk}$ represents the type-k laboratory test results in hospitalized medical record p after drug d is taken, and the drug-induced change in type-k laboratory test results after drug d is taken is calculated as $Sim_{d,pk}$, the largest difference in $Q_{d,pk}$.
  • For each type of laboratory test, the present invention measures the similarity between the two drug-induced physiological profiles of a drug pair by the P value of a rank sum test. Finally, the normalized ranking of the P values of all drug pairs is used as the measure of "drug-drug" similarity, which reduces the heterogeneity in the distribution of P values across different laboratory tests.
  • The present invention assumes that different laboratory tests may be related to specific physiological properties of different diseases or drugs.
  • Step 2, construction of the disease similarity matrix: according to data completeness and availability, two types of disease information are selected: ontology and phenotype. An ontology-based disease similarity matrix and a phenotype-based disease similarity matrix are established. Quantile standardization is then applied to the similarity values of these two matrices and of the EMR-based disease similarity matrix, and the standardized values are averaged and fused to form the disease similarity matrix $S_d$, as shown in Figure 2.
  • c(d, d') represents the number of common parent nodes of diseases d and d';
  • $p_x$ represents the probability of disease x, that is, the ratio of the number of occurrences of disease name x or its child nodes to the number of all disease names;
  • $G_d$ and $G_{d'}$ represent the feature sets of diseases d and d', respectively.
  • Step 3, construction of the "drug-disease" heterogeneous association network: using the known "drug-disease" adjacency matrix A as a bridge, combine the drug similarity matrix $S_r$ constructed in step 1 with the disease similarity matrix $S_d$ constructed in step 2 to build the "drug-disease" heterogeneous association network $H_{r,d}$ (Figure 3):
  • $H_{r,d} = (\{R, D\}, \{E_r, E_d, E_{r,d}\}, \{W_r, W_d, W_{r,d}\})$
  • R represents the drug vertex set;
  • D represents the disease vertex set;
  • $E_r$, $E_d$ and $E_{r,d}$ represent the "drug-drug", "disease-disease" and "drug-disease" connections, respectively;
  • $W_r$, $W_d$ and $W_{r,d}$ represent the "drug-drug" similarity value, the "disease-disease" similarity value and whether a therapeutic relationship exists between drug and disease (1 or 0), respectively;
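One convenient way to hold such a network in memory is a single block weight matrix with the drugs first and the diseases after them. The sketch below assumes this representation; the text defines the network only as vertex, edge and weight sets, so the block layout and the function name are illustrative choices.

```python
def build_heterogeneous_network(s_r, s_d, a):
    """Assemble the (n + m) x (n + m) weight matrix of the network.

    s_r: n x n drug similarity matrix (weights W_r)
    s_d: m x m disease similarity matrix (weights W_d)
    a:   n x m 0/1 "drug-disease" adjacency matrix (weights W_{r,d})
    """
    n, m = len(s_r), len(s_d)
    if len(a) != n or any(len(row) != m for row in a):
        raise ValueError("adjacency matrix must have shape n x m")
    # Drug rows: [drug-drug similarities | drug-disease links].
    drug_rows = [list(s_r[i]) + list(a[i]) for i in range(n)]
    # Disease rows: [transposed drug-disease links | disease-disease similarities].
    disease_rows = [[a[i][j] for i in range(n)] + list(s_d[j])
                    for j in range(m)]
    return drug_rows + disease_rows


# Toy network with two drugs and one disease.
s_r = [[1.0, 0.3], [0.3, 1.0]]
s_d = [[1.0]]
a = [[1], [0]]
h = build_heterogeneous_network(s_r, s_d, a)
```

Because the similarity matrices are symmetric and the adjacency block is mirrored, the resulting matrix is symmetric as well.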
  • Step 4, potential prediction of "drug-disease" associations, i.e. drug repositioning:
  • Drug-centered feature vector: for drug $r_i$, one feature vector corresponds to its known associations with all diseases in the disease set D, and another corresponds to its similarities to all drugs in the drug set R. The two vectors are combined to form the feature vector centered on drug $r_i$, expressed as: $F^r_i = (a_i, s^r_i)$
  • where $a_i$ is the i-th row of the "drug-disease" adjacency matrix A and represents the set of diseases associated with drug $r_i$; $s^r_i$ is the i-th row of the drug similarity matrix $S_r$ and gives the similarity between drug $r_i$ and the other drugs; the length of $F^r_i$ is n+m, the number of drugs plus the number of diseases.
  • Similarly, the feature vector centered on disease $d_j$ is $F^d_j = (a^j, s^d_j)$, where $a^j$ is the j-th column of the adjacency matrix A and represents the set of drugs associated with disease $d_j$; $s^d_j$ is the j-th column of the disease similarity matrix $S_d$ and gives the similarity between disease $d_j$ and the other diseases; the length of $F^d_j$ is also n+m.
  • The feature vector of the "drug-disease" association $(r_i, d_j)$ is formed by combining the feature vector centered on drug $r_i$ with the feature vector centered on disease $d_j$, expressed as: $F_{ij} = (F^r_i, F^d_j)$
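The feature extraction described above amounts to concatenating one row and one column of the adjacency matrix with the matching row and column of the two similarity matrices. A minimal sketch follows, assuming plain nested lists and a hypothetical function name:

```python
def feature_vector(a, s_r, s_d, i, j):
    """Feature vector of the "drug-disease" pair (r_i, d_j).

    a:   n x m 0/1 adjacency matrix (rows = drugs, columns = diseases)
    s_r: n x n drug similarity matrix
    s_d: m x m disease similarity matrix
    """
    # Drug-centered part: known diseases of r_i, then its drug similarities.
    drug_part = list(a[i]) + list(s_r[i])
    # Disease-centered part: known drugs of d_j, then its disease similarities.
    disease_part = [row[j] for row in a] + [row[j] for row in s_d]
    return drug_part + disease_part


# Toy data with n = 2 drugs and m = 3 diseases.
a = [[1, 0, 0],
     [0, 1, 1]]
s_r = [[1.0, 0.5],
       [0.5, 1.0]]
s_d = [[1.0, 0.2, 0.1],
       [0.2, 1.0, 0.4],
       [0.1, 0.4, 1.0]]
f_01 = feature_vector(a, s_r, s_d, 0, 1)
```

Each half has length n + m, so the full vector has 2 * (n + m) entries, matching the size of the input layer.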
  • The layers of a deep neural network can be divided into three parts: the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The deep neural network model of the present invention is a fully connected neural network (Fig. 5): every neuron in the current layer is connected to every neuron in the previous layer, and low-level features are combined into more abstract high-level representations of attribute categories or features.
  • The present invention sets up the prediction of "drug-disease" associations as a binary classification problem and builds a fully connected neural network with the classic tower structure. The input layer takes the feature vector $F_{ij}$ generated above, with a total of 2*(n+m) neurons; the output layer contains two neurons, which give the probabilities that the test sample belongs to "true" and "false", respectively;
  • stochastic gradient descent is used to learn the parameters, mini-batches are used to speed up learning, and dropout is used to avoid overfitting;
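To make the network shape concrete, the sketch below derives layer widths for a tower structure and shows how two output activations become "true"/"false" probabilities. The halving rule is an assumption (the text names the tower structure without giving layer widths), and the softmax output is likewise assumed as the usual way to obtain two class probabilities; both function names are hypothetical.

```python
import math


def tower_layer_sizes(n, m, n_hidden):
    """Layer widths from the input layer to the two-neuron output layer.

    Assumes each hidden layer has half the width of the previous layer;
    n and m are the numbers of drugs and diseases.
    """
    sizes = [2 * (n + m)]                 # input layer: feature vector F_ij
    for _ in range(n_hidden):
        sizes.append(max(2, sizes[-1] // 2))
    sizes.append(2)                        # output layer: "true" and "false"
    return sizes


def softmax(logits):
    """Turn raw output activations into probabilities that sum to 1."""
    shift = max(logits)                    # subtract the max for stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Example: 10 drugs, 6 diseases, three hidden layers.
sizes = tower_layer_sizes(10, 6, 3)
probabilities = softmax([2.0, -1.0])
```

The number of hidden layers is left as a parameter because the text tunes it against the evaluation indicators.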
  • The prediction results of the method of the present invention are compared with three classifiers: logistic regression, support vector machine and random forest. Ten-fold cross-validation is used to evaluate model performance, and the number of hidden layers is tuned against the evaluation indicators (precision, recall, F1-measure and AUC) to optimize the model;
  • Step (3): use the test set and validation set to test and validate the model optimized in step (2), and intersect the predictions of the tested and validated fully connected neural network model with the predictions of the existing drug repositioning model to obtain the final predicted drugs.
  • In the existing drug repositioning model, the degree of association between drug r and disease d is expressed as $Assoc_{r,d}$.
  • The model is based on the assumption that "the gene expression profile of a disease should be inversely correlated with the gene expression profile of a drug that can treat the disease". The closer $Assoc_{r,d}$ is to 1, the more likely the drug is to treat the disease. For a specific disease, calculating its degree of association with every drug in turn and applying a threshold screens out the list $L_r$ of potential therapeutic drugs for the disease.
  • Collection of drug similarity matrices: following the calculation method for each similarity in Example 2, construct the drug transcriptome similarity matrix, the drug chemical structure similarity matrix, the drug interaction similarity matrix, the drug protein target similarity matrix and the drug side effect similarity matrix.
  • Fusion of drug similarity matrices: multiple similarity matrices are merged into a single matrix, which makes effective use of the different kinds of similarity information and reduces the complexity of later calculations. Because the element means of the five drug similarity matrices differ, quantile standardization is used to standardize the five matrices, and their average is then taken to obtain the final drug similarity matrix $S_r$.
  • Collection of disease similarity matrices: following the calculation method for each similarity in Example 2, construct the disease transcriptome similarity matrix, the disease phenotype similarity matrix and the disease ontology similarity matrix.
  • Fusion of disease similarity matrices: similarly, quantile standardization is used to standardize the three matrices, and their average is then taken to obtain the final disease similarity matrix $S_d$.
  • Extract the drug-centric feature vectors and the disease-centric feature vectors to form the feature vector F ij of the "drug-disease" association (r i , d j ), and treat the prediction of the "anti-ventricular arrhythmia drug-ventricular arrhythmia disease" association as a binary classification problem. The input layer of the fully connected neural network is the feature vector F ij generated in the previous stage, with a total of 2*(n+m) neurons; the output layer contains two neurons, which give the probabilities that the test sample is "true" and "false", respectively.
  • the stochastic gradient descent method is used to learn the parameters, the Mini-Batch technique is used to speed up learning, and the Dropout technique is used to prevent overfitting.
  • the present invention randomly generates a "negative" sample set for the feature vector set F at a ratio of 1:1. Generate training set, test set and validation set according to the ratio of 6:2:2. The prediction results of the model of the present invention are compared with the three classifiers of Logistic Regression, Support Vector Machine and Random Forest, and ten-fold cross-validation is used to evaluate the performance of the model.
  • the list of anti-ventricular arrhythmia drugs predicted by the original drug repositioning model is marked as L old
  • the list of anti-ventricular arrhythmia drugs predicted by the drug repositioning model of the present invention is marked as L new .
  • the specific process is as follows: the isolated rat heart is suspended on the Langendorff perfusion device through the aorta, the heart is perfused with calcium-free Tyrode's solution and enzyme solution in turn, and single ventricular myocytes are isolated after the ventricular muscle tissue becomes larger. Using the patch clamp technique, resting potentials (RP) and action potentials (AP) were recorded in current clamp mode. The animal experiment data are expressed as mean ± standard deviation (x̄ ± s) and analyzed with SPSS statistical software; P < 0.05 means that the difference is statistically significant, and drugs showing statistically significant differences can be used as backup drugs for the experiment.
  • drug A can increase the resting potential of isolated ventricular myocytes in a dose-dependent manner, and shorten the terminal repolarization time of action potentials.
  • the results of this experiment showed that the action characteristics of drug B on action potentials were consistent with those of IK1 agonists.
  • Moderately enhancing IK1 increases or restores the resting potential, and the mechanism against ischemic arrhythmia is as follows: (1) increasing the negative value of the resting potential can reverse the membrane depolarization caused by pathological factors and reduce the excitability of cells; (2) increasing the membrane conductance reduces the abnormal fluctuation of membrane potential caused by changes in membrane current and increases the electrical stability of the membrane; (3) appropriately shortening the action potential duration (APD) helps to prevent early and late depolarization (EAD) and thus triggered arrhythmias.
  • APD action potential duration
  • AD Alzheimer's disease
  • multi-omics data database of AD disease the text database of clinical medical records, etc.
  • heterogeneous association analysis was carried out, a deep learning model of the heterogeneous association network was established, and a new compound, C, was discovered.
  • the therapeutic effect of compound C on AD is as follows:
  • As in Example 3, a similarity matrix of AD treatment drugs and a similarity matrix of AD multi-omics data and clinical medical record text were established, the "AD treatment drugs-AD disease" heterogeneous association network was built, and a drug repositioning model based on heterogeneous association network mining was obtained. After the seven final predicted drugs (lead compounds) were determined, compound C was selected as the research object through literature research, pharmacophore analysis and other means, and experimental verification was carried out.
  • Eight-month-old APP/PS1 mice and their wild-type (WT) littermates were used in the experiment. According to mouse type and drug treatment, they were randomly divided into four groups: wild-type control group (WT+Vehicle), wild-type administration group (WT+C), APP/PS1 control group (APP/PS1+Vehicle) and APP/PS1 administration group (APP/PS1+C).
  • Compared with mice in the WT+Vehicle group, the total movement distance of mice in the APP/PS1+Vehicle group in the open field was significantly increased (P < 0.05), and the percentage of time spent in the central area tended to increase.
  • the total movement distance of APP/PS1 mice after C treatment was significantly reduced (P < 0.001), and the percentage of time spent in the central area also showed a downward trend. This shows that compound C can significantly reduce the hyperactive state of APP/PS1 mice in the open field, as shown in FIG. 7.
  • As in Example 3, a similarity matrix of hepatitis B treatment drugs and a similarity matrix of hepatitis B multi-omics data and clinical medical records were established, the "hepatitis B treatment drugs-hepatitis B disease" heterogeneous association network was built, and a drug repositioning model based on heterogeneous association network mining was obtained. After more than ten final predicted drugs (lead compounds) were determined, ellipticine and camptothecin were selected as research objects through literature research, pharmacophore analysis and other means, and experimental verification was carried out (for the camptothecin-related prediction and verification process, see Example 6).
  • Dissolve ellipticine in a small amount of DMSO, then use maintenance medium to dilute the stock solution to the highest concentration to be tested in the experiment, 0.01 μmol/L; sterilize by filtration through a 0.2 μm disposable syringe filter and aliquot for use. Serial dilution method: dilute in EP tubes, the first tube holding the highest-concentration drug solution. Add a fixed volume of maintenance medium to each tube from the second onward, then transfer the same volume of drug solution from the first tube to the second. After mixing by pipetting, discard the tip and fit a new one, transfer the same volume to the third tube, and so on to the penultimate tube; the last tube contains maintenance medium without drug. In this way the drug is diluted to four concentrations, A, B, C and D;
  • HepG 2.2.15 cells were seeded in a 96-well cell culture plate at 8 × 10^4/ml, 0.1 ml per well. After the cells had grown into a monolayer, the culture medium was removed (note that it was removed), and the four concentrations A, B, C and D of ellipticine were added to the corresponding wells, with three replicate wells per concentration. A blank control group (medium only, no cells) and a normal cell control group (no drug, normal medium only) were set up. After the drug was added, the plate was placed in a 5% CO2 incubator and cultured at 37.0°C for 72 hours. The plate was then taken out, 10 μl of CCK-8 solution was added to each well, and culture was continued in the incubator for 2 h, after which the absorbance at 450 nm was measured with a microplate reader and the results were printed, followed by data processing and analysis.
  • the non-cytotoxic concentration of ellipticine is 0.01 μmol/L
  • the 0.01 μmol/L drug solution is diluted to 0.005 μmol/L, 0.0025 μmol/L, etc.
  • Different concentrations including 0.01 μmol/L
  • the inhibitory rate of the same drug on HBsAg and HBeAg was observed, and the dose-effect relationship was observed
  • Inhibition rate = (experimental well P/N value - control well P/N value) / (control group P/N value - 2.1) × 100%
  • the drug treatment was the same as the HBsAg and HBeAg inhibition tests; the cells were added with different concentrations of drugs, and after 72 hours of culture, the HBV DNA load in the cell supernatant was measured by real-time fluorescent quantitative PCR; the inhibitory effect of different concentrations of drugs on HBV DNA was observed.
  • ellipticine has a significant inhibitory effect on HBsAg and HBeAg at the maximum non-toxic dose, and the inhibitory effect becomes more pronounced as the concentration increases, showing a concentration-dependent relationship.
  • Ellipticine also has an inhibitory effect on HBV DNA, and the inhibitory effect becomes more pronounced as the drug concentration increases, showing a concentration-dependent relationship.
  • As in Example 3, a similarity matrix of hepatitis B treatment drugs and a similarity matrix of hepatitis B multi-omics data and clinical medical records were established, the "hepatitis B treatment drugs-hepatitis B disease" heterogeneous association network was built, and a drug repositioning model based on heterogeneous association network mining was obtained. After more than ten final predicted drugs (lead compounds) were determined, camptothecin was selected as the research object through literature research, pharmacophore analysis and other means, and experimental verification was carried out.
  • camptothecin has a significant inhibitory effect on HBsAg and HBeAg at the maximum non-toxic dose, and the inhibitory effect becomes more pronounced as the concentration increases, showing a concentration-dependent relationship.
  • Camptothecin also has an inhibitory effect on HBV DNA, and the inhibitory effect becomes more pronounced as the drug concentration increases, showing a concentration-dependent relationship.
  • Examples 3-6 show that the system and method of the present invention can greatly improve the efficiency, accuracy and specificity of drug research and development, provide scientific support for the discovery of new drug indications and for the management of the drug development cycle, improve clinical diagnosis and treatment, advance related disciplines, and accelerate the industrialization of drug development, thereby creating considerable market value and promoting the development of the national economy.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Epidemiology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Primary Health Care (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Medicinal Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Toxicology (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Bioethics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention belongs to the technical field of biological medicine. Disclosed are a drug repurposing system and method based on heterogeneous association network deep learning. The system comprises a prediction tool module, an experiment verification module and an external service module. In the core prediction tool module of the provided deep learning drug repurposing system, electronic medical record data is used to update a "drug-disease" association matrix; drug information such as electronic medical records, chemical structures, target protein sequences, side effects and protein interactions is merged, as is disease information such as electronic medical records, ontology and phenotype, to generate a drug similarity matrix and a disease similarity matrix; finally, the three matrices are combined to generate a "drug-disease" heterogeneous association network.

Description

A drug repositioning system and method based on heterogeneous association network deep learning
Technical field
The invention belongs to the technical field of biomedicine, and specifically relates to a drug repositioning system and method based on deep learning over heterogeneous association networks.
Background art
Deep learning learns the inherent regularities and levels of representation of sample data, and the information obtained during learning greatly helps in interpreting data such as text, images and sound. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize text, images, sound and other data. Deep learning is a complex class of machine learning algorithms that has produced many results in search technology, data mining, machine learning, machine translation, natural language processing, multimedia learning and other related fields. By letting machines imitate human activities such as seeing, hearing and thinking, it has solved many complex pattern recognition problems and driven considerable progress in artificial intelligence.
Drug repositioning (drug repurposing, also described as finding new uses for old drugs) is a strategy that uses deep learning and other technical methods to re-screen, combine or modify existing drugs in order to discover previously unknown uses. The strategy has several advantages. First, the risk of failure is low, because the drugs being repositioned have already been proven safe in clinical models and in humans. Second, the development cycle is short, because preclinical experiments, safety evaluation and even formulation screening have already been completed. Third, the required investment is small, because much of the cost of the preclinical stage is saved. The present invention therefore provides a drug repositioning system and method based on heterogeneous association network deep learning.
Summary of the invention
The applicant previously established an association model for drug repositioning based on the hypothesis that "the gene expression profile of a disease should be inversely correlated with the gene expression profile of a drug that can treat the disease". In that model, the degree of association between drug r and disease d is expressed as:
Figure PCTCN2022104668-appb-000001
Assoc r,d takes values in the interval [0,1]; the closer it is to 1, the more likely the drug is to treat the disease. For a specific disease, computing the association degree Assoc between it and every drug in turn and applying a threshold yields a list L r of potential therapeutic drugs for that disease. Screening L r further against the "drug-disease" associations already reported in the medical literature then gives the list L′ r of drugs whose association has not yet been confirmed.
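The screening step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the association degrees are taken as already computed (the Assoc formula itself is in the figure and is not reproduced here), and the names `screen_candidates`, `assoc`, `known_pairs` and the threshold 0.8 are hypothetical.

```python
def screen_candidates(assoc, threshold, known_pairs, disease):
    """Return (L_r, L_r_prime) for one disease.

    assoc: dict mapping (drug, disease) -> association degree in [0, 1]
           (assumed to be precomputed; the Assoc formula is not shown here).
    known_pairs: set of (drug, disease) pairs already reported in the literature.
    """
    # L_r: drugs whose association degree with the disease reaches the threshold,
    # ordered from the strongest association to the weakest.
    l_r = sorted(
        (drug for (drug, dis), score in assoc.items()
         if dis == disease and score >= threshold),
        key=lambda drug: -assoc[(drug, disease)],
    )
    # L'_r: candidates whose association is not yet confirmed in the literature.
    l_r_prime = [drug for drug in l_r if (drug, disease) not in known_pairs]
    return l_r, l_r_prime


# Hypothetical toy data.
assoc = {("r1", "d1"): 0.92, ("r2", "d1"): 0.40,
         ("r3", "d1"): 0.81, ("r1", "d2"): 0.95}
known = {("r1", "d1")}
l_r, l_r_prime = screen_candidates(assoc, 0.8, known, "d1")
print(l_r, l_r_prime)
```

The threshold is a free parameter in the source; a stricter value shortens L r at the cost of possibly missing true candidates.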
In application, however, the model was found to have the following limitations:
1. The model establishes the association between drugs and diseases solely on the principle of expression-profile reversal, and ignores two kinds of information that gene expression profiles also provide: drug-drug associations and disease-disease associations.
2. The model does not fully integrate gene expression profile data with medical literature data: the former is used only for association prediction, and the latter only for association screening.
To address these problems, the present invention provides a drug repositioning system and method based on heterogeneous association network deep learning.
To achieve the above object, the present invention adopts the following technical solutions:
The present invention provides a drug repositioning system based on heterogeneous association network deep learning, comprising a prediction tool module, an experimental verification module and an external service module. The prediction tool module mainly uses the Python programming language to connect to and operate on an EMR (electronic medical record) database. Specifically, on the basis of known "drug-disease" associations, it incorporates drug similarity and disease similarity information to build a "drug-disease" heterogeneous network, and uses either a deep neural network algorithm from deep learning or a corrected geometric mean to predict potential "drug-disease" associations, thereby achieving drug repositioning. The experimental verification module is connected to the prediction tool module. It integrates the hardware and research protocols for in vivo and ex vivo animal experiments and clinical pharmacology trials into a standardized test workflow for drug repositioning results, which can support general morphological, molecular biological, behavioral and multi-omics research. The external service module mainly comprises a data processing and analysis sub-module, a code and solution presentation sub-module, and a training and exchange sub-module. The data processing and analysis sub-module produces a solution from the raw data and analysis goals uploaded by a registered user and returns it to the user promptly; the code and solution presentation sub-module makes part of the code and solutions public to users; and the training and exchange sub-module supports training and exchange with peers;
The experimental verification module can also apply specific treatment factors to the experimental subjects according to the research purpose, control the influence of non-treatment factors, observe and evaluate the experimental effect, answer the research hypothesis, and verify the results screened by the prediction tool module.
The present invention also provides a drug repositioning method based on heterogeneous association network deep learning, comprising the following steps:
Step 1: construction of the drug similarity matrix;
Step 2: construction of the disease similarity matrix;
Step 3: construction of the "drug-disease" heterogeneous association network;
Step 4: prediction of potential "drug-disease" associations, i.e. drug repositioning.
Further, the drug similarity matrix in step 1 is constructed as follows: according to the completeness and availability of the data, four types of drug attribute information are selected, namely chemical structure, target protein sequence, interactions and side effects; a drug similarity matrix is built for each attribute, i.e. a chemical-structure-based, a target-protein-sequence-based, an interaction-based and a side-effect-based drug similarity matrix; these attribute-based matrices are then fused with the EMR-based drug similarity matrix to form the drug similarity matrix;
The disease similarity matrix in step 2 is constructed as follows: according to the completeness and availability of the data, two types of disease information are selected, namely ontology and phenotype; an ontology-based disease similarity matrix and a phenotype-based disease similarity matrix are built; these are then fused with the EMR-based disease similarity matrix to form the disease similarity matrix;
The "drug-disease" heterogeneous association network in step 3 is constructed as follows: with the "drug-disease" adjacency matrix as a bridge, the drug similarity matrix built in step 1 and the disease similarity matrix built in step 2 are combined to form the "drug-disease" heterogeneous association network:
H r,d = {{R, D}, {E r, E d, E r,d}, {W r, W d, W r,d}}
where R denotes the set of drug vertices and D the set of disease vertices; E r, E d and E r,d denote the "drug-drug", "disease-disease" and "drug-disease" edges, respectively; and W r, W d and W r,d denote the "drug-drug" similarity value, the "disease-disease" similarity value, and whether a therapeutic relationship exists between a drug and a disease, respectively;
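One way to picture the heterogeneous network is as a single block matrix whose diagonal blocks hold the two similarity matrices and whose off-diagonal blocks hold the adjacency matrix. The sketch below assumes that representation; the source only defines the vertex, edge and weight sets, so the block layout and the name `build_hetero_network` are illustrative choices, not the patent's stated data structure.

```python
def build_hetero_network(S_r, S_d, A):
    """Stack S_r (n x n), S_d (m x m) and A (n x m) into one (n+m) x (n+m) matrix.

    Layout (an assumption for illustration):
        [ S_r   A   ]
        [ A^T   S_d ]
    """
    n, m = len(A), len(A[0])
    assert len(S_r) == n and all(len(row) == n for row in S_r)
    assert len(S_d) == m and all(len(row) == m for row in S_d)
    # Drug rows: drug-drug weights W_r followed by drug-disease indicators W_{r,d}.
    top = [list(S_r[i]) + list(A[i]) for i in range(n)]
    # Disease rows: transposed drug-disease indicators followed by W_d.
    bottom = [[A[i][j] for i in range(n)] + list(S_d[j]) for j in range(m)]
    return top + bottom


# Hypothetical toy data: 2 drugs, 3 diseases.
S_r = [[1.0, 0.3], [0.3, 1.0]]
S_d = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
A = [[1, 0, 0], [0, 1, 1]]
H = build_hetero_network(S_r, S_d, A)
for row in H:
    print(row)
```

With symmetric similarity matrices the combined matrix is symmetric as well, which matches an undirected network.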
The prediction of potential "drug-disease" associations in step 4, i.e. drug repositioning, proceeds as follows:
4.1 Feature extraction for "drug-disease" associations:
4.1.1 The drug-centered feature vector is expressed as:
Figure PCTCN2022104668-appb-000002
where A i,: is the i-th row of the "drug-disease" adjacency matrix A and represents the set of diseases associated with drug r i;
Figure PCTCN2022104668-appb-000003
is the i-th row of the drug similarity matrix and represents the similarity between drug r i and the other drugs;
4.1.2 The disease-centered feature vector is expressed as:
Figure PCTCN2022104668-appb-000004
where A :,j is the j-th column of the adjacency matrix A and represents the set of drugs associated with disease d j;
Figure PCTCN2022104668-appb-000005
is the j-th column of the disease similarity matrix and represents the similarity between disease d j and the other diseases;
4.1.3 The feature vector of the "drug-disease" association (r i, d j) is formed by combining the feature vector centered on drug r i
Figure PCTCN2022104668-appb-000006
and the feature vector centered on disease d j
Figure PCTCN2022104668-appb-000007
and is expressed as:
Figure PCTCN2022104668-appb-000008
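The feature construction in 4.1 can be sketched as a concatenation. The exact formulas are in the figures and are not reproduced here, so the ordering of the four parts below is an assumption; it is chosen to give 2*(n+m) entries for n drugs and m diseases. The function name `pair_feature` is hypothetical.

```python
def pair_feature(A, S_r, S_d, i, j):
    """Feature vector for the pair (r_i, d_j), assuming simple concatenation.

    A:   n x m drug-disease adjacency matrix
    S_r: n x n drug similarity matrix
    S_d: m x m disease similarity matrix
    """
    n, m = len(A), len(A[0])
    # Drug-centered part: row i of A (m values) and row i of S_r (n values).
    drug_part = list(A[i]) + list(S_r[i])
    # Disease-centered part: column j of A (n values) and column j of S_d (m values).
    disease_part = [A[k][j] for k in range(n)] + [S_d[k][j] for k in range(m)]
    return drug_part + disease_part


# Hypothetical toy data: 2 drugs, 3 diseases.
S_r = [[1.0, 0.3], [0.3, 1.0]]
S_d = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
A = [[1, 0, 0], [0, 1, 1]]
F_12 = pair_feature(A, S_r, S_d, i=1, j=2)
print(len(F_12), F_12)
```

Note that row i and column j of A both contain the entry A[i][j], i.e. the label of the pair itself. The source does not say whether that entry is masked during training; a practical implementation would probably need to handle it to avoid label leakage.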
4.2 Training of the deep neural network model
The deep neural network model is a fully connected neural network, i.e. every neuron in the current layer is connected to every neuron in the previous layer, and more abstract high-level representations of attribute categories or features are formed by combining low-level features. The prediction of "drug-disease" associations is cast as a binary classification problem, and the fully connected network is built with the classic tower structure: the input layer is the feature vector F ij generated in step 4.1, and the output layer contains two neurons that give the probabilities that a test sample is "true" and "false", respectively;
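A forward pass through such a network can be sketched with the standard library alone. Several details are assumptions because the source does not state them: layer widths that halve at each hidden layer (one reading of "tower structure"), ReLU hidden activations, a softmax output, Gaussian weight initialization, and index 0 standing for "true". No training is shown.

```python
import math
import random


def init_tower(input_dim, hidden_layers, seed=0):
    """Create weights for a tower-shaped fully connected network with 2 outputs."""
    rng = random.Random(seed)
    sizes = [input_dim]
    for _ in range(hidden_layers):
        sizes.append(max(2, sizes[-1] // 2))  # assumed: halve the width per layer
    sizes.append(2)                           # two output neurons: "true" / "false"
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        W = [[rng.gauss(0.0, 1.0 / math.sqrt(fan_in)) for _ in range(fan_in)]
             for _ in range(fan_out)]
        b = [0.0] * fan_out
        layers.append((W, b))
    return layers


def forward(layers, x):
    """Return the two class probabilities for one input vector."""
    for idx, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in z]              # ReLU on hidden layers
        else:
            peak = max(z)                              # numerically stable softmax
            exps = [math.exp(v - peak) for v in z]
            total = sum(exps)
            x = [e / total for e in exps]
    return x


def cross_entropy(probs, label):
    """Cross-entropy of one sample; label is the index of the correct class."""
    return -math.log(probs[label])


layers = init_tower(input_dim=10, hidden_layers=2)
probs = forward(layers, [0, 1, 1, 0.3, 1.0, 0, 1, 0.1, 0.2, 1.0])
loss = cross_entropy(probs, label=0)
print(probs, loss)
```

The number of hidden layers is exposed as a parameter because the source tunes it against the evaluation indicators.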
For the set F of "drug-disease" association feature vectors, a "negative" sample set is generated at random at a ratio of 1:1, and the training, test and validation sets are generated at a ratio of 6:2:2;
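The sampling and splitting step might look like the sketch below. It assumes that "negative" samples are drawn uniformly from pairs with no known association (the source only says they are generated at random) and that the split is a simple shuffle; `make_datasets` and the toy matrix are hypothetical.

```python
import random


def make_datasets(A, seed=0):
    """Build 1:1 positive/negative samples and split them 6:2:2.

    A: n x m drug-disease adjacency matrix (1 = known association).
    Returns (train, test, validation), each a list of ((i, j), label).
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    positives = [(i, j) for i in range(n) for j in range(m) if A[i][j] == 1]
    unlabeled = [(i, j) for i in range(n) for j in range(m) if A[i][j] == 0]
    # 1:1 ratio: as many randomly chosen unlabeled pairs as known associations.
    negatives = rng.sample(unlabeled, len(positives))
    samples = [(p, 1) for p in positives] + [(q, 0) for q in negatives]
    rng.shuffle(samples)
    n_train = round(0.6 * len(samples))
    n_test = round(0.2 * len(samples))
    train = samples[:n_train]
    test = samples[n_train:n_train + n_test]
    validation = samples[n_train + n_test:]
    return train, test, validation


# Hypothetical toy data: 4 drugs, 5 diseases, 5 known associations.
A = [[1, 0, 0, 0, 0],
     [0, 1, 0, 0, 1],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0]]
train, test, validation = make_datasets(A)
print(len(train), len(test), len(validation))
```

Unlabeled pairs are not guaranteed to be true negatives, since some may be undiscovered associations; that caveat applies to any scheme of this kind.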
The weight between the i-th neuron of layer l and the j-th neuron of layer l-1 is denoted
Figure PCTCN2022104668-appb-000009
The training set is used to find the optimal weight set w := {w l}, l = 1 → L, of the L-layer neural network so that the cross-entropy is minimized;
Parameters are learned with stochastic gradient descent, the Mini-Batch method is used to speed up learning, and the Dropout method is used to prevent overfitting. Model performance is evaluated with ten-fold cross-validation, and the number of hidden layers is tuned against the evaluation indicators to optimize the model;
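The evaluation indicators named in the document (Precision, Recall, F1-measure and AUC) can be computed as in the sketch below. The definitions are the standard ones; the decision threshold of 0.5 and the toy labels and scores are assumptions for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in the sample count, which is fine
    for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
area = auc(y_true, scores)
print(precision, recall, f1, area)
```

In a ten-fold cross-validation these values would be computed once per fold and averaged before comparing different numbers of hidden layers.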
4.3 The model optimized in step 4.2 is tested and validated with the test and validation sets, and the predictions of the tested and validated fully connected neural network model are intersected with the predictions of the existing model to obtain the final predicted drugs.
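The intersection in step 4.3 is a plain set operation. In the sketch below the two drug lists are hypothetical stand-ins for the predictions of the new model and of the existing model, and keeping the ranking of the new model is a design choice of this example rather than something the source specifies.

```python
def final_predictions(l_new, l_old):
    """Drugs predicted by both models, in the order given by l_new."""
    old = set(l_old)
    return [drug for drug in l_new if drug in old]


# Hypothetical prediction lists.
l_new = ["r7", "r2", "r9", "r4"]
l_old = ["r4", "r1", "r7"]
final = final_predictions(l_new, l_old)
print(final)
```

Requiring agreement between both models trades recall for precision: a drug proposed by only one model is dropped.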
Furthermore, the fusion in step 1 of the attribute-based drug similarity matrices with the EMR-based drug similarity matrix proceeds as follows: quantile standardization is applied separately to the similarity values of the chemical-structure-based, target-protein-sequence-based, interaction-based, side-effect-based and EMR-based drug similarity matrices, and the standardized matrices are then averaged to form the drug similarity matrix.
The fusion in step 2 of the ontology-based and phenotype-based disease similarity matrices with the EMR-based disease similarity matrix proceeds as follows: quantile standardization is applied separately to the similarity values of the ontology-based, phenotype-based and EMR-based disease similarity matrices, and the standardized matrices are then averaged to form the disease similarity matrix.
Fusing several similarity matrices into a single matrix makes effective use of the information in each kind of similarity and reduces the complexity of later computation. However, the element means of the different similarity matrices differ, so if the matrices were fused directly from the raw similarity values, a matrix with a higher average similarity would dominate the result; even after normalization, the fusion would still be affected by differences in the similarity distributions. Quantile standardization is therefore applied to the similarity values of each matrix before the average is taken.
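The fusion can be sketched as follows. The source names quantile standardization but does not give the procedure, so this example assumes the common form of quantile normalization: every matrix is mapped onto a shared reference distribution (the rank-wise mean of the sorted values) before averaging. The names `quantile_normalize` and `fuse` are hypothetical.

```python
def quantile_normalize(matrices):
    """Give every matrix the same value distribution (rank-wise mean).

    Simplification: tied values receive distinct consecutive ranks; a more
    careful version would average the reference values over tied ranks.
    """
    flats = [[v for row in M for v in row] for M in matrices]
    size = len(flats[0])
    assert all(len(f) == size for f in flats)
    sorted_flats = [sorted(f) for f in flats]
    # Reference distribution: mean of the k-th smallest values across matrices.
    reference = [sum(col) / len(col) for col in zip(*sorted_flats)]
    normalized = []
    for f in flats:
        order = sorted(range(size), key=lambda k: f[k])  # positions by value
        out = [0.0] * size
        for rank, k in enumerate(order):
            out[k] = reference[rank]
        normalized.append(out)
    return normalized


def fuse(matrices):
    """Quantile-normalize the matrices, then average them element-wise."""
    n_cols = len(matrices[0][0])
    normalized = quantile_normalize(matrices)
    mean_flat = [sum(vals) / len(vals) for vals in zip(*normalized)]
    return [mean_flat[r:r + n_cols] for r in range(0, len(mean_flat), n_cols)]


# Hypothetical toy data: one matrix with low and one with high similarities.
M1 = [[1.0, 0.1], [0.1, 1.0]]
M2 = [[1.0, 0.9], [0.9, 1.0]]
fused = fuse([M1, M2])
print(fused)
```

After normalization both toy matrices share the same distribution, so neither dominates the average, which is the motivation given in the text.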
The similarity for each drug similarity matrix is computed as follows:
(1) The chemical-structure-based drug similarity is expressed by the Tanimoto coefficient:
Figure PCTCN2022104668-appb-000010
where |C r| and |C r′| denote the numbers of chemical substructures in drug r and drug r′, respectively, and C r C r′ denotes the number of chemical substructures that drug r and drug r′ share;
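As a rough illustration, the Tanimoto coefficient in its usual form is the number of shared substructures divided by the number of distinct substructures in either drug. The sketch assumes that form (the formula in the figure is not reproduced here) and represents each drug as a set of substructure identifiers, which are hypothetical.

```python
def tanimoto(c_r, c_r2):
    """Tanimoto coefficient of two substructure sets.

    shared / (|C_r| + |C_r'| - shared), the usual definition.
    """
    shared = len(c_r & c_r2)
    denom = len(c_r) + len(c_r2) - shared
    return shared / denom if denom else 0.0


# Hypothetical substructure identifiers.
drug_a = {"benzene", "amide", "hydroxyl"}
drug_b = {"amide", "hydroxyl", "thiol"}
sim = tanimoto(drug_a, drug_b)
print(sim)
```

In practice the substructure sets would typically come from a chemical fingerprinting tool; which one the applicants used is not stated in this section.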
(2) The target-protein-sequence-based drug similarity:
Figure PCTCN2022104668-appb-000011
where se(…,…) denotes the Smith–Waterman sequence alignment score;
(3) The interaction-based drug similarity is expressed by the Jaccard coefficient:
Figure PCTCN2022104668-appb-000012
where I r and I r′ denote the sets of drugs that interact with drug r and drug r′, respectively;
(4) The side-effect-based drug similarity is expressed by the Jaccard coefficient:
Figure PCTCN2022104668-appb-000013
where E r and E r′ denote the side-effect sets of drug r and drug r′, respectively;
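Both Jaccard-based similarities follow the same pattern, so one helper can build either matrix. The sketch assumes the standard Jaccard definition (intersection size over union size); the side-effect profiles are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(profiles):
    """Pairwise Jaccard similarity for a list of sets, one set per drug."""
    n = len(profiles)
    return [[jaccard(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical side-effect profiles for three drugs.
side_effects = [
    {"nausea", "headache"},
    {"nausea", "rash"},
    {"dizziness"},
]
S_side = similarity_matrix(side_effects)
for row in S_side:
    print(row)
```

The same `similarity_matrix` call works for interaction sets by passing, for each drug, the set of drugs it interacts with.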
(5) The EMR-based drug similarity is computed as:
Sim d,pk = Max(Q d,pk) - Min(Q d,pk), where |Q d,pk| ≥ 2
where Q d,pk denotes the laboratory test results of type k in inpatient record p after administration of drug d, and Sim d,pk is the largest difference among the Q d,pk values;
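The quantity defined above is the range of the repeated test results. The sketch below computes only that range for one drug, record and test type; how the source turns these values into a drug-to-drug similarity matrix is not described in this section, so that step is deliberately left out. The function name and the sample values are hypothetical.

```python
def emr_range(q):
    """Max minus min of the lab results Q_{d,pk}; None if fewer than 2 results.

    q: list of numeric results of test type k in record p after drug d.
    """
    if len(q) < 2:  # the formula requires |Q_{d,pk}| >= 2
        return None
    return max(q) - min(q)


# Hypothetical lab values (for example serum potassium in mmol/L).
value = emr_range([5.1, 4.2, 3.9])
print(value)
```

Records with a single measurement carry no information about change under treatment, which is presumably why the formula excludes them.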
The similarity values in each disease similarity matrix are calculated as follows:
(1) The ontology-based disease similarity is calculated as follows:
Figure PCTCN2022104668-appb-000014
where c(d, d′) denotes the number of parent nodes shared by diseases d and d′, and p_x denotes the probability of occurrence of disease x, that is, the ratio of the number of occurrences of disease name x or its child nodes to the total number of disease names;
(2) The phenotype-based disease similarity is expressed by the cosine coefficient:
Figure PCTCN2022104668-appb-000015
where
Figure PCTCN2022104668-appb-000016
and
Figure PCTCN2022104668-appb-000017
denote the frequency of the i-th MeSH term in the medical descriptions of diseases d and d′, respectively;
(3) The EMR-based disease similarity is calculated as follows:
Figure PCTCN2022104668-appb-000018
where G_d and G_d′ denote the feature sets of diseases d and d′, respectively.
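The cosine coefficient in item (2) above can be illustrated as follows. This is a sketch only, assuming each disease is represented by a vector of MeSH term frequencies over a shared vocabulary; the function name and the example counts are hypothetical.

```python
from math import sqrt
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must share the same vocabulary length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a disease with no MeSH terms is treated as dissimilar
    return dot / (norm_u * norm_v)


# Hypothetical counts over a shared vocabulary of four MeSH terms.
freq_d = [3, 0, 1, 2]
freq_d2 = [1, 0, 0, 2]
print(cosine_similarity(freq_d, freq_d2))
```

Returning 0.0 for an all-zero vector is a design choice made here to avoid division by zero; the source does not say how such cases are handled.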
The evaluation metrics in step 4.2 include Precision, Recall, F1-measure and AUC. The existing model referred to in step 4.3 is:
Figure PCTCN2022104668-appb-000019
Assoc_{r,d} takes values in the interval [0,1]; the closer it is to 1, the more likely the drug is to treat the disease.
Compared with the prior art, the present invention has the following advantages:
The drug repositioning system and method based on a heterogeneous association network provided by the present invention use specific algorithms to fuse the disease similarity matrix and the drug similarity matrix on the basis of known "drug-disease" associations, thereby establishing a "drug-disease" heterogeneous network. Data mining and deep learning are then performed on the heterogeneous association network using either of two methods, the deep neural network algorithm from deep learning or the modified geometric mean, so as to predict potential "drug-disease" associations and thereby reposition drugs. The system and method integrate not only "drug-disease" association information, "drug-drug" similarity information and "disease-disease" similarity information, but also gene expression profile data and medical literature data (the "drug-disease" adjacency matrix). They can greatly improve the efficiency, accuracy and specificity of drug research and development; provide scientific support for the discovery of new drug indications and for the management of the drug development cycle; provide guidance and support for raising the level of clinical diagnosis and treatment; enable deep mining of clinical data and multi-omics data in the service of biomedicine; promote the generation of scientific hypotheses in the clinical field, accelerate research on new diagnosis and treatment schemes, and advance related disciplines such as clinical pharmacy and molecular biology; and rapidly advance the industrialization of drug development, thereby creating considerable market value and promoting the rapid development of the national economy.
Description of the drawings
Fig. 1 is a schematic diagram of the acquisition of drug information and the generation of the drug similarity matrix in the present invention.
Fig. 2 is a schematic diagram of the acquisition of disease information and the generation of the disease similarity matrix in the present invention.
Fig. 3 is a schematic diagram of the construction of the "drug-disease" heterogeneous association network of the present invention.
Fig. 4 shows the feature extraction for "drug-disease" associations in the present invention: (a) the "drug-disease" association network, (b) the drug feature matrix, (c) the disease feature matrix, (d) "drug-disease" feature extraction.
Fig. 5 shows the deep neural network model used in the present invention.
Fig. 6 shows the results of the patch-clamp experiment.
Fig. 7 shows the results of the open field test, where A is a histogram of the total distance traveled by each group of mice in the open field, B is the percentage of time each group of mice spent in the central area of the open field, and C shows representative movement trajectories of each group of mice in the open field test; *P<0.05, ***P<0.001.
Fig. 8 shows the results of the novel object recognition test, where A is a schematic diagram of the test, B is a histogram of the time each group of mice spent exploring each of two identical objects during the familiarization phase, with no statistical difference (P>0.05), and C is a histogram of the NOI of each group of mice during the test phase; *P<0.05, **P<0.01, ***P<0.001.
Fig. 9 shows the results of the Morris water maze test, where A is a line graph of the average swimming speed of the mice during the Morris water maze test; B is a line graph of the escape latency of each group of mice during the 5-day place navigation trial, in which * indicates APP/PS1+Vehicle vs WT+Vehicle and # indicates APP/PS1+TSA vs APP/PS1+Vehicle; C is a histogram of the percentage of time each group of mice spent in the target quadrant in the spatial probe trial; D is a histogram of the number of platform crossings by each group of mice in the spatial probe trial; E shows representative swimming trajectories of each group of mice in the spatial probe trial; F is a histogram of the time each group of mice took to reach the platform in the visible platform test; and G is a histogram of the average swimming speed of each group of mice in the visible platform test; *P<0.05, **P<0.01, ***P<0.001.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described specifically and in detail below with reference to the embodiments and the accompanying drawings. It should be noted that those of ordinary skill in the art may make modifications and improvements without departing from the principle of the present invention, and these shall also be regarded as falling within the scope of protection of the present invention.
Example 1
A drug repositioning system based on heterogeneous association network deep learning comprises a prediction tool module, an experimental verification module and an external service module.
The prediction tool module mainly uses the Python programming language to connect to and operate on databases such as the EMR, and uses technical methods such as deep learning to re-screen existing drugs. Specifically, after acquiring multiple kinds of drug attribute information and disease information and building databases from them, it establishes the drug similarity matrix and the disease similarity matrix through similarity calculation, matrix fusion and related methods; using the known "drug-disease" association matrix as a bridge, it incorporates the drug similarity and disease similarity information to construct the "drug-disease" heterogeneous network; and, for a given disease, it predicts potential "drug-disease" associations using the deep neural network algorithm from deep learning or the modified geometric mean, re-screening candidate drugs or compounds that meet the requirements of the algorithm;
The experimental verification module mainly integrates the hardware and research protocols for in vivo or ex vivo animal experiments and clinical pharmacology trials into a standardized test procedure for drug repositioning results. It serves the prediction tool module and can also support general morphological, molecular biological, behavioral and multi-omics studies. According to the research purpose, the experimental verification module can also apply specific treatment factors to the experimental subjects while controlling the influence of non-treatment factors, observe and evaluate the experimental effects, answer the research hypothesis, and verify the results screened by the prediction tool module;
The external service module mainly provides researchers with dedicated data processing and analysis services. Registered users can upload raw data and analysis objectives to the platform; the platform classifies and aggregates them and forwards them to the corresponding back end, and once the back end has produced a solution the platform returns it to the user promptly. In addition, the platform makes part of its code and problem solutions available to users and carries out training and exchange activities with peers.
Example 2
A drug repositioning method based on heterogeneous association network deep learning comprises the following steps:
Step 1, construction of the drug similarity matrix: according to the completeness and availability of the data, four types of attribute information are selected for each drug: chemical structure, target protein sequence, interactions and side effects. A drug similarity matrix is established for each type of attribute, namely the chemical-structure-based drug similarity matrix
Figure PCTCN2022104668-appb-000020
the target-protein-sequence-based drug similarity matrix
Figure PCTCN2022104668-appb-000021
the interaction-based drug similarity matrix
Figure PCTCN2022104668-appb-000022
and the side-effect-based drug similarity matrix
Figure PCTCN2022104668-appb-000023
Quantile normalization is then applied to the similarity values of these attribute-based drug similarity matrices and of the EMR-based drug similarity matrix
Figure PCTCN2022104668-appb-000024
and the normalized matrices are averaged and fused into the drug similarity matrix S_r, as shown in Fig. 1.
(1) The chemical-structure-based drug similarity is expressed by the Tanimoto coefficient:
Figure PCTCN2022104668-appb-000025
where |C_r| and |C_r′| denote the numbers of chemical substructures in drug r and drug r′, respectively, and |C_r ∩ C_r′| denotes the number of chemical substructures shared by drug r and drug r′;
(2) For the target-protein-sequence-based drug similarity, the Smith–Waterman sequence alignment score between drug r and drug r′ is calculated and normalized by the geometric mean; the target similarity is:
Figure PCTCN2022104668-appb-000026
where se(…,…) denotes the Smith–Waterman sequence alignment score;
(3) The interaction-based drug similarity is expressed by the Jaccard coefficient:
Figure PCTCN2022104668-appb-000027
where I_r and I_r′ denote the sets of drugs that interact with drug r and drug r′, respectively;
(4) The side-effect-based drug similarity is expressed by the Jaccard coefficient:
Figure PCTCN2022104668-appb-000028
where E_r and E_r′ denote the side-effect sets of drug r and drug r′, respectively;
(5) The EMR-based drug similarity:
The EMR database contains drug prescription records, including the administration time points and the various laboratory test results recorded during each patient's hospital stay. Any changes in the administration records and laboratory test results are tracked, and the physiological change in each test result after drug treatment is described by the maximum difference, calculated as follows:
Sim_{d,pk} = Max(Q_{d,pk}) − Min(Q_{d,pk}), where |Q_{d,pk}| ≥ 2
where Q_{d,pk} denotes the laboratory test results of type k in inpatient record p after administration of drug d; the change Sim_{d,pk} in the type-k laboratory test result induced by drug d is calculated as the largest difference among the values of Q_{d,pk}. For each laboratory test type, the present invention uses the rank-sum test to calculate the similarity between the two drug-induced physiological distributions of a drug pair, expressed as a P value. Finally, the normalized rank of the P values over all drug pairs is used as the measure of "drug-drug" similarity, in order to reduce the heterogeneity of the P-value distributions across different laboratory tests. The present invention assumes that different laboratory tests may be related to specific physiological properties of different diseases or drugs; the similarity of disease or drug pairs is therefore calculated separately for each test type. Because laboratory test results are sparse, only the major laboratory test types are used here, selected for their high coverage of patients prescribed the drug (≥0.3) and for having at least two test results during the administration period (|Q_{d,pk}| ≥ 2).
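Two pieces of the list above lend themselves to a short Python sketch: the geometric-mean normalization of an alignment score in item (2) and the max-minus-min change in item (5). The normalization shown, se(p, q) divided by the square root of se(p, p)·se(q, q), is one common reading of "normalized by the geometric mean" and is an assumption, as are all function names, the `toy_score` stand-in for a real Smith–Waterman scorer, and the example records.

```python
from math import sqrt
from typing import Callable, Dict, List, Tuple

LabKey = Tuple[str, str, str]  # (drug, inpatient record, test type)


def normalized_alignment(se: Callable[[str, str], float], p: str, q: str) -> float:
    """Alignment score divided by the geometric mean of the two self-scores."""
    denom = sqrt(se(p, p) * se(q, q))
    return se(p, q) / denom if denom else 0.0


def lab_changes(results: Dict[LabKey, List[float]]) -> Dict[LabKey, float]:
    """Max(Q) - Min(Q) for every key that has at least two test results."""
    return {key: max(vals) - min(vals)
            for key, vals in results.items() if len(vals) >= 2}


def toy_score(p: str, q: str) -> float:
    """Stand-in scorer (count of matching positions); NOT Smith-Waterman."""
    return float(sum(x == y for x, y in zip(p, q)))


# Hypothetical records, not taken from the source.
records = {
    ("drug_1", "p1", "ALT"): [40.0, 55.0, 47.0],
    ("drug_1", "p2", "ALT"): [30.0],          # dropped: fewer than two results
    ("drug_2", "p3", "ALT"): [20.0, 20.0],
}
print(lab_changes(records))
print(normalized_alignment(toy_score, "AAAA", "AABB"))
```

The per-drug distributions of these changes could then be compared pairwise with a rank-sum test, for example `scipy.stats.ranksums`, which is left out here to keep the sketch dependency-free.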
Step 2, construction of the disease similarity matrix: according to the completeness and availability of the data, two types of disease information are selected, ontology and phenotype. The ontology-based disease similarity matrix
Figure PCTCN2022104668-appb-000029
and the phenotype-based disease similarity matrix
Figure PCTCN2022104668-appb-000030
are established separately. Quantile normalization is then applied to the similarity values of the ontology-based disease similarity matrix
Figure PCTCN2022104668-appb-000031
the phenotype-based disease similarity matrix
Figure PCTCN2022104668-appb-000032
and the EMR-based disease similarity matrix
Figure PCTCN2022104668-appb-000033
and the normalized matrices are averaged and fused into the disease similarity matrix S_d, as shown in Fig. 2.
(1) The ontology-based disease similarity is calculated as follows:
Figure PCTCN2022104668-appb-000034
where c(d, d′) denotes the number of parent nodes shared by diseases d and d′, and p_x denotes the probability of occurrence of disease x, that is, the ratio of the number of occurrences of disease name x or its child nodes to the total number of disease names;
(2) The phenotype-based disease similarity is expressed by the cosine coefficient:
Figure PCTCN2022104668-appb-000035
where
Figure PCTCN2022104668-appb-000036
and
Figure PCTCN2022104668-appb-000037
denote the frequency of the i-th MeSH term in the medical descriptions of diseases d and d′, respectively;
(3) By analogy with the EMR-based drug similarity, the EMR-based disease similarity is calculated as follows:
Figure PCTCN2022104668-appb-000038
where G_d and G_d′ denote the feature sets of diseases d and d′, respectively.
Step 3, construction of the "drug-disease" heterogeneous association network: using the known "drug-disease" adjacency matrix A as a bridge, the drug similarity matrix S_r constructed in step 1 and the disease similarity matrix S_d constructed in step 2 are combined to construct the "drug-disease" heterogeneous association network H_{r,d} (Fig. 3):
H_{r,d} = {{R, D}, {E_r, E_d, E_{r,d}}, {W_r, W_d, W_{r,d}}}
where R denotes the set of drug vertices and D the set of disease vertices; E_r, E_d and E_{r,d} denote the edges between "drug-drug", "disease-disease" and "drug-disease" pairs, respectively; and W_r, W_d and W_{r,d} denote the "drug-drug" similarity value, the "disease-disease" similarity value and whether a therapeutic relationship exists between a drug and a disease (1 or 0), respectively;
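One way to hold the network H_{r,d} in memory is sketched below, assuming S_r is an n×n matrix, S_d an m×m matrix and A an n×m 0/1 matrix given as nested lists. The vertex labels, the dictionary layout and the function name are illustrative assumptions; the edge sets E_r, E_d and E_{r,d} are simply the keys of the three weight dictionaries.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Edge = Tuple[str, str]


def build_hetero_network(S_r: Matrix, S_d: Matrix,
                         A: List[List[int]]) -> Dict[str, object]:
    """Assemble vertex sets and weighted edges from S_r, S_d and A."""
    n, m = len(S_r), len(S_d)
    if len(A) != n or any(len(row) != m for row in A):
        raise ValueError("A must have one row per drug and one column per disease")
    drugs = [f"r{i}" for i in range(n)]
    diseases = [f"d{j}" for j in range(m)]
    # Similarity edges: each unordered pair once, weighted by similarity.
    W_r: Dict[Edge, float] = {(drugs[i], drugs[k]): S_r[i][k]
                              for i in range(n) for k in range(i + 1, n)}
    W_d: Dict[Edge, float] = {(diseases[j], diseases[k]): S_d[j][k]
                              for j in range(m) for k in range(j + 1, m)}
    # Drug-disease weights: 1 if a therapeutic relationship is known, else 0.
    W_rd: Dict[Edge, int] = {(drugs[i], diseases[j]): A[i][j]
                             for i in range(n) for j in range(m)}
    return {"R": drugs, "D": diseases, "W_r": W_r, "W_d": W_d, "W_rd": W_rd}


# Hypothetical toy matrices: 2 drugs, 3 diseases.
S_r = [[1.0, 0.3], [0.3, 1.0]]
S_d = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
A = [[1, 0, 1], [0, 1, 0]]
H = build_hetero_network(S_r, S_d, A)
print(H["W_rd"][("r0", "d2")])
```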
Step 4, prediction of potential "drug-disease" associations, that is, drug repositioning:
(1) Feature extraction for "drug-disease" associations: for each "drug-disease" association, a topological feature vector is extracted from the "drug-disease" heterogeneous association network and used as the input for training the deep neural network model.
Drug-centered feature vector: for a drug r_i, one feature vector corresponds to its known associations with all diseases in the disease set D, and another corresponds to its similarity to all drugs in the drug set R. These two vectors are concatenated to form the feature vector centered on drug r_i, expressed as:
Figure PCTCN2022104668-appb-000039
where A_{i,:} is the i-th row of the "drug-disease" adjacency matrix A and represents the set of diseases associated with drug r_i;
Figure PCTCN2022104668-appb-000040
is the i-th row of the drug similarity matrix and represents the similarity between drug r_i and the other drugs; the length of
Figure PCTCN2022104668-appb-000041
is then n+m.
Disease-centered feature vector: similarly, for a disease d_j, one feature vector corresponds to its known associations with all drugs in the drug set R, and another corresponds to its similarity to all diseases in the disease set D. These two vectors are concatenated to form the feature vector centered on disease d_j, expressed as:
Figure PCTCN2022104668-appb-000042
where A_{:,j} is the j-th column of the adjacency matrix A and represents the set of drugs associated with disease d_j;
Figure PCTCN2022104668-appb-000043
is the j-th column of the disease similarity matrix and represents the similarity between disease d_j and the other diseases; the length of
Figure PCTCN2022104668-appb-000044
is then n+m.
The feature vector of the "drug-disease" association (r_i, d_j) is therefore formed by concatenating the feature vector centered on drug r_i
Figure PCTCN2022104668-appb-000045
and the feature vector centered on disease d_j
Figure PCTCN2022104668-appb-000046
and is expressed as:
Figure PCTCN2022104668-appb-000047
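The concatenation described above can be sketched as follows, assuming n drugs and m diseases, with A stored as an n×m nested list and S_r, S_d as square nested lists. The function name and the toy matrices are hypothetical; the order of the four segments follows the description above.

```python
from typing import List

Matrix = List[List[float]]


def pair_feature(A: Matrix, S_r: Matrix, S_d: Matrix, i: int, j: int) -> List[float]:
    """Feature vector for the pair (r_i, d_j), of length 2 * (n + m)."""
    # Drug-centered part: row i of A (m entries) + row i of S_r (n entries).
    drug_part = list(A[i]) + list(S_r[i])
    # Disease-centered part: column j of A (n entries) + column j of S_d (m entries).
    disease_part = [row[j] for row in A] + [row[j] for row in S_d]
    return drug_part + disease_part


# Hypothetical toy matrices: n = 2 drugs, m = 3 diseases.
A = [[1, 0, 1], [0, 1, 0]]
S_r = [[1.0, 0.3], [0.3, 1.0]]
S_d = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]

F_01 = pair_feature(A, S_r, S_d, 0, 1)
print(len(F_01))  # 2 * (2 + 3)
```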
(2) Training of the deep neural network model
The layers of a deep neural network fall into three parts: the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The deep neural network model of the present invention is a fully connected neural network (Fig. 5), that is, every neuron in the current layer is connected to every neuron in the previous layer, and low-level features are combined into more abstract high-level representations of attribute categories or features. The present invention treats the prediction of "drug-disease" associations as a binary classification problem and builds the fully connected neural network with a classic tower structure. The input layer receives the feature vector F_ij generated above, with 2*(n+m) neurons in total; the output layer contains two neurons, which give the probabilities that the test sample belongs to the "true" class and the "false" class, respectively;
For the set F of "drug-disease" association feature vectors, a "negative" sample set is randomly generated at a 1:1 ratio, and the training, test and validation sets are generated at a ratio of 6:2:2;
The weight between the i-th neuron in layer l and the j-th neuron in layer l−1 is denoted
Figure PCTCN2022104668-appb-000048
The training set is used to find the optimal weight set w := {w_l}, l = 1→L, of the L-layer neural network such that the cross-entropy is minimized:
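The cross-entropy expression itself does not appear in the text at this point. A common form for a two-class output, given here as an assumption rather than as the formula of the invention, is shown below; N (the number of training samples), y_k (the 0/1 label of sample k) and ŷ_k(w) (the predicted probability of the "true" class under weights w) are symbols introduced only for this illustration.

```latex
\mathcal{L}(w) = -\frac{1}{N}\sum_{k=1}^{N}
  \Bigl[\, y_k \log \hat{y}_k(w) + (1 - y_k)\,\log\bigl(1 - \hat{y}_k(w)\bigr) \Bigr]
```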
Parameters are learned by stochastic gradient descent, the Mini-Batch method is used to speed up learning, and the Dropout method is used to avoid overfitting. The prediction results of the method of the present invention are compared with those of three classifiers, Logistic Regression, Support Vector Machine and Random Forest; ten-fold cross-validation is used to evaluate model performance, and the number of hidden layers is tuned according to the evaluation metrics (Precision, Recall, F1-measure and AUC) to optimize the model;
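The four evaluation metrics named above can be computed from labels and predictions as in the following sketch. It assumes binary labels (1 for "true", 0 for "false"), hard predictions for Precision, Recall and F1, and real-valued scores for AUC, which is computed here by pairwise comparison of positive and negative scores; the function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative."""
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions and scores.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
scores = [0.9, 0.4, 0.3, 0.6, 0.8]
print(precision_recall_f1(y_true, y_pred))
print(auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for illustration; a rank-based computation would be preferable on large test sets.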
(3) The model optimized in step (2) is tested and validated with the test set and the validation set, and the predictions of the tested and validated fully connected neural network model are intersected with the predictions of the existing drug repositioning model to obtain the final predicted drugs.
In the existing drug repositioning model, the degree of association between drug r and disease d is expressed as:
Figure PCTCN2022104668-appb-000049
This model is built on the hypothesis that "the gene expression profile of a disease should be inversely correlated with the gene expression profile of a drug that can treat it". Assoc_{r,d} takes values in the interval [0,1]; the closer it is to 1, the more likely the drug is to treat the disease. For a specific disease, the degree of association Assoc between the disease and every drug is calculated in turn, and a threshold is set to screen out a list L_r of potential therapeutic drugs for that disease.
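The threshold screening and the intersection of the two prediction lists can be sketched as below. The threshold value, the use of "greater than or equal to" as the cut-off rule, the drug names and the function names are all assumptions made for illustration; the source does not specify them.

```python
from typing import Dict, List


def screen_candidates(assoc: Dict[str, float], threshold: float) -> List[str]:
    """Drugs whose association score reaches the threshold, best first."""
    ranked = sorted(assoc.items(), key=lambda item: item[1], reverse=True)
    return [drug for drug, score in ranked if score >= threshold]


def final_candidates(existing: List[str], dnn: List[str]) -> List[str]:
    """Intersection of both prediction lists, in the order of the first list."""
    dnn_set = set(dnn)
    return [drug for drug in existing if drug in dnn_set]


# Hypothetical association scores for one disease.
assoc_scores = {"drug_1": 0.91, "drug_2": 0.42, "drug_3": 0.77, "drug_4": 0.80}
L_r = screen_candidates(assoc_scores, threshold=0.75)

# Hypothetical predictions from the neural network model.
L_dnn = ["drug_3", "drug_1", "drug_5"]
print(final_candidates(L_r, L_dnn))
```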
Example 3
Based on a heterogeneous association analysis of a database of drugs in routine clinical use for ventricular arrhythmia and a ventricular arrhythmia disease database, a heterogeneous association network deep learning model was established, which identified a therapeutic effect on ventricular arrhythmia of drug B, a drug used to treat other diseases. The details are as follows:
(1) Establishing the anti-arrhythmic drug similarity matrix
On the basis of the drug-drug similarity established from transcriptome data, four further types of association are added: chemical structure similarity, interaction similarity, protein target similarity and side-effect similarity.
Set of drug similarity matrices: following the similarity calculation methods in Example 2, the drug transcriptome similarity matrix
Figure PCTCN2022104668-appb-000050
the drug chemical structure similarity matrix
Figure PCTCN2022104668-appb-000051
the drug interaction similarity matrix
Figure PCTCN2022104668-appb-000052
the drug protein target similarity matrix
Figure PCTCN2022104668-appb-000053
and the drug side-effect similarity matrix
Figure PCTCN2022104668-appb-000054
are constructed separately.
Fusion of the drug similarity matrices: the multiple similarity matrices are fused into a single matrix, which makes effective use of the information in each type of similarity and reduces the complexity of later computation. Because the element means of the five drug similarity matrices differ, quantile normalization is applied to the five matrices, which are then averaged to give the final drug similarity matrix S_r.
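The fusion step can be sketched in plain Python as follows. The sketch assumes one particular variant of quantile normalization: each matrix is flattened, the reference distribution is the rank-wise mean of the sorted values across matrices, and tied values within a matrix receive the mean of the reference values over their ranks, so that symmetric inputs stay symmetric. The source does not specify the variant, and the function names and toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def quantile_normalize(matrices: List[Matrix]) -> List[Matrix]:
    """Map every matrix onto the rank-wise mean distribution of all matrices."""
    flats = [[v for row in mat for v in row] for mat in matrices]
    size = len(flats[0])
    if any(len(flat) != size for flat in flats):
        raise ValueError("all matrices must have the same shape")
    reference = [sum(col) / len(col) for col in zip(*(sorted(f) for f in flats))]
    n_cols = len(matrices[0][0])
    normalized = []
    for flat in flats:
        order = sorted(range(size), key=lambda idx: flat[idx])
        out = [0.0] * size
        rank = 0
        while rank < size:
            end = rank
            # Extend the group over all positions tied with the current value.
            while end + 1 < size and flat[order[end + 1]] == flat[order[rank]]:
                end += 1
            tied_mean = sum(reference[rank:end + 1]) / (end - rank + 1)
            for pos in range(rank, end + 1):
                out[order[pos]] = tied_mean
            rank = end + 1
        normalized.append([out[k:k + n_cols] for k in range(0, size, n_cols)])
    return normalized


def fuse(matrices: List[Matrix]) -> Matrix:
    """Quantile-normalize the matrices, then average them element-wise."""
    normalized = quantile_normalize(matrices)
    rows, cols = len(normalized[0]), len(normalized[0][0])
    return [[sum(mat[i][j] for mat in normalized) / len(normalized)
             for j in range(cols)] for i in range(rows)]


# Hypothetical 2x2 similarity matrices with different value ranges.
S_chem = [[1.0, 0.2], [0.2, 1.0]]
S_side = [[1.0, 0.6], [0.6, 1.0]]
S_r = fuse([S_chem, S_side])
print(S_r)
```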
(2) Establishing the ventricular arrhythmia disease similarity matrix
On the basis of the similarity among ventricular arrhythmia diseases established from transcriptome data, two further types of association are added: phenotype similarity and ontology similarity.
Set of disease similarity matrices: following the similarity calculation methods in Example 2, the disease transcriptome similarity matrix
Figure PCTCN2022104668-appb-000055
the disease phenotype similarity matrix
Figure PCTCN2022104668-appb-000056
and the disease ontology similarity matrix
Figure PCTCN2022104668-appb-000057
are constructed separately.
Fusion of the disease similarity matrices: similarly, quantile normalization is applied to the three matrices, which are then averaged to give the final disease similarity matrix S_d.
(3) Establishing the "anti-ventricular-arrhythmia drug-ventricular arrhythmia disease" heterogeneous association network and mining a drug repositioning model on the basis of this network.
The drug-centered feature vector
Figure PCTCN2022104668-appb-000058
and the disease-centered feature vector
Figure PCTCN2022104668-appb-000059
are extracted to form the feature vector F_ij of the "drug-disease" association (r_i, d_j). The prediction of "anti-ventricular-arrhythmia drug-ventricular arrhythmia disease" associations is treated as a binary classification problem. The input layer of the fully connected neural network receives the previously generated feature vector F_ij, with 2*(n+m) neurons in total; the output layer contains two neurons, which give the probabilities that the test sample belongs to the "true" class and the "false" class, respectively. Parameters are learned by stochastic gradient descent, the Mini-Batch technique is used to speed up learning, and the Dropout technique is used to avoid overfitting.
For the feature vector set F, the present invention randomly generates a "negative" sample set at a 1:1 ratio. The training, test and validation sets are generated at a ratio of 6:2:2. The prediction results of the model of the present invention are compared with those of three classifiers, Logistic Regression, Support Vector Machine and Random Forest, and ten-fold cross-validation is used to evaluate model performance. The list of anti-ventricular-arrhythmia drugs predicted by the original drug repositioning model is denoted L_old, and the list predicted by the drug repositioning model of the present invention is denoted L_new. The intersection of the two sets, L = L_old ∩ L_new, is taken as the final predicted drugs (lead compounds).
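The 1:1 negative sampling and the 6:2:2 split described above can be sketched as follows. The sketch assumes that "negative" samples are drawn at random from drug-disease pairs with no known association (entries of A equal to 0), which is one plausible reading of the text; the function names, the fixed random seed and the toy adjacency matrix are hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[int, int]


def sample_negatives(A: List[List[int]],
                     rng: random.Random) -> Tuple[List[Pair], List[Pair]]:
    """Return known pairs and an equal number of randomly drawn unknown pairs."""
    positives = [(i, j) for i, row in enumerate(A)
                 for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(A)
               for j, v in enumerate(row) if v == 0]
    if len(unknown) < len(positives):
        raise ValueError("not enough unknown pairs for 1:1 sampling")
    return positives, rng.sample(unknown, len(positives))


def split_622(samples: list, rng: random.Random) -> Tuple[list, list, list]:
    """Shuffle and split into training, test and validation sets (6:2:2)."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_test = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


# Hypothetical adjacency matrix: 4 drugs, 5 diseases, 5 known associations.
A = [[1, 0, 0, 1, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1]]
rng = random.Random(0)
positives, negatives = sample_negatives(A, rng)
labeled = [(p, 1) for p in positives] + [(q, 0) for q in negatives]
train, test, val = split_622(labeled, rng)
print(len(train), len(test), len(val))
```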
(4) Animal experiments on the predicted drugs: Using the patch clamp technique, changes in the resting potential (RP) and in the amplitude and duration of the action potential (AP) of isolated rat ventricular myocytes are measured before and after drug application, to evaluate whether the drug has an anti-ventricular-arrhythmia effect and to verify the machine learning predictions.
The specific process is as follows: the isolated rat heart is suspended from a Langendorff perfusion apparatus via the aorta and perfused with calcium-free Tyrode's solution and then enzyme solution; once the ventricular tissue has swelled, single ventricular myocytes are isolated. RP and AP are recorded with the patch clamp technique in current-clamp mode. Animal experiment data are expressed as mean ± standard deviation (x̄ ± s) and analyzed with SPSS statistical software; P<0.05 indicates a statistically significant difference, and drugs showing a statistically significant difference can serve as candidate drugs.
As shown in Figure 6, drug A increases the resting potential of isolated ventricular myocytes in a dose-dependent manner and shortens the terminal repolarization time of the action potential. The results show that the effect of drug B on the action potential is consistent with the characteristics of an IK1 agonist. Moderately enhancing IK1, and thereby increasing or restoring the resting potential, counteracts ischemic arrhythmia through the following mechanisms: ① a more negative resting potential reverses the membrane depolarization caused by pathological factors and reduces cell excitability; ② increased membrane conductance reduces abnormal fluctuations in membrane potential caused by changes in membrane current and increases the electrical stability of the membrane; ③ appropriately shortening the action potential duration (APD) helps prevent early afterdepolarizations (EAD) and the triggered arrhythmias they cause.
(4) Clinical trials of candidate drugs: With the approval of the medical ethics committee and the informed consent of patients, a randomized controlled clinical trial is conducted. The efficacy of the new drug is evaluated by changes in indicators such as total effective rate, improvement in ambulatory electrocardiogram, blood pressure, heart rate, electrocardiogram, blood lipids, blood glucose, and adverse reactions (in this example the animal experiment results were positive; the clinical trial has not yet been carried out).
Example 4
Heterogeneous association analysis was performed on a clinical medication database for Alzheimer's disease (AD), an AD multi-omics database, a clinical medical record text database, and other sources; a heterogeneous association network deep learning model was built; and a novel compound C was found to have a therapeutic effect on AD, as follows:
(1)-(3) Same as in Example 3: a similarity matrix of AD therapeutic drugs and similarity matrices of AD multi-omics data and clinical medical record text are established, an "AD therapeutic drug - AD" heterogeneous association network is built, and the drug repositioning model is mined from that network. After seven final predicted drugs (lead compounds) were identified, compound C was selected for experimental verification by means such as literature research and pharmacophore analysis.
(4) Mouse behavioral experiments
Eight-month-old APP/PS1 mice and their wild-type (WT) littermates were used. According to genotype and drug treatment, the mice were randomly divided into four groups: wild-type control (WT+Vehicle), wild-type treated (WT+C), APP/PS1 control (APP/PS1+Vehicle), and APP/PS1 treated (APP/PS1+C).
Starting 30 days before the behavioral experiments, the animals in each group received intraperitoneal injections of compound C (2 mg/kg) or an equal volume of solvent (Vehicle), and injections continued until the end of the behavioral experiments. Three behavioral tests began on day 31 of drug treatment: the open field test, the novel object recognition test, and the Morris water maze test.
Experimental results:
① Compared with WT+Vehicle mice, APP/PS1+Vehicle mice showed a significantly greater total distance traveled in the open field (P<0.05), and their percentage of time in the central area tended to increase. After compound C treatment, the total distance traveled by APP/PS1 mice decreased significantly (P<0.001), and the percentage of time in the central area also tended to decrease. This indicates that compound C markedly reduces the hyperactivity of APP/PS1 mice in the open field; see Figure 7.
② In the novel object recognition test, the four groups did not differ significantly in the time spent exploring two identical objects during the familiarization phase (P>0.05). In the test phase, the novel object recognition index of APP/PS1+Vehicle mice was significantly lower than that of WT+Vehicle mice (P<0.01), and compound C treatment significantly raised the index in APP/PS1+C mice (P<0.05). Compound C treatment also significantly raised the novel object recognition index of wild-type mice (P<0.001). This indicates that compound C improves short-term recognition memory in mice; see Figure 8.
③ In the place navigation trials of the water maze, the escape latency of APP/PS1+Vehicle mice was significantly longer than that of WT+Vehicle mice on day 4 (P<0.001) and day 5 (P<0.05), and it was significantly shortened after compound C treatment (P<0.05). In the probe trial, APP/PS1+Vehicle mice showed a significantly lower percentage of time in the target quadrant (P<0.05) and significantly fewer platform crossings (P<0.01) than WT+Vehicle mice, and both measures increased significantly after compound C treatment (P<0.05). Swimming speed did not differ significantly among the groups during the water maze test (P>0.05). This indicates that compound C enhances spatial learning and reference memory in APP/PS1 mice; see Figure 9.
Example 5
Heterogeneous association analysis was performed on a clinical medication database for viral hepatitis B (hepatitis B for short), a hepatitis B multi-omics database, a clinical medical record text database, and other sources; a heterogeneous association network deep learning model was built; and ellipticine, a compound that occurs naturally in the leaves of Ochrosia elliptica (family Apocynaceae), was found to have a therapeutic effect on hepatitis B, as follows:
(1)-(3) Same as in Example 3: a similarity matrix of hepatitis B therapeutic drugs and similarity matrices of hepatitis B multi-omics data and clinical medical record text are established, a "hepatitis B therapeutic drug - hepatitis B" heterogeneous association network is built, and the drug repositioning model is mined from that network. After more than ten final predicted drugs (lead compounds) were identified, ellipticine and camptothecin were selected for experimental verification by means such as literature research and pharmacophore analysis (for the prediction and verification of camptothecin, see Example 6).
(4) Culture of HepG2.2.15 cells
① Thawing of HepG2.2.15 cells
Add 7 ml of DMEM containing 10% fetal bovine serum (FBS) and penicillin-streptomycin (PS) to a 10 ml glass centrifuge tube in advance. Take the frozen cells out of the liquid nitrogen tank, place them immediately in a 37.0°C water bath, and shake continuously for 2-3 min so that they thaw quickly. Once fully thawed, take out the cryovial, disinfect it with a 75% alcohol cotton ball, open it, draw up the cell suspension with a pipette, and transfer it to the prepared centrifuge tube. Mix and centrifuge at low speed (800 rpm, 5 min). Discard the supernatant, add another 7 ml of the above medium, resuspend the cells by pipetting, and centrifuge again (800 rpm, 6 min). Discard the supernatant and add the above medium to the centrifuge tube (5 ml, since a 25 cm² culture flask is used). Pipette until all cells are resuspended, transfer the cell-containing medium to the culture flask, and pipette repeatedly to obtain a single-cell suspension. Incubate at 37.0°C in a 5% CO2 incubator. The next day, check cell growth, replace the medium once, and continue incubating at 37.0°C in the 5% CO2 incubator.
② Subculture of HepG2.2.15 cells
Discard the old medium from the culture flask and rinse twice with PBS (pH 7.2). Add 1 ml of 0.25% trypsin and gently rock the flask so that the trypsin covers all cell surfaces. Place the flask in a 37.0°C incubator and digest for 2-3 min. Observe the flask under an inverted microscope; as soon as the cytoplasm is seen to retract and the gaps between cells widen, stop digestion by immediately adding 5-10 ml of DMEM containing 10% FBS and PS. Draw up the medium with a pipette and repeatedly flush the cells from the flask wall until all cells are suspended as a single-cell suspension. Count the cells with a counting chamber and seed them into new culture flasks at 5×10⁵ cells per 12.5 cm². Incubate at 37.0°C in a 5% CO2 incubator.
③ Cryopreservation of cells
Select one flask of cells in good condition and in the logarithmic growth phase. Digest the monolayer with 0.25% trypsin (rinse with 1 ml of trypsin and discard it, then add another 1 ml of trypsin; 37.0°C, 2-3 min). Add 5 ml of DMEM containing 10% FBS and PS to the flask, suspend the cells and pipette them into single cells, transfer to a centrifuge tube, and centrifuge (800 rpm, 5 min). Discard the supernatant, add prepared DMEM containing 10% DMSO and 20% FBS to the centrifuge tube, and pipette several times to distribute the cells evenly. Dispense the cell suspension into cryovials, 1.5 ml per vial, and label each vial with the cell name, storage date, and medium used. Hold at 4°C for 30 min, at -20°C for 1.5-2 h, and at -70°C for 12 h, then transfer to a liquid nitrogen tank for storage.
④ Seeding cells into culture plates
Take one flask of HepG2.2.15 cells in good condition and in the logarithmic growth phase, add 1 ml of 0.25% trypsin, gently rock the flask, and pour off the trypsin. Add another 1-2 ml of 0.25% trypsin, digest at 37.0°C for 2-3 min, and gently pour off the trypsin. Add 3 ml of DMEM containing 10% FBS without PS and pipette repeatedly to obtain a single-cell suspension. After confirming a single-cell suspension, add an appropriate amount of the same DMEM, count the cells, and adjust the density to 2×10⁵/ml. Add 500 μl of cell suspension to each well of a 24-well culture plate (1×10⁵ cells per well). Incubate overnight at 37.0°C in a 5% CO2 incubator for use the next day.
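The seeding arithmetic in step ④ can be checked with a short calculation. The function names and the counted density used in the second call are hypothetical; only the 2×10⁵/ml target density and the 500 μl well volume come from the text.

```python
def cells_per_well(density_per_ml, volume_ul):
    """Number of cells delivered to one well."""
    return density_per_ml * volume_ul / 1000.0

def final_volume_ml(counted_density_per_ml, current_volume_ml, target_density_per_ml):
    """Total volume the suspension must be brought to in order to reach the target density."""
    return counted_density_per_ml * current_volume_ml / target_density_per_ml

per_well = cells_per_well(2e5, 500)          # 500 μl at 2×10⁵ cells/ml
# Hypothetical count: 3 ml of suspension measured at 8×10⁵ cells/ml.
dilute_to = final_volume_ml(8e5, 3.0, 2e5)
```

With these numbers each well receives 1×10⁵ cells, matching the figure stated in the protocol.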
(5) Cytotoxicity of the drug measured by CCK-8
① Ellipticine treatment
Dissolve ellipticine in a small amount of DMSO, dilute the stock solution with maintenance medium to the highest test concentration specified in the experimental design (0.01 μmol/L), sterilize by filtration through a 0.2 μm disposable syringe filter, and aliquot for use. Prepare two-fold serial dilutions in Eppendorf (EP) tubes, with the first tube holding the highest concentration. Add a fixed volume of maintenance medium to each tube from the second onward, then transfer an equal volume of drug solution from the first tube to the second, mix by pipetting, discard the tip, and with a fresh tip transfer the same volume from the second tube to the third, and so on up to the second-to-last tube. The last tube contains maintenance medium only, with no drug. The drug is thus diluted to four concentrations: A, B, C, and D.
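The concentrations produced by a two-fold dilution series can be listed with a small helper. This is only a sketch: the function name is hypothetical, and the assumption is that the four test levels start at the stated top concentration of 0.01 μmol/L and halve at each step.

```python
import math

def serial_dilution(top_conc, n_levels, factor=2.0):
    """Concentrations of a dilution series, highest first."""
    return [top_conc / factor ** i for i in range(n_levels)]

# Four levels starting from 0.01 μmol/L (units: μmol/L).
concentrations = serial_dilution(0.01, 4)
```

Because each transfer mixes equal volumes of drug solution and maintenance medium, every tube holds half the concentration of the one before it.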
② Adding drug-containing medium to the culture plate for the cytotoxicity assay
Seed HepG2.2.15 cells into a 96-well culture plate at 8×10⁴/ml, 0.1 ml per well. Once the cells have grown into a monolayer, remove the medium completely and add ellipticine at concentrations A, B, C, and D to the corresponding wells, with three replicate wells per concentration. Set up a blank control group (medium only, no cells) and a normal cell control group (no drug, normal medium only). After adding the drug, incubate at 37.0°C in a 5% CO2 incubator for 72 h. Then take out the plate, add 10 μl of CCK-8 solution to each well, and return it to the incubator for 2 h. Measure the absorbance at 450 nm with a microplate reader and print the results for data processing and analysis.
Cell survival rate = (A value of experimental group − A value of blank control group) / (A value of normal cell control group − A value of blank control group) × 100%. Half cytotoxic concentration (CC50) = Antilog[log B + (50 − inhibition percentage at A) / (inhibition percentage at A − inhibition percentage at B) × C], where A is the drug concentration giving ≥50% inhibition, B is the drug concentration giving ≤50% inhibition, and C = log(dilution factor). The maximum non-toxic concentration of ellipticine is calculated with computer software.
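The cell survival rate formula can be applied to replicate-well readings as in the sketch below. The absorbance values are invented for illustration, and averaging the three replicate wells before applying the formula is an assumption.

```python
def mean(values):
    return sum(values) / len(values)

def survival_rate(a_experimental, a_blank, a_normal):
    """Cell survival rate (%) from absorbance (A) values at 450 nm."""
    return (a_experimental - a_blank) / (a_normal - a_blank) * 100.0

# Hypothetical readings from three replicate wells per group.
blank = mean([0.10, 0.10, 0.10])      # medium only, no cells
normal = mean([1.10, 1.10, 1.10])     # untreated cells
treated = mean([0.85, 0.85, 0.85])    # cells at one drug concentration

rate = survival_rate(treated, blank, normal)
```

For these readings the survival rate is about 75%; subtracting the blank from both numerator and denominator removes the background absorbance of the medium.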
(6) Inhibition of HBsAg and HBeAg by ellipticine
① Treatment with ellipticine at non-cytotoxic doses
The CCK-8 assay above gave a non-cytotoxic ellipticine concentration of 0.01 μmol/L. The 0.01 μmol/L solution is serially diluted two-fold with maintenance medium to 0.005 μmol/L, 0.0025 μmol/L, and so on (0.01 μmol/L itself is also tested). The drug-containing media at the different concentrations are added to the same 24-well culture plate, with three replicate wells per concentration, together with a blank control group and a normal cell control group, so that the inhibition rates of the same drug against HBsAg and HBeAg at different concentrations and the dose-effect relationship can be observed. After the drug-containing medium is added, the plate is incubated for 72 h, and the cell supernatant is then collected and stored at -20°C until testing.
② Detection of HBsAg and HBeAg by ELISA (following the ELISA kit instructions)
Take the ELISA kit out of the 4°C refrigerator together with the cell culture supernatants to be tested, and equilibrate at room temperature for 1 hour. Wash buffer: add double-distilled water to 25 ml of wash solution to a final volume of 500 ml, mix thoroughly, and set aside. Add 50 μl of test sample, negative control, or positive control to the corresponding wells (two wells each for the negative and positive controls), then add 50 μl of enzyme conjugate to each well. Set up two blank control wells with neither sample nor enzyme conjugate. Mix thoroughly, apply a plate sealer, and incubate at 37.0°C for 30 minutes. Discard the well contents, pat the plate dry on filter paper, fill the wells with wash buffer, let stand for 10-20 seconds, shake off the wash buffer, and pat dry; wash 5 times in total. Color development: add 50 μl of chromogen A to every well, then 50 μl of chromogen B, apply a plate sealer, and incubate promptly at 37.0°C in the dark for 15 min. Stop: add 50 μl of stop solution to each well. Reading: read within 10 minutes of stopping; zero against the blank control and measure the absorbance (A) at 450 nm with a microplate reader. Results are expressed as the ratio of the positive well A value to the negative well A value (P/N).
The inhibition rate of ellipticine on HBsAg and HBeAg expression is calculated as:
Inhibition rate = (P/N value of experimental well − P/N value of control well) / (P/N value of control group − 2.1) × 100%
(7) Inhibition of HBV DNA by ellipticine (performed strictly according to the DNA detection kit instructions)
Drug treatment is the same as in the HBsAg and HBeAg inhibition experiments. After the cells have been cultured with the different drug concentrations for 72 h, the HBV DNA load in the cell supernatant is measured by real-time fluorescent quantitative PCR, and the inhibitory effect of each drug concentration on HBV DNA is observed.
(8) The data were analyzed with SPSS 22.0 using analysis of variance; P<0.05 was considered statistically significant.
(9) Experimental results
HepG2.2.15 cells were cultured for 72 h with drug-containing serum at non-toxic doses, and the HBsAg, HBeAg, and HBV DNA contents of the cell supernatant were then measured. The results are as follows:
Table 1 Inhibitory effect of ellipticine on HBsAg and HBeAg secreted by HepG2.2.15 cells (n=3,
Figure PCTCN2022104668-appb-000060
)
Figure PCTCN2022104668-appb-000061
*P<0.05 compared with the cell control group
Table 2 Inhibitory effect of ellipticine on HBV DNA (1×10⁶ IU/ml) in HepG2.2.15 culture supernatant (n=3,
Figure PCTCN2022104668-appb-000062
)
Figure PCTCN2022104668-appb-000063
*P<0.05 compared with the cell control group
The data above show that, at the maximum non-toxic dose, ellipticine clearly inhibits both HBsAg and HBeAg, and the inhibition strengthens as the concentration increases, i.e., it is concentration-dependent. Ellipticine also inhibits HBV DNA, again in a concentration-dependent manner.
Example 6
Heterogeneous association analysis was performed on a clinical medication database for hepatitis B, a hepatitis B multi-omics database, a clinical medical record text database, and other sources; a heterogeneous association network deep learning model was built; and camptothecin, a compound that occurs naturally in Camptotheca acuminata, was found to have a therapeutic effect on hepatitis B, as follows:
(1)-(3) Same as in Example 3: a similarity matrix of hepatitis B therapeutic drugs and similarity matrices of hepatitis B multi-omics data and clinical medical record text are established, a "hepatitis B therapeutic drug - hepatitis B" heterogeneous association network is built, and the drug repositioning model is mined from that network. After more than ten final predicted drugs (lead compounds) were identified, camptothecin was selected for experimental verification by means such as literature research and pharmacophore analysis.
(4) Culture of HepG2.2.15 cells (same as in Example 5)
(5) Cytotoxicity of the drug measured by CCK-8 (same as in Example 5)
(6) Inhibition of HBsAg and HBeAg by camptothecin (same as in Example 5)
(7) Inhibition of HBV DNA by camptothecin (same as in Example 5)
(8) Statistical analysis (same as in Example 5)
(9) Experimental results
HepG2.2.15 cells were cultured for 72 h with drug-containing serum at non-toxic doses, and the HBsAg, HBeAg, and HBV DNA contents of the cell supernatant were then measured. The results are as follows:
Table 3 Inhibitory effect of camptothecin on HBsAg and HBeAg secreted by HepG2.2.15 cells (n=3,
Figure PCTCN2022104668-appb-000064
)
Figure PCTCN2022104668-appb-000065
*P<0.05 compared with the cell control group
Table 4 Inhibitory effect of camptothecin on HBV DNA (1×10⁶ IU/ml) in HepG2.2.15 culture supernatant (n=3,
Figure PCTCN2022104668-appb-000066
)
Figure PCTCN2022104668-appb-000067
Figure PCTCN2022104668-appb-000068
*P<0.05 compared with the cell control group
The data above show that, at the maximum non-toxic dose, camptothecin clearly inhibits both HBsAg and HBeAg, and the inhibition strengthens as the concentration increases, i.e., it is concentration-dependent. Camptothecin also inhibits HBV DNA, again in a concentration-dependent manner.
The experimental results of Examples 3-6 show that the system and method of the present invention can greatly improve the efficiency, accuracy, and specificity of drug research and development; provide scientific support for discovering new drug indications and managing the drug development cycle; and help raise the level of clinical diagnosis and treatment. They enable deep mining of clinical and multi-omics data in the service of biomedicine; can promote the generation of scientific hypotheses in the clinical field, speed up research on new diagnosis and treatment regimens, and advance related disciplines such as clinical pharmacy and molecular biology; and can accelerate the industrialization of drug development, thereby creating considerable market value and promoting economic growth.

Claims (10)

  1. A drug repositioning system based on heterogeneous association network deep learning, characterized in that it comprises a prediction tool module, an experimental verification module, and an external service module; wherein the prediction tool module mainly uses the Python programming language to connect to and operate on an EMR database, and specifically, on the basis of known "drug-disease" associations, incorporates drug similarity and disease similarity information to build a "drug-disease" heterogeneous network and uses a deep neural network algorithm from deep learning, or a modified geometric mean, to predict potential "drug-disease" associations and thereby achieve drug repositioning; the experimental verification module is connected to the prediction tool module and mainly integrates the hardware and research protocols for in vivo or ex vivo animal experiments and clinical pharmacology trials into a standardized test procedure for drug repositioning results, the procedure being suitable for general morphological, molecular biological, behavioral, and multi-omics research; the external service module mainly comprises a data processing and analysis submodule, a code and solution presentation submodule, and a training and exchange submodule, wherein the data processing and analysis submodule provides a solution based on the raw data and analysis goals uploaded by a registered user and returns it to the user in a timely manner, the code and solution presentation submodule discloses part of the code and solutions to users, and the training and exchange submodule supports training and exchange with peers;
    the experimental verification module can further apply specific treatment factors to the experimental subjects according to the research purpose, control the influence of non-treatment factors, observe and evaluate the experimental effects, answer the research hypothesis, and verify the results screened by the prediction tool module.
  2. A drug repositioning method based on heterogeneous association network deep learning, characterized in that it comprises the following steps:
    Step 1, construction of a drug similarity matrix;
    Step 2, construction of a disease similarity matrix;
    Step 3, construction of a "drug-disease" heterogeneous association network;
    Step 4, prediction of potential "drug-disease" associations, i.e., drug repositioning.
  3. The drug repositioning method based on heterogeneous association network deep learning according to claim 2, characterized in that the drug similarity matrix in step 1 is constructed as follows: according to data completeness and availability, four types of drug attribute information are selected, namely chemical structure, target protein sequence, interactions, and side effects; a drug similarity matrix is established for each type of attribute, namely a chemical-structure-based, a target-protein-sequence-based, an interaction-based, and a side-effect-based drug similarity matrix; these attribute-based drug similarity matrices are then fused with an EMR-based drug similarity matrix to form the drug similarity matrix; and the disease similarity matrix in step 2 is constructed as follows: according to data completeness and availability, two types of disease information are selected, namely ontology and phenotype; an ontology-based disease similarity matrix and a phenotype-based disease similarity matrix are established; these two matrices are then fused with an EMR-based disease similarity matrix to form the disease similarity matrix.
  4. The drug repositioning method based on heterogeneous association network deep learning according to claim 2, characterized in that the "drug-disease" heterogeneous association network in step 3 is constructed as follows: using the "drug-disease" adjacency matrix as a bridge, the drug similarity matrix constructed in step 1 and the disease similarity matrix constructed in step 2 are combined to form the "drug-disease" heterogeneous association network:
    $H_{r,d} = \{\{R, D\}, \{E_r, E_d, E_{r,d}\}, \{W_r, W_d, W_{r,d}\}\}$
    where $R$ is the set of drug vertices and $D$ is the set of disease vertices; $E_r$, $E_d$, and $E_{r,d}$ are the "drug-drug", "disease-disease", and "drug-disease" edges, respectively; and $W_r$, $W_d$, and $W_{r,d}$ are, respectively, the "drug-drug" similarity values, the "disease-disease" similarity values, and the indicator of whether a therapeutic relationship exists between a drug and a disease.
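The network definition above can be sketched as a single block matrix. This is a minimal illustration, not the patented implementation: the block layout `[[W_r, A], [A^T, W_d]]`, the function name `build_hetero_network`, and the toy values are all assumptions made for the example.

```python
def build_hetero_network(W_r, W_d, A):
    """Assemble the block matrix [[W_r, A], [A^T, W_d]] (assumed layout).

    W_r: n x n drug-drug similarity values
    W_d: m x m disease-disease similarity values
    A:   n x m binary drug-disease adjacency (1 = known therapeutic relation)
    """
    n, m = len(W_r), len(W_d)
    if any(len(row) != n for row in W_r) or any(len(row) != m for row in W_d):
        raise ValueError("similarity matrices must be square")
    if len(A) != n or any(len(row) != m for row in A):
        raise ValueError("A must have shape n x m")
    # Drug rows: similarities to all drugs, then links to all diseases.
    top = [list(W_r[i]) + list(A[i]) for i in range(n)]
    # Disease rows: links to all drugs (column of A), then disease similarities.
    bottom = [[A[i][j] for i in range(n)] + list(W_d[j]) for j in range(m)]
    return top + bottom


# Toy data (hypothetical): 2 drugs, 3 diseases.
W_r = [[1.0, 0.4], [0.4, 1.0]]
W_d = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.5], [0.0, 0.5, 1.0]]
A = [[1, 0, 0], [0, 1, 1]]
H = build_hetero_network(W_r, W_d, A)
```

The result is a symmetric (n + m) x (n + m) matrix in which the adjacency matrix links the two similarity blocks, which is one way to read "adjacency matrix as a bridge".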
  5. The drug repositioning method based on heterogeneous association network deep learning according to claim 2, characterized in that the prediction of potential "drug-disease" associations in step 4, i.e., drug repositioning, proceeds as follows:
    4.1 Feature extraction for "drug-disease" associations:
    4.1.1 The drug-centered feature vector is expressed as:
    Figure PCTCN2022104668-appb-100001
    where $A_{i,1}$ is the i-th row of the "drug-disease" adjacency matrix $A$, representing the set of diseases associated with drug $r_i$;
    Figure PCTCN2022104668-appb-100002
    is the i-th row of the drug similarity matrix, representing the similarity between drug $r_i$ and the other drugs;
    4.1.2 The disease-centered feature vector is expressed as:
    Figure PCTCN2022104668-appb-100003
    where $A_{1,j}$ is the j-th column of the adjacency matrix $A$, representing the set of drugs associated with disease $d_j$;
    Figure PCTCN2022104668-appb-100004
    is the j-th column of the disease similarity matrix, representing the similarity between disease $d_j$ and the other diseases;
    4.1.3 The feature vector of the "drug-disease" association $(r_i, d_j)$ is formed by combining the feature vector centered on drug $r_i$
    Figure PCTCN2022104668-appb-100005
    and the feature vector centered on disease $d_j$
    Figure PCTCN2022104668-appb-100006
    and is expressed as:
    Figure PCTCN2022104668-appb-100007
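As an illustration of step 4.1, the sketch below builds a feature vector for one (drug, disease) pair. The exact formulas are given only in the figures, so the combination rule (plain concatenation), the ordering of the parts, and the name `pair_feature` are assumptions.

```python
def pair_feature(A, S_r, S_d, i, j):
    """Feature vector for the pair (r_i, d_j), built by concatenation (assumed).

    A:   n x m drug-disease adjacency matrix
    S_r: n x n drug similarity matrix
    S_d: m x m disease similarity matrix
    """
    # Drug-centered part: row i of S_r and row i of A.
    drug_part = list(S_r[i]) + list(A[i])
    # Disease-centered part: column j of A and column j of S_d.
    disease_part = [row[j] for row in A] + [row[j] for row in S_d]
    return drug_part + disease_part


# Toy data (hypothetical): 2 drugs, 3 diseases.
A = [[1, 0, 0], [0, 1, 1]]
S_r = [[1.0, 0.4], [0.4, 1.0]]
S_d = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.5], [0.0, 0.5, 1.0]]
F_01 = pair_feature(A, S_r, S_d, 0, 1)
```

Under this assumption every pair yields a vector of length 2(n + m), so all samples share one input dimension for the network in step 4.2.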
    4.2 Training of the deep neural network model
    The deep neural network model is a fully connected neural network, i.e., every neuron in the current layer is connected to every neuron in the previous layer, and low-level features are combined into more abstract high-level representations of attribute categories or features; the prediction of "drug-disease" associations is framed as a binary classification problem, and the fully connected network is built with the classic tower structure; the input layer receives the feature vector $F_{ij}$ generated in step 4.1, and the output layer contains two neurons, which give the probabilities that a test sample is "true" or "false", respectively;
    for the set $F$ of "drug-disease" association feature vectors, a "negative" sample set is randomly generated at a 1:1 ratio, and the training, test, and validation sets are generated at a 6:2:2 ratio;
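The sampling and splitting step can be sketched as follows. The claim does not say how negatives are drawn, so this example assumes they are sampled uniformly from pairs with no known association; the function name `sample_and_split`, the fixed seed, and the toy matrix are likewise illustrative.

```python
import random


def sample_and_split(A, seed=0):
    """Draw 1:1 negatives and split 6:2:2 into train/test/validation."""
    rng = random.Random(seed)
    positives = [(i, j) for i, row in enumerate(A)
                 for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(A)
               for j, v in enumerate(row) if v == 0]
    if len(unknown) < len(positives):
        raise ValueError("not enough unlabeled pairs for 1:1 negative sampling")
    # Assumption: negatives are unlabeled pairs chosen uniformly at random.
    negatives = rng.sample(unknown, len(positives))
    samples = [(p, 1) for p in positives] + [(q, 0) for q in negatives]
    rng.shuffle(samples)
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = len(samples) * 6 // 10
    n_test = len(samples) * 2 // 10
    train = samples[:n_train]
    test = samples[n_train:n_train + n_test]
    val = samples[n_train + n_test:]
    return train, test, val


# Toy adjacency matrix (hypothetical): 5 drugs, 4 diseases, 5 known pairs.
A = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
train, test, val = sample_and_split(A)
```

Unlabeled pairs are not guaranteed to be true negatives, which is a known limitation of this kind of sampling.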
    the weight between the i-th neuron in layer l and the j-th neuron in layer l-1 is denoted
    Figure PCTCN2022104668-appb-100008
    and the training set is used to find the optimal weight set $w := \{w_l\}$, $l = 1 \to L$, of the L-layer neural network that minimizes the cross-entropy;
    the parameters are learned by stochastic gradient descent, with the mini-batch method to speed up learning and the dropout method to prevent overfitting; model performance is assessed by ten-fold cross-validation, and the number of hidden layers is tuned according to the evaluation metrics to optimize the model;
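To make the model in step 4.2 concrete, the sketch below shows only the forward pass of a tower-shaped fully connected network with a two-neuron softmax output, plus the cross-entropy loss for one sample. Training (stochastic gradient descent, mini-batches, dropout) is omitted. The layer widths, the ReLU activation, the initialization range, and the convention that output index 1 means "true" are assumptions not stated in the claim.

```python
import math
import random


def init_tower(input_dim, hidden=(8, 4), n_out=2, seed=0):
    """Create weights for a tower network whose layers narrow toward the output."""
    rng = random.Random(seed)
    sizes = [input_dim, *hidden, n_out]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        W = [[rng.uniform(-0.5, 0.5) for _ in range(fan_in)]
             for _ in range(fan_out)]
        b = [0.0] * fan_out
        layers.append((W, b))
    return layers


def forward(layers, x):
    """Return [P(false), P(true)] for one feature vector x."""
    h = list(x)
    for k, (W, b) in enumerate(layers):
        # Fully connected: every output neuron sees every input neuron.
        z = [sum(w * v for w, v in zip(row, h)) + bias
             for row, bias in zip(W, b)]
        if k < len(layers) - 1:
            h = [max(0.0, v) for v in z]      # ReLU (assumed activation)
        else:
            top = max(z)                      # subtract max for stability
            e = [math.exp(v - top) for v in z]
            total = sum(e)
            h = [v / total for v in e]        # softmax over two classes
    return h


def cross_entropy(probs, label):
    """Negative log-likelihood of the correct class for one sample."""
    return -math.log(max(probs[label], 1e-12))


layers = init_tower(input_dim=10)
probs = forward(layers, [0.1] * 10)
loss = cross_entropy(probs, 1)
```

Averaging `cross_entropy` over a mini-batch gives the quantity that the weight search in the claim is said to minimize.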
    4.3 The model optimized in step 4.2 is tested and validated on the test and validation sets, and the predictions of the tested and validated fully connected neural network model are intersected with the predictions of the existing model to obtain the final predicted drugs.
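The intersection in step 4.3 might look like the sketch below. The claim does not say how each model's scores become a set of predictions, so the 0.5 thresholds, the dictionary layout, and the identifiers `r1`, `d1`, and so on are hypothetical.

```python
def final_candidates(dnn_scores, existing_scores,
                     dnn_threshold=0.5, existing_threshold=0.5):
    """Keep only (drug, disease) pairs predicted by both models.

    Both arguments map (drug, disease) pairs to a score in [0, 1];
    the thresholds are illustrative assumptions.
    """
    dnn_hits = {pair for pair, s in dnn_scores.items() if s >= dnn_threshold}
    existing_hits = {pair for pair, s in existing_scores.items()
                     if s >= existing_threshold}
    return sorted(dnn_hits & existing_hits)


# Hypothetical scores from the two models.
dnn_scores = {("r1", "d1"): 0.91, ("r1", "d2"): 0.30, ("r2", "d1"): 0.75}
existing_scores = {("r1", "d1"): 0.80, ("r1", "d2"): 0.85, ("r2", "d1"): 0.10}
candidates = final_candidates(dnn_scores, existing_scores)
```

Taking the intersection trades recall for precision: a pair survives only if both models support it.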
  6. The drug repositioning method based on heterogeneous association network deep learning according to claim 3, characterized in that the attribute-based drug similarity matrices and the EMR-based drug similarity matrix in step 1 are fused as follows: quantile normalization is applied separately to the similarity values of the chemical-structure-based drug similarity matrix, the target-protein-sequence-based drug similarity matrix, the interaction-based drug similarity matrix, the side-effect-based drug similarity matrix, and the EMR-based drug similarity matrix, and the normalized values are then averaged to form the drug similarity matrix; and the ontology-based disease similarity matrix, the phenotype-based disease similarity matrix, and the EMR-based disease similarity matrix in step 2 are fused as follows: quantile normalization is applied separately to the similarity values of the ontology-based disease similarity matrix, the phenotype-based disease similarity matrix, and the EMR-based disease similarity matrix, and the normalized values are then averaged to form the disease similarity matrix.
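The fusion described in this claim can be sketched as quantile normalization followed by an element-wise mean. The claim does not spell out the normalization, so the sketch assumes the common variant in which every matrix is mapped onto a shared reference distribution (the mean of the k-th smallest values across matrices), with ties broken by position. The function names are illustrative.

```python
def quantile_normalize(matrices):
    """Map the entries of each matrix onto a shared reference distribution.

    Returns one flattened (row-major) list of normalized values per matrix.
    """
    flats = [[v for row in M for v in row] for M in matrices]
    size = len(flats[0])
    if any(len(f) != size for f in flats):
        raise ValueError("all matrices must have the same shape")
    # For each matrix, indices of its entries in ascending order of value.
    orders = [sorted(range(size), key=f.__getitem__) for f in flats]
    # Reference value for rank k: mean of the k-th smallest entries.
    ref = [sum(f[o[k]] for f, o in zip(flats, orders)) / len(flats)
           for k in range(size)]
    normalized = []
    for order in orders:
        g = [0.0] * size
        for rank, idx in enumerate(order):
            g[idx] = ref[rank]
        normalized.append(g)
    return normalized


def fuse(matrices):
    """Quantile-normalize the matrices, then average them element-wise."""
    n_cols = len(matrices[0][0])
    normalized = quantile_normalize(matrices)
    mean = [sum(vals) / len(normalized) for vals in zip(*normalized)]
    return [mean[r:r + n_cols] for r in range(0, len(mean), n_cols)]


# Two toy similarity matrices (hypothetical values) for the same two drugs.
M_chem = [[1.0, 0.2], [0.2, 1.0]]
M_side = [[1.0, 0.6], [0.6, 1.0]]
fused = fuse([M_chem, M_side])
```

Normalizing first puts similarity measures with different value ranges on a comparable scale before they are averaged.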
  7. The drug repositioning method based on heterogeneous association network deep learning according to claim 3, characterized in that the similarities in the drug similarity matrices are calculated as follows:
    the similarity in the chemical-structure-based drug similarity matrix is given by the Tanimoto coefficient:
    Figure PCTCN2022104668-appb-100009
    where $|C_r|$ and $|C_{r'}|$ are the numbers of chemical substructures in drug $r$ and drug $r'$, respectively, and $|C_r \cap C_{r'}|$ is the number of chemical substructures shared by drug $r$ and drug $r'$;
    the similarity in the target-protein-sequence-based drug similarity matrix is:
    Figure PCTCN2022104668-appb-100010
    where $sw(\cdot,\cdot)$ is the Smith-Waterman sequence alignment score;
    the similarity in the interaction-based drug similarity matrix is given by the Jaccard coefficient:
    Figure PCTCN2022104668-appb-100011
    where $I_r$ and $I_{r'}$ are the sets of drugs that interact with drug $r$ and drug $r'$, respectively;
    the similarity in the side-effect-based drug similarity matrix is given by the Jaccard coefficient:
    Figure PCTCN2022104668-appb-100012
    where $E_r$ and $E_{r'}$ are the side-effect sets of drug $r$ and drug $r'$, respectively;
    the similarity in the EMR-based drug similarity matrix is calculated as:
    $Sim_{d,pk} = \max(Q_{d,pk}) - \min(Q_{d,pk})$, where $|Q_{d,pk}| \geq 2$
    where $Q_{d,pk}$ denotes the type-k laboratory test results in inpatient medical record $p$ after drug $d$ is administered, and $Sim_{d,pk}$ is the largest difference among the $Q_{d,pk}$ values;
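Three of the measures in this claim can be sketched with the standard set-based definitions. The claim's own formulas appear only in the figures, so the textbook Tanimoto and Jaccard forms are assumed here; the Smith-Waterman measure is omitted because its normalization is not recoverable from the text. Function names and sample values are illustrative.

```python
def tanimoto(substructs_a, substructs_b):
    """Tanimoto coefficient: shared / (|A| + |B| - shared)."""
    a, b = set(substructs_a), set(substructs_b)
    shared = len(a & b)
    denom = len(a) + len(b) - shared
    return shared / denom if denom else 0.0


def jaccard(set_a, set_b):
    """Jaccard coefficient: |A intersect B| / |A union B|."""
    a, b = set(set_a), set(set_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def emr_range(results):
    """Max minus min of one lab test type within one record after a drug.

    Returns None when fewer than two results exist, mirroring the
    |Q| >= 2 condition in the claim.
    """
    if len(results) < 2:
        return None
    return max(results) - min(results)


# Hypothetical substructure keys, side-effect terms, and lab values.
chem_sim = tanimoto({"c1", "c2", "c3"}, {"c2", "c3", "c4"})
side_sim = jaccard({"nausea", "rash"}, {"rash", "headache"})
lab_delta = emr_range([5.0, 7.5, 6.0])
```

For plain sets the Tanimoto and Jaccard coefficients coincide; they are kept as separate functions only to mirror the wording of the claim.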
  8. The drug repositioning method based on heterogeneous association network deep learning according to claim 3, characterized in that the similarities in the disease similarity matrices are calculated as follows:
    the similarity in the ontology-based disease similarity matrix is calculated as:
    Figure PCTCN2022104668-appb-100013
    where $c(d, d')$ is the number of parent nodes shared by diseases $d$ and $d'$; and $p_x$ is the probability of occurrence of disease $x$, i.e., the ratio of the number of occurrences of disease name $x$ or its child nodes to the total number of disease names;
    the similarity in the phenotype-based disease similarity matrix is given by the cosine coefficient:
    Figure PCTCN2022104668-appb-100014
    where
    Figure PCTCN2022104668-appb-100015
    and
    Figure PCTCN2022104668-appb-100016
    are the frequencies of the i-th MeSH term in the medical descriptions of diseases $d$ and $d'$, respectively;
    the similarity in the EMR-based disease similarity matrix is calculated as:
    Figure PCTCN2022104668-appb-100017
    where $G_d$ and $G_{d'}$ are the feature sets of diseases $d$ and $d'$, respectively.
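The phenotype-based measure can be sketched with the standard cosine similarity over MeSH term frequency vectors. The dictionary representation, the function name, and the example terms are assumptions; the ontology-based and EMR-based formulas are not sketched because they appear only in the figures.

```python
import math


def cosine(freq_d, freq_d2):
    """Cosine similarity between two term -> frequency mappings."""
    terms = set(freq_d) | set(freq_d2)
    dot = sum(freq_d.get(t, 0) * freq_d2.get(t, 0) for t in terms)
    norm = (math.sqrt(sum(v * v for v in freq_d.values()))
            * math.sqrt(sum(v * v for v in freq_d2.values())))
    return dot / norm if norm else 0.0


# Hypothetical MeSH term counts from two disease descriptions.
disease_a = {"Fever": 2, "Cough": 1}
disease_b = {"Fever": 1, "Fatigue": 3}
phenotype_sim = cosine(disease_a, disease_b)
```

Terms absent from a description count as frequency zero, so two descriptions with no shared terms score 0.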
  9. The drug repositioning method based on heterogeneous association network deep learning according to claim 5, characterized in that the evaluation metrics in step 4.2 include Precision, Recall, F1-measure, and AUC.
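For reference, the four metrics named in this claim can be computed as in the sketch below, using their standard definitions. AUC is computed by pairwise comparison of positive and negative scores, with ties counted as one half. The function names and sample labels are illustrative.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions, and scores for four pairs.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch but slow on large test sets.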
  10. The drug repositioning method based on heterogeneous association network deep learning according to claim 5, characterized in that the existing model in step 4.3 is:
    Figure PCTCN2022104668-appb-100018
    where $Assoc_{r,d}$ takes values in the interval [0, 1]; the closer the value is to 1, the more likely it is that the drug treats the disease.
PCT/CN2022/104668 2021-11-03 2022-07-08 Drug repurposing system and method based on heterogeneous association network deep learning WO2023077854A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111296039.9 2021-11-03
CN202111296039.9A CN114038574A (en) 2021-11-03 2021-11-03 Drug relocation system and method based on heterogeneous association network deep learning

Publications (1)

Publication Number Publication Date
WO2023077854A1 true WO2023077854A1 (en) 2023-05-11

Family

ID=80142734

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104668 WO2023077854A1 (en) 2021-11-03 2022-07-08 Drug repurposing system and method based on heterogeneous association network deep learning

Country Status (2)

Country Link
CN (1) CN114038574A (en)
WO (1) WO2023077854A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688110A (en) * 2024-02-02 2024-03-12 山东再起数据科技有限公司 Data blood-margin map construction method for data center
CN117976139A (en) * 2024-03-29 2024-05-03 武汉纺织大学 Drug repositioning method and system based on deviation correcting mechanism and contrast learning

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114038574A (en) * 2021-11-03 2022-02-11 山西医科大学 Drug relocation system and method based on heterogeneous association network deep learning
CN114860886B (en) * 2022-05-25 2023-07-18 北京百度网讯科技有限公司 Method for generating relationship graph and method and device for determining matching relationship
CN117652002A (en) * 2022-05-27 2024-03-05 京东方科技集团股份有限公司 Correlation prediction method and device, and machine learning model training method and device
CN116504331A (en) * 2023-04-28 2023-07-28 东北林业大学 Frequency score prediction method for drug side effects based on multiple modes and multiple tasks

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021360A (en) * 2017-09-30 2019-07-16 山西医科大学 Platform is associated with based on a group disease-drug for data mining
CN110021341A (en) * 2019-02-21 2019-07-16 华东师范大学 A kind of prediction technique of GPCR drug based on heterogeneous network and targeting access
CN110176271A (en) * 2019-03-06 2019-08-27 山西医科大学 Multiple groups disturbance of data cloud
US20210142173A1 (en) * 2019-11-12 2021-05-13 The Cleveland Clinic Foundation Network-based deep learning technology for target identification and drug repurposing
CN113053468A (en) * 2021-05-31 2021-06-29 之江实验室 Drug new indication discovering method and system fusing patient image information
CN114038574A (en) * 2021-11-03 2022-02-11 山西医科大学 Drug relocation system and method based on heterogeneous association network deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Master's Thesis", 18 April 2021, HEILONGJIANG UNIVERSITY, CN, article SONG, YINGYING: "Research on a Drug Relocation Method Based on Network Representation Learning and Deep Learning", pages: 1 - 74, XP009545529, DOI: 10.27123/d.cnki.ghlju.2021.001133 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688110A (en) * 2024-02-02 2024-03-12 山东再起数据科技有限公司 Data blood-margin map construction method for data center
CN117688110B (en) * 2024-02-02 2024-04-26 山东再起数据科技有限公司 Data blood-margin map construction method for data center
CN117976139A (en) * 2024-03-29 2024-05-03 武汉纺织大学 Drug repositioning method and system based on deviation correcting mechanism and contrast learning

Also Published As

Publication number Publication date
CN114038574A (en) 2022-02-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22888889

Country of ref document: EP

Kind code of ref document: A1