CN114242186A - Chinese and western medicine relocation method and system fusing GHP and GCN and storage medium - Google Patents

Publication number
CN114242186A
Authority
CN
China
Prior art keywords
ghp
gcn
disease
chinese
drug
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111658146.1A
Other languages
Chinese (zh)
Other versions
CN114242186B (en)
Inventor
金敏
龚后武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202111658146.1A
Publication of CN114242186A
Application granted
Publication of CN114242186B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Chemical & Material Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a Chinese and Western medicine relocation method, system and storage medium that fuse a global heterogeneous pharmacological network (GHP) with a graph convolutional neural network (GCN). The relocation method comprises the following steps: acquiring multi-source, multi-modal heterogeneous data; constructing the GHP-GCN Chinese and Western medicine relocation model, which specifically comprises constructing the global heterogeneous pharmacological network GHP, fusing it with a graph convolutional neural network to form the GHP-GCN model, and optimizing the GHP-GCN model with the dual weighted constraints of "monarch, minister, assistant and guide" and "node semantic nearest neighbor"; constructing a training set and performing GHP-GCN model learning and inference; calculating contribution degrees and analyzing the results; and verifying and tuning the results. The method can be applied to relocation research on Chinese and Western medicines. It can predict potential and new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identify new drug targets and new disease-associated pathogenic genes, discover and confirm drugs with a potential therapeutic effect on a disease together with their integrated action and synergistic mechanisms, and provide a scientific basis for explaining the effectiveness of Chinese and Western medicine treatment and for drug discovery.

Description

Chinese and western medicine relocation method and system fusing GHP and GCN and storage medium
Technical Field
The invention relates to the technical field of Chinese and western medicine screening, development and relocation, in particular to a Chinese and western medicine relocation method and system and a storage medium for fusing a global heterogeneous pharmacological network GHP and a graph convolution neural network GCN.
Background
New drug development searches for a suitable new drug for a given protein target; drug relocation re-analyzes existing drugs to find new uses for them. Compared with developing new drugs, drug relocation reduces cost and accelerates discovery, and it has gradually become a popular strategy in drug discovery. Current drug relocation strategies, guided by the concept of "similarity", fall into two classes: drug-based and disease-based. Drug-based approaches predict new indications from a property of the drug, such as its chemical structure or the activity of its molecules. Disease-based approaches reason backwards from side effects, clinical response, pathological features, pathogenesis and intervention mechanisms, and then search among marketed drugs for ones with new indications.
For drug relocation, predicting the complex relationships among drugs, targets and diseases is an important direction. Researchers tend to focus on the association between only two of the three. In a drug-target-disease triple, however, if two of the associations are known, the third may also exist. For example, if a drug is linked to a target and that target is linked to a disease, a link between the drug and the disease may exist as well. Focusing only on pairwise associations therefore cannot fully reveal new mechanisms of drug action, explain how a disease arises and develops, or uncover new indications for a drug.
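The triple-association reasoning above can be illustrated with a minimal sketch, assuming associations are stored as plain sets of pairs. The helper `infer_third_links` and all drug, gene and disease names are hypothetical placeholders, not data from the invention.

```python
from itertools import product

def infer_third_links(drug_target, target_disease, known_drug_disease):
    """Propose drug-disease pairs implied by a shared target (triangle closure)."""
    candidates = set()
    for (drug, target), (target2, disease) in product(drug_target, target_disease):
        # A drug and a disease that share a target form a candidate link,
        # unless that drug-disease association is already known.
        if target == target2 and (drug, disease) not in known_drug_disease:
            candidates.add((drug, disease))
    return sorted(candidates)

# Hypothetical toy data; the names are placeholders, not real associations.
drug_target = {("drug_A", "gene_1"), ("drug_B", "gene_2")}
target_disease = {("gene_1", "disease_X"), ("gene_2", "disease_Y")}
known_drug_disease = {("drug_B", "disease_Y")}

candidates = infer_third_links(drug_target, target_disease, known_drug_disease)
print(candidates)
```

A candidate produced this way is only a hypothesis to be scored and verified; it is not evidence of a therapeutic effect.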
Current work on drug relocation mainly applies complex network analysis, machine learning, literature mining and similar techniques, combined with data from chemistry, pharmacy and bioinformatics knowledge bases, to build heuristic models and apply them to the relocation of particular diseases or drugs. Such methods integrate information from the relevant databases, convert the data into an abstract matrix form, learn features with a machine learning algorithm, obtain a model optimized for given requirements through repeated training, and finally perform classification or regression prediction on a test set. This kind of integrated analysis can effectively combine feature information from different sources. Note, however, that because the feature data from some sources are sparse, irrelevant noise may be introduced along with the features, which makes the important features of a drug harder to identify. Matrix factorization methods have performed well in studies of some diseases; the open problems include how to capture the complex structure of drug-disease association data and how to handle high-complexity matrix operations efficiently. The mechanisms of action linking drugs and diseases, and the drug-drug and disease-disease similarities, involve complex nonlinear associations. Most earlier methods are shallow models that struggle to capture these relationships in depth, which is why deep learning methods have been proposed.
The recently introduced graph convolutional neural network extends neural networks to graph-structured data: it updates each node from the node's own features, the features of its neighbors, and the topology of the graph. This ability to compute low-dimensional representations on a graph structure matches the computational need in drug relocation of predicting similar drugs (targets) from similar target (drug) nodes.
Patent document CN111191014A discloses a drug relocation method, system, terminal and medium. It identifies and extracts indication entities and drug entities to build an indication lexicon and a drug lexicon; constructs a drug knowledge graph from the two lexicons; derives the similarity between indication nodes and the similarity between drug nodes from the knowledge graph; and computes a link correlation index from the indication similarity and the drug similarity in order to relocate drugs. That method models only the association between indications and drugs and relocates drugs solely from the link correlation index computed from the two similarities. It therefore cannot comprehensively reveal new mechanisms of drug action, its efficiency and accuracy are limited, and it is aimed mainly at Western (compound) medicines.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a Chinese and Western medicine relocation method, system and storage medium that fuse a global heterogeneous pharmacological network (GHP) with a graph convolutional neural network (GCN).
In order to solve the technical problems, the invention adopts the following technical scheme:
First, a Chinese and Western medicine relocation method fusing GHP and GCN is provided, which comprises the following steps:
S1, multi-source multi-modal heterogeneous data acquisition: collecting known drug data, disease data and target protein/gene data;
S2, constructing the GHP-GCN Chinese and Western medicine relocation model, wherein the construction comprises the following steps:
S21, constructing the global heterogeneous pharmacological network GHP: according to the collected data, taking each disease, drug and target protein/gene as a node, constructing the global heterogeneous pharmacological network GHP, and comprehensively representing the various nonlinear associations among nodes of the same type and among nodes of different types;
S22, fusing the global heterogeneous pharmacological network GHP and the graph convolutional neural network GCN to form the GHP-GCN model: the constructed GHP is fused into the GCN, yielding through seamless fusion a GHP-GCN Chinese and Western medicine relocation model with dual reasoning capability that comprehensively represents the various node features and the complex associations among nodes;
S23, GHP-GCN model optimization: the GHP-GCN Chinese and Western medicine relocation model is optimized with the dual weighted constraints of "monarch, minister, assistant and guide" and "node semantic nearest neighbor";
S3, training set construction and learning and inference of the GHP-GCN model: constructing a drug-target protein/gene-disease training set, and iteratively training and optimizing the GHP-GCN Chinese and Western medicine relocation model with an α-balanced focal loss function;
S4, contribution degree calculation and result analysis: given a target disease, calculating the contribution degree of each node to the target disease with the GHP-GCN Chinese and Western medicine relocation model, and sorting the contribution degrees; according to the sorted contribution degrees, analyzing and outputting the prediction analysis result with heat maps and cluster analysis, predicting potential and new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identifying new drug targets and new disease-associated pathogenic genes, and relocating new candidate drugs;
S5, result verification and tuning: verifying and tuning the prediction analysis result with a molecular docking method; if the prediction analysis result deviates, repeating steps S2 to S4, and if it is accurate, outputting the prediction analysis result.
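As a rough illustration of the sorting in step S4, the sketch below ranks candidate nodes by a contribution score for one target disease and keeps the top entries per node type. The score values, node names and the helper `rank_contributions` are hypothetical; how the model derives the contribution degree itself is not specified at this point in the text.

```python
def rank_contributions(scores, node_types, top_k=3):
    """Sort nodes by descending contribution and keep the top_k per node type."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    grouped = {}
    for node, score in ranked:
        bucket = grouped.setdefault(node_types[node], [])
        if len(bucket) < top_k:
            bucket.append((node, score))
    return grouped

# Hypothetical contribution degrees of four nodes to one target disease.
scores = {"drug_A": 0.91, "drug_B": 0.40, "gene_1": 0.77, "gene_2": 0.82}
node_types = {"drug_A": "drug", "drug_B": "drug",
              "gene_1": "target", "gene_2": "target"}

top = rank_contributions(scores, node_types, top_k=1)
print(top)
```

The grouped ranking is the kind of table that a heat map or cluster analysis could then be drawn from.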
The global heterogeneous pharmacological network GHP constructed by the invention is composed of global node types and global topological associations, wherein the global nodes comprise three types of network nodes 'drug-target protein/gene-disease' instead of 'drug-disease', and can comprehensively represent drug action mechanisms.
Further, in step S2, the drug nodes include the screened traditional Chinese medicines and Western medicines, specifically: Western medicines that treat a given disease, traditional Chinese medicines that treat a given disease, unknown Western medicines, unknown traditional Chinese medicines, Western medicines that do not treat a given disease, and traditional Chinese medicines that do not treat a given disease. The disease nodes are described in four dimensions: disease type, symptom stage, symptom and side-effect symptom. The target protein/gene nodes include all screened known target proteins and genes.
Further, associations between nodes of the same type are mainly structural or phenotypic similarity associations, including drug-drug similarity (for example, small-molecule structural similarity between drugs), target protein/gene-target protein/gene similarity and disease-disease similarity; associations between target protein/gene nodes also include protein-protein interaction (PPI) associations.
Further, the similarity associations between nodes of the same type are represented by matrices and constrained by semantic nearest neighbors, specifically as follows:
(1) The drug-drug similarity association is represented by a matrix $A_t \in \mathbb{R}^{n \times n}$, where each entry of $A_t$ is constructed from the similarity of the corresponding pair of drugs. These similarities are given by an $n \times n$ matrix $S_t$, whose $(i, j)$ entry $S_t(i, j)$ is the similarity between drug $t_i$ and drug $t_j$. To obtain more accurate results by avoiding noisy information, the model uses only the $k$ nearest neighbors rather than all similar neighbors, as considered in previous studies.
[formula image BDA0003446574070000033] of $A_t$ is defined as:
[formula image BDA0003446574070000031]
where $S_t(i, j)$ is the drug similarity matrix, [formula image BDA0003446574070000032] is the extended nearest-neighbor set of $t_i$, $N_k(t_i)$ is the set of $k$ nearest neighbors of drug $t_i$, and $r_i$ represents the structural similarity between drugs; that is, the similarity between drugs includes both semantic similarity and structural similarity.
(2) The target protein/gene-target protein/gene similarity association is represented by a matrix $A_p \in \mathbb{R}^{l \times l}$, where each entry of $A_p$ is constructed from the similarity of the corresponding pair of target proteins/genes. These similarities are given by an $l \times l$ matrix $S_p$, whose $(i, j)$ entry $S_p(i, j)$ is the similarity between target protein/gene $p_i$ and target protein/gene $p_j$. To obtain more accurate results by avoiding noisy information, the model uses only the $k$ nearest neighbors rather than all similar neighbors, as considered in previous studies.
[formula image BDA0003446574070000041] of $A_p$ is defined as:
[formula image BDA0003446574070000042]
where $S_p(i, j)$ is the target protein/gene similarity matrix, [formula image BDA0003446574070000043] is the extended nearest-neighbor set of $p_i$, $N_k(p_i)$ is the set of $k$ nearest neighbors of target protein/gene $p_i$, and $r_i$ represents the interaction relationship between target proteins/genes.
(3) The disease-disease similarity association is represented by a matrix $A_d \in \mathbb{R}^{m \times m}$, where each entry of $A_d$ is constructed from the similarity of the corresponding pair of diseases. These similarities are given by an $m \times m$ matrix $S_d$, whose $(i, j)$ entry $S_d(i, j)$ is the similarity between disease $d_i$ and disease $d_j$. To obtain more accurate results by avoiding noisy information, the model uses only the $k$ nearest neighbors rather than all similar neighbors, as considered in previous studies.
[formula image BDA0003446574070000044] of $A_d$ is defined as:
[formula image BDA0003446574070000045]
where $S_d(i, j)$ is the disease similarity matrix, [formula image BDA0003446574070000046] is the extended nearest-neighbor set of $d_i$, $N_k(d_i)$ is the set of $k$ nearest neighbors of disease $d_i$, and $r_i$ represents the MeSH semantic similarity between diseases.
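The k-nearest-neighbor constraint used in items (1) to (3) can be sketched as follows. The defining formulas are not legible in this text, so the weighting here is an assumption borrowed from a common scheme in the literature: a similarity is kept in full when two nodes are in each other's extended neighbor set, halved when only one direction holds, and dropped otherwise. The function names and the toy similarity matrix are hypothetical.

```python
def k_nearest(sim, i, k):
    """Indices of the k nodes most similar to node i, excluding i itself."""
    others = [j for j in range(len(sim)) if j != i]
    others.sort(key=lambda j: sim[i][j], reverse=True)
    return set(others[:k])

def knn_adjacency(sim, k):
    """Sparsify a similarity matrix with an assumed mutual k-NN weighting."""
    n = len(sim)
    # Assumed extended neighbor set: the node itself plus its k nearest neighbors.
    extended = [k_nearest(sim, i, k) | {i} for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # hits is 2 for mutual neighbors, 1 for one-sided, 0 for neither.
            hits = (j in extended[i]) + (i in extended[j])
            adj[i][j] = (hits / 2) * sim[i][j]
    return adj

# Hypothetical symmetric similarity matrix for four nodes of one type.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.8],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.8, 0.4, 1.0],
]
adj = knn_adjacency(sim, k=1)
```

With `k=1`, nodes 0 and 1 are mutual neighbors and keep their full similarity, node 1 is a neighbor of node 3 but not the reverse so that entry is halved, and unrelated pairs such as (0, 2) become zero.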
Further, associations between heterogeneous nodes are mainly mechanism-of-action associations, chiefly drug-target protein/gene associations, target protein/gene-disease associations and disease-drug associations.
Associations between heterogeneous nodes are represented by matrices and constrained by weights: if two heterogeneous nodes have a known interaction, an edge is added between them and its weight is set to 1.
Specifically, for example, the interactions between drugs and target proteins/genes are represented by a matrix $Y \in \mathbb{R}^{m \times n}$, where:
[formula image BDA0003446574070000047]
As with drug-target protein/gene associations, the target protein/gene-disease and drug-disease associations are mechanism-of-action associations; they are likewise represented by matrices of the same form and constrained by weights.
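A minimal sketch of such an association matrix, assuming that pairs without a known interaction are left at 0 (the text only states that known interactions receive weight 1). The helper `association_matrix` and the identifiers are hypothetical.

```python
def association_matrix(row_ids, col_ids, known_pairs):
    """Binary matrix with 1 where a (row, column) pair has a known interaction."""
    row_index = {r: i for i, r in enumerate(row_ids)}
    col_index = {c: j for j, c in enumerate(col_ids)}
    matrix = [[0] * len(col_ids) for _ in row_ids]
    for r, c in known_pairs:
        matrix[row_index[r]][col_index[c]] = 1  # known interaction: edge weight 1
    return matrix

# Hypothetical drugs, targets and known drug-target interactions.
drugs = ["drug_A", "drug_B"]
targets = ["gene_1", "gene_2", "gene_3"]
known = [("drug_A", "gene_1"), ("drug_B", "gene_3")]

Y = association_matrix(drugs, targets, known)
print(Y)
```

The same helper could build the target-disease and drug-disease matrices by passing the corresponding identifier lists and known pairs.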
Further, in step S23, optimizing the GHP-GCN Chinese and Western medicine relocation model with the dual weighted constraints of "monarch, minister, assistant and guide" and "node semantic nearest neighbor" specifically means: the node feature matrix of the model is optimized according to the "monarch, minister, assistant and guide" theory of traditional Chinese medicine composition, and the node adjacency matrix of the model is optimized with "node semantic nearest neighbors".
The "monarch, minister, assistant and guide" theory is the basis of compatibility in traditional Chinese medicine prescriptions. With reference to clinical experience, for traditional Chinese medicine node features the monarch drug is assigned a weight of 1, the minister drug 0.8, the assistant drug 0.6 and the guide drug 0.4. In the GHP network, the edges between a drug and its corresponding target proteins/genes are assigned weights according to the drug's monarch, minister, assistant or guide weight. For Western medicine node features, the monarch weight is set to 1 and the minister, assistant and guide weights are all set to 0. Weighting the Western medicine node feature matrix with the same compatibility theory lets the strengths of each kind of drug complement one another, so that similarity associations between traditional Chinese and Western medicines can be mined further.
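The role weights above can be encoded as in the following sketch. The weights 1, 0.8, 0.6 and 0.4 come from the text; the four-slot feature layout, the function names and the herb and gene identifiers are assumptions made for illustration.

```python
ROLE_ORDER = ("monarch", "minister", "assistant", "guide")
TCM_ROLE_WEIGHT = {"monarch": 1.0, "minister": 0.8, "assistant": 0.6, "guide": 0.4}

def role_feature(role=None, western=False):
    """Four-slot role vector; a Western drug is treated as a lone monarch."""
    if western:
        return [1.0, 0.0, 0.0, 0.0]
    return [TCM_ROLE_WEIGHT[r] if r == role else 0.0 for r in ROLE_ORDER]

def weighted_target_edges(herb_roles, herb_targets):
    """Give each herb-target edge the weight of the herb's role in the formula."""
    return {
        (herb, target): TCM_ROLE_WEIGHT[herb_roles[herb]]
        for herb, targets in herb_targets.items()
        for target in targets
    }

# Hypothetical two-herb prescription and its targets.
herb_roles = {"herb_1": "monarch", "herb_2": "assistant"}
herb_targets = {"herb_1": ["gene_1", "gene_2"], "herb_2": ["gene_2"]}

edges = weighted_target_edges(herb_roles, herb_targets)
print(role_feature("minister"), role_feature(western=True), edges)
```

Keeping the weights in one table makes it easy to revisit them, since the text describes them as chosen with reference to clinical experience.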
Furthermore, the constructed GHP-GCN Chinese and Western medicine relocation model has a multilayer connected neural network structure comprising an input layer, a hidden layer and an output layer. Each layer aggregates the information of a node's neighbors through the direct links of the graph and reconstructs the embedding, which serves as the input of the next layer. Propagation between layers is expressed by the following formula:
$$H^{(l+1)} = \sigma\left(D^{-\frac{1}{2}} G D^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$
where $G$ denotes the adjacency matrix, $H^{(l)}$ denotes the node embeddings at the $l$-th layer, $D$ is the degree matrix of $G$, $W^{(l)}$ is the layer-specific trainable weight matrix, and $\sigma$ is a nonlinear activation function.
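One propagation step can be sketched in plain Python, assuming the standard symmetrically normalized graph convolution, ReLU(D^-1/2 G D^-1/2 H W), built from the symbols defined above. The choice of ReLU as the activation, the helper names and the toy matrices are assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(G, H, W):
    """One propagation step: ReLU(D^-1/2 G D^-1/2 H W)."""
    n = len(G)
    degree = [sum(row) for row in G]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    # Symmetric normalization of the adjacency matrix by node degree.
    G_norm = [[inv_sqrt[i] * G[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    Z = matmul(matmul(G_norm, H), W)
    return [[max(0.0, z) for z in row] for row in Z]

G = [[1.0, 1.0], [1.0, 1.0]]   # two connected nodes, self-loops included
H = [[1.0, 0.0], [0.0, 1.0]]   # one-hot input embeddings
W = [[1.0, -1.0], [1.0, 1.0]]  # hypothetical layer weights

out = gcn_layer(G, H, W)
print(out)
```

Because both nodes have identical neighborhoods in this toy graph, they receive the same embedding, which is the smoothing behaviour expected of neighborhood aggregation.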
Further, the α-balanced focal loss function in step S3 is specifically as follows:
[formula image BDA0003446574070000052]
where [formula image BDA0003446574070000053] is a conditional probability. For a drug-disease association, $r$ represents the disease, $d$ represents the drug, the weight factor is [formula image BDA0003446574070000054], the positive samples $y^+$ are the known drug-disease association pairs, the negative samples $y^-$ are the unknown drug-disease association pairs, and $n$ and $m$ are the numbers of drugs and diseases, respectively.
Similarly, for a drug-target protein/gene association, $r$ represents the target protein/gene, $d$ represents the drug, the weight factor is [formula image BDA0003446574070000055], the positive samples $y^+$ are the known drug-target protein/gene association pairs, the negative samples $y^-$ are the unknown drug-target protein/gene association pairs, and $n$ and $m$ are the numbers of target proteins/genes and drugs, respectively.
For a target protein/gene-disease association, $r$ represents the disease, $d$ represents the target protein/gene, the weight factor is [formula image BDA0003446574070000056], the positive samples $y^+$ are the known target protein/gene-disease association pairs, the negative samples $y^-$ are the unknown target protein/gene-disease association pairs, and $n$ and $m$ are the numbers of target proteins/genes and diseases, respectively.
The α-balanced focal loss function of the invention,
[formula image BDA0003446574070000061]
reshapes the standard cross-entropy loss function so that training concentrates on the sparse, hard-to-classify positive samples and less loss weight is assigned to the large number of easily classified negative samples.
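The effect of an α-balanced focal loss can be illustrated with the widely used general form, in which a positive sample contributes -α(1-p)^γ log p and a negative sample contributes -(1-α) p^γ log(1-p). This is a sketch under that assumption, not the invention's exact formula: the weight factor defined in the text is not legible here, and the parameters `alpha`, `gamma` and `eps` are hypothetical.

```python
import math

def alpha_balanced_focal_loss(probs, labels, alpha=0.75, gamma=2.0, eps=1e-12):
    """Mean alpha-balanced focal loss over predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        if y == 1:
            # Positive pair: down-weighted when already predicted confidently.
            total += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            # Negative pair: easy negatives (small p) contribute very little.
            total += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return total / len(probs)

hard_positive = alpha_balanced_focal_loss([0.2], [1])
easy_negative = alpha_balanced_focal_loss([0.05], [0])
print(hard_positive, easy_negative)
```

A poorly predicted known association dominates the loss, while a confidently rejected unknown pair adds almost nothing, which is the rebalancing behaviour described above.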
The invention also provides a Chinese and Western medicine relocation system fusing GHP and GCN, which comprises:
a multi-source multi-modal heterogeneous data acquisition module, used for collecting known drug data, disease data and target protein/gene data;
a GHP-GCN Chinese and Western medicine relocation model construction module, used for constructing the GHP-GCN Chinese and Western medicine relocation model and specifically comprising:
a global heterogeneous pharmacological network GHP construction module, used for constructing the global heterogeneous pharmacological network GHP from the data collected by the multi-source multi-modal heterogeneous data acquisition module, taking each disease, drug and target protein/gene as a node, and comprehensively representing the various nonlinear associations among nodes of the same type and among nodes of different types;
a GHP and GCN fusion module, used for fusing the GHP constructed by the GHP construction module into the graph convolutional neural network GCN, yielding through seamless fusion a GHP-GCN Chinese and Western medicine relocation model with dual reasoning capability that comprehensively represents the various node features and the complex associations among nodes;
a GHP-GCN model optimization module, used for optimizing, with the dual weighted constraints of "monarch, minister, assistant and guide" and "node semantic nearest neighbor", the GHP-GCN Chinese and Western medicine relocation model produced by the GHP and GCN fusion module;
a training set construction and GHP-GCN model learning and inference module, used for constructing a drug-target protein/gene-disease training set and, with an α-balanced focal loss function, iteratively training and optimizing the GHP-GCN Chinese and Western medicine relocation model that has been optimized under weight constraints by the GHP-GCN model optimization module;
a contribution degree calculation and result analysis module, used for, given a target disease, calculating the contribution degree of each node to the target disease with the GHP-GCN Chinese and Western medicine relocation model trained and optimized by the training set construction and learning and inference module, and sorting the contribution degrees; and, according to the sorted contribution degrees, analyzing and outputting the prediction analysis result with heat maps and cluster analysis, predicting potential and new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identifying new drug targets and new disease-associated pathogenic genes, and relocating new candidate drugs.
Further, the system further comprises:
a result verification and tuning module, used for verifying and tuning the prediction analysis result output by the contribution degree calculation and result analysis module.
The present invention also provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the Chinese and Western medicine relocation method fusing GHP and GCN described above.
Compared with the existing drug repositioning method, the method has the following beneficial effects:
1. the Chinese and western medicine relocation method provided by the invention is based on high-throughput omics data, a global heterogeneous pharmacological network GHP formed by global node types and global topological associations is constructed, and various nonlinear association relations such as complex mechanism action association, structural similarity association and the like among drugs (traditional Chinese medicines and partial western medicines), virus target protein or genes and disease symptoms (including symptoms at different pathological stages and drug side effect symptoms) are comprehensively represented, so that powerful data support is provided for deep learning of traditional Chinese medicine relocation; and then, a Chinese and western medicine relocation model is constructed by fusing the global heterogeneous pharmacological network GHP and the graph convolution neural network GCN, the complex incidence relation between various node characteristics and nodes is comprehensively represented, and a deep calculation model GHP-GCN with the dual reasoning capability of a network propagation method and a deep learning method based on similarity is constructed. The method adopts 'monarch, minister, assistant and guide' and 'semantic neighbor' dual-weighted constraint GHP-GCN. Meanwhile, a training set is constructed, an alpha-balanced focus loss function iterative training optimization model is introduced, abundant topological and semantic information in the global heterogeneous pharmacological network is kept, meanwhile, a large amount of noise interference information which is possibly introduced is effectively filtered, and the GHP-GCN model is further trained and optimized. 
Finally, the deep learning interpretability method based on the contribution degree clustering analysis solves the problem that a deep learning model cannot be explained, the Chinese and western medicines with good intervention effect on different pathological stages of the disease are efficiently searched at low cost, and the compatibility rule and the action mechanism of the Chinese medicine prescription are disclosed and explained by using modern language.
2. The method can be applied to relocation research on traditional Chinese medicines and western medicines. It can predict potential new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identify new drug targets and new disease-associated pathogenic genes, discover and confirm drugs with a potential therapeutic effect on a disease together with their integrated action and synergistic mechanisms, and provide a scientific basis for explaining the effectiveness of Chinese and western medicine treatment and for drug discovery.
3. The global heterogeneous pharmacological network GHP is fused into the graph convolutional neural network GCN. This seamlessly integrates the two mainstream families of computational methods with reasoning ability in the field of drug relocation, namely similarity-based network propagation methods and deep learning methods. The resulting deep computation model GHP-GCN has dual reasoning capability and, in particular, provides dual computing capability for discovering and identifying the active ingredients of traditional Chinese medicines. The invention designs a multi-layer connected neural network structure for learning low-dimensional representations of nodes from graph-structured data. Each GCN layer aggregates the information of a node's neighbors through the direct links of the graph and reconstructs the embedding as the input of the next layer. To build a GCN-based encoder for learning low-dimensional representations of drugs and diseases, node similarity information and drug-disease association information are combined by deploying the GCN on the constructed global heterogeneous pharmacological network GHP.
4. A GHP-GCN reasoning and learning method based on the α-balanced focal loss function is provided. The node feature matrix of the GHP-GCN is optimized according to the 'monarch, minister, assistant and guide' theory of traditional Chinese medicine composition, and the node adjacency matrix of the GHP-GCN is optimized by using the semantic neighbors of each node instead of all of its neighbors. Together with the α-balanced focal loss function, this retains the rich topological and semantic information in the global heterogeneous pharmacological network while effectively filtering out the large amount of noise that may be introduced, which makes GHP-GCN reasoning and learning robust.
5. To address the inexplicability of the deep learning model, the output of GHP-GCN is analyzed with methods such as heat maps and clustering to obtain the feature contribution and the interaction contribution of each drug component, which are then used to optimize compatibility and to reveal and explain the mechanism of action. In particular, layer-wise relevance propagation (LRP) is used for feature importance analysis to explain the molecular features of the Chinese and western medicines identified by the GCN, and the LRP rules are used to extract the contribution of interactions in the PPI network to the classification of individual drugs. Based on the feature-importance LRP scores for disease types at different pathological processes, the top 1000 GCN predictions from the Consensus Pathway Database (CPDB) network are clustered with Kluger's spectral biclustering algorithm, revealing drugs that are matched to the different pathological processes and have distinct therapeutic effects. Combining the information of all drugs in the network, a directed weighted graph of drug-drug LRP contributions is built and its strongly connected components are examined; modules containing more than two drugs are extracted, and new drugs for treating the disease are determined.
Experiments show that the method markedly improves the accuracy with which Chinese and western medicines for treating a disease are recommended to medical personnel, which demonstrates its effectiveness.
Drawings
FIG. 1 is a schematic flow diagram of a method of relocation of Chinese and western medicines according to embodiment 1 of the present invention;
FIG. 2 is a schematic structural diagram of a global heterogeneous pharmacological network GHP composed of global node types and global topological associations according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a deep computation model GHP-GCN fusing a global heterogeneous pharmacological network and a graph convolutional neural network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a Chinese and western medicine relocation system according to embodiment 2 of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1
As shown in FIG. 1, this embodiment provides a Chinese and western medicine relocation method fusing GHP and GCN, which comprises the following steps:
S1, multi-source multi-modal heterogeneous data acquisition: collecting known drug data, disease data and target protein/gene data;
Optionally, the known drug data, disease data and target protein/gene data are obtained from professional databases such as DrugBank, OMIM, GO annotations and Fdataset. Multi-source multi-modal heterogeneous data differ in representation, source and structure; for example, the drug data may include drug structure data, drug function data, and the like.
S2, constructing a Chinese and western medicine relocation model of GHP-GCN, wherein the construction method comprises the following steps:
S21, constructing a global heterogeneous pharmacological network GHP: according to the collected data, taking each disease, drug and target protein/gene as a node, constructing a global heterogeneous pharmacological network GHP that comprehensively represents the nonlinear associations among nodes of the same type and among heterogeneous nodes;
The global heterogeneous pharmacological network GHP constructed by the invention is composed of global node types and global topological associations. The global nodes comprise the three node types 'drug-target protein/gene-disease' rather than only 'drug-disease', so the network can comprehensively represent the mechanisms of drug action.
Optionally, as shown in FIG. 2, the drug nodes include the screened traditional Chinese medicines and western medicines, specifically: western medicines that treat the target disease, traditional Chinese medicines that treat the target disease, western medicines whose effect is unknown, traditional Chinese medicines whose effect is unknown, western medicines that do not treat the target disease, and traditional Chinese medicines that do not treat the target disease. The drug nodes also carry the structural features, functional features, and the like of the drugs. The disease nodes are composed of four dimensions: disease type, symptom stage, symptom and side-effect symptom. The target protein/gene nodes include all screened known target proteins and genes.
Optionally, the associations among nodes of the same type are mainly similarity associations of structure or phenotype, including drug-drug similarity associations (for example, small-molecule structural similarity between drugs), target protein/gene-target protein/gene similarity associations and disease-disease similarity associations; the associations among target protein/gene nodes also include protein interaction associations from PPI.
Optionally, the associations among heterogeneous nodes are mainly mechanism-of-action associations, chiefly drug-target protein/gene associations, target protein/gene-disease associations and disease-drug associations.
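One way to picture the network described above is as a single block adjacency matrix over all drug, target protein/gene and disease nodes. The following is a minimal sketch under stated assumptions, not the patented implementation: the function name `build_ghp_adjacency`, the block layout and the toy matrices are hypothetical, and the three cross-type matrices are assumed to hold known associations as 0/1 entries.

```python
import numpy as np

def build_ghp_adjacency(A_t, A_p, A_d, Y_tp, Y_pd, Y_td):
    """Assemble one symmetric adjacency matrix for the heterogeneous network.

    A_t, A_p, A_d : same-type similarity blocks (drug, target, disease).
    Y_tp, Y_pd, Y_td : cross-type association blocks
                       (drug-target, target-disease, drug-disease).
    Node order in the result: drugs, then targets, then diseases.
    """
    n, l, m = A_t.shape[0], A_p.shape[0], A_d.shape[0]
    assert Y_tp.shape == (n, l)
    assert Y_pd.shape == (l, m)
    assert Y_td.shape == (n, m)
    return np.block([
        [A_t,    Y_tp,   Y_td],
        [Y_tp.T, A_p,    Y_pd],
        [Y_td.T, Y_pd.T, A_d],
    ])

# Toy data (illustrative only): 2 drugs, 3 targets, 2 diseases.
A_t, A_p, A_d = np.eye(2), np.eye(3), np.eye(2)
Y_tp = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
Y_pd = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Y_td = np.array([[1.0, 0.0], [0.0, 0.0]])
G = build_ghp_adjacency(A_t, A_p, A_d, Y_tp, Y_pd, Y_td)
```

Keeping all three node types in one matrix lets a single graph convolution pass information along both similarity edges and mechanism-of-action edges.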
S22, fusing the global heterogeneous pharmacological network GHP and the graph convolutional neural network GCN to form a GHP-GCN model: the constructed global heterogeneous pharmacological network GHP is fused into the graph convolutional neural network GCN, and a GHP-GCN Chinese and western medicine relocation model with dual reasoning capability is obtained through seamless fusion, which comprehensively represents the node features and the complex associations among nodes; FIG. 3 is a schematic structural diagram of the deep computation model GHP-GCN that fuses the global heterogeneous pharmacological network and the graph convolutional neural network.
Fusing the global heterogeneous pharmacological network GHP into the graph convolutional neural network GCN seamlessly integrates the two mainstream families of computational methods with reasoning ability in the field of drug relocation, namely similarity-based network propagation methods and deep learning methods. The resulting deep computation model GHP-GCN has dual reasoning capability and, in particular, provides dual computing capability for discovering and identifying the active ingredients of traditional Chinese medicines.
The invention designs a multi-layer connected neural network structure for learning low-dimensional representations of nodes from graph-structured data. Each GCN layer aggregates the information of a node's neighbors through the direct links of the graph and reconstructs the embedding as the input of the next layer. To build a GCN-based encoder for learning low-dimensional representations of drugs and diseases, node similarity information and drug-disease association information are combined by deploying the GCN on the constructed global heterogeneous pharmacological network.
As shown in FIG. 3, the GHP-GCN Chinese and western medicine relocation model constructed in this embodiment has a multi-layer connected neural network structure comprising an input layer, a hidden layer and an output layer. Each layer aggregates the information of a node's neighbors through the direct links of the graph and reconstructs the embedding as the input of the next layer. The propagation between layers is expressed by the following formula:
$$H^{(l+1)} = \sigma\left(D^{-\frac{1}{2}} G D^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$
wherein: $G$ denotes the adjacency matrix, $H^{(l)}$ denotes the node embeddings at the $l$-th layer, $D$ is the degree matrix of $G$, $W^{(l)}$ is a layer-specific trainable weight matrix, and $\sigma$ is a nonlinear activation function.
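The layer-to-layer propagation can be illustrated with a short NumPy sketch. It assumes the standard symmetrically normalized GCN rule, $H^{(l+1)} = \sigma(D^{-1/2} G D^{-1/2} H^{(l)} W^{(l)})$, which matches the symbols defined above; the function name `gcn_layer`, the ReLU activation and the handling of isolated nodes are illustrative choices, not details taken from the description.

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear unit, used here as the activation sigma."""
    return np.maximum(x, 0.0)

def gcn_layer(G, H, W, activation=relu):
    """One propagation step: activation(D^-1/2 G D^-1/2 H W).

    G : (N, N) adjacency matrix of the graph.
    H : (N, F_in) node embeddings of the current layer.
    W : (F_in, F_out) trainable weight matrix of the current layer.
    """
    G = np.asarray(G, dtype=float)
    deg = G.sum(axis=1)                      # diagonal of the degree matrix D
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0                        # isolated nodes keep a zero factor
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    G_norm = inv_sqrt[:, None] * G * inv_sqrt[None, :]
    return activation(G_norm @ H @ W)

# Toy graph with two mutually linked nodes and identity features/weights.
G_toy = np.array([[0.0, 1.0], [1.0, 0.0]])
H1 = gcn_layer(G_toy, np.eye(2), np.eye(2))
```

Stacking several such calls, each with its own weight matrix, gives the multi-layer encoder; in practice the weights would be learned by a deep learning framework rather than fixed as in this toy call.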
S23, GHP-GCN model optimization: the GHP-GCN Chinese and western medicine relocation model is optimized under the dual weighted constraint of 'monarch, minister, assistant and guide' and 'node semantic neighbors';
Optionally, in this embodiment the dual weighted constraint is applied as follows: the node feature matrix of the GHP-GCN Chinese and western medicine relocation model is optimized according to the 'monarch, minister, assistant and guide' theory of traditional Chinese medicine composition, and the node adjacency matrix of the model is optimized using 'node semantic neighbors'.
The 'monarch, minister, assistant and guide' theory is the basis of compatibility in traditional Chinese medicine prescriptions. With reference to clinical experience, for the node features of traditional Chinese medicines the monarch drug is assigned a weight of 1, the minister drug 0.8, the assistant drug 0.6 and the guide drug 0.4. In the GHP network, the edges between a drug and its corresponding target proteins/genes are weighted according to the monarch, minister, assistant or guide weight of that drug. For the node features of western medicines, the monarch weight is assigned 1, and the minister, assistant and guide weights are all assigned 0. Weighting the node feature matrix of western medicines with the same traditional Chinese medicine compatibility theory allows the strengths of the two kinds of medicine to complement each other, so that the similarity associations between traditional Chinese medicines and western medicines can be mined further.
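The role weights stated above can be written down as a small lookup and applied to drug-target edges. This is a sketch under assumptions: the description gives the four weights and the western-medicine rule, but the data layout, the function names `role_weight` and `weighted_drug_target_edges`, and the component and target names are hypothetical.

```python
# Role weights stated in the description for traditional Chinese medicine components.
ROLE_WEIGHTS = {"monarch": 1.0, "minister": 0.8, "assistant": 0.6, "guide": 0.4}

def role_weight(role, is_western=False):
    """Return the weight of a component for a given role.

    A western medicine keeps only the monarch weight (1); its minister,
    assistant and guide weights are 0, as stated in the description.
    """
    if role not in ROLE_WEIGHTS:
        raise ValueError(f"unknown role: {role}")
    if is_western:
        return 1.0 if role == "monarch" else 0.0
    return ROLE_WEIGHTS[role]

def weighted_drug_target_edges(components, targets_of):
    """Give each (component, target) edge the role weight of its component.

    components : list of (name, role, is_western) tuples (hypothetical layout).
    targets_of : dict mapping a component name to its known targets.
    """
    edges = {}
    for name, role, is_western in components:
        weight = role_weight(role, is_western)
        for target in targets_of.get(name, []):
            edges[(name, target)] = weight
    return edges

# Illustrative prescription; all names are placeholders.
prescription = [
    ("herb_A", "monarch", False),
    ("herb_B", "minister", False),
    ("herb_C", "guide", False),
    ("drug_W", "monarch", True),
]
targets_of = {"herb_A": ["gene_1", "gene_2"], "herb_B": ["gene_2"],
              "herb_C": ["gene_3"], "drug_W": ["gene_1"]}
edges = weighted_drug_target_edges(prescription, targets_of)
```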
The similarity associations among nodes of the same type are represented by matrices and constrained by semantic neighbors, specifically as follows:
(1) Drug-drug similarity associations are represented by a matrix $A_t \in \mathbb{R}^{n \times n}$, each entry of which is constructed from the similarity of a pair of drugs. The drug similarities are given by an $n \times n$ matrix $S_t$, whose $(i, j)$ entry $S_t(i, j)$ is the similarity between drug $t_i$ and drug $t_j$. To obtain more accurate results by avoiding noisy information, the model uses only the k nearest neighbors of each node rather than all similar neighbors, as were considered in previous studies. The $(i, j)$ entry $A_t(i, j)$ of $A_t$ is defined as:
$$A_t(i, j) = \begin{cases} S_t(i, j), & t_j \in \tilde{N}_k(t_i) \\ 0, & \text{otherwise} \end{cases}$$
wherein: $S_t(i, j)$ is the drug similarity matrix, $\tilde{N}_k(t_i) = \{t_i\} \cup N_k(t_i)$ is the extended set of k nearest neighbors of $t_i$, $N_k(t_i)$ is the set of k nearest neighbors of drug $t_i$, and $r_i$ represents the structural similarity between drugs; that is, the similarity between drugs includes both the semantic similarity and the structural similarity between drugs.
(2) Target protein/gene-target protein/gene similarity associations are represented by a matrix $A_p \in \mathbb{R}^{l \times l}$, each entry of which is constructed from the similarity of a pair of target proteins/genes. These similarities are given by an $l \times l$ matrix $S_p$, whose $(i, j)$ entry $S_p(i, j)$ is the similarity between target protein/gene $p_i$ and target protein/gene $p_j$. To obtain more accurate results by avoiding noisy information, the model uses only the k nearest neighbors of each node rather than all similar neighbors, as were considered in previous studies. The $(i, j)$ entry $A_p(i, j)$ of $A_p$ is defined as:
$$A_p(i, j) = \begin{cases} S_p(i, j), & p_j \in \tilde{N}_k(p_i) \\ 0, & \text{otherwise} \end{cases}$$
wherein: $S_p(i, j)$ is the target protein/gene similarity matrix, $\tilde{N}_k(p_i) = \{p_i\} \cup N_k(p_i)$ is the extended set of k nearest neighbors of $p_i$, $N_k(p_i)$ is the set of k nearest neighbors of target protein/gene $p_i$, and $r_i$ represents the interaction relationship between target proteins/genes.
(3) Disease-disease similarity associations are represented by a matrix $A_d \in \mathbb{R}^{m \times m}$, each entry of which is constructed from the similarity of a pair of diseases. The disease similarities are given by an $m \times m$ matrix $S_d$, whose $(i, j)$ entry $S_d(i, j)$ is the similarity between disease $d_i$ and disease $d_j$. To obtain more accurate results by avoiding noisy information, the model uses only the k nearest neighbors of each node rather than all similar neighbors, as were considered in previous studies. The $(i, j)$ entry $A_d(i, j)$ of $A_d$ is defined as:
$$A_d(i, j) = \begin{cases} S_d(i, j), & d_j \in \tilde{N}_k(d_i) \\ 0, & \text{otherwise} \end{cases}$$
wherein: $S_d(i, j)$ is the disease similarity matrix, $\tilde{N}_k(d_i) = \{d_i\} \cup N_k(d_i)$ is the extended set of k nearest neighbors of $d_i$, $N_k(d_i)$ is the set of k nearest neighbors of disease $d_i$, and $r_i$ represents the MeSH semantic similarity between diseases.
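The k-nearest-neighbor constraint used for the three similarity matrices above can be sketched as a single helper. The sketch assumes that the extended neighbor set of a node is the node itself plus its k most similar other nodes, and that an entry of the similarity matrix is kept only when the column node belongs to the row node's extended set; the function name `knn_sparsify` and the toy matrix are hypothetical.

```python
import numpy as np

def knn_sparsify(S, k):
    """Keep, in each row i, only the similarities to node i's extended k nearest neighbors.

    S : (N, N) similarity matrix.
    k : number of nearest neighbors kept per node (besides the node itself).
    """
    S = np.asarray(S, dtype=float)
    A = np.zeros_like(S)
    for i in range(S.shape[0]):
        sims = S[i].copy()
        sims[i] = -np.inf                    # rank the other nodes only
        neighbours = np.argsort(-sims, kind="stable")[:k]
        keep = np.append(neighbours, i)      # extended set: k neighbours plus node i
        A[i, keep] = S[i, keep]
    return A

# Toy similarity matrix for four nodes (values are illustrative).
S_toy = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.8],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.8, 0.4, 1.0],
])
A_toy = knn_sparsify(S_toy, k=1)
```

Note that the result need not be symmetric: node 3 keeps node 1 as its nearest neighbor, while node 1 keeps node 0. Whether the matrix is symmetrized afterwards is not stated in the description.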
The associations among heterogeneous nodes are also represented by matrices and constrained by weights: if two heterogeneous nodes have a known interaction, an edge is added between them and the weight of the edge is set to 1.
For example, the interactions between drugs and target proteins/genes are represented by a matrix $Y \in \mathbb{R}^{m \times n}$, wherein:
$$Y(i, j) = \begin{cases} 1, & \text{if nodes } i \text{ and } j \text{ have a known interaction} \\ 0, & \text{otherwise} \end{cases}$$
As with drug-target protein/gene associations, the target protein/gene-disease and drug-disease associations are mechanism-of-action associations, and they are likewise represented by matrices and constrained by weights.
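A binary association matrix of this kind can be built directly from a list of known pairs. The following is a minimal sketch; the function name `association_matrix` and the identifiers in the example are placeholders, not data from the description.

```python
import numpy as np

def association_matrix(row_ids, col_ids, known_pairs):
    """Return a 0/1 matrix with a 1 wherever (row, col) is a known association.

    row_ids, col_ids : ordered lists of node identifiers for the two node types.
    known_pairs : iterable of (row_id, col_id) tuples with a known interaction.
    """
    row_index = {r: i for i, r in enumerate(row_ids)}
    col_index = {c: j for j, c in enumerate(col_ids)}
    Y = np.zeros((len(row_ids), len(col_ids)))
    for r, c in known_pairs:
        Y[row_index[r], col_index[c]] = 1.0   # edge weight 1 for a known interaction
    return Y

# Placeholder identifiers for illustration only.
drugs = ["drug_1", "drug_2", "drug_3"]
targets = ["gene_1", "gene_2"]
Y_toy = association_matrix(drugs, targets, [("drug_1", "gene_2"), ("drug_3", "gene_1")])
```

The same helper applies unchanged to target protein/gene-disease and drug-disease pairs.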
S3, training set construction and learning and reasoning of the GHP-GCN model: constructing a drug-target protein/gene-disease training set, and iteratively training and optimizing the GHP-GCN Chinese and western medicine relocation model with an α-balanced focal loss function;
The constructed drug-target protein/gene-disease training set contains the features of all nodes, the associations among nodes of the same type and the associations among heterogeneous nodes. Associations among nodes of the same type are mainly similarity associations (for example, small-molecule structural similarity between drugs); target protein/gene nodes carry not only structural similarity associations but also protein interaction associations from PPI.
Optionally, the α-balanced focal loss function is specifically as follows:
[Equation image not reproduced: α-balanced focal loss function]
wherein the probability term in the formula is a conditional probability. For drug-disease associations, r represents a disease and d represents a drug, and the weight factor is:
[Equation image not reproduced: weight factor for drug-disease associations]
The positive samples $y^+$ are the known drug-disease association pairs, the negative samples $y^-$ are the unknown drug-disease association pairs, and n and m denote the numbers of drugs and diseases, respectively.
Similarly, for drug-target protein/gene associations, r represents a target protein/gene and d represents a drug, and the weight factor is:
[Equation image not reproduced: weight factor for drug-target protein/gene associations]
The positive samples $y^+$ are the known drug-target protein/gene association pairs, the negative samples $y^-$ are the unknown drug-target protein/gene association pairs, and n and m denote the numbers of target proteins/genes and drugs, respectively.
For target protein/gene-disease associations, r represents a disease and d represents a target protein/gene, and the weight factor is:
[Equation image not reproduced: weight factor for target protein/gene-disease associations]
The positive samples $y^+$ are the known target protein/gene-disease association pairs, the negative samples $y^-$ are the unknown target protein/gene-disease association pairs, and n and m denote the numbers of target proteins/genes and diseases, respectively.
This embodiment uses the α-balanced focal loss function to reshape the standard cross-entropy loss function, so that training concentrates on the sparse, hard-to-classify positive samples while the loss weight assigned to the large number of easily classified negative samples is reduced.
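For orientation, the widely used general form of an α-balanced focal loss is $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the predicted probability of the true class. The sketch below implements that general form only; the exact loss and weight factor of this embodiment are given in the original formula images and may differ. The function name, the focusing parameter `gamma` and the values of `alpha` and `gamma` used in the example are assumptions.

```python
import math

def alpha_balanced_focal_loss(probs, labels, alpha, gamma, eps=1e-12):
    """Mean alpha-balanced focal loss over a set of association pairs.

    probs  : predicted probabilities that each pair is a true association.
    labels : 1 for a known (positive) pair, 0 for an unknown (negative) pair.
    alpha  : weight of positive pairs; negatives receive 1 - alpha.
    gamma  : focusing parameter; gamma = 0 gives alpha-weighted cross-entropy.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        a_t = alpha if y == 1 else 1.0 - alpha    # class-balancing weight
        total += -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(probs)

# A hard positive (p = 0.1) contributes far more loss than an easy negative (p = 0.1).
hard_positive = alpha_balanced_focal_loss([0.1], [1], alpha=0.75, gamma=2.0)
easy_negative = alpha_balanced_focal_loss([0.1], [0], alpha=0.75, gamma=2.0)
```

The factor $(1 - p_t)^{\gamma}$ shrinks the loss of pairs that are already classified well, which is how the emphasis shifts toward sparse, hard positive pairs.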
S4, contribution calculation and result analysis: given a target disease, calculating the contribution degree of each node to the target disease through the GHP-GCN Chinese and western medicine relocation model, and ranking the contribution degrees; according to the contribution ranking, analyzing and outputting a prediction analysis result using heat maps and cluster analysis, predicting potential new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identifying new drug targets and new disease-associated pathogenic genes, and relocating new candidate drugs.
The contribution ranking generally includes a ranking of the pathogenic genes associated with the target disease, a ranking of candidate drugs, and the like.
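The ranking step can be sketched as a sort over contribution scores in which already validated entries are set aside, so that only new candidates are reported. The function name `rank_new_candidates` and all scores and identifiers below are made-up placeholders for illustration; they are not results of the method.

```python
def rank_new_candidates(scores, known, top_n=10):
    """Rank nodes by contribution score and return the top new candidates.

    scores : dict mapping a node name to its contribution score for the target disease.
    known  : set of nodes already validated for the disease (excluded from the output).
    top_n  : number of candidates to return.
    """
    candidates = [(name, score) for name, score in scores.items() if name not in known]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in candidates[:top_n]]

# Placeholder scores for illustration only.
toy_scores = {"gene_a": 0.91, "gene_b": 0.40, "gene_c": 0.87, "gene_d": 0.65}
toy_known = {"gene_a"}
top_new = rank_new_candidates(toy_scores, toy_known, top_n=2)
```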
The Chinese and western medicine relocation method fusing GHP and GCN provided in this embodiment uses layer-wise relevance propagation (LRP) for feature importance analysis to explain the molecular features of the Chinese and western medicines identified by the GCN (graph convolutional neural network), and uses the LRP rules to extract the contribution of interactions in the PPI network to the classification of individual drugs. Based on the feature-importance LRP scores for disease types at different pathological processes, the top 1000 GCN predictions from the Consensus Pathway Database (CPDB) network are clustered with Kluger's spectral biclustering algorithm, revealing drugs that are matched to the different pathological processes and have distinct therapeutic effects. Combining the information of all drugs in the network, a directed weighted graph of drug-drug LRP contributions is built and its strongly connected components are examined; modules containing more than two drugs are extracted, and new drugs for treating the disease are determined.
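The module extraction at the end of this step can be illustrated by finding the strongly connected components of a directed weighted graph and keeping those with more than two drugs. The sketch uses Kosaraju's two-pass algorithm as one possible choice; the description does not name an algorithm, and the function name, the optional weight threshold `min_weight`, and the edge weights in the example are assumptions.

```python
from collections import defaultdict

def strongly_connected_modules(edges, min_weight=0.0, min_size=3):
    """Return strongly connected components with at least min_size nodes.

    edges : dict mapping (source, target) to an LRP contribution weight.
    min_weight : edges with weight <= min_weight are ignored (hypothetical filter).
    """
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for (u, v), w in edges.items():
        nodes.update((u, v))
        if w > min_weight:
            graph[u].append(v)
            reverse[v].append(u)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:                      # all successors explored
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in reverse completion order.
    modules, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in reverse[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        if len(component) >= min_size:
            modules.append(sorted(component))
    return modules

# Placeholder drug-drug contribution graph: a 3-cycle plus a 2-cycle.
toy_edges = {("a", "b"): 0.5, ("b", "c"): 0.4, ("c", "a"): 0.3,
             ("c", "d"): 0.2, ("d", "e"): 0.6, ("e", "d"): 0.1}
toy_modules = strongly_connected_modules(toy_edges)
```

With `min_size=3` the two-drug cycle between `d` and `e` is discarded, which corresponds to keeping only modules that contain more than two drugs.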
S5, result verification and tuning: verifying and tuning the prediction analysis result by a molecular docking method; if the prediction analysis result deviates, repeating steps S2-S4, and if it is accurate, outputting the prediction analysis result.
The Chinese and western medicine relocation method provided by this embodiment is based on high-throughput omics data. It first constructs a global heterogeneous pharmacological network (GHP) composed of global node types and global topological associations. The GHP comprehensively represents the nonlinear associations, such as complex mechanism-of-action associations and structural similarity associations, among drugs (traditional Chinese medicines and some western medicines), viral target proteins or genes, and disease symptoms (including symptoms at different pathological stages and drug side-effect symptoms), and in particular provides strong data support for deep learning of traditional Chinese medicine relocation. The method then fuses the GHP with the graph convolutional neural network (GCN) to build the relocation model, which comprehensively represents the node features and the complex associations among nodes. The result is a deep computation model, GHP-GCN, with the dual reasoning capability of similarity-based network propagation methods and deep learning methods. The GHP-GCN is constrained by a dual weighting of 'monarch, minister, assistant and guide' and 'semantic neighbors'. In addition, an α-balanced focal loss function is introduced to iteratively train and optimize the model, so that the rich topological and semantic information in the global heterogeneous pharmacological network is retained while the large amount of noise that may be introduced is effectively filtered out.
Finally, a deep learning interpretability method based on cluster analysis of contribution degrees addresses the inexplicability of the deep learning model, finds Chinese and western medicines with good intervention effects at different pathological stages of a disease efficiently and at low cost, and reveals and explains in modern terms the compatibility rules and mechanisms of action of traditional Chinese medicine prescriptions. The method can be applied to relocation research on traditional Chinese medicines and western medicines. It can predict potential new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identify new drug targets and new disease-associated pathogenic genes, discover and confirm drugs with a potential therapeutic effect on a disease together with their integrated action and synergistic mechanisms, and provide a scientific basis for explaining the effectiveness of Chinese and western medicine treatment and for drug discovery.
Example 2
As shown in FIG. 4, this embodiment provides a Chinese and western medicine relocation system 20 fusing GHP and GCN, which adopts the Chinese and western medicine relocation method fusing GHP and GCN described in embodiment 1 and includes:
a multi-source multi-modal heterogeneous data acquisition module 21, used to collect known drug data, disease data and target protein/gene data;
a GHP-GCN Chinese and western medicine relocation model construction module 22, used to construct the GHP-GCN Chinese and western medicine relocation model, and specifically including:
a global heterogeneous pharmacological network GHP construction module 221, used to construct, from the data collected by the multi-source multi-modal heterogeneous data acquisition module 21 and with each disease, drug and target protein/gene as a node, a global heterogeneous pharmacological network GHP that comprehensively represents the nonlinear associations among nodes of the same type and among heterogeneous nodes;
a global heterogeneous pharmacological network GHP and graph convolutional neural network GCN fusion module 222, used to fuse the global heterogeneous pharmacological network GHP constructed by the construction module 221 into the graph convolutional neural network GCN, and to build through seamless fusion a GHP-GCN Chinese and western medicine relocation model with dual reasoning capability that comprehensively represents the node features and the complex associations among nodes;
a GHP-GCN model optimization module 223, used to optimize, under the dual weighted constraint of 'monarch, minister, assistant and guide' and 'node semantic neighbors', the GHP-GCN Chinese and western medicine relocation model built by the fusion module 222;
specifically, the module optimizes the node feature matrix of the GHP-GCN Chinese and western medicine relocation model according to the 'monarch, minister, assistant and guide' theory of traditional Chinese medicine composition, and optimizes the node adjacency matrix of the model using 'node semantic neighbors';
a training set construction and GHP-GCN model learning and reasoning module 23, used to construct a drug-target protein/gene-disease training set and to iteratively train and optimize, with the α-balanced focal loss function, the GHP-GCN Chinese and western medicine relocation model optimized under the weighted constraints by the GHP-GCN model optimization module 223;
a contribution calculation and result analysis module 24, used to calculate, for a given target disease and with the GHP-GCN Chinese and western medicine relocation model trained and optimized by the training set construction and GHP-GCN model learning and reasoning module 23, the contribution degree of each node to the target disease and to rank the contribution degrees; and, according to the contribution ranking, to analyze and output a prediction analysis result using heat maps and cluster analysis, predict potential new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identify new drug targets and new disease-associated pathogenic genes, and relocate new candidate drugs;
a result verification and tuning module 25, used to verify and tune the prediction analysis result output by the contribution calculation and result analysis module 24.
The system can achieve all the technical effects of the relocation method shown in embodiment 1.
Example 3
An embodiment of the present invention provides a computer storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the Chinese and western medicine relocation method fusing GHP and GCN shown in FIG. 1.
The computer-readable storage medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc read-only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other types of media/machine-readable media suitable for storing machine-executable instructions. The computer-readable storage medium may be a product that has not been connected to a computer device, or a component used by a computer device to which it is connected.
Example 4: application examples
To verify the effectiveness of the relocation method proposed by the invention, the method provided in embodiment 1 was applied to predict new pathogenic genes associated with breast cancer and to relocate new candidate drugs.
Breast cancer is a common malignancy, usually occurring in women, and is a cancer that develops from breast tissue, with signs that may include breast lumps, changes in the shape of the breast, depressed skin, nipple discharge or red, flaky skin.
According to the OMIM database records, breast cancer (MIM:114480) has 23 validated disease genes, which are RAD54L, CASP8, BARD1, PIK3CA, HMMR, NQO2, ESR1, RB1CC1, SLC22A1L, TSG101, ATM, KRAS, BRCA2, XRCC3, AKT1, RAD51A, PALB2, CDH1, TP53, PHB, PPM1D, BRIP1, CHEK2, respectively.
The GHP-GCN Chinese and western medicine relocation model for breast cancer was trained on all known drug-target, target-disease and drug-disease pairs from DrugBank, OMIM, GO annotations and the Fdataset database.
The model was trained and optimized by the method provided in embodiment 1 and then used for prediction. After the validated disease genes were excluded from the predicted ranking of breast cancer disease genes, the ten top-ranked new disease genes were MAP2K1, MAP3K1, TPX2, IKBKB, RAD51C, MSH2, STK11, NOD1, CHUK and VRK1. The protein encoded by MAP2K1 is a member of the dual-specificity protein kinase family, and this kinase is involved in various cellular processes such as proliferation, differentiation and transcriptional regulation. The fifth-ranked prediction is RAD51C, one of four genes located in a region of chromosome 17q23 that is frequently amplified in breast tumors. Recent literature shows that MAP2K1 is clearly associated with breast cancer-specific mortality, and that mutations in RAD51C may explain part of the hereditary susceptibility to breast cancer. This shows that some of the new pathogenic genes predicted by the invention are highly consistent with existing experimental research.
The method of the invention also relocated Metigestrona, estramustine and mitoxantrone as candidates for the treatment of breast cancer, and these candidates were validated against evidence from authoritative sources and clinical trials. The top-ranked candidate is Metigestrona, a long-acting contraceptive that has also been used to treat breast and endometrial tumors. The method also found estramustine, an antineoplastic drug mainly used to treat prostate tumors, to be associated with breast cancer; an earlier clinical trial report showed that estramustine gave encouraging results in treating metastatic breast cancer and deserves study at earlier clinical stages, which supports the prediction. In addition, the method found mitoxantrone to be associated with breast cancer, which is supported by DB (DrugBank), CTD (the Comparative Toxicogenomics Database), DrugCentral (a drug database) and clinical trials.
In conclusion, the new candidate drugs for breast cancer relocated by the method of the invention are validated by evidence from authoritative sources and clinical trials. These results demonstrate the effectiveness of the present invention.
The above description is only exemplary of the present invention and should not be taken as limiting it; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its protection scope.

Claims (10)

1. A Chinese and western medicine relocation method fusing GHP and GCN, characterized by comprising the following steps:
S1, multi-source multi-modal heterogeneous data acquisition: collecting known drug data, disease data and target protein/gene data;
S2, constructing a GHP-GCN Chinese and western medicine relocation model, wherein the specific construction method comprises the following steps:
S21, constructing a global heterogeneous pharmacological network GHP: according to the collected data, taking each disease, drug and target protein/gene as a node, constructing the global heterogeneous pharmacological network GHP, which comprehensively represents the various nonlinear associations among nodes of the same type and among nodes of different types;
S22, fusing the global heterogeneous pharmacological network GHP and the graph convolutional neural network GCN to form a GHP-GCN model: the constructed global heterogeneous pharmacological network GHP is fused into the graph convolutional neural network GCN, and a GHP-GCN Chinese and western medicine relocation model with double reasoning capability is obtained through seamless fusion, comprehensively representing the various node features and the complex associations among nodes;
S23, GHP-GCN model optimization: the GHP-GCN Chinese and western medicine relocation model is optimized by adopting the dual weighted constraint of 'monarch, minister, assistant and guide' and 'node semantic nearest neighbor';
S3, training set construction and learning and reasoning of the GHP-GCN model: constructing a drug-target protein/gene-disease training set, and iteratively training and optimizing the GHP-GCN Chinese and western medicine relocation model by adopting an α-balanced focal loss function;
S4, contribution degree calculation and result analysis: given a target disease, calculating the contribution degree of each node to the target disease through the GHP-GCN Chinese and western medicine relocation model, and ranking the contribution degrees; according to the contribution degree ranking, analyzing and outputting a prediction analysis result by using a heat map and a cluster analysis method, predicting potential and new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identifying new drug targets and new disease-associated pathogenic genes, and relocating new candidate drugs;
S5, result verification and tuning: verifying and tuning the prediction analysis result by adopting a molecular docking method; if the prediction analysis result deviates, repeating steps S2 to S4, and if the prediction analysis result is accurate, outputting the prediction analysis result.
2. The Chinese and western medicine relocation method fusing GHP and GCN as claimed in claim 1, wherein
in step S21, the drug nodes include the screened traditional Chinese medicines and western medicines, specifically: western medicines that treat a given disease, traditional Chinese medicines that treat a given disease, unknown western medicines, unknown traditional Chinese medicines, western medicines that do not treat a given disease, and traditional Chinese medicines that do not treat a given disease; the disease nodes are composed of four dimensions: disease type, symptom stage, symptom and side-effect symptom; the target protein/gene nodes include all screened known target proteins and genes.
3. The Chinese and western medicine relocation method fusing GHP and GCN as claimed in claim 1, wherein
in step S21, the associations between nodes of the same type are mainly structural or phenotypic similarity associations, including drug-drug, target protein/gene-target protein/gene and disease-disease similarity associations, and the associations between target protein/gene nodes also include protein interaction associations from PPI;
the similarity associations between nodes of the same type are represented by matrices and are subject to a semantic nearest-neighbor constraint.
4. The Chinese and western medicine relocation method fusing GHP and GCN as claimed in claim 1, wherein
the associations between heterogeneous nodes are mainly associations of mechanism of action, and mainly comprise drug-target protein/gene associations, target protein/gene-disease associations and disease-drug associations.
5. The Chinese and western medicine relocation method fusing GHP and GCN as claimed in claim 4, wherein
if a known interaction exists between two heterogeneous nodes, an edge is added between the two nodes and the weight of the edge is set to 1.
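The edge rule for heterogeneous nodes can be illustrated with a small NumPy sketch. The function `build_ghp_adjacency`, the node ordering (drugs, then targets, then diseases), the tuple format of `known_edges` and the choice of an undirected (symmetric) matrix are all assumptions made for illustration; the same-type similarity blocks are left at zero here.

```python
import numpy as np


def build_ghp_adjacency(n_drug, n_target, n_disease, known_edges):
    """Assemble an adjacency matrix over drug, target and disease nodes.

    known_edges holds (type_a, index_a, type_b, index_b) tuples describing
    known interactions between nodes of different types; each gets weight 1.
    """
    # Assumed layout: drug nodes first, then target nodes, then disease nodes.
    offsets = {"drug": 0, "target": n_drug, "disease": n_drug + n_target}
    n = n_drug + n_target + n_disease
    G = np.zeros((n, n))
    for type_a, i, type_b, j in known_edges:
        a, b = offsets[type_a] + i, offsets[type_b] + j
        G[a, b] = G[b, a] = 1.0  # assumed undirected edge with weight 1
    return G


# Toy example: drug 0 - target 1, target 1 - disease 0, disease 0 - drug 0.
edges = [("drug", 0, "target", 1), ("target", 1, "disease", 0), ("disease", 0, "drug", 0)]
G = build_ghp_adjacency(n_drug=2, n_target=2, n_disease=1, known_edges=edges)
print(G)
```

In a fuller construction, the diagonal blocks of `G` would hold the same-type similarity matrices rather than zeros.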
6. The Chinese and western medicine relocation method fusing GHP and GCN as claimed in any one of claims 1 to 5, wherein
in step S23, optimizing the GHP-GCN Chinese and western medicine relocation model by adopting the dual weighted constraint of 'monarch, minister, assistant and guide' and 'node semantic nearest neighbor' specifically comprises: optimizing the node feature matrix of the GHP-GCN Chinese and western medicine relocation model based on the 'monarch, minister, assistant and guide' composition theory of the traditional Chinese medicine nodes; and optimizing the node adjacency matrix of the GHP-GCN Chinese and western medicine relocation model by using 'node semantic nearest neighbor'.
7. The Chinese and western medicine relocation method fusing GHP and GCN as claimed in any one of claims 1 to 5, wherein
the constructed GHP-GCN Chinese and western medicine relocation model has a multilayer connected neural network structure, specifically comprising an input layer, hidden layers and an output layer, wherein each layer aggregates the information of the neighbors directly linked in the graph and reconstructs the embedding as the input of the next layer; the propagation between layers is expressed by the following formula:
Figure FDA0003446574060000021
wherein: g denotes adjacency matrix, h (l) denotes embedding of nodes at l-th layer, D is degree matrix of G, (l) is layer-specific trainable weight matrix, σ is nonlinear activation function.
8. The Chinese and western medicine relocation method fusing GHP and GCN as claimed in any one of claims 1 to 5, wherein
the α-balanced focal loss function in step S3 is specifically as follows:
Figure FDA0003446574060000022
wherein: Figure FDA0003446574060000023 is a conditional probability; for a drug-disease association, r represents the disease and d represents the drug; the weight factor is given by Figure FDA0003446574060000024; the positive samples, Figure FDA0003446574060000025, represent known drug-disease association pairs; the negative samples, Figure FDA0003446574060000026, represent unknown drug-disease association pairs; and n and m represent the numbers of drugs and diseases, respectively.
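The loss formula is available here only as figure references, so the sketch below uses the standard α-balanced focal loss for binary labels as a stand-in; it is an assumption, not the patent's exact expression. The function name, the default values `alpha=0.25` and `gamma=2.0`, and the toy inputs are hypothetical, and the patent's own weight factor may be defined differently.

```python
import numpy as np


def alpha_balanced_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean alpha-balanced focal loss over predicted probabilities p and labels y.

    y = 1 marks a known drug-disease pair (positive sample),
    y = 0 marks an unknown pair (negative sample).
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    # Positives: weight alpha, down-weighted when p is already high.
    pos = -alpha * (1.0 - p) ** gamma * np.log(p)
    # Negatives: weight (1 - alpha), down-weighted when p is already low.
    neg = -(1.0 - alpha) * p ** gamma * np.log(1.0 - p)
    return float(np.mean(y * pos + (1.0 - y) * neg))


easy = alpha_balanced_focal_loss([0.9], [1])
hard = alpha_balanced_focal_loss([0.1], [1])
print(easy < hard)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check; larger `gamma` shifts the training signal toward hard, misclassified pairs, which is useful when unknown pairs far outnumber known ones.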
9. A Chinese and western medicine relocation system fusing GHP and GCN is characterized by comprising:
a multi-source multi-modal heterogeneous data acquisition module, used for collecting known drug data, disease data and target protein/gene data;
a GHP-GCN Chinese and western medicine relocation model construction module, used for constructing the GHP-GCN Chinese and western medicine relocation model and specifically comprising:
a global heterogeneous pharmacological network GHP construction module, used for constructing the global heterogeneous pharmacological network GHP by taking each disease, drug and target protein/gene as a node according to the data collected by the multi-source multi-modal heterogeneous data acquisition module, so as to comprehensively represent the various nonlinear associations among nodes of the same type and among nodes of different types;
a fusion module for the global heterogeneous pharmacological network GHP and the graph convolutional neural network GCN, used for fusing the global heterogeneous pharmacological network GHP constructed by the GHP construction module into the graph convolutional neural network GCN, a GHP-GCN Chinese and western medicine relocation model with double reasoning capability being obtained through seamless fusion, so as to comprehensively represent the various node features and the complex associations among nodes;
a GHP-GCN model optimization module, used for optimizing, by adopting the dual weighted constraint of 'monarch, minister, assistant and guide' and 'node semantic nearest neighbor', the GHP-GCN Chinese and western medicine relocation model obtained by seamless fusion in the fusion module;
a training set construction and GHP-GCN model learning and reasoning module, used for constructing a drug-target protein/gene-disease training set and for iteratively training and optimizing, by adopting an α-balanced focal loss function, the GHP-GCN Chinese and western medicine relocation model optimized under the weighted constraints by the GHP-GCN model optimization module;
a contribution degree calculation and result analysis module, used for calculating, for a given target disease, the contribution degree of each node to the target disease through the GHP-GCN Chinese and western medicine relocation model trained and optimized by the training set construction and GHP-GCN model learning and reasoning module, and for ranking the contribution degrees; and, according to the contribution degree ranking, for analyzing and outputting a prediction analysis result by using a heat map and a cluster analysis method, predicting potential and new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identifying new drug targets and new disease-associated pathogenic genes, and relocating new candidate drugs;
and a result verification and tuning module, used for verifying and tuning the prediction analysis result output by the contribution degree calculation and result analysis module.
10. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the Chinese and western medicine relocation method fusing GHP and GCN as claimed in any one of claims 1 to 7.
CN202111658146.1A 2021-12-30 2021-12-30 Chinese and western medicine relocation method and system fusing GHP and GCN and storage medium Active CN114242186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111658146.1A CN114242186B (en) 2021-12-30 2021-12-30 Chinese and western medicine relocation method and system fusing GHP and GCN and storage medium

Publications (2)

Publication Number Publication Date
CN114242186A true CN114242186A (en) 2022-03-25
CN114242186B CN114242186B (en) 2022-08-12

Family

ID=80744948

Country Status (1)

Country Link
CN (1) CN114242186B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110853714A (en) * 2019-10-21 2020-02-28 天津大学 Drug relocation model based on pathogenic contribution network analysis
CN111681718A (en) * 2020-06-11 2020-09-18 湖南大学 Medicine relocation method based on deep learning multi-source heterogeneous network
WO2020204586A1 (en) * 2019-04-01 2020-10-08 한국과학기술정보연구원 Drug repositioning candidate recommendation system, and computer program stored in medium in order to execute each function of system
CN111916145A (en) * 2020-07-24 2020-11-10 湖南大学 Novel coronavirus target prediction and drug discovery method based on graph representation learning
CN112927765A (en) * 2021-03-29 2021-06-08 天士力国际基因网络药物创新中心有限公司 Method for repositioning medicine
US20210365795A1 (en) * 2019-10-11 2021-11-25 Medirita Method and apparatus for deriving new drug candidate substance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
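Yajie Meng et al.: "Drug repositioning based on similarity constrained probabilistic matrix factorization: COVID-19 as a case study", Applied Soft Computing *
Ren Shuyue et al.: "Research progress on the application of repositioning strategies in drugs for the treatment of rare diseases", Journal of Jiangsu University (Medicine Edition) *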

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114974406A (en) * 2022-05-11 2022-08-30 中国人民解放军总医院 Training method, system, device and product of antiviral drug repositioning model
CN114974406B (en) * 2022-05-11 2023-04-14 中国人民解放军总医院 Training method, system, device and product of antiviral drug repositioning model
CN115966252A (en) * 2023-02-12 2023-04-14 汤永 Antiviral drug screening method based on L1norm diagram
CN115966252B (en) * 2023-02-12 2024-01-19 中国人民解放军总医院 Antiviral drug screening method based on L1norm diagram
CN116646072A (en) * 2023-05-18 2023-08-25 肇庆医学高等专科学校 Training method and device for prostate diagnosis neural network model
CN117334246A (en) * 2023-09-28 2024-01-02 之江实验室 Method, device and storage medium for drug repositioning based on calculation
CN117038105A (en) * 2023-10-08 2023-11-10 武汉纺织大学 Drug repositioning method and system based on information enhancement graph neural network
CN117038105B (en) * 2023-10-08 2023-12-15 武汉纺织大学 Drug repositioning method and system based on information enhancement graph neural network
CN118351945A (en) * 2024-04-24 2024-07-16 湖北中医药大学 Disease gene prediction method and device based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared