CN114242186A

CN114242186A - Chinese and western medicine relocation method and system fusing GHP and GCN and storage medium

Info

Publication number: CN114242186A
Application number: CN202111658146.1A
Authority: CN
Inventors: 金敏; 龚后武
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-03-25
Anticipated expiration: 2041-12-30
Also published as: CN114242186B

Abstract

The invention discloses a Chinese and western medicine relocation method and a system and a storage medium for fusing GHP and GCN, wherein the relocation method comprises the following steps: multi-source multi-modal heterogeneous data acquisition; the GHP-GCN Chinese and western medicine relocation model construction method specifically comprises the following steps: constructing a global heterogeneous pharmacological network GHP, fusing the global heterogeneous pharmacological network and a graph convolution neural network to form a GHP-GCN model, and optimizing the GHP-GCN model by adopting double weighted constraints of 'monarch, minister, assistant and guide' and 'node semantic neighbor'; training set construction and GHP-GCN model learning and reasoning; calculating contribution degree and analyzing results; and verifying and optimizing the result. The method can be applied to the relocation research of Chinese and western medicines, can predict potential and new medicine-target protein/gene, target protein/gene-disease and medicine-disease association, identify new medicine target and disease new associated pathogenic gene, discover and confirm medicines with potential treatment effect on diseases and integration action and synergistic mechanism thereof, and provide scientific basis for explaining the effectiveness of Chinese and western medicine treatment and medicine discovery.

Description

Chinese and western medicine relocation method and system fusing GHP and GCN and storage medium

Technical Field

The invention relates to the technical field of Chinese and western medicine screening, development and relocation, in particular to a Chinese and western medicine relocation method and system and a storage medium for fusing a global heterogeneous pharmacological network GHP and a graph convolution neural network GCN.

Background

The new drug development is to search for a proper new drug for a certain protein target; drug relocation is the relocation analysis of existing drugs to find new uses for the drug. Compared with the research and development of new drugs, the relocation of drugs has the advantages of reducing cost and accelerating the discovery of new drugs, and gradually becomes a more popular strategy in drug discovery. The current strategy for drug relocation is to classify subjects into drug-based and disease-based under the guidance of the "similarity" concept. That is, the method is generally based on two directions, the first is drug-based, and the new indication of the drug is predicted by some property of the drug, such as the desired chemical structure or the activity of drug molecules; secondly, based on diseases and the like, reverse speculation is carried out through side effects, disease clinical response, pathological features, pathogenesis and intervention mechanism, and then the marketed drugs with new indications are searched and discovered from different links.

For drug relocation, predicting the complex relationships among drugs, targets and diseases is an important direction. Researchers tend to focus on the association between any two of them. In a drug-target-disease triple, however, if two of the associations are known, the third may also exist. For example, if a drug is linked to a target and that target is linked to a disease, a link between the drug and the disease may also exist. Therefore, focusing only on the association between any two of them cannot fully reveal new mechanisms of action of a drug, infer the mechanisms by which a disease arises and develops, or find new indications for a drug.

Current drug relocation research mainly uses complex network analysis, machine learning, literature mining and the like, combined with data from chemistry, pharmacy and bioinformatics knowledge bases, to build heuristic models that are then applied to the relocation of certain diseases or certain drugs. These methods integrate information from the relevant databases, convert the data into an abstract matrix form, learn features with a machine learning algorithm, obtain a model optimized for given requirements through repeated training, and finally perform classification or regression prediction on a test set. Such integrative analysis can effectively combine feature information from different sources. However, because the feature data from some sources are sparse, irrelevant noise may be introduced together with the features, which makes it harder to identify the important features of a drug. Matrix factorization methods have achieved good performance in studies of some diseases; open problems include how to efficiently capture the complex structure of drug-disease association data and how to efficiently handle high-complexity matrix operations. The associations between the mechanisms of action of drugs and diseases, and the drug-drug and disease-disease similarities, are complex and nonlinear. Most previous methods are shallow models that have difficulty capturing these relationships in depth, which is why deep learning methods have been proposed.

The recently developed graph convolutional neural network extends neural networks to graph-structured data: each node is updated according to its own features, the features of its neighbors and the topology of the graph. This ability to compute low-dimensional representations on a graph structure matches the computational requirement in drug relocation of predicting similar drugs (targets) from similar target (drug) nodes.

Patent document CN111191014A discloses a method, system, terminal and medium for drug relocation, which includes: identifying and extracting indication entities and drug entities to construct an indication lexicon and a drug lexicon; constructing a drug knowledge graph from the indication lexicon and the drug lexicon; obtaining the indication similarity between indication nodes and the drug similarity between drug nodes from the drug knowledge graph; and obtaining a link correlation index from the indication similarity and the drug similarity so as to relocate the drug. That method constructs only the association between indications and drugs and relocates drugs solely from a link correlation index computed from indication similarity and drug similarity, so it cannot comprehensively reveal new mechanisms of action of a drug, and its relocation efficiency and accuracy are not high; moreover, it is mainly directed to the relocation of western medicines (compound drugs).

Disclosure of Invention

The technical problem to be solved by the invention is to provide a Chinese and western medicine relocation method, system and storage medium fusing a global heterogeneous pharmacological network (GHP) and a graph convolutional neural network (GCN).

In order to solve the technical problems, the invention adopts the following technical scheme:

First, a Chinese and western medicine relocation method fusing GHP and GCN is provided, which comprises the following steps:

S1, multi-source multi-modal heterogeneous data acquisition: collecting known drug data, disease data and target protein/gene data;

S2, constructing the GHP-GCN Chinese and western medicine relocation model, wherein the construction comprises the following steps:

S21, constructing the global heterogeneous pharmacological network GHP: according to the collected data, taking each disease, drug and target protein/gene as a node, constructing the global heterogeneous pharmacological network GHP, which comprehensively represents the various nonlinear association relations between nodes of the same type and between heterogeneous nodes;

S22, fusing the global heterogeneous pharmacological network GHP and the graph convolutional neural network GCN to form the GHP-GCN model: the constructed global heterogeneous pharmacological network GHP is fused into the graph convolutional neural network GCN, and a GHP-GCN Chinese and western medicine relocation model with dual inference capability is obtained through seamless fusion, which comprehensively represents the various node features and the complex association relations among nodes;

S23, GHP-GCN model optimization: optimizing the GHP-GCN Chinese and western medicine relocation model with the dual weighted constraint of 'monarch, minister, assistant and guide' and 'node semantic neighbor';

S3, training set construction and learning and inference of the GHP-GCN model: constructing a drug-target protein/gene-disease training set, and iteratively training and optimizing the GHP-GCN Chinese and western medicine relocation model with an α-balanced focal loss function;

S4, contribution degree calculation and result analysis: given a target disease, calculating the contribution degree of each node to the target disease through the GHP-GCN Chinese and western medicine relocation model, and ranking the contribution degrees; according to the ranking, analyzing and outputting a prediction analysis result using heat map and cluster analysis methods, predicting potential and new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identifying new drug targets and new disease-associated pathogenic genes, and relocating new candidate drugs;

S5, result verification and tuning: verifying and tuning the prediction analysis result with a molecular docking method; if the prediction analysis result deviates, repeating steps S2-S4, and if it is accurate, outputting the prediction analysis result.

The global heterogeneous pharmacological network GHP constructed by the invention is composed of global node types and global topological associations, wherein the global nodes comprise three types of network nodes 'drug-target protein/gene-disease' instead of 'drug-disease', and can comprehensively represent drug action mechanisms.

Further, in step S2, the drug nodes include the screened traditional Chinese medicines and western medicines, specifically: western medicines that treat a given disease, traditional Chinese medicines that treat the disease, unknown western medicines, unknown traditional Chinese medicines, western medicines that do not treat the disease, and traditional Chinese medicines that do not treat the disease; the disease nodes are composed of four dimensions: disease type, symptom stage, symptom and side-effect symptom; the target protein/gene nodes include all the screened known target proteins and genes.

Further, the association of the nodes of the same species is mainly structural or phenotypic similarity association, including drug-drug similarity association (e.g. small molecule structural similarity between drugs-drugs), target protein/gene-target protein/gene similarity association, disease-disease similarity association, and the association between the nodes of the target protein/gene also includes protein interaction association from PPI.

Further, the similarity association between nodes of the same type is represented by a matrix and is subject to a semantic neighbor constraint, specifically as follows:

(1) The drug-drug similarity association is represented by a matrix $A^t \in \mathbb{R}^{n \times n}$, each entry of which is constructed from the similarity of a pair of drugs. These similarities are given by an $n \times n$ matrix $S^t$, whose $(i, j)$ entry $S^t(i, j)$ is the similarity between drug $t_i$ and drug $t_j$. To obtain more accurate results by avoiding noisy information, the model uses only the k nearest neighbors of each node rather than all the similar neighbors considered in previous studies.

The entries of $A^t$ are defined as:

wherein: $S^t(i, j)$ is the drug similarity matrix; the neighbor set used in the definition is the extended nearest-neighbor set of $t_i$; $N_k(t_i)$ is the set of k nearest neighbors of drug $t_i$; and $r_i$ represents the structural similarity between drugs. That is, the similarity between drugs includes both the semantic similarity and the structural similarity between drugs.

(2) The target protein/gene-target protein/gene similarity association is represented by a matrix $A^p \in \mathbb{R}^{l \times l}$, each entry of which is constructed from the similarity of a pair of target proteins/genes. These similarities are given by an $l \times l$ matrix $S^p$, whose $(i, j)$ entry $S^p(i, j)$ is the similarity between target protein/gene $p_i$ and target protein/gene $p_j$. To obtain more accurate results by avoiding noisy information, the model uses only the k nearest neighbors of each node rather than all the similar neighbors considered in previous studies.

The entries of $A^p$ are defined as:

wherein: $S^p(i, j)$ is the target protein/gene similarity matrix; the neighbor set used in the definition is the extended nearest-neighbor set of $p_i$; $N_k(p_i)$ is the set of k nearest neighbors of target protein/gene $p_i$; and $r_i$ represents the interaction relationship between target proteins/genes.

(3) The disease-disease similarity association is represented by a matrix $A^d \in \mathbb{R}^{m \times m}$, each entry of which is constructed from the similarity of a pair of diseases. These similarities are given by an $m \times m$ matrix $S^d$, whose $(i, j)$ entry $S^d(i, j)$ is the similarity between disease $d_i$ and disease $d_j$. To obtain more accurate results by avoiding noisy information, the model uses only the k nearest neighbors of each node rather than all the similar neighbors considered in previous studies.

The entries of $A^d$ are defined as:

wherein: $S^d(i, j)$ is the disease similarity matrix; the neighbor set used in the definition is the extended nearest-neighbor set of $d_i$; $N_k(d_i)$ is the set of k nearest neighbors of disease $d_i$; and $r_i$ represents the MeSH semantic similarity between diseases.
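The defining formulas for the three adjacency matrices are not legible in this text, so the following is only a minimal Python sketch of a k-nearest-neighbor sparsification consistent with the description. It assumes (this is not stated in the source) that entry (i, j) keeps the similarity S(i, j) when node j is among the k most similar neighbors of node i, or is node i itself, and is 0 otherwise. The function name `knn_adjacency` and the toy matrix are hypothetical.

```python
def knn_adjacency(similarity, k):
    """Sparsify a square similarity matrix by keeping k nearest neighbours per row.

    Assumption (not from the source): entry (i, j) keeps S(i, j) when j is one
    of the k most similar other nodes of i, or when j == i; otherwise it is 0.
    """
    n = len(similarity)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other nodes by decreasing similarity to node i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )
        keep = set(others[:k]) | {i}  # "extended" set: neighbours plus the node itself
        for j in keep:
            adjacency[i][j] = float(similarity[i][j])
    return adjacency


# Toy 4-drug similarity matrix (illustrative values only).
S_t = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.4, 0.8, 1.0],
]
A_t = knn_adjacency(S_t, k=1)
```

The same function would apply unchanged to the target protein/gene and disease similarity matrices; note that the result is not necessarily symmetric, and whether the method symmetrizes it is not stated in the source.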

Further, the association between heterogeneous nodes is mainly a mechanism-of-action association, and mainly comprises drug-target protein/gene associations, target protein/gene-disease associations and disease-drug associations.

And for the association of the heterogeneous nodes, adopting matrix representation and adopting weight constraint, if the two heterogeneous nodes have known interaction, adding an edge between the two nodes, and setting the weight of the edge to be 1.

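Specifically, for example, the interaction between drugs and target proteins/genes is represented by a matrix $Y \in \mathbb{R}^{m \times n}$, wherein:

$$Y_{ij} = \begin{cases} 1, & \text{if the pair of nodes indexed by } (i, j) \text{ has a known interaction} \\ 0, & \text{otherwise} \end{cases}$$

Like the drug-target protein/gene association, the target protein/gene-disease and drug-disease associations are mechanism-of-action associations, and they are likewise represented by analogous matrices under the same weight constraint.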
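To make the network structure concrete, the sketch below assembles one global adjacency matrix from the three same-type similarity matrices and the three heterogeneous association matrices. The 3 × 3 block layout (drugs, then target proteins/genes, then diseases), the use of transposes for the lower blocks, and all function and variable names are assumptions for illustration; the source does not state how the blocks are arranged.

```python
def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def hstack(*blocks):
    """Concatenate matrices with equal row counts side by side."""
    return [sum((list(block[i]) for block in blocks), []) for i in range(len(blocks[0]))]


def build_ghp_adjacency(a_drug, a_target, a_disease,
                        y_drug_target, y_target_disease, y_drug_disease):
    """Assemble a symmetric block adjacency over drug, target and disease nodes.

    Hypothetical layout (an assumption, not taken from the source):
        [ A_drug      Y_dt        Y_dd   ]
        [ Y_dt^T      A_target    Y_td   ]
        [ Y_dd^T      Y_td^T      A_dis  ]
    """
    top = hstack(a_drug, y_drug_target, y_drug_disease)
    middle = hstack(transpose(y_drug_target), a_target, y_target_disease)
    bottom = hstack(transpose(y_drug_disease), transpose(y_target_disease), a_disease)
    return top + middle + bottom


# Toy network: 2 drugs, 1 target, 1 disease (illustrative values only).
G = build_ghp_adjacency(
    a_drug=[[1.0, 0.5], [0.5, 1.0]],
    a_target=[[1.0]],
    a_disease=[[1.0]],
    y_drug_target=[[1], [0]],      # drug 0 has a known interaction with the target
    y_target_disease=[[1]],        # the target is associated with the disease
    y_drug_disease=[[0], [0]],     # no known drug-disease association yet
)
```

In this toy case the zero entries in the drug-disease block are exactly the candidate associations a relocation model would be asked to score.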

Further, in step S23, optimizing the GHP-GCN Chinese and western medicine relocation model with the dual weighted constraint of "monarch, minister, assistant and guide" and "node semantic neighbor" specifically means: optimizing the node feature matrix of the GHP-GCN Chinese and western medicine relocation model on the basis of the "monarch, minister, assistant and guide" theory of traditional Chinese medicine composition; and optimizing the node adjacency matrix of the GHP-GCN Chinese and western medicine relocation model using "node semantic neighbors".

The "monarch, minister, assistant and guide" theory is the basis of compatibility in traditional Chinese medicine prescriptions. With reference to clinical experience, for traditional Chinese medicine node features the monarch drug is assigned a weight of 1, the minister drug 0.8, the assistant drug 0.6 and the guide drug 0.4. In the GHP network, the edges between a drug and its corresponding target proteins/genes are given weight values according to the drug's monarch, minister, assistant or guide weight. For western medicine node features, the monarch weight is assigned 1 and the minister, assistant and guide weights are all assigned 0. The node feature matrix of western medicines is thus also weighted under the traditional Chinese medicine compatibility theory of "monarch, minister, assistant and guide", so that the advantages of each type of medicine can complement one another and the similarity associations between traditional Chinese and western medicines can be further mined.
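The weighting scheme above can be sketched in Python as follows. Only the four weight values (1, 0.8, 0.6, 0.4) and the western-medicine assignment (1, 0, 0, 0) come from the text; the four-slot vector encoding, the function names and the placeholder drug and protein names are assumptions made for illustration.

```python
# Role weights quoted in the text; the dictionary layout itself is an assumption.
ROLE_WEIGHTS = {"monarch": 1.0, "minister": 0.8, "assistant": 0.6, "guide": 0.4}
ROLES = ("monarch", "minister", "assistant", "guide")


def role_feature(role=None, is_western=False):
    """Return a 4-slot role-weight vector (hypothetical encoding) for one drug node."""
    if is_western:
        # Western medicines: monarch slot 1, the other three slots 0.
        return [1.0, 0.0, 0.0, 0.0]
    return [ROLE_WEIGHTS[r] if r == role else 0.0 for r in ROLES]


def weight_drug_target_edges(drug_targets, drug_roles):
    """Give every drug-target edge the role weight of its drug."""
    return {
        (drug, target): ROLE_WEIGHTS[drug_roles[drug]]
        for drug, targets in drug_targets.items()
        for target in targets
    }


# Placeholder names only; no real prescription is implied.
roles = {"herb_A": "monarch", "herb_B": "assistant"}
targets = {"herb_A": ["protein_1", "protein_2"], "herb_B": ["protein_2"]}
edge_weights = weight_drug_target_edges(targets, roles)
```

Here the edge from `herb_B` to `protein_2` would carry the assistant weight 0.6, while both edges from `herb_A` carry the monarch weight 1.0.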

Furthermore, the constructed GHP-GCN Chinese and western medicine relocation model has a multi-layer connected neural network structure comprising an input layer, hidden layers and an output layer, wherein each layer aggregates the information of neighbors through the direct links of the graph and reconstructs the embedding as the input of the next layer; the propagation between layers is expressed by the following formula:

wherein: $G$ denotes the adjacency matrix, $H^{(l)}$ denotes the embedding of the nodes at the $l$-th layer, $D$ is the degree matrix of $G$, $W^{(l)}$ is the layer-specific trainable weight matrix, and $\sigma$ is a nonlinear activation function.
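The propagation formula itself is not legible in this text. The symbols listed (adjacency matrix, degree matrix, layer embedding, layer weight matrix, activation) match the widely used symmetric-normalized graph convolution rule, H(l+1) = σ(D^(-1/2) · G · D^(-1/2) · H(l) · W(l)), so the Python sketch below implements that rule under this assumption. Whether the method adds self-loops or uses a different normalization is not stated in the source; ReLU as σ and all function names are illustrative choices.

```python
import math


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(g):
    """Return D^(-1/2) G D^(-1/2), where D is the diagonal degree matrix of G."""
    degrees = [sum(row) for row in g]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    n = len(g)
    return [[inv_sqrt[i] * g[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def relu(matrix):
    """Element-wise ReLU, used here as the nonlinear activation."""
    return [[max(0.0, value) for value in row] for row in matrix]


def gcn_layer(g, h, w, activation=relu):
    """One propagation step: activation(normalized G @ H @ W)."""
    return activation(matmul(matmul(normalize_adjacency(g), h), w))


# Two fully connected nodes (self-loops already included in G), 2-d features.
G = [[1.0, 1.0], [1.0, 1.0]]
H0 = [[1.0, 0.0], [0.0, 1.0]]
W0 = [[1.0, -1.0], [1.0, 1.0]]
H1 = gcn_layer(G, H0, W0)
```

Stacking several such calls, each with its own weight matrix, gives the multi-layer structure described above, with the output of one layer serving as the input embedding of the next.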

Further, the α-balanced focal loss function in step S3 is specifically as follows:

wherein the probability term is a conditional probability. For a drug-disease association, r represents the disease and d represents the drug; α is the weight factor; the positive sample set $y^+$ represents the known drug-disease association pairs, the negative sample set $y^-$ represents the unknown drug-disease association pairs, and n and m represent the numbers of drugs and diseases, respectively.

Similarly, for a drug-target protein/gene association, r represents the target protein/gene and d represents the drug; α is the weight factor; the positive sample set $y^+$ represents the known drug-target protein/gene association pairs, the negative sample set $y^-$ represents the unknown drug-target protein/gene association pairs, and n and m represent the numbers of target proteins/genes and drugs, respectively.

For a target protein/gene-disease association, r represents the disease and d represents the target protein/gene; α is the weight factor; the positive sample set $y^+$ represents the known target protein/gene-disease association pairs, the negative sample set $y^-$ represents the unknown target protein/gene-disease association pairs, and n and m represent the numbers of target proteins/genes and diseases, respectively.

The α-balanced focal loss function of the invention reshapes the standard cross-entropy loss function so that training focuses on the sparse, hard-to-distinguish positive sample set and the loss weight assigned to the large number of easily distinguished negative samples is reduced.
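Because the loss formula and the definition of the weight factor are not legible in this text, the Python sketch below uses the commonly published α-balanced focal loss form as a stand-in: positives contribute -α(1-p)^γ·log(p) and negatives contribute -(1-α)·p^γ·log(1-p). The default values of `alpha` and `gamma`, the averaging over samples and the function name are assumptions and may differ from the patent's own definition.

```python
import math


def alpha_balanced_focal_loss(probs, labels, alpha=0.75, gamma=2.0, eps=1e-12):
    """Mean alpha-balanced focal loss over predicted association probabilities.

    probs  : predicted probability that each pair is associated
    labels : 1 for a known association (positive), 0 for an unknown pair (negative)
    alpha  : weight on positives (hypothetical default); negatives get 1 - alpha
    gamma  : focusing parameter (hypothetical default); 0 recovers weighted cross-entropy
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        if y == 1:
            total += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            total += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return total / len(probs)


# An easy negative (p = 0.1) contributes far less than a hard positive (p = 0.1).
easy_negative = alpha_balanced_focal_loss([0.1], [0])
hard_positive = alpha_balanced_focal_loss([0.1], [1])
```

With γ > 0, well-classified pairs are down-weighted, and with α > 0.5 the sparse positive set receives more weight than the abundant negatives, which is the behavior the paragraph above describes.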

The invention also provides a Chinese and western medicine relocation system fusing GHP and GCN, which comprises:

the multi-source multi-modal heterogeneous data acquisition module comprises: the device is used for collecting known medicine data, disease data and target protein/gene data;

the GHP-GCN Chinese and western medicine relocation model building module is used for building a GHP-GCN Chinese and western medicine relocation model and specifically comprises the following steps:

global heterogeneous pharmacological network GHP construction module: the system is used for constructing a global heterogeneous pharmacological network GHP by taking various diseases, medicines and target protein/genes as nodes according to data acquired by a multi-source multi-modal heterogeneous data acquisition module, and comprehensively representing various nonlinear association relations between the same node and heterogeneous nodes;

a global heterogeneous pharmacological network GHP and graph convolution neural network GCN fusion module: the global heterogeneous pharmacological network GHP is constructed by the global heterogeneous pharmacological network GHP construction module and is fused into the graph convolution neural network GCN, a GHP-GCN Chinese and western medicine relocation model with double reasoning capability is obtained through seamless fusion construction, and various node characteristics and complex association relations among nodes are comprehensively represented;

the GHP-GCN model optimization module: used for optimizing, with the dual weighted constraint of 'monarch, minister, assistant and guide' and 'node semantic neighbor', the GHP-GCN Chinese and western medicine relocation model obtained through seamless fusion by the global heterogeneous pharmacological network GHP and graph convolutional neural network GCN fusion module;

the training set construction and GHP-GCN model learning and inference module: used for constructing a drug-target protein/gene-disease training set, and for iteratively training and optimizing, with an α-balanced focal loss function, the GHP-GCN Chinese and western medicine relocation model that has been optimized under the weighted constraints by the GHP-GCN model optimization module;

the contribution degree calculation and result analysis module: used for, given a target disease, calculating the contribution degree of each node to the target disease through the GHP-GCN Chinese and western medicine relocation model trained and optimized by the training set construction and GHP-GCN model learning and inference module, and ranking the contribution degrees; and, according to the ranking, analyzing and outputting a prediction analysis result using heat map and cluster analysis methods, predicting potential and new drug-target protein/gene, target protein/gene-disease and drug-disease associations, identifying new drug targets and new disease-associated pathogenic genes, and relocating new candidate drugs.

Further, the system further comprises:

and the result verifying and optimizing module is used for verifying and optimizing the prediction analysis result output by the contribution degree calculating and result analyzing module.

The present invention also provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the Chinese and western medicine relocation method fusing GHP and GCN described above.

Compared with the existing drug repositioning method, the method has the following beneficial effects:

1. The Chinese and western medicine relocation method provided by the invention is based on high-throughput omics data. It constructs a global heterogeneous pharmacological network GHP composed of global node types and global topological associations, which comprehensively represents the various nonlinear association relations, such as complex mechanism-of-action associations and structural similarity associations, among drugs (traditional Chinese medicines and some western medicines), viral target proteins or genes, and disease symptoms (including symptoms at different pathological stages and drug side-effect symptoms), thereby providing strong data support for deep learning in traditional Chinese medicine relocation. A Chinese and western medicine relocation model is then constructed by fusing the global heterogeneous pharmacological network GHP with the graph convolutional neural network GCN; it comprehensively represents the various node features and the complex association relations among nodes, yielding a deep computation model GHP-GCN with the dual inference capability of the similarity-based network propagation method and the deep learning method. GHP-GCN is constrained by the dual weighting of 'monarch, minister, assistant and guide' and 'semantic neighbor'. Meanwhile, a training set is constructed and an α-balanced focal loss function is introduced to iteratively train and optimize the model, so that the rich topological and semantic information in the global heterogeneous pharmacological network is retained while the large amount of noise that may be introduced is effectively filtered out, and the GHP-GCN model is further trained and optimized.
Finally, the deep learning interpretability method based on contribution-degree cluster analysis addresses the problem that deep learning models are hard to interpret; Chinese and western medicines with good intervention effects at different pathological stages of a disease can be found efficiently and at low cost, and the compatibility rules and mechanisms of action of traditional Chinese medicine prescriptions are revealed and explained in modern terms.

2. The method can be applied to relocation research of traditional Chinese medicines and western medicines, can predict potential and new medicine-target protein/gene, target protein/gene-disease and medicine-disease association, identify new targets of the medicine and new associated pathogenic genes of the disease, discover and confirm the medicine with potential treatment effect on the disease and the integration effect and the synergistic mechanism thereof, and provide scientific basis for explaining the effectiveness of the treatment of the traditional Chinese medicines and the western medicines and discovering the medicine.

3. The global heterogeneous pharmacological network GHP is fused into the graph convolutional neural network GCN, achieving a seamless integration of the two mainstream computational approaches with inference capability in the drug relocation field, namely the similarity-based network propagation method and the deep learning method. A deep computation model GHP-GCN with dual inference capability is thereby constructed, which in particular provides dual computing capability for the discovery and identification of active ingredients of traditional Chinese medicines. The invention designs a multi-layer connected neural network structure for learning low-dimensional representations of nodes from graph-structured data. Each GCN layer aggregates the information of neighbors through the direct links of the graph and reconstructs the embedding as the input of the next layer. To build a GCN-based encoder for learning low-dimensional representations of drugs and diseases, node similarity and drug-disease association information are combined by deploying the GCN on the constructed global heterogeneous pharmacological network GHP.

4. In the GHP-GCN inference learning method based on the α-balanced focal loss function, the node feature matrix of GHP-GCN is optimized on the basis of the 'monarch, minister, assistant and guide' composition theory, node semantic neighbors rather than all neighbors are introduced to optimize the node adjacency matrix of GHP-GCN, and an α-balanced focal loss function is adopted, so that the rich topological and semantic information in the global heterogeneous pharmacological network is retained while the large amount of noise that may be introduced is effectively filtered out, making GHP-GCN inference learning robust.

5. To address the lack of interpretability of deep learning models, the output of GHP-GCN is analyzed with heat maps, clustering and similar methods to obtain the feature contribution degree and the interaction contribution degree of each drug component, so that compatibility can be optimized and mechanisms of action revealed and explained. Specifically, feature importance analysis is performed using layer-wise relevance propagation (LRP) to explain the molecular features of the Chinese and western drugs identified by the GCN, and LRP rules are used to extract the contribution of interactions in the PPI network to the classification of individual drugs. Based on the LRP feature importance scores for disease types at different pathological stages, the top 1,000 GCN predictions on the Consensus Pathway Database (CPDB) network are clustered with Kluger's spectral biclustering algorithm, revealing drugs matched to different pathological stages, each with its own therapeutic effect. Combining the information of all drugs in the network, a directed weighted graph of drug-drug LRP contributions is built and its strongly connected nodes are examined; modules containing more than two drugs are extracted and new drugs for treating the disease are determined.
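The final step of this analysis, extracting strongly connected modules from a directed weighted contribution graph, can be illustrated with a short Python sketch. It assumes that edges below a hypothetical contribution threshold are discarded before the components are computed and uses Kosaraju's two-pass algorithm; the source specifies neither the threshold nor the algorithm, and the drug names and scores are placeholders.

```python
def strongly_connected_modules(edges, threshold, min_size=3):
    """Return strongly connected components with at least min_size nodes.

    edges     : dict mapping (source, destination) to a contribution score
    threshold : hypothetical cut-off; weaker edges are ignored
    min_size  : 3 corresponds to "modules containing more than two drugs"
    """
    graph, reverse = {}, {}
    for (src, dst), weight in edges.items():
        for node in (src, dst):
            graph.setdefault(node, [])
            reverse.setdefault(node, [])
        if weight >= threshold:
            graph[src].append(dst)
            reverse[dst].append(src)

    def dfs(node, adjacency, seen, out):
        # Depth-first search that appends nodes in order of completion.
        seen.add(node)
        for nxt in adjacency[node]:
            if nxt not in seen:
                dfs(nxt, adjacency, seen, out)
        out.append(node)

    # Pass 1: record finishing order on the forward graph.
    order, seen = [], set()
    for node in graph:
        if node not in seen:
            dfs(node, graph, seen, order)

    # Pass 2: collect components on the reversed graph.
    modules, seen = [], set()
    for node in reversed(order):
        if node not in seen:
            component = []
            dfs(node, reverse, seen, component)
            if len(component) >= min_size:
                modules.append(sorted(component))
    return modules


# Placeholder drugs and contribution scores (illustrative only).
lrp_edges = {
    ("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "A"): 0.7,
    ("C", "D"): 0.6, ("D", "C"): 0.1,
}
modules = strongly_connected_modules(lrp_edges, threshold=0.5)
```

In this toy graph the cycle A, B, C forms one module, while D is excluded because its only edge back into the cycle falls below the threshold.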

Experiments show that the method markedly improves the accuracy with which Chinese and western medicines for treating a disease are recommended to medical personnel, which demonstrates its effectiveness.

Drawings

FIG. 1 is a schematic flow diagram of a method of relocation of Chinese and western medicines according to embodiment 1 of the present invention;

fig. 2 is a schematic structural diagram of a global heterogeneous pharmacological network GHP composed of global node types and global topological relations according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a deep computation model GHP-GCN fusing a global heterogeneous pharmacological network and a graph convolution neural network according to an embodiment of the present invention;

fig. 4 is a schematic structural view of a chinese-western medicine relocation system in embodiment 2 of the present invention.

Detailed Description

In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.

Example 1

As shown in fig. 1, the embodiment provides a method for relocating western and Chinese medicines by fusing GHP and GCN, which is characterized by comprising the following steps:

Optionally, the known drug data, disease data and target protein/gene data are derived from DrugBank, OMIM, GO annotations, Fdataset and other professional databases. The data representation, data source and data structure of multi-source multi-modal heterogeneous data differ; for example, the drug data may include drug structure data, drug function data, and the like.

Optionally, as shown in fig. 2, the drug nodes include the screened traditional Chinese medicines and western medicines, specifically: western medicines that treat a given disease (the target disease), traditional Chinese medicines that treat the target disease, unknown western medicines, unknown traditional Chinese medicines, western medicines that do not treat the target disease, and traditional Chinese medicines that do not treat the target disease; the drug nodes also carry the structural features, functional features and the like of the drugs. The disease nodes are composed of four dimensions: disease type, symptom stage, symptom and side-effect symptom. The target protein/gene nodes include all the screened known target proteins and genes.

Optionally, the associations between nodes of the same type are mainly similarity associations of structure or phenotype, including drug-drug similarity associations (e.g., small-molecule structural similarity between drugs), target protein/gene-target protein/gene similarity associations, and disease-disease similarity associations. The associations between target protein/gene nodes additionally include protein interaction associations from PPI.

Optionally, the associations between heterogeneous nodes are mainly mechanism-of-action associations, chiefly drug-target protein/gene associations, target protein/gene-disease associations, and disease-drug associations.
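As an illustrative sketch only, the node types and the two families of associations described above could be held in a small typed graph container. The class name `HeteroGraph`, the type labels, the `kind` labels, and the placeholder node identifiers are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field

NODE_TYPES = ("drug", "target", "disease")


@dataclass
class HeteroGraph:
    """Hypothetical container for a drug/target/disease network."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: dict = field(default_factory=dict)  # frozenset({u, v}) -> (kind, weight)

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, u, v, weight=1.0, kind=None):
        # Assumption: same-type pairs default to a similarity edge and
        # mixed-type pairs to a mechanism-of-action edge. Pass kind="ppi"
        # to record a protein interaction between two target nodes.
        if kind is None:
            kind = "similarity" if self.nodes[u] == self.nodes[v] else "mechanism"
        self.edges[frozenset((u, v))] = (kind, weight)
        return kind


# Placeholder nodes, not real drugs, targets, or diseases.
graph = HeteroGraph()
for node_id, node_type in [
    ("drug_1", "drug"), ("drug_2", "drug"),
    ("target_1", "target"), ("target_2", "target"),
    ("disease_1", "disease"),
]:
    graph.add_node(node_id, node_type)

graph.add_edge("drug_1", "drug_2", weight=0.6)      # same type: similarity
graph.add_edge("drug_1", "target_1")                # mixed type: mechanism
graph.add_edge("target_1", "target_2", kind="ppi")  # protein interaction
```

Keying edges by an unordered pair keeps the sketch undirected, which matches similarity associations; a directed variant would need ordered keys.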

S22, fusing the global heterogeneous pharmacological network GHP and the graph convolutional neural network GCN to form the GHP-GCN model: the constructed global heterogeneous pharmacological network GHP is fused into the graph convolutional neural network GCN, and a GHP-GCN Chinese and western medicine relocation model with dual reasoning capability is obtained through seamless fusion, comprehensively representing the various node features and the complex associations among nodes. Fig. 3 provides a schematic structural diagram of the deep computational model GHP-GCN that integrates the global heterogeneous pharmacological network and the graph convolutional neural network.

Fusing the global heterogeneous pharmacological network GHP into the graph convolutional neural network GCN seamlessly integrates the mainstream reasoning-capable computational methods in the field of drug relocation, namely similarity-based network propagation methods and deep learning methods. The resulting deep computational model GHP-GCN has dual reasoning capability and, in particular, provides dual computational capability for discovering and identifying the active ingredients of traditional Chinese medicines.

The present invention designs a multilayer connected neural network structure for learning low-dimensional representations of nodes from graph-structured data. Each GCN layer aggregates information from a node's neighbors through the direct links of the graph and reconstructs the embedding as the input of the next layer. To build a GCN-based encoder for learning low-dimensional representations of drugs and diseases, node similarity information and drug-disease association information are combined by deploying GCNs on the constructed global heterogeneous pharmacological network.

As shown in Fig. 3, the GHP-GCN Chinese and western medicine relocation model constructed in this embodiment has a multilayer connected neural network structure, specifically including an input layer, a hidden layer, and an output layer. Each layer aggregates information from neighbors through the direct links of the graph and reconstructs the embedding as the input of the next layer. The propagation between layers is expressed by the following formula:

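The layer-wise propagation formula is not reproduced in this text. As a hedged illustration only, the sketch below assumes the widely used GCN rule $H^{(l+1)} = \mathrm{ReLU}\big(\hat{D}^{-1/2}(A+I)\hat{D}^{-1/2} H^{(l)} W^{(l)}\big)$, which may differ from the patent's own formula. The function names, the ReLU activation, and the toy matrices are assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features during aggregation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(adj_norm, features, weights):
    """One layer: aggregate neighbor features, project them, apply ReLU."""
    aggregated = matmul(adj_norm, features)
    projected = matmul(aggregated, weights)
    return [[max(0.0, value) for value in row] for row in projected]


# Toy 3-node path graph with 2 input features and 1 output feature.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[1.0],
           [-1.0]]

adj_norm = normalize_adjacency(adj)
hidden = gcn_layer(adj_norm, features, weights)
```

Plain lists keep the sketch dependency-free; a practical implementation would use a tensor library and learn `weights` by gradient descent.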

Optionally, in this embodiment, optimizing the GHP-GCN Chinese and western medicine relocation model with the dual weighted constraints of 'monarch, minister, assistant and guide' and 'node semantic nearest neighbor' specifically includes: optimizing the node feature matrix of the GHP-GCN Chinese and western medicine relocation model according to the 'monarch, minister, assistant and guide' theory of traditional Chinese medicine composition, and optimizing the node adjacency matrix of the GHP-GCN Chinese and western medicine relocation model by using 'node semantic nearest neighbors'.

The similarity associations between nodes of the same type are represented by matrices and constrained by semantic nearest neighbors, specifically as follows:

$A^t$ is defined as:

where $S^t(i, j)$ is the drug similarity matrix.

(2) The similarity association between target proteins/genes is represented by a matrix $A^p \in \mathbb{R}^{l \times l}$, where each entry of $A^p$ is constructed from the similarity of a pair of target proteins/genes. These similarities are given by the $l \times l$ matrix $S^p$, whose $(i, j)$ entry $S^p(i, j)$ is the similarity between target protein/gene $p_i$ and target protein/gene $p_j$. To obtain more accurate results by avoiding noisy information, the model uses only the k nearest neighbors (k-nearest) rather than all similar neighbors considered in previous studies.

$A^p$ is defined as:

where $S^p(i, j)$ is the target protein/gene similarity matrix.

(3) The disease-disease similarity association is represented by a matrix $A^d \in \mathbb{R}^{m \times m}$, where each entry of $A^d$ is constructed from the similarity of a pair of diseases. These similarities are given by the $m \times m$ matrix $S^d$, whose $(i, j)$ entry $S^d(i, j)$ is the similarity between disease $d_i$ and disease $d_j$. To obtain more accurate results by avoiding noisy information, the model uses only the k nearest neighbors (k-nearest) rather than all similar neighbors considered in previous studies.

$A^d$ is defined as:

where $S^d(i, j)$ is the disease similarity matrix.
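The defining formulas for the constrained matrices are not reproduced in this text. As a hedged sketch, the k-nearest-neighbor constraint can be read as keeping $S(i, j)$ only when node $j$ is among the $k$ most similar neighbors of node $i$, and setting the entry to 0 otherwise. That reading, the function name `knn_adjacency`, and the toy similarity values are assumptions.

```python
def knn_adjacency(sim, k):
    """Keep, for each row i, only the k largest off-diagonal similarities."""
    n = len(sim)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Candidate neighbors exclude the node itself.
        others = [j for j in range(n) if j != i]
        # Sort by descending similarity; ties are broken by index so the
        # result is deterministic.
        nearest = sorted(others, key=lambda j: (-sim[i][j], j))[:k]
        for j in nearest:
            adj[i][j] = sim[i][j]
    return adj


# Toy symmetric similarity matrix for four nodes of one type.
sim = [[1.0, 0.9, 0.2, 0.1],
       [0.9, 1.0, 0.8, 0.3],
       [0.2, 0.8, 1.0, 0.7],
       [0.1, 0.3, 0.7, 1.0]]

adj_k1 = knn_adjacency(sim, k=1)
```

Note that the filtered matrix need not be symmetric: node 2 keeps node 1 as its nearest neighbor, while node 1 keeps node 0. Whether to symmetrize afterwards is a design choice the text does not settle.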

Similar to the drug-target protein/gene association, the target protein/gene-disease and drug-disease associations are mechanism-of-action associations, and they are likewise represented by matrices and constrained by weights.
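The claims state that when two heterogeneous nodes have a known interaction, an edge of weight 1 is added between them. A minimal sketch of that rule follows, together with one possible way of stacking a drug similarity block, a disease similarity block, and the drug-disease association block into a single adjacency matrix. The block layout and the names `association_matrix` and `block_adjacency` are assumptions, not the patent's construction.

```python
def association_matrix(n_rows, n_cols, known_pairs):
    """Binary matrix with 1.0 at every known (row, col) interaction."""
    mat = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j in known_pairs:
        mat[i][j] = 1.0  # known interaction: edge weight set to 1
    return mat


def block_adjacency(a_drug, a_disease, drug_disease):
    """Assemble [[A_drug, B], [B^T, A_disease]] as a list of rows."""
    n, m = len(a_drug), len(a_disease)
    top = [a_drug[i] + drug_disease[i] for i in range(n)]
    bottom = [[drug_disease[i][j] for i in range(n)] + a_disease[j] for j in range(m)]
    return top + bottom


# Two placeholder drugs, one placeholder disease; drug 1 is known to treat it.
a_drug = [[0.0, 0.5],
          [0.5, 0.0]]
a_disease = [[0.0]]
drug_disease = association_matrix(2, 1, [(1, 0)])

full = block_adjacency(a_drug, a_disease, drug_disease)
```

The same pattern extends to three node types by adding target protein/gene blocks.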

The constructed drug-target protein/gene-disease training set includes the features of all nodes, the associations between nodes of the same type, and the associations between heterogeneous nodes. Associations between nodes of the same type are mainly similarity associations (for example, small-molecule structural similarity between drugs), while the target protein/gene nodes include not only structural similarity associations but also protein interaction associations from PPI.

Optionally, the α-balanced focal loss function is specifically as follows:

wherein:

For target protein/gene-disease association, r represents disease, d represents target protein/gene, weight factor

This embodiment provides an α-balanced focal loss function
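The loss formula itself is not reproduced in this text. As a hedged sketch, the code below assumes the standard α-balanced focal loss for binary labels: $-\alpha (1-p)^{\gamma} \log p$ for a positive sample and $-(1-\alpha)\, p^{\gamma} \log(1-p)$ for a negative sample, averaged over all pairs. The function name, the default values of `alpha` and `gamma`, and the toy probabilities are assumptions and may differ from the patent's formulation.

```python
import math


def alpha_balanced_focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-12):
    """Mean alpha-balanced focal loss over predicted probabilities.

    probs  -- predicted association probabilities in [0, 1]
    labels -- 1 for a known association pair, 0 for an unknown pair
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            # Well-classified positives (p near 1) are down-weighted.
            total += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            # Well-classified negatives (p near 0) are down-weighted.
            total += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return total / len(probs)


easy = alpha_balanced_focal_loss([0.95, 0.05], [1, 0])
hard = alpha_balanced_focal_loss([0.30, 0.70], [1, 0])
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check.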

S4, calculating contribution degrees and analyzing results: given a target disease, the contribution degree of each node to the target disease is calculated by the GHP-GCN Chinese and western medicine relocation model, and the contribution degrees are ranked; according to the contribution ranking, heat maps and cluster analysis are used to analyze and output the prediction results, predicting potential and new drug-target protein/gene, target protein/gene-disease, and drug-disease associations, identifying new drug targets and new disease-associated pathogenic genes, and relocating new candidate drugs.

The contribution ranking generally includes a target disease-associated pathogenic gene ranking, a drug candidate ranking, and the like.
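A minimal sketch of the ranking step, assuming the model's output is available as a mapping from node name to contribution score. Already-validated entries are excluded before the top candidates are reported, as is done in the application example. The function name `rank_novel_candidates` and the placeholder names and scores are hypothetical.

```python
def rank_novel_candidates(contributions, known, top_n=10):
    """Return the top_n (name, score) pairs not already in `known`."""
    novel = [(name, score) for name, score in contributions.items()
             if name not in known]
    # Highest contribution first; ties broken alphabetically for stability.
    novel.sort(key=lambda item: (-item[1], item[0]))
    return novel[:top_n]


# Placeholder scores, not results from the patent.
contributions = {"gene_a": 0.9, "gene_b": 0.7, "gene_c": 0.8, "gene_d": 0.1}
validated = {"gene_a"}

top_two = rank_novel_candidates(contributions, validated, top_n=2)
```

The same function applies unchanged to a candidate drug ranking.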

The Chinese and western medicine relocation method fusing GHP and GCN provided in this example uses layer-wise relevance propagation (LRP) for feature importance analysis to explain the molecular features of the western and traditional Chinese medicines identified by the graph convolutional neural network (GCN), and uses the LRP rules to extract the contribution of interactions in the PPI network to the classification of individual drugs. The top 1,000 predictions of the GCN on the Consensus Pathway Database (CPDB) network were clustered with Kluger's spectral biclustering algorithm, based on the LRP feature importance scores for disease types of different pathological processes, revealing evidence-matched drugs for the different pathological processes, each with a unique therapeutic effect. Combining the information of all drugs in the network, a directed weighted graph of drug-drug LRP contributions is established, the strongly connected components of the graph are investigated, modules containing more than two drugs are extracted, and new drugs for treating the disease are determined.
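The module-extraction step can be sketched as finding strongly connected components in the directed drug-drug contribution graph and keeping those with more than two drugs. The sketch below uses Tarjan's algorithm as one standard choice; the source does not say which algorithm is used. The `threshold` parameter, the function name, and the placeholder drug names and weights are assumptions.

```python
def strongly_connected_modules(edges, min_size=3, threshold=0.0):
    """Return strongly connected components with at least min_size nodes.

    edges -- dict: source -> dict of target -> contribution weight
    Only edges with weight > threshold are kept (hypothetical filter).
    """
    graph = {}
    for src, targets in edges.items():
        graph.setdefault(src, set())
        for dst, weight in targets.items():
            graph.setdefault(dst, set())
            if weight > threshold:
                graph[src].add(dst)

    index, low, on_stack, stack, modules = {}, {}, set(), [], []
    counter = [0]

    def visit(node):
        # Tarjan's algorithm: track discovery order and lowest reachable index.
        index[node] = low[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            if len(component) >= min_size:
                modules.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return modules


# Placeholder contribution graph: a 3-cycle, a feeder node, and a 2-cycle.
lrp_edges = {
    "drug_a": {"drug_b": 0.4},
    "drug_b": {"drug_c": 0.3},
    "drug_c": {"drug_a": 0.2},
    "drug_d": {"drug_a": 0.5},
    "drug_e": {"drug_f": 0.1},
    "drug_f": {"drug_e": 0.1},
}
modules = strongly_connected_modules(lrp_edges)
```

The recursive form is fine for small graphs; a large network would call for an iterative variant to avoid Python's recursion limit.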

The Chinese and western medicine relocation method provided by this embodiment is based on high-throughput omics data. It constructs a global heterogeneous pharmacological network GHP composed of global node types and global topological associations, and comprehensively characterizes the various nonlinear associations, such as complex mechanism-of-action associations and structural similarity associations, among drugs (traditional Chinese medicines and some western medicines), viral target proteins or genes, and disease symptoms (including symptoms at different pathological stages and drug side-effect symptoms), thereby providing strong data support, in particular for deep learning in traditional Chinese medicine relocation. The global heterogeneous pharmacological network GHP is then fused with the graph convolutional neural network GCN to construct the Chinese and western medicine relocation model, which comprehensively represents the various node features and the complex associations among nodes, yielding a deep computational model GHP-GCN with the dual reasoning capability of similarity-based network propagation methods and deep learning methods. The method constrains the GHP-GCN with the dual weighting of 'monarch, minister, assistant and guide' and 'semantic nearest neighbor'. In addition, an α-balanced focal loss function is introduced to iteratively train and optimize the model, so that the rich topological and semantic information in the global heterogeneous pharmacological network is retained while the large amount of noise that might otherwise be introduced is effectively filtered out.

Finally, the deep learning interpretability method based on contribution clustering analysis addresses the lack of interpretability of deep learning models, searches efficiently and at low cost for Chinese and western medicines with a good intervention effect on different pathological stages of a disease, and reveals and explains, in modern language, the compatibility rules and mechanisms of action of traditional Chinese medicine prescriptions. The method can be applied to relocation research on traditional Chinese medicines and western medicines. It can predict potential and new drug-target protein/gene, target protein/gene-disease, and drug-disease associations, identify new drug targets and new disease-associated pathogenic genes, discover and confirm drugs with a potential therapeutic effect on diseases together with their integrative action and synergistic mechanisms, and provide a scientific basis for explaining the effectiveness of Chinese and western medicine treatment and for drug discovery.

Example 2

As shown in Fig. 4, this embodiment provides a Chinese and western medicine relocation system 20 fusing GHP and GCN, which adopts the Chinese and western medicine relocation method fusing GHP and GCN described in Embodiment 1, and includes:

multi-source multi-modal heterogeneous data acquisition module 21: configured to collect known drug data, disease data, and target protein/gene data;

GHP-GCN Chinese and western medicine relocation model building module 22: configured to build the GHP-GCN Chinese and western medicine relocation model, and specifically including:

global heterogeneous pharmacological network GHP construction module 221: configured to construct, from the data acquired by the multi-source multi-modal heterogeneous data acquisition module 21, a global heterogeneous pharmacological network GHP with the various diseases, drugs, and target proteins/genes as nodes, so as to comprehensively represent the various nonlinear associations between nodes of the same type and between heterogeneous nodes;

global heterogeneous pharmacological network GHP and graph convolutional neural network GCN fusion module 222: configured to fuse the global heterogeneous pharmacological network GHP constructed by the GHP construction module 221 into the graph convolutional neural network GCN, obtaining through seamless fusion a GHP-GCN Chinese and western medicine relocation model with dual reasoning capability, so as to comprehensively represent the various node features and the complex associations among nodes;

GHP-GCN model optimization module 223: configured to optimize, through the dual weighted constraints of 'monarch, minister, assistant and guide' and 'node semantic nearest neighbor', the GHP-GCN Chinese and western medicine relocation model obtained through seamless fusion by the fusion module 222;

specifically, configured to optimize the node feature matrix of the GHP-GCN Chinese and western medicine relocation model according to the 'monarch, minister, assistant and guide' theory of traditional Chinese medicine composition, and to optimize the node adjacency matrix of the GHP-GCN Chinese and western medicine relocation model by using 'node semantic nearest neighbors';

training set construction and GHP-GCN model learning and reasoning module 23: configured to construct a drug-target protein/gene-disease training set, and to iteratively train and optimize, through an α-balanced focal loss function, the GHP-GCN Chinese and western medicine relocation model optimized under weighted constraints by the GHP-GCN model optimization module 223;

contribution calculation and result analysis module 24: configured to calculate, for a given target disease, the contribution degree of each node to the target disease by using the GHP-GCN Chinese and western medicine relocation model trained and optimized by the training set construction and GHP-GCN model learning and reasoning module 23, and to rank the contribution degrees; and, according to the contribution ranking, to analyze and output the prediction results by using heat maps and cluster analysis, predicting potential and new drug-target protein/gene, target protein/gene-disease, and drug-disease associations, identifying new drug targets and new disease-associated pathogenic genes, and relocating new candidate drugs;

result verification and tuning module 25: configured to verify and tune the prediction results output by the contribution calculation and result analysis module 24.

The system can achieve all the technical effects of the relocation method shown in embodiment 1.

Example 3

An embodiment of the present invention provides a computer storage medium, on which a computer program is stored, wherein the computer program, when executed by an executor, implements a method for relocating traditional Chinese and western medicines by fusing GHP and GCN as shown in fig. 1.

The computer-readable storage medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc read-only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other types of media/machine-readable media suitable for storing machine-executable instructions. The computer-readable storage medium may be a product that has not been connected to a computer device, or a component used by a computer device to which it is connected.

Example 4: application examples

In order to verify the effectiveness of the relocation method proposed by the present invention, the method provided in the examples was applied to predict new associated pathogenic genes of breast cancer and relocate new drug candidates.

Breast cancer is a common malignancy, usually occurring in women, and is a cancer that develops from breast tissue, with signs that may include breast lumps, changes in the shape of the breast, depressed skin, nipple discharge or red, flaky skin.

According to the OMIM database records, breast cancer (MIM:114480) has 23 validated disease genes: RAD54L, CASP8, BARD1, PIK3CA, HMMR, NQO2, ESR1, RB1CC1, SLC22A1L, TSG101, ATM, KRAS, BRCA2, XRCC3, AKT1, RAD51A, PALB2, CDH1, TP53, PHB, PPM1D, BRIP1, and CHEK2.

The GHP-GCN Chinese and western medicine relocation model for breast cancer was trained on all known drug-target, target-disease, and drug-disease pairs in DrugBank, OMIM, GO annotations, and the Fdataset database.

The optimized model was trained and used for prediction by the method provided in Example 1. After the validated disease genes were excluded from the predicted disease gene ranking for breast cancer, the ten top-ranked new disease genes were MAP2K1, MAP3K1, TPX2, IKBKB, RAD51C, MSH2, STK11, NOD1, CHUK, and VRK1. The protein encoded by MAP2K1 is a member of the dual-specificity protein kinase family, and this kinase is involved in various cellular processes such as proliferation, differentiation, and transcriptional regulation. The fifth-ranked prediction is the RAD51C gene, one of four genes localized to a region of chromosome 17q23 that is frequently amplified in breast tumors. Recent research literature shows that MAP2K1 has a definite association with breast cancer specificity and mortality, and that mutations in RAD51C may explain part of hereditary breast cancer susceptibility. This shows that some of the new pathogenic genes predicted by the present invention are strongly consistent with existing experimental research.

The method of the invention also relocated Metigestrona, estramustine, and mitoxantrone as candidates for the treatment of breast cancer, and these were validated against various evidence from authoritative sources and clinical trials. The first candidate relocated by the method for the treatment of breast cancer is Metigestrona, a long-acting contraceptive that has also been used to treat breast and endometrial tumors. The method also found that estramustine, an antineoplastic drug mainly used to treat prostate tumors, is associated with breast cancer. A previous clinical trial report showed that estramustine gave encouraging results in the treatment of metastatic breast cancer and is worth studying in earlier clinical settings, which supports the prediction. In addition, the method found that mitoxantrone is associated with breast cancer, which is supported by DB (short for DrugBank), CTD (Common Technical Document, the common technical document for the registration of drugs for human use), DrugCentral (a western medicine database), and clinical trials.

In conclusion, the new drug candidates for breast cancer relocated by the method of the invention are validated by various evidence from authoritative sources and clinical trials. These results show the effectiveness of the present invention.

The above description is only exemplary of the present invention and should not be taken as limiting the invention, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A Chinese and western medicine relocation method fusing GHP and GCN is characterized by comprising the following steps:

S2, constructing a GHP-GCN Chinese and western medicine relocation model, wherein the specific construction method comprises the following steps:

2. The method of relocating Chinese and western medicines in combination with GHP and GCN as claimed in claim 1,

in step S21, the drug nodes include the screened traditional Chinese medicines and western medicines, specifically including western medicines that treat a given disease, traditional Chinese medicines that treat the given disease, unknown western medicines, unknown traditional Chinese medicines, western medicines that do not treat the given disease, and traditional Chinese medicines that do not treat the given disease; the disease nodes are composed of four dimensions: disease type, symptom stage, symptom, and side-effect symptom; the target protein/gene nodes include all known target proteins and genes that have been screened.

3. The method of relocating Chinese and western medicines in combination with GHP and GCN as claimed in claim 1,

in step S21, the associations between nodes of the same type are mainly similarity associations of structure or phenotype, including drug-drug similarity associations, target protein/gene-target protein/gene similarity associations, and disease-disease similarity associations, and the associations between target protein/gene nodes also include protein interaction associations from PPI;

the similarity associations between nodes of the same type are represented by matrices and constrained by semantic nearest neighbors.

4. The method of relocating Chinese and western medicines in combination with GHP and GCN as claimed in claim 1,

5. The method of relocating Chinese and western medicines in combination with GHP and GCN as claimed in claim 4,

and if two heterogeneous nodes have a known interaction, an edge is added between the two nodes, and the weight of the edge is set to 1.

6. The method for relocating Chinese and western medicines in combination with GHP and GCN as claimed in any one of claims 1 to 5,

in step S23, optimizing the GHP-GCN Chinese and western medicine relocation model with the dual weighted constraints of 'monarch, minister, assistant and guide' and 'node semantic nearest neighbor' specifically comprises: optimizing the node feature matrix of the GHP-GCN Chinese and western medicine relocation model according to the 'monarch, minister, assistant and guide' theory of traditional Chinese medicine composition; and optimizing the node adjacency matrix of the GHP-GCN Chinese and western medicine relocation model by using 'node semantic nearest neighbors'.

7. The method for relocating Chinese and western medicines in combination with GHP and GCN as claimed in any one of claims 1 to 5,

the constructed GHP-GCN Chinese and western medicine relocation model has a multilayer connected neural network structure, specifically comprising an input layer, a hidden layer, and an output layer, wherein each layer aggregates information from neighbors through the direct links of the graph and reconstructs the embedding as the input of the next layer; the propagation between layers is expressed by the following formula:

8. The method for relocating Chinese and western medicines in combination with GHP and GCN as claimed in any one of claims 1 to 5,

the α-balanced focal loss function in step S3 is specifically as follows:

wherein:

The positive samples represent known drug-disease association pairs, the negative samples represent unknown drug-disease association pairs, and n and m respectively represent the numbers of drugs and diseases.

9. A Chinese and western medicine relocation system fusing GHP and GCN is characterized by comprising:

a global heterogeneous pharmacological network GHP construction module: configured to construct, from the data acquired by a multi-source multi-modal heterogeneous data acquisition module, a global heterogeneous pharmacological network GHP with the various diseases, drugs, and target proteins/genes as nodes, so as to comprehensively represent the various nonlinear associations between nodes of the same type and between heterogeneous nodes;

a global heterogeneous pharmacological network GHP and graph convolutional neural network GCN fusion module: configured to fuse the global heterogeneous pharmacological network GHP constructed by the GHP construction module into the graph convolutional neural network GCN, obtaining through seamless fusion a GHP-GCN Chinese and western medicine relocation model with dual reasoning capability, so as to comprehensively represent the various node features and the complex associations among nodes;

a contribution calculation and result analysis module: configured to calculate, for a given target disease, the contribution degree of each node to the target disease by using the GHP-GCN Chinese and western medicine relocation model trained and optimized by a training set construction and GHP-GCN model learning and reasoning module, and to rank the contribution degrees; and, according to the contribution ranking, to analyze and output the prediction results by using heat maps and cluster analysis, predicting potential and new drug-target protein/gene, target protein/gene-disease, and drug-disease associations, identifying new drug targets and new disease-associated pathogenic genes, and relocating new candidate drugs;

10. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by an executor, implements the Chinese and western medicine relocation method fusing GHP and GCN as claimed in any one of claims 1 to 7.