CN113012770A

CN113012770A - Drug-drug interaction event prediction method, system, terminal and readable storage medium based on multi-modal deep neural network

Info

Publication number: CN113012770A
Application number: CN202110287239.1A
Authority: CN
Inventors: 高建良; 吕腾飞
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2021-06-22
Anticipated expiration: 2041-03-17
Also published as: CN113012770B

Abstract

The invention discloses a method, a system, a terminal and a readable storage medium for predicting drug-drug interaction events based on a multi-modal deep neural network. The method comprises the following steps: acquiring drug-drug interaction events, drug heterogeneous features and a drug knowledge graph; obtaining the topological embedded representation of each drug in the knowledge graph and the similarity embedded representation of each drug; concatenating and fusing the topological embedded representation and the similarity embedded representation to construct a drug-drug interaction event prediction model, and training the model with the drug-drug interaction events in the sample; and predicting drug-drug interaction events with the trained model, wherein the topological embedded representations in the knowledge graph and the similarity embedded representations of the drugs to be predicted are input into the prediction model to obtain the drug-drug interaction event prediction result. The method exploits the correlation and complementarity among features of different modalities to improve the accuracy of the prediction result.

Description

Drug-drug interaction event prediction method, system, terminal and readable storage medium based on multi-modal deep neural network

Technical Field

The invention belongs to the technical field of drug design and medicine, and particularly relates to a method, a system, a terminal and a readable storage medium for predicting drug-drug interaction events based on a multi-modal deep neural network.

Background

As the number of available drugs grows rapidly, ensuring that drugs taken together to treat a disease are safe in combination has become important. Drug-drug interaction (DDI) prediction addresses the following problem: when several drugs are taken simultaneously, adverse interactions may occur because of the molecules of the compounds that make up the drugs, the proteins targeted by the drugs, the targeted pathways, and the action of different enzymes, causing harm to patients or enormous medical expenses. DDIs may also lead to different biological consequences and events. Accurate prediction of DDI events has therefore become a clinically important task that helps clinicians make effective decisions and establish appropriate treatment regimens. In addition, proper combined use of drugs can minimize the treatment risk for the patient and deliver the synergistic benefits of the drugs.

Drug-drug interaction events arise from a variety of factors, including the molecules of the compounds that make up the drug, the proteins targeted by the drug, the targeted pathways, and the action of enzymes. Each factor can be combined with the drugs into an independent feature matrix, and aggregating the information in several feature matrices can greatly help the drug-drug interaction event prediction task. However, the task remains a serious challenge because adequate clinical data and knowledge are lacking. Research that uncovers potential drug-drug interaction events therefore has a significant impact on improving healthcare and strengthening pharmacovigilance.

However, most existing methods analyze the multi-source features of drugs independently of one another, so the reliability of their drug-drug interaction predictions still needs to be improved.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a drug-drug interaction event prediction method based on a multi-modal deep neural network, which analyzes the correlation and complementarity among features of different modalities to improve the accuracy of DDI event prediction.

In one aspect, the invention provides a method for predicting a drug-drug interaction event based on a multi-modal deep neural network, which comprises the following steps:

step S1: acquiring drug data as a sample, wherein the drug data comprises drug-drug interaction events, drug heterogeneous features and a drug knowledge graph constructed based on drug attributes;

step S2: obtaining the topological embedded representation of each drug in the knowledge graph based on the drug knowledge graph, and performing similarity calculation on the drug heterogeneous features to obtain the similarity embedded representation of each drug;

step S3: concatenating and fusing the topological embedded representation of the drug in the knowledge graph and the similarity embedded representation of the drug to construct a drug-drug interaction event prediction model, and training the model with the drug-drug interaction events in the sample of step S1 to obtain the trained drug-drug interaction event prediction model;

step S4: predicting drug-drug interaction events with the drug-drug interaction event prediction model obtained in step S3,

wherein the topological embedded representations in the knowledge graph and the similarity embedded representations of the drugs in the drug pair to be predicted are input into the drug-drug interaction event prediction model to obtain the drug-drug interaction event prediction result.

The technical idea of the invention is as follows: a drug knowledge graph is constructed, and a graph neural network is designed to obtain the topological embedded representation of each drug from the topological structure and semantic information of the graph; the heterogeneous features of the drugs are collected, and their similarities are calculated to serve as the similarity embedded representation of each drug; the topological embedded representation and the similarity embedded representation are then concatenated, multi-modal feature analysis is carried out through several fully connected layers, and the drug-drug interaction event is predicted.

Multi-modal features are often complementary, and analyzing the correlation and complementarity among features of different modalities can improve the accuracy of DDI event prediction. In addition, the drug knowledge graph contains rich information about each drug, including topological structure information and semantic information, whose importance cannot be neglected. The invention therefore uses both kinds of information to improve the reliability of the prediction result.

Optionally, the drug-drug interaction event prediction model is as follows:

$$\hat{y}_{ij} = \sigma\left(W_3\left(z_{d_i} \,\|\, z_{d_j}\right) + b_3\right)$$

where $\hat{y}_{ij}$ denotes the predicted drug-drug interaction event for drugs $d_i$ and $d_j$, $z_{d_i}$ and $z_{d_j}$ are the final embedded representations of drugs $d_i$ and $d_j$ respectively, $\|$ denotes the concatenation operation, $\sigma$ is the non-linear activation function softmax, and $W_3$, $b_3$ are trainable weights and biases.

Optionally, if the drug heterogeneous features comprise substructure $F^s$, target $F^t$ and enzyme $F^e$, the similarity calculation performed on the drug heterogeneous features in step S2 yields the similarity embedded representation of a drug as follows:

$$h_{d_i} = e^{s}_{d_i} \,\|\, e^{t}_{d_i} \,\|\, e^{m}_{d_i}$$

where the substructure similarity embedding $e^{s}_{d_i}$, the target similarity embedding $e^{t}_{d_i}$ and the enzyme similarity embedding $e^{m}_{d_i}$ of drug $d_i$ are the data corresponding to $d_i$ in $E^s$, $E^t$ and $E^m$, which are the substructure similarity matrix, the target similarity matrix and the enzyme similarity matrix respectively;

the elements of the substructure similarity matrix, the target similarity matrix and the enzyme similarity matrix are calculated from the heterogeneous features of each drug pair using a similarity formula.

Optionally, the elements of the substructure similarity matrix, the target similarity matrix and the enzyme similarity matrix are obtained by Jaccard similarity calculation, where the Jaccard similarity formula is as follows:

$$J(F_i, F_j) = \frac{|F_i \cap F_j|}{|F_i \cup F_j|}$$

where $J(F_i, F_j)$ denotes the similarity value between the heterogeneous features of two drugs, and $F_i$, $F_j$ are the heterogeneous features of the two drugs respectively.

Optionally, the process in step S2 of obtaining the topological embedded representation of a drug in the knowledge graph based on the drug knowledge graph is as follows:

first, initializing the embedded representation of each node in the drug knowledge graph, wherein the nodes comprise drug nodes and attribute nodes;

then, calculating the weight scores of the edges between drug nodes and attribute nodes in the drug knowledge graph according to the following formula:

$$\pi^{d_i}_{r_{in}} = \mathrm{sum}\left(W_1\left(e_{d_i} \odot e_{r_{in}}\right) + b_1\right)$$

where $\pi^{d_i}_{r_{in}}$ is the weight score of the edge between drug node $d_i$ and the corresponding attribute node, $e_{d_i}$ and $e_{r_{in}}$ respectively denote the embedded representations of drug $d_i$ and relation $r_{in}$, the relation $r_{in}$ is the functional description of the attribute node, $\odot$ denotes the Hadamard product, sum is a summation function, and $W_1$ and $b_1$ respectively denote trainable weights and biases;

next, calculating the neighbor embedded representation of the drug node according to the weight score of each neighbor node:

$$e_{N(d_i)} = \sum_{t_n \in N_s(d_i)} \pi^{d_i}_{r_{in}}\, e_{t_n}$$

where $N_s(d_i)$ denotes the sampled set of neighbor nodes of drug $d_i$, $e_{N(d_i)}$ denotes the neighbor embedded representation of drug node $d_i$, and $e_{t_n}$ denotes the initial embedded representation of neighbor node $t_n$ of drug $d_i$;

finally, obtaining the topological embedded representation of drug $d_i$ in the knowledge graph:

$$g_{d_i} = \mathrm{ReLU}\left(W_2\left(e_{d_i} \,\|\, e_{N(d_i)}\right) + b_2\right)$$

where $\|$ denotes the concatenation operation, ReLU is an activation function, and $W_2$ and $b_2$ are trainable weights and biases. The topological embedded representation $g_{d_j}$ of drug $d_j$ is obtained in the same way.

Optionally, the drug knowledge graph constructed based on drug attributes in step S1 is represented as follows:

$$G = \{(d, r_{dt}, t) \mid d \in D,\ r_{dt} \in R,\ t \in T\}$$

where $D$ denotes the drug set, $R$ denotes the relation set, $T$ denotes the attribute node set of the drugs, and $d$, $r_{dt}$ and $t$ denote a drug, a relation and an attribute node respectively.

In a second aspect, the invention further provides a system based on the foregoing method, comprising:

a drug data acquisition module, used for collecting drug-drug interaction events and drug heterogeneous features;

a drug knowledge graph construction module, used for constructing a drug knowledge graph based on drug attributes;

a drug embedded representation acquisition module, used for obtaining the topological embedded representation of each drug in the knowledge graph based on the drug knowledge graph, and for performing similarity calculation on the drug heterogeneous features to obtain the similarity embedded representation of each drug;

a prediction model construction and training module, used for concatenating and fusing the topological embedded representation of the drug in the knowledge graph and the similarity embedded representation of the drug to construct a drug-drug interaction event prediction model, and for training the model with the collected drug-drug interaction events to obtain the trained drug-drug interaction event prediction model;

and a prediction module, used for predicting drug-drug interaction events with the drug-drug interaction event prediction model obtained in step S3.

In a third aspect, the invention provides a terminal comprising a processor and a memory, wherein the memory stores a computer program, and the processor calls the computer program to execute the steps of the drug-drug interaction event prediction method based on a multi-modal deep neural network.

In a fourth aspect, the invention also provides a readable storage medium storing a computer program, the computer program being called by a processor to execute the steps of the drug-drug interaction event prediction method based on a multi-modal deep neural network.

Advantageous effects

1. The invention provides a drug-drug interaction event prediction method based on a multi-modal deep neural network, which concatenates the multi-modal embedded representations of the drugs and efficiently analyzes the correlation and complementarity of the drug embedded representations in different modalities, yielding better drug-drug interaction event prediction.

2. The invention constructs a drug knowledge graph and designs a graph neural network that obtains the embedded representation of each drug from the topological structure information and semantic information in the drug knowledge graph, improving the accuracy of drug-drug interaction event prediction.

3. The invention takes the different heterogeneous features of drugs into account, so that drug-drug interactions can be analyzed from multiple angles and the accuracy of the drug embedded representations is improved.

Drawings

FIG. 1 is an overall framework diagram of a drug-drug interaction event prediction method based on a multi-modal deep neural network provided by the invention.

Fig. 2 is a schematic flow chart of a drug-drug interaction event prediction method based on a multi-modal deep neural network provided by the invention.

Figure 3 is a schematic of drug-drug interaction events.

Detailed Description

The invention provides a drug-drug interaction event prediction method based on a multi-modal deep neural network. The invention is further described below with reference to the following examples.

FIG. 1 and FIG. 2 show the overall framework of the method. The drug-drug interaction event prediction method based on a multi-modal deep neural network provided by the invention comprises the following steps:

step S1: acquiring drug data as a sample, the drug data comprising drug-drug interaction events, drug heterogeneous features, and a drug knowledge graph constructed based on drug attributes.

The specific flow is as follows.

FIG. 3 is a schematic diagram of drug-drug interaction events. The data set used by the invention is obtained mainly from DrugBank and consists of three parts: drug-drug interaction events, drug heterogeneous features, and the drug knowledge graph.

A. Drug-drug interaction events

The drug-drug interaction event matrix is $Y = [y_{ij}]$ of size $N_d \times N_d$, where $N_d$ denotes the number of drugs and $y_{ij}$ denotes the interaction event between drug $i$ and drug $j$. The interaction events between drug pairs are encoded as numbered labels. For example, when drug $i$ (Bomacillin) and drug $j$ (dabrafenib) are taken together, the patient's serum concentration may be reduced, so $y_{ij}$ is the number assigned to the "reduced serum concentration" label.
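As a minimal sketch of the encoding described above, the following Python builds a label matrix from interaction records. The function name `encode_events`, the placeholder drug names and event strings, the use of -1 for pairs without a recorded event, and the symmetric fill are all assumptions of this illustration and are not specified in the source.

```python
def encode_events(records):
    """Map (drug_i, drug_j, event text) records to an N_d x N_d label matrix."""
    drugs = sorted({d for i, j, _ in records for d in (i, j)})
    events = sorted({e for _, _, e in records})
    drug_idx = {d: k for k, d in enumerate(drugs)}
    event_idx = {e: k for k, e in enumerate(events)}
    n = len(drugs)
    # -1 marks pairs with no recorded event (an assumption of this sketch).
    Y = [[-1] * n for _ in range(n)]
    for i, j, e in records:
        a, b = drug_idx[i], drug_idx[j]
        # Interactions are treated as symmetric here (also an assumption).
        Y[a][b] = Y[b][a] = event_idx[e]
    return Y, drug_idx, event_idx


# Hypothetical records; real data would come from DrugBank.
records = [
    ("drugA", "drugB", "serum concentration decreased"),
    ("drugA", "drugC", "metabolism increased"),
]
Y, drug_idx, event_idx = encode_events(records)
```

Each distinct event description receives one integer label, which is the form a multi-class classifier expects as its target.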

B. Drug heterogeneous features

The heterogeneous features of a drug mainly include substructure $F^s$, target $F^t$ and enzyme $F^e$.

$F^{s}_{d_i} = (s_0, \dots, s_n)$, $F^{t}_{d_i} = (t_0, \dots, t_m)$ and $F^{e}_{d_i} = (m_0, \dots, m_k)$ respectively represent the substructures, targets and enzymes of drug $d_i$. Each of these heterogeneous features is a binary sequence, i.e. $s_0, \dots, s_n,\ t_0, \dots, t_m,\ m_0, \dots, m_k \in \{0, 1\}$, where 1 indicates that drug $d_i$ has the feature and 0 indicates that it does not; $n+1$, $m+1$ and $k+1$ are the total numbers of substructure, target and enzyme features respectively.

C. Drug knowledge graph

The invention obtains a drug knowledge graph $G = (D, R, T)$ from DrugBank, which is specifically expressed as:

$$G = \{(d, r_{dt}, t) \mid d \in D,\ r_{dt} \in R,\ t \in T\}$$

where $D$ denotes the drug set, $R$ denotes the relation set, and $T$ denotes the attribute node set of the drugs. The biological knowledge graph is constructed from the DrugBank database. The knowledge graph can be represented as triples of the form <head entity, tail entity, relation>. The DrugBank database contains detailed attribute information for each drug, including its substructures, transporters, pathways, targets and so on. The drug is taken as the head entity, an attribute node of the drug is taken as the tail entity, and the relation is the functional description of the tail entity. For example, drug DB00130 has a target numbered P17812 whose function is CTP synthase activity, so the triple is written <DB00130, P17812, CTP synthase activity>. Collecting the attributes of every drug yields the drug knowledge graph, which supplies the topological structure information later used by the graph neural network to obtain the embedded representations of the drugs.
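The triple construction described above can be sketched in a few lines of Python. The function name `build_knowledge_graph` and the input layout (a dictionary from drug ID to a list of attribute/relation pairs) are hypothetical choices for this illustration; only the DB00130 example is taken from the text.

```python
from collections import defaultdict


def build_knowledge_graph(drug_attributes):
    """Turn {drug: [(attribute_id, relation), ...]} into triples and an adjacency map."""
    triples = []
    neighbors = defaultdict(list)
    for drug, attrs in drug_attributes.items():
        for attribute_id, relation in attrs:
            # Triple order follows the text: <head entity, tail entity, relation>.
            triples.append((drug, attribute_id, relation))
            # The adjacency map keeps (relation, tail) pairs for later neighbor sampling.
            neighbors[drug].append((relation, attribute_id))
    return triples, dict(neighbors)


drug_attributes = {
    "DB00130": [("P17812", "CTP synthase activity")],
}
triples, neighbors = build_knowledge_graph(drug_attributes)
```

Keeping an adjacency map alongside the triple list is a convenience of this sketch: the triples preserve the graph as stated, while the map gives direct access to the neighbors of each drug node.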

S2: the graph neural network module takes the drug knowledge graph obtained in step S1 as input to obtain the topological embedded representation of each drug in the knowledge graph; similarity calculation is performed on the drug heterogeneous features to obtain the similarity embedded representation of each drug.

A: The topological embedded representation of a drug in the knowledge graph is obtained as follows:

based on the drug knowledge graph obtained in step S1, the invention designs a graph neural network that uses the topological structure information and semantic information in the drug knowledge graph to obtain the topological embedded representation of each drug.

First, the embedded representation of each node in the drug knowledge graph is initialized; the embedded representation is randomly generated numerical information, such as a 128-dimensional row vector. The nodes comprise drug nodes and attribute nodes. Then, 6 neighbor nodes are randomly selected for each node and used to update the embedded representation of the central node. For a drug node $i$, for example, all nodes connected to it by an edge are collectively called its neighbor nodes.
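The initialization and neighbor sampling just described might look as follows in plain Python. The function names, the uniform initialization range, the fixed seeds, and the choice to sample with replacement when a node has fewer than 6 neighbors are assumptions of this sketch; the text only states the 128-dimensional random vectors and the 6 randomly selected neighbors.

```python
import random


def init_embeddings(node_ids, dim=128, seed=0):
    """Assign each node a randomly generated vector of length `dim`."""
    rng = random.Random(seed)
    return {n: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for n in node_ids}


def sample_neighbors(neighbors, k=6, seed=0):
    """Pick k neighbors per node; every node is assumed to have at least one."""
    rng = random.Random(seed)
    sampled = {}
    for node, nbrs in neighbors.items():
        if len(nbrs) >= k:
            sampled[node] = rng.sample(nbrs, k)  # without replacement
        else:
            # Fewer than k neighbors: draw with replacement (an assumption).
            sampled[node] = [rng.choice(nbrs) for _ in range(k)]
    return sampled


embeddings = init_embeddings(["d1", "d2", "t1", "t2"])
sampled = sample_neighbors({
    "d1": ["t1", "t2", "t3", "t4", "t5", "t6", "t7"],
    "d2": ["t1", "t2"],
})
```

Sampling a fixed number of neighbors gives every node an input of the same size, which keeps the later aggregation step uniform across nodes.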

The weight score of the edge between a drug node and an attribute node in the drug knowledge graph is calculated using the semantic information:

$$\pi^{d_i}_{r_{in}} = \mathrm{sum}\left(W_1\left(e_{d_i} \odot e_{r_{in}}\right) + b_1\right)$$

where $\pi^{d_i}_{r_{in}}$ is the weight score of the edge, $e_{d_i}$ and $e_{r_{in}}$ respectively denote the initialized embedded representations of drug $d_i$ and relation $r_{in}$ (the initialized embedded representation $e_{r_{in}}$ is likewise generated from the encoding of relation $r_{in}$), $\odot$ denotes the Hadamard product, $W_1$ and $b_1$ respectively denote trainable weights and biases, and sum is a summation function.
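The edge weight score combines a Hadamard (element-wise) product of the drug and relation embeddings with a trainable linear map and a final summation. A minimal sketch, assuming the weights are stored as a list of rows applied to a column vector (a layout chosen for this illustration, with `edge_weight_score` a hypothetical name), is:

```python
def edge_weight_score(e_drug, e_rel, W1, b1):
    """Scalar score: sum over the entries of W1 (e_drug * e_rel) + b1."""
    gated = [x * y for x, y in zip(e_drug, e_rel)]  # Hadamard product
    projected = [
        sum(w * g for w, g in zip(row, gated)) + b  # one output unit per row of W1
        for row, b in zip(W1, b1)
    ]
    return sum(projected)  # the summation reduces the vector to one score


# Toy 2-dimensional values, chosen only to make the arithmetic easy to follow.
score = edge_weight_score(
    e_drug=[1.0, 2.0],
    e_rel=[3.0, 4.0],
    W1=[[1.0, 1.0], [0.0, 2.0]],
    b1=[0.5, -0.5],
)
```

With these toy values the Hadamard product is [3, 8], the projection is [11.5, 15.5], and the score is their sum.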

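The neighbor embedded representation of the drug node is calculated according to the weight score of each neighbor node:

$$e_{N(d_i)} = \sum_{t_n \in N_s(d_i)} \pi^{d_i}_{r_{in}}\, e_{t_n}$$

where $N_s(d_i)$ denotes the sampled set of neighbor nodes of drug $d_i$, $\pi^{d_i}_{r_{in}}$ is the weight score of the edge to neighbor node $t_n$, and $e_{t_n}$ denotes the initial embedding of neighbor node $t_n$ of drug $d_i$.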
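The neighbor embedding is a weighted combination of the sampled neighbors' initial embeddings, with one weight score per neighbor. The sketch below uses the raw scores as weights; whether the scores are normalized first (for example with a softmax) is not stated in the text, so no normalization is applied here. The function name `neighbor_embedding` is hypothetical.

```python
def neighbor_embedding(scores, neighbor_embs):
    """Weighted sum of neighbor embeddings, one weight score per neighbor."""
    dim = len(neighbor_embs[0])
    return [
        sum(s * e[k] for s, e in zip(scores, neighbor_embs))
        for k in range(dim)
    ]


# Two neighbors with 2-dimensional embeddings (toy values).
e_neigh = neighbor_embedding(
    scores=[0.5, 2.0],
    neighbor_embs=[[1.0, 0.0], [0.0, 1.0]],
)
```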

The topological embedded representation $g_{d_i}$ of drug $d_i$ is calculated as:

$$g_{d_i} = \mathrm{ReLU}\left(W_2\left(e_{d_i} \,\|\, e_{N(d_i)}\right) + b_2\right)$$

where $e_{d_i}$ is the initial embedded representation of the drug node, $e_{N(d_i)}$ is its neighbor embedded representation, $\|$ denotes the concatenation operation, ReLU is an activation function, and $W_2$ and $b_2$ are trainable weights and biases respectively. The topological embedded representation $g_{d_j}$ of drug $d_j$ is obtained in the same way.

Derived from the knowledge graph, the topological embedded representation $g_{d_i}$ of drug $d_i$ covers both the information of the drug node itself and the relations between the drug node and its attribute nodes.
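The final step of the graph branch concatenates the drug's own embedding with its neighbor embedding and passes the result through a trainable layer with a ReLU activation. A minimal sketch, again assuming weights stored as a list of rows and using the hypothetical name `topological_embedding`, is:

```python
def topological_embedding(e_drug, e_neigh, W2, b2):
    """ReLU(W2 [e_drug ; e_neigh] + b2) with W2 given as a list of rows."""
    concat = e_drug + e_neigh  # list concatenation plays the role of splicing
    return [
        max(0.0, sum(w * x for w, x in zip(row, concat)) + b)  # ReLU per unit
        for row, b in zip(W2, b2)
    ]


# Toy values: a 2-dimensional drug embedding and a 1-dimensional neighbor embedding.
g = topological_embedding(
    e_drug=[1.0, -1.0],
    e_neigh=[2.0],
    W2=[[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]],
    b2=[0.0, 0.0],
)
```

In this toy case the first unit evaluates to 3 and the second to -3, which ReLU clips to 0.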

B: The similarity of the drug heterogeneous features is calculated as follows:

the invention applies Jaccard similarity to the heterogeneous features of the drugs to obtain the similarity embedding of each drug. The Jaccard similarity can be expressed as:

$$J(F_i, F_j) = \frac{|F_i \cap F_j|}{|F_i \cup F_j|}$$

where $J(F_i, F_j)$ denotes the similarity value between the heterogeneous features of two drugs, and $F_i$, $F_j$ are the heterogeneous features of the two drugs respectively.
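As a worked illustration of the Jaccard similarity on binary features, take two hypothetical feature vectors (chosen here only for the arithmetic) $F_i = (1, 1, 0, 0)$ and $F_j = (1, 0, 1, 0)$, and read each vector as the set of positions that hold a 1, so that $F_i = \{0, 1\}$ and $F_j = \{0, 2\}$:

$$J(F_i, F_j) = \frac{|\{0, 1\} \cap \{0, 2\}|}{|\{0, 1\} \cup \{0, 2\}|} = \frac{|\{0\}|}{|\{0, 1, 2\}|} = \frac{1}{3}$$

Two drugs with identical feature sets give a similarity of 1, and two drugs with no shared feature give 0.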

Calculating with the Jaccard similarity method yields the substructure similarity matrix $E^s$, the target similarity matrix $E^t$ and the enzyme similarity matrix $E^m$, and thereby the substructure similarity embedding $e^{s}_{d_i}$, the target similarity embedding $e^{t}_{d_i}$ and the enzyme similarity embedding $e^{m}_{d_i}$ of drug $d_i$. These three embeddings are the data corresponding to the drug in the substructure similarity matrix $E^s$, the target similarity matrix $E^t$ and the enzyme similarity matrix $E^m$ respectively. The similarity embedded representation of drug $d_i$ can be expressed as:

$$h_{d_i} = e^{s}_{d_i} \,\|\, e^{t}_{d_i} \,\|\, e^{m}_{d_i}$$

The similarity embedded representation $h_{d_j}$ of drug $d_j$ is obtained in the same way.
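To make the similarity branch concrete, the sketch below computes a Jaccard similarity matrix for each feature type and reads off one drug's similarity embedding. Taking the drug's row from each matrix and joining the three rows by concatenation is an assumption of this illustration, as are the function names and the toy feature vectors.

```python
def jaccard(f_i, f_j):
    """Jaccard similarity of two equal-length 0/1 feature vectors."""
    both = sum(1 for a, b in zip(f_i, f_j) if a and b)
    either = sum(1 for a, b in zip(f_i, f_j) if a or b)
    # Two all-zero vectors share no features; 0.0 is returned by convention here.
    return both / either if either else 0.0


def similarity_matrix(features):
    """Pairwise Jaccard similarities between all drugs for one feature type."""
    return [[jaccard(a, b) for b in features] for a in features]


def similar_embedding(i, E_s, E_t, E_m):
    """Join drug i's rows from the three similarity matrices (assumed concatenation)."""
    return E_s[i] + E_t[i] + E_m[i]


# Toy data for two drugs: 3 substructure bits, 2 target bits, 1 enzyme bit.
substructures = [[1, 1, 0], [1, 0, 0]]
targets = [[1, 0], [0, 1]]
enzymes = [[1], [1]]
E_s, E_t, E_m = (similarity_matrix(f) for f in (substructures, targets, enzymes))
h_0 = similar_embedding(0, E_s, E_t, E_m)
```

Each matrix is square with one row per drug, so a drug's row describes how similar it is to every drug in the data set for that feature type.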

S3: the multi-modal representation fusion analysis module concatenates and fuses the topological embedded representation of the drug in the knowledge graph and the similarity embedded representation of the drug to construct a drug-drug interaction event prediction model, and trains the model with the drug-drug interaction events in the sample of step S1 to obtain the trained drug-drug interaction event prediction model.

Based on the topological embedded representation $g_{d_i}$ and the similarity embedded representation $h_{d_i}$ obtained in step S2, the invention concatenates the two as the final embedded representation of drug $d_i$ for multi-modal embedding analysis, expressed as follows:

$$z_{d_i} = g_{d_i} \,\|\, h_{d_i}$$

The final embedded representation $z_{d_j}$ of drug $d_j$ is obtained in the same way.

The resulting embeddings of the drug pair are then used for predictive classification as follows:

$$\hat{y}_{ij} = \sigma\left(W_3\left(z_{d_i} \,\|\, z_{d_j}\right) + b_3\right)$$

where $\sigma$ is the non-linear activation function softmax, $W_3$ and $b_3$ are trainable weights and biases, and $\hat{y}_{ij}$ is the predicted drug-drug interaction event between drug $d_i$ and drug $d_j$.
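The classification step can be sketched as a single softmax layer over the concatenated final embeddings of the two drugs. The function names, the row-wise weight layout and the toy numbers are assumptions of this illustration; a trained model would use learned weights and may stack further fully connected layers before this output layer.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_event(z_i, z_j, W3, b3):
    """softmax(W3 [z_i ; z_j] + b3): one probability per interaction event type."""
    pair = z_i + z_j  # concatenate the final embeddings of the two drugs
    logits = [sum(w * x for w, x in zip(row, pair)) + b for row, b in zip(W3, b3)]
    return softmax(logits)


# Toy example with 1-dimensional final embeddings and two event types.
probs = predict_event(
    z_i=[1.0], z_j=[0.0],
    W3=[[2.0, 0.0], [0.0, 2.0]],
    b3=[0.0, 0.0],
)
predicted_label = max(range(len(probs)), key=probs.__getitem__)
```

The predicted event is the label with the highest probability, which is what `predicted_label` holds.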

During training and optimization, the weight parameters are updated by gradient descent with back propagation, training is carried out with 5-fold cross-validation, and the hyperparameters are tuned according to the validation accuracy of the model. The model uses a multi-class cross-entropy loss function. Since this process is implemented with existing techniques, it is not described in detail. It should be understood that, in the invention, drug-drug interaction events are collected as sample data in step S1; the topological embedded representation in the knowledge graph and the similarity embedded representation of each sample drug are then obtained according to step S2 and fed into the prediction model, which is trained with the drug-drug interaction events of the sample. The trained prediction model can then be used to predict drug-drug interaction events between drug pairs.
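For orientation, the two training ingredients named above, 5-fold cross-validation and the multi-class cross-entropy loss, can be sketched as follows. The splitting scheme (shuffle, then deal indices round-robin into folds), the seed and the function names are assumptions of this sketch; the training loop itself is omitted.

```python
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # deal shuffled indices into k folds
    for f in range(k):
        val = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, val


def cross_entropy(probs, label, eps=1e-12):
    """Multi-class cross-entropy for one sample: -log of the true class probability."""
    return -math.log(max(probs[label], eps))


splits = list(k_fold_indices(10, k=5))
loss = cross_entropy([0.5, 0.5], label=0)
```

Each sample appears in the validation set of exactly one fold, so every drug pair is used for both training and validation across the five runs.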

In some implementations, the invention also provides a system for the drug-drug interaction event prediction method based on a multi-modal deep neural network, comprising: a drug data acquisition module, a drug knowledge graph construction module, a drug embedded representation acquisition module, a prediction model construction and training module, and a prediction module.

The drug data acquisition module is used for collecting drug-drug interaction events and drug heterogeneous features;

the prediction model construction and training module is used for concatenating and fusing the topological embedded representation of the drug in the knowledge graph and the similarity embedded representation of the drug to construct a drug-drug interaction event prediction model, and for training the model with the collected drug-drug interaction events to obtain the trained drug-drug interaction event prediction model;

and the prediction module is used for predicting drug-drug interaction events with the drug-drug interaction event prediction model obtained in step S3.

For the specific implementation of each module, refer to the corresponding process of the foregoing method, which is not repeated here. It should be understood that the division into the above functional modules is only a division by logical function, and other divisions are possible in actual implementation; for example, several units or components may be combined or integrated into another system, and some features may be omitted or not executed. The integrated units may be implemented in hardware or as software functional units.

In some implementations, the invention also provides a terminal comprising a processor and a memory, wherein the memory stores a computer program, and the processor calls the computer program to execute the steps of the drug-drug interaction event prediction method based on a multi-modal deep neural network.

In some implementations, the invention also provides a readable storage medium storing a computer program, the computer program being called by a processor to execute the steps of the drug-drug interaction event prediction method based on a multi-modal deep neural network.

For the implementation process of each step, please refer to the specific implementation process of the foregoing method, which is not described herein again.

It should be understood that in the embodiments of the present invention, the Processor may be a Central Processing Unit (CPU), and the Processor may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. The portion of memory may also include non-volatile random access memory. For example, the memory may also store device type information.

The readable storage medium is a computer readable storage medium, which may be an internal storage unit of the controller according to any of the foregoing embodiments, for example, a hard disk or a memory of the controller. The readable storage medium may also be an external storage device of the controller, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the controller. Further, the readable storage medium may also include both an internal storage unit of the controller and an external storage device. The readable storage medium is used for storing the computer program and other programs and data required by the controller. The readable storage medium may also be used to temporarily store data that has been output or is to be output.

It should be emphasized that the examples described herein are illustrative and not restrictive, and thus the invention is not to be limited to the examples described herein, but rather to other embodiments that may be devised by those skilled in the art based on the teachings herein, and that various modifications, alterations, and substitutions are possible without departing from the spirit and scope of the present invention.

Claims

1. A method for predicting drug-drug interaction events based on a multi-modal deep neural network, characterized by comprising the following steps:

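2. The method of claim 1, wherein: the drug-drug interaction event prediction model is as follows:

$$\hat{y}_{ij} = \sigma\left(W_3\left(z_{d_i} \,\|\, z_{d_j}\right) + b_3\right)$$

where $\hat{y}_{ij}$ denotes the predicted drug-drug interaction event for drugs $d_i$ and $d_j$, $z_{d_i}$ and $z_{d_j}$ are the final embedded representations of drugs $d_i$ and $d_j$ respectively, $\|$ denotes the concatenation operation, $\sigma$ is the non-linear activation function softmax, and $W_3$, $b_3$ are trainable weights and biases.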

3. The method of claim 1, wherein: if the drug heterogeneous features comprise substructure $F^s$, target $F^t$ and enzyme $F^e$, the similarity calculation performed on the drug heterogeneous features in step S2 yields the similarity embedded representation of a drug as follows:

$$h_{d_i} = e^{s}_{d_i} \,\|\, e^{t}_{d_i} \,\|\, e^{m}_{d_i}$$

where the substructure similarity embedding $e^{s}_{d_i}$, the target similarity embedding $e^{t}_{d_i}$ and the enzyme similarity embedding $e^{m}_{d_i}$ of drug $d_i$ are the data corresponding to $d_i$ in $E^s$, $E^t$ and $E^m$, which are the substructure similarity matrix, the target similarity matrix and the enzyme similarity matrix respectively;

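4. The method of claim 3, wherein: the elements of the substructure similarity matrix, the target similarity matrix and the enzyme similarity matrix are obtained by Jaccard similarity calculation, where the Jaccard similarity formula is as follows:

$$J(F_i, F_j) = \frac{|F_i \cap F_j|}{|F_i \cup F_j|}$$

where $J(F_i, F_j)$ denotes the similarity value between the heterogeneous features of two drugs, and $F_i$, $F_j$ are the heterogeneous features of the two drugs respectively.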

5. The method of claim 1, wherein: the process of obtaining the topologically embedded representation of the drug in the knowledge graph based on the drug knowledge graph in step S2 is as follows:

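finally, the topological embedded representation of drug $d_i$ in the knowledge graph is obtained:

$$g_{d_i} = \mathrm{ReLU}\left(W_2\left(e_{d_i} \,\|\, e_{N(d_i)}\right) + b_2\right)$$

where $e_{d_i}$ is the initial embedded representation of drug node $d_i$, $e_{N(d_i)}$ is its neighbor embedded representation, ReLU is an activation function, $W_2$ and $b_2$ are trainable weights and biases, and $\|$ denotes the concatenation operation.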

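6. The method of claim 5, wherein: the drug knowledge graph constructed based on drug attributes in step S1 is represented as follows:

$$G = \{(d, r_{dt}, t) \mid d \in D,\ r_{dt} \in R,\ t \in T\}$$

where $D$ denotes the drug set, $R$ denotes the relation set, $T$ denotes the attribute node set of the drugs, and $d$, $r_{dt}$ and $t$ denote a drug, a relation and an attribute node respectively.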

7. A system based on the method of any one of claims 1-6, characterized by comprising:

8. A terminal, characterized by: comprising a processor and a memory, the memory storing a computer program that the processor calls to perform the steps of the method of any one of claims 1 to 6.

9. A readable storage medium, characterized by: storing a computer program that is called by a processor to perform the steps of the method of any one of claims 1 to 6.