CN113643765B

CN113643765B - Tensor neural network-based drug-drug interaction prediction method

Info

Publication number: CN113643765B
Application number: CN202110996665.2A
Authority: CN
Inventors: 于会; 赵时雨; 施建宇; 于宏; 董文敏
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2021-08-27
Filing date: 2021-08-27
Publication date: 2024-03-08
Anticipated expiration: 2041-08-27
Also published as: CN113643765A

Abstract

The invention belongs to the field of bioinformatics and relates in particular to a drug-drug interaction (DDI) prediction method based on a tensor neural network. The method is highly interpretable, can predict multiple types of interaction between drugs, and can make predictions under cold-start conditions. It overcomes the poor interpretability of conventional neural-network DDI prediction models and helps clinical experts interpret interactions between drugs and design new drugs.

Description

Tensor neural network-based drug-drug interaction prediction method

Technical Field

The invention belongs to the technical field of bioinformatics and relates to a drug-drug interaction (DDI) prediction method based on a tensor neural network, in particular to a neural-network DDI prediction method based on tensor decomposition.

Background

A single drug often cannot meet clinical treatment requirements, and complex diseases generally have to be treated with a combination of several drugs. Such combinations may cause adverse reactions between the drugs that damage the functions of the human body. Adverse reactions caused by DDIs have led to huge economic losses worldwide, and how to identify interactions between drugs effectively has become a problem of wide concern among researchers.

At present, researchers have used various neural network models for DDI prediction, notably convolutional neural networks, graph neural networks and recurrent neural networks. However, current neural-network DDI prediction models usually work on the drug structure as a whole and pay little attention to the drug components that cause a DDI. As a result the models are hard to interpret, clinical experts cannot analyze the prediction results, and practical application is severely limited. Studies have shown that a particular behavior of a drug is often associated with some of its chemical substructures rather than with the drug as a whole.

Researchers have proposed many models for DDI prediction. For example, DDIMDL (A multimodal deep learning framework for predicting drug-drug interaction events) computes similarity vectors between drugs using Jaccard similarity and concatenates the vectors of a drug pair as the input of a deep neural network that predicts the DDIs of the pair. GoGNN (GoGNN: graph of graphs neural network for predicting structured entity interactions) models drug structures and known DDIs as a two-level graph of drug structures and drug interactions, and predicts DDIs by training a graph neural network on this two-level graph. The main drawbacks of these models are: (1) the models are insufficiently interpretable, so clinical experts cannot analyze the prediction results and practical application is severely limited; (2) multiple types of interaction between the same pair of drugs cannot be predicted.

Disclosure of Invention

In order to solve the technical problems, the invention provides a drug-drug interaction prediction method based on a tensor neural network.

The invention aims to provide a drug-drug interaction prediction method based on tensor neural network, which comprises the following steps:

constructing, according to the division standard of drug substructures, a "substructure × substructure × interaction type" tensor ST over all chemical substructures and interaction types of the drugs;

performing CP decomposition on the tensor ST according to the rules of tensor operations, so that ST is approximated by the sum of a group of rank-1 tensors;

and reconstructing the factor matrices of the tensor from the rank-1 tensors, constructing a tensor neural network model on the factor matrices, and training it to obtain the relationship between substructures and interaction types, thereby obtaining the probability of DDI occurrence.

Preferably, in the above drug-drug interaction prediction method based on a tensor neural network, the drug substructures are divided according to the PubChem dataset, wherein $S$ denotes the set of all substructures or functional groups in the dataset and $s_i$ denotes the substructure numbered $i$ in $S$; $D$ denotes the set of all drugs in the dataset and $d_p$ denotes the drug numbered $p$ in $D$; $L$ denotes the set of all interaction types occurring among the drugs and $l_i$ denotes the interaction type numbered $i$ in $L$.

Preferably, in the above method, the element $ST_{ijk}$ in row $i$, column $j$ and layer $k$ of tensor ST represents the probability that $\mathrm{SSI}_{ijk}$ occurs:

$$ST_{ijk} = P(\mathrm{SSI}_{ijk})$$

wherein the probability that $\mathrm{DDI}_{pqk}$ occurs, as predicted by ST, is calculated as follows:

$$P(\mathrm{DDI}_{pqk}) = ST \times_3 v_k \times_1 e_p \times_2 e_q \tag{10}$$

where $\times_n$ denotes the mode-$n$ product, $v_k$ is the one-hot code of interaction type $l_k$, and $e_p$, $e_q$ are the PubChem fingerprint codes of drugs $d_p$, $d_q$.

Preferably, in the above method, $P(\mathrm{DDI}_{pqk})$ is computed in three steps:

first, $ST \times_3 v_k$ extracts the $k$-th layer matrix $ST_{:,:,k}$ from ST; $ST_{:,:,k}$ holds $P(\mathrm{SSI}_{ijk})$ for all SSIs whose interaction type is $l_k$;

second, $ST_{:,:,k} \times_1 e_p$ yields the vector formed from the $P(\mathrm{SSI}_{ijk})$ of all SSIs of type $l_k$ in which one substructure belongs to $d_p$;

third, $(ST_{:,:,k} \times_1 e_p) \times_2 e_q$ computes the sum of $P(\mathrm{SSI}_{ijk})$ over all SSIs of type $l_k$ in the pair $(d_p, d_q)$, i.e., $P(\mathrm{DDI}_{pqk})$.

Preferably, in the above method, CP decomposition is performed on the tensor ST as follows:

$$ST \approx \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r$$

where $R$ is a manually set parameter denoting the number of rank-1 tensors into which ST is decomposed, $A = [a_1, \ldots, a_R]$, $B = [b_1, \ldots, b_R]$ and $C = [c_1, \ldots, c_R]$ are the factor matrices of ST, and $\lambda_r$ is the weight of each rank-1 tensor;

A, B and C each collect the information of the tensor along one dimension; since both the x axis and the y axis of ST represent drug substructures, let $A = B$, so that:

$$ST \approx \sum_{r=1}^{R} \lambda_r \, a_r \circ a_r \circ c_r \tag{13}$$

the tensor ST can thus be approximately represented by the two matrices A and C.

Preferably, in the method for predicting drug-drug interaction based on tensor neural network, the construction of the tensor neural network model is performed according to the following method:

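from formula (10) and formula (13):

$$P(\mathrm{DDI}_{pqk}) \approx \Big( \sum_{r=1}^{R} \lambda_r \, a_r \circ a_r \circ c_r \Big) \times_3 v_k \times_1 e_p \times_2 e_q$$

which, according to formula (7), gives:

$$P(\mathrm{DDI}_{pqk}) \approx \sum_{r=1}^{R} \lambda_r \, (a_r \cdot e_p)(a_r \cdot e_q)(c_r \cdot v_k)$$

wherein formula (7) is

$$\mathcal{X} \times_1 v_1 \times_2 v_2 \times_3 v_3 \approx \sum_{r=1}^{R} \lambda_r \, (a_r \cdot v_1)(b_r \cdot v_2)(c_r \cdot v_3) \tag{7}$$

and $\cdot$ denotes the dot product of vectors;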

an R-dimensional embedded representation is randomly generated for each chemical substructure and each interaction type, and these representations are stacked by rows into the matrices A and C, respectively; after repeated training, the tensor neural network model is obtained as follows:

wherein the model output is the probability of DDI occurrence calculated by the invention, bias is added to enhance the robustness of the model, and $\odot$ denotes the Hadamard product.

Preferably, in the above method, the tensor neural network is updated using known DDIs, specifically as follows:

first, the label of $\mathrm{DDI}_{pqk}$ is set to 1 if $\mathrm{DDI}_{pqk}$ occurs and to 0 otherwise;

second, the model is updated by calculating the prediction loss of each DDI in the training set, namely:

finally, the tensor neural network model is updated by back-propagation with the Adam algorithm, using loss as the loss function.

Preferably, in the above method, DDIs are predicted with the tensor neural network model as follows:

representing each triple by the substructures present in the two drugs and by the interaction type;

randomly generating R-dimensional embedded representations for all substructures and interaction types;

stacking the embedded representations of the substructures and of the interaction types by rows into the matrices A and C, which can be regarded as a two-dimensional representation of ST, and multiplying the vector representations of the drugs and of the interaction with A and C column by column to obtain scalar products;

multiplying the scalar products column by column and concatenating the results to obtain the final vector representation of the triple;

and predicting, with the tensor neural network model, the probability of occurrence of the DDI corresponding to the triple.

Preferably, the above method for predicting drug-drug interactions based on tensor neural network, the drug and its interactions are derived from database TWOSIDES.

Preferably, in the above method, taking cold start into account, DDI prediction tasks can be divided into three types: C1, potential DDIs between known drugs; C2, unknown DDIs between known drugs and new drugs; C3, unknown DDIs between new drugs.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention provides a drug-drug interaction prediction method based on a tensor neural network. The method first decomposes each drug into its chemical substructures, constructs a substructure tensor and builds a tensor neural network model; the model is used to learn the interactions between chemical substructures (substructure-substructure interactions, SSI), and the probability of DDI occurrence is finally predicted from the SSIs. The method is highly interpretable, can predict multiple types of interaction between drugs, and can make predictions under cold-start conditions. It overcomes the poor interpretability of conventional neural-network DDI prediction models and helps clinical experts interpret interactions between drugs and design new drugs.

Drawings

FIG. 1 is a flow chart of the working principle of the invention;

FIG. 2 is a schematic illustration of the principles of operation of the present invention;

FIG. 3 is a CP decomposition and two-dimensional representation of tensors;

FIG. 4 illustrates the mode product of a tensor and vectors;

FIG. 5 shows the modeling of DDI prediction with tensor ST, in which the value of the element $ST_{ijk} \in [0,1]$ at coordinates $(i, j, k)$ of ST is the probability that $\mathrm{SSI}_{ijk}$ occurs;

FIG. 6 shows the successive mode products of ST with $v_k$, $e_p$ and $e_q$;

FIG. 7 is a diagram illustrating the operation of the present invention on a neural network;

FIG. 8 illustrates the three DDI prediction problems addressed by the present invention.

Detailed Description

In order that those skilled in the art will better understand the technical scheme of the present invention, the present invention will be further described with reference to specific embodiments and drawings.

The invention provides a drug-drug interaction prediction method based on a tensor neural network. Previous studies have shown that drug-drug interactions are closely related to the composition of the drugs, i.e., to their chemical substructures. The invention therefore starts from the following assumption: "interactions between the chemical substructures of drugs are the primary cause of drug-drug interactions." If the substructure-substructure interactions (SSI) can be obtained effectively, the interactions between drugs can be derived from them. According to the division standard of drug substructures, the invention constructs a "substructure × substructure × interaction type" tensor over all chemical substructures and interaction types of the drugs, and then applies CP decomposition, following the rules of tensor operations, so that the tensor is approximated by the sum of a group of rank-1 tensors. The factor matrices of the tensor are reconstructed from the rank-1 tensors, a tensor neural network model is trained on the factor matrices, and the relationship between substructures and interaction types is finally obtained, which yields the probability of DDI occurrence. The main contributions of the invention are a method for predicting DDIs from SSIs and a tensor neural network model based on tensor decomposition. The working principle of the invention is shown in FIG. 1 and FIG. 2.

1. Definition of tensor and related calculation rule

1.1 Tensors and CP decomposition of tensors

A tensor can be regarded as a higher-order generalization of a matrix. If an $N$-th order tensor $\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ can be written as the outer product of $N$ vectors, namely:

$$\mathcal{Y} = a^{(1)} \circ a^{(2)} \circ \cdots \circ a^{(N)} \tag{1}$$

then the tensor $\mathcal{Y}$ is called a rank-1 tensor, and each element of the rank-1 tensor is the product of the corresponding components of its vectors, namely:

$$y_{i_1 i_2 \cdots i_N} = a^{(1)}_{i_1} a^{(2)}_{i_2} \cdots a^{(N)}_{i_N} \tag{2}$$

CP decomposition of tensors: any tensor $\mathcal{X}$ can be approximated as the sum of several rank-1 tensors, namely:

$$\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, \mathcal{Y}_r \tag{3}$$

where $\mathcal{Y}_r$ is a rank-1 tensor and $\lambda_r$ is the weight of each rank-1 tensor. For ease of understanding, FIG. 3 shows a schematic of the CP decomposition of a three-dimensional tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, which can be decomposed by CP into the sum of $R$ rank-1 tensors, namely:

$$\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r \tag{4}$$

where $a_r \in \mathbb{R}^{I}$, $b_r \in \mathbb{R}^{J}$ and $c_r \in \mathbb{R}^{K}$ are the vectors into which $\mathcal{X}$ is decomposed along its three directions, and $\circ$ denotes the outer product of vectors. The vectors forming the rank-1 tensors are collected into the matrices $A = [a_1, a_2, \ldots, a_R]$, $B = [b_1, b_2, \ldots, b_R]$ and $C = [c_1, c_2, \ldots, c_R]$, which are called the factor matrices of $\mathcal{X}$ and constitute a two-dimensional representation of $\mathcal{X}$.
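The CP form in formula (4) can be checked numerically. The following is a minimal NumPy sketch, assuming toy sizes and random factors (the names `cp_reconstruct`, `lam`, `X_loop` and all dimensions are illustrative and not taken from the patent); it builds a three-dimensional tensor from its factor matrices in two equivalent ways.

```python
import numpy as np

def cp_reconstruct(lam, A, B, C):
    """Weighted sum of R rank-1 tensors:
    X[i, j, k] = sum_r lam[r] * A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("r,ir,jr,kr->ijk", lam, A, B, C)

rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 2, 5                      # toy sizes (assumption)
lam = rng.random(R)                          # weights of the rank-1 tensors
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

X = cp_reconstruct(lam, A, B, C)

# Same tensor built explicitly as a sum of outer products a_r ∘ b_r ∘ c_r.
X_loop = sum(
    lam[r] * np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(R)
)
print(np.allclose(X, X_loop))
```

The columns of `A`, `B` and `C` play the role of the vectors $a_r$, $b_r$, $c_r$; storing the three matrices takes $(I + J + K) \cdot R$ numbers instead of $I \cdot J \cdot K$.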

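1.2 Mode product of tensors and vectors

Given an $N$-th order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and a vector $v \in \mathbb{R}^{I_n}$, the mode-$n$ product of the two is defined as:

$$(\mathcal{X} \times_n v)_{i_1 \cdots i_{n-1} i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N} \, v_{i_n} \tag{5}$$

The mode product of a tensor and a vector reduces the order of the tensor. FIG. 4 shows three successive mode products of a three-dimensional tensor $\mathcal{X}$: after the mode products with the three vectors $v_1$, $v_2$, $v_3$, the order changes from 3 to 0.

When the tensor $\mathcal{X}$ has been decomposed by CP into a sum of rank-1 tensors according to formula (4), we obtain:

$$\mathcal{X} \times_1 v_1 \times_2 v_2 \times_3 v_3 \approx \sum_{r=1}^{R} \lambda_r \, (a_r \circ b_r \circ c_r) \times_1 v_1 \times_2 v_2 \times_3 v_3 \tag{6}$$

From the properties of the vector outer product, it follows that:

$$\mathcal{X} \times_1 v_1 \times_2 v_2 \times_3 v_3 \approx \sum_{r=1}^{R} \lambda_r \, (a_r \cdot v_1)(b_r \cdot v_2)(c_r \cdot v_3) \tag{7}$$

where $\cdot$ denotes the dot product of vectors.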
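As an illustration of formulas (5) to (7), the sketch below contracts a CP-structured tensor with three vectors and compares the result with the dot-product form. It is a toy check under assumed sizes; `mode_product` is a hypothetical helper written for this example and uses 0-based axis numbers.

```python
import numpy as np

def mode_product(X, v, mode):
    """Mode product of tensor X with vector v along axis `mode` (0-based).
    The contracted axis disappears, so the order drops by one."""
    return np.tensordot(X, v, axes=([mode], [0]))

rng = np.random.default_rng(1)
I, J, K, R = 4, 3, 2, 5                      # toy sizes (assumption)
lam = rng.random(R)
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
X = np.einsum("r,ir,jr,kr->ijk", lam, A, B, C)   # exact CP tensor
v1, v2, v3 = rng.random(I), rng.random(J), rng.random(K)

# Left side of (7): contract the last axis first so the remaining
# axis numbers stay valid after each reduction (order 3 -> 2 -> 1 -> 0).
lhs = mode_product(mode_product(mode_product(X, v3, 2), v2, 1), v1, 0)

# Right side of (7): weighted product of dot products with the factor columns.
rhs = np.sum(lam * (A.T @ v1) * (B.T @ v2) * (C.T @ v3))
print(np.isclose(lhs, rhs))
```

Because `X` is built exactly from its factors here, the two sides agree to floating-point precision; for a tensor that is only approximated by a CP model, the equality becomes the approximation stated in the text.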

2. Tensor neural network model

This section introduces the mathematical derivation and the neural network training method of the tensor neural network model.

2.1 Definition

Let $S$ denote the set of all substructures or functional groups in the PubChem fingerprint dataset, $D$ the set of all drugs in the dataset, and $L$ the set of all interaction types occurring among the drugs. In the present invention, an interaction $l_k$ is encoded as a one-hot vector, denoted $v_k$. Let $e_p$ be the encoded representation of the PubChem molecular fingerprint of drug $d_p$. For convenience of description, Table 1 lists the mathematical symbols used in the present invention and their meanings.

TABLE 1 meanings of symbols used in the present invention

2.2 SSI tensor ST

We assume that DDIs are caused by SSIs and that the probability $P(\mathrm{DDI}_{pqk})$ that $\mathrm{DDI}_{pqk}$ occurs follows the calculation rule:

$$P(\mathrm{DDI}_{pqk}) = \sum_{s_i \in d_p} \sum_{s_j \in d_q} P(\mathrm{SSI}_{ijk}) \tag{8}$$

Formula (8) means that the probability of $\mathrm{DDI}_{pqk}$ is the sum, over every substructure pair present in the drug pair $(d_p, d_q)$, of the probability that the pair causes $l_k$.

Based on this assumption, if the probability of occurrence of every SSI can be calculated, DDIs can be obtained from the SSIs. To study how to calculate the SSI probabilities, we construct an interpretable "substructure × substructure × interaction type" tensor ST and let the element $ST_{ijk}$ in row $i$, column $j$ and layer $k$ of ST represent the probability that $\mathrm{SSI}_{ijk}$ occurs, $ST_{ijk} = P(\mathrm{SSI}_{ijk})$, as shown in FIG. 5.

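2.3 Prediction of DDI with ST

According to the physical meaning of the tensor ST and formula (8), the probability that $\mathrm{DDI}_{pqk}$ occurs is predicted with ST as follows:

$$P(\mathrm{DDI}_{pqk}) = ST \times_3 v_k \times_1 e_p \times_2 e_q \tag{10}$$

where the $i$-th component of $e_p$ is 1 if drug $d_p$ contains substructure $s_i$ and 0 otherwise, and likewise for $e_q$. Using the codes of the drugs and of the interaction, the mode products of ST with $v_k$, $e_p$ and $e_q$ compute $P(\mathrm{DDI}_{pqk})$ in three steps, as shown in FIG. 6:

1. $ST \times_3 v_k$ extracts the $k$-th layer matrix $ST_{:,:,k}$ from ST; $ST_{:,:,k}$ holds $P(\mathrm{SSI}_{ijk})$ for all SSIs whose interaction type is $l_k$.

2. $ST_{:,:,k} \times_1 e_p$ yields the vector formed from the $P(\mathrm{SSI}_{ijk})$ of all SSIs of type $l_k$ in which one substructure belongs to $d_p$.

3. $(ST_{:,:,k} \times_1 e_p) \times_2 e_q$ computes the sum of $P(\mathrm{SSI}_{ijk})$ over all SSIs of type $l_k$ in the pair $(d_p, d_q)$, i.e., $P(\mathrm{DDI}_{pqk})$.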
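The three steps above can be sketched on a toy tensor. The example assumes 6 substructures and 3 interaction types with random entries in place of learned SSI probabilities; the variable names (`layer`, `vec`, `score`, `direct`) and all values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sub, n_types = 6, 3                         # toy sizes, not the patent's
ST = rng.random((n_sub, n_sub, n_types))      # ST[i, j, k] stands in for P(SSI_ijk)

e_p = np.array([1, 0, 1, 0, 0, 1])            # substructures present in drug d_p
e_q = np.array([0, 1, 1, 0, 0, 0])            # substructures present in drug d_q
k = 1
v_k = np.eye(n_types)[k]                      # one-hot code of interaction type l_k

# Step 1: mode-3 product with v_k picks out the k-th layer ST[:, :, k].
layer = np.tensordot(ST, v_k, axes=([2], [0]))
# Step 2: mode-1 product with e_p sums the rows of substructures in d_p.
vec = e_p @ layer
# Step 3: product with e_q sums the entries of substructures in d_q.
score = vec @ e_q

# Direct evaluation of formula (8): sum over substructure pairs of (d_p, d_q).
direct = sum(ST[i, j, k]
             for i in np.flatnonzero(e_p)
             for j in np.flatnonzero(e_q))
print(np.isclose(score, direct))
```

With binary fingerprint codes, the chained mode products select exactly the entries $ST_{ijk}$ whose substructures occur in the two drugs, which is why the result matches the double sum of formula (8).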

However, the amount of data in ST is very large, and the direct storage and update thereof requires huge storage and computing resources. Next, we will discuss how to simplify the representation of ST and how to predict DDI using the simplified ST.

2.4 Two-dimensional representation of ST

From the CP decomposition of tensors, a tensor can be approximated using its factor matrices (as in FIG. 3), namely:

$$ST \approx \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r$$

where $R$ is a manually set parameter denoting the number of rank-1 tensors into which ST is decomposed, $A = [a_1, \ldots, a_R]$, $B = [b_1, \ldots, b_R]$ and $C = [c_1, \ldots, c_R]$ are the factor matrices of ST, and $\lambda_r$ is the weight of each rank-1 tensor. Each factor matrix of the CP decomposition model collects the latent information of one dimension: A collects the latent information corresponding to the x axis of ST, B that of the y axis, and C that of the z axis. A and B can therefore both be regarded as embedded representation matrices of the substructures, so we let $A = B$, and C can be regarded as the embedded representation matrix of the interaction types. We then have:

$$ST \approx \sum_{r=1}^{R} \lambda_r \, a_r \circ a_r \circ c_r \tag{13}$$

This derivation shows that only two matrices A and C of controllable size are needed to approximately represent the tensor ST with its huge data volume.
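To give a feel for the saving, the short sketch below counts stored numbers for the full tensor and for the factored form. The interaction-type count 1318 comes from the dataset described in section 3.1; the substructure count 881 (the length of the PubChem substructure fingerprint) and the embedding size `R = 64` are assumptions made for this example, and `storage_counts` is a hypothetical helper.

```python
def storage_counts(n_sub, n_types, R):
    """Number of stored values for the full tensor ST and for (A, C, weights)."""
    full = n_sub * n_sub * n_types            # every entry ST[i, j, k]
    factored = (n_sub + n_types) * R + R      # rows of A and C, plus R weights
    return full, factored

# 881 substructures and R = 64 are assumptions; 1318 types is from section 3.1.
full, factored = storage_counts(n_sub=881, n_types=1318, R=64)
ratio = full / factored
print(full, factored, round(ratio))
```

Under these assumed sizes the factored form needs several thousand times fewer values than the full tensor, which is the practical motivation for working with A and C only.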

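2.5 Matrix implementation of DDI prediction with tensor ST

It is known that $P(\mathrm{DDI}_{pqk})$ can be calculated by the mode products of ST with $v_k$, $e_p$ and $e_q$, and that ST can be approximated using A and C. In this section we derive how to complete the calculation of $P(\mathrm{DDI}_{pqk})$ using A and C in place of ST.

From formula (10) and formula (13), we get:

$$P(\mathrm{DDI}_{pqk}) \approx \Big( \sum_{r=1}^{R} \lambda_r \, a_r \circ a_r \circ c_r \Big) \times_3 v_k \times_1 e_p \times_2 e_q$$

From formula (7), it can be deduced that:

$$P(\mathrm{DDI}_{pqk}) \approx \sum_{r=1}^{R} \lambda_r \, (a_r \cdot e_p)(a_r \cdot e_q)(c_r \cdot v_k)$$

This derivation shows that the prediction of DDIs by ST can be carried out approximately using only the embedding matrices obtained from its decomposition.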
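The equivalence derived in this section can be checked on toy data. In the sketch below, `predict_from_factors` is a hypothetical helper that evaluates the sum of dot products without ever forming ST; sizes and values are illustrative assumptions.

```python
import numpy as np

def predict_from_factors(lam, A, C, e_p, e_q, v_k):
    """sum_r lam_r * (a_r . e_p) * (a_r . e_q) * (c_r . v_k), without building ST."""
    return float(np.sum(lam * (e_p @ A) * (e_q @ A) * (v_k @ C)))

rng = np.random.default_rng(3)
n_sub, n_types, R = 6, 3, 4                   # toy sizes (assumption)
lam = rng.random(R)
A = rng.random((n_sub, R))                    # one row per substructure
C = rng.random((n_types, R))                  # one row per interaction type
e_p = np.array([1.0, 0, 1, 0, 0, 1])
e_q = np.array([0.0, 1, 1, 0, 0, 0])
v_k = np.eye(n_types)[2]

# Reference: build the approximated tensor of formula (13), then contract it.
ST_approx = np.einsum("r,ir,jr,kr->ijk", lam, A, A, C)
via_tensor = np.einsum("ijk,i,j,k->", ST_approx, e_p, e_q, v_k)

via_factors = predict_from_factors(lam, A, C, e_p, e_q, v_k)
print(np.isclose(via_tensor, via_factors))
```

The factored route costs on the order of $(2n + t) \cdot R$ multiplications per triple for $n$ substructures and $t$ interaction types, and it is symmetric in the two drugs because both are projected with the same matrix A.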

2.6 Neural network implementation of DDI prediction with ST

A tensor neural network model can be constructed from the above derivation. First, an R-dimensional embedded representation is randomly generated for each PubChem substructure and for each interaction type in the dataset, and these representations are stacked by rows into the matrices A and C, respectively. The tensor neural network model is built as follows:

Second, the tensor neural network is updated using known DDIs: the label of $\mathrm{DDI}_{pqk}$ is set to 1 if $\mathrm{DDI}_{pqk}$ occurs and to 0 otherwise. The symbol $\odot$ denotes the Hadamard product. The model is updated by calculating the prediction loss of each DDI in the training set of the tensor neural network model, namely:

where loss denotes the prediction loss and the predicted value is the probability of occurrence of $\mathrm{DDI}_{pqk}$ calculated by the invention.

The tensor neural network model is updated by back-propagation with the Adam algorithm, using loss as the loss function.
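The model and loss formulas are not reproduced in this text, so the following is only a hedged sketch of one plausible forward pass that is consistent with the description (Hadamard product of the projected vectors plus a bias). The sigmoid output, the binary cross-entropy loss, the omission of the weights $\lambda_r$ and every name and size are assumptions of this example, not the patent's stated formulas.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(A, C, bias, e_p, e_q, v_k):
    # Hadamard product of the three R-dimensional projections, summed, plus bias.
    h = (e_p @ A) * (e_q @ A) * (v_k @ C)
    return sigmoid(h.sum() + bias)            # sigmoid output is an assumption

def bce(y, y_hat, eps=1e-12):
    # Binary cross-entropy, assumed here as the prediction loss.
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

rng = np.random.default_rng(4)
n_sub, n_types, R = 6, 3, 4                   # toy sizes (assumption)
A = rng.normal(scale=0.1, size=(n_sub, R))    # random initial substructure embeddings
C = rng.normal(scale=0.1, size=(n_types, R))  # random initial interaction embeddings
bias = 0.0
e_p = np.array([1.0, 0, 1, 0, 0, 1])
e_q = np.array([0.0, 1, 1, 0, 0, 0])
v_k = np.eye(n_types)[0]

y_hat = forward(A, C, bias, e_p, e_q, v_k)    # predicted probability for the triple
loss = bce(1.0, y_hat)                        # label 1: the DDI is known to occur
print(y_hat, loss)
```

In practice A, C and the bias would be trainable parameters in an automatic-differentiation framework, with the Adam optimizer applied to the loss as the text describes; the NumPy version above only shows the forward computation.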

The operation of the model is shown in FIG. 7 and FIG. 8: 1. each triple is represented by the substructures present in the two drugs and by the interaction type; 2. R-dimensional embedded representations are randomly generated for all substructures and interaction types; 3. the embedded representations of the substructures and of the interaction types are stacked by rows into the matrices A and C, which can be regarded as a two-dimensional representation of ST, and the vector representations of the drugs and of the interaction are multiplied with A and C column by column to obtain scalar products; 4. the scalar products are multiplied column by column and concatenated to obtain the final vector representation of the triple; 5. the probability of occurrence of the DDI corresponding to the triple is predicted with the tensor neural network model.

3. Experiment

3.1 data set

Tatonetti et al. (Tatonetti, N. P., Ye, P. P., Daneshjou, R., & Altman, R. B. (2012). Data-driven prediction of drug effects and interactions. Science Translational Medicine, 4(125), 125ra31) downloaded about 2.15 million DDI reports from AERS and the MedEffect resource and, using an adaptive data-driven method, built the database TWOSIDES, which covers 63,473 drug combinations and 1318 interaction types. On this basis, Jin et al. (Jin, B., Yang, H., Xiao, C., Zhang, P., Wei, X., & Wang, F. (2017). Multitask dyadic prediction and its application in prediction of adverse drug-drug interaction) extracted from the TWOSIDES database 576,513 DDIs that have been shown to result from co-administration rather than from the side effects of a single drug. In the present study we use the dataset provided by Jin et al., which includes m = 555 drugs and t = 1318 interaction types.

3.2 Experimental setup

The purpose of the invention is to predict the probability of occurrence of a DDI triple $\langle d_p, d_q, l_k \rangle$. Taking cold start into account, these triples fall into three categories, as shown in FIG. 8. In FIG. 8, the left side shows a known DDI network and the right side shows new drugs; the different labels on the lines denote different interaction types, solid lines denote known DDIs and dashed lines denote the triples to be predicted. The three prediction tasks are as follows:

C1: unknown interactions that may exist between known drugs, i.e., predicting the labeled dashed lines within the known network on the left side of FIG. 8;

C2: possible interactions between a new drug and a known drug, i.e., predicting the labeled dashed lines between new drugs and known drugs in FIG. 8;

C3: possible interactions between new drugs, i.e., predicting the labeled dashed lines between the new drugs on the right side of FIG. 8.
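The three task types can be told apart from how many drugs of a triple are absent from the known set. The helper below is a small illustrative sketch; `task_type` and the drug identifiers are hypothetical names chosen for this example.

```python
def task_type(d_p, d_q, known_drugs):
    """Return 'C1', 'C2' or 'C3' depending on how many of the two drugs are new."""
    n_new = (d_p not in known_drugs) + (d_q not in known_drugs)
    return ("C1", "C2", "C3")[n_new]

known = {"drug_a", "drug_b", "drug_c"}        # drugs seen in the known DDI network
examples = [
    ("drug_a", "drug_b"),                     # both known
    ("drug_a", "drug_x"),                     # one new drug
    ("drug_x", "drug_y"),                     # both new
]
for p, q in examples:
    print(p, q, task_type(p, q, known))
```

Because a new drug is still described by its substructure fingerprint, the substructure embeddings learned from known drugs can be reused for it, which is what makes tasks C2 and C3 tractable in this setting.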

In addition, it is worth noting that in the present invention several types of interaction may occur between one pair of drugs, rather than only one type as assumed in many publications.

The performance of the tensor neural network in predicting DDIs was verified using the 555 drugs and 1318 interaction types in the database TWOSIDES, a total of 576,513 drug interactions. As shown in FIG. 8, taking cold start into account, DDI prediction tasks can be divided into three types: C1, potential DDIs between known drugs; C2, unknown DDIs between known drugs and new drugs; C3, unknown DDIs between new drugs. Table 2 shows the performance of the tensor neural network on these three tasks and the comparison with DDIMDL (Deng, Y., Xu, X., Qiu, Y., Xia, J., Zhang, W., & Liu, S. (2020). A multimodal deep learning framework for predicting drug-drug interaction events. Bioinformatics (Oxford, England), 36(15), 4316-4322) and GoGNN (Wang, H. et al. (2020). GoGNN: graph of graphs neural network for predicting structured entity interactions. In: International Joint Conference on Artificial Intelligence. Yokohama, Japan).

Note: AUC is the area under the ROC curve; AUPR is the area under the PR curve, where the PR curve plots precision against recall; ACC is the proportion of all samples that are predicted correctly; Pre is the proportion of samples predicted as positive that are actually positive.
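As a small worked example of the two threshold-based metrics in the note, the sketch below computes ACC and Pre from predicted probabilities. The decision threshold of 0.5, the helper name `acc_and_pre` and the toy labels are assumptions for illustration; they are not results from the patent.

```python
import numpy as np

def acc_and_pre(y_true, y_prob, threshold=0.5):
    """Accuracy and precision after thresholding predicted probabilities."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))    # predicted positive, truly positive
    fp = np.sum((y_pred == 1) & (y_true == 0))    # predicted positive, truly negative
    acc = np.mean(y_pred == y_true)
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    return float(acc), float(pre)

# Toy labels and scores: 3 of 5 predictions are correct, 2 of 3 positives are true.
acc, pre = acc_and_pre([1, 0, 1, 1, 0], [0.9, 0.6, 0.4, 0.8, 0.1])
print(acc, pre)
```

AUC and AUPR, by contrast, are computed from the ranking of the scores over all thresholds and therefore do not depend on the choice of 0.5.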

Compared with the model GoGNN, which can only predict whether unknown DDI edges exist in a DDI network, the invention can also handle the cold-start tasks C2 and C3 and therefore has a wider range of application. In task C1 the invention outperforms the comparison models on all metrics, and in task C2 it far outperforms DDIMDL in AUPR and Pre, which verifies the effectiveness of the invention.

It should be noted that, where numerical ranges are referred to in the present invention, both endpoints of each range and any value between them may be selected. Because the steps and methods adopted are the same as in the embodiment, only a preferred embodiment is described in order to avoid redundancy. While preferred embodiments of the present invention have been described, additional variations and modifications of those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A method for predicting drug-drug interactions based on tensor neural networks, comprising:

reconstructing the factor matrices of the tensor from the rank-1 tensors, constructing a tensor neural network model on the factor matrices, and training it to obtain the relationship between substructures and interaction types, thereby obtaining the probability of DDI occurrence;

the drug substructures are divided according to the PubChem dataset, wherein $S$ denotes the set of all substructures or functional groups in the dataset and $s_i$ denotes the substructure numbered $i$ in $S$; $D$ denotes the set of all drugs in the dataset and $d_i$ denotes the drug numbered $i$ in $D$; $L$ denotes the set of all interaction types occurring among the drugs and $l_i$ denotes the interaction type numbered $i$ in $L$;

the element $ST_{ijk}$ in row $i$, column $j$ and layer $k$ of tensor ST represents the probability that $\mathrm{SSI}_{ijk}$ occurs, $ST_{ijk} = P(\mathrm{SSI}_{ijk})$;

wherein the $i$-th component of $e_p$ is 1 if $d_p$ contains substructure $s_i$ and 0 otherwise, and the $j$-th component of $e_q$ is 1 if $d_q$ contains substructure $s_j$ and 0 otherwise;

$P(\mathrm{DDI}_{pqk})$ is computed in three steps:

first, $ST \times_3 v_k$ extracts the $k$-th layer matrix $ST_{:,:,k}$ from ST; $ST_{:,:,k}$ holds $P(\mathrm{SSI}_{ijk})$ for all SSIs whose interaction type is $l_k$;

third, $(ST_{:,:,k} \times_1 e_p) \times_2 e_q$ computes the sum of $P(\mathrm{SSI}_{ijk})$ over all SSIs of type $l_k$ in the pair $(d_p, d_q)$, i.e., $P(\mathrm{DDI}_{pqk})$;

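CP decomposition of the tensor ST is performed as follows:

$$ST \approx \sum_{r=1}^{R} \lambda_r \, a_r \circ a_r \circ c_r \tag{13}$$

so that the tensor ST can be approximated by the two matrices A and C;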

the construction of the tensor neural network model is carried out according to the following method:

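from formula (10) and formula (13):

$$P(\mathrm{DDI}_{pqk}) \approx \Big( \sum_{r=1}^{R} \lambda_r \, a_r \circ a_r \circ c_r \Big) \times_3 v_k \times_1 e_p \times_2 e_q$$

which, according to formula (7), gives:

$$P(\mathrm{DDI}_{pqk}) \approx \sum_{r=1}^{R} \lambda_r \, (a_r \cdot e_p)(a_r \cdot e_q)(c_r \cdot v_k)$$

wherein formula (7) is

$$\mathcal{X} \times_1 v_1 \times_2 v_2 \times_3 v_3 \approx \sum_{r=1}^{R} \lambda_r \, (a_r \cdot v_1)(b_r \cdot v_2)(c_r \cdot v_3) \tag{7}$$

wherein $\cdot$ denotes the dot product of vectors;

wherein the model output is the probability of DDI occurrence calculated by the invention, bias is added to enhance the robustness of the model, and $\odot$ denotes the Hadamard product;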

the tensor neural network is updated by using the known DDI, and the specific method is as follows:

first, the label of $\mathrm{DDI}_{pqk}$ is set to 1 if $\mathrm{DDI}_{pqk}$ occurs and to 0 otherwise.

2. The tensor neural network-based drug-drug interaction prediction method according to claim 1, wherein the method of predicting DDI using the tensor neural network model is as follows:

3. The tensor neural network-based drug-drug interaction prediction method of claim 1, wherein the drug and its interactions are derived from the database TWOSIDES.

4. The tensor neural network-based drug-drug interaction prediction method according to claim 3, wherein, taking cold start into account, the DDI prediction tasks are divided into three types: C1, potential DDIs between known drugs; C2, unknown DDIs between known drugs and new drugs; C3, unknown DDIs between new drugs.