CN114974408A

CN114974408A - Construction method, prediction method and device of drug interaction prediction model

Info

Publication number: CN114974408A
Application number: CN202210588763.7A
Authority: CN
Inventors: 苗晓晔; 茹钟莹; 吴洋洋; 朋环环; 尹建伟
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-05-26
Filing date: 2022-05-26
Publication date: 2022-08-30

Abstract

The invention discloses a construction method, a prediction method and a device for a drug interaction prediction model. The construction method comprises: collecting and collating the molecular linear representations of drugs and endogenous proteins and the interactions among these molecules, wherein the drugs comprise chemical drugs and biological drugs; constructing, from the molecular linear representations and the interactions among the molecules, a dual-view heterogeneous graph in which the connectivity of the outer interaction graph is enhanced and the data volume of the inner molecular structure graphs is expanded; constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the representation module learns representations of drugs and endogenous proteins in each view based on graph neural networks, and the fusion prediction module then combines the two views to give a prediction; and training the constructed model and tuning its hyper-parameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.

Description

Construction method, prediction method and device of drug interaction prediction model

Technical Field

The application relates to the intersection of computer science, drug discovery and medical technology, and in particular to a construction method, a prediction method and a device for a drug interaction prediction model.

Background

Drugs are divided into chemical drugs (small-molecule drugs) and biological drugs (macromolecular drugs). The interaction of two or more drugs in the body (a drug-drug interaction) can significantly affect the normal efficacy of the drugs and may even endanger life. Because chemical drugs and biological drugs differ markedly in molecular properties, production methods, routes of administration, effects and so on, prediction methods designed for chemical drugs do not adapt well to predicting drug interactions that involve biological drugs.

Researchers at home and abroad have done some work on the drug interaction prediction task, but this work has the following limitations: 1) Prediction of drug interactions involving biological drugs remains a gap. Existing drug interaction prediction work targets only small-molecule chemical drugs and neglects macromolecular biological drugs, which have few side effects and high specificity. 2) Methods for predicting interactions between chemical drugs cannot be directly extended to identifying interactions between chemical drugs and biological drugs. First, a drug interaction network that contains biological drugs is sparser: as new products, biological drugs are currently limited in number, and many of their interactions have not yet been observed in experiments or clinical practice. Second, chemical drugs and biological drugs are highly heterogeneous and differ greatly in physicochemical properties. In computer-aided drug interaction prediction, it is not feasible to collect uniform and rich feature data for large numbers of chemical and biological drugs, or to model both classes of drugs at the same granularity.

Disclosure of Invention

The embodiment of the application aims to provide a construction method, a prediction method and a device of a drug interaction prediction model, so as to solve the technical problems that the cost for collecting drug characteristic data is too high, and the drug interaction and the specific interaction type related to biological drugs are difficult to predict in the existing method.

According to a first aspect of the embodiments of the present invention, there is provided a method for constructing a drug interaction prediction model, including:

collecting and collating the molecular linear representations of drugs and endogenous proteins and the interactions among these molecules, wherein the drugs comprise chemical drugs and biological drugs;

constructing, from the molecular linear representations and the interactions among the molecules, a dual-view heterogeneous graph in which the connectivity of the outer interaction graph is enhanced and the data volume of the inner molecular structure graphs is expanded;

constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the representation module learns representations of drugs and endogenous proteins in each view based on graph neural networks, and the fusion prediction module then combines the two views to give a prediction;

training the built model and adjusting the hyper-parameters of the model to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.

According to a second aspect of the embodiments of the present invention, there is provided a device for constructing a drug interaction prediction model, including:

a collection module for collecting and collating the molecular linear representations of drugs and endogenous proteins and the interactions among these molecules, wherein the drugs comprise chemical drugs and biological drugs;

a graph construction module for constructing, from the molecular linear representations and the interactions among the molecules, a dual-view heterogeneous graph in which the connectivity of the outer interaction graph is enhanced and the data volume of the inner molecular structure graphs is expanded;

a model construction module for constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the representation module learns representations of drugs and endogenous proteins in each view based on graph neural networks, and the fusion prediction module then combines the two views to give a prediction;

and a training module for training the constructed model and tuning its hyper-parameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.

According to a third aspect of embodiments of the present invention, there is provided a drug interaction prediction method, including:

inputting the drug combination to be predicted and the drug interaction type to be predicted into the drug interaction prediction model of the first aspect, and obtaining a prediction result.

According to a fourth aspect of embodiments of the present invention, there is provided a drug interaction prediction apparatus comprising:

the prediction module is used for inputting the drug combination to be predicted and the drug interaction type to be predicted into the drug interaction prediction model of the first aspect to obtain a prediction result.

According to a fifth aspect of an embodiment of the present invention, there is provided an electronic apparatus, including:

one or more processors;

a memory for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method as described in the first or third aspect.

The technical scheme provided by the embodiment of the application can have the following beneficial effects:

According to the embodiments, the interactions between drugs and endogenous proteins and the molecular structures of the endogenous proteins are introduced. The data associated with the drugs is enriched by exploiting two facts: both chemical drugs and biological drugs act by docking with endogenous proteins, and both biological drugs and endogenous proteins are macromolecules. This makes it possible to predict drug interactions that involve biological drugs.

Compared with typical artificial-intelligence-assisted methods in biomedicine, the method does not depend on domain knowledge or feature engineering, and provides a simple, accurate and efficient solution for predicting drug interactions among chemical drugs and biological drugs.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

FIG. 1 is a flow chart illustrating a method of constructing a drug interaction prediction model in accordance with an exemplary embodiment.

FIG. 2 is a schematic diagram illustrating a drug interaction graph containing chemical drugs and biological drugs, according to an exemplary embodiment.

FIG. 3 is a schematic diagram illustrating a connectivity-enhanced outer interaction graph, according to an exemplary embodiment.

FIG. 4 is a schematic diagram illustrating a dual-view heterogeneous graph, according to an exemplary embodiment.

FIG. 5 is a schematic diagram illustrating a dual-view heterogeneous graph encoding module, according to an exemplary embodiment.

FIG. 6 is a block diagram illustrating an apparatus for constructing a drug interaction prediction model, according to an exemplary embodiment.

FIG. 7 is a flow chart illustrating a method of drug interaction prediction according to an exemplary embodiment.

FIG. 8 is a block diagram illustrating a drug interaction prediction device, according to an exemplary embodiment.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

Referring to fig. 1, an embodiment of the present invention provides a method for constructing a drug interaction prediction model, including the following steps:

S100: collecting and collating the molecular linear representations of drugs and endogenous proteins and the interactions among these molecules, wherein the drugs comprise chemical drugs and biological drugs; this step may include the following sub-steps:

S110: collecting and collating drug IDs, molecular linear representations, drug-drug interaction data and drug-endogenous protein interaction data from the drug information platform DrugBank;

Specifically, S111: parsing the XML file provided by DrugBank (version 5.1.8) with the third-party R package dbparser, and collating it to obtain the molecular linear representations of small-molecule chemical drugs (SMILES expressions), the molecular linear representations of macromolecular biological drugs (amino acid sequences), drug-drug interaction data and drug-endogenous protein interaction data;

S112: cleaning the data, specifically: removing biological drugs for which an amino acid residue contact map cannot be computed (the criterion being a single chain longer than 1000 amino acids, or a drug comprising multiple chains); removing chemical drugs for which a molecular graph cannot be constructed from the SMILES expression with the RDKit toolkit; removing drugs that have no known interaction data or fewer than 3 known interactions; and removing drug interaction types with fewer than 1000 records in total;
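The filtering rules of S112 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' code: the `Drug` record, the `smiles_ok` flag (standing in for a successful RDKit parse), and the function names are hypothetical, and the filters are applied once in the order listed in the text, since the text does not say whether they are iterated.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (drug ID, interaction type, drug ID)


@dataclass
class Drug:
    drug_id: str
    kind: str  # "chemical" or "biological"
    chain_lengths: Optional[List[int]] = None  # biological drugs: one length per chain
    smiles_ok: bool = True  # chemical drugs: placeholder for a successful RDKit parse


def structure_usable(drug: Drug, max_chain_len: int = 1000) -> bool:
    """Return True if an inner molecular graph can be built for the drug."""
    if drug.kind == "biological":
        chains = drug.chain_lengths or []
        # Rejected: multi-chain drugs and single chains longer than the limit.
        return len(chains) == 1 and chains[0] <= max_chain_len
    return drug.smiles_ok


def clean(drugs: Iterable[Drug], triples: Iterable[Triple],
          min_interactions: int = 3, min_type_records: int = 1000):
    """Apply the S112 filters once, in the order given in the text."""
    keep: Set[str] = {d.drug_id for d in drugs if structure_usable(d)}
    kept = [t for t in triples if t[0] in keep and t[2] in keep]

    # Drop drugs with no, or too few, known interactions.
    degree = Counter()
    for head, _, tail in kept:
        degree[head] += 1
        degree[tail] += 1
    keep = {d for d in keep if degree[d] >= min_interactions}
    kept = [t for t in kept if t[0] in keep and t[2] in keep]

    # Drop interaction types with too few records.
    type_counts = Counter(t[1] for t in kept)
    kept = [t for t in kept if type_counts[t[1]] >= min_type_records]
    return keep, kept
```

The thresholds default to the values stated in S112 (3 interactions, 1000 records per type) and can be lowered for small test sets.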

S120: converting the IDs of the endogenous proteins in the drug-endogenous protein interaction data into their IDs in a protein database, and querying the protein database by these IDs to obtain the molecular linear representation of each endogenous protein and the interactions among the endogenous proteins.

Specifically, the IDs of the endogenous proteins in the drug-endogenous protein interaction data from step S110 are converted, through the UniProt website, into their IDs in the StringDB protein interaction database; the amino acid sequences of these endogenous proteins are collected, and the proteins are submitted to StringDB to query the interactions among them.

The resulting data are divided into two data sets, C-DB and CB-DB, according to whether biological drugs are considered. C-DB considers only chemical drugs: 1586 drugs, 496 endogenous proteins, 62 interaction types, and nearly 450,000 drug interaction records in total. CB-DB considers both chemical and biological drugs: 3174 drugs, 811 endogenous proteins, 62 interaction types, and approximately 1.12 million drug interaction records in total. Each interaction record is represented as a triple of the form <drug ID, interaction type, drug ID>.

The molecular linear representation of a chemical drug is the SMILES expression of the small molecule that makes up the drug. A biological drug here means a protein-based biological drug, and its molecular linear representation is the amino acid sequence of the protein macromolecule that makes up the drug. The molecular linear representation of an endogenous protein is the amino acid sequence of the protein macromolecule that makes it up.

S200: constructing, from the molecular linear representations and the interactions among the molecules, a dual-view heterogeneous graph in which the connectivity of the outer interaction graph is enhanced and the data volume of the inner molecular structure graphs is expanded; this step may include the following sub-steps:

S210: taking only the molecular linear representation of a single drug or endogenous protein as input, constructing an attributed molecular structure graph for that molecule as its inner molecular structure graph;

Specifically, S211: converting the linear representation of a small molecule (its SMILES expression) into a molecular graph with the RDKit toolkit, vectorizing the basic attributes of atoms and chemical bonds with the dgl-lifesci library, and attaching them to the molecular graph to obtain the inner molecular structure graph of the small molecule.

S212: for a macromolecule, computing (predicting) the amino acid residue contact map with HHblits, a protein sequence search and alignment tool based on iterative homology detection, and CCMpred, a protein residue contact map prediction tool; binarizing the contact map matrix with a threshold of 0.5; and constructing a molecular structure graph with amino acids as nodes, where an edge exists between nodes i and j if element [i, j] of the binarized contact map matrix is 1. After the graph is built, features such as the amino acid type are added to the nodes, and the attributed graph is taken as the inner molecular structure graph of the macromolecule.
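The binarization and graph-building part of S212 can be illustrated with a short sketch. It assumes the contact probabilities (for example, from CCMpred) are already available as a square matrix; the function names are hypothetical. Two details are assumptions not fixed by the text: a value exactly equal to 0.5 is treated as no contact, and the diagonal is ignored so that each undirected edge is listed once with i < j.

```python
from typing import Dict, List, Tuple


def contact_map_to_edges(prob: List[List[float]],
                         threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Binarize a residue contact map and list the undirected edges (i < j)."""
    n = len(prob)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: strictly above the threshold counts as a contact.
            if prob[i][j] > threshold:
                edges.append((i, j))
    return edges


def residue_graph(sequence: str, prob: List[List[float]]):
    """Build the inner graph of a macromolecule: residues as attributed nodes."""
    if len(sequence) != len(prob):
        raise ValueError("sequence length and contact map size differ")
    nodes: List[Dict[str, object]] = [
        {"index": i, "residue": aa} for i, aa in enumerate(sequence)
    ]
    return nodes, contact_map_to_edges(prob)
```

Only the amino acid type is attached as a node feature here; the text mentions "features such as the amino acid type" without listing the others.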

Next, the outer interaction graph is constructed. FIG. 2 gives an example of an outer interaction graph containing only drug-drug interaction data; such interactions may occur between chemical drugs, between biological drugs, or between a chemical drug and a biological drug. In practice, the interaction of one drug with another is abstracted as a directed edge from the former to the latter, and the type of the edge is the type of the interaction. At present a large number of drug-drug interactions remain undiscovered, and knowledge of the interactions of new drugs such as biological drugs is especially limited, so the drug nodes of an interaction graph that contains only drug-drug interaction information are poorly connected. Because the performance of graph-neural-network-based drug interaction prediction methods depends strongly on graph connectivity, a connectivity-enhanced outer interaction graph is constructed by introducing drug-endogenous protein interaction data and endogenous protein-endogenous protein interaction data, so that more drugs are connected through endogenous proteins acting as relays.

S220: taking the collected interaction data among drugs and endogenous proteins as input, treating drugs and endogenous proteins as two types of nodes and the different types of interactions among them as different types of edges, constructing a heterogeneous interaction graph with stronger connectivity than a purely drug-drug interaction graph, as the outer interaction graph;

Specifically, with drugs and endogenous proteins as nodes and drug-drug, drug-endogenous protein and endogenous protein-endogenous protein interactions as edges, a heterogeneous graph containing two types of nodes (drugs and endogenous proteins) is constructed as the outer interaction graph. The inner graphs of small-molecule chemical drugs and macromolecular biological drugs are heterogeneous, but in the outer graph both are treated as drug nodes. The edges of the outer interaction graph fall into three major classes and therefore comprise at least 3 types: directed drug-drug interaction edges, undirected drug-endogenous protein interaction edges, and undirected endogenous protein-endogenous protein interaction edges. Drug-drug interactions are the prediction target, so this class can be subdivided by the specific type of drug-drug interaction event, each treated as a distinct edge type in the heterogeneous graph. FIG. 3 gives an example of a connectivity-enhanced outer interaction graph, in which the types and directions of drug-drug interactions are not shown.
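The effect of using endogenous proteins as relays can be shown with a small sketch of the outer graph. The class and method names are hypothetical, and reachability is checked on the undirected skeleton of the graph as a simple proxy for connectivity; the text does not define a specific connectivity measure.

```python
from collections import defaultdict, deque


class OuterGraph:
    """Typed nodes and typed edges; reachability ignores edge direction."""

    def __init__(self):
        self.node_type = {}           # node ID -> "drug" or "protein"
        self.edges = []               # (source, edge type, target, directed?)
        self._adj = defaultdict(set)  # undirected adjacency for reachability

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, src, edge_type, dst, directed):
        for node in (src, dst):
            if node not in self.node_type:
                raise KeyError(f"unknown node: {node}")
        self.edges.append((src, edge_type, dst, directed))
        self._adj[src].add(dst)
        self._adj[dst].add(src)

    def connected(self, a, b):
        """Breadth-first search over the undirected skeleton of the graph."""
        seen, queue = {a}, deque([a])
        while queue:
            node = queue.popleft()
            if node == b:
                return True
            for nxt in self._adj[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        return False
```

With two drugs that share no known drug-drug interaction, `connected` returns False until drug-protein and protein-protein edges are added along a path between them, which is the situation FIG. 3 is described as illustrating.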

S230: associating each constructed inner molecular structure graph with its corresponding node in the constructed outer graph, completing the construction of the dual-view heterogeneous graph. FIG. 4 illustrates a constructed dual-view heterogeneous graph.

S300: constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the representation module learns representations of drugs and endogenous proteins in each view based on graph neural networks, and the fusion prediction module then combines the two views to give a prediction;

Referring to FIG. 5, the dual-view heterogeneous graph representation module uses neural-network-based methods to extract, from the inner molecular structure graphs and the outer interaction graph respectively, an inner representation F (consisting of the inner representation $F_s$ of small molecules and the inner representation $F_l$ of macromolecules), which carries information on the properties of individual molecules, and an outer representation Z, which carries information on the regularities of molecular interactions, so that the two kinds of information can later be combined to predict the probability that a given type of interaction occurs for a given drug combination. S300 may include the following steps:

S310: the dual-view heterogeneous graph encoding module uses a graph-neural-network-based molecular representation learning method to encode the inner molecular structure graphs of chemical drugs and the inner molecular structure graphs of biological drugs and endogenous proteins separately, obtaining the inner representation of each drug or endogenous protein;

Specifically, one encoder is built for small molecules and one for macromolecules, and each extracts a graph-level representation from the corresponding inner molecular structure graphs, i.e., the inner representation of a single drug or endogenous protein molecule. AttentiveFP, a graph neural network suited to molecular representation learning, is chosen as the encoder architecture.

S320: setting the initial node representations in the outer interaction graph, i.e., the initial representations of drugs and endogenous proteins, to the corresponding inner representations; the dual-view heterogeneous graph encoding module then extracts the outer representations of drugs and endogenous proteins from the outer interaction graph using a graph neural network suited to multi-relational graphs;

Specifically, a relational graph convolutional network (RGCN), which is suited to multi-relational modeling on heterogeneous graphs, is chosen as the encoder to extract node representations from the outer interaction graph, i.e., the outer representation of a single drug or endogenous protein.
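To make the role of the relation-specific weights concrete, the following sketch performs one propagation step in the style of the standard R-GCN rule: each node combines a self-transform with the mean of relation-specific transforms of its incoming neighbours, followed by ReLU. It is a plain-Python illustration with hypothetical names, without basis decomposition or training; the text does not give the exact layer configuration. Undirected edges are assumed to be supplied in both directions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rgcn_layer(h: Dict[str, Vector],
               edges: List[Tuple[str, str, str]],
               w_rel: Dict[str, Matrix],
               w_self: Matrix) -> Dict[str, Vector]:
    """One relational message-passing step over (source, relation, target) edges."""
    # Group incoming neighbours by (target node, relation).
    incoming = defaultdict(list)
    for src, rel, dst in edges:
        incoming[(dst, rel)].append(src)

    out = {}
    for node, vec in h.items():
        acc = matvec(w_self, vec)  # self-connection term
        for rel, w in w_rel.items():
            nbrs = incoming.get((node, rel), [])
            for src in nbrs:
                msg = matvec(w, h[src])
                # Mean normalisation over neighbours under this relation.
                acc = [a + m / len(nbrs) for a, m in zip(acc, msg)]
        out[node] = [max(0.0, a) for a in acc]  # ReLU
    return out
```

In the method described above, the input vectors `h` would be the inner representations set as initial node representations in S320, and the output after the last layer would serve as the outer representation Z.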

S330: the dual-view fusion prediction module comprises a representation alignment module, an inner scorer and an outer scorer. Using the inner representations and the outer representations respectively, the two scorers predict the probability that a specific type of interaction occurs for a drug combination under the inner view and the outer view.

Specifically, a drug combination (drug $d_1$ and drug $d_2$) together with the drug interaction type of interest, $act$, is formalized as a triple $t = \langle d_1, act, d_2 \rangle$. The inner and outer scorers each hold a set of trainable weights that are specific to the interaction type; each takes the representations of $d_1$ and $d_2$ as input and predicts the probability that the interaction exists under the inner or outer view. Here $F(d)$ and $Z(d)$ are the inner and outer representations of drug $d$, $\odot$ is the Hadamard product, $\sigma$ is the sigmoid function, and each scorer has its own trainable weight vector for the interaction type $act$. The probability scores that the inner and outer scorers assign to $t = \langle d_1, act, d_2 \rangle$ are denoted $S^{intra}(t)$ and $S^{inter}(t)$, respectively.
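The scoring formula itself is not legible in this text. One form consistent with the ingredients named above (a Hadamard product, a per-type weight vector and a sigmoid) is a DistMult-style score, $\sigma\big(\sum_k \mathrm{rep}(d_1)_k \, w_k \, \mathrm{rep}(d_2)_k\big)$. The sketch below implements that form as an assumption; the function names are hypothetical. Note that this form is symmetric in $d_1$ and $d_2$, whereas drug-drug edges are described as directed, so the original formula may differ.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score(rep_d1: Sequence[float], rep_d2: Sequence[float],
          w_act: Sequence[float]) -> float:
    """Assumed scorer: sigmoid of the sum of rep_d1 * w_act * rep_d2 (elementwise)."""
    if not (len(rep_d1) == len(rep_d2) == len(w_act)):
        raise ValueError("representations and weight vector must share a dimension")
    return sigmoid(sum(a * w * b for a, w, b in zip(rep_d1, w_act, rep_d2)))
```

Under this assumption the inner scorer would call `score(F(d1), F(d2), w)` with its own weight vector for `act`, and the outer scorer would call `score(Z(d1), Z(d2), w)` with a separate one.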

S400: training the constructed model and tuning its hyper-parameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs; this step may include the following sub-steps:

S410: training the dual-view heterogeneous graph encoding module and the dual-view fusion prediction module end to end by gradient descent, so that the outputs of the two scorers are as close as possible to the true values and to each other, while maximizing the mutual information between the inner and outer representations by contrastive learning, so that molecular structure information extracted in the inner view is "injected" into the outer representation, which is more closely tied to the multi-type drug interaction prediction task;

Specifically, the overall optimization objective in model training is:

$$\mathcal{L} = \mathcal{L}_{sup} + \beta\,\mathcal{L}_{dis} + \gamma\,\mathcal{L}_{con}$$

where $\mathcal{L}_{sup}$ is the supervised learning loss, $\mathcal{L}_{dis}$ and $\mathcal{L}_{con}$ are the inner/outer prediction inconsistency loss and the inner/outer molecular representation inconsistency loss, respectively, and $\beta$ and $\gamma$ are hyper-parameters that balance the weights of $\mathcal{L}_{dis}$ and $\mathcal{L}_{con}$.

Binary cross entropy (BCE) is used to measure the difference between the true values and the predicted values of the inner and outer views. The supervised learning loss $\mathcal{L}_{sup}$ is defined over the set of drug-drug interaction triples, where $y_t$ is the true value of triple $t$, $S^{intra}(t)$ and $S^{inter}(t)$ are the predicted values given to $t$ by the inner and outer scorers, respectively, and $\alpha$ is a hyper-parameter that adjusts the relative weight of the inner-view and outer-view prediction errors in the supervised loss.
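The supervised loss can be sketched as below. The formula is not legible in this text, so several choices are assumptions: the inner-view BCE term is weighted by $\alpha$ (as the hyper-parameter list later states), the outer-view term has a separate weight that defaults to 1, and the loss is averaged over triples rather than summed. The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    """Binary cross entropy between a predicted probability p and a label y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def supervised_loss(samples: Iterable[Tuple[float, float, float]],
                    alpha: float, outer_weight: float = 1.0) -> float:
    """Mean weighted BCE over (y_t, S_intra(t), S_inter(t)) samples.

    Assumption: alpha weights the inner-view term; the outer-view weight
    is exposed separately because the original weighting is not legible.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one labelled triple is required")
    total = sum(alpha * bce(s_intra, y) + outer_weight * bce(s_inter, y)
                for y, s_intra, s_inter in samples)
    return total / len(samples)
```

Passing `outer_weight=1 - alpha` gives the convex-combination variant, which is an equally plausible reading of the description.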

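The KL divergence (Kullback-Leibler divergence) is used to measure the inconsistency between the predictions of the inner and outer scorers; the prediction inconsistency loss $\mathcal{L}_{dis}$ is defined as the KL divergence between the two scorers' predicted values.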
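A minimal sketch of such a consistency term treats each scorer's output for a triple as a Bernoulli distribution and averages the KL divergence over triples. The direction of the divergence (inner relative to outer) and the averaging are assumptions, since the formula is not legible in this text; the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)), with clipping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def consistency_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean KL between inner and outer predictions, given (S_intra, S_inter) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one prediction pair is required")
    return sum(bernoulli_kl(s_intra, s_inter) for s_intra, s_inter in pairs) / len(pairs)
```

The term is zero when the two scorers agree exactly and grows as their predicted probabilities diverge.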

The contrastive loss $\mathcal{L}_{con}$ is used to maximize the mutual information between the inner representation F and the outer representation Z by contrastive learning. Within F, the inner representations of small molecules $F_s$ and of macromolecules $F_l$, and the outer representation Z, are produced by different encoders, so the three groups of representations lie in different spaces. Therefore, before the mutual information is computed, a network consisting of two fully connected layers and one skip connection projects each group into a common space; the projected representations are denoted $F_s'$, $F_l'$ and $Z'$. Positive and negative sample pairs are then constructed by the following rules: the graph formed by the nodes of one type in the outer graph and the edges among them is treated as a homogeneous subgraph of the original outer interaction graph, i.e., the specific types of the edges between nodes of the same type are not distinguished; positive and negative pairs are built on each homogeneous subgraph. Specifically, taking a node u as the anchor, u paired with any node directly connected to it, or with u itself, forms a positive sample, and u paired with any node not directly connected to it forms a negative sample.
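The pairing rule can be sketched for a single homogeneous subgraph as follows. The function name is hypothetical; edges are assumed to be already restricted to nodes of one type, with edge types and directions collapsed. All pairs are enumerated for clarity, whereas a practical implementation would probably sample negatives.

```python
from typing import Dict, Iterable, List, Set, Tuple


def sample_pairs(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]):
    """Return (positives, negatives) as lists of (anchor, other) node pairs."""
    ordered: List[str] = sorted(set(nodes))
    # Each node is its own positive partner, so start with a self-loop.
    nbrs: Dict[str, Set[str]] = {u: {u} for u in ordered}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)

    positives = [(u, v) for u in ordered for v in sorted(nbrs[u])]
    negatives = [(u, v) for u in ordered for v in ordered if v not in nbrs[u]]
    return positives, negatives
```

The same function would be called once on the drug subgraph and once on the endogenous protein subgraph.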

After the positive and negative sample pairs are constructed, the contrastive learning loss $\mathcal{L}_{con}$ is defined over the two homogeneous subgraphs of the outer interaction graph, one formed by the drug nodes and one by the endogenous protein nodes, where $F'(i)$ and $Z'(i)$ are the projected inner representation and the projected outer representation of node $i$, respectively. The lower bound of the mutual information is computed using the JS divergence (Jensen-Shannon divergence).
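One common way to compute a Jensen-Shannon lower bound on mutual information from positive and negative pairs is the estimator $\mathbb{E}_{pos}[-\mathrm{sp}(-T)] - \mathbb{E}_{neg}[\mathrm{sp}(T)]$, where $\mathrm{sp}$ is the softplus function and $T$ is a critic score. The sketch below uses that estimator and, as an assumption, a dot product between the projected inner representation of the anchor and the projected outer representation of the other node as the critic; the formula in the original is not legible, so the pairing and critic may differ. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def jsd_contrastive_loss(f_proj: Dict[str, Sequence[float]],
                         z_proj: Dict[str, Sequence[float]],
                         positives: List[Tuple[str, str]],
                         negatives: List[Tuple[str, str]]) -> float:
    """Negative JSD-based mutual information bound; lower is better."""
    if not positives or not negatives:
        raise ValueError("both positive and negative pairs are required")
    pos_term = sum(-softplus(-dot(f_proj[i], z_proj[j]))
                   for i, j in positives) / len(positives)
    neg_term = sum(softplus(dot(f_proj[i], z_proj[j]))
                   for i, j in negatives) / len(negatives)
    return -(pos_term - neg_term)
```

Minimizing this loss pushes critic scores up on positive pairs and down on negative pairs, which aligns the two views of each node and of its direct neighbours.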

S420: tuning the hyper-parameters of the model, training the model under the optimal setting, and saving the outer representations of the drugs and the network in the dual-view fusion prediction module as the drug interaction prediction model for later use. The whole model is trained end to end so that the outputs of the two scorers are as close as possible to the true values and the distributions of the two sets of outputs are as close as possible to each other; at the same time, contrastive learning maximizes the inner/outer mutual information over drug-drug and endogenous protein-endogenous protein node pairs, which aligns the inner and outer representations of drugs and endogenous proteins and "injects" molecular structure information extracted in the inner view into the outer representations, which are more closely related to the interaction prediction task;

Specifically, the Adam optimizer is used for full-batch training, and the following hyper-parameters are tuned:

1) the weight coefficient α of the inner-view supervised learning loss in the objective function;

2) the weight coefficient β of the inner/outer prediction inconsistency loss in the objective function;

3) the weight γ of the inner/outer representation distribution inconsistency loss in the objective function;

4) the number of layers of the graph neural networks used to encode the inner small-molecule and macromolecule structure graphs;

5) the number of layers of the graph neural network used to encode the outer interaction graph.

S430: under the optimal setting, the trained model is used to predict multi-type interactions of drug combinations, with the output of the outer scorer taken as the final prediction.

Referring to Tables 1 and 2, in terms of prediction performance, the method proposed by the invention (denoted chembitip in Tables 1 and 2) outperforms the compared graph representation learning methods (HGT, GraIL, RGCN, Decagon, MIRACLE) both when only chemical drugs are considered (C-DB, left side of Table 1) and when both chemical and biological drugs are considered (CB-DB, right side of Table 1). In addition, variant models are obtained by replacing, adding or removing the three modules of the model (the dual-view heterogeneous graph construction module, the dual-view heterogeneous graph encoding module and the dual-view fusion prediction module); compared with these variants, the complete model, which uses the outer interaction graph enhancement strategy and the multi-view contrastive fusion strategy, predicts better.

Table 1: results of each model on both C-DB and CB-DB data sets

Table 2: ablation experimental result of submodule on two data sets of C-DB and CB-DB

Corresponding to the embodiment of the method for constructing the drug interaction prediction model, the application also provides an embodiment of a device for constructing the drug interaction prediction model.

FIG. 6 is a block diagram of an apparatus for constructing a drug interaction prediction model, according to an exemplary embodiment. Referring to FIG. 6, the apparatus includes a collection module 100, a graph construction module 200, a model construction module 300, and a training module 400.

The collection module 100 is used for collecting and collating the molecular linear representations of drugs and endogenous proteins and the interactions among these molecules, wherein the drugs comprise chemical drugs and biological drugs;

the graph construction module 200 is used for constructing, from the molecular linear representations and the interactions among the molecules, a dual-view heterogeneous graph in which the connectivity of the outer interaction graph is enhanced and the data volume of the inner molecular structure graphs is expanded;

the model construction module 300 is used for constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the representation module learns representations of drugs and endogenous proteins in each view based on graph neural networks, and the fusion prediction module then combines the two views to give a prediction;

and the training module 400 is used for training the constructed model and tuning its hyper-parameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.

Referring to fig. 7, an embodiment of the present invention further provides a method for predicting drug interaction, including:

and inputting the drug combination to be predicted and the drug interaction type to be predicted into the drug interaction prediction model to obtain a prediction result.

In correspondence with the foregoing embodiments of the drug interaction prediction method, the present application also provides embodiments of a drug interaction prediction device.

FIG. 8 is a block diagram illustrating a drug interaction prediction device, according to an exemplary embodiment. Referring to fig. 8, the apparatus includes a prediction module 500.

The prediction module 500 is configured to input the drug combination to be predicted and the drug interaction type to be predicted into the drug interaction prediction model to obtain a prediction result.
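A prediction call of this kind can be sketched as a lookup followed by a scoring step. The sketch below assumes that the stored model consists of one learned vector per drug plus a scoring network indexed by interaction type. The class name `DrugInteractionPredictor`, the diagonal bilinear scoring form, and all numeric values are illustrative assumptions, not details given in the application.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class DrugInteractionPredictor:
    """Hypothetical wrapper around stored drug vectors and a per-type scorer."""

    def __init__(self, drug_repr: Dict[str, List[float]],
                 type_weights: Dict[str, List[float]]) -> None:
        self.drug_repr = drug_repr          # learned vector per drug ID
        self.type_weights = type_weights    # one weight vector per interaction type

    def predict(self, drug_a: str, drug_b: str, interaction_type: str) -> float:
        """Probability that the pair exhibits the given interaction type."""
        for drug in (drug_a, drug_b):
            if drug not in self.drug_repr:
                raise KeyError(f"no stored representation for {drug}")
        a, b = self.drug_repr[drug_a], self.drug_repr[drug_b]
        w = self.type_weights[interaction_type]
        # Diagonal bilinear score (an assumed scoring form): sum_k a_k * w_k * b_k
        score = sum(x * r * y for x, r, y in zip(a, w, b))
        return sigmoid(score)


# Toy values chosen by hand purely for illustration.
predictor = DrugInteractionPredictor(
    drug_repr={"drug_A": [1.0, 0.0], "drug_B": [1.0, 0.0], "drug_C": [0.0, 1.0]},
    type_weights={"type_1": [2.0, 0.0]},
)
p_ab = predictor.predict("drug_A", "drug_B", "type_1")
p_ac = predictor.predict("drug_A", "drug_C", "type_1")
```

Because the assumed score is symmetric in the two drugs, swapping the order of the pair does not change the predicted probability; a scorer for direction-dependent interaction types would need an asymmetric form.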

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Because the device embodiments substantially correspond to the method embodiments, the relevant parts of the method description apply to them as well. The device embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present application. A person of ordinary skill in the art can understand and implement this without inventive effort.

Correspondingly, the present application also provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of constructing a drug interaction prediction model or the drug interaction prediction method described above.

Accordingly, the present application also provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method of constructing a drug interaction prediction model or a method of predicting drug interaction as described above.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of what is disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the application, including departures from the present disclosure that come within known or customary practice in the art to which the invention pertains. The specification and examples are to be considered exemplary only, with the true scope and spirit of the application being indicated by the following claims.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

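1. A method for constructing a drug interaction prediction model, characterized by comprising the following steps:

collecting and collating the molecular linear representations of drugs and endogenous proteins, and the interactions among these molecules, wherein the drugs include chemical drugs and biological drugs;

using the molecular linear representations and the interactions among the molecules to construct a dual-view heterogeneous graph, in which the outer-layer interaction graph has enhanced connectivity and the inner-layer molecular structure graphs provide an expanded data volume;

constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the dual-view heterogeneous graph representation module learns representations of drugs and endogenous proteins under each view based on graph neural networks, and the dual-view fusion prediction module then combines the two views to give a prediction;

and training the constructed model and tuning its hyper-parameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.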

2. The method of claim 1, wherein collecting and collating the molecular linear representations of drugs and endogenous proteins, and the interactions among these molecules, comprises:

collecting and collating drug IDs and molecular linear representations, drug-drug interaction data, and drug-endogenous protein interaction data from a drug information platform;

and converting the IDs of the endogenous proteins in the drug-endogenous protein interaction data into the corresponding IDs in a protein database, and querying the protein database by these IDs to obtain the molecular linear representation of each endogenous protein and the interactions among the endogenous proteins.
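The ID conversion and lookup in this step can be sketched as two small dictionary operations. Everything named here is hypothetical: the functions `convert_protein_ids` and `lookup_sequences`, the placeholder IDs, and the sequence table are illustrative only, and setting unmapped proteins aside (rather than failing) is an assumption, since the text does not say how they are handled.

```python
from typing import Dict, Iterable, List, Tuple


def convert_protein_ids(
    drug_protein_pairs: Iterable[Tuple[str, str]],
    id_map: Dict[str, str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Rewrite platform protein IDs as protein-database IDs.

    Returns the converted pairs and the platform IDs that had no mapping.
    """
    converted: List[Tuple[str, str]] = []
    unmapped: List[str] = []
    for drug_id, platform_id in drug_protein_pairs:
        db_id = id_map.get(platform_id)
        if db_id is None:
            unmapped.append(platform_id)      # assumption: set aside, do not fail
        else:
            converted.append((drug_id, db_id))
    return converted, unmapped


def lookup_sequences(protein_ids: Iterable[str],
                     sequence_table: Dict[str, str]) -> Dict[str, str]:
    """Fetch the amino acid sequence for every ID present in the table."""
    return {pid: sequence_table[pid] for pid in protein_ids if pid in sequence_table}


# Placeholder records; none of these IDs or sequences come from a real database.
pairs = [("DRUG_1", "PLAT_P1"), ("DRUG_2", "PLAT_P2"), ("DRUG_2", "PLAT_P9")]
id_map = {"PLAT_P1": "DB_P1", "PLAT_P2": "DB_P2"}
sequence_table = {"DB_P1": "MKTAYIA", "DB_P2": "MSLLTE"}

converted, unmapped = convert_protein_ids(pairs, id_map)
sequences = lookup_sequences({db_id for _, db_id in converted}, sequence_table)
```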

3. The method of claim 2, wherein the molecular linear representation of a chemical drug is the SMILES expression of the small molecule that makes up the drug; the biological drug is a protein-based biological drug, and its molecular linear representation is the amino acid sequence of the protein macromolecule that makes up the drug; and the molecular linear representation of an endogenous protein is the amino acid sequence of the protein macromolecule that makes it up.

4. The method of claim 1, wherein using the molecular linear representations and the interactions among the molecules to construct a dual-view heterogeneous graph, in which the outer-layer interaction graph has enhanced connectivity and the inner-layer molecular structure graphs provide an expanded data volume, comprises:

taking only the molecular linear representation of a single drug or endogenous protein as input, constructing an attributed molecular structure graph for that drug or endogenous protein molecule, and using it as the inner-layer molecular structure graph;

taking the collected interaction data among drugs and endogenous proteins as input, treating drugs and endogenous proteins each as one type of node and the different types of interaction among them as different types of edges, and constructing a heterogeneous interaction graph with stronger connectivity than a drug-only interaction graph, which serves as the outer-layer interaction graph;

and associating each constructed inner-layer molecular structure graph with its corresponding node in the constructed outer-layer interaction graph, thereby completing construction of the dual-view heterogeneous graph.

5. The method of claim 1, wherein the dual-view heterogeneous graph representation module learning representations of drugs and endogenous proteins under each view based on graph neural networks, and the dual-view fusion prediction module then combining the two views to give a prediction, comprises:

the dual-view heterogeneous graph representation module separately encodes the inner-layer molecular structure graphs of chemical drugs and the inner-layer molecular structure graphs of biological drugs and endogenous proteins, using molecular representation learning methods based on graph neural networks, to obtain an inner-layer representation of each drug or endogenous protein;

the initial node representations in the outer-layer interaction graph, namely the initial representations of the drugs and endogenous proteins, are set to the corresponding inner-layer representations, and the dual-view heterogeneous graph representation module then extracts the outer-layer representations of the drugs and endogenous proteins from the outer-layer interaction graph using a graph neural network suitable for multi-relational graphs;

the dual-view fusion prediction module comprises an inner-layer scorer and an outer-layer scorer, which map the inner-layer representations and the outer-layer representations into the same space and align them by maximizing mutual information, and which use the inner-layer representations and the outer-layer representations, respectively, to predict the probability that a drug combination exhibits a specific type of interaction under the inner-layer view and under the outer-layer view.

6. The method of claim 1, wherein training the constructed model and tuning its hyper-parameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs comprises:

training the dual-view heterogeneous graph representation module and the dual-view fusion prediction module end to end by gradient descent, so that the outputs of the dual-view fusion prediction module approach the ground truth and agree with each other, while maximizing the mutual information between the inner-layer representations and the outer-layer representations by contrastive learning, so that the molecular structure information extracted from the inner layer is injected into the outer-layer representations, which are more closely related to the multi-type drug interaction prediction task;

and tuning the hyper-parameters of the model, training the model under the optimal hyper-parameter setting, and storing the outer-layer representations of the drugs together with the network in the dual-view fusion prediction module as the drug interaction prediction model.
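A training objective of this shape can be sketched in a few lines. The sketch below is an assumption-laden illustration: it uses the InfoNCE loss as the contrastive term (minimizing it maximizes a lower bound on the mutual information between paired views), a squared difference as the agreement term between the two scorers, and hypothetical weights `alpha` and `beta`. The text does not specify the estimator, the form of the agreement term, or the weighting.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def info_nce(inner: List[Vector], outer: List[Vector],
             temperature: float = 0.5) -> float:
    """Mean InfoNCE loss; inner[i] and outer[i] describe the same molecule."""
    total = 0.0
    for i, anchor in enumerate(inner):
        logits = [dot(anchor, other) / temperature for other in outer]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(inner)


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single predicted probability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def total_loss(p_inner: float, p_outer: float, label: float,
               inner: List[Vector], outer: List[Vector],
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """Supervised loss of both scorers + agreement term + contrastive term."""
    supervised = bce(p_inner, label) + bce(p_outer, label)
    agreement = (p_inner - p_outer) ** 2          # assumed form of consistency
    return supervised + alpha * agreement + beta * info_nce(inner, outer)


# Toy representations: the aligned pairing should score a lower contrastive loss.
inner_repr = [[1.0, 0.0], [0.0, 1.0]]
aligned_outer = [[1.0, 0.0], [0.0, 1.0]]
shuffled_outer = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(inner_repr, aligned_outer)
loss_shuffled = info_nce(inner_repr, shuffled_outer)
```

In practice the same quantities would be computed on tensors inside an automatic differentiation framework so that gradient descent can update both modules end to end; the pure-Python version only makes the arithmetic explicit.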

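7. An apparatus for constructing a drug interaction prediction model, characterized by comprising:

a collection module for collecting and collating the molecular linear representations of drugs and endogenous proteins, and the interactions among these molecules, wherein the drugs include chemical drugs and biological drugs;

a graph construction module for using the molecular linear representations and the interactions among the molecules to construct a dual-view heterogeneous graph, in which the outer-layer interaction graph has enhanced connectivity and the inner-layer molecular structure graphs provide an expanded data volume;

a model construction module for constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the dual-view heterogeneous graph representation module learns representations of drugs and endogenous proteins under each view based on graph neural networks, and the dual-view fusion prediction module then combines the two views to give a prediction;

and a training module for training the constructed model and tuning its hyper-parameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.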

8. A method for predicting drug interaction, comprising:

inputting the drug combination to be predicted and the type of drug interaction to be predicted into the drug interaction prediction model of claim 1 to obtain the prediction result.

9. A drug interaction prediction device, comprising:

a prediction module, configured to input a drug combination to be predicted and a drug interaction type to be predicted into the drug interaction prediction model according to claim 1, so as to obtain a prediction result.

10. An electronic device, comprising:

one or more processors;

a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6 and 8.