CN114974408A - Construction method, prediction method and device of drug interaction prediction model - Google Patents


Info

Publication number
CN114974408A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210588763.7A
Other languages
Chinese (zh)
Inventor
苗晓晔
茹钟莹
吴洋洋
朋环环
尹建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN202210588763.7A
Publication of CN114974408A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a construction method, a prediction method, and a device for a drug interaction prediction model. The construction method comprises: collecting and collating the molecular linear representations of drugs and endogenous proteins and the interactions among these molecules, wherein the drugs comprise chemical drugs and biological drugs; using the molecular linear representations and the interactions among the molecules to construct a dual-view heterogeneous graph in which the outer-layer interaction graph has enhanced connectivity and the inner-layer molecular structure graphs have an expanded data volume; constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the representation module learns representations of drugs and endogenous proteins under each view based on graph neural networks, and the fusion prediction module then combines the two views to give a prediction; and training the constructed model and tuning its hyperparameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.

Description

Construction method, prediction method and device of drug interaction prediction model
Technical Field
The application relates to the intersection of computer science with drug discovery and medical technology, and in particular to a construction method, a prediction method, and a device for a drug interaction prediction model.
Background
Drugs are divided into chemical drugs (small-molecule drugs) and biological drugs (macromolecular drugs). Interactions between two or more drugs in the body (drug interactions) can significantly alter the intended therapeutic effect of the drugs and may even be life-threatening. Because chemical drugs and biological drugs differ markedly in molecular properties, production methods, routes of administration, effects, and so on, prediction methods designed for chemical drugs do not adapt well to predicting drug interactions that involve biological drugs.
Researchers in China and abroad have done some work on the drug interaction prediction task, but this work has the following limitations. 1) Prediction of drug interactions involving biological drugs remains a gap. Existing drug interaction prediction work targets only small-molecule chemical drugs and neglects macromolecular biological drugs, which have few side effects and high specificity. 2) Methods for predicting interactions among chemical drugs cannot be directly extended to the task of identifying interactions between chemical drugs and biological drugs. First, a drug interaction network that contains biological drugs is sparser: biological drugs are relatively new, their number is currently limited, and many interactions have not yet been observed in experiments or clinical practice. Second, chemical drugs and biological drugs are highly heterogeneous and differ greatly in physicochemical properties. In computer-aided drug interaction prediction, it is not feasible to collect uniform and rich feature data for a large number of chemical and biological drugs, or to model both classes of drugs at the same granularity.
Disclosure of Invention
The embodiments of the present application aim to provide a construction method, a prediction method, and a device for a drug interaction prediction model, so as to solve the technical problems of existing methods: the cost of collecting drug feature data is too high, and it is difficult to predict drug interactions involving biological drugs and their specific interaction types.
According to a first aspect of the embodiments of the present invention, there is provided a method for constructing a drug interaction prediction model, including:
collecting and collating the molecular linear representations of drugs and endogenous proteins and the interactions among these molecules, wherein the drugs comprise chemical drugs and biological drugs;
using the molecular linear representations and the interactions among the molecules to construct a dual-view heterogeneous graph in which the outer-layer interaction graph has enhanced connectivity and the inner-layer molecular structure graphs have an expanded data volume;
constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the representation module learns representations of drugs and endogenous proteins under each view based on graph neural networks, and the fusion prediction module then combines the two views to give a prediction;
training the constructed model and tuning its hyperparameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for constructing a drug interaction prediction model, including:
a collection module for collecting and collating the molecular linear representations of drugs and endogenous proteins and the interactions among these molecules, wherein the drugs comprise chemical drugs and biological drugs;
a graph construction module for using the molecular linear representations and the interactions among the molecules to construct a dual-view heterogeneous graph in which the outer-layer interaction graph has enhanced connectivity and the inner-layer molecular structure graphs have an expanded data volume;
a model construction module for constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the representation module learns representations of drugs and endogenous proteins under each view based on graph neural networks, and the fusion prediction module then combines the two views to give a prediction;
and a training module for training the constructed model and tuning its hyperparameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.
According to a third aspect of embodiments of the present invention, there is provided a drug interaction prediction method, including:
inputting the drug combination to be predicted and the drug interaction type to be predicted into the drug interaction prediction model constructed according to the first aspect, and obtaining a prediction result.
According to a fourth aspect of embodiments of the present invention, there is provided a drug interaction prediction apparatus comprising:
a prediction module for inputting the drug combination to be predicted and the drug interaction type to be predicted into the drug interaction prediction model constructed according to the first aspect, and obtaining a prediction result.
According to a fifth aspect of an embodiment of the present invention, there is provided an electronic apparatus, including:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in the first or third aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In this embodiment, the interactions between drugs and endogenous proteins and the molecular structures of endogenous proteins are introduced. The drug-related data are enriched by exploiting two facts: chemical drugs and biological drugs both act by docking with endogenous proteins, and biological drugs and endogenous proteins are both macromolecules. This makes drug interaction prediction involving biological drugs possible.
Compared with typical artificial-intelligence-assisted methods in the biomedical field, the method does not rely on domain knowledge or feature engineering, and provides a simple, accurate, and efficient solution for predicting drug interactions among chemical drugs and biological drugs.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart illustrating a method of constructing a drug interaction prediction model in accordance with an exemplary embodiment.
FIG. 2 is a schematic diagram illustrating a drug interaction graph including chemical drugs and biological drugs, according to an exemplary embodiment.
FIG. 3 is a schematic diagram illustrating a connectivity-enhanced outer-layer interaction graph, according to an exemplary embodiment.
FIG. 4 is a schematic diagram illustrating a dual-view heterogeneous graph, according to an exemplary embodiment.
FIG. 5 is a schematic diagram illustrating a dual-view heterogeneous graph encoding module, according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating an apparatus for constructing a drug interaction prediction model, according to an exemplary embodiment.
FIG. 7 is a flow chart illustrating a method of drug interaction prediction according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating a drug interaction prediction device, according to an exemplary embodiment.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, an embodiment of the present invention provides a method for constructing a drug interaction prediction model, including the following steps:
S100: collecting and collating the molecular linear representations of drugs and endogenous proteins and the interactions among these molecules, wherein the drugs comprise chemical drugs and biological drugs. This step may include the following sub-steps:
S110: collecting and collating drug IDs and molecular linear representations, drug-drug interaction data, and drug-endogenous protein interaction data from the drug information platform DrugBank.
Specifically, S111: parsing the XML file (version 5.1.8) provided by DrugBank with the third-party R package dbparser, and collating the results to obtain the molecular linear representations of small-molecule chemical drugs (SMILES expressions), the molecular linear representations of macromolecular biological drugs (amino acid sequences), drug-drug interaction data, and drug-endogenous protein interaction data.
S112: cleaning the data, specifically: removing biological drugs for which an amino acid residue contact map cannot be computed (criterion: a single chain longer than 1000 amino acids, or multiple chains); removing chemical drugs for which the RDKit toolkit cannot construct a molecular graph from the SMILES expression; removing drugs with no known interaction data or fewer than 3 known interactions; and removing drug interaction types with fewer than 1000 records in total.
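The two count-based filters in S112 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `clean_interactions`, the toy IDs, and the order of the two filters (rare types first, then drugs, in a single pass) are assumptions; the source only states the thresholds.

```python
from collections import Counter

def clean_interactions(triplets, min_per_drug=3, min_per_type=1000):
    """Drop rare interaction types, then drugs with too few known interactions.

    triplets: list of (drug_id, interaction_type, drug_id).
    The filter order and the single pass are assumptions, not from the source.
    """
    # Count records per interaction type and keep only frequent types.
    type_counts = Counter(act for _, act, _ in triplets)
    kept = [t for t in triplets if type_counts[t[1]] >= min_per_type]
    # Count how many remaining records each drug takes part in.
    drug_counts = Counter()
    for d1, _, d2 in kept:
        drug_counts[d1] += 1
        drug_counts[d2] += 1
    # Keep a record only if both drugs still have enough known interactions.
    return [t for t in kept
            if drug_counts[t[0]] >= min_per_drug
            and drug_counts[t[2]] >= min_per_drug]

# Toy data with lowered thresholds so the effect is visible.
toy = [("A", "x", "B"), ("A", "x", "C"), ("B", "x", "C"), ("A", "y", "D")]
cleaned = clean_interactions(toy, min_per_drug=2, min_per_type=2)
```

In the toy data, interaction type `y` has a single record and is dropped, which also removes drug `D` from the data set.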
S120: converting the IDs of the endogenous proteins in the drug-endogenous protein interaction data into their IDs in a protein database, and querying the protein database by ID to obtain the molecular linear representation of each endogenous protein and the interactions among the endogenous proteins.
Specifically, the IDs of the endogenous proteins in the drug-endogenous protein interaction data from step S110 are converted, via the UniProt website, into their IDs in the protein database StringDB; the amino acid sequences of these endogenous proteins are collected, and the StringDB protein interaction database is queried for the interactions among them.
The resulting data are divided into two data sets, C-DB and CB-DB, depending on whether biological drugs are considered. C-DB considers only chemical drugs, with 1586 drugs, 496 endogenous proteins, 62 interaction types, and nearly 450,000 drug interaction records in total; CB-DB considers both chemical and biological drugs, with 3174 drugs, 811 endogenous proteins, 62 interaction types, and approximately 1.12 million drug interaction records in total. Each interaction record is represented as a triplet of the form <ID, interaction type, ID>.
The molecular linear representation of a chemical drug is the SMILES expression of the small molecule that makes up the drug. A biological drug here means a protein-based biological drug, and its molecular linear representation is the amino acid sequence of the protein macromolecule that makes up the drug. The molecular linear representation of an endogenous protein is the amino acid sequence of the protein macromolecule that makes it up.
S200: using the molecular linear representations and the interactions among the molecules to construct a dual-view heterogeneous graph in which the outer-layer interaction graph has enhanced connectivity and the inner-layer molecular structure graphs have an expanded data volume. This step may include the following sub-steps:
S210: taking only the molecular linear representation of a single drug or endogenous protein as input, constructing an attributed molecular structure graph for that molecule, and using it as the inner-layer molecular structure graph.
Specifically, S211: converting the linear representation of a small molecule (its SMILES expression) into a molecular graph with the RDKit toolkit, vectorizing the basic attributes of atoms and chemical bonds with the DGL-LifeSci library, and attaching the vectorized attributes to the molecular graph to obtain the inner-layer molecular structure graph of the small molecule.
S212: computing (predicting) an amino acid residue contact map for each macromolecule with HHblits, an iterative homology-detection tool for protein sequence search and alignment, and CCMpred, a protein residue contact map prediction tool; binarizing the contact map matrix with a threshold of 0.5; and constructing a molecular structure graph with amino acids as nodes, in which an edge exists between nodes i and j if the element at index [i, j] of the binarized contact map matrix is 1. After the graph is built, features such as the amino acid type are added to the nodes, and the attributed graph is used as the inner-layer molecular structure graph of the macromolecule.
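The binarization and edge construction in S212 can be sketched as below. This is an illustrative sketch only: the function name, the inclusive comparison at exactly 0.5, the assumption that the contact map is symmetric, and the omission of self-loops are all assumptions not stated in the source. The contact probabilities are made-up toy values, not HHblits/CCMpred output.

```python
def contact_map_to_edges(prob, threshold=0.5):
    """Binarize a residue contact probability matrix and list undirected edges.

    prob: square list-of-lists, assumed symmetric; prob[i][j] is the predicted
    contact probability between residues i and j.
    """
    n = len(prob)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: no self-loops, no duplicates
            # Entry [i, j] of the binarized matrix is 1 when it reaches the threshold.
            if prob[i][j] >= threshold:
                edges.append((i, j))
    return edges

sequence = "MKV"  # toy 3-residue sequence
prob = [[1.0, 0.8, 0.2],
        [0.8, 1.0, 0.6],
        [0.2, 0.6, 1.0]]
edges = contact_map_to_edges(prob)
# Node attribute: the amino acid type at each position.
node_features = {i: aa for i, aa in enumerate(sequence)}
```

Here residues 0 and 1 and residues 1 and 2 are connected, while residues 0 and 2 (probability 0.2) are not.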
Next, the outer-layer interaction graph is constructed. FIG. 2 gives an example of an outer-layer interaction graph containing only drug-drug interaction data; such interactions may occur between chemical drugs, between biological drugs, or between a chemical drug and a biological drug. In practice, the interaction of one drug with another is abstracted as a directed edge from the first drug to the second, and the type of the edge is the type of the interaction. At present, a large number of drug-drug interactions remain undiscovered, and knowledge of the interactions of new drugs such as biological drugs is particularly limited, so an interaction graph containing only drug-drug interaction information has poor connectivity among its drug nodes. The performance of graph-neural-network-based drug interaction prediction, however, depends strongly on graph connectivity. A connectivity-enhanced outer-layer interaction graph is therefore constructed by introducing drug-endogenous protein interaction data and endogenous protein-endogenous protein interaction data, so that more drugs become connected with endogenous proteins acting as relays.
S220: taking the collected interaction data among drugs and endogenous proteins as input, treating drugs and endogenous proteins each as one type of node and the different types of interaction among them as different types of edges, and constructing a heterogeneous interaction graph with stronger connectivity than a pure drug interaction graph, which is used as the outer-layer interaction graph.
Specifically, a heterogeneous graph containing two types of nodes, drugs and endogenous proteins, is constructed as the outer-layer interaction graph, with drugs and endogenous proteins as nodes and drug-drug, drug-endogenous protein, and endogenous protein-endogenous protein interactions as edges. The inner-layer graphs of small-molecule chemical drugs and macromolecular biological drugs are heterogeneous, but in the outer-layer graph both are treated as drug nodes. The edges in the outer-layer interaction graph comprise at least 3 types, falling into three major classes: directed drug-drug interaction edges, undirected drug-endogenous protein interaction edges, and undirected endogenous protein-endogenous protein interaction edges. Because drug-drug interactions are the prediction target, this class may be subdivided by the specific type of drug-drug interaction event, and each subtype is treated as a different kind of edge in the heterogeneous graph. FIG. 3 gives an example of a connectivity-enhanced outer-layer interaction graph, in which the types and directions of the drug-drug interactions are not shown.
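The effect of using endogenous proteins as relays can be illustrated with a small sketch. Everything named here is hypothetical: the node IDs, the edge-type labels `"targets"` and `"interacts"`, and the helper functions are assumptions made for illustration; the source specifies only the node types and the three classes of edges.

```python
from collections import defaultdict, deque

def build_outer_graph(ddi, dpi, ppi):
    """Return typed edges plus an undirected adjacency used only to check connectivity.

    ddi: (drug, interaction_type, drug) triplets; dpi: (drug, protein) pairs;
    ppi: (protein, protein) pairs.
    """
    edges = defaultdict(list)  # (src_type, edge_type, dst_type) -> [(src, dst)]
    adj = defaultdict(set)
    for d1, act, d2 in ddi:    # directed; one edge type per interaction type
        edges[("drug", act, "drug")].append((d1, d2))
        adj[d1].add(d2)
        adj[d2].add(d1)
    for d, p in dpi:           # undirected drug-protein edges
        edges[("drug", "targets", "protein")].append((d, p))
        adj[d].add(p)
        adj[p].add(d)
    for p1, p2 in ppi:         # undirected protein-protein edges
        edges[("protein", "interacts", "protein")].append((p1, p2))
        adj[p1].add(p2)
        adj[p2].add(p1)
    return edges, adj

def connected(adj, start, goal):
    """Breadth-first search: is goal reachable from start?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

ddi = [("chem1", "act_7", "chem2")]
dpi = [("chem2", "P1"), ("bio1", "P2")]
ppi = [("P1", "P2")]
_, drug_only_adj = build_outer_graph(ddi, [], [])
typed_edges, full_adj = build_outer_graph(ddi, dpi, ppi)
```

In the drug-only graph the biological drug `bio1` is isolated; once the protein edges are added, it is reachable from `chem1` through the path chem1, chem2, P1, P2, bio1.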
S230: associating each constructed inner-layer molecular structure graph with its corresponding node in the constructed outer-layer graph, which completes the construction of the dual-view heterogeneous graph. FIG. 4 shows an example of a constructed dual-view heterogeneous graph.
S300: constructing a drug interaction prediction model that takes the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the representation module learns representations of drugs and endogenous proteins under each view based on graph neural networks, and the fusion prediction module then combines the two views to give a prediction.
Referring to FIG. 5, the dual-view heterogeneous graph representation module uses neural-network-based methods to extract, from the inner-layer molecular structure graphs and the outer-layer interaction graph respectively, an inner-layer representation F (composed of the small-molecule inner-layer representation F_s and the macromolecule inner-layer representation F_l), which carries the properties of individual molecules, and an outer-layer representation Z, which carries the patterns of molecular interaction. The two kinds of information are subsequently combined to predict the probability that a given drug combination has a given type of interaction. S300 may include the following steps:
S310: the dual-view heterogeneous graph encoding module uses a graph-neural-network-based molecular representation learning method to encode the inner-layer molecular structure graphs of chemical drugs and the inner-layer molecular structure graphs of biological drugs and endogenous proteins separately, obtaining an inner-layer representation of each drug or endogenous protein.
Specifically, one encoder is built for small molecules and one for macromolecules; each extracts a whole-graph-level representation from the corresponding inner-layer molecular structure graphs, which is the inner-layer representation of a single drug or endogenous protein molecule. The encoder architecture is AttentiveFP, a graph attention neural network suited to molecular representation learning.
S320: setting the initial node representations in the outer-layer interaction graph (the initial representations of drugs and endogenous proteins) to the corresponding inner-layer representations; the dual-view heterogeneous graph encoding module then extracts outer-layer representations of drugs and endogenous proteins from the outer-layer interaction graph using a graph neural network suited to multi-relational graphs.
Specifically, a relational graph convolutional network (RGCN), which is suited to multi-relational modeling on heterogeneous graphs, is selected as the encoder to extract node representations from the outer-layer interaction graph, which are the outer-layer representations of individual drugs and endogenous proteins.
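To make the outer-layer encoding concrete, here is a sketch of a single relational graph convolution step in plain Python. It follows the commonly published RGCN update (a self-connection term plus, for each relation, the mean of transformed incoming neighbour messages, followed by ReLU); whether the patent's encoder uses exactly this normalization is not stated, so treat it as an assumption. The weights and node IDs are toy values, and undirected edges would need to be listed in both directions.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rgcn_layer(h, edges_by_rel, W_rel, W_self):
    """One relational graph convolution step with per-relation mean aggregation.

    h: node -> feature vector (initialized from inner-layer representations in S320).
    edges_by_rel: relation -> list of (src, dst) directed edges.
    W_rel: relation -> weight matrix; W_self: weight matrix for the self-connection.
    """
    out = {}
    for i, h_i in h.items():
        total = matvec(W_self, h_i)                        # self-connection term
        for rel, edges in edges_by_rel.items():
            nbrs = [src for src, dst in edges if dst == i]  # incoming neighbours under rel
            for j in nbrs:
                msg = matvec(W_rel[rel], h[j])
                # Mean over the neighbours of this relation.
                total = [t + m / len(nbrs) for t, m in zip(total, msg)]
        out[i] = [max(0.0, t) for t in total]               # ReLU
    return out

h = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "p1": [1.0, 1.0]}
edges_by_rel = {"act_7": [("d1", "d2")], "targets": [("p1", "d2")]}
W_self = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {"act_7": [[2.0, 0.0], [0.0, 2.0]], "targets": [[0.0, 1.0], [1.0, 0.0]]}
out = rgcn_layer(h, edges_by_rel, W_rel, W_self)
```

Node `d2` receives one message per relation, so its updated representation mixes its own features with those of `d1` and `p1`, each transformed by the weight matrix of the corresponding edge type.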
S330: the dual-view fusion prediction module comprises a representation alignment module, an inner-layer scorer, and an outer-layer scorer. Using the inner-layer and outer-layer representations respectively, the two scorers predict the probability that a drug combination has a specific type of interaction under the inner-layer and outer-layer views.
Specifically, a drug combination (drug d_1 and drug d_2) and the drug interaction type of interest, act, are formalized as a triplet t = <d_1, act, d_2>. The inner-layer and outer-layer scorers each hold a set of trainable weights specific to the interaction type; each takes the representations of d_1 and d_2 as input and predicts the probability that the interaction exists under its view. F(d) and Z(d) are the inner-layer and outer-layer representations of drug d, ⊙ is the Hadamard product, σ is the sigmoid function, and each scorer has a trainable weight vector for the interaction type act. The probability scores that the inner-layer and outer-layer scorers assign to t = <d_1, act, d_2>, denoted S_tra(t) and S_ter(t), are given by two equations (images BDA0003664218070000083 and BDA0003664218070000084) that are not reproduced in this text.
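Since the scoring equations are not available in this text, the following is only a guess at their form, built from the ingredients the description names: a Hadamard product of the two drug representations, a type-specific trainable weight vector, and a sigmoid. Reducing the weighted product to a scalar by summation (a DistMult-style bilinear score) is an assumption, as are the function names and the toy numbers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(rep1, rep2, w_act):
    """Assumed form: sigmoid( sum_k w_act[k] * rep1[k] * rep2[k] ).

    rep1, rep2: representations of the two drugs (F(d) for the inner-layer
    scorer, Z(d) for the outer-layer scorer).
    w_act: trainable weight vector of that scorer for interaction type act.
    """
    return sigmoid(sum(w * a * b for w, a, b in zip(w_act, rep1, rep2)))

# Toy inner-layer representations and a toy weight vector for one interaction type.
F = {"d1": [1.0, 2.0], "d2": [0.5, -1.0]}
w_tra_act = [2.0, 0.5]
s_tra = score(F["d1"], F["d2"], w_tra_act)
```

With these toy values the weighted sum is 2·1·0.5 + 0.5·2·(−1) = 0, so the score is σ(0) = 0.5. The outer-layer scorer would apply the same function to Z(d_1) and Z(d_2) with its own weight vector.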
S400: training the constructed model and tuning its hyperparameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs. This step may include the following sub-steps:
S410: training the dual-view heterogeneous graph encoding module and the dual-view fusion prediction module end to end by gradient descent, so that the outputs of the two scorers are as close to the true values and as consistent with each other as possible, while maximizing the mutual information between the inner-layer and outer-layer representations by contrastive learning, so that the molecular structure information extracted from the inner layer is "injected" into the outer-layer representation, which is more closely related to the multi-type drug interaction prediction task.
Specifically, the overall optimization objective in model training (equation image BDA0003664218070000085, not reproduced in this text) combines three terms: a supervised learning loss, an inner/outer prediction inconsistency loss, and an inner/outer representation inconsistency loss. β and γ are hyperparameters that balance the weights of the two inconsistency losses.
The supervised learning loss uses binary cross entropy (BCE) to measure the difference between the true values and the values predicted by the inner-layer and outer-layer scorers (defining equation image BDA0003664218070000091, not reproduced in this text). In that definition, the loss is taken over the set of drug-drug interaction triplets, y_t is the true value of triplet t, S_tra(t) and S_ter(t) are the values predicted for t by the inner-layer and outer-layer scorers respectively, and α is a hyperparameter that adjusts the relative weight of the inner-layer and outer-layer prediction errors in the supervised loss.
The prediction inconsistency loss uses the KL divergence (Kullback-Leibler divergence) to measure the disagreement between the predictions of the inner-layer and outer-layer scorers (defining equation image BDA0003664218070000094, not reproduced in this text).
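One plausible way to combine the terms described above is sketched here. The exact equations are not available in this text, so several choices are assumptions: α multiplies only the inner-layer BCE term (consistent with the later description of α as the weight of the inner-view supervised loss), the KL divergence is taken between the two scorers' Bernoulli predictions in the direction outer-to-inner, losses are averaged over triplets, and the representation inconsistency loss is passed in as a precomputed number.

```python
import math

def bce(p, y, eps=1e-12):
    """Binary cross entropy of prediction p against label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bernoulli_kl(p, q, eps=1e-12):
    """KL( Bernoulli(p) || Bernoulli(q) )."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def objective(samples, alpha, beta, gamma, rep_loss):
    """samples: list of (s_tra, s_ter, y); rep_loss: contrastive term, precomputed."""
    n = len(samples)
    # Supervised term: weighted inner-layer BCE plus outer-layer BCE (assumed form).
    sup = sum(alpha * bce(s_tra, y) + bce(s_ter, y) for s_tra, s_ter, y in samples) / n
    # Prediction inconsistency term: KL between the two scorers' predictions.
    diff = sum(bernoulli_kl(s_ter, s_tra) for s_tra, s_ter, _ in samples) / n
    return sup + beta * diff + gamma * rep_loss

total = objective([(0.5, 0.5, 1.0)], alpha=0.5, beta=1.0, gamma=0.1, rep_loss=2.0)
```

In the toy call both scorers predict 0.5, so the KL term vanishes and the total is 1.5·ln 2 + 0.1·2.0.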
The representation inconsistency loss is used to maximize the mutual information between the inner-layer representation F and the outer-layer representation Z by contrastive learning. Within F, the small-molecule inner-layer representation F_s and the macromolecule inner-layer representation F_l, as well as Z, are each produced by a different encoder, so the representation spaces of the three groups differ. Before mutual information is computed, a network consisting of 2 fully connected layers and 1 skip connection therefore projects each of the three groups of representations into the same space; the projected representations are denoted F_s′, F_l′, and Z′. Positive and negative sample pairs are then constructed according to the following rules. The graph formed by the nodes of one type in the outer-layer graph and the edges between them is regarded as a homogeneous subgraph of the original outer-layer interaction graph; that is, the specific types of the edges between nodes of the same type are not distinguished. Positive and negative sample pairs are constructed on each homogeneous subgraph: taking a node u as the anchor, u paired with any node directly connected to it, or with u itself, forms a positive sample, and u paired with any node not directly connected to it forms a negative sample.
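The pair-construction rule can be sketched on one homogeneous subgraph as follows. The function name and node IDs are hypothetical, and treating "directly connected" as ignoring edge direction is an assumption; the source says only that edge types are not distinguished.

```python
def contrastive_pairs(nodes, edges):
    """Build (anchor, other) positive and negative pairs on one homogeneous subgraph.

    nodes: nodes of a single type (all drugs, or all endogenous proteins).
    edges: (u, v) pairs between those nodes; edge type and direction are ignored.
    """
    nbrs = {u: {u} for u in nodes}   # each node is a positive for itself
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    pos = [(u, v) for u in nodes for v in nodes if v in nbrs[u]]
    neg = [(u, v) for u in nodes for v in nodes if v not in nbrs[u]]
    return pos, neg

pos, neg = contrastive_pairs(["d1", "d2", "d3"], [("d1", "d2")])
```

With a single edge between `d1` and `d2`, the positives are the three self-pairs plus (d1, d2) and (d2, d1), and every pair involving `d3` and another node is a negative.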
After the positive and negative sample pairs are constructed, the contrastive learning loss is defined by an equation (image BDA0003664218070000097) that is not reproduced in this text. In that definition, the two graphs involved are the homogeneous subgraphs of the outer-layer interaction graph formed by the drug nodes and by the endogenous protein nodes, respectively; F′(i) and Z′(i) are the projected inner-layer and outer-layer representations of node i; and the mutual information term is a lower bound on the mutual information computed using the JS divergence (Jensen-Shannon divergence).
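A commonly used Jensen-Shannon-based mutual information estimator (as in Deep InfoMax) scores positive pairs with −softplus(−T) and negative pairs with softplus(T), where T is a critic value. The sketch below uses that estimator with a dot-product critic between the projected inner-layer representation of the anchor and the projected outer-layer representation of the paired node. Both the critic and the pairing of F′(u) with Z′(v) are assumptions; the patent's exact equation is not available in this text.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def jsd_contrastive_loss(F_proj, Z_proj, pos, neg):
    """Negative of the JSD-based mutual information estimate over sample pairs.

    F_proj, Z_proj: node -> projected inner-layer / outer-layer representation.
    pos, neg: lists of (anchor, other) node pairs.
    """
    e_pos = sum(-softplus(-dot(F_proj[u], Z_proj[v])) for u, v in pos) / len(pos)
    e_neg = sum(softplus(dot(F_proj[u], Z_proj[v])) for u, v in neg) / len(neg)
    return -(e_pos - e_neg)  # minimizing this maximizes the estimate

pos = [("a", "a"), ("b", "b")]
neg = [("a", "b"), ("b", "a")]
zeros = {"a": [0.0], "b": [0.0]}
aligned = {"a": [3.0], "b": [-3.0]}
loss_uninformative = jsd_contrastive_loss(zeros, zeros, pos, neg)
loss_aligned = jsd_contrastive_loss(aligned, aligned, pos, neg)
```

When all representations are zero the loss equals 2·ln 2; when the inner-layer and outer-layer representations of the same node agree and those of unconnected nodes disagree, the loss drops toward zero.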
S420: Adjust the hyperparameters of the model, train the model under the optimal parameter setting, and store the outer-layer representations of the drugs together with the network in the dual-view fusion prediction module as the drug interaction prediction model for subsequent use. The whole model is trained end to end so that the outputs of the two scorers are as close as possible to the true values and the distributions of the two sets of outputs are as close as possible to each other. At the same time, contrastive learning maximizes the mutual information between the inner and outer layers over drug-drug and endogenous protein-endogenous protein node pairs, which aligns the inner-layer and outer-layer representations of the drugs and endogenous proteins and "injects" the molecular structure information extracted from the inner layer into the outer-layer representations, which are more closely tied to the interaction relations.
Specifically, an Adam optimizer is used for full-batch training, and the following hyperparameters are tuned:
1) the weight coefficient α of the inner-layer-view supervised learning loss in the objective function;
2) the weight coefficient β of the inner-layer/outer-layer prediction inconsistency loss in the objective function;
3) the weight γ of the inner-layer/outer-layer representation distribution inconsistency loss in the objective function;
4) the number of layers of the graph neural networks used to encode the inner-layer small-molecule and macromolecule structure graphs;
5) the number of layers of the graph neural network used to encode the outer-layer interaction network.
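As a rough illustration of how the three weights could enter the objective, the sketch below combines four loss terms into one scalar. The additive form, the unit weight on the outer-layer supervised loss, the squared difference used as the prediction inconsistency term, and all function and argument names are assumptions made for illustration. The text only states which loss each of alpha, beta and gamma weights.

```python
import math

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mean(xs):
    return sum(xs) / len(xs)

def total_loss(p_inner, p_outer, labels, contrastive, alpha, beta, gamma):
    """Weighted sum of the loss terms (assumed additive form).

    p_inner / p_outer: probabilities from the inner-layer / outer-layer scorer.
    contrastive: value of the representation-level contrastive loss.
    """
    outer_sup = mean([bce(p, y) for p, y in zip(p_outer, labels)])
    inner_sup = mean([bce(p, y) for p, y in zip(p_inner, labels)])
    # Stand-in for the prediction inconsistency loss (assumption).
    disagreement = mean([(a - b) ** 2 for a, b in zip(p_inner, p_outer)])
    return (outer_sup
            + alpha * inner_sup
            + beta * disagreement
            + gamma * contrastive)
```

Setting alpha, beta and gamma to zero reduces this objective to the outer-layer supervised loss alone, which gives a simple baseline when tuning the three weights.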
S430: Under the optimal parameter setting, use the trained model to predict multi-type interactions of drug combinations, taking the output of the outer-layer scorer as the final prediction result.
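The prediction step can be pictured with the minimal sketch below, which looks up the stored outer-layer representations of two drugs and scores them for a given interaction type. The scorer form (a DistMult-style weighted product followed by a sigmoid), the toy vectors, and every name here are hypothetical. The text does not specify the internal form of the outer-layer scorer.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def outer_score(z_a, z_b, rel_weights):
    """Hypothetical scorer: sigmoid(sum_k z_a[k] * r[k] * z_b[k])."""
    return sigmoid(sum(a * r * b for a, r, b in zip(z_a, rel_weights, z_b)))

def predict(drug_a, drug_b, interaction_type, outer_repr, relation_params):
    """Probability that the drug pair has the given interaction type."""
    return outer_score(outer_repr[drug_a], outer_repr[drug_b],
                       relation_params[interaction_type])

# Toy stored outer-layer representations and per-type parameters (made up).
outer_repr = {"drug_A": [1.0, 0.5], "drug_B": [0.5, -1.0]}
relation_params = {"type_1": [1.0, 1.0], "type_2": [-1.0, 2.0]}

p1 = predict("drug_A", "drug_B", "type_1", outer_repr, relation_params)
p2 = predict("drug_A", "drug_B", "type_2", outer_repr, relation_params)
```

Only the stored outer-layer representations and the scorer parameters are needed at this stage, which matches the statement that these are what is saved as the prediction model.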
Referring to Tables 1 and 2, the method proposed by the present invention (denoted chembitip in Tables 1 and 2) achieves the best prediction performance among the compared learning methods (HGT, GraIL, RGCN, Decagon, MIRACLE), both when targeting chemical drugs only (C-DB, left side of Table 1) and when targeting both chemical and biological drugs (CB-DB, right side of Table 1). In addition, variant models are obtained by replacing, adding, or removing parts of the three modules of the model (the dual-view heterogeneous graph construction module, the dual-view heterogeneous graph encoding module, and the dual-view fusion and prediction module). Comparing these variants with the complete model shows that the model using the outer-layer interaction graph enhancement strategy and the multi-view contrastive fusion strategy achieves better prediction performance.
Table 1: Results of each model on the C-DB and CB-DB datasets
Figure BDA0003664218070000101
Table 2: Ablation results for the submodules on the C-DB and CB-DB datasets
Figure BDA0003664218070000111
Corresponding to the embodiment of the method for constructing the drug interaction prediction model, the application also provides an embodiment of a device for constructing the drug interaction prediction model.
FIG. 6 is a block diagram of an apparatus for constructing a drug interaction prediction model, according to an exemplary embodiment. Referring to FIG. 6, the apparatus includes a collection module 100, a graph construction module 200, a model construction module 300, and a training module 400.
The collection module 100 is used for collecting and collating molecular linear representations of drugs and endogenous proteins, and the interactions between these molecules, wherein the drugs include chemical drugs and biological drugs;
the graph construction module 200 is used for constructing, from the molecular linear representations and the interactions between the molecules, a dual-view heterogeneous graph in which the connectivity of the outer-layer interaction graph is enhanced and the data volume of the inner-layer molecular structure graphs is expanded;
the model construction module 300 is used for constructing a drug interaction prediction model taking the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the dual-view heterogeneous graph representation module learns representations of the drugs and endogenous proteins under each view based on graph neural networks, and the dual-view fusion prediction module then combines the two views to give a prediction;
and the training module 400 is used for training the constructed model and adjusting its hyperparameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.
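The data handled by module 200 can be pictured as a typed graph whose nodes each carry their own molecular structure graph. The sketch below is one possible in-memory layout, offered only as an illustration: the class names, field names, relation labels and toy molecules are all hypothetical and are not taken from the source.

```python
from dataclasses import dataclass, field

@dataclass
class MoleculeGraph:
    """Inner-layer view: attributed structure graph of one molecule."""
    node_features: list  # one feature vector per atom or residue (assumed)
    edges: list          # (i, j) index pairs

@dataclass
class DualViewGraph:
    """Outer-layer interaction graph whose nodes carry inner-layer graphs."""
    node_type: dict = field(default_factory=dict)  # id -> "drug" | "protein"
    inner: dict = field(default_factory=dict)      # id -> MoleculeGraph
    edges: set = field(default_factory=set)        # (u, v, relation)

    def add_node(self, node_id, node_type, molecule):
        self.node_type[node_id] = node_type
        self.inner[node_id] = molecule

    def add_interaction(self, u, v, relation):
        if u not in self.node_type or v not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((u, v, relation))

    def neighbors(self, node_id, relation=None):
        """Undirected neighbors, optionally restricted to one edge type."""
        out = set()
        for u, v, r in self.edges:
            if relation is not None and r != relation:
                continue
            if u == node_id:
                out.add(v)
            elif v == node_id:
                out.add(u)
        return out

g = DualViewGraph()
toy = MoleculeGraph(node_features=[[1.0], [0.0]], edges=[(0, 1)])
g.add_node("D1", "drug", toy)
g.add_node("D2", "drug", toy)
g.add_node("P1", "protein", toy)
g.add_interaction("D1", "P1", "drug_protein")
g.add_interaction("D2", "P1", "drug_protein")
```

In this toy graph the two drugs have no direct edge but are linked through the shared protein node, which is the kind of added connectivity the outer-layer interaction graph is described as providing.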
Referring to fig. 7, an embodiment of the present invention further provides a method for predicting drug interaction, including:
inputting the drug combination to be predicted and the drug interaction type to be predicted into the drug interaction prediction model to obtain a prediction result.
In correspondence with the foregoing embodiments of the drug interaction prediction method, the present application also provides embodiments of a drug interaction prediction device.
FIG. 8 is a block diagram illustrating a drug interaction prediction device, according to an exemplary embodiment. Referring to fig. 8, the apparatus includes a prediction module 500.
The prediction module 500 is configured to input the drug combination to be predicted and the drug interaction type to be predicted into the drug interaction prediction model to obtain a prediction result.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Since the device embodiments substantially correspond to the method embodiments, refer to the corresponding parts of the method embodiments for relevant details. The device embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units, that is, they may be located in one place or distributed across a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement this without inventive effort.
Correspondingly, the present application also provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for constructing a drug interaction prediction model or the drug interaction prediction method described above.
Accordingly, the present application also provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method of constructing a drug interaction prediction model or a method of predicting drug interaction as described above.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for constructing a drug interaction prediction model is characterized by comprising the following steps:
collecting and collating molecular linear representations of drugs and endogenous proteins, and the interactions between these molecules, wherein the drugs comprise chemical drugs and biological drugs;
constructing, using the molecular linear representations and the interactions between the molecules, a dual-view heterogeneous graph in which the connectivity of the outer-layer interaction graph is enhanced and the data volume of the inner-layer molecular structure graphs is expanded;
constructing a drug interaction prediction model taking the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the dual-view heterogeneous graph representation module learns representations of the drugs and endogenous proteins under each view based on graph neural networks, and the dual-view fusion prediction module then combines the two views to give a prediction;
and training the constructed model and adjusting its hyperparameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.
2. The method of claim 1, wherein collecting and collating the molecular linear representations of the drugs and endogenous proteins and the interactions between these molecules comprises:
collecting and collating drug IDs and molecular linear representations, drug-drug interaction data, and drug-endogenous protein interaction data from a drug information platform;
and converting the IDs of the endogenous proteins in the drug-endogenous protein interaction data into IDs in a protein database, and querying the protein database by these IDs to obtain the molecular linear representation of each endogenous protein and the interactions among the endogenous proteins.
3. The method of claim 2, wherein the molecular linear representation of a chemical drug is the SMILES expression of the small molecule constituting the drug; the biological drugs are protein biological drugs, and the molecular linear representation of a biological drug is the amino acid sequence of the protein macromolecule constituting the drug; and the molecular linear representation of an endogenous protein is the amino acid sequence of the protein macromolecule constituting it.
4. The method of claim 1, wherein constructing, using the molecular linear representations and the interactions between the molecules, a dual-view heterogeneous graph in which the connectivity of the outer-layer interaction graph is enhanced and the data volume of the inner-layer molecular structure graphs is expanded comprises:
taking only the molecular linear representation of a single drug/endogenous protein as input, constructing an attributed molecular structure graph for the drug/endogenous protein molecule, and using it as the inner-layer molecular structure graph;
taking the collected drug-drug, drug-endogenous protein, and endogenous protein-endogenous protein interaction data as input, treating drugs and endogenous proteins each as one type of node and the different types of interactions among them as different types of edges, and constructing a heterogeneous interaction graph with stronger connectivity than a drug-only interaction graph as the outer-layer interaction graph;
and associating each constructed inner-layer molecular structure graph with its corresponding node in the constructed outer-layer interaction graph to complete the construction of the dual-view heterogeneous graph.
5. The method of claim 1, wherein the dual-view heterogeneous graph representation module learning representations of the drugs and endogenous proteins under each view based on graph neural networks, and the dual-view fusion prediction module then combining the two views to give a prediction, comprises:
the dual-view heterogeneous graph encoding module separately encodes the inner-layer molecular structure graphs of the chemical drugs and the inner-layer molecular structure graphs of the biological drugs and endogenous proteins using graph-neural-network-based molecular representation learning methods, to obtain an inner-layer representation of each drug/endogenous protein;
the initial node representations in the outer-layer interaction graph, namely the initial representations of the drugs and endogenous proteins, are set to the corresponding inner-layer representations, and the dual-view heterogeneous graph encoding module then extracts the outer-layer representations of the drugs and endogenous proteins from the outer-layer interaction graph using a graph neural network suited to multi-relational graphs;
the dual-view fusion prediction module comprises an inner-layer scorer and an outer-layer scorer; the inner-layer scorer and the outer-layer scorer map the inner-layer representations and the outer-layer representations into the same space, where they are aligned by maximizing mutual information, and the inner-layer scorer and the outer-layer scorer use the inner-layer representations and the outer-layer representations, respectively, to predict the probability that a drug combination has a specific type of interaction under the inner-layer view and the outer-layer view.
6. The method of claim 1, wherein training the constructed model and adjusting the hyper-parameters thereof to obtain a multi-type drug interaction prediction model for chemical and biological drugs comprises:
training the dual-view heterogeneous graph encoding module and the dual-view fusion prediction module end to end by gradient descent, so that the outputs of the dual-view fusion prediction module approach the true values and are consistent with each other, while maximizing the mutual information between the inner-layer and outer-layer representations by contrastive learning, so that the molecular structure information extracted from the inner layer is injected into the outer-layer representations, which are more closely related to the multi-type drug interaction prediction task;
and adjusting the hyperparameters of the model, training the model under the optimal parameter setting, and storing the outer-layer representations of the drugs and the network in the dual-view fusion prediction module as the drug interaction prediction model.
7. An apparatus for constructing a drug interaction prediction model, comprising:
a collection module for collecting and collating molecular linear representations of drugs and endogenous proteins, and the interactions between these molecules, wherein the drugs comprise chemical drugs and biological drugs;
a graph construction module for constructing, using the molecular linear representations and the interactions between the molecules, a dual-view heterogeneous graph in which the connectivity of the outer-layer interaction graph is enhanced and the data volume of the inner-layer molecular structure graphs is expanded;
a model construction module for constructing a drug interaction prediction model taking the dual-view heterogeneous graph as input, wherein the model comprises a dual-view heterogeneous graph representation module and a dual-view fusion prediction module, the dual-view heterogeneous graph representation module learns representations of the drugs and endogenous proteins under each view based on graph neural networks, and the dual-view fusion prediction module then combines the two views to give a prediction;
and a training module for training the constructed model and adjusting its hyperparameters to obtain a multi-type drug interaction prediction model for chemical drugs and biological drugs.
8. A method for predicting drug interaction, comprising:
inputting the drug combination to be predicted and the type of drug interaction to be predicted into the drug interaction prediction model of claim 1 to obtain the prediction result.
9. A drug interaction prediction device, comprising:
a prediction module, configured to input a drug combination to be predicted and a drug interaction type to be predicted into the drug interaction prediction model according to claim 1, so as to obtain a prediction result.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6 and 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210588763.7A CN114974408A (en) 2022-05-26 2022-05-26 Construction method, prediction method and device of drug interaction prediction model

Publications (1)

Publication Number Publication Date
CN114974408A true CN114974408A (en) 2022-08-30

Family

ID=82954888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210588763.7A Pending CN114974408A (en) 2022-05-26 2022-05-26 Construction method, prediction method and device of drug interaction prediction model

Country Status (1)

Country Link
CN (1) CN114974408A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065066A (en) * 2013-01-22 2013-04-24 四川大学 Drug combination network based drug combined action predicting method
CN111243659A (en) * 2018-11-29 2020-06-05 中国科学院大连化学物理研究所 Drug interaction prediction method based on drug multidimensional similarity
CN112382411A (en) * 2020-11-13 2021-02-19 大连理工大学 Drug-protein targeting effect prediction method based on heterogeneous graph
CN113571125A (en) * 2021-07-29 2021-10-29 杭州师范大学 Drug target interaction prediction method based on multilayer network and graph coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
闵倩; 廖俊; 陆涛: "Drug interaction prediction model based on large-scale drug databases", 中国临床药理学杂志 (The Chinese Journal of Clinical Pharmacology), vol. 32, no. 11, 17 June 2016 (2016-06-17), pages 1034 - 1036 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination