CN116206775A - Multi-dimensional characteristic fusion medicine-target interaction prediction method - Google Patents
- Publication number: CN116206775A
- Application number: CN202310038717.4A
- Authority: CN (China)
- Prior art keywords: target, drug, information, medicine, interaction
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/10—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a drug-target interaction prediction method that fuses multidimensional features. Information on interacting drugs and targets, together with related diseases and drug side effects, is extracted from medical databases and used to construct a heterogeneous network. Topological features of the heterogeneous network are extracted with a heterogeneous graph attention neural network. The SMILES molecular sequence of each drug is represented as a molecular graph, from which drug feature information is extracted; target feature information is extracted at the same time. The extracted drug and target features are integrated into the message-passing process of the heterogeneous graph attention neural network for training, the model is saved, and the relationship between drugs and targets is predicted. The invention makes effective use of the biological feature information of drugs and targets, predicts drug-target relationships with higher accuracy, improves the efficiency and accuracy of verifying drug-target relationships, shortens the drug development cycle, and reduces the cost of developing new drugs.
Description
Technical Field
The invention relates to the technical field of medical artificial intelligence, and in particular to a drug-target interaction prediction method that fuses multidimensional features.
Background
New drug development is a lengthy and expensive process, typically taking 10-17 years from initial concept to market, with a capital investment of between 7 and 27 billion dollars. By contrast, finding new indications for existing drugs offers low development cost and short development time, so repurposing existing drugs to treat common and rare diseases is becoming increasingly attractive. Predicting drug-target interactions is an essential step in identifying new candidate compounds with potential therapeutic effects. A drug acts in the human body by interacting with various targets, enhancing or inhibiting their function, and thereby exerts a regulatory effect that treats a given disease. Identifying drug-target interactions therefore helps in understanding a drug's mechanism of action and plays a vital role in discovering new targets and in drug repositioning.
Currently, drug-target interaction prediction relies mainly on structure-based methods, ligand-similarity-based methods, and network-based methods. Structure-based methods generally require the three-dimensional structure of the protein and often perform poorly on proteins whose structure is unknown. Ligand-similarity-based methods predict from knowledge of known ligands; if the compound of interest does not appear in the target's ligand library, such methods cannot yield a reliable prediction. Network-based methods make full use of potential correlations between drugs and targets and have become a mainstream technique for drug-target interaction prediction. Inspired by message passing and aggregation in deep learning, drug-target prediction can be treated as large-scale data mining with graph neural networks, among which graph convolutional network-based methods are particularly prominent. The large heterogeneous network formed by drugs and related data holds a great deal of useful hidden information, and processing it with a graph convolutional network can effectively uncover potential associations in the network, which benefits drug discovery research. However, these methods generally neglect biological knowledge, such as the structural properties encoded in compound sequences, and therefore fail to capture latent features in the data, leaving considerable room for improvement in model performance.
Disclosure of Invention
The invention aims to provide a graph neural network model that integrates the molecular structure information of drugs and the biological structure information of proteins. The model automatically predicts the interaction relationship between a drug and a target, thereby improving verification efficiency and reducing verification cost.
To achieve the above purpose, the technical scheme of the application is as follows: a drug-target interaction prediction method fusing multidimensional features, comprising:
step 1: extracting information on interacting drugs and targets, together with related diseases and drug side effects, from medical databases, preprocessing the information, and constructing a heterogeneous network;
step 2: extracting the network topology features of the heterogeneous network using a heterogeneous graph attention neural network;
step 3: representing the SMILES molecular sequence of each drug as a molecular graph, and extracting drug structural feature information using a molecule attention Transformer network;
step 4: representing the target sequence information as embeddings, processing them with a convolutional neural network and a bidirectional long short-term memory (BiLSTM) network, and extracting target structural feature information;
step 5: integrating the extracted drug structural feature information and target structural feature information into the message-passing process of the heterogeneous graph attention neural network;
step 6: optimizing and training the prediction model with a cross-entropy loss function, and then saving the prediction model;
step 7: loading the prediction model, inputting the information of the drug and target to be predicted, predicting the relationship between the drug and the target, and outputting the prediction result.
Further, the specific implementation process of step 1 comprises:
step 1.1: screening drug information and target information from the medical databases, and deleting drug information and target information that have no interaction relationship;
step 1.2: obtaining the SMILES molecular sequence of each drug and the sequence information of each target from the medical databases, and using them as the biological feature representations of the drug and the target, respectively;
step 1.3: extracting information on the diseases and drug side effects related to the drugs and targets;
step 1.4: referring to FIG. 2(a), taking the extracted drugs, targets, diseases and drug side effects as nodes and the association information between them as edges, and constructing a heterogeneous network G = (V, E), where V is the node set and E is the edge set;
step 1.5: collecting the drugs and targets that have an interaction relationship into triples of the form <drug number, target number, label>, with the label set to 1;
step 1.6: randomly sampling unknown drug-target pairs as negative examples at a positive-to-negative ratio of 1:10, with the label set to 0.
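Steps 1.4 to 1.6 can be illustrated with a minimal sketch. It is not the patent's implementation: the function names `build_hetero_graph` and `build_labeled_pairs`, the toy identifiers, and the choice to sample negatives uniformly from all pairs without a recorded interaction are assumptions made for illustration.

```python
import random

def build_hetero_graph(typed_edges):
    """Collect the node set V and edge set E from ((type, id), (type, id)) pairs."""
    nodes, edges = set(), set()
    for u, v in typed_edges:
        nodes.update((u, v))
        edges.add((u, v))
    return nodes, edges

def build_labeled_pairs(positive_pairs, drug_ids, target_ids, neg_ratio=10, seed=0):
    """Return <drug, target, label> triples: known pairs get 1, sampled unknown pairs get 0."""
    rng = random.Random(seed)
    positives = set(positive_pairs)
    # Every drug-target pair without a recorded interaction is a candidate negative (assumption).
    candidates = [(d, t) for d in drug_ids for t in target_ids if (d, t) not in positives]
    n_neg = min(len(candidates), neg_ratio * len(positives))
    negatives = rng.sample(candidates, n_neg)
    return [(d, t, 1) for d, t in sorted(positives)] + [(d, t, 0) for d, t in negatives]

# Toy data with hypothetical identifiers.
drugs = [f"D{i}" for i in range(6)]
targets = [f"T{i}" for i in range(6)]
known = [("D0", "T1"), ("D2", "T3")]

V, E = build_hetero_graph(
    [(("drug", d), ("target", t)) for d, t in known]
    + [(("drug", "D0"), ("disease", "X1")), (("drug", "D2"), ("side_effect", "S1"))]
)
triples = build_labeled_pairs(known, drugs, targets)
```

With 2 positive pairs the sketch draws 20 negatives, matching the 1:10 ratio of step 1.6.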
Further, the specific implementation process of step 2 comprises:
the nodes of the heterogeneous network are initialized by an embedding function $f_0: V \to \mathbb{R}^d$, where $f_0(v)$ is the d-dimensional embedding of each node v; the aggregation of neighbor node information for node v is defined by formula (1):
where $\sigma(\cdot)$ is the nonlinear activation function applied in the propagation of one neural network layer, K is the number of attention layers, $N_v$ is the set of all neighboring nodes of node v, W is a shared weight parameter, a is the weight vector of the attention mechanism, and AconC is an activation function whose shape is learned adaptively.
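Since the aggregation formula itself is not reproduced in this text, the following is only a sketch of one attention head in the style of a standard graph attention layer, using the symbols defined above (W, a, $N_v$). Where AconC is applied (here, to the raw attention score in place of the usual LeakyReLU) is an assumption, as are the function names. The ACON-C form used is the published one, $(p_1 - p_2)\,x\,\mathrm{sigmoid}(\beta (p_1 - p_2) x) + p_2 x$, with $p_1$, $p_2$ and $\beta$ learnable in practice and fixed here.

```python
import math

def acon_c(x, p1=1.0, p2=0.0, beta=1.0):
    """ACON-C activation: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x."""
    z = (p1 - p2) * x
    return z / (1.0 + math.exp(-beta * z)) + p2 * x

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attention_aggregate(h, neighbors, v, W, a):
    """One head: weight each neighbor u of v by softmax(acon_c(a . [W h_v || W h_u]))."""
    hv = matvec(W, h[v])
    projected = {u: matvec(W, h[u]) for u in neighbors[v]}
    scores = {u: acon_c(sum(ai * xi for ai, xi in zip(a, hv + hu)))
              for u, hu in projected.items()}
    top = max(scores.values())
    exp_scores = {u: math.exp(s - top) for u, s in scores.items()}  # stable softmax
    total = sum(exp_scores.values())
    alpha = {u: e / total for u, e in exp_scores.items()}
    aggregated = [sum(alpha[u] * projected[u][k] for u in projected) for k in range(len(hv))]
    return aggregated, alpha

# Toy example: one drug node with two target neighbors (hypothetical values).
h = {"d": [1.0, 0.0], "t1": [0.0, 1.0], "t2": [1.0, 1.0]}
neighbors = {"d": ["t1", "t2"]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.1, 0.2, 0.3, 0.4]
aggregated, alpha = attention_aggregate(h, neighbors, "d", W, a)
```

A multi-head layer would repeat this computation with K independent parameter sets and combine the results before applying $\sigma(\cdot)$.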
Further, the specific implementation process of step 3 comprises:
step 3.1: converting the SMILES molecular sequence of each drug into a molecular graph using the RDKit library for Python, where the vertices and edges of the graph represent the atoms and chemical bonds of the drug, respectively; each drug molecule is represented by a feature matrix and an adjacency matrix, and each row of the feature matrix holds the attributes of one atom;
step 3.2: because SMILES sequences differ in length, a maximum length of 100 characters is chosen so that an effective representation covers at least 90% of the compounds in the dataset; sequences longer than the maximum length are truncated, and sequences shorter than the maximum length are padded with 0s;
step 3.3: referring to FIG. 2(b), a molecule attention Transformer network is used to extract the drug feature representation $S_{drug}$; the molecular multi-head self-attention layer is computed as follows:
where the adjacency matrix of the molecular graph and the matrix of inter-atomic distances enter the attention computation alongside the self-attention term; $Q_i = XW_i^q$, $K_i = XW_i^k$ and $V_i = XW_i^v$ are the query, key and value matrices, respectively, where the W are learnable parameters, $i \in \{1, \ldots, h\}$, and h is the number of attention heads; $\lambda_a$, $\lambda_d$ and $\lambda_g$ are scalar parameters that weight the self-attention, distance and adjacency matrices, respectively.
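The patent's formula for this layer is not reproduced in the text, so the sketch below follows the published Molecule Attention Transformer formulation, which matches the symbols defined above: one head computes $(\lambda_a\,\mathrm{softmax}(Q_i K_i^{\top}/\sqrt{d_k}) + \lambda_d\, g(D) + \lambda_g A)\,V_i$. The names `adj` and `dist` for the adjacency and distance matrices, and the choice of $g$ as a row-wise softmax of the negated distances, are assumptions.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax_rows(M):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in M:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def molecule_attention_head(X, Wq, Wk, Wv, adj, dist, lam_a, lam_d, lam_g):
    """One head: (lam_a * softmax(QK^T / sqrt(d_k)) + lam_d * g(dist) + lam_g * adj) @ V."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]
    attn = softmax_rows([[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_t)])
    g_dist = softmax_rows([[-d for d in row] for row in dist])  # assumed form of g
    n = len(X)
    mix = [[lam_a * attn[i][j] + lam_d * g_dist[i][j] + lam_g * adj[i][j]
            for j in range(n)] for i in range(n)]
    return matmul(mix, V)

# Toy molecule with three atoms and two features per atom (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
adj = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
dist = [[0.0, 1.5, 2.4], [1.5, 0.0, 1.5], [2.4, 1.5, 0.0]]
out = molecule_attention_head(X, I2, I2, I2, adj, dist, 0.0, 0.0, 1.0)
```

Setting $\lambda_a = \lambda_d = 0$ and $\lambda_g = 1$ with an identity adjacency matrix returns the value matrix unchanged, which is a convenient sanity check of the mixing step.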
Further, the specific implementation process of step 4 comprises:
step 4.1: randomly initializing an index table covering all amino acids that appear in the target sequences, the table having size 26 × 100; mapping the amino acids of each target sequence through the index table to construct the embedding matrix of that sequence; the length of the embedding matrix is the maximum target sequence length, which is set to 1000; during model training the embedding vectors are continuously optimized, so the contents of the index table change as the model is optimized;
step 4.2: referring to FIG. 2(c), a convolutional neural network and a bidirectional long short-term memory network are used to extract the feature information in the target sequence.
Further, the specific implementation process of step 4.2 comprises:
step 4.2.1: taking the embedding matrix obtained in step 4.1 as the input of the convolutional neural network; target sequences shorter than the length of the embedding matrix are automatically padded with empty labels; each CNN block uses three consecutive one-dimensional convolution layers whose number of convolution kernels increases with depth: the second layer uses twice as many convolution kernels as the first layer, and the third layer uses three times as many;
step 4.2.2: feeding the output of the convolution layers into a BiLSTM layer, whose final output is the protein structural feature, denoted $S_{protein}$, computed as follows:
where w and m are the weight matrix and the convolution window size, respectively, h is the LSTM hidden state, and x is the feature representation of the protein sequence.
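The input side of step 4 can be sketched as follows. This covers only the index table, padding and kernel counts; the convolution and BiLSTM layers themselves are omitted. The 25-symbol alphabet, the use of index 0 for padding (giving the 26 rows of the index table), the mapping of unknown symbols to "X", and all function names are assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYXBZUO"  # 25 symbols (assumed alphabet)
PAD = 0                                     # index 0 reserved for padding (assumption)
INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

def encode(sequence, max_len=1000):
    """Map a protein sequence to indices, truncating or zero-padding to max_len."""
    ids = [INDEX.get(aa, INDEX["X"]) for aa in sequence[:max_len]]
    return ids + [PAD] * (max_len - len(ids))

def init_index_table(rows=26, dim=100, seed=0):
    """Randomly initialised 26 x 100 table; in training these values would be learned."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

def embed(ids, table):
    """Look up one table row per position, giving a max_len x dim embedding matrix."""
    return [table[i] for i in ids]

def conv_kernel_counts(base):
    """Kernel counts of the three consecutive 1-D convolution layers: 1x, 2x, 3x."""
    return [base, 2 * base, 3 * base]

table = init_index_table()
ids = encode("MKTAYIAKQR")           # hypothetical short sequence
embedding = embed(ids, table)        # 1000 rows of 100 values each
kernels = conv_kernel_counts(32)     # base count of 32 is illustrative
```

The same truncate-or-pad pattern applies to the SMILES sequences of step 3.2 with a maximum length of 100.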
Further, the specific implementation process of step 5 comprises:
referring to FIG. 2(d), the drug structural feature vector $S_{drug}$ obtained in step 3 and the target structural feature vector $S_{protein}$ obtained in step 4 are concatenated into the node representations in the message-passing stage of the heterogeneous graph attention neural network, and the node-embedding update of formula (1) becomes:
Further, the specific implementation process of step 6 comprises:
step 6.1: referring to FIG. 2(e), after the feature representations of the drug and the target are obtained, the inner product is used to predict the drug-target interaction; given a drug node u and a protein node v with feature representations $f_u$ and $f_v$, the probability of interaction between u and v is:
$P = \sigma\left(f_u^{\top} f_v\right)$ (5)
where $\sigma(\cdot)$ is the sigmoid function and P is the predicted interaction score between u and v;
step 6.2: optimizing and training the prediction model with a cross-entropy loss function, testing its performance with 10-fold cross-validation, and saving the best-performing prediction model as $Model_{best}$.
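Formula (5), the cross-entropy loss and the 10-fold split of step 6 can be sketched in a few lines. The function names and the way folds are drawn (a seeded shuffle followed by striding) are illustrative assumptions, not details taken from the patent.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def interaction_probability(f_u, f_v):
    """Formula (5): P = sigmoid(f_u . f_v)."""
    return sigmoid(sum(a * b for a, b in zip(f_u, f_v)))

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

p = interaction_probability([0.0, 0.0], [1.0, 1.0])  # inner product 0 gives P = 0.5
loss = binary_cross_entropy([p], [1])                # equals ln 2
folds = k_fold_indices(25, k=10)
```

In each cross-validation round one fold serves as the test set and the remaining nine as training data; the model with the best validation metric would be the one saved.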
Further, the specific implementation process of step 7 comprises:
loading the prediction model $Model_{best}$ saved in step 6.2, inputting the drug-target information of the verification data into the prediction model, determining whether an interaction relationship exists between the drug and the target, and outputting the corresponding evaluation indexes.
With the above technical scheme, the invention achieves the following technical effects. The invention adopts a deep learning model that uses the information on drugs, targets, diseases and drug side effects in medical databases, combines it with the structural features of the drugs and targets, and automatically predicts drug-target interactions. The method effectively extracts the feature information contained in drug molecules and protein structures, predicts drug-target relationships with higher accuracy and robustness, improves the efficiency and accuracy of verifying drug-target relationships, shortens the drug development cycle, reduces the cost of developing new drugs, and provides an important foundation for new drug development and drug repurposing.
Drawings
FIG. 1 is a flow chart of a method for predicting drug-target interactions that incorporates multidimensional features;
FIG. 2 is a diagram of a model structure of a drug-target interaction prediction method incorporating multidimensional features.
Detailed Description
The embodiment of the invention is implemented on the premise of the technical scheme of the invention, and a detailed implementation mode and a specific operation process are provided, but the protection scope of the invention is not limited to the following embodiment.
The present invention is described in detail below with reference to examples so that those skilled in the art can practice the same with reference to the present specification.
Example 1
In this embodiment, Windows is used as the development environment, PyCharm as the development platform, and Python as the development language, and the drug-target interaction prediction method fusing multidimensional features is applied to predict drug-target interaction relationships.
In this embodiment, the drug-target interaction prediction method fusing multidimensional features comprises the following steps:
extracting 708 drugs, 1512 proteins, 5603 diseases and 4192 drug side effects from the DrugBank, PubChem, HPRD, Comparative Toxicogenomics Database (CTD) and SIDER databases; marking the known drug-target interaction relationships as positive examples with the data label set to 1, giving 1923 positive examples in total; randomly selecting 19230 drug-target pairs that are not marked as positive examples as negative examples, with the data label set to 0; and constructing a heterogeneous network from the obtained data;
taking the heterogeneous network, the drug SMILES sequences and the protein sequences as inputs, training and saving the prediction model, and obtaining the evaluation indexes for the predicted drug-target interaction relationships, the evaluation indexes being the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR).
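The two evaluation indexes can be computed directly from predicted scores and 0/1 labels. The sketch below is an illustration, not the evaluation code used in the embodiment: AUROC is computed as the fraction of positive-negative pairs ranked correctly (ties counting one half), and AUPR is estimated by average precision, which is one common estimator among several.

```python
def auroc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy predictions (hypothetical values).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
roc = auroc(scores, labels)              # 3 of 4 positive-negative pairs ordered correctly
ap = average_precision(scores, labels)   # (1/1 + 2/3) / 2
```

With a 1:10 positive-to-negative ratio, AUPR is usually the more informative of the two, because AUROC is less sensitive to class imbalance.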
Following the above steps, the drug-target relationship prediction performance of the invention is compared with that of the EEG-DTI, NeoDTI, DTINet, MSCMF and HNM models. As can be seen from Table 1, the method proposed herein outperforms the other methods in both AUROC and AUPR.
Table 1. Comparison of drug-target relationship prediction results across models
Example 2
In this embodiment, Windows is used as the development environment, PyCharm as the development platform, and Python as the development language, and the drug-target interaction prediction method fusing multidimensional features is applied to predict potential therapeutic drugs for COVID-19.
In this embodiment, the drug-target interaction prediction method fusing multidimensional features comprises the following steps:
extracting 146 targets closely related to COVID-19 and 708 candidate drugs from the Comparative Toxicogenomics Database (CTD) and the DrugBank database; obtaining the SMILES sequences of the drugs and the sequences of the related targets from the PubChem database; extracting 1456 diseases and 4192 drug side effects related to the drugs and targets from the HPRD and SIDER databases; and constructing a heterogeneous network from the acquired data;
taking the heterogeneous network and the drug and protein sequence data as input, and loading the saved prediction model to obtain the interaction prediction score Score between each drug and each target; the prediction scores are sorted in descending order and the 10 highest-ranked candidate drugs are extracted for each target, with the additional requirement that their confidence scores be greater than 0.5. After this filtering, only 15 targets meet the requirement, and 150 candidate drugs are finally obtained.
Of the 150 drugs screened, 54 have appeared in COVID-19 clinical studies; part of the data is shown in Table 2. The method thus allows candidate drugs to be identified quickly and in a more targeted way for subsequent wet-lab experiments.
Table 2. COVID-19-related therapeutic drugs screened by the invention
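The ranking and filtering step can be sketched as below. The reading that a target is kept only when all of its top 10 scores exceed 0.5 is an inference from the reported numbers (15 targets, 150 drugs); the function name and the toy scores are hypothetical.

```python
def select_candidates(scores, top_k=10, threshold=0.5):
    """Keep targets whose top_k highest-scoring drugs all exceed the threshold.

    scores: {target: {drug: predicted score}}
    returns: {target: [(drug, score), ...]} sorted by descending score
    """
    selected = {}
    for target, drug_scores in scores.items():
        ranked = sorted(drug_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        if len(ranked) == top_k and all(score > threshold for _, score in ranked):
            selected[target] = ranked
    return selected

# Toy scores with top_k=2 for brevity (hypothetical values).
toy = {
    "T1": {"D1": 0.92, "D2": 0.81, "D3": 0.40},
    "T2": {"D1": 0.70, "D2": 0.45, "D3": 0.30},
}
picked = select_candidates(toy, top_k=2)
```

Here T2 is dropped because its second-ranked drug scores below 0.5, while T1 contributes its two highest-scoring drugs.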
The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.
Claims (9)
1. A drug-target interaction prediction method fusing multidimensional features, comprising:
step 1: extracting information on interacting drugs and targets, together with related diseases and drug side effects, from medical databases, preprocessing the information, and constructing a heterogeneous network;
step 2: extracting the network topology features of the heterogeneous network using a heterogeneous graph attention neural network;
step 3: representing the SMILES molecular sequence of each drug as a molecular graph, and extracting drug structural feature information using a molecule attention Transformer network;
step 4: representing the target sequence information as embeddings, processing them with a convolutional neural network and a bidirectional long short-term memory network, and extracting target structural feature information;
step 5: integrating the extracted drug structural feature information and target structural feature information into the message-passing process of the heterogeneous graph attention neural network;
step 6: optimizing and training the prediction model with a cross-entropy loss function, and then saving the prediction model;
step 7: loading the prediction model, inputting the information of the drug and target to be predicted, predicting the relationship between the drug and the target, and outputting the prediction result.
2. The drug-target interaction prediction method fusing multidimensional features according to claim 1, wherein the specific implementation process of step 1 comprises:
step 1.1: screening drug information and target information from the medical databases, and deleting drug information and target information that have no interaction relationship;
step 1.2: obtaining the SMILES molecular sequence of each drug and the sequence information of each target from the medical databases, and using them as the biological feature representations of the drug and the target, respectively;
step 1.3: extracting information on the diseases and drug side effects related to the drugs and targets;
step 1.4: taking the extracted drugs, targets, diseases and drug side effects as nodes and the association information between them as edges, and constructing a heterogeneous network G = (V, E), where V is the node set and E is the edge set;
step 1.5: collecting the drugs and targets that have an interaction relationship into triples of the form <drug number, target number, label>, with the label set to 1;
step 1.6: randomly sampling unknown drug-target pairs as negative examples in a given proportion, with the label set to 0.
3. The drug-target interaction prediction method fusing multidimensional features according to claim 1, wherein the specific implementation process of step 2 comprises:
the nodes of the heterogeneous network are initialized by an embedding function $f_0: V \to \mathbb{R}^d$, where $f_0(v)$ is the d-dimensional embedding of each node v; the aggregation of neighbor node information for node v is defined as:
where $\sigma(\cdot)$ is the nonlinear activation function applied in the propagation of one neural network layer, K is the number of attention layers, $N_v$ is the set of all neighboring nodes of node v, W is a shared weight parameter, a is the weight vector of the attention mechanism, and AconC is an activation function whose shape is learned adaptively.
4. The method for predicting drug-target interaction with multi-dimensional feature fusion according to claim 1, wherein the specific implementation process of step 3 comprises:
step 3.1: the SMILES molecular sequence of each drug is expressed as a molecular graph form by calling an RDkit function library in a Python library, wherein the top and the side of the graph respectively represent atoms and chemical bonds of the drug, each drug molecule is expressed by using a feature matrix and an adjacent matrix, and each row of the feature matrix corresponds to the attribute of each atom;
step 3.2: selecting a maximum 100 character length SMILES sequence, wherein sequences larger than the maximum character length are truncated, and sequences smaller than the maximum character length are filled with 0;
step 3.3: extracting the drug representation $S_{drug}$ using a molecular attention Transformer network; the calculation formula of the molecular multi-head self-attention layer is as follows:
wherein the graph terms of the formula are the adjacency matrix of the molecular graph and the distances between atoms; $Q_i = XW_i^q$, $K_i = XW_i^k$ and $V_i = XW_i^v$ are the query, key and value matrices, respectively, wherein W denotes learnable parameters, i ∈ (1, ..., h), and h is the number of heads of the multi-head attention; $\lambda_a$, $\lambda_d$ and $\lambda_g$ are scalar parameters that weight the self-attention, distance and adjacency matrices.
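Steps 3.2 and 3.3 can be illustrated with a short NumPy sketch. It is written under stated assumptions: the character vocabulary is hypothetical, and the attention head follows the form usually given for molecule-aware self-attention, a weighted sum of the softmax self-attention, a function of the distance matrix (here a row-wise softmax of the negated distances) and the adjacency matrix, multiplied by $V_i$. The exact formula is not reproduced in the text, so this should be read as one plausible reading rather than the claimed equation.

```python
import numpy as np

MAX_LEN = 100  # maximum SMILES length from step 3.2

def encode_smiles(smiles, vocab, max_len=MAX_LEN):
    """Map characters to integer ids, truncate to max_len and pad with 0."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def molecule_attention_head(X, Wq, Wk, Wv, adjacency, distance,
                            lam_a, lam_d, lam_g):
    """One head: (lam_a * self-attention + lam_d * g(distance) + lam_g * adjacency) @ V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    self_att = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    dist_term = softmax(-distance)  # assumed choice of g: closer atoms weigh more
    return (lam_a * self_att + lam_d * dist_term + lam_g * adjacency) @ V

# Toy usage: a 3-atom chain with hypothetical features and distances.
vocab = {ch: i + 1 for i, ch in enumerate("CNO()=1")}  # 0 is reserved for padding
encoded = encode_smiles("CCO", vocab)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 2)) for _ in range(3))
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
distance = np.array([[0.0, 1.5, 2.4], [1.5, 0.0, 1.4], [2.4, 1.4, 0.0]])
head_out = molecule_attention_head(X, Wq, Wk, Wv, adjacency, distance, 0.4, 0.3, 0.3)
```

The three scalars let the model trade off learned attention against fixed structural information.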
5. The method for predicting drug-target interaction with multi-dimensional feature fusion according to claim 1, wherein the specific implementation process of step 4 comprises:
step 4.1: randomly initializing an index table covering all the amino acids that appear in the target sequences; mapping the amino acids of each target sequence through the index table to construct the embedding matrix of the target sequence; the length of the embedding matrix is the maximum length among the target sequences;
step 4.2: extracting the feature information in the target sequence by using a convolutional neural network and a bidirectional long short-term memory (BiLSTM) network.
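Step 4.1 can be sketched as below. The helper names, the embedding width of 8 and the convention of reserving index 0 for padding are assumptions made for illustration; the claim only states that the index table is randomly initialised and that the matrix length equals the longest target sequence.

```python
import numpy as np

def build_index_table(sequences):
    """Assign an integer index to every amino acid seen; 0 is kept for padding."""
    residues = sorted({aa for seq in sequences for aa in seq})
    return {aa: i + 1 for i, aa in enumerate(residues)}

def encode_targets(sequences, table):
    """Index matrix padded with 0 up to the longest sequence."""
    max_len = max(len(seq) for seq in sequences)
    out = np.zeros((len(sequences), max_len), dtype=np.int64)
    for row, seq in enumerate(sequences):
        out[row, :len(seq)] = [table[aa] for aa in seq]
    return out

# Toy sequences (hypothetical).
sequences = ["MKTAY", "MK", "GAVL"]
table = build_index_table(sequences)
index_matrix = encode_targets(sequences, table)

# Randomly initialised embedding lookup: one row per index, row 0 for padding.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(table) + 1, 8))
embedded = embedding_table[index_matrix]  # shape: (n_sequences, max_len, 8)
```

In a trained model the rows of `embedding_table` would be learnable parameters rather than fixed random values.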
6. The method for predicting drug-target interaction with multi-dimensional feature fusion according to claim 5, wherein the specific implementation process of step 4.2 comprises:
step 4.2.1: taking the embedding matrix obtained in step 4.1 as the input of the convolutional neural network; target sequences shorter than the embedding matrix length are automatically padded with empty tokens; each CNN block uses three consecutive one-dimensional convolution layers, and the number of convolution kernels increases with depth, i.e. the second layer uses twice as many convolution kernels as the first layer and the third layer uses three times as many;
step 4.2.2: receiving the output of the convolution layers using a BiLSTM layer, the final output being the protein structural features, denoted $S_{protein}$; the formula is as follows:
wherein w and m represent the weight matrix and the convolution window size, respectively, h is the LSTM hidden state, and x is the feature representation of the protein sequence.
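The kernel-count pattern of step 4.2.1 can be sketched in NumPy as follows. This illustrates only the three stacked one-dimensional convolutions with n, 2n and 3n kernels; the ReLU activation, the "valid" (unpadded) convolution, the random weights and the sizes are assumptions, and the BiLSTM layer of step 4.2.2 is left out to keep the example short.

```python
import numpy as np

def conv1d_relu(x, weight, bias):
    """Valid 1-D convolution along the sequence axis followed by ReLU.

    x: (length, in_channels); weight: (window, in_channels, out_channels).
    """
    window = weight.shape[0]
    out_len = x.shape[0] - window + 1
    out = np.stack([
        np.tensordot(x[t:t + window], weight, axes=([0, 1], [0, 1])) + bias
        for t in range(out_len)
    ])
    return np.maximum(out, 0.0)

def cnn_block(x, base_filters, window, rng):
    """Three consecutive conv layers with 1x, 2x and 3x base_filters kernels."""
    for mult in (1, 2, 3):
        out_ch = base_filters * mult
        weight = rng.normal(scale=0.1, size=(window, x.shape[1], out_ch))
        x = conv1d_relu(x, weight, np.zeros(out_ch))
    return x

rng = np.random.default_rng(0)
embedded_seq = rng.normal(size=(50, 8))  # one embedded target sequence
features = cnn_block(embedded_seq, base_filters=16, window=3, rng=rng)
```

The resulting feature sequence would then be passed to a bidirectional LSTM, whose hidden states give $S_{protein}$.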
7. The method for predicting drug-target interaction with multi-dimensional feature fusion according to claim 1, wherein the specific implementation process of step 5 comprises:
the drug structure feature vector $S_{drug}$ obtained in step 3 and the target structure feature vector $S_{protein}$ obtained in step 4 are concatenated in the message-passing stage of the heterogeneous graph attention neural network, and the node-embedding update of formula (1) becomes:
8. The method for predicting drug-target interaction with multi-dimensional feature fusion according to claim 1, wherein the specific implementation process of step 6 comprises:
step 6.1: after obtaining the representations of the drug and the target, predicting drug-target interactions using the inner product method; given a drug node u and a protein node v, $f_u$ and $f_v$ represent their features; the probability of interaction between u and v is:
$P = \sigma\left(f_u^{\top} f_v\right)$ (5)
wherein σ is the sigmoid function and P represents the interaction prediction score between u and v;
step 6.2: training and optimizing the prediction model using a cross-entropy loss function, testing the performance of the prediction model by 10-fold cross-validation, and saving the best-performing prediction model as $Model_{best}$.
9. The method for predicting drug-target interaction with multi-dimensional feature fusion according to claim 8, wherein the specific implementation process of step 7 comprises:
loading the prediction model $Model_{best}$ saved in step 6.2, inputting the drug-target information in the verification data into the prediction model, judging whether an interaction relationship exists between the drug and the target, and outputting the corresponding evaluation indexes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310038717.4A CN116206775A (en) | 2023-01-13 | 2023-01-13 | Multi-dimensional characteristic fusion medicine-target interaction prediction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310038717.4A CN116206775A (en) | 2023-01-13 | 2023-01-13 | Multi-dimensional characteristic fusion medicine-target interaction prediction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116206775A (en) | 2023-06-02 |
Family
ID=86516594
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310038717.4A Pending CN116206775A (en) | 2023-01-13 | 2023-01-13 | Multi-dimensional characteristic fusion medicine-target interaction prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116206775A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116894180A (en) * | 2023-09-11 | 2023-10-17 | 南京航空航天大学 | Product manufacturing quality prediction method based on different composition attention network |
CN117809737A (en) * | 2023-12-26 | 2024-04-02 | 国药(武汉)精准医疗科技有限公司 | Drug target protein interaction identification method, device, equipment and storage medium |
CN118197402A (en) * | 2024-04-02 | 2024-06-14 | 宁夏大学 | Method, device and equipment for predicting drug target relation |
CN118506856A (en) * | 2024-07-18 | 2024-08-16 | 中国石油大学(华东) | Medicine target interaction prediction method and system based on artificial intelligence |
-
2023
- 2023-01-13 CN CN202310038717.4A patent/CN116206775A/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116894180A (en) * | 2023-09-11 | 2023-10-17 | 南京航空航天大学 | Product manufacturing quality prediction method based on different composition attention network |
CN116894180B (en) * | 2023-09-11 | 2023-11-24 | 南京航空航天大学 | Product manufacturing quality prediction method based on different composition attention network |
CN117809737A (en) * | 2023-12-26 | 2024-04-02 | 国药(武汉)精准医疗科技有限公司 | Drug target protein interaction identification method, device, equipment and storage medium |
CN118197402A (en) * | 2024-04-02 | 2024-06-14 | 宁夏大学 | Method, device and equipment for predicting drug target relation |
CN118197402B (en) * | 2024-04-02 | 2024-09-10 | 宁夏大学 | Method, device and equipment for predicting drug target relation |
CN118506856A (en) * | 2024-07-18 | 2024-08-16 | 中国石油大学(华东) | Medicine target interaction prediction method and system based on artificial intelligence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11462304B2 (en) | Artificial intelligence engine architecture for generating candidate drugs | |
CN116206775A (en) | Multi-dimensional characteristic fusion medicine-target interaction prediction method | |
CN113140254B (en) | Meta-learning drug-target interaction prediction system and prediction method | |
CN110021341B (en) | Heterogeneous network-based GPCR (GPCR-based drug and targeting pathway) prediction method | |
US12087404B2 (en) | Generating anti-infective design spaces for selecting drug candidates | |
CN114093527B (en) | Drug repositioning method and system based on spatial similarity constraint and nonnegative matrix factorization | |
CN112131399A (en) | Old medicine new use analysis method and system based on knowledge graph | |
CN112562791A (en) | Drug target action depth learning prediction system based on knowledge graph, computer equipment and storage medium | |
CN113764034A (en) | Method, device, equipment and medium for predicting potential BGC in genome sequence | |
CN114882970B (en) | Medicine interaction effect prediction method based on pre-training model and molecular diagram | |
CN115376704A (en) | Medicine-disease interaction prediction method fusing multi-neighborhood correlation information | |
CN112837743B (en) | Drug repositioning method based on machine learning | |
CN113284627A (en) | Medication recommendation method based on patient characterization learning | |
CN116646001B (en) | Method for predicting drug target binding based on combined cross-domain attention model | |
Halsana et al. | DensePPI: A Novel Image-Based Deep Learning Method for Prediction of Protein–Protein Interactions | |
CN117457064A (en) | Graph structure self-adaption based medicine-medicine interaction prediction method and device | |
CN114999566B (en) | Drug repositioning method and system based on word vector characterization and attention mechanism | |
Wang et al. | Predicting polypharmacy side effects based on an enhanced domain knowledge graph | |
CN116630062A (en) | Medical insurance fraud detection method, system and storage medium | |
CN113345535A (en) | Drug target prediction method and system for keeping chemical property and function consistency of drug | |
US11915832B2 (en) | Apparatus and method for processing multi-omics data for discovering new drug candidate substance | |
CN114300036A (en) | Genetic variation pathogenicity prediction method and device, storage medium and computer equipment | |
CN117976047B (en) | Key protein prediction method based on deep learning | |
CN115458061B (en) | Medicine-protein interaction prediction method and system | |
CN118447340B (en) | Method and equipment for carrying out spatial modeling on image class relation based on prototype network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||