AU2021104604A4 - Drug target prediction method for keeping consistency of chemical properties and functions of drugs - Google Patents
- Publication number
- AU2021104604A4
- Authority
- AU
- Australia
- Prior art keywords
- drug
- drugs
- target
- space
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Abstract
The present invention belongs to the technical field of computational biology and proposes a
drug-target interaction (DTI) prediction method that maintains the consistency of the
chemical properties and functions of drugs. Finding unknown DTIs is a key step in drug
discovery and repositioning. Because DTI identification based on biological experiments is
time-consuming, laborious, costly and prone to failure, predicting the possible targets of
drugs by computational methods has gradually become a hotspot in the field of drug
discovery in recent years. However, most previous inventions on DTI prediction did not
consider the consistency of the chemical properties and functions of drugs in the
prediction process. A loss of this consistency may seriously reduce the accuracy of the
prediction results. The advantages of the present invention are: (1) the possible targets of
a drug are predicted from two perspectives by considering chemical properties and clinical
functions at the same time; (2) the feature vector of the drug is first projected to the
protein space and then to the disease space through the autoencoder model, which changes
the drug-target interaction prediction task from the traditional single-label classification
to multi-label classification and takes into account the complex compatibility-exclusion
relationship between drugs and proteins; (3) the consistency of medicinal chemical
properties, molecular mechanisms and clinical functions is maintained by keeping the
chemical similarity and functional similarity of drugs consistent.
[Figure 1 (sheet 1/2): data set construction flow. Legible labels: sequence information of protein, similarity calculation, Huffman coding, BLOSUM62, SMILES information of drugs, ECFPs, cosine similarity, disease-drug correlation; node types: drugs, protein, disease.]
Description
Drug target prediction method for keeping consistency of chemical properties and
functions of drugs
The invention relates to the technical field of computer-aided drug discovery, in
particular to a drug target prediction method for keeping the consistency of chemical
properties and functions of drugs.
Because the identification of drug-target interactions (DTIs) by biological wet-lab
experiments is time-consuming, laborious, costly and has a high failure rate, the prediction
of possible drug targets by computational methods has become a research hotspot
in the field of drug discovery. However, most previous inventions on DTI prediction
did not consider the consistency of the chemical properties and functions of drugs in the
prediction process. A loss of this consistency may seriously reduce the
accuracy of the prediction results.
Traditional drug-target interaction prediction methods can be mainly divided into docking
simulations and ligand-based methods. Docking simulation methods need to simulate the
3D structure of the target, which is very time-consuming, and not all the structural
information of target proteins is known. Ligand-based methods compare the ligand to be
queried with a group of known ligands of the target protein. However, when the
number of known ligands is small, ligand-based methods do not perform well.
To solve the problems above, drug-target interaction prediction methods based on
deep neural networks have been proposed. Based on the assumption that similar drugs are more
likely to interact with similar targets, the possibility of interaction between each drug-target
pair is analysed by integrating various information in drug-target heterogeneous networks.
However, the current methods based on deep neural networks all regard the prediction of
drug-target interaction as a single-label binary classification task, which makes the
prediction for each drug-target pair independent. Because the
chemical properties and functions of drugs should be consistent, drugs with similar
chemical structures should also have similar target proteins or similar indications. Ignoring
the complex compatibility-exclusion relationship between drugs and proteins and making
independent predictions for each drug-target pair may treat mutually exclusive
drugs as compatible, which may lead to serious drug misuse in clinical treatment.
Predicting the relationship between drugs and diseases carries similar hidden dangers.
Therefore, in addition to considering the similarities among drugs and among target
proteins, it is particularly important to keep the chemical property similarity and functional
similarity among drugs consistent.
The purpose of the present invention is to provide a drug target prediction method that
keeps the chemical properties and functions of drugs consistent, so as to solve at least
one of the technical problems in the background technology above.
In order to achieve the purpose above, the invention adopts the following technical
scheme:
In a first aspect, the invention provides a drug target prediction method for keeping
the consistency of chemical properties and functions of drugs, which comprises the
following steps:
Acquiring the chemical fingerprints of the drugs to be predicted;
Using the trained feature selection model to process the chemical fingerprints of the drug
to obtain the interaction score matrix between the drug and the targets. The feature
selection model regards the targets of a drug as the feature of the drug in protein space and
the indications of a drug as the feature of the drug in disease space. When the feature
selection model is trained, the similarity among target sequences and the similarity among
diseases are considered, with the aim of minimizing the difference between the similarities
of drugs in different spaces so as to keep the chemical properties and functions of drugs consistent;
Based on the score matrix of drug-target interaction, the target with the
highest score is taken as the candidate target of the drug.
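The final selection step can be illustrated with a minimal sketch. It assumes the interaction score matrix is given as a list of rows (one row per drug, one column per target); the function name `top_candidate_targets`, the scores and the target labels are hypothetical and not taken from the patent.

```python
def top_candidate_targets(score_matrix, target_names):
    """Return, for each drug (row), the name of its highest-scoring target."""
    candidates = []
    for row in score_matrix:
        # Index of the largest interaction score in this drug's row.
        best = max(range(len(row)), key=lambda j: row[j])
        candidates.append(target_names[best])
    return candidates


# Hypothetical scores for two drugs against three targets.
scores = [[0.10, 0.85, 0.30],
          [0.70, 0.20, 0.65]]
targets = ["p1", "p2", "p3"]
print(top_candidate_targets(scores, targets))  # ['p2', 'p1']
```

In practice one might keep the top few targets per drug rather than only the best one; the text above only specifies the highest-scoring target.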
Preferably, training the feature selection model comprises:
Extracting the chemical fingerprints, target protein sequence information, target
information and indication information of drugs to generate a chemical fingerprint
feature matrix, a drug-target interaction matrix and a drug-disease association matrix.
Based on the chemical fingerprint information, the target protein sequences and the drug
information related to diseases, the chemical fingerprint similarity matrix, the sequence
similarity matrix among target proteins and the similarity matrix among diseases are
calculated respectively.
Based on the chemical fingerprint similarity matrix of drugs, the sequence similarity matrix
among targets and the similarity matrix among diseases, the similarity of drugs in drug space,
protein space and disease space is calculated from the chemical fingerprints of drugs, their
action targets and their related indications. The feature selection model, which regards the
targets of a drug as its feature in protein space and the indications of a drug as its
feature in disease space, is trained with the goal of minimizing the error among the three similarities.
Preferably, the feature vector of the drug in the target protein space is calculated based on
the chemical fingerprint similarity matrix of the drug, the sequence similarity matrix
between targets and the similarity matrix between diseases.
Based on the feature vector of the drug in the target protein space, the association score
between the drug and each disease in the disease space is obtained.
Based on the feature vector of drugs in target protein space and the association score
between drugs and diseases in disease space, the similarity of each pair of drugs in target
protein space and disease space is calculated.
Based on the similarity of each pair of drugs in target protein space and disease space, the
score matrix of drug-target interaction is calculated.
Preferably, the chemical fingerprints of the drug are projected into the target protein
space through an encoder composed of two layers of fully connected neural networks,
and the interaction score between the drug and each target is obtained.
The interaction score between the drug and each target is projected into the disease space
through the decoder to get the association score between the drug and each disease.
Preferably, the chemical fingerprint $f_i^r$ of the drug $r_i$ is input into an encoder
composed of two layers of fully connected neural networks and projected into the target
protein space to obtain the feature vector $h_i^3$ in the target protein space:
$h_i^t = \sigma_t(W^t h_i^{t-1} + b^t), \quad t = 1, 2, 3$
$h_i^0 = (f_i^r)^T$
where $\sigma_t$, $h_i^t$, $W^t$ and $b^t$ are the activation function, output, weight matrix and bias
vector, respectively, of the $t$-th fully connected layer.
Preferably, with the help of a decoder, the predicted drug-disease association score $h_i^6$
is calculated from the encoder result $h_i^3$:
$h_i^t = \sigma_t(W^t h_i^{t-1} + b^t), \quad t = 4, 5, 6.$
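The encoder-decoder forward pass described here (each layer applies an activation to a weighted sum plus a bias, layers 1 to 3 forming the encoder and layers 4 to 6 the decoder) can be sketched in plain Python. This is only an illustration under stated assumptions: the sigmoid activation, the layer sizes, the random untrained weights and the names `dense`, `init_layer` and `forward` are all hypothetical, not the patent's implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(W, b, x):
    """One fully connected layer: sigmoid(W x + b). Sigmoid is an assumption."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def init_layer(n_out, n_in, rng):
    """Random, untrained weights; for illustration only."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


def forward(fingerprint, layers):
    """Return all layer outputs h^0..h^6 for one drug fingerprint."""
    outputs = [list(fingerprint)]          # h^0 is the fingerprint itself
    for W, b in layers:
        outputs.append(dense(W, b, outputs[-1]))
    return outputs


rng = random.Random(0)
k, q, n = 8, 4, 3                          # hypothetical: fingerprint bits, proteins, diseases
sizes = [k, 6, 5, q, 5, 4, n]              # hypothetical hidden widths
layers = [init_layer(sizes[t], sizes[t - 1], rng) for t in range(1, 7)]

h = forward([1, 0, 1, 1, 0, 0, 1, 0], layers)
target_scores = h[3]                       # encoder output: one score per protein
disease_scores = h[6]                      # decoder output: one score per disease
print(len(target_scores), len(disease_scores))
```

With a sigmoid on the output layers, every score lies strictly between 0 and 1, which matches the description of scores as interaction and association possibilities.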
Preferably, the loss function of the feature selection model is:
loss= -ZhM _Ih-Y T k2 +~(~ij+_ S(,j )Si) S,(i, j) + S(i, j);
Wherein, for the encoder, A4 =3 , k=q, S=S,, S4 =SP, and for the decoder,
, 1,=14, k= n, S= Si S= SD; SP indicates the similarity of protein calculated
based on encoder prediction results; SD indicates the similarity among diseases
calculated based on decoder prediction results.
In a second aspect, the present invention provides a drug target prediction system for
maintaining the consistency of chemical properties and functions of drugs, which
comprises:
An acquisition module, used for acquiring the chemical fingerprints of the drugs to be
predicted.
A calculation module, used for processing the chemical fingerprints of drugs with the
trained feature selection model to obtain an interaction score matrix of drugs and targets.
The feature selection model regards the targets of a drug as the feature of the drug in protein
space and the indications of a drug as the feature of the drug in disease space. When the
feature selection model is trained, the similarity among target sequences and the similarity
among diseases are considered, with the aim of minimizing the difference between the
similarities of drugs in different spaces, so as to keep the chemical properties and
functions of drugs consistent.
A judgement module, used for taking the target with the highest
score as the candidate target of the drug based on the interaction score matrix of the drug
and the targets.
In a third aspect, the present invention provides a non-transitory computer-readable
storage medium, which includes instructions for executing the drug target prediction
method for keeping the consistency of chemical properties and functions of drugs as
described above.
In a fourth aspect, the present invention provides an electronic device comprising the
non-transitory computer-readable storage medium described above and one or more
processors capable of executing the instructions of the non-transitory computer-readable
storage medium.
The method has the following beneficial effects: the possible targets of a drug are jointly
predicted from two views by simultaneously considering the chemical properties and
clinical functions of the drug. Through the Auto-encoder model, the feature vectors of
drugs are projected to the protein space and then to the disease space, and the task of
drug-target interaction prediction is changed from the traditional single-label
classification to multi-label classification, taking into account the complex
compatibility-exclusion relationship between drugs and proteins. By keeping the chemical
similarity and functional similarity of drugs consistent, the consistency of drug chemical
properties, molecular mechanisms and clinical functions is maintained.
Additional aspects and advantages of the invention will be set out in part in the
description which follows, and in part will become clear from that description or be
learned through practice of the present invention.
In order to explain the technical scheme of the embodiments of the present invention
more clearly, the drawings used in the description of the embodiments are briefly
introduced below. Obviously, the drawings in the following description show only some
embodiments of the present invention, and those skilled in the art can obtain other drawings
from these drawings without creative effort.
Fig. 1 is a schematic diagram of a data set construction flow according to an embodiment
of the present invention.
Fig. 2 is a schematic diagram of the working principle of the Auto-encoder according to
the embodiment of the present invention.
Fig. 3 is an example diagram of similarity distribution of 20 drugs in drug space (left),
protein space (middle) and disease space (right).
Fig. 4 is a schematic diagram comparing the performance of the prediction method
described in the embodiments of the present invention with that of other DTI prediction methods.
Embodiments of the present invention are described in detail below, examples of which
are shown in the figures, in which identical or similar reference numbers denote
identical or similar elements or elements having identical or similar functions throughout.
The embodiments described below with reference to the figures are exemplary; they are only
used to explain the present invention and cannot be interpreted as limiting it.
It can be understood by those skilled in the art that all terms (including technical terms
and scientific terms) used herein have the same meanings as those generally understood
by those skilled in the art to which the present invention belongs, unless otherwise
defined.
It should also be understood that terms such as those defined in a general dictionary
should be understood to have meanings consistent with those in the context of the prior
art and will not be interpreted in idealized or overly formal meanings unless so defined
here.
As will be understood by those skilled in the art, the singular forms "a", "an",
"said" and "the" used herein may also include plural forms unless expressly
stated otherwise. It should be further understood that the word "comprising" used in the
specification of the present invention means the presence of the stated features, integers,
steps, operations, elements and/or components, but does not exclude the presence or
addition of one or more other features, integers, steps, operations, elements and/or groups
thereof.
In the description of this specification, descriptions referring to the terms "one
embodiment", "some embodiments", "example", "specific example", or "some examples"
mean that specific features, structures, materials or characteristics described in
connection with this embodiment or example are included in at least one embodiment or
example of the present invention. Furthermore, the specific features, structures, materials
or characteristics described may be combined in a suitable manner in any one or more
embodiments or examples. In addition, those skilled in the art can combine different
embodiments or examples described in this specification, and the features of those
embodiments or examples, provided they do not contradict each other.
For the convenience of understanding the present invention, the following will further
explain the present invention with specific examples in conjunction with the drawings,
and the specific examples do not mean a limitation on the embodiments of the present
invention.
It should be understood by those skilled in the art that the figures are only schematic
diagrams of embodiments, and the components in the figures are not necessarily
required for implementing the present invention.
Embodiment 1
The present invention provides a drug target prediction system for maintaining the
consistency of chemical properties and functions of drugs, which comprises:
An acquisition module, used for acquiring the chemical fingerprints of the drugs to be
predicted.
A calculation module, used for processing the chemical fingerprints of drugs with the
trained feature selection model to obtain an interaction score matrix of drugs and targets.
The feature selection model regards the targets of a drug as the feature of the drug
in protein space and the indications of a drug as the feature of the drug in disease space.
When the feature selection model is trained, the similarity among target
sequences and the similarity among diseases are considered, with the aim of minimizing the
difference between the similarities of drugs in the chemical fingerprint space (dimension),
the target protein space (dimension) and the disease (indication) space (dimension), so as
to keep the chemical properties and functions of drugs consistent.
A judgement module, used for taking the target with the highest
score as the candidate target of the drug based on the interaction score matrix of the drugs
and the targets.
In Embodiment 1, a drug target prediction method for keeping the consistency of
chemical properties and functions of drugs is realized by using the system above,
including:
Using the acquisition module to acquire the chemical fingerprints of the drugs to be predicted.
Using the calculation module to process the chemical fingerprints of drugs with the
trained feature selection model to obtain an interaction score matrix of drugs and targets.
The feature selection model regards the targets of a drug as the feature of the drug in
protein space and the indications of a drug as the feature of the drug in disease space.
When the feature selection model is trained, the similarity among target sequences and
the similarity among diseases are considered, with the aim of minimizing the difference
between the similarities of drugs in the chemical fingerprint space (dimension), the target
protein space (dimension) and the disease (indication) space (dimension), so as to keep the
chemical properties and functions of drugs consistent.
Using the judgement module to take the target with the highest
score as the candidate target of the drug based on the interaction score matrix of the drugs
and the targets.
In this embodiment 1, training the feature selection model includes:
Extracting the chemical fingerprints, target protein sequence information, target
information and indication information of drugs to generate a chemical fingerprint
feature matrix, a drug-target interaction matrix and a drug-disease association matrix.
Based on the chemical fingerprint information, the target protein sequences and the drug
information related to diseases, the chemical fingerprint similarity matrix, the sequence
similarity matrix among target proteins and the similarity matrix among diseases are
calculated respectively.
Based on the chemical fingerprint similarity matrix of drugs, the sequence similarity matrix
among targets and the similarity matrix among diseases, the similarity of drugs in drug
(chemical fingerprint dimension), protein and disease space is calculated from
the chemical fingerprints of drugs, their interacting targets and their related indications,
and the feature selection model is trained with the goal of minimizing the error among the
three similarities.
The feature vector of the drug in the target protein space is calculated based on the
chemical fingerprint matrix and chemical fingerprint similarity matrix of the drug, the
sequence similarity matrix among targets and the similarity matrix among diseases;
Based on the feature vector of the drug in the target protein space, the association score
between the drug and each disease in the disease space is obtained;
Based on the feature vector of drugs in target protein space and the association score
between drugs and diseases in disease space, the similarity of each pair of drugs in target
protein space and disease space is calculated.
Based on the similarity of each pair of drugs in target protein space and disease space, the
score matrix of drug-target interaction is calculated.
Where the chemical fingerprint of the drug is projected into the target protein space
through an encoder composed of two layers of fully connected neural networks, and the
interaction score between the drug and each target is obtained;
The interaction score between the drug and each target is projected into the disease space
through the decoder to get the association score between the drug and each disease.
Embodiment 2
In Embodiment 2, a new method for predicting drug-target interaction is proposed, which
focuses on keeping the consistency of drug chemical properties, molecular mechanisms
and clinical manifestations.
Firstly, the data set for drug-target interaction prediction is extracted from several related
public databases. By constructing a drug-protein-disease heterogeneous network, the
chemical fingerprints of drugs, the amino acid sequences of proteins, drug-target interaction
data and drug indication data are integrated.
After that, in order to consider the association of drugs in different spaces, the drug-target
interaction prediction task is regarded as a multi-label classification task. Specifically, the
targets of drugs are regarded as the features of drugs in protein space, and
indications are regarded as the features of drugs in disease space. By constructing
an Auto-encoder model based on a deep neural network, the feature vectors of drugs are
projected from the original feature space (drug space/dimension) to the embedded space
(protein space/dimension), and then from the embedded space to the label space (disease
space/dimension).
In Embodiment 2, three similarities of drugs are calculated from their chemical
fingerprints, targets and indications respectively. By minimizing the error among these
three similarities in the prediction process, the consistency of the chemical properties,
molecular mechanisms and clinical functions of the drug itself is maintained.
Firstly, the data set required for drug-target interaction prediction is extracted from the
network database, and the similarities of drugs, targets and diseases are calculated (as shown in
Figure 1) and used as the association measure of the nodes in their original
feature space. The specific steps are as follows:
Step 1: Extract the chemical fingerprints, target proteins and indication information of
each drug in the data set from the public database, and generate the chemical
fingerprint feature matrix $F^r \in \mathbb{R}^{m \times k}$, the drug-target interaction matrix
$Y^{RP} \in \mathbb{R}^{m \times q}$ and the drug-disease association matrix
$Y^{RD} \in \mathbb{R}^{m \times n}$. The set $R = \{r_1, r_2, \ldots, r_m\}$
represents the $m$ drugs in the data set, the set $D = \{d_1, d_2, \ldots, d_n\}$ represents the $n$
diseases and the set $P = \{p_1, p_2, \ldots, p_q\}$ represents the $q$ proteins. If the drug $r_i$
has characteristic $j$, then $F^r_{ij} = 1$; otherwise $F^r_{ij} = 0$. Similarly, if there is a known
association (or interaction) between $r_i$ and disease $d_j$ (or protein $p_j$), then $Y^{RD}_{ij} = 1$ (or
$Y^{RP}_{ij} = 1$); otherwise $Y^{RD}_{ij} = 0$ (or $Y^{RP}_{ij} = 0$).
Step 2: Based on the chemical fingerprint information of drugs and the related drug
information of diseases, the similarity matrix of the drugs' chemical fingerprints, $S_r \in \mathbb{R}^{m \times m}$,
and the similarity matrix of diseases, $S_d \in \mathbb{R}^{n \times n}$, are calculated respectively. Based on
the sequence information of proteins, the sequence similarity matrix $S_p \in \mathbb{R}^{q \times q}$ among
proteins is calculated. Here $s(i,j) \in [0,1]$; the closer $s(i,j)$ is to 1, the more similar
nodes $i$ and $j$ are.
Step 3: The potential targets of drugs are predicted with the Auto-encoder model on the
premise of keeping the chemical properties and functions of drugs consistent. As shown
in Fig. 2, the chemical fingerprint $f_i^r$ of $r_i$ is used as the input of the model, and $r_i$ is
projected into the protein space through an encoder composed of two-layer fully
connected neural networks. This yields the interaction score between $r_i$ and each protein,
expressed by the vector $h_i^3$, where $h_i^3(j) \in [0,1]$ and the closer $h_i^3(j)$ is to 1,
the greater the possibility of interaction between $r_i$ and $p_j$. In order to introduce the
indication information of drugs to facilitate DTI prediction, the feature vector $h_i^3$ of $r_i$
in protein space is projected into disease space by a decoder, and the association score
with each disease is obtained, represented by the vector $h_i^6$.
Because some target proteins and indications of drugs have not been observed, features
are unavailable in the drug-target interaction matrix $Y^{RP}$ and the drug-disease
association matrix $Y^{RD}$. If the encoder and decoder are optimized according to
these incomplete feature matrices, the association between drugs may be changed in protein
space and disease space.
As shown in Figure 3, 20 drugs are randomly selected from the data set, and their association
distributions in drug space, protein space and disease space are simulated according to
their chemical fingerprints, target proteins and indications.
The distribution of associations visibly differs among the three spaces.
Based on the assumption that the chemical properties and functions of drugs should be
consistent, drugs with similar chemical properties should have similar targets and
indications, so the association of drugs in the three spaces should be consistent. The chemical structures of drugs are known and complete, while their target information and indication information are unavailable to some extent. The uncertainty of association caused by this unavailability of features has a negative impact on the prediction results of the model.
Therefore, in this Embodiment 2, the consistency of drug association across the drug,
protein and disease spaces is maintained, that is, the consistency of drug chemical properties,
molecular mechanisms and clinical functions.
The specific steps are as follows:
The encoder outputs $h_i^3$ and $h_j^3$ for the drugs $r_i$ and $r_j$ are regarded as
the feature vectors of these two drugs in protein space, and the similarity of $r_i$ and $r_j$ in
protein space is calculated and recorded as $S_r^P(i,j)$. Similarly, the similarity $S_r^D(i,j)$ of
$r_i$ and $r_j$ in disease space is calculated. By minimizing the error among the similarities
$S_r(i,j)$, $S_r^P(i,j)$ and $S_r^D(i,j)$ of $r_i$ and $r_j$ in the three spaces, the aim of keeping the
chemical properties, molecular mechanisms and clinical functions of drugs consistent is
achieved.
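How the similarity of two drugs in protein space might be derived from the encoder outputs can be sketched as below. The patent does not state which similarity function is used; cosine similarity between prediction vectors is assumed here, and the names `cosine_similarity_rows` and `similarity_gap` are hypothetical.

```python
import numpy as np

def cosine_similarity_rows(H, eps=1e-12):
    """Pairwise cosine similarity between the rows of H."""
    H = np.asarray(H, dtype=float)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    H_unit = H / np.maximum(norms, eps)   # guard against all-zero rows
    return H_unit @ H_unit.T

def similarity_gap(S_a, S_b):
    """Mean squared difference between two similarity matrices."""
    return float(np.mean((np.asarray(S_a) - np.asarray(S_b)) ** 2))

# Hypothetical encoder outputs for 3 drugs over 4 proteins (scores in [0, 1]).
H_protein = np.array([[0.9, 0.1, 0.0, 0.8],
                      [0.9, 0.1, 0.0, 0.8],
                      [0.0, 0.7, 0.6, 0.0]])
S_r_P = cosine_similarity_rows(H_protein)
```

Because the scores are non-negative, the cosine values stay in [0, 1], which matches the range of the chemical similarity they are compared against. `similarity_gap(S_r_P, S_r)` would then quantify the disagreement with the fingerprint-based similarity matrix.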
In this Embodiment 2, the chemical fingerprints of the drug refer to:
A 0-1 code constructed according to the molecular structure and chemical properties of the
drug. For the chemical fingerprint $f_i$ of a drug $r_i$, $f_i(j) = 1$ means that the drug has
the $j$-th type of molecular structure or chemical property.
Targets and target proteins are:
Substances that a drug must bind to in order to take effect are called drug targets,
such as proteins, genes and so on. A protein that can be used as a target is called a target
protein.
The functions (indications) of drugs are:
The molecular mechanism and the clinical function of drugs. The molecular mechanism
of drugs refers to the targets that drugs can bind to. Clinical functions of drugs refer to
diseases that drugs can treat (i.e., indications).
In this Embodiment 2, the similarity calculation includes:
Based on the chemical fingerprints of drugs and the related drugs of the diseases, the
similarities among drugs and among diseases are calculated respectively. Based on the
sequence information of proteins, the sequence similarity between proteins is calculated.
Based on the SMILES information of drugs, the chemical fingerprints for all drugs in the
data set are constructed, and the chemical fingerprint matrix $F^r \in \mathbb{R}^{m \times k}$ of drugs is
obtained. Based on this, the chemical similarity $S_r(i,j)$ between drugs $r_i$ and
$r_j$ can be calculated. Similarly, according to the related drugs of the diseases, the similarity between each pair of diseases can be calculated, and the similarity matrix
$S_d \in \mathbb{R}^{n \times n}$ of diseases can be obtained.
Based on the sequence information of proteins, the sequence similarity scores of each
pair of proteins are calculated, and the similarity matrix $S_p \in \mathbb{R}^{q \times q}$ of proteins is obtained.
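The sequence similarity measure is not specified in the text. One way to obtain a score in [0, 1] is a Smith-Waterman local alignment score normalized by the two self-alignment scores; that choice, the simple match/mismatch/gap scoring, and the toy sequences below are all assumptions made for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def normalized_sequence_similarity(a, b):
    """Scale the alignment score by the self-alignment scores so it lies in [0, 1]."""
    denom = (smith_waterman_score(a, a) * smith_waterman_score(b, b)) ** 0.5
    return smith_waterman_score(a, b) / denom if denom > 0 else 0.0

# Hypothetical short amino acid sequences (real proteins are far longer).
sequences = ["MKTAYIAK", "MKTAYLAK", "GGSGGS"]
S_p = [[normalized_sequence_similarity(a, b) for b in sequences] for a in sequences]
```

A production pipeline would more likely use a substitution matrix and an optimized alignment library; the sketch only shows how pairwise scores can be turned into a symmetric matrix with ones on the diagonal.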
In Embodiment 2, a feature selection model based on an Auto-encoder is designed, as
shown in Fig. 2. Specifically, the chemical fingerprint $f_i$ of the drug $r_i$ is the input
of the model, and $r_i$ is projected into an embedded space through an encoder composed of
two layers of fully connected neural networks. The feature vector $h_i^3$ in the
embedded space is obtained according to the following formula:
$$h_i^t = \sigma_t\left(W^t h_i^{t-1} + b^t\right), \quad t = 1, 2, 3$$
$$h_i^0 = r_i^{T}$$
where $\sigma_t$, $h_i^t$, $W^t$ and $b^t$ are the activation function, output, weight matrix and bias
vector, respectively, of the $t$-th fully connected layer, and the input $h_i^0$ is the chemical fingerprint of drug $r_i$.
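The layer-by-layer recursion can be illustrated with a small forward pass. This is a sketch under stated assumptions: the activation is taken to be a sigmoid in every layer (consistent with scores in [0, 1], but not stated in the patent), the weights are random rather than trained, and the layer widths are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(h0, weights, biases):
    """Apply h^t = sigmoid(W^t h^{t-1} + b^t) for each layer t and return the last h^t."""
    h = np.asarray(h0, dtype=float)
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h

rng = np.random.default_rng(0)
k, hidden, q = 6, 5, 4                   # fingerprint length, hidden width, number of proteins
sizes = [k, hidden, hidden, q]           # three layers, t = 1, 2, 3
weights = [rng.normal(scale=0.5, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

f_i = np.array([1, 0, 1, 1, 0, 0])       # hypothetical fingerprint of one drug
h3 = forward(f_i, weights, biases)       # one interaction score per protein
```

The decoder has the same form with layers t = 4, 5, 6, taking `h3` as its input and producing one score per disease.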
In order to introduce the indication information of drugs to assist DTI prediction, a
decoder is used to calculate the association score between the drug and each disease
according to the result of the encoder. The association
score $h_i^6$ with each disease is calculated according to the following formula:
$$h_i^t = \sigma_t\left(W^t h_i^{t-1} + b^t\right), \quad t = 4, 5, 6$$
In addition, in order to prevent the model from over-fitting, a Batch Normalization layer is
added after each fully connected layer, so that the output of the fully connected layer is
normalized toward a standard Gaussian distribution.
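What a batch normalization layer does to the output of a fully connected layer during training can be sketched as follows. The sketch covers only the training-time normalization over a batch; the running statistics used at inference time and the learning of `gamma` and `beta` are omitted, and the function name is hypothetical.

```python
import numpy as np

def batch_normalize(X, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each column of a batch to zero mean and unit variance, then scale and shift."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    return gamma * (X - mean) / np.sqrt(var + eps) + beta

# Hypothetical layer outputs: 4 drugs in the batch, 2 units in the layer.
batch = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 40.0]])
normalized = batch_normalize(batch)
```

After normalization both columns have the same scale even though the raw outputs differ by a factor of ten.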
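The loss of the encoder and the loss of the decoder can each be calculated
according to the following formula:
$$loss = \frac{1}{m} \sum_{i=1}^{m} \left\| h_i - Y_i \right\|^2$$
where $Y_i$ is the $i$-th row of $Y$; for the encoder, $h_i = h_i^3$ and $Y = Y^{RP}$; for the decoder, $h_i = h_i^6$ and $Y = Y^{RD}$.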
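A squared-error reading of this loss can be sketched as below. The formula in the source is damaged, so the exact form (squared Euclidean distance per drug, averaged over the m drugs) is an assumption, as are the function name and the toy values.

```python
import numpy as np

def reconstruction_loss(H, Y):
    """Mean over drugs of the squared Euclidean distance between prediction and label rows."""
    H = np.asarray(H, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return float(np.sum((H - Y) ** 2) / H.shape[0])

# Hypothetical labels and encoder predictions for 2 drugs over 3 proteins.
Y_RP = np.array([[1, 0, 1],
                 [0, 1, 0]])
H_enc = np.array([[0.5, 0.0, 1.0],
                  [0.0, 1.0, 0.5]])
loss_enc = reconstruction_loss(H_enc, Y_RP)
```

The same function applies to the decoder by passing the disease scores and the drug-disease matrix instead.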
According to the prediction results of the encoder and decoder, the similarity matrices $S_r^P$
and $S_r^D$ of drugs $r_i$ and $r_j$ in protein space and disease space can be calculated.
Based on this, the loss functions of the encoder and decoder are extended as follows:
$$loss = \frac{1}{m} \sum_{i=1}^{m} \left\| h_i - Y_i \right\|^2 + \frac{\lambda}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( S_r^*(i,j) - S_r(i,j) \right)^2$$
where, for the encoder, $\lambda = \lambda_1$ and $S_r^* = S_r^P$, and the second term in the formula is the
association loss of drugs in protein space; for the decoder, $\lambda = \lambda_2$ and $S_r^* = S_r^D$. In both
cases the second term in the formula is the loss of drug similarity between the drug space and the
protein (or disease) space. $\lambda_1$ and $\lambda_2$ are parameters for adjusting the weight of the loss
terms.
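The effect of the added similarity term can be seen in a small numeric sketch. The normalization by m and m squared, the function name `consistency_loss`, and all values are assumptions for illustration; the similarity matrix implied by the predictions is passed in precomputed.

```python
import numpy as np

def consistency_loss(H, Y, S_star, S_r, lam):
    """Prediction error plus lam times the mean squared gap between S_star and S_r."""
    H, Y = np.asarray(H, dtype=float), np.asarray(Y, dtype=float)
    S_star, S_r = np.asarray(S_star, dtype=float), np.asarray(S_r, dtype=float)
    m = H.shape[0]
    prediction_term = np.sum((H - Y) ** 2) / m
    similarity_term = np.sum((S_star - S_r) ** 2) / m ** 2
    return float(prediction_term + lam * similarity_term)

Y = np.array([[1, 0], [0, 1]])
H = np.array([[1.0, 0.0], [0.0, 1.0]])       # predictions that match the labels exactly
S_r = np.array([[1.0, 0.8], [0.8, 1.0]])     # the two drugs are chemically similar
S_star = np.array([[1.0, 0.0], [0.0, 1.0]])  # but look unrelated in the predicted space
loss = consistency_loss(H, Y, S_star, S_r, lam=0.5)
```

Even though the predictions reproduce the observed labels, the loss is non-zero because two chemically similar drugs were assigned unrelated profiles. This is how the term pushes the model away from simply copying an incomplete label matrix.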
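Based on the assumption that similar drugs can interact (associate) with similar
proteins (diseases), the similarity between proteins (diseases) should also be considered in the
prediction results of the encoder (decoder). Therefore, the loss function of the Auto-encoder
model is finally defined as:
$$loss = \frac{1}{m} \sum_{i=1}^{m} \left\| h_i - Y_i \right\|^2 + \frac{\lambda^*}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( S_r^*(i,j) - S_r(i,j) \right)^2 + \frac{\lambda}{k^2} \sum_{i=1}^{k} \sum_{j=1}^{k} \left( S^{\#}(i,j) - S(i,j) \right)^2$$
where $\lambda^*$ is the weight of the drug-similarity term ($\lambda_1$ for the encoder, $\lambda_2$ for the decoder); for the encoder, $\lambda = \lambda_3$, $k = q$, $S = S_p$ and $S^{\#} = S^P$; and for the decoder,
$\lambda = \lambda_4$, $k = n$, $S = S_d$ and $S^{\#} = S^D$. $S^P$ indicates the similarity of proteins calculated
based on the encoder prediction results; $S^D$ indicates the similarity among diseases
calculated based on the decoder prediction results.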
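A three-term loss of this kind can be sketched as follows. The patent says only that the protein (or disease) similarity is calculated from the prediction results; here it is assumed to be the cosine similarity between columns of the prediction matrix, and the drug similarity the cosine similarity between rows. The names `cosine_rows` and `full_loss` and the normalizations are hypothetical.

```python
import numpy as np

def cosine_rows(M, eps=1e-12):
    """Pairwise cosine similarity between the rows of M."""
    M = np.asarray(M, dtype=float)
    unit = M / np.maximum(np.linalg.norm(M, axis=1, keepdims=True), eps)
    return unit @ unit.T

def full_loss(H, Y, S_r, S_col, lam_drug, lam_col):
    """Three-term loss: label fit, drug-similarity gap, and target (or disease) similarity gap."""
    H, Y = np.asarray(H, dtype=float), np.asarray(Y, dtype=float)
    m, k = H.shape
    fit = np.sum((H - Y) ** 2) / m
    drug_gap = np.sum((cosine_rows(H) - S_r) ** 2) / m ** 2       # rows of H are drugs
    col_gap = np.sum((cosine_rows(H.T) - S_col) ** 2) / k ** 2    # columns are proteins or diseases
    return float(fit + lam_drug * drug_gap + lam_col * col_gap)

Y = np.eye(2)                  # two drugs, two proteins, one distinct target each
S_identity = np.eye(2)
loss_consistent = full_loss(Y, Y, S_identity, S_identity, lam_drug=0.5, lam_col=0.2)
loss_inconsistent = full_loss(Y, Y, S_identity, np.ones((2, 2)), lam_drug=0.5, lam_col=0.2)
```

In the second call the two proteins are declared identical by sequence (`np.ones`), yet the predictions treat them as unrelated, so only the third term contributes to the loss.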
In Embodiment 2, the final score matrix of drug-protein interaction is obtained by
minimizing the loss of encoder and decoder.
In this Embodiment 2, the performance of the prediction model is evaluated through 5-fold
cross-validation on two published data sets for drug-target interaction prediction. In
terms of both AUC and AUPR, its prediction accuracy is superior to several
state-of-the-art DTI prediction methods, including DTINet, GRMF, MolTrans, NGDTP
and DeepDTNet.
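The evaluation protocol is not detailed in the text. The sketch below shows one generic way to split samples into five folds and to compute AUC as the probability that a positive pair is scored above a negative pair. The helper names and the toy labels are hypothetical, and how the original study sampled drug-target pairs is not known from this document.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def auc_score(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))

folds = k_fold_indices(20, k=5)
auc = auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In each round one fold would be held out for testing and the remaining four used for training; the pairwise AUC shown here is quadratic in the number of samples and is meant only for small illustrations.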
To sum up, the drug-target interaction prediction method provided in Embodiment 2
includes four parts: the extraction of drug big data, the calculation of similarity among
various nodes, the prediction of drug-target interaction based on deep learning and
keeping the similarity of drug chemical properties and functions consistent. The
prediction of drug-target interaction includes two parts: drug-target interaction prediction
based on deep neural network and auxiliary prediction based on drug-disease association
information. Maintaining the similarity of drug chemistry and function includes keeping
the consistency of drug chemistry and molecular mechanism and keeping the consistency
of drug chemistry and clinical function.
In this Embodiment 2, by considering the chemical properties and clinical functions of
drugs at the same time, the possible targets of drugs are predicted from two views.
Through the Auto-encoder model, the feature vectors of drugs are projected to the protein
space and then to the disease space, and the task of drug-target interaction prediction is
changed from the traditional single-label classification task to a multi-label task, taking
into account the complex compatibility-exclusion relationship between drugs and proteins.
By keeping the drug chemical similarity and functional similarity consistent, the
consistency of drug chemical properties, molecular mechanism and clinical function is
maintained.
Embodiment 3
The present invention provides a non-transitory computer-readable storage medium, which
includes instructions for executing the drug target prediction method for keeping the
consistency of drug chemical properties and functions as described above. The method
includes:
Acquiring chemical fingerprints of drugs to be predicted;
Processing the chemical fingerprints of drugs by using the trained feature selection model to
obtain an interaction score matrix of drugs and targets. The feature selection
model regards the targets of a drug as the feature of the drug in protein space and the
indications of a drug as the feature of the drug in disease space. During training,
the feature selection model considers the similarity among target sequences and among
diseases, aiming at minimizing the similarity difference of a drug across the three different
spaces, so as to keep the chemical properties and functions of drugs consistent;
Based on the score matrix of drug-target interaction, the corresponding target with the
highest score is taken as the candidate target of the drug.
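The final selection step can be sketched as a row-wise ranking of the score matrix. The function name `candidate_targets`, the `top_k` parameter and the toy scores are hypothetical; the method as described takes only the single highest-scoring target.

```python
import numpy as np

def candidate_targets(scores, top_k=1):
    """For each drug (row), return the indices of its top_k highest-scoring targets."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, axis=1)       # descending score order per row
    return order[:, :top_k]

# Hypothetical interaction scores for 2 drugs over 3 targets.
score_matrix = np.array([[0.10, 0.85, 0.30],
                         [0.60, 0.20, 0.70]])
best = candidate_targets(score_matrix)        # one candidate per drug
```

Passing `top_k=2` would return a short ranked list per drug instead of a single candidate, which may be more practical when several targets score closely.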
Embodiment 4
The present invention provides an electronic device comprising a non-transitory
computer-readable storage medium and one or more processors capable of executing the
instructions of the non-transitory computer-readable storage medium. The non-transitory
computer-readable storage medium comprises instructions for executing a drug target
prediction method for keeping the consistency of chemical properties and functions of
drugs, and the method comprises:
Acquiring chemical fingerprints of drugs to be predicted;
Processing the chemical fingerprints of drugs by using the trained feature selection model to
obtain an interaction score matrix of drugs and targets. The feature selection
model regards the targets of a drug as the feature of the drug in protein space and the indications
of a drug as the feature of the drug in disease space. During training, the feature
selection model considers the similarity among target sequences and among diseases, aiming at
minimizing the similarity difference of drugs in different spaces, so as to keep the
chemical properties and functions of drugs consistent;
Based on the score matrix of drug-target interaction, the corresponding target with the
highest score is taken as the candidate target of the drug.
To sum up, the drug target prediction method and system for keeping the consistency of
drug chemical properties and functions described in the embodiments of the present
invention focus on keeping the consistency of drug chemical properties, molecular
mechanisms and clinical manifestations.
Firstly, the data set of drug-target interaction prediction is extracted from several related
public databases. By constructing drug-protein-disease heterogeneous network, the
chemical fingerprints of drugs, amino acid sequence of protein, drug-target interaction
data and indication data of drugs are integrated. After that, in order to consider the
association of drugs in different spaces, the drug-target interaction prediction task is
regarded as a multi-label classification task.
Specifically, the targets of drugs are regarded as the characteristics of drugs in protein
space, and indications are regarded as the characteristics of drugs in disease space. By
constructing an Auto-encoder model based on deep neural network, the feature vectors of
drugs are projected from the original feature space (drug space/dimension) to the
embedded space (protein space), and then from the embedded space to the label space
(disease space).
According to the chemical fingerprints, target and indications of drugs, three similarities
of drugs are calculated respectively. By minimizing the error between these three
similarities in the prediction process, the consistency of the chemical properties,
molecular mechanism and clinical function of the drug itself is maintained.
It should be understood by those skilled in the field that embodiments of the present
invention may be provided as methods, systems, or computer program products.
Therefore, the present invention may take the form of an entirely hardware embodiment,
an entirely software embodiment, or an embodiment combining software and hardware
aspects. Furthermore, the present invention may take the form of a computer program
product embodied on one or more computer usable storage media (including but not limited to disk memory, CD-ROM, optical memory, etc.) having computer usable program code embodied therein.
The present invention is described with reference to the methods, devices (systems), and
flowcharts and/or block diagrams of computer program products according to
embodiments of the present invention. It should be understood that each flow and/or
block in the flowchart and/or block diagram, and combinations of flows and/or blocks in
the flowchart and/or block diagram can be implemented by computer program
instructions. These computer program instructions may be provided to a processor of a
general purpose computer, a special purpose computer, an embedded processor or other
programmable data processing apparatus to produce a machine, such that the instructions
which are executed by the processor of the computer or other programmable data
processing apparatus produce means for implementing the functions specified in one or
more flow diagrams and/or one or more block diagrams.
These computer program instructions can also be loaded on a computer or other
programmable data processing device, and a series of operation steps are executed on the
computer or other programmable device to produce a computer-implemented process, so
that the instructions executed on the computer or other programmable device provide
steps for implementing the functions specified in one or more flow charts and/or one or
more block diagrams.
The above are only the preferred embodiments of the present invention and are not used to
limit the present invention. For those skilled in the field, the present invention can be
modified and varied. Any modification, equivalent substitution, improvement, etc. made within the spirit and principle of this invention shall fall within the protection scope of this invention.
Although the specific embodiments of this invention have been described with reference
to the attached figures, it is not a limitation on the protection scope of this invention. It
should be understood by those skilled in the field that various modifications or variations
that can be made by those skilled in the field on the basis of the technical scheme
disclosed in this invention without making creative efforts should be covered within the
protection scope of this invention.
Claims (10)
1. A drug target prediction method for keeping the consistency of chemical properties and
functions of drugs, characterized in comprising:
Acquiring chemical fingerprints of drugs to be predicted;
Using the trained feature selection model to process the chemical fingerprints of the drug
to obtain the interaction score matrix between the drug and the target; the feature
selection model regards the targets of a drug as the feature of the drug in protein space and the
indications of a drug as the feature of the drug in disease space; during training,
the feature selection model considers the similarity among targets and among diseases,
aiming at minimizing the similarity difference of drugs in different spaces, so as to keep
the chemical properties and functions of drugs consistent;
Based on the score matrix of drug-target interaction, the corresponding target with the
highest score is taken as the candidate target of the drug.
2. The drug target prediction method for keeping the consistency of drug chemical
properties and functions according to claim 1, wherein training the feature selection model
comprises:
Extracting chemical fingerprints, sequence information of target proteins, target
information and indication information of drugs to generate a chemical fingerprint
characteristic matrix, a drug-target interaction matrix and a drug-disease association matrix.
Based on the chemical fingerprint information, the sequences of target proteins and the drug
information related to diseases, the similarity matrix of chemical fingerprints, the sequence
similarity matrix among target proteins and the similarity matrix among diseases are
calculated respectively.
Based on the chemical fingerprint similarity matrix of drugs, the sequence similarity matrix
among targets and the similarity matrix among diseases, the similarity of drugs in drug,
protein and disease space is calculated by combining the chemical fingerprints of drugs,
interacted targets and related indications, and the feature selection model is trained with
the goal of keeping the error of the three similarities minimum.
3. The drug target prediction method for keeping the consistency of drug chemical
properties and functions according to claim 2, is characterized in that: The feature vector
of the drug in the target protein space is calculated based on the chemical fingerprint
similarity matrix of the drug, the sequence similarity matrix among targets and the
similarity matrix among diseases;
Based on the feature vector of the drug in the target protein space, the association score
between the drug and each disease in the disease space is obtained;
Based on the feature vector of drugs in target protein space and the association score
between drugs and diseases in disease space, the similarity of each pair of drugs in
protein space and disease space is calculated.
Based on the similarity of each pair of drugs in protein space and disease space, the score
matrix of drug-target interaction is calculated.
4. The drug target prediction method for keeping the consistency of drug chemical
properties and functions according to claim 3, is characterized in that: the chemical
fingerprints of the drug are projected into the target protein space through an encoder
composed of two layers of fully connected neural networks, and the interaction score
between the drug and each target is obtained;
The interaction score between the drug and each target is projected into the disease space
through the decoder to get the association score between the drug and each disease.
5. The drug target prediction method for keeping the consistency of drug chemical
properties and functions according to claim 4, is characterized in that: the chemical
fingerprint $f_i$ of the drug $r_i$ is input into an encoder composed of two layers of fully
connected neural networks, and projected into the target protein space to obtain the
feature vector $h_i^3$ in the protein space:
$$h_i^t = \sigma_t\left(W^t h_i^{t-1} + b^t\right), \quad t = 1, 2, 3$$
$$h_i^0 = r_i^{T}$$
where $\sigma_t$, $h_i^t$, $W^t$ and $b^t$ are the activation function, output, weight matrix and bias
vector, respectively, of the $t$-th fully connected layer.
6. The drug target prediction method for keeping the consistency of drug chemical
properties and functions according to claim 5, is characterized in that: through a decoder,
the association score $h_i^6$ between the drug and each disease is calculated according to the result $h_i^3$ of
the encoder:
$$h_i^t = \sigma_t\left(W^t h_i^{t-1} + b^t\right), \quad t = 4, 5, 6$$
7. The drug target prediction method for keeping the consistency of drug chemical
properties and functions according to claim 6, is characterized in that the loss function of
the feature selection model is:
$$loss = \frac{1}{m} \sum_{i=1}^{m} \left\| h_i - Y_i \right\|^2 + \frac{\lambda^*}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( S_r^*(i,j) - S_r(i,j) \right)^2 + \frac{\lambda}{k^2} \sum_{i=1}^{k} \sum_{j=1}^{k} \left( S^{\#}(i,j) - S(i,j) \right)^2$$
where, for the encoder, $\lambda = \lambda_3$, $k = q$, $S = S_p$ and $S^{\#} = S^P$; and for the decoder,
$\lambda = \lambda_4$, $k = n$, $S = S_d$ and $S^{\#} = S^D$. $S^P$ indicates the similarity of proteins calculated
based on the encoder prediction results; $S^D$ indicates the similarity among diseases
calculated based on the decoder prediction results.
8. A drug target prediction system for keeping the consistency of chemical properties and
functions of drugs, characterized by comprising:
An acquisition module, which is used for acquiring chemical fingerprints of drugs to be
predicted;
A calculation module, which is used for processing the chemical fingerprints of drugs by
using the trained feature selection model to obtain an interaction score matrix of drugs
and targets; the feature selection model regards the targets of a drug as the feature of the drug in
protein space and the indications of a drug as the feature of the drug in disease space; during
training, the feature selection model considers the similarity among target
sequences and among diseases, aiming at minimizing the similarity difference of a drug in
different spaces, so as to keep the chemical properties and functions of drugs consistent;
And a judgment module, which is used for taking the corresponding target with the
highest score as the candidate target of the drug based on the interaction score matrix of
the drug and the target.
9. A non-transitory computer-readable storage medium is characterized in that it
comprises instructions for executing the drug target prediction method for keeping the
consistency of drug chemical properties and functions according to any one of claims 1-7.
10. An electronic device is characterized by including the non-transitory computer
readable storage medium according to claim 9 and one or more processors capable of
executing the instructions of the non-transitory computer-readable storage medium.
Figure 1
Figure 2
Figure 3
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021104604A AU2021104604A4 (en) | 2021-07-27 | 2021-07-27 | Drug target prediction method for keeping consistency of chemical properties and functions of drugs |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021104604A AU2021104604A4 (en) | 2021-07-27 | 2021-07-27 | Drug target prediction method for keeping consistency of chemical properties and functions of drugs |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2021104604A4 true AU2021104604A4 (en) | 2021-09-23 |
Family
ID=77746024
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2021104604A Ceased AU2021104604A4 (en) | 2021-07-27 | 2021-07-27 | Drug target prediction method for keeping consistency of chemical properties and functions of drugs |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2021104604A4 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115458061A (en) * | 2022-10-13 | 2022-12-09 | 南开大学 | Drug-protein interaction prediction method and system |
CN115458061B (en) * | 2022-10-13 | 2024-01-23 | 南开大学 | Medicine-protein interaction prediction method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | DeepMGT-DTI: Transformer network incorporating multilayer graph information for Drug–Target interaction prediction | |
Tao et al. | A method for identifying vesicle transport proteins based on LibSVM and MRMD | |
CN113470741B (en) | Drug target relation prediction method, device, computer equipment and storage medium | |
Ullah et al. | PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection | |
Li et al. | Protein interaction network reconstruction through ensemble deep learning with attention mechanism | |
CN113012770A (en) | Medicine-medicine interaction event prediction method, system, terminal and readable storage medium based on multi-modal deep neural network | |
CN116206775A (en) | Multi-dimensional characteristic fusion medicine-target interaction prediction method | |
Tanoori et al. | Drug-target continuous binding affinity prediction using multiple sources of information | |
CN113764034A (en) | Method, device, equipment and medium for predicting potential BGC in genome sequence | |
AU2021104604A4 (en) | Drug target prediction method for keeping consistency of chemical properties and functions of drugs | |
CN112652355A (en) | Medicine-target relation prediction method based on deep forest and PU learning | |
CN114822716A (en) | Target drug screening method, device, electronic equipment and storage medium | |
CN118038995B (en) | Method and system for predicting small open reading window coding polypeptide capacity in non-coding RNA | |
Zhou et al. | Knowledge-aware attention network for protein-protein interaction extraction | |
Chen et al. | MultiscaleDTA: A multiscale-based method with a self-attention mechanism for drug-target binding affinity prediction | |
Li et al. | GA-ENs: A novel drug–target interactions prediction method by incorporating prior Knowledge Graph into dual Wasserstein Generative Adversarial Network with gradient penalty | |
CN113129999B (en) | New drug candidate substance output method and device, model construction method and recording medium | |
Hu et al. | Improving Protein-Protein Interaction Prediction Using Protein Language Model and Protein Network Features | |
Yousef et al. | SFM: a novel sequence-based fusion method for disease genes identification and prioritization | |
Pang et al. | DisoFLAG: accurate prediction of protein intrinsic disorder and its functions using graph-based interaction protein language model | |
CN113345535A (en) | Drug target prediction method and system for keeping chemical property and function consistency of drug | |
CN115828152A (en) | Anti-cancer peptide classification method, system and storage medium based on graph convolution network | |
Halsana et al. | DensePPI: A Novel Image-Based Deep Learning Method for Prediction of Protein–Protein Interactions | |
Min et al. | Sequence-based deep learning frameworks on enhancer-promoter interactions prediction | |
CN114300036A (en) | Genetic variation pathogenicity prediction method and device, storage medium and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |