AU2021104604A4

AU2021104604A4 - Drug target prediction method for keeping consistency of chemical properties and functions of drugs

Info

Publication number: AU2021104604A4
Application number: AU2021104604A
Authority: AU
Inventors: Jian Liu; Chang SUN; Jinmao Wei
Original assignee: Nankai University
Current assignee: Nankai University
Priority date: 2021-07-27
Filing date: 2021-07-27
Publication date: 2021-09-23
Anticipated expiration: 2029-07-27

Abstract

The present invention belongs to the technical field of computational biology and proposes a drug-target interaction (DTI) prediction method based on maintaining the consistency of the chemical properties and functions of drugs. Finding unknown DTIs is a key step in drug discovery and repositioning. Because DTI identification based on biological experiments is time-consuming, laborious, costly and prone to failure, predicting the possible targets of drugs with computational methods has become a hotspot in the field of drug discovery in recent years. However, most previous DTI prediction methods did not consider the consistency of the chemical properties and functions of drugs in the prediction process, and a loss of this consistency may seriously reduce the accuracy of the prediction results. The advantages of the present invention are: (1) the possible targets of drugs are predicted from two perspectives by taking chemical properties and clinical functions into consideration at the same time; (2) the feature vector of a drug is first projected to the protein space and then to the disease space through an autoencoder model, which changes the drug-target interaction prediction task from the traditional single-label classification to multi-label classification and takes into account the complex compatibility-exclusion relationship between drugs and proteins; (3) the consistency of chemical properties, molecular mechanisms and clinical functions of drugs is maintained by keeping the chemical similarity and functional similarity of drugs consistent.

Description

[Figure 1 (drawing sheet 1/2): data set construction flow. Recoverable labels: sequence information of proteins, Huffman coding algorithm, BLOSUM62, similarity calculation; SMILES information of drugs, ECFPs, cosine similarity; disease-drug correlation; drug, protein and disease nodes.]

Drug target prediction method for keeping consistency of chemical properties and functions of drugs

TECHNICAL FIELD

The invention relates to the technical field of computer-aided drug discovery, in

particular to a drug target prediction method for keeping the consistency of chemical

properties and functions of drugs.

BACKGROUND

Because the identification of drug-target interactions (DTIs) by biological wet-lab experiments is time-consuming, laborious and costly and has a high failure rate, the prediction of possible drug targets with computational methods has become a research hotspot in the field of drug discovery. However, most previous DTI prediction methods did not consider the consistency of the chemical properties and functions of drugs in the prediction process. A loss of this consistency may have a serious negative impact on the accuracy of the prediction results.

Traditional drug-target interaction prediction methods can be mainly divided into docking simulations and ligand-based methods. Docking simulation methods need to simulate the 3D structure of the target, which is very time-consuming, and the structural information is not known for every target protein. Ligand-based methods compare the ligand to be queried with a group of known ligands of the target protein. However, when the number of known ligands is small, ligand-based methods do not perform well.

To solve the problems above, prediction methods for drug-target interaction based on deep neural networks have been proposed. Based on the assumption that similar drugs are more likely to interact with similar targets, the possibility of interaction between each drug-target pair is analysed by integrating various information in drug-target heterogeneous networks.

However, the current methods based on deep neural networks all regard the prediction of drug-target interaction as a single-label binary classification task, which makes the prediction for each drug-target pair independent of the others. Because the chemical properties and functions of drugs should be consistent, drugs with similar chemical structures should also have similar target proteins or similar indications. Ignoring the complex compatibility-exclusion relationship between drugs and proteins and making independent predictions for each drug-target pair may treat mutually exclusive drugs as compatible, which may lead to serious drug misuse in clinical treatment. The same hidden danger exists in predicting the relationship between drugs and diseases. Therefore, in addition to considering the similarities among drugs and among target proteins, it is particularly important to keep the chemical similarity and the functional similarity among drugs consistent.

SUMMARY

The purpose of the present invention is to provide a drug-target prediction method that

keeps the consistency of chemical properties and functions of drugs, so as to solve at least

one of technical problems existing in the background technology above.

In order to achieve the purpose above, the invention adopts the following technical

scheme:

In a first aspect, the invention provides a drug target prediction method for keeping the consistency of chemical properties and functions of drugs, which comprises the following steps:

Acquiring chemical fingerprints of drugs to be predicted;

Using the trained feature selection model to process the chemical fingerprints of the drugs to obtain the interaction score matrix between the drugs and the targets. The feature selection model regards the targets of a drug as the features of the drug in protein space and the indications of a drug as the features of the drug in disease space. When the feature selection model is trained, the similarities among target sequences and among diseases are considered, with the aim of minimizing the difference between the similarities of drugs in different spaces so as to keep the chemical properties and functions of drugs consistent;

Based on the score matrix of drug-target interaction, the target with the highest score is taken as the candidate target of the drug.
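The last step above can be illustrated with a minimal sketch. The function name `candidate_targets`, the `top_k` parameter and the toy scores are illustrative assumptions; the specification only states that the highest-scoring target is taken as the candidate.

```python
def candidate_targets(score_matrix, target_names, top_k=1):
    """For each drug (row), return the top_k targets ranked by interaction score."""
    result = []
    for row in score_matrix:
        # Indices of targets sorted by descending score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        result.append([target_names[j] for j in ranked[:top_k]])
    return result


# Toy score matrix: 2 drugs (rows) x 3 targets (columns), values are made up.
scores = [
    [0.10, 0.85, 0.30],
    [0.70, 0.20, 0.65],
]
targets = ["P1", "P2", "P3"]

best = candidate_targets(scores, targets)            # highest-scoring target per drug
top2 = candidate_targets(scores, targets, top_k=2)   # optional short list per drug
```

With `top_k=1` this reproduces the rule stated in the text; a larger `top_k` is only a convenience for inspecting runner-up targets.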

Preferably, training the feature selection model comprises:

Extracting the chemical fingerprints of drugs, the sequence information of target proteins, the target information and the indication information of drugs to generate the chemical fingerprint characteristic matrix, the drug-target interaction matrix and the drug-disease association matrix.

Based on the chemical fingerprint information, the sequences of target proteins and the drug information related to diseases, the chemical fingerprint similarity matrix, the sequence similarity matrix among target proteins and the similarity matrix among diseases are calculated respectively.

Based on the chemical fingerprint similarity matrix of drugs, the sequence similarity matrix among targets and the similarity matrix among diseases, the similarities of drugs in drug, protein and disease space are calculated on the basis of the chemical fingerprints of drugs, their targets and their related indications. The feature selection model, which regards the targets of a drug as the features of the drug in protein space and the indications of a drug as the features of the drug in disease space, is trained with the goal of minimizing the error among the three similarities.

Preferably, the feature vector of the drug in the target protein space is calculated based on

the chemical fingerprint similarity matrix of the drug, the sequence similarity matrix

between targets and the similarity matrix between diseases.

Based on the feature vector of the drug in the target protein space, the association score

between the drug and each disease in the disease space is obtained.

Based on the feature vector of drugs in target protein space and the association score

between drugs and diseases in disease space, the similarity of each pair of drugs in target

protein space and disease space is calculated.

Based on the similarity of each pair of drugs in target protein space and disease space, the

score matrix of drug-target interaction is calculated.

Preferably, the chemical fingerprints of the drug are projected into the target protein

space through an encoder composed of two layers of fully connected neural networks,

and the interaction score between the drug and each target is obtained.

The interaction score between the drug and each target is projected into the disease space

through the decoder to get the association score between the drug and each disease.

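Preferably, the chemical fingerprint $f_i^r$ of the drug $r_i$ is input into an encoder composed of two layers of fully connected neural networks and projected into the target protein space to obtain the feature vector $h_i^3$ in the target protein space:

$$h_i^t = \sigma_t\left(W^t h_i^{t-1} + b^t\right), \quad t = 1, 2, 3$$

$$h_i^0 = f_i^r$$

Where $\sigma_t$, $h_i^t$, $W^t$, $b^t$ are the activation function, output, weight matrix and bias vector respectively of the $t$-th fully connected layer.

Preferably, with the help of a decoder, the predicted drug-disease association score $h_i^6$ is calculated according to the result $h_i^3$ of the encoder:

$$h_i^t = \sigma_t\left(W^t h_i^{t-1} + b^t\right), \quad t = 4, 5, 6$$

Preferably, the loss function of the feature selection model is (the norms, exponents and normalizing factors are partly illegible in the source and are reconstructed here as squared errors):

$$\text{loss} = \frac{1}{m}\sum_{i=1}^{m}\left\lVert h_i - y_i\right\rVert^2 + \frac{\lambda_*}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left(S_r^*(i,j) - S_r(i,j)\right)^2 + \frac{\lambda_\#}{k^2}\sum_{i=1}^{k}\sum_{j=1}^{k}\left(S^\#(i,j) - S(i,j)\right)^2$$

Wherein, for the encoder, $\lambda_\# = \lambda_3$, $k = q$, $S = S_p$, $S^\# = S^P$; and for the decoder, $\lambda_\# = \lambda_4$, $k = n$, $S = S_d$, $S^\# = S^D$. $S^P$ indicates the similarity among proteins calculated based on the encoder prediction results; $S^D$ indicates the similarity among diseases calculated based on the decoder prediction results.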

In a second aspect, the present invention provides a drug target prediction system for

maintaining the consistency of chemical properties and functions of drugs, which

comprises:

The acquisition module is used for acquiring chemical fingerprints of drugs to be

predicted.

The calculation module is used for processing chemical fingerprints of drugs by using the

trained feature selection model to obtain an interaction score matrix of drugs and targets;

The feature selection model regards the targets of drug as the feature of drug in protein

space and the indications of drug as the feature of drug in disease space. Wherein when the feature selection model is trained, the similarity among target sequences and diseases are considered, with an aim at minimizing the similarity difference of drugs in different spaces, so as to keep the chemical properties and functions of drugs consistent.

And the judgement module is used for taking the corresponding target with the highest

score as the candidate target of the drug based on the interaction score matrix of the drug

and the target.

In a third aspect, the present invention provides a non-transitory computer-readable storage medium, which includes instructions for executing the drug target prediction method for keeping the consistency of chemical properties and functions of drugs as described above.

In a fourth aspect, the present invention provides an electronic device comprising a non-transitory computer-readable storage medium as described above, and one or more processors capable of executing the instructions of the non-transitory computer-readable storage medium.

The method has the following beneficial effects. The possible targets of a drug are jointly predicted from two views by simultaneously considering the chemical properties and the clinical functions of the drug. Through the Auto-encoder model, the feature vectors of drugs are projected to the protein space and then to the disease space, so that the task of drug-target interaction prediction changes from the traditional single-label classification to multi-label classification, taking into account the complex compatibility-exclusion relationship between drugs and proteins. By keeping the chemical similarity and the functional similarity of drugs consistent, the consistency of the chemical properties, molecular mechanisms and clinical functions of drugs is maintained.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will become clear from the description or be learned through practice of the present invention.

BRIEF DESCRIPTION OF THE FIGURES

In order to explain the technical scheme of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained from these drawings without creative effort.

Fig. 1 is a schematic diagram of a data set construction flow according to an embodiment

of the present invention.

Fig. 2 is a schematic diagram of the working principle of the Auto-encoder according to

the embodiment of the present invention.

Fig. 3 is an example diagram of similarity distribution of 20 drugs in drug space (left),

protein space (middle) and disease space (right).

Fig. 4 is a schematic diagram comparing the performance of the prediction method described in the embodiments of the present invention with that of other DTI prediction methods.

DESCRIPTION OF THE INVENTION

Embodiments of the present invention are described in detail below, examples of which are shown in the figures, in which identical or similar reference numerals denote identical or similar elements or elements having identical or similar functions throughout. The embodiments described below with the help of the figures are exemplary; they are only used to explain the present invention and cannot be interpreted as limiting the present invention.

It can be understood by those skilled in the art that all terms (including technical terms

and scientific terms) used herein have the same meanings as those generally understood

by those skilled in the art to which the present invention belongs, unless otherwise

defined.

It should also be understood that terms such as those defined in a general dictionary

should be understood to have meanings consistent with those in the context of the prior

art and will not be interpreted in idealized or overly formal meanings unless defined as

here.

As will be understood by those skilled in the art, the singular forms "a", "an", "said" and "the" used herein may also include plural forms unless expressly stated otherwise. It should be further understood that the word "comprising" used in the specification of the present invention means the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements and/or groups thereof.

In the description of this specification, descriptions referring to the terms "one

embodiment", "some embodiments", "example", "specific example", or "some examples"

mean that specific features, structures, materials or characteristics described in

connection with this embodiment or example are included in at least one embodiment or

example of the present invention. Furthermore, the specific features, structures, materials or characteristics described may be combined in any one or more embodiments or examples in a suitable manner. In addition, those skilled in the art can combine different embodiments or examples, and features of different embodiments or examples, described in this specification, provided they do not contradict each other.

For the convenience of understanding the present invention, the following will further

explain the present invention with specific examples in conjunction with the drawings,

and the specific examples do not mean a limitation on the embodiments of the present

invention.

It should be understood by those skilled in the art that the figures are only schematic diagrams of embodiments, and the components in the figures are not necessarily required for implementing the present invention.

Embodiment 1

The present invention provides a drug target prediction system for maintaining the consistency of chemical properties and functions of drugs, which comprises:

An acquisition module, used for acquiring chemical fingerprints of drugs to be predicted.

A calculation module, used for processing the chemical fingerprints of drugs with the trained feature selection model to obtain an interaction score matrix of drugs and targets. The feature selection model regards the targets of a drug as the features of the drug in protein space and the indications of a drug as the features of the drug in disease space. When the feature selection model is trained, the similarities among target sequences and among diseases are considered, with the aim of minimizing the difference between the similarities of drugs in the chemical fingerprint space (dimension), the target protein space (dimension) and the disease (indication) space (dimension), so as to keep the chemical properties and functions of drugs consistent;

And a judgement module, used for taking the target with the highest score as the candidate target of the drug based on the interaction score matrix of the drugs and the targets.

In Embodiment 1, a drug target prediction method for keeping the consistency of chemical properties and functions of drugs is realized by using the system above, including:

Using the acquisition module to acquire chemical fingerprints of drugs to be predicted.

Using the calculation module to process the chemical fingerprints of drugs with the trained feature selection model to obtain an interaction score matrix of drugs and targets. The feature selection model regards the targets of a drug as the features of the drug in protein space and the indications of a drug as the features of the drug in disease space. When the feature selection model is trained, the similarities among target sequences and among diseases are considered, with the aim of minimizing the difference between the similarities of drugs in the chemical fingerprint space (dimension), the target protein space (dimension) and the disease (indication) space (dimension), so as to keep the chemical properties and functions of drugs consistent;

And using the judgement module to take the target with the highest score as the candidate target of the drug based on the interaction score matrix of the drugs and the targets.

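In this Embodiment 1, training the feature selection model includes:

Extracting the chemical fingerprints of drugs, the sequence information of target proteins, the target information and the indication information of drugs to generate the chemical fingerprint characteristic matrix, the drug-target interaction matrix and the drug-disease association matrix.

Based on the chemical fingerprint information, the sequences of target proteins and the drug information related to diseases, the chemical fingerprint similarity matrix, the sequence similarity matrix among target proteins and the similarity matrix among diseases are calculated respectively.

Based on the chemical fingerprint similarity matrix of drugs, the sequence similarity matrix among targets and the similarity matrix among diseases, the similarities of drugs in drug (chemical fingerprint dimension), protein and disease space are calculated on the basis of the chemical fingerprints of drugs, their interacting targets and their related indications, and the feature selection model is trained with the goal of minimizing the error among the three similarities.

Where the feature vector of the drug in the target protein space is calculated based on the chemical fingerprint matrix and chemical fingerprint similarity matrix of the drug, the sequence similarity matrix among targets and the similarity matrix among diseases;

Based on the feature vector of the drug in the target protein space, the association score between the drug and each disease in the disease space is obtained;

Based on the feature vector of drugs in target protein space and the association score between drugs and diseases in disease space, the similarity of each pair of drugs in target protein space and disease space is calculated.

Based on the similarity of each pair of drugs in target protein space and disease space, the score matrix of drug-target interaction is calculated.

Where the chemical fingerprint of the drug is projected into the target protein space through an encoder composed of two layers of fully connected neural networks, and the interaction score between the drug and each target is obtained;

The interaction score between the drug and each target is projected into the disease space through the decoder to obtain the association score between the drug and each disease.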

Embodiment 2

In Embodiment 2, a new method for predicting drug-target interaction is proposed, which

focuses on keeping the consistency of drug chemical properties, molecular mechanisms

and clinical manifestations.

Firstly, the data set of drug-target interaction prediction is extracted from several related

public databases. By constructing drug-protein-disease heterogeneous network, the

chemical fingerprints of drugs, amino acid sequence of protein, drug-target interaction

data and indication data of drugs are integrated.

After that, in order to consider the association of drugs in different spaces, the drug-target

interaction prediction task is regarded as a multi-label classification task. Specifically, the

targets of drugs are regarded as the characteristics of drugs in protein space, and

indications are regarded as the characteristics of drugs in disease space. By constructing

an Auto-encoder model based on deep neural network, the feature vectors of drugs are

projected from the original feature space (drug space/dimension) to the embedded space

(protein space/dimension), and then from the embedded space to the label space (disease

space/dimension).

In this Embodiment 2, according to the chemical fingerprints, targets and indications of

drugs, three similarities of drugs are calculated respectively. By minimizing the error among these three similarities in the prediction process, the consistency of the chemical properties, molecular mechanism and clinical function of the drug itself is maintained.

Firstly, the data set required for drug-target interaction prediction is extracted from the

network database, and the similarity of drug, target and disease is calculated (as shown in

Figure 1), which is used as the association measurement of various nodes in their original

feature space. The specific steps are as follows:

Step 1: Extracting the chemical fingerprints, target proteins and indication information of each drug in the data set from the public databases, and generating the chemical fingerprint characteristic matrix $F^r \in \mathbb{R}^{m \times k}$, the drug-target interaction matrix $Y^{RP} \in \mathbb{R}^{m \times q}$ and the drug-disease association matrix $Y^{RD} \in \mathbb{R}^{m \times n}$ of the drugs. The set $R = \{r_1, r_2, \ldots, r_m\}$ represents the $m$ drugs in the data set, the set $D = \{d_1, d_2, \ldots, d_n\}$ represents the $n$ diseases in the data set, and the set $P = \{p_1, p_2, \ldots, p_q\}$ represents the $q$ proteins in the data set. If the drug $r_i$ has the $j$-th characteristic, $F^r(i,j) = 1$; otherwise $F^r(i,j) = 0$. Similarly, if there is a known association (or interaction) between $r_i$ and disease $d_j$ (or protein $p_j$), $Y^{RD}(i,j) = 1$ (or $Y^{RP}(i,j) = 1$); otherwise $Y^{RD}(i,j) = 0$ (or $Y^{RP}(i,j) = 0$).
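As an illustration of Step 1, the following sketch builds the binary drug-target and drug-disease matrices from lists of known pairs. The helper name `binary_matrix` and the toy identifiers are hypothetical; the specification does not name the databases or the record formats.

```python
def binary_matrix(pairs, row_ids, col_ids):
    """Return a 0-1 matrix with entry 1 wherever (row_id, col_id) is a known pair."""
    row_index = {r: i for i, r in enumerate(row_ids)}
    col_index = {c: j for j, c in enumerate(col_ids)}
    matrix = [[0] * len(col_ids) for _ in row_ids]
    for r, c in pairs:
        matrix[row_index[r]][col_index[c]] = 1
    return matrix


# Toy data (made up): m = 3 drugs, q = 2 proteins, n = 3 diseases.
drugs = ["r1", "r2", "r3"]
proteins = ["p1", "p2"]
diseases = ["d1", "d2", "d3"]
known_dti = [("r1", "p1"), ("r2", "p1"), ("r3", "p2")]
known_indications = [("r1", "d1"), ("r2", "d1"), ("r2", "d3")]

Y_RP = binary_matrix(known_dti, drugs, proteins)          # drug-target interaction matrix
Y_RD = binary_matrix(known_indications, drugs, diseases)  # drug-disease association matrix
```

A zero entry here only means "no known association", which is the unobserved-feature issue the description raises later.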

Step 2: Based on the chemical fingerprint information of drugs and the related drug information of diseases, the similarity matrix of the chemical fingerprints of drugs, $S_r \in \mathbb{R}^{m \times m}$, and the similarity matrix of diseases, $S_d \in \mathbb{R}^{n \times n}$, are calculated respectively. Based on the sequence information of proteins, the sequence similarity matrix $S_p \in \mathbb{R}^{q \times q}$ among proteins is calculated. Wherein $S(i,j) \in [0,1]$, and the closer $S(i,j)$ is to 1, the more similar the nodes $i$ and $j$ are.
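A possible way to compute the drug and disease similarity matrices of Step 2 is sketched below. It assumes cosine similarity, which appears among the labels of Figure 1 but is not specified in the text, and it represents each disease by its column of associated drugs. The protein sequence similarity is not covered here. All names and data are illustrative.

```python
import math


def cosine_similarity_matrix(rows):
    """Pairwise cosine similarity between row vectors; zero vectors get similarity 0."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(rows[i], rows[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


# Toy binary fingerprints: 3 drugs x 4 fingerprint bits.
F_r = [[1, 1, 0, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
# Toy drug-disease associations: 3 drugs x 2 diseases.
Y_RD = [[1, 0],
        [1, 1],
        [0, 1]]

S_r = cosine_similarity_matrix(F_r)              # drug-drug similarity, m x m
S_d = cosine_similarity_matrix(transpose(Y_RD))  # disease-disease similarity, n x n
```

For non-negative vectors such as binary fingerprints, cosine similarity stays within [0, 1], which matches the range stated for $S(i,j)$.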

Step 3: The potential targets of drugs are predicted with the Auto-encoder model on the premise of keeping the chemical properties and functions of drugs consistent. As shown in Fig. 2, the chemical fingerprint $f_i^r$ of $r_i$ is used as the input of the model, and $r_i$ is projected into the protein space through an encoder composed of a two-layer fully connected neural network. The interaction score between $r_i$ and each protein is obtained and expressed by the vector $h_i^3$, where $h_i^3(j) \in [0,1]$; the closer $h_i^3(j)$ is to 1, the greater the possibility of interaction between $r_i$ and $p_j$. In order to introduce the indication information of drugs to facilitate DTI prediction, the feature vector $h_i^3$ of $r_i$ in protein space is projected into disease space by a decoder, and the association score with each disease is obtained, which is represented by the vector $h_i^6$.

Because some target proteins and indications of drugs are unobserved, features are unavailable in the drug-target interaction matrix $Y^{RP}$ and the drug-disease association matrix $Y^{RD}$. If the encoder and decoder are optimized according to these incomplete feature matrices, the association between drugs may be changed in protein space and disease space.

As shown in Figure 3, according to the chemical fingerprints, target proteins and

indications of drugs, 20 drugs are randomly selected from the data set, and the association

distribution of these drugs in drug space, protein space and disease space is simulated.

Obviously, the distribution of association in the three spaces changed to some extent.

Based on the assumption that the chemical properties and functions of drugs should be

consistent, drugs with similar chemical properties should have similar targets and

indications, so the association of drugs in the three spaces should be consistent. As far as drugs are concerned, their chemical structures are known and complete, while their target information and indication information are unavailable to some extent. The uncertainty of association caused by the unavailability of features will have a negative impact on the prediction results of the model.

Therefore, in this Embodiment 2, the consistency of drug association in the drug-protein-disease space is maintained, that is, the consistency of the chemical properties, molecular mechanisms and clinical functions of drugs.

The specific steps are as follows:

The outputs $h_i^3$ and $h_j^3$ of the encoder for the drugs $r_i$ and $r_j$ are regarded as the feature vectors of these two drugs in protein space, and the similarity of $r_i$ and $r_j$ in protein space is calculated and recorded as $S_r^P(i,j)$. Similarly, the similarity $S_r^D(i,j)$ of $r_i$ and $r_j$ in disease space is calculated. By minimizing the error among the similarities $S_r(i,j)$, $S_r^P(i,j)$ and $S_r^D(i,j)$ of $r_i$ and $r_j$ in the three spaces, the aim of keeping the chemical properties, molecular mechanisms and clinical functions of drugs consistent is achieved.
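The comparison of drug similarities across spaces can be sketched as follows. The sketch assumes cosine similarity between the predicted score vectors and a mean squared difference as the error; the text names neither choice, and the function names and numbers are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def pairwise_similarity(vectors):
    """Drug-drug similarity matrix computed from per-drug score vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def similarity_gap(S_a, S_b):
    """Mean squared difference between two m x m similarity matrices."""
    m = len(S_a)
    return sum((S_a[i][j] - S_b[i][j]) ** 2
               for i in range(m) for j in range(m)) / (m * m)


# Toy encoder outputs: predicted interaction scores of 3 drugs with 2 proteins.
H_protein = [[0.9, 0.1],
             [0.8, 0.2],
             [0.1, 0.9]]
# Toy chemical similarity of the same 3 drugs (made-up values).
S_r = [[1.0, 0.9, 0.2],
       [0.9, 1.0, 0.3],
       [0.2, 0.3, 1.0]]

S_r_P = pairwise_similarity(H_protein)   # drug similarity in protein space
gap = similarity_gap(S_r_P, S_r)         # quantity a consistency term would penalize
```

The same two functions applied to the decoder outputs would give the disease-space similarity and its gap to the chemical similarity.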

In this Embodiment 2, the chemical fingerprints of the drug refer to:

A 0-1 code constructed according to the molecular structure and chemical properties of the drug. For the chemical fingerprint $f_i^r$ of a drug $r_i$, $f_i^r(j) = 1$ means that the drug has the $j$-th type of molecular structure or chemical property.

Targets and target proteins are:

Substances that a drug needs to bind to in order to take effect are called drug targets, such as proteins, genes and so on. A protein that can be used as a target is called a target protein.

The functions of drugs are:

The molecular mechanism and the clinical function of drugs. The molecular mechanism of a drug refers to the targets that the drug can bind to. The clinical function of a drug refers to the diseases that the drug can treat (i.e., its indications).

In this Embodiment 2, the similarity calculation includes:

Based on the chemical fingerprints of drugs and the related drugs of the diseases, the similarities among drugs and among diseases are calculated respectively. Based on the sequence information of proteins, the sequence similarity among proteins is calculated.

Based on the SMILES information of drugs, the chemical fingerprints of all drugs in the data set are constructed, and the chemical fingerprint matrix $F^r \in \mathbb{R}^{m \times k}$ of drugs is obtained. Based on this, the chemical similarity $S_r(i,j)$ between drugs $r_i$ and $r_j$ can be calculated. Similarly, according to the related drugs of the diseases, the similarity between each pair of diseases can be calculated, and the similarity matrix $S_d \in \mathbb{R}^{n \times n}$ of diseases can be obtained.

Based on the sequence information of proteins, the sequence similarity score of each pair of proteins is calculated, and the similarity matrix $S_p \in \mathbb{R}^{q \times q}$ of proteins is obtained.

In this Embodiment 2, a feature selection model based on the Auto-encoder is designed, as shown in Fig. 2. Specifically, the chemical fingerprint $f_i^r$ of the drug $r_i$ is the input of the model, and $r_i$ is projected into an embedded space through an encoder composed of two layers of fully connected neural networks. The feature vector $h_i^3$ in the embedded space is obtained according to the following formula:

$$h_i^t = \sigma_t\left(W^t h_i^{t-1} + b^t\right), \quad t = 1, 2, 3$$

$$h_i^0 = f_i^r$$

Where $\sigma_t$, $h_i^t$, $W^t$, $b^t$ are the activation function, output, weight matrix and bias vector respectively of the $t$-th fully connected layer.

In order to introduce the indication information of drugs to assist DTI prediction, a decoder is used to calculate the association score between the drug and each disease according to the result of the encoder. The association score $h_i^6$ with each disease is calculated according to the following formula:

$$h_i^t = \sigma_t\left(W^t h_i^{t-1} + b^t\right), \quad t = 4, 5, 6$$
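The layer recursion above can be traced with a small forward pass. This is only a sketch under stated assumptions: sigmoid activations for every layer, arbitrary hidden widths, random untrained weights, and no batch normalization. It follows the index range t = 1, ..., 6 of the formulas, with layers 1 to 3 acting as the encoder and layers 4 to 6 as the decoder. All names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, W, b):
    """One fully connected layer: sigma(W x + b), with a sigmoid activation."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def init_layer(n_in, n_out, rng):
    """Random small weights and zero biases (untrained, for illustration only)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


def forward(fingerprint, layers):
    """Apply h^t = sigma(W^t h^(t-1) + b^t) for each layer and keep every output."""
    h = fingerprint                      # h^0 is the chemical fingerprint
    outputs = []
    for W, b in layers:
        h = dense(h, W, b)
        outputs.append(h)
    return outputs


rng = random.Random(0)
k, q, n = 8, 4, 3                        # fingerprint bits, proteins, diseases (toy sizes)
sizes = [k, 5, 5, q, 5, 5, n]            # hidden width 5 is an arbitrary choice
layers = [init_layer(sizes[t], sizes[t + 1], rng) for t in range(6)]

f_i = [1, 0, 1, 1, 0, 0, 1, 0]           # toy binary fingerprint of one drug
outputs = forward(f_i, layers)
h3 = outputs[2]                          # encoder output: scores over q proteins
h6 = outputs[5]                          # decoder output: scores over n diseases
```

A sigmoid on the output layers keeps every score in the open interval (0, 1), in line with the range given for the interaction scores.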

In addition, in order to prevent the model from over-fitting, a batch normalization layer is added after each fully connected layer, so that the output of the fully connected layer is normalized toward a standard Gaussian distribution.

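The loss of the encoder and the loss of the decoder can be calculated respectively according to the following formula (in the formulas of this passage, the norms, exponents and normalizing factors are partly illegible in the source and are reconstructed here as squared errors):

$$\text{loss} = \frac{1}{m}\sum_{i=1}^{m}\left\lVert h_i - y_i\right\rVert^2$$

Wherein $y_i$ is the $i$-th row of $Y$; for the encoder, $h_i = h_i^3$ and $Y = Y^{RP}$; for the decoder, $h_i = h_i^6$ and $Y = Y^{RD}$.

According to the prediction results of the encoder and the decoder, the similarity matrices $S_r^P$ and $S_r^D$ of drugs $r_i$ and $r_j$ in protein space and disease space can be calculated. Based on this, the loss functions of the encoder and the decoder are extended as follows:

$$\text{loss} = \frac{1}{m}\sum_{i=1}^{m}\left\lVert h_i - y_i\right\rVert^2 + \frac{\lambda_*}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left(S_r^*(i,j) - S_r(i,j)\right)^2$$

Wherein, for the encoder, $\lambda_* = \lambda_1$ and $S_r^* = S_r^P$; for the decoder, $\lambda_* = \lambda_2$ and $S_r^* = S_r^D$. The second term in the formula is the loss of drug similarity between the drug space and the protein (or disease) space. $\lambda_1$ and $\lambda_2$ are parameters for adjusting the weight of the loss terms.

Based on the assumption that similar drugs can interact (associate) with similar proteins (diseases), the similarity among proteins (diseases) should be considered in the prediction results of the encoder (decoder). Therefore, the loss function of the Auto-encoder model is finally defined as:

$$\text{loss} = \frac{1}{m}\sum_{i=1}^{m}\left\lVert h_i - y_i\right\rVert^2 + \frac{\lambda_*}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left(S_r^*(i,j) - S_r(i,j)\right)^2 + \frac{\lambda_\#}{k^2}\sum_{i=1}^{k}\sum_{j=1}^{k}\left(S^\#(i,j) - S(i,j)\right)^2$$

Wherein, for the encoder, $\lambda_\# = \lambda_3$, $k = q$, $S = S_p$, $S^\# = S^P$; and for the decoder, $\lambda_\# = \lambda_4$, $k = n$, $S = S_d$, $S^\# = S^D$. $S^P$ indicates the similarity among proteins calculated based on the encoder prediction results; $S^D$ indicates the similarity among diseases calculated based on the decoder prediction results.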
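The structure of this three-term loss can be made concrete with a small numeric sketch. It assumes squared-error forms and mean normalization for all three terms, which the text does not spell out; the function names, weights and matrices are illustrative and are not taken from the specification.

```python
def prediction_error(H, Y):
    """Squared error between predicted scores H and known labels Y, averaged over drugs."""
    m = len(H)
    return sum((h - y) ** 2
               for h_row, y_row in zip(H, Y)
               for h, y in zip(h_row, y_row)) / m


def matrix_gap(A, B):
    """Mean squared difference between two square similarity matrices."""
    n = len(A)
    return sum((A[i][j] - B[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


def total_loss(H, Y, S_r_pred, S_r, S_node_pred, S_node, lam_star, lam_hash):
    """Prediction error + drug-similarity consistency + protein/disease similarity term."""
    return (prediction_error(H, Y)
            + lam_star * matrix_gap(S_r_pred, S_r)
            + lam_hash * matrix_gap(S_node_pred, S_node))


# Toy encoder case: 2 drugs, 2 proteins (all numbers are made up).
H = [[0.9, 0.1], [0.2, 0.8]]             # predicted drug-protein scores
Y_RP = [[1, 0], [0, 1]]                  # known drug-protein interactions
S_r_P = [[1.0, 0.5], [0.5, 1.0]]         # drug similarity computed in protein space
S_r = [[1.0, 0.3], [0.3, 1.0]]           # chemical similarity of drugs
S_P = [[1.0, 0.2], [0.2, 1.0]]           # protein similarity from predictions
S_p = [[1.0, 0.4], [0.4, 1.0]]           # protein sequence similarity

loss = total_loss(H, Y_RP, S_r_P, S_r, S_P, S_p, lam_star=0.5, lam_hash=0.5)
```

For the decoder, the same function would be called with the disease-space quantities and the corresponding weights.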

In Embodiment 2, the final score matrix of drug-protein interaction is obtained by

minimizing the loss of encoder and decoder.

In this Embodiment 2, the performance of the prediction model is evaluated by 5-fold cross-validation on two published drug-target interaction data sets. In terms of both AUC and AUPR, its prediction accuracy is superior to that of several state-of-the-art DTI prediction methods, including DTINet, GRMF, MolTrans, NGDTP and DeepDTNet.
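For readers unfamiliar with the two metrics, the following sketch shows one common way to compute them for a list of scored drug-target pairs. AUC is computed from pairwise comparisons of positive and negative scores, and AUPR is approximated by average precision. The labels and scores are made up and do not reproduce any result from the specification.

```python
def auc(labels, scores):
    """Area under the ROC curve: fraction of positive-negative pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, a common estimate of the area under the precision-recall curve."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank         # precision at this recall point
    return total / hits if hits else 0.0


# Toy held-out fold: 1 = known interaction, 0 = no known interaction.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]

fold_auc = auc(labels, scores)
fold_aupr = average_precision(labels, scores)
```

In a 5-fold cross-validation, these values would be computed on each held-out fold and then averaged.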

To sum up, the drug-target interaction prediction method provided in Embodiment 2 includes four parts: extracting drug big data, calculating the similarity among the various nodes, predicting drug-target interactions based on deep learning, and keeping the similarity of drug chemical properties and functions consistent. The prediction of drug-target interactions includes two parts: drug-target interaction prediction based on a deep neural network, and auxiliary prediction based on drug-disease association information. Maintaining the similarity of drug chemistry and function includes keeping drug chemistry consistent with molecular mechanism and keeping drug chemistry consistent with clinical function.

In Embodiment 2, by considering the chemical properties and clinical functions of drugs at the same time, the possible targets of drugs are predicted from two views. Through the Auto-encoder model, the feature vectors of drugs are projected to the protein space and then to the disease space, so that the drug-target interaction prediction task changes from the traditional single-label classification task to a multi-label task that takes into account the complex compatibility-exclusion relationship between drugs and proteins. By keeping the chemical similarity and functional similarity of drugs consistent, the consistency of drug chemical properties, molecular mechanism and clinical function is maintained.

Embodiment 3

The present invention provides a non-transitory computer-readable storage medium, which includes instructions for executing the drug target prediction method for keeping the consistency of drug chemical properties and functions as described above. The method includes:

Acquiring chemical fingerprints of drugs to be predicted;

Processing the chemical fingerprints of the drugs by using the trained feature selection model to obtain an interaction score matrix of drugs and targets, wherein the feature selection model regards the targets of a drug as the features of the drug in protein space and the indications of the drug as the features of the drug in disease space, and wherein, during training, the feature selection model considers the similarity among target sequences and among diseases, aiming at minimizing the difference between the similarities of a drug in the three different spaces, so as to keep the chemical properties and functions of drugs consistent;

Based on the interaction score matrix of the drugs and the targets, the target with the highest score is taken as the candidate target of the drug.
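The final selection step can be illustrated as follows. This is a minimal sketch, assuming the score matrix has one row per drug and one column per target; the function name `candidate_targets`, the target identifiers and the scores are hypothetical.

```python
def candidate_targets(score_matrix, target_ids):
    """For each drug (row), return the target with the highest interaction score."""
    picks = []
    for row in score_matrix:
        best = max(range(len(row)), key=row.__getitem__)  # column index of the top score
        picks.append((target_ids[best], row[best]))
    return picks


# Two drugs scored against three targets; identifiers and scores are made up.
scores = [[0.1, 0.7, 0.2],
          [0.9, 0.3, 0.4]]
targets = ["P1", "P2", "P3"]
picks = candidate_targets(scores, targets)
```

When several targets share the top score, this sketch returns the first of them.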

Embodiment 4

The present invention provides an electronic device comprising a non-transitory computer-readable storage medium and one or more processors capable of executing the instructions of the non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium comprises instructions for executing a drug target prediction method for keeping the consistency of chemical properties and functions of drugs, and the method comprises:

Acquiring chemical fingerprints of drugs to be predicted;

Processing the chemical fingerprints of the drugs by using the trained feature selection model to obtain an interaction score matrix of drugs and targets, wherein the feature selection model regards the targets of a drug as the features of the drug in protein space and the indications of the drug as the features of the drug in disease space, and wherein, during training, the feature selection model considers the similarity among target sequences and among diseases, aiming at minimizing the similarity difference of drugs in different spaces, so as to keep the chemical properties and functions of drugs consistent;

Based on the interaction score matrix of the drugs and the targets, the target with the highest score is taken as the candidate target of the drug.

To sum up, the drug target prediction method and system for keeping the consistency of

drug chemical properties and functions described in the embodiments of the present

invention focus on keeping the consistency of drug chemical properties, molecular

mechanisms and clinical manifestations.

First, the chemical fingerprint data, target data and indication data of drugs are integrated. After that, in order to consider the association of drugs in different spaces, the drug-target interaction prediction task is regarded as a multi-label classification task.

Specifically, the targets of drugs are regarded as the characteristics of drugs in protein

space, and indications are regarded as the characteristics of drugs in disease space. By

constructing an Auto-encoder model based on deep neural network, the feature vectors of

drugs are projected from the original feature space (drug space/dimension) to the

embedded space (protein space), and then from the embedded space to the label space

(disease space).
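The two-stage projection described above can be sketched as a plain forward pass. This is an illustration only: the sigmoid activation, the layer sizes, the weight values and the names `dense` and `forward` are assumptions made for the example, and training is not shown.

```python
import math


def dense(x, weights, bias):
    """One fully connected layer with a sigmoid activation: sigma(W x + b)."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
            for row, b in zip(weights, bias)]


def forward(fingerprint, encoder_layers, decoder_layers):
    """Project a drug fingerprint to protein space, then on to disease space."""
    h = fingerprint
    for W, b in encoder_layers:
        h = dense(h, W, b)
    protein_scores = h          # embedded space: one score per target
    for W, b in decoder_layers:
        h = dense(h, W, b)
    return protein_scores, h    # label space: one score per disease


# Toy sizes: 4 fingerprint bits -> 3 targets -> 2 diseases (weights are made up).
fingerprint = [1, 0, 1, 1]
encoder = [([[0.5, -0.2, 0.1, 0.0],
             [0.0, 0.3, -0.4, 0.2],
             [0.1, 0.1, 0.1, 0.1]], [0.0, 0.0, 0.0])]
decoder = [([[1.0, -1.0, 0.5],
             [0.2, 0.2, 0.2]], [0.0, 0.1])]
protein_scores, disease_scores = forward(fingerprint, encoder, decoder)
```

Each layer list may hold several `(weights, bias)` pairs, so deeper encoders or decoders only require longer lists.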

According to the chemical fingerprints, targets and indications of drugs, three similarities of drugs are calculated respectively. By minimizing the error among these three similarities in the prediction process, the consistency of the chemical properties, molecular mechanism and clinical function of the drug itself is maintained.
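A small sketch may help to make the three drug similarities concrete. It assumes cosine similarity over binary vectors in each space, which may differ from the measures actually used in the invention; the data and the names `cosine`, `similarity_matrix` and `mean_sq_gap` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(rows):
    """Pairwise cosine similarity between all rows (one row per drug)."""
    return [[cosine(u, v) for v in rows] for u in rows]


def mean_sq_gap(a, b):
    """Mean squared element-wise difference between two similarity matrices."""
    n = len(a)
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


# Three drugs described in three spaces (all values are made up).
fingerprints = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1]]   # chemical space
targets      = [[1, 0], [1, 0], [0, 1]]                     # protein space
indications  = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]            # disease space

s_chem = similarity_matrix(fingerprints)
s_prot = similarity_matrix(targets)
s_dis = similarity_matrix(indications)

# Discrepancies that a consistency-preserving model would try to keep small.
gap_chem_prot = mean_sq_gap(s_chem, s_prot)
gap_chem_dis = mean_sq_gap(s_chem, s_dis)
```

A gap of zero would mean that every pair of drugs is exactly as similar in one space as in the other.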

It should be understood by those skilled in the field that embodiments of the present

invention may be provided as methods, systems, or computer program products.

Therefore, the present invention may take the form of an entirely hardware embodiment,

an entirely software embodiment, or an embodiment combining software and hardware

aspects. Furthermore, the present invention may take the form of a computer program

product embodied on one or more computer usable storage media (including but not limited to disk memory, CD-ROM, optical memory, etc.) having computer usable program code embodied therein.

The present invention is described with reference to the methods, devices (systems), and

flowcharts and/or block diagrams of computer program products according to

embodiments of the present invention. It should be understood that each flow and/or

block in the flowchart and/or block diagram, and combinations of flows and/or blocks in

the flowchart and/or block diagram can be implemented by computer program

instructions. These computer program instructions may be provided to a processor of a

general purpose computer, a special purpose computer, an embedded processor or other

programmable data processing apparatus to produce a machine, such that the instructions

which are executed by the processor of the computer or other programmable data

processing apparatus produce means for implementing the functions specified in one or

more flow diagrams and/or one or more block diagrams.

These computer program instructions can also be loaded on a computer or other

programmable data processing device, and a series of operation steps are executed on the

computer or other programmable device to produce a computer-implemented process, so

that the instructions executed on the computer or other programmable device provide

steps for implementing the functions specified in one or more flow charts and/or one or

more block diagrams.

The above are only the preferred embodiments of the present invention and are not used to limit the present invention. For those skilled in the field, the present invention can be modified and varied. Any modification, equivalent substitution, improvement, etc. made within the spirit and principle of this invention shall fall within the protection scope of this invention.

Although the specific embodiments of this invention have been described with reference

to the attached figures, it is not a limitation on the protection scope of this invention. It

should be understood by those skilled in the field that various modifications or variations

that can be made by those skilled in the field on the basis of the technical scheme

disclosed in this invention without making creative efforts should be covered within the

protection scope of this invention.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:

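1. A drug target prediction method for keeping the consistency of chemical properties and functions of drugs, characterized in comprising:

Acquiring chemical fingerprints of drugs to be predicted;

Processing the chemical fingerprints of the drugs by using the trained feature selection model to obtain the interaction score matrix between the drug and the target; the feature selection model regards the targets of a drug as the features of the drug in protein space and the indications of the drug as the features of the drug in disease space; during training, the feature selection model considers the similarity among targets and among diseases, aiming at minimizing the similarity difference of drugs in different spaces, so as to keep the chemical properties and functions of drugs consistent;

Based on the interaction score matrix of the drug and the target, the target with the highest score is taken as the candidate target of the drug.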

2. The drug target prediction method for keeping the consistency of drug chemical

properties and functions according to claim 1, the training the feature selection model

comprises:

Based on chemical fingerprints information, sequence of targets protein, drugs

calculated respectively.

protein and disease space is calculated by combining the chemical fingerprints of drugs,

interacted targets and related indications, and the feature selection model is trained with

the goal of keeping the error of the three similarities minimum.

3. The drug target prediction method for keeping the consistency of drug chemical

properties and functions according to claim 2, is characterized in that: The feature vector

of the drug in the target protein space is calculated based on the chemical fingerprint

similarity matrix of the drug, the sequence similarity matrix among targets and the

similarity matrix among diseases;

between the drug and each disease in the disease space is obtained;

between drugs and diseases in disease space, the similarity of each pair of drugs in

protein space and disease space is calculated.

Based on the similarity of each pair of drugs in protein space and disease space, the score

matrix of drug-target interaction is calculated.

4. The drug target prediction method for keeping the consistency of drug chemical

properties and functions according to claim 3, is characterized in that: the chemical

fingerprints of the drug are projected into the target protein space through an encoder

composed of two layers of fully connected neural networks, and the interaction score

between the drug and each target is obtained;

5. The drug target prediction method for keeping the consistency of drug chemical properties and functions according to claim 4, is characterized in that: the chemical fingerprint $f_i$ of the drug $r_i$ is input into an encoder composed of two layers of fully connected neural networks, and projected into the target protein space to obtain the feature vector $h_i^3$ in the protein space:

$$h_i^t = \sigma_t\left(W^t h_i^{t-1} + b^t\right), \quad t = 1, 2, 3$$

$$h_i^0 = r_i^{T}$$

Where $\sigma_t$, $h_i^t$, $W^t$, $b^t$ are the activation function, output, weight matrix and bias vector, respectively, of the $t$-th fully connected layer.

6. The drug target prediction method for keeping the consistency of drug chemical properties and functions according to claim 5, is characterized in that: through a decoder, the predicted drug-disease association score $h_i^6$ is calculated according to the result $h_i^3$ of the encoder:

$$h_i^t = \sigma_t\left(W^t h_i^{t-1} + b^t\right), \quad t = 4, 5, 6.$$

7. The drug target prediction method for keeping the consistency of drug chemical properties and functions according to claim 6, is characterized in that the loss function of the feature selection model is:

$$\text{loss} = \frac{1}{m}\sum_{i=1}^{m}\lVert y_i - h_i\rVert^2 + \frac{\lambda^*}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\bigl(S_r^*(i,j) - S_r(i,j)\bigr)^2 + \frac{\lambda}{k^2}\sum_{i=1}^{k}\sum_{j=1}^{k}\bigl(S(i,j) - S^{\#}(i,j)\bigr)^2$$

Wherein, for the encoder, $\lambda = \lambda_3$, $k = q$, $S = S_p$, $S^{\#} = S^P$; and for the decoder, $\lambda = \lambda_4$, $k = n$, $S = S_d$, $S^{\#} = S^D$; $S^P$ indicates the similarity among proteins calculated based on the encoder prediction results; $S^D$ indicates the similarity among diseases calculated based on the decoder prediction results.

8. A drug target prediction system for keeping the consistency of chemical properties and

functions of drugs is characterized by comprising:

The acquisition module which is used for acquiring chemical fingerprints of drugs to be

predicted;

The calculation module which is used for processing chemical fingerprints of drugs by

using the trained feature selection model to obtain an interaction score matrix of drugs

and target; The feature selection model regards the target of drug as the feature of drug in

protein space and the indication of drug as the feature of drug in disease space. Among

them, when training, the feature selection model considers the similarity between target

sequences and diseases, aiming at minimizing the similarity difference between a drug in

different spaces, so as to keep the chemical properties and functions of drugs consistent.

And the judgment module which is used for taking the corresponding target with the

highest score as the candidate target of the drug based on the interaction score matrix of

the drug and the target.

9. A non-transitory computer-readable storage medium is characterized in that it

comprises instructions for executing the drug target prediction method for keeping the

consistency of drug chemical properties and functions according to any one of claims 1-7.

10. An electronic device is characterized by including the non-transitory computer

readable storage medium according to claim 9 and one or more processors capable of

executing the instructions of the non-transitory computer-readable storage medium.

Figure 1 (sheet 1/2)

Figure 2 and Figure 3 (sheet 2/2)