CN106529205B - Drug-target relationship prediction method based on drug substructure and molecular character description information
- Publication number
- CN106529205B (CN201610953873.3A)
- Authority
- CN
- China
- Prior art keywords
- drug
- relationship
- target
- description information
- targets
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
Abstract
The invention discloses a drug-target relationship prediction method based on drug substructure and molecular character description information. Drug substructure information, molecular character description information, and known drug-target relationships are first obtained from databases. A drug-drug similarity matrix is then constructed separately from each of these sources (drug substructures, drug molecular character description information, and known drug-target relationships), and the resulting similarity matrices are integrated into a final drug similarity matrix according to a set of weights. Finally, the target relationships of a drug are predicted on the principle that similar drugs tend to act on similar targets. The invention constructs similarity only from drug molecular character description information and substructure information, does not depend on information such as target sequences, and can predict target relationships for entirely new drug compounds, avoiding the large amount of manpower and material resources consumed by biochemical experiments. Experimental results show that the method can accurately predict drug-target relationships.
Description
Technical field
The invention belongs to the field of systems biology and relates to a drug-target relationship prediction method based on drug substructure and molecular character description information.
Background art
Drug targets are biological macromolecules in the body, such as certain proteins and nucleic acids, that have pharmacodynamic functions and can be acted on by drugs; the genes that encode target proteins are also called target genes. Determining the target molecules associated with a specific disease is the foundation of modern drug development, so identifying drug-target interactions has become an important basic step in drug development. Although drug-target interactions can be identified by bioassays, such experiments are expensive, time-consuming, and challenging for current drug development. With the development of computing technology, various computational models have therefore emerged for predicting potential drug-target associations on a large scale.
Current approaches to drug-target relationship prediction fall into three main categories:
(1) Biological experimental assays
This traditional drug development approach achieved some success early on. Since the turn of the century, however, the "one gene, one drug, one disease" paradigm has faced many intractable challenges, such as high clinical attrition rates, excessively long development cycles, and the large amounts of manpower, money, and materials required for experiments.
(2) Network-based prediction methods
These methods assume that similar drugs act on similar targets and integrate information such as drug similarity networks, target similarity networks, existing DTIs (drug-target interactions), and drug side-effect networks. They make it possible to predict potential drug-target relationships quickly and conveniently and provide important support for drug repositioning; network-based methods have become a powerful tool for predicting potential drug-target associations.
For example, the NRWRH method integrates the drug-drug similarity network, the protein-protein similarity network, and the known drug-target interaction network into a single heterogeneous network and predicts drug-target relationships by random walk with restart on that network. Unlike traditional random walk methods, it integrates three networks and can predict in both directions, from drug to target and from target to drug.
In addition, there are three typical inference models for comparison, DBSI, TBSI, and NBI, which infer drug-target associations from, respectively, drug structural similarity computed with SIMCOMP, target similarity based on Smith-Waterman scores, and the network topology similarity of the DTI network. The recently proposed SDTNBI method provides a prediction model for new drug compounds and achieves good results; however, the drug information it integrates leaves room for improvement.
(3) Machine-learning-based prediction methods
Machine-learning-based methods make predictions by integrating the chemical structures of drugs, the sequence information of target proteins, and known drug-target relationships; they are divided into supervised and semi-supervised learning methods. In supervised learning, positive and negative samples are currently determined by whether a known association exists. This creates a negative-sample selection problem, because experimentally verified data can only confirm that an association exists, not that it is absent. A typical example is the BLM method, which uses support vector machines to compute prediction values from the drug side and the target side separately and averages them to obtain the final prediction score; the inaccurate negative samples in such methods, however, substantially affect prediction accuracy. To address the negative-sample problem, semi-supervised learning methods that combine a small amount of labeled data with a large amount of unlabeled data have been proposed. A typical example is the NetLapRLS method, which integrates drug chemical information, target genomic information, and known associations to predict new relationships; by using both labeled and unlabeled relationships rather than labeled relationships alone, it improves prediction accuracy.
Although the above methods have been applied successfully to predicting drug-target associations and to drug repositioning, they have shortcomings: some cannot predict target associations for new chemical entities, while others can do so for new compounds but need to integrate more drug information to obtain better predictions. Such predictions are very important for drug development and further research.
Therefore, a new method for predicting target associations for new compounds is needed.
Summary of the invention
The technical problem to be solved by the invention is to provide, in view of the deficiencies of the prior art, a drug-target relationship prediction method based on drug substructure and molecular character description information that can accurately predict the target relationships of drugs and effectively avoid the large amount of manpower and material resources consumed by biochemical experiments.
The technical solution of the invention is as follows:
A drug-target relationship prediction method based on drug substructure and molecular character description information, comprising the following steps:
Step 1: construct the drug substructure similarity matrix $S^{SubSim}$ from each type of substructure information of all drugs;
Step 2: construct the drug molecular character description similarity matrix $S^{SmiSim}$ from the molecular character description information (SMILES) of all drugs;
Step 3: determine whether the drug to be predicted has known drug-target relationships. If it does, construct the drug-target relationship similarity matrix $S^{DTISim}$ from the known drug-target relationships (DTI) of the drugs; if it does not, do not construct $S^{DTISim}$;
Step 4: integrate the drug similarity matrices constructed above into the final drug similarity matrix $S^{Sim}$;
Step 5: calculate the relationship scores between the drug to be predicted and the targets from the known drug-target relationships of other drugs similar to it, then rank the scores; the higher a score ranks, the more likely the drug-target pair is to have a relationship.
The detailed process of Step 1 is as follows:
First, define $\{drug_i,\ i = 1, 2, \ldots, m\}$ as the set of all drugs, where $m$ is the number of drugs. Let $d_i$ be the substructure feature vector of the $i$-th drug $drug_i$; the number of feature values in the vector equals the substructure dimension $K$. If the drug contains a given substructure, the corresponding feature value is 1; otherwise it is 0.
Then, the structural similarity $S_{ij}^{SubSim}$ of $drug_i$ and $drug_j$ is calculated as the cosine correlation coefficient of their substructure feature vectors, with each substructure weighted by $W_k$. Here $d_{ik}$ and $d_{jk}$ denote the $k$-th feature values of the substructure feature vectors $d_i$ and $d_j$ of $drug_i$ and $drug_j$, and $W_k$ is the weight of the $k$-th substructure.
$W_k$ is calculated from $f_k$, the frequency with which the $k$-th substructure occurs across all drugs; $\delta$, the associated standard deviation; and $h$, a preset parameter (set to 0.1 in this study). The purpose of the weight is to give substructures that occur rarely a larger share than frequent substructures when drug substructure similarity is calculated.
Finally, all $S_{ij}^{SubSim}$ form the drug substructure similarity matrix $S^{SubSim}$; $S_{ij}^{SubSim}$ is the element in row $i$, column $j$ of $S^{SubSim}$.
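The weighted cosine coefficient of Step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the weight is assumed to have the Gaussian form $W_k = \exp(-f_k^2 / (h\delta)^2)$ with $\delta$ taken as the standard deviation of the frequencies $f_k$, and $W_k$ is assumed to multiply every term of the cosine coefficient.

```python
import math
from statistics import pstdev


def substructure_weights(fingerprints, h=0.1):
    """Assumed weight form: W_k = exp(-f_k^2 / (h * delta)^2)."""
    m = len(fingerprints)
    k_dim = len(fingerprints[0])
    # f_k: fraction of drugs that contain substructure k
    freqs = [sum(fp[k] for fp in fingerprints) / m for k in range(k_dim)]
    delta = pstdev(freqs)  # assumption: standard deviation of the frequencies
    if delta == 0:
        return [1.0] * k_dim  # all frequencies equal: fall back to uniform weights
    return [math.exp(-(f * f) / ((h * delta) ** 2)) for f in freqs]


def weighted_cosine(d_i, d_j, weights):
    """Cosine coefficient of two binary feature vectors, weighted per feature."""
    num = sum(w * a * b for w, a, b in zip(weights, d_i, d_j))
    norm_i = math.sqrt(sum(w * a * a for w, a in zip(weights, d_i)))
    norm_j = math.sqrt(sum(w * b * b for w, b in zip(weights, d_j)))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # no weighted features present in one of the drugs
    return num / (norm_i * norm_j)


def substructure_similarity_matrix(fingerprints, h=0.1):
    """Build the m x m matrix of pairwise weighted cosine similarities."""
    weights = substructure_weights(fingerprints, h)
    m = len(fingerprints)
    return [[weighted_cosine(fingerprints[i], fingerprints[j], weights)
             for j in range(m)] for i in range(m)]
```

Under the assumed weight form, a small $h$ such as 0.1 makes the weights fall off very quickly, so substructures present in most drugs contribute almost nothing to the similarity.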
The detailed process of Step 2 is as follows:
First, the set of LINGO substrings of length 4 (the LINGO dictionary set) is extracted from the molecular character description of each drug. For example, the molecular character description of drug DB00217 in the DrugBank database is "CN/C(=N C)/NCc1ccccc1", and the LINGO dictionary set extracted from it includes "CN/C", "N/C(", "/C(=", and so on. Each element of a LINGO dictionary set is treated as a term. The LINGO dictionary set of the $i$-th drug is denoted $D_i$, and the union of the LINGO dictionary sets of all drugs is denoted $D$, that is:
$D = D_1 \cup \cdots \cup D_m$;
Then, the weight $idf(t, D)$ of each term in $D$ is calculated: the more frequently a term occurs in the molecular character descriptions of all drugs, the lower its weight. Here $t$ is a specific term in the set $D$; $M$ is the total number of molecular character descriptions, which equals the number of drugs; and $m$ is the number of molecular character descriptions that contain the term. This gives the weights of all terms in $D$.
The molecular character description similarity $S_{ij}^{SmiSim}$ of any two drugs $drug_i$ and $drug_j$ is then calculated from these weighted terms.
Finally, all $S_{ij}^{SmiSim}$ form the molecular character description similarity matrix $S^{SmiSim}$ of the drugs; $S_{ij}^{SmiSim}$ is the element in row $i$, column $j$ of $S^{SmiSim}$.
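Step 2 can be illustrated with a short Python sketch. The extraction of length-4 LINGOs follows the description above; the rest is an assumption, since the formulas are not given in the text: the weight is taken to be the standard inverse document frequency $idf(t, D) = \log(M / m)$, and the similarity is taken to be the cosine between idf-weighted LINGO count vectors. All function names are hypothetical.

```python
import math
from collections import Counter


def lingos(smiles, q=4):
    """Count all overlapping substrings of length q (LINGOs) in a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def idf_weights(profiles):
    """Assumed form: idf(t, D) = log(M / m), with M drugs in total and
    m drugs whose LINGO set contains term t."""
    total = len(profiles)
    # Iterating a Counter yields each distinct term once per drug.
    doc_freq = Counter(t for p in profiles for t in p)
    return {t: math.log(total / n) for t, n in doc_freq.items()}


def smiles_similarity(p_i, p_j, idf):
    """Assumed form: cosine similarity of idf-weighted LINGO count vectors."""
    vec_i = {t: c * idf[t] for t, c in p_i.items()}
    vec_j = {t: c * idf[t] for t, c in p_j.items()}
    dot = sum(v * vec_j.get(t, 0.0) for t, v in vec_i.items())
    norm_i = math.sqrt(sum(v * v for v in vec_i.values()))
    norm_j = math.sqrt(sum(v * v for v in vec_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)
```

With this weighting, a term that occurs in every drug receives weight 0 and does not affect the similarity, which matches the stated intent that frequent terms count for less.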
In Step 3, the detailed process of constructing the drug-target relationship similarity matrix $S^{DTISim}$ from the known target relationships (DTI) of the drugs is as follows:
First, define $\{drug_i,\ i = 1, 2, \ldots, m\}$ as the set of all drugs, where $m$ is the number of drugs, and $\{target_l,\ l = 1, 2, \ldots, n\}$ as the set of all targets, where $n$ is the number of targets. $A$ is the known drug-target relationship matrix; its element in row $i$, column $l$, denoted $a_{il}$, is the relation value between the $i$-th drug $drug_i$ and the $l$-th target $target_l$. If $drug_i$ and $target_l$ have a relationship, $a_{il}$ is 1; otherwise it is 0.
Then, the drug-target relationship similarity $S_{ij}^{DTISim}$ of $drug_i$ and $drug_j$ is calculated from matrix $A$ as the number of targets shared by the two drugs divided by the number of targets related to at least one of them:
$$S_{ij}^{DTISim} = \frac{\sum_{l=1}^{n} a_{il}\, a_{jl}}{\sum_{l=1}^{n} \mathrm{sign}(a_{il}, a_{jl})}$$
where the function $\mathrm{sign}(a_{il}, a_{jl})$ returns 1 if either $a_{il}$ or $a_{jl}$ is 1, and 0 otherwise.
Finally, all $S_{ij}^{DTISim}$ form the drug-target relationship similarity matrix $S^{DTISim}$; $S_{ij}^{DTISim}$ is the element in row $i$, column $j$ of $S^{DTISim}$.
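The detailed process of Step 4 is as follows: $S^{SubSim}$, $S^{SmiSim}$, and $S^{DTISim}$ are integrated with the weights $(\alpha, \beta, 1-\alpha-\beta)$.
When the drug to be predicted has no known target relationships (an entirely new drug), there are no $S^{DTISim}$ similarity data, so the final drug similarity $S_{ij}^{Sim}$ of $drug_i$ and $drug_j$ is calculated as:
$$S_{ij}^{Sim} = \alpha\, S_{ij}^{SubSim} + (1-\alpha)\, S_{ij}^{SmiSim}$$
When the drug to be predicted has known target relationships, the final drug similarity $S_{ij}^{Sim}$ of $drug_i$ and $drug_j$ is calculated as:
$$S_{ij}^{Sim} = \alpha\, S_{ij}^{SubSim} + \beta\, S_{ij}^{SmiSim} + (1-\alpha-\beta)\, S_{ij}^{DTISim}$$
where $0 < \alpha, \beta < 1$.
All $S_{ij}^{Sim}$ form the final drug similarity matrix $S^{Sim}$; $S_{ij}^{Sim}$ is the element in row $i$, column $j$ of $S^{Sim}$.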
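The weighted integration of Step 4 can be sketched as below. The function name and argument layout are hypothetical, and the assignment of $\alpha$ to the substructure similarity and $\beta$ to the SMILES similarity is an assumption based on the order in which the weights $(\alpha, \beta, 1-\alpha-\beta)$ are listed.

```python
def integrate_similarity(s_sub, s_smi, alpha, s_dti=None, beta=None):
    """Combine similarity matrices (lists of lists) into the final matrix.

    Without s_dti (entirely new drug): alpha * sub + (1 - alpha) * smi.
    With s_dti (known drug): alpha * sub + beta * smi + (1 - alpha - beta) * dti.
    """
    m = len(s_sub)
    if s_dti is None:
        return [[alpha * s_sub[i][j] + (1 - alpha) * s_smi[i][j]
                 for j in range(m)] for i in range(m)]
    if beta is None:
        raise ValueError("beta is required when a DTI similarity matrix is given")
    gamma = 1 - alpha - beta  # weight left over for the DTI similarity
    return [[alpha * s_sub[i][j] + beta * s_smi[i][j] + gamma * s_dti[i][j]
             for j in range(m)] for i in range(m)]
```

Because the three weights sum to 1, the final similarity stays in the same range as its inputs.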
The detailed process of Step 5 is as follows: the relationship scores between drugs and targets are predicted from the final drug similarity matrix $S^{Sim}$.
The relationship score of $drug_i$ and $target_l$ is computed from the final similarities $S_{ij}^{Sim}$ between $drug_i$ and the drugs $drug_j$ similar to it, together with $a_{jl}$, the relation value between $drug_j$ and $target_l$ in the known drug-target relationship matrix $A$. In other words, the score between the drug to be predicted, $drug_i$, and the target $target_l$ is determined by whether drugs similar to $drug_i$ have a relationship with $target_l$. The parameters Threshold and Ksim determine which similar drugs are used when the score is calculated: the former restricts the calculation to drugs whose similarity $S_{ij}^{Sim}$ to $drug_i$ is greater than Threshold; the latter uses the Ksim drugs ranked highest by similarity $S_{ij}^{Sim}$ to $drug_i$, a set denoted $drug_i^{Ksim}$. A drug takes part in the calculation if it satisfies either condition. The values of Threshold and Ksim can be obtained by cross-validation.
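A possible reading of Step 5 is sketched below. The neighbor selection follows the description (a drug is used if its similarity exceeds Threshold or if it is among the Ksim most similar drugs); the score itself is an assumption, taken here as the similarity-weighted sum of the neighbors' known relation values, because the scoring formula is not given in the text. Function names are hypothetical.

```python
def select_neighbors(sim_row, i, threshold, ksim):
    """Indices of drugs similar to drug i: above the threshold OR in the top ksim."""
    others = [j for j in range(len(sim_row)) if j != i]
    above = {j for j in others if sim_row[j] > threshold}
    ranked = sorted(others, key=lambda j: sim_row[j], reverse=True)
    top = set(ranked[:ksim])  # empty when ksim is 0
    return above | top


def relation_scores(sim, a, i, threshold, ksim):
    """Assumed score: sum over selected neighbors j of sim[i][j] * a[j][l]."""
    neighbors = select_neighbors(sim[i], i, threshold, ksim)
    n_targets = len(a[0])
    return [sum(sim[i][j] * a[j][l] for j in neighbors) for l in range(n_targets)]
```

With `ksim=0` the top-ranked set is empty, so selection depends on the threshold alone; with `threshold=0.0` as well, every drug with positive similarity takes part.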
In the invention, the number of drugs is $m$: one drug to be predicted and $m-1$ drugs with known drug-target relationships. The drug to be predicted may itself have known drug-target relationships, or it may have none (an entirely new drug). If it has known drug-target relationships, the drug-target relationship similarity matrix $S^{DTISim}$ can be constructed in Step 3 from the known drug-target relationships of the $m$ drugs, and in Step 5 the scores of its unknown drug-target relationships are calculated from the final drug similarity matrix that integrates $S^{DTISim}$, thereby predicting its unknown drug-target relationships. If it has none, $S^{DTISim}$ is not constructed in Step 3, and in Step 5 the scores of its potential drug-target relationships are calculated from the final drug similarity matrix that does not integrate $S^{DTISim}$, thereby predicting its potential drug-target relationships.
Beneficial effects:
Based on existing drug-target associations, the molecular character description information of drugs, and substructure information, the invention proposes a drug-target relationship prediction method. The molecular character description information (SMILES) refers to the Simplified Molecular Input Line Entry Specification, a specification that describes molecular structure with character strings. The method constructs drug similarity matrices from known drug-target relationships, substructure information, and SMILES strings, integrates them according to weights, and, on the principle that similar drugs tend to act on similar targets, can accurately predict the target relationships of drugs. The invention constructs similarity only from drug molecular character description information and substructure information and does not depend on information such as target sequences; it can predict target relationships for entirely new drug compounds and avoids the large amount of manpower and material resources consumed by biochemical experiments. Prediction of drug-target relationships falls into two cases: drugs with known drug-target relationships, and entirely new drug compounds. For the former, the similarity matrix is constructed from known drug-target relationships, substructures, and SMILES string similarity; for the latter, it is constructed only from substructures and SMILES string similarity. For a drug with known drug-target relationships, the invention can predict its relationships with other targets; for an entirely new drug (one without known drug-target relationships), it can predict its relationship with each target.
The invention overcomes the limitation that drug-target relationships could not be predicted for entirely new compounds, enriches the drug information integrated in the SDTNBI method, and further improves its prediction performance. It predicts drug-target relationships without relying on specific biochemical experimental conditions and provides important reference information for drug repositioning and for further research in drug development. Using a prediction model different from the prior art, it predicts on the principle that similar drugs act on similar targets, reduces the prediction bias caused by missing attributes in the prior art, and obtains better prediction results.
Description of the drawings
Fig. 1 is the overall flow chart of drug-target relationship prediction based on drug substructure and molecular character description information;
Fig. 2 compares the invention with the SDTNBI method in ten-fold cross-validation; Fig. 2(a) to Fig. 2(e) show the ten-fold cross-validation comparison on the GPCRs, Kinases, ICs, NRs, and Global datasets, respectively;
Fig. 3 compares the invention with the SDTNBI method in external validation; Fig. 3(a) and Fig. 3(b) show the comparison on the GPCRs and Kinases datasets, respectively.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments:
Embodiment 1:
For a drug with partially known drug-target relationships, a hybrid drug similarity is built from drug-target relationships, substructures, and molecular character description information, and the target relationships of the drug are then predicted. For an entirely new compound, the hybrid similarity is built only from the molecular character description information and chemical substructure information, and the target relationships of the compound are then predicted. For known drugs, a benchmark dataset is used that provides the substructures, SMILES strings, and known target relationship information of all its drugs, and newly added drug-target relationships are predicted with the integrated similarity model. For new compounds, similarity to the drugs in the benchmark dataset is calculated from the compound's substructure information and molecular character description information, and its target relationships are then predicted.
A total of five benchmark datasets, GPCRs (G protein-coupled receptors), Kinases (kinases), ICs (ion channels), NRs (nuclear receptors), and Global, were collected from the ChEMBL and BindingDB databases. Target relationship prediction for entirely new drugs uses the external datasets ExGPCRs and ExKinases, which were collected from the DrugBank database.
The overall flow of drug-target relationship prediction based on drug substructure and molecular character description information is shown in Fig. 1 and can be divided into the following steps:
(1) Construct the drug substructure similarity matrix $S^{SubSim}$ from each type of substructure information of all drugs. The drug substructure information comprises the following seven fingerprint types: CDK fingerprint (CDK), CDK extended fingerprint (CDKExt), CDK graph-only fingerprint (Graph), MACCS fingerprint (MACCS), PubChem fingerprint (PubChem), Substructure fingerprint (FP4), and Klekota-Roth fingerprint (KR). The drug substructure data used in this study were calculated with the PaDEL-Descriptor software (version 2.18).
Define $\{drug_i,\ i = 1, 2, \ldots, m\}$ as the set of all drugs, where $m$ is the number of drugs. Let $d_i$ be the substructure feature vector of the $i$-th drug $drug_i$; the number of feature values in the vector equals the substructure dimension $K$ (for example, the dimension of the MACCS structure is 153). If the drug contains a given substructure, the corresponding feature value is 1; otherwise it is 0. The structural similarity $S_{ij}^{SubSim}$ of $drug_i$ and $drug_j$ is the cosine correlation coefficient of their substructure feature vectors, with each substructure weighted by $W_k$. Here $d_{ik}$ and $d_{jk}$ denote the $k$-th feature values of the substructure feature vectors $d_i$ and $d_j$ of $drug_i$ and $drug_j$, and $W_k$ is the weight of the $k$-th substructure.
$W_k$ is calculated from $f_k$, the frequency with which the $k$-th substructure occurs across all drugs; $\delta$, the associated standard deviation; and $h$, a preset parameter (set to 0.1 in this study). The purpose of the weight is to give substructures that occur rarely a larger share than frequent substructures when drug substructure similarity is calculated.
Finally, all $S_{ij}^{SubSim}$ form the drug substructure similarity matrix $S^{SubSim}$; $S_{ij}^{SubSim}$ is the element in row $i$, column $j$ of $S^{SubSim}$.
The GPCRs dataset contains 4741 drugs in total, and the MACCS substructure dimension is 153. After the weights $\{W_k\}$ are computed as described above, the similarity of Drug105250 and Drug100109 is 0.0408.
(2) Calculate the similarity $S^{SmiSim}$ from the molecular character description information (SMILES) of the drugs. Each SMILES string is split into its set of LINGO substrings of length 4 (the LINGO dictionary set). For example, in the GPCRs dataset the SMILES string of Drug81951 is "(NC(=O)OCC[N+](C)(C)C", and the LINGO dictionary set extracted from it includes "(NC(", "NC(=", "C(=O", and so on. Each element of a LINGO dictionary set is treated as a term. The LINGO dictionary set of the $i$-th drug is denoted $D_i$, and the total term set of all drugs, denoted $D$, is the union of the terms in the $D_i$ of all drugs, defined as:
$D = D_1 \cup \cdots \cup D_m$ (3)
Then, the weight $idf(t, D)$ of each term in $D$ is calculated: the more frequently a term occurs in the molecular character descriptions of all drugs, the lower its weight. Here $t$ is a specific term in the set $D$; $M$ is the total number of molecular character descriptions, which equals the number of drugs; and $m$ is the number of molecular character descriptions that contain the term. This gives the weights of all terms in $D$.
The molecular character description similarity $S_{ij}^{SmiSim}$ of any two drugs $drug_i$ and $drug_j$ is then calculated from these weighted terms.
Finally, all $S_{ij}^{SmiSim}$ form the molecular character description similarity matrix $S^{SmiSim}$ of the drugs; $S_{ij}^{SmiSim}$ is the element in row $i$, column $j$ of $S^{SmiSim}$. When predicting for a drug with known drug-target relationships, the similarity of its existing drug-target relationships must also be integrated.
Before the DTI similarity is constructed, the drug-target relationship network is built. In the drug-target interaction set, define $D = \{drug_i,\ i = 1, 2, \ldots, m\}$ as the set of all drugs, where $m$ is the number of drugs, and $T = \{target_l,\ l = 1, 2, \ldots, n\}$ as the set of all targets, where $n$ is the number of targets. According to bipartite network theory, drug-target interactions can be represented as a bipartite drug-target network with edge set $E = \{e_{il}: drug_i \in D,\ target_l \in T\}$; if an experimentally determined interaction exists between $drug_i$ and $target_l$, the two are connected by a solid line (an edge). Mathematically, the bipartite drug-target network can be expressed as an $m \times n$ adjacency matrix $\{a_{il}\}$, where $a_{il} = 1$ if an experimentally determined interaction exists between $drug_i$ and $target_l$, and $a_{il} = 0$ otherwise.
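As an illustration of the bipartite representation, the adjacency matrix $\{a_{il}\}$ can be built from a list of experimentally determined drug-target pairs. The function name and the identifiers in the usage example are hypothetical.

```python
def build_adjacency(drugs, targets, interactions):
    """Return the m x n matrix a, where a[i][l] = 1 if (drug_i, target_l) interact."""
    drug_index = {d: i for i, d in enumerate(drugs)}
    target_index = {t: l for l, t in enumerate(targets)}
    a = [[0] * len(targets) for _ in drugs]
    for drug, target in interactions:
        a[drug_index[drug]][target_index[target]] = 1
    return a


# Usage with placeholder identifiers
a = build_adjacency(
    ["d1", "d2"],
    ["t1", "t2", "t3"],
    [("d1", "t1"), ("d2", "t3"), ("d1", "t3")],
)
```

Each row of the matrix is the target profile of one drug, which is the input needed for the DTI similarity calculation.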
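The similarity is calculated from the known DTI data. The drug-target relationship similarity $S_{ij}^{DTISim}$ of $drug_i$ and $drug_j$ is the number of targets shared by the two drugs divided by the number of targets related to at least one of them:
$$S_{ij}^{DTISim} = \frac{\sum_{l=1}^{n} a_{il}\, a_{jl}}{\sum_{l=1}^{n} \mathrm{sign}(a_{il}, a_{jl})}$$
where the function $\mathrm{sign}(a_{il}, a_{jl})$ returns 1 if either $a_{il}$ or $a_{jl}$ is 1, and 0 otherwise. All $S_{ij}^{DTISim}$ form the drug-target relationship similarity matrix $S^{DTISim}$; $S_{ij}^{DTISim}$ is the element in row $i$, column $j$ of $S^{DTISim}$.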
For example, in the GPCRs dataset, the drug-target relationships of drug82068 and drug82198 involve 6 distinct targets in total, of which 4 are shared, so their target relationship similarity is 4/6 = 0.6667.
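The worked example above corresponds to a Jaccard-style ratio of shared targets to all targets related to either drug. A minimal sketch, with hypothetical function names and made-up target profiles chosen only to reproduce the counts 4 and 6:

```python
def sign(a_il, a_jl):
    """Return 1 if either relation value is 1, otherwise 0."""
    return 1 if a_il == 1 or a_jl == 1 else 0


def dti_similarity(row_i, row_j):
    """Shared targets divided by targets related to at least one of the two drugs."""
    shared = sum(x * y for x, y in zip(row_i, row_j))
    union = sum(sign(x, y) for x, y in zip(row_i, row_j))
    return shared / union if union else 0.0


# Illustrative profiles: 4 shared targets, 6 distinct targets overall
row_i = [1, 1, 1, 1, 1, 0, 0]
row_j = [1, 1, 1, 1, 0, 1, 0]
similarity = dti_similarity(row_i, row_j)
```

Returning 0.0 when neither drug has any known target is a choice made here to avoid division by zero; the source does not say how that case is handled.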
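(3) Integrate the similarities $S^{SubSim}$, $S^{SmiSim}$, and $S^{DTISim}$ with the weights $(\alpha, \beta, 1-\alpha-\beta)$. Depending on the available similarity data and the prediction scenario, the formulas are as follows.
When predicting for an entirely new compound, there are no $S^{DTISim}$ similarity data, so the final similarity $S_{ij}^{Sim}$ of $drug_i$ and $drug_j$ is calculated as:
$$S_{ij}^{Sim} = \alpha\, S_{ij}^{SubSim} + (1-\alpha)\, S_{ij}^{SmiSim} \quad (5)$$
When predicting for an existing drug, the final similarity $S_{ij}^{Sim}$ of $drug_i$ and $drug_j$ is calculated as:
$$S_{ij}^{Sim} = \alpha\, S_{ij}^{SubSim} + \beta\, S_{ij}^{SmiSim} + (1-\alpha-\beta)\, S_{ij}^{DTISim} \quad (6)$$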
(4) Predict from the final drug similarity matrix $S^{Sim}$ using drug similarity inference: if a drug interacts with a target protein, drugs similar to it are also likely to act on that target. The relationship score between the drug $drug_i$ and the target $target_l$ is accordingly computed from the final similarities $S_{ij}^{Sim}$ between $drug_i$ and the drugs $drug_j$ similar to it, together with $a_{jl}$, the relation value between $drug_j$ and $target_l$ in the known drug-target relationship matrix $A$. In other words, the score between the drug to be predicted, $drug_i$, and the target $target_l$ is determined by whether drugs similar to $drug_i$ have a relationship with $target_l$. The parameters Threshold and Ksim determine which similar drugs are used when the score is calculated: the former restricts the calculation to drugs whose similarity $S_{ij}^{Sim}$ to $drug_i$ is greater than Threshold; the latter uses the Ksim drugs ranked highest by similarity $S_{ij}^{Sim}$ to $drug_i$, a set denoted $drug_i^{Ksim}$. A drug takes part in the calculation if it satisfies either condition.
To verify the effectiveness of the method, two kinds of validation were carried out. The first is internal validation: ten-fold cross-validation on the five benchmark datasets GPCRs, Kinases, ICs, NRs, and Global. The second is external validation: for the two external datasets GPCRs and Kinases from DrugBank, entirely new drugs are predicted against the corresponding benchmark datasets.
The datasets are summarized in Table 1 below, which also indicates the validation datasets. $N_d$, $N_t$, and $N_{dt}$ are the numbers of drugs, targets, and drug-target relationships in each dataset, and Sparsity is the ratio of $N_{dt}$ to the number of all possible drug-target relationships.
Table 1: Dataset summary
To assess the accuracy of the prediction method, for each training/test pair in cross-validation, the relationship data of all nodes in the test set are deleted from the training set, and the model's predictions are then compared with the DTI relationships in the test set. For each drug under evaluation, targets are ranked by predicted drug-target relationship score and then compared with the DTI relationships in the test set. Several evaluation metrics are used to show the precision and robustness of the method: precision (P), recall (R), precision enhancement ($e_P$), and recall enhancement ($e_R$). They are briefly described as follows:
where $M$ and $N$ are the numbers of drugs and targets that take part in the prediction, $X$ is the total number of DTI relationships deleted for the $M$ drugs, and $X_i$ is the number of DTI relationships deleted for $drug_i$. $X_i(L)$ is the number of correct DTI relationships among the top $L$ entries of the prediction list of $drug_i$. In addition, the performance of the algorithm is assessed by the AUC (area under the ROC curve).
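The four metrics can be sketched from the quantities defined above. The formulas used here are an assumption, following the definitions commonly used for network-based inference methods: $P(L)$ and $R(L)$ average $X_i(L)/L$ and $X_i(L)/X_i$ over the $M$ drugs, $e_P(L) = P(L) \cdot MN / X$, and $e_R(L) = R(L) \cdot N / L$. The function and argument names are hypothetical.

```python
def evaluate(ranked_lists, deleted, n_targets, top_l):
    """Precision, recall, and their enhancements at list length top_l.

    ranked_lists[i]: target indices for drug i, sorted by descending score.
    deleted[i]: non-empty set of held-out true targets for drug i (size X_i).
    """
    m = len(ranked_lists)
    x_total = sum(len(d) for d in deleted)  # X
    # X_i(L): held-out targets recovered in the top L predictions of drug i
    hits = [len(set(r[:top_l]) & d) for r, d in zip(ranked_lists, deleted)]
    precision = sum(h / top_l for h in hits) / m
    recall = sum(h / len(d) for h, d in zip(hits, deleted)) / m
    e_p = precision * m * n_targets / x_total  # gain over random ranking
    e_r = recall * n_targets / top_l
    return precision, recall, e_p, e_r
```

Under these assumed definitions, an enhancement value of 1 corresponds to the expected performance of a random ranking, so larger values indicate a more informative prediction list.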
Table 2 describes in ten times of cross validations, the Performance Evaluating Indexes value of each data set after prediction, using formula
(6) Lai Jicheng similitude, wherein ICs, Kinases, NRs data lumped parameter are (α=0.1, β=0.1, Ksim in GPCRs
=100, Threshold=0.5), the parameter used in Global data set for (α=0.1, β=0.1, Ksim=50,
Threshold=0.5).Due to being analyzed by data relationship in ten times of cross validations,Relationship specific gravity is to final pre-
Surveying result influences maximum, therefore taking its weighted value is 0.8, in additionIt is distributed as 0.1, is made an uproar to eliminate low similitude
Sound obtains better prediction result, and taking threshold values is 0.5, in prediction model other than taking threshold values 0.5, is also provided with each drug
Neighbour's number limitation of prediction is participated in, the number of drugs in GPCRs, ICs, Kinases, NRs is comparatively bigger than in Global,
It is described in Table 1, therefore it is 100 that Ksim value, which is arranged, in the former, the latter 50.
Table 2. Algorithm performance indices in ten-fold cross validation
On the GPCRs data set, the indices P, R, ep, er and AUC are fairly stable across the seven substructure types and differ little; the AUC is at least 0.953, with a mean of 0.962, which is a good prediction result. On the Global data set the average indices are lower than on GPCRs: the AUC is worst with the FP4 substructure, at only 0.817, while for the other substructures it lies between 0.923 and 0.938. On the ICs data set, the highest AUC is 0.971 and the lowest is 0.947. On the Kinases data set the validation results are more stable: every AUC is at least 0.961 and the highest is 0.963. On the NRs data set the performance is comparatively worse, with AUC values between 0.909 and 0.928.
Table 3 lists the evaluation indices for each data set in external validation (using the GPCRs and Kinases data sets from the DrugBank database), where predictions are made for entirely new compounds and the similarities are integrated with formula (5). The parameters for the external GPCRs data set are (α=0.5, Ksim=0, Threshold=0.1), where Ksim=0 means that drugs take part in the prediction regardless of their similarity rank. The parameters for the Kinases data set are (α=0.5, Ksim=0, Threshold=0.0), where Ksim=0 and Threshold=0.0 mean that drugs take part in the prediction regardless of their similarity value and rank. The two similarities $S^{SubSim}$ and $S^{SmiSim}$ are integrated with equal weight; the neighbour number Ksim is set to 0, i.e. unrestricted, for both data sets; and Threshold is set to 0.1 for the GPCRs data set and to 0 (unrestricted) for the Kinases data set.
Table 3. Algorithm performance indices in external validation
Table 3 shows that, because a new compound has no known drug-target relationships to draw on, the AUC values are lower than those of the ten-fold cross validation in Table 2, which indirectly illustrates the importance of known DTI relationships. On the GPCRs data set, the AUC reaches its highest value, 0.841, in the integrated prediction with FP4+smi; on the Kinases data set, the highest AUC is obtained in the integrated prediction with KR+smi, where the other indices also take very high values.
The data sets used in the present invention are those published with the SDTNBI method, which uses a network distribution model and defines the same performance indices as above, so the prediction performance of the present invention is compared with that of SDTNBI. Because precision (P), recall (R), precision enhancement (ep) and recall enhancement (er) depend on the specific parameter L=20 used in validation, comparing them does not reflect overall performance well. These four indices are therefore ignored in the comparison, and only the AUC, which does not depend on any parameter, is compared.
Fig. 2 shows that on the Global data set the maximum and minimum AUC obtained by integrating the different substructures are both worse than the results of SDTNBI, and the mean is 0.928 for SDTNBI against 0.913 for the new method. This may be related to the density of known drug-target relationships in the Global data set, because the ten-fold cross validation for known drugs integrates the similarity derived from known drug-target relationships. As described in Table 1, Global is the only data set in which the density of known drug-target relationships is below 1% (it is 0.54%), which affects the similarity data constructed by the present invention and in turn the final prediction result.
On the other four data sets (GPCRs, Kinases, ICs and NRs), the mean and minimum AUC obtained by integrating the different substructures are clearly better than those of SDTNBI, while the maximum values differ little. This shows that on these four data sets the predictions of the new method are more stable than those of the original SDTNBI method; in particular, when several types of substructure information are difficult to obtain for a drug, the new method can be more reliable than SDTNBI.
Fig. 3 shows that in the validation test on entirely new drugs, the result on the Kinases data set is slightly worse than that of SDTNBI: the mean AUC is 0.853 for SDTNBI against 0.848 for the new method, the maximum AUC is 0.863 against 0.856, and the minimum AUC is 0.847 against 0.841. In the validation test on the GPCRs data set, the new method performs better than SDTNBI: the mean AUC is 0.817 for the new method against 0.766 for SDTNBI, and the maximum and minimum AUC are likewise better.
The comparison in these two respects shows that, when the known drug-target relationships satisfy certain conditions, the new integrated method gives more accurate predictions than the original SDTNBI method, providing a meaningful basis for further research on drug repositioning and new drug development.
Claims (6)
1. A drug-target relationship prediction method based on drug substructure and molecular character description information, characterized in that it comprises the following steps:
Step 1: construct the drug substructure similarity matrix $S^{SubSim}$ from the substructure information of all drugs;
Step 2: construct the drug molecular character description information similarity matrix $S^{SmiSim}$ from the molecular character description information of all drugs;
Step 3: determine whether the drug to be predicted has known drug-target relationships; if it has, construct the drug-target relationship similarity matrix $S^{DTISim}$ from the known drug-target relationships of the drugs; if it has none, do not construct $S^{DTISim}$;
Step 4: integrate the drug similarity matrices constructed above into the final drug similarity matrix $S^{Sim}$;
Step 5: from the known drug-target relationships of other drugs similar to the drug to be predicted, calculate the relationship score between the drug to be predicted and each target; rank the scores, a higher rank indicating a greater likelihood that a relationship exists for that drug-target pair.
2. The drug-target relationship prediction method based on drug substructure and molecular character description information according to claim 1, characterized in that the detailed process of step 1 is:
First, define {$drug_i$, i=1,2,...,m} as the set of all drugs, m being the number of drugs; $d_i$ is the substructure feature vector of the i-th drug $drug_i$, the number of feature values in the vector being equal to the substructure dimension K; if the drug contains a given substructure, the corresponding feature value is 1, and otherwise it is 0;
Then, from the cosine correlation coefficient of the substructure feature vectors, calculate the structural similarity $S^{SubSim}_{ij}$ of $drug_i$ and $drug_j$, the calculation formula being as follows:
Wherein $d_{ik}$ and $d_{jk}$ denote the k-th feature value of the substructure feature vectors $d_i$ and $d_j$ of $drug_i$ and $drug_j$ respectively; $W_k$ is the weight of the k-th substructure, calculated as follows:
Wherein $f_k$ is the frequency with which the k-th substructure occurs in all drugs, δ is the standard deviation of the frequencies $\{f_k\}$, and h is a preset parameter;
Finally, all $S^{SubSim}_{ij}$ constitute the drug substructure similarity matrix $S^{SubSim}$, i, j=1,2,...,m; $S^{SubSim}_{ij}$ is the element in row i, column j of $S^{SubSim}$.
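The substructure similarity of step 1 can be sketched as follows. Because the formulas for the similarity and for the weight $W_k$ are not reproduced in the text, this is only an illustration under stated assumptions: the weights are passed in as an input rather than computed, and they are assumed to enter both the numerator and the two norms of the cosine coefficient. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_cosine(d_i: Sequence[int],
                    d_j: Sequence[int],
                    w: Sequence[float]) -> float:
    """Weighted cosine coefficient of two binary substructure vectors.

    Assumed form: sum_k w_k*d_ik*d_jk divided by the product of the
    weighted norms sqrt(sum_k w_k*d_ik^2) and sqrt(sum_k w_k*d_jk^2).
    """
    num = sum(wk * a * b for wk, a, b in zip(w, d_i, d_j))
    norm_i = math.sqrt(sum(wk * a * a for wk, a in zip(w, d_i)))
    norm_j = math.sqrt(sum(wk * b * b for wk, b in zip(w, d_j)))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0                      # a drug with no weighted substructure
    return num / (norm_i * norm_j)


def substructure_similarity_matrix(fingerprints: List[Sequence[int]],
                                   w: Sequence[float]) -> List[List[float]]:
    """Build the m x m matrix of pairwise weighted cosine similarities."""
    m = len(fingerprints)
    return [[weighted_cosine(fingerprints[i], fingerprints[j], w)
             for j in range(m)] for i in range(m)]
```

Keeping the weights as a parameter means any weighting scheme based on $f_k$, δ and h can be supplied without changing the similarity code.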
3. The drug-target relationship prediction method based on drug substructure and molecular character description information according to claim 2, characterized in that the detailed process of step 2 is:
First, extract from the molecular character description information of each drug the set of LINGOs of length 4, called the LINGO dictionary; each element of a LINGO dictionary is counted as one term; the LINGO dictionary of the i-th drug is denoted $D_i$; the union of the LINGO dictionaries of all drugs is denoted D, that is:
$D = D_1 \cup \dots \cup D_m$;
Then, calculate the weight idf(t, D) of each term in the set D, the calculation formula being as follows:
Wherein t is a specific term in the set D; M is the number of molecular character descriptions that contain the term;
Then calculate the molecular character description information similarity $S^{SmiSim}_{ij}$ of any two drugs $drug_i$ and $drug_j$ according to the following formula:
Finally, all $S^{SmiSim}_{ij}$ constitute the drug molecular character description information similarity matrix $S^{SmiSim}$, i, j=1,2,...,m; $S^{SmiSim}_{ij}$ is the element in row i, column j of $S^{SmiSim}$.
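The LINGO-based similarity of step 2 can be illustrated with a short sketch. The formulas for idf(t, D) and for the similarity are not reproduced in the text, so the code assumes the usual TF-IDF construction: idf(t, D) = ln(m / M), with m the number of drugs and M the number of descriptions containing t, and similarity taken as the cosine of the TF-IDF vectors. The molecular character description is assumed to be a SMILES-like string used as-is, without any normalisation of ring-closure digits. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def lingos(description: str, q: int = 4) -> Counter:
    """Count every overlapping substring of length q (one LINGO per term)."""
    return Counter(description[k:k + q]
                   for k in range(len(description) - q + 1))


def idf_weights(profiles: List[Counter]) -> Dict[str, float]:
    """Assumed idf: ln(m / M), M = number of descriptions containing t."""
    m = len(profiles)
    # Iterating a Counter yields each distinct term once per description.
    doc_freq = Counter(t for p in profiles for t in p)
    return {t: math.log(m / doc_freq[t]) for t in doc_freq}


def tfidf_cosine(p_i: Counter, p_j: Counter, idf: Dict[str, float]) -> float:
    """Cosine similarity of the TF-IDF vectors of two LINGO profiles."""
    v_i = {t: c * idf[t] for t, c in p_i.items()}
    v_j = {t: c * idf[t] for t, c in p_j.items()}
    num = sum(v_i[t] * v_j[t] for t in v_i.keys() & v_j.keys())
    den = (math.sqrt(sum(x * x for x in v_i.values()))
           * math.sqrt(sum(x * x for x in v_j.values())))
    return num / den if den else 0.0
```

A typical use would build `profiles = [lingos(s) for s in descriptions]`, compute `idf_weights(profiles)` once, and then fill the matrix pair by pair. Under the assumed idf, a term that occurs in every description gets weight 0 and therefore does not contribute to any similarity.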
4. The drug-target relationship prediction method based on drug substructure and molecular character description information according to claim 3, characterized in that, in step 3, the detailed process of constructing the drug-target relationship similarity matrix $S^{DTISim}$ from the known target relationships of the drugs is:
First, define {$drug_i$, i=1,2,...,m} as the set of all drugs, m being the number of drugs; {$target_l$, l=1,2,...,n} is the set of all targets, n being the number of targets; A is the known drug-target relationship matrix; the element in row i, column l of A is denoted $a_{il}$ and is the relation value between the i-th drug $drug_i$ and the l-th target $target_l$; if $drug_i$ and $target_l$ are related, $a_{il}$ is 1, and otherwise it is 0;
Then, calculate from matrix A the drug-target relationship similarity $S^{DTISim}_{ij}$ of $drug_i$ and $drug_j$, the calculation formula being as follows:
Wherein the function sign($a_{il}$, $a_{jl}$) returns 1 if either of $a_{il}$ and $a_{jl}$ is 1, and returns 0 otherwise;
Finally, all $S^{DTISim}_{ij}$ constitute the drug-target relationship similarity matrix $S^{DTISim}$; $S^{DTISim}_{ij}$ is the element in row i, column j of $S^{DTISim}$.
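The drug-target relationship similarity of step 3 can be sketched as below. The formula itself is not reproduced in the text; the definition of sign($a_{il}$, $a_{jl}$) as "1 if either value is 1" suggests a Jaccard-type ratio, so the sketch assumes that the similarity is the number of targets shared by both drugs divided by the number of targets related to at least one of them. The function names are hypothetical.

```python
from typing import List, Sequence


def dti_similarity(a_i: Sequence[int], a_j: Sequence[int]) -> float:
    """Assumed Jaccard form: shared targets / targets hit by either drug."""
    shared = sum(1 for x, y in zip(a_i, a_j) if x == 1 and y == 1)
    # sign(a_il, a_jl) from the claim: 1 when either entry is 1, else 0
    either = sum(1 for x, y in zip(a_i, a_j) if x == 1 or y == 1)
    return shared / either if either else 0.0


def dti_similarity_matrix(A: List[Sequence[int]]) -> List[List[float]]:
    """Pairwise similarity of the rows of the drug-target matrix A."""
    m = len(A)
    return [[dti_similarity(A[i], A[j]) for j in range(m)] for i in range(m)]
```

With this form, a drug that has no known target gets similarity 0 to every other drug, which matches the decision in step 3 not to build this matrix for drugs without known relationships.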
5. The drug-target relationship prediction method based on drug substructure and molecular character description information according to claim 4, characterized in that the detailed process of step 4 is: integrate $S^{SubSim}_{ij}$, $S^{SmiSim}_{ij}$ and $S^{DTISim}_{ij}$ with the weights (α, β, 1-α-β): when the drug to be predicted has no known target relationships, the final drug similarity $S^{Sim}_{ij}$ of $drug_i$ and $drug_j$ is calculated as:
When the drug to be predicted has known target relationships, the final drug similarity $S^{Sim}_{ij}$ of $drug_i$ and $drug_j$ is calculated as:
Wherein 0 < α, β < 1;
All $S^{Sim}_{ij}$ constitute the final drug similarity matrix $S^{Sim}$; $S^{Sim}_{ij}$ is the element in row i, column j of $S^{Sim}$.
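The integration of step 4 can be illustrated as follows. The two formulas are not reproduced in the text, so the sketch assumes a linear combination: α·$S^{SubSim}$ + β·$S^{SmiSim}$ + (1-α-β)·$S^{DTISim}$ for drugs with known target relationships, and α·$S^{SubSim}$ + (1-α)·$S^{SmiSim}$ for drugs without them. The assignment of α to the substructure similarity and the (1-α) weight in the second case are assumptions; the function name is hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def integrate_similarity(s_sub: Matrix,
                         s_smi: Matrix,
                         s_dti: Optional[Matrix] = None,
                         alpha: float = 0.1,
                         beta: float = 0.1) -> Matrix:
    """Combine the similarity matrices into the final matrix (assumed linear).

    s_dti is None  -> drug without known targets: alpha and (1 - alpha)
    s_dti is given -> drug with known targets: alpha, beta, 1 - alpha - beta
    """
    m = len(s_sub)
    if s_dti is None:
        return [[alpha * s_sub[i][j] + (1.0 - alpha) * s_smi[i][j]
                 for j in range(m)] for i in range(m)]
    gamma = 1.0 - alpha - beta      # weight of the DTI-based similarity
    return [[alpha * s_sub[i][j] + beta * s_smi[i][j] + gamma * s_dti[i][j]
             for j in range(m)] for i in range(m)]
```

With α=β=0.1 the DTI-based similarity receives weight 0.8, and with α=0.5 and no DTI matrix the two remaining similarities are weighted equally, which matches the parameter settings reported for Table 2 and Table 3.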
6. The drug-target relationship prediction method based on drug substructure and molecular character description information according to claim 5, characterized in that the detailed process of step 5 is: predict the relationship score between drug and target from the final drug similarity matrix $S^{Sim}$;
The relationship score of $drug_i$ and $target_l$ is:
Wherein $a_{jl}$ is the relation value between $drug_j$ and $target_l$ in the known drug-target relationship matrix A; the parameters Threshold and Ksim determine the range of drugs similar to $drug_i$ that are selected when calculating the relationship score: the former restricts the calculation to drugs whose similarity $S^{Sim}_{ij}$ to $drug_i$ is greater than Threshold; the latter restricts it to the Ksim drugs ranked highest by similarity $S^{Sim}_{ij}$ to $drug_i$; $drug_i^{Ksim}$ denotes the set of the Ksim drugs most similar to $drug_i$; a drug takes part in the calculation if it satisfies either of the two parameters.
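The scoring of step 5 can be sketched as below. The score formula is not reproduced in the text, so the code assumes an unnormalised similarity-weighted sum over the selected neighbours, score(i, l) = Σ_j $S^{Sim}_{ij}$·$a_{jl}$, and assumes that $drug_i$ itself is excluded from its own neighbours. A neighbour is kept when its similarity exceeds Threshold or when it is among the Ksim most similar drugs, as the claim states. The function names are hypothetical.

```python
from typing import List, Sequence


def select_neighbours(sim_row: Sequence[float], i: int,
                      threshold: float, ksim: int) -> List[int]:
    """Indices j != i with sim > threshold OR among the top ksim by sim."""
    others = [j for j in range(len(sim_row)) if j != i]
    ranked = sorted(others, key=lambda j: sim_row[j], reverse=True)
    top = set(ranked[:ksim])            # empty set when ksim == 0
    return [j for j in others if sim_row[j] > threshold or j in top]


def relation_scores(S: List[Sequence[float]], A: List[Sequence[int]],
                    i: int, threshold: float, ksim: int) -> List[float]:
    """Score of drug i against every target (assumed unnormalised sum)."""
    neighbours = select_neighbours(S[i], i, threshold, ksim)
    n_targets = len(A[0])
    return [sum(S[i][j] * A[j][l] for j in neighbours)
            for l in range(n_targets)]
```

Under this reading, Ksim=0 makes the rank criterion select nothing, so only Threshold decides which drugs take part, and Threshold=0.0 then admits every drug with positive similarity. If the original formula divides by the sum of the neighbour similarities, that factor is the same for every target of a given drug and so would not change the ranking of its targets.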
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610953873.3A CN106529205B (en) | 2016-11-03 | 2016-11-03 | It is a kind of based on drug minor structure, the drug targets Relationship Prediction method of molecule character description information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106529205A CN106529205A (en) | 2017-03-22 |
CN106529205B true CN106529205B (en) | 2019-03-26 |
Family
ID=58325471
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610953873.3A Active CN106529205B (en) | 2016-11-03 | 2016-11-03 | It is a kind of based on drug minor structure, the drug targets Relationship Prediction method of molecule character description information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106529205B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107292130B (en) * | 2017-06-09 | 2019-11-26 | 西安电子科技大学 | Drug method for relocating based on gene mutation and gene expression |
CN108647484B (en) * | 2018-05-17 | 2020-10-23 | 中南大学 | Medicine relation prediction method based on multivariate information integration and least square method |
CN109411033B (en) * | 2018-11-05 | 2021-08-31 | 杭州师范大学 | Drug efficacy screening method based on complex network |
CN109887540A (en) * | 2019-01-15 | 2019-06-14 | 中南大学 | A kind of drug targets interaction prediction method based on heterogeneous network insertion |
CN110444250A (en) * | 2019-03-26 | 2019-11-12 | 广东省微生物研究所(广东省微生物分析检测中心) | High-throughput drug virtual screening system based on molecular fingerprint and deep learning |
CN110853714B (en) * | 2019-10-21 | 2023-04-21 | 天津大学 | Drug repositioning system based on pathogenic contribution network analysis |
CN110957002B (en) * | 2019-12-17 | 2023-04-28 | 电子科技大学 | Drug target interaction relation prediction method based on synergistic matrix decomposition |
CN111477344B (en) * | 2020-04-10 | 2023-06-09 | 电子科技大学 | Drug side effect identification method based on self-weighted multi-core learning |
CN111524546B (en) * | 2020-04-14 | 2022-05-03 | 湖南大学 | Drug-target interaction prediction method based on heterogeneous information |
CN111755078B (en) * | 2020-07-30 | 2022-09-23 | 腾讯科技(深圳)有限公司 | Drug molecule attribute determination method, device and storage medium |
CN112133367B (en) * | 2020-08-17 | 2024-07-12 | 中南大学 | Method and device for predicting interaction relationship between medicine and target point |
CN112216353B (en) * | 2020-11-02 | 2024-04-02 | 长沙理工大学 | Method and apparatus for predicting drug-target interaction relationship |
CN112435720B (en) * | 2020-12-04 | 2021-10-26 | 上海蠡图信息科技有限公司 | Prediction method based on self-attention mechanism and multi-drug characteristic combination |
CN112863634B (en) * | 2021-01-12 | 2022-09-20 | 山东大学 | Traditional Chinese medicine prescription recommendation method and system based on new crown protein heterogeneous network clustering |
CN114023397B (en) * | 2021-09-16 | 2024-05-10 | 平安科技(深圳)有限公司 | Drug redirection model generation method and device, storage medium and computer equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103065066A (en) * | 2013-01-22 | 2013-04-24 | 四川大学 | Drug combination network based drug combined action predicting method |
CN103902848A (en) * | 2012-12-28 | 2014-07-02 | 深圳先进技术研究院 | System and method for identifying drug targets based on drug interaction similarities |
Non-Patent Citations (2)
Title |
---|
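A drug-target association prediction algorithm based on multi-information fusion; Peng Lihong et al.; Computer Engineering; 2016-06-30; Vol. 42, No. 6; pp. 218-223 |
Research progress on predicting drug targets with cheminformatics methods; Fang Jiansong et al.; Acta Pharmaceutica Sinica; 2014-10-12; Vol. 49, No. 10; pp. 1357-1364 |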
Also Published As
Publication number | Publication date |
---|---|
CN106529205A (en) | 2017-03-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2020-05-13. Address after: 410000 No. 678 Qingshan Road, Yuelu District, Changsha City, Hunan Province. Patentee after: HUNAN CREATOR INFORMATION TECHNOLOGIES Co.,Ltd. Address before: 410083 No. 932 Lushan Road, Yuelu District, Changsha City, Hunan Province. Patentee before: CENTRAL SOUTH University |