CN106529205B - Drug-target relationship prediction method based on drug substructures and molecular character description information - Google Patents

Info

Publication number
CN106529205B
Authority
CN
China
Prior art keywords
drug
relationship
target
description information
targets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610953873.3A
Other languages
Chinese (zh)
Other versions
CN106529205A (en)
Inventor
王建新
严承
王伟平
李敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HUNAN CREATOR INFORMATION TECHNOLOGIES Co.,Ltd.
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201610953873.3A
Publication of CN106529205A
Application granted
Publication of CN106529205B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Landscapes

  • Spectroscopy & Molecular Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a drug-target relationship prediction method based on drug substructures and molecular character description information. Drug substructure information, molecular character description information, and known drug-target relationships are first obtained from databases. A drug-drug similarity matrix is then constructed separately from each of these three sources, and the resulting matrices are integrated, according to weights, into a final drug similarity matrix. Finally, the target relationships of a drug are predicted on the principle that similar drugs tend to act on similar targets. The invention builds similarity only from the drugs' molecular character description information and substructure information, does not depend on information such as target sequences, and can predict target relationships for entirely new drug compounds, avoiding the large amounts of labor and materials consumed by biochemical experiments. Experimental results show that the method predicts drug-target relationships accurately.

Description

A drug-target relationship prediction method based on drug substructures and molecular character description information
Technical field
The invention belongs to the field of systems biology and relates to a drug-target relationship prediction method based on drug substructures and molecular character description information.
Background
A drug target is a biological macromolecule in the body, such as certain proteins and nucleic acids, that has pharmacodynamic function and can be acted on by a drug; the genes encoding target proteins are also called target genes. Identifying the target molecules associated with a specific disease is the foundation of modern drug development, so identifying drug-target interactions has become an important basic step in that process. Drug-target interactions can be identified by bioassay, but such experiments are expensive, time-consuming, and challenging. With the development of computing technology, various computational models have therefore been proposed to predict potential drug-target associations at large scale.
At present, there are three main classes of approach to drug-target relationship prediction:
(1) Biological experimental assays
This traditional mode of drug development achieved some success early on. Since the turn of the century, however, the "one gene, one drug, one disease" paradigm has faced many intractable challenges, such as high clinical attrition rates, excessively long development cycles, and the large amounts of manpower, money, and materials required for experiments.
(2) Network-based prediction methods
These methods rest on the assumption that similar drugs tend to act on similar targets. They integrate information such as drug similarity networks, target similarity networks, known drug-target interactions (DTIs), and drug side-effect networks, allowing potential drug-target relationships to be predicted quickly and conveniently and providing important support for drug repositioning. Network-based methods have become a powerful tool for predicting potential drug-target associations.
For example, the NRWRH method integrates the drug-drug similarity network, the protein-protein similarity network, and the known drug-target interaction network into one heterogeneous network, and predicts drug-target relationships by random walk with restart on that network. Unlike traditional random walk methods, it integrates three networks and can predict in both directions, from drug to target and from target to drug.
Other typical methods are the three inference models DBSI, TBSI, and NBI, which infer drug-target associations from, respectively, drug structural similarity computed with SIMCOMP, target similarity based on Smith-Waterman scores, and the network topology similarity of the DTI network. The more recently proposed SDTNBI method provides a prediction model for new compounds and achieves good results, but the drug information it integrates needs further improvement.
(3) Machine-learning-based prediction methods
Machine-learning-based methods predict by integrating the chemical structures of drugs, the sequence information of target proteins, and known drug-target relationships, and are divided into supervised and semi-supervised learning methods. In supervised learning, positive and negative samples are currently assigned according to whether a known association exists. This creates a negative-sample selection problem, because experimentally verified data can only confirm that an association exists; it cannot confirm that one does not. A typical example is the BLM method, which uses support vector machines to obtain predictions from the drug side and the target side separately and averages them into a final prediction score; its inaccurate negative-sample selection, however, substantially affects prediction accuracy. To address this problem, semi-supervised learning methods that combine a small amount of labeled data with a large amount of unlabeled data have been proposed. A typical one is NetLapRLS, which integrates drug chemical information, target gene information, and known associations to predict new relationships; by using both labeled and unlabeled relationships rather than labeled relationships alone, it improves prediction accuracy.
Although the above methods have been applied successfully to predicting drug-target associations and to drug repositioning, they have shortcomings: some cannot predict target associations for new chemical entities, while others can but need to integrate more drug information to obtain better predictions. Such predictions are very important for drug development and further research.
Therefore, a new method for predicting target associations for new compounds is needed.
Summary of the invention
The technical problem to be solved by the invention is, in view of the deficiencies of the prior art, to provide a drug-target relationship prediction method based on drug substructures and molecular character description information that can accurately predict the target relationships of drugs and avoids the large amounts of labor and materials consumed by biochemical experiments.
The technical solution of the invention is as follows:
A drug-target relationship prediction method based on drug substructures and molecular character description information, comprising the following steps:
Step 1: Construct the drug substructure similarity matrix S_SubSim from the substructure information of all drugs.
Step 2: Construct the molecular character description similarity matrix S_SmiSim from the molecular character description information (SMILES) of all drugs.
Step 3: Determine whether the drug to be predicted has known drug-target relationships. If it does, construct the drug-target relationship similarity matrix S_DTISim from the known drug-target relationships (DTIs) of the drugs; if it does not, do not construct S_DTISim.
Step 4: Integrate the drug similarity matrices constructed above into the final drug similarity matrix S_Sim.
Step 5: From the known drug-target relationships of other drugs similar to the drug to be predicted, compute the relationship score between that drug and each target, and rank the scores; the higher a drug-target pair ranks, the more likely it is that a relationship exists.
The detailed process of Step 1 is:
First, define {drug_i, i = 1, 2, ..., m} as the set of all drugs, where m is the number of drugs. d_i is the substructure feature vector of the i-th drug drug_i; the number of features in the vector equals the substructure dimension K. If the drug contains a given substructure, the corresponding feature is 1; otherwise it is 0.
Then, the structural similarity S_SubSim(i, j) of drug_i and drug_j is computed as the cosine correlation coefficient of their substructure feature vectors. The calculation formula is as follows:
where d_ik and d_jk denote the k-th feature of the substructure feature vectors d_i and d_j of drug_i and drug_j, respectively, and W_k is the weight of the k-th substructure, calculated as follows:
where f_k is the frequency with which the k-th substructure occurs across all drugs, δ is the standard deviation of the frequencies f_k, and h is a preset parameter (set to 0.1 in this study). The purpose of the weight is to give rarely occurring substructures a larger share than frequent ones when computing drug substructure similarity.
Finally, all S_SubSim(i, j) form the drug substructure similarity matrix S_SubSim, where S_SubSim(i, j) is the element in row i, column j.
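Step 1 can be illustrated with a minimal Python sketch. The formula itself is not reproduced in this text, so the way the weights enter the cosine (scaling each term of the dot product and of both norms) is an assumption, and the function names, toy fingerprints, and weight values are hypothetical.

```python
import math

def weighted_cosine(d_i, d_j, w):
    """Weighted cosine similarity of two binary substructure vectors.

    Assumption: w[k] scales the k-th term of the dot product and of both
    norms; the patent's exact formula is not reproduced in this text.
    """
    num = sum(wk * a * b for wk, a, b in zip(w, d_i, d_j))
    norm_i = math.sqrt(sum(wk * a * a for wk, a in zip(w, d_i)))
    norm_j = math.sqrt(sum(wk * b * b for wk, b in zip(w, d_j)))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # a drug with no substructure bits set is similar to nothing
    return num / (norm_i * norm_j)

def substructure_similarity_matrix(fingerprints, w):
    """Build the m x m matrix S_SubSim from m binary fingerprints of length K."""
    m = len(fingerprints)
    return [[weighted_cosine(fingerprints[i], fingerprints[j], w)
             for j in range(m)] for i in range(m)]

# Toy data: 3 drugs, K = 4 substructures, hypothetical weights in which
# the first (very common) substructure counts less than the others.
fps = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0]]
weights = [0.2, 1.0, 1.0, 1.0]
S_sub = substructure_similarity_matrix(fps, weights)
```

With uniform weights the first two toy drugs would score 0.5; down-weighting the shared common substructure lowers their similarity, which is the stated purpose of W_k.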
The detailed process of Step 2 is:
First, the LINGO dictionary set of length-4 substrings is extracted from the molecular character description information of each drug. For example, the molecular character description of drug DB00217 in the DrugBank database is "CN/C(=NC)/NCc1ccccc1", and the LINGO dictionary set extracted from it includes "CN/C", "N/C(", "/C(=", and so on. Each element of a LINGO dictionary set is treated as a term. The LINGO dictionary set of the i-th drug is denoted D_i, and the union of the LINGO dictionary sets of all drugs is denoted D, that is:
D = D_1 ∪ ... ∪ D_m
Then, the weight idf(t, D) of each term in D is computed. The more frequently a term occurs in the molecular character descriptions of all drugs, the lower its weight. The calculation formula is as follows:
where t is a specific term in the set D; M is the total number of molecular character descriptions, which equals the number of drugs; and m is the number of molecular character descriptions that contain the term (in this formula m is a count of descriptions, not the number of drugs defined above). This gives the weights of all terms in D.
The molecular character description similarity S_SmiSim(i, j) of any two drugs drug_i and drug_j is then computed according to the following formula:
Finally, all S_SmiSim(i, j) form the molecular character description similarity matrix S_SmiSim of the drugs, where S_SmiSim(i, j) is the element in row i, column j.
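The LINGO extraction and term weighting of Step 2 can be sketched in Python as follows. The sliding-window extraction matches the DB00217 example in the text. The logarithmic form log(M / m) is the standard inverse document frequency and is only assumed here, since the patent's formula is not reproduced; the second and third SMILES strings are made-up inputs. The similarity formula itself is not sketched because it cannot be recovered from this text.

```python
import math

def lingos(smiles, q=4):
    """Return the set of all length-q substrings (LINGOs) of a SMILES string."""
    return {smiles[k:k + q] for k in range(len(smiles) - q + 1)}

def idf_weights(lingo_sets):
    """Weight each term t by log(M / m_t).

    M is the number of descriptions and m_t the number containing t.
    Assumption: standard logarithmic idf; the patent's formula may differ.
    """
    M = len(lingo_sets)
    vocabulary = set().union(*lingo_sets)  # D = D_1 ∪ ... ∪ D_m
    return {t: math.log(M / sum(t in s for s in lingo_sets))
            for t in vocabulary}

# First string is the DB00217 example from the text; the others are toy inputs.
smiles_list = ["CN/C(=NC)/NCc1ccccc1", "NCc1ccccc1", "CCO"]
sets = [lingos(s) for s in smiles_list]
idf = idf_weights(sets)
```

A term found in only one description ("CN/C") receives a larger weight than one shared by two descriptions ("c1cc"), in line with the rule that frequent terms weigh less.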
In Step 3, the detailed process of constructing the drug-target relationship similarity matrix S_DTISim from the known target relationships (DTIs) of the drugs is:
First, define {drug_i, i = 1, 2, ..., m} as the set of all drugs, where m is the number of drugs, and {target_l, l = 1, 2, ..., n} as the set of all targets, where n is the number of targets. A is the known drug-target relationship matrix; its element in row i, column l, denoted a_il, is the relation value between the i-th drug drug_i and the l-th target target_l. If drug_i and target_l are related, a_il is 1; otherwise it is 0.
Then, the drug-target relationship similarity S_DTISim(i, j) of drug_i and drug_j is computed from matrix A as the number of targets shared by the two drugs divided by the number of distinct targets of either drug:
S_DTISim(i, j) = (Σ_l a_il · a_jl) / (Σ_l sign(a_il, a_jl)), l = 1, ..., n
where the function sign(a_il, a_jl) returns 1 if either a_il or a_jl is 1, and 0 otherwise.
Finally, all S_DTISim(i, j) form the drug-target relationship similarity matrix S_DTISim, where S_DTISim(i, j) is the element in row i, column j.
The detailed process of Step 4 is: S_SubSim(i, j), S_SmiSim(i, j), and S_DTISim(i, j) are integrated with weights (α, β, 1 − α − β). When the drug to be predicted has no known target relationships (an entirely new drug), there is no S_DTISim similarity data, so the final drug similarity S_Sim(i, j) of drug_i and drug_j is calculated as:
S_Sim(i, j) = α · S_SubSim(i, j) + (1 − α) · S_SmiSim(i, j)
When the drug to be predicted has known target relationships, the final drug similarity S_Sim(i, j) of drug_i and drug_j is calculated as:
S_Sim(i, j) = α · S_SubSim(i, j) + β · S_SmiSim(i, j) + (1 − α − β) · S_DTISim(i, j)
where 0 < α, β < 1.
All S_Sim(i, j) form the final drug similarity matrix S_Sim, where S_Sim(i, j) is the element in row i, column j.
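The weighted integration of Step 4 can be sketched in Python as below. It assumes a plain linear combination: weights (α, β, 1 − α − β) when a DTI similarity matrix is available, and (α, 1 − α) for an entirely new drug. The function name and the toy matrices are hypothetical.

```python
def integrate_similarity(s_sub, s_smi, s_dti=None, alpha=0.1, beta=0.1):
    """Combine similarity matrices (lists of lists) element by element.

    Assumption: a linear combination. With s_dti the weights are
    (alpha, beta, 1 - alpha - beta); without it they are (alpha, 1 - alpha).
    """
    m = len(s_sub)
    if s_dti is None:
        # Entirely new drug: no DTI similarity is available.
        return [[alpha * s_sub[i][j] + (1 - alpha) * s_smi[i][j]
                 for j in range(m)] for i in range(m)]
    gamma = 1 - alpha - beta
    return [[alpha * s_sub[i][j] + beta * s_smi[i][j] + gamma * s_dti[i][j]
             for j in range(m)] for i in range(m)]

# Toy 2 x 2 similarity matrices.
s_sub = [[1.0, 0.4], [0.4, 1.0]]
s_smi = [[1.0, 0.6], [0.6, 1.0]]
s_dti = [[1.0, 0.5], [0.5, 1.0]]
known = integrate_similarity(s_sub, s_smi, s_dti, alpha=0.1, beta=0.1)
new = integrate_similarity(s_sub, s_smi, alpha=0.5)
```

With α = β = 0.1 the DTI similarity carries a weight of 0.8, matching the setting reported for cross-validation; with α = 0.5 and no DTI matrix, the two remaining similarities contribute equally.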
The detailed process of Step 5 is: predict the relationship score between a drug and a target from the final drug similarity matrix S_Sim.
The relationship score of drug_i and target_l is:
where a_jl is the relation value between drug_j and target_l in the known drug-target relationship matrix A. The relationship score between the drug to be predicted, drug_i, and the target target_l is determined by whether drugs similar to drug_i are related to target_l. The parameters Threshold and Ksim define which similar drugs are used when computing the score: Threshold restricts the calculation to drugs whose similarity S_Sim(i, j) to drug_i exceeds Threshold, while Ksim restricts it to the Ksim drugs ranked highest by similarity S_Sim(i, j) to drug_i; drug_i^Ksim denotes the set of the top Ksim drugs by similarity to drug_i. A drug takes part in the calculation if it satisfies either of the two conditions. The values of Threshold and Ksim can be obtained by cross-validation.
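The neighbour selection and scoring of Step 5 can be sketched in Python as follows. The score formula is not reproduced in this text, so the sketch assumes a similarity-weighted sum of a_jl over the selected neighbours, with no normalisation. The function name `relation_scores` and the toy data are hypothetical.

```python
def relation_scores(sim_row, A, i, threshold=0.5, ksim=100):
    """Score every target for drug i from its similar drugs.

    Assumptions: the score is the similarity-weighted sum of a_jl over the
    selected neighbours j, and a neighbour is selected if its similarity
    exceeds `threshold` or it is among the `ksim` most similar drugs.
    """
    others = [j for j in range(len(sim_row)) if j != i]
    ranked = sorted(others, key=lambda j: sim_row[j], reverse=True)
    top = set(ranked[:ksim])
    neighbours = [j for j in others if sim_row[j] > threshold or j in top]
    n_targets = len(A[0])
    return [sum(sim_row[j] * A[j][l] for j in neighbours)
            for l in range(n_targets)]

# Toy example: drug 0 is the query; drugs 1 to 3 have known targets.
sim_row = [1.0, 0.9, 0.6, 0.1]   # row 0 of S_Sim
A = [[0, 0, 0],
     [1, 0, 1],
     [1, 1, 0],
     [0, 1, 1]]
scores = relation_scores(sim_row, A, i=0, threshold=0.5, ksim=1)
# Targets ordered from most to least likely.
ranking = sorted(range(len(scores)), key=lambda l: scores[l], reverse=True)
```

Here drugs 1 and 2 are selected (both exceed the threshold), drug 3 is not, and the first target ranks highest because both selected neighbours are related to it.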
The number of drugs in the invention is m: one drug to be predicted and m − 1 drugs with known drug-target relationships. The drug to be predicted may itself have known drug-target relationships, or it may have none (an entirely new drug). If it has known relationships, Step 3 constructs S_DTISim from the known drug-target relationships of the m drugs, and Step 5 uses the final drug similarity matrix that integrates S_DTISim to compute the scores of its unknown drug-target relationships, thereby predicting them. If it has none, Step 3 does not construct S_DTISim, and Step 5 uses the final drug similarity matrix without S_DTISim to compute the scores of its potential drug-target relationships, thereby predicting them.
Beneficial effects:
Based on existing drug-target associations, the molecular character description information of drugs, and substructure information, the invention proposes a drug-target relationship prediction method. The molecular character description information (SMILES) refers to the Simplified Molecular Input Line Entry Specification, a specification that describes molecular structure with a character string. The method builds drug similarity matrices from known drug-target relationships, substructure information, and SMILES strings, integrates them according to weights, and, on the principle that similar drugs act on similar targets, predicts the target relationships of drugs accurately. The invention builds similarity only from the drugs' molecular character description information and substructure information, does not depend on information such as target sequences, and can predict target relationships for entirely new drug compounds, avoiding the large amounts of labor and materials consumed by biochemical experiments. Predictions fall into two classes: drugs with known drug-target relationships and entirely new drug compounds. For the former, the similarity matrix is built from known drug-target relationships, substructures, and SMILES string similarity; for the latter, only from substructures and SMILES string similarity. For a drug with known drug-target relationships, the invention can predict its relationships with other targets; for an entirely new drug (one without known drug-target relationships), it can predict its relationship with every target.
The invention overcomes the limitation that drug-target relationships cannot be predicted for entirely new compounds, enriches the drug information integrated in the SDTNBI method, and further improves prediction performance. It predicts drug-target relationships without relying on specific biochemical experimental conditions, and provides important reference information for drug repositioning and for further research and development. It uses a prediction model different from the prior art, predicting on the principle that similar drugs act on similar targets, which reduces the prediction bias caused by missing attributes in the prior art and yields better prediction performance.
Description of the drawings
Fig. 1 is the overall flow chart of drug-target relationship prediction based on drug substructures and molecular character description information;
Fig. 2 compares the invention with the SDTNBI method under ten-fold cross-validation; Fig. 2(a) to Fig. 2(e) show the comparison on the data sets GPCRs, Kinases, ICs, NRs, and Global, respectively;
Fig. 3 compares the invention with the SDTNBI method under external validation; Fig. 3(a) and Fig. 3(b) show the comparison on the data sets GPCRs and Kinases, respectively.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and a specific embodiment:
Embodiment 1:
For a drug with some known drug-target relationships, a hybrid drug similarity is built from drug-target relationships, substructures, and molecular character description information, and the drug's target relationships are then predicted. For an entirely new compound, the hybrid similarity is built only from its molecular character description information and chemical substructure information, and the compound's target relationships are then predicted. For known drugs, a benchmark data set provides the substructures, SMILES strings, and known target relationships of all its drugs, and new drug-target relationships are predicted with the integrated similarity model. For a new compound, its substructure information and molecular character description information are compared with the known information of the drugs in the benchmark data set to compute similarity and predict its target relationships.
Five benchmark data sets, GPCRs (G protein-coupled receptors), Kinases (kinases), ICs (ion channels), NRs (nuclear receptors), and Global, were collected from the ChEMBL and BindingDB databases; target relationship prediction for entirely new drugs uses the external data sets ExGPCRs and ExKinases, collected from the DrugBank database.
The overall process of drug-target relationship prediction based on drug substructures and molecular character description information is shown in Fig. 1 and can be divided into the following steps:
(1) Construct the substructure similarity matrix S_SubSim of the drugs from the substructure information of all drugs. Seven kinds of substructure information are used: the CDK fingerprint (CDK), the CDK extended fingerprint (CDKExt), the CDK graph-only fingerprint (Graph), the MACCS fingerprint (MACCS), the PubChem fingerprint (PubChem), the Substructure fingerprint (FP4), and the Klekota-Roth fingerprint (KR). The substructure data used in this study were computed with the PaDEL-Descriptor software (version 2.18).
Define {drug_i, i = 1, 2, ..., m} as the set of all drugs, where m is the number of drugs. d_i is the substructure feature vector of the i-th drug drug_i; the number of features in the vector equals the substructure dimension K (for example, the dimension of the MACCS substructure is 153). If the drug contains a given substructure, the corresponding feature is 1; otherwise it is 0. The structural similarity S_SubSim(i, j) of drug_i and drug_j is the cosine correlation coefficient of their substructure vectors, given by the following formula:
where d_ik and d_jk denote the k-th feature of the substructure feature vectors d_i and d_j of drug_i and drug_j, respectively, and W_k is the weight of the k-th substructure, calculated as follows:
where f_k is the frequency with which the k-th substructure occurs across all drugs, δ is the standard deviation of the frequencies f_k, and h is a preset parameter (set to 0.1 in this study). The purpose of the weight is to give rarely occurring substructures a larger share than frequent ones when computing drug substructure similarity.
Finally, all S_SubSim(i, j) form the drug substructure similarity matrix S_SubSim, where S_SubSim(i, j) is the element in row i, column j.
The GPCRs data set contains 4,741 drugs in total, and the MACCS substructure dimension is 153. After computing the weights {W_k} as above, the similarity of Drug105250 and Drug100109 is 0.0408.
(2) Compute the similarity S_SmiSim from the molecular character description information (SMILES) of the drugs. Each SMILES string is split into a LINGO dictionary set of length-4 substrings; for example, in GPCRs the SMILES string of Drug81951 is "(NC(=O)OCC[N+](C)(C)C", and the LINGO dictionary set extracted from it includes "(NC(", "NC(=", "C(=O", and so on. Each element of a LINGO dictionary set is treated as a term, and the LINGO dictionary set of the i-th drug is denoted D_i. The total term set of all drugs is denoted D, the union of the terms in D_i over all drugs, defined as:
D = D_1 ∪ ... ∪ D_m (3)
Then, the weight idf(t, D) of each term in D is computed. The more frequently a term occurs in the molecular character descriptions of all drugs, the lower its weight. The calculation formula is as follows:
where t is a specific term in the set D; M is the total number of molecular character descriptions, which equals the number of drugs; and m is the number of molecular character descriptions that contain the term (in this formula m is a count of descriptions, not the number of drugs defined above). This gives the weights of all terms in D.
The molecular character description similarity S_SmiSim(i, j) of any two drugs drug_i and drug_j is then computed according to the following formula:
Finally, all S_SmiSim(i, j) form the molecular character description similarity matrix S_SmiSim of the drugs, where S_SmiSim(i, j) is the element in row i, column j. When predicting for a known drug that has drug-target relationships, the similarity of its existing drug-target relationships must also be integrated.
Before the DTI similarity is built, the drug-target relationship network is constructed. In a drug-target interaction data set, define D = {drug_i, i = 1, 2, ..., m} as the set of all drugs, where m is the number of drugs, and T = {target_l, l = 1, 2, ..., n} as the set of all targets, where n is the number of targets. By bipartite network theory, the drug-target interactions can be represented as a bipartite drug-target network with edge set E = {e_il : drug_i ∈ D, target_l ∈ T}; if an experimentally determined interaction exists between drug_i and target_l, they are connected by an edge. Mathematically, the bipartite drug-target network can be expressed as an m × n adjacency matrix {a_il}, where a_il = 1 indicates an experimentally determined interaction between drug_i and target_l, and a_il = 0 otherwise.
The similarity is computed from the known DTI data. The drug-target relationship similarity S_DTISim(i, j) of drug_i and drug_j is the number of targets shared by the two drugs divided by the number of distinct targets of either drug:
S_DTISim(i, j) = (Σ_l a_il · a_jl) / (Σ_l sign(a_il, a_jl)), l = 1, ..., n
where the function sign(a_il, a_jl) returns 1 if either a_il or a_jl is 1, and 0 otherwise. All S_DTISim(i, j) form the drug-target relationship similarity matrix S_DTISim, where S_DTISim(i, j) is the element in row i, column j.
For example, in the GPCRs data set, the drug-target relationships of drug82068 and drug82198 involve 6 distinct targets in total, of which 4 are shared, so their target relationship similarity is 0.6667.
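The worked example above (4 shared targets out of 6 distinct targets giving 0.6667) corresponds to a Jaccard-style ratio over the rows of matrix A. A minimal Python sketch follows; the two binary rows are made-up stand-ins with the same counts, not the actual data of drug82068 and drug82198.

```python
def dti_similarity(a_i, a_j):
    """Shared targets divided by distinct targets of two drugs (binary rows of A)."""
    shared = sum(x * y for x, y in zip(a_i, a_j))
    # sign(a_il, a_jl): 1 if either drug is related to target l, else 0.
    distinct = sum(1 for x, y in zip(a_i, a_j) if x == 1 or y == 1)
    return shared / distinct if distinct else 0.0

# Hypothetical rows: 6 distinct targets overall, 4 of them shared.
row_a = [1, 1, 1, 1, 1, 0, 0]
row_b = [1, 1, 1, 1, 0, 1, 0]
similarity = round(dti_similarity(row_a, row_b), 4)
print(similarity)  # 0.6667
```

Returning 0.0 when neither drug has any known target is a choice made for this sketch; the text does not say how that case is handled.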
(3) Integrate the similarities S_SubSim(i, j), S_SmiSim(i, j), and S_DTISim(i, j) with weights (α, β, 1 − α − β). Depending on the available similarity data and the prediction scenario, the calculation formulas are as follows:
When predicting for an entirely new compound, there is no S_DTISim similarity data, so the final similarity S_Sim(i, j) of drug_i and drug_j is calculated as:
S_Sim(i, j) = α · S_SubSim(i, j) + (1 − α) · S_SmiSim(i, j)
When predicting for an existing drug, the final similarity S_Sim(i, j) of drug_i and drug_j is calculated as:
S_Sim(i, j) = α · S_SubSim(i, j) + β · S_SmiSim(i, j) + (1 − α − β) · S_DTISim(i, j)
(4) From the final drug similarity matrix S_Sim, and following the idea of drug-similarity inference (if a drug interacts with a target protein, drugs similar to it are also likely to act on that target), the predicted relationship score between drug_i and target_l is:
where a_jl is the relation value between drug_j and target_l in the known drug-target relationship matrix A. The relationship score between the drug to be predicted, drug_i, and the target target_l is determined by whether drugs similar to drug_i are related to target_l. The parameters Threshold and Ksim define which similar drugs are used when computing the score: Threshold restricts the calculation to drugs whose similarity S_Sim(i, j) to drug_i exceeds Threshold, while Ksim restricts it to the Ksim drugs ranked highest by similarity S_Sim(i, j) to drug_i; drug_i^Ksim denotes the set of the top Ksim drugs by similarity to drug_i. A drug takes part in the calculation if it satisfies either of the two conditions.
To verify the effectiveness of the method, two kinds of validation were carried out. The first is internal validation: ten-fold cross-validation on the five benchmark data sets GPCRs, Kinases, ICs, NRs, and Global. The second is external validation: for the two external data sets GPCRs and Kinases from DrugBank, predictions for entirely new drugs are validated against the corresponding benchmark data sets.
The data sets, including the validation data sets, are summarized in Table 1 below. N_d, N_t, and N_dt are the numbers of drugs, targets, and drug-target relationships in each data set, and Sparsity is the ratio of N_dt to the number of all possible drug-target relationships.
Table 1: Summary of the data sets
To assess the accuracy of the prediction method, for every pair of training and test sets in cross-validation, the relationship data of all nodes in the test set are deleted from the training set, and the model's predictions are then compared with the DTI relationships in the test set. For each drug under evaluation, the targets are ranked by predicted drug-target relationship score and compared with the DTI relationships in the test set. Several evaluation indices are used to show the precision and robustness of the method: precision (P), recall (R), precision enhancement (e_P), and recall enhancement (e_R). They are briefly described as follows:
where M and N are the numbers of drugs and targets taking part in the prediction, X is the total number of DTI relationships deleted for the M drugs, X_i is the number of DTI relationships deleted for drug_i, and X_i(L) is the number of correct DTI relationships among the top L entries of the prediction list of drug_i. In addition, the performance of the algorithm is also reflected by the AUC (area under the ROC curve).
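The top-L precision and recall can be illustrated with a short Python sketch. The metric formulas are not reproduced in this text, so the sketch assumes the usual definitions, averaging X_i(L)/L and X_i(L)/X_i over the M drugs. The two enhancement indices are left out because their form cannot be recovered here. The function name and the toy drug and target identifiers are hypothetical.

```python
def precision_recall_at_l(ranked_lists, held_out, L):
    """Mean precision and recall over drugs for the top-L predicted targets.

    Assumption: P(L) averages X_i(L)/L and R(L) averages X_i(L)/X_i over
    the drugs; the patent's exact formulas may differ.
    """
    precisions, recalls = [], []
    for drug, ranked in ranked_lists.items():
        truth = held_out[drug]                           # deleted DTIs, X_i of them
        hits = sum(1 for t in ranked[:L] if t in truth)  # X_i(L)
        precisions.append(hits / L)
        recalls.append(hits / len(truth))
    M = len(ranked_lists)
    return sum(precisions) / M, sum(recalls) / M

# Toy data: predicted target rankings and held-out true targets per drug.
ranked_lists = {"d1": ["t1", "t2", "t3", "t4"],
                "d2": ["t3", "t1", "t2", "t4"]}
held_out = {"d1": {"t1", "t3"}, "d2": {"t4"}}
P, R = precision_recall_at_l(ranked_lists, held_out, L=2)
```

In this toy case d1 recovers one of its two held-out targets in the top 2 and d2 recovers none, so both averages come out at 0.25.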
Table 2 gives the performance evaluation index values for each data set after prediction in ten-fold cross-validation, with formula (6) used to integrate the similarities. The parameters for the GPCRs, ICs, Kinases and NRs data sets are (α=0.1, β=0.1, Ksim=100, Threshold=0.5), and the parameters for the Global data set are (α=0.1, β=0.1, Ksim=50, Threshold=0.5). Analysis of the data relationships in ten-fold cross-validation shows that the weight of the drug-target relationship similarity has the greatest influence on the final prediction result, so its weight is set to 0.8, and the substructure similarity and the molecular character description similarity are each given a weight of 0.1. To eliminate low-similarity noise and obtain better prediction results, the threshold is set to 0.5. Besides the threshold of 0.5, the prediction model also limits the number of neighbours that participate in the prediction for each drug. The numbers of drugs in GPCRs, ICs, Kinases and NRs are larger than in Global (see Table 1), so Ksim is set to 100 for the former and 50 for the latter.
Table 2. Algorithm performance indexes in ten-fold cross-validation
On the GPCRs data set, the indexes P, R, ep, er and AUC are stable across the seven substructures, with little difference between them; the AUC reaches 0.953 or above, with a mean of 0.962, which is a good prediction result. On the Global data set the average indexes are lower than on GPCRs: the AUC is worst on the FP4 substructure, only 0.817, but on the other substructures it lies between 0.923 and 0.938. On the ICs data set, the highest AUC reaches 0.971 and the lowest is 0.947. On the Kinases data set, the validation results are stable, with all AUC values at 0.961 or above and a highest value of 0.963. On the NRs data set, performance is relatively poorer, with AUC values between 0.909 and 0.928.
Table 3 gives the evaluation index values of the prediction results for entirely new compounds on each data set in external validation (using the GPCRs and Kinases data sets from the DrugBank database), with formula (5) used to integrate the similarities. The parameters for the external GPCRs data set are (α=0.5, Ksim=0, Threshold=0.1), where Ksim=0 means that drugs participate in the prediction regardless of their rank. The parameters for the Kinases data set are (α=0.5, Ksim=0, Threshold=0.0), where Ksim=0 and Threshold=0.0 mean that drugs participate in the prediction regardless of their similarity value and rank. The substructure similarity and the molecular character description similarity are integrated with equal weights; the neighbour number Ksim is set to 0, that is, no restriction; and Threshold is set to 0.1 for the GPCRs data set and 0 (no restriction) for the Kinases data set.
Table 3. Algorithm performance indexes in external validation
As Table 3 shows, because the prediction for new compounds lacks their known drug-target relationships, the AUC values are lower than those of the ten-fold cross-validation in Table 2, which indirectly shows the importance of known DTI relationships. In the GPCRs validation, the AUC reaches its highest value, 0.841, in the FP4+smi integrated prediction; in the Kinases data set, the highest AUC is obtained in the KR+smi integrated prediction. The other indexes also reach high values.
The data sets used in the present invention are those disclosed with the SDTNBI method, which uses a network-based allocation model and defines the same performance evaluation indexes as above. The prediction performance of the present invention is therefore compared with that of the SDTNBI method. Because the four indexes precision (P), recall (R), precision enhancement (ep) and recall enhancement (er) depend on the specific parameter L=20 used in validation, they do not reflect overall performance well. These four indexes are therefore ignored in the comparison, and only the AUC value, which does not involve any parameter, is compared.
Fig. 2 shows that on the Global data set the maximum and minimum AUC values obtained by integrating different substructures are both worse than the results of the SDTNBI method; the mean is 0.928 for SDTNBI and 0.913 for the new method. This may be related to the density of the known drug-target relationships in the Global data set, because the ten-fold cross-validation for known drugs integrates the similarity information of known drug-target relationships. As Table 1 shows, Global is the only data set in which the density of known drug-target relationships is below 1% (it is 0.54%). This affects the similarity data constructed by the present invention and in turn the final prediction result.
On the other four data sets, GPCRs, Kinases, ICs and NRs, the mean and minimum AUC values obtained by integrating different substructures are clearly better than those of the SDTNBI method, while the maximum values differ little. This shows that on these four data sets the prediction results of the new method are more stable than those of the original SDTNBI method. In particular, when it is difficult to obtain multiple types of substructure information for a drug, the new method can be more reliable than the SDTNBI method.
As Fig. 3 shows, in the validation test on entirely new drugs, the results on the Kinases data set are slightly worse than those of SDTNBI: the mean AUC is 0.853 for SDTNBI against 0.848 for the new method, the maximum AUC is 0.863 against 0.856, and the minimum AUC is 0.847 against 0.841. In the validation test on the GPCRs data set, the prediction performance of the new method improves on SDTNBI: the mean AUC is 0.817 for the new method against 0.766 for SDTNBI, and the maximum and minimum AUC values are likewise better.
The comparisons in the two respects above show that, when the known drug-target relationships meet certain conditions, the new integrated method can provide more accurate prediction results than the original SDTNBI method, and offers a meaningful basis for further research on drug repositioning and new drug development.

Claims (6)

1. A drug-target relationship prediction method based on drug substructure and molecular character description information, characterized in that it comprises the following steps:
Step 1: constructing a drug substructure similarity matrix S_SubSim from the substructure information of all drugs;
Step 2: constructing a drug molecular character description information similarity matrix S_SmiSim from the molecular character description information of all drugs;
Step 3: judging whether the drug to be predicted has known drug-target relationships; if the drug to be predicted has known drug-target relationships, constructing a drug-target relationship similarity matrix S_DTISim from the known drug-target relationships of the drugs; if the drug to be predicted has no known drug-target relationships, not constructing the drug-target relationship similarity matrix S_DTISim;
Step 4: integrating the drug similarity matrices constructed above into a final drug similarity matrix S_Sim;
Step 5: calculating the relationship score between the drug to be predicted and a target from the known drug-target relationships of other drugs similar to the drug to be predicted; ranking the scores, where a higher-ranked score indicates a greater possibility that a relationship exists for the drug-target pair.
2. The drug-target relationship prediction method based on drug substructure and molecular character description information according to claim 1, characterized in that the detailed process of step 1 is:
First, define {drug_i, i=1,2,...,m} as the set of all drugs, where m is the number of drugs; d_i is the substructure feature vector of the i-th drug drug_i, and the number of feature values in the vector equals the substructure dimension K; if the drug contains a given substructure, the corresponding feature value is 1, otherwise it is 0;
Then, the structural similarity S_SubSim(i, j) of drug_i and drug_j is calculated from the cosine correlation coefficient of the substructure feature vectors; the calculation formula is as follows:
where d_ik and d_jk denote the k-th feature values of the substructure feature vectors d_i and d_j of drug_i and drug_j, respectively; W_k is the weight of the k-th substructure, calculated as follows:
where f_k is the frequency with which the k-th substructure occurs in all drugs, δ is the corresponding standard deviation, and h is a preset parameter;
Finally, all S_SubSim(i, j), i, j=1,2,...,m, constitute the drug substructure similarity matrix S_SubSim; S_SubSim(i, j) is the element in row i and column j of S_SubSim.
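A weighted cosine similarity over 0/1 substructure vectors can be sketched as below. This is an illustration under stated assumptions: the two formulas of this claim are not reproduced in the text, so the weights W_k are taken as a given non-negative input rather than computed from f_k, δ and h, and the weighted-cosine form itself is assumed. The function name `weighted_cosine_matrix` is hypothetical.

```python
import numpy as np

def weighted_cosine_matrix(F, w):
    """Pairwise weighted cosine similarity between drugs.

    F: (m, K) 0/1 substructure feature vectors, one row per drug.
    w: (K,) non-negative substructure weights (assumed to be given).
    """
    F = np.asarray(F, dtype=float)
    w = np.asarray(w, dtype=float)
    G = F * np.sqrt(w)                   # fold each weight into the vectors
    norms = np.linalg.norm(G, axis=1)
    norms[norms == 0] = 1.0              # drugs with no substructure score 0
    return (G @ G.T) / np.outer(norms, norms)
```

Scaling each column by the square root of its weight makes the ordinary dot product equal to the weighted one, so the whole matrix is obtained with a single matrix product.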
3. The drug-target relationship prediction method based on drug substructure and molecular character description information according to claim 2, characterized in that the detailed process of step 2 is:
First, the LINGO dictionary set of substrings of length 4 is extracted from the molecular character description information of each drug; each element of a LINGO dictionary set is treated as a term; the LINGO dictionary set of the i-th drug is denoted D_i; the union of the LINGO dictionary sets of all drugs is denoted D, that is:
D = D_1 ∪ ... ∪ D_m
Then, the weight idf(t, D) of each term in the set D is calculated; the calculation formula is as follows:
where t is a specific term in the set D, and M is the number of molecular character descriptions that contain the term;
The molecular character description information similarity S_SmiSim(i, j) of any two drugs drug_i and drug_j is then calculated according to the following formula:
Finally, all S_SmiSim(i, j), i, j=1,2,...,m, constitute the molecular character description information similarity matrix S_SmiSim of the drugs; S_SmiSim(i, j) is the element in row i and column j of S_SmiSim.
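The extraction of length-4 LINGO terms and their idf weights can be sketched as follows. Several points are assumptions: the molecular character description is taken to be a SMILES string, the idf weight is taken in its standard form log(m / M) because the formula is not reproduced in this text, and the function names `lingos` and `idf_weights` are hypothetical.

```python
import math

def lingos(smiles, q=4):
    """Return the set of all substrings of length q (the LINGO terms)."""
    return {smiles[k:k + q] for k in range(len(smiles) - q + 1)}

def idf_weights(smiles_list, q=4):
    """Map each term t in D to an idf weight (assumed form: log(m / M))."""
    dictionaries = [lingos(s, q) for s in smiles_list]    # D_1 ... D_m
    vocabulary = set().union(*dictionaries)               # D, the union
    m = len(dictionaries)
    return {
        t: math.log(m / sum(t in d for d in dictionaries))  # M = descriptions containing t
        for t in vocabulary
    }
```

With this form, a term that occurs in every drug gets weight 0, so very common fragments contribute nothing to the similarity.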
4. The drug-target relationship prediction method based on drug substructure and molecular character description information according to claim 3, characterized in that, in step 3, the detailed process of constructing the drug-target relationship similarity matrix S_DTISim from the known target relationships of the drugs is:
First, define {drug_i, i=1,2,...,m} as the set of all drugs, where m is the number of drugs; {target_l, l=1,2,...,n} is the set of all targets, where n is the number of targets; A is the known drug-target relationship matrix, and the element in row i and column l of A is denoted a_il, which represents the relation value between the i-th drug drug_i and the l-th target target_l; if drug_i and target_l have a relationship, a_il is 1, otherwise it is 0;
Then, the drug-target relationship similarity S_DTISim(i, j) of drug_i and drug_j is calculated from the matrix A; the calculation formula is as follows:
where the function sign(a_il, a_jl) returns 1 if either of a_il and a_jl is 1, and returns 0 otherwise;
Finally, all S_DTISim(i, j) constitute the drug-target relationship similarity matrix S_DTISim; S_DTISim(i, j) is the element in row i and column j of S_DTISim.
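One way to read this step is as a Jaccard-style ratio, sketched below. This is an assumption rather than the claimed formula, which is not reproduced in this text: the sum of sign(a_il, a_jl) over all targets counts the targets related to either drug, and the sketch assumes that the number of shared targets is divided by that count. The function name `dti_similarity_matrix` is hypothetical.

```python
import numpy as np

def dti_similarity_matrix(A):
    """Pairwise similarity of drugs from their known 0/1 target profiles.

    A: (m, n) known drug-target relationship matrix.
    Assumed form: shared targets / targets related to either drug.
    """
    A = np.asarray(A, dtype=float)
    shared = A @ A.T                                       # targets related to both drugs
    counts = A.sum(axis=1)
    either = counts[:, None] + counts[None, :] - shared    # sum over l of sign(a_il, a_jl)
    out = np.zeros_like(shared)
    np.divide(shared, either, out=out, where=either > 0)   # leave 0 where no targets exist
    return out
```

A drug with no known targets gets similarity 0 to every drug in this sketch, which matches the case in step 3 where no such matrix is built for new drugs.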
5. The drug-target relationship prediction method based on drug substructure and molecular character description information according to claim 4, characterized in that the detailed process of step 4 is: integrating S_SubSim(i, j), S_SmiSim(i, j) and S_DTISim(i, j) with the weights (α, β, 1-α-β): when the drug to be predicted has no known target relationships, the final drug similarity S_Sim(i, j) of drug_i and drug_j is calculated by the formula:
When the drug to be predicted has known target relationships, the final drug similarity S_Sim(i, j) of drug_i and drug_j is calculated by the formula:
where 0 < α, β < 1;
All S_Sim(i, j) constitute the final drug similarity matrix S_Sim; S_Sim(i, j) is the element in row i and column j of S_Sim.
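The weighted integration can be sketched as below. The three-matrix case follows the weights (α, β, 1-α-β) stated in this claim; the two-matrix case for new drugs is an assumption that the remaining weight 1-α goes to the molecular character description similarity, since the formula is not reproduced in this text. The function name `integrate_similarities` and its defaults are hypothetical.

```python
import numpy as np

def integrate_similarities(S_sub, S_smi, S_dti=None, alpha=0.1, beta=0.1):
    """Combine the similarity matrices into the final matrix S_Sim."""
    S_sub = np.asarray(S_sub, dtype=float)
    S_smi = np.asarray(S_smi, dtype=float)
    if S_dti is None:
        # New drug without known relationships (assumed form).
        return alpha * S_sub + (1 - alpha) * S_smi
    S_dti = np.asarray(S_dti, dtype=float)
    # Drug with known relationships: weights (alpha, beta, 1 - alpha - beta).
    return alpha * S_sub + beta * S_smi + (1 - alpha - beta) * S_dti
```

For example, α=0.1 and β=0.1 give the drug-target relationship similarity a weight of 0.8, and α=0.5 without S_DTISim weights the two remaining similarities equally.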
6. The drug-target relationship prediction method based on drug substructure and molecular character description information according to claim 5, characterized in that the detailed process of step 5 is: predicting the relationship score between a drug and a target from the final drug similarity matrix S_Sim;
The relationship score of drug_i and target_l is:
where a_jl is the relation value between drug_j and target_l in the known drug-target relationship matrix A; the parameters Threshold and Ksim determine which drugs similar to drug_i are used when the relationship score is calculated: the former restricts the calculation to drugs whose similarity S_Sim(i, j) to drug_i is greater than Threshold; the latter restricts it to the Ksim drugs whose similarity S_Sim(i, j) to drug_i ranks highest; drug_i^Ksim denotes the set of the Ksim drugs most similar to drug_i; a drug participates in the calculation as long as it satisfies either of the two conditions.
CN201610953873.3A 2016-11-03 2016-11-03 Drug-target relationship prediction method based on drug substructure and molecular character description information Active CN106529205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610953873.3A CN106529205B (en) 2016-11-03 2016-11-03 Drug-target relationship prediction method based on drug substructure and molecular character description information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610953873.3A CN106529205B (en) 2016-11-03 2016-11-03 Drug-target relationship prediction method based on drug substructure and molecular character description information

Publications (2)

Publication Number Publication Date
CN106529205A CN106529205A (en) 2017-03-22
CN106529205B true CN106529205B (en) 2019-03-26

Family

ID=58325471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610953873.3A Active CN106529205B (en) 2016-11-03 2016-11-03 Drug-target relationship prediction method based on drug substructure and molecular character description information

Country Status (1)

Country Link
CN (1) CN106529205B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292130B (en) * 2017-06-09 2019-11-26 西安电子科技大学 Drug method for relocating based on gene mutation and gene expression
CN108647484B (en) * 2018-05-17 2020-10-23 中南大学 Medicine relation prediction method based on multivariate information integration and least square method
CN109411033B (en) * 2018-11-05 2021-08-31 杭州师范大学 Drug efficacy screening method based on complex network
CN109887540A (en) * 2019-01-15 2019-06-14 中南大学 A kind of drug targets interaction prediction method based on heterogeneous network insertion
CN110444250A (en) * 2019-03-26 2019-11-12 广东省微生物研究所(广东省微生物分析检测中心) High-throughput drug virtual screening system based on molecular fingerprint and deep learning
CN110853714B (en) * 2019-10-21 2023-04-21 天津大学 Drug repositioning system based on pathogenic contribution network analysis
CN110957002B (en) * 2019-12-17 2023-04-28 电子科技大学 Drug target interaction relation prediction method based on synergistic matrix decomposition
CN111477344B (en) * 2020-04-10 2023-06-09 电子科技大学 Drug side effect identification method based on self-weighted multi-core learning
CN111524546B (en) * 2020-04-14 2022-05-03 湖南大学 Drug-target interaction prediction method based on heterogeneous information
CN111755078B (en) * 2020-07-30 2022-09-23 腾讯科技(深圳)有限公司 Drug molecule attribute determination method, device and storage medium
CN112133367B (en) * 2020-08-17 2024-07-12 中南大学 Method and device for predicting interaction relationship between medicine and target point
CN112216353B (en) * 2020-11-02 2024-04-02 长沙理工大学 Method and apparatus for predicting drug-target interaction relationship
CN112435720B (en) * 2020-12-04 2021-10-26 上海蠡图信息科技有限公司 Prediction method based on self-attention mechanism and multi-drug characteristic combination
CN112863634B (en) * 2021-01-12 2022-09-20 山东大学 Traditional Chinese medicine prescription recommendation method and system based on new crown protein heterogeneous network clustering
CN114023397B (en) * 2021-09-16 2024-05-10 平安科技(深圳)有限公司 Drug redirection model generation method and device, storage medium and computer equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065066A (en) * 2013-01-22 2013-04-24 四川大学 Drug combination network based drug combined action predicting method
CN103902848A (en) * 2012-12-28 2014-07-02 深圳先进技术研究院 System and method for identifying drug targets based on drug interaction similarities

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902848A (en) * 2012-12-28 2014-07-02 深圳先进技术研究院 System and method for identifying drug targets based on drug interaction similarities
CN103065066A (en) * 2013-01-22 2013-04-24 四川大学 Drug combination network based drug combined action predicting method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
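A drug-target association prediction algorithm based on multi-information fusion; Peng Lihong et al.; Computer Engineering; 2016-06-30; Vol. 42, No. 6; pp. 218-223
Research progress on predicting drug targets based on cheminformatics methods; Fang Jiansong et al.; Acta Pharmaceutica Sinica; 2014-10-12; Vol. 49, No. 10; pp. 1357-1364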

Also Published As

Publication number Publication date
CN106529205A (en) 2017-03-22

Similar Documents

Publication Publication Date Title
CN106529205B (en) Drug-target relationship prediction method based on drug substructure and molecular character description information
Yuste et al. A community-based transcriptomics classification and nomenclature of neocortical cell types
CN105653846B (en) Drug method for relocating based on integrated similarity measurement and random two-way migration
CN107506591B (en) Medicine repositioning method based on multivariate information fusion and random walk model
Browning et al. Quantitative analysis of tumour spheroid structure
CN109887540A (en) A kind of drug targets interaction prediction method based on heterogeneous network insertion
Zou et al. Approaches for recognizing disease genes based on network
CN114334038B (en) Disease medicine prediction method based on heterogeneous network embedded model
CN107545151A (en) A kind of medicine method for relocating based on low-rank matrix filling
CN110021341A (en) A kind of prediction technique of GPCR drug based on heterogeneous network and targeting access
Zhou et al. EL_LSTM: prediction of DNA-binding residue from protein sequence by combining long short-term memory and ensemble learning
CN114649097A (en) Medicine efficacy prediction method based on graph neural network and omics information
CN112420126A (en) Drug target prediction method based on multi-source data fusion and network structure disturbance
Krivov Numerical construction of the p fold (committor) reaction coordinate for a Markov process
Pouyan et al. Clustering single-cell expression data using random forest graphs
Sun et al. Protein function prediction using function associations in protein–protein interaction network
Sottosanti et al. Co-clustering of spatially resolved transcriptomic data
Razdaibiedina et al. PIFiA: self-supervised approach for protein functional annotation from single-cell imaging data
Mathur Bioinformatics challenges: a review
CN110534153A (en) Target prediction system and method based on deep learning
Wang et al. Prediction of the disease causal genes based on heterogeneous network and multi-feature combination method
Wang et al. Feature selection methods in the framework of mRMR
Lei Model-driven design and uncertainty quantification for cardiac electrophysiology experiments
Zhang et al. Application of machine learning techniques in drug-target interactions prediction
Villoutreix Randomness and variability in animal embryogenesis, a multi-scale approach

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200513

Address after: 410000 No. 678 Qingshan Road, Yuelu District, Changsha City, Hunan Province

Patentee after: HUNAN CREATOR INFORMATION TECHNOLOGIES Co.,Ltd.

Address before: Yuelu District City, Hunan province 410083 Changsha Lushan Road No. 932

Patentee before: CENTRAL SOUTH University

TR01 Transfer of patent right