CN102542131A

CN102542131A - Method for predicting medicament pharmacokinetic properties and toxicity on basis of genetic algorithm and artificial neural network

Info

Publication number: CN102542131A
Application number: CN2010105833916A
Authority: CN
Inventors: 卢小泉; 韩文静; 周喜斌; 陈晶; 李鹏霞; 张玲屏; 张翠忠; 姬东琴
Original assignee: Northwest Normal University
Current assignee: Northwest Normal University
Priority date: 2010-12-07
Filing date: 2010-12-07
Publication date: 2012-07-04

Abstract

The present invention provides a method for predicting the pharmacokinetic properties and toxicity of drug molecules based on a genetic algorithm (Genetic Algorithm, GA) and an artificial neural network (Artificial Neural Network, ANN), comprising the following steps: (1) calculation of drug molecule descriptors; (2) preprocessing of the drug molecule descriptor dataset; (3) rescaling of the drug molecule descriptor dataset; (4) simultaneous optimization of the molecular descriptors and the artificial neural network parameters using an integrated approach; (5) use of the selected optimal drug molecule descriptors and artificial neural network parameters to build a model that predicts the pharmacokinetic properties and toxicity of drug molecules. The method combines the advantages of the genetic algorithm, the artificial neural network and computer processing, so that the drug molecule descriptors and the artificial neural network parameters are optimized at the same time, which greatly improves prediction quality and efficiency.

Description

Method for predicting drug pharmacokinetic properties and toxicity based on a genetic algorithm and an artificial neural network

Technical field

The present invention relates to the field of computer-aided drug design, and specifically to a method for predicting the pharmacokinetic properties and toxicity of drug molecules based on a genetic algorithm and an artificial neural network, applicable to predicting these properties from drug molecular structure information.

Background technology

At present, the usual method, both in China and abroad, for predicting the pharmacokinetic properties and toxicity of drug molecules is to build a prediction model directly with an artificial neural network. Such a single-method prediction optimizes only the ANN parameters and leaves the drug molecule descriptors unoptimized, which causes two problems. 1. Interference from irrelevant descriptors: among the large number of calculated descriptors, some have no correlation with the pharmacokinetic properties or toxicity of the drug molecule. Adding them does not improve the predictive ability of the model; instead they interfere with the model and make the predictions inaccurate. Therefore, more input descriptors are not always better, and the descriptor set should be optimized for each case. 2. Sample size: when many descriptors are fed into the model, a substantially larger number of samples is needed to obtain good predictions. This is often difficult to satisfy in practice, because pharmacokinetic and toxicity experiments are difficult and expensive, which limits the number of samples that can be obtained.

Summary of the invention

To solve the problem of optimizing the drug molecule descriptors, the object of the invention is to provide a method for predicting drug pharmacokinetic properties and toxicity based on a genetic algorithm and an artificial neural network.

Using a computer to predict the pharmacokinetic properties and toxicity of drug molecules at the early stage of drug development can reduce the risk of later development stages and lower R&D costs. The invention combines the genetic algorithm with the artificial neural network: it exploits the survival-of-the-fittest principle of the genetic algorithm, the statistical learning ability of the artificial neural network, and the computer's ability to process large amounts of data quickly, and it optimizes both the drug molecule descriptors and the artificial neural network parameters so that both reach an optimal state. This greatly improves the quality and efficiency of predicting the pharmacokinetic properties and toxicity of drug molecules. Practical results show that the prediction method of the invention is accurate, saves time and labor, generalizes well, and is highly timely.

The object of the invention is achieved by the following technical solution:

A method for predicting drug pharmacokinetic properties and toxicity based on a genetic algorithm and an artificial neural network comprises the following steps:

(1) Calculation of drug molecule descriptors;

(2) Preprocessing of the drug molecule descriptor dataset: descriptors with no research value are deleted according to two criteria: (a) the descriptor's value is zero for more than 90% of the molecules; (b) its correlation coefficient with another drug molecule descriptor is greater than 90%;

(3) Rescaling of the drug molecule descriptor dataset: the descriptor dataset is mapped onto the interval [-1, +1] using the mapping formula:

x_{pre} = \frac{2 x - (x_{\max} + x_{\min})}{x_{\max} - x_{\min}}

where x is the original value of the drug molecule descriptor, x_pre is the rescaled value, and x_max and x_min are the maximum and minimum values of the corresponding drug molecule descriptor, respectively.

(4) An integrated approach is used to optimize the drug molecule descriptors and the artificial neural network parameters simultaneously, comprising the following steps:

i. Set the population size, the termination condition and the ANN parameter range. The population contains 20 individuals; the termination condition is that the fitness value does not change within 100 generations; the ANN parameter range is 2-10;

ii. Encode the chromosome: the drug molecule descriptors are binary-coded, with one binary bit per descriptor that is 1 if the descriptor is selected and 0 otherwise; the ANN parameter is decimal-coded;

iii. Initialize the population: generate 20 individuals at random;

iv. Build a model for each individual and calculate the fitness of each individual in the initial population from the RMSE produced by its model, using the formula:

\text{fitness} = w_r \times \text{RMSE} + w_f \times N

where RMSE is the root-mean-square prediction error of leave-one-out cross-validation, N is the number of selected molecular descriptors, and w_r and w_f are the corresponding weight factors;

v. Check whether the termination condition is met; if it is, exit, otherwise continue with the following operations;

vi. Perform selection with an elitist strategy, so that the individual with the best fitness value in the previous generation always appears in the next generation;

vii. Apply random five-point crossover to the selected individuals;

viii. Apply conventional single-point mutation to the selected individuals to produce a new population;

ix. Return to step iv;

(5) When the termination condition is met, the optimal drug molecule descriptors and ANN parameters have been determined and the optimal model has been built; the drug molecule descriptors of a drug to be predicted are then input into the model to obtain its pharmacokinetic property and toxicity values.
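Step iv scores each individual by fitness = w_r × RMSE + w_f × N, where RMSE comes from leave-one-out cross-validation. The Python sketch below illustrates only that scoring, under stated assumptions: the names `loo_rmse`, `fitness` and `mean_predictor` are hypothetical, the ANN is replaced by a trivial mean predictor purely for illustration, and the default weights are the embodiment's w_r = 0.02 and w_f = 0.45. In a real implementation the decimal-coded neuron count from the chromosome would be handed to the ANN inside `fit_predict`.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface: train on (X_train, y_train), return the prediction for x_test.
FitPredict = Callable[[List[List[float]], List[float], List[float]], float]


def loo_rmse(X: Sequence[Sequence[float]], y: Sequence[float],
             fit_predict: FitPredict) -> float:
    """Leave-one-out RMSE: each sample is predicted by a model trained on all the others."""
    sq_err = 0.0
    for i in range(len(y)):
        X_train = [list(row) for j, row in enumerate(X) if j != i]
        y_train = [v for j, v in enumerate(y) if j != i]
        pred = fit_predict(X_train, y_train, list(X[i]))
        sq_err += (pred - y[i]) ** 2
    return math.sqrt(sq_err / len(y))


def fitness(X: Sequence[Sequence[float]], y: Sequence[float], mask: Sequence[int],
            fit_predict: FitPredict, w_r: float = 0.02, w_f: float = 0.45) -> float:
    """fitness = w_r * RMSE + w_f * N, with N = number of 1-bits in the mask. Lower is better."""
    selected = [j for j, bit in enumerate(mask) if bit == 1]
    X_sel = [[row[j] for j in selected] for row in X]
    return w_r * loo_rmse(X_sel, y, fit_predict) + w_f * len(selected)


def mean_predictor(X_train: List[List[float]], y_train: List[float],
                   x_test: List[float]) -> float:
    """Stand-in for the ANN (assumption): always predicts the mean training target."""
    return sum(y_train) / len(y_train)


X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
y = [1.0, 2.0, 3.0]
print(loo_rmse(X, y, mean_predictor))         # sqrt(1.5), about 1.2247
print(fitness(X, y, [1, 0], mean_predictor))  # 0.02 * sqrt(1.5) + 0.45 * 1
```

With these weights, selecting one more descriptor adds 0.45 to the fitness, which an RMSE reduction only offsets if it exceeds 0.45 / 0.02 = 22.5 in the units of the target, so the search is pushed strongly toward small descriptor subsets.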

Compared with the prior art, the invention has the following advantages:

The survival-of-the-fittest idea of the genetic algorithm is introduced into the prediction of drug molecule pharmacokinetic properties and toxicity, and the statistical learning ability of the artificial neural network and the computer's ability to process large amounts of data quickly are used effectively to optimize the drug molecule descriptors and the artificial neural network parameters so that both reach an optimal state, greatly improving prediction quality and efficiency. The prediction accuracy can exceed 85%, and the number of drug molecule descriptors used is reduced by more than 95% compared with other methods. The method is of key importance to computer-aided drug design and to original new drug research and development.

Description of drawings

Fig. 1 is a flow chart of the optimization of the drug molecule descriptors and ANN parameters.

Fig. 2 is a flow chart of predicting drug molecule pharmacokinetic properties and toxicity based on the genetic algorithm and neural network according to the invention.

Fig. 3 shows the prediction results for drug molecule pharmacokinetic properties and toxicity based on the genetic algorithm and neural network.

Fig. 3 a is the figure as a result that microbiotic and the interactional action constant of DNA and the action constant value of measuring through experiment of model prediction compared;

Fig. 3 b is the IC of the BRAF suppressant of model prediction ₅₀Value and the IC that measures through experiment ₅₀The figure as a result that value is compared;

Fig. 3c compares the model-predicted EC₅₀ values of enterovirus inhibitors with the experimentally measured EC₅₀ values.

Embodiment

To make the object, technical solution and advantages of the invention clearer, the technical solution is further described below with reference to the accompanying drawings.

The method for predicting drug molecule pharmacokinetic properties and toxicity based on a genetic algorithm and an artificial neural network is as follows:

(1) Drug molecules are collected to build a large sample set, and their molecular descriptors are calculated with descriptor-calculation software. Thirty antibiotics, 30 BRAF inhibitors and 21 enterovirus inhibitors were collected. The structure of each drug molecule was drawn in HyperChem and input into Gaussian, where its conformation was optimized at the HF/6-31G* level to find its minimum-energy state. After optimization, the optimized conformations were imported into MODEL to calculate the drug molecule descriptors. In total, 3778 descriptors in six major classes were calculated: electronic descriptors, constitutional descriptors, topological index descriptors, physicochemical descriptors, geometric descriptors and quantum chemical descriptors. The electronic descriptors include the dipole moment, the electron-density connectivity index, the topological charge index, etc.; the constitutional descriptors include the number of rings, number of atoms, number of hydrogen bonds, atomic weight, hydrogen bond donors and hydrogen bond acceptors of the molecule, etc.; the topological index descriptors include the Schultz topological index, Gutman topological index, Balaban connectivity index, Wiener index, chi (CHI) molecular connectivity indices, kappa shape indices, Hosoya index, Zagreb indices, Moreau-Broto topological autocorrelation descriptors, Moran topological autocorrelation descriptors, etc.; the physicochemical descriptors include the oil/water partition coefficient and the polarizability; the geometric descriptors include the principal moments of inertia, molecular volume, molecular surface area, etc.; the quantum chemical descriptors include the highest occupied molecular orbital, the lowest unoccupied molecular orbital, the total energy, etc.

(2) The drug molecule descriptor dataset is preprocessed: descriptors whose values are mostly 0, or whose correlation coefficient with another descriptor is greater than 90%, are deleted. The purpose of preprocessing is to remove descriptors with no research value, reducing redundancy and improving prediction accuracy. After preprocessing, 1777 of the 3778 descriptors remained.

(3) To make the predictions more accurate, the 1777 descriptors are rescaled, i.e., the dataset containing the 1777 descriptors is mapped onto the interval [-1, +1] by the formula:

x_{pre} = \frac{2 x - (x_{\max} + x_{\min})}{x_{\max} - x_{\min}}

where x is the original value of the drug molecule descriptor, x_pre is the rescaled value, and x_max and x_min are the maximum and minimum values of the corresponding drug molecule descriptor, respectively.

This completes the preprocessing of the drug molecule descriptors and the rescaling of the descriptor dataset, i.e., the preliminary preparation.
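The two preparation steps can be sketched in Python as follows. This is a minimal illustration, not the patent's program: the function names are hypothetical, descriptors are stored column-wise as plain lists, the correlation test uses the absolute Pearson coefficient, a descriptor is dropped when it correlates above 0.9 with a descriptor already kept (a greedy order we chose), and a constant column is mapped to 0.0 because the rescaling formula is undefined when x_max = x_min.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def preprocess(columns: List[List[float]], zero_frac: float = 0.9,
               max_corr: float = 0.9) -> List[int]:
    """Return the indices of descriptor columns that pass both filters of step (2)."""
    kept: List[int] = []
    for j, col in enumerate(columns):
        # (a) drop a descriptor that is zero for more than 90% of the molecules
        if sum(1 for v in col if v == 0) / len(col) > zero_frac:
            continue
        # (b) drop a descriptor whose |r| with an already kept descriptor exceeds 0.9
        if any(abs(pearson(col, columns[k])) > max_corr for k in kept):
            continue
        kept.append(j)
    return kept


def rescale(col: Sequence[float]) -> List[float]:
    """Step (3): x_pre = (2x - (x_max + x_min)) / (x_max - x_min), mapping onto [-1, +1]."""
    x_min, x_max = min(col), max(col)
    if x_max == x_min:  # formula undefined for a constant column; 0.0 is our choice
        return [0.0] * len(col)
    return [(2 * x - (x_max + x_min)) / (x_max - x_min) for x in col]


columns = [
    [1.0, 2.0, 3.0, 4.0],  # kept
    [2.0, 4.0, 6.0, 8.5],  # r about 0.998 with column 0, dropped
    [0.0, 0.0, 0.0, 0.0],  # all zero, dropped
    [4.0, 1.0, 3.0, 2.0],  # r = -0.4 with column 0, kept
]
print(preprocess(columns))        # [0, 3]
print(rescale([0.0, 5.0, 10.0]))  # [-1.0, 0.0, 1.0]
```

Note that the minimum of each descriptor maps to -1 and the maximum to +1, which is what the formula yields when x = x_min or x = x_max.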

(4) The drug molecule descriptors and ANN parameters are optimized with the integrated method, and a computer program is built to predict drug molecule pharmacokinetic properties and toxicity.

First, the termination condition and the mutation rate of the optimization are set: the termination condition is that the fitness value does not change within 100 generations, and the mutation rate is 2%. The chromosome is encoded by binary-coding the 1777 drug molecule descriptors and decimal-coding the number of neuron nodes; an initial population of 20 individuals is then generated at random, with the generation counter K = 1. For each individual in the initial population, which comprises a descriptor subset and ANN parameters, a model is trained and built and the individual's fitness value is calculated as fitness = w_r × RMSE + w_f × N, where RMSE is the root-mean-square prediction error of leave-one-out cross-validation, N is the number of selected molecular descriptors, and w_r and w_f are the corresponding weight factors, with w_r = 0.02 and w_f = 0.45. The termination condition is then checked: if it is met, the optimization stops; otherwise K = K + 1 and the following operations continue. The individuals in the population are ranked by fitness value and 2 individuals are selected with the elitist strategy, so that the individual with the best fitness value in the previous generation always appears in the next generation. The 2 selected individuals undergo five-point crossover to produce new individuals. Conventional single-point mutation is applied to the population at a 2% mutation rate, i.e., a 0 becomes 1 or a 1 becomes 0. The above process is repeated with the new population until the fitness value does not change within 100 generations.
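The loop just described can be sketched in Python as below, using the embodiment's settings (20 individuals, 2 elites, five-point crossover, 2% mutation rate, stop after 100 generations without change, neuron count in 2-10). This is a sketch under our own assumptions, not the patent's program: all names are hypothetical; crossover acts on the concatenated gene string (descriptor bits followed by the neuron gene); following a literal reading of the text, every child is bred from the two elites; mutation flips each bit with probability 2% and resamples the neuron gene with the same probability; `max_gen` is an added safety cap; and `evaluate` stands for the leave-one-out fitness of step iv, which the caller supplies.

```python
import random
from typing import Callable, List, Tuple

# A chromosome: (binary descriptor mask, decimal-coded number of hidden neurons).
Chromosome = Tuple[List[int], int]
HIDDEN_RANGE = (2, 10)  # ANN parameter range from step i


def random_individual(n_desc: int, rng: random.Random) -> Chromosome:
    return [rng.randint(0, 1) for _ in range(n_desc)], rng.randint(*HIDDEN_RANGE)


def five_point_crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Random five-point crossover on the gene string (mask bits + neuron gene); needs >= 6 genes."""
    ga, gb = a[0] + [a[1]], b[0] + [b[1]]
    cuts = sorted(rng.sample(range(1, len(ga)), 5)) + [len(ga)]
    child: List[int] = []
    start, take_a = 0, True
    for cut in cuts:  # alternate parents between consecutive cut points
        child += (ga if take_a else gb)[start:cut]
        start, take_a = cut, not take_a
    return child[:-1], child[-1]


def mutate(ind: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Bit-flip mutation at `rate` per gene (0 -> 1, 1 -> 0); the neuron gene is resampled."""
    mask = [1 - bit if rng.random() < rate else bit for bit in ind[0]]
    hidden = rng.randint(*HIDDEN_RANGE) if rng.random() < rate else ind[1]
    return mask, hidden


def run_ga(n_desc: int, evaluate: Callable[[Chromosome], float], pop_size: int = 20,
           n_elite: int = 2, rate: float = 0.02, patience: int = 100,
           max_gen: int = 10_000, seed: int = 0) -> Tuple[Chromosome, float, int]:
    """Minimise `evaluate`; stop once the best fitness is unchanged for `patience` generations."""
    rng = random.Random(seed)
    pop = [random_individual(n_desc, rng) for _ in range(pop_size)]
    best, best_val, stall, k = pop[0], float("inf"), 0, 1  # k is the generation counter K
    while stall < patience and k <= max_gen:
        ranked = sorted(pop, key=evaluate)  # lower fitness = better model
        top_val = evaluate(ranked[0])
        if top_val < best_val:
            best, best_val, stall = ranked[0], top_val, 0
        else:
            stall += 1
        elites = ranked[:n_elite]  # elitist selection: copied unchanged
        children = [mutate(five_point_crossover(elites[0], elites[1], rng), rate, rng)
                    for _ in range(pop_size - n_elite)]
        pop = elites + children
        k += 1
    return best, best_val, k - 1


def toy_fitness(ind: Chromosome) -> float:
    """Hypothetical stand-in for the LOO-CV fitness, used only to exercise the loop."""
    return sum(ind[0]) + 0.1 * abs(ind[1] - 5)


if __name__ == "__main__":
    print(run_ga(12, toy_fitness))
```

Because the elites are copied unchanged, the best fitness can never get worse from one generation to the next, so "unchanged for 100 generations" is equivalent here to "not improved for 100 generations".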

The smaller the fitness value, the stronger the predictive ability of the model. As the optimization proceeds, the fitness value gradually decreases; when the drug molecule descriptors and ANN parameters reach the optimal state, the fitness value no longer changes, indicating that the best model has been built. Inputting the molecular descriptors of the antibiotics, BRAF inhibitors and enterovirus inhibitors to be predicted into the model yields the interaction constants between the antibiotics and DNA, the IC₅₀ values of the BRAF inhibitors and the EC₅₀ values of the enterovirus inhibitors. The predictions are good: the prediction error is less than 0.5 for 95% of the samples. Figs. 3a-c compare the model-predicted DNA interaction constants of the 30 antibiotics, IC₅₀ values of the 30 BRAF inhibitors and EC₅₀ values of the 21 enterovirus inhibitors with the experimentally measured values: Fig. 3a for the antibiotic-DNA interaction constants, Fig. 3b for the IC₅₀ values of the BRAF inhibitors, and Fig. 3c for the EC₅₀ values of the enterovirus inhibitors. Prediction results for drug molecule pharmacokinetic properties and toxicity are usually shown in a rectangular coordinate system, with the experimentally measured value on the abscissa and the model-predicted value on the ordinate. The closer the predicted values are to the measured values, i.e., the smaller the error, the more tightly the sample points cluster on both sides of the diagonal and the better the linearity of the comparison plot. The squares and stars in Figs. 3a-c are the sample points of the invention. Figs. 3a-c show that the squares and stars form a good linear relationship and that all sample points lie on both sides of the diagonal. This indicates that the invention is a good method for predicting drug molecule pharmacokinetic properties and toxicity and can improve prediction quality and efficiency. Comparing Figs. 3a-c also shows that Fig. 3b has the best linear relationship, with the sample points distributed most tightly around the diagonal, because the prediction errors of the model for all 30 BRAF inhibitors are within 0.3.
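The results above are judged by the share of samples with prediction error below 0.5, by the largest error (within 0.3 for the BRAF inhibitors), and by how tightly the points of the measured-versus-predicted plot follow the diagonal. The Python sketch below computes such summary numbers for one dataset. The function name `parity_summary` is hypothetical, and RMSE and the Pearson coefficient r are common choices we added to quantify the plot; the patent does not state which statistics it computed beyond the error thresholds.

```python
import math
from typing import Dict, Sequence


def parity_summary(measured: Sequence[float], predicted: Sequence[float],
                   tol: float = 0.5) -> Dict[str, float]:
    """Summarise a measured-vs-predicted (parity) comparison."""
    errors = [p - m for m, p in zip(measured, predicted)]
    n = len(errors)
    mean_m, mean_p = sum(measured) / n, sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        # share of samples whose absolute prediction error is below tol
        "fraction_within_tol": sum(1 for e in errors if abs(e) < tol) / n,
        "max_abs_error": max(abs(e) for e in errors),
        # RMSE measures distance from the diagonal y = x
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # Pearson r measures linearity only; a high r alone does not imply y = x
        "pearson_r": cov / math.sqrt(var_m * var_p),
    }


print(parity_summary([1.0, 2.0, 3.0, 4.0], [1.2, 2.6, 3.1, 3.0]))
```

In this made-up example two of the four errors (about 0.2 and 0.1) are below 0.5, so `fraction_within_tol` is 0.5 and `max_abs_error` is 1.0.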

Claims

1. A method for predicting drug molecule pharmacokinetic properties and toxicity based on a GA and an ANN, comprising the steps of:

(1) Calculation of drug molecule descriptors;

(2) Preprocessing of the drug molecule descriptor dataset: descriptors with no research value are deleted to reduce redundancy, according to the criteria that (a) the descriptor's value is zero for more than 90% of the molecules, and (b) its correlation coefficient with another drug molecule descriptor is greater than 90%;

(3) Rescaling of the drug molecule descriptor dataset: the drug molecule descriptor dataset is mapped onto the interval [-1, +1] using the mapping formula:

x_{pre} = \frac{2 x - (x_{\max} + x_{\min})}{x_{\max} - x_{\min}}

where x is the original value of the drug molecule descriptor, x_pre is the rescaled value, and x_max and x_min are the maximum and minimum values of the corresponding drug molecule descriptor, respectively;

(4) An integrated approach is used to optimize the drug molecule descriptors and the artificial neural network parameters simultaneously, comprising the following steps:

i. Set the population size, the termination condition and the ANN parameter range;

iii. Initialize the population: generate 20 individuals at random;

\text{fitness} = w_r \times \text{RMSE} + w_f \times N

where RMSE is the root-mean-square prediction error of leave-one-out cross-validation, N is the number of selected drug molecule descriptors, and w_r and w_f are the corresponding weight factors;

viii. Apply conventional single-point mutation to the selected individuals to produce a new population;

ix. Return to step iv;

(5) Predict drug molecule pharmacokinetic properties and toxicity with the model built from the optimal drug molecule descriptors and ANN parameters.