CN111833971A

CN111833971A - Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression

Info

Publication number: CN111833971A
Application number: CN201910328251.5A
Authority: CN
Inventors: Zhang Zheng (张政)
Original assignee: Shanghai Yungui Information Technology Co ltd
Current assignee: Shanghai Yungui Information Technology Co ltd
Priority date: 2019-04-23
Filing date: 2019-04-23
Publication date: 2020-10-27

Abstract

The invention, in the technical field of drug research and development, discloses a method for predicting pharmacokinetic parameters of pharmaceutical compounds based on logistic regression. The method comprises the following steps: step one, file input, in which a three-dimensional molecular structure file of the pharmaceutical compound is input; step two, descriptor calculation, in which the molecular descriptors are calculated with CDK; step three, training, in which a data set containing text data and label data is divided into a training set and a verification set, a model is trained on the training set data and a parameter prediction model is output, and the model is then optimized with the Adam optimizer over 20 training epochs. The method uses a neural multi-task logistic regression algorithm and can learn the prediction rules automatically with little or no manual intervention. From the molecular structure of a pharmaceutical compound, it accurately predicts parameters of the compound such as lipophilicity, solubility, plasma protein binding rate and transdermal permeability.

Description

Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression

Technical Field

The invention relates to the technical field of drug research and development, in particular to a method for predicting pharmacokinetic parameters of a drug compound based on logistic regression.

Background

Pharmacokinetic parameters of pharmaceutical compounds, such as lipophilicity, solubility, plasma protein binding rate and transdermal permeability, are important information in drug research and development: they can be used to build physiologically based pharmacokinetic models, to predict how a drug is metabolized in the human body, and thus to support drug development. At present, these properties are mainly determined by in vitro or animal experiments, which are time-consuming, labor-intensive and costly. Empirical formulas also exist for predicting pharmacokinetic parameters, for example $\log K_p = 0.71\log P - 0.0061\,\mathrm{MW} - 6.3$, where $K_p$ is the transdermal (skin) permeability coefficient, $\log P$ is the lipophilicity, and $\mathrm{MW}$ is the molecular weight. However, parameters predicted in this way often differ greatly from experimental results. Research shows that the pharmacokinetic parameters of a drug are strongly related to the molecular structure of the compound, and that this structure can be characterized by hundreds of different descriptors. Traditional machine learning methods, such as random forests, have also been tried for predicting these properties, but the required feature selection is very labor-intensive and inefficient.
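As a baseline for comparison with a learned model, the empirical relation log Kp = 0.71 log P - 0.0061 MW - 6.3 can be evaluated directly. The Python sketch below only encodes that one formula; the function name `empirical_log_kp` is hypothetical, and no units or applicability range are checked.

```python
def empirical_log_kp(log_p: float, mol_weight: float) -> float:
    """Empirical transdermal estimate: log Kp = 0.71 * log P - 0.0061 * MW - 6.3."""
    return 0.71 * log_p - 0.0061 * mol_weight - 6.3


if __name__ == "__main__":
    # A compound with log P = 2.0 and MW = 200 g/mol.
    print(round(empirical_log_kp(2.0, 200.0), 4))  # -6.1
```

The formula uses only two global properties, which is one reason its predictions can deviate strongly from experiment, as the text notes.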

Obtaining the pharmacokinetic parameters of existing pharmaceutical compounds experimentally therefore costs considerable time, labor and money, while empirical formulas and traditional machine learning methods give low accuracy and involve a complex process.

Based on the above, the invention provides a method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression to solve these problems.

Disclosure of Invention

The invention aims to provide a method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, so as to solve the problems described in the background: obtaining the pharmacokinetic parameters of existing pharmaceutical compounds experimentally is time-consuming, labor-intensive and costly, while empirical formulas or traditional machine learning methods have low accuracy and a complex process.

To achieve this purpose, the invention provides the following technical solution: a method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, comprising the following steps:

step one, file input: a three-dimensional molecular structure file of the pharmaceutical compound is input;

step two, descriptor calculation: the molecular descriptors are calculated with CDK;

step three, training: a data set containing text data and label data is divided into a training set and a verification set; a model is trained on the training set data and a parameter prediction model is output; the model is then optimized with the Adam optimizer over 20 training epochs;

step four, model prediction: the descriptors are processed by the trained parameter prediction model to produce predictions;

step five, parameter output: parameter information such as lipophilicity, solubility, plasma protein binding rate and transdermal permeability is output.
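Step three divides the data set into a training set and a verification set but does not state the ratio. The sketch below assumes a shuffled 80/20 split with a fixed random seed; the function name, the `val_fraction` default and the seed are hypothetical choices, not values from the patent.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(samples: Sequence[T], val_fraction: float = 0.2,
                           seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and return (training set, verification set)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)                 # fixed seed -> reproducible split
    n_val = max(1, round(len(shuffled) * val_fraction))   # keep at least one verification sample
    return shuffled[n_val:], shuffled[:n_val]
```

Each sample could be, for example, a pair of a descriptor vector and a dictionary of labels; fixing the seed makes the split repeatable when several models are compared.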

Preferably, the structure file in step one supports formats such as SDF and MOL.
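To show what a three-dimensional structure file contains, the following sketch reads element symbols and x, y, z coordinates from MDL MOL (V2000) blocks and splits an SDF file on its `$$$$` record delimiter. It is a simplified, hypothetical reader: it handles only the V2000 fixed-column layout, ignores bonds, charges and data items, and rejects V3000 files. In the described method, CDK's own file readers would presumably perform this step.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol, x, y, z)


def parse_molblock(block: str) -> List[Atom]:
    """Read atoms from one V2000 MOL block (3 header lines, counts line, atom block, ...)."""
    lines = block.splitlines()
    if len(lines) < 4 or "V3000" in lines[3]:
        raise ValueError("expected a V2000 MOL block")
    n_atoms = int(lines[3][0:3])  # counts line: columns 1-3 hold the atom count
    atoms: List[Atom] = []
    for line in lines[4:4 + n_atoms]:
        # Atom line: x, y, z in 10-character fields, one space, then a 3-character symbol.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    return atoms


def read_sdf(text: str) -> List[List[Atom]]:
    """Split SDF text on '$$$$' lines and parse each record; a plain MOL file yields one record."""
    molecules: List[List[Atom]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            molecules.append(parse_molblock("\n".join(current)))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # trailing record without a '$$$$' terminator
        molecules.append(parse_molblock("\n".join(current)))
    return molecules
```

Fixed-column slicing is used rather than whitespace splitting because the V2000 atom block is defined by column positions.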

Preferably, in step two, the regression function is trained after the descriptors have been calculated.

Preferably, the training of the regression function comprises the following steps:

a. the descriptor information of the compound is input and processed nonlinearly by a Tanh activation function;

b. the result is processed nonlinearly by a ReLU activation function;

c. a regression function model is trained for each parameter.
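Steps a to c describe a shared network: descriptors pass through a Tanh layer, then a ReLU layer, and a separate regression output is trained for each parameter. A minimal forward-pass sketch in plain Python follows. The layer sizes, the initialization, the parameter names and the use of one linear output head per parameter are assumptions; the patent calls the model logistic regression but does not specify the output layer, and linear heads are used here because the predicted parameters are continuous values.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def dense(x: Sequence[float], w: Matrix, b: List[float]) -> List[float]:
    """Affine layer: returns w @ x + b, where w has shape (n_out, n_in)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    """Uniform(-1/sqrt(n_in), 1/sqrt(n_in)) weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class MultiTaskRegressor:
    """Shared Tanh layer -> shared ReLU layer -> one linear head per predicted parameter."""

    def __init__(self, n_descriptors: int, hidden: int, tasks: List[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1, self.b1 = init_layer(n_descriptors, hidden, rng)    # step a: Tanh layer
        self.w2, self.b2 = init_layer(hidden, hidden, rng)           # step b: ReLU layer
        self.heads = {t: init_layer(hidden, 1, rng) for t in tasks}  # step c: per-parameter heads

    def forward(self, descriptors: Sequence[float]) -> Dict[str, float]:
        h1 = [math.tanh(v) for v in dense(descriptors, self.w1, self.b1)]
        h2 = [max(0.0, v) for v in dense(h1, self.w2, self.b2)]
        return {t: dense(h2, w, b)[0] for t, (w, b) in self.heads.items()}
```

For example, `MultiTaskRegressor(4, 8, ["logP", "solubility"]).forward([0.1, -0.2, 0.3, 0.4])` returns a dictionary with one prediction per parameter. Sharing the two hidden layers across parameters is what makes the model multi-task.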

Preferably, the verification set in step three is used to validate the output parameter prediction model.
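One common way to use the verification set is to report an error metric for each predicted parameter. The sketch below computes a per-parameter root-mean-square error (RMSE); the metric choice, the function name and the convention that a missing label is stored as `None` (not every compound has every parameter measured) are assumptions, not details from the patent.

```python
import math
from typing import Dict, List, Optional


def per_task_rmse(predictions: List[Dict[str, float]],
                  labels: List[Dict[str, Optional[float]]]) -> Dict[str, float]:
    """RMSE for each parameter over the verification set, skipping missing (None) labels."""
    sq_err: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for pred, lab in zip(predictions, labels):
        for task, y in lab.items():
            if y is None:
                continue  # this compound has no measured value for this parameter
            sq_err[task] = sq_err.get(task, 0.0) + (pred[task] - y) ** 2
            counts[task] = counts.get(task, 0) + 1
    return {task: math.sqrt(sq_err[task] / counts[task]) for task in sq_err}
```

Reporting the error per parameter, rather than one pooled number, shows whether the shared model serves all parameters equally well.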

Compared with the prior art, the invention has the following beneficial effects: using a neural multi-task logistic regression algorithm, it learns prediction rules automatically with little or no manual intervention, and from the molecular structure of a pharmaceutical compound it accurately predicts parameters such as lipophilicity, solubility, plasma protein binding rate and transdermal permeability.

Drawings

To illustrate the technical solutions of the embodiments more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art could derive other drawings from them without creative effort.

FIG. 1 is a flow chart showing how the software system of the present invention is used;

FIG. 2 is a block diagram of the neural multi-task logistic regression model and its epoch-based training process according to the present invention;

FIG. 3 is a diagram of the model training process of the present invention.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.

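Referring to FIGS. 1-3, the invention provides a technical solution: a method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, comprising the following steps:

step one, file input: a three-dimensional molecular structure file of the pharmaceutical compound is input;

step two, descriptor calculation: the molecular descriptors are calculated with CDK;

step three, training: a data set containing text data and label data is divided into a training set and a verification set; a model is trained on the training set data and a parameter prediction model is output; the model is then optimized with the Adam optimizer over 20 training epochs;

step four, model prediction: the descriptors are processed by the trained parameter prediction model to produce predictions;

step five, parameter output: parameter information such as lipophilicity, solubility, plasma protein binding rate and transdermal permeability is output.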

In a further embodiment, the structure file in step one supports formats such as SDF and MOL;

in a further embodiment, in step two, the regression function is trained after the descriptors have been calculated;

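in a further embodiment, the training of the regression function comprises the following steps:

a. the descriptor information of the compound is input and processed nonlinearly by a Tanh activation function;

b. the result is processed nonlinearly by a ReLU activation function;

c. a regression function model is trained for each parameter;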

in a further embodiment, the verification set in step three is used to validate the output parameter prediction model.

It should be noted that, as shown in FIG. 1, structure files of the pharmaceutical compound's molecule in different formats can be converted and input; molecular descriptors are calculated with CDK (Chemistry Development Kit); and the trained model processes these descriptors to accurately predict parameters such as lipophilicity, solubility, plasma protein binding rate and transdermal permeability;
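CDK is a Java library that computes hundreds of molecular descriptors. As a stand-in only, the Python sketch below derives four simple descriptors (atom count, heavy-atom count, approximate molecular weight and unweighted radius of gyration) from a list of atoms with 3D coordinates, to show the kind of numeric vector that is fed to the model. The function name and descriptor set are hypothetical and are not CDK's API; the mass table covers only a few elements, and molecular weight is underestimated if the file omits hydrogens.

```python
import math
from typing import Dict, List, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol, x, y, z)

# Standard atomic weights for a few common elements (illustrative subset only).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "F": 18.998, "S": 32.06, "Cl": 35.45}


def toy_descriptors(atoms: List[Atom]) -> Dict[str, float]:
    """A few constitutional and geometric descriptors (hypothetical stand-in for CDK)."""
    n = len(atoms)
    if n == 0:
        raise ValueError("molecule has no atoms")
    heavy = sum(1 for symbol, _, _, _ in atoms if symbol != "H")
    mol_weight = sum(ATOMIC_MASS[symbol] for symbol, _, _, _ in atoms)  # KeyError if unlisted
    # Geometric centre, then root-mean-square distance from it (unweighted radius of gyration).
    cx = sum(a[1] for a in atoms) / n
    cy = sum(a[2] for a in atoms) / n
    cz = sum(a[3] for a in atoms) / n
    rg = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                       for _, x, y, z in atoms) / n)
    return {"n_atoms": float(n), "n_heavy_atoms": float(heavy),
            "mol_weight": mol_weight, "radius_of_gyration": rg}
```

The radius of gyration is included because it depends on the 3D coordinates, which is the information a three-dimensional structure file adds beyond the connectivity.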

as shown in FIG. 2, the model is trained as follows: the descriptor information of a compound is input and processed nonlinearly by a Tanh activation function, the result is processed nonlinearly by a ReLU activation function, and a regression function model is thereby trained for each parameter;

as shown in FIG. 3, in the model training process the data set containing text data and label data is divided into a training set and a verification set; the model is trained on the training set data and output, and is validated with the verification set data; the model is then optimized with the Adam optimizer, and the model obtained after the 20th training epoch is used;
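The training procedure (train on the training set, check the verification set, optimize with Adam for 20 epochs) can be sketched as below. To keep the gradient code short, this sketch swaps in a single-task linear model trained with mean squared error instead of the Tanh/ReLU multi-task network; the learning rate, batch size, seed and function names are assumptions. The Adam update follows its standard form with bias-corrected first and second moment estimates. Reading the patent's "20 periods/stages" as 20 epochs is also an interpretation.

```python
import math
import random
from typing import List, Sequence, Tuple


class Adam:
    """Adam optimizer with bias-corrected moment estimates, over a flat parameter list."""

    def __init__(self, n_params: int, lr: float = 0.01, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first moment (running mean of gradients)
        self.v = [0.0] * n_params  # second moment (running mean of squared gradients)
        self.t = 0                 # number of update steps taken

    def step(self, params: List[float], grads: Sequence[float]) -> None:
        """Update params in place from one gradient."""
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def mse_and_grad(params: List[float], xs: List[List[float]],
                 ys: List[float]) -> Tuple[float, List[float]]:
    """MSE of the linear model y ~ w.x + b (params = w + [b]) and its gradient."""
    n, d = len(xs), len(params) - 1
    loss, grads = 0.0, [0.0] * (d + 1)
    for x, y in zip(xs, ys):
        err = sum(w * xi for w, xi in zip(params[:d], x)) + params[d] - y
        loss += err * err / n
        for j in range(d):
            grads[j] += 2 * err * x[j] / n
        grads[d] += 2 * err / n
    return loss, grads


def train(x_train, y_train, x_val, y_val, epochs=20, batch_size=8, lr=0.05, seed=0):
    """Mini-batch Adam training; returns final params and the verification loss per epoch."""
    rng = random.Random(seed)
    params = [0.0] * (len(x_train[0]) + 1)
    opt = Adam(len(params), lr=lr)
    order = list(range(len(x_train)))
    history: List[float] = []
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            _, grads = mse_and_grad(params, [x_train[i] for i in batch],
                                    [y_train[i] for i in batch])
            opt.step(params, grads)
        val_loss, _ = mse_and_grad(params, x_val, y_val)  # check on the verification set
        history.append(val_loss)
    return params, history
```

`train` returns the final parameters and the verification-set loss after each epoch, so either the model after the 20th epoch or the epoch with the lowest verification loss can be kept.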

the system can accurately predict the values of pharmacokinetic parameters and can be used for early screening and drug design.

In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

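1. A method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, comprising the following steps:

step one, file input: a three-dimensional molecular structure file of the pharmaceutical compound is input;

step two, descriptor calculation: the molecular descriptors are calculated with CDK;

step three, training: a data set containing text data and label data is divided into a training set and a verification set; a model is trained on the training set data and a parameter prediction model is output; the model is then optimized with the Adam optimizer over 20 training epochs;

step four, model prediction: the descriptors are processed by the trained parameter prediction model to produce predictions;

step five, parameter output: parameter information such as lipophilicity, solubility, plasma protein binding rate and transdermal permeability is output.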

2. The method of claim 1, wherein the structure file in step one supports formats such as SDF and MOL.

3. The method of claim 1, wherein in step two the regression function is trained after the descriptors have been calculated.

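4. The method of claim 3, wherein the training of the regression function comprises the following steps:

a. the descriptor information of the compound is input and processed nonlinearly by a Tanh activation function;

b. the result is processed nonlinearly by a ReLU activation function;

c. a regression function model is trained for each parameter.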

5. The method of claim 1, wherein the verification set of step three is used to validate the output parameter prediction model.