CN111833971A - Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression
- Publication number
- CN111833971A
- Authority
- CN
- China
- Prior art keywords
- training
- model
- descriptor
- namely
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
Landscapes
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Chemical & Material Sciences (AREA)
- Medicinal Chemistry (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Pharmacology & Pharmacy (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Crystallography & Structural Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a logistic-regression-based method for predicting pharmacokinetic parameters of pharmaceutical compounds, in the technical field of drug research and development. The method comprises the following steps: step one, file input, namely inputting a three-dimensional molecular structure file of the pharmaceutical compound; step two, descriptor calculation, namely calculating the molecular descriptors with CDK; step three, training, namely dividing a data set containing text data and label data into a training set and a validation set, training a model on the training set, outputting a parameter prediction model, and then optimizing the model with the Adam optimizer over 20 training epochs. The method uses a neural multi-task logistic regression algorithm and can derive rules automatically with little or no manual intervention. From the molecular structure of the pharmaceutical compound, pharmacokinetic parameters such as lipophilicity, solubility, plasma protein binding rate, and transdermal permeability are accurately predicted.
Description
Technical Field
The invention relates to the technical field of drug research and development, and in particular to a logistic-regression-based method for predicting pharmacokinetic parameters of pharmaceutical compounds.
Background
Pharmacokinetic parameters of pharmaceutical compounds, such as lipophilicity, solubility, plasma protein binding rate, and transdermal permeability, are important information for drug research and development. They can be used to construct physiologically based pharmacokinetic models, to predict how a drug is metabolized in the human body, and to assist drug research and development. At present, these properties are determined mainly by in vitro experiments or animal experiments, which are time-consuming, labor-intensive, and costly. Empirical formulas are also used to predict pharmacokinetic parameters, for example $\log K_p = 0.71 \log P - 0.0061\,MW - 6.3$, where $K_p$ is the transdermal parameter, $\log P$ is the lipophilicity, and $MW$ is the molecular mass. However, parameters predicted in this way often differ greatly from experimental results. Research shows that the pharmacokinetic parameters of a drug are strongly related to the molecular structure of the compound, and that the molecular structure can be characterized by hundreds of kinds of descriptors. Traditional machine learning methods, such as random forests, have also been tried for predicting these properties, but feature selection is labor-intensive and inefficient.
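The empirical formula quoted above can be evaluated directly. The following is a minimal sketch in Python; the function name `empirical_log_kp` and the input values are assumptions chosen only for illustration and do not come from the patent.

```python
def empirical_log_kp(log_p: float, mol_weight: float) -> float:
    """Empirical estimate quoted in the Background:
    log Kp = 0.71 * logP - 0.0061 * MW - 6.3
    """
    return 0.71 * log_p - 0.0061 * mol_weight - 6.3


# Hypothetical compound: logP = 2.0, molecular mass = 300.0 (illustrative values only).
example = empirical_log_kp(log_p=2.0, mol_weight=300.0)
print(f"log Kp = {example:.2f}")  # prints "log Kp = -6.71"
```

Because the formula uses only two inputs, it cannot reflect the other structural descriptors mentioned in this section, which is consistent with the deviation from experiment noted above.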
With existing approaches, obtaining the pharmacokinetic parameters of a pharmaceutical compound experimentally is time-consuming, labor-intensive, and costly, while empirical formulas and traditional machine learning methods give low accuracy and involve a complex process.
In view of the above, the invention provides a logistic-regression-based method for predicting pharmacokinetic parameters of pharmaceutical compounds, so as to solve these problems.
Disclosure of Invention
The invention aims to provide a logistic-regression-based method for predicting pharmacokinetic parameters of pharmaceutical compounds, so as to solve the problems raised in the Background: obtaining these parameters experimentally is time-consuming, labor-intensive, and costly, while empirical formulas and traditional machine learning methods have low accuracy and involve a complex process.
To achieve the above purpose, the invention provides the following technical solution: a method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, comprising the following steps:
step one, file input: inputting a three-dimensional molecular structure file of the pharmaceutical compound;
step two, descriptor calculation: calculating the molecular descriptors with CDK;
step three, training: dividing a data set containing text data and label data into a training set and a validation set; training a model on the training set and outputting a parameter prediction model; then optimizing the model with the Adam optimizer over 20 training epochs;
step four, model prediction: feeding the descriptors to the trained parameter prediction model to generate predictions;
and step five, parameter output: outputting parameter information such as lipophilicity, solubility, plasma protein binding rate, and transdermal permeability.
Preferably, the structure file in step one may be in a format such as SDF or MOL.
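As a rough illustration of reading such a structure file, the sketch below parses the atom block of a V2000 MOL record using that format's fixed-width columns and computes a molecular weight as a stand-in descriptor. It does not use CDK and is not the patent's implementation; the names `Atom`, `parse_molblock`, `molecular_weight`, the small `ATOMIC_MASS` table, and the hand-written water record are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Approximate atomic masses for a few common elements (illustrative subset only).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


def parse_molblock(text: str) -> List[Atom]:
    """Read the atom block of a V2000 MOL record (three header lines, then a counts line)."""
    lines = text.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: columns 1-3 hold the atom count
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Atom line: x, y, z in three 10-character fields, then the element symbol.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append(Atom(line[31:34].strip(), x, y, z))
    return atoms


def molecular_weight(atoms: List[Atom]) -> float:
    """A deliberately simple descriptor standing in for a descriptor toolkit."""
    return sum(ATOMIC_MASS[a.symbol] for a in atoms)


# Hand-written water record, used only to exercise the parser.
WATER_MOL = "\n".join([
    "water",
    "  hand-written example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7570    0.5860    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.7570    0.5860    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
])

atoms = parse_molblock(WATER_MOL)
print([a.symbol for a in atoms], round(molecular_weight(atoms), 3))
```

An SDF file is a sequence of such records separated by `$$$$` lines, so the same parser could be applied to each record in turn.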
Preferably, in step two, the regression function is trained after the descriptors are calculated.
Preferably, the training of the regression function comprises the following steps:
a. inputting descriptor information of the compound and applying nonlinear processing through a Tanh activation function;
b. applying nonlinear processing to the result through a ReLU activation function;
c. training a regression function model for each parameter.
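Steps a to c can be read as a small multi-task network: a Tanh stage, a ReLU stage, and one regression output per parameter. The sketch below shows only the forward pass under that reading. The affine layers placed before each activation, the layer sizes, the random initialization, and the names `dense`, `init_layer`, `forward`, and `TARGETS` are assumptions for illustration, not details given in the patent.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(descriptors, params):
    # Step a: descriptor vector through an affine layer and Tanh.
    h1 = [math.tanh(v) for v in dense(descriptors, *params["tanh_layer"])]
    # Step b: result through an affine layer and ReLU.
    h2 = [max(0.0, v) for v in dense(h1, *params["relu_layer"])]
    # Step c: one regression head per predicted parameter.
    return {name: dense(h2, *head)[0] for name, head in params["heads"].items()}


TARGETS = ["lipophilicity", "solubility", "plasma_protein_binding", "transdermal"]
rng = random.Random(0)
params = {
    "tanh_layer": init_layer(5, 8, rng),   # 5 descriptors in, 8 hidden units
    "relu_layer": init_layer(8, 4, rng),   # 8 hidden units in, 4 hidden units out
    "heads": {name: init_layer(4, 1, rng) for name in TARGETS},
}

# Hypothetical 5-element descriptor vector; the weights are untrained.
prediction = forward([0.2, -1.0, 0.5, 3.1, 0.0], params)
print(prediction)
```

Sharing the two hidden stages across all heads is what makes the model multi-task in this sketch: each parameter gets its own output but reuses the same learned representation of the descriptors.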
Preferably, the validation set in step three is used to validate the output parameter prediction model.
Compared with the prior art, the invention has the following beneficial effects: it uses a neural multi-task logistic regression algorithm and can derive rules automatically with little or no manual intervention. From the molecular structure of the pharmaceutical compound, pharmacokinetic parameters such as lipophilicity, solubility, plasma protein binding rate, and transdermal permeability are accurately predicted.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the use of the software system of the present invention;
FIG. 2 is a block diagram of the neural multi-task logistic regression model and its epoch-based training process according to the present invention;
FIG. 3 is a diagram of the model training process of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-3, the present invention provides the following technical solution: a method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, comprising the following steps:
step one, file input: inputting a three-dimensional molecular structure file of the pharmaceutical compound;
step two, descriptor calculation: calculating the molecular descriptors with CDK;
step three, training: dividing a data set containing text data and label data into a training set and a validation set; training a model on the training set and outputting a parameter prediction model; then optimizing the model with the Adam optimizer over 20 training epochs;
step four, model prediction: feeding the descriptors to the trained parameter prediction model to generate predictions;
and step five, parameter output: outputting parameter information such as lipophilicity, solubility, plasma protein binding rate, and transdermal permeability.
In a further embodiment, the structure file in step one may be in a format such as SDF or MOL.
In a further embodiment, in step two, the regression function is trained after the descriptors are calculated.
In a further embodiment, the training of the regression function comprises the following steps:
a. inputting descriptor information of the compound and applying nonlinear processing through a Tanh activation function;
b. applying nonlinear processing to the result through a ReLU activation function;
c. training a regression function model for each parameter.
In a further embodiment, the validation set in step three may be used to validate the output parameter prediction model.
It should be noted that, as shown in FIG. 1, structure files of the pharmaceutical compound molecule in different formats can be converted and input, the molecular descriptors are calculated with CDK (the Chemistry Development Kit), and the trained parameter prediction model processes the descriptors to accurately predict pharmacokinetic parameters such as lipophilicity, solubility, plasma protein binding rate, and transdermal permeability.
As shown in FIG. 2, the model is trained as follows: descriptor information of a compound is input and passed through a Tanh activation function, the result is then passed through a ReLU activation function, and a regression function model is trained for each parameter.
As shown in FIG. 3, during model training the data set containing text data and label data is divided into a training set and a validation set. The model is trained on the training set and output, and is validated with the validation set. The model is then optimized with the Adam optimizer over 20 training epochs.
The system can accurately predict the values of the pharmacokinetic parameters and can be used for early screening and drug design.
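The training procedure of FIG. 3 (train/validation split, Adam optimization, 20 epochs) can be sketched as follows. To keep the example short and dependency-free, a single-output linear model is swapped in for the Tanh/ReLU network, and the data are synthetic. The learning rate, the 80/20 split, and the names `adam_step`, `predict`, `mse`, and `train` are assumptions for illustration, not values taken from the patent.

```python
import math
import random


def adam_step(params, grads, state, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update, applied in place to a flat list of parameters."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g        # first-moment estimate
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g    # second-moment estimate
        m_hat = state["m"][i] / (1 - b1 ** t)                    # bias correction
        v_hat = state["v"][i] / (1 - b2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def predict(params, x):
    """Linear model: weights are params[:-1], bias is params[-1]."""
    return sum(w * xi for w, xi in zip(params[:-1], x)) + params[-1]


def mse(params, rows):
    return sum((predict(params, x) - y) ** 2 for x, y in rows) / len(rows)


def train(rows, epochs=20, val_fraction=0.2, seed=0):
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    n_val = max(1, int(len(rows) * val_fraction))
    val_rows, train_rows = rows[:n_val], rows[n_val:]   # validation / training split
    params = [0.0] * (len(rows[0][0]) + 1)
    state = {"t": 0, "m": [0.0] * len(params), "v": [0.0] * len(params)}
    history = [mse(params, val_rows)]                   # history[0]: loss before training
    for _ in range(epochs):
        rng.shuffle(train_rows)
        for x, y in train_rows:
            err = predict(params, x) - y
            grads = [2 * err * xi for xi in x] + [2 * err]   # gradient of squared error
            adam_step(params, grads, state)
        history.append(mse(params, val_rows))           # validate after every epoch
    return params, history


# Synthetic data: two hypothetical descriptors and a made-up linear target.
data_rng = random.Random(1)
rows = []
for _ in range(100):
    x = [data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
    rows.append((x, 2.0 * x[0] - 1.0 * x[1] + 0.5))

params, history = train(rows)
print(f"validation MSE before training: {history[0]:.4f}, after 20 epochs: {history[-1]:.6f}")
```

Recording the validation error after each epoch gives a simple check that the model improves on held-out data rather than only on the training set.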
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (5)
1. A method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, comprising the steps of:
step one, file input: inputting a three-dimensional molecular structure file of the pharmaceutical compound;
step two, descriptor calculation: calculating the molecular descriptors with CDK;
step three, training: dividing a data set containing text data and label data into a training set and a validation set; training a model on the training set and outputting a parameter prediction model; then optimizing the model with the Adam optimizer over 20 training epochs;
step four, model prediction: feeding the descriptors to the trained parameter prediction model to generate predictions;
and step five, parameter output: outputting parameter information such as lipophilicity, solubility, plasma protein binding rate, and transdermal permeability.
2. The method of claim 1, wherein the structure file in step one is in a format such as SDF or MOL.
3. The method of claim 1, wherein in step two, the regression function is trained after the descriptors are calculated.
4. The method of claim 3, wherein the training of the regression function comprises the following steps:
a. inputting descriptor information of the compound and applying nonlinear processing through a Tanh activation function;
b. applying nonlinear processing to the result through a ReLU activation function;
c. training a regression function model for each parameter.
5. The method of claim 1, wherein the validation set in step three is used to validate the output parameter prediction model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910328251.5A CN111833971A (en) | 2019-04-23 | 2019-04-23 | Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910328251.5A CN111833971A (en) | 2019-04-23 | 2019-04-23 | Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111833971A (en) | 2020-10-27 |
Family
ID=72911469
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910328251.5A Pending CN111833971A (en) | 2019-04-23 | 2019-04-23 | Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111833971A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116452101A (en) * | 2023-06-09 | 2023-07-18 | 淄博市中心医院 | Intelligent anesthesia department medicine distribution charging method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1111533A2 (en) * | 1999-12-15 | 2001-06-27 | Pfizer Products Inc. | Logistic regression trees for drug analysis |
KR20080040481A (en) * | 2006-11-03 | 2008-05-08 | 주식회사 인실리코텍 | System, method and program for pharmacokinetic parameter prediction of peptide sequence by mathematical model |
KR20120085144A (en) * | 2011-10-05 | 2012-07-31 | 주식회사 켐에쎈 | Multiple linear regression-artificial neural network hybrid model predicting water solubility of pure organic compound |
CN106909990A (en) * | 2017-03-01 | 2017-06-30 | 腾讯科技(深圳)有限公司 | A kind of Forecasting Methodology and device based on historical data |
-
2019
- 2019-04-23 CN CN201910328251.5A patent/CN111833971A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116452101A (en) * | 2023-06-09 | 2023-07-18 | 淄博市中心医院 | Intelligent anesthesia department medicine distribution charging method and system |
CN116452101B (en) * | 2023-06-09 | 2023-08-25 | 淄博市中心医院 | Intelligent anesthesia department medicine distribution charging method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||