CN111833971A - Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression - Google Patents

Publication number
CN111833971A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910328251.5A
Other languages
Chinese (zh)
Inventor
张政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yungui Information Technology Co ltd
Original Assignee
Shanghai Yungui Information Technology Co ltd

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs

Landscapes

  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a method, in the technical field of drug research and development, for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression. The method comprises the following steps: step one, file input, namely inputting a three-dimensional molecular structure file of the pharmaceutical compound; step two, descriptor calculation, namely calculating the descriptors of the molecule with CDK; step three, training, namely dividing a data set containing text data and label data into a training set and a verification set, training a model on the training set data, outputting a parameter prediction model, and then optimizing the model with an Adam optimizer over 20 training epochs. The method uses a neural multi-task logistic regression algorithm and can design rules automatically with little or no manual intervention. From the molecular structure of the pharmaceutical compound, it accurately predicts pharmacokinetic parameters such as lipophilicity, solubility, plasma protein binding rate and transdermal property.

Description

Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression
Technical Field
The invention relates to the technical field of drug research and development, in particular to a method for predicting pharmacokinetic parameters of a drug compound based on logistic regression.
Background
Pharmacokinetic parameters of a pharmaceutical compound, such as lipophilicity, solubility, plasma protein binding rate and transdermal property, are important information for drug research and development. They can be used to construct a physiologically based pharmacokinetic model, to predict how a drug is metabolized in the human body, and to assist drug research and development. At present these properties are determined mainly by in vitro experiments or animal experiments, which are time-consuming, labor-intensive and costly. Empirical formulas are also used to predict pharmacokinetic parameters, for example $\log K_p = 0.71\,\log P - 0.0061\,MW - 6.3$, where $K_p$ is the transdermal parameter, $\log P$ is the lipophilicity, and $MW$ is the molecular mass. However, parameters predicted in this way often differ greatly from experimental results. Research shows that the pharmacokinetic parameters of a drug are strongly related to the molecular structure of the compound, and that the molecular structure can be characterized by hundreds of kinds of descriptors. Traditional machine learning methods, such as random forests, have also been tried for predicting these properties, but their feature selection is labor-intensive and inefficient.
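As a worked illustration of the empirical formula quoted in the background, the following minimal Python sketch evaluates log Kp for one compound. The function name and the input values (logP = 2.0, MW = 200.0) are hypothetical and chosen only to show the arithmetic; they are not taken from the patent.

```python
def log_kp_empirical(log_p: float, mol_weight: float) -> float:
    """Empirical estimate from the background: log Kp = 0.71*logP - 0.0061*MW - 6.3."""
    return 0.71 * log_p - 0.0061 * mol_weight - 6.3


# Hypothetical compound, for illustration only.
estimate = log_kp_empirical(log_p=2.0, mol_weight=200.0)
print(f"log Kp = {estimate:.2f}")
```

For these inputs the terms are 1.42, 1.22 and 6.3, so the estimate is about -6.10. The formula uses only two inputs, which is one plausible reason such estimates can deviate from measured values.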
Existing ways of obtaining the pharmacokinetic parameters of pharmaceutical compounds therefore have drawbacks: the experimental approach is time-consuming, labor-intensive and costly, while the empirical approach and the traditional machine learning approach have low accuracy and a complex process.
Based on the above, the invention designs a method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, so as to solve the above problems.
Disclosure of Invention
The invention aims to provide a method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, so as to solve the problems raised in the background: obtaining the pharmacokinetic parameters of existing pharmaceutical compounds by experiment is time-consuming, labor-intensive and costly, while obtaining them by empirical formulas or traditional machine learning has low accuracy and a complex process.
To achieve this aim, the invention provides the following technical solution: a method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, comprising the steps of:
step one, file input, namely inputting a three-dimensional molecular structure file of the pharmaceutical compound;
step two, descriptor calculation, namely calculating the descriptors of the molecule with CDK;
step three, training, namely dividing a data set containing text data and label data into a training set and a verification set; training a model on the training set data and outputting a parameter prediction model; then optimizing the model with an Adam optimizer over 20 training epochs;
step four, model prediction, namely feeding the descriptors into the trained parameter prediction model to obtain predictions;
and step five, parameter output, namely outputting parameter information such as lipophilicity, solubility, plasma protein binding rate and transdermal property.
Preferably, the structure file in step one may be in a format such as SDF or MOL.
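To make the structure-file input concrete, here is a minimal sketch that reads a V2000 MOL block using only the Python standard library. It does not call CDK (a Java library, which the patent uses for the real descriptors); the `toy_descriptors` function, its output keys and the three-atom ethanol block are hypothetical stand-ins for illustration. The parser assumes the fixed-width V2000 layout (counts on line 4, coordinates in three 10-character fields, element symbol in columns 32 to 34).

```python
from collections import Counter

# Hypothetical heavy-atom-only ethanol record in V2000 MOL layout.
ETHANOL_MOL = "\n".join([
    "ethanol",
    "  hypothetical-sketch",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2000    1.2000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])


def parse_molblock(molblock):
    """Return (atoms, bonds) parsed from a V2000 MOL block."""
    lines = molblock.splitlines()
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # (first atom index, second atom index, bond order), 1-based indices
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


def toy_descriptors(atoms, bonds):
    """Simple count-based descriptors; a stand-in for CDK descriptors."""
    elements = Counter(symbol for symbol, *_ in atoms)
    return {
        "n_atoms": len(atoms),
        "n_bonds": len(bonds),
        "n_carbon": elements["C"],
        "n_oxygen": elements["O"],
    }


atoms, bonds = parse_molblock(ETHANOL_MOL)
descriptors = toy_descriptors(atoms, bonds)
print(descriptors)
```

An SDF file is a sequence of such MOL blocks separated by `$$$$` lines, so the same parser could be applied to each record after splitting on that delimiter.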
Preferably, in the second step, after the descriptor is calculated, the training of the regression function is performed.
Preferably, the training of the regression function comprises the following steps:
a. inputting descriptor information of the compound, and carrying out nonlinear processing through a Tanh activation function;
b. carrying out nonlinear processing on the processing result through a ReLU activation function;
c. training a regression function model for each parameter.
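Steps a to c can be sketched as a small forward pass: a Tanh layer, then a ReLU layer, then one output head per predicted parameter. This is only a sketch under assumptions. The layer sizes, the random initialization, the linear form of each head and the task names are all hypothetical, since the patent does not specify them.

```python
import math
import random

# Hypothetical task names, one regression head per predicted parameter.
TASKS = ["lipophilicity", "solubility", "plasma_protein_binding", "transdermal"]


def dense(inputs, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (assumed initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_model(n_descriptors, hidden1=8, hidden2=4, seed=0):
    rng = random.Random(seed)
    return {
        "tanh": init_layer(rng, n_descriptors, hidden1),
        "relu": init_layer(rng, hidden1, hidden2),
        "heads": {task: init_layer(rng, hidden2, 1) for task in TASKS},
    }


def predict(model, descriptors):
    h1 = [math.tanh(v) for v in dense(descriptors, *model["tanh"])]  # step a
    h2 = [max(0.0, v) for v in dense(h1, *model["relu"])]            # step b
    # step c: one regression output per parameter, sharing h2
    return {task: dense(h2, *head)[0] for task, head in model["heads"].items()}


model = build_model(n_descriptors=3)
outputs = predict(model, [0.2, -1.0, 0.5])
print(outputs)
```

Sharing the two hidden layers across all heads is what makes the arrangement multi-task: every parameter is predicted from the same learned representation of the descriptors.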
Preferably, the verification set in step three is used to verify the output parameter prediction model.
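The division into a training set and a verification set might look like the following minimal sketch. The 80/20 ratio, the fixed seed and the synthetic records are assumptions made for illustration; the patent does not state how the data set is divided.

```python
import random


def split_dataset(records, validation_fraction=0.2, seed=42):
    """Shuffle a copy of `records` and return (training set, verification set)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical records: (descriptor vector, label) pairs.
records = [([float(i), float(i % 3)], 0.5 * i) for i in range(10)]
train_set, validation_set = split_dataset(records)
print(len(train_set), len(validation_set))
```

Keeping the verification records out of training is what allows them to give an unbiased check of the output model.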
Compared with the prior art, the invention has the following beneficial effects: it uses a neural multi-task logistic regression algorithm and can design rules automatically with little or no manual intervention. From the molecular structure of the pharmaceutical compound, it accurately predicts pharmacokinetic parameters such as lipophilicity, solubility, plasma protein binding rate and transdermal property.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the use of the software system of the present invention;
FIG. 2 is a block diagram of the neural multi-task logistic regression model and the epoch training process of the present invention;
FIG. 3 is a diagram of the model training process of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-3, the present invention provides a technical solution: a method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, comprising the steps of:
step one, file input, namely inputting a three-dimensional molecular structure file of the pharmaceutical compound;
step two, descriptor calculation, namely calculating the descriptors of the molecule with CDK;
step three, training, namely dividing a data set containing text data and label data into a training set and a verification set; training a model on the training set data and outputting a parameter prediction model; then optimizing the model with an Adam optimizer over 20 training epochs;
step four, model prediction, namely feeding the descriptors into the trained parameter prediction model to obtain predictions;
and step five, parameter output, namely outputting parameter information such as lipophilicity, solubility, plasma protein binding rate and transdermal property.
In a further embodiment, the structure file in step one may be in a format such as SDF or MOL.
In a further embodiment, in step two, the regression function is trained after the descriptors are calculated.
In a further embodiment, the training of the regression function comprises the steps of:
a. inputting descriptor information of the compound, and carrying out nonlinear processing through a Tanh activation function;
b. carrying out nonlinear processing on the processing result through a ReLU activation function;
c. training a regression function model for each parameter.
In a further embodiment, the verification set in step three may be used to verify the output parameter prediction model.
It should be noted that, as shown in FIG. 1, structure files of a pharmaceutical compound molecule in different formats can be converted and input. The molecular descriptors are calculated with CDK, i.e., the Chemistry Development Kit, and are fed to the trained model, which predicts pharmacokinetic parameters such as lipophilicity, solubility, plasma protein binding rate and transdermal property.
As shown in FIG. 2, the model is trained as follows: descriptor information of a compound is input and passed through a Tanh activation function for nonlinear processing, the result is then passed through a ReLU activation function for further nonlinear processing, and a regression function model is trained for each parameter.
As shown in FIG. 3, in the model training process the data set containing text data and label data is divided into a training set and a verification set. The model is trained on the training set data and output, the output model is verified with the verification set data, and the model is then optimized with the Adam optimizer over 20 training epochs.
The system can accurately predict the values of the pharmacokinetic parameters and can be used for early screening and drug design.
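The training procedure of FIG. 3 (train on the training set, check on the verification set, optimize with Adam for 20 epochs) can be sketched as follows. To stay short, the sketch fits a single linear output rather than the full Tanh/ReLU network. The learning rate, the other Adam hyperparameters, the squared-error loss and the synthetic data are all assumptions, not values from the patent.

```python
import math
import random


def predict(params, x):
    """Linear stand-in model: weights followed by a bias term."""
    *weights, bias = params
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def mse(params, data):
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def adam_step(params, grads, state, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One in-place Adam update with bias-corrected moment estimates."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def train(train_set, validation_set, n_features, epochs=20, seed=0):
    rng = random.Random(seed)
    params = [0.0] * (n_features + 1)
    state = {"t": 0, "m": [0.0] * len(params), "v": [0.0] * len(params)}
    history = []
    for _ in range(epochs):
        order = train_set[:]
        rng.shuffle(order)
        for x, y in order:
            err = predict(params, x) - y
            grads = [2 * err * xi for xi in x] + [2 * err]  # d(err^2)/d(params)
            adam_step(params, grads, state)
        history.append(mse(params, validation_set))  # verify after each epoch
    return params, history


# Synthetic data standing in for (descriptor vector, measured parameter) pairs.
data_rng = random.Random(1)
data = []
for _ in range(50):
    x = [data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
    data.append((x, 1.5 * x[0] - 0.7 * x[1] + 0.3))
train_set, validation_set = data[:40], data[40:]

initial_loss = mse([0.0, 0.0, 0.0], validation_set)
params, history = train(train_set, validation_set, n_features=2)
print(f"verification MSE: {initial_loss:.4f} -> {history[-1]:.4f}")
```

Recording the verification error after every epoch gives a simple way to judge whether 20 epochs is enough for a given data set or whether the model has started to overfit.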
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (5)

1. A method for predicting pharmacokinetic parameters of a pharmaceutical compound based on logistic regression, comprising the steps of:
step one, file input, namely inputting a three-dimensional molecular structure file of the pharmaceutical compound;
step two, descriptor calculation, namely calculating the descriptors of the molecule with CDK;
step three, training, namely dividing a data set containing text data and label data into a training set and a verification set; training a model on the training set data and outputting a parameter prediction model; then optimizing the model with an Adam optimizer over 20 training epochs;
step four, model prediction, namely feeding the descriptors into the trained parameter prediction model to obtain predictions;
and step five, parameter output, namely outputting parameter information such as lipophilicity, solubility, plasma protein binding rate and transdermal property.
2. The method of claim 1, wherein the structure file in step one is in a format such as SDF or MOL.
3. The method of claim 1, wherein in step two the regression function is trained after the descriptors are calculated.
4. The method of claim 3, wherein the training of the regression function comprises the steps of:
a. inputting descriptor information of the compound, and carrying out nonlinear processing through a Tanh activation function;
b. carrying out nonlinear processing on the processing result through a ReLU activation function;
c. training a regression function model for each parameter.
5. The method of claim 1, wherein the verification set in step three is used to verify the output parameter prediction model.
CN201910328251.5A 2019-04-23 2019-04-23 Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression Pending CN111833971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910328251.5A CN111833971A (en) 2019-04-23 2019-04-23 Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression

Publications (1)

Publication Number Publication Date
CN111833971A 2020-10-27

Family

ID=72911469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910328251.5A Pending CN111833971A (en) 2019-04-23 2019-04-23 Pharmaceutical compound pharmacokinetic parameter prediction method based on logistic regression

Country Status (1)

Country Link
CN (1) CN111833971A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1111533A2 (en) * 1999-12-15 2001-06-27 Pfizer Products Inc. Logistic regression trees for drug analysis
KR20080040481A (en) * 2006-11-03 2008-05-08 주식회사 인실리코텍 System, method and program for pharmacokinetic parameter prediction of peptide sequence by mathematical model
KR20120085144A (en) * 2011-10-05 2012-07-31 주식회사 켐에쎈 Multiple linear regression-artificial neural network hybrid model predicting water solubility of pure organic compound
CN106909990A (en) * 2017-03-01 2017-06-30 腾讯科技(深圳)有限公司 A kind of Forecasting Methodology and device based on historical data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452101A (en) * 2023-06-09 2023-07-18 淄博市中心医院 Intelligent anesthesia department medicine distribution charging method and system
CN116452101B (en) * 2023-06-09 2023-08-25 淄博市中心医院 Intelligent anesthesia department medicine distribution charging method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination