CN117252110A - Method, system, equipment and medium for predicting imatinib valley concentration - Google Patents


Info

Publication number: CN117252110A
Authority: CN (China)
Prior art keywords: imatinib; valley concentration; concentration prediction; valley; prediction model
Legal status: Pending (as listed by Google Patents; not a legal conclusion)
Application number: CN202311514369.XA
Other languages: Chinese (zh)
Inventors: 刘博宇, 黄琳, 陶依然, 邵千航, 江倩, 封宇飞, 张晓红
Current assignee: Peking University Peoples Hospital
Original assignee: Peking University Peoples Hospital
Events:
Application filed by Peking University Peoples Hospital
Priority to CN202311514369.XA
Publication of CN117252110A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physiology (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Toxicology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to the technical field of medical and healthcare information data processing for prediction purposes. It discloses a method, system, device and storage medium for predicting the imatinib trough (valley) concentration. The method comprises the following steps:
- obtain a training set and a test set from historical data of a target population;
- train a plurality of candidate imatinib trough concentration prediction models on the training set using different algorithms;
- obtain each candidate model's imatinib trough concentration predictions on the test set;
- score each candidate model according to these predictions, and select the candidate with the best score as the optimal imatinib trough concentration prediction model;
- predict the imatinib trough concentration of a test subject with the optimal model.
By constructing the optimal prediction model, the method can predict imatinib trough concentrations accurately and helps physicians make individualized medication decisions.

Description

Method, system, equipment and medium for predicting imatinib valley concentration
Technical Field
The invention relates to the technical field of medical and healthcare information data processing for prediction purposes. In particular, it relates to a method, system, device and storage medium for predicting the imatinib trough (valley) concentration.
Background
Imatinib is an oral BCR-ABL1 tyrosine kinase inhibitor. It blocks cell proliferation and tumor formation by interrupting aberrant tyrosine kinase signaling. Imatinib is recommended as one of the first-line drugs for chronic myelogenous leukemia (CML). This is because of its high cytogenetic and molecular response rates and its favorable progression-free and overall survival. Studies have reported a close correlation between in vivo imatinib exposure and clinical outcome, with marked inter-individual variability. In clinical practice, the steady-state plasma trough concentration of imatinib is used as the measure of its in vivo exposure. Personalized medicine therefore requires therapeutic drug monitoring (TDM). It also requires identifying the clinical factors responsible for inter-individual differences in steady-state imatinib plasma trough concentrations.
Machine learning is a branch of artificial intelligence that builds models from large, complex datasets using a range of algorithms. When an outcome is predicted from many variables, machine learning is data-driven and can capture nonlinear relationships between variables, which can yield high-precision predictions. It is developing rapidly and has been widely applied in biomedical fields such as clinical diagnosis, precision therapy and health monitoring. However, existing studies of imatinib trough concentration mostly use population pharmacokinetic (PPK) models. There has been no report of using machine learning to build an imatinib trough concentration prediction model and interpret its variables.
Disclosure of Invention
The invention provides a method, system, device and storage medium for predicting the imatinib trough concentration. It addresses a shortcoming of the prior art, which cannot obtain high-precision predictions of the imatinib trough concentration.
The invention provides a method for constructing an imatinib trough concentration prediction model, comprising the following steps:
obtaining a training set and a test set from historical data of a target population, wherein the target population consists of patients who have received imatinib treatment, and the training set and the test set each comprise the clinical data of a plurality of patients and the corresponding actual imatinib trough concentrations;
training a plurality of candidate imatinib trough concentration prediction models on the training set using different algorithms, the algorithms comprising at least two of the following: linear regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, support vector regression, random forest, and extreme gradient boosting (XGBoost);
obtaining the imatinib trough concentration predictions of each candidate model on the test set;
and scoring each candidate model according to its predictions, and selecting the candidate model with the best score as the optimal imatinib trough concentration prediction model.
In the method for constructing the imatinib trough concentration prediction model, obtaining the training set and the test set from historical data of the target population comprises:
acquiring historical data of the target population;
screening the historical data of the target population, wherein:
- historical data are included if the patient has taken imatinib for more than one month, so that the actual imatinib trough concentration has reached steady state, and if the actual trough concentration was measured in venous blood drawn 0-2 hours before the next dose;
- historical data are excluded if the patient is pregnant, has another cancer, is taking concomitant drugs known to affect imatinib pharmacokinetics, lacks clinical data, had diarrhea at the time of venous blood collection, or has an actual imatinib trough concentration below the lower limit of quantification;
dividing the screened historical data of the target population into a training set and a test set at a ratio of 7:3.
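The inclusion and exclusion rules above amount to a per-record predicate. Below is a minimal Python sketch. Several details are assumptions made for illustration and are not part of the patent:
- the record fields;
- measuring treatment duration in months;
- the 5 ng/mL lower limit of quantification, taken here from the lower end of the 5-5000 ng/mL linear range reported for the assay.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical LLOQ: assumed equal to the lower end of the assay's linear range.
LLOQ_NG_PER_ML = 5.0


@dataclass
class TroughRecord:
    """One trough measurement with the fields the screening rules need (hypothetical schema)."""
    months_on_imatinib: float
    hours_before_next_dose: float
    trough_ng_per_ml: float
    pregnant: bool = False
    other_cancer: bool = False
    interacting_comedication: bool = False
    missing_clinical_data: bool = False
    diarrhea_at_sampling: bool = False


def is_eligible(r: TroughRecord) -> bool:
    """Apply the inclusion and exclusion criteria to a single record."""
    included = (
        r.months_on_imatinib > 1.0                   # more than one month: steady state assumed
        and 0.0 <= r.hours_before_next_dose <= 2.0   # venous blood drawn 0-2 h before next dose
    )
    excluded = (
        r.pregnant
        or r.other_cancer
        or r.interacting_comedication
        or r.missing_clinical_data
        or r.diarrhea_at_sampling
        or r.trough_ng_per_ml < LLOQ_NG_PER_ML       # below the lower limit of quantification
    )
    return included and not excluded


def screen(records: List[TroughRecord]) -> List[TroughRecord]:
    """Keep only the records that pass screening."""
    return [r for r in records if is_eligible(r)]
```

Keeping the rules in one predicate makes it easy to report how many records each criterion removes.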
In the method, the clinical data comprise demographic data, physiological data and genetic data.
- The demographic data comprise one or more of: age, sex, height, weight and body mass index.
- The physiological data comprise one or more of: number of bowel movements per day, imatinib dosing regimen, time since the last dose when venous blood was drawn, actual imatinib trough concentration, white blood cell count, absolute lymphocyte count, absolute monocyte count, absolute neutrophil count, hemoglobin, platelet count, alanine aminotransferase, aspartate aminotransferase, albumin, urea, total bilirubin, direct bilirubin and glomerular filtration rate.
In the method, the genetic data are obtained by:
extracting genomic DNA from peripheral blood leukocytes of the target population;
selecting a plurality of SNP loci with reference to single nucleotide polymorphism databases, using the criterion that the minor allele frequency in Asian populations is greater than 5%;
genotyping the selected SNP loci to obtain the genetic data.
In the method, scoring each candidate model according to its predictions specifically comprises:
computing four metrics for each candidate model from its predictions:
- the goodness of fit;
- the mean squared error;
- the mean absolute error;
- the relative accuracy, i.e. the proportion of predicted trough concentrations that fall within a preset range of the actual trough concentrations.
In the method, selecting the candidate model with the best score as the optimal imatinib trough concentration prediction model comprises:
when a single candidate model has the highest goodness of fit, taking that model as the optimal imatinib trough concentration prediction model;
when two or more candidate models share the same maximum goodness of fit, taking the one among them with the higher relative accuracy as the optimal imatinib trough concentration prediction model.
The method for constructing the imatinib trough concentration prediction model further comprises:
explaining the feature importance of the optimal imatinib trough concentration prediction model using the SHAP (SHapley Additive exPlanations) interpretation method.
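SHAP attributes each prediction to the input variables using Shapley values. The sketch below computes exact Shapley values by enumerating every feature subset. "Absent" features are replaced with a single reference (baseline) patient. This is a simplified illustration of the idea under that assumption. It is not the `shap` library, nor necessarily the explainer the authors used.

Exhaustive enumeration is only feasible for a handful of features. The `shap` library uses a specialized algorithm for tree ensembles such as XGBoost. The function names here are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to the baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take the patient's values; the rest take the baseline's.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def rank_by_mean_abs(phi_rows: List[List[float]], names: List[str]) -> List[str]:
    """Order features by mean |Shapley value| across patients, largest first."""
    means = [sum(abs(row[j]) for row in phi_rows) / len(phi_rows) for j in range(len(names))]
    return [name for _, name in sorted(zip(means, names), reverse=True)]
```

For a linear model, the value for feature i reduces to its coefficient times (x_i minus the baseline x_i). The values always sum to f(x) minus f(baseline). Averaging |φ| over patients gives a global ranking of the kind plotted in Fig. 5, assuming that figure uses mean absolute SHAP values.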
The invention also provides an imatinib trough concentration prediction system, comprising:
a data acquisition module, configured to obtain clinical data of a test subject, the test subject being a patient treated with imatinib;
an imatinib trough concentration prediction module, configured to predict the test subject's imatinib trough concentration from the subject's clinical data, using the optimal imatinib trough concentration prediction model obtained by any of the construction methods described above.
The invention also provides an electronic device comprising a processor and a memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:
obtaining clinical data of a test subject, the test subject being a patient treated with imatinib;
predicting the test subject's imatinib trough concentration from the subject's clinical data, using the optimal imatinib trough concentration prediction model obtained by any of the construction methods described above.
The invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
obtaining clinical data of a test subject, the test subject being a patient treated with imatinib;
predicting the test subject's imatinib trough concentration from the subject's clinical data, using the optimal imatinib trough concentration prediction model obtained by any of the construction methods described above.
The invention also provides a computer program product comprising a computer program that may be stored on a non-transitory computer-readable storage medium and that, when executed by a processor, implements the following steps:
obtaining clinical data of a test subject, the test subject being a patient treated with imatinib;
predicting the test subject's imatinib trough concentration from the subject's clinical data, using the optimal imatinib trough concentration prediction model obtained by any of the construction methods described above.
The method, system, device and storage medium provided by the invention work as follows. A training set and a test set are built from actual patient data. Several candidate imatinib trough concentration prediction models are trained with different machine learning algorithms. The candidates are scored on their test-set predictions, and the most accurate one is selected as the optimal model. The optimal model can then predict the imatinib trough concentration accurately. This effectively helps physicians make individualized medication decisions and can bring considerable economic and social value to clinical practice.
Drawings
To illustrate the invention and the prior-art technical solutions more clearly, the drawings used in the embodiments and in the description of the prior art are briefly described below. These drawings show some embodiments of the invention. A person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a first schematic flow chart of the method for constructing an imatinib trough concentration prediction model provided by the invention.
Fig. 2 is a second schematic flow chart of the method for constructing an imatinib trough concentration prediction model provided by the invention.
Fig. 3 compares the prediction performance of the six candidate imatinib trough concentration prediction models.
Fig. 4 shows the SHAP values of the variables.
Fig. 5 shows the variables ranked by mean SHAP value.
Fig. 6 shows the SHAP values versus the values of individual variables.
Fig. 7 is a schematic structural diagram of the imatinib trough concentration prediction system provided by the invention.
Fig. 8 is a schematic structural diagram of the electronic device provided by the invention.
Detailed Description
To make the objectives, technical solutions and advantages of the invention clearer, its technical solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the invention and should not be construed as limiting it. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of the invention. In the description of the invention, the terminology used is for descriptive purposes only and is not to be interpreted as indicating or implying relative importance.
The method, system, device and storage medium for predicting the imatinib trough concentration provided by the invention are described below with reference to Figs. 1-8.
Figs. 1-2 are schematic flow charts of the method for constructing an imatinib trough concentration prediction model. Referring to Fig. 1, the method provided by the invention may include:
step S110: obtaining a training set and a test set from historical data of a target population, wherein the target population consists of patients who have received imatinib treatment, and the training set and the test set each comprise the clinical data of a plurality of patients and the corresponding actual imatinib trough concentrations;
step S120: training a plurality of candidate imatinib trough concentration prediction models on the training set using different algorithms, the algorithms comprising at least two of the following: linear regression (Linear regression), least absolute shrinkage and selection operator regression (Lasso regression), ridge regression (Ridge regression), support vector regression (SVR), random forest (RF), and extreme gradient boosting (XGBoost);
step S130: obtaining the imatinib trough concentration predictions of each candidate model on the test set;
step S140: scoring each candidate model according to its predictions, and selecting the candidate model with the best score as the optimal imatinib trough concentration prediction model.
In this method, the training set and the test set are built from actual patient data. Several candidate imatinib trough concentration prediction models are trained with different machine learning algorithms, and the candidates are scored on their test-set predictions. The most accurate model can therefore be selected as the optimal one. This yields accurate imatinib trough concentration predictions and is of significant value for individualized clinical medication.
In one embodiment, step S110 may include:
acquiring historical data of the target population;
screening the historical data of the target population, wherein:
- historical data are included if the patient has taken imatinib for more than one month, so that the actual imatinib trough concentration has reached steady state, and if the actual trough concentration was measured in venous blood drawn 0-2 hours before the next dose. Steady state is generally defined as the plasma concentration plateau reached after 4-5 half-lives, and a patient who has taken imatinib for one month can be considered to be at steady state;
- historical data are excluded if the patient is pregnant, has another cancer, is taking concomitant drugs known to affect imatinib pharmacokinetics, lacks clinical data, had diarrhea at the time of venous blood collection, or has an actual imatinib trough concentration below the lower limit of quantification;
dividing the screened historical data of the target population into a training set and a test set at a ratio of 7:3.
Specifically, referring to Fig. 2, the target population consisted of 236 patients who received imatinib treatment at a hospital from November 2020 to October 2021. An example is patients with a definite diagnosis of chronic myelogenous leukemia in the medical record system, for which imatinib is recommended as a first-line drug. The historical data of the target population came from 434 imatinib trough concentration measurements in these patients. The patients were followed up during treatment, and each follow-up included a detailed medical history and laboratory examinations. The clinical data and the corresponding actual imatinib trough concentrations could therefore be retrieved from the patients' medical files and follow-up records.

The data were then screened according to the target-population criteria. The screened data formed a training set (303 cases) and a test set (131 cases) for obtaining the optimal imatinib trough concentration prediction model. To compare the two sets:
- continuous variables such as age and weight were analyzed with the independent-samples t-test (parametric variables) or the Wilcoxon test (nonparametric variables);
- count data and categorical variables such as sex and gene loci were analyzed with the chi-square test or Fisher's exact test;
- p ≤ 0.05 was taken as significant.
All comparisons gave p > 0.05, i.e., there was no significant difference between the training set and the test set.
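A 7:3 split of the screened records can be sketched as below. The shuffle, the fixed seed and rounding the training-set size down are assumptions; the patent does not say how records were assigned to the two sets. With rounding down, 434 records give 303 training and 131 test cases, the same counts as above.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_7_3(records: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle records reproducibly and return (training set, test set) at roughly 7:3."""
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)
    n_train = math.floor(0.7 * len(records))  # round down: 434 records -> 303 train / 131 test
    train = [records[i] for i in indices[:n_train]]
    test = [records[i] for i in indices[n_train:]]
    return train, test
```

Splitting at the measurement level, as here, can place several measurements from the same patient in both sets. Grouping records by patient before splitting is a common alternative.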
In one embodiment, the clinical data include demographic data, physiological data and genetic data.
- The demographic data include one or more of: age, sex, height, weight and body mass index (BMI).
- The physiological data, treated as continuous variables by default, include one or more of: number of bowel movements per day, dosing regimen (e.g., dose level and method of administration), time since the last dose at venous blood collection, actual imatinib trough concentration, white blood cell count, absolute lymphocyte count, absolute monocyte count, absolute neutrophil count, hemoglobin, platelet count, alanine aminotransferase (ALT), aspartate aminotransferase (AST), albumin (ALB), urea, total bilirubin, direct bilirubin and glomerular filtration rate (GFR).
The physiological data come from the patients' medical records. They are hospital laboratory measurements taken close to the blood draw for imatinib concentration measurement, and at least on the same day.
It should be noted that the glomerular filtration rate (eGFR) can be estimated with the existing MDRD equation.
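For reference, below is a sketch of the four-variable, IDMS-traceable MDRD Study equation. In mL/min/1.73 m², eGFR = 175 × Scr^-1.154 × age^-0.203, multiplied by 0.742 if female and 1.212 if Black, with serum creatinine Scr in mg/dL.

The patent does not state which MDRD variant was used. Chinese-adapted versions with different coefficients also exist. The choice of variant, the µmol/L-to-mg/dL factor of 88.4 and the function name are therefore assumptions. Serum creatinine is needed as an input.

```python
def egfr_mdrd(scr_umol_per_l: float, age_years: float, female: bool, black: bool = False) -> float:
    """Four-variable IDMS-traceable MDRD eGFR in mL/min/1.73 m^2 (assumed variant)."""
    if scr_umol_per_l <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    scr_mg_dl = scr_umol_per_l / 88.4  # convert µmol/L to mg/dL
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr
```

For example, a 50-year-old man with a serum creatinine of 88.4 µmol/L (1.0 mg/dL) gets an eGFR of about 79 mL/min/1.73 m².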
It should be noted that the actual imatinib trough concentration may be obtained as follows.

After the patient has taken imatinib for at least one month (i.e., the imatinib trough concentration has reached steady state), a whole blood sample is collected 0 to 2 hours before the next dose. The sample goes into a 3.0 mL ethylenediaminetetraacetic acid (EDTA) anticoagulant vacuum blood collection tube and is centrifuged at 3000×g for 10 minutes to separate plasma from blood cells. A 50 μL aliquot of plasma is pretreated by protein precipitation with a [²H₈]-labeled internal standard, and 2 μL of the supernatant is taken for analysis. The actual imatinib trough concentration is measured on a Thermo Scientific Vanquish UHPLC coupled to a TSQ Quantis tandem mass spectrometer (Thermo Fisher Scientific, CA, USA).

The chromatographic conditions are:
- mobile phases: water containing 10 mM ammonium formate and 0.1% formic acid (solvent A), and acetonitrile containing 0.1% formic acid (solvent B);
- gradient elution: 0-0.5 min, 90% A; 0.5-1.0 min, 90% A to 10% A; 1.0-2.0 min, 10% A; 2.0-2.1 min, 10% A to 90% A; 2.1-3.0 min, 90% A (0.9 min re-equilibration);
- flow rate 0.4 mL/min and injection volume 2 μL;
- column temperature 45 °C and autosampler temperature 4 °C.

The method shows good linearity for imatinib concentrations of 5-5000 ng/mL. Trough concentrations obtained in this way are close to the true values, which helps ensure the accuracy of model training.
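The gradient program is easier to read as a set of breakpoints. The sketch below encodes it and interpolates the mobile-phase composition at any time point. The text gives only the segment endpoints, so the linear ramps between breakpoints are an assumption, and all names are hypothetical.

```python
from typing import List, Tuple

# (time in min, % solvent A) breakpoints transcribed from the gradient program above.
GRADIENT: List[Tuple[float, float]] = [
    (0.0, 90.0), (0.5, 90.0), (1.0, 10.0), (2.0, 10.0), (2.1, 90.0), (3.0, 90.0),
]
LINEAR_RANGE_NG_PER_ML = (5.0, 5000.0)


def percent_a(t_min: float) -> float:
    """%A at time t, assuming linear ramps between breakpoints; %B is 100 - %A."""
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the 0-3.0 min run")


def within_linear_range(conc_ng_per_ml: float) -> bool:
    """True if a measured concentration lies inside the validated linear range."""
    low, high = LINEAR_RANGE_NG_PER_ML
    return low <= conc_ng_per_ml <= high
```

For example, halfway through the first ramp (0.75 min) the mobile phase is 50% A and 50% B.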
Further, the genetic data can be obtained as follows:
extracting genomic DNA from peripheral blood leukocytes of the target population;
selecting a plurality of SNP loci with reference to single nucleotide polymorphism databases, using the criterion that the minor allele frequency in Asian populations is greater than 5%;
genotyping the selected SNP loci to obtain the genetic data.
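To illustrate the MAF > 5% criterion, the sketch below computes the minor allele frequency of a biallelic SNP from genotype counts and keeps SNPs above the threshold. The counts could come from, for example, an Asian reference panel. The patent takes allele frequencies from databases rather than computing them. The function names and placeholder SNP IDs are hypothetical.

```python
from typing import Dict, List


def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """MAF of a biallelic SNP from genotype counts (each person carries two alleles)."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotypes")
    alt_freq = (n_het + 2 * n_hom_alt) / n_alleles
    return min(alt_freq, 1.0 - alt_freq)


def select_snps(maf_by_snp: Dict[str, float], threshold: float = 0.05) -> List[str]:
    """Keep SNPs whose MAF strictly exceeds the threshold (> 5% by default)."""
    return [snp for snp, maf in maf_by_snp.items() if maf > threshold]
```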
Specifically, genomic DNA can be extracted from peripheral blood leukocytes of the target population with a commercial kit (E.Z.N.A.™ SQ Blood DNA Kit), following the manufacturer's instructions and standard protocols. With reference to single nucleotide polymorphism (SNP) databases (dbSNP, HapMap), 10 SNP sites are selected in the CYP3A4, CYP3A5, ABCB1, ABCG2, SLC22A1 and POR genes. The selection criterion is a minor allele frequency (MAF) greater than 5% in Asian populations. The rationale is as follows.
- Metabolic enzyme gene polymorphisms: imatinib is metabolized mainly by CYP3A4 and CYP3A5. Polymorphisms of the NADPH-cytochrome P450 oxidoreductase (POR) gene can affect the content and activity of CYP450 enzymes, mainly CYP3A4 and CYP3A5.
- Transporter gene polymorphisms: P-glycoprotein (P-gp), encoded by ABCB1, pumps cellular metabolites and drug molecules out of cells. It is an important determinant of drug absorption and excretion in vivo, and imatinib is one of its substrates. The transporter encoded by ABCG2 and the organic cation transporter encoded by SLC22A1 can also affect imatinib blood concentrations.

The SNaPshot technique is then used to genotype the 10 SNP sites, yielding the genetic data:
- CYP3A4: *1G (20239G>A; rs2242480) and rs4646437;
- CYP3A5: *3 (6986A>G; rs776746);
- ABCB1: 1236C>T (rs1128503), 3435C>T (rs1045642) and 2677G>T/A (rs2032582);
- ABCG2: 421C>A (rs2231142);
- SLC22A1: 480C>G (rs683369) and 1222A>G (rs628031);
- POR: *28 (C>T; rs777868).
It is uncertain which patient populations carry these variants, and this uncertainty affects any investigation of how gene polymorphisms influence imatinib plasma trough concentrations. Genotyping is therefore required before the model is applied for prediction.
It should be noted that experiments found no variant at the CYP3A4 SNP site rs4646437 among the patients included in the study, so this variable was excluded.
In the invention, the candidate imatinib trough concentration prediction models are trained on three categories of data: demographic, physiological and genetic. Excluding the actual imatinib trough concentration, these comprise 31 variables in total. Training on all three categories can effectively improve prediction accuracy.
After the training set and the test set are obtained, this embodiment trains six candidate imatinib trough concentration prediction models with six different machine learning algorithms:
- linear regression (Linear regression);
- least absolute shrinkage and selection operator regression (Lasso regression);
- ridge regression (Ridge regression);
- support vector regression (SVR);
- random forest (RF);
- extreme gradient boosting (XGBoost).
All six use existing algorithm formulations and model frameworks. Existing model packages can be downloaded from public repositories, and each model is obtained by training on the training set. The clinical data of the test-set patients are then input into each trained candidate model to obtain its imatinib trough concentration predictions. Comparing these predictions with the patients' actual trough concentrations gives the prediction performance of each model.
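Genotypes need a numeric encoding before they can enter a regression model. A site with no observed variant, like rs4646437 here, carries no information. The sketch below drops such constant columns. It also assumes additive coding (the number of minor-allele copies), which the patent does not specify. A triallelic site such as ABCB1 2677G>T/A would need a different scheme. All names are hypothetical.

```python
from typing import Dict, List


def additive_code(genotype: str, minor_allele: str) -> int:
    """Count copies of the minor allele in a genotype such as 'C/T' (0, 1 or 2)."""
    return genotype.split("/").count(minor_allele)


def drop_monomorphic(codes_by_snp: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Remove SNP columns that take a single value across all patients (no variant observed)."""
    return {snp: codes for snp, codes in codes_by_snp.items() if len(set(codes)) > 1}
```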
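Below is a dependency-free sketch of the train-then-predict loop. It uses two of the six listed algorithms, ordinary least squares and ridge regression, both solved in closed form through the normal equations. In practice, library implementations would be used for all six, for example scikit-learn and the xgboost package. The helper names and the penalty value are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ridge(X: List[Row], y: List[float], alpha: float) -> Predictor:
    """Ridge regression (XᵀX + αI)w = Xᵀy with an unpenalised intercept; alpha=0 is OLS."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # skip the intercept when adding the penalty
        xtx[i][i] += alpha
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    w = solve(xtx, xty)
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Candidate algorithms: name -> function that trains on (X, y) and returns a predictor.
CANDIDATES: Dict[str, Callable[[List[Row], List[float]], Predictor]] = {
    "linear_regression": lambda X, y: fit_ridge(X, y, alpha=0.0),
    "ridge_regression": lambda X, y: fit_ridge(X, y, alpha=10.0),
}


def predict_on_test_set(
    X_train: List[Row], y_train: List[float], X_test: List[Row]
) -> Dict[str, List[float]]:
    """Train every candidate on the training set and collect its test-set predictions."""
    predictions = {}
    for name, train in CANDIDATES.items():
        model = train(X_train, y_train)
        predictions[name] = [model(x) for x in X_test]
    return predictions
```

Lasso, SVR, random forest and XGBoost would be added to `CANDIDATES` the same way, each as a function that trains and returns a predictor. For ridge and Lasso, features are usually standardized first so that the penalty treats them comparably.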
In one embodiment, step S140 may include:
obtaining, from the imatinib trough concentration prediction results, the goodness of fit (R²), the mean squared error (MSE), the mean absolute error (MAE), and the relative accuracy (the proportion of predicted values within a preset range of the actual imatinib trough concentration) of each candidate imatinib trough concentration prediction model;
when the goodness of fit has a unique maximum among the candidate models, taking the candidate model with the highest goodness of fit as the optimal imatinib trough concentration prediction model;
and when two or more candidate models share the same maximum goodness of fit, taking the one among them with the higher relative accuracy as the optimal imatinib trough concentration prediction model.
It should be noted that the goodness of fit, mean squared error, mean absolute error and relative accuracy of the candidate models are calculated with the standard formulas used to evaluate model performance. The goodness of fit generally lies between 0 and 1, and the closer it is to 1, the better the model fits. In this embodiment, the relative accuracy is specifically the proportion of test-set patients whose predicted imatinib trough concentration is within ±30% of their actual trough concentration. Fig. 3 compares the predictive performance of the six candidate models on metrics including, but not limited to, goodness of fit, mean squared error, mean absolute error and relative accuracy.
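The four metrics can be written out directly. The sketch below uses the standard definitions (R² = 1 - SS_res/SS_tot, MSE, MAE) and counts a prediction as accurate when its error is within ±30% of the actual concentration, as in this embodiment. The function name and the sample values are illustrative only.

```python
from typing import Dict, Sequence


def evaluate(
    y_true: Sequence[float], y_pred: Sequence[float], tolerance: float = 0.30
) -> Dict[str, float]:
    """Goodness of fit (R^2), MSE, MAE and the share of predictions within +/-tolerance."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # A prediction counts as accurate if |pred - actual| <= tolerance * actual.
    within = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= tolerance * t)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": ss_res / n,
        "MAE": sum(abs(r) for r in residuals) / n,
        "relative_accuracy": within / n,
    }


# Hypothetical actual vs. predicted trough concentrations, not data from the study.
metrics = evaluate([1000, 800, 1200, 600], [1100, 700, 1250, 900])
# R2 = 0.4375, MSE = 28125.0, MAE = 137.5, relative_accuracy = 0.75
```

Note that R² computed on a held-out test set can fall below 0 when a model predicts worse than the test-set mean.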
When the candidate models are evaluated, goodness of fit takes priority and relative accuracy is considered second. If the six candidate models all have different goodness of fit, the one with the largest goodness of fit is selected as the optimal imatinib trough concentration prediction model. If two or more of them share the same maximum goodness of fit, the one among these with the higher relative accuracy is selected as the optimal model. If two or more candidate models share both the maximum goodness of fit and the highest relative accuracy, which of them is used as the optimal model can be decided according to practical conditions, or all of them can be regarded as optimal models available for use by doctors.
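The selection rule can be sketched as follows. Comparing scores after rounding to a fixed number of decimals is an assumption added here, since exactly equal floating-point scores are unlikely; when a tie survives both criteria, all tied models are returned, matching the option of treating them all as optimal. The scores in the example are made up, not results from the study.

```python
from typing import Dict, List


def select_optimal(scores: Dict[str, Dict[str, float]], ndigits: int = 3) -> List[str]:
    """Name(s) of the optimal model(s): highest R2, ties broken by relative accuracy.

    Scores are rounded to `ndigits` before comparison (an assumption). If models
    are still tied after both criteria, all of them are returned.
    """
    best_r2 = max(round(s["R2"], ndigits) for s in scores.values())
    top = [m for m, s in scores.items() if round(s["R2"], ndigits) == best_r2]
    if len(top) == 1:
        return top
    # Tie on goodness of fit: fall back to relative accuracy.
    best_acc = max(round(scores[m]["relative_accuracy"], ndigits) for m in top)
    return [m for m in top if round(scores[m]["relative_accuracy"], ndigits) == best_acc]


scores = {  # made-up numbers for illustration only
    "Ridge": {"R2": 0.61, "relative_accuracy": 0.58},
    "RF": {"R2": 0.64, "relative_accuracy": 0.55},
    "XGBoost": {"R2": 0.64, "relative_accuracy": 0.60},
}
best = select_optimal(scores)  # ['XGBoost']: tied with RF on R2, higher relative accuracy
```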
In one embodiment, the method for constructing the imatinib trough concentration prediction model provided by the invention further comprises step S150:
interpreting the feature importance of the optimal imatinib trough concentration prediction model using the SHAP (SHapley Additive exPlanations) method.
It should be noted that SHAP is a game-theoretic approach to explaining the output of a machine learning model. It distributes credit for the model output among all relevant covariates using Shapley values from cooperative game theory. As an additive feature attribution method, SHAP expresses the contribution of each feature in a sample, with each feature treated as a "contributor". Features with positive SHAP values increase the predicted output, and larger SHAP values indicate larger contributions.
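To make the game-theoretic definition concrete, the sketch below computes exact Shapley values by enumerating every coalition of features. It assumes that features outside a coalition are set to a fixed reference (baseline) value, which is only one of several ways to define a coalition's value; the SHAP library itself uses background data and model-specific algorithms (for example, for tree ensembles) rather than this brute-force loop, whose cost grows exponentially with the number of features. The toy model is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley value of each feature of `x`, relative to `baseline`."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take their observed value, the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical model with an interaction between features 0 and 2.
    return 500 + 2 * z[0] - 3 * z[1] + z[0] * z[2]


x = [10.0, 4.0, 5.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
# approximately [45.0, -12.0, 25.0], summing to toy_model(x) - toy_model(baseline) = 58
```

The interaction term is split equally between features 0 and 2, which is how Shapley values share credit for a symmetric joint effect.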
In this embodiment, the SHAP values of the 30 variables selected in the training set are calculated with the SHAP method, using its existing computation procedure, to explain the importance of each of these variables in the optimal imatinib trough concentration prediction model. A SHAP plot can display the relationship between each selected variable and the imatinib trough concentration. A feature value is the actual value of a variable (for example, a height of 170 cm), and each feature value of a variable corresponds to a SHAP value. For a single sample, the base value (the expected model output over the background data) plus the sum of the SHAP values of all variables equals the predicted imatinib trough concentration for that sample. The SHAP values of the 30 variables are summarized in Fig. 4, where the shade of each dot indicates the feature value of the corresponding variable. Fig. 5 shows the importance ranking of the variables in the optimal imatinib trough concentration prediction model. To understand how the feature values of individual variables affect the model's predictions, the SHAP values of each variable are plotted against its feature values, as shown in Fig. 6.
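A global importance ranking like the one in Fig. 5 is commonly obtained by averaging the absolute SHAP value of each variable over all samples; the sketch below assumes that convention and also illustrates the additivity property described above. The feature names come from the variable list, but the SHAP values and the base value are invented for illustration.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(
    shap_matrix: Sequence[Sequence[float]], feature_names: Sequence[str]
) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP value| over samples (rows), largest first."""
    n_samples = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def reconstruct_prediction(base_value: float, shap_row: Sequence[float]) -> float:
    """Local accuracy: base value + sum of one sample's SHAP values = its prediction."""
    return base_value + sum(shap_row)


# Invented SHAP values for three samples and three variables (illustration only).
names = ["weight", "age", "rs776746"]
shap_matrix = [
    [-120.0, 30.0, 10.0],
    [80.0, -50.0, -5.0],
    [-40.0, 10.0, 15.0],
]
ranking = rank_by_mean_abs_shap(shap_matrix, names)
# [('weight', 80.0), ('age', 30.0), ('rs776746', 10.0)]
prediction = reconstruct_prediction(1000.0, shap_matrix[0])  # 920.0
```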
By interpreting the feature importance of the optimal imatinib trough concentration prediction model with the SHAP method, the invention can further verify that the optimal model is reasonable for predicting imatinib trough concentration, which helps ensure the accuracy of the prediction results.
Once obtained, the optimal imatinib trough concentration prediction model can be applied to predict imatinib trough concentrations. For example, a doctor can collect the clinical data of a subject (a patient who has received imatinib treatment) and input them into the optimal model to obtain a predicted imatinib trough concentration, which can then assist in designing a personalized dosing regimen and improve the efficiency and accuracy of clinical decisions.
The invention is the first in the medical field to construct an optimal imatinib trough concentration prediction model from real-world data using machine learning algorithms and to interpret that model with the SHAP method, which is of significant value for personalized clinical medication.
The imatinib trough concentration prediction system provided by the invention is described below; the system described below and the imatinib trough concentration prediction method described above may be cross-referenced.
Referring to fig. 7, an imatinib valley concentration prediction system provided by the present invention may include:
a data acquisition module 610, configured to obtain clinical data of a subject, the subject being a patient who has received imatinib treatment;
an imatinib trough concentration prediction module 620, configured to predict the subject's imatinib trough concentration from the subject's clinical data, using the optimal imatinib trough concentration prediction model obtained by the above method for constructing the imatinib trough concentration prediction model.
The clinical data of the subject may include demographic data, physiological data and genetic data, wherein the demographic data include one or more of: age, sex, height, weight and body mass index; and the physiological data include one or more of: number of bowel movements per day, imatinib dosing regimen, time since the last dose when venous blood was drawn, actual imatinib trough concentration, white blood cell count, absolute lymphocyte count, absolute monocyte count, absolute neutrophil count, hemoglobin, platelet count, alanine aminotransferase, aspartate aminotransferase, albumin, urea, total bilirubin, direct bilirubin and glomerular filtration rate. By using the optimal model obtained by the above construction method to predict, from the clinical data of a subject who has received imatinib treatment, that subject's imatinib trough concentration, it can be checked whether the subject's imatinib trough concentration has reached steady state, and the subject's imatinib treatment strategy can be adjusted accordingly.
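An illustrative, hypothetical decomposition of the two modules is sketched below: the data acquisition module maps a subject's record to an ordered feature vector, and the prediction module wraps whatever fitted model was selected as optimal. The class names, the `feature_order` parameter and the linear stub model are assumptions for demonstration, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Sequence


class Regressor(Protocol):
    """Anything with a scikit-learn-style predict(rows) method."""

    def predict(self, rows: Sequence[Sequence[float]]) -> Sequence[float]: ...


@dataclass
class DataAcquisitionModule:
    """Hypothetical counterpart of module 610."""

    feature_order: List[str]

    def acquire(self, record: Dict[str, float]) -> List[float]:
        # Reject incomplete records instead of silently imputing values.
        missing = [name for name in self.feature_order if name not in record]
        if missing:
            raise ValueError(f"missing clinical variables: {missing}")
        return [float(record[name]) for name in self.feature_order]


@dataclass
class TroughPredictionModule:
    """Hypothetical counterpart of module 620, wrapping the optimal model."""

    model: Regressor

    def predict(self, features: List[float]) -> float:
        return float(self.model.predict([features])[0])


class LinearStub:
    """Toy stand-in for a fitted model: intercept + dot(coefs, features)."""

    def __init__(self, intercept: float, coefs: List[float]) -> None:
        self.intercept = intercept
        self.coefs = coefs

    def predict(self, rows: Sequence[Sequence[float]]) -> List[float]:
        return [self.intercept + sum(c * v for c, v in zip(self.coefs, row)) for row in rows]


acquisition = DataAcquisitionModule(feature_order=["age", "weight"])
predictor = TroughPredictionModule(model=LinearStub(800.0, [1.0, 2.0]))
features = acquisition.acquire({"age": 50, "weight": 70})
prediction = predictor.predict(features)  # 800 + 1*50 + 2*70 = 990.0
```

Keeping the model behind a small `predict` interface lets any of the six candidate model types be plugged in once it has been selected as optimal.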
Fig. 8 illustrates the physical structure of an electronic device. As shown in Fig. 8, the electronic device may include a processor 810, a communications interface 820, a memory 830 and a communication bus 840, with the processor 810, the communications interface 820 and the memory 830 communicating with one another via the communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform the following steps:
obtaining clinical data of a subject, the subject being a patient who has received imatinib treatment;
and predicting the subject's imatinib trough concentration from the subject's clinical data, using the optimal imatinib trough concentration prediction model obtained by the above method for constructing the imatinib trough concentration prediction model.
Furthermore, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. On this understanding, the technical solution of the invention in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied as a software product stored in a storage medium and comprising instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the invention further provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium; when executed by a processor, the computer program performs the following steps:
obtaining clinical data of a subject, the subject being a patient who has received imatinib treatment;
and predicting the subject's imatinib trough concentration from the subject's clinical data, using the optimal imatinib trough concentration prediction model obtained by the above method for constructing the imatinib trough concentration prediction model.
In yet another aspect, the invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the following steps:
obtaining clinical data of a subject, the subject being a patient who has received imatinib treatment;
and predicting the subject's imatinib trough concentration from the subject's clinical data, using the optimal imatinib trough concentration prediction model obtained by the above method for constructing the imatinib trough concentration prediction model.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. On this understanding, the above technical solution in essence, or the part of it that contributes over the prior art, may be embodied as a software product stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disk, and comprising instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in parts of them.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for constructing an imatinib trough concentration prediction model, characterized by comprising the following steps:
obtaining a training set and a test set from historical data of a target population, wherein the target population consists of patients who have received imatinib treatment, and the training set and the test set each comprise clinical data of a plurality of patients and the corresponding actual imatinib trough concentrations;
training, on the training set, a plurality of candidate imatinib trough concentration prediction models with different algorithms, the algorithms comprising at least two of: a linear regression algorithm, a least absolute shrinkage and selection operator (Lasso) algorithm, a ridge regression algorithm, a support vector regression algorithm, a random forest algorithm, and an extreme gradient boosting (XGBoost) algorithm;
obtaining, on the test set, the imatinib trough concentration prediction results of each candidate imatinib trough concentration prediction model;
and obtaining a score for each candidate imatinib trough concentration prediction model from the prediction results, and selecting the candidate model with the optimal score as the optimal imatinib trough concentration prediction model.
2. The method for constructing an imatinib trough concentration prediction model according to claim 1, wherein obtaining the training set and the test set from the historical data of the target population comprises:
acquiring the historical data of the target population;
screening the historical data of the target population, wherein a patient's data are retained when the patient has taken imatinib for more than one month, so that the actual imatinib trough concentration has reached steady state, and the actual trough concentration was obtained from venous blood drawn 0-2 hours before the next dose; and a patient's data are removed when the patient is pregnant, has cancer, is taking concomitant medication known to affect the pharmacokinetics of imatinib, lacks clinical data, had a diarrhea reaction when venous blood was collected, or has an actual imatinib trough concentration below the lower limit of quantification;
dividing the screened historical data of the target population into a training set and a test set at a ratio of …:3.
3. The method for constructing an imatinib trough concentration prediction model according to claim 2, wherein the clinical data comprise demographic data, physiological data and genetic data, wherein the demographic data comprise one or more of: age, sex, height, weight and body mass index; and the physiological data comprise one or more of: number of bowel movements per day, imatinib dosing regimen, time since the last dose when venous blood was drawn, actual imatinib trough concentration, white blood cell count, absolute lymphocyte count, absolute monocyte count, absolute neutrophil count, hemoglobin, platelet count, alanine aminotransferase, aspartate aminotransferase, albumin, urea, total bilirubin, direct bilirubin and glomerular filtration rate.
4. The method for constructing an imatinib trough concentration prediction model according to claim 3, wherein the genetic data are obtained by:
extracting genomic DNA from peripheral blood leukocytes of the target population;
based on the genome and a single nucleotide polymorphism database, selecting a plurality of SNP loci on the criterion that the minor allele frequency in Asian populations is greater than 5%;
and genotyping the SNP loci to obtain the genetic data.
5. The method for constructing an imatinib trough concentration prediction model according to any one of claims 1 to 4, wherein obtaining the score of each candidate imatinib trough concentration prediction model from the prediction results specifically comprises:
obtaining, from the imatinib trough concentration prediction results, the goodness of fit, the mean squared error, the mean absolute error, and the relative accuracy (the proportion of predicted values within a preset range of the actual imatinib trough concentration) of each candidate imatinib trough concentration prediction model.
6. The method for constructing an imatinib trough concentration prediction model according to claim 5, wherein selecting the candidate model with the optimal score as the optimal imatinib trough concentration prediction model comprises:
when the goodness of fit has a unique maximum among the candidate imatinib trough concentration prediction models, taking the candidate model with the highest goodness of fit as the optimal imatinib trough concentration prediction model;
and when two or more candidate models share the same maximum goodness of fit, taking the one among them with the higher relative accuracy as the optimal imatinib trough concentration prediction model.
7. The method for constructing an imatinib trough concentration prediction model according to any one of claims 1 to 4, further comprising:
interpreting the feature importance of the optimal imatinib trough concentration prediction model using the SHAP method.
8. An imatinib trough concentration prediction system, comprising:
a data acquisition module, configured to obtain clinical data of a subject, the subject being a patient who has received imatinib treatment;
an imatinib trough concentration prediction module, configured to predict the subject's imatinib trough concentration from the subject's clinical data, using an optimal imatinib trough concentration prediction model obtained by the method for constructing an imatinib trough concentration prediction model according to any one of claims 1 to 7.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the following steps:
obtaining clinical data of a subject, the subject being a patient who has received imatinib treatment;
predicting the subject's imatinib trough concentration from the subject's clinical data, using an optimal imatinib trough concentration prediction model obtained by the method for constructing an imatinib trough concentration prediction model according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
obtaining clinical data of a subject, the subject being a patient who has received imatinib treatment;
predicting the subject's imatinib trough concentration from the subject's clinical data, using an optimal imatinib trough concentration prediction model obtained by the method for constructing an imatinib trough concentration prediction model according to any one of claims 1 to 7.
CN202311514369.XA 2023-11-15 2023-11-15 Method, system, equipment and medium for predicting imatinib valley concentration Pending CN117252110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311514369.XA CN117252110A (en) 2023-11-15 2023-11-15 Method, system, equipment and medium for predicting imatinib valley concentration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311514369.XA CN117252110A (en) 2023-11-15 2023-11-15 Method, system, equipment and medium for predicting imatinib valley concentration

Publications (1)

Publication Number Publication Date
CN117252110A true CN117252110A (en) 2023-12-19

Family

ID=89133554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311514369.XA Pending CN117252110A (en) 2023-11-15 2023-11-15 Method, system, equipment and medium for predicting imatinib valley concentration

Country Status (1)

Country Link
CN (1) CN117252110A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210327540A1 (en) * 2018-08-17 2021-10-21 Henry M. Jackson Foundation For The Advancement Of Military Medicine Use of machine learning models for prediction of clinical outcomes
CN116206776A (en) * 2023-03-02 2023-06-02 山东大学 Prediction model of blood concentration of polygenic SNP locus mediated antipsychotic drug, construction method and application thereof

Non-Patent Citations (2)

Title
PING ZHENG et al.: "Predicting Blood Concentration of Tacrolimus in Patients With Autoimmune Diseases Using Machine Learning Techniques Based on Real-World Evidence", pages 1-8, retrieved from the Internet <URL: doi: 10.3389/fphar.2021.727245> *
WEI GUO et al.: "A Machine Learning Model to Predict Risperidone Active Moiety Concentration Based on Initial Therapeutic Drug Monitoring", pages 2-8, retrieved from the Internet <URL: doi: 10.3389/fpsyt.2021.711868> *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination