CN114898873A - Method and system for predicting cardiovascular disease risk of diabetes mellitus pre-stage patient - Google Patents
- Publication number
- CN114898873A (application CN202210334081.3A)
- Authority
- CN
- China
- Prior art keywords
- risk
- model
- regression
- variables
- patient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Primary Health Care (AREA)
- General Health & Medical Sciences (AREA)
- Epidemiology (AREA)
- Pathology (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The invention provides a method and a system for predicting cardiovascular disease risk in pre-diabetic patients. The method first acquires medical data of pre-diabetic patients; then randomly draws data samples using a random number method to construct a training set, an internal validation set and an external validation set; then screens key variables on the training set using LASSO regression; then performs Cox proportional hazards regression analysis on the screened variables and constructs a prediction model from the fitted Cox regression; and finally verifies the accuracy of the resulting multivariable Cox regression prediction model on the internal and external validation sets. The invention uses routine clinical test indices to build a convenient and practical 10-year ASCVD risk prediction system that stratifies risk in the growing pre-diabetic population, guides individualized treatment, and delays the onset of cardiovascular disease.
Description
Technical Field
The invention relates to the technical field of medical condition detection, and in particular to a method and a system for predicting cardiovascular disease risk in pre-diabetic patients.
Background
Pre-diabetes, i.e. impaired glucose regulation, is a state in which blood glucose levels lie between normal glucose metabolism and diabetes. There are three main diagnostic criteria, and meeting any one of them is sufficient for diagnosis:
1. fasting blood glucose of 5.6 mmol/L to 6.9 mmol/L;
2. 2-hour blood glucose of 7.8 mmol/L to 11.0 mmol/L;
3. glycated hemoglobin of 5.7% to 6.4%.
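The three criteria above can be expressed as a simple range check. The following is a minimal sketch only: the function and parameter names are hypothetical, and it does not exclude patients whose other measurements already fall in the diabetic range, which a real screening rule would need to do.

```python
from typing import Optional


def meets_prediabetes_criteria(
    fasting_glucose_mmol_l: Optional[float] = None,
    two_hour_glucose_mmol_l: Optional[float] = None,
    hba1c_percent: Optional[float] = None,
) -> bool:
    """Return True if any available measurement lies in its pre-diabetic range."""
    checks = [
        (fasting_glucose_mmol_l, 5.6, 6.9),   # criterion 1
        (two_hour_glucose_mmol_l, 7.8, 11.0), # criterion 2
        (hba1c_percent, 5.7, 6.4),            # criterion 3
    ]
    # Missing measurements (None) are skipped rather than treated as normal.
    return any(
        value is not None and low <= value <= high
        for value, low, high in checks
    )
```

Because one criterion is enough, a patient with only a glycated hemoglobin result can still be classified.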
Current studies indicate that the risk of atherosclerotic cardiovascular disease (ASCVD) is significantly higher in the pre-diabetic population than in the normal population, yet cardiovascular disease prevention measures for this population remain inadequate. Strengthening primary prevention of cardiovascular disease in pre-diabetic people is therefore key to improving their prognosis.
Research shows that pre-diabetes is a highly heterogeneous metabolic state, and prognosis differs greatly between people in different metabolic states. Targeting more active cardiovascular disease screening and treatment at the subgroup with poorer prognosis is an effective strategy for reducing cardiovascular complications in pre-diabetes: high-risk patients receive timely and effective treatment, while low-risk patients avoid over-treatment and unnecessary medical expenditure, which is of great public health significance. How to identify patients at high cardiovascular risk in advance is an urgent problem.
At present, cardiovascular risk prediction for people with disordered glucose metabolism is based mainly on models built in type 2 diabetes populations. Despite the growing pre-diabetic population, there is no ASCVD risk assessment system designed specifically for it. Existing risk assessment tools built on the general population or on the diabetic population predict poorly in pre-diabetes. The cardiovascular risk of pre-diabetic people differs from that of both healthy people and diabetic patients, and the required means and intensity of intervention differ as well.
One prior patent analyzes the clinical and pathological characteristics of type 2 diabetes patients, selects the factors with the greatest influence on the differential diagnosis, and presents them as a nomogram that is concise, easy to understand and convenient for clinical use. From the initial test data of a type 2 diabetes patient, that patent estimates the probability that renal biopsy will yield a pathological diagnosis of non-diabetic kidney disease (NDRD) or of diabetic nephropathy combined with non-diabetic kidney disease (MIX), thereby supporting differential diagnosis between diabetic nephropathy and non-diabetic kidney disease in type 2 diabetes patients. Its probability prediction helps clinicians weigh the risk-benefit ratio of renal biopsy in type 2 diabetes patients with renal injury and provides a reference for staff at medical institutions without renal biopsy capability. However, that patent does not use routine clinical test indices to construct a convenient and practical ASCVD risk prediction tool that stratifies risk in the growing pre-diabetic population, guides individualized treatment, and delays the onset of cardiovascular disease.
Disclosure of Invention
The invention provides a method for predicting cardiovascular disease risk in pre-diabetic patients. Based on routine clinical test indices, it constructs a convenient, rapid and accurate atherosclerotic cardiovascular disease risk prediction model to stratify risk in the pre-diabetic patient group, so that individualized examination and treatment plans can be implemented and the public health goals of delaying the onset of cardiovascular disease and reducing its burden can ultimately be achieved.
It is still another object of the present invention to provide a system using the above method for predicting the risk of cardiovascular disease in a pre-diabetic patient.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
a method for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising the steps of:
S1: acquiring medical data of pre-diabetic patients;
S2: randomly drawing the data samples from step S1 using a random number method, and constructing a training set, an internal validation set and an external validation set;
S3: screening key variables on the training set using LASSO regression;
S4: performing Cox proportional hazards regression analysis on the variables screened in step S3, and constructing a prediction model from the fitted Cox regression;
S5: verifying the accuracy of the constructed Cox regression prediction model using the internal validation set and the external validation set.
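The random partition in step S2 can be sketched as follows. This is an illustration under stated assumptions: the text does not give the split proportions, so the 60/20/20 fractions, the seed, and the function name `split_samples` are all hypothetical.

```python
import random


def split_samples(sample_ids, fractions=(0.6, 0.2, 0.2), seed=2022):
    """Shuffle IDs with a seeded random number generator and cut three disjoint sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # fixed seed makes the split reproducible
    n_train = int(len(ids) * fractions[0])
    n_internal = int(len(ids) * fractions[1])
    return {
        "training": ids[:n_train],
        "internal_validation": ids[n_train:n_train + n_internal],
        # Whatever remains after the first two cuts goes to the last set.
        "external_validation": ids[n_train + n_internal:],
    }


sets = split_samples(range(1000))
```

Variable screening and model fitting (steps S3 and S4) would then use only `sets["training"]`, so that the two validation sets stay untouched until step S5.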
Further, the specific process of step S1 is:
The medical data of pre-diabetic patients are acquired, data screening and feature extraction are performed, and candidate predictors of atherosclerotic cardiovascular disease risk are obtained. The candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, systolic blood pressure adjusted for antihypertensive medication, diastolic blood pressure adjusted for antihypertensive medication, history of hypertension, and family history of cardiovascular disease.
Further, the specific process of step S3 is:
LASSO regression analysis is performed on the influencing factors in the training set: a penalty function is constructed to perform variable screening and complexity adjustment, the optimal model is selected according to the lambda value, and the key variables retained in that model are screened out.
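The effect of the penalty can be illustrated with a small coordinate-descent LASSO. This sketch makes a deliberate simplification: it uses a squared-error loss on synthetic data rather than the Cox partial likelihood that a survival model would need, and the data, lambda values and function names are hypothetical. It only shows how a larger lambda drives weak coefficients to exactly zero; how lambda is chosen (for example by cross-validation) is not specified here.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero; anything within the penalty becomes exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Prediction from every feature except j (partial residual).
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            col_norm = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho / n, lam) / (col_norm / n)
    return beta


# Synthetic data: only the first of three features influences the outcome.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

Variables whose coefficient is shrunk to zero are dropped, and the remainder are the screened key variables passed to the next step.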
Further, the step S4 includes:
performing Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removing variables that do not reach statistical significance in the model, and constructing a multivariable Cox regression model:
$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m) \tag{1}$$
where $X_1, X_2, \dots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \dots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for covariates $X$;
obtaining the predicted probability corresponding to the multivariable Cox regression model according to formula (2); a larger risk value indicates a greater risk of poor prognosis for the patient:
$$\hat{P} = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \tag{2}$$
where $\hat{P}$ is the predicted probability that the individual experiences an adverse event, $S_0(t)$ is the baseline value of the modeled population (the baseline survival at time $t$), $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, i.e. the sum of the products of the respective variables and their partial regression coefficients, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the mean of each variable in the modeled population and its partial regression coefficient.
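Reading formula (2) as the usual conversion from a Cox linear predictor to an event probability, P = 1 - S0(t) ** exp(LP - mean LP), the calculation can be sketched as below. The function name and all numbers in the example are hypothetical and chosen only to show the mechanics.

```python
import math


def predicted_probability(baseline_survival, betas, values, means):
    """Event probability from baseline survival, coefficients, patient values and population means."""
    linear_predictor = sum(b * x for b, x in zip(betas, values))
    population_mean_lp = sum(b * m for b, m in zip(betas, means))
    # Centering on the population mean makes a patient at the mean
    # receive exactly the baseline event probability 1 - S0(t).
    return 1.0 - baseline_survival ** math.exp(linear_predictor - population_mean_lp)


# Hypothetical two-variable model: a binary factor and systolic blood pressure.
betas = [0.5, 0.02]
means = [0.4, 130.0]
average_patient = predicted_probability(0.95, betas, means, means)
higher_risk_patient = predicted_probability(0.95, betas, [1.0, 150.0], means)
```

A patient whose variables equal the population means gets probability 1 - S0(t); any increase in the linear predictor raises the probability above that baseline.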
A system for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising:
a data acquisition module for acquiring medical data of pre-diabetic patients;
a data preprocessing module for randomly drawing the data samples acquired by the data acquisition module using a random number method and constructing a training set, an internal validation set and an external validation set;
a key variable screening module for screening key variables on the training set using LASSO regression;
a prediction model construction module for performing Cox proportional hazards regression analysis on the screened variables and constructing a prediction model from the fitted Cox regression;
and a verification module for verifying the accuracy of the constructed Cox regression prediction model using the internal validation set and the external validation set.
Further, the data acquisition module acquires the medical data of pre-diabetic patients, performs data screening and feature extraction, and obtains candidate predictors of atherosclerotic cardiovascular disease risk. The candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, systolic blood pressure adjusted for antihypertensive medication, diastolic blood pressure adjusted for antihypertensive medication, history of hypertension, and family history of cardiovascular disease.
Further, the key variable screening module performs LASSO regression analysis on the influencing factors in the training set, constructs a penalty function to perform variable screening and complexity adjustment, selects the optimal model according to the lambda value, and screens out the key variables retained in that model.
Further, the prediction model construction module performs Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removes variables that do not reach statistical significance in the model, and constructs a multivariable Cox regression model:
$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m) \tag{1}$$
where $X_1, X_2, \dots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \dots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for covariates $X$;
the prediction model construction module obtains the predicted probability corresponding to the multivariable Cox regression model according to formula (2); a larger risk value indicates a greater risk of poor prognosis for the patient:
$$\hat{P} = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \tag{2}$$
where $\hat{P}$ is the predicted probability that the individual experiences an adverse event, $S_0(t)$ is the baseline value of the modeled population (the baseline survival at time $t$), $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, i.e. the sum of the products of the respective variables and their partial regression coefficients, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the mean of each variable in the modeled population and its partial regression coefficient.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention uses routine clinical test indices to build a convenient and practical 10-year ASCVD risk prediction system that stratifies risk in the growing pre-diabetic population, guides individualized treatment, and delays the onset of cardiovascular disease.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 shows the results of variable selection by the LASSO regression model in the training set, where panel A is the coefficient distribution plot and panel B is the plot with vertical dashed lines;
FIG. 3 evaluates the calibration of the model's predicted disease risk in the training set (A), the internal validation set (B), and the external validation set (C, D);
FIG. 4 evaluates the clinical usefulness of the model's predicted disease risk in the training set (A), the internal validation set (B), and the external validation set (C, D);
FIG. 5 is a block diagram of the system of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in FIG. 1, a method for predicting the risk of cardiovascular disease in a pre-diabetic patient comprises the following steps:
S1: acquiring medical data of pre-diabetic patients;
S2: randomly drawing the data samples from step S1 using a random number method, and constructing a training set, an internal validation set and an external validation set;
S3: screening key variables on the training set using LASSO regression;
S4: performing Cox proportional hazards regression analysis on the variables screened in step S3, and constructing a prediction model from the fitted Cox regression;
S5: verifying the accuracy of the constructed Cox regression prediction model using the internal validation set and the external validation set.
The specific process of step S1 is:
The medical data of pre-diabetic patients are acquired, data screening and feature extraction are performed, and candidate predictors of atherosclerotic cardiovascular disease risk are obtained. The candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, systolic blood pressure adjusted for antihypertensive medication, diastolic blood pressure adjusted for antihypertensive medication, history of hypertension, and family history of cardiovascular disease. Laboratory test indices include hemoglobin, white blood cell count, hematocrit, glycated hemoglobin, low-density lipoprotein cholesterol adjusted for lipid-lowering medication, high-density lipoprotein cholesterol, total cholesterol, triglycerides, C-reactive protein, creatinine, glomerular filtration rate (calculated by the CKD-EPI formula), and cystatin C. For each continuous variable, a sample distribution histogram and a probability density curve are drawn, and variables that do not follow a normal distribution are natural-log transformed. Atherosclerotic cardiovascular disease comprises coronary heart disease, stroke and peripheral artery disease.
The specific process of step S3 is:
LASSO regression analysis is performed on the influencing factors in the training set: a penalty function is constructed to perform variable screening and complexity adjustment, the optimal model is selected according to the lambda value, and the key variables retained in that model are screened out (as shown in FIG. 2).
Step S4 includes:
performing Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removing variables that do not reach statistical significance in the model, and constructing a multivariable Cox regression model:
$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m) \tag{1}$$
where $X_1, X_2, \dots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \dots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for covariates $X$;
obtaining the predicted probability corresponding to the multivariable Cox regression model according to formula (2); a larger risk value indicates a greater risk of poor prognosis for the patient:
$$\hat{P} = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \tag{2}$$
where $\hat{P}$ is the predicted probability that the individual experiences an adverse event, $S_0(t)$ is the baseline value of the modeled population (the baseline survival at time $t$), $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, i.e. the sum of the products of the respective variables and their partial regression coefficients, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the mean of each variable in the modeled population and its partial regression coefficient.
The risk value of atherosclerotic cardiovascular disease development in diabetic patients within 10 years is 1-0.9265^ exp (0.5556 × sex +0.1475 × whether smoking before +0.6147 × whether smoking at present +0.1999 × family history +0.2108 × hypertension +3.1032 × age +0.0064 × systolic blood pressure regulating drug +0.1660 × low density lipoprotein cholesterol regulating lipid lowering drug-0.5806 × high density lipoprotein cholesterol +2.1250 × glycated hemoglobin +0.0275 × white cell count +0.1233 × C reactive protein +0.6121 × cystatin C);
wherein sex is 0 for female and 1 for male; former smoking is 1 if the patient has quit smoking and 0 otherwise; current smoking is 1 if the patient currently smokes and 0 otherwise; family history is 1 if a family history of cardiovascular disease is present and 0 otherwise; hypertension is 1 if hypertension is present and 0 otherwise;
age is in years and enters the model after natural-logarithm transformation, with an original range of 40-70;
systolic blood pressure adjusted for antihypertensive medication is in mmHg, with a range of 86.5-255.0;
low-density lipoprotein cholesterol adjusted for lipid-lowering medication is in mmol/L, with a range of 0.84-11.35;
high-density lipoprotein cholesterol is in mmol/L and enters the model after natural-logarithm transformation, with an original range of 0.487-4.087;
glycated hemoglobin is in % and enters the model after natural-logarithm transformation, with an original range of 5.70-6.49;
white blood cell count is in 10^9/L, with a range of 0.21-101.90;
C-reactive protein is in mg/L and enters the model after natural-logarithm transformation, with an original range of 0.08-78.05;
cystatin C is in mg/L, with a range of 0.442-4.140;
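The risk expression and variable coding above can be sketched in Python as follows. The coefficients and the baseline value 0.9265 are copied from the text; the dictionary keys and function names are illustrative assumptions. Note that the expression as printed does not show the population-mean term described for formula (2), and evaluating it without that term gives probabilities very close to 1 for any inputs within the stated ranges. The sketch therefore takes the mean linear predictor as an explicit argument, whose value is not given in the text.

```python
import math

# Baseline value and coefficients copied from the risk expression in the text.
BASELINE_10Y = 0.9265
COEFFICIENTS = {
    "male": 0.5556,            # sex: 0 = female, 1 = male
    "former_smoker": 0.1475,   # 1 if the patient has quit smoking
    "current_smoker": 0.6147,  # 1 if the patient currently smokes
    "family_history": 0.1999,
    "hypertension": 0.2108,
    "ln_age": 3.1032,          # age in years, log-transformed
    "adjusted_sbp": 0.0064,    # mmHg, adjusted for antihypertensive medication
    "adjusted_ldl_c": 0.1660,  # mmol/L, adjusted for lipid-lowering medication
    "ln_hdl_c": -0.5806,       # mmol/L, log-transformed
    "ln_hba1c": 2.1250,        # %, log-transformed
    "wbc": 0.0275,             # 10^9/L
    "ln_crp": 0.1233,          # mg/L, log-transformed
    "cystatin_c": 0.6121,      # mg/L
}


def linear_predictor(patient):
    """Sum of coefficient × (transformed) variable for one patient dict."""
    values = {
        "male": patient["male"],
        "former_smoker": patient["former_smoker"],
        "current_smoker": patient["current_smoker"],
        "family_history": patient["family_history"],
        "hypertension": patient["hypertension"],
        "ln_age": math.log(patient["age"]),
        "adjusted_sbp": patient["adjusted_sbp"],
        "adjusted_ldl_c": patient["adjusted_ldl_c"],
        "ln_hdl_c": math.log(patient["hdl_c"]),
        "ln_hba1c": math.log(patient["hba1c"]),
        "wbc": patient["wbc"],
        "ln_crp": math.log(patient["crp"]),
        "cystatin_c": patient["cystatin_c"],
    }
    return sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


def ten_year_risk(patient, mean_linear_predictor):
    """1 - S0 ** exp(LP - mean LP); mean_linear_predictor is NOT given in the text."""
    centered = linear_predictor(patient) - mean_linear_predictor
    return 1.0 - BASELINE_10Y ** math.exp(centered)
```

A patient whose linear predictor equals the population mean gets a risk of 1 - 0.9265 = 0.0735; patients above the mean get a higher risk and patients below it a lower one.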
describing, with the predicted probability as the analysis variable, the distribution of the 10-year probability of atherosclerotic cardiovascular disease in the training set, inner verification set and outer verification set populations;
constructing, with the predicted probability as the analysis variable, the ROC curves of the Cox regression model in the training set, inner verification set and outer verification set; the area under the curve (AUC) ranges from 0.5 to 1; the closer the AUC is to 1.0, the better the discrimination of the prediction; an AUC of 0.5 means the lowest discrimination and little practical value;
in this embodiment, when the probability calculated by the model is used to predict the occurrence of atherosclerotic cardiovascular disease within 10 years, the AUC values in the training set, inner verification set and outer verification set range from 0.688 to 0.712, indicating fairly good discrimination;
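As a minimal illustration of how an AUC can be computed from predicted probabilities, the sketch below uses the pairwise (Mann-Whitney) definition. It assumes a simple binary outcome (event within 10 years or not) and ignores censoring, which a full survival analysis would have to handle; the function name `roc_auc` is hypothetical.

```python
def roc_auc(predicted, observed):
    """AUC as the share of (event, non-event) pairs ranked correctly.

    predicted: list of predicted probabilities
    observed:  list of 0/1 outcomes (1 = event occurred)
    Ties in predicted probability count as half a correct ranking.
    """
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))
```

A model that ranks every event case above every non-event case gets 1.0, and a model that assigns the same probability to everyone gets 0.5.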
constructing, with the predicted probability as the analysis variable, the calibration curves of the Cox regression model in the training set, inner verification set and outer verification set; the abscissa is the predicted probability and the ordinate is the actually observed event risk; the line y = x is the reference line, representing the ideal case in which the predicted value equals the observed risk; the marked points are the risks predicted by the constructed model and the corresponding observed risks; a point lies below the reference line if the predicted value is greater than the observed value and above the reference line if the predicted value is smaller than the observed value;
in this embodiment, when the probability calculated by the model is used to predict the occurrence of atherosclerotic cardiovascular disease within 10 years, the calibration curves in the training set, inner verification set and outer verification set fall on both sides of the reference line, indicating fairly good calibration of the model;
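The points of such a calibration plot can be sketched as follows. This is a simplified illustration that groups patients by sorted predicted probability and uses the raw event proportion as the observed risk; the grouping scheme, the default of five groups and the function name are assumptions, and censoring is ignored.

```python
def calibration_points(predicted, observed, n_groups=5):
    """Return one (mean predicted risk, observed event rate) pair per group.

    Patients are sorted by predicted risk and cut into n_groups groups of
    nearly equal size; empty groups are skipped.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    points = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_predicted, observed_rate))
    return points
```

A point whose observed rate is lower than its mean predicted risk lies below the reference line y = x (over-prediction), and a point with a higher observed rate lies above it (under-prediction).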
constructing, with the predicted probability as the analysis variable, the decision curves of the Cox regression model in the training set, inner verification set and outer verification set; the abscissa is the threshold probability: when the probability predicted by the model for a patient is greater than the threshold probability, therapeutic measures are taken, and the corresponding ordinate is the net benefit, that is, the patient's benefit minus the harm at that threshold; the horizontal line and the oblique line are the two extreme cases: the horizontal line represents all samples being classified as negative, with a net benefit of 0, and the oblique line represents all samples being classified as positive, with all samples receiving treatment.
In this embodiment, when the probability calculated by the model is used to predict the occurrence of atherosclerotic cardiovascular disease within 10 years and to guide treatment, patients obtain a net benefit within a certain range of threshold probabilities;
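The net benefit plotted on a decision curve is conventionally computed as true positives per patient minus false positives per patient weighted by the odds of the threshold. The sketch below assumes that conventional definition and a binary outcome without censoring; the function names are hypothetical.

```python
def net_benefit(predicted, observed, threshold):
    """Net benefit of treating patients whose predicted risk exceeds threshold."""
    n = len(predicted)
    true_pos = sum(1 for p, y in zip(predicted, observed) if p > threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted, observed) if p > threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(observed, threshold):
    """Net benefit when every patient is treated (the oblique line)."""
    prevalence = sum(observed) / len(observed)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def net_benefit_treat_none():
    """Net benefit when no patient is treated (the horizontal line)."""
    return 0.0
```

The model is useful at a given threshold when `net_benefit` exceeds both reference strategies at that threshold.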
constructing an online prediction system for the 10-year risk of atherosclerotic cardiovascular disease in pre-diabetic patients (fig. 4);
the complex regression equation is converted into a practical interactive web interface, which makes the output of the prediction model easier to read, lets a subject evaluate their own risk conveniently, and allows wide use in medical research and clinical practice.
Example 2
As shown in fig. 5, a system for predicting the risk of cardiovascular disease in a pre-diabetic patient comprises:
a data acquisition module, used for acquiring medical data of pre-diabetic patients;
a data preprocessing module, used for randomly sampling the data acquired by the data acquisition module by a random number method and constructing a training set, an inner verification set and an outer verification set;
a key variable screening module, used for screening key variables in the training set using LASSO regression;
a prediction model construction module, used for performing Cox proportional hazards regression analysis on the screened variables and constructing a prediction model based on the Cox proportional hazards regression fit;
and a verification module, used for verifying the accuracy of the constructed Cox proportional hazards prediction model using the inner verification set and the outer verification set.
The variables collected by the data acquisition module include age, sex, smoking status, systolic blood pressure, history of hypertension, family history of cardiovascular disease, white blood cell count, glycated hemoglobin, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, C-reactive protein, creatinine, cystatin C, whether an antihypertensive drug is taken, and whether a lipid-lowering drug is taken; the two medication indicators are used to calculate the medication-adjusted systolic blood pressure and the medication-adjusted low-density lipoprotein cholesterol, respectively.
The key variable screening module performs LASSO regression analysis on the influencing factors in the training set, constructs a penalty function for variable screening and complexity adjustment, selects the optimal model according to the λ value, and screens out the key variables to be included in the model.
The prediction model construction module performs Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removes variables that do not reach statistical significance in the model, and constructs a multivariable Cox regression model:
$$h(t,X)=h_0(t)\exp(\beta_1 X_1+\beta_2 X_2+\dots+\beta_m X_m) \quad (1)$$
where $X_1, X_2, \dots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \dots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t,X)$ is the hazard at time $t$ for an individual with covariates $X$;
the prediction model construction module obtains the predicted probability of the multivariable Cox regression model according to formula (2); a greater risk value indicates a greater risk of poor patient prognosis:
$$\hat{P} = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \quad (2)$$
where $\hat{P}$ is the predicted probability that the individual experiences an adverse event, $S_0(t)$ is the baseline value of the modeling population at time $t$ (the baseline survival probability), $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, namely the sum of the products of the respective variables and their partial regression coefficients, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the modeling-population mean of each variable and its partial regression coefficient.
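The step from the hazard in formula (1) to the predicted probability in formula (2) follows from the standard relationship between hazard and survival in a proportional hazards model. The derivation below is a sketch under that assumption; the cumulative baseline hazard $H_0(t)$ and the survival function $S(t \mid X)$ are symbols introduced here for illustration, and $S_0(t)$ is taken to be the survival probability of an individual whose covariates equal the population means $\bar{X}_i$.

```latex
\begin{aligned}
H_0(t) &= \int_0^t h_0(u)\,du, \\
S(t \mid X) &= \exp\!\Big(-H_0(t)\,\exp\Big(\sum_{i=1}^{m}\beta_i X_i\Big)\Big), \\
S_0(t) &= S(t \mid \bar{X}) = \exp\!\Big(-H_0(t)\,\exp\Big(\sum_{i=1}^{m}\beta_i \bar{X}_i\Big)\Big), \\
S(t \mid X) &= S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)}, \\
\hat{P} &= 1 - S(t \mid X).
\end{aligned}
```

The fourth line holds because raising $S_0(t)$ to the power $\exp\left(\sum_i \beta_i X_i - \sum_i \beta_i \bar{X}_i\right)$ multiplies its exponent by that factor, which turns $\exp\left(\sum_i \beta_i \bar{X}_i\right)$ into $\exp\left(\sum_i \beta_i X_i\right)$.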
The prediction model construction module uses the constructed model to calculate the individual risk prediction value and finally outputs the predicted 10-year risk of atherosclerotic cardiovascular disease for the pre-diabetic patient. It applies the 10-year cardiovascular risk stratification recommended by the 2019 ACC/AHA Guideline on the Primary Prevention of Cardiovascular Disease (low risk: < 5%; borderline risk: ≥ 5% and < 7.5%; intermediate risk: ≥ 7.5% and < 20%; high risk: ≥ 20%) and reports the patient's current risk level based on the calculated risk value.
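The stratification step can be sketched as a small lookup. The thresholds are the ones quoted in the text; the function name and the returned labels are illustrative assumptions.

```python
def risk_category(risk):
    """Map a 10-year risk (as a fraction between 0 and 1) to a risk level."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    if risk < 0.05:
        return "low"
    if risk < 0.075:
        return "borderline"
    if risk < 0.20:
        return "intermediate"
    return "high"
```

For example, a calculated risk of 0.0735 falls in the borderline band, because it is at least 5% and below 7.5%.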
Example 3
As shown in fig. 1, a method for predicting the risk of cardiovascular disease in a pre-diabetic patient comprises the following steps:
S1: acquiring medical data of pre-diabetic patients;
S2: randomly sampling the data of step S1 by a random number method, and constructing a training set, an inner verification set and an outer verification set;
S3: screening key variables in the training set using LASSO regression;
S4: performing Cox proportional hazards regression analysis on the variables screened in step S3, and constructing a prediction model based on the Cox proportional hazards regression fit;
S5: verifying the accuracy of the constructed Cox proportional hazards prediction model using the inner verification set and the outer verification set.
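Step S2 can be illustrated with a seeded random shuffle. The text does not state the proportions of the three sets, so the 60/20/20 split, the seed and the function name below are purely hypothetical choices for this sketch.

```python
import random


def split_samples(sample_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split sample ids into training, inner and outer verification sets.

    fractions are hypothetical; the source does not specify the split sizes.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * fractions[0])
    n_inner = round(len(ids) * fractions[1])
    training = ids[:n_train]
    inner = ids[n_train:n_train + n_inner]
    outer = ids[n_train + n_inner:]
    return training, inner, outer
```

Each sample lands in exactly one set, and the same seed always reproduces the same split.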
The specific process of step S1 is:
acquiring medical data of pre-diabetic patients, performing data screening and feature extraction, and obtaining candidate predictors corresponding to the risk of atherosclerotic cardiovascular disease; the candidate predictors include: the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, adjusted systolic blood pressure, adjusted diastolic blood pressure, history of hypertension, and family history of cardiovascular disease.
The specific process of step S3 is:
performing LASSO regression analysis on the influencing factors in the training set, constructing a penalty function for variable screening and complexity adjustment, selecting the optimal model according to the λ value, and screening out the key variables to be included in the model.
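How an L1 penalty screens variables can be shown with a small coordinate-descent sketch. It is a simplification: it applies the penalty to a least-squares loss rather than to a survival likelihood, it expects columns that are already centered, and it takes λ as a given argument instead of choosing it by cross-validation. All names are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    for _ in range(n_iter):
        for j in range(m):
            rho = 0.0  # covariance of column j with the residual excluding j
            z = 0.0    # sum of squares of column j
            for i in range(n):
                partial_fit = sum(X[i][k] * beta[k] for k in range(m) if k != j)
                rho += X[i][j] * (y[i] - partial_fit)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


def select_variables(X, y, lam):
    """Indices of the variables whose coefficient is not shrunk to zero."""
    beta = lasso_coordinate_descent(X, y, lam)
    return [j for j, b in enumerate(beta) if b != 0.0]
```

A larger λ shrinks more coefficients to exactly zero, so fewer variables survive the screening; the variables that remain are the ones passed on to the regression model.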
Step S4 includes:
performing Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removing variables that do not reach statistical significance in the model, and constructing a multivariable Cox regression model:
$$h(t,X)=h_0(t)\exp(\beta_1 X_1+\beta_2 X_2+\dots+\beta_m X_m) \quad (1)$$
where $X_1, X_2, \dots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \dots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t,X)$ is the hazard at time $t$ for an individual with covariates $X$;
obtaining the predicted probability of the multivariable Cox regression model according to formula (2); a greater risk value indicates a greater risk of poor patient prognosis:
$$\hat{P} = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \quad (2)$$
where $\hat{P}$ is the predicted probability that the individual experiences an adverse event, $S_0(t)$ is the baseline value of the modeling population at time $t$ (the baseline survival probability), $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, namely the sum of the products of the respective variables and their partial regression coefficients, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the modeling-population mean of each variable and its partial regression coefficient.
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (10)
1. A method for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising the steps of:
S1: acquiring medical data of pre-diabetic patients;
S2: randomly sampling the data of step S1 by a random number method, and constructing a training set, an inner verification set and an outer verification set;
S3: screening key variables in the training set using LASSO regression;
S4: performing Cox proportional hazards regression analysis on the variables screened in step S3, and constructing a prediction model based on the Cox proportional hazards regression fit;
S5: verifying the accuracy of the constructed Cox proportional hazards prediction model using the inner verification set and the outer verification set.
2. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 1, wherein the specific process of step S1 is:
acquiring medical data of pre-diabetic patients, performing data screening and feature extraction, and obtaining candidate predictors corresponding to the risk of atherosclerotic cardiovascular disease; the candidate predictors include: the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, adjusted systolic blood pressure, adjusted diastolic blood pressure, history of hypertension, and family history of cardiovascular disease.
3. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 2, wherein the specific process of step S3 is:
performing LASSO regression analysis on the influencing factors in the training set, constructing a penalty function for variable screening and complexity adjustment, selecting the optimal model according to the λ value, and screening out the key variables to be included in the model.
4. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 3, wherein the step S4 comprises:
performing Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removing variables that do not reach statistical significance in the model, and constructing a multivariable Cox regression model:
$$h(t,X)=h_0(t)\exp(\beta_1 X_1+\beta_2 X_2+\dots+\beta_m X_m) \quad (1)$$
where $X_1, X_2, \dots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \dots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t,X)$ is the hazard at time $t$ for an individual with covariates $X$.
5. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 4, wherein the step S4 further comprises:
obtaining the predicted probability of the multivariable Cox regression model according to formula (2); a greater risk value indicates a greater risk of poor patient prognosis:
$$\hat{P} = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \quad (2)$$
where $\hat{P}$ is the predicted probability that the individual experiences an adverse event, $S_0(t)$ is the baseline value of the modeling population at time $t$ (the baseline survival probability), $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, namely the sum of the products of the respective variables and their partial regression coefficients, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the modeling-population mean of each variable and its partial regression coefficient.
6. A system for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising:
a data acquisition module, used for acquiring medical data of pre-diabetic patients;
a data preprocessing module, used for randomly sampling the data acquired by the data acquisition module by a random number method and constructing a training set, an inner verification set and an outer verification set;
a key variable screening module, used for screening key variables in the training set using LASSO regression;
a prediction model construction module, used for performing Cox proportional hazards regression analysis on the screened variables and constructing a prediction model based on the Cox proportional hazards regression fit;
and a verification module, used for verifying the accuracy of the constructed Cox proportional hazards prediction model using the inner verification set and the outer verification set.
7. The system of claim 6, wherein the data acquisition module acquires medical data of pre-diabetic patients and performs data screening and feature extraction to obtain candidate predictors corresponding to the risk of atherosclerotic cardiovascular disease; the candidate predictors include: the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, adjusted systolic blood pressure, adjusted diastolic blood pressure, history of hypertension, and family history of cardiovascular disease.
8. The system of claim 7, wherein the key variable screening module performs LASSO regression analysis on the influencing factors in the training set, constructs a penalty function for variable screening and complexity adjustment, selects the optimal model according to the λ value, and screens out the key variables to be included in the model.
9. The system of claim 8, wherein the prediction model construction module performs Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removes variables that do not reach statistical significance in the model, and constructs a multivariable Cox regression model:
$$h(t,X)=h_0(t)\exp(\beta_1 X_1+\beta_2 X_2+\dots+\beta_m X_m) \quad (1)$$
where $X_1, X_2, \dots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \dots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t,X)$ is the hazard at time $t$ for an individual with covariates $X$.
10. The system of claim 9, wherein the prediction model construction module obtains the predicted probability of the multivariable Cox regression model according to formula (2); a greater risk value indicates a greater risk of poor patient prognosis:
$$\hat{P} = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \quad (2)$$
where $\hat{P}$ is the predicted probability that the individual experiences an adverse event, $S_0(t)$ is the baseline value of the modeling population at time $t$ (the baseline survival probability), $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, namely the sum of the products of the respective variables and their partial regression coefficients, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the modeling-population mean of each variable and its partial regression coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210334081.3A CN114898873A (en) | 2022-03-31 | 2022-03-31 | Method and system for predicting cardiovascular disease risk of diabetes mellitus pre-stage patient |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114898873A (en) | 2022-08-12 |
Family
ID=82714670
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202210334081.3A | Pending | CN114898873A (en) | 2022-03-31 | 2022-03-31 | Method and system for predicting cardiovascular disease risk of diabetes mellitus pre-stage patient |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114898873A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115547495A (en) * | 2022-09-02 | 2022-12-30 | 广东药科大学 | System for comprehensively evaluating glycolipid metabolism level and application thereof |
CN115547495B (en) * | 2022-09-02 | 2023-09-12 | 广东药科大学 | System for comprehensively evaluating glycolipid metabolism level and application thereof |
CN116364268A (en) * | 2022-11-01 | 2023-06-30 | 山东大学 | Novel breast cancer prediction method based on punishment COX regression |
CN116364268B (en) * | 2022-11-01 | 2023-11-17 | 山东大学 | Novel breast cancer prediction method based on punishment COX regression |
CN116665911A (en) * | 2023-06-15 | 2023-08-29 | 中国医学科学院阜外医院 | Long-term prediction method and prediction model construction method for myocardial infarction of patient with 2-type diabetes |
CN117153377A (en) * | 2023-10-11 | 2023-12-01 | 中山大学附属第一医院 | Model for predicting death risk of adult patient with moderately severe aortic valve stenosis |
CN117672445A (en) * | 2023-12-18 | 2024-03-08 | 郑州大学 | Diabetes mellitus debilitation current situation analysis method and system based on big data |
CN117524486A (en) * | 2024-01-04 | 2024-02-06 | 北京市肿瘤防治研究所 | TTE model establishment method for predicting non-progressive survival probability of postoperative patient |
CN117524486B (en) * | 2024-01-04 | 2024-04-05 | 北京市肿瘤防治研究所 | TTE model establishment method for predicting non-progressive survival probability of postoperative patient |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||