CN114898873A - Method and system for predicting cardiovascular disease risk of diabetes mellitus pre-stage patient - Google Patents


Info

Publication number
CN114898873A
Authority
CN
China
Prior art keywords
risk
model
regression
variables
patient
Legal status
Pending
Application number
CN202210334081.3A
Other languages
Chinese (zh)
Inventor
庄晓东
林钇奋
廖新学
利妙红
张绍钊
黄蔓
黄日华
仲祥斌
熊振宇
刘梦辉
Current Assignee
First Affiliated Hospital of Sun Yat Sen University
Original Assignee
First Affiliated Hospital of Sun Yat Sen University
Application filed by First Affiliated Hospital of Sun Yat Sen University
Priority to CN202210334081.3A
Publication of CN114898873A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention provides a method and a system for predicting the cardiovascular disease risk of pre-diabetic patients. The method comprises: first, acquiring medical data of pre-diabetic patients; then randomly sampling the data using a random-number method to construct a training set, an internal validation set and an external validation set; then screening key variables by LASSO regression on the training set; then performing Cox proportional hazards regression analysis on the screened variables and constructing a prediction model from the Cox proportional hazards fit; and finally verifying the accuracy of the constructed multivariable Cox regression prediction model on the internal and external validation sets. The invention uses routine clinical test indexes to construct a convenient and practical 10-year atherosclerotic cardiovascular disease (ASCVD) risk prediction system, so as to stratify the risk of the growing pre-diabetic population, guide individualized treatment and delay the onset of cardiovascular disease.

Description

Method and system for predicting cardiovascular disease risk of diabetes mellitus pre-stage patient
Technical Field
The invention relates to the technical field of medical condition detection, and in particular to a method and a system for predicting the cardiovascular disease risk of pre-diabetic patients.
Background
Pre-diabetes, i.e. impaired glucose regulation, refers to a state in which blood glucose levels lie between normal glucose metabolism and diabetes. There are three main diagnostic criteria, and meeting any one of them is sufficient for diagnosis: 1. fasting blood glucose of 5.6 mmol/L to 6.9 mmol/L; 2. 2-hour blood glucose (after an oral glucose load) of 7.8 mmol/L to 11.0 mmol/L; 3. glycated hemoglobin of 5.7% to 6.4%.
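The "any one criterion" rule above can be expressed directly in code. The following is a minimal Python sketch; the function name `meets_prediabetes_criteria` and its argument names are hypothetical, the thresholds are taken from the text, and `None` means the test was not performed.

```python
from typing import Optional


def meets_prediabetes_criteria(
    fasting_glucose: Optional[float] = None,  # mmol/L
    glucose_2h: Optional[float] = None,       # mmol/L, 2 hours after an oral glucose load
    hba1c_percent: Optional[float] = None,    # glycated hemoglobin, %
) -> bool:
    """Return True if any one of the three criteria in the text is met."""
    checks = []
    if fasting_glucose is not None:
        checks.append(5.6 <= fasting_glucose <= 6.9)
    if glucose_2h is not None:
        checks.append(7.8 <= glucose_2h <= 11.0)
    if hba1c_percent is not None:
        checks.append(5.7 <= hba1c_percent <= 6.4)
    return any(checks)  # no tests performed -> False


print(meets_prediabetes_criteria(fasting_glucose=6.1))
print(meets_prediabetes_criteria(fasting_glucose=5.0, hba1c_percent=5.4))
```

The sketch only checks the pre-diabetes ranges; it does not exclude patients whose other values already meet the criteria for diabetes, which a real screening rule would also need to check.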
Current studies indicate that the risk of atherosclerotic cardiovascular disease is significantly higher in pre-diabetic populations than in the normal population, yet measures for preventing cardiovascular disease in this population remain inadequate. Strengthening the primary prevention of cardiovascular disease in pre-diabetic people is therefore key to improving their prognosis.
Research shows that pre-diabetes is a highly heterogeneous metabolic state, and prognosis differs greatly among people with different metabolic profiles. Carrying out more active cardiovascular disease screening and treatment for the subgroups with a poorer prognosis is an effective strategy for reducing cardiovascular complications in pre-diabetes: it allows high-risk patients to be treated promptly, spares low-risk patients over-treatment and unnecessary medical expenditure, and is of great public-health significance. Identifying patients at high cardiovascular risk in advance is therefore a critical problem that urgently needs to be solved.
At present, cardiovascular risk prediction for people with abnormal glucose metabolism is mainly based on models built in type 2 diabetic populations, and despite the growing number of people with pre-diabetes, there is no atherosclerotic cardiovascular disease risk assessment system designed specifically for them. Existing risk assessment tools built on the general population or on diabetic populations predict poorly in pre-diabetes; moreover, the cardiovascular risk of pre-diabetic people differs from that of both healthy people and diabetic patients, so the required interventions and their intensity also differ.
An existing patent selects the factors that most strongly influence the differential diagnosis by analyzing the clinical and pathological characteristics of patients with type 2 diabetes, and presents them as a nomogram, which is concise, easy to understand and convenient for clinical use. From the initial test data of a type 2 diabetic patient, that patent estimates the probability that the renal biopsy will show non-diabetic kidney disease (NDRD) or mixed diabetic nephropathy and non-diabetic kidney disease (MIX), thereby enabling the differential diagnosis of diabetic nephropathy and non-diabetic kidney disease in type 2 diabetic patients. Its probability prediction method helps clinicians weigh the risks and benefits of renal biopsy in type 2 diabetic patients with renal injury, and also provides a reference for medical institutions that lack renal biopsy capability. However, that patent does not use routine clinical test indexes to construct a convenient and practical ASCVD risk prediction tool that stratifies the risk of the growing pre-diabetic population, guides individualized treatment and delays the onset of cardiovascular disease.
Disclosure of Invention
The invention provides a method for predicting the cardiovascular disease risk of pre-diabetic patients: a convenient, rapid and accurate atherosclerotic cardiovascular disease risk prediction model, built on routine clinical test indexes, that stratifies the risk of the pre-diabetic population so that individualized examination and treatment plans can be implemented, ultimately serving the public-health goals of delaying the onset of cardiovascular disease and reducing its burden.
Another object of the present invention is to provide a system that uses the above method for predicting the risk of cardiovascular disease in pre-diabetic patients.
To achieve the above technical effects, the technical solution of the invention is as follows:
A method for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising the following steps:
S1: acquiring medical data of pre-diabetic patients;
S2: randomly sampling the data from step S1 using a random-number method to construct a training set, an internal validation set and an external validation set;
S3: screening key variables by LASSO regression on the training set;
S4: performing Cox proportional hazards regression analysis on the variables screened in step S3, and constructing a prediction model from the Cox proportional hazards fit;
S5: verifying the accuracy of the constructed Cox proportional hazards prediction model on the internal validation set and the external validation set.
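The text does not give the split proportions or the exact random-number procedure for step S2. As an assumption for illustration only, the sketch below shuffles the records with a seeded pseudo-random generator and cuts them 60/20/20; the function name `random_three_way_split`, the proportions and the seed are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_three_way_split(
    records: Sequence[T],
    fractions: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # hypothetical proportions
    seed: int = 2022,                                          # hypothetical seed
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle records with a seeded RNG and cut them into training,
    internal-validation and external-validation subsets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # same seed -> same split
    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_inner = int(n * fractions[1])
    train = shuffled[:n_train]
    inner = shuffled[n_train:n_train + n_inner]
    outer = shuffled[n_train + n_inner:]  # remainder goes to the external set
    return train, inner, outer


train, inner, outer = random_three_way_split(range(100))
print(len(train), len(inner), len(outer))
```

Fixing the seed makes the split reproducible, which matters because the same sets are reused for LASSO screening, Cox fitting and validation.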
Further, the specific process of step S1 is:
acquiring medical data of pre-diabetic patients, performing data screening and feature extraction, and obtaining candidate predictors of atherosclerotic cardiovascular disease risk; the candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, systolic and diastolic blood pressure adjusted for antihypertensive medication, history of hypertension, and family history of cardiovascular disease.
Further, the specific process of step S3 is:
performing LASSO regression analysis on the candidate factors in the training set, constructing a penalty function for variable screening and complexity adjustment, selecting the optimal model according to the value of λ, and selecting the key variables retained in the model.
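The text does not name the software or solver used for the LASSO step, nor how λ was chosen. As an illustration only, the sketch below fits an L1-penalized (LASSO) Cox model by proximal gradient descent (ISTA) on the Breslow partial likelihood in pure Python; the function names, the fixed step of 1.0 and the iteration count are assumptions, and selecting λ (for example by cross-validation) is not shown. Variables whose coefficients stay non-zero would be the key variables passed on to step S4.

```python
import math
from typing import List, Sequence


def soft_threshold(z: float, a: float) -> float:
    """Proximal operator of a*|z|: shrink z toward zero by a."""
    if z > a:
        return z - a
    if z < -a:
        return z + a
    return 0.0


def cox_neg_loglik_grad(beta: Sequence[float], X, time, event) -> List[float]:
    """Gradient of the negative Cox log partial likelihood (Breslow ties), divided by n."""
    n, p = len(X), len(beta)
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    grad = [0.0] * p
    for i in range(n):
        if not event[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j in range(n) if time[j] >= time[i]]
        denom = sum(w[j] for j in risk)
        for k in range(p):
            weighted_mean = sum(w[j] * X[j][k] for j in risk) / denom
            grad[k] -= X[i][k] - weighted_mean
    return [g / n for g in grad]


def lasso_cox(X, time, event, lam: float, step: float = 1.0, n_iter: int = 500) -> List[float]:
    """L1-penalized Cox regression by proximal gradient descent (ISTA)."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = cox_neg_loglik_grad(beta, X, time, event)
        beta = [soft_threshold(b - step * gk, step * lam) for b, gk in zip(beta, g)]
    return beta


# Toy data: column 0 is informative, column 1 is constant (uninformative).
X = [[1.0, 3.0], [0.8, 3.0], [0.9, 3.0], [0.4, 3.0], [0.2, 3.0], [0.0, 3.0]]
time = [1, 2, 3, 4, 5, 6]
event = [1, 1, 1, 1, 1, 0]
for lam in (10.0, 0.05):
    beta = lasso_cox(X, time, event, lam)
    kept = [k for k, b in enumerate(beta) if b != 0.0]
    print(lam, kept)
```

With λ = 10 both coefficients are shrunk to zero; with λ = 0.05 only column 0 is kept. The fixed step of 1.0 assumes covariates on roughly unit scale; in practice covariates are usually standardized first so that a single λ penalizes them comparably.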
Further, the step S4 includes:
performing Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removing variables that do not reach statistical significance in the model, and constructing a multivariable Cox regression model:
$$h(t,X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \qquad (1)$$
where $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are their partial regression coefficients, $h_0(t)$ is the baseline hazard, and $h(t,X)$ is the hazard at time $t$ for the covariate vector $X$;
obtaining the predicted probability of the multivariable Cox regression model according to formula (2), where a larger value indicates a higher risk of poor prognosis for the patient:
$$\hat{P}(t) = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \qquad (2)$$
where $\hat{P}(t)$ is the predicted probability that the individual experiences an adverse event, $S_0(t)$ is the baseline survival of the modeling population, $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, i.e. the sum of the products of the variables and their partial regression coefficients, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of each variable's mean in the modeling population and its partial regression coefficient.
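Once the coefficients, the modeling-population means and the baseline survival $S_0(t)$ are known, formulas (1) and (2) can be evaluated directly. The sketch below is a minimal Python illustration; the function names and the two-variable coefficients in the usage lines are hypothetical, not values from the patent.

```python
import math
from typing import Sequence


def linear_predictor(beta: Sequence[float], x: Sequence[float]) -> float:
    """Sum of beta_i * x_i, i.e. the exponent of formula (1)."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return sum(b * xi for b, xi in zip(beta, x))


def hazard_ratio(beta, x, x_ref) -> float:
    """h(t, x) / h(t, x_ref); the baseline hazard h0(t) cancels under proportional hazards."""
    return math.exp(linear_predictor(beta, x) - linear_predictor(beta, x_ref))


def predicted_probability(beta, x, x_mean, s0_t: float) -> float:
    """Formula (2): 1 - S0(t) ** exp(LP(x) - LP(x_mean))."""
    if not 0.0 < s0_t <= 1.0:
        raise ValueError("S0(t) must be a survival probability in (0, 1]")
    return 1.0 - s0_t ** hazard_ratio(beta, x, x_mean)


# Hypothetical two-variable model, for illustration only.
beta = [0.5, -0.2]
x_mean = [1.0, 2.0]
print(predicted_probability(beta, x_mean, x_mean, 0.95))      # average individual
print(predicted_probability(beta, [2.0, 2.0], x_mean, 0.95))  # higher first covariate
```

For an individual whose covariates equal the population means, the exponent is zero, so the predicted probability reduces to $1 - S_0(t)$.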
A system for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising:
the data acquisition module, used to acquire medical data of pre-diabetic patients;
the data preprocessing module, used to randomly sample the data acquired by the data acquisition module using a random-number method and to construct a training set, an internal validation set and an external validation set;
the key variable screening module, used to screen key variables by LASSO regression on the training set;
the prediction model construction module, used to perform Cox proportional hazards regression analysis on the screened variables and to construct a prediction model from the Cox proportional hazards fit;
and the verification module, used to verify the accuracy of the constructed Cox proportional hazards prediction model on the internal validation set and the external validation set.
Further, the data acquisition module acquires medical data of pre-diabetic patients, performs data screening and feature extraction, and obtains candidate predictors of atherosclerotic cardiovascular disease risk; the candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, systolic and diastolic blood pressure adjusted for antihypertensive medication, history of hypertension, and family history of cardiovascular disease.
Further, the key variable screening module performs LASSO regression analysis on the candidate factors in the training set, constructs a penalty function for variable screening and complexity adjustment, selects the optimal model according to the value of λ, and selects the key variables retained in the model.
Further, the prediction model construction module performs Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removes variables that do not reach statistical significance in the model, and constructs a multivariable Cox regression model:
$$h(t,X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \qquad (1)$$
where $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are their partial regression coefficients, $h_0(t)$ is the baseline hazard, and $h(t,X)$ is the hazard at time $t$ for the covariate vector $X$;
the prediction model construction module obtains the predicted probability of the multivariable Cox regression model according to formula (2), where a larger value indicates a higher risk of poor prognosis for the patient:
$$\hat{P}(t) = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \qquad (2)$$
where $\hat{P}(t)$ is the predicted probability that the individual experiences an adverse event, $S_0(t)$ is the baseline survival of the modeling population, $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, i.e. the sum of the products of the variables and their partial regression coefficients, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of each variable's mean in the modeling population and its partial regression coefficient.
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
the invention uses routine clinical test indexes to construct a convenient and practical 10-year ASCVD risk prediction system, so as to stratify the risk of the growing pre-diabetic population, guide individualized treatment and delay the onset of cardiovascular disease.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 shows the results of the LASSO regression model used for variable selection in the training set: panel A is the coefficient profile plot and panel B is the plot with vertical dashed lines used to choose λ;
FIG. 3 evaluates the calibration of the model's predicted disease risk in the training set (A), the internal validation set (B) and the external validation sets (C, D);
FIG. 4 evaluates the clinical utility of the model's predicted disease risk in the training set (A), the internal validation set (B) and the external validation sets (C, D);
FIG. 5 is a block diagram of the system of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, a method for predicting the risk of cardiovascular disease in a pre-diabetic patient comprises the following steps:
S1: acquiring medical data of pre-diabetic patients;
S2: randomly sampling the data from step S1 using a random-number method to construct a training set, an internal validation set and an external validation set;
S3: screening key variables by LASSO regression on the training set;
S4: performing Cox proportional hazards regression analysis on the variables screened in step S3, and constructing a prediction model from the Cox proportional hazards fit;
S5: verifying the accuracy of the constructed Cox proportional hazards prediction model on the internal validation set and the external validation set.
The specific process of step S1 is:
acquiring medical data of pre-diabetic patients, performing data screening and feature extraction, and obtaining candidate predictors of atherosclerotic cardiovascular disease risk. The candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, systolic and diastolic blood pressure adjusted for antihypertensive medication, history of hypertension, and family history of cardiovascular disease; the laboratory test indexes include hemoglobin, white blood cell count, hematocrit, glycated hemoglobin, low-density lipoprotein cholesterol adjusted for lipid-lowering medication, high-density lipoprotein cholesterol, total cholesterol, triglycerides, C-reactive protein, creatinine, glomerular filtration rate (calculated with the CKD-EPI equation) and cystatin C. For each continuous variable, a histogram of the sample distribution and a probability density curve are drawn, and variables that do not follow a normal distribution are natural-log transformed. Atherosclerotic cardiovascular disease comprises coronary heart disease, stroke and peripheral artery disease.
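The text decides on the natural-log transformation by inspecting histograms and density curves. As a numeric stand-in for that visual check, the sketch below computes the sample skewness and log-transforms strongly right-skewed variables; the skewness cutoff of 1.0 and the function names are hypothetical choices, not values from the patent.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Moment coefficient of skewness (population form)."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return 0.0  # constant variable: no skew
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


def maybe_log_transform(values: Sequence[float], cutoff: float = 1.0) -> List[float]:
    """Apply ln() when the distribution is strongly right-skewed.

    The cutoff is a hypothetical stand-in for the visual histogram check."""
    if min(values) <= 0:
        raise ValueError("log transform needs strictly positive values")
    if sample_skewness(values) > cutoff:
        return [math.log(v) for v in values]
    return list(values)


crp = [0.1, 0.2, 0.3, 0.5, 1.0, 2.0, 50.0]  # toy, right-skewed values in mg/L
print(maybe_log_transform(crp))
```

A formal normality test or a different cutoff could be substituted; the text itself only describes the graphical inspection.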
The specific process of step S3 is:
performing LASSO regression analysis on the candidate factors in the training set, constructing a penalty function for variable screening and complexity adjustment, selecting the optimal model according to the value of λ, and selecting the key variables retained in the model (shown in FIG. 2).
Step S4 includes:
performing Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removing variables that do not reach statistical significance in the model, and constructing a multivariable Cox regression model:
$$h(t,X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \qquad (1)$$
where $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are their partial regression coefficients, $h_0(t)$ is the baseline hazard, and $h(t,X)$ is the hazard at time $t$ for the covariate vector $X$;
obtaining the predicted probability of the multivariable Cox regression model according to formula (2), where a larger value indicates a higher risk of poor prognosis for the patient:
$$\hat{P}(t) = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \qquad (2)$$
where $\hat{P}(t)$ is the predicted probability that the individual experiences an adverse event, $S_0(t)$ is the baseline survival of the modeling population, $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, i.e. the sum of the products of the variables and their partial regression coefficients, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of each variable's mean in the modeling population and its partial regression coefficient.
The 10-year risk of atherosclerotic cardiovascular disease in pre-diabetic patients is calculated as
risk = 1 − 0.9265^exp(0.5556 × sex + 0.1475 × former smoker + 0.6147 × current smoker + 0.1999 × family history + 0.2108 × hypertension + 3.1032 × ln(age) + 0.0064 × systolic blood pressure adjusted for antihypertensive medication + 0.1660 × LDL cholesterol adjusted for lipid-lowering medication − 0.5806 × ln(HDL cholesterol) + 2.1250 × ln(glycated hemoglobin) + 0.0275 × white blood cell count + 0.1233 × ln(C-reactive protein) + 0.6121 × cystatin C);
where sex = 0 for female and 1 for male; former smoker = 1 if the patient smoked in the past but has quit, otherwise 0; current smoker = 1 if the patient currently smokes, otherwise 0; family history = 1 if a family history of cardiovascular disease is present, otherwise 0; hypertension = 1 if hypertension is present, otherwise 0;
age is in years and is natural-log transformed; the original range is 40 to 70;
systolic blood pressure adjusted for antihypertensive medication is in mmHg; the range is 86.5 to 255.0;
LDL cholesterol adjusted for lipid-lowering medication is in mmol/L; the range is 0.84 to 11.35;
HDL cholesterol is in mmol/L and is natural-log transformed; the original range is 0.487 to 4.087;
glycated hemoglobin is in % and is natural-log transformed; the original range is 5.70 to 6.49;
white blood cell count is in 10^9/L; the range is 0.21 to 101.90;
C-reactive protein is in mg/L and is natural-log transformed; the original range is 0.08 to 78.05;
cystatin C is in mg/L; the range is 0.442 to 4.140;
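The published 10-year formula can be wrapped in a small function. This Python sketch is an illustration, not the authors' implementation: the class, field and function names are hypothetical, and binary variables are coded as described above. Formula (2) subtracts a modeling-population mean term from the linear predictor, but the 10-year formula as printed gives no value for it; without that centering, 3.1032 × ln(55) alone contributes about 12.4 to the exponent, which would push the risk to essentially 100%. The sketch therefore requires the caller to supply the mean term.

```python
import math
from dataclasses import dataclass, replace

# Coefficients and baseline survival as printed in the 10-year formula above.
BASELINE_SURVIVAL_10Y = 0.9265
COEF = {
    "sex": 0.5556, "former_smoker": 0.1475, "current_smoker": 0.6147,
    "family_history": 0.1999, "hypertension": 0.2108, "ln_age": 3.1032,
    "sbp_adjusted": 0.0064, "ldl_adjusted": 0.1660, "ln_hdl": -0.5806,
    "ln_hba1c": 2.1250, "wbc": 0.0275, "ln_crp": 0.1233, "cystatin_c": 0.6121,
}


@dataclass(frozen=True)
class Patient:  # hypothetical container for one patient's inputs
    male: bool
    former_smoker: bool
    current_smoker: bool
    family_history: bool
    hypertension: bool
    age_years: float            # 40 to 70 in the modeling data
    sbp_adjusted_mmhg: float    # adjusted for antihypertensive medication
    ldl_adjusted_mmol_l: float  # adjusted for lipid-lowering medication
    hdl_mmol_l: float
    hba1c_percent: float
    wbc_1e9_per_l: float
    crp_mg_l: float
    cystatin_c_mg_l: float


def patient_linear_predictor(p: Patient) -> float:
    """Sum of coefficient x variable; ln() applied to age, HDL-C, HbA1c and CRP."""
    values = {
        "sex": float(p.male),
        "former_smoker": float(p.former_smoker),
        "current_smoker": float(p.current_smoker),
        "family_history": float(p.family_history),
        "hypertension": float(p.hypertension),
        "ln_age": math.log(p.age_years),
        "sbp_adjusted": p.sbp_adjusted_mmhg,
        "ldl_adjusted": p.ldl_adjusted_mmol_l,
        "ln_hdl": math.log(p.hdl_mmol_l),
        "ln_hba1c": math.log(p.hba1c_percent),
        "wbc": p.wbc_1e9_per_l,
        "ln_crp": math.log(p.crp_mg_l),
        "cystatin_c": p.cystatin_c_mg_l,
    }
    return sum(COEF[name] * v for name, v in values.items())


def ten_year_ascvd_risk(p: Patient, mean_linear_predictor: float) -> float:
    """1 - S0(10) ** exp(LP - mean LP), as in formula (2).

    mean_linear_predictor is the modeling-population term of formula (2);
    its numeric value is not given in the text and must be supplied."""
    exponent = patient_linear_predictor(p) - mean_linear_predictor
    return 1.0 - BASELINE_SURVIVAL_10Y ** math.exp(exponent)


# Hypothetical example patient (values inside the stated ranges).
example = Patient(male=True, former_smoker=False, current_smoker=False,
                  family_history=False, hypertension=True, age_years=55,
                  sbp_adjusted_mmhg=135, ldl_adjusted_mmol_l=3.2, hdl_mmol_l=1.2,
                  hba1c_percent=6.0, wbc_1e9_per_l=6.5, crp_mg_l=1.5,
                  cystatin_c_mg_l=0.9)
smoker = replace(example, current_smoker=True)
# A relative hazard does not need the unknown mean term: exp(0.6147) for current smoking.
print(math.exp(patient_linear_predictor(smoker) - patient_linear_predictor(example)))
```

Comparing two patients through the difference of their linear predictors is independent of the missing centering constant, which is why the usage line reports a hazard ratio rather than an absolute risk.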
describing, with the predicted probability as the analysis variable, the distribution of the 10-year probability of atherosclerotic cardiovascular disease in the training set, internal validation set and external validation set populations;
constructing, with the predicted probability as the analysis variable, the ROC curves of the Cox regression model in the training set, internal validation set and external validation set, respectively; the area under the curve (AUC) ranges from 0.5 to 1, and the closer the AUC is to 1.0, the better the discrimination; an AUC of 0.5 indicates the lowest discrimination and little practical value;
in this embodiment, when the probability calculated by the model is used to predict atherosclerotic cardiovascular disease within 10 years, the AUC values in the training set, internal validation set and external validation set range from 0.688 to 0.712, indicating fairly good discrimination;
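The AUC can be computed as the Mann-Whitney probability that a randomly chosen patient with an event has a higher predicted probability than a randomly chosen patient without one. This minimal Python sketch assumes a binary 10-year outcome with complete follow-up; the text does not say how censoring was handled in the ROC analysis, and censored data would call for a time-dependent ROC or Harrell's C-index instead. The function name is hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise loop is quadratic in the number of patients; a rank-based formula gives the same value faster for large cohorts.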
constructing, with the predicted probability as the analysis variable, the calibration curves of the Cox regression model in the training set, internal validation set and external validation set, respectively; the abscissa is the predicted probability and the ordinate is the observed event risk; y = x is the reference line, representing the ideal case in which the predicted risk equals the observed risk; each plotted point pairs the risk predicted by the model with the corresponding observed risk; if the predicted value is greater than the observed value, the point lies below the reference line, and if it is smaller, the point lies above the reference line;
in this embodiment, when the probability calculated by the model is used to predict atherosclerotic cardiovascular disease within 10 years, the calibration curves of the training set, internal validation set and external validation set fall on both sides of the reference line, suggesting that the model is well calibrated;
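A calibration curve can be approximated by sorting patients on predicted risk, splitting them into groups and comparing each group's mean prediction with its observed event rate. This sketch assumes complete 10-year follow-up (no censoring) and a hypothetical default of ten roughly equal-sized groups; the function name is also hypothetical. A group whose mean prediction exceeds its observed rate would plot below the y = x line.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    pred: Sequence[float], observed: Sequence[int], n_groups: int = 10
) -> List[Tuple[float, float]]:
    """(mean predicted risk, observed event rate) for each group of roughly
    equal size after sorting patients by predicted risk."""
    pairs = sorted(zip(pred, observed))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # more groups than patients
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(o for _, o in chunk) / len(chunk)
        table.append((mean_pred, obs_rate))
    return table


print(calibration_table([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_groups=2))
```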
constructing, with the predicted probability as the analysis variable, the decision curves of the Cox regression model in the training set, internal validation set and external validation set, respectively; the abscissa is the threshold probability: when the probability predicted by the model for a patient exceeds the threshold probability, treatment is given, and the ordinate is the corresponding net benefit, i.e. the benefit to the patient minus the harm; the horizontal line and the diagonal line represent the two extreme strategies: the horizontal line assumes all samples are negative (no one is treated), giving a net benefit of 0, and the diagonal line assumes all samples are positive and all receive treatment.
In this embodiment, when the probability calculated by the model is used to predict atherosclerotic cardiovascular disease within 10 years and to guide treatment, patients obtain a net benefit over a certain range of threshold probabilities;
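The net benefit on a decision curve is commonly defined as TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability; treating no one gives the horizontal line at 0, and treating everyone gives the "treat all" curve. The minimal sketch below assumes a binary outcome with complete follow-up and treats a patient when the predicted probability exceeds the threshold, as the text describes; the function names are hypothetical.

```python
from typing import Sequence


def net_benefit(pred: Sequence[float], observed: Sequence[int], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk exceeds the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    n = len(pred)
    tp = sum(1 for p, y in zip(pred, observed) if p > threshold and y == 1)
    fp = sum(1 for p, y in zip(pred, observed) if p > threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(observed: Sequence[int], threshold: float) -> float:
    """Reference curve in which every patient is treated."""
    prevalence = sum(observed) / len(observed)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


pred = [0.9, 0.8, 0.2, 0.1]
obs = [1, 0, 0, 0]
for pt in (0.5, 0.85):
    print(pt, net_benefit(pred, obs, pt), net_benefit_treat_all(obs, pt))
```

Evaluating these functions over a grid of thresholds and plotting them against the treat-none line at 0 reproduces the shape of a decision curve.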
constructing an online prediction system for the 10-year atherosclerotic cardiovascular disease risk of pre-diabetic patients (figure 4);
the complex regression equation is converted into a practical interactive web interface, which makes the results of the prediction model more readable, makes it convenient for examinees to assess their risk, and allows wide use in medical research and clinical practice.
Example 2
As shown in fig. 5, a system for predicting the risk of cardiovascular disease in a pre-diabetic patient comprises:
the data acquisition module, used to acquire medical data of pre-diabetic patients;
the data preprocessing module, used to randomly sample the data acquired by the data acquisition module using a random-number method and to construct a training set, an internal validation set and an external validation set;
the key variable screening module, used to screen key variables by LASSO regression on the training set;
the prediction model construction module, used to perform Cox proportional hazards regression analysis on the screened variables and to construct a prediction model from the Cox proportional hazards fit;
and the verification module, used to verify the accuracy of the constructed Cox proportional hazards prediction model on the internal validation set and the external validation set.
The variables collected by the data acquisition module include age, sex, smoking status, systolic blood pressure, history of hypertension, family history of cardiovascular disease, white blood cell count, glycated hemoglobin, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, C-reactive protein, creatinine, cystatin C, whether an antihypertensive drug is taken, and whether a lipid-lowering drug is taken; antihypertensive and lipid-lowering drug use are used, respectively, to calculate the systolic blood pressure adjusted for antihypertensive medication and the LDL cholesterol adjusted for lipid-lowering medication.
The key variable screening module performs LASSO regression analysis on the candidate factors in the training set, constructs a penalty function for variable screening and complexity adjustment, selects the optimal model according to the value of λ, and selects the key variables retained in the model.
The prediction model construction module performs Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removes variables that do not reach statistical significance, and builds a multivariable Cox regression model:
$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \qquad (1)$$
where $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for a patient with covariate values $X$;
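The text removes variables that "do not reach a statistically significant level" but does not name the test or the threshold. The sketch below assumes a two-sided Wald test at α = 0.05 on fitted coefficients and standard errors. It also reports each hazard ratio, exp(β), which is the multiplicative change in h(t, X) per unit increase of a variable under formula (1). The coefficient values are hypothetical, and the one-pass filter is a simplification: it is common to refit the model after dropping variables.

```python
import math


def wald_p_value(coef, se):
    """Two-sided p-value for H0: beta = 0, using z = coef / se and the normal distribution."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2))


def keep_significant(fit, alpha=0.05):
    """Keep variables whose Wald p-value is below alpha; report hazard ratios exp(beta)."""
    kept = {}
    for name, (coef, se) in fit.items():
        p = wald_p_value(coef, se)
        if p < alpha:
            kept[name] = {"coef": coef, "hazard_ratio": math.exp(coef), "p": p}
    return kept


fit = {"age": (0.05, 0.01), "bmi": (0.01, 0.02)}  # hypothetical (coefficient, SE) pairs
kept = keep_significant(fit)
print(sorted(kept))  # ['age']
```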
the prediction model construction module obtains the predicted probability corresponding to the multivariable Cox regression model according to formula (2); a greater risk value indicates a greater risk of poor prognosis for the patient:
$$\hat{P}(t) = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \qquad (2)$$
where $\hat{P}(t)$ is the predicted probability that the individual experiences an adverse event by time $t$, $S_0(t)$ is the baseline survival of the modeling population at time $t$, $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor (the sum of the products of each variable and its partial regression coefficient), and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the mean value of each variable in the modeling population and its partial regression coefficient.
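Assuming formula (2) takes the usual form P̂(t) = 1 − S₀(t)^exp(Σβᵢxᵢ − Σβᵢx̄ᵢ), it can be computed as below. This is the form implied by the listed terms: the linear predictor is centred on the modeling-population means and S₀(t) is the baseline survival at time t. The coefficients, means and S₀(t) value are hypothetical.

```python
import math


def predicted_risk(beta, x, x_mean, s0_t):
    """Predicted probability of an event by time t for one patient."""
    lp = sum(b * xi for b, xi in zip(beta, x))          # individual linear predictor
    lp_mean = sum(b * m for b, m in zip(beta, x_mean))  # linear predictor at population means
    return 1.0 - s0_t ** math.exp(lp - lp_mean)


beta = [0.06, 0.5]       # hypothetical coefficients
x_mean = [55.0, 0.3]     # hypothetical modeling-population means
print(round(predicted_risk(beta, [55.0, 0.3], x_mean, 0.95), 4))  # 0.05
```

A patient whose covariates equal the population means gets exactly 1 − S₀(t). A higher linear predictor raises the exponent and therefore the predicted risk, which matches the statement that a greater risk value indicates a worse prognosis.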
Using the constructed model, the prediction model construction module calculates an individual risk prediction and outputs the 10-year predicted risk of atherosclerotic cardiovascular disease for the prediabetic patient. It then assigns the current risk level from the calculated value using the 10-year cardiovascular risk stratification recommended by the 2019 ACC/AHA Guideline on the Primary Prevention of Cardiovascular Disease: low risk, < 5%; borderline risk, ≥ 5% and < 7.5%; intermediate risk, ≥ 7.5% and < 20%; high risk, ≥ 20%.
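The stratification step is a simple threshold lookup on the 10-year risk, expressed here as a fraction. The function name and the lowercase labels are illustrative choices. The cut-offs are the ones listed above.

```python
def acc_aha_risk_category(risk_10y):
    """Map a 10-year ASCVD risk (fraction in [0, 1]) to the 2019 ACC/AHA risk category."""
    if not 0.0 <= risk_10y <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    if risk_10y < 0.05:
        return "low"
    if risk_10y < 0.075:
        return "borderline"
    if risk_10y < 0.20:
        return "intermediate"
    return "high"


for r in (0.03, 0.05, 0.10, 0.25):
    print(r, acc_aha_risk_category(r))
```

Each lower bound is inclusive, so a risk of exactly 5% is classed as borderline and exactly 20% as high.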
Example 3
As shown in Fig. 1, a method for predicting the risk of cardiovascular disease in a prediabetic patient comprises the following steps:
S1: acquiring medical data of prediabetic patients;
S2: randomly splitting the data samples from step S1, using a random-number method, into a training set, an internal validation set and an external validation set;
S3: screening key variables by LASSO regression on the training set;
S4: performing Cox proportional hazards regression analysis on the variables screened in step S3 and constructing a prediction model fitted by Cox proportional hazards regression;
S5: verifying the accuracy of the constructed Cox prediction model using the internal and external validation sets.
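Step S5 does not name an accuracy metric. A common discrimination measure for Cox models is Harrell's concordance index (C-index), sketched below as an assumption. A pair of patients is comparable when one has an observed event before the other's follow-up time. The pair is concordant when the earlier-failing patient received the higher predicted risk, and tied predictions count one half.

```python
def harrell_c_index(risk, time, event):
    """Harrell's C-index: share of comparable pairs ordered correctly by predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is anchored on an observed event
        for j in range(n):
            if time[j] > time[i]:  # j was still event-free when i had the event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(harrell_c_index([0.9, 0.5, 0.1], [1, 2, 3], [1, 1, 0]))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. The same function would be applied to the internal and external validation sets in turn.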
The specific process of step S1 is:
acquiring medical data of prediabetic patients, performing data screening and feature extraction, and obtaining candidate predictors of atherosclerotic cardiovascular disease risk. The candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, systolic blood pressure regulation, diastolic blood pressure regulation, history of hypertension, and family history of cardiovascular disease.
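Two of the candidate predictors are derived quantities rather than raw measurements. The sketch below shows that part of feature extraction, assuming the standard definitions: body mass index as weight in kilograms divided by height in metres squared, and waist-to-hip ratio as waist circumference over hip circumference in the same unit. The function names are illustrative.

```python
def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def waist_hip_ratio(waist_cm, hip_cm):
    """Dimensionless ratio; both circumferences must use the same unit."""
    if hip_cm <= 0:
        raise ValueError("hip circumference must be positive")
    return waist_cm / hip_cm


print(round(body_mass_index(70, 1.75), 1))  # 22.9
print(waist_hip_ratio(90, 100))             # 0.9
```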
The specific process of step S3 is:
LASSO regression analysis is performed on the candidate factors in the training set: a penalty function is constructed for variable selection and complexity control, the optimal model is selected according to the value of λ, and the key variables to be included in the model are identified.
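The text selects "the optimal model according to the λ value" without giving a rule. Two conventions are widely used with cross-validated LASSO, and R's `cv.glmnet` reports both as `lambda.min` and `lambda.1se`. The first, λ_min, minimises the mean cross-validation error. The second, λ_1se, is the largest λ whose error is within one standard error of that minimum, which gives a sparser model. The CV errors below are hypothetical numbers.

```python
def choose_lambda(lambdas, cv_error, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation results."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_error[i])
    threshold = cv_error[i_min] + cv_se[i_min]  # one-standard-error band above the minimum
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_error) if err <= threshold)
    return lambdas[i_min], lambda_1se


lambdas = [0.001, 0.01, 0.05, 0.1, 0.3]
cv_error = [0.52, 0.50, 0.505, 0.53, 0.60]   # hypothetical mean CV errors
cv_se = [0.02, 0.02, 0.02, 0.02, 0.02]       # hypothetical standard errors
print(choose_lambda(lambdas, cv_error, cv_se))  # (0.01, 0.05)
```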
Step S4 includes:
performing Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removing variables that do not reach statistical significance, and constructing a multivariable Cox regression model:
$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \qquad (1)$$
where $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for a patient with covariate values $X$;
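The coefficients β in formula (1) are normally estimated by maximising Cox's partial likelihood, which does not require the baseline hazard h₀(t). The sketch below only evaluates that log partial likelihood for given coefficients; an optimiser would maximise it. Tied event times use the Breslow convention, where every subject with time ≥ tᵢ is at risk. The data are illustrative.

```python
import math


def cox_log_partial_likelihood(beta, X, time, event):
    """Log partial likelihood: sum over events of eta_i - log(sum of exp(eta_j) over the risk set)."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if d_i:
            risk = [eta[j] for j in range(len(time)) if time[j] >= t_i]
            ll += eta[i] - math.log(sum(math.exp(e) for e in risk))
    return ll


# With beta = 0 every subject is equally likely, so ll = -(log 3 + log 2 + log 1) = -log 6.
print(cox_log_partial_likelihood([0.0], [[1.0], [2.0], [3.0]], [1, 2, 3], [1, 1, 1]))
```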
obtaining the predicted probability corresponding to the multivariable Cox regression model according to formula (2); a greater risk value indicates a greater risk of poor prognosis for the patient:
$$\hat{P}(t) = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \qquad (2)$$
where $\hat{P}(t)$ is the predicted probability that the individual experiences an adverse event by time $t$, $S_0(t)$ is the baseline survival of the modeling population at time $t$, $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor (the sum of the products of each variable and its partial regression coefficient), and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the mean value of each variable in the modeling population and its partial regression coefficient.
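Formula (2) needs a baseline value S₀(t) from the modeling population, but the text does not say how it is estimated. One common choice, assumed here, is the Breslow estimator of the cumulative baseline hazard, evaluated at the mean covariate profile so that it pairs with the centred linear predictor. S₀(t) is then exp(−H₀(t)). The function name and the data are illustrative.

```python
import math


def baseline_survival(beta, X, time, event, t):
    """Breslow estimate of S0(t) for a patient with mean covariate values."""
    n, p = len(X), len(beta)
    x_mean = [sum(row[k] for row in X) / n for k in range(p)]
    # relative hazard of each subject versus the mean covariate profile
    w = [math.exp(sum(b * (x - m) for b, x, m in zip(beta, row, x_mean))) for row in X]
    cum_hazard = 0.0
    for i in range(n):
        if event[i] and time[i] <= t:
            cum_hazard += 1.0 / sum(w[j] for j in range(n) if time[j] >= time[i])
    return math.exp(-cum_hazard)


# With beta = 0 this reduces to the Nelson-Aalen estimate: exp(-(1/3 + 1/2)).
print(baseline_survival([0.0], [[1.0], [2.0], [3.0]], [1, 2, 3], [1, 1, 1], t=2))
```

For the 10-year output described earlier, t would be 10 years in whatever time unit the follow-up data use.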
The same or similar reference numerals correspond to the same or similar parts.
The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting this patent.
It should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A method for predicting the risk of cardiovascular disease in a prediabetic patient, comprising the steps of:
S1: acquiring medical data of prediabetic patients;
S2: randomly splitting the data samples from step S1, using a random-number method, into a training set, an internal validation set and an external validation set;
S3: screening key variables by LASSO regression on the training set;
S4: performing Cox proportional hazards regression analysis on the variables screened in step S3 and constructing a prediction model fitted by Cox proportional hazards regression;
S5: verifying the accuracy of the constructed Cox prediction model using the internal and external validation sets.
2. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 1, wherein the specific process of step S1 is:
acquiring medical data of prediabetic patients, performing data screening and feature extraction, and obtaining candidate predictors of atherosclerotic cardiovascular disease risk; the candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, systolic blood pressure regulation, diastolic blood pressure regulation, history of hypertension, and family history of cardiovascular disease.
3. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 2, wherein the specific process of step S3 is:
performing LASSO regression analysis on the candidate factors in the training set, constructing a penalty function for variable selection and complexity control, selecting the optimal model according to the value of λ, and identifying the key variables to be included in the model.
4. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 3, wherein the step S4 comprises:
performing Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removing variables that do not reach statistical significance, and constructing a multivariable Cox regression model:
$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \qquad (1)$$
where $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for a patient with covariate values $X$.
5. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 4, wherein the step S4 further comprises:
obtaining the predicted probability corresponding to the multivariable Cox regression model according to formula (2), wherein a greater risk value indicates a greater risk of poor prognosis for the patient:
$$\hat{P}(t) = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \qquad (2)$$
where $\hat{P}(t)$ is the predicted probability that the individual experiences an adverse event by time $t$, $S_0(t)$ is the baseline survival of the modeling population at time $t$, $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor (the sum of the products of each variable and its partial regression coefficient), and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the mean value of each variable in the modeling population and its partial regression coefficient.
6. A system for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising:
a data acquisition module, configured to acquire medical data of prediabetic patients;
a data preprocessing module, configured to randomly split the acquired data samples, using a random-number method, into a training set, an internal validation set and an external validation set;
a key variable screening module, configured to screen key variables by LASSO regression on the training set;
a prediction model construction module, configured to perform Cox proportional hazards regression analysis on the screened variables and to construct a prediction model fitted by Cox proportional hazards regression; and
a validation module, configured to verify the accuracy of the constructed Cox prediction model using the internal and external validation sets.
7. The system of claim 6, wherein the data acquisition module acquires medical data of prediabetic patients, performs data screening and feature extraction, and obtains candidate predictors of atherosclerotic cardiovascular disease risk; the candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, systolic blood pressure regulation, diastolic blood pressure regulation, history of hypertension, and family history of cardiovascular disease.
8. The system of claim 7, wherein the key variable screening module performs LASSO regression analysis on the candidate factors in the training set, constructs a penalty function for variable selection and complexity control, selects the optimal model according to the value of λ, and identifies the key variables to be included in the model.
9. The system of claim 8, wherein the prediction model construction module performs Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), eliminates variables that do not reach statistical significance, and builds a multivariable Cox regression model:
$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \qquad (1)$$
where $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for a patient with covariate values $X$.
10. The system of claim 9, wherein the prediction model construction module obtains the predicted probability corresponding to the multivariable Cox regression model according to formula (2), a greater risk value indicating a greater risk of poor prognosis for the patient:
$$\hat{P}(t) = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \qquad (2)$$
where $\hat{P}(t)$ is the predicted probability that the individual experiences an adverse event by time $t$, $S_0(t)$ is the baseline survival of the modeling population at time $t$, $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor (the sum of the products of each variable and its partial regression coefficient), and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the mean value of each variable in the modeling population and its partial regression coefficient.
CN202210334081.3A 2022-03-31 2022-03-31 Method and system for predicting cardiovascular disease risk of diabetes mellitus pre-stage patient Pending CN114898873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210334081.3A CN114898873A (en) 2022-03-31 2022-03-31 Method and system for predicting cardiovascular disease risk of diabetes mellitus pre-stage patient

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210334081.3A CN114898873A (en) 2022-03-31 2022-03-31 Method and system for predicting cardiovascular disease risk of diabetes mellitus pre-stage patient

Publications (1)

Publication Number Publication Date
CN114898873A true CN114898873A (en) 2022-08-12

Family

ID=82714670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210334081.3A Pending CN114898873A (en) 2022-03-31 2022-03-31 Method and system for predicting cardiovascular disease risk of diabetes mellitus pre-stage patient

Country Status (1)

Country Link
CN (1) CN114898873A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115547495A (en) * 2022-09-02 2022-12-30 广东药科大学 System for comprehensively evaluating glycolipid metabolism level and application thereof
CN115547495B (en) * 2022-09-02 2023-09-12 广东药科大学 System for comprehensively evaluating glycolipid metabolism level and application thereof
CN116364268A (en) * 2022-11-01 2023-06-30 山东大学 Novel breast cancer prediction method based on punishment COX regression
CN116364268B (en) * 2022-11-01 2023-11-17 山东大学 Novel breast cancer prediction method based on punishment COX regression
CN116665911A (en) * 2023-06-15 2023-08-29 中国医学科学院阜外医院 Long-term prediction method and prediction model construction method for myocardial infarction of patient with 2-type diabetes
CN117153377A (en) * 2023-10-11 2023-12-01 中山大学附属第一医院 Model for predicting death risk of adult patient with moderately severe aortic valve stenosis
CN117672445A (en) * 2023-12-18 2024-03-08 郑州大学 Diabetes mellitus debilitation current situation analysis method and system based on big data
CN117524486A (en) * 2024-01-04 2024-02-06 北京市肿瘤防治研究所 TTE model establishment method for predicting non-progressive survival probability of postoperative patient
CN117524486B (en) * 2024-01-04 2024-04-05 北京市肿瘤防治研究所 TTE model establishment method for predicting non-progressive survival probability of postoperative patient

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination