CN114898873A

CN114898873A - Method and system for predicting cardiovascular disease risk in pre-diabetic patients

Info

Publication number: CN114898873A
Application number: CN202210334081.3A
Authority: CN
Inventors: 庄晓东; 林钇奋; 廖新学; 利妙红; 张绍钊; 黄蔓; 黄日华; 仲祥斌; 熊振宇; 刘梦辉
Original assignee: First Affiliated Hospital of Sun Yat Sen University
Current assignee: First Affiliated Hospital of Sun Yat Sen University
Priority date: 2022-03-31
Filing date: 2022-03-31
Publication date: 2022-08-12

Abstract

The invention provides a method and a system for predicting cardiovascular disease risk in patients with pre-diabetes. The method comprises: first, acquiring medical data of patients with pre-diabetes; then randomly sampling the data by a random number method to construct a training set, an internal validation set and an external validation set; then screening key variables on the training set using LASSO regression; then performing Cox proportional hazards regression analysis on the screened variables and constructing a prediction model based on the Cox proportional hazards regression fit; and finally verifying the accuracy of the constructed multi-factor Cox regression prediction model on the internal and external validation sets. The invention uses routine clinical test indicators to construct a convenient and practical 10-year atherosclerotic cardiovascular disease (ASCVD) risk prediction system, so as to stratify risk in the growing pre-diabetic population, guide individualized treatment plans and delay the onset of cardiovascular disease.

Description

Method and system for predicting cardiovascular disease risk in pre-diabetic patients

Technical Field

The invention relates to the technical field of medical condition detection, and in particular to a method and a system for predicting cardiovascular disease risk in patients with pre-diabetes.

Background

Pre-diabetes, i.e. impaired glucose regulation, refers to a state in which blood glucose levels lie between normal glucose metabolism and diabetes. There are three main diagnostic criteria, and the diagnosis can be made if any one of them is met: 1. fasting blood glucose of at least 5.6 mmol/L and at most 6.9 mmol/L; 2. 2-hour blood glucose of at least 7.8 mmol/L and at most 11.0 mmol/L; 3. glycated hemoglobin of 5.7% to 6.4%.
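The three criteria above can be expressed as a simple check. The following is a minimal sketch for illustration only; the function name and argument names are hypothetical, and the sketch tests only the three stated ranges (it does not exclude a patient whose other measurements fall in the diabetic range).

```python
def is_prediabetic(fasting_glucose=None, two_hour_glucose=None, hba1c=None):
    """Return True if any one of the three stated criteria is met.

    Glucose values are in mmol/L and HbA1c is in percent. A value of None
    means the measurement is unavailable, so that criterion is skipped.
    """
    criteria = [
        fasting_glucose is not None and 5.6 <= fasting_glucose <= 6.9,
        two_hour_glucose is not None and 7.8 <= two_hour_glucose <= 11.0,
        hba1c is not None and 5.7 <= hba1c <= 6.4,
    ]
    return any(criteria)
```

For example, a patient with a fasting blood glucose of 6.0 mmol/L meets criterion 1 regardless of the other two measurements.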

Current studies indicate that the risk of atherosclerotic cardiovascular disease is significantly higher in the pre-diabetic population than in the normal population, yet cardiovascular disease prevention measures for the pre-diabetic population remain inadequate. Strengthening the primary prevention of cardiovascular disease in pre-diabetic people is therefore the key to improving their prognosis.

Research shows that pre-diabetes is a highly heterogeneous metabolic state, and prognosis differs greatly between people in different metabolic states. More active cardiovascular disease screening and treatment targeted at the subgroup with poorer prognosis is an effective strategy for reducing cardiovascular complications in pre-diabetes: it allows high-risk patients to be treated promptly and effectively, avoids over-treatment and unnecessary medical expenditure for low-risk patients, and is of great public health significance. How to identify patients at high cardiovascular disease risk in advance is a critical problem that urgently needs to be solved.

At present, cardiovascular risk prediction for people with impaired glucose metabolism is mainly based on models built in populations with type 2 diabetes. Despite the growing pre-diabetic population, there is currently no atherosclerotic cardiovascular disease risk assessment system designed specifically for pre-diabetes. Existing risk assessment tools built on the general population or the diabetic population predict poorly in pre-diabetes. The cardiovascular disease risk of the pre-diabetic population differs from that of both healthy people and diabetic patients, and the required means and intensity of intervention differ as well.

A prior patent selects the factors that most strongly influence the differential diagnosis by analyzing the clinical and pathological characteristics of patients with type 2 diabetes, and presents them as a nomogram, which is concise, easy to understand and convenient for clinical use. From the initial test data of a patient with type 2 diabetes, that patent estimates the probability that renal biopsy will yield a pathological diagnosis of non-diabetic kidney disease (NDRD) or of diabetic nephropathy combined with non-diabetic kidney disease (MIX), thereby supporting the differential diagnosis of diabetic nephropathy and non-diabetic kidney disease in patients with type 2 diabetes. Its probability prediction method helps clinicians weigh the risk-benefit ratio of renal biopsy in patients with type 2 diabetes and renal injury, and also provides a reference for staff at medical institutions that do not perform renal biopsy. However, that patent does not use routine clinical test indicators to construct a convenient and practical ASCVD risk prediction tool, and so cannot stratify risk in the growing pre-diabetic population, guide individualized treatment plans or delay the onset of cardiovascular disease.

Disclosure of Invention

The invention provides a method for predicting cardiovascular disease risk in pre-diabetic patients. Based on routine clinical test indicators, it constructs a convenient, rapid and accurate atherosclerotic cardiovascular disease risk prediction model to stratify risk in the pre-diabetic population, so that individualized examination and treatment plans can be implemented and the public health goals of delaying the onset of cardiovascular disease and reducing its burden are ultimately achieved.

A further object of the present invention is to provide a system that applies the above method for predicting cardiovascular disease risk in pre-diabetic patients.

In order to achieve the technical effects, the technical scheme of the invention is as follows:

A method for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising the steps of:

S1: acquiring medical data of pre-diabetic patients;

S2: randomly sampling the data acquired in step S1 by a random number method, and constructing a training set, an internal validation set and an external validation set;

S3: screening key variables on the training set using LASSO regression;

S4: performing Cox proportional hazards regression analysis on the variables screened in step S3, and constructing a prediction model based on the Cox proportional hazards regression fit;

S5: verifying the accuracy of the constructed Cox proportional hazards prediction model using the internal validation set and the external validation set.
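The random sampling in step S2 can be sketched as follows. This is an illustrative sketch only: the function name, the 60/20/20 proportions and the fixed seed are assumptions, since the source does not state the split ratios or the random number procedure in detail.

```python
import random


def split_samples(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition records into training, internal and external sets.

    `fractions` and `seed` are hypothetical defaults chosen for illustration.
    """
    rng = random.Random(seed)        # seeded generator for reproducibility
    shuffled = list(records)
    rng.shuffle(shuffled)            # random ordering of all samples
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_internal = round(n * fractions[1])
    train = shuffled[:n_train]
    internal = shuffled[n_train:n_train + n_internal]
    external = shuffled[n_train + n_internal:]   # remainder
    return train, internal, external
```

Each record lands in exactly one of the three sets, so variable screening and model fitting on the training set never see the validation samples.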

Further, the specific process of step S1 is:

acquiring medical data of pre-diabetic patients, performing data screening and feature extraction, and obtaining candidate predictors associated with the risk of atherosclerotic cardiovascular disease; the candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, adjusted systolic blood pressure, adjusted diastolic blood pressure, history of hypertension and family history of cardiovascular disease.

Further, the specific process of step S3 is:

performing LASSO regression analysis on the candidate factors in the training set, constructing a penalty function to carry out variable screening and complexity adjustment, selecting the optimal model according to the λ value, and thereby identifying the key variables to be included in the model.
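How an L1 penalty screens variables can be illustrated with the soft-thresholding operator, which is the closed-form LASSO solution for linear regression with an orthonormal design. The sketch below shows that mechanism only; it is not the fitting procedure of the invention, and the function names and example coefficients are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink a coefficient toward zero by lam; small coefficients become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def selected_variables(unpenalized, lam):
    """Return the variables whose shrunken coefficient is non-zero.

    `unpenalized` maps variable names to unpenalized coefficients
    (hypothetical inputs); `lam` is the penalty strength lambda.
    """
    shrunk = {name: soft_threshold(coef, lam) for name, coef in unpenalized.items()}
    return {name: coef for name, coef in shrunk.items() if coef != 0.0}
```

With λ = 0.3, hypothetical coefficients of 0.8, 0.1 and -0.5 shrink to roughly 0.5, 0 and -0.2, so the second variable is dropped. A larger λ removes more variables, which is why the choice of λ determines the model's complexity.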

Further, the step S4 includes:

performing Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removing variables that do not reach statistical significance in the model, and constructing a multi-factor Cox regression model:

$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \tag{1}$$

where $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for covariates $X$;

obtaining the predicted probability corresponding to the multi-factor Cox regression model according to formula (2); a larger risk value indicates a greater risk of poor prognosis for the patient:

$$\hat{P} = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \tag{2}$$

where $\hat{P}$ is the predicted probability of an adverse event for the individual, $S_0(t)$ is the baseline risk value of the modeling population (the baseline survival at time $t$), $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, i.e. the sum of the products of each variable and its partial regression coefficient, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the mean of each variable in the modeling population and its partial regression coefficient.
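The quantities defined for formula (2) are a baseline value, a linear predictor and the linear predictor evaluated at the population means. A common form consistent with these definitions is 1 - S0(t) raised to exp(linear predictor minus mean linear predictor). The sketch below assumes that form; the function and argument names are hypothetical.

```python
import math


def predicted_probability(baseline_survival, betas, values, means):
    """Predicted event probability, assuming P = 1 - S0 ** exp(lp - lp_mean).

    baseline_survival: S0(t) of the modeling population at the time horizon
    betas:  partial regression coefficients
    values: the individual's covariate values
    means:  covariate means in the modeling population
    """
    lp = sum(b * x for b, x in zip(betas, values))        # linear predictor
    lp_mean = sum(b * m for b, m in zip(betas, means))    # value at the means
    return 1.0 - baseline_survival ** math.exp(lp - lp_mean)
```

Under this form, an individual whose covariates equal the population means has a predicted probability of exactly 1 - S0(t), and the probability rises as the linear predictor increases.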

A system for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising:

a data acquisition module for acquiring medical data of pre-diabetic patients;

a data preprocessing module for randomly sampling the data acquired by the data acquisition module by a random number method, and constructing a training set, an internal validation set and an external validation set;

a key variable screening module for screening key variables on the training set using LASSO regression;

a prediction model construction module for performing Cox proportional hazards regression analysis on the screened variables and constructing a prediction model based on the Cox proportional hazards regression fit;

and a verification module for verifying the accuracy of the constructed Cox proportional hazards prediction model using the internal validation set and the external validation set.

Further, the data acquisition module acquires medical data of pre-diabetic patients, performs data screening and feature extraction, and obtains candidate predictors associated with the risk of atherosclerotic cardiovascular disease; the candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, adjusted systolic blood pressure, adjusted diastolic blood pressure, history of hypertension and family history of cardiovascular disease.

Further, the key variable screening module performs LASSO regression analysis on the candidate factors in the training set, constructs a penalty function to carry out variable screening and complexity adjustment, selects the optimal model according to the λ value, and thereby identifies the key variables to be included in the model.

Further, the prediction model construction module performs Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removes variables that do not reach statistical significance in the model, and constructs a multi-factor Cox regression model:

$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \tag{1}$$

where $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for covariates $X$;

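the prediction model construction module obtains the predicted probability corresponding to the multi-factor Cox regression model according to formula (2); a larger risk value indicates a greater risk of poor prognosis for the patient:

$$\hat{P} = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \tag{2}$$

where $\hat{P}$ is the predicted probability of an adverse event for the individual, $S_0(t)$ is the baseline risk value of the modeling population (the baseline survival at time $t$), $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, i.e. the sum of the products of each variable and its partial regression coefficient, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the mean of each variable in the modeling population and its partial regression coefficient.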

Compared with the prior art, the technical scheme of the invention has the beneficial effects that:

The invention uses routine clinical test indicators to construct a convenient and practical 10-year ASCVD risk prediction system, so as to stratify risk in the growing pre-diabetic population, guide individualized treatment plans and delay the onset of cardiovascular disease.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 shows the results of variable selection by the LASSO regression model in the training set, where panel A is the coefficient profile plot and panel B is the vertical dashed-line plot;

FIG. 3 evaluates the calibration of the model's predicted disease risk in the training set (A), the internal validation set (B) and the external validation set (C, D);

FIG. 4 evaluates the clinical usefulness of the model's predicted disease risk in the training set (A), the internal validation set (B) and the external validation set (C, D);

FIG. 5 is a block diagram of the system of the present invention.

Detailed Description

The drawings are for illustrative purposes only and are not to be construed as limiting the patent;

for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;

it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.

Example 1

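As shown in FIG. 1, a method for predicting the risk of cardiovascular disease in a pre-diabetic patient comprises the following steps:

S1: acquiring medical data of pre-diabetic patients;

S2: randomly sampling the data acquired in step S1 by a random number method, and constructing a training set, an internal validation set and an external validation set;

S3: screening key variables on the training set using LASSO regression;

S4: performing Cox proportional hazards regression analysis on the variables screened in step S3, and constructing a prediction model based on the Cox proportional hazards regression fit;

S5: verifying the accuracy of the constructed Cox proportional hazards prediction model using the internal validation set and the external validation set.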

The specific process of step S1 is:

Medical data of pre-diabetic patients are acquired, screened and subjected to feature extraction to obtain candidate predictors associated with the risk of atherosclerotic cardiovascular disease. Candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, adjusted systolic blood pressure, adjusted diastolic blood pressure, history of hypertension and family history of cardiovascular disease. Laboratory test indicators include hemoglobin, white blood cell count, hematocrit, glycated hemoglobin, low-density lipoprotein cholesterol adjusted for lipid-lowering medication, high-density lipoprotein cholesterol, total cholesterol, triglycerides, C-reactive protein, creatinine, glomerular filtration rate (calculated with the CKD-EPI formula) and cystatin C. For each continuous variable, a sample distribution histogram and a probability density curve are drawn, and variables that do not follow a normal distribution are natural-log transformed. Atherosclerotic cardiovascular disease comprises coronary heart disease, stroke and peripheral artery disease.

The specific process of step S3 is:

LASSO regression analysis is performed on the candidate factors in the training set, a penalty function is constructed to carry out variable screening and complexity adjustment, the optimal model is selected according to the λ value, and the key variables to be included in the model are thereby identified (shown in FIG. 2).

Step S4 includes:

$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \tag{1}$$

The risk value of a pre-diabetic patient developing atherosclerotic cardiovascular disease within 10 years is:

$$\text{risk} = 1 - 0.9265^{\exp(L)}$$

$$\begin{aligned}
L ={} & 0.5556 \times \text{sex} + 0.1475 \times \text{former smoker} + 0.6147 \times \text{current smoker} \\
& + 0.1999 \times \text{family history} + 0.2108 \times \text{hypertension} + 3.1032 \times \text{age} \\
& + 0.0064 \times \text{adjusted systolic blood pressure} + 0.1660 \times \text{adjusted LDL cholesterol} \\
& - 0.5806 \times \text{HDL cholesterol} + 2.1250 \times \text{glycated hemoglobin} \\
& + 0.0275 \times \text{white blood cell count} + 0.1233 \times \text{C-reactive protein} + 0.6121 \times \text{cystatin C}
\end{aligned}$$

where sex is 0 for female and 1 for male; former smoker is 1 if the patient smoked in the past and has quit, and 0 otherwise; current smoker is 1 if the patient currently smokes, and 0 otherwise; family history is 1 if there is a family history of cardiovascular disease, and 0 otherwise; hypertension is 1 if the patient has hypertension, and 0 otherwise;

age is in years and is natural-log transformed, with an original range of 40 to 70;

adjusted systolic blood pressure (systolic blood pressure adjusted for antihypertensive medication) is in mmHg, with a range of 86.5 to 255.0;

adjusted LDL cholesterol (low-density lipoprotein cholesterol adjusted for lipid-lowering medication) is in mmol/L, with a range of 0.84 to 11.35;

high-density lipoprotein cholesterol is in mmol/L and is natural-log transformed, with an original range of 0.487 to 4.087;

glycated hemoglobin is in % and is natural-log transformed, with an original range of 5.70 to 6.49;

white blood cell count is in 10^9/L, with a range of 0.21 to 101.90;

C-reactive protein is in mg/L and is natural-log transformed, with an original range of 0.08 to 78.05;

cystatin C is in mg/L, with a range of 0.442 to 4.140;
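The risk equation and variable coding of this example can be sketched in code as follows. The coefficients and the baseline value 0.9265 are taken from the text; the dictionary keys, function names and the `centering` argument are hypothetical. The text does not state whether a mean-centering term (as in formula (2)) is folded into the printed equation, so `centering` defaults to 0 to reproduce the equation exactly as printed and can be set by the caller.

```python
import math

# Coefficients as printed in the example.
COEFFICIENTS = {
    "sex": 0.5556, "former_smoker": 0.1475, "current_smoker": 0.6147,
    "family_history": 0.1999, "hypertension": 0.2108, "ln_age": 3.1032,
    "adjusted_sbp": 0.0064, "adjusted_ldl_c": 0.1660, "ln_hdl_c": -0.5806,
    "ln_hba1c": 2.1250, "wbc": 0.0275, "ln_crp": 0.1233, "cystatin_c": 0.6121,
}
BASELINE_10Y = 0.9265  # base of the exponent in the printed equation


def linear_predictor(p):
    """Sum of coefficient times value, applying ln() where the text says so."""
    x = {
        "sex": p["male"],                      # 0 = female, 1 = male
        "former_smoker": p["former_smoker"],
        "current_smoker": p["current_smoker"],
        "family_history": p["family_history"],
        "hypertension": p["hypertension"],
        "ln_age": math.log(p["age"]),          # years
        "adjusted_sbp": p["adjusted_sbp"],     # mmHg
        "adjusted_ldl_c": p["adjusted_ldl_c"], # mmol/L
        "ln_hdl_c": math.log(p["hdl_c"]),      # mmol/L
        "ln_hba1c": math.log(p["hba1c"]),      # percent
        "wbc": p["wbc"],                       # 10^9/L
        "ln_crp": math.log(p["crp"]),          # mg/L
        "cystatin_c": p["cystatin_c"],         # mg/L
    }
    return sum(COEFFICIENTS[k] * x[k] for k in COEFFICIENTS)


def ten_year_risk(patient, centering=0.0):
    """1 - 0.9265 ** exp(L - centering); `centering` is an assumed parameter."""
    return 1.0 - BASELINE_10Y ** math.exp(linear_predictor(patient) - centering)
```

The age term alone contributes about 3.1032 × ln 55 ≈ 12.4 for a 55-year-old, so the uncentered linear predictor is large for typical inputs. A caller who has the modeling population's mean linear predictor could pass it as `centering`; that value is not given in the text.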

The distribution of the predicted 10-year probability of atherosclerotic cardiovascular disease in the training set, internal validation set and external validation set populations is described using the predicted probability as the analysis variable.

Using the predicted probability as the analysis variable, ROC curves for the Cox regression model are constructed in the training set, the internal validation set and the external validation set respectively. The area under the curve (AUC) ranges from 0.5 to 1; the closer the AUC is to 1.0, the better the discrimination of the prediction, and an AUC of 0.5 indicates the lowest discrimination and little practical value.

In this embodiment, when the probability calculated by the model is used to predict the occurrence of atherosclerotic cardiovascular disease within 10 years, the AUC values in the training set, the internal validation set and the external validation set range from 0.688 to 0.712, indicating good discrimination.
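The AUC can be read as the probability that a randomly chosen patient who had an event received a higher predicted risk than a randomly chosen patient who did not. The sketch below computes it by pairwise comparison. It treats the 10-year outcome as a simple binary label and ignores censoring, which is a simplifying assumption; the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of predicted risks.

    scores: predicted probabilities; labels: 1 = event, 0 = no event.
    Ties between an event and a non-event count as half a concordant pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))
```

For scores 0.1, 0.4, 0.35, 0.8 with labels 0, 0, 1, 1, three of the four event/non-event pairs are ordered correctly, giving an AUC of 0.75.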

Using the predicted probability as the analysis variable, calibration curves for the Cox regression model are constructed in the training set, the internal validation set and the external validation set respectively. The abscissa is the predicted probability and the ordinate is the actually observed event risk. The line y = x is the reference line, representing the ideal case in which the predicted value equals the observed risk. The plotted points are the risks predicted by the constructed model and the corresponding actually observed risks: if the predicted value is greater than the actual value, the point lies below the reference line; if the predicted value is smaller than the actual value, the point lies above the reference line.

In this example, when the probability calculated by the model is used to predict the occurrence of atherosclerotic cardiovascular disease within 10 years, the calibration curves in the training set, the internal validation set and the external validation set fall along both sides of the reference line, suggesting that the model is well calibrated.
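The points of a calibration curve can be obtained by grouping patients by predicted risk and comparing the mean prediction with the observed event rate in each group. The sketch below assumes binary 10-year outcomes without censoring and equal-sized groups; both are simplifications, and the function name and the default of ten groups are hypothetical.

```python
def calibration_points(predicted, observed, n_groups=10):
    """Return (mean predicted risk, observed event rate) per risk group.

    Patients are sorted by predicted risk and cut into n_groups groups of
    (nearly) equal size; observed is 1 for an event and 0 otherwise.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    points = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer patients than groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        event_rate = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_pred, event_rate))
    return points
```

A point whose first coordinate exceeds its second lies below the reference line y = x, meaning the model overestimates risk in that group.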

Using the predicted probability as the analysis variable, decision curves for the Cox regression model are constructed in the training set, the internal validation set and the external validation set respectively. The abscissa is the threshold probability: when the probability predicted for a patient by the model exceeds the threshold probability, treatment is given. The corresponding ordinate is the net benefit, i.e. the patient's benefit minus the loss at that threshold. The horizontal line and the oblique line are the two extreme cases: the horizontal line represents all samples being treated as negative, with a net benefit of 0, and the oblique line represents all samples being treated as positive, with every sample receiving treatment.

In this embodiment, when the probability calculated by the model is used to predict the occurrence of atherosclerotic cardiovascular disease within 10 years and to guide treatment, patients obtain a net benefit within a certain range of threshold probabilities.
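Net benefit in decision curve analysis is commonly computed as the true-positive fraction minus the false-positive fraction weighted by the odds of the threshold probability. The sketch below follows that standard definition and, as a simplifying assumption, uses binary outcomes without censoring; the function names are hypothetical.

```python
def net_benefit(predicted, observed, threshold):
    """Net benefit of treating patients whose predicted risk exceeds threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(predicted)
    tp = sum(1 for p, y in zip(predicted, observed) if p > threshold and y == 1)
    fp = sum(1 for p, y in zip(predicted, observed) if p > threshold and y == 0)
    weight = threshold / (1.0 - threshold)   # odds of the threshold probability
    return tp / n - fp / n * weight


def net_benefit_treat_all(observed, threshold):
    """Net benefit of the 'treat everyone' strategy (the oblique line)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(observed)
    events = sum(observed)
    weight = threshold / (1.0 - threshold)
    return events / n - (n - events) / n * weight
```

The "treat no one" strategy has a net benefit of 0 at every threshold, which is the horizontal line. The model is useful at thresholds where its net benefit lies above both reference strategies.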

An online prediction system for the 10-year atherosclerotic cardiovascular disease risk of pre-diabetic patients is constructed (FIG. 4).

The complex regression equation is converted into a practical interactive web interface, which makes the results of the prediction model easier to read, is convenient for assessing individuals, and can be widely used in medical research and clinical practice.

Example 2

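As shown in FIG. 5, a system for predicting the risk of cardiovascular disease in a pre-diabetic patient comprises:

a data acquisition module for acquiring medical data of pre-diabetic patients;

a data preprocessing module for randomly sampling the data acquired by the data acquisition module by a random number method, and constructing a training set, an internal validation set and an external validation set;

a key variable screening module for screening key variables on the training set using LASSO regression;

a prediction model construction module for performing Cox proportional hazards regression analysis on the screened variables and constructing a prediction model based on the Cox proportional hazards regression fit;

and a verification module for verifying the accuracy of the constructed Cox proportional hazards prediction model using the internal validation set and the external validation set.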

The variables collected by the data acquisition module comprise age, sex, smoking status, systolic blood pressure, history of hypertension, family history of cardiovascular disease, white blood cell count, glycated hemoglobin, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, C-reactive protein, creatinine, cystatin C, whether an antihypertensive drug is taken, and whether a lipid-lowering drug is taken. Whether antihypertensive and lipid-lowering drugs are taken is used to calculate the medication-adjusted systolic blood pressure and the medication-adjusted low-density lipoprotein cholesterol, respectively.

The key variable screening module performs LASSO regression analysis on the candidate factors in the training set, constructs a penalty function to carry out variable screening and complexity adjustment, selects the optimal model according to the λ value, and thereby identifies the key variables to be included in the model.

The prediction model construction module performs Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removes variables that do not reach statistical significance in the model, and constructs a multi-factor Cox regression model:

$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \tag{1}$$

where $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for covariates $X$.

The prediction model construction module uses the constructed model to calculate the individual risk prediction value and outputs the predicted 10-year risk of atherosclerotic cardiovascular disease for the pre-diabetic patient. It applies the 10-year cardiovascular risk stratification recommended by the 2019 ACC/AHA Guideline on the Primary Prevention of Cardiovascular Disease (low risk: < 5%; borderline risk: ≥ 5% and < 7.5%; intermediate risk: ≥ 7.5% and < 20%; high risk: ≥ 20%) and reports the current risk level based on the calculated risk value.
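The four-tier stratification can be written as a small lookup. This is a minimal sketch using the thresholds stated above (5%, 7.5% and 20%); the function name and the category labels are illustrative choices.

```python
def risk_category(risk):
    """Map a 10-year risk (as a fraction between 0 and 1) to a risk tier."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    if risk < 0.05:
        return "low"
    if risk < 0.075:
        return "borderline"
    if risk < 0.20:
        return "intermediate"
    return "high"
```

For example, a calculated 10-year risk of 7.35% falls in the borderline tier, while a risk of exactly 7.5% is already intermediate because each lower bound is inclusive.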

Example 3

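S1: acquiring medical data of pre-diabetic patients;

S2: randomly sampling the data acquired in step S1 by a random number method, and constructing a training set, an internal validation set and an external validation set;

S3: screening key variables on the training set using LASSO regression;

S4: performing Cox proportional hazards regression analysis on the variables screened in step S3, and constructing a prediction model based on the Cox proportional hazards regression fit;

S5: verifying the accuracy of the constructed Cox proportional hazards prediction model using the internal validation set and the external validation set.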

The specific process of step S1 is:

The specific process of step S3 is:

Step S4 includes:

performing Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removing variables that do not reach statistical significance in the model, and constructing a multi-factor Cox regression model:

$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \tag{1}$$

where $\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m$ is the linear predictor, i.e. the sum of the products of each variable and its partial regression coefficient.

The same or similar reference numerals correspond to the same or similar parts;

the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;

It should be understood that the above embodiments of the present invention are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims

1. A method for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising the steps of:

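S1: acquiring medical data of pre-diabetic patients;

S2: randomly sampling the data acquired in step S1 by a random number method, and constructing a training set, an internal validation set and an external validation set;

S3: screening key variables on the training set using LASSO regression;

S4: performing Cox proportional hazards regression analysis on the variables screened in step S3, and constructing a prediction model based on the Cox proportional hazards regression fit;

S5: verifying the accuracy of the constructed Cox proportional hazards prediction model using the internal validation set and the external validation set.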

2. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 1, wherein the specific process of step S1 is:

3. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 2, wherein the specific process of step S3 is:

4. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 3, wherein the step S4 comprises:

$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \tag{1}$$

wherein $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for covariates $X$.

5. The method for predicting the risk of cardiovascular disease in a pre-diabetic patient according to claim 4, wherein the step S4 further comprises:

wherein,

6. A system for predicting the risk of cardiovascular disease in a pre-diabetic patient, comprising:

7. The system of claim 6, wherein the data acquisition module acquires medical data of pre-diabetic patients, performs data screening and feature extraction, and obtains candidate predictors associated with the risk of atherosclerotic cardiovascular disease; the candidate predictors include the patient's age, sex, race, smoking status, drinking status, body mass index, waist-to-hip ratio, adjusted systolic blood pressure, adjusted diastolic blood pressure, history of hypertension and family history of cardiovascular disease.

8. The system of claim 7, wherein the key variable screening module performs LASSO regression analysis on the candidate factors in the training set, constructs a penalty function to carry out variable screening and complexity adjustment, selects the optimal model according to the λ value, and thereby identifies the key variables to be included in the model.

9. The system of claim 8, wherein the prediction model construction module performs Cox proportional hazards regression analysis on all screened variables in the training set using formula (1), removes variables that do not reach statistical significance in the model, and constructs a multi-factor Cox regression model:

$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \tag{1}$$

wherein $X_1, X_2, \ldots, X_m$ are the variables included in the model, $\beta_1, \beta_2, \ldots, \beta_m$ are the partial regression coefficients of the respective variables, $h_0(t)$ is the baseline hazard, and $h(t, X)$ is the hazard at time $t$ for covariates $X$.

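10. The system of claim 9, wherein the prediction model construction module obtains the predicted probability corresponding to the multi-factor Cox regression model according to formula (2); a larger risk value indicates a greater risk of poor prognosis for the patient:

$$\hat{P} = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{m}\beta_i X_i - \sum_{i=1}^{m}\beta_i \bar{X}_i\right)} \tag{2}$$

wherein $\hat{P}$ is the predicted probability of an adverse event for the individual, $S_0(t)$ is the baseline risk value of the modeling population (the baseline survival at time $t$), $\sum_{i=1}^{m}\beta_i X_i$ is the linear predictor, i.e. the sum of the products of each variable and its partial regression coefficient, and $\sum_{i=1}^{m}\beta_i \bar{X}_i$ is the sum of the products of the mean of each variable in the modeling population and its partial regression coefficient.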