CN113421648B - Model for predicting ejection fraction retention type heart failure risk - Google Patents


Info

Publication number
CN113421648B
Authority
CN
China
Legal status
Active
Application number
CN202110686960.8A
Other languages
Chinese (zh)
Other versions
CN113421648A (en)
Inventor
方向东
赵学彤
渠鸿竹
董蔚
Current Assignee
Beijing Institute of Genomics of CAS
Original Assignee
Beijing Institute of Genomics of CAS
Application filed by Beijing Institute of Genomics of CAS
Priority to CN202110686960.8A
Publication of CN113421648A
Application granted
Publication of CN113421648B
Legal status: Active


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a model for predicting the risk of heart failure with preserved ejection fraction, wherein the information collected by the model comprises the following marker combination: age of the patient; whether diuretics are taken; BMI; urine albumin; serum creatinine; and beta values of 25 methylation sites. All patients are clinically evaluated to identify risk using an end-to-end machine learning model that combines interactions among multi-omics data, and the occurrence of heart failure with preserved ejection fraction can be delayed or prevented by controlling risk factors, treating asymptomatic left ventricular contractile dysfunction, and the like. The model provided by the invention has good prospects for clinical application.

Description

Model for predicting the risk of heart failure with preserved ejection fraction
Technical Field
The invention belongs to the field of disease diagnosis models, and particularly relates to a model for predicting the risk of heart failure with preserved ejection fraction.
Background
In recent years, the morbidity and mortality of heart failure have increased year by year. Heart failure is a change in the structure or function of the heart caused by complex interactions among genetic, neurohormonal, metabolic, inflammatory and other biochemical factors. Chronic heart failure (CHF) is characterized by disturbed myocardial energy metabolism and metabolic remodeling, and has high morbidity and mortality. Accurate, personalized risk assessment is therefore needed to support further clinical decision-making and management.
Chronic heart failure currently has three accepted subtypes, classified by left ventricular ejection fraction (LVEF): heart failure with reduced ejection fraction (HFrEF), heart failure with mid-range ejection fraction (HFmrEF), and heart failure with preserved ejection fraction (HFpEF). The three subtypes differ greatly in etiology and pathophysiology. Notably, early prediction of HFpEF remains challenging. An early prediction model for HFpEF is therefore important for heart failure risk management and clinical decision-making: it allows timely diet and lifestyle interventions for patients at high HFpEF risk, in line with the principle of precision prevention.
Heart failure is a multifactorial disease that results from the combined action of genetic and environmental factors. Epigenetic mechanisms such as DNA methylation are involved in regulating myocardial fibrosis and can lead to disordered myocardial energy metabolism and to abnormal metabolism, transport and activation of amino acids, thereby promoting cardiovascular disease and affecting individual disease risk; this is one of the pathophysiological causes of HFpEF. The occurrence of HFpEF is also closely related to clinical factors; in particular, the risk of HFpEF rises sharply with age. DNA methylation and clinical characteristics describe the disease state along different dimensions and interact with each other.
In the prior art, Sadiya S. Khan et al. developed a 10-year heart failure risk model including 10 clinical features, but did not address the pathogenesis or the different subtypes of CHF. The method cannot learn feature interactions and considers only clinical features, without epigenetic factors (Khan SS, Ning H, Shah SJ et al. 10-Year Risk Equations for Incident Heart Failure in the General Population. J. Am. Coll. Cardiol. 2019;73:2388-2397).
William B. Kannel et al. developed a 4-year heart failure risk assessment model in which the risk of heart failure is estimated with a logistic function of 9 clinical features. This method has the same limitations as the model of Khan et al.: it cannot learn feature interactions and considers only clinical features, without epigenetic factors (Kannel WB, D'Agostino RB, Silbershatz H et al. Profile for estimating risk of heart failure. Arch. Intern. Med. 1999;159:1197-1204).
Edward Choi et al. built a model for early detection of CHF that models the temporal relationships among health data in patients' electronic health records (EHR) to predict a future CHF diagnosis. The method considers only clinical features and ignores the influence of epigenetic factors on future heart failure events (Choi E, Schuetz A, Stewart WF et al. Using recurrent neural network models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 2016;24:361-370).
No HFpEF risk prediction model integrating clinical and epigenetic features has yet been developed. Developing an accurate and comprehensive HFpEF risk prediction method is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
To solve the above problems, the present invention provides a model for predicting the risk of heart failure with preserved ejection fraction. The model clinically evaluates all patients to identify HFpEF risk by combining an end-to-end machine learning model with interactions among multi-omics data, so that the occurrence of HFpEF can be delayed or prevented by controlling HFpEF risk factors, treating asymptomatic left ventricular contractile dysfunction, and the like.
97 items of clinical data from 797 sampled participants without cardiovascular disease were collected, together with data on 402,380 DNA methylation sites from the DNA methylation 450K chip. After 8 years of follow-up, 738 participants showed no heart failure and 59 were diagnosed with HFpEF. The present invention uses these data as the training set to obtain a marker combination for modeling the risk of heart failure with preserved ejection fraction.
Terminology:
BMI: body mass index, BMI = body weight (kg) / height (m)², in kg/m².
EHR: electronic health record.
AUC: area under the curve.
FM: a factorization machine.
DNN: deep neural networks.
In one aspect, the invention provides a marker combination for predicting the risk of heart failure with preserved ejection fraction.
The marker combination comprises:
Age of patient;
Whether diuretics are taken;
BMI;
Urine albumin;
Serum creatinine;
Beta values of methylation sites.
The methylation sites comprise: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
The diuretics include, but are not limited to: furosemide, metolazone, chlorthalidone, torsemide, and Lasix (furosemide).
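As an illustration only, the marker combination can be packed into a fixed-order input vector of 30 values. The minimal Python sketch below assumes a feature order (the five clinical markers first, then the 25 CpG sites in the order listed), encodes diuretic use as 1/0, and computes BMI from weight and height as defined in the terminology. `marker_vector` and its parameter names are hypothetical, and the units of the laboratory values are left to the caller.

```python
from typing import List, Mapping

# The 25 CpG sites of the marker combination, in the order listed above.
CPG_SITES = [
    "cg06344265", "cg27401945", "cg25755428", "cg24205914", "cg23299445",
    "cg21429551", "cg21024264", "cg20051875", "cg17766026", "cg16781992",
    "cg13352914", "cg11853697", "cg10556349", "cg10083824", "cg08614290",
    "cg08101977", "cg07041999", "cg05845376", "cg05481257", "cg05363438",
    "cg03556243", "cg03233656", "cg00522231", "cg00495303", "cg00045910",
]


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    return weight_kg / height_m ** 2


def marker_vector(age: float, takes_diuretic: bool, weight_kg: float,
                  height_m: float, urine_albumin: float,
                  serum_creatinine: float,
                  beta_values: Mapping[str, float]) -> List[float]:
    """Assemble 5 clinical markers + 25 methylation beta values (30 values).

    Feature order and the 1/0 diuretic encoding are illustrative assumptions.
    """
    missing = [site for site in CPG_SITES if site not in beta_values]
    if missing:
        raise KeyError(f"missing beta values for: {missing}")
    for site in CPG_SITES:
        # Methylation beta values are proportions, so they lie in [0, 1].
        if not 0.0 <= beta_values[site] <= 1.0:
            raise ValueError(f"beta value outside [0, 1] for {site}")
    clinical = [float(age), 1.0 if takes_diuretic else 0.0,
                bmi(weight_kg, height_m), float(urine_albumin),
                float(serum_creatinine)]
    return clinical + [float(beta_values[s]) for s in CPG_SITES]
```

For example, `marker_vector(65, True, 70.0, 1.75, 30.0, 90.0, betas)` with a complete `betas` mapping returns a list of length 30 whose third entry is 70 / 1.75² ≈ 22.86.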
In another aspect, the invention provides a model for predicting the risk of heart failure with preserved ejection fraction.
The information collected by the model comprises the marker combination obtained by screening the following features:
Age of patient; whether diuretics are taken; BMI; urine albumin; serum creatinine; beta values of the methylation sites cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
The model is built by means of a data mining algorithm.
The model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.
In the method for establishing the model, the output of the DeepFM algorithm is fed into a sigmoid function for early prediction of an HFpEF event:
$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$
where ŷ is the predicted HFpEF event, y_FM is the output of the FM component, and y_DNN is the output of the DNN component.
The output of the FM component is the sum of an addition unit and a number of inner product units:
$$y_{FM} = \langle w, x \rangle + \sum_{j_1=1}^{d} \sum_{j_2=j_1+1}^{d} \langle V_{j_1}, V_{j_2} \rangle \, x_{j_1} \cdot x_{j_2}$$
where w ∈ R^d and V_i ∈ R^k (k is given). The addition unit ⟨w, x⟩ reflects the importance of order-1 features, and the inner product units represent the effect of order-2 feature interactions.
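The FM formula can be evaluated directly as written, or with the standard factorization-machine identity that reduces the pairwise term from O(kd²) to O(kd). The plain-Python sketch below implements both so they can be checked against each other; the function names are hypothetical, and the sketch shows the FM arithmetic only, not how the patent's DeepFM implementation computes it.

```python
def fm_output_naive(w, V, x):
    """y_FM = <w, x> + sum over j1 < j2 of <V_j1, V_j2> * x_j1 * x_j2.

    Direct double loop over feature pairs: O(k * d^2).
    """
    d = len(x)
    linear = sum(w[j] * x[j] for j in range(d))  # order-1 addition unit
    pairwise = 0.0
    for j1 in range(d):
        for j2 in range(j1 + 1, d):
            dot = sum(a * b for a, b in zip(V[j1], V[j2]))  # <V_j1, V_j2>
            pairwise += dot * x[j1] * x[j2]
    return linear + pairwise


def fm_output(w, V, x):
    """Same quantity in O(k * d) via the identity
    sum_{j1<j2} <V_j1, V_j2> x_j1 x_j2
        = 0.5 * sum_f [ (sum_j V_jf x_j)^2 - sum_j (V_jf x_j)^2 ].
    """
    d, k = len(x), len(V[0])
    linear = sum(w[j] * x[j] for j in range(d))
    pairwise = 0.0
    for f in range(k):  # one latent dimension at a time
        s = sum(V[j][f] * x[j] for j in range(d))
        s_sq = sum((V[j][f] * x[j]) ** 2 for j in range(d))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise
```

For instance, with w = [1, 2], V = [[1], [1]] and x = [1, 1], both functions give 3 (order-1 part) + 1 (the single interaction) = 4.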
The output of the DNN component is as follows:
$$y_{DNN} = W^{(|H|+1)} \cdot a^{(|H|)} + b^{(|H|+1)}$$
where |H| is the number of hidden layers, a^(l) is the output of layer l (a^(0) being the output of the embedding layer), and W^(l) and b^(l) are the model weights and bias of layer l.
Specifically, for the deep hidden layers, a deep neural network containing two hidden layers (256 ) is implemented, using ReLU as the activation function:
$$y = f(x) = \mathrm{ReLU}(wx + b)$$
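A plain-Python sketch of the deep component and the final sigmoid, under stated assumptions: each of the 30 screened features is mapped to an 8-dimensional embedding and the embeddings are concatenated into a 240-value input a^(0); both hidden layers are given 256 units because the second width is not legible in the text; and the weights are random placeholders rather than trained values. `build_dnn`, `dnn_output` and `deepfm_probability` are hypothetical names.

```python
import math
import random


def relu(v):
    """Element-wise ReLU: max(0, t)."""
    return [max(0.0, t) for t in v]


def affine(W, a, b):
    """W . a + b, with W stored as a list of rows."""
    return [sum(wi * ai for wi, ai in zip(row, a)) + bi for row, bi in zip(W, b)]


def init_layer(n_in, n_out, rng, scale=0.05):
    """Small random weights and zero bias (placeholder initialisation)."""
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def build_dnn(n_fields=30, emb_dim=8, hidden=(256, 256), seed=0):
    """Layer stack: input 30*8 -> hidden widths (assumed) -> 1 output."""
    rng = random.Random(seed)
    sizes = [n_fields * emb_dim, *hidden, 1]
    return [init_layer(sizes[i], sizes[i + 1], rng) for i in range(len(sizes) - 1)]


def dnn_output(layers, a0):
    """Hidden layers: a^(l+1) = ReLU(W^(l) a^(l) + b^(l)).
    Output layer (no activation): y_DNN = W^(|H|+1) a^(|H|) + b^(|H|+1)."""
    a = a0
    for W, b in layers[:-1]:
        a = relu(affine(W, a, b))
    W_out, b_out = layers[-1]
    return affine(W_out, a, b_out)[0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def deepfm_probability(y_fm, y_dnn):
    """y_hat = sigmoid(y_FM + y_DNN): predicted probability of an HFpEF event."""
    return sigmoid(y_fm + y_dnn)
```

Adding y_FM and y_DNN before a single sigmoid is what lets the FM and DNN parts be trained jointly against one log-loss objective.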
Logloss is used as the objective function. To control overfitting, an L2 regularization penalty with parameter 0.0001 is added on the nodes.
To optimize the deep neural network, batch normalization and weight decay are used; the embedding size is set to 8, the batch size to 300, and the decay to 0.9.
Preferably, the performance of the DeepFM model is evaluated by three-fold cross-validation, using Adam as the optimization algorithm with a learning rate of 0.0001, 400 epochs, and a dropout of 60%.
Preferably, the discrimination of the DeepFM model is assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy.
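Three-fold cross-validation on a training set with 59 HFpEF cases among 797 participants leaves only about 20 cases per fold. The sketch below assumes stratified folds (the text does not say whether folds were stratified) so that each fold keeps a similar case ratio; `stratified_kfold` is a hypothetical helper written with the standard library only.

```python
import random


def stratified_kfold(labels, k=3, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices of each class are shuffled and dealt round-robin into the folds,
    so every fold keeps roughly the overall case/non-case ratio.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Each fold is held out once; with 59 cases the held-out folds contain 20, 20 and 19 cases, and a metric such as the AUC can then be averaged over the three folds.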
In a further aspect, the present invention provides the use of the foregoing marker combination and/or the foregoing model in the preparation of a medicament and/or kit for the treatment of heart failure with preserved ejection fraction.
In yet another aspect, the invention provides the use of the foregoing marker combination and/or the foregoing model in constructing early prediction models for other chronic complex diseases.
The invention has the beneficial effects that:
Taking multi-omics features into consideration and starting from 5 clinical diagnostic features and 25 epigenetic (DNA methylation) features, the DeepFM algorithm is used to establish an early risk assessment method for HFpEF. The method extracts complex features from the raw features and models the interactions among them, and its results are superior to existing models and to other baseline machine learning models.
The DeepFM model employed in the present application performs best compared with the currently widely used baseline models SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost and LogitBoost.
Drawings
FIG. 1 shows the AUC of prediction performance in the test set based on different feature sets, where: the HFrisk model denotes a model containing both EHR features and DNA methylation features, the 25CpG model denotes a model with DNA methylation features only, and the 5EHR model denotes a model with EHR features only.
FIG. 2 shows the test-set AUC results of the baseline models using the 30 features of this method.
FIG. 3 shows the AUC results of this method and of the William B. Kannel model in male and female participants.
FIG. 4 shows the evaluation of the method with a calibration plot of observed versus predicted risk and the Hosmer-Lemeshow goodness-of-fit test; the Hosmer-Lemeshow test is not statistically significant (p = 0.678), indicating good calibration.
Detailed Description
The present invention will be described in further detail with reference to the following examples, which are merely illustrative and are not intended to limit the present invention. Unless otherwise specified, the experimental methods in the following examples are carried out under conventional conditions, and the materials, reagents, etc. used are commercially available.
Example 1 Method for constructing a model for predicting the risk of heart failure with preserved ejection fraction
97 items of clinical data from 797 sampled participants without cardiovascular disease were collected, together with data on 402,380 DNA methylation sites from the DNA methylation 450K chip. After 8 years of follow-up, 738 participants showed no heart failure and 59 were diagnosed with HFpEF; these data were used as the training set.
The DNA methylation features and the clinical diagnosis and treatment features were screened stepwise using a three-step feature selection method.
(1) For clinical diagnosis and treatment features, incomplete and non-significant clinical features in the training set were removed using the following thresholds: more than 20% missing samples, or a p-value > 0.05 in the chi-square test / Mann-Whitney U test between the two groups of samples. If the Pearson correlation between two clinical features was greater than 0.8, the feature with the smaller Spearman correlation with HFpEF was discarded. Finally, 26 clinical features were retained, as follows:
Diuretics, beta-blockers, angiotensin II antagonists, ACEI, aspirin, cardiovascular disorders, coronary heart disease, atrial fibrillation, sex, age, BMI, serum creatinine, diastolic blood pressure, fasting blood glucose, high-density lipoprotein cholesterol, systolic blood pressure, total cholesterol, hypertension, lipid disorders, atrial enlargement, rheumatism, aortic valve disorders, rheumatoid arthritis, urinary albumin, whole blood hemoglobin A1C, C-reactive protein.
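Step (1) can be sketched as two filters. In this minimal Python illustration the p-values of the between-group tests are passed in precomputed (for example from `scipy.stats.mannwhitneyu` or `scipy.stats.chi2_contingency`), missing values are represented as `None`, correlations are computed on complete cases, and correlated pairs are resolved greedily by keeping the feature with the higher absolute Spearman correlation with HFpEF. These choices and all function names are assumptions, not the patent's code.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def ranks(v):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def complete_pairs(a, b):
    """Drop positions where either value is missing (None)."""
    kept = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return [x for x, _ in kept], [y for _, y in kept]


def filter_clinical(features, outcome, pvalues,
                    max_missing=0.20, alpha=0.05, max_corr=0.8):
    """features: name -> values (None = missing); pvalues: name -> p-value of
    the between-group test. Returns the names of the retained features."""
    n = len(outcome)
    # Drop incomplete (>20% missing) or non-significant (p > 0.05) features.
    cand = {name: col for name, col in features.items()
            if sum(v is None for v in col) / n <= max_missing
            and pvalues[name] <= alpha}
    # Relevance = |Spearman correlation with the outcome|.
    relevance = {name: abs(spearman(*complete_pairs(col, outcome)))
                 for name, col in cand.items()}
    kept = []
    # Most relevant first: drop a feature if |Pearson| > 0.8 with a kept one.
    for name in sorted(cand, key=lambda f: relevance[f], reverse=True):
        if all(abs(pearson(*complete_pairs(cand[name], cand[other]))) <= max_corr
               for other in kept):
            kept.append(name)
    return kept
```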
(2) For methylation features, differentially methylated probes (DMPs) were identified in the training set using log fold change > 0.05 and adjusted p-value < 0.05 as thresholds; 319 DNA methylation sites were retained, as follows:
cg01074797,cg10501210,cg15084543,cg16836311,cg16986315,cg26553501,cg00522231,cg02577387,cg04864807,cg20698421,cg22454769,cg01227537,cg15028458,cg17280346,cg02650266,cg04074536,cg08288016,cg04675542,cg05016408,cg05845376,cg06883126,cg10530883,cg16227623,cg18839637,cg19025234,cg23479922,cg03738025,cg08476511,cg16867657,cg18319852,cg22367191,cg23737190,cg24661236,cg09124496,cg09125127,cg07041999,cg12873476,cg14620572,cg10187894,cg17766026,cg24205914,cg24530234,cg01178624,cg15243034,cg21481937,cg01974091,cg01150270,cg18567924,cg01295034,cg04358214,cg08101977,cg08151623,cg09259081,cg09684846,cg16263848,cg17759274,cg19344626,cg24794228,cg25755428,cg00375983,cg00815832,cg01128109,cg01588224,cg02998240,cg03341469,cg03655142,cg05363438,cg06794355,cg06829788,cg07388493,cg08128734,cg08702915,cg08893087,cg09362335,cg13352914,cg16000360,cg16265542,cg22559013,cg23371584,cg00057240,cg00190206,cg03233656,cg03879180,cg05085636,cg05203213,cg05365735,cg05481257,cg05951221,cg09128529,cg10106284,cg10687131,cg10835286,cg11237792,cg11807280,cg12682972,cg13149736,cg17534916,cg19578183,cg22943590,cg25138327,cg25552548,cg27141850,cg02356435,cg03556243,cg10098541,cg11580026,cg12615982,cg14039937,cg14975410,cg19729744,cg23450509,cg01950474,cg02188818,cg02810967,cg03063309,cg03671075,cg04573661,cg07456878,cg07809027,cg07974833,cg08462122,cg11076306,cg11970349,cg15106030,cg16781992,cg18797590,cg20816447,cg02051771,cg04071270,cg05955210,cg07705913,cg08635765,cg08763461,cg11925729,cg14698665,cg15421911,cg23125993,cg23500537,cg24856658,cg26624398,cg27604145,cg02872426,cg03142554,cg03785755,cg04064963,cg05220968,cg06126421,cg06386482,cg06951627,cg09548084,cg10083824,cg11342453,cg11617964,cg12560772,cg13221458,cg14441271,cg15319032,cg15804973,cg16004593,cg17608381,cg21572722,cg21855021,cg22707857,cg23049448,cg27368039,cg02756107,cg03068497,cg03453431,cg03799713,cg06890291,cg08614290,cg09423312,cg14017689,cg14485097,cg16446288,cg17067544,cg17727071,cg21429551,cg21807944,cg23447239,cg23715104,cg25879142,cg26153045,cg26391564,cg00285394,cg00327383,cg00399059,cg07583137,cg08090164,cg10210397,cg13021857,cg13631444,cg16196274,cg17903548,cg19021188,cg20988565,cg22861548,cg23180489,cg24554944,cg25305703,cg25329685,cg25483741,cg25923729,cg27039118,cg00008629,cg01028796,cg07158339,cg10173814,cg13040392,cg13474639,cg13619074,cg13741668,cg13959831,cg23006204,cg23469878,cg00045910,cg00639447,cg03962527,cg06012872,cg06812574,cg09421083,cg10556349,cg21024264,cg24711336,cg24892069,cg27290215,cg27401945,cg01074365,cg02315732,cg04993130,cg05586607,cg06344265,cg10253640,cg11025604,cg12738765,cg19172170,cg20648141,cg01349368,cg03124318,cg04671742,cg05157098,cg06021880,cg15110296,cg17725129,cg20051875,cg23256579,cg25486399,cg00509187,cg00893603,cg01205935,cg01783816,cg02838877,cg04887675,cg12603632,cg02222791,cg03459776,cg10172979,cg13158272,cg18485215,cg19590421,cg22816294,cg27056759,cg04134722,cg10403394,cg11032707,cg23299445,cg27340001,cg00876127,cg01046070,cg01259782,cg03809021,cg04155793,cg05107535,cg05917111,cg07839457,cg08486507,cg09958192,cg21996068,cg02228185,cg03745383,cg07169660,cg12317815,cg18181703,cg19758448,cg24079381,cg00495303,cg07529654,cg24217948,cg00706441,cg00789427,cg03562414,cg06581818,cg08233235,cg09547119,cg10635122,cg13640414,cg13672791,cg19536401,cg20964856,cg22693863,cg25481705,cg25599567,cg01479232,cg01581757,cg01618988,cg04931539,cg10339152,cg10743062,cg11853697,cg11861654,cg11991566,cg17129400,cg18965930,cg20810198,cg25829531,cg22862003,cg01234420,cg01346718,cg04986324.
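The DMP thresholds in step (2) can be applied as sketched below. The text does not name the p-value adjustment, so Benjamini-Hochberg FDR adjustment is assumed; the fold-change threshold is applied to the absolute log fold change (also an assumption, so that both hyper- and hypomethylated probes can pass); and the per-probe statistics are assumed to come from an upstream differential methylation test. Function names are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (step-up FDR) adjusted p-values, in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_dmps(probe_stats, lfc_threshold=0.05, padj_threshold=0.05):
    """probe_stats: list of (probe_id, log_fold_change, raw_p_value).

    Keeps probes with |log fold change| > 0.05 and adjusted p < 0.05.
    """
    padj = benjamini_hochberg([p for _, _, p in probe_stats])
    return [pid for (pid, lfc, _), q in zip(probe_stats, padj)
            if abs(lfc) > lfc_threshold and q < padj_threshold]
```

For example, raw p-values [0.01, 0.04, 0.03, 0.5] adjust to [0.04, 0.0533, 0.0533, 0.5], so only the first would pass an adjusted threshold of 0.05.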
(3) The 26 + 319 features obtained above were combined, and the second feature-selection step was performed with the lasso algorithm. Adding the L1 norm of the coefficients to the loss function as a penalty term shrinks the coefficients and makes the coefficient vector sparse: the coefficients of weak variables are compressed to 0, which performs feature selection. The minimized objective function is as follows:
In this algorithm, the hyperparameters are set to family = "binomial", type.measure = "auc"/"class" and nfolds = 10, finally yielding 80 combined features.
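A minimal sketch of L1-penalized (lasso) logistic regression, assuming the binomial objective (1/n)·Σ log-loss + λ‖β‖₁ that the `family = "binomial"` setting suggests. The hyperparameter names match R's `cv.glmnet`, which uses coordinate descent and chooses λ by 10-fold cross-validation; this sketch instead uses plain proximal gradient descent (ISTA) with a fixed λ, and `lasso_logistic` and `selected_features` are hypothetical names.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward 0, exactly 0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X, y, lam, lr=0.5, n_iter=2000):
    """Minimise (1/n) * sum_i logloss_i + lam * ||beta||_1 by proximal
    gradient descent; the intercept b0 is not penalised.
    X: list of rows (features assumed standardised); y: 0/1 labels."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = _sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n  # plain gradient step for the intercept
        # Gradient step, then soft-thresholding for the L1 penalty.
        beta = [soft_threshold(beta[j] - lr * g[j] / n, lr * lam) for j in range(p)]
    return b0, beta


def selected_features(beta):
    """Indices of features whose coefficients survived the L1 penalty."""
    return [j for j, bj in enumerate(beta) if bj != 0.0]
```

The soft-thresholding step is what sets weak coefficients exactly to 0; larger λ values remove more features.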
(4) The third step uses the gradient boosting decision tree (GBDT) algorithm, which grows trees by repeated feature splits, each new tree fitting the residual between the predicted and actual values of the previous round's model so that the model reaches its best performance, while regularization terms limit tree complexity to further prevent overfitting. In this algorithm, the hyperparameters are set to objective="binary:logistic", booster="gbtree", eval_metric="error", nrounds=7, eta=0.5, max_depth=3, subsample=0.5, colsample_bytree=1, min_child_weight=2, gamma=0.5. Finally, 30 features are obtained, specifically:
Age, whether diuretics are taken, BMI, urinary albumin, serum creatinine, and the methylation sites cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
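The step (4) settings use the parameter names of the xgboost package (`nrounds` is the R argument for the number of boosting rounds). The sketch below restates them as a Python dictionary that could be passed to `xgboost.train(params, dtrain, num_boost_round=7)`, and adds a hypothetical `top_k_features` helper. Selecting the final 30 features by ranking importance scores is an assumption, since the text does not state the selection rule.

```python
# Step (4) hyperparameters with xgboost's underscore spelling.
XGB_PARAMS = {
    "objective": "binary:logistic",
    "booster": "gbtree",
    "eval_metric": "error",
    "eta": 0.5,
    "max_depth": 3,
    "subsample": 0.5,
    "colsample_bytree": 1,
    "min_child_weight": 2,
    "gamma": 0.5,
}
NUM_BOOST_ROUND = 7  # "nrounds" in R; num_boost_round in the Python API


def top_k_features(importance, k=30):
    """Keep the k features with the largest importance score.

    `importance` maps feature name -> score, e.g. the dict returned by
    Booster.get_score(importance_type="gain") in the Python xgboost package.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]
```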
(5) To better capture the effect of feature interactions on HFpEF risk, the 30 features obtained by feature screening were modeled with the DeepFM algorithm. The neural network model DeepFM integrates the architectures of a factorization machine (FM) and a deep neural network (DNN): the FM models low-order feature interactions and the DNN models high-order feature interactions. Cross features are learned through dot products of latent vectors, so DeepFM learns cross features automatically and, as an end-to-end model, extracts complex features from the raw features. DeepFM can be trained efficiently because its wide and deep parts, though different, share the same input and embedding vectors. DeepFM extracts DNA methylation and EHR features and learns the feature combinations hidden behind them. DeepFM trains the whole network jointly, end to end, and finally feeds its output into a sigmoid function for early prediction of HFpEF events:
$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$
where ŷ is the predicted HFpEF event, y_FM is the output of the FM component, and y_DNN is the output of the DNN component. The FM component and the DNN component are a factorization machine and a feed-forward neural network, used for low-order and high-order feature interactions, respectively. The FM models each order-2 feature interaction as the inner product of the latent vectors of the two features, which captures order-2 interactions more effectively than previous methods, especially when the dataset is sparse. The FM output is the sum of an addition unit and a number of inner product units:
$$y_{FM} = \langle w, x \rangle + \sum_{j_1=1}^{d} \sum_{j_2=j_1+1}^{d} \langle V_{j_1}, V_{j_2} \rangle \, x_{j_1} \cdot x_{j_2}$$
where w ∈ R^d and V_i ∈ R^k (k is given). The addition unit ⟨w, x⟩ reflects the importance of order-1 features, and the inner product units represent the effect of order-2 feature interactions. The deep component, the DNN, is a feed-forward neural network that learns high-order feature interactions; its output is:
$$y_{DNN} = W^{(|H|+1)} \cdot a^{(|H|)} + b^{(|H|+1)}$$
where |H| is the number of hidden layers, a^(l) is the output of layer l (a^(0) being the output of the embedding layer), and W^(l) and b^(l) are the model weights and bias of layer l. For the deep hidden layers, a deep neural network containing two hidden layers (256 ) is implemented, using ReLU as the activation function:
$$y = f(x) = \mathrm{ReLU}(wx + b)$$
Logloss is used as the objective function. To control overfitting, an L2 regularization penalty with parameter 0.0001 is added on the nodes. To optimize the neural network, batch normalization and weight decay are used. The embedding size is set to 8, the batch size to 300, and the decay to 0.9. DeepFM is trained with the Adam optimizer, a learning rate of 0.0001, 400 epochs and a dropout of 60%, and the performance of the DeepFM model is evaluated by three-fold cross-validation.
(6) Finally, the area under the receiver operating characteristic curve, sensitivity, specificity and accuracy were used to evaluate the discrimination of all models. The receiver operating characteristic (ROC) curve plots the true positive rate (TPR, i.e., sensitivity) on the ordinate against the false positive rate (FPR, i.e., 1 - specificity) on the abscissa over a series of thresholds, reflecting how TPR and FPR change with the threshold; the closer the curve is to the upper-left corner, the better the model's classification performance. Sensitivity is the proportion of positive examples that are correctly classified, TP/(TP + FN), and measures the classifier's ability to recognize positive examples. Specificity is the proportion of negative examples that are correctly classified, TN/(TN + FP), and measures the classifier's ability to recognize negative examples. Accuracy is the most common evaluation index: it describes the classifier's overall ability to judge positives as positive and negatives as negative, and equals the number of correctly classified samples divided by the total number of samples; in general, the higher the accuracy, the better the classifier.
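The four evaluation measures can be computed as follows. This is a minimal, dependency-free sketch with hypothetical function names; the AUC is computed through its equivalent rank interpretation (the probability that a randomly chosen HFpEF case receives a higher score than a randomly chosen non-case, ties counting one half) rather than by integrating the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, TN, FP) for 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp, fn, tn, fp):
    return tp / (tp + fn)  # true positive rate


def specificity(tp, fn, tn, fp):
    return tn / (tn + fp)  # true negative rate = 1 - FPR


def accuracy(tp, fn, tn, fp):
    return (tp + tn) / (tp + fn + tn + fp)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For labels [1, 1, 0, 0] and scores [0.9, 0.4, 0.6, 0.1], three of the four case/non-case pairs are ordered correctly, so the AUC is 0.75; thresholding at 0.5 gives sensitivity, specificity and accuracy of 0.5 each.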
Example 2 Model verification
171 sampled participants without cardiovascular disease were collected; over 8 years of follow-up, 139 participants showed no heart failure and 32 were diagnosed with HFpEF. These participants were used as the test set of the method.
The model's discrimination and calibration were assessed using the AUC and the Hosmer-Lemeshow test. The following table is the confusion matrix in the test set for the method using 30 features:
In the test set, the AUC of this model was 0.90 (95% confidence interval, 0.89-0.90). The calibration curve of the model is shown in FIG. 4; the Hosmer-Lemeshow test is not statistically significant (p = 0.678), which supports the reliability of the result.
To assess the influence of training set sample size on the model, 75%, 60%, 50% and 25% of the training set participants were randomly selected. The results are shown below:
Training set fraction | Set | AUC | Sensitivity | Specificity | Accuracy
--- | --- | --- | --- | --- | ---
25% | Validation set | 0.90 (0.76-1.00) | 0.89 (0.57-1.00) | 0.93 (0.84-1.00) | 0.92 (0.86-0.99)
25% | Test set | 0.88 (0.84-0.91) | 0.82 (0.76-0.88) | 0.84 (0.66-1.00) | 0.84 (0.69-0.98)
50% | Validation set | 0.87 (0.79-0.95) | 0.86 (0.62-1.00) | 0.79 (0.61-0.98) | 0.80 (0.65-0.95)
50% | Test set | 0.89 (0.87-0.90) | 0.85 (0.80-0.91) | 0.87 (0.75-0.98) | 0.86 (0.78-0.95)
60% | Validation set | 0.92 (0.87-0.96) | 0.89 (0.82-0.95) | 0.84 (0.69-0.99) | 0.85 (0.71-0.98)
60% | Test set | 0.89 (0.88-0.89) | 0.83 (0.77-0.89) | 0.85 (0.72-0.98) | 0.86 (0.75-0.94)
75% | Validation set | 0.90 (0.86-0.95) | 0.84 (0.57-1.00) | 0.87 (0.66-1.00) | 0.87 (0.70-1.00)
75% | Test set | 0.88 (0.85-0.91) | 0.76 (0.70-0.82) | 0.90 (0.82-0.98) | 0.87 (0.81-0.93)
The test-set AUC stayed between 0.88 and 0.89 for every training set fraction, indicating that the model's test-set performance is largely insensitive to the training set sample size.
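The sample-size experiment can be outlined by drawing random subsets of the 797 training participants. The sketch assumes sampling within each outcome class so that the 59:738 case ratio is preserved; the text says only that participants were randomly selected. `stratified_subsample` is a hypothetical name.

```python
import random


def stratified_subsample(labels, fraction, seed=0):
    """Randomly keep `fraction` of the participants in each outcome class.

    Returns sorted indices into `labels`; per-class sampling is an assumption.
    """
    rng = random.Random(seed)
    chosen = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        chosen.extend(rng.sample(idx, round(fraction * len(idx))))
    return sorted(chosen)
```

With `fraction=0.6` this keeps 35 of the 59 cases and 443 of the 738 non-cases; the model would then be retrained on each subset and evaluated on the same test set.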
Furthermore, the HFrisk model was compared with the currently widely used baseline models SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost and LogitBoost. Each baseline model was fine-tuned for best results. Using the same 30 features, the AUC results shown in FIG. 2 were obtained. Although the AUCs of the baseline models varied (AUC = 0.57-0.88), the DeepFM model still performed best.
Comparison with other published models: William B. Kannel et al. proposed a 4-year risk assessment model (Profile for Estimating Risk of Heart Failure [J]. Archives of Internal Medicine, 1999, 159(11): 1197-204) that uses a hybrid logistic regression algorithm to assess CHF risk separately by sex in the FHS cohort. Using the same training set, the William B. Kannel model was constructed and compared with the model constructed according to the present application. The AUCs of the final model of the invention were 0.99 for males and 0.94 for females; the AUCs of the William B. Kannel model were 0.74 for males and 0.89 for females. The results are shown in FIG. 3.
The AUC or C statistic of the present model was also compared with those reported for published models: Sadiya S. Khan et al. reported validation-set C statistics of 0.71-0.87 for their 10-year CHF risk equations (Khan S S, Ning H, Shah S J, et al. 10-Year Risk Equations for Incident Heart Failure in the General Population [J]. Journal of the American College of Cardiology, 2019, 73(19): 2388-2397), and the CHF early detection model of Edward Choi et al. had a test-set AUC < 0.80. These results indicate that the present method outperforms these models.
Calibration plots are commonly used to evaluate calibration, i.e., the agreement between predicted and observed values. Here, a scatter plot of observed risk against predicted risk visualizes the result of the Hosmer-Lemeshow goodness-of-fit test. The observed outcomes and the model's fitted probabilities were passed to the hoslem.test function, giving p = 0.678, so the null hypothesis of good fit cannot be rejected and the fit is good. The results are shown in FIG. 4.
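A standard-library sketch of the Hosmer-Lemeshow test. R's `hoslem.test` (ResourceSelection package) groups observations by quantiles of the fitted probabilities, with g = 10 by default; this sketch approximates that by sorting on predicted risk and cutting into equal-size groups, then compares the statistic with a chi-square distribution with g - 2 degrees of freedom. Because g = 10 gives an even df of 8, the upper-tail probability has a closed form, so no statistics library is needed. Function names are hypothetical.

```python
import math


def hosmer_lemeshow(y_true, p_pred, groups=10):
    """Hosmer-Lemeshow statistic and degrees of freedom (groups - 2).

    Each group contributes (O - E)^2 / E + (O' - E')^2 / E' for events and
    non-events, which simplifies to (O - E)^2 / (E * (1 - E / m)).
    """
    pairs = sorted(zip(p_pred, y_true))  # sort participants by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        m = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        if m and 0.0 < expected < m:
            stat += (observed - expected) ** 2 / (expected * (1.0 - expected / m))
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x), valid for even df only:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

A large p-value, such as the reported p = 0.678, means the observed and predicted event counts per risk group do not differ more than chance would explain.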

Claims (10)

1. A marker combination for predicting the risk of heart failure with preserved ejection fraction, comprising:
Beta values of methylation sites, including: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910;
Age of patient;
Whether diuretics are taken;
BMI;
Urine albumin;
serum creatinine.
2. A model for predicting the risk of heart failure with preserved ejection fraction, wherein the information collected by the model comprises the following marker combination:
patient age; whether diuretics are taken; BMI; urine albumin; serum creatinine; and beta values of methylation sites;
the methylation sites comprising: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
3. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 2, wherein the model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.
4. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 3, wherein, in constructing the model with the DeepFM algorithm, the outputs of the FM and DNN components are fed into a sigmoid function for early prediction of heart failure events with preserved ejection fraction, the output being:
$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$
where $\hat{y}$ is the predicted probability of a heart failure event with preserved ejection fraction, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component.
5. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 4, wherein the output of the FM component is the sum of an addition unit and a number of inner product units:
$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i \cdot x_j$$
where $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$ ($k$ is given); the addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the effect of order-2 feature interactions.
6. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 4, wherein the output of the DNN component is:
$$y_{DNN} = W^{(|H|+1)} \cdot a^{(|H|)} + b^{(|H|+1)}$$
where $|H|$ is the number of hidden layers, and $a^{(l)}$, $W^{(l)}$ and $b^{(l)}$ are the output, model weight and bias of the $l$-th layer, respectively, with $a^{(0)}$ being the output of the embedding layer.
7. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 5, wherein the deep component is implemented as a deep neural network comprising two hidden layers (256 ), using ReLU as the activation function:
$$y = f(x) = \mathrm{ReLU}(wx + b)$$
log loss is used as the objective function, and, to control overfitting, an L2 regularization penalty with a coefficient of 0.0001 is added on the nodes.
8. The model according to claim 3, wherein the performance of the DeepFM model is evaluated by three-fold cross-validation, with Adam as the optimization algorithm, a learning rate of 0.0001, 400 epochs, and a dropout rate of 60%.
9. Use of the marker combination according to claim 1 and/or the model according to any one of claims 2-8 in the preparation of a medicament and/or a kit for the treatment of heart failure with preserved ejection fraction.
10. Use of the marker combination according to claim 1 and/or the model according to any one of claims 2-8 in constructing an early prediction model for chronic complex diseases.
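Claims 4 to 7 describe the DeepFM output as a sigmoid over the sum of an FM component (a first-order term plus pairwise inner products of embeddings) and a DNN component (ReLU hidden layers followed by a linear output layer), trained with log loss plus an L2 penalty. The pure-Python forward pass below is a minimal sketch of those formulas, not the applicant's implementation. Assumptions: the 30 input fields are the 25 methylation beta values plus the five clinical variables of claim 1, each treated as a dense numeric field with its own k-dimensional embedding shared by the FM and DNN parts; the weights are random placeholders; the embedding size `K`, the toy hidden-layer sizes, the clinical field names and all function names are hypothetical. Optimization (Adam, dropout, three-fold cross-validation) is not shown.

```python
import math
import random

# Fields assumed from claim 1: 25 CpG beta values plus five clinical variables.
CPG_SITES = [
    "cg06344265", "cg27401945", "cg25755428", "cg24205914", "cg23299445",
    "cg21429551", "cg21024264", "cg20051875", "cg17766026", "cg16781992",
    "cg13352914", "cg11853697", "cg10556349", "cg10083824", "cg08614290",
    "cg08101977", "cg07041999", "cg05845376", "cg05481257", "cg05363438",
    "cg03556243", "cg03233656", "cg00522231", "cg00495303", "cg00045910",
]
CLINICAL = ["age", "diuretic_use", "bmi", "urine_albumin", "serum_creatinine"]  # hypothetical names
FIELDS = CPG_SITES + CLINICAL
K = 4            # embedding size k (hypothetical)
HIDDEN = (8, 8)  # toy hidden-layer widths (hypothetical)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def init_params(d, k, hidden, seed=0):
    """Random placeholder parameters; a trained model would learn these."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 0.1) for _ in range(d)]                      # first-order weights, w in R^d
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(d)]  # embeddings, V_i in R^k
    sizes = [d * k, *hidden, 1]
    layers = [([[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
              for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    return w, V, layers


def fm_naive(x, w, V):
    """y_FM = <w, x> + sum_{i<j} <V_i, V_j> x_i x_j, computed pair by pair."""
    pairs = sum(dot(V[i], V[j]) * x[i] * x[j]
                for i in range(len(x)) for j in range(i + 1, len(x)))
    return dot(w, x) + pairs


def fm_fast(x, w, V):
    """Same value in O(kd): 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    k = len(V[0])
    pairs = 0.5 * sum(sum(V[i][f] * x[i] for i in range(len(x))) ** 2
                      - sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
                      for f in range(k))
    return dot(w, x) + pairs


def dnn(x, V, layers):
    """a^(0) = concatenated embeddings V_i * x_i; ReLU hidden layers; linear output W a + b."""
    a = [v * xi for xi, Vi in zip(x, V) for v in Vi]
    for depth, (W, b) in enumerate(layers):
        z = [dot(row, a) + bias for row, bias in zip(W, b)]
        a = z if depth == len(layers) - 1 else [max(0.0, t) for t in z]
    return a[0]


def predict(x, params):
    """y_hat = sigmoid(y_FM + y_DNN)."""
    w, V, layers = params
    return 1.0 / (1.0 + math.exp(-(fm_fast(x, w, V) + dnn(x, V, layers))))


def objective(y_true, y_prob, weights, l2=1e-4, eps=1e-12):
    """Mean log loss plus an L2 penalty on the given weights (claim 7 uses 0.0001)."""
    ll = -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
              for y, p in zip(y_true, y_prob)) / len(y_true)
    return ll + l2 * sum(t * t for t in weights)


params = init_params(len(FIELDS), K, HIDDEN)
rng = random.Random(1)
x = [rng.random() for _ in FIELDS]  # placeholder inputs, not patient data
print(predict(x, params))
```

Sharing the embeddings `V` between the FM and DNN parts is what distinguishes DeepFM from simply stacking two separate models: the same per-field vectors drive both the explicit order-2 interactions and the deep, higher-order ones.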
CN202110686960.8A 2021-06-21 2021-06-21 Model for predicting ejection fraction retention type heart failure risk Active CN113421648B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110686960.8A CN113421648B (en) 2021-06-21 2021-06-21 Model for predicting ejection fraction retention type heart failure risk

Publications (2)

Publication Number Publication Date
CN113421648A CN113421648A (en) 2021-09-21
CN113421648B true CN113421648B (en) 2024-04-23

Family

ID=77789546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110686960.8A Active CN113421648B (en) 2021-06-21 2021-06-21 Model for predicting ejection fraction retention type heart failure risk

Country Status (1)

Country Link
CN (1) CN113421648B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104994912A (en) * 2012-12-06 2015-10-21 康肽德生物医药技术有限公司 Peptide therapeutics and methods for using same
CN105586406A (en) * 2016-01-15 2016-05-18 汪道文 Method for detecting gene polymorphism of ADRB1 and GRK5
CN107683341A (en) * 2015-05-08 2018-02-09 新加坡科技研究局 method for the diagnosis and prognosis of chronic heart failure
JP2018072337A (en) * 2016-10-21 2018-05-10 国立研究開発法人国立循環器病研究センター Method of predicting recurrence risk of major adverse cardiac event

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20130310436A1 (en) * 2012-05-18 2013-11-21 The Regents Of The University Of Colorado, A Body Corporate Methods for predicting response to beta-blocker therapy in non-ischemic heart failure patients
WO2017218911A1 (en) * 2016-06-17 2017-12-21 Abbott Laboratories BIOMARKERS TO PREDICT NEW ONSET HEART FAILURE WITH PRESERVED EJECTION FRACTION (HFpEF)

Also Published As

Publication number Publication date
CN113421648A (en) 2021-09-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant