CN113421648B

CN113421648B - Model for predicting the risk of heart failure with preserved ejection fraction

Info

Publication number: CN113421648B
Application number: CN202110686960.8A
Authority: CN
Inventors: 方向东; 赵学彤; 渠鸿竹; 董蔚
Original assignee: Beijing Institute of Genomics of CAS
Current assignee: Beijing Institute of Genomics of CAS
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2024-04-23
Anticipated expiration: 2041-06-21
Also published as: CN113421648A

Abstract

The invention provides a model for predicting the risk of heart failure with preserved ejection fraction (HFpEF). The information collected by the model comprises the following marker combination: patient age; whether diuretics are taken; BMI; urine albumin; serum creatinine; and the beta values of 25 methylation sites. An end-to-end machine learning model that captures interactions among multi-omics data is used to clinically evaluate all patients and identify those at risk, so that the onset of heart failure with preserved ejection fraction can be delayed or prevented by controlling risk factors, treating asymptomatic left ventricular contractile dysfunction, and the like. The model provided by the invention has good prospects for clinical application.

Description

Model for predicting the risk of heart failure with preserved ejection fraction

Technical Field

The invention belongs to the field of disease diagnosis models, and particularly relates to a model for predicting the risk of heart failure with preserved ejection fraction.

Background

In recent years, heart failure morbidity and mortality have increased year by year. Heart failure is a change in the structure or function of the heart caused by complex interactions of biochemical factors such as inheritance, neurohormones, metabolism, inflammation, etc. Chronic heart failure (Chronic heart failure, CHF) is characterized by disturbances in myocardial energy metabolism and metabolic remodeling, with high morbidity and mortality. Thus, it is necessary to obtain accurate personalized risk assessment to assist in further managing clinical decisions.

Three subtypes of chronic heart failure are currently recognized, classified by left ventricular ejection fraction (LVEF): heart failure with reduced ejection fraction (HFrEF), heart failure with mid-range ejection fraction (HFmrEF), and heart failure with preserved ejection fraction (HFpEF). The three subtypes differ greatly in etiology and pathophysiology. Notably, early prediction of HFpEF remains challenging. An early prediction model for HFpEF is important for risk assessment, management and clinical decision-making in heart failure: it allows diet and lifestyle to be adjusted in time for patients at high risk of HFpEF, which is in keeping with the principle of precision prevention.

Heart failure is a disease caused by multiple factors, and its occurrence is the result of the combined action of genetic factors and environmental factors. Epigenetic mechanisms such as DNA methylation and the like are involved in regulating myocardial fibrosis, leading to myocardial energy metabolism disorders, the occurrence of abnormal metabolism, transport and activation of amino acids and the like, promoting the development of cardiovascular diseases and affecting the risk of individual diseases, which is one of the pathophysiological causes of HFpEF occurrence. It is well known that the occurrence of HFpEF is closely related to clinical factors. The risk of HFpEF onset increases dramatically with age. DNA methylation and clinical characteristics can describe disease states in different dimensions, with internal interactions.

In the prior art, Sadiya S. Khan et al. developed a 10-year heart failure risk model that includes 10 clinical features, but did not discuss the pathogenesis or the different subtypes of CHF. The method cannot learn feature interactions and considers only clinical features, ignoring epigenetic factors (Khan SS, Ning H, Shah SJ et al. 10-Year Risk Equations for Incident Heart Failure in the General Population. J. Am. Coll. Cardiol. 2019;73:2388-2397).

William B. Kannel et al. developed a 4-year heart failure risk assessment model in which the risk of heart failure is estimated with a logistic function of 9 clinical features. The method has the same problems as the model of Sadiya S. Khan et al.: it cannot learn feature interactions and considers only clinical features, ignoring epigenetic factors (Kannel WB, D'Agostino RB, Silbershatz H et al. Profile for estimating risk of heart failure. Arch. Intern. Med. 1999;159:1197-1204).

Edward Choi et al. built an early detection model of CHF that models the temporal relationships among health data in patients' electronic health records (EHR) to predict a future diagnosis of CHF. The method focuses only on clinical features and does not consider the influence of epigenetic factors on future heart failure events (Choi E, Schuetz A, Stewart WF et al. Using recurrent neural network models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 2016;24:361-370).

No HFpEF risk prediction model that integrates clinical and epigenetic features has yet been developed. Therefore, developing an accurate and comprehensive HFpEF risk prediction method is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

In order to solve the above problems, the present invention provides a model for predicting the risk of heart failure with preserved ejection fraction. The model combines an end-to-end machine learning model with interactions among multi-omics data to clinically evaluate all patients and identify HFpEF risk, so that the onset of HFpEF can be delayed or prevented by controlling HFpEF risk factors, treating asymptomatic left ventricular contractile dysfunction, and the like.

Data for 97 clinical variables and for 402,380 DNA methylation sites (DNA methylation 450K chip) were collected from 797 sampled participants without cardiovascular disease. After 8 years of follow-up, 738 participants showed no heart failure and 59 participants were diagnosed with HFpEF. The present invention uses these data as a training set to obtain a marker combination for modeling the risk of heart failure with preserved ejection fraction.

Terminology:

BMI: body mass index, BMI = body weight (kg) / height (m) squared, i.e., kg/m².

EHR: electronic health record.

AUC: area under the curve.

FM: factorization machine.

DNN: deep neural network.

In one aspect, the invention provides a marker combination for predicting the risk of heart failure with preserved ejection fraction.

The marker combination comprises:

Age of patient;

Whether diuretics are taken;

BMI;

Urine albumin;

Serum creatinine;

Beta value of methylation site.

The methylation sites comprise: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.

The diuretics include, but are not limited to: furosemide (Lasix), metolazone, chlorthalidone and torsemide.

In another aspect, the invention provides a model for predicting the risk of heart failure with preserved ejection fraction.

The information collected by the model comprises marker combinations obtained by screening the following characteristics:

Age of patient; whether diuretics are taken; BMI; urine albumin; serum creatinine; beta values of the methylation sites: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.

The model is built by means of a data mining algorithm.

The model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.

The method for establishing the model comprises feeding the combined component outputs into the sigmoid function of the DeepFM algorithm for early prediction of an HFpEF event, with the output:

$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$

where $\hat{y}$ is the predicted HFpEF event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component.

The output of the FM component is the sum of an addition unit and a plurality of inner product units:

$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$

where $w \in R^d$ and $V_i \in R^k$ ($k$ is given). The addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the effect of order-2 feature interactions.
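The FM output described above can be illustrated with a minimal Python sketch. The function names, the random weights and the input vector are hypothetical; only the feature count (30) and the embedding size (8) come from the text. The second function uses the well-known algebraic rearrangement of the pairwise sum, which gives the same value at lower cost.

```python
import random

def fm_output(w, V, x):
    """Order-1 term <w, x> plus every pairwise term <V_i, V_j> * x_i * x_j."""
    d = len(x)
    linear = sum(w[i] * x[i] for i in range(d))
    pairwise = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise

def fm_output_fast(w, V, x):
    """Same value via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    d, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(d))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(d))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(d))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise

rng = random.Random(0)
d, k = 30, 8  # 30 features and embedding size 8, as stated in the text
w = [rng.uniform(-1, 1) for _ in range(d)]                      # made-up weights
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(d)]  # made-up latent vectors
x = [rng.uniform(0, 1) for _ in range(d)]                       # made-up input
print(abs(fm_output(w, V, x) - fm_output_fast(w, V, x)) < 1e-9)
```

The rearranged form needs on the order of k·d operations instead of k·d², which is one reason FM scales to many features.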

The output of the DNN component is:

$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$

where $|H|$ is the number of hidden layers, and $a^{(l)}$, $W^{(l)}$ and $b^{(l)}$ are the output, the model weight and the bias of the $l$-th layer, $a^{(0)}$ being the output of the embedding layer.

Specifically, the deep component is implemented as a deep neural network containing two hidden layers (256 ) that use ReLU as the activation function:

$$y = f(x) = \mathrm{relu}(wx + b)$$

Log loss is used as the objective function. To control overfitting, an L2 regularization penalty with parameter 0.0001 is added on the nodes.

When the deep neural network is optimized, batch normalization and weight decay are used; the embedding size is set to 8, the batch size to 300, and the decay to 0.9.

Preferably, the performance evaluation of the DeepFM model uses a three-fold cross-validation, using Adam as the optimization algorithm, with a learning rate of 0.0001, epoch of 400, and dropout of 60%.
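As an illustration of three-fold cross-validation on a data set as imbalanced as the training set (59 cases among 797 participants), the following Python sketch builds stratified folds. Stratification, the function name and the seed are assumptions; the text only states that three-fold cross-validation is used.

```python
import random

def stratified_k_fold(labels, k=3, seed=0):
    """Return k lists of indices with cases and non-cases spread evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)   # deal indices round-robin into the folds
    return folds

labels = [1] * 59 + [0] * 738   # 59 HFpEF cases, 738 non-cases (training set sizes)
folds = stratified_k_fold(labels, k=3)
for held_out in range(3):
    valid_idx = folds[held_out]
    train_idx = [i for f in range(3) if f != held_out for i in folds[f]]
    n_cases = sum(labels[i] for i in valid_idx)
    print(held_out, len(train_idx), len(valid_idx), n_cases)
```

Each fold serves once as the validation part while the model is fitted on the other two, so every participant is used for validation exactly once.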

Preferably, the discriminative ability of the DeepFM model is assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy.

In a further aspect, the present invention provides the use of the foregoing marker combination and/or the foregoing model for the preparation of a medicament and/or kit for the treatment of heart failure with preserved ejection fraction.

In yet another aspect, the invention provides the use of the foregoing marker combination and/or the foregoing model in constructing an early predictive model of other chronic complex diseases.

The invention has the beneficial effects that:

Taking multi-omics features into account, the invention starts from 5 clinical features and 25 epigenetic (DNA methylation) features and uses the DeepFM algorithm to establish an early risk assessment method for HFpEF. The method can extract complex features from the raw features and model the interactions among them, and its results are superior to those of existing models and of other baseline machine learning models.

Compared with the currently widely used baseline models SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost and LogitBoost, the DeepFM model employed in the present application performs best.

Drawings

FIG. 1 shows the AUC of prediction performance on the test set for different feature sets, wherein: the HFrisk model denotes the model containing both EHR features and DNA methylation features, the 25CpG model denotes the model with DNA methylation features only, and the 5EHR model denotes the model with EHR features only.

FIG. 2 shows the AUC results on the test set for the baseline models and for this method, all using the 30 features.

FIG. 3 shows the AUC results of this method and of the William B. Kannel model in male and female participants.

FIG. 4 shows the evaluation of the method by a calibration plot of observed risk against predicted risk and by the Hosmer-Lemeshow goodness-of-fit test; the Hosmer-Lemeshow test was not statistically significant (p=0.678).

Detailed Description

The present invention will be described in further detail with reference to the following examples, which are merely illustrative and are not intended to limit the present invention. Experimental methods for which no specific conditions are given in the following examples are generally carried out under conventional conditions, and the materials, reagents, etc. used in the following examples are commercially available unless otherwise specified.

Example 1 A method for constructing a model for predicting the risk of heart failure with preserved ejection fraction

Data for 97 clinical variables and for 402,380 DNA methylation sites (DNA methylation 450K chip) were collected from 797 sampled participants without cardiovascular disease. After 8 years of follow-up, 738 participants showed no heart failure and 59 participants were diagnosed with HFpEF; these data were used as the training set.

The DNA methylation features and the clinical features were screened stepwise using a three-step feature selection method.

(1) For the clinical features, incomplete and non-significant features in the training set were removed using the following thresholds: more than 20% missing samples, or a p-value > 0.05 in the chi-square test / Mann-Whitney U test between the two groups of samples. If the Pearson correlation between two clinical features was greater than 0.8, the feature with the smaller Spearman correlation with HFpEF was discarded. Finally, 26 clinical features were retained, as follows:

Diuretics, beta-blockers, angiotensin II antagonists, ACEI, aspirin, cardiovascular disorders, coronary heart disease, atrial fibrillation, sex, age, BMI, serum creatinine, diastolic blood pressure, fasting blood glucose, high density lipoprotein cholesterol, systolic blood pressure, total cholesterol, hypertension, lipid disorders, atrial enlargement, rheumatism, aortic valve disorders, rheumatoid arthritis, urinary albumin, whole blood hemoglobin A1C, C reactive protein.
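The clinical screening rules of step (1) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the toy feature table and the p-values are made up, the group-comparison p-values are taken as precomputed inputs rather than calculated, and pairs are compared on complete cases.

```python
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def ranks(values):
    """Average ranks (ties share the mean rank), as used by Spearman's rho."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(a, b):
    return pearson(ranks(a), ranks(b))

def filter_clinical(features, outcome, p_values,
                    max_missing=0.20, alpha=0.05, max_r=0.8):
    """features: name -> values (None = missing); p_values: name -> test p-value."""
    kept = [name for name, col in features.items()
            if sum(v is None for v in col) / len(col) <= max_missing
            and p_values[name] <= alpha]
    dropped = set()
    for a_i, a in enumerate(kept):
        for b in kept[a_i + 1:]:
            if a in dropped or b in dropped:
                continue
            rows = [(u, v, y) for u, v, y in zip(features[a], features[b], outcome)
                    if u is not None and v is not None]
            xs, ys, out = zip(*rows)
            if abs(pearson(xs, ys)) > max_r:
                # discard the feature less correlated with the outcome
                weaker = a if abs(spearman(xs, out)) < abs(spearman(ys, out)) else b
                dropped.add(weaker)
    return [n for n in kept if n not in dropped]

age = [50, 60, 70, 80, 55, 65, 75, 85]
features = {                                   # toy values, not study data
    "age": age,
    "age_months": [v * 12 for v in age],       # redundant with age
    "bmi": [22, 31, 24, 33, 23, 30, 25, 35],
    "crp": [None, None, None, 1.0, 2.0, 1.5, 2.5, 3.0],  # 37.5% missing
    "height": [170, 165, 180, 175, 160, 172, 168, 178],
}
p_values = {"age": 0.01, "age_months": 0.01, "bmi": 0.02, "crp": 0.01, "height": 0.6}
outcome = [0, 1, 0, 1, 0, 1, 0, 1]
selected = filter_clinical(features, outcome, p_values)
print(selected)
```

In this toy table, "crp" fails the missingness threshold, "height" fails the significance threshold, and "age_months" is removed as redundant with "age".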

(2) For the methylation features, differentially methylated probes (DMPs) were obtained in the training set. DMPs were selected using a log fold change > 0.05 and an adjusted p-value < 0.05 as thresholds, which yielded 319 DNA methylation sites, as follows:

cg01074797,cg10501210,cg15084543,cg16836311,cg16986315,cg26553501,cg00522231,cg02577387,cg04864807,cg20698421,cg22454769,cg01227537,cg15028458,cg17280346,cg02650266,cg04074536,cg08288016,cg04675542,cg05016408,cg05845376,cg06883126,cg10530883,cg16227623,cg18839637,cg19025234,cg23479922,cg03738025,cg08476511,cg16867657,cg18319852,cg22367191,cg23737190,cg24661236,cg09124496,cg09125127,cg07041999,cg12873476,cg14620572,cg10187894,cg17766026,cg24205914,cg24530234,cg01178624,cg15243034,cg21481937,cg01974091,cg01150270,cg18567924,cg01295034,cg04358214,cg08101977,cg08151623,cg09259081,cg09684846,cg16263848,cg17759274,cg19344626,cg24794228,cg25755428,cg00375983,cg00815832,cg01128109,cg01588224,cg02998240,cg03341469,cg03655142,cg05363438,cg06794355,cg06829788,cg07388493,cg08128734,cg08702915,cg08893087,cg09362335,cg13352914,cg16000360,cg16265542,cg22559013,cg23371584,cg00057240,cg00190206,cg03233656,cg03879180,cg05085636,cg05203213,cg05365735,cg05481257,cg05951221,cg09128529,cg10106284,cg10687131,cg10835286,cg11237792,cg11807280,cg12682972,cg13149736,cg17534916,cg19578183,cg22943590,cg25138327,cg25552548,cg27141850,cg02356435,cg03556243,cg10098541,cg11580026,cg12615982,cg14039937,cg14975410,cg19729744,cg23450509,cg01950474,cg02188818,cg02810967,cg03063309,cg03671075,cg04573661,cg07456878,cg07809027,cg07974833,cg08462122,cg11076306,cg11970349,cg15106030,cg16781992,cg18797590,cg20816447,cg02051771,cg04071270,cg05955210,cg07705913,cg08635765,cg08763461,cg11925729,cg14698665,cg15421911,cg23125993,cg23500537,cg24856658,cg26624398,cg27604145,cg02872426,cg03142554,cg03785755,cg04064963,cg05220968,cg06126421,cg06386482,cg06951627,cg09548084,cg10083824,cg11342453,cg11617964,cg12560772,cg13221458,cg14441271,cg15319032,cg15804973,cg16004593,cg17608381,cg21572722,cg21855021,cg22707857,cg23049448,cg27368039,cg02756107,cg03068497,cg03453431,cg03799713,cg06890291,cg08614290,cg09423312,cg14017689,cg14485097,cg16446288,cg17067544,cg17727071,cg21429551,cg21807944,cg23447239,cg23715104,cg25879142,cg26153045,cg26391564,cg00285394,cg00327383,cg00399059,cg07583137,cg08090164,cg10210397,cg13021857,cg13631444,cg16196274,cg17903548,cg19021188,cg20988565,cg22861548,cg23180489,cg24554944,cg25305703,cg25329685,cg25483741,cg25923729,cg27039118,cg00008629,cg01028796,cg07158339,cg10173814,cg13040392,cg13474639,cg13619074,cg13741668,cg13959831,cg23006204,cg23469878,cg00045910,cg00639447,cg03962527,cg06012872,cg06812574,cg09421083,cg10556349,cg21024264,cg24711336,cg24892069,cg27290215,cg27401945,cg01074365,cg02315732,cg04993130,cg05586607,cg06344265,cg10253640,cg11025604,cg12738765,cg19172170,cg20648141,cg01349368,cg03124318,cg04671742,cg05157098,cg06021880,cg15110296,cg17725129,cg20051875,cg23256579,cg25486399,cg00509187,cg00893603,cg01205935,cg01783816,cg02838877,cg04887675,cg12603632,cg02222791,cg03459776,cg10172979,cg13158272,cg18485215,cg19590421,cg22816294,cg27056759,cg04134722,cg10403394,cg11032707,cg23299445,cg27340001,cg00876127,cg01046070,cg01259782,cg03809021,cg04155793,cg05107535,cg05917111,cg07839457,cg08486507,cg09958192,cg21996068,cg02228185,cg03745383,cg07169660,cg12317815,cg18181703,cg19758448,cg24079381,cg00495303,cg07529654,cg24217948,cg00706441,cg00789427,cg03562414,cg06581818,cg08233235,cg09547119,cg10635122,cg13640414,cg13672791,cg19536401,cg20964856,cg22693863,cg25481705,cg25599567,cg01479232,cg01581757,cg01618988,cg04931539,cg10339152,cg10743062,cg11853697,cg11861654,cg11991566,cg17129400,cg18965930,cg20810198,cg25829531,cg22862003,cg01234420,cg01346718,cg04986324.
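The DMP thresholds of step (2) can be sketched as below. The text does not say which multiple-testing adjustment was used or whether the fold-change threshold applies to the absolute value, so the Benjamini-Hochberg procedure and the use of the absolute log fold change are both assumptions here. The probe names and statistics are placeholders, not study data.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top                 # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_dmps(probes, log_fc, raw_p, min_abs_logfc=0.05, max_adj_p=0.05):
    """Keep probes passing both the fold-change and the adjusted p-value threshold."""
    adj = bh_adjust(raw_p)
    return [name for name, fc, q in zip(probes, log_fc, adj)
            if abs(fc) > min_abs_logfc and q < max_adj_p]

probes = ["probe_a", "probe_b", "probe_c", "probe_d"]   # placeholder names
log_fc = [0.12, -0.30, 0.02, 0.40]                      # made-up log fold changes
raw_p = [0.001, 0.004, 0.003, 0.5]                      # made-up raw p-values
dmps = select_dmps(probes, log_fc, raw_p)
print(dmps)
```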

(3) The 26 + 319 features obtained by screening were combined, and the second feature-screening step was performed with the lasso algorithm. The L1 norm of the coefficients is added to the loss function as a penalty term, giving a first-order penalty function; shrinking the parameters makes the coefficient vector sparse, and feature screening is achieved by compressing the coefficients of weak variables to 0. The minimized objective function has the form:

$$\min_{\beta} \; L(\beta) + \lambda \lVert \beta \rVert_1$$

where $L(\beta)$ is the loss function, $\beta$ is the coefficient vector and $\lambda \ge 0$ is the penalty weight. In this algorithm, the hyperparameters family = "binomial", type.measure = "auc"/"class" and nfolds = 10 were set, and 80 combined features were finally obtained.
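How an L1 penalty compresses weak coefficients to exactly 0 can be seen in a small coordinate-descent sketch. For brevity it uses a squared-error loss rather than the binomial loss named in the text, and the design matrix, response and penalty value are made up; it is not the procedure used by the inventors.

```python
def soft_threshold(z, lam):
    """Shrink z toward 0 by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0
            for i in range(n):
                # prediction that leaves feature j out (partial residual)
                pred_wo_j = sum(X[i][m] * beta[m] for m in range(d) if m != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
            z = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

# Toy design with orthogonal columns; y depends strongly on column 0,
# very weakly on column 1 and not at all on column 2.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -1.95, -2.05]
beta = lasso_cd(X, y, lam=0.1)
print(beta)
```

The weak second coefficient and the irrelevant third one are set to exactly 0, while the strong first coefficient is only shrunk; this is the sparsity that the feature screening relies on.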

(4) A gradient boosting decision tree (GBDT) algorithm was then used. Trees are grown by repeated feature splitting, each tree being fitted to the residual between the predictions of the previous round and the actual values, so that the model approaches an optimal fit; regularization terms limit the complexity of the trees and further prevent overfitting. In this algorithm, the hyperparameters were set to objective = "binary:logistic", booster = "gbtree", eval_metric = "error", nrounds = 7, eta = 0.5, max_depth = 3, subsample = 0.5, colsample_bytree = 1, min_child_weight = 2, gamma = 0.5. Finally, 30 features were obtained, as follows:

Age, whether diuretics are taken, BMI, urine albumin, serum creatinine, and the methylation sites: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
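The hyperparameters of step (4) correspond to XGBoost settings and can be written as a Python configuration. How the 80 features were reduced to 30 is not spelled out in the text; ranking by feature importance, as in the hypothetical `top_features` helper below, is an assumption, and the importance scores in the usage line are made up.

```python
# Hyperparameters from the text, using XGBoost's underscore spellings.
xgb_params = {
    "objective": "binary:logistic",
    "booster": "gbtree",
    "eval_metric": "error",
    "eta": 0.5,
    "max_depth": 3,
    "subsample": 0.5,
    "colsample_bytree": 1,
    "min_child_weight": 2,
    "gamma": 0.5,
}
num_boost_round = 7  # "nrounds" in the text

def top_features(importance, n):
    """Keep the n features with the largest importance score (assumed rule)."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

# With the xgboost package installed, training could look like this:
#   import xgboost
#   booster = xgboost.train(xgb_params, dtrain, num_boost_round=num_boost_round)
#   importance = booster.get_score(importance_type="gain")
#   selected = top_features(importance, 30)
print(top_features({"f1": 0.1, "f2": 0.5, "f3": 0.3}, 2))
```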

(5) To better reflect the effect of feature interactions on HFpEF risk, the 30 features obtained by feature screening were modeled with the DeepFM algorithm. The DeepFM neural network integrates the architectures of a factorization machine (FM) and a deep neural network (DNN): FM models low-order feature interactions and DNN models high-order feature interactions. Cross features are learned through dot products of latent vectors, so the model learns cross features automatically and, as an end-to-end model, can extract complex features from the raw features. DeepFM can be trained efficiently because its wide and deep parts, although different, share the same input and embedding vectors. DeepFM extracts the DNA methylation and EHR features and learns the feature combinations hidden behind them. DeepFM trains the whole network jointly in an end-to-end manner, and the combined output is finally fed into a sigmoid function for early prediction of the HFpEF event:

$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$

where $\hat{y}$ is the predicted HFpEF event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component. The FM component and the DNN component are a factorization machine and a feedforward neural network, which model low-order and high-order feature interactions respectively. The FM model represents each order-2 feature interaction as the inner product of the latent vectors of the two features, and so captures order-2 interactions more effectively than previous methods, especially when the data set is sparse. The output of FM is the sum of one addition unit and a plurality of inner product units:

$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$

where $w \in R^d$ and $V_i \in R^k$ ($k$ is given). The addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the effect of order-2 feature interactions. The deep component DNN is a feedforward neural network used to learn high-order feature interactions, and the DNN output is as follows:

$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$

where $|H|$ is the number of hidden layers, and $a^{(l)}$, $W^{(l)}$ and $b^{(l)}$ are the output, the model weight and the bias of the $l$-th layer, $a^{(0)}$ being the output of the embedding layer. The deep component is implemented as a deep neural network containing two hidden layers (256 ) that use ReLU as the activation function:

$$y = f(x) = \mathrm{relu}(wx + b)$$

Log loss is used here as the objective function. To control overfitting, an L2 regularization penalty with parameter 0.0001 is added on the nodes. To optimize the neural network, batch normalization and weight decay are employed. The embedding size was set to 8, the batch size to 300, and the decay to 0.9. DeepFM was trained with Adam as the optimization algorithm, with the learning rate set to 0.0001, the number of epochs to 400 and dropout to 60%, and the performance of the DeepFM model was evaluated by three-fold cross-validation.
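The forward pass described in step (5) can be sketched in plain Python. This is a simplified illustration, not the trained model: weights are random, the hidden width of 16 is a placeholder (the width given in the text is garbled), batch normalization and dropout are omitted, each feature is treated as a numeric field whose embedding is its value times its latent vector, and applying the L2 penalty to the weights is an assumption.

```python
import math
import random

def relu(v):
    return [max(0.0, t) for t in v]

def dense(W, b, a):
    """Affine map W a + b for a weight matrix stored as a list of rows."""
    return [sum(wi * ai for wi, ai in zip(row, a)) + bi for row, bi in zip(W, b)]

def init_layer(rng, n_out, n_in):
    W = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def deepfm_forward(x, w, V, layers):
    d = len(x)
    # FM part: order-1 term plus pairwise inner products of latent vectors.
    y_fm = sum(wi * xi for wi, xi in zip(w, x))
    for i in range(d):
        for j in range(i + 1, d):
            y_fm += sum(p * q for p, q in zip(V[i], V[j])) * x[i] * x[j]
    # Deep part: consumes the same embeddings V as the FM part.
    a = [x[i] * v for i in range(d) for v in V[i]]
    for W, b in layers[:-1]:
        a = relu(dense(W, b, a))
    W_out, b_out = layers[-1]
    y_dnn = dense(W_out, b_out, a)[0]
    return 1.0 / (1.0 + math.exp(-(y_fm + y_dnn)))

def log_loss(y_true, y_pred, weights, l2=0.0001):
    """Mean binary cross-entropy plus an L2 penalty on the given weights."""
    eps = 1e-12
    ce = -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
              for t, p in zip(y_true, y_pred)) / len(y_true)
    return ce + l2 * sum(wt * wt for wt in weights)

rng = random.Random(0)
d, k, hidden = 30, 8, 16          # hidden width is a placeholder value
w = [rng.gauss(0, 0.1) for _ in range(d)]
V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(d)]
layers = [init_layer(rng, hidden, d * k),
          init_layer(rng, hidden, hidden),
          init_layer(rng, 1, hidden)]
x = [rng.uniform(0, 1) for _ in range(d)]   # made-up, already scaled features
risk = deepfm_forward(x, w, V, layers)
print(0.0 < risk < 1.0, log_loss([1], [risk], w) > 0)
```

Because both parts read the same latent vectors `V`, a gradient step taken for the deep part also updates the FM interactions, which is the sharing of input and embedding vectors mentioned in the text.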

(6) Finally, the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy were used to evaluate the discriminative ability of all models. The receiver operating characteristic (ROC) curve is drawn over a series of thresholds with the true positive rate (TPR, i.e. sensitivity) as the ordinate and the false positive rate (FPR, i.e. 1 - specificity) as the abscissa; it reflects how TPR and FPR change with the threshold, and the closer the curve is to the upper left corner, the better the classification performance of the model. Sensitivity is the proportion of all positive examples that are correctly classified and measures the classifier's ability to recognize positive examples. Specificity is the proportion of all negative examples that are correctly classified and measures the classifier's ability to recognize negative examples. Accuracy, the most common evaluation index, is the number of correctly classified samples divided by the number of all samples and describes the classifier's performance on the data as a whole; in general, the higher the accuracy, the better the classifier.
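The four evaluation measures can be computed as in the following Python sketch. The labels, scores and the 0.5 decision threshold are made up for illustration; the AUC is computed through its rank interpretation, i.e. the probability that a randomly chosen case receives a higher score than a randomly chosen non-case.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    tp = fp = tn = fn = 0
    for t, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if t == 1 and pred == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fp, tn, fn):
    return tp / (tp + fn)          # true positive rate

def specificity(tp, fp, tn, fn):
    return tn / (tn + fp)          # 1 - false positive rate

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def auc(y_true, y_score):
    """Share of case/non-case pairs ranked correctly; ties count one half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 0, 0]                      # toy labels
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]     # toy predicted risks
counts = confusion_counts(y_true, y_score)
print(counts, sensitivity(*counts), specificity(*counts),
      accuracy(*counts), auc(y_true, y_score))
```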

Example 2 Model verification

171 sampled participants without cardiovascular disease were collected; over 8 years of follow-up, 139 participants showed no heart failure and 32 participants were diagnosed with HFpEF. These participants served as the test set for evaluating the method.

The discrimination and calibration of the model were assessed using the AUC and the Hosmer-Lemeshow test. The following table is the confusion matrix, on the test set, for the 30 features used in the method:

In the test set, the AUC obtained with this model was 0.90 (95% confidence interval, 0.89-0.90). The calibration curve obtained with the model is shown in FIG. 4; the Hosmer-Lemeshow test was not statistically significant (p=0.678), which supports the reliability of the result.

To assess the impact of training set sample size on this model, 75%, 60%, 50% and 25% of the training set participants were randomly selected. The results are shown below:

Sample size	Data set	AUC	Sensitivity	Specificity	Accuracy
25%	Validation set	0.90(0.76-1.00)	0.89(0.57-1.00)	0.93(0.84-1.00)	0.92(0.86-0.99)
25%	Test set	0.88(0.84-0.91)	0.82(0.76-0.88)	0.84(0.66-1.00)	0.84(0.69-0.98)
50%	Validation set	0.87(0.79-0.95)	0.86(0.62-1.00)	0.79(0.61-0.98)	0.80(0.65-0.95)
50%	Test set	0.89(0.87-0.90)	0.85(0.80-0.91)	0.87(0.75-0.98)	0.86(0.78-0.95)
60%	Validation set	0.92(0.87-0.96)	0.89(0.82-0.95)	0.84(0.69-0.99)	0.85(0.71-0.98)
60%	Test set	0.89(0.88-0.89)	0.83(0.77-0.89)	0.85(0.72-0.98)	0.86(0.75-0.94)
75%	Validation set	0.90(0.86-0.95)	0.84(0.57-1.00)	0.87(0.66-1.00)	0.87(0.70-1.00)
75%	Test set	0.88(0.85-0.91)	0.76(0.70-0.82)	0.90(0.82-0.98)	0.87(0.81-0.93)

The results indicate that test set performance is largely independent of the training set sample size.
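Drawing the random subsets of the training set can be sketched as below. The function name, the seed and the use of simple (non-stratified) random sampling are assumptions; the text only states that the given fractions of training participants were randomly selected.

```python
import random

def subsample(indices, fraction, seed=0):
    """Randomly keep the given fraction of participant indices."""
    rng = random.Random(seed)
    n = round(len(indices) * fraction)
    return sorted(rng.sample(indices, n))

train = list(range(797))   # 797 training participants
subsets = {frac: subsample(train, frac) for frac in (0.25, 0.50, 0.60, 0.75)}
for frac, idx in subsets.items():
    print(frac, len(idx))
```

Each subset would then be used to refit the model, with the fixed test set of 171 participants used for evaluation in every case.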

Furthermore, the HFrisk model was compared with the currently widely used baseline models SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost and LogitBoost. Each baseline model was fine-tuned to achieve its best results. Using the same 30 features, the AUC results shown in FIG. 2 were obtained. The AUC varied across the baseline models (AUC = 0.57-0.88), and the DeepFM model performed best.

The model was also compared with other published models. William B. Kannel et al. proposed a 4-year risk assessment model (Profile for Estimating Risk of Heart Failure [J]. Archives of Internal Medicine, 1999, 159(11): 1197-204.) that uses a pooled logistic regression algorithm to assess CHF risk by sex in the FHS cohort. The William B. Kannel model was constructed on the same training set and compared with the model of the present application. The AUCs obtained by the model of the invention were 0.99 in males and 0.94 in females; the AUCs of the William B. Kannel model were 0.74 in males and 0.89 in females. The results are shown in FIG. 3.

The AUC or C statistic of the present model was also compared directly with those reported for published models: Sadiya S. Khan et al. described a 10-year CHF risk equation (Khan S S, Ning H, Shah S J, et al. 10-Year Risk Equations for Incident Heart Failure in the General Population [J]. Journal of the American College of Cardiology, 2019, 73(19): 2388-2397.) with a validation set C statistic of 0.71-0.87; Edward Choi et al. established a CHF early detection model with a test set AUC < 0.80. These results indicate that the present method performs best.

Calibration plots are commonly used to evaluate calibration, i.e. the agreement between predicted and actual values. A calibration plot is a scatter plot of the actual incidence (observed risk) against the predicted incidence (predicted risk) and visualizes the result of the Hosmer-Lemeshow goodness-of-fit test. The observed outcomes and the fitted probabilities of the model were passed to the hoslem.test function, giving p=0.678, which indicates that the null hypothesis cannot be rejected and that the fit is good. The results are shown in FIG. 4.
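The Hosmer-Lemeshow statistic itself can be sketched in Python as below. This is an illustration, not the R function referred to in the text: it splits participants into equal-size groups by sorted predicted risk, requires an even number of groups so that the chi-square tail probability has a closed form, and uses made-up outcomes and probabilities.

```python
import math

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (statistic, degrees of freedom, p-value); groups must be even."""
    if groups % 2 or groups < 4:
        raise ValueError("this sketch needs an even number of groups >= 4")
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)    # expected number of events
        observed = sum(t for _, t in chunk)    # observed number of events
        # contributions of events and of non-events in this risk group
        stat += (observed - expected) ** 2 / expected
        stat += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    df = groups - 2
    # chi-square survival function; this closed form holds for even df
    half = stat / 2.0
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i)
                                    for i in range(df // 2))
    return stat, df, p_value

# Toy data: four risk groups whose observed event rates equal the predicted risk.
y_prob = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
y_true = [1, 0, 0, 0, 0] + [1, 1, 0, 0, 0] + [1, 1, 1, 0, 0] + [1, 1, 1, 1, 0]
stat, df, p_value = hosmer_lemeshow(y_true, y_prob, groups=4)
print(df, stat < 1e-9, p_value > 0.999)
```

A large p-value, as in this perfectly calibrated toy example, means the observed and predicted event counts do not differ detectably, which is the sense in which a non-significant result indicates good fit.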

Claims

1. A marker combination for predicting the risk of heart failure with preserved ejection fraction, comprising:

Beta values for methylation sites including: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910;

Age of patient;

Whether diuretics are taken;

BMI;

Urine albumin;

serum creatinine.

2. A model for predicting the risk of heart failure with preserved ejection fraction, wherein the information collected by the model comprises the following marker combinations:

age of patient; whether diuretics are taken; BMI; urine albumin; serum creatinine; beta value of methylation site;

3. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 2, wherein the model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.

4. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 3, wherein the method for constructing the model comprises feeding the combined component outputs into the sigmoid function of the DeepFM algorithm for early prediction of a heart failure with preserved ejection fraction event, with the output:

$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$

where $\hat{y}$ is the predicted heart failure with preserved ejection fraction event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component.

5. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 4, wherein the output of the FM component is the sum of an addition unit and a plurality of inner product units:

$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$

where $w \in R^d$, $V_i \in R^k$, and $k$ is given; the addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the effect of order-2 feature interactions.

6. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 4, wherein the output of the DNN component is:

$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$

7. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 5, wherein the deep component is implemented as a deep neural network comprising two hidden layers (256 ) that use ReLU as the activation function:

$$y = f(x) = \mathrm{relu}(wx + b)$$

8. The model of claim 3, wherein the DeepFM model performance evaluation uses a tri-fold cross-validation, using Adam as the optimization algorithm, learning rate is set to 0.0001, epoch is set to 400, and dropout is set to 60%.

9. Use of a marker combination according to claim 1 and/or a model according to any one of claims 2-8 for the preparation of a medicament and/or a kit for the treatment of heart failure with preserved ejection fraction.

10. Use of a marker combination according to claim 1 and/or a model according to claims 2-8 for constructing an early predictive model of chronic complex disease.