CN113421648A

CN113421648A - Model for predicting risk of heart failure with preserved ejection fraction

Info

Publication number: CN113421648A
Application number: CN202110686960.8A
Authority: CN
Inventors: 方向东; 赵学彤; 渠鸿竹; 董蔚
Original assignee: Beijing Institute of Genomics of CAS
Current assignee: Beijing Institute of Genomics of CAS
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2021-09-21
Anticipated expiration: 2041-06-21

Abstract

The invention provides a model for predicting the risk of heart failure with preserved ejection fraction (HFpEF). The information collected by the model comprises the following marker combination: the age of the patient; whether the patient takes diuretic drugs; BMI; urinary albumin; serum creatinine; and the beta values of 25 methylation sites. All patients are clinically evaluated by combining an end-to-end machine learning model with multi-omics data interaction so as to identify risk, and the onset of HFpEF is delayed or prevented by controlling risk factors, treating asymptomatic left ventricular systolic dysfunction and the like. The model provided by the invention has good prospects for clinical application.

Description

Model for predicting risk of heart failure with preserved ejection fraction

Technical Field

The invention belongs to the field of disease diagnosis models, and particularly relates to a model for predicting the risk of heart failure with preserved ejection fraction.

Background

In recent years, heart failure morbidity and mortality have increased year by year. Heart failure is an abnormal change in the structure or function of the heart caused by a complex interaction of biochemical factors such as genetics, neurohormones, metabolism, inflammation, etc. Chronic Heart Failure (CHF) is characterized by disturbances in myocardial energy metabolism and metabolic remodeling, with high morbidity and mortality. Therefore, it is essential to obtain accurate individualized risk assessment to assist in further management of clinical decisions.

Three subtypes of chronic heart failure are currently recognized. According to left ventricular ejection fraction (LVEF), they are classified as heart failure with reduced ejection fraction (HFrEF), heart failure with mid-range ejection fraction (HFmrEF), and heart failure with preserved ejection fraction (HFpEF). The three subtypes differ greatly in etiology and pathophysiology. Notably, early prediction of HFpEF remains challenging. An early prediction model for HFpEF is important for risk assessment, management and clinical decision-making in heart failure: it allows diet and lifestyle to be adjusted in time for patients at high risk of HFpEF, which is consistent with the principle of precision prevention.

Heart failure is a multifactorial disease that results from the combined action of genetic and environmental factors. Epigenetic mechanisms such as DNA methylation are involved in regulating myocardial fibrosis, in disturbing myocardial energy metabolism, and in the abnormal metabolism, transport and activation of amino acids; they promote the development of cardiovascular disease, influence individual disease risk, and are among the pathophysiological causes of HFpEF. The development of HFpEF is also closely related to clinical factors; for example, the risk of HFpEF onset increases sharply with age. DNA methylation and clinical features describe the disease state in different dimensions and interact with each other.

In the prior art, Sadiya S. Khan et al. developed a 10-year heart failure risk model that included 10 clinical features, but did not discuss the pathogenesis or the different subtypes of CHF. This method lacks the ability to learn feature interactions and considers only clinical features, without attention to epigenetic factors (Khan SS, Ning H, Shah SJ et al. 10-Year Risk Equations for Incident Heart Failure in the General Population. J. Am. Coll. Cardiol. 2019; 73: 2388-.

William B. Kannel et al. developed a 4-year heart failure risk assessment model in which the risk of heart failure is estimated using a logistic function of 9 clinical features. This method has the same problems as the model developed by Sadiya S. Khan et al.: it lacks the ability to learn feature interactions and considers only clinical features, without attention to epigenetic factors (Kannel WB, D'Agostino RB, Silbershatz H et al. Profile for estimating risk of heart failure. Arch. Intern. Med. 1999; 159: 1197-.

Edward Choi et al. established an early detection model for CHF that models the temporal relationships among health data from a patient's electronic health record (EHR) in order to predict a future diagnosis of CHF. The method focuses only on clinical features and does not take into account the influence of epigenetic factors on future heart failure events (Choi E, Schuetz A, Stewart WF et al. Using recurrent neural network models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 2016; 24: 361-.

No HFpEF risk prediction model has been developed that integrates clinical and epigenetic features. Therefore, the development of an accurate and comprehensive HFpEF risk prediction method is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

In order to solve the above problems, the present invention provides a model for predicting the risk of heart failure with preserved ejection fraction (HFpEF). By combining an end-to-end machine learning model with multi-omics data interaction, the model is used for clinical evaluation of all patients to identify the risk of HFpEF, so that the onset of HFpEF can be delayed or prevented by controlling HFpEF risk factors, treating asymptomatic left ventricular systolic dysfunction, and the like.

The present invention collected 97 clinical features from 797 sampled participants without cardiovascular disease, together with data for 402,380 DNA methylation sites from the DNA methylation 450K chip. After 8 years of follow-up, 738 participants showed no signs of heart failure and 59 participants were diagnosed with HFpEF. These data were used as the training set to obtain the marker combination on which the model for predicting HFpEF risk is built.

The terms:

BMI: body mass index, i.e. body weight (kg) divided by the square of height (m), in kg/m².

EHR: electronic health records.

AUC: area under the curve.

FM: a factorization machine.

DNN: a deep neural network.

In one aspect, the invention provides a marker combination for predicting risk of ejection fraction-preserved heart failure.

The marker combination comprises:

the age of the patient;

whether the patient takes diuretic drugs;

BMI;

urinary albumin;

serum creatinine;

the beta values of methylation sites.

The methylation sites include: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.

The diuretic drugs include but are not limited to: furosemide (also sold as Lasix), metolazone, chlorthalidone and torsemide (torasemide).

In another aspect, the invention provides a model for predicting risk of ejection fraction-preserved heart failure.

The information collected by the model comprises the following marker combination obtained by feature screening:

the age of the patient; whether the patient takes diuretic drugs; BMI; urinary albumin; serum creatinine; the beta values of the methylation sites cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.

The model is built by means of a data mining algorithm.

The model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.

The model is established by feeding the combined output of the DeepFM algorithm into a sigmoid function for early prediction of the HFpEF event, and the output is:

$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$

where $\hat{y} \in (0, 1)$ is the predicted HFpEF event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component.

The output of the FM component is the sum of an addition unit and a number of inner product units:

$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$

where $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$ ($k$ is given). The addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the influence of order-2 feature interactions.
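As a worked note on the inner product units, the pairwise sum over order-2 interactions can be rewritten with a standard algebraic identity for factorization machines, so that it is evaluated in O(kd) rather than O(kd²) operations. This identity is not stated in the present document; the symbol $v_{i,f}$, denoting the $f$-th component of $V_i$, is introduced here only for illustration.

```latex
\sum_{i=1}^{d}\sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k}
    \left[ \left( \sum_{i=1}^{d} v_{i,f}\, x_i \right)^{2}
         - \sum_{i=1}^{d} v_{i,f}^{2}\, x_i^{2} \right]
```

With 30 input features and an embedding size of 8, as used in this document, either form is cheap; the rewritten form matters mainly for much wider inputs.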

The output of the DNN component is:

$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$

where $|H|$ is the number of hidden layers, $a^{(l)}$ is the layer output (starting from the output of the embedding layer), $W^{(l)}$ is the model weight, and $b^{(l)}$ is the bias of the $l$-th layer.

Specifically, for the deep hidden layers, a deep neural network comprising two hidden layers (256 ) is implemented using ReLU as the activation function:

$$y = f(x) = \mathrm{relu}(wx + b)$$

Log loss is used as the objective function. To control overfitting, an L2 regularization penalty is added on the nodes, with the parameter set to 0.0001.

When the deep neural network is optimized, batch normalization and weight decay are adopted; the embedding size is set to 8, the batch size to 300, and the decay to 0.9.

Preferably, the performance of the DeepFM model is evaluated by three-fold cross-validation; Adam is used as the optimization algorithm, the learning rate is set to 0.0001, the number of epochs to 400, and dropout to 60%.
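A minimal sketch of how a three-fold split could be arranged for a training set with 738 participants without heart failure and 59 HFpEF cases. Stratifying by outcome, the fixed random seed and the function name `stratified_kfold` are assumptions made for illustration; the document only states that three-fold cross-validation was used.

```python
import random


def stratified_kfold(labels, k=3, seed=0):
    """Split sample indices into k folds, keeping each class spread evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then deal them out round-robin.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# 738 participants without heart failure (0) and 59 HFpEF cases (1).
labels = [0] * 738 + [1] * 59
folds = stratified_kfold(labels, k=3)
for held_out in range(3):
    test_idx = folds[held_out]
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

Each fold serves once as the held-out part while the other two are used for fitting, so every participant is evaluated exactly once.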

Preferably, the discriminatory power of the DeepFM model is assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy.

In a further aspect, the invention provides the use of the marker combination described above and/or the model described above in the manufacture of a medicament and/or a kit related to heart failure with preserved ejection fraction.

In a further aspect, the invention provides the use of the aforementioned marker combination and/or the aforementioned model in the construction of early prediction models for other chronic complex diseases.

The invention has the beneficial effects that:

Starting from 5 clinical features and the DNA methylation of 25 epigenetic sites, an early risk assessment method for HFpEF is established using the DeepFM algorithm. The method extracts complex features from the original features and models the interactions among them, and its results are superior to those of existing models and of other baseline machine learning models.

Compared with widely used reference models such as SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost and LogitBoost, the DeepFM model adopted by the method performs best.

Drawings

FIG. 1 shows the AUC of prediction performance based on different feature sets in the test set, where the HFrisk model contains both EHR features and DNA methylation features, the 25CpG model contains only DNA methylation features, and the 5EHR model contains only EHR features.

FIG. 2 shows the AUC results in the test set for the baseline models and for the present method, all using the 30 features.

FIG. 3 shows the AUC results of the present method and of the William B. Kannel model in male and female participants.

FIG. 4 is a calibration chart of observed risk against predicted risk for the present method; the Hosmer-Lemeshow test was not statistically significant (P = 0.678).

Detailed Description

The present invention will be further illustrated in detail with reference to the following specific examples, which are merely illustrative and are not intended to limit the present invention. Unless otherwise specified, the experimental methods used in the following examples are conventional methods, and the materials, reagents and the like used are commercially available.

Example 1 Method for constructing a model for predicting risk of ejection fraction-preserved heart failure

97 clinical features and data for 402,380 DNA methylation sites from DNA methylation 450K chips were collected from 797 sampled participants without cardiovascular disease. After 8 years of follow-up, 738 participants showed no signs of heart failure and 59 participants were diagnosed with HFpEF; these data were used as the training set.

The DNA methylation features and clinical features were then screened step by step using a three-step feature selection method.

(1) For clinical features, incomplete and non-significant features in the training set were removed using the following thresholds: missing values in > 20% of samples, or a p-value > 0.05 in the chi-square test/Mann-Whitney U test between the two groups of samples. If the Pearson correlation between two clinical features was greater than 0.8, the feature with the smaller Spearman correlation (i.e., the smaller correlation with HFpEF) was discarded. Finally, 26 clinical features were retained, as follows:

diuretic, beta-blocker, angiotensin II antagonist, ACEI, aspirin, cardiovascular disease, coronary heart disease, atrial fibrillation, sex, age, BMI, serum creatinine, diastolic blood pressure, fasting plasma glucose, high-density lipoprotein cholesterol, systolic blood pressure, total cholesterol, hypertension, lipid disease, atrial enlargement, rheumatism, aortic valve disease, rheumatoid arthritis, urinary albumin, whole blood hemoglobin A1c, C-reactive protein.
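The missing-value and correlation filters of step (1) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the significance tests (chi-square/Mann-Whitney U) are left out, missing entries are represented by `None`, the `relevance` scores stand in for each feature's Spearman correlation with HFpEF, and the feature names and values in the usage example are hypothetical.

```python
import math


def missing_fraction(values):
    """Share of entries that are None (treated here as missing)."""
    return sum(v is None for v in values) / len(values)


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)


def filter_features(features, relevance, max_missing=0.20, max_corr=0.8):
    """features: name -> values; relevance: name -> correlation with outcome."""
    # Drop features with more than max_missing missing values.
    kept = [n for n, v in features.items() if missing_fraction(v) <= max_missing]
    dropped = set()
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if a in dropped or b in dropped:
                continue
            # Use only samples where both features are observed.
            pairs = [(x, y) for x, y in zip(features[a], features[b])
                     if x is not None and y is not None]
            xs, ys = zip(*pairs)
            if pearson(xs, ys) > max_corr:
                # Of two highly correlated features, discard the less relevant.
                dropped.add(a if relevance[a] < relevance[b] else b)
    return [n for n in kept if n not in dropped]


# Hypothetical toy data: "map" duplicates "sbp", "crp" is 40% missing.
features = {
    "sbp": [120, 130, 140, 150, 160],
    "map": [90, 95, 100, 105, 110],
    "age": [50, 61, 47, 70, 66],
    "crp": [None, None, 1.0, 2.0, 3.0],
}
relevance = {"sbp": 0.3, "map": 0.2, "age": 0.5, "crp": 0.4}
print(filter_features(features, relevance))
```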

(2) For methylation features, differentially methylated probes (DMPs) were identified in the training set using log fold change > 0.05 and adjusted p-value < 0.05 as thresholds, and 319 DNA methylation sites were retained, as follows:

cg01074797,cg10501210,cg15084543,cg16836311,cg16986315,cg26553501,cg00522231,cg02577387,cg04864807,cg20698421,cg22454769,cg01227537,cg15028458,cg17280346,cg02650266,cg04074536,cg08288016,cg04675542,cg05016408,cg05845376,cg06883126,cg10530883,cg16227623,cg18839637,cg19025234,cg23479922,cg03738025,cg08476511,cg16867657,cg18319852,cg22367191,cg23737190,cg24661236,cg09124496,cg09125127,cg07041999,cg12873476,cg14620572,cg10187894,cg17766026,cg24205914,cg24530234,cg01178624,cg15243034,cg21481937,cg01974091,cg01150270,cg18567924,cg01295034,cg04358214,cg08101977,cg08151623,cg09259081,cg09684846,cg16263848,cg17759274,cg19344626,cg24794228,cg25755428,cg00375983,cg00815832,cg01128109,cg01588224,cg02998240,cg03341469,cg03655142,cg05363438,cg06794355,cg06829788,cg07388493,cg08128734,cg08702915,cg08893087,cg09362335,cg13352914,cg16000360,cg16265542,cg22559013,cg23371584,cg00057240,cg00190206,cg03233656,cg03879180,cg05085636,cg05203213,cg05365735,cg05481257,cg05951221,cg09128529,cg10106284,cg10687131,cg10835286,cg11237792,cg11807280,cg12682972,cg13149736,cg17534916,cg19578183,cg22943590,cg25138327,cg25552548,cg27141850,cg02356435,cg03556243,cg10098541,cg11580026,cg12615982,cg14039937,cg14975410,cg19729744,cg23450509,cg01950474,cg02188818,cg02810967,cg03063309,cg03671075,cg04573661,cg07456878,cg07809027,cg07974833,cg08462122,cg11076306,cg11970349,cg15106030,cg16781992,cg18797590,cg20816447,cg02051771,cg04071270,cg05955210,cg07705913,cg08635765,cg08763461,cg11925729,cg14698665,cg15421911,cg23125993,cg23500537,cg24856658,cg26624398,cg27604145,cg02872426,cg03142554,cg03785755,cg04064963,cg05220968,cg06126421,cg06386482,cg06951627,cg09548084,cg10083824,cg11342453,cg11617964,cg12560772,cg13221458,cg14441271,cg15319032,cg15804973,cg16004593,cg17608381,cg21572722,cg21855021,cg22707857,cg23049448,cg27368039,cg02756107,cg03068497,cg03453431,cg03799713,cg06890291,cg08614290,cg09423312,cg14017689,cg14485097,cg16446288,cg17067544,cg17727071,cg21429551,cg21807944,cg23447239,cg23715104,cg25879142,cg26153045,cg26391564,cg00285394,cg00327383,cg00399059,cg07583137,cg08090164,cg10210397,cg13021857,cg13631444,cg16196274,cg17903548,cg19021188,cg20988565,cg22861548,cg23180489,cg24554944,cg25305703,cg25329685,cg25483741,cg25923729,cg27039118,cg00008629,cg01028796,cg07158339,cg10173814,cg13040392,cg13474639,cg13619074,cg13741668,cg13959831,cg23006204,cg23469878,cg00045910,cg00639447,cg03962527,cg06012872,cg06812574,cg09421083,cg10556349,cg21024264,cg24711336,cg24892069,cg27290215,cg27401945,cg01074365,cg02315732,cg04993130,cg05586607,cg06344265,cg10253640,cg11025604,cg12738765,cg19172170,cg20648141,cg01349368,cg03124318,cg04671742,cg05157098,cg06021880,cg15110296,cg17725129,cg20051875,cg23256579,cg25486399,cg00509187,cg00893603,cg01205935,cg01783816,cg02838877,cg04887675,cg12603632,cg02222791,cg03459776,cg10172979,cg13158272,cg18485215,cg19590421,cg22816294,cg27056759,cg04134722,cg10403394,cg11032707,cg23299445,cg27340001,cg00876127,cg01046070,cg01259782,cg03809021,cg04155793,cg05107535,cg05917111,cg07839457,cg08486507,cg09958192,cg21996068,cg02228185,cg03745383,cg07169660,cg12317815,cg18181703,cg19758448,cg24079381,cg00495303,cg07529654,cg24217948,cg00706441,cg00789427,cg03562414,cg06581818,cg08233235,cg09547119,cg10635122,cg13640414,cg13672791,cg19536401,cg20964856,cg22693863,cg25481705,cg25599567,cg01479232,cg01581757,cg01618988,cg04931539,cg10339152,cg10743062,cg11853697,cg11861654,cg11991566,cg17129400,cg18965930,cg20810198,cg25829531,cg22862003,cg01234420,cg01346718,cg04986324.
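The DMP thresholds of step (2) can be sketched as below. Two points are assumptions, since the document does not specify them: the p-value adjustment is taken to be Benjamini-Hochberg, and the fold-change threshold is applied to the absolute value. The probe names and numbers are placeholders, not data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_dmps(probes, min_abs_lfc=0.05, max_adj_p=0.05):
    """probes: list of (probe_id, log_fold_change, raw_p_value) tuples."""
    adj = benjamini_hochberg([p for _, _, p in probes])
    return [pid for (pid, lfc, _), q in zip(probes, adj)
            if abs(lfc) > min_abs_lfc and q < max_adj_p]


# Placeholder probes: (id, log fold change, raw p-value).
probes = [
    ("probe_a", 0.10, 0.001),
    ("probe_b", -0.08, 0.002),
    ("probe_c", 0.01, 0.03),   # significant but effect too small
    ("probe_d", 0.30, 0.5),    # large effect but not significant
]
print(select_dmps(probes))
```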

(3) The 26 + 319 features from the first screening step were combined, and a second step of feature screening was performed using the lasso algorithm. The lasso adds the L1 norm of the coefficients to the loss function as a first-order penalty term, which shrinks the parameters, makes the resulting coefficient vector sparse, and compresses the coefficients of weak variables to exactly 0, thereby performing feature selection. Writing the loss as $L(\beta)$ and the penalty weight as $\lambda$, the objective function to be minimized is:

$$\min_{\beta} \; L(\beta) + \lambda \lVert \beta \rVert_1$$

In this algorithm, the hyper-parameter family is set to "binomial", type.measure is set to "auc"/"class", and nfolds is set to 10; finally, 80 combined features are obtained.
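To illustrate how an L1 penalty drives weak coefficients to exactly 0, the sketch below fits an L1-penalized logistic regression by proximal gradient descent. This solver is a stand-in chosen for brevity and is not the routine used in the document; the function names, the learning rate, the iteration count and the toy data are all assumptions.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward 0, clipping small values to 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X, y, lam, lr=0.1, n_iter=500):
    """L1-penalized logistic regression; the intercept b0 is not penalized."""
    n, d = len(X), len(X[0])
    beta, b0 = [0.0] * d, 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + sum(bj * xj for bj, xj in zip(beta, row)))))
             for row in X]
        grad0 = sum(pi - yi for pi, yi in zip(p, y)) / n
        grad = [sum((pi - yi) * row[j] for pi, yi, row in zip(p, y, X)) / n
                for j in range(d)]
        b0 -= lr * grad0
        # Gradient step on the log loss, then soft-thresholding for the L1 term.
        beta = [soft_threshold(bj - lr * gj, lr * lam) for bj, gj in zip(beta, grad)]
    return b0, beta


# Toy data: the first column separates the classes, the second is weak.
X = [[1.0, 0.2], [2.0, -0.1], [-1.0, 0.1], [-2.0, -0.2]]
y = [1, 1, 0, 0]
b0, beta = lasso_logistic(X, y, lam=0.05)
print(beta)  # the second coefficient is exactly 0.0, the first is positive
```

Features whose coefficient ends at 0 are the ones a lasso-based screen would discard; a larger `lam` removes more of them.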

(4) The Gradient Boosting Decision Tree (GBDT) algorithm was then used. Each new tree is grown by repeated feature splitting to fit the residuals between the predicted and actual values of the previous round of models, so that the model achieves the best effect, while a regularization term limiting tree complexity further prevents overfitting. In this algorithm, the hyper-parameters are set as follows: objective = "binary:logistic", booster = "gbtree", eval_metric = "error", nrounds = 7, eta = 0.5, max_depth = 3, subsample = 0.5, colsample_bytree = 1, min_child_weight = 2, and gamma = 0.5. Finally, 30 features were obtained, as follows:

age, whether diuretics are taken, BMI, urinary albumin, serum creatinine, and the methylation sites cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
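The idea of fitting each new tree to the residuals of the previous round can be sketched with a deliberately simplified booster: one feature, depth-1 trees (stumps) and a squared-error loss. It is not the regularized logistic-objective tree booster configured in step (4); only the round count of 7 and the learning rate of 0.5 are taken from the text, and the function names and toy data are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=7, eta=0.5):
    """Gradient boosting with stumps; each round fits the current residuals."""
    pred = [0.0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        # Shrink each tree's contribution by the learning rate eta.
        pred = [p + eta * (left_mean if x <= t else right_mean)
                for x, p in zip(xs, pred)]
    return stumps, pred


xs = [1, 2, 3, 4]
ys = [0, 0, 1, 1]
stumps, pred = boost(xs, ys)
print(pred)
```

In a tree booster, features that are never chosen for a split contribute nothing to the prediction, which is what makes the fitted model usable as a feature screen.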

(5) In order to better reflect the impact of interactions between features on the risk of HFpEF, the 30 features obtained from feature screening were modeled using the DeepFM algorithm. The DeepFM neural network integrates a factorization machine (FM) and a deep neural network (DNN): the FM models low-order feature interactions and the DNN models high-order feature interactions. Cross features are learned in the form of dot products of latent vectors, so DeepFM learns cross features automatically and is an end-to-end model that extracts complex features from the original features. DeepFM trains efficiently because its wide and deep parts, although different, share the same input and embedding vectors. DeepFM takes the DNA methylation and EHR features and learns the feature combinations hidden behind them. The whole network is trained jointly in an end-to-end manner, and the combined output is finally fed into a sigmoid function for early prediction of HFpEF events:

$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$

where $\hat{y} \in (0, 1)$ is the predicted HFpEF event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component. The FM component and the DNN component are a factorization machine and a feed-forward neural network, used for low-order and high-order feature interactions respectively. The FM models order-2 feature interactions as the inner product of the latent vectors of the respective features, and can capture order-2 feature interactions more effectively than previous methods, especially when the data set is sparse. The output of the FM is the sum of an addition unit and a number of inner product units:

$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$

where $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$ ($k$ is given). The addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the influence of order-2 feature interactions. The deep component DNN is a feed-forward neural network for learning higher-order feature interactions. The output of the DNN is:

$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$

where $|H|$ is the number of hidden layers, $a^{(l)}$ is the layer output (starting from the output of the embedding layer), $W^{(l)}$ is the model weight, and $b^{(l)}$ is the bias of the $l$-th layer. For the deep hidden layers, a deep neural network containing two hidden layers (256 ) is implemented using ReLU as the activation function:

$$y = f(x) = \mathrm{relu}(wx + b)$$

Log loss is used as the objective function. To control overfitting, an L2 regularization penalty was added on the nodes, with the parameter set to 0.0001. To optimize the neural network, batch normalization and weight decay were employed. The embedding size was set to 8, the batch size to 300, and the decay to 0.9. To train the DeepFM algorithm, Adam was used as the optimization algorithm, the learning rate was set to 0.0001, the number of epochs to 400, and dropout to 60%; the performance of the DeepFM model was evaluated using three-fold cross-validation.
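The forward pass described in step (5) can be sketched in plain Python as follows. This is a toy illustration, not the trained model: it uses 2 input features, an embedding size of 2 and hidden layers of width 2 instead of the 30 features, embedding size 8 and wider hidden layers of the document, all weights are made-up numbers, and training, dropout, batch normalization and regularization are omitted. The function names are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fm_output(x, w, V):
    """Order-1 term <w, x> plus order-2 terms <V_i, V_j> x_i x_j for i < j."""
    first_order = dot(w, x)
    second_order = sum(dot(V[i], V[j]) * x[i] * x[j]
                       for i in range(len(x)) for j in range(i + 1, len(x)))
    return first_order + second_order


def dnn_output(a0, hidden_layers, w_out, b_out):
    """Feed-forward pass with ReLU hidden layers and a linear output unit."""
    a = a0
    for W, b in hidden_layers:
        a = [max(0.0, dot(row, a) + bias) for row, bias in zip(W, b)]
    return dot(w_out, a) + b_out


def deepfm_predict(x, w, V, hidden_layers, w_out, b_out):
    # Shared embeddings: each feature value scales its latent vector V_i,
    # and the concatenation is the input a0 of the deep component.
    a0 = [xi * vk for xi, vi in zip(x, V) for vk in vi]
    logit = fm_output(x, w, V) + dnn_output(a0, hidden_layers, w_out, b_out)
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up toy parameters.
x = [1.0, 2.0]
w = [0.5, -0.25]
V = [[1.0, 0.0], [0.5, 1.0]]
hidden_layers = [
    ([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, -1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
w_out, b_out = [-1.0, 3.0], 0.0
print(fm_output(x, w, V))                                      # 1.0
print(deepfm_predict(x, w, V, hidden_layers, w_out, b_out))    # 0.5
```

The same latent vectors `V` feed both the FM term and the deep component, which is the sharing of input and embedding vectors that the text refers to.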

(6) Finally, the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy were used to assess the discriminative ability of all applied models. The receiver operating characteristic (ROC) curve is drawn over a series of thresholds with the true positive rate (TPR, i.e. sensitivity) as the ordinate and the false positive rate (FPR, i.e. 1 - specificity) as the abscissa, and reflects how TPR and FPR change under different thresholds; the closer the curve is to the upper left corner, the better the classification performance of the model. Sensitivity is the proportion of all positive examples that are correctly classified, and measures the ability of the classifier to recognize positive examples. Specificity is the proportion of all negative examples that are correctly classified, and measures the ability of the classifier to recognize negative examples. Accuracy is the most common evaluation index and describes the ability of the classifier over the whole data set, judging positives as positive and negatives as negative: it is the number of correctly classified samples divided by the number of all samples, and in general the higher the accuracy, the better the classifier.
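The four measures in step (6) can be computed as in the following sketch. The AUC is obtained here from its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, ties counting one half), which equals the area under the ROC curve. The function names, the 0.5 threshold and the toy scores are assumptions for illustration.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Return (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # 1 - false positive rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(labels, scores):
    """Area under the ROC curve via pairwise positive/negative comparisons."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(metrics(*confusion_counts(labels, scores)))
print(auc(labels, scores))  # 0.75
```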

Example 2 model verification

171 sampled participants without cardiovascular disease were collected. After 8 years of follow-up, 139 participants had no heart failure and 32 participants were diagnosed with HFpEF; these data were used as the test set to evaluate the method.

The model was assessed for discrimination and calibration using the AUC and the Hosmer-Lemeshow test. The following table is a confusion matrix for the 30 features used in the method in the test set:

In the test set, the AUC obtained using this model was 0.90 (95% confidence interval, 0.89-0.90). The calibration curve obtained by the model is shown in FIG. 4, and the Hosmer-Lemeshow test was not statistically significant (P = 0.678), supporting the reliability of the result.

To assess the impact of training set sample size on this model, 75%, 60%, 50% and 25% of the training set participants were randomly selected. The validation set and test set results are as follows (95% confidence intervals in parentheses):

Sample size	Participants	AUC	Sensitivity	Specificity	Accuracy
25%	Validation set	0.90 (0.76-1.00)	0.89 (0.57-1.00)	0.93 (0.84-1.00)	0.92 (0.86-0.99)
25%	Test set	0.88 (0.84-0.91)	0.82 (0.76-0.88)	0.84 (0.66-1.00)	0.84 (0.69-0.98)
50%	Validation set	0.87 (0.79-0.95)	0.86 (0.62-1.00)	0.79 (0.61-0.98)	0.80 (0.65-0.95)
50%	Test set	0.89 (0.87-0.90)	0.85 (0.80-0.91)	0.87 (0.75-0.98)	0.86 (0.78-0.95)
60%	Validation set	0.92 (0.87-0.96)	0.89 (0.82-0.95)	0.84 (0.69-0.99)	0.85 (0.71-0.98)
60%	Test set	0.89 (0.88-0.89)	0.83 (0.77-0.89)	0.85 (0.72-0.98)	0.86 (0.75-0.94)
75%	Validation set	0.90 (0.86-0.95)	0.84 (0.57-1.00)	0.87 (0.66-1.00)	0.87 (0.70-1.00)
75%	Test set	0.88 (0.85-0.91)	0.76 (0.70-0.82)	0.90 (0.82-0.98)	0.87 (0.81-0.93)

The results show that test set performance is largely insensitive to the sample size of the training set.

In addition, the performance of the HFrisk model was compared with that of widely used reference models such as SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost and LogitBoost. Each reference model was fine-tuned to obtain better results. Using the same 30 features, the AUC results shown in FIG. 2 were obtained. Although the AUCs of the reference models differed (AUC 0.57-0.88), the DeepFM model still performed best.

This model was also compared with other published models. William B. Kannel et al. proposed a 4-year risk assessment model (Profile for Estimating Risk of Heart Failure [J]. Archives of Internal Medicine, 1999, 159(11): 1197-. The William B. Kannel model was built on the same training set and compared with the model constructed herein. The AUCs obtained by the model of the invention were 0.99 for males and 0.94 for females; the AUCs of the William B. Kannel model were 0.74 for males and 0.89 for females. The results are shown in FIG. 3.

The AUC or C-statistic of this model was also compared directly with those of published models: Sadiya S. Khan et al. described 10-year CHF risk equations (Khan S S, Ning H, Shah S J, et al. 10-Year Risk Equations for Incident Heart Failure in the General Population [J]. Journal of the American College of Cardiology, 2019, 73(19): 2388-; Edward Choi et al. established an early detection model for CHF with a test set AUC < 0.80. The results indicate that the present method performs best among the compared models.

Calibration, i.e. the agreement between predicted and actual values, is often evaluated using a calibration plot, a scatter plot of the observed against the predicted event rate that visualizes the result of the Hosmer-Lemeshow goodness-of-fit test. The observed outcomes and the model-fitted probabilities were passed to a hoslem. routine to carry out the test. The results are shown in FIG. 4.
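A plain-Python sketch of the Hosmer-Lemeshow statistic is given below for orientation. It is not the routine used in the document: the function name, the default of 10 risk groups and the toy data are assumptions, and the p-value (which would come from a chi-square distribution with the number of groups minus 2 degrees of freedom) is not computed. The sketch assumes at least as many samples as groups and an expected count strictly between 0 and the group size in every group.

```python
def hosmer_lemeshow(outcomes, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic over equal-sized risk groups."""
    # Sort samples by predicted probability, then cut into consecutive groups.
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # expected events in the group
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat


# Toy example with two risk groups.
probs = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
well_calibrated = [0, 0, 0, 1, 1, 1, 1, 0]
over_confident = [0, 0, 0, 0, 1, 1, 1, 1]
print(hosmer_lemeshow(well_calibrated, probs, groups=2))   # 0.0
print(hosmer_lemeshow(over_confident, probs, groups=2))
```

A small statistic (and hence a large p-value, such as the P = 0.678 reported here) indicates no detectable disagreement between observed and predicted risk.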

Claims

1. A marker combination for predicting risk of ejection fraction-preserved heart failure, comprising:

a beta value for a methylation site, said methylation site comprising: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910;

the age of the patient;

whether the patient takes diuretic drugs;

BMI;

urinary albumin;

serum creatinine.

2. A model for predicting risk of ejection fraction-preserved heart failure, wherein information collected by said model comprises the following combinations of markers:

the age of the patient; whether the patient takes diuretic drugs; BMI; urinary albumin; serum creatinine; and the beta values of methylation sites.

3. The model for predicting risk of ejection fraction-preserved heart failure as set forth in claim 2, wherein the model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.

4. The model of claim 3, wherein the model is built by feeding the combined output of the DeepFM algorithm into a sigmoid function for early prediction of the ejection fraction-preserved heart failure event, and the output is:

$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$

where $\hat{y} \in (0, 1)$ is the predicted ejection fraction-preserved heart failure event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component.

5. The model of claim 4, wherein the output of the FM component is the sum of an addition unit and a number of inner product units, and the output of the FM component is:

$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$

where $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$ ($k$ is given); the addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the influence of order-2 feature interactions.

6. The model for predicting risk of ejection fraction-preserved heart failure as claimed in claim 4, wherein the output of said DNN component is:

$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$

7. The model for predicting risk of ejection fraction-preserved heart failure as recited in claim 5, wherein, for the deep hidden layers, a deep neural network comprising two hidden layers (256 ) is implemented using ReLU as the activation function:

$$y = f(x) = \mathrm{relu}(wx + b)$$

8. The model of claim 2, wherein the performance of the DeepFM model is evaluated by three-fold cross-validation, using Adam as the optimization algorithm, with the learning rate set to 0.0001, the number of epochs set to 400, and dropout set to 60%.

9. Use of a marker combination according to claim 1 and/or a model according to any one of claims 2-8 in the manufacture of a medicament and/or a kit related to heart failure with preserved ejection fraction.

10. Use of a marker combination according to claim 1 and/or a model according to any one of claims 2-8 in the construction of an early prediction model for other chronic complex diseases.