CN113421648A - Model for predicting risk of ejection fraction retention type heart failure - Google Patents

Info

Publication number
CN113421648A
Authority
CN
China
Prior art keywords
model
heart failure
ejection fraction
output
deep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110686960.8A
Other languages
Chinese (zh)
Other versions
CN113421648B (en)
Inventor
方向东
赵学彤
渠鸿竹
董蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Genomics of CAS
Original Assignee
Beijing Institute of Genomics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Genomics of CAS
Priority to CN202110686960.8A
Priority claimed from CN202110686960.8A
Publication of CN113421648A
Application granted
Publication of CN113421648B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Abstract

The invention provides a model for predicting the risk of heart failure with preserved ejection fraction (HFpEF). The information collected by the model comprises the following marker combination: the age of the patient; whether diuretic drugs are taken; BMI; urinary albumin; serum creatinine; and the beta values of 25 methylation sites. All patients are evaluated clinically by combining an end-to-end machine learning model with multi-omics data interaction so as to identify their risk, and the occurrence of HFpEF is delayed or prevented by controlling risk factors, treating asymptomatic left ventricular systolic dysfunction, and the like. The model provided by the invention has good prospects for clinical application.

Description

Model for predicting risk of ejection fraction retention type heart failure
Technical Field
The invention belongs to the field of disease diagnosis models, and particularly relates to a model for predicting ejection fraction retention type heart failure risk.
Background
In recent years, heart failure morbidity and mortality have risen year by year. Heart failure is an abnormal change in cardiac structure or function caused by a complex interaction of genetic, neurohormonal, metabolic, inflammatory, and other biochemical factors. Chronic heart failure (CHF) is characterized by disturbed myocardial energy metabolism and metabolic remodeling, and carries high morbidity and mortality. Accurate individualized risk assessment is therefore essential to support clinical decision-making.
Three subtypes of chronic heart failure are currently recognized, classified by left ventricular ejection fraction (LVEF): heart failure with reduced ejection fraction (HFrEF), heart failure with mid-range ejection fraction (HFmrEF), and heart failure with preserved ejection fraction (HFpEF). The three subtypes differ greatly in etiology and pathophysiology, and early prediction of HFpEF remains challenging. An early prediction model for HFpEF is important for risk assessment, management, and clinical decision-making in heart failure: it allows diet and lifestyle to be controlled in time for patients at high risk of HFpEF, in keeping with the principle of precision prevention.
Heart failure is a multifactorial disease that results from the combined action of genetic and environmental factors. Epigenetic mechanisms such as DNA methylation are involved in regulating myocardial fibrosis, cause disordered myocardial energy metabolism and abnormal amino acid metabolism, transport, and activation, promote the development of cardiovascular disease, and influence individual disease risk; they are among the pathophysiological causes of HFpEF. The development of HFpEF is also closely related to clinical factors; for example, the risk of HFpEF onset increases sharply with age. DNA methylation and clinical features describe the disease state along different dimensions and interact with each other.
In the prior art, Sadiya S. Khan et al. developed a 10-year heart failure risk model that includes 10 clinical features but does not address the pathogenesis or the different subtypes of CHF. The method cannot learn feature interactions and considers only clinical features, without regard to epigenetic factors (Khan SS, Ning H, Shah SJ et al. 10-Year Risk Equations for Incident Heart Failure in the General Population. J. Am. Coll. Cardiol. 2019; 73: 2388-).
William B. Kannel et al. developed a 4-year heart failure risk assessment model in which the risk of heart failure is estimated with a logistic function of 9 clinical features. The method has the same problems as the model developed by Khan et al.: it cannot learn feature interactions and considers only clinical features, without regard to epigenetic factors (Kannel WB, D'Agostino RB, Silbershatz H et al. Profile for estimating risk of heart failure. Arch. Intern. Med. 1999; 159: 1197-).
Edward Choi et al. established an early detection model for CHF that predicts a future CHF diagnosis by modeling the temporal relationships among health data in a patient's Electronic Health Record (EHR). The method focuses only on clinical features and does not take into account the influence of epigenetic factors on future heart failure events (Choi E, Schuetz A, Stewart WF et al. Using recurrent neural network models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 2016; 24: 361-).
No HFpEF risk prediction model has been developed that integrates clinical and epigenetic features. Therefore, the development of an accurate and comprehensive HFpEF risk prediction method is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
To solve the above problems, the present invention provides a model for predicting the risk of heart failure with preserved ejection fraction (HFpEF). The model combines an end-to-end machine learning model with multi-omics data interaction to evaluate all patients clinically and identify their HFpEF risk, so that the occurrence of HFpEF can be delayed or prevented by controlling HFpEF risk factors, treating asymptomatic left ventricular systolic dysfunction, and the like.
The present invention collected 97 clinical diagnosis and treatment variables from 797 sampled participants without cardiovascular disease, together with data for 402,380 DNA methylation sites from the DNA methylation 450K chip. After 8 years of follow-up, 738 participants showed no signs of heart failure and 59 participants were diagnosed with HFpEF. These data were used as the training set to obtain a marker combination for building a model that predicts the risk of HFpEF.
Terms:
BMI: Body Mass Index, i.e. body weight (kg) divided by the square of height (m), in kg/m^2.
EHR: electronic health record.
AUC: area under the curve.
FM: factorization machine.
DNN: deep neural network.
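As a small illustration of the BMI definition above, the following sketch computes the index from weight and height. The function name and the sample values are illustrative assumptions, not part of the source.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index in kg/m^2: body weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)


# Hypothetical participant: 70 kg, 1.75 m.
example_bmi = bmi(70.0, 1.75)
print(round(example_bmi, 2))
```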
In one aspect, the invention provides a marker combination for predicting risk of ejection fraction-preserved heart failure.
The marker combination comprises:
the age of the patient;
whether to take diuretic drugs;
BMI;
urinary albumin;
serum creatinine;
beta value of methylation site.
The methylation sites include: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
The diuretic drugs include but are not limited to: furosemide (Lasix), metolazone, chlorthalidone, and torasemide (torsemide).
In another aspect, the invention provides a model for predicting risk of ejection fraction-preserved heart failure.
The information collected by the model comprises a marker combination obtained by screening the following characteristics:
the age of the patient; whether diuretic drugs are taken; BMI; urinary albumin; serum creatinine; methylation sites: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
The model is built by means of a data mining algorithm.
The model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.
The model is built so that the DeepFM algorithm finally feeds a sigmoid function for early prediction of the HFpEF event, and the output is:
$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$
where $\hat{y}$ is the predicted HFpEF event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component.
The output of the FM component is the sum of an addition unit and a number of inner product units:
$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i \cdot x_j$$
where $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$ ($k$ is given). The addition unit $\langle w, x \rangle$ reflects the importance of the order-1 features, and the inner product units represent the influence of the order-2 feature interactions.
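The FM output described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the function names and the toy values of `w`, `V`, and `x` are made up. The second function uses the standard algebraic identity for factorization machines, which gives the same pairwise term in O(k·d) time instead of O(k·d²).

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def fm_output_naive(w, V, x):
    """y_FM = <w, x> + sum over i < j of <V_i, V_j> * x_i * x_j (direct double loop)."""
    d = len(x)
    y = dot(w, x)
    for i in range(d):
        for j in range(i + 1, d):
            y += dot(V[i], V[j]) * x[i] * x[j]
    return y


def fm_output_fast(w, V, x):
    """Same value, using for each latent dimension f:
    sum_{i<j} V_if V_jf x_i x_j = 0.5 * ((sum_i V_if x_i)^2 - sum_i (V_if x_i)^2)."""
    y = dot(w, x)
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        y += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return y


# Toy example (hypothetical values): d = 3 features, k = 2 latent dimensions.
w = [0.1, -0.2, 0.3]
V = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
x = [1.0, 2.0, 3.0]
print(fm_output_naive(w, V, x), fm_output_fast(w, V, x))
```

Both functions return the same value for any input; the fast form is the one usually preferred when the number of features is large.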
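The output of the DNN component is:
$$y_{DNN} = W^{(|H|+1)} \cdot a^{(|H|)} + b^{(|H|+1)}$$
where $|H|$ is the number of hidden layers, $a^{(l)}$ is the output of layer $l$ (the output of the embedding layer feeds the first hidden layer), $W^{(l)}$ is the model weight, and $b^{(l)}$ is the bias of the $l$-th layer.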
Specifically, the deep component is implemented as a deep neural network with two hidden layers (256) that use ReLU as the activation function:
y = f(x) = relu(wx + b).
Log loss is used as the objective function. To control overfitting, an L2 regularization penalty is added on the nodes, with the parameter set to 0.0001.
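The deep component and the final sigmoid can be sketched as follows. This is a toy illustration under stated assumptions: the helper names are hypothetical, the network is tiny (two hidden layers of 2 units rather than the 256 reported in the text), the weights are hand-picked rather than trained, and the L2 penalty is applied to the weights, which is one common reading of the regularization described above.

```python
import math


def relu(z):
    """Element-wise ReLU: max(0, v)."""
    return [max(0.0, v) for v in z]


def affine(W, a, b):
    """Compute W·a + b for a weight matrix W (list of rows), input a, and bias b."""
    return [sum(w * v for w, v in zip(row, a)) + bias for row, bias in zip(W, b)]


def dnn_output(a0, hidden_layers, W_out, b_out):
    """Pass a0 through the ReLU hidden layers, then apply the linear output unit."""
    a = a0
    for W, b in hidden_layers:
        a = relu(affine(W, a, b))           # y = relu(Wx + b)
    return affine([W_out], a, [b_out])[0]   # y_DNN = W^(|H|+1)·a^(|H|) + b^(|H|+1)


def deepfm_probability(y_fm, y_dnn):
    """Predicted event probability: sigmoid(y_FM + y_DNN)."""
    return 1.0 / (1.0 + math.exp(-(y_fm + y_dnn)))


def l2_penalty(weight_matrices, lam=0.0001):
    """L2 regularization term (assumption: applied to the layer weights)."""
    return lam * sum(w * w for W in weight_matrices for row in W for w in row)


# Hand-picked toy weights, for illustration only.
hidden = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[2.0, 1.0], [-1.0, 0.0]], [0.5, 0.0]),
]
y_dnn = dnn_output([1.0, -1.0], hidden, W_out=[1.0, 1.0], b_out=-0.5)
risk = deepfm_probability(y_fm=-2.0, y_dnn=y_dnn)
print(y_dnn, risk)
```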
To optimize the deep neural network, batch normalization and weight decay are used; the embedding size is set to 8, the batch size to 300, and the decay to 0.9.
Preferably, the performance of the DeepFM model is evaluated with three-fold cross-validation; Adam is used as the optimization algorithm, with the learning rate set to 0.0001, the number of epochs to 400, and dropout to 60%.
Preferably, the discriminative ability of the DeepFM model is assessed by the area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, and accuracy.
In a further aspect, the invention provides the use of the aforementioned marker combination and/or the aforementioned model in the manufacture of a medicament and/or a kit related to heart failure with preserved ejection fraction.
In a further aspect, the invention provides the use of the aforementioned marker combination and/or the aforementioned model in the construction of early prediction models for other chronic complex diseases.
The beneficial effects of the invention are as follows:
Starting from 5 clinical diagnosis features and 25 epigenetic features (DNA methylation sites), an early risk assessment method for HFpEF is established using the DeepFM algorithm. The method extracts complex features from the original features and models the interactions among them, and its results are superior to those of existing models and of other baseline machine learning models.
Compared with widely used reference models such as SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost, and LogitBoost, the DeepFM model adopted by the method performs best.
Drawings
FIG. 1 shows the AUC of predictions based on different feature sets in the test set, where the HFrisk model contains both EHR features and DNA methylation features, the 25CpG model contains only DNA methylation features, and the 5EHR model contains only EHR features.
FIG. 2 shows the AUC results in the test set for the baseline models and for this method, all using the 30 features.
FIG. 3 shows the AUC results of this method and of the William B. Kannel model in male and female participants.
FIG. 4 is a calibration chart of observed risk versus predicted risk used to evaluate the present method; the Hosmer-Lemeshow test shows no statistically significant difference (P = 0.678).
Detailed Description
The present invention is further illustrated in detail with reference to the following specific examples, which are merely illustrative and are not intended to limit the invention. Unless otherwise specified, the experimental methods used in the following examples follow usual conditions, and the materials, reagents, and the like are commercially available.
Example 1 Method for constructing a model for predicting the risk of heart failure with preserved ejection fraction
97 clinical variables were collected from 797 sampled participants without cardiovascular disease, together with data for 402,380 DNA methylation sites from the DNA methylation 450K chip. After 8 years of follow-up, 738 participants showed no signs of heart failure and 59 participants were diagnosed with HFpEF; these data were used as the training set.
The DNA methylation features and the clinical diagnosis and treatment features were screened step by step using a three-step feature selection method.
(1) For clinical diagnosis and treatment features, incomplete and non-significant features were removed from the training set using the following thresholds: missing values in more than 20% of samples, or a p-value > 0.05 in the chi-square test/Mann-Whitney U test between the two groups of samples. If the Pearson correlation between two clinical features was greater than 0.8, the feature with the smaller Spearman correlation with HFpEF was discarded. Finally, 26 clinical diagnosis features were retained:
diuretic, beta-blocker, angiotensin II antagonist, ACEI, aspirin, cardiovascular disease, coronary heart disease, atrial fibrillation, sex, age, BMI, serum creatinine, diastolic blood pressure, fasting plasma glucose, high-density lipoprotein cholesterol, systolic blood pressure, total cholesterol, hypertension, lipid disease, atrial enlargement, rheumatism, aortic valve disease, rheumatoid arthritis, urinary albumin, whole blood hemoglobin A1c, C-reactive protein.
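The missingness and correlation filters of step (1) can be sketched as below. This is an illustrative sketch under several assumptions: the function names and the toy data are hypothetical, missing values are encoded as `None`, correlations are computed on pairwise-complete samples, the strength of association with the outcome is taken as the absolute Spearman correlation, and the significance tests (chi-square/Mann-Whitney U) are omitted.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def paired(x, y):
    """Keep only samples where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def prune(features, outcome, max_missing=0.20, max_corr=0.8):
    # Drop features missing in more than 20% of samples.
    kept = {k: v for k, v in features.items()
            if sum(e is None for e in v) / len(v) <= max_missing}
    names, dropped = list(kept), set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if pearson(*paired(kept[a], kept[b])) > max_corr:
                # Discard the feature less associated with the outcome.
                ra = abs(spearman(*paired(kept[a], outcome)))
                rb = abs(spearman(*paired(kept[b], outcome)))
                dropped.add(a if ra < rb else b)
    return [n for n in names if n not in dropped]


# Hypothetical toy data: 6 participants, outcome 1 = event.
outcome = [0, 0, 0, 1, 1, 1]
features = {
    "sbp": [110, 120, 130, 140, 150, 160],
    "map": [80, 85, 99, 92, 104, 111],       # highly correlated with "sbp"
    "age": [50, 70, 55, 65, 60, 72],
    "crp": [None, None, 1.2, 0.8, None, 2.0],  # 50% missing
}
print(prune(features, outcome))
```

In this toy data, "crp" is removed for missingness and "map" is removed because it is highly correlated with "sbp" but less associated with the outcome.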
(2) For methylation features, differentially methylated probes (DMPs) were identified in the training set using log fold change > 0.05 and adjusted p-value < 0.05 as thresholds, yielding 319 DNA methylation sites:
cg01074797,cg10501210,cg15084543,cg16836311,cg16986315,cg26553501,cg00522231,cg02577387,cg04864807,cg20698421,cg22454769,cg01227537,cg15028458,cg17280346,cg02650266,cg04074536,cg08288016,cg04675542,cg05016408,cg05845376,cg06883126,cg10530883,cg16227623,cg18839637,cg19025234,cg23479922,cg03738025,cg08476511,cg16867657,cg18319852,cg22367191,cg23737190,cg24661236,cg09124496,cg09125127,cg07041999,cg12873476,cg14620572,cg10187894,cg17766026,cg24205914,cg24530234,cg01178624,cg15243034,cg21481937,cg01974091,cg01150270,cg18567924,cg01295034,cg04358214,cg08101977,cg08151623,cg09259081,cg09684846,cg16263848,cg17759274,cg19344626,cg24794228,cg25755428,cg00375983,cg00815832,cg01128109,cg01588224,cg02998240,cg03341469,cg03655142,cg05363438,cg06794355,cg06829788,cg07388493,cg08128734,cg08702915,cg08893087,cg09362335,cg13352914,cg16000360,cg16265542,cg22559013,cg23371584,cg00057240,cg00190206,cg03233656,cg03879180,cg05085636,cg05203213,cg05365735,cg05481257,cg05951221,cg09128529,cg10106284,cg10687131,cg10835286,cg11237792,cg11807280,cg12682972,cg13149736,cg17534916,cg19578183,cg22943590,cg25138327,cg25552548,cg27141850,cg02356435,cg03556243,cg10098541,cg11580026,cg12615982,cg14039937,cg14975410,cg19729744,cg23450509,cg01950474,cg02188818,cg02810967,cg03063309,cg03671075,cg04573661,cg07456878,cg07809027,cg07974833,cg08462122,cg11076306,cg11970349,cg15106030,cg16781992,cg18797590,cg20816447,cg02051771,cg04071270,cg05955210,cg07705913,cg08635765,cg08763461,cg11925729,cg14698665,cg15421911,cg23125993,cg23500537,cg24856658,cg26624398,cg27604145,cg02872426,cg03142554,cg03785755,cg04064963,cg05220968,cg06126421,cg06386482,cg06951627,cg09548084,cg10083824,cg11342453,cg11617964,cg12560772,cg13221458,cg14441271,cg15319032,cg15804973,cg16004593,cg17608381,cg21572722,cg21855021,cg22707857,cg23049448,cg27368039,cg02756107,cg03068497,cg03453431,cg03799713,cg06890291,cg08614290,cg09423312,cg14017689,cg14485097,cg16446288,cg17067544,cg17727071,cg21429551,cg21807944,cg23447239,cg23715104,cg25879142,cg26153045,cg26391564,cg00285394,cg00327383,cg00399059,cg07583137,cg08090164,cg10210397,cg13021857,cg13631444,cg16196274,cg17903548,cg19021188,cg20988565,cg22861548,cg23180489,cg24554944,cg25305703,cg25329685,cg25483741,cg25923729,cg27039118,cg00008629,cg01028796,cg07158339,cg10173814,cg13040392,cg13474639,cg13619074,cg13741668,cg13959831,cg23006204,cg23469878,cg00045910,cg00639447,cg03962527,cg06012872,cg06812574,cg09421083,cg10556349,cg21024264,cg24711336,cg24892069,cg27290215,cg27401945,cg01074365,cg02315732,cg04993130,cg05586607,cg06344265,cg10253640,cg11025604,cg12738765,cg19172170,cg20648141,cg01349368,cg03124318,cg04671742,cg05157098,cg06021880,cg15110296,cg17725129,cg20051875,cg23256579,cg25486399,cg00509187,cg00893603,cg01205935,cg01783816,cg02838877,cg04887675,cg12603632,cg02222791,cg03459776,cg10172979,cg13158272,cg18485215,cg19590421,cg22816294,cg27056759,cg04134722,cg10403394,cg11032707,cg23299445,cg27340001,cg00876127,cg01046070,cg01259782,cg03809021,cg04155793,cg05107535,cg05917111,cg07839457,cg08486507,cg09958192,cg21996068,cg02228185,cg03745383,cg07169660,cg12317815,cg18181703,cg19758448,cg24079381,cg00495303,cg07529654,cg24217948,cg00706441,cg00789427,cg03562414,cg06581818,cg08233235,cg09547119,cg10635122,cg13640414,cg13672791,cg19536401,cg20964856,cg22693863,cg25481705,cg25599567,cg01479232,cg01581757,cg01618988,cg04931539,cg10339152,cg10743062,cg11853697,cg11861654,cg11991566,cg17129400,cg18965930,cg20810198,cg25829531,cg22862003,cg01234420,cg01346718,cg04986324.
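The thresholding used to select DMPs can be sketched as follows. The probe identifiers and numbers below are made up for illustration, the threshold on log fold change is applied to its absolute value (an assumption), and the Benjamini-Hochberg procedure is used for the adjusted p-values only as an assumed example, since the text does not name the adjustment method.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed adjustment method)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * n / rank)
        adjusted[i] = running
    return adjusted


def select_dmps(ids, logfc, pvals, min_abs_logfc=0.05, max_adj_p=0.05):
    """Keep probes passing both the fold-change and adjusted p-value thresholds."""
    adj = bh_adjust(pvals)
    return [pid for pid, fc, q in zip(ids, logfc, adj)
            if abs(fc) > min_abs_logfc and q < max_adj_p]


# Hypothetical probes and statistics.
ids = ["probe_a", "probe_b", "probe_c", "probe_d"]
logfc = [0.08, -0.12, 0.02, 0.30]
pvals = [0.01, 0.04, 0.03, 0.005]
print(select_dmps(ids, logfc, pvals))
```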
(3) The 26 + 319 screened features were combined, and a second round of feature screening was performed with the lasso algorithm. Lasso adds the L1 norm of the coefficients to the loss function as a first-order penalty term, which shrinks the parameters and makes the coefficient vector sparse: the coefficients of weak variables are compressed to exactly 0, so that features are selected by minimizing the objective function:
$$\min_{\beta} \; L(\beta) + \lambda \lVert \beta \rVert_1$$
where $L(\beta)$ is the loss function, $\beta$ is the coefficient vector, and $\lambda$ is the penalty weight.
In this algorithm, the hyper-parameter family was set to "binomial", type.measure to "auc"/"class", and nfolds to 10; 80 combined features were finally obtained.
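How an L1 penalty drives weak coefficients to exactly zero can be illustrated with a small proximal-gradient (ISTA) solver for L1-penalized logistic regression. This sketch swaps in a hand-written solver with a fixed penalty weight in place of the cross-validated fit described above; the function names, step size, and toy data are assumptions made for illustration.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrinks z toward 0, returning exactly 0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lasso_logistic(X, y, lam, lr=0.1, n_iter=500):
    """Minimise mean log loss + lam * ||beta||_1 by proximal gradient descent.
    The intercept b0 is not penalised."""
    n, d = len(X), len(X[0])
    beta, b0 = [0.0] * d, 0.0
    for _ in range(n_iter):
        err = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) - yi
               for row, yi in zip(X, y)]
        grad = [sum(e * row[j] for e, row in zip(err, X)) / n for j in range(d)]
        b0 -= lr * sum(err) / n
        beta = [soft_threshold(bj - lr * gj, lr * lam) for bj, gj in zip(beta, grad)]
    return b0, beta


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[-2.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [2.0, -0.1]]
y = [0, 0, 1, 1]
_, beta_weak = lasso_logistic(X, y, lam=0.05)
_, beta_strong = lasso_logistic(X, y, lam=1.0)
selected = [j for j, b in enumerate(beta_weak) if b != 0.0]
print(beta_weak, beta_strong, selected)
```

With the small penalty only the informative feature keeps a non-zero coefficient; with the large penalty every coefficient is compressed to zero, which is the behaviour the feature screening relies on.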
(4) A gradient boosting decision tree (GBDT) algorithm was then applied: each round fits the residual between the predicted and actual values of the previous round's model and keeps splitting on features to grow a tree, so that the model improves step by step, while a regularization term limits the complexity of the trees to further prevent overfitting. In this algorithm, the hyper-parameters were set as follows: objective "binary:logistic", booster "gbtree", eval_metric "error", nrounds 7, eta 0.5, max_depth 3, subsample 0.5, colsample_bytree 1, min_child_weight 2, and gamma 0.5. Finally, 30 features were obtained:
age, whether diuretics are taken, BMI, urinary albumin, serum creatinine, methylation sites: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
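The idea of fitting each round to the residuals of the previous round, and then reading off which features the trees actually split on, can be sketched with boosted decision stumps. This is a deliberately simplified stand-in and not the configuration reported above: it uses squared-error residuals and depth-1 trees, omits subsampling and regularization, and borrows only nrounds = 7 and eta = 0.5 from the text. All names and data are illustrative assumptions.

```python
def best_stump(X, residual):
    """Find the (feature, threshold) split that minimises squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        # Exclude the largest value so the right-hand side is never empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residual) if row[j] <= t]
            right = [r for row, r in zip(X, residual) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def boost(X, y, nrounds=7, eta=0.5):
    """Each round fits a stump to the current residuals and adds eta times its prediction."""
    pred = [0.0] * len(y)
    split_counts = [0] * len(X[0])
    for _ in range(nrounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = best_stump(X, residual)
        split_counts[j] += 1
        pred = [p + eta * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]
    return pred, split_counts


# Toy data (hypothetical): the label depends only on feature 0.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0], [5.0, 4.0], [6.0, 1.0]]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
pred, split_counts = boost(X, y)
selected = [j for j, c in enumerate(split_counts) if c > 0]
print(pred, split_counts, selected)
```

Features that are never used for a split receive zero importance and can be dropped, which is one common way a boosted-tree model is used for feature screening.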
(5) To better reflect the impact of feature interactions on HFpEF risk, the 30 features obtained from feature screening were modeled with the DeepFM algorithm. DeepFM is a neural network model that integrates a factorization machine (FM) with a deep neural network (DNN): the FM models low-order feature interactions and the DNN models high-order feature interactions. Cross features are learned in the form of dot products of latent vectors, so the model learns cross features automatically and, as an end-to-end model, extracts complex features directly from the original features. DeepFM trains efficiently because, although its wide part and deep part differ, they share the same input and the same embedding vectors. DeepFM takes the DNA methylation and EHR features and learns the feature combinations hidden behind them. The whole network is trained jointly in an end-to-end manner, and its result is finally fed into a sigmoid function for early prediction of HFpEF events; the output is as follows:
$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$
where $\hat{y}$ is the predicted HFpEF event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component. The FM component and the DNN component are a factorization machine and a feed-forward neural network, used for low-order and high-order feature interactions respectively. The FM models order-2 feature interactions as the inner products of the latent vectors of the respective features, and captures order-2 feature interactions more effectively than previous methods, especially when the data set is sparse. The output of the FM is the sum of an addition unit and a number of inner product units:
$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i \cdot x_j$$
where $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$ ($k$ is given). The addition unit $\langle w, x \rangle$ reflects the importance of the order-1 features, and the inner product units represent the influence of the order-2 feature interactions. The deep component DNN is a feed-forward neural network for learning higher-order feature interactions. The output of the DNN is:
$$y_{DNN} = W^{(|H|+1)} \cdot a^{(|H|)} + b^{(|H|+1)}$$
where $|H|$ is the number of hidden layers, $a^{(l)}$ is the output of layer $l$ (the output of the embedding layer feeds the first hidden layer), $W^{(l)}$ is the model weight, and $b^{(l)}$ is the bias of the $l$-th layer. The deep component is implemented as a deep neural network with two hidden layers (256) that use ReLU as the activation function:
y = f(x) = relu(wx + b).
Log loss is used as the objective function. To control overfitting, an L2 regularization penalty is added on the nodes, with the parameter set to 0.0001. To optimize the neural network, batch normalization and weight decay are used. The embedding size is set to 8, the batch size to 300, and the decay to 0.9. To train the DeepFM model, Adam is used as the optimization algorithm with the learning rate set to 0.0001, the number of epochs to 400, and dropout to 60%, and the performance of the DeepFM model is evaluated using three-fold cross-validation.
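The three-fold cross-validation and the training settings reported above can be organised as in the following sketch. The key names in `TRAINING_CONFIG` are hypothetical labels chosen for illustration (only the values come from the text), and stratifying the folds by outcome is an assumption; the text states only that three-fold cross-validation was used.

```python
import random

# Values as reported in the text; the key names are illustrative only.
TRAINING_CONFIG = {
    "hidden_layers": 2,
    "embedding_size": 8,
    "batch_size": 300,
    "decay": 0.9,
    "l2": 0.0001,
    "optimizer": "Adam",
    "learning_rate": 0.0001,
    "epochs": 400,
    "dropout": 0.6,
    "cv_folds": 3,
}


def stratified_folds(labels, k=3, seed=0):
    """Split sample indices into k folds with a similar class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# Class counts of the training cohort: 59 HFpEF cases and 738 non-cases.
labels = [1] * 59 + [0] * 738
folds = stratified_folds(labels, k=TRAINING_CONFIG["cv_folds"])
for f, val_idx in enumerate(folds):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    # A model would be trained on train_idx and evaluated on val_idx here.
    print(f, len(train_idx), len(val_idx), sum(labels[i] for i in val_idx))
```

With only 59 events among 797 participants, keeping the event ratio similar across folds avoids validation folds with very few cases.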
(6) Finally, the area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, and accuracy were used to assess the discriminative ability of all applied models. The ROC curve is drawn over a series of thresholds, with the true positive rate (TPR, i.e. sensitivity) as the ordinate and the false positive rate (FPR, i.e. 1 - specificity) as the abscissa; it reflects how the TPR and FPR change under different thresholds, and the closer the curve lies to the upper left corner, the better the classification performance of the model. Sensitivity is the proportion of all positive examples that are classified correctly, and measures the ability of the classifier to recognize positive examples. Specificity is the proportion of all negative examples that are classified correctly, and measures the ability of the classifier to recognize negative examples. Accuracy, the most common evaluation index, describes the ability of the classifier to judge the data as a whole (positives identified as positive and negatives as negative): it is the number of correctly classified samples divided by the total number of samples, and in general a higher accuracy indicates a better classifier.
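The four evaluation measures described above can be computed as in this sketch. The function names, confusion counts, and scores are made-up illustrations and are not results from the study. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),               # true positive rate
        "specificity": tn / (tn + fp),               # 1 - false positive rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = confusion_metrics(tp=8, fp=2, tn=18, fn=2)
auc = roc_auc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0])
print(metrics, auc)
```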
Example 2 model verification
171 sampled participants without cardiovascular disease were collected. After 8 years of follow-up, 139 participants showed no heart failure and 32 participants were diagnosed with HFpEF; these data were used as the test set to evaluate the method.
The discrimination and calibration of the model were assessed with the AUC and the Hosmer-Lemeshow test. The following table is the confusion matrix of the method on the test set using the 30 features:
[Table: confusion matrix of the 30-feature model on the test set; image not reproduced]
In the test set, the AUC obtained with this model was 0.90 (95% confidence interval, 0.89-0.90). The calibration curve obtained with the model is shown in FIG. 1, and the Hosmer-Lemeshow test shows no statistically significant difference (P = 0.678), supporting the reliability of the result.
To assess the impact of training set sample size on this model, 75%, 60%, 50% and 25% of the training set participants were randomly selected. The test set results are as follows:
Sample size  Dataset         AUC               Sensitivity       Specificity       Accuracy
25%          Validation set  0.90 (0.76-1.00)  0.89 (0.57-1.00)  0.93 (0.84-1.00)  0.92 (0.86-0.99)
25%          Test set        0.88 (0.84-0.91)  0.82 (0.76-0.88)  0.84 (0.66-1.00)  0.84 (0.69-0.98)
50%          Validation set  0.87 (0.79-0.95)  0.86 (0.62-1.00)  0.79 (0.61-0.98)  0.80 (0.65-0.95)
50%          Test set        0.89 (0.87-0.90)  0.85 (0.80-0.91)  0.87 (0.75-0.98)  0.86 (0.78-0.95)
60%          Validation set  0.92 (0.87-0.96)  0.89 (0.82-0.95)  0.84 (0.69-0.99)  0.85 (0.71-0.98)
60%          Test set        0.89 (0.88-0.89)  0.83 (0.77-0.89)  0.85 (0.72-0.98)  0.86 (0.75-0.94)
75%          Validation set  0.90 (0.86-0.95)  0.84 (0.57-1.00)  0.87 (0.66-1.00)  0.87 (0.70-1.00)
75%          Test set        0.88 (0.85-0.91)  0.76 (0.70-0.82)  0.90 (0.82-0.98)  0.87 (0.81-0.93)
The results show that test set performance is largely insensitive to the sample size of the training set.
In addition, the performance of the HFrisk model was compared with that of widely used reference models such as SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost, and LogitBoost. Each reference model was fine-tuned to obtain its best results. The AUC results obtained with the same 30 features are shown in FIG. 2. The AUCs of the reference models differed (AUC 0.57-0.88), and the DeepFM model still performed best.
This model was also compared with other published models. William B. Kannel et al. proposed a 4-year risk assessment model (Profile for Estimating Risk of Heart Failure [J]. Archives of Internal Medicine, 1999, 159(11): 1197-). The William B. Kannel model was built on the same training set and compared with the model constructed here. The AUC of the model of the invention was 0.99 in males and 0.94 in females; the AUC of the William B. Kannel model was 0.74 in males and 0.89 in females. The results are shown in FIG. 3.
The AUC or C statistic of this model was also compared directly with published models: Sadiya S. Khan et al. describe 10-year CHF risk equations (Khan S S, Ning H, Shah S J, et al. 10-Year Risk Equations for Incident Heart Failure in the General Population [J]. Journal of the American College of Cardiology, 2019, 73(19): 2388-); Edward Choi et al. established an early detection model for CHF with a test set AUC < 0.80. The results show that the present method performs best among the compared models.
Calibration, i.e. the agreement between predicted and actual values, is commonly evaluated with a calibration plot: a scatter plot of the observed versus predicted event rates, which visualizes the result of the Hosmer-Lemeshow goodness-of-fit test. The observed outcomes and the model-fitted probabilities were passed to the hoslem.test function. The results are shown in FIG. 4.
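The Hosmer-Lemeshow statistic behind such a calibration check can be sketched as follows. This is an illustrative re-implementation, not the routine used in the study: participants are sorted by predicted risk and split into equal-sized groups, and observed and expected event counts are compared in each group. The function name, the grouping into `groups` bins, and the toy data are assumptions. The p-value would be obtained by comparing the statistic with a chi-square distribution with `groups - 2` degrees of freedom, which is left out here to keep the sketch within the standard library.

```python
def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over bins of sorted predicted risk.
    Assumes the mean predicted risk in every bin lies strictly between 0 and 1."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)    # events seen in the bin
        expected = sum(p for p, _ in chunk)    # events predicted in the bin
        size = len(chunk)
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat


# Toy data (hypothetical): two risk strata of 10 participants each.
probs = [0.1] * 10 + [0.5] * 10
well_calibrated = [1] + [0] * 9 + [1] * 5 + [0] * 5    # 1/10 and 5/10 events
poorly_calibrated = [1] + [0] * 9 + [1] * 8 + [0] * 2  # 1/10 and 8/10 events
print(hosmer_lemeshow(well_calibrated, probs, groups=2),
      hosmer_lemeshow(poorly_calibrated, probs, groups=2))
```

A small statistic (large p-value) means the predicted risks agree with the observed event rates, which is how a non-significant Hosmer-Lemeshow result is interpreted.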

Claims (10)

1. A marker combination for predicting risk of heart failure with preserved ejection fraction, comprising:
beta values of methylation sites, said methylation sites comprising: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910;
the age of the patient;
whether to take diuretic drugs;
BMI;
urinary albumin;
serum creatinine.
2. A model for predicting risk of heart failure with preserved ejection fraction, wherein the information collected by said model comprises the following marker combination:
the age of the patient; whether diuretic drugs are taken; BMI; urinary albumin; serum creatinine; beta values of methylation sites;
the methylation sites include: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
3. The model for predicting risk of ejection fraction-preserved heart failure as set forth in claim 2, wherein the model is built with the DeepFM algorithm from the marker combination obtained by feature screening.
4. The model of claim 3, wherein the model is built using the sigmoid function in the DeepFM algorithm for early prediction of ejection fraction-preserved heart failure events, and the output is:
$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$
wherein $\hat{y}$ is the predicted ejection fraction-preserved heart failure event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component.
5. The model of claim 4, wherein the output of the FM component is the sum of an addition unit and a plurality of inner product units, and the output of the FM component is:
$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i \cdot x_j$$
wherein $w \in R^d$ and $V_i \in R^k$ ($k$ is given); the addition unit $\langle w, x \rangle$ reflects the importance of the order-1 features, and the inner product units represent the influence of order-2 feature interactions.
6. The model for predicting risk of ejection fraction preserving heart failure as claimed in claim 4, wherein the output of said DNN component is:
$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$
where $|H|$ is the number of hidden layers, $a^{(l)}$ is the output of the embedding layer, $W^{(l)}$ is the model weight, and $b^{(l)}$ is the bias of the $l$-th layer.
7. The model for predicting risk of ejection fraction-preserved heart failure as recited in claim 5, wherein, for the deep hidden layers, a deep neural network comprising two hidden layers (256 ) is implemented using ReLU as the activation function:
y=f(x)=relu(wx+b),
wherein log loss is used as the objective function and, to control overfitting, an L2 regularization penalty with the parameter set to 0.0001 is added on the nodes.
8. The model of claim 2, wherein the performance of the DeepFM model is evaluated by three-fold cross-validation, with Adam as the optimization algorithm, the learning rate set to 0.0001, the number of epochs set to 400, and dropout set to 60%.
9. Use of a marker combination according to claim 1 and/or a model according to claims 2-8 for the manufacture of a medicament and/or a kit related to ejection fraction-preserved heart failure.
10. Use of a marker combination according to claim 1 and/or a model according to claims 2-8 for the construction of an early predictive model of other chronic complex diseases.
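
The DeepFM structure recited in claims 4 to 7 can be illustrated with a small forward pass. The following is a minimal sketch in plain Python, assuming the standard DeepFM form in which the prediction is sigmoid(y_FM + y_DNN). All function names, the toy parameter values, and the choice to feed the DNN with each feature's latent vector scaled by the feature value are assumptions made for illustration; training (log loss, Adam, L2 penalty, dropout) is not shown.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fm_output(x: Vector, w: Vector, V: Matrix) -> float:
    """y_FM = <w, x> + sum over i < j of <V_i, V_j> * x_i * x_j."""
    first_order = dot(w, x)  # addition unit: order-1 feature importance
    second_order = 0.0       # inner product units: order-2 interactions
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            second_order += dot(V[i], V[j]) * x[i] * x[j]
    return first_order + second_order


def relu_layer(a: Vector, W: Matrix, b: Vector) -> Vector:
    """One hidden layer: relu(W a + b)."""
    return [max(0.0, dot(row, a) + bias) for row, bias in zip(W, b)]


def dnn_output(a0: Vector, hidden: List[Tuple[Matrix, Vector]],
               W_out: Vector, b_out: float) -> float:
    """y_DNN: linear read-out of the last hidden layer's activations."""
    a = a0
    for W, b in hidden:
        a = relu_layer(a, W, b)
    return dot(W_out, a) + b_out


def deepfm_predict(x: Vector, w: Vector, V: Matrix,
                   hidden: List[Tuple[Matrix, Vector]],
                   W_out: Vector, b_out: float) -> float:
    """y_hat = sigmoid(y_FM + y_DNN)."""
    # Assumed embedding layer: each feature's latent vector times its value.
    a0 = [v * xi for xi, Vi in zip(x, V) for v in Vi]
    z = fm_output(x, w, V) + dnn_output(a0, hidden, W_out, b_out)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy parameters: 2 features, latent size k = 2, and two
# hidden layers of 2 units each (far smaller than a real model).
x = [1.0, 2.0]
w = [0.5, -0.25]
V = [[1.0, 0.0], [0.5, 1.0]]
hidden = [
    ([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, -1.0, 0.0]], [0.0, 0.0]),
    ([[1.0, 1.0], [-1.0, 0.0]], [0.0, 0.0]),
]
risk = deepfm_predict(x, w, V, hidden, W_out=[-1.0, 3.0], b_out=0.0)
```

The FM and DNN components share the latent vectors `V`, which is the design feature that lets DeepFM learn low-order and high-order feature interactions from the same embeddings.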
CN202110686960.8A 2021-06-21 Model for predicting ejection fraction retention type heart failure risk Active CN113421648B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110686960.8A CN113421648B (en) 2021-06-21 Model for predicting ejection fraction retention type heart failure risk

Publications (2)

Publication Number Publication Date
CN113421648A (en) 2021-09-21
CN113421648B (en) 2024-04-23

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130310436A1 (en) * 2012-05-18 2013-11-21 The Regents Of The University Of Colorado, A Body Corporate Methods for predicting response to beta-blocker therapy in non-ischemic heart failure patients
CN104994912A (en) * 2012-12-06 2015-10-21 康肽德生物医药技术有限公司 Peptide therapeutics and methods for using same
CN105586406A (en) * 2016-01-15 2016-05-18 汪道文 Method for detecting gene polymorphism of ADRB1 and GRK5
US20170363620A1 (en) * 2016-06-17 2017-12-21 Abbott Laboratories BIOMARKERS TO PREDICT NEW ONSET HEART FAILURE WITH PRESERVED EJECTION FRACTION (HFpEF)
CN107683341A (en) * 2015-05-08 2018-02-09 新加坡科技研究局 method for the diagnosis and prognosis of chronic heart failure
JP2018072337A (en) * 2016-10-21 2018-05-10 国立研究開発法人国立循環器病研究センター Method of predicting recurrence risk of major adverse cardiac event

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant