CN113421648A - Model for predicting risk of heart failure with preserved ejection fraction
- Publication number: CN113421648A (application CN202110686960.8A)
- Authority: CN (China)
- Prior art keywords: model, heart failure, ejection fraction, output, deep
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Abstract
The invention provides a model for predicting the risk of heart failure with preserved ejection fraction (HFpEF). The information collected by the model comprises the following marker combination: patient age; whether diuretic drugs are taken; BMI; urinary albumin; serum creatinine; and the beta values of 25 methylation sites. An end-to-end machine learning model combined with multi-omics data interaction is used to clinically evaluate all patients and identify those at risk, so that the onset of HFpEF can be delayed or prevented by controlling risk factors, treating asymptomatic left ventricular systolic dysfunction, and similar measures. The model provided by the invention has good prospects for clinical application.
Description
Technical Field
The invention belongs to the field of disease diagnosis models and particularly relates to a model for predicting the risk of heart failure with preserved ejection fraction (HFpEF).
Background
In recent years, heart failure morbidity and mortality have increased year by year. Heart failure is an abnormal change in the structure or function of the heart caused by a complex interaction of biochemical factors such as genetics, neurohormones, metabolism, inflammation, etc. Chronic Heart Failure (CHF) is characterized by disturbances in myocardial energy metabolism and metabolic remodeling, with high morbidity and mortality. Therefore, it is essential to obtain accurate individualized risk assessment to assist in further management of clinical decisions.
Three subtypes of chronic heart failure are currently recognized. According to left ventricular ejection fraction (LVEF), they are classified as heart failure with reduced ejection fraction (HFrEF), heart failure with mid-range ejection fraction (HFmrEF), and heart failure with preserved ejection fraction (HFpEF). The three subtypes differ greatly in etiology and pathophysiology. Notably, early prediction of HFpEF remains challenging. An early prediction model for HFpEF is very important for risk assessment, management, and clinical decision-making in heart failure: it allows diet and lifestyle to be adjusted in time for patients at high risk of HFpEF, which better accords with the principle of precision prevention.
Heart failure is a multifactorial disease whose occurrence results from the combined action of genetic and environmental factors. Epigenetic mechanisms such as DNA methylation are involved in regulating myocardial fibrosis and in causing disorders of myocardial energy metabolism and abnormal metabolism, transport, and activation of amino acids; they promote the development of cardiovascular disease, influence individual disease risk, and are among the pathophysiological causes of HFpEF. The development of HFpEF is also closely related to clinical factors; for example, the risk of HFpEF onset increases sharply with age. DNA methylation and clinical features describe the disease state along different dimensions and interact with each other.
In the prior art, Sadiya S. Khan et al. developed a 10-year heart failure risk model that included 10 clinical features but did not address the pathogenesis or the different subtypes of CHF. The method lacks the ability to learn feature interactions and considers only clinical features, without attention to epigenetic factors (Khan SS, Ning H, Shah SJ et al. 10-Year Risk Equations for Incident Heart Failure in the General Population. J. Am. Coll. Cardiol. 2019; 73: 2388-).
William B. Kannel et al. developed a 4-year heart failure risk assessment model in which the risk of heart failure is estimated with a logistic function of 9 clinical features. This method has the same problems as the model of Khan et al.: it lacks the ability to learn feature interactions and considers only clinical features, without attention to epigenetic factors (Kannel WB, D'Agostino RB, Silbershatz H et al. Profile for estimating risk of heart failure. Arch. Intern. Med. 1999; 159: 1197-).
Edward Choi et al. established an early detection model for CHF that models the temporal relationships among events in a patient's electronic health record (EHR) to predict a future diagnosis of CHF. The method focuses only on clinical features and does not take into account the influence of epigenetic factors on future heart failure events (Choi E, Schuetz A, Stewart WF et al. Using recurrent neural network models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 2016; 24: 361-).
No HFpEF risk prediction model has been developed that integrates clinical and epigenetic features. Therefore, the development of an accurate and comprehensive HFpEF risk prediction method is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
To solve the above problems, the present invention provides a model for predicting the risk of heart failure with preserved ejection fraction (HFpEF). The model combines an end-to-end machine learning model with multi-omics data interaction to clinically evaluate all patients and identify the risk of HFpEF, so that the onset of HFpEF can be delayed or prevented by controlling HFpEF risk factors, treating asymptomatic left ventricular systolic dysfunction, and similar measures.
The present invention collected 97 clinical diagnosis and treatment variables and 402,380 DNA methylation sites (from the DNA methylation 450K chip) for 797 sampled participants without cardiovascular disease. After 8 years of follow-up, 738 participants showed no signs of heart failure and 59 were diagnosed with HFpEF. These data were used as the training set to obtain a marker combination for establishing a model that predicts the risk of HFpEF.
The terms:
BMI: body mass index, BMI = body weight (kg) / height (m)², i.e., in kg/m².
EHR: electronic health records.
AUC: area under the curve.
FM: a factorization machine.
DNN: a deep neural network.
In one aspect, the invention provides a marker combination for predicting risk of ejection fraction-preserved heart failure.
The marker combination comprises:
the age of the patient;
whether to take diuretic drugs;
BMI;
urinary albumin;
serum creatinine;
beta value of methylation site.
The methylation sites include: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
The diuretic drugs include but are not limited to: furosemide, metolazone, chlorthalidone, torsemide, and Lasix (a brand name of furosemide).
In another aspect, the invention provides a model for predicting risk of ejection fraction-preserved heart failure.
The information collected by the model comprises a marker combination obtained by screening the following characteristics:
the age of the patient; whether diuretic drugs are taken; BMI; urinary albumin; serum creatinine; and the methylation sites: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
The model is built by means of a data mining algorithm.
The model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.
To establish the model, the DeepFM algorithm feeds its combined output into a sigmoid function for early prediction of the HFpEF event, and the output is:
$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$
where $\hat{y}$ is the predicted HFpEF event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component.
The output of the FM component is the sum of an addition unit and a number of inner product units:
$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$
where $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$ ($k$ is given). The addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the influence of order-2 feature interactions.
The output of the DNN component is:
$$y_{DNN} = W^{(|H|+1)} \cdot a^{(|H|)} + b^{(|H|+1)}$$
where $|H|$ is the number of hidden layers, and $a^{(l)}$, $W^{(l)}$, and $b^{(l)}$ are the output, model weight, and bias of the $l$-th layer, with $a^{(0)}$ the output of the embedding layer.
Specifically, for the deep component, a deep neural network with two hidden layers (256) is implemented using ReLU as the activation function:
$y = f(x) = \mathrm{relu}(wx + b)$.
Log loss is used as the objective function. To control overfitting, an L2 regularization penalty with its parameter set to 0.0001 is added on the nodes.
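The forward pass described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: each of the 30 features is treated as its own field whose embedding is its latent vector scaled by the feature value, "(256)" is read as 256 units per hidden layer, weights are random rather than trained, and all function names (`fm_output`, `dnn_output`, `deepfm_predict`, `init_params`) are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fm_output(w, V, x):
    """y_FM = <w, x> + sum_{i<j} <V_i, V_j> x_i x_j (direct double sum)."""
    d = len(x)
    pairwise = sum(dot(V[i], V[j]) * x[i] * x[j]
                   for i in range(d) for j in range(i + 1, d))
    return dot(w, x) + pairwise

def fm_output_fast(w, V, x):
    """Same value via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    total = 0.0
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        total += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return dot(w, x) + total

def dnn_output(hidden, W_out, b_out, a):
    """ReLU hidden layers followed by a linear output unit."""
    for W, b in hidden:
        a = [max(0.0, dot(row, a) + bi) for row, bi in zip(W, b)]
    return dot(W_out, a) + b_out

def sigmoid(z):
    # Numerically stable for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def deepfm_predict(params, x):
    w, V, hidden, W_out, b_out = params
    # Embedding layer output a(0): value-scaled latent vectors, concatenated.
    a0 = [x[i] * v for i in range(len(x)) for v in V[i]]
    return sigmoid(fm_output_fast(w, V, x) + dnn_output(hidden, W_out, b_out, a0))

def init_params(d=30, k=8, widths=(256, 256), seed=0):
    """Random, untrained parameters; d, k and widths follow the text's settings."""
    rng = random.Random(seed)
    u = lambda: rng.uniform(-0.05, 0.05)
    w = [u() for _ in range(d)]
    V = [[u() for _ in range(k)] for _ in range(d)]
    hidden, n_in = [], d * k
    for n_out in widths:
        hidden.append(([[u() for _ in range(n_in)] for _ in range(n_out)],
                       [0.0] * n_out))
        n_in = n_out
    W_out = [u() for _ in range(n_in)]
    return w, V, hidden, W_out, 0.0

params = init_params()
rng = random.Random(1)
x = [rng.random() for _ in range(30)]  # made-up, already-scaled feature values
risk = deepfm_predict(params, x)       # a probability in (0, 1)
```

The FM and DNN parts share the same latent vectors `V`, which is the sharing of input and embedding that the text attributes to DeepFM. `fm_output_fast` uses the standard factorization machine identity to avoid the explicit double sum.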
When the deep neural network is optimized, batch normalization and weight decay are used; the embedding size is set to 8, the batch size to 300, and the decay to 0.9.
Preferably, the performance of the DeepFM model is evaluated with three-fold cross-validation; Adam is used as the optimization algorithm, with the learning rate set to 0.0001, the number of epochs to 400, and dropout to 60%.
Preferably, the discriminatory power of the DeepFM model is assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy.
In a further aspect, the invention provides the use of the marker combination described above and/or the model described above in the manufacture of a medicament and/or a kit related to heart failure with preserved ejection fraction.
In a further aspect, the invention provides the use of the aforementioned marker combination and/or the aforementioned model in the construction of early prediction models for other chronic complex diseases.
The invention has the beneficial effects that:
Starting from 5 clinical features and 25 epigenetic (DNA methylation) features, an early risk assessment method for HFpEF is established using the DeepFM algorithm. The method extracts complex features from the original features and models the interactions among them, and its results are superior to those of existing models and of other baseline machine learning models.
Compared with widely used reference models such as SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost, and LogitBoost, the DeepFM model adopted by the method performs best.
Drawings
FIG. 1 shows the AUC of prediction performance based on different feature sets in the test set, where the HFrisk model contains both EHR features and DNA methylation features, the 25CpG model contains only DNA methylation features, and the 5EHR model contains only EHR features.
FIG. 2 shows the AUC results in the test set for the baseline models and for this method using 30 features.
FIG. 3 shows the AUC results of this method and of the William B. Kannel model in male and female participants.
FIG. 4 shows a calibration plot of observed risk against predicted risk for the present method; the Hosmer-Lemeshow test showed no statistically significant difference (P = 0.678).
Detailed Description
The present invention will be further illustrated in detail with reference to the following specific examples, which are not intended to limit the present invention but are merely illustrative thereof. The experimental methods used in the following examples are not specifically described, and the materials, reagents and the like used in the following examples are generally commercially available under the usual conditions without specific descriptions.
Example 1: Method for constructing a model for predicting the risk of heart failure with preserved ejection fraction
For 797 sampled participants without cardiovascular disease, 97 clinical variables and 402,380 DNA methylation sites from the DNA methylation 450K chip were collected. After 8 years of follow-up, 738 participants showed no signs of heart failure and 59 were diagnosed with HFpEF; these data were used as the training set.
The DNA methylation features and clinical features were screened stepwise using a three-step feature selection method.
(1) For the clinical features, incomplete and non-significant features were removed from the training set using the following thresholds: values missing in more than 20% of samples, or a p-value > 0.05 in the chi-square test/Mann-Whitney U test between the two groups of samples. If the Pearson correlation between two clinical features was greater than 0.8, the feature with the smaller Spearman correlation with HFpEF was discarded. Finally, 26 clinical features were retained, as follows:
diuretic, beta-blocker, angiotensin II antagonist, ACEI, aspirin, cardiovascular disease, coronary heart disease, atrial fibrillation, sex, age, BMI, serum creatinine, diastolic blood pressure, fasting plasma glucose, high-density lipoprotein cholesterol, systolic blood pressure, total cholesterol, hypertension, lipid disease, atrial enlargement, rheumatism, aortic valve disease, rheumatoid arthritis, urinary albumin, whole-blood hemoglobin A1c, C-reactive protein.
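The clinical screening rules in step (1) can be sketched as follows. This is an illustrative sketch only: the p-values are assumed to be precomputed by the chi-square or Mann-Whitney U test, correlations are taken in absolute value and computed on complete pairs (both assumptions), and the function name `screen_clinical` and the toy data are hypothetical.

```python
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def ranks(a):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    r = [0.0] * len(a)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and a[order[j + 1]] == a[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(a, b):
    return pearson(ranks(a), ranks(b))

def complete(a, b):
    """Keep only positions where both values are present."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def screen_clinical(features, y, p_values,
                    max_missing=0.20, alpha=0.05, max_corr=0.8):
    # Rule 1: drop features missing in > 20% of samples or with p > 0.05.
    kept = [name for name, col in features.items()
            if sum(v is None for v in col) / len(col) <= max_missing
            and p_values[name] <= alpha]

    def outcome_corr(name):
        return abs(spearman(*complete(features[name], y)))

    # Rule 2: for each highly correlated pair, drop the feature that is
    # less correlated with the outcome.
    dropped = set()
    for i, f in enumerate(kept):
        for g in kept[i + 1:]:
            if f in dropped or g in dropped:
                continue
            if abs(pearson(*complete(features[f], features[g]))) > max_corr:
                dropped.add(f if outcome_corr(f) < outcome_corr(g) else g)
    return [name for name in kept if name not in dropped]

# Toy data, made up for illustration only.
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
features = {
    "sbp": [120, 118, 125, 130, 150, 155, 160, 158, 122, 149],
    "dbp": [80, 79, 83, 92, 95, 97, 99, 98, 81, 90],
    "mostly_missing": [None, None, None, 1, 2, None, 3, None, 4, 5],
    "noise": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
}
p_values = {"sbp": 0.001, "dbp": 0.003, "mostly_missing": 0.01, "noise": 0.6}
selected = screen_clinical(features, y, p_values)
```

In the toy data, `mostly_missing` fails the missingness rule, `noise` fails the significance rule, and `dbp` is dropped because it is highly correlated with `sbp` but less correlated with the outcome.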
(2) For the methylation features, differentially methylated probes (DMPs) were identified in the training set using log fold change > 0.05 and adjusted p-value < 0.05 as thresholds, and 319 DNA methylation sites were screened, as follows:
cg01074797,cg10501210,cg15084543,cg16836311,cg16986315,cg26553501,cg00522231,cg02577387,cg04864807,cg20698421,cg22454769,cg01227537,cg15028458,cg17280346,cg02650266,cg04074536,cg08288016,cg04675542,cg05016408,cg05845376,cg06883126,cg10530883,cg16227623,cg18839637,cg19025234,cg23479922,cg03738025,cg08476511,cg16867657,cg18319852,cg22367191,cg23737190,cg24661236,cg09124496,cg09125127,cg07041999,cg12873476,cg14620572,cg10187894,cg17766026,cg24205914,cg24530234,cg01178624,cg15243034,cg21481937,cg01974091,cg01150270,cg18567924,cg01295034,cg04358214,cg08101977,cg08151623,cg09259081,cg09684846,cg16263848,cg17759274,cg19344626,cg24794228,cg25755428,cg00375983,cg00815832,cg01128109,cg01588224,cg02998240,cg03341469,cg03655142,cg05363438,cg06794355,cg06829788,cg07388493,cg08128734,cg08702915,cg08893087,cg09362335,cg13352914,cg16000360,cg16265542,cg22559013,cg23371584,cg00057240,cg00190206,cg03233656,cg03879180,cg05085636,cg05203213,cg05365735,cg05481257,cg05951221,cg09128529,cg10106284,cg10687131,cg10835286,cg11237792,cg11807280,cg12682972,cg13149736,cg17534916,cg19578183,cg22943590,cg25138327,cg25552548,cg27141850,cg02356435,cg03556243,cg10098541,cg11580026,cg12615982,cg14039937,cg14975410,cg19729744,cg23450509,cg01950474,cg02188818,cg02810967,cg03063309,cg03671075,cg04573661,cg07456878,cg07809027,cg07974833,cg08462122,cg11076306,cg11970349,cg15106030,cg16781992,cg18797590,cg20816447,cg02051771,cg04071270,cg05955210,cg07705913,cg08635765,cg08763461,cg11925729,cg14698665,cg15421911,cg23125993,cg23500537,cg24856658,cg26624398,cg27604145,cg02872426,cg03142554,cg03785755,cg04064963,cg05220968,cg06126421,cg06386482,cg06951627,cg09548084,cg10083824,cg11342453,cg11617964,cg12560772,cg13221458,cg14441271,cg15319032,cg15804973,cg16004593,cg17608381,cg21572722,cg21855021,cg22707857,cg23049448,cg27368039,cg02756107,cg03068497,cg03453431,cg03799713,cg06890291,cg08614290,cg09423312,cg14017689,cg14485097,cg16446288,cg17067544,cg17727071,cg21429551,cg21807944,cg23447239,cg23715104,cg25879142,cg26153045,cg26391564,cg00285394,cg00327383,cg00399059,cg07583137,cg08090164,cg10210397,cg13021857,cg13631444,cg16196274,cg17903548,cg19021188,cg20988565,cg22861548,cg23180489,cg24554944,cg25305703,cg25329685,cg25483741,cg25923729,cg27039118,cg00008629,cg01028796,cg07158339,cg10173814,cg13040392,cg13474639,cg13619074,cg13741668,cg13959831,cg23006204,cg23469878,cg00045910,cg00639447,cg03962527,cg06012872,cg06812574,cg09421083,cg10556349,cg21024264,cg24711336,cg24892069,cg27290215,cg27401945,cg01074365,cg02315732,cg04993130,cg05586607,cg06344265,cg10253640,cg11025604,cg12738765,cg19172170,cg20648141,cg01349368,cg03124318,cg04671742,cg05157098,cg06021880,cg15110296,cg17725129,cg20051875,cg23256579,cg25486399,cg00509187,cg00893603,cg01205935,cg01783816,cg02838877,cg04887675,cg12603632,cg02222791,cg03459776,cg10172979,cg13158272,cg18485215,cg19590421,cg22816294,cg27056759,cg04134722,cg10403394,cg11032707,cg23299445,cg27340001,cg00876127,cg01046070,cg01259782,cg03809021,cg04155793,cg05107535,cg05917111,cg07839457,cg08486507,cg09958192,cg21996068,cg02228185,cg03745383,cg07169660,cg12317815,cg18181703,cg19758448,cg24079381,cg00495303,cg07529654,cg24217948,cg00706441,cg00789427,cg03562414,cg06581818,cg08233235,cg09547119,cg10635122,cg13640414,cg13672791,cg19536401,cg20964856,cg22693863,cg25481705,cg25599567,cg01479232,cg01581757,cg01618988,cg04931539,cg10339152,cg10743062,cg11853697,cg11861654,cg11991566,cg17129400,cg18965930,cg20810198,cg25829531,cg22862003,cg01234420,cg01346718,cg04986324.
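The DMP thresholds in step (2) amount to a two-condition filter on each probe. The sketch below assumes that "adjusted p-value" means a Benjamini-Hochberg adjustment and that the fold-change threshold applies to the absolute value; the text specifies neither. The probe names and numbers are placeholders, not data from the study.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_dmps(names, log_fc, pvals, min_abs_logfc=0.05, max_adj_p=0.05):
    """Keep probes with |log fold change| > 0.05 and adjusted p < 0.05."""
    adj = bh_adjust(pvals)
    return [n for n, fc, p in zip(names, log_fc, adj)
            if abs(fc) > min_abs_logfc and p < max_adj_p]

# Placeholder probes and made-up statistics, for illustration only.
names = ["probe_A", "probe_B", "probe_C", "probe_D"]
log_fc = [0.10, 0.20, -0.08, 0.30]
pvals = [0.01, 0.04, 0.03, 0.50]
dmps = select_dmps(names, log_fc, pvals)
```

With four tests, the raw p-values 0.03 and 0.04 both adjust to about 0.053, so only `probe_A` passes, even though all four probes exceed the fold-change threshold.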
(3) The 26 + 319 features from the preceding screening were combined, and a second step of feature screening was performed using the lasso algorithm. Lasso adds the L1 norm of the coefficients to the loss function as a penalty term, which shrinks the parameters and makes the coefficient vector sparse: the coefficients of weak variables are compressed to exactly 0, thereby achieving feature selection. The objective function to be minimized is:
$$\min_{\beta} \; L(\beta) + \lambda \lVert \beta \rVert_1$$
where $L(\beta)$ is the loss function, $\beta$ is the coefficient vector, and $\lambda$ controls the strength of the penalty.
In this algorithm, the hyper-parameter family is set to "binomial", type.measure is set to "auc"/"class", and nfolds is set to 10; 80 combined features were finally obtained.
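How an L1 penalty drives weak coefficients to exactly zero can be shown with a small proximal-gradient (ISTA) solver for L1-penalized logistic regression. This is a sketch of the general technique, not the solver used in the patent, whose parameter names suggest a cross-validated lasso fit. The function names, learning rate, penalty value, and toy data are all assumptions.

```python
import math

def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward 0, clipping at 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Minimize mean log loss + lam * ||beta||_1 (intercept unpenalized)."""
    n, d = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * d
    for _ in range(n_iter):
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(d)]
        b0 -= lr * sum(resid) / n
        # Gradient step on the loss, then soft-thresholding for the L1 term.
        beta = [soft_threshold(bj - lr * gj, lr * lam)
                for bj, gj in zip(beta, grad)]
    return b0, beta

# Toy data, made up for illustration: feature 0 separates the classes,
# feature 1 is symmetric across the classes and carries no signal.
X = [[-2.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [2.0, 1.0]]
y = [0, 0, 1, 1]
b0, beta = lasso_logistic(X, y, lam=0.05)
selected = [j for j, bj in enumerate(beta) if bj != 0.0]
```

The coefficient of the uninformative feature stays at exactly 0, so only feature 0 is selected; in the patent's setting the same mechanism reduces 345 candidate features to 80.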
(4) A gradient boosting decision tree (GBDT) algorithm was then used: it repeatedly fits the residual between the predicted and actual values of the previous round's model and splits on features to grow new trees, so that the model achieves its best performance; a regularization term that limits tree complexity further prevents overfitting. In this algorithm, the hyper-parameters were set as follows: objective = "binary:logistic", booster = "gbtree", eval_metric = "error", nrounds = 7, eta = 0.5, max_depth = 3, subsample = 0.5, colsample_bytree = 1, min_child_weight = 2, and gamma = 0.5. Finally, 30 features were obtained, as follows:
age, whether diuretics are taken, BMI, urinary albumin, serum creatinine, and the methylation sites: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
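The hyper-parameters of step (4) can be written as a configuration dictionary. The sketch reads the listed names as XGBoost parameters (for example, "object" as `objective` and "coltemplate" as `colsample_bytree`), which is an interpretation of the translated text. The text also does not say how the 30 features were chosen from the fitted trees, so `top_k_features`, which ranks by an importance score, is a hypothetical helper with made-up scores.

```python
# Hyper-parameters from the text, read as XGBoost parameter names
# (an interpretation, not confirmed by the source).
xgb_params = {
    "objective": "binary:logistic",
    "booster": "gbtree",
    "eval_metric": "error",
    "eta": 0.5,
    "max_depth": 3,
    "subsample": 0.5,
    "colsample_bytree": 1,
    "min_child_weight": 2,
    "gamma": 0.5,
}
num_boost_round = 7  # called "nrounds" in the text

def top_k_features(importance, k=30):
    """Return the k feature names with the largest importance scores."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy importance scores, made up for illustration only.
toy_importance = {"age": 0.31, "bmi": 0.12, "feature_x": 0.02, "feature_y": 0.25}
selected = top_k_features(toy_importance, k=3)
```

With the Python package, a dictionary of this shape would typically be passed to `xgboost.train` along with the number of boosting rounds; that call is omitted here so the sketch runs without the library.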
(5) In order to better reflect the impact of the interaction between features on the risk of HFpEF, the 30 features obtained from feature screening were modeled using the deep fm algorithm. The neural network model DeepFM integrates the frameworks of a Factorization Machine (FM) and a Deep Neural Network (DNN), wherein the FM is low-order feature interactive modeling, the DNN is high-order feature interactive modeling, cross features are learned in the form of dot products and hidden vectors, the neural network model DeepFM has the capability of automatically learning the cross features, and end-to-end models of respective complex features can be extracted from original features. Deep fm can train efficiently because its wide and deep parts are different, but share the same input and embedding vectors. Deep fm extracts DNA methylation and EHR features and learns the combination of features hidden behind these features. The deep FM jointly trains the whole network in an end-to-end mode, and finally inputs a sigmoid function for early prediction of HFpEF events, and the output is as follows:
where $\hat{y}$ is the predicted HFpEF event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component. The FM component is a factorization machine and the DNN component is a feed-forward neural network, used for low-order and high-order feature interactions respectively. The FM model represents each order-2 feature interaction as the inner product of the latent vectors of the two features involved, which captures order-2 interactions more effectively than earlier approaches, especially when the data set is sparse. The output of the FM component is the sum of an addition unit and a number of inner product units:
$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$
where $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$ ($k$ is given). The addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the effect of order-2 feature interactions. The deep component, the DNN, is a feed-forward neural network that learns higher-order feature interactions. The output of the DNN is:
$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$
where $|H|$ is the number of hidden layers, $a^{(l)}$ is the output of the $l$-th layer (starting from the output of the embedding layer), $W^{(l)}$ is the model weight, and $b^{(l)}$ is the bias of the $l$-th layer. Each hidden layer of the deep neural network uses ReLU as its activation function, and the network contains two hidden layers (256):
$$y = f(x) = \mathrm{ReLU}(wx + b)$$
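The forward pass described in step (5), that is, the FM part, the DNN part, and the final sigmoid, can be sketched in NumPy as follows. Several details are assumptions made only for illustration: each continuous feature $x_i$ is embedded as $x_i V_i$, both hidden layers have 256 units, and all weights are random placeholders rather than trained values. Batch normalization and dropout are left out.

```python
import numpy as np


def fm_output(x, w, V):
    """y_FM = <w, x> + sum_{i<j} <V_i, V_j> x_i x_j for one sample x of shape (d,)."""
    linear = w @ x
    xv = V * x[:, None]                          # rows are x_i * V_i, shape (d, k)
    # Identity: sum_{i<j} <a_i, a_j> = 0.5 * (|sum_i a_i|^2 - sum_i |a_i|^2)
    pairwise = 0.5 * (np.sum(xv.sum(axis=0) ** 2) - np.sum(xv ** 2))
    return linear + pairwise


def dnn_output(x, V, weights, biases):
    """Feed-forward part: ReLU hidden layers followed by one linear output unit."""
    a = (V * x[:, None]).ravel()                 # a^(0): flattened embedding output
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, W @ a + b)           # ReLU(W a + b)
    return (weights[-1] @ a + biases[-1])[0]     # W^{|H|+1} a^{|H|} + b^{|H|+1}


def deepfm_predict(x, w, V, weights, biases):
    z = fm_output(x, w, V) + dnn_output(x, V, weights, biases)
    return 1.0 / (1.0 + np.exp(-z))              # sigmoid(y_FM + y_DNN)


# Placeholder parameters: 30 features, embedding size 8, two hidden layers.
rng = np.random.default_rng(0)
d, k, hidden = 30, 8, 256
x = rng.normal(size=d)
w = rng.normal(scale=0.1, size=d)
V = rng.normal(scale=0.1, size=(d, k))
sizes = [d * k, hidden, hidden, 1]
weights = [rng.normal(scale=0.05, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

risk = deepfm_predict(x, w, V, weights, biases)  # a probability in (0, 1)
```

The FM term uses the usual algebraic shortcut so that the pairwise sum costs O(dk) instead of O(d²k). The same embedding matrix `V` feeds both parts, which is the sharing of input and embedding vectors that the text mentions.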
Here, log loss is used as the objective function. To control overfitting, an L2 regularization penalty with the parameter set to 0.0001 was added on the nodes. To optimize the neural network, batch normalization and weight decay were employed. The embedding size is set to 8, the batch size to 300, and the decay to 0.9. To train the DeepFM model, Adam was used as the optimizer with a learning rate of 0.0001, 400 epochs, and a dropout of 60%; the performance of the DeepFM model was evaluated using three-fold cross-validation.
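The objective and the evaluation scheme above can be illustrated with a short NumPy sketch. It makes two assumptions: the L2 penalty is taken to be 0.0001 times the sum of squared weights (the text does not give the exact form), and the three folds are a plain random split. The function names are hypothetical.

```python
import numpy as np


def log_loss_l2(y_true, p_pred, params, l2=1e-4, eps=1e-12):
    """Mean binary log loss plus an L2 penalty on the given parameter arrays."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    nll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    penalty = l2 * sum(np.sum(w ** 2) for w in params)          # assumed penalty form
    return nll + penalty


def three_fold_indices(n, seed=0):
    """Yield (train_idx, val_idx) pairs for three-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, 3)
    for i in range(3):
        train = np.concatenate([folds[j] for j in range(3) if j != i])
        yield train, folds[i]
```

Each sample lands in the validation fold exactly once, so averaging the three validation scores uses every training participant for evaluation.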
(6) Finally, the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy were used to assess the discriminative ability of all models. The receiver operating characteristic (ROC) curve is drawn over a series of thresholds with the true positive rate (TPR, equal to sensitivity) on the ordinate and the false positive rate (FPR, equal to 1 - specificity) on the abscissa; it shows how TPR and FPR change with the threshold, and the closer the curve lies to the upper left corner, the better the classification performance of the model. Sensitivity is the proportion of all positive cases that are correctly classified, and measures the classifier's ability to recognize positive cases. Specificity is the proportion of all negative cases that are correctly classified, and measures the classifier's ability to recognize negative cases. Accuracy, the most common evaluation index, is the number of correctly classified samples divided by the total number of samples; it describes how well the classifier judges the data as a whole, and in general the higher the accuracy, the better the classifier.
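The four metrics in step (6) can be computed directly from labels and predicted scores. The sketch below uses a 0.5 decision threshold, which is an assumption (the text does not state the threshold), and computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, which equals the area under the ROC curve.

```python
import numpy as np


def classification_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity and accuracy at a fixed decision threshold."""
    y = np.asarray(y_true)
    pred = (np.asarray(scores) >= threshold).astype(int)
    tp = int(np.sum((y == 1) & (pred == 1)))
    tn = int(np.sum((y == 0) & (pred == 0)))
    fp = int(np.sum((y == 0) & (pred == 1)))
    fn = int(np.sum((y == 1) & (pred == 0)))
    return {
        "sensitivity": tp / (tp + fn),        # TPR: correct among positives
        "specificity": tn / (tn + fp),        # 1 - FPR: correct among negatives
        "accuracy": (tp + tn) / len(y),
    }


def auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


y_example = [1, 1, 0, 0, 0]
s_example = [0.9, 0.4, 0.6, 0.2, 0.1]
metrics = classification_metrics(y_example, s_example)
# sensitivity 1/2, specificity 2/3, accuracy 3/5; AUC 5/6
```

In the toy example, one of the two positives is detected and two of the three negatives are rejected, while five of the six positive-negative pairs are ranked correctly.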
Example 2 Model validation
A total of 171 participants without cardiovascular disease were enrolled. After 8 years of follow-up, 139 participants had no heart failure and 32 had been diagnosed with HFpEF. This cohort was used as the test set to evaluate the method.
The discrimination and calibration of the model were assessed using the AUC and the Hosmer-Lemeshow test, respectively. The following table is the confusion matrix, in the test set, of the method using the 30 features:
In the test set, the AUC obtained using this model was 0.90 (95% confidence interval, 0.89-0.90). The calibration curve obtained by the model is shown in FIG. 1, and the Hosmer-Lemeshow test was not statistically significant (P = 0.678), which supports the reliability of the result.
To assess the impact of training set sample size on this model, 75%, 60%, 50% and 25% of the training set participants were randomly selected. The validation and test set results are as follows:
Sample size | Participants | AUC | Sensitivity | Specificity | Accuracy |
---|---|---|---|---|---|
25% | Validation set | 0.90 (0.76-1.00) | 0.89 (0.57-1.00) | 0.93 (0.84-1.00) | 0.92 (0.86-0.99) |
25% | Test set | 0.88 (0.84-0.91) | 0.82 (0.76-0.88) | 0.84 (0.66-1.00) | 0.84 (0.69-0.98) |
50% | Validation set | 0.87 (0.79-0.95) | 0.86 (0.62-1.00) | 0.79 (0.61-0.98) | 0.80 (0.65-0.95) |
50% | Test set | 0.89 (0.87-0.90) | 0.85 (0.80-0.91) | 0.87 (0.75-0.98) | 0.86 (0.78-0.95) |
60% | Validation set | 0.92 (0.87-0.96) | 0.89 (0.82-0.95) | 0.84 (0.69-0.99) | 0.85 (0.71-0.98) |
60% | Test set | 0.89 (0.88-0.89) | 0.83 (0.77-0.89) | 0.85 (0.72-0.98) | 0.86 (0.75-0.94) |
75% | Validation set | 0.90 (0.86-0.95) | 0.84 (0.57-1.00) | 0.87 (0.66-1.00) | 0.87 (0.70-1.00) |
75% | Test set | 0.88 (0.85-0.91) | 0.76 (0.70-0.82) | 0.90 (0.82-0.98) | 0.87 (0.81-0.93) |
The results show that test set performance is largely independent of the training set sample size: the test set AUC stays between 0.88 and 0.89 for all four subsets.
In addition, the performance of the HFrisk model was compared with that of widely used reference models: SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost and LogitBoost. Each reference model was fine-tuned to obtain better results. The AUC results, obtained using the same 30 features, are shown in FIG. 2. Although the AUCs of the reference models differed somewhat (AUC 0.57-0.88), the DeepFM model still performed best.
This model was also compared with other published models. William B. Kannel et al. proposed a 4-year risk assessment model (Profile for Estimating Risk of Heart Failure [J]. Archives of Internal Medicine, 1999, 159(11): 1197-). The same training set was used to construct the William B. Kannel model, which was then compared with the model constructed herein. The AUCs obtained by the model of the invention were 0.99 for males and 0.94 for females; the AUCs of the William B. Kannel model were 0.74 for males and 0.89 for females. The results are shown in FIG. 3.
The AUC or C statistic of this model was also compared directly with those of published models: Sadiya S. Khan et al. described 10-year CHF risk equations (Khan S, Ning H, Shah S J, et al. 10-Year Risk Equations for Incident Heart Failure in the General Population [J]. Journal of the American College of Cardiology, 2019, 73(19): 2388-); Edward Choi et al. established an early detection model for CHF with a test set AUC < 0.80. These comparisons indicate that the present method performs best.
Calibration, i.e. the agreement between predicted and actual values, is commonly evaluated with a calibration plot: a scatter plot of the actual (observed) occurrence rate against the predicted occurrence rate, which visualizes the result of the Hosmer-Lemeshow goodness-of-fit test. The observed outcomes and the probabilities fitted by the model were passed to a hoslem. function to carry out the test. The results are shown in FIG. 4.
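To make the calibration check concrete, the Hosmer-Lemeshow statistic can be computed by sorting participants by predicted risk, cutting them into groups, and comparing observed with expected event counts in each group. The sketch below is a generic NumPy version under stated assumptions: ten equal-size groups by default, and the function name `hosmer_lemeshow_statistic` is hypothetical. It is not the routine used in the patent.

```python
import numpy as np


def hosmer_lemeshow_statistic(y_true, p_pred, groups=10):
    """Return (statistic, table); table rows are (mean predicted, observed rate)."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    order = np.argsort(p)                         # sort by predicted risk
    stat, table = 0.0, []
    for idx in np.array_split(order, groups):     # near-equal-size risk groups
        n = len(idx)
        observed = y[idx].sum()                   # events actually seen
        expected = p[idx].sum()                   # events the model expects
        mean_p = expected / n
        # Group term: (O - E)^2 / (n * pbar * (1 - pbar)); undefined if pbar is 0 or 1
        stat += (observed - expected) ** 2 / (n * mean_p * (1 - mean_p))
        table.append((mean_p, observed / n))
    return stat, table


stat, table = hosmer_lemeshow_statistic([0, 0, 1, 1], [0.2, 0.2, 0.8, 0.8], groups=2)
```

The statistic is referred to a chi-square distribution with `groups - 2` degrees of freedom; with SciPy available, `scipy.stats.chi2.sf(stat, groups - 2)` gives the P value. Plotting the `table` rows as points gives the calibration plot described above.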
Claims (10)
1. A marker combination for predicting risk of ejection fraction-preserved heart failure, comprising:
a beta value for a methylation site, said methylation site comprising: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg 1055638, cg10083824, cg08614290, cg01908177, cg 0704153999, cg 058476, cg 05481251251257, cg05363438, cg03556243, cg 33656, cg00522231, cg00495303, cg00045910;
the age of the patient;
whether to take diuretic drugs;
BMI;
urinary albumin;
serum creatinine.
2. A model for predicting risk of ejection fraction-preserved heart failure, wherein information collected by said model comprises the following combinations of markers:
the age of the patient; whether to take diuretic drugs; BMI; urinary albumin; serum creatinine; beta value of methylation site;
the methylation sites include: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg 1055638, cg10083824, cg08614290, cg01908177, cg07041999, cg 058476, cg 05481251251257, cg05363438, cg03556243, cg 33656, cg00522231, cg00495303, cg00045910.
3. The model for predicting risk of ejection fraction-preserved heart failure as set forth in claim 2, wherein the model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.
4. The model of claim 3, wherein the DeepFM algorithm passes its combined output through a sigmoid function for early prediction of ejection fraction-preserved heart failure events, and the output is:
$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$
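5. The model of claim 4, wherein the output of the FM component is the sum of an addition unit and a plurality of inner product units, and the output of the FM component is:
$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$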
wherein $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$ ($k$ is given); the addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the effect of order-2 feature interactions.
6. The model for predicting risk of ejection fraction preserving heart failure as claimed in claim 4, wherein the output of said DNN component is:
$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$
where $|H|$ is the number of hidden layers, $a^{(l)}$ is the output of the $l$-th layer (starting from the output of the embedding layer), $W^{(l)}$ is the model weight, and $b^{(l)}$ is the bias of the $l$-th layer.
7. The model for predicting risk of ejection fraction-preserved heart failure as recited in claim 5, wherein each hidden layer of the deep neural network uses ReLU as its activation function, the deep neural network comprising two hidden layers (256):
$$y = f(x) = \mathrm{ReLU}(wx + b)$$
wherein log loss is used as the objective function and, to control overfitting, an L2 regularization penalty is added on the nodes, with the parameter set to 0.0001.
8. The model of claim 2, wherein Adam is used as the optimization algorithm with the learning rate set to 0.0001, epoch set to 400, and dropout set to 60%, and the performance of the DeepFM model is evaluated by three-fold cross-validation.
9. Use of a marker combination according to claim 1 and/or a model according to any one of claims 2-8 in the manufacture of a medicament and/or a kit related to ejection fraction-preserved heart failure.
10. Use of a marker combination according to claim 1 and/or a model according to any one of claims 2-8 in the construction of an early prediction model for other chronic complex diseases.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110686960.8A CN113421648B (en) | 2021-06-21 | Model for predicting ejection fraction retention type heart failure risk |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113421648A (en) | 2021-09-21 |
CN113421648B (en) | 2024-04-23 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130310436A1 (en) * | 2012-05-18 | 2013-11-21 | The Regents Of The University Of Colorado, A Body Corporate | Methods for predicting response to beta-blocker therapy in non-ischemic heart failure patients |
CN104994912A (en) * | 2012-12-06 | 2015-10-21 | 康肽德生物医药技术有限公司 | Peptide therapeutics and methods for using same |
CN105586406A (en) * | 2016-01-15 | 2016-05-18 | 汪道文 | Method for detecting gene polymorphism of ADRB1 and GRK5 |
US20170363620A1 (en) * | 2016-06-17 | 2017-12-21 | Abbott Laboratories | BIOMARKERS TO PREDICT NEW ONSET HEART FAILURE WITH PRESERVED EJECTION FRACTION (HFpEF) |
CN107683341A (en) * | 2015-05-08 | 2018-02-09 | 新加坡科技研究局 | method for the diagnosis and prognosis of chronic heart failure |
JP2018072337A (en) * | 2016-10-21 | 2018-05-10 | 国立研究開発法人国立循環器病研究センター | Method of predicting recurrence risk of major adverse cardiac event |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |