CN113421648B - Model for predicting ejection fraction retention type heart failure risk - Google Patents
- Publication number
- CN113421648B (application number CN202110686960.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- heart failure
- ejection fraction
- risk
- predicting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention provides a model for predicting the risk of heart failure with preserved ejection fraction (HFpEF). The information collected by the model comprises the following marker combination: patient age; whether diuretics are taken; BMI; urine albumin; serum creatinine; and beta values for 25 methylation sites. Using an end-to-end machine learning model that captures interactions among multi-omics data, all patients are clinically evaluated to identify their risk, so that the onset of HFpEF can be delayed or prevented by controlling risk factors, treating asymptomatic left ventricular systolic dysfunction, and similar measures. The model provided by the invention has good prospects for clinical application.
Description
Technical Field
The invention belongs to the field of disease diagnosis models, and particularly relates to a model for predicting the risk of heart failure with preserved ejection fraction.
Background
In recent years, the morbidity and mortality of heart failure have increased year by year. Heart failure is a change in the structure or function of the heart caused by complex interactions among genetic, neurohormonal, metabolic, inflammatory, and other biochemical factors. Chronic heart failure (CHF) is characterized by disturbed myocardial energy metabolism and metabolic remodeling, and carries high morbidity and mortality. Accurate, personalized risk assessment is therefore needed to support clinical decision-making.
Three subtypes of chronic heart failure are currently recognized. According to left ventricular ejection fraction (LVEF), they are classified as heart failure with reduced ejection fraction (HFrEF), heart failure with mid-range ejection fraction (HFmrEF), and heart failure with preserved ejection fraction (HFpEF). The three subtypes differ greatly in etiology and pathophysiology. Notably, early prediction of HFpEF remains challenging. An early prediction model for HFpEF is very important for risk assessment, management, and clinical decision-making in heart failure: it allows diet and lifestyle to be adjusted in time for patients at high risk of HFpEF, which is in keeping with the principle of precision prevention.
Heart failure is a multifactorial disease whose occurrence results from the combined action of genetic and environmental factors. Epigenetic mechanisms such as DNA methylation are involved in regulating myocardial fibrosis, and lead to disorders of myocardial energy metabolism and to abnormal metabolism, transport, and activation of amino acids. They thereby promote the development of cardiovascular disease and affect individual disease risk, which is one of the pathophysiological causes of HFpEF. The occurrence of HFpEF is also well known to be closely related to clinical factors; for example, the risk of HFpEF onset increases dramatically with age. DNA methylation and clinical characteristics describe the disease state in different dimensions, and they interact with each other.
In the prior art, Sadiya S. Khan et al. developed a 10-year heart failure risk model that included 10 clinical features, but did not discuss the pathogenesis or the different subtypes of CHF. The method lacks the ability to learn feature interactions and considers only clinical features, ignoring epigenetic factors (Khan SS, Ning H, Shah SJ et al. 10-Year Risk Equations for Incident Heart Failure in the General Population. J. Am. Coll. Cardiol. 2019;73:2388-2397).
William B. Kannel et al. developed a 4-year heart failure risk assessment model in which the risk of heart failure was estimated using a logistic function of 9 clinical features. The method has the same problems as the model developed by Khan et al.: it lacks the ability to learn feature interactions and considers only clinical features, ignoring epigenetic factors (Kannel WB, D'Agostino RB, Silbershatz H et al. Profile for estimating risk of heart failure. Arch. Intern. Med. 1999;159:1197-1204).
Edward Choi et al. built an early detection model for CHF that models the temporal relationships among health data in a patient's electronic health record (EHR) to predict a future diagnosis of CHF. The method focuses only on clinical features and does not consider the influence of epigenetic factors on future heart failure events (Choi E, Schuetz A, Stewart WF et al. Using recurrent neural network models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 2016;24:361-370).
No HFpEF risk prediction model that integrates clinical and epigenetic features has yet been developed. Developing an accurate and comprehensive HFpEF risk prediction method is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In order to solve the above problems, the present invention provides a model for predicting the risk of heart failure with preserved ejection fraction. It clinically evaluates all patients to identify HFpEF risk by combining an end-to-end machine learning model with interactions among multi-omics data, so that the onset of HFpEF can be delayed or prevented by controlling HFpEF risk factors, treating asymptomatic left ventricular systolic dysfunction, and similar measures.
Data on 97 clinical features were collected from 797 sampled participants without cardiovascular disease, together with data on 402,380 DNA methylation sites from the DNA methylation 450K chip. After 8 years of follow-up, 738 participants showed no heart failure and 59 participants were diagnosed with HFpEF. The present invention uses these data as a training set to obtain a marker combination for modeling the predicted risk of heart failure with preserved ejection fraction.
Terminology:
BMI: body mass index; BMI = body weight (kg) / height (m) squared, i.e., kg/m².
EHR: electronic health record.
AUC: area under the curve.
FM: factorization machine.
DNN: deep neural network.
In one aspect, the invention provides a marker combination for predicting the risk of heart failure with preserved ejection fraction.
The marker combination comprises:
Age of patient;
Whether diuretics are taken;
BMI;
Urine albumin;
Serum creatinine;
Beta values of methylation sites.
The methylation sites comprise: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
The diuretics include, but are not limited to: furosemide (Lasix), metolazone, chlorthalidone, and torsemide.
In another aspect, the invention provides a model for predicting the risk of heart failure with preserved ejection fraction.
The information collected by the model comprises the marker combination obtained by screening the following features:
Age of patient; whether diuretics are taken; BMI; urine albumin; serum creatinine; beta values of the methylation sites cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
The model is built by means of a data mining algorithm.
The model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.
The method for establishing the model comprises passing the combined output of the DeepFM algorithm through a sigmoid function for early prediction of an HFpEF event. The output is:
$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$
where $\hat{y}$ is the predicted HFpEF event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component.
The output of the FM component is the sum of one addition unit and a number of inner product units:
$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i \, x_j$$
where $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$ ($k$ is given). The addition unit $\langle w, x \rangle$ reflects the importance of order-1 features, and the inner product units represent the effect of order-2 feature interactions.
The output of the DNN component is as follows:
$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$
where $|H|$ is the number of hidden layers, and $a^{(l)}$, $W^{(l)}$, and $b^{(l)}$ are the output, model weight, and bias of the $l$-th layer, with $a^{(0)}$ being the output of the embedding layer.
Specifically, the deep component is implemented as a deep neural network with two hidden layers (256) that uses ReLU as the activation function:
y = f(x) = relu(wx + b).
Logloss is used as the objective function. To control overfitting, an L2 regularization penalty is added on the nodes, with the parameter set to 0.0001.
The deep neural network is optimized with batch normalization and weight decay; the embedding size is set to 8, the batch size to 300, and the decay to 0.9.
Preferably, the performance of the DeepFM model is evaluated by three-fold cross-validation, using Adam as the optimization algorithm with a learning rate of 0.0001, 400 epochs, and a dropout of 60%.
Preferably, the discriminative ability of the DeepFM model is assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy.
In a further aspect, the present invention provides the use of the foregoing marker combination and/or the foregoing model in the preparation of a medicament and/or kit for the treatment of heart failure with preserved ejection fraction.
In yet another aspect, the invention provides the use of the foregoing marker combination and/or the foregoing model in constructing early prediction models for other chronic complex diseases.
The invention has the beneficial effects that:
Taking multi-omics features into account, and starting from 5 clinical features and 25 epigenetic (DNA methylation) features, the DeepFM algorithm is used to establish an early risk assessment method for HFpEF. The method extracts complex features from the raw features and models the interactions among them, and its results are superior to existing models and to other baseline machine learning models.
The DeepFM model employed in the present application performs best when compared with the currently widely used baseline models SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost, and LogitBoost.
Drawings
FIG. 1 shows the AUC of predictive performance in the test set for different feature sets, wherein: the HFrisk model denotes a model containing both EHR features and DNA methylation features, the 25CpG model denotes a model with DNA methylation features only, and the 5EHR model denotes a model with EHR features only.
FIG. 2 shows the AUC results in the test set for the baseline models and for this method, using the 30 features.
FIG. 3 shows the AUC results of this method and of the William B. Kannel model in male and female participants.
FIG. 4 shows the evaluation of this method by a calibration plot of observed versus predicted risk and by the Hosmer-Lemeshow goodness-of-fit test. The calibration is as shown in the figure; the Hosmer-Lemeshow test was not statistically significant (p=0.678), indicating no evidence of poor fit.
Detailed Description
The present invention is described in further detail with reference to the following examples, which merely illustrate the invention and are not intended to limit it. Experimental methods for which no specific conditions are given in the examples were generally carried out under conventional conditions, and the materials, reagents, etc. used in the examples are commercially available unless otherwise specified.
Example 1: A method for constructing a model for predicting the risk of heart failure with preserved ejection fraction
Data on 97 clinical features were collected from 797 sampled participants without cardiovascular disease, together with data on 402,380 DNA methylation sites from the DNA methylation 450K chip. After 8 years of follow-up, 738 participants showed no heart failure and 59 participants were diagnosed with HFpEF. These data were used as the training set.
The DNA methylation features and the clinical features were screened step by step using a three-step feature selection method.
(1) For the clinical features, incomplete and non-significant features in the training set were removed using the following thresholds: features with more than 20% missing samples, and features whose chi-square test/Mann-Whitney U test between the two groups of samples gave a p-value > 0.05. If the Pearson correlation between two clinical features was greater than 0.8, the feature with the smaller Spearman correlation with HFpEF was discarded. Finally, 26 clinical features were retained, as follows:
Diuretics, beta-blockers, angiotensin II antagonists, ACEI, aspirin, cardiovascular disorders, coronary heart disease, atrial fibrillation, sex, age, BMI, serum creatinine, diastolic blood pressure, fasting blood glucose, high density lipoprotein cholesterol, systolic blood pressure, total cholesterol, hypertension, lipid disorders, atrial enlargement, rheumatism, aortic valve disorders, rheumatoid arthritis, urinary albumin, whole blood hemoglobin A1C, C reactive protein.
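The screening rules in step (1) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the patent's implementation: the function and variable names are hypothetical, the p-values are assumed to have been computed beforehand (by a chi-square or Mann-Whitney U test), the absolute value of the Pearson correlation is compared against 0.8, and the toy data are invented for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def complete_pairs(x, y):
    """Keep only positions where both values are present (not None)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return list(xs), list(ys)

def screen_clinical(features, outcome, p_values,
                    max_missing=0.20, alpha=0.05, max_corr=0.8):
    # Drop features with too many missing samples or a non-significant test.
    kept = [name for name, col in features.items()
            if sum(v is None for v in col) / len(col) <= max_missing
            and p_values[name] <= alpha]
    # For each highly correlated pair, drop the feature that is less
    # correlated (Spearman) with the outcome.
    dropped = set()
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if a in dropped or b in dropped:
                continue
            xa, xb = complete_pairs(features[a], features[b])
            if abs(pearson(xa, xb)) > max_corr:
                rho_a = abs(spearman(*complete_pairs(features[a], outcome)))
                rho_b = abs(spearman(*complete_pairs(features[b], outcome)))
                dropped.add(b if rho_a >= rho_b else a)
    return [name for name in kept if name not in dropped]

# Invented toy data: six participants, outcome 1 = HFpEF.
features = {
    "age": [50, 60, 70, 80, 55, 65],
    "age_noisy": [52, 58, 66, 81, 65, 60],   # strongly correlated with age
    "bmi": [22, 30, 24, 31, 27, 25],
    "sparse": [1, None, None, 3, None, 2],   # 50% missing
    "noise": [3, 1, 2, 3, 1, 2],
}
outcome = [0, 0, 1, 1, 0, 1]
p_values = {"age": 0.01, "age_noisy": 0.01, "bmi": 0.03,
            "sparse": 0.01, "noise": 0.60}
selected = screen_clinical(features, outcome, p_values)
print(selected)
```

In this toy run, "sparse" fails the missingness threshold, "noise" fails the significance threshold, and "age_noisy" is discarded in favor of "age" because the pair is highly correlated and "age" tracks the outcome more closely.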
(2) For the methylation features, differentially methylated probes (DMPs) were identified in the training set. Using a log fold change > 0.05 and an adjusted p-value < 0.05 as thresholds, 319 DNA methylation sites were selected, as follows:
cg01074797,cg10501210,cg15084543,cg16836311,cg16986315,cg26553501,cg00522231,cg02577387,cg04864807,cg20698421,cg22454769,cg01227537,cg15028458,cg17280346,cg02650266,cg04074536,cg08288016,cg04675542,cg05016408,cg05845376,cg06883126,cg10530883,cg16227623,cg18839637,cg19025234,cg23479922,cg03738025,cg08476511,cg16867657,cg18319852,cg22367191,cg23737190,cg24661236,cg09124496,cg09125127,cg07041999,cg12873476,cg14620572,cg10187894,cg17766026,cg24205914,cg24530234,cg01178624,cg15243034,cg21481937,cg01974091,cg01150270,cg18567924,cg01295034,cg04358214,cg08101977,cg08151623,cg09259081,cg09684846,cg16263848,cg17759274,cg19344626,cg24794228,cg25755428,cg00375983,cg00815832,cg01128109,cg01588224,cg02998240,cg03341469,cg03655142,cg05363438,cg06794355,cg06829788,cg07388493,cg08128734,cg08702915,cg08893087,cg09362335,cg13352914,cg16000360,cg16265542,cg22559013,cg23371584,cg00057240,cg00190206,cg03233656,cg03879180,cg05085636,cg05203213,cg05365735,cg05481257,cg05951221,cg09128529,cg10106284,cg10687131,cg10835286,cg11237792,cg11807280,cg12682972,cg13149736,cg17534916,cg19578183,cg22943590,cg25138327,cg25552548,cg27141850,cg02356435,cg03556243,cg10098541,cg11580026,cg12615982,cg14039937,cg14975410,cg19729744,cg23450509,cg01950474,cg02188818,cg02810967,cg03063309,cg03671075,cg04573661,cg07456878,cg07809027,cg07974833,cg08462122,cg11076306,cg11970349,cg15106030,cg16781992,cg18797590,cg20816447,cg02051771,cg04071270,cg05955210,cg07705913,cg08635765,cg08763461,cg11925729,cg14698665,cg15421911,cg23125993,cg23500537,cg24856658,cg26624398,cg27604145,cg02872426,cg03142554,cg03785755,cg04064963,cg05220968,cg06126421,cg06386482,cg06951627,cg09548084,cg10083824,cg11342453,cg11617964,cg12560772,cg13221458,cg14441271,cg15319032,cg15804973,cg16004593,cg17608381,cg21572722,cg21855021,cg22707857,cg23049448,cg27368039,cg02756107,cg03068497,cg03453431,cg03799713,cg06890291,cg08614290,cg09423312,cg14017689,cg14485097,cg16446288,cg17067544,cg17727071,cg21429551,cg21807944,cg23447239,cg23715104,cg25879142,cg26153045,cg26391564,cg00285394,cg00327383,cg00399059,cg07583137,cg08090164,cg10210397,cg13021857,cg13631444,cg16196274,cg17903548,cg19021188,cg20988565,cg22861548,cg23180489,cg24554944,cg25305703,cg25329685,cg25483741,cg25923729,cg27039118,cg00008629,cg01028796,cg07158339,cg10173814,cg13040392,cg13474639,cg13619074,cg13741668,cg13959831,cg23006204,cg23469878,cg00045910,cg00639447,cg03962527,cg06012872,cg06812574,cg09421083,cg10556349,cg21024264,cg24711336,cg24892069,cg27290215,cg27401945,cg01074365,cg02315732,cg04993130,cg05586607,cg06344265,cg10253640,cg11025604,cg12738765,cg19172170,cg20648141,cg01349368,cg03124318,cg04671742,cg05157098,cg06021880,cg15110296,cg17725129,cg20051875,cg23256579,cg25486399,cg00509187,cg00893603,cg01205935,cg01783816,cg02838877,cg04887675,cg12603632,cg02222791,cg03459776,cg10172979,cg13158272,cg18485215,cg19590421,cg22816294,cg27056759,cg04134722,cg10403394,cg11032707,cg23299445,cg27340001,cg00876127,cg01046070,cg01259782,cg03809021,cg04155793,cg05107535,cg05917111,cg07839457,cg08486507,cg09958192,cg21996068,cg02228185,cg03745383,cg07169660,cg12317815,cg18181703,cg19758448,cg24079381,cg00495303,cg07529654,cg24217948,cg00706441,cg00789427,cg03562414,cg06581818,cg08233235,cg09547119,cg10635122,cg13640414,cg13672791,cg19536401,cg20964856,cg22693863,cg25481705,cg25599567,cg01479232,cg01581757,cg01618988,cg04931539,cg10339152,cg10743062,cg11853697,cg11861654,cg11991566,cg17129400,cg18965930,cg20810198,cg25829531,cg22862003,cg01234420,cg01346718,cg04986324.
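The DMP thresholds in step (2) amount to a simple filter over per-probe statistics. The sketch below assumes that each probe already has a log fold change and an adjusted p-value from a differential methylation analysis, and that the fold-change threshold applies to the absolute value; the function name, probe names, and numbers are hypothetical.

```python
def select_dmps(stats, min_abs_logfc=0.05, max_adj_p=0.05):
    """Return probes whose |log fold change| and adjusted p-value pass the thresholds."""
    return [probe for probe, (logfc, adj_p) in stats.items()
            if abs(logfc) > min_abs_logfc and adj_p < max_adj_p]

# Invented statistics: probe -> (log fold change, adjusted p-value).
stats = {
    "probe_A": (0.12, 0.001),    # passes both thresholds
    "probe_B": (0.02, 0.001),    # effect too small
    "probe_C": (-0.30, 0.200),   # not significant after adjustment
    "probe_D": (-0.08, 0.010),   # passes both thresholds (hypomethylated)
}
dmps = select_dmps(stats)
print(dmps)
```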
(3) The 26 + 319 features obtained by screening were combined, and the second feature-screening step was performed with the lasso algorithm. Adding the L1 norm of the coefficients to the loss function as a penalty term yields a first-order penalty function. Shrinking the parameters makes the coefficient vector sparse, and feature screening is achieved by compressing the coefficients of weak variables to 0. The minimized objective function has the form:
$$\min_{\beta} \; L(\beta) + \lambda \lVert \beta \rVert_1$$
where $L(\beta)$ is the loss function, $\beta$ is the coefficient vector, and $\lambda \ge 0$ controls the strength of the L1 penalty.
In this algorithm, the hyperparameters were set to family="binomial", type.measure="auc"/"class", and nfolds=10, and 80 combined features were finally obtained.
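To illustrate how an L1 penalty compresses weak coefficients to exactly 0, the following minimal sketch fits an L1-penalized logistic regression by proximal gradient descent with soft-thresholding. This is a swapped-in teaching technique, not the solver used in the source; the function names, learning rate, iteration count, and toy data are all assumptions made for the example.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def soft_threshold(v, t):
    """Shrink v toward 0 by t; values with |v| <= t become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def lasso_logistic(X, y, lam, lr=0.1, n_iter=500):
    """Minimize mean log loss + lam * ||w||_1 (intercept b is not penalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        preds = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        errs = [p - t for p, t in zip(preds, y)]
        grad_w = [sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(d)]
        grad_b = sum(errs) / n
        # Gradient step on the smooth loss, then the L1 proximal step.
        w = [soft_threshold(wj - lr * g, lr * lam) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Invented toy data: feature 0 separates the classes, feature 1 is only weakly related.
X = [[1.0, 0.2], [1.0, -0.1], [-1.0, 0.1], [-1.0, -0.2]]
y = [1, 1, 0, 0]

w, b = lasso_logistic(X, y, lam=0.05)
w_strong, _ = lasso_logistic(X, y, lam=10.0)
print(w, w_strong)
```

With the moderate penalty, the informative feature keeps a positive coefficient while the weak feature's coefficient stays at exactly 0; with a very strong penalty, every coefficient is compressed to 0. In practice the penalty strength would be chosen by cross-validation, as the nfolds setting in the text suggests.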
(4) A gradient boosting decision tree (GBDT) algorithm was then used. Trees are generated by repeated feature splitting, each fitting the residual between the predicted and actual values of the previous round's model, so that the model reaches its best performance; at the same time, regularization terms limit the complexity of the trees to further prevent overfitting. In this algorithm, the hyperparameters were set to objective="binary:logistic", booster="gbtree", eval_metric="error", nrounds=7, eta=0.5, max_depth=3, subsample=0.5, colsample_bytree=1, min_child_weight=2, and gamma=0.5. Finally, 30 features were obtained, as follows:
Age, whether diuretics are taken, BMI, urine albumin, serum creatinine, and the methylation sites cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
(5) To better reflect the effect of interactions between features on HFpEF risk, modeling was performed on 30 features obtained by feature screening using DeepFM algorithm. The neural network model DeepFM integrates the architecture of a factor decomposition machine (FM) and a Deep Neural Network (DNN), FM is low-order feature interaction modeling, DNN is high-order feature interaction modeling, cross features are learned through dot product and hidden vector forms, the capability of automatically learning the cross features is achieved, and end-to-end models of complex features can be extracted from original features. DeepFM can be trained effectively because its wide and deep portions are different, but they share the same input and embedding vectors. DeepFM extract DNA methylation and EHR features and learn the combination of features that are hidden behind these features. DeepFM jointly trains the whole network in an end-to-end mode, finally inputs a sigmoid function for early prediction of the HFpEF event, and outputs as follows:
where $\hat{y}$ is the predicted HFpEF event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component. The FM component and the DNN component are a factorization machine and a feedforward neural network, used for low-order and high-order feature interactions respectively. The FM models order-2 feature interactions as the inner product of the respective feature latent vectors, and therefore captures them more effectively than previous methods, especially when the data set is sparse. The output of the FM is the sum of one addition unit and a number of inner product units:
$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$
where $w \in R^d$ and $V_i \in R^k$ (k is given). The addition unit $\langle w, x \rangle$ reflects the importance of the order-1 features, and the inner product units represent the effect of order-2 feature interactions. The deep component (DNN) is a feedforward neural network used to learn high-order feature interactions; its output is:
$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$
where $|H|$ is the number of hidden layers, and $a^{(l)}$, $W^{(l)}$ and $b^{(l)}$ are the output (starting from the embedding layer), the model weight and the bias of the $l$-th layer. For the deep hidden layers, a deep neural network containing two hidden layers (256) is implemented, using ReLU as the activation function:
$$y = f(x) = \mathrm{relu}(wx + b)$$
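How the FM and DNN parts combine into one prediction can be sketched in plain Python. This is an illustrative forward pass only, assuming the standard published DeepFM form (sigmoid of the sum of the FM and DNN outputs, with both parts sharing the latent vectors V); the function names, the `params` dictionary and the tiny weights are hypothetical and chosen so the arithmetic can be checked by hand. The patent's network uses an embedding size of 8 and far wider hidden layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def fm_output(x, w, V):
    """Order-1 term <w, x> plus order-2 terms <V_i, V_j> * x_i * x_j."""
    first_order = dot(w, x)
    second_order = sum(
        dot(V[i], V[j]) * x[i] * x[j]
        for i in range(len(x))
        for j in range(i + 1, len(x))
    )
    return first_order + second_order


def dnn_output(x, V, hidden_layers, w_out, b_out):
    """Feedforward pass over the concatenated embeddings x_i * V_i."""
    a = [x_i * v for x_i, V_i in zip(x, V) for v in V_i]
    for W, b in hidden_layers:
        a = [relu(dot(row, a) + bias) for row, bias in zip(W, b)]
    return dot(w_out, a) + b_out


def deepfm_predict(x, params):
    """Both components read the same input x and the same embeddings V."""
    y_fm = fm_output(x, params["w"], params["V"])
    y_dnn = dnn_output(x, params["V"], params["hidden_layers"],
                       params["w_out"], params["b_out"])
    return sigmoid(y_fm + y_dnn)


# Hypothetical toy parameters: 2 features, embedding size 2, one hidden layer.
params = {
    "w": [0.5, -0.25],
    "V": [[1.0, 0.0], [0.5, 1.0]],
    "hidden_layers": [([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, -1.0, 0.0]],
                       [0.0, 0.0])],
    "w_out": [-1.0, 3.0],
    "b_out": 0.0,
}
x = [1.0, 2.0]
risk = deepfm_predict(x, params)
```

With these toy values the FM part evaluates to 1.0 and the DNN part to -1.0, so the two cancel and `risk` is 0.5. Sharing `V` between the two components is the design choice that lets the network train without separate feature engineering for the wide part.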
Logloss is used as the objective function. To control overfitting, an L2 regularization penalty with the parameter set to 0.0001 is added on the nodes. To optimize the neural network, batch normalization and weight decay are employed. The embedding size is set to 8, the batch size to 300, and the decay to 0.9. The DeepFM algorithm is trained with Adam as the optimization algorithm, with the learning rate set to 0.0001, the number of epochs to 400, and dropout to 60%; the performance of the DeepFM model is evaluated using three-fold cross-validation.
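The objective and the validation scheme named above can be written out in a few lines of plain Python. This is a sketch under stated assumptions: the helper names `log_loss`, `l2_penalty` and `three_fold_indices` are hypothetical, the shuffling seed is arbitrary, and the text does not say how its folds were drawn. Only the penalty parameter 0.0001 and the number of folds (3) come from the text.

```python
import math
import random


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def l2_penalty(weights, lam=0.0001):
    """L2 regularization term added to the loss."""
    return lam * sum(w * w for w in weights)


def three_fold_indices(n, seed=0):
    """Yield (train, validation) index lists for three-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::3] for k in range(3)]
    for k in range(3):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


# Regularized objective for two hypothetical predictions and weights.
objective = log_loss([1, 0], [0.5, 0.5]) + l2_penalty([1.0, 2.0])
splits = list(three_fold_indices(10))
```

Each sample appears in exactly one validation fold, so averaging the three validation scores uses every sample once for evaluation.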
(6) Finally, the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy are used to evaluate the discriminative ability of all models applied. The receiver operating characteristic (ROC) curve is drawn over a series of set thresholds, with the true positive rate (TPR, i.e. sensitivity) as the ordinate and the false positive rate (FPR, i.e. 1 - specificity) as the abscissa; it reflects how the TPR and FPR change across thresholds, and the closer the curve is to the upper left corner, the better the classification performance of the model. Sensitivity is the proportion of all positive examples that are classified correctly and measures the classifier's ability to recognize positive examples. Specificity is the proportion of all negative examples that are classified correctly and measures the classifier's ability to recognize negative examples. Accuracy, the most common evaluation index, describes the classifier's performance on the data as a whole (positives judged positive and negatives judged negative): it is the number of correctly classified samples divided by the total number of samples, and in general the higher the accuracy, the better the classifier.
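The four indices can be computed directly from their definitions. The following plain-Python sketch is illustrative only: the function names, the 0.5 decision threshold and the toy labels and scores are assumptions, not values from the patent. The AUC is computed as the fraction of positive-negative pairs in which the positive example receives the higher score (ties count one half), which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Return (tp, fp, tn, fn) at the given decision threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)  # true positive rate


def specificity(tn, fp):
    return tn / (tn + fp)  # 1 - false positive rate


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def auc(y_true, y_score):
    """Pairwise (rank-based) estimate of the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = HFpEF event) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4]
tp, fp, tn, fn = confusion_counts(labels, scores)
```

Unlike sensitivity, specificity and accuracy, the AUC does not depend on the chosen threshold, which is why it is reported alongside them.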
Example 2 model verification
A total of 171 sampled participants without cardiovascular disease were collected as the test set for evaluating the method: over 8 years of follow-up, 139 participants showed no heart failure and 32 participants were diagnosed with HFpEF.
The discrimination and calibration of the model were assessed using the AUC and the Hosmer-Lemeshow test. The following table is the confusion matrix on the test set for the 30 features used in the method:
In the test set, the AUC obtained with this model was 0.90 (95% confidence interval, 0.89-0.90). The calibration curve of the model is shown in Fig. 1; the Hosmer-Lemeshow test was not statistically significant (p=0.678), which supports the reliability of the result.
To assess the impact of training set sample size on this model, 75%, 60%, 50% and 25% of training set participants were randomly selected. Test set results are shown below:
Sample size | Data set | AUC | Sensitivity | Specificity | Accuracy |
---|---|---|---|---|---|
25% | Validation set | 0.90(0.76-1.00) | 0.89(0.57-1.00) | 0.93(0.84-1.00) | 0.92(0.86-0.99) |
25% | Test set | 0.88(0.84-0.91) | 0.82(0.76-0.88) | 0.84(0.66-1.00) | 0.84(0.69-0.98) |
50% | Validation set | 0.87(0.79-0.95) | 0.86(0.62-1.00) | 0.79(0.61-0.98) | 0.80(0.65-0.95) |
50% | Test set | 0.89(0.87-0.90) | 0.85(0.80-0.91) | 0.87(0.75-0.98) | 0.86(0.78-0.95) |
60% | Validation set | 0.92(0.87-0.96) | 0.89(0.82-0.95) | 0.84(0.69-0.99) | 0.85(0.71-0.98) |
60% | Test set | 0.89(0.88-0.89) | 0.83(0.77-0.89) | 0.85(0.72-0.98) | 0.86(0.75-0.94) |
75% | Validation set | 0.90(0.86-0.95) | 0.84(0.57-1.00) | 0.87(0.66-1.00) | 0.87(0.70-1.00) |
75% | Test set | 0.88(0.85-0.91) | 0.76(0.70-0.82) | 0.90(0.82-0.98) | 0.87(0.81-0.93) |
The results indicate that performance on the test set does not depend on the sample size of the training set.
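The random selection of training-set fractions can be reproduced with a seeded sampler. This is a minimal sketch: the function name `subsample`, the seed and the participant count of 200 are hypothetical, since the text does not state how the selection was implemented; only the fractions 25%, 50%, 60% and 75% come from the text.

```python
import random


def subsample(participant_ids, fraction, seed=0):
    """Draw a reproducible random subset of the training participants."""
    k = round(len(participant_ids) * fraction)
    return random.Random(seed).sample(participant_ids, k)


# Hypothetical training set of 200 participant identifiers.
training_ids = list(range(200))
subsets = {f: subsample(training_ids, f) for f in (0.25, 0.50, 0.60, 0.75)}
```

Fixing the seed keeps each subset identical across runs, so differences in the resulting metrics can be attributed to sample size rather than to a different draw.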
Furthermore, the HFrisk model was compared with widely used baseline models: SVC, Bagging, Random Forest, RUSBoost, Easy Ensemble, Gradient Boosting, Gaussian Naive Bayes, XGBoost and LogitBoost. Each baseline model was fine-tuned to achieve its best results. Using the same 30 features, the AUC results shown in Fig. 2 were obtained. Although the AUC varied across the models (AUC=0.57-0.88), the DeepFM model still performed best.
Comparison of this model with other published models: William B. Kannel et al. proposed a 4-year risk assessment model (Profile for Estimating Risk of Heart Failure[J]. Archives of Internal Medicine, 1999, 159(11): 1197-204.) that uses a pooled logistic regression algorithm to assess CHF risk by sex in the FHS cohort. A model following William B. Kannel's approach was built on the same training set and compared with the model constructed according to the present application. The AUCs obtained by the model of the present invention were 0.99 for males and 0.94 for females; the AUCs of the William B. Kannel model were 0.74 for males and 0.89 for females. The results are shown in FIG. 3.
The AUC or C statistic of the present model was also compared directly with those reported for published models: Sadiya S. Khan et al. described 10-year CHF risk equations (Khan S S, Ning H, Shah S J, et al. 10-Year Risk Equations for Incident Heart Failure in the General Population[J]. Journal of the American College of Cardiology, 2019, 73(19): 2388-2397.) with validation-set C statistics of 0.71-0.87; Edward Choi et al. established an early detection model for CHF with a test-set AUC <0.80. These results indicate that the present method performs best among the models compared.
Calibration plots are often used to evaluate consistency (calibration), i.e. the difference between predicted and actual values. A calibration plot is a scatter plot of the actual incidence (observed risk) against the predicted incidence (predicted risk), and visualizes the result of the Hosmer-Lemeshow goodness-of-fit test. Passing the true outcomes and the model-fitted probabilities to the hoslem.test function gives p=0.678, which indicates that the null hypothesis cannot be rejected and that the fit is good. The results are shown in FIG. 4.
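The Hosmer-Lemeshow statistic itself is simple enough to write out. The sketch below is a plain-Python approximation rather than a port of hoslem.test: it forms equal-size groups after sorting by predicted probability, assumes every group is non-empty with an expected count strictly between 0 and the group size, and uses a closed-form chi-square tail that is valid only for an even number of degrees of freedom (groups - 2). The function names and toy data are hypothetical, and the example is not intended to reproduce p=0.678.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # accumulates (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (statistic, p-value) comparing observed and expected events."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat, chi2_sf_even_df(stat, groups - 2)


# Hypothetical outcomes and fitted probabilities, four groups of two.
outcomes = [0, 0, 0, 1, 1, 1, 1, 1]
fitted = [0.1, 0.1, 0.3, 0.3, 0.6, 0.6, 0.9, 0.9]
statistic, p_value = hosmer_lemeshow(outcomes, fitted, groups=4)
```

A large p-value means the observed event counts per group are compatible with the predicted probabilities, which is the sense in which a non-significant result is read as good calibration.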
Claims (10)
1. A marker combination for predicting the risk of heart failure with preserved ejection fraction, comprising:
Beta values of methylation sites including: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910;
Age of patient;
Whether diuretics are taken;
BMI;
Urine albumin;
Serum creatinine.
2. A model for predicting the risk of heart failure with preserved ejection fraction, wherein the information collected by the model comprises the following marker combinations:
age of patient; whether diuretics are taken; BMI; urine albumin; serum creatinine; beta value of methylation site;
The methylation sites comprise: cg06344265, cg27401945, cg25755428, cg24205914, cg23299445, cg21429551, cg21024264, cg20051875, cg17766026, cg16781992, cg13352914, cg11853697, cg10556349, cg10083824, cg08614290, cg08101977, cg07041999, cg05845376, cg05481257, cg05363438, cg03556243, cg03233656, cg00522231, cg00495303, cg00045910.
3. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 2, wherein the model is built by applying the DeepFM algorithm to the marker combination obtained by feature screening.
4. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 3, wherein constructing the model comprises passing the output through a sigmoid function in the DeepFM algorithm for early prediction of a heart failure with preserved ejection fraction event, the output being:
$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$
wherein $\hat{y}$ is the predicted heart failure with preserved ejection fraction event, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the DNN component.
5. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 4, wherein the output of the FM component is the sum of one addition unit and a plurality of inner product units:
$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j$$
wherein $w \in R^d$ and $V_i \in R^k$, k being given; the addition unit $\langle w, x \rangle$ reflects the importance of the order-1 features, and the inner product units represent the effect of order-2 feature interactions.
6. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 4, wherein the output of the DNN component is:
$$y_{DNN} = W^{|H|+1} \cdot a^{|H|} + b^{|H|+1}$$
wherein $|H|$ is the number of hidden layers, and $a^{(l)}$, $W^{(l)}$ and $b^{(l)}$ are the output (starting from the embedding layer), the model weight and the bias of the $l$-th layer.
7. The model for predicting the risk of heart failure with preserved ejection fraction according to claim 5, wherein, for the deep hidden layers, a deep neural network comprising two hidden layers (256) is implemented using ReLU as the activation function:
$$y = f(x) = \mathrm{relu}(wx + b),$$
logloss is used as the objective function and, to control overfitting, an L2 regularization penalty with the parameter set to 0.0001 is added on the nodes.
8. The model of claim 3, wherein the performance of the DeepFM model is evaluated using three-fold cross-validation, Adam is used as the optimization algorithm, the learning rate is set to 0.0001, the number of epochs is set to 400, and dropout is set to 60%.
9. Use of a marker combination according to claim 1 and/or a model according to any one of claims 2-8 for the preparation of a medicament and/or a kit for the treatment of heart failure with preserved ejection fraction.
10. Use of a marker combination according to claim 1 and/or a model according to any one of claims 2-8 for constructing an early predictive model of a chronic complex disease.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110686960.8A CN113421648B (en) | 2021-06-21 | 2021-06-21 | Model for predicting ejection fraction retention type heart failure risk |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113421648A CN113421648A (en) | 2021-09-21 |
CN113421648B CN113421648B (en) | 2024-04-23 |
Family
ID=77789546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110686960.8A Active CN113421648B (en) | 2021-06-21 | 2021-06-21 | Model for predicting ejection fraction retention type heart failure risk |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113421648B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104994912A (en) * | 2012-12-06 | 2015-10-21 | 康肽德生物医药技术有限公司 | Peptide therapeutics and methods for using same |
CN105586406A (en) * | 2016-01-15 | 2016-05-18 | 汪道文 | Method for detecting gene polymorphism of ADRB1 and GRK5 |
CN107683341A (en) * | 2015-05-08 | 2018-02-09 | 新加坡科技研究局 | method for the diagnosis and prognosis of chronic heart failure |
JP2018072337A (en) * | 2016-10-21 | 2018-05-10 | 国立研究開発法人国立循環器病研究センター | Method of predicting recurrence risk of major adverse cardiac event |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130310436A1 (en) * | 2012-05-18 | 2013-11-21 | The Regents Of The University Of Colorado, A Body Corporate | Methods for predicting response to beta-blocker therapy in non-ischemic heart failure patients |
WO2017218911A1 (en) * | 2016-06-17 | 2017-12-21 | Abbott Laboratories | BIOMARKERS TO PREDICT NEW ONSET HEART FAILURE WITH PRESERVED EJECTION FRACTION (HFpEF) |
Also Published As
Publication number | Publication date |
---|---|
CN113421648A (en) | 2021-09-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |