CN113205880B - LogitBoost-based heart disease prognosis prediction method and device

Publication number
CN113205880B
Authority
CN
China
Prior art keywords
variable
binning
variables
relevant
logitboost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110483774.4A
Other languages
Chinese (zh)
Other versions
CN113205880A (en)
Inventor
刘淇乐
林桂森
曾安
徐小维
陈宇琛
贾乾君
黄美萍
史弋宇
庄建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong General Hospital
Original Assignee
Guangdong General Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong General Hospital
Priority to CN202110483774.4A
Publication of CN113205880A
Application granted
Publication of CN113205880B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/50 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Abstract

The invention discloses a LogitBoost-based method and device for predicting the prognosis of heart disease. The method comprises the following steps: acquiring target CCTA clinical data, and determining a plurality of relevant variables from the target CCTA clinical data; performing a binning operation on the plurality of relevant variables based on a plurality of binning principles to obtain a plurality of binned variables corresponding to each relevant variable; calculating the information gain of each of the binned variables corresponding to each relevant variable, and screening a plurality of target relevant variables from the plurality of relevant variables according to the information gains corresponding to all relevant variables; and establishing a LogitBoost prediction model based on the LogitBoost learning algorithm according to the target relevant variables and the target CCTA clinical data. The invention thereby achieves more accurate heart disease prognosis prediction, helps improve prediction efficiency and precision, and supports subsequent clinical diagnosis and treatment.

Description

LogitBoost-based heart disease prognosis prediction method and device
Technical Field
The invention relates to the technical field of disease prognosis prediction, in particular to a cardiac disease prognosis prediction method and device based on LogitBoost.
Background
As living standards modernize, people pay increasing attention to physical health, and to heart health in particular, so predicting the long-term prognosis of heart disease patients is increasingly important. In existing prediction methods, operators such as doctors cannot effectively screen variables from large volumes of CCTA (coronary computed tomography angiography) data, so the resulting predictions are poor. The prior art therefore has shortcomings that urgently need to be addressed.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a LogitBoost-based heart disease prognosis prediction method and device that take into account the influence of different binning principles on variable binning, screen out high-quality target variables according to the information gain of the binned variables, and, combined with the LogitBoost learning algorithm, achieve more accurate heart disease prognosis prediction, which helps improve prediction efficiency and precision and supports subsequent clinical diagnosis and treatment.
To solve the above technical problem, a first aspect of the invention discloses a LogitBoost-based heart disease prognosis prediction method, which comprises the following steps:
acquiring target CCTA clinical data, and determining a plurality of related variables from the target CCTA clinical data; the target CCTA clinical data comprises CCTA clinical data of a plurality of heart disease patients;
performing a binning operation on the plurality of relevant variables based on a plurality of binning principles to obtain a plurality of binned variables corresponding to each relevant variable;
calculating the information gain of each of the binned variables corresponding to each relevant variable, and screening a plurality of target relevant variables from the relevant variables according to the information gains corresponding to all the relevant variables;
establishing a LogitBoost prediction model based on a LogitBoost learning algorithm according to the target related variables and the target CCTA clinical data; the LogitBoost prediction model is used for carrying out heart disease prognosis prediction on a target heart disease patient.
As an alternative implementation, in the first aspect of the present invention, the method further includes:
and analyzing the prediction result of the LogitBoost prediction model by adopting a prediction result accuracy analysis method to obtain the prediction effect of the LogitBoost prediction model.
As an alternative embodiment, in the first aspect of the invention, the type of the relevant variable comprises a computed tomography angiography indicator and/or a clinical variable.
As an alternative embodiment, in the first aspect of the present invention, the relevant variables include one or more of: all-cause mortality, body mass index, blood pressure, body surface area, coronary artery calcium score, diagonal branch, diabetes, diabetic peripheral neuropathy, ejection fraction, family history, Framingham raw risk score, Framingham risk score, triglycerides, glycated hemoglobin, high-density lipoprotein, ischemic stroke, left circumflex artery, low-density lipoprotein, left main artery, left ventricular end-diastole, left ventricular end-systole, left ventricular mass, major adverse cardiac events, modified Duke index, number, middle, obtuse marginal branch, peripheral artery disease, lateral branch, proximal, right coronary artery, segment involvement score, tachypnea, segment stenosis score, and transient ischemic attack.
As an optional implementation manner, in the first aspect of the present invention, the performing a binning operation on the multiple related variables based on multiple binning principles to obtain multiple binned variables corresponding to each of the related variables includes:
and for each relevant variable in the plurality of relevant variables, binning the relevant variable under each of four binning principles, namely equal-frequency binning, equal-distance binning, decision tree binning and chi-square binning, to obtain an equal-frequency binned variable, an equal-distance binned variable, a decision tree binned variable and a chi-square binned variable corresponding to the relevant variable.
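The two unsupervised principles named above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the choice of four bins, and the body mass index readings are assumptions made for the example and do not come from the patent, which does not specify the number of bins or how ties are handled.

```python
from typing import List


def equal_width_edges(values: List[float], n_bins: int) -> List[float]:
    """Interior cut points splitting [min, max] into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


def equal_frequency_edges(values: List[float], n_bins: int) -> List[float]:
    """Interior cut points chosen so each bin holds roughly the same number of samples."""
    ordered = sorted(values)
    n = len(ordered)
    edges: List[float] = []
    for i in range(1, n_bins):
        cut = ordered[(i * n) // n_bins]
        if not edges or cut > edges[-1]:  # skip duplicate cut points caused by ties
            edges.append(cut)
    return edges


def assign_bins(values: List[float], edges: List[float]) -> List[int]:
    """Bin index of each value = number of cut points less than or equal to it."""
    return [sum(1 for e in edges if v >= e) for v in values]


# Hypothetical body mass index readings, used only to illustrate the two rules.
bmi = [18.5, 21.0, 22.4, 23.1, 24.8, 26.0, 27.5, 31.2, 35.0, 40.5]
width_bins = assign_bins(bmi, equal_width_edges(bmi, 4))     # equal-distance binning
freq_bins = assign_bins(bmi, equal_frequency_edges(bmi, 4))  # equal-frequency binning
print(width_bins)
print(freq_bins)
```

Equal-distance bins follow the value range, so outliers can leave some bins sparsely populated, whereas equal-frequency bins follow the sample ranks. Decision tree and chi-square binning additionally use the outcome label and are not covered by this sketch.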
As an optional implementation manner, in the first aspect of the present invention, the calculating variable information gains of a plurality of binned variables corresponding to each of the relevant variables, and screening a plurality of target relevant variables from the plurality of relevant variables according to the variable information gains corresponding to all the relevant variables includes:
calculating, for each relevant variable, the information gain of the equal-frequency binned variable, the information gain of the equal-distance binned variable, the information gain of the decision tree binned variable and the information gain of the chi-square binned variable;
calculating, for each relevant variable, the average of these four information gains;
sorting all the relevant variables from high to low by their average information gain to obtain a variable sequence;
and determining the first preset number of relevant variables in the variable sequence as the target relevant variables.
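The source does not spell out the information gain formula. Assuming the standard entropy-based definition, the screening steps above can be written as follows, where $Y$ (the outcome label), $X_j^{(b)}$ (the $j$-th relevant variable binned under principle $b$) and $B$ (the set of four binning principles) are notation introduced here for illustration only.

```latex
\[
\begin{aligned}
H(Y) &= -\sum_{c} p(c)\,\log_2 p(c), \\
\mathrm{IG}\bigl(Y; X_j^{(b)}\bigr) &= H(Y) - \sum_{v} p\bigl(X_j^{(b)} = v\bigr)\, H\bigl(Y \mid X_j^{(b)} = v\bigr), \\
\overline{\mathrm{IG}}_j &= \frac{1}{4} \sum_{b \in B} \mathrm{IG}\bigl(Y; X_j^{(b)}\bigr),
\qquad B = \{\text{equal-frequency},\ \text{equal-distance},\ \text{decision tree},\ \text{chi-square}\}.
\end{aligned}
\]
```

The target relevant variables are then the preset number of variables with the largest $\overline{\mathrm{IG}}_j$.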
As an alternative implementation manner, in the first aspect of the present invention, the method for analyzing accuracy of the prediction result includes a multi-fold cross-validation method and/or a significance analysis method.
A second aspect of the invention discloses a LogitBoost-based heart disease prognosis prediction device, which comprises:
the acquisition module is used for acquiring target CCTA clinical data and determining a plurality of related variables from the target CCTA clinical data; the target CCTA clinical data comprises CCTA clinical data of a plurality of heart disease patients;
the binning module is used for performing a binning operation on the plurality of relevant variables based on a plurality of binning principles to obtain a plurality of binned variables corresponding to each relevant variable;
the calculation module is used for calculating the information gain of each of the binned variables corresponding to each relevant variable and screening a plurality of target relevant variables from the relevant variables according to the information gains corresponding to all relevant variables;
the establishing module is used for establishing a LogitBoost prediction model based on a LogitBoost learning algorithm according to the target related variables and the target CCTA clinical data; the LogitBoost prediction model is used for carrying out heart disease prognosis prediction on a target heart disease patient.
As an optional embodiment, in the second aspect of the present invention, the apparatus further comprises:
and the analysis module is used for analyzing the prediction result of the LogitBoost prediction model by adopting a prediction result accuracy analysis method to obtain the prediction effect of the LogitBoost prediction model.
As an alternative embodiment, in the second aspect of the invention, the type of the relevant variable comprises a computed tomography angiography indicator and/or a clinical variable.
As an alternative embodiment, in the second aspect of the present invention, the relevant variables include one or more of: all-cause mortality, body mass index, blood pressure, body surface area, coronary artery calcium score, diagonal branch, diabetes, diabetic peripheral neuropathy, ejection fraction, family history, Framingham raw risk score, Framingham risk score, triglycerides, glycated hemoglobin, high-density lipoprotein, ischemic stroke, left circumflex artery, low-density lipoprotein, left main artery, left ventricular end-diastole, left ventricular end-systole, left ventricular mass, major adverse cardiac events, modified Duke index, number, middle, obtuse marginal branch, peripheral artery disease, lateral branch, proximal, right coronary artery, segment involvement score, tachypnea, segment stenosis score, and transient ischemic attack.
As an optional implementation manner, in the second aspect of the present invention, the specific manner in which the binning module performs binning operation on the multiple related variables based on multiple binning principles to obtain multiple binned variables corresponding to each related variable includes:
and for each relevant variable in the plurality of relevant variables, binning the relevant variable under each of four binning principles, namely equal-frequency binning, equal-distance binning, decision tree binning and chi-square binning, to obtain an equal-frequency binned variable, an equal-distance binned variable, a decision tree binned variable and a chi-square binned variable corresponding to the relevant variable.
As an optional implementation manner, in the second aspect of the present invention, the specific manner in which the calculating module calculates the variable information gains of the plurality of binned variables corresponding to each of the relevant variables, and screens out a plurality of target relevant variables from the plurality of relevant variables according to the variable information gains corresponding to all of the relevant variables includes:
calculating, for each relevant variable, the information gain of the equal-frequency binned variable, the information gain of the equal-distance binned variable, the information gain of the decision tree binned variable and the information gain of the chi-square binned variable;
calculating, for each relevant variable, the average of these four information gains;
sorting all the relevant variables from high to low by their average information gain to obtain a variable sequence;
and determining the first preset number of relevant variables in the variable sequence as the target relevant variables.
As an alternative embodiment, in the second aspect of the present invention, the method for analyzing accuracy of the prediction result includes a multi-fold cross-validation method and/or a significance analysis method.
In a third aspect, the present invention discloses another LogitBoost-based cardiac disease prognosis prediction device, which includes:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program codes stored in the memory to execute part or all of the steps of the LogitBoost-based cardiac disease prognosis prediction method disclosed in the first aspect of the embodiment of the invention.
The fourth aspect of the embodiments of the present invention discloses a bluetooth device, which includes a LogitBoost-based cardiac disease prognosis prediction apparatus, where the LogitBoost-based cardiac disease prognosis prediction apparatus is used to perform part or all of the steps in the LogitBoost-based cardiac disease prognosis prediction method disclosed in the first aspect of the embodiments of the present invention.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, target CCTA clinical data are obtained, and a plurality of relevant variables are determined from the target CCTA clinical data; a binning operation is performed on the plurality of relevant variables based on a plurality of binning principles to obtain a plurality of binned variables corresponding to each relevant variable; the information gain of each of the binned variables corresponding to each relevant variable is calculated, and a plurality of target relevant variables are screened from the relevant variables according to the information gains corresponding to all the relevant variables; and a LogitBoost prediction model is established based on the LogitBoost learning algorithm according to the target relevant variables and the target CCTA clinical data. The method therefore takes into account the influence of different binning principles on variable binning, screens out high-quality target variables according to the information gain of the binned variables, and, combined with the LogitBoost learning algorithm, achieves more accurate heart disease prognosis prediction, which helps improve prediction efficiency and accuracy and supports subsequent clinical diagnosis and treatment.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for predicting prognosis of cardiac disease based on LogitBoost according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a cardiac disease prognosis prediction device based on LogitBoost according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of another LogitBoost-based cardiac disease prognosis prediction apparatus according to an embodiment of the present invention;
Fig. 4 is a diagram illustrating the results of ranking the information gain averages of a plurality of variables of the ACM prediction model and the MACE prediction model according to an embodiment of the present invention;
Fig. 5 is a diagram illustrating predicted results of the ACM model and the MACE model according to the present disclosure.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, or article that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements that are not listed or that are inherent to such process, method, apparatus, or article.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The invention discloses a cardiac disease prognosis prediction method and device based on LogitBoost, which can effectively consider the influence of different binning principles on the binning of variables, further screen out high-quality target variables based on the variable information gain of the binned variables, realize more accurate cardiac disease prognosis prediction effect by combining a LogitBoost learning algorithm, contribute to improving the prediction efficiency and precision, and provide help for subsequent clinical diagnosis and treatment. The following are detailed below.
Example one
Referring to Fig. 1, Fig. 1 is a schematic flow chart illustrating a method for predicting prognosis of cardiac disease based on LogitBoost according to an embodiment of the present invention. The method described in Fig. 1 may be applied to corresponding disease prognosis prediction equipment, a disease prognosis prediction terminal, or a disease prognosis prediction server, and the server may be a local server or a cloud server, which is not limited in the embodiment of the present invention. As shown in Fig. 1, the method for predicting cardiac disease prognosis based on LogitBoost may include the following operations:
101. Target CCTA clinical data is obtained, and a plurality of relevant variables are determined from the target CCTA clinical data.
In the embodiment of the invention, the target CCTA clinical data comprises CCTA clinical data of a plurality of heart disease patients. Optionally, the target CCTA clinical data may be obtained by collecting a large amount of CCTA clinical data from patients with cardiovascular and cerebrovascular diseases and selecting the higher-quality portion of that data.
In an embodiment of the invention, the type of the relevant variable comprises a computed tomography angiography indicator and/or a clinical variable. Optionally, the relevant variables include ACM (all-cause mortality), BMI (body mass index), BP (blood pressure), BSA (body surface area), CCS (coronary calcium score), CCTA (coronary computed tomography angiography), D (diagonal branch), DM (diabetes mellitus), DPN (diabetic peripheral neuropathy), EF (ejection fraction), FHx (family history), FRRS (Framingham raw risk score), FRS (Framingham risk score), triglycerides, glycated hemoglobin, HDL (high-density lipoprotein), ischemic stroke, LAD (left anterior descending artery), LCX (left circumflex artery), LDL (low-density lipoprotein), LM (left main artery), LVED (left ventricular end-diastole), LVES (left ventricular end-systole), LVM (left ventricular mass), MACE (major adverse cardiac events), MDI (modified Duke index), Nr (number), Mid (middle), OM (obtuse marginal branch), PAD (peripheral artery disease), lateral branch, proximal, RCA (right coronary artery), segment involvement score, tachypnea, SSS (segment stenosis score), and TIA (transient ischemic attack).
102. A binning operation is performed on the plurality of relevant variables based on a plurality of binning principles to obtain a plurality of binned variables corresponding to each relevant variable.
103. The information gain of each of the binned variables corresponding to each relevant variable is calculated, and a plurality of target relevant variables are screened from the plurality of relevant variables according to the information gains corresponding to all relevant variables.
104. A LogitBoost prediction model is established based on the LogitBoost learning algorithm according to the target relevant variables and the target CCTA clinical data.
In the embodiment of the invention, the LogitBoost prediction model is used for carrying out heart disease prognosis prediction on a target heart disease patient.
Therefore, the method provided by the embodiment of the invention takes into account the influence of different binning principles on variable binning, screens out high-quality target variables according to the information gain of the binned variables, and, combined with the LogitBoost learning algorithm, achieves more accurate heart disease prognosis prediction, improves the efficiency and precision of prediction, and supports subsequent clinical diagnosis and treatment.
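The patent names the LogitBoost learning algorithm but does not describe its internals. As a rough illustration, the following sketch implements the commonly cited two-class LogitBoost procedure (Friedman, Hastie and Tibshirani) with regression stumps as the weak learner. The stump learner, the number of rounds, the clipping constant `z_max`, and the toy data are all assumptions of this sketch rather than details taken from the patent.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def sigmoid2(f: float) -> float:
    """p = 1 / (1 + exp(-2F)), with F clamped to avoid overflow."""
    return 1.0 / (1.0 + math.exp(-2.0 * max(-30.0, min(30.0, f))))


def weighted_mean(pairs: List[Tuple[float, float]]) -> float:
    total = sum(w for _, w in pairs)
    return sum(z * w for z, w in pairs) / total if total > 0 else 0.0


def fit_stump(X: List[List[float]], z: List[float], w: List[float]) -> Stump:
    """Weighted least-squares regression stump fitted to the working response z."""
    best, best_err = (0, 0.0, 0.0, 0.0), float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [(zi, wi) for row, zi, wi in zip(X, z, w) if row[j] <= t]
            right = [(zi, wi) for row, zi, wi in zip(X, z, w) if row[j] > t]
            lv, rv = weighted_mean(left), weighted_mean(right)
            err = sum(wi * (zi - lv) ** 2 for zi, wi in left)
            err += sum(wi * (zi - rv) ** 2 for zi, wi in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best


def stump_predict(stump: Stump, row: List[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_logitboost(X: List[List[float]], y: List[int],
                   n_rounds: int = 10, z_max: float = 4.0) -> List[Stump]:
    """Two-class LogitBoost: y holds 0/1 labels; returns the fitted stumps."""
    F = [0.0] * len(X)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        p = [sigmoid2(f) for f in F]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]        # Newton weights
        z = [max(-z_max, min(z_max, (yi - pi) / wi))          # clipped working response
             for yi, pi, wi in zip(y, p, w)]
        stump = fit_stump(X, z, w)
        stumps.append(stump)
        F = [f + 0.5 * stump_predict(stump, row) for f, row in zip(F, X)]
    return stumps


def predict_proba(stumps: List[Stump], row: List[float]) -> float:
    """Estimated probability that the label is 1."""
    return sigmoid2(sum(0.5 * stump_predict(s, row) for s in stumps))


# Hypothetical two-feature toy data; only the first feature is informative.
X = [[0.1, 1], [0.2, 0], [0.3, 1], [0.4, 0], [0.6, 1], [0.7, 0], [0.8, 1], [0.9, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
stumps = fit_logitboost(X, y)
print([round(predict_proba(stumps, row), 3) for row in X])
```

In a setting like the one described here, the rows of `X` would hold the screened target relevant variables for each patient and `y` the outcome being predicted, for example ACM or MACE.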
In an optional embodiment, the method further comprises:
and analyzing the prediction result of the LogitBoost prediction model by adopting a prediction result accuracy analysis method to obtain the prediction effect of the LogitBoost prediction model.
In the embodiment of the present invention, optionally, the method for analyzing the accuracy of the prediction result includes a multi-fold cross-validation method and/or a significance analysis method. For example, the prediction result of the LogitBoost prediction model may be evaluated with 3-fold cross-validation, and the average over 100 repetitions is taken as the final score to validate the prediction effect of the model.
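The repeated cross-validation described above can be sketched as follows. The source does not state which score is averaged, so accuracy is used here as an assumption. The `fit_predict` callable is a hypothetical interface (train on one part, return predicted labels for the other), and a trivial majority-class learner stands in for the LogitBoost model so that the example stays self-contained.

```python
import random
from statistics import mean
from typing import Callable, List

FitPredict = Callable[[List[List[float]], List[int], List[List[float]]], List[int]]


def k_fold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def repeated_cv_score(X: List[List[float]], y: List[int], fit_predict: FitPredict,
                      k: int = 3, repeats: int = 100, seed: int = 0) -> float:
    """Mean accuracy over `repeats` independent k-fold cross-validation runs."""
    rng = random.Random(seed)
    run_scores = []
    for _ in range(repeats):
        fold_scores = []
        for test_idx in k_fold_indices(len(X), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            preds = fit_predict([X[i] for i in train_idx],
                                [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
            hits = sum(int(p == y[i]) for p, i in zip(preds, test_idx))
            fold_scores.append(hits / len(test_idx))
        run_scores.append(mean(fold_scores))
    return mean(run_scores)


def majority_baseline(train_X, train_y, test_X):
    """Placeholder learner: always predict the most common training label."""
    label = max(sorted(set(train_y)), key=train_y.count)
    return [label] * len(test_X)


# Hypothetical data: 12 patients, 8 without and 4 with the event.
X = [[float(i)] for i in range(12)]
y = [0] * 8 + [1] * 4
score = repeated_cv_score(X, y, majority_baseline)
print(round(score, 4))
```

Reshuffling before every run means the 100 repetitions use different fold assignments, which reduces the dependence of the final score on any single split.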
In another optional embodiment, in step 102, performing a binning operation on the multiple relevant variables based on multiple binning principles to obtain multiple binned variables corresponding to each relevant variable includes:
for each relevant variable in the multiple relevant variables, binning the relevant variable under each of four binning principles, namely equal-frequency binning, equal-distance binning, decision tree binning and chi-square binning, to obtain an equal-frequency binned variable, an equal-distance binned variable, a decision tree binned variable and a chi-square binned variable corresponding to the relevant variable.
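Of the four principles, decision tree binning is supervised: cut points are placed where they best separate the outcome. The patent does not say which tree algorithm or split criterion is used, so the sketch below assumes a depth-limited greedy search that maximizes entropy reduction. The function names, the depth limit, and the calcium score values are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(pairs: List[Tuple[float, int]]) -> Optional[Tuple[float, float]]:
    """pairs are (value, label) sorted by value; return (gain, threshold) or None."""
    labels = [lab for _, lab in pairs]
    base, n = entropy(labels), len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between equal values
        gain = base - (i / n) * entropy(labels[:i]) - ((n - i) / n) * entropy(labels[i:])
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def tree_cut_points(values: Sequence[float], labels: Sequence[int],
                    max_depth: int = 2, min_gain: float = 1e-9) -> List[float]:
    """Sorted cut points found by recursive entropy-based splitting."""
    def grow(pairs: List[Tuple[float, int]], depth: int) -> List[float]:
        if depth == 0 or len(pairs) < 2:
            return []
        found = best_split(pairs)
        if found is None or found[0] <= min_gain:
            return []
        cut = found[1]
        left = [p for p in pairs if p[0] <= cut]
        right = [p for p in pairs if p[0] > cut]
        return grow(left, depth - 1) + [cut] + grow(right, depth - 1)

    return grow(sorted(zip(values, labels)), max_depth)


# Hypothetical coronary calcium scores with a binary outcome label per patient.
ccs = [0, 5, 12, 40, 90, 150, 300, 420, 600, 900]
event = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
cuts = tree_cut_points(ccs, event)
print(cuts)
```

Chi-square binning is also supervised but typically works in the opposite direction, starting from many fine intervals and merging adjacent ones whose outcome distributions are statistically similar. It is not covered by this sketch.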
In another alternative embodiment, in step 103, calculating the variable information gains of the plurality of binned variables corresponding to each relevant variable, and screening a plurality of target relevant variables from the plurality of relevant variables according to the variable information gains corresponding to all relevant variables, includes:
calculating, for each relevant variable, the information gain of the equal-frequency binned variable, the information gain of the equal-distance binned variable, the information gain of the decision tree binned variable and the information gain of the chi-square binned variable;
calculating, for each relevant variable, the average of these four information gains;
sorting all relevant variables from high to low by their average information gain to obtain a variable sequence;
and determining the first preset number of relevant variables in the variable sequence as the target relevant variables.
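The averaging and ranking steps can be sketched as below, assuming the standard entropy-based information gain, which the source does not define explicitly. The dictionary layout, the function names, and the bin indices are hypothetical; in practice the bin indices would come from applying the four binning principles to each relevant variable.

```python
import math
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(bins: Sequence[int], labels: Sequence[int]) -> float:
    """H(labels) minus the bin-weighted entropy of the labels within each bin."""
    groups = defaultdict(list)
    for b, lab in zip(bins, labels):
        groups[b].append(lab)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_variables(binned: Dict[str, Dict[str, List[int]]],
                   labels: Sequence[int], top_k: int) -> List[str]:
    """Average each variable's gain over its binnings, then keep the top_k names."""
    avg_gain = {
        name: mean(information_gain(bins, labels) for bins in by_rule.values())
        for name, by_rule in binned.items()
    }
    return sorted(avg_gain, key=avg_gain.get, reverse=True)[:top_k]


# Hypothetical bin indices for three variables under the four binning principles.
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
rules = ("equal_frequency", "equal_distance", "decision_tree", "chi_square")
binned = {
    "CCS": {"equal_frequency": [0, 0, 0, 0, 1, 1, 1, 1],
            "equal_distance": [0, 0, 0, 1, 1, 1, 1, 1],
            "decision_tree": [0, 0, 0, 0, 1, 1, 1, 1],
            "chi_square": [0, 0, 0, 0, 1, 1, 1, 1]},
    "BP": {rule: [0, 0, 0, 1, 0, 1, 1, 1] for rule in rules},
    "BMI": {rule: [0, 1, 0, 1, 0, 1, 0, 1] for rule in rules},
}
selected = rank_variables(binned, outcome, top_k=2)
print(selected)
```

Averaging over the four binnings keeps a variable from being selected or rejected purely because one particular binning rule happens to suit it.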
The embodiment of the invention also discloses a specific technical implementation scheme. First, the higher-quality portion of the collected CCTA clinical data of patients with cardiovascular and cerebrovascular diseases is selected, and the feature selection process begins. In the feature selection process, the average information gain of each patient feature is calculated across four binning methods (equal-frequency binning, equal-distance binning, decision tree binning and chi-square binning), and the features are sorted by this average in descending order. Fig. 4 shows the resulting ranking of the average information gain of the variables for the ACM (all-cause mortality) prediction model and the MACE (major adverse cardiac events) prediction model under the four binning methods. Fig. 4 covers 35 computed tomography angiography indices (dark grey) and 34 clinical variables (light grey); the information gain ranking assesses how strongly each attribute is associated with the prediction target in the training data.
The abbreviations of some variables are as follows: ACM (all-cause mortality), BMI (body mass index), BP (blood pressure), BSA (body surface area), CCS (coronary calcium score), CCTA (coronary computed tomography angiography), D (diagonal branch), DM (diabetes mellitus), DPN (diabetic peripheral neuropathy), EF (ejection fraction), FHx (family history), FRRS (Framingham raw risk score), FRS (Framingham risk score), HDL (high-density lipoprotein), LAD (left anterior descending artery), LCX (left circumflex artery), LDL (low-density lipoprotein), LM (left main artery), LVED (left ventricular end-diastole), LVES (left ventricular end-systole), LVM (left ventricular mass), MACE (major adverse cardiac events), MDI (modified Duke index), Nr (number), Mid (middle), OM (obtuse marginal branch), PAD (peripheral artery disease), RCA (right coronary artery), SSS (segment stenosis score), and TIA (transient ischemic attack). The abbreviations of the remaining variables have well-established meanings in the art and are not described again here.
Then, the variables with higher information gain are selected from all the variables (for example, the ten top-ranked variables), and the next process, namely model evaluation, is carried out.
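As an illustration of the information gain used for this ranking, the following is a minimal sketch that assumes the usual entropy-based definition (entropy of the outcome minus its expected entropy within each bin). The function names and the toy data are hypothetical and are not taken from the embodiment.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(binned_values, labels):
    """Entropy of the outcome minus its expected entropy within each bin."""
    n = len(labels)
    groups = {}
    for b, y in zip(binned_values, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy data (made up): bin index of one variable and a binary outcome (1 = event).
bins = [0, 0, 1, 1]
outcome = [0, 0, 1, 1]
print(information_gain(bins, outcome))  # 1.0, the bins separate the outcome perfectly
```

A variable whose bins carry no information about the outcome would score 0 under this definition.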
In the model evaluation stage, a LogitBoost model is first established by ensemble learning from the patient features screened in the first step. The model is then used to predict the long-term cardiac prognosis of patients with cardiovascular disease. The results are then validated by 3-fold cross-validation, and the scores of 100 repetitions are averaged. Finally, a reliable prediction model and prediction result are obtained through significance (P value) analysis. The prediction results of the ACM model and the MACE model are shown in fig. 5. The results in fig. 5 show that the prediction method of this embodiment has higher prediction accuracy than existing conventional methods, because all relevant variables with a large influence on the patient are considered. Through 3-fold cross-validation, averaging over 100 prediction results, and significance analysis, the prediction method and model of this embodiment are more reliable, convincing and robust in predicting the long-term cardiac prognosis of patients with extra-cardiovascular disease.
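The repeated 3-fold cross-validation described above can be sketched as follows. This is only an illustration under stated assumptions: the splits are plain random splits (the embodiment does not say whether they are stratified), and `fit_and_score` is a hypothetical callback that trains a model on the training indices and returns a score on the test indices.

```python
import random
from statistics import mean

def repeated_kfold_score(n_samples, fit_and_score, k=3, repeats=100, seed=0):
    """Average a score over `repeats` random k-fold splits.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that trains
    a model on train_idx and returns a score (for example AUC) on test_idx.
    """
    rng = random.Random(seed)
    run_means = []
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k disjoint folds
        scores = []
        for i in range(k):
            test_idx = folds[i]
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            scores.append(fit_and_score(train_idx, test_idx))
        run_means.append(mean(scores))  # score of one 3-fold run
    return mean(run_means)              # final score: average over all runs

# Dummy callback for illustration only: reports the test-fold fraction.
print(repeated_kfold_score(9, lambda train, test: len(test) / 9))
```

Averaging over many random splits reduces the dependence of the final score on any single partition of the patients.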
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of a cardiac disease prognosis prediction device based on LogitBoost according to an embodiment of the present invention. The apparatus described in fig. 2 may be applied to corresponding disease prognosis prediction equipment, a disease prognosis prediction terminal, and a disease prognosis prediction server, and the server may be a local server or a cloud server, which is not limited in the embodiment of the present invention. As shown in fig. 2, the apparatus may include:
the acquiring module 201 is configured to acquire target CCTA clinical data and determine a plurality of relevant variables from the target CCTA clinical data.
In the embodiment of the invention, the target CCTA clinical data comprises CCTA clinical data of a plurality of heart disease patients. Optionally, the target CCTA clinical data may be obtained by collecting a large amount of CCTA clinical data of a patient with an extra-cardiovascular disease and screening a part of data with higher quality.
In an embodiment of the invention, the types of relevant variables include computed tomography angiography indicators and/or clinical variables. Optionally, the relevant variables include ACM (all-cause mortality), BMI (body mass index), BP (blood pressure), BSA (body surface area), CCS (coronary calcium score), CCTA (coronary computed tomography angiography), D (diagonal branch), DM (diabetes mellitus), DPN (diabetic peripheral neuropathy), EF (ejection fraction), FHx (family history), FRRS (Framingham raw risk score), FRS (Framingham risk score), triglycerides, glycated hemoglobin, HDL (high-density lipoprotein), ischemic stroke, LAD (left anterior descending artery), LCX (left circumflex artery), LDL (low-density lipoprotein), LM (left main), LVED (left ventricular end diastole), LVES (left ventricular end systole), LVM (left ventricular mass), MACE (major adverse cardiac events), MDI (modified Duke index), Nr (number), Mid (middle), OM (obtuse marginal branch), PAD (peripheral artery disease), lateral branch, proximal, RCA (right coronary artery), SIS (segment involvement score), tachypnea, SSS (segment stenosis score), and TIA (transient ischemic attack).
The binning module 202 is configured to perform binning operation on multiple related variables based on multiple binning principles to obtain multiple binned variables corresponding to each related variable;
the calculating module 203 is configured to calculate variable information gains of the plurality of binned variables corresponding to each relevant variable, and screen out a plurality of target relevant variables from the plurality of relevant variables according to the variable information gains corresponding to all relevant variables;
the establishing module 204 is used for establishing a LogitBoost prediction model based on a LogitBoost learning algorithm according to the target related variables and the target CCTA clinical data; the LogitBoost prediction model is used for carrying out heart disease prognosis prediction on a target heart disease patient.
Therefore, the device described by the embodiment of the invention can effectively consider the influence on the variable binning based on different binning principles, further screen out high-quality target variables based on the variable information gain of the binned variables, and realize a more accurate heart disease prognosis prediction effect by combining the LogitBoost learning algorithm, thereby being beneficial to improving the prediction efficiency and precision and providing help for subsequent clinical diagnosis and treatment.
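For orientation, the two-class LogitBoost procedure (Friedman, Hastie and Tibshirani) that such an establishing module could rely on may be sketched as below. This is a minimal sketch under assumptions, not the implementation of the embodiment: the weak learner (a regression stump), the number of rounds, the clipping of the working response, and all function names are choices made only for illustration.

```python
import math

def fit_stump(X, z, w):
    """Weighted least-squares regression stump: (feature, threshold, left, right)."""
    total_w = sum(w)
    mean_z = sum(wi * zi for wi, zi in zip(w, z)) / total_w
    best_err = sum(wi * (zi - mean_z) ** 2 for wi, zi in zip(w, z))
    best = (0, math.inf, mean_z, mean_z)  # constant fallback if no split helps
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            sides = ([i for i, r in enumerate(X) if r[j] <= t],
                     [i for i, r in enumerate(X) if r[j] > t])
            if not sides[0] or not sides[1]:
                continue
            means, err = [], 0.0
            for side in sides:
                sw = sum(w[i] for i in side)
                m = sum(w[i] * z[i] for i in side) / sw
                err += sum(w[i] * (z[i] - m) ** 2 for i in side)
                means.append(m)
            if err < best_err:
                best_err, best = err, (j, t, means[0], means[1])
    return best

def stump_predict(stump, x):
    j, t, left, right = stump
    return left if x[j] <= t else right

def logitboost_fit(X, y, rounds=20):
    """Two-class LogitBoost; y holds 0/1 outcomes (1 = event)."""
    F = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        p = [1.0 / (1.0 + math.exp(-2.0 * f)) for f in F]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]   # working weights
        z = [max(-4.0, min(4.0, (yi - pi) / wi))        # clipped working response
             for yi, pi, wi in zip(y, p, w)]
        stump = fit_stump(X, z, w)
        stumps.append(stump)
        F = [f + 0.5 * stump_predict(stump, x) for f, x in zip(F, X)]
    return stumps

def logitboost_predict_proba(stumps, x):
    """Estimated probability of the event for one feature vector x."""
    F = sum(0.5 * stump_predict(s, x) for s in stumps)
    return 1.0 / (1.0 + math.exp(-2.0 * F))

# Toy data (made up): one feature; the event occurs for larger values.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [0, 0, 1, 1]
model = logitboost_fit(X, y)
print(logitboost_predict_proba(model, [3.5]) > 0.5)  # True
```

In practice the feature vectors would hold only the target relevant variables selected by the calculating module, and an existing library implementation of LogitBoost would normally be preferred over hand-written code.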
As an optional implementation, the apparatus further comprises:
the analysis module 205 is configured to analyze the prediction result of the LogitBoost prediction model by using a prediction result accuracy analysis method, so as to obtain a prediction effect of the LogitBoost prediction model.
In the embodiment of the present invention, optionally, the prediction result accuracy analysis method includes a multi-fold cross-validation method and/or a significance analysis method. For example, the prediction result of the LogitBoost prediction model may be analyzed by 3-fold cross-validation, and the average over 100 repetitions is taken as the final score to verify the prediction effect of the model.
As an optional implementation, the specific manner in which the binning module 202 performs the binning operation on the multiple related variables based on multiple binning principles, to obtain the multiple binned variables corresponding to each related variable, includes:
for each relevant variable in the multiple relevant variables, variable binning processing is carried out on the relevant variable based on four binning principles of equal-frequency binning, equal-distance binning, decision tree binning and chi-square binning respectively, and an equal-frequency binning post-variable, an equal-distance binning post-variable, a decision tree binning post-variable and a chi-square binning post-variable corresponding to the relevant variable are obtained.
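Two of the four binning principles can be sketched in a few lines. The following is a simplified illustration with hypothetical function names; it assumes a numeric variable and a caller-chosen number of bins, and it omits decision tree binning and chi-square binning, which need the outcome labels and are considerably longer.

```python
def equal_width_bins(values, n_bins):
    """Equal-distance binning: split [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant variable falls into a single bin
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Equal-frequency binning: assign bins by rank so each holds about the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# Toy values (made up) with two large outliers.
values = [1, 2, 3, 4, 100, 200]
print(equal_width_bins(values, 3))      # [0, 0, 0, 0, 1, 2]
print(equal_frequency_bins(values, 3))  # [0, 0, 1, 1, 2, 2]
```

The toy output shows why the choice of binning principle matters: with skewed values, equal-distance binning crowds most samples into one bin, while equal-frequency binning spreads them evenly. Note that this rank-based sketch may place tied values in different bins.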
As an optional implementation, the specific manner in which the calculating module 203 calculates the variable information gains of the plurality of binned variables corresponding to each relevant variable, and screens out a plurality of target relevant variables from the plurality of relevant variables according to the variable information gains corresponding to all relevant variables, includes:
calculating the variable information gain after equal-frequency binning, the variable information gain after equal-distance binning, the variable information gain after decision tree binning and the variable information gain after chi-square binning, which correspond respectively to the equal-frequency binning post-variable, the equal-distance binning post-variable, the decision tree binning post-variable and the chi-square binning post-variable of each relevant variable;
calculating the average information gain value of the variable information gain after equal-frequency binning, the variable information gain after equal-distance binning, the variable information gain after decision tree binning and the variable information gain after chi-square binning corresponding to each relevant variable;
sorting all relevant variables from high to low according to their corresponding average information gain values to obtain a variable sequence;
and determining the first preset number of relevant variables in the variable sequence as the target relevant variables.
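The averaging, sorting and selection steps above can be sketched as follows. The function name, the dictionary layout and the gain values are hypothetical and serve only to illustrate the steps; the variable names SSS, SIS and BMI are borrowed from the abbreviation list purely as labels.

```python
from statistics import mean

def select_top_variables(gains_by_method, top_k):
    """gains_by_method maps binning method -> {variable: information gain}.

    Returns the top_k variables by average gain, plus the averages themselves.
    """
    # Only variables scored under every binning method can be averaged.
    variables = set.intersection(*(set(g) for g in gains_by_method.values()))
    avg = {v: mean(g[v] for g in gains_by_method.values()) for v in variables}
    ranked = sorted(avg, key=lambda v: (-avg[v], v))  # high to low, ties by name
    return ranked[:top_k], avg

# Made-up gains for three variables under the four binning principles.
gains = {
    "equal_frequency": {"SSS": 0.4, "SIS": 0.2, "BMI": 0.1},
    "equal_distance":  {"SSS": 0.3, "SIS": 0.3, "BMI": 0.1},
    "decision_tree":   {"SSS": 0.5, "SIS": 0.2, "BMI": 0.2},
    "chi_square":      {"SSS": 0.4, "SIS": 0.3, "BMI": 0.0},
}
targets, averages = select_top_variables(gains, top_k=2)
print(targets)  # ['SSS', 'SIS']
```

Here `top_k` plays the role of the preset number, for example ten in the implementation described in the first embodiment.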
Specifically, the prediction apparatus in the embodiment of the present invention is a functional module implementation scheme of the prediction method in the first embodiment, and specific technical details or implementation schemes thereof may refer to the description in the first embodiment, which is not described herein again.
Example three
Referring to fig. 3, fig. 3 is a schematic structural diagram of another cardiac disease prognosis prediction apparatus based on LogitBoost according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus may include:
a memory 301 storing executable program code;
a processor 302 coupled to the memory 301;
the processor 302 calls the executable program code stored in the memory 301 to perform part or all of the steps of the LogitBoost-based cardiac disease prognosis prediction method disclosed in the embodiment of the present invention.
Example four
The embodiment of the invention discloses a computer storage medium storing computer instructions which, when invoked, are used to execute some or all of the steps of the LogitBoost-based heart disease prognosis prediction method disclosed in the embodiment of the invention.
The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above detailed description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. Based on such understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, where the storage medium includes a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that the LogitBoost-based heart disease prognosis prediction method and device disclosed in the embodiments of the present invention are only preferred embodiments of the present invention and are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A LogitBoost-based cardiac disease prognosis prediction method, the method comprising:
acquiring target CCTA clinical data, and determining a plurality of related variables from the target CCTA clinical data; the target CCTA clinical data comprises CCTA clinical data of a plurality of heart disease patients;
for each relevant variable in the multiple relevant variables, performing variable binning processing on the relevant variable based on four binning principles of equal-frequency binning, equal-distance binning, decision tree binning and chi-square binning respectively to obtain an equal-frequency binning post-variable, an equal-distance binning post-variable, a decision tree binning post-variable and a chi-square binning post-variable corresponding to the relevant variable;
calculating the variable information gain after equal-frequency binning, the variable information gain after equal-distance binning, the variable information gain after decision tree binning and the variable information gain after chi-square binning which correspond to each relevant variable respectively;
calculating the average information gain value of the variable information gain after equal-frequency binning, the variable information gain after equal-distance binning, the variable information gain after decision tree binning and the variable information gain after chi-square binning corresponding to each relevant variable;
sorting all the relevant variables from high to low according to their corresponding average information gain values to obtain a variable sequence;
determining the first preset number of relevant variables in the variable sequence as target relevant variables;
according to the target related variable and the target CCTA clinical data, a LogitBoost prediction model based on a LogitBoost learning algorithm is established; the LogitBoost prediction model is used for carrying out heart disease prognosis prediction on a target heart disease patient.
2. The LogitBoost-based cardiac disease prognosis prediction method according to claim 1, wherein the method further comprises:
and analyzing the prediction result of the LogitBoost prediction model by adopting a prediction result accuracy analysis method to obtain the prediction effect of the LogitBoost prediction model.
3. A LogitBoost-based cardiac disease prognosis prediction method according to claim 1, wherein the type of the relevant variables comprises computed tomography angiography indicators and/or clinical variables.
4. The LogitBoost-based cardiac disease prognosis prediction method according to claim 1, wherein the relevant variables include one or more of all-cause mortality, body mass index, blood pressure, body surface area, coronary calcium score, diagonal branch, diabetes, diabetic peripheral neuropathy, ejection fraction, family history, Framingham raw risk score, Framingham risk score, triglycerides, glycated hemoglobin, high-density lipoprotein, ischemic stroke, left circumflex branch, low-density lipoprotein, left main, left ventricular end diastole, left ventricular end systole, left ventricular mass, major adverse cardiac events, modified Duke index, number, middle, obtuse marginal branch, peripheral artery, lateral branch, proximal, right coronary artery, segment involvement score, tachypnea, segment stenosis score, and transient ischemic attack.
5. The LogitBoost-based cardiac disease prognosis prediction method according to claim 2, wherein the prediction result accuracy analysis method comprises a multi-fold cross-validation method and/or a significance analysis method.
6. A LogitBoost-based cardiac disease prognosis prediction device, the device comprising:
the acquisition module is used for acquiring target CCTA clinical data and determining a plurality of related variables from the target CCTA clinical data; the target CCTA clinical data comprises CCTA clinical data of a plurality of heart disease patients;
the binning module is used for performing a binning operation on the plurality of relevant variables based on a plurality of binning principles to obtain a plurality of binned variables corresponding to each relevant variable; the specific manner in which the binning module performs the binning operation on the plurality of relevant variables based on the plurality of binning principles to obtain the plurality of binned variables corresponding to each relevant variable includes:
for each relevant variable in the multiple relevant variables, performing variable binning processing on the relevant variable based on four binning principles of equal-frequency binning, equal-distance binning, decision tree binning and chi-square binning respectively to obtain an equal-frequency binning post-variable, an equal-distance binning post-variable, a decision tree binning post-variable and a chi-square binning post-variable corresponding to the relevant variable;
the calculation module is used for calculating the variable information gains of the plurality of binned variables corresponding to each relevant variable and screening out a plurality of target relevant variables from the plurality of relevant variables according to the variable information gains corresponding to all relevant variables; the specific manner in which the calculation module calculates the variable information gains of the plurality of binned variables corresponding to each relevant variable and screens out the plurality of target relevant variables from the plurality of relevant variables according to the variable information gains corresponding to all relevant variables includes:
calculating the variable information gain after equal-frequency binning, the variable information gain after equal-distance binning, the variable information gain after decision tree binning and the variable information gain after chi-square binning which correspond to each relevant variable respectively;
calculating the average information gain value of the variable information gain after equal frequency binning, the variable information gain after equal distance binning, the variable information gain after decision tree binning and the variable information gain after chi-square binning corresponding to each relevant variable;
sorting all the relevant variables from high to low according to their corresponding average information gain values to obtain a variable sequence;
determining the first preset number of relevant variables in the variable sequence as target relevant variables;
the establishing module is used for establishing a LogitBoost prediction model based on a LogitBoost learning algorithm according to the target related variables and the target CCTA clinical data; the LogitBoost prediction model is used for carrying out heart disease prognosis prediction on a target heart disease patient.
7. A LogitBoost-based cardiac disease prognosis prediction apparatus, adapted for use with a smart card, the apparatus comprising:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to perform the LogitBoost-based cardiac disease prognosis prediction method according to any one of claims 1-5.
8. A computer storage medium storing computer instructions for performing a LogitBoost-based cardiac disease prognosis prediction method according to any one of claims 1-5 when invoked.
CN202110483774.4A 2021-04-30 2021-04-30 LogitBoost-based heart disease prognosis prediction method and device Active CN113205880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110483774.4A CN113205880B (en) 2021-04-30 2021-04-30 LogitBoost-based heart disease prognosis prediction method and device

Publications (2)

Publication Number Publication Date
CN113205880A CN113205880A (en) 2021-08-03
CN113205880B true CN113205880B (en) 2022-09-23

Family

ID=77029935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110483774.4A Active CN113205880B (en) 2021-04-30 2021-04-30 LogitBoost-based heart disease prognosis prediction method and device

Country Status (1)

Country Link
CN (1) CN113205880B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114090601B (en) * 2021-11-23 2023-11-03 北京百度网讯科技有限公司 Data screening method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399255A (en) * 2018-03-06 2018-08-14 中国银行股份有限公司 A kind of input data processing method and device of Classification Data Mining model
CN108959187A (en) * 2018-04-09 2018-12-07 中国平安人寿保险股份有限公司 A kind of variable branch mailbox method, apparatus, terminal device and storage medium
CN109840843A (en) * 2019-01-07 2019-06-04 杭州排列科技有限公司 The automatic branch mailbox algorithm of continuous type feature based on similarity combination
CN110991432A (en) * 2020-03-03 2020-04-10 支付宝(杭州)信息技术有限公司 Living body detection method, living body detection device, electronic equipment and living body detection system
CN111597918A (en) * 2020-04-26 2020-08-28 北京金山云网络技术有限公司 Training and detecting method and device of human face living body detection model and electronic equipment
CN112115316A (en) * 2019-06-20 2020-12-22 北京京东振世信息技术有限公司 Box separation method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6956572B2 (en) * 2003-02-10 2005-10-18 Siemens Medical Solutions Health Services Corporation Patient medical parameter user interface system
CN109978406A (en) * 2019-04-08 2019-07-05 上海叮诺科技有限公司 A kind of method and system of security downside risks assessment diagnosis

Also Published As

Publication number Publication date
CN113205880A (en) 2021-08-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant