CN113974566A - COPD acute exacerbation prediction method based on time window - Google Patents
- Publication number
- CN113974566A (application CN202111319613.8)
- Authority
- CN
- China
- Prior art keywords
- model
- days
- features
- day
- patient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7271—Specific aspects of physiological measurement analysis
- A61B5/7275—Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Pathology (AREA)
- Heart & Thoracic Surgery (AREA)
- Psychiatry (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Physiology (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a COPD acute exacerbation prediction method based on a time window, comprising: S1, collecting pulmonary indexes of a patient twice a day (morning and afternoon) using devices such as a small lung instrument and an electronic stethoscope; S2, using a fixed time window of lung monitoring indexes so that the model supports prediction for days T+1, T+2 and T+3 while remaining easy to use; S3, extracting further features from the raw features, the further features reflecting changes in the patient's lung monitoring indexes; S4, taking one exacerbation (and the previous 7 days) as a positive sample; S5, performing a significance test on the features; and S6, using the 235 significant features as model inputs to predict whether an exacerbation occurs on day T+d (d = 1, 2, 3). The method uses lung monitoring data within a time window to predict whether the patient is at risk of an acute exacerbation of COPD, enables the patient to carry out monitoring at home, and is of significance for the home care of COPD patients.
Description
Technical Field
The invention relates to the technical field of COPD acute exacerbation prediction, in particular to a COPD acute exacerbation prediction method based on a time window.
Background
Chronic obstructive pulmonary disease (hereinafter referred to as "COPD") is a disease involving chronic bronchitis, emphysema (a disease that damages the alveolar structures), or a mixture of both, together with obstruction of the airways from the bronchi to the alveoli; symptoms of this disease include long-term cough with sputum, dyspnea caused by reduced airflow due to airway obstruction, and frequent respiratory infections (such as the common cold); this disease causes high mortality worldwide, and its prevalence is increasing rapidly due to smoking, air pollution, and the like; the etiology of COPD is an abnormal chronic inflammatory response of the lungs to toxic molecules or gases, in which various factors such as smoking, urbanization, pollution and infectious respiratory disease are involved in a complex way.
Combinations of clinical parameters have been used to predict acute exacerbations of COPD in patients; however, these clinical parameters are not accurate enough for individual case predictions; furthermore, although the above-mentioned factors may lead to an acute exacerbation, COPD patients cannot predict the possibility of their own acute exacerbation; as a result, patients visit the hospital only after an acute exacerbation has occurred, which may lead to adverse outcomes for COPD patients.
Although the literature describes predicting COPD acute exacerbation events using statistical or machine learning means, the current literature in the field has the following drawbacks:
1. existing studies mainly use cross-sectional data, and fail to use time-series data to give real-time early warning of COPD acute exacerbation events;
2. current research does not carry out systematic feature mining to improve the prediction capability of the model;
3. current research cannot predict days T+1, T+2 and T+3 separately; existing models can only predict the overall risk probability of a future exacerbation of the patient.
Disclosure of Invention
The invention aims to provide a COPD acute exacerbation prediction method based on a time window, which uses lung monitoring data within a time window to predict whether a patient is at risk of COPD acute exacerbation on day T+d (d = 1, 2, 3), so that the patient can carry out monitoring and early warning at home; the operation is simple, the method is of significance for the home care of COPD patients, and it solves the problems described in the background.
In order to achieve the purpose, the invention provides the following technical scheme:
A COPD acute exacerbation prediction method based on a time window comprises the following steps:
S1, collecting lung indexes of a patient twice a day (morning and afternoon) using devices such as a small lung instrument and an electronic stethoscope, the indexes including FVC, FEV1 and PEF and the maximum value of the lung vibration energy collected by the stethoscope, wherein the FVC is measured with the small lung instrument and is the forced vital capacity, namely the maximum volume of air that can be exhaled as fast as possible after a maximal inhalation; the FEV1 is measured with the small lung instrument and is the volume of air exhaled in the first second of a maximal exhalation after a maximal deep inspiration; the PEF is measured with the small lung instrument and is the instantaneous flow rate at the point where expiratory flow is fastest during the forced vital capacity measurement;
S2, in order that the model supports prediction for days T+1, T+2 and T+3, and in order to keep the model easy to use, performing prediction using a fixed time window (7 days) of the patient's lung monitoring indexes; collecting 32 indexes of the patient every morning and evening through the electronic devices, and distinguishing the indexes within the time window by date and by whether they were measured in the morning, so that the number of features is 32 × 7 × 2 = 448;
S3, extracting further features from the above features, the further features reflecting changes in the patient's lung monitoring indexes; the data expansion comprises: sliding-window statistics of the indexes, such as 3-day mean/variance and 5-day mean/variance; and the differences between indexes on alternate days; the number of features after expansion is 1744;
S4, taking one exacerbation (together with the previous 7 days) as a positive sample; for negative samples, the specified time window must not include the 30 days before and after an acute attack period, so as to prevent the disease condition from influencing the monitoring indexes; negative samples are generated by sampling from all data in which 7 consecutive days of observation are available;
S5, performing a significance test on the features, and finding 235 features that are significantly correlated with whether an exacerbation occurs on day T+d (d = 1, 2, 3);
S6, using the 235 significant features as model inputs to predict whether an exacerbation occurs on day T+d (d = 1, 2, 3); the model is a decision-tree-based ensemble model: xgboost, lightgbm or catboost, and 5-fold cross validation is used to evaluate model performance.
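The feature count in step S2 can be checked with a minimal sketch. The dictionary layout, the session names and the ordering (day, then session, then index) are assumptions made for illustration; the text only fixes the three factors 32, 7 and 2.

```python
from typing import Dict, List, Tuple

N_INDEXES = 32                     # indexes per measurement session (from the text)
N_DAYS = 7                         # fixed time window in days (from the text)
SESSIONS = ("morning", "evening")  # two sessions per day; names are hypothetical


def flatten_window(window: Dict[Tuple[int, str], List[float]]) -> List[float]:
    """Flatten {(day_offset, session): [32 values]} into one feature vector.

    The order day -> session -> index is an assumed convention.
    """
    features: List[float] = []
    for day in range(N_DAYS):
        for session in SESSIONS:
            values = window[(day, session)]
            if len(values) != N_INDEXES:
                raise ValueError(f"expected {N_INDEXES} indexes, got {len(values)}")
            features.extend(values)
    return features


# Toy window: every index on day d simply holds the value d.
window = {(d, s): [float(d)] * N_INDEXES for d in range(N_DAYS) for s in SESSIONS}
vector = flatten_window(window)
print(len(vector))  # 448 = 32 * 7 * 2
```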
Preferably, the XGBoost model interpretation method includes the following steps:
(1) parsing the tree model structure of the XGBoost model to obtain the tree structure of each individual tree;
(2) inputting a test sample into the XGBoost model, and obtaining, according to the tree structure, the effective leaf node corresponding to the test sample and the effective path in the tree leading to that leaf node;
(3) calculating the contribution value of each feature according to the effective path, and interpreting the XGBoost model according to the obtained contribution values.
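One way to read steps (1) to (3) is the path-based attribution sketched below: each split on the effective path credits its feature with the change in node value between parent and child. This is an illustrative assumption, not necessarily the calculation used by the invention; the tree, the feature names and the node values are all hypothetical, and the rule "go left when the feature is below the threshold" is assumed.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical single tree: internal nodes carry a split, every node carries a value.
TREE = {
    "feature": "fev1_mean_3d", "threshold": 1.5, "value": 0.20,
    "left": {
        "feature": "pef_diff", "threshold": -0.3, "value": 0.50,
        "left": {"value": 0.80},
        "right": {"value": 0.40},
    },
    "right": {"value": 0.05},
}


def effective_path(tree: dict, sample: Dict[str, float]) -> List[dict]:
    """Return the nodes visited from the root to the leaf reached by the sample."""
    path = [tree]
    node = tree
    while "feature" in node:
        go_left = sample[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
        path.append(node)
    return path


def path_contributions(tree: dict, sample: Dict[str, float]) -> Dict[str, float]:
    """Credit each split feature with the value change it causes along the path."""
    contrib: Dict[str, float] = defaultdict(float)
    path = effective_path(tree, sample)
    for parent, child in zip(path, path[1:]):
        contrib[parent["feature"]] += child["value"] - parent["value"]
    return dict(contrib)


sample = {"fev1_mean_3d": 1.2, "pef_diff": -0.5}
contrib = path_contributions(TREE, sample)
leaf_value = effective_path(TREE, sample)[-1]["value"]
# By construction, root value + sum of contributions equals the leaf value.
```

For a boosted ensemble the same walk would be repeated for every tree and the per-feature contributions summed.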
Preferably, the XGBoost uses the Boosting ensemble method and is widely used in data mining; it can handle missing values and apply regularization, and it optimizes the cost function using second-order information to accelerate optimization.
Preferably, the LightGBM is a new gradient boosting tree framework that supports the GBDT, GBRT, GBM and MART algorithms, and is a complete solution for distributed training based on the DMTK framework.
Preferably, the Catboost algorithm includes: in a sensing period, the secondary user sends the sensed energy values in the channel to the fusion center as a feature energy vector, and the primary user intermittently sends information on whether the spectrum resources are occupied to the fusion center as a label, thereby completing the construction of the training data set; the model is then trained with the Catboost algorithm in the fusion center.
Preferably, the Catboost algorithm was proposed by Yandex; it optimizes the handling of categorical features by processing them in the training stage rather than the data preprocessing stage, and it computes the leaf node values when selecting the tree structure, so as to reduce overfitting.
Preferably, for the prediction period, an eight-day time window is used to intercept positive samples, the eight days being denoted (T-7, T-6, T-5, T-4, T-3, T-2, T-1, T); for a positive sample, day T is the starting date of the acute exacerbation; for negative samples, the specified time window must not include the 7 days before and after an acute episode.
Preferably, in order to achieve early warning, 3 groups of prediction tasks with different lead times are set for the prediction period:
(1) Task_1, using observed values from day T-5 to day T-1 to predict whether an acute exacerbation occurs on day T;
(2) Task_2, using observed values from day T-6 to day T-2 to predict whether an acute exacerbation occurs on day T;
(3) Task_3, using observed values from day T-7 to day T-3 to predict whether an acute exacerbation occurs on day T.
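The three task definitions differ only in which five observation days precede day T. A minimal sketch of that mapping follows; the function name and the example date are hypothetical.

```python
from datetime import date, timedelta
from typing import List

# Day offsets relative to T, taken directly from the task definitions.
TASK_OFFSETS = {
    "Task_1": range(-5, 0),   # T-5 .. T-1, one day of lead time
    "Task_2": range(-6, -1),  # T-6 .. T-2, two days of lead time
    "Task_3": range(-7, -2),  # T-7 .. T-3, three days of lead time
}


def observation_days(task: str, t: date) -> List[date]:
    """Return the calendar days whose observations feed the given task."""
    return [t + timedelta(days=k) for k in TASK_OFFSETS[task]]


t_day = date(2021, 11, 9)  # arbitrary example value for day T
for task in TASK_OFFSETS:
    days = observation_days(task, t_day)
    print(task, days[0].isoformat(), "to", days[-1].isoformat())
```

Each task therefore uses a five-day observation window, and the gap between the last observed day and day T is what provides the early warning.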
Preferably, to reduce the number of features, a Kolmogorov-Smirnov test is performed on the features; this test compares whether two distributions are the same, so for each feature its distribution over the positive samples is tested against its distribution over the negative samples, with a significance level of 0.05.
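The feature screening can be sketched with a two-sample Kolmogorov-Smirnov test written in plain Python. The p-value below uses a common asymptotic approximation and is only illustrative; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used. The function names and the toy values are assumptions.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the two empirical cumulative distribution functions."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue_asymptotic(d: float, n: int, m: int) -> float:
    """Approximate p-value from the Kolmogorov distribution (asymptotic series)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:  # the tail probability is essentially 1 here
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return max(0.0, min(1.0, 2.0 * total))


def is_significant(pos: Sequence[float], neg: Sequence[float],
                   alpha: float = 0.05) -> bool:
    """Keep a feature when its positive and negative distributions differ."""
    d = ks_statistic(pos, neg)
    return ks_pvalue_asymptotic(d, len(pos), len(neg)) < alpha


# Toy feature values: fully separated groups versus identical groups.
positive_values = [1.0 + 0.1 * i for i in range(8)]
negative_values = [2.0 + 0.1 * i for i in range(8)]
keep_feature = is_significant(positive_values, negative_values)
drop_feature = is_significant(positive_values, positive_values)
```

Running the test once per feature and keeping those with a p-value below 0.05 yields the reduced feature set.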
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
1. the invention can predict and give early warning of whether a COPD acute exacerbation occurs on day T+d (d = 1, 2, 3), using lung monitoring data within a time window to predict whether the patient is at risk of COPD acute exacerbation;
2. compared with using only the original detection index values, the data construction method of the invention, which includes the selection of positive and negative samples and combines data sampling with medical knowledge, significantly improves model performance;
3. the model of the invention is highly practical: the patient can carry out monitoring and early warning at home, and the operation is simple, which is of great significance for the home care of COPD patients.
Drawings
FIG. 1 is a flow chart of model construction according to the present invention;
FIG. 2 shows the ROC curves of five LightGBM-based models under the Task_1 setting of the present invention;
FIG. 3 shows the ROC curves of five LightGBM-based models under the Task_2 setting of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings; obviously, the described embodiments are some, but not all, embodiments of the present invention; thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention; all other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort belong to the protection scope of the present invention;
The invention provides a COPD acute exacerbation prediction method based on a time window as shown in FIGS. 1-3, which comprises the following steps:
S1, collecting lung indexes of a patient twice a day (morning and afternoon) using devices such as a small lung instrument and an electronic stethoscope, the indexes including FVC, FEV1 and PEF and the maximum value of the lung vibration energy collected by the stethoscope, wherein the FVC is measured with the small lung instrument and is the forced vital capacity, namely the maximum volume of air that can be exhaled as fast as possible after a maximal inhalation; the FEV1 is measured with the small lung instrument and is the volume of air exhaled in the first second of a maximal exhalation after a maximal deep inspiration; the PEF is measured with the small lung instrument and is the instantaneous flow rate at the point where expiratory flow is fastest during the forced vital capacity measurement (the lung indexes are shown in Table 1);
S2, in order that the model supports prediction for days T+1, T+2 and T+3, and in order to keep the model easy to use, performing prediction using a fixed time window (7 days) of the patient's lung monitoring indexes; collecting 32 indexes of the patient every morning and evening through the electronic devices, and distinguishing the indexes within the time window by date and by whether they were measured in the morning, so that the number of features is 32 × 7 × 2 = 448;
S3, extracting further features from the above features, the further features reflecting changes in the patient's lung monitoring indexes; the data expansion comprises: sliding-window statistics of the indexes, such as 3-day mean/variance and 5-day mean/variance; and the differences between indexes on alternate days; the number of features after expansion is 1744;
S4, taking one exacerbation (together with the previous 7 days) as a positive sample; for negative samples, the specified time window must not include the 30 days before and after an acute attack period, so as to prevent the disease condition from influencing the monitoring indexes; negative samples are generated by sampling from all data in which 7 consecutive days of observation are available;
S5, performing a significance test on the features, and finding 235 features that are significantly correlated with whether an exacerbation occurs on day T+d (d = 1, 2, 3);
S6, using the 235 significant features as model inputs to predict whether an exacerbation occurs on day T+d (d = 1, 2, 3); the model is a decision-tree-based ensemble model: xgboost, lightgbm or catboost, and 5-fold cross validation is used to evaluate model performance.
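The data expansion in step S3 can be illustrated on a single index observed over the 7-day window. This is a sketch under stated assumptions: the text does not say whether the variance is the population or sample variance (population variance is used here), and "alternate-day difference" is read as the difference between values two days apart. The function and key names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Callable, Dict, List


def rolling(values: List[float], width: int,
            func: Callable[[List[float]], float]) -> List[float]:
    """Apply func to every run of `width` consecutive daily values."""
    return [func(values[i:i + width]) for i in range(len(values) - width + 1)]


def lagged_diff(values: List[float], lag: int) -> List[float]:
    """Difference between each value and the value `lag` days earlier."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


def expand(series: List[float]) -> Dict[str, List[float]]:
    """Derive change-sensitive features from one index over the time window."""
    return {
        "mean_3d": rolling(series, 3, mean),
        "var_3d": rolling(series, 3, pvariance),
        "mean_5d": rolling(series, 5, mean),
        "var_5d": rolling(series, 5, pvariance),
        "diff_alt_day": lagged_diff(series, 2),  # assumed lag of two days
    }


# Toy series: one index measured once per day for seven days.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
expanded = expand(series)
print(expanded["mean_3d"])  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

Applying the same expansion to every index and session, then concatenating the results with the raw values, gives the enlarged feature set that the significance test later prunes.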
Specifically, the XGBoost model interpretation method comprises the following steps:
(1) parsing the tree model structure of the XGBoost model to obtain the tree structure of each individual tree;
(2) inputting a test sample into the XGBoost model, and obtaining, according to the tree structure, the effective leaf node corresponding to the test sample and the effective path in the tree leading to that leaf node;
(3) calculating the contribution value of each feature according to the effective path, and interpreting the XGBoost model according to the obtained contribution values.
Specifically, the XGBoost uses the Boosting ensemble method and is widely used in data mining; it can handle missing values and apply regularization, and it optimizes the cost function using second-order information to accelerate optimization.
Specifically, the LightGBM is a new gradient boosting tree framework that supports the GBDT, GBRT, GBM and MART algorithms; owing to its fully greedy tree growth method and its histogram-based memory and computation optimizations, it is several times faster than existing gradient boosting tree implementations; it is a complete solution for distributed training based on the DMTK framework, and it quickly became a common tool for data mining contestants after its release.
Specifically, the Catboost algorithm includes: in a sensing period, the secondary user sends the sensed energy values in the channel to the fusion center as a feature energy vector, and the primary user intermittently sends information on whether the spectrum resources are occupied to the fusion center as a label, thereby completing the construction of the training data set; the model is then trained with the Catboost algorithm in the fusion center.
Specifically, the Catboost algorithm was proposed by Yandex; it optimizes the handling of categorical features by processing them in the training stage rather than the data preprocessing stage, and it computes the leaf node values when selecting the tree structure, thereby reducing overfitting.
Specifically, for the prediction period, an eight-day time window is used to intercept positive samples, the eight days being denoted (T-7, T-6, T-5, T-4, T-3, T-2, T-1, T); for a positive sample, day T is the starting date of the acute exacerbation; for negative samples, the specified time window must not include the 7 days before and after an acute episode.
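The sample construction can be sketched with integer day numbers. Days are represented as plain integers and the helper names are hypothetical; the exclusion margin is a parameter because the text gives 7 days here and 30 days in step S4.

```python
from typing import Iterable, List

WINDOW_LEN = 8  # T-7 .. T, as stated in the text


def positive_window(onset_day: int) -> List[int]:
    """Eight-day window ending on the exacerbation onset day T."""
    return list(range(onset_day - (WINDOW_LEN - 1), onset_day + 1))


def negative_windows(observed_days: Iterable[int], onsets: Iterable[int],
                     exclusion_days: int = 7) -> List[List[int]]:
    """All fully observed eight-day windows that stay clear of every onset."""
    observed = set(observed_days)
    onset_list = list(onsets)
    windows: List[List[int]] = []
    for start in sorted(observed):
        days = list(range(start, start + WINDOW_LEN))
        if not all(d in observed for d in days):
            continue  # the window needs consecutive observations
        if any(abs(d - t) <= exclusion_days for d in days for t in onset_list):
            continue  # too close to an acute episode
        windows.append(days)
    return windows


# Toy record: 60 consecutive observed days with one onset on day 30.
positives = [positive_window(30)]
negatives = negative_windows(range(60), onsets=[30])
print(len(positives[0]), len(negatives))  # 8 31
```

Passing `exclusion_days=30` instead applies the stricter margin from step S4, which leaves no negative windows in this 60-day toy record.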
Specifically, in order to achieve early warning, 3 groups of prediction tasks with different lead times are set for the prediction period:
(1) Task_1, using observed values from day T-5 to day T-1 to predict whether an acute exacerbation occurs on day T;
(2) Task_2, using observed values from day T-6 to day T-2 to predict whether an acute exacerbation occurs on day T;
(3) Task_3, using observed values from day T-7 to day T-3 to predict whether an acute exacerbation occurs on day T.
Specifically, in order to reduce the number of features, a Kolmogorov-Smirnov test is performed on the features; this test compares whether two distributions are the same, so for each feature its distribution over the positive samples is tested against its distribution over the negative samples, with a significance level of 0.05.
Table 1. Observation value feature names and their interpretation.
Table 2. The top fifty features ranked by the significance test, and their P-values.
Using stratified k-fold cross validation (k = 5), the data are divided into 5 folds, and each time the data are split 8:2 into a training set and a test set for training and testing the model. The evaluation indices are sensitivity, specificity and AUC, where the probability threshold is set so that sensitivity just exceeds 0.9, and the specificity is the specificity at that threshold. The models used are catboost, xgboost and lightgbm, and the other hyper-parameters are obtained by hyper-parameter search through cross validation. Three tasks are set: Task_1, Task_2 and Task_3; under each task, the following 7 models are set:
(1) M_all, trained using all features;
(2) M_sig, using all features that pass the significance test;
(3) M_sigSTE, using the electronic stethoscope features that pass the significance test;
(4) M_sigLSI, using the small lung instrument features that pass the significance test;
(5) M_sig50, using the 50 features with the lowest p-values among those that pass the significance test under the task setting;
(6) M_sig25, using the top 25 features that pass the significance test;
(7) M_orig, trained using all original observation indexes.
Model | Task_1 | Task_2 | Task_3 |
---|---|---|---|
M_all | 0.8135 | 0.8135 | 0.8135 |
M_sig | 0.9268 | 0.9045 | 0.8887 |
M_sigSTE | 0.9020 | 0.8845 | 0.8302 |
M_sigLSI | 0.8279 | 0.7158 | 0.6617 |
M_sig50 | 0.8826 | 0.8000 | 0.8631 |
M_sig25 | 0.8173 | 0.8075 | 0.8816 |
M_orig | 0.7361 | 0.7434 | 0.5782 |
Table 3. AUC mean score of cross validation.
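The stratified 5-fold split with an 8:2 train/test ratio described above can be sketched without any library. The function names and the toy label vector are assumptions; a library implementation of stratified k-fold splitting would normally be used in practice.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def stratified_folds(labels: List[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(labels: List[int], k: int = 5,
                      seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = stratified_folds(labels, k, seed)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Toy labels: 10 positive windows and 90 negative windows.
labels = [1] * 10 + [0] * 90
splits = list(train_test_splits(labels))
for train, test in splits:
    print(len(train), len(test), sum(labels[i] for i in test))  # 80 20 2
```

Stratification keeps the share of positive samples the same in every fold, which matters when exacerbations are much rarer than stable periods.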
Under the Task_1 setting, there are 123 significant features, of which 31 are small lung instrument features that pass the significance test and 92 are electronic stethoscope features that pass the significance test.
Under the Task_2 setting, there are 134 significant features, of which 33 are small lung instrument features that pass the significance test and 101 are electronic stethoscope features that pass the significance test. Under the Task_3 setting, there are 131 significant features, of which 28 are small lung instrument features that pass the significance test and 103 are electronic stethoscope features that pass the significance test.
Table 3 reports the AUC mean score of the cross validation, where the model used is LightGBM. (1) Task_1 obtains a higher score, consistent with intuition (predicting the next day is simpler than predicting two or three days ahead);
(2) the score drops markedly when only the features generated by the small lung instrument are used, whereas the features generated by the stethoscope still perform well, which indicates that the observed data of the electronic stethoscope has stronger discriminative and predictive power;
(3) screening the features with the significance test gives a clear improvement over directly using the original observed values or all features;
(4) with only the top 50 or top 25 significant features, the model score decreases somewhat, indicating that the fitting ability of the model decreases after the number of features is reduced. The ROC curves of five LightGBM-based models under the Task_1 setting are shown in FIG. 2.
The scores in Table 3 were obtained with LightGBM. To verify the performance with other models, the results with the xgboost and catboost models are given below:
Model | Task_1 | Task_2 | Task_3 |
---|---|---|---|
M_all | 0.8772 | 0.8673 | 0.8142 |
M_sig | 0.9181 | 0.8946 | 0.8233 |
M_sigSTE | 0.9036 | 0.8792 | 0.8110 |
M_sigLSI | 0.8279 | 0.7610 | 0.7000 |
M_sig50 | 0.8372 | 0.7831 | 0.8184 |
M_sig25 | 0.8177 | 0.8047 | 0.8203 |
M_orig | 0.7812 | 0.7881 | 0.6659 |
Table 3-1. AUC mean score of cross validation. The model used is xgboost.
Table 3-2. AUC mean score of cross validation. The model used is catboost.
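For readers unfamiliar with the metric reported in these tables, AUC can be computed as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below shows this on toy data and then averages over folds; the scores are made up for illustration and are not results from the invention.

```python
from statistics import mean
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy fold: two positives, three negatives, one tie at 0.4.
fold_auc = auc([0.9, 0.4, 0.5, 0.1, 0.4], [1, 1, 0, 0, 0])
print(fold_auc)  # 0.75

# Mean AUC over several toy folds, as in a cross-validated score.
per_fold: List[float] = [fold_auc, auc([0.8, 0.2], [1, 0]), auc([0.3, 0.6], [1, 0])]
mean_auc = mean(per_fold)
```

The pairwise form is quadratic in the number of samples; it is used here only because it makes the definition explicit.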
Setting | Sensitivity | Specificity | Probability threshold |
---|---|---|---|
Task_1 M_sig50 | 0.9043 | 0.7345 | 0.0113 |
Task_2 M_sig50 | 0.9043 | 0.7098 | 0.0091 |
Task_3 M_sig50 | 0.9043 | 0.6623 | 0.0042 |
Table 4. Sensitivity and specificity values of the optimal model M_sig under each task setting.
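The threshold selection behind Table 4 can be sketched as follows. The wording of the text leaves the exact rule open; this sketch assumes the largest probability threshold whose sensitivity still exceeds 0.9 is chosen, since that keeps as much specificity as possible. A sample is predicted positive when its score is at or above the threshold. The scores and labels are made up for illustration.

```python
from typing import List, Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   min_sensitivity: float = 0.9) -> Tuple[float, float, float]:
    """Scan thresholds from high to low; stop once sensitivity exceeds the target."""
    for t in sorted(set(scores), reverse=True):
        sens, spec = sens_spec(scores, labels, t)
        if sens > min_sensitivity:
            return t, sens, spec
    raise ValueError("no threshold reaches the requested sensitivity")


# Toy predicted probabilities: 10 positives followed by 10 negatives.
pos_scores: List[float] = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.06]
neg_scores: List[float] = [0.12, 0.1, 0.08, 0.05, 0.04, 0.03, 0.02, 0.02, 0.01, 0.005]
scores = pos_scores + neg_scores
labels = [1] * 10 + [0] * 10
threshold, sensitivity, specificity = pick_threshold(scores, labels)
print(threshold, sensitivity, specificity)  # 0.06 1.0 0.7
```

Very small thresholds, as in Table 4, are expected when positives are rare and the model's predicted probabilities are correspondingly low.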
In order to verify the influence of different decision tree models on the performance of the prediction tasks, the following table reports the performance of the models using the significant features under the three task settings. Xgboost, Lightgbm and Catboost, three of the strongest gradient boosting algorithms based on decision tree models, are compared; according to the experimental results, Lightgbm performs best under all three task settings.
Setting | Lightgbm | Catboost | Xgboost |
---|---|---|---|
Task_1 M_sig | 0.9268 | 0.8852 | 0.9181 |
Task_2 M_sig | 0.9045 | 0.8505 | 0.8946 |
Task_3 M_sig | 0.8887 | 0.8722 | 0.8233 |
Table 5. Cross-validation average AUC of the three decision tree ensemble models Catboost, Lightgbm and Xgboost, based on the best feature combination M_sig under each task setting.
Example 2
Five-fold cross validation was performed on Task_1 using the Lightgbm model, with the AUC of each fold as follows:
Five-fold cross validation was performed on Task_2 using the Lightgbm model, with the AUC of each fold as follows:
Five-fold cross validation was performed on Task_3 using the Lightgbm model, with the AUC of each fold as follows:
Example 3
Five-fold cross validation was performed on Task_1 using the Lightgbm model, with average ACC, precision, recall, F1 and AUC scores as follows:
Table 5. Mean scores of the various indices cross-validated on Task_1.
Five-fold cross validation was performed on Task_2 using the Lightgbm model, with average ACC, precision, recall, F1 and AUC scores as follows:
Model | AUC | ACC | precision | recall | F1 |
---|---|---|---|---|---|
M_all | 0.8135 | 0.8694 | 0.5415 | 0.7142 | 0.5475 |
M_sig | 0.9045 | 0.8665 | 0.5365 | 0.9142 | 0.6441 |
M_sigSTE | 0.8845 | 0.9108 | 0.6758 | 0.5714 | 0.6112 |
M_sigLSI | 0.7158 | 0.7509 | 0.2969 | 0.7428 | 0.4028 |
M_sig50 | 0.8000 | 0.8807 | 0.4988 | 0.6243 | 0.5275 |
M_sig25 | 0.8075 | 0.8950 | 0.4902 | 0.6571 | 0.5322 |
M_orig | 0.7434 | 0.8423 | 0.7107 | 0.6 | 0.5572 |
Table 6. Mean scores of the various indices cross-validated on Task_2.
Five-fold cross validation was performed on Task_3 using the Lightgbm model, with average ACC, precision, recall, F1 and AUC scores as follows:
Table 7. Mean scores of the various indices cross-validated on Task_3.
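The indices in Tables 5 to 7 can be derived from per-fold confusion counts, as in the sketch below. The counts are made up for illustration. The sketch also shows that the mean of per-fold F1 scores generally differs from the F1 computed from the mean precision and mean recall, which may be why the F1 column in such tables is not the harmonic mean of the precision and recall columns.

```python
from statistics import mean
from typing import Dict, List


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from one fold's confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "acc": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Two toy folds with 20 test samples each (counts are hypothetical).
folds: List[Dict[str, float]] = [
    classification_metrics(tp=2, fp=2, tn=16, fn=0),
    classification_metrics(tp=1, fp=0, tn=18, fn=1),
]
averaged = {name: mean(f[name] for f in folds) for name in folds[0]}

# F1 recomputed from the averaged precision and recall, for comparison.
p, r = averaged["precision"], averaged["recall"]
f1_of_means = 2 * p * r / (p + r)
print(round(averaged["f1"], 4), round(f1_of_means, 4))  # 0.6667 0.75
```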
The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.
It is to be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions; also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Claims (9)
1. A COPD acute exacerbation prediction method based on a time window, characterized by comprising the following steps:
S1, collecting lung indexes of a patient twice a day (morning and afternoon) using devices such as a small lung instrument and an electronic stethoscope, the indexes including FVC, FEV1 and PEF and the maximum value of the lung vibration energy collected by the stethoscope, wherein the FVC is measured with the small lung instrument and is the forced vital capacity, namely the maximum volume of air that can be exhaled as fast as possible after a maximal inhalation; the FEV1 is measured with the small lung instrument and is the volume of air exhaled in the first second of a maximal exhalation after a maximal deep inspiration; the PEF is measured with the small lung instrument and is the instantaneous flow rate at the point where expiratory flow is fastest during the forced vital capacity measurement;
S2, in order that the model supports prediction for days T+1, T+2 and T+3, and in order to keep the model easy to use, performing prediction using a fixed time window (7 days) of the patient's lung monitoring indexes; collecting 32 indexes of the patient every morning and evening through the electronic devices, and distinguishing the indexes within the time window by date and by whether they were measured in the morning, so that the number of features is 32 × 7 × 2 = 448;
S3, extracting further features from the above features, the further features reflecting changes in the patient's lung monitoring indexes; the data expansion comprises: sliding-window statistics of the indexes; and the differences between indexes on alternate days;
S4, taking one exacerbation (together with the previous 7 days) as a positive sample; for negative samples, the specified time window must not include the 30 days before and after an acute attack period, so as to prevent the disease condition from influencing the monitoring indexes; negative samples are generated by sampling from all data in which 7 consecutive days of observation are available;
S5, performing a significance test on the features, and finding 235 features that are significantly correlated with whether an exacerbation occurs on day T+d (d = 1, 2, 3);
S6, using the 235 significant features as model inputs to predict whether an exacerbation occurs on day T+d (d = 1, 2, 3); the model is a decision-tree-based ensemble model: xgboost, lightgbm or catboost, and 5-fold cross validation is used to evaluate model performance.
2. The method of claim 1, wherein the method comprises: the XGboost model interpretation method comprises the following steps:
(1) analyzing the tree model element structure of the XGboost model to analyze the tree structure of each single tree;
(2) inputting a test sample into the XGboost model, and acquiring an effective leaf node corresponding to the test sample and an effective path of a tree of the effective leaf node according to a tree structure;
(3) and calculating a contribution value of the feature according to the effective path, and explaining the XGboost model according to the acquired contribution value.
3. The method of claim 1, wherein the method comprises: the XGboost utilizes a Boosting integration method, is largely used for data mining, and can process missing values and regularize features, so that the function of second-order accelerated optimization of a cost function is realized.
4. The method of claim 1, wherein the method comprises: the LightGBM is a new gradient lifting tree framework, supports the algorithm of GBDT, GBRT, GBM and MART, and is a complete solution of distributed training based on the DMTK framework.
5. The method of claim 1, wherein the method comprises: the Catboost algorithm includes: in a sensing period, the secondary user sends the sensed energy value in the channel to the fusion center as a characteristic energy vector, and the primary user sends information of whether the spectrum resources are occupied or not to the fusion center as a label discontinuously, so that the construction of a training data set is completed. The model was trained with the Catboost algorithm in the fusion center.
6. The method of claim 1, wherein the method comprises: the Catboost algorithm is proposed by Yandex, optimizes the processing of the class characteristics, processes in a training stage instead of a data preprocessing stage, and calculates leaf node values when a tree model is selected to reduce overfitting.
7. The method of claim 1, wherein the method comprises: the prediction period is long, eight days are taken as a time window to intercept positive samples, the eight days are marked as (T-7, T-6, T-5, T-4, T-3, T-2, T-1, T), and for the positive samples, the Tth day is the acute exacerbation starting date; for negative samples, the specified time window cannot include the 7 days before and after the acute episode.
8. The method of claim 7, wherein, to provide early warning within the prediction period, three prediction tasks are defined in advance:
(1) task_1: using observations from day T-5 to day T-1 to predict whether an acute exacerbation occurs on day T;
(2) task_2: using observations from day T-6 to day T-2 to predict whether an acute exacerbation occurs on day T;
(3) task_3: using observations from day T-7 to day T-3 to predict whether an acute exacerbation occurs on day T.
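The three tasks differ only in how far ahead of day T the five observed days end. A minimal sketch of that slicing, assuming each sample is a list of eight daily records ordered T-7 to T (the function name and the `lead_days` parameter are hypothetical):

```python
def task_observations(window, lead_days):
    """Return the 5 daily records ending lead_days before day T.

    window must hold 8 records ordered T-7 .. T; lead_days is 1, 2 or 3,
    matching task_1, task_2 and task_3.
    """
    if len(window) != 8:
        raise ValueError("expected 8 daily records (T-7 .. T)")
    if lead_days not in (1, 2, 3):
        raise ValueError("lead_days must be 1, 2 or 3")
    end = len(window) - lead_days  # exclusive end index: last day used is T-lead_days
    return window[end - 5:end]


days = ["T-7", "T-6", "T-5", "T-4", "T-3", "T-2", "T-1", "T"]
task_1 = task_observations(days, 1)
task_2 = task_observations(days, 2)
task_3 = task_observations(days, 3)
```

Day T itself is never part of the input in any task, so the label for day T cannot leak into the features.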
9. The method of claim 1, wherein, to reduce the number of features, a Kolmogorov-Smirnov test, which can be used to compare whether two distributions are the same, is performed on the features: the distribution of each feature over the positive samples is tested against its distribution over the negative samples, with the significance level set to 0.05.
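The feature screening step can be sketched with a self-contained two-sample Kolmogorov-Smirnov test. The p-value below uses a standard asymptotic approximation rather than an exact computation, so borderline features may be classified differently by a statistics library; the function names and the example data are hypothetical.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Return (D, approximate p-value) for the two-sample KS test."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d_stat = 0.0
    # Sweep both sorted samples and track the largest gap between the two ECDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d_stat = max(d_stat, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution with a small-sample correction (approximation).
    root = math.sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * d_stat
    if lam < 0.2:
        return d_stat, 1.0  # the tail probability is effectively 1 in this range
    p_value = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101)
    )
    return d_stat, min(max(p_value, 0.0), 1.0)


def significant_features(positive, negative, alpha=0.05):
    """Keep features whose positive and negative distributions differ at level alpha."""
    return [
        name for name in positive
        if ks_two_sample(positive[name], negative[name])[1] < alpha
    ]


# Hypothetical data: feature_a separates the classes, feature_b does not.
positive = {"feature_a": list(range(10, 20)), "feature_b": [1, 2, 3, 4, 5]}
negative = {"feature_a": list(range(0, 10)), "feature_b": [1, 2, 3, 4, 5]}
kept = significant_features(positive, negative)
```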
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111319613.8A CN113974566B (en) | 2021-11-09 | 2021-11-09 | COPD acute exacerbation prediction method based on time window |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113974566A (en) | 2022-01-28 |
CN113974566B (en) | 2023-09-19 |
Family
ID=79747333
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111319613.8A Active CN113974566B (en) | 2021-11-09 | 2021-11-09 | COPD acute exacerbation prediction method based on time window |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113974566B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150080671A1 (en) * | 2013-05-29 | 2015-03-19 | Technical University Of Denmark | Sleep Spindles as Biomarker for Early Detection of Neurodegenerative Disorders |
CN107451390A (en) * | 2017-02-22 | 2017-12-08 | Cc和I研究有限公司 | System for predicting acute exacerbations in patients with chronic obstructive pulmonary disease |
US20180239872A1 (en) * | 2017-02-22 | 2018-08-23 | CC&I Research Co.,Ltd | System for predicting an acute exacerbation of chronic obstructive pulmonary disease |
CN110123274A (en) * | 2019-04-29 | 2019-08-16 | 上海电气集团股份有限公司 | A kind of monitoring system of septicopyemia |
CN110289061A (en) * | 2019-06-27 | 2019-09-27 | 黎檀实 | A kind of Time Series Forecasting Methods of the traumatic hemorrhagic shock condition of the injury |
WO2021148967A1 (en) * | 2020-01-23 | 2021-07-29 | Novartis Ag | A computer-implemented system and method for outputting a prediction of a probability of a hospitalization of patients with chronic obstructive pulmonary disorder |
CN111657888A (en) * | 2020-05-28 | 2020-09-15 | 首都医科大学附属北京天坛医院 | Severe acute respiratory distress syndrome early warning method and system |
CN113057588A (en) * | 2021-03-17 | 2021-07-02 | 上海电气集团股份有限公司 | Disease early warning method, device, equipment and medium |
CN113469227A (en) * | 2021-06-18 | 2021-10-01 | 南京润楠医疗电子研究院有限公司 | Forced expiration total amount prediction method |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114566238A (en) * | 2022-02-09 | 2022-05-31 | 无锡启益医疗科技有限公司 | Screened patient AUC (AUC) improving method based on COPD (chronic obstructive pulmonary disease) risk judgment |
CN117894478A (en) * | 2024-03-14 | 2024-04-16 | 天津市肿瘤医院(天津医科大学肿瘤医院) | Informationized intelligent management method for severe cases of oncology department of severe cases of oncology |
CN117894478B (en) * | 2024-03-14 | 2024-05-28 | 天津市肿瘤医院(天津医科大学肿瘤医院) | Informationized intelligent management method for severe cases of oncology department of severe cases of oncology |
Also Published As
Publication number | Publication date |
---|---|
CN113974566B (en) | 2023-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wollenstein-Betech et al. | Personalized predictive models for symptomatic COVID-19 patients using basic preconditions: hospitalizations, mortality, and the need for an ICU or ventilator | |
CN113974566B (en) | COPD acute exacerbation prediction method based on time window | |
US10332638B2 (en) | Methods and systems for pre-symptomatic detection of exposure to an agent | |
JP5450556B2 (en) | Medical information processing apparatus and method, and program | |
CN109166630B (en) | Infectious disease data monitoring and processing method and system | |
CN108417274A (en) | Forecast of epiphytotics method, system and equipment | |
CN112216402A (en) | Epidemic situation prediction method and device based on artificial intelligence, computer equipment and medium | |
CN115240803A (en) | Model training method, complication prediction system, complication prediction device, and complication prediction medium | |
CN112967803A (en) | Early mortality prediction method and system for emergency patients based on integrated model | |
Joshe et al. | Symptoms analysis based chronic obstructive pulmonary disease prediction in Bangladesh using machine learning approach | |
CN118136254A (en) | Method for constructing chronic obstructive pulmonary disease early model based on chest CT parameters | |
Ghose et al. | Deep viewing for Covid-19 detection from x-ray using cnn based architecture | |
Xu et al. | Automated detection of airflow obstructive diseases: a systematic review of the last decade (2013-2022) | |
CN117116475A (en) | Method, system, terminal and storage medium for predicting risk of ischemic cerebral apoplexy | |
Nikolikj et al. | Sensitivity Analysis of RF+ clust for Leave-one-problem-out Performance Prediction | |
Abdullah et al. | MERS-CoV disease estimation (MDE) A study to estimate a MERS-CoV by classification algorithms | |
Banyal et al. | Technology landscape for epidemiological prediction and diagnosis of covid-19 | |
Rajmohan et al. | G-Sep: A deep learning algorithm for detection of long-term sepsis using bidirectional gated recurrent unit | |
Corizzo et al. | Lstm-based pulmonary air leak forecasting for chest tube management | |
Patel et al. | Multi Feature fusion for COPD Classification using Deep learning algorithms | |
JP2022086803A (en) | Method for estimating reason, method for prediction, method for estimating attribute value, reason estimation device, prediction device, attribute value estimation device, and program | |
Xiao et al. | Breathing New Life into COPD Assessment: Multisensory Home-monitoring for Predicting Severity | |
Do et al. | Deep Q-learning for Predicting Asthma Attack with Considering Personalized Environmental Triggers’ Risk Scores | |
Nguyen et al. | Sound-Dr: Reliable Sound Dataset and Baseline Artificial Intelligence System for Respiratory Illnesses | |
Wang et al. | Machine Learning Classification Techniques for Diabetic Foot Ulcers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||