CN113974566A - COPD acute exacerbation prediction method based on time window - Google Patents

COPD acute exacerbation prediction method based on time window

Publication number
CN113974566A
Authority
CN
China
Prior art keywords
model
days
features
day
patient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111319613.8A
Other languages
Chinese (zh)
Other versions
CN113974566B (en)
Inventor
王琨
朱威
李强
陆银美
侯应伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Qiyi Medical Technology Co ltd
Original Assignee
Wuxi Qiyi Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Qiyi Medical Technology Co ltd
Priority to CN202111319613.8A
Publication of CN113974566A
Application granted
Publication of CN113974566B
Legal status: Active
Anticipated expiration

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271 Specific aspects of physiological measurement analysis
    • A61B5/7275 Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Pathology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Physiology (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a COPD acute exacerbation prediction method based on a time window, comprising: S1, collecting pulmonary indexes of a patient twice a day (morning and afternoon) using devices such as a small lung instrument and an electronic stethoscope; S2, using a fixed time window of monitoring data so that the model supports prediction for days T + 1, T + 2 and T + 3 while remaining easy to use; S3, extracting further features from the collected features to reflect changes in the patient's lung monitoring indexes; S4, taking one exacerbation (and the previous 7 days) as a positive sample; S5, performing a significance test on the features; and S6, using the 235 significant features as model input parameters to predict whether an exacerbation occurs on day T + d (d = 1, 2, 3). The method uses lung monitoring data from a time window to predict whether the patient is at risk of acute exacerbation of COPD, allows the patient to carry out monitoring at home, and is significant for home care of COPD patients.

Description

COPD acute exacerbation prediction method based on time window
Technical Field
The invention relates to the technical field of COPD acute exacerbation prediction, in particular to a COPD acute exacerbation prediction method based on a time window.
Background
Chronic obstructive pulmonary disease (hereinafter referred to as "COPD") is a disease involving chronic bronchitis, emphysema (a disease causing damage to alveolar structures), or a mixture of both, with obstruction of the airways from the bronchi to the alveoli; symptoms of this disease include long-term cough with sputum, dyspnea due to reduced airflow caused by airway obstruction, and frequent respiratory infections (such as the common cold); the disease causes high mortality worldwide, and its prevalence is rapidly increasing due to smoking, air pollution and the like; the etiology of COPD is an abnormal chronic inflammatory response of the lungs to toxic molecules or gases, together with various factors that are involved in COPD in complex ways, such as smoking, urbanization, pollution and infectious respiratory disease.
Combinations of clinical parameters have been used to predict acute exacerbations of COPD in patients; however, these clinical parameters are not accurate enough for individual case predictions; furthermore, although the possibility of acute exacerbation arising from the above-mentioned factors may be identified once a COPD patient goes to hospital, patients cannot predict the possibility of their own acute exacerbation; thus, visiting the hospital only after an acute exacerbation of COPD has occurred may lead to adverse outcomes for COPD patients.
Although there is literature on predicting COPD acute exacerbation events using statistical or machine learning methods, the current literature in the field has the following drawbacks:
1. existing studies mainly use cross-sectional data and fail to use time-series data to provide real-time early warning of COPD acute exacerbation events;
2. current research does not carry out systematic feature mining to improve the prediction capability of the model;
3. current research cannot make predictions specifically for days T + 1, T + 2 and T + 3; existing models can only predict the general risk probability of a future exacerbation.
Disclosure of Invention
The invention aims to provide a COPD acute exacerbation prediction method based on a time window, which uses lung monitoring data from the time window to predict whether a patient is at risk of COPD acute exacerbation on day T + d (d = 1, 2, 3), so that the patient can carry out monitoring and early warning at home with simple operation; the method is significant for home care of COPD patients and solves the problems described in the background.
In order to achieve the purpose, the invention provides the following technical scheme:
a COPD acute exacerbation prediction method based on a time window comprises the following steps:
S1, collecting lung indexes of a patient twice a day (morning and afternoon) using devices such as a small lung instrument and an electronic stethoscope, for example FVC, FEV1 and PEF, and the maximum value of the energy of lung vibration collected by the stethoscope; FVC is measured with the small lung instrument and is the forced vital capacity, namely the maximum volume of air that can be exhaled as quickly as possible after a maximal inhalation; FEV1 is measured with the small lung instrument and is the volume of air exhaled in the first second of a maximal exhalation following a maximal deep inhalation; PEF is measured with the small lung instrument and is the instantaneous flow rate at the moment expiratory flow is fastest during the forced vital capacity measurement;
S2, so that the model can support prediction for days T + 1, T + 2 and T + 3 while remaining easy to use, a fixed time window (7 days) of the patient's lung monitoring indexes is used for prediction; 32 indexes of the patient are collected every morning and evening by the electronic devices, and the indexes in the five-day time window are distinguished by date and by whether they were measured in the morning, so the number of features is 32 × 7 × 2 = 448;
S3, extracting more features from these features, the extracted features reflecting changes in the patient's lung monitoring indexes; the data expansion comprises: sliding-window statistics of the indexes, such as the 3-day mean/variance and the 5-day mean/variance; and the differences of the indexes on alternate days; after expansion the number of features is 1744;
S4, taking one exacerbation (and the previous 7 days) as a positive sample; for negative samples, the specified time window must not fall within 30 days before or after an acute attack, so as to prevent the disease condition from influencing the monitoring indexes; negative samples are generated by sampling from all data in which 7 consecutive days of observations are available;
S5, performing a significance test on the features, and finding 235 features that are significantly correlated with exacerbation on day T + d (d = 1, 2, 3);
S6, using the 235 significant features as model input parameters to predict whether an exacerbation occurs on day T + d (d = 1, 2, 3); the model is a decision-tree-based ensemble model: xgboost, lightgbm or catboost, and 5-fold cross-validation is used to evaluate the model.
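Steps S2 and S3 can be illustrated with a minimal sketch. It assumes the readings are held as a nested list indexed by day, session and index; the function names, the use of population variance, and the reading of "alternate-day" as "two days apart" are all illustrative assumptions, and the sketch does not attempt to reproduce the exact count of 1744 expanded features.

```python
from statistics import mean, pvariance

N_INDEXES = 32   # indexes collected per session (from S2)
N_DAYS = 7       # fixed time window in days (from S2)
N_SESSIONS = 2   # morning and evening (from S2)

def base_features(window):
    """Flatten window[day][session][index] into one feature vector."""
    return [value for day in window for session in day for value in session]

def rolling_stats(series, width):
    """Mean and (population) variance over each run of `width` consecutive days."""
    out = []
    for start in range(len(series) - width + 1):
        chunk = series[start:start + width]
        out.append(mean(chunk))
        out.append(pvariance(chunk))
    return out

def alternate_day_diffs(series):
    """Difference between readings two days apart (assumed meaning of 'alternate-day')."""
    return [series[i + 2] - series[i] for i in range(len(series) - 2)]

def expand_series(series):
    """Derived features for one index in one session across the window."""
    return (rolling_stats(series, 3)
            + rolling_stats(series, 5)
            + alternate_day_diffs(series))

# A dummy window: 7 days x 2 sessions x 32 indexes.
window = [[[float(day)] * N_INDEXES for _ in range(N_SESSIONS)]
          for day in range(N_DAYS)]
print(len(base_features(window)))  # 32 * 7 * 2 base features
```

Keeping the base features and the derived features in separate functions makes it easy to check the base count against the 32 × 7 × 2 figure before any expansion is applied.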
Preferably, the XGBoost model interpretation method includes the following steps:
(1) parsing the tree model metadata of the XGBoost model to obtain the tree structure of each single tree;
(2) inputting a test sample into the XGBoost model, and obtaining, according to the tree structure, the effective leaf node corresponding to the test sample and the effective path through the tree to that leaf node;
(3) calculating the contribution value of each feature according to the effective path, and explaining the XGBoost model according to the obtained contribution values.
Preferably, XGBoost uses the Boosting ensemble method and is widely used for data mining; it can handle missing values, supports regularization, and uses a second-order approximation of the cost function to accelerate optimization.
Preferably, LightGBM is a new gradient boosting tree framework that supports the GBDT, GBRT, GBM and MART algorithms and is a complete solution for distributed training based on the DMTK framework.
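The three interpretation steps can be sketched on a toy tree. This is only an illustration under stated assumptions: trees are represented as plain nested dictionaries rather than parsed from a real XGBoost model, each node carries a hypothetical `value` field (the expected output at that node), missing values are ignored, and the contribution rule (crediting each change in node value to the feature that was split on) is one common path-based attribution scheme, not necessarily the one intended by the patent. The feature names are made up.

```python
def path_contributions(tree, sample):
    """Follow the path taken by `sample` and credit value changes to split features."""
    contributions = {}
    node = tree
    while "feature" in node:  # internal node; leaves carry only "value"
        feature = node["feature"]
        if sample[feature] < node["threshold"]:
            child = node["left"]
        else:
            child = node["right"]
        delta = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return node["value"], contributions

def explain(trees, sample):
    """Sum leaf values and per-feature contributions over all trees."""
    score, total = 0.0, {}
    for tree in trees:
        leaf_value, contrib = path_contributions(tree, sample)
        score += leaf_value
        for name, value in contrib.items():
            total[name] = total.get(name, 0.0) + value
    return score, total

# Hypothetical single tree with two splits.
toy_tree = {
    "value": 0.0, "feature": "fev1_mean_3d", "threshold": 1.5,
    "left": {"value": 0.4},
    "right": {
        "value": -0.2, "feature": "pef_diff", "threshold": 0.0,
        "left": {"value": -0.5},
        "right": {"value": 0.1},
    },
}
sample = {"fev1_mean_3d": 2.0, "pef_diff": 0.3}
score, contrib = explain([toy_tree], sample)
print(score, contrib)
```

With this scheme the contributions along a path always sum to the leaf value minus the root value, which gives a simple consistency check on any explanation.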
Preferably, the Catboost algorithm includes: in a sensing period, the secondary user sends the sensed energy value in the channel to the fusion center as a characteristic energy vector, and the primary user sends information of whether the spectrum resources are occupied or not to the fusion center as a label discontinuously, so that the construction of a training data set is completed. The model was trained with the Catboost algorithm in the fusion center.
Preferably, the Catboost algorithm was proposed by Yandex; it optimizes the handling of categorical features, processing them in the training stage rather than in the data preprocessing stage, and calculates leaf node values when selecting the tree structure, so as to reduce overfitting.
Preferably, for the prediction period, positive samples are extracted with a time window of eight days, denoted (T-7, T-6, T-5, T-4, T-3, T-2, T-1, T); for a positive sample, day T is the start date of the acute exacerbation; for negative samples, the specified time window must not include the 7 days before or after an acute episode.
Preferably, in order to achieve early warning, 3 groups of prediction tasks with different lead times are set for the prediction period:
(1) Task_1, using observed values from day T-5 to day T-1 to predict whether acute exacerbation occurs on day T;
(2) Task_2, using observed values from day T-6 to day T-2 to predict whether acute exacerbation occurs on day T;
(3) Task_3, using observed values from day T-7 to day T-3 to predict whether acute exacerbation occurs on day T.
Preferably, to reduce the number of features, a Kolmogorov-Smirnov test is performed on the features; this test compares whether two distributions are the same, and it is applied to the distribution of each feature on the positive samples and its distribution on the negative samples, with a significance level of 0.05.
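The feature screening by a two-sample Kolmogorov-Smirnov test can be sketched as below. The sketch is written with the standard library only and uses the asymptotic Kolmogorov series for the p-value, which is an approximation for small samples; in practice a library routine such as `scipy.stats.ks_2samp` would normally be preferred. The function names and the dictionary layout (feature name mapped to a list of values) are assumptions for illustration.

```python
import math

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov series (approximate)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # series converges poorly here; the p-value is essentially 1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

def significant_features(positive, negative, alpha=0.05):
    """Keep features whose positive/negative distributions differ (p < alpha)."""
    kept = []
    for name in positive:
        d = ks_statistic(positive[name], negative[name])
        p = ks_pvalue(d, len(positive[name]), len(negative[name]))
        if p < alpha:
            kept.append(name)
    return kept

# Hypothetical data: feature "a" separates the classes, feature "b" does not.
pos = {"a": list(range(10, 20)), "b": list(range(10))}
neg = {"a": list(range(10)), "b": list(range(10))}
print(significant_features(pos, neg))
```

Because every feature is tested separately, screening many candidate features at a fixed 0.05 level will admit some by chance; whether any multiple-testing correction was applied is not stated in the text.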
In summary, due to the adoption of the technology, the invention has the beneficial effects that:
1. the invention can predict and give early warning of whether COPD acute exacerbation will occur on day T + d (d = 1, 2, 3), using lung monitoring data from a time window to predict whether the patient is at risk of COPD acute exacerbation;
2. compared with using only the original measured index values, the data construction method of the invention, including the selection of positive and negative samples, combines data sampling with medical knowledge and significantly improves the model performance;
3. the model of the invention is highly practical: the patient can carry out monitoring and early warning at home with simple operation, which is of great significance for home care of COPD patients.
Drawings
FIG. 1 is a flow chart of model construction according to the present invention;
FIG. 2 shows the ROC curves of five LightGBM-based models under the Task_1 setting of the present invention;
FIG. 3 shows the ROC curves of five LightGBM-based models under the Task_2 setting of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention; it is obvious that the described embodiments are some, but not all, embodiments of the present invention; thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention; all other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts belong to the protection scope of the present invention;
The invention provides a COPD acute exacerbation prediction method based on a time window as shown in figures 1-3, which comprises the following steps:
S1, collecting lung indexes of a patient twice a day (morning and afternoon) using devices such as a small lung instrument and an electronic stethoscope, for example FVC, FEV1 and PEF, and the maximum value of the energy of lung vibration collected by the stethoscope; FVC is measured with the small lung instrument and is the forced vital capacity, namely the maximum volume of air that can be exhaled as quickly as possible after a maximal inhalation; FEV1 is measured with the small lung instrument and is the volume of air exhaled in the first second of a maximal exhalation following a maximal deep inhalation; PEF is measured with the small lung instrument and is the instantaneous flow rate at the moment expiratory flow is fastest during the forced vital capacity measurement (the lung indexes are shown in Table 1);
S2, so that the model can support prediction for days T + 1, T + 2 and T + 3 while remaining easy to use, a fixed time window (7 days) of the patient's lung monitoring indexes is used for prediction; 32 indexes of the patient are collected every morning and evening by the electronic devices, and the indexes in the five-day time window are distinguished by date and by whether they were measured in the morning, so the number of features is 32 × 7 × 2 = 448;
S3, extracting more features from these features, the extracted features reflecting changes in the patient's lung monitoring indexes; the data expansion comprises: sliding-window statistics of the indexes, such as the 3-day mean/variance and the 5-day mean/variance; and the differences of the indexes on alternate days; after expansion the number of features is 1744;
S4, taking one exacerbation (and the previous 7 days) as a positive sample; for negative samples, the specified time window must not fall within 30 days before or after an acute attack, so as to prevent the disease condition from influencing the monitoring indexes; negative samples are generated by sampling from all data in which 7 consecutive days of observations are available;
S5, performing a significance test on the features, and finding 235 features that are significantly correlated with exacerbation on day T + d (d = 1, 2, 3);
S6, using the 235 significant features as model input parameters to predict whether an exacerbation occurs on day T + d (d = 1, 2, 3); the model is a decision-tree-based ensemble model: xgboost, lightgbm or catboost, and 5-fold cross-validation is used to evaluate the model.
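The sample construction in S4 can be sketched as follows. Days are represented as plain integers, which is an assumption made for brevity, and the helper names are hypothetical. The exclusion margin is left as a parameter because the text gives 30 days in S4 and 7 days elsewhere.

```python
def positive_window(t, length=7):
    """Days T-length .. T for an exacerbation starting on day T."""
    return list(range(t - length, t + 1))

def negative_windows(observed_days, exacerbation_days, length=7, margin=30):
    """All runs of `length` consecutive observed days that stay more than
    `margin` days away from every exacerbation day."""
    observed = set(observed_days)
    windows = []
    for start in sorted(observed):
        days = list(range(start, start + length))
        if not all(d in observed for d in days):
            continue  # a day in the run has no observation
        if any(abs(d - t) <= margin for d in days for t in exacerbation_days):
            continue  # too close to an acute episode
        windows.append(days)
    return windows

# Hypothetical patient: observed on days 0..99, one exacerbation on day 50.
candidates = negative_windows(range(100), [50])
print(positive_window(50))
print(len(candidates))
```

Negative samples would then be drawn from `candidates`; how many are drawn, and whether overlapping windows are allowed, is not specified in the text.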
Specifically, the XGBoost model interpretation method comprises the following steps:
(1) parsing the tree model metadata of the XGBoost model to obtain the tree structure of each single tree;
(2) inputting a test sample into the XGBoost model, and obtaining, according to the tree structure, the effective leaf node corresponding to the test sample and the effective path through the tree to that leaf node;
(3) calculating the contribution value of each feature according to the effective path, and explaining the XGBoost model according to the obtained contribution values.
Specifically, XGBoost uses the Boosting ensemble method and is widely used for data mining; it can handle missing values, supports regularization, and uses a second-order approximation of the cost function to accelerate optimization.
Specifically, LightGBM is a new gradient boosting tree framework that supports the GBDT, GBRT, GBM and MART algorithms; owing to a fully greedy tree growth method and histogram-based memory and computation optimizations, it is several times faster than existing gradient boosting trees; it is a complete solution for distributed training based on the DMTK framework, and it quickly became a common tool among data mining competitors after its release.
Specifically, the Catboost algorithm includes: in a sensing period, the secondary user sends the sensed energy value in the channel to the fusion center as a characteristic energy vector, and the primary user sends information of whether the spectrum resources are occupied or not to the fusion center as a label discontinuously, so that the construction of a training data set is completed. The model was trained with the Catboost algorithm in the fusion center.
Specifically, the Catboost algorithm was proposed by Yandex; it optimizes the handling of categorical features, processing them in the training stage rather than in the data preprocessing stage, and calculates leaf node values when selecting the tree structure, thereby reducing overfitting.
Specifically, for the prediction period, positive samples are extracted with a time window of eight days, denoted (T-7, T-6, T-5, T-4, T-3, T-2, T-1, T); for a positive sample, day T is the start date of the acute exacerbation; for negative samples, the specified time window must not include the 7 days before or after an acute episode.
Specifically, in order to achieve early warning, 3 groups of prediction tasks with different lead times are set for the prediction period:
(1) Task_1, using observed values from day T-5 to day T-1 to predict whether acute exacerbation occurs on day T;
(2) Task_2, using observed values from day T-6 to day T-2 to predict whether acute exacerbation occurs on day T;
(3) Task_3, using observed values from day T-7 to day T-3 to predict whether acute exacerbation occurs on day T.
Specifically, in order to reduce the number of features, a Kolmogorov-Smirnov test is performed on the features; this test compares whether two distributions are the same, and it is applied to the distribution of each feature on the positive samples and its distribution on the negative samples, with a significance level of 0.05.
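The three tasks differ only in how far the five-day observation window is shifted back from day T. A minimal sketch, with a hypothetical function name and integer day numbers assumed:

```python
def task_window(task, t):
    """Observation days for Task_k: five consecutive days ending k days before day T."""
    if task not in (1, 2, 3):
        raise ValueError("task must be 1, 2 or 3")
    last = t - task            # Task_1 ends at T-1, Task_2 at T-2, Task_3 at T-3
    return list(range(last - 4, last + 1))

for k in (1, 2, 3):
    print(k, task_window(k, t=10))
```

Seen this way, Task_k is a prediction k days ahead of the last observed day, which matches the T + d (d = 1, 2, 3) wording used in the method steps.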
[Table 1 is provided as images in the original publication: Figures BDA0003345094150000091, BDA0003345094150000101, BDA0003345094150000111]
Table 1. Observation value feature names and their interpretation;
[Table 2 is provided as images in the original publication: Figures BDA0003345094150000112, BDA0003345094150000121, BDA0003345094150000131, BDA0003345094150000141]
Table 2. The top fifty features ranked by the significance test and their P-values;
Stratified k-fold cross-validation (k = 5) is used: the data are divided into 5 folds, and each time the model is trained and tested with the data split 8:2 into a training set and a test set. The evaluation indices are sensitivity, specificity and AUC, where the threshold is the minimum threshold that allows sensitivity to exceed 0.9 and the specificity is the specificity at that threshold. The models used are catboost, xgboost and lightgbm, and the other hyper-parameters are obtained by hyper-parameter search through cross-validation; three tasks are set: Task_1, Task_2 and Task_3, and under each task the following models are set:
(1) M_all, trained with all features;
(2) M_sig, using all features that pass the significance test;
(3) M_sigSTE, using the electronic stethoscope features that pass the significance test;
(4) M_sigLSI, using the small lung instrument features that pass the significance test;
(5) M_sig50, using the 50 features with the lowest p-values that pass the significance test under the task setting;
(6) M_sig25, using the top 25 features that pass the significance test;
(7) M_orig, trained with all original observation indexes.
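The stratified 5-fold split described above can be sketched with the standard library alone. The function names are hypothetical; a library implementation such as `sklearn.model_selection.StratifiedKFold` would normally be used instead.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def train_test_splits(labels, k=5, seed=0):
    """Yield (train, test) index lists; each fold is held out once (8:2 for k=5)."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds)
                       if f != held_out for i in fold)
        yield train, test

# Hypothetical imbalanced data set: 10 positive and 90 negative windows.
labels = [1] * 10 + [0] * 90
for train, test in train_test_splits(labels):
    print(len(train), len(test), sum(labels[i] for i in test))
```

Stratification matters here because exacerbation windows are rare compared with negative windows; without it a fold could contain very few positives and its AUC would be unstable.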
            Task_1   Task_2   Task_3
M_all       0.8135   0.8135   0.8135
M_sig       0.9268   0.9045   0.8887
M_sigSTE    0.9020   0.8845   0.8302
M_sigLSI    0.8279   0.7158   0.6617
M_sig50     0.8826   0.8000   0.8631
M_sig25     0.8173   0.8075   0.8816
M_orig      0.7361   0.7434   0.5782
Table 3. AUC mean score of cross validation;
Under the Task_1 setting, there are 123 significant features, of which 31 are small lung instrument features and 92 are electronic stethoscope features.
Under the Task_2 setting, there are 134 significant features, of which 33 are small lung instrument features and 101 are electronic stethoscope features. Under the Task_3 setting, there are 131 significant features, of which 28 are small lung instrument features and 103 are electronic stethoscope features.
Table 3 reports the AUC mean score of the cross validation, where the model used is LightGBM. The results show that:
(1) Task_1 generally obtains a higher score, consistent with intuition (predicting the next day is easier than predicting two or three days ahead);
(2) the score drops markedly when only the features generated by the small lung instrument are used, whereas the features generated by the stethoscope still perform well, which indicates that the data observed by the electronic stethoscope have stronger discriminative and predictive power;
(3) screening the features with the significance test gives a clear improvement over directly using the original observed values or all the features;
(4) with only the top 50 or top 25 most significant features, the model score decreases somewhat, indicating that the fitting ability of the model drops when the number of features is reduced. The ROC curves of five LightGBM-based models under the Task_1 setting are shown in FIG. 2;
To verify the performance under other models, the results under the xgboost and catboost models are given below:
            Task_1   Task_2   Task_3
M_all       0.8772   0.8673   0.8142
M_sig       0.9181   0.8946   0.8233
M_sigSTE    0.9036   0.8792   0.8110
M_sigLSI    0.8279   0.7610   0.7000
M_sig50     0.8372   0.7831   0.8184
M_sig25     0.8177   0.8047   0.8203
M_orig      0.7812   0.7881   0.6659
Table 3-1. AUC mean score of cross validation. The model used is xgboost.
[Table 3-2 is provided as images in the original publication: Figures BDA0003345094150000161, BDA0003345094150000171]
Table 3-2. AUC mean score of cross validation. The model used is catboost.
                  Sensitivity   Specificity   Probability threshold
Task_1 M_sig50    0.9043        0.7345        0.0113
Task_2 M_sig50    0.9043        0.7098        0.0091
Task_3 M_sig50    0.9043        0.6623        0.0042
Table 4. Sensitivity and specificity values of the optimal model M_sig under each task setting.
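The threshold selection behind a table of this kind can be sketched as follows. The text describes the chosen threshold as the minimum one that lets sensitivity exceed 0.9; the sketch assumes the intended operating point is the first threshold, scanning from high to low, at which sensitivity exceeds the target, since that keeps specificity as high as possible. That reading, the function names and the toy data are all assumptions.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def pick_threshold(scores, labels, target=0.9):
    """Scan candidate thresholds from high to low; stop once sensitivity > target."""
    for thr in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, thr)
        if sens > target:
            return thr, sens, spec
    return None

# Hypothetical predicted probabilities and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(pick_threshold(scores, labels))
```

With few positive samples, sensitivity moves in coarse steps as the threshold is lowered, so the same sensitivity value can be reached in several task settings while the specificity at that point differs.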
In order to verify the influence of different decision tree models on the performance of the prediction tasks, the following table reports the model performance with the significant features under the three task settings. We compare Xgboost, Lightgbm and Catboost, the three strongest gradient boosting algorithms based on decision tree models; according to the experimental results, Lightgbm performs best under all three task settings.
                Lightgbm   Catboost   Xgboost
Task_1 M_sig    0.9268     0.8852     0.9181
Task_2 M_sig    0.9045     0.8505     0.8946
Task_3 M_sig    0.8887     0.8722     0.8233
Table 5. Cross-validation mean AUC of the three decision-tree ensemble models Catboost, Lightgbm and Xgboost with the best feature combination M_sig under each task setting.
Example 2
Five-fold cross validation was performed on Task_1 using the Lightgbm model, with the AUC of each fold as follows:
[Table provided as an image in the original publication: Figure BDA0003345094150000181]
Five-fold cross validation was performed on Task_2 using the Lightgbm model, with the AUC of each fold as follows:
[Table provided as images in the original publication: Figures BDA0003345094150000182, BDA0003345094150000191]
Five-fold cross validation was performed on Task_3 using the Lightgbm model, with the AUC of each fold as follows:
[Table provided as an image in the original publication: Figure BDA0003345094150000192]
Example 3
Five-fold cross validation was performed on Task_1 using the Lightgbm model, with average ACC, precision, recall, F1 and AUC scores as follows:
[Table provided as images in the original publication: Figures BDA0003345094150000193, BDA0003345094150000201]
Table 5. Average scores for various indices cross-validated on Task_1.
Five-fold cross validation was performed on Task_2 using the Lightgbm model, with average ACC, precision, recall, F1 and AUC scores as follows:
            AUC      ACC      precision   recall   F1
M_all       0.8135   0.8694   0.5415      0.7142   0.5475
M_sig       0.9045   0.8665   0.5365      0.9142   0.6441
M_sigSTE    0.8845   0.9108   0.6758      0.5714   0.6112
M_sigLSI    0.7158   0.7509   0.2969      0.7428   0.4028
M_sig50     0.8000   0.8807   0.4988      0.6243   0.5275
M_sig25     0.8075   0.8950   0.4902      0.6571   0.5322
M_orig      0.7434   0.8423   0.7107      0.6      0.5572
Table 6. Average scores for various indices cross-validated on Task_2.
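For reference, the ACC, precision, recall and F1 columns can be computed from the confusion counts of a single fold as sketched below; the function name and the example counts are hypothetical. Because the reported values are averages over five folds, the averaged F1 need not equal the harmonic mean of the averaged precision and recall, which may be why the F1 column does not match that formula row by row.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from the confusion counts of one fold."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"acc": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical fold: 13 exacerbation windows and 87 negative windows.
print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
```

With strongly imbalanced classes, accuracy can stay high even when recall is low, so recall and F1 are the more informative columns for an early-warning task.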
Five-fold cross validation was performed on Task_3 using the Lightgbm model, with average ACC, precision, recall, F1 and AUC scores as follows:
[Table provided as images in the original publication: Figures BDA0003345094150000202, BDA0003345094150000211]
Table 7. Average scores for various indices cross-validated on Task_3.
The above description covers only the preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.
It is to be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions; also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Claims (9)

1. A COPD acute exacerbation prediction method based on a time window is characterized by comprising the following steps:
S1, collecting lung indexes of a patient twice a day (morning and afternoon) using devices such as a small lung instrument and an electronic stethoscope, for example FVC, FEV1 and PEF, and the maximum value of the energy of lung vibration collected by the stethoscope; FVC is measured with the small lung instrument and is the forced vital capacity, namely the maximum volume of air that can be exhaled as quickly as possible after a maximal inhalation; FEV1 is measured with the small lung instrument and is the volume of air exhaled in the first second of a maximal exhalation following a maximal deep inhalation; PEF is measured with the small lung instrument and is the instantaneous flow rate at the moment expiratory flow is fastest during the forced vital capacity measurement;
S2, so that the model can support prediction for days T + 1, T + 2 and T + 3 while remaining easy to use, a fixed time window (7 days) of the patient's lung monitoring indexes is used for prediction; 32 indexes of the patient are collected every morning and evening by the electronic devices, and the indexes in the five-day time window are distinguished by date and by whether they were measured in the morning, so the number of features is 32 × 7 × 2 = 448;
S3, extracting more features from these features, the extracted features reflecting changes in the patient's lung monitoring indexes; the data expansion comprises: sliding-window statistics of the indexes; and the differences of the indexes on alternate days;
S4, taking one exacerbation (and the previous 7 days) as a positive sample; for negative samples, the specified time window must not fall within 30 days before or after an acute attack, so as to prevent the disease condition from influencing the monitoring indexes; negative samples are generated by sampling from all data in which 7 consecutive days of observations are available;
S5, performing a significance test on the features, and finding 235 features that are significantly correlated with exacerbation on day T + d (d = 1, 2, 3);
S6, using the 235 significant features as model input parameters to predict whether an exacerbation occurs on day T + d (d = 1, 2, 3); the model is a decision-tree-based ensemble model: xgboost, lightgbm or catboost, and 5-fold cross-validation is used to evaluate the model.
2. The method of claim 1, wherein the XGBoost model interpretation method comprises the following steps:
(1) parsing the tree model metadata of the XGBoost model to obtain the tree structure of each single tree;
(2) inputting a test sample into the XGBoost model, and obtaining, according to the tree structure, the effective leaf node corresponding to the test sample and the effective path through the tree to that leaf node;
(3) calculating the contribution value of each feature according to the effective path, and explaining the XGBoost model according to the obtained contribution values.
3. The method of claim 1, wherein XGBoost uses the Boosting ensemble method and is widely used for data mining; it can handle missing values, supports regularization, and uses a second-order approximation of the cost function to accelerate optimization.
4. The method of claim 1, wherein LightGBM is a new gradient boosting tree framework that supports the GBDT, GBRT, GBM and MART algorithms and is a complete solution for distributed training based on the DMTK framework.
5. The method of claim 1, wherein the method comprises: the Catboost algorithm includes: in a sensing period, the secondary user sends the sensed energy value in the channel to the fusion center as a characteristic energy vector, and the primary user sends information of whether the spectrum resources are occupied or not to the fusion center as a label discontinuously, so that the construction of a training data set is completed. The model was trained with the Catboost algorithm in the fusion center.
6. The method of claim 1, wherein the method comprises: the Catboost algorithm is proposed by Yandex, optimizes the processing of the class characteristics, processes in a training stage instead of a data preprocessing stage, and calculates leaf node values when a tree model is selected to reduce overfitting.
7. The method of claim 1, wherein the length of the prediction period is set by using an eight-day time window to extract positive samples, the eight days being denoted (T-7, T-6, T-5, T-4, T-3, T-2, T-1, T); for positive samples, day T is the onset date of the acute exacerbation; for negative samples, the time window must not include the 7 days before or after an acute exacerbation.
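The window extraction described above can be sketched as follows. The function name and the integer day numbering are illustrative assumptions. The exclusion margin is left as a parameter because claim 1 states 30 days while this claim states 7.

```python
def build_windows(observed_days, onset_days, length=8, exclusion=7):
    """Split continuously observed windows into positive and negative samples.

    observed_days: integer day indices on which the patient was monitored.
    onset_days:    day indices on which an acute exacerbation began.
    length:        window length in days (T-7 .. T gives 8).
    exclusion:     margin in days around an onset that negatives must avoid.
    """
    observed = set(observed_days)
    onsets = set(onset_days)
    positives, negatives = [], []
    for t in sorted(observed):
        window = list(range(t - length + 1, t + 1))
        if not all(day in observed for day in window):
            continue  # the window requires continuous observation
        if t in onsets:
            positives.append(window)  # day T is the onset date
        elif all(abs(day - onset) > exclusion for day in window for onset in onsets):
            negatives.append(window)  # no day falls within the exclusion margin
    return positives, negatives

# Hypothetical patient: monitored on days 0..39 with one onset on day 30.
positives, negatives = build_windows(range(40), [30])
print(len(positives), len(negatives))
```

Windows that overlap the exclusion margin but do not end on an onset day are simply discarded, which keeps pre-exacerbation drift out of the negative class.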
8. The method of claim 7, wherein, to provide early warning within the prediction period, three prediction tasks are defined in advance:
(1) task_1, using observations from day T-5 to day T-1 to predict whether an acute exacerbation occurs on day T;
(2) task_2, using observations from day T-6 to day T-2 to predict whether an acute exacerbation occurs on day T;
(3) task_3, using observations from day T-7 to day T-3 to predict whether an acute exacerbation occurs on day T.
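The three tasks differ only in how far the five-day observation span is shifted back from day T. A short sketch of that indexing, with hypothetical function names, assuming the eight-day window is stored in order T-7 to T:

```python
def task_offsets(d, span=5):
    """Day offsets relative to T used by task_d: T-(d+span-1) through T-d."""
    return list(range(-(d + span - 1), -d + 1))

def task_inputs(window, d):
    """Select task_d's observations from an 8-day window ordered T-7 .. T."""
    last = len(window) - 1  # position of day T
    return [window[last + offset] for offset in task_offsets(d)]

# Labels stand in for the per-day observation vectors.
window = ["T-7", "T-6", "T-5", "T-4", "T-3", "T-2", "T-1", "T"]
for d in (1, 2, 3):
    print(f"task_{d}:", task_inputs(window, d))
```

A larger d gives an earlier warning at the cost of using observations that are further from the onset.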
9. The method of claim 1, wherein, to reduce the number of features, a Kolmogorov-Smirnov test is performed on the features; the test compares whether two distributions are the same, and is applied to the distribution of each feature in the positive samples versus its distribution in the negative samples, at a significance level of 0.05.
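A two-sample Kolmogorov-Smirnov filter of this kind can be sketched in plain Python. The p-value below uses a standard asymptotic approximation rather than an exact computation, and the selection rule (keep a feature when p < 0.05) is an assumed reading of the claim. The feature names and values are hypothetical.

```python
import math

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value for the two-sample KS statistic (approximation)."""
    if d == 0.0:
        return 1.0
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

def select_features(positive, negative, alpha=0.05):
    """Keep features whose positive and negative distributions differ at level alpha."""
    kept = []
    for name in positive:
        d = ks_statistic(positive[name], negative[name])
        if ks_pvalue(d, len(positive[name]), len(negative[name])) < alpha:
            kept.append(name)
    return kept

# Hypothetical feature values in positive and negative samples.
positive = {"shifted": list(range(11, 21)), "same": list(range(1, 11))}
negative = {"shifted": list(range(1, 11)), "same": list(range(1, 11))}
selected = select_features(positive, negative)
print(selected)
```

Where SciPy is available, `scipy.stats.ks_2samp` provides the same test and would normally be preferred over a hand-written approximation.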
CN202111319613.8A 2021-11-09 2021-11-09 COPD acute exacerbation prediction method based on time window Active CN113974566B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111319613.8A CN113974566B (en) 2021-11-09 2021-11-09 COPD acute exacerbation prediction method based on time window

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111319613.8A CN113974566B (en) 2021-11-09 2021-11-09 COPD acute exacerbation prediction method based on time window

Publications (2)

Publication Number Publication Date
CN113974566A (en) 2022-01-28
CN113974566B (en) 2023-09-19

Family

ID=79747333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111319613.8A Active CN113974566B (en) 2021-11-09 2021-11-09 COPD acute exacerbation prediction method based on time window

Country Status (1)

Country Link
CN (1) CN113974566B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114566238A (en) * 2022-02-09 2022-05-31 无锡启益医疗科技有限公司 Screened patient AUC (AUC) improving method based on COPD (chronic obstructive pulmonary disease) risk judgment
CN117894478A (en) * 2024-03-14 2024-04-16 天津市肿瘤医院(天津医科大学肿瘤医院) Informationized intelligent management method for severe cases of oncology department of severe cases of oncology

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150080671A1 (en) * 2013-05-29 2015-03-19 Technical University Of Denmark Sleep Spindles as Biomarker for Early Detection of Neurodegenerative Disorders
CN107451390A (en) * 2017-02-22 2017-12-08 Cc和I研究有限公司 System for predicting acute exacerbations in patients with chronic obstructive pulmonary disease
CN110123274A (en) * 2019-04-29 2019-08-16 上海电气集团股份有限公司 A kind of monitoring system of septicopyemia
CN110289061A (en) * 2019-06-27 2019-09-27 黎檀实 A kind of Time Series Forecasting Methods of the traumatic hemorrhagic shock condition of the injury
CN111657888A (en) * 2020-05-28 2020-09-15 首都医科大学附属北京天坛医院 Severe acute respiratory distress syndrome early warning method and system
CN113057588A (en) * 2021-03-17 2021-07-02 上海电气集团股份有限公司 Disease early warning method, device, equipment and medium
WO2021148967A1 (en) * 2020-01-23 2021-07-29 Novartis Ag A computer-implemented system and method for outputting a prediction of a probability of a hospitalization of patients with chronic obstructive pulmonary disorder
CN113469227A (en) * 2021-06-18 2021-10-01 南京润楠医疗电子研究院有限公司 Forced expiration total amount prediction method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150080671A1 (en) * 2013-05-29 2015-03-19 Technical University Of Denmark Sleep Spindles as Biomarker for Early Detection of Neurodegenerative Disorders
CN107451390A (en) * 2017-02-22 2017-12-08 Cc和I研究有限公司 System for predicting acute exacerbations in patients with chronic obstructive pulmonary disease
US20180239872A1 (en) * 2017-02-22 2018-08-23 CC&I Research Co.,Ltd System for predicting an acute exacerbation of chronic obstructive pulmonary disease
CN110123274A (en) * 2019-04-29 2019-08-16 上海电气集团股份有限公司 A kind of monitoring system of septicopyemia
CN110289061A (en) * 2019-06-27 2019-09-27 黎檀实 A kind of Time Series Forecasting Methods of the traumatic hemorrhagic shock condition of the injury
WO2021148967A1 (en) * 2020-01-23 2021-07-29 Novartis Ag A computer-implemented system and method for outputting a prediction of a probability of a hospitalization of patients with chronic obstructive pulmonary disorder
CN111657888A (en) * 2020-05-28 2020-09-15 首都医科大学附属北京天坛医院 Severe acute respiratory distress syndrome early warning method and system
CN113057588A (en) * 2021-03-17 2021-07-02 上海电气集团股份有限公司 Disease early warning method, device, equipment and medium
CN113469227A (en) * 2021-06-18 2021-10-01 南京润楠医疗电子研究院有限公司 Forced expiration total amount prediction method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114566238A (en) * 2022-02-09 2022-05-31 无锡启益医疗科技有限公司 Screened patient AUC (AUC) improving method based on COPD (chronic obstructive pulmonary disease) risk judgment
CN117894478A (en) * 2024-03-14 2024-04-16 天津市肿瘤医院(天津医科大学肿瘤医院) Informationized intelligent management method for severe cases of oncology department of severe cases of oncology
CN117894478B (en) * 2024-03-14 2024-05-28 天津市肿瘤医院(天津医科大学肿瘤医院) Informationized intelligent management method for severe cases of oncology department of severe cases of oncology

Also Published As

Publication number Publication date
CN113974566B (en) 2023-09-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant